Data Science at SparkCognition

Kevin Gullikson

You probably won't get an R1 faculty job

  • 27 tenure-track faculty jobs nationwide as of Tuesday night (AAS Job Register)

Besides, astronomy is basically over now, right?

  • 82 job listings for data scientist in Austin (LinkedIn)

What is SparkCognition?

DeepArmor

Example projects:

Detecting Icing on Wind Turbines

Clustering/Anomaly detection in Natural Gas Turbines

Tools

Data Science Roles (at SparkCognition)

  • Build models for individual clients
    • Usually one data scientist per client (well, $<1$ data scientist...)
    • Data munging
    • Visualization of high-dimensional data ($D \sim 300 - 1000$)
    • Usually unsupervised - clients are looking for anomalies or different operating modes
  • Communicate model results to the client, sometimes as often as once a week
  • Work on general-purpose tools
    • Extending scikit-learn models
    • Finding other useful libraries
    • Implementing algorithms from the literature
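
As a rough illustration of the unsupervised workflow above (project high-dimensional data down for plotting, then flag anomalies), here is a minimal sketch on synthetic data. The data, the choice of PCA and IsolationForest, and all parameter values are assumptions made for illustration; the talk does not say which algorithms are used on client data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor data (assumption): D = 300 features,
# 1000 samples in a "normal" operating mode plus 20 shifted samples.
n_normal, n_anomalous, n_features = 1000, 20, 300
normal = rng.normal(0.0, 1.0, size=(n_normal, n_features))
anomalous = rng.normal(6.0, 1.0, size=(n_anomalous, n_features))
X = np.vstack([normal, anomalous])

# Put every feature on a comparable scale before modelling.
X_scaled = StandardScaler().fit_transform(X)

# Project to 2 dimensions so the data can be shown in a scatter plot.
embedding = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Unsupervised anomaly detection: no labels are passed to the model.
# contamination is the assumed fraction of anomalous samples.
detector = IsolationForest(
    n_estimators=100,
    contamination=n_anomalous / len(X),
    random_state=0,
)
labels = detector.fit_predict(X_scaled)  # -1 = flagged as anomaly, +1 = normal
flagged = np.flatnonzero(labels == -1)

print(f"embedding shape: {embedding.shape}")
print(f"samples flagged as anomalous: {len(flagged)}")
```

Plotting `embedding` colored by `labels` (for example with matplotlib) gives the kind of picture that can be shown to a client. On real data the contamination fraction is not known in advance and has to be chosen or tuned.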

Data Scientist Skills

  • Coding

    • Mostly Python or R
    • Version control (git)
    • Best practices (OOP, reusable code, comments, docstrings...)
  • Machine Learning

    • Have practical experience! Using ML in your research is great; writing about it in a blog is good too.
    • Understand what is happening under the hood for common algorithms
  • Statistics

  • Communication of technical results to non-technical audiences

    • Sometimes you get to talk to data-literate people at other companies
    • Most of the time you don't
    • Make informative plots and have a story to tell
  • Working with software developers to integrate DS models into pretty applications

Surprisingly similar feel to academia

Before:

After:

Differences from Academia

Perks!

Everyone is working towards the same goal

Much faster timeline

  • Proofs of concept: < 1 month
  • Full projects: ~3 months
  • If you're working on the same thing for a year, something is very wrong.

How can I get into data science?

  • Use Python for data analysis/visualization
    • Consider a Software Carpentry course (SciPy)
  • Give public talks on your research
  • If you can, incorporate ML models into your research
    • If that isn't going to work, start playing with toy projects
  • Start teaching yourself statistics
    • Especially if you're a theorist
    • Rob's class and associated book are great
    • The astroML book is great too
  • Go to meetups and talk to people!