Practical Skills That Practical Data Scientists Need

Interesting blog post about the set of skills that one “data scientist” uses in practice.

While a lot of these posts contain job-specific and employer-specific information, you still get a sense of what employers want. I will use the information about SQL to shape the database portion of my undergraduate econometrics lectures. The author lists the following as “essential SQL concepts and functions that I find necessary”:

  • DESCRIBE and EXPLAIN
  • WHERE clauses, including IN (…)
  • GROUP BY
  • Joins, mostly left and inner
  • Using already indexed fields
  • LIMIT and OFFSET
  • LIKE and REGEXP
  • if()
  • String manipulation, primarily left() and lower()
  • Date manipulation: date_add, datediff, to and from UNIX timestamps, time component extraction
  • regexp_extract (if you’re lucky to use a database that supports it) or substring_index (if you’re less lucky)
  • Subqueries
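
To get a feel for how several of these items fit together in a classroom setting, here is a minimal sketch using Python's built-in sqlite3 module. The tables, column names, and rows are all made up for illustration and do not come from the post. SQLite's dialect also differs from the one the post seems to have in mind (it lacks DESCRIBE and regexp_extract, for instance), so only the more portable items are shown.

```python
import sqlite3

# Hypothetical toy data: a worker table and a wage table (not from the post).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workers (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE wages (worker_id INTEGER, year INTEGER, wage REAL);
INSERT INTO workers VALUES
    (1, 'Ann', 'OH'), (2, 'Bob', 'MI'), (3, 'Cy', 'oh'), (4, 'Dee', 'TX');
INSERT INTO wages VALUES
    (1, 2014, 40.0), (1, 2015, 44.0), (2, 2015, 30.0), (3, 2015, 50.0);
""")

# WHERE ... IN (...), lower(), a left join, and GROUP BY.
# The left join keeps workers with no wage rows; their mean wage is NULL.
rows_by_state = conn.execute("""
    SELECT lower(w.state) AS state_code,
           COUNT(DISTINCT w.id) AS n_workers,
           AVG(g.wage) AS mean_wage
    FROM workers AS w
    LEFT JOIN wages AS g ON g.worker_id = w.id
    WHERE lower(w.state) IN ('oh', 'tx')
    GROUP BY lower(w.state)
    ORDER BY state_code
""").fetchall()

# An inner join filtered by a subquery: 2015 wages above the 2015 mean.
above_avg = conn.execute("""
    SELECT w.name, g.wage
    FROM wages AS g
    INNER JOIN workers AS w ON w.id = g.worker_id
    WHERE g.year = 2015
      AND g.wage > (SELECT AVG(wage) FROM wages WHERE year = 2015)
    ORDER BY g.wage DESC
""").fetchall()

# LIKE for simple pattern matching.
a_names = conn.execute(
    "SELECT name FROM workers WHERE name LIKE 'A%'"
).fetchall()

# LIMIT and OFFSET: the second "page" of two rows.
page_two = conn.execute(
    "SELECT name FROM workers ORDER BY id LIMIT 2 OFFSET 2"
).fetchall()

# SQLite's counterpart to EXPLAIN; the exact plan text varies by version.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM workers WHERE id = 1"
).fetchall()

for label, result in [("by state", rows_by_state), ("above average", above_avg),
                      ("names like A%", a_names), ("page two", page_two),
                      ("query plan", plan)]:
    print(label, result)
```

An in-memory database keeps the example self-contained, which suits a lecture handout. The same queries should carry over to other engines with small changes to function names.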

Although I don’t use databases very much in my own work (text files work just as well for my small datasets, they’re easy to track and back up in a Git repo, and any coauthors can use them trivially), just about every employer I’ve talked to has mentioned SQL as a necessary skill. This will make a nice reference point as I redesign my course. Now if only I could figure out the most effective way to teach students about cleaning up messy data…

Last Update: 2016-02-24