Practical Skills That Practical Data Scientists Need
Here is an interesting blog post about the set of skills that one “data scientist” uses in practice.
While a lot of these posts contain job-specific and employer-specific information, you still get a sense of what employers want. I will use the information about SQL to shape my undergraduate econometrics lectures on databases. He lists the following as “essential SQL concepts and functions that I find necessary”:
- DESCRIBE and EXPLAIN
- WHERE clauses, including IN (…)
- GROUP BY
- Joins, mostly left and inner
- Using already indexed fields
- LIMIT and OFFSET
- LIKE and REGEXP
- if()
- String manipulation, primarily left() and lower()
- Date manipulation: date_add, datediff, to and from UNIX timestamps, time component extraction
- regexp_extract (if you’re lucky to use a database that supports it) or substring_index (if you’re less lucky)
- Subqueries
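To make the list concrete for lecture use, here is a minimal sketch of several of these concepts running against an in-memory SQLite database from Python. The tables (customers, orders), their columns, and the data are all hypothetical and are not from the original post. The function names in the list (if(), date_add, datediff) look like MySQL- or Hive-style SQL, so where SQLite lacks them I swap in its nearest equivalents: CASE WHEN for if(), julianday() arithmetic for datediff, and EXPLAIN QUERY PLAN for EXPLAIN.

```python
import sqlite3

# Hypothetical toy schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount REAL,
    ordered_on TEXT
);
CREATE INDEX idx_orders_customer ON orders (customer_id);
INSERT INTO customers VALUES (1, 'Ada', 'NY'), (2, 'Ben', 'CA'), (3, 'Cy', 'TX');
INSERT INTO orders VALUES
    (1, 1, 20.0, '2016-01-05'),
    (2, 1, 35.0, '2016-02-10'),
    (3, 2, 15.0, '2016-02-11');
""")

# WHERE ... IN, LEFT JOIN, GROUP BY, lower(), LIMIT/OFFSET, and a
# CASE expression standing in for MySQL-style if().
summary = conn.execute("""
    SELECT lower(c.name) AS name,
           COUNT(o.id) AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total_amount,
           CASE WHEN COUNT(o.id) > 1 THEN 'repeat' ELSE 'other' END AS segment
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE c.state IN ('NY', 'CA', 'TX')
    GROUP BY c.id, c.name
    ORDER BY total_amount DESC
    LIMIT 2 OFFSET 0
""").fetchall()

# Subquery plus LIKE: customers whose name starts with 'A' and who
# have at least one order above 18.
big_spenders = conn.execute("""
    SELECT name
    FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE amount > 18)
      AND name LIKE 'A%'
""").fetchall()

# Date manipulation: days between two dates, using SQLite's julianday()
# in place of datediff.
days_between = conn.execute("""
    SELECT CAST(julianday('2016-02-10') - julianday('2016-01-05') AS INTEGER)
""").fetchone()[0]

# Using an already indexed field: ask the planner how it would run a
# lookup on the indexed customer_id column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 1"
).fetchall()

print(summary)
print(big_spenders)
print(days_between)
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step
```

SQLite is used here only because it ships with Python and needs no server, which keeps the example easy to hand to students. The same queries would need small dialect changes on MySQL or another engine.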
Although I don’t use databases very much in my own work (text files work just as well for my small datasets, they’re easy to track and back up in a Git repo, and any coauthor can use them trivially), just about every employer I’ve talked to has mentioned SQL as a necessary skill. This list will make a nice reference point as I redesign my course. Now if only I could figure out the most effective way to teach students about cleaning up messy data…
Last Update: 2016-02-24