Some Advice on Programming for New Grad Students
Inspired by exceedingly lame Twitter arguments about “Stata vs R”, “Python vs R”, or whatever is the flavor of the month in these completely useless debates, I wanted to jump in with my thoughts.
First, a bit about my background. I’m not someone who learned to program on the side to get through an econometrics class in grad school. I started programming in the mid-80s, on paper. I came across a book about programming that showed me the potential of computers, but without access to a computer, the best I could do was write out the programs I would have run if I’d had one. I was a CS major before I was an econ major, and the only reason I abandoned CS was that I wanted to work on AI, but they told me I had to write business applications (this was the early 1990s). I’ve used dozens of languages over the years, created a project to facilitate embedding R inside a D program and vice versa, am currently writing a compiler for R as a side project, have worked through most of SICP, and use a functional programming approach in my teaching. This is not to claim I’m an expert (I’m not) but rather to clarify that I understand the issues well enough to have an opinion.
On to the point of this post. There are lots of good options out there for someone starting grad school in economics: R, Python, Julia, Matlab, Octave, Fortran, Mathematica, Stata, and others. The only thing that matters is that you learn how to write the programs you need for your research, and that those programs are correct. Don’t worry about picking the “right” language, because that’s not a meaningful concept. Just pick one and run with it. You can, and should, pick up several more over time.
Moreover, don’t waste your time on a generic intro to programming course taught in a CS department or online. You won’t learn much that’s useful. If you’re an undergrad and you can count it as an elective, by all means take the intro programming class, since it’ll benefit your research more than a second art history class. Just don’t expect to learn the things you need in the real world of economics research. CS departments moved on to Java, OOP, and web programming long ago. They use Python these days, but not the type of Python that will help you. You need to learn numerical programming. It’s a bit puzzling, given the heavy emphasis on machine learning in 2022, but I haven’t seen any movement in intro courses toward teaching numerical programming.
Here’s my recommended list of topics for someone looking to kill time the summer before starting an econ PhD program:
- Functions
- Vectors and matrices
- Compound data structures, like lists in R, or structs in C
- Loops
- map, filter, and reduce
- How to do a regression
- Numerical optimization and nonlinear estimation
- Random number generation
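To make the first few items concrete, here is a minimal sketch in Python (picked only because it needs no setup; any of the languages above would do). The `Household` record, its fields, and the numbers are all made up for illustration. It computes the same total twice, once with a loop and once with map, filter, and reduce.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Household:
    # Hypothetical compound data structure, in the spirit of an R list or a C struct.
    name: str
    income: float
    size: int


def per_capita(h: Household) -> float:
    """Income per household member."""
    return h.income / h.size


# Made-up data for illustration only.
households = [
    Household("a", 60000.0, 3),
    Household("b", 45000.0, 1),
    Household("c", 120000.0, 4),
    Household("d", 18000.0, 2),
]

# Loop version: sum per-capita income over multi-person households.
total_loop = 0.0
for h in households:
    if h.size > 1:
        total_loop += per_capita(h)

# map/filter/reduce version of the same computation.
multi_person = filter(lambda h: h.size > 1, households)
per_capita_incomes = map(per_capita, multi_person)
total_mfr = reduce(lambda acc, x: acc + x, per_capita_incomes, 0.0)

print(total_loop, total_mfr)
```

Both versions give the same number. The second one just names each step (keep some elements, transform them, combine them), and that habit is what carries over from one language to the next.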
You can learn some of this stuff in Hadley Wickham’s Advanced R book. One objection is that this is advanced material. Well, economic research requires advanced programming concepts, so that’s what you need to learn. Dynamic programming is advanced math, but a couple of generations of grad students have survived what has mostly been awful teaching of those methods. These days we have Google and many free resources if someone needs to fill in the details.
Returning to the motivation for this post, nothing in the above list is specific to a particular language. Some languages are more convenient than others for particular tasks (in some cases by a wide margin), but there is almost no advantage to choosing one of Python, R, or Julia rather than the others. They’ll all work just fine. After you’ve learned one, learn the others. Don’t worry, it’ll go fast learning new syntax for the same operations.
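As a rough illustration of "the same operations", here is a sketch of the last three topics on the list using only Python's standard library: simulate data with a seeded random number generator, fit a regression with the closed-form OLS formulas, then recover the same coefficients by minimizing the mean squared residual numerically. The function names, parameter values, and the choice of plain gradient descent are my assumptions for the example. In real work you would call your language's regression and optimization routines, but the steps are the same in R, Julia, or anything else.

```python
import random


def simulate(n, intercept, slope, noise_sd, seed):
    """Draw x ~ N(0, 1) and y = intercept + slope * x + noise."""
    rng = random.Random(seed)  # seeded so the results are reproducible
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def ols(xs, ys):
    """Closed-form simple regression: returns (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def gradient_descent(xs, ys, step=0.1, iters=500):
    """Minimize the mean squared residual by plain gradient descent.

    The step size and iteration count are illustrative, not tuned.
    """
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(iters):
        resid = [y - a - b * x for x, y in zip(xs, ys)]
        grad_a = -2.0 * sum(resid) / n
        grad_b = -2.0 * sum(r * x for r, x in zip(resid, xs)) / n
        a -= step * grad_a
        b -= step * grad_b
    return a, b


# True parameters (intercept 1, slope 2) are arbitrary choices for the example.
xs, ys = simulate(n=500, intercept=1.0, slope=2.0, noise_sd=0.5, seed=42)
a_ols, b_ols = ols(xs, ys)
a_gd, b_gd = gradient_descent(xs, ys)

print("OLS:             ", a_ols, b_ols)
print("Gradient descent:", a_gd, b_gd)
```

The two sets of estimates should agree to many decimal places and sit close to the true values. Nothing here depends on Python: simulating, estimating, and optimizing translate almost line for line into the other languages.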
tldr: Programming languages are not magic pixie dust. Focus on the core concepts rather than superficial differences across languages. No programming language can make up for a lack of understanding of programming concepts.