Billions of Regressors?

While carrying out my annual reading of the AEA’s Papers and Proceedings, I came upon Machine Learning Methods for Demand Estimation. The authors give an example of seriously big data: “…Google estimates the demand for a given web page by using a model of the network structure of literally billions of other web pages on the right-hand side.” Billions of potential regressors?

Cool stuff, but I’m skeptical of the model they select. There’s only so much information in a given dataset, and with that many regressors there’s ample opportunity for spurious relationships to creep in. At least economists can use economic theory as a starting point. On the other hand, the approach must be working, given how much better Google’s search results are than the competition’s.
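The spurious-relationship worry is easy to demonstrate. Here is a minimal sketch (my own illustration, not from the paper): every regressor below is pure noise with zero true correlation to the outcome, yet with thousands of candidates, some will look meaningfully correlated just by chance.

```python
import numpy as np

# 100 observations, 10,000 candidate regressors that are pure noise.
rng = np.random.default_rng(0)
n, p = 100, 10_000
y = rng.standard_normal(n)
X = rng.standard_normal((n, p))

# Sample correlation of each regressor with y; every TRUE correlation is zero.
yc = y - y.mean()
Xc = X - X.mean(axis=0)
corrs = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Despite the data being noise, the best-looking regressor appears
# substantially correlated with y.
print(f"max |corr| among {p} noise regressors: {np.abs(corrs).max():.2f}")
```

With 100 observations, the largest spurious correlation among 10,000 noise columns typically lands around 0.4, which is why methods used at this scale lean on regularization and out-of-sample validation rather than in-sample fit.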

The example is taken from Mining of Massive Datasets. The book can be read online, and the same page has information about a MOOC covering its content.

Last Update: 2015-05-15