Recursion: A very simple example

I teach econometrics classes using R as the main language.1 As much as possible, I teach the students how to implement their problems by writing a function that can be used with Reduce. Let’s consider a very simple example. You want to simulate from the model \[y_{t} = \beta y_{t-1} + \varepsilon_{t}\] \[\varepsilon_{t} \sim N \left( 0,1 \right)\] to compute the mean of \(y\) by simulation.

Set up the recursion:

`y[t]` <- function(`y[t-1]`, `e[t]`) {
  return(0.9*`y[t-1]` + `e[t]`)
}

That function should be clear enough. Given the previous value of y and the current shock, it returns the next value in the sequence. We need the initial value of y to start the recursion and the series of shocks. We’ll use \(y_{0} = 0.0\) and generate 10,000 observations. That leads to a Reduce call that looks like this:

set.seed(100)
y.sim <- Reduce(`y[t]`, rnorm(10000), init=0.0, accumulate=TRUE)
mean(y.sim)

That’s not exactly zero, but then we only generated 10,000 observations to compute the mean by simulation. Let’s check the standard deviation of the mean for this problem. We’ll generate 1000 samples:

set.seed(100)
simoutput <- replicate(1000, {
  y.sim <- Reduce(`y[t]`, rnorm(10000), init=0.0, accumulate=TRUE)
  mean(y.sim)
}, simplify="vector")
mean(simoutput)
sd(simoutput)

The combination of a persistent series (AR(1) coefficient of 0.9) and a small simulation size (10,000 observations) leads to extreme imprecision. What if we set the AR(1) coefficient to 0.2 and double the sample size?

`y[t]` <- function(`y[t-1]`, `e[t]`) {
  return(0.2*`y[t-1]` + `e[t]`)
}

set.seed(100)
simoutput <- replicate(1000, {
  y.sim <- Reduce(`y[t]`, rnorm(20000), init=0.0, accumulate=TRUE)
  mean(y.sim)
}, simplify="vector")
mean(simoutput)
sd(simoutput)

The standard deviation is only a tenth as large now; the simulation estimate of the mean of y is quite precise for that model and 20,000 observations.

I see nothing especially difficult about any of the above code. It’s absolutely the case that there are other approaches without a recursive function that would be just as good. The question you need to ask is what the odds are that someone with no programming experience would, with no guidance at all, be able to do so.

There is nothing inside the recursive function that doesn’t relate to the problem. There is no reference to a numerical index that has no relationship to the simulation. There is no need to allocate an array. There is no need to properly place the results into an array. The recursive function defines the problem and nothing but the problem and lets the language take care of the irrelevant implementation details. The student needs only think about how to make y[t] and y[t-1] align properly to make the call to Reduce work.

As the problem grows in complexity, the above properties will continue to hold. The recursive function will still be only a few lines long. The call to Reduce will be one line. The call to replicate will be one line to call Reduce and one line to compute the summary statistic of interest. Recursion scales particularly well. Students make good use of their time because they’re focused on modeling the question of interest rather than the implementation details.


  1. I make no claim that this is the best choice. A good argument could be made for numerous other languages. R is one of many well-designed languages I enjoy using. Yes, I said it’s well-designed. I’ve never seen a computer scientist make a compelling argument against R, though I’ve read many critiques that demonstrated computer scientists would do well to spend time learning about statistical analysis before opining on the topic. Let’s not forget that computer scientists thought it was a good idea to use Java in Programming 101 classes.↩︎