Detailed dynlm example

Generate some time series data:

set.seed(475)
y <- ts(rnorm(100))
x <- ts(rnorm(100))

Running a regression works just fine if there are no lags:

fit <- lm(y ~ x)
fit

This gives the output:

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
    0.05133      0.13895

Let’s regress y on the first lag of x:

fit2 <- lm(y ~ lag(x,-1))
fit2

Yielding

Call:
lm(formula = y ~ lag(x, -1))

Coefficients:
(Intercept)   lag(x, -1)  
    0.05133      0.13895  

That output is suspiciously identical to the first regression’s, which is almost certainly wrong. To see what’s going on, let’s estimate an AR(1) model for x:

fit3 <- lm(x ~ lag(x, -1))
fit3

Yielding

Call:
lm(formula = x ~ lag(x, -1))

Coefficients:
(Intercept)   lag(x, -1)  
  -1.11e-17     1.00e+00  

Something’s clearly wrong. An AR(1) model estimated on random noise should not come out as an exact random walk, with a slope of exactly 1 and an intercept of essentially 0. And looking at the R-squared

summary(fit3)$r.squared

We see that the fit is perfect (R-squared is exactly 1). The problem is that the lm function has no understanding of time series: lag only shifts the time base of a "ts" object, and when lm builds its model frame it drops those time series attributes, so lag(x, -1) collapses back into x itself and we end up regressing x on x. It’s unfortunate that R gives no warning when you send time series data to lm.
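You can verify this directly: the “lagged” series contains exactly the same numbers as x, only with a shifted time base, which lm ignores.

```r
# stats::lag() shifts only the time base (the "tsp" attribute) of a
# ts object; the underlying data are untouched.  lm() drops those
# attributes when building the model frame, so the "lag" vanishes.
identical(as.numeric(lag(x, -1)), as.numeric(x))   # TRUE
```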

You could instead do this to estimate an AR(1) model:

xt <- window(x, start=2)
xt1 <- window(x, end=99)
fit4 <- lm(xt ~ xt1)
fit4

That returns

Call:
lm(formula = xt ~ xt1)

Coefficients:
(Intercept)          xt1  
   -0.06577     -0.02109  
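This works because lm pairs the two vectors by position: element i of xt is the series at time i + 1, while element i of xt1 is the series at time i, so each observation is regressed on its immediate predecessor. A quick check of the alignment:

```r
# xt covers times 2..100, xt1 covers times 1..99; both have 99
# elements, offset by exactly one period.
length(xt)    # 99
length(xt1)   # 99
tsp(xt)       # 2 100 1
tsp(xt1)      # 1 99 1
```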

The dynlm function in the package of the same name provides a shortcut:

library(dynlm)
fit5 <- dynlm(x ~ L(x, 1))
fit5

That returns the same estimates (plus a header describing the sample):

Time series regression with "ts" data:
Start = 2, End = 100

Call:
dynlm(formula = x ~ L(x, 1))

Coefficients:
(Intercept)      L(x, 1)  
   -0.06577     -0.02109  

The advantages are greater if you want to include lags of multiple variables. Suppose you want three lags of both x and y as regressors. The fastest way to do that using lm would be to call the embed function:

yr <- embed(y,4)
xr <- embed(x,4)
fit6 <- lm(yr[,1] ~ yr[,2:4] + xr[,2:4])
fit6

That returns

Call:
lm(formula = yr[, 1] ~ yr[, 2:4] + xr[, 2:4])

Coefficients:
(Intercept)   yr[, 2:4]1   yr[, 2:4]2   yr[, 2:4]3   xr[, 2:4]1   xr[, 2:4]2  
   0.043292     0.053224    -0.056656     0.084205     0.056499     0.007075  
 xr[, 2:4]3  
  -0.014557  

That’s clumsy and error-prone: you have to remember which column of each embed matrix holds which lag, and the coefficient labels don’t tell you.
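If embed is unfamiliar: embed(z, k) returns a matrix whose row for time t contains z[t], z[t-1], ..., z[t-k+1], current value first. A tiny example makes the layout clear:

```r
# Each row holds the current value followed by its lags,
# most recent column first.
embed(1:5, 3)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3
```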

With dynlm, you can do

fit7 <- dynlm(y ~ L(y, 1:3) + L(x, 1:3))
fit7

and get the same estimates.

A dynlm fit is an object that extends class "lm", so you can treat it the same as lm output and pass it to any function that works with lm output:

inherits(fit7, "lm")

[1] TRUE
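For example, the standard generic functions for lm objects work unchanged on a dynlm fit:

```r
summary(fit7)    # the usual lm-style coefficient table
AIC(fit7)        # information criterion
confint(fit7)    # confidence intervals for the coefficients
```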

One of the handier features of dynlm is that it does not strip the time series properties from the data, so the fitted values and residuals are themselves time series objects:

fitted(fit7)
residuals(fit7)
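You can confirm that the time series attributes survive:

```r
is.ts(residuals(fit7))    # TRUE
tsp(residuals(fit7))      # start, end, and frequency carried over
```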

If you want to plot or otherwise work with the residuals of fit6, you have to re-attach the time series information yourself (here the first usable observation is t = 4, because three lags are used), which is of course a recipe for errors in real research projects:

plot(ts(residuals(fit6), start=4, frequency=1))

versus

plot(residuals(fit7))