Detailed dynlm example
Generate some time series data:
set.seed(475)
y <- ts(rnorm(100))
x <- ts(rnorm(100))
Running a regression works just fine if there are no lags:
fit <- lm(y ~ x)
fit
This gives the output:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
0.05133 0.13895
Let’s regress y on the first lag of x:
fit2 <- lm(y ~ lag(x, -1))
fit2
Yielding
Call:
lm(formula = y ~ lag(x, -1))
Coefficients:
(Intercept) lag(x, -1)
0.05133 0.13895
This looks suspiciously identical to the first regression’s output, which is almost certainly wrong. Let’s estimate an AR(1) model for x:
fit3 <- lm(x ~ lag(x, -1))
fit3
Yielding
Call:
lm(formula = x ~ lag(x, -1))
Coefficients:
(Intercept) lag(x, -1)
-1.11e-17 1.00e+00
Something’s wrong. An estimated AR(1) model should not come out as an exact random walk, with a unit coefficient and zero intercept. Looking at the R-squared:
summary(fit3)$r.squared
We see a perfect fit (R-squared is exactly one). The problem is that the lm function has no understanding of time series data: its model frame drops the time attributes, and since lag() only shifts those attributes, lag(x, -1) supplies exactly the same values as x. It’s unfortunate that R does not give a warning when you send time series data to lm.
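To see what goes wrong, note that lag() on a ts object only shifts the time attribute; the underlying vector is untouched. A minimal check, reusing the simulated data from above:

```r
set.seed(475)
x <- ts(rnorm(100))

# lag() shifts only the "tsp" time attribute of a ts object;
# the data values are identical, which is why lm() treats
# lag(x, -1) as the same regressor as x
identical(as.numeric(x), as.numeric(lag(x, -1)))   # TRUE
```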
You could instead do this to estimate an AR(1) model:
xt <- window(x, start=2)
xt1 <- window(x, end=99)
fit4 <- lm(xt ~ xt1)
fit4
That returns
Call:
lm(formula = xt ~ xt1)
Coefficients:
(Intercept) xt1
-0.06577 -0.02109
The dynlm function, in the package of the same name, provides a shortcut:
library(dynlm)
fit5 <- dynlm(x ~ L(x, 1))
fit5
That returns the same estimates:
Time series regression with "ts" data:
Start = 2, End = 100
Call:
dynlm(formula = x ~ L(x, 1))
Coefficients:
(Intercept) L(x, 1)
-0.06577 -0.02109
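As a sanity check, the coefficients from the window-based fit and the dynlm fit should agree to numerical tolerance. A quick comparison, rerunning the code above (only the coefficient names differ between the two fits):

```r
library(dynlm)  # assumes the dynlm package is installed

set.seed(475)
x <- ts(rnorm(100))

# window-based AR(1) fit
xt  <- window(x, start = 2)
xt1 <- window(x, end = 99)
fit4 <- lm(xt ~ xt1)

# dynlm AR(1) fit
fit5 <- dynlm(x ~ L(x, 1))

# the names differ ("xt1" vs "L(x, 1)"), so compare values only
all.equal(unname(coef(fit4)), unname(coef(fit5)))   # TRUE
```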
The advantages are greater if you want to include lags of multiple variables. Suppose you want three lags of both x and y as regressors. The fastest way to do that with lm is the embed function:
yr <- embed(y,4)
xr <- embed(x,4)
fit6 <- lm(yr[,1] ~ yr[,2:4] + xr[,2:4])
fit6
That’s clumsy and error-prone, but it does produce the estimates:
Call:
lm(formula = yr[, 1] ~ yr[, 2:4] + xr[, 2:4])
Coefficients:
(Intercept) yr[, 2:4]1 yr[, 2:4]2 yr[, 2:4]3 xr[, 2:4]1 xr[, 2:4]2
0.043292 0.053224 -0.056656 0.084205 0.056499 0.007075
xr[, 2:4]3
-0.014557
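A note on what embed is doing here, since its layout is easy to get backwards: embed(x, 4) returns a matrix whose columns are x_t, x_{t-1}, x_{t-2}, x_{t-3}, dropping the first three observations (and the ts attributes):

```r
set.seed(475)
x <- ts(rnorm(100))

xr <- embed(x, 4)
dim(xr)   # 97 rows, 4 columns: three observations lost to the lags

# column 1 is x_t (x[4:100]); columns 2:4 are lags 1 to 3
identical(xr[, 2], as.numeric(x)[3:99])   # TRUE: column 2 is x_{t-1}
```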
With dynlm, you can simply do
fit7 <- dynlm(y ~ L(y, 1:3) + L(x, 1:3))
fit7
and get the same estimates.
The object returned by dynlm inherits from class lm, so you can treat dynlm output the same as lm output and pass it to any function that accepts an lm object:
inherits(fit7, "lm")
[1] TRUE
One of the handier features of dynlm is that it does not strip the time series properties from the data, so the fitted values and residuals are themselves time series objects:
fitted(fit7)
residuals(fit7)
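A quick check, rerunning the model above, that the residuals really are a ts object starting at the first usable observation, t = 4:

```r
library(dynlm)  # assumes the dynlm package is installed

set.seed(475)
y <- ts(rnorm(100))
x <- ts(rnorm(100))
fit7 <- dynlm(y ~ L(y, 1:3) + L(x, 1:3))

is.ts(residuals(fit7))   # TRUE
start(residuals(fit7))   # 4 1 -- three observations lost to the lags
```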
If you want to plot or otherwise work with the residuals of fit6, you will have to add the time series information yourself, which is of course a recipe for disaster in real research projects:
plot(ts(residuals(fit6), start=4, frequency=1))
versus
plot(residuals(fit7))