Chapter 23 Time-Series Econometrics

Intro

The econometrics of Chapter 10 was built on a single quiet assumption: the observations are independent draws from a fixed population. Randomize a treatment, find an instrument, exploit a discontinuity, and the i.i.d. sampling that justifies the asymptotics takes care of the rest. Most of the data a macroeconomist or a finance researcher actually works with does not arrive that way. It arrives in order. Quarterly GDP, monthly inflation, the overnight policy rate, daily equity returns: each observation sits next to its neighbors in time, and what happened last quarter is part of what determines this one.

That ordering changes everything downstream. Serial dependence means consecutive observations carry overlapping information, so the effective sample is smaller than the count of data points suggests and the standard error formulas of cross-sectional OLS understate uncertainty. Worse, many economic series trend: they wander upward over decades with no fixed mean to return to. Run an ordinary regression on two such series and you can get a high $R^2$ and a decisive $t$-statistic linking quantities that have nothing to do with each other. The cross-sectional toolkit does not merely lose efficiency on ordered data; on trending data it manufactures relationships that are not there.

A different apparatus is needed, organized around one question (what kind of process generated this series?) and a second, multivariate question once several series are in play: which shocks moved the system, and can we even say? The answer to the first builds from stationarity through ARMA models to unit-root tests. The answer to the second builds from the vector autoregression to its structural interpretation, where the chapter meets its sharpest problem: the data describe how variables move together but never say which one moved first.

Intuition

Why it matters: Time-ordered data remember their past, and methods that ignore the ordering will mislead you. A regression that treats this year's GDP as an independent draw is reading the same slow drift over and over and mistaking it for evidence. The whole chapter is about taking the order seriously: figuring out whether a series settles back toward a center or wanders off forever, and figuring out, when several series move together, which one is doing the pushing.

By the end of this chapter, you will be able to:
  1. Define covariance stationarity and explain why it is the precondition for standard time-series inference
  2. Identify AR and MA processes from their autocorrelation and partial-autocorrelation signatures and forecast from an ARMA model
  3. Distinguish difference-stationary from trend-stationary series, recognize a unit root, and explain the spurious-regression trap
  4. Specify a reduced-form VAR, read a Granger-causality result, and interpret an impulse response and a variance decomposition
  5. Explain why a reduced-form VAR is not structural and how an identifying restriction (Cholesky ordering, sign restrictions) changes the answer
  6. State what cointegration is, write the error-correction model that corresponds to a cointegrating relationship, and interpret the speed of adjustment
  7. Recognize volatility clustering and explain how ARCH/GARCH models make conditional variance forecastable

Prerequisites: the identification frame, OLS, and the serial-correlation note of Chapter 10 (Econometrics Foundations); linear algebra (the VAR is matrix-valued); basic probability. The stochastic-process intuition is built here, not assumed.

The methods below were not always part of economics. Through the 1970s and 1980s the discipline learned to take the time dimension seriously, largely through the work of Clive Granger and Robert Engle on long-run relationships and volatility and Christopher Sims on multivariate dynamics; their program reshaped how macroeconomists handle data. That intellectual lineage, who arrived at these ideas and against what, belongs to the history-of-economic-thought volume rather than here, and is traced in its chapter on the information-economics and game-theory era.

23.1 Stationarity and the Building Blocks

Before any model can be fit, the series has to be the kind of object a model can describe. The organizing idea is stationarity. A stationary process is one whose probabilistic character is stable over time: the same mechanism is generating the data at the start of the sample as at the end. If that is true, the past is informative about the future in a way that does not itself shift; if it is not, the parameters being estimated are aimed at a moving target.

Covariance (weak) stationarity. A process $\{y_t\}$ is covariance stationary if its mean is constant, its variance is constant and finite, and the covariance between $y_t$ and $y_{t-k}$ depends only on the lag $k$ and not on the date $t$. The first two moments and the autocovariance structure do not drift over time.
$$E[y_t]=\mu, \qquad \mathrm{Var}(y_t)=\sigma^2, \qquad \mathrm{Cov}(y_t, y_{t-k})=\gamma_k$$ (Eq. 23.1)
Strict stationarity. The stronger requirement that the entire joint distribution of any block of observations is invariant to shifts in time: $(y_t, \ldots, y_{t+m})$ has the same distribution as $(y_{t+h}, \ldots, y_{t+m+h})$ for all $h$. Strict stationarity implies covariance stationarity when the second moments exist; the converse fails. Time-series econometrics works almost entirely with the weak version.
White noise. A sequence $\{\varepsilon_t\}$ with zero mean, constant variance $\sigma^2$, and no autocorrelation at any nonzero lag ($\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-k})=0$ for $k\neq 0$). White noise is the raw, unpredictable innovation, the surprise component out of which every process in this chapter is built.
Lag operator. The operator $L$ that shifts a series back one period: $L y_t = y_{t-1}$, and $L^k y_t = y_{t-k}$. It is the algebra that makes the rest of the chapter compact: a polynomial in $L$ stands in for a whole pattern of lags, and the roots of that polynomial decide whether a process is stationary.

The lag operator turns dynamics into algebra. A polynomial $\phi(L)=1-\phi_1 L-\cdots-\phi_p L^p$ acting on $y_t$ encodes a whole autoregression in one symbol, and the process is stationary exactly when the roots of $\phi(z)=0$ lie outside the unit circle. When a root sits on the unit circle, stationarity fails; that is the unit root of §23.3.
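The root condition can be checked numerically. The sketch below is illustrative only: the helper names `ar_roots` and `is_stationary` and the example coefficients are assumptions, not part of the chapter. It builds $\phi(z)$ from the AR coefficients, finds its roots with NumPy, and tests whether all of them lie outside the unit circle.

```python
import numpy as np

def ar_roots(phi):
    """Roots of phi(z) = 1 - phi_1 z - ... - phi_p z^p for phi = [phi_1, ..., phi_p]."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant.
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    return np.roots(coeffs)

def is_stationary(phi, tol=1e-8):
    """True when every root lies strictly outside the unit circle."""
    return bool(np.all(np.abs(ar_roots(phi)) > 1.0 + tol))

# Hypothetical coefficient sets: two stationary cases and two with a unit root.
for phi in ([0.5], [0.5, 0.3], [1.0], [0.5, 0.5]):
    moduli = np.round(np.abs(ar_roots(phi)), 3)
    print(phi, "root moduli:", moduli, "stationary:", is_stationary(phi))
```

The small tolerance guards against a root that is exactly one in theory being computed as marginally above one in floating point.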

Wold decomposition. Any covariance-stationary process can be written as the sum of a deterministic component and an infinite moving average of white-noise innovations: a weighted sum of current and past shocks. It is the structural justification for modeling stationary series as ARMA processes: every such series is, at bottom, a filtered stream of surprises.
$$y_t = \sum_{j=0}^{\infty} \psi_j \, \varepsilon_{t-j} + \kappa_t, \qquad \sum_{j=0}^{\infty} \psi_j^2 < \infty$$ (Eq. 23.2)

Here $\{\varepsilon_t\}$ is white noise, the $\psi_j$ are square-summable weights with $\psi_0=1$, and $\kappa_t$ is the part of $y_t$ that is perfectly predictable from its own past (a deterministic trend or seasonal). The square-summability of the weights is what keeps the variance finite, and it is the formal content of "the influence of an old shock fades."
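As a worked illustration (a standard special case, chosen here for concreteness), take a zero-mean AR(1) with $|\phi|<1$. Substituting repeatedly for the lagged value gives its Wold form directly:

$$y_t = \phi\, y_{t-1} + \varepsilon_t = \sum_{j=0}^{\infty} \phi^{j}\, \varepsilon_{t-j}, \qquad \psi_j = \phi^{j}, \qquad \sum_{j=0}^{\infty} \psi_j^2 = \frac{1}{1-\phi^2}$$

With $\phi = 0.9$, a shock still carries weight $0.9^{10} \approx 0.35$ after ten periods, and $\mathrm{Var}(y_t) = \sigma^2/(1-0.81) \approx 5.26\,\sigma^2$. At $\phi = 1$ every $\psi_j$ equals one, the sum diverges, and the variance is no longer finite.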

Intuition

Why it matters: A stationary process is one whose statistical character doesn't drift. Slide a window along the series and the picture inside it looks the same. The mean you would estimate from the first decade matches the mean from the last; the size of the typical wiggle is the same throughout; how strongly today relates to last month doesn't depend on which month. White noise is the purest case: pure surprise, no memory. And Wold's result says something reassuring about everything else, because every well-behaved series is just a stream of those surprises, with old ones fading as new ones arrive. The figure below lets you watch a series stay tethered to its mean, and lets you push it toward the edge where the tether snaps.

Figure 23.1. Stationary-vs-non-stationary process explorer. A simulated AR(1) path $y_t = \delta + \phi y_{t-1} + \varepsilon_t$ with its sample mean. At low $\phi$ the series hugs its mean (stationary). As $\phi \to 1$ it wanders farther and returns more slowly. At $\phi = 1$ it becomes a random walk, with no mean reversion and permanent shocks (the §23.3 boundary case). Drag the slider across $\phi = 1$; toggle drift; reseed to confirm the behavior is generic.

Once a series is recognized as stationary, the next question is how to model it. Two primitive ways a series can remember its past combine into a single family.

23.2 ARMA Processes

A stationary series remembers, and there are two clean ways to write down what it remembers. It can carry forward its own past values, so that high output last quarter raises expected output this quarter. Or it can carry forward the echoes of past surprises, a shock to the system that takes a few periods to work through. The first is autoregression; the second is moving average. Combine them and you have the workhorse model of univariate time series.

Autoregressive (AR) process. A process in which the current value is a linear function of its own past values plus a white-noise innovation. An AR(p) uses $p$ lags. The series remembers by carrying its own level forward.
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$ (Eq. 23.3)
Moving-average (MA) process. A process in which the current value is a linear function of the current and past white-noise shocks. An MA(q) uses $q$ lagged shocks. The series remembers by carrying the echoes of past surprises, each fading after $q$ periods.
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$ (Eq. 23.4)
ARMA process. A process combining both representations, $p$ autoregressive lags and $q$ moving-average terms, written compactly in the lag operator as $\phi(L)y_t = \theta(L)\varepsilon_t$. ARMA models are parsimonious: a low-order ARMA can capture autocorrelation that would need many pure-AR or pure-MA terms.
$$\phi(L)\, y_t = \theta(L)\, \varepsilon_t, \qquad \phi(L)=1-\phi_1 L-\cdots-\phi_p L^p, \quad \theta(L)=1+\theta_1 L+\cdots+\theta_q L^q$$ (Eq. 23.5)
Autocorrelation function (ACF). The correlation of the series with its own lags, $\rho_k = \gamma_k/\gamma_0$, plotted against $k$. Its shape diagnoses the moving-average order.
Partial autocorrelation function (PACF). The correlation of the series with its lag $k$ after removing the linear influence of all the intervening lags. Its shape diagnoses the autoregressive order.

The ACF and PACF are the diagnostic instruments of the Box-Jenkins model-building approach, and the rule that makes them useful is a contrast in how they fall off. A pure AR process has a PACF that cuts off sharply after lag $p$ (once you control for the first $p$ lags, nothing further adds predictive content) while its ACF decays gradually. A pure MA process is the mirror image: its ACF cuts off after lag $q$, while its PACF decays. The cut-off-versus-decay pattern is how a practitioner reads the order of a model off the data.
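The cut-off-versus-decay contrast can be reproduced on simulated data. The following is a minimal sketch using only NumPy; the function names, the coefficient 0.7, the seed, and the sample size are illustrative assumptions. The PACF at lag $k$ is computed as the last coefficient in an OLS regression of $y_t$ on its first $k$ lags, which is one common way to define it.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations rho_0..rho_nlags."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.dot(y, y)
    return np.array([1.0] + [np.dot(y[k:], y[:-k]) / denom
                             for k in range(1, nlags + 1)])

def sample_pacf(y, nlags):
    """PACF at lag k = coefficient on lag k in an OLS regression on lags 1..k."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(y)
    out = [1.0]
    for k in range(1, nlags + 1):
        X = np.column_stack([y[k - j:n - j] for j in range(1, k + 1)])
        beta, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(beta[-1])
    return np.array(out)

rng = np.random.default_rng(0)
n = 5000
eps = rng.standard_normal(n)

# Pure AR(1): y_t = 0.7 y_{t-1} + eps_t
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + eps[t]

# Pure MA(1): y_t = eps_t + 0.7 eps_{t-1}
ma = eps.copy()
ma[1:] += 0.7 * eps[:-1]

acf_ar, pacf_ar = sample_acf(ar, 4), sample_pacf(ar, 4)
acf_ma, pacf_ma = sample_acf(ma, 4), sample_pacf(ma, 4)
print("AR(1) ACF :", np.round(acf_ar, 2))
print("AR(1) PACF:", np.round(pacf_ar, 2))
print("MA(1) ACF :", np.round(acf_ma, 2))
print("MA(1) PACF:", np.round(pacf_ma, 2))
```

In the AR(1) case the PACF should be near zero from lag 2 on while the ACF shrinks geometrically; in the MA(1) case the roles reverse, up to sampling noise.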

Invertibility. The condition under which an MA process can be rewritten as a convergent infinite autoregression, which holds when the roots of $\theta(z)=0$ lie outside the unit circle. Invertibility is what makes the MA representation unique and the parameters identifiable from the data.

Forecasting from an ARMA model follows from the lag-operator algebra. The one-step-ahead forecast sets future innovations to their expected value of zero and rolls the estimated coefficients forward; the multi-step forecast iterates that recursion, and because the process is stationary the forecast converges to the unconditional mean $\mu$ as the horizon grows, with the forecast-error variance rising to the unconditional variance.
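The recursion described above can be sketched for the simplest case, an AR(1) with intercept $c$. The function name `ar1_forecast` and the parameter values are hypothetical, and the coefficients are treated as known rather than estimated, so parameter uncertainty is ignored.

```python
import numpy as np

def ar1_forecast(y_last, c, phi, sigma2, horizon):
    """Iterated forecasts and forecast-error variances for y_t = c + phi*y_{t-1} + eps_t."""
    mu = c / (1.0 - phi)              # unconditional mean (requires |phi| < 1)
    forecasts, variances = [], []
    f, v = y_last, 0.0
    for _ in range(horizon):
        f = c + phi * f               # future innovation replaced by its mean, zero
        v = sigma2 + phi ** 2 * v     # each step adds one shock, older ones discounted
        forecasts.append(f)
        variances.append(v)
    return mu, np.array(forecasts), np.array(variances)

# Illustrative numbers: start above the mean and watch the forecast revert.
mu, fc, var = ar1_forecast(y_last=8.0, c=1.0, phi=0.8, sigma2=1.0, horizon=50)
print("unconditional mean:", mu)
print("first forecasts   :", np.round(fc[:5], 3))
print("last forecast     :", round(fc[-1], 4))
print("error variance    :", np.round(var[[0, 4, -1]], 3),
      "limit:", round(1.0 / (1 - 0.8 ** 2), 3))
```

The gap between forecast and mean shrinks by the factor $\phi$ each step, which is the sense in which the model tells you how fast the forecast reaches the long-run average.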

Intuition

Why it matters: An AR series carries its own past forward: today's level is a faded copy of yesterday's. An MA series carries the echoes of past surprises, a shock that rings for a few periods and then is gone. They leave different fingerprints, and the two diagnostic plots read those fingerprints: for an AR process the partial-autocorrelation plot drops to zero abruptly while the plain autocorrelation tails off; for an MA process it is the other way around. You are not learning a formula so much as learning to recognize two shapes. Forecasting then needs no new idea. With no fresh surprise to expect, the best guess for the far future is just the long-run average, and the model tells you how fast you get there. Set the sliders below to a pure AR and watch one plot snap to zero; switch to a pure MA and watch the other one do it.

Figure 23.2. ARMA(1,1) simulator with sample ACF and PACF. Set a pure AR(1) ($\theta_1=0$): the PACF spikes at lag 1 then cuts off, while the ACF decays geometrically. Set a pure MA(1) ($\phi_1=0$): the ACF spikes at lag 1 then cuts off, while the PACF decays. The cut-off-versus-decay contrast is the Box-Jenkins identification rule. Drag the sliders.

All of this assumed the series was stationary. The diagnostic plots, the forecasts, the very meaning of the coefficients: each rests on a mean to return to and a finite variance. What happens when the series has neither?

23.3 Unit Roots and Spurious Regression

Push the AR(1) coefficient all the way to one and the machinery of §23.2 breaks. The series no longer has a mean to return to. Figure 23.1, dragged across $\phi = 1$, showed it directly: below one the series is tethered and a shock decays; at exactly one the tether snaps and a shock becomes permanent. That boundary case is the random walk, the most important non-stationary process in economics because so many macro series behave like it.

Random walk. An AR(1) with unit coefficient, $y_t = y_{t-1} + \varepsilon_t$, in which each period's value is last period's value plus a fresh shock. With a constant added, $y_t = \delta + y_{t-1} + \varepsilon_t$, it is a random walk with drift, which trends upward (or downward) while still wandering. It is the canonical non-stationary process.
$$y_t = y_{t-1} + \varepsilon_t \qquad (\text{with drift:} \;\; y_t = \delta + y_{t-1} + \varepsilon_t)$$ (Eq. 23.6)
Unit root. A root of the autoregressive characteristic polynomial $\phi(z)=0$ lying on the unit circle ($z=1$). A unit root is the source of this kind of non-stationarity: it makes shocks accumulate rather than decay, so the variance grows without bound over time.
Order of integration, I(d). The number of times a series must be differenced to render it stationary. A stationary series is I(0); a random walk is I(1) because its first difference $\Delta y_t = y_t - y_{t-1} = \varepsilon_t$ is white noise; a series needing two differences is I(2).
Difference-stationary vs. trend-stationary. Two series can both trend upward yet require opposite treatments. A difference-stationary (stochastic-trend) series, such as a random walk with drift, is made stationary by differencing; a trend-stationary (deterministic-trend) series, stationary fluctuations around a fixed line, is made stationary by subtracting the fitted trend. Mistaking one for the other induces either over-differencing or a left-behind unit root; the distinction is not cosmetic.

The reason a unit root cannot be ignored is the trap it sets for ordinary regression. Take two random walks generated completely independently of each other, with no causal link, no common driver, nothing. Regress one on the other and, far more often than chance should allow, you will find a large $R^2$ and a $t$-statistic that comfortably clears any conventional threshold. The regression announces a strong relationship between series that share nothing. This is the spurious-regression result that Granger and Newbold warned of in 1974, and it overturns the instinct that high $R^2$ plus a significant $t$ means something real.
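A small Monte Carlo experiment makes the claim concrete. This is a sketch under stated assumptions: Gaussian shocks, 200 observations, 500 replications, and a nominal 5% two-sided test using 1.96; the helper names are invented for illustration. It counts how often OLS declares the slope significant when the two series are independent by construction, first in levels and then in first differences.

```python
import numpy as np

def ols_t_and_r2(y, x):
    """Slope t-statistic and R^2 from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    t = beta[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return t, r2

def rejection_rate(n_obs, n_sims, difference, seed=0):
    """Share of replications with |t| > 1.96 for two independent random walks."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        y = np.cumsum(rng.standard_normal(n_obs))
        x = np.cumsum(rng.standard_normal(n_obs))
        if difference:
            y, x = np.diff(y), np.diff(x)
        t, _ = ols_t_and_r2(y, x)
        hits += int(abs(t) > 1.96)
    return hits / n_sims

levels = rejection_rate(200, 500, difference=False)
diffs = rejection_rate(200, 500, difference=True)
print("rejection rate in levels     :", levels)
print("rejection rate in differences:", diffs)
```

If the usual inference were valid, both rates would sit near 0.05. In levels the rate should come out far higher; in differences it should return to roughly the nominal size.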

Spurious regression. The phenomenon (Granger and Newbold, 1974) whereby regressing two unrelated I(1) series on each other produces an inflated $R^2$ and a spuriously significant $t$-statistic. The standard inference is invalid because the regression residuals are themselves non-stationary, so the usual distributional theory does not apply.
Dickey-Fuller / Augmented Dickey-Fuller (ADF) test. A test of the null hypothesis that a series contains a unit root. The augmented version regresses the first difference on the lagged level plus enough lagged differences to whiten the residuals; the coefficient on the lagged level being zero is the unit-root null. The test statistic does not follow the usual $t$-distribution; under a unit root the relevant distribution is non-standard, so special Dickey-Fuller critical values must be used.
$$\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{k}\delta_i\, \Delta y_{t-i} + \varepsilon_t, \qquad H_0:\gamma = 0 \;\; (\text{unit root})$$ (Eq. 23.7)

Under $H_0:\gamma=0$ the level $y_{t-1}$ drops out and $\Delta y_t$ is driven only by its own lagged differences and noise: a unit root. Rejecting $H_0$ in favor of $\gamma<0$ means a deviation from the level is partly pulled back, i.e. the series is stationary. The lagged differences are the "augmentation" that absorbs serial correlation in $\varepsilon_t$ so the test is valid; the number of lags $k$ is chosen by an information criterion.
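Equation 23.7 is an ordinary regression, so the test statistic itself can be computed with plain OLS; only the critical values are special. The sketch below is illustrative: the function name `adf_stat`, the fixed choice $k=1$ (rather than an information criterion), and the simulated series are assumptions. The value $-2.86$ is the commonly tabulated asymptotic 5% Dickey-Fuller critical value for the constant-only case.

```python
import numpy as np

def adf_stat(y, k=1):
    """t-ratio on gamma in Eq. 23.7 (constant, no trend) with k lagged differences."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # dy[t] = y[t+1] - y[t]
    lhs = dy[k:]                                      # Delta y_t
    cols = [np.ones(len(lhs)), y[k:-1]]               # constant and lagged level
    cols += [dy[k - i:len(dy) - i] for i in range(1, k + 1)]  # lagged differences
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ beta
    s2 = resid @ resid / (len(lhs) - X.shape[1])
    se_gamma = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_gamma

rng = np.random.default_rng(1)
eps = rng.standard_normal(500)

walk = np.cumsum(eps)                 # unit root by construction
ar = np.zeros(500)                    # stationary AR(1) with phi = 0.5
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + eps[t]

CRIT_5PCT = -2.86                     # asymptotic DF value, constant-only case
for name, series in (("random walk", walk), ("AR(1), phi=0.5", ar)):
    stat = adf_stat(series)
    print(f"{name:15s} ADF stat = {stat:6.2f}  reject unit root: {stat < CRIT_5PCT}")
```

The statistic is compared with the Dickey-Fuller value, not with the usual $-1.645$ or $-1.96$; a true random walk will still reject in about 5% of samples by construction of the test.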

Intuition

Why it matters: A random walk has no gravity. A stationary series is held on a tether, so pull it away from its center and it springs back; a random walk has nothing pulling it home, so a shock today never washes out and the series just wanders wherever the accumulated shocks take it. That is what a unit root means: permanence instead of decay. And here is the trap. Two wanderers, set loose independently, will both drift somewhere over a long sample, and any two things that drift will look correlated, because a line through the cloud always slopes one way or the other. Your regression will report a strong relationship between two series that have never met. The fix is to ask the right question first, whether this series is tethered or wandering, using a formal unit-root test, and to study wanderers in their changes (differences) rather than their levels. The figure below makes the trap visceral: regenerate the two independent walks and watch a "significant" relationship keep reappearing out of nothing.

Figure 23.3. Spurious-regression demonstrator. Two random walks generated with independent shocks, plotted together; the readout reports the OLS regression of one on the other with its $R^2$ and $t$-statistic. Reseed and watch high $R^2$ and "significant" $t$ recur across independent draws; the trap is systematic, not a fluke. Switch the regression to first differences and the apparent relationship collapses. Reseed; toggle levels vs. differences.

A standing example. Is US real GDP a random walk? Fit an AR(1) to its logarithm and the estimated coefficient comes back very close to one; run an ADF test and it typically fails to reject the unit-root null. The practical upshot is that output is better modeled in growth rates (first differences) than in levels. The historical episode this draws on is the province of the economic-history volume, but the apparatus is the point here.

So far one series at a time. Macroeconomic questions are usually about several series at once, such as output and inflation, or the interest rate and the exchange rate, moving together. The single-equation tools generalize into a system.

23.4 Vector Autoregressions (VAR)

Most interesting macroeconomic questions involve more than one variable. Inflation and the policy rate respond to each other; output, prices, and money move as a system. The vector autoregression is the natural generalization of the AR model to this setting: stack the variables into a vector and let each variable depend on the recent past of all of them.

Vector autoregression (VAR). A system in which a vector of variables $\mathbf{y}_t$ is regressed on $p$ lags of itself, with a vector of innovations $\mathbf{u}_t$. Each equation looks like an ordinary regression of one variable on the lags of every variable, so the whole system can be estimated equation by equation by OLS.
$$\mathbf{y}_t = \mathbf{c} + A_1\mathbf{y}_{t-1} + A_2\mathbf{y}_{t-2} + \cdots + A_p\mathbf{y}_{t-p} + \mathbf{u}_t, \qquad E[\mathbf{u}_t\mathbf{u}_t'] = \Sigma$$ (Eq. 23.8)

Because every equation has the same regressors, the lags of all variables, OLS applied equation by equation is efficient, and no system estimator is required for the reduced form. The estimated $A_i$ matrices and the residual covariance $\Sigma$ summarize the dynamics.
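Equation-by-equation OLS is short enough to write out. The following sketch simulates a two-variable VAR(1) and recovers its parameters; the coefficient matrix, covariance, seed, and the function names `simulate_var1` and `fit_var1` are all illustrative assumptions. Passing a matrix of left-hand-side variables to `np.linalg.lstsq` solves each column separately, which is exactly OLS applied to each equation in turn.

```python
import numpy as np

def simulate_var1(A, c, Sigma, n, seed=0):
    """Simulate y_t = c + A y_{t-1} + u_t with Var(u_t) = Sigma."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(Sigma)          # used only to draw correlated residuals
    y = np.zeros((n, len(c)))
    for t in range(1, n):
        y[t] = c + A @ y[t - 1] + chol @ rng.standard_normal(len(c))
    return y

def fit_var1(y):
    """OLS for every equation at once: same regressors, one column per equation."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ B
    Sigma_hat = resid.T @ resid / (len(resid) - X.shape[1])
    c_hat, A_hat = B[0], B[1:].T              # A_hat[i, j]: effect of lagged y_j on y_i
    return c_hat, A_hat, Sigma_hat

# Hypothetical stationary system (eigenvalues of A are 0.7 and 0.4).
A = np.array([[0.5, 0.1],
              [0.2, 0.6]])
c = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

y = simulate_var1(A, c, Sigma, n=5000)
c_hat, A_hat, Sigma_hat = fit_var1(y)
print("A_hat =\n", np.round(A_hat, 3))
print("Sigma_hat =\n", np.round(Sigma_hat, 3))
```

The off-diagonal entry of `Sigma_hat` is the cross-equation residual correlation that the structural discussion turns on.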

Granger causality. $X$ Granger-causes $Y$ if the past of $X$ improves the forecast of $Y$ beyond what the past of $Y$ alone provides. The label is a warning as much as a definition: Granger causality is predictive, not structural. It says past $X$ helps predict $Y$, not that $X$ causes $Y$ in any economic sense.
Impulse response function (VAR). The dynamic path traced out by each variable in the system following a one-time shock, plotted against the horizon. In the reduced-form VAR the impulse response describes how the system moves after a shock; but a shock to which variable, and isolated how, is exactly the question §23.5 must confront.
Forecast-error variance decomposition. The share of the forecast-error variance of each variable, at a given horizon, attributable to each shock in the system. It answers "how much of the unpredictable movement in output comes from policy shocks versus its own shocks?" and, like the impulse response, depends on how the shocks are defined.

A reduced-form VAR delivers two things almost for free. It forecasts the whole system jointly, often better than a structural model because it imposes few restrictions; and it tests Granger causality, telling you which series carry predictive content for which others. The impulse response and the variance decomposition seem to promise a third thing: a story about what happens after a shock. But there is a catch the prose has been signposting. The innovations $\mathbf{u}_t$ are correlated across equations. A movement in the inflation residual tends to come alongside a movement in the rate residual. So when the system "responds to a shock," whose shock was it? The reduced form cannot say.
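The Granger-causality test mentioned above amounts to comparing two regressions: one of a variable on its own lags, and one that adds the lags of the other variable. A minimal sketch follows, with a hypothetical data-generating process in which `x` helps predict `y` but not the reverse; the function name and all numbers are assumptions. It reports the F-statistic only; with one restriction and a large sample, the 5% critical value is approximately 3.84.

```python
import numpy as np

def granger_f(target, other, p=1):
    """F-statistic for H0: p lags of `other` add nothing to an AR(p) for `target`."""
    n = len(target)
    lhs = target[p:]
    own = [target[p - j:n - j] for j in range(1, p + 1)]
    cross = [other[p - j:n - j] for j in range(1, p + 1)]
    X_r = np.column_stack([np.ones(n - p)] + own)            # restricted model
    X_u = np.column_stack([np.ones(n - p)] + own + cross)    # unrestricted model

    def ssr(X):
        beta, *_ = np.linalg.lstsq(X, lhs, rcond=None)
        r = lhs - X @ beta
        return r @ r

    ssr_r, ssr_u = ssr(X_r), ssr(X_u)
    dof = (n - p) - X_u.shape[1]
    return ((ssr_r - ssr_u) / p) / (ssr_u / dof)

rng = np.random.default_rng(2)
n = 1000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()                    # x ignores past y
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()   # y uses past x

f_x_to_y = granger_f(y, x)
f_y_to_x = granger_f(x, y)
print("F, x -> y:", round(f_x_to_y, 2))
print("F, y -> x:", round(f_y_to_x, 2))
```

A large statistic says only that past `x` carries forecasting information for `y`; it says nothing about why.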

Intuition

Why it matters: A VAR lets every variable depend on the recent past of all the others. That buys two things immediately: a joint forecast of the whole system, and a test of which series helps predict which, called "Granger causality." But notice the careful word: helps predict is not causes. Knowing that ice-cream sales help predict drownings does not mean ice cream causes drowning; summer drives both. A VAR can tell you the policy rate helps predict inflation, and that is genuinely useful, but it stops short of saying the rate moved inflation. The reason it stops short is that the surprises in the different equations arrive together, tangled with one another, so "the system's response to a shock" is ambiguous until you untangle which shock you mean. That untangling is the next section, and it is the hardest problem in the chapter.

Figure 23.4. VAR / SVAR impulse-response explorer (the chapter's signature figure). A two-variable system: inflation and the policy rate, the monetary VAR of the standing example. Pick which variable is shocked and the panel shows both variables' dynamic responses. The Granger-causality and variance-decomposition readouts update. In §23.5, flip the Cholesky ordering and watch the impulse responses change: same data, a different identifying assumption, a different economic story. Choose the shock; set the horizon; toggle the ordering.

The impulse responses you just generated assumed an ordering, a choice about which variable can move the other within the period. That choice was made quietly. Bringing it into the open is the structural step, and it is where the data stop being able to settle the argument.

23.5 Structural VAR and Identification

The reduced-form residuals are correlated, and that correlation is the whole problem. An economic shock, a surprise tightening of monetary policy, say, should be a single disturbance with a clear interpretation. But the reduced-form innovation in the rate equation is contaminated by whatever moved inflation at the same instant, and vice versa. The residuals are mixtures. To recover the underlying structural shocks, you have to specify how the mixtures were formed, and the data do not contain that information.

Structural VAR (SVAR). A VAR whose innovations are given an economic interpretation as orthogonal (uncorrelated, unit-variance) structural shocks. The reduced-form residuals are written as a linear combination of these structural shocks, $\mathbf{u}_t = B\boldsymbol{\varepsilon}_t$, and the task of identification is to pin down the matrix $B$.
$$\mathbf{u}_t = B\,\boldsymbol{\varepsilon}_t, \qquad E[\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t'] = I, \qquad \text{so} \;\; \Sigma = BB'$$ (Eq. 23.9)
Identification (time-series sense). The problem of recovering the structural shocks $\boldsymbol{\varepsilon}_t$ from the reduced-form residuals $\mathbf{u}_t$, equivalently, pinning down $B$. The estimated covariance $\Sigma$ gives only $n(n+1)/2$ equations, but $B$ has $n^2$ unknowns; the gap, $n(n-1)/2$ restrictions, must come from outside the data.

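For a two-variable system, $\Sigma=BB'$ supplies three equations (two variances and one covariance) for the four elements of $B$. One restriction is missing. Imposing it is identification, and the impulse responses at horizon $h$ inherit whatever was imposed. For a VAR with a single lag and coefficient matrix $A = A_1$ they are given by $\Theta_h = A^h B$; with more lags the same recursion is applied to the companion form of the system.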

$$\Theta_h = A^h B \qquad (\text{structural impulse response at horizon } h)$$ (Eq. 23.10)
Cholesky / recursive identification. A scheme that makes $B$ lower-triangular, equivalent to ordering the variables and assuming the first cannot respond contemporaneously to shocks in those ordered after it. In the monetary VAR, ordering the rate first says inflation reacts to a rate shock within the period but the rate does not react to an inflation shock within the period. The ordering is the assumption.
Sign restrictions. An identification approach that, instead of zero restrictions, requires the structural impulse responses to have certain signs over some horizon (a monetary-tightening shock raises the rate and lowers inflation, for instance). Rather than a single answer it delivers a set of impulse responses consistent with the imposed signs, an honest representation of what the assumptions leave undetermined.

Three families of restriction are in common use, and Figure 23.4 lets you feel the consequence of the most common one. Toggle the Cholesky ordering between "rate first" and "inflation first" and the impulse responses visibly change shape. Nothing in the data changed: the same residuals, the same estimated dynamics. What changed is the assumption about which variable can move the other within the period, and that assumption rewrote the economic story. Short-run zero restrictions like Cholesky are one option; long-run restrictions, which assume a shock has no permanent effect on some variable (Blanchard and Quah's decomposition of demand and supply shocks is the standard example), are another, named here but not derived; sign restrictions are the modern alternative that reports the identified set rather than a single line.
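The effect of the ordering can be seen in a few lines. The sketch below assumes a VAR(1) with made-up values for $A$ and $\Sigma$ and variables indexed as 0 = inflation, 1 = policy rate; the function name `structural_irf` is hypothetical. It builds $B$ from the Cholesky factor of the reordered covariance matrix, maps it back to the original variable labels, and computes $\Theta_h = A^h B$ under both orderings.

```python
import numpy as np

def structural_irf(A, Sigma, order, horizon):
    """Theta_h = A^h B for h = 0..horizon, with B recursive under `order`.

    Theta[h, i, j] is the response of variable i at horizon h to structural shock j.
    """
    order = list(order)
    P = np.eye(len(order))[order]             # permutation: (P @ y)[i] = y[order[i]]
    L = np.linalg.cholesky(P @ Sigma @ P.T)   # lower-triangular in the reordered system
    B = P.T @ L @ P                           # back to the original labels; B @ B.T = Sigma
    Theta = [B]
    for _ in range(horizon):
        Theta.append(A @ Theta[-1])
    return np.array(Theta)

# Illustrative reduced-form estimates (not from real data).
A = np.array([[0.7, -0.2],
              [0.1,  0.8]])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

rate_first = structural_irf(A, Sigma, order=[1, 0], horizon=8)
infl_first = structural_irf(A, Sigma, order=[0, 1], horizon=8)

print("Impact matrix B, rate ordered first:\n", np.round(rate_first[0], 3))
print("Impact matrix B, inflation ordered first:\n", np.round(infl_first[0], 3))
print("Inflation response to a rate shock, rate first     :", np.round(rate_first[:4, 0, 1], 3))
print("Inflation response to a rate shock, inflation first:", np.round(infl_first[:4, 0, 1], 3))
```

Both impact matrices reproduce the same $\Sigma$, so the data cannot distinguish them; the only difference is which off-diagonal element is forced to zero on impact.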

This is where the chapter's framing tension lives. Christopher Sims, who introduced the VAR to macroeconomics, argued that the elaborate identifying restrictions of older structural models were "incredible," assumptions imposed for tractability rather than belief, and that an honest empirical macro should let the data speak with as few restrictions as possible. The opposing view holds that without some economic structure the impulse responses are uninterpretable, so the right move is to impose restrictions you can defend and be explicit about them. Both positions are serious, and the toggle you just used is exactly what they disagree about: how much identification the data can honestly support, and what to do about the gap.

Which side has the better of the argument is not settled here. The apparatus (what identification is, why the choice matters, what each scheme buys and gives up) is the chapter's job, and it is now in hand. The verdict over whether atheoretical VARs or structural identification won the field is argued at length in the walkthrough on econometric-methodology credibility, where the methodological stakes get the space they need.

Intuition

Why it matters: The data hand you correlated shocks: the surprises in inflation and in the interest rate arrive tangled together. To tell an economic story you have to say which way the arrow points within the same period: did the rate move first and inflation follow, or the reverse? Economics has to supply that arrow, because the data cannot. They only show the two moving together, not the order. And here is the uncomfortable part: that single added assumption, the one the data can't check, decides the whole answer. In the figure, flipping the ordering kept every number in the dataset identical and still flipped the story. Sims's worry was that economists were smuggling in arrows they couldn't justify and calling the result a finding. The other camp says you can't avoid choosing an arrow, so choose one you can defend and say so out loud. The honest takeaway is the tension itself, which is why the verdict is argued in a walkthrough, not pronounced here.

Did the data ever speak for themselves?

Apparatus stop. The VAR/SVAR identification problem you just met is the technical core of the methodological-credibility debate; this is where the walkthrough gets its machinery.

What the model says

The Cholesky-ordering toggle in Figure 23.4 is the whole methodological argument in one gesture: identical data, a different identifying assumption, a different impulse response. Sims's atheoretical-VAR program was a bid to minimize such untestable assumptions; the structural-identification reply is that some assumption is unavoidable and the discipline should make defensible ones explicitly. The reduced-form VAR (§23.4), the $\mathbf{u}_t = B\boldsymbol{\varepsilon}_t$ mapping (§23.5), and the menu of restrictions (Cholesky, long-run, sign) are the apparatus on which that argument is conducted.

The judgment (at this level)

This chapter teaches what identification is and shows that the choice changes the answer; it deliberately stops short of declaring who won. The walkthrough takes the apparatus and argues the verdict: whether the credibility revolution vindicated the design-based skeptics, whether structural macro earned its restrictions back, and where time-series identification sits in that story.

The systems so far have been built from stationary or differenced series. But differencing can throw away something real: when two series wander together, taking their changes erases the relationship that ties them. The next section recovers it.

23.6 Cointegration and Error Correction

Recall the spurious-regression warning: two independent wanderers look related, and the fix is to study them in differences. But sometimes two wanderers are genuinely tethered, not to a fixed mean each, but to each other. Short-term and long-term interest rates drift over the decades, yet the spread between them stays within a band. Consumption and income each trend upward, yet the ratio is stable. Differencing such a pair would difference away exactly the long-run relationship that matters. Cointegration is the apparatus for the case where the relationship is real.

Cointegration. Two (or more) I(1) series are cointegrated if a linear combination of them is I(0), that is, stationary. The series wander individually, but the combination does not: it is pulled back whenever it strays. That stationary combination is a long-run equilibrium relationship.
$$z_t = y_t - \beta x_t \sim I(0) \quad \text{when} \quad y_t, x_t \sim I(1)$$ (Eq. 23.11)
Engle-Granger two-step. A procedure that (1) estimates the candidate cointegrating relationship by OLS on the levels and tests whether the residual $z_t$ is stationary, using critical values adjusted for the fact that $\beta$ was estimated (if the residual is stationary, the variables are cointegrated and the levels regression is not spurious), then (2) uses the lagged estimated residual as the error-correction term in a dynamic model.
Error-correction model (ECM). A model of the short-run dynamics that includes a term proportional to last period's deviation from the long-run relationship. The coefficient on that term, $\lambda$, is negative and measures how fast the system corrects: each period, a fraction $|\lambda|$ of the gap is closed.
$$\Delta y_t = \lambda\,(y_{t-1}-\beta x_{t-1}) + \text{(lagged } \Delta y, \Delta x \text{ terms)} + \varepsilon_t, \qquad \lambda<0$$ (Eq. 23.12)
Granger representation theorem. The result that cointegration and an error-correction representation are equivalent: if a set of I(1) series is cointegrated, then there exists an ECM describing their dynamics, and conversely. Stated here, not proved: the equivalence is what licenses estimating the ECM once cointegration is established.
Johansen procedure. A system maximum-likelihood method that tests how many independent cointegrating relationships exist among a set of series: the cointegrating rank. Where Engle-Granger handles a single relationship in two steps, Johansen handles several jointly. The rank is the number of long-run equilibrium relations tying the system together.

The error-correction coefficient is the most interpretable number in this section. If $\lambda = -0.2$, then a fifth of any deviation from the long-run relationship is undone each period; if $\lambda = 0$ and $x$ does not adjust either, there is no pull and the series are not cointegrated at all, so the "relationship" is the spurious kind from §23.3. The Granger representation theorem is what makes this respectable: it guarantees that whenever genuine cointegration is present, an error-correction model is the right way to write the dynamics, so estimating the ECM is not an ad hoc add-on but the implied form. The Johansen procedure extends the idea to systems with possibly several equilibrium relations, and its rank test answers how many; the mechanics of the eigenvalue computation are beyond the scope here, but the meaning of the rank, the count of long-run ties, is the part to carry forward.
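A short derivation makes the "fraction of the gap closed" reading concrete. It is a sketch under two simplifying assumptions that are not part of Eq. 23.12 as stated: the lagged-difference terms are dropped, and $x_t$ does not itself respond to the gap. Writing $z_t = y_t - \beta x_t$ and substituting Eq. 23.12:

$$\begin{aligned}
z_t - z_{t-1} &= \Delta y_t - \beta\,\Delta x_t = \lambda z_{t-1} + \varepsilon_t - \beta\,\Delta x_t,\\
z_t &= (1+\lambda)\,z_{t-1} + v_t, \qquad v_t = \varepsilon_t - \beta\,\Delta x_t .
\end{aligned}$$

Because $x_t$ is I(1), $\Delta x_t$ is stationary and so is $v_t$. The gap is therefore an AR(1) with coefficient $1+\lambda$: stationary when $-2<\lambda<0$, and a random walk when $\lambda=0$. The expected share of a gap still open after $h$ periods is $(1+\lambda)^h$, so at $\lambda=-0.2$ the half-life is

$$h_{1/2} = \frac{\ln 0.5}{\ln 0.8} \approx 3.1 \text{ periods.}$$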

Intuition

Why it matters: Two series can each wander forever and yet never drift apart from each other, like two dogs off the leash but tied together by a short rope. Each goes where it likes; the rope keeps the distance between them from growing. That shared distance is the long-run equilibrium, and the speed at which the rope yanks them back when they stray is the error-correction term. This is the exact opposite of the spurious-regression trap: there, two wanderers only looked related; here, they truly are, and the rope is real. Earlier the advice was to difference wanderers before studying them, but if you difference a tethered pair you cut the rope and lose the very thing you wanted to see. The figure below has the rope's tightness as a slider: at zero the two series float apart freely, and as you tighten it the gap between them settles into a stationary band even while each series keeps wandering.

Figure 23.5. Cointegration / error-correction explorer. Two I(1) series share a common stochastic trend; the second panel shows their spread $z_t = y_t - x_t$. At $|\lambda|=0$ the spread itself wanders (not cointegrated). As $|\lambda|$ rises, deviations from the long-run relationship are pulled back faster and the spread becomes visibly stationary. Drag the error-correction speed; reseed.
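The experiment in the figure can be reproduced offline and then run through the Engle-Granger two steps. What follows is a minimal sketch using only the standard library, not the figure's own code. The data-generating process (a pure random walk for $x$, an ECM with no lagged-difference terms for $y$), the parameter values, the seed, and all function names are assumptions chosen for illustration. The formal residual-based stationarity test is omitted because it needs special critical values; the sketch reports the AR(1) coefficient of the residual as an informal check instead.

```python
import random

def simulate_pair(n, beta, lam, seed):
    """x is a random walk; y error-corrects toward beta * x at speed lam."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        gap = y[-1] - beta * x[-1]            # last period's deviation
        x.append(x[-1] + rng.gauss(0, 1))
        y.append(y[-1] + lam * gap + rng.gauss(0, 1))
    return x, y

def ols(xs, ys):
    """Intercept and slope from a simple OLS regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def engle_granger(x, y):
    # Step 1: levels regression, then the residual z_t.
    a_hat, beta_hat = ols(x, y)
    z = [yi - a_hat - beta_hat * xi for xi, yi in zip(x, y)]
    # Informal persistence check on the residual (not a formal test).
    _, rho_hat = ols(z[:-1], z[1:])
    # Step 2: regress the change in y on the lagged residual.
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    _, lam_hat = ols(z[:-1], dy)
    return beta_hat, lam_hat, rho_hat

x, y = simulate_pair(n=5000, beta=1.5, lam=-0.2, seed=7)
beta_hat, lam_hat, rho_hat = engle_granger(x, y)
print(f"beta_hat={beta_hat:.3f}  lambda_hat={lam_hat:.3f}  rho_hat={rho_hat:.3f}")
```

With these settings the estimates should land near the true values ($\beta = 1.5$, $\lambda = -0.2$), and the residual's AR(1) coefficient should sit near $1+\lambda = 0.8$. Setting `lam=0.0` removes the pull, and the levels regression then becomes the spurious kind.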

Everything so far has modeled the conditional mean, where the series is expected to go. A final movement turns to the conditional variance, how uncertain that expectation is, and finds that uncertainty itself has a predictable rhythm.

23.7 ARCH / GARCH Volatility Modeling

Look at a long series of daily stock returns and one feature jumps out before any model is fit: the turbulence comes in clumps. Quiet stretches of small moves are interrupted by stormy stretches of large moves, and the storms persist for days or weeks before calm returns. The returns themselves are close to unpredictable, as efficient markets would lead you to expect, but their size is not. Large moves cluster with large moves. This is the stylized fact that the conditional-variance models were built to capture, and it sits orthogonal to everything in §23.1 through §23.6, which modeled the conditional mean.

Volatility clustering. The empirical regularity that large changes tend to be followed by large changes and small by small, regardless of sign. Periods of high and low volatility cluster in time: the defining stylized fact of financial-return series.
Conditional vs. unconditional variance. The conditional variance is the variance of the next observation given the past; it changes over time as new information arrives. The unconditional variance is the long-run average, a single number. ARCH/GARCH models make the conditional variance time-varying and forecastable while leaving the unconditional variance constant.
ARCH process. A model (Engle, 1982) in which the conditional variance is a function of past squared shocks: a large shock raises the variance of the next period, which is why turbulence persists. An ARCH(q) uses $q$ lagged squared shocks.
$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\,\varepsilon_{t-i}^2$$ (Eq. 23.13)
GARCH process. The generalization (Bollerslev, 1986) in which the conditional variance also depends on its own past values: past conditional variances as well as past squared shocks. A GARCH(1,1) captures the persistence of volatility with just three parameters ($\omega$, $\alpha_1$, $\beta_1$) and is the workhorse of empirical finance.
$$\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\,\varepsilon_{t-i}^2 + \sum_{j=1}^{p}\beta_j\,\sigma_{t-j}^2$$ (Eq. 23.14)

In the GARCH(1,1), the sum $\alpha_1+\beta_1$ measures persistence: it governs how slowly a volatility shock decays, and the unconditional variance $\omega/(1-\alpha_1-\beta_1)$ is finite only when $\alpha_1+\beta_1<1$. Estimated persistence on daily equity data is routinely above $0.95$, meaning a spike in volatility takes a long time to subside: the empirical content of "turbulence lingers."
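Both claims follow from one line of algebra, sketched here for the GARCH(1,1) case of Eq. 23.14. Taking unconditional expectations, with $E[\varepsilon_{t-1}^2] = E[\sigma_{t-1}^2] = \bar\sigma^2$:

$$\bar\sigma^2 = \omega + (\alpha_1+\beta_1)\,\bar\sigma^2 \quad\Longrightarrow\quad \bar\sigma^2 = \frac{\omega}{1-\alpha_1-\beta_1}.$$

Doing the same conditionally on information at time $t$ gives the variance forecast, which decays geometrically toward $\bar\sigma^2$:

$$E_t[\sigma_{t+h}^2] - \bar\sigma^2 = (\alpha_1+\beta_1)^{h-1}\,\bigl(\sigma_{t+1}^2 - \bar\sigma^2\bigr).$$

A convenient summary, introduced here for illustration, is the half-life $h_{1/2} = \ln 0.5 / \ln(\alpha_1+\beta_1)$. At a persistence of $0.95$ this is about $13.5$ trading days.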

Intuition

Why it matters: You cannot predict whether tomorrow's return is up or down; if you could, the trade would already be made and the prediction erased. But you can predict whether tomorrow will be a calm day or a wild one, because calm and wild cluster: a stormy market today tells you the next few days are likely to be stormy too, even though it tells you nothing about which direction the moves will go. That split, where the direction of a surprise is unforecastable but the size of a surprise is forecastable, is the entire idea behind these models. A big move today feeds into a bigger expected swing tomorrow; when that feedback is strong, storms last a long time once they start. The figure below has a persistence dial: turn it low and the series looks like featureless noise; turn it toward one and watch calm and turbulent stretches organize themselves into long swells, no equation required to see it happen.

Figure 23.6. GARCH(1,1) volatility-clustering explorer. The top panel shows a simulated return series; the bottom panel shows the conditional volatility $\sigma_t$ that generated it. At low persistence ($\alpha_1+\beta_1$ small) the returns look like homoskedastic noise. As persistence approaches 1, calm and turbulent periods cluster sharply and the volatility line shows long swells. The slider guards $\alpha_1+\beta_1<1$. Drag the persistence dials; reseed.
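The same experiment can be sketched in a few lines of standard-library Python. This is an illustrative simulation, not the code behind the figure. It assumes Gaussian innovations and starts the recursion at the unconditional variance, and the parameter values, seed, and function names are all hypothetical. It compares the lag-1 autocorrelation of returns with that of squared returns, which is where clustering shows up.

```python
import math
import random

def simulate_garch11(n, omega, alpha, beta, seed):
    """Simulate returns and conditional volatility from a GARCH(1,1)."""
    if alpha + beta >= 1:
        # Same guard as the figure's slider: keep the unconditional variance finite.
        raise ValueError("need alpha + beta < 1")
    rng = random.Random(seed)
    var = omega / (1 - alpha - beta)      # start at the unconditional variance
    returns, sigmas = [], []
    for _ in range(n):
        eps = math.sqrt(var) * rng.gauss(0, 1)
        returns.append(eps)
        sigmas.append(math.sqrt(var))
        var = omega + alpha * eps ** 2 + beta * var  # next period's variance
    return returns, sigmas

def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    m = sum(xs) / n
    c0 = sum((v - m) ** 2 for v in xs)
    ck = sum((xs[t] - m) * (xs[t - lag] - m) for t in range(lag, n))
    return ck / c0

# Illustrative parameters: persistence 0.95, unconditional variance 1.
returns, sigmas = simulate_garch11(n=20000, omega=0.05, alpha=0.10, beta=0.85, seed=11)
squares = [r ** 2 for r in returns]
sample_var = sum(squares) / len(squares)

print(f"sample variance of returns:       {sample_var:.3f}")
print(f"lag-1 autocorr of returns:        {autocorr(returns, 1):.3f}")
print(f"lag-1 autocorr of squared returns:{autocorr(squares, 1):.3f}")
```

The returns should show essentially no autocorrelation while their squares show a clearly positive one: the direction of a move is unforecastable, its size is not.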

The basic GARCH(1,1) has spawned a family of refinements, each fixing a limitation. EGARCH models the logarithm of the variance, so it allows the "leverage effect," where bad news raises volatility more than equally large good news, and needs no non-negativity constraints. GARCH-in-mean (GARCH-M) lets the conditional variance enter the return equation directly, formalizing the idea that investors demand higher expected returns when risk is high. These extensions are named here as the working vocabulary of empirical finance, not derived. Beyond them lies a frontier (Bayesian estimation of high-dimensional VARs, machine-learning approaches to volatility and prediction) that is outside the scope of a first course but worth knowing exists.

Summary

  1. Stationarity is the precondition. Standard time-series inference assumes covariance stationarity: a stable mean, variance, and autocovariance structure. The lag operator compacts the algebra; the Wold decomposition guarantees every covariance-stationary series is a filtered stream of white-noise shocks plus, at most, a perfectly predictable deterministic component.
  2. ARMA models and their fingerprints. AR processes carry their own past forward; MA processes carry the echoes of past shocks. The ACF and PACF diagnose them (PACF cuts off for AR, ACF cuts off for MA), which is the Box-Jenkins identification rule.
  3. Unit roots break the toolkit. A random walk has a unit root: shocks are permanent, the series wanders, and two independent wanderers regress spuriously (high $R^2$, significant $t$). The ADF test detects a unit root; differencing is the fix.
  4. The VAR models the system. Each variable on lags of all variables gives a joint forecast and a Granger-causality test, but Granger causality is predictive, not structural, and the reduced-form shocks are correlated.
  5. Structural identification is the pivot. Recovering economic shocks from reduced-form residuals needs a restriction the data cannot supply. Cholesky ordering, long-run restrictions, and sign restrictions each impose one, and the choice changes the impulse response. This is the Sims atheoretical-VAR tension.
  6. Cointegration recovers long-run relationships. I(1) series can share a stationary linear combination, a long-run equilibrium that differencing would destroy. The error-correction model measures the speed of return, and the Granger representation theorem makes cointegration and ECM equivalent.
  7. ARCH/GARCH model time-varying risk. Volatility clusters: the size of returns is forecastable even when their direction is not. GARCH makes conditional variance depend on past squared shocks and past variance; persistence near 1 means turbulence lingers.

Key equations

| Label | Equation | Description |
| --- | --- | --- |
| Eq. 23.1 | $E[y_t]=\mu,\ \mathrm{Var}(y_t)=\sigma^2,\ \mathrm{Cov}(y_t,y_{t-k})=\gamma_k$ | Covariance-stationarity conditions |
| Eq. 23.2 | $y_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j} + \kappa_t$ | Wold decomposition |
| Eq. 23.3 | $y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t$ | AR(p) |
| Eq. 23.4 | $y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}$ | MA(q) |
| Eq. 23.5 | $\phi(L)y_t = \theta(L)\varepsilon_t$ | ARMA in lag-operator form |
| Eq. 23.6 | $y_t = y_{t-1} + \varepsilon_t$ (with drift $+\,\delta$) | Random walk |
| Eq. 23.7 | $\Delta y_t = \alpha + \gamma y_{t-1} + \sum\delta_i\Delta y_{t-i} + \varepsilon_t;\ H_0:\gamma=0$ | Augmented Dickey-Fuller regression |
| Eq. 23.8 | $\mathbf{y}_t = \mathbf{c} + A_1\mathbf{y}_{t-1} + \cdots + A_p\mathbf{y}_{t-p} + \mathbf{u}_t$ | Reduced-form VAR(p) |
| Eq. 23.9 | $\mathbf{u}_t = B\boldsymbol{\varepsilon}_t,\ E[\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t']=I$ | SVAR mapping (residuals to structural shocks) |
| Eq. 23.10 | $\Theta_h = A^h B$ | Structural impulse response at horizon $h$ |
| Eq. 23.11 | $z_t = y_t - \beta x_t \sim I(0)$ when $y_t, x_t \sim I(1)$ | Cointegrating relation |
| Eq. 23.12 | $\Delta y_t = \lambda(y_{t-1}-\beta x_{t-1}) + \cdots + \varepsilon_t$ | Error-correction model |
| Eq. 23.13 | $\sigma_t^2 = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2$ | ARCH(q) conditional variance |
| Eq. 23.14 | $\sigma_t^2 = \omega + \sum\alpha_i\varepsilon_{t-i}^2 + \sum\beta_j\sigma_{t-j}^2$ | GARCH(p,q) conditional variance |

Practice

  1. You are shown an ACF that decays geometrically and a PACF that spikes at lag 1 and is essentially zero thereafter. Identify the process (AR, MA, or ARMA) and its order, and state the Box-Jenkins rule you used.
  2. For each series, classify it as AR or MA from its diagnostic plots: (a) ACF cuts off after lag 2, PACF decays; (b) PACF cuts off after lag 1, ACF decays. State the order in each case.
  3. A series must be differenced twice before its ACF and an ADF test indicate stationarity; its first difference still shows a near-unit-root ACF. State its order of integration and justify the classification.
  4. A Granger-causality table reports that the policy rate Granger-causes inflation ($p<0.01$) but inflation does not Granger-cause the rate ($p=0.4$). Interpret the result, and state precisely what it does and does not establish about causation.

Application

  1. Write down the ADF regression for testing whether log real GDP has a unit root. State the null hypothesis in terms of the coefficient on the lagged level, explain why ordinary $t$-critical values are inappropriate, and explain the spurious-regression risk that motivates running the test before regressing GDP on another trending series.
  2. Specify a two-variable VAR(1) in output growth and inflation. Write the two equations explicitly, state what regressors each contains, and explain in words what an impulse response of inflation to an output-growth shock would show.
  3. In the monetary VAR of Figure 23.4, explain why the Cholesky ordering matters. State the contemporaneous restriction each ordering imposes, and explain why reordering the variables changes the impulse responses even though the estimated reduced form is unchanged.
  4. A GARCH(1,1) fit to daily returns gives $\alpha_1 = 0.08$, $\beta_1 = 0.90$. Compute the persistence, state whether the unconditional variance is finite, and interpret what the persistence value implies about how long a volatility spike lasts.

Challenge

  1. Using the lag operator, derive the MA($\infty$) representation of a stationary AR(1), $y_t = \phi y_{t-1} + \varepsilon_t$ with $|\phi|<1$. Show that the moving-average weights are $\psi_j = \phi^j$, and explain what the condition $|\phi|<1$ guarantees about the weights and the variance.
  2. Sketch the Granger-Newbold logic for why OLS on two independent random walks produces spurious significance. Explain why the regression residuals are non-stationary under the null of no relationship, and why that invalidates the usual $t$-distribution for the slope coefficient.
  3. State the Granger representation theorem's equivalence (cointegration if and only if an ECM representation exists) in words. Then, given a cointegrating vector implying the long-run relationship $y = 1.5\,x$, write the error-correction model for $\Delta y_t$, label the error-correction term, and interpret the sign and magnitude of its coefficient.
  4. Contrast Sims's skepticism about structural identification with a defensible structural-identification scheme (e.g. a Cholesky ordering justified by an assumption that monetary policy responds to inflation only with a lag). State what each approach buys and what each gives up, and explain why the question of which is right is one this material poses rather than answers.

Sources

Wold (1938); Box & Jenkins (1970); Granger & Newbold (1974); Dickey & Fuller (1979); Sims (1980); Engle (1982); Bollerslev (1986); Engle & Granger (1987); Blanchard & Quah (1989); Johansen (1991); Hamilton (1994); Stock & Watson (2001); Enders (2014).