%\input texsis
%\book
%\singlespace
%\input epsf.tex
\input grafinp3
\input psfig
%\eqnotracetrue
%\showchaptIDfalse
\chapter{Time Series\label{timeseries}}
\footnum=0
\def\toone{{t+1}}
\def\ttwo{{t+2}}
\def\tthree{{t+3}}
\def\Tone{{T+1}}
\def\TTT{{T-1}}
\def\rtr{{\rm tr}}
%
%\def\frac#1/#2{\leavevmode\kern.1em
% \raise.5ex\hbox{\the\scriptfont0 #1}\kern-.1em
% /\kern-.15em\lower.25ex\hbox{\the\scriptfont0 #2}}
\def\frac#1#2{#1\over #2}
%
%\showchaptIDfalse
%\def\specsec#1{\medskip{\sc\noindent{#1}}}
%\line{\hfil \today}
\section{Two workhorses}
This chapter describes two tractable models
of time series: finite state \idx{Markov chain}s and first-order
stochastic linear difference equations. These models are
organizing devices that put restrictions on a
sequence of random vectors. They are useful because they describe
a time series with parsimony. In later chapters, we shall make
two uses each of Markov chains and stochastic linear difference
equations: (1) to represent the exogenous information flows
impinging on an agent or an economy, and (2) to represent an
optimum or equilibrium outcome of agents' decision making.
The Markov chain and the first-order stochastic linear difference
equation both use a sharp notion of a \idx{state} vector. A state vector
summarizes the information about the current position of a system that
is relevant for determining its future.
The Markov chain and the stochastic linear difference equation
will be useful tools for studying dynamic optimization
problems.
% ^^|wonkers.m| % this is how to use the index program for
% matlab programs
\index{stochastic!linear difference equations}
\section{Markov chains}
A stochastic process is a sequence of random vectors. For
us, the sequence will be ordered by a time index, taken to be the
integers in this book. So we study discrete time models. \index{stochastic!process}%
We study a discrete-state stochastic process with
the following property:
\medskip
\specsec{Markov Property:} A stochastic process $\{x_t\}$ is
said to have the {\it Markov property} if for all $k \geq 1$ and
all $t$,
$${\rm Prob}(x_{t+1}\vert x_t, x_{t-1}, \ldots, x_{t-k})
= {\rm Prob}(x_{t+1}\vert x_t) .$$
\medskip
We assume the Markov property and characterize the process by a
{\it Markov chain}. \index{Markov chain}
A time-invariant Markov chain is defined by a triple of objects, namely,
an $n$-dimensional state space consisting of vectors
$e_i, i=1, \ldots, n$, where $e_i$ is an $n\times 1$ unit vector
whose $i$th entry is $1$ and all other entries are zero;
an $n \times n$
{\it transition matrix} $P$, which \index{transition matrix}
records the probabilities of moving from one value of the state to
another in one period; and an $(n \times 1)$ vector $\pi_0$
whose $i$th element is the probability of being in state $i$ at time 0:
$ \pi_{0i} = {\rm Prob} (x_0 = e_i)$.
The elements of matrix $P$ are
$$ P_{ij} = {\rm Prob}(x_{t+1} = e_j \vert x_t = e_i).$$
For these interpretations to be valid, the matrix $P$ and the
vector $\pi_0$ must satisfy the following assumption:
\specsec{Assumption M:}
\item{a.} For $i=1, \ldots, n$, the matrix $P$ satisfies
$$ \sum_{j=1}^n P_{ij} =1. \EQN obA1$$
\medskip
\item{b.} The vector $\pi_0$ satisfies
$$ \sum_{i=1}^n \pi_{0i} =1.$$
A matrix $P$ that satisfies property \Ep{obA1} is called a
{\it stochastic matrix}. A stochastic matrix \index{stochastic!matrix}%
defines the
probabilities of moving from one value of the state to another
in one period. The probability of moving from one value of the
state to another in {\it two\/} periods is determined by $P^2$
because
$$\eqalign{ &{\rm Prob}(x_{t+2} =e_j \vert x_t = e_i) \cr
&=\sum_{h=1}^n {\rm Prob}(x_{t+2}=e_j \vert x_{t+1}=e_h)
{\rm Prob}(x_{t+1}=e_h \vert x_t = e_i) \cr
&=
\sum_{h=1}^n P_{ih} P_{hj} = P^{(2)}_{ij},\cr }$$
where $P^{(2)}_{ij}$ is the $i,j$ element of $P^2$. Let $P^{(k)}_{ij}$
denote the $i,j$ element of $P^k$.
By iterating on the preceding equation, we discover that
$$ {\rm Prob}(x_{t+k} = e_j \vert x_t = e_i) =
P^{(k)}_{ij} .$$
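\smallskip
For instance, for the purely illustrative two-state chain with transition matrix
$$ P =\left[\matrix{.9 & .1 \cr
                    .3 & .7 \cr } \right],$$
direct multiplication gives
$$ P^2 =\left[\matrix{.84 & .16 \cr
                      .48 & .52 \cr } \right],$$
so that, for example, ${\rm Prob}(x_{t+2} = e_2 \vert x_t = e_1) =
P^{(2)}_{12} = (.9)(.1) + (.1)(.7) = .16$.
\smallskip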
The unconditional probability distributions of $x_t$ are
determined by
$$\eqalign{\pi_1'= {\rm Prob} (x_1) &= \pi_0' P \cr
\pi_2'= {\rm Prob}( x_2) &= \pi_0' P^2 \cr
\vdots & \cr
\pi_k'={\rm Prob} (x_k) & = \pi_0' P^k ,\cr}$$
where $\pi_t'={\rm Prob} (x_t)$ is the $(1 \times n)$ vector whose
$i$th element is ${\rm Prob}(x_t = e_i)$.
\index{distribution!stationary}
\subsection{Stationary distributions}
Unconditional probability distributions evolve according to
$$ \pi_{t+1}' = \pi_t ' P. \EQN obA2$$
An unconditional
distribution is called {\it stationary\/} or {\it invariant\/}
if it satisfies
$$ \pi_{t+1} = \pi_t,$$
that is, if the unconditional
distribution remains unaltered with the passage of time.
From the law of motion \Ep{obA2} for unconditional
distributions, a stationary distribution must satisfy
$$ \pi' = \pi' P \EQN steadst1 $$
or
$$ \pi' (I - P) =0.$$
Transposing both sides of this equation gives
$$ (I-P') \pi =0, \EQN obA3$$
which determines $\pi$ as an eigenvector (normalized to satisfy
$\sum_{i=1}^n \pi_i = 1$) associated with a unit eigenvalue of $P'$.
We say that $P, \pi$ is a {\it stationary Markov chain\/} if the initial distribution
$\pi$ is such that \Ep{steadst1} holds.
The fact that $P$ is a stochastic matrix (i.e., it has nonnegative
elements and satisfies $\sum_j P_{ij} =1$ for all $i$) guarantees that
$P$ has at least one unit eigenvalue, and
that there is at least one eigenvector
$\pi$ that satisfies equation \Ep{obA3}.
This stationary distribution may not be
unique because $P$ can have a repeated unit eigenvalue.
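\smallskip
To illustrate, the two-state chain introduced above with
$P =\left[\matrix{.9 & .1 \cr .3 & .7 \cr}\right]$
has the unique stationary distribution
$\pi' = \left[\matrix{.75 & .25 \cr}\right]$, as can be verified by checking
that $\pi' P = \pi'$: $.9(.75)+.3(.25) = .75$ and $.1(.75)+.7(.25)=.25$.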
\smallskip
\noindent{\bf Example 1}. A Markov chain
$$ P =\left[\matrix{1 & 0 & 0 \cr
.2 & .5 & .3 \cr
0 & 0 & 1 \cr } \right] $$
has two unit eigenvalues with associated stationary distributions
$ \pi' = \left[\matrix{1 & 0 & 0 \cr}\right]$ and
$ \pi' = \left[\matrix{0 & 0 & 1 \cr}\right]$. Here states $1$ and $3$
are both {\it absorbing\/} states.
Furthermore, any initial distribution that puts zero probability
on state $2$ is a stationary distribution. See exercises {\it \the\chapternum.10\/} and
{\it \the\chapternum.11\/}.
\smallskip
\noindent{\bf Example 2}. A Markov chain
$$ P =\left[\matrix{.7 & .3 & 0 \cr
0 & .5 & .5 \cr
0 & .9 & .1 \cr } \right] $$
has one unit eigenvalue with associated stationary distribution
$ \pi' = \left[\matrix{0 & .6429 & .3571 \cr}\right]$.
Here states $2$ and $3$ form an {\it absorbing subset\/} of the state space.
\subsection{Asymptotic stationarity}
We often ask the following question
about a Markov process: for an arbitrary initial distribution
$\pi_0$, do the unconditional distributions
$\pi_t$ approach a stationary distribution
$$ \lim_{t \to \infty} \pi_t = \pi_\infty ,$$
where $\pi_\infty$ solves equation
\Ep{obA3}? If the answer is yes, then does the
limit distribution $\pi_\infty$ depend on the initial distribution
$\pi_0$? If the limit $\pi_\infty$ is independent of the
initial distribution $\pi_0$, we say that the process is
{\it asymptotically stationary with a unique invariant distribution}.
We call a solution $\pi_\infty$ a {\it stationary
distribution\/} or an {\it invariant distribution\/} of $P$.
\index{distribution!stationary} \index{distribution!invariant}
We state these concepts formally in the following definition:
%\specsec{Definition:} Let $\pi_\infty$ be a unique vector that
%satisfies
%$(I-P') \pi_\infty=0.$ If for all initial distributions
%$\pi_0$ it is true that $P^t{'} \pi_0$ converges to
%the same $\pi_\infty$, we say that the Markov chain
%is asymptotically stationary with a unique invariant distribution.
\medskip%\noindent{\sc Definition:}
\definition{def1} Let $\pi_\infty$ be a unique vector that
satisfies
$(I-P') \pi_\infty=0.$ If for all initial distributions
$\pi_0$ it is true that $P^t{'} \pi_0$ converges to
the same $\pi_\infty$, we say that the Markov chain
is asymptotically stationary with a unique invariant distribution.
\enddefinition
\medskip
The following theorems can be used to show
that a Markov chain is asymptotically stationary.
\medskip
%\medskip\noindent{\sc Theorem 1:}
\theorem{ch11} Let $P$ be a stochastic
matrix with $P_{ij} > 0 \ \forall (i,j)$. Then $P$ has
a unique stationary distribution, and the
process is asymptotically stationary.
\endtheorem
\medskip
%\noindent{\sc Theorem 2:}
\theorem{ch12} Let $P$ be a stochastic matrix
for which $P^{(n)}_{ij} > 0 \ \forall (i,j)$ for some
value of $n \geq 1$. Then $P$ has a unique stationary
distribution, and the process is asymptotically stationary.
\endtheorem
\medskip
\noindent The conditions of \Theorem{ch11} (and \Theorem{ch12}) state that from any
state there is a positive probability of moving to any other state
in one (or $n$) steps. Please note that some of the examples below will violate
the conditions of \Theorem{ch12} for any $n$.
\subsection{Forecasting the state}
The minimum mean squared error forecast of the state next period is the conditional
mathematical expectation:
$$ E [x_{t+1} | x_t = e_i] = \bmatrix{ P_{i1} \cr P_{i2} \cr \vdots \cr P_{in} \cr}
= P' e_i = P_{i,\cdot}' \EQN Pprep
$$
where $P_{i,\cdot}'$ denotes
the transpose of the $i$th row of the matrix $P$. In section \use{sec:HMM}
of this book's appendix \use{lincontrol}, we use this equation to motivate
the following first-order stochastic difference equation for the state:\index{Markov chain!as difference equation}%
$$ x_{t+1} = P' x_t + v_{t+1} \EQN Pdiffeqn $$
where $v_{t+1}$ is a random disturbance that evidently satisfies
$E [v_{t+1} | x_t] = 0 $.
Now let $\overline y$ be an $n\times 1$ vector of real numbers and define
$y_t = \overline y' x_t$, so that $y_t = \overline y_i$ if $x_t = e_i$.
Evidently, we can write
$$ y_{t+1} = \bar y' P' x_t + \bar y' v_{t+1}. \EQN observnonlinear $$
The pair of equations \Ep{Pdiffeqn}, \Ep{observnonlinear} becomes a simple
example of a \idx{hidden Markov model}
when the observation $y_t$ is too coarse to reveal the state. See section
\use{sec:HMM} of technical appendix \use{lincontrol} for a discussion of such models.
\subsection{Forecasting functions of the state}
From the conditional and unconditional probability
distributions that we have listed, it follows that the unconditional
expectations of $y_t$ for $t \geq 0$ are determined by
$ E y_t = (\pi_0' P^t) \overline y$.
%or
%$$\eqalign{ E y_0 &= \pi_0' \overline y \cr
% E y_1 & = \pi_0 ' P \overline y \cr
% & \vdots \cr
% E y_t & = \pi_0 ' P^k \overline y. \cr } $$
Conditional expectations are determined by
\index{conditional expectation}%
$$\EQNalign{& E (y_{t+1}| x_t = e_i) =
\sum_j P_{ij} \overline y_j
= (P \overline y)_i \EQN conde1 \cr
& E(y_{t+2} | x_t =e_i) = \sum_k P_{ik}^{(2)} \overline y_k
= (P^2 \overline y)_i \EQN conde2 \cr} $$
and so on, where $P_{ik}^{(2)}$ denotes the $(i,k)$ element of $P^2$ and $(\cdot)_i$ denotes
the $i$th row of the matrix $(\cdot)$. An equivalent formula
from \Ep{Pdiffeqn}, \Ep{observnonlinear} is
$E [y_{t+1} | x_t] = \bar y' P' x_t = x_t' P \bar y$, which
equals $(P\bar y)_i$ when $x_t = e_i$.
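\smallskip
For the illustrative two-state chain with
$P =\left[\matrix{.9 & .1 \cr .3 & .7 \cr}\right]$ and
$\overline y = \left[\matrix{10 & 0 \cr}\right]'$, formulas \Ep{conde1} and
\Ep{conde2} give $E(y_{t+1}|x_t = e_1) = (P \overline y)_1 = 9$ and
$E(y_{t+2}|x_t = e_1) = (P^2 \overline y)_1 = 8.4$.
\smallskip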
Notice that
$$ \eqalign{ E [ E (y_{t+2} | x_{t+1} = e_j) | x_t =
e_i] & = \sum_j P_{ij} \sum_k P_{jk} \overline y_k \cr
= \sum_k (\sum_j P_{ij} P_{jk}) \overline y_k & = \sum_k P_{ik}^{(2)}
\overline y_k = E(y_{t+2} | x_t = e_i). \cr} $$
Connecting the first and last terms in this string of equalities
yields $E [E(y_{t+2}|x_{t+1})| x_t ] = E [y_{t+2} | x_t]$.
This is an example of the
``law of iterated expectations.'' The \idx{law of iterated expectations}
states that for any random variable $z$ and two information sets
$J, I$ with $J \subset I$, $E [E(z | I)|J]=E(z|J)$.
As another example of the law of
iterated expectations, notice that
$$E y_1 = \sum_j \pi_{1,j} \overline y_j =
\pi_1' \overline y = (\pi_0' P) \overline y
= \pi_0' (P \overline y) $$
and that
$$ E[ E(y_1 | x_0 =e_i) ]% = \sum_j P_{ij} \overline x_j
= \sum_i \pi_{0,i} \sum_j P_{ij} \overline y_j
= \sum_j (\sum_i \pi_{0,i} P_{ij}) \overline y_j = \pi_1' \overline y
= E y_1 .$$
%$$ E (x_{t+k} \vert x_t= \overline x) = P^k \overline x . $$
%
%Notice that
%$$\eqalign{ E (x_t) & = \pi_t' \overline x = (\pi_0' P^t) \overline x \cr
% & = \pi_0' (P^t \overline x) \cr
% & = E [E (x_t \vert x_0 = \overline x )] .\cr}$$
%The statement that $E (x_t ) = E (E x_t \vert x_0 = \overline x)$
%is an example of the {\it law of iterated expectations}.
\index{law of iterated expectations}
\subsection{Forecasting functions}\label{sec:resolvent}
There are powerful formulas for forecasting functions
of a Markov state.
Again, let $\overline y $ be an $n\times 1$ vector and
consider the random variable $y_t = \overline y' x_t$.
Then
$$ E[y_{t+k} \vert x_t = e_i] = (P^k \overline y)_i $$
where $(P^k \overline y)_i$ denotes the $i$th row of $P^k \overline y$.
Stacking all $n$ rows together, we express this as
$$ E[y_{t+k} | x_t] = P^k \overline y . \EQN foreformulak $$
We also have
$$ \sum_{k=0}^\infty \beta^k E [y_{t+k} \vert x_t =
e_i ]
= [ (I -\beta P)^{-1} \overline y ]_i,$$
where $\beta \in (0,1)$ guarantees existence of $(I -\beta P)^{-1}
= (I + \beta P + \beta^2 P^2 + \cdots \, )$.
The matrix $(I -\beta P)^{-1}$ is called a ``resolvent operator.''
\index{resolvent operator}% GGGG
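A useful check on this formula comes from taking $\overline y$ to be a vector of ones,
so that $y_t = 1$ for all $t$. Because each row of $P^k$ sums to one,
$$ (I -\beta P)^{-1} \overline y = \sum_{k=0}^\infty \beta^k P^k \overline y
= {1 \over 1 - \beta}\, \overline y , $$
which is the present value of a constant stream of ones, as it should be.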
\subsection{Enough one-step-ahead forecasts determine $P$}
One-step-ahead forecasts of
a sufficiently rich set of random variables characterize a Markov
chain. In particular,
one-step-ahead conditional expectations of $n$ independent
functions (i.e., $n$ linearly independent vectors $h_1, \ldots, h_n$)
uniquely determine the transition matrix $P$. Thus, let
$h_{k,t+1} = h_k' x_{t+1}$ and note that $E[h_{k,t+1} \vert x_t = e_i] = (P h_k)_i $. We can collect the
conditional expectations of $h_k$ for all initial states $i$ in
an $n \times 1$ vector
$E [h_{k,t+1} \vert x_t] = P h_k$.
We can then collect conditional expectations for the $n$
independent vectors $h_1, \ldots, h_n$ as $P h = J$ where
$h=\left[\matrix{h_1 & h_2 & \ldots & h_n\cr}\right]$ and
$J$ is the $n \times n$ matrix consisting of all conditional expectations of
all $n$ vectors $h_1, \ldots , h_n$. If we know $h$ and $J$, we
can determine $P$ from $P= J h^{-1}$.
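For instance, taking $h_k = e_k$, the $k$th unit vector, gives $h = I$ and
$J = P h = P$, so the matrix of one-step-ahead conditional expectations of the
$n$ indicator variables of the states reveals $P$ directly.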
\subsection{Invariant functions and ergodicity}
Let $P, \pi$ be a stationary $n$-state Markov chain with the
state space
$X= [e_i, i=1, \ldots, n]$. An $n \times 1$ vector $\overline y$
defines a random variable $y_t = \overline y' x_t$.
%Thus, a random variable is another term for ``function of the underlying
%Markov state.''
Let $E [y_\infty | x_0]$ be the expectation of $y_s$ for $s$ very large,
conditional on the initial state.
The following is a useful precursor to a law of large numbers:
\medskip
\theorem{lawlargenumbers0}
Let $\overline y$ define a random variable as a function of an underlying
state $x$, where $x$ is governed by a stationary
Markov chain $(P, \pi)$. Then
$$ {1 \over T} \sum_{t=1}^T y_t \rightarrow E [y_\infty | x_0]
\EQN lawlarge0 $$
with probability $1$.
\endtheorem
\medskip
To illustrate \Theorem{lawlargenumbers0}, consider the following example:
\medskip
\noindent{\bf Example:}
Consider the Markov chain $P = \bmatrix{ 1 & 0 \cr 0 & 1\cr}, \pi_0 = \bmatrix{p \cr (1-p)\cr}$ for
$p \in (0,1)$. Consider the random variable $y_t = \bar y' x_t$ where $\bar y = \bmatrix{10 \cr 0 \cr}$. The chain has two possible sample
paths, $y_t = 10, t\geq 0$, which occurs with probability $p$ and $y_t = 0, t\geq 0$, which occurs with probability $1-p$.
Thus, $ {1 \over T} \sum_{t=1}^T y_t \rightarrow 10$ with probability $p$ and ${1 \over T} \sum_{t=1}^T y_t \rightarrow 0$ with
probability $(1-p)$.
\medskip
The outcomes in this example indicate why we might want something more than \Ep{lawlarge0}.
In particular, we would like to be free to
replace $E[y_\infty | x_0]$ with the constant unconditional
mean $E[y_t] = E[y_0]$ associated with the stationary distribution $\pi$.
To get this outcome, we must strengthen
what we assume about $P$ by using the following concepts.
Suppose that $(P, \pi)$ is a stationary Markov chain. Imagine repeatedly drawing $x_0$ from $\pi$ and
then generating $x_t, t \geq 1$ by successively drawing from transition densities given by the matrix $P$.
We use the following definition:
\medskip
\definition{invariant} A random variable $y_t = \overline y' x_t $ is said to
be {\it invariant\/} if
$y_t = y_0, t \geq 0$,
for all realizations of $x_t, t \geq 0$ that occur with positive probability under $(P, \pi)$.
%$\bar y_i = \bar y_j$ for all $i,j$ that occur with
%positive probability according to the stationary probability distribution
%$\pi$.
\enddefinition
\medskip
\noindent Thus, a random variable $y_t$ is invariant (or ``an invariant function
of the state'') if it remains constant
at $y_0$ while the underlying state $x_t$ moves through the state space $X$. Notice how
the definition leaves open the possibility that $y_0$ itself might differ across sample paths indexed by
different draws of the initial condition
$x_0$ from the initial (and stationary) density $\pi$.
%
%Evidently, ${\rm Prob}(y_{t+1} = \bar y_j | y_t = \bar y_i) = P_{ij}$. If $P_{ij} >0 $ for all $(i,j)$,
%then the only way that we can have $y_{t+1} = y_t$ for all realizations is if we have $\bar y_i = \bar y_j$
%for all $i, j$. More generally, the only way that we can have $y_{t+1} = y_t = \bar y_i$ for sure is that
%$\bar y_j = \bar y_i $ whenever $P_{ij} >0$. Thus, we can say that if $y_t = \bar y x_t$ is an invariant variable for Markov chain
%with transition matrix $P$, then $P\bar y = \bar y$, which is to say, $\bar y$ is a right eigenvector of $P$ associated
%with a unit eigenvalue.
%
%If $P_{ij} > 0$ for all $(i,j)$, then the only eigenvector associated with a unity eigenvector has the form $ \bar y_o {\bf 1} $,
%where $y_o$ is a real scalar and ${\bf 1}$ is the $n \times 1$ unit vector. If $P_{ij} = 0$ for some $(i,j)$, there
%can be eigenvectors that are not of the form $ \bar y_o {\bf 1} $, as illustrated in the following example.
%
%\medskip
%\specsec{Example:} Consider the stochastic matrix
%$$ P = \bmatrix{ 1 & 0 & 0 \cr
% 0 & 1 & 0 \cr
% .25 & .25 & .5} .$$
%A right eigenvector of $P$ associated with a unit eigenvalue is $\bmatrix{ 2 & -2 & 0 } $, which evidently is not constant across states.
%
%\medskip
The stationary Markov chain $(P, \pi)$ induces a joint density $f(x_{t+1}, x_t)$ over $(x_{t+1}, x_t)$ that
is independent of calendar time $t$; $P, \pi$ and the definition $y_t = \overline y' x_t$ also
induce a joint density $f_y(y_{t+1}, y_t)$ that is independent of calendar time. In what follows,
we compute mathematical expectations with respect to the joint density $f_y(y_{t+1}, y_t)$.
For a finite-state Markov chain, the following theorem gives a convenient
way to characterize
invariant functions of the state.
\medskip
\theorem{invariant2200} Let $(P, \pi)$ be a stationary Markov chain. %and assume that $\pi$ puts positive
%probability on all $x_t \in X$.
If
$$ E [y_{t+1} | x_t] = y_t \EQN invariant22 $$
then the random variable $y_t = \overline y' x_t$
is invariant.
\endtheorem
\proof By using the law of iterated expectations, notice that
$$\eqalign{ E (y_{t+1} - y_t)^2 &= E[E(y_{t+1}^2 - 2 y_{t+1} y_t
+ y_t^2)|x_t] \cr
& = E[ E( y_{t+1}^2 | x_t) - 2 E (y_{t+1}| x_t) y_t + y_t^2 ] \cr
& = E y_{t+1}^2 - 2 E y_t^2 + E y_t^2 \cr
& =0 \cr } $$
where the middle term on the right side of the second line
uses that $E[y_{t}|x_t]= y_t$, the middle term on the right side of the third line
uses the hypothesis \Ep{invariant22}, and the third line uses the hypothesis that $\pi$
is a stationary distribution. In a finite Markov chain, if
$E (y_{t+1} - y_t)^2=0$, then $y_{t+1} = y_t$ for all $y_{t+1}, y_t$ that
occur with positive probability under the stationary distribution.
\endproof
\medskip
As we shall have reason to study in chapters \use{selfinsure} and
\use{incomplete},
{\it any\/} (not necessarily stationary) stochastic process $y_t$ that satisfies
\Ep{invariant22} is said to be a {\it martingale\/}.
\Theorem{invariant2200} tells us that a martingale that is a function of a finite-state
stationary Markov state $x_t$ must be constant over time. This result is a special case of the
martingale convergence theorem that underlies some remarkable results about savings to be studied
in chapter \use{selfinsure}.\NFootnote{\Theorem{invariant2200} tells us that a stationary martingale process
has so little freedom to move that it has to be constant forever, not just eventually, as asserted
by the martingale convergence theorem.}
Equation \Ep{invariant22} can be expressed as
$ P \overline y = \overline y$
or
$$ (P - I)\overline y = 0, \EQN invariant3 $$
which states that an invariant function of the state is
a (right) eigenvector of $P$ associated with a unit eigenvalue.
Thus, associated with unit eigenvalues of $P$ are (1) left eigenvectors that are stationary distributions of the chain (recall equation \Ep{obA3}),
and (2) right eigenvectors that are invariant functions of the chain (from equation \Ep{invariant3}).
\definition{ergodicity} Let $(P, \pi)$ be a stationary Markov chain.
The chain is said to be {\it ergodic\/} if the only invariant
functions $\overline y$ are constant with probability $1$ under the stationary unconditional probability
distribution $\pi$, i.e.,
$\overline y_i = \overline y_j$ for all $i, j$ with $\pi_i >0, \pi_j >0$.
\enddefinition
\medskip
\specsec{Remark:} Let $\tilde \pi^{(1)}, \tilde \pi^{(2)}, \ldots, \tilde \pi^{(m)}$ be
$m$ distinct `basis' stationary distributions for an $n$-state Markov chain with transition matrix
$P$. Each $\tilde \pi^{(k)}$ is an $(n \times 1)$ left eigenvector of $P$ associated with a distinct
unit eigenvalue. Each $\tilde \pi^{(k)}$ is scaled to be a probability vector (i.e., its components are nonnegative and sum to unity).
The set $S$ of {\it all\/} stationary distributions is convex. An element $\pi_b \in S$ can be represented
as
$$ \pi_b = b_1 \tilde \pi^{(1)} + b_2 \tilde \pi^{(2)} + \cdots + b_m \tilde \pi^{(m)} , $$
where $(b_1, \ldots, b_m)$ is a probability vector, i.e., $b_j \geq 0$ and $\sum_j b_j =1$.
\medskip
\specsec{Remark:} A stationary density $\pi_b$ for which the pair $(P, \pi_b)$ is an ergodic Markov chain
is an extreme point of the convex set $S$, meaning that it can be represented
as $ \pi_b = \tilde \pi^{(j)} $
for one of the `basis' stationary densities.
\medskip
\noindent A law of large numbers for Markov chains is:
\medskip
\theorem{LLNMarkov}
Let $\overline y$ define a random variable on a stationary and ergodic
Markov chain $(P, \pi)$. Then
$$ {1 \over T} \sum_{t=1}^T y_t \rightarrow E[y_0]
\EQN lawlarge1 $$
with probability $1$.
\endtheorem
\medskip
This theorem tells us that the time series average
converges to the population mean of the stationary distribution.
\medskip
Three examples illustrate these concepts.
\medskip
\noindent{\bf Example 1.} A chain with transition matrix
$P=\left[\matrix{0 & 1 \cr 1 & 0\cr}\right]$ has a unique stationary
distribution $ \pi=\left[\matrix{.5 & .5 \cr}\right]'$ and
the invariant functions are $\left[\matrix{\alpha & \alpha \cr}\right]'$
for any scalar $\alpha$. Therefore, the process is ergodic and
\Theorem{LLNMarkov} applies.
\medskip
\noindent{\bf Example 2.} A chain with transition matrix
$P=\left[\matrix{1 & 0 \cr 0 & 1\cr}\right]$ has a continuum of
stationary distributions
$\gamma \left[\matrix{1 \cr 0 \cr}\right]+
(1- \gamma )\left[\matrix{0 \cr 1 \cr}\right]$ for any $\gamma \in [0,1]$ and
invariant functions
$\left[\matrix{0 \cr \alpha_1 \cr}\right] $ and
$\left[\matrix{\alpha_2 \cr 0 \cr}\right]$ for any scalars $\alpha_1, \alpha_2$. Therefore,
the process is not ergodic when $\gamma \in (0,1)$, because neither invariant function is
constant across states that receive positive probability according to a stationary distribution associated with $\gamma \in (0,1)$.
Therefore, the conclusion \Ep{lawlarge1} of \Theorem{LLNMarkov} does not hold for an initial stationary distribution associated with
$\gamma \in (0,1)$,
although the weaker result \Theorem{lawlargenumbers0} does hold. When $\gamma \in (0,1)$, nature chooses state $i=1$ or
$i=2$ with probabilities $\gamma, 1-\gamma$, respectively,
at time $0$. Thereafter, the chain remains stuck in the realized time $0$ state. Its failure ever to
visit the unrealized state prevents the sample average from converging to the population mean of an arbitrary function $\bar y$ of the state.
Notice that conclusion \Ep{lawlarge1} of \Theorem{LLNMarkov} does hold for the stationary distributions associated
with $\gamma=0$ and $\gamma=1$.
\medskip
\noindent{\bf Example 3.}
A chain with transition matrix
$P=\left[\matrix{.8 & .2 & 0 \cr .1 & .9 & 0 \cr
0 & 0 & 1\cr}\right]$ has a continuum of
stationary distributions
$ \gamma \left[\matrix{ {1\over 3} & {2 \over 3} & 0 \cr}\right]'
+(1- \gamma) \left[\matrix{ 0 & 0 & 1 \cr}\right]' $ for $\gamma \in [0,1]$ and
invariant functions
$ \alpha_1\left[\matrix{1 & 1 & 0 \cr}\right]'$ and
$ \alpha_2\left[\matrix{0 & 0 & 1 \cr}\right]'$
for any scalars $\alpha_1, \alpha_2$.
The conclusion \Ep{lawlarge1} of \Theorem{LLNMarkov} does not hold for
the stationary distributions associated with $\gamma \in (0,1)$,
but \Theorem{lawlargenumbers0} does hold.
But again, conclusion \Ep{lawlarge1} does hold for the stationary distributions associated with $\gamma =0$ and $\gamma=1$.
\subsection{Simulating a Markov chain}
It is easy to simulate a Markov chain using a random number
generator. The Matlab program {\tt markov.m}
%^^|markov.m|
\mtlb{markov.m}
does the job.
We'll use this program in some later chapters.\NFootnote{An index
in the back of the book lists Matlab programs.}
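The idea behind such a simulation can be stated as a formula (a sketch of one
standard method, not necessarily the one coded in {\tt markov.m}): given
$x_t = e_i$, draw a random number $u_{t+1}$ distributed uniformly on $[0,1]$
and set $x_{t+1} = e_j$ for the smallest $j$ that satisfies
$$ \sum_{k=1}^{j} P_{ik} \geq u_{t+1} . $$
State $j$ is then drawn with probability $P_{ij}$, as required; the same device
applied to $\pi_0$ instead of the $i$th row of $P$ initializes the chain.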
%available at
% $<$ftp://zia.stanford.edu/\raise-4pt\hbox{\~{}}sargent/pub/webdocs/matlab$>$.}
\subsection{The likelihood function}
Let $P$ be an $n \times n$ stochastic matrix
with states $1 ,2, \ldots, n$.
Let $\pi_0$ be
an $n \times 1$ vector with nonnegative elements summing to
$1$, with $\pi_{0,i}$ being the probability that the state is
$i$ at time $0$.
Let $i_t$ index the state at time
$t$.
The Markov property implies that the probability of drawing the path
$(x_0, x_1, \ldots, x_{T-1}, x_T) = (\overline e_{i_0}, \overline e_{i_1},
\ldots, \overline e_{i_{T-1}}, \overline e_{i_T})$ is
$$\eqalign{ L & \equiv {\rm Prob}( x_T = \overline e_{i_T},
x_{T-1} = \overline e_{i_{T-1}}, \ldots, x_1 = \overline e_{i_1}, x_0 = \overline e_{i_0}) \cr
& = P_{i_{T-1}, i_T} P_{i_{T-2}, i_{T-1}}
\cdots P_{i_0, i_1} \pi_{0,i_0}. \cr} \EQN likeli1
$$
The probability $L$ is called the {\it likelihood}. It is a
function of both the sample realization $x_0, \ldots , x_T$ and
the parameters of the stochastic matrix $P$. \index{likelihood
function} For a sample $x_0, x_1, \ldots, x_T$, let $n_{ij}$ be
the number of times that there occurs a one-period transition from
state $i$ to state $j$. Then the likelihood function can be
written
$$ L = \pi_{0,{i_0}} \prod_i\ \prod_j P_{i,j}^{n_{ij}},$$
a {\it multinomial\/} distribution.
\index{distribution!multinomial} \index{maximum likelihood}
\index{likelihood function!multinomial}
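\smallskip
For instance, for the illustrative two-state chain with
$P =\left[\matrix{.9 & .1 \cr .3 & .7 \cr}\right]$ and
$\pi_0' = \left[\matrix{.5 & .5 \cr}\right]$, the likelihood of the sample path
$(i_0, i_1, i_2, i_3) = (1,1,2,1)$ is
$$ L = \pi_{0,1} P_{11} P_{12} P_{21} = (.5)(.9)(.1)(.3) = .0135 . $$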
Formula \Ep{likeli1} has two uses. A first, which we
shall encounter often, is to describe the
probability of alternative histories of a Markov
chain. In chapter \use{recurge},
we shall use this formula to study prices and allocations
in competitive equilibria.
A second use is for estimating the parameters of a model
whose solution is a Markov chain.
Maximum likelihood estimation for free parameters $\theta$ of a
Markov process
works as follows. Let the transition matrix $P$
and the initial distribution
$\pi_0$ be functions $P (\theta), \pi_0(\theta)$
of a vector of free parameters $\theta$.
Given a sample $\{x_t\}_{t=0}^T $, regard the likelihood function
as a function of the parameters $\theta$. As the
estimator of $\theta$, choose the value that maximizes the
likelihood function $L$.
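\smallskip
A leading special case illustrates the procedure. When every element of $P$ is
a free parameter and the contribution of $\pi_0$ is either known or neglected,
maximizing $\log L = \log \pi_{0,i_0} + \sum_i \sum_j n_{ij} \log P_{ij}$
subject to the constraints $\sum_j P_{ij} = 1$, row by row with a Lagrange
multiplier on each constraint, yields the frequency estimator
$$ \hat P_{ij} = {n_{ij} \over \sum_k n_{ik}}, $$
the fraction of one-period transitions out of state $i$ that go to state $j$.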
\section{Continuous-state Markov chain}
In chapter \use{recurge}, we shall use a somewhat different
notation to express the same ideas. This alternative notation can
accommodate either discrete- or continuous-state Markov chains. We
shall let $S$ denote the state space with typical element $s \in
S$. Let state transitions be described by the cumulative distribution function
$\Pi(s' | s) = {\rm
Prob} (s_{t+1} \leq s' | s_t =s)$ and let the initial state $s_0$ be described by
the cumulative distribution function $\Pi_0(s) = {\rm
Prob} (s_0 \leq s)$. The {\it transition density\/} is $\pi(s' | s ) = {d \over d s'} \Pi(s'|s)$
and the initial density is % {\rm
%Prob} (s_{t+1} = s' | s_t =s)$ and the initial density is
$\pi_0(s) = {d \over d s} \Pi_0(s)$. For all $s\in S, \pi(s'|s )
\geq 0$ and $ \int_{s'} \pi(s'|s) d s' =1$; also $\int_s \pi_0(s) d s
=1$.\NFootnote{Thus, when $S$ is discrete,
$\pi(s_j|s_i)$ corresponds
to $P_{i,j}$ in our earlier notation.}
Corresponding to \Ep{likeli1}, the likelihood function or density
over the history $s^t = [s_t, s_{t-1}, \ldots, s_0]$
is
$$ \pi(s^t) = \pi(s_t | s_{t-1} )\cdots \pi(s_1| s_0) \pi_0 (s_0).
\EQN likeli2 $$
For $t\geq 1$, the time $t$ unconditional distributions
evolve according to
$$ \pi_t(s_t) = \int_{s_{t-1}} \pi(s_t|s_{t-1}) \pi_{t-1} (s_{t-1})
d \, s_{t-1} .$$
A stationary or {\it invariant\/} distribution
satisfies
$$ \pi_\infty(s') = \int_s \pi(s'|s) \pi_\infty (s) d \, s, $$
which is the counterpart to
\Ep{steadst1}.
\medskip
\specsec{Definition:} A Markov chain $\bigl(\pi(s'|s), \pi_0(s)\bigr)$ is said
to be {\it stationary\/} if $\pi_0$ satisfies
$$ \pi_0(s') = \int_s \pi(s'|s) \pi_0 (s) d \, s. $$
\medskip
\specsec{Definition:} Paralleling our discussion of finite-state Markov chains,
we can say that the function $\phi(s) $ is {\it invariant\/} if
$$ \int \phi(s') \pi(s'| s) d s' = \phi(s). $$
A stationary continuous-state Markov process is said to be {\it ergodic\/}
if the only invariant functions $\phi(s')$ are constant with probability
$1$ under the stationary distribution $\pi_\infty$.
\medskip
A law of large numbers for Markov processes states:
\medskip
\theorem{LLNMarkovcontinuous}
Let $y(s)$ be a random variable, a measurable function of $s$,
and let
$\bigl(\pi(s'|s),\pi_0(s)\bigr)$ be a stationary and ergodic continuous-state
Markov process. Assume that $E |y| < +\infty$. Then
$$ {1 \over T} \sum_{t=1}^T y_t \rightarrow E y
= \int y(s) \pi_0(s) ds $$
with probability $1$ with respect to the distribution $\pi_0$.
\endtheorem
\index{second-moment restrictions}
\section{Stochastic linear difference equations}\label{sec:stochlinear}%
The first-order linear vector stochastic difference equation
is a useful example of a continuous-state Markov process.
Here we use
$x_t \in {\bbR}^n$ rather than $s_t$ to denote the time $t$ state
and
specify that the initial distribution $\pi_0(x_0)$ is Gaussian with mean
$\mu_0$ and covariance matrix $\Sigma_0$, and that
the transition density $\pi(x'|x) $ is Gaussian with
mean $A_o x$ and covariance $C C'$.\NFootnote{An $n \times 1$ vector
$z$ that is multivariate normal has the density
function $$ f(z) = (2 \pi)^{-.5 n} |\Sigma|^{-.5} \exp (-.5 (z - \mu)' \Sigma^{-1} (z-\mu)
) $$
where $ \mu = E z$ and $\Sigma = E (z-\mu)(z-\mu)'$.}
This specification pins down the joint distribution of the
stochastic process $\{x_t\}_{t=0}^\infty$ via formula
\Ep{likeli2}.
The joint distribution determines
all moments of the process.
This specification can be represented in terms of
the first-order stochastic linear difference equation
$$ x_{t+1} = A_o x_t + C w_{t+1} \EQN diff1 $$
for $t = 0, 1 ,\ldots$,
where $x_t$ is an $n \times 1$ state vector, $x_0$ is a random
initial condition drawn from a probability distribution with mean $E x_0 = \mu_0$
and covariance matrix $E (x_0 - \mu_0)(x_0 - \mu_0)' = \Sigma_0$,
$A_o$ is an $n \times n$ matrix, $C$ is an $n \times m$
matrix, and $w_{t+1}$ is an $m \times 1$ vector satisfying
the following:
\medskip
\specsec{Assumption A1:} $w_{t+1}$ is an i.i.d.\ process satisfying
$w_{t+1} \sim {\cal N}(0, I)$.
\medskip
We can weaken the Gaussian assumption A1.
To focus only on first and second moments of the $x$ process,
it is sufficient to make the weaker assumption:
\medskip
\specsec{Assumption A2:} $w_{t+1}$ is an $m \times 1$ random vector
satisfying:
$$ \EQNalign{ E w_{t+1} \vert J_t & = 0 \EQN wprop1;a \cr
E w_{t+1} w_{t+1}'\vert J_t & = I , \EQN wprop1;b \cr}$$
where %$J_t = \left[\matrix{w_t & \cdots & w_1 & x_0 \cr} \right]$
$J_t = [w_t, w_{t-1}, \ldots, w_1, x_0]$
is the information set at $t$, and $E [ \ \cdot \ | J_t]$ denotes
the conditional expectation. We impose no distributional
assumptions beyond \Ep{wprop1}. A sequence $\{w_{t+1}\}$
satisfying equation \Ep{wprop1;a} is said to be a martingale
difference sequence adapted to $J_t$.
% A sequence
% $\{z_{t+1}\}$ that satisfies $E [z_{t+1}|J_t ] = z_t$ is said
% to be a martingale adapted to $J_t$.
\index{martingale!difference sequence}%
\index{martingale}%
\medskip
An even weaker assumption is
\specsec{Assumption A3:} $w_{t+1}$ is a process satisfying
$$ E w_{t+1} = 0 $$ for all $t$ and
$$ E w_{t} w_{t-j}' = \cases{ I, & if $ j=0$; \cr
0, & if $j \neq 0 $. \cr} $$
A process satisfying assumption A3 is said to be
a vector ``white noise.''\NFootnote{Note that \Ep{wprop1;a}
by itself allows the distribution of $w_{t+1}$ conditional on $J_t$ to be
heteroskedastic.}
\index{white noise}%
Assumption A1 or A2 implies assumption A3 but not vice versa. Assumption
A1 implies assumption A2 but not vice versa.
Assumption A3 is sufficient to justify the formulas that we report
below for second moments.
We shall often append an observation equation $y_t = G x_t$ to
equation \Ep{diff1}
and deal with the augmented system
$$ \EQNalign{x_{t+1} & = A_o x_t + C w_{t+1} \EQN statesp1;a \cr
y_t & = G x_t . \EQN statesp1;b \cr}$$
Here $y_t$ is a vector of variables observed at $t$, which
may include only some linear combinations of $x_t$. The
system \Ep{statesp1} is often called a linear {\it state-space system}.
\medskip
\noindent{\bf Example 1.} Scalar second-order autoregression:
Assume that $z_t$ and $w_t$ are scalar processes and that
$$ z_{t+1} = \alpha + \rho_1 z_t+ \rho_2 z_{t-1} + w_{t+1}. $$
Represent this relationship as
the system
$$ \eqalign{ \left[\matrix{z_{t+1} \cr
z_t \cr
1 \cr} \right] &=
\left[\matrix{\rho_1 & \rho_2 & \alpha\cr
1 & 0 & 0 \cr
0 &0 & 1 \cr}\right]
\left[\matrix{z_t \cr
z_{t-1} \cr
1 \cr} \right]
+ \left[\matrix{ 1 \cr
0 \cr
0 \cr} \right] w_{t+1} \cr
z_t & = \left[\matrix{1 & 0 & 0\cr} \right]
\left[\matrix{z_t \cr
z_{t-1} \cr
1 \cr} \right] \cr} $$
which has form \Ep{statesp1}.
\medskip
\noindent{\bf Example 2.} First-order scalar mixed moving
average and autoregression: Let
$$ z_{t+1} = \rho z_t + w_{t+1} + \gamma w_t. $$
Express this relationship as
$$ \eqalign{ \left[\matrix{z_{t+1} \cr
w_{t+1} \cr}\right]
& = \left[\matrix{\rho & \gamma \cr
0 & 0 \cr} \right]
\left[\matrix{ z_t \cr w_t \cr}\right]
+ \left[ \matrix{1 \cr 1 \cr}\right] w_{t+1} \cr
z_t & = \left[\matrix{1 & 0 \cr} \right]
\left[\matrix{z_t \cr
w_t \cr} \right]. \cr} $$
\noindent{\bf Example 3.} Vector autoregression:
Let $z_t$ be an $n \times 1$ vector of random variables.
We define a \idx{vector autoregression} by a stochastic
difference equation
$$ z_{t+1} = \sum_{j=1}^4 A_j z_{t+1-j} + C_y w_{t+1}, \EQN vecaug $$
where $w_{t+1}$ is an $n \times 1$ martingale difference
sequence satisfying equation
\Ep{wprop1} with $x_0' = \left[\matrix{z_0' & z_{-1}' &
z_{-2}' & z_{-3}'\cr} \right]$ and $A_j$ is an $n \times n$ matrix
for each $j$.
We can map equation \Ep{vecaug} into equation
\Ep{diff1} as follows:
$$ \left[\matrix{z_{t+1} \cr z_t \cr z_{t-1} \cr z_{t-2} \cr}\right]=
\left[\matrix{A_1 & A_2 & A_3 & A_4 \cr
I & 0 & 0 & 0 \cr
0 & I & 0 & 0 \cr
0 & 0 & I & 0 \cr}\right]
\left[\matrix{z_t \cr z_{t-1} \cr z_{t-2} \cr z_{t-3} \cr}\right]
+ \left[\matrix{C_y \cr 0 \cr 0 \cr 0 \cr} \right] w_{t+1} .
\EQN vecaug2 $$
Define $A_o$ as the state transition matrix in equation \Ep{vecaug2}.
Assume
that $A_o$ has all of its eigenvalues bounded in modulus
below unity. Then equation \Ep{vecaug} can be initialized
so that $z_t$ is {\it covariance stationary\/}, a term we
define soon.
\subsection{First and second moments}
We can use equation \Ep{diff1} to deduce
the first and second moments of the sequence of random vectors
$\{x_t\}_{t=0}^\infty$. A sequence of random vectors is called a
{\it stochastic process\/}.\index{stochastic!process}%
\index{covariance stationary}
\medskip
%\specsec{Definition:}
\definition{def:covstat} A stochastic process $\{x_t\}$ is said to
be {\it covariance stationary\/} if it
satisfies the following two properties: (a) the mean is
independent of time,
$ E x_t = E x_0$ for all $t$,
and (b) the sequence of autocovariance matrices
$E(x_{t+j} - E x_{t+j})(x_t - E x_t)'$
depends on the separation between dates
$j = 0, \pm 1, \pm 2, \ldots$, but not on $t$.
\enddefinition
\smallskip
We use
\definition{stable}
A square real-valued matrix $A_o$ is said to be {\it stable\/} if
all of its eigenvalues are strictly less than
unity in modulus.
\enddefinition
\smallskip
We shall often find it useful to assume that \Ep{statesp1} takes the
special form
$$ \left[\matrix{x_{1,t+1} \cr x_{2,t+1} \cr}\right]
= \left[\matrix{1 & 0 \cr 0 & \tilde A \cr}\right]
\left[ \matrix{x_{1,t} \cr x_{2t} \cr} \right]
+ \left[\matrix{0 \cr \tilde C\cr}\right] w_{t+1}
\EQN statesp10 $$
where $\tilde A$ is a stable matrix. That $\tilde A$ is a stable
matrix implies that the only solution of $(\tilde A - I) \mu_2
=0$ is $\mu_2=0$ (i.e., $1$ is {\it not\/} an eigenvalue of
$\tilde A$). It follows that the matrix $A_o=
\left[\matrix{1 & 0 \cr 0 & \tilde A \cr}\right]$
on the right side of \Ep{statesp10} has one eigenvector associated
with a single unit eigenvalue: $(A_o - I)\left[\matrix{\mu_1 \cr
\mu_2 \cr}\right] =0$ implies
$\mu_1$ is an arbitrary scalar and $\mu_2 =0$. The first equation
of \Ep{statesp10} implies that $x_{1,t+1} = x_{1,0}$ for all $t
\geq 0$. Picking the initial condition $x_{1,0}$ pins down a
particular eigenvector $\left[\matrix{x_{1,0} \cr 0 \cr}\right]$
of $A_o$. As we shall see soon, this eigenvector is our candidate
for the unconditional mean of $x$ that makes the process
covariance stationary.
We will make an assumption that guarantees that there exists an initial
condition $(\mu_0, \Sigma_0) = (Ex_0, E(x_0 - Ex_0) (x_0-Ex_0)')$ that makes
the $x_t$ process covariance stationary.
Either of the following conditions works:
\medskip
\specsec{Condition A1:} All of the eigenvalues of $A_o$ in
\Ep{statesp1} are strictly less than $1$ in modulus.
\medskip
\specsec{Condition A2:} The state-space representation takes the special
form \Ep{statesp10} and all of the eigenvalues of $\tilde A$ are strictly
less than $1$ in modulus.
\medskip
To discover the first and second moments of
the $x_t$ process,
we regard the initial condition $x_0$ as
being drawn from a distribution with mean $ \mu_0 = E x_0$ and
covariance $\Sigma_0 = E (x_0 - E x_0) (x_0 - E x_0)'$. We shall deduce
starting values for the mean and covariance
that make the process covariance
stationary, though our formulas are also useful for describing
what happens when we start from other initial conditions that
generate transient behavior that stops the process from being covariance
stationary.
Taking mathematical expectations on both sides of equation
\Ep{diff1} gives
$$ \mu_{t+1} = A_o \mu_t \EQN diff2 $$
where $\mu_t = E x_t $. We will assume that all of the eigenvalues
of $A_o$ are strictly less than unity in modulus, except possibly for one that
is affiliated with the constant terms in the various equations. Then
$x_t$ possesses a stationary mean defined to satisfy
$\mu_{t+1} = \mu_t$, which from equation \Ep{diff2} evidently satisfies
$$ (I - A_o) \mu = 0 , \EQN diff3 $$
which characterizes the mean $\mu$ as an eigenvector
associated with the single unit eigenvalue of $A_o$. The condition
that the remaining eigenvalues of $A_o$ are less than unity
in modulus implies that starting from any $\mu_0$, $\mu_t
\rightarrow \mu$.\NFootnote{To understand this, assume that the
eigenvalues of $A_o$ are distinct, and use the representation $A_o
= P \Lambda P^{-1}$ where $\Lambda$ is a diagonal matrix of the
eigenvalues of $A_o$, arranged in descending order of magnitude,
and $P$ is a matrix composed of the corresponding eigenvectors.
Then equation \Ep{diff2} can be represented as $\mu_{t+1}^* =
\Lambda \mu_t^*$, where $\mu_t^* \equiv P^{-1} \mu_t$, which
implies that $\mu_t^* = \Lambda^t \mu_0^*$. When all eigenvalues
but the first are less than unity, $\Lambda^t$ converges to a
matrix of zeros except for the $(1,1)$ element, and $\mu_t^*$
converges to a vector of zeros except for the first element, which
stays at $\mu_{0,1}^*$, its initial value, which we are free to set equal to
$1$, to capture the constant. Then $\mu_t = P \mu_t^*$ converges
to $P_1 \mu_{0,1}^* = P_1$, where $P_1$ is the eigenvector corresponding
to the unit eigenvalue.}
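To illustrate, consider a scalar first-order autoregression with a constant,
$z_{t+1} = \alpha + \rho z_t + w_{t+1}$ with $|\rho| < 1$, written in the
form \Ep{diff1} with state $x_t = \left[\matrix{1 & z_t \cr}\right]'$,
$$ A_o = \left[\matrix{1 & 0 \cr \alpha & \rho \cr}\right], \qquad
C = \left[\matrix{0 \cr 1 \cr}\right]. $$
Equation \Ep{diff3} requires $-\alpha \mu_1 + (1 - \rho)\mu_2 = 0$; normalizing
$\mu_1 = 1$ to capture the constant delivers the stationary mean
$\mu_2 = \alpha/(1-\rho)$ of the autoregression.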
Notice that
$$ x_{t+1} - \mu_{t+1} = A_o ( x_t - \mu_t) + C w_{t+1}. \EQN diff4 $$
From equation \Ep{diff4},
we can compute the law of motion of the covariance matrices
$ \Sigma_t \equiv E(x_t - \mu_t) (x_t - \mu_t)'$:
$$ E(x_{t+1} - \mu_{t+1}) (x_{t+1} -\mu_{t+1})' = A_o E(x_t -\mu_t) (x_t-\mu_t)'A_o'
+ C C'$$
or
$$ \Sigma_{t+1} = A_o \Sigma_t A_o' + C C' . $$
A fixed point of this matrix difference equation evidently satisfies
$$ \Sigma_\infty = A_o \Sigma_\infty A_o' + C C' . \EQN diff5 $$
A fixed point $ \Sigma_\infty $
is the covariance matrix $E (x_t -\mu) (x_t - \mu)'$ under a stationary distribution of $x$.
%
% Thus, to compute $C_x(0)$, we must solve
% $$ C_x(0) =
% A_o C_x(0) A_o^\prime + C C' ,\EQN diff5 $$
% where $C_x(0) \equiv E (x_t -\mu) (x_t - \mu)'$.
Equation \Ep{diff5} is
a {\it discrete Lyapunov} equation in the $n \times n$ matrix
$\Sigma_\infty$. It can be solved with the Matlab program {\tt
doublej.m}. % ^^|doublej.m|
\mtlb{doublej.m}%
By virtue of \Ep{diff1} and \Ep{diff2}, note that for $j \geq 0$
$$ (x_{t+j} - \mu_{t+j}) = A_o^j (x_t - \mu_t) + C w_{t+j} + \cdots
+ A_o^{j-1} C w_{t+1}. $$
Postmultiplying both sides by $(x_t - \mu_t)'$
and taking expectations shows that
the autocovariance sequence satisfies
$$ \Sigma_{t+j,t} \equiv E (x_{t+j} - \mu_{t+j}) (x_t - \mu_t)' =
A_o^j \Sigma_t. \EQN diff6 $$
Note that $\Sigma_{t+j,t}$ depends on both $j$, the gap between dates, and $t$, the earlier date.
In the special case that $\Sigma_t = \Sigma_\infty$ that solves the discrete Lyapunov equation
\Ep{diff5}, $\Sigma_{t+j,t} = A_o^j \Sigma_\infty$ and so depends only on the gap $j$ between time
periods. In this case, an autocovariance matrix sequence $\{\Sigma_{t+j,t}\}_{j=0}^\infty$ is often also called an {\it
autocovariogram}. \index{autocovariogram}%
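\smallskip
For the scalar autoregression considered above, the deviation process
$\tilde z_t = z_t - \alpha/(1-\rho)$ obeys $\tilde z_{t+1} = \rho \tilde z_t + w_{t+1}$,
so equation \Ep{diff5} collapses to the scalar equation
$\sigma_z^2 = \rho^2 \sigma_z^2 + 1$ with solution $\sigma_z^2 = 1/(1-\rho^2)$,
and equation \Ep{diff6} delivers the familiar first-order autoregressive
autocovariogram $E \tilde z_{t+j} \tilde z_t = \rho^j/(1-\rho^2)$ for $j \geq 0$.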
%
% Once \Ep{diff5}, is solved, the
% remaining second moments $\Sigma_{t+j,t}$ can be deduced from equation
% \Ep{diff6}.\NFootnote{Notice that
% $C_x(-j) = C_x(j)'$.}
Suppose that $y_t = G x_t$. Then
$\mu_{yt}= E y_t = G \mu_t$ and
$$ E (y_{t+j} - \mu_{yt+j}) (y_t - \mu_{yt})'
= G \Sigma_{t+j,t} G', \EQN ydiff2 $$
for $j=0, 1,\ldots$. Equations \Ep{ydiff2} show that %are matrix versions of
%the so-called \idx{Yule-Walker equations}, according to which