%\input texsis
%\book
%\singlespace
%\input epsf.tex
\input grafinp3
\input psfig
%\eqnotracetrue
%\showchaptIDfalse
\chapter{Time Series\label{timeseries}}
\footnum=0
\def\toone{{t+1}}
\def\ttwo{{t+2}}
\def\tthree{{t+3}}
\def\Tone{{T+1}}
\def\TTT{{T-1}}
\def\rtr{{\rm tr}}
%
%\def\frac#1/#2{\leavevmode\kern.1em
% \raise.5ex\hbox{\the\scriptfont0 #1}\kern-.1em
% /\kern-.15em\lower.25ex\hbox{\the\scriptfont0 #2}}
\def\frac#1#2{#1\over #2}
%
%\showchaptIDfalse
%\def\specsec#1{\medskip{\sc\noindent{#1}}}
%\line{\hfil \today}
\section{Two workhorses}
This chapter describes two tractable models
of time series: finite state \idx{Markov chain}s and first-order
stochastic linear difference equations. These models are
organizing devices that put restrictions on a
sequence of random vectors. They are useful because they describe
a time series with parsimony. In later chapters, we shall make
two uses each of Markov chains and stochastic linear difference
equations: (1) to represent the exogenous information flows
impinging on an agent or an economy, and (2) to represent an
optimum or equilibrium outcome of agents' decision making.
The Markov chain and the first-order stochastic linear difference
equation both use a sharp notion of a \idx{state} vector. A state vector
summarizes the information about the current position of a system that
is relevant for determining its future.
The Markov chain and the stochastic linear difference equation
will be useful tools for studying dynamic optimization
problems.
% ^^|wonkers.m| % this is how to use the index program for
% matlab programs
\index{stochastic!linear difference equations}
\section{Markov chains}
A stochastic process is a sequence of random vectors. For
us, the sequence will be ordered by a time index, taken to be the
integers in this book. So we study discrete time models. \index{stochastic!process}%
We study a discrete-state stochastic process with
the following property:
\medskip
\specsec{Markov Property:} A stochastic process $\{x_t\}$ is
said to have the {\it Markov property} if for all $k \geq 1$ and
all $t$,
$${\rm Prob}(x_{t+1}\vert x_t, x_{t-1}, \ldots, x_{t-k})
= {\rm Prob}(x_{t+1}\vert x_t) .$$
\medskip
We assume the Markov property and characterize the process by a
{\it Markov chain}. \index{Markov chain}
A time-invariant Markov chain is defined by a triple of objects, namely,
an $n$-dimensional state space consisting of vectors
$e_i, i=1, \ldots, n$, where $e_i$ is an $n\times 1$ unit vector
whose $i$th entry is $1$ and all other entries are zero;
an $n \times n$
{\it transition matrix} $P$, which \index{transition matrix}
records the probabilities of moving from one value of the state to
another in one period; and an $(n \times 1)$ vector $\pi_0$
whose $i$th element is the probability of being in state $i$ at time 0:
$ \pi_{0i} = {\rm Prob} (x_0 = e_i)$.
The elements of matrix $P$ are
$$ P_{ij} = {\rm Prob}(x_{t+1} = e_j \vert x_t = e_i).$$
For these interpretations to be valid, the matrix $P$ and the
vector $\pi_0$ must satisfy the following assumption:
\specsec{Assumption M:}
\item{a.} For $i=1, \ldots, n$, the matrix $P$ satisfies
$$ \sum_{j=1}^n P_{ij} =1. \EQN obA1$$
\medskip
\item{b.} The vector $\pi_0$ satisfies
$$ \sum_{i=1}^n \pi_{0i} =1.$$
A matrix $P$ that satisfies property \Ep{obA1} is called a
{\it stochastic matrix}. A stochastic matrix \index{stochastic!matrix}%
defines the
probabilities of moving from one value of the state to another
in one period. The probability of moving from one value of the
state to another in {\it two\/} periods is determined by $P^2$
because
$$\eqalign{ &{\rm Prob}(x_{t+2} =e_j \vert x_t = e_i) \cr
&=\sum_{h=1}^n {\rm Prob}(x_{t+2}=e_j \vert x_{t+1}=e_h)
{\rm Prob}(x_{t+1}=e_h \vert x_t = e_i) \cr
&=
\sum_{h=1}^n P_{ih} P_{hj} = P^{(2)}_{ij},\cr }$$
where $P^{(2)}_{ij}$ is the $i,j$ element of $P^2$. Let $P^{(k)}_{ij}$
denote the $i,j$ element of $P^k$.
By iterating on the preceding equation, we discover that
$$ {\rm Prob}(x_{t+k} = e_j \vert x_t = e_i) =
P^{(k)}_{ij} .$$
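\smallskip
For instance, for the purely illustrative two-state chain with transition matrix
$$ P =\left[\matrix{.9 & .1 \cr
                    .3 & .7 \cr } \right],$$
direct multiplication gives
$$ P^2 =\left[\matrix{.84 & .16 \cr
                      .48 & .52 \cr } \right],$$
so that, for example, ${\rm Prob}(x_{t+2} = e_2 \vert x_t = e_1) =
P^{(2)}_{12} = (.9)(.1) + (.1)(.7) = .16$.
\smallskip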
The unconditional probability distributions of $x_t$ are
determined by
$$\eqalign{\pi_1'= {\rm Prob} (x_1) &= \pi_0' P \cr
\pi_2'= {\rm Prob}( x_2) &= \pi_0' P^2 \cr
\vdots & \cr
\pi_k'={\rm Prob} (x_k) & = \pi_0' P^k ,\cr}$$
where $\pi_t'={\rm Prob} (x_t)$ is the $(1 \times n)$ vector whose
$i$th element is ${\rm Prob}(x_t = e_i)$.
\index{distribution!stationary}
\subsection{Stationary distributions}
Unconditional probability distributions evolve according to
$$ \pi_{t+1}' = \pi_t ' P. \EQN obA2$$
An unconditional
distribution is called {\it stationary\/} or {\it invariant\/}
if it satisfies
$$ \pi_{t+1} = \pi_t,$$
that is, if the unconditional
distribution remains unaltered with the passage of time.
From the law of motion \Ep{obA2} for unconditional
distributions, a stationary distribution must satisfy
$$ \pi' = \pi' P \EQN steadst1 $$
or
$$ \pi' (I - P) =0.$$
Transposing both sides of this equation gives
$$ (I-P') \pi =0, \EQN obA3$$
which determines $\pi$ as an eigenvector (normalized to satisfy
$\sum_{i=1}^n \pi_i = 1$) associated with a unit eigenvalue of $P'$.
We say that $P, \pi$ is a {\it stationary Markov chain\/} if the initial distribution
$\pi$ is such that \Ep{steadst1} holds.
The fact that $P$ is a stochastic matrix (i.e., it has nonnegative
elements and satisfies $\sum_j P_{ij} =1$ for all $i$) guarantees that
$P$ has at least one unit eigenvalue, and
that there is at least one eigenvector
$\pi$ that satisfies equation \Ep{obA3}.
This stationary distribution may not be
unique because $P$ can have a repeated unit eigenvalue.
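\smallskip
To illustrate, the two-state chain introduced above with
$P =\left[\matrix{.9 & .1 \cr .3 & .7 \cr}\right]$
has the unique stationary distribution
$\pi' = \left[\matrix{.75 & .25 \cr}\right]$, as can be verified by checking
that $\pi' P = \pi'$: $.9(.75)+.3(.25) = .75$ and $.1(.75)+.7(.25)=.25$.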
\smallskip
\noindent{\bf Example 1}. A Markov chain
$$ P =\left[\matrix{1 & 0 & 0 \cr
.2 & .5 & .3 \cr
0 & 0 & 1 \cr } \right] $$
has two unit eigenvalues with associated stationary distributions
$ \pi' = \left[\matrix{1 & 0 & 0 \cr}\right]$ and
$ \pi' = \left[\matrix{0 & 0 & 1 \cr}\right]$. Here states $1$ and $3$
are both {\it absorbing\/} states.
Furthermore, any initial distribution that puts zero probability
on state $2$ is a stationary distribution. See exercises {\it \the\chapternum.10\/} and
{\it \the\chapternum.11\/}.
\smallskip
\noindent{\bf Example 2}. A Markov chain
$$ P =\left[\matrix{.7 & .3 & 0 \cr
0 & .5 & .5 \cr
0 & .9 & .1 \cr } \right] $$
has one unit eigenvalue with associated stationary distribution
$ \pi' = \left[\matrix{0 & .6429 & .3571 \cr}\right]$.
Here states $2$ and $3$ form an {\it absorbing subset\/} of the state space.
\subsection{Asymptotic stationarity}
We often ask the following question
about a Markov process: for an arbitrary initial distribution
$\pi_0$, do the unconditional distributions
$\pi_t$ approach a stationary distribution
$$ \lim_{t \to \infty} \pi_t = \pi_\infty ,$$
where $\pi_\infty$ solves equation
\Ep{obA3}? If the answer is yes, then does the
limit distribution $\pi_\infty$ depend on the initial distribution
$\pi_0$? If the limit $\pi_\infty$ is independent of the
initial distribution $\pi_0$, we say that the process is
{\it asymptotically stationary with a unique invariant distribution}.
We call a solution $\pi_\infty$ a {\it stationary
distribution\/} or an {\it invariant distribution\/} of $P$.
\index{distribution!stationary} \index{distribution!invariant}
We state these concepts formally in the following definition:
%\specsec{Definition:} Let $\pi_\infty$ be a unique vector that
%satisfies
%$(I-P') \pi_\infty=0.$ If for all initial distributions
%$\pi_0$ it is true that $P^t{'} \pi_0$ converges to
%the same $\pi_\infty$, we say that the Markov chain
%is asymptotically stationary with a unique invariant distribution.
\medskip%\noindent{\sc Definition:}
\definition{def1} Let $\pi_\infty$ be a unique vector that
satisfies
$(I-P') \pi_\infty=0.$ If for all initial distributions
$\pi_0$ it is true that $P^t{'} \pi_0$ converges to
the same $\pi_\infty$, we say that the Markov chain
is asymptotically stationary with a unique invariant distribution.
\enddefinition
\medskip
The following theorems can be used to show
that a Markov chain is asymptotically stationary.
\medskip
%\medskip\noindent{\sc Theorem 1:}
\theorem{ch11} Let $P$ be a stochastic
matrix with $P_{ij} > 0 \ \forall (i,j)$. Then $P$ has
a unique stationary distribution, and the
process is asymptotically stationary.
\endtheorem
\medskip
%\noindent{\sc Theorem 2:}
\theorem{ch12} Let $P$ be a stochastic matrix
for which $P^{(n)}_{ij} > 0 \ \forall (i,j)$ for some
value of $n \geq 1$. Then $P$ has a unique stationary
distribution, and the process is asymptotically stationary.
\endtheorem
\medskip
\noindent The conditions of \Theorem{ch11} (and \Theorem{ch12}) state that from any
state there is a positive probability of moving to any other state
in one (or $n$) steps. Please note that some of the examples below will violate
the conditions of \Theorem{ch12} for any $n$.
\subsection{Forecasting the state}
The minimum mean squared error forecast of the state next period is the conditional
mathematical expectation:
$$ E [x_{t+1} | x_t = e_i] = \bmatrix{ P_{i1} \cr P_{i2} \cr \vdots \cr P_{in} \cr}
= P' e_i = P_{i,\cdot}' \EQN Pprep
$$
where $P_{i,\cdot}'$ denotes
the transpose of the $i$th row of the matrix $P$. In section \use{sec:HMM}
of this book's appendix \use{lincontrol}, we use this equation to motivate
the following first-order stochastic difference equation for the state:\index{Markov chain!as difference equation}%
$$ x_{t+1} = P' x_t + v_{t+1} \EQN Pdiffeqn $$
where $v_{t+1}$ is a random disturbance that evidently satisfies
$E [v_{t+1} | x_t] = 0 $.
Now let $\overline y$ be an $n\times 1$ vector of real numbers and define
$y_t = \overline y' x_t$, so that $y_t = \overline y_i$ if $x_t = e_i$.
Evidently, we can write
$$ y_{t+1} = \bar y' P' x_t + \bar y' v_{t+1}. \EQN observnonlinear $$
The pair of equations \Ep{Pdiffeqn}, \Ep{observnonlinear} becomes a simple
example of a \idx{hidden Markov model}
when the observation $y_t$ is too coarse to reveal the state. See section
\use{sec:HMM} of technical appendix \use{lincontrol} for a discussion of such models.
\subsection{Forecasting functions of the state}
From the conditional and unconditional probability
distributions that we have listed, it follows that the unconditional
expectations of $y_t$ for $t \geq 0$ are determined by
$ E y_t = (\pi_0' P^t) \overline y$.
%or
%$$\eqalign{ E y_0 &= \pi_0' \overline y \cr
% E y_1 & = \pi_0 ' P \overline y \cr
% & \vdots \cr
% E y_t & = \pi_0 ' P^k \overline y. \cr } $$
Conditional expectations are determined by
\index{conditional expectation}%
$$\EQNalign{& E (y_{t+1}| x_t = e_i) =
\sum_j P_{ij} \overline y_j
= (P \overline y)_i \EQN conde1 \cr
& E(y_{t+2} | x_t =e_i) = \sum_k P_{ik}^{(2)} \overline y_k
= (P^2 \overline y)_i \EQN conde2 \cr} $$
and so on, where $P_{ik}^{(2)}$ denotes the $(i,k)$ element of $P^2$ and $(\cdot)_i$ denotes
the $i$th row of the matrix $(\cdot)$. An equivalent formula
from \Ep{Pdiffeqn}, \Ep{observnonlinear} is
$E [y_{t+1} | x_t] = \bar y' P' x_t = x_t' P \bar y$, which
equals $(P\bar y)_i$ when $x_t = e_i$.
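\smallskip
For the illustrative two-state chain with
$P =\left[\matrix{.9 & .1 \cr .3 & .7 \cr}\right]$ and
$\overline y = \left[\matrix{10 & 0 \cr}\right]'$, formulas \Ep{conde1} and
\Ep{conde2} give $E(y_{t+1}|x_t = e_1) = (P \overline y)_1 = 9$ and
$E(y_{t+2}|x_t = e_1) = (P^2 \overline y)_1 = 8.4$.
\smallskip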
Notice that
$$ \eqalign{ E [ E (y_{t+2} | x_{t+1} = e_j) | x_t =
e_i] & = \sum_j P_{ij} \sum_k P_{jk} \overline y_k \cr
= \sum_k (\sum_j P_{ij} P_{jk}) \overline y_k & = \sum_k P_{ik}^{(2)}
\overline y_k = E(y_{t+2} | x_t = e_i). \cr} $$
Connecting the first and last terms in this string of equalities
yields $E [E(y_{t+2}|x_{t+1})| x_t ] = E [y_{t+2} | x_t]$.
This is an example of the
``law of iterated expectations.'' The \idx{law of iterated expectations}
states that for any random variable $z$ and two information sets
$J, I$ with $J \subset I$, $E [E(z | I)|J]=E(z|J)$.
As another example of the law of
iterated expectations, notice that
$$E y_1 = \sum_j \pi_{1,j} \overline y_j =
\pi_1' \overline y = (\pi_0' P) \overline y
= \pi_0' (P \overline y) $$
and that
$$ E[ E(y_1 | x_0 =e_i) ]% = \sum_j P_{ij} \overline x_j
= \sum_i \pi_{0,i} \sum_j P_{ij} \overline y_j
= \sum_j (\sum_i \pi_{0,i} P_{ij}) \overline y_j = \pi_1' \overline y
= E y_1 .$$
%$$ E (x_{t+k} \vert x_t= \overline x) = P^k \overline x . $$
%
%Notice that
%$$\eqalign{ E (x_t) & = \pi_t' \overline x = (\pi_0' P^t) \overline x \cr
% & = \pi_0' (P^t \overline x) \cr
% & = E [E (x_t \vert x_0 = \overline x )] .\cr}$$
%The statement that $E (x_t ) = E (E x_t \vert x_0 = \overline x)$
%is an example of the {\it law of iterated expectations}.
\index{law of iterated expectations}
\subsection{Forecasting functions}\label{sec:resolvent}
There are powerful formulas for forecasting functions
of a Markov state.
Again, let $\overline y $ be an $n\times 1$ vector and
consider the random variable $y_t = \overline y' x_t$.
Then
$$ E[y_{t+k} \vert x_t = e_i] = (P^k \overline y)_i $$
where $(P^k \overline y)_i$ denotes the $i$th row of $P^k \overline y$.
Stacking all $n$ rows together, we express this as
$$ E[y_{t+k} | x_t] = P^k \overline y . \EQN foreformulak $$
We also have
$$ \sum_{k=0}^\infty \beta^k E [y_{t+k} \vert x_t =
e_i ]
= [ (I -\beta P)^{-1} \overline y ]_i,$$
where $\beta \in (0,1)$ guarantees existence of $(I -\beta P)^{-1}
= (I + \beta P + \beta^2 P^2 + \cdots \, )$.
The matrix $(I -\beta P)^{-1}$ is called a ``resolvent operator.''
\index{resolvent operator}% GGGG
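A useful check on this formula comes from taking $\overline y$ to be a vector of ones,
so that $y_t = 1$ for all $t$. Because each row of $P^k$ sums to one,
$$ (I -\beta P)^{-1} \overline y = \sum_{k=0}^\infty \beta^k P^k \overline y
= {1 \over 1 - \beta}\, \overline y , $$
which is the present value of a constant stream of ones, as it should be.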
\subsection{Enough one-step-ahead forecasts determine $P$}
One-step-ahead forecasts of
a sufficiently rich set of random variables characterize a Markov
chain. In particular,
one-step-ahead conditional expectations of $n$ independent
functions (i.e., $n$ linearly independent vectors $h_1, \ldots, h_n$)
uniquely determine the transition matrix $P$. Thus, let
$h_{k,t+1} = h_k' x_{t+1}$ and note that $E[h_{k,t+1} \vert x_t = e_i] = (P h_k)_i $. We can collect the
conditional expectations of $h_k$ for all initial states $i$ in
an $n \times 1$ vector
$E [h_{k,t+1} \vert x_t] = P h_k$.
We can then collect conditional expectations for the $n$
independent vectors $h_1, \ldots, h_n$ as $P h = J$ where
$h=\left[\matrix{h_1 & h_2 & \ldots & h_n\cr}\right]$ and
$J$ is the $n \times n$ matrix consisting of all conditional expectations of
all $n$ vectors $h_1, \ldots , h_n$. If we know $h$ and $J$, we
can determine $P$ from $P= J h^{-1}$.
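For instance, taking $h_k = e_k$, the $k$th unit vector, gives $h = I$ and
$J = P h = P$, so the matrix of one-step-ahead conditional expectations of the
$n$ indicator variables of the states reveals $P$ directly.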
\subsection{Invariant functions and ergodicity}
Let $P, \pi$ be a stationary $n$-state Markov chain with the
state space
$X= [e_i, i=1, \ldots, n]$. An $n \times 1$ vector $\overline y$
defines a random variable $y_t = \overline y' x_t$.
%Thus, a random variable is another term for ``function of the underlying
%Markov state.''
Let $E [y_\infty | x_0]$ be the expectation of $y_s$ for $s$ very large,
conditional on the initial state.
The following is a useful precursor to a law of large numbers:
\medskip
\theorem{lawlargenumbers0}
Let $\overline y$ define a random variable as a function of an underlying
state $x$, where $x$ is governed by a stationary
Markov chain $(P, \pi)$. Then
$$ {1 \over T} \sum_{t=1}^T y_t \rightarrow E [y_\infty | x_0]
\EQN lawlarge0 $$
with probability $1$.
\endtheorem
\medskip
To illustrate \Theorem{lawlargenumbers0}, consider the following example:
\medskip
\noindent{\bf Example:}
Consider the Markov chain $P = \bmatrix{ 1 & 0 \cr 0 & 1\cr}, \pi_0 = \bmatrix{p \cr (1-p)\cr}$ for
$p \in (0,1)$. Consider the random variable $y_t = \bar y' x_t$ where $\bar y = \bmatrix{10 \cr 0 \cr}$. The chain has two possible sample
paths, $y_t = 10, t\geq 0$, which occurs with probability $p$ and $y_t = 0, t\geq 0$, which occurs with probability $1-p$.
Thus, $ {1 \over T} \sum_{t=1}^T y_t \rightarrow 10$ with probability $p$ and ${1 \over T} \sum_{t=1}^T y_t \rightarrow 0$ with
probability $(1-p)$.
\medskip
The outcomes in this example indicate why we might want something more than \Ep{lawlarge0}.
In particular, we would like to be free to
replace $E[y_\infty | x_0]$ with the constant unconditional
mean $E[y_t] = E[y_0]$ associated with the stationary distribution $\pi$.
To get this outcome, we must strengthen
what we assume about $P$ by using the following concepts.
Suppose that $(P, \pi)$ is a stationary Markov chain. Imagine repeatedly drawing $x_0$ from $\pi$ and
then generating $x_t, t \geq 1$ by successively drawing from transition densities given by the matrix $P$.
We use the following definition:
\medskip
\definition{invariant} A random variable $y_t = \overline y' x_t $ is said to
be {\it invariant\/} if
$y_t = y_0, t \geq 0$,
for all realizations of $x_t, t \geq 0$ that occur with positive probability under $(P, \pi)$.
%$\bar y_i = \bar y_j$ for all $i,j$ that occur with
%positive probability according to the stationary probability distribution
%$\pi$.
\enddefinition
\medskip
\noindent Thus, a random variable $y_t$ is invariant (or ``an invariant function
of the state'') if it remains constant
at $y_0$ while the underlying state $x_t$ moves through the state space $X$. Notice how
the definition leaves open the possibility that $y_0$ itself might differ across sample paths indexed by
different draws of the initial condition
$x_0$ from the initial (and stationary) density $\pi$.
%
%Evidently, ${\rm Prob}(y_{t+1} = \bar y_j | y_t = \bar y_i) = P_{ij}$. If $P_{ij} >0 $ for all $(i,j)$,
%then the only way that we can have $y_{t+1} = y_t$ for all realizations is if we have $\bar y_i = \bar y_j$
%for all $i, j$. More generally, the only way that we can have $y_{t+1} = y_t = \bar y_i$ for sure is that
%$\bar y_j = \bar y_i $ whenever $P_{ij} >0$. Thus, we can say that if $y_t = \bar y x_t$ is an invariant variable for Markov chain
%with transition matrix $P$, then $P\bar y = \bar y$, which is to say, $\bar y$ is a right eigenvector of $P$ associated
%with a unit eigenvalue.
%
%If $P_{ij} > 0$ for all $(i,j)$, then the only eigenvector associated with a unity eigenvector has the form $ \bar y_o {\bf 1} $,
%where $y_o$ is a real scalar and ${\bf 1}$ is the $n \times 1$ unit vector. If $P_{ij} = 0$ for some $(i,j)$, there
%can be eigenvectors that are not of the form $ \bar y_o {\bf 1} $, as illustrated in the following example.
%
%\medskip
%\specsec{Example:} Consider the stochastic matrix
%$$ P = \bmatrix{ 1 & 0 & 0 \cr
% 0 & 1 & 0 \cr
% .25 & .25 & .5} .$$
%A right eigenvector of $P$ associated with a unit eigenvalue is $\bmatrix{ 2 & -2 & 0 } $, which evidently is not constant across states.
%
%\medskip
The stationary Markov chain $(P, \pi)$ induces a joint density $f(x_{t+1}, x_t)$ over $(x_{t+1}, x_t)$ that
is independent of calendar time $t$; $P, \pi$ and the definition $y_t = \overline y' x_t$ also
induce a joint density $f_y(y_{t+1}, y_t)$ that is independent of calendar time. In what follows,
we compute mathematical expectations with respect to the joint density $f_y(y_{t+1}, y_t)$.
For a finite-state Markov chain, the following theorem gives a convenient
way to characterize
invariant functions of the state.
\medskip
\theorem{invariant2200} Let $(P, \pi)$ be a stationary Markov chain. %and assume that $\pi$ puts positive
%probability on all $x_t \in X$.
If
$$ E [y_{t+1} | x_t] = y_t \EQN invariant22 $$
then the random variable $y_t = \overline y' x_t$
is invariant.
\endtheorem
\proof By using the law of iterated expectations, notice that
$$\eqalign{ E (y_{t+1} - y_t)^2 &= E[E(y_{t+1}^2 - 2 y_{t+1} y_t
+ y_t^2)|x_t] \cr
& = E[ E( y_{t+1}^2 | x_t) - 2 E (y_{t+1}| x_t) y_t + y_t^2 ] \cr
& = E y_{t+1}^2 - 2 E y_t^2 + E y_t^2 \cr
& =0 \cr } $$
where the middle term on the right side of the second line
uses that $E[y_{t}|x_t]= y_t$, the middle term on the right side of the third line
uses the hypothesis \Ep{invariant22}, and the third line uses the hypothesis that $\pi$
is a stationary distribution. In a finite Markov chain, if
$E (y_{t+1} - y_t)^2=0$, then $y_{t+1} = y_t$ for all $y_{t+1}, y_t$ that
occur with positive probability under the stationary distribution.
\endproof
\medskip
As we shall have reason to study in chapters \use{selfinsure} and
\use{incomplete},
{\it any\/} (not necessarily stationary) stochastic process $y_t$ that satisfies
\Ep{invariant22} is said to be a {\it martingale\/}.
\Theorem{invariant2200} tells us that a martingale that is a function of a finite-state
stationary Markov state $x_t$ must be constant over time. This result is a special case of the
martingale convergence theorem that underlies some remarkable results about savings to be studied
in chapter \use{selfinsure}.\NFootnote{\Theorem{invariant2200} tells us that a stationary martingale process
has so little freedom to move that it has to be constant forever, not just eventually, as asserted
by the martingale convergence theorem.}
Equation \Ep{invariant22} can be expressed as
$ P \overline y = \overline y$
or
$$ (P - I)\overline y = 0, \EQN invariant3 $$
which states that an invariant function of the state is
a (right) eigenvector of $P$ associated with a unit eigenvalue.
Thus, associated with unit eigenvalues of $P$ are (1) left eigenvectors that are stationary distributions of the chain (recall equation \Ep{obA3}),
and (2) right eigenvectors that are invariant functions of the chain (from equation \Ep{invariant3}).
\definition{ergodicity} Let $(P, \pi)$ be a stationary Markov chain.
The chain is said to be {\it ergodic\/} if the only invariant
functions $\overline y$ are constant with probability $1$ under the stationary unconditional probability
distribution $\pi$, i.e.,
$\overline y_i = \overline y_j$ for all $i, j$ with $\pi_i >0, \pi_j >0$.
\enddefinition
\medskip
\specsec{Remark:} Let $\tilde \pi^{(1)}, \tilde \pi^{(2)}, \ldots, \tilde \pi^{(m)}$ be
$m$ distinct `basis' stationary distributions for an $n$-state Markov chain with transition matrix
$P$. Each $\tilde \pi^{(k)}$ is an $(n \times 1)$ left eigenvector of $P$ associated with a distinct
unit eigenvalue. Each $\tilde \pi^{(k)}$ is scaled to be a probability vector (i.e., its components are nonnegative and sum to unity).
The set $S$ of {\it all\/} stationary distributions is convex. An element $\pi_b \in S$ can be represented
as
$$ \pi_b = b_1 \tilde \pi^{(1)} + b_2 \tilde \pi^{(2)} + \cdots + b_m \tilde \pi^{(m)} , $$
where $(b_1, \ldots, b_m)$ is a probability vector, i.e., $b_j \geq 0$ and $\sum_j b_j =1$.
\medskip
\specsec{Remark:} A stationary density $\pi_b$ for which the pair $(P, \pi_b)$ is an ergodic Markov chain
is an extreme point of the convex set $S$, meaning that it can be represented
as $ \pi_b = \tilde \pi^{(j)} $
for one of the `basis' stationary densities.
\medskip
\noindent A law of large numbers for Markov chains is:
\medskip
\theorem{LLNMarkov}
Let $\overline y$ define a random variable on a stationary and ergodic
Markov chain $(P, \pi)$. Then
$$ {1 \over T} \sum_{t=1}^T y_t \rightarrow E[y_0]
\EQN lawlarge1 $$
with probability $1$.
\endtheorem
\medskip
This theorem tells us that the time series average
converges to the population mean of the stationary distribution.
\medskip
Three examples illustrate these concepts.
\medskip
\noindent{\bf Example 1.} A chain with transition matrix
$P=\left[\matrix{0 & 1 \cr 1 & 0\cr}\right]$ has a unique stationary
distribution $ \pi=\left[\matrix{.5 & .5 \cr}\right]'$ and
the invariant functions are $\left[\matrix{\alpha & \alpha \cr}\right]'$
for any scalar $\alpha$. Therefore, the process is ergodic and
\Theorem{LLNMarkov} applies.
\medskip
\noindent{\bf Example 2.} A chain with transition matrix
$P=\left[\matrix{1 & 0 \cr 0 & 1\cr}\right]$ has a continuum of
stationary distributions
$\gamma \left[\matrix{1 \cr 0 \cr}\right]+
(1- \gamma )\left[\matrix{0 \cr 1 \cr}\right]$ for any $\gamma \in [0,1]$ and
invariant functions
$\left[\matrix{0 \cr \alpha_1 \cr}\right] $ and
$\left[\matrix{\alpha_2 \cr 0 \cr}\right]$ for any scalars $\alpha_1, \alpha_2$. Therefore,
the process is not ergodic when $\gamma \in (0,1)$, because neither invariant function is
constant across states that receive positive probability according to a stationary distribution associated with $\gamma \in (0,1)$.
Therefore, the conclusion \Ep{lawlarge1} of \Theorem{LLNMarkov} does not hold for an initial stationary distribution associated with
$\gamma \in (0,1)$,
although the weaker result \Theorem{lawlargenumbers0} does hold. When $\gamma \in (0,1)$, nature chooses state $i=1$ or
$i=2$ with probabilities $\gamma, 1-\gamma$, respectively,
at time $0$. Thereafter, the chain remains stuck in the realized time $0$ state. Its failure ever to
visit the unrealized state prevents the sample average from converging to the population mean of an arbitrary function $\bar y$ of the state.
Notice that conclusion \Ep{lawlarge1} of \Theorem{LLNMarkov} does hold for the stationary distributions associated
with $\gamma=0$ and $\gamma=1$.
\medskip
\noindent{\bf Example 3.}
A chain with transition matrix
$P=\left[\matrix{.8 & .2 & 0 \cr .1 & .9 & 0 \cr
0 & 0 & 1\cr}\right]$ has a continuum of
stationary distributions
$ \gamma \left[\matrix{ {1\over 3} & {2 \over 3} & 0 \cr}\right]'
+(1- \gamma) \left[\matrix{ 0 & 0 & 1 \cr}\right]' $ for $\gamma \in [0,1]$ and
invariant functions
$ \alpha_1\left[\matrix{1 & 1 & 0 \cr}\right]'$ and
$ \alpha_2\left[\matrix{0 & 0 & 1 \cr}\right]'$
for any scalars $\alpha_1, \alpha_2$.
The conclusion \Ep{lawlarge1} of \Theorem{LLNMarkov} does not hold for
the stationary distributions associated with $\gamma \in (0,1)$,
but \Theorem{lawlargenumbers0} does hold.
But again, conclusion \Ep{lawlarge1} does hold for the stationary distributions associated with $\gamma =0$ and $\gamma=1$.
\subsection{Simulating a Markov chain}
It is easy to simulate a Markov chain using a random number
generator. The Matlab program {\tt markov.m}
%^^|markov.m|
\mtlb{markov.m}
does the job.
We'll use this program in some later chapters.\NFootnote{An index
in the back of the book lists Matlab programs.}
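The idea behind such a simulation can be stated as a formula (a sketch of one
standard method, not necessarily the one coded in {\tt markov.m}): given
$x_t = e_i$, draw a random number $u_{t+1}$ distributed uniformly on $[0,1]$
and set $x_{t+1} = e_j$ for the smallest $j$ that satisfies
$$ \sum_{k=1}^{j} P_{ik} \geq u_{t+1} . $$
State $j$ is then drawn with probability $P_{ij}$, as required; the same device
applied to $\pi_0$ instead of the $i$th row of $P$ initializes the chain.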
%available at
% $<$ftp://zia.stanford.edu/\raise-4pt\hbox{\~{}}sargent/pub/webdocs/matlab$>$.}
\subsection{The likelihood function}
Let $P$ be an $n \times n$ stochastic matrix
with states $1 ,2, \ldots, n$.
Let $\pi_0$ be
an $n \times 1$ vector with nonnegative elements summing to
$1$, with $\pi_{0,i}$ being the probability that the state is
$i$ at time $0$.
Let $i_t$ index the state at time
$t$.
The Markov property implies that the probability of drawing the path
$(x_0, x_1, \ldots, x_{T-1}, x_T) = (\overline e_{i_0}, \overline e_{i_1},
\ldots, \overline e_{i_{T-1}}, \overline e_{i_T})$ is
$$\eqalign{ L & \equiv {\rm Prob}( x_T = \overline e_{i_T},
x_{T-1} = \overline e_{i_{T-1}}, \ldots, x_1 = \overline e_{i_1}, x_0 = \overline e_{i_0}) \cr
& = P_{i_{T-1}, i_T} P_{i_{T-2}, i_{T-1}}
\cdots P_{i_0, i_1} \pi_{0,i_0}. \cr} \EQN likeli1
$$
The probability $L$ is called the {\it likelihood}. It is a
function of both the sample realization $x_0, \ldots , x_T$ and
the parameters of the stochastic matrix $P$. \index{likelihood
function} For a sample $x_0, x_1, \ldots, x_T$, let $n_{ij}$ be
the number of times that there occurs a one-period transition from
state $i$ to state $j$. Then the likelihood function can be
written
$$ L = \pi_{0,{i_0}} \prod_i\ \prod_j P_{i,j}^{n_{ij}},$$
a {\it multinomial\/} distribution.
\index{distribution!multinomial} \index{maximum likelihood}
\index{likelihood function!multinomial}
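\smallskip
For instance, for the illustrative two-state chain with
$P =\left[\matrix{.9 & .1 \cr .3 & .7 \cr}\right]$ and
$\pi_0' = \left[\matrix{.5 & .5 \cr}\right]$, the likelihood of the sample path
$(i_0, i_1, i_2, i_3) = (1,1,2,1)$ is
$$ L = \pi_{0,1} P_{11} P_{12} P_{21} = (.5)(.9)(.1)(.3) = .0135 . $$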
Formula \Ep{likeli1} has two uses. A first, which we
shall encounter often, is to describe the
probability of alternative histories of a Markov
chain. In chapter \use{recurge},
we shall use this formula to study prices and allocations
in competitive equilibria.
A second use is for estimating the parameters of a model
whose solution is a Markov chain.
Maximum likelihood estimation for free parameters $\theta$ of a
Markov process
works as follows. Let the transition matrix $P$
and the initial distribution
$\pi_0$ be functions $P (\theta), \pi_0(\theta)$
of a vector of free parameters $\theta$.
Given a sample $\{x_t\}_{t=0}^T $, regard the likelihood function
as a function of the parameters $\theta$. As the
estimator of $\theta$, choose the value that maximizes the
likelihood function $L$.
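\smallskip
A leading special case illustrates the procedure. When every element of $P$ is
a free parameter and the contribution of $\pi_0$ is either known or neglected,
maximizing $\log L = \log \pi_{0,i_0} + \sum_i \sum_j n_{ij} \log P_{ij}$
subject to the constraints $\sum_j P_{ij} = 1$, row by row with a Lagrange
multiplier on each constraint, yields the frequency estimator
$$ \hat P_{ij} = {n_{ij} \over \sum_k n_{ik}}, $$
the fraction of one-period transitions out of state $i$ that go to state $j$.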
\section{Continuous-state Markov chain}
In chapter \use{recurge}, we shall use a somewhat different
notation to express the same ideas. This alternative notation can
accommodate either discrete- or continuous-state Markov chains. We
shall let $S$ denote the state space with typical element $s \in
S$. Let state transitions be described by the cumulative distribution function
$\Pi(s' | s) = {\rm
Prob} (s_{t+1} \leq s' | s_t =s)$ and let the initial state $s_0$ be described by
the cumulative distribution function $\Pi_0(s) = {\rm
Prob} (s_0 \leq s)$. The {\it transition density\/} is $\pi(s' | s ) = {d \over d s'} \Pi(s'|s)$
and the initial density is % {\rm
%Prob} (s_{t+1} = s' | s_t =s)$ and the initial density is
$\pi_0(s) = {d \over d s} \Pi_0(s)$. For all $s\in S, \pi(s'|s )
\geq 0$ and $ \int_{s'} \pi(s'|s) d s' =1$; also $\int_s \pi_0(s) d s
=1$.\NFootnote{Thus, when $S$ is discrete,
$\pi(s_j|s_i)$ corresponds
to $P_{i,j}$ in our earlier notation.}
Corresponding to \Ep{likeli1}, the likelihood function or density
over the history $s^t = [s_t, s_{t-1}, \ldots, s_0]$
is
$$ \pi(s^t) = \pi(s_t | s_{t-1} )\cdots \pi(s_1| s_0) \pi_0 (s_0).
\EQN likeli2 $$
For $t\geq 1$, the time $t$ unconditional distributions
evolve according to
$$ \pi_t(s_t) = \int_{s_{t-1}} \pi(s_t|s_{t-1}) \pi_{t-1} (s_{t-1})
d \, s_{t-1} .$$
A stationary or {\it invariant\/} distribution
satisfies
$$ \pi_\infty(s') = \int_s \pi(s'|s) \pi_\infty (s) d \, s, $$
which is the counterpart to
\Ep{steadst1}.
\medskip
\specsec{Definition:} A Markov chain $\bigl(\pi(s'|s), \pi_0(s)\bigr)$ is said
to be {\it stationary\/} if $\pi_0$ satisfies
$$ \pi_0(s') = \int_s \pi(s'|s) \pi_0 (s) d \, s. $$
\medskip
\specsec{Definition:} Paralleling our discussion of finite-state Markov chains,
we can say that the function $\phi(s) $ is {\it invariant\/} if
$$ \int \phi(s') \pi(s'| s) d s' = \phi(s). $$
A stationary continuous-state Markov process is said to be {\it ergodic\/}
if the only invariant functions $\phi(s')$ are constant with probability
$1$ under the stationary distribution $\pi_\infty$.
\medskip
A law of large numbers for Markov processes states:
\medskip
\theorem{LLNMarkovcontinuous}
Let $y(s)$ be a random variable, a measurable function of $s$,
and let
$\bigl(\pi(s'|s),\pi_0(s)\bigr)$ be a stationary and ergodic continuous-state
Markov process. Assume that $E |y| < +\infty$. Then
$$ {1 \over T} \sum_{t=1}^T y_t \rightarrow E y
= \int y(s) \pi_0(s) ds $$
with probability $1$ with respect to the distribution $\pi_0$.
\endtheorem
\index{second-moment restrictions}
\section{Stochastic linear difference equations}\label{sec:stochlinear}%
The first-order linear vector stochastic difference equation
is a useful example of a continuous-state Markov process.
Here we use
$x_t \in {\bbR}^n$ rather than $s_t$ to denote the time $t$ state
and
specify that the initial distribution $\pi_0(x_0)$ is Gaussian with mean
$\mu_0$ and covariance matrix $\Sigma_0$, and that
the transition density $\pi(x'|x) $ is Gaussian with
mean $A_o x$ and covariance $C C'$.\NFootnote{An $n \times 1$ vector
$z$ that is multivariate normal has the density
function $$ f(z) = (2 \pi)^{-.5 n} |\Sigma|^{-.5} \exp (-.5 (z - \mu)' \Sigma^{-1} (z-\mu)
) $$
where $ \mu = E z$ and $\Sigma = E (z-\mu)(z-\mu)'$.}
This specification pins down the joint distribution of the
stochastic process $\{x_t\}_{t=0}^\infty$ via formula
\Ep{likeli2}.
The joint distribution determines
all moments of the process.
This specification can be represented in terms of
the first-order stochastic linear difference equation
$$ x_{t+1} = A_o x_t + C w_{t+1} \EQN diff1 $$
for $t = 0, 1 ,\ldots$,
where $x_t$ is an $n \times 1$ state vector, $x_0$ is a random
initial condition drawn from a probability distribution with mean $E x_0 = \mu_0$
and covariance matrix $E (x_0 - \mu_0)(x_0 - \mu_0)' = \Sigma_0$,
$A_o$ is an $n \times n$ matrix, $C$ is an $n \times m$
matrix, and $w_{t+1}$ is an $m \times 1$ vector satisfying
the following:
\medskip
\specsec{Assumption A1:} $w_{t+1}$ is an i.i.d.\ process satisfying
$w_{t+1} \sim {\cal N}(0, I)$.
\medskip
We can weaken the Gaussian assumption A1.
To focus only on first and second moments of the $x$ process,
it is sufficient to make the weaker assumption:
\medskip
\specsec{Assumption A2:} $w_{t+1}$ is an $m \times 1$ random vector
satisfying:
$$ \EQNalign{ E w_{t+1} \vert J_t & = 0 \EQN wprop1;a \cr
E w_{t+1} w_{t+1}'\vert J_t & = I , \EQN wprop1;b \cr}$$
where %$J_t = \left[\matrix{w_t & \cdots & w_1 & x_0 \cr} \right]$
$J_t = [w_t, w_{t-1}, \ldots, w_1, x_0]$
is the information set at $t$, and $E [ \ \cdot \ | J_t]$ denotes
the conditional expectation. We impose no distributional
assumptions beyond \Ep{wprop1}. A sequence $\{w_{t+1}\}$
satisfying equation \Ep{wprop1;a} is said to be a martingale
difference sequence adapted to $J_t$.
% A sequence
% $\{z_{t+1}\}$ that satisfies $E [z_{t+1}|J_t ] = z_t$ is said
% to be a martingale adapted to $J_t$.
\index{martingale!difference sequence}%
\index{martingale}%
\medskip
An even weaker assumption is
\specsec{Assumption A3:} $w_{t+1}$ is a process satisfying
$$ E w_{t+1} = 0 $$ for all $t$ and
$$ E w_{t} w_{t-j}' = \cases{ I, & if $ j=0$; \cr
0, & if $j \neq 0 $. \cr} $$
A process satisfying assumption A3 is said to be
a vector ``white noise.''\NFootnote{Note that \Ep{wprop1;a}
by itself allows the distribution of $w_{t+1}$ conditional on $J_t$ to be
heteroskedastic.}
\index{white noise}%
Assumption A1 or A2 implies assumption A3 but not vice versa. Assumption
A1 implies assumption A2 but not vice versa.
Assumption A3 is sufficient to justify the formulas that we report
below for second moments.
We shall often append an observation equation $y_t = G x_t$ to
equation \Ep{diff1}
and deal with the augmented system
$$ \EQNalign{x_{t+1} & = A_o x_t + C w_{t+1} \EQN statesp1;a \cr
y_t & = G x_t . \EQN statesp1;b \cr}$$
Here $y_t$ is a vector of variables observed at $t$, which
may include only some linear combinations of $x_t$. The
system \Ep{statesp1} is often called a linear {\it state-space system}.
\medskip
\noindent{\bf Example 1.} Scalar second-order autoregression:
Assume that $z_t$ and $w_t$ are scalar processes and that
$$ z_{t+1} = \alpha + \rho_1 z_t+ \rho_2 z_{t-1} + w_{t+1}. $$
Represent this relationship as
the system
$$ \eqalign{ \left[\matrix{z_{t+1} \cr
z_t \cr
1 \cr} \right] &=
\left[\matrix{\rho_1 & \rho_2 & \alpha\cr
1 & 0 & 0 \cr
0 &0 & 1 \cr}\right]
\left[\matrix{z_t \cr
z_{t-1} \cr
1 \cr} \right]
+ \left[\matrix{ 1 \cr
0 \cr
0 \cr} \right] w_{t+1} \cr
z_t & = \left[\matrix{1 & 0 & 0\cr} \right]
\left[\matrix{z_t \cr
z_{t-1} \cr
1 \cr} \right] \cr} $$
which has form \Ep{statesp1}.
\medskip
\noindent{\bf Example 2.} First-order scalar mixed moving
average and autoregression: Let
$$ z_{t+1} = \rho z_t + w_{t+1} + \gamma w_t. $$
Express this relationship as
$$ \eqalign{ \left[\matrix{z_{t+1} \cr
w_{t+1} \cr}\right]
& = \left[\matrix{\rho & \gamma \cr
0 & 0 \cr} \right]
\left[\matrix{ z_t \cr w_t \cr}\right]
+ \left[ \matrix{1 \cr 1 \cr}\right] w_{t+1} \cr
z_t & = \left[\matrix{1 & 0 \cr} \right]
\left[\matrix{z_t \cr
w_t \cr} \right]. \cr} $$
\noindent{\bf Example 3.} Vector autoregression:
Let $z_t$ be an $n \times 1$ vector of random variables.
We define a \idx{vector autoregression} by a stochastic
difference equation
$$ z_{t+1} = \sum_{j=1}^4 A_j z_{t+1-j} + C_y w_{t+1}, \EQN vecaug $$
where $w_{t+1}$ is an $n \times 1$ martingale difference
sequence satisfying equation
\Ep{wprop1} with $x_0' = \left[\matrix{z_0' & z_{-1}' &
z_{-2}' & z_{-3}'\cr} \right]$ and $A_j$ is an $n \times n$ matrix
for each $j$.
We can map equation \Ep{vecaug} into equation
\Ep{diff1} as follows:
$$ \left[\matrix{z_{t+1} \cr z_t \cr z_{t-1} \cr z_{t-2} \cr}\right]=
\left[\matrix{A_1 & A_2 & A_3 & A_4 \cr
I & 0 & 0 & 0 \cr
0 & I & 0 & 0 \cr
0 & 0 & I & 0 \cr}\right]
\left[\matrix{z_t \cr z_{t-1} \cr z_{t-2} \cr z_{t-3} \cr}\right]
+ \left[\matrix{C_y \cr 0 \cr 0 \cr 0 \cr} \right] w_{t+1} .
\EQN vecaug2 $$
Define $A_o$ as the state transition matrix in equation \Ep{vecaug2}.
Assume
that $A_o$ has all of its eigenvalues bounded in modulus
below unity. Then equation \Ep{vecaug} can be initialized
so that $z_t$ is {\it covariance stationary\/}, a term we
define soon.
\subsection{First and second moments}
We can use equation \Ep{diff1} to deduce
the first and second moments of the sequence of random vectors
$\{x_t\}_{t=0}^\infty$. A sequence of random vectors is called a
{\it stochastic process\/}.\index{stochastic!process}%
\index{covariance stationary}
\medskip
%\specsec{Definition:}
\definition{def:covstat} A stochastic process $\{x_t\}$ is said to
be {\it covariance stationary\/} if it
satisfies the following two properties: (a) the mean is
independent of time,
$ E x_t = E x_0$ for all $t$,
and (b) the sequence of autocovariance matrices
$E(x_{t+j} - E x_{t+j})(x_t - E x_t)'$
depends on the separation between dates
$j = 0, \pm 1, \pm 2, \ldots$, but not on $t$.
\enddefinition
\smallskip
We use
\definition{stable}
A square real-valued matrix $A_o$ is said to be {\it stable\/} if
all of its eigenvalues are strictly less than
unity in modulus.
\enddefinition
\smallskip
We shall often find it useful to assume that \Ep{statesp1} takes the
special form
$$ \left[\matrix{x_{1,t+1} \cr x_{2,t+1} \cr}\right]
= \left[\matrix{1 & 0 \cr 0 & \tilde A \cr}\right]
\left[ \matrix{x_{1,t} \cr x_{2t} \cr} \right]
+ \left[\matrix{0 \cr \tilde C\cr}\right] w_{t+1}
\EQN statesp10 $$
where $\tilde A$ is a stable matrix. That $\tilde A$ is a stable
matrix implies that the only solution of $(\tilde A - I) \mu_2
=0$ is $\mu_2=0$ (i.e., $1$ is {\it not\/} an eigenvalue of
$\tilde A$). It follows that the matrix $A_o=
\left[\matrix{1 & 0 \cr 0 & \tilde A \cr}\right]$
on the right side of \Ep{statesp10} has one eigenvector associated
with a single unit eigenvalue: $(A_o - I)\left[\matrix{\mu_1 \cr
\mu_2 \cr}\right] =0$ implies
$\mu_1$ is an arbitrary scalar and $\mu_2 =0$. The first equation
of \Ep{statesp10} implies that $x_{1,t+1} = x_{1,0}$ for all $t
\geq 0$. Picking the initial condition $x_{1,0}$ pins down a
particular eigenvector $\left[\matrix{x_{1,0} \cr 0 \cr}\right]$
of $A_o$. As we shall see soon, this eigenvector is our candidate
for the unconditional mean of $x$ that makes the process
covariance stationary.
We will make an assumption that guarantees that there exists an initial
condition $(\mu_0, \Sigma_0) = (Ex_0, E(x_0 - Ex_0) (x_0-Ex_0)')$ that makes
the $x_t$ process covariance stationary.
Either of the following conditions works:
\medskip
\specsec{Condition A1:} All of the eigenvalues of $A_o$ in
\Ep{statesp1} are strictly less than $1$ in modulus.
\medskip
\specsec{Condition A2:} The state-space representation takes the special
form \Ep{statesp10} and all of the eigenvalues of $\tilde A$ are strictly
less than $1$ in modulus.
\medskip
To discover the first and second moments of
the $x_t$ process,
we regard the initial condition $x_0$ as
being drawn from a distribution with mean $ \mu_0 = E x_0$ and
covariance $\Sigma_0 = E (x_0 - E x_0) (x_0 - E x_0)'$. We shall deduce
starting values for the mean and covariance
that make the process covariance
stationary, though our formulas are also useful for describing
what happens when we start from other initial conditions that
generate transient behavior that stops the process from being covariance
stationary.
Taking mathematical expectations on both sides of equation
\Ep{diff1} gives
$$ \mu_{t+1} = A_o \mu_t \EQN diff2 $$
where $\mu_t = E x_t $. We will assume that all of the eigenvalues
of $A_o$ are strictly less than unity in modulus, except possibly for one that
is affiliated with the constant terms in the various equations. Then
$x_t$ possesses a stationary mean defined to satisfy
$\mu_{t+1} = \mu_t$, which from equation \Ep{diff2} evidently satisfies
$$ (I - A_o) \mu = 0 , \EQN diff3 $$
which characterizes the mean $\mu$ as an eigenvector
associated with the single unit eigenvalue of $A_o$. The condition
that the remaining eigenvalues of $A_o$ are less than unity
in modulus implies that starting from any $\mu_0$, $\mu_t
\rightarrow \mu$.\NFootnote{To understand this, assume that the
eigenvalues of $A_o$ are distinct, and use the representation $A_o
= P \Lambda P^{-1}$ where $\Lambda$ is a diagonal matrix of the
eigenvalues of $A_o$, arranged in descending order of magnitude,
and $P$ is a matrix composed of the corresponding eigenvectors.
Then equation \Ep{diff2} can be represented as $\mu_{t+1}^* =
\Lambda \mu_t^*$, where $\mu_t^* \equiv P^{-1} \mu_t$, which
implies that $\mu_t^* = \Lambda^t \mu_0^*$. When all eigenvalues
but the first are less than unity, $\Lambda^t$ converges to a
matrix of zeros except for the $(1,1)$ element, and $\mu_t^*$
converges to a vector of zeros except for the first element, which
stays at $\mu_{0,1}^*$, its initial value, which we are free to set equal to
$1$, to capture the constant. Then $\mu_t = P \mu_t^*$ converges
to $P_1 \mu_{0,1}^* = P_1$, where $P_1$ is the eigenvector corresponding
to the unit eigenvalue.}
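To illustrate, consider a scalar first-order autoregression with a constant,
$z_{t+1} = \alpha + \rho z_t + w_{t+1}$ with $|\rho| < 1$, written in the
form \Ep{diff1} with state $x_t = \left[\matrix{1 & z_t \cr}\right]'$,
$$ A_o = \left[\matrix{1 & 0 \cr \alpha & \rho \cr}\right], \qquad
C = \left[\matrix{0 \cr 1 \cr}\right]. $$
Equation \Ep{diff3} requires $-\alpha \mu_1 + (1 - \rho)\mu_2 = 0$; normalizing
$\mu_1 = 1$ to capture the constant delivers the stationary mean
$\mu_2 = \alpha/(1-\rho)$ of the autoregression.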
Notice that
$$ x_{t+1} - \mu_{t+1} = A_o ( x_t - \mu_t) + C w_{t+1}. \EQN diff4 $$
From equation \Ep{diff4},
we can compute the law of motion of the covariance matrices
$ \Sigma_t \equiv E(x_t - \mu_t) (x_t - \mu_t)'$:
$$ E(x_{t+1} - \mu_{t+1}) (x_{t+1} -\mu_{t+1})' = A_o E(x_t -\mu_t) (x_t-\mu_t)'A_o'
+ C C'$$
or
$$ \Sigma_{t+1} = A_o \Sigma_t A_o' + C C' . $$
A fixed point of this matrix difference equation evidently satisfies
$$ \Sigma_\infty = A_o \Sigma_\infty A_o' + C C' . \EQN diff5 $$
A fixed point $ \Sigma_\infty $
is the covariance matrix $E (x_t -\mu) (x_t - \mu)'$ under a stationary distribution of $x$.
%
% Thus, to compute $C_x(0)$, we must solve
% $$ C_x(0) =
% A_o C_x(0) A_o^\prime + C C' ,\EQN diff5 $$
% where $C_x(0) \equiv E (x_t -\mu) (x_t - \mu)'$.
Equation \Ep{diff5} is
a {\it discrete Lyapunov} equation in the $n \times n$ matrix
$\Sigma_\infty$. It can be solved with the Matlab program {\tt
doublej.m}. % ^^|doublej.m|
\mtlb{doublej.m}%
By virtue of \Ep{diff1} and \Ep{diff2}, note that for $j \geq 0$
$$ (x_{t+j} - \mu_{t+j}) = A_o^j (x_t - \mu_t) + C w_{t+j} + \cdots
+ A_o^{j-1} C w_{t+1}. $$
Postmultiplying both sides by $(x_t - \mu_t)'$
and taking expectations shows that
the autocovariance sequence satisfies
$$ \Sigma_{t+j,t} \equiv E (x_{t+j} - \mu_{t+j}) (x_t - \mu_t)' =
A_o^j \Sigma_t. \EQN diff6 $$
Note that $\Sigma_{t+j,t}$ depends on both $j$, the gap between dates, and $t$, the earlier date.
In the special case that $\Sigma_t = \Sigma_\infty$ that solves the discrete Lyapunov equation
\Ep{diff5}, $\Sigma_{t+j,t} = A_o^j \Sigma_\infty$ and so depends only on the gap $j$ between time
periods. In this case, an autocovariance matrix sequence $\{\Sigma_{t+j,t}\}_{j=0}^\infty$ is often also called an {\it
autocovariogram}. \index{autocovariogram}%
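\smallskip
For the scalar autoregression considered above, the deviation process
$\tilde z_t = z_t - \alpha/(1-\rho)$ obeys $\tilde z_{t+1} = \rho \tilde z_t + w_{t+1}$,
so equation \Ep{diff5} collapses to the scalar equation
$\sigma_z^2 = \rho^2 \sigma_z^2 + 1$ with solution $\sigma_z^2 = 1/(1-\rho^2)$,
and equation \Ep{diff6} delivers the familiar first-order autoregressive
autocovariogram $E \tilde z_{t+j} \tilde z_t = \rho^j/(1-\rho^2)$ for $j \geq 0$.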
%
% Once \Ep{diff5}, is solved, the
% remaining second moments $\Sigma_{t+j,t}$ can be deduced from equation
% \Ep{diff6}.\NFootnote{Notice that
% $C_x(-j) = C_x(j)'$.}
Suppose that $y_t = G x_t$. Then
$\mu_{yt}= E y_t = G \mu_t$ and
$$ E (y_{t+j} - \mu_{yt+j}) (y_t - \mu_{yt})'
= G \Sigma_{t+j,t} G', \EQN ydiff2 $$
for $j=0, 1,\ldots$. Equations \Ep{ydiff2} show that %are matrix versions of
%the so-called \idx{Yule-Walker equations}, according to which