\input grafinp3
%\input grafinput8
\input psfig
%
%\showchaptIDtrue
%\def\@chaptID{21.}
%\input form1
%\chapternum=28
\def\cov{\mathop{\rm cov}\nolimits} \overfullrule=0pt
%\hbox{}
\footnum=0
\Appendix{B}{Control and Filtering\label{lincontrol}}
\section{Introduction}
By recursive techniques we mean the application of dynamic
programming to control problems and of Kalman filtering to
filtering problems. We describe classes of problems in
which the dynamic programming and the Kalman filtering algorithms
are formally equivalent, being tied together by {\it duality}.
By exploiting their equivalence, we reap
double dividends from any results that apply to one or the other
problem.\NFootnote{The concepts of controllability and reconstructibility
are used to establish conditions for the convergence and other
important properties of the recursive algorithms.}
The next-to-last section of this appendix contains statements of a few
facts about linear least-squares projections.
% Familiarity with Sargent [1987, Ch. 10]
%would also help the reader.
The final section briefly describes filtering problems
where the state evolves according to a finite-state Markov
process.
\index{optimal linear regulator}
\section{The optimal linear regulator control problem}
We briefly recapitulate the {\it optimal linear regulator\/}
problem. Consider a system with an $(n \times 1)$ {\it state\/} vector
$x_t$ and a $(k \times 1)$ {\it control\/} vector $u_t$. The system is
assumed to evolve according to the law of motion
$$x_{t+1} = A_tx_t + B_tu_t \qquad t = t_0, t_0+1, \ldots, t_1 - 1 ,\EQN
ori2.1$$
where $A_t$ is an $(n \times n)$ matrix and $B_t$ is an $(n \times k)$
matrix. Both
$A_t$ and $B_t$ are known sequences of matrices. We define the {\it return
function\/} at time $t,\ r_t (x_t, u_t)$, as the quadratic form
$$r_t (x_t,u_t) = - \left[x^\prime_t\, u^\prime_t\right]
\left[\matrix{R_t & W_t \cr
W^\prime_t & Q_t \cr}\right]\ \left[\matrix{x_t \cr u_t \cr}\right] \qquad
t = t_0, \ldots, t_1 -1$$
where $R_t$ is $(n \times n),\ Q_t$ is $(k \times k)$, and $W_t$ is $(n
\times k)$. We shall initially assume that the matrices $\Bigl[{R_t \atop
W^\prime_t} \, {W_t \atop
Q_t}\Bigr]$ are positive semidefinite, though subsequently we shall see that
the problem can still be well posed even if this assumption is weakened. We
are also given an $(n \times n)$ positive semidefinite matrix $P_{t_1}$, which
is used to assign a terminal value to the state $x_{t_1}$.
The {\it optimal linear regulator\/} problem is to maximize
$$\EQNalign{ - \sum^{t_1-1}_{t = t_0}\, &
\left[\matrix{x_t\cr u_t \cr}\right]'
\left[\matrix{R_t & W_t \cr W^\prime_t & Q_t \cr} \right]
\left[\matrix{x_t \cr u_t \cr}\right]\, - \, x^\prime_{t_1} P_{t_1} x_{t_1} \cr
\hbox{subject to }\qquad & x_{t+1} = A_t x_t + B_t u_t, \qquad x_{t_0}\
\hbox{ given} . \EQN ori2.2 \cr}$$
The maximization is carried out over the sequence of controls $(u_{t_0},
u_{t_0+1}, \hfill\break
\ldots, u_{t_1-1})$. This is a recursive or serial problem, which is
appropriate to solve using the method of dynamic programming. In this case,
the {\it value functions\/} are defined as the quadratic forms, $s = t_0, t_0
+ 1, \ldots, t_1 -1$,
$$\eqalign{- x^\prime_s P_s x_s = \max\ &
\biggl\{ - \sum^{t_1-1}_{t=s}\, \left[\matrix{x_t\cr u_t \cr}\right]'
\left[\matrix{R_t & W_t \cr W^\prime_t & Q_t \cr}\right]\ \left[\matrix{x_t \cr
u_t \cr}\right] - x^\prime_{t_1} P_{t_1} x_{t_1} \biggr\} \cr
\hbox{subject to} \quad & x_{t+1}=A_t x_t + B_t u_t ,\cr}\EQN ori2.3$$
$x_s$ given, $s = t_0, t_0 + 1, \ldots, t_1 -1$. The {\it Bellman equation\/}
becomes the following backward recursion in the quadratic forms $x^\prime_t\,
P_t\, x_t$:
\index{Bellman equation}
$$\eqalign{
x^\prime_t P_t x_t = \min_{u_t}\ &\Bigl\{x^\prime_t R_t x_t + u^\prime_t Q_t
u_t + 2 x^\prime_t W_t u_t + (A_t x_t + B_t u_t)^\prime \cr
& P_{t+1} (A_t x_t +B_t u_t) \Bigr\} ,\cr
&\hskip .75in t = t_1 -1, t_1 -2, \ldots, t_0\cr
&\hskip .75in \qquad P_{t_1} \hbox{ given .} \cr}\EQN ori2.4$$
Using the rules for differentiating quadratic forms, the
first-order necessary condition for the problem on the right side of equation
\Ep{ori2.4} is found by differentiating with respect to the vector $u_t$:
$$\left\{ Q_t + B^\prime_t P_{t+1} B_t \right\}u_t = -(B^\prime_t P_{t+1}
A_t + W^\prime_t) x_t.$$
Solving for $u_t$ we obtain
$$u_t = -(Q_t + B^\prime_t P_{t+1} B_t)^{-1} (B^\prime_t P_{t+1} A_t +
W^\prime_t) x_t .\EQN ori2.5$$
The inverse $(Q_t + B^\prime_t P_{t+1} B_t)^{-1}$ is assumed to exist.
Otherwise, it could be interpreted as a generalized inverse, and most of our
results would go through.
Equation \Ep{ori2.5} gives the optimal control in terms of a {\it feedback rule\/}
upon the state vector $x_t$, of the form
$$u_t = -F_t x_t \EQN ori2.6$$
where
$$F_t = (Q_t + B^\prime_t P_{t+1} B_t)^{-1} (B^\prime_t P_{t+1} A_t +
W^\prime_t) . \EQN ori2.7$$
Substituting equation \Ep{ori2.5} for $u_t$ into equation \Ep{ori2.4} and
rearranging gives the following recursion for $P_t$:
$$\eqalign{P_t = R_t + A^\prime_t P_{t+1} A_t - & (A^\prime_t P_{t+1} B_t + W_t)
\ (Q_t + B^\prime_t P_{t+1} B_t)^{-1}\cr
& (B^\prime_t P_{t+1} A_t + W^\prime_t) .\cr}\EQN ori2.8$$
Equation \Ep{ori2.8} is a version of the {\it matrix Riccati
difference equation}. \index{Riccati equation!matrix difference}
Equations \Ep{ori2.8} and \Ep{ori2.5} provide a recursive algorithm
for computing the optimal controls in feedback form. Starting at time
$(t_1 -1)$, and given
$P_{t_1}$, equation \Ep{ori2.5} is used to compute $u_{t_1-1} =
- F_{t_1-1} x_{t_1-1}$. Then equation \Ep{ori2.8} is used to
compute $P_{t_1-1}$. Then equation \Ep{ori2.5} is used to compute
$u_{t_1-2} = -F_{t_1-2} x_{t_1-2}$, and so on.
By substituting the optimal control $u_t = - F_t x_t$ into the state equation
\Ep{ori2.1}, we obtain the optimal {\it closed loop system\/} equations
$$x_{t+1} = (A_t - B_t F_t) x_t.$$
Eventually, we shall be concerned extensively with the properties of the
optimal closed loop system, and how they are related to the properties of $A_t,
\, B_t,\, Q_t,\, R_t$, and $W_t$.
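\medskip\noindent
For concreteness, here is a small worked illustration of the backward recursion;
the particular numbers are chosen purely for illustration and play no role
elsewhere. Take the scalar time-invariant case $A_t = B_t = Q_t = R_t = 1$,
$W_t = 0$, with terminal matrix $P_{t_1} = 0$. Equations \Ep{ori2.7} and
\Ep{ori2.8} then reduce to
$$F_t = {P_{t+1} \over 1 + P_{t+1}}, \qquad
P_t = 1 + P_{t+1} - {P^2_{t+1} \over 1 + P_{t+1}} ,$$
so that, working backward from $P_{t_1} = 0$, the iterates are
$(P_{t_1-1}, P_{t_1-2}, P_{t_1-3}, \ldots) = (1, {3\over 2}, {8\over 5}, \ldots)$
and $(F_{t_1-1}, F_{t_1-2}, F_{t_1-3}, \ldots) = (0, {1\over 2}, {3\over 5},
\ldots)$. As $t_1 - t \rightarrow \infty$ the $P_t$ iterates converge to the
positive root of $P = 1 + P - P^2/(1+P)$, namely $P = (1 + \sqrt 5\,)/2$.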
\dsection{Converting a problem with cross products in states and controls
to one with no such cross products}{Converting a problem with cross products}
\noindent For our future work it is useful to introduce a problem that is equivalent
with equations \Ep{ori2.2} and \Ep{ori2.3}, and has a form in which no
cross products between states and controls appear in the objective
function. This is useful because our theorems about the properties
of the solutions \Ep{ori2.5} and \Ep{ori2.8}
will be in terms of the special case in which $W_t = 0\quad \forall t$. The
equivalence between the problems \Ep{ori2.2} and \Ep{ori2.3} and the
following problem
implies that no generality is lost by restricting ourselves to the case in
which $W_t = 0\quad \forall t$.
\par
The equivalent problem is
$$\min_{\{u^\ast_t\}} \sum^{t_1-1}_{t=t_0}\, \Bigl\{x^\prime_t (R_t - W_t
Q_t^{-1} W^\prime_t) x_t + u_t^{\ast\prime} Q_t u^\ast_t \Bigr\} +
x^\prime_{t_1} P_{t_1} x_{t_1} \EQN ori2.9$$
subject to
$$x_{t+1} = (A_t - B_t Q_t^{-1} W^\prime_t) x_t + B_t u^\ast_t,\EQN ori2.10$$
and $x_{t_0}, \, P_{t_1}$ are given. The new control variable $u^\ast_t$ is
related to the original control $u_t$ by
$$u^\ast_t = Q^{-1}_t W^\prime_t x_t + u_t .\EQN ori2.11$$
We can state the problem \Ep{ori2.9}--\Ep{ori2.10} in a more compact
notation as being to minimize
$$\sum^{t_1-1}_{t=t_0} \, \Bigl\{x^\prime_t \bar R_t x_t +
u_t^{\ast \prime} Q_t u^\ast_t \Bigr\} + x^\prime_{t_1} P_{t_1} x_{t_1} ,\EQN ori2.12$$
subject to
$$x_{t+1} = \bar A_t x_t + B_t u^\ast_t \EQN ori2.13$$
where
$$\bar R_t = R_t - W_t Q_t^{-1} W^\prime_t \EQN ori2.14$$
and
$$\bar A_t = A_t - B_t Q^{-1}_t W^\prime_t. \EQN ori2.15$$
With these specifications, the solution of the problem can be computed using
the following versions of equations \Ep{ori2.5} and \Ep{ori2.8}
$$u^\ast_t = -\bar F_t x_t \equiv - (Q_t + B^\prime_t P_{t+1} B_t)^{-1} B^\prime_t
P_{t+1} \bar A_t\, x_t \EQN ori2.16$$
$$P_t = \bar R_t + \bar A^\prime_t P_{t+1} \bar A_t - \bar A^\prime_t P_{t+1}
B_t (Q_t +
B^\prime_t P_{t+1} B_t)^{-1} B^\prime_t P_{t+1} \bar A_t \EQN ori2.17$$
We ask the reader to verify the following facts:
\itemitem{a.} Problems \Ep{ori2.2}--\Ep{ori2.3} and \Ep{ori2.9}--\Ep{ori2.10} are equivalent.
\itemitem{b.} The feedback laws $\bar F_t$ and $F_t$ for $u^\ast_t$ and
$u_t$, respectively, are related by
$$F_t = \bar F_t + Q_t^{-1} W^\prime_t.$$
\itemitem{c.} The Riccati equations \Ep{ori2.8} and \Ep{ori2.17}
are equivalent.
\itemitem{d.} The ``closed loop'' transition matrices are related by
$$A_t - B_t F_t = \bar A_t - B_t \bar F_t .$$
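\medskip\noindent
As a check on fact (a), substitute $u_t = u^\ast_t - Q_t^{-1} W^\prime_t x_t$ from
equation \Ep{ori2.11} into the one-period return and into the law of motion
\Ep{ori2.1} (this is only a sketch of the verification requested above):
$$\eqalign{x^\prime_t R_t x_t + u^\prime_t Q_t u_t + 2 x^\prime_t W_t u_t
&= x^\prime_t \bigl(R_t - W_t Q_t^{-1} W^\prime_t\bigr) x_t
+ u_t^{\ast\prime} Q_t u^\ast_t \cr
A_t x_t + B_t u_t &= \bigl(A_t - B_t Q_t^{-1} W^\prime_t\bigr) x_t + B_t u^\ast_t ,\cr}$$
where the cross terms in $u^\ast_t$ cancel because the scalar
$x^\prime_t W_t u^\ast_t$ equals its transpose $u_t^{\ast\prime} W^\prime_t x_t$.
Hence the two problems have identical objectives and identical feasible state paths,
which is fact (a); fact (b) then follows directly from \Ep{ori2.11}, and fact (d)
from (b) together with \Ep{ori2.15}.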
\section{An example}
We now give an example of a problem for which the preceding transformation is
useful. A consumer wants to maximize
$$\sum^\infty_{t=t_0}\ \beta^t\ \Bigl\{ u_1 c_t-{u_2 \over 2}c^2_t \Bigr \}
\quad 0 \, < \beta < 1\quad ,\ u_1> 0,\, u_2>0 \EQN ori2.18$$
subject to the intertemporal budget constraint
$$k_{t+1} = (1 + r)\ (k_t + y_t - c_t) , \EQN ori2.19$$
the law of motion for labor income
$$y_{t+1} = \lambda_0 + \lambda_1 y_t, \EQN ori2.20$$
and a given level of initial assets, $k_{t_0}$. Here $\beta$ is a discount
factor, $u_1$ and $u_2$ are constants, $c_t$ is consumption, $k_t$ is
``nonhuman'' assets at the beginning of time $t,\ r > -1$ is the interest
rate on nonhuman assets, and $y_t$ is income from labor at time $t$.
We define the transformed variables
$$\eqalign{\tilde k_t &= \beta^{t/2} k_t, \cr
\tilde y_t &= \beta^{t/2} y_t, \cr
\tilde c_t &= \beta^{t/2} c_t.\cr}$$
In terms of these transformed variables, the problem can be rewritten as
follows: maximize
$$\sum^\infty_{t=t_0}\ \Bigl\{u_1 \beta^{t/2} \cdot \tilde c_t - {u_2 \over 2}
\tilde c^2_t \Bigr\} \EQN ori2.21$$
subject to
$$\eqalign{\tilde k_{t+1} &=(1+r)\beta^{1/2}\ (\tilde k_t + \tilde y_t -
\tilde c_t )\quad \hbox { and } \cr
\noalign{\smallskip}
\tilde y_{t+1} &= \lambda_0 \beta^{{t + 1 \over 2}}+\lambda_1 \beta^{1/2}
\tilde y_t \cr} \EQN ori2.22$$
and $k_{t_0}$ given. We write this problem in the state-space form:
$$\eqalign{\max_{\{\tilde u_t\}}\ & \sum^\infty_{t=t_0}\ \Bigl\{\tilde
x_t^\prime R \tilde x_t + 2 \tilde x^\prime_t W \tilde u_t + \tilde u^\prime_t
Q \tilde u_t \Bigr \} \cr
& \hbox{ subject to } \ \tilde x_{t+1} = A \tilde x_t + B \tilde u_t.\cr}$$
We take
$$\eqalign{\tilde x_t &= \left[\matrix{\tilde k_t \cr \tilde y_t \cr
\beta^{t/2}\cr} \right], \ \tilde u_t = \tilde c_t, \cr
\noalign{\medskip}
R &= \left[\matrix{0 & 0 & 0 \cr 0 & 0 & 0 \cr 0 & 0 & 0 \cr}\right], \
W^\prime = \left[\matrix{ 0 & 0 & {u_1 \over 2}\cr} \right], \cr
\noalign{\medskip}
Q = - {u_2 \over 2}, \ A &= \left[\matrix{(1+r) & (1+r) & 0 \cr 0 &
\lambda_1 & \lambda_0 \cr 0 & 0 & 1 \cr}\right]\ \beta^{1/2}, \quad B =
\left[\matrix{-(1+r) \cr 0 \cr 0\cr}\right]\ \beta^{1/2} . \cr}$$
To obtain the equivalent transformed problem in which there are no
cross-product terms between states and controls in the return function, we
take
$$\eqalign{\bar A &= A - BQ^{-1} W^\prime = \left[\matrix{(1+r) & (1+r)
& - {u_1 (1+ r) \over u_2} \cr 0 & \lambda_1 & \lambda_0 \cr 0 & 0 & 1 \cr}
\right] \ \beta^{1/2} \cr
\noalign{\medskip}
\bar R &= R - WQ^{-1} W^\prime = \left[\matrix{0 & 0 & 0 \cr 0 & 0 & 0 \cr
0 & 0 &{u^2_1 \over 2u_2}\cr}\right]\cr
u^\ast_t &= \tilde u_t + Q^{-1} W^\prime \tilde x_t \cr
c^\ast_t &= \tilde c_t - {u_1 \over u_2} \beta^{t/2} .\cr}\EQN ori2.23$$
Thus, our original problem can be expressed as
$$\eqalign{\max_{\{u^\ast_t\}}\ & \sum^\infty_{t=t_0}\ \Bigl\{ \tilde
x_t^\prime \bar R \tilde x_t + u_t^{\ast \prime} Q u^\ast_t \Bigr \} \cr
\hbox {\ subject to} \quad & \tilde x_{t+1} = \bar A \tilde x_t +
B u^\ast_t.\cr} \EQN ori2.24$$
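\medskip\noindent
As a check on the algebra in \Ep{ori2.23} (a worked evaluation only), note that with
$Q = -{u_2 \over 2}$ and $W^\prime = \left[\matrix{0 & 0 & {u_1 \over 2}\cr}\right]$,
$$W Q^{-1} W^\prime = -{2 \over u_2}
\left[\matrix{0 \cr 0 \cr {u_1 \over 2}\cr}\right]
\left[\matrix{0 & 0 & {u_1 \over 2}\cr}\right]
= \left[\matrix{0 & 0 & 0 \cr 0 & 0 & 0 \cr 0 & 0 & -{u_1^2 \over 2 u_2}\cr}\right],$$
so $\bar R = R - W Q^{-1} W^\prime$ has the single nonzero entry $u_1^2/(2 u_2)$
displayed previously, and $B Q^{-1} W^\prime$ has the single nonzero entry
$(1+r)\, u_1/u_2$ (times $\beta^{1/2}$) in its $(1,3)$ position, which produces
the $(1,3)$ entry of $\bar A$. The transformed control
$c^\ast_t = \tilde c_t - (u_1/u_2)\, \beta^{t/2}$ can be interpreted, in
untransformed units, as $\beta^{t/2}\, (c_t - u_1/u_2)$, the scaled gap between
consumption and the bliss level $u_1/u_2$ implied by the objective \Ep{ori2.18}.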
\section{The Kalman filter} \index{Kalman filter}
Consider the linear system
$$x_{t+1} = A_t x_t + B_t u_t + G_t w_{1t+1} \EQN ori2.25$$
$$y_t = C_t x_t + H_t u_t + w_{2 t}, \EQN ori2.26$$
where $[w_{1 t+1}^\prime,\, w_{2t}^\prime]$ is a vector white noise with
contemporaneous covariance matrix
$$E \left[\matrix{w_{1 t+1} \cr w_{2 t} \cr}\right] \left[\matrix{w_{1
t+1}\cr w_{2 t} \cr}\right]^\prime = \left[\matrix{V_{1t} & V_{3t} \cr
V^\prime_{3t} & V_{2t}\cr}\right]\ \geq \, 0 .$$
The $[w^\prime_{1t+1},\ w^\prime_{2t}]$ vector for $t\geq t_0$ is assumed
orthogonal to the initial condition $x_{t_0}$, which represents the initial
state. Here, $A_t$ is $(n \times n), B_t$ is $(n \times k), \
G_t$ is $(n \times N), C_t$ is $(\ell \times n), H_t$ is $(\ell \times k),
w_{1t+1}$ is $(N \times 1), w_{2 t}$ is $(\ell \times 1), x_t$
is an $(n \times 1)$ vector of {\it state\/} variables, $u_t$ is a
$(k \times 1)$ vector
of {\it controls\/}, and $y_t$ is an $(\ell \times 1)$ vector
of {\it output\/}
or observed variables. The matrices $A_t, B_t, G_t, C_t, \hbox { and } H_t$
are known, though possibly time varying. The noise vector $w_{1 t+1}$ is the
state disturbance, while $w_{2t}$ is the measurement error.
\par
The analyst does not directly observe the $x_t$ process. So from his point
of view, $x_t$ is a ``hidden state vector.'' The system is assumed to start
up at time $t_0$, at which time the state vector $x_{t_0}$ is regarded as a
random variable with mean $E x_{t_0} = \hat x_{t_0}$, and given covariance
matrix
$\Sigma_{t_0} = \Sigma_0$. The pair $(\hat x_{t_0}, \Sigma_0)$ can be regarded as
the mean and covariance of the analyst's Bayesian prior distribution on
$x_{t_0}$.
It is assumed that for $s \geq 0$, the vector of random variables
$\bigl[{w_{1 t_0 + s + 1} \atop w_{2 t_0 + s}}\bigr]$ is orthogonal to the
random variable $x_{t_0}$ and to the random variables $\bigl[{w_{1 t_0 + r +
1} \atop w_{2 t_0 + r}} \bigr]$ for $r \not = s$. It is also assumed that $E
\bigl[{w_{1 t_0 + s + 1} \atop w_{2 t_0 + s}}\bigr]\hfil\break
= 0 \hbox { for } s \geq 0$.
Thus, $\bigl[{w_{1t} \atop w_{2t}}\bigr]$ is a serially uncorrelated or
white noise process. Further, from equations \Ep{ori2.25} and \Ep{ori2.26} and
the orthogonality
properties posited for $\bigl[{w_{1 t+1} \atop w_{2t}}\bigr]$ and $x_{t_0}$,
it follows that $\bigl[{w_{1 t+1} \atop w_{2t}}\bigr]$ is orthogonal to $\{x_s,
y_{s-1}\}$ for $s \leq t$. This conclusion follows because $y_t \hbox { and } x_{t+1}$
are in the space spanned by current and lagged $u_t, w_{1t+1},
w_{2t}, \hbox { and } x_{t_0}$.
\par
The analyst is assumed to observe at time $t\, \{ y_s,\, u_s : s = t_0,
t_0 + 1, \ldots t \}$, for $t = t_0, t_0 + 1, \ldots t_1$. The object is then
to compute the linear least-squares projection of the state $x_{t+1}$ on this
information, which we denote $\widehat E_t x_{t+1}$. We write this
projection as
$$\widehat E_t x_{t+1} \equiv \widehat E [x_{t+1} \mid y_t, y_{t-1},
\ldots, y_{t_0}, \hat x_{t_0}], \EQN ori2.27$$
where $\hat x_{t_0}$ is the initial estimate of the state. It is convenient to
let $Y_t$ denote the information on $y_t$ collected through time $t$:
$$Y_t = \{y_t, y_{t-1}, \ldots, y_{t_0}\} .$$
The linear least-squares projection of $y_{t+1}$ on $Y_t$, and $\hat x_{t_0}$
is, from equations \Ep{ori2.26} and \Ep{ori2.27}, given by
$$\eqalign{\widehat E_t y_{t+1} &\equiv \widehat E [y_{t+1} \mid Y_t,
\hat x_0] \cr
&= C_{t+1} \widehat E_t x_{t+1} + H_{t+1} \ u_{t+1},\cr} \EQN ori2.28$$
since $w_{2 t+1}$ is orthogonal to $\{w_{1s+1},\, w_{2s} \},\ s \leq t,\hbox{
and } \hat x_{t_0}$ and is therefore orthogonal to $\{Y_t,\, \hat x_{t_0}\}$.
In the interests of conveniently constructing the projections $\widehat E_t
x_{t+1}$ and $\widehat E_t y_{t+1}$, we now apply a \idx{Gram-Schmidt
orthogonalization} procedure to the set of random variables $\{\hat x_{t_0},
y_{t_0}, y_{t_0 + 1}, \ldots y_{t_1}\}$. An orthogonal basis for this
set of random variables is formed by the set $\{\hat x_{t_0}, \tilde y_{t_0},
\tilde y_{t_0 +1}, \ldots, \tilde y_{t_1}\}$ where
$$\tilde y_t = y_t - \widehat E [y_t \mid \tilde y_{t-1}, \tilde y_{t-2},
\ldots \tilde y_{t_0}, \hat x_{t_0}] .\EQN ori2.29$$
For convenience, let us write $\widetilde Y_t =\{\tilde y_{t_0},
\tilde y_{t_0 +1}, \ldots, \tilde y_t\}$. We note that the
linear spaces spanned by
$(\hat x_{t_0}, Y_t)$ equal the linear spaces spanned by $(\hat x_{t_0},
\tilde Y_t)$. This follows because (a) $ \tilde y_t$ is formed as
indicated previously as a linear function of $Y_t$ and $\hat x_{t_0}$,
and (b) $ \ y_t$ can be recovered from $\tilde Y_t$ and $\hat x_{t_0}$
by noting that $y_t = \widehat E [y_t \mid \hat x_{t_0}, \tilde Y_{t-1}] +
\tilde y_t$. It follows that $\widehat E[y_t \mid \hat x_{t_0}, Y_{t-1}] =
\widehat E [y_t \mid \hat x_{t_0}, \tilde Y_{t-1}] = \widehat E_{t-1} y_t$. In equation \Ep{ori2.29},
we use equation \Ep{ori2.26} to write
$$\widehat E [y_{t_0} \mid \hat x_{t_0}] = C_{t_0} \hat x_{t_0} +
H_{t_0} u_{t_0} .$$
We set $\hat x_{t_0} = E x_{t_0}$. To summarize developments up to
this point, we have defined the {\it innovations process}
$$\eqalign{
\tilde y_t &= y_t - \widehat E [y_t\mid \hat x_{t_0},\ Y_{t-1}] \cr
&= y_t - \widehat E [y_t \mid \hat x_{t_0}, \tilde Y_{t-1}],\ t\geq t_0 + 1 \cr
\tilde y_{t_0} &= y_{t_0}-\widehat E[y_{t_0} \mid \hat x_{t_0}] .\cr}$$
The innovations process is {\it serially uncorrelated\/} ($\tilde y_t$ is
orthogonal to $\tilde y_s$ for $t \not= s$) and spans the same linear space
as the original $Y$ process.
\par
We now use the innovations process to get a recursive procedure for evaluating
$\widehat E_t x_{t+1}$. Using \Theorem{th21.4} about projections on orthogonal
bases gives
$$\eqalign{
\widehat E\, & [x_{t+1} \mid \hat x_{t_0}, \tilde y_{t_0},
\tilde y_{t_0 + 1}, \ldots, \tilde y_t] \cr
&= \widehat E [x_{t+1} \mid \tilde y_t] + \widehat E [x_{t+1} \mid
\hat x_{t_0}, \tilde y_{t_0}, \tilde y_{t_0 + 1}, \ldots, \tilde y_{t-1}]
- E x_{t+1}. \cr} \EQN ori2.30$$
We have to evaluate the first two terms on the right
side of equation \Ep{ori2.30}.
From \Theorem{th21.1}, we have the
following:\NFootnote{Here, we are using $E\tilde y_t=0$.}
$$\widehat E [x_{t+1} \mid \tilde y_t] = Ex_{t+1} +\cov \ (x_{t+1},
\tilde y_t)\ \bigl[\cov\ (\tilde y_t, \tilde y_t)\bigr]^{-1} \tilde y_t .
\EQN ori2.31$$
To evaluate the covariances that appear in equation \Ep{ori2.31}, we shall use
the covariance matrix of one-step-ahead errors, $\tilde x_t = x_t -
\widehat E_{t-1} x_t$, in estimating $x_t$. We define this covariance
matrix as $\Sigma_t = E \tilde x_t \tilde x_t^\prime$. It follows from
equations \Ep{ori2.25} and \Ep{ori2.26} that
$$\eqalign{
\cov (x_{t+1}, \tilde y_t) = & \cov (A_t x_t + B_t u_t + G_t w_{1t+1}, y_t -
\widehat E_{t-1} y_t) \cr
= & \cov (A_t x_t + B_t u_t + G_t w_{1t+1}, \, C_t x_t + w_{2t} - C_t \widehat
E_{t-1} x_t) \cr
= &\cov (A_t x_t + B_t u_t + G_t w_{1t+1}, C_t \tilde x_t + w_{2t}) \cr
= & E \{[ A_t x_t + B_t u_t + G_t w_{1t+1} - E (A_t x_t + B_t u_t + G_t
w_{1t+1})] \cr
& [C_t \tilde x_t + w_{2t}-E (C_t \tilde x_t + w_{2t})]^\prime\}\cr
=& E [(A_t x_t + G_t w_{1t+1} - A_t E x_t) (\tilde x_t^\prime C_t^\prime
+ w_{2t}^\prime)] \cr
= & E (A_t x_t \tilde x_t^\prime C_t^\prime) + G_t E (w_{1t+1}
\tilde x_t^\prime C_t^\prime) - A_t E x_t E \tilde x_t^\prime C_t^\prime \cr
& + A_t E (x_t w_{2t}^\prime) + G_t E (w_{1t+1} w_{2t}^\prime ) -
A_t Ex_t Ew_{2t}^\prime\cr
= & E (A_t x_t \tilde x_t^\prime C_t^\prime) + G_t E (w_{1t+1} w_{2t}^\prime)\cr
= & E[ A_t (\tilde x_t + \widehat E_{t-1} x_t) \tilde x_t^\prime C_t^\prime]
+ G_t E ( w_{1t+1} w_{2t}^\prime)\cr
= & A_t E \tilde x_t \tilde x_t^\prime C_t^\prime + G_t E (w_{1t+1}\,
w_{2t}^\prime) = A_t \Sigma_t C_t^\prime + G_t V_{3t} . \cr} \EQN ori2.32$$
The second equality uses the fact that $\widehat E_{t-1} w_{2t} = 0$, since
$w_{2t}$ is orthogonal to $\{x_s,\, y_{s-1}\},\, s \leq t$. To get the
fifth equality, we use the fact that $E \tilde x_t = E (x_t - \widehat E_{t-1}
x_t) = 0$ by the unbiased property of linear projections when one of the
regressors is a constant. We also use the
facts that $u_t$ is known and that $w_{1t+1}$ and $w_{2t}$ have zero means. The
seventh equality follows from the orthogonality of $w_{1t+1}$ and $w_{2t}$ to
variables dated $t$ and earlier and the means of $w_{2t}^\prime$ and $\tilde
x_t^\prime$ being zero. Finally, the ninth equality relies on the fact that
$\tilde x_t$ is orthogonal to the subspace generated by $y_{t-1}, y_{t-2},
\ldots, \hat x_{t_0}$ and $\widehat E_{t-1} x_t$ is a function of these vectors.
Next, we evaluate
$$\eqalign{\cov (\tilde y_t, \tilde y_t ) & = E ( C_t
\tilde x_t + w_{2t}) (C_t \tilde x_t + w_{2 t} )^\prime \cr
&= C_t \Sigma_t C_t^\prime + V_{2t}, \cr}$$
since $E \tilde y_t = 0 \hbox { and } E \tilde x_t w_{2t}^\prime = 0$.
Therefore, equation \Ep{ori2.31} becomes
$$\widehat E (x_{t+1} \mid \tilde y_t ) = E ( x_{t+1}) + (A_t \Sigma_t
C_t^\prime + G_t V_{3t}) (C_t \Sigma_t C_t^\prime + V_{2t})^{-1} \tilde y_t .
\EQN ori2.33$$
Using equation \Ep{ori2.25}, we evaluate the second term on the right side
of equation \Ep{ori2.30},
$$\widehat E (x_{t+1} \mid \tilde Y_{t-1}, \hat x_{t_0}) = A_t \widehat E (x_t
\mid \tilde Y_{t-1}, \hat x_{t_0}) + B_t u_t$$
or
$$\widehat E_{t-1} x_{t+1} = A_t \widehat E_{t-1} x_t+B_t u_t .\EQN ori2.34$$
Using equations \Ep{ori2.33} and \Ep{ori2.34} in equation \Ep{ori2.30} gives
$$\widehat E_t x_{t+1} = A_t \widehat E_{t-1} x_t + B_t u_t + K_t
(y_t - \widehat E_{t-1} y_t ) \EQN ori2.35$$
where
$$K_t = \Bigl( A_t \Sigma_t C_t^\prime + G_t V_{3 t} \Bigr)
\Bigl(C_t \Sigma_t C_t^\prime + V_{2 t}\Bigr)^{-1} .\EQN ori2.36$$
Using $\widehat E_{t-1} y_t = C_t \widehat E_{t-1} x_t + H_t u_t$,
equation \Ep{ori2.35} can also be written
$$\widehat E_t x_{t+1} = (A_t - K_t C_t ) \widehat E_{t-1} x_t + ( B_t - K_t
H_t ) u_t + K_t y_t .\EQN orn2.35;a$$
We now aim to derive a recursive formula for the covariance matrix
$\Sigma_t$. From equation \Ep{ori2.26} we know that $\widehat E_{t-1} y_t
= C_t \widehat E_{t-1} x_t + H_t u_t$. Subtracting this expression from $y_t$ in equation
\Ep{ori2.26} gives
$$y_t - \widehat E_{t-1} y_t = C_t (x_t-\widehat E_{t-1} x_t) + w_{2t} .
\EQN orn2.35;b$$
Substituting this expression in equation \Ep{ori2.35} and
subtracting the result from equation \Ep{ori2.25} gives
$$\eqalign{ (x_{t+1} - \widehat E_t x_{t+1} ) = & (A_t - K_t C_t)\, (x_t -
\widehat E_{t-1} x_t ) \cr
&+ G_t w_{1 t+1} - K_t w_{2 t} \cr}$$
or
$$\tilde x_{t+1} = (A_t - K_t C_t ) \tilde x_t + G_t w_{1 t+1} -
K_t w_{2t} . \EQN ori2.37$$
From equation \Ep{ori2.37} and our specification of the covariance matrix
$$E\left[\matrix{w_{1 t+1} \cr w_{2t} \cr}\right] \left[\matrix{w_{1t+1} \cr
w_{2 t}\cr}\right]^\prime = \left[\matrix{V_{1 t} & V_{3t}\cr V_{3t}^\prime &
V_{2 t} \cr} \right]$$
we have
$$\eqalign{E \tilde x_{t+1} \tilde x_{t+1}^\prime = & \Bigl(A_t - K_t C_t
\Bigr) E \tilde x_t \tilde x_t^\prime \Bigl(A_t - K_t C_t \Bigr)^\prime\cr
&+ G_t V_{1 t} G_t^\prime + K_t V_{2 t} K_t^\prime\cr
&- G_t V_{3 t} K_t^\prime - K_t V_{3 t}^\prime G_t^\prime . \cr}$$
We have defined the covariance matrix of $\tilde x_t$ as $\Sigma_t = E
\tilde x_t \tilde x_t^\prime = E (x_t - \widehat E_{t-1} x_t)\hfil\break
(x_t - \widehat E_{t-1} x_t)^\prime$. So we can express the
preceding equation as
$$\eqalign{\Sigma_{t+1} = & \Bigl(A_t - K_t C_t \Bigr) \Sigma_t \Bigl(
A_t - K_t C_t \Bigr)^\prime\cr
&+ G_t V_{1 t} G_t^\prime + K_t V_{2 t} K_t^\prime - G_t V_{3 t}
K_t^\prime \cr
&- K_t V_{3t}^\prime G^\prime_t .\cr} \EQN ori2.38$$
Equation \Ep{ori2.38} can be rearranged to the equivalent form
$$\eqalign{\Sigma_{t+1} = & A_t \Sigma_t A_t^\prime + G_t V_{1 t}
G_t^\prime \cr
&- \Bigl(A_t \Sigma_t C_t^\prime + G_t V_{3 t} \Bigr) \Bigl (C_t
\Sigma_t C_t^\prime + V_{2 t} \Bigr)^{-1}\, \Bigl(A_t\Sigma_t C_t^\prime + G_t
V_{3t}\Bigr)^\prime . \cr} \EQN orn2 $$
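\medskip\noindent
To verify this rearrangement, expand the right side of \Ep{ori2.38} and collect
terms, using the fact that, by \Ep{ori2.36},
$K_t \bigl(C_t \Sigma_t C_t^\prime + V_{2t}\bigr) = A_t \Sigma_t C_t^\prime + G_t V_{3t}$:
$$\eqalign{\Sigma_{t+1} &= A_t \Sigma_t A_t^\prime + G_t V_{1t} G_t^\prime
- K_t \bigl(C_t \Sigma_t A_t^\prime + V_{3t}^\prime G_t^\prime\bigr)
- \bigl(A_t \Sigma_t C_t^\prime + G_t V_{3t}\bigr) K_t^\prime
+ K_t \bigl(C_t \Sigma_t C_t^\prime + V_{2t}\bigr) K_t^\prime \cr
&= A_t \Sigma_t A_t^\prime + G_t V_{1t} G_t^\prime
- \bigl(A_t \Sigma_t C_t^\prime + G_t V_{3t}\bigr)
\bigl(C_t \Sigma_t C_t^\prime + V_{2t}\bigr)^{-1}
\bigl(A_t \Sigma_t C_t^\prime + G_t V_{3t}\bigr)^\prime ,\cr}$$
since the last two terms on the first line cancel, and the remaining $K_t$ term,
after substituting for $K_t$ from \Ep{ori2.36}, becomes the quadratic form on the
second line.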
%\beginleftbox repeated eq no 2.36\endleftbox
%We repeat \Ep{ori2.36} here for your convenience
%$$K_t = \Bigl (A_t \Sigma_t C_t^\prime + G_t V_{3 t} \Bigr) \Bigl(C_t
%\Sigma_t C_t^\prime + V_{2 t} \Bigr)^{-1}\leqno (2.36)$$
Starting from the given initial condition for $\Sigma_{t_0} = E (x_{t_0}
- E x_{t_0}) (x_{t_0} - E x_{t_0})^\prime$, equations \Ep{ori2.38}
and \Ep{ori2.36} give a recursive
procedure for generating the ``Kalman gain'' $K_t$, which is the crucial
unknown ingredient of the recursive algorithm \Ep{ori2.35} for generating
$\widehat E_t x_{t+1}$.
%\beginleftbox there is a 2.39 refered to in this para but there is
%no leqno 2.39 \endleftbox
The Kalman filter is used as follows: Starting from time $t_0$ with
$\Sigma_{t_0} = \Sigma_0$ and $\hat x_{t_0} = E x_{t_0}$ given, equation \Ep{ori2.36}
is used to form $K_{t_0}$, and equation \Ep{ori2.35} is used to obtain
$\widehat E_{t_0} x_{t_0 + 1}$ with $\widehat E_{t_0-1} x_{t_0} =
\hat x_{t_0}$. Then equation \Ep{ori2.38} is used to form
$\Sigma_{t_0 +1}$, equation \Ep{ori2.36} is used to form $K_{t_0 + 1}$,
equation \Ep{ori2.35} is used to obtain $\widehat E_{t_0+1} x_{t_0 + 2}$,
and so on.
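\medskip\noindent
As a worked illustration of this recursion (the particular numbers are purely
illustrative), consider the scalar time-invariant case $A_t = C_t = G_t = 1$,
$V_{1t} = V_{2t} = 1$, $V_{3t} = 0$, with $\Sigma_{t_0} = 0$, so that the initial
state is known exactly. Equations \Ep{ori2.36} and \Ep{ori2.38} become
$$K_t = {\Sigma_t \over \Sigma_t + 1}, \qquad
\Sigma_{t+1} = \Sigma_t + 1 - {\Sigma^2_t \over \Sigma_t + 1} ,$$
so that $(\Sigma_{t_0}, \Sigma_{t_0+1}, \Sigma_{t_0+2}, \Sigma_{t_0+3}, \ldots)
= (0, 1, {3\over 2}, {8\over 5}, \ldots)$ and
$(K_{t_0}, K_{t_0+1}, K_{t_0+2}, \ldots) = (0, {1\over 2}, {3\over 5}, \ldots)$,
with $\Sigma_t \rightarrow (1+\sqrt 5\,)/2$. These are the same iterates as in the
scalar regulator illustration given earlier, a coincidence explained by the duality
developed in the next section.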
Define $\hat x_t = \widehat E_{t-1}x_t$ and $\hat y_t =
\widehat E_{t-1} y_t$. Set
$$a_t=w_{2t} + C_t(x_t-\hat x_t) . \EQN ori2.40$$
From equation \Ep{orn2.35;b}, we have
$$y_t - \hat y_t = C_t (x_t - \hat x_t) + w_{2t}$$
or
$$y_t - \hat y_t = a_t .\EQN ori2.41$$
We know that $E a_t a_t^\prime = C_t \Sigma_t C_t^\prime + V_{2t}$. The
random process $a_t$ is the ``innovation'' in $y_t$, that is, the part of
$y_t$ that cannot be predicted linearly from past $y$'s.
From equations \Ep{ori2.26} and \Ep{ori2.41} we get $y_t=C_t \hat x_t +
H_t u_t+a_t$. Substituting this expression into equation \Ep{orn2.35;a} produces the
following system:
$$\eqalign{\hat x_{t+1} &= A_t \hat x_t + B_t u_t + K_t a_t \cr
y_t &= C_t \hat x_t +H_t u_t + a_t . \cr}\EQN ori2.42$$
System \Ep{ori2.42} is called an {\it innovations representation}.
\index{innovations representation}
Another representation of the system that is useful is obtained from
equation \Ep{orn2.35;a}:
$$\eqalign{\hat x_{t+1} &= (A_t - K_t C_t) \hat x_t + (B_t - K_t H_t)\, u_t
+ K_t y_t \cr
a_t &= y_t - C_t \hat x_t - H_t u_t . \cr} \EQN ori2.43$$
This is called a {\it whitening filter}. Starting from a given $\hat x_{t_0}$,
this system accepts as an ``input'' a history of $y_t$ and gives as an output
the sequence of innovations $a_t$, which by construction are serially
uncorrelated.
\index{whitening filter}
We shall often study situations in which the system is time invariant, that is,
$A_t = A,\, B_t=B,\, G_t=G,\, H_t=H,\, C_t = C$, and $V_{jt} = V_j$ for all $t$.
We shall later describe
regularity conditions on $A, C, V_1, V_2$, and $V_3$ which imply that (1) $\ K_t
\rightarrow K$ as $t \rightarrow \infty$ and $\Sigma_t \rightarrow \Sigma$ as
$t\rightarrow \infty$; and (2) $ \ \mid \lambda_i (A - KC) \mid < 1$ for all
$i$, where $\lambda_i$ is the $i$th eigenvalue of $(A - KC)$. When these
conditions are met, the limiting representation for equation \Ep{ori2.43} is time
invariant and is an (infinite dimensional) innovations representation.
Using the lag operator $L$ where $L \hat x_t = \hat x_{t-1}$, imposing
time invariance in equation \Ep{ori2.42}, and rearranging gives the representation
$$y_t = [I + C (L^{-1} I - A)^{-1} K] a_t +\bigl[ H+C (L^{-1} I-A)^{-1}B\bigr]\,
u_t, \EQN ori2.44$$
which expresses $y_t$ as a function of $[a_t, a_{t-1}, \ldots]$. In
order that $[y_t, y_{t-1},
\ldots]$ span the same linear space as $[a_t, a_{t-1}, \ldots]$,
it is necessary that the following condition be met:
$$\det\, [I + C (zI - A)^{-1} K] = 0 \ \Rightarrow \ \mid z \mid < 1 .$$
Now by a theorem from linear algebra we know that\NFootnote{See Noble and Daniel
(1977, exercises 6.49 and 6.50, p. 210).}
\auth{Noble, Ben} \auth{Daniel, James W.}
%
$$\det [I + C (zI-A)^{-1} K] = {\det [zI - (A-KC)] \over \det (zI-A)} .$$
The formula shows that the zeros of $\det [I + C (zI-A)^{-1} K]$ are
zeros of $\det [zI - (A - KC)]$, which are eigenvalues of $A - KC$. Thus,
if the eigenvalues of $(A-KC)$ are all less than unity in modulus, then the
spaces $[a_t, a_{t-1}, \ldots]$ and $[y_t, y_{t-1}, \ldots]$ in
representation \Ep{ori2.44} are equal.
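\medskip\noindent
As a quick check of the determinant identity in the scalar case (an illustration
only), take $n = \ell = 1$ and write $A = a$, $C = c$, $K = k$. Then
$$\det\, [I + C (zI - A)^{-1} K] = 1 + {c k \over z - a}
= {z - (a - kc) \over z - a} = {\det\, [zI - (A - KC)] \over \det\, (zI - A)} ,$$
so the single zero sits at $z = a - kc$, the eigenvalue of $A - KC$, and the
requirement that the zeros lie inside the unit circle is simply
$\vert a - kc \vert < 1$.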
%\beginleftbox these eqs. 35, 38, 6, and 7 are repeated from earlier
%in the paper. should they be renumbered? \endleftbox
\index{duality!of control and filtering}
\section{Duality}
For purposes of highlighting their relationship, we now repeat the Kalman
filtering formulas for $K_t$ and $\Sigma_t$ and the optimal linear regulator
formulas for $F_t$ and $P_t$
$$K_t = \Bigl(A_t \Sigma_t C_t^\prime + G_t V_{3 t} \Bigr) \Bigl(C_t
\Sigma_t C_t^\prime + V_{2t} \Bigr)^{-1}.\EQN duality1$$
$$\eqalign{\Sigma_{t+1} = & A_t \Sigma_t A_t^\prime + G_t V_{1t}
G_t^\prime \cr
&- \Bigl ( A_t \Sigma_t C_t^\prime + G_t V_{3t}\Bigr) \Bigl(C_t \Sigma_t
C_t^\prime + V_{2 t} \Bigr)^{-1} \cr
&\times \Bigl( A_t \Sigma_t C_t^\prime + G_t V_{3 t} \Bigr)^\prime\cr}
\EQN duality2
$$
$$F_t = (Q_t + B^\prime_t P_{t+1} B_t)^{-1} (B^\prime_t P_{t+1} A_t +
W^\prime_t) . \EQN duality3 $$
$$\eqalign{P_t = & R_t + A_t^\prime P_{t+1} A_t \cr
&- (A_t^\prime P_{t+1} B_t + W_t)\, (Q_t + B_t^\prime P_{t+1} B_t)^{-1} \cr
& \times \Bigl (B_t^\prime P_{t+1} A_t + W^\prime_t\Bigr)\cr}\EQN duality4$$
for $t = t_0, t_0 + 1, \ldots, t_1$. Equations \Ep{duality1} and
\Ep{duality2} are
solved forward from $t_0$ with $\Sigma_{t_0}$ given, while equations
\Ep{duality3} and \Ep{duality4} are solved backward from $t_1 -1$ with
$P_{t_1}$ given.
The equations for $K_t$ and $F_t$ are intimately related, as are the
equations for $P_t$ and $\Sigma_t$. In fact, upon properly reinterpreting
the various matrices in equations \Ep{duality1}, \Ep{duality2},
\Ep{duality3}, and \Ep{duality4}, the
equations for the Kalman filter and the optimal linear regulator can
be seen to be identical. Thus, where $A$ appears in the Kalman filter,
$A^\prime$ appears in the corresponding regulator equation; where $C$
appears in the Kalman filter, $B^\prime$ appears in the corresponding
regulator equation; and so on. The correspondences are listed in detail
in \Tbl{tab21.1}. %Table 21.1.
By taking account of these correspondences, a single set
of computer programs can be used to solve either an optimal linear
regulator problem or a Kalman filtering problem.
The concept of {\it duality\/} helps to clarify the relationship between
the optimal regulator and the Kalman filtering problem.
\medskip
\table{tab21.1}
\caption{\bf Duality}
$$\vbox{\offinterlineskip \hrule
\halign{#\hfil & \qquad #\hfil \cr
\noalign{\smallskip}
Object in Optimal Linear & \phantom{00000} Object in \cr
\phantom{000} Regulator Problem & \phantom{000}Kalman Filter \cr
\noalign{\smallskip}
\noalign{\hrule}
\noalign{\medskip}
$A_{t_0 + s}, s = 0,\ldots, t_1 - t_0 -1$ & $A_{t_1 - 1 -s}^\prime,
s=0, \ldots, t_1 -t_0 -1$ \cr
\noalign{\smallskip}
$B_{t_0 + s}$ & $C_{t_1 -1 -s}^\prime$ \cr
\noalign{\smallskip}
$R_{t_0+s}$ & $G_{t_1 -1 -s} V_{1 t_1 -1 -s} G_{t_1 -1 -s}^\prime$ \cr
\noalign{\smallskip}
$Q_{t_0 + s}$ & $V_{2t_1 -1 -s}$ \cr
\noalign{\smallskip}
$W_{t_0 + s}$ & $G_{t_1 -1 -s} V_{3 t_1 -1 -s}$\cr
\noalign{\smallskip}
$P_{t_0 + s}$ & $\Sigma_{t_1 - s}$\cr
\noalign{\smallskip}
$F_{t_0 + s}$ & $K_{t_1 -1 -s}^\prime$ \cr
\noalign{\smallskip}
$P_{t_1}$ & $\Sigma_{t_0}$ \cr
\noalign{\smallskip}
$A_{t_0 + s} - B_{t_0 + s} F_{t_0 +s}$ & $A_{t_1 -1 -s}^\prime -
C_{t_1 -1 -s}^\prime K_{t_1 -1 -s}^\prime$ \cr
\noalign{\smallskip}\cr
\noalign{\hrule} }}$$
\endtable
%\specsec{Definition 21.1}:
\definition{def21.1} Consider the time-varying linear system:
$$\eqalign{x_{t+1} &= A_t x_t + B_t u_t \cr
y_t &= C_t x_t, \ t = t_0, \ldots, t_1 -1 . \cr} \EQN ori2.47$$
The {\it dual\/} of system \Ep{ori2.47} (sometimes called the ``dual
with respect to $t_1-1$'') is the system
$$\eqalign{x^\ast_{t+1} &= A_{t_1 -1 -t}^\prime x^\ast_t +
C^\prime_{t_1 -1 -t} u^\ast_t \cr
y^\ast_t &= B^\prime_{t_1 -1 -t} x^\ast_t\cr}$$
with $t = t_0, t_0 + 1, \ldots, t_1 -1$.
\enddefinition
With this definition, the correspondence exhibited in \Tbl{tab21.1} %Table 21.1
can be
summarized succinctly in the following proposition:
%\specsec{Proposition 21.1:}
\theorem{prop21.1}
Let the solution of the optimal linear regulator
problem defined by the given matrices
$\{A_t, B_t, R_t, Q_t, W_t; t = t_0,
\ldots, t_1 -1; \, P_{t_1}\}$ be given by $\{ P_t, F_t, \ t= t_0, \ldots, t_1
-1\}$. Then the solution of the Kalman filtering problem defined by
$\{A_{t_1 -1 -t}^\prime,\, C_{t_1 -1 -t}^\prime$, $ G_{t_1 -1 -t}\,
V_{1 t_1 -1 -t} G^\prime_{t_1 -1-t}, V_{2 t_1 -1 -t},
G_{t_1 -1 -t}\,\hfil\break V_{3t_1 -1 -t};\ t = t_0, \ldots,
t_1 -1; \ \Sigma_{t_0}\}$
is given by $\{K_{t_1 -t -1}^\prime =$ $F_t,
\Sigma_{t_1 -t} = P_t; \ t = t_0,\, t_0 + 1, \ldots, t_1 -1 \}$.
\endtheorem
\smallskip
This proposition describes the sense in which the Kalman filtering problem and
the optimal linear regulator problems are ``dual'' to one another.
As is also true of so-called
classical control and filtering methods, the
same equations arise in solving both
the filtering problem and the
control problem. This fact implies that almost everything that we learn about
the control problem applies to the filtering problem, and {\rm vice versa}.
As an example of the use of duality, recall the transformations \Ep{ori2.14}
and \Ep{ori2.15} that we used to convert the optimal linear regulator
problem with
cross products between {\it states\/} and {\it controls\/} into an equivalent
problem with no such cross products. The preceding discussion of duality and
%Table 21.1
\Tbl{tab21.1} suggest that the same transformation will convert the original dual
filtering problem, which has nonzero covariance matrix $V_3$ between {\it
state noise\/} and {\it measurement noise\/}, into an equivalent problem with
covariances zero. This hunch is correct. The transformations, which can be
obtained by duality directly from equations \Ep{ori2.14} and \Ep{ori2.15}, are for
$t = t_0, \ldots, t_1 -1$
$$\eqalign{\bar A^\prime_{t_1 -1 -t} &= A^\prime_{t_1 -1 -t}
-C_{t_1 -1 -t}^\prime V_{2 t_1 -1 -t}^{-1} V^\prime_{3 t_1 -1 -t}
G^\prime_{t_1 -1 -t}\cr
\bar V_{1 t_1 -1 -t} &= V_{1 t_1 -1 -t} - V_{3 t_1 -1 -t}
V^{-1}_{2 t_1 -1 -t} V_{3 t_1 -1 -t}^\prime .\cr}$$
The Kalman filtering problem defined by $\{\bar A_t,\, C_t,\,
G_t \bar V_{1t} G^\prime_t,\, V_{2t},\, 0; \hfil\break
t= t_0, \ldots, t_1 -1; \, \Sigma_0\}$
is equivalent to the original problem in the sense that
$$A_t - K_t C_t = \bar A_t - \bar K_t C_t, $$
where $\bar K_t$ is the solution of the transformed problem. We also have,
by the results for the regulator problem and duality, the following:
$$\bar K_t = K_t - G_t V_{3 t} V_{2 t}^{-1}.$$
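\medskip\noindent
This last formula can be read off from \Tbl{tab21.1}: fact (b) of the cross-product
section states that $F_t = \bar F_t + Q_t^{-1} W^\prime_t$, and translating via the
correspondences $F \leftrightarrow K^\prime$, $Q \leftrightarrow V_2$,
$W \leftrightarrow G V_3$ (and suppressing the time reversal of subscripts) gives
$$K_t^\prime = \bar K_t^\prime + V_{2t}^{-1} \bigl(G_t V_{3t}\bigr)^\prime
\qquad \hbox{or} \qquad
\bar K_t = K_t - G_t V_{3t} V_{2t}^{-1} ,$$
after transposing and using the symmetry of $V_{2t}$.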
\section{Examples of Kalman filtering}
This section contains several examples that have been widely used by
economists and that fit into the Kalman filtering setting. After the reader
has worked through our examples, no doubt many other examples will occur to the reader.
\medskip \noindent
{\it a. Vector autoregression\/}: We consider an $(n \times 1)$ stochastic
process $y_t$ that obeys the linear stochastic difference equation
$$y_t = A_1 y_{t-1} + \ldots + A_m y_{t-m} + \varepsilon_t , $$
where $\varepsilon_t$ is an $(n \times 1)$ vector white noise, with mean
zero and $E \varepsilon_t \varepsilon_t^\prime = V_{1 t},\, E\varepsilon_t
y_s^\prime = 0, \ t > s$. We define the state vector $x_t$ and shock vector
$w_t$ as
$$x_t = \left[\matrix{y_{t-1} \cr y_{t-2} \cr \vdots \cr y_{t-m}\cr}\right], \
\left[\matrix{w_{1 t+1} \cr w_{2 t} \cr}\right] = \pmatrix{\varepsilon_t \cr
\varepsilon_t}.$$
The law of motion of the system then becomes
$$\left[\matrix{y_t \cr y_{t-1} \cr y_{t-2} \cr \vdots \cr y_{t-m+1}\cr}
\right] =
\left[\matrix{A_1 & A_2 & \ldots & A_m \cr I & 0 & \ldots & 0 \cr 0 & I &
\ldots & 0 \cr \vdots & \vdots & \ddots & \vdots \cr
0 & \ldots & I & 0 \cr}\right] \pmatrix{y_{t-1} \cr y_{t-2} \cr y_{t-3} \cr
\vdots \cr y_{t-m} \cr} + \left[\matrix{I\cr0\cr0\cr\vdots \cr 0}\right] \,
\varepsilon_t.$$
The measurement equation is
$$y_t = [A_1 \ A_2 \ldots A_m] \, x_t + \varepsilon_t.$$
For the filtering equations, we have
$$\eqalign{A_t &= \left[\matrix{A_1 & A_2 & \ldots & A_m \cr I & 0 & \ldots &
0 \cr 0 & I & \ldots & 0 \cr \vdots & \vdots & \ddots & \vdots \cr
0 & \ldots & I & 0 \cr}\right], \ G_t = G =
\left[\matrix{I \cr 0 \cr 0 \cr \vdots \cr 0\cr} \right ] \cr
C_t &= [A_1, \ldots, A_m] \cr
V_{1 t} &= V_{2 t} = V_{3 t}.\cr}$$
Starting from $\Sigma_{t_0} = 0$, which means that the system is imagined to
start up with $m$ lagged values of $y$ having been observed, equation
\Ep{ori2.36} implies
$$K_{t_0} = G,$$
while equation \Ep{ori2.38} implies that $\Sigma_{t_0 + 1} = 0$. It follows
recursively that $K_t = G$ for all $t \geq t_0$ and that $\Sigma_t = 0$
for all $t \geq t_0$. Computing $(A-KC)$, we find that
$$\widehat E_t x_{t+1} = \left[\matrix{0 & 0 & \ldots & 0 \cr I & 0 & \ldots &
0 \cr 0 & I & \ldots & 0 \cr \vdots \cr 0 & \ldots & I & 0 \cr}\right] \
\widehat E_{t-1} x_t + \left[\matrix{I \cr 0 \cr \vdots\cr 0 \cr}\right]
y_t,$$
which is equivalent with
$$\widehat E_t x_{t+1} = \left[\matrix{y_t \cr y_{t-1} \cr \vdots \cr
y_{t-m+1}\cr}\right].$$
The equation $\widehat E_t y_{t+1} = C \widehat E_t x_{t+1}$ becomes
$$\widehat E_t y_{t+1} = A_1 y_t + A_2 y_{t-1} + \ldots + A_m y_{t-m+1}.$$
Evidently, the preceding equation for forecasting a vector autoregressive
process can be obtained in a much less roundabout manner, with no need to
use the Kalman filter.
\medskip \noindent
{\it b. Univariate moving average\/}: We consider the model
$$y_t = w_t + c_1 w_{t-1} + \ldots + c_n w_{t-n}$$
where $w_t$ is a univariate white noise with mean zero and variance
$V_{1 t}$. We write the model in the state-space form
$$\eqalign{x_{t+1} &= \left[\matrix{w_t \cr w_{t-1} \cr \vdots \cr
w_{t-n+1}\cr}\right] =
\left[\matrix{0 & 0 & \ldots & 0 \cr 1 & 0 & \ldots & 0 \cr \vdots & \vdots &
\ddots & \vdots \cr 0 & \ldots
& 1 & 0 \cr}\right] \left[\matrix{w_{t-1} \cr w_{t-2} \cr \vdots \cr w_{t-n}
\cr}\right] + \left[\matrix{1 \cr 0 \cr \vdots\cr 0\cr}\right] w_t \cr
y_t &= [c_1 \ c_2 \ldots c_n] x_t + w_t .\cr}$$
We assume that $\Sigma_{t_0} = 0$, so that the initial state is known. In this
setup, we have $A, G$, and $C$ as indicated previously, and $w_{1t+1} = w_t, w_{2t}
= w_t$, and $V_1 = V_2 = V_3$. Iterating on the Kalman filtering
equations \Ep{ori2.38} and \Ep{ori2.36} with $\Sigma_{t_0} = 0$, we obtain
$\Sigma_t = 0, \ t \geq t_0,\ K_t = G,\ t \geq t_0$, and
$$(A-KC) = \pmatrix{-c_1 & -c_2 & \ldots & -c_{n-1} & -c_n \cr 1 & 0 & \ldots
& 0 & 0 \cr 0 & 1 & \ldots & 0 & 0 \cr \vdots & \vdots & \ddots & \vdots &
\vdots \cr 0 & 0 & \ldots & 1 & 0 \cr} .$$
It follows that
$$\eqalign{ \widehat E_t x_{t+1} &= \widehat E_t \pmatrix{w_t \cr w_{t-1} \cr
\vdots \cr w_{t-n+1}\cr} = \pmatrix{-c_1 & -c_2 & \ldots & -c_{n-1} & -c_n \cr
1 & 0 & \ldots & 0 & 0 \cr 0 & 1 & \ldots & 0 & 0 \cr \vdots & \vdots & \ddots
& \vdots & \vdots \cr 0 & 0 & \ldots & 1 & 0 \cr} \cr
\noalign{\medskip}
&\hskip .50in \widehat E_{t-1} \pmatrix{w_{t-1} \cr w_{t-2} \cr \vdots \cr
w_{t-n}\cr} + \pmatrix{1 \cr 0 \cr \vdots \cr 0 \cr} y_t . \cr}$$
With $\Sigma_{t_0} = 0$, this equation implies
$$\widehat E_t w_t = y_t - c_1 w_{t-1} - \ldots -c_n w_{t-n} .$$
Thus, the innovation $w_t$ is recoverable from knowledge of $y_t$ and $n$
past innovations.
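\medskip\noindent
For the first-order case $n = 1$ (a worked special case), the preceding formula
reads $\widehat E_t w_t = w_t = y_t - c_1 w_{t-1}$, and unwinding the recursion
$k$ times (for any $k \leq t - t_0$, so that every term is known at time $t$) gives
$$\widehat E_t y_{t+1} = c_1 w_t
= c_1 \sum^{k}_{j=0} (-c_1)^j\, y_{t-j} + c_1 (-c_1)^{k+1}\, w_{t-k-1} ,$$
so the one-step-ahead forecast is a geometric distributed lag in current and past
$y$'s; when $\vert c_1 \vert < 1$ the weight on the remote innovation dies out,
which is just the condition $\vert \lambda (A - KC)\vert = \vert c_1\vert < 1$
discussed in connection with the innovations representation.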
\medskip \noindent
{\it c. Mixed moving average--autoregression\/}: We consider the univariate,
mixed second-order autoregression, first-order moving average process
$$y_t = A_1 y_{t-1} + A_2 y_{t-2} + v_t + B_1 v_{t-1}, $$
where $v_t$ is a white noise with mean zero, $Ev_t^2 = V_1$ and $Ev_t y_s
= 0$ for $s<t$. The trick in getting this system into the state-space form is
to define the state variables $x_{1 t} = y_t - v_t$, and $x_{2 t} = A_2
y_{t-1}$. With these definitions the system and measurement equations become
$$x_{t+1} = \pmatrix{A_1 & 1 \cr A_2 & 0 \cr} x_t + \pmatrix{B_1 + A_1 \cr
A_2 \cr} v_t \EQN ori2.48$$
$$y_t = [1 \ 0] x_t + v_t. \EQN ori2.49$$
Notice that using equation \Ep{ori2.48} and \Ep{ori2.49} repeatedly, we have
$$\eqalign{y_t = x_{1 t} + v_t &= A_1 x_{1 t-1} + x_{2 t-1} + (B_1 + A_1)
v_{t-1} + v_t \cr
&= A_1 (x_{1 t-1} + v_{t-1}) + v_t + B_1 v_{t-1}+ A_2 (x_{1 t-2} + v_{t-2})\cr
&= A_1 y_{t-1} + A_2 y_{t-2} + v_t + B_1 v_{t-1} \cr}$$
as desired. With the state and measurement equations \Ep{ori2.48} and
\Ep{ori2.49}, we have $V_1 = V_2 = V_3$,
$$A = \pmatrix{A_1 & 1 \cr A_2 & 0\cr}, G = \pmatrix{B_1 + A_1 \cr A_2
\cr},\ C = [1\ 0].$$
We start the system off with $\Sigma_{t_0} = 0$, so that the initial state is
imagined to be known. With $\Sigma_{t_0} = 0$, recursions on equations \Ep{ori2.36} and
\Ep{ori2.38} imply that $\Sigma_t = 0$ for $t \geq t_0$ and $K_t = G$ for $t \geq
t_0$. Computing $A - KC$, we find
$$(A-KC) = \pmatrix{-B_1 & 1 \cr 0 & 0\cr}$$
and we have
$$\widehat E_t x_{t+1} = \left[\matrix{-B_1 & 1\cr 0 & 0\cr}\right]\, \widehat
E_{t-1} x_t + \left[\matrix{B_1 + A_1\cr A_2\cr}\right]\, y_t .$$
Therefore, the recursive prediction equations become
$$\widehat E_t y_{t+1} = \left[\matrix{1 & 0\cr}\right]
\widehat E_t x_{t+1} = \widehat E_t
x_{1t+1} .$$
Recalling that $x_{2 t} = A_2 y_{t-1}$, the preceding two equations imply that
$$\widehat E_t y_{t+1} = -B_1 \widehat E_{t-1} y_t + A_2 y_{t-1} + (B_1 + A_1)
y_t . \EQN ori2.50$$
Consider the special case in which $A_2 = 0$, so that $y_t$ obeys a first-order
moving average, first-order autoregressive process. In this
case equation \Ep{ori2.50} can be expressed as
$$\widehat E_t y_{t+1} = B_1 (y_t - \widehat E_{t-1} y_t) + A_1 y_t ,$$
which is a version of the Cagan-Friedman ``error-learning'' model. The
solution of the preceding difference equation for $\widehat E_t y_{t+1}$ is given
by the geometric distributed lag
$$\eqalign{\widehat E_t y_{t+1} &= (B_1 + A_1) \sum^m_{j=0} (-B_1)^j y_{t-j}\cr
&+ (-B_1)^{m+1} \widehat E_{t-m-1} y_{t-m} .\cr}$$ For the more
general case depicted in equation \Ep{ori2.50} with $A_2 \not= 0$,
$\widehat E_t y_{t+1}$ can be expressed as a
convolution\NFootnote{A sequence $\{c_s\}$ is said to be the
convolution of the two sequences $\{a_s\}, \{b_s\}$ if $c_s =
\sum_{j=-\infty}^\infty a_j b_{s-j}$.} of geometric lag
distributions in current and past $y_t$'s.
\medskip\noindent
{\it d. Linear regressions\/}: Consider the standard linear regression model
$$y_t = z_t \beta + \varepsilon_t, \quad t = 1, 2, \ldots, T$$
where $z_t$ is a $1 \times n$ vector of independent variables, $\beta$ is an
$n \times 1$ vector of parameters, and $\varepsilon_t$ is a serially
uncorrelated random term with
mean zero and variance $E \varepsilon^2_t = \sigma^2$, and satisfying
$E\varepsilon_t z_s = 0$ for $t \geq s$. The least-squares estimator of
$\beta$ based on $t$ observations, denoted $\hat \beta_{t+1}$, is obtained as
follows. Define the stacked matrices
$$Z_t = \left[\matrix{z_1 \cr z_2 \cr \vdots \cr z_t\cr}\right] ,\quad Y_t =
\left[\matrix{y_1 \cr y_2 \cr \vdots \cr y_t \cr}\right].$$
Then the least-squares estimator based on data through time $t$ is given by
$$\hat \beta_{t+1} = (Z_t^\prime Z_t)^{-1} Z_t^\prime Y_t \EQN ori2.51 $$
with covariance matrix
$$E (\hat \beta_{t+1} - E \hat \beta_{t+1}) (\hat \beta_{t+1} - E \hat
\beta_{t+1})^\prime = \sigma^2 (Z^\prime_t Z_t)^{-1}. \EQN ori2.52$$
For reference, we note that
$$\eqalign{\hat \beta_t &= (Z^\prime_{t-1} Z_{t-1})^{-1} Z_{t-1}^\prime Y_{t-1}
\cr
E (\hat \beta_t & - E \hat \beta_t) (\hat \beta_t - E \hat \beta_t)^\prime =
\sigma^2 (Z^\prime_{t-1} Z_{t-1})^{-1} .\cr}\EQN ori2.52e$$
If $\hat \beta_t$ has been computed by equation \Ep{ori2.52e}, it is computationally
inefficient to compute $\hat \beta_{t+1}$ by equation \Ep{ori2.51} when new data
$(y_t, z_t)$ arrive at time $t$. In particular, we can avoid inverting
the matrix $(Z_t^\prime Z_t)$ directly, by employing a recursive procedure
for inverting it. This approach can be viewed as an application of the
Kalman filter. We explore this connection briefly.
We begin by noting how least-squares estimators can be computed recursively
by means of the Kalman filter. We let $y_t$ in the Kalman filter be $y_t$ in the
regression model. We then set $x_t = \beta$ for all $t$, $V_{1 t} = 0, \,
V_{3 t} = 0, \, V_{2 t} = \sigma^2, \, w_{1t+1} = 0, \, w_{2t} = \varepsilon_t,
\, A = I$, and $C_t = z_t$. Let
$$\hat \beta_{t+1} = \widehat E \left[ \beta \mid y_t, y_{t-1}, \ldots y_1, z_t,
z_{t-1}, \ldots, z_1, \hat \beta_0\right],$$
where $\hat \beta_0$ is $\hat x_0$. Also, let $\Sigma_t = E (\hat \beta_t -
E \hat \beta_t) (\hat \beta_t - E \hat \beta_t)^\prime$. We start things off
with a ``prior'' covariance matrix $\Sigma_0$. With these definitions, the
recursive formulas \Ep{ori2.36} and \Ep{ori2.38} become
$$\eqalign{K_t &= \Sigma_t z^\prime_t (\sigma^2 + z_t \Sigma_t
z_t^\prime)^{-1} \cr
\Sigma_{t+1} &= \Sigma_t - \Sigma_t z_t^\prime (\sigma^2 + z_t \Sigma_t
z_t^\prime)^{-1} z_t \Sigma_t\cr}\EQN ori2.53$$
Applying the formula $\hat x_{t+1} = (A - K_t C_t) \hat x_t + K_t y_t$ to the
present problem with the preceding formula for $K_t$ we have
$$\hat \beta_{t+1} = (I - K_t z_t) \hat \beta_t + K_t y_t.\EQN ori2.54$$
\par
We now show how equations \Ep{ori2.53} and \Ep{ori2.54} can be derived directly
from equations \Ep{ori2.51} and \Ep{ori2.52}. From a matrix inversion formula
(see Noble and Daniel, 1977, p. 194), we have
$$\eqalign{(Z_t^\prime Z_t)^{-1} &= (Z^\prime_{t-1} Z_{t-1})^{-1}\cr
& - (Z_{t-1}^\prime Z_{t-1})^{-1} z_t^\prime [1 + z_t(Z^\prime_{t-1}
Z_{t-1})^{-1} z^\prime_t]^{-1} z_t (Z^\prime_{t-1} Z_{t-1})^{-1} .\cr} \EQN
ori2.55$$
Multiplying both sides of equation \Ep{ori2.55} by $\sigma^2$ immediately gives
equation \Ep{ori2.53}. Use the right side of equation \Ep{ori2.55} to substitute for
$(Z^\prime_t Z_t)^{-1}$ in equation \Ep{ori2.51} and write
$$Z^\prime_t Y_t = Z^\prime_{t-1} Y_{t-1} + z^\prime_t y_t$$
to obtain
$$\eqalign{\hat \beta_{t+1} = & {1 \over \sigma^2} \{ \Sigma_t - \Sigma_t
z_t^\prime (\sigma^2 + z_t \Sigma_t z_t^\prime)^{-1} z_t \Sigma_t \}
\cdot \{ Z_{t-1}^\prime Y_{t-1} + z^\prime_t y_t \} \cr
=& \underbrace{{1\over \sigma^2}\Sigma_t Z_{t-1}^\prime
Y_{t-1}}_{\hat \beta_t} -
\underbrace{\Sigma_t z_t^\prime (\sigma^2 + z_t \Sigma_t z^\prime_t)^{-1}}
_{K_t}\, \underbrace{z_t}_{C_t} \ \underbrace{{1\over \sigma^2}\Sigma_t
Z^\prime_{t-1} Y_{t-1}}_{\hat \beta_t} \cr
& + \underbrace{\Sigma_t z^\prime_t (\sigma^2 + z_t \Sigma_t
z^\prime_t)^{-1}}_{K_t} y_t\cr
\hat \beta_{t+1} = &(A - K_t C_t) \hat \beta_t + K_t y_t.\cr}$$
These formulas are evidently equivalent with those asserted earlier.
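\medskip\noindent
For the scalar case $n = 1$ (written out only as an illustration), formulas
\Ep{ori2.53} and \Ep{ori2.54} reduce to
$$K_t = {\Sigma_t z_t \over \sigma^2 + z^2_t \Sigma_t}, \qquad
{1 \over \Sigma_{t+1}} = {1 \over \Sigma_t} + {z^2_t \over \sigma^2}, \qquad
\hat \beta_{t+1} = \hat \beta_t + K_t\, (y_t - z_t \hat \beta_t) ,$$
so each new observation adds $z^2_t/\sigma^2$ to the precision (the reciprocal of
$\Sigma_t$), and the estimate is revised by the gain $K_t$ times the current
prediction error $y_t - z_t \hat \beta_t$.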
%
%insert1
%
%
%insert2
%
%
\section{Linear projections} %to Chapter 2}
For reference we state the following theorems about linear least-squares
projections. We let $Y$ be an $(n \times 1)$ vector of random variables and
$X$ be an $(h \times 1)$ vector of random variables. We assume that the
following first and second moments exist:
$$\eqalign{EY &= \mu_Y,\ EX = \mu_X , \cr
EXX^\prime &= S_{XX},\ EYY^\prime = S_{YY},\ EYX^\prime = S_{YX} .\cr}$$
Letting $x=X - EX$ and $y = Y - EY$, we define the following covariance matrices
$$Exx^\prime = \Sigma_{xx},\ Eyy^\prime = \Sigma_{yy},\ Eyx^\prime =
\Sigma_{yx}.$$
We are concerned with estimating $Y$ as a linear function of $X$. The
estimator of $Y$ that is a linear function of $X$ and that minimizes the
mean squared error between each component $Y$ and its estimate is called the
{\it linear projection of $Y$ on $X$.}
\medskip\noindent
%\specsec{Definition 21.2:}
\definition{def21.2} The {\it linear projection\/} of $Y$ on $X$ is the
affine
function $\hat Y = AX + a_0$ that minimizes $E\hbox{ trace } \{(Y-\hat Y)\,
(Y-\hat Y)^\prime\}$ over all affine functions $a_0+AX$ of $X$. We denote
this linear
projection as $\widehat E [Y \mid X]$, or sometimes as $\widehat E\, [Y\mid x,
\, 1]$ to emphasize that
a constant is included in the ``information set.''
\enddefinition
\par
The linear projection of $Y$ on $X$, $\widehat E \, [Y \mid X]$ is also
sometimes called the {\it wide sense expectation of $Y$ conditional on $X$.}
We have the following theorems:
\medskip \noindent
%\specsec{Theorem 21.1:}
\theorem{th21.1}
$$\widehat E\,[Y \mid X] = \mu_y + \Sigma_{yx} \Sigma^{-1}_{xx} (X-\mu_x) .
\EQN A1$$
\endtheorem
\medskip \noindent
%\specsec{Proof:}
\proof
The theorem follows immediately by writing out $E\, {\rm trace}
\ (Y-\hat Y) (Y - \hat
Y)^\prime$ and completing the square, or else by writing out $E \, {\rm trace}
(Y-\hat Y) (Y - \hat Y)^\prime$ and obtaining first-order necessary conditions
(``normal equations'') and solving them. \endproof %\qed
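\medskip\noindent
In the scalar case $n = h = 1$ (included only as an illustration), equation \Ep{A1}
is the familiar population regression line
$$\widehat E\, [Y \mid X] = \mu_y + {\cov\, (Y, X) \over {\rm var}\, (X)}\,
(X - \mu_x) ,$$
with $\Sigma_{yx} = \cov\, (Y, X)$ and $\Sigma_{xx} = {\rm var}\, (X)$.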
\medskip \noindent
%\specsec{Theorem 21.2:}
\theorem{th21.2}
$$\widehat E\,\biggl[\bigl(Y - \widehat E [Y \mid X]\bigr) \mid X
\biggr] = 0.$$
\endtheorem
\noindent This equation states that the errors from the projection are orthogonal to each
variable included in $X$.
\medskip\noindent
%\specsec{Proof:}
\proof Immediate from the normal equations.\endproof % \qed
\medskip\noindent
%\specsec{Theorem 21.3:}
\theorem{th21.3} \quad (Orthogonality principle)
$$E\Bigl[ [Y-\widehat E\,(Y\mid x)]\,x^\prime\Bigr]=0 .$$
\endtheorem
\medskip\noindent
%\specsec{Proof:}
\proof Follows from Theorem 21.2. \endproof %\qed
%\specsec{Theorem 21.4:}
\theorem{th21.4} \quad (Orthogonal regressors)
\medskip \noindent
Suppose that\hfil\break
$X^\prime = (X_1, X_2, \ldots, X_h)^\prime, EX^\prime= \mu^\prime = (\mu_{x1},
\ldots, \mu_{xh})^\prime$, and $E (X_i - \mu_{xi})\, (X_j-\mu_{xj}) = 0$
for $i \not= j$. Then
$$\widehat E \, [Y \mid x_1,\ldots, x_h, 1] = \widehat E\,[Y \mid x_1]+\widehat
E\,[Y\mid x_2] + \ldots + \widehat E\, [Y\mid x_h]-(h-1) \mu_y . \EQN A2$$
\endtheorem
\medskip\noindent
%\specsec{Proof:}
\proof Note that from the hypothesis of orthogonal regressors, the
matrix $\Sigma_{xx}$ is diagonal. Applying equation
\Ep{A1} then gives equation \Ep{A2}. \endproof %\qed
\index{Markov chain!hidden}
\index{filter!nonlinear}
\section{Hidden Markov models\label{sec:HMM}}
This section gives a brief introduction to hidden Markov models,
a tool that is useful to study a variety of nonlinear filtering
problems in finance and economics. We display a solution to
a nonlinear filtering problem that a reader might want
to compare to the linear filtering problem described earlier.
Consider an $N$-state Markov chain. We can represent the
state space in terms of the unit vectors
$S_x = \{e_1,\ldots, e_N\}$, where $e_i$ is the $i$th
$N$-dimensional unit vector. Let the $N \times N$ transition
matrix be $P$, with $(i,j)$ element
$$P_{ij} = {\rm Prob}(x_{t+1} = e_j\mid x_t = e_i).$$
With these definitions, we have
$$E\, [x_{t+1} \mid x_t] = P^\prime x_t.$$
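For example (a purely illustrative two-state chain), let $N = 2$ and
$$P = \left[\matrix{1-p & p \cr q & 1-q \cr}\right], \qquad
E\, [x_{t+1} \mid x_t = e_1] = P^\prime e_1 = \left[\matrix{1-p \cr p \cr}\right],$$
so the conditional mean of $x_{t+1}$ simply lists the probabilities of next
period's states, given that the current state is $e_1$.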
%
Define the ``residual''
$$v_{t+1} = x_{t+1} - P^\prime x_t,$$
which implies the linear ``state-space'' representation
$$x_{t+1} = P^\prime x_t + v_{t+1}.$$
It follows that
$E\, [v_{t+1} \mid x_t] = 0$, which qualifies $v_{t+1}$ as a ``martingale
difference sequence.''