
ON TIME SERIES OF OBSERVATIONS FROM EXPONENTIAL
FAMILY DISTRIBUTIONS
Juan Carlos Abril
Facultad de Ciencias Económicas,
Universidad Nacional de Tucumán y CONICET.
Casilla de correo 209, 4000 Tucumán, Argentina
SUMMARY
In this work a general method is developed for handling non-Gaussian
observations in linear state space time series models and this is applied to the
special case in which the observations come from exponential family
distributions. The method is based on the idea of estimating the state vector by
its posterior mode. Let $\alpha$ be the vector $(\alpha_1', \ldots, \alpha_n')'$ and let $p(\alpha \mid Y_n)$ be the
conditional density of $\alpha$ given the whole sample $Y_n$. The posterior mode
estimate (PME) of $\alpha$ is defined to be the value $\hat{\alpha}$ of $\alpha$ that maximises $p(\alpha \mid Y_n)$.
When the model is linear and Gaussian, $\hat{\alpha} = E(\alpha \mid Y_n)$. When the observations
are non-Gaussian, however, $E(\alpha \mid Y_n)$ is generally difficult or impossible to
compute, and using the mode instead is a natural alternative. Moreover, it
can be argued that in the non-Gaussian case the PME is preferable to $E(\alpha \mid Y_n)$
since it is the value of $\alpha$ which is most probable given the data. In this
respect it can be thought of as analogous to the maximum likelihood estimate of a
fixed parameter vector.
The work also considers the calculation of suitable starting values for the state iteration
and investigates the estimation of the hyperparameters.
KEYWORDS: Approximate Kalman filtering and smoothing; Exponential family distributions;
Kalman filtering and smoothing; Non-Gaussian time series models; Posterior mode estimate.
JEL classification: C3, C5.
1. Introduction
We begin with the linear Gaussian state space model. Although our main concern is
with non-Gaussian models, the linear Gaussian model provides the basis from which all our
methods will be developed. The model can be formulated in a variety of ways; we shall take
the form
$$\begin{aligned}
y_t &= Z_t \alpha_t + \varepsilon_t, &\qquad \varepsilon_t &\sim N(0, H_t),\\
\alpha_t &= T_t \alpha_{t-1} + R_t \eta_t, &\qquad \eta_t &\sim N(0, Q_t),
\end{aligned} \qquad (1)$$
for $t = 1, \ldots, n$, where $y_t$ is a $p \times 1$ vector of observations, $\alpha_t$ is an unobserved $m \times 1$ state
vector, $R_t$ is a selection matrix composed of $r$ columns of the identity matrix $I_m$, which need
not be adjacent, and the variance matrices $H_t$ and $Q_t$ are nonsingular. The disturbance
vectors $\varepsilon_t$ and $\eta_t$ are serially independent and independent of each other. The matrices $H_t$, $Q_t$,
$Z_t$ and $T_t$ are assumed known apart from possible dependence on a parameter vector $\psi$,
which in classical inference is assumed fixed and unknown and in Bayesian inference is
assumed to be random. The first line of equation (1) is called the observation equation and
the second line the state equation of the state space model. The matrices $Z_t$ and $T_t$ are
permitted to depend on $y_1, \ldots, y_{t-1}$. The initial state $\alpha_0$ is assumed to be $N(a_0, P_0)$,
independently of $\varepsilon_1, \ldots, \varepsilon_n$ and $\eta_1, \ldots, \eta_n$, where $a_0$ and $P_0$ are at first assumed known; later, we
consider how to proceed in the absence of knowledge of $a_0$ and $P_0$, and particularly in the
diffuse case when $P_0^{-1} = 0$.
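To fix ideas, the following is a minimal simulation sketch of a univariate special case of model (1), the local level model with $Z_t = T_t = R_t = 1$ and constant variances; the function name and parameter values are illustrative only and are not taken from the paper.

import numpy as np

def simulate_local_level(n, sigma_eps=1.0, sigma_eta=0.5, a0=0.0, p0=1.0, seed=0):
    """Simulate a univariate local level model, a special case of model (1):
    y_t = alpha_t + eps_t,  alpha_t = alpha_{t-1} + eta_t."""
    rng = np.random.default_rng(seed)
    alpha = np.empty(n)
    y = np.empty(n)
    state = rng.normal(a0, np.sqrt(p0))            # draw alpha_0 from N(a0, P0)
    for t in range(n):
        state = state + rng.normal(0.0, sigma_eta)  # state equation
        alpha[t] = state
        y[t] = state + rng.normal(0.0, sigma_eps)   # observation equation
    return y, alpha

y, alpha = simulate_local_level(100)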
Let $Y_{t-1}$ denote the set $y_1, \ldots, y_{t-1}$ together with any information prior to time $t = 1$.
Starting at $t = 1$ and building up the distributions of $\alpha_t$ and $y_t$ recursively, it can be shown
that the conditional densities satisfy $p(y_t \mid \alpha_1, \ldots, \alpha_t, Y_{t-1}) = p(y_t \mid \alpha_t)$ and
$p(\alpha_t \mid \alpha_1, \ldots, \alpha_{t-1}, Y_{t-1}) = p(\alpha_t \mid \alpha_{t-1})$, thus establishing the truly Markovian nature of the model.
Since in model (1) all distributions are Gaussian, conditional distributions are also Gaussian. Assume that $\alpha_t$
given $Y_{t-1}$ is $N(a_t, P_t)$ and that $\alpha_t$ given $Y_t$ is $N(a_{t|t}, P_{t|t})$. The object of the Kalman filter
and smoother (KFS) is to calculate $a_{t|t}$, $P_{t|t}$, $a_{t+1}$ and $P_{t+1}$ recursively given $a_t$ and $P_t$.
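The paper does not display the KFS recursions themselves; for reference, a standard form of the filtering step for model (1), written in the notation above, is the following sketch of well-known formulae (not part of the original text):
$$\begin{aligned}
v_t &= y_t - Z_t a_t, &\qquad F_t &= Z_t P_t Z_t' + H_t,\\
a_{t|t} &= a_t + P_t Z_t' F_t^{-1} v_t, & P_{t|t} &= P_t - P_t Z_t' F_t^{-1} Z_t P_t,\\
a_{t+1} &= T_{t+1} a_{t|t}, & P_{t+1} &= T_{t+1} P_{t|t} T_{t+1}' + R_{t+1} Q_{t+1} R_{t+1}',
\end{aligned}$$
with a backward smoothing pass over $t = n, \ldots, 1$ producing the smoothed estimates $E(\alpha_t \mid Y_n)$.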
In this work a general method is developed for handling non-Gaussian observations
in linear state space time series models and this is applied to the special case in which the
observations come from exponential family distributions. The method is based on the idea of
estimating the state vector by its posterior mode, a device which has already been
considered by some authors for spline smoothing (see, for example, Abril (1999) and the
references given therein). Let $\alpha$ be the stacked vector $(\alpha_1', \ldots, \alpha_n')'$ and let $p(\alpha \mid Y_n)$ be the
conditional density of $\alpha$ given $Y_n$. The posterior mode estimate (PME) of $\alpha$ is defined to be
the value $\hat{\alpha}$ of $\alpha$ that maximises $p(\alpha \mid Y_n)$. When the model is linear and Gaussian,
$\hat{\alpha} = E(\alpha \mid Y_n)$. When the observations are non-Gaussian, however, $E(\alpha \mid Y_n)$ is generally
difficult or impossible to compute, and using the mode instead is a natural alternative. Moreover,
it can be argued that in the non-Gaussian case the PME is preferable to $E(\alpha \mid Y_n)$
since it is the value of $\alpha$ which is most probable given the data. In this respect it can be
thought of as analogous to the maximum likelihood (ML) estimate of a fixed parameter vector;
see, for example, Whittle (1991) and the subsequent discussion. The $t$th subvector of $\hat{\alpha}$ is
called the smoothed value of $\alpha_t$ and is denoted by $\hat{\alpha}_t$.
The idea of using the PME for departures from the linear Gaussian case can be found
in earlier literature, for example in Sage and Melsa (1971). The approach has been
developed for exponential families by Fahrmeir (1992) and in earlier papers referenced
therein. The treatment below is based on ideas of Durbin and Koopman (1994), who
developed a technique for iterative computation of the PME using fast KFS algorithms, as well
as methods for calculating approximate ML estimates of hyperparameters.
Let $A_t$ be the stacked vector $(\alpha_1', \ldots, \alpha_t')'$, $t = 1, \ldots, n$. Then $\hat{\alpha}$ is the PME of $A_n$ and,
for filtering, the PME of $A_t$ given $Y_{t-1}$ is the value of $A_t$ that maximises the conditional
density $p(A_t \mid Y_{t-1})$, $t = 1, \ldots, n$. The $t$th subvector of this is denoted by $a_t$.

In all cases we shall consider, the PME $\hat{\alpha}$ is the solution of the equations
$$\frac{\partial \log p(\alpha \mid Y_n)}{\partial \alpha_t} = 0, \qquad t = 1, \ldots, n.$$
Since, however, $\log p(\alpha \mid Y_n) = \log p(\alpha, Y_n) - \log p(Y_n)$, it is more easily obtained from the
joint density as the solution of the equations
$$\frac{\partial \log p(\alpha, Y_n)}{\partial \alpha_t} = 0, \qquad t = 1, \ldots, n. \qquad (2)$$
Similarly, at the filtering stage the PME of $A_t$ given $Y_{t-1}$ is the solution of the equations
$$\frac{\partial \log p(A_t \mid Y_{t-1})}{\partial \alpha_s} = 0, \qquad s = 1, \ldots, t. \qquad (3)$$
We leave aside for the moment the initialisation question.
In the non-Gaussian case these equations are non-linear and must be solved by
iteration. The idea we shall use for obtaining suitable iterative steps is the following. Write out
the equations (2) for the linear Gaussian analogue of the model under consideration. Since
for Gaussian densities modes are equal to means, the solution of the equations is
$E(\alpha \mid Y_n)$. But we know that this is given by the KFS. It follows that the equations in the
linear Gaussian case can be solved by the KFS. Now let $\tilde{\alpha}$ be a trial value of $\alpha$ and
construct a linearised form of (2) using $\tilde{\alpha}$. This can be done by expanding locally about $\tilde{\alpha}$ or
by other methods to be illustrated later. Manipulate the resulting equations into a form that
mimics the analogous Gaussian equations and solve by the KFS to obtain an improved value
of $\alpha$. Repeat the process and continue until suitable convergence has been achieved. Of
course, this iteration for estimation of the state vector has to be combined with a parallel
iteration for estimation of the hyperparameter vector.
In the next section the technique is illustrated by applying it to observations from a
general exponential family. Section 3 investigates the problem of hyperparameter estimation,
section 4 considers the question of calculating suitable starting values for the state iteration
and in section 5 we give some conclusions.
2. State space models when the observations come from a general exponential family distribution
We consider the case where the observations come from a general exponential family
distribution with density
$$p(y_t \mid \alpha_t) = \exp\bigl\{\theta_t' y_t - b_t(\theta_t) + c_t(y_t)\bigr\}, \qquad (4)$$
where $\theta_t = Z_t\alpha_t$, $t = 1, \ldots, n$, and we refer to $\theta_t$ as the signal. We retain the same state
transition equation as in the Gaussian case (1), namely
$$\alpha_t = T_t\alpha_{t-1} + R_t\eta_t, \qquad \eta_t \sim N(0, Q_t), \qquad t = 1, \ldots, n. \qquad (5)$$
This model covers many important time series applications occurring in practice, including
time series data with binomial, Poisson, multinomial and exponential distributions. In (4), $b_t$ and $c_t$
are assumed to be known functions and $y_t$ is a $p \times 1$ vector of observations which may be
continuous or discrete. The assumptions regarding (5) are the same as in the Gaussian case.
If the observations $y_t$ in (4) had been independent with $\alpha_t$, $b_t$ and $c_t$ constant, then
(4) would have been a generalised linear model, for the treatment of which see McCullagh and
Nelder (1989). Model (4) and (5) was proposed for appropriate types of non-Gaussian time
series data by West, Harrison and Migon (1985), who called it the dynamic generalised linear
model. They gave an approximate treatment based on conjugate priors. The treatment given
here differs completely from theirs and is based on Durbin and Koopman (1994); in fact, it is
an extension of that work.
It is evident that Poisson data with mean $\exp(Z_t'\alpha_t)$ satisfy (4). For binomial data,
suppose $y_t$ is the number of successes in $N$ trials with probability $\pi_t$. Then
$$\log p(y_t \mid \alpha_t) = \log\bigl\{\pi_t/(1 - \pi_t)\bigr\}\, y_t + N\log(1 - \pi_t) + \log N! - \log y_t! - \log(N - y_t)!,$$
which satisfies (4) by taking
$$\pi_t = \exp(Z_t'\alpha_t)\,\big/\,\bigl\{1 + \exp(Z_t'\alpha_t)\bigr\}.$$
This is the logistic transform which is generally used for logistic regression of binomial data
in the non-time-series case.
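As an illustration of the quantities $b_t$, $b_t^*$ and $b_t^{**}$ that the iteration of this section requires, the following sketch codes them for the univariate Poisson and binomial cases just described; the function names are ours, chosen for illustration (for the Poisson case $b(\theta) = e^{\theta}$, for the binomial case $b(\theta) = N\log(1 + e^{\theta})$).

import numpy as np

# Poisson with mean exp(theta): b(theta) = exp(theta), so b* = b** = exp(theta).
def poisson_b(theta):
    return np.exp(theta)

def poisson_bstar(theta):
    return np.exp(theta)

def poisson_bstarstar(theta):
    return np.exp(theta)

# Binomial with N trials and logistic link: theta = log{pi/(1-pi)},
# b(theta) = N log(1 + exp(theta)), b* = N pi, b** = N pi (1 - pi).
def binomial_b(theta, N):
    return N * np.log1p(np.exp(theta))

def binomial_bstar(theta, N):
    pi = 1.0 / (1.0 + np.exp(-theta))
    return N * pi

def binomial_bstarstar(theta, N):
    pi = 1.0 / (1.0 + np.exp(-theta))
    return N * pi * (1.0 - pi)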
The main reason for the inclusion of $R_t$ in (5) is that some of the equations of (5) may
have zero error terms. This happens, for example, when some elements of the state vector
are regression coefficients which are constant over time. The function of $R_t$ is then to select
those error terms which are non-zero. This is done by taking the columns of $R_t$ to be a
subset of the columns of $I_m$. For simplicity let us assume that this is so. Then $R_t'R_t = I_g$,
where $g$ is the number of non-zero error terms. Also, $R_t Q_t R_t'$ and $R_t Q_t^{-1} R_t'$ are Moore–Penrose
generalised inverses of each other (see, for example, section 16.5 of Rao (1973)).
Since none of the elements of $\eta_t$ need be degenerate, we assume that $Q_t$ is positive
definite.
For simplicity suppose that $\alpha_0$ is fixed and known, leaving consideration of the
initialisation question until later. The log density of $\alpha, Y_n$ is then
$$\log p(\alpha, Y_n) = -\frac{1}{2}\sum_{t=1}^{n}(\alpha_t - T_t\alpha_{t-1})' R_t Q_t^{-1} R_t' (\alpha_t - T_t\alpha_{t-1})
+ \sum_{t=1}^{n}\bigl\{\alpha_t' Z_t' y_t - b_t(Z_t\alpha_t) + c_t(y_t)\bigr\}.$$
Differentiating with respect to $\alpha_t$ and equating to zero, we obtain the PMEs $\hat{\alpha}_1, \ldots, \hat{\alpha}_n$ as the
solution of the equations
$$- R_t Q_t^{-1} R_t' (\alpha_t - T_t\alpha_{t-1}) + T_{t+1}' R_{t+1} Q_{t+1}^{-1} R_{t+1}' (\alpha_{t+1} - T_{t+1}\alpha_t)
+ Z_t'\bigl\{y_t - b_t^*(Z_t\alpha_t)\bigr\} = 0 \qquad (6)$$
for $t = 1, \ldots, n$, where for $t = n$ the second term is absent. Here $b_t^*(x)$ denotes $db_t(x)/dx$
for any $p \times 1$ vector $x$.
Suppose that $\tilde{\alpha}$ is a trial value of $\alpha$. Expanding $b_t^*(Z_t\alpha_t)$ about $\tilde{\alpha}_t$ gives
$$b_t^*(Z_t\alpha_t) = b_t^*(Z_t\tilde{\alpha}_t) + b_t^{**}(Z_t\tilde{\alpha}_t)\, Z_t (\alpha_t - \tilde{\alpha}_t)$$
to the first order, where $b_t^{**}(x) = db_t^*(x)/dx = d^2 b_t(x)/dx\,dx'$. Putting
$$\tilde{y}_t = b_t^{**}(Z_t\tilde{\alpha}_t)^{-1}\bigl\{y_t - b_t^*(Z_t\tilde{\alpha}_t)\bigr\} + Z_t\tilde{\alpha}_t$$
in the last term of (6) gives
$$Z_t'\, b_t^{**}(Z_t\tilde{\alpha}_t)\bigl\{\tilde{y}_t - Z_t\alpha_t\bigr\}. \qquad (7)$$
Now the equations corresponding to (6) for the linear Gaussian model (1) are
$$- R_t Q_t^{-1} R_t' (\alpha_t - T_t\alpha_{t-1}) + T_{t+1}' R_{t+1} Q_{t+1}^{-1} R_{t+1}' (\alpha_{t+1} - T_{t+1}\alpha_t)
+ Z_t' H_t^{-1} (y_t - Z_t\alpha_t) = 0. \qquad (8)$$
Comparing (7) with the last term of (8), we see that the linearised form of (6) has the same
form as (8) if we replace $y_t$ and $H_t$ in (8) by $\tilde{y}_t$ and $b_t^{**}(Z_t\tilde{\alpha}_t)^{-1}$. Since the solution of (8) is
given by the KFS, it follows that with these substitutions the solution of the linearised form of
(6) can also be obtained by the KFS. By this means we obtain an improved value of $\alpha$ which
is used as a new trial value to obtain a new improved value, and so on until suitable
convergence has been reached.
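To make the linearisation concrete, the following sketch carries out one step of the iteration for a univariate Poisson model with signal $Z_t\tilde{\alpha}_t$: given trial signal values it forms the pseudo-observations $\tilde{y}_t$ and pseudo-variances $\tilde{H}_t = b_t^{**}(Z_t\tilde{\alpha}_t)^{-1}$ that are then passed to any linear Gaussian KFS routine. The KFS call is left abstract in the comments and would have to be supplied; all names are illustrative and not from the paper.

import numpy as np

def poisson_pseudo_observations(y, signal_trial):
    """One linearisation step of the posterior mode iteration for Poisson data.
    For the Poisson case b(theta) = exp(theta), so b* = b** = exp(theta).
    Returns pseudo-observations y_tilde and pseudo-variances H_tilde such that
    a linear Gaussian KFS run on (y_tilde, H_tilde) gives the next trial state."""
    mu = np.exp(signal_trial)                  # b*(Z_t alpha~_t) = b**(Z_t alpha~_t)
    H_tilde = 1.0 / mu                         # replaces H_t in the Gaussian KFS
    y_tilde = (y - mu) / mu + signal_trial     # b**^{-1}(y - b*) + Z_t alpha~_t
    return y_tilde, H_tilde

# Iteration sketch: repeat until the smoothed signal stops changing.
# signal = initial trial values (e.g. from the extended Kalman filter, section 4)
# for _ in range(max_iter):
#     y_tilde, H_tilde = poisson_pseudo_observations(y, signal)
#     signal = gaussian_kfs_smoother(y_tilde, H_tilde)   # any linear Gaussian KFS (hypothetical helper)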
Initialisation of the Kalman filter can be carried out by any of the techniques
considered by de Jong (1989, 1991) and Koopman (1993) (see, for example, Abril (1999)).
Since in the course of the iterations variances and covariances of the errors of estimation of the
$\alpha_t$'s are not required, Koopman's smoother should be used in preference to de Jong's since
it is faster. After the iterations for state and parameter estimation are complete, if approximate
variances and covariances are required, a further smoothing pass using de Jong's smoother
can be made. However, the accuracy of these is open to doubt since in the non-Gaussian
case the matrix $b_t^{**}(Z_t\hat{\alpha}_t)$ is a random matrix, whereas in the Gaussian case the analogous
matrix $H_t$ is fixed.
3. Hyperparameter estimation
Since construction of the exact likelihood is difficult or impossible, the ML estimation
of the unknown parameter vector $\psi$ that we consider in this section is necessarily
approximate. The simplest approach is to construct an approximate likelihood by the
prediction error decomposition method. For the Gaussian model (1) in the diffuse case the
likelihood is given by
$$L = p(Y_n \mid \psi) = (2\pi)^{-(n-d)p/2} \prod_{t=d+1}^{n} |F_t|^{-1/2}
\exp\Bigl\{-\frac{1}{2}\sum_{t=d+1}^{n} v_t' F_t^{-1} v_t\Bigr\}, \qquad (9)$$
where $d$ allows for the diffuse initialisation, $v_t = y_t - Z_t a_t$, $a_t$ is the filtered estimate of $\alpha_t$
and $F_t = \mathrm{Var}(v_t)$. By analogy with this, our first approximate form for the likelihood is given
by the same expression (9), where $v_t$ and $F_t$ are now the values calculated in a final pass of
the Kalman filter using the methods of the previous section, with $\tilde{\alpha}_t$ equal to the final
smoothed estimate of $\alpha_t$.
Assume now that the initial mean and variance $a_0$ and $P_0$ are known. For the second
approximate form, denote $E\{(\alpha - \hat{\alpha})(\alpha - \hat{\alpha})'\}$ for the exponential family model (4) and (5) by
$V_e$, and denote the same matrix for the Gaussian model (1) with $R_t = I_m$ by $V_g$. Let
$p_e(\alpha, Y_n \mid \psi)$, $p_g(\alpha, Y_n \mid \psi)$, $p_e(Y_n \mid \psi)$, $p_g(Y_n \mid \psi)$ be the joint densities of $\alpha, Y_n$ and the
marginal densities of $Y_n$ under the two models. Then
$$p_g(\alpha, Y_n \mid \psi) = p_g(Y_n \mid \psi)\,(2\pi)^{-nm/2} |V_g|^{-1/2}
\exp\Bigl\{-\frac{1}{2}(\alpha - \hat{\alpha})' V_g^{-1} (\alpha - \hat{\alpha})\Bigr\},$$
so on putting $\alpha = \hat{\alpha}$ we have
$$p_g(Y_n \mid \psi) = (2\pi)^{nm/2} |V_g|^{1/2}\, p_g(\hat{\alpha}, Y_n \mid \psi). \qquad (10)$$
Now
$$p_g(\alpha, Y_n \mid \psi) = (2\pi)^{-n(m+p)/2} \prod_{t=1}^{n} |Q_t|^{-1/2} |H_t|^{-1/2} f_g(\alpha, Y_n),$$
where
$$f_g(\alpha, Y_n) = \exp\Bigl[-\frac{1}{2}\sum_{t=1}^{n}\bigl\{(\alpha_t - T_t\alpha_{t-1})' Q_t^{-1} (\alpha_t - T_t\alpha_{t-1})
+ (y_t - Z_t\alpha_t)' H_t^{-1} (y_t - Z_t\alpha_t)\bigr\}\Bigr]. \qquad (11)$$
We deduce from (10) and (11) that
$$p_g(Y_n \mid \psi) = (2\pi)^{-np/2} |V_g|^{1/2} \prod_{t=1}^{n} |Q_t|^{-1/2} |H_t|^{-1/2} f_g(\hat{\alpha}, Y_n). \qquad (12)$$
But by the prediction error decomposition,
$$p_g(Y_n \mid \psi) = (2\pi)^{-np/2} \prod_{t=1}^{n} |F_t|^{-1/2}
\exp\Bigl\{-\frac{1}{2}\sum_{t=1}^{n} v_t' F_t^{-1} v_t\Bigr\}. \qquad (13)$$
Since the quadratic forms in $Y_n$ in the exponents of (12) and (13) must be the same, the
following identity holds:
$$|V_g|^{1/2} = \prod_{t=1}^{n} |Q_t|^{1/2}\, |H_t|^{1/2}\, |F_t|^{-1/2}. \qquad (14)$$
We now apply these results to the exponential family model. The joint density of
$\alpha$ and $Y_n$ is
$$p_e(\alpha, Y_n \mid \psi) = (2\pi)^{-nm/2} \prod_{t=1}^{n} |Q_t|^{-1/2}
\exp\Bigl[-\frac{1}{2}\sum_{t=1}^{n}(\alpha_t - T_t\alpha_{t-1})' Q_t^{-1} (\alpha_t - T_t\alpha_{t-1})
+ \sum_{t=1}^{n}\bigl\{\alpha_t' Z_t' y_t - b_t(Z_t\alpha_t) + c_t(y_t)\bigr\}\Bigr]. \qquad (15)$$
By analogy with (10) we take for the approximate marginal density of $Y_n$, and hence our
second approximation to the likelihood,
$$L = p_e(Y_n \mid \psi) = (2\pi)^{nm/2} |V_e|^{1/2}\, p_e(\hat{\alpha}, Y_n \mid \psi), \qquad (16)$$
where
$$|V_e|^{1/2} = \prod_{t=1}^{n} |Q_t|^{1/2}\, |b_t^{**}(Z_t\hat{\alpha}_t)|^{-1/2}\, |\hat{F}_t|^{-1/2}, \qquad (17)$$
and where $\hat{\alpha}_t$ is the final smoothed value of $\alpha_t$ and $\hat{F}_t$ is the value of $F_t$ computed in
the final pass of the Kalman filter with $\tilde{\alpha}_t = \hat{\alpha}_t$. Similar results hold in the diffuse case when
the Kalman filter is initialised by any of the techniques considered by de Jong (1989, 1991)
and Koopman (1993) (see, for example, Abril (1999)).
Either of these forms of the likelihood can be maximised by numerical maximisation
routines, as for the prediction error decomposition in the Gaussian case, with concentration
where appropriate (see, for example, Abril (1999)). However, a new point arises that was not
present in the Gaussian case. Values of the likelihood are calculated from the results of the
iterated estimation of $\hat{\alpha}$ and are therefore subject to small errors, since the iteration cannot
be continued until the errors are infinitesimal. Consequently they are not suitable for
maximising routines that calculate the gradient from adjacent values of the hyperparameters
and then proceed from the last approximation along a direction determined solely by this
gradient. Instead, routines should be used such as the downhill simplex method given in
section 10.4 of Press et al. (1986), Powell's method given in section 10.5 of the same book,
or the Gill–Murray–Pitfield algorithm.
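In practice this simply means handing the (negative) approximate loglikelihood to a derivative-free optimiser. For instance, assuming the approximate loglikelihood is available as a Python function approx_loglik(psi) with starting value psi0 (both names are ours and are assumed to be defined by the user), the downhill simplex method can be invoked as in the following sketch:

from scipy.optimize import minimize

# 'Nelder-Mead' is the downhill simplex method of Press et al. (1986, section 10.4);
# method='Powell' would select Powell's method instead. Neither uses gradients.
# approx_loglik and psi0 are assumed to be supplied by the user.
result = minimize(lambda psi: -approx_loglik(psi), psi0, method="Nelder-Mead",
                  options={"xatol": 1e-4, "fatol": 1e-4})
psi_hat = result.x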
Instead of estimating the hyperparameters by direct numerical maximisation of the
loglikelihood, the EM algorithm may be used as an alternative, or the two techniques may be
combined by using the EM algorithm in the early stages of the maximisation, where it is relatively
fast, and then switching to numerical maximisation in the later stages, where the EM algorithm
is relatively slow.
Let us now consider the use of the EM algorithm. For the model we are considering,
the observational density (4) does not depend on unknown hyperparameters. Consequently,
for the construction of the EM algorithm we can neglect the observational component, so the E
step gives, as an approximation and ignoring negligible terms,
$$\tilde{E}_c\bigl\{\log p(\alpha, Y_n \mid \psi)\bigr\} = -\frac{1}{2}\sum_{t=1}^{n}\Bigl[\log|Q_t|
+ \mathrm{tr}\,Q_t^{-1}\bigl\{(\hat{\alpha}_t - T_t\hat{\alpha}_{t-1})(\hat{\alpha}_t - T_t\hat{\alpha}_{t-1})'
+ \tilde{V}_t + T_t\tilde{V}_{t-1}T_t' - T_t\tilde{C}_t - \tilde{C}_t'T_t'\bigr\}\Bigr]. \qquad (18)$$
Here, $\hat{\alpha}_t$ is the final smoothed value of $\alpha_t$ given that $\psi = \tilde{\psi}$, where $\tilde{\psi}$ is a trial value of $\psi$,
$\tilde{E}_c$ denotes expectation with respect to the density $p(\alpha \mid Y_n, \tilde{\psi})$, $\hat{\alpha} = \tilde{E}_c(\alpha)$,
$\tilde{V}_t = \tilde{E}_c\{(\alpha_t - \hat{\alpha}_t)(\alpha_t - \hat{\alpha}_t)'\}$ and $\tilde{C}_t = \tilde{E}_c\{(\alpha_{t-1} - \hat{\alpha}_{t-1})(\alpha_t - \hat{\alpha}_t)'\}$.
Similarly, and preferably, if hyperparameters occur only in $Q_t$ and not in $T_t$, we can
use the faster Koopman (1993) form
$$\tilde{E}_c\bigl\{\log p(\alpha, Y_n \mid \psi)\bigr\} = -\frac{1}{2}\sum_{t=1}^{n}\bigl[\log|Q_t|
+ \mathrm{tr}\,Q_t^{-1}\bigl(\tilde{\eta}_t\tilde{\eta}_t' + \tilde{Q}_t - \tilde{Q}_t\tilde{D}_t\tilde{Q}_t\bigr)\bigr], \qquad (19)$$
where $\tilde{\eta}_t$, $\tilde{Q}_t$ and $\tilde{D}_t$ are calculated for the linearised model in the same way as for the
linear model (see, for example, Abril (1999)). In either case, in the M step, $\tilde{E}_c[\,\cdot\,]$ is
maximised with respect to $\psi$ to obtain an improved estimate.
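As an example of the M step, suppose, purely for illustration and not as an assumption made in the text, that $Q_t = Q$ for all $t$ and that $\psi$ consists of the free elements of $Q$. Maximising (19) with respect to $Q$ then has the familiar closed form
$$\hat{Q} = \frac{1}{n}\sum_{t=1}^{n}\bigl(\tilde{\eta}_t\tilde{\eta}_t' + \tilde{Q} - \tilde{Q}\tilde{D}_t\tilde{Q}\bigr),$$
and $\hat{Q}$ then serves as the new trial value $\tilde{Q}$ at the next EM cycle.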
For cases in which the EM algorithm is not used, it is sometimes convenient to
compute the score function in order to specify the direction along which the numerical search
should be made. The approximate score function at $\psi = \tilde{\psi}$ is
$$\left.\frac{\partial \log p(Y_n \mid \psi)}{\partial \psi}\right|_{\psi = \tilde{\psi}}
= \left.\frac{\partial \tilde{E}_c\bigl\{\log p(\alpha, Y_n \mid \psi)\bigr\}}{\partial \psi}\right|_{\psi = \tilde{\psi}},$$
where $\tilde{E}_c\{\log p(\alpha, Y_n \mid \psi)\}$ is given by (18) or (19). As we said, this is used to specify the
direction of the numerical search for the maximum.
4. Calculation of starting values for the state iteration
Two methods are suggested for obtaining starting values for the state iteration: the
extended Kalman filter and the approximate two-filter smoother. With trial values $\tilde{\alpha}_t$, the
Kalman filter step during the state iteration that we have just been discussing can be written
as
$$\begin{aligned}
\tilde{a}_{t+1} &= T_{t+1}\tilde{a}_t + \tilde{K}_t\tilde{v}_t,\\
\tilde{P}_{t+1} &= T_{t+1}\tilde{P}_t\bigl\{T_{t+1}' - Z_t' b_t^{**}(Z_t\tilde{\alpha}_t)\tilde{K}_t'\bigr\} + R_{t+1}Q_{t+1}R_{t+1}',
\end{aligned} \qquad (20)$$
where
$$\begin{aligned}
\tilde{v}_t &= y_t - b_t^{*}(Z_t\tilde{\alpha}_t) + b_t^{**}(Z_t\tilde{\alpha}_t)\,Z_t(\tilde{\alpha}_t - \tilde{a}_t),\\
\tilde{K}_t &= T_{t+1}\tilde{P}_t Z_t'\, b_t^{**}(Z_t\tilde{\alpha}_t)\,\tilde{F}_t^{-1},\\
\tilde{F}_t &= b_t^{**}(Z_t\tilde{\alpha}_t) + b_t^{**}(Z_t\tilde{\alpha}_t)\, Z_t \tilde{P}_t Z_t'\, b_t^{**}(Z_t\tilde{\alpha}_t).
\end{aligned}$$
The extended Kalman filter is obtained merely by taking $\tilde{\alpha}_{t+1} = \tilde{a}_{t+1}$ in these formulae. Thus
all we are doing is taking for $\tilde{\alpha}_t$ the estimate of $\alpha_t$ given by the filter at time $t - 1$.
The extended Kalman filter normally gives adequate starting values for the state
iteration. Indeed, Fahrmeir (1992) claims that it gives an adequate approximation to the PME,
though Durbin and Koopman (1994) and Abril (1999) dispute this. However, it is obvious that
the estimates of $\alpha_t$ that it gives for small $t$ are less accurate than those for large $t$. This fault
is rectified by the following device, which uses the entire sample to estimate $\alpha_t$ at each time
point.
Let us revert momentarily to the Gaussian model (1). Let $Y_{t+1}^n = (y_{t+1}', \ldots, y_n')'$. Then,
because of the Markovian nature of the model,
$$p(\alpha_t \mid Y_n) = \frac{p(\alpha_t, Y_n)}{p(Y_n)}
= \frac{p(Y_n \mid \alpha_t)\,p(\alpha_t)}{p(Y_n)}
= \frac{p(Y_t \mid \alpha_t)\,p(Y_{t+1}^n \mid \alpha_t)\,p(\alpha_t)}{p(Y_n)}
= \frac{p(\alpha_t \mid Y_t)\,p(\alpha_t \mid Y_{t+1}^n)\,c(Y_n)}{p(\alpha_t)}, \qquad (21)$$
where $c(Y_n)$ depends only on $Y_n$ and where $p(\alpha_t)$ is the marginal density of $\alpha_t$. This
formula has been derived in a wider context by Solo (1989). Now suppose, as will normally
be the case in practice, that we have initialised our Kalman filter with a diffuse prior. Then
$p(\alpha_t)$ will also be diffuse, so the estimate of $\alpha_t$ going forwards in time and the estimate
going backwards in time will be independent. Suppose further that a valid and manageable
state space model is available going backwards in time; often, of course, it will be effectively
the same model as the one going forwards. Let $a_t^n = E(\alpha_t \mid Y_{t+1}^n)$ and
$P_t^n = E\{(\alpha_t - a_t^n)(\alpha_t - a_t^n)' \mid Y_{t+1}^n\}$; these are, of course, given routinely by the reverse
Kalman filter. Then
$$\hat{\alpha}_t = \bigl\{P_{t|t}^{-1} + (P_t^n)^{-1}\bigr\}^{-1}\bigl\{P_{t|t}^{-1}a_{t|t} + (P_t^n)^{-1}a_t^n\bigr\},$$
which is called the two-filter smoother.
While the two-filter smoother gives the same values as the classical and de Jong
smoothers for Gaussian models, it is not as efficient computationally. Its real value
in the present context is that, after approximation and simplification, it is capable of giving a
better approximation to the PME of the stacked vector than the extended Kalman filter in
non-Gaussian cases (see, for example, Abril (1999)). We can use the idea to obtain state
starting values after implementing extended Kalman filters in both the forwards and backwards
directions. Let us assume, as is often the case, that the dimensionality of $\alpha_t$ is so large that
inversion of the matrices $P_t$ and $P_t^n$ at each time point is impractical. Let $f$ and $b$ be the
forwards and backwards estimates of a particular element of $\alpha_t$ and let $v_f$ and $v_b$ be the
corresponding diagonal elements of the variance matrices $P_t$ and $P_t^n$ produced by the two
filters. Then take as the estimate of this element, to be used as a starting value in the
state iteration, the weighted average $(f/v_f + b/v_b)\,/\,(1/v_f + 1/v_b)$. The starting values
obtained in this way should be more accurate than those given by the extended Kalman filter
and should therefore lead to fewer iterations, but whether the resulting computational gain is
worthwhile depends on the particular model. We call this the approximate two-filter
smoother.
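A minimal sketch of this elementwise combination follows; the variable names are ours, and the forward and backward estimates and variances are assumed to come from the two extended Kalman filter passes.

import numpy as np

def two_filter_start(f, vf, b, vb):
    """Approximate two-filter smoother: combine forward estimates f (variances vf)
    and backward estimates b (variances vb) elementwise by precision weighting.
    All arguments are arrays of shape (n, m): one row per time point."""
    wf, wb = 1.0 / np.asarray(vf), 1.0 / np.asarray(vb)
    return (np.asarray(f) * wf + np.asarray(b) * wb) / (wf + wb)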
5. Conclusions
This paper presents a methodology for the treatment of non-Gaussian time series
observations, particularly when they come from exponential family distributions. The
methodology can be used by applied researchers dealing with real non-Gaussian time series
data without their having to be time series specialists. The idea underlying the techniques is
to put everything in state space form and then linearise it to obtain an approximation to the
Gaussian case, in order to apply the KFS, estimating the state vector by its posterior mode.
The PME is clearly a reasonable estimate because it maximises the corresponding density
and is analogous to the maximum likelihood estimate.
The way in which the state and hyperparameter iterations are tied together is as
follows. Let $\psi^{(1)}$ be the value of $\psi$ that maximises the likelihood using only the extended
Kalman filter or the approximate two-filter smoother. Let $\alpha_t^{(1)}$ be the value of $\alpha_t$ produced by
this filter or smoother in the final step of this hyperparameter iteration. Taking
$\psi = \psi^{(1)}$ and $\alpha_t^{(1)}$ as starting values, let $\alpha_t^{(2)}$ be the smoothed value of $\alpha_t$ produced by the
state iteration of section 2. With $\alpha_t = \alpha_t^{(2)}$ and $\psi^{(1)}$ as a trial value of $\psi$, let $\psi^{(2)}$ be the value
of $\psi$ that maximises the likelihood. Taking $\alpha_t^{(2)}$ as starting value, let $\alpha_t^{(3)}$ be the value
produced by the state iteration. The process continues in this way until the relative change in
$\log L$ is less than a preassigned number, say $10^{-5}$ or $10^{-6}$. A final state iteration is then made
to give the final smoothed values $\hat{\alpha}_t$.
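Purely as an organisational sketch of this alternation, the skeleton below passes the three ingredients in as functions, since their implementation is given in the earlier sections; the names and the convergence constant are illustrative only.

def estimate(psi_start, alpha_start, maximise_loglik, state_iteration, loglik,
             tol=1e-5, max_cycles=50):
    """Alternate hyperparameter maximisation and state iteration until the
    relative change in log L falls below tol (e.g. 1e-5 or 1e-6)."""
    psi, alpha = psi_start, alpha_start
    old = loglik(psi, alpha)
    for _ in range(max_cycles):
        alpha = state_iteration(psi, alpha)      # state iteration of section 2
        psi = maximise_loglik(psi, alpha)        # approximate likelihood of section 3
        new = loglik(psi, alpha)
        if abs(new - old) <= tol * abs(old):
            break
        old = new
    alpha = state_iteration(psi, alpha)          # final state iteration
    return psi, alpha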
REFERENCES
ABRIL, JUAN CARLOS. (1999). Análisis de Series de Tiempo Basado en Modelos de Espacio
de Estado. EUDEBA: Buenos Aires.
DE JONG, P. (1989). Smoothing and interpolation with the state-space model. J. Amer. Statist. Assoc., 84, 1085-88.
DE JONG, P. (1991). The diffuse Kalman filter. Ann. Statist., 19, 1073-83.
DURBIN, J. AND S. J. KOOPMAN. (1994). Filtering, smoothing and estimation for time series
when the observations come from exponential family distributions. London School of
Economics Statistics Research Report.
FAHRMEIR, L. (1992). Posterior mode estimation by extended Kalman filtering for multivariate
dynamic generalised linear models. J. Amer. Statist. Ass., 87, 501-9.
KOOPMAN, S. J. (1993). Disturbance smoother for state space models. Biometrika, 80, 117-26.
MCCULLAGH, P. AND J. A. NELDER. (1989). Generalised Linear Models (Second Ed.).
Chapman and Hall: London.
PRESS, W., B. FLANNERY, S. TEUKOLSKY AND W. VETTERLING. (1986). Numerical Recipes:
The Art of Scientific Computing. Cambridge University Press: Cambridge.
RAO, C. RADHAKRISHNA. (1973). Linear Statistical Inference and Its Applications (Second
Ed.). Wiley: New York.
SAGE, A. AND J. MELSA. (1971). Estimation Theory, With Applications to Communications and
Control. McGraw-Hill: New York.
SOLO, VICTOR. (1989). A simple derivation of the Granger representation theorem.
Unpublished manuscript.
WEST, M., J. HARRISON AND H. S. MIGON. (1985). Dynamic generalised linear models and
Bayesian Forecasting. J. Amer. Statist. Ass., 80, 73-97.
WHITTLE, PETER. (1991). Likelihood and cost as path integrals (with discussion). J. R. Statist.
Soc., B, 53, 505-29.