An Introduction to Stochastic Processes and Their Applications

Springer Series in Statistics
Probability and its Applications
A Series of the Applied Probability Trust
Editors-Probability and its Applications
J. Gani, C.C. Heyde
Editors-Springer Series in Statistics
J. Berger, S. Fienberg, J. Gani, K. Krickeberg,
I. Olkin, B. Singer
Springer Series in Statistics
Anderson: Continuous-Time Markov Chains: An Applications-Oriented
Approach.
Andrews/Herzberg: Data: A Collection of Problems from Many Fields for the
Student and Research Worker.
Anscombe: Computing in Statistical Science through APL.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Bolfarine/Zacks: Prediction Theory for Finite Populations.
Brémaud: Point Processes and Queues: Martingale Dynamics.
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition.
Choi: ARMA Model Identification.
Daley/Vere-Jones: An Introduction to the Theory of Point Processes.
Dzhaparidze: Parameter Estimation and Hypothesis Testing in Spectral Analysis of
Stationary Time Series.
Farrell: Multivariate Calculation.
Fienberg/Hoaglin/Kruskal/Tanur (Eds.): A Statistical Model:
Frederick Mosteller's Contributions to Statistics, Science, and
Public Policy.
Goodman/Kruskal: Measures of Association for Cross Classifications.
Grandell: Aspects of Risk Theory.
Hall: The Bootstrap and Edgeworth Expansion.
Härdle: Smoothing Techniques: With Implementation in S.
Hartigan: Bayes Theory.
Heyer: Theory of Statistical Experiments.
Jolliffe: Principal Component Analysis.
Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume I.
Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume II.
Kres: Statistical Tables for Multivariate Analysis.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random
Sequences and Processes.
Le Cam: Asymptotic Methods in Statistical Decision Theory.
Le Cam/Yang: Asymptotics in Statistics: Some Basic Concepts.
Manoukian: Modern Concepts and Theorems of Mathematical Statistics.
Miller, Jr.: Simultaneous Statistical Inference, 2nd edition.
Mosteller/Wallace: Applied Bayesian and Classical Inference: The Case of The
Federalist Papers.
Pollard: Convergence of Stochastic Processes.
Pratt/Gibbons: Concepts of Nonparametric Theory.
Read/Cressie: Goodness-of-Fit Statistics for Discrete Multivariate Data.
Reiss: Approximate Distributions of Order Statistics: With Applications to
Nonparametric Statistics.
Ross: Nonlinear Estimation.
Sachs: Applied Statistics: A Handbook of Techniques, 2nd edition.
Salsburg: The Use of Restricted Significance Tests in Clinical Trials.
Särndal/Swensson/Wretman: Model Assisted Survey Sampling.
Seneta: Non-Negative Matrices and Markov Chains.
(continued after index)
Petar Todorovic
An Introduction to
Stochastic Processes and
Their Applications
With 15 Illustrations
Springer-Verlag
New York Berlin Heidelberg London Paris
Tokyo Hong Kong Barcelona Budapest
Petar Todorovic
Department of Statistics and
Applied Probability
University of California-Santa Barbara
Santa Barbara, CA 93106
USA
Series Editors:
J. Gani
Department of Statistics
University of California
Santa Barbara, CA 93106
USA
C.C. Heyde
Department of Statistics
Institute of Advanced Studies
The Australian National University
GPO Box 4, Canberra ACT 2601
Australia
Mathematics Subject Classification (1991): 60G07, 60G12, 60J25
Library of Congress Cataloging-in-Publication Data
Todorovic, P. (Petar)
An introduction to stochastic processes and their applications / by P. Todorovic.
p. cm.-(Springer series in statistics)
Includes bibliographical references and index.
ISBN-13: 978-1-4613-9744-1
e-ISBN-13: 978-1-4613-9742-7
DOI: 10.1007/978-1-4613-9742-7
1. Stochastic processes. I. Series.
QA274.T64 1992 519.2-dc20 91-46692
Printed on acid-free paper.
© 1992 The Applied Probability Trust.
Softcover reprint of the hardcover 1st edition 1992
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New
York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis.
Use in connection with any form of information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed
is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if
the former are not especially identified, is not to be taken as a sign that such names, as understood
by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Production managed by Henry Krell; manufacturing supervised by Robert Paella.
Typeset by Asco Trade Typesetting Ltd., Hong Kong.
9 8 7 6 5 4 3 2 1
ISBN-13: 978-1-4613-9744-1
To my wife Zivadinka
Preface
This text on stochastic processes and their applications is based on a set of
lectures given during the past several years at the University of California,
Santa Barbara (UCSB). It is an introductory graduate course designed for
classroom purposes. Its objective is to provide graduate students of statistics
with an overview of some basic methods and techniques in the theory of
stochastic processes. The only prerequisites are some rudiments of measure
and integration theory and an intermediate course in probability theory.
There are more than 50 examples and applications and 243 problems and
complements which appear at the end of each chapter.
The book consists of 10 chapters. Basic concepts and definitions are provided in Chapter 1. This chapter also contains a number of motivating examples and applications illustrating the practical use of the concepts. The
last five sections are devoted to topics such as separability, continuity, and
measurability of random processes, which are discussed in some detail.
The concept of a simple point process on R+ is introduced in Chapter
2. Using the coupling inequality and Le Cam's lemma, it is shown that
if its counting function is stochastically continuous and has independent
increments, the point process is Poisson. When the counting function is
Markovian, the sequence of arrival times is also a Markov process. Some
related topics such as independent thinning and marked point processes are
also discussed. In the final section, an application of these results to flood
modeling is presented.
Chapter 3 provides a short introduction to the theory of one-dimensional
Brownian motion. Principal topics here are hitting times, extremes, the reflection principle, properties of sample paths and the law of the iterated logarithm. The chapter ends with a discussion of the Langevin equation, the
Ornstein-Uhlenbeck process and stochastic integration.
Chapter 4 deals with the theory of Gaussian processes. It begins with a
brief account of the relevant matrix theory and the definition of a multivariate
Gaussian distribution and its characteristic function, which is worked out in
detail. We show that a system of random variables is Gaussian if and only if
every linear combination of these variables is normal. We also discuss the
Markov-Gaussian process and prove Doob's theorem, which asserts that the
only stationary Markov-Gaussian processes are the Ornstein-Uhlenbeck processes.
Chapter 5 contains a brief introduction to the Hilbert space L₂, which
has some particular features not shared by other Lₚ spaces. Here the emphasis
is on those topics essential in subsequent sections. They include the Riesz-Fisher theorem, the structure of L₂ spaces, the concept of orthogonal projection and orthogonal basis, separability, and linear and projection operators.
Chapter 6 deals with the theory of second order (or L₂) processes, which
are characterized up to Hilbert space isometry by their covariance functions.
The focus here is on the covariance function and its properties. It is natural
to have criteria for continuity, etc., expressed in terms of the covariance
function. The expansion of the covariance function in terms of its eigenvalues and
eigenfunctions, as well as the Karhunen-Loève expansion, is discussed in
some detail.
The first part of Chapter 7 is concerned with the spectral analysis of (wide
sense) stationary processes. The gist of this section is the "spectral representation" of a stationary process, which establishes an isometric isomorphism
between the closed linear manifold spanned by the random variables of the
process and a certain L2 space of complex functions. With the groundwork
laid, the problem of estimation (and its special cases filtering and prediction)
can now be investigated. The method for solving the prediction problem
described here is due to Yaglom. Its starting point is the spectral representation of the process. However, the results obtained are most useful for rational
spectral densities. Finally, the Wold decomposition is also considered in some
detail.
Chapter 8, an introduction to Markov processes, consists of three parts.
The first lists some basic features of homogeneous Markov processes: it is
shown that the existence of a stationary measure is a necessary and sufficient
condition for the process to be strictly stationary. The second part treats a
class of homogeneous Markov processes with countable state space. The
focus here is on the transition probability and its properties. If sample paths
of the Markov process are right continuous, then its transition probability is
not only uniformly continuous but also differentiable. This is used to derive
Kolmogorov's backward and forward differential equations. In this section
we also introduce the concept of the "strong Markov" property and discuss
the structure of Markov chains. The last part is concerned with homogeneous
diffusion. We briefly describe Ito's approach, which shows that a diffusion
process is governed by a first-order stochastic differential equation which
depends on a standard Brownian motion process.
Chapter 9 provides an introduction to the application of semigroup theory
to Markov processes, whereas Chapter 10 discusses some rudiments of the
theory of discrete parameter martingales.
I would like to point out that after Chapter 1 (or at least the first half of it)
one can move directly to most of the other chapters. Chapter 5, however, is a
necessary prerequisite for reading Chapters 6, 7, and 9. The course has been
tested over the years on graduate students of statistics at the University of California, Santa Barbara, and contains material suitable for an introductory as
well as a more advanced course in stochastic processes.
For encouragement, support, and valuable advice, I am glad to thank Dr.
Joe Gani. I am also grateful to the referees including William Griffith and
Gennady Samorodnitsky for their comments on the first draft of this book.
My special thanks to Chris Heyde for his extraordinarily careful reading of
the whole manuscript and for correcting numerous errors and misprints.
Finally, I acknowledge with warm thanks my indebtedness to colleagues and
students at the UCSB Department of Statistics and Applied Probability.
Petar Todorovic
Contents
Preface

CHAPTER 1
Basic Concepts and Definitions
1.1. Definition of a Stochastic Process
1.2. Sample Functions
1.3. Equivalent Stochastic Processes
1.4. Kolmogorov Construction
1.5. Principal Classes of Random Processes
1.6. Some Applications
1.7. Separability
1.8. Some Examples
1.9. Continuity Concepts
1.10. More on Separability and Continuity
1.11. Measurable Random Processes
Problems and Complements

CHAPTER 2
The Poisson Process and Its Ramifications
2.1. Introduction
2.2. Simple Point Process on R₊
2.3. Some Auxiliary Results
2.4. Definition of a Poisson Process
2.5. Arrival Times {τₙ}
2.6. Markov Property of N(t) and Its Implications
2.7. Doubly Stochastic Poisson Process
2.8. Thinning of a Point Process
2.9. Marked Point Processes
2.10. Modeling of Floods
Problems and Complements

CHAPTER 3
Elements of Brownian Motion
3.1. Definitions and Preliminaries
3.2. Hitting Times
3.3. Extremes of ξ(t)
3.4. Some Properties of the Brownian Paths
3.5. Law of the Iterated Logarithm
3.6. Some Extensions
3.7. The Ornstein-Uhlenbeck Process
3.8. Stochastic Integration
Problems and Complements

CHAPTER 4
Gaussian Processes
4.1. Review of Elements of Matrix Analysis
4.2. Gaussian Systems
4.3. Some Characterizations of the Normal Distribution
4.4. The Gaussian Process
4.5. Markov Gaussian Process
4.6. Stationary Gaussian Process
Problems and Complements

CHAPTER 5
L₂ Space
5.1. Definitions and Preliminaries
5.2. Convergence in Quadratic Mean
5.3. Remarks on the Structure of L₂
5.4. Orthogonal Projection
5.5. Orthogonal Basis
5.6. Existence of a Complete Orthonormal Sequence in L₂
5.7. Linear Operators in a Hilbert Space
5.8. Projection Operators
Problems and Complements

CHAPTER 6
Second-Order Processes
6.1. Covariance Function C(s, t)
6.2. Quadratic Mean Continuity and Differentiability
6.3. Eigenvalues and Eigenfunctions of C(s, t)
6.4. Karhunen-Loève Expansion
6.5. Stationary Stochastic Processes
6.6. Remarks on the Ergodicity Property
Problems and Complements

CHAPTER 7
Spectral Analysis of Stationary Processes
7.1. Preliminaries
7.2. Proof of the Bochner-Khinchin and Herglotz Theorems
7.3. Random Measures
7.4. Process with Orthogonal Increments
7.5. Spectral Representation
7.6. Ramifications of Spectral Representation
7.7. Estimation, Prediction, and Filtering
7.8. An Application
7.9. Linear Transformations
7.10. Linear Prediction, General Remarks
7.11. The Wold Decomposition
7.12. Discrete Parameter Processes
7.13. Linear Prediction
7.14. Evaluation of the Spectral Characteristic φ(λ, h)
7.15. General Form of Rational Spectral Density
Problems and Complements

CHAPTER 8
Markov Processes I
8.1. Introduction
8.2. Invariant Measures
8.3. Countable State Space
8.4. Birth and Death Process
8.5. Sample Function Properties
8.6. Strong Markov Processes
8.7. Structure of a Markov Chain
8.8. Homogeneous Diffusion
Problems and Complements

CHAPTER 9
Markov Processes II: Application of Semigroup Theory
9.1. Introduction and Preliminaries
9.2. Generator of a Semigroup
9.3. The Resolvent
9.4. Uniqueness Theorem
9.5. The Hille-Yosida Theorem
9.6. Examples
9.7. Some Refinements and Extensions
Problems and Complements

CHAPTER 10
Discrete Parameter Martingales
10.1. Conditional Expectation
10.2. Discrete Parameter Martingales
10.3. Examples
10.4. The Upcrossing Inequality
10.5. Convergence of Submartingales
10.6. Uniformly Integrable Martingales
Problems and Complements

Bibliography
Index
CHAPTER 1
Basic Concepts and Definitions
1.1. Definition of a Stochastic Process
Generally speaking, a stochastic or random process (in this book both terms
will be used in an equivalent sense) is a family of random variables defined on
a common probability space, indexed by the elements of an ordered set T,
which is called the parameter set. Most often, T is taken to be an interval of
time, and the random variable indexed by an element t ∈ T is said to describe
the state of the process at time t. Random processes considered here are
specified by the following definition:
Definition 1.1.1. A stochastic process is a family of (real- or complex-valued)
random variables {X(t); t ∈ T}, defined on a common probability space
{Ω, ℬ, P}, where the parameter set T is a subset of the real line R.

In the following, we call a stochastic process real if the random variables
X(t) are real-valued for all t ∈ T. If the random variables X(t) are all complex-valued, the random process is called complex.

The parameter set T is often called the domain of definition of the stochastic process X(t). If T = N₊ = {0, 1, ...}, the process is said to be a discrete parameter stochastic process. If T is an interval of the real line, the process is said to have a continuous parameter. Most often, T is either N₊ or R₊ = [0, ∞).
Let {X(t); t ∈ T} be a real-valued random process and {t₁, ..., tₙ} ⊂ T,
where t₁ < t₂ < ... < tₙ; then

F_{t₁,...,tₙ}(x₁, ..., xₙ) = P{X(t₁) ≤ x₁, ..., X(tₙ) ≤ xₙ}   (1.1.1)

is a finite-dimensional marginal distribution function of the process
{X(t); t ∈ T}. One of the basic characteristics of a real random process is the
family of its marginal distribution functions.

Definition 1.1.2. We shall say that the random process {X(t); t ∈ T} is given if
all its marginal distribution functions

{F_{t₁,...,tₙ}(·, ..., ·)}   (1.1.2)

obtained for all finite subsets {t₁, ..., tₙ} ⊂ T are given.

The marginal distribution functions (1.1.2) must satisfy the following consistency conditions:

(i) For any permutation k₁, ..., kₙ of 1, ..., n,

F_{t_{k₁},...,t_{kₙ}}(x_{k₁}, ..., x_{kₙ}) = F_{t₁,...,tₙ}(x₁, ..., xₙ).   (1.1.3)

(ii) For any 1 ≤ k < n and x₁, ..., x_k ∈ R,

F_{t₁,...,t_k}(x₁, ..., x_k) = F_{t₁,...,tₙ}(x₁, ..., x_k, ∞, ..., ∞).

Many future statements will hold for both real- and complex-valued random processes. For this and other reasons, it is convenient to introduce the
concept of a state space {S, 𝒮} of a process. Here, S is the set of values of the
process and 𝒮 is a σ-algebra of subsets of S. For instance, if the process is
real, S will be the real line or a part of it and 𝒮 will be the σ-field of Borel
subsets of S.
1.2. Sample Functions
From Definition 1.1.1, it follows that for each fixed t ∈ T, X(t) = X(t, ·) is a
random variable defined on Ω. On the other hand, for any fixed ω ∈ Ω, X(·, ω)
is a function defined on the parameter set T. Any such function is called a
realization, trajectory, sample path, or sample function of the stochastic
process X(t). In Figure 1.1, a realization of the process X(t) is depicted when
T = [0, T₀].

Figure 1.1. A realization of X(t).
A stochastic process (say, real) {X(t); t ∈ T} is said to be directly given if
the sample space Ω = R^T, where R^T is the set of all functions on the parameter
set T with values in R; i.e., any ω ∈ Ω is a mapping

ω: T → R.

The stochastic process {X(t); t ∈ T} is then defined as a family of coordinate
mappings on R^T to R. In other words, for any t ∈ T, X(t, ·) is a random
variable on R^T defined as follows:

X(t, ω) = ω(t)   for any ω(·) ∈ R^T.

EXAMPLE 1.2.1. Let X and Y be independent random variables on a probability space {Ω, ℬ, P} with

H_X(x) = P{X ≤ x}   and   H_Y(y) = P{Y ≤ y}.

Let {ξ(t); t ≥ 0} be a stochastic process defined as

ξ(t) = tX + Y.

In Figure 1.2 are some sample functions of ξ(t). Clearly, for t > 0,

F_t(x) = P{tX + Y ≤ x} = ∫_{−∞}^{∞} P{X ≤ (x − y)/t} dH_Y(y)
       = ∫_{−∞}^{∞} H_X((x − y)/t) dH_Y(y).

For any 0 < t₁ < t₂ < ... < tₙ, we have

F_{t₁,...,tₙ}(x₁, ..., xₙ) = P{ξ(t₁) ≤ x₁, ..., ξ(tₙ) ≤ xₙ}
       = ∫_{−∞}^{∞} P{X ≤ (x₁ − y)/t₁, ..., X ≤ (xₙ − y)/tₙ} dH_Y(y).

Figure 1.2. Realizations of ξ(t).

It is also easy to verify that

E{ξ(t)ξ(s)} = st E{X²} + E{Y²},

assuming that E{X} = E{Y} = 0 and that X, Y ∈ L₂{Ω, ℬ, P}.
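A short numerical sketch, added here for illustration (not part of the original text), makes the last identity concrete. The choice of standard normal X and Y is an assumption made only for the example, in which case E{ξ(s)ξ(t)} = st + 1.

# Monte Carlo check of E{xi(s)xi(t)} = st*E{X^2} + E{Y^2} for xi(t) = t*X + Y.
# Assumption for illustration: X, Y independent standard normals.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.standard_normal(n)      # E{X} = 0, E{X^2} = 1
Y = rng.standard_normal(n)      # E{Y} = 0, E{Y^2} = 1

s, t = 0.5, 2.0
xi_s = s * X + Y                # xi(s, omega) for each simulated omega
xi_t = t * X + Y                # xi(t, omega)

print(np.mean(xi_s * xi_t))     # close to s*t + 1 = 2.0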
Before concluding this section, a few remarks on the structure of the σ-field ℬ
in the case when Ω is the space of functions R^T seem appropriate. Let B₁,
B₂, ..., Bₙ be n arbitrary Borel subsets of the real line, and consider the
following subset of R^T:

{ω; X(t₁) ∈ B₁, ..., X(tₙ) ∈ Bₙ},

where {t₁, ..., tₙ} ⊂ T and X(t, ·) is the coordinate mapping on R^T to R. This
subset is called an n-dimensional Borel cylinder. Then the σ-algebra ℬ is
defined as the least σ-algebra of subsets of R^T containing all the finite-dimensional Borel cylinders.

The problem with this σ-field is that the sample space Ω = R^T is too big
and the σ-algebra ℬ is too small, in the sense that many important subsets of
Ω are not members of ℬ. Therefore, we cannot compute their probabilities,
because the probability measure P is defined on ℬ. For instance, the subset
of all continuous functions of R^T is not an event (i.e., a member of ℬ). The
subset

{ω; sup_{t∈T} X(t) ≤ x} = ∩_{t∈T} {X(t) ≤ x}

does not belong in general to ℬ. As a matter of fact, any subset of R^T that is
an intersection or union of uncountably many events from ℬ may not necessarily be an element of ℬ. Such problems do not arise when T is a countable
set.
These difficulties are alleviated by the separability concept introduced by
Doob, which will be discussed in a later section. The following example is
quite instructive.
EXAMPLE 1.2.2. Let {ξ(t); t ≥ 0} be a real stochastic process on a probability
space {Ω, ℬ, P} given by

ξ(t) = t + X,

where X ~ N(0, 1) (i.e., X has a standard normal distribution). The sample
functions of the process ξ(t) are straight lines on T = [0, ∞). Let D ⊂ T be a
countable set, say D = {t₁, t₂, ...}, and consider the random event

A = {ξ(t) = 0 for at least one t ∈ D}.

That A ∈ ℬ can be easily seen if we write it as

A = ∪_{tᵢ∈D} {ξ(tᵢ) = 0} = ∪_{tᵢ∈D} {X = −tᵢ}.

Because each {X = −tᵢ} ∈ ℬ, it follows that A ∈ ℬ. It is also easy to see that
P(A) = 0 because P{X = −tᵢ} = 0 for each tᵢ ∈ D.

On the other hand, if we choose D to be [0, 1], then the subset B ⊂ Ω
defined as

B = {ξ(t) = 0 for at least one t ∈ [0, 1]} = ∪_{t∈[0,1]} {X = −t}

is not necessarily an event because it is an uncountable union of random
events. But

∪_{t∈[0,1]} {X = −t} = {X ∈ [−1, 0]} ∈ ℬ.

We also see that

P(B) = (2π)^{−1/2} ∫_{−1}^{0} e^{−x²/2} dx > 0.

Therefore, sometimes an uncountable union of null events can be an event
with positive probability. However, a countable union of null events is always
a null event.
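As a small numerical illustration (added here, not in the original), P(B) = Φ(0) − Φ(−1) ≈ 0.3413, where Φ is the standard normal distribution function; the sketch below computes it from the error function.

# P(B) = P{X in [-1, 0]} = Phi(0) - Phi(-1) for X ~ N(0, 1).
import math

def Phi(x):
    # standard normal distribution function expressed via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(Phi(0.0) - Phi(-1.0))     # approximately 0.3413 > 0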
1.3. Equivalent Stochastic Processes
Let {X(t); t ∈ T} and {Y(t); t ∈ T} be two stochastic processes defined on a
common probability space {Ω, ℬ, P} and taking values in the same state space
{S, 𝒮}.

Definition 1.3.1. If, for each n = 1, 2, ...,

P{X(t₁) ∈ B₁, ..., X(tₙ) ∈ Bₙ} = P{Y(t₁) ∈ B₁, ..., Y(tₙ) ∈ Bₙ}   (1.3.1)

for all {t₁, ..., tₙ} ⊂ T and {B₁, ..., Bₙ} ⊂ 𝒮, the random processes X(t) and
Y(t) are said to be stochastically equivalent in the wide sense.

Definition 1.3.2. If, for every t ∈ T,

P{X(t) = Y(t)} = 1,   (1.3.2)

the random processes are said to be stochastically equivalent, or just
equivalent.
Let us show that (1.3.2) implies (1.3.1). Due to (1.3.2),

P{X(t₁) ∈ B₁, ..., X(tₙ) ∈ Bₙ}
   = P{X(t₁) ∈ B₁, ..., X(tₙ) ∈ Bₙ, X(t₁) = Y(t₁), ..., X(tₙ) = Y(tₙ)}
   = P{Y(t₁) ∈ B₁, ..., Y(tₙ) ∈ Bₙ, X(t₁) = Y(t₁), ..., X(tₙ) = Y(tₙ)}
   = P{Y(t₁) ∈ B₁, ..., Y(tₙ) ∈ Bₙ}.
Definition 1.3.3. Let {ξ(t); t ∈ T} be a stochastic process on {Ω, ℬ, P} with
state space {S, 𝒮}. Any other stochastic process {ξ̃(t); t ∈ T} on the same
probability space, equivalent to ξ(t), is called a "version" of ξ(t).

Definition 1.3.4. The stochastic processes {X(t); t ∈ T} and {Y(t); t ∈ T} are said
to be "indistinguishable" if almost all their sample paths coincide on T. In
other words, if

P{X(t) = Y(t) ∀t ∈ T} = 1.   (1.3.3)
This is the strongest form of equivalence, which clearly implies the other
two. The following example shows that two equivalent processes may have
quite different sample paths. In other words, (1.3.2) holds but not (1.3.3).
EXAMPLE 1.3.1. Let Ω = [0, 1], ℬ the σ-field of Borel subsets of [0, 1], P
the Lebesgue measure, and T = [0, 1]. Consider two stochastic processes
{X(t); t ∈ T} and {Y(t); t ∈ T} on {Ω, ℬ, P} defined as follows:

X(t, ω) = 0 on T for each ω ∈ Ω,
Y(t, ω) = 0 on T except at the point t = ω, when Y(ω, ω) = 1.

For any fixed t ∈ T,

{ω; X(t, ω) ≠ Y(t, ω)} = {ω; ω = t} = {t}.

Because P{t} = 0, it follows that

P{X(t) = Y(t)} = 1   for each t ∈ T.

In other words, these two processes are stochastically equivalent and yet their
sample paths are different. In addition,

P{ω; X(t, ω) = Y(t, ω) ∀t ∈ T} = 0,
sup_{t∈T} X(t) = 0,   sup_{t∈T} Y(t) = 1.
Remark 1.3.1. At this point, it would be of some interest to elucidate the
principal distinction between Definitions 1.3.2 and 1.3.4. The point is that in
Definition 1.3.2 the null set Λₜ on which X(t) and Y(t) may differ depends on
t. As we have seen in Example 1.2.2, the union

∪_{t∈T} Λₜ

of uncountably many null events may not be a null event. On the other hand,
in Definition 1.3.4 there is just one null event Λ such that

X(t, ω) = Y(t, ω)   on T

for every ω ∈ Λᶜ.

Under what conditions are Definitions 1.3.2 and 1.3.4 equivalent? This is
clearly the case if T is a countable set. For continuous time, we need some
conventions. Let {ξ(t); t ∈ T} be a stochastic process with state space {S, 𝒮},
where S is a metric space. The stochastic process is said to be càdlàg if each of
its trajectories is a right continuous function with left limits.
Proposition 1.3.1. Let {X(t); t ∈ T} and {Y(t); t ∈ T} be stochastically equivalent and both right continuous. Then X(t) and Y(t) are indistinguishable.
PROOF. Let Q be the set of rational numbers. For each r ∈ Q ∩ T, P{X(r) ≠
Y(r)} = 0. Consequently, P(G) = 0, where

G = ∪_{r∈Q∩T} {X(r) ≠ Y(r)}.

However, by right continuity,

{X(t) ≠ Y(t)} ⊂ G

for any t ∈ T. Therefore,

∪_{t∈T} {X(t) ≠ Y(t)} ⊂ G,

which shows that

P{X(t) = Y(t) ∀t ∈ T} = 1.   □
1.4. Kolmogorov Construction
Let {ξ(t); t ∈ T} be a stochastic process on {Ω, ℬ, P} with state space {R, ℛ},
where R is the real line and ℛ the σ-algebra of Borel subsets of R. The
stochastic process determines a consistent family of marginal distribution
functions by

F_{t₁,...,tₙ}(x₁, ..., xₙ) = P{ξ(t₁) ≤ x₁, ..., ξ(tₙ) ≤ xₙ}.

Is the converse true? In other words, given a consistent family of distribution
functions, does there exist a stochastic process for which these distributions
are its marginal distributions? The following theorem due to Kolmogorov
(which is not going to be proved here) provides an affirmative answer to this
question.

Theorem 1.4.1. Assume that

{F_{t₁,...,tₙ}(x₁, ..., xₙ)},   {t₁, ..., tₙ} ⊂ T,   n = 1, 2, ...   (1.4.1)

is a consistent family of distribution functions; then there exists a stochastic
process with (1.4.1) as its marginal distribution functions.

As the probability space on which this process is defined we may use

{R^T, ℬ, P}

(see Section 1.2 for definitions), and the stochastic process is specified by

X(t, ω) = ω(t)

for each ω ∈ R^T and t ∈ T.

The method used in the Kolmogorov theorem to construct a stochastic
process starting from a consistent family of distribution functions leads to
a rather large set of sample functions, namely, R^T. Often it is desirable
to construct a process whose sample functions are subject to some regularity conditions. For instance, we may want them to be Borel measurable,
or continuous, and so on. Denote this particular subset of R^T by Ω₀. In
order that there exist a process {ξ₀(t); t ∈ T} stochastically equivalent to
{ξ(t); t ∈ T} which can be realized in Ω₀, it is necessary and sufficient that
P*(Ω₀) = 1, where P* is the outer measure induced by P as follows: For any
M ⊂ R^T,

P*(M) = inf{P(C); C ⊇ M},

where C ⊂ R^T is a cylinder set. In such a case, the system {Ω₀, ℬ₀, P*} is the
new probability space with ℬ₀ = ℬ ∩ Ω₀. We are not going to prove this
result here.
1.5. Principal Classes of Random Processes
In this section, we isolate and define several important classes of random
processes which have been extensively studied and used in various
applications.
According to Definition 1.1.1, every random process is a family of r.v.'s
defined on a common probability space. The theory of stochastic processes is
concerned with different features of such a family, many of which can be
studied by means of their marginal distributions. Of particular interest are
those processes endowed with certain stochastic structure. More specifically,
those whose marginal distributions are assigned certain properties.
In the sequel, we identify the following five classes of random processes,
each of them having a particular stochastic structure.
(i) Processes with independent increments. Let

{ξ(t); t ∈ T}   (1.5.1)

be a real-valued stochastic process on a probability space {Ω, ℬ, P}, where
T ⊂ R is an interval.

Definition 1.5.1. The process (1.5.1) is said to have "independent increments"
if for any finite subset {t₀, t₁, ..., tₙ} ⊂ T with t₀ < t₁ < ... < tₙ, the increments

ξ(t₀), ξ(t₁) − ξ(t₀), ..., ξ(tₙ) − ξ(tₙ₋₁)   (1.5.2)

are independent r.v.'s.

From the definition, it readily follows that all the marginal distributions
are completely determined by the distribution of ξ(t) for all t ∈ T and by the
distribution of ξ(t₂) − ξ(t₁), t₁, t₂ ∈ T, t₁ < t₂. The two most important processes with independent increments are the Poisson and the Wiener (or Brownian motion) processes.
They will be discussed in some detail in the following chapters.
(ii) Markov processes. Consider a random process on {Ω, ℬ, P},

{ξ(t); t ∈ T},   (1.5.3)

where T = [0, ∞), with values in R.

Definition 1.5.2. Process (1.5.3) is called Markov with state space {R, ℛ} if,
for any 0 ≤ t₁ < t₂ < ... < tₙ and B ∈ ℛ,

P{ξ(tₙ) ∈ B | ξ(t₁), ..., ξ(tₙ₋₁)} = P{ξ(tₙ) ∈ B | ξ(tₙ₋₁)}   (a.s.).   (1.5.4)

Property (1.5.4) of the stochastic process is called the Markov property. To
elucidate somewhat its true meaning, let us fix t ∈ (0, ∞) and consider the
families of r.v.'s

{ξ(s); s ≤ t}   and   {ξ(u); u ≥ t}.   (1.5.5)

If we take the instant t to be the present, then the first family of r.v.'s represents
the "past and present" of the process, whereas the second family is its "future."
Now, for any B₁, B₂ ∈ ℛ and s < t < u, one can easily verify that (1.5.4)
implies that the following holds almost surely:

P{ξ(s) ∈ B₁, ξ(u) ∈ B₂ | ξ(t)} = P{ξ(s) ∈ B₁ | ξ(t)} P{ξ(u) ∈ B₂ | ξ(t)}.   (1.5.6)

Thus, the Markov property, roughly speaking, means that given the present,
the future and the past of a Markov process are independent.
Definition 1.5.3. The distribution π on ℛ defined by

π(B) = P{ξ(0) ∈ B}   (1.5.7)

is called the "initial distribution" of the process.

Definition 1.5.4. A version

P(s, t, x, B)   (1.5.8)

of P{ξ(t) ∈ B | ξ(s) = x} having the properties

a. for every s < t and x ∈ R fixed, P(s, t, x, ·) is a probability measure on ℛ,
b. for every s < t and B ∈ ℛ fixed, P(s, t, ·, B) is an ℛ-measurable function,

is called the transition probability function (or simply the transition probability) of the Markov process.
From the definition, it follows that

P(t, t, x, B) = 1 if x ∈ B,   P(t, t, x, B) = 0 if x ∉ B,   (1.5.9)

for all t ≥ 0. In addition, the transition probability satisfies the so-called
"Chapman-Kolmogorov equation": for any 0 ≤ s < t < u and B ∈ ℛ,

P(s, u, x, B) = ∫_R P(s, t, x, dy) P(t, u, y, B).   (1.5.10)

The initial distribution π and the transition probability function P(s, t, x, B)
determine completely and uniquely the probability measure P of the process.
To show this, let {Bᵢ} ⊂ ℛ, i = 1, ..., n, and 0 < t₁ < ... < tₙ be arbitrary. Then,
taking into account (1.5.4), we clearly have

P{ξ(t₁) ∈ B₁, ..., ξ(tₙ) ∈ Bₙ}
   = ∫_R ∫_{B₁} ⋯ ∫_{Bₙ} P{ξ(0) ∈ dx₀, ξ(t₁) ∈ dx₁, ..., ξ(tₙ) ∈ dxₙ}
   = ∫_R π(dx₀) ∫_{B₁} P(0, t₁, x₀, dx₁) ⋯ ∫_{Bₙ} P(tₙ₋₁, tₙ, xₙ₋₁, dxₙ).   (1.5.11)

This defines P on the class of measurable cylinders. The rest follows from
Theorem 1.4.1.
Definition 1.5.5. A Markov process is said to be "homogeneous" or to have
a "stationary transition probability" if, for any 0 ≤ s < t,

P(s, t, x, B) = P(0, t − s, x, B).   (1.5.12)

To simplify notation, we shall write

P(x, t, B) = P(0, t, x, B).   (1.5.13)

In this case, the Chapman-Kolmogorov equation becomes

P(x, s + t, B) = ∫_R P(x, t, dy) P(y, s, B).   (1.5.14)

From (1.5.14), there follows an interesting fact: it is enough to know
P(y, s, B) for all s ≤ ε, where ε > 0 is arbitrarily small, because, for all u > ε,
it is determined by Equation (1.5.14). In other words, the local behavior of the
process (in a neighborhood of zero) determines its global behavior.
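A hedged illustration of (1.5.14), not taken from the book: for a homogeneous Markov chain with finitely many states observed at integer times, the t-step transition matrix is the tth power of the one-step matrix, and the Chapman-Kolmogorov equation reduces to the matrix identity P^(s+t) = P^s P^t. The particular matrix below is an arbitrary example.

# Chapman-Kolmogorov for a finite-state homogeneous chain: P^(s+t) = P^s @ P^t.
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])     # one-step transition matrix; rows sum to 1

s, t = 3, 5
lhs = np.linalg.matrix_power(P, s + t)
rhs = np.linalg.matrix_power(P, s) @ np.linalg.matrix_power(P, t)
print(np.allclose(lhs, rhs))        # True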
(iii) Strictly stationary processes. In the theory of stochastic processes, an
important class consists of those processes whose marginal distributions are
invariant under time-shift.

Definition 1.5.6. A real-valued stochastic process {ξ(t); t ∈ T} on a probability
space {Ω, ℬ, P} is said to be "strictly stationary" if for any {t₁, ..., tₙ} ⊂ T,
{t₁ + h, ..., tₙ + h} ⊂ T, and any {B_k} ⊂ ℛ, k = 1, ..., n, n = 1, 2, ...,

P{ξ(t₁ + h) ∈ B₁, ..., ξ(tₙ + h) ∈ Bₙ} = P{ξ(t₁) ∈ B₁, ..., ξ(tₙ) ∈ Bₙ}.   (1.5.15)
In applications of strictly stationary stochastic processes, the parameter set
T is usually R₊ or R. If E|ξ(t)| < ∞, then from (1.5.15) it follows that

E{ξ(t)} = E{ξ(t + h)} = m   (constant).

Similarly, if the second moment of the process exists, then it is also equal to
a constant.

Finally, set T = R₊ and let s, t ∈ T. Consider

R(s, t) = E{ξ(s)ξ(t)}.   (1.5.16)

Clearly, R(s, t) = R(t, s). Assume for the sake of definiteness that s < t. Then,
due to (1.5.15),

E{ξ(s)ξ(t)} = E{ξ(s + h)ξ(t + h)}.   (1.5.17)

When we set h = −s in (1.5.17), we obtain

R(s, t) = R(0, t − s).

The function (1.5.16) is often called the covariance function.
Definition 1.5.7. A real or complex r.v. Z on {Ω, ℬ, P} is said to be "second
order" if E|Z|² < ∞. The family of all second-order r.v.'s on {Ω, ℬ, P} is
denoted by L₂{Ω, ℬ, P}.

Definition 1.5.8. A stochastic process {ξ(t); t ∈ T} on {Ω, ℬ, P} is second order
if

ξ(t) ∈ L₂{Ω, ℬ, P}   ∀t ∈ T.
(iv) Wide sense stationary processes. There exists a large class of problems
in the theory of stochastic processes and their applications whose solution
requires only knowledge of the first two moments of a process. In such a case,
problems are considerably simplified if these moments are invariant under
time-shift.

Definition 1.5.9. A second-order stochastic process {ξ(t); t ∈ T} on a probability space {Ω, ℬ, P} is said to be "wide sense" stationary if, for all t ∈ T,

E{ξ(t)} = m   (constant)
and
R(t, t + τ) = R(0, τ).   (1.5.18)
It is clear that a second-order strictly stationary stochastic process is a
fortiori wide sense stationary. The invariance of the covariance function
under time-shifts implies that the methods of harmonic analysis may
play a useful role in the theory of wide sense stationary processes. This will
be discussed in some detail later.
(v) Martingales. Let {ξ(t); t ∈ T} be a real stochastic process on {Ω, ℬ, P}
such that E{|ξ(t)|} < ∞ for all t ∈ T.

Definition 1.5.10. The stochastic process ξ(t) is said to be a martingale if, for
every t₁ < t₂ < ... < tₙ in T,

E{ξ(tₙ) | ξ(t₁), ..., ξ(tₙ₋₁)} = ξ(tₙ₋₁)   (a.s.).   (1.5.19)

If

E{ξ(tₙ) | ξ(t₁), ..., ξ(tₙ₋₁)} ≤ ξ(tₙ₋₁)   (a.s.),   (1.5.20)

the process ξ(t) is a supermartingale. Finally, if

E{ξ(tₙ) | ξ(t₁), ..., ξ(tₙ₋₁)} ≥ ξ(tₙ₋₁)   (a.s.),   (1.5.21)

the process ξ(t) is said to be a submartingale.
1.6. Some Applications
There is a wide variety of applications of stochastic processes. For some of
them all that is needed is calculation of one or more parameters of a stochastic
process which is of direct interest in the given problem. Some other applications require very sophisticated theoretical methods of great complexity. This
book deals with elements and methods of the theory of stochastic processes
on an intermediate level, with strong emphasis on their applications. Successful use of the theory depends a great deal on successful modeling. For this
reason, it is worthwhile to make a few remarks about it.
Roughly speaking, mathematical modeling is the formulation of a mathematical system designed to "simulate" the behavior of certain aspects of a physical or biological phenomenon. For instance, Newton's law of motion of a
body falling freely under gravity may serve as an example of a mathematical
model.
A mathematical model represents our conceptual image of the phenomenon that we observe and as such it reflects our quantitative understanding of
a situation. Its formulation is always based on a certain number of assumptions concerning the fundamental nature of the phenomenon we investigate.
If these assumptions are of a general nature, we may finish with a model of
great mathematical complexity. On the other hand, too many simplifying
assumptions may mean considerable restrictions on the model's ability to
provide a sufficiently detailed and accurate description of the system being
modeled.
The formulation of a mathematical model must therefore be a compromise
between these two extremes. For this reason, one usually talks about "a"
mathematical model rather than "the" model. In the rest of this section, a
number of examples illustrating mathematical modeling of various systems
will be discussed in some detail.
1.6.1. Model of a Germination Process
From an agricultural viewpoint, germination is a process which begins when
a dry seed is planted in wet soil and ends when the seedling emerges above
the ground. The duration of this process is a r.v. T such that 0 < T ≤ ∞. The
stochastic nature of the quantity T is a result of the fact that the water uptake
(by imbibition) is a purely physical process based on diffusion of water
through a porous medium (the seed coat). In addition, the soil matrix potential, the seed soil contact area, and the concentration of soil moisture around
the seed are additional factors contributing to random variations of T.
Consider the following problem: Suppose, at time t = 0, N seeds of the
same plant species are planted, and denote by T₁, T₂, ..., T_N their respective germination times. How many seeds will germinate in (0, t]? Denote
by X(t) this number; it is then quite clear that X(t) is a stochastic process
defined as follows:

X(t) = Σ_{i=1}^N I{Tᵢ ≤ t}   for t > 0, and X(0) = 0.   (1.6.1)
We now make several assumptions. Our physical intuition is not violated
by assuming that:

a. {Tᵢ}, i = 1, ..., N, is an i.i.d. sequence of r.v.'s with a common distribution function

H(t) = P{T ≤ t}   with H(0) = 0 and H(∞) = 1 − p.   (1.6.2)

Here, 0 ≤ p < 1 is the probability that a seed may fail to germinate. From
(1.6.1) and (1.6.2), we have

P{X(t) = k} = C(N, k)(H(t))^k {1 − H(t)}^{N−k},   k = 0, 1, ..., N,   (1.6.3)

where C(N, k) denotes the binomial coefficient. Thus, if H(t) is known, the probability (1.6.3) is completely determined.
Now make the following assumption:

b. The average proportion of germinations in (s, s + Δs] is approximately
λ·Δs, where λ > 0 is a constant.

Next, partition (0, t] into n subintervals of equal length Δt = t/n. Then,
due to assumption b, the average number of germinations in (0, Δt] is
approximately

N(1 − p)λΔt.

Thus, the average number of nongerminating seeds at time Δt is
N(1 − p)(1 − λΔt). The average number of germinating seeds in (Δt, 2Δt] is
N(1 − p)(1 − λΔt)λΔt, so that the average number of nongerminating seeds
at time 2Δt is

N(1 − p)(1 − λΔt)²,

and so on. Continuing in this fashion, we conclude that the number of
nongerminating seeds at time t is approximately

N(1 − p)(1 − λΔt)ⁿ.

From this, we deduce that for n sufficiently large

P{T ≤ t} ≈ {1 − (1 − λΔt)ⁿ}(1 − p).

By letting Δt → 0, we obtain

H(t) = (1 − p)(1 − e^{−λt}).   (1.6.4)

From this and (1.6.3), the required probability is now easy to obtain.
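The following sketch, added here for illustration (the parameter values λ = 0.8, p = 0.1, N = 20 are assumptions, not taken from the text), evaluates H(t) from (1.6.4) and the binomial probability (1.6.3).

# Germination model: H(t) from (1.6.4) and P{X(t) = k} from (1.6.3).
import math

def H(t, lam=0.8, p=0.1):
    # probability that a given seed has germinated by time t
    return (1.0 - p) * (1.0 - math.exp(-lam * t))

def prob_k_germinated(k, t, N=20, lam=0.8, p=0.1):
    h = H(t, lam, p)
    return math.comb(N, k) * h**k * (1.0 - h)**(N - k)

# probability that exactly 15 of N = 20 seeds have germinated by t = 3
print(prob_k_germinated(15, 3.0))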
1.6.2. Modeling of Soil Erosion Effect
Soil erosion is a result of perpetual activities of various geophysical forces
acting on the earth's surface. The ensuing degradation of farmland may have
profound impact on future food supplies. Because no experimental data exist,
there is no choice but to resort to mathematical models in assessing future
crop losses due to erosion. Here we present a simplified version of such a
model.
Surface erosion occurs by the migration of individual soil particles in
response to forces such as wind, raindrop impact, surface runoff, and so on.
Erosion reduces plant-available minerals and nutrients which are adsorbed
in eroded particles. It also continuously decreases the root zone depth, which,
in turn, reduces the water-holding capacity of the soil. This leads to a decline
in crop yields due to newly induced droughts that become more frequent and
severe.
Even under optimal conditions, the production of a particular crop
fluctuates from one year to another in a random fashion. The mechanism
responsible for these variations is quite complex. It involves climatic and
meteorological factors (such as air temperature, wind, solar radiation, rainfall), soil characteristics, and other conditions. For these reasons, the effect
of soil erosion on crop yield is not directly observable but becomes gradually
evident over a long period of time.
Having outlined the mechanism of the soil erosion process, we now embark on the problem of constructing a model to assess its effect on crop
production. To this end, consider a crop-producing area A and denote by
{Yₙ} the sequence of annual yields of a given crop. If the area A is not
affected by erosion and the same agricultural practice is followed year after
year, we may assume that {Yᵢ} is an i.i.d. sequence of strictly positive r.v.'s
on {Ω, ℬ, P}, with a common distribution function

Q(y) = P{Y ≤ y}   (1.6.5)

such that E{Y} < ∞ and E{Y²} < ∞.

If, on the other hand, A is subject to soil erosion, the resulting sequence of
annual yields {Xₙ} consists of r.v.'s which are neither independent nor
identically distributed. To determine this stochastic process, suppose that the
soil erosion reduces the annual yield by a certain percentage each year, and
denote by Rᵢ the percentage in the ith year. Then the loss suffered in the first
year is

Y₁ · R₁/100 = Y₁ · U₁.

Thus, X₁ = Y₁ − Y₁·U₁ = Y₁(1 − U₁). The loss suffered in the second year
is

Y₂(1 − U₁) · R₂/100 = Y₂(1 − U₁) · U₂,

so that X₂ = Y₂(1 − U₁) − Y₂(1 − U₁)U₂ = Y₂(1 − U₁)(1 − U₂), and so on.
Therefore, with Zⱼ = 1 − Uⱼ, the crop yield in year n is

Xₙ = Yₙ ∏_{j=1}^n Zⱼ.   (1.6.6)

Our physical intuition is not violated by assuming that {Zᵢ} is an i.i.d.
sequence of r.v.'s independent of {Yₙ}, with common support in (0, 1]. It
seems appropriate to call Lₙ = ∏_{j=1}^n Zⱼ "the loss-rate function." Notice that
{Lₙ} is a Markov chain. It is interesting to note that the quantity Lₙ = ∏_{j=1}^n Zⱼ
first appeared in a paper by Kolmogorov (1941) concerned with the
modeling of a rock crushing process.
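A simulation sketch of (1.6.6), added here for illustration only; the lognormal yields Yₙ and the uniform loss factors Zⱼ on (0.95, 1.0) (i.e., up to a 5% loss per year) are assumed distributions chosen to make the example concrete.

# Simulate eroded yields X_n = Y_n * L_n with L_n = Z_1 * ... * Z_n.
import numpy as np

rng = np.random.default_rng(1)
n_years, n_paths = 30, 10_000

Z = rng.uniform(0.95, 1.0, size=(n_paths, n_years))     # assumed loss factors
L = np.cumprod(Z, axis=1)                               # loss-rate function L_n
Y = rng.lognormal(mean=0.0, sigma=0.2, size=(n_paths, n_years))
X = Y * L                                               # eroded annual yields

print(X[:, -1].mean())    # mean yield in year 30, degraded relative to E{Y}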
1.6.3. Brownian Motion
The kinetic theory envisages a fluid as composed of a very large number of
molecules moving with widely varying velocities in all directions and colliding
with each other. In the early days of the theory, the most frequently asked
questions were: Are molecules real? Do they actually exist, and can we demonstrate their existence? The attempts to answer them led to an exhaustive study
of an interesting phenomenon and shed much light on other kinetic properties. The phenomenon takes its name from the English botanist R. Brown, who
in 1827 noticed the irregular but ceaseless motion of small particles, e.g.,
pollen, suspended in a liquid. At first, the motion was thought to be of organic
origin. After the advent of kinetic theory, it became clear that the only
reasonable explanation for it lay in the assumption that the particles were
subject to the continual bombardment by the molecules of the surrounding
medium.
Suppose at time t = 0, when our observation begins, the pollen particle was
at the point x = 0. Denote by ξ(t) its position at time t > 0 [here ξ(t) denotes one
coordinate of the particle]. The chaotic nature of this motion clearly implies
that ξ(t) is a r.v. for all t > 0. Thus, {ξ(t); t ≥ 0} is a continuous parameter
stochastic process.

Examining the nature of the phenomenon, it seems reasonable to assume
that the distribution of ξ(t + s) − ξ(t) does not depend on t if the temperature
of the liquid remains constant. It also seems reasonable to assume that the
change in position of the particle during the time interval [t, t + s] is independent of anything that happened up to time t. This implies that the process ξ(t)
has independent increments. Finally, the trajectories of the process should be
continuous functions.

We shall show that under these assumptions the stochastic process ξ(t) is
Gaussian (i.e., all the marginal distributions of the process are normal) with

E{ξ(t)} = 0,   Var{ξ(t)} = t.

From this, we have for s < t

E{ξ(s)ξ(t)} = E{ξ(s)[ξ(t) − ξ(s)]} + E{ξ(s)²} = 0 + s = s.

In other words,

E{ξ(s)ξ(t)} = min(s, t).
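This covariance identity can be checked by simulation (a sketch added here, not in the original): approximate Brownian paths by partial sums of independent N(0, Δt) increments and average ξ(s)ξ(t) over many paths.

# Empirical covariance of simulated Brownian motion: E{xi(s)xi(t)} ~ min(s, t).
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 10_000, 1000
dt = 1.0 / n_steps

increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)      # xi(k*dt) for k = 1, ..., n_steps

s, t = 0.3, 0.8
xi_s = paths[:, int(s * n_steps) - 1]
xi_t = paths[:, int(t * n_steps) - 1]
print(np.mean(xi_s * xi_t))                # close to min(s, t) = 0.3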
1.6.4. Modeling of Dispersion in a Porous Medium
Consider a column packed with a porous material saturated with a single
incompressible fluid (say, fresh water) under convection (see Figure 1.3).
Suppose that at time t = 0 a set of tagged (dynamically neutral) particles are
introduced into the flow (across the section AB of the column). The transport
of these particles in the direction of the flow is called "longitudinal dispersion
in a porous medium."
Figure 1.3. Dispersion in a porous medium.
We can construct a model of the dispersion under the following conditions:

(i) the flow is steady and uniform;
(ii) the porous medium is isotropic and homogeneous;
(iii) tagged particles have identical "transport characteristics" and move independently of one another;
(iv) there is no mass transfer between the solid phase and the fluid.

A tagged particle in the flow through a porous medium undergoes a particular kind of random walk. It progresses through pores and voids of the porous
structure in a series of steps of random length, with a rest period of random
duration between two consecutive steps.
Denote by X(t) the distance of a tagged particle from the section AB at time t; then

X(0) = 0 ≤ X(t₁) ≤ X(t₂)   if 0 < t₁ < t₂.

It is intuitively clear that {X(t); t ≥ 0} is a time-homogeneous Markov process. Under conditions (i)-(iv), the marginal distribution

F_t(x) = P{X(t) ≤ x}   (1.6.7)

provides the longitudinal concentration function of tagged particles in the
column at any t ≥ 0. Calculation of (1.6.7) is based on the fact that X(t) can
be approximated by a regular Markov jump process.
1.6.5. Queues
A queueing system can be described as follows. Customers arrive at a service
station (for instance, a post office, bank, etc.) with a certain number of servers.
An arriving customer may have to wait until his turn comes or one of the
servers becomes available. He leaves the station upon the completion of the
service.
To formulate a mathematical model of a queueing system, we must specify
conditions under which the system operates. For instance:
1. in what manner do customers enter the system?
2. in what order are they served?
3. how long are their service times?
Concerning the first question, we shall assume that the arrival time of the
nth customer in the system is a r.v. τₙ (we assume that τ₀ ≡ 0 < τ₁ < τ₂ < ...).
Consequently, the number of customers who have entered the system by the
time t is a random process {N(t); t ≥ 0} such that N(0) ≡ 0 and

N(t) ∈ {0, 1, 2, ...} for t > 0,   N(t₁) ≤ N(t₂) if t₁ < t₂.   (1.6.8)

Clearly, for any n = 0, 1, ...,

τₙ = inf{t; N(t) = n},   N(t) = max{n; τₙ ≤ t}.   (1.6.9)
An arriving customer joins the end of a single line of people waiting to be
served. The service is on a "first come, first served" basis. When one of the servers
becomes free, he turns his attention to the customer at the head of the waiting
line. This answers the second question.

Finally, concerning the third question, we shall assume that the service
times form an i.i.d. sequence of r.v.'s {Uₙ}, independent of {τₙ}.

Here we shall consider queueing systems with a single server. Under
certain conditions, such a queue is completely described by the processes
{N(t); t ≥ 0} and {Uₙ}. The state of the system at any time t ≥ 0 is specified
by the random process

{S(t); t ≥ 0},   S(0) ≡ 0,   (1.6.10)

which is the number of waiting customers, including the one in the process of
being served, at time t.
What kind of information about S(t) is of interest to us? For instance, we
would like to know something about the asymptotic behavior of S(t) when
t → ∞. When is S(t) asymptotically stationary? When is S(t) a Markov chain,
or when does it contain an imbedded Markov chain?

The answers to these and other similar questions depend on the process
N(t) and {Uₙ}. Set T₁ = τ₁ and Tₙ = τₙ − τₙ₋₁ for n > 1. Clearly, {Tₙ} is
the sequence of interarrival times. We shall assume that {Tₖ} is also an i.i.d.
sequence of r.v.'s. Set

F_T(t) = P{T ≤ t},   F_U(t) = P{U ≤ t}.   (1.6.11)

Under these assumptions, a queueing system is completely specified by
these distributions. For this reason, it seems convenient to describe a queueing system in terms of the distributions (1.6.11). The most common notational
scheme for this purpose is a triple A/B/s, where A specifies F_T, B specifies
F_U, and s is the number of servers. For instance, M/M/1 means that

F_T(t) = 1 − e^{−λt},   F_U(t) = 1 − e^{−αt},   s = 1.

On the other hand, G/G/1 means that F_T and F_U are some general distributions and s = 1, and so on.
1.7. Separability
Let {ξ(t); t ∈ T} be a real stochastic process on a complete probability space
{Ω, ℬ, P}. From the definition of a stochastic process, it follows that the
mapping

ξ(t, ·): Ω → R

is ℬ-measurable for every t ∈ T. In other words,

{ω; ξ(t, ω) ∈ B} ∈ ℬ

for each Borel subset B ⊂ R. However,

{ω; ξ(t, ω) ∈ B, t ∈ T} = ∩_{t∈T} {ω; ξ(t, ω) ∈ B}

need not be in ℬ in general, unless T is countable. This then implies that
functions like

sup_{t∈T} ξ(t)   and   inf_{t∈T} ξ(t)

may not be ℬ-measurable either, because, for instance,

{sup_{t∈T} ξ(t) ≤ x} = ∩_{t∈T} {ξ(t) ≤ x}.

Therefore, a large number of important functionals of a continuous parameter stochastic process (all those which depend on an uncountable number of
coordinates) may not be random variables. A remedy for this situation is the
separability concept introduced by Doob, which will be defined next.
Definition 1.7.1. The stochastic process {ξ(t); t ∈ T}, where T ⊂ R is an interval, is said to be separable if there exists a countable dense subset D ⊂ T and
a null event Λ ⊂ Ω such that

{ω; ξ(t, ω) ∈ C, t ∈ I ∩ D} − {ω; ξ(t, ω) ∈ C, t ∈ I ∩ T} ⊂ Λ   (1.7.1)

for any closed set C and any open interval I.

The countable set D is called the separant. Note that

{ω; ξ(t, ω) ∈ C, t ∈ I ∩ D} ⊃ {ω; ξ(t, ω) ∈ C, t ∈ I ∩ T}.

From (1.7.1) it follows that

Λᶜ ∩ {ω; ξ(t, ω) ∈ C, t ∈ I ∩ D} = Λᶜ ∩ {ω; ξ(t, ω) ∈ C, t ∈ I ∩ T},   (1.7.2)

which clearly implies that the right-hand side of (1.7.2) is an event, because the
left-hand side is one. The next proposition shows that (roughly speaking) every
trajectory of the process {ξ(t); t ∈ T} from Λᶜ is as regular on T as its restriction to D.
Proposition 1.7.1. For each ω ∈ Λᶜ and open interval I ⊂ T,

cl ξ(I ∩ T, ω) = cl ξ(I ∩ D, ω).   (1.7.3)

PROOF. Here cl ξ(I ∩ T, ω) denotes the closure in R of the set of values assumed
by the trajectory ξ(·, ω) as t varies in I ∩ T. The other set in (1.7.3) has a similar
interpretation.

To prove the proposition, consider a closed subset C ⊂ R; we then have

ξ(I ∩ D, ω) ⊂ C ⟺ cl ξ(I ∩ D, ω) ⊂ C.

On the other hand, from (1.7.2) we deduce that

∀ω ∈ Λᶜ,   ξ(I ∩ D, ω) ⊂ C ⟺ ξ(I ∩ T, ω) ⊂ C.

Taking C = cl ξ(I ∩ D, ω), it follows from this that

ξ(I ∩ T, ω) ⊂ cl ξ(I ∩ D, ω),

from which we conclude that (1.7.3) holds. This proves the proposition. □
From (1.7.3) we obtain that, for each ω ∈ Λᶜ and t ∈ T,

ξ(t, ω) ∈ cl ξ(I ∩ D, ω)   (1.7.4)

for every open interval I containing t. This, on the other hand, implies that,
because, by definition, D is dense in T, for every t ∈ T there exists a sequence
{u_k} ⊂ D such that u_k → t and, for each ω ∈ Λᶜ,

lim_{k→∞} ξ(u_k, ω) = ξ(t, ω)   (1.7.5)

(the sequence {u_k} may depend on ω). This can be seen as follows. Let t ∈ I;
then

lim_{k→∞} ξ(u_k, ω) ∈ cl ξ(I ∩ D, ω).

Let us show that for every ω ∈ Λᶜ and open interval I,

sup_{t∈I} ξ(t, ω) = sup_{t∈I∩D} ξ(t, ω).   (1.7.6)

From its definition, it follows that the left-hand side of (1.7.6) is the least upper
bound of the set cl ξ(I ∩ T, ω), and it belongs to this set. Similarly, the right-hand
side of (1.7.6) is the least upper bound of cl ξ(I ∩ D, ω), and must belong to this
set. From this and (1.7.3), the assertion follows. In the same fashion, one can
show that

inf_{t∈I} ξ(t, ω) = inf_{t∈I∩D} ξ(t, ω).
Do separable processes exist? The following proposition due to Doob
provides an affirmative answer to this question.

Proposition 1.7.2. Every real-valued stochastic process {ξ(t); t ∈ T} defined on
a complete probability space {Ω, ℬ, P} has a separable version.

We are not going to prove this proposition here. This is an important
result which implies the following: For any real random process {ξ(t); t ∈ T}
on a complete probability space {Ω, ℬ, P}, there is a real random process
{ξ̃(t); t ∈ T} defined on the same probability space which is separable and
stochastically equivalent to ξ(t). Note that ξ̃(t) and ξ(t) may have different
trajectories. This, however, is not the case if ξ̃(t) and ξ(t) are both càdlàg (see
Proposition 1.3.1).
Remark 1.7.1. If {ξ(t); t ∈ T} is separable, then, as we have seen from (1.7.5),
for any ω ∈ Λᶜ and t ∈ T,

ξ(t, ω) = lim_{k→∞} ξ(u_k, ω),

where {u_k} ⊂ D (depending on ω) is such that u_k → t. In other words, the
values of the sample functions on T are limits of sequences {ξ(u_k, ·)}.
1.8. Some Examples
In this section we present two examples. In the first one we consider a
separable version of a nonseparable stochastic process.
EXAMPLE 1.8.1. Let {Ω, ℬ, P} be a probability space with Ω = [0, 1],
ℬ the σ-algebra of Borel subsets of [0, 1], and P the Lebesgue measure. Let
{ξ(t); t ∈ [0, 1]} be a stochastic process on this probability space defined as
follows:

ξ(t, ω) = 1 if t = ω,   ξ(t, ω) = 0 if t ≠ ω.

From this definition, it follows that

{ω; ξ(t, ω) = 0} = {ω; ω ≠ t}.

Hence, for any subset Γ ⊂ T = [0, 1],

{ω; ξ(t, ω) = 0, t ∈ Γ} = ∩_{t∈Γ} {ω; ξ(t, ω) = 0} = ∩_{t∈Γ} {ω; ω ≠ t}
   = (∪_{t∈Γ} {ω; ω = t})ᶜ = {ω; ω ∈ Γ}ᶜ = [0, 1] − Γ.

From this, it follows that if I ⊂ T is any open interval and D the set of all
rationals,

P{ω; ξ(t, ω) = 0, t ∈ I} = 1 − P(I),
P{ω; ξ(t, ω) = 0, t ∈ I ∩ D} = 1,

which clearly shows that the process ξ(t) is not separable.
Now, let {ξ̃(t); t ∈ [0, 1]} be a stochastic process on the same probability
space specified as follows:

ξ̃(t, ω) = 0   for all (t, ω) ∈ [0, 1] × Ω.

Then, we clearly have

{ω; ξ̃(t, ω) ≠ ξ(t, ω)} = {ω; t = ω} = {t},

which is a null event. Thus, ξ̃(t) is an equivalent version of ξ(t). It is also easy
to check that ξ̃(t) is separable.
EXAMPLE 1.8.2. Let {ξ(t); t ∈ T} be a real stochastic process on the probability
space {R^T, ℬ, P} (see Section 1.2 for definitions). Denote by

C_T = {x(t); t ∈ T}

the set of all continuous functions on T with values in R. Clearly, C_T, which is a
subset of R^T, is not an element of ℬ. As a matter of fact,

{ω; ξ(·, ω) ∈ C_T} = ∩_{n=1}^∞ ∪_{k=1}^∞ ∩_{|t−s|≤1/k} {ω; |ξ(t, ω) − ξ(s, ω)| ≤ 1/n} ∉ ℬ.

However, if the process is separable with a set D ⊂ T as separant and T is
closed,

{ω; ξ(·, ω) ∈ C_T} = ∩_{n=1}^∞ ∪_{k=1}^∞ ∩_{s,t∈D; |s−t|≤1/k} {ω; |ξ(t, ω) − ξ(s, ω)| ≤ 1/n}   (1.8.1)

and as such is an element of ℬ. In order that the realizations of a separable
process be continuous with probability 1, it is necessary and sufficient that
the probability of the random event (1.8.1) be 1.
1.9. Continuity Concepts
In this section, we define three types of continuity of a stochastic process. The
first one is stochastic continuity (or continuity in probability). Let {ξ(t); t ∈ T}
be a real-valued stochastic process on {Ω, ℬ, P}.

Definition 1.9.1. The process {ξ(t); t ∈ T} is said to be stochastically continuous at a point t₀ ∈ T if, for any ε > 0,

lim_{t→t₀} P{|ξ(t) − ξ(t₀)| > ε} = 0.   (1.9.1)

If (1.9.1) holds at every point t₀ ∈ T, we say that the process is stochastically
continuous on T.
Remark 1.9.1. From the definition, we see that stochastic continuity is a
regularity condition on the bivariate marginal distributions of the process. As
a matter of fact, we have

P{|ξ(t) − ξ(t₀)| > ε} = ∫∫_{|x−y|>ε} dF_{t,t₀}(x, y).

Figure 1.4. A realization of ξ(t).
EXAMPLE 1.9.1. Let {Tᵢ}, i = 1, ..., n, be an i.i.d. sequence of non-negative r.v.'s on a
probability space {Ω, ℬ, P} with H(t) = P{Tᵢ ≤ t} continuous. Set

ξ(t) = Σ_{i=1}^n I{Tᵢ ≤ t},   t ≥ 0.   (1.9.2)

Clearly,

0 ≤ ξ(t₁) ≤ ξ(t₂)

for all 0 ≤ t₁ ≤ t₂. The realizations of ξ(t) are nondecreasing step functions
with unit jumps at the points T_{n:1} ≤ T_{n:2} ≤ ... ≤ T_{n:n}, where the T_{n:i} are the
order statistics of {Tᵢ}. In Figure 1.4 is a trajectory of ξ(t).

Although every sample function of ξ(t) has discontinuities, the random
process ξ(t) is stochastically continuous on [0, ∞) because, due to Markov's
inequality and the continuity of H(·), it follows that

P{|ξ(t ± h) − ξ(t)| > ε} ≤ (n/ε)|H(t ± h) − H(t)| → 0

as h → 0+ for each t ≥ 0.
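An empirical check of this example, added here (the exponential distribution of the Tᵢ is an assumption; it satisfies the continuity of H): the probability that ξ has a jump in (t, t + h] vanishes as h → 0+, even though every sample path has jumps somewhere.

# P{xi(t + h) - xi(t) > 0} -> 0 as h -> 0+ for xi(t) = sum_i I{T_i <= t}.
import numpy as np

rng = np.random.default_rng(4)
n, reps, t = 10, 200_000, 1.0

T = rng.exponential(1.0, size=(reps, n))          # assumed T_i ~ Exp(1)
for h in (0.1, 0.01, 0.001):
    jump = ((T > t) & (T <= t + h)).any(axis=1)   # some T_i falls in (t, t+h]
    print(h, jump.mean())                         # decreases toward 0 with h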
Remark 1.9.2. Condition (1.9.1) is equivalent to

ξ(t) →ᴾ ξ(t₀)   as t → t₀.   (1.9.3)

From Example 1.9.1, we see that a process may be stochastically continuous
even if each realization of the process is discontinuous. This is so when the
probability that a discontinuity will occur at a given instant t ∈ T is zero. So, when
is a stochastic process stochastically discontinuous at a point t₀ ∈ T? The
answer: only if t₀ is a fixed discontinuity point of the process. What is a fixed
discontinuity point?
Definition 1.9.2. Suppose that {~(t); t E T} is real-valued and separable, and
denote by Nto the set of sample functions which are discontinuous at to E T. If
peNt) > 0, we say that to E T is a fixed discontinuity point of ~(t).
Definition 1.9.3. Suppose that {~(t); t E T} is real-valued and separable. The
stochastic process is continuous (a.s.) at a point to E T if the set of realizations
discontinuous at to is negligible. If the process is (a.s.) continuous at every
t E T, we say that the process is (a.s.) continuous on T.
Remark 1.9.3. It is apparent that every separable process continuous (a.s.) at a point t is also continuous in probability at t [in this case P(N_t) = 0], but the converse is false, in general.
Definition 1.9.4. A stochastic process {~(t); t E T} is said to have (a.s.) continuous trajectories if the set of those trajectories which have discontinuities on T
is negligible.
If {~(t); t E T} is (a.s.) continuous at each t E T, it does not necessarily imply
that it has (a.s.) continuous trajectories. Indeed, the set of all those trajectories
of the process without discontinuities on T is

    N^c = ⋂_{t∈T} N_t^c,

and this event may not have probability 1.
EXAMPLE 1.9.2. Let {Ω, ℬ, P} be a probability space with Ω = [0, 1], ℬ the σ-algebra of Borel subsets of [0, 1], and P the Lebesgue measure. Let {ξ(t); t ∈ [0, 1]} be defined as follows:

    ξ(t, ω) = 0 if t < ω,  ξ(t, ω) = 1 if t ≥ ω

(see Figure 1.5).

Figure 1.5. Graphical depiction of ξ(t, ω).

Let Γ = [0, s]; then
    {ω; ξ(t, ω) = 0, t ∈ Γ} = [0, 1] − Γ,

so that

    P{ω; ξ(t, ω) = 0, t ∈ Γ} = 1 − s.

On the other hand, if D is the set of all rationals,

    P{ω; ξ(t, ω) = 0, t ∈ D ∩ Γ} = 1 − s.

This shows that the process is separable. In addition, for t ∈ T, N_t = {ω; ω = t}, so that P(N_t) = 0. However,

    N = ⋃_{t∈T} N_t = Ω,

so that P(N) = 1.
The following theorem by Kolmogorov gives a sufficient condition for (a.s.) continuity of the sample functions of a separable stochastic process.

Proposition 1.9.1. Let {ξ(t); t ∈ T} be a real-valued separable process and T a compact interval. If there exist constants α > 0, p > 0, C > 0 such that

    E{|ξ(t) − ξ(s)|^α} ≤ C|t − s|^{1+p},        (1.9.4)

then almost all sample functions of ξ(t) are continuous on T.
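For instance (a standard check, not carried out in the text): for the Brownian motion of Problem 1.31 the increment ξ(t) − ξ(s) is N(0, |t − s|), so

    E{|ξ(t) − ξ(s)|⁴} = 3|t − s|²,

and (1.9.4) holds with α = 4, C = 3, p = 1; Kolmogorov's criterion then yields the path continuity asserted in Problems 1.31 and 1.32.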
1.10. More on Separability and Continuity
Given a real-valued process, there will be, generally speaking, several equivalent versions, some of which are separable. This permits us to choose a
separable version whose sample paths have some nice properties.
Let {ξ(t); t ∈ T} be real-valued and separable with D ⊂ T as a separant. Then, as we have seen, there exists a negligible subset Λ ⊂ Ω such that, for each ω ∉ Λ, the value of ξ(·, ω) at every t ∈ T is the limit

    lim_{n→∞} ξ(t_n, ω),

where {t_n} ⊂ D and t_n → t as n → ∞. The sequence {t_n}, in general, depends on ω.
Consider a real-valued stochastic process {ξ(t); t ∈ T} on a complete probability space {Ω, ℬ, P}, without discontinuity points of the second kind. In other words,

    ξ(t − 0, ω) and ξ(t + 0, ω)

exist for all t ∈ T and ω ∈ Ω. The next simple proposition shows that in such a case we can always choose a version ξ̃(t) of ξ(t) which is continuous from the right if ξ(t) is stochastically continuous.
Proposition 1.10.1. Let {ξ(t); t ∈ T} be a real-valued stochastically continuous random process on a complete probability space {Ω, ℬ, P} and T ⊂ R a compact interval. If the process does not have discontinuities of the second kind, there exists a separable version ξ̃(t) equivalent to ξ(t) whose sample functions are continuous from the right.
PROOF. Choose a separable version and denote by B the random event that the limit

    lim_{n→∞} ξ(t + 1/n, ω)

exists for every t ∈ T. Let us show that P(B) = 1. To this end, let t ∈ T be fixed and consider

    B_t = {ω; lim_{n→∞} ξ(t + 1/n, ω) exists}.

Because the process is separable, this limit exists for all ω ∉ Λ, where Λ is a null set. Therefore, P(B_t) = 1 for each t ∈ T. On the other hand, due to the separability assumption,

    B = ⋂_{t∈D∩T} B_t,  so that P(B) = 1.

Next, set

    ξ̃(t, ω) = lim_{n→∞} ξ(t + 1/n, ω)

if ω ∈ B, and ξ̃(t, ω) = ξ(t, ω) if ω ∈ B^c; then

    {ω; ξ̃(t) ≠ ξ(t)} = ⋃_{r=1}^∞ {ω; |ξ̃(t) − ξ(t)| > 1/r} ∩ B.

Hence,

    P{ω; ξ̃(t) ≠ ξ(t)} = lim_{r→∞} P({|ξ̃(t) − ξ(t)| > 1/r} ∩ B).

On the other hand,

    P({|ξ̃(t) − ξ(t)| > 1/r} ∩ B) ≤ lim_{n→∞} P(⋂_{k=n}^∞ {|ξ(t + 1/k) − ξ(t)| ≥ 1/r})
        ≤ lim_{n→∞} P{|ξ(t + 1/n) − ξ(t)| ≥ 1/r} = 0,

due to stochastic continuity. Therefore,

    P{ω; ξ̃(t) ≠ ξ(t)} = 0,  ∀t ∈ T.

Finally, because for all ω ∈ B and t ∈ T

    ξ̃(t) = ξ̃(t + 0),

it follows that {ξ̃(t); t ∈ T} is continuous from the right. This proves the assertion. □
The following result will be needed in the sequel.
Proposition 1.10.2. Let {ξ(t); t ∈ T} be a real-valued stochastic process and T ⊂ R a compact interval. A necessary and sufficient condition for ξ(t) to be continuous in probability on T is that

    sup_{|t−s|<h} P{|ξ(t) − ξ(s)| > ε} → 0        (1.10.1)

as h → 0 for any ε > 0.

PROOF. Sufficiency of (1.10.1) is clear. To prove its necessity, assume that ξ(t) is stochastically continuous on T. Then for any fixed ε, δ > 0 and u ∈ T, there exists h > 0 so that

    sup_{|t−u|<h} P{|ξ(t) − ξ(u)| > ε} < δ.

Let {u_i}₁ⁿ ⊂ T, with u₁ < u₂ < ··· < u_n, be a sequence such that T ⊂ ⋃₁ⁿ (u_i − h, u_i + h). If I is an open interval contained in one of the (u_i − h, u_i + h), then, for any s, t ∈ I,

    {|ξ(t) − ξ(s)| > 2ε} ⊂ {|ξ(t) − ξ(u_i)| > ε} ∪ {|ξ(s) − ξ(u_i)| > ε}.

This implies the inequality

    P{|ξ(t) − ξ(s)| > 2ε} ≤ 2δ,

which holds as long as |t − s| < h. The proof now follows by letting δ → 0. □
1.11. Measurable Random Processes
Let R be the real line and ℛ the σ-algebra of Borel subsets of R. In the following we shall often use the notation

    ℛ_C = ℛ ∩ C = {B ∩ C; B ∈ ℛ}.        (1.11.1)

Clearly, ℛ_C is a σ-algebra, such that ℛ_C ⊂ ℛ if C ∈ ℛ.

Let {ξ(t); t ∈ T} be a real-valued stochastic process on a probability space {Ω, ℬ, P} and T ⊂ R an interval. From Definition 1.1.1, we know that ξ(t, ·) is a ℬ-measurable function on Ω for every t ∈ T. From this, however, it does not follow that the mapping

    ξ(·, ·): T × Ω → R        (1.11.2)

is ℛ_T × ℬ-measurable. We now give the following definition:

Definition 1.11.1. A real-valued stochastic process {ξ(t); t ∈ T} is said to be measurable if the mapping (1.11.2) is ℛ_T × ℬ-measurable, i.e., if {(t, ω); ξ(t, ω) ∈ B} ∈ ℛ_T × ℬ for every B ∈ ℛ.
Remark 1.11.1. If {ξ(t); t ∈ T} is a measurable process, then from Fubini's theorem it follows that for almost all ω ∈ Ω the sample functions ξ(·, ω) are ℛ_T-measurable.

To motivate the consideration of the measurability concept, suppose that for a measurable process ξ(t), we have

    ∫∫_{T×Ω} |ξ(t, ω)| dt P(dω) < ∞.        (1.11.3)

Again invoking Fubini's theorem, we obtain

    ∫∫_{T×Ω} |ξ(t, ω)| dt P(dω) = ∫_T E{|ξ(t)|} dt = ∫_Ω P(dω) ∫_T |ξ(t, ω)| dt.

Thus,

    ∫_T E{|ξ(t)|} dt < ∞

and

    X(ω) = ∫_T |ξ(t, ω)| dt < ∞  (a.s.),

where X(·) is a r.v. on {Ω, ℬ, P}.
Next we give an example of a measurable process.
EXAMPLE 1.11.1. Let {ξ_i}₀ⁿ be a sequence of r.v.'s on {Ω, ℬ, P}. Let {ξ(t); t ∈ [a, b]} be a stochastic process defined as follows: Set a < t₁ < t₂ < ··· < t_n < b and write

    ξ(t) = ξ₀,  t ∈ [a, t₁);    ξ(t) = ξ_n,  t ∈ [t_n, b];

and

    ξ(t) = ξ_i,  t ∈ [t_i, t_{i+1}),  i = 1, 2, ..., n − 1.

To show that the stochastic process is measurable, let Γ ∈ ℛ and consider

    {(t, ω); ξ(t, ω) ∈ Γ} = [a, t₁) × {ω; ξ₀ ∈ Γ} ∪ [t₁, t₂) × {ω; ξ₁ ∈ Γ} ∪ ··· ∪ [t_n, b] × {ω; ξ_n ∈ Γ}.

Clearly, then (T = [a, b])

    {(t, ω); ξ(t, ω) ∈ Γ} ∈ ℛ_T × ℬ,

so that ξ(t) is measurable.
Next, we give some sufficient conditions which ensure measurability of a random process.

Proposition 1.11.1. Let {ξ(t); t ∈ T} be a real-valued stochastic process and T ⊂ R a compact interval. If ξ(t) is continuous in probability, there exists a version ξ̃(t) which is separable and measurable.

PROOF. Due to continuity in probability of ξ(t), we have [see (1.10.1)]

    sup_{|t−s|<h} P{|ξ(t) − ξ(s)| > ε} → 0 as h → 0.        (1.11.4)

Now set T = [a, b] and choose

    a = t₀ⁿ < t₁ⁿ < ··· < t_{m_n}ⁿ = b

such that

    P{|ξ(u) − ξ(v)| > ε} < 1/2ⁿ        (1.11.5)

for all u, v ∈ [t_{j−1}ⁿ, t_jⁿ]. Suppose now that for every n, {t_jⁿ} ⊂ {t_j^{n+1}}, and let L, consisting of all the t_jⁿ, be dense in T. Define

    ξ_n(t, ω) = ξ(t_jⁿ, ω)  for t_jⁿ ≤ t < t_{j+1}ⁿ.

From Example 1.11.1 we clearly see that ξ_n(t, ω) is a measurable process for every n = 1, 2, .... Next, from (1.11.5) we infer that

    P{|ξ(t) − ξ_n(t)| > ε i.o.} = 0,

so that

    ξ_n(t) → ξ(t)  (a.s.),  ∀t ∈ T.

Further, because {ξ_n(t); t ∈ T} is a sequence of measurable processes,

    ξ̃(t) = lim sup_{n→∞} ξ_n(t)

is a measurable process such that ξ̃(t) = ξ(t) (a.s.).

Finally, from the definition of ξ̃(t) and ξ_n(t), it is apparent that ξ̃(t) = ξ(t) for all t ∈ L. On the other hand, for every t ∈ T and ω ∈ Ω, ξ̃(t) is the lim sup of {ξ̃(t_m)} for a sequence {t_m} ⊂ L which increases to t. From this and Definition 1.7.1, it follows that ξ̃(t) is separable. □
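The grid-approximation device of this proof is easy to visualize numerically. The following is a minimal sketch (not from the text; the random-walk path, grid depths, and all numerical choices are assumptions made purely for illustration):

```python
import numpy as np

# Sketch of the device in Proposition 1.11.1: approximate a process on
# T = [0, 1] by piecewise-constant processes xi_n(t) = xi(t_j^n) for
# t_j^n <= t < t_{j+1}^n, built on refining dyadic grids {j / 2**n}.
rng = np.random.default_rng(1)
N = 2**12                                        # fine grid carrying the path
path = np.cumsum(rng.normal(0, N**-0.5, N + 1))  # assumed illustrative path
t = np.linspace(0.0, 1.0, N + 1)

def xi(s):
    # evaluate the fine-grid path at time(s) s
    return path[np.minimum((np.asarray(s) * N).astype(int), N)]

for n in [2, 4, 6, 8, 10]:
    grid = np.floor(t * 2**n) / 2**n        # t_j^n = largest dyadic <= t
    err = np.max(np.abs(xi(grid) - path))   # sup_t |xi_n(t) - xi(t)|
    print(f"n={n:2d}  sup-error={err:.4f}")
```

For a path without second-kind discontinuities the grid processes ξ_n converge, which is exactly what the proof exploits before passing to lim sup.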
Problems and Complements
1.1. Let {Ω, ℬ, P} be a probability space. Let ρ be a function on ℬ × ℬ defined by

    ρ(A, B) = P(A △ B),

where A △ B = (A − B) ∪ (B − A). Show that, for any C ∈ ℬ,

    ρ(A, B) ≤ ρ(A, C) + ρ(B, C).

1.2. Show that for any two random events A and B

    |P(A) − P(B)| ≤ ρ(A, B).

1.3. Let X₁ and X₂ be independent real-valued r.v.'s on a probability space {Ω, ℬ, P} with the distribution functions (d.f.'s) H₁(·) and H₂(·), respectively. Let {ξ(t); t ≥ 0} be a stochastic process defined by

    ξ(t) = tX₁ + X₂.

Calculate P(A), where A is the set of all nondecreasing sample functions of the process.

1.4. Let {ξ(t); t ≥ 0} be a stochastic process defined by

    ξ(t) = X + at,  a > 1,

where X is a r.v. with the Cauchy distribution. Let D ⊂ [0, ∞) be finite or countably infinite. Determine:
a. P{ξ(t) = 0 for at least one t ∈ D},
b. P{ξ(t) = 0 for at least one t ∈ (1, 2]}.

1.5. Let X and Y be r.v.'s on {Ω, ℬ, P}, where Y ~ N(0, 1) (standard normal). Let {ξ(t); t ≥ 0} be a stochastic process defined by

    ξ(t) = X + t(Y + t).

Let A be the set of sample functions of ξ(t) nondecreasing on [0, ∞). Show that A is an event and determine P(A).

1.6. In Problem 1.5 denote by B the set of sample functions which are nonincreasing in [0, 1]. Show that B is an event and determine P(B).

1.7. Let X₁ and X₂ be independent r.v.'s on {Ω, ℬ, P} with common standard normal d.f. Let {ξ(t); t ≥ 0} be a stochastic process defined by

    ξ(t) = (X₁ + X₂)t.

Determine F_{t₁,...,t_n}(x₁, ..., x_n). If A is the set of all non-negative sample functions, show that A is an event and determine P(A).

1.8. Let {ξ(t); t ≥ 0} be a stochastic process defined by

    ξ(t) = X cos(t + U),

where X and U are independent r.v.'s, U is uniform in [−π, π], and E{X} = 0. Determine its covariance function.
1.9. A stochastic process {ξ(t); t ∈ T} is called normal (or Gaussian) if all its marginal distributions are Gaussian. Set

    x(t) = E{ξ(t)},  C(s, t) = Cov{ξ(s), ξ(t)};

then ξ(t) ~ N(x(t), C(t, t)) for all t ∈ T. Assume that

    x(t) = 3,  C(t, t) = 4.

Show that

    P{ξ(t) ≤ 2} ≈ .309.

1.10. Let {ξ(t); t ≥ 0} be a random process with ξ(0) = 0 (a.s.) and marginal probability densities (t₀ = x₀ = 0)

    f_{t₁,...,t_n}(x₁, ..., x_n) = ∏_{i=1}^n [2π(t_i − t_{i−1})]^{−1/2} exp(−(x_i − x_{i−1})² / (2(t_i − t_{i−1})))

(0 < t₁ < ··· < t_n). Determine

    E{ξ(t)},  E{ξ(s)ξ(t)} = C(s, t).

1.11. Let X and U be independent r.v.'s on {Ω, ℬ, P}, U is uniform in [0, 2π], and the probability density of X is defined by

    f_X(x) = 2x³ e^{−x⁴/2} for x ≥ 0,  f_X(x) = 0 for x < 0.

Let {ξ(t); t ≥ 0} be defined by

    ξ(t) = X² cos(2πt + U).

Show that ξ(t) is a Gaussian process [i.e., every random vector (ξ(t₁), ..., ξ(t_n)) is normally distributed; see Chapter 4].

1.12. Let {ξ(t); t ∈ T} and {X(t); t ∈ T} be real stochastic processes on {Ω, ℬ, P}. If they are stochastically equivalent, show that they have identical marginal distributions. Under what conditions will they have the same sample functions?

1.13. Let {X_i}₁ⁿ and {Y_i}₁ⁿ be r.v.'s on {Ω, ℬ, P} such that E{X_i} = E{Y_i} = 0, Var{X_i} = Var{Y_i} = σ_i² < ∞, E{X_iX_j} = 0 and E{Y_iY_j} = 0 for all i ≠ j, and E{X_iY_j} = 0 for all i, j. Let {ξ(t); t ≥ 0} be defined by

    ξ(t) = Σ_{j=1}^n {X_j cos λ_j t + Y_j sin λ_j t}.

Is the process wide sense stationary? Determine its covariance function.

1.14. Let {X_n}₀^∞ be an i.i.d. sequence of r.v.'s with E{X_i} = 0 and Var{X_i} = σ² < ∞. Let {N(t); t ≥ 0} be a homogeneous Poisson process with E{N(t)} = λt and independent of {X_n}₀^∞. Is the stochastic process {ξ(t); t ≥ 0} defined by

    ξ(t) = X_{N(t)}

strictly stationary? [A non-negative integer-valued random process with independent increments {N(t); t ≥ 0} is a homogeneous Poisson process if 0 ≤ N(t₁) ≤ N(t₂) for all 0 ≤ t₁ ≤ t₂ and

    P{N(t₂) − N(t₁) = k} = exp[−λ(t₂ − t₁)] [λ(t₂ − t₁)]^k / k!;

see Chapter 2.]

1.15. Let Z be a Bernoulli r.v. such that

    P{Z = 1} = p,  P{Z = −1} = q,  p + q = 1.

Let {N(t); t ≥ 0} be a homogeneous Poisson process with parameter λ, independent of Z. Consider the stochastic process {ξ(t); t ≥ 0} defined by

    ξ(t) = (−1)^{(1−Z)/2 + N(t)}.

Determine

    P{ξ(t) = 1},  P{ξ(t) = −1}.

1.16. Let {ξ(t); −∞ < t < ∞} be a random telegraph process defined as follows: ξ(t) assumes only two values, −1 and +1, with equal probabilities

    P{ξ(t) = 1} = 1/2,  P{ξ(t) = −1} = 1/2.

The sequence of transition times {T_k}_{−∞}^{+∞} forms a homogeneous Poisson process with parameter λ > 0. In other words, the number of transitions (from −1 to +1 and vice versa) in (u, u + t] is a homogeneous Poisson process N(t). Show that the process is wide sense stationary and determine its covariance function.

1.17. Let {ξ(t); t ∈ R} be a sample continuous wide sense stationary stochastic process. Determine

1.18. Let {ξ(t); t ≥ 0} be a process with independent increments. Assume that

    φ₁(t, λ) = E{e^{iλξ(t)}}

and

    φ₂(t₁, t₂, λ) = E{e^{iλ[ξ(t₂)−ξ(t₁)]}}

are given. Determine the characteristic function of (ξ(t₁), ..., ξ(t_n)) in terms of φ₁ and φ₂.

1.19. A stochastic process with independent increments is said to be homogeneous if ξ(t + s) − ξ(s) has the same distribution as ξ(t) for all s, t ≥ 0. Show that the characteristic function φ₁(t, λ) is infinitely divisible if continuous at t = 0.

1.20. (Continuation) Let a stochastic process with independent increments {ξ(t); t ≥ 0} be homogeneous. If φ₁(t; λ) is continuous at t = 0 for all λ, the process is stochastically continuous.

1.21. Let {ξ(t); t ∈ T} be a real stochastic process on a complete probability space {Ω, ℬ, P}, D ⊂ T countable and everywhere dense, and Λ ⊂ Ω negligible. If, for any ω ∉ Λ and t ∈ T, there exists {t_n}₁^∞ ⊂ D, t_n → t, such that ξ(t_n, ω) → ξ(t, ω), show that the process is separable.
1.22. If {ξ(t); t ∈ T} is separable and f: R → R continuous, show that {f(ξ(t)); t ∈ T} is separable.

1.23. If {ξ(t); t ∈ T} is such that ξ(·, ω) is continuous on T for all ω ∉ Λ, where Λ ⊂ Ω is negligible, show that the process is separable.

1.24. Let {ξ(t); t ∈ T} be stochastically continuous on T and f: T → R. Then X(t) = ξ(t) + f(t) is stochastically continuous at those and only those points of T where f(t) is continuous.

1.25. Let {ξ(t); t ∈ T} be a family of i.i.d. r.v.'s with common probability density f(·). Show that ξ(t) cannot be stochastically continuous at any point t ∈ T.

1.26. Let {ξ(t); t ∈ T} be stochastically continuous at every t ∈ T. Then Ψ(ξ(t)) is also stochastically continuous if Ψ: R → R is continuous.

1.27. In the previous problem, show that φ(t) = E{Ψ(ξ(t))} is a continuous function.

1.28. Let {ξ(t); t ∈ T} be a stochastically continuous process on {Ω, ℬ, P}. Let {X(t); t ∈ T} be another process on {Ω, ℬ, P} equivalent to ξ(t). Show that X(t) is also stochastically continuous.

1.29. A stochastic process {ξ(t); t ∈ T} is said to be "bounded in probability" (or stochastically bounded) if

    lim_{C→+∞} sup_{t∈T} P{|ξ(t)| > C} = 0.        (*)

If ξ(t) is stochastically continuous on T = [a, b], then (*) holds.

1.30. Let {ξ(t); t ∈ T} be a process with independent increments. If Var{ξ(t)} is a continuous function of t, the process is stochastically continuous.

1.31. Let {ξ(t); t ∈ [0, 1]} be a standard Brownian motion [i.e., ξ(t) has independent increments and for any s > 0 with t, t + s ∈ [0, 1],

    P{ξ(t + s) − ξ(t) ≤ x} = (2πs)^{−1/2} ∫_{−∞}^x e^{−z²/(2s)} dz].

Show that almost all sample functions of ξ(t) are continuous.

1.32. Let {ξ(t); t ∈ [0, 1]} be a real Gaussian process with E{ξ(t)} = 0 and covariance function

    C(s, t) = min{s, t} − st.

Show that its sample functions are (a.s.) continuous on [0, 1].

1.33. Let {ξ(t); t ∈ [a, b]} be a real process with (a.s.) continuous sample paths. Show that ξ(t) is measurable.

1.34. If all the sample functions of a real stochastic process are Borel measurable, does this imply measurability of the random process?
CHAPTER 2
The Poisson Process and
Its Ramifications
2.1. Introduction
We begin by describing in an informal fashion the subject matter of this
chapter. The part of the general theory of stochastic processes dealing with
countable sets of points randomly distributed on the real line or in an arbitrary space (for instance, Cartesian d-dimensional space) is called the "Theory
of Point Processes." Of all point processes, those on the real line have been
most widely studied. Notwithstanding their relatively simple structure, they
form building blocks in a variety of industrial, biological, geophysical, and
engineering applications. The following example describes a general situation
which in a natural fashion introduces a point process on a line.
EXAMPLE 2.1.1. Let A be an event which recurs repeatedly in time so that the time interval between any two consecutive occurrences of A has a random length. Assume that our observation of the event A begins at time t = 0 and that 0 < t₁ < t₂ < ··· are the instants of the first, second, etc., occurrences of A. These instants may be regarded as a set of points randomly distributed on the half-line R₊ = [0, ∞). An alternative interpretation of this situation is that the event A occurs at the points of a randomly selected countable subset ω ⊂ R₊, where

    ω: (t₁, t₂, ...).
The random event A may be the occurrence of rainfall at a given site. Due to the stochastic nature of the rainfall phenomenon, its arrival times at a given location may be regarded as points randomly distributed on R₊. Other specific cases of A could be earthquake occurrences, volcanic eruptions, the flooding of a city, and so on.
Figure 2.1. Crossings of the level x₀ by the process ξ(t).
Point processes generated by crossings of a fixed level by a stochastic process are of particular interest in engineering. The interest in crossing problems dates back to the original papers of M. Kac (1943) and of S.O. Rice (1945). As an example in hydrology, consider the discharge rate ξ(t) of a streamflow at a given site. Because the surface runoff flows vary randomly in time, ξ(t) is a continuous-parameter stochastic process. The crossing points of a fixed level x₀ by ξ(t) are randomly distributed (see Figure 2.1). Of particular interest in various applications is the asymptotic behavior of this point process as x₀ → +∞.
2.2. Simple Point Process on R+
Set R₊ = [0, ∞) and denote by ℛ₊ the σ-algebra of Borel subsets of R₊. Let Ω be the collection of all infinite countable subsets of R₊ which are locally finite and do not contain zero as an element. In other words, every ω ∈ Ω is of the form

    ω: (t₁, t₂, ...),  0 < t₁ < t₂ < ···,        (2.2.1)

and such that for any compact K ⊂ R₊

    #(ω ∩ K) < ∞,

where # denotes the number of elements in the set. It is clear from this definition that t_n → +∞ as n → ∞.

Let ε_x(·) be the Dirac measure on ℛ₊ concentrated at x, i.e., for every B ∈ ℛ₊,

    ε_x(B) = 1 if x ∈ B,  ε_x(B) = 0 if x ∉ B,

and consider the Borel-Radon measure

    μ(·) = Σ_{t∈ω} ε_t(·).        (2.2.2)
This measure is called a "point measure" on R₊. It clearly takes only non-negative integer values and

    μ(K) < ∞

on any compact K ⊂ R₊. Note that every ω ∈ Ω is a closed subset of R₊ because its intersection with any compact subset of R₊ is a finite set.

Let {τ_n}₁^∞ be a sequence of coordinate mappings on Ω, specified as follows: for any ω: (t₁, t₂, ...),

    τ_n(ω) = t_n,  n = 1, 2, ....        (2.2.3)

From this it readily follows that on Ω

    0 < τ₁(·) < τ₂(·) < ···        (2.2.4)

and that τ_n(·) → ∞ as n → ∞. Finally, we denote by ℬ the least σ-algebra of subsets of Ω with respect to which all τ_n are measurable.

We now give a definition of a point process on R₊.

Definition 2.2.1. A random measure η on ℛ₊,

    η: ℛ₊ × Ω → N₊,

where N₊ = {0, 1, ...}, defined by, for every B ∈ ℛ₊,

    η(B, ω) = Σ_{n=1}^∞ ε_{τ_n(ω)}(B),        (2.2.5)

is called a "point process" on R₊.

From (2.2.3) and (2.2.5), it is clear that for every fixed ω ∈ Ω, η(B, ω) = μ(B), where μ(·) is the point measure defined by (2.2.2). In addition, for any t ∈ R₊ and ω ∈ Ω,

    η({t}, ω) = 0 if t ∉ ω,  η({t}, ω) = 1 if t ∈ ω,        (2.2.6)

which means that the point process does not have multiple points. Such a point process is called "simple."

When B = (0, t], we will write

    N(t) = η((0, t], ·),  N(0) ≡ 0.        (2.2.7)

From (2.2.5) we readily have that

    N(t) = Σ_{k=1}^∞ I_{(0,t]}(τ_k).        (2.2.8)

The stochastic process {N(t); t ≥ 0} is called the "counting random function" of the point process η. It is easy to see from (2.2.8) that every realization of
N(t) is a nondecreasing step function, continuous from the right, with unit jumps at each of its discontinuity points. In Figure 2.2, a sample function of N(t) is depicted.

Figure 2.2. A sample function of N(t).

The r.v.'s {τ_n}₁^∞ are usually called "arrival times" of the point process. The following relation between the counting function N(t) and its arrival times τ_n is easy to see:

    {τ_n ≤ t} = {N(t) ≥ n}.        (2.2.9)

Let P be a probability measure on {Ω, ℬ} and set

    F_n(t) = P{τ_n ≤ t},  Λ(t) = E{N(t)}.        (2.2.10)

Then from (2.2.8), it clearly follows that

    Λ(t) = Σ_{k=1}^∞ F_k(t).        (2.2.11)

2.3. Some Auxiliary Results

In this section, we discuss some inequalities involving sequences of independent Bernoulli r.v.'s. These inequalities, which are of independent interest, will considerably simplify many proofs in forthcoming sections.

Lemma 2.3.1. Let {I_k}₁ⁿ be independent Bernoulli r.v.'s; then

    P{Σ_{i=1}^n I_i > k} ≤ (P{Σ_{i=1}^n I_i > 0})^{k+1}.        (2.3.1)

PROOF. First, we have the following equality:

    P{Σ_{i=1}^n I_i > k} = Σ_{r=1}^n P{Σ_{i=1}^{r−1} I_i = 0, I_r = 1, Σ_{i=r+1}^n I_i > k − 1}.        (2.3.2)

Because {I_i}₁ⁿ are independent r.v.'s, we readily obtain from (2.3.2) that

    P{Σ_{i=1}^n I_i > k} ≤ P{Σ_{i=1}^n I_i > k − 1} Σ_{r=1}^n P{Σ_{i=1}^{r−1} I_i = 0, I_r = 1}.        (2.3.3)

On the other hand,

    P{Σ_{i=1}^n I_i > 0} = 1 − P{I₁ = 0, ..., I_n = 0}
        = 1 − P{I₁ = 0, ..., I_{n−1} = 0} + P{I₁ = 0, ..., I_{n−1} = 0, I_n = 1}
        = ··· = 1 − P{I₁ = 0} + P{I₁ = 0, I₂ = 1} + ··· + P{I₁ = 0, ..., I_{n−1} = 0, I_n = 1}
        = Σ_{r=1}^n P{Σ_{i=1}^{r−1} I_i = 0, I_r = 1}.

From this and (2.3.3) we obtain the recursion

    P{Σ_{i=1}^n I_i > k} ≤ P{Σ_{i=1}^n I_i > k − 1} P{Σ_{i=1}^n I_i > 0},

which proves (2.3.1). □

Let X and Y be r.v.'s on a probability space {Ω, ℬ, P} with values in some abstract space {S, 𝒮}.

Definition 2.3.1. The total variation distance d(X, Y) between X and Y is defined by

    d(X, Y) = ||P_X − P_Y|| = sup_{D∈𝒮} |P_X(D) − P_Y(D)|,        (2.3.4)

where P_X and P_Y are, respectively, the distributions of X and Y on {S, 𝒮}.

Remark 2.3.1. When X and Y are real-valued r.v.'s, the metric

    d₀(X, Y) = ||F_X − F_Y|| = sup_x |F_X(x) − F_Y(x)|        (2.3.5)

is often useful, where F_X and F_Y are the distribution functions (d.f.'s) of X and Y, respectively. Clearly,

    d₀(X, Y) ≤ d(X, Y).

Finally, if X and Y are integer-valued,

    d(X, Y) = (1/2) Σ_{k=0}^∞ |P{X = k} − P{Y = k}|.        (2.3.6)

The next result is known as the coupling lemma.

Lemma 2.3.2.

    sup_{D∈𝒮} |P_X(D) − P_Y(D)| ≤ P{X ≠ Y}.        (2.3.7)
PROOF. We have

    P{X ∈ D} − P{Y ∈ D} = P{X ∈ D} − P{Y ∈ D, X ∈ D} − P{Y ∈ D, X ∉ D}
        ≤ P{X ∈ D, Y ∉ D} ≤ P{X ≠ Y}

for all D ∈ 𝒮, which proves the assertion. □

Remark 2.3.2. Concerning the coupling inequality (2.3.7), the following should be pointed out. We can use any joint distribution of (X, Y) in inequality (2.3.7) as long as its marginals are P_X and P_Y. Thus, to get a sharp bound in (2.3.7), we will select, from all joint distributions of (X, Y) with the same marginals P_X and P_Y, one that has the least probability P{X ≠ Y}. Application of the coupling lemma hinges on our ability to calculate P{X ≠ Y}, which is not always a simple task.

The next result in this respect is of particular interest to us. Its purpose is to determine simple exact upper bounds for the distance d₀(X, Y) and for d(X, Y) in the case when X is a sum of independent Bernoulli r.v.'s and Y a Poisson r.v. suitably chosen to approximate X in distribution.

Lemma 2.3.3 (Le Cam). Let {I_i}₁ⁿ be independent Bernoulli r.v.'s with

    E{I_i} = p_i,  i = 1, 2, ...,

and Y a Poisson r.v. with E{Y} = Σ₁ⁿ p_i. Then

    d(Σ_{i=1}^n I_i, Y) ≤ Σ_{i=1}^n p_i².        (2.3.8)

PROOF. We can write

    Y = Y₁ + ··· + Y_n,

where the Y_i are independent Poisson r.v.'s with E{Y_i} = p_i. Then from the Coupling Lemma 2.3.2, we have

    d(Σ_{i=1}^n I_i, Y) = d(Σ_{i=1}^n I_i, Σ_{i=1}^n Y_i) ≤ P{⋃_{i=1}^n {I_i ≠ Y_i}} ≤ Σ_{i=1}^n P{I_i ≠ Y_i}.        (2.3.9)

To evaluate P{I_i ≠ Y_i}, several methods are available. The following one is due to Serfling (1975). Let Z_i also be a Bernoulli r.v., independent of Y_i, and such that

    P{Z_i = 1} = 1 − e^{p_i}(1 − p_i).

Set

    I_i = I{Y_i ≥ 1} + I{Y_i = 0, Z_i = 1};

then I_i is a Bernoulli r.v. with E{I_i} = p_i. Then

    P{I_i ≠ Y_i} = P{Y_i ≥ 2} + P{Y_i = 0, I_i = 1}
        = 1 − e^{−p_i} − p_i e^{−p_i} + e^{−p_i}[1 − e^{p_i}(1 − p_i)]
        = p_i(1 − e^{−p_i}) ≤ p_i².        (2.3.10)

This and (2.3.9) prove the assertion. □

2.4. Definition of a Poisson Process

In this section we discuss one of the most important point processes on R₊, the Poisson point process. We first give its definition.

Definition 2.4.1. A simple point process on R₊, with counting random function {N(t); t ≥ 0}, is called a "Poisson process" if:

a. {N(t); t ≥ 0} has independent increments,
b. {N(t); t ≥ 0} is stochastically continuous.

As before, set Λ(t) = E{N(t)}; we shall show that Λ(t) < ∞ for all t < ∞.

Lemma 2.4.1. For all t ≥ 0,

    Λ(t) ≤ P{N(t) > 0} / (1 − P{N(t) > 0}).        (2.4.1)

PROOF. Consider

    0 = t_{n0} < t_{n1} < ··· < t_{nn} = t        (2.4.2)

so that

    max_i (t_{ni} − t_{n,i−1}) → 0 as n → ∞.        (2.4.3)

Set

    Ψ(s) = I_{[1,∞)}(s);        (2.4.4)

then we clearly have

    N(t) = lim_{n→∞} Σ_{i=1}^n Ψ(N(t_{ni}) − N(t_{n,i−1}))  (a.s.).        (2.4.5)

This follows from the stochastic continuity of N(t) and the fact that the sum on the right-hand side of (2.4.5) is nondecreasing in n. Next, write

    Ψ_{ni} = Ψ(N(t_{ni}) − N(t_{n,i−1})).        (2.4.6)

Clearly, {Ψ_{ni}}₁ⁿ is a sequence of independent Bernoulli r.v.'s. Thus, according
to Lemma 2.3.1, we have, for every k = 0, 1, ...,

    P{Σ_{i=1}^n Ψ_{ni} > k} ≤ (P{Σ_{i=1}^n Ψ_{ni} > 0})^{k+1} ≤ (P{N(t) > 0})^{k+1}.

Finally, by letting n → ∞ in this inequality and invoking (2.4.5), we obtain

    P{N(t) > k} ≤ (P{N(t) > 0})^{k+1}

or [see (2.2.9)]

    P{τ_{k+1} ≤ t} ≤ (P{N(t) > 0})^{k+1}.

This and Equation (2.2.10) prove the assertion. □

Corollary 2.4.1. From (2.4.1), it clearly follows that Λ(t) is finite and continuous at every t ≥ 0. As a matter of fact, for any t ≥ 0 and s ≥ 0, it follows from the Lebesgue dominated convergence theorem and the stochastic continuity of N(t) that

    lim_{s→0} {Λ(t + s) − Λ(t)} = E{lim_{s→0} (N(t + s) − N(t))} = 0,

which implies right continuity of Λ(t) at any t < ∞. In the same fashion, one can prove continuity of Λ(t) at any t ≥ 0 from the left.

Set

    Λ(t₀, t₁) = Λ(t₁) − Λ(t₀),  0 ≤ t₀ < t₁ < ∞;        (2.4.7)

then we have the following result:

Proposition 2.4.1. For any 0 ≤ t₀ < t₁ < ∞ and n = 0, 1, ...,

    P{N(t₁) − N(t₀) = n} = exp[−Λ(t₀, t₁)] [Λ(t₀, t₁)]ⁿ / n!.        (2.4.8)

PROOF. Consider the partition of [t₀, t₁]

    t₀ = t_{n0} < t_{n1} < ··· < t_{nn} = t₁,

where max_k (t_{nk} − t_{n,k−1}) → 0 as n → ∞, and set

    Ψ_{ni} = Ψ(N(t_{ni}) − N(t_{n,i−1})),

where Ψ(·) is defined by Equation (2.4.4). As in the previous lemma, we have that

    Σ_{i=1}^n Ψ_{ni} → N(t₁) − N(t₀)  (a.s.).

Set

    p_{ni} = P{Ψ_{ni} = 1}.
Because by assumption N(t) is stochastically continuous,

    sup_i p_{ni} ≤ sup_i P{N(t_{ni}) − N(t_{n,i−1}) ≥ 1} → 0        (2.4.9)

as n → ∞. Hence, p_{ni} → 0 as n → ∞, uniformly in i. Suppose now that, as n → ∞,

    Σ_{i=1}^n p_{ni} → L(t₀, t₁) < ∞,        (2.4.10)

and consider sequences {X_{ni}}₁ⁿ of independent Poisson r.v.'s with E{X_{ni}} = p_{ni}. Invoking Le Cam's Inequality (2.3.8), we have

    d(Σ_{i=1}^n Ψ_{ni}, Σ_{i=1}^n X_{ni}) ≤ Σ_{i=1}^n p_{ni}².

However, due to (2.4.9) and (2.4.10),

    Σ_{i=1}^n p_{ni}² ≤ max_k p_{nk} Σ_{i=1}^n p_{ni} → 0

as n → ∞, and the assertion follows due to the fact that

    Σ_{i=1}^n X_{ni} →^d Y,

where Y has a Poisson distribution with E{Y} = L(t₀, t₁). □
Remark 2.4.1. The previous proposition was proved assuming that (2.4.10) holds. We shall now prove this hypothesis.

Lemma 2.4.2. Under conditions a and b of Definition 2.4.1,

    lim_{n→∞} Σ_{i=1}^n p_{ni} < ∞.        (2.4.11)
PROOF. From the definition of Ψ_{nk}, it follows that there exists at least one integer v = 1, 2, ..., n such that

    P{Σ_{k=1}^n Ψ_{nk} = v} > 0.        (2.4.12)

Because {Ψ_{nk}}₁ⁿ are independent Bernoulli r.v.'s, we have

    P{Σ_{k=1}^n Ψ_{nk} = v}
        = Σ_{1≤i₁<···<i_v≤n} P{Ψ_{ni₁} = 1, ..., Ψ_{ni_v} = 1, Ψ_{ni} = 0 for all other i}
        ≤ ∏_{i=1}^n P{Ψ_{ni} = 0} · (Σ_{i=1}^n p_{ni})^v / [v! (1 − sup_i p_{ni})^v].
Now, invoking the inequality

    ∏_{i=1}^n P{Ψ_{ni} = 0} = ∏_{i=1}^n (1 − p_{ni}) ≤ exp(−Σ_{i=1}^n p_{ni}),

we obtain that

    P{Σ_{k=1}^n Ψ_{nk} = v} ≤ C exp(−Σ_{i=1}^n p_{ni}) (Σ_{i=1}^n p_{ni})^v / v!.

Therefore, if Σ_{i=1}^n p_{ni} → ∞ as n → ∞, the right-hand side of the last inequality converges to zero, which contradicts (2.4.12). This proves the lemma. □
Remark 2.4.2. When

    Λ(t) = λt,  λ > 0,

the Poisson process is said to be time homogeneous with mean rate λ.

Remark 2.4.3. Any Poisson process can be transformed into a time homogeneous Poisson process. Indeed, let Λ(t) = E{N(t)} and denote by Λ⁻¹ the right continuous inverse of Λ, i.e., for all u ≥ 0,

    Λ⁻¹(u) = inf{t; Λ(t) > u}.        (2.4.13)

Because Λ(t) → ∞ as t → ∞, Λ⁻¹(u) is defined for all u ≥ 0 and, furthermore, satisfies

    Λ(Λ⁻¹(u)) = u,  Λ(t) > u if t > Λ⁻¹(u).

Therefore, the stochastic process {N₀(t); t ≥ 0} defined by

    N₀(t) = N(Λ⁻¹(t))

is a homogeneous Poisson process with

    E{N₀(t)} = Λ(Λ⁻¹(t)) = t.

Remark 2.4.4. Any Poisson process is also a Markov process.
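Read in reverse, Remark 2.4.3 is also a simulation recipe: if S₁ < S₂ < ··· are the arrival times of a unit-rate homogeneous Poisson process, then τ_k = Λ⁻¹(S_k) are arrival times with mean function Λ. A minimal sketch (not from the text; the choice Λ(t) = t², hence Λ⁻¹(u) = √u, and all numerical values are assumptions):

```python
import numpy as np

# Simulate a Poisson process with mean function Lambda(t) = t**2 on [0, T]
# by time-changing a unit-rate homogeneous process (cf. Remark 2.4.3).
rng = np.random.default_rng(2)
T = 2.0
Lam = lambda t: t**2           # assumed mean function
Lam_inv = np.sqrt              # its (right continuous) inverse

# unit-rate arrivals S_k: partial sums of i.i.d. Exp(1) interarrival times
S = np.cumsum(rng.exponential(1.0, size=10_000))
tau = Lam_inv(S[S <= Lam(T)])  # arrival times tau_k = Lam_inv(S_k) in [0, T]
print("N(T) =", tau.size, "  Lambda(T) =", Lam(T))

# sanity check: E{N(1)} should match Lambda(1) = 1
counts = [(np.cumsum(rng.exponential(1.0, 100)) <= Lam(1.0)).sum()
          for _ in range(2000)]
print("mean N(1) ~", np.mean(counts), "  Lambda(1) =", Lam(1.0))
```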
2.5. Arrival Times {τ_k}

Let {τ_n}₁^∞ be the sequence of "arrival times" of a Poisson point process {N(t); t ≥ 0} with Λ(t) = E{N(t)}, and set F_n(t) = P{τ_n ≤ t}. Then, from (2.2.9) and (2.4.8), we obtain that

    F_n(t) = 1 − exp[−Λ(t)] Σ_{k=0}^{n−1} [Λ(t)]^k / k!
           = (1/Γ(n)) ∫₀ᵗ exp[−Λ(s)] [Λ(s)]^{n−1} dΛ(s),        (2.5.1)

where Γ(n) = (n − 1)!.
An important property of the arrival times of a Poisson process can be described as follows. Given that exactly n events have occurred in [t₀, t₁], these n points are distributed throughout the interval [t₀, t₁] as n points selected randomly (and independently) from this interval according to the probability distribution

    dΛ(s) / Λ(t₀, t₁),  s ∈ [t₀, t₁].        (2.5.2)

This is established in the following proposition.

Proposition 2.5.1. Let t₀ = s₀ < s₁ < ··· < s_r = t₁ be a partition of [t₀, t₁], and k₁, k₂, ..., k_r non-negative integers. Set k₁ + ··· + k_r = k; then

    P{N(s_i) − N(s_{i−1}) = k_i, i = 1, ..., r | N(t₁) − N(t₀) = k}
        = [k! / (k₁! ··· k_r!)] ∏_{i=1}^r [Λ(s_{i−1}, s_i) / Λ(t₀, t₁)]^{k_i}.        (2.5.3)

PROOF. This follows from the fact that

    {N(s_i) − N(s_{i−1}) = k_i, i = 1, ..., r} ⊂ {N(t₁) − N(t₀) = k},

together with the independence of the increments and (2.4.8). □
To this proposition one may also give the following interpretation: Let Z be a r.v. with support [t₀, t₁] such that

    P{Z ≤ t} = Λ(t₀, t) / Λ(t₀, t₁),  t₀ ≤ t ≤ t₁,        (2.5.4)

and consider n independent copies Z₁, ..., Z_n of Z. Denote by Z_{n1}, ..., Z_{nn} the corresponding sequence of order statistics. Then the following is a variation of Proposition 2.5.1: Given that N(t₁) − N(t₀) = n, the joint distribution of the arrival times {τ_i*}₁ⁿ, t₀ < τ₁* < ··· < τ_n* ≤ t₁, is the same as that of {Z_{ni}}₁ⁿ. As a matter of fact, after some straightforward calculations, one obtains that, for any t₀ < u₁ < ··· < u_n < t₁,

    P{τ₁* ≤ u₁, ..., τ_n* ≤ u_n | N(t₁) − N(t₀) = n}
        = n! ∫···∫_{t₀<s₁<···<s_n; s_i≤u_i} ∏_{i=1}^n dΛ(s_i) / Λ(t₀, t₁).        (2.5.5)

But then, as is well known, the right-hand side of (2.5.5) is the joint distribution of (Z_{n1}, ..., Z_{nn}).
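This conditional order-statistics property is simple to test by simulation. The following is a minimal sketch (not from the text; a homogeneous process with Λ(t) = λt on [0, 1], so that Z is uniform, and all numerical values are assumptions):

```python
import numpy as np

# Conditional on N(1) = n, homogeneous Poisson arrival times on (0, 1]
# are distributed as order statistics of n i.i.d. Uniform(0, 1) r.v.'s.
rng = np.random.default_rng(3)
lam, n, reps = 3.0, 3, 50_000

first_arrivals = []
for _ in range(reps):
    tau = np.cumsum(rng.exponential(1.0 / lam, size=20))
    if np.sum(tau <= 1.0) == n:              # keep only paths with N(1) = n
        first_arrivals.append(tau[0])

# E{tau_1 | N(1) = 3} should equal E{min of 3 uniforms} = 1/4
print("conditional mean of tau_1:", np.mean(first_arrivals))
print("E{min of 3 uniforms}     :", 1 / (n + 1))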
The next example shows the usefulness of this result.
EXAMPLE 2.5.1. For every t > 0 and n = 1, 2, ...,

    E{Σ_{k=1}^n τ_k | N(t) = n} = n (t − (1/Λ(t)) ∫₀ᵗ Λ(s) ds).

Indeed, according to Proposition 2.5.1 and Equation (2.5.4), we have

    E{Σ_{k=1}^n τ_k | N(t) = n} = E{Σ_{k=1}^n Z_{nk}} = E{Σ_{k=1}^n Z_k} = nE{Z₁} = (n/Λ(t)) ∫₀ᵗ s dΛ(s),

and an integration by parts gives the stated form.
As every process with independent increments, a Poisson process is also a Markov process. It is easy to see that, for any 0 < s < t < ∞ and 0 ≤ i ≤ j, its transition probability is given by

    P_{ij}(s, t) = P{N(t) = j | N(s) = i} = P{N(t) − N(s) = j − i}.        (2.5.6)

It is of some interest to investigate how the Markov stochastic structure of {N(t); t ≥ 0} reflects on the stochastic structure of the process {τ_n}₁^∞, and vice versa. In the rest of this section, this question will be discussed in some detail. But first, note that, under relatively mild regularity conditions on Λ(t),

    P{N(t + s) − N(t) ≥ 2} = o(s)        (2.5.7)

as s → 0. We now have the following result:
Proposition 2.5.2. For all 0 < s < t and 0 ≤ i ≤ j,

    P_{ij}(s, t) = P{N(t) = j | τ_i = s}.        (2.5.8)

PROOF. For i = 1, 2, ...,

    P{s < τ_i ≤ s + Δs} = P{N(s) ≤ i − 1, N(s + Δs) ≥ i}
        = P{N(s) = i − 1, N(s + Δs) = i} + o(Δs)

as Δs → 0. Thus,

    P_{ij}(s, t) P{s < τ_i ≤ s + Δs}
        = P_{ij}(s, t) P{N(s) = i − 1, N(s + Δs) = i} + o(Δs)
        = P{N(s) = i − 1, N(s + Δs) = i, N(t) = j} + o(Δs)
        = P{N(t) = j, s < τ_i ≤ s + Δs} + o(Δs).

From this we conclude that

    P_{ij}(s, t) = P{N(t) = j | s < τ_i ≤ s + Δs} + o(1).

By letting Δs → 0 the assertion follows. □
Remark 2.5.1. The proof of (2.5.8) is based on two features of the process N(t): its Markov property and condition (2.5.7). Therefore, the proposition holds for any point process whose counting random function is a Markov process and which satisfies condition (2.5.7).
Corollary 2.5.1. Set T₁ = τ₁ and T_n = τ_n − τ_{n−1} for n ≥ 2. Then, if Λ(t) = λt [i.e., N(t) is a homogeneous Poisson process], {T_n}₁^∞ is an i.i.d. sequence of r.v.'s with common d.f.

    P{T₁ ≤ t} = 1 − exp(−λt).

PROOF. The proof is quite simple. From (2.5.6), it follows that

    1 − P_{jj}(s, s + u) = P{N(s + u) ≥ j + 1 | N(s) = j} = P{τ_{j+1} ≤ s + u | τ_j = s}
        = P{T_{j+1} ≤ u | τ_j = s}.

On the other hand,

    P_{jj}(s, s + u) = exp(−λu),

which proves the assertion. □
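A quick empirical check of Corollary 2.5.1 (a minimal sketch, not from the text; the rate and horizon are assumed values): simulate a homogeneous process via the conditional order-statistics property of Proposition 2.5.1, then inspect the interarrival times.

```python
import numpy as np

# Corollary 2.5.1: for a homogeneous Poisson process the interarrival
# times T_n = tau_n - tau_{n-1} are i.i.d. Exp(lambda).
rng = np.random.default_rng(4)
lam, horizon = 2.0, 10_000.0

num = rng.poisson(lam * horizon)               # N(horizon)
tau = np.sort(rng.uniform(0, horizon, num))    # arrivals given N (Prop. 2.5.1)
gaps = np.diff(tau)

print("mean gap:", gaps.mean(), " (expect 1/lambda =", 1 / lam, ")")
print("std gap :", gaps.std(), "  (expect 1/lambda for an exponential)")
print("corr of successive gaps:", np.corrcoef(gaps[:-1], gaps[1:])[0, 1])
```

The mean and standard deviation of the gaps agree with 1/λ and the successive gaps are essentially uncorrelated, as the corollary predicts.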
2.6. Markov Property of N(t) and Its Implications
Let {N(t); t ≥ 0} be a Markov process. The aim of this section is to show that in such a case the corresponding sequence of arrival times {τ_j}₁^∞ is also a Markov process. The converse, however, does not hold in general. In other words, if {τ_j}₁^∞ is a Markov process, it does not necessarily follow from this that N(t) has the Markov property. An exception is, of course, the Poisson process.

In what follows we shall prove these two statements. Our elementary method of proof will require that the condition

    P{N(t + Δt) − N(t) ≥ 2} = o(Δt)        (2.6.1)

as Δt → 0 holds.
Proposition 2.6.1. Let {N(t); t ≥ 0} be a Markov process with transition probability P_{ij}(s, t). If condition (2.6.1) holds, then {τ_j}₁^∞ is also a Markov process with transition probability

    P{τ_{j+1} ≤ t | τ_j = s} = Σ_{k=j+1}^∞ P_{jk}(s, t).        (2.6.2)

PROOF. From (2.5.8), it readily follows that

    Σ_{k=j+1}^∞ P_{jk}(s, t) = P{N(t) ≥ j + 1 | τ_j = s} = P{τ_{j+1} ≤ t | τ_j = s},

which proves (2.6.2). Next,

    P{τ₁ ∈ dt} = P{N(t) = 0, N(t + dt) = 1} = P{N(t) = 0} P₀₁(t, t + dt).        (2.6.3)

We also have, for all j = 1, 2, ...,

    P{τ_{j+1} ∈ dt | τ_j = s} = P{N(t) = j, N(t + dt) = j + 1 | τ_j = s}
        = P_{jj}(s, t) P_{j,j+1}(t, t + dt).        (2.6.4)
Finally, for 0 < t₁ < ··· < t_n < ∞ and n = 1, 2, ... arbitrary, we obtain, after some straightforward calculations, that

    P{τ₁ ∈ dt₁, ..., τ_n ∈ dt_n}
        = P{N(t₁) = 0, N(t₁ + dt₁) = 1, ..., N(t_n) = n − 1, N(t_n + dt_n) = n}
        = P{N(t₁) = 0} P₀₁(t₁, t₁ + dt₁) P₁₁(t₁, t₂) P₁₂(t₂, t₂ + dt₂) ···
          P_{n−1,n−1}(t_{n−1}, t_n) P_{n−1,n}(t_n, t_n + dt_n).

From this, (2.6.3), and (2.6.4), it follows that

    P{τ₁ ∈ dt₁, ..., τ_n ∈ dt_n} = P{τ₁ ∈ dt₁} P{τ₂ ∈ dt₂ | τ₁ = t₁} ··· P{τ_n ∈ dt_n | τ_{n−1} = t_{n−1}}.        (2.6.5)

This proves the proposition. □
Corollary 2.6.1. If N(t) is a Poisson process with Λ(t) = E{N(t)}, it clearly follows from (2.5.6) and (2.6.2) that {τ_j}₁^∞ is a Markov process with stationary transition probability

    P{τ_{j+1} ≤ t | τ_j = s} = 1 − exp[−Λ(s, t)].        (2.6.6)

In fact, as we have indicated before, one can prove much more. To this end we need the following auxiliary result.

Lemma 2.6.1. Let {τ_j}₁^∞ be a Markov chain with stationary transition probability (2.6.6). Then, for any 0 < t₁ < ··· < t_n < t and n = 1, 2, ...,

    P{τ₁ ∈ dt₁, ..., τ_n ∈ dt_n | N(t) = n} = (n! / [Λ(t)]ⁿ) ∏_{j=1}^n dΛ(t_j).        (2.6.7)
PROOF. After some simple calculations, we deduce that

    P{τ₁ ∈ dt₁, ..., τ_n ∈ dt_n | N(t) = n} = P{τ₁ ∈ dt₁, ..., τ_n ∈ dt_n, τ_{n+1} > t} / P{N(t) = n}
        = (exp[−Λ(t)] / P{N(t) = n}) ∏_{j=1}^n dΛ(t_j).

On the other hand,

    P{τ_n ∈ dt} = P{0 < τ₁ < ··· < τ_{n−1} < t, τ_n ∈ dt}
        = ∫···∫_{0<t₁<···<t_{n−1}<t} P{τ₁ ∈ dt₁, ..., τ_{n−1} ∈ dt_{n−1}, τ_n ∈ dt}.        (2.6.8)

From this and (2.6.6), we have
    P{τ_n ∈ dt} = (∫···∫_{0<t₁<···<t_{n−1}<t} ∏_{j=1}^{n−1} dΛ(t_j)) exp[−Λ(t)] dΛ(t)
        = exp[−Λ(t)] ({Λ(t)}^{n−1} / (n − 1)!) dΛ(t).        (2.6.9)

Therefore, for all t > 0 and n = 0, 1, ...,

    P{N(t) = n} = exp[−Λ(t)] [Λ(t)]ⁿ / n!.        (2.6.10)

This and (2.5.1) prove the lemma. □
Proposition 2.6.2. Let {N(t); t ≥ 0} be a simple point process with Λ(t) = E{N(t)}. A necessary and sufficient condition for N(t) to be a Poisson process is that its sequence of arrival times {τ_j}₁^∞ is a Markov process with stationary transition probability (2.6.6).

PROOF. Necessity of the condition is obvious. Its sufficiency will be amply illustrated by showing that N(s) and N(t) − N(s) are independent r.v.'s. Thus, let 0 < s < t and k, n = 0, 1, ... be arbitrary and write

    P{N(s) = k, N(t) − N(s) = n}
        = P{N(s) = k, N(t) − N(s) = n | N(t) = k + n} P{N(t) = k + n}.        (2.6.11)

Now, according to Lemma 2.6.1, given N(t) = n + k, the k arrival times in (0, s] and the n in (s, t] are distributed as k + n independent r.v.'s with common d.f.

    Λ(u) / Λ(t),  0 ≤ u ≤ t.

Hence,

    P{N(s) = k, N(t) − N(s) = n | N(t) = n + k}
        = [(n + k)! / (n! k!)] (Λ(s)/Λ(t))^k (Λ(s, t)/Λ(t))^n.

This, (2.6.10), and (2.6.11) yield

    P{N(s) = k, N(t) − N(s) = n} = exp[−Λ(s)] ([Λ(s)]^k / k!) exp[−Λ(s, t)] ([Λ(s, t)]^n / n!),

which proves the assertion. □

Denote by

    T₁ = τ₁,  T_j = τ_j − τ_{j−1},  j = 2, 3, ...,        (2.6.12)

the sequence of interarrival times of a point process. It is often of interest to
know the stochastic structure of {T_j}₁^∞. As we have seen in Corollary 2.5.1, this is an i.i.d. sequence with common negative exponential distribution if N(t) is a homogeneous Poisson process. If the Poisson process is not homogeneous, the r.v.'s {T_j}₁^∞ are neither independent nor identically distributed. In the case when N(t) is a "pure birth process," i.e., a Markov process with stationary transition probability, the stochastic structure of {T_j}₁^∞ is easy to determine.
Proposition 2.6.3. Suppose that {N(t); t ≥ 0} is a Markov process with stationary transition probability, i.e.,

    P_{ij}(s, s + u) = P_{ij}(u);        (2.6.13)

then {T_j}₁^∞ is a sequence of independent r.v.'s with

    P{T_{j+1} ≤ u} = 1 − exp(−λ_j u),  j = 0, 1, ....        (2.6.14)

PROOF. From (2.6.13), it follows that, for all t, u > 0 and j = 0, 1, ...,

    P_{jj}(u) = P{N(t + u) = j | N(t) = j} = P{N(t + u) = j | τ_j = t}
        = 1 − P{τ_{j+1} ≤ t + u | τ_j = t} = 1 − P{T_{j+1} ≤ u | τ_j = t}.        (2.6.15)

This clearly implies that T_{j+1} is independent of τ_j for all j = 1, 2, ..., which proves that {T_j}₁^∞ is a sequence of independent r.v.'s with

    P{T_{j+1} ≤ t} = 1 − P_{jj}(t).        (2.6.16)

Next, let 0 < s < t be arbitrary and consider

    P_{jj}(t) = P{N(t) = j | N(0) = j} = P{N(s) = j, N(t) = j | N(0) = j} = P_{jj}(s) P_{jj}(t − s).

If we set u = t − s, we obtain

    P_{jj}(s + u) = P_{jj}(u) P_{jj}(s),        (2.6.17)

which is Cauchy's functional equation. The function P_{jj}(u) is nonincreasing with P_{jj}(0) = 1 for all j = 0, 1, .... In addition, for all u, s > 0 and j = 0, 1, ...,

    |P_{jj}(u + s) − P_{jj}(u)| = P_{jj}(u)[1 − P_{jj}(s)] → 0

as s → 0, which implies that P_{jj}(u) is continuous for all u ≥ 0. In such a case, the only solution of (2.6.17) is

    P_{jj}(u) = exp(−λ_j u),  j = 0, 1, ....        (2.6.18)

This concludes the proof of the proposition. □
2.7. Doubly Stochastic Poisson Process

To motivate the consideration of a doubly stochastic Poisson process, we shall begin with the following example.

EXAMPLE 2.7.1. A deep space probe was sent to investigate the volcanic activity of Jupiter's moon Io. Stationed high above the moon in a stationary orbit, the probe counts the number of volcanic eruptions on the moon's surface. However, if hit by a meteorite, the functioning of the probe would stop. Therefore, its lifetime is a r.v., say T.

Suppose that the number of volcanic eruptions N(t) is a homogeneous Poisson process with intensity λ. Clearly, T and N(t) are independent. The total number of eruptions detected during the probe's lifetime is N(T). This is a r.v. with the distribution

    P{N(T) = n} = ∫₀^∞ P{N(t) = n} dH(t) = (1/n!) ∫₀^∞ exp(−λt)(λt)ⁿ dH(t),

where H(t) = P{T ≤ t}.
Suppose now that a meteorite impact at a random time T does not destroy the probe but only reduces its sensitivity, so that after T, the intensity of the Poisson process becomes λ₀ < λ. To simplify the problem, we assume that a second impact by a meteorite is not very likely. In such a case, the intensity of the process N(t) is a r.v. specified as follows:

    λ(t, ·) = λ for t ≤ T,  λ(t, ·) = λ₀ for t > T.        (2.7.1)

Hence,

    Λ(t, ·) = ∫₀ᵗ λ(s, ·) ds = λ min(T, t) + λ₀(t − min(T, t)).        (2.7.2)

Denote by N₀(t) the homogeneous Poisson process with λ = 1. Then, if N(t) is an arbitrary Poisson process with E{N(t)} = Λ(t), according to Remark 2.4.3,

    (N₀ ∘ Λ)(t) = N(t).

Therefore, the probability that the probe will detect n volcanic eruptions in (0, t] is

    P{N₀(Λ(t, ·)) = n} = P{N₀(λt) = n, T > t} + P{N₀(λT + λ₀(t − T)) = n, T ≤ t}
        = exp(−λt) ((λt)ⁿ/n!) [1 − H(t)]
          + ∫₀ᵗ exp{−[λs + λ₀(t − s)]} ([λs + λ₀(t − s)]ⁿ/n!) dH(s)
        = E{exp[−Λ(t, ·)] [Λ(t, ·)]ⁿ/n!}.
We now give the following definition of a doubly stochastic Poisson process.

Definition 2.7.1. Let {Λ(t, ·); t ≥ 0} be a stochastic process on {Ω, ℬ, P} whose realizations are strictly increasing functions such that Λ(0, ·) = 0 (a.s.). Let N₀(t) be the Poisson process on the same probability space with E{N₀(t)} = t, independent of Λ(t, ·). A point process {N(t); t ≥ 0} is called a doubly stochastic Poisson (or Cox) process if it has the same distribution as

    (N₀ ∘ Λ)(t) = N₀(Λ(t, ·)).        (2.7.3)

An exhaustive study of this subject can be found in the monograph by Grandell (1976).
EXAMPLE 2.7.2. When

    Λ(t, ·) = Zt,

where Z is a non-negative r.v., the doubly stochastic Poisson process is called a "weighted" or "mixed" Poisson process. It can be shown that the weighted Poisson process is a Markov process with the transition probability

    P_{ij}(s, t) = [(t − s)^{j−i} / (j − i)!] E{Z^j e^{−Zt}} / E{Z^i e^{−Zs}}.        (2.7.4)
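A minimal simulation sketch of a mixed Poisson process (not from the text; the mixing law Z ~ Exp(1) and all numerical values are assumptions made for illustration):

```python
import numpy as np

# Mixed ("weighted") Poisson process of Example 2.7.2: Lambda(t, .) = Z t.
# Z ~ Exp(1) is an assumed mixing distribution.
rng = np.random.default_rng(5)
t, reps = 2.0, 500_000

Z = rng.exponential(1.0, reps)
N = rng.poisson(Z * t)            # N(t) | Z  ~  Poisson(Z t)

# Unconditionally: E{N(t)} = t E{Z} and Var{N(t)} = t E{Z} + t**2 Var{Z},
# so the counts are over-dispersed relative to a plain Poisson process.
print("mean:", N.mean(), " expected:", t)
print("var :", N.var(),  " expected:", t + t**2)
```

The variance exceeds the mean, which is the typical signature distinguishing a Cox process from an ordinary Poisson process in data.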
2.8. Thinning of a Point Process
To provide the motivation for studying this particular procedure, we consider the following example.

EXAMPLE 2.8.1. Suppose we observe the occurrence of rainfall at a particular location. Suppose we begin our observation at time t = 0 and let 0 < T₁ < T₂ < ··· be the times of the first, second, etc., rainfall after t = 0 at this particular site. The sequence of r.v.'s {T_n}₁^∞ represents a point process on R₊ = (0, ∞).

Suppose now that for some practical purposes not all rainfall events arriving at the given location are of interest but only those which are sufficiently large to cause flooding. If only their arrival times are recorded, the corresponding point process can be visualized as a point process obtained from the point process {T_n}₁^∞ by deletion of a certain number of its points. The new point process is called a thinned point process.

In this section we are concerned with the so-called "independent thinning," which can be described as follows. Let {N(t); t ≥ 0} be an arbitrary simple point process. It undergoes independent thinning if each of its points is deleted with probability p, 0 < p < 1, and retained with probability q = 1 − p, independently for each point. The retained set of points forms a thinned point process.
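The mechanism is one line of code. A minimal sketch (not from the text; the homogeneous Poisson parent with rate λ and all numerical values are assumptions chosen for convenience):

```python
import numpy as np

# Independent thinning: delete each point with probability p = 1 - q,
# retain it with probability q, independently over points.
rng = np.random.default_rng(6)
lam, q, horizon = 5.0, 0.3, 1000.0

tau = np.cumsum(rng.exponential(1.0 / lam, int(2 * lam * horizon)))
tau = tau[tau <= horizon]                     # parent points on (0, horizon]
kept = tau[rng.random(tau.size) < q]          # retained (thinned) points

print("parent rate  :", tau.size / horizon, " (lambda =", lam, ")")
print("thinned rate :", kept.size / horizon, " (q*lambda =", q * lam, ")")
```

For a Poisson parent the retained stream is again Poisson with rate qλ (cf. Problem 2.11); for a general parent, heavy thinning plus rescaling leads to the Poisson limit established below.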
Let η_q(·) be the thinned version after contraction of the time scale by a factor q. In other words, for A ∈ ℛ₊,

    η_q(A) is the thinning of η(q⁻¹A)

[see (2.2.5) for the definition of η], where

    q⁻¹A = {q⁻¹x; x ∈ A}.

The point process η_q(·) resulting from both thinning and contraction of the time scale will have η_q(A) = n, say, only if exactly n of the η(q⁻¹A) points of the original process in the set q⁻¹A are retained in the thinned process.

It is of some interest to investigate the asymptotic behavior of η_q(·) as q → 0. In the rest of this section we shall investigate the weak convergence of η_q(·) as q → 0 (weak convergence means convergence of finite-dimensional distributions). The next result has been known for some time (see Belyaev, 1963).
Proposition 2.8.1. The point process η_q(·), obtained by independent thinning and contraction from a point process η(·) on R₊, converges weakly to a homogeneous Poisson process with intensity λ ∈ (0, ∞) if and only if, for all 0 < t < ∞,

    qN(t/q) →^d λt        (2.8.1)

as q → 0, where N(·) is the counting random function of η(·).
PROOF. The proof is essentially due to Westcott (1976). Define

    Φ_q(s, A) = E(exp{−s η_q(A)}),
    Ψ_q(s, A) = E(exp{−sq η(q⁻¹A)}),

where s > 0; then

    Φ_q(s, A) = E(E(exp{−s η_q(A)} | η(q⁻¹A))).

The thinning mechanism implies that

    E(exp{−s η_q(A)} | η(q⁻¹A)) = [1 − q(1 − e^{−s})]^{η(q⁻¹A)}.

Thus,

    Φ_q(s, A) = E(exp{η(q⁻¹A) ln(1 − q(1 − e^{−s}))}).        (2.8.2)
But for any 0 < ε < 1/2,

    q(1 − e^{−s}) < −log{1 − q(1 − e^{−s})} < q(1 − e^{−s})(1 + ε)        (2.8.3)

if q < ε. Because q → 0, we may choose ε > 0 arbitrarily small and deduce from (2.8.2) and (2.8.3) that

    Φ_q(s, A) and Ψ_q(1 − e^{−s}, A)

have the same limit, if any, as q → 0.
We shall first prove necessity of (2.8.1). To this end, note that Ψ_q(s', (0, t]) converges to e^{−λs't} for all 0 ≤ s' < 1 as q → 0. Now apply the continuity theorem for the Laplace transform (Feller, 1971, p. 408), remembering that such functions are analytic, and hence determined, in their domain of definition, by their values in an interval.

To prove sufficiency of condition (2.8.1), note that it implies

    q η(q⁻¹A) →^d λ||A||

for all A ∈ 𝒥, the ring generated by the intervals (0, t]. Here, ||·|| denotes Lebesgue measure. So for such A,

    Φ_q(s, A) → exp{−λ||A||(1 − e^{−s})}        (2.8.4)

as q → 0. But by a result of Renyi (1967), the Poisson process is uniquely determined by the fact that

    P{η(A) = n} = e^{−λ||A||} (λ||A||)ⁿ / n!

for all A ∈ 𝒥, and (2.8.4) implies that all convergent subsequences of finite-dimensional distributions possess this property. Thus, they all have the same limit, which proves the assertion. □
2.9. Marked Point Processes
Consider a simple point process on R₊ with arrival times {τ_j}₁^∞ and the counting random measure η, all defined on a probability space {Ω, ℬ, P}. Let {ξ_j}₁^∞ be a sequence of real-valued r.v.'s on the same probability space. The bivariate sequence of r.v.'s

    {(τ_j, ξ_j)}₁^∞        (2.9.1)

represents a point process on R₊ × R, which is called "a marked point process."

The marked point process (2.9.1) is completely characterized by the non-negative integer-valued random measure Q on ℛ₊ × ℛ defined by

    Q(·) = Σ_{j=1}^∞ ε_{(τ_j,ξ_j)}(·),        (2.9.2)
where for any G ∈ ℛ₊ × ℛ

    ε_{(x,y)}(G) = 1 if (x, y) ∈ G,  ε_{(x,y)}(G) = 0 if (x, y) ∉ G.        (2.9.3)

From (2.9.2) it readily follows that for any A ∈ ℛ₊

    Q(A × R) = η(A).

The counting random function N*(t, D) of the marked point process (2.9.1) is defined by

    N*(t, D) = Q((0, t] × D),  D ∈ ℛ.

Clearly,

    N*(t, D) = 0 on {N(t) = 0},  N*(t, D) = Σ_{j=1}^∞ I_{(0,t]}(τ_j) I_D(ξ_j) on {N(t) ≥ 1},        (2.9.4)

where N(t) stands for the counting random function of the point process {τ_j}₁^∞.
From (2.9.4) and (2.2.9) we have that

    N*(t, D) = Σ_{j=1}^∞ I_{{N(t)≥j}} I_D(ξ_j)
             = Σ_{j=1}^∞ (Σ_{k=j}^∞ I_{{N(t)=k}}) I_D(ξ_j)
             = Σ_{k=1}^∞ (Σ_{j=1}^k I_D(ξ_j)) I_{{N(t)=k}}.

In other words,

    N*(t, D) = Σ_{j=1}^{N(t)} I_D(ξ_j) on {N(t) ≥ 1},        (2.9.5)

and N*(t, D) = 0 on {N(t) = 0}. In addition,

    N*(t, R) = N(t).

It is clear from (2.9.5) that, for any fixed D ∈ ℛ, N*(t, D) is a stochastic process with nondecreasing step sample functions with unit jumps at those points τ_{k₁} < τ_{k₂} < ··· for which ξ_{k_j} ∈ D. This implies that N*(t, D) is a thinning of the original point process {τ_j}₁^∞.

Denote by P_ξ(·) the distribution of the r.v. ξ₁ and, as before, set Λ(t) = E{N(t)}.
Proposition 2.9.1. Assume that:

a. {N(t); t ≥ 0} is a Poisson process.
b. {ξ_j}₁^∞ is a sequence of i.i.d. r.v.'s, with common distribution P_ξ(·), independent of {τ_j}₁^∞.

Then {N*(t, D); t ≥ 0} represents a Poisson process with

    E{N*(t, D)} = Λ(t) P_ξ(D)

if P_ξ(D) > 0.
PROOF. There are many ways to prove this assertion. The following amply illustrates one of the methods of proof. For any 0 < t₁ < t₂ < ∞ and 0 ≤ k ≤ n,

    P{N*(t₁, D) = k, N*(t₂, D) = n}
        = P{Σ_{i=1}^{N(t₁)} I_D(ξ_i) = k, Σ_{i=1}^{N(t₂)} I_D(ξ_i) = n}
        = Σ_{l=k}^∞ Σ_{r=n−k}^∞ P{Σ_{i=1}^l I_D(ξ_i) = k, Σ_{j=l+1}^{l+r} I_D(ξ_j) = n − k}
          × P{N(t₁) = l} P{N(t₂) − N(t₁) = r}.

Invoking the conditions of the proposition, we obtain

    P{Σ_{i=1}^l I_D(ξ_i) = k} = [l!/(k!(l−k)!)] (P_ξ(D))^k [1 − P_ξ(D)]^{l−k},

    P{Σ_{j=l+1}^{l+r} I_D(ξ_j) = n − k} = [r!/((n−k)!(r−n+k)!)] (P_ξ(D))^{n−k} [1 − P_ξ(D)]^{r−n+k},

    P{N(t₁) = l} = exp[−Λ(t₁)] (Λ(t₁))^l / l!,

    P{N(t₂) − N(t₁) = r} = exp[−Λ(t₁, t₂)] (Λ(t₁, t₂))^r / r!.

From this, after some straightforward calculations, we have

    P{N*(t₁, D) = k, N*(t₂, D) = n}
        = exp[−Λ(t₁)P_ξ(D)] (Λ(t₁)P_ξ(D))^k / k!
          × exp[−Λ(t₁, t₂)P_ξ(D)] (Λ(t₁, t₂)P_ξ(D))^{n−k} / (n − k)!.

Therefore,

    P{N*(t₁, D) = k, N*(t₂, D) − N*(t₁, D) = r}
        = P{N*(t₁, D) = k, N*(t₂, D) = r + k}
        = exp[−Λ(t₁)P_ξ(D)] (Λ(t₁)P_ξ(D))^k / k!
          × exp[−Λ(t₁, t₂)P_ξ(D)] (Λ(t₁, t₂)P_ξ(D))^r / r!,

from which we conclude that {N*(t, D); t ≥ 0} is a Poisson process with E{N*(t, D)} = Λ(t)P_ξ(D). This proves the assertion. □
2.10. Modeling of Floods
In spite of the experience accumulated over many centuries in dealing with
floods, losses in property and lives not only increased considerably in recent
times but all indications are that they will increase even more in the future.
How and why? Is it because the rainfall-runoff relationships have changed, or because the hydrological factors responsible for floods have multiplied all of a sudden? There is plenty of evidence that this increase is not due to a dramatic
shift in the natural balance. Instead, the escalation of flood damage around
the world is a result of new factors emerging gradually in our societies. We
shall indicate one of them.
In many of the highly industrialized and densely populated areas of the
world, a reduction of the natural retention area of the flood plain has taken
place. Due to this fact, the flood waves have increased in amplitude and
accelerated, resulting in more flood damage downstream than had ever been
anticipated before. In fact, there are reaches of some European rivers in which
the last few years have repeatedly brought floods which exceeded the 100-year
design flood, on which the designs of various hydraulic structures and flood
protection works were based.
Denote by ξ(t) the discharge rate of a streamflow at a given site. Clearly, {ξ(t); t ≥ 0} is a non-negative stochastic process. We can assume, without violating our physical intuition, that almost all sample functions of ξ(t) are continuous. Then, according to Proposition 1.11.1, there exists a version which is separable and measurable.

Set

    χ(t) = sup_{0≤s≤t} ξ(s).        (2.10.1)
It is apparent that χ(t₁) ≤ χ(t₂) for all 0 ≤ t₁ < t₂. For the purpose of flood modeling, it is essential to determine

    Φ(x, t) = P{χ(t) ≤ x}.        (2.10.2)

Due to the separability property, the function Φ(x, t) is well defined. Unfortunately, we know very little about the stochastic process ξ(t) and its stochastic structure, which makes an evaluation of Φ(x, t) extremely difficult. For this reason, the common approach in practice to the problem of estimating the distribution Φ(x, t) is based on an empirical procedure [the method of best curve fit to the observed values of the maximum χ(t)]. This clearly is not satisfactory and we will attempt a different approach.

Our approach is based on the following rationale. Because the floods are our concern, only those flows that can cause flooding are of interest to
Figure 2.3. A realization of the flow process ξ(s).
us. For this reason we will confine our attention to those flows which exceed a certain threshold level x₀ (see Figure 2.3). The truncated part of the process ξ(s) above x₀ that lies between an upcrossing and the corresponding downcrossing of this threshold is called an "exceedance" or an "excursion." The time point at which the nth exceedance achieves its maximum is denoted by τ_n, n = 1, 2, .... These exceedances are, of course, caused by the rainfalls.

It is quite clear that {τ_j}₁^∞ is a sequence of r.v.'s such that

    0 < τ₁ < τ₂ < ···.        (2.10.3)

Thus, {τ_j}₁^∞ represents a simple point process obtained by thinning of the point process associated with the rainfall occurrences at the given location. It seems "intuitively plausible" to assume that we are dealing with independent thinning. If x₀ is high enough, the point process {τ_j}₁^∞ should have a Poisson distribution, which has been confirmed by observation records.

Let {N(t); t ≥ 0} be the counting function of the point process {τ_j}₁^∞, which is assumed to be Poisson with E{N(t)} = Λ(t), and set

    X_k = ξ(τ_k),  k = 1, 2, ....        (2.10.4)

For the sake of simplicity, we will assume that {X_k}₁^∞ is an i.i.d. sequence of r.v.'s, independent of {τ_k}₁^∞. In such a case, it follows from Proposition 2.9.1 that {(X_k, τ_k)}₁^∞ is a marked Poisson process.
Denote by N*(t, D) its counting function, and set

    E{N*(t, D)} = Λ(t) P_X(D),        (2.10.5)

where

    Λ(t) = E{N(t)}  and  P{X₁ ≤ x} = H(x).

Write

    X*(0) = 0,  X*(t) = sup{X_k; τ_k ≤ t},        (2.10.6)

    F_t(x) = P{X*(t) ≤ x},  x ≥ 0.        (2.10.7)
It is easy to see that, for every x ≥ 0 and t ≥ 0,

    {X*(t) ≤ x} = {N*(t, (x, ∞)) = 0}.        (2.10.8)

From this, we obtain

    F_t(x) = exp{−Λ(t)[1 − H(x)]}.        (2.10.9)
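Formula (2.10.9) is easy to verify by simulation. A minimal sketch (not from the text; the homogeneous Λ(t) = λt, the unit-exponential peak d.f. H, and all numerical values are assumptions chosen for illustration):

```python
import numpy as np

# Check F_t(x) = exp(-Lambda(t) * (1 - H(x)))   [Equation (2.10.9)]
# with assumed Lambda(t) = lam * t and H(x) = 1 - exp(-x), x >= 0.
rng = np.random.default_rng(7)
lam, t, x, reps = 2.0, 3.0, 2.0, 200_000

n = rng.poisson(lam * t, reps)                 # N(t) per replication
# X*(t) = max of the N(t) i.i.d. exceedance peaks (X*(t) = 0 if N(t) = 0)
xmax = np.array([rng.exponential(1.0, k).max() if k else 0.0 for k in n])

emp = np.mean(xmax <= x)
theory = np.exp(-lam * t * np.exp(-x))         # 1 - H(x) = exp(-x)
print("empirical F_t(x):", emp, "  theory:", theory)
```

The empirical fraction matches exp{−Λ(t)[1 − H(x)]} up to Monte Carlo error, which is the basis of the peaks-over-threshold approach to flood frequency.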
Next, we will investigate some properties of the stochastic process {X*(t); t ≥ 0}. For any 0 ≤ s < t, set

    X*(s, t) = sup{X_j; τ_j ∈ (s, t]}.        (2.10.10)

Because the increments of N*(t, D) are independent, we have

    P{X*(t) ≤ x} = P{X*(s) ≤ x, X*(s, t) ≤ x}
        = P{N*(s, (x, ∞)) = 0, N*(t, (x, ∞)) − N*(s, (x, ∞)) = 0}
        = P{X*(s) ≤ x} P{X*(s, t) ≤ x}.        (2.10.11)

Hence,

    P{X*(s, t) ≤ x} = P{X*(t) ≤ x} / P{X*(s) ≤ x}.        (2.10.12)

Using (2.10.11), we can easily show that X*(t) is a Markov process. Indeed, for any

    0 < t₁ < ··· < t_n < t  and  0 < x₁ ≤ ··· ≤ x_n < x,  n = 1, 2, ...,

we have

    P{X*(t) ≤ x | X*(t₁) = x₁, ..., X*(t_n) = x_n}
        = P{X*(t_n) ≤ x, X*(t_n, t) ≤ x | X*(t₁) = x₁, ..., X*(t_n) = x_n}
        = P{X*(t_n, t) ≤ x}.

This proves the assertion.

An extensive discussion of this topic can be found in the work of Todorovic (1982).
Problems and Complements
2.1. A switchboard receives on average two calls per minute. If it is known that the number of calls is represented by a homogeneous Poisson process, find:
(i) the probability that 10 calls will arrive in 5 minutes;
(ii) the probability that the third call comes during the second minute.

2.2. An orbiting satellite is subject to bombardment by small particles which arrive according to a homogeneous Poisson process. Let p > 0 be the conditional probability that a particle which has hit the satellite will also hit a given instrument on it. Determine the probability that in (0, t]:
(i) the particular instrument will be hit exactly k times;
(ii) the instrument will be hit at least once.

2.3. Let {N(t); t ≥ 0} be a homogeneous Poisson process. Show that for any 0 < s < t and 0 < k < n

    P{N(s) = k | N(t) = n} = [n!/(k!(n−k)!)] (s/t)^k (1 − s/t)^{n−k}.

2.4. A life insurance company has established that the average rate of policy holders whose insurance is less than $100,000 is a constant p. Assume that the arrival times of insurance claims represent a homogeneous Poisson process. If death has nothing to do with the value of a particular policy, find:
(i) the probability that there will be exactly k claims, each less than $100,000, in an interval (0, t];
(ii) the probability that all the claims in (0, t] will be higher than or equal to $100,000.

2.5. Suppose that all car drivers of a city are categorized in several groups, say A₁, A₂, ..., A_n, according to their driving abilities. It is established that every car accident for drivers in group A_i carries a small probability p of fatality. Suppose that accidents (for members of A_i) occur according to a homogeneous Poisson process. If the random event "fatal accident" is independent of the process, determine:
(i) the probability that a driver from A_i will stay alive during (0, t];
(ii) the expected time of the fatal accident.

2.6. Buses arrive in accordance with a homogeneous Poisson process. Somebody arrives at the bus stop at time t. Find the distribution of the waiting time for the next bus.

2.7. Let {N(t); t ≥ 0} be a homogeneous Poisson process and 0 < s < t. Show that, for any k = 1, 2, ..., n,

    P{τ_k ≤ s | N(t) = n} = Σ_{i=k}^n [n!/(i!(n−i)!)] (s/t)^i (1 − s/t)^{n−i}.

2.8. Suppose that the number of eggs laid by an insect in an interval of time (0, t] is a homogeneous Poisson process with mean rate λ. Assume that the probability of one egg hatching is p > 0. If X₁(t) is the number of hatched eggs and X₂(t) the number of unhatched eggs in (0, t], show that (assuming independence of eggs) X₁(t) and X₂(t) are independent Poisson processes with mean rates λp and λ(1 − p), respectively.

2.9. Let {N(t); t ≥ 0} be a Poisson process with E{N(t)} = λt. Show that

    (1/t) N(t) → λ (a.s.)

as t → ∞.

2.10. Let {N(t); t ≥ 0} be a Poisson process with E{N(t)} = Λ(t). Show that [see (2.6.12)]

    P{T_{n+1} > t | τ₁ = s₁, ..., τ_n = s_n} = exp{−Λ(t + s_n) + Λ(s_n)},

where s₁ < ··· < s_n and t > 0.
2.11. Let {N(t); t ≥ 0} be a Poisson process with E{N(t)} = λt. It undergoes independent thinning with probability p (see Section 2.8). Let N₁(t) be the number of recorded points and N₂(t) the number of deleted points. Show that N₁(t) and N₂(t) are independent Poisson processes with E{N₁(t)} = λpt.

2.12. Show that

    E(exp{−Σ_{i=1}^∞ f(τ_i)}) = exp{−∫₀^∞ (1 − exp[−f(s)]) dΛ(s)},

where {τ_n} is the sequence of arrival times of a Poisson process N(t) with E{N(t)} = Λ(t) and f(·) ≥ 0.

2.13. Let {N_i(t)}₁ⁿ be n independent Poisson processes with E{N_i(t)} = λt. Show that

    lim_{n→∞} Σ_{k=0}^∞ f(x + k/n) [(λnt)^k/k!] exp(−λnt) = f(x + λt),

where f(·) is a continuous function.

2.14. Let X_{1n}, ..., X_{nn} be a sequence of independent Bernoulli r.v.'s such that

    P{X_{in} = 1} = p_{in},  p_{in} + q_{in} = 1,

for all i = 1, 2, ..., n, and

    Σ_{i=1}^n p_{in} = λ (independent of n).

(As an example, consider a town in which an arsonist secretly lives, and if there are n houses, excluding his, p_{in} is the probability of a particular house being burned down.) Show that

    S_n = Σ_{i=1}^n X_{in} →^d N,

where N is a Poisson r.v. with E{N} = λ.

2.15. Let {ξ_n}₁^∞ be a sequence of independent r.v.'s such that

    P{ξ_n = k} = e^{−λ_n} λ_n^k / k!,  k = 0, 1, ...,

and λ_n → +∞ as n → ∞. Show that

    P{(ξ_n − λ_n)/√λ_n ≤ x} → (2π)^{−1/2} ∫_{−∞}^x e^{−z²/2} dz.

2.16. Let {N(t); t ≥ 0} be a Poisson process with E{N(t)} = Λ(t). Determine the probability density of (τ₁, ..., τ_n) and of τ_n − τ_{n−1}, n = 1, 2, ....

2.17. Let {N(t); t ≥ 0} be the counting random function of a simple point process. If N(t) is a homogeneous Markov process (see Proposition 2.6.3), determine P{N(t) = k} and P_{ij}(u) [N(t) is called the pure birth process].

2.18. For a pure birth process (see the previous problem), find necessary and sufficient conditions for

    Σ_{n=0}^∞ P{N(t) = n} = 1

to hold.
Problems and Complements
61
2.19. Let {N(t); t ~ O} be a nonhomogeneous Markov process such that the limit
An(t)
= lim -hI P{N(t + h) = n + IIN(t) = n}
h~O
exists; in the case when
An(t) = CPo(t)
+ nCP! (t),
N(t) is called a Polya process. Determine
P{N(t) = n}
in the case when
IXA
CP! (t)
=
1 + <xtA'
IX ~
0,
A> 0,
and show that
I
00
k=O
P{N(t)
=
k}
=
1.
2.20. Let.P be the smallest ring of subsets of R+ = [0, (0) (Le., if A, B E .P, A u B E .P,
and A - B E .P) containing all intervals of the form (a, b] c R+. Let Jl(.) be a
nonatomic Borel-Radon measure on R+ (i.e., Jl is non-negative and finite on
compact subsets of R+). If, for each D E.P,
P{'1(D) =
n}
[Jl(D)]n
= exp[ -Jl(D)]--,
n!
the point process '1 [see (2.2.5)] is a Poisson process [This is due to Renyi
(1967).]
2.21. (Continuation) If, for each D E .P,
P{'1(D) = O} = exp[ -Jl(D)]
and P{'1(f') ~ 2}:5: Jl(i)cp(Jl(r)), where cp(.) ~ 0 is increasing and such that
cp(x) ~ 0 as x ~ 0, then '1(.) is a Poisson process.
2.22. Let {N(t);t ~ O} be a Poisson process with E{N(t)} = A(t) and arrival times
{tn}1'. Consider {(t"~n)1', where {e.}1' is an i.i.d. sequence with common dJ.
H(x) = P{e. :5: x}, independent of N(t).
(i) Show that {X(t); t ~ O} is a Markov process, where
X(t) = max {ek; tk :5: t}.
(ii) Let T(t) be the time instant in (0, t] where the maximum X(t) is achieved. Find
the dJ. of T(t).
2.23. (Continuation) Determine
P{x(t) :5: x, T(t) :5: s}.
CHAPTER 3
Elements of Brownian Motion
3.1. Definitions and Preliminaries
In Chapter 1 (Section 1.6, Example 1.3), we discussed in some detail the
nature and causes of the random motion of a small colloidal-size particle
submerged in water. According to kinetic theory, this movement is due to the
thermal diffusion of the water molecules, which are incessantly bombarding
the particle, forcing it to move constantly in a zigzag path. The phenomenon
was named "Brownian motion" after R. Brown, an English botanist who was
first to observe it. In 1904, H. Poincare explained that large particles submerged in water do not move, notwithstanding a huge number of impacts
from all directions by the molecules of the surrounding medium, simply
because, according to the theory of large numbers, they neutralize each
other.
The qualitative analysis of the motion was given independently by Einstein (1905) and Smoluchowski in 1916. However, a rigorous mathematical
formulation of the problem was formulated a decade later by N. Wiener. To
acknowledge his contribution, the stochastic process, which represents a
mathematical model of the motion, is called the Wiener process. In the
modern literature, the term Brownian motion process is used more often than
the Wiener process. In this book, we shall use both names equally.
Throughout this chapter we shall confine ourselves to the one-dimensional
aspect of the motion. Suppose we begin our observation of the wandering
particle at time t = 0 and assume it is at point x = O. Denote by ~(t) its
position on the line at time t > O. Due to the chaotic nature of the motion
~(t) is a r.v. for every t > 0 with ~(O) = O. Examining further the nature of this
phenomenon, it seems reasonable to suppose that E {~(t)} = 0 for all t ;;::: O.
Finally, if the temperature of the water remains constant, the distribution of
3.1. Definitions and Preliminaries
63
any increment e(t + s) - e(t) should not depend on t. This gives rise to the
following definition:
Definition 3.1.1. A standard Brownian motion or Wiener process is a stochastic process {W);t ~ O} on a probability space {n,Lf,p} having the following
properties:
(i) e(O) = 0 (a.s.);
(ii) {e(t); t ~ O} has independent increments;
(iii) for any 0 ::;; s < t,
P{W) - e(s)::;; x} =
J 21t(t1 -
s)
fX
exp ( - 2( u »)dU.
t- s
2
-<Xl
(3.1.1)
Remark 3.1.1. As any process with independent increments, e(t) is a Markov
process; the stationary transition probability function of Brownian motion is
P{W
+ 't)::;; xle(t) = y} =
~
V 21t't
f
X
-
Y
-00
exp (- U
2't )dU.
.
2
(3.1.2)
From (3.1.1), it follows that for any 0::;; s ::;; t
Var{ ~(t) - e(s)}
=t -
(3.1.3)
s.
On the other hand,
E g(s)e(t)} = E g(s) [W) - e(s)]
=
+ ~2(S)}
E{e(s)} = s.
In other words for two arbitrary non-negative numbers u and v,
Eg(u)~(v)} =
min{u, v}.
(3.1.4)
The increments of a standard Brownian motion are stationary in the sense
that the distribution of ~(t + 't) - W) depends only on 'to
From the fact that e(t) has independent increments and (3.1.1), we can
easily determine the joint probability density of (e(t 1 ), ... , (t n )) for any 0 <
t 1 < ... < tn' It can be readily seen that this density is given by the expression
ft" .. .,tJX1,"·' xn) = ft, (xdft 2 -t, (X2
-
xd ... ftn-ln_'(X n - xn-d, (3.1.5)
where
1
(X2)
ft(x) = --exp
-- .
2t
.Jbd
(3.1.6)
The system of distributions (3.1.5) clearly satisfies the necessary consistency conditions ofthe Kolmogorov theorem 1.4.1 This assures the existence
of a standard Brownian motion.
Roughly speaking, there are two types of statements to be made about any
stochastic process. The first one deals with its distributional properties and
64
3. Elements of Brownian Motion
the second one with sample path properties. Concerning the distributional
properties of Brownian motion, Equations (3.1.5) and (3.1.6) provide enough
information in this respect. As far as the sample path properties are concerned, we will show that almost all of them are continuous functions. To this
end, we need the following lemma.
Lemma 3.1.1. Let X be a r.v. with N(O, (12) distribution, then
E{X4}
= 3(14.
PROOF.
o
We now can prove the following statement.
Proposition 3.1.1. With probability 1, every sample function of a separable
Brownian process is uniformly continuous on every finite interval.
PROOF. Let {~(t); t ~ O} be a separable version of a standard Brownian
motion (which, according to Doob's proposition 1.7.2, always exists). Then
according to (3.1.1) and Lemma 3.1.1 for any t ~ 0
E(~(t
+ h) -
~(t))4 = 3h 2 •
From this and Theorem 1.9.1 (with C = 3,
fJ = 1, and
~~
IX
= 4), the assertion
0
Before concluding this section, we will mention that the process {X(t);
t ~ OJ, defined as
X(t) = IXt
is called a Brownian motion with drift
process {Z(t);O::;; t ::;; 1}, specified as
Z(t)
=
~(t)
+ ~(t),
IX.
(3.1.7)
On the other hand, the stochastic
- t~(1),
(3.1.8)
is called a Brownian bridge (or tied down Brownian motion). In both cases,
~(t) is, of course, a standard Brownian motion.
From (3.1.8), we see the following: Z(O) = Z(1) = 0 (a.s.); in addition,
the Brownian bridge is a Gaussian stochastic process. In other words,
for any 0 < t1 < ... < t n ::;; 1, (Z(td, ... ,Z(tn )) has a multivariate normal
distribution.
3.2. Hitting Times
65
3.2. Hitting Times
Let {e(t); t ~ O} be a separable version of a standard Brownian motion process. Denote by 'x the first instant t > 0 at which e(t) = x (the hitting time of
x). More formally,
'x(w)
= inf{t > O;W,w) = x}.
(3.2.1)
Clearly,
{'x~t}={sup
OS8St
e(s):::;x}
(3.2.2)
(see Figure 3.1). From this, we conclude, taking into account that the e(t) is
separable, that 'x is a r.v. for all x E R such that 0 < 'XI < 'X2 (a.s.) for all
Xl < x 2 • It is also clear that 'x and Lx have the same distribution.
Denote by ,~ the time of the first crossing of the level x by the process e(t),
i.e.,
,~=
inf{t > O;e(t) > x}.
(3.2.3)
Our physical intuition compels us to deduce that 'x :::; ,~. We will actually
show that 'x = ,~ (a.s.).
We now embark on the problem of determining the distribution of 'x'
Proposition 3.2.1. For all x
E
R,
P{'x:::; t} = 2P{W) ~ x}
and E{,x} =
00
(3.2.4)
if x =F O.
PROOF. Suppose first that x > O. From (3.2.1), we readily deduce that for an
instant t ~ 0
xr-------------~~~----
O~~~--------~------~
Figure 3.1. Random hitting time
'x'
66
3. Elements of Brownian Motion
Consequently,
{e(t)
Next, because for all 0
P{e(t)
~
~
x}
C
{'t"x
(3.2.5)
t}.
~
s < t and x E R,
x} = P{e(t)
~ xl~(s) =
~ xl~(s) =
x},
which is easy to prove using (3.1.2), it seems quite clear that
P{e(t) ~ xl't"x ~ t}
= P{ W) ~ xl't"x ~ t}
(3.2.6)
because at the instant 't"x the process was at x.
Therefore, taking into account (3.2.5), we obtain
P{e(t)
P{e(t) ~ xl't"x ~ t} = P{
x}
} .
t
(3.2.7)
e- 1/2z2 dz,
(3.2.8)
~
't"x ~
This and (3.2.6) then yield
P{'t"x
which clearly shows that
E{'t"x} =
~
't"x
t} = 2P{e(t)
<
~
x}
f'X)
=
(2/n)1/2
00
(a.s.) for all x> O. On the other hand,
Jx/ 0
roo P{'t"x > s}ds = rOO {1_(2/n)1/2 roo
Jo
= (2/n)1/2
Jo
LOO
ds
J~~
e-1/2Z2dZ}dS
f:/~ e- 1/2z2 dz = (2/n)1/2 IX) e-1/2z2 dz
f:
2Z2
' ds
This completes the proof of the proposition in the case when x > o. If x < 0,
the assertion follows from the fact that 't"x and Lx have the same distribution.
Thus, from (3.2.8), we have
P{'t"x
~ t} = (2/n)1/2 roo
JI I/0
e- 1/2z2 dz.
0
X
Coronary 3.2.1. From Proposition 3.2.1, we deduce that given x # 0, the standard Brownian motion process will hit x with probability 1. However, the
average length of time to achieve this is infinite.
Concerning the r.v.'s 't"x and 't"~, we have the following result.
Lemma 3.2.1. For all x
E
R,
't"x
= 't"~ (a.s.)
Proof. Because ~(t) and -e(t) have the same distribution, it suffices to prove
the statement for x > o. Clearly, 't"x ~ 't"~; hence, what we have to show is that
3.3. Extremes of ~(t)
P{'x <
,~} = O.
67
Obviously,
{'x <
,~} iQ nQ t~~£/n e(s) =
C
x}.
(3.2.9)
Therefore, to prove the lemma it suffices to show that, for all x > 0,
P{suPo<S:51 e(s) = x} = O. To this end, note that for tl < t,
P
tSs~~t e(s) = x} = P t~~~tl e(s) = x, tlS~~t e(s) ~ x}
+P
t~~~tl e(s) ~ x, tlS~~t e(s) = x}
~ P t~~~tl e(s) = x} + s:oo P {Wi)
E
dy,
tlS~~t e(s) -
e(t 1 )
=
x-
y}
=pt~~~tl e(s)=x}+ s:oo P{WdEdy}pLS~~t e(S)-Wl)=X- y }.
However, P{SUPt19:51 e(s) - Wi) = x - y} = 0 for almost all y (with respect
to Lebesgue measure). Thus,
ptss~~t e(s) = x} ~ pt~~~tl e(s) = x}.
In other words, P{SUP09:51 e(s) = x} does not decrease as t LO. On the other
hand, due to the continuity of e(t), we have for all e > 0
p{~~;t e(s) = e}
--+
0 as t LO.
Thus,
p{ sup e(s)
OSsSt
=
x}
~ lim
p{ sup e(s) > !x} =
t--+O
OssSt
2
O.
o
From this and (3.2.9), the assertion follows.
Remark 3.2.1. The last inequality follows from
{ SUP
OssSt
e(s)
=
x}
C
{sup e(s) > !x}.
OSS:51
2
3.3. Extremes of ~(t)
The result (3.2.4) is the key to many properties of the Brownian sample
functions. For instance, from (3.2.2) and (3.2.4), we obtain that
P{X(t) >
x}
=
(2/n)1/2
(00
Jx/JI
e- 1/2z2 dz,
(3.3.1)
68
3. Elements of Brownian Motion
~(s)
~~----~---------L-----+s
Figure 3.2. The reflection principle.
where
x(t)
=
(3.3.2)
sup ~(s).
O:o;s:o;t
We will give one more proof of (3.3.1) using the so-called reflection principle.
The following argument is not easy to formalize, although it seems intuitively quite clear. We have
P{x(t) > x,~(t) > x} = P{X(t) > x,~(t) < x}.
(3.3.3)
This can be explained heuristically as follows: For every sample path of the
process ~(t) which hits the level x before time t, but finishes above this level
at time t, there is another "equally probable trajectory" (shown by the dotted
line in Figure 3.2) that hits x before t and such that ~(t) < x. This is the
meaning of (3.3.3). On the other hand,
P{X(t) >
x}
=
P{X(t) > x,~(t) >
= 2P{X(t) >
x,~(t)
x} + P{X(t) >
x,~(t)
< x}
> x}
due to (3.3.3). The proof of (3.3.1) now follows from the fact that
P{X(t) > x,~(t) > x} = Pg(t) > x}
and from (3.1.1).
Remark 3.3.1. Equation (3.3.1) leads to a very interesting conclusion: For all
t > 0,
P{x(t) > O} = 1.
On the other hand,
pt~~~t ~(s) < o} = p{ - O~~~t (-~(s)) < o}
=
pts:o;~~t (-~(s)) > o}.
3.3. Extremes of W)
69
But due to the symmetry of e(t),
o}
P {SUp e(s) >
OSsSt
= p{ sup (-e(s»
OssSt
>
o}.
Consequently, for all t > 0,
pt~~t e(s) < o} = 1.
Therefore, starting from point x = 0 at time t = 0, a trajectory of a standard
Brownian motion process will intersect with the t axis infinitely many times
in (0, t] for any t > O. This shows how irregular sample paths of a Brownian
motion process are, and yet, almost surely they are continuous functions.
The following proposition gives the joint distribution of (X(t), 'l"x).
Proposition 3.3.1. For any t > 0 and x:::;; y,
P{X(t) :::;; y, 'l"x :::;; u}
{1
YX
= ( -x) fU f - exp --2 (V2
+ -X2)}
n
0
0
t- s
s
Jdvds .
s s(t - s)
(3.3.4)
PROOF. The following argument is not rigorous but it can be made so using
the strong Markov property concept which will be discussed later.
Because the Brownian motion process has independent increments and the
distribution of e(s + t) - e(s) does not depend on s, it seems intuitively clear
that once it reaches a point x at time t x ' it behaves after tx as if x is its starting
point. Therefore, given that tx = s, where 0 < s :::;; t, we have, for all 0 :::;;
x :::;;y,
P{x(t):::;; YI'l"x = s} = p{ sup e(u):::;; YI'l"x = s}
s~u:S:t
=p{x+
sup
OsuSt-s
e(u):::;;y} = P{X(t-s):::;;y-x}.
From this, (3.3.1), and (3.2.8), we obtain
P{X(t) :::;; y, 'l"x:::;; u}
=
s:
P{X(t):::;; YI'l"x
(x) fU (
="2
X
which proves the assertion.
0
2
n(t - s)
= s} dP{t x :::;; s}
)1/2 f Y-X {_V2}
0
exp 2(t - s)
( ns23 )1/2 exp (X2)
- 2s dv ds,
D
3. Elements of Brownian Motion
70
Denote by T(t) the epoch of the largest value of ~(s) in [0, t]. In the sequel
we will attempt to determine the joint distribution of (X(t), T(t)).
Proposition 3.3.2. For all 0 < u < t and x > 0,
P{X(t)
PROOF.
E
dx, T(t)
du}
E
=
nu
J u(tx -
u)
X2) dxdu.
exp ( --2
u
(3.3.5)
Our proof of the proposition is based on the observation that
T(t) = 'x
Consequently, for any 0 <
P{X(t)
E
dx, 'x
< t,
S
E
on the set {X(t) = x}.
ds}
=
P{'x E dslx(t)
=
= P{T(t) E dslx(t)
x}P{X(t)
E
dx}
x}P{X(t) E dx}
=
= P{T(t) E ds,X(t) E dx}.
But from (3.3.4), we have
P{X(t)EdY"x Edu }=
nu
J u(tx -
u)
[
1
exp --2
(3.3.6)
((y - X)2 +X2)] dudy.
t- u
u
(3.3.7)
Replacing y with x in this equation, it follows from (3.3.6) that, for 0 < u <
and x> 0,
P{X(t)
E
dx, T(t)
E
du}
=
nu
J u(tx -
u)
t
X2) dudx,
exp ( --2
u
o
which proves the assertion.
Corollary 3.3.1. From the last equation, we have
P{T(t)Edu}
f
oo
=
o
P{X(t)Edx,TEdu}=
n
J
~
u(t - u)
.
(3.3.8)
Therefore,
P{T(t) :::;; s}
1
= -
n
fS J
0
du
u(t - u)
2
.
= -arCSIn
n
Ii
-.
t
This is the famous arc sin law. In Figure 3.3, a graphical depiction of its
probability density is given.
3.4. Some Properties of the Brownian Paths
__
71
_ _ _ _ _ _ L -_ _ _ _ _ _ L -_ _
o
s
t/2
Figure 3.3. Graphical presentation of arc sin law.
3.4. Some Properties of the Brownian Paths
The Brownian motion process is governed by subtle and remarkable principles. Some feeling for the nature of the process may be obtained from
considerations of the local properties of its paths.
Proposition 3.4.1. Let g(t); t ~ O} be separable standard Brownian motion
process, then almost all its sample functions are not differentiable at any t ~ O.
PROOF.
We want to show that for all t
0 and h > 0,
~
p{lim e(t+h)-W)=oo}=1.
h.... O+
h
(3.4.1)
To prove this, note that for any 0 < h < b,
sup e(t
O<h<d
+ hh - W) ~ ~
sup (e(t
u O<h<d
+ h) - W».
From this and (3.3.1), we have for any x > 0,
P {sup
O<h<d
W+ ~ -
e(t)
>
x} ~ P { sup + h) - W» > bX}
= p{ sup e(h) > bX}
O<h<d
(e(t
O<h<d
=
(-2)1/2 fOO
e- 1/2z2 dz -+ 1
"./6
as b -+ O. From this, (3.4.1) follows. Because W) and - W) have the same
1t
distribution, it follows also that
3. Elements of Brownian Motion
72
P {lim
W + h) - ~(t) =
h
h-+O+
_ oo} = 1.
o
This proves the assertion.
Corollary 3.4.1. For almost all ro, ~(-, ro) does not have bounded variation on
any interval of [0, (0). Hence, the graph of ~(t,ro) is not rectifiable. Furthermore, for almost every ro, there is no interval (a, b) c [0,(0) on which ~(', ro) is
monotone. Indeed, if ~Lro) were of bounded variation on a finite interval, it
would be differentiable almost everywhere (Lebesgue measure) there. Also, if
~(', ro) were monotone on an interval, it would be differentiable at almost every
point of this interval.
In the previous section (see Remark 3.3.1) we have shown that
p{ sup
O~s~h
~(s) > o} =
~(s) < o} =
p{ inf
O~s~h
1
for any arbitrary small h > O. The next result yields properties of Brownian
sample functions for very large t.
Proposition 3.4.2
p{ sup
Ost<oo
PROOF.
p{
W) = +oo} =
inf
Ost<c:()
W) = -oo} =
(3.4.2)
Let k > 0 be an integer. Then taking into account (3.3.1),
p{ sup
Ost<:x)
~(t) > k} ';? p{ sup ~(s) > k}
O.s;s:::.;;t
= (2/n)1/2 (00 e-1/2z2 dz -. (2/n)1/2
Jk/jt
as
1.
t -. 00.
Thus,
p{
sup
O~t<oo
W) = oo} =
foo e- 1/2z2 dz =
1
0
p{n
k=l
{sup
O~t<oo
~(t) > k}} =
1.
On the other hand,
p{
inf
O:::;t<oo
~(t) = -oo} =
which proves the proposition.
p{ sup
O:$t<oo
(-e(t»
= oo} = 1,
o
According to the Corollary 3.4.1, almost all sample paths of a Brownian
motion process have infinite variation in any arbitrary small interval of time.
In contrast to this unpleasant feature of the process, a very sharp positive
3.4. Some Properties of the Brownian Paths
73
statement can be made about the so-called "quadratic variation" of Brownian
sample paths, as the following result clearly shows:
Proposition 3.4.3. Let 0 = t no < tnl < ... < tnn = t be a partition of the interval
[0, t], such that max;(tn,i+l - tni ) --. 0 as n --. 00. Then
l.i.m.
n~oo
n-l
L (Wn,i+d -
i=O
(3.4.3)
e(tni ))2 = t
and
(3.4.4)
where l.i.m. indicates the convergence in quadratic mean.
PROOF.
From (3.1.3), we have
E(e(t n,i+1) -
e(tnJ)2 = tn,i+l - tni ,
(4.3.5)
so that
n-l
L (e(tn,i+d -
E
i=O
e(tni ))2
= t.
(3.4.6)
Therefore, to prove (3.4.3) it is sufficient to show that
Var
tta
(Wn,i+d - Wn;))2} --.0
as n --'00. But from Lemma 3.1.1 and (3.4.5), we obtain
Var
tta
=
;))2} = ~ta Var {(Wn'i+d -
(e(tn,i+l) - e(tn
n-l
L
i=O
=2
{E(e(tn,i+d - Wni))4 - (tn,i+l - tnY}
n-l
L (tn,i+1 i=O
n-l
t ni )2 ~ 2 sup (tn,i+l - t ni ) L: (tn,i+l - t ni )
O,;;i';;n
i=O
= 2t sup (tn,i+l - t ni ) --. 0
O,;;i';;n
as n --. 00, which proves (3.4.3).
To prove the second part of the proposition, set
tni = (i/n)t
and consider
Wni))2}
3. Elements of Brownian Motion
74
By the Markov inequality, we have
(3.4.7)
After some elementary calculations, using the fact that for any 0 :::; s - t
E(~(t) - ~(S))2k
= 1 . 3··· (2k -
we deduce that
E{Y,,4} :::; c
k
1)(t - S)k,
GY
= 1, 2, ... ,
(3.4.8)
t,
where C > 0 is a constant. From this and (3.4.7), it follows that, for any B > 0,
P{I Y"I > B i.o.} :::; Ct4
Consequently,
f -.;n <
00.
n=l
o
Y" ~ 0 (a.s.) as n ~ 00. This completes the proof.
Corollary 3.4.2. By means of this result, we can show once more that almost all
sample functions of a Brownian process are not rectifiable. First, we have the
inequality
n-1
L (~(tn,i+1) i=O
~(tni)f:::;
sup
0,,;i";n-1
IWn,i+1) - ~(tni)1
n-1
L IWn,i+1) -
i=O
Wni)l·
But, from (3.3.1), we have that
sup
0,,;i";n-1
IWn,i+d - Wni)1 ~ 0
(a.s.)
From this and Proposition 3.4.3, it clearly follows that
n-1
L
i=O
1~(tn,i+1) - Wni)1 ~
(a.s.),
00
which is the conclusion of the corollary.
3.5. Law of the Iterated Logarithm
One of the most basic propositions in probability theory is the strong law
of large numbers. It states that if {~;}f is an i.i.d. sequence of r.v.'s with
EI~ll < 00 and Egd = 0, then
Snln ~ 0 (a.s.) as n ~
00,
where Sn = ~1 + ... + ~n' Roughly speaking, it means that for any B > 0, ISnl
will be less than nB if n is sufficiently large, so that Sn oscillates with an
amplitude less then nB.
3.5. Law of the Iterated Logarithm
75
In many situations, it is of some interest to have more precise information
on the rate of growth of the sum Sn as n -+ 00. This is given in the form of the
celebrated law of the iterated logarithm, which is considered as a crowning
achievement of the classical probability theory. The following proposition is
a precise formulation of the law of the iterated logarithm.
Proposition 3.5.1. Let
E(en = 1. Then
g;}f be an i.i.d. sequence of r.v.'s with
p{lim sup
n-+oo
p{lim inf
n-+oo
E(el)
=
°and
Sn
= 1} = 1,
J2nln In n
J 2nSnIn In n = -1} = 1.
This result for bounded r.v.'s was first obtained by Khintchine in 1924.
Later, it was generalized by Kolmogorov and Feller. Under the conditions of
the proposition, it was proved by Hartman and Wintner (1941). In this
section, we will establish an analogous result for a standard Brownian motion
process. But first we will prove the following auxiliary result.
Lemma 3.5.1. Let a r.v. X have the N(O, 1) distribution. Then, for any x > 0,
~(~ - ~)e-X2/2 < P{X > x} < ~e-X2/2.
v' 2n x
PROOF.
x
Xv' 2n
For all y > 0,
Multiplying by
follows.
(1 -:4)
e- y2 /2 < e- y2 /2 <
(1 + :2)
(3.5.1)
e- y2 /2.
1/.j2n and integrating over (x, (0), where x> 0, the assertion
Corollary 3.5.1. For x>
0
°sufficiently large,
P{X > x} '" _1_e- x2 /2
(3.5.2)
P{X > x} < e- x2 / 2 •
(3.5.3)
x.j2n
and, for all x > 1,
This result follows readily from Lemma 3.5.1.
We are now ready to embark on the problem of proving the law of the
iterated logarithm for a Brownian motion process. We will first prove the
so-called local law of the iterated logarithm.
76
3. Elements of Brownian Motion
Proposition 3.5.1. Let {e(t); t ~ O} be a standard Brownian motion process. Then
. sup
P { hm
t~O
p{lim inf
t~O
PROOF.
W)
J2tlnlnC 1
W)
=
J2t In In C 1
1} = 1,
(3.5.4)
-1} = 1.
(3.5.5)
=
We clearly have
P{W) > x} = (1/2n)1/2
foo
e- u2 /2 du.
x/ji
From this and (3.5.2), it follows that
P{W) > x} '"
as
x/Jt --+
+00.
_t~e-x2/2
(3.5.6)
xyTic
Let 0 < b < 1 and consider the random event Bk defined by
Bk = { sup e(s) > (1
O-<;s-<;b k
+ e)X k+ 1},
where e > 0 is arbitrarily small and
X
k = Jbklnlnb k.
Then as k --+ 00 it is clear that xk/ft --+
(3.5.6), we have, as k --+ 00,
P(Bk ) = 2P{eW) > (1
00.
+ e)xk+d
Therefore, from (3.2.2), (3.2.4), and
((1
ft
+ e)2xf+l)
2)1/2
'" ( exp - ---::-::-;:-n
(1 + e)xk+l
2bk
=
_1_(nb In b-k- 1)-1/2 (In b-k-1 )-b(l +£)2
1+ e
(In b -1 )b(l +£)2
1
(1
Next, set b = (1
+ e)fib (k + 1)b(1+£)2Jln(k +
1)
+ Inlnb- 1·
+ et 1 ; then clearly
P(B ) '"
k
_C~(e=)=
k1+e~'
(3.5.7)
where
C(e)
Because the series
If 1/k
1 +e
=
+ e)](l +e)
.
In(1 + e)
[1n(1
converges, it follows readily from (3.5.7) that
77
3.5. Law of the Iterated Logarithm
and, consequently, by the Borel-Cantelli lemma,
P{Bk i.o.} = O.
Therefore, for all sufficiently small t > 0 (for t < bk for some k)
~(t) < (1
+ e)J2t In In t
1
or, equivalently,
.
P { lIm sup
1.... 0
J 2t ~(t)
In In t-
}
1
< 1+e
=
1
for all e > O.
Let us now show that
.
P { lIm sup
1.... 0
To this end, define
Dk =
J 2t ~(t)
In In C
}
1
> 1- e
= 1.
(3.5.8)
{~W) - ~(bk+l) > (1 - ~)Xk}'
Clearly, {Dk}'i' is a sequence of independent events. Because the distribution
of ~W) - ~(bk+l) is equal to the distribution of ~W(l - b», we have for large
k that
P(Dk ) = _1_
fo
1
e- u2j2 du
00
(1-£j2)xkjbk (1-b)
Jb k(1 - b)
e/2)xk exp
'" fo(l -
(1 -
e/2)22bklnlnb-k)
2bk(1 - b)
.
From this, after some straightforward calculations, we obtain, as k -+
P(Dk )
JI=b
'"
00,
that
k-«1-(£j2))/(1-b))
--"-------= --=~-
2(1 - e/2)J1r,
From this, it follows that, for b
~
Jink
e/2, the sum
Thus, invoking the second Borel-Cantelli lemma, we have
P{Dk i.o.} = 1.
(3.5.9)
Further, due to symmetry of W), it follows from (3.5.3) that for sufficiently
large k and arbitrary ~ > 0
~(bk+l)
> - (1 + ~)Xk'
78
3. Elements of Brownian Motion
This and (3.5.8) then imply that the events
-(1 +
~(bk+l) + [~W) - ~(bk+1)] = ~W) >
-(1
=
b)Xk
+
(1 - ~e)Xk
+ b)xk + x ke/2 + (1
- e)Xk
occur infinitely often. But, the right-hand side of this inequality can be made
larger than (1 - e)Xk (by a proper choice of b). For instance, this is the case if
b is such that
which holds if
(1
+ b)
In In a- k - 1 e
b In In a k < 2'
which is true if b is sufficiently small. This proves (3.5.8) and because e > 0
is arbitrarily small, relation (3.5.4) follows.
To prove (3.5.5), note that due to the symmetry of ~(t),
-w)
· sup ----r"====;'
11m
1--+0
J2t In In t
· 10
. f
=> 11m
1--+0
W)
----r"====;'
J2t In In t
=
1 (a.s. )
= -
1 (a.s..)
This is the desired result.
Next, we shall prove the following result.
Lemma 2.5.2. If {~(t); t ;;::-: O} is a standard Brownian motion process, so is
PROOF.
Let 0 ::s;; u < t and consider
E(exp{B{t~(D - u~G)]})
= E
exp{Oi( ~ G)(t - u) - u[ ~ G) - ~G)])}
o
79
3.6. Some Extensions
Independence of increments in this case can be established by showing that
they are uncorrelated. Specifically,
E(ue(~)[te(D - ue(~) J)
= E(uteG)e(D - u2e2G)) = utG) - u2 G) =
o
This proves the lemma.
Proposition 3.5.2. Let {e(t); t
~
o} be a standard Brownian motion process; then
.
P { hm sup
1-+00
P {lim inf
1-+00
PROOF.
o.
J 2te(t)In In t = 1} = 1,
J 2tW)In In t =
-1}
= 1.
This follows from the fact that
· sup
I1m
1-+00
e(t)
I·
ue(1/u)
1
= 1m sup
=
.J2t In In t
u-+O
.J2u In In(1/u)
(
),
a.s.
and so on.
3.6. Some Extensions
Every Brownian process has independent and normally distributed increments. These are the defining features of the process. Also, every separable
version has, with probability 1, continuous sample paths. It is remarkable
that, in a certain sense, the converse also holds. It other words, every stochastic process with independent increments is Gaussian if its sample functions
are continuous with probability 1. This result is due to Doob. We now give a
precise formulation of this statement.
Proposition 3.6.1. Let {w(t); t ~ o} be a stochastic process with independent
increments. If its sample functions are continuous with probability 1, then its
increments are normally distributed.
PROOF. We will show that, under the conditions of the proposition, w(t) is
normally distributed for every t ~ O. Because, for any 0 ::::;; s < t,
w(t) = w(s)
+ [w(t) -
w(s)],
it will then follow from the Cramer theorem that wet) - w(s) has a normal
distribution.
80
3. Elements of Brownian Motion
The stochastic process w(t) is clearly separable (see Proposition 1.10.1).
Because every sample path of w(t) is continuous with probability 1, it is also
uniformly continuous in every finite subinterval of [0, 00). Hence, for every
e > 0, there exists a c5 = c5(e) such that
p{ sup
lu-vl <.I
IW(U)-W(V)I~e}<e,
Let e1 > e2 > ... and en -+
°
U,VE[O,t].
as n -+ 00. Consider
where
tni - tn,i-1
=
t
k
n
Set
if Iw(tnJ - w(tn,i-dl ~ en
if IW(tni) - W(tn,i-1)1 < en'
Clearly, then,
p{W(t) =I
~ Y"i} =
=
~ en})
pCQ {Iw(tni ) - w(tn,i-dl
p
{s~p Iw(tnJ -
W(tn,i-1)1
~ en} < en'
Now, using the independence of {Y"J, we have
E(e i6w (t»)
= !~~ E (ex p
(w ~ Y,,) ).
Set
IXni
If IXn
-+
IX and
v" -+
(J2,
E(e i6w(t»)
=
v"i =
E(y"J,
where IX and
(J
are finite, we obtain
kn
=
Var{Y"J,
lim exp(i8IXn) TI E(exp[i8(Y"i - IXnJ])
n-oo
1
3.7. The Ornstein-Uhlenbeck Process
81
Therefore,
which is the desired result.
o
3.7. The Ornstein-Uhlenbeck Process
The strange irregular motion of a small particle submerged in liquid,
caused by molecular bombardment, was first described mathematically by
L. Bachelier in 1900. He went so far as to note the Markov property of the
process. In 1905, A. Einstein and, independently, M. Smolukhowski proposed
theories of the motion which could be used, for instance, to evaluate molecular diameters.
A rigorous mathematical theory of Brownian motion was developed by
Wiener in 1923 (the Wiener process). This theory, however, makes no pretence of having any real connections with physical Brownian motion-no
particle can follow a typical sample path of the Wiener process. In 1930,
Ornstein and Uhlenbeck proposed yet another process, somewhat similar to
the Wiener process but more closely related to physical reality. The foundation for their work was laid down 22 years earlier by P. Langevin, whose
theory will be discussed briefly here.
The theory of Brownian motion developed by Einstein and Smoluchowski
were not based on Newtonian mechanics. Langevin's approach, on the other
hand, relies heavily on Newton's Second Law of Motion. In what follows we
will give a brief account of Langevin's model.
Denote by m the mass of a Brownian particle suspended in liquid and let
v(t) be its velocity at time t. There are two forces acting on this particle. One
is the frictional force exerted by liquid, which according to Stoke's law, is
given by - pv(t), where p > 0 is a constant which depends of the viscosity of
the liquid and on the particle's mass and diameter.
The second force acting on the particle is due to the effect of molecular
bombardment. It produces instantaneous random changes in the acceleration
ofthe particle. Denote this force by w(t). Then, according to Newton's Second
Law of Motion, we have
mAv(t) = - pv(t)At + Aw(t).
(3.7.1)
We assume that w(O) = 0 and that the following conditions hold:
(i) the stochastic process {w(t); t ~ O} has independent increments;
(ii) the distribution of w(t + s) - w(t) depends only on s; and
(iii) the sample paths of w(t) are continuous with probability 1.
But then, according to Proposition 3.6.1, w(t) is a Brownian motion
process, possibly with drift. Assuming then that E {w(t)} == 0 (no drift) and
82
3. Elements of Brownian Motion
putting E {W(t)}2 = u 2t, we can write
(3.7.2)
w(t) = uW),
where
~(t)
is standard Brownian motion. With this equation (3.7.1) becomes
+ uL\~(t).
mL\v(t) = - pv(t)M
Dividing by M and letting M
-+ 0,
we obtain
m dv(t) = _ pv(t)
dt
+ u d~(t)
(3.7.3)
dt '
which is called "the Langevin equation."
The unpleasant thing here is that this equation contains the derivative of
the Brownian motion process which, as we know very well, does not exist.
Therefore, the equation does not formally make sense. The problem offinding
a proper stochastic interpretation of the Langevin equation was resolved by
Doob in 1942, in the following fashion. Write Equation (3.7.3) as
m dv(t)
=-
pv(t) dt
+ u d~(t)
(3.7.4)
and try to give these differentials a suitable interpretation. We will interpret
(3.7.4) to mean that, with probability 1,
m
r
f(t)dv(t) = -p
r
f(t)v(t)dt
+u
r
f(t)dW)
(3.7.5)
for all 0 ~ a < b < 00 and f(· ) a nonrandom continuous function on [a, b].
As we shall see in the next section, all these integrals exist when the stochastic
processes are continuous with probability 1.
Finally, if in (3.7.5) we put a = 0, b = t, and f(t) = e<Xt, where IX = p/m, we
obtain
Iot
d(e<X'v(s» = -u
m
It
e<X' d~(s).
0
From this, assuming that v(O) = vou/m (constant) we readily deduce that, with
probability 1,
v(t) = ; (voe-<xt
+
I e-<X(t-')d~(S»).
(3.7.6)
Therefore, the velocity v(t) of a Brownian particle is the stochastic process
defined by (3.7.6).
Definition 3.7.1. The stochastic process {v(t); t ;;::: O}, where v(t) is given by
(3.7.6), is called the Ornstein-Uhlenbeck process. Integrating by parts, (3.7.6)
can also be written as
v(t) = ; (voe-<xt
+ ~(t) -
lXe-<xt
I e<XS~(S)dS).
(3.7.7)
3.7. The Ornstein-Uhlenbeck Process
83
From this definition, we obtain immediately that the average velocity of a
Brownian particle (according to the Ornstein-Uhlenbeck model) is
(3.7.8)
To determine its covariance, let t
~
s. Then
(3.7.9)
For t
= s, we obtain
Var{v(t)}
=
a)2 1 _ 2a
(m
e- 2IXt
(3.7.10)
From (3.7.6), we readily deduce that the Ornstein-Uhlenbeck process is
Gaussian. This follows from the fact that the integral
I
e-IXSde(s)
is the limit of sums
of independent normally distributed r.v.'s. All these results can be summarized as follows.
Proposition 3.7.1. The Ornstein-Uhlenbeck process {v(t); t ~ O} is a Gaussian
process with
E{v(t)} = !!..voe- lXt,
m
Cov(v(t), v(s» =
a)2
(m
e-IXlt-sl _
2a
e-IX(t+s)
.
To summarize, the solution v(t) of the Langevin equation (3.7.4) is called
the Ornstein-Uhlenbeck process. It is a model for the velocity of a Brownian
particle. The "derivative" of e(t), which formally does not exist, is called a
(Gaussian) white noise (the reason for this will be explained later), However,
84
3. Elements of Brownian Motion
because
X(t)
=
+
Xo
I
v(s) ds
is the displacement of the particle, x(t) is the physical Brownian motion.
Therefore, v(t) is the physical noise.
By letting t --+ 00, we obtain from (3.7.8) and (3.7.10) that
E{v(t)}
--+
0 and
Var{v(t)}
--+
(;y
21a = p.
(3.7.11)
Thus,
v(t)~v(CX)
v(oo)~N(O,p).
and
Now, let U be an N(O, p) r.v. independent of {v(t); t ?:: O}, and consider
Z(t)
= e- at (;
I
eaSde(s)
+
U).
(3.7.12)
Then the following result holds.
Proposition 3.7.2. {Z(t); t ?:: O} given by (3.7.12) is a stationary Gaussian process
with
Cov(s, t)
pe -alt-sl.
=
(3.7.13)
PROOF. The proof is straightforward. Normality follows from the fact that
the r.v.'s in (3.7.12) are all normal. The covariance is obtained by direct
computation.
0
Proposition 3.7.3. {Z(t); t ?:: O} given by (3.7.12) is a Markov process with
stationary transition probability
P{Z(s
+ t) E BIZ(s) =
=
x}
1
J2np(1 - e- 2at )
f
B
~p
{(u
- xe- at )2
-
2p(1 - e 2at)
(3.7.14)
~
and (a.s.) continuous sample functions.
PROOF.
According to (3.7.13), for all 0
E{(Z(t
+ u) -
e-auZ(t»Z(s)}
=
~
s < t and u > 0,
pe-a(t+u-S) - pe-aUe-a(t-S)
Therefore, Z(t + u) - e -au Z(t) is independent of all Z(s) if s
hand, for any 0 < Sl < ... < Sn < s,
~
=
O.
t. On the other
+ t) E BIZ(Sl) = Xl, ... , Z(Sn) = Xn, Z(S) = X}
= P{Z(S + t) - e-atZ(s) E B - e-atxIZ(Sd = Xl' ... , Z(Sn) =
= P{Z(S + t) - e-atZ(s) E B - xe- at }.
P{Z(s
Xn, Z(S)=X}
85
3.8. Stochastic Integration
But, the r.v. Z(s
E(Z(s
+ t) -
+ t) -
e-1X/Z(s) is normal with mean zero and
e- 1X/ Z(s»2
= E(Z(s + t) - e-1X/Z(s»(Z(s + t) = E{(Z(s + t) - e-1X/Z(s»Z(s + t)}
e-1X/Z(s»
= p(1 _ e- 21X/ ).
The continuity of sample paths follows from (3.7.7). This completes the proof
of the assertion.
D
3.8. Stochastic Integration
Let {x(t);tE[a,b]} and {y(t);tE[a,b]} be real stochastic processes on a
probability space {Q,~, Pl. The fundamental problem of stochastic integration is, roughly speaking, to give a sensible interpretation to the expression
r
(3.8.1)
x(s) dy(s).
If the process {y(t); t E [a, b]} is not of bounded variation, the integral (3.8.1)
cannot be defined pathwise (i.e., for each WE Q separately) as an ordinary
Stieltjes integral. Thus, a pathwise Stieltjes approach breaks down.
It was Ito (1944), extending the work of Wiener, who discovered a way of
defining stochastic integrals of the form
r
h(s,w)d~(s,w),
(3.8.2)
where ~ is a Brownian process and h a suitable random function, defined on
the same probability space.
EXAMPLE
r
3.8.1. To illustrate Ito's approach, let us determine the integral
~(s) d~(s),
where g(s); s E [a, b]} is a standard Brownian motion. To this end, consider
a
= t no <
tnl
< ... < tnn
= b,
where
max (tn,Hl
i
We now define
Because
-
t ni ) -+ 0
as n -+
00.
86
3. Elements of Brownian Motion
n-I
I
i=O
Wn;) [Wn.i+d - ~(tni)]
r
we obtain, invoking Proposition 3.4.3, that
~(s)d~(s) = H~2(b) - ~2(a)] -
t(b - a).
Many attempts have been made to generalize Ito's integral. The first
generalization consisted of replacing the Brownian motion process with a
square integrable martingale. Kunita and Watanabe (1967) introduced the
concept of a local continuous martingale and a stochastic integral with respect to it. Now the latest result in the theory is that one cannot integrate
with respect to anything more general than a semimartingale.
In this section, we discuss somewhat informally a version of the concept of
a stochastic integral. Our aim is to present some basic facts about stochastic
integration to justify the mathematical operation of the previous section in
connection with the solution of the Langevin equation.
Let {C(t);t E [a,b]} be a second-order random process (see Definition
1.5.8) such that E {W)} = 0 and h( . ) be a real nonrandom function on [a, b].
In the sequel, we will define the concept of Riemann stochastic integral of h(· )
with respect to C,
r
(3.8.3)
h(s)C(s)ds,
for a suitable class of functions h( . ).
Let
a
= t no <
tni < ... < tnn = b
be a partition of [a, b] such that SUPi (tn,i+1
sequence of r.v.'s {Un}':' defined by
-
t n;)
--+
0 as n --+
00.
Consider the
n-I
Un
=L
i=O
h(tni)Wni)(tn,i+1 - t ni )·
If {Un}':' converges in the mean square to a r.v. U, i.e., if
E(Un
-
U)2
--+
r
0 as n --+
00,
then we call U the Riemann integral of h( . ) with respect to Cand write
U
=
h(s)((s)ds.
(3.8.4)
3.8. Stochastic Integration
87
Note that (3.8.4) is equivalent to
E(U,,- Uk)2--+0 asmin{k,n}--+oo.
Next we discuss conditions under which the limit (3.8.4) exists.
rr
Proposition 3.8.1. For the integral (3.8.3) to exist, it is sufficient that the integral
(3.8.5)
h(s)h(t)C(s,t)dsdt
exists, where
C(s,t) = Eg(s)W)}.
PROOF.
Assume that the integral (3.8.5) exists and consider
E(U" - ukf
= EU; -
rr
2EU" Uk
+ EU;.
After some straightforward calculations, we find that
!~~ EU; =
lim
min(n,k)-+oo
EU" Uk =
h(s)h(t)C(s,t)dsdt,
(3.8.6)
f.b f.b h(s)h(t)C(s, t) ds dt.
(3.8.7)
a
a
Therefore,
lim
min(",k)-+ao
E(U" - U,,>2 = 0,
o
which proves the sufficiency of (3.8.5).
r
Corollary 3.8.1. If we assume that the function h( . ) is of bounded variation and
C(s, t) is continuous, then the integral
f..b
exists.
(3.8.8)
C(s, t) dh(s) dh(t)
Now, consider
,,-1
L h(t"i) [(t",i+1) -
i=O
W"I)]
11-1
11-1
i=O
i=O
= L h(tlli)C(t",i+d =
L h(t"iK(tlli )
L h(t",j-1)Wllj) + h(bK(b) j=1
II
= h(b)C(b) - h(a)C(a) -
h(a)C(a) -
L h(t"i)C(t"i)
i=1
"
II
L (tlli ) [h(t"i) ,,=1
h(t",i-1)].
(3.8.9)
88
3. Elements of Brownian Motion
r
r
But, because by assumption the integral (3.8.8) exists, it follows from Proposition 3.8.1 that
l~i:~. i~ Wni) [h(tni ) -
h(tn.i+l)] =
r
'(s)dh(s)
exists. Therefore, because the limit of the right-hand side of (3.8.9) exists, so
does the left-hand side and we have
h(s)d'(s) = h(bK(b) - h(aK(a) -
(3.8.10)
'(s)dh(s).
EXAMPLE 3.8.2. Let {W); t ;?; O} be a standard Brownian motion process. For
which functions h(· ) does the integral
I
h(s)e(s)ds
(3.8.11)
exist in the mean square? For this integral to exist, the integral (3.8.5) must
exist. Because, in this case,
C(s, t) = min(s, t),
we have
II
min(u, v)h(u)h(v) du dv =
=
=
I
I f:
I
I
h(V)dv{f: uh(u)du
+v
uh(u)du
+
h(v)dv f: uh(u)du
+
h(v)dv
=2
I
I
f
h(U)dU}
f
f:
vh(v)dv
h(u)du
h(u)du
vh(v)dv
f: h(v)uh(u) du dv.
From this, we deduce that the integral (3.8.11) exists if h( . ) ELl' This, then,
implies that the integral
I
h(s)de(s)
also exists if h( . ) is a function of bounded variation.
Problems and Complements
3.1. Determine the correlation function of a standard Brownian motion.
3.2. Let {W); t ~ O} be a standard Brownian motion. Determine the joint probability density of W1) and W2)' 0 < t1 < t2 < 1, given W) = o.
89
Problems and Complements
3.3. If {W); t ~ O} is a standard Brownian motion, show that X(t) = tW- l ), if t > 0
and X(O) = 0, is also a standard Brownian motion.
3.4. The reflected Brownian motion is {IW)I;t ~ O}, where W) is a standard
Brownian motion. Show that IW)I is a Markov process. Find Egl(t)l} and
Var{l~(t)l}.
3.5. Let { ~(t); t ~ O} be a standard Brownian motion. Find the conditional probability density of ~(t) given that Wd = Xl' Wz) = xz, and tl < t < t z .
3.6. Let {~i(t); t ~ O}, i = 1, 2, be two independent standard Brownian motions.
Define
X(t) =
{~l(t)'
~z(-t),
Find the covariance function of {X(t);
-00
t~0
t < O.
< t < oo}.
3.7. Let {~(t); t ~ O} be a standard Brownian motion. Show that
X(t) =
1
-~(ct)
c
is a separable Brownian motion.
3.8. Let {Z(t); 0 ~ t ~ 1} be a Brownian bridge [see (3.1.8)]. Find Cov(Z(t dZ(tz)).
Define
Y(t) =
Show that {Y(t); t
~
(1
+ t)z(_t_),
1+t
t
~ O.
O} is a standard Brownian motion.
3.9. Let {~(t); t ~ O} be a standard Brownian motion and consider
Z(t) =
I ~(s)ds,
t
~ O.
(i) What kind of process is Z(t)?
(ii) Determine its covariance function.
3.10. Let {~(t); t ~ O} be a standard Brownian motion and let 't x be the first hitting
time of X (see Section 3.2.1). Show that
3.11. Let X(t) = sUPo:s;.:s;, ~(s). Show that, for any
X
< y,
Pg(t) ~ x, X(t) > y} = Pg(t) > 2y - x}.
3.12. Show that {~('tx
+ t) -
~('tx); t ~ O} is a Brownian motion independent of't x •
3.13. Let {~(t); t ~ O} be a standard Brownian motion and X(t) = sup{ ~(s);O ~ s ~ t}.
Show that, for x < y and y > 0,
P{X(t) ~ y, W)
E
dx} =
1 { exp (Xz)
Jf.it
- 2t -
(2Y2t- X)z)} dx.
exp -
3.14. Let {W); t ~ O} be a standard Brownian motion and 'tx be the first hitting time
90
3. Elements of Brownian Motion
of a state x > O. Define
if t < tx
ift ~ t x .
Z(t) = {:(t)
Determine P{Z(t) ::;; y}, where y < x.
3.15. Show that the processes IW)I and X(t) - ~(t), where ~(t) is a standard Brownian
motion and X(t) = sup{ ~(s); 0::;; s ::;; t}, are stochastically equivalent in the wide
sense (see Definition 1.3.1).
3.16. Determine the probability
p{
min
to~s::;;to+t
[This is the probability that
Wo) = Xo > 0.]
~(s)::;; 0IWo) =
~(s) =
xo}.
0 at least once in (to, to
+ t) given that
3.17. Let {W); t ~ O} be a standard Brownian motion. Find the probability that W)
has at least one zero in (to, to + t).
3.18. Let T* be the largest zero in (0, t); then
P{T*::;; u} =
3.19. Let
{~(t);
~arc sin~.
7t
t ~ O} be a standard Brownian motion; then
V(t) = e-'~(e2')
is called an Ornstein-Uhlenbeck process. Show that V(t) is a Gaussian process.
Determine its covariance function.
3.20. Let g(t);t ~ O} be a standard Brownian motion and h(·) a real continuous
function on [0, (0). Consider
X(t) =
I h(s)~(s)
ds.
What kind of a process is X(t)? Determine its mean and variance.
3.21. Let f(·) and h(·) defined on [0, (0) be differentiable and g(t); t
dard Brownian motion. Show that
{f f d~(s) f
{f rh(s)f(t)d~(s)d~(t)
E
f(s)h(t)
dW) } =
~
h(t)f(t) dt.
3.22. (Continuation) Show that
E
} = 0 if a < b ::;; c < d.
3.23. Verify the identity
bfC
E {fa
a
f(s)h(t) d~(s) dW)
} fmiO{b.C}
=
a
h(t)f(t) dt.
O} be a stan-
91
Problems and Complements
3.24. Let {,(t); t ~ O} be a standard Brownian motion. Find the mean and covariance
of
e ot f~ e-OSd,(s).
3.25. Let {W); t ~ O} be a standard Brownian motion. Show that, for all t ~ 0,
IW)I ~ W) - inf '(s).
O,;s,;t
3.26. Suppose that the stochastic process {X(t);t ~ O} is a solution of the stochastic
differential equation
+ exX(t) = W),
X(O) = Vo,
where m, ex are positive constants and W) is a standard Brownian motion. What
mX'(t)
kind of process is X(t)? Determine its mean and covariance function.
3.27. Let the process {X(t); t ~ O} be a solution ofthe following stochastic differential
equation:
exX'(t) + PX(t) = ,'(t),
X(O) = Vo,
where ex, P> 0 are constants, Vo is a real number, and ,(t) is a standard Brownian
motion. Define precisely what is meant by a solution of this equation. Find the
solution satisfying the initial condition X(O) = Vo. Find the mean and covariance
ofW)·
CHAPTER 4
Gaussian Processes
4.1. Review of Elements of Matrix Analysis
In this section we present a review of some basic properties of the square
matrices that will be needed throughout this chapter. It is expected that those
who read this section have some background in matrix analysis.
Let M = (a ij ) be a square matrix whose elements (or entries) aij are, unless
otherwise stated, real numbers. If aii = 1 for all i and aij = 0 when i -# j the
square matrix is called the "unit matrix" and denoted by I. As usual we will
denote by M' the "transpose" of the matrix M, which is a square matrix
M' = (Pij), such that /3ij = aji for all i and j. From this definition, it follows that
l' = 1. When M = M', the square matrix M is said to be "symmetric." The
following properties of the "unary" operation' are easy to verify:
(M')' = M,
Let M = (a i) be an arbitrary square matrix. In this book, we shall use the
symbollMI to denote the determinant of M. One can verify that
IM'I=IMI·
(4.1.2)
If IMI = 0, the square matrix M is said to be "singular"; otherwise, we call it
nonsingular.
Let M be a nonsingular square matrix; then there exists the unique square
matrix, denoted by M- 1 , such that
M-1M = MM- 1 = 1.
(4.1.3)
The square matrix M- 1 is called the "inverse" of M. It is well known that
(4.1.4)
The matrix M- 1 is nonsingular; if Ml and M2 are nonsingular so is MI' M2
4.2. Gaussian Systems
93
and
(4.1.5)
In addition,
(4.1.6)
We will use the notation x for a column vector. Then x' is a row vector.
Now let A = (aij) be an n x n symmetric matrix. The matrix A is said to be
"non-negative definite" if the quadratic form
n
x'Ax =
n
L L aijXiXj;;::: O.
i=i j=i
(4.1.7)
If x'Ax = 0 if and only if x = 0, the square matrix A is said to be "positive
definite." A symmetric matrix A is positive definite if and only if there exists
a square matrix C such that ICI > 0 and
C'C = A.
(4.1.8)
If M is an n x n matrix, the equation
1M - All = 0
(4.1.9)
is of degree n with respect to A. The roots Ai,"" An of this equation are called
the "eigenvalues" (or "spectral values") of the matrix M. It is a well-known
result that every symmetric n x n matrix A has n real, not necessarily distinct,
eigenvalues Ai, ... , An' and that
n
IAI=nAi'
(4.1.10)
i=i
If all Ai > 0, the symmetric matrix A is positive definite.
Finally, a square matrix T is said to be "orthogonal" if
T'· T = 1.
(4.1.11)
The next result concerning orthogonal matrices is due to Scheff€:. Let A be a
symmetric matrix; then there exists an orthogonal matrix T such that
T'AT = D,
(4.1.12)
where D = (dij) is a diagonal matrix; that is, dij = 0 when i #- j. In addition,
dii = Aj • Finally, let M be an arbitrary square matrix; then the product
M'M=A
is a symmetric and non-negative definite square matrix.
4.2. Gaussian Systems
Let
(4.2.1)
94
4. Gaussian Processes
be a sequence of real r.v.'s on a probability space {n,~,p}, such that
{X;}~ c L2{n,91,p}. Denote by
J1.j
= E(Xj)'
uij
= E(Xi -
J1.i)(Xj - J1.j).
(4.2.2)
The symmetric n x n matrix
(4.2.3)
A = (Uij)
is called the covariance matrix. Because
(4.2.4)
the matrix A is positive definite. Thus, IAI > 0 and, consequently IA-li > O.
Denote by pi = (J1.l, ... ,J1.n)' The system of r.v.'s (4.2.1) is normally distributed if its joint probability density is
(4.2.5)
and A -1 is also a symmetric positive definite matrix.
Now let us determine K. Because A -1 is symmetric and positive definite,
there exists a square n x n nonsingular matrix C such that
(4.2.6)
Hence,
(x - p)'A-1(X - p) = (x - p)'C'C(x - p)
= (c(x - p))" C(x - p).
Set
y = C(x - p),
x
= C- 1y + p.
(4.2.7)
The Jacobian J of this transformation has the form
(4.2.8)
where II C- 111 indicates the absolute value of the determinant IC- 11. From
(4.2.6), we have
(4.2.9)
Hence,
(4.2.10)
Therefore,
f:··· f:oo!(X1".. ,xn)dx1...dxn= K t:··· f:
or
K 'IAI 1/2
{f:
e- l /2x2 dx
e- l /21'1IAI 1/2 dYl" .dYn = 1,
r
= 1.
4.2. Gaussian Systems
95
From this, we readily obtain that
K
=(~)1/2
(2nt
'
(4.2.11)
(4.2.12)
Denote by
CP(tl,· .. ,tn )
{ r.
= E expV f-t tjXj)} = E {'I'X}
e'
(4.2.13)
the characteristic function of the system (4.2.1), where
(4.2.14)
The following proposition gives the form of the characteristic function
(4.2.13) assuming that the joint probability density of the system (4.2.1) is
(4.2.12).
Proposition 4.2.1. The characteristic function cp(t 1 , ••• ,tn) under (4.2.12) is
(4.2.15)
PROOF. Because the symmetric matrix A is positive definite, there exists a
square n x n nonsingular matrix L such that
LV
= A.
Then,
(x - p)'A-l(X - p) = [L -l(X - p)]'L -l(X - p),
so that
x
( IA-11)1/2
(2n)n
r: . ·f:
cp(t 1, ... , t n) =
exp{it'x - ![L-1(x - p)]'L-1(x - P)}dXl .. ·dxn •
Now set
The Jacobian of this transformation is
Therefore,
x = Lz + p.
4. Gaussian Processes
96
cp(tl>".,t n )=
=
A -11)1/2 fOO
( I(2n)'
-00'"
exp(it'J!)
(),/2
2n
foo
".
foo
-00
foo
-00
exp[it'(LZ+J!)-~z'zJIAI1/2dz1 .. ·dz,
1,· ,
exp(-zzz+ltLz)dz 1 ,,·dzn ,
-00
or if we set u' = t'L, where u' = (u i , ... , u,),
J] fOO exp( -ZZj2+·lujz)dzj}
=
exp(it'J!) {'
(2n),/2
=
exp(it'J!) ( ,
)
(2 ),/2
exp( - ~uJ
1
-00
JJ
n
J-l
JJ, (fOO
J-1
=
exp(it'J!)
(2n),/2 [exp( -~u'u)J(2n)'/2
=
exp(it'J! - YLL't)
=
)
exp [ - ~(Zj - iuYJ dZj
-00
=
exp(it'J!- ~u'u)
exp(it'J! - ~t'At).
This proves the proposition.
D
4.3. Some Characterizations of the
Normal Distribution
A system of r.v.'s
(4.3.1 )
defined on a probability space {n,.?4, P} is said to be normal (or Gaussian) if
{XJ c L 2 {n,.?4,p} and its joint probability density is given by Equation
(4.2.12). We will show several properties of a Gaussian system.
Let
n
L
Z =
IXiXi
= (X'X,
(4.3.2)
i~l
where
IXl, ... , IX,
are constants and
X' =
(Xl"'"
Xn),
be a linear combination. We want to show that the r.v. Z is also normally
distributed with
E {Z}
n
=
L IXiJ.li =
(X'J!,
i~l
n
Var{Z}
=
n
L L IXiIX/Iij'
i~1 j~l
See (4.2.2) for definition of the notation. This will be demonstrated using
characteristic functions.
4.4. The Gaussian Process
97
According to (4.2.13), the characteristic function of X is
<p(tl,···,tn )
= E{exp(it'X)} = E{exp(i
ktl tkXk)}
(4.3.3)
On the other hand,
E{e iuZ } = E{exp(iU
j~
tljXj )}.
(4.3.4)
From this and (4.3.3), we see that (4.3.4) is a particular case of (4.3.3) obtained
by setting
Hence,
(4.3.5)
and this is the characteristic function of the normal distribution with mean
tliJli and variance L~ L~ tlktlPkj' This proves our assertion.
Does the converse hold? In other words, if every linear combination of
(4.3.1) is normally distributed, does this imply that the system (Xl"'" Xn) has
a normal joint distribution? The answer is affirmative as we will show next.
To this end, assume that the Z [see (4.3.2)] is normally distributed; then its
characteristic function is (4.3.5). Now, consider
L~
<p(t 1 ,··., tn ) = E {exp
(i t tjX
j )}.
This clearly can be obtained from (4.3.5) if in (4.3.5) we set Utlj = tj • Hence,
<p(tl"'" t n )
= exp
(i t
tjUj
-
~
tt
= exp(it'p - tt' At).
t h C1jk)
(4.3.6)
According to Proposition 4.2.1, this is the characteristic function of the normal distribution. This proves our assertion.
The previous two results can be formulated as follows. A system of n r.v.'s
from L2 {n,,qJ, P} is Gaussian if and only if every linear combination of its
members is normally distributed.
4.4. The Gaussian Process
Let {e(t); t E T} be a real-valued stochastic process on {n,,qJ, P} such that
g(t); t
E
T}
C
L2 {n,,qJ,
Pl.
98
4. Gaussian Processes
Set
(4.4.1)
Jl(t) = E g(t)}
and
C(s, t) =
E(~(s)
-
Jl(s»(~(t)
(4.4.2)
- Jl(t».
the mean and the covariance function, respectively, of the process
~(t).
Definition 4.4.1. The stochastic process {~(t); t E T} is said to be a "Gaussian
process" if and only if for each {tl, ... ,t,,} c T, n = 1,2, ... , (~(td,· .. ,W,,» is
jointly normal.
The covariance function (4.4.2) possesses the following properties:
(i) C(s, t) = C(t, s);
(ii) for any {t l , .•. ,t,,} c T, n = 1,2, ... , and any real numbers U l ,
L" L" UiUjC(t i, t) ~ o.
••• , U",
(4.4.3)
i=1 j=1
In other words, the covariance function is non-negative definite [see (4.2.4)].
We now show that the symmetry and non-negative definiteness are sufficient conditions for there to exist a Gaussian random process with a specified
covariance function. Let C(s, t) be a symmetric non-negative definite function.
Consider {tJ~ c T such that t 1 < ... < tIl and specify that WI), ... , ~(t,,) is
jointly Gaussian with covariance matrix [C(t i, tj)]' i, j = 1, ... , n. Then any
subfamily is also jointly Gaussian. Thus, this family satisfies the Kolmogorov
consistency requirement. It now remains to apply the Kolmogorov theorem.
EXAMPLE
4.4.1. Let {W); t
~
o} be a Gaussian process with
E {~(t)} = 0
and
C(s, t) = min(s, t).
(4.4.4)
From the definition, it follows that, for any 0 < tl < ... < t", the system of
r.v.'s (Wd, ... , ~(tll» has a normal distribution with the covariance matrix
A=
(4.4.5)
One can show that this matrix is positive definite. Let us now prove that the
process ~(t) has independent increments. First, from (4.4.4), it readily follows
that, for any 0 ::;;; s < t,
Varg(t) - ~(s)} = t - s.
On the other hand, for any 0 ::;;;
E(~(S2)
SI
<
S2
<
- ~(SI»(~(S4) - ~(S3»
S3
(4.4.6)
< S4.
= S2
-
S2 -
SI
+ SI = 0,
99
4.5. Markov Gaussian Process
which clearly shows that the r.v.'s
(4.4.7)
e(t 1), W2) - W1), ... , W.) - W1)
are uncorrelated. On the other hand, because every linear combination of a
Gaussian system is also normally distributed, each e(t;) - e(ti-l) is a normal
r.v. with mean zero and variance given by (4.4.6). Hence, (4.4.7) are jointly
Gaussian and, because they are uncorrelated, they must be independent.
Therefore, the process {e(t); t ~ O} has independent increments, which are
normally distributed, and the distribution of W + s) - W) does not depend
on t. Thus, {e(t); t ~ O} is clearly a standard Brownian motion. From this, we
may conclude that a standard Brownian motion is a Gaussian stochastic
process satisfying condition (4.4.4).
Remark 4.4.1. If (X 1 " " , X.) is a jointly Gaussian collection, then these r.v.'s
are mutually independent if and only if the covariance matrix of the system is
diagonal. Moreover, because the covariance matrix is diagonal if and only if
each couple of Xi and Xj is independent, a Gaussian collection is mutually
independent if and only if it is pairwise independent.
4.5. Markov Gaussian Process
In this section, the following result is required: Let (Xl' ... ,X.) be a Gaussian
system of r.v.'s with joint probability density
IA-11)1/2
f(x 1, ... ,Xn) = ( (2n)"
exp
(1.
-:z
n
)
i~ j.f; aijXiXj ,
(4.5.1)
where A -1 = [aij] is a symmetric positive definite matrix. The conditional
probability density of Xn given that Xl = Xl' ... , X n- 1 = Xn- 1, denoted by
f(x nlx 1, ... ,Xn-l), is defined by
f(x l' ... ,X n)
J~oof(X1"'" x.) dXn =
exp( - ~
J~oo exp( -~
I7=1 Ij=l aijXiX)
Ii=l Ij=l aijxixj) dx;
(4.5.2)
After some simple algebraic calculations, one readily obtains that
n
n
n-1 n-l
I I aijXiXj = i=l
I I
i=l j=l
j=l
n-1
aijXiXj
+ x;ann + 2xn I
i=l
ainXi'
(4.5.3)
From this and (4.5.2), we have
=
I
[
Cexp { -:zann Xn
+ n-1
i~
(a.)
J2} ,
a:: Xi
(4.5.4)
4. Gaussian Processes
100
where C is a norming constant independent of Xl' ... , x n- l . This is clearly a
normal probability density with mean
n-l (a. )
-I
~
ann
Xi'
i=l
Thus,
(4.5.5)
Let g(t); t
E
T} be a Gaussian process with
E(~(t)) =
0
and
C(s, t) =
E(~(s)~(t)),
s, t E T, which is also a Markov process. We want to show that the following
result holds.
Proposition 4.5.1. The covariance matrix C(s, t) of a Markov Gaussian process
with mean value 0 satisfies the condition that, for any t l , t 2, t3 E T with
tl < t2 < t 3,
(4.5.6)
PROOF. Because
s < t from T,
~(t)
is a Markov process, it follows from (4.5.5) that, for any
E g(t)1 ~(s)
= x} = -
(a ann
n - l •n)
x.
(4.5.7)
To determine (an-l.n/ann), consider
f.t(Xl,X2)
=
•
(lA -11)1/2
211:
1
I
-1
exp(-"2xA x),
where
A -1 = [C(s, s)
C(t, s)
C(s, t)]-l
C(t, t)
1
=
C(s, s)C(t, t) - C 2(s, t)
[ C(t, t)
- C(s, t)
-C(s,t)] .
C(s,s)
From this and (4.5.7), we conclude that
E{~(t)I~(s) = x} = (~~:::Dx.
Finally, according to (1.5.6), we can write
C(t l ,t 3) = E(WdW3)) = E{E(WdW3)IW2))}
= E{E(~(tdIW2))E(~(t3)1~(t2))}'
(4.5.8)
4.5. Markov Gaussian Process
101
which proves the assertion.
D
According to the last proposition, the covariance function of every Gaussian
Markov process with mean zero must satisfy Equation (4.5.6). One can show
that the converse is also true. In other words, any mean zero Gaussian process
whose covariance function satisfies Equation (4.5.6) must have the Markov
property.
To prove the last statement, we need to observe that a zero mean Gaussian
process is Markov if and only if, for any tl < t2 < .. , < tn' n = 2, 3, ... ,
Necessity of this condition is obvious. To prove its sufficiency, note that from
(4.5.5) and (4.5.7), it follows that
On the other hand, it is not difficult to verify that
1 [ Xn
!(Xn!Xn-l) = Cexp { -2ann
+
(a
n - l • n)
~
Xn-l
J2} ,
which proves the assertion.
Let us now show that (4.5.6) implies (4.5.9). Note that from (4.5.6), it follows
that, for any 1 :::; k :::; n - 1,
C(tk, tn ) -_ C(tko tn- l )C(tn- l , tn),
C(tn - l , t n - l )
for all k = 1, ... , n - 1. Thus, because these are normally distributed r.v.'s,
e(tn) - ( C(tn- 1, tn) ) Wn-l)
C(tn- l , tn-d
4. Gaussian Processes
102
or
4.6. Stationary Gaussian Process
Let
{~(t); t E
T} be a real-valued Gaussian process such that
E{Wn
= 0,
(4.6.1)
Vt E T.
The marginal distributions of such a stochastic process are completely determined by its covariance function
(4.6.2)
According to Definition 1.5.6, marginal distributions of a strictly stationary stochastic process are invariant under time-shifts. When is a Gaussian
zero-mean process strictly stationary? From the previous remarks, the condition should be on C(s, t). To simplify our notation, assume that T = [0, 00);
then we have the following result.
Proposition 4.6.1. A real-valued Gaussian process
°
:$;
s
:$;
if
t,
C(s, t) = C(O, t - s) = R(t - s).
PROOF.
o} with zero-mean
and only if, for any
{~(t); t ~
and covariance function C(s, t) is strictly stationary
(4.6.3)
If ~(t) is strictly stationary, then
E{~(s)~(t)} = E{~(s
or
C(s,t) = C(s
+ .)W + .)}
+ .,t + .).
By taking. = - s we have
C(s, t)
=
C(O, t - s)
= R(t - s).
On the other hand, if (4.6.3) holds, the characteristic function of
t1 < .. , < t n) is
(~(t1),··.,~(tn)) (O:$;
Problems and Complements
103
[see (4.3.6)]. But this is also the characteristic function of (e(t 1
+ -r)). This proves the assertion.
+ -r), ... ,
0
e(t n
Corollary 4.6.1. A zero-mean wide sense stationary Gaussian process is also
strictly stationary.
When is a stationary Gaussian process also a Markov process? The following proposition due to Doob provides an answer to this question.
Proposition 4.6.2. A stochastically continuous stationary Gaussian process
{W);t;:::.: O} with
E{e(t)}
is Markov
=0
=1
E{W)V
and
if and only if
R(s) = e- Y1s1 ,
y > O.
(4.6.4)
PROOF. If e(t) is stationary and Gaussian, then according to the condition of
the proposition, (4.6.3) and (4.5.6), we have that
R(s
+ t) = R(s)R(t).
(4.6.5)
Now, because e(t) is stochastically continuous, R(u) is continuous at u = 0
(prove this); then, because R(O) = 1, we have
R(t
+ h) -
R(t) = R(t) [R(h) - 1]
This shows that R(t) is continuous at every t
of the Cauchy equation (4.6.5) for t > 0 is
R(t) =
e- yt ,
E
-+
0 as h -+ O.
R. Then the unique solution
y > O.
On the other hand, R( - t) = R(t), which proves necessity of condition (4.6.4).
Conversely, if the covariance of a stationary zero-mean and variance 1
Gaussian process is of the form (4.6.4), then condition (4.5.6) holds. This
implies that condition (4.5.9) must also hold, which proves the proposition.
o
Problems and Complements
4.1. Let X = (Xl' ... ,X4 ) be a Gaussian system ofr.v.'s with covariance matrix
A=
15
3 1
0
3
16 6
-2
1
6 4
0
-2 1
3
104
4. Gaussian Processes
Determine its probability density for f(xl' ... ' X4) if
III = 1,0,
= 0,
112
113
= -1,0,
114
= 1.
4.2. The probability density of the system X = (X l ,X2 ,X3 ) is given by
f(x l ,x2,x3)
=
1 f3 (1
2+ 4X22-
16V1i3exp -g[2x l
2x 2(X3
2) .
+ 5) + (X3 + 5)]
Find 111,112,113' and the covariance matrix.
4.3. Let (X l ,X2 ,X3 ,X4 ) be a Gaussian system ofr.v.'s with E{Xd
Show that
E{X 1 X 2 X 3 X 4 } =
where
O"ij
0"120"34
= 0, i = 1, ... , 4.
+ 0"130"24 + 0"140"23,
= E{XjXJ.
4.4. Let {XjH be an i.i.d. sequence ofr.v.'s with each X j - N(O, 1). Let {a;}~ and {bj}~
be real numbers. Show that
Y=
L" ajXj
i==l
and
Z =
.
L bjXj
i=l
are independent r.v.'s if and only if L~=l ajbj = 0
4.5. Let A be an k x k square symmetric nonsingular positive definite matrix. Show
that there exists a nonsingular square matrix r such that A = rr- l .
4.6. Let X and Y be i.i.d. r.v.'s with finite variance
fJ such that a 2+ fJ2 = 1, a . fJ =f. 0, and
aX
0"2.
If there exist constants a and
+ fJY':" X,
then X - N(O,O").
4.7. Let g(t); t ;;::: O} be a standard Brownian motion. Show that
a. E{W2)IWd
= xd = Xl'
tl < t2;
b. E{W2)IWd = Xl' W3) = X3} = Xl
+ (X3
- Xl)(t2 - td,
t3 - tl
for all 0 < tl < t2 < t3.
4.8. Let Xl and X 2 be i.i.d. r.v.'s X j - N(O,O"). Consider
W) =
Xl cos 1t
+ X 2 sin 1t.
Find the mean and the covariance function of the stochastic process {e(t); t ;;::: O}.
Show that the process is Gaussian.
4.9. Let {X(t); t ;;::: O} be a Gaussian process with
E{X(t)} = 0
and
E{X(t)X(t
+ 't")} =
C('t").
Find the covariance function of {I1(t);t;;::: O}, where
I1(t) = X(t)X(s
+ t).
4.10. Show that a Brownian motion process g(t); t ;;::: O} is a Gaussian process.
Problems and Complements
105
4.11. Construct (X, Y) which is not normal, but X and Yare normal.
4.12. Let {Xn}'1 be a sequence ofr.v.'s Xn ~ N(Jln, an) such that Xn q~ X. IfVar{X} =
a 2 > 0, show that X ~ N(Jl, a).
4.13. Let X and 0 be independent r.v.'s, 0 uniform in [0, 2n], and
_ {2x3e-1/2X"
fx(x) - 0,
x;::: 0
x < o.
Let g(t); t ;::: o} be stochastic process defined by
W)
= X 2 cos(2nt + 0).
Show that W) is Gaussian.
4.14. Let {X(t); t ;::: O} be a stochastic process specified by
X(t) =
e-'~(e2'),
where ~(t) is a standard Brownian motion process. Show that X(t) is a Gaussian
Markov process.
4.15. Let {~(s); s ;::: o} be a standard Brownian motion. Consider
X(t) =
I ~(s)ds.
What kind of process is {X(t); t ;::: OJ? Determine its covariance function.
4.16. Let g(t); t ;::: O} be a Gaussian process specified by
~(t)
= X cos(2nt + U),
where U is uniform, in [0,2n] independent of x. Determine the distribution of X.
4.17. Let g(t); t ;::: O} be a Gaussian process with E g(t) = O} and suppose that
C(s, t) = Eg(s)~(t)} is continuous in sand t. Show that
X(t)
=
I o:(s)~(s)
ds
is also a Gaussian process where 0:(. ) is a continuous function.
4.18. Assume that
{~(t);
t ;::: O} is a stationary Gaussian process. Show that
f+T ~(s) ds =
Z(t)
is also a stationary Gaussian.
4.19. Let {~(s); s ;::: O} be a Markov Gaussian process. Show that
X(t)
f
= ,'+T ~(s)ds
is not Markovian.
4.20. Let g(t); t ;::: O} be a stationary Markov Gaussian process with continuous
covariance function C(s). Determine C(s).
4.21. Complete the proof of Proposition 4.6.2.
CHAPTER 5
L2 Space
5.1. Definitions and Preliminaries
In many applications of the theory of stochastic processes, an important role
is played by families of square integrable (second-order) r.v.'s. In this section,
we give some basic definitions and prove some fundamental inequalities
involving second-order complex-valued r.v.'s.
Let {n,~,p} be a probability space and Z a complex-valued r.v., i.e.,
Z = X
+ iY,
where X and Yare real-valued r.v.'s on {n,~,p}. As usual, we write X =
Re {Z} for the "real" part and Y = 1m {Z} for the "imaginary" part of the
complex variable Z. The conjugate Z of Z is defined by
Z=
and the modulus
X - iY
IZI is given by
IZI 2 = (X2 + y2).
The obvious connection between the conjugate and the modulus of Z is
Z·Z = IZI2.
Definition 5.1.1. A complex-valued r.v. Z on {n,~,p} is called second order if
EIZI 2 <
00.
The family of all such r.v.'s is denoted by
L2 = L2{n,~,p}.
5.1. Definitions and Preliminaries
Proposition 5.1.1. For any Zl' Zz
107
E
L2,
IEZ1 ·Z2 Iz :::;; EI Z 1I z ·EI Z zl z.
PROOF.
(5.1.1)
This is Schwarz's inequality. It follows readily from
O<Elz
_(EZ 1'ZZ)Z IZ
1
EIZzl z
z
EZ1' Zz) )((EZl 'Zz) - )
= E ( Zl - ( EIZzl z Zz Zl - EIZzl z Zz
= EIZ IZ _ (IEZl . ZzIZ)
EIZzl z
1
'
With the usual pointwise definitions of addition and multiplication by
complex constants, L z becomes a linear (or vector) space. As such, L z makes
no provision for such concepts as the norm of Z. We now append this notion
to L z . To this end, we need the following convention in dealing with elements
of L z .
Definition 5.1.2. Any two Zl' Zz
E
L z are to be regarded as equal if
Zl = Zz
Definition 5.1.3. For Zl'
Z2 E
(a.s.).
(5.1.2)
L z , the complex number (Zl'ZZ) defined by
(5.1.3)
is called the "inner" product.
That (5.1.3) always exists follows trivially from Schwarz's inequality. We
obviously have
(i) (cZ 1,Zz) = C(Zl'ZZ)'
(ii) (Zl' Zz) = (Zz' Zd,
(iii) (Zl + Z2, Z) = (Zl' Z)
+ (Zz' Z).
(5.1.4)
Finally, from (5.1.2), it follows that
(Z, Z) = 0 implies Z = 0 (a.s.).
Thus, (Zl' Zz) satisfies all the requirements for an inner product. Therefore,
L2 is an inner product space.
Remark 5.1.1. The convention (5.1.2) is an informal device for dealing with the
problem posed by failure of (Z, Z) = 0 to imply Z = O. A formal means of
treating this difficulty is to replace the space L z with the space of equivalence
classes defined by the assertion that two r.v.'s are equivalent if and only ifthey
are equal (a.e.). All subsequent statements would be applied to equivalence
classes rather than to the r.v.'s themselves.
108
5. L2 Space
The inner product can be used to define another function on L 2 •
Definition 5.1.4. For each
Z E L 2 , the "norm," denoted by IIZII, is defined by
(5.1.5)
Proposition 5.1.2. For any Zl'
Z2 E L 2 ,
IIZ1 + Z211 2 + IIZ1 - Z211 2 = 211Z1112 + 211Z2112.
(5.1.6)
PROOF. This is the Parallelogram Law. To prove it, write
IIZ1 + Z211 2= (Zl + Z2,Zl + Z2) = I Zl11 2+ (Zl,Z2) + (Z2,Zd + IIZ2112,
IIZ1-Z2112 =(Zl-Z2,Zl-Z2)= I ZlI1 2+(Zl, -Z2)+(-Z2,Zl)+ I Z211 2
By adding these and employing (5.1.4), the assertion follows.
D
Definition 5.1.5. If Zl, Z2
E L2 are such that (Zl, Z2) = 0, we say that they are
"orthogonal" and indicate this by writing Zl .1 Z2'
Proposition 5.1.3. If Zl .1 Z2, we have
IIZ1 + Z211 2 = IIZl112 + IIZ2112.
(5.1. 7)
PROOF. Write
IIZ1 + Z2112
+ Z2,Zl + Z2)
= IIZl112 + (Zl,Z2) + (Z2,Zl) + I Z211 2
= IIZl112 + IIZ2112.
= (Zl
Definition 5.1.6. A subset (9 c L2 is called an "orthogonal collection" if, for
any Zl' Z2 E (9 such that Zl "# Z2 (a.s.), we have Zl .1 Z2' In addition, if, for
every Z E (9, we have IIZII = 1, (9 is called an "orthonormal collection."
The following result is known as the Bessel inequality.
Proposition 5.1.4. If {Z;}1 c L2 is an orthonormal sequence, then, for any
ZEL 2 ,
n
L I(Z,Z;)12 :s; IIZII2.
i=l
PROOF.
o :s; II Z =
i~ (Z, Z;)Zi
(Z - i~
r
(Z,Zi)Z;,Z -
i~ (Z,Z;)Zi)
(5.1.8)
5.1. Definitions and Preliminaries
n
109
n
n
=
II Z 112 -
L (Z, Z,)(Z, Z;) - ,=1
L (Z, Z,)(Z"
,=1
=
IIZII 2 -
L I(Z"ZW - ,=1
L I(Z"Z)1 2 + ,=1
L I(Z,Z,W
,=1
=
IIZI1 2 -
L I(Z,Z,W·
i=l
n
n
Z) +
n
L L (Z, Z,)(Z, Zj)(Z"
,=1 j=l
Z)
n
n
o
This proves the assertion.
Corollary 5.1.1. If {Zi}f c L2 is orthonormal, then
00
L I(Z,ZiW S
1
(5.1.9)
IIZI12.
This is so because, according to the previous proposition, (5.1.8) holds for
every partial series of this sequence.
The following result is known as the Cauchy inequality.
Proposition 5.1.5. If Zl' Z2
E
L 2 , then
II Z 1 + Z211
s
II Z 111
+
IIZ211·
(5.1.10)
PROOF.
However,
(Zl,Z2)
+ (Z2,Zd =
s
s
(Zl,Z2)
+ (Zl,Z2)
21(Zl,Z2)1
21IZ111·IIZ211.
Therefore,
IIZ1 + Z211 2 S IIZl112 + 211Z111·IIZ211 + IIZ2112
=
(11 Z 111
+ II Z211 )2,
o
which proves (5.1.10).
From (5.1.10) we readily deduce the "triangle inequality":
II Z 1 - Z311 S II Z 1 - Z211
+
II Z 2 - Z311·
(5.1.11)
Remark 5.1.2. The concept of a norm permits us to measure the "distance"
d(Zl,Z2) between any two members Zl' Z2 E L2 as
(5.1.12)
Now, we can prove the following result.
5. L2 Space
110
Proposition 5.1.6. The next four functions are uniformly continuous [(a) and (c)
on L z and (b) and (d) on L z x L z ]'
(a) aZ;
PROOF.
IIZII;
(c)
From (5.1.10), it follows that
(a) d(aZ 1 ,aZZ ) = lald(ZI'Zz),
(b) I(Z1>Z) - (Zz,Z)I::;; IIZII'II ZI - Zzll = IIZlld(ZI'Zz),
(c) IIIZIII-IIZzlll ::;;d(ZI,ZZ)'
(d) II(ZI + Zz) - (VI + Vz)11 ::;; I ZI - VIII + IIZz - Vzll
= d(ZI'
Vd + d(Zz, Vz)·
5.2. Convergence in Quadratic Mean
Because L z = L2 {n, gj, P} is a metric space, with the distance defined by
(5.1.12), we can now introduce a notion of convergence.
Definition 5.2.1. A sequence {Zn}f c L z is said to converge to an element
Z E L z if
IIZn - ZII
= d(Zn, Z) ..... 0
as n .....
00.
(5.2.1)
For this kind of convergence, it is not necessary that the values Zn(w)
converge to Z(w) pointwise or a.s. We shall call this kind of convergence,
"convergence in quadratic mean" or "convergence in the mean square" and
write
Zn ~ Z
or
Z = l.i.mZn •
Proposition 5.2.1. If {Zn}f c L2 converges in quadratic mean (q.m.), it may
have at most one limit.
PROOF.
Suppose that
Zn~ V
where V, V
c
and
Zn~ V,
L z ; then, for all n = 1,2, ... ,
I V - VII::;; IIZn - VII + IIZn - Vii ..... 0 as n ..... 00.
Hence, I V - V II = 0, which implies that V = V (a.s.).
Proposition 5.2.2. If {Zn}f c L2 converges in q.m. to Z E L 2, then
IIZ.II ..... IIZII·
D
111
5.2. Convergence in Quadratic Mean
PROOF.
The obvious inequalities
IIZn I
Ilzll
~
~
Ilzll + IIZn - zll,
IIZnll + Ilzn - zil
imply that
which proves the assertion.
D
Definition 5.2.2. A sequence {Zn} C L2 is said to be "Cauchy" or "fundamental," if for every e > 0 there exists a natural number N such that
IIZn - Zmll
~
e
ifm, n > N.
Proposition 5.2.3. If
sequence.
{Zn}':' c L2 has a limit Z E L 2, then it is a Cauchy
PROOF. Let e > 0 be arbitrary; then there exists a natural number N, such that
for any n > N
IIZn - ZII < e/2.
Now, if m, n > N, then
which proves the assertion.
D
The converse statement is much deeper.
Proposition 5.2.4 (Riesz-Fischer). If
Z E L2 such that
{Zn}':' is a Cauchy sequence, there exists
PROOF. Consider the convergent series Lk=l rk. Because {Zn}':' is fundamental, for every k there is an nk such that
IIZn - Zmll < 2- k ifm, n
~
nk'
We may suppose, without loss of generality, that n 1 <
IIZnk+l - Znk I < 2- k •
Therefore,
n2
< "', so that
5. L2 Space
112
On the other hand, due to convexity (Jensen's inequality)
IIZnH, - Znkl12 = EIZnk+, - Znkl2 ~ (EIZnk+, - Znkl)2,
so that
EIZnH, - Znkl ~ IIZnk+, - ZnJ·
From this, it follows that the series
00
L E IZnH, k=l
Znkl
converges. Then, according to the Beppo-Levi theorem, the series
converges (a.s.). Consequently, the series
converges (a.s.). However,
00
Znl
+ k=l
L (Znk+1 -
ZnJ = lim Znk'
k-+oo
Set
where this limit exists and Z = 0 on n where this limit does not exist.
Next, we have to show that Z E L2 and that
Z
= Li.mZn •
To this end, take an 6 > 0 and a natural number N such that, for m, n > N,
IIZn - Zmll < 6. If ko > N, then
IIZn - Znkll2 < 62
for any n > Nand k > k o. By applying Fatou's lemma to the sequence
{liZ. - Znk II h>ko' we obtain that
IIZn - ZII 2 ~ liminf IIZn - ZnJ 2 ~ 6 2.
k-+oo
In other words, for all n > N, IIZn - ZII < 6. From this, it follows that
Zn - Z E L2 and, consequently, Z E L 2, and Z is the mean square limit of
{Zn} l' This completes the proof of the Riesz-Fischer theorem.
0
Corollary 5.2.1. Because IEZn - EZI 2 ~ liZ. - ZII 2 --+ 0 as n --+
that
00,
it follows
5.3. Remarks on the Structure of L2
113
The following is a criterion for the mean square convergence due to Loeve.
Proposition 5.2.5. A sequence {Zn}1' c L2 converges in quadratic mean
only if
EZnZm ~ C (a finite constant)
if and
(5.2.2)
as nand m tend independently to infinity.
PROOF.
Assume that (5.2.2) holds; then, as m, n ~
00,
IIZn - Zml1 2 = EIZn - Zml 2 = E(Zn - Zm)(Zn - Zm)
=
EZnZn - EZn' Zm - EZmZn
~
C- C- C
+C=
+ EZmZm
O.
On the other hand, if Zn ~ Z, then
EZnZm ~ EZZ = IIZI12.
This follows from the fact that if {ZnH' c L2 and {Un }1' c L2 converge in
q.m. to Z and U, respectively, then from Corollary 5.2.1, it follows that
D
Remark 5.2.1. The property of the space L2 established by the Riesz- Fischer
theorem is called "completeness." With this property, L2 = L2 {n,~, P} becomes a Hilbert space.
5.3. Remarks on the Structure of L2
The structure of the Hilbert space L2 = L 2 {n,.?4,p} depends clearly on the
underlying probability space {Q,.?4, Pl. In this section, we discuss the problem
of approximation of a r.v. Z E L2 by means of a certain type of bounded r.v.
from L 2 • We also discuss the problem of separability of the space L 2 • To this
end, the following definition is needed.
Definition 5.3.1. A (real or complex) r.v. Z on a probability space {Q,.?4, P} is
said to be "essentially bounded" if there is a constant 0 < c < 00 such that
IZI :s; c (a.s.).
Loo
The subset of all essentially bounded r.v.'s of L2{Q,~,P} is denoted by
= Loo{Q,~,P}. On L oo , we define a norm by
IIZII = IIZlloo = inf{ c; IZI :s; c (a.s.)}.
(5.3.1)
With this norm, Loo becomes a normed linear space. The norm IIZlloo is often
5. L2 Space
114
called the "essential supremum" of IZI and is denoted by
IIZllao
= esssuplZI.
(5.3.2)
Next, we prove the following result.
Proposition 5.3.1. Let V be an arbitrary complex r.v. on
> 0 there exists a r.v. Z E Lao {n, Pl, P} such that
{n,~, P},
then for any
8
P{V i= Z} <
8.
PROOF. Set Ak = {IVI > k}, k = 1,2, ... , and write G = {lUI = oo}. We
clearly have that P(G) = 0,
ao
and that G =
Ak•
n
k=l
From this, it follows that P(Ad -+ 0 as k -+
P(Ano) < 8. Now define the r.v. Z as
00,
Z- { V on A~
-
0
Clearly, IZI ~ no and P{Z i= V} <
8.
so that there exists no such that
0
on Ano.
This proves the assertion.
o
Definition 5.3.2. A subset D c L2 is said to be everywhere "dense" in L2 if any
element of L2 is the mean square limit of a sequence of elements from D.
It is not difficult to see that a necessary and sufficient condition for a subset
D c L2 to be everywhere dense in L2 is that, for any Z c L2 and any 8 > 0,
it is possible to find Zo E D such that
liZ - Zoll <
8.
Proposition 5.3.2. The class Lao c L2 of all bounded r.v.'s is everywhere dense
in L 2 •
PROOF. Let Z
E
L2 and consider
cp(A) =
L
Z2 dP,
A
E
!!4.
Clearly, cp(. ) is a bounded non-negative countably additive set function on
Pl such that cp « P. Therefore, for any 8 > 0, there exists a f> > 0 such that if
P(A) < f>,
then
cp(A) <
8 2•
Now, according to Proposition 5.3.1, there exists a bounded r.v. Zo such that
5.4. Orthogonal Projection
115
P{Z -# Zo} < (j.
Without loss of generality, we may assume that Zo
Then
=
0 on the set {Z -# Zo}.
o
which proves the proposition.
Next, we define two important concepts in L2 theory. The first one is that
of a "linear manifold."
Definition 5.3.3. A subset M c L2 is called a linear manifold if, for any Zl,
Z2 E M and complex numbers a and /3, aZ l + PZ2 EM.
Definition 5.3.4. A closed linear manifold H
c
L2 is called a subspace.
Let G c L z be an arbitrary subset. There always exists at least one linear
manifold that contains G. The intersection of all linear manifolds containing
G is denoted by <G) and is called the linear manifold spanned by G. The
closed linear manifold is denoted by <G).
One of the fundamental questions in the theory of L2 spaces is that of their
dimensionality. In this respect, of particular interest are conditions on the
probability space {n,86,p} under which L z {n,86,p} is countably infinite
dimensional. This question is closely related to the one of whether there exists
a countable everywhere dense subset of L z. In other words, is the space
L z separable? In general, this is not the case unless the probability space
{n, 86, P} has a particular structure.
Remark 5.3.1. A linear inner product space L z with the norm IIhll = (h,h)l/Z
for h E L z is often called unitary. This term, however, is not standard in the
literature.
5.4. Orthogonal Projection
To investigate the geometry of the space L 2 {n,86,p}, we only need the
concept of orthogonality. According to Definition 5.1.5, elements Zl' Zz E L2
are said to be orthogonal, written Z l l. Zz, if (Zl,ZZ) = O. We say that an
element Z E L2 is orthogonal to a subset M c L 2 , and write
Zl.M
if(Z, U) = 0 for all U E M.
In this section, we discuss an important and useful result in the theory of
Hilbert spaces, the so-called projection theorem, which deals with decomposi-
116
5. L2 Space
tion of an element Z E L2 into orthogonal components. Roughly speaking,
the content of this theorem can be described as follows. Given a subspace
He L2 and an arbitrary Z c L 2 , there exists a unique decomposition of the
form
Z = ZH + Zo,
where ZH E Hand Zo .1 H. The ZH is called the orthogonal projection of Z
onH.
To this end, let us show that there is always a minimal distance from an
element Z E L2 to a given subspace.
Proposition 5.4.1. Let H c L2 be a subspace and Z
lJ = inf{ liZ - VII; V
Then there is Zo
E
E
E
L2 arbitrary and write
H}.
H such that
liZ - Zoll = lJ.
PROOF. Choose any {Zk}f c H such that liZ - Zkll
from (5.1.6), we have
II(Zn - Z)
+ (Zk
-+
lJ. On the other hand,
- Z)1I 2 + II(Zn - Z) - (Zk - Z)11 2
= 211 Zn - ZI1 2 + 211 Zk - ZI1 2
or
IIZn - Zkl1 2 = 211Zn - ZI1 2 + 211Zk - ZII 2 - 411t(Zn
Because t(Zn
+ Zk) E H, it follows that
Ilt(Zn + Zk) -
+ Zk) -
Z112.
ZI1 2 ~ lJ 2 •
Therefore,
(5.4.1)
By letting nand k -+ 00, the right-hand side in (5.4.1) converges to zero. This
implies that {Zn}f is a Cauchy sequence. Therefore, Zn ~ Zo E H. From the
continuity of the norm, it follows that
o
liZ - Zoll = lim liZ - Znll = lJ.
n .... oo
Proposition 5.4.2. Let H c L2 be a subspace and Z
be such that
liZ - ZO II = inf{ liZ - VII; V
E
L2 arbitrary. Let Zo
E
H};
then (Z - Zo) .1 H.
PROOF.
If V
E
H, so is Zo - aV for any complex a. Therefore,
liZ - Zoll ~ liZ - Zo
+ aVII·
E
H
5.4. Orthogonal Projection
117
Set Z* = Z - Zo and 7J = (Z*, U). This yields
IIZ*112 ~ IIZ*
+ tXUI1 2 =
IIZ*112
+ tXf3 + tXf3 + ItXI 211UI1 2
or
By successively assigning tX values t, - t, it, and - it where t > 0, one obtains
-tlIUII ~ f3
+ 7J ~ tlIUII,
- t II U II ~ i(f3 - 7J) ~ til U II·
Because t > 0 is arbitrary, it follows that f3 + 7J = f3 - 7J = O. Therefore,
(Z*, U) = f3 = 0 for all U E H, which proves the proposition.
0
Definition 5.4.1. The orthogonal complement of a subspace H c L2 is the
subset H1- c L2 of all Z E L2 such that
Z1-H.
Proposition 5.4.3. The orthogonal complement H1- of a subspace H is also a
subspace such that H n H1- = {ZO}, where Zo = 0 (a.s.).
PROOF.
If Zl' Z2 E H1-, then (Zl' Z)
= (Z2' Z) = 0 for any Z E H. Therefore,
+ f3(Z2'Z) = (tXZ1 + f3Z2'Z)
for any tX and f3. This shows that tXZ1 + f3Z2 E H1-, so that H1- is a linear
manifold. To see that it is closed, let {Udl' c H1- converge in q.m. to Zoo
0= tX(Zl,Z)
Then, by continuity of the inner product, we have that for any Z
E
H
(ZO,Z) = lim (Un,Z) = O.
Therefore, ZO 1- Z, so that ZO E Hl.. Finally, if Zo E H n H\ then (Zo, Zo) =
0, so that Zo = 0 (a.s.).
0
Definition 5.4.2. Let A and B be two arbitrary subsets of L 2 ; then
A
+ B = {Z + U;ZEA,UEB}.
(5.4.2)
If H1 and H2 are two orthogonal subspaces of L2 one writes H1 E9 H2
for H1 + H 2. Thus, the use of the symbol E9 entails orthogonality of the
summands.
Proposition 5.4.4. If H is a subspace of L 2, then H E9 H1- = L 2 •
PROOF. Clearly, H E9 H1- c L2.1f Z E L 2, then by Proposition 5.4.1 and 5.4.2,
there exists Zo E H such that (Z - Zo) 1- H, which implies that (Z - Zo) E H1-.
From this, the assertion follows because Z = Zo + (Z - Zo).
5. L2 Space
118
Proposition 5.4.5. If ZI + U1 = Z2
then ZI = Z2 and U1 = U2 (a.s.).
+ U2 , where ZI' Z2 E Hand U1 , U2 E H1-,
PROOF. Because ZI - Z2 E Hand U1 - U2 E H\ it follows that ZI - Z2 =
U2 - U1 E H (\ H1- and the assertion follows because H (\ H1- = {O}.
0
Corollary 5.4.1.
Remark 5.4.1. If H c L2 is a subspace, it follows from Propositions 5.4.4 and
5.4.5 that any Z E L2 can be expressed uniquely as a sum
(5.4.3)
where ZH E H and ZH~ E H1-. We call ZH the orthogonal projection of Z on
H. It is clear that Z = ZH if and only if Z E H.
5.5. Orthogonal Basis
At the end of the Section 5.3 we made a brief commentary concerning the
dimensionality of L2 spaces, although the notion by itself has not been
formally defined. This concept, whose formal definition will be given at the
end of this section, is closely related to the concepts of "orthogonality" and
"completeness" which will be discussed in some detail in this section. We
begin with the following definition (see Definition 5.1.6):
Definition 5.5.1. An orthogonal system (!) c L2 is called "complete" if no other
orthogonal collection of elements of L2 contains (!) as its proper subset.
Next, we shall show that the closed linear manifold spanned by a complete
orthonormal collection (!) c L2 is equal to L 2, <(9) = L 2. For this reason a
complete orthonormal system is often called a "basis" for the space L 2 •
Definition 5.5.2. Let {Uk}! c L2 be an arbitrary orthonormal system and
Z E L 2 • The complex numbers
Ck
= (Z, Uk)'
k = 1,2, ... ,
(5.5.1)
are called the "Fourier coefficients" of Z with respect to the system {Uk}! and
the sum
(5.5.2)
is called a "Fourier series" of Z with respect to {Uk}!'
5.5. Orthogonal Basis
119
From (5.5.1) and the Bessel inequality (5.1.9), it follows that
(5.5.3)
This then clearly implies that
t
II k=m+l
CkUk
112 =
t
k=m+l
ickl 2 -+ 0
as m, n -+
00.
(5.5.4)
Therefore,
liS" - Smll
where S" = L~=l
CkUk •
-+
0 as m, n -+ 00,
Hence, by the Riesz-Fischer theorem,
Sn~SEL2'
Now we can prove the following result.
Proposition 5.5.2. Let {Zk}'1 c L2 be a complete orthonormal system; then
every element Z E L2 admits an expansion convergent in the mean square
(5.5.5)
where Ck = (Z, Zk), i.e.,
(5.5.6)
Furthermore, we have that
IIZII 2
II
=
L ickl 2 ,
k=l
(5.5.7)
which is called Parseval's formula.
PROOF.
From (5.5.4), it follows that
II
'"
L...
k=l
Ck
Zkq.m.
Z0
-
as n -+
00.
Now, for any fixed k = 1, 2, ... and n ~ k,
I(Zo,Zk) - cll =
I( jt
Zo -
~ II Zo -
jt
= II Zo -
jt
CjZj'
CjZj
j
l CjZ
Zk)1
11'IIZkll
II·
(5.5.8)
5. L2 Space
120
As n --+
00
the right-hand side tends to zero due to (5.5.8), which implies that
(Zo, Zk) = Ck'
Hence, for any k
= 1,2, ... ,
(Zo - Z,Zk) = (ZO,Zk) - (Z,Zd = Ck - Ck = 0,
so that Zo - Z is orthogonal to every element of the complete orthonormal
system {Zk}'f. This then implies that Zo - Z = 0 (a.s.) so that
(5.5.9)
Zo = Z (a.s.).
Finally, because
II Z - ktl CkZk
r
=
IIZlI z - ktl ickl z,
we obtain the Parseval formula by letting n --+
(5.5.9).
00
(5.5.10)
in (5.5.10) and invoking
0
Remark 5.5.1. The expansion (5.5.5) is called the generalized Fourier series of
Z E L z , where ck = (Z, Zd are the generalized Fourier coefficients of Z with
respect to the orthonormal sequence of {Zd'f. When {Zk}'f is a complete
orthonormal system, it follows from Proposition 5.5.2 that
(5.5.11)
Remark 5.5.2. From Proposition 5.5.2, it also follows that for a given complete orthonormal {Zd'f c L z and Z E L z , we have that (5.5.5) holds where
Ck = (Z, Zd, k = 1, 2, ... , and (5.5.7) holds. Does the converse also hold? In
other words, given an arbitrary sequence of complex numbers {cd'f and a
complete orthonormal system {Zk}'f, does there exist a Z E L z such that
Ck = (Z, Zd for all k = 1, 2, ... and such that (5.5.5) holds? From Bessel's
inequality, we know that only those sequences come into consideration for
which Lk=l ICklz < 00. This condition is also sufficient. In fact, from (5.5.4)
and using the Riesz-Fischer theorem, we have that the series
converges in the mean to a Z
E
L z ifLk=llckl z <
(Z,Zk) = Ck ,
k
00,
and
= 1,2, ....
In other words, if {Zk}'f is a complete orthonormal system and {cd'f a sequence of complex numbers for which Lk=l ICklz < 00, then Lk=l CkZk ~
Z E L z , such that
(Z,Zd = ck ,
and no other solution Z
E
k
= 1,2, ... ,
(5.5.12)
L z of the system of equations (5.5.12) exists.
5.6. Existence of a Complete Orthonormal Sequence in L2
121
Therefore, every complete orthonormal sequence {Zk}f E L2 establishes,
by means of the formula Z = L~=l CkZk, a one-to-one correspondence between the elements Z E L2 and the sequences of complex numbers {ck}f
satisfying L~11ck12 < 00. Now we can give the following definition.
Definition 5.5.3. The dimension of the space L2 is the cardinality of its complete orthonormal set. If there is a complete orthonormal sequence {Zk}f c
L 2, then L2 is countably infinite dimensional.
Remark 5.5.3. The following, more general, form of Parseval's formula also
holds. For any two Z, U E L 2 ,
L (Z, Zk)(U, Zk)'
k=l
<X)
(Z, U) =
(5.5.13)
This follows from the identity (which is easy to verify)
4(Z, U) = liZ
+ UI1 2 - liZ - UI1 2 + iliZ + iUIi
- iliZ - iU11
(5.5.14)
and from (5.5.7).
5.6. Existence of a Complete Orthonormal
Seq uence in L 2
Definition 5.5.3 specifies that a Hilbert space L2 is countably infinite dimensional if there exists a complete orthonormal sequence {Zk}f c L2 (often
called its base). In general, such a sequence does not exist. If, however, there
exists a countable everywhere dense subset Do c L2 (in other words, such that
any element Z E L2 is the mean square limit of a sequence of elements from
Do), then there exists a complete orthonormal sequence in L 2 •
To show this, write elements of Do as a sequence {Uk}f, i.e., Do = {Udf.
From this, by means of the well-known Gram-Schmid orthogonalization
procedure, one can always construct an orthonormal family {Zk}f as follows.
Put
(5.6.1)
To obtain Z2' set
W2 = U2
-
(U2 ,ZdZ l'
Clearly, (W2' Zl) = O. Thus,
Continuing this processs, we obtain
k-l
w,. = Uk - L
j=l
(Uk' Zj)Zj,
(5.6.2)
122
5. L2 Space
so that
(5.6.3)
and so on. It is easy to verify that the sequence {Zdf is orthonormal.
Let us now show that {Zn}f is also complete. If Zo E L2 is orthogonal to
every Zk' it follows from (5.6.1) and (5.6.2) that Zo is also orthogonal to every
Ui • Consequently,
IIZo - Uk l1 2
=
(Zo - Uk,Zo - Uk)
=
(ZO,ZO)
+ (Ub
Uk);;::: II ZoI12.
But because {Uk}f is everywhere dense in L 2 , the left-hand side can be made
arbitrarily small by suitable choice of k. This implies that IIZo \I = 0, which
proves the assertion.
We conclude this section with some remarks on second-order stochastic
processes. Let {((t); t E T} be such a process, and denote by «(T» the closed
linear manifold spanned by this family. Clearly, «(T» is a subspace of L2
and, as such, it is a Hilbert space.
Assume now that {((t); t E T} is a q.m. continuous process and that Tis
an interval. In other words, at every t E T.
IIW + h) -
((t)1\2
-+
°
as h -+ 0.
Denote now by Q c R the set of rationals of R. Then, for every t E T, there
exists {td c Q n T such that tk -+ t as k -+ 00. Therefore, because {((t); t E T}
is continuous in quadratic mean, we have, for every t E T,
((t)
= l.i.m Wn).
In other words, the countable family {((t); t E Q n T} is everywhere dense in
«(T». Therefore, according to the previous discussion, there exists a complete countable orthonormal family in «(T».
5.7. Linear Operators in a Hilbert Space
In this section, we discuss the concept of a continuous linear mapping of a
Hilbert space into itself.
Definition 5.7.1. Let Yf be a Hilbert space. A mapping T: Yf
"linear operator" if
T(C1Xl
+ C2 X 2 ) =
Cl TX l
+ C2 TX 2
-+
Yf is called a
(5.7.1)
for all Xl' X 2 E Yf and any complex numbers Cl , C2'
Definition 5.7.2. The operator T is continuous at Xo E Yf if Xn ~ Xo implies that TXn ~ TX o' This means that, for any I: > 0, there is b = b(l:)
such that
IIXn - Xoll < b=> IITXn - TXol1 < 1:.
124
(v)
5. L2 Space
I T* T II = I TT* II = I T 112. (Clearly, the composites
T* T: Yf
and
-+ Yf
TT*: Yf
-+ Yf
are well defined.)
(vi) (T1 0 T2)* = (T2* 0 T1*)'
Definition 5.7.5. An operator T: Yf -+ Yf is called
(a)
(b)
(c)
(d)
(e)
isometric if: T*T= I (identity operator),
unitary if: T*T= TT* = I,
self-adjoint if: T* = T,
projection if: TT = T and T = T*,
normal if: T*T = TT*.
The following result is very useful. Suppose that the space L2 is separable.
Then, according to the Proposition 5.5.4, there exists a complete orthonormal
basis {Zdf c L 2. Therefore, every Z E L2 has a unique representation
Proposition 5.7.2. Let {bn}f be a bounded sequence of complex numbers and
C = l.u.b{lbkl; k = 1,2, ... }. There exists a unique operator T such that
(i)
(ii)
(iii)
(iv)
(v)
(vi)
TZk = bkZb k = 1,2, ... ,
T
CkZk =
CkTZb
II Til = C,
T*Zk = bkZk,
T*T = TT*,
T*(2:r=1 ckZd =
Ck T*Zk'
2:r=1
2:r=1
2:r=1
PROOF. Consider Z EYf and Z = 2:r=1 CkZk, where {Zk}f c
Yf is complete
and orthonormal. The problem is to define TZ. Because
co
co
L ICkbkl 2 :::; C 2k=1
2: ICkl2 =
k=1
[this follows from (5.5.13) if we put U
=
C 211Z11 2
(5.7.4)
Z], we can define
co
TZ
=
2: bkckZk·
k=l
From (5.7.4), we have that
I TZII 2 :::; C 2 11Z11 2 ,
which shows that T is continuous and I TIl :::; c.
Clearly, TZk = bkZk; because IIZk I = 1,
IITII = sup IITZII;;:: II TZkll = IIbkZkll = Ibkl
IIZII=l
for all k = 1, 2, .... Hence, II Til;;::
c. This proves (i)-(iii).
(5.7.5)
5.7. Linear Operators in a Hilbert Space
123
Definition 5.7.3. A linear operator is bounded if there exists a constant C > 0
such that
(5.7.2)
II TXII ~ C IIXII
for all X
II TIl·
E
.Yf. The least such C is called the "norm" of T and is denoted by
From the last definition we have
II TIl = inf{C; II TXII ~ qXII,x E.Yf}
. { C;1tXf
II TXII ~ C,X E.Yf} .
= mf
In other words,
IITII = sup IITXII.
Xe Jff IIXII
X,.o
(5.7.3)
An alternative formula for the norm of Tis
II TIl = sup II TXII·
IIXII;l
From (5.7.2), it follows that every bounded linear operator is continuous
on .Yf.
Proposition 5.7.1. If T is a linear and continuous operator on .Yf, there exists a
unique linear continuous operator T* on.Yf such that (TX, Y) = (X, T*Y) for
all X, Y E .Yf and
II TIl = II T* II·
Definition 5.7.4. The operator T* is called the "adjoint" of T. We now list
some simple properties of adjoint operators.
(i) (A. T)* = IT*.
=-=--="':-::-::::
(ii) (T*Y,X) = (X, T*Y) = (TX, Y) = (Y, TX).
(iii) (T*)* = T [it follows from (ii) and
((T*)*X, Y)
= (X, T*Y) = (TX, Y).
(iv) If 7;: .Yf -+ .Yf are continuous and linear, i = 1, 2, then
(Tl
+ T2)* =
Tl*
+ T2*·
[This follows from (iii) and
((Tl
+ T2)* X, Y)
= (X, (Tl
+ T2) Y)
= (X, Tl Y
+ T2 Y)
= (X, Tl Y) + (X, T2 Y) = (Tl Y,X) + (T2 Y,X)
= (Y, Tl*X) + (Y, T2*X) = (TrX, Y) + (TtX, Y)
= (Tl*X + T2*X, Y).]
5.8. Projection Operators
125
To prove (iv) and (v), suppose that
00
Z
00
= k-I
L CkZk
the problem is to show that Yk
T*Z
and
= k-I
L YkZk;
= Ckbk' For all k,
Yk = (T*Z,Zk) = (Z, TZk) = (Z,b,.Zk) = bk(Z,Zk) = bkCk'
Finally, T*TZk = T*Ib,.12Zk = TT*Zk; because {ZkH' is complete, T*T =
TT*. In addition, if To is another operator such that TOZk = CkZk> then
TOZk = TZk for all k = 1,2, ... ; because {Zk}f is complete, To = T.
D
5.8. Projection Operators
As we have seen in Section 5.4 (Propositions 5.4.1 and 5.4.2), if H c L2
is a subspace and Z E L2 an arbitrary element, there exists a unique
decomposition
Z = ZH + ZHi,
where ZH is called the orthogonal projection of Z on Hand ZHi is orthogonal
on Hl..
Let H be an arbitrary subspace of L2 and let P be the mapping
P: L2 -.L 2
associated with the subspace H such that, for any Z
PZ
E
L2,
= ZH'
(5.8.1)
The following properties of the mapping (5.8.1) are not difficult to verify.
(Here 0 E L2 is a zero element.)
PZ=Z,
(i)
and
VZEH
(PZ, V) = (Z, PV)
(ii)
PZ
=0
for all Z, V
if Z
E
L 2•
E
Hl.;
(5.8.2)
(5.8.3)
[This can be shown as follows:
(PZ, V) = (ZH' VH + VHi)
(Z,PV)
= (ZH' VH),
= (ZH + ZHi, VH) =
(ZH' VH)']
It is also easy to see that
(iii)
P(Z
Z
so that P(Z
+ U) =
+U=
ZH
+ U) = PZ + PU,
(ZH
(5.8.4)
+ VH) + (ZHi + VHi),
+ UH = PZ + PV;
(iv)
P(cZ) = cPZ;
(v)
PPZ = PZ;
(5.8.5)
126
5. L2 Space
(vi)
(PZ,Z) =
IIPZI1 2 ~ IIZII,
IIZI1 2 = IIZHII 2+ IIZH"1I2
~
IIZHI12
=
IIPZI1 2 ;
(vii) H is the range of P.
Remark 5.8.1. The mapping (5.8.1) is called the projection of L2 on H. Sometimes, the notation PH is used to indicate the relationship of P to the subspace
He L 2 .
Proposition 5.8.1. If T is any projection operator, there is a unique linear
subspace He£, such that T = PH.
Problems and Complements
5.1. If {IXJ'i and {Pi}! are sequences of complex numbers, show that (Cauchy's
inequality)
5.2. Using the Cauchy inequality, show that
5.3. Show that every metric induced by a norm satisfies
(i) d(Zl + IX, Z2 + IX) = d(Zl, Z2),
(ii) d(IXZ 1 ,IXZ2 ) = IlXld(Zl,Z2),
where IX is a complex number. Can every metric be obtained from a norm?
5.4. Can we obtain a norm from a metric?
5.5. Let Z
E
L2 and A c L 2. The distance between Z and A is defined by
d(Z,A)
=
inf{d(Z, U); U
E
A}.
Show that
5.6. Verify the identity
IIZ1 - Z211 2 + IIZ1 - Z311 2 = HZ2 - z31i 2 + 211Z1 - t(Z2 + Z3)112.
5.7. Let {Z.}O' c L 2 ; show that Z. ~ Z if
IIZ.II-+ IIZII
5.8. Let Zl' Z2
E
and
(Z.,Z)-+(Z,Z).
L2 {Q, fl, P} be such that IZd ·IZ21 2: 1 (a.s.). Show that
(EIZdHEIZ21) 2: 1.
Problems and Complements
127
5.9. Let Co the set of all sequences of complex numbers {lXt}i' such that {k;lXk #- O}
is finite. Define
(X, Y) =
ao
L XkYk'
1
X, Y E Co. Show that Co is an inner product space but not a Hilbert space.
5.10. Prove that Lao = Lao{n,aJ,p} is a Banach space if IIZII = esssuplZI.
5.11. Let Lo
C
L2 be a subspace and Z E L 2 • Show that
inf{ liZ - UII; U E Lo}
= sup{I(Z, W); WELt, II WII
~
1}.
5.12. If {Z.}i' and {U.}i' from L2 are such that
Z.~Z
and
Un~U,
show that (Z., Un) -+ (Z, U).
5.13. Show that in L 2 , ZI ..L Z2 if and only if
II Z I
+ IXZ211
= IIZI - IXZ211.
5.14. Prove that if {Zt}i' c L2 is orthonormal, then, for any Z E L 2 ,
(i) lim.~ao (Z, Z.) = 0,
(ii) IIZi - Zjll = 2 for all i #- j.
5.15. A subset M c L2 is said to be convex if (0
~
IX
~
1)
ZI,Z2 E M=A = {ZE L 2; Z = IXZ1 + (1-IX)Z2} eM.
If{Z.} c Msuch that IIZ.II-+d = inf{lIUlI; U E M},showthat {Z.} is a Cauchy
sequence. The set A is called the segment joining ZI and Zz.
5.16. Let {Z.}i' c L2 be an orthonormal sequence. Show that for any U, VE L z
ao
L
k=1
I(U,Zk)(V,Zl)1 ~ 1lU11·11V1I·
5.17. Let {Z.} c L z be such that Z. ~ Z. If
1 •
Y.=n
L Zi'
i=1
then show that Y. ~ Z.
5.18. Let {Zt}7 be an orthonormal collection. Prove that
is attained if IXk = (Z, Zt).
5.19. If {Z.}
C
L2 is such that suP. liZ. II ~ K <
5.20. Let lp, 1 ~ p <
that
00,
00,
show that Z./n -+ 0 (a. e.).
be the set of all sequences ofreal numbers (1X1'lXz, ... ) such
128
5. L2 Space
L Icx;lP < cx.
00
i::!
Show that Ip is separable.
5.21. Let £ be Hilbert space and qJ a linear functional on £. Show that there exists
a unique element Xo E £ such that
qJ(x) = (x, xo)
for all x
(This is the Riesz representation theorem.)
5.22. Prove Proposition 5.7.1.
E
£.
CHAPTER 6
Second-Order Processes
6.1. Covariance Function C(s, t)
There exists a large class of engineering and physics problems whose solutions require only the knowledge of the first two moments and some very
general properties of a second-order random process (see Definition 1.5.8).
This chapter is concerned with some key properties of complex-valued
second-order random processes.
Let {~(t); t E T} be a complex-valued second-order random process on
{n,~,p}, i.e.,
(6.1.1)
Second-order processes are often called "Hilbert processes." Separating the
real and imaginary parts of the process, we can write
~(t) =
X(t)
+ iY(t),
(6.1.2)
where X(t) and Y(t) are two second-order real-valued random processes.
In the sequel, unless otherwise stated, we will always assume that
E{W)} = 0 for all t E T.
Definition 6.1.1. The covariance function C(s, t) of the process (6.1.1) is by
definition the second mixed moment, i.e.,
C(s, t)
= E {~(s),(t)}
(6.1.3)
or, according to the definition (5.1.3) of the inner product,
C(s, t) = R(s), e(t)).
(6.1.4)
6. Second-Order Processes
130
From (6.1.3), it readily follows that
C(s, t) = C(t, s),
(6.1.5)
which is known as the "Hermitian symmetry." We also have that
C(t,t)
= Varg(t)} =
(6.1.6)
11~(t)112.
The covariance function of a second-order process is always finite-valued.
This follows from the Schwarz inequality (see 5.1.1)
lC(s,t)1 2 ~ EI~(sW·EI~(tW = 11~(s)1I2·1I~(t)1I2.
A covariance function possesses a number of interesting features, some of
which are listed below. The following property imposes definite restriction on
the form of such a function.
(i) Every covariance function is non-negative definite. In other words, for
any {t 1 , ••• , tn } c T and any complex numbers Zl' ... , Zn' n = 1,2,3, ... ,
n
n
II
n
L L ZiZjC(t i, t) = i=l
L j=1
L zizjEWi)~(tj)
i=1 j=1
= E
=
{t Zi~(ti)
II it ZiWJ
r~
jt1
Zi~(tj)}
o.
(6.1.7)
(ii) Any complex-valued function on TxT which is non-negative definite
is Hermitian. To show this, let R(s, t) be such a function and consider
n
n
L L R(ti' tj)ZiZj ~ o.
i=1
(6.1.8)
j=l
From this, for n
= 1 we have that
R(t,t)
~
0,
'tit E T.
Next, for n = 2, (6.1.8) yields
+ R(t1,t2)Z1Z2 + R(t2,tdz2Z1 + R(t2,t2)Z2Z2
R(t1,tdz1Z1
This implies that, for all complex numbers
R(t1,t2)Z1Z2
Zl,
~
o.
Z2,
+ R(t 2,tdz1Z2
is real.
(6.1.9)
For Z1 = 1, Z2 = i, (6.1.9) becomes
(R(t 2,td - R(tl,t2))i is real,
so that
R(t 2, td - R(t 1, t 2) is pure imaginary.
Finally, if we set Z1
=
Z2
=
(6.1.10)
1 in (6.1.9), we conclude that
R(tl,t 2) + R(t 2,t 1) is real,
(6.1.11)
131
6.1. Covariance Function C(s, t)
which together with (6.1.10) clearly implies that
R(t 1 ,t 2) = R(t 2,td·
It seems that the last property of the function R(s, t) implies that any
non-negative definite function on TxT is a covariance function of a secondorder stochastic process {e(t); t E T}.
(iii) For any non-negative definite function R(s, t) on TxT (real or complex), there exists a second-order process {e(t); t E T} whose covariance function is precisely R(s, t).
This has already been established in the case when R(s, t) was real (see
Chapter 4; Section 4.4). To show that the statement holds when R(s, t) is
complex, consider the Hermitian form
H(tl, .. ·,t.)
=
• •
L L R(ti,tj)ZiZj,
i=l j=l
(6.1.12)
where {t 1 , ••• , t.} c: Tn = 1,2, .... Let Rl = Re{R} and R2 = Im{R}; then
we can write
(6.1.13)
R(s, t) = Rl (s, t) + iR 2 (s, t).
If we set Zj = Uj - iVj' we readily obtain that
H(t1> ... ,t.) =
• •
L L Rl(ti,tj)(UiUj + ViVJ
i=l j=l
- i=l
L• j=l
L• R 2(t i, tj)(UiVj -
UjVi)
(6.1.14)
(there is no imaginary part because, by assumption, R(s, t) is non-negative
definite). According to Equation (4.3.3),
is the characteristic function of 2n-dimensional Gaussian distribution of a
system of 2n real r.v.'s, say
(6.1.15)
(X(td,···,X(t.), Y(t 1 ), ••• , Y(t.))
with E(X(tJ) = E(Y(tJ) = 0, i = 1, ... , n, and
E{X(tJX(tJ}
= E{Y(t;)Y(tJ} = R 1 (t i,tj),
E{X(tJY(tj)} = -R 2(t i,tj).
(6.1.16)
(6.1.17)
Set
W) = (X(t)
+ iY(t))/J2.
(6.1.18)
We see that
R(s,t) = E{e(s)W)} = R 1 (s,t)
+ iR 2 (s,t). (6.1.19)
Therefore, the system (Wd, ... , W.)) has a Gaussian distribution for all
{t 1 , ••• , t.} c: T and n = 1,2, .... These distributions satisfy the Kolmogorov
6. Second-Order Processes
132
consistency conditions, so there exists a complex-valued process e(t) having
R(s, t) as its covariance function.
Next we shall list a few basic properties of the class of covariance functions.
(iv) The class of covariance functions is closed under additions, multiplications, and passages to the limit. In other words, if Cl and C2 are two covariance functions of random process with a common parameter set T, then
so are IXlC l + (x2C2 and Cl ·C2 when IXl' 1X2 > O. In addition, if {Cn}f is a
sequence of covariance functions and
C
=
lim Cn'
then C is also a covariance function.
It is apparent that non-negative definiteness is preserved under positive
linear combinations or under passage to the limit. In view of what was said
in (iii), this proves the first and third statement.
Let ~l(t) and ~2(t) be independent; then
E gl (S)~2(S)· ~ 1(t)e2(t)} = E g 1(s)~ 1(t)~2(S)~2(t)}
= Egl(S)~1(t)}Eg2(S)~2(t)}
= C l (S,t)C 2(s,t).
Because the first member is a covariance function, the second statement is
proved. To see that two such processes, ~l (t) and ~2(t), exist, it suffices to
assume that ~ 1(t) is Gaussian on a probability space {ill' ,q~\, Pd and, similarly, that e2(t) is normal on {il2' Bi2, P2 }, and then form the product space.
6.2. Quadratic Mean Continuity and Differentiability
Let {~(t); t E T} be an L2 stochastic process with T c R an interval. In general, its covariance function C(s, t) does not provide any direct information
about properties of sample functions of ~(t) such as continuity, differentiability, etc. In this section, we will define analogous concepts which make
sense in Hilbert space and give criteria for L2 continuity and differentiability
in terms of C. What is needed for this purpose is the notion of L2 convergence,
i.e., convergence in L2 norm, which is specified by Definition 5.2.l.
Definition 6.2.1. A second-order process {~(t); t E T} is said to be L2 continuous [or continuous in quadratic mean (q.m.)] at a point t E T if and only if
~(t
+ h) ~ ~(t)
as h -+ O.
According to the definition (5.2.1), this is equivalent to
IIW + h) - W)11 2 = EIW + h) - WW-+O
as h -+ O.
(6.2.1)
6.2. Quadratic Mean Continuity and Differentiability
133
If a process is q.m. continuous at every t E T, we will say it is a q.m.
continuous process.
The following two propositions establish a relation between q.m. continuity of a stochastic process and continuity of its covariance function.
Proposition 6.2.1. Let {(t);t E T} be an L2 process with C(s,t) = E{(s)(t)}.
The process is q.m. continuous at a point t E T if and only if C( " . ) is continuous at (t, t).
PROOF.
Set
IIW + h) - (t)112
=
=
+ h,t + h) - C(t + h,t) - C(t,t + h) + C(t,t)
C(t + h,t + h) - C(t,t) - (C(t + h,t) - C(t,t»
- (C(t, t + h) - C(t, t».
C(t
From this, it is clear that the process is q.m. continuous at the point
C(', .) is continuous at (t, t).
Conversely, if {(t); t E T} is q.m. continuous at point t E T, then
t E
T if
IC(t + h, t + h') - C(t, t)1 = lEW + h)~(t + h') - E(t)(t)1
= IE«((t + h) - W»W + h')
+ E«((t + h') - ~(t»~(t)1
::s; IE«((t + h) - W»W + h')1
+ IE«((t + h') - (t»(t)l.
From this and the Schwarz inequality (5.1.1), the assertion follows.
D
The next proposition shows that q.m. continuity of (t) on T implies
continuity of C(s, t) on TxT.
Proposition 6.2.2. If C(t, t) = R(t) is continuous at every t E T, C(', .) is continuous on TxT.
PROOF.
Consider
IC(s + h, t + h') - C(s, t)1
+ h)(t + h') - E(s)(t)1
= IE«((s + h) - (s»(t + h')
+ E«((t + h') - (t»(s)1
::s; [E«((s + h) - (s»(t + h')[
+ IE«((t + h') - (t»(s)l.
= IE(s
Again applying the Schwartz inequality and taking into account the previous
proposition, the assertion follows.
D
6. Second-Order Processes
134
Remark 6.2.1. Continuity in q.m. of a second-order process does not imply
sample function continuity. As an example, consider a time-homogeneous
Poisson process N(t) (see Remark 2.4.2) with E{N(t)} = At. As is known, this
process has independent increments so that, for any 0 < s < t,
E{N(s)N(t)}
= E(N(t) - N(s»N(s) + EN2(S)
= A(t - S)AS + As(1 + AS),
which yields
C(s, t) = AS.
Because C(s, t) is a continuous function, the Poisson process N(t) is q.m. continuous. However, its sample functions are step functions with probability 1.
Definition 6.2.2. A second-order random process {e(t); t E T} is said to have a
derivative e'(t) in q.m. at a point t E T if
W + h) - e(t) ~ e'(t) when h --. o.
h
The r.v.
e'(t) = de(t)
dt
is called the q.m. derivative of the random process e(t) at the point t E T.
In the sequel, we will need one more definition.
Definition 6.2.3. The second generalized derivative of a covariance function
C(s, t) is defined as the limit (if it exists) of the quotient
1
hh' {C(s + h, t
+ h') -
C(s
+ h, t) -
C(s, t
+ h') + C(s, t)}
as h, h' --. 0, which is denoted by
(6.2.2)
Proposition 6.2.3. Let {e(t); t E T} be a second-order process. A necessary and
sufficient condition for q.m. differentiability of e(t) at t E T is that the generalized derivative (6.2.2) exists.
PROOF. Write
E{e(t
+ h) h
1
= hh' {C(t
e(t)
W + h') h'
+ h, t + h') -
W)}
C(t, t
+ h') -
C(t
+ h, t) + C(t, t)}
and the assertion follows from the Loeve criterion (see Proposition 5.2.5) 0
6.2. Quadratic Mean Continuity and Differentiability
135
Corollary 6.2.1. If {e(t); t E T} is q.m. differentiable at a point t E T, dE {W) }/dt
exists and
Eg'(t)}
(6.2.3)
:tE{W)}.
=
As a matter of fact, this is implied by the q.m. differentiability of the process
at the point t E T and the inequality
IE{e'(t) - e(t +
h- W)}I:::;;
(Ele'(t) _ e(t + h~ - W)12YI2.
If a second-order process {W); t E T} is q.m. differentiable at every t E T,
then {e'(t); t E T} is also a second-order random process.
Proposition 6.2.4. Let {e(t); t E T} be a second-order process with covariance
function C(s, t). If the generalized derivative
02C(S,t)
osot
(6.2.4)
exists for every s = t and t E T, then e(t) is q.m. differentiable on T. In addition,
(6.2.5)
and
(6.2.6)
PROOF.
Only formulas (6.2.5) and (6.2.6) require a proof. Write
Eg'(s)W)}
= lim E(W) e(s + h) - e(s))
h
11-0
= lim C(s
+ h, t) -
C(s, t)
h
11-0
oC(s,t)
=
----as'
Similarly,
Ee'(s}e'(t) = lim E (e(s
II,/J'-O
= lim C(s
11,11'-0
02C(S, t)
osot .
+ h) h
e(s) e(t + h'~ - e(t))
h
+ h,t + h') -
C(s,t
+ h') hh'
C(s
+ h,t) + C(s,t)
6. Second-Order Processes
136
The last result implies that the second generalized derivative exists everywhere on TxT. This proves the proposition.
0
Remark 6.2.1. The concept of a stochastic integral in quadratic mean was
discussed in some detail in Chapter 3, Section 3.8, but only in the case of real
second-order processes. The same concept and results hold in the case of
complex-valued second-order processes.
6.3. Eigenvalues and Eigenfunctions of C(s, t)
°
Let {~(t); t E T} be a complex-valued second-order stochastic process on a
probability space {n,fJIJ,p}, such that E{~(t)} = for each t E T. Let C(s,t) be
its covariance function. In this section, we will give a brief review of some
basic properties of eigenvalues and eigenfunctions of C(s, t). This is required
for the problem of orthogonal expansions of second-order stochastic processes which will be discussed in the next section.
Denote, as before, by <~(T) the closed linear manifold spanned by
g(t); t E T} (see Definition 5.3.4). Clearly, ~(T) is a (Hilbert) subspace of
L2 {n, fJIJ, Pl. As we have established in Chapter 5, Section 5.6, if the process
~(t) is q.m. continuous on T, the subspace ~(T) is separable. In other words,
there exists a countable everywhere dense subset in <~(T). From this, by
means of the well-known Gram-Schmidt orthogonalization procedure (see
Section 5.6 of Chapter 5) we can always construct an orthonormal family
{Zk}1', which is complete in <~(T). In such a case, according to Proposition
5.5.2, for every t E T, ~(t) admits an expansion of the form
<
<
00
~(t) =
L {3k(t)Zk,
(6.3.1)
k=l
which is convergent in q.m., where
(6.3.2)
From (6.3.1), it seems to follow that under certain conditions we may write
C(s, t) = (~(s), W)) = E g(s)~(t)}
=
E
{~ j~ {3i(S)Zi{3j(t)Zj}
(6.3.3)
As we shall see later, {{3k(t)}1' is an orthogonal sequence of functions such
that
b
i =F j
(6.3.4)
({3i(t), {3it)) =
{3i(t){3j(t) dt = 11k . = .
f
a
{O,
"
I
]
6.3. Eigenvalues and Eigenfunctions of C(s, t)
137
(when T = [a, b]). In such a case, if term-by-term integration of (6.3.3) is
permitted, we obtain readily that, for all k = 1,2, ... ,
{3k(S) - Ak
r
C(s, t){3k(t) dt
= O.
(6.3.5)
This is a Fredholm linear integral equation of the second kind. Therefore, the
Fourier coefficients in expansion (6.3.1) are solutions of the integral equation
(6.3.5). For this reason, it seems appropriate to list some basic properties of
integral equations of the form (6.3.5)
Consider the integral equation
q>(x) - A
r
K(x, y)q>(y) dy = 0,
(6.3.6)
where the kernel K(x, y) is a given (real or complex) function, q>(.) is unknown,
and A is a parameter. Equations of this type are called Fredholm homogeneous linear integral equations of the second kind. In general, Equation (6.3.6)
has only the trivial solution q>(x) == O. For certain critical values of A, however,
there may exist nontrivial solutions.
The values AO' A1 , ••• for which nontrivial solutions of (6.3.6) exist are called
"eigenvalues" of the kernel K(x, y), and the corresponding q>'s are eigenfunctions. The eigenvalues and eigenfunctions depend upon the kernel
K(x,y).
Of particular interest here are kernels which are "Hermitian," i.e.,
K(x,y)
= K(y,x).
(6.3.7)
As we shall see, such kernels possess a number of important properties,
which collectively permit a thorough analysis of the integral equations in
question. We now discuss some basic properties of eigenvalues of a Hermitian
kernel. The natural context for our discussions will be the Hilbert space
L2 [a, b] of square integrable complex-valued functions defined on the interval [a, b]. The symbol 11·11 will be used to denote the norm and
r
{r r
(f, g) =
f(x)g(x) dx
is the inner product. In the case of square integrable complex-valued functions of two independent variables, we will write
IIKII =
IK(X,yWdXdyf/2.
(6.3.8)
Finally, we shall often write the linear transformation (6.3.6) as
q> = AKq>.
Proposition 6.3.1. Any non-null Hermitian kernel K(x,y) satisfying
must have at least one eigenvalue A.
(6.3.9)
IIKII <
00
6. Second-Order Processes
138
In other words, if the kernel K(x,y) is Hermitian, Equation (6.3.6) must
have nontrivial solutions. The next proposition is easy to prove.
Proposition 6.3.2. The eigenvalues of Hermitian kernels are real.
PROOF.
Assume that II <p II =I 0; then
)'(K<p, <p) = ()'K<p, <p) = (<p, <p) = 11<p1l2,
so that )'(K<p, <p) is positive. On the other hand, because K is Hermitian,
(K<p,<p) = (<p,K<p) = (<p,K<p) = (K<p,<p),
o
which, with the above, implies that), must be real.
Corollary 6.3.1. The eigenvalues of a Hermitian kernel K(x, y) and its conjugate
K(x, y) are identical.
Corollary 6.3.2. If the Hermitian kernel K(x,y) is positive definite, its eigenvalues are positive. As a matter of fact,
rr
Gr r
K(x, y) <p (x) <p (y) dx dy > 0
and
K(x, y)<p(y) dy >
CP(X)dX))'
o.
Consequently,
which implies that), > O.
Proposition 6.3.3. To every eigenvalue), of a Hermitian kernel K(x,y), there
corresponds at least one eigenfunction. The number of linearly independent
eigenfunctions corresponding to a given eigenvalue is finite.
Remark 6.3.1. A kernel of the type
K(x,y)
un
=
n
L h;(x)J;(y),
;=1
(6.3.10)
where {h;}~ and
c L 2 [a,b] are two families of linearly independent
functions, is called a Pincherle-Goursat kernel.
Proposition 6.3.4. Every nonzero Hermitian kernel either has a countably infinite number of eigenvalues or is a Pincherle-Goursat kernel.
We are not going to prove this proposition here. However, we shall prove
the following result.
6.4. Karhunen-Loeve Expansion
139
Proposition 6.3.5. Two eigenfunctions, cp(.) and t{!(.), of a Hermitian kernel
K(x,y), corresponding to two different eigenvalues A1 and A2' are orthogonal to
one another.
PROOF. Note that
(cp,t{!)
= (A1 K cp,t{!) = A1(Kcp,t{!) = A1(CP,Kt{!)
=
A1
;:(CP,A.2 K t{!)
2
A1
A1
2
Jl.2
= ;:(CP,A.2 K t{!) = ,(cp,t{!),
which proves the assertion.
D
6.4. Karhunen-Loeve Expansion
Let {~(t); t E [a, b]} be second-order complex-valued q.m. continuous process
defined on a probability space {n,~,p} with Eg(t)} = O. Its covariance
function C(s, t) is non-negative definite Hermitian and, according to Proposition 2.2, continuous on [a, b] x [a, b]. Unless C(s, t) is a Pincherle-Goursat
kernel, the covariance function has a countably infinite number of eigenvalues
{Ak} f (see Proposition 6.3.4) such that Ak > 0 for all k = 1,2, ... (see Corollary
6.3.2). The corresponding eigenfunctions {CPk(t)} f are continuous on [a, b]. In
addition, we will assume that
{o1
(CPi' cp) =
ifi,., j
·f·
.
1 Z= ]
for all i, j = 1, 2, ... (see Proposition 6.3.5).
Assume that C(s, t) is square integrable; then, for any fixed s E [a, b], the
Fourier series of C(s, t) with respect to s is
L Ck(t)CPk(S),
00
C(s, t) '"
k=1
where
Consequently,
00
1
C(s, t) '" k~1 Ak <Pk(t)CPk(S).
If the system of eigenfunctions {CPk(t)} 'f is complete, we would have that
00
C(s, t)
= k~1
1
.
Ak <Pk(t)CPk(S)
6. Second-Order Processes
140
in mean square (see Proposition 5.5.2). In other words,
1
L ,- q>k(t)CPk(S).
n
C(s, t) = l.i.m.
n-oo k=l Ilk
The following proposition due to Mercer (1909) holds for all square integrable continuous Hermitian kernels whose eigenvalues are all of one sign.
Proposition 6.4.1. Let get); t E [a, bJ} be a second-order q.m. continuous stochastic process with Eg(t)} = 0 and square integrable covariance function
C(s, t) with eigenvalues {Ak} f and eigerifunctions {CPk(t)} f, which form an orthonormal sequence. Then,
(6.4.1)
where the irifinite series converges absolutely and uniformly on [a, bJ x [a, b].
A proof of this proposition can be found in Riesz and Sz.-Nagy (1955).
We now embark on a proof of a result known as the Karhunen-Loeve
orthogonal expansion. As before, let
g(t);tE[a,bJ}
(6.4.2)
be a second-order q.m. continuous process with E {~(t)} = 0 and covariance
function C(s, t). According to Proposition 3.8.1, the integrals
(6.4.3)
exist for all k = 1,2, ... and represent r.v.'s. Clearly,
Eg k } =0,
k=1,2, ...
and
_
- (A;Aj)
1/2 f abfba
-
C(s, t) cpit) cp;(s) dt ds.
From the fact that
we obtain
-
(Ai)1/2 fb
EgiO = ~
a
-
{(A;/Aj )1 /2
q>is)CPi(s)ds = 0
This clearly implies that the r.v.'s {~k}f are uncorrelated.
if i = j
ifi # j.
(6.4.4)
6.4. Karhunen-Loeve Expansion
141
Proposition 6.4.2 (Karhunen-Loeve). On [a, b], the stochastic process (6.4.2)
admits an orthogonal expansion of the form
(6.4.5)
where the infinite series in (6.4.5) converges in q.m. to
PROOF.
~(t)
uniformly in t.
Consider
n
+ k~l
1
_
(Ak)1/2 IPk(t)IPk(t).
(6.4.6)
Using (6.4.3), it is not difficult to verify that
E
{W)~d =
(Ak)1/2 E
{f ~(s)~(t)IPk(S)
dS}
Similarly,
This and (6.4.6) then yield
E 1W) - kt1
(Ak~1/2 IPk(t)~k 12 = C(t, t) -
kt1
~k IPk(t)IPk(t).
By letting n -+ (fJ and invoking the Mercer theorem, we see that the series in
(6.4.5) converges in q.m. to ~(t) for every t E [a, b]. Finally, because C(t, t) is
continuous, the convergence is uniform due to Dini's theorem. This proves
the assertion.
0
EXAMPLE 6.4.1. Suppose that ξ(t) is a real Gaussian process. Then the r.v.'s ξ_k defined by (6.4.3) are normal and, as such, due to (6.4.4), they are independent. In addition, because

Σ_{k=1}^∞ E{(λ_k)^{-1/2} φ_k(t) ξ_k}² = Σ_{k=1}^∞ (1/λ_k) φ_k²(t) = C(t,t),

the series in (6.4.5) converges (a.s.).
EXAMPLE 6.4.2. Let {ξ(t); t ∈ [0,1]} be a standard Brownian motion process (see Definition 3.1.1). As we know [see (3.1.4)], its covariance function is

C(s,t) = min{s,t}.   (6.4.7)

To determine its eigenvalues and eigenfunctions, consider the integral equation

φ_k(s) − λ_k ∫_0^1 C(s,t) φ_k(t) dt = 0,   (6.4.8)

where C(s,t) is specified by (6.4.7). Now write this equation as

λ_k ∫_0^s t φ_k(t) dt + λ_k s ∫_s^1 φ_k(t) dt = φ_k(s).   (6.4.9)

Differentiating both sides with respect to s, we obtain

λ_k {s φ_k(s) + ∫_s^1 φ_k(t) dt − s φ_k(s)} = φ_k′(s)

or

φ_k′(s) = λ_k ∫_s^1 φ_k(t) dt.   (6.4.10)

Differentiating once more, we have

φ_k″(s) = −λ_k φ_k(s).   (6.4.11)

As is well known, the general solution of this second-order linear differential equation is

φ_k(s) = C₁ cos((λ_k)^{1/2} s) + C₂ sin((λ_k)^{1/2} s).   (6.4.12)

From Equation (6.4.9), it clearly follows that φ_k(0) = 0, so that C₁ = 0. On the other hand, from Equation (6.4.10), we deduce that

φ_k′(1) = 0.   (6.4.13)

From the first initial condition and (6.4.13), we have

C₂ (λ_k)^{1/2} cos((λ_k)^{1/2}) = 0,

which implies that (λ_k)^{1/2} = π(2k − 1)/2, so that

φ_k(s) = C₂ sin((π/2)(2k − 1)s),  k = 1, 2, ....   (6.4.14)

Finally, to determine the constant C₂, we use the fact that {φ_k(t)}₁^∞ are orthonormal functions, so that

∫_0^1 C₂² sin²((π/2)(2k − 1)t) dt = 1.

From this, we deduce that C₂²/2 = 1 and, thus,

C₂ = 2^{1/2}.   (6.4.15)

Finally, we have

λ_k = (k − 1/2)² π²,  k = 1, 2, ....   (6.4.16)

Therefore, according to (6.4.5),

ξ(t) = 2^{1/2} Σ_{k=1}^∞ [sin(π(k − 1/2)t) / ((k − 1/2)π)] ξ_k,   (6.4.17)

where {ξ_k}₁^∞ is a sequence of independent N(0,1) r.v.'s.
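The expansion (6.4.17) is easy to exercise numerically. The following is a minimal sketch in Python (assuming numpy is available; it is an illustration of ours, not from the text): it builds the truncated series with independent N(0,1) coefficients and checks that the variance of the truncated sum approximates C(t,t) = t.

import numpy as np

# Truncated Karhunen-Loeve expansion (6.4.17) of Brownian motion on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 501)
n_terms = 200                                 # truncation level n
k = np.arange(1, n_terms + 1)
lam_sqrt = (k - 0.5) * np.pi                  # sqrt(lambda_k) = (k - 1/2)*pi

# Columns of phi are the eigenfunctions sqrt(2)*sin((k - 1/2)*pi*t).
phi = np.sqrt(2.0) * np.sin(np.outer(t, lam_sqrt))

# Many sample paths: independent N(0,1) coefficients xi_k divided by sqrt(lambda_k).
n_paths = 2000
coeffs = rng.standard_normal((n_terms, n_paths)) / lam_sqrt[:, None]
paths = phi @ coeffs                          # truncated series at each t

# Sanity check: Var{xi(t)} should be close to C(t, t) = t (a few percent here).
print(np.abs(paths.var(axis=1) - t).max())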
6.5. Stationary Stochastic Processes

Let

{ξ(t); t ∈ R}   (6.5.1)

be a second-order complex-valued stochastic process on a probability space {Ω, ℬ, P}. The stochastic process is called "wide sense stationary" (see Definition 1.7.9) if, for all s, t, h ∈ R,

E{ξ(t + h)} = E{ξ(t)} and E{ξ(s + h) ξ̄(t + h)} = E{ξ(s) ξ̄(t)}.   (6.5.2)

In this case, E{ξ(t)} is clearly a constant, say μ, and the covariance function C(s,t) is a function of t − s, i.e.,

μ = E{ξ(t)},  C(s,t) = C(t − s).   (6.5.3)

From (6.5.3), we deduce that C(−t) = C̄(t). In what follows, we will assume, without loss of generality, that

E{ξ(t)} = 0 and Var{ξ(t)} = C(0) = 1.

The stochastic process (6.5.1), regarded as a family of complex-valued r.v.'s, is a subset of L²{Ω, ℬ, P}. Because for every t ∈ R, ξ(t) is a point in L²{Ω, ℬ, P}, the process (6.5.1) represents a curve in L²{Ω, ℬ, P}. Because

E{ξ(t)} = (ξ(t), 1) = 0 and (ξ(s), ξ(t)) = C(t − s),   (6.5.4)

this curve lies in the subspace which is the orthogonal complement (see Definition 5.4.1) to {1} in L²{Ω, ℬ, P}.
According to the theory of second-order processes, the stochastic process (6.5.1) is said to be "continuous" if it is q.m. continuous (see Definition 6.2.1), that is, if

E{|ξ(t + h) − ξ(t)|²} → 0

as h → 0 for all t ∈ (−∞, ∞). From Chebyshev's inequality, it follows that (6.5.1) is also stochastically continuous. Indeed,

P{|ξ(t + h) − ξ(t)| > ε} ≤ (1/ε²) E{|ξ(t + h) − ξ(t)|²} → 0

for all t ∈ R as h → 0. We now have the following result.

Proposition 6.5.1. A wide sense stationary stochastic process is q.m. continuous if and only if the real part of its covariance function is continuous at 0.

PROOF. For the process (6.5.1),

E{|ξ(t + h) − ξ(t)|²} = E(ξ(t + h) − ξ(t)) (ξ(t + h) − ξ(t))‾
 = E|ξ(t + h)|² + E|ξ(t)|² − E{ξ̄(t) ξ(t + h) + ξ(t + h)‾ ξ(t)}
 = 2C(0) − C(h) − C̄(h)
 = 2(C(0) − Re{C(h)}) = 2 Re{1 − C(h)}.

The proof now follows directly. □
As an example, consider the random telegraph process.

EXAMPLE 6.5.1. Let {ξ(t); t ∈ R} be a real-valued random process where, for each t ∈ R, the r.v. ξ(t) may assume only two values, −1 and +1, with

P{ξ(t) = −1} = 1/2,  P{ξ(t) = +1} = 1/2.

The sequence of transition times {T_k}_{−∞}^{∞} forms a time-homogeneous Poisson process with a parameter λ > 0. More specifically, if p_k(u) is the probability of k transitions in (t, t + u], then

p_k(u) = e^{−λu} (λu)^k / k!,  k = 0, 1, ....

Show that ξ(t) is wide sense stationary and determine its covariance function.

Demonstration: First we have, for all t ∈ R,

E{ξ(t)} = 1 · P{ξ(t) = 1} + (−1) · P{ξ(t) = −1} = 0.

On the other hand, after some straightforward calculations, we obtain

P{ξ(t) = −1, ξ(t + τ) = −1} = P{ξ(t) = 1, ξ(t + τ) = 1},
P{ξ(t) = −1, ξ(t + τ) = 1} = P{ξ(t) = 1, ξ(t + τ) = −1}.
Next, for τ > 0,

P{ξ(t + τ) = 1 | ξ(t) = 1} = Σ_{k=0}^∞ p_{2k}(τ) = e^{−λτ} ch(λτ).

Similarly,

P{ξ(t + τ) = 1 | ξ(t) = −1} = Σ_{k=0}^∞ p_{2k+1}(τ) = e^{−λτ} sh(λτ),

where

ch(x) = (e^x + e^{−x})/2,  sh(x) = (e^x − e^{−x})/2.

From this, we obtain

C(t, t + τ) = E{ξ(t) ξ(t + τ)}
 = P{ξ(t) = 1, ξ(t + τ) = 1} − P{ξ(t) = −1, ξ(t + τ) = 1} − P{ξ(t) = 1, ξ(t + τ) = −1} + P{ξ(t) = −1, ξ(t + τ) = −1}
 = e^{−λτ}{ch(λτ) − sh(λτ)} = e^{−2λτ}.

If, however, τ < 0,

C(t + τ, t) = e^{2λτ}.

Therefore,

C(t, t + τ) = C(τ) = e^{−2λ|τ|}.

This shows that the stochastic process ξ(t) is wide sense stationary. Figure 6.1(a) is a graphical depiction of a sample function and Figure 6.1(b) a depiction of the covariance function C(τ) of the process ξ(t).

[Figure 6.1: (a) a sample function of the random telegraph process; (b) its covariance function C(τ) = e^{−2λ|τ|}.]
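The computation above is easily checked by simulation. A minimal sketch in Python (assuming numpy; the fine-grid discretization of the Poisson transition times is our own simplification, not from the text):

import numpy as np

# Simulate the random telegraph signal on a fine grid and compare its
# empirical covariance with C(tau) = exp(-2*lambda*|tau|).
rng = np.random.default_rng(1)
lam = 1.5                       # transition rate lambda > 0
T, dt = 2000.0, 0.01
n = int(T / dt)

# A transition occurs in each small step with probability ~ lam*dt;
# xi flips sign at each transition and starts at +1 or -1 with prob 1/2.
flips = rng.random(n) < lam * dt
xi = np.where(np.cumsum(flips) % 2 == 0, 1.0, -1.0)
xi *= rng.choice([-1.0, 1.0])

for tau in (0.0, 0.5, 1.0):
    k = int(tau / dt)
    emp = np.mean(xi[: n - k] * xi[k:])    # time-average estimate of C(tau)
    print(f"tau={tau:3.1f}  empirical={emp:+.3f}  exact={np.exp(-2*lam*tau):+.3f}")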
6.6. Remarks on the Ergodicity Property
Consider an arbitrary random process ξ(t) and suppose that we want to determine some characteristics of the process, such as its mean, variance, and covariance function. For the first two characteristics, one needs one-dimensional marginal distributions of the process. The evaluation of the covariance function requires bivariate marginal distributions of ξ(t).
In the case when these distributions are not available, estimates of these characteristics can be obtained if enough sample functions of the process ξ(t) are available.

Large classes of stationary random processes have an interesting property known as "ergodicity." This feature, roughly speaking, means that on the basis of a single realization over a sufficiently long time interval, we can obtain an estimate of the mean of the process and, possibly, of some other parameters. More specifically, if ξ(t) is an ergodic stationary process, then the time average is equal to the ensemble average, namely,

lim_{T→∞} (1/2T) ∫_{−T}^{T} ξ(t) dt = E{ξ(s)} = α,   (6.6.1)

where the convergence is in the q.m. sense. In general, if

lim_{T→∞} (1/2T) ∫_{−T}^{T} f(ξ(t)) dt = E{f(ξ(s))}   (6.6.2)

holds for every Borel function f(·) such that

E|f(ξ(t))| < ∞,

we say that the process is ergodic. From this, we also obtain that

C(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} ξ(t) ξ(t + τ) dt.   (6.6.3)
In general, it is not easy to give a simple sufficient condition for ergodicity. For Gaussian processes, however, we have the following result. Let {ξ(t); −∞ < t < ∞} be Gaussian and stationary with

E{ξ(t)} = α,  R(τ) = Cov{ξ(t), ξ(t + τ)}.

Then ξ(t) is ergodic if

∫_{−∞}^{∞} |R(τ)| dτ < ∞.   (6.6.4)

Sometimes it may happen that (6.6.2) holds only for some functions f(·). For instance, if it holds for linear functions only, we say that the process is ergodic with respect to its mean. This question can also be asked in a somewhat more direct way. Again, let {ξ(t); −∞ < t < ∞} be a stationary process; we are interested in a sufficient condition for

(1/2T) ∫_{−T}^{T} ξ(t) dt → α in q.m.,   (6.6.5)

where α = E{ξ(t)}. In other words, we want to know when

lim_{T→∞} E{(1/2T) ∫_{−T}^{T} ξ(t) dt − α}² = 0.   (6.6.6)
Clearly,

E{(1/2T) ∫_{−T}^{T} ξ(t) dt − α}² = (1/(2T)²) ∫_{−T}^{T} ∫_{−T}^{T} (E{ξ(t)ξ(s)} − α²) dt ds
 = (1/(2T)²) ∫_{−T}^{T} ∫_{−T}^{T} R(t − s) dt ds.

We change the variables of integration by putting

τ = t − s,  z = t + s,  t = (z + τ)/2,  s = (z − τ)/2.

The absolute value of the Jacobian of the transformation is 1/2. On the other hand, the domains of integration are shown in Figure 6.2. Therefore, we have

(1/(2T)²) ∫_{−T}^{T} ∫_{−T}^{T} R(t − s) dt ds
 = (1/(2(2T)²)) { ∫_{−2T}^{0} ∫_{−τ−2T}^{τ+2T} R(τ) dz dτ + ∫_{0}^{2T} ∫_{τ−2T}^{−τ+2T} R(τ) dz dτ }

[Figure 6.2: the square |t| ≤ T, |s| ≤ T mapped to the rhombus |τ| + |z| ≤ 2T in the (τ, z)-plane.]
 = (1/(2(2T)²)) ( ∫_{−2T}^{0} R(τ)(4T + 2τ) dτ + ∫_{0}^{2T} R(τ)(4T − 2τ) dτ )
 = (1/2T) ∫_{−2T}^{2T} R(τ) (1 − |τ|/2T) dτ
 ≤ (1/2T) ∫_{−2T}^{2T} |R(τ)| (1 − |τ|/2T) dτ
 ≤ (1/2T) ∫_{−2T}^{2T} |R(τ)| dτ ≤ (1/2T) ∫_{−∞}^{∞} |R(τ)| dτ,

and the last bound tends to zero as T → ∞. Thus, (6.6.5) holds if

∫_{−∞}^{∞} |R(τ)| dτ < ∞,

which is a simple sufficient condition for ergodicity with respect to the mean.
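A minimal numerical illustration of (6.6.5), not from the text: for the random telegraph signal of Example 6.5.1, R(τ) = e^{−2λ|τ|} is absolutely integrable, so the time averages should shrink toward α = 0 as T grows (Python, assuming numpy):

import numpy as np

# Time averages of the random telegraph signal over growing windows [-T, T].
rng = np.random.default_rng(2)
lam, dt = 1.0, 0.01

for T in (10.0, 100.0, 1000.0):
    n = int(2 * T / dt)
    flips = rng.random(n) < lam * dt
    xi = np.where(np.cumsum(flips) % 2 == 0, 1.0, -1.0) * rng.choice([-1.0, 1.0])
    time_avg = xi.mean()      # grid version of (1/2T) * int_{-T}^{T} xi(t) dt
    print(f"T={T:6.0f}   time average = {time_avg:+.4f}")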
Problems and Complements

6.1. Show that the following functions are non-negative definite:
(i) R₁(s,t) = min(s,t),  s, t ≥ 0;
(ii) R₂(s,t) = 1 − |t − s| if |t − s| ≤ 1, and 0 if |t − s| > 1,  s, t ∈ R;
(iii) R₃(s,t) = min(s,t) − st,  s, t ∈ [0,1];
(iv) R₄(s,t) = exp{−|t − s|},  s, t ∈ R.
6.2. Let C(s,t) be a covariance function and P_n(·) a polynomial with positive coefficients. Show that

C₁(s,t) = P_n(C(s,t))

is also a covariance function.
6.3. Consider the stochastic process {ξ(t); t ≥ 0}, where

ξ(t) = X cos(ηt + θ)

and X > 0, η > 0 are r.v.'s independent of θ with finite second moments, whereas θ is uniform on [0, 2π]. Show that ξ(t) is stationary.
6.4. Let h₁(t), ..., h_n(t) be real functions and a₁, ..., a_n positive constants. Show that

C(s,t) = Σ_{k=1}^n a_k h_k(s) h_k(t)

is a covariance function.
6.5. Let {ξ(t); t ≥ 0} be a stochastic process defined by

ξ(t) = α sin(βt + X),

where α > 0 and β > 0 are constants and X ~ N(0,1). Determine E{ξ(t)} and E{ξ(s)ξ(t)}. Is the process wide sense stationary?
6.6. Let {X_n} ⊂ L² and X_n → X (q.m.). Is X ∈ L²? Show that
(i) EX_n → EX;
(ii) E|X_n|² → E|X|²;
(iii) (X_n, X) → (X, X).

6.7. Let {Z_n}, {U_n} ⊂ L² be such that Z_n → Z (q.m.) and U_n → U (q.m.). Show that EZ_nU_n → EZU.
6.8. Let {X_n} ⊂ L². Find a condition under which X_n → X (q.m.) implies X_n → X (a.s.).
6.9. Let {ξ(t); t ∈ T} ⊂ L² be a random process and t₀ ∈ T. Show that ξ(t) → Z ∈ L² (q.m.) as t → t₀ if and only if, for all {t_n}₁^∞, {s_n}₁^∞ such that t_n → t₀ and s_n → t₀, we have

E{ξ(t_n) ξ(s_n)} → c₀ (constant).
6.10. If {ξ(t); t ∈ [a,b]} ⊂ L² is L²-continuous, show that, in L²,

(d/dt) ∫_a^t ξ(s) ds = ξ(t),  a ≤ t ≤ b.
6.11. Find the Karhunen–Loève expansion on the interval [0,1] of an L² process with covariance function

C(s,t) = st.
6.12. Determine the eigenvalues and eigenfunctions of the Fredholm homogeneous integral equation

φ(x) − λ ∫_0^π C(x,t) φ(t) dt = 0,

where

C(x,t) = cos x sin t for 0 ≤ x ≤ t, and cos t sin x for t ≤ x ≤ π.
6.13. Let C(s,t) be given by

C(s,t) = s(t − 1) for 0 ≤ s ≤ t, and t(s − 1) for t ≤ s ≤ 1.

Find the eigenvalues and complete orthonormal eigenfunctions.
6.14. Consider (0 ≤ x, y ≤ 1)

K(x,y) = min(x,y) − xy.

Show that the kernel is non-negative definite and find its eigenvalues and eigenfunctions.
6.15. Determine the eigenvalues and eigenfunctions of the symmetric kernel

K(s,t) = t(s + 1) for t ≤ s ≤ 1, and s(t + 1) for s ≤ t ≤ 1.
CHAPTER 7
Spectral Analysis of Stationary Processes
7.1. Preliminaries

Let {ξ(t); t ∈ R} be a wide sense stationary, complex-valued random process with E{ξ(t)} = 0 and

C(t) = E{ξ̄(s) ξ(s + t)}.   (7.1.1)

In this chapter, we will continue to build a theory of this particular class of stochastic processes based on the covariance function alone, using methods discussed in Chapter 5.

As we have established in Chapter 6, the covariance C(t) is a non-negative definite complex function of a real argument. In this section, we will see that the class of such functions coincides with the class of complex functions of real argument ψ(t) which can be written as

ψ(t) = Kψ₀(t),  K > 0,   (7.1.2)

where ψ₀(t) is the characteristic function of a real-valued r.v. Consequently, every covariance function should have a representation of the form

ψ(t) = ∫_{−∞}^{∞} e^{itx} dF(x),   (7.1.3)

where F(·) ≥ 0 is a nondecreasing bounded function on R. Let us first prove the following result.
Proposition 7.1.1. The covariance function C(t) is continuous on R if it is continuous at t = 0.

PROOF. Because C(t) is non-negative definite,

Σ_{i=1}^n Σ_{j=1}^n C(t_i − t_j) z_i z̄_j ≥ 0.   (7.1.4)

Set n = 3, z₁ = 1, z₂ = z, z₃ = −z, t₁ = 0, t₂ = u, t₃ = v in (7.1.4) to obtain

C(0) + z̄C(−u) − z̄C(−v) + zC(u) + |z|²C(0) − |z|²C(u − v)
 − zC(v) − |z|²C̄(u − v) + |z|²C(0)
 = C(0) + 2 Re{z[C(u) − C(v)]} + 2|z|²[C(0) − Re{C(u − v)}] ≥ 0.

Writing

C(u) − C(v) = |C(u) − C(v)| e^{iθ},  z = x e^{−iθ}, x real,

the last inequality becomes

C(0) + 2x|C(u) − C(v)| + 2x²[C(0) − Re{C(u − v)}] ≥ 0.

Because this holds for all x ∈ R, the discriminant cannot be positive, so that

|C(u) − C(v)|² ≤ 2C(0)[C(0) − Re{C(u − v)}].   (7.1.5)

Now, by assumption, C(·) is continuous at 0. Thus, because in the last inequality the right-hand side tends to zero as u → v, so does the left-hand side. This proves the assertion. □
Remark 7.1.1. From (7.1.5), we see that C(t) is uniformly continuous on R if it is continuous at zero.
The following is the celebrated Bochner-Khinchin theorem.
Proposition 7.1.2. A complex-valued function C(t) defined on R and continuous at zero is the covariance function of a wide sense stationary stochastic process if and only if it can be written in the form

C(t) = ∫_{−∞}^{∞} e^{itx} dF(x),   (7.1.6)

where F(·) is a real nondecreasing bounded function on R.

The function F(·) is referred to as the spectral distribution of the process ξ(t). It is uniquely defined up to an additive constant, and we can always suppose that

F(−∞) = 0,  F(+∞) = C(0).   (7.1.7)

In addition, we assume that F(·) is right-continuous. If F(·) is absolutely continuous, the derivative

f(·) = F′(·)

exists and is called the spectral density of the process.
Remark 7.1.2. Let φ(·): R → R₊ = [0, ∞) be a continuous symmetric function such that φ(0) = 1 and φ(t) → 0 as t → ∞. In addition, if φ(·) is convex on R₊, then φ(·) is a characteristic function. This result is due to Pólya. It is useful in establishing whether a certain function is non-negative definite.

EXAMPLE 7.1.1. Let us see if the following functions are non-negative definite:

(i) B₀(t) = 1 − |t| if |t| ≤ 1, and 0 if |t| > 1;
(ii) B₁(t) = e^{−|t|};
(iii) B₂(t) = e^{|t|}.

Clearly, B₀(t) satisfies all the conditions of the Pólya theorem and, therefore, represents a characteristic function. The same conclusion holds for B₁(t). However, B₂(t) is not non-negative definite.
EXAMPLE 7.1.2. Let {Z_k}₁^n ⊂ L²{Ω, ℬ, P} be an orthonormal family of complex-valued r.v.'s with E{Z_k} = 0, k = 1, ..., n; then

ξ(t) = Σ_{k=1}^n Z_k exp(iλ_k t),  λ_k real,

is a wide sense stationary stochastic process. Its covariance function is

E{ξ̄(s) ξ(s + t)} = (ξ(s), ξ(s + t)) = Σ_{k=1}^n exp(iλ_k t).
The next proposition gives the spectral representation of a covariance function in the discrete case.

Proposition 7.1.3 (Herglotz's Theorem). Let C(n) be the covariance function of a wide sense stationary sequence {ξ_n}_{−∞}^{∞} with E{ξ_n} = 0; then

C(n) = ∫_{−π}^{π} e^{iλn} dF(λ),   (7.1.8)

where F(·) is a bounded nondecreasing function with support [−π, π].

The last two propositions will be proved in the next section.
Remark 7.1.3. If the covariance function C(t) is absolutely integrable on (−∞, ∞), i.e., if

∫_{−∞}^{∞} |C(t)| dt < ∞,   (7.1.9)

then, as is known, F(·) is absolutely continuous and

f(x) = (1/2π) ∫_{−∞}^{∞} e^{−ixt} C(t) dt.   (7.1.10)
If (7.1.9) does not hold, we have, for any x₁ < x₂,

F(x₂) − F(x₁) = (1/2π) lim_{T→∞} ∫_{−T}^{T} [(exp(−itx₁) − exp(−itx₂))/it] C(t) dt.   (7.1.11)
Remark 7.1.4. If F(·) is a symmetric function, i.e., if

F(x) = C(0) − F(−x + 0),   (7.1.12)

the covariance function is real and

C(t) = ∫_{−∞}^{∞} (cos tx) dF(x).   (7.1.13)

In addition, if (7.1.9) holds,

f(x) = (1/π) ∫_0^∞ C(t) (cos tx) dt.   (7.1.14)

A covariance function is real if and only if (7.1.12) holds.
EXAMPLE 7.1.3. For the random telegraph process (see Example 6.5.1), we have established that

C(t) = e^{−2λ|t|},  −∞ < t < ∞.

This is clearly an absolutely integrable function, so we can use formula (7.1.14):

f(x) = (1/π) ∫_0^∞ e^{−2λt} cos(xt) dt.

Integrating by parts twice, we obtain

∫_0^∞ e^{−2λt} cos(xt) dt = (1/2λ) (1 − (x²/2λ) ∫_0^∞ e^{−2λt} cos(xt) dt),

so that ∫_0^∞ e^{−2λt} cos(xt) dt = 2λ/(4λ² + x²). Therefore,

f(x) = (1/π) · 2λ/(4λ² + x²).
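As a quick numerical sanity check (ours, not from the text), one can compare the closed form just obtained with a direct evaluation of the defining integral in (7.1.14); the sketch below assumes scipy is available:

import numpy as np
from scipy.integrate import quad

# Spectral density of the random telegraph process:
# f(x) = (1/pi) * 2*lambda / (4*lambda**2 + x**2), checked against (7.1.14).
lam = 1.5
for x in (0.0, 1.0, 3.0):
    num, _ = quad(lambda t: np.exp(-2 * lam * t) * np.cos(x * t), 0, np.inf)
    closed = 2 * lam / (4 * lam**2 + x**2)
    print(x, num / np.pi, closed / np.pi)      # the two columns agree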
7.2. Proof of the Bochner–Khinchin and Herglotz Theorems

We shall first prove the Bochner–Khinchin theorem. For this purpose, we need the following lemma.

Lemma 7.2.1. Let φ(·) be bounded and integrable on [−T, T] and such that, for all x ∈ R,

f_T(x) = ∫_{−T}^{T} e^{−itx} φ(t) dt ≥ 0.   (7.2.1)

Then f_T(x) is integrable on R = (−∞, ∞).
PROOF. From (7.2.1), it is clear that f_T(·) is continuous and, therefore, integrable on every finite interval. Next, set

G_T(x) = ∫_{−x}^{x} f_T(u) du.   (7.2.2)

Because f_T(·) ≥ 0, G_T(x₁) ≤ G_T(x₂) for every x₁ ≤ x₂. We must show that G_T(·) is bounded. To this end, set (s > 0)

Φ_T(s) = (1/s) ∫_s^{2s} G_T(x) dx ≥ G_T(s).

Consequently, if Φ_T(·) is bounded, so is G_T(·). From (7.2.1) and (7.2.2), we have

G_T(x) = ∫_{−x}^{x} du ∫_{−T}^{T} e^{−itu} φ(t) dt = 2 ∫_{−T}^{T} (sin tx / t) φ(t) dt.

Therefore,

Φ_T(s) = (2/s) ∫_s^{2s} dx ∫_{−T}^{T} (sin tx / t) φ(t) dt
 = 2 ∫_{−T}^{T} (1/(st²)) {cos st − cos 2st} φ(t) dt
 = 2 ∫_{−T}^{T} (1/(st²)) {1 − 2 sin²(st/2) − 1 + 2 sin²(st)} φ(t) dt.

Set M = sup_t |φ(t)|; then, clearly,

|Φ_T(s)| ≤ 6M ∫_{−∞}^{∞} (sin x / x)² dx,

which proves the lemma. □
We now prove the Bochner–Khinchin theorem. First, let us show that the function C(t) defined by (7.1.6) is non-negative definite. Indeed,

Σ_{j=1}^n Σ_{k=1}^n C(t_j − t_k) z_j z̄_k = Σ_{j=1}^n Σ_{k=1}^n ( ∫_{−∞}^{∞} exp[i(t_j − t_k)x] dF(x) ) z_j z̄_k
 = ∫_{−∞}^{∞} ( Σ_{j=1}^n exp(it_j x) z_j ) ( Σ_{k=1}^n exp(it_k x) z_k )‾ dF(x)
 = ∫_{−∞}^{∞} | Σ_{j=1}^n exp(it_j x) z_j |² dF(x) ≥ 0.
[Figure 7.1: the square (u, v) ∈ [0, T]² and its image under the change of variables t = u, y = u − v.]
Next, let us show that any non-negative definite function which is continuous at zero has a representation of the form (7.1.6). To this end, consider

f_T(x) = (1/2πT) ∫_0^T ∫_0^T C(u − v) exp[−i(u − v)x] du dv.   (7.2.3)

Clearly, f_T(x) ≥ 0 because the double integral is a limit of sums of the form

Σ_j Σ_k C(u_j − u_k) e^{−iu_j x} e^{iu_k x} Δu Δv ≥ 0.

Let us now make the change of variables in the double integral (7.2.3) as follows:

t = u,  y = u − v,  du dv = |J| dt dy = dt dy,

where J is the Jacobian of the transformation. From this we obtain (see Figure 7.1)

f_T(x) = (1/2πT) ( ∫_{−T}^{0} dy ∫_{−y}^{T} C(y) e^{−iyx} dt + ∫_{0}^{T} dy ∫_{0}^{T−y} C(y) e^{−iyx} dt )
 = (1/2πT) ( ∫_{−T}^{0} C(y) e^{−iyx} (T + y) dy + ∫_{0}^{T} C(y) e^{−iyx} (T − y) dy )
 = (1/2π) ∫_{−T}^{T} C(y) (1 − |y|/T) e^{−iyx} dy.

According to Lemma 7.2.1, f_T(x) is integrable over (−∞, ∞). Therefore, the inverse Fourier transform of f_T(x) exists and is equal to

C(y)(1 − |y|/T) = ∫_{−∞}^{∞} f_T(x) e^{ixy} dx for all |y| ≤ T.   (7.2.4)

From (7.2.4), we have

C(0) = ∫_{−∞}^{∞} f_T(x) dx
for all T > 0. Because

C(y)(1 − |y|/T) → C(y) as T → ∞

uniformly on every finite interval, and

f_T(x) → f(x) as T → ∞,

we obtain that

C(y) = ∫_{−∞}^{∞} f(x) e^{ixy} dx,

which proves the proposition when F(x) in (7.1.6) is absolutely continuous. In a similar fashion, one can show that it holds in the general case.
PROOF OF THE HERGLOTZ THEOREM. Clearly, C(n) given by Equation (7.1.8) is non-negative definite. For each n = 1, 2, ... and x ∈ [−π, π], define f_n(x) by

f_n(x) = (1/2πn) Σ_{k=1}^{n} Σ_{v=1}^{n} C(k − v) e^{−i(k−v)x};

then f_n(·) ≥ 0. Because there are n − |m| couples (k, v) for which k − v = m, we have

f_n(x) = (1/2π) Σ_{r=−n+1}^{n−1} (1 − |r|/n) C(r) e^{−ixr}.   (7.2.5)

Set

F_n(x) = ∫_{−π}^{x} f_n(u) du.

From (7.2.5) and the fact that

∫_{−π}^{π} e^{ixk} dx = 0 if k ≠ 0,

it follows that

∫_{−π}^{π} f_n(u) du = C(0).

Therefore, F_n(·) is a nondecreasing continuous function. Also from (7.2.5),

∫_{−π}^{π} e^{iλr} dF_n(λ) = (1 − |r|/n) C(r) for |r| < n.   (7.2.6)

By the Helly compactness theorem, there is a subsequence of {F_n} which converges weakly to a bounded distribution F and, hence, converges completely, because F_n(−π) = 0 and F_n(π) = C(0) for all n. Equation (7.1.8) then holds as a consequence of the second Helly theorem and (7.2.6). □
7.3. Random Measures

In this section, we define the concept of an orthogonal random measure and of a stochastic integral with respect to it. We begin with some notation and definitions needed for this purpose. Let {Ω, ℬ, P} be a probability space and {S, 𝒮} an arbitrary measurable space. Let 𝒮₀ be an algebra which generates the σ-algebra 𝒮.

Definition 7.3.1. A mapping

η: 𝒮₀ → L²{Ω, ℬ, P}   (7.3.1)

satisfying the conditions

(i) η(∅) = 0, where ∅ is the empty set,
(ii) η(A ∪ B) = η(A) + η(B) (a.s.) for any disjoint A, B ∈ 𝒮₀   (7.3.2)

is called an elementary random (or stochastic) measure.

From the definition it is clear that

m(A) = ‖η(A)‖² < ∞ for any A ∈ 𝒮₀.   (7.3.3)
Definition 7.3.2. An elementary random measure is said to be orthogonal if

(η(A), η(B)) = 0 for all disjoint A, B ∈ 𝒮₀.   (7.3.4)

For an elementary orthogonal random measure, the set function m(·) on 𝒮₀ defined by (7.3.3) is finitely additive. Indeed, if A₁, A₂ ∈ 𝒮₀ are such that A₁ ∩ A₂ = ∅, then

m(A₁ ∪ A₂) = ‖η(A₁ ∪ A₂)‖² = ‖η(A₁) + η(A₂)‖²

(due to (7.3.2.ii)). Now, due to (7.3.4),

‖η(A₁) + η(A₂)‖² = ‖η(A₁)‖² + ‖η(A₂)‖² = m(A₁) + m(A₂).

Assume now that m(·) is subadditive; then it can be extended to a measure on {S, 𝒮}. This extension, which is unique, will again be denoted by m. Then m(·) is called the measure associated with η(·). In the following, we will denote by

L²(m) = L²{S, 𝒮, m}   (7.3.5)

the Hilbert space of complex-valued functions on S which are square integrable with respect to m.
Let {B_k}₁^n ⊂ 𝒮₀ be disjoint and consider

h(s) = Σ_{k=1}^n c_k I_{B_k}(s),   (7.3.6)

where c₁, ..., c_n are complex numbers. Define

ψ(h) = ∫_S h(s) η(ds) = Σ_{k=1}^n c_k η(B_k).   (7.3.7)

This integral is well defined and, clearly, ψ(h) ∈ L²{Ω, ℬ, P}. If

f(s) = Σ_{i=1}^n α_i I_{B_i}(s),

where α₁, ..., α_n are complex numbers, then

(ψ(h), ψ(f)) = Σ_{j=1}^n c_j ᾱ_j m(B_j) = ∫_S h(s) f̄(s) m(ds) = (h, f).   (7.3.8)
The last inner product is defined in L²(m) [see (7.3.5)]. Thus, the mapping ψ preserves the inner product.

Denote by L*² ⊂ L²(m) the set of all functions on S of type (7.3.6). The integral (7.3.7) represents a linear mapping of L*² to L²{Ω, ℬ, P}. Because the inner product is preserved, the mapping is clearly isometric. Now, let g ∈ L²(m) be arbitrary and let {h_n}₁^∞ be a sequence of functions of type (7.3.6) such that

‖g − h_n‖ → 0 as n → ∞.   (7.3.9)

Then, due to (7.3.8),

‖ψ(h_n) − ψ(h_m)‖ = ‖h_n − h_m‖ → 0 as m, n → ∞,   (7.3.10)

which shows that {ψ(h_n)}₁^∞ is a Cauchy sequence. By the Riesz–Fischer theorem (Proposition 5.2.4), there exists a r.v., say ψ(g) ∈ L²{Ω, ℬ, P}, such that

‖ψ(g) − ψ(h_n)‖ → 0 as n → ∞.

This r.v. is called the stochastic integral of g ∈ L²(m) with respect to the elementary orthogonal random measure η. We denote it by

ψ(g) = ∫_S g(s) η(ds).   (7.3.11)

Thus, the mapping

ψ: L²(m) → L²{Ω, ℬ, P}   (7.3.12)

represents an isometric isomorphism. The following properties of ψ are straightforward consequences of its construction: For any f, g ∈ L²(m),

(i) (ψ(f), ψ(g)) = (f, g);  ‖ψ(f)‖² = ‖f‖² = ∫_S |f(s)|² m(ds);
(ii) ψ(c₁f + c₂g) = c₁ψ(f) + c₂ψ(g) (a.s.);
(iii) ‖ψ(f_n) − ψ(f)‖ → 0 if ‖f − f_n‖ → 0 as n → ∞.   (7.3.13)
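A minimal numerical sketch of this construction (ours, not from the text): take S = [0,1] and let η be Gaussian white noise with control measure m = Lebesgue measure, so that η((a,b]) ~ N(0, b − a) and disjoint intervals give orthogonal values. The Gaussian choice of η is an assumption made for the simulation only; the isometry in (7.3.13.i) then shows up by Monte Carlo (Python, assuming numpy):

import numpy as np

# Approximate psi(g) = int_S g(s) eta(ds) by the simple-function sum (7.3.7).
rng = np.random.default_rng(3)
n_bins, n_samples = 1000, 20000
s = (np.arange(n_bins) + 0.5) / n_bins           # midpoints of the bins B_k
g = np.sin(2 * np.pi * s)                        # integrand g, sampled on the bins

# eta(B_k) ~ N(0, m(B_k)) independently across bins, for each Monte Carlo sample
eta = rng.standard_normal((n_samples, n_bins)) * np.sqrt(1.0 / n_bins)
psi_g = eta @ g                                  # sum_k g(s_k) * eta(B_k)

print(psi_g.var())          # Monte Carlo estimate of E|psi(g)|^2
print(np.mean(g**2))        # int_0^1 |g(s)|^2 ds = 1/2; the two agree closely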
Can an elementary orthogonal stochastic measure η, which is defined on 𝒮₀, be extended to 𝒮? The answer is affirmative, and we will use the definition of the integral (7.3.11) to prove it. For A ∈ 𝒮, define

η̃(A) = ∫_S I_A(s) η(ds).   (7.3.14)

It follows from (7.3.13.ii) that

η̃(A₁ ∪ A₂) = η̃(A₁) + η̃(A₂) (a.s.)

for A₁, A₂ ∈ 𝒮 such that A₁ ∩ A₂ = ∅. From (7.3.13.i), we also have

‖η̃(A)‖² = m(A),  A ∈ 𝒮.   (7.3.15)

Let us show that η̃ is countably additive in the q.m. sense. Let {B_k}₁^∞ ⊂ 𝒮 be disjoint and let B = ∪₁^∞ B_k. Then,

η̃(B) − Σ_{k=1}^n η̃(B_k) = ψ(I_B) − Σ_{k=1}^n ψ(I_{B_k}) = Σ_{k=n+1}^∞ ψ(I_{B_k}).

However,

‖Σ_{k=n+1}^∞ ψ(I_{B_k})‖² = Σ_{k=n+1}^∞ m(B_k) → 0 as n → ∞.

Consequently,

‖η̃(B) − Σ_{k=1}^n η̃(B_k)‖ → 0 as n → ∞.

From (7.3.13.i), we also have

(η̃(A₁), η̃(A₂)) = 0 if A₁, A₂ ∈ 𝒮 are disjoint.

The mapping

η̃: 𝒮 → L²{Ω, ℬ, P}

is countably additive in the q.m. sense and coincides with η on 𝒮₀. We will call η̃ an orthogonal stochastic measure and write from now on η instead of η̃. We also call

ψ(g) = ∫_S g(s) η(ds),

defined above, a stochastic integral with respect to η.
7.4. Process with Orthogonal Increments

In the rest of this chapter, {S, 𝒮, m} = {R, ℛ, m}, where R is the real line and ℛ is the σ-algebra of Borel subsets of R. The purpose of this section is to show that orthogonal random measures are generated by processes with orthogonal increments, and vice versa. Let {Z(t); t ∈ R} ⊂ L²{Ω, ℬ, P}, with E{Z(t)} = 0 for all t ∈ R.

Definition 7.4.1. The stochastic process {Z(t); t ∈ R} is said to have orthogonal increments if, for any t₀ < t₁ < t₂,

E(Z(t₁) − Z(t₀)) (Z(t₂) − Z(t₁))‾ = 0.   (7.4.1)

In the following, we shall say that the process is right-continuous in the q.m. sense if, for every fixed t ∈ R,

‖Z(t) − Z(t_k)‖ → 0 as t_k ↓ t.   (7.4.2)

Let us now show that an orthogonal stochastic measure η generates a process having orthogonal increments. Let m be the finite measure associated with η. Set

Z(t) = η((−∞, t]).   (7.4.3)

Then

‖Z(t)‖² = m((−∞, t]) = F(t)   (7.4.4)

and, for all s < t,

‖Z(t) − Z(s)‖² = ‖η((s, t])‖² = m((s, t]).   (7.4.5)

From this, we clearly have that Z(t) is right-continuous in the q.m. sense. In addition, the function F(·) defined by (7.4.4) is also right-continuous. Finally, because

E(Z(t₁) − Z(t₀)) (Z(t₂) − Z(t₁))‾ = (η((t₀, t₁]), η((t₁, t₂])) = 0,

the process Z(t) defined by (7.4.3) has orthogonal increments.
Conversely, let {Z(t); t ∈ R} be as in Definition 7.4.1 and, for any s < t, define

F(t) − F(s) = ‖Z(t) − Z(s)‖².   (7.4.6)

Because the process has orthogonal increments, for any s < τ < t,

‖Z(t) − Z(s)‖² = ‖Z(t) − Z(τ) + Z(τ) − Z(s)‖²
 = ‖Z(t) − Z(τ)‖² + ‖Z(τ) − Z(s)‖² ≥ ‖Z(τ) − Z(s)‖².

Therefore,

F(t) ≥ F(τ),   (7.4.7)

which implies that F(·) is a nondecreasing function on R. Finally, from (7.4.6), we see that Z(t) is q.m. continuous at t ∈ R if and only if F(t) is continuous at t. In the following, we assume that

F(−∞) = 0 and F(+∞) < ∞.   (7.4.8)

Let m(·) be the Lebesgue–Stieltjes measure that F generates. We see that, for any interval (s, t],

m((s, t]) = F(t) − F(s) = ‖Z(t) − Z(s)‖².

Therefore, if 𝒜 is the algebra of sets

A = ∪_{k=1}^n (a_k, b_k],

where the (a_k, b_k] are disjoint, and

η(A) = Σ_{k=1}^n [Z(b_k) − Z(a_k)],   (7.4.9)

it is evident that

m(A) = Σ_{k=1}^n [F(b_k) − F(a_k)]   (7.4.10)

and

(η((a_i, b_i]), η((a_j, b_j])) = 0 for all i ≠ j.

This clearly implies that η(A) defined by (7.4.9) represents an elementary orthogonal stochastic measure. The set function m(·) defined by (7.4.10) has a unique extension to a measure on ℛ. From this and the preceding section, we see that η(·) can also be extended to ℛ and that

‖η(A)‖² = m(A),  A ∈ ℛ.

Consequently, there exists an isomorphism between processes {Z(t); t ∈ R} ⊂ L²{Ω, ℬ, P} with orthogonal increments, continuous from the right in the q.m. sense and such that

‖Z(t)‖² = F(t),  F(−∞) = 0,  F(+∞) < ∞,

and orthogonal stochastic measures η with m(·), the measure associated with it. The correspondence is given by

Z(t) = η((−∞, t]),  F(t) = m((−∞, t])

and

η((s, t]) = Z(t) − Z(s),  m((s, t]) = F(t) − F(s).
Finally, the stochastic integral

∫_{−∞}^{∞} h(t) dZ(t)   (7.4.11)

will mean the stochastic integral

∫_{−∞}^{∞} h(t) η(dt).   (7.4.12)
7.5. Spectral Representation

This section is concerned with the spectral representation of a wide sense stationary stochastic process. Mathematically speaking, the spectral representation establishes an isometric isomorphism between the closed linear manifold spanned by the process and a certain Hilbert space L²{R, ℛ, F} of complex functions. Such a representation is often useful in a large class of scientific and engineering problems, for example, guidance and control, signal detection, image enhancement, and so on.

Let {ξ(t); t ∈ R} ⊂ L²{Ω, ℬ, P} be a wide sense stationary stochastic process with

E{ξ(t)} = 0,  C(t) = E(ξ̄(s) ξ(s + t)).   (7.5.1)

If the covariance function C(t) is continuous at zero, then, by Proposition 7.1.2, there exists a unique nondecreasing bounded function F(·) on R such that

C(t) = ∫_{−∞}^{∞} e^{itx} dF(x).   (7.5.2)

The function F(·), which we assume to be continuous from the right, is called the "spectral distribution" of the process ξ(t). The set of the discontinuity points of F(·), which is always finite or countably infinite, is called the "spectrum" of the process ξ(t).

Proposition 7.5.1. Assume that the covariance function C(t) is continuous at zero. There exists a unique orthogonal stochastic measure η with values in L²{Ω, ℬ, P} such that

ξ(t) = ∫_{−∞}^{∞} e^{iλt} η(dλ) (a.s.).   (7.5.3)

Furthermore,

‖η(A)‖² = m(A) = ∫_A dF for all A ∈ ℛ,   (7.5.4)

where m(·) is the Lebesgue–Stieltjes measure associated with η and generated by the spectral distribution F.
PROOF. There are various ways to prove this proposition. The method adopted here is a version of Stone's spectral representation theorem for a family of unitary operators in a Hilbert space.

Let L²(m) = L²{R, ℛ, m} be the Hilbert space of complex square integrable functions on R with the inner product

(f, h) = ∫_{−∞}^{∞} f(λ) h̄(λ) m(dλ).   (7.5.5)

Consider the system of complex functions

{e^{iλt}; t ∈ R} ⊂ L²{R, ℛ, m}.

Denote by ⟨{e^{iλt}; t ∈ R}⟩ the linear manifold spanned by {e^{iλt}; t ∈ R}. This linear manifold is dense in L²(m) (see Proposition 5.3.2). Therefore,

⟨{e^{iλt}; t ∈ R}⟩‾ = L²(m).   (7.5.6)

As usual, we denote by ⟨ξ(R)⟩ the closed linear manifold spanned by {ξ(t); t ∈ R}.

We now establish a one-to-one correspondence between the linear manifolds ⟨{e^{iλt}; t ∈ R}⟩ and ⟨ξ(R)⟩ by setting

e^{iλt} ↔ ξ(t),  t ∈ R.   (7.5.7)

Clearly then, for any {t₁, ..., t_n} ⊂ R, we have

Σ_{j=1}^n c_j e^{iλt_j} ↔ Σ_{j=1}^n c_j ξ(t_j).   (7.5.8)

Note that (7.5.7) is a consistent definition in the sense that

Σ_{j=1}^n c_j e^{iλt_j} = 0 (a.e.) m ⟺ Σ_{j=1}^n c_j ξ(t_j) = 0 (a.s.) P.

The correspondence (7.5.7) is an isometry. As a matter of fact, keeping in mind (7.5.2) and (7.5.7), we have

(e^{iλt₁}, e^{iλt₂}) = ∫_{−∞}^{∞} e^{iλ(t₁−t₂)} dF(λ) = C(t₁ − t₂) = (ξ(t₁), ξ(t₂)).   (7.5.9)

Similarly,

(Σ_j α_j e^{iλt_j}, Σ_k β_k e^{iλt_k}) = (Σ_j α_j ξ(t_j), Σ_k β_k ξ(t_k)).   (7.5.10)

Now, consider ζ ∈ ⟨ξ(R)⟩. There exists {ζ_n}₁^∞ ⊂ ⟨ξ(R)⟩ such that

‖ζ − ζ_n‖ → 0 as n → ∞.
Consequently, {ζ_n}₁^∞ is a Cauchy sequence and, therefore, so is {f_n}₁^∞ ⊂ L²{R, ℛ, m}, where f_n ↔ ζ_n. Because L²{R, ℛ, m} is complete, there exists f ∈ L²(m) such that ‖f_n − f‖ → 0 as n → ∞. The converse also holds, namely, if f ∈ L²(m) and ‖f − f_n‖ → 0, where {f_n} ⊂ ⟨{e^{iλt}; t ∈ R}⟩, there is ζ ∈ ⟨ξ(R)⟩ and {ζ_n} ⊂ ⟨ξ(R)⟩ such that

‖ζ − ζ_n‖ → 0 as n → ∞ and ζ_n ↔ f_n.

This clearly implies that the correspondence can be extended to the elements of ⟨{e^{iλt}; t ∈ R}⟩‾ and ⟨ξ(R)⟩.
Next, consider

f(λ) = I_B(λ),  B ∈ ℛ,

and let η(B) ∈ ⟨ξ(R)⟩ be such that

I_B(λ) ↔ η(B).

Clearly, (I_{B₁}, I_{B₂}) = m(B₁ ∩ B₂), so that, due to isometry,

‖η(B)‖² = m(B).

In addition, if B₁ ∩ B₂ = ∅, where B₁, B₂ ∈ ℛ,

(η(B₁), η(B₂)) = 0

and

‖η(B) − Σ_{k=1}^n η(B_k)‖² → 0 as n → ∞,

where {B_k}₁^∞ ⊂ ℛ are disjoint and B = ∪_k B_k.

Therefore, the family η(B), B ∈ ℛ, forms an orthogonal stochastic measure with respect to which we can define the stochastic integral

ψ(f) = ∫_{−∞}^{∞} f(λ) η(dλ),  f ∈ L²(m).   (7.5.11)

Let f ∈ L²(m) with f ↔ ζ, and set

ζ = ψ̃(f).

Let us show that

ψ(f) = ψ̃(f) (a.s.) P.   (7.5.12)

If

f(λ) = Σ_{k=1}^n c_k I_{B_k}(λ)   (7.5.13)
with B_k = (a_k, b_k] disjoint, then from (7.5.11),

ψ(f) = Σ_{k=1}^n c_k η(B_k),

which is evidently equal to ψ̃(f). Hence, (7.5.12) holds for functions of the form (7.5.13). However, if f ∈ L²(m) and

‖f − f_n‖ → 0 as n → ∞,

where the f_n(λ) are of the form (7.5.13), then

‖ψ̃(f) − ψ̃(f_n)‖ → 0 as n → ∞ and ‖ψ(f) − ψ(f_n)‖ → 0 [by (7.3.13.iii)],

which proves (7.5.12).

Finally, consider

f(λ) = e^{iλt};

then, by (7.5.7), ψ̃(e^{iλt}) = ξ(t). On the other hand, due to (7.5.11),

ψ(e^{iλt}) = ∫_{−∞}^{∞} e^{iλt} η(dλ).

This and (7.5.12) then yield

ξ(t) = ∫_{−∞}^{∞} e^{iλt} η(dλ) (a.s.)

for all t ∈ R. This proves the proposition. □
Remark 7.5.1. Let {Z(t); t ∈ R} be a stochastic process with orthogonal increments corresponding to an orthogonal stochastic measure η(·). Then, from (7.4.11) and (7.4.12), we deduce that

ξ(t) = ∫_{−∞}^{∞} e^{iλt} dZ(λ).   (7.5.14)

Remark 7.5.2. If E{ξ(t)} = α, then we have

ξ(t) = α + ∫_{−∞}^{∞} e^{iλt} η(dλ).

Remark 7.5.3. From the proof of the proposition, it is clear that there must be many different representations of a wide sense stationary process. The one that we give here is particularly useful because it allows the application of classical harmonic analysis in the study of an important class of random processes.
Let C(n) be the covariance function of a wide sense stationary sequence {ξ_n}_{−∞}^{∞} with E{ξ_n} = 0. According to Herglotz's theorem (Proposition 7.1.3), the covariance function has the spectral representation

C(n) = ∫_{−π}^{π} e^{inλ} dF(λ),

where F(·) is a real nondecreasing bounded function.

Proposition 7.5.2. There is an orthogonal random measure η on ℛ ∩ (−π, π] such that, for every n,

ξ_n = ∫_{−π}^{π} e^{inλ} η(dλ)   (7.5.15)

and ‖η(dλ)‖² = dF(λ).

The proof of this proposition is essentially the same as the proof of the previous proposition.

Remark 7.5.4. The spectral representation of a wide sense stationary process implies that, for each t, ξ(t) can be approximated in the q.m. sense by a finite sum of the form

Σ_{k=1}^n e^{iλ_k t} ΔZ(λ_k),

where the ΔZ(λ_k) are orthogonal r.v.'s from ⟨ξ(R)⟩.
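Remark 7.5.4 suggests a simulation recipe, sketched below in Python (assuming numpy; the particular density f(λ) = (1/π)/(1 + λ²), for which C(t) = e^{−|t|}, is our illustrative choice, not from the text):

import numpy as np

# Finite-sum approximation of the spectral representation: xi(t) is built
# from orthogonal complex Gaussian increments dZ with E|dZ(lambda_k)|^2
# equal to f(lambda_k)*dlam, cf. Remark 7.5.4.
rng = np.random.default_rng(4)
lam = np.linspace(-40.0, 40.0, 1601)
dlam = lam[1] - lam[0]
f = (1.0 / np.pi) / (1.0 + lam**2)       # spectral density of C(t) = exp(-|t|)

t = np.linspace(0.0, 5.0, 101)
n_paths = 1000
dZ = (rng.standard_normal((n_paths, lam.size))
      + 1j * rng.standard_normal((n_paths, lam.size))) * np.sqrt(f * dlam / 2.0)
xi = dZ @ np.exp(1j * np.outer(lam, t))  # sum_k exp(i*lambda_k*t) * dZ(lambda_k)

# Empirical covariance E{conj(xi(0)) * xi(t)} versus C(t) = exp(-|t|).
emp = (np.conj(xi[:, 0:1]) * xi).mean(axis=0).real
print(np.abs(emp - np.exp(-t)).max())    # typically a few percent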
7.6. Ramifications of Spectral Representation

As we have established (see Remark 7.1.4), the covariance function C(t) of a second-order stationary process is real if and only if its spectral distribution F(·) is symmetric. Something similar holds for stationary processes themselves.

Let {ξ(t); t ∈ R} be a real wide sense stationary process and consider its spectral representation

ξ(t) = ∫_{−∞}^{∞} e^{iλt} dZ(λ).

Because ξ̄(t) = ξ(t), we have

ξ(t) = ∫_{−∞}^{∞} e^{−iλt} dZ̄(λ) = ∫_{−∞}^{∞} e^{iλt} dZ̄(−λ),

which implies that

dZ(λ) = dZ̄(−λ).   (7.6.1)

From this [see also (7.4.11) and (7.4.12)], one can deduce that, for any A ∈ ℛ,

η(A) = η̄(−A),   (7.6.2)
where η(·) is the orthogonal random measure induced by {Z(t); t ∈ R} and −A = {x; −x ∈ A}.

Set

η(A) = η₁(A) + iη₂(A).   (7.6.3)

It is readily seen that η₁(·) and η₂(·) are real measures. According to (7.6.2), we have

η₁(A) = η₁(−A),  η₂(A) = −η₂(−A).   (7.6.4)

Again consider the real wide sense stationary stochastic process ξ(t) and write its spectral representation, using (7.6.3), as

ξ(t) = ∫_{−∞}^{∞} (cos λt + i sin λt)(η₁(dλ) + iη₂(dλ))
 = ∫_{−∞}^{∞} cos λt η₁(dλ) − ∫_{−∞}^{∞} sin λt η₂(dλ)
 + i {∫_{−∞}^{∞} cos λt η₂(dλ) + ∫_{−∞}^{∞} sin λt η₁(dλ)}.

Because ξ(t) is real,

∫_{−∞}^{∞} cos λt η₂(dλ) + ∫_{−∞}^{∞} sin λt η₁(dλ) = 0.

This and (7.6.4) yield

ξ(t) = ∫_{−∞}^{∞} cos λt η₁(dλ) − ∫_{−∞}^{∞} sin λt η₂(dλ)
 = ∫_{−∞}^{∞} cos λt dZ₁(λ) + ∫_{−∞}^{∞} sin λt dZ₂(λ),   (7.6.5)

where

Z₁(λ) = η₁((−∞, λ]),  Z₂(λ) = −η₂((−∞, λ]).   (7.6.6)
Now consider the operation of q.m. differentiation of a wide sense stationary stochastic process ξ(t). According to Proposition 6.2.4, the q.m. derivative of a second-order process with covariance function C(s,t) exists if

∂²C(s,t)/∂s∂t |_{s=t}   (7.6.7)

exists at every t ∈ T. For a wide sense stationary stochastic process {ξ(t); t ∈ R} with covariance function C(t), (7.6.7) is equivalent to the requirement that C″(0) exists. This, on the other hand, is equivalent to

−C″(0) = ∫_{−∞}^{∞} λ² dF(λ) < ∞.   (7.6.8)

Assume now that (7.6.8) holds. It is not difficult to show that

(e^{iλh} − 1)/h → iλ (in L² with respect to F)   (7.6.9)

as h → 0. Next, taking into account (7.5.14), we have

(ξ(t + h) − ξ(t))/h = ∫_{−∞}^{∞} ((e^{iλh} − 1)/h) e^{iλt} η(dλ).

The following is clearly permissible due to (7.6.8):

l.i.m._{h→0} (ξ(t + h) − ξ(t))/h = ∫_{−∞}^{∞} l.i.m._{h→0} ((e^{iλh} − 1)/h) e^{iλt} η(dλ).

From this and (7.6.9), we deduce

ξ′(t) = i ∫_{−∞}^{∞} λ e^{iλt} η(dλ).   (7.6.10)

Let us show that ξ′(t) is also a wide sense stationary process. Because by assumption E{ξ(t)} = 0, clearly

E{ξ′(t)} = 0.   (7.6.11)

In addition,

E{ξ̄′(s) ξ′(s + t)} = ∫_{−∞}^{∞} λ² e^{iλt} dF(λ),

which depends only on t. This proves the assertion.
Let {ξ(t); t ∈ R} be wide sense stationary with spectral representation (7.5.14) and let ζ ∈ ⟨ξ(R)⟩. What is the structure of the r.v. ζ?

Proposition 7.6.1. Let ζ ∈ ⟨ξ(R)⟩; then there exists φ ∈ L²{R, ℛ, F} such that

ζ = ∫_{−∞}^{∞} φ(λ) η(dλ) (a.s.).   (7.6.12)

PROOF. Set

ζ_n = Σ_{k=1}^n α_k ξ(t_k);   (7.6.13)

then, by (7.5.14),

ζ_n = ∫_{−∞}^{∞} ( Σ_{k=1}^n α_k e^{iλt_k} ) η(dλ).   (7.6.14)

In other words, (7.6.12) holds for

φ_n(λ) = Σ_{k=1}^n α_k e^{iλt_k}.   (7.6.15)
In the general case, for ζ ∈ ⟨ξ(R)⟩, there exists a sequence of r.v.'s of type (7.6.13) such that

‖ζ − ζ_n‖ → 0 as n → ∞.

But then

‖φ_n − φ_m‖ = ‖ζ_n − ζ_m‖ → 0 as n, m → ∞.

Consequently, {φ_n} is Cauchy in L²{R, ℛ, F}, so that there exists a φ ∈ L²{R, ℛ, F} such that

‖φ − φ_n‖ → 0 as n → ∞.

But then [see (7.3.13.iii)]

‖ψ(φ) − ψ(φ_n)‖ → 0 as n → ∞

and, because ζ_n = ψ(φ_n), we have

ζ = ψ(φ) (a.s.),

which proves the assertion. □
7.7. Estimation, Prediction, and Filtering

The general estimation problem in L²{Ω, ℬ, P} can be formulated as follows. Let Y ∈ L² be arbitrary but fixed and let S ⊂ L² be a subspace. Find the element Ŷ ∈ S closest to Y. From the definition of Ŷ, it clearly follows that

‖Y − Ŷ‖ = inf{‖Y − U‖; U ∈ S}.   (7.7.1)

In Hilbert space terminology, Ŷ is the orthogonal projection of Y on S, characterized as the element of S such that

Y − Ŷ ⊥ S.   (7.7.2)

This implies that

(Y, U) = (Ŷ, U)   (7.7.3)

for each U ∈ S. The following two particular cases are of special interest.

(i) Pure Prediction. Let {ξ(s); s ∈ R} ⊂ L²{Ω, ℬ, P} be a stochastic process. Assume that this process is observed up to time t, i.e., its values ξ(s) are known for all s ≤ t. It is required to predict its value at some moment of time t + τ, τ > 0. In this case,

Y = ξ(t + τ) and S = L²{Ω, 𝔉_t, P},   (7.7.4)

where 𝔉_t = σ{ξ(s); s ≤ t}.
(ii) Filtering and Prediction. Let {X(t); t ∈ R} and {Z(t); t ∈ R} ⊂ L²{Ω, ℬ, P} be given processes, where X(t) is the one that interests us. Unfortunately, X(t) cannot be observed directly, as it is "contaminated" by the unobservable disturbance Z(t). What is observed is the sum

Y(t) = X(t) + Z(t).   (7.7.5)

Here is a simple example of what an engineer would call a "signal" X(t) plus unwanted noise Z(t), giving Y(t), which is what we observe. The problem here is to separate the noise from the signal. More precisely, one must find an estimate X̂(t) of X(t) from observations Y(s) for s ≤ t. This is called filtering. Clearly, X̂(t) is the orthogonal projection of X(t) on L²{Ω, σ{Y(s); s ≤ t}, P}. If we have to find an estimate X̂(t + τ) of X(t + τ), where τ > 0, from the values Y(s) for s ≤ t, this is called filtering and prediction.

We now outline a general solution of the estimation problem. Let 𝔇 ⊂ ℬ be a sub-σ-algebra. We shall denote by L²(𝔇) = L²{Ω, 𝔇, P} the Hilbert subspace of L² which consists of those elements which are 𝔇-measurable. We want to show that the orthogonal projection of Y on L²(𝔇) (see Section 10.1 of Chapter 10) is

Ŷ = E{Y|𝔇}.   (7.7.6)

Clearly, E{Y|𝔇} ∈ L²(𝔇). To prove the assertion, it suffices to show that (7.7.2) holds. Take any Z ∈ L²(𝔇); then,

(Y − E{Y|𝔇}, Z) = (Y, Z) − (E{Y|𝔇}, Z)
 = E(YZ̄) − E(Z̄ E{Y|𝔇})
 = E(YZ̄) − E(E{YZ̄|𝔇}) = 0.

Therefore,

Y − E{Y|𝔇} ⊥ L²(𝔇).   (7.7.7)
Remark 7.7.1. The estimate (7.7.6) is clearly unbiased; that is,

E{Ŷ} = E(E{Y|𝔇}) = E{Y}.

Remark 7.7.2. The quantity

Δ² = ‖Y − Ŷ‖²   (7.7.8)

is called the mean square error of the estimate (7.7.6).
The next simple example contains the basic idea used in the following to resolve an estimation problem.

EXAMPLE 7.7.1. In regression analysis, the following problem is of some interest. Let X₁, X₂ ∈ L²{Ω, ℬ, P}; from the family of Borel functions

h: C → C

such that h(X₁) ∈ L² (C is the set of complex numbers), find one that minimizes the norm

‖X₂ − h(X₁)‖.

Clearly, h(X₁) ∈ L²(σ{X₁}); therefore, the norm will achieve its minimum if h(·) is such that

X₂ − h(X₁) ⊥ L²(σ{X₁}),

where L²(σ{X₁}) = L²{Ω, σ{X₁}, P}. From this and (7.7.6), it then follows that

h(X₁) = E{X₂|X₁} (a.s.).   (7.7.9)

From this example we see that the best approximation to X₂ in the class of all Borel functions h(·) of X₁ [which, of course, must be such that h(X₁) ∈ L²] is E{X₂|X₁}. This solution, unfortunately, has very limited practical applicability, because the conditional expectation is not easy to determine. A more useful solution can be obtained by seeking an approximation, not in the class of all Borel functions, but in the subclass consisting of the linear functions

h(X₁) = aX₁.

In such a case, the condition X₂ − aX₁ ⊥ ⟨X₁⟩ would yield

(X₂, X₁) = a‖X₁‖²,

from which we obtain a.
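Numerically, a is just a ratio of sample moments. A minimal sketch (ours, not from the text; assumes numpy and real-valued data):

import numpy as np

# Best linear approximation a*X1 to X2: a = (X2, X1)/||X1||^2,
# estimated here by sample averages.
rng = np.random.default_rng(5)
x1 = rng.standard_normal(100_000)
x2 = 2.0 * x1 + rng.standard_normal(100_000)   # true linear part: a = 2

a = np.mean(x2 * x1) / np.mean(x1**2)          # sample version of (X2, X1)/||X1||^2
print(a)                                       # close to 2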
The next example is slightly more general.

EXAMPLE 7.7.2. Let {Z_k}₁^n ⊂ L²{Ω, ℬ, P} be an orthonormal sequence with E{Z_k} = 0, k = 1, ..., n. Write

𝔉_n = σ{Z₁, ..., Z_n}

and consider L²(𝔉_n). Let Y ∈ L² be arbitrary; then, according to (7.7.6), the element in L²(𝔉_n) closest to Y is Ŷ, defined by

Ŷ = E{Y|𝔉_n}.   (7.7.10)

Now, consider the linear manifold ⟨{Z_k}₁^n⟩ spanned by {Z_k}₁^n. Clearly, ⟨{Z_k}₁^n⟩ is a subspace (because ⟨{Z_k}₁^n⟩ is closed) consisting of all linear combinations Σ_{k=1}^n c_k Z_k, where the c_k, k = 1, ..., n, are complex numbers. Therefore, the element Y₀ ∈ ⟨{Z_k}₁^n⟩ closest to Y must be of the form

Y₀ = Σ_{k=1}^n α_k Z_k.

To determine the coefficients α_k, observe that

Y − Σ_{k=1}^n α_k Z_k ⊥ ⟨{Z_k}₁^n⟩,

which, due to the orthonormality of {Z_k}₁^n, yields

(Y, Z_i) = α_i,  i = 1, ..., n.

Therefore,

Y₀ = Σ_{k=1}^n (Y, Z_k) Z_k.   (7.7.11)
Note that Ŷ given by (7.7.10) is, in general, a better estimate of Y than Y₀. As a matter of fact, from (7.7.7), we have

‖Y − Y₀‖² = ‖Y − Ŷ‖² + ‖Ŷ − Y₀‖².   (7.7.12)

This fact also follows from ⟨{Z_k}₁^n⟩ ⊂ L²(𝔉_n).

Remark 7.7.3. If Y, Z₁, ..., Z_n form a Gaussian system,

E{Y|Z₁, ..., Z_n} = Σ_{k=1}^n c_k Z_k ∈ ⟨{Z_k}₁^n⟩,

where, clearly, c_k = (Y, Z_k). Therefore, in this case,

Ŷ = Y₀.

To summarize, using the L² norm as our criterion of closeness, we have shown that the closest element in the subspace L²(𝔇) to an arbitrary Y ∈ L² is simply the conditional expectation E{Y|𝔇}. On the other hand, the closest element to Y in the subspace ⟨{Z_k}₁^n⟩ is Y₀ given by (7.7.11). From (7.7.12), we see that Ŷ represents a better approximation of Y than Y₀.
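A small numerical illustration of (7.7.11), not from the text (Python, assuming numpy; the orthonormal family is manufactured by a QR factorization, which is our own device):

import numpy as np

# Project Y onto the span of an orthonormal family {Z_k}: Y0 = sum_k (Y, Z_k) Z_k,
# with the inner product (U, V) = E{U V} replaced by a sample average over omega.
rng = np.random.default_rng(6)
n_omega, n_vec = 50_000, 3

Q, _ = np.linalg.qr(rng.standard_normal((n_omega, n_vec)))
Z = Q * np.sqrt(n_omega)            # columns satisfy mean(Z_i * Z_j) = delta_ij

Y = Z @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n_omega)
coeffs = Z.T @ Y / n_omega          # alpha_k = (Y, Z_k)
Y0 = Z @ coeffs                     # projection onto <{Z_k}>

# Y - Y0 is orthogonal to every Z_k (up to floating-point error).
print(np.abs(Z.T @ (Y - Y0) / n_omega).max())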
7.8. An Application

Consider the simple model

X_n = ξ_n + Z_n,   (7.8.1)

where {ξ_n}_{−∞}^{∞} and {Z_n}_{−∞}^{∞} ⊂ L²{Ω, ℬ, P}. Here, we interpret ξ_n as a signal and Z_n as noise. Neither ξ_n nor Z_n can be observed directly. It is the superposition X_n of these two that can be observed.

For the signal {ξ_n}_{−∞}^{∞} and the noise {Z_n}_{−∞}^{∞}, the following conditions are assumed to hold:

(i) {ξ_n}_{−∞}^{∞} and {Z_n}_{−∞}^{∞} are both wide sense stationary processes with

E{ξ_n} = E{Z_n} = 0.   (7.8.2)

(ii) {ξ_n}_{−∞}^{∞} and {Z_n}_{−∞}^{∞} are uncorrelated. In other words,

(ξ_j, Z_k) = 0 for all k, j = ..., −1, 0, 1, ....   (7.8.3)

Denote by

C_ξ(k) = (ξ_{n+k}, ξ_n),  C_Z(k) = (Z_{n+k}, Z_n),  C_X(k) = (X_{n+k}, X_n),   (7.8.4)

k = ..., −1, 0, 1, .... Clearly,

C_ξ(−k) = C̄_ξ(k),   (7.8.5)

and similarly for C_Z and C_X. In addition, after some straightforward calculations, we have

C_X(k) = C_ξ(k) + C_Z(k).   (7.8.6)

Let us define the problem of interest here. We want to find a linear estimate of a r.v. Y ∈ L² on the basis of observations

X_m, X_{m+1}, ..., X_{m+r}.   (7.8.7)

This is equivalent to the problem of finding the orthogonal projection Y₀ of Y on the linear manifold ⟨{X_{m+l}}₀^r⟩ (which is a closed subspace of L²).

The best linear predictor is Y₀ ∈ ⟨{X_{m+l}}₀^r⟩. This clearly implies that it must be of the form

Y₀ = Σ_{l=0}^r h_l X_{m+l},   (7.8.8)

where h₀, ..., h_r are some (complex) constants. Because

Y − Y₀ ⊥ ⟨{X_{m+l}}₀^r⟩,

it follows that

(Y, X_{m+j}) = (Y₀, X_{m+j})

for any j = 0, ..., r. In other words,

(Y, X_{m+j}) = Σ_{l=0}^r h_l (X_{m+l}, X_{m+j}) = Σ_{l=0}^r h_l C_X(l − j).

Now consider the case when

Y = ξ_{m+r+k}

(prediction with filtration). In such a case,

(Y, X_{m+j}) = (ξ_{m+r+k}, ξ_{m+j} + Z_{m+j}) = (ξ_{m+r+k}, ξ_{m+j}) = C_ξ(j − r − k).

Thus, the system of equations characterizing Y₀ becomes

Σ_{l=0}^r h_l C_X(l − j) = C_ξ(j − r − k),  j = 0, ..., r.

A solution of this system yields h₀, ..., h_r.
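The system is a finite Toeplitz system and is immediate to solve numerically. A minimal sketch (ours, not from the text) with an illustrative choice C_ξ(n) = 0.9^{|n|} for the signal and white observation noise of variance 0.5, assuming numpy and scipy:

import numpy as np
from scipy.linalg import toeplitz, solve

# Solve sum_l h_l * C_X(l - j) = C_xi(j - r - k), j = 0, ..., r,
# with C_X = C_xi + C_Z by (7.8.6) (real, symmetric covariances here).
r, k = 10, 1                                   # r+1 observations, predict k steps ahead
n = np.arange(r + 1)
C_xi = lambda m: 0.9 ** np.abs(m)              # illustrative signal covariance
C_X = C_xi(n) + 0.5 * (n == 0)                 # first row of the Toeplitz matrix

A = toeplitz(C_X)                              # A[j, l] = C_X(l - j)
b = C_xi(n - r - k)                            # right-hand side C_xi(j - r - k)
h = solve(A, b)
print(h)                                       # filter-predictor coefficients h_0..h_r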
7.9. Linear Transformations

Let {ξ(t); t ∈ R} ⊂ L²{Ω, ℬ, P} be a wide sense stationary q.m. continuous process with E{ξ(t)} = 0, spectral distribution F(·), and orthogonal random measure η. Then (see Proposition 7.5.1), ξ(t) has the spectral representation

ξ(t) = ∫_{−∞}^{∞} e^{iλt} η(dλ),   (7.9.1)

where ‖η(dλ)‖² = dF(λ).

Let {ζ(t); t ∈ R} ⊂ L²{Ω, ℬ, P} be another wide sense stationary process such that

ζ(t) = ∫_{−∞}^{∞} e^{iλt} φ(λ) η(dλ),   (7.9.2)

where φ(·) ∈ L²{R, ℛ, F}. We then say that the stationary process ζ(t) was obtained from ξ(t) by a linear transformation. The function φ(·) in (7.9.2) is called the "spectral characteristic" of the transformation.

Being a stationary process, ζ(t) has its own spectral representation

ζ(t) = ∫_{−∞}^{∞} e^{iλt} η₀(dλ).

It then follows from (7.9.2) that

η₀(dλ) = φ(λ) η(dλ).   (7.9.3)

In addition, because ‖η₀(dλ)‖² = |φ(λ)|² ‖η(dλ)‖², the spectral distribution F₀(·) of ζ(t) is given by

dF₀(λ) = |φ(λ)|² dF(λ).   (7.9.4)

EXAMPLE 7.9.1. One of the simplest linear transformations of a stationary stochastic process ξ(t) is the "shift" operator T_s defined by

T_s ξ(t) = ξ(t + s).   (7.9.5)

From this, we have

T_s ξ(t) = ∫_{−∞}^{∞} e^{iλt} e^{iλs} η(dλ),

which is a particular case of (7.9.2), where φ(λ) = e^{iλs}. Clearly, we then have that F₀ ≡ F and η₀(dλ) = e^{iλs} η(dλ).

Consider the closed linear manifold ⟨ξ(R)⟩ spanned by the process ξ(t). According to Proposition 7.6.1, an arbitrary element ζ ∈ ⟨ξ(R)⟩ can be
represented as

ζ = ∫_{−∞}^{∞} φ(λ) η(dλ),  φ ∈ L²{R, ℛ, F}.

Actually, the following result holds. If {ζ(t); t ∈ R} ⊂ L²{Ω, ℬ, P} is wide sense stationary and such that

⟨ζ(R)⟩ ⊂ ⟨ξ(R)⟩,   (7.9.6)

then there exists φ(·) ∈ L²{R, ℛ, F} such that

ζ(t) = ∫_{−∞}^{∞} e^{iλt} φ(λ) η(dλ).   (7.9.7)

To show this, consider

ζ_n(t) = Σ_{k=1}^n α_k ξ(t + t_k) ∈ ⟨ξ(R)⟩.

From (7.9.1), we obtain the spectral representation of ζ_n(t) as

ζ_n(t) = ∫_{−∞}^{∞} e^{iλt} φ_n(λ) η(dλ),

where

φ_n(λ) = Σ_{k=1}^n α_k e^{iλt_k}.

The general case is now obtained for arbitrary φ(·) by passage to the limit.

Remark 7.9.1. When (7.9.6) holds, we often say that the process ζ(t) is subordinated to ξ(t).

Remark 7.9.2. The derivative (7.6.10) of a wide sense stationary process is another particular case of (7.9.2), where φ(λ) = iλ.

Consider now another type of linear operation on a wide sense stationary process. The transformation

ζ(t) = ∫_{−∞}^{∞} h(t − s) ξ(s) ds   (7.9.8)

of the process ξ(t) is called "an admissible linear filter." The complex function h(·) is absolutely integrable, i.e.,

∫_{−∞}^{∞} |h(s)| ds < ∞,   (7.9.9)

and square integrable on every finite interval. The integral (7.9.8) exists if and only if the integral

∫_{−∞}^{∞} ∫_{−∞}^{∞} h(t − s₁) C(s₁ − s₂) h̄(t − s₂) ds₁ ds₂ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s₁) h̄(s₂) C(s₁ − s₂) ds₁ ds₂

exists, which clearly holds due to (7.9.9).

Remark 7.9.3. If h(u) = 0 for u < 0, the filter is said to be physically realizable. The integral (7.9.8) is also known as "a moving average process."

EXAMPLE 7.9.2. Let φ(·) be the Fourier transform of some integrable function h(·), namely,

φ(λ) = ∫_{−∞}^{∞} e^{−iλt} h(t) dt.   (7.9.10)

Then, the linear transformation of the process ξ(t) with the spectral characteristic (7.9.10) is the process

ζ(t) = ∫_{−∞}^{∞} e^{iλt} φ(λ) η(dλ)
 = ∫_{−∞}^{∞} e^{iλt} ( ∫_{−∞}^{∞} e^{−iλs} h(s) ds ) η(dλ)
 = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} e^{iλ(t−s)} h(s) ds ) η(dλ)
 = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} e^{iλu} h(t − u) du ) η(dλ)
 = ∫_{−∞}^{∞} h(t − u) ξ(u) du.

Clearly,

E{ζ̄(s) ζ(s + t)} = E{ ( ∫_{−∞}^{∞} h(s − u) ξ(u) du )‾ ∫_{−∞}^{∞} h(s + t − v) ξ(v) dv }
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h̄(s₁) h(s₂) C(t + s₁ − s₂) ds₁ ds₂,

which depends only on t.

7.10. Linear Prediction, General Remarks

Let {ξ(t); t ∈ R} ⊂ L²{Ω, ℬ, P} be a wide sense stationary stochastic process with E{ξ(t)} = 0. As before, we denote by ⟨ξ(R)⟩ the closed linear manifold spanned by ξ(t). Clearly, ⟨ξ(R)⟩ ⊂ L²{Ω, ℬ, P}. Because for every t ∈ R, ξ(t) is a "point" in L²{Ω, ℬ, P}, the whole process represents a curve in the Hilbert space L²{Ω, ℬ, P}. The closed linear manifold ⟨ξ(R)⟩ is the smallest subspace of L²{Ω, ℬ, P} which contains this curve.

For any t ∈ R, write R_t = (−∞, t]; then ⟨ξ(R_t)⟩ is defined to be the closed linear manifold spanned by the r.v.'s ξ(s), s ≤ t. The subspace ⟨ξ(R_t)⟩ is often said to represent the past and present of the stochastic process {ξ(s); s ∈ R}, given that t is the present time. Clearly, for any t₁ ≤ t₂,

⟨ξ(R_{t₁})⟩ ⊂ ⟨ξ(R_{t₂})⟩.

Denote by

⟨ξ(−∞)⟩ = ∩_{t∈R} ⟨ξ(R_t)⟩.

Therefore, in the terminology just introduced, ⟨ξ(−∞)⟩ represents the infinitely remote past.

Definition 7.10.1. If

⟨ξ(−∞)⟩ = ⟨ξ(R)⟩

or, equivalently, if ⟨ξ(R_t)⟩ is the same for all t ∈ R, the stochastic process ξ(t) is called "deterministic." If, on the other hand,

⟨ξ(−∞)⟩ ≠ ⟨ξ(R)⟩,

the stochastic process is said to be nondeterministic. Finally, if ⟨ξ(−∞)⟩ = {0}, ξ(t) is called "purely nondeterministic."

To elucidate somewhat the physical meaning of these notions, we will explain them in terms of linear prediction. As we have seen in Section 7.7, the problem of linear prediction of a process ξ(t) is to find an estimate of ξ(t + τ) for some τ > 0, given the values of ξ(s) for s ≤ t. The best linear predictor of ξ(t + τ) is its orthogonal projection on ⟨ξ(R_t)⟩, which is denoted by ξ̂(t + τ). From Proposition 5.4.2, we know that the deviation

ξ(t + τ) − ξ̂(t + τ) ⊥ ⟨ξ(R_t)⟩.   (7.10.1)

Its norm

δ(τ) = ‖ξ(t + τ) − ξ̂(t + τ)‖   (7.10.2)

is called the "error" of the linear prediction. It is independent of t because the process ξ(t) is wide sense stationary. Note also that

ξ̂(t + τ) ∈ ⟨ξ(R_t)⟩.   (7.10.3)

Let us show that, for any 0 ≤ τ₁ ≤ τ₂,

δ(τ₁) ≤ δ(τ₂) and δ(τ) = 0 for τ ≤ 0.   (7.10.4)

The orthogonal projection of ξ(t + τ) on ⟨ξ(R_t)⟩ is the same as the projection of ξ(u) on ⟨ξ(R_{u−τ})⟩ with u = t + τ. Now, because ⟨ξ(R_{u−τ₂})⟩ ⊂ ⟨ξ(R_{u−τ₁})⟩,

inf{‖ξ(u) − Z‖; Z ∈ ⟨ξ(R_{u−τ₁})⟩} ≤ inf{‖ξ(u) − Z‖; Z ∈ ⟨ξ(R_{u−τ₂})⟩}

(see Proposition 5.4.1). This proves (7.10.4).

If the stochastic process {ξ(t); t ∈ R} is deterministic, then ξ(t + τ) ∈ ⟨ξ(R_t)⟩ for all t, τ ∈ R, so that

ξ̂(t + τ) = ξ(t + τ).

This implies that δ(τ) = 0 for all τ ≥ 0. Conversely, if δ(τ₀) = 0 for some τ₀ > 0, the process is deterministic. Indeed, due to (7.10.4), δ(τ) = 0 for all τ ≤ τ₀. Therefore,

ξ̂(t + τ) = ξ(t + τ) ∈ ⟨ξ(R_t)⟩

for all τ ≤ τ₀ and all t ∈ R. This then implies that

⟨ξ(R_{t+τ})⟩ ⊂ ⟨ξ(R_t)⟩

for all t ∈ R and τ > 0. Consequently, ⟨ξ(R_{t+τ})⟩ = ⟨ξ(R_t)⟩.

In the following, for convenience's sake, we will denote by

P_t ξ(t + τ),  τ ≥ 0,   (7.10.5)

the orthogonal projection of ξ(t + τ) on ⟨ξ(R_t)⟩. With this notation, the error of the linear prediction (7.10.2) can now be written as

δ(τ) = ‖ξ(t + τ) − P_t ξ(t + τ)‖.   (7.10.6)

Next, we shall prove the following result.

Proposition 7.10.1. The stochastic process ξ(t) is purely nondeterministic if and only if δ²(τ) → C(0) as τ → +∞.

PROOF. Assume that δ²(τ) → C(0) as τ → +∞. Because ⟨ξ(−∞)⟩ ⊂ ⟨ξ(R_s)⟩ for all s ∈ R, we have

‖ξ(t) − P_s ξ(t)‖ ≤ ‖ξ(t) − P_{−∞} ξ(t)‖.   (7.10.7)

Next, because (ξ(t) − P_s ξ(t)) ⊥ P_s ξ(t),

‖ξ(t)‖² = ‖P_s ξ(t)‖² + ‖ξ(t) − P_s ξ(t)‖².   (7.10.8)

We also have

‖ξ(t)‖² = ‖P_{−∞} ξ(t)‖² + ‖ξ(t) − P_{−∞} ξ(t)‖².   (7.10.9)

From this, (7.10.7), and (7.10.8), we deduce that

‖P_s ξ(t)‖ ≥ ‖P_{−∞} ξ(t)‖.

However, ‖ξ(t)‖² = C(0) for all t, and

‖ξ(t) − P_s ξ(t)‖² = δ²(t − s) → C(0)

as s → −∞. Therefore, by (7.10.8),

‖P_s ξ(t)‖² → 0 as s → −∞.

Consequently, ξ(t) ⊥ ⟨ξ(−∞)⟩ for all t, so ⟨ξ(−∞)⟩ = {0}.

Conversely, suppose that ⟨ξ(−∞)⟩ = {0} and let

P_{t−s} ξ(t) → Z as s → ∞;

then Z ∈ ⟨ξ(−∞)⟩ = {0}, so that

lim_{τ→∞} δ²(τ) = lim_{τ→∞} ‖ξ(t + τ) − P_t ξ(t + τ)‖² = C(0). □

This completes the proof of the assertion.
7.11. The Wold Decomposition

Let {ξ(t); t ∈ R} ⊂ L²{Ω, ℬ, P} be a wide sense stationary nondeterministic stochastic process. The Wold decomposition, which will be discussed next, shows that such a process can be represented as a sum of two mutually orthogonal second-order processes, one of which is deterministic and the other purely nondeterministic.

Proposition 7.11.1. Let {ξ(t); t ∈ R} be wide sense stationary with E{ξ(t)} = 0. Then it can be written as

ξ(t) = ξ₁(t) + ξ₂(t),   (7.11.1)

where {ξ₁(t); t ∈ R} is purely nondeterministic and {ξ₂(t); t ∈ R} is deterministic. In addition,

(ξ₁(s), ξ₂(t)) = 0   (7.11.2)

for any s, t ∈ R.

PROOF. Set

ξ₂(t) = P_{−∞} ξ(t),  ξ₁(t) = ξ(t) − ξ₂(t).   (7.11.3)

Clearly,

ξ₂(t) ∈ ⟨ξ(−∞)⟩

for all t and, according to Proposition 5.4.2,

ξ₁(t) ⊥ ⟨ξ(−∞)⟩,   (7.11.4)

which implies that {ξ₁(t); t ∈ R} and {ξ₂(t); t ∈ R} are orthogonal processes. Next, because ξ₁(t) ∈ ⟨ξ(R_t)⟩ for all t, it follows that

⟨ξ₁(R_t)⟩ ⊂ ⟨ξ(R_t)⟩, ∀t ∈ R.   (7.11.5)

This and (7.11.4) then imply that ⟨ξ₁(−∞)⟩ = {0}, so that the process ξ₁(t) is purely nondeterministic.

On the other hand, it is clear that

⟨ξ₂(R_t)⟩ ⊂ ⟨ξ(−∞)⟩, ∀t ∈ R,   (7.11.6)

so that ⟨ξ₂(R)⟩ ⊂ ⟨ξ(−∞)⟩. Because ξ(t) = ξ₁(t) + ξ₂(t),

⟨ξ(R_t)⟩ ⊂ ⟨ξ₁(R_t)⟩ ⊕ ⟨ξ₂(R_t)⟩, ∀t ∈ R.   (7.11.7)

From this, (7.11.5), and (7.11.6), we deduce that

⟨ξ(−∞)⟩ ⊂ ⟨ξ₁(R_t)⟩ ⊕ ⟨ξ₂(R_t)⟩.   (7.11.8)

Now, (7.11.4) clearly implies that

⟨ξ₁(R_t)⟩ ⊥ ⟨ξ(−∞)⟩.   (7.11.9)

Because [see (7.11.8)] ⟨ξ(−∞)⟩ ⊂ ⟨ξ₁(R_t)⟩ ⊕ ⟨ξ₂(R_t)⟩ for all t, it follows that ⟨ξ(−∞)⟩ ⊂ ⟨ξ₂(R_t)⟩ for every t. This and (7.11.6) yield

⟨ξ(−∞)⟩ = ⟨ξ₂(R_t)⟩

for all t, so that ⟨ξ₂(−∞)⟩ = ⟨ξ₂(R_t)⟩ for all t, from which we conclude that {ξ₂(t); t ∈ R} is deterministic. Using a similar argument, we can show that the decomposition (7.11.1) is unique. □

Corollary 7.11.1. Both processes, ξ₁(t) and ξ₂(t), are wide sense stationary with E{ξ₁(t)} = E{ξ₂(t)} = 0.
We now give without proof a necessary and sufficient condition for a wide sense stationary stochastic process {ξ(t); t ∈ R} to be nondeterministic. Let {Z(t); t ∈ R} be the stochastic process with orthogonal increments associated with ξ(t). Then we have the following result.

Proposition 7.11.2. For the stationary process ξ(t) to be nondeterministic, it is necessary and sufficient that it have the representation

ξ(t) = ∫_{−∞}^{t} α(t − s) dZ(s),   (7.11.10)

where

∫_0^∞ |α(s)|² ds < ∞ and ⟨ξ(R_t)⟩ = ⟨Z(R_t)⟩.   (7.11.11)

Assume that the spectral distribution F(·) of the process ξ(t) is absolutely continuous, i.e.,

F(λ) = ∫_{−∞}^{λ} f(s) ds;

then the following result holds.

Proposition 7.11.3. The stochastic process ξ(t) is purely nondeterministic if and only if its spectral density f(·) exists and

∫_{−∞}^{∞} [ln f(λ)/(1 + λ²)] dλ > −∞.   (7.11.12)
Remark 7.11.1. The stochastic process ξ(t) is deterministic if

∫_{−∞}^{∞} [ln f(λ)/(1 + λ²)] dλ = −∞.   (7.11.13)

If f(·) does not exist, then the process cannot be purely nondeterministic.

Remark 7.11.2. The spectral characteristic [see (7.9.2)] φ(λ) of the linear transformation (7.11.10) is given by

φ(λ) = ∫_0^∞ e^{−iλt} α(t) dt   (7.11.14)

and represents the boundary value of the analytic function

φ*(z) = ∫_0^∞ e^{−izt} α(t) dt

on the half-plane Im{z} < 0.

EXAMPLE 7.11.1. Let {ξ(t); t ∈ R} be wide sense stationary with

E{ξ(t)} = 0 and C(t) = αe^{−t²/4},  α > 0.

Because condition (7.1.9) holds, the spectral density f(·) exists [see (7.1.10)] and

f(λ) = (1/2π) ∫_{−∞}^{∞} e^{−iλt} αe^{−t²/4} dt = (α/π^{1/2}) e^{−λ²}.

From this, we clearly see that (7.11.13) holds, which implies that the process ξ(t) is deterministic.
EXAMPLE 7.11.2. Let us show that the wide sense stationary process

ξ(t) = Σ_{k=1}^∞ η({λ_k}) e^{iλ_k t},  E{ξ(t)} = 0,

is deterministic. Indeed, the process has a discrete spectrum [i.e., its spectral distribution F(·) is a step function]. To show this, consider

C(t) = E{ξ̄(s) ξ(s + t)} = Σ_{k=1}^∞ e^{iλ_k t} E|η({λ_k})|²,

so that F(·) increases only by jumps at the points λ_k and, between them,

dF(λ)/dλ = 0.

If we set ln{dF(λ)/dλ} = −∞ when dF(λ)/dλ = 0, the condition (7.11.13) holds.
EXAMPLE 7.11.3. The following example is particularly instructive. Let {ξ(t); t ∈ R} be wide sense stationary with E{ξ(t)} = 0 and covariance C(t), which we assume to be infinitely differentiable at zero. Let F(·) be the corresponding spectral distribution. Then, according to (7.1.6),

C(t) = ∫_{−∞}^{∞} e^{iλt} dF(λ).

From this, it readily follows that

∫_{−∞}^{∞} λ^{2n} dF(λ) < ∞   (7.11.15)

and vice versa, i.e., if (7.11.15) holds for all n = 1, 2, ..., the covariance function is infinitely differentiable at zero.

According to (7.6.8), (7.11.15) then implies the existence (in the q.m. sense) of the derivative ξ′(t). Next, because ξ′(t) is also wide sense stationary with covariance function C₁(t) = −C″(t), we clearly have

C₁″(0) = −∫_{−∞}^{∞} λ⁴ dF(λ).

Therefore, because this integral is finite, the first derivative of ξ′(t) (in the q.m. sense) exists and represents a wide sense stationary process, and so on.

In general, if the condition (7.11.15) is met, the nth derivative ξ^{(n)}(t), n = 1, 2, ..., exists and has the representation

ξ^{(n)}(t) = ∫_{−∞}^{∞} (iλ)^n e^{iλt} η(dλ).

From this, it follows that, for any τ > 0,

ξ(t + τ) = Σ_{k=0}^∞ (τ^k / k!) ξ^{(k)}(t).

This allows an exact prediction of the process for all τ > 0.
7.12. Discrete Parameter Processes

Let {ξ_n}_{−∞}^{∞} be a wide sense stationary process with

E{ξ_n} = 0 and C(n) = E{ξ̄_k ξ_{k+n}}.   (7.11.16)

According to the Wold decomposition, we have

ξ_n = ξ_{n1} + ξ_{n2},   (7.12.1)

where ξ_{n1} and ξ_{n2} are two wide sense stationary processes such that ξ_{k1} ⊥ ξ_{j2} for all k, j. The first component in (7.12.1), ξ_{n1}, is deterministic, and the second component, ξ_{n2}, represents a purely nondeterministic process. The deterministic component is perfectly predictable, and an explicit formula may be obtained for predicting the purely nondeterministic component, as the following proposition shows.

Proposition 7.12.1. Let {X_n}_{−∞}^{∞} be a wide sense stationary, purely nondeterministic process such that E{X_n} = 0. Then,

X_n = Σ_{k=0}^{∞} α_k ζ_{n−k},   (7.12.2)

where {ζ_m}_{−∞}^{∞} is a sequence of pairwise orthogonal r.v.'s with

E{ζ_m} = 0,  E|ζ_m|² = 1,

and

⟨{X_k}_{−∞}^{m}⟩ = ⟨{ζ_k}_{−∞}^{m}⟩   (7.12.3)

for all m, and {α_k}₀^∞ is a sequence of complex numbers such that Σ_{k=0}^{∞} |α_k|² < ∞.

Remark 7.12.1. The sequence {ζ_m}_{−∞}^{∞} is often called an "innovation sequence" for the process {X_n}_{−∞}^{∞}; the reason for this is that ζ_{n+1} provides the "new information" needed to obtain ⟨{X_k}_{−∞}^{n+1}⟩.

We may drop the requirement that ⟨{X_k}_{−∞}^{m}⟩ = ⟨{ζ_k}_{−∞}^{m}⟩ for all m, and (7.12.2) will still hold, as the following proposition shows.

Proposition 7.12.2. A wide sense stationary process {X_n}_{−∞}^{∞} is purely nondeterministic if and only if it can be represented as

X_n = Σ_{k=0}^{∞} b_k Z_{n−k} (a.s.),   (7.12.4)

where {Z_m}_{−∞}^{∞} is an orthonormal system of r.v.'s and Σ_{k=0}^{∞} |b_k|² < ∞.

Remark 7.12.2. A process of the form (7.12.4) is called a one-sided moving average process.

An analogous result to Proposition 7.11.3 also holds in the discrete case.

Proposition 7.12.3. Let {ξ_n}_{−∞}^{∞} be a nondegenerate wide sense stationary stochastic process with E{ξ_n} = 0. If the process is purely nondeterministic, there exists a spectral density f(λ) such that

∫_{−π}^{π} ln f(λ) dλ > −∞.   (7.12.5)

Conversely, if the process {ξ_n}_{−∞}^{∞} has a spectral density satisfying (7.12.5), the process is purely nondeterministic.

Remark 7.12.3. From (7.12.5), it follows that f(·) > 0 (a.e.) with respect to Lebesgue measure.

Remark 7.12.4. If

∫_{−π}^{π} ln f(λ) dλ = −∞,   (7.12.6)

the process {ξ_n}_{−∞}^{∞} is deterministic.
7.12.1. Let {Z.}':'oo be an orthonormal sequence of r.v.'s, i.e.,
E{Z.} = 0, IIZ.II = 1 for all nand (Zj,Zk) = 0 for all j # k. From this, we
obtain that the covariance function Cz(n) is defined by
EXAMPLE
{I,
=
Cz(n)
n=0
n #0.
0,
(7.12.7)
Assume that the process {Z.}':'", has a spectral density f(A). Taking into
account that
{(2n)-1/2 e-i... }':'",
(7.12.8)
is a complete orthonormal system on ( -n, n) we can write
f(A) = -
1
2n
L'"
.
k=-oo
Ck e- 1k ..,
where
ck
= f~" e-ikY(A.)dA.
But from this, (7.12.7), and Herglotz's theorem [see (7.1.8)], we conclude that
I,
Ck
= Cz(k) = { 0,
k =0
k # o.
Therefore,
f().)
= {1/2n, -n :::;; ~ :::;; n
0,
otherwIse.
(7.12.9)
The stochastic process {Z.}':'", is often called "white noise." Because the
condition (7.12.5) is clearly satisfied, the process is nondeterministic.
EXAMPLE
7.12.2. Let {~.}':'", be a stochastic process defined by
00
~. =
L IXkZn-k,
k=-oo
(7.12.10)
185
7.13. Linear Prediction
where {Zn}~oo is a white noise process (see the previous example) and the
sequence of complex numbers {ocn}~oo is such that
(7.12.11)
The process gn}~oo is called a moving average.
Clearly, E {~n} = 0 for all n; in addition,
00
00
L L CXiXkE{Zm-jZm+n-d
j=-oo k=-oo
Egm~m+n} =
=
00
L cxk-niik
k=-oo
00
= k=-oo
L cxkiik+n'
(7.12.12)
From this and (7.12.1), it follows that the
Write
{~n}~oo
Then the spectral density f~(A) ofthe process
h(A) = -
1
L
C~(n)e-').n
L
cxrei).r 12
00
2n n=-oo
= -1
2n
1
00
r=-oo
is wide sense stationary.
gn}~oo
is equal to
•
> O.
Remark 7.12.3. The last example shows that for a moving average process the
spectral density f(A) > 0 (a.e.) with respect to the Lebesgue measure. One can
show that the converse also holds. In other words, a wide sense stationary
process with spectral density f(A) > 0 (a.e.) can be represented as a moving
average.
7.13. Linear Prediction
A particular case of a general estimation theory is that of (pure) prediction;
the basic idea was outlined at the beginning of Section 7.7. In Section 7.10,
general ideas of linear prediction were discussed in some detail. This section
7. Spectral Analysis of Stationary Processes
186
is concerned with some basic methods and techniques of the theory of linear
prediction.
Let g(t); t E R} be a wide sense stationary process with E g(t)} = 0, covariance function C(t) (continuous at zero), and a spectral distribution F(A).
Given values ofthe process on a set To !:; R t = ( -00, t], we want to "predict"
its value at a time t + h, h > 0. The prediction ~(t + h) of W + h) is clearly a
functional of g(s);s E To}, i.e.,
~(t
+ h) = ~g(s);s E To}.
(7.13.1)
We are concerned with the linear prediction of e(t + h). As we have seen
in Sections 7.7 and 7.10, the prediction is called "linear" if
~(t
+ h) E <e(To».
(7.13.2)
The best possible prediction in such a case would be the element in <e(To»
closest (in the L2 sense) to W + h). From Proposition 5.4.1, we know that
there exists such an element (uniquely determined) which is the orthogonal
projection of W + h) on <e(To». This projection will be denoted by
PToW
+ h).
(7.13.3)
From Proposition 5.4.2, it follows that
e(t
+ h) - PToW + h) 1. <e(To».
From this, it follows that, for any' E <e(To»,
+ h), 0 =
The norm of the deviation e(t + h) b(h) = IIW + h) -
+ h),O·
PToW + h),
PToe(t + h)1I
(7.13.4)
+ h) - PToW + h)12)1/2,
(7.13.5)
(e(t
= (EIW
(PToe(t
is called the "error" of the linear prediction.
EXAMPLE
7.13.1. Assume that To = {to,tl, ... ,tn} with to < tl < ... < tn =:; t.
In this case
n
PToW
+ h) = L
k=O
IXke(tk),
(7.13.6)
where the sequence of complex constants must be such that (7.13.4) is satisfied. In other words, for any j = 0, ... , n,
(W + h), Wj»
n
=
L IXk(Wk), e(t)
k=O
(7.13.7)
+ h - tJ
(7.13.8)
or
n
L IXkC(tj k=O
t k) = C(t
7.13. Linear Prediction
187
The solution of this system of linear equations will give us the values of the
coefficients Oto, ••• , Otn , for which the right-hand side of (7.13.6) represents the
orthogonal projection of W + h) on <e(To).
Let us now determine the value of the "error" of the linear prediction.
From (7.13.5) and (7.13.6), we have
(j2(h) = II e(t
+ h) -
Jo
Otke(tk) 112
(W + h) - ktO OtkWk), W+ h) - ktO Otke(tk))
= (W + h) - ktO Otke(tk), W
+ h))
=
n
L
= C(O) -
k=O
OtkC(t
+h-
tk)·
(7.13.9)
From this and (7.13.8), we obtain
n
n
(j2(h) = C(O) - L L OtkOtiC(tk - til.
k=O i=O
(7.13.10)
Next, we shall attempt to determine the best linear predictor of e(t + h)
when
To =Rt •
As before [see (7.10.5)], the orthogonal projection of e(t + h) on the subspace
<e(Rt) will be denoted by pte(t + h). Because PtW + h) E <e(Rt ), it follows
from Proposition 7.6.1 and (7.9.7) that PrW + h) has the representation
Pre(t
+ h) =
f:
e iAt q>(2, h)" (d2),
(7.13.11)
where q>(2, h) E L2 {R, al, F} is the spectral characteristic [see (7.9.2)] of the
prediction process.
As an illustration, consider the previous example, and in particular (7.13.6).
Using the spectral representation of the process e(t), we obtain
PToe(t
+ h) =
f: eto
OtkeiAtk),,(d2).
Here,
q>(2, h)
=
n
L OtkeiAtk
k=O
(7.13.12)
[the right-hand side of (7.13.12) depends on h because the coefficients are
functions of hl
To determine q>(2, h) we must devise an analog of the system of linear
equations (7.13.8). The following method is due to Yaglom. Starting from the
188
7. Spectral Analysis of Stationary Processes
fact that
(7.13.13)
it follows that, for any s
(~(t
~
0,
+ h) - P,W + h), W -
=0
s))
or, equivalently,
E{¢(t
+ h)~(t -
s)} - E{PtW
+ h)~(t - s)} =
O.
(7.13.14)
Using the spectral representations
W
+ h) = I :
e i).(t+h)'1(dl),
W - s) = I : e i).(t-S)'1(dl),
and (7.13.11), Equation (7.13.14) becomes
I : ei).S[ei)'h - cp(l, h)] dF(l)
= 0,
(7.13.15)
= O.
(7.13.16)
or if F( . ) is absolutely continuous,
I : ei).S[ei)'h - cp(l,h)]f(l)dl
The corresponding prediction error is
b 2(h)
= EI~(t + h) = E II :
PtW
+ hW
e i).(t+h)'1(dl) - I : eWcp(l, h)'1(dl) 12
= ElI: eW[ei)'h - CP(l,h)]'1(dl)12
= I : leW - cp(l, h)1 2 f(l) dl
= C(O) - I : Icp(l, hW f(l) dl.
(7.13.7)
7.14. Evaluation of the Spectral Characteristic <p(A, h)
According to (7.13.11), the best linear prediction in the L2 sense, of W + h),
h > 0, given all the values of ~(s) for s ::;; t, is completely characterized by its
spectral characteristic cp(l, h). In some special cases, the determination of
7.14. Evaluation of the Spectral Characteristic q>(}., h)
189
<p(A., h) is simple enough (see Example 7.13.1). However, in general, the situation is far more complex. First, the spectral characteristic <p(A., h) may not even
exist (except as a generalized function). Even if it exists, there is generally
speaking, no sufficiently simple expression for Pt~(t + h). Second, from Proposition 7.6.1, we know that the <p(A., h) E L2 {R,!Jt, F} and must be a q.m. limit
of a sequence of linear combinations of eiJ.s, s ~ O. Finally, the condition
(7.13.15) must be satisfied. As a convenience, we list these requirements below.
A function <p(A., h) is a spectral characteristic if the following holds:
(i) <p(A., h) E L2 {R,!Jt, F}, where F(·) is the spectral distribution which is
assumed to be absolutely continuous so that the spectral density f(A.) =
F'(A.) exists.
(ii) <p(A., h) is the q.m. limit of a sequence of linear combinations of eiuJ..
J:oo eisJ.[eihA -
(iii)
for all s
~
<p(A., h)]f(A.) dA. = 0
(7.14.1)
O.
Write
H(A., h)
= [e ihA - <p(A., h)]f(A.)
(7.14.2)
and consider the function H(z, h) of a complex variable z which is assumed to
be single-valued and holomorphic (analytical) in the upper half of the complex plane {z; Im{z} ~ O}. The following proposition gives a sufficient condition for (7.14.1) to hold.
Proposition 7.14.1. If, as Izl-+
some e > 0, then (7.14.1) holds.
00,
IH(z, h)1 vanishes faster than 1/lzll+e for
PROOF. Because e iZA and H(z, A.) are both holomorphic in the upper half of the
complex plane, their product is also holomorphic. Therefore, according to the
Cauchy theorem,
L
e izs H(z, h) dz = 0
for any closed contour C c {z;Im{z} ~ O}. Take C = CR , where CR is formed
by the semicircle Izl = R in the upper half ofthe complex plane and [ -R, R].
According to the residue theorem,
f-R
R
eisAH(A.,h)dA.
=
r eiZAH(z,h)dz.
JC
(7.14.3)
R
However (M > 0 is a constant),
r eizAH(Z,A.)dzl ~ max
IH(z, A.) InR ~ ~ -+ 0
IJ~
~~
R
as R -+
00.
From this and (7.14.3), the assertion follows.
D
7. Spectral Analysis of Stationary Processes
190
In the following, we shall assume that f( . ) is bounded, i.e.,
sup f(A) <
(7.14.4)
00,
),ER
and that <p(A, h) E L2 {R, f7t, F} is such that
(7.14.5)
<p(z, h) is holomorphic in {z;Im{z} ~ O},
as Izl-+
<p(z, h) = O(lzlr)
z E {z;Im{z} ~ O}
00,
(7.14.6)
for some r -+ O.
The conditions of Proposition 7.14.1, (7.14.4), (7.14.5), and (7.14.6) are
sufficient for (i)-(iii) to hold. We will illustrate with some examples how these
conditions are used to construct a spectral characteristic <p(A, h). We will
confine ourselves to the case where f(A) is a rational function.
EXAMPLE 7.14.1. Let the wide sense stationary process g(t); t
covariance function
C(t) = 0'2e-~lrl, IX > O.
E
R} have the
As in Example 7.1.3, we can show that the corresponding spectral density is
0'2
0'2
IX
f(A) = -; A2
+ 1X2 =
IX
-; (2
+ lXi)(A - lXi)'
Now, we have [see (7.14.2)]
0'2
e izh - <p(z, A)
H(z, h) = -; (z + lXi)(z - lXi)
The function f(z) possesses a single pole, z = lXi, in the upper half-plane
{z; Im{z} ~ O}. Because H(z, h) is assumed holomorphic in this domain, this
pole must be canceled by the pole of the numerator at the same point.
Therefore, we must have
ei(~i)h
- <p(lXi, A)
= 0,
(7.14.7)
In addition, the requirement of Proposition 7.14.1 must be satisfied. The only
function <p(z, h) which satisfies all these conditions is <p(z, h) = e-~h (constant).
Thus,
(7.14.8)
From this and (7.13.11), we deduce that the best linear predictor Pre(t
of e(t + h) is
Pre(t
+ h) =
e-~It
f:
e ir ),l1(dA) =
e-~he(t)
+ h)
(7.14.9)
which depends only on the value of e(t) at the last observed instant of time.
The prediction error is easily found to be
~2 =
0'2(1 _
e2~1t).
7.14. Evaluation ofthe Spectral Characteristic qJ(l, h)
191
The following example is more instructive although slightly more
complicated.
EXAMPLE 7.14.2. Let {,(t); t E R} be a wide sense stationary stochastic process
with covariance function
+ sinocltJ)e-"'iti, oc > o.
(7.14.10)
Find the best linear prediction of ,(t + h) given ,(s) for all s ~ t,
Let us determine the spectral distribution of W). Because condition (7.1.9)
C(t) = u 2 (cosoct
holds and C(t) is real, we can uSe formula (7.1.14) to obtain the spectral
density
f().) = -1
foo C(t)cost)'dt
0
1t
2
= -u
1t
foo e"'t(cos oct + sin oct) cos).t dt.
0
After some calculation, we have
f().) =
4oc 3
•
).4 + (ocJ2)4
(7.14.11)
Because
).4
+ (ocJ2)4 =
=
+ i(ocJ2f] [).2 - i(ocJ2)2]
[). + (1 + i)oc] [). - (1 + i)oc] [). + (1
[).2
- i)oc] [). - (1 - i)oc],
we obtain
f().) = [).
+ (1 + i)oc] [). -
(1
+
4oc 3
i)oc] [).
3
ih)'
+ (1
- i)oc] [). - (1 - i)oc]
Hence,
H()' h)
,
=
[).
h)
40c [e - qJ()., ]
+ (1 + i)oc] [). - (1 + i)oc] [). + (1 - i)oc] [). - (1 - i)oc]
Becase the poles of the denominator in the upper half-plane are
). = (1
+ i)oc,
). =
-(1 - i)oc,
it is clear that the numerator must also vanish at these points, so that
+ i)oc, h) = e ih(lH)", = e- h(l-i)""
(7.14.12)
qJ( -(1 - i)oc, h) = e ih(i-l)", = e- h(lH)",.
(7.14.13)
qJ«1
On the other hand, from condition (i) we see that
f:
IqJ()., hW f().) d)' <
00,
192
7. Spectral Analysis of Stationary Processes
which clearly holds if
= a2 + b.
qJ(2, h)
(7.14.14)
With this spectral characteristic, all the conditions concerning asymptotic
behavior of qJ(Z, h) and H(z, h) as Izl--+ 00 are also satisfied.
To determine the coefficients a and b in (7.14.14), we use conditions
(7.14.12) and (7.14.13) to obtain the system oflinear equations
a(1
-a(1
+ i)a + b = e-h(l-i)",
- i)a + b = e-h(l+i)".
Solving this system, we obtain
. -h"
a
Ie
. h
= --sm
a,
b = e-h,,(cos ha
a
+ sin hal.
From this and (7.14.14), it follows that
2' -h"
qJ(2, h) = _Ie_sinha
a
+ e-h"(cosha + sinha).
We now substitute (7.14.15) in (7.13.11) to obtain
PtW
+ h) =
f
oo
-00
eitA (2'~
~sinha
= e~h" sin ha
f:
)
(7.14.15)
+ e-h"(cosha + sinha) '1 (d2)
i2e iAt '1(d2)
+ e-h,,(cos ha + sin ha)
f:
eW '1(d2).
From this and (7.11.16), we conclude that
PtW
+ h) = e- h" (sinaha ~'(t) + (cos ha + sin ha)~(t)).
(7.14.16)
7.15. General Form of Rational Spectral Density
In this section we will discuss the problem of evaluation of the spectral
characteristics qJ(2, h) in the case of rational spectral densities of the form
f(2) = K\(2
-ad· .. (2 - ak)\2,
Pd .. .(2 - Pn)
(7.15.1)
(2 -
where K > 0, k < n, and {a;} and {Pr} are the complex numbers such that
Im{a;} > 0, j
EXAMPLE
= 1, ... , k,
1m {Pr} > 0,
r = 1, ... , n.
7.15.1. The spectral density f(2) given by
22 + p2
f(2) = K 24
+cx
2'
K > 0,
a> 0,
P> 0,
(7.15.2)
193
7.15. General Form of Rational Spectral Density
can be written in the form (7.15.1) as follows. Clearly,
,e + /]2 = IA -
i/W,
(A2f
+ (a 2)2 = IA2 -
ia 212.
Hence,
f(A)
=
1
2
A - if3
(A - aJi)(A
+ aJi)
1 •
The function [eiA.h - <p(A, h)] will be holomorphic in the upper part of the
complex plane if the spectral characteristic <p(A, h) has singularities only at
points aI' ... , ak • In such a case, <p(A, h) must be in the form
<p(A, h)
=
(A -
Q(A, h)
ad .. ·(A - ad
Q(A, h)
TI1 (A -
(7.15.3)
a)'
where Q(A, h) is an entire function (i.e., it does not have singularities at finite
points). This takes care of condition (7.14.5). It is also required [see condition
(i) of Section 7.14] that
f:
I<p(A, hWf(A) dA <
(7.15.4)
00.
In addition [see (7.14.2)],
H(A h) = K {eiA.h TI1 (Ie - aj) - Q(A, h)}
,
(TI~ (A - f3))(TI~ (Ie - 13))
Ii (Ie _
I
iX.)
(7.15.5)
J
must be holomorphic in the upper part of the complex plane and such that
condition (7.14.6) holds. All these requirements clearly imply that Q(Ie, h) must
be a polynomial function
Q(A, h) = Co
+ CIA + ... + Cn_IA n- l .
(7.15.6)
The coefficients {cj }Z-l will be determined first under the condition that all
131' ... , f3n in (7.15.1) are different, i.e.,
(7.15.7)
Because, by assumption, H(A, h) is holomorphic in the upper half of the
complex plane, it follows readily that
(7.15.8)
for A = 13" 1 = 1, ... , n. Solution of this system of n linear equations gives the
unknown constants Co, ••• , Cn - 1 •
7.15.2. Determine the best linear predictor for ~(t + h), h > 0, based
s :5:: t, if the spectral density of the process ~(t) is given by
EXAMPLE
on
~(s),
f(A)
=
(A 2
1
+ 2( 2 )(AZ + aZj2)' a> 0.
7. Spectral Analysis of Stationary Processes
194
We can clearly write
f(A) =
1
,
I(A - iCXj2)(A - icx/j2W
(7.15.9)
so that n = 2, Pl = icxj2, P2 = icx/j2, and conditions (7.15.2) and (7.15.7) are
satisfied. From this it clearly follows that
Q(A,h) = Co
+ C1A,
so that Equation (7.15.8) becomes
e Uh
-
(co
+ C1A) = 0
for A = irxj2 and A = icx/ j2. Therefore, we have the system
+ icxj2c 1 = exp( - cxhj2),
Co + c 1 icx/j2 = exp( -cxh/j2)
Co
from which we obtain
Co
= [2 - exp( -rxhj2)] exp( -rxhj2),
C1
'0.£.[1
= _/ycx
(7.15.10)
- exp( -cxh/j2)] exp( -cxh/j2).
According to (7.15.3), we now have
((J(A,h) = Co
+ C1A,
so that best linear predictor is [see (7.13.11)]
Pr~(t + h) =
f:
f:
eiAt({J(A, h)r/(dA)
= Co
= coW)
eUt '1(dA)
+ C1 f:oo Ae w '1(dA)
+ C1 ~'(t)
with the coefficients Co and C1 given by (7.15.10).
Remark 7.15.1. The last example is easy to generalize. If the spectral density
of ~(t) is in the form
then clearly
({J(A-, h) =
n-l
I
j=O
CjA- j,
where the Cj can be obtained as the solution of the system oflinear equations
7.15. General Form of Rational Spectral Density
n-1
L
195
= eiJ.h
c))
j;O
(7.15.11)
for A = f3j ,j = 1, ... , n. In this case, the best linear predictor of ~(t
~(s) for all s :::; t, is
Pt~(t
+ h) =
+ h), given
+ C 1 ~'(t) + ... + Cn- 1~(n-1)(t).
coW)
(7.15.12)
When the zeros f31' ... , f3n in (7.15.1) are not all different, the procedure for
determining the spectral characteristic cp(A, h) remains essentially the same, as
the following example shows.
EXAMPLE
7.15.3. Let the spectral density be given by
In this case, we clearly have that n = 2, so that
cp(A, h) = Q(A, h) =
We obtain the coefficients
Co
and
eihJ. - (co
C1
Co
+ C 1 A.
from the fact that in such a case
+ C1 A) = 0
for A = irx
and
d 'h'
-[e' "- (co
dA
•
+ C1 A)]
=0
for A = irx.
This yields the equations
Co
+ c 1 iA = e-~h
C1
= ihe-~h,
.
the solution of which is
(7.15.13)
The best linear predictor of W + h) in this case is
Pt~(t
+ h) =
coW)
+ C 1 ~'(t),
where Co and C 1 are given by (7.15.13).
In general, if f31 appears 11 times, ... , f3, appears 1, times, where
11 + ... + 1, = n,
we have
dV
dA
-v
(.
e,J.h
n (A '
1
A = f31,
f3j)lj -
L
n-1.)
CjA) = 0
j;O
v = 0, 1, ... ,lp - 1,
p
for
= 1, ... , r.
(7.15.14)
7. Spectral Analysis of Stationary Processes
196
Problems and Complements
7.1. The covariance function of a wide sense stationary process
by (to> 0)
1
{
C(t) = 0
It I
{~(t); t E
R} is given
ifltl~to
to
if It I > to.
Determine its spectral density.
7.2. Let C(t) = e- 1tl cos ext be the covariance function of a wide sense stationary
process. Find its spectral density.
7.3. Does there exist a wide sense stationary process whose covariance function is
defined by
~ 0 ~ t ~ to
C(t) = {(12
o
1ft>
to.
7.4. Let X be a r. v. distributed on [0, tJ with the probability density jx( .). Let Y be
another r.v. independent of X and uniformly distributed on [-t, tJ. Show that
W) = (1cos(tX + Y)
is wide sense stationary and find its spectral density.
7.5. A wide sense stationary process {W); t E R} has the covariance function
C(t)
(12e- a1tl (1
=
+ Itl), ex> o.
Find its spectral density.
7.6. Find the spectral density of a wide sense stationary process with covariance
function
C(t) =
(12e- a1tl
ex>
cos Pt,
o.
7.7. Is the wide sense stationary process with spectral density
(i)
(ii)
j(J..) = J..2
j(J..) = K {ex 2 +
ex
+ P'
(~ _ P)2 + ex 2 + (~ + p)2}'
ex > 0,
differentiable?
7.8. What are conditions for infinite differentiability of a wide sense stationary
process (see Example 7.11.3).
7.9. The wide sense stationary process
{~(t); t E
R} has covariance function
C(t) = e- a1t { 1 + ex Itl
Determine its spectral density.
+ (ext).
Problems and Complements
197
7.10. Find the spectral density of the wide sense stationary process with covariance
function
C(t)
= e-·,t,(cos flt
-
IX sin
flltl).
7.11. Show that the orthogonal stochastic measure Yf has the following properties (see
Section 7.3):
(i) If AI' A2 E g, (Yf(A d, Yf(A 2)) = m(Al n A2)·
(ii) If A, BEg are disjoint,
Yf(A v B) = Yf(A)
+ Yf(B)
(a.s.).
(iii) If {B}f c: g is a disjoint sequence,
YfCQ Bk) = Jl Yf(Bk)
(a.s.).
7.12. Let Yf: g x Q --+ L {Q, 31, p} be a stochastic orthogonal measure, and let m: g
R+ [see (7.3.3) for the definition] be the measure associated with Yf. Define
--+
where hE L 2 {R,9i',m}. Show that Yfl is an orthogonal stochastic measure with
m 1 (. ) as associated measure.
7.13. (Continuation) If p(.) E L2 {R, 9i',md show that
L
P(S)YfI (ds)
=
L
p(s)h(s)l'/(ds)
(a.s.),
BEg.
7.14. (Continuation) If Ihl > 0 (a. e.) em], then
Yf(B) =
L
h- 1 (s)YfI (ds),
BEg.
7.15. Let {Zn}f c: L2{Q,31, P} be an orthogonal family such that I f IIZnl12 < 00.
Let {S, g} be a measurable space with g containing singletons as elements, i.e.,
XES => {x} E g. Define Yf: g --+ L2 {Q, 31, P} by
en
Yf(B)
=I
n=l
ZnIB(Xn),
{xn}f c: S,
BEg.
Show that Yf( . ) is an orthogonal stochastic measure with associated measure m( . )
given by
en
m(B) =
I
n=l
I Z nI1 2IB(xn).
Show also that
7.16. Let Yf be an orthogonal random measure with m(·) as associated measure. If B1 ,
B2 E g show that
198
7. Spectral Analysis of Stationary Processes
(i) 1117(Bd - I7(B2 W = m(B1 A B2 ),
(ii) I7(B1 u B2 ) = I7(Bd + I7(B2 ) - I7(B 1 n B2 ) (a.s.),
(iii) I7(B 1 A B2 ) = I7(B 1 ) + I7(B2 ) - 217(B 1 n B2 ) (a.s.).
7.17. (Continuation) If B1 c B2 , show that
(i) 1117(B1 )11 :s;; 1117(B2 )11,
(ii) I7(B2 - Bd = I7(B2 ) - I7(Bd (a.s.).
7.18. Let {~(t); t E R} be a wide sense stationary process with E {W)} = 0 and spectral
measure F (we use the same symbol for Lebesgue-Stieltjes measure and the
function F that generates the measure). Show that
1.i.m.
' ..... 00
r ~(s) ds = F(O).
Jo
t
7.19. (Continuation) Under the same conditions as in the previous problem, show that
lim -1
t-oo t
it
C(s) ds
= F(O),
0
where C is the covariance function of ~(t).
7.20. Let {X(t); t E R} be a wide sense stationary process with continuous covariance
function and spectral representation
X(t) =
Let
~(t)
f:oo eiAtl7x(dA.).
be defined by
~(t) = f:oo h(s)X(t -
s)ds,
where hE L2 {R, 9l, Fx}. Determine the covariance function and the spectral
density of W), assuming that the spectral density of X(t) is CPx(·).
7.21. Let {~(t); t ~ O} be a standard Brownian motion and Z a r.v. independent of ~(t),
with E{Z} = 0 and EIZI 2 = 1. Suppose that the process {X(t); t ~ O} is observed where X(t) = Zt + ~(t). Given X(s), s :s;; t, determine the best q.m. estimate it of Z in the form
it =
t
I/I(t,s)dX(s).
7.22. Let the process {X(t); t E R} be defined by
X(t)
=
Z cos 2IXt
+ N(t),
where {N(t);t E R} c L 2 {n,£i,p} with E{N(t)}
E{N(s)N(t
=
0,
+ s)} = eltl
and Z is a r.v. with E{Z} = 0, EIZI 2 = 1, and E{ZN(t)} = O. Determine a q.m.
estimator of 1'; of Y given X(s) for all s :s;; t in the form
y. =
t
I/I(t,s)dX(s)
+ X(O).
Problems and Complements
199
7.23. Suppose we observe
X(t) =
~(t)
+ (t),
where {~(t); t E R} is a wide sense stationary process with spectral density f~(A)
and g(t); t E R} a stationary noise process with spectral density h(A). Assuming
that ~(t) and e(s) are orthogonal for all s, t E R, show that the best q.m. linear
estimate of ~(t) based on X(s) is given by
foo
h(A)
iAt
-00
e h(A) + f~(A) Tfx
(dA)
.
7.24. (Continuation) Show that the best linear estimate of e(t
given by
+ h) given X(s), s ~ t, is
f:oo eiJ.t(A, h)Tfx(dl).
7.25. In Problem 7.23, suppose that h and f~ are specified as follows:
HA) = A2
Ki > 0,
0(,
+ 0(2'
P> 0. Show that, for h > 0,
(A h) _
<p,
where
Kl
- Kl
K1
+ K2
0(
0(
+ p p + iA
+ y y + iA e
-.h
,
CHAPTER 8
Markov Processes I
8.1. Introduction
The concept of a Markov random process was defined in Section 1.7 of
Chapter 1. Various particular cases were discussed in subsequent chapters.
Without doubt, this has been the most extensively studied class of random
processes.
Let g(t); t E T} be a Markov process on a probability space {n, 81, P}
assuming values in a set S. Denote by Y' a a-algebra of subsets of S. As usual,
the measure space {S, Y'} will be called the "state space" of the random
process ~(t). The process ~(t) is said to be a homogeneous Markov process if
its transition probability has the property
Pg(s
+ t) E BI~(s) = x} = P(x, t, B).
(8.1.1)
for all t > 0, s, s + t E T, XES, and BEY'. In the following (unless otherwise
stated), we will deal exclusively with homogeneous Markov processes.
The transition probability (8.1.1) must satisfy the Chapman-Kolmogorov
equation
P(X, s
+ t, B) =
Is
In addition, the following also must hold. For each t E T and
P(x, t, .) is a probability measure on Y'
and, for every t
E
(8.1.2)
P(x, s, dy)P(y, t, B).
XES,
(8.1.3)
T and BEY',
P(', t, B) is an Y' -measurable function on S.
In the following, we take T = [0, (0).
(8.1.4)
8.1. Introduction
201
The class of homogeneous Markov processes has an important feature.
From the Chapman-Kolmogorov equation (8.1.2), we see that knowing the
transition probability (8.1.1) for all x, B, and s ~ e, where e > 0 is arbitrarily
small, we can determine P(x, u, B) for any U > e. In other words, the local
behavior of a homogeneous Markov process (in a neighborhood of zero)
determines its global behavior.
Every function P(x, t, B) having properties (8.1.2)-(8.1.4) is said to be a
transition probability. Let n( . ) be a probability distribution on the state space
{S, Y}. If P(x, t, B) is a transition probability, does there exist a Markov
process with n( . ) as its initial distribution (see Definition 1.5.3) and transition
probability P(x, t, B)? The answer is affirmative. This is a stochastic process
{W); t ~ O} with marginal distributions
P{Wd
=
E
Bl, .. ·,~(t,,) E Bn}
f f
n(dxo)
S
B,
where 0 < t1 < ... < t",
P(xo,t1,dxd···
{Bdi
c Y, n
n(B)
r P«X,,_1,t. -
JB"
t,,_1,dx,,),
(8.1.5)
= 1,2, ... , and
= P{ ~(O) E B}.
(8.1.6)
The family of finite-dimensional distributions (8.1.5) is a consistent one which,
according to the Kolmogorov theorem 1.3.1, determines completely and
uniquely a probability measure, which will be denoted by P", on the measurable space {SR+, yR+}, where, as usual, R+ = [0, (0).
In the sequel, the following notation will be often useful. For any XES, we
will write
Px (-) = P,,{·I~(O) = x}.
(8.1.7)
Because
n(B) = P" {~(O) E B},
we have
P,,(G) =
Is Px(G)n(dx),
(8.1.8)
where G E yR+. The expectation operator corresponding to Px will be denoted by EX"
EXAMPLE 8.1.1. As an illustration, consider a homogeneous Markov process
{~(t); t ~ O} with state space S = { -1,1}, an initial distribution n( {I}) = t,
and a transition probability satisfying
P(l,t,{l})
= P(-l,t,{ -I}) = a(t),
(8.1.9)
where lX(t) is a continuous function on [0, (0) such that a(O) = 1. We want to
determine P,,{e(t) = I} and a(t) for all t ~ O.
8. Markov Processes I
202
The first part of the problem is simple. For every t > 0,
P,,{e(t)
= I} = tP(1, t, {I}) + tp( -1, t, {I}).
From this and (8.1.9), we have
P,,{e(t)
= I} = 1-
To determine IX (t), we will use the fact that P(I, t, {I}) is the transition
probability of the Markov process e(t) and, therefore, it must satisfy the
Chapman-Kolmogorov equation
P(I,t
+ s, {I}) = P(I,s, {1})P(I,t, {I}) + P(l,s, {-I})P( -1,t, {I}).
From this and (8.1.9), we readily deduce the functional equation
lX(t
+ s) = IX(S)IX(t) + [1 = 21X(s)lX(t) + 1 -
IX(S)] [1 - lX(t)]
IX(S) - lX(t).
(8.1.10)
Now, by means of the substitution
lX(t)
= t[1 + h(t)],
h(t
+ s) = h(s)h(t).
Equation (8.1.10) becomes
(8.1.11)
This is clearly the Cauchy functional equation whose only continuous solution is
h(t) = e-;", A. > O.
From this and (8.1.11), we obtain that
lX(t) = (1
+ e-;")/2.
Remark 8.1.1. A more conceptually complex concept of a Markov process
was described in Dynkin's book Markov Processes. According to Dynkin's
definition of a Markov process, its evolution takes place only up to a certain
random time. and then it terminates. More precisely, at time • (w), the
trajectory e(t, w) vanishes. For instance, if {e(t); t ~ O} is a Markov process on
a probability space {n, &I, P} and a random time .: n -+ (0, (0) is given, then,
for any WEn, the trajectory e(t, w) is defined only on [0, .(w)].
Markov processes considered in this book are those for which P{.(w) =
+oo} = 1. Such processes are often called "conservative."
Remark 8.1.2. It is reasonable to suppose that no transition can take place in
zero time. For this reason, we will assume from now on that
P(x, 0, B)
1 if x E B
= { 0 if x ¢ B.
(8.1.12)
8.2. Invariant Measures
203
We will also assume that P(x, t, B) is continuous at zero, i.e., that
1 if x E B
lim P(x, t, B) = { 0
if x ¢ B,
1-+0+
for all
XES
(8.1.13)
and BE!/'.
8.2. Invariant Measures
Let {W); t ;;::: O} be a (not necessarily homogeneous) Markov process with
state space {S,!/'} and transition probability P(s, t, x, B) (0 ~ s < t). It is not
difficult to see that if ~(t) is strictly stationary, then it must be a homogeneous
Markov process. This readily follows from
P(s,t,x,B) = P{e(t) E BI~(s) = x}
= P{e(t - s) E BI~(O) = x} = P(O,t - s,x,B) = P(x,t - s,B).
The converse, however, does not hold, in general. In other words, a homogeneous Markov process is not necessarily strictly stationary.
Under what conditions is a homogeneous Markov process strictly stationary? To answer this question, we need the concept of an invariant measure.
Definition 8.2.1. An invariant measure of a homogeneous Markov process
with a state space {S,!/'} and transition probability P(x, t, B) is a measure Jl
on {S,!/'} satisfying the following condition: for each BE!/' and t ;;::: 0
Jl(B) =
I
P(x, t, B)Jl(dx).
(8.2.1)
An invariant measure is not necessarily a probability measure on {S, !/'}.
When Jl(S) = 1, it is often called a "stationary distribution" of the Markov
process.
8.2.1. Let {~(t); t ;;::: O} be a standard Brownian motion process. As
we know (see Remark 3.1.1), ~(t) is a homogeneous Markov process with the
state space {R, aI} and transition probability
EXAMPLE
P(x, t, B) =
(27tttl/2 t exp ( - (y ;/)2) dy.
(8.2.2)
From this, we readily deduce that
P(x, t, B) --. 0
as t --.
+ 00
(8.2.3)
for every compact B E f#i. Therefore, it there exists a finite invariant measure
Jl, we must have
204
j,L(B) =
for every t
(8.2.3) that
~
0 and compact B
j,L(B)
f:
E~.
8. Markov Processes I
P(x, t, B)j,L(dx)
By letting t
--+
+00, we would get from
= 0 for all compact B E {]t.
Consequently, the only finite invariant measure of a standard Brownian
motion is j,L == o.
On the other hand, if j,L is the Lebesgue measure, we have, for all t > 0,
f:oo P(x,t,B)dx =
=
f: L
L f:
P(x,t,dy)dx
[(2nt)-1/2
exp (
(y
~tX)2)dXJdY =
L
dy.
In other words, when j,L is the Lebesgue measure, (8.2.1) holds. This clearly
implies that the Lebesgue measure is an invariant measure of a standard
Brownian motion process.
Let {~(t); t ~ O} be a strictly stationary Markov process. Then, as we have
just seen, it must be homogeneous. Denote by P(x, t, B) its transition probability and by n(· ) its initial distribution. Clearly then, for all t,
nCB)
= Pg(t) E B},
From this, we obtain
nCB)
=
BE f/.
Is P(x, t, B)n(dx).
Consequently, everyone-dimensional marginal distribution of the process is
an invariant measure.
Conversely, if j,L is a stationary distribution of a homogeneous Markov
process, then the process is also strictly stationary under Pw As a matter of
fact, because j,L can be considered as the initial distribution of the homogeneous Markov process, say g(t);t ~ OJ, with a state space {S,f/}, we have,
for any t > 0 and B E f/,
Pg(t)
E
B}
Is Pg(O) dx, ~(t) B}
= Is P(x, t, B)j,L(dx) = j,L(B).
=
E
E
From this, we deduce [see (8.1.5)] that
P{WI
+ s) E Bl,~(t2 + s) E B2}
=fs JBr JBr Pg(s)Edx,Wl+S)Edxl,W2+S)Edx2}
1
2
205
8.3. Countable State Space
=f f f
= Is Pg(0)Edx,WdEBl,~(t2)EB2}
P(x,tl,dxl)P(Xl,t2-tl,dx2)/l(dx)
S
B,
B2
= P{ Wl) E B l , ~(t2) E B 2},
and so on.
8.3. Countable State Space
A class of homogeneous Markov processes with countable state space is of
considerable interest in a variety of applications, particularly in biology and
engineering. In this section, we outline some fundamental properties of this
particular class.
Let {~(t); t ~ O} be a homogeneous Markov process with countable state
space S. Processes of this type are often called "continuous time Markov
chains" or simply Markov chains. In the following, without loss of generality,
we may assume that
S = {O, I, ... }.
We will denote by (s, t
~ 0)
Pij(t) = Pg(s
+ t) =
jl~(s) =
i}
(8.3.1)
the transition probability from a state i E S to another state j E S after a
time-lapse of duration t. In accordance with (8.1.12) and (8.1.13), we have that
I
if i = j
if i =f. j,
Pij(O) = { 0
and that
.
11m Pij(t) =
{I
(8.3.2)
if i = j
0 if·..4.·
1-+0+
I T"
J.
(8.3.3)
The last condition means that Pij(t) is continuous at t = O.
For any t ~ 0 and i,j E S,
o : :; Pij(t) :::;; 1
and
co
L Pik(t) = 1,
k=O
(8.3.4)
so that the transition probability matrix
POl (t)
Pll (t)
... J
...
(8.3.5)
8. Markov Processes I
206
is a stochastic matrix. Due to (8.3.3), we clearly have that
lim Mt
= 1,
t~O+
where 1 is the unit matrix.
The Chapman-Kolmogorov equation has the form (s, t ;;::: 0)
Piis
+ t) =
00
L Pik(S)Pkj(t).
k=O
(8.3.6)
By means of this equation, one obtains readily that
Ms+t=Ms·M t.
(8.3.7)
This clearly implies that the family of transition matrices {Mt; t ;;::: o} forms a
semigroup. Next, we shall show that Pij(t) is continuous for all t ;;::: O.
Proposition 8.3.1. For every i, j
Ipij(s
PROOF.
E
Sand s, t ;;::: 0,
+ t) -
Pij(S) I ~ 1 - Pii(t).
(8.3.8)
From the Chapman-Kolmogorov equation [(8.3.6)], we obtain
Pij(s
00
+ t) -
Pij(s) =
L Pik(t)Pkj(s) k¥i
Pij(S)
+ Pii(t)Piis).
(8.3.9)
On the other hand,
00
00
L Pik(t)Pkj(S) ~ k¥i
L Pik(t) = 1 k¥i
Pii(t),
which yields
Pij(S
+ t) -
Pij(s)
~
[1 - pdt)] [1 - Pij(s)].
In a similar fashion, we deduce from (8.3.9) that
- [Piis + t) - Pij(S)]
=~
00
L Pik(t)Pkis) + Pij(s) k¥i
Pij(S) [1 - pdt)]
~
Pii(t)Pij(S)
[1 - Pii(t)],
o
which proves (8.3.8).
Corollary 8.3.1. From (8.3.8), we have that, for any 0 < t < s,
Ipij(s - t) - Pij(s)1
~
1 - Pii(t).
This, (8.3.8), and (8.3.3) clearly imply that Pij(t) is uniformly continuous on R+
for all i,j E s.
The question of differentiability of Pij(t) is of central importance in applications. This stems from the fact that attempts to calculate Pij(t), under
various regularity assumptions on the Markov chain ~(t), lead to a system
207
8.3. Countable State Space
of difference-differential equations whose solution may give Pij(t). Further
discussion on this subject requires the following lemma due to Hille.
Lemma 8.3.1. Let H: R
--+
R, where H(t)
= 0 for
all t ::;; 0, be subadditive, i.e.,
for all s, t E R,
H(s
+ t) ::;; H(s) + H(t),
(8.3.10)
then
lim H(t)/t
=
t~O+
where c
~
sup H(t)/t
O,;;t<oo
= c,
(8.3.11)
0 may be +00.
PROOF. From the definition of H(·), it follows that H(t) ~ 0 for all t E R. Now,
choose any c' < c. Then, from the definition of c, it follows that there exists
s > 0 such that
c' < H(s)/s.
Next, fix s and write
s=nt+b,
n= 1,2, ... ,
where t > 0 and 0 ::;; b < t. Clearly then,
c' < !H(s) ::;; nH(t)
s
When t --+ 0, b --+ 0 and n --+
inequality, we obtain
00
+ H(b) =
nt H(t)
sst
+ H(b).
s
so that ntis --+ 1. Consequently, from the last
c' ::;; lim inf H(t)/t,
t~O+
o
which proves the lemma.
Proposition 8.3.2. For each i
E
S, the limit
lim (1 - Pii(t»/t = qii
(8.3.12)
exists but may be equal to +00.
PROOF.
Write
(8.3.13)
From (8.3.6), we clearly have
Pii(S
so that
+ t) ~ Pii(S)Pii(t),
8. Markov Processes I
208
and, due to (8.3.2), H (0) = O. Consequently, all the conditions of Lemma 8.3.1
are satisfied. Now, from (8.3.13), we have, as t ---.0,
1
Hi(t)
((1 - pdt» = -t-[1
+ O(Hi(t»)].
o
This and Lemma 8.3.1 prove the assertion.
Remark 8.3.1. For all i E S, Pii(t) is differentiable at zero.
Next, we discuss the question of differentiability of the transition probabilities Pij(t) at t = 0 for i -=1= j. The following proposition shows that P;j(O) = %
always exists and that 1%1 < 00.
Proposition 8.3.3. For all i
-=1=
j, from S, the following limit exists:
lim pij(h)/h
= qij <
(8.3.14)
00.
h~O+
PROOF. Let c E (t, 1) be fixed. Then, due to (8.3.3), there exists an s > 0
sufficiently small so that Pii(S) > c and Pjj(s) > c. Let 0 < S ::;; nh < 1>. Consider
a homogeneous Markov chain gn}(f with state space S such that
p gn+1
= jl ~n =
i}
pij(h).
=
Then, for all n > 1, we have
pij(nh)
=
Pgn
= jl~o =
i} ~ Pgl
= j, ~n = jl~o =
i}
n-I
+ r=1
L Pgl
-=l=j'''''~r-l -=l=j'~r
~ CPiih)( 1 + :t: Pgl
-=1=
j, ... ,
=
il~o
=
~r-I
i,~r+1 =j'~n =jl~o
-=1=
j,
~r = il~o =
=
i}
=
i}
i}).
On the other hand,
Pgl -=l=j'''''~r-I -=l=j'~r
= i}
= Pgl -=l=j'''''~r-2 -=l=j'~r = il~o
- Pgl -=l=j'''''~r-2 -=l=j'~r-I =j'~r = il~o = i}
=
Pgl -=l=j'''''~r-3 -=l=j'~r
= il~o =
i}
- Pgl -=l=j'''''~r-3 -=l=j'~r-2 =j'~r
= il~o =
i}
- Pgl -=l=j'''''~r-2 -=l=j'~r-I =j'~r = il~o = i},
and so on. Continuing in this fashion, we obtain
Pgl -=l=j'''''~r-I -=l=j'~r
= Pgr = il~o
~
C -
(1- c)
= il~o =
i}
L Pgl
-=l=j'''''~k-I -=l=j'~k =j}Pgr = il~k =j}
=
i} -
L
Pgl -=l=j'''''~k-I -=l=j'~k =jl~o = i} ~ 2c-1.
k<r
k<r
8.3. Countable State Space
209
Consequently,
Pij(nh)
~
(2c - l)npij(h).
Set t < b, h < b, and n = [t/h], where [x] is the integer part of x; then,
Pij(h)/h ~ pij([t/h]h)/{[t/h]hc(2c - I)}.
By letting h --+ 0+, we obtain
lim sup (pij(h)jh) ~ Pij(t)/{ tc(2c - I)} <
00
h~O+
for all t > O. Therefore,
lim sup (pij(h)/h) ~ lim inf (Pij(t)/{t(2c - l)c}),
h~O+
t~O+
which means that the limit
qij
=
lim (Piih)jh) <
00
h~O+
o
exists. This proves the proposition.
If So c S is any finite subset which does not contain the state i, we have
(1 - p;;(h»/h
~ (.L
)eSo
Piih»)!h.
By letting h --+ 0+, it follows [see (8.3.12) and (8.3.14» that
00
qu ~
If qu <
00
L qij=>qU ~ j",i
L qu·
jeSo
and
00
qii =
L qij'
Ui
(8.3.15)
the state i is called "stable" or "regular." Otherwise, it is called "nonregular."
Remark 8.3.2. The matrix
A=
(8.3.16)
is called the infinitesimal generator of the Markov chain {e(t); t ~ O}. Its
essential role will become clear in what follows.
Remark 8.3.3. The state i E S is "instantaneous" if qu =
qii = O.
00
and "absorbing" if
8. Markov Processes I
210
Proposition 8.3.4. Assume that qii <
00 for each i E S, then the transition
probabilities Pij(t) are differentiable for all t ~ 0 and i,j E S. In addition,
pW) =
00
L qikPkit) k#i
(8.3.17)
qiiPij(t).
PROOF. Consider any two states i,j E S. Then for any h > 0 and a finite subset
No c S containing i andj, we have
(Pij(t
+ h) -
pij(t»/h = (
L
keNo
+
Pik(h)Pkj(t) - Pij(t)
L
keNo"
Pik(h)Pkj(t»)!h.
(8.3.18)
Clearly,
(8.3.19)
Due to (8.3.15), given any arbitrary small a> 0, we can find No such that
L
keNo
qik:5: a.
Now, choose ho > 0 so that for each h :5: ho, k i= i, and kENo,
IPik(h)/h - qikl :5:
a/II No II
and
1(1 - Pii(h»/h - qiil
:5:
a/II No II,
where IINol1 = Card{No} (the number of elements in No). Consequently, from
(8.3.18) and the condition of the proposition, we have
L
keNo-Ii}
qikPkj(t) - qiiPij(t) :5: lim inf (Pij(t
h-+O+
+ h) -
:5: lim sup (Pij(t
h-+O+
+ h) -
By letting a ~ 0, (8.3.17) follows.
pij(t»/h
Pij(t»/h
o
Remark 8.3.4. The system (8.3.17) is called the Kolmogorov backward equations. The matrix form of it is
(8.3.20)
Remark 8.3.5. The following also holds:
dM'/dt = M'· A,
which is called the Kolmogorov forward equation.
(8.3.21)
8.4. Birth and Death Process
211
8.4. Birth and Death Process
The birth and death process is an important example of the homogeneous
Markov chain {~(t); t ~ O} with state space S = {O, I, ... }. Here, ~(t) represents
the size of a population at time t, which fluctuates according to the following
rules:
if at time t the chain is in a state i E S,
in one transition it can go only to i-lor i
(8.4.1)
+ 1.
The transition from state i to i + 1 indicates a "birth" in the population,
whereas the transition from i to i - I indicates a "death." Two or more
simultaneous births or deaths are not permitted. We also suppose that
"autogenesis," i.e., transition from 0 to 1 is not excluded.
Assume that the chain {~(t); t ~ O} is stochastically continuous and that all
states are stable [i.e., qii < 00 and (8.3.15) holds]; then,
lim (1 - Pii(h))/h = qu <
00,
h ... O
i E S,
for all i E S and, from assumption (8.4.1) we have, as h ..... 0,
pij(h) = o(h)
Pi,H1 (h) = Aih
+ o(h),
if Ii
- il > 1,
(8.4.2)
Pi,i-1 (h) = I'ih
+ o(h),
(8.4.3)
where [in each case o(h) may depend on i]
1'0 = O.
(8.4.4)
From this, one can deduce the following: Aih + o(h) represents the probability
of a birth in (t, t + h) given that ~(t) = i. Similarly, I'ih + o(h) represents the
probability of a death in (t, t + h) given W) = i. Because all the states are
stable, we have that
(8.4.5)
qu = Ai + I'i·
The infinitesimal generator A (see (8.3.16)] in this case is
A=
-AO
AO
0
0
1'1
-(A1 + 1'1)
A1
0
0
1'2
- (A2 + 1'2)
A2
0
0
1'3
-(A,3
+ 1'3)
The parameters Ai and I'i are called the birth and death rates, respectively.
Now, from the Kolmogorov backward matrix equation
dMI/dt = AMI,
8. Markov Processes I
212
[see (8.3.7)] we readily deduce that
POj(t)
= - AoPoN) + AOPlj(t),
p;it) = Jl.iPi-l,j(t) - (Ai
j
= 0, 1, ... ,
+ Jl.i)Pij(t) + AiPi+l,j(t),
i:?;
(8.4.6)
1.
This is the Kolmogorov backward system of difference differential equations
for a birth and death process.
From the Kolmogorov forward equation (8.3.21), we obtain
+ Jl.IPi,l (t), i E S,
Aj-1Pi,j-l(t) - (Aj + Jl.j)Pij(t) + Jl.j+1Pi,j+l(t),
p;o(t) = - AoPiO(t)
P;j(t) =
j:?;
1.
(8.4.7)
Denote by
P{W)
=j} = Pj(t);
then, from (8.4.7), we obtain that
+ Jl.IPl (t),
pj(t) = Aj-1Pj-l (t) - (Aj + Jl.j)Pj(t) + Jl.j+1Pj+l (t).
Po(t) = - AoPo(t)
(8.4.8)
Remark 8.4.1. If Jl.k == 0, the process e(t) is called a "pure birth" process. If, on
the other hand, Ai == 0, the process e(t) is called a "pure death" process.
The following is an application of the birth and death process in a telephone traffic problem.
EXAMPLE 8.4.1. Consider a telephone exchange where the number of available
lines is so large that for all practical purposes it can be considered infinite.
Denote by e(t) the number of lines in use at time t. Then, our physical
intuition is not violated by assuming that {e(t); t :?; O} is a birth and death
process. It also seems reasonable to assume that, for all t > 0, h > 0, and i =
0,1, ... ,
P{e(t
+ h) = i + lle(t) = i} = Ah + o(h)
because the probability that a call will occur in (t, t + h) is independent of
the number of busy lines at time t. On the other hand,
P{e(t
+ h) = i -
lle(t)
= i} = Jl.ih + o(h)
as h --+ 0, for the obvious reasons.
Let us calculate Pi(t) under the assumption that
Jl.i=iJl..
In this case, the system of difference-differential equations (8.4.8) becomes
Ai(t) = - APo(t)
+ Jl.Pl (t),
pj(t) = APj-l (t) - (A
+ jJl.)pit) + Jl.(1 + j)Pj+l (t).
(8.4.9)
(8.4.10)
8.4. Birth and Death Process
213
This is not a recursive system (i.e., it cannot be solved successively). For
this reason we are compelled to use the method of probability generating
functions.
Set
g(t, u) =
co
L Pj(t)u j ;
j=O
then taking into account (8.4.9) and (8.4.1), we obtain
ag(t, u) _ ~ '(t) j
u
at
j=O
- - - L... Pj
= - APo(t)(l - u)
+ 2JlP2(t)u(1
= - A(l - u)
=
+ JlPI (t)(l
- u) - API (t)(l - u)u
- u) - ...
co
L Pj(t)u j + Jl(l j=O
-A(l - u)g(t,u)
+ Jl(l
co
u)
L jPj(t)U j j=l
1
ag(t, u)
- u)---att.
Thus, the generating function g(t, u) satisfies the linear partial differential
equation
ag~; u) _
Jl(l _ u) ag~~ u) = _ A(l _ u)g(t, u).
(8.4.11)
Suppose that e(O) = io; then,
g(O, u)
= u io .
(8.4.12)
Solving this equation by standard methods and using the initial condition
(8.4.12), we obtain
g(t, u) = {1 - (1 - u)e-/lt}io exp
{-~(1 -
u)(l - e-/lt )}.
(8.4.13)
Note that the first term on the right side of (8.4.13) is the probability
generating function of the binomial distribution with P = exp { - Jlt}, whereas
the second term is the probability generating function of the Poisson distribution with mean value
Therefore,
(8.4.14)
where eo(t) is a binomial component, el(t) is a Poisson component, and eo(t)
is independent of el(t) for all t ~ O.
We now expand (8.4.13) to obtain Pk(t). Mter some calculations, we obtain
8. Markov Processes I
214
_
Pj(t) - exp
{_~
Jl
(1
_ -lit}. minj!,o.jJ (iO)(~)j-re-lItk(1 - e- lIt )io+ j-2k
e)
2..
k
(" k)!
'
k=O
Jl
] .
wherej = 0, 1, .... From (8.4.14), we deduce that
Eg(t)} = ioe-llt
A.
+ -(1
- e- lIt ).
Jl
These formulas were first obtained by Palm.
8.5. Sample Function Properties
Let {~(t); t ~ O} be a homogeneous Markov process with the state space
S = {O, I, ... } and transition probabilities Pij(t) which satisfy condition (8.3.3).
In this section, we outline some basic properties ofits sample functions, which
are step functions. First, it is not difficult to establish that the process is
stochastically continuous (see Definition 1.9.1) which is equivalent to
lim P{ W ± h)
h-+O+
= W)} = 1,
h ~ O.
(8.5.1)
To show that (8.5.1) holds, consider
P{ W + h) = ~(t)} =
ex>
L pjj(h)pi t).
j=O
From condition (8.3.3) and the bounded convergence theorem, it follows that
Pg(t + h) = ~(t)} -. 1 as h -. 0+. In a similar fashion, one can show that
Pg(t - h) = ~(t)} -.1 as h -.0+.
Because the process is stochastically continuous, it must be separable and
measurable on every compact interval. In such a case (see Proposition 1.10.1),
there exists a separable version, which is stochastically equivalent to ~(t), all
of whose sample functions are continuous from the right. In the following, we
will deal exclusively with this version, which we will denote by the same
symbol ~(t).
Because the trajectories of ~(t) are (right continuous) step sample functions,
it seems intuitively justifiable that, at least when a state i E S is regular, to
assume that if for some t
~(t, w)
=
i,
then there exists an h > 0, which may depend on wand i, such that
~(t + h, w) = i. This gives rise to the following question. How long does the
process ~(t) stay in a state after entering it? To answer this question, denote by
r(t)
= inf{h >
0; W) "#
W + h)}
the length of time the process remains in the state in which it is at time t.
Let us calculate the conditional probability
215
8.5. Sample Function Properties
(jJi(S)
= P{r(t) >
= i}, i E S,
sl~(t)
S ~
(8.5.2)
0
[this conditional probability does not depend on t because the process
homogeneous]. Because
{r(t) > s
+ u}
= {r(t)
> s,r(t + s) > u},
~(t)
is
(8.5.3)
it follows that
(jJi(S
+ u) =
=
P{r(t) > s,r(t
P{r(t) > sl~(t)
Now, {~(t) = i, r(t) > s}
property of ~(t),
P{r(t
+ s) >
=
=
c
+ s) >
=
ul~(t) =
i}P{r(t
i}
+ s) >
ul~(t) =
i,r(t) > s}.
(8.5.4)
{W + s) = i}. Consequently, due to Markov
ul~(t) =
i,r(t) > s}
+ s) > ul~(t) = i,r(t) > s,W + s) = i}
P{r(t + s) > ulW + s) = i} = (jJi(U).
P{r(t
From this and (8.5.4), we deduce
(jJi(S
+ u) =
(8.5.5)
(jJi (s) (jJi(U).
This is a Cauchy functional equation. Because (jJi(S) is continuous at s = 0,
it is continuous at all points s ~ O. In addition, 0 ~ (jJi(S) ~ 1, which implies
that the only solution of (8.5.5) must be
(jJi(S) = e-A,s
where Ai ~ 0, and Ai may be
(see also Remark 8.3.3).
+00.
for all s
~
0,
(8.5.6)
We now can give the following definition
Definition 8.5.1. A state i E S is called "absorbing" if Ai
00, and instantaneous if Ai = 00.
o < Ai <
=
0, stable if
Clearly, if i is absorbing and if {~(t) = i} occurs, it follows that W + s)
for all s ~ O. In other words, if ~(t) enters i, then it stays there forever.
On the other hand, if i is a stable state, then
=
i
prO < r(t) < ool~(t) = i} = 1.
Finally, if i is an instantaneous state,
P{r(t) = Ol~(t) = i} = 1,
which implies that the process exists from an instantaneous state as soon as
it enters it. This, however, is not possible because, by assumption, we are
dealing with a separable version with right-continuous trajectories. Consequently, the state space S of such a Markov chain does not contain instantaneous states. Conversely, if the state space S of a Markov chain does not
contain instantaneous states, the process is Cadlag (i.e., continuous on the
right and having limits on the left).
8. Markov Processes I
216
Before concluding this section, let us describe once more the structure of
the sample functions of a homogeneous Markov chain {~(t); t ;::: O} with state
space S = {O, I, ... }, where each element of S is a stable state. At time t = 0,
the process is in a state, say i o. It stays there for a random time T1 at the end
of which it jumps to a new state. It remains in the new state for a random time
T2 and then it jumps to another state, and so on. Because all the states in S
are stable, each sample function of the process ~(t) is a right-continuous step
function with left-hand limits.
Set To = and
°
n
L 1'",
'tn =
n = 0, 1, ....
(8.5.7)
k=1
Then we have the following definition.
Definition 8.5.2. A homogeneous Markov chain {W); t ;::: O} with state space
S = {O, 1, ... } is said to be regular if its sample functions are (a.s.) continuous
from the right and
=
sUp'tn
n
+00
(a.s.).
(8.5.8)
Remark 8.5.1. The times 't 1, 't 2, •.. are the instants of transitions of the process
W)·
8.6. Strong Markov Processes
To investigate some deeper properties of a Markov process, the concept of
stopping time is required. Consider a stochastic process {X(t); t E R+} defined
on a probability space {n,gH,p}, with an arbitrary state space {S, 9"}. We will
assume that the stochastic process is directly given (see Section 1.2 of Chapter
1). In other words, n = SR+ and 9" = 9"R+. Therefore, each WEn is a function
w: R+ -+S.
We now define the concept of a "stopping time" associated with the
process X(t). This random variable is also known as a "Markov time" or a
"random variable independent of the future" (Dynkin).
Definition 8.6.1. A random variable
't:
n -+ [0, 00]
is called a "stopping time" if it has the property that for any two
such that
W1(8)
= W2(8)
and 't(w 1) ::;; t, then 't(w 1) = 't(W2)'
for all 0::;; s ::;; t
W 1, W 2 E
n
8.6. Strong Markov Processes
217
From the definition of a stopping time L, it clearly follows that the occurrence or nonoccurrence of the random event {w; L ~ t} can be established
from the observation of the random process X(s) on [0, t] alone.
8.6.1. Clearly every constant L = t > 0 is a stopping time. The first
visiting time of a certain set A E!/' by the process X(t) is also stopping time,
i.e.,
EXAMPLE
LA(') = inf{s;X(s,·) E A}.
Clearly, in this case 0 ~ LAO ~ 00. The nth visiting time of A is also a
stopping time. However, the time of the last visit to A is not a stopping time.
The sum L1 + L2 of two stopping times L1 and L2 is also a stopping time.
The random variables
inf{L1,L2}
and
SUP{L1,L2}
are also stopping times.
Consider now a homogeneous Markov process {e(t); t ~ O} with an arbitrary state space {S, !/'}. These properties of W) imply that, for any 0 ~
Sl < 00. < Sm < sand 0 < t1 < 00. < tn'
+ td E B1,oo.,e(s + tn) E Bnle(sd = x 1,.oo,e(sm) = xm,e(s) = x}
= Px{ Wd E B 1, 00., Wn) E Bn}
(8.6.1)
P{e(s
[see (8.1.7)] where
{BJi
c!/,.
It seems intuitively reasonable to expect that relation (8.6.1) will hold if s
is replaced with a stopping time L. This, however, is not true in general.
Markov processes having this property are called "strong Markov processes."
The following definition gives a precise formulation of the concept.
Definition 8.6.2. A measurable homogeneous Markov process with a state
space S is said to have the strong Markov property if, for any stopping time L,
P{e(L
=
+ t 1) E B 1,oo.,e(L + tn) E Bnle(s);s < L,e(L) =
Px{Wd E B1,oo., Wn)
E
Bn}.
x}
(8.6.2)
When is a homogeneous Markov process endowed with the strong
Markov property? The following result is due to Dynkin and Yushkevich
(1956).
Proposition 8.6.1. A homogeneous Markov process {e(t); t ~ O} with state space
{S,!/'} and transition probability P(x, t, B) has the strong Markov property if
and only if for any XES, BE!/', and t > 0,
PX{e(L
+ t)EB} =
Is P(y,t,B)Px{e(L)Edy}.
(8.6.3)
8. Markov Processes I
218
There is an extremely useful criterion for determining whether a Markov
process has the strong Markov property. To formulate this criterion, some
new notation is needed. Suppose that S is a metric space. Then, Y' is the Borel
algebra. Denote by B the space of all bounded Borel measurable functions on
S. With the usual supremum norm
Ilhll
= sup Ih(x)l,
(8.6.4)
x
the space B is a Banach space.
For each t ;;?: 0, define an operator Tt by
(T' h)(x) =
1
h(y)P(x, t, dy)
(8.6.5)
or equivalently
(8.6.6)
Because
1(T'h)(x)1
=
11
h(Y)P(X,t,dy)l::;;
s~p Ih(y)1 = Ilhll,
T': B -+ B is a contraction.
Proposition 8.6.2. Let {~(t); t
;;?: O} be a homogeneous Markov process with state
space {S, Y'} and a transition probability P(x, t, B) which has the property that
the operator (8.6.5) maps continuous bounded functions into continuous bounded
functions. Then, the process ~(t) possesses the strong Markov property.
Remark 8.6.1. Processes satisfying conditions of Proposition 8.6.2 are called
"Feller processes."
Remark 8.6.2. The following result is due to Yushkevich. Let {~(t); t ;;?: O} be
a homogeneous Markov process with countable state space S. If ~(t) is
stochastically continuous, there exists a separable version which possesses the
strong Markov property. In other words, if all elements of S are stable states
the Markov chain is a strong Markov process.
8.7. Structure of a Markov Chain
Let us now return to Markov processes with countable state spaces and
examine in more detail some questions concerning their stochastic structure.
To this end, consider a homogeneous Markov chain {~(t); t ;;?: O} with state
space S = {a, I, ... } and transition probability Pij(t). We will assume that
condition (8.3.3) holds; in such a case, Pij(t) is called a "standard" transition
8.7. Structure of a Markov Chain
219
probability. From Section 8.5, we see then that the process e(t) must be
stochastically continuous. This, on the other hand, implies that e(t) is separable and measurable. Then (see Proposition 1.10.1) there exists a separable
version all of whose sample functions are continuous from the right and have
left-hand limits.
According to what was stated in Section 8.5, the state space S of e(t) does
not contain instantaneous states. In the rest of this section, we will deal
exclusively with the separable version of the Markov chain, which will be
denoted by the same symbol e(t).
As before, let us denote by T I, T 2, ..• the instants of transitions of the
process e(t), and by To = 0, T" = Tn - Tn-I' n = 1,2, .... Clearly, each Tn is a
stopping time. Because the process e(t) is measurable, X o, X I ' ... , defined by
(8.7.1)
are clearly random variables. Consider the bivariate sequence
{(Xn' T,,)}O',
To =0.
(8.7.2)
What is the stochastic structure of this bivariate process? This and related
questions will be discussed in the rest of this section. For this purpose, the
following two lemmas are needed.
Lemma 8.7.1. For all i
E
Sand t
~
0,
Pj{TI > t} = e- l ".
(8.7.3)
PROOF. Notice that
Pj{TI > t}
On the other hand, for any t
Pj{TI > t
~
= Pj{e(s) = i,O::;;; s::;;; t}.
0, u
~
(8.7.4)
0,
+ u}
= Pj{e(s) = i,O::;;; s::;;; t
+ u}
= Pj{e(s)
= i,O::;;; s::;;; t,e(v) = i,t::;;; v::;;; u + t}
= Pj{e(s)
= i,O::;;; s::;;; t}Pj{e(v) = i,t::;;; v::;;; t + ule(s) = i,O::;;; s::;;; t}.
Now, because e(t) is a homogeneous Markov process, we readily deduce that
Pj{e(v)
= i,t::;;; v::;;; t + ule(s) = i,O::;;; s::;;; t} = ~{e(v) = i,O::;;; v::;;; u}.
Therefore, taking into account (8.7.4), we obtain
Pj{T1 > t
+ u} = Pj{TI >
t}Pj{T1 > u}.
From this, in the same way as in Section 8.5, we conclude that
Pj{TI > t} = e- l ",
which proves the assertion.
o
8. Markov Processes I
220
Lemma 8.7.2. Under Pi' the Tl and Xl are independent random variables.
PROOF.
Denote by
t
+ To
= inf{s; ~(s) =1= i and s
> t}
the first exit time from the state i after time t. Then, for all t > 0 and i
we have (if Pi{Tl > t} > 0)
=1=
j E S,
Pi{Tl > t,X l =j} = Pi{Tl > t,W) = i,~(Td =j}
= Pi{Tl > t,W) = i,W
= Pi{W
+ To) =jlTl >
+ To) =j}
t,~(t)
= i}Pi{Tl > t,W) = i}.
Because, by assumption Pi{Tl > t} > 0,
Pi{W
+ To) =jlTl > t,W) = i} = Pi{W + To) =jl~(t) = i}
= Pig(Td =j}
(due to homogeneity of the Markov process). This gives
Pi{Tl > t,X l = j} = Pig(Td = j}P;{Tl > t},
o
which proves the assertion.
The following proposition is the central result of this section.
Proposition 8.7.1. The bivariate sequence {(T;,Xi}O' represents a Markov renewal process. In other words, for any n = 0, 1, ... , t > 0, j E S,
P{T,,+1 > t,Xn+1 =jl(T;,Xi)o} = Pi{Tl > t,X l = i}
on the set {Xn
(8.7.5)
= i}.
PROOF. Let us begin with some facts. First, ~(t) has the strong Markov
property. Second, knowing {(T;, X;)}o is equivalent to knowing ~(t) for all
o ::;; t ::;; Tn' Therefore,
P{T,,+l > t,Xn+l =jl(T;,X;)i} = P{T,,+l > t,Xn+l =jl~(t);t::;; Tn}
=
fX) P{T,,+l Eds,~(Tn + s) =jl~(Tn)}
=
fX) Px .{ T,,+1 E ds, ~(s) = j}
= pxJ Tl > t, ~(Tl) = j}
(a.s.)
This proves the proposition.
o
8.7. Structure of a Markov Chain
221
Corollary 8.7.1. Write
(8.7.6)
Then, from the last proposition and Lemmas 8.7.1 and 8.7.2, it follows that on
{Xn = i},for all n = 0,1, ... ,
(8.7.7)
Corollary 8.7.2. From (8.7.7), we easily deduce that {Xn}O' is a homogeneous
Markov chain with state space S and the transition probabilities
(8.7.8)
In addition,
(8.7.9)
Corollary 8.7.3. From Proposition 8.7.1, it also follows that
P{T1 > t 1,···, 1',. > tnlXo = io,···,Xn = in}
= P{T1 > t11 X o = io }·· .P{1',.
> t nlXn- 1 = in-d
(8.7.10)
Corollary 8.7.4. It is not difficult to show that the bivariate sequence of random
variables {(tn, Xn)}O' represents a homogeneous Markov chain with transition
probabilities (s < t)
(8.7.11)
The next topic to be discussed is the question of regularity of a Markov
chain (see Definition 8.1.5). Under what conditions on {A.i}O' does (8.5.8) hold?
If the sequence {A'i}O' is bounded,
sup A.i ::;;;
i
P<
(8.7.12)
00,
then the process is certainly regular. This can be proved as follows. If (8.7.12)
holds, then e-A,I ~ e- P1 for all t ~ O. From this and (8.7.10), we have
P{T1
::;;; t 1 ,···, 1',.
::;;; tn }
n (1 n
::;;;
e- P1j ).
1
From this, we deduce that, for all n = 1, 2, ... ,
tn
~
t
k=l
Zk (~means in distribution),
where {Zk}O' is an Li.d. sequence ofrandom variables with common distribu-
222
8. Markov Processes I
tion function 1 - e- Pt . Hence,
P{tn
for all 0 <
t
<
00.
~ t} ~ pt~ Zk ~ t} =
By letting n --+
00,
k~n (~t
we deduce that
~ t} ~ e- Pt lim
P{t
e- pt
f
n~oo k:n
({3?k = O.
k.
The following proposition provides a more general criterion for regularity
of ~(t).
Proposition 8.7.2. A necessary and sufficient condition for a homogeneous
Markov chain to be regular is that
P{fAX~ = oo} =
1.
k:1
PROOF.
According to (8.7.9), for any i
E{e-aTnIXn_1
=
i}
E
S and a> 0,
=
too e- as P{1;'
=
Ai
E
dslXn- 1 = i}
too e-rxse-l,sds = Ad(a + AJ
From this and (8.7.10), we deduce that, for every n = 1,2, ... ,
n-1
E{e-atnIXo = iO,oo.,Xn = in} =
Aij(a + Ai)
n
k:O
or
Set
t
=
Ik"=1 7;,; then
Because
lim E{e- at } = E{I{t<oo}},
a--+O+
we have
Finally, it is well known that
223
8.8. Homogeneous Diffusion
00
lim
fI Axj(a + Ax.) = {I0
a~Ok=O
if
LAx! < 00
k=O
00
if
LAx! =
k=O
00.
This yields
pL~o Ax! < (f)}
P{r < oo} =
o
and the assertion holds.
8.8. Homogeneous Diffusion
Let {e(t); t ~ O} be a real homogeneous Markov process (i.e., S = R, the real
line, and !f = fll, the a-algebra of Borel subsets of R) with (a.s.) continuous
sample functions and a transition probability P(x, t, B). The stochastic process e(t) is termed a "homogeneous diffusion" if the transition probability
satisfies the following conditions: for each (j > 0 and x E R,
(i)
(ii)
(iii)
lim
t~O+
~
lim
t~O+
lim
t~O+
t
~
t
~
t
Jr
Jr
P(x, t, dy) = 0,
Iy-xl >,j
(y - x)P(x,t,dy) = b(x),
(8.8.1)
ly-xl:s;6
Jr
(y -
Iy-xl:s;,j
xf P(x, t, dy) =
a 2 (t).
The first condition seems justified in view of (a.s.) continuity of the sample
paths of the process. The function b(') characterizes the "average trend" of
evolution of the process over a short period of time, given that e(O) = x and
is called the "drift" coefficient. Finally, the non-negative quantity a 2 ( • ) determines the mean square deviation of the process e(t) from its mathematical
expectation. The function a 2 (.) is called the "diffusion" coefficient. We will
assume that the functions b( . ) and 0'2(. ) are finite.
The three conditions (8.8.1) can be written in slightly different form as
follows:
lim
t~O+
~(1
t
- Pxg(t)
E
lim
+ (j])) = 0,
~Ex[e(t) -
lim
t
t~O+
t~O+
[x - (j, x
~Ex[W) t
e(O)]
= b(x),
e(0)]2 = 0'2(X).
8. Markov Processes I
224
The central point in the theory of diffusion processes is that, for given
functions b(· ) and 0"2 ( .), there is a unique and completely determined transition probability P(x, t, B) which yields a homogeneous Markov process with
(a.s.) continuous sample functions. There are various methods for determining
P(x, t, B), ranging from purely analytical to purely probabilistic. The method
presented here was developed by Kolmogorov in 1931.
Consider
u(t, x)
=
f:
(8.8.2)
cp(y)P(x, t, dy),
where cp(.) is bounded continuous function. We shall attempt to derive
the partial differential equation which u(t, x) satisfies, subject to the initial
condition
(8.8.3)
u(o+,x) = cp(x).
The equation that we will derive is known as Kolmogorov's backward diffusion equation, which is satisfied by the function u(t, x).
Proposition 8.8.1. Assume that conditions (8.8.1) hold and that the function
u(t, x) defined by Equation (8.8.2) has a continuous second partial derivative
in x for all t > O. Then,
au
au 1 2 a2 u
at = b(x) ax +"20" (x) ax 2 '
t > O.
(8.8.4)
PROOF. Starting with the Chapman-Kolmogorov equation (8.1.2), we readily
obtain that, for each h > 0,
u(t
+ h, x) =
=
=
f:
f:
f:
cp(y)P(x, t
P(x, h, dz)
+ h, dy)
f:
cp(y)P(z, t, dy)
u(t, z)P(x, h, dz).
Consequently,
u(t
+ h, x) - u(t, x)
h
=
1
Ii
foo
-00
P(x, h, dy){ u(t, y) - u(t, x)}.
(8.8.5)
Now, according to condition (8.8.l.i) and the fact that cp(.) is a bounded
function, Equation (8.8.5) can be written as follows: for each b > 0 (as h -+ 0),
u(t
+ h, x) - u(t, x)
h
1
= Ii
r P(x,h,dy){u(t,y) t-6
XH
u(t,x)}
+ 0(1). (8.8.6)
225
8.8. Homogeneous Diffusion
Next, we expand u(t, x) in Taylor series to obtain
u(t,y) = u(t, x)
ou(t, x)
x)ax
+ (y -
+
(y - X)2 02U(t, x)
2!
. ox2
+ (y -
.
xfR(t,y)
with R(t, y) ~ 0 as y ~ x. Substituting this expansion into the right-hand side
of Equation (8.8.6) and then using conditions (8.8.l.ii) and (8.8.l.iii), we arrive
at
r
1m
u(t
h~O
+ h,x) h
u(t,x) _ b( )ou(t,x)
-
x
-~-
uX
1 2( )02 U(t,X)
x ~ 2
+ -2 q
uX
To complete the proof, we must consider the case h < O. This is handled in
essentially the same way beginning with
u(t
+ h, x) h
u(t, x)
=
1
Ii
foo
-00
P(x, JhJ, dy){u(t
+ h,y) -
u(t
+ h,x)}.
The analysis is the same as before (we use the Taylor expansion) except that,
in obtaining the limit as h ~ 0, it is necessary to invoke the joint continuity
of ou/ox and 02U/OX 2 in t and x. In particular, we need the fact that R(t, y) ~ 0
as y ~ x. This proves the assertion.
0
Remark 8.8.1. The error term in the Taylor expansion leads to an integral
bounded by the quantity
1 fX+cl
X-cl~~~X+cl JR(t,y)J Ii
x-cl
P(x,h,dy)(y - X)2.
Its lim sup as h ~ 0 is not larger, then max JR(t, y)J q2(X) ~ 0 as J ~ O.
Remark 8.8.2. To obtain the transition probability P(x, t, dy), assume that the
transition probability density exists, i.e.,
P(x, t, B) =
L
p(x, t, y) dy,
(8.8.7)
and that the derivatives
op(x, t,y)
ox
exist. Then, taking into account Equation (8.8.2), we can write Equation
(8.8.4) as
f:oo <p(y) {op(~/, y) -
b(x) Op(~:, y) _
~ q2(X) 02p!:~t, y)} dy =
0,
where <p(.) is an arbitrary continuous bounded function. This, then, implies
the equation
226
8. Markov Processes I
op(x,t,y) _ b( )op(x,t,y) _ ~ 2( )02 p(X,t'Y)d = 0
2u x
ox2
Y
.
ot
x
ox
(8.8.8)
Under appropriate conditions on b(·) and u(·), a solution of this equation
exists and is unique. It represents a transition probability of the homogeneous
diffusion process if, for instance, the coefficients b(x) and u2(x) are bounded
and satisfy the Lipschitz conditions
Ib(y) - b(x)1
s Ciy - xl,
lu 2 (y) - u 2 (x)1 s
Ciy - xl
and if u 2 (x) ~ u 2 > o.
Assume again that conditions (8.8.1) hold. If the transition probability
density exists, if it has a derivative
op(x, t,y)
ot
which is continuous with respect to t and y, and if the function b(y)p(x, t, y) is
twice continuously differentiable with respect to y, the equation satisfied by
p(x, t, y) for fixed x turns out to be
op(x, t, y)
ot
1 02
0
= - o/b(y)p(x, t, y)] + 2" oy2 [u(y)p(x, t, y)].
(8.8.9)
This is the Fokker-Planck equation or the forward Kolmogorovequation.
We will not derive this equation here.
The study of diffusion processes via the backward partial differential equation has been reasonably successful in many cases. The basic problem with
this method is that partial differential equations of this type are difficult to
solve. In addition, the probabilistic input is very small.
An alternative approach, more probabilistically appealing, was proposed
by P. Levy and carried out by K. Ito in 1951. The idea here is to approach
the problem in a fashion similar to Langevin's approach (see Section 3.7 of
Chapter 3). Roughly speaking, what Ito has shown is that a diffusion process
is governed by a stochastic differential equation of the form
d~(t) = b(~(t)) dt
+ u 2R(t)) dW(t),
(8.8.10)
where b(·) and u 2 (.) are those from (8.3.1) and W(t) is a standard Brownian
motion. Processes satisfying (8.8.10) are called locally Brownian.
EXAMPLE 8.8.1. Find the solution of the stochastic differential equation
d~(t)
= tXoW)dt + tXl dW(t),
where tXo and tXl > 0 are constant and W(t) is a standard Brownian motion.
Here, clearly, b(x) = tXoX and u 2(.) == tX l •
As we have pointed out [see Equation (3.7.5)], a solution of this equation
227
Problems and Complements
over an interval containing a point to is defined as a process having continuous sample functions which satisfies the equation
i' de(s) = txo
Jto
r' e(s)ds + r' dW(s)
txl
Jto
Jto
or
e(t) - e(t o) =
txo
rt e(s) ds +
txl [W(t)
Jto
- W(t o)].
To solve this equation, multiply both sides by e-a.ot to obtain
e-a.o'[ w)
-
txo
1:
e(s) dSJ =
[Wo) -
txl W(t o)]e-l1.ot
+ txl e-a.otW(t).
This, on the other hand, can be written as
d(e-a.ot
1:
e(S)dS) = [e(t o) -
txl
W(to)]e-a.o'dt
+ txle-l1.otW(t)dt.
Integrating both sides from to to t, we have
e-a.ot
rt e(s)ds = e(to) -
J~
txl
~
W(t o) (e-a.oto _ e-a.ot,) +
txl
rt e-a.oSW(s)ds
J~
or, equivalently,
rt e(s)ds = e(to) -
J~
txl
~
W(t o) (ea.o(t-,o) _ 1) + txl
I' ea.o(t-S)W(s)ds.
J~
Finally, by differentiating with respect to t, we see that
e(t) = (e(t o) -
txl
W(to»el1.o(t-t o) + txl ( W(t) -
txo
1:
ea.o(t-S)W(s) dS)'
Take to = 0 and assume that e(O) = 0; then, e(t) becomes
e(t) =
txl
W(t) -
txOtxl
I
ea.o(I-S)W(s)ds.
This is clearly a Gaussian process with
Eg(t)} = 0
and
Eg(t)e(u)} =
txl(ea.o(t+u) -
ea.o1t-u1)/txo.
Problems and Complements
8.1. Let {W); t ~ O} be a real homogeneous Markov process with transition probability P(x, t, B). Show that, for any Borel function f: RR -+ R and every
tl < ... < tR'
8. Markov Processes I
228
8.2. Show that a stochastic process {,(t); t ~ O} with a state space {S, .'I'} is a Markov
process if and only if, for any 0 :::;; SI < ... < Sk < t < tl < .,. < tn'
= x}
Pg(SI):::;; x1,···,'(x k ):::;; xk,Wd:::;; Yl,···,Wn):::;; YnIW)
p(Ci
=
gi:::;;
xdlW) = x
8.3. (Continuation) Show that {W); t
Eta
=
!.('(Si))
JJ
~
)pCa
gj:::;; yJIW)
=
x).
O} is a Markov process if and only if
h)Wj))IW) = x}
E{]j !'('(si))IW) = x}ELa h)W))IW) = x},
where!. and hj are real Borel functions on {S, 9'}.
8.4. Let {W); t ~ O} be a Markov process. Show that the sequence
Markov property.
8.5. (Continuation) Set Zk
Markov process?
=
g(k)}~
has the
[,(k)], where [x] is the integer part of x. Is {Zd~ a
8.6. Let {,(t); t ~ O} be a real Markov process and f: R ...... R a Borel function. Show
by a counterexample that {f(,(t)); t ~ O} is not necessarily a Markov process.
However, if f(· ) is one-to-one, the Markov property is preserved.
8.7. Let {W); t
Show that
~
O} with E {W)}
= 0 for
all t
~
0 be a Gaussian random process.
is a necessary and sufficient condition for ,(t) to be a Markov process, where
0:::;; tl < ... < tn' n = 2, 3, ....
8.8. Let {'i(t); t
Show that
~
O}, i = 1, 2, be two independent standard Brownian motions.
X(t) = (, dt)
+ '2(t))2
is a Markov process.
'1
8.9. Assume that {Ut); t ~ O}, i = 1, 2, are two zero mean, independent, strictly
stationary, Markov processes. Under what conditions is Y(t) = (t) + '2(t) a
Markov process?
8.10. Let {W); t
~
O} be a standard Brownian motion and
LX
= inf{t;W)
= x}.
Show that {LX;X > O} is a process with independent increments and, hence,
Markovian.
8.11. Let {,(t); t
such that
~
O} be a homogeneous Markov process with state space S = {-1,1}
+ t) = -1IW) = -1} = (2 + e- 3 ')/3,
Pg(s + t) = -1IW) = 1} = (2 - e- 3 ')/3.
Pg(S
Find all invariant measures of the process.
229
Problems and Complements
8.12. Let {~(t); t ~ O} be a standard Brownian motion. Show that the process
Z(t)
= IW) + xl, x> 0,
(so-called "Brownian motion with reflecting barrier") is a homogeneous Markov
process.
8.13. Let g(t); t ~ O} be a standard Brownian motion. A Brownian bridge {X(t);
t E [0, toJ} is defined by
W) + x
X(t) =
t
- -(W)
to
+y-
x).
This is clearly a Brownian process that starts at point x at time t = 0 and passes
through point y at time to. Show that X(t) is a Markov process and determine
its mean and covariance function.
8.14. (Continuation) Show that X(t) and X(t o - t) have the same distribution.
8.15. Let {~(t); t ~ O} be a standard Brownian motion. Show that X(t) = e-t~(e2t) is a
strictly stationary Markov process. Find its mean and covariance function.
8.16. Let {X(t); t ~ O} be a stationary Gaussian process. If X(t) has the Markov
property, show that its covariance function has the form ce- 01tl , c > 0, ex > O.
8.17. Let
{~(t); t ~
O} be a homogeneous Markov process with transition probability
P(x, t, B). If
+ h,t,B + h) = P(x,t,B)
(spatial homogeneity) where B + h = {x + h;x E B}, then show that
P(x
W) has
stationary independent increments.
8.18. Let g(t);t ~ O} be a homogeneous Markov process with state space
S = {O, 1, ... }. Determine
Pij(t)
= Pg(s + t) = jl~(s) = i}
assuming that
Pij(t)
= 0 if i >
j;
Pi,i+! (h)
Pij(h) = o(h)
8.19. Assume that a state j
E
= Ah + o(h) as h ..... 0+;
if h ..... 0+ when j - i
~
2.
S = {O, 1, ... } is stable (see Definition 8.3.15). Show that
pW) = - Pij(t)qi
+
<Xl
L Pik(t)qkj
k#j
(this is the forward Kolmogorov equation).
8.20. Let {~(t); t ~ O} be a homogeneous Markov chain with state space S = {O, 1,.,.}
and transition probability Plj(t). Assume that
Pi,i+!(h)
= Ah + o(h),
pi,/-1(h) = iJ.lh
+ o(h)
as h ..... 0+. Determine P{W) = k} and show that
W) =
where
~o(t)
is binomial,
~ 1 (t)
~o(t)
+ ~l(t),
is Poisson, and
~o(t)
is independent of ~ 1 (t).
8. Markov Processes I
230
8.21. Let g(t); t ~ O} be a "pure birth" process (see Remark 8.4.1). Denote by
0< < < ... its discontinuity points. Show that {Z.}f is a sequence of
independent random variables where Zl = '1, Z. = '. - '.-1, n ~ 2.
'1 '2
8.22. (Continuation) Show that
P{Z.+1 ::;;
t}
= 1 - e- ln ',
where An is defined by (8.4.4).
8.23. (Continuation) Show that (i ::;; j)
Pij(t)
= Pg(s + t) = jl'i = s}.
8.24. (Continuation) Show that
pit) p{ t
Zk::;;
=
k=i+l
8.25. (Continuation) Show that (Ai
:ft
P{W) = n} =
PWt)
=f.
t} - p{ jf
Zk::;;
k=i+1
t}.
Aj if i =f. j)
Aj kto
(exp - ·'!f.1 (Aj l
Ak)). n
~ 1,
= O} = e- lo'.
8.26. (Continuation) Show that
Pk.(t) =
U
AjCt e- li
'!f.1 (Aj - Ak)).
n> k,
Pkk(t) = e- l .,.
8.27. Let {W); t
~
O} be a pure birth process. In order that
I
00
P{W) = n} = 1 for all t,
"=0
show that it is necessary and sufficient that the series
diverges.
8.28. Let {~(t); t ~ O} represent the size of a bacterial population which may grow
(each bacterium can split into two) or decline (by dying). In a small time interval
(t, t + At), each bacterium independently of others has a probability
AAt
+ o(At)
as At -+ 0,
A > 0,
of splitting in two and a probability
IIAt
+ o(At)
as At -+ 0, II > 0,
of dying. Form a system of differential equations determining
Pk(t)
= P{W) = k}, n =
1, ... ,
231
Problems and Complements
and solve it assuming that ,(0)
= 1. Verify
<Xl
L Pt(t) = 1.
k=O
8.29. (Continuation) Show that ,(t) is a homogeneous Markov process and determine
Pij(t) = Pg(s + t) = il e(s) = i}.
8.30. The linear birth and death process with immigration. Let {W); t ~ O} be the size
of a population at time t. Members of the population behave independently of
one another. In (t, t + ~t), a member will give a birth to another member with
probability .Il~t + o(M) as t ~ O. With probability /-I~t + o(~t), the member will
die. In the same time interval, an immigrant will join the population with
probability a~t + o(~t) as M ~ O. If e(O) = i, determine the probability generating function for W).
8.31. (Continuation) Show that
E{W)}
and determine E{W)J2.
= a/(/-1 - .Il)
+ [i -
a/(p- .Il)]e-(/l-Alt
CHAPTER 9
Markov Processes II:
Application of Semigroup Theory
9.1. Introduction and Preliminaries
Let {~(t); t ~ O} be a real homogeneous Markov process with transition
probability P(x, t, B). In applications the following situation is typical. The
transition probability is known for all t in a neighborhood of the origin.
Then, P(x, t, B) can be determined for all t > 0 by means of the ChapmanKolmogorov equation (8.1.2). As in the case of a countable state space, we will
show that, under assumption (8.1.13), the transition probability P(x, t, B) is
completely determined by the value of
ap(x,t,B)
at
0
at t = .
(9.1.1)
Our goal is to deduce everything about the behavior of the Markov
process ~(t) from (9.1.1). Naturally, when we say everything about the
behavior of ~(t), we mean everything which does not depend on the initial
distribution because we only make use of the transition probability.
The basic tool to achieve this goal is semigroup theory. The modern theory
of homogeneous Markov process is basically semigroup theory, whose elements will be discussed in the rest of this chapter. This approach not only
elucidates various aspects of this important class of Markov processes, but
also provides a unified treatment of the theory which is not attainable by
other methods.
We begin with a brief review of some concepts th"t were discussed in some
detail in Chapter 5. Denote by B the set of all real bounded Borel functions
defined on R. With the supremum norm
IIhll = sup Ih(x)l,
xeR
(9.1.2)
233
9.1. Introduction and Preliminaries
the set B becomes a Banach space. A mapping
T: B-+B
is called a "linear operator" (see Definition 5.7.1) if, for any two hI' h2 E B,
(9.1.3)
where IX and f3 are two fixed numbers.
We say that T is bounded if there exists a positive constant M <
that
IIThllsM'llhll
00
forallhEB.
such
(9.1.4)
The smallest M for which (9.1.4) holds is called the "norm of the operator" T
and is denoted by IITII. Thus,
sup II Thll = II Til.
he B IIhll
(9.1.5)
h;"O
From (9.1.5), we clearly have
IIThll s IIhll·IITli.
If II Til s 1, the operator T is said to be a "contraction."
EXAMPLE
9.1.1. Let B
= CEO, 1J be the set of continuous functions on
[0,1].
Define
T: B-+B
by
(Th){x) = xh(x)
for each h E B and x E [0, 1].
Clearly,
T(hl
+ h2 )(x) = x(h l (x) + h2(X)) = xh l (x) + xh 2(x)
= (Thd(x) + (Th 2){x)
so that T is linear. In addition,
IITII
=
sup xlh(x)1 s Ilhll,
which implies that II Til s 1 (in fact, II TIl = 1).
Definition 9.1.1. A one-parameter family {Tt; t ;;::: O} of bounded linear operators on a Banach space B is called a "contraction semi group" if
(i) II Ttull s Ilull for all u E B,
(ii) Tt+s = T t . P = p. Tt for all s, t;;::: 0.
In the following we will say, for short, that {T'; t ;;::: O} is a semi group. A
9. Markov Processes II: Application of Semigroup Theory
234
semigroup is called "strongly continuous" if
TO
=
and
I
liT' - I II ~
°
as t ~
where I is the identity operator. Since, for any s,
t ~
°+ ,
(9.1.6)
0,
because T' is a contraction, this implies uniform continuity of T'.
In the theory of homogeneous Markov processes, of particular interest is
the semi group induced by the transition probability. Let {~(t); t ~ o} be a real
homogeneous Markov process with the transition probability P(x, t, B). Let
B be the set of all real bounded Borel functions on R. For each fEB and
t ~ 0, define
(T'f)(x)
=
r:
f(y)P(x, t, dy).
(9.1.7)
The family of operators T' defined by (9.1.7) is clearly a contraction. On the
other hand, from the Chapman-Kolmogorov equation, we have
(T'+sf)(x)
=
=
=
=
f:
f: f:
f: f:
f:
f(y)P(x,
t
f(y)
+ s, dy)
P(x, t, dz)P(z, s, dy)
P(x, t, dz)
f(y)P(z, s, dy)
P(x, t, dz)(T'f)(z)
=
(T'T'f)(x).
In other words,
and this is the semigroup property.
9.2. Generator of a Semigroup
°
Let {T'; t ~ o} be a strongly continuous semigroup on a Banach space B. If
a sequence {hn}f c:: B is such that IIh n - hll ~ as n ~ 00, we say that it
strongly converges to h and write
h = s lim hnClearly, the strong convergence is nothing but pointwise uniform convergence.
9.2. Generator of a Semigroup
235
Definition 9.2.1. The (infinitesimal) generator A of the semigroup is defined by
· T1- f
Alf =s 11m
-1-+0+
t
(9.2.1)
at those fEB for which this limit exists.
Denote by DA c: B the subset of elements where the limit (9.2.1) exists.
Clearly, if fl' f2 c: D A' then
+ f3f2
Cl.fl
E
DA •
Consequently, A is linear operator on Dk We now prove the following
proposition.
Proposition 9.2.1. Iff E D A, the function T'f is differentiable in t and
(a)
T'fED A ,
dT1
dt
(b)
-
(c)
T1 - f
PROOF.
= AT'f = T'AJ,
=
I
'
(9.2.2)
TSAf ds.
(a) We have to show that the limit
.
ThT'f- T1
s hm
h
h-+O+
exists. This, however, follows from
ThT'f - T'f
h
T"f - f
h
---:;---- = T' ---=---
(9.2.3)
and the fact that the limit on the right-hand side of the last equation exists as
h --+ O.
(b) From (9.2.3), we have that, for all t ~ 0, there exists
.
T'+hf - T'f d+T1
s hm
=-h-+O+
h
dt
and that
(9.2.4)
To complete the proof we have to show that
d+T1
d-T'f
~=~
for eachfE D A •
236
9. Markov Processes II: Application of Semigroup Theory
Consider 0 < s < t and write
I rtf -s Tt-sf -
TtAf11
= I rtf -s rt-sf - Tt-sAf + Tt-sAf - TtAf11
~ I Tt-s(Pfs- f
- Af)11
~ I T t- sI (II TSfs-
f - Afll
as s --+ 0 which completes the proof of (b).
(c) The last part follows from
I :s
= rtf - f =
Pf ds
I
+ I Tt-S(PAf - Anll
+ I PAf -
Alii) --+ 0
PAf ds,
which proves the proposition.
D
In the following we will define the concept of a strongly integrable function. Let W, be a mapping
W: [a,b]
--+
B.
If there exists the limit
slim
n-l
L (tk+l -
6-0 k=O
tdW,k+l'
(9.2.5)
where a = to < tl < ... < tn = band () = max O :;:;k:;:;n-l (tk+l - t k ), the mapping W, is called "strongly integrable" and the limit (9.2.5) is denoted by
f
W,dt.
The following properties of the integral are simple to verify. If W, is
strongly integrable, then nv, is strongly integrable on [a, b] and
(9.2.6)
where T: B --+ B is a linear operator. If W, is strongly continuous and integrable, then
Ilf
w,dtll
~
f
IIW,II dt.
(9.2.7)
In addition, if W, is strongly integrable and strongly continuous from the
right,
s lim -h1
h-O+
f
a h
a
+ W,dt
= J¥".
(9.2.8)
9.2. Generator of a Semigroup
237
Finally, if d'W,/dt is strongly continuous on [a, b], then
dW.
f -i
dt =
b
a
w" -
w,..
(9.2.9)
The next proposition holds for strongly continuous semigroups.
Proposition 9.2.2. Let {T'; t ~ O} be a strongly continuous semigroup; then, for
all t ~ 0,
(9.2.10)
PROOF.
Observe that for all 0 < u < t, we have [see (9.2.6)]
~(TU -I)
t
T"fds =
~{t T"+ufds -
t
T'f dS }
(after the substitution t = u + sand [0, t] = [0, u] U [u, t])
=
~{r+1 TTfdt -
=
~{f+u TTf dt -
t
f:
T"f dS }
T"f dS}.
By letting u -+ 0 and invoking (9.2.8), we have
s lim _(TU
- I)
1
u-+O+
U
i'
TOf ds = T'f - f.
0
Thus, the limit exists and the proposition follows.
D
How big is the set D A? Does it contain enough elements to make
the concept of a generator useful? The following corollary answers these
questions.
Corollary 9.2.1. If A is the generator of a strongly continuous semigroup
{T'; t ~ O} on B, then DAis dense in B. As a matter of fact, because T'f is
strongly continuous, it follows from (9.2.10) that, for every feB,
-1
t
i'
i'
0
T"fdseD A·
On the other hand, due to (9.2.8),
slim -1
1-+0+ t
0
T"f ds = feB.
In other words, any element feB is a strong limit of a family of elements from
DA-
9. Markov Processes II: Application of Semigroup Theory
238
Proposition 9.2.3. The operator A is closed, i.e.,
slim fn =f,
if {f,,}
c DA and
slim Afn = h,
then fED A and Af = h.
PROOF.
From (9.2.2.c) we have, for every n = 1, 2, ... and t
L
L L
Ttfn -
By letting n --+
00,
we obtain
=
rtf - f
f" =
T'Afds
~
0,
T'Af" ds.
=
T'hds
E
DA
due to (9.2.10). Dividing both sides of the last equation by t and letting
t --+ 0 +, we conclude that Af = h, which proves the proposition.
D
9.3. The Resolvent
This section is concerned with solutions of the differential equation (9.2.2.b).
In the numerical case, the semigroup property leads to the classical Cauchy
functional equation
f(s
+ t} = f(s}f(t},
f(O} = 1
whose only possible continuous, in fact, the only measurable solutions, are
f(t} = e llt and f(t} == O.
In the case of the differential equation,
dT t
dt
= A Tt
'
TO
= I,
to be also exponential, namely,
would require that B = D A so that A is a bounded operator. This, however,
is not always the case, although A is always the limit of bounded operators
(Tt - I}/t.
Therefore, a new method to solve this differential equation is required. To this
end, we need the concept of a resolvent.
Definition 9.3.1. Let {Tt;t
space B, and let
R;./ =
~
to
O} be a contraction semigroup on a Banach
e- At T1 dt, fEB,
A > O.
(9.3.1)
The family of operators {R l ; A > O} is called the "resolvent" of the semigroup.
9.3. The Resolvent
239
From (9.3.1), we see that RJ is the Laplace transform of the continuous
function Ttf Because A > 0, the domain of definition of R;. is the whole B.
Consequently, R;. is a bounded linear operator. In fact,
IIRJII :::;; LX) e-Atll Ttfll dt :::;; Ilfll/A,
(9.3.2)
and linearity is apparent. Note that the integral (9.3.1) is defined in the
Riemann sense.
Proposition 9.3.1. Let {Tt; t ~ O} be a strongly continuous semigroup on a
Banach space B and A its generator; then, for each A > 0,
(9.3.3)
In addition, the mapping
(9.3.4)
is one-to-one and
(9.3.5)
PROOF.
Let us first prove that (9.3.3) holds. To this end, write
1 t - I)R;.h = t(T'
1
t(T
- I)
foo e-;,sYSf ds
0
[by invoking (9.2.8)]
1 e;'t
= __
ft
1 e;'t
= __
ft
1
e-;'uTuhdu + _(eAt
- 1) foo e-;,sYShds
t o t
0
1
e-;'uTuhdu + _(e;'t
- 1)R;.h.
t o t
By letting
t -+
0 +, we see that
AR;.h
=
-h
+ AR;.h.
(9.3.6)
Consequently, the limit
slim (T' - I)R;.h/t
t-O+
exists, which proves (9.3.3).
From Equation (9.3.6), we obtain
AR;.h - AR;.h = h
(9.3.7)
240
9. Markov Processes II: Application of Sernigroup Theory
so that 1= R;.h is a solution of the equation
)f - AI = h.
(9.3.8)
To complete the proof we have to show that (9.3.8) has a unique solution
for each A > O. Assume that 11 and 12 are two solutions of (9.3.8); then,
rp = 11 - 12 E DA and
I
(9.3.9)
Arp - Arp = O.
On the other hand, from Proposition 9.2.1, we see that Ttrp
E
DA and satisfies
dTtrp _ TtA _ 'Tt
----;It
rp - I\. rp.
Thus,
Consequently, for all t
0,
~
e-MTtrp = C (a constant).
For t = 0, we obtain that C = rp, which gives
o ~ Ilrpll
~
e-MIITtrpll ~
e-Mllrpll-+O
as t -+ +00, so that Ilrpli = 0, and uniqueness is proved. This shows that
AI - A is one-to-one on DA- Finally, from (9.3.7), we obtain that
R;. = (AI - Ar 1
and that
o
Remark 9.3.1. Proposition 9.3.1 shows that the mapping (9.3.3) is one-to-one
and onto.
Corollary 9.3.1. For any I
E
B, we have
s lim ARJ = f.
;. .... +ex>
(9.3.10)
In lact,
III - ARJII = If! - A Lex> e-;'tTfj dt I
= All Lex> e-;'t[f -
TfjJ dt II
= II Lex> e-U[f -
p;'-ljJ du II
9.4. Uniqueness Theorem
241
Now, for each u 2 0, Ilf - TU'<-'fll --+ 0 as A --+ 00 and the integrand is bounded
by 2e- u llfll. Now invoking the dominated convergence theorem, (9.3.10)
follows.
9.4. Uniqueness Theorem
In this section, we show that, under certain regularity conditions, semigroups
are uniquely determined by their generators. Thus, given a generator, we
should be able, at least in some cases, to recover the corresponding semigroup. This appears simple enough if the generator is bounded. If not, there
are considerable difficulties that must be resolved to achieve this goal.
Let B be a Banach space. The space of all bounded, continuous linear
functionals I: B --+ R with the norm
11111
=
11(f)1
~~~m
(9.4.1)
fEB
is also a Banach space, which is called the "conjugate" space of B and denoted
by B*. We now need the following result.
Lemma 9.4.1. Let the mapping
be continuous and bounded. Then,
Loo e-.<tw(t) dt =
0
(9.4.2)
for all A > 0 implies that w(t) == O.
PROOF. Let I E B*; then I(w(t» is a bounded and continuous real function. Due
to linearity of I, we have, for all A > 0,
Loo e-.<tl(w(t»dt =
I(Loo e-.<tW(t)dt) = O.
This clearly implies that I(w(t» = 0 for all t 2 0 because of the uniqueness
theorem for the Laplace transforms of real functions. Because this holds for
any I E B*, the assertion follows.
D
We are now ready to prove the following proposition.
Proposition 9.4.1. Let {Tci;t 2 O} and {Tf;t 2 O} be two strongly continuous
semigroups on B with common generator A; then, Td = Tf for all t 2 O.
242
9. Markov Processes II: Application of Semigroup Theory
PROOF. From Proposition 9.3.1 we deduce that TJ and T{ have the same
resolvent,
R).
= (A.I
A)-I.
-
Consequently, for every fEB and A. > 0,
Loo e-;.t(TJ! -
Ttf)dt
= R;.f -
RJ = O.
From this and the previous lemma, the assertion follows.
D
Next, we shall attempt to construct a semigroup from its generator A when
A is bounded. To this end, the following notation is needed. Define exp{A} as
co
eA =
L Ak/k!,
k=O
where An is the n-iterate of A, i.e., An
(9.4.3)
= A . An-I, n = 1, 2, .... Because
IIAnl1 :s; IIAII"
and A is bounded, it follows that the series (9.4.3) converges absolutely.
Proposition 9.4.2. If the generator A of a semigroup {Tt; t
for all t
~
~
O} is bounded, then
0,
(9.4.4)
PROOF. Let us show that
I etA t- I -
A 11-+ 0
as t
--+
O.
(9.4.5)
Clearly,
lIe tA
-
I - tAil :s;
co
L IIAllktk/k! = e
k=2
tllAIl -
1-
tiIAII,
so that (9.4.5) holds. Because exp{tA} is a semigroup, the assertion now
follows from Proposition 9.4.1.
D
From this proposition we see that a semigroup Tt must be of the form
(9.4.4) if its generator A is bounded. In addition, D A = B. Therefore, every
bounded operator generates a semigroup.
If the generator A is not bounded, the situation is considerably more
complicated. However, from Proposition 9.4.1, we know that A determines
uniquely its semigroup. The next proposition shows that, under certain conditions, T' is the unique solution of the differential equation (9.2.l.b).
Proposition 9.4.3. Let A be the generator of a semigroup {Tt; t
~ O}. Iff E D A ,
the function W(t) = Ttf is the unique solution of the differential equation
9.5. The Hille-Yosida Theorem
243
dW(t)
dt
= A W(t)
(9.4.6)
subject to the following conditions:
(i) W(t) is strongly differentiable for all t > 0;
(ii) II W(t)11 ~ Ke~', where K, oc E (0, 00) are some constants;
(iii) W(t) -+ f strongly as t -+ 0+.
PROOF. According to (9.2.1.b), the function W(t) satisfies Equation (9.4.6)
and condition (i). To show uniqueness, assume that W1(t) and W2 (t) are two
different solutions of (9.4.6) satisfying (i)-(iii) and set V(t) = WI (t) - W2 (t).
Then V(t) is also a solution of (9.4.6) which satisfies conditions (i) and (ii) and
V(t) -+ 0 (strongly) if t -+ 0+. Set U(t) = e- ll V(t); then, from (9.4.6), we have
d
- U(t) = - Ae- lt V(t)
dt
d
+ e- ll -
dt
V(t)
= -AU(t) + e-l'AV(t)
= -AU(t)
+ AU(t) =
-R;:IU(t).
Hence,
II
II
Integrating both sides of this equation, we get
o U(s) ds
= -Rl
0
d
ds U(s)ds = -RlU(t).
When t -+ 00, the left-hand side tends to the Laplace transform of V(t), while
the right-hand side converges to O. Therefore, for a suitable A > 0, the Laplace
transform of V(t) vanishes. This and Lemma 9.4.1 prove the assertion.
D
9.5. The Hille-Yosida Theorem
In the previous section, we have established that every bounded operator
generates a semigroup specified by (9.4.4). If an operator is unbounded, it may
not be the generator of a semigroup. When is such an operator the generator
of a semigroup? This question is answered by the following proposition due
to Hille and Yosida, which characterizes the operators that are generators of
semigroups.
Proposition 9.5.1. Let B be a Banach space and A a linear operator with domain
DA c B. In order to be the generator of a strongly continuous semigroup on B,
it is necessary and sufficient that the following conditions hold:
9. Markov Processes II: Application of Semigroup Theory
244
(i) DA is dense in B;
(ii) lor every A > 0 and h E B, the equation
(9.5.1 )
)J-AI=h
has a unique solution lEDA;
(iii) the solution I satislies IIIII ~ I h II/A.
PROOF. The necessity of these conditions was already proved in the previous
section. To prove their sufficiency, assume that they hold. Then, from (ii), it
clearly follows that the operator AI - A is invertible. Set
R;.
= (AI -
A> 0
Ar 1 ,
(we will see later that R;. is indeed the resolvent). Clearly, R;.: B --+ DA (onto)
and by (iii) we have that IIR;.II ~ A- 1 .
Define A;. = AAR;. and write
A ..
= A{ -(AI -
A)R ..
+ AR .. } = A(AR;. -
I).
(9.5.2)
From this, we deduce that A .. is bounded, i.e.,
I A .. I
~ AllAR .. - III ~ A(l
Let us prove that, as A --+
+ 1) = 2A.
00,
Ad --+ AI strongly for all I
To this end, we will show first that, as A --+
DA •
(9.5.3)
E
DA •
(9.5.4)
00,
strongly for all I
ARd --+ I
E
Because [see (9.5.2)]
1
ARd - I = -XAd = ARd = R .. AI,
we have
liARd -
III
~
IIR.. IIIIAIIi
~
IIAIll/A --+ 0
as A --+ 00, which proves (9.5.4).
Let us now show that (9.5.4) holds for every IE B. Due to condition (i),
for any IE Band e > 0 arbitrarily small, there exists 10 E D A such that
III - loll ~ e. Then
liARd -
III
~
~
IIARdo - loll + IIAR;.(f - lo}ll + III - loll
IIARdo - loll + 2e.
Consequently, due to (9.5.4)
lim sup liARd - III
.. --+00
which shows that (9.5.4) holds for all I
E
B.
~ 2e,
245
9.5. The Hille-Yosida Theorem
To prove (9.5.3) consider an fED A; then, due to (9.5.4),
A;.f = ).R;.Af -+ Af strongly as). -+
00.
The last result implies, roughly speaking, that the bounded operator A;.
approximates A for large )..
Define
(9.5.5)
TI = exp{tA;.}.
This is clearly a (contraction) semigroup for each), > O. Taking into account
(9.5.2), we can write
TI = exp{t).2R;. - tAl} = exp(-t).)exp{t).2R;.}.
Therefore,
Next, we have to prove that TIf has a limit as ). -+
to show that, for fED A,
II TIf - T;fII
-+
0 as).,,., -+
00.
To do this, we have
00.
(9.5.6)
Let ). and ,., be fixed: Set
(9.5.7)
then
ql(t) = A;. Tlf -
A~ T;f =
A;.qJ(t)
+ g(t),
where
g(t) = (A;. - A,.) T;f.
It is not difficult to show that
d
d/qJ(t)exp { -tA;.}]
= g(t)exp{ -tA;.}.
Therefore, because qJ(O) = 0 [see (9.5.7)],
qJ(t)exp{ -tA;.}
Noting that
=
t
exp{ -sA;.}g(s)ds.
Tl and A" commute, we can write
qJ(t) =
=
Hence, because
t
t
exp{(t - s)A;.}g(s)ds
TI-sT"S(A;. - A,,)f ds.
Tl is a contraction for each), > 0,
9. Markov Processes II: Application of Semigroup Theory
246
II qJ(t) II
~
I
~
t II (A;.
II TrST;(A;.
- Aq)JII ds
- Aq)JII.
(9.5.8)
But, by (9.5.3), the last term tends to zero as A, '1 -+ 00, which proves (9.5.6).
Notice that the convergence is uniform in t in any compact interval.
Now define the family of operators {T'; t ;;::: O} by setting
T'J = slim TIJ. J E DA,
in the strong sense. Because the convergence is uniform in t, T'J is continuous
in t and T'J -+ J strongly if t -+ 0+. In addition, for each JEDA,
II TIJII + II T'J - TIJII ~ IIJII + e,
Consequently, II T'JII ~ IIJII. It is now easy to extend
II T'JII ~
where e-+ 0 as A -+ 00.
Tto allJE B.
Finally, we have to show that the generator of the semigroup that we have
just constructed is A. In any case, T' has a generator, say Ao. Let us compute
AoJ for JEDA' Because
it follows that
TIJ - J =
I
TIA;./ds.
(9.5.9)
Now
IITIA;./ - T'AJII
~
IIT'AJ - TIAJII + IITIAJ - TIA;'/II·
The first term on the right side tends to 0 uniformly as A -+ 00, for t bounded.
The second term is bounded by IIA;./ - AJII because TI is a contraction, and
this bound tends to zero as A -+ 00. Therefore,
TIA;./ -+ T' AJ strongly as A -+ 00
for each fixed JED, and the convergence is uniform in t over compact
intervals. As a result, we can pass to the limit in (9.5.9) to obtain
T'J - J =
I
TSAJ ds.
Dividing by t and letting t -+ 0+, we have
Aof = s lim T'J - J = slim -1
1.... 0+
t
1.... 0+
t
I'
0
(9.5.10)
,T"AJdu = Af
Consequently, DAo ::::l DA and AoJ = AJ on DABecause Ao is the generator of {T'; t ;;::: O}, AI - Ao maps D Ao onto B by
9.6. Examples
247
Proposition 9.3.1. But AI - A agrees with AI - Ao on DA and maps DA onto
B by condition (ii). It follows that D Ao = D A and, thus, Ao = A. This proves
the proposition.
0
9.6. Examples
The first five sections of this chapter are concerned with some elements of
semigroup theory. The question, of course, is where does all this lead? Our
principal interest here is homogeneous Markov processes and their transition
probabilities. Semigroup theory is a tool which is supposed to help us to
investigate what are possible systems of transition probabilities {P(x, t, B)}.
The development so far clearly implies that we should study P(x, t, B) through
its generator. In this respect, it is of some interest to find out which operators
on a suitable Banach space can be generators of semigroups. If an operator
is bounded, then it is certainly the generator of a semigroup (see Proposition
9.4.2). An unbounded operator is not necessarily the generator of a semigroup. The next several examples illustrate this particular situation.
EXAMPLE 9.6.1. Let {~(t); t ~ O} be a homogeneous Markov chain with a finite
state space, say S = {Xl, ... ,XN} and the transition probability
Pij(t) = Pg(s
+ t) =
xjl~(s) =
Xi}.
We will assume that all states in S are stable (see 8.3.15). Denote by Mt the
transition matrix
Mt = {Pij(t)};
then due to condition (8.3.2), M O = I where I is the unit matrix. In addition,
condition (9.3.3) implies that
lim Mt = 1.
t-+O+
From the Chapman-Kolmogorov equation, it readily follows [see (8.3.7)]
that, for all s, t ~ 0,
M S+ t = MS.M t = Mt·M s,
so that the family of transition matrices {Mt; t ~ O} represents a semigroup.
Denote by B the set of all real bounded functions on S (as a matter of fact,
the Banach space B is the set of N-dimensional column vectors). For every
Xi E Sand fEB, we define
(Mtf)(Xi) =
N
L Pij(t)f(x).
j=l
We have
I(Mtf)(Xi) I :::;; Ilfll
for all Xi E S,
(9.6.1)
248
9. Markov Processes II: Application of Semigroup Theory
so that
Xi
which clearly shows that {Mt; t 2 O} is a contraction semigroup.
Consider
~{(Mtf)(XJ -
f(xJ}
=
~{Jo pij(t)f(x) -
f(XJ}
=
~ L~i pij(t)f(x) -
[1 - Pii(t)Jf(XJ }.
From Propositions 9.3.2 and 9.3.3, it readily follows that the limit
1
lim -t {(Mtf)(x;) - f(x;)}
L %fU) N
=
t-+O+
quf(x;)
(9.6.2)
j=li
= (Af)(x;)
exists for all fEB and
Xi E
S, where
A=
Because all states of S are stable, Iqiil < 00.
It is clear that A is a bounded operator because
IIAfil =
S~iP I(Af)(xJ I = S~iP !j~; qijf(xj ) -
quf(X;)!
N
S sup
i
L 1%1·lIfII·
j=i
Finally, to show that A is the generator, consider
II Mtft-f -Afll
=s~ipl~{(Mtf)(X;)-f(X;)} -(Af)(X;)!~O
due to (9.6.2). Because the generator is bounded, we have
Mt
=
etA
= I
+
n
tk
k=l
k.
L ,Ak •
EXAMPLE 9.6.2. Let {~(t); -00 < t < oo} be a strictly stationary homogeneous
chain with the same state space and transition probabilities as in the previous
example. Denote by
249
9.6. Examples
then, for all s, t
~
0,
Pj = P{ ~(s
+ t) = j} =
n
L Pij(t)p;.
;=1
This implies that
(9.6.3)
where Q' = (Pi" .. , Pn) and Q' means the transpose of Q.
Consider now the case N = 2 and take
Our goal is to determine MI. First, we have
Pll(t)
+ pdt) =
Pll (t)
+ P21 (t) = 1.
1
and, from (9.6.3), we deduce
Set Pll(t) = O(t); from the last two equations, we obtain that
M' = [O(t)
1 - O(t)
1 - O(t)J.
O(t)
Consequently,
dM'1
dt
1=0
= A = 0('(0)[ 1
-1
-1J
1
'
where 0 > 0('(0) = - A. Set A = - AAo; then it is readily seen that, for all
n = 1,2, ... ,
An
= t( -2A)n Ao.
Because A is bounded, we have
I _
M - I
= I
1 ~ (- 2At)k A
+ 2 k~l
k!
+ tA o(e- 21, -
1 OJ
=[0 1
0
1)
[t(e- 1)
+ -t(e-2.l.t - 1)
21t -
The two examples discussed above deal with generators which are
bounded linear operators defined on the whole of B. In this respect, it is of
some interest to point out that if A is a generator whose domain is the entire
space B, then A must be bounded.
250
9. Markov Processes II: Application of Semigroup Theory
If a generator A is not bounded the solution (9.4.3), then breaks down
because the series on the right-hand side does not have to converge even on
D A' This is unfortunate because many of the most interesting examples (like
the Brownian motion process) do not have bounded generators.
9.6.3. Let {~(t); t ~ O} be one-dimensional standard Brownian motion. As we have seen, this is a homogeneous Markov process with state space
{R,~} and transition probability
EXAMPLE
P(x, t, B) = (2ntfl/2
f {
B
exp -
(Y_X)2}
2t
(9.6.4)
dy.
Let B be the Banach space of real bounded Borel measurable functions on R
with the usual supremum norm. Simple calculation yields that, for any fEB,
(Ttf)(x) - f(x) = (2nfl/2
f:
exp { - u;} f(uJt
+ x)du -
f(x).
(9.6.5)
From this, it seems clear that the semigroup {P;t ~ O} induced by (9.6.4) is
not strongly continuous on B.
Denote by Co c B the subset of continuous functions and by Bo c B the
subset on which the semigroup is strongly continuous. Let us show that
Bo c Co. To this end, consider the resolvent of the semigroup (h E B)
(R;.h)(x) =
=
=
LX> e-;'t(p g)(x) dt
f
OO
0
e-;'t dt(2ntfl/2
(2Afl/2
f:
A> 0
foo
-00
{(y -2t X)2} h(y) dy
exp -
exp{ly - xljU}h(y)dy.
From this we see that for every h
function on R, so that
E
(9.6.6)
B, (R;.h)(x) is a bounded continuous
On the other hand,
(9.6.7)
[see (9.3.3)], where R;.B = {R;.h; h E B}. But according to Corollary 9.2.1, the
set DA is dense in Bo (as a matter of fact, the closure DA = Bo). Because Co is
also a Banach space, it follows that Bo c Co.
For h E Co, write
(R;.h)(x) = (2Afl/2 {exP ( -xjU)
f:oo exp(yjU)h(y)dy
+ exp(xjU) LXl exp( -
yjU)h(Y)dY }
251
9.6. Examples
and set f(x)
=
(RAh)(x), then, after some simple calculations, we have
f'(x) = -2exp( -xfo)
f:oo exp(yfo)h(y)dy + j2)J'(x),
(9.6.8)
(9.6.9)
f"(x) = 2Af(x) - h(x).
The last equation shows that 1"(') is a bounded and continuous function on
R. Consequently, 1'(.) is uniformly continuous.
From (9.3.7), we have
.if(x) - (Af)(x) = h(x).
This and (9.6.9) yield
(Af)(x) = !f"(x).
(9.6.10)
In other words, every fED A satisfies (9.6.1 0).
Now, denote by ct c Co the set of uniformly continuous functions on R.
From Equation (9.6.8), it follows that I' E Co. This then implies that f E q.
It is also not difficult to see that Af E Ct. From this and (9.6.10), we have that
I" E q.
Let us now show that the converse also holds, i.e., if
f
E ct => I" E ct => fED A'
To this end, consider
~{(T'f)(X) =
~f"(X)
f(x)} -
1
tv'2nt
M:::
foo
{(y -2 X)2} [f(y) -
exp -
t
-00
1
f(x)] dy - -2f"(x)
(using the Taylor formula)
f(y) = f(x)
=
+ (y -
x)f'(x)
foo
exp -
1
M:::
tv' 2nt
+ ~(y -
+ !(y -
x)2f(x
+ o(y -
{(y -2 X)2}{ (y -
x)f'(x)
-00
X)2f"(X
f:
=
(2nfl/2
=
!(2n)-1/2
t
+ o(y -
X»}d Y -
e- u2 / 2 nu 2f"(x
f:oo e-
u2
/
2 [f"(x
as
t
-+ 0+.
f) -
!f"(x)
f"(x)]u 2 du.
This implies that
I ~(T'f -
(0 < 0 < 1)
~f"(X)
+ OuJt)]du -
+ OuJt) -
x»
~f" 11-+ 0
252
9. Markov Processes II: Application of Semigroup Theory
Consequently, DA = {J;f E q and f" E Cc1'} and Bo = Cc1'.
Next, assume that P(x, t, B) is such that
P(x, t, B)
= (Tt IB)(x) E D A;
then
OP(~,/,B) = A(P(',t,B))(x)
or
oP(x, t, B)
at
1 02 P(x, t, B)
-2
ox 2
(9.6.11)
which is the backward Kolmogorov equation [see (8.8.8)]. From the general
theory of the equation of heat conduction, we know that the unique solution
of (9.6.11) is (9.6.4).
9.7. Some Refinements and Extensions
Let g(t); t ;::: O} be a homogeneous Markov process with state space {R, 9I}
and transition probability P(x, t, B). In this section, we discuss some morerefined properties ofthis function and of the semigroup which it induces. The
first question that naturally arises is one concerning the continuity of P(x, t, B)
with respect to t at t = O. As we have seen, this property was repeatedly used
in Chapter 8 to obtain many useful results.
Which of the Markov processes have this property? In other words, when
does condition (8.1.13) hold? To answer this and some related questions,
denote by U(x, e) the open e-neighborhood of x E R, i.e.,
U(x,
e) = {y
E
R;ly - xl < e}.
Definition 9.7.1. A transition probability P(x, t, B) on {R, 9I} is said to be
"stochastically continuous" if
lim P(x, t, U(x, e)) = 1
(9.7.1)
for all e > 0 and x E R fixed. If the limit (9.7.1) holds uniformly in x for
each e > 0, the transition probability is said to be "uniformly stochastically
continuous."
The following proposition specifies conditions on the sample functions of
the Markov process ~(t) under which (9.7.1) holds.
Proposition 9.7.1. Assume that the sample functions of the homogeneous
Markov process g(t); t ;::: O} are continuous from the right; then (9.7.1) holds.
9.7. Some Refinements and Extensions
253
Let {t n} c R+ be a sequence decreasing to zero and set Bn
{Wn) E U(x, e)}. Because e(t) is continuous from the right, we have
PROOF.
{e(0) E U(x,e)} s;;
=
n U Bk = liminf Bn·
00
00
n=1 k=n
n-+oo
Therefore, for all x E Rand e > 0,
li~~f PABn) ~ Px{li~~nf Bn} ~ Px{e(O) E U(x,e)} =
1.
This implies that
lim inf P(x, tn' U(x, e»
n->oo
~
1,
which proves the assertion.
D
In the following, we will always assume, unless otherwise stated, that
Markov processes have right-continuous sample functions. Denote as usual
by B the Banach space of bounded measurable functions on R with
supremum norm II!II = supx If(x) I, fEB. On B, we have defined a oneparameter family of bounded linear operators {Tt; t ~ O} induced by a transition probability P(x, t, B) by
(Ttf)(x)
Clearly, for each t
~
= f:f(Y)P(X, t, dy) = Ex{J(W))}·
0,
Tt: B--+B,
= T P, and I Ttll : :; ; 1.
As we have seen in the previous sections, the theory of semigroups of
bounded linear operators in a Banach space deals with the exponential
functions in infinite-dimensional functional spaces. These problems were investigated independently by Hille and Yosida (1948). They introduced the
concept of the infinitesimal generator A of a semigroup Tt and discussed the
problem of generation of T' in terms of A. Let us now prove the following
result.
T s +t
t •
Proposition 9.7.2. Let fEB be continuous; then, for each x E R,
lim (T1)(x) = f(x).
'->0+
Due to continuity of f, for each x E R and any ~ > 0, there exists
U(x, e) such that when y E U(x, e), If(y) - f(x)1 < ~. Therefore,
PROOF.
(T'f)(x) - f(x) =
r
(f(y) - f(x»P(x, t,dy)
JU(x.£)
+
r
J
UC(x,e)
(f(y) - f(x»P(x, t, dy).
9. Markov Processes II: Application of Semigroup Theory
254
From this, it readily follows that
+ 211fll P(x, t, UC(x, e».
I(Ttf)(x) - f(x) I :::;; bP(x, t, U(x, e»
This and the previous lemma prove the assertion.
D
Corollary 9.7.1. From the last inequality, we deduce that the semigroup is
strongly continuous on the subset of continuous functions C c B if P(x, t, B) is
uniformly stochastically continuous.
The following proposition is due to Hille.
Proposition 9.7.3. Denote by oc(t)
~oc(t) =
lim
t
t .... oo
PROOF.
= In I Ttll; then,
inf
t>O
~oc(t).
t
From the semi group property, we deduce that
11T"+tll
=
IIT"Ttll :::;; IITslI'llytll,
so that
oc(s + t) :::;; oc(s) + oc(t).
Now, denote by
f3 = inf ~oc(t);
t>O
t
then f3 is either -00 or finite. Assume that f3 is finite. We choose, for any e > 0,
a number x > 0 such that oc(x):::;; (f3 + e)x. Let t > x and n be such that
nx:::;; t:::;; (n + l)x:Then
1
t
1
t
1
t
f3 :::;; -oc(t) :::;; -oc(nx) + -oc(t - nx)
nx 1
1
:::;; - -oc(x) + -oc(t - nx)
txt
1
t
nx
:::;; -(f3 + e) + -oc(t - nx).
t
Thus, letting t -.
00
in the above inequality, we obtain
lim
t .... oo
The case f3 =
-00
~oc(t) = f3.
t
is treated similarly.
Remark 9.7.1. Lemma 8.3.1 is a version of this proposition.
D
255
9.7. Some Refinements and Extensions
The concept of the resolvent R;. associated with a semigroup {T'; t ~ O}
was discussed in Section 9.3 of this chapter. According to Definition 9.3.1, this
is a family of bounded linear operators {R;.; A > O} on the Banach space of
bounded Borel-measurable functions on R defined by
(R;.f)(x)
= LX> e-Al(T'j)(x)dt, A> 0, fEB.
From this and Fubini's theorem, it follows that
(Rd)(x) = Ex
It is easy to see that
{Lx> e-).tf(~(t»dt}.
(9.7.2)
IIR;.II = A-I, (R A 1)(x) == A-I.
Proposition 9.7.4. For all AI' A2 > 0,
R A, - RA2
+ (AI
(9.7.3)
- A2)R A2 R A, = O.
In addition, iff E B is continuous at xo, then
lim A(Rd)(xo) = f(xo).
A--'oo
PROOF.
Consider
P(RAJ)(x) = I : P(x,s,dy)
too e-A"(T'f)(y)dt
=
too e-A"dt I : I:f(U)P(X,S,dY)P(y,t,dU)
=
fOO e- A" dt Ioo f(u)P(x, s + t, du)
°
too e-A"(P+'f)(x) dt
-00
=
= e-A,s
Isoo e-A,t(T'f) (x) d•.
From this, we now have
too e- A2S(T sR;.J)(x)ds = (R;'2R;.J)(X)
= too e-;' 2S e-;"s ds Is'" e-;"t(Ttf)(x) d.
=
too e-;.,t(T'f)(x)d.
t
e(;"-;'2)Sds
= (AI - A2tl {(R;.J)(x) - (R;.,f)(x)},
256
9. Markov Processes II: Application of Semigroup Theory
which proves (9.7.3). Next, from
A(R;./)(xo) =
too Ae-M(TIJ)(xo)dt
= too e-t(Tt"-1j)(x o) dt
and Proposition 9.7.2, the second part of the proposition follows.
0
Problems and Complements
9.1. Let {Tt; t ~ O} be a semigroup on a Banach space Band Bo c B the subset on
which Tt is strongly continuous. Show that Bo is a closed linear manifold.
9.2. Let A be a bounded linear operator on a Banach space B. Show that
(i) lIe Al1 ::;; exp{IIAII};
(ii) elI = et 1;
(iii) e A+B = eA. e B if A· B = B· A.
9.3. If the bounded operators A and B are such that IletAl1 ::;; 1 and IletBl1 ::;; 1 for all
t ~ 0, show that IletBf - etAfil ::;; t IIBf - Alii.
9.4. If A is a bounded operator on B, show that Tt
=
etA is a semigroup.
9.5. Let Co [ -00,00] be the space of bounded continuous functions on [-00,00].
Define on Co the operator T t , t ~ 0, by
L ().4f(x <Xl
(Ttf)(x) = e-).t
kll)lk!
k=O
where). > 0 and Il > o. Show that: (i) Tt: Co ..... Co; (ii) {Tt; t ~ O} is a semigroup.
Is the semigroup strongly continuous?
9.6. (Continuation) Define Tt on Co by
(T1)(x) =
f:
K(x - u, t)f(u) du,
where
K(x, t) = (2m)-1/2 e -x 2 /2t,
-00 < x < 00, t >
o.
Show that {Tt; t ~ O} is a strongly continuous semigroup.
9.7. Let {Tt;t
~ O} be a semigroup on a Banach space Band Bo c B the subset on
which it is strongly continuous. Show that TtBo = Bo for all t ~ o.
9.8. (Continuation) Show that there exists a constant M > 0 and), > 0 such that, for
allt~O,
II Ttll ::;;Me).t.
9.9. (Continuation) Show that for each hE B o, Tth is a continuous mapping from
[0,00) into Bo ({Ttf; t ~ O} represents a curve in Bo).
257
Problems and Complements
9.10. Let {W); t ~ O} be a homogeneous Markov process with state space {R, Bt} and
transition probability P(x, t, B) satisfying condition (8.1.13). Let B be the set of
bounded Borel functions f: R ..... R. For each t ~ 0, define T'B ..... B by
(T'f)(x) =
f:
f(y)P(x, t, dy) = Ex {J(W)) }.
Show that {T!; t ~ O} is a strongly continuous contraction semigroup on B.
9.11. (Continuation) Show that, for each t
B.
~
0, T' is a continuous linear operator on
9.12. (Continuation) If hE B is continuous at Xo
E
R, show that
t-+O+
9.13. Show that the generator A of a semigroup is a linear operator on D A.
9.14. Show that for every f
E
D A, Af E Bo.
9.15. Show that the generator ofthe semigroup in Problem 9.4 is A.
9.16. Show that the generator of the semigroup in Problem 9.5 is
(Af)(x) = A.{J(x - Jl.) - f(x)}.
9.17. Determine the generator of the semigroup in Problem 9.6.
9.18. Show that R;.B ,;; B.
9.19. Let {T'; t ~ O} be a contraction semigroup on a Banach space B with generator
A. Show that, for each A. > 0,
(A.l - A): D A
.....
B
is one-to-one, and that the inverse mapping taking B into DAis the resolvent R;.
(see Proposition 9.3.1).
CHAPTER 10
Discrete Parameter Martingales
10.1. Conditional Expectation
The concept of a martingale introduced in Section 1.5 of Chapter 1 [see
(1.5.19)] was defined in terms of the conditional expectation with respect to a
a-algebra. In this section, we will explore briefly some basic properties of this
conditional expectation, which are needed in this chapter. We begin with
some definitions.
Let {n,gjJ,p} be a probability space and denote by L2 = L 2{n,gjJ,p} the
Hilbert space of random variables (complex or real) defined on {n, fJI, P},
having finite variances. On L 2 , the inner product is defined by
(Zl,Z2) = EZ 1 Z2
(see Definition 5.1.3). The norm of an element Z
IIZII
E
L2 is then
= (Z,Z)1/2 = (EIZI2)1/2.
Let f/ c gjJ be a sub-a-algebra; the subset of all those random variables
from L 2, measurable with respect to Yo is a subspace L! = L2 {n, f/, P} of L 2.
Denote by p* the projection operator defined on L 2 {n,gjJ,p} which projects
elements of this space perpendicularly onto L!. According to Proposition
5.4.1., for every Z E L 2, P*Z E L! so that P*Z is f/-measurable. In the following, we will write
(10.1.1)
P*Z = E{ZIf/}
and call it the "conditional expectation of Z with respect to (or given) the
a-algebra f/."
It is clear that, due to Proposition 5.3.1, the f/-measurable random variable E{ZIf/} is uniquely defined up to a P-null set. If Y E L!, then, because
P* Y = Y, it follows that
259
10.1. Conditional Expectation
E{YI9'} = Y
Next, for any A
have
E
9',IA
E
L
(10.1.2)
(a.s.).
q; consequently [see (5.2.8) and (5.2.9)], we clearly
E{ZI9'}dP =
f
P*Z·IAdP = (P*Z,IA) = (Z,P*IA)
= (Z,IA) =
L
ZdP.
(10.1.3)
= E(Z).
(10.1.4)
From this, it follows that
E(E{ZI9'})
If Z = IB' where BE fJI, the conditional expectation E{ZI9'} is called the
"conditional probability of B given 9''' and written P(BI9'), i.e.,
P(BI9')
= E{IBI9'}.
(10.1.5)
= lA
(10.1.6)
From this and (10.1.2), we obtain
P(AI9')
when A
E
(a.s.)
9'. In addition, due to (10.1.3),
L
(10.1.7)
P(BI9')dP = P(A n B).
We now list some basic properties of the conditional expectation which are
not difficult to prove. First,
E{aZ1
+ PZ219'} =
aE{Z119'}
+ PE{Z219'}
(a.s.),
(10.1.8)
where a and Pare constants.
To show (10.1.8), we invoke (10.1.1), (5.8.4), and (5.8.5) to obtain
E{aZ1 + PZ219'}
= P*(aZ 1 + PZ 2)
= aP*Z1 + PP*Z2 = aE{Z119'} + PE{Z219'},
which proves (10.1.8).
Let 9'1 c 9'2 C fJI be two sub-a-algebras, then
(i)
E(E{ZI9'2}I9'd = E{ZI9'd
(ii)
E(E{ZIY't} 19'2) = E{ZI9'd (a.s.).
(a.s.),
(10.1.9)
To show the first equation, note that L2 {n,Y't,p} c L 2 {n,92,p}. If we
denote by Pt Z the orthogonal projection of Z E L2 on the first subspace and
by PI Z on the second one, we have
E(E{ZI92}19'1) = E(PIZI9'd = Pt(PIZ) = ptZ
which proves (1O.9.i). The second relation is a direct consequence of (10.1.2).
If the random variable Z is independent of 9', then
E{ZI9'} = E(Z)
(a.s.).
(10.1.10)
10. Discrete Parameter Martingales
260
As a matter of fact, for any A
L
Y,
E
E{ZIY}dP
=
L
= E(Z·IA)
ZdP
L
= P(A)E(Z) =
Now consider X, Z
measurable; then
E
L2 such that X· Z
E{XZIY}
E
E {Z} dP.
L2 and assume that X is Y-
= XE{ZIY} (a.s.).
(10.1.11)
The proof of this begins by considering the case X = IA. For any C E Y, we
have
Jcr E{XZIY}dP = Jcr ZIA dP =
=
f
f
ZdP
AnC
E{ZIY}dP =
AnC
i
C
XE{ZIY}dP.
Extension of this result to simple functions, to non-negative functions, and
then to a general case follows in the usual way.
Denote by
ff" = a{X1,···,Xn }
the a-algebra generated by the random variables Xl' ... , Xn; then, we write
(10.1.12)
Remark 10.1.1. For the conditional expectation of a random variable Y, given
a a-algebra Y, to exist, it suffices that EI YI < 00 (the second moment is not
really necessary). In such a case, E{YIY} is defined as any Y-measurable
random variable satisfying the condition
L
E{YIY}dP
=
L
YdP
for all A
E Y. Of course, E{YIY} so defined has all the properties (10.1.2)(10.1.11). In the sequel, we will assume only that the random variables we are
dealing with are from Ll = Ll {n,96,p}.
10.2. Discrete Parameter Martingales
Let {~n}O be a sequence of real random variables defined on a probability
space {n, 96, Pl. Denote by
ff" =
ago,··.'~n},
n = 0,1, ... ,
10.2. Discrete Parameter Martingales
261
the sub-O"-algebra of fJI generated by eo, ... , en. Clearly, {ff,,}0' is an increasing
sequence, i.e.,!Fo c !F1 C ... , often called the "internal history" of gn}O'. The
system
(10.2.1)
is called a "stochastic sequence."
Definition 10.2.1. Assume that
(10.2.2)
then the stochastic sequence
{(en, ff,,)}0'
represents a discrete parameter
(i) martingale if E gn+llff,,} = en (a.s.),
(ii) submartingale if Eg n +1Iff,,} ~ en (a.s.),
(iii) supermartingale if E gn+tlff,,} ~ en (a.s.)
for all n
(10.2.3)
= 0, 1, ....
From (10.1.4) and (1O.2.3.i), it follows that
Eg n +1} =
Egn }
for all n= 0, 1, .... In other words, if {(en'ff,,)}O' is a martingale Eg n} =
Ego} for all n ~ 1. On the other hand, if the stochastic sequence is a submartingale, we have
Finally, we have, for all n = 0, 1, ... ,
if the stochastic sequence is supermartingale.
From the properties of conditional expectations, one easily deduces that
(10.2.3) is equivalent to the following: If, for each A E ff", n = 0, 1, ... ,
L
L
L
L
~L
~L
en+1 dP =
en dP , {(en'ff,,)}O' is a martingale;
en+1 dP
en dP, {(en, ff,,)}0' is a submartingale;
en+1 dP
en dP; {(en, ff,,)}0' is a supermartingale.
It is said that a martingale {(en, ff,,)} 0' is closed from the right if there exists
a random variable eoo E L1 {n,fJI,p} such that
en = E goolff,,} (a.s.)
for all n = 0,1, .... On the other hand, if {(en,ff,,)}O' is a submartingale and
262
10. Discrete Parameter Martingales
en ~ Egool§,,}
(a.s.)
for all n = 0, 1, ... , we say that the submartingale is closed from the right.
In the modern literature a somewhat more general definition of a martingale is formulated. Let $\{\mathscr{G}_n\}_0^\infty$ be an increasing sequence of sub-$\sigma$-algebras of $\mathscr{B}$, i.e.,
$$\mathscr{G}_0 \subset \mathscr{G}_1 \subset \cdots.$$
Such a family is usually called a "history" or "filtration." A sequence $\{\xi_n\}_0^\infty$ of random variables on $\{\Omega, \mathscr{B}, P\}$ is said to be "adapted" to the history $\{\mathscr{G}_n\}_0^\infty$ if $\xi_n$ is $\mathscr{G}_n$-measurable for each $n = 0, 1, \dots$.
A sequence $\{\xi_n\}_0^\infty \subset L_1\{\Omega, \mathscr{B}, P\}$ of real random variables adapted to a history $\{\mathscr{G}_n\}_0^\infty$ is called a "martingale" with respect to $\{\mathscr{G}_n\}_0^\infty$ if
$$E\{\xi_{n+1} \mid \mathscr{G}_n\} = \xi_n \quad \text{(a.s.)} \qquad (10.2.4)$$
for every $n = 0, 1, \dots$. In a similar fashion, one may define the concept of a submartingale and supermartingale with respect to $\{\mathscr{G}_n\}_0^\infty$.
Remark 10.2.1. It is not difficult to see that (10.2.4) leads to the following more general relation: for every $1 \leq k \leq n$,
$$E\{\xi_n \mid \mathscr{G}_k\} = \xi_k \quad \text{(a.s.)}. \qquad (10.2.5)$$
This follows from (10.1.9.i) by induction because
$$E\{\xi_n \mid \mathscr{G}_k\} = E\bigl(E\{\xi_n \mid \mathscr{G}_{n-1}\} \bigm| \mathscr{G}_k\bigr) = E\{\xi_{n-1} \mid \mathscr{G}_k\},$$
and so on.
Remark 10.2.2. In the following, unless otherwise stated, when we say that
a sequence of random variables is a martingale (submartingale or supermartingale), it means it is so with respect to its internal history.
Proposition 10.2.1. Let $\{\xi_n\}_0^\infty$ be a martingale and $h: R \to R$ a convex function; then $\{h(\xi_n)\}_0^\infty$ is a submartingale if $\{h(\xi_n)\}_0^\infty \subset L_1\{\Omega, \mathscr{B}, P\}$.

PROOF. From Jensen's inequality, we have
$$E\{h(\xi_{n+1}) \mid \xi_0, \dots, \xi_n\} \geq h\bigl(E\{\xi_{n+1} \mid \xi_0, \dots, \xi_n\}\bigr) = h(\xi_n) \quad \text{(a.s.)}.$$
Therefore, because $\sigma\{h(\xi_0), \dots, h(\xi_n)\} \subset \sigma\{\xi_0, \dots, \xi_n\}$, (10.1.9.i) gives
$$E\{h(\xi_{n+1}) \mid h(\xi_0), \dots, h(\xi_n)\} = E\bigl(E\{h(\xi_{n+1}) \mid \xi_0, \dots, \xi_n\} \bigm| h(\xi_0), \dots, h(\xi_n)\bigr) \geq h(\xi_n) \quad \text{(a.s.)}. \qquad \square$$
Corollary 10.2.1. If $\{\xi_n\}_0^\infty$ is a martingale, $\{|\xi_n|\}_0^\infty$ is a submartingale because $|x|$ is a convex function. The function $x^+ = \max\{0, x\}$ is a nondecreasing convex function. Indeed, because, for any two real numbers $x_1$ and $x_2$, $(x_1 + x_2)^+ \leq x_1^+ + x_2^+$, it follows that
$$(p x_1 + q x_2)^+ \leq p x_1^+ + q x_2^+, \quad p + q = 1, \ 0 \leq p \leq 1,$$
which proves the assertion. Therefore, $\{\xi_n^+\}_0^\infty$ is a submartingale. $\square$
10.3. Examples
In this section a series of motivating examples will be presented.
EXAMPLE 10.3.1. Let $X \in L_1\{\Omega, \mathscr{B}, P\}$ be an arbitrary random variable and $\{\mathscr{G}_n\}_0^\infty$ a filtration such that $\mathscr{G}_n \subset \mathscr{B}$ for all $n = 0, 1, \dots$. Set
$$\xi_n = E\{X \mid \mathscr{G}_n\}.$$
Clearly, the sequence of random variables $\{\xi_n\}_0^\infty$ is adapted to $\{\mathscr{G}_n\}_0^\infty$. Next, due to (10.1.9.i), we have
$$E\{\xi_{n+1} \mid \mathscr{G}_n\} = E\bigl\{E\{X \mid \mathscr{G}_{n+1}\} \bigm| \mathscr{G}_n\bigr\} = E\{X \mid \mathscr{G}_n\} = \xi_n,$$
which implies that the sequence $\{\xi_n\}_0^\infty$ is a martingale. Many (but not all) martingales can be obtained in this way.
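A minimal numerical sketch of this construction may be helpful (the particular choice of $X$ and of the filtration is an assumption made for illustration): take $X$ to be the mean of $N$ fair coin flips and $\mathscr{G}_n$ the $\sigma$-algebra generated by the first $n$ flips, so that $\xi_n = E\{X \mid \mathscr{G}_n\}$ has the closed form computed in the comments.

    import numpy as np

    # Assumed setup: X = mean of N fair coin flips, G_n = sigma-algebra of
    # the first n flips.  The remaining N - n flips have conditional mean
    # 1/2 each, so xi_n = E{X | G_n} = (S_n + (N - n)/2)/N, where S_n is
    # the sum of the first n flips.
    rng = np.random.default_rng(0)
    N = 20
    flips = rng.integers(0, 2, size=N)             # one realization
    S = np.concatenate(([0], np.cumsum(flips)))    # S[n] = sum of first n flips
    n = np.arange(N + 1)
    xi = (S + (N - n) / 2) / N                     # one path of the martingale
    print(xi)    # starts at 1/2 and ends at the realized value of X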
The next example gives a gambling interpretation of a martingale.
EXAMPLE 10.3.2. A gambler plays a sequence of independent games. Assume that $X_0$ is his initial fortune (before play commences), $X_0 + X_1$ is the gambler's fortune after the first play, $X_0 + X_1 + X_2$ at the end of the second, and so on. In this way we obtain a sequence of partial sums $\{\xi_n\}_0^\infty$,
$$\xi_n = X_0 + X_1 + \cdots + X_n,$$
of independent random variables. Note that
$$\sigma\{\xi_0, \dots, \xi_n\} = \sigma\{X_0, \dots, X_n\}, \quad n = 0, 1, \dots. \qquad (10.3.1)$$
In fact, because $X_0 = \xi_0$ and $X_k = \xi_k - \xi_{k-1}$, $k = 1, \dots, n$, each $X_0, X_1, \dots, X_n$ is $\sigma\{\xi_0, \dots, \xi_n\}$-measurable, so that
$$\sigma\{X_0, \dots, X_n\} \subset \sigma\{\xi_0, \dots, \xi_n\}.$$
On the other hand, each $\xi_k$ is clearly $\sigma\{X_0, \dots, X_n\}$-measurable for every $k = 0, \dots, n$, so that
$$\sigma\{\xi_0, \dots, \xi_n\} \subset \sigma\{X_0, \dots, X_n\},$$
which proves (10.3.1).
Next, invoking (10.1.10) and (10.3.1), we have
$$E\{\xi_{n+1} \mid \xi_0, \dots, \xi_n\} = E\{\xi_n + X_{n+1} \mid X_0, \dots, X_n\} = \xi_n + E\{X_{n+1}\}.$$
From this, we deduce the following. The sequence of random variables $\{\xi_n\}_0^\infty$ is a martingale if $E\{X_n\} = 0$ for all $n = 0, 1, \dots$; it is a submartingale if $E\{X_n\} > 0$ for every $n = 0, 1, \dots$, and a supermartingale if $E\{X_n\} < 0$, $n = 0, 1, \dots$.
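A quick Monte Carlo check of this example (the $\pm 1$ "fair game" step distribution below is an assumed choice): when $E\{X_k\} = 0$, the sample mean of $\xi_n$ should remain at the initial fortune for every $n$.

    import numpy as np

    # Partial sums of independent zero-mean plays form a martingale, so the
    # Monte Carlo average of xi_n stays near X_0 at every stage.
    rng = np.random.default_rng(1)
    paths, n_steps, x0 = 100_000, 50, 10.0
    steps = rng.choice([-1.0, 1.0], size=(paths, n_steps))  # fair game: E{X_k} = 0
    xi = x0 + np.cumsum(steps, axis=1)                      # xi_n = X_0 + ... + X_n
    print(xi.mean(axis=0)[[0, 9, 49]])                      # all close to 10.0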
EXAMPLE 10.3.3. Let $\{Z_n\}_1^\infty$ be an i.i.d. sequence of (nondegenerate) random variables with common support in $(0, 1]$ and $E\{Z_1^i\} = \alpha_i > 0$, $i = 1, 2$. Set
$$L_n = \frac{Z_1 Z_2 \cdots Z_n}{\alpha_1^n}, \quad n = 1, 2, \dots.$$
It is quite clear that the sequence of random variables $\{L_n\}_1^\infty$ represents a martingale such that
$$E\{L_n^2\} = (\alpha_2/\alpha_1^2)^n \qquad (10.3.2)$$
for all $n = 1, 2, \dots$ (for an application see Example 1.6.2). Because $Z_1$ is nondegenerate, $\alpha_2 > \alpha_1^2$, so from this we clearly have that $E\{L_n^2\} \to \infty$ as $n \to \infty$. In spite of this, however, $L_n \to 0$ (a.s.) as $n \to \infty$.
To show this, write $L_n$ as
$$L_n = \exp\Bigl\{\sum_{j=1}^n \ln(Z_j/\alpha_1)\Bigr\}.$$
From Jensen's inequality, we have
$$E\{\ln(Z_j/\alpha_1)\} < \ln E\{Z_j/\alpha_1\} = 0.$$
Therefore, because $E\{\ln(Z_j/\alpha_1)\} < 0$, by the strong law of large numbers
$$\sum_{j=1}^n \ln(Z_j/\alpha_1) \to -\infty \quad \text{(a.s.)}$$
and, thus, $L_n \to 0$ (a.s.) as $n \to \infty$. This is an interesting example of a sequence of random variables converging to zero (a.s.) while its variance increases to infinity.
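A numerical sketch, taking $Z_j$ uniform on $(0, 1)$ as an assumed concrete case: then $\alpha_1 = 1/2$ and $\alpha_2 = 1/3$, so $E\{L_n^2\} = (4/3)^n$, while every simulated path of $L_n$ collapses toward zero.

    import numpy as np

    # L_n = prod_{j<=n} (Z_j/alpha_1) with Z_j ~ Uniform(0,1) and alpha_1 = 1/2.
    rng = np.random.default_rng(2)
    paths, n = 10_000, 200
    L = np.cumprod(rng.uniform(size=(paths, n)) / 0.5, axis=1)
    print(L[:, -1].max())    # the largest of 10,000 paths is already tiny
    print((4 / 3) ** n)      # the true E{L_n^2}, astronomically large
    # A sample average of L_n^2 would badly underestimate this value: the
    # second moment is carried by paths far too rare to appear in the sample.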
EXAMPLE 10.3.4. Let $\{X_n\}_1^\infty$ be a sequence of random variables such that the probability density of $(X_1, \dots, X_n)$ is either $f_n^0(\cdot, \dots, \cdot)$ or $f_n^1(\cdot, \dots, \cdot)$ for each $n = 1, 2, \dots$. Assume that $f_n^1 > 0$ on $R^n$ for all $n = 1, 2, \dots$; then the random variable $\rho_n(\omega)$, defined by
$$\rho_n = \frac{f_n^0(X_1, \dots, X_n)}{f_n^1(X_1, \dots, X_n)},$$
is a likelihood ratio. If $f_n^1$ is the true probability density of $(X_1, \dots, X_n)$, then
$$E\{\rho_{n+1} \mid X_1 = x_1, \dots, X_n = x_n\} = \int \rho_{n+1}(x_1, \dots, x_n, y)\,P\{X_{n+1} \in dy \mid X_1 = x_1, \dots, X_n = x_n\}$$
$$= \int \frac{f_{n+1}^0(x_1, \dots, x_n, y)}{f_n^1(x_1, \dots, x_n)}\,dy = \frac{f_n^0(x_1, \dots, x_n)}{f_n^1(x_1, \dots, x_n)}.$$
In other words,
$$E\{\rho_{n+1} \mid X_1, \dots, X_n\} = \rho_n \quad \text{(a.s.)},$$
which shows that $\{\rho_n\}_1^\infty$ is a martingale with respect to the filtration $\{\mathscr{F}_n\}_1^\infty$, where $\mathscr{F}_n = \sigma\{X_1, \dots, X_n\}$.
On the other hand, from (10.1.3), we see that
$$\sigma\{\rho_1, \dots, \rho_n\} \subset \sigma\{X_1, \dots, X_n\}.$$
From this and (10.1.9.i), we deduce that
$$E\{\rho_{n+1} \mid \rho_1, \dots, \rho_n\} = E\bigl(E\{\rho_{n+1} \mid X_1, \dots, X_n\} \bigm| \rho_1, \dots, \rho_n\bigr) = E(\rho_n \mid \rho_1, \dots, \rho_n) = \rho_n \quad \text{(a.s.)},$$
so that $\{\rho_n\}_1^\infty$ is a martingale.
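A Monte Carlo sketch under an assumed pair of models: $f^0$ the $N(0,1)$ density, $f^1$ the $N(0.3,1)$ density, with the $X_i$ drawn i.i.d. from the true density $f^1$; the martingale property then shows up as $E\{\rho_n\} = 1$ for every $n$.

    import numpy as np

    def phi(x, m):    # normal density with mean m and unit variance
        return np.exp(-(x - m) ** 2 / 2) / np.sqrt(2 * np.pi)

    # rho_n = prod_{i<=n} f^0(X_i)/f^1(X_i) is a martingale when f^1 is true.
    rng = np.random.default_rng(3)
    paths, n = 200_000, 20
    X = rng.normal(loc=0.3, size=(paths, n))               # sampled from f^1
    rho = np.cumprod(phi(X, 0.0) / phi(X, 0.3), axis=1)    # likelihood ratios
    print(rho.mean(axis=0)[[0, 9, 19]])                    # each approximately 1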
EXAMPLE 10.3.5. Consider an urn which contains $b \geq 1$ black balls and $w \geq 1$ white balls which are well mixed. Repeated drawings are made from the urn, in such a way that after each drawing the selected ball is returned to the urn along with $c \geq 1$ balls of the same color. Set
$$X_0 = b/(b + w)$$
and denote by $X_n$ the proportion of black balls in the urn after the $n$th draw. Here we want to show that the sequence of random variables $\{X_n\}_0^\infty$ represents a martingale.
To this end, set $Y_0 = 1$ and, for $n \geq 1$, define $Y_n$ as follows:
$$Y_n = \begin{cases} 1 & \text{if the } n\text{th ball drawn is black} \\ 0 & \text{if the } n\text{th ball drawn is white.} \end{cases}$$
Let $b_n$ and $w_n$ be the number of black and white balls, respectively, in the urn after the $n$th draw ($w_0 = w$, $b_0 = b$). Then, clearly,
$$X_n = b_n/(b_n + w_n), \quad n = 1, \dots,$$
and
$$b_{n+1} = b_n + c\,Y_{n+1}, \qquad w_{n+1} = w_n + c\,(1 - Y_{n+1}).$$
Next, we clearly have that
$$P\{Y_{n+1} = 1 \mid Y_0, \dots, Y_n\} = b_n/(b_n + w_n) = X_n,$$
so that (a.s.)
$$E\{X_{n+1} \mid Y_0, \dots, Y_n\} = E\Bigl\{\frac{b_{n+1}}{b_{n+1} + w_{n+1}} \Bigm| Y_0, \dots, Y_n\Bigr\} = E\Bigl\{\frac{b_n + c\,Y_{n+1}}{b_n + w_n + c} \Bigm| Y_0, \dots, Y_n\Bigr\}$$
$$= \frac{b_n}{b_n + w_n + c} + \frac{c\,P\{Y_{n+1} = 1 \mid Y_0, \dots, Y_n\}}{b_n + w_n + c} = \frac{b_n + c\,X_n}{b_n + w_n + c} = \frac{(b_n + w_n)X_n + c\,X_n}{b_n + w_n + c} = X_n.$$
Because
$$\sigma\{X_0, \dots, X_n\} \subset \sigma\{Y_0, \dots, Y_n\},$$
we see that
$$E\{X_{n+1} \mid X_0, \dots, X_n\} = E\bigl(E\{X_{n+1} \mid Y_0, \dots, Y_n\} \bigm| X_0, \dots, X_n\bigr) = E\{X_n \mid X_0, \dots, X_n\} = X_n \quad \text{(a.s.)},$$
which proves the assertion.
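A simulation sketch of the urn (the parameters $b = w = c = 1$ are an assumed choice): since $\{X_n\}$ is a martingale, the average of $X_n$ over many independent runs stays at $X_0 = b/(b + w)$.

    import numpy as np

    rng = np.random.default_rng(4)
    paths, draws, b0, w0, c = 50_000, 100, 1, 1, 1
    black = np.full(paths, float(b0))
    white = np.full(paths, float(w0))
    for _ in range(draws):
        drew_black = rng.random(paths) < black / (black + white)
        black += c * drew_black     # black draw: return it with c black balls
        white += c * ~drew_black    # white draw: return it with c white balls
    X = black / (black + white)     # proportion of black balls after the draws
    print(X.mean())                 # close to X_0 = b0/(b0 + w0) = 0.5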
10.4. The Upcrossing Inequality
The proof of the fundamental convergence theorem for submartingales, which
will be given in the next section, depends on a result which is known as the
"upcrossing inequality." Its proof, somewhat combinatorial in nature, will be
presented here.
We begin by briefly discussing the convergence of numerical sequences. Let $\{\alpha_n\}_1^\infty$ be a sequence of real numbers. From the definition of convergence, it is clear that $\{\alpha_n\}_1^\infty$ has a limit as $n \to \infty$ if and only if, for any two rational numbers $-\infty < r_0 < r_1 < \infty$, the sequence passes from below $r_0$ to above $r_1$ at most a finite number of times. In other words, if the sequence converges, the number of "upcrossings" of any such interval $[r_0, r_1]$ is finite. If, on the other hand, such an interval can be found where the number of upcrossings by $\{\alpha_n\}_1^\infty$ is infinite, the sequence diverges. As a matter of fact, if $\beta(r_0, r_1)$ represents the number of upcrossings of $[r_0, r_1]$ by $\{\alpha_n\}_1^\infty$,
$$\varliminf_n \alpha_n \leq r_0 < r_1 \leq \varlimsup_n \alpha_n \iff \beta(r_0, r_1) = \infty.$$
From this, we deduce that the sequence $\{\alpha_n\}_1^\infty$ converges if and only if $\beta(r_0, r_1) < \infty$ for any two rational numbers $-\infty < r_0 < r_1 < \infty$.
Similarly, a sequence of real random variables $\{\xi_n\}_1^\infty$ has a limit (finite or infinite) with probability 1 if and only if the number of upcrossings of any $[r_0, r_1]$ is finite with probability 1.
In this section we will establish an upper bound for the number of upcrossings of submartingales. Let $-\infty < a < b < +\infty$ be two numbers and $\{(\xi_n, \mathscr{F}_n)\}_1^\infty$ a stochastic sequence. Set
$$\tau_0 = 0, \qquad \tau_1 = \inf\{n > 0;\ \xi_n \leq a\}, \qquad \tau_2 = \inf\{n > \tau_1;\ \xi_n \geq b\},$$
$$\vdots$$
$$\tau_{2k-1} = \inf\{n > \tau_{2k-2};\ \xi_n \leq a\}, \qquad \tau_{2k} = \inf\{n > \tau_{2k-1};\ \xi_n \geq b\}, \qquad (10.4.1)$$
taking $\tau_\nu = \infty$ if the corresponding set $\{\ \}$ is empty.
Next, for every $n \geq 1$, define
$$\beta_n(a, b) = \max\{i;\ \tau_{2i} \leq n\} \qquad (10.4.2)$$
and $\beta_n(a, b) = 0$ if $\tau_2 > n$. In other words, the random variable $\beta_n(a, b)$ represents the number of upcrossings of the interval $[a, b]$ by the sequence $\xi_1, \dots, \xi_n$ (see Figure 10.1). Clearly, $\beta_n$ takes the values $0, 1, \dots, [n/2]$, where $[x]$ represents the integer part of $x$. The following proposition is due to Doob.
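The stopping times in (10.4.1) translate directly into a small counting routine; the following Python sketch (the symmetric random-walk test path is an assumed example) scans a sequence and counts completed passages from level $a$ up to level $b$.

    import numpy as np

    def upcrossings(x, a, b):
        # Wait alternately for a value <= a (a time tau_{2k-1}) and then a
        # value >= b (a time tau_{2k}); each completed pair is one upcrossing.
        count, below = 0, False
        for v in x:
            if not below and v <= a:
                below = True
            elif below and v >= b:
                below = False
                count += 1
        return count

    rng = np.random.default_rng(5)
    path = np.cumsum(rng.choice([-1, 1], size=1000))   # symmetric random walk
    print(upcrossings(path, a=-2, b=2))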
Proposition 10.4.1. Let $\{(\xi_n, \mathscr{F}_n)\}_1^\infty$ be a submartingale. Then, for every $n \geq 1$,
$$E\{\beta_n(a, b)\} \leq E(\xi_n - a)^+/(b - a). \qquad (10.4.3)$$

[Figure 10.1. A sample path crossing the levels $a$ and $b$, with the times $\tau_1, \dots, \tau_5$ marked on the horizontal axis.]

PROOF. From Corollary 10.2.1, we know that the sequence
$$\{((\xi_n - a)^+, \mathscr{F}_n)\}_1^\infty \qquad (10.4.4)$$
is a non-negative submartingale. In addition, the number of upcrossings of the interval $[0, b - a]$ by
$$(\xi_1 - a)^+, \dots, (\xi_n - a)^+$$
is again $\beta_n(a, b)$. Therefore, it suffices to consider a non-negative submartingale $\{(X_n, \mathscr{F}_n)\}_1^\infty$, $X_n \geq 0$, with $a = 0$ and to show that
$$E\{\beta_n(0, b)\} \leq E\{X_n\}/b.$$
Set $X_0 = 0$, $\mathscr{F}_0 = \{\emptyset, \Omega\}$, and, for $i \geq 1$, write
$$\varepsilon_i = \begin{cases} 1 & \text{if } \tau_k < i \leq \tau_{k+1} \text{ for some odd } k, \\ 0 & \text{if } \tau_k < i \leq \tau_{k+1} \text{ for some even } k. \end{cases}$$
In other words [see (10.4.1)], $\varepsilon_i = 1$ if the largest $\tau_k$ to the left of $i$ is a downcrossing and $\tau_{k+1}$ is an upcrossing, and $\varepsilon_i = 0$ if the largest $\tau_k$ to the left of $i$ is an upcrossing and $\tau_{k+1}$ is a downcrossing. Clearly, then, $\varepsilon_1 = 0$ because $1 \leq \tau_1$ (as a matter of fact, $\varepsilon_i = 0$ for all $i \leq \tau_1$). Now, we have
$$b\,\beta_n(0, b) \leq \sum_{i=1}^n (X_i - X_{i-1})\,\varepsilon_i$$
and, because $\varepsilon_i$ is determined by $X_1, \dots, X_{i-1}$,
$$\{\varepsilon_i = 1\} \in \mathscr{F}_{i-1}.$$
Consequently,
$$b\,E\{\beta_n(0, b)\} \leq \sum_{i=1}^n E\{(X_i - X_{i-1})\varepsilon_i\} = \sum_{i=1}^n \int_{\{\varepsilon_i = 1\}} \bigl(E\{X_i \mid \mathscr{F}_{i-1}\} - X_{i-1}\bigr)\,dP$$
$$\leq \sum_{i=1}^n \int \bigl(E\{X_i \mid \mathscr{F}_{i-1}\} - X_{i-1}\bigr)\,dP \leq \int E\{X_n \mid \mathscr{F}_{n-1}\}\,dP = E\{X_n\}.$$
This proves the assertion. $\square$
10.5. Convergence of Submartingales
In this section we will prove a theorem of Doob concerning convergence of
submartingales. It represents one of the fundamental results in the theory of
martingales and can be considered an analog of the well-known fact in real
analysis that every bounded monotone sequence has a finite limit.
Proposition 10.5.1 (Doob). Let $\{(X_n, \mathscr{F}_n)\}_1^\infty$ be a submartingale such that
$$\sup_n E|X_n| < \infty. \qquad (10.5.1)$$
Then, there exists a random variable $X_\infty$ such that
$$X_n \to X_\infty \quad \text{(a.s.)} \qquad (10.5.2)$$
and $E|X_\infty| < \infty$.

PROOF. Assume that the submartingale does not converge; then,
$$P\{\varlimsup X_n > \varliminf X_n\} > 0. \qquad (10.5.3)$$
Because
$$\{\varlimsup X_n > \varliminf X_n\} = \bigcup_{\alpha < \beta} \{\varlimsup X_n > \beta > \alpha > \varliminf X_n\}$$
for all rationals $\alpha < \beta$, there exist two rational numbers $r_0 < r_1$ such that
$$P\{\varlimsup X_n > r_1 > r_0 > \varliminf X_n\} > 0.$$
In other words, with positive probability, there is an infinite number of upcrossings of $[r_0, r_1]$.
Denote by $\beta_n(r_0, r_1)$ the number of upcrossings of $[r_0, r_1]$ by $X_1, \dots, X_n$ and write
$$\beta(r_0, r_1) = \lim_{n \to \infty} \beta_n(r_0, r_1).$$
By (10.4.3),
$$E\{\beta_n(r_0, r_1)\} \leq E(X_n - r_0)^+/(r_1 - r_0) \leq \bigl(E(X_n^+) + |r_0|\bigr)/(r_1 - r_0),$$
and, therefore,
$$E\{\beta(r_0, r_1)\} \leq \Bigl(\sup_n E(X_n^+) + |r_0|\Bigr)\Big/(r_1 - r_0).$$
From this (bearing in mind that $\sup_n E|X_n| < \infty \iff \sup_n E(X_n^+) < \infty$), we deduce that
$$E\{\beta(r_0, r_1)\} < \infty,$$
which contradicts (10.5.3). Consequently, (10.5.2) holds and by Fatou's lemma
$$E|X_\infty| \leq \sup_n E|X_n| < \infty,$$
so the assertion follows. $\square$
Corollary 10.5.1. Let $\{(X_n, \mathscr{F}_n)\}_1^\infty$ be a non-negative martingale; then $\lim X_n$ exists (a.s.). Indeed,
$$\sup_n E|X_n| = \sup_n E X_n = E X_1 < \infty,$$
and Proposition 10.5.1 is applicable.
Remark 10.5.1. Proposition 10.5.1 also holds for martingales and supermartingales because every martingale is a submartingale and because $\{(-X_n, \mathscr{F}_n)\}_1^\infty$ is a submartingale if $\{(X_n, \mathscr{F}_n)\}_1^\infty$ is a supermartingale. For non-negative martingales or supermartingales, assumption (10.5.1) always holds.
Remark 10.5.2. Condition (10.5.1) does not guarantee the convergence of $X_n$ to $X_\infty$ in the mean, as the following counterexample shows. Let $\{\xi_n\}_1^\infty$ be an i.i.d. sequence of random variables with the common distribution
$$P\{\xi_i = 0\} = \tfrac{1}{2}, \qquad P\{\xi_i = 2\} = \tfrac{1}{2}.$$
Consider
$$X_n = \prod_{i=1}^n \xi_i, \quad n = 1, 2, \dots;$$
then, $\{(X_n, \mathscr{F}_n)\}_1^\infty$ is clearly a martingale with $E X_n = 1$. Thus,
$$X_n \to X_\infty \quad \text{(a.s.)},$$
where $X_\infty = 0$ (a.s.). On the other hand,
$$E|X_n - X_\infty| = E X_n = 1 \quad \text{for all } n = 1, 2, \dots.$$
However, if assumption (10.5.1) is strengthened to uniform integrability of the sequence $\{X_n\}_1^\infty$, then we have, simultaneously,
$$X_n \to X_\infty \ \text{(a.s.)} \quad \text{and} \quad X_n \to X_\infty \ \text{in the mean}.$$
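The counterexample is easy to reproduce numerically; in the minimal sketch below the horizon $n = 10$ is chosen so that the rare surviving paths, each of value $2^{10}$, still appear in a sample of $10^5$ runs.

    import numpy as np

    # X_n = xi_1 ... xi_n with P(xi = 0) = P(xi = 2) = 1/2: almost every path
    # is absorbed at 0, yet E{X_n} = 1 for every n.
    rng = np.random.default_rng(6)
    paths, n = 100_000, 10
    X = np.cumprod(rng.choice([0.0, 2.0], size=(paths, n)), axis=1)
    print((X[:, -1] == 0).mean())   # fraction absorbed: about 1 - 2**(-10)
    print(X[:, -1].mean())          # sample mean of X_10: still about 1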
Definition 10.5.1. A sequence of random variables $\{\xi_n\}_1^\infty$ is uniformly integrable if
$$\lim_{c \to +\infty} \Bigl(\sup_n \int_{\{|\xi_n| > c\}} |\xi_n|\,dP\Bigr) = 0. \qquad (10.5.4)$$
Let us give some criteria for uniform integrability. The simplest one is the following: if there exists a random variable $U$ such that $|\xi_n| \leq U$ and $E U < \infty$, then $\{\xi_n\}_1^\infty$ is a uniformly integrable sequence. More useful is the following result.

Proposition 10.5.2. A sequence of random variables $\{\xi_n\}_1^\infty$ is uniformly integrable if and only if:
(i) $\sup_n E|\xi_n| < \infty$;
(ii) for every $\varepsilon > 0$, there exists $\delta > 0$ such that, for any $B \in \mathscr{B}$ with $P(B) < \delta$,
$$\sup_n \int_B |\xi_n|\,dP < \varepsilon.$$
PROOF. (Necessity). If the sequence is uniformly integrable, then, for any $\varepsilon > 0$, there exists $c > 0$ such that
$$\sup_n E|\xi_n| \leq \sup_n E\{|\xi_n| I_{\{|\xi_n| \geq c\}}\} + \sup_n E\{|\xi_n| I_{\{|\xi_n| < c\}}\} \leq \varepsilon + c,$$
which proves (i). On the other hand,
$$E\{|\xi_n| I_B\} = E\{|\xi_n| I_{B \cap \{|\xi_n| \geq c\}}\} + E\{|\xi_n| I_{B \cap \{|\xi_n| < c\}}\} \leq E\{|\xi_n| I_{\{|\xi_n| \geq c\}}\} + c\,P(B).$$
Now, take $c$ large enough so that $\sup_n E\{|\xi_n| I_{\{|\xi_n| \geq c\}}\} \leq \varepsilon/2$; then, if $P(B) < \varepsilon/2c$, we obtain
$$\sup_n E\{|\xi_n| I_B\} = \sup_n \int_B |\xi_n|\,dP < \varepsilon.$$
(Sufficiency). Take $\varepsilon > 0$ and $\delta > 0$ so that $P(B) < \delta$ implies $\sup_n E\{|\xi_n| I_B\} < \varepsilon$. Because
$$E|\xi_n| \geq E\{|\xi_n| I_{\{|\xi_n| \geq c\}}\} \geq c\,P\{|\xi_n| \geq c\},$$
using Markov's inequality we have
$$\sup_n P\{|\xi_n| \geq c\} \leq \frac{1}{c} \sup_n E|\xi_n| \to 0 \quad \text{as } c \to \infty.$$
Consequently, for large $c$, any set $\{|\xi_n| \geq c\}$ can be taken as $B$. This implies that
$$\int_{\{|\xi_n| \geq c\}} |\xi_n|\,dP < \varepsilon,$$
which completes the proof of the proposition. $\square$
Proposition 10.5.3. Let $\{\xi_n\}_1^\infty$ be a uniformly integrable family; then
$$E\{\varliminf \xi_n\} \leq \varliminf E\{\xi_n\} \leq \varlimsup E\{\xi_n\} \leq E\{\varlimsup \xi_n\}. \qquad (10.5.5)$$

PROOF. Assume $x > 0$ and write
$$E\{\xi_n\} = E\{\xi_n I_{\{\xi_n \leq -x\}}\} + E\{\xi_n I_{\{\xi_n > -x\}}\}. \qquad (10.5.6)$$
Given $\varepsilon > 0$, due to uniform integrability there is $x$ sufficiently large so that
$$\sup_n |E\{\xi_n I_{\{\xi_n \leq -x\}}\}| < \varepsilon. \qquad (10.5.7)$$
On the other hand, by Fatou's lemma,
$$\varliminf E\{\xi_n I_{\{\xi_n > -x\}}\} \geq E\{\varliminf \xi_n I_{\{\xi_n > -x\}}\}.$$
But $\xi_n I_{\{\xi_n > -x\}} \geq \xi_n$, which yields
$$\varliminf E\{\xi_n I_{\{\xi_n > -x\}}\} \geq E\{\varliminf \xi_n\}. \qquad (10.5.8)$$
Finally, (10.5.6), (10.5.7), and (10.5.8) give
$$\varliminf E\{\xi_n\} \geq \varliminf E\{\xi_n I_{\{\xi_n > -x\}}\} - \varepsilon \geq E\{\varliminf \xi_n\} - \varepsilon.$$
Because $\varepsilon > 0$ is arbitrary, the first half of (10.5.5) follows. The second half can be proved similarly. $\square$
Corollary 10.5.2. If, in the last proposition, we have that $\xi_n \to \xi$ (a.s.), then $\xi$ is integrable,
$$E\xi_n \to E\xi \quad \text{and} \quad E|\xi_n - \xi| \to 0 \quad \text{as } n \to \infty.$$
Indeed, in such a case we have
$$E\{\xi\} \leq \varliminf E\{\xi_n\} \leq \varlimsup E\{\xi_n\} \leq E\{\xi\}.$$
This shows that $E\{\xi\}$ is finite and that
$$\lim_{n \to \infty} E\{\xi_n\} = E\{\xi\}.$$
Finally, $|\xi_n - \xi| \to 0$ (a.s.) and, because $|\xi_n - \xi| \leq |\xi_n| + |\xi|$, $\{|\xi_n - \xi|\}_1^\infty$ is a uniformly integrable family, so that $E|\xi_n - \xi| \to 0$.
Another simple criterion for uniform integrability is given by the following result.

Proposition 10.5.4. Let $\{\xi_n\}_1^\infty \subset L_1\{\Omega, \mathscr{B}, P\}$ and let $h(t) \geq 0$ be a nondecreasing function, defined for all $t \geq 0$, such that
$$\text{(i)} \ \lim_{t \to \infty} h(t)/t = +\infty, \qquad \text{(ii)} \ \sup_n E\{h(|\xi_n|)\} < \infty; \qquad (10.5.9)$$
then $\{\xi_n\}_1^\infty$ is uniformly integrable.

PROOF. Let $\varepsilon > 0$, $Q = \sup_n E\{h(|\xi_n|)\}$, and $\alpha = Q/\varepsilon$. Take $x > 0$ so large that $h(t)/t \geq \alpha$ for $t \geq x$. Then
$$\int_{\{|\xi_n| \geq x\}} |\xi_n|\,dP \leq \frac{1}{\alpha}\int_{\{|\xi_n| \geq x\}} h(|\xi_n|)\,dP \leq Q/\alpha = \varepsilon$$
uniformly for $n \geq 1$. $\square$
10.6. Uniformly Integrable Martingales
In this section we discuss some basic features of uniformly integrable
martingales.
Proposition 10.6.1. Let $\{(X_n, \mathscr{F}_n)\}_1^\infty$ be a uniformly integrable submartingale. Then, there exists a random variable $X_\infty$ with $E|X_\infty| < \infty$ such that
(i) $X_n \to X_\infty$ (a.s.),
(ii) $E|X_n - X_\infty| \to 0$ as $n \to \infty$, \qquad (10.6.1)
and $\{X_1, X_2, \dots, X_\infty;\ \mathscr{F}_1, \mathscr{F}_2, \dots, \mathscr{F}_\infty\}$, where $\mathscr{F}_\infty = \sigma\{\bigcup_n \mathscr{F}_n\}$, is also a submartingale.

PROOF. Because the submartingale is uniformly integrable, condition (10.5.1) holds and, consequently, $X_n \to X_\infty$ (a.s.). From Corollary 10.5.2, we conclude that (10.6.1.ii) also holds. Finally, consider $A \in \mathscr{F}_n$ and let $N \geq n$; then
$$\int_A |X_N - X_\infty|\,dP \to 0 \quad \text{as } N \to \infty.$$
Hence,
$$\int_A X_N\,dP \to \int_A X_\infty\,dP \quad \text{as } N \to \infty.$$
The sequence $\{E\{X_N I_A\}\}$ is nondecreasing because
$$E\{X_N \mid \mathscr{F}_{N-1}\} \geq X_{N-1} \implies \int_A X_N\,dP \geq \int_A X_{N-1}\,dP.$$
Therefore,
$$\int_A X_n\,dP \leq \int_A X_\infty\,dP \quad \text{for all } A \in \mathscr{F}_n,$$
which implies that
$$X_n \leq E\{X_\infty \mid \mathscr{F}_n\} \quad \text{(a.s.)}.$$
This completes the proof of the proposition. $\square$
Corollary 10.6.1. Let $\{(X_n, \mathscr{F}_n)\}_1^\infty$ be a submartingale such that, for some $\gamma > 1$,
$$\sup_n E|X_n|^\gamma < \infty. \qquad (10.6.2)$$
Then there is an integrable random variable $X_\infty$ such that (10.6.1) holds.
The proof follows from the fact that condition (10.6.2) guarantees the uniform integrability of $\{X_n\}_1^\infty$ (take $h(t) = t^\gamma$ in Proposition 10.5.4).
The following result, due to Lévy, is concerned with the continuity property of conditional expectation.

Proposition 10.6.2. Let $\{\Omega, \mathscr{B}, P\}$ be a probability space and $\{\mathscr{F}_n\}_1^\infty$ a filtration, i.e., $\mathscr{F}_1 \subset \mathscr{F}_2 \subset \cdots \subset \mathscr{B}$. Let $\xi \in L_1\{\Omega, \mathscr{B}, P\}$ and set $\mathscr{F}_\infty = \sigma\{\bigcup_{n=1}^\infty \mathscr{F}_n\}$. Then
$$E\{\xi \mid \mathscr{F}_n\} \to E\{\xi \mid \mathscr{F}_\infty\} \qquad (10.6.3)$$
(a.s.) and in the mean.

PROOF. Set
$$X_n = E\{\xi \mid \mathscr{F}_n\};$$
then, clearly, $\{X_n\}_1^\infty$ is a martingale. For any $a > 0$ and $b > 0$,
$$\int_{\{|X_n| \geq a\}} |X_n|\,dP \leq \int_{\{|X_n| \geq a\}} E\{|\xi| \mid \mathscr{F}_n\}\,dP = \int_{\{|X_n| \geq a\}} |\xi|\,dP$$
$$= \int_{\{|X_n| \geq a,\, |\xi| \leq b\}} |\xi|\,dP + \int_{\{|X_n| \geq a,\, |\xi| > b\}} |\xi|\,dP \leq b\,P\{|X_n| \geq a\} + \int_{\{|\xi| > b\}} |\xi|\,dP$$
$$\leq b\,E\{|X_n|\}/a + \int_{\{|\xi| > b\}} |\xi|\,dP \leq b\,E\{|\xi|\}/a + \int_{\{|\xi| > b\}} |\xi|\,dP.$$
By letting $a \to +\infty$ and then $b \to +\infty$, we obtain
$$\lim_{a \to \infty} \sup_n \int_{\{|X_n| \geq a\}} |X_n|\,dP = 0.$$
This shows that the sequence $\{X_n\}_1^\infty$ is uniformly integrable. Therefore, by Proposition 10.6.1, there exists a r.v. $X_\infty$ such that $X_n \to X_\infty$ (a.s.) and in the mean. What remains to be shown is that
$$X_\infty = E\{\xi \mid \mathscr{F}_\infty\} \quad \text{(a.s.)}.$$
Let $N \geq n$ and $A \in \mathscr{F}_n$; then,
$$\int_A X_N\,dP = \int_A \xi\,dP.$$
Because the sequence $\{X_n\}_1^\infty$ is uniformly integrable and because
$$\int_A |X_N - X_\infty|\,dP \to 0 \quad \text{as } N \to \infty,$$
it follows that
$$\int_A X_\infty\,dP = \int_A \xi\,dP. \qquad (10.6.4)$$
This equation holds for all $A \in \mathscr{F}_n$ and, therefore, for all $A \in \bigcup_1^\infty \mathscr{F}_n$. Because $E|X_\infty| < \infty$ and $E|\xi| < \infty$, the integrals in (10.6.4) are $\sigma$-additive measures agreeing on the algebra $\bigcup_1^\infty \mathscr{F}_n$. Because of the uniqueness of their extensions to $\sigma\{\bigcup_1^\infty \mathscr{F}_n\} = \mathscr{F}_\infty$, Equation (10.6.4) remains valid for all $A \in \mathscr{F}_\infty$. Thus, for all $A \in \mathscr{F}_\infty$,
$$\int_A X_\infty\,dP = \int_A E\{\xi \mid \mathscr{F}_\infty\}\,dP,$$
which implies that
$$X_\infty = E\{\xi \mid \mathscr{F}_\infty\} \quad \text{(a.s.)}.$$
This completes the proof of the assertion. $\square$
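A concrete sketch of this convergence (the uniform variable and the dyadic filtration are an assumed setting): let $\xi$ be uniform on $[0, 1]$ and let $\mathscr{F}_n$ be generated by its first $n$ binary digits; then $E\{\xi \mid \mathscr{F}_n\}$ is the midpoint of the dyadic interval of length $2^{-n}$ containing $\xi$, so the error is at most $2^{-n-1}$.

    import numpy as np

    rng = np.random.default_rng(8)
    xi = rng.random(100_000)                     # xi ~ Uniform(0, 1)
    for n in (2, 8, 20):
        # E{xi | F_n} = midpoint of the dyadic interval containing xi
        cond_exp = (np.floor(xi * 2 ** n) + 0.5) / 2 ** n
        print(n, np.abs(cond_exp - xi).max())    # decreases like 2**(-n-1)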
The following example shows that a branching process has the martingale
structure.
EXAMPLE 10.6.1. Let
$$X_1^{(1)}, X_2^{(1)}, \dots$$
$$X_1^{(2)}, X_2^{(2)}, \dots$$
$$\vdots$$
be sequences of i.i.d. non-negative integer-valued random variables such that
$$E\{X_i^{(n)}\} = \mu.$$
Set
$$T_0 = 1, \qquad T_1 = X_1^{(1)},$$
and, for $n \geq 2$,
$$T_n = \sum_{i=1}^{T_{n-1}} X_i^{(n)} \qquad \Bigl(\sum_{i=1}^{0}(\cdot) = 0\Bigr). \qquad (10.6.5)$$
Note that $\{T_k\}_0^{n-1}$ is independent of $\{X_i^{(n)}\}_1^\infty$. Consequently,
$$E\{T_n\} = E\{T_{n-1}\}\,\mu = \mu^n.$$

Definition 10.6.1. The sequence of random variables $\{T_n\}_1^\infty$ is called a "Galton-Watson branching process."

Here, $T_n$ can be thought of as the size of a population at the $n$th generation, with each individual independently giving birth to a random number of offspring to produce the population in the next generation.
Set
$$U_n = T_n/\mu^n, \quad n \geq 1;$$
then, $\{U_n\}_1^\infty$ represents a martingale. To show this, consider
$$E\{U_n \mid U_1, \dots, U_{n-1}\} = E\Bigl\{\sum_{i=1}^{T_{n-1}} X_i^{(n)} \Bigm| U_1, \dots, U_{n-1}\Bigr\}\Big/\mu^n = \sum_{k=1}^\infty E\Bigl\{I_{\{T_{n-1}=k\}}\sum_{i=1}^{k} X_i^{(n)} \Bigm| U_1, \dots, U_{n-1}\Bigr\}\Big/\mu^n$$
$$= \sum_{k=1}^\infty k\,I_{\{T_{n-1}=k\}}\big/\mu^{n-1} = T_{n-1}\big/\mu^{n-1} = U_{n-1},$$
which proves the claim.
Next, because
$$\sup_n E|U_n| = \sup_n E\{U_n\} = 1,$$
Proposition 10.5.1 implies that there is a random variable $U_\infty$ such that
$$U_n \to U_\infty \quad \text{(a.s.)}.$$
In other words,
$$T_n/\mu^n \to U_\infty \quad \text{(a.s.)},$$
which implies that
$$T_n \to 0 \ \text{(a.s.)} \ \text{if } \mu < 1,$$
while, if $\mu > 1$, $T_n \to \infty$ on the event $\{U_\infty > 0\}$.
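A simulation sketch of the branching process, assuming Poisson($\mu$) offspring (any offspring law with mean $\mu$ would serve); it exploits the fact that a sum of $T$ independent Poisson($\mu$) counts is Poisson($\mu T$). The printed average of $U_n = T_n/\mu^n$ stays near 1, as the martingale property requires, while a fixed fraction of runs goes extinct.

    import numpy as np

    rng = np.random.default_rng(7)
    mu, paths, generations = 1.5, 20_000, 25
    T = np.ones(paths, dtype=np.int64)       # T_0 = 1 in every run
    for _ in range(generations):
        T = rng.poisson(lam=mu * T)          # next generation's sizes
    print((T == 0).mean())                   # extinction fraction (< 1 since mu > 1)
    print((T / mu ** generations).mean())    # E{U_n} = 1 for every n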
Problems and Complements
10.1. Let $\{\Omega, \mathscr{B}, P\}$ be a probability space and $\mathscr{F}_1$ and $\mathscr{F}_2$ two independent sub-$\sigma$-algebras of $\mathscr{B}$. If $A \in \mathscr{F}_1 \cap \mathscr{F}_2$, show that $P(A) = 0$ or $1$.
10.2. (Continuation) If $G \subset \Omega$ is an arbitrary subset, write
$$\mathscr{B}_G = \{G \cap B;\ B \in \mathscr{B}\}.$$
(i) Show that $\mathscr{B}_G$ is a $\sigma$-algebra on $G$ (the so-called $\sigma$-algebra induced on $G$ by $\mathscr{B}$).
(ii) Let $\mathscr{S}$ be a family of subsets of $\Omega$. Show that
$$\sigma\{\mathscr{S}_G\} = \sigma\{\mathscr{S}\} \cap G.$$
10.3. Show that there does not exist a $\sigma$-algebra having a countably infinite number of elements.
10.4. Let $X_1, \dots, X_n$ be $n$ independent Poisson random variables with parameters $\lambda_1, \dots, \lambda_n$, respectively. Determine:
(i) the conditional distribution of $(X_1, \dots, X_{n-1})$ given that $X_1 + \cdots + X_n = N$;
(ii) $E\{X_1 \mid X_1 + X_2\}$.
10.5. Let $X \in L_1\{\Omega, \mathscr{B}, P\}$ and $\{B_k\}_1^\infty \subset \mathscr{B}$ be a partition of $\Omega$. Let $\mathscr{F}$ be the $\sigma$-algebra generated by $\{B_k\}_1^\infty$. Show that
$$E\{X \mid \mathscr{F}\} = \sum_{k=1}^\infty \alpha_k I_{B_k} \quad \text{(a.s.)},$$
where $\alpha_k = E\{X \mid B_k\}$.
10.6. (Continuation) If $\mathscr{F}_1$ and $\mathscr{F}_2$ are two sub-$\sigma$-algebras of $\mathscr{B}$ such that $\sigma\{X\} \vee \mathscr{F}_1$ is independent of $\mathscr{F}_2$, show that
$$E\{X \mid \mathscr{F}_1 \vee \mathscr{F}_2\} = E\{X \mid \mathscr{F}_1\} \quad \text{(a.s.)}.$$
10.7. Let $\{X_i\}_1^\infty \subset L_1\{\Omega, \mathscr{B}, P\}$ be an i.i.d. sequence of random variables. Set $S_n = X_1 + \cdots + X_n$ and show that
$$E\{X_1 \mid S_n, S_{n+1}, \dots\} = S_n/n \quad \text{(a.s.)}.$$
10.8. Let $X, Y \in L_1\{\Omega, \mathscr{B}, P\}$ and let $\mathscr{F}_1$ and $\mathscr{F}_2$ be two independent sub-$\sigma$-algebras. Show that $E\{X \mid \mathscr{F}_1\}$ and $E\{Y \mid \mathscr{F}_2\}$ are independent random variables.
10.9. (Continuation) Show that
$$E\{X \mid \mathscr{F}_1 \cap \mathscr{F}_2\} = E(X).$$
10.10. Let $\{X_i\}_0^\infty$ be a sequence of independent random variables such that $E(X_i) = 1$, $i = 0, 1, \dots$. Show that the sequence $\{\xi_n\}_0^\infty$ is a martingale, where
$$\xi_n = \prod_{i=0}^n X_i.$$
10.11. Let $\{X_i\}_1^\infty$ be a sequence of independent random variables with $E(X_i) = 0$ and $\{Y_j\}_1^\infty$ another sequence of random variables such that, for each $n$, the family $\{X_n, X_{n+1}, \dots\}$ is independent of $\{Y_1, \dots, Y_n\}$. Show that the sequence $\{Z_n\}_1^\infty$ is a martingale, where
$$Z_n = \sum_{k=1}^n X_k Y_k \quad \text{if } X_k Y_k \in L_1\{\Omega, \mathscr{B}, P\}, \quad k = 1, 2, \dots.$$
10.12. Let $\{\xi_n\}_0^\infty$ be a martingale and $\varphi(\cdot)$ a convex function such that $\varphi(\xi_n) \in L_1\{\Omega, \mathscr{B}, P\}$, $n = 0, 1, \dots$. Show that the sequence $\{(\varphi(\xi_n), \mathscr{F}_n)\}$, where $\mathscr{F}_n = \sigma\{\xi_0, \dots, \xi_n\}$, is a submartingale.
10.13. Let $\{\mathscr{F}_k\}_1^\infty$ be a nondecreasing sequence of $\sigma$-algebras and $\mathscr{F} = \sigma\{\bigcup_{k=1}^\infty \mathscr{F}_k\}$. Let $\xi$ be an $\mathscr{F}$-measurable random variable having finite expectation. Show that
$$E\{\xi \mid \mathscr{F}_n\} \to \xi \quad \text{(a.s.)}.$$
10.14. Let $\{\xi_n\}_1^\infty$ be a uniformly integrable martingale with $\xi_n \geq 0$ and $\xi_n \to \xi$ (a.s.). Show that, for all $n = 1, 2, \dots$,
$$\xi_n = E\{\xi \mid \xi_1, \dots, \xi_n\}.$$
10.15. Let $\{X_n\}_1^\infty$ be an i.i.d. sequence of random variables with $E(X_1) = 0$, $E(X_1^2) = 1$, and $\varphi(\lambda) = E(\exp\{\lambda X_1\}) < \infty$ for $|\lambda| < \varepsilon$. Set $S_n = X_1 + \cdots + X_n$. Show that, as $\lambda \to 0$,
$$\varphi(\lambda) = 1 + \lambda^2/2 + o(\lambda^2).$$

10.16. Let $h(t) = (2t \log\log t)^{1/2}$. Show that
$$\lim_{t \to \infty} h(t)/t = 0.$$
10.17. Let $\lambda \in (-\varepsilon, \varepsilon)$; show that the sequence $\{Y_n\}_1^\infty$ is a martingale, where
$$Y_n = \exp\{\lambda S_n\}\big/(\varphi(\lambda))^n.$$
10.18. Let $\{Z_k\}_1^\infty$ be a sequence of i.i.d. random variables with support in $(0, 1]$ and $\alpha_i = E(Z_1^i) < 1$, $i = 1, 2$. Show that $\{L_n\}_1^\infty$ is a martingale, where $L_n = (\prod_1^n Z_i)/\alpha_1^n$. Find $\lim_{n \to \infty} E\{L_n^2\}$ and show that $L_n \to 0$ (a.s.).
10.19. (Continuation) Let $\{Y_k\}_1^\infty \subset L_2\{\Omega, \mathscr{B}, P\}$ be an i.i.d. sequence of strictly positive random variables independent of $\{Z_k\}_1^\infty$ and set
$$X_k = Y_k \prod_{i=1}^k Z_i, \qquad S_n = X_1 + \cdots + X_n.$$
Show that $S_n \to S$ (a.s.), where $E\{S\} = \alpha_1 \eta_1/(1 - \alpha_1)$ and $\eta_1 = E\{Y_1\}$. Show also that $S_n \stackrel{d}{=} Z_1(Y_1 + S_{n-1})$.
10.20. (Continuation) Denote by $G_n(x) = P\{S_n \leq x\}$ and $G(x) = P\{S \leq x\}$, and show that
$$G(x) = \int_0^1 \biggl[\int Q\Bigl(\frac{x}{t} - s\Bigr) g(s)\,ds\biggr] h(t)\,dt,$$
where $Q(y) = P\{Y \leq y\}$, $g$ is the probability density of $S$, and $h$ is the probability density of $Z_1$.
Bibliography
Chapter 1
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Elliott, R.J. (1982). Stochastic Calculus and Applications. Springer-Verlag, New York.
Gihman, I.I. and Skorohod, A.V. (1969). Introduction to the Theory of Random Processes. Saunders, Philadelphia.
Halmos, P.R. (1950). Measure Theory. Van Nostrand, New York.
Kolmogorov, A.N. (1941). Über das logarithmisch normale Verteilungsgesetz der Dimensionen der Teilchen bei Zerstückelung. Dokl. Akad. Nauk SSSR 31, 99-101.
Loeve, M. (1977). Probability Theory I. Springer-Verlag, New York.
Neveu, J. (1965). Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.
Prohorov, Y.V. and Rozanov, Y.A. (1969). Probability Theory. Springer-Verlag, New
York.
Todorovic, P. (1980). Stochastic modeling of longitudinal dispersion in a porous
medium. Math. Sci. 5,45-54.
Todorovic, P. and Gani, J. (1987). Modeling of the effect of erosion on crop production. J. Appl. Prob. 24, 787-797.
Chapter 2
Belayev, Yu.K. (1963). Limit theorem for dissipative flows. Theor. Prob. Appl. 8, 165-173.
Daley, D.J. and Vere-Jones, D. (1988). An Introduction to the Theory of Point Processes. Springer-Verlag, New York.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2,
2nd ed. Wiley, New York.
Grandell, J. (1976). Doubly Stochastic Poisson Processes (Lecture Notes Math. 529).
Springer-Verlag, New York.
Kac, M. (1943). On the average number of real roots of a random algebraic equation. Bull. Am. Math. Soc. 49, 314-320.
Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific J. Math. 10, 1181-1197.
Renyi, A. (1967). Remarks on the Poisson process. Stud. Sci. Math. Hungar. 2,
119-123.
Rice, S.O. (1945). Mathematical analysis of random noise, Bell Syst. Tech. J. 24,
46-156.
Serfling, R.J. (1975). A general Poisson approximation theorem. Ann. Prob. 3, 726-731.
Todorovic, P. (1979). A probabilistic approach to analysis and prediction of floods.
Proc. 42nd Session ISI, Manila, pp. 113-124.
Westcott, M. (1976). Simple proof of a result on thinned point process. Ann. Prob. 4,
89-90.
Chapter 3
Bachelier, L. (1941). Probabilites des oscillations maxima. C.R. Acad. Sci., Paris 212,
836-838.
Breiman, L. (1968). Probability. Addison-Wesley, Reading MA.
Brown, R. (1828). A brief account of microscopical observations made in the months of June, July, and August, 1827 on the particles contained in the pollen of plants. Philos. Mag. Ann. Philos. (New Series) 4, 161-178.
Einstein, A. (1905). On the movement of small particles suspended in a stationary
liquid demanded by the molecular-kinetic theory of heat. Ann. Physik 17.
Freedman, D. (1983). Brownian Motion and Diffusion. Springer-Verlag, New York.
Hartman, P. and Wintner, A. (1941). On the law of the iterated logarithm. Am. J.
Math.63,169-176.
Hida, T. (1965). Brownian Motion. Springer-Verlag, New York.
Karlin, S. (1968). A First Course in Stochastic Processes. Academic Press, New York.
Kunita, H. and Watanabe, S. (1967). On square integrable martingales. Nagoya Math. J. 30, 209-245.
Levy, P. (1965). Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.
Nelson, E. (1967). Dynamical Theories of Brownian Motion. Mathematical Notes,
Princeton University.
Skorohod, A.V. (1964). Random Processes with Independent Increments. Nauka,
Moscow (in Russian).
Smoluchowski, M. (1916). Drei Vorträge über Diffusion, Brownsche Molekularbewegung und Koagulation von Kolloidteilchen. Phys. Zeit. 17, 557-571.
Uhlenbeck, G.E. and Ornstein, L.S. (1930). On the theory of Brownian motion. Phys.
Rev. 36, 823-841.
Chapter 4
Anderson, T.W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley,
New York.
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Volume
2, 2nd ed. Wiley, New York.
Ibragimov, I.A. and Rozanov, Y.A. (1978). Gaussian Random Processes. Springer-Verlag, New York.
Rozanov, Y.A. (1968). Gaussian infinitely dimensional distributions, Steklov Math.
Inst. Publ. 108, 1-136. (in Russian).
Chapter 5
Akhiezer, N.I. and Glazman, I.M. (1963). Theory of Linear Operators in Hilbert Space,
Volumes I and II. Frederic Ungar Publishing, Co., New York.
Dudley, R.M. (1989). Real Analysis and Probability. Wadsworth and Brooks/Cole,
Pacific Grove, CA.
Loeve, M. (1978). Probability Theory, II. Springer-Verlag, New York.
Kolmogorov, A.N. and Fomin, S.V. (1970). Introductory Real Analysis. Prentice-Hall,
Englewood Cliffs, NJ.
Natanson, I.P. (1960). Theory of Functions of Real Variables, Volume I and II.
Frederic Ungar Publishing, New York.
Riesz, F. and Sz.-Nagy, B. (1955). Functional Analysis. Frederic Ungar Publishing,
New York.
Robinson, E.A. (1959). An Introduction to Infinitely Many Variates. Hafner Publishing,
New York.
Royden, H.L. (1968). Real Analysis, 2nd ed. The Macmillan Co., New York.
Yosida, K. (1974). Functional Analysis, 4th ed. Springer-Verlag, New York.
Wilansky, A. (1964). Functional Analysis. Blaisdell Publishing, New York.
Chapter 6
Cramer, H. (1940). On the theory of stationary random processes. Ann. Math. 41,
215-230.
Cramer, H. and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes.
Wiley, New York.
Gihman, I.I. and Skorohod, A.V. (1974). The Theory of Stochastic Processes. Springer-Verlag, New York.
Grenander, U. and Rosenblatt, M. (1956). Statistical Analysis of Stationary Time
Series. Wiley, New York.
Karhunen, K. (1947). Über lineare Methoden in der Wahrscheinlichkeitsrechnung.
Ann. Acad. Sci. Fenn. 37.
Khinchin, A.Y. (1938). Correlation theory of stationary random processes. Usp. Math.
Nauk. 5,42-51.
Loeve, M. (1946). Fonctions aleatoires du second ordre. Rev. Sci. 84, 195-206.
Lovitt, W.V. (1924). Linear Integral Equation. McGraw-Hill, New York.
Mercer, J. (1909). Functions of positive and negative type and their connections with
the theory of integral equations. Phil. Trans. Roy. Soc. London, Ser. A, 209,
415-446.
Riesz, F., and Sz-Nagy, B. (1955). Functional Analysis. Frederic Unger Publishing,
New York.
Rozanov, A.Y. (1967). Stationary Random Processes. Holden-Day, San Francisco.
Tricomi, F.G. (1985). Integral Equation. Dover Publishing, New York.
Chapter 7
Ash, R. and Gardner, M.L. (1975). Topics in Stochastic Processes. Academic Press,
New York.
Bochner, S. (1955). Harmonic Analysis and Theory oj Probability. University of California Press, Berkeley.
Bochner, S. (1959). Lectures on Fourier Integrals. (Ann. Math. Studies 42). Princeton
University Press, Princeton, NJ.
Cramer, H. (1940). On the theory of stationary random processes. Ann. Math. 41,
215-230.
Cramer, H. and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes.
Wiley, New York.
Gihman, 1.1. and Skorohod, A.V. (1974). The Theory oj Stochastic Processes, Volume
1. Springer-Verlag, New York.
Hajek, J. (1958). Predicting a stationary process when the correlation function is
convex. Czech. Math. J. 8,150-161.
Khinchin, A.Y. (1938). Correlation theory of stationary random processes. Usp. Math.
Nauk. 5, 42-51.
Kolmogorov, A.N. (1941). Interpolation and extrapolation of stationary random sequences. Izv. Akad. Nauk. SSSR Ser. Math. 5, 3-14.
Krein, M.G. (1954). On the basic approximation problem in the theory of extrapolation and filtering of stationary random processes. Dokl. Akad. N auk. SSSR 94,
13-16.
Liptser, R.S. and Shiryayev, A.N. (1978). Statistics oj Random Processes, Volume 2.
Springer-Verlag, New York.
Rozanov, A.Y. (1967). Stationary Random Processes. Holden-Day, San Francisco.
Wong, E. and Hajek, B. (1985). Stochastic Processes in Engineering Systems. Springer-Verlag, New York.
Wold, H. (1954). A study in the Analysis of Stationary Time Series, 2nd ed., Almqvist
and Wiksell, Stockholm.
Yaglom, A.M. (1949). On the question of linear interpolation of stationary stochastic
processes. Usp. Math. Nauk. 4,173-178.
Yaglom, A.M. (1962). An Introduction to the Theory oj Stationary Random Functions.
Prentice-Hall, Englewood Cliffs, NJ.
Chapter 8
Blumenthal, R.M. and Getoor, R.K. (1968). Markov Processes and Potential Theory.
Academic Press, New York.
Chiang, C.L. (1968). Introduction to Stochastic Processes in Biostatistics. Wiley, New
York.
Chung, K.L. (1967). Markov Chains with Stationary Transition Probabilities, 2nd ed.
Springer-Verlag, New York.
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Dynkin, E.B. and Yushkevich A.A. (1956). Strong Markov processes. Theory Prob.
Appl. 1, 134-139.
Dynkin, E.B. (1961). Foundations oj the Theory oj Markov Processes. Prentice-Hall,
Englewood Cliffs, NJ.
Feller, W. (1954). Diffusion processes in one dimension. Trans. Amer. Math. Soc. 77,
1-31.
Feller, W. (1966). An Introduction to Probability Theory and its Applications, Volume
2. Wiley, New York.
Gihman, 1.1. and Skorohod, A.V. (1975). The Theory of Stochastic Processes, Volume
2. Springer-Verlag, New York.
Gnedenko, B.V. (1976). The Theory of Probability. Mir Publishers, Moscow.
Hille, E. (1950). On the differentiability of semi-groups of operators. Acta Sci. Math.
Szeged 12B, 19-24.
Ito, K. (1963). Random Processes II. Izdatelstwo Inostranoy Literatury, Moscow
(Russian translation from Japanese).
Karlin, S. (1968). A First Course in Stochastic Processes. Academic Press, New York.
Kolmogorov, A.N. (1951). On the problem of differentiability of transition probabilities of time-homogeneous Markov processes with countable number of states.
Uchenye Zapiski Moskovskovo Gos. 148 (Matematika 4), 53-59 (in Russian).
Lamperti, J. (1977). Stochastic Processes. Springer-Verlag, New York.
Mandl, P. (1968). Analytical Treatment of One-Dimensional Markov Processes.
Springer-Verlag, New York.
Chapter 9
Dynkin, E.B. (1965). Markov Processes. Springer-Verlag, Berlin.
Ethier, S.N. and Kurtz, T.G. (1986). Markov Processes. Wiley, New York.
Feller, W. (1952). The parabolic differential equations and the associated semigroups
of transformations. Ann. Math. 55,468-519.
Hille, E. and Phillips, R.S. (1957). Functional Analysis and Semi-groups. Am. Math. Soc.
Colloq. Publ. 31. American Mathematical Society, Providence, RI.
Hille, E. (1948). Functional Analysis and Semi-groups. Colloq. Publ. Amer. Math. Soc.
Mandl, P. (1968). Analytical Treatment of One-Dimensional Markov Processes.
Springer-Verlag, New York.
Yosida, K. (1948). On the differentiability and representation of one-parameter semigroups of linear operators. J. Math. Soc. Japan. 1, 15-21.
Yosida, K. (1974). Functional Analysis, 4th ed. Springer-Verlag, New York.
Chapter 10
Breiman, L. (1968). Probability. Addison-Wesley, Reading MA.
Chung, K.L. (1974). A Course in Probability Theory, 2nd ed. Academic Press, New
York.
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Loeve, M. (1978). Probability Theory II, 4th ed. Springer-Verlag, New York.
Shiryayev, A.N. (1984). Probability. Springer-Verlag, New York.
Index
A
A/B/s, 18
Absorbing state, 209, 215
Adapted, 262
Adjoint operator, 123
Admissible linear filter, 175
Almost sure continuous trajectories, 24
Analytical, 189
Arc sin law, 70
Arrival times, 17,27
Autogenesis, 211
Average hitting time, 66
Bivariate point process, 53
Bochner-Khinchin, 151
Bombardment molecular, 81
Borel cylinder, 4
Borel-Cantelli, 77
Borel-Radon measure, 35
Bounded generator, 242
Bounded linear operator, 223
Bounded random variables, 113
Branching process, 275
Brown, Robert, 15, 62
Brownian bridge, 64
Brownian motion, 15,63
Brownian motion with drift, 64
B
Bachelier, 81
Backward diffusion equation, 226
Backward equation, 210, 212
Banach space, 218, 233
Basis of L 2 , 118
Belayev, 52, 279
Beppo-Levi, 112
Bernoulli, 37, 39
Bernoulli random variables, 37, 39
Bessel inequality, 108
Best approximation, 171
Best linear predictor, 187
Binomial component, 213
Birth and death process, 211
C
Cadlag, 7, 20, 215
Cardinality, 121
Cauchy functional equation, 49, 202,
215, 238
Cauchy inequality, 109
Cauchy or fundamental sequence, 111
Cauchy theorem, 189
Cauchy sequence, 111, 202
Chapman-Kolmogorov equation, 11,
200,202,206,224
Characterization of normal distribution,
96
Closed contour, 189
Closed linear manifold, 115
Closure, 19
Complete orthogonal system, 118
Completeness, 113
Conditional
expectation, 258
probability, 259
Conjugate space, 241
Conservative Markov process, 202
Consistency conditions, 2
Continuity (a.s.), 24
Continuity concepts, 22
Continuous functions (set), 22
Continuous in probability, 22; see also
stochastically continuous
Continuous time Markov chain, 205
Contraction, 218, 233
Contraction semigroup, 233
Control, 162
Convergence in quadratic mean, 110
Convergence of submartingales, 268
Convex, 127, 262, 263
Countably additive, 114
Counting random function, 36
Coupling, 38
Covariance function, 129
Covariance matrix, 94
Cox, 51
Cramer, H., 79
Curve in the Hilbert space, 177
D
Death rate, 211
Death process, 212
Decomposition of Z, 116
Dense subset, 114
Deterministic process, 177
Deviation square, 223
Difference-differential equations,
212
Diffusion coefficient, 223
Diffusion process, 223
Dimensionality, 121
Dynamically neutral, 16
Dini, 141
Dirac measure, 35
Directly given, 3
Discontinuities, 23
of the first kind, 25
Discontinuity point (fixed), 23
Discrete parameter processes, 1, 182
Dispersion (longitudinal), 16
Distance in L 2 , 109
Distance (minimal), 116
Distribution (marginal), 1
Doob, J.L., 19, 20, 64, 79, 82, 103, 267, 268
Doubly stochastic, 51
Drift coefficient, 223
Dynkin, 202, 216, 217
E
Eigen function, 136
Eigenfunctions of integral equation,
137
Eigenvalues
of a matrix, 93
of integral equation, 137
Einstein, A., 62
Elementary random measure, 157
Entire function, 193
Epoch,70
Equivalent processes, 5
Ergodicity, 145, 146
Error of estimate, 170
Essentially bounded, 113
Essential supremum, 114
Estimate, 170
Estimation, 169
Everywhere dense, 114
Everywhere dense in L 2, 114
Exceedance,57
Excursion, 57
Extremes of Brownian motion, 67
F
Feller, 53,75
Feller processes, 128,218
Filtering and prediction, 169, 170
Filtration, 262, 265, 273
Finite dimensional distribution, 201
Fischer, 111
Fixed discontinuity point, 24
Flood modeling, 56
Fokker-Planck, 226
Forward diffusion equation, 226
Fourier coefficients, 118
Fourier series, 118
Fredholm equation, 137
Frictional force, 81
Fubini,28
Functional, 186,241
G
G/G/l,18
Galton-Watson, 275
Gambling, 263
Gaussian process, 97
Gaussian system, 93
Generalized derivative, 134
Generalized Fourier series, 118
Generating function, 213
Generator (infinitesimal), 235
Generator of a semigroup, 234
Germination process, 13
Global behavior, 201
Gram-Schmidt, 121, 136
Grandell, 51
H
Hartman, 75
Herglotz's theorem, 152
Hermitian
form, 131
kernel,137
symmetry, 130, 137
Hilbert space, 113
Hille, E., 207
Hille-Yosida, 243, 254
History, 261
Hitting times, 65
Homogeneous diffusion, 223
Homogeneous Markov process, 10, 200
Homogeneous Poisson process, 43
I
Imbibition, 13
Independent increments, 8
Indistinguishable, 6
Initial distribution, 9, 201
Inner product, 107
Innovation, 183
Instantaneous state, 209
Integral (stochastic), 86
Riemann, 86
Interarrival times, 48
Internal history, 261
Invariant measure, 203
Inverse matrix, 92
Isometric
isomorphism, 158
mapping, 158
operator, 124
Ito, 86, 226
J
Jensen's inequality, 112
Joint probability density, 94
Jump, 216
K
Kac, M., 35
Karhunen-Loeve expansion, 139
Kernel of integral equation, 137
Khintchine, 75
Kolmogorov, 7,15,25,75,224,
226
Kunita,86
L
L2 space, 106
L 2-continuous process, 132
Langevin's equation, 81
Laplace transform, 53, 239
Law of motion (Newton), 81
Law of the iterated logarithm, 74
LeCam, 39
Likelihood ratio, 264
l.i.m., 110
Linear estimate, 173
Linear manifold, 115, 173
Linear operator, 122
Linear predictor, 173
Linear transformation, 174
Lipschitz conditions, 226
Local behavior, 201
Loeve's criterion, 113
Levy, 226
M
M/M/l, 18
Marginal distribution, 1, 2
Marked point process, 53
Markov Gaussian process, 99
Markov inequality, 23
Markov process, 9, 200
Markov process homogeneous, 10
Markov process regular, 216
Markov property, 9
Markov renewal process, 220
Markov time, 216
Martingale, 12, 46, 258, 272
Martingale closed, 262
Mathematical modeling, 12
Matrix,92
Maximum, 19, 56, 68
Mean rate, 43
Mean square error, 170
Measurable process, 27
Mercer, 140
Metric, 38
Moving average, 176, 183, 185
N
Newton, 12,81
Noise, 172
Non-deterministic process, 177
Non-negative definite, 93, 130
Norm, 108, 123
Norm of an operator, 123
Normal distribution, 94
Normal operator, 124
Normally distributed, 94
O
Order statistics, 23, 44
Ornstein-Uhlenbeck process, 81, 82
Orthogonal
basis, 118
collection, 108
complement, 117
matrix,93
projection, 115
random measure, 157
Orthonormal collection, 108
Outer measure, 8
P
Parallelogram law, 108
Parameter set, 1
Parseval, 119
Partition, 73
Path, 2, 71
Physically realizable filter, 176
Pincherle-Goursat kernel, 138
Poincaré, H., 62
Point measure, 36
Point process on R, 34
Point process, simple, 35
Poisson process, 39, 40
Pole, 190
Pólya, 152
Porous medium, 16
Positive definite, 93, 94
Prediction (pure), 169
Process with orthogonal increments, 160
Projection operator, 124, 125
Pure birth process, 49, 212
Pure death process, 212
Purely non-deterministic process, 178
Q
Quadratic form, 93
Quadratic mean continuity, 132
Quadratic mean differentiability, 134
Quadratic variation, 73
Queues, 17
R
Rainfall, 34, 51
Random measure, 36, 175
Random process, 1
Random telegraph process, 153
Real valued, 1
Rectifiable, 74
Reflection principle, 68
Regular (stable) state, 209
Regular Markov process, 216
Renyi,53
Resolvent, 238, 244
Rice, 35
Riemann stochastic integral, 86
Riesz-Fisher theorem, 111
Right continuous inverse, 43
S
Sample path (function), 2
Scheffé, 93
Schwarz's inequality, 107
Second order, 11, 106
Second order process, 129
Self adjoint operator, 123
Semigroup, 232, 233
Separability, 18
Separable process, 19
Separable version, 20
Serfling, 39
Shift operator, 174
Singleton, 197
Singular matrix, 92
Smoluchowski, M., 62
Soil erosion, 14
Spanned, 115
Spectral characteristic, 174
Spectral density, 121
Spectral distribution, 121, 163
Spectral representation of a process,
162
Standard Brownian motion, 63
State space, 2, 200
Stationary distribution, 203
Stationary Gaussian Markov, 102
Stationary Gaussian process, 102
Stationary stochastic process, 10, 11,
143
Stationary transition probability, 10
Stieltjes integral, 85
Stochastic
integration, 85
matrix, 206
measure, 157
process, 1
sequence, 261
structure, 8
Stochastically
continuous, 22
equivalent, 5
equivalent (wide sense), 5
Stone, 162
Stopping time, 216
Strictly stationary, 10
Strong
convergence, 234
Markov process, 216
Markov property, 217
Strongly continuous, 234
Strongly integrable, 236
Subadditive, 207
Submartingale, 12,261
Subordinated process, 175
Subspace, 115
Supermartingale, 12, 261
Supremum norm, 232
Symmetric distribution, 153
T
Telegraph process, 144
Thinned version, 52
Thinning of a point process, 51
Todorovic, 58
Total variation distance, 38
Transition
probability, 10
standard,218
stochastically continuous, 252
Transpose matrix, 92
Triangle inequality, 109
U
Unbiased estimate, 170
Uniform integrability, 270
Uniformly integrable martingales, 272
Unitary, 115
Unitary operator, 124
Upcrossing, 266
Upcrossing inequality, 266
V
Version, 6, 9
W
Weak convergence, 52
Westcott, 52
White noise, 83, 184
Wide sense stationary, 11, 143
Wiener process, 63
Wold decomposition, 179
Y
Yaglom, 187
Yushkevich, 217, 218