Springer Series in Statistics
Probability and its Applications
A Series of the Applied Probability Trust

Editors, Probability and its Applications: J. Gani, C.C. Heyde
Editors, Springer Series in Statistics: J. Berger, S. Fienberg, J. Gani, K. Krickeberg, I. Olkin, B. Singer

Springer Series in Statistics
Anderson: Continuous-Time Markov Chains: An Applications-Oriented Approach.
Andrews/Herzberg: Data: A Collection of Problems from Many Fields for the Student and Research Worker.
Anscombe: Computing in Statistical Science through APL.
Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Bolfarine/Zacks: Prediction Theory for Finite Populations.
Brémaud: Point Processes and Queues: Martingale Dynamics.
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition.
Choi: ARMA Model Identification.
Daley/Vere-Jones: An Introduction to the Theory of Point Processes.
Dzhaparidze: Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series.
Farrell: Multivariate Calculation.
Fienberg/Hoaglin/Kruskal/Tanur (Eds.): A Statistical Model: Frederick Mosteller's Contributions to Statistics, Science, and Public Policy.
Goodman/Kruskal: Measures of Association for Cross Classifications.
Grandell: Aspects of Risk Theory.
Hall: The Bootstrap and Edgeworth Expansion.
Härdle: Smoothing Techniques: With Implementation in S.
Hartigan: Bayes Theory.
Heyer: Theory of Statistical Experiments.
Jolliffe: Principal Component Analysis.
Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume I.
Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume II.
Kres: Statistical Tables for Multivariate Analysis.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random Sequences and Processes.
Le Cam: Asymptotic Methods in Statistical Decision Theory.
Le Cam/Yang: Asymptotics in Statistics: Some Basic Concepts.
Manoukian: Modern Concepts and Theorems of Mathematical Statistics.
Miller, Jr.: Simultaneous Statistical Inference, 2nd edition.
Mosteller/Wallace: Applied Bayesian and Classical Inference: The Case of The Federalist Papers.
Pollard: Convergence of Stochastic Processes.
Pratt/Gibbons: Concepts of Nonparametric Theory.
Read/Cressie: Goodness-of-Fit Statistics for Discrete Multivariate Data.
Reiss: Approximate Distributions of Order Statistics: With Applications to Nonparametric Statistics.
Ross: Nonlinear Estimation.
Sachs: Applied Statistics: A Handbook of Techniques, 2nd edition.
Salsburg: The Use of Restricted Significance Tests in Clinical Trials.
Särndal/Swensson/Wretman: Model Assisted Survey Sampling.
Seneta: Non-Negative Matrices and Markov Chains.
(continued after index)

Petar Todorovic
An Introduction to Stochastic Processes and Their Applications
With 15 Illustrations
Springer-Verlag: New York Berlin Heidelberg London Paris Tokyo Hong Kong Barcelona Budapest

Petar Todorovic, Department of Statistics and Applied Probability, University of California-Santa Barbara, Santa Barbara, CA 93106, USA

Series Editors:
J. Gani, Department of Statistics, University of California, Santa Barbara, CA 93106, USA
C.C. Heyde, Department of Statistics, Institute of Advanced Studies, The Australian National University, GPO Box 4, Canberra ACT 2601, Australia

Mathematics Subject Classification (1991): 60G07, 60G12, 60J25

Library of Congress Cataloging-in-Publication Data
Todorovic, P. (Petar)
An introduction to stochastic processes and their applications / by P. Todorovic.
p. cm. - (Springer series in statistics)
Includes bibliographical references and index.
ISBN-13: 978-1-4613-9744-1
e-ISBN-13: 978-1-4613-9742-7
DOI: 10.1007/978-1-4613-9742-7
1. Stochastic processes. I. Series. QA274.T64 1992 519.2-dc20 91-46692

Printed on acid-free paper.
© 1992 The Applied Probability Trust. Softcover reprint of the hardcover 1st edition 1992.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Production managed by Henry Krell; manufacturing supervised by Robert Paella. Typeset by Asco Trade Typesetting Ltd., Hong Kong.
9 8 7 6 5 4 3 2 1

To my wife Zivadinka

Preface

This text on stochastic processes and their applications is based on a set of lectures given during the past several years at the University of California, Santa Barbara (UCSB). It is an introductory graduate course designed for classroom purposes. Its objective is to provide graduate students of statistics with an overview of some basic methods and techniques in the theory of stochastic processes. The only prerequisites are some rudiments of measure and integration theory and an intermediate course in probability theory. There are more than 50 examples and applications and 243 problems and complements, which appear at the end of each chapter.

The book consists of 10 chapters. Basic concepts and definitions are provided in Chapter 1. This chapter also contains a number of motivating examples and applications illustrating the practical use of the concepts. The last five sections are devoted to topics such as separability, continuity, and measurability of random processes, which are discussed in some detail.

The concept of a simple point process on $R_+$ is introduced in Chapter 2. Using the coupling inequality and Le Cam's lemma, it is shown that if its counting function is stochastically continuous and has independent increments, the point process is Poisson. When the counting function is Markovian, the sequence of arrival times is also a Markov process. Some related topics, such as independent thinning and marked point processes, are also discussed. In the final section, an application of these results to flood modeling is presented.

Chapter 3 provides a short introduction to the theory of one-dimensional Brownian motion. Principal topics here are hitting times, extremes, the reflection principle, properties of sample paths, and the law of the iterated logarithm. The chapter ends with a discussion of the Langevin equation, the Ornstein-Uhlenbeck process, and stochastic integration.

Chapter 4 deals with the theory of Gaussian processes. It begins with a brief account of the relevant matrix theory and the definition of a multivariate Gaussian distribution and its characteristic function, which is worked out in detail. We show that a system of random variables is Gaussian if and only if every linear combination of these variables is normal.
We also discuss the Markov Gaussian process and prove Doob's theorem, which asserts that the only stationary Markov Gaussian processes are the Ornstein-Uhlenbeck processes.

Chapter 5 contains a brief introduction to the Hilbert space $L_2$, which has some particular features not shared by other $L_p$ spaces. Here the emphasis is on those topics essential in subsequent sections. They include the Riesz-Fischer theorem, the structure of $L_2$ spaces, the concepts of orthogonal projection and orthogonal basis, separability, and linear and projection operators.

Chapter 6 deals with the theory of second-order (or $L_2$) processes, which are characterized up to Hilbert space isometry by their covariance functions. The focus here is on the covariance function and its properties. It is natural to have criteria for continuity, etc., expressed in terms of the covariance function. The expansion of the covariance function in terms of its eigenvalues and eigenfunctions, as well as the Karhunen-Loève expansion, are discussed in some detail.

The first part of Chapter 7 is concerned with the spectral analysis of (wide sense) stationary processes. The gist of this section is the "spectral representation" of a stationary process, which establishes an isometric isomorphism between the closed linear manifold spanned by the random variables of the process and a certain $L_2$ space of complex functions. With the groundwork laid, the problem of estimation (and its special cases, filtering and prediction) can then be investigated. The method for solving the prediction problem described here is due to Yaglom. Its starting point is the spectral representation of the process; however, the results obtained are most useful for rational spectral densities. Finally, the Wold decomposition is also considered in some detail.

Chapter 8, an introduction to Markov processes, consists of three parts. The first lists some basic features of homogeneous Markov processes: it is shown that the existence of a stationary measure is a necessary and sufficient condition for the process to be strictly stationary. The second part treats a class of homogeneous Markov processes with countable state space. The focus here is on the transition probability and its properties. If the sample paths of the Markov process are right continuous, then its transition probability is not only uniformly continuous but also differentiable. This is used to derive Kolmogorov's backward and forward differential equations. In this section we also introduce the concept of the "strong Markov" property and discuss the structure of Markov chains. The last part is concerned with homogeneous diffusion. We briefly describe Itô's approach, which shows that a diffusion process is governed by a first-order stochastic differential equation driven by a standard Brownian motion process.

Chapter 9 provides an introduction to the application of semigroup theory to Markov processes, whereas Chapter 10 discusses some rudiments of the theory of discrete parameter martingales.

I would like to point out that after Chapter 1 (or at least the first half of it) one can move directly to most of the other chapters. Chapter 5, however, is a necessary prerequisite for reading Chapters 6, 7, and 9. The course has been tested over the years on graduate students of statistics at the University of California, Santa Barbara, and contains material suitable for an introductory as well as a more advanced course in stochastic processes.

For encouragement, support, and valuable advice, I am glad to thank Dr. Joe Gani.
I am also grateful to the referees, including William Griffith and Gennady Samorodnitsky, for their comments on the first draft of this book. My special thanks go to Chris Heyde for his extraordinarily careful reading of the whole manuscript and for correcting numerous errors and misprints. Finally, I acknowledge with warm thanks my indebtedness to colleagues and students at the UCSB Department of Statistics and Applied Probability.

Petar Todorovic

Contents

Preface

CHAPTER 1. Basic Concepts and Definitions
1.1. Definition of a Stochastic Process
1.2. Sample Functions
1.3. Equivalent Stochastic Processes
1.4. Kolmogorov Construction
1.5. Principal Classes of Random Processes
1.6. Some Applications
1.7. Separability
1.8. Some Examples
1.9. Continuity Concepts
1.10. More on Separability and Continuity
1.11. Measurable Random Processes
Problems and Complements

CHAPTER 2. The Poisson Process and Its Ramifications
2.1. Introduction
2.2. Simple Point Process on $R_+$
2.3. Some Auxiliary Results
2.4. Definition of a Poisson Process
2.5. Arrival Times $\{\tau_n\}$
2.6. Markov Property of N(t) and Its Implications
2.7. Doubly Stochastic Poisson Process
2.8. Thinning of a Point Process
2.9. Marked Point Processes
2.10. Modeling of Floods
Problems and Complements

CHAPTER 3. Elements of Brownian Motion
3.1. Definitions and Preliminaries
3.2. Hitting Times
3.3. Extremes of $\xi(t)$
3.4. Some Properties of the Brownian Paths
3.5. Law of the Iterated Logarithm
3.6. Some Extensions
3.7. The Ornstein-Uhlenbeck Process
3.8. Stochastic Integration
Problems and Complements

CHAPTER 4. Gaussian Processes
4.1. Review of Elements of Matrix Analysis
4.2. Gaussian Systems
4.3. Some Characterizations of the Normal Distribution
4.4. The Gaussian Process
4.5. Markov Gaussian Process
4.6. Stationary Gaussian Process
Problems and Complements

CHAPTER 5. $L_2$ Space
5.1. Definitions and Preliminaries
5.2. Convergence in Quadratic Mean
5.3. Remarks on the Structure of $L_2$
5.4. Orthogonal Projection
5.5. Orthogonal Basis
5.6. Existence of a Complete Orthonormal Sequence in $L_2$
5.7. Linear Operators in a Hilbert Space
5.8. Projection Operators
Problems and Complements

CHAPTER 6. Second-Order Processes
6.1. Covariance Function C(s, t)
6.2. Quadratic Mean Continuity and Differentiability
6.3. Eigenvalues and Eigenfunctions of C(s, t)
6.4. Karhunen-Loève Expansion
6.5. Stationary Stochastic Processes
6.6. Remarks on the Ergodicity Property
Problems and Complements

CHAPTER 7. Spectral Analysis of Stationary Processes
7.1. Preliminaries
7.2. Proof of the Bochner-Khinchin and Herglotz Theorems
7.3. Random Measures
7.4. Process with Orthogonal Increments
7.5. Spectral Representation
7.6. Ramifications of Spectral Representation
7.7. Estimation, Prediction, and Filtering
7.8. An Application
7.9. Linear Transformations
7.10. Linear Prediction, General Remarks
7.11. The Wold Decomposition
7.12. Discrete Parameter Processes
7.13. Linear Prediction
7.14. Evaluation of the Spectral Characteristic $\varphi(\lambda, h)$
7.15. General Form of Rational Spectral Density
Problems and Complements

CHAPTER 8. Markov Processes I
8.1. Introduction
8.2. Invariant Measures
8.3. Countable State Space
8.4. Birth and Death Process
8.5. Sample Function Properties
8.6. Strong Markov Processes
8.7. Structure of a Markov Chain
8.8. Homogeneous Diffusion
Problems and Complements

CHAPTER 9. Markov Processes II: Application of Semigroup Theory
9.1. Introduction and Preliminaries
9.2. Generator of a Semigroup
9.3. The Resolvent
9.4. Uniqueness Theorem
9.5. The Hille-Yosida Theorem
9.6. Examples
9.7. Some Refinements and Extensions
Problems and Complements

CHAPTER 10. Discrete Parameter Martingales
10.1. Conditional Expectation
10.2. Discrete Parameter Martingales
10.3. Examples
10.4. The Upcrossing Inequality
10.5. Convergence of Submartingales
10.6. Uniformly Integrable Martingales
Problems and Complements

Bibliography
Index
CHAPTER 1
Basic Concepts and Definitions

1.1. Definition of a Stochastic Process

Generally speaking, a stochastic or random process (in this book both terms are used interchangeably) is a family of random variables defined on a common probability space, indexed by the elements of an ordered set T, which is called the parameter set. Most often T is taken to be an interval of time, and the random variable indexed by an element $t \in T$ is said to describe the state of the process at time t. The random processes considered here are specified by the following definition.

Definition 1.1.1. A stochastic process is a family of (real- or complex-valued) random variables $\{X(t); t \in T\}$, defined on a common probability space $\{\Omega, \mathcal{B}, P\}$, where the parameter set T is a subset of the real line R.

In the following we call a stochastic process real if the random variables X(t) are real-valued for all $t \in T$, and complex if they are all complex-valued. The parameter set T is often called the domain of definition of the stochastic process X(t). If $T = N_+ = \{0, 1, \ldots\}$, the process is said to be a discrete parameter stochastic process. If T is an interval of the real line, the process is said to have a continuous parameter. Most often, T is either $N_+$ or $R_+ = [0, \infty)$.

Let $\{X(t); t \in T\}$ be a real-valued random process and $\{t_1, \ldots, t_n\} \subset T$, where $t_1 < t_2 < \cdots < t_n$; then

$$F_{t_1,\ldots,t_n}(x_1,\ldots,x_n) = P\{X(t_1) \le x_1, \ldots, X(t_n) \le x_n\} \qquad (1.1.1)$$

is a finite-dimensional marginal distribution function of the process $\{X(t); t \in T\}$. The marginal distribution functions are among the basic characteristics of a real random process.

Definition 1.1.2. We shall say that the random process $\{X(t); t \in T\}$ is given if all its marginal distribution functions

$$\{F_{t_1,\ldots,t_n}(\cdot,\ldots,\cdot)\} \qquad (1.1.2)$$

obtained for all finite subsets $\{t_1, \ldots, t_n\} \subset T$ are given.

The marginal distribution functions (1.1.2) must satisfy the following consistency conditions:

(i) for any permutation $k_1, \ldots, k_n$ of $1, \ldots, n$,
$$F_{t_{k_1},\ldots,t_{k_n}}(x_{k_1},\ldots,x_{k_n}) = F_{t_1,\ldots,t_n}(x_1,\ldots,x_n); \qquad (1.1.3)$$
(ii) for any $1 \le k < n$ and $x_1, \ldots, x_k \in R$,
$$F_{t_1,\ldots,t_k}(x_1,\ldots,x_k) = F_{t_1,\ldots,t_n}(x_1,\ldots,x_k,\infty,\ldots,\infty).$$

Many future statements will hold for both real- and complex-valued random processes. For this and other reasons, it is convenient to introduce the concept of a state space $\{S, \mathcal{S}\}$ of a process. Here S is the set of values of the process and $\mathcal{S}$ is a $\sigma$-algebra of subsets of S. For instance, if the process is real, S will be the real line or a part of it, and $\mathcal{S}$ will be the $\sigma$-field of Borel subsets of S.
1.2. Sample Functions

From Definition 1.1.1, it follows that for each fixed $t \in T$, $X(t) = X(t, \cdot)$ is a random variable defined on $\Omega$. On the other hand, for any fixed $\omega \in \Omega$, $X(\cdot, \omega)$ is a function defined on the parameter set T. Any such function is called a realization, trajectory, sample path, or sample function of the stochastic process X(t). Figure 1.1 depicts a realization of the process X(t) when $T = [0, T_0]$.

[Figure 1.1. A realization of X(t).]

A (say, real) stochastic process $\{X(t); t \in T\}$ is said to be directly given if the sample space $\Omega = R^T$, where $R^T$ is the set of all functions on the parameter set T with values in R; i.e., any $\omega \in \Omega$ is a mapping $\omega: T \to R$. The stochastic process $\{X(t); t \in T\}$ is then defined as the family of coordinate mappings on $R^T$ to R. In other words, for any $t \in T$, $X(t, \cdot)$ is a random variable on $R^T$ defined as follows: for any $\omega(\cdot) \in R^T$, $X(t, \omega) = \omega(t)$.

Example 1.2.1. Let X and Y be independent random variables on a probability space $\{\Omega, \mathcal{B}, P\}$ with
$$H_X(x) = P\{X \le x\} \quad \text{and} \quad H_Y(y) = P\{Y \le y\}.$$
Let $\{\xi(t); t \ge 0\}$ be the stochastic process defined as $\xi(t) = tX + Y$. Some sample functions of $\xi(t)$ are shown in Figure 1.2. Clearly, for $t > 0$,
$$F_t(x) = P\{tX + Y \le x\} = \int_{-\infty}^{\infty} P\left\{X \le \frac{x - y}{t}\right\} dH_Y(y) = \int_{-\infty}^{\infty} H_X\left(\frac{x - y}{t}\right) dH_Y(y).$$
For any $0 < t_1 < t_2 < \cdots < t_n$, we have
$$F_{t_1,\ldots,t_n}(x_1,\ldots,x_n) = P\{\xi(t_1) \le x_1, \ldots, \xi(t_n) \le x_n\} = \int_{-\infty}^{\infty} P\left\{X \le \frac{x_1 - y}{t_1}, \ldots, X \le \frac{x_n - y}{t_n}\right\} dH_Y(y).$$

[Figure 1.2. Realizations of $\xi(t)$.]

It is also easy to verify that
$$E\{\xi(t)\xi(s)\} = st\,E\{X^2\} + E\{Y^2\},$$
assuming that $E\{X\} = E\{Y\} = 0$ and that $X, Y \in L_2\{\Omega, \mathcal{B}, P\}$.
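This last identity is easy to check by simulation. The following minimal sketch (our own illustration, not part of the text; it assumes Python with NumPy) takes X and Y standard normal, so that $E\{X^2\} = E\{Y^2\} = 1$ and the average of $\xi(t)\xi(s)$ should be close to $st + 1$:

```python
# Monte Carlo check of E{xi(t) xi(s)} = s*t*E{X^2} + E{Y^2} for the
# process xi(t) = t*X + Y of Example 1.2.1 (X, Y independent, centered).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.standard_normal(n)      # E{X} = 0, E{X^2} = 1
Y = rng.standard_normal(n)      # E{Y} = 0, E{Y^2} = 1

s, t = 0.7, 1.3
xi_s = s * X + Y                # xi(s, omega) for each sampled omega
xi_t = t * X + Y                # xi(t, omega) for the same omega
print(np.mean(xi_s * xi_t))     # ~ s*t + 1 = 1.91
```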
Before concluding this section, a few remarks on the structure of the $\sigma$-field $\mathcal{B}$, in the case when $\Omega$ is the function space $R^T$, seem appropriate. Let $B_1, B_2, \ldots, B_n$ be n arbitrary Borel subsets of the real line, and consider the following subset of $R^T$:
$$\{\omega; X(t_1) \in B_1, \ldots, X(t_n) \in B_n\},$$
where $\{t_1,\ldots,t_n\} \subset T$ and $X(t,\cdot)$ is the coordinate mapping on $R^T$ to R. This subset is called an n-dimensional Borel cylinder. The $\sigma$-algebra $\mathcal{B}$ is then defined as the least $\sigma$-algebra of subsets of $R^T$ containing all the finite-dimensional Borel cylinders. The problem with this $\sigma$-field is that the sample space $\Omega = R^T$ is too big and the $\sigma$-algebra $\mathcal{B}$ is too small, in the sense that many important subsets of $\Omega$ are not members of $\mathcal{B}$. Therefore, we cannot compute their probabilities, because the probability measure P is defined on $\mathcal{B}$. For instance, the subset of all continuous functions of $R^T$ is not an event (i.e., a member of $\mathcal{B}$). The subset
$$\left\{\omega; \sup_{t \in T} X(t) \le x\right\} = \bigcap_{t \in T} \{X(t) \le x\}$$
does not, in general, belong to $\mathcal{B}$. As a matter of fact, any subset of $R^T$ that is an intersection or union of uncountably many events from $\mathcal{B}$ may fail to be an element of $\mathcal{B}$. Such problems do not arise when T is a countable set. These difficulties are alleviated by the separability concept introduced by Doob, which is discussed in a later section. The following example is quite instructive.

Example 1.2.2. Let $\{\xi(t); t \ge 0\}$ be a real stochastic process on a probability space $\{\Omega, \mathcal{B}, P\}$ given by
$$\xi(t) = t + X,$$
where $X \sim N(0,1)$ (i.e., X has a standard normal distribution). The sample functions of the process $\xi(t)$ are straight lines on $T = [0, \infty)$. Let $D \subset T$ be a countable set, say $D = \{t_1, t_2, \ldots\}$, and consider the random event
$$A = \{\xi(t) = 0 \text{ for at least one } t \in D\}.$$
That $A \in \mathcal{B}$ can easily be seen if we write it as
$$A = \bigcup_{t_i \in D} \{\xi(t_i) = 0\} = \bigcup_{t_i \in D} \{X = -t_i\}.$$
Because each $\{X = -t_i\} \in \mathcal{B}$, it follows that $A \in \mathcal{B}$. It is also easy to see that $P(A) = 0$, because $P\{X = -t_i\} = 0$ for each $t_i \in D$. On the other hand, if we choose D to be [0, 1], then the subset $B \subset \Omega$ defined as
$$B = \{\xi(t) = 0 \text{ for at least one } t \in [0,1]\} = \bigcup_{t \in [0,1]} \{X = -t\}$$
is not necessarily an event, because it is an uncountable union of random events. But
$$\bigcup_{t \in [0,1]} \{X = -t\} = \{X \in [-1, 0]\} \in \mathcal{B}.$$
We also see that
$$P(B) = (2\pi)^{-1/2} \int_{-1}^{0} e^{-x^2/2}\,dx > 0.$$
Therefore, an uncountable union of null events can sometimes be an event with positive probability. However, a countable union of null events is always a null event.

1.3. Equivalent Stochastic Processes

Let $\{X(t); t \in T\}$ and $\{Y(t); t \in T\}$ be two stochastic processes defined on a common probability space $\{\Omega, \mathcal{B}, P\}$ and taking values in the same state space $\{S, \mathcal{S}\}$.

Definition 1.3.1. If, for each $n = 1, 2, \ldots$,
$$P\{X(t_1) \in B_1, \ldots, X(t_n) \in B_n\} = P\{Y(t_1) \in B_1, \ldots, Y(t_n) \in B_n\} \qquad (1.3.1)$$
for all $\{t_1,\ldots,t_n\} \subset T$ and $\{B_1,\ldots,B_n\} \subset \mathcal{S}$, the random processes X(t) and Y(t) are said to be stochastically equivalent in the wide sense.

Definition 1.3.2. If, for every $t \in T$,
$$P\{X(t) = Y(t)\} = 1, \qquad (1.3.2)$$
the random processes are said to be stochastically equivalent, or just equivalent.

Let us show that (1.3.2) implies (1.3.1). Due to (1.3.2),
$$P\{X(t_1) \in B_1, \ldots, X(t_n) \in B_n\} = P\{X(t_1) \in B_1, \ldots, X(t_n) \in B_n,\ X(t_1) = Y(t_1), \ldots, X(t_n) = Y(t_n)\}$$
$$= P\{Y(t_1) \in B_1, \ldots, Y(t_n) \in B_n,\ X(t_1) = Y(t_1), \ldots, X(t_n) = Y(t_n)\} = P\{Y(t_1) \in B_1, \ldots, Y(t_n) \in B_n\}.$$

Definition 1.3.3. Let $\{\xi(t); t \in T\}$ be a stochastic process on $\{\Omega,\mathcal{B},P\}$ with state space $\{S,\mathcal{S}\}$. Any other stochastic process $\{\tilde\xi(t); t \in T\}$ on the same probability space, equivalent to $\xi(t)$, is called a "version" of $\xi(t)$.

Definition 1.3.4. The stochastic processes $\{X(t); t \in T\}$ and $\{Y(t); t \in T\}$ are said to be "indistinguishable" if almost all their sample paths coincide on T; in other words, if
$$P\{X(t) = Y(t)\ \forall t \in T\} = 1. \qquad (1.3.3)$$
This is the strongest form of equivalence, and it clearly implies the other two. The following example shows that two equivalent processes may have quite different sample paths; in other words, (1.3.2) may hold but not (1.3.3).

Example 1.3.1. Let $\Omega = [0,1]$, $\mathcal{B}$ the $\sigma$-field of Borel subsets of [0,1], P the Lebesgue measure, and $T = [0,1]$. Consider two stochastic processes $\{X(t); t \in T\}$ and $\{Y(t); t \in T\}$ on $\{\Omega,\mathcal{B},P\}$ defined as follows:
$$X(t,\omega) = 0 \text{ on } T \text{ for each } \omega \in \Omega;$$
$$Y(t,\omega) = 0 \text{ on } T \text{ except at the point } t = \omega, \text{ where } Y(\omega,\omega) = 1.$$
For any fixed $t \in T$,
$$\{\omega; X(t,\omega) \ne Y(t,\omega)\} = \{\omega; \omega = t\} = \{t\}.$$
Because $P\{t\} = 0$, it follows that $P\{X(t) = Y(t)\} = 1$ for each $t \in T$. In other words, these two processes are stochastically equivalent, and yet their sample paths are different. In addition,
$$P\{\omega; X(t,\omega) = Y(t,\omega)\ \forall t \in T\} = 0, \qquad \sup_{t \in T} X(t) = 0, \quad \sup_{t \in T} Y(t) = 1.$$

Remark 1.3.1. At this point it is of some interest to elucidate the principal distinction between Definitions 1.3.2 and 1.3.4. The point is that in Definition 1.3.2 the null set $\Lambda_t$, on which X(t) and Y(t) may differ, depends on t. As we have seen in Example 1.2.2, the union
$$\bigcup_{t \in T} \Lambda_t$$
of uncountably many null events may not be a null event. On the other hand, in Definition 1.3.4 there is just one null event $\Lambda$ such that
$$X(t,\omega) = Y(t,\omega) \text{ on } T$$
for every $\omega \in \Lambda^c$.

Under what conditions are Definitions 1.3.2 and 1.3.4 equivalent? This is clearly the case if T is a countable set. For continuous time, we need some conventions. Let $\{\xi(t); t \in T\}$ be a stochastic process with state space $\{S,\mathcal{S}\}$, which is a metric space.
The stochastic process is said to be cadlag if each of its trajectories is a right continuous function with limits from the left.

Proposition 1.3.1. Let $\{X(t); t \in T\}$ and $\{Y(t); t \in T\}$ be stochastically equivalent and both right continuous. Then X(t) and Y(t) are indistinguishable.

Proof. Let Q be the set of rational numbers. For each $r \in Q \cap T$, $P\{X(r) \ne Y(r)\} = 0$. Consequently, $P(G) = 0$, where
$$G = \bigcup_{r \in Q \cap T} \{X(r) \ne Y(r)\}.$$
However, by right continuity,
$$\{X(t) \ne Y(t)\} \subset G$$
for any $t \in T$. Therefore,
$$\bigcup_{t \in T} \{X(t) \ne Y(t)\} \subset G,$$
which shows that $P\{X(t) = Y(t)\ \forall t \in T\} = 1$. □

1.4. Kolmogorov Construction

Let $\{\xi(t); t \in T\}$ be a stochastic process on $\{\Omega,\mathcal{B},P\}$ with state space $\{R,\mathcal{R}\}$, where R is the real line and $\mathcal{R}$ the $\sigma$-algebra of Borel subsets of R. The stochastic process determines a consistent family of marginal distribution functions by
$$F_{t_1,\ldots,t_n}(x_1,\ldots,x_n) = P\{\xi(t_1) \le x_1, \ldots, \xi(t_n) \le x_n\}.$$
Is the converse true? In other words, given a consistent family of distribution functions, does there exist a stochastic process for which these distributions are its marginal distributions? The following theorem, due to Kolmogorov (which is not going to be proved here), provides an affirmative answer to this question.

Theorem 1.4.1. Assume that
$$\{F_{t_1,\ldots,t_n}(x_1,\ldots,x_n)\}, \quad \{t_1,\ldots,t_n\} \subset T,\ n = 1, 2, \ldots \qquad (1.4.1)$$
is a consistent family of distribution functions; then there exists a stochastic process with (1.4.1) as its marginal distribution functions.

As the probability space on which this process is defined we may use $\{R^T,\mathcal{B},P\}$ (see Section 1.2 for definitions), and the stochastic process is specified by $X(t,\omega) = \omega(t)$ for each $\omega \in R^T$ and $t \in T$.

The method used in the Kolmogorov theorem to construct a stochastic process starting from a consistent family of distribution functions leads to a rather large set of sample functions, namely $R^T$. Often it is desirable to construct a process whose sample functions are subject to some regularity conditions; for instance, we may want them to be Borel measurable, or continuous, and so on. Denote this particular subset of $R^T$ by $\Omega_0$. In order that there exist a process $\{\xi_0(t); t \in T\}$, stochastically equivalent to $\{\xi(t); t \in T\}$, which can be realized in $\Omega_0$, it is necessary and sufficient that $P^*(\Omega_0) = 1$, where $P^*$ is the outer measure induced by P as follows: for any $M \subset R^T$,
$$P^*(M) = \inf\{P(C);\ C \supseteq M\},$$
where $C \subset R^T$ is a cylinder set. In such a case, the system $\{\Omega_0,\mathcal{B}_0,P^*\}$ is the new probability space, with $\mathcal{B}_0 = \mathcal{B} \cap \Omega_0$. We are not going to prove this result here.
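To make the construction concrete, here is a small numerical sketch (our own illustration, assuming Python with NumPy; it is not part of the text). We prescribe the consistent family of zero-mean Gaussian distribution functions with covariance $\min(s,t)$ (the Brownian motion of Chapter 3) and sample any finite-dimensional distribution directly; consistency condition (ii) of Section 1.1 corresponds to simply dropping coordinates of a sample.

```python
# Sampling the finite-dimensional distributions of the process that the
# Kolmogorov construction attaches to the consistent Gaussian family
# with covariance C(s, t) = min(s, t).
import numpy as np

def fdd_sample(times, size, rng):
    """Draw `size` samples from F_{t_1,...,t_n} for C(s, t) = min(s, t)."""
    t = np.asarray(times, dtype=float)
    cov = np.minimum.outer(t, t)        # cov[i, j] = min(t_i, t_j)
    return rng.multivariate_normal(np.zeros(len(t)), cov, size=size)

rng = np.random.default_rng(1)
# Dropping the middle coordinate of a sample from F_{t1,t2,t3} leaves a
# sample from F_{t1,t3}: the family is consistent.
print(fdd_sample([0.5, 1.0, 2.0], size=3, rng=rng))
```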
1.5. Principal Classes of Random Processes

In this section we isolate and define several important classes of random processes which have been extensively studied and used in various applications. According to Definition 1.1.1, every random process is a family of random variables defined on a common probability space. The theory of stochastic processes is concerned with different features of such a family, many of which can be studied by means of their marginal distributions. Of particular interest are those processes endowed with a certain stochastic structure; more specifically, those whose marginal distributions are assigned certain properties. In the sequel, we identify the following five classes of random processes, each having a particular stochastic structure.

(i) Processes with independent increments. Let
$$\{\xi(t); t \in T\} \qquad (1.5.1)$$
be a real-valued stochastic process on a probability space $\{\Omega,\mathcal{B},P\}$, where $T \subset R$ is an interval.

Definition 1.5.1. The process (1.5.1) is said to have "independent increments" if, for any finite subset $\{t_0, t_1, \ldots, t_n\} \subset T$ with $t_0 < t_1 < \cdots < t_n$, the increments
$$\xi(t_1) - \xi(t_0),\ \xi(t_2) - \xi(t_1),\ \ldots,\ \xi(t_n) - \xi(t_{n-1}) \qquad (1.5.2)$$
are independent random variables.

From the definition, it readily follows that all the marginal distributions are completely determined by the distribution of $\xi(t)$ for all $t \in T$ and by that of the increment $\xi(t_2) - \xi(t_1)$, $t_1, t_2 \in T$, $t_1 < t_2$. The two most important processes with independent increments are the Poisson and the Wiener (or Brownian motion) process. They will be discussed in some detail in the following chapters.

(ii) Markov processes. Consider a random process
$$\{\xi(t); t \in T\} \qquad (1.5.3)$$
on $\{\Omega,\mathcal{B},P\}$, where $T = [0, \infty)$, with values in R.

Definition 1.5.2. Process (1.5.3) is called Markov with state space $\{R,\mathcal{R}\}$ if, for any $0 \le t_1 < t_2 < \cdots < t_n$ and $B \in \mathcal{R}$,
$$P\{\xi(t_n) \in B \mid \xi(t_1), \ldots, \xi(t_{n-1})\} = P\{\xi(t_n) \in B \mid \xi(t_{n-1})\} \quad \text{(a.s.)}. \qquad (1.5.4)$$

Property (1.5.4) of the stochastic process is called the Markov property. To elucidate somewhat its true meaning, let us fix $t \in (0, \infty)$ and consider the families of random variables
$$\{\xi(s); s \le t\} \quad \text{and} \quad \{\xi(u); u \ge t\}. \qquad (1.5.5)$$
If we take the instant t to be the present, then the first family represents the "past and present" of the process, whereas the second family is its "future." Now, for any $B_1, B_2 \in \mathcal{R}$ and $s < t < u$, one can easily verify that (1.5.4) implies that the following holds almost surely:
$$P\{\xi(s) \in B_1,\ \xi(u) \in B_2 \mid \xi(t)\} = P\{\xi(s) \in B_1 \mid \xi(t)\}\,P\{\xi(u) \in B_2 \mid \xi(t)\}. \qquad (1.5.6)$$
Thus the Markov property, roughly speaking, means that given the present, the future and the past of a Markov process are independent.

Definition 1.5.3. The distribution $\pi$ on $\mathcal{R}$ defined by
$$\pi(B) = P\{\xi(0) \in B\} \qquad (1.5.7)$$
is called the "initial distribution" of the process.

Definition 1.5.4. A version $P(s,t,x,B)$ of
$$P\{\xi(t) \in B \mid \xi(s) = x\} \qquad (1.5.8)$$
having the properties
a. for every $s < t$ and $x \in R$ fixed, $P(s,t,x,\cdot)$ is a probability measure on $\mathcal{R}$,
b. for every $s < t$ and $B \in \mathcal{R}$ fixed, $P(s,t,\cdot,B)$ is an $\mathcal{R}$-measurable function,
is called the transition probability function (or simply the transition probability) of the Markov process.

From the definition, it follows that
$$P(t,t,x,B) = \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B \end{cases} \qquad (1.5.9)$$
for all $t \ge 0$. In addition, the transition probability satisfies the so-called "Chapman-Kolmogorov equation": for any $0 \le s < t < u$ and $B \in \mathcal{R}$,
$$P(s,u,x,B) = \int_R P(s,t,x,dy)\,P(t,u,y,B). \qquad (1.5.10)$$
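A finite-state analogue may help fix ideas. In the sketch below (our own illustration, assuming Python with NumPy and SciPy), the kernels reduce to stochastic matrices $P(t) = e^{tQ}$ for a rate matrix Q, and the time-homogeneous form of (1.5.10) becomes the matrix identity $P(s+t) = P(s)P(t)$:

```python
# Chapman-Kolmogorov equation for a homogeneous two-state Markov
# process: with P(t) = expm(t*Q), we must have P(s + t) = P(s) @ P(t).
import numpy as np
from scipy.linalg import expm

Q = np.array([[-2.0, 2.0],
              [1.0, -1.0]])            # rate matrix, rows sum to zero

P = lambda t: expm(t * Q)              # P(t)[x, y] plays the role of P(x, t, {y})
s, t = 0.4, 1.1
print(np.allclose(P(s + t), P(s) @ P(t)))   # True
```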
In other words, the local behavior of the process (in the neighborhood of zero) determines its global behavior. (iii) Strictly stationary processes. In the theory of stochastic processes, an important class consists of those processes whose marginal distributions are invariant under time-shift. 1.5. Principal Classes of Random Processes 11 Definition 1.5.6. A real-valued stochastic process {W); t E T} on a probability space {n,gjJ,p} is said to be "strictly stationary" if for any {tl, ... ,tn } C T, {t l + h, ... ,tn + h} c T, and any {BkH c~, n = 1,2, ... , P{ Wl +h) E B h ... , Wn + h) E Bn} = Pg(td E Bl , ... , Wn) E Bn}. (1.5.15) In applications of strictly stationary stochastic processes, the parameter set T is usually R+ or R. If EI e(t)1 < 00, then from (1.5.15) it follows that Eg(t)} = E{W + h)} = m (constant). Similarly, if the second moment of the process exists, then it is also equal to a constant. Finally, set T = R+ and let s, t E T. Consider R(s, t) = E {e(s)W)}. Clearly, R(s, t) due to (1.5.15) = R(t, s). Assume for the sake of definiteness that s < t. Then Eg(s)W)} = E{e(s When we set h (1.5.16) = + h)W + h)}. (1.5.17) -s in (1.5.17) we obtain R(s, t) = R(O, t - s). The function (1.5.16) is often called the covariance function. Definition 1.5.7. A real or complex r.v. Z on {n,gjJ,p} is said to be "second order" if EIZI 2 < 00. The family of all second-order r.v.'s on {n,gjJ,p} is denoted by Definition 1.5.8. A stochastic process {W); t if W) E L 2 {n,gjJ,p} E T} on {n, gjJ, P} is second order TIt E T. (iv) Wide sense stationary processes. There exists a large class of problems in the theory of stochastic processes and their applications, whose solution requires only knowledge of the first two moments of a process. In such a case, problems are considerably simplified if these moments are invariant under time-shift. Definition 1.5.9. A second-order stochastic process { e(t); t E T} on a probability space {n, gjJ, P} is said to be "wide sense" stationary if, for all t E T, E { W)} = m (constant) and (1.5.18) R(t, t + t) = R(O, t). 12 1. Basic Concepts and Definitions It is clear that the second-order strictly stationary stochastic process is a fortiori wide sense stationary. The time invariance of covariance function with respect to time-shifts implies that the methods of harmonic analysis may playa useful role in the theory of wide sense stationary processes. This will be discussed in some details later. (v) Martingales. Let g(t);t E T} be a real stochastic process on {n,~,p} such that E{le(t)l} < 00 for all t E T. Definition 1.5.10. The stochastic process e(t) is said to be a martingale if, for every tl < t2 < ... < tn in T, E{ Wn)IWd, ... , Wn-d} = Wn-l) (a.s.). (1.5.19) (a.s.), (1.5.20) (a.s.), (1.5.21) If Eg(tn)le(td,···,Wn-l)}:::;; Wn-d the process W) is supermartingale. Finally, if Eg(tn)le(t1),···,Wn-d} ~ Wn-d the process e(t) is said to be a submartingale. 1.6. Some Applications There is a wide variety of applications of stochastic processes. For some of them all that is needed is calculation of one or more parameters of a stochastic process which is of direct interest in the given problem. Some other applications require very sophisticated theoretical methods of great complexity. This book deals with elements and methods of the theory of stochastic processes on an intermediate level, with strong emphasis on their applications. Successful use of the theory depends a great deal on successful modeling. For this reason, it is worthwhile to make a few remarks about it. 
Roughly speaking, mathematical modeling is the formulation of a mathematical system designed to "simulate" behavior of certain aspects of a physicalor biological phenomenon. For instance, Newton's law of motion of a body falling freely under gravity may serve as an example of a mathematical model. A mathematical model represents our conceptual image of the phenomenon that we observe and as such it reflects our quantitative understanding of a situation. Its formulation is always based on a certain number of assumptions concerning the fundamental nature of the phenomenon we investigate. If these assumptions are of a general nature, we may finish with a model of great mathematical complexity. On the other hand, too many simplifying assumptions may mean considerable restrictions on the model's ability to provide a sufficiently detailed and accurate description of the system being modeled. 1.6. Some Applications 13 The formulation then of a mathematical model must be a compromise between these two extremes. For this reason one usually talks about "a" mathematical model rather than "the" model. In the rest of this section, a number of examples illustrating mathematical modeling of various systems will be discussed in some details. 1.6.1. Model of a Germination Process From an agricultural viewpoint, germination is a process which begins when a dry seed is planted in wet soil and ends when the seedling emerges above the ground. The duration of this process is a r.v. T such that 0 < T ~ 00. The stochastic nature of the quantity T is a result of the fact that the water uptake (by imbibition) is a purely physical process based on diffusion of water through a porous medium (the seed coat). In addition, the soil matrix potential, the seed soil contact area, and the concentration of soil moisture around the seed are additional factors contributing to random variations of T. Consider the following problem: Suppose, at time t = 0, N seeds of the same plant species are planted and denote by T1 ,· T2 , ••• , TN' their respective germination times. How many seeds will germinate in (0, t]? Denote by X(t) this number; it is then quite clear that X(t) is a stochastic process defined as follows: X(t) = N L I{T St} i=l I for t > 0 and X(O) = o. (1.6.1) We now make several assumptions. Our physical intuition is not violated by assuming that: a. {1i}~ is an i.i.d. sequence ofr.v.'s with a common distribution function H(t) = P{T ~ t} with H(O) = 0 and H(oo) = 1 - p. (1.6.2) Here, 0 ~ p < 1 is the probability that a seed may fail to germinate. From (1.6.1) and (1.6.2), we have P{X(t) = k} = (~)(H(t))k{1 - H(t)}N-k, (1.6.3) Thus, if H(t) is known, the probability (1.6.3) is completely determined. Now make the following assumption: b. The average proportion of germinations in (s, s + L\s) is approximately A. • L\s, where A. > 0 is a constant. Next, partition (0, t] into n subintervals of equal length L\t= tin. Then due to assumption b, the average number of germinations in (0, L\t] is approximately N(1 - p)A.L\t. 1. Basic Concepts and Definitions 14 Thus, the average number of nongerminating seeds at time M is N(I-p)(I-AM). The average number of germinating seeds in (M,2~t] is N(1 - p)(1 - AM)A~t, so that the average number of nongerminating seeds at time 2~t is N(I - p)(1 - AM)2, and so on. Continuing in this fashion, we conclude that the number of nongerminating seeds at time t is approximately N(I - p)(1 - A~tt. From this, we deduce that for n sufficiently large P{T:S; t} By letting M ---t ~ {I - (1 - AM)"}(1 - p). 
0, we obtain that H(t) = (1 - p)(1 - e- At ). (1.6.4) From this and (1.6.3), the required probability is now easy to obtain. 1.6.2. Modeling of Soil Erosion Effect Soil erosion is a result of perpetual activities of various geophysical forces acting on the earth's surface. The ensuing degradation of farmland may have profound impact on future food supplies. Because no experimental data exist, there is no choice but to resort to mathematical models in assessing future crop losses due to erosion. Here we present a simplified version of such a model. Surface erosion occurs by the migration of individual soil particles in response to forces such as wind, raindrop impact, surface runoff, and so on. Erosion reduces plant-available minerals and nutrients which are adsorbed in eroded particles. It also continuously decreases the root zone depth, which, in turn, reduces the water-holding capacity of the soil. This leads to decline of crop yields due to newly induced droughts that become more frequent and severe. Even under optimal conditions, the production of a particular crop fluctuates from one year to another in a random fashion. The mechanism responsible for these variations is quite complex. It involves climatic and meteorological factors (such as air temperature, wind, solar radiation, rainfall) soil characteristics, and other conditions. Due to these reasons, the effect of soil erosion on crop yield is not directly observable but becomes gradually evident over a long period of time. Having outlined the mechanism of the soil erosion process, we now embark on the problem of constructing a model to assess its effect on crop production. To this end consider a crop-producing area A and denote by {Y,,}f the sequence of annual yields of a given crop. If the area A is not 15 1.6. Some Applications affected by erosion and the same agricultural practice is followed year after year, we may assume that Pi} l' is an i.i.d. sequence of strictly positive r.v.'s on {n,~,p}, with a common distribution function such that E{yj} < Q(y) = P{Y ~ y} 00, i = 1,2. (1.6.5) If, on the other hand, A is subject to soil erosion, the resulting sequence of annual yields {Xn }1' consists of r.v.'s which are neither independent nor identically distributed. To determine this stochastic process, suppose that the soil erosion reduces annual yield by a certain percentage each year and denote by R j the percentage in the ith year. Then the loss suffered in the first year is Rl Yl·100=Yl·Ul. Thus, Xl = Yl - Yl · Ul = Yl(l - Ul ). The loss suffered in the second year is R2 Y2 (1 - Ud· 100 = Y2 (1 - Ud· U2 so that X 2 = Y2 (1- Ud - Y2 (1 - U l )U2 = Y2 (1 - Ud(1 - U2 ), and so on. Therefore, the crop yield in year n is Xn = Y" n Zj, n (1.6.6) j=l Our physical intuition is not violated by assuming that {Z;}1' is an i.i.d. sequence of r.v.'s independent of {y"}1', with commOn support in (0,1]. It seems appropriate to call Ln = n~ Zj "the loss-rate function." Notice that {L;}1' is a Markov chain. It is interesting to note that the quantity Ln = n~ Zj first appeared in a paper by Kolmogorov (1941) which is concerned with the modeling of a rock crunching process. 1.6.3. Brownian Motion The kinetic theory envisages a fluid as composed of a very large number of molecules moving with widely varying velocities in all directions and colliding with each other. In the early days of the theory, the most frequently asked questions were: Are molecules real? Do they actually exist and can we demonstrate their existence? 
The attempts to anSwer them led to an exhaustive study of an interesting phenomenon and shed much light on other kinetic properties. The phenomenon takes its name after the English botanist R. Brown who in 1827 noticed the irregular but ceaseless motion of the small particles, e.g., pollen suspended in a liquid. At first, the motion was thought to be of organic origin. After the advent of kinetic theory, it became clear that the only reasonable explanation for it lay in the assumption that the particles were 1. Basic Concepts and Definitions 16 subject to the continual bombardment by the molecules of the surrounding medium. Suppose at time t = 0, when our observation begins, the pollen was at the point x = O. Denote by ~(t) its position at time t > 0 [here W) denotes one coordinate of the particle]. The chaotic nature of this motion clearly implies that W) is a r.v. for all t > O. Thus, g(t); t ;;:: O} is a continuous parameter stochastic process. Examining the nature of the phenomenon, it seems reasonable to assume that the distribution of ~(t + s) - ~(t) does not depend on t if the temperature of the liquid remains constant. It also seems reasonable to assume that the change in position of the particle during the time interval [t, t + s] is independent of anything that happened up to time t. This implies that the process ~(t) has independent increments. Finally, the trajectories of the process should be continuous functions. We shall show that under these assumptions the stochastic process ~(t) is Gaussian (i.e., all the marginal distributions of the process are normal) with E{W)} =0, Varg(t)} = t. From this, we have for s < t In other words, E { ~(s)~(t)} = min(s, t). 1.6.4. Modeling of Dispersion in a Porous Medium Consider a column packed with a porous material saturated with a single incompressible fluid (say, fresh water) under convection (see Figure 1.3). Suppose that at time t = 0 a set of tagged (dynamically neutral) particles are introduced into the flow (across the section AB of the column). The transport of these particles in the direction of the flow is called "longitudinal dispersion in a porous medium." Porous medium "'j~ . . -1--- Tagged particle o· . li:. . Figure 1.3. Dispersion in a porous medium. 17 1.6. Some Applications We can construct a model of the dispersion under the following conditions: (i) the flow is steady and uniform; (ii) the porous structure is isotropic and homogeneous porous medium; (iii) tagged particles have identical "transport characteristics" and move independently of one another; (iv) there is no mass transfer between the solid phase and the fluid. A tagged particle in the flow through porous medium undergoes a particular kind of random walk. It progresses through pores and voids of the porous structure in a series of steps of random length, with a rest period of random duration between two consecutive steps. Denote by X(t) the distance ofparticie from the section AB at time t; then X(O) = 0:5; X(td:5; X(t 2 ) if 0 < t1 < t 2 • It is intuitively clear that {X(t); t ~ O} is a time homogeneous Markov process. Under conditions (i)-(iv), the marginal distribution (1.6.7) Ft(x) = P{X(t):5; x} provides the longitudinal concentration function of tagged particles in the column at any t ~ O. Calculation of (1.6.7) is based on the fact that X(t) can be approximated by a regular Markov jump process. 1.6.5. Queues A queueing system can be described as follows. Customers arrive at a service station (for instance, a post office, bank, etc.) 
with a certain number of servers. An arriving customer may have to wait until his turn comes or one of the servers becomes available. He leaves the station upon the completition of the service. To formulate a mathematical model of a queueing system, we must specify conditions under which the system operates. For instance; 1. in what manner do customers enter the system? 2. in what order are they served? 3. how long are their service times? Concerning the first question, we shall assume that the arrival time of the nth customer in the system is a r.v. 'n (we assume that '0 == 0 < '1 < '2 < ... ). Consequently, the number of customers who have entered the system by the time t is a random process {N(t); t ~ O} such that N(O) == 0 N(t): 0, 1,2, ... for t > ° and N(td :5; N(t2) if t1 < t 2. (1.6.8) Clearly, for any n = 0, 1, ... , 'n = inf{t;N(t) = n}, N(t) = max{n;'n:5; t}. (1.6.9) 1. Basic Concepts and Definitions 18 An arriving customer joins the end of a single line of people waiting to be served. The service is on a "first come first served" basis. When one of servers becomes free, he turns his attention to the customer at the head of the waiting line. This answers the second question. Finally, concerning the third question we, shall assume that the service times form an i.i.d. sequence ofr.v.'s {Un};", independent of {'l:n};". Here we shall consider queueing systems with a single server. Under certain conditions such a queue is completely described by the process {N (t); t ~ O} and {Un};". The state of the system at any time t ~ 0 is specified by the random process {S(t);t ~ O}, S(O) == 0, (1.6.10) which is the number of waiting customers, including one in the process of being served, at time t. What kind of information about S(t) is of interest to us? For instance, we would like to know something about the asymptotic behavior of S(t) when t -+ 00. When is S(t) asymptotically stationary? When is S(t) a Markov chain or when does it contain an imbedded Markov chain? The answers to these and other similar questions depend on the process N(t) and {Un};". Set T1 = '1: 1 and T" = 'l:n -'l:n- 1 for n ~ 1. Clearly, {T,J;" is the sequence of interarrival times. We shall assume that {1k};" is also an i.i.d. sequence of r.v.'s Set Fu(t) = P{U ::;; t}. (1.6.11) Under these assumptions, a queueing system is completely specified by these distributions. For this reason, it seems convenient to describe a queueing system in terms of the distributions (1.6.11). The most common notational scheme for this purpose is by a triple A/B/s, where A specifies Fro B specifies Fu , and s is the number of servers. For instance, M/M/1 means that FT(t) = 1- e- At, Fu(t) = 1 ~ e- act , s = 1. On the other hand, G/G/1 means that FT and Fu are some general distributions and s = 1, and so on. 1.7. Separability Let {e(t); t E T} be a real stochastic process on a complete probability space {n, 86', Pl. From the definition of a stochastic process, it follows that the mapping e(t,·):n-+R is 86'-measurable for every t E T. In other words, 1.7. Separability 19 {w; ~(t, w) E for each Borel subset B c R. However, B} E f14 n {w; W,w) {w; W,w) E B, t E T} = E B} leT need not be in f14 in general, unless T is countable. This then implies that functions like sup and ~(t) inf leT ~(t) leT may not be 8i-measurable either because, for instance, { SUP lET W) ~ x} = n g(t) ~ x}. 
lET Therefore, a large number of important functionals of a continuous parameter stochastic process (all those which depend on an uncountable number of coordinates) may not be random variables. A remedy for this situation is the separability concept introduced by Doob which will be defined next. Definition 1.7.1. The stochastic process {~(t); t E T}, where T c R is an interval, is said to be separable if there exists a countable dense subset D c T and a null event A. c Q such that {w; W, w) E C, tEl (") D} - {w; ~(t, w) E C, tEl (") T} c A. (1.7.1) for any closed set C and any open interval I. The countable set D is called the separant. Note that {w; ~(t,w) E C, tEl (") D} ::::> {w; ~(t,w) E C, tEl (") T}. From (1.7.1) it follows that N (") {w; ~(t, w) E C, tEl (") D} = N (") {w; W, w) E C, tEl (") T}, (1.7.2) which clearly implies that the right-hand side of (1.7.2) is an event because the left-hand side is so. The next proposition shows that (roughly speaking) every trajectory of the process {~(t); t E T} from N is as regular on T as its restriction on D. Proposition 1.7.1. For each wEN and open interval I c T, w(") T, w) = W (") D, w). (1.7.3) PROOF. Here ~(l (") T,w) denotes the closure inR of the set of values assumed by the trajectory ~( ., w) as t varies in 1(") T. The other set in (1. 7.3) has similar interpretation. To prove the proposition consider a closed subset C c R, we then have W (") D, w) c C <=> W (") D, w) c C. 20 1. Basic Concepts and Definitions On the other hand, from (1.7.2) we deduce that \fw E AC, Taking C = Wn WnD,w) c C-$>Wn T,w) c C. D, w), it follows from this that Wn T,w) c WnD,w) from which we conclude that (1.7.3) holds. This proves the proposition. From (1.7.3) we obtain that for each wEN and t ¢(t,w) E W n D,w) E D T, (1.7.4) for every open interval I containing t. This, on the other hand, implies that because, by definition, D is dense in T, for every t E T there exists a sequence {udf cD such that Uk --+ t. Then, for each wEN, lim ¢(u k , w) = ¢(t, w) (1.7.5) (the sequence {Uk} may depend on w). This can be seen as follows. Let tEl, then lim ¢(u b w) k .... oo E Wn D, w). Let us show that for every wEN and open set I sup ¢(t, w) = sup ¢(t, w). I (1.7.6) InD From its definition, it follows that the left-hand side of (1.7.6) is the upper bound of the set W, w) which belongs to it. Similarly, the right-hand side of (1.7.6) is the upper bound of W n D, w), and must belong to this set. From this and (1.7.3), the assertion follows. In the same fashion, one can show that inf ¢(t, w) = inf ¢(t, w). lInD Do separable processes exist? The following proposition due to Doob provides an affirmative answer to this question. Proposition 1.7.2. Every real-valued stochastic process {¢(t); t E T} defined on a complete probability space {Q, 86', P} has a separable version. We are not going to prove this proposition here. This is an important result which implies the following: For any real random process {¢(t); t E T} on a complete probability space {Q,ge,P}, there is a real random process {~(t); t E T} defined on the same probability space which is separable and stochastically equivalent to ¢(t). Note that ~(t) and ¢(t) may have different trajectories. This, however, is not the case if ~(t) and ¢(t) are both Cadlag (see Proposition 1.3.1). 1.8. Some Examples 21 Remark 1.7.1. If { W); t E T} is separable, then, as we have seen from (1.7.5), for any wEN and t E T ~(t, w) = lim ~(Uk> w), n->oo where {uk}i cD (depends on w) is such that Uk -+ t. 
In other words, the values of the sample functions on T are limits of sequences { ~(Uk' .)} i . 1.8. Some Examples In this section we present two examples. In the first one we consider a separable version of a nonseparable stochastic process. EXAMPLE 1.8.1. Let {n,~,p} be a probability space with n=[0,1], f!4 the u-algebra of Borel subsets of [0, 1], and P the Lebesgue measure. Let g(t); t E [0,1]} be a stochastic process on the probability space defined as follows: ~(t, w) = { 1 ift = w 0 if t # w. From this definition, it follows that {w;W,w) = O} = {w;w # t}. Hence, for any subset reT = [0,1], n {w;W,w)=O} {W;~(t,W)=O,tEr} = fer n{w; w = fer = {w; W E # t} = (u {w; w = t})C fer r}C = [0, 1] - r. From this, it follows that if leT is any open interval and D the set of all rationals, P{w;~(t,w) = 0, tEl} = 1 - P(I), P{w;~(t,w)=O,tElnD} = 1, which clearly shows that the process ~(t) is not separable. Now, let a(t); t E [0, 1]} be a stochastic process on the same probability space specified as follows: e(t,w) Then, we clearly have = 0 for all (t,w) E [0,1] x n. 22 1. Basic Concepts and Definitions {w;e(t,w) i= ~(t,w)} = {w;t = w} = {t}, which is a null event. Thus, ~(t) is an equivalent version of W). It is also easy to check that ~(t) is separable. 1.8.2. Let {W); t E T} be a real stochastic process on the probability space {RT, a3', P} (see Section 1.2 for definitions). Denote by EXAMPLE CT = {x(t); t E T} the set of all continuous functions with values in R. Clearly, Cn which is a subset of R T , is not an element of a3'. As matter offact, {w;e(',w) ECT } = J~l kQ It-Qllk {w;IW,W) - e(s,w)1 ~~} ¢ a3'. However, if the process is separable with a set D eTas separant and Tis closed, {w;eLw) E T} = C LLQ S.oD {w; IW,w) - e(s,w)1 Is-tl'; 11k ~ n (1.8.1) and as such is an element of a3'. In order that the realizations of a separable process are continuous with probability 1, it is necessary and sufficient that the probability of the random event (1.8.1) is 1. 1.9. Continuity Concepts In this section we define three types of continuity of a stochastic process. The first one is stochastic continuity (or continuity in probability). Let {W); t E T} be a real-valued stochastic process on {n, a3', P}. Definition 1.9.1. The process {e(t); t E T} is said to be stochastically continuous at a point to E T if, for any B > 0, lim P{IW) If (1.9.1) holds in every point to continuous on T. E Wo)1 > B} = O. (1.9.1) T, we say that the process is stochastically Remark 1.9.1. From the definition, we see that stochastic continuity is a regularity condition on bivariate marginal distributions of the process. As matter of fact, we have 23 1.9. Continuity Concepts ~(t) 4 I I I r---j 3 I 2 I r------tlI I I I I I I I I I I o Figure 1.4. A realization of ~(t). EXAMPLE 1.9.1. Let {T,.}f be an i.i.d. sequence of non-negative r.v.'s on a probability space {n,.1l,p} with H(t) = P{Y; :$; t} continuous. Set ~(t) Clearly, = n L I {Ti~t}' ;=1 t ~ o. (1.9.2) o :$; ~(td :$; ~(t2) for all 0 :$; tl :$; t 2 • The realizations of ~(t) are nondecreasing step functions with unit jumps at points where {T,,;}1 are order statistics for {Y;}7. In Figure 1.4 is a trajectory of ~(t). Although every sample function of ~(t) has discontinuities, the random process ~(t) is stochastically continuous on [0, (0) because due to Markov's inequality and continuity of H( . ), it follows that P{IW ± h) - as h --. 0+ for each t ~ n W)I > e} ~ -IH(t e ± h) - H(t)I--'O. O. Remark 1.9.2. Condition (1.9.1) is equivalent to ~(t) p --. ~(to) as t --. to. 
(1.9.3) From Example 1.9.1, we see that a process may be stochastically continuous even if each realization of the process is discontinuous. This is so when the probability that a discontinuity will occur in an instant t E T is zero. So, when is a stochastic process stochastically discontinuous at a point to E T? The answer is only if to is a fixed discontinuity point of the process. What is a fixed discontinuity point? 24 1. Basic Concepts and Definitions Definition 1.9.2. Suppose that {~(t); t E T} is real-valued and separable, and denote by Nto the set of sample functions which are discontinuous at to E T. If peNt) > 0, we say that to E T is a fixed discontinuity point of ~(t). Definition 1.9.3. Suppose that {~(t); t E T} is real-valued and separable. The stochastic process is continuous (a.s.) at a point to E T if the set of realizations discontinuous at to is negligible. If the process is (a.s.) continuous at every t E T, we say that the process is (a.s.) continuous on T. Remark 1.9.3. It is apparent that every separable process continuous (a.s.) at a point t is also continuous in probability at t [in this case peNt) = 0], but the converse is false, in general. Definition 1.9.4. A stochastic process {~(t); t E T} is said to have (a.s.) continuous trajectories if the set of those trajectories which have discontinuities on T is negligible. If {~(t); t E T} is (a.s.) continuous at each t E T, it does not necessarily imply that it has (a.s.) continuous trajectories. Indeed, the set of all those trajectories of the process without discontinuities on Tis NC -- n NtC tET and this event may not have probability 1. 1.9.2. Let {n,~,p} be a probability space with n = [0,1], the u-algebra of Borel subsets of [0, 1], and P the Lebesgue measure. Let {~(t); t E [0,1]} be defined as follows: EXAMPLE ~ o e(t,w)= {1 (see Figure 1.5). Let r = if t < w ift~w [0, s]; then o Figure 1.5. Graphical depiction of W, ro). 25 1.10. More on Separability and Continuity {m;e(t,m) = O,t E r} = [0,1] - r, so that P{m;e(t,m)=O,tEr} = I-s. On the other hand, if D is the set of all rationals P{m;e(t,m)=O,tEDnr} = I-s. This shows that the process is separable. In addition, for t so that P(Nt ) = 0. However, N = U Nt = Q E T, Nt = {m; m= t} so that P(N) = 1. tET The following theorem by Kolmogorov gives a sufficient condition for (a.s.) continuity of sample functions of a separable stochastic process. ° Proposition 1.9.1. Let { e(t); t E T} be a real-valued separable process and T a compact interval. If there exist constants ex > 0, p > 0, C > such that (1.9.4) then almost all sample functions of e(t) are continuous on T. 1.10. More on Separability and Continuity Given a real-valued process, there will be, generally speaking, several equivalent versions, some of which are separable. This permits us to choose a separable version whose sample paths have some nice properties. Let {e(t); t E T} be real-valued and separable with D c: T as a separant. Then as we have seen, there exists a negligible subset A c: Q such that, for each mEN, the value of e(·, m) at every t E T is the limit lim Wn,m), n-+oo where {t n} c: D and tn -+ t as n -+ 00. The sequence {t n} i, in general, depends onm. Consider a real-valued stochastic process {e(t); t E T} on a complete probability space {Q, 91, P}, without discontinuity points of the second kind. In other words, e(t - 0, m) and e(t + 0, m) exist for all t E T and mEn. 
The next simple proposition shows that in such a case we can always choose a version e(t) of ,(t) which is continuous from the right if e(t) is stochastically continuous. Proposition 1.10.1. Let {e(t); t E T} be real-valued stochastically continuous random process on a complete probability space {Q, 91, P} and T c: R a compact 26 1. Basic Concepts and Definitions interval. If the process does not have discontinuities of the second kind, there exists a separable version ~(t) equivalent to ~(t) whose sample functions are continuous from the right. PROOF. Choose a separable version and denote by B the random event that the limit exists for every t and consider E T. Let us show that P(B) = 1. To this end, let t T be fixed E Because the process is separable, this limit exists for all wEN, where A is a null set. Therefore, P(Bt ) = 1 for each t E T. On the other hand, due to the separability assumption, B = n tEDnT Bt=P(B) = 1. Next, set ~(t,W) = if wEB, and ~(t, w) {w; = ¢(t, w) lim n--+oo ~(t + !,w) n if WE Be; then ~(t) f= ~(t)} = rQ {W; I~(t) - ~(t)1 > n Hence, P{W; ~(t) f= W)} = !~~ p( {I~(t) - ~(t)1 > On the other hand, P{I~(t) - ~(t)1 > fI B. n fI B). npCQ L\ {I~(t n-~(t)1 n) !~~ peOn {I~(t n-~(t)1 n) = = + + ~ !~~ p{I~(t +~) - ~(t)1 > due to stochastic continuity. Therefore, P{w;~(t)f=W)}=O, VtET. > n > = 0 1.11. Measurable Random Processes Finally, because \fm E Band t E T e(t) = it follows that {e(t); t assertion. E 27 W + 0), T} is continuous from the right. This proves the D The following result will be needed in the sequel. Proposition 1.10.2. Let {~(t); t E T} be a real-valued stochastic process and T eRa compact interval. A necessary and stifficient condition for ~(t) to be continuous in probability on T is that (1.10.1) sup P{I~(t)-~(s)l>e}-+O It-sl<h as h -+ 0 for any e > O. PROOF. Sufficiency of (1.10.1) is clear. To prove its necessity, assume that ~(t) is stochastically continuous on T. Then for any fixed e, b > 0 and u E T, there exists h > 0 so that sup P{IW) - ~(u)1 > e} < b. It-ul<h T, with U 1 < U 2 < ... < Un' be a sequence such that T c h, Ui + h). If I is an open interval contained in one of (u i - h, Ui + h), then, for any s, tEl, Let {u;}~ c U~ (u i - {IW) - ~(s)1 > 2e} c {I~(t) - ~(u)1 > e} U {I~(s) - ~(u)1 > e}. This implies the inequality P{IW) - ~(s)1 > 2e} ~ 2b, which holds as long as It - s I < h. The proof now follows by letting b -+ O. D 1.11. Measurable Random Processes Let R be the real line and gJ the a-algebra of Borel subsets of R. In the following we shall often use the notation gJc=gJnC= {BnC;BE9P}. (1.11.1 ) Clearly, gJc is a a-algebra, such that gJc c gJ if C E 9P. Let {~(t); t E T} be real-valued stochastic process on a probability space {Q, 86', P} and T c R an interval. From Definition 1.1.1, we know that ~(t, .) is a 86'-measurable function on Q for every t E T. From this, however, it does not follow that the mapping 1. Basic Concepts and Definitions 28 ~(-, is fYlT .): T x n -+ R (1.11.2) x 8I-measurable. We now give the following definition: Definition 1.11.1. A real-valued stochastic process {~(t); t E T} is said to be measurable if the mapping (1.11.2) is fYlT x g6'-measurable, i.e., if {(w, t); W, w) E B} E fYlT X g6', for every BE fYI. {~(t); t E T} is a measurable process. Then from Fubini's theorem, it follows that for almost all WEn the sample functions ~(', w) are fYlT-measurable. To motivate the consideration of the measurability concept, suppose that for a measurable process ~(t), we have Remark 1.11.1. 
If ff Tx(l 1~(t, w)1 dt P(dw) < (1.11.3) 00. Again invoking Fubini's theorem, we obtain ff Tx(l '~(t'W)'dtP(dW)=fT E{I~(t)l}dt= J(lr p(dW)fT 1~(t,w)ldt. Thus, t E{I~(t)l} dt < 00 and X(w) = tl~(t'W)ldt< 00 (a.s.), where X(·) is a r.v. on {n, 81, P}. Next we give an example of a measurable process. 1.11.1. Let gd~ be a sequence of r.v.'s on {n,g6',p}. Let g(t);t E [a,b]} be a stochastic process defined as follows: Set a < t1 < t 2 < ... < tn < b and write EXAMPLE W) t E [a,td; = ~o, ~(t) = ~n' t E [tn,b] and ~(t) = ~i' t E [ti' t i+1), i = 1,2, ... , n - 1. To show that the stochastic process is measurable, let {(t,w);W,W)Er} = r E fYI and consider [a,t 1) x {W;~oEr}U[t1,t2) x {W;~1 Er} U '" U [tn,b] x {w;~n E B}. 29 1.11. Measurable Random Processes Clearly, then (T = [a,b]) {(t, (0); W, (0) E r} E at T X f!4, so that ~(t) is measurable. Next, we give some sufficient conditions which ensure measurability of a random process. Proposition 1.11.1. Let {~(t); t E T} be a real-valued stochastic process and T eRa compact interval. If ~(t) is continuous in probability, there exists a version ~(t) which is separable and measurable. PROOF. Due to continuity in probability of W), we have (see 1.10.1) sup P{I~(t) - ~(s)1 > e} It-·I<h --+ 0 as h --+ O. (1.11.4) Now set T = [a,b] and choose to < ti < ... < t:n = a= b such that 1 p{le(u) - e(v)1 > e} < 2 n (1.11.5) for all u, v E [tj-l' tj]. Suppose now that for every n, {tj} c {tj+l} and let L, consisting of all the tj, be dense in T. Define ~n(t,oo) = Wj,oo) for tj::::;; t < tj+l' From Example 1.11.1 we clearly see that en(t,oo) is a measurable process for every n = 1, 2, .... Next, from (1.11.5) we infer that P{IW) - en(t) I > e i.o.} = 0, so that ~n(t) --+ e(t) (a.s.), Vt E T. Further, because {en(t); t E T} is a sequence of measurable processes, e(t) = lim sup en(t) n-+oo d is a measurable process such that e(t) = e(t). Finally, from the definition of e(t) and en(t), it is apparent that e(t) = en(t) for all tEL. On the other hand, for every t E T and 00 E n, e(t) is the lim sup of {e(tt)} for a sequence c L which increases to t. From this and. 0 Definition 1.7.1, it follows that ~(t) is separable. - {trnl 30 1. Basic Concepts and Definitions Problems and Complements 1.1. Let {n, fll, P} be a probability space. Let p be a function on fll x fll defined by p(A,B) = P(A 6 B), where A 6 B = (A - B) u (B - A). Show that, for any C E fll, peA, B) ::s; peA, C) + pCB, C). 1.2. Show that for any two random events A and B IpeA) - P(B)I ::s; peA, B). 1.3. Let Xl and X 2 be independent real-valued r.v.'s on a probability space {n, fll, P} with the distribution functions (d.f.'s) Hl (-) and H 2 (-), respectively. Let {~(t); t ~ O} be a stochastic process defined by tX l ~(t) = + X2. Calculate peA), where A is the set of all nondecreasing sample functions of the process. 1.4. Let {~(t); t ~ O} be a stochastic process defined by X ~(t) = + at, a > 1, where X is a r. v. with the Cauchy distribution. Let D c [0, 00) be finite or countably infinite. Determine: a. P{ ~(t) = 0 for at least one tED}, b. P{ ~(t) = 0 for at least one t E (1,2]}. 1.5. Let X and Y be r.v.'s on {n, fll, P}, where Y ~ N(O, 1) (standard normal). Let {~(t); t ~ O} be a stochastic process defined by ~(t) = X + try + t). Let A be the set of sample functions of ~(t) nondecreasing on [0,00). Show that A is an event and determine peA). 1.6. In Problem 1.5 denote by B the set of sample functions which are nonincreasing in [0,1]. Show that B is an event and determine PCB). 1.7. 
Let Xl and X 2 be independent r.v.'s on {n,fll, P} with common standard normal dJ. Let {~(t); t ~ O} be a stochastic process defined by ~(t) = (Xl + X 2 )t. Determine F,t .... "Jx l , ... , xn). If A is the set of all non-negative sample functions, show that A is an event and determine peA). 1.8. Let {~(t); t ~ O} be a stochastic process defined by ~(t) = X cos(t + U), where X and U are independent r.v.'s U is uniform in [-n,n], and E{X} Determine its covariance function. = O. Problems and Complements 31 1.9. A stochastic process {W); t E T} is called normal (or Gaussian) if all its marginal distributions are Gaussian. Set x(t) = E{ W)}, then ~(t) ~ N(x(t), C(t, t» for all t E T. Assume that x(t) = 3, Show that P{ ~(t) :s; 2} ~ .309. 1.10. Let {W); t ~ O} be random process with ~(O) = 0 (a.s.) and marginal probability densities (to = Xo = 0) {' Jt, •...• ,. (x 1 ,···,x") -. _ 0" [2n(ti 1=1 ti-d] -1/2 (1 (XI - exp --2 ( X i _ 1)2) ti - t i - 1 ) (0 < t 1 < ... < tn). Determine E{W)}, E{e(s)W)} = C(s,t). 1.11. Let X and U be independent r.v.'s on {n,~,p}, U is uniform in [0,2n], and the probability density of X is defined by fx(x) = Let {W~ t ~ { 0 2X3 e(-1/2x4) X ~ 0, x< 0. O} be defined by W) = X 2 cos(2nt + U). Show that W) is a Gaussian process [i.e., every random vector (~(tl)' ... , is normally distributed; see Chapter 4]. ~(t.» 1.12. Let {W);tE T} and {X(t);tE T} be real stochastic processes on {n,~,P}. If they are stochastically equivalent, show that they have identical marginal distributions. Under what conditions will they have the same sample functions? 1.13. Let {Xi}!' and {Yi}l be r.v.'s on {n,~,p} such that E{Xi} = E{Yi} = 0, Var{Xi } = Var{Yi} = lIt < 00, E{XiXJ } = 0, E{Yilj} = 0, and E{Xi lj} = 0 for all i j. Let {W); t ~ O} be defined by *' W) = n L {XJCOSAjt + ljsinAA· j=1 Is the process wide sense stationary? Determine its covariance function. 1.14. Let {Xn}~ be an i.i.d. sequence ofr.v.'s with E{Xd = 0 and Var{X;} = 112 < 00. Let {N(t);t ~ O} be a homogeneous Poisson process with E{N(t)} = At and independent of {X.}~. Is the stochastic process {W); t ~ O} defined by ~(t) = XN(t) 32 1. Basic Concepts and Definitions strictly stationary? [A non-negative integer-valued random process with independent increments {N(t); t ~ O} is homogeneous Poisson process if 0 ~ N(tl) ~ N(t2) for all 0 ~ tl ~ t2 and P{N(t 2) - N(td = k} = exp[ -A(t 2 - tdJ [A(t2 - tl)]k k! ; see Chapter 2]. 1.15. Let Z be a Bernoulli r.v. such that P{Z=l}=p, P{Z=-l}=q, p+q=1. Let {N(t); t ~ O} be a homogeneous Poisson process with parameter A, independent of Z. Consider the stochastic process {~(t); t ~ O} defined by ~(t) = (_1)1/2(I-Z)+N(t). Determine Pg(t) = I}, 1.16. Let {~(t); -00 < t < oo} be a random telegraph process defined as follows: assumes only two values, -1 and + 1, with equal probabilities Pg(t) = I} =!, Pg(t) ~(t) = -I} =!. The sequence of transition times {Tk}~oo forms a homogeneous Poisson process with parameter A > O. In other words, the number of transitions (from -1 to + 1 and vice versa) in (u, u + tJ is a homogeneous Poisson process N(t). Show that the process is wide sense stationary and determine its covariance function. 1.17. Let {~(t); t E 9l} be a sample continuous wide sense stationary stochastic process. Determine 1.18. Let {~(t); t ~ O} be a process with independent increments. Assume that <PI(t,A) = E{ei.lWl} and <P2(tI,t 2,A) = E{ei.l[~(t2)-Wl)]} are given. Determine the characteristic function 0f (~(td, ... , ~(tn)) in terms of <PI and <P2· 1.19. 
A stochastic process with independent increments is said to be homogeneous if W + s) - ~(s) ~ W) for all s, t ~ O. In other words, W + s) - ~(s) and ~(t) have the same distribution. Show that the characteristic function <pdt, A) is infinitely divisible if continuous at t = O. 1.20. (Continuation) Let a stochastic process with independent increments {~(t); t ~ O} be homogeneous. If <p(t; A) is continuous at t = 0 for all A, the process is stochastically continuous. E T} be a real stochastic process on a complete probability space {Q,ai,P}, Dc T countable and everywhere dense, and A c Q negligible. If, for any OJ r/= A and t E T, there exists {t n}'[' C D, tn --+ t, such that W., OJ) --+ ~(t, OJ), show that the process is separable. 1.21. Let {~(t); t Problems and Complements 33 1.22. If {W); t E T} is separable and f: R is separable. --+ R continuous, show that {f( W)); t E T} 1.23. If { ~(t); t E T} is such that ~(., w) is continuous on T for all wEN, where A c is negligible, show that the process is separable. n 1.24. Let {~(t); t E T} be stochastically continuous on T and f: T --+ R. Then X(t) = ~(t) + f(t) is stochastically continuous in those and only those points of T where f(t) is continuous. 1.25. Let { ~(t); t E T} be a family of i.i.d. r.v.'s with common probability density f(·). Show that ~(t) cannot be stochastically continuous at any point t E T. 1.26. Let {W); t E T} be stochastically continuous at every t stochastically continuous if'll: R --+ R is continuous. E T. Then 'I'(W)) is also 1.27. In the previous problem, show that <p(t) = E{'I'<e{t))} is a continuous function. 1.28. Let {W);tE T} be a stochastically continuous process on {n,~,p}. Let {X(t); t E T} be another process on {n,~, P} equivalent to W). Show that X(t) is also stochastically continuous. 1.29. A stochastic process {~(t); t E T} is said to be "bounded in probability" (or stochastically bounded) if lim sup P{IW)I > C} = C-++oo reT o. If ~(t) is stochastically continuous on T = [a, b], then (.) holds. 1.30. Let {W); t E T} be a process with independent increments. If Varg(t)} is a continuous function in t, the process is stochastically continuous. 1.31. Let {W); t E [0, 1]} be a standard Brownian motion [i.e., increments and for any s, t + S E [0,1] with t > 0 P{W + s) - W) ~ x} = (2ns)-1/2 too W) has independent e- z2 / 2s dz]. Show that almost all sample functions of ~(t) are continuous. 1.32. Let {W); t E [0, 1]} be a real Gaussian process with E{ W)} = 0 and covariance function C(s,t) = min{s,t} - st. Show that its sample functions are (a.s.) continuous on [0,1]. 1.33. Let {~(t); t E [a, b]} be a real process with (a.s.) continuous sample paths. Show that ~(t) is measurable. 1.34. If all the sample functions of a real stochastic process are Borel measurable, does this imply measurability of the random process? CHAPTER 2 The Poisson Process and Its Ramifications 2.1. Introduction We begin by describing in an informal fashion the subject matter of this chapter. The part of the general theory of stochastic processes dealing with countable sets of points randomly distributed on the real line or in an arbitrary space (for instance, Cartesian d-dimensional space) is called the "Theory of Point Processes." Of all point processes, those on the real line have been most widely studied. Notwithstanding their relatively simple structure, they form building blocks in a variety of industrial, biological, geophysical, and engineering applications. 
The following example describes a general situation which in a natural fashion introduces a point process on a line. 2.1.1. Let A be an event which reoccurs repeatedly in time so that the time interval between any two consecutive occurrences of A has a random length. Assume that our observation of the event A begins at time t = 0 and that 0 < tl < t2 < ... are instances of the first, second, etc., occurrence of A. These instances may be regarded as a set of points randomly distributed on the half-line R+ = [0, 00). An alternative interpretation of this situation is that the event A occurs at points of a randomly selected countable subsets co c R+, where co: (tl,t2' ... ). EXAMPLE The random event A may be the rainfall occurrence at a given site. Due to stochastic nature of the rainfall phenomenon, its arrival times at a given location may be regarded as points randomly distributed on R+. Other specific cases of A could be the earthquake occurrences, volcanic eruptions, flooding of a city, and so on. 2.2. Simple Point Process on R+ 35 o Figure 2.1. Crossings of the level Xo by the process e(t}. Point processes generated by crossings of a fixed level by a stochastic process are of particular interest in engineering. The interest in crossing problems dates back to the original paper of M. Kac (1943) and of S.O. Rice (1945). As an example in hydrology, consider the discharge rate ~(t) of a streamflow at a given site. Because the surface runoff flows vary randomly in time, ~(t) is a continuous parameter stochastic process. The crossing points of a fixed level Xo by ~(t) are randomly distributed (see Figure 2.1). Of particular interest in various applications is asymptotic behavior of this point process when Xo -+ +00. 2.2. Simple Point Process on R+ Set R+ = [0, 00) and denote by fY/ + the a-algebra of Borel subsets of R+. Let g be the collection of all infinite countable subsets of R+ which are locally finite and do not contain zero as an element. In other words, every of the form 0< W: (t 1 ,t2 , ... ), < t1 t2 < "', WEn is (2.2.1) and such that for any compact K c R+ #(w (\ K) < 00, where # denotes the number of elements in the set. It is clear from this definition that tn -+ +00 as n -+ 00. Let eX<·) be the Dirac measure on fY/+ concentrated at x, i.e., VB E fY/+, 1 if x ex(B) = { 0 E ifx ~ B B, and consider the Borel-Radon measure Jl{-) = Le te ro t (·)· (2.2.2) 2. The Poisson Process and Its Ramifications 36 This measure is called a "point measure" on R+. It clearly takes only nonnegative integer values and Jl(K) < 00 on any compact K c R+. Note that every WEn is a closed subset of R+ because its intersection with any compact subset of R+ is a finite set. Let {"t"n}f be a sequence of coordinate mappings on n, i.e., specified as follows: for any w: (t1' t 2 , ••• ), "t"n(w) = tm n = 1,2, .... (2.2.3) From this it readily follows that on n 0< "t"1(-) < "t"2(-) < ... (2.2.4) and that "t"n(·) -+ 00 as n -+ 00. Finally, we denote by fJI the least u-algebra of subsets ofn with respect to which all"t"n are measurable. We now give a definition of a point process on R+. Definition 2.2.1. A random measure 11 on 91+, n -+ N+, 11: 91+ x where N+ = {O, 1, ... }, defined by, VB E ~+ (2.2.5) is called a "point process" on R+. From (2.2.3) and (2.2.5), it is clear that for every fixed WEn, 11(B, W) = Jl(B), where Jl(.) is a point measure defined by (2.2.2). In addition, for any t E R+ and WEn, 11({t},W)= {o1 ift¢w ·f 1 t E W, (2.2.6) which means that the point process does not have multiple points. 
Such a point process is called "simple." When B = (0, t], we will write N(t) = 11«0, t], .), N(O) == O. (2.2.7) From (2.2.5) we readily have that L I(o.r)("t"k). 00 N(t) = k=l (2.2.8) The stochastic process {N(t); t ~ O} is called the "counting random function" of the point process 11. It is easy to see from (2.2.8) that every realization of 38 2. The Poisson Process and Its Ramifications Because {/;}1 are independent r.v.'s, we readily obtain from (2.3.2) that pt~ Ii > k} ~ pt~ Ii > k - ,~ P{:~ Ii = I} 0,1, = I}. (2.3.3) On the other hand, pt~ Ii > o} = 1 = P{/l = O, ... ,In = O} 1 - P{/l = O, ... ,In- l =O}+P{/l =O, ... '!n-l =O,In= I} = ... = 1 - P {/ 1 = O} + P {/ 1 = 0, 12 = I} + ... From this and (2.3.3) we obtain the recursion pt~ Ii > k} ~ ptt Ii > k- 1}pt~ Ii> O} which proves (2.3.1). D Let X and Y be r. v.'s on a probability space {n,!JI, P} with values in some abstract space {S, 2}. Definition 2.3.1. The total variation distance d(X, Y) between X and Y is defined by d(X, Y) = IlPx - Pyll = sup IPx(D) - Py(D)I, DE !l' (2.3.4) where Px and P y are, respectively, the distributions of X and Yon {S, 2}. Remark 2.3.1. When X and Yare real-valued r.v.'s, the metric do(X, Y) = IlEx - Fyll = sup IFx(x) - Fy(x) I x (2.3.5) is often useful, where Fx and Fy are the distribution functions (d.f.'s) of X and Y, respectively. Clearly, do(X, Y) ~ d(X, Y). Finally, if X and Yare integer-valued, 1 <XJ d(X, Y) =:2 k~O IP{X = k} - pry = k}l. (2.3.6) The next result is known as the coupling lemma. Lemma 2.3.2 sup IPx(D) - Py(D) I ~ P{X "# Y}. DE !l' (2.3.7) 37 2.3. Some Auxiliary Results N(t) i i I r----I I I Tl T2 I I 0 I T3 T4 Figure 2.2. A sample function of N(t}. N(t) is a nondecreasing step function, continuous from the right, with unit jumps at each of its discontinuity points. In Figure 2.2, a sample function of N(t) is depicted. The r.v.'s {'t"n}'1 are usually called "arrival times" of the point process. The following relation between the counting function N(t) and its arrival times 't"n is easy to see: {'t"n ~ t} = {N(t) ~ n}. (2.2.9) Let P be a probability measure on {n,.1l} and set A(t) Fn(t) = P{'t"n ~ t}, = E{N(t)}. (2.2.10) Then from (2.2.8), it clearly follows that A(t) = 00 L Fk(t). k=l (2.2.11) 2.3. Some Auxiliary Results In this section, we discuss some inequalities involving sequences of independent Bernoulli r.v.'s. These inequalities, which are of independent interest, will considerably simplify many proofs in forthcoming sections. Lemma 2.3.1. Let {Ik}1 be independent Bernoulli r.v.'s; then, (2.3.1) PROOF. First, we have the following equality: n P { i~ Ii > k } {'-1 n n } =,~ P i~ Ii = 0,1, = 1, i~' Ii > k - 1 . (2.3.2) 40 2. The Poisson Process and Its Ramifications Then = pry; ~ 2} + pry; = O,l i = I} = 1 - e- P; - Pie-P; + e-P;[l - e P;(1 P{Ii"# Y;} ::; 1 - Pi(1 - Pi) - (1 - Pi) = pr. - p;)] (2.3.10) This and (2.3.9) prove the assertion. 0 2.4. Definition of a Poisson Process In this section we discuss one of the most important point processes on R+, the Poisson point process. We first give its definition. Definition 2.4.1. A simple point process on R+, with counting random function {N(t);t ~ O}, is called a "Poisson process" if a. {N(t); t b. {N(t); t ~ ~ O} has independent increments, O} is stochastically continuous. As before, set A(t) = Lemma 2.4.1. For all t E{N(t)}; we shall show that for all t < ~ 00, A(t) < 00. 0, P{N(t) > O} A(t)::; 1 - P{N(t) > O}' (2.4.1) 0= t no < tnl < ... < tnn = t (2.4.2) PROOF: Consider so that max (tni - tn.i-d -+ i as n -+ 00. 
(2.4.3) 0 Set (2.4.4) '¥(s) = 1[1,oo)(s); then we clearly have n L ,¥(N(tni) n-oo i=l N(t) = lim N(tn,i-l)) (a.s.). (2.4.5) This follows from the stochastic continuity of N(t) and the fact that the sum on the right-hand side of (2.4.5) is nondecreasing in n. Next, write (2.4.6) Clearly, {'¥ni}~ is a sequence of independent Bernoulli r.v.'s. Thus, according 39 2.3. Some Auxiliary Results PROOF. We have P{X ED} - pry ED} = P{X E D} S; for all D E P{X E - pry E D,X ED} - P{Y E D,X D, Y f. D} S; f. D} P{X #; Y} !l', which proves the assertion. D Remark 2.3.2. Concerning the coupling inequality (2.3.7), the following should be pointed out. We can use any joint distribution of (X, Y) in inequality (2.3.7) as long as its marginals are Px and Py• Thus, to get a sharp bound in (2.3.7), we will select, from all joint distributions of (X, Y) with the same marginals Px and Py, one that has the least probability, P{X #; Y}. Application of the coupling lemma hinges on our ability to calculate P{X #; Y}, which is not always a simple task. The next result in this respect is of particular interest to us. Its purpose is to determine simple exact upper bounds for the distance do(X, Y) and for d(X, Y), in the case when X is a sum of independent Bernoulli r.v.'s and Ya Poisson r.v. suitably chosen to approximate X in distribution. Lemma 2.3.3 (Le Cam). Let {I;}1 be independent Bernoulli r.v.'s with E{I;}=Pi' and Ya Poisson r.v. with E{Y} = i=1,2, ... Ii Pi. Then dCt Ii' Y) S; it (2.3.8) pf. PROOF. We can write Y = Y1 + ... + Y,., where li are independent Poisson r.v.'s with E{li} Coupling Lemma 2.3.2, we have dCt Ii' Y) = dCt Ii' = Pi. Then from the it li) pLt it li} S; Ii #; (2.3.9) To evaluate P{I i #; li}, several methods are available. The following one is due to Serfling (1975). Let Zi also be a Bernoulli r.v. independent of li and such that Set 2.4. Definition of a Poisson Process 41 to Lemma 2.3.1, we have Finally, by letting n -+ 00 in this inequality and invoking (2.4.5), we obtain P{N(t) > k} ::s; (P{N(t) > 0})k+1 or [see (2.2.9)], P{"t"k+1 ::s; t} ::s; (P{N(t) > O})k+1. o This and Equation (2.2.10) prove the assertion. Corollary 2.4.1. From (2.4.1), it clearly follows that A(t) is finite and continuous at every t ~ O. As a matter of fact, for any t ~ 0 and s ~ 0, it follows from the Lebesgue dominated convergence theorem and stochastic continuity of N(t) that lim {A(t + s) - A(t)} 5-+0 = E {lim (N(t + s) - N(t))} 5-+0 = 0, which implies right continuity of A(t) at any t < 00. In the same fashion, one can prove continuity of A(t) at any t ~ 0 from the left. Set A(to,t 1) = A(t1) - A(to), O::S; to < t1 < 00; (2.4.7) then we have the following result: Proposition 2.4.1. For any O::S; to < t1 < 00 and n =,1, ... , P{N(t 1) - N(t o) = n} PROOF. = exp[ -A(to,t1)] {A(to,t 1W , . n. (2.4.8) Consider the partition of [to, t 1] where max k (t/l k - t/l,k-d -+ 0 as n -+ 00, and set '¥/li = ,¥(N(t/li) - N(t/l,i-1))' where '1'(.) is defined by Equation (2.4.4). As in the previous lemma, we have that L '¥/li -+ N(td /I i=1 N(t o ) (a.s.). Set P/li = P{'¥/li = I}. 2. The Poisson Process and Its Ramifications 42 Because by assumption N(t) is stochastically continuous, sup Pni::;:; sup P{N(tnJ - N(tn,i-1) ~ I} i -+ i as n -+ 00. Hence, Pni -+ 0 as n -+ Suppose now that as n -+ 00, (2.4.9) 0 uniformly in i. 
00 n I Pni -+ L(to, t 1) < 00, i=1 (2.4.10) and consider sequences {XniH of independent Poisson r.v.'s with E{XnJ = Pni' Invoking Le Cam's Inequality (2.3.8), we have However, due to (2.4.9) and (2.4.10), n n I P;i ::;:; max Pnk I Pni -+ 0 i=1 k i=1 as n -+ 00 and the assertion follows due to the fact that I d n i=1 X ni -+ Y, where Y has a Poisson distribution with E {Y} = L(to, t 1)' D Remark 2.4.1. The previous proposition was proved assuming that (2.4.10) holds. We shall now prove this hypothesis. Lemma 2.4.2. Under the conditions a and b of Definition 2.4.1, n lim n-+oo L Pni < 00. i=1 (2.4.11) PROOF. From the definition of 'link' it follows that there exists at least one integer v = 1, 2, ... , n such that P{ t k=1 'link = v} > O. (2.4.12) Because {'I'nkH are independent Bernoulli r.v.'s, we have p{ t 'I'nk=V} = k=l I"'I 1:5',;i 1 < ... < iv:5:n P{'I'ni 1=1,.··,'I'ni,=I,'I'ni'+1=0,.·.,'I'nin=0} = }] P{'I'ni=O} 1$i~'.'.: ~'$JJ P{'I'ni = 1} r < 07=1 P{'I'ni = O} (Li Pni)' - (1 - SUPiPni)" v! ID P{'I'nir =O} 2.5. Arrival Times {tk} 43 Now, invoking the inequality !ln P{'Pni = O} = iInJ (1 - ( n ) Pni) :::;; exp - ~ Pni , we obtain that P{ L 'IInk = V n } :::;; k=l L1 ) C exp (- nPni (L~ Pnit ,. v. Therefore, ifL7=1 Pni -+ 00 as n -+ 00, the right-hand side ofthe last inequality 0 converges to zero, which contradicts (2.4.12). This proves the lemma. Remark 2.4.2. When A(t) = At, A > 0, the Poisson process is said to be time homogeneous with mean rate A. Remark 2.4.3. Any Poisson process can be transformed into a time homogeneous Poisson process. Indeed, let A(t) = E{N(t)} and denote by A-I the right continuous inverse of A, i.e., for all u ~ 0, A- 1 (u) = inf{t;A(t) > u}. Because A(t) -+ 00 as t satisfies -+ 00, A(A -l(U)) (2.4.13) A-l(U) is defined for all u ~ 0 and, furthermore, = U, A(t) > u if t > A -1 (u). Therefore, the stochastic process {No(t); t ~ O} defined by No(t) = N(A -1 (t» is a homogeneous Poisson process with E{No(t)} = A(A -1 (t» = t. Remark 2.4.4. Any Poisson process is also a Markov process. 2.5. Arrival Times {'!d Let {Tn}! be the sequence of "arrival times" of a Poisson point process {N(t); t ~ O} with A(t) = E{N(t)}, and set Fn(t) = P{ tn :::;; t}. Then, from (2.2.9) and (2.4.8), we obtain that Fn(t) = 1 - exp[ - A(t)] 1 = r(n) where r(n) = (n - 1)! n-l [A(t)]k k=O k! L -- ft exp[ -A(s)] [A(s)]n-l dA(s), 0 (2.5.1) 2. The Poisson Process and Its Ramifications 44 An important property of the arrival times of a Poisson process can be described as follows. Given that exactly n events have occurred in [to, t 1]' these n points are distributed throughout the interval [to, t 1] as n points selected randomly (and independently) from this interval according to the probability distribution dA(s) (2.5.2) This is established in the following proposition. Proposition 2.5.1. Let to = So < Sl < ... < Sr and k 1, k 2 , ••• , kr non-negative integers. Set k1 = t1 be a partition of [to, t 1], + ... + kr = k; then (2.5.3) PROOF. This follows from the fact that {N(sJ - N(Si-1) = ki, i = 1, ... ,r} c {N(t1) - N(t o) = k}. 0 To this proposition one may also give the following interpretation: Let Z be a r.v. with support [to, t 1] such that P{Z < - t} = A(to,t) A(to ,t 1 )' to ~t~ t 1, (2.5.4) and consider n independent copies Zl, ... , Zn of Z. Denote by Zn1' ... , Znn the corresponding sequence of order statistics. Then, the following is a variation of Proposition 2.5.1. Given that N(t1) - N(t o) = n, the joint distribution of arrival time {,n~, to < 'f < ... 
< ': ~ t 1 is the same as of {Zn;}~' As a matter of fact, after some straightforward calculations, one obtains that, for any to < U 1 < ... < Un < t 1 , But then, as is well known, the right-hand side of (2.5.5) is the joint distribution of (Zn1' ... , Znn). The next example shows the usefulness of this result. EXAMPLE 2.5.1. For every t > 0 and n E ttl 'kIN(t) = n} = = 1,2, ... , (t - A~t) I A(S)dS) n. Indeed, according to Proposition 2.5.1 and Equation (2.5.4), we have 2.5. Arrival Times {'t"k} E{ f k=l 45 'kIN(t)=n}=E{ f k=l Znk}=E{ f k=l zk}=nE{zd=An() t Jit sdA(s). 0 As every process with independent increments, a Poisson process is also a Markov process. It is easy to see that, for any 0 < s < t < 00 and 0 ::::;; i ::::;; j, its transition probability is given by Pij(s, t) = P{N(t) = jIN(s) = i} = P{N(t) - N(s) = j - i}. (2.5.6) It is of some interest to investigate how the Markov stochastic structure of {N(t);t ~ O} reflects on the stochastic structure of the process {'n}1' and vice versa. In the rest of this section, this question will be discussed in some detail. But first, note that, under relatively mild regularity conditions on A(t), P{N(t + s) - N(t) ~ 2} = o(s) (2.5.7) as s --+ o. We now have the following result: Proposition 2.5.2. For all 0 < s < t and 0 ::::;; i ::::;; j, Pij(s,t) = P{N(t) = jl'j = s}. PROOF. For i = 1,2, ... , P{s < as ~s --+ (2.5.8) 'j : : ; s + ~s} = = P{N(s) o. Thus, Pis,t)P{s < 'j::::;; S + ~s) ~ i} i - 1, N(s + ~s) = i} + o(~s) P{N(s) ::::;; i - 1, N(s = + ~s} + ~s) = i} + o(~s) = P{N(s) = i - 1, N(s + ~s) = i, N(t) = j} + o(~s) = P{N(t) = j, s < 'j::::;; s + ~s} + o(~s). = Pij(s,t)P{N(s) = i - 1, N(s From this we conclude that Pij(s, t) = P{N(t) = jls < 'j::::;; s By letting ~s --+ 0 the assertion follows. + ~s} + 0(1). D Remark 2.5.1. The proof of (2.5.6) is based on two features ofthe process N(t): its Markov property and condition (2.5.7). Therefore, the proposition holds for any point process whose counting random function is a Markov process and which satisfies condition (2.5.7). 'Ii = '1 and 1'" = 'n - 'n-1 for n ~ 2. Then, if A(t) = A.t [i.e., N(t) is a homogeneous Poisson process], {T,,}1' is an U.d. sequence of r.v.'s with common dJ. P{T1 ::::;; t} = 1 - exp( -A.t). Corollary 2.5.1. Set 2. The Poisson Process and Its Ramifications 46 PROOF. The proof is quite simple. From (2.5.6), it follows that 1 - Pjj(s,s + u) = P{N(s + u)?: j + IIN(s) =j} = P{rj+l = P{1j+l :::;; ulrj = :::;; s + ulrj = s} s}. On the other hand, Pjj(s, s + u) = exp( - AU), which proves the assertion. D 2.6. Markov Property of N(t) and Its Implications Let {N(t); t ?: O} be a Markov process. The aim of this section is to show that in such a case the corresponding sequence of arrival times h}! is also a Markov process. The converse, however, does not hold in general. In other words, if {rJ! is a Markov process, it does not necessarily follow from this that N(t) has the Markov property. An exception is, of course, the Poisson process. In what follows we shall prove these two statements. Our elementary method of proof will require that the condition P{N(t + At) - N(t)?: 2} = o(At) (2.6.1) as At --+ 0 holds. Proposition 2.6.1. Let {N (t); t ?: O} be a Markov process with transition probability Pij(s, t). If condition (2.6.1) holds, then {r j }! is also a Markov process with transition probability (2.6.2) PROOF. From (2.5.8), it readily follows that 00 I Pik(S,t) = P{rj+l :::;; tlrj = s} k=j+l which proves (2.6.2). Next, + dt) = I} O}PO,l(t,t + dt). 
P{r 1 Edt} = P{N(t) = 0, N(t = We also have, for allj = P{N(t) = (2.6.3) 1,2, ... , + dt) = j + llrj = s} = Pjj(s, t)Pj,j+l (t, t + dt). P{rj+1 E dtlrj = s} = P{N(t) = j, N(t (2.6.4) 2.6. Markov Property of N(t) and Its Implications 47 Finally, for 0 < tl < ... < tn < 00 and n = 1, 2, ... arbitrary, we obtain, after some straightforward calculations, that P{'rl E dt 1'''',!n E dtn} = P{N(td = = + dt 1) = 1, ... , N(t n) = n - 1, N(t n + dtn) = P{N(td = 0}P01 (t 1,t 1 + dtdPl1(tl,t2)P12(t2,t2 + dt 2) ... Pn- 1,n-l (t n- 1, t n)Pn- 1,n(tn, tn + dt n)· 0, N(tl n} From this, (2.6.3), and (2.6.4), it follows that P{!l E dt 1"",!n E dt n} = P{!l E dtdP{!2 E dt21!1 = td ... P{!n E dt nl!n-l = tn-d· (2.6.5) o This proves the proposition. Corollary 2.6.1. If N(t) is a Poisson process with A(t) = E{N(t)}, it clearly follows from (2.5.6) and (2.6.2) that {!j}f is a Markov process with stationary transition probability P{!j+1 ::;; tlrj = s} = 1- exp[ -A(s,t)]. (2.6.6) In fact, as we have indicated before, one can prove much more. To this end we need the following auxiliary result. Lemma 2.6.1. Let {!j} f be a Markov chain with stationary transition probability (2.6.6). Then, for any 0 < tl < ... < tn < t and n = 1,2, ... , (2.6.7) PROOF. After some simple calculations, we deduce that d d I () _ }_P{!lEdtl, ... ,!nEdtn,!n+l>t} { P !1 E t 1""'!n E tn N t - n P{N(t) = n} = e{p( - A(t)j P N(t) = n Ii dA(tj ). 1 On the other hand, P{!n Edt} = P{O <!1 < ... < !n-l < f. f O<tl < ... <I n -1 <t From this and (2.6.6), we have t'!n Edt} P{!lEdtl,· .. ,!n-1Edtn-l,1:nEdt}. (2.6.8) 48 2. The Poisson Process and Its Ramifications P{Tn Edt} = f·· f exp[ -A(t)] CD dA(tj ) )dA(t) 0<11 < ... <tn - l <t {A(t) }n-l (2.6.9) = exp[ -A(t)] (n _ I)! dA(t). Therefore, for all t > 0 and n = 0, 1, ... , {A(tW P{N(t) = n} = exp[ -A(t)]-,-. n. (2.6.10) o This and (2.5.1) prove the lemma. Proposition 2.6.2. Let {N(t); t ~ O} be a simple point process with A(t) = E{N(t)}. A necessary and sufficient condition for N(t) to be a Poisson process is that its sequence of arrival times {Tj}'f is a Markov process with stationary transition probability (2.6.6). PROOF. Necessity of the condition is obvious. Its sufficiency will be amply illustrated by showing that N(s) and N(t) - N(s) are independent r.v.'s. Thus, let 0 < s < t and k, n = 0, 1, ... be arbitrary arid write P{N(s) = k, N(t) - N(s) = n} = P{N(s) = k, N(t) - N(s) = nIN(t) = k + n}P{N(t) = k + n}. (2.6.11) Now, according to Lemma 2.6.1, given N(t) = n + k, the k arrival times in (0, s] and n in (s, t] are distributed as k + n independent r.v.'s with common dJ. A(u) A(t) , 0 ~ u ~ 1. Hence, P{N(s) = k, N(t) - N(s) = nIN(t) = n + k} = (n ; k)(~~:Dk (A~~~;)y-k. This, (2.6.10) and (2.6.11) yield [A (S)]k [A(s, t)]" P{N(s)=k, N(t) - N(s)=n} = exp[ -A(s)]~exp[ -A(s,t)] n! ' which proves the assertion. o Denote by (2.6.12) the sequence of interarrival times of a point process. It is often of interest to 49 2.6. Markov Property of N(t) and Its Implications know the stochastic structure of {1J}f. As we have seen in Corollary 2.5.1, this is an i.i.d. sequence with common negative exponential distribution if N(t) is a homogeneous Poisson process. If the Poisson process is not homogeneous, the r.v.'s {4}f are neither independent nor identically distributed. In the case when N(t) is a "pure birth process," i.e., a Markov process with stationary transition probability, the stochastic structure of {1J}f is easy to determine. Proposition 2.6.3. 
Suppose that {N(t);t ary transition probability, i.e., ~ O} is a Markov process with station- Pij(s, s + u) = P;j(u), (2.6.13) then {1J}f is a sequence of independent r.v.'s with P{1}+1 :::;; u} = 1 - exp( -AjU), j PROOF. From (2.6.13), it follows that, for all t, U > 0 andj Pjj(u) = P{N(t = 1- + u) = P{tj + 1 :::;; jIN(t) = j} = P{N(t t (2.6.14) = 0,1, .... = 0, 1, ... , + u) = il't) = t} + ultj = t} = 1 - P{1J+l :::;; ultj = t}. (2.6.15) This clearly implies that 1J+l is independent of tj for all j proves that {1J}f is a sequence of independent r.v.'s with P{1j+1 :::;; t} = 1, 2, ... , which = 1 - Pjj(t). (1.6.16) Next, let 0 < s < t be arbitrary and consider Pjj(t) = P{N(s + t) = jIN(s) =j} = P{N(t) = j, N(s + t) = jIN(s) = j} = Pjj(t - s)Pjj(s). If we set U = t - s, we obtain Pjj(s + u) = Pjj(u)Pjj(s), (2.6.17) which is the Cauchy's functional equation. The function Pjj(u) is nonincreasing with Pjj(O) = 1 for allj = 0, 1, .... In addition, for all u, s > 0 and j =,1, ... , as s -+ 0, which implies that Pjj(u) is continuous for all U ~ O. In such a case, the only solution of (2.6.17) is Pjj(u) = exp( - AjU), j This concludes the proof of the proposition. = 0, 1, .... (2.6.18) o 50 2. The Poisson Process and Its Ramifications 2.7. Doubly Stochastic Poisson Process To motivate the consideration of a doubly stochastic Poisson process, we shall begin by the following example. EXAMPLE 2.7.1. A deep space probe was sent to investigate the volcanic activities of the Jupiter's moon 10. Stationed high above the moon in a stationary orbit, the probe counts the number of volcanic eruptions on the moon's surface. However, if hit by a meteorite, the functioning of the probe would stop. Therefore, its lifetime is a r.v., say T. Suppose that the number of volcanic eruptions N(t} is a homogeneous Poisson process with intensity A. Clearly, T and N(t} are independent r.v.'s. The total number of eruptions detected during the probe's lifetime is N(T}. This is a r.v. with the distribution P{N(T} = n} = 1 00 o P{N(t} = n} dH(t} =,1 n. 1 00 0 exp( -At)(At}"dH(t}, where H(t} = P{T:::; t}. Suppose now that a meteorite impact at a random time T does not destroy the probe but only reduces its sensitivity, so that after T, the intensity of the Poisson process becomes Ao < A. To simplify the problem, we assume that the second impact by a meteorite is not very likely. In such a case, the intensity of the process N(t} is a r.v. specified as follows: Hence, A(t, . ) = t (2.7.1) A(S, .) ds = Amin(T, t) + Ao(t - (2.7.2) min(T, t)}. Denote by No(t) the homogeneous Poisson process with A = 1. Then, if N(t) is an arbitrary Poisson process with E {N(t)} = A(t), according to Remark 2.4.3, (No 0 A)(t) = N(t}. Therefore, the probability that the probe will detect n volcanic eruptions in (0, t] is P{No(A(t,·» = n} = P{No(At) = n, T> t} + P{No(AT+ Ao(t - T» = n, T:::; t} (At)" = exp( -At)-, [1 - H(t)] + I t 0 n. exp{ -[AS + Ao(t - = E {exP [ _ A(t, .)] s)]} [A(~!. )]"}. [AS + Ao(t n! s)]" dH(t) 2.8. Thinning of a Point Process 51 We now give the following definition of a doubly stochastic Poisson process. Definition 2.7.1. Let {A(t, ·);t ~ O} be a stochastic process on {o,ar,p} whose realizations are strictly increasing functions such that A(O, . ) = 0 (a.e.). Let No(t) be the Poisson process on the same probability space with E {No(t)} = t, independent of A(t, '). A point process {N(t); t ~ O} is called a doubly stochastic Poisson (or Cox) process if it has the same distribution as (No 0 A)(t) = No(A(t, . )). 
(2.7.3) An exhaustive study on this subject can be found in a monograph by Grandell (1976). EXAMPLE 2.7.2. When A(t,') = Zt, where Z is a non-negative r.v., the doubly stochastic Poisson process is called a "weighted or mixed" Poisson process. It can be shown that the weighted Poisson process is a Markov process with the transition probability (t - sy-i E{Zje- Zt } Pij(s, t) = (j _ i)! E {Zie ZS}' (2.7.4) 2.8. Thinning of a Point Process To provide the motivation for studying this particular procedure, we consider the following example. EXAMPLE 2.8.1. Suppose we observe the occurrence of the rainfall at a particular location. Suppose we begin our observation at time t = 0 and let 0 < T 1 < T 2 < ... be the times of the first, second, etc., rainfall after t = 0 at this particular site. The sequence of r.v.'s {Tn}! represents a point process on R+ = (0, ex»). Suppose now that for some practical purposes not all rainfall events arriving at the given location are of interest but only those which are sufficiently large to cause flooding. If only their arrival times are recorded, the corresponding point process can be visualized as a point process obtained from the point process {Tn}! by deletion of a certain number of its points. The new point process is called a thinned point process. In this section we are concerned with the so-called "independent thinning," which can be described as follows. Let {N(t);t ~ O} be an arbitrary simple point process. It undergoes independent thinning if each of its points 52 2. The Poisson Process and Its Ramifications is deleted with probability p, 0 < p < 1, and retained with probability q = 1 - p, independently for each point. The retained set of points forms a thinned point process. Let '1 q (. ) be the thinned version after the scale contraction by a factor q. In other words, for A E ~+, '1q(A) is the thinning of '1(q-1 A) [see (2.2.5) for the definition of '1], where q- 1A = {q-1 x ;XEA}. The point process '1i') resulting from both thinning and contraction of the time scale will have '1q(A) = n say, only if exactly n of the '1(q-1 A) points in the set q-1 A from the original process are retained in the thinned process. It is of some interest to investigate the asymptotic behavior of '1i') as q -+ O. In the rest of this section we shall investigate the weak convergence of '1 q (.) as q -+ 0 (weak convergence means convergence in finite-dimensional distributions). The next result has been known for some time (see Belaeyev, 1963). Proposition 2.S.1. The point process '1i' ), obtained by independent thinning and contraction from a point process '1(.) on R+, converges weakly to a homogeneous Poisson process with intensity A E (0, (0), if and only if for all 0 < t < 00 qN G) .4 At (2.8.1) as q -+ 0, where N(') is the counting random function of '1('). PROOF. The proof is essentially due to Westcott (1976). Define cDq(s,A) = E(exp{ -S'1iA)}), 'Pis, A) = E(exp{ - 2S'1(q-1 A)}), where s > 0; then cDq(s, A) = E(E(exp { - s'1q(A)} I'1(q -1 A))). The thinning mechanism implies that E(exp{ - s'1 q(A)} 1'1(q-1 A)) Thus, cDq(s,A) = E(exp{17(Q- 1A)ln(1 - q(1 - e- S ))}). (2.8.2) 2.9. Marked Point Processes But for any 0 < e < 53 ! q(1 - e- S) < -log{1 - q(1 - e- S)} < q(1 - e-S)(l + e) (2.8.3) if q < e. Because q --+ 0, we may choose e> 0 arbitrarily small and deduce from (2.8.2) and (2.8.3) that <l>q(s,A) and '¥q(l-e-S,A) have the same limit, if any, as q --+ O. We shall first prove necessity of (2.8.1). 
To this end note that '¥q(s', (0, t]) converges to e-As't for all 0 $; s' < 1 as q --+ O. Now apply the continuity theorem for Laplace transform (Feller, 1971, p. 408), remembering that such functions are analytic, and hence determined, in their domain of definition, by values in an interval. To prove sufficiency of the condition (2.8.1), note that it implies qrJ(q-l A) ~ AIIAII for all A E J, the ring generated by intervals (0, t]. Here, Lebesgue measure. So for such A, 11'11 denotes the (2.8.4) as q --+ O. But from a result by Renyi (1967), the Poisson process is uniquely determined by the fact that P{rJ(A) = n} = e-AIIAII (AliA lit n! for all A E J, and (2.8.4) implies all convergent subsequences of finite-dimensional distributions possess this property. Thus, they all have the same limit, which proves the assertion. 0 2.9. Marked Point Processes Consider a simple point process on R+ with arrival times {tJ l' and the counting random measure rJ, all defined on a probability space {Q,BB,P}. Let {~J1' be a sequence of real-valued r.v.'s on the same probability space. The bivariate sequence of r.v.'s (2.9.1) represents a point process on R+ x R, which is called "a marked point process." The marked point process (2.9.1) is completely characterized by the nonnegative integer-valued random measure Q on fJl+ x fJl defined by (2.9.2) 2. The Poisson Process and Its Ramifications 54 where for any G E fJl + x fJl B(x,y)(G) I if(x,Y)EG = { 0 if (x, y)¢ G. From (2.9.2) it readily follows that for any A E (2.9.3) fJl+ Q(A x R) = '1 (A). The counting random function N*(t, D) ofthe marked point process (2.9.1) is defined by N*(t, D) = Q«O, t] x D), DE [}t. Clearly, on {N(t) = O} on N{(t) ~ I}, (2.9.4) where N(t) stands for the counting random function of the point process {rj}i· From (2.9.4) we have that N*(t,D) = = = 00 L I{N(t)2!j}I D(e) j=l j~ (~j I{N(t)=k})ID(ej ) k~l Ctl ID(ej ) )I{N(t)=k}' In other words, N*(t,D) = N(t) L ID(ej), j=l on {N(t) ~ I}, (2.9.5) and N*(t,D) = 0 on {N(t) = O}. In addition, N*(t, R) = N(t). It is clear from (2.9.5) that for any D E fJl fixed, N*(t, D) is a stochastic process with nondecreasing step sample functions with unit jumps at those points Tk" Tk 2,'" (1 ~ Tk, < Tk2 < ... ) for which ek j E D. This implies that N*(t, D) is a thinning of the original point process {TJi. Denote by P~(') the distribution of a r.v. and, as before, set A(t) = E{N(t)}. e Proposition 2.9.1. Assume that a. {N(t); t ~ O} is a Poisson process. b. {ej } i is a sequence of U.d. r.v.'s, with common distribution P~( . ), independent of {Tj}i. 55 2.9. Marked Point Processes Then {N*(t, D); t ~ O} represents a Poisson process with E{N*(t,D)} = A(t)P~(D) if P~(D) > O. PROOF. There are many ways to prove this assertion. The following amply illustrates one of the methods of proof. For any 0 < t1 < t2 < 00 and O~k~n, P{N*(t 1,D) = k, N*(t 2,D) = n} N(I.J N(12) = P { i~ ID(ei) = k'i~ ID(e) = n I~ rJ-k = pLt jt ID(ei) = k, } ID(el+)=n-k} x P{N(td = 1}P{N(t2) - N(td = r}. Invoking conditions of the proposition, we obtain pLt pLt ID(O = k} = G)(P~(D»k[l ID(ej ) = n - k} = (n - p~(D)J/-k, ~ k)(p~(Drk[l - p~(D)]r-n+k, (A(td)' P{N(t1) = I} = exp[ -A(tdJ-1-!- , P{N(t2) - N(td = r} = exp[ -A(t 1,t2)] (A(t 1 ,t2 , r. », . From this, after some straightforward calculations, we have P{N*(t1,D) = k, N*(t 2,D) = n} = exp[ -A(tdP~(D)] (A(t1~~(D»k x exp[ -A(t1,t2)P~(D)] (A(t 1, t2)P~(D»"-k (n _ k)! . 
Therefore, P{N*(t 1,D) = k, N*(t 2,D) - N*(t 1,D) = r} = P{N*(t 1,D) = exp[ - = k, N*(t 2,D) = r + k} A(t1)P~(D)] (A(t1~~(D)t x exp[ _A(t1,t2)P~(D)J(A(t1,t2~p~(D»r, r. 56 2. The Poisson Process and Its Ramifications from which we conclude that {N*(t, D); t ~ O} is a Poisson process with E {N*(t, D)} = A(t)P~(D). This proves the assertion. 0 2.10. Modeling of Floods In spite of the experience accumulated over many centuries in dealing with floods, losses in property and lives not only increased considerably in recent times but all indications are that they will increase even more in the future. How and why? Is it because the rainfall-runoff relationships have changed, or that hydrological factors responsible for floods have multiplied all of sudden? There is plenty of evidence that this increase is not due to a dramatic shift in the natural balance. Instead, the escalation of flood damage around the world is a result of new factors emerging gradually in our societies. We shall indicate one of them. In many of the highly industrialized and densely populated areas of the world, a reduction of the natural retention area of the flood plain has taken place. Due to this fact, the flood waves have increased in amplitude and accelerated, resulting in more flood damage downstream than had ever been anticipated before. In fact, there are reaches of some European rivers in which the last few years have repeatedly brought floods which exceeded the 100-year design flood, on which the designs of various hydraulic structures and flood protection works were based. Denote by ~(t) the discharge rate of a streamflow at a given site. Clearly, {~(t); t ~ O} is a non-negative stochastic process. We can assume, without violating our physical intuition, that almost all sample functions of ~(t) are continuous. Then, according to Proposition 1.6.1, there exists a version which is separable and measurable. Set x(t) = sup ~(s). (2.10.1) It is apparent that x(td ::::;; X(t 2 ) for all 0::::;; t1 < t 2 • For the purpose of flood modeling, it is essential to determine <I>(x,t) = P{X(t)::::;; x}. (2.10.2) Due to the separability property, the function <I>(x, t) is well defined. Unfortunately, we know very little about the stochastic process ~(t) and its stochastic structure which makes an evaluation of <I>(x, t) extremely difficult. For this reason, the common approach in practice to the problem of estimation of the distribution <I>(x, t) is based on an empirical procedure [the method of best curve fit to the observed values of the maximum X(t)]. This clearly is not satisfactory and we will attempt a different approach. Our approach is based on the following rationale. Because the floods are our concern, then only those flows that can cause flooding are of interest to 57 2.10. Modeling of Floods ~(t) o Figure 2.3. A realization of the flow process ~(s). us. For this reason we will confine our attention to those flows which exceed a certain threshold level Xo (see Figure 2.3). The truncated part of the process ~(s) above Xo that lies between an upcrossing and the corresponding downcrossing of this threshold is called an "exceedance" or an "excursion." The time point at which the nth exceedance achieves its maximum is denoted by Tn' n = 1,2, .... These exceedances are, of course, caused by the rainfalls. 
It is quite clear that {Tj}l' is a sequence of r.v.'s such that (2.10.3) Thus, {Tj}l' represents a simple point process obtained by the thinning of the point process associated with the rainfall occurrences at the given location. It seems "intuitively plausible" to assume that we are dealing with independent thinning. If Xo is high enough, the point process {Tj } l' should have a Poisson distribution, which has been confirmed by observation records. Let {N(t); t ;;::: O} be the counting function of the point process {Tj}l' which is assumed to be Poisson with E{N(t)} = A(t) and set (2.10.4) For the sake of simplicity, we will assume that {Xk}l' is an i.i.d. sequence of r.v.'s, independent of {Tdl'. In such a case, it follows from Proposition 2.9.3 that {(Xk' Tk)}l' is a marked Poisson process. Denote by N*(t, D) its counting function, and set E {N*(t, D)} = A(t)Px(D), where A(t) = E{N(t)} and P{Xl ~ x} = H(x). (2.10.5) Write x*(O) = 0, X*(t) = SUP{Xk;Tk ~ t}, Ft(x) = P{X*(t) ~ x}, x;;::: O. (2.10.6) (2.10.7) 58 2. The Poisson Process and Its Ramifications It is easy to see that, for every x ~ 0 and t ~ 0, {X*(t)::;; x} = {N*(t,(x, (0» = OJ. (2.10.8) Ft(x) = exp{ -A(t)[1 - H(x)}. (2.10.9) From this, we obtain Next, we will investigate some properties of the stochastic process {X*(t); t ~ OJ. For any 0 ::;; s < t, set X*(s, t) = sup {Xj; 1) E (s, tJ}. (2.10.10) Because the increments of N*(t, D) are independent, we have P{X(t)::;; x} = P{X(s)::;; x,X(s,t)::;; x} = P{N*(s,(x, (0)) = 0, N*(t,(x, (0» - N*(s,(x, (0» = O} = P{X*(s)::;; x}P{X*(s,t)::;; x}. (2.10.11) Hence, * _ P{X*(t) ::;; x} P{X (s,t)::;; x} - P{X*(s)::;; x}" (2.10.12) Using (2.10.11), we can easily show that X*(t) is a Markov process. Indeed, for any o< tI < ... < tIl < t and 0 < x I ::;; ... ::;; x" < x, n = 1, 2, ... , we have P{x*(t)::;; xlx*(td = Xl' ... , X*(t ll ) = x,,} = P{x*(tn )::;; X,X*(tno t)::;; xlx*(t l ) = Xl' ... , X*(t,,} = XII} = P{X*(t", t) ::;; x}. This proves the assertion. An extensive discussion on this topic can be found in the work of Todorovic (1982). Problems and Complements 2.1. A switchboard receives on average two calls per minute. If it is known that the number of calls is represented by a homogeneous Poisson process, find (i) the probability that 10 calls will arrive in 5 minutes; (ii) the probability that the third call comes during the second minute. 2.2. An orbiting satellite is subject to bombardment by small particles which arrive according to a homogeneous Poisson process. Let p > 0 be the conditional probability that a particle which has hit the satellite will also hit a given instrument on it. Determine the probability that in (0, t]: Problems and Complements 59 (i) the particular instrument will be hit exactly k times; (ii) the instrument will be hit at least once. 2.3. Let {N(t); t ;::: O} be a homogeneous Poisson process. Show that for any 0 < s<tandO<k<n P{N(s) = kIN(t) = n} = (n)k (S)k t (1 - ts)n-k . 2.4. A life insurance company has established that the average rate of the policy holders whose insurance is less than $100,000 is a constant p. Assume that arrival times of insurance claims represent a homogeneous Poisson process. If death has nothing to do with the value of a particular policy, find: (i) the probability that there will be exactly k claims, each less than $100,000, in an interval (0, t]; (ii) the probability that all the claims in (0, t] will be higher than or equal to $100,000. 2.5. 
Suppose that all car drivers of a city are categorized in several groups, say AI' A 2 , ... , An' according to their driving abilities. It is established that every car accident for drivers in group Ai carries a small probability p of fatality. Suppose that accidents (for members from Ai) occur according to a homogeneous Poisson process. If the random event "fatal accident" is independent of the process, determine: (i) the probability that a driver from Ai will stay alive during (0, t]; (ii) the expected time of the fatal accident. 2.6. Buses arrive in accordance with a homogeneous Poisson process. Somebody arrives at the bus stop at time t. Find the distribution ofthe waiting time for the next bus. 2.7. Let {N(t);t;::: O} be a homogeneous Poisson process and 0 < s < t. Show that, for any k = 1, 2, ... , n, P{~k ~ sIN(t) = n} = it C)(iY (1 - on-i. 2.8. Suppose that the number of eggs laid by an insect in an interval of time (0, t] is a homogeneous Poisson process with mean rate .t Assume that the probability of one egg hatching is p > O. If Xl (t) is the number of hatched eggs and X 2 (t) the number of unhatched eggs in (0, t], show that (assuming independence of eggs) X 1 (t) and X 2 (t) are independent Poisson processes with mean rates AP and A(1 - p), respectively. 2.9. Let {N(t);t;::: O} be a Poisson process with E{N(t)} = At. Show that 1 - N(t) -+ A (a.s.) t as t -+ 00. 2.10. Let {N(t); t;::: O} be a Poisson process with E{N(t)} = A(t). Show that [see (2.6.12)] P{T,,+1 > where Sl < ... < Sn tlt1 = s'''''~n = sn} = exp{ -A(t + sn) + A(sn)}, and t > O. 60 2. The Poisson Process and Its Ramifications 2.11. Let {N(t);t ~ O} be a Poisson process with E{N(t)} = At. It undergoes independent thinning with a probability P (see Section 2.8). Let N1(t) be the number of recorded points and N 2 (t) the number of deleted points. Show that Nl (t) and N 2 (t) are independent Poisson processes with E{N1(t)} = Apt. 2.12. Show that E(exp { -i~ J(T;)}) = exp { - t<Xl {l- exp[ -J(S)]} dA(S)} , where {Tn} is the sequence of arrival times of a Poisson process N(t) with E{N(t)} = A(t) and J(.) ~ O. 2.13. Let {Ni(t)~ be n independent Poisson processes with E{Ni(t)} = At. Show that <Xl ( lim ~ J x n-+oo k-O k) (Ant)k + - -k' exp( -Ant) = n. J(x + At), where J( .) is a continuous function. 2.14. Let X in , ... , Xnn be a sequence of independent Bernoulli r.v.'s such that P{Xin = 1} = Pin' Pin + qin = 1, for all i = 1, 2, ... , n, and n L Pin = i=l A (independent of n). (As an example consider a town in which an arsonist secretly lives, and if there are n houses, excluding his, Ptn is the probability of a particular house being burned down.) Show that Sn = n !t' L X in -+ N, i=l where N is a Poisson r.v. with E{N} = A. 2.15. Let gn}':' be a sequence of independent r.v.'s such that Pg n = k} = and An -+ +00 as n -+ 00. e-;.J?t Show that P {en - An y0l. ~ x} -+ (2nrl/2 fX e- 1/2 z2 dz. -<Xl 2.16. Let {N(t); t ~ O} be a Poisson process with E{N(t)} = A(t). Determine the probability density of (T 1"'" Tn) and of T. - T.- 1, n = 1, 2, .... 2.17. Let {N(t);t ~ O} be the counting random function of a simple point process. If N(t) is a homogeneous Markov process (see Proposition 2.6.3), determine P{N(t) = k} and Pij(u) [N(t) is called the pure birth process]. 2.18. For a pure birth process (see the previous problem), find necessary and sufficient conditions for <Xl L to hold. "=0 P{N(t) = n} = 1 Problems and Complements 61 2.19. 
Let {N(t); t ~ O} be a nonhomogeneous Markov process such that the limit An(t) = lim -hI P{N(t + h) = n + IIN(t) = n} h~O exists; in the case when An(t) = CPo(t) + nCP! (t), N(t) is called a Polya process. Determine P{N(t) = n} in the case when IXA CP! (t) = 1 + <xtA' IX ~ 0, A> 0, and show that I 00 k=O P{N(t) = k} = 1. 2.20. Let.P be the smallest ring of subsets of R+ = [0, (0) (Le., if A, B E .P, A u B E .P, and A - B E .P) containing all intervals of the form (a, b] c R+. Let Jl(.) be a nonatomic Borel-Radon measure on R+ (i.e., Jl is non-negative and finite on compact subsets of R+). If, for each D E.P, P{'1(D) = n} [Jl(D)]n = exp[ -Jl(D)]--, n! the point process '1 [see (2.2.5)] is a Poisson process [This is due to Renyi (1967).] 2.21. (Continuation) If, for each D E .P, P{'1(D) = O} = exp[ -Jl(D)] and P{'1(f') ~ 2}:5: Jl(i)cp(Jl(r)), where cp(.) ~ 0 is increasing and such that cp(x) ~ 0 as x ~ 0, then '1(.) is a Poisson process. 2.22. Let {N(t);t ~ O} be a Poisson process with E{N(t)} = A(t) and arrival times {tn}1'. Consider {(t"~n)1', where {e.}1' is an i.i.d. sequence with common dJ. H(x) = P{e. :5: x}, independent of N(t). (i) Show that {X(t); t ~ O} is a Markov process, where X(t) = max {ek; tk :5: t}. (ii) Let T(t) be the time instant in (0, t] where the maximum X(t) is achieved. Find the dJ. of T(t). 2.23. (Continuation) Determine P{x(t) :5: x, T(t) :5: s}. CHAPTER 3 Elements of Brownian Motion 3.1. Definitions and Preliminaries In Chapter 1 (Section 1.6, Example 1.3), we discussed in some detail the nature and causes of the random motion of a small colloidal-size particle submerged in water. According to kinetic theory, this movement is due to the thermal diffusion of the water molecules, which are incessantly bombarding the particle, forcing it to move constantly in a zigzag path. The phenomenon was named "Brownian motion" after R. Brown, an English botanist who was first to observe it. In 1904, H. Poincare explained that large particles submerged in water do not move, notwithstanding a huge number of impacts from all directions by the molecules of the surrounding medium, simply because, according to the theory of large numbers, they neutralize each other. The qualitative analysis of the motion was given independently by Einstein (1905) and Smoluchowski in 1916. However, a rigorous mathematical formulation of the problem was formulated a decade later by N. Wiener. To acknowledge his contribution, the stochastic process, which represents a mathematical model of the motion, is called the Wiener process. In the modern literature, the term Brownian motion process is used more often than the Wiener process. In this book, we shall use both names equally. Throughout this chapter we shall confine ourselves to the one-dimensional aspect of the motion. Suppose we begin our observation of the wandering particle at time t = 0 and assume it is at point x = O. Denote by ~(t) its position on the line at time t > O. Due to the chaotic nature of the motion ~(t) is a r.v. for every t > 0 with ~(O) = O. Examining further the nature of this phenomenon, it seems reasonable to suppose that E {~(t)} = 0 for all t ;;::: O. Finally, if the temperature of the water remains constant, the distribution of 3.1. Definitions and Preliminaries 63 any increment e(t + s) - e(t) should not depend on t. This gives rise to the following definition: Definition 3.1.1. 
A standard Brownian motion or Wiener process is a stochastic process {W);t ~ O} on a probability space {n,Lf,p} having the following properties: (i) e(O) = 0 (a.s.); (ii) {e(t); t ~ O} has independent increments; (iii) for any 0 ::;; s < t, P{W) - e(s)::;; x} = J 21t(t1 - s) fX exp ( - 2( u »)dU. t- s 2 -<Xl (3.1.1) Remark 3.1.1. As any process with independent increments, e(t) is a Markov process; the stationary transition probability function of Brownian motion is P{W + 't)::;; xle(t) = y} = ~ V 21t't f X - Y -00 exp (- U 2't )dU. . 2 (3.1.2) From (3.1.1), it follows that for any 0::;; s ::;; t Var{ ~(t) - e(s)} =t - (3.1.3) s. On the other hand, E g(s)e(t)} = E g(s) [W) - e(s)] = + ~2(S)} E{e(s)} = s. In other words for two arbitrary non-negative numbers u and v, Eg(u)~(v)} = min{u, v}. (3.1.4) The increments of a standard Brownian motion are stationary in the sense that the distribution of ~(t + 't) - W) depends only on 'to From the fact that e(t) has independent increments and (3.1.1), we can easily determine the joint probability density of (e(t 1 ), ... , (t n )) for any 0 < t 1 < ... < tn' It can be readily seen that this density is given by the expression ft" .. .,tJX1,"·' xn) = ft, (xdft 2 -t, (X2 - xd ... ftn-ln_'(X n - xn-d, (3.1.5) where 1 (X2) ft(x) = --exp -- . 2t .Jbd (3.1.6) The system of distributions (3.1.5) clearly satisfies the necessary consistency conditions ofthe Kolmogorov theorem 1.4.1 This assures the existence of a standard Brownian motion. Roughly speaking, there are two types of statements to be made about any stochastic process. The first one deals with its distributional properties and 64 3. Elements of Brownian Motion the second one with sample path properties. Concerning the distributional properties of Brownian motion, Equations (3.1.5) and (3.1.6) provide enough information in this respect. As far as the sample path properties are concerned, we will show that almost all of them are continuous functions. To this end, we need the following lemma. Lemma 3.1.1. Let X be a r.v. with N(O, (12) distribution, then E{X4} = 3(14. PROOF. o We now can prove the following statement. Proposition 3.1.1. With probability 1, every sample function of a separable Brownian process is uniformly continuous on every finite interval. PROOF. Let {~(t); t ~ O} be a separable version of a standard Brownian motion (which, according to Doob's proposition 1.7.2, always exists). Then according to (3.1.1) and Lemma 3.1.1 for any t ~ 0 E(~(t + h) - ~(t))4 = 3h 2 • From this and Theorem 1.9.1 (with C = 3, fJ = 1, and ~~ IX = 4), the assertion 0 Before concluding this section, we will mention that the process {X(t); t ~ OJ, defined as X(t) = IXt is called a Brownian motion with drift process {Z(t);O::;; t ::;; 1}, specified as Z(t) = ~(t) + ~(t), IX. (3.1.7) On the other hand, the stochastic - t~(1), (3.1.8) is called a Brownian bridge (or tied down Brownian motion). In both cases, ~(t) is, of course, a standard Brownian motion. From (3.1.8), we see the following: Z(O) = Z(1) = 0 (a.s.); in addition, the Brownian bridge is a Gaussian stochastic process. In other words, for any 0 < t1 < ... < t n ::;; 1, (Z(td, ... ,Z(tn )) has a multivariate normal distribution. 3.2. Hitting Times 65 3.2. Hitting Times Let {e(t); t ~ O} be a separable version of a standard Brownian motion process. Denote by 'x the first instant t > 0 at which e(t) = x (the hitting time of x). More formally, 'x(w) = inf{t > O;W,w) = x}. (3.2.1) Clearly, {'x~t}={sup OS8St e(s):::;x} (3.2.2) (see Figure 3.1). 
From this, we conclude, taking into account that the e(t) is separable, that 'x is a r.v. for all x E R such that 0 < 'XI < 'X2 (a.s.) for all Xl < x 2 • It is also clear that 'x and Lx have the same distribution. Denote by ,~ the time of the first crossing of the level x by the process e(t), i.e., ,~= inf{t > O;e(t) > x}. (3.2.3) Our physical intuition compels us to deduce that 'x :::; ,~. We will actually show that 'x = ,~ (a.s.). We now embark on the problem of determining the distribution of 'x' Proposition 3.2.1. For all x E R, P{'x:::; t} = 2P{W) ~ x} and E{,x} = 00 (3.2.4) if x =F O. PROOF. Suppose first that x > O. From (3.2.1), we readily deduce that for an instant t ~ 0 xr-------------~~~---- O~~~--------~------~ Figure 3.1. Random hitting time 'x' 66 3. Elements of Brownian Motion Consequently, {e(t) Next, because for all 0 P{e(t) ~ ~ x} C {'t"x (3.2.5) t}. ~ s < t and x E R, x} = P{e(t) ~ xl~(s) = ~ xl~(s) = x}, which is easy to prove using (3.1.2), it seems quite clear that P{e(t) ~ xl't"x ~ t} = P{ W) ~ xl't"x ~ t} (3.2.6) because at the instant 't"x the process was at x. Therefore, taking into account (3.2.5), we obtain P{e(t) P{e(t) ~ xl't"x ~ t} = P{ x} } . t (3.2.7) e- 1/2z2 dz, (3.2.8) ~ 't"x ~ This and (3.2.6) then yield P{'t"x which clearly shows that E{'t"x} = ~ 't"x t} = 2P{e(t) < ~ x} f'X) = (2/n)1/2 00 (a.s.) for all x> O. On the other hand, Jx/ 0 roo P{'t"x > s}ds = rOO {1_(2/n)1/2 roo Jo = (2/n)1/2 Jo LOO ds J~~ e-1/2Z2dZ}dS f:/~ e- 1/2z2 dz = (2/n)1/2 IX) e-1/2z2 dz f: 2Z2 ' ds This completes the proof of the proposition in the case when x > o. If x < 0, the assertion follows from the fact that 't"x and Lx have the same distribution. Thus, from (3.2.8), we have P{'t"x ~ t} = (2/n)1/2 roo JI I/0 e- 1/2z2 dz. 0 X Coronary 3.2.1. From Proposition 3.2.1, we deduce that given x # 0, the standard Brownian motion process will hit x with probability 1. However, the average length of time to achieve this is infinite. Concerning the r.v.'s 't"x and 't"~, we have the following result. Lemma 3.2.1. For all x E R, 't"x = 't"~ (a.s.) Proof. Because ~(t) and -e(t) have the same distribution, it suffices to prove the statement for x > o. Clearly, 't"x ~ 't"~; hence, what we have to show is that 3.3. Extremes of ~(t) P{'x < ,~} = O. 67 Obviously, {'x < ,~} iQ nQ t~~£/n e(s) = C x}. (3.2.9) Therefore, to prove the lemma it suffices to show that, for all x > 0, P{suPo<S:51 e(s) = x} = O. To this end, note that for tl < t, P tSs~~t e(s) = x} = P t~~~tl e(s) = x, tlS~~t e(s) ~ x} +P t~~~tl e(s) ~ x, tlS~~t e(s) = x} ~ P t~~~tl e(s) = x} + s:oo P {Wi) E dy, tlS~~t e(s) - e(t 1 ) = x- y} =pt~~~tl e(s)=x}+ s:oo P{WdEdy}pLS~~t e(S)-Wl)=X- y }. However, P{SUPt19:51 e(s) - Wi) = x - y} = 0 for almost all y (with respect to Lebesgue measure). Thus, ptss~~t e(s) = x} ~ pt~~~tl e(s) = x}. In other words, P{SUP09:51 e(s) = x} does not decrease as t LO. On the other hand, due to the continuity of e(t), we have for all e > 0 p{~~;t e(s) = e} --+ 0 as t LO. Thus, p{ sup e(s) OSsSt = x} ~ lim p{ sup e(s) > !x} = t--+O OssSt 2 O. o From this and (3.2.9), the assertion follows. Remark 3.2.1. The last inequality follows from { SUP OssSt e(s) = x} C {sup e(s) > !x}. OSS:51 2 3.3. Extremes of ~(t) The result (3.2.4) is the key to many properties of the Brownian sample functions. For instance, from (3.2.2) and (3.2.4), we obtain that P{X(t) > x} = (2/n)1/2 (00 Jx/JI e- 1/2z2 dz, (3.3.1) 68 3. Elements of Brownian Motion ~(s) ~~----~---------L-----+s Figure 3.2. The reflection principle. 
where x(t) = (3.3.2) sup ~(s). O:o;s:o;t We will give one more proof of (3.3.1) using the so-called reflection principle. The following argument is not easy to formalize, although it seems intuitively quite clear. We have P{x(t) > x,~(t) > x} = P{X(t) > x,~(t) < x}. (3.3.3) This can be explained heuristically as follows: For every sample path of the process ~(t) which hits the level x before time t, but finishes above this level at time t, there is another "equally probable trajectory" (shown by the dotted line in Figure 3.2) that hits x before t and such that ~(t) < x. This is the meaning of (3.3.3). On the other hand, P{X(t) > x} = P{X(t) > x,~(t) > = 2P{X(t) > x,~(t) x} + P{X(t) > x,~(t) < x} > x} due to (3.3.3). The proof of (3.3.1) now follows from the fact that P{X(t) > x,~(t) > x} = Pg(t) > x} and from (3.1.1). Remark 3.3.1. Equation (3.3.1) leads to a very interesting conclusion: For all t > 0, P{x(t) > O} = 1. On the other hand, pt~~~t ~(s) < o} = p{ - O~~~t (-~(s)) < o} = pts:o;~~t (-~(s)) > o}. 3.3. Extremes of W) 69 But due to the symmetry of e(t), o} P {SUp e(s) > OSsSt = p{ sup (-e(s» OssSt > o}. Consequently, for all t > 0, pt~~t e(s) < o} = 1. Therefore, starting from point x = 0 at time t = 0, a trajectory of a standard Brownian motion process will intersect with the t axis infinitely many times in (0, t] for any t > O. This shows how irregular sample paths of a Brownian motion process are, and yet, almost surely they are continuous functions. The following proposition gives the joint distribution of (X(t), 'l"x). Proposition 3.3.1. For any t > 0 and x:::;; y, P{X(t) :::;; y, 'l"x :::;; u} {1 YX = ( -x) fU f - exp --2 (V2 + -X2)} n 0 0 t- s s Jdvds . s s(t - s) (3.3.4) PROOF. The following argument is not rigorous but it can be made so using the strong Markov property concept which will be discussed later. Because the Brownian motion process has independent increments and the distribution of e(s + t) - e(s) does not depend on s, it seems intuitively clear that once it reaches a point x at time t x ' it behaves after tx as if x is its starting point. Therefore, given that tx = s, where 0 < s :::;; t, we have, for all 0 :::;; x :::;;y, P{x(t):::;; YI'l"x = s} = p{ sup e(u):::;; YI'l"x = s} s~u:S:t =p{x+ sup OsuSt-s e(u):::;;y} = P{X(t-s):::;;y-x}. From this, (3.3.1), and (3.2.8), we obtain P{X(t) :::;; y, 'l"x:::;; u} = s: P{X(t):::;; YI'l"x (x) fU ( ="2 X which proves the assertion. 0 2 n(t - s) = s} dP{t x :::;; s} )1/2 f Y-X {_V2} 0 exp 2(t - s) ( ns23 )1/2 exp (X2) - 2s dv ds, D 3. Elements of Brownian Motion 70 Denote by T(t) the epoch of the largest value of ~(s) in [0, t]. In the sequel we will attempt to determine the joint distribution of (X(t), T(t)). Proposition 3.3.2. For all 0 < u < t and x > 0, P{X(t) PROOF. E dx, T(t) du} E = nu J u(tx - u) X2) dxdu. exp ( --2 u (3.3.5) Our proof of the proposition is based on the observation that T(t) = 'x Consequently, for any 0 < P{X(t) E dx, 'x < t, S E on the set {X(t) = x}. ds} = P{'x E dslx(t) = = P{T(t) E dslx(t) x}P{X(t) E dx} x}P{X(t) E dx} = = P{T(t) E ds,X(t) E dx}. But from (3.3.4), we have P{X(t)EdY"x Edu }= nu J u(tx - u) [ 1 exp --2 (3.3.6) ((y - X)2 +X2)] dudy. t- u u (3.3.7) Replacing y with x in this equation, it follows from (3.3.6) that, for 0 < u < and x> 0, P{X(t) E dx, T(t) E du} = nu J u(tx - u) t X2) dudx, exp ( --2 u o which proves the assertion. Corollary 3.3.1. From the last equation, we have P{T(t)Edu} f oo = o P{X(t)Edx,TEdu}= n J ~ u(t - u) . 
(3.3.8) Therefore, P{T(t) :::;; s} 1 = - n fS J 0 du u(t - u) 2 . = -arCSIn n Ii -. t This is the famous arc sin law. In Figure 3.3, a graphical depiction of its probability density is given. 3.4. Some Properties of the Brownian Paths __ 71 _ _ _ _ _ _ L -_ _ _ _ _ _ L -_ _ o s t/2 Figure 3.3. Graphical presentation of arc sin law. 3.4. Some Properties of the Brownian Paths The Brownian motion process is governed by subtle and remarkable principles. Some feeling for the nature of the process may be obtained from considerations of the local properties of its paths. Proposition 3.4.1. Let g(t); t ~ O} be separable standard Brownian motion process, then almost all its sample functions are not differentiable at any t ~ O. PROOF. We want to show that for all t 0 and h > 0, ~ p{lim e(t+h)-W)=oo}=1. h.... O+ h (3.4.1) To prove this, note that for any 0 < h < b, sup e(t O<h<d + hh - W) ~ ~ sup (e(t u O<h<d + h) - W». From this and (3.3.1), we have for any x > 0, P {sup O<h<d W+ ~ - e(t) > x} ~ P { sup + h) - W» > bX} = p{ sup e(h) > bX} O<h<d (e(t O<h<d = (-2)1/2 fOO e- 1/2z2 dz -+ 1 "./6 as b -+ O. From this, (3.4.1) follows. Because W) and - W) have the same 1t distribution, it follows also that 3. Elements of Brownian Motion 72 P {lim W + h) - ~(t) = h h-+O+ _ oo} = 1. o This proves the assertion. Corollary 3.4.1. For almost all ro, ~(-, ro) does not have bounded variation on any interval of [0, (0). Hence, the graph of ~(t,ro) is not rectifiable. Furthermore, for almost every ro, there is no interval (a, b) c [0,(0) on which ~(', ro) is monotone. Indeed, if ~Lro) were of bounded variation on a finite interval, it would be differentiable almost everywhere (Lebesgue measure) there. Also, if ~(', ro) were monotone on an interval, it would be differentiable at almost every point of this interval. In the previous section (see Remark 3.3.1) we have shown that p{ sup O~s~h ~(s) > o} = ~(s) < o} = p{ inf O~s~h 1 for any arbitrary small h > O. The next result yields properties of Brownian sample functions for very large t. Proposition 3.4.2 p{ sup Ost<oo PROOF. p{ W) = +oo} = inf Ost<c:() W) = -oo} = (3.4.2) Let k > 0 be an integer. Then taking into account (3.3.1), p{ sup Ost<:x) ~(t) > k} ';? p{ sup ~(s) > k} O.s;s:::.;;t = (2/n)1/2 (00 e-1/2z2 dz -. (2/n)1/2 Jk/jt as 1. t -. 00. Thus, p{ sup O~t<oo W) = oo} = foo e- 1/2z2 dz = 1 0 p{n k=l {sup O~t<oo ~(t) > k}} = 1. On the other hand, p{ inf O:::;t<oo ~(t) = -oo} = which proves the proposition. p{ sup O:$t<oo (-e(t» = oo} = 1, o According to the Corollary 3.4.1, almost all sample paths of a Brownian motion process have infinite variation in any arbitrary small interval of time. In contrast to this unpleasant feature of the process, a very sharp positive 3.4. Some Properties of the Brownian Paths 73 statement can be made about the so-called "quadratic variation" of Brownian sample paths, as the following result clearly shows: Proposition 3.4.3. Let 0 = t no < tnl < ... < tnn = t be a partition of the interval [0, t], such that max;(tn,i+l - tni ) --. 0 as n --. 00. Then l.i.m. n~oo n-l L (Wn,i+d - i=O (3.4.3) e(tni ))2 = t and (3.4.4) where l.i.m. indicates the convergence in quadratic mean. PROOF. From (3.1.3), we have E(e(t n,i+1) - e(tnJ)2 = tn,i+l - tni , (4.3.5) so that n-l L (e(tn,i+d - E i=O e(tni ))2 = t. (3.4.6) Therefore, to prove (3.4.3) it is sufficient to show that Var tta (Wn,i+d - Wn;))2} --.0 as n --'00. 
But from Lemma 3.1.1 and (3.4.5), we obtain Var tta = ;))2} = ~ta Var {(Wn'i+d - (e(tn,i+l) - e(tn n-l L i=O =2 {E(e(tn,i+d - Wni))4 - (tn,i+l - tnY} n-l L (tn,i+1 i=O n-l t ni )2 ~ 2 sup (tn,i+l - t ni ) L: (tn,i+l - t ni ) O,;;i';;n i=O = 2t sup (tn,i+l - t ni ) --. 0 O,;;i';;n as n --. 00, which proves (3.4.3). To prove the second part of the proposition, set tni = (i/n)t and consider Wni))2} 3. Elements of Brownian Motion 74 By the Markov inequality, we have (3.4.7) After some elementary calculations, using the fact that for any 0 :::; s - t E(~(t) - ~(S))2k = 1 . 3··· (2k - we deduce that E{Y,,4} :::; c k 1)(t - S)k, GY = 1, 2, ... , (3.4.8) t, where C > 0 is a constant. From this and (3.4.7), it follows that, for any B > 0, P{I Y"I > B i.o.} :::; Ct4 Consequently, f -.;n < 00. n=l o Y" ~ 0 (a.s.) as n ~ 00. This completes the proof. Corollary 3.4.2. By means of this result, we can show once more that almost all sample functions of a Brownian process are not rectifiable. First, we have the inequality n-1 L (~(tn,i+1) i=O ~(tni)f:::; sup 0,,;i";n-1 IWn,i+1) - ~(tni)1 n-1 L IWn,i+1) - i=O Wni)l· But, from (3.3.1), we have that sup 0,,;i";n-1 IWn,i+d - Wni)1 ~ 0 (a.s.) From this and Proposition 3.4.3, it clearly follows that n-1 L i=O 1~(tn,i+1) - Wni)1 ~ (a.s.), 00 which is the conclusion of the corollary. 3.5. Law of the Iterated Logarithm One of the most basic propositions in probability theory is the strong law of large numbers. It states that if {~;}f is an i.i.d. sequence of r.v.'s with EI~ll < 00 and Egd = 0, then Snln ~ 0 (a.s.) as n ~ 00, where Sn = ~1 + ... + ~n' Roughly speaking, it means that for any B > 0, ISnl will be less than nB if n is sufficiently large, so that Sn oscillates with an amplitude less then nB. 3.5. Law of the Iterated Logarithm 75 In many situations, it is of some interest to have more precise information on the rate of growth of the sum Sn as n -+ 00. This is given in the form of the celebrated law of the iterated logarithm, which is considered as a crowning achievement of the classical probability theory. The following proposition is a precise formulation of the law of the iterated logarithm. Proposition 3.5.1. Let E(en = 1. Then g;}f be an i.i.d. sequence of r.v.'s with p{lim sup n-+oo p{lim inf n-+oo E(el) = °and Sn = 1} = 1, J2nln In n J 2nSnIn In n = -1} = 1. This result for bounded r.v.'s was first obtained by Khintchine in 1924. Later, it was generalized by Kolmogorov and Feller. Under the conditions of the proposition, it was proved by Hartman and Wintner (1941). In this section, we will establish an analogous result for a standard Brownian motion process. But first we will prove the following auxiliary result. Lemma 3.5.1. Let a r.v. X have the N(O, 1) distribution. Then, for any x > 0, ~(~ - ~)e-X2/2 < P{X > x} < ~e-X2/2. v' 2n x PROOF. x Xv' 2n For all y > 0, Multiplying by follows. (1 -:4) e- y2 /2 < e- y2 /2 < (1 + :2) (3.5.1) e- y2 /2. 1/.j2n and integrating over (x, (0), where x> 0, the assertion Corollary 3.5.1. For x> 0 °sufficiently large, P{X > x} '" _1_e- x2 /2 (3.5.2) P{X > x} < e- x2 / 2 • (3.5.3) x.j2n and, for all x > 1, This result follows readily from Lemma 3.5.1. We are now ready to embark on the problem of proving the law of the iterated logarithm for a Brownian motion process. We will first prove the so-called local law of the iterated logarithm. 76 3. Elements of Brownian Motion Proposition 3.5.1. Let {e(t); t ~ O} be a standard Brownian motion process. Then . sup P { hm t~O p{lim inf t~O PROOF. 
W) J2tlnlnC 1 W) = J2t In In C 1 1} = 1, (3.5.4) -1} = 1. (3.5.5) = We clearly have P{W) > x} = (1/2n)1/2 foo e- u2 /2 du. x/ji From this and (3.5.2), it follows that P{W) > x} '" as x/Jt --+ +00. _t~e-x2/2 (3.5.6) xyTic Let 0 < b < 1 and consider the random event Bk defined by Bk = { sup e(s) > (1 O-<;s-<;b k + e)X k+ 1}, where e > 0 is arbitrarily small and X k = Jbklnlnb k. Then as k --+ 00 it is clear that xk/ft --+ (3.5.6), we have, as k --+ 00, P(Bk ) = 2P{eW) > (1 00. + e)xk+d Therefore, from (3.2.2), (3.2.4), and ((1 ft + e)2xf+l) 2)1/2 '" ( exp - ---::-::-;:-n (1 + e)xk+l 2bk = _1_(nb In b-k- 1)-1/2 (In b-k-1 )-b(l +£)2 1+ e (In b -1 )b(l +£)2 1 (1 Next, set b = (1 + e)fib (k + 1)b(1+£)2Jln(k + 1) + Inlnb- 1· + et 1 ; then clearly P(B ) '" k _C~(e=)= k1+e~' (3.5.7) where C(e) Because the series If 1/k 1 +e = + e)](l +e) . In(1 + e) [1n(1 converges, it follows readily from (3.5.7) that 77 3.5. Law of the Iterated Logarithm and, consequently, by the Borel-Cantelli lemma, P{Bk i.o.} = O. Therefore, for all sufficiently small t > 0 (for t < bk for some k) ~(t) < (1 + e)J2t In In t 1 or, equivalently, . P { lIm sup 1.... 0 J 2t ~(t) In In t- } 1 < 1+e = 1 for all e > O. Let us now show that . P { lIm sup 1.... 0 To this end, define Dk = J 2t ~(t) In In C } 1 > 1- e = 1. (3.5.8) {~W) - ~(bk+l) > (1 - ~)Xk}' Clearly, {Dk}'i' is a sequence of independent events. Because the distribution of ~W) - ~(bk+l) is equal to the distribution of ~W(l - b», we have for large k that P(Dk ) = _1_ fo 1 e- u2j2 du 00 (1-£j2)xkjbk (1-b) Jb k(1 - b) e/2)xk exp '" fo(l - (1 - e/2)22bklnlnb-k) 2bk(1 - b) . From this, after some straightforward calculations, we obtain, as k -+ P(Dk ) JI=b '" 00, that k-«1-(£j2))/(1-b)) --"-------= --=~- 2(1 - e/2)J1r, From this, it follows that, for b ~ Jink e/2, the sum Thus, invoking the second Borel-Cantelli lemma, we have P{Dk i.o.} = 1. (3.5.9) Further, due to symmetry of W), it follows from (3.5.3) that for sufficiently large k and arbitrary ~ > 0 ~(bk+l) > - (1 + ~)Xk' 78 3. Elements of Brownian Motion This and (3.5.8) then imply that the events -(1 + ~(bk+l) + [~W) - ~(bk+1)] = ~W) > -(1 = b)Xk + (1 - ~e)Xk + b)xk + x ke/2 + (1 - e)Xk occur infinitely often. But, the right-hand side of this inequality can be made larger than (1 - e)Xk (by a proper choice of b). For instance, this is the case if b is such that which holds if (1 + b) In In a- k - 1 e b In In a k < 2' which is true if b is sufficiently small. This proves (3.5.8) and because e > 0 is arbitrarily small, relation (3.5.4) follows. To prove (3.5.5), note that due to the symmetry of ~(t), -w) · sup ----r"====;' 11m 1--+0 J2t In In t · 10 . f => 11m 1--+0 W) ----r"====;' J2t In In t = 1 (a.s. ) = - 1 (a.s..) This is the desired result. Next, we shall prove the following result. Lemma 2.5.2. If {~(t); t ;;::-: O} is a standard Brownian motion process, so is PROOF. Let 0 ::s;; u < t and consider E(exp{B{t~(D - u~G)]}) = E exp{Oi( ~ G)(t - u) - u[ ~ G) - ~G)])} o 79 3.6. Some Extensions Independence of increments in this case can be established by showing that they are uncorrelated. Specifically, E(ue(~)[te(D - ue(~) J) = E(uteG)e(D - u2e2G)) = utG) - u2 G) = o This proves the lemma. Proposition 3.5.2. Let {e(t); t ~ o} be a standard Brownian motion process; then . P { hm sup 1-+00 P {lim inf 1-+00 PROOF. o. J 2te(t)In In t = 1} = 1, J 2tW)In In t = -1} = 1. This follows from the fact that · sup I1m 1-+00 e(t) I· ue(1/u) 1 = 1m sup = .J2t In In t u-+O .J2u In In(1/u) ( ), a.s. and so on. 3.6. 
Some Extensions Every Brownian process has independent and normally distributed increments. These are the defining features of the process. Also, every separable version has, with probability 1, continuous sample paths. It is remarkable that, in a certain sense, the converse also holds. It other words, every stochastic process with independent increments is Gaussian if its sample functions are continuous with probability 1. This result is due to Doob. We now give a precise formulation of this statement. Proposition 3.6.1. Let {w(t); t ~ o} be a stochastic process with independent increments. If its sample functions are continuous with probability 1, then its increments are normally distributed. PROOF. We will show that, under the conditions of the proposition, w(t) is normally distributed for every t ~ O. Because, for any 0 ::::;; s < t, w(t) = w(s) + [w(t) - w(s)], it will then follow from the Cramer theorem that wet) - w(s) has a normal distribution. 80 3. Elements of Brownian Motion The stochastic process w(t) is clearly separable (see Proposition 1.10.1). Because every sample path of w(t) is continuous with probability 1, it is also uniformly continuous in every finite subinterval of [0, 00). Hence, for every e > 0, there exists a c5 = c5(e) such that p{ sup lu-vl <.I IW(U)-W(V)I~e}<e, Let e1 > e2 > ... and en -+ ° U,VE[O,t]. as n -+ 00. Consider where tni - tn,i-1 = t k n Set if Iw(tnJ - w(tn,i-dl ~ en if IW(tni) - W(tn,i-1)1 < en' Clearly, then, p{W(t) =I ~ Y"i} = = ~ en}) pCQ {Iw(tni ) - w(tn,i-dl p {s~p Iw(tnJ - W(tn,i-1)1 ~ en} < en' Now, using the independence of {Y"J, we have E(e i6w (t») = !~~ E (ex p (w ~ Y,,) ). Set IXni If IXn -+ IX and v" -+ (J2, E(e i6w(t») = v"i = E(y"J, where IX and (J are finite, we obtain kn = Var{Y"J, lim exp(i8IXn) TI E(exp[i8(Y"i - IXnJ]) n-oo 1 3.7. The Ornstein-Uhlenbeck Process 81 Therefore, which is the desired result. o 3.7. The Ornstein-Uhlenbeck Process The strange irregular motion of a small particle submerged in liquid, caused by molecular bombardment, was first described mathematically by L. Bachelier in 1900. He went so far as to note the Markov property of the process. In 1905, A. Einstein and, independently, M. Smolukhowski proposed theories of the motion which could be used, for instance, to evaluate molecular diameters. A rigorous mathematical theory of Brownian motion was developed by Wiener in 1923 (the Wiener process). This theory, however, makes no pretence of having any real connections with physical Brownian motion-no particle can follow a typical sample path of the Wiener process. In 1930, Ornstein and Uhlenbeck proposed yet another process, somewhat similar to the Wiener process but more closely related to physical reality. The foundation for their work was laid down 22 years earlier by P. Langevin, whose theory will be discussed briefly here. The theory of Brownian motion developed by Einstein and Smoluchowski were not based on Newtonian mechanics. Langevin's approach, on the other hand, relies heavily on Newton's Second Law of Motion. In what follows we will give a brief account of Langevin's model. Denote by m the mass of a Brownian particle suspended in liquid and let v(t) be its velocity at time t. There are two forces acting on this particle. One is the frictional force exerted by liquid, which according to Stoke's law, is given by - pv(t), where p > 0 is a constant which depends of the viscosity of the liquid and on the particle's mass and diameter. 
The second force acting on the particle is due to the effect of molecular bombardment. It produces instantaneous random changes in the acceleration ofthe particle. Denote this force by w(t). Then, according to Newton's Second Law of Motion, we have mAv(t) = - pv(t)At + Aw(t). (3.7.1) We assume that w(O) = 0 and that the following conditions hold: (i) the stochastic process {w(t); t ~ O} has independent increments; (ii) the distribution of w(t + s) - w(t) depends only on s; and (iii) the sample paths of w(t) are continuous with probability 1. But then, according to Proposition 3.6.1, w(t) is a Brownian motion process, possibly with drift. Assuming then that E {w(t)} == 0 (no drift) and 82 3. Elements of Brownian Motion putting E {W(t)}2 = u 2t, we can write (3.7.2) w(t) = uW), where ~(t) is standard Brownian motion. With this equation (3.7.1) becomes + uL\~(t). mL\v(t) = - pv(t)M Dividing by M and letting M -+ 0, we obtain m dv(t) = _ pv(t) dt + u d~(t) (3.7.3) dt ' which is called "the Langevin equation." The unpleasant thing here is that this equation contains the derivative of the Brownian motion process which, as we know very well, does not exist. Therefore, the equation does not formally make sense. The problem offinding a proper stochastic interpretation of the Langevin equation was resolved by Doob in 1942, in the following fashion. Write Equation (3.7.3) as m dv(t) =- pv(t) dt + u d~(t) (3.7.4) and try to give these differentials a suitable interpretation. We will interpret (3.7.4) to mean that, with probability 1, m r f(t)dv(t) = -p r f(t)v(t)dt +u r f(t)dW) (3.7.5) for all 0 ~ a < b < 00 and f(· ) a nonrandom continuous function on [a, b]. As we shall see in the next section, all these integrals exist when the stochastic processes are continuous with probability 1. Finally, if in (3.7.5) we put a = 0, b = t, and f(t) = e<Xt, where IX = p/m, we obtain Iot d(e<X'v(s» = -u m It e<X' d~(s). 0 From this, assuming that v(O) = vou/m (constant) we readily deduce that, with probability 1, v(t) = ; (voe-<xt + I e-<X(t-')d~(S»). (3.7.6) Therefore, the velocity v(t) of a Brownian particle is the stochastic process defined by (3.7.6). Definition 3.7.1. The stochastic process {v(t); t ;;::: O}, where v(t) is given by (3.7.6), is called the Ornstein-Uhlenbeck process. Integrating by parts, (3.7.6) can also be written as v(t) = ; (voe-<xt + ~(t) - lXe-<xt I e<XS~(S)dS). (3.7.7) 3.7. The Ornstein-Uhlenbeck Process 83 From this definition, we obtain immediately that the average velocity of a Brownian particle (according to the Ornstein-Uhlenbeck model) is (3.7.8) To determine its covariance, let t ~ s. Then (3.7.9) For t = s, we obtain Var{v(t)} = a)2 1 _ 2a (m e- 2IXt (3.7.10) From (3.7.6), we readily deduce that the Ornstein-Uhlenbeck process is Gaussian. This follows from the fact that the integral I e-IXSde(s) is the limit of sums of independent normally distributed r.v.'s. All these results can be summarized as follows. Proposition 3.7.1. The Ornstein-Uhlenbeck process {v(t); t ~ O} is a Gaussian process with E{v(t)} = !!..voe- lXt, m Cov(v(t), v(s» = a)2 (m e-IXlt-sl _ 2a e-IX(t+s) . To summarize, the solution v(t) of the Langevin equation (3.7.4) is called the Ornstein-Uhlenbeck process. It is a model for the velocity of a Brownian particle. The "derivative" of e(t), which formally does not exist, is called a (Gaussian) white noise (the reason for this will be explained later), However, 84 3. 
Elements of Brownian Motion because X(t) = + Xo I v(s) ds is the displacement of the particle, x(t) is the physical Brownian motion. Therefore, v(t) is the physical noise. By letting t --+ 00, we obtain from (3.7.8) and (3.7.10) that E{v(t)} --+ 0 and Var{v(t)} --+ (;y 21a = p. (3.7.11) Thus, v(t)~v(CX) v(oo)~N(O,p). and Now, let U be an N(O, p) r.v. independent of {v(t); t ?:: O}, and consider Z(t) = e- at (; I eaSde(s) + U). (3.7.12) Then the following result holds. Proposition 3.7.2. {Z(t); t ?:: O} given by (3.7.12) is a stationary Gaussian process with Cov(s, t) pe -alt-sl. = (3.7.13) PROOF. The proof is straightforward. Normality follows from the fact that the r.v.'s in (3.7.12) are all normal. The covariance is obtained by direct computation. 0 Proposition 3.7.3. {Z(t); t ?:: O} given by (3.7.12) is a Markov process with stationary transition probability P{Z(s + t) E BIZ(s) = = x} 1 J2np(1 - e- 2at ) f B ~p {(u - xe- at )2 - 2p(1 - e 2at) (3.7.14) ~ and (a.s.) continuous sample functions. PROOF. According to (3.7.13), for all 0 E{(Z(t + u) - e-auZ(t»Z(s)} = ~ s < t and u > 0, pe-a(t+u-S) - pe-aUe-a(t-S) Therefore, Z(t + u) - e -au Z(t) is independent of all Z(s) if s hand, for any 0 < Sl < ... < Sn < s, ~ = O. t. On the other + t) E BIZ(Sl) = Xl, ... , Z(Sn) = Xn, Z(S) = X} = P{Z(S + t) - e-atZ(s) E B - e-atxIZ(Sd = Xl' ... , Z(Sn) = = P{Z(S + t) - e-atZ(s) E B - xe- at }. P{Z(s Xn, Z(S)=X} 85 3.8. Stochastic Integration But, the r.v. Z(s E(Z(s + t) - + t) - e-1X/Z(s) is normal with mean zero and e- 1X/ Z(s»2 = E(Z(s + t) - e-1X/Z(s»(Z(s + t) = E{(Z(s + t) - e-1X/Z(s»Z(s + t)} e-1X/Z(s» = p(1 _ e- 21X/ ). The continuity of sample paths follows from (3.7.7). This completes the proof of the assertion. D 3.8. Stochastic Integration Let {x(t);tE[a,b]} and {y(t);tE[a,b]} be real stochastic processes on a probability space {Q,~, Pl. The fundamental problem of stochastic integration is, roughly speaking, to give a sensible interpretation to the expression r (3.8.1) x(s) dy(s). If the process {y(t); t E [a, b]} is not of bounded variation, the integral (3.8.1) cannot be defined pathwise (i.e., for each WE Q separately) as an ordinary Stieltjes integral. Thus, a pathwise Stieltjes approach breaks down. It was Ito (1944), extending the work of Wiener, who discovered a way of defining stochastic integrals of the form r h(s,w)d~(s,w), (3.8.2) where ~ is a Brownian process and h a suitable random function, defined on the same probability space. EXAMPLE r 3.8.1. To illustrate Ito's approach, let us determine the integral ~(s) d~(s), where g(s); s E [a, b]} is a standard Brownian motion. To this end, consider a = t no < tnl < ... < tnn = b, where max (tn,Hl i We now define Because - t ni ) -+ 0 as n -+ 00. 86 3. Elements of Brownian Motion n-I I i=O Wn;) [Wn.i+d - ~(tni)] r we obtain, invoking Proposition 3.4.3, that ~(s)d~(s) = H~2(b) - ~2(a)] - t(b - a). Many attempts have been made to generalize Ito's integral. The first generalization consisted of replacing the Brownian motion process with a square integrable martingale. Kunita and Watanabe (1967) introduced the concept of a local continuous martingale and a stochastic integral with respect to it. Now the latest result in the theory is that one cannot integrate with respect to anything more general than a semimartingale. In this section, we discuss somewhat informally a version of the concept of a stochastic integral. 
Our aim is to present some basic facts about stochastic integration to justify the mathematical operation of the previous section in connection with the solution of the Langevin equation. Let {C(t);t E [a,b]} be a second-order random process (see Definition 1.5.8) such that E {W)} = 0 and h( . ) be a real nonrandom function on [a, b]. In the sequel, we will define the concept of Riemann stochastic integral of h(· ) with respect to C, r (3.8.3) h(s)C(s)ds, for a suitable class of functions h( . ). Let a = t no < tni < ... < tnn = b be a partition of [a, b] such that SUPi (tn,i+1 sequence of r.v.'s {Un}':' defined by - t n;) --+ 0 as n --+ 00. Consider the n-I Un =L i=O h(tni)Wni)(tn,i+1 - t ni )· If {Un}':' converges in the mean square to a r.v. U, i.e., if E(Un - U)2 --+ r 0 as n --+ 00, then we call U the Riemann integral of h( . ) with respect to Cand write U = h(s)((s)ds. (3.8.4) 3.8. Stochastic Integration 87 Note that (3.8.4) is equivalent to E(U,,- Uk)2--+0 asmin{k,n}--+oo. Next we discuss conditions under which the limit (3.8.4) exists. rr Proposition 3.8.1. For the integral (3.8.3) to exist, it is sufficient that the integral (3.8.5) h(s)h(t)C(s,t)dsdt exists, where C(s,t) = Eg(s)W)}. PROOF. Assume that the integral (3.8.5) exists and consider E(U" - ukf = EU; - rr 2EU" Uk + EU;. After some straightforward calculations, we find that !~~ EU; = lim min(n,k)-+oo EU" Uk = h(s)h(t)C(s,t)dsdt, (3.8.6) f.b f.b h(s)h(t)C(s, t) ds dt. (3.8.7) a a Therefore, lim min(",k)-+ao E(U" - U,,>2 = 0, o which proves the sufficiency of (3.8.5). r Corollary 3.8.1. If we assume that the function h( . ) is of bounded variation and C(s, t) is continuous, then the integral f..b exists. (3.8.8) C(s, t) dh(s) dh(t) Now, consider ,,-1 L h(t"i) [(t",i+1) - i=O W"I)] 11-1 11-1 i=O i=O = L h(tlli)C(t",i+d = L h(t"iK(tlli ) L h(t",j-1)Wllj) + h(bK(b) j=1 II = h(b)C(b) - h(a)C(a) - h(a)C(a) - L h(t"i)C(t"i) i=1 " II L (tlli ) [h(t"i) ,,=1 h(t",i-1)]. (3.8.9) 88 3. Elements of Brownian Motion r r But, because by assumption the integral (3.8.8) exists, it follows from Proposition 3.8.1 that l~i:~. i~ Wni) [h(tni ) - h(tn.i+l)] = r '(s)dh(s) exists. Therefore, because the limit of the right-hand side of (3.8.9) exists, so does the left-hand side and we have h(s)d'(s) = h(bK(b) - h(aK(a) - (3.8.10) '(s)dh(s). EXAMPLE 3.8.2. Let {W); t ;?; O} be a standard Brownian motion process. For which functions h(· ) does the integral I h(s)e(s)ds (3.8.11) exist in the mean square? For this integral to exist, the integral (3.8.5) must exist. Because, in this case, C(s, t) = min(s, t), we have II min(u, v)h(u)h(v) du dv = = = I I f: I I h(V)dv{f: uh(u)du +v uh(u)du + h(v)dv f: uh(u)du + h(v)dv =2 I I f h(U)dU} f f: vh(v)dv h(u)du h(u)du vh(v)dv f: h(v)uh(u) du dv. From this, we deduce that the integral (3.8.11) exists if h( . ) ELl' This, then, implies that the integral I h(s)de(s) also exists if h( . ) is a function of bounded variation. Problems and Complements 3.1. Determine the correlation function of a standard Brownian motion. 3.2. Let {W); t ~ O} be a standard Brownian motion. Determine the joint probability density of W1) and W2)' 0 < t1 < t2 < 1, given W) = o. 89 Problems and Complements 3.3. If {W); t ~ O} is a standard Brownian motion, show that X(t) = tW- l ), if t > 0 and X(O) = 0, is also a standard Brownian motion. 3.4. The reflected Brownian motion is {IW)I;t ~ O}, where W) is a standard Brownian motion. Show that IW)I is a Markov process. Find Egl(t)l} and Var{l~(t)l}. 3.5. 
Let { ~(t); t ~ O} be a standard Brownian motion. Find the conditional probability density of ~(t) given that Wd = Xl' Wz) = xz, and tl < t < t z . 3.6. Let {~i(t); t ~ O}, i = 1, 2, be two independent standard Brownian motions. Define X(t) = {~l(t)' ~z(-t), Find the covariance function of {X(t); -00 t~0 t < O. < t < oo}. 3.7. Let {~(t); t ~ O} be a standard Brownian motion. Show that X(t) = 1 -~(ct) c is a separable Brownian motion. 3.8. Let {Z(t); 0 ~ t ~ 1} be a Brownian bridge [see (3.1.8)]. Find Cov(Z(t dZ(tz)). Define Y(t) = Show that {Y(t); t ~ (1 + t)z(_t_), 1+t t ~ O. O} is a standard Brownian motion. 3.9. Let {~(t); t ~ O} be a standard Brownian motion and consider Z(t) = I ~(s)ds, t ~ O. (i) What kind of process is Z(t)? (ii) Determine its covariance function. 3.10. Let {~(t); t ~ O} be a standard Brownian motion and let 't x be the first hitting time of X (see Section 3.2.1). Show that 3.11. Let X(t) = sUPo:s;.:s;, ~(s). Show that, for any X < y, Pg(t) ~ x, X(t) > y} = Pg(t) > 2y - x}. 3.12. Show that {~('tx + t) - ~('tx); t ~ O} is a Brownian motion independent of't x • 3.13. Let {~(t); t ~ O} be a standard Brownian motion and X(t) = sup{ ~(s);O ~ s ~ t}. Show that, for x < y and y > 0, P{X(t) ~ y, W) E dx} = 1 { exp (Xz) Jf.it - 2t - (2Y2t- X)z)} dx. exp - 3.14. Let {W); t ~ O} be a standard Brownian motion and 'tx be the first hitting time 90 3. Elements of Brownian Motion of a state x > O. Define if t < tx ift ~ t x . Z(t) = {:(t) Determine P{Z(t) ::;; y}, where y < x. 3.15. Show that the processes IW)I and X(t) - ~(t), where ~(t) is a standard Brownian motion and X(t) = sup{ ~(s); 0::;; s ::;; t}, are stochastically equivalent in the wide sense (see Definition 1.3.1). 3.16. Determine the probability p{ min to~s::;;to+t [This is the probability that Wo) = Xo > 0.] ~(s)::;; 0IWo) = ~(s) = xo}. 0 at least once in (to, to + t) given that 3.17. Let {W); t ~ O} be a standard Brownian motion. Find the probability that W) has at least one zero in (to, to + t). 3.18. Let T* be the largest zero in (0, t); then P{T*::;; u} = 3.19. Let {~(t); ~arc sin~. 7t t ~ O} be a standard Brownian motion; then V(t) = e-'~(e2') is called an Ornstein-Uhlenbeck process. Show that V(t) is a Gaussian process. Determine its covariance function. 3.20. Let g(t);t ~ O} be a standard Brownian motion and h(·) a real continuous function on [0, (0). Consider X(t) = I h(s)~(s) ds. What kind of a process is X(t)? Determine its mean and variance. 3.21. Let f(·) and h(·) defined on [0, (0) be differentiable and g(t); t dard Brownian motion. Show that {f f d~(s) f {f rh(s)f(t)d~(s)d~(t) E f(s)h(t) dW) } = ~ h(t)f(t) dt. 3.22. (Continuation) Show that E } = 0 if a < b ::;; c < d. 3.23. Verify the identity bfC E {fa a f(s)h(t) d~(s) dW) } fmiO{b.C} = a h(t)f(t) dt. O} be a stan- 91 Problems and Complements 3.24. Let {,(t); t ~ O} be a standard Brownian motion. Find the mean and covariance of e ot f~ e-OSd,(s). 3.25. Let {W); t ~ O} be a standard Brownian motion. Show that, for all t ~ 0, IW)I ~ W) - inf '(s). O,;s,;t 3.26. Suppose that the stochastic process {X(t);t ~ O} is a solution of the stochastic differential equation + exX(t) = W), X(O) = Vo, where m, ex are positive constants and W) is a standard Brownian motion. What mX'(t) kind of process is X(t)? Determine its mean and covariance function. 3.27. 
Let the process {X(t); t ~ O} be a solution ofthe following stochastic differential equation: exX'(t) + PX(t) = ,'(t), X(O) = Vo, where ex, P> 0 are constants, Vo is a real number, and ,(t) is a standard Brownian motion. Define precisely what is meant by a solution of this equation. Find the solution satisfying the initial condition X(O) = Vo. Find the mean and covariance ofW)· CHAPTER 4 Gaussian Processes 4.1. Review of Elements of Matrix Analysis In this section we present a review of some basic properties of the square matrices that will be needed throughout this chapter. It is expected that those who read this section have some background in matrix analysis. Let M = (a ij ) be a square matrix whose elements (or entries) aij are, unless otherwise stated, real numbers. If aii = 1 for all i and aij = 0 when i -# j the square matrix is called the "unit matrix" and denoted by I. As usual we will denote by M' the "transpose" of the matrix M, which is a square matrix M' = (Pij), such that /3ij = aji for all i and j. From this definition, it follows that l' = 1. When M = M', the square matrix M is said to be "symmetric." The following properties of the "unary" operation' are easy to verify: (M')' = M, Let M = (a i) be an arbitrary square matrix. In this book, we shall use the symbollMI to denote the determinant of M. One can verify that IM'I=IMI· (4.1.2) If IMI = 0, the square matrix M is said to be "singular"; otherwise, we call it nonsingular. Let M be a nonsingular square matrix; then there exists the unique square matrix, denoted by M- 1 , such that M-1M = MM- 1 = 1. (4.1.3) The square matrix M- 1 is called the "inverse" of M. It is well known that (4.1.4) The matrix M- 1 is nonsingular; if Ml and M2 are nonsingular so is MI' M2 4.2. Gaussian Systems 93 and (4.1.5) In addition, (4.1.6) We will use the notation x for a column vector. Then x' is a row vector. Now let A = (aij) be an n x n symmetric matrix. The matrix A is said to be "non-negative definite" if the quadratic form n x'Ax = n L L aijXiXj;;::: O. i=i j=i (4.1.7) If x'Ax = 0 if and only if x = 0, the square matrix A is said to be "positive definite." A symmetric matrix A is positive definite if and only if there exists a square matrix C such that ICI > 0 and C'C = A. (4.1.8) If M is an n x n matrix, the equation 1M - All = 0 (4.1.9) is of degree n with respect to A. The roots Ai,"" An of this equation are called the "eigenvalues" (or "spectral values") of the matrix M. It is a well-known result that every symmetric n x n matrix A has n real, not necessarily distinct, eigenvalues Ai, ... , An' and that n IAI=nAi' (4.1.10) i=i If all Ai > 0, the symmetric matrix A is positive definite. Finally, a square matrix T is said to be "orthogonal" if T'· T = 1. (4.1.11) The next result concerning orthogonal matrices is due to Scheff€:. Let A be a symmetric matrix; then there exists an orthogonal matrix T such that T'AT = D, (4.1.12) where D = (dij) is a diagonal matrix; that is, dij = 0 when i #- j. In addition, dii = Aj • Finally, let M be an arbitrary square matrix; then the product M'M=A is a symmetric and non-negative definite square matrix. 4.2. Gaussian Systems Let (4.2.1) 94 4. Gaussian Processes be a sequence of real r.v.'s on a probability space {n,~,p}, such that {X;}~ c L2{n,91,p}. Denote by J1.j = E(Xj)' uij = E(Xi - J1.i)(Xj - J1.j). (4.2.2) The symmetric n x n matrix (4.2.3) A = (Uij) is called the covariance matrix. Because (4.2.4) the matrix A is positive definite. Thus, IAI > 0 and, consequently IA-li > O. 
Denote by pi = (J1.l, ... ,J1.n)' The system of r.v.'s (4.2.1) is normally distributed if its joint probability density is (4.2.5) and A -1 is also a symmetric positive definite matrix. Now let us determine K. Because A -1 is symmetric and positive definite, there exists a square n x n nonsingular matrix C such that (4.2.6) Hence, (x - p)'A-1(X - p) = (x - p)'C'C(x - p) = (c(x - p))" C(x - p). Set y = C(x - p), x = C- 1y + p. (4.2.7) The Jacobian J of this transformation has the form (4.2.8) where II C- 111 indicates the absolute value of the determinant IC- 11. From (4.2.6), we have (4.2.9) Hence, (4.2.10) Therefore, f:··· f:oo!(X1".. ,xn)dx1...dxn= K t:··· f: or K 'IAI 1/2 {f: e- l /2x2 dx e- l /21'1IAI 1/2 dYl" .dYn = 1, r = 1. 4.2. Gaussian Systems 95 From this, we readily obtain that K =(~)1/2 (2nt ' (4.2.11) (4.2.12) Denote by CP(tl,· .. ,tn ) { r. = E expV f-t tjXj)} = E {'I'X} e' (4.2.13) the characteristic function of the system (4.2.1), where (4.2.14) The following proposition gives the form of the characteristic function (4.2.13) assuming that the joint probability density of the system (4.2.1) is (4.2.12). Proposition 4.2.1. The characteristic function cp(t 1 , ••• ,tn) under (4.2.12) is (4.2.15) PROOF. Because the symmetric matrix A is positive definite, there exists a square n x n nonsingular matrix L such that LV = A. Then, (x - p)'A-l(X - p) = [L -l(X - p)]'L -l(X - p), so that x ( IA-11)1/2 (2n)n r: . ·f: cp(t 1, ... , t n) = exp{it'x - ![L-1(x - p)]'L-1(x - P)}dXl .. ·dxn • Now set The Jacobian of this transformation is Therefore, x = Lz + p. 4. Gaussian Processes 96 cp(tl>".,t n )= = A -11)1/2 fOO ( I(2n)' -00'" exp(it'J!) (),/2 2n foo ". foo -00 foo -00 exp[it'(LZ+J!)-~z'zJIAI1/2dz1 .. ·dz, 1,· , exp(-zzz+ltLz)dz 1 ,,·dzn , -00 or if we set u' = t'L, where u' = (u i , ... , u,), J] fOO exp( -ZZj2+·lujz)dzj} = exp(it'J!) {' (2n),/2 = exp(it'J!) ( , ) (2 ),/2 exp( - ~uJ 1 -00 JJ n J-l JJ, (fOO J-1 = exp(it'J!) (2n),/2 [exp( -~u'u)J(2n)'/2 = exp(it'J! - YLL't) = ) exp [ - ~(Zj - iuYJ dZj -00 = exp(it'J!- ~u'u) exp(it'J! - ~t'At). This proves the proposition. D 4.3. Some Characterizations of the Normal Distribution A system of r.v.'s (4.3.1 ) defined on a probability space {n,.?4, P} is said to be normal (or Gaussian) if {XJ c L 2 {n,.?4,p} and its joint probability density is given by Equation (4.2.12). We will show several properties of a Gaussian system. Let n L Z = IXiXi = (X'X, (4.3.2) i~l where IXl, ... , IX, are constants and X' = (Xl"'" Xn), be a linear combination. We want to show that the r.v. Z is also normally distributed with E {Z} n = L IXiJ.li = (X'J!, i~l n Var{Z} = n L L IXiIX/Iij' i~1 j~l See (4.2.2) for definition of the notation. This will be demonstrated using characteristic functions. 4.4. The Gaussian Process 97 According to (4.2.13), the characteristic function of X is <p(tl,···,tn ) = E{exp(it'X)} = E{exp(i ktl tkXk)} (4.3.3) On the other hand, E{e iuZ } = E{exp(iU j~ tljXj )}. (4.3.4) From this and (4.3.3), we see that (4.3.4) is a particular case of (4.3.3) obtained by setting Hence, (4.3.5) and this is the characteristic function of the normal distribution with mean tliJli and variance L~ L~ tlktlPkj' This proves our assertion. Does the converse hold? In other words, if every linear combination of (4.3.1) is normally distributed, does this imply that the system (Xl"'" Xn) has a normal joint distribution? The answer is affirmative as we will show next. 
To this end, assume that the Z [see (4.3.2)] is normally distributed; then its characteristic function is (4.3.5). Now, consider L~ <p(t 1 ,··., tn ) = E {exp (i t tjX j )}. This clearly can be obtained from (4.3.5) if in (4.3.5) we set Utlj = tj • Hence, <p(tl"'" t n ) = exp (i t tjUj - ~ tt = exp(it'p - tt' At). t h C1jk) (4.3.6) According to Proposition 4.2.1, this is the characteristic function of the normal distribution. This proves our assertion. The previous two results can be formulated as follows. A system of n r.v.'s from L2 {n,,qJ, P} is Gaussian if and only if every linear combination of its members is normally distributed. 4.4. The Gaussian Process Let {e(t); t E T} be a real-valued stochastic process on {n,,qJ, P} such that g(t); t E T} C L2 {n,,qJ, Pl. 98 4. Gaussian Processes Set (4.4.1) Jl(t) = E g(t)} and C(s, t) = E(~(s) - Jl(s»(~(t) (4.4.2) - Jl(t». the mean and the covariance function, respectively, of the process ~(t). Definition 4.4.1. The stochastic process {~(t); t E T} is said to be a "Gaussian process" if and only if for each {tl, ... ,t,,} c T, n = 1,2, ... , (~(td,· .. ,W,,» is jointly normal. The covariance function (4.4.2) possesses the following properties: (i) C(s, t) = C(t, s); (ii) for any {t l , .•. ,t,,} c T, n = 1,2, ... , and any real numbers U l , L" L" UiUjC(t i, t) ~ o. ••• , U", (4.4.3) i=1 j=1 In other words, the covariance function is non-negative definite [see (4.2.4)]. We now show that the symmetry and non-negative definiteness are sufficient conditions for there to exist a Gaussian random process with a specified covariance function. Let C(s, t) be a symmetric non-negative definite function. Consider {tJ~ c T such that t 1 < ... < tIl and specify that WI), ... , ~(t,,) is jointly Gaussian with covariance matrix [C(t i, tj)]' i, j = 1, ... , n. Then any subfamily is also jointly Gaussian. Thus, this family satisfies the Kolmogorov consistency requirement. It now remains to apply the Kolmogorov theorem. EXAMPLE 4.4.1. Let {W); t ~ o} be a Gaussian process with E {~(t)} = 0 and C(s, t) = min(s, t). (4.4.4) From the definition, it follows that, for any 0 < tl < ... < t", the system of r.v.'s (Wd, ... , ~(tll» has a normal distribution with the covariance matrix A= (4.4.5) One can show that this matrix is positive definite. Let us now prove that the process ~(t) has independent increments. First, from (4.4.4), it readily follows that, for any 0 ::;;; s < t, Varg(t) - ~(s)} = t - s. On the other hand, for any 0 ::;;; E(~(S2) SI < S2 < - ~(SI»(~(S4) - ~(S3» S3 (4.4.6) < S4. = S2 - S2 - SI + SI = 0, 99 4.5. Markov Gaussian Process which clearly shows that the r.v.'s (4.4.7) e(t 1), W2) - W1), ... , W.) - W1) are uncorrelated. On the other hand, because every linear combination of a Gaussian system is also normally distributed, each e(t;) - e(ti-l) is a normal r.v. with mean zero and variance given by (4.4.6). Hence, (4.4.7) are jointly Gaussian and, because they are uncorrelated, they must be independent. Therefore, the process {e(t); t ~ O} has independent increments, which are normally distributed, and the distribution of W + s) - W) does not depend on t. Thus, {e(t); t ~ O} is clearly a standard Brownian motion. From this, we may conclude that a standard Brownian motion is a Gaussian stochastic process satisfying condition (4.4.4). Remark 4.4.1. If (X 1 " " , X.) is a jointly Gaussian collection, then these r.v.'s are mutually independent if and only if the covariance matrix of the system is diagonal. 
Moreover, because the covariance matrix is diagonal if and only if each couple of Xi and Xj is independent, a Gaussian collection is mutually independent if and only if it is pairwise independent. 4.5. Markov Gaussian Process In this section, the following result is required: Let (Xl' ... ,X.) be a Gaussian system of r.v.'s with joint probability density IA-11)1/2 f(x 1, ... ,Xn) = ( (2n)" exp (1. -:z n ) i~ j.f; aijXiXj , (4.5.1) where A -1 = [aij] is a symmetric positive definite matrix. The conditional probability density of Xn given that Xl = Xl' ... , X n- 1 = Xn- 1, denoted by f(x nlx 1, ... ,Xn-l), is defined by f(x l' ... ,X n) J~oof(X1"'" x.) dXn = exp( - ~ J~oo exp( -~ I7=1 Ij=l aijXiX) Ii=l Ij=l aijxixj) dx; (4.5.2) After some simple algebraic calculations, one readily obtains that n n n-1 n-l I I aijXiXj = i=l I I i=l j=l j=l n-1 aijXiXj + x;ann + 2xn I i=l ainXi' (4.5.3) From this and (4.5.2), we have = I [ Cexp { -:zann Xn + n-1 i~ (a.) J2} , a:: Xi (4.5.4) 4. Gaussian Processes 100 where C is a norming constant independent of Xl' ... , x n- l . This is clearly a normal probability density with mean n-l (a. ) -I ~ ann Xi' i=l Thus, (4.5.5) Let g(t); t E T} be a Gaussian process with E(~(t)) = 0 and C(s, t) = E(~(s)~(t)), s, t E T, which is also a Markov process. We want to show that the following result holds. Proposition 4.5.1. The covariance matrix C(s, t) of a Markov Gaussian process with mean value 0 satisfies the condition that, for any t l , t 2, t3 E T with tl < t2 < t 3, (4.5.6) PROOF. Because s < t from T, ~(t) is a Markov process, it follows from (4.5.5) that, for any E g(t)1 ~(s) = x} = - (a ann n - l •n) x. (4.5.7) To determine (an-l.n/ann), consider f.t(Xl,X2) = • (lA -11)1/2 211: 1 I -1 exp(-"2xA x), where A -1 = [C(s, s) C(t, s) C(s, t)]-l C(t, t) 1 = C(s, s)C(t, t) - C 2(s, t) [ C(t, t) - C(s, t) -C(s,t)] . C(s,s) From this and (4.5.7), we conclude that E{~(t)I~(s) = x} = (~~:::Dx. Finally, according to (1.5.6), we can write C(t l ,t 3) = E(WdW3)) = E{E(WdW3)IW2))} = E{E(~(tdIW2))E(~(t3)1~(t2))}' (4.5.8) 4.5. Markov Gaussian Process 101 which proves the assertion. D According to the last proposition, the covariance function of every Gaussian Markov process with mean zero must satisfy Equation (4.5.6). One can show that the converse is also true. In other words, any mean zero Gaussian process whose covariance function satisfies Equation (4.5.6) must have the Markov property. To prove the last statement, we need to observe that a zero mean Gaussian process is Markov if and only if, for any tl < t2 < .. , < tn' n = 2, 3, ... , Necessity of this condition is obvious. To prove its sufficiency, note that from (4.5.5) and (4.5.7), it follows that On the other hand, it is not difficult to verify that 1 [ Xn !(Xn!Xn-l) = Cexp { -2ann + (a n - l • n) ~ Xn-l J2} , which proves the assertion. Let us now show that (4.5.6) implies (4.5.9). Note that from (4.5.6), it follows that, for any 1 :::; k :::; n - 1, C(tk, tn ) -_ C(tko tn- l )C(tn- l , tn), C(tn - l , t n - l ) for all k = 1, ... , n - 1. Thus, because these are normally distributed r.v.'s, e(tn) - ( C(tn- 1, tn) ) Wn-l) C(tn- l , tn-d 4. Gaussian Processes 102 or 4.6. Stationary Gaussian Process Let {~(t); t E T} be a real-valued Gaussian process such that E{Wn = 0, (4.6.1) Vt E T. 
The marginal distributions of such a stochastic process are completely determined by its covariance function (4.6.2) According to Definition 1.5.6, marginal distributions of a strictly stationary stochastic process are invariant under time-shifts. When is a Gaussian zero-mean process strictly stationary? From the previous remarks, the condition should be on C(s, t). To simplify our notation, assume that T = [0, 00); then we have the following result. Proposition 4.6.1. A real-valued Gaussian process ° :$; s :$; if t, C(s, t) = C(O, t - s) = R(t - s). PROOF. o} with zero-mean and only if, for any {~(t); t ~ and covariance function C(s, t) is strictly stationary (4.6.3) If ~(t) is strictly stationary, then E{~(s)~(t)} = E{~(s or C(s,t) = C(s + .)W + .)} + .,t + .). By taking. = - s we have C(s, t) = C(O, t - s) = R(t - s). On the other hand, if (4.6.3) holds, the characteristic function of t1 < .. , < t n) is (~(t1),··.,~(tn)) (O:$; Problems and Complements 103 [see (4.3.6)]. But this is also the characteristic function of (e(t 1 + -r)). This proves the assertion. + -r), ... , 0 e(t n Corollary 4.6.1. A zero-mean wide sense stationary Gaussian process is also strictly stationary. When is a stationary Gaussian process also a Markov process? The following proposition due to Doob provides an answer to this question. Proposition 4.6.2. A stochastically continuous stationary Gaussian process {W);t;:::.: O} with E{e(t)} is Markov =0 =1 E{W)V and if and only if R(s) = e- Y1s1 , y > O. (4.6.4) PROOF. If e(t) is stationary and Gaussian, then according to the condition of the proposition, (4.6.3) and (4.5.6), we have that R(s + t) = R(s)R(t). (4.6.5) Now, because e(t) is stochastically continuous, R(u) is continuous at u = 0 (prove this); then, because R(O) = 1, we have R(t + h) - R(t) = R(t) [R(h) - 1] This shows that R(t) is continuous at every t of the Cauchy equation (4.6.5) for t > 0 is R(t) = e- yt , E -+ 0 as h -+ O. R. Then the unique solution y > O. On the other hand, R( - t) = R(t), which proves necessity of condition (4.6.4). Conversely, if the covariance of a stationary zero-mean and variance 1 Gaussian process is of the form (4.6.4), then condition (4.5.6) holds. This implies that condition (4.5.9) must also hold, which proves the proposition. o Problems and Complements 4.1. Let X = (Xl' ... ,X4 ) be a Gaussian system ofr.v.'s with covariance matrix A= 15 3 1 0 3 16 6 -2 1 6 4 0 -2 1 3 104 4. Gaussian Processes Determine its probability density for f(xl' ... ' X4) if III = 1,0, = 0, 112 113 = -1,0, 114 = 1. 4.2. The probability density of the system X = (X l ,X2 ,X3 ) is given by f(x l ,x2,x3) = 1 f3 (1 2+ 4X22- 16V1i3exp -g[2x l 2x 2(X3 2) . + 5) + (X3 + 5)] Find 111,112,113' and the covariance matrix. 4.3. Let (X l ,X2 ,X3 ,X4 ) be a Gaussian system ofr.v.'s with E{Xd Show that E{X 1 X 2 X 3 X 4 } = where O"ij 0"120"34 = 0, i = 1, ... , 4. + 0"130"24 + 0"140"23, = E{XjXJ. 4.4. Let {XjH be an i.i.d. sequence ofr.v.'s with each X j - N(O, 1). Let {a;}~ and {bj}~ be real numbers. Show that Y= L" ajXj i==l and Z = . L bjXj i=l are independent r.v.'s if and only if L~=l ajbj = 0 4.5. Let A be an k x k square symmetric nonsingular positive definite matrix. Show that there exists a nonsingular square matrix r such that A = rr- l . 4.6. Let X and Y be i.i.d. r.v.'s with finite variance fJ such that a 2+ fJ2 = 1, a . fJ =f. 0, and aX 0"2. If there exist constants a and + fJY':" X, then X - N(O,O"). 4.7. Let g(t); t ;;::: O} be a standard Brownian motion. Show that a. 
E{W2)IWd = xd = Xl' tl < t2; b. E{W2)IWd = Xl' W3) = X3} = Xl + (X3 - Xl)(t2 - td, t3 - tl for all 0 < tl < t2 < t3. 4.8. Let Xl and X 2 be i.i.d. r.v.'s X j - N(O,O"). Consider W) = Xl cos 1t + X 2 sin 1t. Find the mean and the covariance function of the stochastic process {e(t); t ;;::: O}. Show that the process is Gaussian. 4.9. Let {X(t); t ;;::: O} be a Gaussian process with E{X(t)} = 0 and E{X(t)X(t + 't")} = C('t"). Find the covariance function of {I1(t);t;;::: O}, where I1(t) = X(t)X(s + t). 4.10. Show that a Brownian motion process g(t); t ;;::: O} is a Gaussian process. Problems and Complements 105 4.11. Construct (X, Y) which is not normal, but X and Yare normal. 4.12. Let {Xn}'1 be a sequence ofr.v.'s Xn ~ N(Jln, an) such that Xn q~ X. IfVar{X} = a 2 > 0, show that X ~ N(Jl, a). 4.13. Let X and 0 be independent r.v.'s, 0 uniform in [0, 2n], and _ {2x3e-1/2X" fx(x) - 0, x;::: 0 x < o. Let g(t); t ;::: o} be stochastic process defined by W) = X 2 cos(2nt + 0). Show that W) is Gaussian. 4.14. Let {X(t); t ;::: O} be a stochastic process specified by X(t) = e-'~(e2'), where ~(t) is a standard Brownian motion process. Show that X(t) is a Gaussian Markov process. 4.15. Let {~(s); s ;::: o} be a standard Brownian motion. Consider X(t) = I ~(s)ds. What kind of process is {X(t); t ;::: OJ? Determine its covariance function. 4.16. Let g(t); t ;::: O} be a Gaussian process specified by ~(t) = X cos(2nt + U), where U is uniform, in [0,2n] independent of x. Determine the distribution of X. 4.17. Let g(t); t ;::: O} be a Gaussian process with E g(t) = O} and suppose that C(s, t) = Eg(s)~(t)} is continuous in sand t. Show that X(t) = I o:(s)~(s) ds is also a Gaussian process where 0:(. ) is a continuous function. 4.18. Assume that {~(t); t ;::: O} is a stationary Gaussian process. Show that f+T ~(s) ds = Z(t) is also a stationary Gaussian. 4.19. Let {~(s); s ;::: O} be a Markov Gaussian process. Show that X(t) f = ,'+T ~(s)ds is not Markovian. 4.20. Let g(t); t ;::: O} be a stationary Markov Gaussian process with continuous covariance function C(s). Determine C(s). 4.21. Complete the proof of Proposition 4.6.2. CHAPTER 5 L2 Space 5.1. Definitions and Preliminaries In many applications of the theory of stochastic processes, an important role is played by families of square integrable (second-order) r.v.'s. In this section, we give some basic definitions and prove some fundamental inequalities involving second-order complex-valued r.v.'s. Let {n,~,p} be a probability space and Z a complex-valued r.v., i.e., Z = X + iY, where X and Yare real-valued r.v.'s on {n,~,p}. As usual, we write X = Re {Z} for the "real" part and Y = 1m {Z} for the "imaginary" part of the complex variable Z. The conjugate Z of Z is defined by Z= and the modulus X - iY IZI is given by IZI 2 = (X2 + y2). The obvious connection between the conjugate and the modulus of Z is Z·Z = IZI2. Definition 5.1.1. A complex-valued r.v. Z on {n,~,p} is called second order if EIZI 2 < 00. The family of all such r.v.'s is denoted by L2 = L2{n,~,p}. 5.1. Definitions and Preliminaries Proposition 5.1.1. For any Zl' Zz 107 E L2, IEZ1 ·Z2 Iz :::;; EI Z 1I z ·EI Z zl z. PROOF. (5.1.1) This is Schwarz's inequality. It follows readily from O<Elz _(EZ 1'ZZ)Z IZ 1 EIZzl z z EZ1' Zz) )((EZl 'Zz) - ) = E ( Zl - ( EIZzl z Zz Zl - EIZzl z Zz = EIZ IZ _ (IEZl . ZzIZ) EIZzl z 1 ' With the usual pointwise definitions of addition and multiplication by complex constants, L z becomes a linear (or vector) space. 
As such, $L_2$ makes no provision for such concepts as the norm of $Z$. We now append this notion to $L_2$. To this end, we need the following convention in dealing with elements of $L_2$.

Definition 5.1.2. Any two $Z_1, Z_2 \in L_2$ are to be regarded as equal if
$$Z_1 = Z_2 \ \text{(a.s.)}. \tag{5.1.2}$$

Definition 5.1.3. For $Z_1, Z_2 \in L_2$, the complex number $(Z_1,Z_2)$ defined by
$$(Z_1,Z_2) = E\,Z_1\overline{Z}_2 \tag{5.1.3}$$
is called the "inner" product. That (5.1.3) always exists follows trivially from Schwarz's inequality. We obviously have
$$\text{(i)}\ (cZ_1,Z_2) = c(Z_1,Z_2); \quad \text{(ii)}\ (Z_1,Z_2) = \overline{(Z_2,Z_1)}; \quad \text{(iii)}\ (Z_1+Z_2,Z) = (Z_1,Z) + (Z_2,Z). \tag{5.1.4}$$
Finally, from (5.1.2), it follows that $(Z,Z) = 0$ implies $Z = 0$ (a.s.). Thus, $(Z_1,Z_2)$ satisfies all the requirements for an inner product. Therefore, $L_2$ is an inner product space.

Remark 5.1.1. The convention (5.1.2) is an informal device for dealing with the problem posed by the failure of $(Z,Z) = 0$ to imply $Z = 0$. A formal means of treating this difficulty is to replace the space $L_2$ with the space of equivalence classes defined by the assertion that two r.v.'s are equivalent if and only if they are equal (a.e.). All subsequent statements would then be applied to equivalence classes rather than to the r.v.'s themselves.

The inner product can be used to define another function on $L_2$.

Definition 5.1.4. For each $Z \in L_2$, the "norm," denoted by $\|Z\|$, is defined by
$$\|Z\| = (Z,Z)^{1/2}. \tag{5.1.5}$$

Proposition 5.1.2. For any $Z_1, Z_2 \in L_2$,
$$\|Z_1+Z_2\|^2 + \|Z_1-Z_2\|^2 = 2\|Z_1\|^2 + 2\|Z_2\|^2. \tag{5.1.6}$$

PROOF. This is the Parallelogram Law. To prove it, write
$$\|Z_1+Z_2\|^2 = (Z_1+Z_2,\,Z_1+Z_2) = \|Z_1\|^2 + (Z_1,Z_2) + (Z_2,Z_1) + \|Z_2\|^2,$$
$$\|Z_1-Z_2\|^2 = (Z_1-Z_2,\,Z_1-Z_2) = \|Z_1\|^2 + (Z_1,-Z_2) + (-Z_2,Z_1) + \|Z_2\|^2.$$
By adding these and employing (5.1.4), the assertion follows. $\square$

Definition 5.1.5. If $Z_1, Z_2 \in L_2$ are such that $(Z_1,Z_2) = 0$, we say that they are "orthogonal" and indicate this by writing $Z_1 \perp Z_2$.

Proposition 5.1.3. If $Z_1 \perp Z_2$, we have
$$\|Z_1+Z_2\|^2 = \|Z_1\|^2 + \|Z_2\|^2. \tag{5.1.7}$$

PROOF. Write
$$\|Z_1+Z_2\|^2 = (Z_1+Z_2,\,Z_1+Z_2) = \|Z_1\|^2 + (Z_1,Z_2) + (Z_2,Z_1) + \|Z_2\|^2 = \|Z_1\|^2 + \|Z_2\|^2. \qquad \square$$

Definition 5.1.6. A subset $\mathcal{O} \subset L_2$ is called an "orthogonal collection" if, for any $Z_1, Z_2 \in \mathcal{O}$ such that $Z_1 \ne Z_2$ (a.s.), we have $Z_1 \perp Z_2$. In addition, if, for every $Z \in \mathcal{O}$, we have $\|Z\| = 1$, then $\mathcal{O}$ is called an "orthonormal collection."

The following result is known as the Bessel inequality.

Proposition 5.1.4. If $\{Z_i\}_1^n \subset L_2$ is an orthonormal sequence, then, for any $Z \in L_2$,
$$\sum_{i=1}^n |(Z,Z_i)|^2 \le \|Z\|^2. \tag{5.1.8}$$

PROOF.
$$0 \le \left\| Z - \sum_{i=1}^n (Z,Z_i)Z_i \right\|^2 = \left( Z - \sum_{i=1}^n (Z,Z_i)Z_i,\ Z - \sum_{i=1}^n (Z,Z_i)Z_i \right)$$
$$= \|Z\|^2 - \sum_{i=1}^n (Z,Z_i)\overline{(Z,Z_i)} - \sum_{i=1}^n \overline{(Z,Z_i)}(Z,Z_i) + \sum_{i=1}^n\sum_{j=1}^n (Z,Z_i)\overline{(Z,Z_j)}(Z_i,Z_j)$$
$$= \|Z\|^2 - \sum_{i=1}^n |(Z,Z_i)|^2.$$
This proves the assertion. $\square$

Corollary 5.1.1. If $\{Z_i\}_1^\infty \subset L_2$ is orthonormal, then
$$\sum_{i=1}^\infty |(Z,Z_i)|^2 \le \|Z\|^2. \tag{5.1.9}$$
This is so because, according to the previous proposition, (5.1.8) holds for every partial sum of the series.

The following result is known as the Cauchy inequality.

Proposition 5.1.5. If $Z_1, Z_2 \in L_2$, then
$$\|Z_1+Z_2\| \le \|Z_1\| + \|Z_2\|. \tag{5.1.10}$$

PROOF. We have $\|Z_1+Z_2\|^2 = \|Z_1\|^2 + (Z_1,Z_2) + (Z_2,Z_1) + \|Z_2\|^2$. However,
$$(Z_1,Z_2) + (Z_2,Z_1) = (Z_1,Z_2) + \overline{(Z_1,Z_2)} \le 2|(Z_1,Z_2)| \le 2\|Z_1\|\cdot\|Z_2\|.$$
Therefore,
$$\|Z_1+Z_2\|^2 \le \|Z_1\|^2 + 2\|Z_1\|\cdot\|Z_2\| + \|Z_2\|^2 = (\|Z_1\| + \|Z_2\|)^2,$$
which proves (5.1.10). $\square$

From (5.1.10) we readily deduce the "triangle inequality":
$$\|Z_1 - Z_3\| \le \|Z_1 - Z_2\| + \|Z_2 - Z_3\|. \tag{5.1.11}$$
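The inequalities of this section are easy to probe by Monte Carlo, representing second-order r.v.'s by large i.i.d. samples and expectations by sample means. The sketch below (sample size and the particular pair $Z_1$, $Z_2$ are arbitrary choices) checks the Schwarz inequality (5.1.1), the parallelogram law (5.1.6), and the Bessel inequality (5.1.8) up to sampling error.

```python
import numpy as np

# Monte Carlo illustration (not a proof): expectations are approximated by
# sample means over n i.i.d. draws of the pair (Z1, Z2).
rng = np.random.default_rng(1)
n = 200_000
Z1 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
Z2 = 0.3 * Z1 + 1j * rng.standard_normal(n)

inner = lambda a, b: np.mean(a * np.conj(b))     # (Z1,Z2) = E Z1 conj(Z2)
norm2 = lambda a: inner(a, a).real               # ||Z||^2

print(abs(inner(Z1, Z2))**2 <= norm2(Z1) * norm2(Z2))       # Schwarz (5.1.1)
print(round(norm2(Z1 + Z2) + norm2(Z1 - Z2), 3),
      round(2 * norm2(Z1) + 2 * norm2(Z2), 3))               # parallelogram (5.1.6)

E1 = Z1 / np.sqrt(norm2(Z1))                     # orthonormal pair built by
W = Z2 - inner(Z2, E1) * E1                      # one Gram-Schmidt step
E2 = W / np.sqrt(norm2(W))
Z = Z1 + 2 * Z2
print(abs(inner(Z, E1))**2 + abs(inner(Z, E2))**2 <= norm2(Z))  # Bessel (5.1.8)
```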
Remark 5.1.2. The concept of a norm permits us to measure the "distance" $d(Z_1,Z_2)$ between any two members $Z_1, Z_2 \in L_2$ as
$$d(Z_1,Z_2) = \|Z_1 - Z_2\|. \tag{5.1.12}$$

Now we can prove the following result.

Proposition 5.1.6. The next four functions are uniformly continuous [(a) and (c) on $L_2$ and (b) and (d) on $L_2 \times L_2$]: (a) $Z \mapsto aZ$; (b) $(Z_1,Z_2) \mapsto (Z_1,Z_2)$; (c) $Z \mapsto \|Z\|$; (d) $(Z_1,Z_2) \mapsto Z_1 + Z_2$.

PROOF. From (5.1.10), it follows that
(a) $d(aZ_1, aZ_2) = |a|\,d(Z_1,Z_2)$;
(b) $|(Z_1,Z) - (Z_2,Z)| \le \|Z\|\cdot\|Z_1 - Z_2\| = \|Z\|\,d(Z_1,Z_2)$;
(c) $\big|\,\|Z_1\| - \|Z_2\|\,\big| \le d(Z_1,Z_2)$;
(d) $\|(Z_1+Z_2) - (V_1+V_2)\| \le \|Z_1 - V_1\| + \|Z_2 - V_2\| = d(Z_1,V_1) + d(Z_2,V_2)$. $\square$

5.2. Convergence in Quadratic Mean

Because $L_2 = L_2\{\Omega,\mathscr{B},P\}$ is a metric space, with the distance defined by (5.1.12), we can now introduce a notion of convergence.

Definition 5.2.1. A sequence $\{Z_n\}_1^\infty \subset L_2$ is said to converge to an element $Z \in L_2$ if
$$\|Z_n - Z\| = d(Z_n,Z) \to 0 \quad \text{as } n \to \infty. \tag{5.2.1}$$

For this kind of convergence, it is not necessary that the values $Z_n(\omega)$ converge to $Z(\omega)$ pointwise or a.s. We shall call this kind of convergence "convergence in quadratic mean" or "convergence in the mean square" and write $Z_n \xrightarrow{q.m.} Z$ or $Z = \text{l.i.m.}\,Z_n$.

Proposition 5.2.1. If $\{Z_n\}_1^\infty \subset L_2$ converges in quadratic mean (q.m.), it may have at most one limit.

PROOF. Suppose that $Z_n \xrightarrow{q.m.} U$ and $Z_n \xrightarrow{q.m.} V$, where $U, V \in L_2$; then, for all $n = 1,2,\ldots$,
$$\|U - V\| \le \|Z_n - U\| + \|Z_n - V\| \to 0 \quad \text{as } n \to \infty.$$
Hence, $\|U - V\| = 0$, which implies that $U = V$ (a.s.). $\square$

Proposition 5.2.2. If $\{Z_n\}_1^\infty \subset L_2$ converges in q.m. to $Z \in L_2$, then $\|Z_n\| \to \|Z\|$.

PROOF. The obvious inequalities
$$\|Z_n\| \le \|Z\| + \|Z_n - Z\|, \qquad \|Z\| \le \|Z_n\| + \|Z_n - Z\|$$
imply that $\big|\,\|Z_n\| - \|Z\|\,\big| \le \|Z_n - Z\|$, which proves the assertion. $\square$

Definition 5.2.2. A sequence $\{Z_n\} \subset L_2$ is said to be "Cauchy" or "fundamental" if for every $\varepsilon > 0$ there exists a natural number $N$ such that $\|Z_n - Z_m\| \le \varepsilon$ if $m,n > N$.

Proposition 5.2.3. If $\{Z_n\}_1^\infty \subset L_2$ has a limit $Z \in L_2$, then it is a Cauchy sequence.

PROOF. Let $\varepsilon > 0$ be arbitrary; then there exists a natural number $N$ such that for any $n > N$, $\|Z_n - Z\| < \varepsilon/2$. Now, if $m,n > N$, then $\|Z_n - Z_m\| \le \|Z_n - Z\| + \|Z_m - Z\| < \varepsilon$, which proves the assertion. $\square$

The converse statement is much deeper.

Proposition 5.2.4 (Riesz–Fischer). If $\{Z_n\}_1^\infty$ is a Cauchy sequence, there exists $Z \in L_2$ such that $Z_n \xrightarrow{q.m.} Z$.

PROOF. Consider the convergent series $\sum_{k=1}^\infty 2^{-k}$. Because $\{Z_n\}_1^\infty$ is fundamental, for every $k$ there is an $n_k$ such that $\|Z_n - Z_m\| < 2^{-k}$ if $m,n \ge n_k$. We may suppose, without loss of generality, that $n_1 < n_2 < \cdots$, so that
$$\|Z_{n_{k+1}} - Z_{n_k}\| < 2^{-k}.$$
On the other hand, due to convexity (Jensen's inequality),
$$\|Z_{n_{k+1}} - Z_{n_k}\|^2 = E|Z_{n_{k+1}} - Z_{n_k}|^2 \ge \big(E|Z_{n_{k+1}} - Z_{n_k}|\big)^2,$$
so that $E|Z_{n_{k+1}} - Z_{n_k}| \le \|Z_{n_{k+1}} - Z_{n_k}\|$. From this, it follows that the series
$$\sum_{k=1}^\infty E|Z_{n_{k+1}} - Z_{n_k}|$$
converges. Then, according to the Beppo Levi theorem, the series $\sum_{k=1}^\infty |Z_{n_{k+1}} - Z_{n_k}|$ converges (a.s.). Consequently, the series
$$Z_{n_1} + \sum_{k=1}^\infty (Z_{n_{k+1}} - Z_{n_k}) = \lim_{k\to\infty} Z_{n_k}$$
converges (a.s.). Set $Z = \lim_{k\to\infty} Z_{n_k}$ where this limit exists and $Z = 0$ on the part of $\Omega$ where it does not. Next, we have to show that $Z \in L_2$ and that $Z = \text{l.i.m.}\,Z_n$. To this end, take an $\varepsilon > 0$ and a natural number $N$ such that, for $m,n > N$, $\|Z_n - Z_m\| < \varepsilon$. If $k_0 > N$, then $\|Z_n - Z_{n_k}\|^2 < \varepsilon^2$ for any $n > N$ and $k > k_0$. Applying Fatou's lemma to the sequence $\{|Z_n - Z_{n_k}|^2\}_{k>k_0}$, we obtain
$$\|Z_n - Z\|^2 \le \liminf_{k\to\infty}\|Z_n - Z_{n_k}\|^2 \le \varepsilon^2.$$
In other words, for all $n > N$, $\|Z_n - Z\| \le \varepsilon$. From this, it follows that $Z_n - Z \in L_2$ and, consequently, $Z \in L_2$, and $Z$ is the mean square limit of $\{Z_n\}_1^\infty$. This completes the proof of the Riesz–Fischer theorem. $\square$
Corollary 5.2.1. Because $|EZ_n - EZ| \le E|Z_n - Z| \le \|Z_n - Z\| \to 0$ as $n \to \infty$, it follows that $EZ_n \to EZ$ whenever $Z_n \xrightarrow{q.m.} Z$.

The following is a criterion for mean square convergence due to Loève.

Proposition 5.2.5. A sequence $\{Z_n\}_1^\infty \subset L_2$ converges in quadratic mean if and only if
$$EZ_n\overline{Z}_m \to C \ \text{(a finite constant)} \tag{5.2.2}$$
as $n$ and $m$ tend independently to infinity.

PROOF. Assume that (5.2.2) holds; then, as $m,n \to \infty$,
$$\|Z_n - Z_m\|^2 = E(Z_n - Z_m)\overline{(Z_n - Z_m)} = EZ_n\overline{Z}_n - EZ_n\overline{Z}_m - EZ_m\overline{Z}_n + EZ_m\overline{Z}_m \to C - C - C + C = 0.$$
On the other hand, if $Z_n \xrightarrow{q.m.} Z$, then $EZ_n\overline{Z}_m \to EZ\overline{Z} = \|Z\|^2$. This follows from the fact that if $\{Z_n\}_1^\infty \subset L_2$ and $\{U_n\}_1^\infty \subset L_2$ converge in q.m. to $Z$ and $U$, respectively, then $EZ_n\overline{U}_m \to EZ\overline{U}$. $\square$

Remark 5.2.1. The property of the space $L_2$ established by the Riesz–Fischer theorem is called "completeness." With this property, $L_2 = L_2\{\Omega,\mathscr{B},P\}$ becomes a Hilbert space.

5.3. Remarks on the Structure of L2

The structure of the Hilbert space $L_2 = L_2\{\Omega,\mathscr{B},P\}$ clearly depends on the underlying probability space $\{\Omega,\mathscr{B},P\}$. In this section, we discuss the problem of approximating a r.v. $Z \in L_2$ by a certain type of bounded r.v. from $L_2$. We also discuss the problem of separability of the space $L_2$. To this end, the following definition is needed.

Definition 5.3.1. A (real or complex) r.v. $Z$ on a probability space $\{\Omega,\mathscr{B},P\}$ is said to be "essentially bounded" if there is a constant $0 < c < \infty$ such that $|Z| \le c$ (a.s.).

The subset of all essentially bounded r.v.'s of $L_2\{\Omega,\mathscr{B},P\}$ is denoted by $L_\infty = L_\infty\{\Omega,\mathscr{B},P\}$. On $L_\infty$, we define a norm by
$$\|Z\| = \|Z\|_\infty = \inf\{c;\ |Z| \le c \text{ (a.s.)}\}. \tag{5.3.1}$$
With this norm, $L_\infty$ becomes a normed linear space. The norm $\|Z\|_\infty$ is often called the "essential supremum" of $|Z|$ and is denoted by
$$\|Z\|_\infty = \operatorname{ess\,sup}|Z|. \tag{5.3.2}$$

Next, we prove the following result.

Proposition 5.3.1. Let $U$ be an arbitrary complex r.v. on $\{\Omega,\mathscr{B},P\}$; then for any $\varepsilon > 0$ there exists a r.v. $Z \in L_\infty\{\Omega,\mathscr{B},P\}$ such that
$$P\{U \ne Z\} < \varepsilon.$$

PROOF. Set $A_k = \{|U| > k\}$, $k = 1,2,\ldots$, and write $G = \{|U| = \infty\}$. We clearly have that $P(G) = 0$ and that $G = \bigcap_{k=1}^\infty A_k$. From this, it follows that $P(A_k) \to 0$ as $k \to \infty$, so that there exists $n_0$ such that $P(A_{n_0}) < \varepsilon$. Now define the r.v. $Z$ as $Z = U$ on $A_{n_0}^c$ and $Z = 0$ on $A_{n_0}$. Clearly, $|Z| \le n_0$ and $P\{Z \ne U\} < \varepsilon$. This proves the assertion. $\square$

Definition 5.3.2. A subset $D \subset L_2$ is said to be everywhere "dense" in $L_2$ if any element of $L_2$ is the mean square limit of a sequence of elements from $D$.

It is not difficult to see that a necessary and sufficient condition for a subset $D \subset L_2$ to be everywhere dense in $L_2$ is that, for any $Z \in L_2$ and any $\varepsilon > 0$, it is possible to find $Z_0 \in D$ such that $\|Z - Z_0\| < \varepsilon$.

Proposition 5.3.2. The class $L_\infty \subset L_2$ of all essentially bounded r.v.'s is everywhere dense in $L_2$.

PROOF. Let $Z \in L_2$ and consider
$$\varphi(A) = \int_A |Z|^2\,dP, \qquad A \in \mathscr{B}.$$
Clearly, $\varphi(\cdot)$ is a bounded non-negative countably additive set function on $\mathscr{B}$ such that $\varphi \ll P$. Therefore, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that if $P(A) < \delta$, then $\varphi(A) < \varepsilon^2$. Now, according to Proposition 5.3.1, there exists a bounded r.v. $Z_0$ such that $P\{Z \ne Z_0\} < \delta$. Without loss of generality, we may assume that $Z_0 = 0$ on the set $\{Z \ne Z_0\}$. Then
$$\|Z - Z_0\|^2 = \int_{\{Z \ne Z_0\}} |Z|^2\,dP = \varphi(\{Z \ne Z_0\}) < \varepsilon^2,$$
which proves the proposition. $\square$

Next, we define two important concepts in $L_2$ theory. The first one is that of a "linear manifold."

Definition 5.3.3. A subset $M \subset L_2$ is called a linear manifold if, for any $Z_1, Z_2 \in M$ and complex numbers $\alpha$ and $\beta$, $\alpha Z_1 + \beta Z_2 \in M$.
Definition 5.3.4. A closed linear manifold $H \subset L_2$ is called a subspace.

Let $G \subset L_2$ be an arbitrary subset. There always exists at least one linear manifold that contains $G$. The intersection of all linear manifolds containing $G$ is denoted by $\langle G\rangle$ and is called the linear manifold spanned by $G$. The closed linear manifold spanned by $G$ is denoted by $\langle\overline{G}\rangle$.

One of the fundamental questions in the theory of $L_2$ spaces is that of their dimensionality. In this respect, of particular interest are conditions on the probability space $\{\Omega,\mathscr{B},P\}$ under which $L_2\{\Omega,\mathscr{B},P\}$ is countably infinite dimensional. This question is closely related to the question of whether there exists a countable everywhere dense subset of $L_2$. In other words, is the space $L_2$ separable? In general, this is not the case unless the probability space $\{\Omega,\mathscr{B},P\}$ has a particular structure.

Remark 5.3.1. A linear inner product space $L_2$ with the norm $\|h\| = (h,h)^{1/2}$ for $h \in L_2$ is often called unitary. This term, however, is not standard in the literature.

5.4. Orthogonal Projection

To investigate the geometry of the space $L_2\{\Omega,\mathscr{B},P\}$, we only need the concept of orthogonality. According to Definition 5.1.5, elements $Z_1, Z_2 \in L_2$ are said to be orthogonal, written $Z_1 \perp Z_2$, if $(Z_1,Z_2) = 0$. We say that an element $Z \in L_2$ is orthogonal to a subset $M \subset L_2$, and write $Z \perp M$, if $(Z,U) = 0$ for all $U \in M$.

In this section, we discuss an important and useful result in the theory of Hilbert spaces, the so-called projection theorem, which deals with the decomposition of an element $Z \in L_2$ into orthogonal components. Roughly speaking, the content of this theorem can be described as follows. Given a subspace $H \subset L_2$ and an arbitrary $Z \in L_2$, there exists a unique decomposition of the form $Z = Z_H + Z_0$, where $Z_H \in H$ and $Z_0 \perp H$. The element $Z_H$ is called the orthogonal projection of $Z$ on $H$.

To this end, let us show that there is always a minimal distance from an element $Z \in L_2$ to a given subspace.

Proposition 5.4.1. Let $H \subset L_2$ be a subspace and $Z \in L_2$ arbitrary, and write
$$\delta = \inf\{\|Z - U\|;\ U \in H\}.$$
Then there is $Z_0 \in H$ such that $\|Z - Z_0\| = \delta$.

PROOF. Choose any $\{Z_k\}_1^\infty \subset H$ such that $\|Z - Z_k\| \to \delta$. From (5.1.6), we have
$$\|(Z_n - Z) + (Z_k - Z)\|^2 + \|(Z_n - Z) - (Z_k - Z)\|^2 = 2\|Z_n - Z\|^2 + 2\|Z_k - Z\|^2$$
or
$$\|Z_n - Z_k\|^2 = 2\|Z_n - Z\|^2 + 2\|Z_k - Z\|^2 - 4\left\|\tfrac12(Z_n + Z_k) - Z\right\|^2.$$
Because $\tfrac12(Z_n + Z_k) \in H$, it follows that $\|\tfrac12(Z_n+Z_k) - Z\|^2 \ge \delta^2$. Therefore,
$$\|Z_n - Z_k\|^2 \le 2\|Z_n - Z\|^2 + 2\|Z_k - Z\|^2 - 4\delta^2. \tag{5.4.1}$$
By letting $n$ and $k \to \infty$, the right-hand side of (5.4.1) converges to zero. This implies that $\{Z_n\}_1^\infty$ is a Cauchy sequence. Therefore, $Z_n \xrightarrow{q.m.} Z_0 \in H$. From the continuity of the norm, it follows that
$$\|Z - Z_0\| = \lim_{n\to\infty}\|Z - Z_n\| = \delta. \qquad \square$$

Proposition 5.4.2. Let $H \subset L_2$ be a subspace and $Z \in L_2$ arbitrary. Let $Z_0 \in H$ be such that $\|Z - Z_0\| = \inf\{\|Z - U\|;\ U \in H\}$; then $(Z - Z_0) \perp H$.

PROOF. If $U \in H$, so is $Z_0 - \alpha U$ for any complex $\alpha$. Therefore,
$$\|Z - Z_0\| \le \|Z - Z_0 + \alpha U\|.$$
Set $Z^* = Z - Z_0$ and $\beta = (Z^*,U)$. This yields
$$\|Z^*\|^2 \le \|Z^* + \alpha U\|^2 = \|Z^*\|^2 + \bar\alpha\beta + \alpha\bar\beta + |\alpha|^2\|U\|^2,$$
so that $0 \le \bar\alpha\beta + \alpha\bar\beta + |\alpha|^2\|U\|^2$. By successively assigning $\alpha$ the values $t$, $-t$, $it$, and $-it$, where $t > 0$, one obtains
$$-t\|U\|^2 \le \beta + \bar\beta \le t\|U\|^2, \qquad -t\|U\|^2 \le i(\beta - \bar\beta) \le t\|U\|^2.$$
Because $t > 0$ is arbitrary, it follows that $\beta + \bar\beta = \beta - \bar\beta = 0$. Therefore, $(Z^*,U) = \beta = 0$ for all $U \in H$, which proves the proposition. $\square$

Definition 5.4.1. The orthogonal complement of a subspace $H \subset L_2$ is the subset $H^\perp \subset L_2$ of all $Z \in L_2$ such that $Z \perp H$.
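For a finite-dimensional subspace $H$ spanned by r.v.'s $U_1,\ldots,U_m$, Propositions 5.4.1 and 5.4.2 reduce to complex least squares. A numerical sketch (with empirical means standing in for expectations, and an arbitrary choice of $H$ and $Z$):

```python
import numpy as np

# Orthogonal projection of Z on H = span{U_1,...,U_m}, with expectations
# replaced by empirical means over n samples; lstsq solves the normal
# equations behind Propositions 5.4.1-5.4.2.
rng = np.random.default_rng(2)
n, m = 100_000, 3
U = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
Z = U @ np.array([1.0, -0.5, 2.0]) \
    + rng.standard_normal(n) + 1j * rng.standard_normal(n)

coef, *_ = np.linalg.lstsq(U, Z, rcond=None)
ZH = U @ coef                            # the projection Z_H
Z0 = Z - ZH                              # should satisfy Z0 orthogonal to H

print([round(abs(np.mean(Z0 * np.conj(U[:, j]))), 5) for j in range(m)])
print(round(np.mean(abs(Z)**2), 3),
      round(np.mean(abs(ZH)**2) + np.mean(abs(Z0)**2), 3))   # Pythagoras
```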
Proposition 5.4.3. The orthogonal complement $H^\perp$ of a subspace $H$ is also a subspace, and $H \cap H^\perp = \{Z_0\}$, where $Z_0 = 0$ (a.s.).

PROOF. If $Z_1, Z_2 \in H^\perp$, then $(Z_1,Z) = (Z_2,Z) = 0$ for any $Z \in H$. Therefore,
$$0 = \alpha(Z_1,Z) + \beta(Z_2,Z) = (\alpha Z_1 + \beta Z_2,\,Z)$$
for any $\alpha$ and $\beta$. This shows that $\alpha Z_1 + \beta Z_2 \in H^\perp$, so that $H^\perp$ is a linear manifold. To see that it is closed, let $\{U_n\}_1^\infty \subset H^\perp$ converge in q.m. to $Z^0$. Then, by continuity of the inner product, we have, for any $Z \in H$,
$$(Z^0,Z) = \lim_{n\to\infty}(U_n,Z) = 0.$$
Therefore, $Z^0 \perp Z$, so that $Z^0 \in H^\perp$. Finally, if $Z_0 \in H \cap H^\perp$, then $(Z_0,Z_0) = 0$, so that $Z_0 = 0$ (a.s.). $\square$

Definition 5.4.2. Let $A$ and $B$ be two arbitrary subsets of $L_2$; then
$$A + B = \{Z + U;\ Z \in A,\ U \in B\}. \tag{5.4.2}$$
If $H_1$ and $H_2$ are two orthogonal subspaces of $L_2$, one writes $H_1 \oplus H_2$ for $H_1 + H_2$. Thus, the use of the symbol $\oplus$ entails orthogonality of the summands.

Proposition 5.4.4. If $H$ is a subspace of $L_2$, then $H \oplus H^\perp = L_2$.

PROOF. Clearly, $H \oplus H^\perp \subset L_2$. If $Z \in L_2$, then by Propositions 5.4.1 and 5.4.2 there exists $Z_0 \in H$ such that $(Z - Z_0) \perp H$, which implies that $(Z - Z_0) \in H^\perp$. From this, the assertion follows because $Z = Z_0 + (Z - Z_0)$. $\square$

Proposition 5.4.5. If $Z_1 + U_1 = Z_2 + U_2$, where $Z_1, Z_2 \in H$ and $U_1, U_2 \in H^\perp$, then $Z_1 = Z_2$ and $U_1 = U_2$ (a.s.).

PROOF. Because $Z_1 - Z_2 \in H$ and $U_1 - U_2 \in H^\perp$, it follows that $Z_1 - Z_2 = U_2 - U_1 \in H \cap H^\perp$, and the assertion follows because $H \cap H^\perp = \{0\}$. $\square$

Corollary 5.4.1. If $H \subset L_2$ is a subspace, it follows from Propositions 5.4.4 and 5.4.5 that any $Z \in L_2$ can be expressed uniquely as a sum
$$Z = Z_H + Z_{H^\perp}, \tag{5.4.3}$$
where $Z_H \in H$ and $Z_{H^\perp} \in H^\perp$.

Remark 5.4.1. We call $Z_H$ the orthogonal projection of $Z$ on $H$. It is clear that $Z = Z_H$ if and only if $Z \in H$.

5.5. Orthogonal Basis

At the end of Section 5.3 we made a brief commentary concerning the dimensionality of $L_2$ spaces, although the notion itself has not been formally defined. This concept, whose formal definition will be given at the end of this section, is closely related to the concepts of "orthogonality" and "completeness," which will be discussed in some detail here. We begin with the following definition (see Definition 5.1.6):

Definition 5.5.1. An orthogonal system $\mathcal{O} \subset L_2$ is called "complete" if no other orthogonal collection of elements of $L_2$ contains $\mathcal{O}$ as its proper subset.

Next, we shall show that the closed linear manifold spanned by a complete orthonormal collection $\mathcal{O} \subset L_2$ is equal to $L_2$, $\langle\overline{\mathcal{O}}\rangle = L_2$. For this reason, a complete orthonormal system is often called a "basis" for the space $L_2$.

Definition 5.5.2. Let $\{U_k\}_1^\infty \subset L_2$ be an arbitrary orthonormal system and $Z \in L_2$. The complex numbers
$$c_k = (Z,U_k), \qquad k = 1,2,\ldots, \tag{5.5.1}$$
are called the "Fourier coefficients" of $Z$ with respect to the system $\{U_k\}_1^\infty$, and the sum
$$\sum_{k=1}^\infty c_kU_k \tag{5.5.2}$$
is called the "Fourier series" of $Z$ with respect to $\{U_k\}_1^\infty$.

From (5.5.1) and the Bessel inequality (5.1.9), it follows that
$$\sum_{k=1}^\infty |c_k|^2 \le \|Z\|^2. \tag{5.5.3}$$
This then clearly implies that
$$\left\| \sum_{k=m+1}^n c_kU_k \right\|^2 = \sum_{k=m+1}^n |c_k|^2 \to 0 \quad \text{as } m,n \to \infty. \tag{5.5.4}$$
Therefore, $\|S_n - S_m\| \to 0$ as $m,n \to \infty$, where $S_n = \sum_{k=1}^n c_kU_k$. Hence, by the Riesz–Fischer theorem, $S_n \xrightarrow{q.m.} S \in L_2$.

Now we can prove the following result.

Proposition 5.5.2. Let $\{Z_k\}_1^\infty \subset L_2$ be a complete orthonormal system; then every element $Z \in L_2$ admits an expansion, convergent in the mean square,
$$Z = \sum_{k=1}^\infty c_kZ_k, \tag{5.5.5}$$
where $c_k = (Z,Z_k)$, i.e.,
$$Z = \sum_{k=1}^\infty (Z,Z_k)Z_k. \tag{5.5.6}$$
Furthermore, we have that
$$\|Z\|^2 = \sum_{k=1}^\infty |c_k|^2, \tag{5.5.7}$$
which is called Parseval's formula.

PROOF. From (5.5.4) and the Riesz–Fischer theorem, it follows that $\sum_{k=1}^n c_kZ_k \xrightarrow{q.m.} Z_0$ as $n \to \infty$ for some $Z_0 \in L_2$. Now, for any fixed $k = 1,2,\ldots$
and $n \ge k$,
$$|(Z_0,Z_k) - c_k| = \left| \left( Z_0 - \sum_{j=1}^n c_jZ_j,\ Z_k \right) \right| \le \left\| Z_0 - \sum_{j=1}^n c_jZ_j \right\|\cdot\|Z_k\|. \tag{5.5.8}$$
As $n \to \infty$, the right-hand side tends to zero, which implies that $(Z_0,Z_k) = c_k$. Hence, for any $k = 1,2,\ldots$,
$$(Z_0 - Z,\,Z_k) = (Z_0,Z_k) - (Z,Z_k) = c_k - c_k = 0,$$
so that $Z_0 - Z$ is orthogonal to every element of the complete orthonormal system $\{Z_k\}_1^\infty$. This then implies that $Z_0 - Z = 0$ (a.s.), so that
$$Z_0 = Z \ \text{(a.s.)}. \tag{5.5.9}$$
Finally, because
$$\left\| Z - \sum_{k=1}^n c_kZ_k \right\|^2 = \|Z\|^2 - \sum_{k=1}^n |c_k|^2, \tag{5.5.10}$$
we obtain the Parseval formula by letting $n \to \infty$ in (5.5.10) and invoking (5.5.9). $\square$

Remark 5.5.1. The expansion (5.5.5) is called the generalized Fourier series of $Z \in L_2$, where $c_k = (Z,Z_k)$ are the generalized Fourier coefficients of $Z$ with respect to the orthonormal sequence $\{Z_k\}_1^\infty$. When $\{Z_k\}_1^\infty$ is a complete orthonormal system, it follows from Proposition 5.5.2 that
$$Z = \sum_{k=1}^\infty (Z,Z_k)Z_k. \tag{5.5.11}$$

Remark 5.5.2. From Proposition 5.5.2, it also follows that for a given complete orthonormal system $\{Z_k\}_1^\infty \subset L_2$ and $Z \in L_2$, (5.5.5) holds with $c_k = (Z,Z_k)$, $k = 1,2,\ldots$, and (5.5.7) holds. Does the converse also hold? In other words, given an arbitrary sequence of complex numbers $\{c_k\}_1^\infty$ and a complete orthonormal system $\{Z_k\}_1^\infty$, does there exist a $Z \in L_2$ such that $c_k = (Z,Z_k)$ for all $k = 1,2,\ldots$ and such that (5.5.5) holds? From Bessel's inequality, we know that only those sequences come into consideration for which $\sum_{k=1}^\infty |c_k|^2 < \infty$. This condition is also sufficient. In fact, from (5.5.4) and the Riesz–Fischer theorem, the series converges in the mean to a $Z \in L_2$ if $\sum_{k=1}^\infty |c_k|^2 < \infty$, and $(Z,Z_k) = c_k$, $k = 1,2,\ldots$. In other words, if $\{Z_k\}_1^\infty$ is a complete orthonormal system and $\{c_k\}_1^\infty$ a sequence of complex numbers for which $\sum_{k=1}^\infty |c_k|^2 < \infty$, then
$$\sum_{k=1}^\infty c_kZ_k \xrightarrow{q.m.} Z \in L_2, \quad \text{with } (Z,Z_k) = c_k,\ k = 1,2,\ldots, \tag{5.5.12}$$
and no other solution $Z \in L_2$ of the system of equations (5.5.12) exists.

Therefore, every complete orthonormal sequence $\{Z_k\}_1^\infty \subset L_2$ establishes, by means of the formula $Z = \sum_{k=1}^\infty c_kZ_k$, a one-to-one correspondence between the elements $Z \in L_2$ and the sequences of complex numbers $\{c_k\}_1^\infty$ satisfying $\sum_{k=1}^\infty |c_k|^2 < \infty$. Now we can give the following definition.

Definition 5.5.3. The dimension of the space $L_2$ is the cardinality of its complete orthonormal set. If there is a complete orthonormal sequence $\{Z_k\}_1^\infty \subset L_2$, then $L_2$ is countably infinite dimensional.

Remark 5.5.3. The following, more general, form of Parseval's formula also holds. For any two $Z, U \in L_2$,
$$(Z,U) = \sum_{k=1}^\infty (Z,Z_k)\overline{(U,Z_k)}. \tag{5.5.13}$$
This follows from the identity (which is easy to verify)
$$4(Z,U) = \|Z+U\|^2 - \|Z-U\|^2 + i\|Z+iU\|^2 - i\|Z-iU\|^2 \tag{5.5.14}$$
and from (5.5.7).

5.6. Existence of a Complete Orthonormal Sequence in L2

Definition 5.5.3 specifies that a Hilbert space $L_2$ is countably infinite dimensional if there exists a complete orthonormal sequence $\{Z_k\}_1^\infty \subset L_2$ (often called its base). In general, such a sequence does not exist. If, however, there exists a countable everywhere dense subset $D_0 \subset L_2$ (in other words, such that any element $Z \in L_2$ is the mean square limit of a sequence of elements from $D_0$), then there exists a complete orthonormal sequence in $L_2$. To show this, write the elements of $D_0$ as a sequence $\{U_k\}_1^\infty$, i.e., $D_0 = \{U_k\}_1^\infty$. From this, by means of the well-known Gram–Schmidt orthogonalization procedure, one can always construct an orthonormal family $\{Z_k\}_1^\infty$ as follows. Put
$$Z_1 = \frac{U_1}{\|U_1\|}. \tag{5.6.1}$$
To obtain $Z_2$, set $W_2 = U_2 - (U_2,Z_1)Z_1$. Clearly, $(W_2,Z_1) = 0$. Thus, $Z_2 = W_2/\|W_2\|$. Continuing this process, we obtain
$$W_k = U_k - \sum_{j=1}^{k-1}(U_k,Z_j)Z_j, \tag{5.6.2}$$
so that
$$Z_k = \frac{W_k}{\|W_k\|}, \tag{5.6.3}$$
and so on. It is easy to verify that the sequence $\{Z_k\}_1^\infty$ is orthonormal.

Let us now show that $\{Z_k\}_1^\infty$ is also complete. If $Z_0 \in L_2$ is orthogonal to every $Z_k$, it follows from (5.6.1) and (5.6.2) that $Z_0$ is also orthogonal to every $U_i$. Consequently,
$$\|Z_0 - U_k\|^2 = (Z_0 - U_k,\,Z_0 - U_k) = (Z_0,Z_0) + (U_k,U_k) \ge \|Z_0\|^2.$$
But because $\{U_k\}_1^\infty$ is everywhere dense in $L_2$, the left-hand side can be made arbitrarily small by a suitable choice of $k$. This implies that $\|Z_0\| = 0$, which proves the assertion.
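The construction (5.6.1)–(5.6.3) translates directly into code. The sketch below applies classical Gram–Schmidt to a finite family of sample vectors standing in for $\{U_k\}$; vectors already in the span of their predecessors are discarded.

```python
import numpy as np

def gram_schmidt(U, tol=1e-12):
    """Classical Gram-Schmidt, a direct transcription of (5.6.1)-(5.6.3)."""
    basis = []
    for u in U:
        # W_k = U_k - sum_j (U_k, Z_j) Z_j ; note (u, z) = sum u * conj(z)
        w = u - sum(np.vdot(z, u) * z for z in basis)
        nw = np.linalg.norm(w)
        if nw > tol:                      # skip W_k = 0 (u in the span already)
            basis.append(w / nw)          # Z_k = W_k / ||W_k||
    return np.array(basis)

rng = np.random.default_rng(3)
U = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
Z = gram_schmidt(U)
print(np.round(Z @ Z.conj().T, 10))       # approximately the identity matrix
```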
We conclude this section with some remarks on second-order stochastic processes. Let $\{\xi(t);\, t \in T\}$ be such a process, and denote by $\langle\overline{\xi(T)}\rangle$ the closed linear manifold spanned by this family. Clearly, $\langle\overline{\xi(T)}\rangle$ is a subspace of $L_2$ and, as such, it is a Hilbert space. Assume now that $\{\xi(t);\, t \in T\}$ is a q.m. continuous process and that $T$ is an interval; in other words, at every $t \in T$,
$$\|\xi(t+h) - \xi(t)\|^2 \to 0 \quad \text{as } h \to 0.$$
Denote by $Q \subset \mathbb{R}$ the set of rationals. Then, for every $t \in T$, there exists $\{t_k\} \subset Q \cap T$ such that $t_k \to t$ as $k \to \infty$. Therefore, because $\{\xi(t);\, t \in T\}$ is continuous in quadratic mean, we have, for every $t \in T$, $\xi(t) = \text{l.i.m.}\,\xi(t_k)$. In other words, the countable family $\{\xi(t);\, t \in Q \cap T\}$ is everywhere dense in $\langle\overline{\xi(T)}\rangle$. Therefore, according to the previous discussion, there exists a complete countable orthonormal family in $\langle\overline{\xi(T)}\rangle$.

5.7. Linear Operators in a Hilbert Space

In this section, we discuss the concept of a continuous linear mapping of a Hilbert space into itself.

Definition 5.7.1. Let $\mathscr{H}$ be a Hilbert space. A mapping $T\colon \mathscr{H} \to \mathscr{H}$ is called a "linear operator" if
$$T(C_1X_1 + C_2X_2) = C_1TX_1 + C_2TX_2 \tag{5.7.1}$$
for all $X_1, X_2 \in \mathscr{H}$ and any complex numbers $C_1, C_2$.

Definition 5.7.2. The operator $T$ is continuous at $X_0 \in \mathscr{H}$ if $X_n \xrightarrow{q.m.} X_0$ implies that $TX_n \xrightarrow{q.m.} TX_0$. This means that, for any $\varepsilon > 0$, there is $\delta = \delta(\varepsilon)$ such that $\|X_n - X_0\| < \delta \Rightarrow \|TX_n - TX_0\| < \varepsilon$.

Definition 5.7.3. A linear operator is bounded if there exists a constant $C > 0$ such that
$$\|TX\| \le C\|X\| \tag{5.7.2}$$
for all $X \in \mathscr{H}$. The least such $C$ is called the "norm" of $T$ and is denoted by $\|T\|$.

From the last definition we have
$$\|T\| = \inf\{C;\ \|TX\| \le C\|X\|,\ X \in \mathscr{H}\} = \inf\left\{ C;\ \sup_{X \ne 0}\frac{\|TX\|}{\|X\|} \le C \right\}.$$
In other words,
$$\|T\| = \sup_{X \in \mathscr{H},\,X \ne 0}\frac{\|TX\|}{\|X\|}. \tag{5.7.3}$$
An alternative formula for the norm of $T$ is
$$\|T\| = \sup_{\|X\| = 1}\|TX\|.$$
From (5.7.2), it follows that every bounded linear operator is continuous on $\mathscr{H}$.

Proposition 5.7.1. If $T$ is a linear and continuous operator on $\mathscr{H}$, there exists a unique linear continuous operator $T^*$ on $\mathscr{H}$ such that
$$(TX,Y) = (X,T^*Y)$$
for all $X,Y \in \mathscr{H}$, and $\|T\| = \|T^*\|$.

Definition 5.7.4. The operator $T^*$ is called the "adjoint" of $T$.

We now list some simple properties of adjoint operators.
(i) $(\lambda T)^* = \bar\lambda T^*$.
(ii) $(T^*Y,X) = \overline{(X,T^*Y)} = \overline{(TX,Y)} = (Y,TX)$.
(iii) $(T^*)^* = T$ [this follows from (ii) and $((T^*)^*X,Y) = (X,T^*Y) = (TX,Y)$].
(iv) If $T_i\colon \mathscr{H} \to \mathscr{H}$ are continuous and linear, $i = 1,2$, then $(T_1+T_2)^* = T_1^* + T_2^*$. [This follows from
$$((T_1+T_2)^*X,Y) = (X,(T_1+T_2)Y) = (X,T_1Y) + (X,T_2Y) = (T_1^*X,Y) + (T_2^*X,Y) = (T_1^*X + T_2^*X,\,Y).]$$
(v) $\|T^*T\| = \|TT^*\| = \|T\|^2$. (Clearly, the composites $T^*T\colon \mathscr{H} \to \mathscr{H}$ and $TT^*\colon \mathscr{H} \to \mathscr{H}$ are well defined.)
(vi) $(T_1 \circ T_2)^* = T_2^* \circ T_1^*$.
Definition 5.7.5. An operator $T\colon \mathscr{H} \to \mathscr{H}$ is called
(a) isometric if $T^*T = I$ (the identity operator),
(b) unitary if $T^*T = TT^* = I$,
(c) self-adjoint if $T^* = T$,
(d) a projection if $TT = T$ and $T = T^*$,
(e) normal if $T^*T = TT^*$.

The following result is very useful. Suppose that the space $L_2$ is separable. Then, according to Section 5.6, there exists a complete orthonormal basis $\{Z_k\}_1^\infty \subset L_2$, so that every $Z \in L_2$ has a unique representation $Z = \sum_{k=1}^\infty c_kZ_k$.

Proposition 5.7.2. Let $\{b_k\}_1^\infty$ be a bounded sequence of complex numbers and $c = \sup\{|b_k|;\ k = 1,2,\ldots\}$. There exists a unique bounded operator $T$ such that
(i) $TZ_k = b_kZ_k$, $k = 1,2,\ldots$;
(ii) $T\big(\sum_{k=1}^\infty c_kZ_k\big) = \sum_{k=1}^\infty c_kTZ_k$;
(iii) $\|T\| = c$;
(iv) $T^*Z_k = \bar b_kZ_k$;
(v) $T^*T = TT^*$;
(vi) $T^*\big(\sum_{k=1}^\infty c_kZ_k\big) = \sum_{k=1}^\infty c_kT^*Z_k$.

PROOF. Consider $Z \in \mathscr{H}$ with $Z = \sum_{k=1}^\infty c_kZ_k$, where $\{Z_k\}_1^\infty \subset \mathscr{H}$ is complete and orthonormal. The problem is to define $TZ$. Because
$$\sum_{k=1}^\infty |c_kb_k|^2 \le c^2\sum_{k=1}^\infty |c_k|^2 = c^2\|Z\|^2 \tag{5.7.4}$$
[this follows from (5.5.13) if we put $U = Z$], we can define
$$TZ = \sum_{k=1}^\infty b_kc_kZ_k. \tag{5.7.5}$$
From (5.7.4), we have that $\|TZ\|^2 \le c^2\|Z\|^2$, which shows that $T$ is continuous and $\|T\| \le c$. Clearly, $TZ_k = b_kZ_k$; because $\|Z_k\| = 1$,
$$\|T\| = \sup_{\|Z\|=1}\|TZ\| \ge \|TZ_k\| = \|b_kZ_k\| = |b_k|$$
for all $k = 1,2,\ldots$. Hence, $\|T\| \ge c$. This proves (i)–(iii). To prove (iv) and (vi), suppose that
$$Z = \sum_{k=1}^\infty c_kZ_k \quad \text{and} \quad T^*Z = \sum_{k=1}^\infty \gamma_kZ_k;$$
the problem is to show that $\gamma_k = \bar b_kc_k$. For all $k$,
$$\gamma_k = (T^*Z,Z_k) = (Z,TZ_k) = (Z,b_kZ_k) = \bar b_k(Z,Z_k) = \bar b_kc_k.$$
Finally, $T^*TZ_k = |b_k|^2Z_k = TT^*Z_k$; because $\{Z_k\}_1^\infty$ is complete, $T^*T = TT^*$, which is (v). In addition, if $T_0$ is another operator such that $T_0Z_k = b_kZ_k$, then $T_0Z_k = TZ_k$ for all $k = 1,2,\ldots$; because $\{Z_k\}_1^\infty$ is complete, $T_0 = T$. $\square$

5.8. Projection Operators

As we have seen in Section 5.4 (Propositions 5.4.1 and 5.4.2), if $H \subset L_2$ is a subspace and $Z \in L_2$ an arbitrary element, there exists a unique decomposition $Z = Z_H + Z_{H^\perp}$, where $Z_H$ is the orthogonal projection of $Z$ on $H$ and $Z_{H^\perp}$ is orthogonal to $H$. Let $H$ be an arbitrary subspace of $L_2$ and let $P$ be the mapping $P\colon L_2 \to L_2$ associated with the subspace $H$ such that, for any $Z \in L_2$,
$$PZ = Z_H. \tag{5.8.1}$$
The following properties of the mapping (5.8.1) are not difficult to verify. (Here $0 \in L_2$ is the zero element.)
(i) $PZ = Z$ for all $Z \in H$, and $PZ = 0$ if $Z \in H^\perp$; (5.8.2)
(ii) $(PZ,U) = (Z,PU)$ for all $Z, U \in L_2$. (5.8.3)
[This can be shown as follows: $(PZ,U) = (Z_H,\,U_H + U_{H^\perp}) = (Z_H,U_H)$ and $(Z,PU) = (Z_H + Z_{H^\perp},\,U_H) = (Z_H,U_H)$.]
It is also easy to see that
(iii) $P(Z+U) = PZ + PU$, (5.8.4) since $Z + U = (Z_H + U_H) + (Z_{H^\perp} + U_{H^\perp})$, so that $P(Z+U) = Z_H + U_H = PZ + PU$;
(iv) $P(cZ) = cPZ$;
(v) $PPZ = PZ$; (5.8.5)
(vi) $(PZ,Z) = \|PZ\|^2 \le \|Z\|^2$, because $\|Z\|^2 = \|Z_H\|^2 + \|Z_{H^\perp}\|^2 \ge \|Z_H\|^2 = \|PZ\|^2$;
(vii) $H$ is the range of $P$.

Remark 5.8.1. The mapping (5.8.1) is called the projection of $L_2$ on $H$. Sometimes, the notation $P_H$ is used to indicate the relationship of $P$ to the subspace $H \subset L_2$.

Proposition 5.8.1. If $T$ is any projection operator, there is a unique subspace $H \subset \mathscr{H}$ such that $T = P_H$.
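In a finite-dimensional model of $L_2$, the projection on $H$ is the matrix $P = QQ^*$, where the columns of $Q$ form an orthonormal basis of $H$; properties (ii), (v), and (vi) then become matrix identities. A short check (the subspace is an arbitrary random choice):

```python
import numpy as np

# Matrix model of Section 5.8: with Q holding an orthonormal basis of H in
# its columns, P = Q Q* is the projection on H (Definition 5.7.5(d)).
rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
Q, _ = np.linalg.qr(A)                  # orthonormal basis of H = range(A)
P = Q @ Q.conj().T

print(np.allclose(P @ P, P))            # idempotent: PP = P
print(np.allclose(P.conj().T, P))       # self-adjoint: P* = P
Z = rng.standard_normal(5) + 1j * rng.standard_normal(5)
print(abs(Q.conj().T @ (Z - P @ Z)).max())   # Z - PZ is orthogonal to H
```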
Problems and Complements

5.1. If $\{\alpha_i\}_1^n$ and $\{\beta_i\}_1^n$ are sequences of complex numbers, show that
$$\left| \sum_{i=1}^n \alpha_i\bar\beta_i \right|^2 \le \sum_{i=1}^n |\alpha_i|^2\sum_{i=1}^n |\beta_i|^2$$
(Cauchy's inequality).

5.2. Using the Cauchy inequality, show that
$$\left( \sum_{i=1}^n |\alpha_i + \beta_i|^2 \right)^{1/2} \le \left( \sum_{i=1}^n |\alpha_i|^2 \right)^{1/2} + \left( \sum_{i=1}^n |\beta_i|^2 \right)^{1/2}.$$

5.3. Show that every metric induced by a norm satisfies
(i) $d(Z_1 + \alpha,\,Z_2 + \alpha) = d(Z_1,Z_2)$;
(ii) $d(\alpha Z_1,\,\alpha Z_2) = |\alpha|\,d(Z_1,Z_2)$,
where $\alpha$ is a complex number. Can every metric be obtained from a norm?

5.4. Can we obtain a norm from a metric?

5.5. Let $Z \in L_2$ and $A \subset L_2$. The distance between $Z$ and $A$ is defined by $d(Z,A) = \inf\{d(Z,U);\ U \in A\}$. Show that $|d(Z_1,A) - d(Z_2,A)| \le d(Z_1,Z_2)$.

5.6. Verify the identity
$$\|Z_1 - Z_2\|^2 + \|Z_1 - Z_3\|^2 = \tfrac12\|Z_2 - Z_3\|^2 + 2\left\|Z_1 - \tfrac12(Z_2+Z_3)\right\|^2.$$

5.7. Let $\{Z_n\}_0^\infty \subset L_2$; show that $Z_n \xrightarrow{q.m.} Z$ if $\|Z_n\| \to \|Z\|$ and $(Z_n,Z) \to (Z,Z)$.

5.8. Let $Z_1, Z_2 \in L_2\{\Omega,\mathscr{B},P\}$ be such that $|Z_1|\cdot|Z_2| \ge 1$ (a.s.). Show that $(E|Z_1|)(E|Z_2|) \ge 1$.

5.9. Let $C_0$ be the set of all sequences of complex numbers $\{\alpha_k\}_1^\infty$ such that $\{k;\ \alpha_k \ne 0\}$ is finite. Define
$$(X,Y) = \sum_{k=1}^\infty x_k\bar y_k, \qquad X,Y \in C_0.$$
Show that $C_0$ is an inner product space but not a Hilbert space.

5.10. Prove that $L_\infty = L_\infty\{\Omega,\mathscr{B},P\}$ is a Banach space if $\|Z\| = \operatorname{ess\,sup}|Z|$.

5.11. Let $L_0 \subset L_2$ be a subspace and $Z \in L_2$. Show that
$$\inf\{\|Z - U\|;\ U \in L_0\} = \sup\{|(Z,W)|;\ W \in L_0^\perp,\ \|W\| \le 1\}.$$

5.12. If $\{Z_n\}_1^\infty$ and $\{U_n\}_1^\infty$ from $L_2$ are such that $Z_n \xrightarrow{q.m.} Z$ and $U_n \xrightarrow{q.m.} U$, show that $(Z_n,U_n) \to (Z,U)$.

5.13. Show that in $L_2$, $Z_1 \perp Z_2$ if and only if $\|Z_1 + \alpha Z_2\| = \|Z_1 - \alpha Z_2\|$ for every complex number $\alpha$.

5.14. Prove that if $\{Z_k\}_1^\infty \subset L_2$ is orthonormal, then, for any $Z \in L_2$,
(i) $\lim_{n\to\infty}(Z,Z_n) = 0$;
(ii) $\|Z_i - Z_j\|^2 = 2$ for all $i \ne j$.

5.15. A subset $M \subset L_2$ is said to be convex if ($0 \le \alpha \le 1$)
$$Z_1, Z_2 \in M \Rightarrow A = \{Z \in L_2;\ Z = \alpha Z_1 + (1-\alpha)Z_2\} \subset M.$$
If $\{Z_n\} \subset M$ is such that $\|Z_n\| \to d = \inf\{\|U\|;\ U \in M\}$, show that $\{Z_n\}$ is a Cauchy sequence. The set $A$ is called the segment joining $Z_1$ and $Z_2$.

5.16. Let $\{Z_n\}_1^\infty \subset L_2$ be an orthonormal sequence. Show that for any $U, V \in L_2$,
$$\sum_{k=1}^\infty |(U,Z_k)(V,Z_k)| \le \|U\|\cdot\|V\|.$$

5.17. Let $\{Z_n\} \subset L_2$ be such that $Z_n \xrightarrow{q.m.} Z$. If
$$Y_n = \frac{1}{n}\sum_{i=1}^n Z_i,$$
then show that $Y_n \xrightarrow{q.m.} Z$.

5.18. Let $\{Z_k\}_1^n$ be an orthonormal collection. Prove that
$$\min_{\alpha_1,\ldots,\alpha_n}\left\| Z - \sum_{k=1}^n \alpha_kZ_k \right\|$$
is attained if $\alpha_k = (Z,Z_k)$.

5.19. If $\{Z_n\} \subset L_2$ is such that $\sup_n\|Z_n\| \le K < \infty$, show that $Z_n/n \to 0$ (a.s.).

5.20. Let $l_p$, $1 \le p < \infty$, be the set of all sequences of real numbers $(\alpha_1,\alpha_2,\ldots)$ such that $\sum_{i=1}^\infty |\alpha_i|^p < \infty$. Show that $l_p$ is separable.

5.21. Let $\mathscr{H}$ be a Hilbert space and $\varphi$ a continuous linear functional on $\mathscr{H}$. Show that there exists a unique element $x_0 \in \mathscr{H}$ such that $\varphi(x) = (x,x_0)$ for all $x \in \mathscr{H}$. (This is the Riesz representation theorem.)

5.22. Prove Proposition 5.7.1.

CHAPTER 6
Second-Order Processes

6.1. Covariance Function C(s,t)

There exists a large class of engineering and physics problems whose solutions require only knowledge of the first two moments and some very general properties of a second-order random process (see Definition 1.5.8). This chapter is concerned with some key properties of complex-valued second-order random processes.

Let $\{\xi(t);\, t \in T\}$ be a complex-valued second-order random process on $\{\Omega,\mathscr{B},P\}$, i.e.,
$$E|\xi(t)|^2 < \infty \quad \text{for every } t \in T. \tag{6.1.1}$$
Second-order processes are often called "Hilbert processes." Separating the real and imaginary parts of the process, we can write
$$\xi(t) = X(t) + iY(t), \tag{6.1.2}$$
where $X(t)$ and $Y(t)$ are two second-order real-valued random processes. In the sequel, unless otherwise stated, we will always assume that $E\{\xi(t)\} = 0$ for all $t \in T$.

Definition 6.1.1. The covariance function $C(s,t)$ of the process (6.1.1) is by definition the second mixed moment, i.e.,
$$C(s,t) = E\{\xi(s)\overline{\xi(t)}\} \tag{6.1.3}$$
or, according to the definition (5.1.3) of the inner product,
$$C(s,t) = (\xi(s),\,\xi(t)). \tag{6.1.4}$$
From (6.1.3), it readily follows that
$$C(s,t) = \overline{C(t,s)}, \tag{6.1.5}$$
which is known as "Hermitian symmetry." We also have that
$$C(t,t) = \mathrm{Var}\{\xi(t)\} = \|\xi(t)\|^2. \tag{6.1.6}$$
The covariance function of a second-order process is always finite-valued. This follows from the Schwarz inequality (see 5.1.1):
$$|C(s,t)|^2 \le E|\xi(s)|^2\cdot E|\xi(t)|^2 = \|\xi(s)\|^2\cdot\|\xi(t)\|^2.$$
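The covariance function (6.1.3) and its Hermitian symmetry (6.1.5) can be estimated by ensemble averaging over simulated sample paths. The following sketch uses an arbitrary illustrative process $\xi(t) = e^{i\Lambda t}$ with a random frequency $\Lambda$, centered at its mean:

```python
import numpy as np

# Ensemble estimate of C(s,t) = E{xi(s) * conj(xi(t))} and a check of the
# Hermitian symmetry (6.1.5); the process is an arbitrary illustrative choice.
rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 50)
L = rng.standard_normal(20_000)               # random frequency per path
xi = np.exp(1j * np.outer(L, t))
xi -= xi.mean(axis=0)                         # center so E xi(t) = 0

C = xi.T @ xi.conj() / xi.shape[0]            # C[i,j] estimates C(t_i, t_j)
print(np.max(np.abs(C - C.conj().T)))         # Hermitian symmetry: ~ 0
print(bool(np.all(C.diagonal().real >= 0)))   # variances are nonnegative
```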
A covariance function possesses a number of interesting features, some of which are listed below. The following property imposes a definite restriction on the form of such a function.

(i) Every covariance function is non-negative definite. In other words, for any $\{t_1,\ldots,t_n\} \subset T$ and any complex numbers $z_1,\ldots,z_n$, $n = 1,2,3,\ldots$,
$$\sum_{i=1}^n\sum_{j=1}^n z_i\bar z_jC(t_i,t_j) = \sum_{i=1}^n\sum_{j=1}^n z_i\bar z_jE\,\xi(t_i)\overline{\xi(t_j)} = E\left| \sum_{i=1}^n z_i\xi(t_i) \right|^2 \ge 0. \tag{6.1.7}$$

(ii) Any complex-valued function on $T \times T$ which is non-negative definite is Hermitian. To show this, let $R(s,t)$ be such a function and consider
$$\sum_{i=1}^n\sum_{j=1}^n R(t_i,t_j)z_i\bar z_j \ge 0. \tag{6.1.8}$$
From this, for $n = 1$ we have that $R(t,t) \ge 0$ for all $t \in T$. Next, for $n = 2$, (6.1.8) yields
$$R(t_1,t_1)|z_1|^2 + R(t_1,t_2)z_1\bar z_2 + R(t_2,t_1)z_2\bar z_1 + R(t_2,t_2)|z_2|^2 \ge 0.$$
This implies that, for all complex numbers $z_1, z_2$,
$$R(t_1,t_2)z_1\bar z_2 + R(t_2,t_1)z_2\bar z_1 \ \text{is real}. \tag{6.1.9}$$
For $z_1 = 1$, $z_2 = i$, (6.1.9) becomes
$$\big(R(t_2,t_1) - R(t_1,t_2)\big)i \ \text{is real},$$
so that
$$R(t_2,t_1) - R(t_1,t_2) \ \text{is pure imaginary}. \tag{6.1.10}$$
Finally, if we set $z_1 = z_2 = 1$ in (6.1.9), we conclude that
$$R(t_1,t_2) + R(t_2,t_1) \ \text{is real}, \tag{6.1.11}$$
which together with (6.1.10) clearly implies that $R(t_1,t_2) = \overline{R(t_2,t_1)}$.

We now show that, in fact, any non-negative definite function on $T \times T$ is the covariance function of some second-order stochastic process $\{\xi(t);\, t \in T\}$.

(iii) For any non-negative definite function $R(s,t)$ on $T \times T$ (real or complex), there exists a second-order process $\{\xi(t);\, t \in T\}$ whose covariance function is precisely $R(s,t)$.

This has already been established in the case when $R(s,t)$ is real (see Chapter 4, Section 4.4). To show that the statement holds when $R(s,t)$ is complex, consider the Hermitian form
$$H(t_1,\ldots,t_n) = \sum_{i=1}^n\sum_{j=1}^n R(t_i,t_j)z_i\bar z_j, \tag{6.1.12}$$
where $\{t_1,\ldots,t_n\} \subset T$, $n = 1,2,\ldots$. Let $R_1 = \mathrm{Re}\{R\}$ and $R_2 = \mathrm{Im}\{R\}$; then we can write
$$R(s,t) = R_1(s,t) + iR_2(s,t). \tag{6.1.13}$$
If we set $z_j = u_j - iv_j$, we readily obtain that
$$H(t_1,\ldots,t_n) = \sum_{i=1}^n\sum_{j=1}^n R_1(t_i,t_j)(u_iu_j + v_iv_j) - \sum_{i=1}^n\sum_{j=1}^n R_2(t_i,t_j)(u_iv_j - u_jv_i) \tag{6.1.14}$$
(there is no imaginary part because, by assumption, $R(s,t)$ is non-negative definite). According to Equation (4.3.3), $\exp\{-\tfrac12H\}$ is the characteristic function of a $2n$-dimensional Gaussian distribution of a system of $2n$ real r.v.'s, say
$$(X(t_1),\ldots,X(t_n),\ Y(t_1),\ldots,Y(t_n)), \tag{6.1.15}$$
with $E\{X(t_i)\} = E\{Y(t_i)\} = 0$, $i = 1,\ldots,n$, and
$$E\{X(t_i)X(t_j)\} = E\{Y(t_i)Y(t_j)\} = R_1(t_i,t_j), \tag{6.1.16}$$
$$E\{X(t_i)Y(t_j)\} = -R_2(t_i,t_j). \tag{6.1.17}$$
Set
$$\xi(t) = \frac{X(t) + iY(t)}{\sqrt{2}}. \tag{6.1.18}$$
We see that
$$R(s,t) = E\{\xi(s)\overline{\xi(t)}\} = R_1(s,t) + iR_2(s,t). \tag{6.1.19}$$
Therefore, the system $(\xi(t_1),\ldots,\xi(t_n))$ has a Gaussian distribution for all $\{t_1,\ldots,t_n\} \subset T$ and $n = 1,2,\ldots$. These distributions satisfy the Kolmogorov consistency conditions, so there exists a complex-valued process $\xi(t)$ having $R(s,t)$ as its covariance function.

Next we shall list a few basic properties of the class of covariance functions.

(iv) The class of covariance functions is closed under additions, multiplications, and passages to the limit. In other words, if $C_1$ and $C_2$ are two covariance functions of random processes with a common parameter set $T$, then so are $\alpha_1C_1 + \alpha_2C_2$ and $C_1\cdot C_2$ when $\alpha_1, \alpha_2 > 0$. In addition, if $\{C_n\}_1^\infty$ is a sequence of covariance functions and $C = \lim C_n$, then $C$ is also a covariance function.

It is apparent that non-negative definiteness is preserved under positive linear combinations or under passage to the limit.
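Property (i) suggests a practical test: a candidate covariance restricted to a finite grid must form a non-negative definite matrix, so checking the smallest eigenvalue of $[C(t_i,t_j)]$ gives necessary (though not sufficient) numerical evidence. The kernels below are illustrative choices:

```python
import numpy as np

# Numerical evidence for property (i): the matrix [C(t_i,t_j)] on any
# finite grid must have nonnegative eigenvalues.
t = np.linspace(0.01, 1.0, 40)
S, T = np.meshgrid(t, t, indexing="ij")

for name, K in [("min(s,t)", np.minimum(S, T)),
                ("exp(-|s-t|)", np.exp(-np.abs(S - T))),
                ("exp(|s-t|)", np.exp(np.abs(S - T)))]:
    print(name, np.linalg.eigvalsh(K).min())
# the first two kernels pass; exp(|s-t|) has negative eigenvalues,
# so it cannot be a covariance function
```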
In view of what was said in (iii), this proves the first and third statements. For the second, let $\xi_1(t)$ and $\xi_2(t)$ be independent; then
$$E\{\xi_1(s)\xi_2(s)\cdot\overline{\xi_1(t)\xi_2(t)}\} = E\{\xi_1(s)\overline{\xi_1(t)}\}\,E\{\xi_2(s)\overline{\xi_2(t)}\} = C_1(s,t)\,C_2(s,t).$$
Because the first member is a covariance function, the second statement is proved. To see that two such processes $\xi_1(t)$ and $\xi_2(t)$ exist, it suffices to assume that $\xi_1(t)$ is Gaussian on a probability space $\{\Omega_1,\mathscr{B}_1,P_1\}$ and, similarly, that $\xi_2(t)$ is Gaussian on $\{\Omega_2,\mathscr{B}_2,P_2\}$, and then form the product space.

6.2. Quadratic Mean Continuity and Differentiability

Let $\{\xi(t);\, t \in T\}$ be an $L_2$ stochastic process with $T \subset \mathbb{R}$ an interval. In general, its covariance function $C(s,t)$ does not provide any direct information about properties of the sample functions of $\xi(t)$ such as continuity, differentiability, etc. In this section, we define analogous concepts which make sense in Hilbert space and give criteria for $L_2$ continuity and differentiability in terms of $C$. What is needed for this purpose is the notion of $L_2$ convergence, i.e., convergence in $L_2$ norm, as specified by Definition 5.2.1.

Definition 6.2.1. A second-order process $\{\xi(t);\, t \in T\}$ is said to be $L_2$ continuous [or continuous in quadratic mean (q.m.)] at a point $t \in T$ if and only if $\xi(t+h) \xrightarrow{q.m.} \xi(t)$ as $h \to 0$. According to (5.2.1), this is equivalent to
$$\|\xi(t+h) - \xi(t)\|^2 = E|\xi(t+h) - \xi(t)|^2 \to 0 \quad \text{as } h \to 0. \tag{6.2.1}$$

If a process is q.m. continuous at every $t \in T$, we will say it is a q.m. continuous process. The following two propositions establish a relation between q.m. continuity of a stochastic process and continuity of its covariance function.

Proposition 6.2.1. Let $\{\xi(t);\, t \in T\}$ be an $L_2$ process with $C(s,t) = E\{\xi(s)\overline{\xi(t)}\}$. The process is q.m. continuous at a point $t \in T$ if and only if $C(\cdot,\cdot)$ is continuous at $(t,t)$.

PROOF. Write
$$\|\xi(t+h) - \xi(t)\|^2 = C(t+h,t+h) - C(t+h,t) - C(t,t+h) + C(t,t)$$
$$= \big[C(t+h,t+h) - C(t,t)\big] - \big[C(t+h,t) - C(t,t)\big] - \big[C(t,t+h) - C(t,t)\big].$$
From this, it is clear that the process is q.m. continuous at the point $t \in T$ if $C(\cdot,\cdot)$ is continuous at $(t,t)$. Conversely, if $\{\xi(t);\, t \in T\}$ is q.m. continuous at the point $t \in T$, then
$$|C(t+h,t+h') - C(t,t)| = \big|E\,\xi(t+h)\overline{\xi(t+h')} - E\,\xi(t)\overline{\xi(t)}\big| \le \big|E(\xi(t+h) - \xi(t))\overline{\xi(t+h')}\big| + \big|E\,\xi(t)\overline{(\xi(t+h') - \xi(t))}\big|.$$
From this and the Schwarz inequality (5.1.1), the assertion follows. $\square$

The next proposition shows that q.m. continuity of $\xi(t)$ on $T$ implies continuity of $C(s,t)$ on $T \times T$.

Proposition 6.2.2. If $C(t,t) = R(t)$ is continuous at every $t \in T$, then $C(\cdot,\cdot)$ is continuous on $T \times T$.

PROOF. Consider
$$|C(s+h,t+h') - C(s,t)| = \big|E\,\xi(s+h)\overline{\xi(t+h')} - E\,\xi(s)\overline{\xi(t)}\big| \le \big|E(\xi(s+h) - \xi(s))\overline{\xi(t+h')}\big| + \big|E\,\xi(s)\overline{(\xi(t+h') - \xi(t))}\big|.$$
Again applying the Schwarz inequality and taking into account the previous proposition, the assertion follows. $\square$

Remark 6.2.1. Continuity in q.m. of a second-order process does not imply sample function continuity. As an example, consider a time-homogeneous Poisson process $N(t)$ (see Remark 2.4.2) with $E\{N(t)\} = \lambda t$. As is known, this process has independent increments, so that, for any $0 < s < t$,
$$E\{N(s)N(t)\} = E\{(N(t) - N(s))N(s)\} + E\{N^2(s)\} = \lambda(t-s)\lambda s + \lambda s(1 + \lambda s),$$
which yields $C(s,t) = \lambda s = \lambda\min(s,t)$. Because $C(s,t)$ is a continuous function, the Poisson process $N(t)$ is q.m. continuous. However, its sample functions are step functions with probability 1.
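For the Poisson process of Remark 6.2.1, independence of increments gives $E|N(t+h) - N(t)|^2 = \lambda h + (\lambda h)^2 \to 0$, even though every sample path jumps. A direct simulation check (parameter values are arbitrary):

```python
import numpy as np

# E|N(t+h) - N(t)|^2 for a Poisson process: the increment is Poisson(lam*h),
# so its second moment is lam*h + (lam*h)^2, which vanishes as h -> 0.
rng = np.random.default_rng(6)
lam = 3.0
for h in (0.1, 0.01, 0.001):
    incr = rng.poisson(lam * h, size=400_000).astype(float)
    second_moment = incr.var() + incr.mean() ** 2
    print(h, round(second_moment, 5), round(lam * h + (lam * h) ** 2, 5))
```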
Definition 6.2.2. A second-order random process $\{\xi(t);\, t \in T\}$ is said to have a derivative $\xi'(t)$ in q.m. at a point $t \in T$ if
$$\frac{\xi(t+h) - \xi(t)}{h} \xrightarrow{q.m.} \xi'(t) \quad \text{as } h \to 0.$$
The r.v. $\xi'(t) = d\xi(t)/dt$ is called the q.m. derivative of the random process $\xi(t)$ at the point $t \in T$.

In the sequel, we will need one more definition.

Definition 6.2.3. The second generalized derivative of a covariance function $C(s,t)$ is defined as the limit (if it exists) of the quotient
$$\frac{1}{hh'}\big\{C(s+h,\,t+h') - C(s+h,\,t) - C(s,\,t+h') + C(s,t)\big\} \tag{6.2.2}$$
as $h, h' \to 0$, and is denoted by $\partial^2C(s,t)/\partial s\,\partial t$.

Proposition 6.2.3. Let $\{\xi(t);\, t \in T\}$ be a second-order process. A necessary and sufficient condition for q.m. differentiability of $\xi(t)$ at $t \in T$ is that the generalized derivative (6.2.2) exists at $(t,t)$.

PROOF. Write
$$E\left\{ \frac{\xi(t+h) - \xi(t)}{h}\cdot\overline{\left( \frac{\xi(t+h') - \xi(t)}{h'} \right)} \right\} = \frac{1}{hh'}\big\{C(t+h,\,t+h') - C(t,\,t+h') - C(t+h,\,t) + C(t,t)\big\},$$
and the assertion follows from the Loève criterion (see Proposition 5.2.5). $\square$

Corollary 6.2.1. If $\{\xi(t);\, t \in T\}$ is q.m. differentiable at a point $t \in T$, then $dE\{\xi(t)\}/dt$ exists and
$$E\{\xi'(t)\} = \frac{d}{dt}E\{\xi(t)\}. \tag{6.2.3}$$
As a matter of fact, this is implied by the q.m. differentiability of the process at the point $t \in T$ and the inequality
$$\left| E\left\{ \xi'(t) - \frac{\xi(t+h) - \xi(t)}{h} \right\} \right| \le \left( E\left| \xi'(t) - \frac{\xi(t+h) - \xi(t)}{h} \right|^2 \right)^{1/2}.$$
If a second-order process $\{\xi(t);\, t \in T\}$ is q.m. differentiable at every $t \in T$, then $\{\xi'(t);\, t \in T\}$ is also a second-order random process.

Proposition 6.2.4. Let $\{\xi(t);\, t \in T\}$ be a second-order process with covariance function $C(s,t)$. If the generalized derivative
$$\frac{\partial^2C(s,t)}{\partial s\,\partial t} \tag{6.2.4}$$
exists at every point $(t,t)$, $t \in T$, then $\xi(t)$ is q.m. differentiable on $T$. In addition,
$$E\{\xi'(s)\overline{\xi(t)}\} = \frac{\partial C(s,t)}{\partial s} \tag{6.2.5}$$
and
$$E\{\xi'(s)\overline{\xi'(t)}\} = \frac{\partial^2C(s,t)}{\partial s\,\partial t}. \tag{6.2.6}$$

PROOF. Only formulas (6.2.5) and (6.2.6) require a proof. Write
$$E\{\xi'(s)\overline{\xi(t)}\} = \lim_{h\to 0}E\left\{ \frac{\xi(s+h) - \xi(s)}{h}\,\overline{\xi(t)} \right\} = \lim_{h\to 0}\frac{C(s+h,t) - C(s,t)}{h} = \frac{\partial C(s,t)}{\partial s}.$$
Similarly,
$$E\{\xi'(s)\overline{\xi'(t)}\} = \lim_{h,h'\to 0}\frac{C(s+h,\,t+h') - C(s,\,t+h') - C(s+h,\,t) + C(s,t)}{hh'} = \frac{\partial^2C(s,t)}{\partial s\,\partial t}.$$
The last result implies that the second generalized derivative exists everywhere on $T \times T$. This proves the proposition. $\square$

Remark 6.2.2. The concept of a stochastic integral in quadratic mean was discussed in some detail in Chapter 3, Section 3.8, but only in the case of real second-order processes. The same concept and results hold in the case of complex-valued second-order processes.
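Proposition 6.2.3 can be tested numerically: $E|(\xi(t+h) - \xi(t))/h|^2$ is exactly the symmetric second difference quotient of $C$ at $(t,t)$. For Brownian motion, $C(s,t) = \min(s,t)$, this quotient equals $1/h$ and diverges, so the process has no q.m. derivative:

```python
import numpy as np

# Second difference quotient (6.2.2) of C(s,t) = min(s,t) on the diagonal:
# E|(xi(t+h)-xi(t))/h|^2 = (C(t+h,t+h) - 2*C(t+h,t) + C(t,t))/h^2 = 1/h,
# which blows up as h -> 0.
C = lambda s, t: np.minimum(s, t)
t = 1.0
for h in (0.1, 0.01, 0.001):
    q = (C(t + h, t + h) - 2.0 * C(t + h, t) + C(t, t)) / h**2
    print(h, q)                 # grows like 1/h: no generalized derivative
```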
6.3. Eigenvalues and Eigenfunctions of C(s,t)

Let $\{\xi(t);\, t \in T\}$ be a complex-valued second-order stochastic process on a probability space $\{\Omega,\mathscr{B},P\}$ such that $E\{\xi(t)\} = 0$ for each $t \in T$, and let $C(s,t)$ be its covariance function. In this section, we give a brief review of some basic properties of eigenvalues and eigenfunctions of $C(s,t)$. This is required for the problem of orthogonal expansions of second-order stochastic processes, which will be discussed in the next section.

Denote, as before, by $\langle\overline{\xi(T)}\rangle$ the closed linear manifold spanned by $\{\xi(t);\, t \in T\}$ (see Definition 5.3.4). Clearly, $\langle\overline{\xi(T)}\rangle$ is a (Hilbert) subspace of $L_2\{\Omega,\mathscr{B},P\}$. As we established in Chapter 5, Section 5.6, if the process $\xi(t)$ is q.m. continuous on $T$, the subspace $\langle\overline{\xi(T)}\rangle$ is separable; in other words, there exists a countable everywhere dense subset of $\langle\overline{\xi(T)}\rangle$. From this, by means of the Gram–Schmidt orthogonalization procedure (see Section 5.6 of Chapter 5), we can always construct an orthonormal family $\{Z_k\}_1^\infty$ which is complete in $\langle\overline{\xi(T)}\rangle$. In such a case, according to Proposition 5.5.2, for every $t \in T$, $\xi(t)$ admits an expansion of the form
$$\xi(t) = \sum_{k=1}^\infty \beta_k(t)Z_k, \tag{6.3.1}$$
which is convergent in q.m., where
$$\beta_k(t) = (\xi(t),\,Z_k). \tag{6.3.2}$$
From (6.3.1), it follows that under certain conditions we may write
$$C(s,t) = (\xi(s),\xi(t)) = E\left\{ \sum_{i=1}^\infty\sum_{j=1}^\infty \beta_i(s)Z_i\,\overline{\beta_j(t)Z_j} \right\} = \sum_{k=1}^\infty \beta_k(s)\overline{\beta_k(t)}. \tag{6.3.3}$$
As we shall see later, $\{\beta_k(t)\}_1^\infty$ is an orthogonal sequence of functions such that (when $T = [a,b]$)
$$(\beta_i,\beta_j) = \int_a^b \beta_i(t)\overline{\beta_j(t)}\,dt = \begin{cases} 0, & i \ne j \\ 1/\lambda_k, & i = j = k. \end{cases} \tag{6.3.4}$$
In such a case, if term-by-term integration of (6.3.3) is permitted, we obtain readily that, for all $k = 1,2,\ldots$,
$$\beta_k(s) - \lambda_k\int_a^b C(s,t)\beta_k(t)\,dt = 0. \tag{6.3.5}$$
This is a Fredholm linear integral equation of the second kind. Therefore, the Fourier coefficients in the expansion (6.3.1) are solutions of the integral equation (6.3.5). For this reason, it seems appropriate to list some basic properties of integral equations of the form (6.3.5).

Consider the integral equation
$$\varphi(x) - \lambda\int_a^b K(x,y)\varphi(y)\,dy = 0, \tag{6.3.6}$$
where the kernel $K(x,y)$ is a given (real or complex) function, $\varphi(\cdot)$ is unknown, and $\lambda$ is a parameter. Equations of this type are called Fredholm homogeneous linear integral equations of the second kind. In general, Equation (6.3.6) has only the trivial solution $\varphi(x) \equiv 0$. For certain critical values of $\lambda$, however, there may exist nontrivial solutions. The values $\lambda_0, \lambda_1, \ldots$ for which nontrivial solutions of (6.3.6) exist are called "eigenvalues" of the kernel $K(x,y)$, and the corresponding $\varphi$'s are eigenfunctions. The eigenvalues and eigenfunctions depend upon the kernel $K(x,y)$. Of particular interest here are kernels which are "Hermitian," i.e.,
$$K(x,y) = \overline{K(y,x)}. \tag{6.3.7}$$
As we shall see, such kernels possess a number of important properties, which collectively permit a thorough analysis of the integral equations in question.

We now discuss some basic properties of eigenvalues of a Hermitian kernel. The natural context for our discussion is the Hilbert space $L_2[a,b]$ of square integrable complex-valued functions defined on the interval $[a,b]$. The symbol $\|\cdot\|$ will be used to denote the norm, and
$$(f,g) = \int_a^b f(x)\overline{g(x)}\,dx$$
is the inner product. In the case of square integrable complex-valued functions of two independent variables, we will write
$$\|K\| = \left\{ \int_a^b\int_a^b |K(x,y)|^2\,dx\,dy \right\}^{1/2}. \tag{6.3.8}$$
Finally, we shall often write the integral equation (6.3.6) as
$$\varphi = \lambda K\varphi. \tag{6.3.9}$$

Proposition 6.3.1. Any non-null Hermitian kernel $K(x,y)$ satisfying $\|K\| < \infty$ must have at least one eigenvalue $\lambda$.

In other words, if the kernel $K(x,y)$ is Hermitian, Equation (6.3.6) must have nontrivial solutions. The next proposition is easy to prove.

Proposition 6.3.2. The eigenvalues of Hermitian kernels are real.

PROOF. Assume that $\|\varphi\| \ne 0$; then
$$\lambda(K\varphi,\varphi) = (\lambda K\varphi,\varphi) = (\varphi,\varphi) = \|\varphi\|^2,$$
so that $\lambda(K\varphi,\varphi)$ is positive. On the other hand, because $K$ is Hermitian,
$$(K\varphi,\varphi) = (\varphi,K\varphi) = \overline{(K\varphi,\varphi)},$$
so that $(K\varphi,\varphi)$ is real, which, with the above, implies that $\lambda$ must be real. $\square$

Corollary 6.3.1. The eigenvalues of a Hermitian kernel $K(x,y)$ and of its conjugate $\overline{K(x,y)}$ are identical.

Corollary 6.3.2. If the Hermitian kernel $K(x,y)$ is positive definite, its eigenvalues are positive. As a matter of fact,
$$(K\varphi,\varphi) = \int_a^b\int_a^b K(x,y)\varphi(y)\overline{\varphi(x)}\,dy\,dx > 0,$$
and $\lambda(K\varphi,\varphi) = \|\varphi\|^2 > 0$, which implies that $\lambda > 0$.

Proposition 6.3.3. To every eigenvalue $\lambda$ of a Hermitian kernel $K(x,y)$ there corresponds at least one eigenfunction.
The number of linearly independent eigenfunctions corresponding to a given eigenvalue is finite.

Remark 6.3.1. A kernel of the type
$$K(x,y) = \sum_{i=1}^n h_i(x)\overline{g_i(y)}, \tag{6.3.10}$$
where $\{h_i\}_1^n$ and $\{g_i\}_1^n \subset L_2[a,b]$ are two families of linearly independent functions, is called a Pincherle–Goursat kernel.

Proposition 6.3.4. Every nonzero Hermitian kernel either has a countably infinite number of eigenvalues or is a Pincherle–Goursat kernel.

We are not going to prove this proposition here. However, we shall prove the following result.

Proposition 6.3.5. Two eigenfunctions, $\varphi(\cdot)$ and $\psi(\cdot)$, of a Hermitian kernel $K(x,y)$, corresponding to two different eigenvalues $\lambda_1$ and $\lambda_2$, are orthogonal to one another.

PROOF. Note that
$$(\varphi,\psi) = (\lambda_1K\varphi,\psi) = \lambda_1(K\varphi,\psi) = \lambda_1(\varphi,K\psi) = \frac{\lambda_1}{\lambda_2}(\varphi,\lambda_2K\psi) = \frac{\lambda_1}{\lambda_2}(\varphi,\psi).$$
Because $\lambda_1 \ne \lambda_2$ (both real), this forces $(\varphi,\psi) = 0$, which proves the assertion. $\square$

6.4. Karhunen–Loève Expansion

Let $\{\xi(t);\, t \in [a,b]\}$ be a second-order complex-valued q.m. continuous process defined on a probability space $\{\Omega,\mathscr{B},P\}$ with $E\{\xi(t)\} = 0$. Its covariance function $C(s,t)$ is non-negative definite, Hermitian and, according to Proposition 6.2.2, continuous on $[a,b] \times [a,b]$. Unless $C(s,t)$ is a Pincherle–Goursat kernel, the covariance function has a countably infinite number of eigenvalues $\{\lambda_k\}_1^\infty$ (see Proposition 6.3.4) such that $\lambda_k > 0$ for all $k = 1,2,\ldots$ (see Corollary 6.3.2). The corresponding eigenfunctions $\{\varphi_k(t)\}_1^\infty$ are continuous on $[a,b]$. In addition, we will assume that
$$(\varphi_i,\varphi_j) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$
for all $i,j = 1,2,\ldots$ (see Proposition 6.3.5).

Assume that $C(s,t)$ is square integrable; then, for any fixed $t \in [a,b]$, the Fourier series of $C(\cdot,t)$ with respect to the system $\{\varphi_k\}$ is
$$C(s,t) \sim \sum_{k=1}^\infty c_k(t)\varphi_k(s), \quad \text{where } c_k(t) = \int_a^b C(s,t)\overline{\varphi_k(s)}\,ds = \frac{\overline{\varphi_k(t)}}{\lambda_k}.$$
Consequently,
$$C(s,t) \sim \sum_{k=1}^\infty \frac{1}{\lambda_k}\varphi_k(s)\overline{\varphi_k(t)}.$$
If the system of eigenfunctions $\{\varphi_k(t)\}_1^\infty$ is complete, we would have that
$$C(s,t) = \underset{n\to\infty}{\text{l.i.m.}}\sum_{k=1}^n \frac{1}{\lambda_k}\varphi_k(s)\overline{\varphi_k(t)}$$
in the mean square sense (see Proposition 5.5.2). The following proposition, due to Mercer (1909), holds for all square integrable continuous Hermitian kernels whose eigenvalues are all of one sign.

Proposition 6.4.1. Let $\{\xi(t);\, t \in [a,b]\}$ be a second-order q.m. continuous stochastic process with $E\{\xi(t)\} = 0$ and square integrable covariance function $C(s,t)$ with eigenvalues $\{\lambda_k\}_1^\infty$ and eigenfunctions $\{\varphi_k(t)\}_1^\infty$, which form an orthonormal sequence. Then
$$C(s,t) = \sum_{k=1}^\infty \frac{\varphi_k(s)\overline{\varphi_k(t)}}{\lambda_k}, \tag{6.4.1}$$
where the infinite series converges absolutely and uniformly on $[a,b] \times [a,b]$.

A proof of this proposition can be found in Riesz and Sz.-Nagy (1955).

We now embark on a proof of a result known as the Karhunen–Loève orthogonal expansion. As before, let
$$\{\xi(t);\, t \in [a,b]\} \tag{6.4.2}$$
be a second-order q.m. continuous process with $E\{\xi(t)\} = 0$ and covariance function $C(s,t)$. According to Proposition 3.8.1, the integrals
$$\xi_k = (\lambda_k)^{1/2}\int_a^b \xi(t)\overline{\varphi_k(t)}\,dt \tag{6.4.3}$$
exist for all $k = 1,2,\ldots$ and represent r.v.'s. Clearly, $E\{\xi_k\} = 0$, $k = 1,2,\ldots$, and
$$E\{\xi_i\overline{\xi_j}\} = (\lambda_i\lambda_j)^{1/2}\int_a^b\int_a^b C(t,s)\overline{\varphi_i(t)}\varphi_j(s)\,dt\,ds.$$
From the fact that $\int_a^b C(t,s)\varphi_j(s)\,ds = \varphi_j(t)/\lambda_j$, we obtain
$$E\{\xi_i\overline{\xi_j}\} = \left(\frac{\lambda_i}{\lambda_j}\right)^{1/2}\int_a^b \overline{\varphi_i(t)}\varphi_j(t)\,dt = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases} \tag{6.4.4}$$
This clearly implies that the r.v.'s $\{\xi_k\}_1^\infty$ are uncorrelated.

Proposition 6.4.2 (Karhunen–Loève). On $[a,b]$, the stochastic process (6.4.2) admits an orthogonal expansion of the form
$$\xi(t) = \sum_{k=1}^\infty \frac{\varphi_k(t)}{(\lambda_k)^{1/2}}\,\xi_k, \tag{6.4.5}$$
where the infinite series in (6.4.5) converges in q.m. to $\xi(t)$ uniformly in $t$.
PROOF. Consider
$$\left\| \xi(t) - \sum_{k=1}^n \frac{\varphi_k(t)}{(\lambda_k)^{1/2}}\xi_k \right\|^2. \tag{6.4.6}$$
Using (6.4.3), it is not difficult to verify that
$$E\{\xi(t)\overline{\xi_k}\} = (\lambda_k)^{1/2}\int_a^b E\{\xi(t)\overline{\xi(s)}\}\varphi_k(s)\,ds = (\lambda_k)^{1/2}\int_a^b C(t,s)\varphi_k(s)\,ds = \frac{\varphi_k(t)}{(\lambda_k)^{1/2}}.$$
This and (6.4.6) then yield
$$E\left| \xi(t) - \sum_{k=1}^n \frac{\varphi_k(t)}{(\lambda_k)^{1/2}}\xi_k \right|^2 = C(t,t) - \sum_{k=1}^n \frac{1}{\lambda_k}\varphi_k(t)\overline{\varphi_k(t)}.$$
By letting $n \to \infty$ and invoking the Mercer theorem, we see that the series in (6.4.5) converges in q.m. to $\xi(t)$ for every $t \in [a,b]$. Finally, because $C(t,t)$ is continuous, the convergence is uniform due to Dini's theorem. This proves the assertion. $\square$

EXAMPLE 6.4.1. Suppose that $\xi(t)$ is a real Gaussian process. Then the r.v.'s $\xi_k$ defined by (6.4.3) are normal and, as such, due to (6.4.4), they are independent. In addition, because
$$\sum_{k=1}^\infty E\left\{ \frac{\varphi_k(t)}{(\lambda_k)^{1/2}}\xi_k \right\}^2 = \sum_{k=1}^\infty \frac{\varphi_k^2(t)}{\lambda_k} = C(t,t) < \infty,$$
the series in (6.4.5) converges (a.s.).

EXAMPLE 6.4.2. Let $\{\xi(t);\, t \in [0,1]\}$ be a standard Brownian motion process (see Definition 3.1.1). As we know [see (3.1.4)], its covariance function is
$$C(s,t) = \min\{s,t\}. \tag{6.4.7}$$
To determine its eigenvalues and eigenfunctions, consider the integral equation
$$\varphi_k(s) - \lambda_k\int_0^1 C(s,t)\varphi_k(t)\,dt = 0, \tag{6.4.8}$$
where $C(s,t)$ is specified by (6.4.7). Now write this equation as
$$\lambda_k\int_0^s t\varphi_k(t)\,dt + \lambda_ks\int_s^1 \varphi_k(t)\,dt = \varphi_k(s). \tag{6.4.9}$$
Differentiating both sides with respect to $s$, we obtain
$$\lambda_k\left\{ s\varphi_k(s) + \int_s^1 \varphi_k(t)\,dt - s\varphi_k(s) \right\} = \varphi_k'(s)$$
or
$$\varphi_k'(s) = \lambda_k\int_s^1 \varphi_k(t)\,dt. \tag{6.4.10}$$
Differentiating once more, we have
$$\varphi_k'' = -\lambda_k\varphi_k. \tag{6.4.11}$$
As is well known, the general solution of this second-order linear differential equation is
$$\varphi_k(s) = c_1\cos\big((\lambda_k)^{1/2}s\big) + c_2\sin\big((\lambda_k)^{1/2}s\big). \tag{6.4.12}$$
From Equation (6.4.9), it clearly follows that $\varphi_k(0) = 0$, so that $c_1 = 0$. On the other hand, from Equation (6.4.10), we deduce that
$$\varphi_k'(1) = 0. \tag{6.4.13}$$
From (6.4.13) we have
$$c_2(\lambda_k)^{1/2}\cos\big((\lambda_k)^{1/2}\big) = 0,$$
which implies that $(\lambda_k)^{1/2} = \pi(2k-1)/2$, so that
$$\varphi_k(s) = c_2\sin\left( \frac{\pi}{2}(2k-1)s \right), \qquad k = 1,2,\ldots. \tag{6.4.14}$$
Finally, to determine the constant $c_2$, we use the fact that the $\{\varphi_k(t)\}_1^\infty$ are orthonormal functions, so that $\int_0^1 c_2^2\sin^2\big(\tfrac{\pi}{2}(2k-1)t\big)\,dt = 1$. From this, we deduce that
$$c_2 = \sqrt{2}, \tag{6.4.15}$$
and thus
$$\lambda_k = (k - \tfrac12)^2\pi^2, \qquad k = 1,2,\ldots. \tag{6.4.16}$$
Therefore, according to (6.4.5),
$$\xi(t) = \sqrt{2}\sum_{k=1}^\infty \frac{\sin\big(\pi(k - \tfrac12)t\big)}{(k - \tfrac12)\pi}\,\xi_k, \tag{6.4.17}$$
where $\{\xi_k\}_1^\infty$ is a sequence of independent $N(0,1)$ r.v.'s.
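Example 6.4.2 can be reproduced numerically by the Nyström method: discretizing the integral operator with kernel $\min(s,t)$ on a quadrature grid turns (6.4.8) into a matrix eigenvalue problem whose eigenvalues approximate $1/\lambda_k$. A sketch (grid size is an arbitrary choice), compared against (6.4.16):

```python
import numpy as np

# Nystrom discretization of (6.4.8) for the Brownian kernel min(s,t):
# the eigenvalues of K[i,j] = min(t_i,t_j)/n approximate 1/lambda_k.
n = 500
t = (np.arange(n) + 0.5) / n                   # midpoint rule on [0,1]
K = np.minimum.outer(t, t) / n
mu = np.sort(np.linalg.eigvalsh(K))[::-1]      # mu_k ~ 1/lambda_k

k = np.arange(1, 6)
print(np.round(1.0 / mu[:5], 3))               # numerical lambda_k
print(np.round(((k - 0.5) * np.pi) ** 2, 3))   # analytic values, (6.4.16)
```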
6.5. Stationary Stochastic Processes

Let
$$\{\xi(t);\, t \in \mathbb{R}\} \tag{6.5.1}$$
be a second-order complex-valued stochastic process on a probability space $\{\Omega,\mathscr{B},P\}$. The stochastic process is called "wide sense stationary" (see Definition 1.7.9) if, for all $s,t,h \in \mathbb{R}$,
$$E\{\xi(t+h)\} = E\{\xi(t)\} \quad \text{and} \quad E\{\xi(s+h)\overline{\xi(t+h)}\} = E\{\xi(s)\overline{\xi(t)}\}. \tag{6.5.2}$$
In this case, $E\{\xi(t)\}$ is clearly a constant, say $\mu$, and the covariance function $C(s,t)$ is a function of $t - s$, i.e.,
$$\mu = E\{\xi(t)\}, \qquad C(s,t) = C(t-s). \tag{6.5.3}$$
From (6.5.3), we deduce that $C(-\tau) = \overline{C(\tau)}$. In what follows, we will assume, without loss of generality, that $E\{\xi(t)\} = 0$ and $\mathrm{Var}\{\xi(t)\} = C(0) = 1$.

The stochastic process (6.5.1), regarded as a family of complex-valued r.v.'s, is a subset of $L_2\{\Omega,\mathscr{B},P\}$. Because for every $t \in \mathbb{R}$, $\xi(t)$ is a point in $L_2\{\Omega,\mathscr{B},P\}$, the process (6.5.1) represents a curve in $L_2\{\Omega,\mathscr{B},P\}$. Because
$$E\{\xi(t)\} = (\xi(t),1) = 0 \quad \text{and} \quad (\xi(s),\xi(t)) = C(t-s), \tag{6.5.4}$$
this curve lies in the subspace which is the orthogonal complement (see Definition 5.4.1) of $\{1\}$ in $L_2\{\Omega,\mathscr{B},P\}$.

According to the theory of second-order processes, the stochastic process (6.5.1) is said to be "continuous" if it is q.m. continuous (see Definition 6.2.1), that is, if
$$E\{|\xi(t+h) - \xi(t)|^2\} \to 0 \quad \text{as } h \to 0$$
for all $t \in (-\infty,\infty)$. From Chebyshev's inequality, it follows that such a process is also stochastically continuous. Indeed, for all $t \in \mathbb{R}$,
$$P\{|\xi(t+h) - \xi(t)| > \varepsilon\} \le \frac{1}{\varepsilon^2}E\{|\xi(t+h) - \xi(t)|^2\} \to 0 \quad \text{as } h \to 0.$$

We now have the following result.

Proposition 6.5.1. A wide sense stationary stochastic process is q.m. continuous if and only if the real part of its covariance function is continuous at 0.

PROOF. For the process (6.5.1),
$$E\{|\xi(t+h) - \xi(t)|^2\} = E\big\{(\xi(t+h) - \xi(t))\overline{(\xi(t+h) - \xi(t))}\big\} = 2C(0) - C(h) - \overline{C(h)} = 2\big(C(0) - \mathrm{Re}\{C(h)\}\big) = 2\,\mathrm{Re}\{1 - C(h)\}.$$
The proof now follows directly. $\square$

As an example, consider the random telegraph process.

EXAMPLE 6.5.1. Let $\{\xi(t);\, t \in \mathbb{R}\}$ be a real-valued random process where, for each $t \in \mathbb{R}$, the r.v. $\xi(t)$ may assume only two values, $-1$ and $+1$, with
$$P\{\xi(t) = -1\} = \tfrac12, \qquad P\{\xi(t) = +1\} = \tfrac12.$$
The sequence of transition times $\{T_k\}_{-\infty}^\infty$ forms a time-homogeneous Poisson process with a parameter $\lambda > 0$. More specifically, if $P_k(u)$ is the probability of $k$ transitions in $(t,t+u]$, then
$$P_k(u) = e^{-\lambda u}\frac{(\lambda u)^k}{k!}, \qquad k = 0,1,\ldots.$$
Show that $\xi(t)$ is wide sense stationary and determine its covariance function.

Demonstration: First we have, for all $t \in \mathbb{R}$,
$$E\{\xi(t)\} = 1\cdot P\{\xi(t) = 1\} + (-1)P\{\xi(t) = -1\} = 0.$$
On the other hand, after some straightforward calculations, we obtain
$$P\{\xi(t) = -1,\ \xi(t+\tau) = -1\} = P\{\xi(t) = 1,\ \xi(t+\tau) = 1\},$$
$$P\{\xi(t) = -1,\ \xi(t+\tau) = 1\} = P\{\xi(t) = 1,\ \xi(t+\tau) = -1\}.$$
Next, for $\tau > 0$,
$$P\{\xi(t+\tau) = 1 \mid \xi(t) = 1\} = \sum_{k=0}^\infty P_{2k}(\tau) = e^{-\lambda\tau}\cosh(\lambda\tau).$$
Similarly,
$$P\{\xi(t+\tau) = -1 \mid \xi(t) = 1\} = \sum_{k=0}^\infty P_{2k+1}(\tau) = e^{-\lambda\tau}\sinh(\lambda\tau),$$
where $\cosh x = \tfrac12(e^x + e^{-x})$. From this, we obtain
$$C(t,\,t+\tau) = E\{\xi(t)\xi(t+\tau)\} = e^{-\lambda\tau}\{\cosh(\lambda\tau) - \sinh(\lambda\tau)\} = e^{-2\lambda\tau}.$$
If, however, $\tau < 0$, $C(t,\,t+\tau) = e^{2\lambda\tau}$. Therefore,
$$C(t,\,t+\tau) = C(\tau) = e^{-2\lambda|\tau|}.$$
This shows that the stochastic process $\xi(t)$ is wide sense stationary.

[Figure 6.1: (a) a sample function of the random telegraph process; (b) its covariance function $C(\tau) = e^{-2\lambda|\tau|}$.]
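The random telegraph signal is easy to simulate: count the Poisson transitions in each time step and flip the sign accordingly. The sketch below (rate, step, and run length are arbitrary choices) compares the empirical covariance with $e^{-2\lambda|\tau|}$:

```python
import numpy as np

# Simulation of the random telegraph signal of Example 6.5.1: Poisson
# transition counts per step determine the sign flips.
rng = np.random.default_rng(7)
lam, dt, n = 1.0, 0.01, 500_000
flips = rng.poisson(lam * dt, size=n)          # transitions in each step
xi = rng.choice([-1.0, 1.0]) * (-1.0) ** np.cumsum(flips)

for tau in (0.5, 1.0, 2.0):
    k = int(tau / dt)
    emp = np.mean(xi[:-k] * xi[k:])            # empirical C(tau)
    print(tau, round(emp, 4), round(np.exp(-2.0 * lam * tau), 4))
```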
'" fT W)~(t -T + r) dt. (6.6.3) In general, it is not easy to give a simple sufficient condition for ergodicity. For Gaussian processes, however, we have the following result. Let {~(t); -00 < t < oo} be Gaussian and stationary with E{e(t)} = ex, Then ~(t) is ergodic if f: IR(r)1 dr < 00. (6.6.4) Sometimes it may happen that (6.6.2) holds for some function f(·). For instance, ifit holds for linear functions only, we say that the process is ergodic with respect to its mean. This question can also be asked in a somewhat more direct way. Again, let {~(t); -00 < t < oo} be a stationary process; we are interested in a sufficient condition for f ~(t) 1 -2 T T -T dt ~ ex, (6.6.5) 6.6. Remarks on the Ergodicity Property where IX = 147 Eg(t)}. In other words, we want to know when ~(t)dt - lim E {-21 fT T-T T~oo 1X}2 = o. (6.6.6) Clearly, If T E { 2T -T ~(t)dt - IX }2 1 f T f T = (2T)2 -T -T (Eg(t)~(s)) - 1X2)dtds = (2 1 )2 fT fT R(t - s)dtds. -T-T T We change the variables of integration by putting z r = t - s, t = !(z + r), = t + s, s = !(z - r). The Jacobian of the transformation is ot or as or ot oz aS oz 2 2 I~ 2 1 2· 1 On the o"ther hand, the domains of integration are shown in Figure 6.2. Therefore, we have 1 (2 T)2 fT fT -T 1 = 2(2T)2 -T R(t - s) dt ds (fO f<+2T -2T -<-2T R(r)dzdr r 2T r-<+2T ) J<-2T R(r) dz dr + Jo z 2T T -T 0 T s _2~T~------~--------}-~r -T -2T Figure 6.2 6. Second-Order Processes 148 (f:2T R(r)(4T)dr + I2T R(r)(4T - 2r)dr) = 2 (21T) 2 = 2~[f:2T R(r)dr + I2T R(r)( 1 - 2~ )dr] ::; 21T[f:2T IR(r)ldr + I2T IR(r)l(l - 2~)drJ 1 foo ::; 21 f2T IR(r)1 dr ::; -2 IR(r)1 dr. T -2T T -ro Thus, (6.6.5) holds if f: IR(r)1 dr < 00, which is a simple sufficient condition for ergodicity. Problems and Complements 6.1. Show that the following functions are non-negative definite: R I (s, t) = min(s, t), (i) (ii) Rz(s,t) (iii) I - It - sl, ={ It - sl ::; 0; 1 0, It - s) > 1 = min(s, t) - st, R3(S, t) (iv) z s, t R 4 (s,t) = exp{ -It - s, t E R; s, t E [0,1]; sl}, s, t E R. 6.2. Let C(s, t) be a covariance function and Pn (' ) a polynomial with positive coefficients. Show that Cl(s,t) = Pn(C(s,t)) is also a covariance function. 6.3. Consider the stochastic process {~(t);t W) = z O}, where X cos ('It + 0) and X > 0, 'I > 0 are r.v.'s independent of 0 with finite second moments, whereas is uniform in [0,2n]. Show that W) is stationary. o 6.4. Let hi (t), ... , hn(t) be real functions and a l , ... , an positive constants. Show that C(s, t) = n L akhk(s)hk(t) k~l is a covariance function. 6.5. Let {W); t z O} be a stochastic process defined by W) = IX sin(fJt + X), 149 Problems and Complements where ex. > 0 and fJ > 0 are constants and X ~ N(O, 1). Determine Eg(t)} and Eg(s)~(t)}. Is the process wide sense stationary? 6.6. Let {X.} c: L2 and X. ~ X. Is X E L2? Show that Z.~Z, U.~U. (i) EX. -+ EX, (ii) EIX.12 -+ EIXI 2, (iii) (Xn' X) -+ (X, X). 6.7. Let {Zn}, {Un} c: L2 be such that Show that EZnUn -+ EZU. 6.8. Let {X.} c: L2 and Xn ~ X. Find a condition under which X. ~ X Xn -+ X (a.s.). => 6.9. Let {~(t); t E T} c: L2 be a random process and to E T. Show that ~(t) ~ Z E L2 as t -+ to if and only iffor all {tn}f, {sn}f such that t. -+ to, Sn -+ to, we have EWn)~(s.) -+ Co (constant). 6.10. If g(t);t E [a,bJ} c: L2 is L 2-continuous, show that, in L 2 , I' -d ~(s)ds = ~(t), dt • a ~ t ~ b. 6.11. Find the Karhunen-Loeve expansion on the interval [0, 1] of an L2 process with covariance function C(s, t) = st. 6.12. 
Problems and Complements

6.1. Show that the following functions are non-negative definite:
(i) $R_1(s,t) = \min(s,t)$, $s, t \ge 0$;
(ii) $R_2(s,t) = 1 - |t-s|$ if $|t-s| \le 1$, and $0$ if $|t-s| > 1$, $s, t \in R$;
(iii) $R_3(s,t) = \min(s,t) - st$, $s, t \in [0,1]$;
(iv) $R_4(s,t) = \exp\{-|t-s|\}$, $s, t \in R$.

6.2. Let $C(s,t)$ be a covariance function and $P_n(\cdot)$ a polynomial with positive coefficients. Show that $C_1(s,t) = P_n(C(s,t))$ is also a covariance function.

6.3. Consider the stochastic process $\{\xi(t);\, t \ge 0\}$, where $\xi(t) = X\cos(\eta t + \theta)$ and $X > 0$, $\eta > 0$ are r.v.'s independent of $\theta$ with finite second moments, whereas $\theta$ is uniform on $[0, 2\pi]$. Show that $\xi(t)$ is stationary.

6.4. Let $h_1(t), \dots, h_n(t)$ be real functions and $a_1, \dots, a_n$ positive constants. Show that $C(s,t) = \sum_{k=1}^{n} a_k h_k(s)h_k(t)$ is a covariance function.

6.5. Let $\{\xi(t);\, t \ge 0\}$ be a stochastic process defined by $\xi(t) = \alpha\sin(\beta t + X)$, where $\alpha > 0$ and $\beta > 0$ are constants and $X \sim N(0,1)$. Determine $E\{\xi(t)\}$ and $E\{\xi(s)\xi(t)\}$. Is the process wide sense stationary?

6.6. Let $\{X_n\} \subset L_2$ and $X_n \xrightarrow{q.m.} X$. Is $X \in L_2$? Show that (i) $EX_n \to EX$, (ii) $E|X_n|^2 \to E|X|^2$, (iii) $(X_n, X) \to (X, X)$.

6.7. Let $\{Z_n\}, \{U_n\} \subset L_2$ be such that $Z_n \xrightarrow{q.m.} Z$, $U_n \xrightarrow{q.m.} U$. Show that $EZ_nU_n \to EZU$.

6.8. Let $\{X_n\} \subset L_2$ and $X_n \xrightarrow{q.m.} X$. Find a condition under which $X_n \xrightarrow{q.m.} X \Rightarrow X_n \to X$ (a.s.).

6.9. Let $\{\xi(t);\, t \in T\} \subset L_2$ be a random process and $t_0 \in T$. Show that $\xi(t) \xrightarrow{q.m.} Z \in L_2$ as $t \to t_0$ if and only if, for all $\{t_n\}_1^{\infty}$, $\{s_n\}_1^{\infty}$ such that $t_n \to t_0$, $s_n \to t_0$, we have $E\,\xi(t_n)\overline{\xi(s_n)} \to c_0$ (a constant).

6.10. If $\{\xi(t);\, t \in [a,b]\} \subset L_2$ is $L_2$-continuous, show that, in $L_2$,
$$\frac{d}{dt}\int_a^t \xi(s)\,ds = \xi(t),\qquad a \le t \le b.$$

6.11. Find the Karhunen-Loeve expansion on the interval $[0,1]$ of an $L_2$ process with covariance function $C(s,t) = st$.

6.12. Determine the eigenvalues and eigenfunctions of the homogeneous Fredholm integral equation
$$\varphi(x) - \lambda\int_0^{\pi} C(x,t)\varphi(t)\,dt = 0,\qquad\text{where}\quad C(x,t) = \begin{cases}\cos x\sin t, & 0 \le x \le t\\ \cos t\sin x, & t \le x \le \pi.\end{cases}$$

6.13. Let $C(s,t)$ be given by
$$C(s,t) = \begin{cases}s(t-1), & 0 \le s \le t\\ t(s-1), & t \le s \le 1.\end{cases}$$
Find the eigenvalues and complete orthonormal eigenfunctions.

6.14. Consider ($0 \le x, y \le 1$) $K(x,y) = \min(x,y) - xy$. Show that the kernel is non-negative definite and find its eigenvalues and eigenfunctions.

6.15. Determine the eigenvalues and eigenfunctions of the symmetric kernel
$$K(s,t) = \begin{cases}t(s+1), & 0 \le t \le s \le 1\\ s(t+1), & 0 \le s \le t \le 1.\end{cases}$$

CHAPTER 7

Spectral Analysis of Stationary Processes

7.1. Preliminaries

Let $\{\xi(t);\, t \in R\}$ be a wide sense stationary, complex-valued random process with $E\{\xi(t)\} = 0$ and
$$C(t) = E\{\overline{\xi(s)}\xi(s+t)\}. \qquad (7.1.1)$$
In this chapter, we will continue to build a theory of this particular class of stochastic processes based on the covariance function alone, using methods discussed in Chapter 5. As we have established in Chapter 6, the covariance $C(t)$ is a non-negative definite complex function of a real argument. In this section, we will see that the class of such functions coincides with the class of complex functions of a real argument $\psi(t)$ which can be written as
$$\psi(t) = K\psi_0(t),\qquad K > 0, \qquad (7.1.2)$$
where $\psi_0(t)$ is the characteristic function of a real-valued r.v. Consequently, every covariance function has a representation of the form
$$\psi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \qquad (7.1.3)$$
where $F(\cdot) \ge 0$ is a nondecreasing bounded function on $R$. Let us first prove the following result.

Proposition 7.1.1. The covariance function $C(t)$ is continuous on $R$ if it is continuous at $t = 0$.

PROOF. Because $C(t)$ is non-negative definite,
$$\sum_{i=1}^{n}\sum_{j=1}^{n} C(t_i - t_j)z_i\bar z_j \ge 0. \qquad (7.1.4)$$
Set $n = 3$, $z_1 = 1$, $z_2 = z$, $z_3 = -z$, $t_1 = 0$, $t_2 = u$, $t_3 = v$ in (7.1.4) to obtain
$$C(0) + 2\operatorname{Re}\{z[C(u) - C(v)]\} + 2|z|^2[C(0) - \operatorname{Re}\{C(u-v)\}] \ge 0.$$
Writing $C(u) - C(v) = |C(u) - C(v)|e^{i\theta}$ and $z = xe^{-i\theta}$, $x$ real, the last inequality becomes
$$C(0) + 2x|C(u) - C(v)| + 2x^2[C(0) - \operatorname{Re}\{C(u-v)\}] \ge 0.$$
Because this holds for all $x \in R$, the discriminant cannot be positive, so that
$$|C(u) - C(v)|^2 \le 2C(0)\big[C(0) - \operatorname{Re}\{C(u-v)\}\big]. \qquad (7.1.5)$$
Now, by assumption, $C(\cdot)$ is continuous at $0$. Thus, because the right-hand side of the last inequality tends to zero as $u \to v$, so does the left-hand side. This proves the assertion. $\square$

Remark 7.1.1. From (7.1.5) we see that $C(t)$ is uniformly continuous on $R$ if it is continuous at zero.

The following is the celebrated Bochner-Khinchin theorem.

Proposition 7.1.2. A complex-valued function $C(t)$ defined on $R$ and continuous at zero is the covariance function of a wide sense stationary stochastic process if and only if it can be written in the form
$$C(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \qquad (7.1.6)$$
where $F(\cdot)$ is a real nondecreasing bounded function on $R$.

The function $F(\cdot)$ is referred to as the spectral distribution of the process $\xi(t)$. It is uniquely defined up to an additive constant, and we can always suppose that
$$F(-\infty) = 0,\qquad F(+\infty) = C(0). \qquad (7.1.7)$$
In addition, we assume that $F(\cdot)$ is right-continuous. If $F(\cdot)$ is absolutely continuous, the derivative $f(\cdot) = F'(\cdot)$ exists and is called the spectral density of the process.

Remark 7.1.2. Let $\varphi(\cdot): R \to R_+ = [0,\infty)$ be a continuous symmetric function such that $\varphi(0) = 1$ and $\varphi(t) \to 0$ as $t \to \infty$. If, in addition, $\varphi(\cdot)$ is convex on $R_+$, then $\varphi(\cdot)$ is a characteristic function.
This result is due to Pólya. It is useful in establishing whether a certain function is non-negative definite.

EXAMPLE 7.1.1. Let us see whether the following functions are non-negative definite:
(i) $B_0(t) = 1 - |t|$ if $|t| \le 1$, and $0$ if $|t| > 1$;
(ii) $B_1(t) = e^{-|t|}$;
(iii) $B_2(t) = e^{|t|}$.
Clearly, $B_0(t)$ satisfies all the conditions of Pólya's theorem and, therefore, represents a characteristic function. The same conclusion holds for $B_1(t)$. However, $B_2(t)$ is not non-negative definite.

EXAMPLE 7.1.2. Let $\{Z_k\}_1^n \subset L_2\{\Omega,\mathscr{B},P\}$ be an orthonormal family of complex-valued r.v.'s with $E\{Z_k\} = 0$, $k = 1,\dots,n$; then
$$\xi(t) = \sum_{k=1}^{n} Z_k\exp(i\lambda_k t),\qquad \lambda_k \text{ real},$$
is a wide sense stationary stochastic process. Its covariance function is
$$E\{\overline{\xi(s)}\xi(s+t)\} = (\xi(s+t),\xi(s)) = \sum_{k=1}^{n}\exp(i\lambda_k t).$$

The next proposition gives the spectral representation of a covariance function in the discrete case.

Proposition 7.1.3 (Herglotz's Theorem). Let $C(n)$ be the covariance function of a wide sense stationary sequence $\{\xi_n\}_{-\infty}^{\infty}$ with $E\{\xi_n\} = 0$; then
$$C(n) = \int_{-\pi}^{\pi} e^{i\lambda n}\,dF(\lambda), \qquad (7.1.8)$$
where $F(\cdot)$ is a bounded nondecreasing function with support $[-\pi,\pi]$.

The last two propositions will be proved in the next section.

Remark 7.1.3. If the covariance function $C(t)$ is absolutely integrable on $(-\infty,\infty)$, i.e., if
$$\int_{-\infty}^{\infty}|C(t)|\,dt < \infty, \qquad (7.1.9)$$
then, as is known, $F(\cdot)$ is absolutely continuous and
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixt}C(t)\,dt. \qquad (7.1.10)$$
If (7.1.9) does not hold, we have, for any $x_1 < x_2$,
$$F(x_2) - F(x_1) = \lim_{T\to\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{\exp(-itx_1) - \exp(-itx_2)}{it}\,C(t)\,dt. \qquad (7.1.11)$$

Remark 7.1.4. If $F(\cdot)$ is a symmetric function, i.e., if
$$F(x) = C(0) - F(-x+0), \qquad (7.1.12)$$
the covariance function is real and
$$C(t) = \int_{-\infty}^{\infty}\cos(tx)\,dF(x). \qquad (7.1.13)$$
In addition, if (7.1.9) holds,
$$f(x) = \frac{1}{\pi}\int_0^{\infty} C(t)\cos(tx)\,dt. \qquad (7.1.14)$$
A covariance function is real if and only if (7.1.12) holds.

EXAMPLE 7.1.3. For the random telegraph process (see Example 6.5.1), we have established that $C(t) = e^{-2\lambda|t|}$, $-\infty < t < \infty$. This is clearly an absolutely integrable function, so we can use the formula (7.1.14):
$$f(x) = \frac{1}{\pi}\int_0^{\infty} e^{-2\lambda t}\cos(xt)\,dt.$$
Integrating by parts twice, we obtain
$$\int_0^{\infty} e^{-2\lambda t}\cos(xt)\,dt = \frac{1}{2\lambda}\left(1 - \frac{x^2}{2\lambda}\int_0^{\infty} e^{-2\lambda t}\cos(xt)\,dt\right).$$
Therefore,
$$f(x) = \frac{2\lambda}{\pi(4\lambda^2 + x^2)}.$$
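The cosine-transform formula (7.1.14) can be checked numerically. The sketch below (an addition, not from the original text) truncates the integral at $t = 40$ and compares it with the closed form just derived; the rate $\lambda = 1.5$ and the grid are arbitrary choices.

```python
import numpy as np

lam = 1.5                          # assumed transition rate
dt = 1e-4
t = np.arange(0.0, 40.0, dt)
C = np.exp(-2.0 * lam * t)         # covariance C(t) = e^{-2*lam*t} for t >= 0

def f_numeric(x):
    # (7.1.14): f(x) = (1/pi) * int_0^inf C(t) cos(xt) dt, truncated at t = 40
    return np.sum(C * np.cos(x * t)) * dt / np.pi

for x in (0.0, 1.0, 3.0):
    exact = 2.0 * lam / (np.pi * (4.0 * lam**2 + x**2))
    print(f"x={x:3.1f}  numeric={f_numeric(x):.6f}  exact={exact:.6f}")
```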
7.2. Proof of the Bochner-Khinchin and Herglotz Theorems

We shall first prove the Bochner-Khinchin theorem. For this purpose we need the following lemma.

Lemma 7.2.1. Let $\varphi(\cdot)$ be bounded and integrable on $[-T, T]$ and such that, for all $x \in R$,
$$f_T(x) = \int_{-T}^{T} e^{-itx}\varphi(t)\,dt \ge 0. \qquad (7.2.1)$$
Then $f_T(x)$ is integrable on $R = (-\infty,\infty)$.

PROOF. From (7.2.1), it is clear that $f_T(\cdot)$ is continuous and, therefore, integrable on every finite interval. Next, set
$$G_T(x) = \int_{-x}^{x} f_T(u)\,du. \qquad (7.2.2)$$
Because $f_T(\cdot) \ge 0$, $G_T(x_1) \le G_T(x_2)$ for every $x_1 \le x_2$. We must show that $G_T(\cdot)$ is bounded. To this end, set ($s > 0$)
$$\Phi_T(s) = \frac{1}{s}\int_s^{2s} G_T(x)\,dx \ge G_T(s).$$
Consequently, if $\Phi_T(\cdot)$ is bounded, so is $G_T(\cdot)$. From (7.2.1) and (7.2.2), we have
$$G_T(x) = \int_{-x}^{x} du\int_{-T}^{T} e^{-itu}\varphi(t)\,dt = 2\int_{-T}^{T}\frac{\sin tx}{t}\,\varphi(t)\,dt.$$
Therefore,
$$\Phi_T(s) = \frac{2}{s}\int_s^{2s} dx\int_{-T}^{T}\frac{\sin tx}{t}\,\varphi(t)\,dt = 2\int_{-T}^{T}\frac{\cos st - \cos 2st}{st^2}\,\varphi(t)\,dt = 2\int_{-T}^{T}\frac{2\sin^2 st - 2\sin^2(st/2)}{st^2}\,\varphi(t)\,dt.$$
Set $M = \sup_t|\varphi(t)|$; then, clearly,
$$|\Phi_T(s)| \le 6M\int_0^{\infty}\left(\frac{\sin x}{x}\right)^2 dx < \infty,$$
which proves the lemma. $\square$

We now prove the Bochner-Khinchin theorem. First, let us show that the function $C(t)$ defined by (7.1.6) is non-negative definite. Indeed,
$$\sum_{j=1}^{n}\sum_{k=1}^{n} C(t_j - t_k)z_j\bar z_k = \sum_{j=1}^{n}\sum_{k=1}^{n}\left(\int_{-\infty}^{\infty}\exp[i(t_j - t_k)x]\,dF(x)\right)z_j\bar z_k = \int_{-\infty}^{\infty}\left|\sum_{j=1}^{n}\exp(it_jx)z_j\right|^2 dF(x) \ge 0.$$
Next, let us show that any non-negative definite function which is continuous at zero has a representation of the form (7.1.6). To this end, consider
$$f_T(x) = \frac{1}{2\pi T}\int_0^T\!\!\int_0^T C(u-v)\exp[-i(u-v)x]\,du\,dv. \qquad (7.2.3)$$
Clearly, $f_T(x) \ge 0$ because $C(u-v)e^{-iux}\overline{e^{-ivx}}$ is a non-negative definite kernel. Let us now make the change of variables $y = u - v$, $t = v$, whose Jacobian equals $1$; the domains of integration are shown in Figure 7.1. [Figure 7.1: the square of integration in the $(u,v)$-plane and its image in the $(t,y)$-plane.] From this we obtain
$$f_T(x) = \frac{1}{2\pi T}\left(\int_{-T}^{0} C(y)e^{-iyx}(T+y)\,dy + \int_0^{T} C(y)e^{-iyx}(T-y)\,dy\right) = \frac{1}{2\pi}\int_{-T}^{T} C(y)\left(1 - \frac{|y|}{T}\right)e^{-iyx}\,dy. \qquad (7.2.4)$$
According to Lemma 7.2.1, $f_T(x)$ is integrable over $(-\infty,\infty)$. Therefore, the inverse Fourier transform of $f_T(x)$ exists and is equal to $C(y)(1 - |y|/T)$ for all $|y| \le T$. From (7.2.4), we have
$$C(0) = \int_{-\infty}^{\infty} f_T(x)\,dx$$
for all $T > 0$. Because
$$C(y)\left(1 - \frac{|y|}{T}\right) \to C(y) \quad\text{as } T \to \infty$$
uniformly on every finite interval, and
$$C(y)\left(1 - \frac{|y|}{T}\right) = \int_{-\infty}^{\infty} f_T(x)e^{ixy}\,dx,$$
we obtain in the limit
$$C(y) = \int_{-\infty}^{\infty} f(x)e^{ixy}\,dx,$$
which proves the proposition when $F(x)$ in (7.1.6) is absolutely continuous. In a similar fashion, one can show that it holds in the general case.

PROOF OF THE HERGLOTZ THEOREM. Clearly, $C(n)$ given by Equation (7.1.8) is non-negative definite. For each $n = 1, 2, \dots$ and $x \in [-\pi,\pi]$, define
$$f_n(x) = \frac{1}{2\pi n}\sum_{k=1}^{n}\sum_{v=1}^{n} C(k-v)e^{-i(k-v)x};$$
then $f_n(\cdot) \ge 0$. Because there are $n - |m|$ couples $(k,v)$ for which $k - v = m$, we have
$$f_n(x) = \frac{1}{2\pi}\sum_{r=-n+1}^{n-1}\left(1 - \frac{|r|}{n}\right)C(r)e^{-ixr}. \qquad (7.2.5)$$
Set
$$F_n(x) = \int_{-\pi}^{x} f_n(u)\,du.$$
From (7.2.5) and the fact that $\int_{-\pi}^{\pi} e^{ixk}\,dx = 0$ if $k \ne 0$, it follows that
$$\int_{-\pi}^{\pi} f_n(u)\,du = C(0).$$
Therefore, $F_n(\cdot)$ is a nondecreasing continuous function. Also from (7.2.5),
$$\int_{-\pi}^{\pi} e^{i\lambda r}\,dF_n(\lambda) = \left(1 - \frac{|r|}{n}\right)C(r),\qquad |r| < n. \qquad (7.2.6)$$
From the Helly compactness theorem, there is a subsequence of $\{F_n\}$ which converges weakly to a bounded distribution $F$ and, hence, converges completely, because $F_n(-\pi) = 0$ and $F_n(\pi) = C(0)$ for all $n$. Equation (7.1.8) then holds as a consequence of the second Helly theorem and (7.2.6). $\square$

7.3. Random Measures

In this section, we define the concept of an orthogonal random measure and of a stochastic integral with respect to it. We begin with some notation and definitions needed for this purpose. Let $\{\Omega,\mathscr{A},P\}$ be a probability space and $\{S,\mathscr{S}\}$ an arbitrary measurable space. Let $\mathscr{S}_0$ be an algebra which generates the $\sigma$-algebra $\mathscr{S}$.

Definition 7.3.1. A mapping
$$\eta: \mathscr{S}_0 \to L_2\{\Omega,\mathscr{A},P\} \qquad (7.3.1)$$
satisfying the conditions
(i) $\eta(\emptyset) = 0$, where $\emptyset$ is the empty set,
(ii) $\eta(A\cup B) = \eta(A) + \eta(B)$ (a.s.) for any disjoint $A, B \in \mathscr{S}_0$, $\qquad (7.3.2)$
is called an elementary random (or stochastic) measure.

From the definition it is clear that
$$m(A) = \|\eta(A)\|^2 < \infty \quad\text{for any } A \in \mathscr{S}_0. \qquad (7.3.3)$$

Definition 7.3.2. An elementary random measure is said to be orthogonal if
$$(\eta(A),\eta(B)) = 0 \quad\text{for all disjoint } A, B \in \mathscr{S}_0. \qquad (7.3.4)$$

For an elementary orthogonal random measure, the set function $m(\cdot)$ on $\mathscr{S}_0$ defined by (7.3.3) is finitely additive. Indeed, if $A_1, A_2 \in \mathscr{S}_0$ are such that $A_1 \cap A_2 = \emptyset$, then, due to (7.3.2.ii),
$$m(A_1\cup A_2) = \|\eta(A_1\cup A_2)\|^2 = \|\eta(A_1) + \eta(A_2)\|^2,$$
and, due to (7.3.4),
$$\|\eta(A_1) + \eta(A_2)\|^2 = \|\eta(A_1)\|^2 + \|\eta(A_2)\|^2 = m(A_1) + m(A_2).$$
Assume now that $m(\cdot)$ is subadditive; then it can be extended to a measure on $\{S,\mathscr{S}\}$. This extension, which is unique, will again be denoted by $m$. Then $m(\cdot)$ is called the measure associated with $\eta(\cdot)$. In the following, we will denote by
$$L_2(m) = L_2\{S,\mathscr{S},m\} \qquad (7.3.5)$$
the Hilbert space of complex-valued functions on $S$ which are square integrable with respect to $m$. Let $\{B_k\}_1^n \subset \mathscr{S}_0$ be disjoint and consider
$$h(s) = \sum_{k=1}^{n} c_k I_{B_k}(s), \qquad (7.3.6)$$
where $c_1,\dots,c_n$ are complex numbers. Define
$$\psi(h) = \int_S h(s)\,\eta(ds) = \sum_{k=1}^{n} c_k\eta(B_k). \qquad (7.3.7)$$
This integral is well defined and, clearly, $\psi(h) \in L_2\{\Omega,\mathscr{A},P\}$. If
$$f(s) = \sum_{i=1}^{n}\alpha_i I_{B_i}(s),$$
where $\alpha_1,\dots,\alpha_n$ are complex numbers, then
$$(\psi(h),\psi(f)) = \sum_{j=1}^{n} c_j\bar\alpha_j m(B_j) = \int_S h(s)\overline{f(s)}\,m(ds) = (h,f). \qquad (7.3.8)$$
The last inner product is defined in $L_2(m)$ [see (7.3.5)]. Thus, the mapping $\psi$ preserves the inner product. Denote by $L_2^s \subset L_2(m)$ the set of all functions on $S$ of type (7.3.6). The integral (7.3.7) represents a linear mapping of $L_2^s$ into $L_2\{\Omega,\mathscr{A},P\}$. Because the inner product is preserved, the mapping is clearly isometric. Now, let $g \in L_2(m)$ be arbitrary and let $\{h_n\}_1^{\infty}$ be a sequence of functions of type (7.3.6) such that
$$\|g - h_n\| \to 0 \quad\text{as } n \to \infty. \qquad (7.3.9)$$
Then, due to (7.3.8),
$$\|\psi(h_n) - \psi(h_m)\| = \|h_n - h_m\| \to 0 \quad\text{as } m, n \to \infty, \qquad (7.3.10)$$
which shows that $\{\psi(h_n)\}_1^{\infty}$ is a Cauchy sequence. By the Riesz-Fischer theorem (Proposition 5.2.4), there exists a r.v., say $\psi(g) \in L_2\{\Omega,\mathscr{A},P\}$, such that $\|\psi(g) - \psi(h_n)\| \to 0$ as $n \to \infty$. This r.v. is called the stochastic integral of $g \in L_2(m)$ with respect to the elementary orthogonal random measure $\eta$. We denote it by
$$\psi(g) = \int_S g(s)\,\eta(ds). \qquad (7.3.11)$$
Thus, the mapping
$$\psi: L_2(m) \to L_2\{\Omega,\mathscr{A},P\} \qquad (7.3.12)$$
represents an isometric isomorphism. The following properties of $\psi$ are straightforward consequences of its construction: for any $f, g \in L_2(m)$,
(i) $(\psi(f),\psi(g)) = (f,g)$;
(ii) $\|\psi(f)\|^2 = \|f\|^2 = \int_S|f(s)|^2\,m(ds)$;
(iii) $\psi(c_1f + c_2g) = c_1\psi(f) + c_2\psi(g)$ (a.s.), and $\|\psi(f_n) - \psi(f)\| \to 0$ if $\|f - f_n\| \to 0$ as $n \to \infty$. $\qquad (7.3.13)$

Can an elementary orthogonal stochastic measure $\eta$, which is defined on $\mathscr{S}_0$, be extended to $\mathscr{S}$? The answer is affirmative, and we will use the definition of the integral (7.3.11) to prove it. For $A \in \mathscr{S}$, define
$$\tilde\eta(A) = \int_S I_A(s)\,\eta(ds). \qquad (7.3.14)$$
It follows from (7.3.13.iii) that $\tilde\eta(A_1\cup A_2) = \tilde\eta(A_1) + \tilde\eta(A_2)$ (a.s.) for $A_1, A_2 \in \mathscr{S}$ such that $A_1\cap A_2 = \emptyset$. From (7.3.13.i), we also have
$$\|\tilde\eta(A)\|^2 = m(A),\qquad A \in \mathscr{S}, \qquad (7.3.15)$$
and $(\tilde\eta(A_1),\tilde\eta(A_2)) = 0$ if $A_1, A_2 \in \mathscr{S}$ are disjoint. Let us show that $\tilde\eta$ is countably additive in the q.m. sense. Let $\{B_k\}_1^{\infty} \subset \mathscr{S}$ be disjoint and $B = \bigcup_k B_k$. Then
$$\tilde\eta(B) - \sum_{k=1}^{n}\tilde\eta(B_k) = \psi(I_B) - \sum_{k=1}^{n}\psi(I_{B_k}) = \psi\!\left(\sum_{k=n+1}^{\infty} I_{B_k}\right).$$
However,
$$\left\|\sum_{k=n+1}^{\infty} I_{B_k}\right\|^2 = m\!\left(\bigcup_{k>n} B_k\right) \to 0 \quad\text{as } n \to \infty.$$
Consequently,
$$\left\|\tilde\eta(B) - \sum_{k=1}^{n}\tilde\eta(B_k)\right\| \to 0 \quad\text{as } n \to \infty.$$
The mapping $\tilde\eta: \mathscr{S} \to L_2\{\Omega,\mathscr{B},P\}$ is countably additive in the q.m. sense and coincides with $\eta$ on $\mathscr{S}_0$. We will call $\tilde\eta$ an orthogonal stochastic measure and write from now on $\eta$ instead of $\tilde\eta$. We also call $\psi(g) = \int_S g(s)\eta(ds)$, defined above, a stochastic integral with respect to $\eta$.
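The isometry (7.3.13.ii) is easy to see in a simulation. The following short sketch (an addition under stated assumptions, not from the original text) takes Brownian increments, which form an orthogonal random measure with $m(ds) = ds$, integrates a step function of type (7.3.6) against them, and compares $E|\psi(h)|^2$ with $\int|h|^2\,dm$; the step function and discretization are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, T, n_paths = 0.002, 1.0, 10_000
n = int(T / dt)
t = np.arange(n) * dt

# Brownian increments give an orthogonal random measure with m(ds) = ds.
h = np.where(t < 0.3, 2.0, np.where(t < 0.7, -1.0, 0.5))   # a simple step function
dZ = rng.standard_normal((n_paths, n)) * np.sqrt(dt)
psi = dZ @ h                                               # psi(h) = sum_s h(s) dZ(s)

print("E|psi(h)|^2  =", np.mean(psi**2))
print("int |h|^2 dm =", np.sum(h**2) * dt)                 # isometry (7.3.13.ii)
```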
7.4. Processes with Orthogonal Increments

In the rest of this chapter, $\{S,\mathscr{S},m\} = \{R,\mathscr{R},m\}$, where $R$ is the real line and $\mathscr{R}$ is the $\sigma$-algebra of Borel subsets of $R$. The purpose of this section is to show that orthogonal random measures are generated by processes with orthogonal increments, and vice versa. Let $\{Z(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$, with $E\{Z(t)\} = 0$ for all $t \in R$.

Definition 7.4.1. The stochastic process $\{Z(t);\, t \in R\}$ is said to have orthogonal increments if, for any $t_0 < t_1 < t_2$,
$$E(Z(t_1) - Z(t_0))\overline{(Z(t_2) - Z(t_1))} = 0. \qquad (7.4.1)$$

In the following, we shall say that the process is right-continuous in the q.m. sense if, for every fixed $t \in R$,
$$\|Z(t) - Z(t_k)\| \to 0 \quad\text{as } t_k \downarrow t. \qquad (7.4.2)$$
Let us now show that an orthogonal stochastic measure $\eta$ generates a process having orthogonal increments. Let $m$ be the finite measure associated with $\eta$. Set
$$Z(t) = \eta((-\infty,t]). \qquad (7.4.3)$$
Then
$$\|Z(t)\|^2 = m((-\infty,t]) = F(t) \qquad (7.4.4)$$
and, for all $s < t$,
$$\|Z(t) - Z(s)\|^2 = \|\eta((s,t])\|^2 = m((s,t]). \qquad (7.4.5)$$
From this, we clearly have that $Z(t)$ is right-continuous in the q.m. sense. In addition, the function $F(\cdot)$ defined by (7.4.4) is also right-continuous. Finally, because
$$E(Z(t_1) - Z(t_0))\overline{(Z(t_2) - Z(t_1))} = (\eta((t_0,t_1]),\eta((t_1,t_2])) = 0,$$
the process $Z(t)$ defined by (7.4.3) has orthogonal increments.

Conversely, let $\{Z(t);\, t \in R\}$ be as in Definition 7.4.1 and, for any $s < t$, define
$$F(t) - F(s) = \|Z(t) - Z(s)\|^2. \qquad (7.4.6)$$
Because the process has orthogonal increments, for any $s < \tau < t$,
$$\|Z(t) - Z(s)\|^2 = \|Z(t) - Z(\tau)\|^2 + \|Z(\tau) - Z(s)\|^2 \ge \|Z(\tau) - Z(s)\|^2.$$
Therefore,
$$F(t) \ge F(\tau), \qquad (7.4.7)$$
which implies that $F(\cdot)$ is a nondecreasing function on $R$. Finally, from (7.4.6), we see that $Z(t)$ is q.m. continuous at $t \in R$ if and only if $F$ is continuous at $t$. In the following, we assume that
$$F(-\infty) = 0 \quad\text{and}\quad F(+\infty) < \infty. \qquad (7.4.8)$$
Let $m(\cdot)$ be the Lebesgue-Stieltjes measure that $F$ generates. We see that, for any interval $(s,t]$,
$$m((s,t]) = F(t) - F(s) = \|Z(t) - Z(s)\|^2.$$
Therefore, if $\mathscr{L}$ is the algebra of sets $A = \bigcup_{k=1}^{n}(a_k,b_k]$, where the $(a_k,b_k]$ are disjoint, and
$$\eta(A) = \sum_{k=1}^{n}[Z(b_k) - Z(a_k)], \qquad (7.4.9)$$
it is evident that $\|\eta(A)\|^2 = m(A)$, where
$$m(A) = \sum_{k=1}^{n}[F(b_k) - F(a_k)], \qquad (7.4.10)$$
and $(\eta((a_i,b_i]),\eta((a_j,b_j])) = 0$ for all $i \ne j$. This clearly implies that $\eta(A)$ defined by (7.4.9) represents an elementary orthogonal stochastic measure. The set function $m(\cdot)$ defined by (7.4.10) has a unique extension to a measure on $\mathscr{R}$. From this and the preceding section, we see that $\eta(\cdot)$ can also be extended to $\mathscr{R}$ and that $\|\eta(A)\|^2 = m(A)$, $A \in \mathscr{R}$.

Consequently, there exists an isomorphism between processes $\{Z(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ with orthogonal increments, continuous from the right in the q.m. sense and such that $\|Z(t)\|^2 = F(t)$, $F(-\infty) = 0$, $F(+\infty) < \infty$, and orthogonal stochastic measures $\eta$ with $m(\cdot)$ the measure associated with them. The correspondence is given by
$$Z(t) = \eta((-\infty,t]),\qquad F(t) = m((-\infty,t]),$$
and
$$\eta((s,t]) = Z(t) - Z(s),\qquad m((s,t]) = F(t) - F(s).$$
Finally, the stochastic integral
$$\int_{-\infty}^{\infty} h(t)\,dZ(t) \qquad (7.4.11)$$
will mean the stochastic integral $\int_{-\infty}^{\infty} h(t)\,\eta(dt)$.

7.5. Spectral Representation

This section is concerned with the spectral representation of a wide sense stationary stochastic process. Mathematically speaking, the spectral representation establishes an isometric isomorphism between the closed linear manifold spanned by the process and a certain Hilbert space $L_2\{R,\mathscr{R},F\}$ of complex functions. Such a representation is often useful in a large class of scientific and engineering problems, for example, guidance and control, signal detection, image enhancement, and so on. Let $\{\xi(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ be a wide sense stationary stochastic process with
$$E\{\xi(t)\} = 0,\qquad C(t) = E(\overline{\xi(s)}\xi(s+t)). \qquad (7.5.1)$$
If the covariance function $C(t)$ is continuous at zero, then by Proposition 7.1.2 there exists a unique nondecreasing bounded function $F(\cdot)$ on $R$ such that
$$C(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x). \qquad (7.5.2)$$
The function $F(\cdot)$, which we assume to be continuous from the right, is called the "spectral distribution" of the process $\xi(t)$. The set of the discontinuity points of $F(\cdot)$, which is always finite or countably infinite, is called the "spectrum" of the process $\xi(t)$.

Proposition 7.5.1. Assume that the covariance function $C(t)$ is continuous at zero. Then there exists a unique orthogonal stochastic measure $\eta$ with values in $L_2\{\Omega,\mathscr{B},P\}$ such that
$$\xi(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\,\eta(d\lambda) \quad\text{(a.s.)}. \qquad (7.5.3)$$
Furthermore,
$$\|\eta(A)\|^2 = m(A) = \int_A dF \quad\text{for all } A \in \mathscr{R}, \qquad (7.5.4)$$
where $m(\cdot)$ is the Lebesgue-Stieltjes measure associated with $\eta$ and generated by the spectral distribution $F$.

PROOF. There are various ways to prove this proposition. The method adopted here is a version of Stone's spectral representation theorem for a family of unitary operators in a Hilbert space. Let $L_2(m) = L_2\{R,\mathscr{R},m\}$ be the Hilbert space of complex square integrable functions on $R$ with the inner product
$$(f,h) = \int_{-\infty}^{\infty} f(\lambda)\overline{h(\lambda)}\,m(d\lambda). \qquad (7.5.5)$$
Consider the system of complex functions $\{e^{i\lambda t};\, t \in R\} \subset L_2\{R,\mathscr{R},m\}$ and denote by $\langle\{e^{i\lambda t};\, t \in R\}\rangle$ the linear manifold it spans. This linear manifold is dense in $L_2(m)$ (see Proposition 5.3.2). Therefore,
$$\overline{\langle\{e^{i\lambda t};\, t \in R\}\rangle} = L_2(m). \qquad (7.5.6)$$
As usual, we denote by $\langle\xi(R)\rangle$ the closed linear manifold spanned by $\{\xi(t);\, t \in R\}$. We now establish a one-to-one correspondence between the linear manifolds $\langle\{e^{i\lambda t};\, t \in R\}\rangle$ and $\langle\xi(R)\rangle$ by setting
$$e^{i\lambda t} \leftrightarrow \xi(t),\qquad t \in R. \qquad (7.5.7)$$
Clearly then, for any $\{t_1,\dots,t_n\} \subset R$, we have
$$\sum_{j=1}^{n} c_je^{i\lambda t_j} \leftrightarrow \sum_{j=1}^{n} c_j\xi(t_j). \qquad (7.5.8)$$
Note that (7.5.7) is a consistent definition in the sense that
$$\sum_{j=1}^{n} c_je^{i\lambda t_j} = 0 \ \text{(a.e.) } m \iff \sum_{j=1}^{n} c_j\xi(t_j) = 0 \ \text{(a.s.) } P.$$
The correspondence (7.5.7) is an isometry. As a matter of fact, keeping in mind (7.5.2) and (7.5.7), we have
$$(e^{i\lambda t_1},e^{i\lambda t_2}) = \int_{-\infty}^{\infty} e^{i\lambda(t_1-t_2)}\,dF(\lambda) = C(t_1 - t_2) = (\xi(t_1),\xi(t_2)). \qquad (7.5.9)$$
Similarly,
$$\left(\sum_j\alpha_je^{i\lambda t_j},\ \sum_k\beta_ke^{i\lambda t_k}\right) = \left(\sum_j\alpha_j\xi(t_j),\ \sum_k\beta_k\xi(t_k)\right).$$
Now, consider $\zeta \in \langle\xi(R)\rangle$. There exist finite linear combinations $\{\zeta_n\}_1^{\infty}$ such that
$$\|\zeta - \zeta_n\| \to 0 \quad\text{as } n \to \infty. \qquad (7.5.10)$$
Consequently, $\{\zeta_n\}_1^{\infty}$ is a Cauchy sequence and, therefore, so is $\{f_n\}_1^{\infty} \subset L_2\{R,\mathscr{R},m\}$, where $f_n \leftrightarrow \zeta_n$. Because $L_2\{R,\mathscr{R},m\}$ is complete, there exists $f \in L_2(m)$ such that $\|f_n - f\| \to 0$ as $n \to \infty$. The converse also holds, namely, if $f \in L_2(m)$ and $\|f - f_n\| \to 0$, where $\{f_n\} \subset \langle\{e^{i\lambda t};\, t \in R\}\rangle$, there are $\zeta \in \langle\xi(R)\rangle$ and $\{\zeta_n\} \subset \langle\xi(R)\rangle$ such that $\|\zeta - \zeta_n\| \to 0$ as $n \to \infty$ and $\zeta_n \leftrightarrow f_n$. This clearly implies that the correspondence can be extended to the elements of the closures of the two manifolds.

Next, consider $f(\lambda) = I_B(\lambda)$, $B \in \mathscr{R}$, and let $\eta(B) \in \langle\xi(R)\rangle$ be such that $I_B(\lambda) \leftrightarrow \eta(B)$. Due to the isometry,
$$\|\eta(B)\|^2 = m(B).$$
In addition, if $B_1 \cap B_2 = \emptyset$, where $B_1, B_2 \in \mathscr{R}$, then $(\eta(B_1),\eta(B_2)) = 0$, and
$$\left\|\eta(B) - \sum_{k=1}^{n}\eta(B_k)\right\|^2 \to 0 \quad\text{as } n \to \infty,$$
where $\{B_k\}_1^{\infty} \subset \mathscr{R}$ are disjoint and $B = \bigcup_k B_k$. Therefore, the family $\eta(B)$, $B \in \mathscr{R}$, forms an orthogonal stochastic measure with respect to which we can define the stochastic integral
$$\psi(f) = \int_{-\infty}^{\infty} f(\lambda)\,\eta(d\lambda),\qquad f \in L_2(m). \qquad (7.5.11)$$
Let $f \in L_2(m)$ with $f \leftrightarrow \zeta$, and set $\zeta = \tilde\psi(f)$. Let us show that
$$\psi(f) = \tilde\psi(f) \quad\text{(a.s.) } P. \qquad (7.5.12)$$
If
$$f(\lambda) = \sum_{k=1}^{n} c_kI_{B_k}(\lambda) \qquad (7.5.13)$$
with $B_k = (a_k,b_k]$ disjoint, then from (7.5.11),
$$\psi(f) = \sum_{k=1}^{n} c_k\eta(B_k),$$
which is evidently equal to $\tilde\psi(f)$.
Hence, (7.5.12) holds for functions of the form (7.5.13). However, if $f \in L_2(m)$ and $\|f - f_n\| \to 0$ as $n \to \infty$, where the $f_n(\lambda)$ are of the form (7.5.13), then
$$\|\psi(f) - \psi(f_n)\| \to 0 \quad\text{[by (7.3.13.iii)]},\qquad \|\tilde\psi(f) - \tilde\psi(f_n)\| \to 0 \quad\text{as } n \to \infty,$$
which proves (7.5.12). Finally, consider $f(\lambda) = e^{i\lambda t}$; then, by (7.5.7), $\tilde\psi(e^{i\lambda t}) = \xi(t)$. On the other hand, due to (7.5.11), this and (7.5.12) yield
$$\xi(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\,\eta(d\lambda) \quad\text{(a.s.)}\ \text{ for all } t \in R.$$
This proves the proposition. $\square$

Remark 7.5.1. Let $\{Z(t);\, t \in R\}$ be a stochastic process with orthogonal increments corresponding to an orthogonal stochastic measure $\eta(\cdot)$. Then, from (7.4.11), we deduce that
$$\xi(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\,dZ(\lambda). \qquad (7.5.14)$$

Remark 7.5.2. If $E\{\xi(t)\} = \alpha$, then we have
$$\xi(t) = \alpha + \int_{-\infty}^{\infty} e^{i\lambda t}\,\eta(d\lambda).$$

Remark 7.5.3. From the proof of the proposition it is clear that there may be many different representations of a wide sense stationary process. The one that we give here is particularly useful, because it allows the application of classical harmonic analysis in the study of an important class of random processes.

Let $C(n)$ be the covariance function of a wide sense stationary sequence $\{\xi_n\}_{-\infty}^{\infty}$ with $E\{\xi_n\} = 0$. According to Herglotz's theorem (Proposition 7.1.3), the covariance function has the spectral representation
$$C(n) = \int_{-\pi}^{\pi} e^{in\lambda}\,dF(\lambda),$$
where $F(\cdot)$ is a real nondecreasing bounded function.

Proposition 7.5.2. There is an orthogonal random measure $\eta$ on $\mathscr{R}\cap(-\pi,\pi]$ such that, for every $n$,
$$\xi_n = \int_{-\pi}^{\pi} e^{in\lambda}\,\eta(d\lambda) \qquad (7.5.15)$$
and $\|\eta(d\lambda)\|^2 = dF(\lambda)$.

The proof of this proposition is essentially the same as the proof of the previous proposition.

Remark 7.5.4. The spectral representation of a wide sense stationary process implies that, for each $t$, $\xi(t)$ can be approximated in the q.m. sense by a finite sum of the form
$$\sum_{k=1}^{n} e^{i\lambda_k t}\,\Delta Z(\lambda_k),$$
where the $\Delta Z(\lambda_k)$ are orthogonal r.v.'s from $\langle\xi(R)\rangle$.
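Remark 7.5.4 is easy to act out numerically: synthesize a process as a finite sum of complex exponentials with orthogonal random amplitudes and check that its covariance equals $\sum_k e^{i\lambda_k t}\,\Delta F_k$. The sketch below is an illustration of this construction only (not from the original text); the atoms $\lambda_k$ and the masses $\Delta F_k$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
lams = np.array([-2.0, -0.5, 0.5, 2.0])   # assumed spectral atoms lambda_k
dF = np.array([0.2, 0.3, 0.3, 0.2])       # assumed masses, sum = C(0) = 1

n_paths = 50_000
# Orthogonal amplitudes Delta Z_k: independent, mean 0, E|Delta Z_k|^2 = dF_k.
dZ = rng.standard_normal((n_paths, 4)) + 1j * rng.standard_normal((n_paths, 4))
dZ *= np.sqrt(dF / 2.0)

def xi(t):
    # finite sum of Remark 7.5.4
    return (np.exp(1j * lams * t) * dZ).sum(axis=1)

for t in (0.0, 1.0, 2.5):
    emp = np.mean(xi(5.0 + t) * np.conj(xi(5.0)))   # E{ xi(s+t) conj(xi(s)) }
    exact = np.sum(np.exp(1j * lams * t) * dF)
    print(f"t={t:3.1f}  empirical={complex(emp):.3f}  exact={complex(exact):.3f}")
```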
7.6. Ramifications of Spectral Representation

As we have established (see Remark 7.1.4), the covariance function $C(t)$ of a second-order stationary process is real if and only if its spectral distribution $F(\cdot)$ is symmetric. Something similar holds for the processes themselves. Let $\{\xi(t);\, t \in R\}$ be a real wide sense stationary process and consider its spectral representation. Because $\overline{\xi(t)} = \xi(t)$, we have
$$\xi(t) = \int_{-\infty}^{\infty} e^{-i\lambda t}\,d\bar Z(\lambda) = \int_{-\infty}^{\infty} e^{i\lambda t}\,d\bar Z(-\lambda),$$
which implies that
$$dZ(\lambda) = d\bar Z(-\lambda). \qquad (7.6.1)$$
From this [see also (7.4.11)], one can deduce that, for any $A \in \mathscr{R}$,
$$\eta(A) = \overline{\eta(-A)}, \qquad (7.6.2)$$
where $\eta(\cdot)$ is the orthogonal random measure induced by $\{Z(t);\, t \in R\}$ and $-A = \{x;\, -x \in A\}$. Set
$$\eta_1(A) = \tfrac12\big[\eta(A) + \overline{\eta(A)}\big],\qquad \eta_2(A) = \tfrac{i}{2}\big[\eta(A) - \overline{\eta(A)}\big]. \qquad (7.6.3)$$
It is readily seen that $\eta_1(\cdot)$ and $\eta_2(\cdot)$ are real measures. According to (7.6.2), we have
$$\eta_1(-A) = \eta_1(A),\qquad \eta_2(-A) = -\eta_2(A). \qquad (7.6.4)$$
Again consider the real wide sense stationary stochastic process $\xi(t)$ and write its spectral representation, using (7.6.3) (so that $\eta = \eta_1 - i\eta_2$), as
$$\xi(t) = \int_{-\infty}^{\infty}(\cos\lambda t + i\sin\lambda t)\big(\eta_1(d\lambda) - i\eta_2(d\lambda)\big)$$
$$= \int_{-\infty}^{\infty}\cos\lambda t\,\eta_1(d\lambda) + \int_{-\infty}^{\infty}\sin\lambda t\,\eta_2(d\lambda) + i\left\{\int_{-\infty}^{\infty}\sin\lambda t\,\eta_1(d\lambda) - \int_{-\infty}^{\infty}\cos\lambda t\,\eta_2(d\lambda)\right\}.$$
Because $\xi(t)$ is real,
$$\int_{-\infty}^{\infty}\sin\lambda t\,\eta_1(d\lambda) - \int_{-\infty}^{\infty}\cos\lambda t\,\eta_2(d\lambda) = 0,$$
which also follows from (7.6.4), since $\eta_1$ is symmetric and $\eta_2$ antisymmetric. This yields
$$\xi(t) = \int_{-\infty}^{\infty}\cos\lambda t\,\eta_1(d\lambda) + \int_{-\infty}^{\infty}\sin\lambda t\,\eta_2(d\lambda) = \int_{-\infty}^{\infty}\cos\lambda t\,dZ_1(\lambda) + \int_{-\infty}^{\infty}\sin\lambda t\,dZ_2(\lambda), \qquad (7.6.5)$$
where
$$Z_1(\lambda) = \eta_1((-\infty,\lambda]),\qquad Z_2(\lambda) = \eta_2((-\infty,\lambda]). \qquad (7.6.6)$$

Now consider the operation of q.m. differentiation of a wide sense stationary stochastic process $\xi(t)$. According to Proposition 6.2.4, the q.m. derivative of a second-order process with covariance function $C(s,t)$ exists if
$$\left.\frac{\partial^2 C(s,t)}{\partial s\,\partial t}\right|_{s=t} \qquad (7.6.7)$$
exists at every $t \in T$. For a wide sense stationary stochastic process $\{\xi(t);\, t \in R\}$ with covariance function $C(t)$, (7.6.7) is equivalent to the requirement that $C''(0)$ exist. This, on the other hand, is equivalent to
$$-C''(0) = \int_{-\infty}^{\infty}\lambda^2\,dF(\lambda) < \infty. \qquad (7.6.8)$$
Assume now that (7.6.8) holds. It is not difficult to show that
$$\frac{e^{i\lambda h} - 1}{h} \to i\lambda \quad\text{(in } L_2 \text{ with respect to } F\text{) as } h \to 0. \qquad (7.6.9)$$
Next, taking into account (7.5.14), we have
$$\frac{\xi(t+h) - \xi(t)}{h} = \int_{-\infty}^{\infty}\frac{e^{i\lambda h} - 1}{h}\,e^{i\lambda t}\,\eta(d\lambda).$$
The following interchange is clearly permissible due to (7.6.8):
$$\underset{h\to 0}{\mathrm{l.i.m.}}\ \frac{\xi(t+h) - \xi(t)}{h} = \int_{-\infty}^{\infty}\left(\lim_{h\to 0}\frac{e^{i\lambda h} - 1}{h}\right)e^{i\lambda t}\,\eta(d\lambda).$$
From this and (7.6.9), we deduce
$$\xi'(t) = i\int_{-\infty}^{\infty}\lambda e^{i\lambda t}\,\eta(d\lambda). \qquad (7.6.10)$$
Let us show that $\xi'(t)$ is also a wide sense stationary process. Because by assumption $E\{\xi(t)\} = 0$, clearly
$$E\{\xi'(t)\} = 0. \qquad (7.6.11)$$
In addition,
$$E\{\overline{\xi'(s)}\xi'(s+t)\} = \int_{-\infty}^{\infty}\lambda^2e^{i\lambda t}\,dF(\lambda),$$
which depends only on $t$; this proves the assertion.

Let $\{\xi(t);\, t \in T\}$ be wide sense stationary with spectral representation (7.5.14) and let $\zeta \in \langle\xi(R)\rangle$. What is the structure of the r.v. $\zeta$?

Proposition 7.6.1. Let $\zeta \in \langle\xi(R)\rangle$; then there exists $\varphi \in L_2\{R,\mathscr{R},F\}$ such that
$$\zeta = \int_{-\infty}^{\infty}\varphi(\lambda)\,\eta(d\lambda) \quad\text{(a.s.)}. \qquad (7.6.12)$$

PROOF. Set
$$\zeta_n = \sum_{k=1}^{n}\alpha_k\xi(t_k); \qquad (7.6.13)$$
then by (7.5.14)
$$\zeta_n = \int_{-\infty}^{\infty}\varphi_n(\lambda)\,\eta(d\lambda). \qquad (7.6.14)$$
In other words, (7.6.12) holds for
$$\varphi_n(\lambda) = \sum_{k=1}^{n}\alpha_ke^{i\lambda t_k}. \qquad (7.6.15)$$
In the general case, for $\zeta \in \langle\xi(R)\rangle$, there exists a sequence of r.v.'s of type (7.6.13) such that $\|\zeta - \zeta_n\| \to 0$. But then
$$\|\varphi_n - \varphi_m\| = \|\zeta_n - \zeta_m\| \to 0 \quad\text{as } n, m \to \infty.$$
Consequently, $\{\varphi_n\}$ is Cauchy in $L_2\{R,\mathscr{R},F\}$, so that there exists $\varphi \in L_2\{R,\mathscr{R},F\}$ such that $\|\varphi - \varphi_n\| \to 0$ as $n \to \infty$. But then [see (7.3.13.iii)] $\|\psi(\varphi) - \psi(\varphi_n)\| \to 0$ as $n \to \infty$, and because $\psi(\varphi_n) = \zeta_n$, we have $\zeta = \psi(\varphi)$ (a.s.), which proves the assertion. $\square$

7.7. Estimation, Prediction, and Filtering

The general estimation problem in $L_2\{\Omega,\mathscr{B},P\}$ can be formulated as follows. Let $Y \in L_2$ be arbitrary but fixed and let $S \subset L_2$ be a subspace. Find the element $\hat Y \in S$ closest to $Y$. From the definition of $\hat Y$, it clearly follows that
$$\|Y - \hat Y\| = \inf\{\|Y - U\|;\ U \in S\}. \qquad (7.7.1)$$
In Hilbert space terminology, $\hat Y$ is the orthogonal projection of $Y$ on $S$, characterized as the element of $S$ such that
$$Y - \hat Y \perp S. \qquad (7.7.2)$$
This implies that
$$(Y,U) = (\hat Y,U) \qquad (7.7.3)$$
for each $U \in S$. The following two particular cases are of special interest.

(i) Pure Prediction. Let $\{\xi(s);\, s \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ be a stochastic process. Assume that this process is observed up to time $t$, i.e., its values $\xi(s)$ are known for all $s \le t$. It is required to predict its value at some moment of time $t + \tau$, $\tau > 0$. In this case,
$$Y = \xi(t+\tau) \quad\text{and}\quad S = L_2\{\Omega,\mathscr{F}_t,P\}, \qquad (7.7.4)$$
where $\mathscr{F}_t = \sigma\{\xi(s);\, s \le t\}$.

(ii) Filtering and Prediction. Let $\{X(t);\, t \in R\}$ and $\{Z(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ be given processes, where $X(t)$ is the one that interests us. Unfortunately, $X(t)$ cannot be observed directly, as it is "contaminated" by the unobservable disturbance $Z(t)$. What is observed is the sum
$$Y(t) = X(t) + Z(t). \qquad (7.7.5)$$
Here is a simple example of what an engineer would call a "signal" $X(t)$ plus unwanted noise $Z(t)$, giving $Y(t)$, which is what we observe. The problem here is to separate the noise from the signal. More precisely, one must find an estimate $\hat X(t)$ of $X(t)$ from the observations $Y(s)$ for $s \le t$. This is called filtering. Clearly, $\hat X(t)$ is the orthogonal projection of $X(t)$ on $L_2\{\Omega,\mathscr{F}_t^Y,P\}$, where $\mathscr{F}_t^Y = \sigma\{Y(s);\, s \le t\}$. If we have to find an estimate $\hat X(t+\tau)$ of $X(t+\tau)$, where $\tau > 0$, from the values $Y(s)$ for $s \le t$, this is called filtering and prediction.
We now outline a general solution of the estimation problem. Let $\mathscr{G} \subset \mathscr{B}$ be a sub-$\sigma$-algebra. We shall denote by $L_2(\mathscr{G}) = L_2\{\Omega,\mathscr{G},P\}$ the Hilbert subspace of $L_2$ which consists of those elements which are $\mathscr{G}$-measurable. We want to show that the orthogonal projection of $Y$ on $L_2(\mathscr{G})$ (see Section 10.1 of Chapter 10) is
$$\hat Y = E\{Y|\mathscr{G}\}. \qquad (7.7.6)$$
Clearly, $E\{Y|\mathscr{G}\} \in L_2(\mathscr{G})$. To prove the assertion, it suffices to show that (7.7.2) holds. Take any $Z \in L_2(\mathscr{G})$; then,
$$(Y - E\{Y|\mathscr{G}\},\,Z) = (Y,Z) - (E\{Y|\mathscr{G}\},Z) = E(Y\bar Z) - E(\bar Z\,E\{Y|\mathscr{G}\}) = E(Y\bar Z) - E(E\{Y\bar Z|\mathscr{G}\}) = 0.$$
Therefore,
$$Y - E\{Y|\mathscr{G}\} \perp L_2(\mathscr{G}). \qquad (7.7.7)$$

Remark 7.7.1. The estimate (7.7.6) is clearly unbiased; that is,
$$E\{\hat Y\} = E(E\{Y|\mathscr{G}\}) = E\{Y\}.$$

Remark 7.7.2. The quantity
$$\Delta^2 = \|Y - \hat Y\|^2 \qquad (7.7.8)$$
is called the mean square error of the estimate (7.7.6).

The next simple example contains the basic idea used in the following to resolve an estimation problem.

EXAMPLE 7.7.1. In regression analysis, the following problem is of some interest. Let $X_1, X_2 \in L_2\{\Omega,\mathscr{B},P\}$; from the family of Borel functions $h: C \to C$ ($C$ is the set of complex numbers) such that $h(X_1) \in L_2$, find one that minimizes the norm $\|X_2 - h(X_1)\|$. Clearly, $h(X_1) \in L_2(\sigma\{X_1\})$; therefore, the norm will achieve its minimum if $h(\cdot)$ is such that
$$X_2 - h(X_1) \perp L_2(\sigma\{X_1\}),$$
where $L_2(\sigma\{X\}) = L_2\{\Omega,\sigma\{X\},P\}$. From this and (7.7.6), it then follows that
$$h(X_1) = E\{X_2|X_1\} \quad\text{(a.s.)}. \qquad (7.7.9)$$

From this example we see that the best approximation to $X_2$ in the class of all Borel functions $h(\cdot)$ of $X_1$ [which, of course, must be such that $h(X_1) \in L_2$] is $E\{X_2|X_1\}$. This solution, unfortunately, has very limited practical applicability because the conditional expectation is not easy to determine. A more useful solution can be obtained by seeking an approximation, not in the class of all Borel functions, but in the subclass consisting of the linear functions $h(X_1) = aX_1$. In such a case, the condition $X_2 - aX_1 \perp \langle X_1\rangle$ would yield $(X_2,X_1) = a\|X_1\|^2$, from which we obtain $a$.

The next example is slightly more general.

EXAMPLE 7.7.2. Let $\{Z_k\}_1^n \subset L_2\{\Omega,\mathscr{B},P\}$ be an orthonormal sequence with $E\{Z_k\} = 0$, $k = 1,\dots,n$. Write $\mathscr{F}_n = \sigma\{Z_1,\dots,Z_n\}$ and consider $L_2(\mathscr{F}_n)$. Let $Y \in L_2$ be arbitrary; then, according to (7.7.6), the element in $L_2(\mathscr{F}_n)$ closest to $Y$ is $\hat Y$, defined by
$$\hat Y = E\{Y|\mathscr{F}_n\}. \qquad (7.7.10)$$
Now, consider the linear manifold $\langle\{Z_k\}_1^n\rangle$ spanned by $\{Z_k\}_1^n$. Clearly, $\langle\{Z_k\}_1^n\rangle$ is a subspace (because $\langle\{Z_k\}_1^n\rangle$ is closed) consisting of all the linear combinations $\sum_{k=1}^{n} c_kZ_k$, where the $c_k$ are complex numbers. Therefore, the element $\hat Y_0 \in \langle\{Z_k\}_1^n\rangle$ closest to $Y$ must be of the form
$$\hat Y_0 = \sum_{k=1}^{n}\alpha_kZ_k.$$
To determine the coefficients $\alpha_k$, observe that
$$Y - \sum_{k=1}^{n}\alpha_kZ_k \perp \langle\{Z_k\}_1^n\rangle,$$
which, due to the orthonormality of $\{Z_k\}_1^n$, yields
$$(Y,Z_i) = \alpha_i,\qquad i = 1,\dots,n.$$
Therefore,
$$\hat Y_0 = \sum_{k=1}^{n}(Y,Z_k)Z_k. \qquad (7.7.11)$$
Note that $\hat Y$ given by (7.7.10) is, in general, a better estimate of $Y$ than $\hat Y_0$. As a matter of fact, from (7.7.7), we have
$$\|Y - \hat Y_0\|^2 = \|Y - \hat Y\|^2 + \|\hat Y - \hat Y_0\|^2. \qquad (7.7.12)$$
This fact also follows from $\langle\{Z_k\}_1^n\rangle \subset L_2(\mathscr{F}_n)$.

Remark 7.7.3. If $Y, Z_1,\dots,Z_n$ form a Gaussian system,
$$E\{Y|Z_1,\dots,Z_n\} = \sum_{k=1}^{n} c_kZ_k \in \langle\{Z_k\}_1^n\rangle,$$
where, clearly, $c_k = (Y,Z_k)$. Therefore, in this case $\hat Y = \hat Y_0$.

To summarize: using the $L_2$ norm as our criterion of closeness, we have shown that the closest element in the subspace $L_2(\mathscr{G})$ to an arbitrary $Y \in L_2$ is simply the conditional expectation $E\{Y|\mathscr{G}\}$. On the other hand, the closest element to $Y$ in the subspace $\langle\{Z_k\}_1^n\rangle$ is $\hat Y_0$, given by (7.7.11). From (7.7.12), we see that $\hat Y$ represents a better approximation of $Y$ than $\hat Y_0$.
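Formula (7.7.11) is directly computable once the inner products $(Y,Z_k) = E\{Y\bar Z_k\}$ are estimated. The minimal sketch below (an addition, not from the original text) projects a target $Y$ onto three independent standard normals standing in for the orthonormal $Z_k$; the target's coefficients and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples, n_basis = 100_000, 3

# Orthonormal Z_1,...,Z_n (independent standard normals) and a target Y.
Z = rng.standard_normal((n_samples, n_basis))
Y = 1.5 * Z[:, 0] - 0.7 * Z[:, 2] + rng.standard_normal(n_samples)  # extra noise

# (7.7.11): Y0 = sum_k (Y, Z_k) Z_k, with (Y, Z_k) = E{Y Z_k} estimated empirically.
coef = (Z * Y[:, None]).mean(axis=0)
Y0 = Z @ coef
print("coefficients (Y,Z_k) ~", np.round(coef, 3))    # ~ [1.5, 0, -0.7]
print("E|Y - Y0|^2 ~", np.mean((Y - Y0) ** 2))        # ~ 1, the noise variance
```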
7.8. An Application

Consider the simple model
$$X_n = \xi_n + Z_n, \qquad (7.8.1)$$
where $\{\xi_n\}_{-\infty}^{\infty}$ and $\{Z_n\}_{-\infty}^{\infty} \subset L_2\{\Omega,\mathscr{B},P\}$. Here, we interpret $\xi_n$ as a signal and $Z_n$ as noise. Neither $\xi_n$ nor $Z_n$ can be observed directly. It is the superposition $X_n$ of these two that can be observed. For the signal $\{\xi_n\}_{-\infty}^{\infty}$ and the noise $\{Z_n\}_{-\infty}^{\infty}$, the following conditions are assumed to hold:
(i) $\{\xi_n\}_{-\infty}^{\infty}$ and $\{Z_n\}_{-\infty}^{\infty}$ are both wide sense stationary processes with
$$E\{\xi_n\} = E\{Z_n\} = 0; \qquad (7.8.2)$$
(ii) $\{\xi_n\}_{-\infty}^{\infty}$ and $\{Z_n\}_{-\infty}^{\infty}$ are uncorrelated. In other words,
$$(\xi_j,Z_k) = 0 \quad\text{for all } k, j = \dots,-1,0,1,\dots. \qquad (7.8.3)$$
Denote by
$$C_\xi(k) = (\xi_n,\xi_{n+k}),\qquad C_Z(k) = (Z_n,Z_{n+k}),\qquad k = \dots,-1,0,1,\dots. \qquad (7.8.4)$$
Clearly,
$$C_\xi(-k) = \overline{C_\xi(k)},\qquad C_Z(-k) = \overline{C_Z(k)}. \qquad (7.8.5)$$
In addition, after some straightforward calculations, we have
$$C_X(k) = C_\xi(k) + C_Z(k). \qquad (7.8.6)$$
Let us now define the problem of interest here. We want to find a linear estimate of a r.v. $Y \in L_2$ on the basis of the observations
$$X_m, X_{m+1},\dots,X_{m+r}. \qquad (7.8.7)$$
This is equivalent to the problem of finding the orthogonal projection $\hat Y_0$ of $Y$ on the linear manifold $\langle\{X_{m+l}\}_0^r\rangle$ (which is a closed subspace of $L_2$). The best linear predictor $\hat Y_0 \in \langle\{X_{m+l}\}_0^r\rangle$, which clearly implies that it must be of the form
$$\hat Y_0 = \sum_{l=0}^{r} h_lX_{m+l}, \qquad (7.8.8)$$
where $h_0,\dots,h_r$ are some (complex) constants. Because $Y - \hat Y_0 \perp \langle\{X_{m+l}\}_0^r\rangle$, it follows that
$$(Y,X_{m+j}) = (\hat Y_0,X_{m+j}) \quad\text{for any } j = 0,\dots,r.$$
In other words,
$$(Y,X_{m+j}) = \sum_{l=0}^{r} h_l(X_{m+l},X_{m+j}) = \sum_{l=0}^{r} h_lC_X(j-l).$$
Now consider the case when $Y = \xi_{m+r+k}$ (prediction with filtering). In such a case,
$$(Y,X_{m+j}) = (\xi_{m+r+k},\ \xi_{m+j} + Z_{m+j}) = (\xi_{m+r+k},\xi_{m+j}) = C_\xi(j-r-k).$$
Thus, the system of equations characterizing $\hat Y_0$ becomes
$$\sum_{l=0}^{r} h_lC_X(j-l) = C_\xi(j-r-k),\qquad j = 0,\dots,r.$$
A solution of this system yields $h_0,\dots,h_r$.
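The system above is a finite set of linear (normal) equations and can be solved directly. The sketch below (an addition, not from the original text) does this for assumed real covariances of AR(1) type for the signal and white noise for the disturbance, so all arguments' signs are immaterial; the values of $\rho$, $\sigma^2$, $r$, and $k$ are arbitrary.

```python
import numpy as np

rho, sigma2 = 0.8, 0.5     # assumed signal covariance rho^|n| and noise variance
r, k = 10, 1               # observe X_m,...,X_{m+r}; estimate the signal xi_{m+r+k}

C_xi = lambda n: rho ** abs(n)                         # assumed signal covariance
C_X = lambda n: C_xi(n) + (sigma2 if n == 0 else 0.0)  # (7.8.6) with white noise

# Normal equations: sum_l h_l C_X(j - l) = C_xi(j - r - k), j = 0,...,r
# (real, even covariances, so the sign of the argument does not matter here).
G = np.array([[C_X(j - l) for l in range(r + 1)] for j in range(r + 1)])
rhs = np.array([C_xi(j - r - k) for j in range(r + 1)])
h = np.linalg.solve(G, rhs)
print("filter weights h_0,...,h_r:", np.round(h, 4))
```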
7.9. Linear Transformations

Let $\{\xi(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ be a wide sense stationary q.m. continuous process with $E\{\xi(t)\} = 0$, spectral distribution $F(\cdot)$, and orthogonal random measure $\eta$. Then (see Proposition 7.5.1) $\xi(t)$ has the spectral representation
$$\xi(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\,\eta(d\lambda), \qquad (7.9.1)$$
where $\|\eta(d\lambda)\|^2 = dF(\lambda)$. Let $\{\zeta(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ be another wide sense stationary process such that
$$\zeta(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\varphi(\lambda)\,\eta(d\lambda), \qquad (7.9.2)$$
where $\varphi(\cdot) \in L_2\{R,\mathscr{R},F\}$. We then say that the stationary process $\zeta(t)$ was obtained from $\xi(t)$ by a linear transformation. The function $\varphi(\cdot)$ in (7.9.2) is called the "spectral characteristic" of the transformation. Being a stationary process, $\zeta(t)$ has its own spectral representation
$$\zeta(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\,\eta_0(d\lambda).$$
It then follows from (7.9.2) that
$$\eta_0(d\lambda) = \varphi(\lambda)\eta(d\lambda). \qquad (7.9.3)$$
In addition, the spectral distribution $F_0(\cdot)$ of $\zeta(t)$ is given by
$$dF_0(\lambda) = |\varphi(\lambda)|^2\,dF(\lambda). \qquad (7.9.4)$$

EXAMPLE 7.9.1. One of the simplest linear transformations of a stationary stochastic process $\xi(t)$ is the "shift" operator $T_s$ defined by
$$T_s\xi(t) = \xi(t+s). \qquad (7.9.5)$$
From this, we have
$$T_s\xi(t) = \int_{-\infty}^{\infty} e^{i\lambda t}e^{i\lambda s}\,\eta(d\lambda),$$
which is a particular case of (7.9.2), where $\varphi(\lambda) = e^{i\lambda s}$. Clearly, we then have $F_0 \equiv F$ and $\eta_0(d\lambda) = e^{i\lambda s}\eta(d\lambda)$.

Consider the closed linear manifold $\langle\xi(R)\rangle$ spanned by the process $\xi(t)$. According to Proposition 7.6.1, an arbitrary element $\zeta \in \langle\xi(R)\rangle$ can be represented as
$$\zeta = \int_{-\infty}^{\infty}\varphi(\lambda)\,\eta(d\lambda),\qquad \varphi \in L_2\{R,\mathscr{R},F\}.$$
Actually, the following result holds. If $\{\zeta(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ is wide sense stationary and such that
$$\langle\zeta(R)\rangle \subset \langle\xi(R)\rangle, \qquad (7.9.6)$$
then there exists $\varphi(\cdot) \in L_2\{R,\mathscr{R},F\}$ such that
$$\zeta(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\varphi(\lambda)\,\eta(d\lambda). \qquad (7.9.7)$$
To show this, consider
$$\zeta_n(t) = \sum_{k=1}^{n}\alpha_k\xi(t+t_k) \in \langle\xi(R)\rangle.$$
From (7.9.1), we obtain the spectral representation of $\zeta_n(t)$ as
$$\zeta_n(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\varphi_n(\lambda)\,\eta(d\lambda),\quad\text{where}\quad \varphi_n(\lambda) = \sum_{k=1}^{n}\alpha_ke^{i\lambda t_k}.$$
The general case is now obtained for arbitrary $\varphi(\cdot)$ by passage to the limit.

Remark 7.9.1. When (7.9.6) holds, we often say that the process $\zeta(t)$ is subordinated to $\xi(t)$.

Remark 7.9.2. The derivative (7.6.10) of a wide sense stationary process is another particular case of (7.9.2), where $\varphi(\lambda) = i\lambda$.

Consider now another type of linear operation on a wide sense stationary process. The transformation
$$\zeta(t) = \int_{-\infty}^{\infty} h(t-s)\xi(s)\,ds \qquad (7.9.8)$$
of the process $\xi(t)$ is called "an admissible linear filter." The complex function $h(\cdot)$ is absolutely integrable, i.e.,
$$\int_{-\infty}^{\infty}|h(s)|\,ds < \infty, \qquad (7.9.9)$$
and square integrable on every finite interval. The integral (7.9.8) exists if and only if the integral
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(t-s_1)C(s_1-s_2)\overline{h(t-s_2)}\,ds_1\,ds_2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s_1)\overline{h(s_2)}C(s_2-s_1)\,ds_1\,ds_2$$
exists, which clearly holds due to (7.9.9).

Remark 7.9.3. If $h(u) = 0$ for $u < 0$, the filter is said to be physically realizable. The integral (7.9.8) is also known as "a moving average process."

EXAMPLE 7.9.2. Let $\varphi(\cdot)$ be the Fourier transform of some integrable function $h(\cdot)$, namely,
$$\varphi(\lambda) = \int_{-\infty}^{\infty} e^{-i\lambda t}h(t)\,dt. \qquad (7.9.10)$$
Then, the linear transformation of the process $\xi(t)$ with the spectral characteristic (7.9.10) is the process
$$\zeta(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\varphi(\lambda)\,\eta(d\lambda) = \int_{-\infty}^{\infty} e^{i\lambda t}\left(\int_{-\infty}^{\infty} e^{-i\lambda s}h(s)\,ds\right)\eta(d\lambda) = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{i\lambda(t-s)}h(s)\,ds\right)\eta(d\lambda)$$
$$= \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{i\lambda u}h(t-u)\,du\right)\eta(d\lambda) = \int_{-\infty}^{\infty} h(t-u)\xi(u)\,du.$$
Clearly,
$$E\{\overline{\zeta(s)}\zeta(s+t)\} = E\left\{\overline{\int_{-\infty}^{\infty} h(s-u)\xi(u)\,du}\,\int_{-\infty}^{\infty} h(s+t-v)\xi(v)\,dv\right\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\overline{h(s_1)}h(s_2)\,C(t+s_1-s_2)\,ds_1\,ds_2.$$
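The relation $dF_0 = |\varphi|^2\,dF$ of (7.9.4) has a discrete analogue that is easy to verify: filter white noise with a short FIR kernel and compare an averaged periodogram of the output with $|\varphi(\lambda)|^2 f(\lambda)$. The sketch below is an illustration under assumptions (not from the original text); the impulse response and block length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, nblock = 1 << 20, 4096
xi = rng.standard_normal(n)               # white noise input: f(lambda) = 1/(2*pi)
h = np.array([0.5, 0.3, 0.2])             # assumed impulse response of the filter
zeta = np.convolve(xi, h, mode="valid")   # zeta(t) = sum_s h(s) xi(t-s), cf. (7.9.8)

# Averaged periodogram of the output as an estimate of its spectral density.
blocks = zeta[: (len(zeta) // nblock) * nblock].reshape(-1, nblock)
per = np.mean(np.abs(np.fft.fft(blocks, axis=1)) ** 2, axis=0) / (2 * np.pi * nblock)
lams = 2 * np.pi * np.fft.fftfreq(nblock)

# Theory (7.9.4): dF0 = |phi|^2 dF with phi(lam) = sum_s h(s) e^{-i lam s}, cf. (7.9.10).
for j in (0, 200, 1000, 2000):
    phi = np.sum(h * np.exp(-1j * lams[j] * np.arange(len(h))))
    print(f"lambda={lams[j]:+.3f}  estimate={per[j]:.5f}  |phi|^2 f={abs(phi)**2/(2*np.pi):.5f}")
```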
7.10. Linear Prediction, General Remarks

Let $\{\xi(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ be a wide sense stationary stochastic process with $E\{\xi(t)\} = 0$. As before, we denote by $\langle\xi(R)\rangle$ the closed linear manifold spanned by $\xi(t)$. Clearly, $\langle\xi(R)\rangle \subset L_2\{\Omega,\mathscr{B},P\}$. Because, for every $t \in R$, $\xi(t)$ is a "point" in $L_2\{\Omega,\mathscr{B},P\}$, the whole process represents a curve in the Hilbert space $L_2\{\Omega,\mathscr{B},P\}$. The closed linear manifold $\langle\xi(R)\rangle$ is the smallest subspace of $L_2\{\Omega,\mathscr{B},P\}$ which contains this curve. For any $t \in R$, write $R_t = (-\infty,t]$; then $\langle\xi(R_t)\rangle$ is defined to be the closed linear manifold spanned by the r.v.'s $\xi(s)$, $s \le t$. The subspace $\langle\xi(R_t)\rangle$ is often said to represent the past and present of the stochastic process $\{\xi(s);\, s \in R\}$, given that $t$ is the present time. Clearly, for any $t_1 \le t_2$, $\langle\xi(R_{t_1})\rangle \subset \langle\xi(R_{t_2})\rangle$. Denote by
$$\langle\xi(-\infty)\rangle = \bigcap_{t\in R}\langle\xi(R_t)\rangle.$$
In the terminology just introduced, $\langle\xi(-\infty)\rangle$ represents the infinitely remote past.

Definition 7.10.1. If $\langle\xi(-\infty)\rangle = \langle\xi(R)\rangle$ or, equivalently, if $\langle\xi(R_t)\rangle$ is the same for all $t \in R$, the stochastic process $\xi(t)$ is called "deterministic." If, on the other hand, $\langle\xi(-\infty)\rangle \subsetneq \langle\xi(R)\rangle$, the stochastic process is said to be nondeterministic. Finally, if $\langle\xi(-\infty)\rangle = \{0\}$, $\xi(t)$ is called "purely nondeterministic."

To elucidate somewhat the physical meaning of these notions, we explain them in terms of linear prediction. As we have seen in Section 7.7, the problem of linear prediction of a process $\xi(t)$ is to find an estimate of $\xi(t+\tau)$ for some $\tau > 0$, given the values of $\xi(s)$ for $s \le t$. The best linear predictor of $\xi(t+\tau)$ is its orthogonal projection on $\langle\xi(R_t)\rangle$, which is denoted by $\hat\xi(t+\tau)$. From Proposition 5.4.2, we know that the deviation satisfies
$$\xi(t+\tau) - \hat\xi(t+\tau) \perp \langle\xi(R_t)\rangle. \qquad (7.10.1)$$
Its norm
$$\delta(\tau) = \|\xi(t+\tau) - \hat\xi(t+\tau)\| \qquad (7.10.2)$$
is called the "error" of the linear prediction. It is independent of $t$ because the process $\xi(t)$ is wide sense stationary. Note also that
$$\hat\xi(t+\tau) \in \langle\xi(R_t)\rangle. \qquad (7.10.3)$$
Let us show that, for any $0 \le \tau_1 \le \tau_2$,
$$\delta(\tau_1) \le \delta(\tau_2) \quad\text{and}\quad \delta(\tau) = 0 \ \text{ for } \tau \le 0. \qquad (7.10.4)$$
The orthogonal projection of $\xi(t+\tau)$ on $\langle\xi(R_t)\rangle$ is the same as the projection of $\xi(u)$ on $\langle\xi(R_{u-\tau})\rangle$ with $u = t + \tau$. Now, because $\langle\xi(R_{u-\tau_2})\rangle \subset \langle\xi(R_{u-\tau_1})\rangle$,
$$\inf\{\|\xi(u) - Z\|;\ Z \in \langle\xi(R_{u-\tau_1})\rangle\} \le \inf\{\|\xi(u) - Z\|;\ Z \in \langle\xi(R_{u-\tau_2})\rangle\}$$
(see Proposition 5.4.1). This proves (7.10.4).

If the stochastic process $\{\xi(t);\, t \in R\}$ is deterministic, then $\xi(t+\tau) \in \langle\xi(R_t)\rangle$ for all $t, \tau \in R$, so that $\hat\xi(t+\tau) = \xi(t+\tau)$. This implies that $\delta(\tau) = 0$ for all $\tau$. Conversely, if $\delta(\tau_0) = 0$ for some $\tau_0 > 0$, the process is deterministic. Indeed, due to (7.10.4), $\delta(\tau) = 0$ for all $\tau \le \tau_0$. Therefore,
$$\xi(t+\tau) = \hat\xi(t+\tau) \in \langle\xi(R_t)\rangle$$
for all $\tau \le \tau_0$ and all $t \in R$. This then implies that $\langle\xi(R_{t+\tau})\rangle \subset \langle\xi(R_t)\rangle$ for all $t \in R$ and $\tau > 0$. Consequently, $\langle\xi(R_{t+\tau})\rangle = \langle\xi(R_t)\rangle$.

In the following, for convenience's sake, we will denote by
$$P_t\xi(t+\tau),\qquad \tau \ge 0, \qquad (7.10.5)$$
the orthogonal projection of $\xi(t+\tau)$ on $\langle\xi(R_t)\rangle$. With this notation, the error of the linear prediction (7.10.2) can now be written as
$$\delta(\tau) = \|\xi(t+\tau) - P_t\xi(t+\tau)\|. \qquad (7.10.6)$$
Next, we shall prove the following result.

Proposition 7.10.1. The stochastic process $\xi(t)$ is purely nondeterministic if and only if $\delta^2(\tau) \to C(0)$ as $\tau \to +\infty$.

PROOF. Assume that $\delta^2(\tau) \to C(0)$ as $\tau \to +\infty$. Because $\langle\xi(-\infty)\rangle \subset \langle\xi(R_s)\rangle$ for all $s \in R$, we have
$$\|\xi(t) - P_s\xi(t)\| \le \|\xi(t) - P_{-\infty}\xi(t)\|, \qquad (7.10.7)$$
where $P_{-\infty}$ denotes the projection on $\langle\xi(-\infty)\rangle$. Next, because $(\xi(t) - P_s\xi(t)) \perp P_s\xi(t)$,
$$\|\xi(t)\|^2 = \|P_s\xi(t)\|^2 + \|\xi(t) - P_s\xi(t)\|^2. \qquad (7.10.8)$$
We also have
$$\|\xi(t)\|^2 = \|P_{-\infty}\xi(t)\|^2 + \|\xi(t) - P_{-\infty}\xi(t)\|^2. \qquad (7.10.9)$$
Now, $\|\xi(t)\|^2 = C(0)$ for all $t$, and
$$\|\xi(t) - P_s\xi(t)\|^2 = \delta^2(t-s) \to C(0)$$
as $s \to -\infty$. Therefore, by (7.10.8), $\|P_s\xi(t)\|^2 \to 0$ as $s \to -\infty$. Consequently, $\xi(t) \perp \langle\xi(-\infty)\rangle$ for all $t$, so $\langle\xi(-\infty)\rangle = \{0\}$. Conversely, suppose that $\langle\xi(-\infty)\rangle = \{0\}$ and let $P_{t-s}\xi(t) \to Z$ as $s \to \infty$; then $Z \in \langle\xi(-\infty)\rangle = \{0\}$, so that
$$\lim_{\tau\to\infty}\delta^2(\tau) = \lim_{\tau\to\infty}\|\xi(t+\tau) - P_t\xi(t+\tau)\|^2 = \|\xi(t)\|^2 = C(0).$$
This completes the proof of the assertion. $\square$

7.11. The Wold Decomposition

Let $\{\xi(t);\, t \in R\} \subset L_2\{\Omega,\mathscr{B},P\}$ be a wide sense stationary nondeterministic stochastic process. The Wold decomposition, which will be discussed next, shows that such a process can be represented as a sum of two mutually orthogonal second-order processes, one of which is deterministic and the other purely nondeterministic.

Proposition 7.11.1. Let $\{\xi(t);\, t \in R\}$ be wide sense stationary with $E\{\xi(t)\} = 0$. Then it can be written as
$$\xi(t) = \xi_1(t) + \xi_2(t), \qquad (7.11.1)$$
where $\{\xi_1(t);\, t \in R\}$ is purely nondeterministic and $\{\xi_2(t);\, t \in R\}$ is deterministic. In addition,
$$(\xi_1(s),\xi_2(t)) = 0 \qquad (7.11.2)$$
for any $s, t \in R$.

PROOF. Set
$$\xi_2(t) = P_{-\infty}\xi(t),\qquad \xi_1(t) = \xi(t) - \xi_2(t). \qquad (7.11.3)$$
Clearly, $\xi_2(t) \in \langle\xi(-\infty)\rangle$ for all $t$ and, according to Proposition 5.4.2,
$$\xi_1(t) \perp \langle\xi(-\infty)\rangle, \qquad (7.11.4)$$
which implies that $\{\xi_1(t);\, t \in R\}$ and $\{\xi_2(t);\, t \in R\}$ are orthogonal processes. Next, because $\xi_1(t) \in \langle\xi(R_t)\rangle$ for all $t$, it follows that
$$\langle\xi_1(R_t)\rangle \subset \langle\xi(R_t)\rangle,\qquad \forall t \in R. \qquad (7.11.5)$$
This and (7.11.4) then imply that $\langle\xi_1(-\infty)\rangle = \{0\}$, so that the process $\xi_1(t)$ is purely nondeterministic. On the other hand, it is clear that
$$\langle\xi_2(R_t)\rangle \subset \langle\xi(-\infty)\rangle,\qquad \forall t \in R, \qquad (7.11.6)$$
so that $\langle\xi_2(R)\rangle \subset \langle\xi(-\infty)\rangle$. Because $\xi(t) = \xi_1(t) + \xi_2(t)$,
$$\langle\xi(R_t)\rangle \subset \langle\xi_1(R_t)\rangle \oplus \langle\xi_2(R_t)\rangle,\qquad \forall t \in R. \qquad (7.11.7)$$
From this, (7.11.5), and (7.11.6), we deduce that
$$\langle\xi(-\infty)\rangle \subset \langle\xi_1(R_t)\rangle \oplus \langle\xi_2(R_t)\rangle. \qquad (7.11.8)$$
Now, (7.11.4) clearly implies that
$$\langle\xi_1(R_t)\rangle \perp \langle\xi(-\infty)\rangle. \qquad (7.11.9)$$
Because [see (7.11.8)] $\langle\xi(-\infty)\rangle \subset \langle\xi_1(R_t)\rangle \oplus \langle\xi_2(R_t)\rangle$ for all $t$, it follows from (7.11.9) that $\langle\xi(-\infty)\rangle \subset \langle\xi_2(R_t)\rangle$ for every $t$. This and (7.11.6) yield
$$\langle\xi(-\infty)\rangle = \langle\xi_2(R_t)\rangle \quad\text{for all } t,$$
so that $\langle\xi_2(-\infty)\rangle = \langle\xi_2(R_t)\rangle$ for all $t$, from which we conclude that $\{\xi_2(t);\, t \in R\}$ is deterministic. Using a similar argument, we can show that the decomposition (7.11.1) is unique. $\square$

Corollary 7.11.1. Both processes $\xi_1(t)$ and $\xi_2(t)$ are wide sense stationary, with $E\{\xi_1(t)\} = E\{\xi_2(t)\} = 0$.

We now give, without proof, a necessary and sufficient condition for a wide sense stationary stochastic process $\{\xi(t);\, t \in R\}$ to be nondeterministic. Let $\{Z(t);\, t \in R\}$ be the stochastic process with orthogonal increments associated with $\xi(t)$. Then we have the following result.

Proposition 7.11.2. For the stationary process $\xi(t)$ to be nondeterministic, it is necessary and sufficient that it have the representation
$$\xi(t) = \int_{-\infty}^{t}\alpha(t-s)\,dZ(s), \qquad (7.11.10)$$
where
$$\int_0^{\infty}|\alpha(s)|^2\,ds < \infty,\qquad \langle\xi(R_t)\rangle = \langle Z(R_t)\rangle. \qquad (7.11.11)$$

Assume that the spectral distribution $F(\cdot)$ of the process $\xi(t)$ is absolutely continuous, i.e.,
$$F(\lambda) = \int_{-\infty}^{\lambda} f(s)\,ds;$$
then the following result holds.

Proposition 7.11.3. The stochastic process $\xi(t)$ is purely nondeterministic if and only if its spectral density $f(\cdot)$ exists and
$$\int_{-\infty}^{\infty}\frac{\ln f(\lambda)}{1+\lambda^2}\,d\lambda > -\infty. \qquad (7.11.12)$$

Remark 7.11.1. The stochastic process $\xi(t)$ is deterministic if
$$\int_{-\infty}^{\infty}\frac{\ln f(\lambda)}{1+\lambda^2}\,d\lambda = -\infty, \qquad (7.11.13)$$
or if $f(\cdot)$ does not exist.

Remark 7.11.2. The spectral characteristic [see (7.9.2)] $\varphi(\lambda)$ of the linear transformation (7.11.10) is given by
$$\varphi(\lambda) = \int_0^{\infty} e^{-i\lambda t}\alpha(t)\,dt \qquad (7.11.14)$$
and represents the boundary value of the analytic function
$$\varphi^*(z) = \int_0^{\infty} e^{-izt}\alpha(t)\,dt$$
on the half-plane $\operatorname{Im}\{z\} < 0$.

EXAMPLE 7.11.1. Let $\{\xi(t);\, t \in R\}$ be wide sense stationary with $E\{\xi(t)\} = 0$ and
$$C(t) = \alpha e^{-t^2/4},\qquad \alpha > 0.$$
Because condition (7.1.9) holds, the spectral density $f(\cdot)$ exists [see (7.1.10)] and
$$f(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\lambda t}\alpha e^{-t^2/4}\,dt = \frac{\alpha}{\pi^{1/2}}\,e^{-\lambda^2}.$$
From this, we clearly see that (7.11.13) holds, which implies that the process $\xi(t)$ is deterministic.

EXAMPLE 7.11.2. Let us show that a wide sense stationary process with a discrete spectrum, say
$$\xi(t) = \sum_{k=1}^{\infty}\eta\{\lambda_k\}e^{i\lambda_k t},\qquad E\{\xi(t)\} = 0,$$
with orthogonal amplitudes $\eta\{\lambda_k\}$, is deterministic. Indeed, its spectral distribution $F(\cdot)$ is a step function. To see this, consider
$$C(t) = E\{\overline{\xi(s)}\xi(s+t)\} = \sum_{k=1}^{\infty} e^{i\lambda_k t}\,E|\eta\{\lambda_k\}|^2,$$
so that $dF(\lambda) = 0$ off the set $\{\lambda_1,\lambda_2,\dots\}$. If we set $\ln\{dF(\lambda)/d\lambda\} = -\infty$ when $dF(\lambda)/d\lambda = 0$, condition (7.11.13) holds.
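The dichotomy in (7.11.12)-(7.11.13) can be probed numerically. The sketch below (an addition, not from the original text) evaluates the truncated integral for a rational density, where it stabilizes, and for the Gaussian-type density of Example 7.11.1 (with $\alpha = 1$), where $\ln f(\lambda) \sim -\lambda^2$ makes it drift to $-\infty$; it works with $\ln f$ directly to avoid underflow.

```python
import numpy as np
from scipy.integrate import quad

# (7.11.12): xi(t) is purely nondeterministic iff
#   int ln f(lam) / (1 + lam^2) dlam > -infinity.
ln_f_rational = lambda l: -np.log(np.pi * (1.0 + l**2))   # OU-type density
ln_f_gauss = lambda l: -l**2 - 0.5 * np.log(np.pi)        # Example 7.11.1, alpha = 1

for name, ln_f in [("rational", ln_f_rational), ("gaussian", ln_f_gauss)]:
    for L in (10.0, 100.0, 1000.0):
        val, _ = quad(lambda l: ln_f(l) / (1.0 + l**2), -L, L, limit=500)
        print(f"{name:8s} cutoff {L:6.0f}: integral = {val:12.2f}")
# The rational case converges (purely nondeterministic); the Gaussian case
# decreases roughly like -2L, so that process is deterministic.
```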
EXAMPLE 7.11.3. The following example is particularly instructive. Let $\{\xi(t);\, t \in R\}$ be wide sense stationary with $E\{\xi(t)\} = 0$ and covariance $C(t)$, which we assume to be infinitely differentiable at zero. Let $F(\cdot)$ be the corresponding spectral distribution. Then, according to (7.1.6),
$$C(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\,dF(\lambda).$$
From this, it readily follows that
$$\int_{-\infty}^{\infty}\lambda^{2n}\,dF(\lambda) < \infty \qquad (7.11.15)$$
and vice versa, i.e., if (7.11.15) holds for all $n = 1, 2, \dots$, the covariance function is infinitely differentiable at zero. According to (7.6.8), (7.11.15) then implies the existence (in the q.m. sense) of the derivative $\xi'(t)$. Next, because $\xi'(t)$ is also wide sense stationary with covariance function
$$C_{\xi'}(t) = \int_{-\infty}^{\infty}\lambda^2e^{i\lambda t}\,dF(\lambda),$$
we clearly have
$$-C_{\xi'}''(0) = \int_{-\infty}^{\infty}\lambda^4\,dF(\lambda).$$
Therefore, because this integral is finite, the first derivative of $\xi'(t)$ (in the q.m. sense) exists and represents a wide sense stationary process, and so on. In general, if the condition (7.11.15) is met, the $n$th derivative $\xi^{(n)}(t)$, $n = 1, 2, \dots$, exists and has the representation
$$\xi^{(n)}(t) = \int_{-\infty}^{\infty}(i\lambda)^ne^{i\lambda t}\,\eta(d\lambda).$$
From this, it follows that, for any $\tau > 0$,
$$\xi(t+\tau) = \sum_{k=0}^{\infty}\frac{\tau^k}{k!}\,\xi^{(k)}(t).$$
This allows an exact prediction of the process for all $\tau > 0$.

7.12. Discrete Parameter Processes

Let $\{\xi_n\}_{-\infty}^{\infty}$ be a wide sense stationary process with $E\{\xi_n\} = 0$ and $C(n) = E\{\overline{\xi_k}\xi_{k+n}\}$. According to the Wold decomposition, we have
$$\xi_n = \xi_{n1} + \xi_{n2}, \qquad (7.12.1)$$
where $\xi_{n1}$ and $\xi_{n2}$ are two wide sense stationary processes such that $\xi_{k1} \perp \xi_{j2}$ for all $k, j$. The first component in (7.12.1), $\xi_{n1}$, is deterministic, and the second component, $\xi_{n2}$, represents a purely nondeterministic process. The deterministic component is perfectly predictable, and an explicit formula may be obtained for predicting the purely nondeterministic component, as the following proposition shows.

Proposition 7.12.1. Let $\{X_n\}_{-\infty}^{\infty}$ be a wide sense stationary, purely nondeterministic process such that $E\{X_n\} = 0$. Then,
$$X_n = \sum_{k=0}^{\infty}\alpha_k\zeta_{n-k}, \qquad (7.12.2)$$
where $\{\zeta_m\}_{-\infty}^{\infty}$ is a sequence of pairwise orthogonal r.v.'s with $E|\zeta_m|^2 = 1$,
$$\langle\{X_k\}_{-\infty}^{m}\rangle = \langle\{\zeta_k\}_{-\infty}^{m}\rangle \qquad (7.12.3)$$
for all $m$, and $\{\alpha_k\}_0^{\infty}$ is a sequence of complex numbers such that $\sum_{k=0}^{\infty}|\alpha_k|^2 < \infty$.

Remark 7.12.1. The sequence $\{\zeta_m\}_{-\infty}^{\infty}$ is often called an "innovation sequence" for the process $\{X_n\}_{-\infty}^{\infty}$; the reason for this is that $\zeta_{n+1}$ provides the "new information" needed to obtain $\langle\{X_k\}_{-\infty}^{n+1}\rangle$.

We may drop the requirement that $\langle\{X_k\}_{-\infty}^{m}\rangle = \langle\{\zeta_k\}_{-\infty}^{m}\rangle$ for all $m$, and (7.12.2) will still hold, as the following proposition shows.

Proposition 7.12.2. A wide sense stationary process $\{X_n\}_{-\infty}^{\infty}$ is purely nondeterministic if and only if it can be represented as
$$X_n = \sum_{k=0}^{\infty} b_kZ_{n-k} \quad\text{(a.s.)}, \qquad (7.12.4)$$
where $\{Z_m\}_{-\infty}^{\infty}$ is an orthonormal system of r.v.'s and $\sum_{k=0}^{\infty}|b_k|^2 < \infty$.

Remark 7.12.2. A process of the form (7.12.4) is called a one-sided moving average process.

An analogous result to Proposition 7.11.3 also holds in the discrete case.

Proposition 7.12.3. Let $\{\xi_n\}_{-\infty}^{\infty}$ be a nondegenerate wide sense stationary stochastic process with $E\{\xi_n\} = 0$. If the process is purely nondeterministic, there exists a spectral density $f(\lambda)$ such that
$$\int_{-\pi}^{\pi}\ln f(\lambda)\,d\lambda > -\infty. \qquad (7.12.5)$$
Conversely, if the process $\{\xi_n\}_{-\infty}^{\infty}$ has a spectral density satisfying (7.12.5), the process is purely nondeterministic.

Remark 7.12.3. From (7.12.5), it follows that $f(\cdot) > 0$ (a.e.) with respect to Lebesgue measure.

Remark 7.12.4. If
$$\int_{-\pi}^{\pi}\ln f(\lambda)\,d\lambda = -\infty, \qquad (7.12.6)$$
the process $\{\xi_n\}_{-\infty}^{\infty}$ is deterministic.
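A one-sided moving average of type (7.12.4) is straightforward to simulate and its covariance is $C(m) = \sum_k b_k\bar b_{k+m}$, as the calculation in the next example will show for the two-sided case. The short sketch below is an addition under assumptions (not from the original text); the coefficients $b_k$ are arbitrary real choices.

```python
import numpy as np

rng = np.random.default_rng(6)
b = np.array([1.0, 0.6, 0.3, 0.1])     # assumed coefficients, sum |b_k|^2 < inf
n = 500_000

Z = rng.standard_normal(n)             # orthonormal innovations
X = np.convolve(Z, b, mode="valid")    # X_n = sum_k b_k Z_{n-k}, cf. (7.12.4)

# Covariance of a one-sided moving average: C(m) = sum_k b_k b_{k+m} (real b).
for m in range(4):
    emp = np.mean(X[:-m] * X[m:]) if m else np.mean(X * X)
    exact = np.sum(b[: len(b) - m] * b[m:])
    print(f"C({m})  empirical={emp:.4f}  exact={exact:.4f}")
```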
EXAMPLE 7.12.1. Let $\{Z_n\}_{-\infty}^{\infty}$ be an orthonormal sequence of r.v.'s, i.e., $E\{Z_n\} = 0$, $\|Z_n\| = 1$ for all $n$, and $(Z_j,Z_k) = 0$ for all $j \ne k$. From this, we obtain that the covariance function $C_Z(n)$ is given by
$$C_Z(n) = \begin{cases}1, & n = 0\\ 0, & n \ne 0.\end{cases} \qquad (7.12.7)$$
Assume that the process $\{Z_n\}_{-\infty}^{\infty}$ has a spectral density $f(\lambda)$. Taking into account that
$$\{(2\pi)^{-1/2}e^{-in\lambda}\}_{-\infty}^{\infty} \qquad (7.12.8)$$
is a complete orthonormal system on $(-\pi,\pi)$, we can write
$$f(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} c_ke^{-ik\lambda},\quad\text{where}\quad c_k = \int_{-\pi}^{\pi} e^{ik\lambda}f(\lambda)\,d\lambda.$$
But from this, (7.12.7), and Herglotz's theorem [see (7.1.8)], we conclude that
$$c_k = C_Z(k) = \begin{cases}1, & k = 0\\ 0, & k \ne 0.\end{cases}$$
Therefore,
$$f(\lambda) = \begin{cases}1/2\pi, & -\pi \le \lambda \le \pi\\ 0, & \text{otherwise}.\end{cases} \qquad (7.12.9)$$
The stochastic process $\{Z_n\}_{-\infty}^{\infty}$ is often called "white noise." Because the condition (7.12.5) is clearly satisfied, the process is purely nondeterministic.

EXAMPLE 7.12.2. Let $\{\xi_n\}_{-\infty}^{\infty}$ be a stochastic process defined by
$$\xi_n = \sum_{k=-\infty}^{\infty}\alpha_kZ_{n-k}, \qquad (7.12.10)$$
where $\{Z_n\}_{-\infty}^{\infty}$ is a white noise process (see the previous example) and the sequence of complex numbers $\{\alpha_n\}_{-\infty}^{\infty}$ is such that
$$\sum_{k=-\infty}^{\infty}|\alpha_k|^2 < \infty. \qquad (7.12.11)$$
The process $\{\xi_n\}_{-\infty}^{\infty}$ is called a moving average. Clearly, $E\{\xi_n\} = 0$ for all $n$; in addition,
$$E\{\overline{\xi_m}\xi_{m+n}\} = \sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty}\overline{\alpha_j}\alpha_k\,E\{\overline{Z_{m-j}}Z_{m+n-k}\} = \sum_{k=-\infty}^{\infty}\overline{\alpha_{k-n}}\alpha_k = \sum_{k=-\infty}^{\infty}\overline{\alpha_k}\alpha_{k+n}. \qquad (7.12.12)$$
From this and (7.12.12), it follows that $\{\xi_n\}_{-\infty}^{\infty}$ is wide sense stationary. The spectral density $f_\xi(\lambda)$ of the process $\{\xi_n\}_{-\infty}^{\infty}$ is then equal to
$$f_\xi(\lambda) = \frac{1}{2\pi}\sum_{n=-\infty}^{\infty} C_\xi(n)e^{-i\lambda n} = \frac{1}{2\pi}\left|\sum_{r=-\infty}^{\infty}\alpha_re^{-i\lambda r}\right|^2.$$

Remark 7.12.5. The last example shows that for a moving average process the spectral density $f(\lambda) > 0$ (a.e.) with respect to Lebesgue measure. One can show that the converse also holds. In other words, a wide sense stationary process with spectral density $f(\lambda) > 0$ (a.e.) can be represented as a moving average.

7.13. Linear Prediction

A particular case of the general estimation theory is that of (pure) prediction; the basic idea was outlined at the beginning of Section 7.7. In Section 7.10, general ideas of linear prediction were discussed in some detail. This section is concerned with some basic methods and techniques of the theory of linear prediction.

Let $\{\xi(t);\, t \in R\}$ be a wide sense stationary process with $E\{\xi(t)\} = 0$, covariance function $C(t)$ (continuous at zero), and a spectral distribution $F(\lambda)$. Given the values of the process on a set $T_0 \subseteq R_t = (-\infty,t]$, we want to "predict" its value at a time $t + h$, $h > 0$. The prediction $\hat\xi(t+h)$ of $\xi(t+h)$ is clearly a functional of $\{\xi(s);\, s \in T_0\}$, i.e.,
$$\hat\xi(t+h) = \Phi\{\xi(s);\, s \in T_0\}. \qquad (7.13.1)$$
We are concerned with the linear prediction of $\xi(t+h)$. As we have seen in Sections 7.7 and 7.10, the prediction is called "linear" if
$$\hat\xi(t+h) \in \langle\xi(T_0)\rangle. \qquad (7.13.2)$$
The best possible prediction in such a case would be the element in $\langle\xi(T_0)\rangle$ closest (in the $L_2$ sense) to $\xi(t+h)$. From Proposition 5.4.1, we know that there exists such an element (uniquely determined), which is the orthogonal projection of $\xi(t+h)$ on $\langle\xi(T_0)\rangle$. This projection will be denoted by
$$P_{T_0}\xi(t+h). \qquad (7.13.3)$$
From Proposition 5.4.2, it follows that
$$\xi(t+h) - P_{T_0}\xi(t+h) \perp \langle\xi(T_0)\rangle.$$
From this, it follows that, for any $\zeta \in \langle\xi(T_0)\rangle$,
$$(\xi(t+h),\zeta) = (P_{T_0}\xi(t+h),\zeta). \qquad (7.13.4)$$
The norm of the deviation,
$$\delta(h) = \|\xi(t+h) - P_{T_0}\xi(t+h)\| = \big(E|\xi(t+h) - P_{T_0}\xi(t+h)|^2\big)^{1/2}, \qquad (7.13.5)$$
is called the "error" of the linear prediction.

EXAMPLE 7.13.1. Assume that $T_0 = \{t_0,t_1,\dots,t_n\}$ with $t_0 < t_1 < \dots < t_n \le t$. In this case,
$$P_{T_0}\xi(t+h) = \sum_{k=0}^{n}\alpha_k\xi(t_k), \qquad (7.13.6)$$
where the sequence of complex constants must be such that (7.13.4) is satisfied. In other words, for any $j = 0,\dots,n$,
$$(\xi(t+h),\xi(t_j)) = \sum_{k=0}^{n}\alpha_k(\xi(t_k),\xi(t_j)) \qquad (7.13.7)$$
or
$$\sum_{k=0}^{n}\alpha_kC(t_j - t_k) = C(t + h - t_j). \qquad (7.13.8)$$
The solution of this system of linear equations gives the values of the coefficients $\alpha_0,\dots,\alpha_n$ for which the right-hand side of (7.13.6) represents the orthogonal projection of $\xi(t+h)$ on $\langle\xi(T_0)\rangle$.
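The system (7.13.8) can be solved mechanically for any covariance. The sketch below (an addition, not from the original text) uses the exponential covariance $C(t) = \sigma^2e^{-\alpha|t|}$ of Example 7.14.1; the observation times and parameters are arbitrary assumptions. Because that covariance corresponds to a Markovian process, only the coefficient of the most recent observation should come out nonzero, equal to $e^{-\alpha h}$.

```python
import numpy as np

alpha, sigma2, h = 1.0, 2.0, 0.5
C = lambda t: sigma2 * np.exp(-alpha * np.abs(t))    # assumed covariance

t_obs = np.array([0.0, 0.4, 1.1, 1.7, 2.0])          # observation times t_0<...<t_n
t_pred = t_obs[-1] + h

# (7.13.8): sum_k a_k C(t_j - t_k) = C(t_pred - t_j), j = 0,...,n.
G = C(t_obs[:, None] - t_obs[None, :])
rhs = C(t_pred - t_obs)
a = np.linalg.solve(G, rhs)
print("coefficients:", np.round(a, 6))   # only the last is nonzero: e^{-alpha h}
print("e^{-alpha h} =", np.exp(-alpha * h))

# Prediction error: delta^2 = C(0) - sum_k a_k C(t_pred - t_k).
print("error^2 =", C(0.0) - a @ rhs, " vs ", sigma2 * (1 - np.exp(-2 * alpha * h)))
```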
Let us now determine the value of the "error" of the linear prediction. From (7.13.5) and (7.13.6), we have
$$\delta^2(h) = \left\|\xi(t+h) - \sum_{k=0}^{n}\alpha_k\xi(t_k)\right\|^2 = \left(\xi(t+h) - \sum_{k=0}^{n}\alpha_k\xi(t_k),\ \xi(t+h) - \sum_{k=0}^{n}\alpha_k\xi(t_k)\right)$$
$$= \left(\xi(t+h) - \sum_{k=0}^{n}\alpha_k\xi(t_k),\ \xi(t+h)\right) = C(0) - \sum_{k=0}^{n}\alpha_kC(t + h - t_k). \qquad (7.13.9)$$
From this and (7.13.8), we obtain
$$\delta^2(h) = C(0) - \sum_{k=0}^{n}\sum_{i=0}^{n}\alpha_k\bar\alpha_iC(t_k - t_i). \qquad (7.13.10)$$
Next, we shall attempt to determine the best linear predictor of $\xi(t+h)$ when $T_0 = R_t$. As before [see (7.10.5)], the orthogonal projection of $\xi(t+h)$ on the subspace $\langle\xi(R_t)\rangle$ will be denoted by $P_t\xi(t+h)$. Because $P_t\xi(t+h) \in \langle\xi(R_t)\rangle$, it follows from Proposition 7.6.1 and (7.9.7) that $P_t\xi(t+h)$ has the representation
$$P_t\xi(t+h) = \int_{-\infty}^{\infty} e^{i\lambda t}\varphi(\lambda,h)\,\eta(d\lambda), \qquad (7.13.11)$$
where $\varphi(\lambda,h) \in L_2\{R,\mathscr{R},F\}$ is the spectral characteristic [see (7.9.2)] of the prediction process. As an illustration, consider the previous example, and in particular (7.13.6). Using the spectral representation of the process $\xi(t)$, we obtain
$$P_{T_0}\xi(t+h) = \int_{-\infty}^{\infty}\left(\sum_{k=0}^{n}\alpha_ke^{i\lambda t_k}\right)\eta(d\lambda).$$
Here,
$$\varphi(\lambda,h) = \sum_{k=0}^{n}\alpha_ke^{i\lambda t_k} \qquad (7.13.12)$$
[the right-hand side of (7.13.12) depends on $h$ because the coefficients are functions of $h$].

To determine $\varphi(\lambda,h)$, we must devise an analog of the system of linear equations (7.13.8). The following method is due to Yaglom. Starting from the fact that
$$\xi(t+h) - P_t\xi(t+h) \perp \langle\xi(R_t)\rangle, \qquad (7.13.13)$$
it follows that, for any $s \ge 0$,
$$(\xi(t+h) - P_t\xi(t+h),\ \xi(t-s)) = 0$$
or, equivalently,
$$E\{\xi(t+h)\overline{\xi(t-s)}\} - E\{P_t\xi(t+h)\overline{\xi(t-s)}\} = 0. \qquad (7.13.14)$$
Using the spectral representations
$$\xi(t+h) = \int_{-\infty}^{\infty} e^{i\lambda(t+h)}\,\eta(d\lambda),\qquad \xi(t-s) = \int_{-\infty}^{\infty} e^{i\lambda(t-s)}\,\eta(d\lambda),$$
and (7.13.11), Equation (7.13.14) becomes
$$\int_{-\infty}^{\infty} e^{i\lambda s}\big[e^{i\lambda h} - \varphi(\lambda,h)\big]\,dF(\lambda) = 0, \qquad (7.13.15)$$
or, if $F(\cdot)$ is absolutely continuous,
$$\int_{-\infty}^{\infty} e^{i\lambda s}\big[e^{i\lambda h} - \varphi(\lambda,h)\big]f(\lambda)\,d\lambda = 0. \qquad (7.13.16)$$
The corresponding prediction error is
$$\delta^2(h) = E|\xi(t+h) - P_t\xi(t+h)|^2 = E\left|\int_{-\infty}^{\infty} e^{i\lambda t}\big[e^{i\lambda h} - \varphi(\lambda,h)\big]\,\eta(d\lambda)\right|^2$$
$$= \int_{-\infty}^{\infty}\big|e^{i\lambda h} - \varphi(\lambda,h)\big|^2f(\lambda)\,d\lambda = C(0) - \int_{-\infty}^{\infty}|\varphi(\lambda,h)|^2f(\lambda)\,d\lambda. \qquad (7.13.17)$$

7.14. Evaluation of the Spectral Characteristic $\varphi(\lambda,h)$

According to (7.13.11), the best linear prediction, in the $L_2$ sense, of $\xi(t+h)$, $h > 0$, given all the values of $\xi(s)$ for $s \le t$, is completely characterized by its spectral characteristic $\varphi(\lambda,h)$. In some special cases, the determination of $\varphi(\lambda,h)$ is simple enough (see Example 7.13.1). However, in general, the situation is far more complex. First, the spectral characteristic $\varphi(\lambda,h)$ may not even exist (except as a generalized function). Even if it exists, there is, generally speaking, no sufficiently simple expression for $P_t\xi(t+h)$. Second, from Proposition 7.6.1, we know that $\varphi(\lambda,h) \in L_2\{R,\mathscr{R},F\}$ and must be a q.m. limit of a sequence of linear combinations of $e^{i\lambda s}$, $s \le 0$. Finally, the condition (7.13.15) must be satisfied. As a convenience, we list these requirements below. A function $\varphi(\lambda,h)$ is a spectral characteristic if the following hold:

(i) $\varphi(\lambda,h) \in L_2\{R,\mathscr{R},F\}$, where $F(\cdot)$ is the spectral distribution, which is assumed to be absolutely continuous, so that the spectral density $f(\lambda) = F'(\lambda)$ exists;
(ii) $\varphi(\lambda,h)$ is the q.m. limit of a sequence of linear combinations of $e^{iu\lambda}$, $u \le 0$;
(iii) for all $s \ge 0$,
$$\int_{-\infty}^{\infty} e^{is\lambda}\big[e^{ih\lambda} - \varphi(\lambda,h)\big]f(\lambda)\,d\lambda = 0. \qquad (7.14.1)$$
Write
$$H(\lambda,h) = \big[e^{ih\lambda} - \varphi(\lambda,h)\big]f(\lambda) \qquad (7.14.2)$$
and consider the function $H(z,h)$ of a complex variable $z$, which is assumed to be single-valued and holomorphic (analytic) in the upper half of the complex plane $\{z;\,\operatorname{Im}\{z\} \ge 0\}$. The following proposition gives a sufficient condition for (7.14.1) to hold.

Proposition 7.14.1. If, as $|z| \to \infty$, $|H(z,h)|$ vanishes faster than $1/|z|^{1+\varepsilon}$ for some $\varepsilon > 0$, then (7.14.1) holds.

PROOF. Because $e^{izs}$ and $H(z,h)$ are both holomorphic in the upper half of the complex plane, their product is also holomorphic.
Therefore, according to Cauchy's theorem,
$$\oint_C e^{izs}H(z,h)\,dz = 0$$
for any closed contour $C \subset \{z;\,\operatorname{Im}\{z\} \ge 0\}$. Take $C = C_R$, where $C_R$ is formed by the semicircle $|z| = R$ in the upper half of the complex plane together with the segment $[-R,R]$. Then
$$\int_{-R}^{R} e^{is\lambda}H(\lambda,h)\,d\lambda = -\int_{\Gamma_R} e^{izs}H(z,h)\,dz, \qquad (7.14.3)$$
where $\Gamma_R$ denotes the semicircular arc. However ($M > 0$ is a constant),
$$\left|\int_{\Gamma_R} e^{izs}H(z,h)\,dz\right| \le \max_{z\in\Gamma_R}|H(z,h)|\,\pi R \le \frac{M\pi R}{R^{1+\varepsilon}} \to 0$$
as $R \to \infty$ (note that $|e^{izs}| \le 1$ on $\Gamma_R$ for $s \ge 0$). From this and (7.14.3), the assertion follows. $\square$

In the following, we shall assume that $f(\cdot)$ is bounded, i.e.,
$$\sup_{\lambda\in R} f(\lambda) < \infty, \qquad (7.14.4)$$
and that $\varphi(\lambda,h) \in L_2\{R,\mathscr{R},F\}$ is such that
$$\varphi(z,h) \text{ is holomorphic in } \{z;\,\operatorname{Im}\{z\} \ge 0\}, \qquad (7.14.5)$$
$$\varphi(z,h) = O(|z|^r) \ \text{ as } |z| \to \infty,\ z \in \{z;\,\operatorname{Im}\{z\} \ge 0\}, \ \text{ for some } r \ge 0. \qquad (7.14.6)$$
The conditions of Proposition 7.14.1, together with (7.14.4), (7.14.5), and (7.14.6), are sufficient for (i)-(iii) to hold. We will illustrate with some examples how these conditions are used to construct a spectral characteristic $\varphi(\lambda,h)$. We will confine ourselves to the case where $f(\lambda)$ is a rational function.

EXAMPLE 7.14.1. Let the wide sense stationary process $\{\xi(t);\, t \in R\}$ have the covariance function
$$C(t) = \sigma^2e^{-\alpha|t|},\qquad \alpha > 0.$$
As in Example 7.1.3, we can show that the corresponding spectral density is
$$f(\lambda) = \frac{\sigma^2}{\pi}\,\frac{\alpha}{\lambda^2+\alpha^2} = \frac{\sigma^2\alpha}{\pi}\,\frac{1}{(\lambda+\alpha i)(\lambda-\alpha i)}.$$
Now, we have [see (7.14.2)]
$$H(z,h) = \frac{\sigma^2\alpha}{\pi}\,\frac{e^{izh} - \varphi(z,h)}{(z+\alpha i)(z-\alpha i)}.$$
The function $f(z)$ possesses a single pole, $z = \alpha i$, in the upper half-plane $\{z;\,\operatorname{Im}\{z\} \ge 0\}$. Because $H(z,h)$ is assumed holomorphic in this domain, this pole must be canceled by a zero of the numerator at the same point. Therefore, we must have
$$e^{i(\alpha i)h} - \varphi(\alpha i,h) = 0,\qquad\text{i.e.,}\quad \varphi(\alpha i,h) = e^{-\alpha h}. \qquad (7.14.7)$$
In addition, the requirements of Proposition 7.14.1 must be satisfied. The only function $\varphi(z,h)$ which satisfies all these conditions is the constant
$$\varphi(z,h) = e^{-\alpha h}. \qquad (7.14.8)$$
From this and (7.13.11), we deduce that the best linear predictor $P_t\xi(t+h)$ of $\xi(t+h)$ is
$$P_t\xi(t+h) = e^{-\alpha h}\int_{-\infty}^{\infty} e^{it\lambda}\,\eta(d\lambda) = e^{-\alpha h}\xi(t), \qquad (7.14.9)$$
which depends only on the value of $\xi(t)$ at the last observed instant of time. The prediction error is easily found to be
$$\delta^2(h) = \sigma^2\big(1 - e^{-2\alpha h}\big).$$
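The striking feature of (7.14.9), that the entire past beyond $\xi(t)$ carries no weight, can be seen in a simulation. The sketch below (an addition, not from the original text) discretizes a process with $C(t) = e^{-\alpha|t|}$ as an AR(1) sample and regresses $\xi(t+h)$ on $\xi(t)$ and one earlier value: the earlier value should receive weight near zero and $\xi(t)$ weight near $e^{-\alpha h}$. The parameters are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, h, dt = 1.0, 0.5, 0.01
n = 1_000_000

# Discretized stationary process with C(t) = e^{-alpha |t|} (an AR(1) sample).
a = np.exp(-alpha * dt)
xi = np.empty(n)
xi[0] = rng.standard_normal()
xi[1:] = rng.standard_normal(n - 1) * np.sqrt(1 - a * a)   # innovations, stored
for k in range(1, n):
    xi[k] = a * xi[k - 1] + xi[k]

k = int(h / dt)
# Regress xi(t+h) on (xi(t), xi(t - dt)): the past beyond xi(t) gets weight ~ 0,
# matching P_t xi(t+h) = e^{-alpha h} xi(t) from (7.14.9).
X = np.column_stack([xi[1:-k], xi[:-k - 1]])
coef, *_ = np.linalg.lstsq(X, xi[k + 1:], rcond=None)
print("coefficients:", np.round(coef, 4), " e^{-alpha h} =", round(np.exp(-alpha * h), 4))
```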
The following example is more instructive, although slightly more complicated.

EXAMPLE 7.14.2. Let $\{\xi(t);\ t \in R\}$ be a wide sense stationary stochastic process with covariance function

$$C(t) = \sigma^2\big(\cos\alpha t + \sin\alpha|t|\big)e^{-\alpha|t|}, \qquad \alpha > 0. \qquad (7.14.10)$$

Find the best linear prediction of $\xi(t+h)$ given $\xi(s)$ for all $s \le t$. Let us determine the spectral distribution of $\xi(t)$. Because condition (7.1.9) holds and $C(t)$ is real, we can use formula (7.1.14) to obtain the spectral density

$$f(\lambda) = \frac{1}{\pi}\int_0^{\infty} C(t)\cos\lambda t\,dt = \frac{\sigma^2}{\pi}\int_0^{\infty} e^{-\alpha t}(\cos\alpha t + \sin\alpha t)\cos\lambda t\,dt.$$

After some calculation, we have

$$f(\lambda) = \frac{\sigma^2}{\pi}\,\frac{4\alpha^3}{\lambda^4 + (\alpha\sqrt{2})^4}. \qquad (7.14.11)$$

Because

$$\lambda^4 + (\alpha\sqrt{2})^4 = \big[\lambda^2 + i(\alpha\sqrt{2})^2\big]\big[\lambda^2 - i(\alpha\sqrt{2})^2\big] = \big[\lambda + (1+i)\alpha\big]\big[\lambda - (1+i)\alpha\big]\big[\lambda + (1-i)\alpha\big]\big[\lambda - (1-i)\alpha\big],$$

we obtain

$$f(\lambda) = \frac{\sigma^2}{\pi}\,\frac{4\alpha^3}{[\lambda + (1+i)\alpha][\lambda - (1+i)\alpha][\lambda + (1-i)\alpha][\lambda - (1-i)\alpha]}.$$

Hence,

$$H(\lambda,h) = \frac{\sigma^2}{\pi}\,\frac{4\alpha^3\big[e^{ih\lambda} - \varphi(\lambda,h)\big]}{[\lambda + (1+i)\alpha][\lambda - (1+i)\alpha][\lambda + (1-i)\alpha][\lambda - (1-i)\alpha]}.$$

Because the poles of the denominator in the upper half-plane are $\lambda = (1+i)\alpha$ and $\lambda = -(1-i)\alpha$, it is clear that the numerator must also vanish at these points, so that

$$\varphi\big((1+i)\alpha, h\big) = e^{ih(1+i)\alpha} = e^{-h(1-i)\alpha}, \qquad (7.14.12)$$
$$\varphi\big(-(1-i)\alpha, h\big) = e^{ih(i-1)\alpha} = e^{-h(1+i)\alpha}. \qquad (7.14.13)$$

On the other hand, from condition (i) we see that

$$\int_{-\infty}^{\infty}|\varphi(\lambda,h)|^2 f(\lambda)\,d\lambda < \infty,$$

which clearly holds if

$$\varphi(\lambda,h) = a\lambda + b. \qquad (7.14.14)$$

With this spectral characteristic, all the conditions concerning the asymptotic behavior of $\varphi(z,h)$ and $H(z,h)$ as $|z| \to \infty$ are also satisfied. To determine the coefficients $a$ and $b$ in (7.14.14), we use conditions (7.14.12) and (7.14.13) to obtain the system of linear equations

$$a(1+i)\alpha + b = e^{-h(1-i)\alpha},$$
$$-a(1-i)\alpha + b = e^{-h(1+i)\alpha}.$$

Solving this system, we obtain

$$a = \frac{ie^{-h\alpha}}{\alpha}\sin h\alpha, \qquad b = e^{-h\alpha}(\cos h\alpha + \sin h\alpha).$$

From this and (7.14.14), it follows that

$$\varphi(\lambda,h) = \frac{i\lambda e^{-h\alpha}}{\alpha}\sin h\alpha + e^{-h\alpha}(\cos h\alpha + \sin h\alpha). \qquad (7.14.15)$$

We now substitute (7.14.15) in (7.13.11) to obtain

$$P_t\xi(t+h) = \int_{-\infty}^{\infty} e^{i\lambda t}\left(\frac{i\lambda e^{-h\alpha}}{\alpha}\sin h\alpha + e^{-h\alpha}(\cos h\alpha + \sin h\alpha)\right)\eta(d\lambda)$$
$$= \frac{e^{-h\alpha}\sin h\alpha}{\alpha}\int_{-\infty}^{\infty} i\lambda e^{i\lambda t}\,\eta(d\lambda) + e^{-h\alpha}(\cos h\alpha + \sin h\alpha)\int_{-\infty}^{\infty} e^{i\lambda t}\,\eta(d\lambda).$$

From this and (7.11.16), we conclude that

$$P_t\xi(t+h) = e^{-h\alpha}\left(\frac{\sin h\alpha}{\alpha}\,\xi'(t) + (\cos h\alpha + \sin h\alpha)\,\xi(t)\right). \qquad (7.14.16)$$
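The only computational step in Example 7.14.2 is the $2 \times 2$ linear system for $a$ and $b$. The following sketch (an illustration, not from the text; parameter values assumed) solves it numerically and compares the result with the closed forms above.

```python
import numpy as np

# Check of the 2x2 system in Example 7.14.2: solve for a, b numerically and
# compare with a = i e^{-h alpha} sin(h alpha)/alpha,
#              b = e^{-h alpha} (cos(h alpha) + sin(h alpha)).
alpha, h = 1.3, 0.4
A = np.array([[ (1 + 1j) * alpha, 1.0],
              [-(1 - 1j) * alpha, 1.0]])
rhs = np.array([np.exp(-h * (1 - 1j) * alpha),
                np.exp(-h * (1 + 1j) * alpha)])
a, b = np.linalg.solve(A, rhs)

a_closed = 1j * np.exp(-h * alpha) * np.sin(h * alpha) / alpha
b_closed = np.exp(-h * alpha) * (np.cos(h * alpha) + np.sin(h * alpha))
print(np.allclose([a, b], [a_closed, b_closed]))   # True
```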
7.15. General Form of Rational Spectral Density

In this section we discuss the problem of evaluating the spectral characteristic $\varphi(\lambda,h)$ in the case of rational spectral densities of the form

$$f(\lambda) = K\,\frac{|(\lambda - \alpha_1)\cdots(\lambda - \alpha_k)|^2}{|(\lambda - \beta_1)\cdots(\lambda - \beta_n)|^2}, \qquad (7.15.1)$$

where $K > 0$, $k < n$, and $\{\alpha_j\}$ and $\{\beta_r\}$ are complex numbers such that

$$\mathrm{Im}\,\alpha_j > 0,\ j = 1,\ldots,k, \qquad \mathrm{Im}\,\beta_r > 0,\ r = 1,\ldots,n.$$

EXAMPLE 7.15.1. The spectral density $f(\lambda)$ given by

$$f(\lambda) = K\,\frac{\lambda^2 + \beta^2}{\lambda^4 + \alpha^4}, \qquad K > 0,\ \alpha > 0,\ \beta > 0, \qquad (7.15.2)$$

can be written in the form (7.15.1) as follows. Clearly,

$$\lambda^2 + \beta^2 = |\lambda - i\beta|^2, \qquad \lambda^4 + \alpha^4 = |\lambda^2 - i\alpha^2|^2.$$

Hence,

$$f(\lambda) = K\left|\frac{\lambda - i\beta}{(\lambda - \alpha\sqrt{i})(\lambda + \alpha\sqrt{i})}\right|^2.$$

The function $[e^{i\lambda h} - \varphi(\lambda,h)]f(\lambda)$ will be holomorphic in the upper part of the complex plane if the spectral characteristic $\varphi(\lambda,h)$ has singularities only at the points $\alpha_1,\ldots,\alpha_k$. In such a case, $\varphi(\lambda,h)$ must be of the form

$$\varphi(\lambda,h) = \frac{Q(\lambda,h)}{(\lambda - \alpha_1)\cdots(\lambda - \alpha_k)} = \frac{Q(\lambda,h)}{\prod_1^k(\lambda - \alpha_j)}, \qquad (7.15.3)$$

where $Q(\lambda,h)$ is an entire function (i.e., it has no singularities at finite points). This takes care of condition (7.14.5). It is also required [see condition (i) of Section 7.14] that

$$\int_{-\infty}^{\infty}|\varphi(\lambda,h)|^2 f(\lambda)\,d\lambda < \infty. \qquad (7.15.4)$$

In addition [see (7.14.2)],

$$H(\lambda,h) = K\,\frac{\big\{e^{i\lambda h}\prod_1^k(\lambda - \alpha_j) - Q(\lambda,h)\big\}\prod_1^k(\lambda - \bar{\alpha}_j)}{\big(\prod_1^n(\lambda - \beta_r)\big)\big(\prod_1^n(\lambda - \bar{\beta}_r)\big)} \qquad (7.15.5)$$

must be holomorphic in the upper part of the complex plane and such that condition (7.14.6) holds. All these requirements clearly imply that $Q(\lambda,h)$ must be a polynomial

$$Q(\lambda,h) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1}. \qquad (7.15.6)$$

The coefficients $\{c_j\}_0^{n-1}$ will be determined first under the condition that $\beta_1,\ldots,\beta_n$ in (7.15.1) are all different, i.e.,

$$\beta_i \neq \beta_j \quad \text{for } i \neq j. \qquad (7.15.7)$$

Because, by assumption, $H(\lambda,h)$ is holomorphic in the upper half of the complex plane, it follows readily that

$$e^{i\lambda h}\prod_1^k(\lambda - \alpha_j) - Q(\lambda,h) = 0 \qquad (7.15.8)$$

for $\lambda = \beta_l$, $l = 1,\ldots,n$. Solution of this system of $n$ linear equations gives the unknown constants $c_0,\ldots,c_{n-1}$.

EXAMPLE 7.15.2. Determine the best linear predictor for $\xi(t+h)$, $h > 0$, based on $\xi(s)$, $s \le t$, if the spectral density of the process $\xi(t)$ is given by

$$f(\lambda) = \frac{1}{(\lambda^2 + 2\alpha^2)(\lambda^2 + \alpha^2/2)}, \qquad \alpha > 0.$$

We can clearly write

$$f(\lambda) = \frac{1}{\big|(\lambda - i\alpha\sqrt{2})(\lambda - i\alpha/\sqrt{2})\big|^2}, \qquad (7.15.9)$$

so that $n = 2$, $\beta_1 = i\alpha\sqrt{2}$, $\beta_2 = i\alpha/\sqrt{2}$, and conditions (7.15.1) and (7.15.7) are satisfied. From this it clearly follows that $Q(\lambda,h) = c_0 + c_1\lambda$, so that Equation (7.15.8) becomes

$$e^{i\lambda h} - (c_0 + c_1\lambda) = 0 \quad \text{for } \lambda = i\alpha\sqrt{2} \text{ and } \lambda = i\alpha/\sqrt{2}.$$

Therefore, we have the system

$$c_0 + i\alpha\sqrt{2}\,c_1 = \exp(-\alpha h\sqrt{2}),$$
$$c_0 + \frac{i\alpha}{\sqrt{2}}\,c_1 = \exp(-\alpha h/\sqrt{2}),$$

from which we obtain

$$c_0 = \big[2 - \exp(-\alpha h/\sqrt{2})\big]\exp(-\alpha h/\sqrt{2}), \qquad c_1 = \frac{i\sqrt{2}}{\alpha}\big[1 - \exp(-\alpha h/\sqrt{2})\big]\exp(-\alpha h/\sqrt{2}). \qquad (7.15.10)$$

According to (7.15.3), we now have $\varphi(\lambda,h) = c_0 + c_1\lambda$, so that the best linear predictor is [see (7.13.11)]

$$P_t\xi(t+h) = \int_{-\infty}^{\infty} e^{i\lambda t}\varphi(\lambda,h)\,\eta(d\lambda) = c_0\int_{-\infty}^{\infty} e^{i\lambda t}\,\eta(d\lambda) + c_1\int_{-\infty}^{\infty}\lambda e^{i\lambda t}\,\eta(d\lambda) = c_0\xi(t) + c_1\xi'(t),$$

with the coefficients $c_0$ and $c_1$ given by (7.15.10).

Remark 7.15.1. The last example is easy to generalize. If the spectral density of $\xi(t)$ is of the form

$$f(\lambda) = \frac{1}{|(\lambda - \beta_1)\cdots(\lambda - \beta_n)|^2},$$

then clearly

$$\varphi(\lambda,h) = \sum_{j=0}^{n-1} c_j\lambda^j,$$

where the $c_j$ can be obtained as the solution of the system of linear equations

$$\sum_{j=0}^{n-1} c_j\lambda^j = e^{i\lambda h} \qquad (7.15.11)$$

for $\lambda = \beta_j$, $j = 1,\ldots,n$. In this case, the best linear predictor of $\xi(t+h)$, given $\xi(s)$ for all $s \le t$, is

$$P_t\xi(t+h) = c_0\xi(t) + c_1\xi'(t) + \cdots + c_{n-1}\xi^{(n-1)}(t). \qquad (7.15.12)$$

When the zeros $\beta_1,\ldots,\beta_n$ in (7.15.1) are not all different, the procedure for determining the spectral characteristic $\varphi(\lambda,h)$ remains essentially the same, as the following example shows.

EXAMPLE 7.15.3. Let the spectral density be given by

$$f(\lambda) = \frac{1}{(\lambda^2 + \alpha^2)^2}.$$

In this case, we clearly have $n = 2$ with $\beta_1 = \beta_2 = i\alpha$, so that

$$\varphi(\lambda,h) = Q(\lambda,h) = c_0 + c_1\lambda.$$

We obtain the coefficients $c_0$ and $c_1$ from the fact that, in such a case,

$$e^{ih\lambda} - (c_0 + c_1\lambda) = 0 \quad \text{for } \lambda = i\alpha$$

and

$$\frac{d}{d\lambda}\big[e^{ih\lambda} - (c_0 + c_1\lambda)\big] = 0 \quad \text{for } \lambda = i\alpha.$$

This yields the equations

$$c_0 + c_1 i\alpha = e^{-\alpha h}, \qquad c_1 = ihe^{-\alpha h},$$

the solution of which is

$$c_0 = (1 + \alpha h)e^{-\alpha h}, \qquad c_1 = ihe^{-\alpha h}. \qquad (7.15.13)$$

The best linear predictor of $\xi(t+h)$ in this case is

$$P_t\xi(t+h) = c_0\xi(t) + c_1\xi'(t),$$

where $c_0$ and $c_1$ are given by (7.15.13). In general, if $\beta_1$ appears $l_1$ times, ..., $\beta_r$ appears $l_r$ times, where $l_1 + \cdots + l_r = n$, we have

$$\frac{d^\nu}{d\lambda^\nu}\left(e^{i\lambda h}\prod_{j=1}^k(\lambda - \alpha_j) - \sum_{j=0}^{n-1} c_j\lambda^j\right) = 0 \quad \text{for } \lambda = \beta_p, \qquad \nu = 0,1,\ldots,l_p - 1, \quad p = 1,\ldots,r. \qquad (7.15.14)$$
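The recipe of Remark 7.15.1 amounts to solving a Vandermonde system in the poles $\beta_1,\ldots,\beta_n$. A minimal Python sketch (an illustration, not from the text; it assumes $k = 0$ and distinct poles, and the function name is hypothetical):

```python
import numpy as np

# Recipe of Remark 7.15.1 (k = 0, distinct poles): solve
#   sum_j c_j * beta_l^j = e^{i beta_l h},  l = 1, ..., n,
# for the polynomial coefficients c_j of the spectral characteristic.
def predictor_coefficients(betas, h):
    betas = np.asarray(betas, dtype=complex)
    V = np.vander(betas, increasing=True)   # rows [1, beta, beta^2, ...]
    return np.linalg.solve(V, np.exp(1j * betas * h))

# Example 7.15.2: beta_1 = i*sqrt(2)*alpha, beta_2 = i*alpha/sqrt(2)
alpha, h = 1.0, 0.3
c0, c1 = predictor_coefficients([1j * np.sqrt(2) * alpha,
                                 1j * alpha / np.sqrt(2)], h)
r = np.exp(-alpha * h / np.sqrt(2))
print(np.isclose(c0, (2 - r) * r))                            # True
print(np.isclose(c1, 1j * np.sqrt(2) / alpha * (1 - r) * r))  # True
```

For repeated poles one would differentiate as in (7.15.14), which leads to a confluent Vandermonde system.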
Problems and Complements

7.1. The covariance function of a wide sense stationary process $\{\xi(t);\ t \in R\}$ is given by ($t_0 > 0$)

$$C(t) = \begin{cases} 1 - |t|/t_0 & \text{if } |t| \le t_0 \\ 0 & \text{if } |t| > t_0. \end{cases}$$

Determine its spectral density.

7.2. Let $C(t) = e^{-|t|}\cos\alpha t$ be the covariance function of a wide sense stationary process. Find its spectral density.

7.3. Does there exist a wide sense stationary process whose covariance function is defined by

$$C(t) = \begin{cases} \sigma^2 & \text{if } 0 \le t \le t_0 \\ 0 & \text{if } t > t_0? \end{cases}$$

7.4. Let $X$ be a r.v. distributed on $[0,\pi]$ with probability density $f_X(\cdot)$. Let $Y$ be another r.v. independent of $X$ and uniformly distributed on $[-\pi,\pi]$. Show that $\xi(t) = \sigma\cos(tX + Y)$ is wide sense stationary and find its spectral density.

7.5. A wide sense stationary process $\{\xi(t);\ t \in R\}$ has the covariance function $C(t) = \sigma^2 e^{-\alpha|t|}(1 + |t|)$, $\alpha > 0$. Find its spectral density.

7.6. Find the spectral density of a wide sense stationary process with covariance function $C(t) = \sigma^2 e^{-\alpha|t|}\cos\beta t$, $\alpha > 0$.

7.7. Is the wide sense stationary process with spectral density

(i) $f(\lambda) = K/(\lambda^2 + \alpha^2)$,
(ii) $f(\lambda) = K\left\{\dfrac{1}{(\lambda - \beta)^2 + \alpha^2} + \dfrac{1}{(\lambda + \beta)^2 + \alpha^2}\right\}$, $\alpha > 0$,

differentiable?

7.8. What are the conditions for infinite differentiability of a wide sense stationary process (see Example 7.11.3)?

7.9. The wide sense stationary process $\{\xi(t);\ t \in R\}$ has covariance function $C(t) = e^{-\alpha|t|}\big(1 + \alpha|t| + (\alpha t)^2\big)$. Determine its spectral density.

7.10. Find the spectral density of the wide sense stationary process with covariance function $C(t) = e^{-\alpha|t|}(\cos\beta t - \alpha\sin\beta|t|)$.

7.11. Show that the orthogonal stochastic measure $\eta$ has the following properties (see Section 7.3):
(i) If $\Delta_1, \Delta_2 \in \mathcal{E}$, $(\eta(\Delta_1), \eta(\Delta_2)) = m(\Delta_1 \cap \Delta_2)$.
(ii) If $A, B \in \mathcal{E}$ are disjoint, $\eta(A \cup B) = \eta(A) + \eta(B)$ (a.s.).
(iii) If $\{B_k\}_1^\infty \subset \mathcal{E}$ is a disjoint sequence, $\eta\big(\bigcup_1^\infty B_k\big) = \sum_1^\infty \eta(B_k)$ (a.s.).

7.12. Let $\eta: \mathcal{E} \times \Omega \to L^2\{\Omega, \mathcal{B}, P\}$ be a stochastic orthogonal measure, and let $m: \mathcal{E} \to R_+$ [see (7.3.3) for the definition] be the measure associated with $\eta$. Define

$$\eta_1(B) = \int_B h(s)\,\eta(ds), \qquad B \in \mathcal{E},$$

where $h \in L^2\{R, \mathcal{R}, m\}$. Show that $\eta_1$ is an orthogonal stochastic measure with associated measure $m_1(B) = \int_B |h(s)|^2\,m(ds)$.

7.13. (Continuation) If $\rho(\cdot) \in L^2\{R, \mathcal{R}, m_1\}$, show that

$$\int_B \rho(s)\,\eta_1(ds) = \int_B \rho(s)h(s)\,\eta(ds) \quad \text{(a.s.)}, \qquad B \in \mathcal{E}.$$

7.14. (Continuation) If $|h| > 0$ (a.e. $[m]$), then

$$\eta(B) = \int_B h^{-1}(s)\,\eta_1(ds), \qquad B \in \mathcal{E}.$$

7.15. Let $\{Z_n\}_1^\infty \subset L^2\{\Omega, \mathcal{B}, P\}$ be an orthogonal family such that $\sum_1^\infty \|Z_n\|^2 < \infty$. Let $\{S, \mathcal{E}\}$ be a measurable space with $\mathcal{E}$ containing singletons as elements, i.e., $x \in S \Rightarrow \{x\} \in \mathcal{E}$. Define $\eta: \mathcal{E} \to L^2\{\Omega, \mathcal{B}, P\}$ by

$$\eta(B) = \sum_{n=1}^{\infty} Z_n 1_B(x_n), \qquad \{x_n\}_1^\infty \subset S, \quad B \in \mathcal{E}.$$

Show that $\eta(\cdot)$ is an orthogonal stochastic measure with associated measure $m(\cdot)$ given by

$$m(B) = \sum_{n=1}^{\infty} \|Z_n\|^2 1_B(x_n).$$

Show also that

$$\int_S f(s)\,\eta(ds) = \sum_{n=1}^{\infty} f(x_n)Z_n, \qquad f \in L^2\{S, \mathcal{E}, m\}.$$

7.16. Let $\eta$ be an orthogonal random measure with $m(\cdot)$ as associated measure. If $B_1, B_2 \in \mathcal{E}$, show that
(i) $\|\eta(B_1) - \eta(B_2)\|^2 = m(B_1 \,\Delta\, B_2)$,
(ii) $\eta(B_1 \cup B_2) = \eta(B_1) + \eta(B_2) - \eta(B_1 \cap B_2)$ (a.s.),
(iii) $\eta(B_1 \,\Delta\, B_2) = \eta(B_1) + \eta(B_2) - 2\eta(B_1 \cap B_2)$ (a.s.).

7.17. (Continuation) If $B_1 \subset B_2$, show that
(i) $\|\eta(B_1)\| \le \|\eta(B_2)\|$,
(ii) $\eta(B_2 - B_1) = \eta(B_2) - \eta(B_1)$ (a.s.).

7.18. Let $\{\xi(t);\ t \in R\}$ be a wide sense stationary process with $E\{\xi(t)\} = 0$ and spectral measure $F$ (we use the same symbol for the Lebesgue-Stieltjes measure and the function $F$ that generates it). Show that

$$\operatorname*{l.i.m.}_{t\to\infty} \frac{1}{t}\int_0^t \xi(s)\,ds = \eta(\{0\}).$$

7.19. (Continuation) Under the same conditions as in the previous problem, show that

$$\lim_{t\to\infty} \frac{1}{t}\int_0^t C(s)\,ds = F(\{0\}),$$

where $C$ is the covariance function of $\xi(t)$.

7.20. Let $\{X(t);\ t \in R\}$ be a wide sense stationary process with continuous covariance function and spectral representation

$$X(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\,\eta_X(d\lambda).$$

Let $\xi(t)$ be defined by

$$\xi(t) = \int_{-\infty}^{\infty} h(s)X(t-s)\,ds,$$

where $h \in L^2\{R, \mathcal{R}, F_X\}$. Determine the covariance function and the spectral density of $\xi(t)$, assuming that the spectral density of $X(t)$ is $\varphi_X(\cdot)$.

7.21. Let $\{\xi(t);\ t \ge 0\}$ be a standard Brownian motion and $Z$ a r.v. independent of $\xi(t)$, with $E\{Z\} = 0$ and $E|Z|^2 = 1$. Suppose that the process $\{X(t);\ t \ge 0\}$ is observed, where $X(t) = Zt + \xi(t)$. Given $X(s)$, $s \le t$, determine the best q.m. estimate $\hat{Z}_t$ of $Z$ in the form

$$\hat{Z}_t = \int_0^t \psi(t,s)\,dX(s).$$

7.22. Let the process $\{X(t);\ t \in R\}$ be defined by $X(t) = Z\cos 2\alpha t + N(t)$, where $\{N(t);\ t \in R\} \subset L^2\{\Omega, \mathcal{B}, P\}$ with $E\{N(t)\} = 0$, $E\{N(s)N(t+s)\} = e^{-|t|}$, and $Z$ is a r.v. with $E\{Z\} = 0$, $E|Z|^2 = 1$, and $E\{ZN(t)\} = 0$. Determine a q.m. estimator $\hat{Y}_t$ of $Y_t = Z\cos 2\alpha t$ given $X(s)$ for all $s \le t$ in the form

$$\hat{Y}_t = \int_0^t \psi(t,s)\,dX(s) + X(0).$$

7.23. Suppose we observe $X(t) = \xi(t) + \zeta(t)$, where $\{\xi(t);\ t \in R\}$ is a wide sense stationary process with spectral density $f_\xi(\lambda)$ and $\{\zeta(t);\ t \in R\}$ a stationary noise process with spectral density $f_\zeta(\lambda)$. Assuming that $\xi(t)$ and $\zeta(s)$ are orthogonal for all $s, t \in R$, show that the best q.m. linear estimate of $\xi(t)$ based on $X(s)$ is given by

$$\int_{-\infty}^{\infty} e^{i\lambda t}\,\frac{f_\xi(\lambda)}{f_\zeta(\lambda) + f_\xi(\lambda)}\,\eta_X(d\lambda).$$

7.24. (Continuation) Show that the best linear estimate of $\xi(t+h)$ given $X(s)$, $s \le t$, is given by

$$\int_{-\infty}^{\infty} e^{i\lambda t}\varphi(\lambda,h)\,\eta_X(d\lambda).$$

7.25. In Problem 7.23, suppose that $f_\zeta$ and $f_\xi$ are specified as follows:

$$f_\zeta(\lambda) = \frac{K_1}{\lambda^2 + \alpha^2}, \qquad f_\xi(\lambda) = \frac{K_2}{\lambda^2 + \beta^2}, \qquad K_i > 0,\ \alpha > 0,\ \beta > 0.$$

Show that, for $h > 0$,

$$\varphi(\lambda,h) = \frac{K_1}{K_1 + K_2}\cdot\frac{(\alpha + \beta)(\alpha + \gamma)}{(\beta + i\lambda)(\gamma + i\lambda)}\,e^{-\alpha h},$$

where $\gamma^2 = (K_1\beta^2 + K_2\alpha^2)/(K_1 + K_2)$.
CHAPTER 8

Markov Processes I

8.1. Introduction

The concept of a Markov random process was defined in Section 1.7 of Chapter 1. Various particular cases were discussed in subsequent chapters. Without doubt, this has been the most extensively studied class of random processes. Let $\{\xi(t);\ t \in T\}$ be a Markov process on a probability space $\{\Omega, \mathcal{B}, P\}$ assuming values in a set $S$. Denote by $\mathcal{S}$ a $\sigma$-algebra of subsets of $S$. As usual, the measure space $\{S, \mathcal{S}\}$ will be called the "state space" of the random process $\xi(t)$. The process $\xi(t)$ is said to be a homogeneous Markov process if its transition probability has the property

$$P\{\xi(s+t) \in B \mid \xi(s) = x\} = P(x, t, B) \qquad (8.1.1)$$

for all $t > 0$, $s, s+t \in T$, $x \in S$, and $B \in \mathcal{S}$. In the following (unless otherwise stated), we will deal exclusively with homogeneous Markov processes. The transition probability (8.1.1) must satisfy the Chapman-Kolmogorov equation

$$P(x, s+t, B) = \int_S P(x, s, dy)P(y, t, B). \qquad (8.1.2)$$

In addition, the following must also hold:

For each $t \in T$ and $x \in S$, $P(x, t, \cdot)$ is a probability measure on $\mathcal{S}$, $\qquad (8.1.3)$

and, for every $t \in T$ and $B \in \mathcal{S}$, $P(\cdot, t, B)$ is an $\mathcal{S}$-measurable function on $S$. $\qquad (8.1.4)$

In the following, we take $T = [0, \infty)$.

The class of homogeneous Markov processes has an important feature. From the Chapman-Kolmogorov equation (8.1.2), we see that knowing the transition probability (8.1.1) for all $x$, $B$, and $s \le \varepsilon$, where $\varepsilon > 0$ is arbitrarily small, we can determine $P(x, u, B)$ for any $u > \varepsilon$. In other words, the local behavior of a homogeneous Markov process (in a neighborhood of zero) determines its global behavior. Every function $P(x, t, B)$ having properties (8.1.2)-(8.1.4) is said to be a transition probability. Let $\pi(\cdot)$ be a probability distribution on the state space $\{S, \mathcal{S}\}$. If $P(x, t, B)$ is a transition probability, does there exist a Markov process with $\pi(\cdot)$ as its initial distribution (see Definition 1.5.3) and transition probability $P(x, t, B)$? The answer is affirmative. This is a stochastic process $\{\xi(t);\ t \ge 0\}$ with marginal distributions

$$P\{\xi(t_1) \in B_1, \ldots, \xi(t_n) \in B_n\} = \int_S \pi(dx_0)\int_{B_1} P(x_0, t_1, dx_1)\cdots\int_{B_n} P(x_{n-1}, t_n - t_{n-1}, dx_n), \qquad (8.1.5)$$

where $0 < t_1 < \cdots < t_n$, $\{B_i\}_1^n \subset \mathcal{S}$, $n = 1, 2, \ldots$, and

$$\pi(B) = P\{\xi(0) \in B\}. \qquad (8.1.6)$$

The family of finite-dimensional distributions (8.1.5) is consistent and, according to the Kolmogorov theorem 1.3.1, determines completely and uniquely a probability measure, which will be denoted by $P_\pi$, on the measurable space $\{S^{R_+}, \mathcal{S}^{R_+}\}$, where, as usual, $R_+ = [0, \infty)$. In the sequel, the following notation will often be useful. For any $x \in S$, we will write

$$P_x(\cdot) = P_\pi\{\cdot \mid \xi(0) = x\}. \qquad (8.1.7)$$

Because $\pi(B) = P_\pi\{\xi(0) \in B\}$, we have

$$P_\pi(G) = \int_S P_x(G)\,\pi(dx), \qquad (8.1.8)$$

where $G \in \mathcal{S}^{R_+}$. The expectation operator corresponding to $P_x$ will be denoted by $E_x$.
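Before turning to an example, it may help to see the Chapman-Kolmogorov equation (8.1.2) numerically. The following Python sketch (an illustration under assumed parameter values, not from the text) uses the Gaussian kernel of Brownian motion, for which (8.1.2) can be checked by quadrature.

```python
import numpy as np
from scipy.stats import norm

# Numerical check of Chapman-Kolmogorov (8.1.2) for the Brownian kernel
# P(x, t, dy) = N(x, t): an s-step composed with a t-step must reproduce
# the (s+t)-step probability of the set B = (a, b).
x, s, t = 0.3, 0.7, 1.1
a, b = -1.0, 2.0

lhs = norm.cdf(b, x, np.sqrt(s + t)) - norm.cdf(a, x, np.sqrt(s + t))

y = np.linspace(x - 12, x + 12, 200_001)       # effectively the whole line
inner = norm.cdf(b, y, np.sqrt(t)) - norm.cdf(a, y, np.sqrt(t))
rhs = np.trapz(inner * norm.pdf(y, x, np.sqrt(s)), y)

print(lhs, rhs)   # agree to ~1e-10
```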
EXAMPLE 8.1.1. As an illustration, consider a homogeneous Markov process $\{\xi(t);\ t \ge 0\}$ with state space $S = \{-1, 1\}$, an initial distribution $\pi(\{1\}) = \frac{1}{2}$, and a transition probability satisfying

$$P(1, t, \{1\}) = P(-1, t, \{-1\}) = \alpha(t), \qquad (8.1.9)$$

where $\alpha(t)$ is a continuous function on $[0, \infty)$ such that $\alpha(0) = 1$. We want to determine $P_\pi\{\xi(t) = 1\}$ and $\alpha(t)$ for all $t \ge 0$.

The first part of the problem is simple. For every $t > 0$,

$$P_\pi\{\xi(t) = 1\} = \tfrac{1}{2}P(1, t, \{1\}) + \tfrac{1}{2}P(-1, t, \{1\}).$$

From this and (8.1.9), we have $P_\pi\{\xi(t) = 1\} = \frac{1}{2}$. To determine $\alpha(t)$, we will use the fact that $P(1, t, \{1\})$ is the transition probability of the Markov process $\xi(t)$ and, therefore, it must satisfy the Chapman-Kolmogorov equation

$$P(1, t+s, \{1\}) = P(1, s, \{1\})P(1, t, \{1\}) + P(1, s, \{-1\})P(-1, t, \{1\}).$$

From this and (8.1.9), we readily deduce the functional equation

$$\alpha(t+s) = \alpha(s)\alpha(t) + [1 - \alpha(s)][1 - \alpha(t)] = 2\alpha(s)\alpha(t) + 1 - \alpha(s) - \alpha(t). \qquad (8.1.10)$$

Now, by means of the substitution $\alpha(t) = \frac{1}{2}[1 + h(t)]$, Equation (8.1.10) becomes

$$h(t+s) = h(s)h(t). \qquad (8.1.11)$$

This is clearly the Cauchy functional equation, whose only continuous solution here is $h(t) = e^{-\lambda t}$, $\lambda > 0$. From this and (8.1.11), we obtain

$$\alpha(t) = \frac{1 + e^{-\lambda t}}{2}.$$

Remark 8.1.1. A conceptually more complex notion of a Markov process was described in Dynkin's book Markov Processes. According to Dynkin's definition of a Markov process, its evolution takes place only up to a certain random time, and then it terminates. More precisely, at time $\tau(\omega)$, the trajectory $\xi(t, \omega)$ vanishes. For instance, if $\{\xi(t);\ t \ge 0\}$ is a Markov process on a probability space $\{\Omega, \mathcal{B}, P\}$ and a random time $\tau: \Omega \to (0, \infty)$ is given, then, for any $\omega \in \Omega$, the trajectory $\xi(t, \omega)$ is defined only on $[0, \tau(\omega)]$. The Markov processes considered in this book are those for which $P\{\tau(\omega) = +\infty\} = 1$. Such processes are often called "conservative."

Remark 8.1.2. It is reasonable to suppose that no transition can take place in zero time. For this reason, we will assume from now on that

$$P(x, 0, B) = \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B. \end{cases} \qquad (8.1.12)$$

We will also assume that $P(x, t, B)$ is continuous at zero, i.e., that

$$\lim_{t\to 0^+} P(x, t, B) = \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B, \end{cases} \qquad (8.1.13)$$

for all $x \in S$ and $B \in \mathcal{S}$.
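In matrix form, Example 8.1.1 says that the $2 \times 2$ transition matrix built from $\alpha(t) = (1 + e^{-\lambda t})/2$ satisfies the semigroup property, the matrix analog of the Chapman-Kolmogorov equation. A short check (the value of $\lambda$ is arbitrary):

```python
import numpy as np

# Example 8.1.1: with alpha(t) = (1 + e^{-lambda t})/2 the transition
# matrices satisfy M_{s+t} = M_s M_t.
lam = 0.9

def M(t):
    a = (1 + np.exp(-lam * t)) / 2
    return np.array([[a, 1 - a],
                     [1 - a, a]])

s, t = 0.4, 1.3
print(np.allclose(M(s + t), M(s) @ M(t)))   # True
```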
8.2. Invariant Measures

Let $\{\xi(t);\ t \ge 0\}$ be a (not necessarily homogeneous) Markov process with state space $\{S, \mathcal{S}\}$ and transition probability $P(s, t, x, B)$ ($0 \le s < t$). It is not difficult to see that if $\xi(t)$ is strictly stationary, then it must be a homogeneous Markov process. This readily follows from

$$P(s, t, x, B) = P\{\xi(t) \in B \mid \xi(s) = x\} = P\{\xi(t-s) \in B \mid \xi(0) = x\} = P(0, t-s, x, B) = P(x, t-s, B).$$

The converse, however, does not hold in general: a homogeneous Markov process is not necessarily strictly stationary. Under what conditions is a homogeneous Markov process strictly stationary? To answer this question, we need the concept of an invariant measure.

Definition 8.2.1. An invariant measure of a homogeneous Markov process with a state space $\{S, \mathcal{S}\}$ and transition probability $P(x, t, B)$ is a measure $\mu$ on $\{S, \mathcal{S}\}$ satisfying the following condition: for each $B \in \mathcal{S}$ and $t \ge 0$,

$$\mu(B) = \int_S P(x, t, B)\,\mu(dx). \qquad (8.2.1)$$

An invariant measure is not necessarily a probability measure on $\{S, \mathcal{S}\}$. When $\mu(S) = 1$, it is often called a "stationary distribution" of the Markov process.

EXAMPLE 8.2.1. Let $\{\xi(t);\ t \ge 0\}$ be a standard Brownian motion process. As we know (see Remark 3.1.1), $\xi(t)$ is a homogeneous Markov process with the state space $\{R, \mathcal{R}\}$ and transition probability

$$P(x, t, B) = (2\pi t)^{-1/2}\int_B \exp\left(-\frac{(y-x)^2}{2t}\right)dy. \qquad (8.2.2)$$

From this, we readily deduce that

$$P(x, t, B) \to 0 \quad \text{as } t \to +\infty \qquad (8.2.3)$$

for every compact $B \in \mathcal{R}$. Therefore, if there exists a finite invariant measure $\mu$, we must have

$$\mu(B) = \int_{-\infty}^{\infty} P(x, t, B)\,\mu(dx)$$

for every $t \ge 0$ and compact $B \in \mathcal{R}$. By letting $t \to +\infty$, we would get from (8.2.3) that $\mu(B) = 0$ for all compact $B \in \mathcal{R}$. Consequently, the only finite invariant measure of a standard Brownian motion is $\mu \equiv 0$. On the other hand, if $\mu$ is the Lebesgue measure, we have, for all $t > 0$,

$$\int_{-\infty}^{\infty} P(x, t, B)\,dx = \int_{-\infty}^{\infty}\int_B P(x, t, dy)\,dx = \int_B\left[\int_{-\infty}^{\infty}(2\pi t)^{-1/2}\exp\left(-\frac{(y-x)^2}{2t}\right)dx\right]dy = \int_B dy.$$

In other words, when $\mu$ is the Lebesgue measure, (8.2.1) holds. This clearly implies that the Lebesgue measure is an invariant measure of a standard Brownian motion process.

Let $\{\xi(t);\ t \ge 0\}$ be a strictly stationary Markov process. Then, as we have just seen, it must be homogeneous. Denote by $P(x, t, B)$ its transition probability and by $\pi(\cdot)$ its initial distribution. Clearly then, for all $t$,

$$\pi(B) = P\{\xi(t) \in B\}, \qquad B \in \mathcal{S}.$$

From this, we obtain

$$\pi(B) = \int_S P(x, t, B)\,\pi(dx).$$

Consequently, every one-dimensional marginal distribution of the process is an invariant measure. Conversely, if $\mu$ is a stationary distribution of a homogeneous Markov process, then the process is also strictly stationary under $P_\mu$. As a matter of fact, because $\mu$ can be considered as the initial distribution of the homogeneous Markov process, say $\{\xi(t);\ t \ge 0\}$, with a state space $\{S, \mathcal{S}\}$, we have, for any $t > 0$ and $B \in \mathcal{S}$,

$$P\{\xi(t) \in B\} = \int_S P\{\xi(0) \in dx,\ \xi(t) \in B\} = \int_S P(x, t, B)\,\mu(dx) = \mu(B).$$

From this, we deduce [see (8.1.5)] that

$$P\{\xi(t_1+s) \in B_1,\ \xi(t_2+s) \in B_2\} = \int_S\int_{B_1}\int_{B_2} P\{\xi(s) \in dx,\ \xi(t_1+s) \in dx_1,\ \xi(t_2+s) \in dx_2\}$$
$$= \int_S\int_{B_1}\int_{B_2}\mu(dx)\,P(x, t_1, dx_1)\,P(x_1, t_2 - t_1, dx_2) = P\{\xi(t_1) \in B_1,\ \xi(t_2) \in B_2\},$$

and so on.
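For a finite state space, (8.2.1) reduces to the matrix fixed point $\mu M_t = \mu$ for all $t$, and a stationary distribution can be computed from the generator introduced in the next section. A hedged Python sketch (the generator below is illustrative only, not from the text):

```python
import numpy as np
from scipy.linalg import expm

# Stationary distribution in the sense of (8.2.1) for a finite chain:
# mu A = 0 for the generator A implies mu M_t = mu for every t, since
# M_t = expm(t A).
A = np.array([[-1.0,  1.0,  0.0],
              [ 0.5, -1.5,  1.0],
              [ 0.3,  0.7, -1.0]])

# left null vector of A, normalized to a probability vector
w, V = np.linalg.eig(A.T)
mu = np.real(V[:, np.argmin(np.abs(w))])
mu /= mu.sum()

for t in (0.5, 2.0, 10.0):
    print(np.allclose(mu @ expm(t * A), mu))   # True, True, True
```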
8.3. Countable State Space

The class of homogeneous Markov processes with countable state space is of considerable interest in a variety of applications, particularly in biology and engineering. In this section, we outline some fundamental properties of this particular class. Let $\{\xi(t);\ t \ge 0\}$ be a homogeneous Markov process with countable state space $S$. Processes of this type are often called "continuous time Markov chains" or simply Markov chains. In the following, without loss of generality, we may assume that $S = \{0, 1, \ldots\}$. We will denote by ($s, t \ge 0$)

$$p_{ij}(t) = P\{\xi(s+t) = j \mid \xi(s) = i\} \qquad (8.3.1)$$

the transition probability from a state $i \in S$ to another state $j \in S$ after a time-lapse of duration $t$. In accordance with (8.1.12) and (8.1.13), we have that

$$p_{ij}(0) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases} \qquad (8.3.2)$$

and that

$$\lim_{t\to 0^+} p_{ij}(t) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases} \qquad (8.3.3)$$

The last condition means that $p_{ij}(t)$ is continuous at $t = 0$. For any $t \ge 0$ and $i, j \in S$,

$$0 \le p_{ij}(t) \le 1 \quad \text{and} \quad \sum_{k=0}^{\infty} p_{ik}(t) = 1, \qquad (8.3.4)$$

so that the transition probability matrix

$$M_t = \begin{pmatrix} p_{00}(t) & p_{01}(t) & \cdots \\ p_{10}(t) & p_{11}(t) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \qquad (8.3.5)$$

is a stochastic matrix. Due to (8.3.3), we clearly have that

$$\lim_{t\to 0^+} M_t = I,$$

where $I$ is the unit matrix. The Chapman-Kolmogorov equation has the form ($s, t \ge 0$)

$$p_{ij}(s+t) = \sum_{k=0}^{\infty} p_{ik}(s)p_{kj}(t). \qquad (8.3.6)$$

By means of this equation, one readily obtains

$$M_{s+t} = M_s \cdot M_t. \qquad (8.3.7)$$

This clearly implies that the family of transition matrices $\{M_t;\ t \ge 0\}$ forms a semigroup. Next, we shall show that $p_{ij}(t)$ is continuous for all $t \ge 0$.

Proposition 8.3.1. For every $i, j \in S$ and $s, t \ge 0$,

$$|p_{ij}(s+t) - p_{ij}(s)| \le 1 - p_{ii}(t). \qquad (8.3.8)$$

PROOF. From the Chapman-Kolmogorov equation (8.3.6), we obtain

$$p_{ij}(s+t) - p_{ij}(s) = \sum_{k\neq i} p_{ik}(t)p_{kj}(s) - p_{ij}(s) + p_{ii}(t)p_{ij}(s). \qquad (8.3.9)$$

On the other hand,

$$\sum_{k\neq i} p_{ik}(t)p_{kj}(s) \le \sum_{k\neq i} p_{ik}(t) = 1 - p_{ii}(t),$$

which yields

$$p_{ij}(s+t) - p_{ij}(s) \le [1 - p_{ii}(t)][1 - p_{ij}(s)].$$

In a similar fashion, we deduce from (8.3.9) that

$$-[p_{ij}(s+t) - p_{ij}(s)] = [1 - p_{ii}(t)]p_{ij}(s) - \sum_{k\neq i} p_{ik}(t)p_{kj}(s) \le p_{ij}(s)[1 - p_{ii}(t)] \le 1 - p_{ii}(t),$$

which proves (8.3.8). □

Corollary 8.3.1. From (8.3.8), we have that, for any $0 < t < s$,

$$|p_{ij}(s-t) - p_{ij}(s)| \le 1 - p_{ii}(t).$$

This, (8.3.8), and (8.3.3) clearly imply that $p_{ij}(t)$ is uniformly continuous on $R_+$ for all $i, j \in S$.

The question of differentiability of $p_{ij}(t)$ is of central importance in applications. This stems from the fact that attempts to calculate $p_{ij}(t)$, under various regularity assumptions on the Markov chain $\xi(t)$, lead to a system of difference-differential equations whose solution may give $p_{ij}(t)$. Further discussion of this subject requires the following lemma, due to Hille.

Lemma 8.3.1. Let $H: R \to R$, where $H(t) = 0$ for all $t \le 0$, be subadditive, i.e., for all $s, t \in R$,

$$H(s+t) \le H(s) + H(t); \qquad (8.3.10)$$

then

$$\lim_{t\to 0^+} \frac{H(t)}{t} = \sup_{0\le t<\infty} \frac{H(t)}{t} = c, \qquad (8.3.11)$$

where $c \ge 0$ may be $+\infty$.

PROOF. From the definition of $H(\cdot)$, it follows that $H(t) \ge 0$ for all $t \in R$. Now, choose any $c' < c$. Then, from the definition of $c$, it follows that there exists $s > 0$ such that $c' < H(s)/s$. Next, fix $s$ and write

$$s = nt + \delta, \qquad n = 1, 2, \ldots,$$

where $t > 0$ and $0 \le \delta < t$. Clearly then,

$$c' < \frac{H(s)}{s} \le \frac{nH(t) + H(\delta)}{s} = \frac{nt}{s}\cdot\frac{H(t)}{t} + \frac{H(\delta)}{s}.$$

When $t \to 0$, $\delta \to 0$ and $n \to \infty$, so that $nt/s \to 1$. Consequently, from the last inequality, we obtain

$$c' \le \liminf_{t\to 0^+} \frac{H(t)}{t},$$

which proves the lemma. □

Proposition 8.3.2. For each $i \in S$, the limit

$$\lim_{t\to 0^+} \frac{1 - p_{ii}(t)}{t} = q_{ii} \qquad (8.3.12)$$

exists but may be equal to $+\infty$.

PROOF. Write

$$H_i(t) = -\log p_{ii}(t). \qquad (8.3.13)$$

From (8.3.6), we clearly have

$$p_{ii}(s+t) \ge p_{ii}(s)p_{ii}(t),$$

so that $H_i(s+t) \le H_i(s) + H_i(t)$ and, due to (8.3.2), $H_i(0) = 0$. Consequently, all the conditions of Lemma 8.3.1 are satisfied. Now, from (8.3.13), we have, as $t \to 0$,

$$\frac{1 - p_{ii}(t)}{t} = \frac{H_i(t)}{t}\big[1 + O(H_i(t))\big].$$

This and Lemma 8.3.1 prove the assertion. □

Remark 8.3.1. For all $i \in S$, $p_{ii}(t)$ is differentiable at zero.
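For a finite chain, where $M_t = e^{tA}$, the limit (8.3.12) is finite and equals $q_{ii} = -A_{ii}$; the convergence can be watched numerically. A small illustration (generator values assumed):

```python
import numpy as np
from scipy.linalg import expm

# Numerical illustration of (8.3.12): for a finite chain with generator A,
# (1 - p_ii(t))/t converges to q_ii = -A[i, i] as t -> 0+.
A = np.array([[-2.0,  2.0],
              [ 3.0, -3.0]])
i = 0
for t in (1.0, 0.1, 0.01, 0.001):
    p_ii = expm(t * A)[i, i]
    print(t, (1 - p_ii) / t)    # approaches q_00 = 2
```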
This proves the proposition. If So c S is any finite subset which does not contain the state i, we have (1 - p;;(h»/h ~ (.L )eSo Piih»)!h. By letting h --+ 0+, it follows [see (8.3.12) and (8.3.14» that 00 qu ~ If qu < 00 L qij=>qU ~ j",i L qu· jeSo and 00 qii = L qij' Ui (8.3.15) the state i is called "stable" or "regular." Otherwise, it is called "nonregular." Remark 8.3.2. The matrix A= (8.3.16) is called the infinitesimal generator of the Markov chain {e(t); t ~ O}. Its essential role will become clear in what follows. Remark 8.3.3. The state i E S is "instantaneous" if qu = qii = O. 00 and "absorbing" if 8. Markov Processes I 210 Proposition 8.3.4. Assume that qii < 00 for each i E S, then the transition probabilities Pij(t) are differentiable for all t ~ 0 and i,j E S. In addition, pW) = 00 L qikPkit) k#i (8.3.17) qiiPij(t). PROOF. Consider any two states i,j E S. Then for any h > 0 and a finite subset No c S containing i andj, we have (Pij(t + h) - pij(t»/h = ( L keNo + Pik(h)Pkj(t) - Pij(t) L keNo" Pik(h)Pkj(t»)!h. (8.3.18) Clearly, (8.3.19) Due to (8.3.15), given any arbitrary small a> 0, we can find No such that L keNo qik:5: a. Now, choose ho > 0 so that for each h :5: ho, k i= i, and kENo, IPik(h)/h - qikl :5: a/II No II and 1(1 - Pii(h»/h - qiil :5: a/II No II, where IINol1 = Card{No} (the number of elements in No). Consequently, from (8.3.18) and the condition of the proposition, we have L keNo-Ii} qikPkj(t) - qiiPij(t) :5: lim inf (Pij(t h-+O+ + h) - :5: lim sup (Pij(t h-+O+ + h) - By letting a ~ 0, (8.3.17) follows. pij(t»/h Pij(t»/h o Remark 8.3.4. The system (8.3.17) is called the Kolmogorov backward equations. The matrix form of it is (8.3.20) Remark 8.3.5. The following also holds: dM'/dt = M'· A, which is called the Kolmogorov forward equation. (8.3.21) 8.4. Birth and Death Process 211 8.4. Birth and Death Process The birth and death process is an important example of the homogeneous Markov chain {~(t); t ~ O} with state space S = {O, I, ... }. Here, ~(t) represents the size of a population at time t, which fluctuates according to the following rules: if at time t the chain is in a state i E S, in one transition it can go only to i-lor i (8.4.1) + 1. The transition from state i to i + 1 indicates a "birth" in the population, whereas the transition from i to i - I indicates a "death." Two or more simultaneous births or deaths are not permitted. We also suppose that "autogenesis," i.e., transition from 0 to 1 is not excluded. Assume that the chain {~(t); t ~ O} is stochastically continuous and that all states are stable [i.e., qii < 00 and (8.3.15) holds]; then, lim (1 - Pii(h))/h = qu < 00, h ... O i E S, for all i E S and, from assumption (8.4.1) we have, as h ..... 0, pij(h) = o(h) Pi,H1 (h) = Aih + o(h), if Ii - il > 1, (8.4.2) Pi,i-1 (h) = I'ih + o(h), (8.4.3) where [in each case o(h) may depend on i] 1'0 = O. (8.4.4) From this, one can deduce the following: Aih + o(h) represents the probability of a birth in (t, t + h) given that ~(t) = i. Similarly, I'ih + o(h) represents the probability of a death in (t, t + h) given W) = i. Because all the states are stable, we have that (8.4.5) qu = Ai + I'i· The infinitesimal generator A (see (8.3.16)] in this case is A= -AO AO 0 0 1'1 -(A1 + 1'1) A1 0 0 1'2 - (A2 + 1'2) A2 0 0 1'3 -(A,3 + 1'3) The parameters Ai and I'i are called the birth and death rates, respectively. Now, from the Kolmogorov backward matrix equation dMI/dt = AMI, 8. 
8.4. Birth and Death Process

The birth and death process is an important example of a homogeneous Markov chain $\{\xi(t);\ t \ge 0\}$ with state space $S = \{0, 1, \ldots\}$. Here, $\xi(t)$ represents the size of a population at time $t$, which fluctuates according to the following rule:

if at time $t$ the chain is in a state $i \in S$, in one transition it can go only to $i-1$ or $i+1$. $\qquad (8.4.1)$

The transition from state $i$ to $i+1$ indicates a "birth" in the population, whereas the transition from $i$ to $i-1$ indicates a "death." Two or more simultaneous births or deaths are not permitted. We also suppose that "autogenesis," i.e., transition from 0 to 1, is not excluded. Assume that the chain $\{\xi(t);\ t \ge 0\}$ is stochastically continuous and that all states are stable [i.e., $q_{ii} < \infty$ and (8.3.15) holds]; then,

$$\lim_{h\to 0} \frac{1 - p_{ii}(h)}{h} = q_{ii} < \infty, \qquad i \in S,$$

and, from assumption (8.4.1), we have, as $h \to 0$,

$$p_{ij}(h) = o(h) \quad \text{if } |i - j| > 1, \qquad (8.4.2)$$

$$p_{i,i+1}(h) = \lambda_i h + o(h), \qquad p_{i,i-1}(h) = \mu_i h + o(h), \qquad (8.4.3)$$

where [in each case $o(h)$ may depend on $i$]

$$\mu_0 = 0. \qquad (8.4.4)$$

From this, one can deduce the following: $\lambda_i h + o(h)$ represents the probability of a birth in $(t, t+h)$ given that $\xi(t) = i$. Similarly, $\mu_i h + o(h)$ represents the probability of a death in $(t, t+h)$ given $\xi(t) = i$. Because all the states are stable, we have that

$$q_{ii} = \lambda_i + \mu_i. \qquad (8.4.5)$$

The infinitesimal generator $A$ [see (8.3.16)] in this case is

$$A = \begin{pmatrix} -\lambda_0 & \lambda_0 & 0 & 0 & \cdots \\ \mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & 0 & \cdots \\ 0 & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 & \cdots \\ 0 & 0 & \mu_3 & -(\lambda_3 + \mu_3) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

The parameters $\lambda_i$ and $\mu_i$ are called the birth and death rates, respectively. Now, from the Kolmogorov backward matrix equation $dM_t/dt = AM_t$ [see (8.3.20)], we readily deduce that

$$p'_{0j}(t) = -\lambda_0 p_{0j}(t) + \lambda_0 p_{1j}(t), \qquad j = 0, 1, \ldots,$$
$$p'_{ij}(t) = \mu_i p_{i-1,j}(t) - (\lambda_i + \mu_i)p_{ij}(t) + \lambda_i p_{i+1,j}(t), \qquad i \ge 1. \qquad (8.4.6)$$

This is the Kolmogorov backward system of difference-differential equations for a birth and death process. From the Kolmogorov forward equation (8.3.21), we obtain

$$p'_{i0}(t) = -\lambda_0 p_{i0}(t) + \mu_1 p_{i1}(t), \qquad i \in S,$$
$$p'_{ij}(t) = \lambda_{j-1}p_{i,j-1}(t) - (\lambda_j + \mu_j)p_{ij}(t) + \mu_{j+1}p_{i,j+1}(t), \qquad j \ge 1. \qquad (8.4.7)$$

Denote $P\{\xi(t) = j\} = p_j(t)$; then, from (8.4.7), we obtain

$$p'_0(t) = -\lambda_0 p_0(t) + \mu_1 p_1(t),$$
$$p'_j(t) = \lambda_{j-1}p_{j-1}(t) - (\lambda_j + \mu_j)p_j(t) + \mu_{j+1}p_{j+1}(t). \qquad (8.4.8)$$

Remark 8.4.1. If $\mu_k \equiv 0$, the process $\xi(t)$ is called a "pure birth" process. If, on the other hand, $\lambda_i \equiv 0$, the process $\xi(t)$ is called a "pure death" process.

The following is an application of the birth and death process in a telephone traffic problem.

EXAMPLE 8.4.1. Consider a telephone exchange where the number of available lines is so large that for all practical purposes it can be considered infinite. Denote by $\xi(t)$ the number of lines in use at time $t$. Then, our physical intuition is not violated by assuming that $\{\xi(t);\ t \ge 0\}$ is a birth and death process. It also seems reasonable to assume that, for all $t > 0$, $h > 0$, and $i = 0, 1, \ldots$,

$$P\{\xi(t+h) = i+1 \mid \xi(t) = i\} = \lambda h + o(h),$$

because the probability that a call will occur in $(t, t+h)$ is independent of the number of busy lines at time $t$. On the other hand,

$$P\{\xi(t+h) = i-1 \mid \xi(t) = i\} = \mu_i h + o(h) \quad \text{as } h \to 0,$$

for the obvious reasons. Let us calculate $p_i(t)$ under the assumption that

$$\mu_i = i\mu.$$

In this case, the system of difference-differential equations (8.4.8) becomes

$$p'_0(t) = -\lambda p_0(t) + \mu p_1(t), \qquad (8.4.9)$$
$$p'_j(t) = \lambda p_{j-1}(t) - (\lambda + j\mu)p_j(t) + \mu(1+j)p_{j+1}(t). \qquad (8.4.10)$$

This is not a recursive system (i.e., it cannot be solved successively). For this reason we are compelled to use the method of probability generating functions. Set

$$g(t, u) = \sum_{j=0}^{\infty} p_j(t)u^j;$$

then, taking into account (8.4.9) and (8.4.10), we obtain

$$\frac{\partial g(t,u)}{\partial t} = \sum_{j=0}^{\infty} p'_j(t)u^j = -\lambda(1-u)\sum_{j=0}^{\infty} p_j(t)u^j + \mu(1-u)\sum_{j=1}^{\infty} jp_j(t)u^{j-1} = -\lambda(1-u)g(t,u) + \mu(1-u)\frac{\partial g(t,u)}{\partial u}.$$

Thus, the generating function $g(t,u)$ satisfies the linear partial differential equation

$$\frac{\partial g(t,u)}{\partial t} - \mu(1-u)\frac{\partial g(t,u)}{\partial u} = -\lambda(1-u)g(t,u). \qquad (8.4.11)$$

Suppose that $\xi(0) = i_0$; then,

$$g(0, u) = u^{i_0}. \qquad (8.4.12)$$

Solving this equation by standard methods and using the initial condition (8.4.12), we obtain

$$g(t,u) = \big\{1 - (1-u)e^{-\mu t}\big\}^{i_0}\exp\left\{-\frac{\lambda}{\mu}(1-u)(1-e^{-\mu t})\right\}. \qquad (8.4.13)$$

Note that the first term on the right side of (8.4.13) is the probability generating function of the binomial distribution with $p = \exp\{-\mu t\}$, whereas the second term is the probability generating function of the Poisson distribution with mean value $(\lambda/\mu)(1 - e^{-\mu t})$. Therefore,

$$\xi(t) = \xi_0(t) + \xi_1(t), \qquad (8.4.14)$$

where $\xi_0(t)$ is a binomial component, $\xi_1(t)$ is a Poisson component, and $\xi_0(t)$ is independent of $\xi_1(t)$ for all $t \ge 0$. We now expand (8.4.13) to obtain $p_j(t)$. After some calculations, we obtain

$$p_j(t) = \exp\left\{-\frac{\lambda}{\mu}(1-e^{-\mu t})\right\}\sum_{k=0}^{\min(j, i_0)}\binom{i_0}{k}\frac{(\lambda/\mu)^{j-k}}{(j-k)!}e^{-\mu tk}(1-e^{-\mu t})^{i_0+j-2k},$$

where $j = 0, 1, \ldots$. From (8.4.14), we deduce that

$$E\{\xi(t)\} = i_0e^{-\mu t} + \frac{\lambda}{\mu}(1 - e^{-\mu t}).$$

These formulas were first obtained by Palm.
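Equations (8.4.9) and (8.4.10) can also be integrated numerically after truncating the state space, and comparing the resulting mean with Palm's formula gives a useful sanity check. A hedged sketch (parameter values and the truncation level N are assumptions of the illustration):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Example 8.4.1: integrate the forward equations (8.4.9)-(8.4.10), truncated
# at N states, and compare the mean with i0 e^{-mu t} + (lam/mu)(1 - e^{-mu t}).
lam, mu, i0, N = 2.0, 1.0, 5, 80

def rhs(t, p):
    dp = np.zeros_like(p)
    dp[0] = -lam * p[0] + mu * p[1]
    for j in range(1, N - 1):
        dp[j] = lam * p[j-1] - (lam + j * mu) * p[j] + mu * (j + 1) * p[j+1]
    dp[N-1] = lam * p[N-2] - (lam + (N - 1) * mu) * p[N-1]   # truncation row
    return dp

p0 = np.zeros(N); p0[i0] = 1.0
t = 1.5
sol = solve_ivp(rhs, (0, t), p0, rtol=1e-8, atol=1e-10)
mean = np.arange(N) @ sol.y[:, -1]
print(mean, i0 * np.exp(-mu * t) + (lam / mu) * (1 - np.exp(-mu * t)))
```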
8.5. Sample Function Properties

Let $\{\xi(t);\ t \ge 0\}$ be a homogeneous Markov process with the state space $S = \{0, 1, \ldots\}$ and transition probabilities $p_{ij}(t)$ which satisfy condition (8.3.3). In this section, we outline some basic properties of its sample functions, which are step functions. First, it is not difficult to establish that the process is stochastically continuous (see Definition 1.9.1), which is equivalent to

$$\lim_{h\to 0^+} P\{\xi(t \pm h) = \xi(t)\} = 1, \qquad h \ge 0. \qquad (8.5.1)$$

To show that (8.5.1) holds, consider

$$P\{\xi(t+h) = \xi(t)\} = \sum_{j=0}^{\infty} p_{jj}(h)p_j(t).$$

From condition (8.3.3) and the bounded convergence theorem, it follows that $P\{\xi(t+h) = \xi(t)\} \to 1$ as $h \to 0^+$. In a similar fashion, one can show that $P\{\xi(t-h) = \xi(t)\} \to 1$ as $h \to 0^+$. Because the process is stochastically continuous, it must be separable and measurable on every compact interval. In such a case (see Proposition 1.10.1), there exists a separable version, stochastically equivalent to $\xi(t)$, all of whose sample functions are continuous from the right. In the following, we will deal exclusively with this version, which we will denote by the same symbol $\xi(t)$.

Because the trajectories of $\xi(t)$ are (right-continuous) step sample functions, it seems intuitively justifiable, at least when a state $i \in S$ is regular, to assume that if $\xi(t, \omega) = i$ for some $t$, then there exists an $h > 0$, which may depend on $\omega$ and $i$, such that $\xi(t+h, \omega) = i$. This gives rise to the following question: how long does the process $\xi(t)$ stay in a state after entering it? To answer this question, denote by

$$\tau(t) = \inf\{h > 0;\ \xi(t) \neq \xi(t+h)\}$$

the length of time the process remains in the state in which it is at time $t$. Let us calculate the conditional probability

$$\varphi_i(s) = P\{\tau(t) > s \mid \xi(t) = i\}, \qquad i \in S,\ s \ge 0 \qquad (8.5.2)$$

[this conditional probability does not depend on $t$ because the process $\xi(t)$ is homogeneous]. Because

$$\{\tau(t) > s + u\} = \{\tau(t) > s,\ \tau(t+s) > u\}, \qquad (8.5.3)$$

it follows that

$$\varphi_i(s+u) = P\{\tau(t) > s,\ \tau(t+s) > u \mid \xi(t) = i\}$$
$$= P\{\tau(t) > s \mid \xi(t) = i\}\,P\{\tau(t+s) > u \mid \xi(t) = i,\ \tau(t) > s\}. \qquad (8.5.4)$$

Now, $\{\xi(t) = i,\ \tau(t) > s\} \subset \{\xi(t+s) = i\}$. Consequently, due to the Markov property of $\xi(t)$,

$$P\{\tau(t+s) > u \mid \xi(t) = i,\ \tau(t) > s\} = P\{\tau(t+s) > u \mid \xi(t) = i,\ \tau(t) > s,\ \xi(t+s) = i\}$$
$$= P\{\tau(t+s) > u \mid \xi(t+s) = i\} = \varphi_i(u).$$

From this and (8.5.4), we deduce

$$\varphi_i(s+u) = \varphi_i(s)\varphi_i(u). \qquad (8.5.5)$$

This is a Cauchy functional equation. Because $\varphi_i(s)$ is continuous at $s = 0$, it is continuous at all points $s \ge 0$. In addition, $0 \le \varphi_i(s) \le 1$, which implies that the only solution of (8.5.5) must be

$$\varphi_i(s) = e^{-\lambda_i s} \quad \text{for all } s \ge 0, \qquad (8.5.6)$$

where $\lambda_i \ge 0$, and $\lambda_i$ may be $+\infty$ (see also Remark 8.3.3). We can now give the following definition.

Definition 8.5.1. A state $i \in S$ is called "absorbing" if $\lambda_i = 0$, stable if $0 < \lambda_i < \infty$, and instantaneous if $\lambda_i = \infty$.

Clearly, if $i$ is absorbing and if $\{\xi(t) = i\}$ occurs, it follows that $\xi(t+s) = i$ for all $s \ge 0$. In other words, if $\xi(t)$ enters $i$, then it stays there forever. On the other hand, if $i$ is a stable state, then

$$P\{0 < \tau(t) < \infty \mid \xi(t) = i\} = 1.$$

Finally, if $i$ is an instantaneous state,

$$P\{\tau(t) = 0 \mid \xi(t) = i\} = 1,$$

which implies that the process exits from an instantaneous state as soon as it enters it. This, however, is not possible because, by assumption, we are dealing with a separable version with right-continuous trajectories. Consequently, the state space $S$ of such a Markov chain does not contain instantaneous states. Conversely, if the state space $S$ of a Markov chain does not contain instantaneous states, the process is cadlag (i.e., continuous on the right and having limits on the left).
Before concluding this section, let us describe once more the structure of the sample functions of a homogeneous Markov chain $\{\xi(t);\ t \ge 0\}$ with state space $S = \{0, 1, \ldots\}$, where each element of $S$ is a stable state. At time $t = 0$, the process is in a state, say $i_0$. It stays there for a random time $T_1$, at the end of which it jumps to a new state. It remains in the new state for a random time $T_2$ and then it jumps to another state, and so on. Because all the states in $S$ are stable, each sample function of the process $\xi(t)$ is a right-continuous step function with left-hand limits. Set $\tau_0 = 0$ and

$$\tau_n = \sum_{k=1}^{n} T_k, \qquad n = 0, 1, \ldots. \qquad (8.5.7)$$

Then we have the following definition.

Definition 8.5.2. A homogeneous Markov chain $\{\xi(t);\ t \ge 0\}$ with state space $S = \{0, 1, \ldots\}$ is said to be regular if its sample functions are (a.s.) continuous from the right and

$$\sup_n \tau_n = +\infty \quad \text{(a.s.)}. \qquad (8.5.8)$$

Remark 8.5.1. The times $\tau_1, \tau_2, \ldots$ are the instants of transitions of the process $\xi(t)$.
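This description translates directly into a simulation scheme: draw an $\mathrm{Exp}(\lambda_i)$ holding time in the current state, then draw the next state from the embedded jump chain, and repeat. A minimal sketch (rates and jump matrix are illustrative only, not from the text):

```python
import numpy as np

# Simulation of the sample-function structure just described: exponential
# holding times with rate lambda_i in state i, then a jump of the embedded
# chain; all states below are stable, so trajectories are step functions.
rng = np.random.default_rng(1)
rates = np.array([1.0, 2.0, 0.5])            # lambda_i
jump = np.array([[0.0, 0.7, 0.3],            # P{next state = j | leave i}
                 [0.5, 0.0, 0.5],
                 [0.9, 0.1, 0.0]])

def trajectory(i0, horizon):
    t, i, path = 0.0, i0, [(0.0, i0)]
    while t < horizon:
        t += rng.exponential(1 / rates[i])   # holding time T ~ Exp(lambda_i)
        i = rng.choice(3, p=jump[i])
        path.append((t, i))
    return path

print(trajectory(0, 5.0))   # [(0.0, 0), (jump time, new state), ...]
```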
The following result is due to Dynkin and Yushkevich (1956). Proposition 8.6.1. A homogeneous Markov process {e(t); t ~ O} with state space {S,!/'} and transition probability P(x, t, B) has the strong Markov property if and only if for any XES, BE!/', and t > 0, PX{e(L + t)EB} = Is P(y,t,B)Px{e(L)Edy}. (8.6.3) 8. Markov Processes I 218 There is an extremely useful criterion for determining whether a Markov process has the strong Markov property. To formulate this criterion, some new notation is needed. Suppose that S is a metric space. Then, Y' is the Borel algebra. Denote by B the space of all bounded Borel measurable functions on S. With the usual supremum norm Ilhll = sup Ih(x)l, (8.6.4) x the space B is a Banach space. For each t ;;?: 0, define an operator Tt by (T' h)(x) = 1 h(y)P(x, t, dy) (8.6.5) or equivalently (8.6.6) Because 1(T'h)(x)1 = 11 h(Y)P(X,t,dy)l::;; s~p Ih(y)1 = Ilhll, T': B -+ B is a contraction. Proposition 8.6.2. Let {~(t); t ;;?: O} be a homogeneous Markov process with state space {S, Y'} and a transition probability P(x, t, B) which has the property that the operator (8.6.5) maps continuous bounded functions into continuous bounded functions. Then, the process ~(t) possesses the strong Markov property. Remark 8.6.1. Processes satisfying conditions of Proposition 8.6.2 are called "Feller processes." Remark 8.6.2. The following result is due to Yushkevich. Let {~(t); t ;;?: O} be a homogeneous Markov process with countable state space S. If ~(t) is stochastically continuous, there exists a separable version which possesses the strong Markov property. In other words, if all elements of S are stable states the Markov chain is a strong Markov process. 8.7. Structure of a Markov Chain Let us now return to Markov processes with countable state spaces and examine in more detail some questions concerning their stochastic structure. To this end, consider a homogeneous Markov chain {~(t); t ;;?: O} with state space S = {a, I, ... } and transition probability Pij(t). We will assume that condition (8.3.3) holds; in such a case, Pij(t) is called a "standard" transition 8.7. Structure of a Markov Chain 219 probability. From Section 8.5, we see then that the process e(t) must be stochastically continuous. This, on the other hand, implies that e(t) is separable and measurable. Then (see Proposition 1.10.1) there exists a separable version all of whose sample functions are continuous from the right and have left-hand limits. According to what was stated in Section 8.5, the state space S of e(t) does not contain instantaneous states. In the rest of this section, we will deal exclusively with the separable version of the Markov chain, which will be denoted by the same symbol e(t). As before, let us denote by T I, T 2, ..• the instants of transitions of the process e(t), and by To = 0, T" = Tn - Tn-I' n = 1,2, .... Clearly, each Tn is a stopping time. Because the process e(t) is measurable, X o, X I ' ... , defined by (8.7.1) are clearly random variables. Consider the bivariate sequence {(Xn' T,,)}O', To =0. (8.7.2) What is the stochastic structure of this bivariate process? This and related questions will be discussed in the rest of this section. For this purpose, the following two lemmas are needed. Lemma 8.7.1. For all i E Sand t ~ 0, Pj{TI > t} = e- l ". (8.7.3) PROOF. Notice that Pj{TI > t} On the other hand, for any t Pj{TI > t ~ = Pj{e(s) = i,O::;;; s::;;; t}. 
0, u ~ (8.7.4) 0, + u} = Pj{e(s) = i,O::;;; s::;;; t + u} = Pj{e(s) = i,O::;;; s::;;; t,e(v) = i,t::;;; v::;;; u + t} = Pj{e(s) = i,O::;;; s::;;; t}Pj{e(v) = i,t::;;; v::;;; t + ule(s) = i,O::;;; s::;;; t}. Now, because e(t) is a homogeneous Markov process, we readily deduce that Pj{e(v) = i,t::;;; v::;;; t + ule(s) = i,O::;;; s::;;; t} = ~{e(v) = i,O::;;; v::;;; u}. Therefore, taking into account (8.7.4), we obtain Pj{T1 > t + u} = Pj{TI > t}Pj{T1 > u}. From this, in the same way as in Section 8.5, we conclude that Pj{TI > t} = e- l ", which proves the assertion. o 8. Markov Processes I 220 Lemma 8.7.2. Under Pi' the Tl and Xl are independent random variables. PROOF. Denote by t + To = inf{s; ~(s) =1= i and s > t} the first exit time from the state i after time t. Then, for all t > 0 and i we have (if Pi{Tl > t} > 0) =1= j E S, Pi{Tl > t,X l =j} = Pi{Tl > t,W) = i,~(Td =j} = Pi{Tl > t,W) = i,W = Pi{W + To) =jlTl > + To) =j} t,~(t) = i}Pi{Tl > t,W) = i}. Because, by assumption Pi{Tl > t} > 0, Pi{W + To) =jlTl > t,W) = i} = Pi{W + To) =jl~(t) = i} = Pig(Td =j} (due to homogeneity of the Markov process). This gives Pi{Tl > t,X l = j} = Pig(Td = j}P;{Tl > t}, o which proves the assertion. The following proposition is the central result of this section. Proposition 8.7.1. The bivariate sequence {(T;,Xi}O' represents a Markov renewal process. In other words, for any n = 0, 1, ... , t > 0, j E S, P{T,,+1 > t,Xn+1 =jl(T;,Xi)o} = Pi{Tl > t,X l = i} on the set {Xn (8.7.5) = i}. PROOF. Let us begin with some facts. First, ~(t) has the strong Markov property. Second, knowing {(T;, X;)}o is equivalent to knowing ~(t) for all o ::;; t ::;; Tn' Therefore, P{T,,+l > t,Xn+l =jl(T;,X;)i} = P{T,,+l > t,Xn+l =jl~(t);t::;; Tn} = fX) P{T,,+l Eds,~(Tn + s) =jl~(Tn)} = fX) Px .{ T,,+1 E ds, ~(s) = j} = pxJ Tl > t, ~(Tl) = j} (a.s.) This proves the proposition. o 8.7. Structure of a Markov Chain 221 Corollary 8.7.1. Write (8.7.6) Then, from the last proposition and Lemmas 8.7.1 and 8.7.2, it follows that on {Xn = i},for all n = 0,1, ... , (8.7.7) Corollary 8.7.2. From (8.7.7), we easily deduce that {Xn}O' is a homogeneous Markov chain with state space S and the transition probabilities (8.7.8) In addition, (8.7.9) Corollary 8.7.3. From Proposition 8.7.1, it also follows that P{T1 > t 1,···, 1',. > tnlXo = io,···,Xn = in} = P{T1 > t11 X o = io }·· .P{1',. > t nlXn- 1 = in-d (8.7.10) Corollary 8.7.4. It is not difficult to show that the bivariate sequence of random variables {(tn, Xn)}O' represents a homogeneous Markov chain with transition probabilities (s < t) (8.7.11) The next topic to be discussed is the question of regularity of a Markov chain (see Definition 8.1.5). Under what conditions on {A.i}O' does (8.5.8) hold? If the sequence {A'i}O' is bounded, sup A.i ::;;; i P< (8.7.12) 00, then the process is certainly regular. This can be proved as follows. If (8.7.12) holds, then e-A,I ~ e- P1 for all t ~ O. From this and (8.7.10), we have P{T1 ::;;; t 1 ,···, 1',. ::;;; tn } n (1 n ::;;; e- P1j ). 1 From this, we deduce that, for all n = 1, 2, ... , tn ~ t k=l Zk (~means in distribution), where {Zk}O' is an Li.d. sequence ofrandom variables with common distribu- 222 8. Markov Processes I tion function 1 - e- Pt . Hence, P{tn for all 0 < t < 00. ~ t} ~ pt~ Zk ~ t} = By letting n --+ 00, k~n (~t we deduce that ~ t} ~ e- Pt lim P{t e- pt f n~oo k:n ({3?k = O. k. The following proposition provides a more general criterion for regularity of ~(t). Proposition 8.7.2. 
A necessary and sufficient condition for a homogeneous Markov chain to be regular is that P{fAX~ = oo} = 1. k:1 PROOF. According to (8.7.9), for any i E{e-aTnIXn_1 = i} E S and a> 0, = too e- as P{1;' = Ai E dslXn- 1 = i} too e-rxse-l,sds = Ad(a + AJ From this and (8.7.10), we deduce that, for every n = 1,2, ... , n-1 E{e-atnIXo = iO,oo.,Xn = in} = Aij(a + Ai) n k:O or Set t = Ik"=1 7;,; then Because lim E{e- at } = E{I{t<oo}}, a--+O+ we have Finally, it is well known that 223 8.8. Homogeneous Diffusion 00 lim fI Axj(a + Ax.) = {I0 a~Ok=O if LAx! < 00 k=O 00 if LAx! = k=O 00. This yields pL~o Ax! < (f)} P{r < oo} = o and the assertion holds. 8.8. Homogeneous Diffusion Let {e(t); t ~ O} be a real homogeneous Markov process (i.e., S = R, the real line, and !f = fll, the a-algebra of Borel subsets of R) with (a.s.) continuous sample functions and a transition probability P(x, t, B). The stochastic process e(t) is termed a "homogeneous diffusion" if the transition probability satisfies the following conditions: for each (j > 0 and x E R, (i) (ii) (iii) lim t~O+ ~ lim t~O+ lim t~O+ t ~ t ~ t Jr Jr P(x, t, dy) = 0, Iy-xl >,j (y - x)P(x,t,dy) = b(x), (8.8.1) ly-xl:s;6 Jr (y - Iy-xl:s;,j xf P(x, t, dy) = a 2 (t). The first condition seems justified in view of (a.s.) continuity of the sample paths of the process. The function b(') characterizes the "average trend" of evolution of the process over a short period of time, given that e(O) = x and is called the "drift" coefficient. Finally, the non-negative quantity a 2 ( • ) determines the mean square deviation of the process e(t) from its mathematical expectation. The function a 2 (.) is called the "diffusion" coefficient. We will assume that the functions b( . ) and 0'2(. ) are finite. The three conditions (8.8.1) can be written in slightly different form as follows: lim t~O+ ~(1 t - Pxg(t) E lim + (j])) = 0, ~Ex[e(t) - lim t t~O+ t~O+ [x - (j, x ~Ex[W) t e(O)] = b(x), e(0)]2 = 0'2(X). 8. Markov Processes I 224 The central point in the theory of diffusion processes is that, for given functions b(· ) and 0"2 ( .), there is a unique and completely determined transition probability P(x, t, B) which yields a homogeneous Markov process with (a.s.) continuous sample functions. There are various methods for determining P(x, t, B), ranging from purely analytical to purely probabilistic. The method presented here was developed by Kolmogorov in 1931. Consider u(t, x) = f: (8.8.2) cp(y)P(x, t, dy), where cp(.) is bounded continuous function. We shall attempt to derive the partial differential equation which u(t, x) satisfies, subject to the initial condition (8.8.3) u(o+,x) = cp(x). The equation that we will derive is known as Kolmogorov's backward diffusion equation, which is satisfied by the function u(t, x). Proposition 8.8.1. Assume that conditions (8.8.1) hold and that the function u(t, x) defined by Equation (8.8.2) has a continuous second partial derivative in x for all t > O. Then, au au 1 2 a2 u at = b(x) ax +"20" (x) ax 2 ' t > O. (8.8.4) PROOF. Starting with the Chapman-Kolmogorov equation (8.1.2), we readily obtain that, for each h > 0, u(t + h, x) = = = f: f: f: cp(y)P(x, t P(x, h, dz) + h, dy) f: cp(y)P(z, t, dy) u(t, z)P(x, h, dz). Consequently, u(t + h, x) - u(t, x) h = 1 Ii foo -00 P(x, h, dy){ u(t, y) - u(t, x)}. (8.8.5) Now, according to condition (8.8.l.i) and the fact that cp(.) 
is a bounded function, Equation (8.8.5) can be written as follows: for each b > 0 (as h -+ 0), u(t + h, x) - u(t, x) h 1 = Ii r P(x,h,dy){u(t,y) t-6 XH u(t,x)} + 0(1). (8.8.6) 225 8.8. Homogeneous Diffusion Next, we expand u(t, x) in Taylor series to obtain u(t,y) = u(t, x) ou(t, x) x)ax + (y - + (y - X)2 02U(t, x) 2! . ox2 + (y - . xfR(t,y) with R(t, y) ~ 0 as y ~ x. Substituting this expansion into the right-hand side of Equation (8.8.6) and then using conditions (8.8.l.ii) and (8.8.l.iii), we arrive at r 1m u(t h~O + h,x) h u(t,x) _ b( )ou(t,x) - x -~- uX 1 2( )02 U(t,X) x ~ 2 + -2 q uX To complete the proof, we must consider the case h < O. This is handled in essentially the same way beginning with u(t + h, x) h u(t, x) = 1 Ii foo -00 P(x, JhJ, dy){u(t + h,y) - u(t + h,x)}. The analysis is the same as before (we use the Taylor expansion) except that, in obtaining the limit as h ~ 0, it is necessary to invoke the joint continuity of ou/ox and 02U/OX 2 in t and x. In particular, we need the fact that R(t, y) ~ 0 as y ~ x. This proves the assertion. 0 Remark 8.8.1. The error term in the Taylor expansion leads to an integral bounded by the quantity 1 fX+cl X-cl~~~X+cl JR(t,y)J Ii x-cl P(x,h,dy)(y - X)2. Its lim sup as h ~ 0 is not larger, then max JR(t, y)J q2(X) ~ 0 as J ~ O. Remark 8.8.2. To obtain the transition probability P(x, t, dy), assume that the transition probability density exists, i.e., P(x, t, B) = L p(x, t, y) dy, (8.8.7) and that the derivatives op(x, t,y) ox exist. Then, taking into account Equation (8.8.2), we can write Equation (8.8.4) as f:oo <p(y) {op(~/, y) - b(x) Op(~:, y) _ ~ q2(X) 02p!:~t, y)} dy = 0, where <p(.) is an arbitrary continuous bounded function. This, then, implies the equation 226 8. Markov Processes I op(x,t,y) _ b( )op(x,t,y) _ ~ 2( )02 p(X,t'Y)d = 0 2u x ox2 Y . ot x ox (8.8.8) Under appropriate conditions on b(·) and u(·), a solution of this equation exists and is unique. It represents a transition probability of the homogeneous diffusion process if, for instance, the coefficients b(x) and u2(x) are bounded and satisfy the Lipschitz conditions Ib(y) - b(x)1 s Ciy - xl, lu 2 (y) - u 2 (x)1 s Ciy - xl and if u 2 (x) ~ u 2 > o. Assume again that conditions (8.8.1) hold. If the transition probability density exists, if it has a derivative op(x, t,y) ot which is continuous with respect to t and y, and if the function b(y)p(x, t, y) is twice continuously differentiable with respect to y, the equation satisfied by p(x, t, y) for fixed x turns out to be op(x, t, y) ot 1 02 0 = - o/b(y)p(x, t, y)] + 2" oy2 [u(y)p(x, t, y)]. (8.8.9) This is the Fokker-Planck equation or the forward Kolmogorovequation. We will not derive this equation here. The study of diffusion processes via the backward partial differential equation has been reasonably successful in many cases. The basic problem with this method is that partial differential equations of this type are difficult to solve. In addition, the probabilistic input is very small. An alternative approach, more probabilistically appealing, was proposed by P. Levy and carried out by K. Ito in 1951. The idea here is to approach the problem in a fashion similar to Langevin's approach (see Section 3.7 of Chapter 3). Roughly speaking, what Ito has shown is that a diffusion process is governed by a stochastic differential equation of the form d~(t) = b(~(t)) dt + u 2R(t)) dW(t), (8.8.10) where b(·) and u 2 (.) are those from (8.3.1) and W(t) is a standard Brownian motion. 
Processes satisfying (8.8.10) are called locally Brownian. EXAMPLE 8.8.1. Find the solution of the stochastic differential equation d~(t) = tXoW)dt + tXl dW(t), where tXo and tXl > 0 are constant and W(t) is a standard Brownian motion. Here, clearly, b(x) = tXoX and u 2(.) == tX l • As we have pointed out [see Equation (3.7.5)], a solution of this equation 227 Problems and Complements over an interval containing a point to is defined as a process having continuous sample functions which satisfies the equation i' de(s) = txo Jto r' e(s)ds + r' dW(s) txl Jto Jto or e(t) - e(t o) = txo rt e(s) ds + txl [W(t) Jto - W(t o)]. To solve this equation, multiply both sides by e-a.ot to obtain e-a.o'[ w) - txo 1: e(s) dSJ = [Wo) - txl W(t o)]e-l1.ot + txl e-a.otW(t). This, on the other hand, can be written as d(e-a.ot 1: e(S)dS) = [e(t o) - txl W(to)]e-a.o'dt + txle-l1.otW(t)dt. Integrating both sides from to to t, we have e-a.ot rt e(s)ds = e(to) - J~ txl ~ W(t o) (e-a.oto _ e-a.ot,) + txl rt e-a.oSW(s)ds J~ or, equivalently, rt e(s)ds = e(to) - J~ txl ~ W(t o) (ea.o(t-,o) _ 1) + txl I' ea.o(t-S)W(s)ds. J~ Finally, by differentiating with respect to t, we see that e(t) = (e(t o) - txl W(to»el1.o(t-t o) + txl ( W(t) - txo 1: ea.o(t-S)W(s) dS)' Take to = 0 and assume that e(O) = 0; then, e(t) becomes e(t) = txl W(t) - txOtxl I ea.o(I-S)W(s)ds. This is clearly a Gaussian process with Eg(t)} = 0 and Eg(t)e(u)} = txl(ea.o(t+u) - ea.o1t-u1)/txo. Problems and Complements 8.1. Let {W); t ~ O} be a real homogeneous Markov process with transition probability P(x, t, B). Show that, for any Borel function f: RR -+ R and every tl < ... < tR' 8. Markov Processes I 228 8.2. Show that a stochastic process {,(t); t ~ O} with a state space {S, .'I'} is a Markov process if and only if, for any 0 :::;; SI < ... < Sk < t < tl < .,. < tn' = x} Pg(SI):::;; x1,···,'(x k ):::;; xk,Wd:::;; Yl,···,Wn):::;; YnIW) p(Ci = gi:::;; xdlW) = x 8.3. (Continuation) Show that {W); t Eta = !.('(Si)) JJ ~ )pCa gj:::;; yJIW) = x). O} is a Markov process if and only if h)Wj))IW) = x} E{]j !'('(si))IW) = x}ELa h)W))IW) = x}, where!. and hj are real Borel functions on {S, 9'}. 8.4. Let {W); t ~ O} be a Markov process. Show that the sequence Markov property. 8.5. (Continuation) Set Zk Markov process? = g(k)}~ has the [,(k)], where [x] is the integer part of x. Is {Zd~ a 8.6. Let {,(t); t ~ O} be a real Markov process and f: R ...... R a Borel function. Show by a counterexample that {f(,(t)); t ~ O} is not necessarily a Markov process. However, if f(· ) is one-to-one, the Markov property is preserved. 8.7. Let {W); t Show that ~ O} with E {W)} = 0 for all t ~ 0 be a Gaussian random process. is a necessary and sufficient condition for ,(t) to be a Markov process, where 0:::;; tl < ... < tn' n = 2, 3, .... 8.8. Let {'i(t); t Show that ~ O}, i = 1, 2, be two independent standard Brownian motions. X(t) = (, dt) + '2(t))2 is a Markov process. '1 8.9. Assume that {Ut); t ~ O}, i = 1, 2, are two zero mean, independent, strictly stationary, Markov processes. Under what conditions is Y(t) = (t) + '2(t) a Markov process? 8.10. Let {W); t ~ O} be a standard Brownian motion and LX = inf{t;W) = x}. Show that {LX;X > O} is a process with independent increments and, hence, Markovian. 8.11. Let {,(t); t such that ~ O} be a homogeneous Markov process with state space S = {-1,1} + t) = -1IW) = -1} = (2 + e- 3 ')/3, Pg(s + t) = -1IW) = 1} = (2 - e- 3 ')/3. Pg(S Find all invariant measures of the process. 229 Problems and Complements 8.12. 
Let {~(t); t ~ O} be a standard Brownian motion. Show that the process Z(t) = IW) + xl, x> 0, (so-called "Brownian motion with reflecting barrier") is a homogeneous Markov process. 8.13. Let g(t); t ~ O} be a standard Brownian motion. A Brownian bridge {X(t); t E [0, toJ} is defined by W) + x X(t) = t - -(W) to +y- x). This is clearly a Brownian process that starts at point x at time t = 0 and passes through point y at time to. Show that X(t) is a Markov process and determine its mean and covariance function. 8.14. (Continuation) Show that X(t) and X(t o - t) have the same distribution. 8.15. Let {~(t); t ~ O} be a standard Brownian motion. Show that X(t) = e-t~(e2t) is a strictly stationary Markov process. Find its mean and covariance function. 8.16. Let {X(t); t ~ O} be a stationary Gaussian process. If X(t) has the Markov property, show that its covariance function has the form ce- 01tl , c > 0, ex > O. 8.17. Let {~(t); t ~ O} be a homogeneous Markov process with transition probability P(x, t, B). If + h,t,B + h) = P(x,t,B) (spatial homogeneity) where B + h = {x + h;x E B}, then show that P(x W) has stationary independent increments. 8.18. Let g(t);t ~ O} be a homogeneous Markov process with state space S = {O, 1, ... }. Determine Pij(t) = Pg(s + t) = jl~(s) = i} assuming that Pij(t) = 0 if i > j; Pi,i+! (h) Pij(h) = o(h) 8.19. Assume that a state j E = Ah + o(h) as h ..... 0+; if h ..... 0+ when j - i ~ 2. S = {O, 1, ... } is stable (see Definition 8.3.15). Show that pW) = - Pij(t)qi + <Xl L Pik(t)qkj k#j (this is the forward Kolmogorov equation). 8.20. Let {~(t); t ~ O} be a homogeneous Markov chain with state space S = {O, 1,.,.} and transition probability Plj(t). Assume that Pi,i+!(h) = Ah + o(h), pi,/-1(h) = iJ.lh + o(h) as h ..... 0+. Determine P{W) = k} and show that W) = where ~o(t) is binomial, ~ 1 (t) ~o(t) + ~l(t), is Poisson, and ~o(t) is independent of ~ 1 (t). 8. Markov Processes I 230 8.21. Let g(t); t ~ O} be a "pure birth" process (see Remark 8.4.1). Denote by 0< < < ... its discontinuity points. Show that {Z.}f is a sequence of independent random variables where Zl = '1, Z. = '. - '.-1, n ~ 2. '1 '2 8.22. (Continuation) Show that P{Z.+1 ::;; t} = 1 - e- ln ', where An is defined by (8.4.4). 8.23. (Continuation) Show that (i ::;; j) Pij(t) = Pg(s + t) = jl'i = s}. 8.24. (Continuation) Show that pit) p{ t Zk::;; = k=i+l 8.25. (Continuation) Show that (Ai :ft P{W) = n} = PWt) =f. t} - p{ jf Zk::;; k=i+1 t}. Aj if i =f. j) Aj kto (exp - ·'!f.1 (Aj l Ak)). n ~ 1, = O} = e- lo'. 8.26. (Continuation) Show that Pk.(t) = U AjCt e- li '!f.1 (Aj - Ak)). n> k, Pkk(t) = e- l .,. 8.27. Let {W); t ~ O} be a pure birth process. In order that I 00 P{W) = n} = 1 for all t, "=0 show that it is necessary and sufficient that the series diverges. 8.28. Let {~(t); t ~ O} represent the size of a bacterial population which may grow (each bacterium can split into two) or decline (by dying). In a small time interval (t, t + At), each bacterium independently of others has a probability AAt + o(At) as At -+ 0, A > 0, of splitting in two and a probability IIAt + o(At) as At -+ 0, II > 0, of dying. Form a system of differential equations determining Pk(t) = P{W) = k}, n = 1, ... , 231 Problems and Complements and solve it assuming that ,(0) = 1. Verify <Xl L Pt(t) = 1. k=O 8.29. (Continuation) Show that ,(t) is a homogeneous Markov process and determine Pij(t) = Pg(s + t) = il e(s) = i}. 8.30. The linear birth and death process with immigration. Let {W); t ~ O} be the size of a population at time t. 
Members of the population behave independently of one another. In (t, t + ~t), a member will give a birth to another member with probability .Il~t + o(M) as t ~ O. With probability /-I~t + o(~t), the member will die. In the same time interval, an immigrant will join the population with probability a~t + o(~t) as M ~ O. If e(O) = i, determine the probability generating function for W). 8.31. (Continuation) Show that E{W)} and determine E{W)J2. = a/(/-1 - .Il) + [i - a/(p- .Il)]e-(/l-Alt CHAPTER 9 Markov Processes II: Application of Semigroup Theory 9.1. Introduction and Preliminaries Let {~(t); t ~ O} be a real homogeneous Markov process with transition probability P(x, t, B). In applications the following situation is typical. The transition probability is known for all t in a neighborhood of the origin. Then, P(x, t, B) can be determined for all t > 0 by means of the ChapmanKolmogorov equation (8.1.2). As in the case of a countable state space, we will show that, under assumption (8.1.13), the transition probability P(x, t, B) is completely determined by the value of ap(x,t,B) at 0 at t = . (9.1.1) Our goal is to deduce everything about the behavior of the Markov process ~(t) from (9.1.1). Naturally, when we say everything about the behavior of ~(t), we mean everything which does not depend on the initial distribution because we only make use of the transition probability. The basic tool to achieve this goal is semigroup theory. The modern theory of homogeneous Markov process is basically semigroup theory, whose elements will be discussed in the rest of this chapter. This approach not only elucidates various aspects of this important class of Markov processes, but also provides a unified treatment of the theory which is not attainable by other methods. We begin with a brief review of some concepts th"t were discussed in some detail in Chapter 5. Denote by B the set of all real bounded Borel functions defined on R. With the supremum norm IIhll = sup Ih(x)l, xeR (9.1.2) 233 9.1. Introduction and Preliminaries the set B becomes a Banach space. A mapping T: B-+B is called a "linear operator" (see Definition 5.7.1) if, for any two hI' h2 E B, (9.1.3) where IX and f3 are two fixed numbers. We say that T is bounded if there exists a positive constant M < that IIThllsM'llhll 00 forallhEB. such (9.1.4) The smallest M for which (9.1.4) holds is called the "norm of the operator" T and is denoted by IITII. Thus, sup II Thll = II Til. he B IIhll (9.1.5) h;"O From (9.1.5), we clearly have IIThll s IIhll·IITli. If II Til s 1, the operator T is said to be a "contraction." EXAMPLE 9.1.1. Let B = CEO, 1J be the set of continuous functions on [0,1]. Define T: B-+B by (Th){x) = xh(x) for each h E B and x E [0, 1]. Clearly, T(hl + h2 )(x) = x(h l (x) + h2(X)) = xh l (x) + xh 2(x) = (Thd(x) + (Th 2){x) so that T is linear. In addition, IITII = sup xlh(x)1 s Ilhll, which implies that II Til s 1 (in fact, II TIl = 1). Definition 9.1.1. A one-parameter family {Tt; t ;;::: O} of bounded linear operators on a Banach space B is called a "contraction semi group" if (i) II Ttull s Ilull for all u E B, (ii) Tt+s = T t . P = p. Tt for all s, t;;::: 0. In the following we will say, for short, that {T'; t ;;::: O} is a semi group. A 9. Markov Processes II: Application of Semigroup Theory 234 semigroup is called "strongly continuous" if TO = and I liT' - I II ~ ° as t ~ where I is the identity operator. Since, for any s, t ~ °+ , (9.1.6) 0, because T' is a contraction, this implies uniform continuity of T'. 
In the theory of homogeneous Markov processes, of particular interest is the semi group induced by the transition probability. Let {~(t); t ~ o} be a real homogeneous Markov process with the transition probability P(x, t, B). Let B be the set of all real bounded Borel functions on R. For each fEB and t ~ 0, define (T'f)(x) = r: f(y)P(x, t, dy). (9.1.7) The family of operators T' defined by (9.1.7) is clearly a contraction. On the other hand, from the Chapman-Kolmogorov equation, we have (T'+sf)(x) = = = = f: f: f: f: f: f: f(y)P(x, t f(y) + s, dy) P(x, t, dz)P(z, s, dy) P(x, t, dz) f(y)P(z, s, dy) P(x, t, dz)(T'f)(z) = (T'T'f)(x). In other words, and this is the semigroup property. 9.2. Generator of a Semigroup ° Let {T'; t ~ o} be a strongly continuous semigroup on a Banach space B. If a sequence {hn}f c:: B is such that IIh n - hll ~ as n ~ 00, we say that it strongly converges to h and write h = s lim hnClearly, the strong convergence is nothing but pointwise uniform convergence. 9.2. Generator of a Semigroup 235 Definition 9.2.1. The (infinitesimal) generator A of the semigroup is defined by · T1- f Alf =s 11m -1-+0+ t (9.2.1) at those fEB for which this limit exists. Denote by DA c: B the subset of elements where the limit (9.2.1) exists. Clearly, if fl' f2 c: D A' then + f3f2 Cl.fl E DA • Consequently, A is linear operator on Dk We now prove the following proposition. Proposition 9.2.1. Iff E D A, the function T'f is differentiable in t and (a) T'fED A , dT1 dt (b) - (c) T1 - f PROOF. = AT'f = T'AJ, = I ' (9.2.2) TSAf ds. (a) We have to show that the limit . ThT'f- T1 s hm h h-+O+ exists. This, however, follows from ThT'f - T'f h T"f - f h ---:;---- = T' ---=--- (9.2.3) and the fact that the limit on the right-hand side of the last equation exists as h --+ O. (b) From (9.2.3), we have that, for all t ~ 0, there exists . T'+hf - T'f d+T1 s hm =-h-+O+ h dt and that (9.2.4) To complete the proof we have to show that d+T1 d-T'f ~=~ for eachfE D A • 236 9. Markov Processes II: Application of Semigroup Theory Consider 0 < s < t and write I rtf -s Tt-sf - TtAf11 = I rtf -s rt-sf - Tt-sAf + Tt-sAf - TtAf11 ~ I Tt-s(Pfs- f - Af)11 ~ I T t- sI (II TSfs- f - Afll as s --+ 0 which completes the proof of (b). (c) The last part follows from I :s = rtf - f = Pf ds I + I Tt-S(PAf - Anll + I PAf - Alii) --+ 0 PAf ds, which proves the proposition. D In the following we will define the concept of a strongly integrable function. Let W, be a mapping W: [a,b] --+ B. If there exists the limit slim n-l L (tk+l - 6-0 k=O tdW,k+l' (9.2.5) where a = to < tl < ... < tn = band () = max O :;:;k:;:;n-l (tk+l - t k ), the mapping W, is called "strongly integrable" and the limit (9.2.5) is denoted by f W,dt. The following properties of the integral are simple to verify. If W, is strongly integrable, then nv, is strongly integrable on [a, b] and (9.2.6) where T: B --+ B is a linear operator. If W, is strongly continuous and integrable, then Ilf w,dtll ~ f IIW,II dt. (9.2.7) In addition, if W, is strongly integrable and strongly continuous from the right, s lim -h1 h-O+ f a h a + W,dt = J¥". (9.2.8) 9.2. Generator of a Semigroup 237 Finally, if d'W,/dt is strongly continuous on [a, b], then dW. f -i dt = b a w" - w,.. (9.2.9) The next proposition holds for strongly continuous semigroups. Proposition 9.2.2. Let {T'; t ~ O} be a strongly continuous semigroup; then, for all t ~ 0, (9.2.10) PROOF. 
Observe that for all 0 < u < t, we have [see (9.2.6)] ~(TU -I) t T"fds = ~{t T"+ufds - t T'f dS } (after the substitution t = u + sand [0, t] = [0, u] U [u, t]) = ~{r+1 TTfdt - = ~{f+u TTf dt - t f: T"f dS } T"f dS}. By letting u -+ 0 and invoking (9.2.8), we have s lim _(TU - I) 1 u-+O+ U i' TOf ds = T'f - f. 0 Thus, the limit exists and the proposition follows. D How big is the set D A? Does it contain enough elements to make the concept of a generator useful? The following corollary answers these questions. Corollary 9.2.1. If A is the generator of a strongly continuous semigroup {T'; t ~ O} on B, then DAis dense in B. As a matter of fact, because T'f is strongly continuous, it follows from (9.2.10) that, for every feB, -1 t i' i' 0 T"fdseD A· On the other hand, due to (9.2.8), slim -1 1-+0+ t 0 T"f ds = feB. In other words, any element feB is a strong limit of a family of elements from DA- 9. Markov Processes II: Application of Semigroup Theory 238 Proposition 9.2.3. The operator A is closed, i.e., slim fn =f, if {f,,} c DA and slim Afn = h, then fED A and Af = h. PROOF. From (9.2.2.c) we have, for every n = 1, 2, ... and t L L L Ttfn - By letting n --+ 00, we obtain = rtf - f f" = T'Afds ~ 0, T'Af" ds. = T'hds E DA due to (9.2.10). Dividing both sides of the last equation by t and letting t --+ 0 +, we conclude that Af = h, which proves the proposition. D 9.3. The Resolvent This section is concerned with solutions of the differential equation (9.2.2.b). In the numerical case, the semigroup property leads to the classical Cauchy functional equation f(s + t} = f(s}f(t}, f(O} = 1 whose only possible continuous, in fact, the only measurable solutions, are f(t} = e llt and f(t} == O. In the case of the differential equation, dT t dt = A Tt ' TO = I, to be also exponential, namely, would require that B = D A so that A is a bounded operator. This, however, is not always the case, although A is always the limit of bounded operators (Tt - I}/t. Therefore, a new method to solve this differential equation is required. To this end, we need the concept of a resolvent. Definition 9.3.1. Let {Tt;t space B, and let R;./ = ~ to O} be a contraction semigroup on a Banach e- At T1 dt, fEB, A > O. (9.3.1) The family of operators {R l ; A > O} is called the "resolvent" of the semigroup. 9.3. The Resolvent 239 From (9.3.1), we see that RJ is the Laplace transform of the continuous function Ttf Because A > 0, the domain of definition of R;. is the whole B. Consequently, R;. is a bounded linear operator. In fact, IIRJII :::;; LX) e-Atll Ttfll dt :::;; Ilfll/A, (9.3.2) and linearity is apparent. Note that the integral (9.3.1) is defined in the Riemann sense. Proposition 9.3.1. Let {Tt; t ~ O} be a strongly continuous semigroup on a Banach space B and A its generator; then, for each A > 0, (9.3.3) In addition, the mapping (9.3.4) is one-to-one and (9.3.5) PROOF. Let us first prove that (9.3.3) holds. To this end, write 1 t - I)R;.h = t(T' 1 t(T - I) foo e-;,sYSf ds 0 [by invoking (9.2.8)] 1 e;'t = __ ft 1 e;'t = __ ft 1 e-;'uTuhdu + _(eAt - 1) foo e-;,sYShds t o t 0 1 e-;'uTuhdu + _(e;'t - 1)R;.h. t o t By letting t -+ 0 +, we see that AR;.h = -h + AR;.h. (9.3.6) Consequently, the limit slim (T' - I)R;.h/t t-O+ exists, which proves (9.3.3). From Equation (9.3.6), we obtain AR;.h - AR;.h = h (9.3.7) 240 9. Markov Processes II: Application of Sernigroup Theory so that 1= R;.h is a solution of the equation )f - AI = h. (9.3.8) To complete the proof we have to show that (9.3.8) has a unique solution for each A > O. 
Assume that 11 and 12 are two solutions of (9.3.8); then, rp = 11 - 12 E DA and I (9.3.9) Arp - Arp = O. On the other hand, from Proposition 9.2.1, we see that Ttrp E DA and satisfies dTtrp _ TtA _ 'Tt ----;It rp - I\. rp. Thus, Consequently, for all t 0, ~ e-MTtrp = C (a constant). For t = 0, we obtain that C = rp, which gives o ~ Ilrpll ~ e-MIITtrpll ~ e-Mllrpll-+O as t -+ +00, so that Ilrpli = 0, and uniqueness is proved. This shows that AI - A is one-to-one on DA- Finally, from (9.3.7), we obtain that R;. = (AI - Ar 1 and that o Remark 9.3.1. Proposition 9.3.1 shows that the mapping (9.3.3) is one-to-one and onto. Corollary 9.3.1. For any I E B, we have s lim ARJ = f. ;. .... +ex> (9.3.10) In lact, III - ARJII = If! - A Lex> e-;'tTfj dt I = All Lex> e-;'t[f - TfjJ dt II = II Lex> e-U[f - p;'-ljJ du II 9.4. Uniqueness Theorem 241 Now, for each u 2 0, Ilf - TU'<-'fll --+ 0 as A --+ 00 and the integrand is bounded by 2e- u llfll. Now invoking the dominated convergence theorem, (9.3.10) follows. 9.4. Uniqueness Theorem In this section, we show that, under certain regularity conditions, semigroups are uniquely determined by their generators. Thus, given a generator, we should be able, at least in some cases, to recover the corresponding semigroup. This appears simple enough if the generator is bounded. If not, there are considerable difficulties that must be resolved to achieve this goal. Let B be a Banach space. The space of all bounded, continuous linear functionals I: B --+ R with the norm 11111 = 11(f)1 ~~~m (9.4.1) fEB is also a Banach space, which is called the "conjugate" space of B and denoted by B*. We now need the following result. Lemma 9.4.1. Let the mapping be continuous and bounded. Then, Loo e-.<tw(t) dt = 0 (9.4.2) for all A > 0 implies that w(t) == O. PROOF. Let I E B*; then I(w(t» is a bounded and continuous real function. Due to linearity of I, we have, for all A > 0, Loo e-.<tl(w(t»dt = I(Loo e-.<tW(t)dt) = O. This clearly implies that I(w(t» = 0 for all t 2 0 because of the uniqueness theorem for the Laplace transforms of real functions. Because this holds for any I E B*, the assertion follows. D We are now ready to prove the following proposition. Proposition 9.4.1. Let {Tci;t 2 O} and {Tf;t 2 O} be two strongly continuous semigroups on B with common generator A; then, Td = Tf for all t 2 O. 242 9. Markov Processes II: Application of Semigroup Theory PROOF. From Proposition 9.3.1 we deduce that TJ and T{ have the same resolvent, R). = (A.I A)-I. - Consequently, for every fEB and A. > 0, Loo e-;.t(TJ! - Ttf)dt = R;.f - RJ = O. From this and the previous lemma, the assertion follows. D Next, we shall attempt to construct a semigroup from its generator A when A is bounded. To this end, the following notation is needed. Define exp{A} as co eA = L Ak/k!, k=O where An is the n-iterate of A, i.e., An (9.4.3) = A . An-I, n = 1, 2, .... Because IIAnl1 :s; IIAII" and A is bounded, it follows that the series (9.4.3) converges absolutely. Proposition 9.4.2. If the generator A of a semigroup {Tt; t for all t ~ ~ O} is bounded, then 0, (9.4.4) PROOF. Let us show that I etA t- I - A 11-+ 0 as t --+ O. (9.4.5) Clearly, lIe tA - I - tAil :s; co L IIAllktk/k! = e k=2 tllAIl - 1- tiIAII, so that (9.4.5) holds. Because exp{tA} is a semigroup, the assertion now follows from Proposition 9.4.1. D From this proposition we see that a semigroup Tt must be of the form (9.4.4) if its generator A is bounded. In addition, D A = B. Therefore, every bounded operator generates a semigroup. 
If the generator A is not bounded, the situation is considerably more complicated. However, from Proposition 9.4.1, we know that A determines uniquely its semigroup. The next proposition shows that, under certain conditions, T' is the unique solution of the differential equation (9.2.l.b). Proposition 9.4.3. Let A be the generator of a semigroup {Tt; t ~ O}. Iff E D A , the function W(t) = Ttf is the unique solution of the differential equation 9.5. The Hille-Yosida Theorem 243 dW(t) dt = A W(t) (9.4.6) subject to the following conditions: (i) W(t) is strongly differentiable for all t > 0; (ii) II W(t)11 ~ Ke~', where K, oc E (0, 00) are some constants; (iii) W(t) -+ f strongly as t -+ 0+. PROOF. According to (9.2.1.b), the function W(t) satisfies Equation (9.4.6) and condition (i). To show uniqueness, assume that W1(t) and W2 (t) are two different solutions of (9.4.6) satisfying (i)-(iii) and set V(t) = WI (t) - W2 (t). Then V(t) is also a solution of (9.4.6) which satisfies conditions (i) and (ii) and V(t) -+ 0 (strongly) if t -+ 0+. Set U(t) = e- ll V(t); then, from (9.4.6), we have d - U(t) = - Ae- lt V(t) dt d + e- ll - dt V(t) = -AU(t) + e-l'AV(t) = -AU(t) + AU(t) = -R;:IU(t). Hence, II II Integrating both sides of this equation, we get o U(s) ds = -Rl 0 d ds U(s)ds = -RlU(t). When t -+ 00, the left-hand side tends to the Laplace transform of V(t), while the right-hand side converges to O. Therefore, for a suitable A > 0, the Laplace transform of V(t) vanishes. This and Lemma 9.4.1 prove the assertion. D 9.5. The Hille-Yosida Theorem In the previous section, we have established that every bounded operator generates a semigroup specified by (9.4.4). If an operator is unbounded, it may not be the generator of a semigroup. When is such an operator the generator of a semigroup? This question is answered by the following proposition due to Hille and Yosida, which characterizes the operators that are generators of semigroups. Proposition 9.5.1. Let B be a Banach space and A a linear operator with domain DA c B. In order to be the generator of a strongly continuous semigroup on B, it is necessary and sufficient that the following conditions hold: 9. Markov Processes II: Application of Semigroup Theory 244 (i) DA is dense in B; (ii) lor every A > 0 and h E B, the equation (9.5.1 ) )J-AI=h has a unique solution lEDA; (iii) the solution I satislies IIIII ~ I h II/A. PROOF. The necessity of these conditions was already proved in the previous section. To prove their sufficiency, assume that they hold. Then, from (ii), it clearly follows that the operator AI - A is invertible. Set R;. = (AI - A> 0 Ar 1 , (we will see later that R;. is indeed the resolvent). Clearly, R;.: B --+ DA (onto) and by (iii) we have that IIR;.II ~ A- 1 . Define A;. = AAR;. and write A .. = A{ -(AI - A)R .. + AR .. } = A(AR;. - I). (9.5.2) From this, we deduce that A .. is bounded, i.e., I A .. I ~ AllAR .. - III ~ A(l Let us prove that, as A --+ + 1) = 2A. 00, Ad --+ AI strongly for all I To this end, we will show first that, as A --+ DA • (9.5.3) E DA • (9.5.4) 00, strongly for all I ARd --+ I E Because [see (9.5.2)] 1 ARd - I = -XAd = ARd = R .. AI, we have liARd - III ~ IIR.. IIIIAIIi ~ IIAIll/A --+ 0 as A --+ 00, which proves (9.5.4). Let us now show that (9.5.4) holds for every IE B. Due to condition (i), for any IE Band e > 0 arbitrarily small, there exists 10 E D A such that III - loll ~ e. Then liARd - III ~ ~ IIARdo - loll + IIAR;.(f - lo}ll + III - loll IIARdo - loll + 2e. 
Consequently, due to (9.5.4) lim sup liARd - III .. --+00 which shows that (9.5.4) holds for all I E B. ~ 2e, 245 9.5. The Hille-Yosida Theorem To prove (9.5.3) consider an fED A; then, due to (9.5.4), A;.f = ).R;.Af -+ Af strongly as). -+ 00. The last result implies, roughly speaking, that the bounded operator A;. approximates A for large ).. Define (9.5.5) TI = exp{tA;.}. This is clearly a (contraction) semigroup for each), > O. Taking into account (9.5.2), we can write TI = exp{t).2R;. - tAl} = exp(-t).)exp{t).2R;.}. Therefore, Next, we have to prove that TIf has a limit as ). -+ to show that, for fED A, II TIf - T;fII -+ 0 as).,,., -+ 00. To do this, we have 00. (9.5.6) Let ). and ,., be fixed: Set (9.5.7) then ql(t) = A;. Tlf - A~ T;f = A;.qJ(t) + g(t), where g(t) = (A;. - A,.) T;f. It is not difficult to show that d d/qJ(t)exp { -tA;.}] = g(t)exp{ -tA;.}. Therefore, because qJ(O) = 0 [see (9.5.7)], qJ(t)exp{ -tA;.} Noting that = t exp{ -sA;.}g(s)ds. Tl and A" commute, we can write qJ(t) = = Hence, because t t exp{(t - s)A;.}g(s)ds TI-sT"S(A;. - A,,)f ds. Tl is a contraction for each), > 0, 9. Markov Processes II: Application of Semigroup Theory 246 II qJ(t) II ~ I ~ t II (A;. II TrST;(A;. - Aq)JII ds - Aq)JII. (9.5.8) But, by (9.5.3), the last term tends to zero as A, '1 -+ 00, which proves (9.5.6). Notice that the convergence is uniform in t in any compact interval. Now define the family of operators {T'; t ;;::: O} by setting T'J = slim TIJ. J E DA, in the strong sense. Because the convergence is uniform in t, T'J is continuous in t and T'J -+ J strongly if t -+ 0+. In addition, for each JEDA, II TIJII + II T'J - TIJII ~ IIJII + e, Consequently, II T'JII ~ IIJII. It is now easy to extend II T'JII ~ where e-+ 0 as A -+ 00. Tto allJE B. Finally, we have to show that the generator of the semigroup that we have just constructed is A. In any case, T' has a generator, say Ao. Let us compute AoJ for JEDA' Because it follows that TIJ - J = I TIA;./ds. (9.5.9) Now IITIA;./ - T'AJII ~ IIT'AJ - TIAJII + IITIAJ - TIA;'/II· The first term on the right side tends to 0 uniformly as A -+ 00, for t bounded. The second term is bounded by IIA;./ - AJII because TI is a contraction, and this bound tends to zero as A -+ 00. Therefore, TIA;./ -+ T' AJ strongly as A -+ 00 for each fixed JED, and the convergence is uniform in t over compact intervals. As a result, we can pass to the limit in (9.5.9) to obtain T'J - J = I TSAJ ds. Dividing by t and letting t -+ 0+, we have Aof = s lim T'J - J = slim -1 1.... 0+ t 1.... 0+ t I' 0 (9.5.10) ,T"AJdu = Af Consequently, DAo ::::l DA and AoJ = AJ on DABecause Ao is the generator of {T'; t ;;::: O}, AI - Ao maps D Ao onto B by 9.6. Examples 247 Proposition 9.3.1. But AI - A agrees with AI - Ao on DA and maps DA onto B by condition (ii). It follows that D Ao = D A and, thus, Ao = A. This proves the proposition. 0 9.6. Examples The first five sections of this chapter are concerned with some elements of semigroup theory. The question, of course, is where does all this lead? Our principal interest here is homogeneous Markov processes and their transition probabilities. Semigroup theory is a tool which is supposed to help us to investigate what are possible systems of transition probabilities {P(x, t, B)}. The development so far clearly implies that we should study P(x, t, B) through its generator. In this respect, it is of some interest to find out which operators on a suitable Banach space can be generators of semigroups. 
If an operator is bounded, then it is certainly the generator of a semigroup (see Proposition 9.4.2). An unbounded operator is not necessarily the generator of a semigroup. The next several examples illustrate this particular situation. EXAMPLE 9.6.1. Let {~(t); t ~ O} be a homogeneous Markov chain with a finite state space, say S = {Xl, ... ,XN} and the transition probability Pij(t) = Pg(s + t) = xjl~(s) = Xi}. We will assume that all states in S are stable (see 8.3.15). Denote by Mt the transition matrix Mt = {Pij(t)}; then due to condition (8.3.2), M O = I where I is the unit matrix. In addition, condition (9.3.3) implies that lim Mt = 1. t-+O+ From the Chapman-Kolmogorov equation, it readily follows [see (8.3.7)] that, for all s, t ~ 0, M S+ t = MS.M t = Mt·M s, so that the family of transition matrices {Mt; t ~ O} represents a semigroup. Denote by B the set of all real bounded functions on S (as a matter of fact, the Banach space B is the set of N-dimensional column vectors). For every Xi E Sand fEB, we define (Mtf)(Xi) = N L Pij(t)f(x). j=l We have I(Mtf)(Xi) I :::;; Ilfll for all Xi E S, (9.6.1) 248 9. Markov Processes II: Application of Semigroup Theory so that Xi which clearly shows that {Mt; t 2 O} is a contraction semigroup. Consider ~{(Mtf)(XJ - f(xJ} = ~{Jo pij(t)f(x) - f(XJ} = ~ L~i pij(t)f(x) - [1 - Pii(t)Jf(XJ }. From Propositions 9.3.2 and 9.3.3, it readily follows that the limit 1 lim -t {(Mtf)(x;) - f(x;)} L %fU) N = t-+O+ quf(x;) (9.6.2) j=li = (Af)(x;) exists for all fEB and Xi E S, where A= Because all states of S are stable, Iqiil < 00. It is clear that A is a bounded operator because IIAfil = S~iP I(Af)(xJ I = S~iP !j~; qijf(xj ) - quf(X;)! N S sup i L 1%1·lIfII· j=i Finally, to show that A is the generator, consider II Mtft-f -Afll =s~ipl~{(Mtf)(X;)-f(X;)} -(Af)(X;)!~O due to (9.6.2). Because the generator is bounded, we have Mt = etA = I + n tk k=l k. L ,Ak • EXAMPLE 9.6.2. Let {~(t); -00 < t < oo} be a strictly stationary homogeneous chain with the same state space and transition probabilities as in the previous example. Denote by 249 9.6. Examples then, for all s, t ~ 0, Pj = P{ ~(s + t) = j} = n L Pij(t)p;. ;=1 This implies that (9.6.3) where Q' = (Pi" .. , Pn) and Q' means the transpose of Q. Consider now the case N = 2 and take Our goal is to determine MI. First, we have Pll(t) + pdt) = Pll (t) + P21 (t) = 1. 1 and, from (9.6.3), we deduce Set Pll(t) = O(t); from the last two equations, we obtain that M' = [O(t) 1 - O(t) 1 - O(t)J. O(t) Consequently, dM'1 dt 1=0 = A = 0('(0)[ 1 -1 -1J 1 ' where 0 > 0('(0) = - A. Set A = - AAo; then it is readily seen that, for all n = 1,2, ... , An = t( -2A)n Ao. Because A is bounded, we have I _ M - I = I 1 ~ (- 2At)k A + 2 k~l k! + tA o(e- 21, - 1 OJ =[0 1 0 1) [t(e- 1) + -t(e-2.l.t - 1) 21t - The two examples discussed above deal with generators which are bounded linear operators defined on the whole of B. In this respect, it is of some interest to point out that if A is a generator whose domain is the entire space B, then A must be bounded. 250 9. Markov Processes II: Application of Semigroup Theory If a generator A is not bounded the solution (9.4.3), then breaks down because the series on the right-hand side does not have to converge even on D A' This is unfortunate because many of the most interesting examples (like the Brownian motion process) do not have bounded generators. 9.6.3. Let {~(t); t ~ O} be one-dimensional standard Brownian motion. 
As we have seen, this is a homogeneous Markov process with state space {R,~} and transition probability EXAMPLE P(x, t, B) = (2ntfl/2 f { B exp - (Y_X)2} 2t (9.6.4) dy. Let B be the Banach space of real bounded Borel measurable functions on R with the usual supremum norm. Simple calculation yields that, for any fEB, (Ttf)(x) - f(x) = (2nfl/2 f: exp { - u;} f(uJt + x)du - f(x). (9.6.5) From this, it seems clear that the semigroup {P;t ~ O} induced by (9.6.4) is not strongly continuous on B. Denote by Co c B the subset of continuous functions and by Bo c B the subset on which the semigroup is strongly continuous. Let us show that Bo c Co. To this end, consider the resolvent of the semigroup (h E B) (R;.h)(x) = = = LX> e-;'t(p g)(x) dt f OO 0 e-;'t dt(2ntfl/2 (2Afl/2 f: A> 0 foo -00 {(y -2t X)2} h(y) dy exp - exp{ly - xljU}h(y)dy. From this we see that for every h function on R, so that E (9.6.6) B, (R;.h)(x) is a bounded continuous On the other hand, (9.6.7) [see (9.3.3)], where R;.B = {R;.h; h E B}. But according to Corollary 9.2.1, the set DA is dense in Bo (as a matter of fact, the closure DA = Bo). Because Co is also a Banach space, it follows that Bo c Co. For h E Co, write (R;.h)(x) = (2Afl/2 {exP ( -xjU) f:oo exp(yjU)h(y)dy + exp(xjU) LXl exp( - yjU)h(Y)dY } 251 9.6. Examples and set f(x) = (RAh)(x), then, after some simple calculations, we have f'(x) = -2exp( -xfo) f:oo exp(yfo)h(y)dy + j2)J'(x), (9.6.8) (9.6.9) f"(x) = 2Af(x) - h(x). The last equation shows that 1"(') is a bounded and continuous function on R. Consequently, 1'(.) is uniformly continuous. From (9.3.7), we have .if(x) - (Af)(x) = h(x). This and (9.6.9) yield (Af)(x) = !f"(x). (9.6.10) In other words, every fED A satisfies (9.6.1 0). Now, denote by ct c Co the set of uniformly continuous functions on R. From Equation (9.6.8), it follows that I' E Co. This then implies that f E q. It is also not difficult to see that Af E Ct. From this and (9.6.10), we have that I" E q. Let us now show that the converse also holds, i.e., if f E ct => I" E ct => fED A' To this end, consider ~{(T'f)(X) = ~f"(X) f(x)} - 1 tv'2nt M::: foo {(y -2 X)2} [f(y) - exp - t -00 1 f(x)] dy - -2f"(x) (using the Taylor formula) f(y) = f(x) = + (y - x)f'(x) foo exp - 1 M::: tv' 2nt + ~(y - + !(y - x)2f(x + o(y - {(y -2 X)2}{ (y - x)f'(x) -00 X)2f"(X f: = (2nfl/2 = !(2n)-1/2 t + o(y - X»}d Y - e- u2 / 2 nu 2f"(x f:oo e- u2 / 2 [f"(x as t -+ 0+. f) - !f"(x) f"(x)]u 2 du. This implies that I ~(T'f - (0 < 0 < 1) ~f"(X) + OuJt)]du - + OuJt) - x» ~f" 11-+ 0 252 9. Markov Processes II: Application of Semigroup Theory Consequently, DA = {J;f E q and f" E Cc1'} and Bo = Cc1'. Next, assume that P(x, t, B) is such that P(x, t, B) = (Tt IB)(x) E D A; then OP(~,/,B) = A(P(',t,B))(x) or oP(x, t, B) at 1 02 P(x, t, B) -2 ox 2 (9.6.11) which is the backward Kolmogorov equation [see (8.8.8)]. From the general theory of the equation of heat conduction, we know that the unique solution of (9.6.11) is (9.6.4). 9.7. Some Refinements and Extensions Let g(t); t ;::: O} be a homogeneous Markov process with state space {R, 9I} and transition probability P(x, t, B). In this section, we discuss some morerefined properties ofthis function and of the semigroup which it induces. The first question that naturally arises is one concerning the continuity of P(x, t, B) with respect to t at t = O. As we have seen, this property was repeatedly used in Chapter 8 to obtain many useful results. Which of the Markov processes have this property? 
In other words, when does condition (8.1.13) hold? To answer this and some related questions, denote by U(x, e) the open e-neighborhood of x E R, i.e., U(x, e) = {y E R;ly - xl < e}. Definition 9.7.1. A transition probability P(x, t, B) on {R, 9I} is said to be "stochastically continuous" if lim P(x, t, U(x, e)) = 1 (9.7.1) for all e > 0 and x E R fixed. If the limit (9.7.1) holds uniformly in x for each e > 0, the transition probability is said to be "uniformly stochastically continuous." The following proposition specifies conditions on the sample functions of the Markov process ~(t) under which (9.7.1) holds. Proposition 9.7.1. Assume that the sample functions of the homogeneous Markov process g(t); t ;::: O} are continuous from the right; then (9.7.1) holds. 9.7. Some Refinements and Extensions 253 Let {t n} c R+ be a sequence decreasing to zero and set Bn {Wn) E U(x, e)}. Because e(t) is continuous from the right, we have PROOF. {e(0) E U(x,e)} s;; = n U Bk = liminf Bn· 00 00 n=1 k=n n-+oo Therefore, for all x E Rand e > 0, li~~f PABn) ~ Px{li~~nf Bn} ~ Px{e(O) E U(x,e)} = 1. This implies that lim inf P(x, tn' U(x, e» n->oo ~ 1, which proves the assertion. D In the following, we will always assume, unless otherwise stated, that Markov processes have right-continuous sample functions. Denote as usual by B the Banach space of bounded measurable functions on R with supremum norm II!II = supx If(x) I, fEB. On B, we have defined a oneparameter family of bounded linear operators {Tt; t ~ O} induced by a transition probability P(x, t, B) by (Ttf)(x) Clearly, for each t ~ = f:f(Y)P(X, t, dy) = Ex{J(W))}· 0, Tt: B--+B, = T P, and I Ttll : :; ; 1. As we have seen in the previous sections, the theory of semigroups of bounded linear operators in a Banach space deals with the exponential functions in infinite-dimensional functional spaces. These problems were investigated independently by Hille and Yosida (1948). They introduced the concept of the infinitesimal generator A of a semigroup Tt and discussed the problem of generation of T' in terms of A. Let us now prove the following result. T s +t t • Proposition 9.7.2. Let fEB be continuous; then, for each x E R, lim (T1)(x) = f(x). '->0+ Due to continuity of f, for each x E R and any ~ > 0, there exists U(x, e) such that when y E U(x, e), If(y) - f(x)1 < ~. Therefore, PROOF. (T'f)(x) - f(x) = r (f(y) - f(x»P(x, t,dy) JU(x.£) + r J UC(x,e) (f(y) - f(x»P(x, t, dy). 9. Markov Processes II: Application of Semigroup Theory 254 From this, it readily follows that + 211fll P(x, t, UC(x, e». I(Ttf)(x) - f(x) I :::;; bP(x, t, U(x, e» This and the previous lemma prove the assertion. D Corollary 9.7.1. From the last inequality, we deduce that the semigroup is strongly continuous on the subset of continuous functions C c B if P(x, t, B) is uniformly stochastically continuous. The following proposition is due to Hille. Proposition 9.7.3. Denote by oc(t) ~oc(t) = lim t t .... oo PROOF. = In I Ttll; then, inf t>O ~oc(t). t From the semi group property, we deduce that 11T"+tll = IIT"Ttll :::;; IITslI'llytll, so that oc(s + t) :::;; oc(s) + oc(t). Now, denote by f3 = inf ~oc(t); t>O t then f3 is either -00 or finite. Assume that f3 is finite. We choose, for any e > 0, a number x > 0 such that oc(x):::;; (f3 + e)x. Let t > x and n be such that nx:::;; t:::;; (n + l)x:Then 1 t 1 t 1 t f3 :::;; -oc(t) :::;; -oc(nx) + -oc(t - nx) nx 1 1 :::;; - -oc(x) + -oc(t - nx) txt 1 t nx :::;; -(f3 + e) + -oc(t - nx). t Thus, letting t -. 
00 in the above inequality, we obtain lim t .... oo The case f3 = -00 ~oc(t) = f3. t is treated similarly. Remark 9.7.1. Lemma 8.3.1 is a version of this proposition. D 255 9.7. Some Refinements and Extensions The concept of the resolvent R;. associated with a semigroup {T'; t ~ O} was discussed in Section 9.3 of this chapter. According to Definition 9.3.1, this is a family of bounded linear operators {R;.; A > O} on the Banach space of bounded Borel-measurable functions on R defined by (R;.f)(x) = LX> e-Al(T'j)(x)dt, A> 0, fEB. From this and Fubini's theorem, it follows that (Rd)(x) = Ex It is easy to see that {Lx> e-).tf(~(t»dt}. (9.7.2) IIR;.II = A-I, (R A 1)(x) == A-I. Proposition 9.7.4. For all AI' A2 > 0, R A, - RA2 + (AI (9.7.3) - A2)R A2 R A, = O. In addition, iff E B is continuous at xo, then lim A(Rd)(xo) = f(xo). A--'oo PROOF. Consider P(RAJ)(x) = I : P(x,s,dy) too e-A"(T'f)(y)dt = too e-A"dt I : I:f(U)P(X,S,dY)P(y,t,dU) = fOO e- A" dt Ioo f(u)P(x, s + t, du) ° too e-A"(P+'f)(x) dt -00 = = e-A,s Isoo e-A,t(T'f) (x) d•. From this, we now have too e- A2S(T sR;.J)(x)ds = (R;'2R;.J)(X) = too e-;' 2S e-;"s ds Is'" e-;"t(Ttf)(x) d. = too e-;.,t(T'f)(x)d. t e(;"-;'2)Sds = (AI - A2tl {(R;.J)(x) - (R;.,f)(x)}, 256 9. Markov Processes II: Application of Semigroup Theory which proves (9.7.3). Next, from A(R;./)(xo) = too Ae-M(TIJ)(xo)dt = too e-t(Tt"-1j)(x o) dt and Proposition 9.7.2, the second part of the proposition follows. 0 Problems and Complements 9.1. Let {Tt; t ~ O} be a semigroup on a Banach space Band Bo c B the subset on which Tt is strongly continuous. Show that Bo is a closed linear manifold. 9.2. Let A be a bounded linear operator on a Banach space B. Show that (i) lIe Al1 ::;; exp{IIAII}; (ii) elI = et 1; (iii) e A+B = eA. e B if A· B = B· A. 9.3. If the bounded operators A and B are such that IletAl1 ::;; 1 and IletBl1 ::;; 1 for all t ~ 0, show that IletBf - etAfil ::;; t IIBf - Alii. 9.4. If A is a bounded operator on B, show that Tt = etA is a semigroup. 9.5. Let Co [ -00,00] be the space of bounded continuous functions on [-00,00]. Define on Co the operator T t , t ~ 0, by L ().4f(x <Xl (Ttf)(x) = e-).t kll)lk! k=O where). > 0 and Il > o. Show that: (i) Tt: Co ..... Co; (ii) {Tt; t ~ O} is a semigroup. Is the semigroup strongly continuous? 9.6. (Continuation) Define Tt on Co by (T1)(x) = f: K(x - u, t)f(u) du, where K(x, t) = (2m)-1/2 e -x 2 /2t, -00 < x < 00, t > o. Show that {Tt; t ~ O} is a strongly continuous semigroup. 9.7. Let {Tt;t ~ O} be a semigroup on a Banach space Band Bo c B the subset on which it is strongly continuous. Show that TtBo = Bo for all t ~ o. 9.8. (Continuation) Show that there exists a constant M > 0 and), > 0 such that, for allt~O, II Ttll ::;;Me).t. 9.9. (Continuation) Show that for each hE B o, Tth is a continuous mapping from [0,00) into Bo ({Ttf; t ~ O} represents a curve in Bo). 257 Problems and Complements 9.10. Let {W); t ~ O} be a homogeneous Markov process with state space {R, Bt} and transition probability P(x, t, B) satisfying condition (8.1.13). Let B be the set of bounded Borel functions f: R ..... R. For each t ~ 0, define T'B ..... B by (T'f)(x) = f: f(y)P(x, t, dy) = Ex {J(W)) }. Show that {T!; t ~ O} is a strongly continuous contraction semigroup on B. 9.11. (Continuation) Show that, for each t B. ~ 0, T' is a continuous linear operator on 9.12. (Continuation) If hE B is continuous at Xo E R, show that t-+O+ 9.13. Show that the generator A of a semigroup is a linear operator on D A. 9.14. 
Show that for every f E D A, Af E Bo. 9.15. Show that the generator ofthe semigroup in Problem 9.4 is A. 9.16. Show that the generator of the semigroup in Problem 9.5 is (Af)(x) = A.{J(x - Jl.) - f(x)}. 9.17. Determine the generator of the semigroup in Problem 9.6. 9.18. Show that R;.B ,;; B. 9.19. Let {T'; t ~ O} be a contraction semigroup on a Banach space B with generator A. Show that, for each A. > 0, (A.l - A): D A ..... B is one-to-one, and that the inverse mapping taking B into DAis the resolvent R;. (see Proposition 9.3.1). CHAPTER 10 Discrete Parameter Martingales 10.1. Conditional Expectation The concept of a martingale introduced in Section 1.5 of Chapter 1 [see (1.5.19)] was defined in terms of the conditional expectation with respect to a a-algebra. In this section, we will explore briefly some basic properties of this conditional expectation, which are needed in this chapter. We begin with some definitions. Let {n,gjJ,p} be a probability space and denote by L2 = L 2{n,gjJ,p} the Hilbert space of random variables (complex or real) defined on {n, fJI, P}, having finite variances. On L 2 , the inner product is defined by (Zl,Z2) = EZ 1 Z2 (see Definition 5.1.3). The norm of an element Z IIZII E L2 is then = (Z,Z)1/2 = (EIZI2)1/2. Let f/ c gjJ be a sub-a-algebra; the subset of all those random variables from L 2, measurable with respect to Yo is a subspace L! = L2 {n, f/, P} of L 2. Denote by p* the projection operator defined on L 2 {n,gjJ,p} which projects elements of this space perpendicularly onto L!. According to Proposition 5.4.1., for every Z E L 2, P*Z E L! so that P*Z is f/-measurable. In the following, we will write (10.1.1) P*Z = E{ZIf/} and call it the "conditional expectation of Z with respect to (or given) the a-algebra f/." It is clear that, due to Proposition 5.3.1, the f/-measurable random variable E{ZIf/} is uniquely defined up to a P-null set. If Y E L!, then, because P* Y = Y, it follows that 259 10.1. Conditional Expectation E{YI9'} = Y Next, for any A have E 9',IA E L (10.1.2) (a.s.). q; consequently [see (5.2.8) and (5.2.9)], we clearly E{ZI9'}dP = f P*Z·IAdP = (P*Z,IA) = (Z,P*IA) = (Z,IA) = L ZdP. (10.1.3) = E(Z). (10.1.4) From this, it follows that E(E{ZI9'}) If Z = IB' where BE fJI, the conditional expectation E{ZI9'} is called the "conditional probability of B given 9''' and written P(BI9'), i.e., P(BI9') = E{IBI9'}. (10.1.5) = lA (10.1.6) From this and (10.1.2), we obtain P(AI9') when A E (a.s.) 9'. In addition, due to (10.1.3), L (10.1.7) P(BI9')dP = P(A n B). We now list some basic properties of the conditional expectation which are not difficult to prove. First, E{aZ1 + PZ219'} = aE{Z119'} + PE{Z219'} (a.s.), (10.1.8) where a and Pare constants. To show (10.1.8), we invoke (10.1.1), (5.8.4), and (5.8.5) to obtain E{aZ1 + PZ219'} = P*(aZ 1 + PZ 2) = aP*Z1 + PP*Z2 = aE{Z119'} + PE{Z219'}, which proves (10.1.8). Let 9'1 c 9'2 C fJI be two sub-a-algebras, then (i) E(E{ZI9'2}I9'd = E{ZI9'd (ii) E(E{ZIY't} 19'2) = E{ZI9'd (a.s.). (a.s.), (10.1.9) To show the first equation, note that L2 {n,Y't,p} c L 2 {n,92,p}. If we denote by Pt Z the orthogonal projection of Z E L2 on the first subspace and by PI Z on the second one, we have E(E{ZI92}19'1) = E(PIZI9'd = Pt(PIZ) = ptZ which proves (1O.9.i). The second relation is a direct consequence of (10.1.2). If the random variable Z is independent of 9', then E{ZI9'} = E(Z) (a.s.). (10.1.10) 10. 
Discrete Parameter Martingales 260 As a matter of fact, for any A L Y, E E{ZIY}dP = L = E(Z·IA) ZdP L = P(A)E(Z) = Now consider X, Z measurable; then E L2 such that X· Z E{XZIY} E E {Z} dP. L2 and assume that X is Y- = XE{ZIY} (a.s.). (10.1.11) The proof of this begins by considering the case X = IA. For any C E Y, we have Jcr E{XZIY}dP = Jcr ZIA dP = = f f ZdP AnC E{ZIY}dP = AnC i C XE{ZIY}dP. Extension of this result to simple functions, to non-negative functions, and then to a general case follows in the usual way. Denote by ff" = a{X1,···,Xn } the a-algebra generated by the random variables Xl' ... , Xn; then, we write (10.1.12) Remark 10.1.1. For the conditional expectation of a random variable Y, given a a-algebra Y, to exist, it suffices that EI YI < 00 (the second moment is not really necessary). In such a case, E{YIY} is defined as any Y-measurable random variable satisfying the condition L E{YIY}dP = L YdP for all A E Y. Of course, E{YIY} so defined has all the properties (10.1.2)(10.1.11). In the sequel, we will assume only that the random variables we are dealing with are from Ll = Ll {n,96,p}. 10.2. Discrete Parameter Martingales Let {~n}O be a sequence of real random variables defined on a probability space {n, 96, Pl. Denote by ff" = ago,··.'~n}, n = 0,1, ... , 10.2. Discrete Parameter Martingales 261 the sub-O"-algebra of fJI generated by eo, ... , en. Clearly, {ff,,}0' is an increasing sequence, i.e.,!Fo c !F1 C ... , often called the "internal history" of gn}O'. The system (10.2.1) is called a "stochastic sequence." Definition 10.2.1. Assume that (10.2.2) then the stochastic sequence {(en, ff,,)}0' represents a discrete parameter (i) martingale if E gn+llff,,} = en (a.s.), (ii) submartingale if Eg n +1Iff,,} ~ en (a.s.), (iii) supermartingale if E gn+tlff,,} ~ en (a.s.) for all n (10.2.3) = 0, 1, .... From (10.1.4) and (1O.2.3.i), it follows that Eg n +1} = Egn } for all n= 0, 1, .... In other words, if {(en'ff,,)}O' is a martingale Eg n} = Ego} for all n ~ 1. On the other hand, if the stochastic sequence is a submartingale, we have Finally, we have, for all n = 0, 1, ... , if the stochastic sequence is supermartingale. From the properties of conditional expectations, one easily deduces that (10.2.3) is equivalent to the following: If, for each A E ff", n = 0, 1, ... , L L L L ~L ~L en+1 dP = en dP , {(en'ff,,)}O' is a martingale; en+1 dP en dP, {(en, ff,,)}0' is a submartingale; en+1 dP en dP; {(en, ff,,)}0' is a supermartingale. It is said that a martingale {(en, ff,,)} 0' is closed from the right if there exists a random variable eoo E L1 {n,fJI,p} such that en = E goolff,,} (a.s.) for all n = 0,1, .... On the other hand, if {(en,ff,,)}O' is a submartingale and 262 10. Discrete Parameter Martingales en ~ Egool§,,} (a.s.) for all n = 0, 1, ... , we say that the submartingale is closed from the right. In the modern literature a somewhat more general definition of a martingale is formulated. Let {y"}0' be an increasing sequence of sub-q-algebras of !fI, i.e., 9Oc~c···. Such a family is usually called a "history" or "filtration." A sequence {en}O' of random variables on {n,!fI,p} is said to be "adapted" to the history {y"}0' if en is Y,.-measurable for each n = 0, 1, .... A sequence gn}O' C L1 {n,!fI,p} of real random variables adapted to a history {y"}0' is called a "martingale" with respect to {y"}0' if (10.2.4) for every n = 0, 1, .... In a similar fashion, one may define the concept of a submartingale and supermartingale with respect to {y"}0'. Remark 10.2.1. 
It is not difficult to see that (10.2.4) leads to the following more general relation. For every 1 ~ k ~ n, EgnlY,.} = ek (10.2.5) (a.s.). This follows from (10.1.9.i) by induction because EgnlY,.} = E(EgnlY,.-dlY,.) = Egn - 1 1Y,.}, and so on. Remark 10.2.2. In the following, unless otherwise stated, when we say that a sequence of random variables is a martingale (submartingale or supermartingale), it means it is so with respect to its internal history. Proposition 10.2.1. Let {en}O' be a martingale and h: R then {h(en)}O' is a submartingale if {h(en)}O' C -+ R a convex junction, L1 {n,!fI,p}. PROOF. From Jensen's inequality, we have E{h(en+1)lh(eo),.··,h('n)} ~ h(Eg n +1leo,···,en}) = h(en) (a.s.). Therefore, o 10.3. Examples 263 Corollary 10.2.1. If {e,,}0' is a martingale, {le"I}O' is a submartingale because Ixl is a convex function. The function x+ = max{O,x} is a nondecreasing convex function. Indeed, because, for any two real numbers Xl and x 2, (Xl + x 2)+ ~ xi + xi, it follows that (PXl + qX2)+ ~ pxi + qxi, p +q= 1, 0 ~ p ~ 1, which proves the assertion. Therefore, {e:}O' is a submartingale. D 10.3. Examples In this section a series of motivating examples will be presented. 10.3.1. Let X ELl {n,&I,p} be an arbitrary random variable and a filtration, such that ~ c &I for all n = 0,1, .... Set EXAMPLE {~}O' ell = E{XI~}. Clearly, the sequence of random variable {e,,}0' is adapted to due to (10.1.9.i), we have {~}O'. Next, E{e"+1I~} = E{E{XI~+dl~} = E{XI~} = ell' which implies that the sequence {e,,}0' is a martingale. Many (but not all) martingales can be obtained in this way. The next example gives a gambling interpretation of a martingale. 10.3.2. A gambler plays a sequence of independent games. Assume that X o is his initial fortune (before play commences), X o + Xl is the gambler's fortune after the first play, X o + Xl + X 2 at the end of second, and so on. In this way we obtain a sequence of partial sums {e,,}0' EXAMPLE ell = Xo + Xl + ... + X" of independent random variables. Note that u{eo, ... ,e,,} = u{Xo,,,,,X,,}, n = 0, 1, .... In fact, because Xo = eo and X k= ek X" is u{eo, ... , e,,}-measurable so that ek-l' k = (10.3.1) 1, ... , n, each X o, Xl' ... , u{Xo,,,,,X,,} c u{eo, ... ,e,,}. On the other hand, each k = 0, ... , n, so that which proves (to.3.1). ek is clearly u{Xo, ... ,X,,}-measurable for every 10. Discrete Parameter Martingales 264 Next, invoking (10.1.10) and (10.3.1), we have Egn+1leo,···,en} = Egn + X n + 1 IXo,···,Xn } en + E{Xn+d· = From this, we deduce the following. The sequence of random variables {en}O' is a martingale if E {Xn} = 0 for all n = 0, 1, .... If E {Xn} > 0 for every n = 0, 1, ... , gn}O' is a submartingale and a supermartingale if E{Xn} < 0, n = 0, 1, ... . EXAMPLE 10.3.3. Let {Zn}O' be an i.i.d. sequence of(nondegenerate) random variables with common support on (0, 1] and E{ZD = IXi > 0, i = 1,2. Set It is quite clear that the sequence of random variables {Ln}f represents a martingale such that (10.3.2) for all n = 1,2, ... (for an application see Example 1.6.2). From this we clearly have that E {L;} ~ 00 as n ~ 00. In spite of this, however, Ln ~ 0 (a.s.) as n~oo. To show this, write Ln as Ln = ex p Ltl In(Zj/IXd}. From Jensen's inequality, we have E{ln(Zj/IX 1 )} < InE{Zj/IXd = o. Therefore, because E{ln(Zj/IX 1 )} < 0, by the strong law oflarge numbers n L In(Zj/IX j=l 1) ~ -00 (a.s.) and, thus, Ln ~ 0 (a.s.) as n ~ 00. This is an interesting example of a sequence of random variables converging to zero (a.s.) 
while its variance increases to infinity. EXAMPLE 10.3.4. Let {Xn}f be a sequence of random variables such that the probability density of (Xl' . .. ,Xn ) is either InO(., ... , .), or 1/ (., ... , .) for each n = 1,2, .... Assume that In 1 > 0 on R nfor all n = 1,2, ... ; then the random variable Pn(ro), defined by Pn /,,0 (X1' ... ' Xn) = In1 (Xl'··· ,Xn)' is a likelihood ratio. If /,,1 is the true probability density of (Xl, . .. ,Xn ), then 10.3. Examples 265 E{Pn+1IX 1 = Xl,,,,,Xn = Xn} = f Pn+1(X 1,· .. ,Xn,y)P{Xn+1 E dylX 1 = X1,···,Xn = Xn} = fin~l(Xl' ... 'Xn'Y)d =inO(x1, ... ,Xn) y 1 . j,nl( X1,···,X n) in (Xt> ... ,x n) In other words, E{Pn+1I X t> ... , Xn} = Pn (a.s.), which shows that {Pn}f is a martingale with respect to the filtration {ff,,}f, where ff" = 0"{X1,·.·,Xn}· On the other hand, from (10.1.3), we see that O"{Pl, ... ,Pn} c O"{Xt> ... ,Xn}. From this and (10.1.9.i), we deduce that E{Pn+lIPl, ... ,Pn} = E(E{Pn+lIXl,···,Xn}lpl,···,Pn) = E(PnIPl,···,Pn) = Pn (a.s.), so that {Pn}f is a martingale. EXAMPLE 10.3.5. Consider an urn which contains b ~ 1 black balls and w ~ 1 white balls which are well mixed. Repeated drawings are made from the urn, in such a way that after each drawing the selected ball is returned to the urn along with c ~ 1 balls of the same color. Set Xo = bl(b + w) and denote by Xn the proportion of black balls in the urn after the nth draw. Here we want to show that the sequence of random variables {Xn}~ represents a martingale. To this end, set Yo = 1 and, for n ~ 1, define y" as follows: Yn = { 1 if the nth ball drawn is black 0 if the nth ball drawn is white. Let bn and Wn be the number of black and white balls, respectively, in the urn after the nth draw (wo = w, bo = b). Then, clearly, Xn = bnl(bn + wn), n = 1, ... , and Next, we, clearly, have that 266 10. Discrete Parameter Martingales so that (a.s.) E{Xn+1IYo, ... ,Y,,}=E{b n+1 bn+1 lyo, ... ,Y,,} + wn+1 = E { bbn + C y"+1 IYo, .. ·, Y" } n + Wn + C bn cP{y"+1 = lIYo,'''' Y,,} -0---"---+-.0-:.:-:-=----'--"------"-'bn + Wn + c bn + Wn + c c bn -0---"--- + Xn bn + wn + c bn + Wn + c = b + wn 1 + n C {b +~}=X + n bn Wn n' Because a{Xo, .. ·,Xn} c a{Yo, .. ·, Y,,}, we see that E{Xn+1I X o,,,,,Xn} = E(E{Xn+dYo, .. ·, Y,,}IXo,''''Xn ) = E{XnIXo, ... ,Xn} = Xn (a.s.), which proves the assertion. lOA. The Upcrossing Inequality The proof of the fundamental convergence theorem for submartingales, which will be given in the next section, depends on a result which is known as the "upcrossing inequality." Its proof, somewhat combinatorial in nature, will be presented here. We begin by briefly discussing the convergence of numerical sequences. Let {lXn}f be a sequence of real numbers. From the definition of convergence, it is clear that {lXn} f has a limit as n ~ 00 if and only if, for any two rational numbers -00 < ro < r1 < 00, the sequence passes from below ro to above r1 at most a finite number of times. In other words, if the sequence converges, the number of "upcrossings" of any such interval [ro, rd is finite. If, on the other hand, such an interval can be found where the number of upcrossings by {lXn}f is infinite, the sequence diverges. As a matter of fact, if p(rO ,r1) represents the number of upcrossings of [ro, rJ by {lXn}f limlXn ~ ro < r1 ~ limlX n<=>p(ro,r1 ) = 00. From this, we deduce that the sequence {lXn}f converges if and only if p(ro, r1) < 00 for any two rational numbers -00 < ro < r1 < 00. Similarly, a sequence of real random variables {~n}f has a limit (finite or 10.4. 
The Upcrossing Inequality 267 infinite) with probability 1 if and only if the number of upcrossings of any [rO ,r1 ] is finite with probability 1. In this section we will establish an upper bound for the number of upcrossings of submartingales. Let -00 < a < b < +00 be two numbers and {((n'~)}'f a stochastic sequence. Set '0 = '1 = '2 0, inf{n > O;(n ~ a}, inf{n > '1; (n = '2k-1 = inf{n '2k = inf{n 'v ;;?: b}, (10.4.1) > '2k-2;(n::;; a}, > '2k-1; (n ;;?: b}, taking = 0 if the corresponding set { } is empty. Next, for every n ;;?: 1, define f3n(a,b) = (10.4.2) max{i;'2i::;; n} and f3n(a, b) = 0 ifr2 > n. In other words, the random variable f3n(a, b) represents the number of upcrossings of the interval [a,b] by the sequence (1' ... , (n (see Figure 10.1). Clearly, 13: 0, 1, ... , [nI2], where [x] represents the integer part of x. The following proposition is due to Doob. Proposition 10.4.1. Let {((n' ~)}'f be a submartingale. Then,for every n ;;?: 1, E{f3n(a, b)} ~ E((n - at /(b - a). PROOF. (10.4.3) From Corollary 10.2.1, we know that the sequence {((n - a)+,~}'f , • a I , I I I I 0 I T1 I • • I b • • , I I I • • • T3 • • T • I I I I T2 (10.4.4) T4 Figure 10.1 T5 • • • 268 10. Discrete Parameter Martingales is a non-negative submartingale. In addition, the number of upcrossings of the interval [0, b - a] by at, ... , (~n - at (~1 - ° is again Pn(a, b). Therefore, it suffices to consider a non-negative submartingale {(X.,ff..)}f, Xn ~ 0, with a = and to show that E{P.(O,b)}::;; E{Xn}/b. Set Xo = 0, ~o = {e,n}, and, for i ~ 1, write Bi = { ° if"k < i ::;; I if"k "k+1 for some odd k < i ::;; "k+l for some even k. ° In other words, [see (10.4.1)], Bi = 1 if the largest tk left of i is a downcrossing and the "k+l is an upcrossing, and Bi = if the largest left of i is an upcrossing and "k+l is a downcrossing. Clearly, then, Bl = because 1 ::;; tl (as a matter of fact, Bi = for all i ::;; ";). Now, we have ° bP.(O,b)::;; • L (Xi - i=1 "k ° X i- 1 )Bi and Consequently, bE{P.(O,b)}::;; = ::;; • L E{(Xi - i=1 t .=1 f {.i=l} X i- 1 )B;} (E{X;I.~-d - i~ f(E{Xil~-d - X i- 1 )dP Xi-ddP::;; f E{X.Iff..-ddP = E{X.}. This proves the assertion. D 10.5. Convergence of Submartingales In this section we will prove a theorem of Doob concerning convergence of submartingales. It represents one of the fundamental results in the theory of martingales and can be considered an analog of the well-known fact in real analysis that every bounded monotone sequence has a finite limit. 269 10.5. Convergence of Submartingales Proposition 10.5.1 (Doob). Let {(Xn,§")}'i' be a submartingale such that (10.5.1) n Then, there exists a random variable Xoo such that Xn --. Xoo and EIXool < PROOF. (10.5.2) (a.s.) 00. Assume that the submartingale does not converge; then, P{limXn > limXn} > O. (10.5.3) Because {lim Xn > lim Xn} = for all rationals IX U {lim Xn > P > IX > lim Xn} (%<(1 < p, there exist two rational numbers ro < r 1 such that P{limXn > ro > r1 > limXn} > O. In other words, with positive probability, there is an infinite number of upcrossings of [ro, r1]. Denote by Pn(ro, rd the number of up crossings of [rO ,r1 ] by Xl' ... ' Xn and write p(ro,rd = lim Pn(rO ,r1 ). n-+oo By (10.4.3), E{Pn(ro, r1 )} :::;; E(Xn - rot /(r1 :::;; E(X: - ro) + Irol)/(r1 - ro), and, therefore, E{p(ro,rd} :::;; (s~p EX: + Irol)/(r 1 - ro). From this (bearing in mind that sUPnEIXnl < oo¢>suPnEX: < (0), we deduce that E{p(ro,rd} < 00, which contradicts (10.5.3). 
Consequently, (10.5.2) holds and by Fatou's lemma EIXool :::;; sup EIXnl < 00, n D so the assertion follows. Corollary 10.5.1. Let {(Xn' §")}'i' be a non-negative martingale; then, the lim Xn exists (a.s.). Indeed, sup EIXnl = sup Xn = EX1 < n and Proposition 10.5.1 is applicable. n 00 10. Discrete Parameter Martingales 270 Remark 10.5.1. Proposition 10.5.1 also holds for martingales and supermartingales because every martingale is a submartingale and because {(-Xn'~)}'f is a submartingale if {(Xn'~)}'f is a supermartingale. For non-negative martingales or supermartingales, assumption (10.5.1) always holds. Remark 10.6.2. Condition (10.5.1) does not guarantee the converges of Xn to Xoo in the mean as the following counterexample shows. Let {~n}'f be an i.i.d. sequence of random variables with the common distribution Pg i = O} = t, Pg i = 2} = t. Consider Xn = n ~i' n n i=1 = 1, 2, ... ; then, {(Xn'~)} 'f is clearly a martingale with EXn = 1. Thus, Xn -+ Xoo (a.s.), where Xoo = 0 (a.s.). On the other hand, EIXn - Xool = EXn = 1 for all n = 1,2, .... However, if assumption (10.5.1) is strengthened to uniform integrability of the sequence {Xn}'f, then we have, simultaneously, Xn -+ Xoo (a.s.) and Xn -+ Xoo in the mean. Definition 10.5.1. A sequence of random variables tegrable if lim (sup +oo n c .... f gn>C) {~n}'f I~nl dP) = O. is uniformly in- (10.5.4) Let us give some criteria for uniform integrability. The simplest one is the following: If there exist a random variable U such that I~nl ~ U and EU < 00, then {~n}'f is an uniformly integrable sequence. More useful is the following result. Proposition 10.5.2. A sequence of random variables {~n}'f is uniformly integrable if and only if: (i) sUPnEI~nl < 00; (ii) for every B > 0, there exists (j > 0 such that for any B supn I~nl dP < B. fB E 9B with P(B) < (j, PROOF. (Necessity). If the sequence is uniformly integrable, then, for any B > 0, there exists c > 0 such that 10.5. Convergence of Submartingales 271 sup Ele.. 1 = sup (Ele.. II{I~nl2:cl .. .. + sup .. Ele.. II{I~nl<cl) ~ sup Ele.. II{I~nl2:cl . ~8+ + Ele.. II{I~nl<cl) e, which proves (i). On the other hand, Ele"IIB = Ele"IIBn{l~nl2:cl ~ Ele.. II{I~nl2:Cl + Ele.. IIBn{l~nl<cl + eP(B). N ow, take clarge enough so that sup" E Ie.. 1I (I~nl2:C l ~ 8/2; then, if P(B) < 8/2e, we obtain sup Ele"II B = sup .. " (Sufficiency). Take 8> 0 and Because (j f le.. 1 dP < 8. B > 0 so that P(B) < (j implies supEle.. IIB < 8. Ele.. 1 ~ Ele.. II{I~nl2:cl ~ eP{I~nl2:Cl' using Markov's inequality we have 1 sup P{lenl ~ e} ~ - sup Elenl-+ 0 as e -+ .. e .. 00. Consequently, for large e, any set {le.. 1 ~ e} can be taken as B. This implies that f le .. ldP < 8, {I~nl2:cl which completes the proof of the proposition. D Proposition 10.5.3. Let {e .. }! be a uniformly integrable family; then E{lim e.. } ~ lim Eg .. } ~ lim Eg n} ~ E{lim en}. PROOF. (10.5.5) Assume x > 0 and write (10.5.6) Given 8 > 0, due to uniform integrability there is x sufficiently large so that sup IEg.. Ign<-xll < .. 8. (10.5.7) On the other hand, by Fatou's lemma, lim EgnI{~n>-xl} ~ E{lim e.. I {," > -xl}' But e.. Ign>-xl} ~ e. ,which yields lim Eg.. Ign> -xl} ~ E{lim en}· (10.5.8) 272 10. Discrete Parameter Martingales Finally, (10.5.6), (10.5.7), and (10.5.8) give lim Eg n} ~ lim EgnIgn,;;-x}} - e ~ E{lim 'n} - e. Because e > 0 is arbitrary, the first half of (10.2. 7) follows. The second half can be proved similarly. D Corollary 10.5.2. 
If, in the last proposition, we have that integrable, E'n -+ E, EI'n - and '1-+ 0 'n -+ , (a.s.), then, is as n -+ 00. Indeed, in such a case we have E{n:::;; lim E{'n}:::;; lim E{'n}:::;; E{n. This shows that E {,} is finite and that lim Eg n} = Eg}. n-+co Finally, I'n - '1-+ 0 (a.s.) and, because I'n - ,I :::;; I'nl integrable family. + 1'1, it is a uniformly Another simple criterion for uniform integrability gives the following result. Proposition 10.5.4. Let gn}f c L1 {n,~,p} and h(t) ~ 0 be a nondecreasing function, defined for all t ~ 0, such that (i) lim h(t)/t '-+co = +00, sup E{h(I'nl)} < (ii) (10.5.9) 00; n then {'n}f is uniformly integrable. PROOF. h(t)/t Let e > 0, Q = sUPnE{h(I'nl)}, and IX = Q/e. Take x> 0 so large that for t ~ x. Then ~ IX f {I~nl2:x} uniformly for I'nl dP:::;; ~ IX f {I~nl2:x} h(l'nl)dP:::;; Q/IX = e n ~ 1. 10.6. Uniformly Integrable Martingales In this section we discuss some basic features of uniformly integrable martingales. Proposition 10.6.1. Let {(Xn, ~)}f be a uniformly integrable submartingale. Then, there exists a random variable Xco with EIXcol < 00 such that 273 10.6. Uniformly Integrable Martingales (i) Xn -+ Xoo (a.s.), (ii) EIXn - Xool-+ 0 as n -+ (10.6.1) 00, and {X1 , ••. ,Xoo ;$l1 , •.• ,$loo }, where$loo = u{Ui~} is also a submartingale. PROOF. Because the submartingale is uniformly integrable, condition (10.5.1) holds, and, consequently, Xn -+ Xoo (a.s.). From Corollary 10.5.2, we conclude that (1O.6.1.ii) also holds. Finally, consider A E $'" and let N ~ n, then L IXN - Xool dP -+ 0 as N -+ 00. Hence, The sequence {EXnI A} is nondecreasing because E{XNI$lN-d ~ X N- 1 => L ~L XNdP X N- 1dP. Therefore, which implies that Xn s-; E{Xool$',,} (a.s.). o This completes the proof of the proposition. Corollary 10.6.1. Let {(Xn , $',,)} i be a submartingale such that for some y > 1, (10.6.2) n Then there is an integrable random variable Xoo such that (10.6.1) holds. The proof follows from the fact that condition (10.6.2) guarantees the uniform integrability of {Xn}f (see Proposition 10.5.4). The following result due to Levy is concerned with the continuity property of a conditional expectation. Proposition 10.6.2. Let {n, aJ, P} be a probability space and {$',,}f a filtration, i.e.,$l1 c $l2 C .•. c aJ. Let ~ E L1 {n, aJ, P} and set$loo = U{U:;'l ~}. Then (10.6.3) (a.s.) and in the mean. PROOF. Set 10. Discrete Parameter Martingales 274 then, clearly, {Xn}f is a martingale. For any a > 0 and b > 0, f {IXnl;>a} IXnldP ~ = f f f (IXnl;>a) {IXnl;>a) = E{I~II~}dP 1~ldP {lXnl;>a.I~I:5b} 1~ldP + f ~ bP{IXnl ~ a} + f (lXnl;>a.I~I>b) {I~I>b} ~ bE{IXnl}/a + f ~ bE{I~I}/a + By letting a ---+ +00 and then b ---+ lim sup a-oo n +00, f f {IO>b} (1~I>b) 1~ldP I~I dP I~I dP I~I dP. we obtain {iXnl;>a} IXnl dP = O. This shows that the sequence {Xn}f is uniformly integrable. Therefore, by Proposition 10.6.1, there exists a r.v. Xoo such that Xn ---+ Xoo (a.s.) and in the mean. What remains to be shown is that Let N ~ n and A E ~; t then, Because the sequence {Xn}f is uniformly integrable and because it follows that IXN - t Xool dP ---+ XoodP = 0 as N L~dP. ---+ 00, (10.6.4) This equation holds for all A E ~ and, therefore, for all A E Uf~. Because 00 and EI~I < 00, the integrals in (10.6.4) are a-additive measures agreeing on the algebra U f ~. Because of the uniqueness of their extensions to a{Uf ff,.}, Equation (10.6.4) remains valid for all A E a{Uf ff,.} = $'00' Thus, for all A E $'00' EIXool < 275 10.6. 
The following example shows that a branching process has a martingale structure.

EXAMPLE 10.6.1. Let
$$\chi_1^{(1)}, \chi_2^{(1)}, \ldots;\ \chi_1^{(2)}, \chi_2^{(2)}, \ldots;\ \ldots$$
be an array of i.i.d. non-negative integer-valued random variables such that $E\chi_i^{(n)} = \mu < \infty$. Set $T_0 = 1$ and, for $n \ge 1$,
$$T_n = \sum_{i=1}^{T_{n-1}} \chi_i^{(n)} \qquad \big(= 0 \ \text{if } T_{n-1} = 0\big). \tag{10.6.5}$$
Note that $\{T_k\}_0^{n-1}$ is independent of $\{\chi_i^{(n)}\}_1^\infty$. Consequently,
$$E\{T_n\} = E\{T_{n-1}\}\mu = \mu^n.$$

**Definition 10.6.1.** The sequence of random variables $\{T_n\}_1^\infty$ is called a "Galton-Watson branching process."

Here, $T_n$ can be thought of as the size of a population at the $n$th generation, with each individual independently giving birth to a random number of offspring to produce the population in the next generation. Set $U_n = T_n/\mu^n$, $n \ge 1$; then $\{U_n\}_1^\infty$ is a martingale. To show this, consider
$$E\{U_n | U_1, \ldots, U_{n-1}\} = E\Big\{ \sum_{i=1}^{T_{n-1}} \chi_i^{(n)} \Big| U_1, \ldots, U_{n-1} \Big\} \Big/ \mu^n = \sum_{k=1}^\infty E\Big\{ I_{\{T_{n-1} = k\}} \sum_{i=1}^k \chi_i^{(n)} \Big| U_1, \ldots, U_{n-1} \Big\} \Big/ \mu^n$$
$$= \sum_{k=1}^\infty k\mu\, I_{\{T_{n-1} = k\}} \big/ \mu^n = \sum_{k=1}^\infty k\, I_{\{T_{n-1} = k\}} \big/ \mu^{n-1} = T_{n-1}/\mu^{n-1} = U_{n-1},$$
which proves the claim. Next, because
$$\sup_n E|U_n| = \sup_n E\{U_n\} = 1,$$
Corollary 10.5.1 applies, so $U_n \to U_\infty$ (a.s.). In other words, $T_n/\mu^n \to U_\infty$ (a.s.), which implies that $T_n \to 0$ (a.s.) if $\mu < 1$, while for $\mu > 1$ the population explodes, $T_n \to \infty$, on the event $\{U_\infty > 0\}$, i.e., off the extinction set.
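A short simulation makes the dichotomy at the end of Example 10.6.1 visible. In the sketch below (Python with NumPy; the Poisson offspring law and all parameters are illustrative assumptions, not part of the text), the martingale property keeps the empirical mean of $U_n = T_n/\mu^n$ near $1$, while the paths themselves die out for $\mu < 1$ and grow like $\mu^n U_\infty$ on the survival set for $\mu > 1$.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_gw(mu, n_gen, n_paths):
    """Run n_paths independent Galton-Watson processes for n_gen generations."""
    T = np.ones(n_paths, dtype=np.int64)      # T_0 = 1
    mean_U = []
    for n in range(1, n_gen + 1):
        # the sum of T i.i.d. Poisson(mu) offspring counts is Poisson(mu * T),
        # and Poisson(0) = 0 handles extinct paths automatically
        T = rng.poisson(mu * T)
        mean_U.append((T / mu ** n).mean())   # empirical E[U_n]  (= 1 in theory)
    return T, mean_U

for mu in (0.8, 1.5):
    T, mean_U = simulate_gw(mu, n_gen=15, n_paths=5_000)
    print(f"mu={mu}: E[U_n] for n=5,10,15 ~ "
          f"{[round(mean_U[i], 2) for i in (4, 9, 14)]}, "
          f"P(T_15 = 0) ~ {(T == 0).mean():.2f}")

# For mu = 0.8 extinction is certain and T_n -> 0; for mu = 1.5 a positive
# fraction of paths survives and grows geometrically.  In both cases the
# empirical E[U_n] stays near 1, as the martingale property requires
# (with growing Monte Carlo noise in the subcritical case).
```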
Problems and Complements

10.1. Let $\{\Omega, \mathcal{B}, P\}$ be a probability space and $\mathcal{F}_1$ and $\mathcal{F}_2$ two independent sub-$\sigma$-algebras of $\mathcal{B}$. If $A \in \mathcal{F}_1 \cap \mathcal{F}_2$, show that $P(A) = 0$ or $1$.

10.2. (Continuation) If $G \subset \Omega$ is an arbitrary subset, write $\mathcal{B}_G = \{G \cap B;\ B \in \mathcal{B}\}$.
(i) Show that $\mathcal{B}_G$ is a $\sigma$-algebra on $G$ (the so-called $\sigma$-algebra induced on $G$ by $\mathcal{B}$).
(ii) Let $\mathcal{S}$ be a family of subsets of $\Omega$. Show that $\sigma\{\mathcal{S}_G\} = \sigma\{\mathcal{S}\} \cap G$.

10.3. Show that there does not exist a $\sigma$-algebra having a countably infinite number of elements.

10.4. Let $X_1, \ldots, X_n$ be $n$ independent Poisson random variables with parameters $\lambda_1, \ldots, \lambda_n$, respectively. Determine:
(i) the conditional distribution of $(X_1, \ldots, X_{n-1})$ given that $X_1 + \cdots + X_n = N$;
(ii) $E\{X_1 | X_1 + X_2\}$.

10.5. Let $X \in L_1\{\Omega, \mathcal{B}, P\}$ and let $\{B_k\}_1^\infty \subset \mathcal{B}$ be a partition of $\Omega$. Let $\mathcal{F}$ be the $\sigma$-algebra generated by $\{B_k\}_1^\infty$. Show that
$$E\{X | \mathcal{F}\} = \sum_{k=1}^\infty \alpha_k I_{B_k} \quad \text{(a.s.)}, \qquad \text{where } \alpha_k = E\{X | B_k\}.$$

10.6. (Continuation) If $\mathcal{F}_1$ and $\mathcal{F}_2$ are two sub-$\sigma$-algebras of $\mathcal{B}$ such that $\sigma\{X\} \vee \mathcal{F}_1$ is independent of $\mathcal{F}_2$, show that
$$E\{X | \mathcal{F}_1 \vee \mathcal{F}_2\} = E\{X | \mathcal{F}_1\}.$$

10.7. Let $\{X_i\}_1^\infty \subset L_1\{\Omega, \mathcal{B}, P\}$ be an i.i.d. sequence of random variables. Set $S_n = X_1 + \cdots + X_n$ and show that $E\{X_1 | S_n, S_{n+1}, \ldots\} = S_n/n$ (a.s.).

10.8. Let $X, Y \in L_1\{\Omega, \mathcal{B}, P\}$ and let $\mathcal{F}_1$ and $\mathcal{F}_2$ be two independent sub-$\sigma$-algebras. Show that $E\{X | \mathcal{F}_1\}$ and $E\{Y | \mathcal{F}_2\}$ are independent random variables.

10.9. (Continuation) Show that $E\{X | \mathcal{F}_1 \cap \mathcal{F}_2\} = E(X)$.

10.10. Let $\{X_i\}_0^\infty$ be a sequence of independent random variables such that $E(X_i) = 1$, $i = 0, 1, \ldots$. Show that the sequence $\{\xi_n\}_0^\infty$ is a martingale, where
$$\xi_n = \prod_{i=0}^n X_i.$$

10.11. Let $\{X_i\}_1^\infty$ be a sequence of independent random variables with $E(X_i) = 0$ and $\{Y_j\}_1^\infty$ another sequence of random variables such that, for each $n$, the family $\{X_n, X_{n+1}, \ldots\}$ is independent of $\{Y_1, \ldots, Y_n\}$. Show that the sequence $\{Z_n\}_1^\infty$ is a martingale, where
$$Z_n = \sum_{k=1}^n X_k Y_k, \qquad \text{provided } X_k Y_k \in L_1\{\Omega, \mathcal{B}, P\},\ k = 1, 2, \ldots.$$

10.12. Let $\{\xi_n\}_0^\infty$ be a martingale and $\varphi(\cdot)$ a convex function such that $\varphi(\xi_n) \in L_1\{\Omega, \mathcal{B}, P\}$, $n = 0, 1, \ldots$. Show that the sequence $\{(\varphi(\xi_n), \mathcal{F}_n)\}$, where $\mathcal{F}_n = \sigma\{\xi_0, \ldots, \xi_n\}$, is a submartingale.

10.13. Let $\{\mathcal{F}_k\}_1^\infty$ be a nondecreasing sequence of $\sigma$-algebras and $\mathcal{F} = \sigma\{\bigcup_{k=1}^\infty \mathcal{F}_k\}$. Let $\xi$ be an $\mathcal{F}$-measurable random variable having finite expectation. Show that
$$E\{\xi | \mathcal{F}_n\} \to \xi \quad \text{(a.s.)}.$$

10.14. Let $\{\xi_n\}_1^\infty$ be a uniformly integrable martingale with $\xi_n \ge 0$ and $\xi_n \to \xi$ (a.s.). Show that, for all $n = 1, 2, \ldots$,
$$\xi_n = E\{\xi | \xi_1, \ldots, \xi_n\}.$$

10.15. Let $\{X_n\}_1^\infty$ be an i.i.d. sequence of random variables with $E(X_1) = 0$, $E(X_1^2) = 1$, and $\phi(\lambda) = E(\exp\{\lambda X_1\}) < \infty$ for $|\lambda| < \varepsilon$. Set $S_n = X_1 + \cdots + X_n$. Show that, as $\lambda \to 0$,
$$\phi(\lambda) = 1 + \lambda^2/2 + o(\lambda^2).$$

10.16. Let $h(t) = (2t \log\log t)^{1/2}$. Show that $\lim_{t \to \infty} h(t)/t = 0$.

10.17. Let $\lambda \in (-\varepsilon, \varepsilon)$; show that the sequence $\{Y_n\}_1^\infty$ is a martingale, where
$$Y_n = \exp\{\lambda S_n\}/(\phi(\lambda))^n.$$

10.18. Let $\{Z_k\}_1^\infty$ be a sequence of i.i.d. random variables with support in $(0, 1]$ and $a_i = E(Z_1^i) < 1$, $i = 1, 2$. Show that $\{L_n\}_1^\infty$ is a martingale, where
$$L_n = \Big( \prod_{i=1}^n Z_i \Big) \Big/ a_1^n.$$
Find $\lim_{n \to \infty} E\{L_n^2\}$ and show that $L_n \to 0$ (a.s.) (a numerical illustration follows Problem 10.20 below).

10.19. (Continuation) Let $\{Y_k\}_1^\infty \subset L_2\{\Omega, \mathcal{B}, P\}$ be an i.i.d. sequence of strictly positive random variables, independent of $\{Z_k\}_1^\infty$, and set $S_n = X_1 + \cdots + X_n$, where $X_k = Y_k \prod_{i=1}^k Z_i$. Show that $S_n \to S$ (a.s.), where $E\{S\} = a_1 \eta_1/(1 - a_1)$ and $\eta_1 = E\{Y_1\}$. Show also that $S_n \overset{d}{=} Z_1 (Y_1 + S_{n-1})$.

10.20. (Continuation) Denote by $G_n(x) = P\{S_n \le x\}$ and $G(x) = P\{S \le x\}$, and show that
$$G(x) = \int_0^1 \int_0^\infty G\Big( \frac{x}{s} - y \Big) \, dQ(y) \, h(s) \, ds,$$
where $Q(y) = P\{Y_1 \le y\}$ and $h$ is the probability density of $Z_1$.
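As a numerical sanity check of the claim in Problem 10.18 (a sketch, not a solution; the choice $Z_i$ uniform on $(0, 1]$, so that $a_1 = 1/2$, is an illustrative assumption), the following code exhibits the same phenomenon as Remark 10.5.2: $E L_n = 1$ for every $n$, yet $L_n \to 0$ (a.s.) because $E \log(Z_1/a_1) = \log 2 - 1 < 0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 50_000, 60

# Z_i ~ Uniform(0, 1); the event {Z_i = 0} has negligible probability
# in floating point, so the support is effectively (0, 1], and a_1 = 1/2.
Z = rng.uniform(0.0, 1.0, size=(n_paths, n_steps))
L = np.cumprod(Z / 0.5, axis=1)              # L_n = (Z_1 ... Z_n) / a_1^n

for n in (1, 10, 30, 60):
    col = L[:, n - 1]
    print(f"n={n:2d}  E[L_n] ~ {col.mean():.2f}   median(L_n) = {np.median(col):.2e}")

# E[L_n] = 1 for every n (martingale), but log L_n drifts to -infinity at
# rate E log(2 Z_1) = log 2 - 1 < 0, so L_n -> 0 a.s.  For large n the
# empirical mean falls below 1 because the mass sits on paths too rare to
# sample -- the same failure of uniform integrability as in Remark 10.5.2.
```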
Bibliography

Chapter 1
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Elliott, R.J. (1982). Stochastic Calculus and Applications. Springer-Verlag, New York.
Gihman, I.I. and Skorohod, A.V. (1969). Introduction to the Theory of Random Processes. Saunders, Philadelphia.
Halmos, P.R. (1950). Measure Theory. Van Nostrand, New York.
Kolmogorov, A.N. (1941). Über das logarithmisch normale Verteilungsgesetz der Dimensionen der Teilchen bei Zerstückelung. Dokl. Akad. Nauk SSSR 31, 99-101.
Loeve, M. (1977). Probability Theory I. Springer-Verlag, New York.
Neveu, J. (1965). Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.
Prohorov, Y.V. and Rozanov, Y.A. (1969). Probability Theory. Springer-Verlag, New York.
Todorovic, P. (1980). Stochastic modeling of longitudinal dispersion in a porous medium. Math. Sci. 5, 45-54.
Todorovic, P. and Gani, J. (1987). Modeling of the effect of erosion on crop production. J. Appl. Prob. 24, 787-797.

Chapter 2
Belayev, Yu.K. (1963). Limit theorem for dissipative flows. Theor. Prob. Appl. 8, 165-173.
Daley, D.J. and Vere-Jones, D. (1988). An Introduction to the Theory of Point Processes. Springer-Verlag, New York.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed. Wiley, New York.
Grandell, J. (1976). Doubly Stochastic Poisson Processes (Lecture Notes in Mathematics 529). Springer-Verlag, New York.
Kac, M. (1943). On the average number of real roots of a random algebraic equation. Bull. Am. Math. Soc. 49, 314-320.
Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific J. Math. 10, 1181-1197.
Renyi, A. (1967). Remarks on the Poisson process. Stud. Sci. Math. Hungar. 2, 119-123.
Rice, S.O. (1945). Mathematical analysis of random noise. Bell Syst. Tech. J. 24, 46-156.
Serfling, R.J. (1975). A general Poisson approximation theorem. Ann. Prob. 3, 726-731.
Todorovic, P. (1979). A probabilistic approach to analysis and prediction of floods. Proc. 42nd Session ISI, Manila, pp. 113-124.
Westcott, M. (1976). Simple proof of a result on thinned point processes. Ann. Prob. 4, 89-90.

Chapter 3
Bachelier, L. (1941). Probabilités des oscillations maxima. C.R. Acad. Sci. Paris 212, 836-838.
Breiman, L. (1968). Probability. Addison-Wesley, Reading, MA.
Brown, R. (1828). A brief account of microscopical observations made in the months of June, July, and August, 1827, on the particles contained in the pollen of plants. Philos. Mag. Ann. Philos. (New Series) 4, 161-178.
Einstein, A. (1905). On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat. Ann. Physik 17.
Freedman, D. (1983). Brownian Motion and Diffusion. Springer-Verlag, New York.
Hartman, P. and Wintner, A. (1941). On the law of the iterated logarithm. Am. J. Math. 63, 169-176.
Hida, T. (1965). Brownian Motion. Springer-Verlag, New York.
Karlin, S. (1968). A First Course in Stochastic Processes. Academic Press, New York.
Kunita, H. and Watanabe, S. (1967). On square integrable martingales. Nagoya Math. J. 30, 209-245.
Levy, P. (1965). Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.
Nelson, E. (1967). Dynamical Theories of Brownian Motion. Mathematical Notes, Princeton University Press.
Skorohod, A.V. (1964). Random Processes with Independent Increments. Nauka, Moscow (in Russian).
Smoluchowski, M. (1916). Drei Vorträge über Diffusion, Brownsche Molekularbewegung und Koagulation von Kolloidteilchen. Phys. Zeit. 17, 557-571.
Uhlenbeck, G.E. and Ornstein, L.S. (1930). On the theory of Brownian motion. Phys. Rev. 36, 823-841.

Chapter 4
Anderson, T.W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed. Wiley, New York.
Ibragimov, I.A. and Rozanov, Y.A. (1978). Gaussian Random Processes. Springer-Verlag, New York.
Rozanov, Y.A. (1968). Gaussian infinitely dimensional distributions. Steklov Math. Inst. Publ. 108, 1-136 (in Russian).

Chapter 5
Akhiezer, N.I. and Glazman, I.M. (1963). Theory of Linear Operators in Hilbert Space, Volumes I and II. Frederick Ungar Publishing Co., New York.
Dudley, R.M. (1989). Real Analysis and Probability. Wadsworth and Brooks/Cole, Pacific Grove, CA.
Loeve, M. (1978). Probability Theory II. Springer-Verlag, New York.
Kolmogorov, A.N. and Fomin, S.V. (1970). Introductory Real Analysis. Prentice-Hall, Englewood Cliffs, NJ.
Natanson, I.P. (1960). Theory of Functions of a Real Variable, Volumes I and II. Frederick Ungar Publishing Co., New York.
Riesz, F. and Sz.-Nagy, B. (1955). Functional Analysis. Frederick Ungar Publishing Co., New York.
Robinson, E.A. (1959). An Introduction to Infinitely Many Variates. Hafner Publishing, New York.
Royden, H.L. (1968). Real Analysis, 2nd ed. The Macmillan Co., New York.
Yosida, K. (1974). Functional Analysis, 4th ed. Springer-Verlag, New York.
Wilansky, A. (1964). Functional Analysis. Blaisdell Publishing, New York.
Chapter 6
Cramer, H. (1940). On the theory of stationary random processes. Ann. Math. 41, 215-230.
Cramer, H. and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes. Wiley, New York.
Gihman, I.I. and Skorohod, A.V. (1974). The Theory of Stochastic Processes. Springer-Verlag, New York.
Grenander, U. and Rosenblatt, M. (1956). Statistical Analysis of Stationary Time Series. Wiley, New York.
Karhunen, K. (1947). Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Ann. Acad. Sci. Fenn. 37.
Khinchin, A.Y. (1938). Correlation theory of stationary random processes. Usp. Math. Nauk 5, 42-51.
Loeve, M. (1946). Fonctions aléatoires du second ordre. Rev. Sci. 84, 195-206.
Lovitt, W.V. (1924). Linear Integral Equations. McGraw-Hill, New York.
Mercer, J. (1909). Functions of positive and negative type and their connections with the theory of integral equations. Phil. Trans. Roy. Soc. London, Ser. A 209, 415-446.
Riesz, F. and Sz.-Nagy, B. (1955). Functional Analysis. Frederick Ungar Publishing Co., New York.
Rozanov, Y.A. (1967). Stationary Random Processes. Holden-Day, San Francisco.
Tricomi, F.G. (1985). Integral Equations. Dover Publishing, New York.

Chapter 7
Ash, R. and Gardner, M.L. (1975). Topics in Stochastic Processes. Academic Press, New York.
Bochner, S. (1955). Harmonic Analysis and the Theory of Probability. University of California Press, Berkeley.
Bochner, S. (1959). Lectures on Fourier Integrals (Ann. Math. Studies 42). Princeton University Press, Princeton, NJ.
Cramer, H. (1940). On the theory of stationary random processes. Ann. Math. 41, 215-230.
Cramer, H. and Leadbetter, M.R. (1967). Stationary and Related Stochastic Processes. Wiley, New York.
Gihman, I.I. and Skorohod, A.V. (1974). The Theory of Stochastic Processes, Volume 1. Springer-Verlag, New York.
Hajek, J. (1958). Predicting a stationary process when the correlation function is convex. Czech. Math. J. 8, 150-161.
Khinchin, A.Y. (1938). Correlation theory of stationary random processes. Usp. Math. Nauk 5, 42-51.
Kolmogorov, A.N. (1941). Interpolation and extrapolation of stationary random sequences. Izv. Akad. Nauk SSSR Ser. Math. 5, 3-14.
Krein, M.G. (1954). On the basic approximation problem in the theory of extrapolation and filtering of stationary random processes. Dokl. Akad. Nauk SSSR 94, 13-16.
Liptser, R.S. and Shiryayev, A.N. (1978). Statistics of Random Processes, Volume 2. Springer-Verlag, New York.
Rozanov, Y.A. (1967). Stationary Random Processes. Holden-Day, San Francisco.
Wong, E. and Hajek, B. (1985). Stochastic Processes in Engineering Systems. Springer-Verlag, New York.
Wold, H. (1954). A Study in the Analysis of Stationary Time Series, 2nd ed. Almqvist and Wiksell, Stockholm.
Yaglom, A.M. (1949). On the question of linear interpolation of stationary stochastic processes. Usp. Math. Nauk 4, 173-178.
Yaglom, A.M. (1962). An Introduction to the Theory of Stationary Random Functions. Prentice-Hall, Englewood Cliffs, NJ.

Chapter 8
Blumenthal, R.M. and Getoor, R.K. (1968). Markov Processes and Potential Theory. Academic Press, New York.
Chiang, C.L. (1968). Introduction to Stochastic Processes in Biostatistics. Wiley, New York.
Chung, K.L. (1967). Markov Chains with Stationary Transition Probabilities, 2nd ed. Springer-Verlag, New York.
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Dynkin, E.B. and Yushkevich, A.A. (1956). Strong Markov processes. Theory Prob. Appl. 1, 134-139.
Dynkin, E.B. (1961). Foundations of the Theory of Markov Processes. Prentice-Hall, Englewood Cliffs, NJ.
Feller, W. (1954). Diffusion processes in one dimension. Trans. Amer. Math. Soc. 77, 1-31.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications, Volume 2. Wiley, New York.
Gihman, I.I. and Skorohod, A.V. (1975). The Theory of Stochastic Processes, Volume 2. Springer-Verlag, New York.
Gnedenko, B.V. (1976). The Theory of Probability. Mir Publishers, Moscow.
Hille, E. (1950). On the differentiability of semi-groups of operators. Acta Sci. Math. Szeged 12B, 19-24.
Ito, K. (1963). Random Processes II. Izdatelstvo Inostrannoy Literatury, Moscow (Russian translation from Japanese).
Karlin, S. (1968). A First Course in Stochastic Processes. Academic Press, New York.
Kolmogorov, A.N. (1951). On the problem of differentiability of transition probabilities of time-homogeneous Markov processes with a countable number of states. Uchenye Zapiski Moskov. Gos. Univ. 148 (Matematika 4), 53-59 (in Russian).
Lamperti, J. (1977). Stochastic Processes. Springer-Verlag, New York.
Mandl, P. (1968). Analytical Treatment of One-Dimensional Markov Processes. Springer-Verlag, New York.

Chapter 9
Dynkin, E.B. (1965). Markov Processes. Springer-Verlag, Berlin.
Ethier, S.N. and Kurtz, T.G. (1986). Markov Processes. Wiley, New York.
Feller, W. (1952). The parabolic differential equations and the associated semi-groups of transformations. Ann. Math. 55, 468-519.
Hille, E. and Phillips, R.S. (1957). Functional Analysis and Semi-Groups. Am. Math. Soc. Colloq. Publ. 31. American Mathematical Society, Providence, RI.
Hille, E. (1948). Functional Analysis and Semi-Groups. Colloq. Publ. Amer. Math. Soc.
Mandl, P. (1968). Analytical Treatment of One-Dimensional Markov Processes. Springer-Verlag, New York.
Yosida, K. (1948). On the differentiability and representation of one-parameter semigroups of linear operators. J. Math. Soc. Japan 1, 15-21.
Yosida, K. (1974). Functional Analysis, 4th ed. Springer-Verlag, New York.

Chapter 10
Breiman, L. (1968). Probability. Addison-Wesley, Reading, MA.
Chung, K.L. (1974). A Course in Probability Theory, 2nd ed. Academic Press, New York.
Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
Loeve, M. (1978). Probability Theory II, 4th ed. Springer-Verlag, New York.
Shiryayev, A.N. (1984). Probability. Springer-Verlag, New York.

Index

A: A/B/s, 18; Absorbing state, 209, 215; Adapted, 262; Adjoint operator, 123; Admissible linear filter, 175; Almost sure continuous trajectories, 24; Analytical, 189; Arc sin law, 70; Arrival times, 17, 27; Autogenesis, 211; Average hitting time, 66.

B: Bachelier, 81; Backward diffusion equation, 226; Backward equation, 210, 212; Banach space, 218, 233; Basis of L_2, 118; Belayev, 52, 279; Beppo-Levi, 112; Bernoulli, 37, 39; Bernoulli random variables, 37, 39; Bessel inequality, 108; Best approximation, 171; Best linear predictor, 187; Binomial component, 213; Birth and death process, 211; Bivariate point process, 53; Bochner-Khinchin, 151; Bombardment, molecular, 81; Borel cylinder, 4; Borel-Cantelli, 77; Borel-Radon measure, 35; Bounded generator, 242; Bounded linear operator, 223; Bounded random variables, 113; Branching process, 275; Brown, Robert, 15, 62; Brownian bridge, 64; Brownian motion, 15, 63; Brownian motion with drift, 64.

C: Cadlag, 7, 20, 215; Cardinality, 121; Cauchy functional equation, 49, 202, 215, 238; Cauchy inequality, 109; Cauchy or fundamental sequence, 111; Cauchy sequence, 111, 202; Cauchy theorem, 189; Chapman-Kolmogorov equation, 11, 200, 202, 206, 224; Characterization of normal distribution, 96; Closed contour, 189; Closed linear manifold, 115; Closure, 19; Complete orthogonal system, 118; Completeness, 113; Conditional expectation, 258; Conditional probability, 259; Conjugate space, 241; Conservative Markov process, 202; Consistency conditions, 2; Continuity (a.s.), 24; Continuity concepts, 22; Continuous functions (set), 22; Continuous in probability, 22 (see also Stochastically continuous); Continuous-time Markov chain, 205; Contraction, 218, 233; Contraction semigroup, 233; Control, 162; Convergence in quadratic mean, 110; Convergence of submartingales, 268; Convex, 127, 262, 263; Countably additive, 114; Counting random function, 36; Coupling, 38; Covariance function, 129; Covariance matrix, 94; Cox, 51; Cramer, H., 79; Curve in the Hilbert space, 177.
D: Death rate, 211; Death process, 212; Decomposition of Z, 116; Dense subset, 114; Deterministic process, 177; Deviation square, 223; Difference-differential equations, 212; Diffusion coefficient, 223; Diffusion process, 223; Dimensionality, 121; Dynamically neutral, 16; Dini, 141; Dirac measure, 35; Directly given, 3; Discontinuities, 23; Discontinuities of the first kind, 25; Discontinuity point (fixed), 23; Discrete parameter processes, 1, 182; Dispersion (longitudinal), 16; Distance in L_2, 109; Distance (minimal), 116; Distribution (marginal), 1; Doob, J., 19, 20, 64, 79, 82, 103, 267, 268; Doubly stochastic, 51; Drift coefficient, 223; Dynkin, 202, 216, 217.

E: Eigenfunction, 136; Eigenfunctions of integral equation, 137; Eigenvalues of a matrix, 93; Eigenvalues of integral equation, 137; Einstein, A., 62; Elementary random measure, 157; Entire function, 193; Epoch, 70; Equivalent processes, 5; Ergodicity, 145, 146; Error of estimate, 170; Essentially bounded, 113; Essential supremum, 114; Estimate, 170; Estimation, 169; Everywhere dense, 114; Everywhere dense in L_2, 114; Exceedance, 57; Excursion, 57; Extremes of Brownian motion, 67.

F: Feller, 53, 75; Feller processes, 128, 218; Filtering and prediction, 169, 170; Filtration, 262, 265, 273; Finite-dimensional distribution, 201; Fischer, 111; Fixed discontinuity point, 24; Flood modeling, 56; Fokker-Planck, 226; Forward diffusion equation, 226; Fourier coefficients, 118; Fourier series, 118; Fredholm equation, 137; Frictional force, 81; Fubini, 28; Functional, 186, 241.

G: G/G/1, 18; Galton-Watson, 275; Gambling, 263; Gaussian process, 97; Gaussian system, 93; Generalized derivative, 134; Generalized Fourier series, 118; Generating function, 213; Generator (infinitesimal), 235; Generator of a semigroup, 234; Germination process, 13; Global behavior, 201; Gram-Schmidt, 121, 136; Grandell, 51.

H: Hartman, 75; Herglotz's theorem, 152; Hermitian form, 131; Hermitian kernel, 137; Hermitian symmetry, 130, 137; Hilbert space, 113; Hille, E., 207; Hille-Yosida, 243, 254; History, 261; Hitting times, 65; Homogeneous diffusion, 223; Homogeneous Markov process, 10, 200; Homogeneous Poisson process, 43.

I: Imbibition, 13; Independent increments, 8; Indistinguishable, 6; Initial distribution, 9, 201; Inner product, 107; Innovation, 183; Instantaneous state, 209; Integral (stochastic), 86; Integral (Riemann), 86; Interarrival times, 48; Internal history, 261; Invariant measure, 203; Inverse matrix, 92; Isometric isomorphism, 158; Isometric mapping, 158; Isometric operator, 124; Ito, 86, 226.

J: Jensen's inequality, 112; Joint probability density, 94; Jump, 216.

K: Kac, M., 35; Karhunen-Loeve expansion, 139; Kernel of integral equation, 137; Khintchine, 75; Kolmogorov, 7, 15, 25, 75, 224, 226; Kunita, 86.

L: L_2 space, 106; L_2-continuous process, 132; Langevin's equation, 81; Laplace transform, 53, 239; Law of motion (Newton), 81; Law of the iterated logarithm, 74; Le Cam, 39; Levy, 226; Likelihood ratio, 264; l.i.m., 110; Linear estimate, 173; Linear manifold, 115, 173; Linear operator, 122; Linear predictor, 173; Linear transformation, 174; Lipschitz conditions, 226; Local behavior, 201; Loeve's criterion, 113.

M: M/M/1, 18; Marginal distribution, 1, 2; Marked point process, 53; Markov Gaussian process, 99; Markov inequality, 23; Markov process, 9, 200; Markov process, homogeneous, 10; Markov process, regular, 216; Markov property, 9; Markov renewal process, 220; Markov time, 216; Martingale, 12, 46, 258, 272; Martingale, closed, 262; Mathematical modeling, 12; Matrix, 92; Maximum, 19, 56, 68; Mean rate, 43; Mean square error, 170; Measurable process, 27; Mercer, 140; Metric, 38; Moving average, 176, 183, 185.
N: Newton, 12, 81; Noise, 172; Non-deterministic process, 177; Non-negative definite, 93, 130; Norm, 108, 123; Norm of an operator, 123; Normal distribution, 94; Normal operator, 124; Normally distributed, 94.

O: Order statistics, 23, 44; Ornstein-Uhlenbeck process, 81, 82; Orthogonal basis, 118; Orthogonal collection, 108; Orthogonal complement, 117; Orthogonal matrix, 93; Orthogonal projection, 115; Orthogonal random measure, 157; Orthonormal collection, 108; Outer measure, 8.

P: Parallelogram law, 108; Parameter set, 1; Parseval, 119; Partition, 73; Path, 2, 71; Physically realizable filter, 176; Pincherle-Goursat kernel, 138; Poincaré, H., 62; Point measure, 36; Point process on R, 34; Point process, simple, 35; Poisson process, 39, 40; Pole, 190; Pólya, 152; Porous medium, 16; Positive definite, 93, 94; Prediction (pure), 169; Process with orthogonal increments, 160; Projection operator, 124, 125; Pure birth process, 49, 212; Pure death process, 212; Purely non-deterministic process, 178.

Q: Quadratic form, 93; Quadratic mean continuity, 132; Quadratic mean differentiability, 134; Quadratic variation, 73; Queues, 17.

R: Rainfall, 34, 51; Random measure, 36, 175; Random process, 1; Random telegraph process, 153; Real valued, 1; Rectifiable, 74; Reflection principle, 68; Regular (stable) state, 209; Regular Markov process, 216; Renyi, 53; Resolvent, 238, 244; Rice, 35; Riemann stochastic integral, 86; Riesz-Fischer theorem, 111; Right continuous inverse, 43.

S: Sample path (function), 2; Scheffé, 93; Schwarz's inequality, 107; Second order, 11, 106; Second order process, 129; Self-adjoint operator, 123; Semigroup, 232, 233; Separability, 18; Separable process, 19; Separable version, 20; Serfling, 39; Shift operator, 174; Singleton, 197; Singular matrix, 92; Smoluchowski, M., 62; Soil erosion, 14; Spanned, 115; Spectral characteristic, 174; Spectral density, 121; Spectral distribution, 121, 163; Spectral representation of a process, 162; Standard Brownian motion, 63; State space, 2, 200; Stationary distribution, 203; Stationary Gaussian Markov, 102; Stationary Gaussian process, 102; Stationary stochastic process, 10, 11, 143; Stationary transition probability, 10; Stieltjes integral, 85; Stochastic integration, 85; Stochastic matrix, 206; Stochastic measure, 157; Stochastic process, 1; Stochastic sequence, 261; Stochastic structure, 8; Stochastically continuous, 22; Stochastically equivalent, 5; Stochastically equivalent (wide sense), 5; Stone, 162; Stopping time, 216; Strictly stationary, 10; Strong convergence, 234; Strong Markov process, 216; Strong Markov property, 217; Strongly continuous, 234; Strongly integrable, 236; Subadditive, 207; Submartingale, 12, 261; Subordinated process, 175; Subspace, 115; Supermartingale, 12, 261; Supremum norm, 232; Symmetric distribution, 153.

T: Telegraph process, 144; Thinned version, 52; Thinning of a point process, 51; Todorovic, 58; Total variation distance, 38; Transition probability, 10; Transition probability, standard, 218; Transition probability, stochastically continuous, 252; Transpose matrix, 92; Triangle inequality, 109.

U: Unbiased estimate, 170; Uniform integrability, 270; Uniformly integrable martingales, 272; Unitary, 115; Unitary operator, 124; Upcrossing, 266; Upcrossing inequality, 266.

V: Version, 6, 9.

W: Weak convergence, 52; Westcott, 52; White noise, 83, 184; Wide sense stationary, 11, 143; Wiener process, 63; Wold decomposition, 179.

Y: Yaglom, 187; Yushkevich, 217, 218.