# Linear Models

```Linear Models
Faculty of Science, 2010-2011
Juan M. Rodríguez Díaz
Statistics Department
University of Salamanca
Spain
Part II: Regression Models

Contents
1 Introduction
2 Linear Regression
  2.1 Preliminaries and notation
  2.2 Estimation
  2.3 Tests and residuals
  2.4 Prediction
3 Multiple Regression
  3.1 Introduction
  3.2 Estimation and tests
  3.3 Prediction
  3.4 Other issues
4 Analysis of Covariance
  4.1 Introduction
  4.2 ANOCOVA
5 Extensions of Regression Model

1 Introduction

Regression Models: A bit of history
• Regression models describe the dependence of a response variable (or vector) $y$ on an explanatory variable (or vector) $x$
• First used in astronomy and physics by Laplace and Gauss
• The name comes from Galton (late 19th century), who studied how children's heights depend on their parents' heights and found a “regression” to the mean
Regression Models: Prediction
• Assume we know the distribution of heights $y$ in the Spanish population and want to predict the height $\hat{y}$ of a randomly selected person. We would choose
– the mode, to maximize the probability of guessing right
– the median, to minimize the expected absolute error
– the mean, to minimize the expected squared error
• Assume, additionally, that we know the weight $x$ of the person and the conditional distribution of heights given this weight
• Then we estimate the unknown height by $\bar{y}|x$, the mean of the heights conditional on the weight $x$
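Why the mean is the choice under squared error (a short derivation added for illustration; $c$ denotes any candidate prediction and $\mu = E[y]$, both symbols introduced here):
  $E[(y - c)^2] = E[(y - \mu)^2] + 2(\mu - c)E[y - \mu] + (\mu - c)^2 = \mathrm{Var}(y) + (\mu - c)^2$,
which is smallest when $c = \mu$. Applying the same argument to the distribution of $y$ given $x$ leads to the conditional mean.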
Regression Models: Types of relation between variables
• Exact or functional: knowing $x$ completely determines the value of $y$, $y = f(x)$
• Independence: knowing $x$ gives no information about $y$
• Statistical or stochastic: knowing $x$ allows us to predict $y$ (more or less accurately), $y = f(x) + \varepsilon$
Regression methods build models for this third case.

2 Linear Regression

2.1 Preliminaries and notation

Linear Regression: Model
• Model: $y_i = \beta_0 + \beta_1 x_i + u_i$, with the $u_i$ independent and $u_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$
• Residuals: $e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$
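
2.2 Estimation

Linear Regression: Estimators of parameters
• Least squares (LS) estimators minimize $\sum_{i=1}^{n} e_i^2$
• Maximum likelihood (ML) estimators maximize $L(\beta) = \log f_n(e)$, with
  $f_n(e) = (2\pi)^{-n/2} \sigma^{-n} \exp\left[-\left(\sum_{i=1}^{n} e_i^2\right)/(2\sigma^2)\right]$
• ⇒ Under the normality assumption, LS estimators coincide with ML estimators
• Normal equations:
  $0 = \partial L(\beta)/\partial\beta_0 \Leftrightarrow \sum_i e_i = 0$
  $0 = \partial L(\beta)/\partial\beta_1 \Leftrightarrow \sum_i e_i x_i = 0$
• $\hat{\beta}_1 = S_{xy}/S_x^2$ ; $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$,
  with $S_{xy} = \frac{1}{n}\sum_i (x_i - \bar{x})(y_i - \bar{y}) = \overline{xy} - \bar{x}\bar{y}$
  and $S_x^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2 = \overline{x^2} - \bar{x}^2$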
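Worked example (illustrative data, not taken from the original notes): for $x = (1, 2, 3, 4)$ and $y = (2, 4, 5, 7)$ we get $\bar{x} = 2.5$, $\bar{y} = 4.5$, $\overline{xy} = 13.25$ and $\overline{x^2} = 7.5$, so
  $S_{xy} = 13.25 - 2.5 \cdot 4.5 = 2$, $S_x^2 = 7.5 - 2.5^2 = 1.25$,
  $\hat{\beta}_1 = 2/1.25 = 1.6$, $\hat{\beta}_0 = 4.5 - 1.6 \cdot 2.5 = 0.5$.
The residuals are $e = (-0.1, 0.3, -0.3, 0.1)$, and both normal equations hold: $\sum_i e_i = 0$ and $\sum_i e_i x_i = -0.1 + 0.6 - 0.9 + 0.4 = 0$.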
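Linear Regression: Properties of line and estimators
• Estimated line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, or $(y - \bar{y}) = \hat{\beta}_1 (x - \bar{x})$
• $\hat{\beta}_1 = \sum_i \omega_i y_i$, with $\omega_i = (x_i - \bar{x})/(n S_x^2)$
  and $\sum_i \omega_i = 0$, $\sum_i \omega_i^2 = 1/(n S_x^2)$, $\sum_i \omega_i x_i = 1$
• $\hat{\beta}_0 = \sum_i r_i y_i$, with $r_i = (1 - n\bar{x}\omega_i)/n$
  and $\sum_i r_i = 1$, $\sum_i r_i^2 = (1 + \bar{x}^2/S_x^2)/n$, $\sum_i r_i x_i = 0$
• $\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{n S_x^2}\right)$
• $\hat{\beta}_0 \sim N\left(\beta_0, \frac{\sigma^2}{n}\left(1 + \frac{\bar{x}^2}{S_x^2}\right)\right)$, where $\frac{\sigma^2}{n}\left(1 + \frac{\bar{x}^2}{S_x^2}\right) = \frac{\sigma^2 \overline{x^2}}{n S_x^2}$
• $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}\sigma^2}{n S_x^2}$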
Linear Regression: Some notes
• The point $(\bar{x}, \bar{y})$ lies on the estimated line
• About the two regression lines, $y$ on $x$ and $x$ on $y$:
– The line of $y$ on $x$ minimizes the vertical distances from the points to it
– The line of $x$ on $y$ minimizes the horizontal ones
– Neither of them is the line closest to the points
– The sign of the slope is the same in both lines
– They intersect at the point $(\bar{x}, \bar{y})$
• From $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1)$: when $\bar{x} > 0$ the two estimators are negatively correlated
• The farther a point is from the mean, the more weight it has in the estimation of $\beta_1$
• $\mathrm{Var}(\hat{\beta}_0)$ increases with the distance from $\bar{x}$ to the origin
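
2.3 Tests and residuals

Linear Regression: Tests
• The regression test is an ANOVA
• $\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2$, that is, TV = NEV + EV
  (TV: total variability, NEV: non-explained variability, EV: explained variability)
• $NEV/\sigma^2 \sim \chi^2_{n-2}$ ; $EV/\sigma^2 \sim \chi^2_1$ if $H_0$ is true

Source | SS                               | DF    | MS      | F
EV     | $\sum_i (\hat{y}_i - \bar{y})^2$ | 1     | $S_e^2$ | $S_e^2/S_R^2$
NEV    | $\sum_i e_i^2$                   | n − 2 | $S_R^2$ |
TV     | $\sum_i (y_i - \bar{y})^2$       | n − 1 |         |

Determination coefficient: $R^2 = EV/TV$; in simple regression $R^2 = r_{xy}^2 = \frac{S_{xy}^2}{S_x^2 S_y^2}$
Corrected determination coefficient: $\bar{R}^2 = 1 - S_R^2/\hat{S}_y^2$, with $\hat{S}_y^2 = TV/(n-1)$ the corrected variance of $y$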
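Worked example (illustrative data, not taken from the original notes): for $x = (1, 2, 3, 4)$ and $y = (2, 4, 5, 7)$ the least squares line is $\hat{y} = 0.5 + 1.6x$, with fitted values $(2.1, 3.7, 5.3, 6.9)$ and $\bar{y} = 4.5$. Then
  TV $= 6.25 + 0.25 + 0.25 + 6.25 = 13$,
  NEV $= 0.01 + 0.09 + 0.09 + 0.01 = 0.2$,
  EV $= 5.76 + 0.64 + 0.64 + 5.76 = 12.8$,
so TV = NEV + EV holds, $R^2 = 12.8/13 \approx 0.985$, and $F = (12.8/1)/(0.2/2) = 128$ with 1 and $n - 2 = 2$ degrees of freedom.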
Linear Regression: Plots of residuals
• The plot of $e_i$ vs. $\hat{y}_i$ should be a shapeless “cloud of points”
• Otherwise it could indicate
– a non-linear relation between the variables
– the existence of outliers
– very different numbers of observations per group ($x$)
– heteroscedasticity
– computational errors
• The plot of $e_i$ vs. $t$ (when possible) can detect dependence among the residuals
– positive autocorrelation
– negative autocorrelation

2.4 Prediction

Linear Regression: Prediction
• Given $x$ we may want to predict
– the value of $Y$ when $X = x$ (one observation, $y_x$)
– the mean of the observations taken at $x$ ($\mu_x$)
• In both cases the predicted value is the one given by the estimated line,
  $\hat{\mu}_x = \hat{y}_x = \hat{\beta}_0 + \hat{\beta}_1 x$
• But the variances are different:
– $\mathrm{Var}(\hat{\mu}_x) = \mathrm{Var}(\bar{y} + \hat{\beta}_1 (x - \bar{x})) = \frac{\sigma^2}{n}\left(1 + \frac{(x - \bar{x})^2}{S_x^2}\right)$
– $\mathrm{Var}(\hat{y}_x) = E[(y_x - \hat{y}_x)^2] = \frac{\sigma^2}{n}\left(n + 1 + \frac{(x - \bar{x})^2}{S_x^2}\right)$
• Confidence intervals for $\mu_x$ and $y_x$ can now be constructed

3 Multiple Regression

3.1 Introduction

Multiple Regression Linear Model: Notation
• Model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + u_i$, where each $x_{ji}$ could be any function (linear or not) of the $x$, and with the usual assumptions for the uncontrolled variables: $u_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$, and independent
• Matrix notation:
  $Y = (y_1, \ldots, y_n)'$, of size $n \times 1$
  $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$, of size $(k+1) \times 1$
  $U = (u_1, \ldots, u_n)'$, of size $n \times 1$
  $X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{k1} \\ 1 & x_{12} & \cdots & x_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{pmatrix}$, of size $n \times (k+1)$
• Thus the model can be written $Y = X\beta + U$, where $Y$ belongs to a space of dimension $n$
Multiple Regression Linear Model: Gauss-Markov Theorem
In the given model, if the following conditions hold
• the values of the dependent variable are generated by the linear model $Y = X\beta + U$
• the $\{u_i\}$ are uncorrelated
• $\mathrm{Var}(u_i)$ is constant
• the $\{u_i\}$ do not depend on the $x$
• $x$ is obtained without measurement error
• we look for estimators that are linear in the observations
and we take the optimal estimator to be the unbiased one with minimum variance,
then the least squares (LS) estimator is optimal, whatever the distribution of $u$.
Note: when the normality assumption holds, the ML estimator coincides with the LS estimator.

3.2 Estimation and tests

Multiple Regression Linear Model: Estimators of parameters
• Under the normality assumption, least squares (LS) estimators coincide with ML estimators
• $G = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{1i} - \cdots - \beta_k x_{ki})^2$
• Normal equations:
  $0 = \partial G/\partial\beta_0 \Leftrightarrow \sum_i e_i = 0$
  $0 = \partial G/\partial\beta_1 \Leftrightarrow \sum_i e_i x_{1i} = 0$
  ...
  $0 = \partial G/\partial\beta_k \Leftrightarrow \sum_i e_i x_{ki} = 0$
  which can be expressed as $X'Y = X'X\hat{\beta} \Rightarrow \hat{\beta} = (X'X)^{-1} X'Y$
Multiple Regression Linear Model: Geometric interpretation
• Let $X_0 = \mathbf{1}_n$ (the vector of ones), $X_1 = (x_{11}, \ldots, x_{1n})'$, ..., $X_k = (x_{k1}, \ldots, x_{kn})'$ be the columns of $X$, and $E_X = \langle X_0, X_1, \ldots, X_k \rangle$ the subspace of $\mathbb{R}^n$ generated by them
• The residual vector $\hat{U} = Y - X\hat{\beta} = Y - \hat{Y} = e$ has minimum norm when it is orthogonal to $E_X$, that is, when
  $0 = X'e = X'(Y - \hat{Y}) \Leftrightarrow \hat{\beta} = (X'X)^{-1} X'Y$
• And since $\Sigma_Y = \sigma^2 I$ and $\Sigma_{AY} = A \Sigma_Y A'$, then $\Sigma_{\hat{\beta}} = \sigma^2 (X'X)^{-1}$
Multiple Regression Linear Model: Properties of estimators
• $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$, or $\hat{y} - \bar{y} = \hat{\beta}_1 (x_1 - \bar{x}_1) + \cdots + \hat{\beta}_k (x_k - \bar{x}_k)$
• $\hat{\beta}_i \sim N(\beta_i, \sigma^2 q_{ii})$, with $q_{ii}$ the $(i, i)$ element of $(X'X)^{-1}$
  ⇒ tests and confidence intervals for the parameters can be computed
• $S_R^2 = \frac{\sum_i e_i^2}{n-k-1}$ is an unbiased estimator of $\sigma^2$ (residual variance)
• $\frac{(\hat{\beta} - \beta)' X'X (\hat{\beta} - \beta)}{\sigma^2} \sim \chi^2_{k+1}$ and $\frac{(n-k-1) S_R^2}{\sigma^2} \sim \chi^2_{n-k-1}$
  ⇒ $\frac{(\hat{\beta} - \beta)' X'X (\hat{\beta} - \beta)}{(k+1) S_R^2} \sim F_{k+1, n-k-1}$
• The confidence region is the ellipsoid
  $(\hat{\beta} - \beta)' X'X (\hat{\beta} - \beta) \le (k+1) S_R^2 F_{k+1, n-k-1}$,
  whose axes depend on the characteristic roots of $X'X$
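Multiple Regression Linear Model: Tests
Sources of variability:
$\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2$ (TV = EV + NEV)

Source | SS                               | DF        | MS      | F
EV     | $\sum_i (\hat{y}_i - \bar{y})^2$ | k         | $S_e^2$ | $S_e^2/S_R^2$
NEV    | $\sum_i e_i^2$                   | n − k − 1 | $S_R^2$ |
TV     | $\sum_i (y_i - \bar{y})^2$       | n − 1     |         |

$R^2 = EV/TV$ and $F = \frac{R^2}{1 - R^2} \cdot \frac{n-k-1}{k}$
Corrected determination coefficient: $\bar{R}^2 = 1 - S_R^2/\hat{S}_y^2 = R^2 - (1 - R^2)\frac{k}{n-k-1}$

3.3 Prediction
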
Multiple Regression Linear Model: Prediction
• Given $x = (1, x_1, \ldots, x_k)'$ we may want to predict
– the value of $Y$ when $X = x$ (one observation, $y_x$)
– the mean of the observations taken at $x$ ($\mu_x$)
• In both cases the predicted value is the one given by the estimated model,
  $\hat{\mu}_x = \hat{y}_x = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$
• But the variances are different:
– $\mathrm{Var}(\hat{\mu}_x) = E[(\hat{\mu}_x - \mu_x)^2] = x' \mathrm{Var}(\hat{\beta}) x = \sigma^2 \nu_x$, with $\nu_x = x'(X'X)^{-1}x$
– $\mathrm{Var}(\hat{y}_x) = E[(y_x - \hat{y}_x)^2] = \sigma^2 (1 + \nu_x)$
• Using this, confidence regions for $\mu_x$ and $y_x$ can be constructed

3.4 Other issues

Multiple Regression Linear Model: Multicollinearity
• If one explanatory variable is a linear combination of the rest, then $X'X$ cannot be inverted and the solution is not unique
• Close to this case the covariance matrix $\sigma^2 (X'X)^{-1}$ is 'big'
  ⇒ large variances and covariances of the parameter estimators
• It does not affect $\hat{y}$ or $e$
• It may happen that no individual parameter is significant although the model as a whole is
• It may affect some parameters strongly and others not at all
• Big MCE
Multiple Regression Linear Model: Multicollinearity indexes
• Increment Variance Factor (IVF): increase of the estimator variance in multiple regression relative to that in simple regression,
  $IVF(i) = (1 - R^2_{i.R})^{-1}$, $i = 1, \ldots, p$,
  where $R^2_{i.R}$ is the determination coefficient of the regression of $x_i$ on $x_1, \ldots, \widehat{x_i}, \ldots, x_p$ (all the regressors except $x_i$)
• $IVF$ close to 1 means low multicollinearity
• Tolerance $= (IVF)^{-1} = 1 - R^2_{i.R}$
• If $R = (r_{x_i x_j})_{(i,j)}$ is the correlation matrix of the regressors, $R^{-1}$ measures the joint dependence: $\{R^{-1}\}_{(i,i)} = IVF(i)$
• The characteristic roots of $X'X$, or (better) of $R$, measure it:
  Conditioning Index (CI) $= \sqrt{\text{max. eigenvalue}/\text{min. eigenvalue}}$
• Medium multicollinearity ≡ $10 < CI \le 30$
• Note: $CI(A) = CI(A^{-1})$
Multiple Regression Linear Model: Multicollinearity treatment
• A-priori design, aiming at a 'big' $X'X$
• When an a-priori design is not possible:
– Remove regressors
– Include external information (e.g. Bayes)
– Principal components
Multiple Regression Linear Model: Partial Correlation
• Given the variables $x_1, \ldots, x_k$, the partial correlation coefficient between $x_1$ and $x_2$, $r_{12.34\cdots k}$, is a coefficient that excludes the influence of the remaining variables. Procedure:
– $e_{1.34\cdots k}$: residuals of the regression of $x_1$ on $x_3, \ldots, x_k$
– $e_{2.34\cdots k}$: residuals of the regression of $x_2$ on $x_3, \ldots, x_k$
– $r_{12.34\cdots k}$ is the correlation coefficient between $e_{1.34\cdots k}$ and $e_{2.34\cdots k}$
• The partial correlation coefficient between $y$ and $x_i$ is $r_{yi.R}$, such that
  $r^2_{yi.R} = \frac{t_i^2}{t_i^2 + n - k - 1}$, with $t_i = \hat{\beta}_i/\hat{\sigma}_{\hat{\beta}_i}$
• For 3 variables: $r_{xy.z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$
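Worked example (illustrative values, not taken from the original notes): with $r_{xy} = 0.6$, $r_{xz} = 0.5$ and $r_{yz} = 0.4$,
  $r_{xy.z} = (0.6 - 0.5 \cdot 0.4)/\sqrt{(1 - 0.25)(1 - 0.16)} = 0.4/\sqrt{0.63} \approx 0.504$,
so part of the raw correlation between $x$ and $y$ is attributable to their common dependence on $z$.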
Multiple Regression Linear Model: Stepwise regression
• Forward:
– begin with the variable most correlated with the dependent variable
– for the rest, use the partial correlation excluding the influence of the variables already in the model
– the procedure stops when none of the remaining variables has both a significant correlation coefficient and high tolerance
• Backward: begin with all the variables and remove one in each step
• Mixture of Forward and Backward
• Blocks: using blocks of variables that enter or leave the model together

4 Analysis of Covariance

4.1 Introduction

Analysis of Covariance: Introduction
• Very often the relationship between variables depends on qualitative variables
• Usually, when data can be grouped, the 'group factor' should be included in the model
• Thus we will consider linear models with both qualitative explanatory variables (factors) and quantitative ones (covariates)
Analysis of Covariance: Fictitious variables
• When the groups (A and B) are not taken into account, the following situations could arise
– Specification error due to the omission of a factor
– Failing to detect a true relation
– Deciding that a false relation is significant
• Solutions to this problem
– Fit separate regressions in the different groups
– Include the factor in the model: $y = \beta_0 + \beta_1 x + \alpha z_A + u$,
  with the fictitious (dummy) variable $z_A = 1$ when in group A and $z_A = 0$ when not in A

4.2 ANOCOVA

Analysis of Covariance: Models
• About the explanatory variables, let us assume that we have
– One qualitative variable with $p$ levels ⇒ $p - 1$ fictitious variables $Z_1, \ldots, Z_{p-1}$
– $k$ quantitative variables $X_1, \ldots, X_k$
• Models
1. $Y = X\beta + U$ ≡ $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u$
   ≈ groups have no influence
2. $Y = X\beta + Z\alpha + U$ ≡ $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + z_1 \alpha_1 + \cdots + z_{p-1} \alpha_{p-1} + u$
   ≈ groups with the same slope but different intercept
3. $Y_i = X_i \beta_i + \varepsilon_i$ ≡ $y_i = \beta_{i0} + \beta_{i1} x_1 + \cdots + \beta_{ik} x_k + \varepsilon_i$, $i = 1, \ldots, p$
   ≈ different slope and intercept in each group
Analysis of Covariance: Tests
• $H_0^{1,3}$: groups have no influence
  $F^{1,3} = \frac{EV_{1\to 3}/DF_{1-3}}{NEV/DF_{NE}} = \frac{(SSRes_1 - SSRes_3)/[(p-1)(k+1)]}{SSRes_3/[n - p(k+1)]}$
  Rejecting $H_0^{1,3}$ means that Model 1 is poor: it does not explain enough, and groups should be taken into account
  ⇒ the right model will be either Model 2 or Model 3
  We need to perform another test to decide between these two:
• $H_0^{2,3}$: there is no interaction between factors and regressors
  $F^{2,3} = \frac{EV_{2\to 3}/DF_{2-3}}{NEV/DF_{NE}} = \frac{(SSRes_2 - SSRes_3)/[k(p-1)]}{SSRes_3/[n - p(k+1)]}$
  A significant result means that Model 2 does not explain enough: there is interaction between factors and regressors, and the regression lines differ between groups
  ⇒ we need Model 3
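Worked example of the degrees of freedom (illustrative sizes, not taken from the original notes): with $p = 3$ groups, $k = 1$ covariate and $n = 30$ observations, Model 1 has $k + 1 = 2$ parameters, Model 2 has $k + p = 4$ and Model 3 has $p(k+1) = 6$. Hence $F^{1,3}$ is compared with an $F$ distribution with $(p-1)(k+1) = 4$ and $n - p(k+1) = 24$ degrees of freedom, and $F^{2,3}$ with one with $k(p-1) = 2$ and $24$ degrees of freedom.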

5 Extensions of Regression Model

• Qualitative response variable
– Regression model
– Generalized linear models: Logit, Probit
• Polynomial models
– One explanatory variable
– Response surfaces
• Recursive estimation
```
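
The estimator $\hat{\beta} = (X'X)^{-1}X'Y$, the TV = EV + NEV decomposition, the prediction variances and the IVF from the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the helper names (`inverse`, `fit_ols`, `prediction_variances`, `ivf`) and the small data sets are hypothetical, only the standard library is used, and $\sigma^2$ is replaced by its estimate $S_R^2$.

```python
from math import sqrt

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small examples)."""
    n = len(A)
    # Augment A with the identity matrix: [A | I]
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular matrix (exact multicollinearity)")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def fit_ols(X, y):
    """Least squares fit. X: rows (1, x1, ..., xk) with k >= 1; y: responses."""
    Xt = transpose(X)
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in Xt] for ci in Xt]
    Q = inverse(XtX)                                   # (X'X)^{-1}
    Xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    beta = [sum(q * v for q, v in zip(row, Xty)) for row in Q]
    y_hat = [sum(b * v for b, v in zip(beta, row)) for row in X]
    n, p = len(X), len(X[0])                           # p = k + 1 parameters
    y_bar = sum(y) / n
    ev = sum((f - y_bar) ** 2 for f in y_hat)          # explained variability
    nev = sum((yi - f) ** 2 for yi, f in zip(y, y_hat))  # non-explained
    tv = sum((yi - y_bar) ** 2 for yi in y)            # total variability
    s2_r = nev / (n - p)                               # residual variance S_R^2
    return {"beta": beta, "XtX_inv": Q, "EV": ev, "NEV": nev, "TV": tv,
            "R2": ev / tv, "S2_R": s2_r, "F": (ev / (p - 1)) / s2_r}

def prediction_variances(fit, x_new):
    """Estimated Var(mu_hat_x) and Var(y_hat_x) at x_new = (1, x1, ..., xk)."""
    Q = fit["XtX_inv"]
    m = len(x_new)
    nu = sum(x_new[i] * Q[i][j] * x_new[j] for i in range(m) for j in range(m))
    return fit["S2_R"] * nu, fit["S2_R"] * (1 + nu)

def ivf(columns):
    """IVF(i) as the diagonal of the inverse correlation matrix of the regressors."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    R = [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
          for cj, nj in zip(centered, norms)]
         for ci, ni in zip(centered, norms)]
    R_inv = inverse(R)
    return [R_inv[i][i] for i in range(len(columns))]

# Illustrative data (not from the notes): simple regression as the case k = 1
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 7.0]
fit = fit_ols([[1.0, xi] for xi in x], y)
var_mean, var_obs = prediction_variances(fit, [1.0, 2.5])
```

For this four-point data set the matrix solution should agree with the closed-form simple regression formulas ($\hat{\beta}_1 = S_{xy}/S_x^2$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$), and the variance of a single predicted observation exceeds that of the predicted mean by exactly $S_R^2$. Gauss-Jordan inversion is used only to keep the sketch dependency-free; for real data a numerical library with a QR or Cholesky solver would be the more usual choice.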