Motivation
Inferential strategies
An empirical example
Final remarks
Multilevel Models for Economists II: Estimation,
Prediction and Inference
Walter Sosa-Escudero
Universidad de San Andres
August 2011
A Warm Up Example
Raudenbush and Bryk (1986), a seminal paper.
$Y_{ij}$: math achievement of student $i$ in school $j$.
$X_{ij}$: socio-economic status (SES).
$Z_j$: indicator that school $j$ is Catholic.
Questions
Are SES effects heterogeneous?
If so, how much of this heterogeneity is due to school variation?
How much is due to whether schools are Catholic or not?
Note the strong interest in variance components.
Consider a simple random coefficients model.
The ‘within model’:
$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$$
The ‘between model’:
$$\beta_{sj} = \mu_{s0} + u_{sj}, \quad s = 0, 1$$
In a two-stage approach, $\beta_{sj}$ can be estimated by OLS within every school. Let $\hat\beta_{sj}$ be these estimates.
Variance components
Let $e_{sj} = \hat\beta_{sj} - \beta_{sj}$ (the OLS estimation error in the first stage).
Then, substituting into the between model,
$$\hat\beta_{sj} = \mu_{s0} + u_{sj} + e_{sj}$$
Then, if $u_{sj}$ and $e_{sj}$ are uncorrelated,
$$V(\hat\beta_{sj}) = V(u_{sj}) + V(e_{sj})$$
The variance of the first-stage estimates is composed of parameter variance and sampling variance.
The first can be estimated as a variance component.
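The decomposition above can be illustrated with a small simulation. This is only a sketch under assumed values: the number of schools, the school size and the true parameters are hypothetical, and the helper name `first_stage` is not from the slides.

```python
import numpy as np

def first_stage(x, y):
    """OLS of y on (1, x) within one school: slope estimate and its sampling variance."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - 2)            # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)            # classical OLS covariance
    return coef[1], cov[1, 1]

rng = np.random.default_rng(0)
m, n = 400, 25                                   # schools, students per school (illustrative)
mu1, var_u, sigma = 2.0, 0.5, 3.0                # hypothetical true values

slopes, samp_vars = [], []
for _ in range(m):
    b1 = mu1 + rng.normal(scale=np.sqrt(var_u))  # between model: beta_1j = mu_10 + u_1j
    x = rng.normal(size=n)
    y = 10.0 + b1 * x + rng.normal(scale=sigma, size=n)   # within model
    b_hat, v_hat = first_stage(x, y)
    slopes.append(b_hat)
    samp_vars.append(v_hat)

total_var = np.var(slopes, ddof=1)               # estimates V(beta_hat)
sampling_var = np.mean(samp_vars)                # estimates V(e)
parameter_var = total_var - sampling_var         # estimates V(u)
print(total_var, sampling_var, parameter_var)
```

The difference `parameter_var` is the part of the dispersion in first-stage slopes that is attributed to genuine school heterogeneity rather than to sampling error.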
The question is how much of the observed dispersion in the estimates is due to school-specific factors and how much is just sampling variability.
Slopes as outcomes: add explanatory variables (Catholic) to the between regression. This gives a further variance decomposition: how much of the school-level variability can be captured by whether a school is Catholic or not.
Note that the within model (the object of interest in econometrics) is something like an input here.
We are leaving aside the endogeneity of SES (very important).
The Mixed Linear Model
$$Y = X\beta + Z\delta + \epsilon, \quad \delta \sim (0, \Omega), \quad \epsilon \sim (0, \Sigma)$$
Careful with jargon:
$\beta$ is the fixed effect.
$\delta$ is the random effect.
$X$ and $Z$ are observable.
Statistical inference
Estimate β: unbiased, consistent, efficient,...
Estimate its sampling variance (consistently).
Estimate relevant components of Ω and Σ.
Predict δ.
Evaluate relevant hypotheses. Confidence intervals, etc.
Some relevant comments
In econometrics most of the interest lies in estimating $\beta$ and its sampling variance consistently. This implies dealing with ‘non-spherical’ errors induced by leaving $Z\delta$ in the error term (correlation, heteroskedasticity). Also, exogeneity of $X$ with respect to $Z\delta$ is a major issue.
Simple ‘semi- or nonparametric’ estimation (closest to OLS) and ‘fix the standard errors’. Minimal probabilistic structure.
The MLM literature is mostly interested in efficient estimation and prediction, and in the variance components. It favors full-model (MLE or Bayesian) estimation under normality.
Do economists care about variance components?
Sometimes. Rural income in El Salvador: Arias, Marchionni and Sosa-Escudero (Journal of Income Distribution, 2011).
Goals: 1) Is persistent poverty due to time-invariant individual characteristics or to the persistence of ‘bad luck’? 2) How much do these two dimensions explain income inequality?
$$y_{it} = x_{it}'\beta + u_{it}$$
$$u_{it} \equiv \mu_i + v_{it}, \quad \mu_i \sim (0, \sigma_\mu^2)$$
$$v_{it} = \psi v_{i,t-1} + e_{it}, \quad e_{it} \sim (0, \sigma_e^2)$$
Notation
$$\sigma_u^2 = \sigma_\mu^2 + \sigma_v^2, \quad \sigma_v^2 = \sigma_e^2/(1 - \psi^2), \quad \lambda \equiv \sigma_\mu^2/\sigma_u^2$$
It is easy to show that
$$\rho_s \equiv Cor(u_{it}, u_{i,t-s}) = \lambda + (1 - \lambda)\psi^s$$
Persistence is a weighted average of the two sources of persistence.
Technically: an error components model with random effects and first-order serial correlation. The goal is to decompose persistence.
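As a quick numerical check of the autocorrelation formula, the sketch below evaluates it for made-up variance components, with `psi` standing for the AR(1) coefficient of $v_{it}$. The function name and the numbers are illustrative assumptions, not values from the El Salvador study.

```python
def error_autocorrelation(s, sigma_mu2, sigma_e2, psi):
    """Correlation between u_it and u_i,t-s in the RE + AR(1) error components model."""
    sigma_v2 = sigma_e2 / (1.0 - psi ** 2)       # stationary variance of v_it
    sigma_u2 = sigma_mu2 + sigma_v2              # total error variance
    lam = sigma_mu2 / sigma_u2                   # share of the permanent component
    return lam + (1.0 - lam) * psi ** s

# Hypothetical values: sigma_v2 = 1, sigma_u2 = 3, lambda = 2/3
rho_0 = error_autocorrelation(0, sigma_mu2=2.0, sigma_e2=0.75, psi=0.5)
rho_1 = error_autocorrelation(1, sigma_mu2=2.0, sigma_e2=0.75, psi=0.5)
rho_20 = error_autocorrelation(20, sigma_mu2=2.0, sigma_e2=0.75, psi=0.5)
print(rho_0, rho_1, rho_20)
```

As the lag grows, the transitory part dies out and the correlation approaches the permanent share $\lambda$.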
Volatility is the major source of variation in incomes across rural families in El Salvador, much more so than in developed countries. In any given year, most of the variation in incomes is not accounted for by education or observed family characteristics but is instead due to transitory shocks.
As for the persistence of incomes over a lifetime, most (around two-thirds) of the persistence in low- and high-income states is due to idiosyncratic differences between families related to endowments, including unobserved income determinants (unobserved heterogeneity). The persistence of bad shocks is of second order, given that the correlation of bad shocks is relatively low (0.24) in these data. So over a lifetime, transitory components average out.
Inferential Strategies
OLS and WLS based
We will focus on the slopes as outcomes model (SOM)
$$Y = X\beta + u$$
$$\beta = Z\gamma + \delta,$$
which can be expressed as
$$Y = U\gamma + X\delta + u, \quad U \equiv XZ.$$
We will assume $V(u) = \sigma^2 I$ and $V(\delta) = \Omega$.
We will assume $X$ is always exogenous with respect to $Z\gamma$, $\delta$ and $u$. We’ll explore this later on.
Two-step method:
First,
$$\hat\beta_j = (X_j'X_j)^{-1}X_j'Y_j$$
for every school $j$. Then, stacking the $\hat\beta_j$ into $\hat\beta$,
$$\hat\gamma = (Z'Z)^{-1}Z'\hat\beta$$
One-step method:
$$\hat\gamma = (U'U)^{-1}U'Y$$
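Both estimators can be sketched in a few lines of NumPy. The simulated design below is an assumption made for illustration: a hypothetical school-level dummy, $Z_j$ built as a $2 \times 4$ block so that $\beta_j = Z_j\gamma + \delta_j$, and made-up parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 30                                   # schools, students per school (illustrative)
gamma = np.array([12.0, 2.0, 3.0, -1.5])         # hypothetical true gamma
delta_sd = np.array([1.0, 0.5])                  # sd of delta (intercept, slope)
sigma = 2.0

X_list, Z_list, Y_list = [], [], []
for j in range(m):
    catholic = float(j % 2)                      # hypothetical school-level dummy
    Zj = np.kron(np.eye(2), np.array([[1.0, catholic]]))   # 2 x 4 block of Z
    beta_j = Zj @ gamma + delta_sd * rng.normal(size=2)    # between model
    Xj = np.column_stack([np.ones(n), rng.normal(size=n)])
    Yj = Xj @ beta_j + sigma * rng.normal(size=n)          # within model
    X_list.append(Xj)
    Z_list.append(Zj)
    Y_list.append(Yj)

# Two-step: OLS within each school, then OLS of the stacked beta_hat on the stacked Z
beta_hat = [np.linalg.lstsq(Xj, Yj, rcond=None)[0] for Xj, Yj in zip(X_list, Y_list)]
Z = np.vstack(Z_list)                            # (2m) x 4
gamma_two_step = np.linalg.lstsq(Z, np.concatenate(beta_hat), rcond=None)[0]

# One-step: OLS of Y on U = XZ, built school by school
U = np.vstack([Xj @ Zj for Xj, Zj in zip(X_list, Z_list)])   # N x 4
gamma_one_step = np.linalg.lstsq(U, np.concatenate(Y_list), rcond=None)[0]

print(gamma_two_step, gamma_one_step)
```

The two estimates are generally close but not identical, because the one-step regression implicitly weights schools by the information in $X_j$.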
Both are unbiased and consistent.
Neither is efficient in the Gauss-Markov sense.
Be very careful with variance estimation: in both cases the errors are non-spherical, which renders standard variance estimators inconsistent.
Note that the second step is a multivariate regression.
Recall $Y = U\gamma + X\delta + u = U\gamma + \eta$, with $\eta \equiv X\delta + u$. Then
$$V \equiv V(\eta) = X\Omega X' + \sigma^2 I$$
Then a GLS (BLUE) estimator is
$$\hat\gamma_{gls} = (U'V^{-1}U)^{-1}U'V^{-1}Y$$
FGLS replaces $\Omega$ and $\sigma^2$ by consistent estimates. It is a two-stage strategy: variance components first.
FGLS is not BLUE.
Careful: $\Omega \neq \sigma_\delta^2 I$ is more the rule than the exception (think of simple random effects), simply because of individual variation.
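A minimal sketch of the GLS estimator follows, assuming independence across schools so that $V$ is block diagonal with blocks $V_j = X_j\Omega X_j' + \sigma^2 I$. The function name `gls_gamma` and the simulated inputs are illustrative assumptions. In FGLS, `Omega` and `sigma2` would be replaced by estimates.

```python
import numpy as np

def gls_gamma(X_list, Z_list, Y_list, Omega, sigma2):
    """GLS for Y = U gamma + eta, exploiting the block-diagonal structure of V(eta)."""
    q = Z_list[0].shape[1]
    lhs = np.zeros((q, q))
    rhs = np.zeros(q)
    for Xj, Zj, Yj in zip(X_list, Z_list, Y_list):
        Uj = Xj @ Zj
        Vj = Xj @ Omega @ Xj.T + sigma2 * np.eye(len(Yj))   # V_j = X_j Omega X_j' + sigma^2 I
        Vinv_U = np.linalg.solve(Vj, Uj)                    # V_j^{-1} U_j
        lhs += Uj.T @ Vinv_U                                # U_j' V_j^{-1} U_j
        rhs += Vinv_U.T @ Yj                                # U_j' V_j^{-1} Y_j (V_j symmetric)
    return np.linalg.solve(lhs, rhs)

# Hypothetical data: no school covariates, so beta_j = gamma + delta_j and Z_j = I
rng = np.random.default_rng(2)
X_list, Z_list, Y_list = [], [], []
for j in range(50):
    Xj = np.column_stack([np.ones(10), rng.normal(size=10)])
    beta_j = np.array([1.0, 2.0]) + rng.normal(size=2)
    X_list.append(Xj)
    Z_list.append(np.eye(2))
    Y_list.append(Xj @ beta_j + rng.normal(size=10))

gamma_gls = gls_gamma(X_list, Z_list, Y_list, Omega=np.eye(2), sigma2=1.0)
gamma_ols = gls_gamma(X_list, Z_list, Y_list, Omega=np.zeros((2, 2)), sigma2=1.0)
print(gamma_gls, gamma_ols)
```

Setting `Omega` to zero makes $V$ spherical, so the same function reproduces the one-step OLS estimator.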
The BLUE and the BLUP
Go back to our original formulation of the SOM
Y = Xβ + u
β = Zγ + δ,
We have already dealt with how to estimate $\gamma$ optimally (BLUE).
Now $\beta$ is a random variable: it cannot be ‘estimated’, only predicted.
The best linear unbiased predictor (BLUP) of $\beta$ is the linear unbiased predictor that minimizes the mean squared prediction error.
Result (see de Leeuw and Meijer, p. 27): the BLUP is given by
$$\hat\beta_B = \Omega W^{-1}\hat\beta + (I - \Omega W^{-1})Z\hat\gamma$$
with $W \equiv \Omega + \sigma^2 (X'X)^{-1}$.
First: this is computed at the school level (for every school).
Striking? The BLUP is NOT the first-stage estimate $\hat\beta$, nor is it $Z\hat\gamma$.
It is a matrix-weighted average of $\hat\beta$ and $Z\hat\gamma$: a standard ‘shrinkage’ argument.
In practice, unknown magnitudes are replaced by estimates.
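The shrinkage formula can be sketched for a single school as follows. The function name `blup` and all numerical inputs are hypothetical. $\Omega$, $\sigma^2$ and $\hat\gamma$ are taken as given here, whereas in practice they would be estimates.

```python
import numpy as np

def blup(beta_hat_j, Xj, Zj, gamma_hat, Omega, sigma2):
    """BLUP of beta_j: matrix-weighted average of beta_hat_j and Z_j gamma_hat."""
    W = Omega + sigma2 * np.linalg.inv(Xj.T @ Xj)   # W_j = Omega + sigma^2 (X_j'X_j)^{-1}
    weight = Omega @ np.linalg.inv(W)               # Omega W_j^{-1}
    prior_mean = Zj @ gamma_hat                     # prediction from the between model
    identity = np.eye(len(beta_hat_j))
    return weight @ beta_hat_j + (identity - weight) @ prior_mean

# Hypothetical inputs for one school
Xj = np.column_stack([np.ones(5), np.arange(5.0)])
Zj = np.eye(2)
gamma_hat = np.array([1.0, 2.0])
beta_hat_j = np.array([3.0, 0.0])
Omega = np.diag([1.0, 0.5])

shrunk = blup(beta_hat_j, Xj, Zj, gamma_hat, Omega, sigma2=4.0)
print(shrunk)
```

Two limiting cases help with intuition: with no sampling noise (`sigma2=0`) the BLUP is the first-stage estimate, and with no parameter heterogeneity (`Omega` equal to zero) it collapses to $Z_j\hat\gamma$.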
Variance components
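Note that the residual variance of the within model, estimated first by OLS, gives a consistent estimate of $\sigma^2$.
Also note that, from the between model,
$$V(\beta) = \Omega$$
so a consistent estimate can be based on
$$\hat\Omega = \frac{1}{m} \sum_{j=1}^{m} (\hat\beta_j - Z_j\hat\gamma)(\hat\beta_j - Z_j\hat\gamma)'$$
Because it uses $\hat\beta_j$ rather than $\beta_j$, this moment also picks up first-stage sampling variance, so it is accurate only when within-school samples are large; otherwise the sampling variance has to be netted out.
This is a big topic in the literature. See Searle, Casella and McCulloch (2006).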
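The moment estimator of $\Omega$ is easy to code. In the sketch below the function name `omega_hat` is hypothetical, and the optional `sampling_covs` argument (subtracting the average first-stage sampling covariance) is an added assumption rather than something stated on the slide.

```python
import numpy as np

def omega_hat(beta_hats, Z_list, gamma_hat, sampling_covs=None):
    """Average outer product of second-stage residuals t_j = beta_hat_j - Z_j gamma_hat."""
    m = len(beta_hats)
    p = len(beta_hats[0])
    total = np.zeros((p, p))
    for b, Zj in zip(beta_hats, Z_list):
        t = np.asarray(b, dtype=float) - Zj @ gamma_hat
        total += np.outer(t, t)
    estimate = total / m
    if sampling_covs is not None:
        # Assumed correction: net out the average sampling covariance of beta_hat_j
        estimate = estimate - np.mean(sampling_covs, axis=0)
    return estimate

# Tiny hypothetical example: four schools, no covariates (Z_j = I), gamma_hat = 0
beta_hats = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
             np.array([0.0, 2.0]), np.array([0.0, -2.0])]
Z_list = [np.eye(2)] * 4
Omega_est = omega_hat(beta_hats, Z_list, gamma_hat=np.zeros(2))
print(Omega_est)
```

Note that the corrected version is not guaranteed to be positive semidefinite in small samples.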
Full Information ML
Recall $Y = U\gamma + X\delta + u = U\gamma + \eta$, $\eta \equiv X\delta + u$, with
$$V \equiv V(\eta) = X\Omega X' + \sigma^2 I$$
Then, under the assumption that $u$ and $\delta$ are jointly normal, minus twice the log-likelihood of the SOM is, up to an additive constant,
$$L(\gamma, \Omega, \sigma^2) = \ln|V| + (y - U\gamma)'V^{-1}(y - U\gamma)$$
Jointly minimizing $L$ (that is, maximizing the likelihood) leads to the full information maximum likelihood estimator of all parameters.
The usual good properties of MLE hold (consistency, asymptotic normality, CUAN efficiency, invariance).
Some usual concerns apply as well.
Beyond normality, $L$ is an interesting loss function per se (deviance), in a ‘quasi-likelihood’ context. It penalizes lack of fit in terms of mean and variance simultaneously.
Note that, given $V$, it leads to GLS. This suggests iterative estimation (IGLS).
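The deviance can be evaluated school by school when schools are independent, since then $\ln|V|$ and the quadratic form both split into sums over the blocks $V_j = X_j\Omega X_j' + \sigma^2 I$. The sketch below assumes that structure. The function name and the toy inputs are illustrative, and no optimizer is included.

```python
import numpy as np

def deviance(gamma, Omega, sigma2, X_list, Z_list, Y_list):
    """ln|V| + (y - U gamma)' V^{-1} (y - U gamma), summed over independent schools."""
    total = 0.0
    for Xj, Zj, Yj in zip(X_list, Z_list, Y_list):
        Vj = Xj @ Omega @ Xj.T + sigma2 * np.eye(len(Yj))
        resid = Yj - Xj @ Zj @ gamma                 # y_j - U_j gamma
        _, logdet = np.linalg.slogdet(Vj)            # numerically stable log-determinant
        total += logdet + resid @ np.linalg.solve(Vj, resid)
    return total

# Toy example: one school, two students, intercept only
X_list = [np.array([[1.0], [1.0]])]
Z_list = [np.array([[1.0]])]
Y_list = [np.array([1.0, -1.0])]
value = deviance(np.array([0.0]), np.array([[1.0]]), 1.0, X_list, Z_list, Y_list)
print(value)
```

In the toy example $V = [[2, 1], [1, 2]]$, so the value is $\ln 3 + 2$. Passing this function to a numerical minimizer over $(\gamma, \Omega, \sigma^2)$ would be one way to approach FIML.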
Residual ML
Bayesian approaches
A twilight zone for applied economists...
Parameters are now treated as random variables (we have mentioned this before...). $\theta = (\gamma, \Omega, \sigma^2)$. Assume ‘knowledge’ means a completely specified prior distribution for $\theta$, called $\pi(\theta)$.
The goal is to update this knowledge by looking at the data. This is done through Bayes’ Theorem:
$$p(\theta|y) = \frac{f(y|\theta)\pi(\theta)}{f(y)} = C f(y|\theta)\pi(\theta)$$
where $C \equiv 1/f(y)$ does not depend on $\theta$.
$p(\theta|y)$ is the posterior distribution, which arises from combining the information already contained in $\pi(\theta)$ (known) with that learned from the data through $f(y|\theta)$.
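To make the mechanics concrete, here is a deliberately simple grid approximation for a scalar parameter. It is not the SOM itself: the data, the $N(\theta, 1)$ likelihood and the $N(0, 1)$ prior are all assumptions chosen so that the answer can be checked against the known conjugate result.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])                      # hypothetical data, y_i ~ N(theta, 1)
theta = np.linspace(-10.0, 10.0, 20001)            # grid of parameter values
step = theta[1] - theta[0]

log_prior = -0.5 * theta ** 2                      # pi(theta): N(0, 1), up to a constant
log_lik = -0.5 * ((y[:, None] - theta[None, :]) ** 2).sum(axis=0)   # f(y | theta), up to a constant
log_post = log_prior + log_lik
unnorm = np.exp(log_post - log_post.max())         # subtract the max for numerical stability
posterior = unnorm / (unnorm.sum() * step)         # normalization plays the role of C = 1 / f(y)

post_mean = (theta * posterior).sum() * step
post_var = ((theta - post_mean) ** 2 * posterior).sum() * step
print(post_mean, post_var)
```

With these inputs the conjugate normal result is a posterior mean of 1.5 and variance of 0.25: the prior mean 0 and the sample mean 2 are averaged with precision weights, which is the same shrinkage logic as in the BLUP.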
‘Parameters’ arise through a conditioning argument.
Once we get the posterior we can do anything we want
(compute means, variance, etc.).
Implementation is a whole new world. See de Leeuw and
Meijer (2008) for an EM algorithm for the SOM model.
Robust covariance estimation
Recall that the two-step estimator of $\gamma$ is
$$\hat\gamma = (Z'Z)^{-1}Z'\hat\beta = AZ'\hat\beta, \quad A \equiv (Z'Z)^{-1}$$
Writing $Z_j$ for the block of rows of $Z$ that corresponds to school $j$, it can be written as
$$\hat\gamma = A \sum_{j=1}^{m} Z_j'\hat\beta_j$$
Then, with independence across schools, its variance is
$$V(\hat\gamma) = A \left[ \sum_{j=1}^{m} Z_j' V(\hat\beta_j) Z_j \right] A'$$
A cluster-robust covariance matrix is
$$\hat V(\hat\gamma) = A \left[ \sum_{j=1}^{m} Z_j' \hat t_j \hat t_j' Z_j \right] A'$$
where
$$\hat t_j \equiv \hat\beta_j - Z_j\hat\gamma$$
This is a consistent estimator (large $m$) that takes cluster correlation into account.
By similar reasoning, a cluster-robust covariance estimator can be produced for the one-step estimator.
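A sketch of the sandwich formula for the two-step estimator follows. It assumes that $Z_j$ is the block of $Z$ for school $j$ (so that $\beta_j = Z_j\gamma + \delta_j$), applies no small-sample correction, and uses a hypothetical function name and toy data.

```python
import numpy as np

def two_step_robust(beta_hats, Z_list):
    """Two-step gamma_hat and its cluster-robust covariance A [sum Z_j' t_j t_j' Z_j] A'."""
    q = Z_list[0].shape[1]
    ZtZ = np.zeros((q, q))
    Ztb = np.zeros(q)
    for b, Zj in zip(beta_hats, Z_list):
        ZtZ += Zj.T @ Zj
        Ztb += Zj.T @ b
    A = np.linalg.inv(ZtZ)                       # A = (Z'Z)^{-1}
    gamma_hat = A @ Ztb
    meat = np.zeros((q, q))
    for b, Zj in zip(beta_hats, Z_list):
        t = b - Zj @ gamma_hat                   # school-level residual t_j
        score = Zj.T @ t
        meat += np.outer(score, score)           # Z_j' t_j t_j' Z_j
    return gamma_hat, A @ meat @ A.T

# Toy example: scalar coefficient, four schools, Z_j = 1
beta_hats = [np.array([1.0]), np.array([2.0]), np.array([3.0]), np.array([6.0])]
Z_list = [np.array([[1.0]])] * 4
gamma_hat, V_hat = two_step_robust(beta_hats, Z_list)
print(gamma_hat, V_hat)
```

In the toy example $\hat\gamma = 3$, the residuals are $(-2, -1, 0, 3)$, and the robust variance is $14/16 = 0.875$.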
Endogeneities and the Hausman-Taylor estimator
Consider the following model:
$$y_{it} = x_{it}'\beta + z_i'\gamma + \mu_i + v_{it}$$
$z_i$ are explanatory variables that do not change over time (e.g. the size of a country).
Suppose that $x_{it}$ and $z_i$ are correlated with $\mu_i$.
Fixed effects (FE) estimates $\beta$ consistently.
But it cannot estimate $\gamma$ (why?).
We will look for an instrumental variables estimator that, besides estimating $\gamma$, is potentially more efficient than FE.
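Let $u$ be an $NT$ vector with typical element $u_{it} = \mu_i + v_{it}$, and $\Omega \equiv V(u)$ (in this section $\Omega$ and $\delta$ no longer denote the SOM objects).
The original model can be rewritten in matrix form as
$$y = \mathbf{x}\delta + u$$
with $\mathbf{x} = [x \; z]$ and $\delta' = [\beta' \; \gamma']$.
Premultiply by $\Omega^{-1/2}$:
$$\Omega^{-1/2}y = \Omega^{-1/2}\mathbf{x}\delta + \Omega^{-1/2}u$$
$$y^* = \mathbf{x}^*\delta + u^*$$
with $y^* \equiv \Omega^{-1/2}y$, $\mathbf{x}^* \equiv \Omega^{-1/2}\mathbf{x}$, and $u^* \equiv \Omega^{-1/2}u$.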
Note that
$$E(u^*u^{*\prime}) = E(\Omega^{-1/2}uu'\Omega^{-1/2\prime}) = \Omega^{-1/2}\Omega\,\Omega^{-1/2\prime} = \Omega^{-1/2}\Omega^{1/2}\Omega^{1/2\prime}\Omega^{-1/2\prime} = I$$
Hence, if all explanatory variables, $x$ and $z$, were exogenous, the efficient estimator would be OLS on the transformed model, which is the GLS estimator (or random effects estimator).
• This deals with the presence of random effects.
• But it does not solve the endogeneity problem.
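The whitening step can be checked numerically for one individual. The sketch assumes the simplest random effects structure, $V(u_i) = \sigma_\mu^2 \iota\iota' + \sigma_v^2 I$ with serially uncorrelated $v_{it}$, and hypothetical variance components. The symmetric square root is built from an eigendecomposition.

```python
import numpy as np

T = 4                                                # periods per individual (illustrative)
sigma_mu2, sigma_v2 = 2.0, 1.0                       # hypothetical variance components
Omega_i = sigma_mu2 * np.ones((T, T)) + sigma_v2 * np.eye(T)   # V(u_i) for one individual

eigval, eigvec = np.linalg.eigh(Omega_i)             # Omega_i is symmetric positive definite
Omega_inv_half = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T   # symmetric Omega^{-1/2}

whitened = Omega_inv_half @ Omega_i @ Omega_inv_half.T
print(np.round(whitened, 10))
```

The product is the identity matrix, which is why OLS on the transformed model coincides with GLS.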
Hausman and Taylor (1981): suppose the explanatory variables can be partitioned as follows:
$x \equiv [x_1 \; x_2]$, where $x_1$ are $k_1$ explanatory variables uncorrelated with $\mu$ (and $x_2$ are the remaining ones, possibly correlated with $\mu$).
$z \equiv [z_1 \; z_2]$, where $z_2$ are $g_2$ explanatory variables correlated with $\mu$ and $z_1$ are $g_1$ variables uncorrelated with $\mu$.
And define:
$\tilde x$: all the $x$ variables expressed as deviations from their individual means over time.
$\bar x_1$: the individual means over time of $x_1$.
In this case the valid instruments can be:
$\tilde x$: deviations from the individual mean are, for all variables and by construction, orthogonal to anything that is time-invariant.
$\bar x_1$: the means of the exogenous variables are valid instruments.
$z_1$: the exogenous time-invariant variables are valid instruments too.
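The orthogonality of within deviations to anything time-invariant is an algebraic fact that is easy to verify. The sketch below uses a hypothetical unbalanced panel with made-up data, and the helper name `within_and_means` is not from the slides.

```python
import numpy as np

def within_and_means(x, ids):
    """Split x into deviations from individual means (x_tilde) and the means themselves (x_bar)."""
    x = np.asarray(x, dtype=float)
    _, idx = np.unique(ids, return_inverse=True)     # map individual ids to 0..N-1
    counts = np.bincount(idx)
    sums = np.zeros((counts.size, x.shape[1]))
    np.add.at(sums, idx, x)                          # per-individual column sums
    x_bar = (sums / counts[:, None])[idx]            # individual means, repeated over t
    return x - x_bar, x_bar

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])          # unbalanced panel, three individuals
rng = np.random.default_rng(3)
x = rng.normal(size=(9, 2))
mu = np.array([0.7, -1.2, 2.5])[ids]                 # hypothetical time-invariant effect

x_tilde, x_bar = within_and_means(x, ids)
cross = x_tilde.T @ mu                               # sample cross-product with mu
print(cross)
```

The cross-product is zero up to rounding for any time-invariant vector, which is why $\tilde x$ is a valid instrument even for the endogenous $x_2$.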
Logic:
The problem is the correlation between the variables and $\mu_i$ (which does not vary over time).
Each variable $x_{it}$ really has two sources of variation: its mean over time and its deviations from that mean ($x_{it} \equiv \bar x_i + (x_{it} - \bar x_i)$).
Time-invariant variables have a single source of variability.
Hausman-Taylor: estimate
$$\Omega^{-1/2}y = \Omega^{-1/2}\mathbf{x}\delta + \Omega^{-1/2}u$$
using $\tilde x$, $\bar x_1$ and $z_1$ as instruments.
• Problem: $\Omega$ is not observable!
• Solution: note that
$$d_i \equiv \bar y_i - \bar x_i'\beta = z_i'\gamma + \mu_i + \bar v_i$$
If $\beta$ were observable, the problem would reduce to estimating this auxiliary model with $d_i$ as the dependent variable and $z_1$ and $\bar x_1$ as valid instruments.
Since $\beta$ is not observable:
1. Estimate $\beta$ with the FE estimator (it is consistent!).
2. Define the residuals $\hat d_{it} \equiv y_{it} - x_{it}'\hat\beta_{FE}$ and their individual means $\hat d_i$. Estimate the auxiliary model by IV, replacing $d_i$ by $\hat d_i$.
3. From here it is possible to estimate the components of $\Omega$ (we have already seen how!) and therefore to obtain $\hat\Omega$.
4. Apply IV to the original model, replacing $\Omega$ by $\hat\Omega$.
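Steps 1 and 2 can be sketched on simulated data. Everything about the design is an assumption made for illustration: one exogenous time-varying regressor $x_1$, a constant as $z_1$, one endogenous time-invariant regressor $z_2$, and hypothetical parameter values. Steps 3 and 4 (estimating $\Omega$ and the final IV-GLS step) are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 2000, 5                                         # individuals, periods (illustrative)
beta, gamma1, gamma2 = 1.0, 0.5, 2.0                   # hypothetical true values
a = rng.normal(size=N)                                 # common driver of x1 and z2
mu = rng.normal(size=N)                                # individual effect
z2 = a + mu + rng.normal(size=N)                       # time-invariant, correlated with mu
x1 = a[:, None] + rng.normal(size=(N, T))              # time-varying, uncorrelated with mu
y = beta * x1 + gamma1 + gamma2 * z2[:, None] + mu[:, None] + rng.normal(size=(N, T))

# Step 1: fixed effects (within) estimate of beta
x_bar, y_bar = x1.mean(axis=1), y.mean(axis=1)
x_til, y_til = x1 - x_bar[:, None], y - y_bar[:, None]
beta_fe = (x_til * y_til).sum() / (x_til ** 2).sum()

# Step 2: auxiliary model d_i = gamma1 + gamma2 * z2_i + mu_i + v_bar_i, estimated by IV (2SLS)
d_hat = y_bar - beta_fe * x_bar
regressors = np.column_stack([np.ones(N), z2])         # z1 is just a constant here
instruments = np.column_stack([np.ones(N), x_bar])     # z1 and the mean of x1
first_stage = np.linalg.lstsq(instruments, regressors, rcond=None)[0]
fitted = instruments @ first_stage
gamma_iv = np.linalg.lstsq(fitted, d_hat, rcond=None)[0]

# For comparison: OLS on the auxiliary model ignores the endogeneity of z2
gamma_ols = np.linalg.lstsq(regressors, d_hat, rcond=None)[0]
print(beta_fe, gamma_iv, gamma_ols)
```

In this design the OLS coefficient on $z_2$ is biased upward because $z_2$ is correlated with $\mu_i$, while the IV estimate that uses $\bar x_1$ as an instrument is not.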
Identification issues:
If $k_1 < g_2$, $\delta$ is not identified and the HT estimator reduces to the fixed effects estimator (why?).
If $k_1 = g_2$, $\delta$ is identified and the HT estimator reduces to FE (why?).
If $k_1 > g_2$, $\delta$ is identified and the HT estimator is more efficient than FE (why?).
• Note that identification of $\delta$ requires at least as many exogenous variables that vary over $i$ and $t$ as endogenous variables that vary only over $i$.
• The efficiency gain appears only in the last case.
Summary
Under exogeneity, OLS is always consistent and unbiased, but never efficient.
Standard variance estimators are never consistent.
GLS strategies recover the BLUE, but require estimating variance components first.
BLUP prediction is a relevant issue in many applications.
Full MLE estimates regression and variance parameters jointly.
Cluster-robust covariance matrices can be easily obtained. They do not (trivially) lead to variance components (recall the earnings dynamics example).
Empirical example
Raudenbush and Bryk (1986) revisited:
High School and Beyond survey.
10,231 students from 82 Catholic schools and 94 public schools.
$Y_{ij}$: math achievement of student $i$ in school $j$.
$X_{ij}$: socio-economic status (SES).
$Z_j$: 1 iff Catholic.
A first random coefficients model
Intercept variability is mostly school variability.
Slope variability is mostly sampling variability (though about a third is still school variability).
A slopes as outcomes model (catholic dummy explaining slopes):
Estimates in equation form
Careful: we have used Xijt already centered.
HW: how much of parameter variability is due to: 1) school
variation? 2) the fact that schools are catholic?
Multilevel models in economics
The verdict
When the interest lies in the fixed part of the model, OLS with robust standard errors (the econometrics standard) works.
Efficiency arguments are rather unconvincing in econometrics and require quantitative exploration and argumentation, especially in small samples (it is not enough to claim ‘it is inefficient’).
SOM provides a useful, easy-to-communicate model to deal with a non-linearity (interaction) that induces heterogeneous effects. Careful: the argument is usually framed upside down.
SOM or MLM are relevant when the interest lies in error
components, variance estimation, ANOVA type of reasoning.
Prediction.
SOM or MLM can be seen as a ‘canned’ way to deal with
heterogeneities and non-spherical errors (cluster correlation or
heteroskedasticity).
Reliability of cluster robust variances is still an empirically
open question.
All ‘Mostly Harmless’ concerns apply to the literature. Be
very careful with causal interpretations.
References and readings
Some useful readings for econometricians
Most basic multilevel books are too simple and have way too
many examples and little theory.
de Leeuw and Meijer (2008). Very well written with notation
similar to that in the econometrics literature. Provides some
useful links.
Raudenbush and Bryk (2001) is a very useful, well-balanced book.
The second edition of Hsiao’s text (2003) has useful material, though jargon like ‘hierarchical’, ‘multilevel’ or ‘SOM’ never shows up!
Croissant and Millo (2008) compare econometrics panels and
mixed linear approaches.
Software
HLM and MLwiN
R: plm (panel), lme4 and nlme.
Stata: xtmixed.