Multilevel Models for Economists II: Estimation, Prediction and Inference
Walter Sosa-Escudero
Universidad de San Andrés
August 2011

Outline: Motivation; Inferential strategies; An empirical example; Final remarks

A Warm Up Example

Raudenbush and Bryk (1986). Seminal paper.
• $Y_{ij}$: math achievement of student $i$ in school $j$.
• $X_{ij}$: socio-economic status (SES).
• $Z_j$: 1 if the school is Catholic.

Questions:
• Are SES effects heterogeneous?
• If so, how much of this heterogeneity is due to school variation?
• How much is due to whether schools are Catholic or not?

Note the strong interest in variance components.

Consider a simple random coefficients model.
The 'within model': $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$
The 'between model': $\beta_{sj} = \mu_{s0} + u_{sj}$, $s = 0, 1$
In a two-stage approach, $\beta_{sj}$ can be estimated by OLS within every school. Let $\hat\beta_{sj}$ denote these estimates.

Variance components

Let $e_{sj} = \hat\beta_{sj} - \beta_{sj}$ (the OLS estimation errors in the first stage). Then, replacing above,
$\hat\beta_{sj} = \mu_{s0} + u_{sj} + e_{sj}$
Then, if $u_{sj}$ and $e_{sj}$ are uncorrelated,
$V(\hat\beta_{sj}) = V(u_{sj}) + V(e_{sj})$
The variance of the first-stage estimates is composed of parameter variance and sampling variance. The first can be estimated as a variance component.

The question is how much of the observed dispersion in the estimates is due to school-specific factors versus mere sampling variability.
Slopes as outcomes: add explanatory variables (Catholic) to the between regression. This gives a further variance decomposition: how much of the school variability can be explained by the school being Catholic or not.
Note that the within model (the object of interest in econometrics) is treated here as something like an input.
We are leaving aside the endogeneity of SES (very important).

The Mixed Linear Model

$Y = X\beta + Z\delta + \epsilon$, $\delta \sim (0, \Omega)$, $\epsilon \sim (0, \Sigma)$
Careful with jargon:
• $\beta$ is the fixed effect.
• $\delta$ is the random effect.
• $X$ and $Z$ are observable.

Statistical inference

• Estimate $\beta$: unbiased, consistent, efficient, ...
• Estimate its sampling variance (consistently).
• Estimate the relevant components of $\Omega$ and $\Sigma$.
• Predict $\delta$.
• Evaluate relevant hypotheses. Confidence intervals, etc.

Some relevant comments

• In econometrics most of the interest lies in estimating $\beta$ and its sampling variance consistently. This implies dealing with the 'non-spherical' errors induced by leaving $Z\delta$ in the error term (correlation, heteroskedasticity). Exogeneity of $X$ with respect to $Z\delta$ is also a major issue. The usual approach is simple 'semi- or nonparametric' estimation (closest to OLS) and then 'fixing the standard errors'. Minimal probabilistic structure.
• The MLM literature is mostly interested in efficient estimation and prediction, and in the variance components. It favors full-model (MLE or Bayesian) estimation under normality.
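The variance decomposition in the warm-up example, $V(\hat\beta_{sj}) = V(u_{sj}) + V(e_{sj})$, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, the function name `decompose_slope_variance` and all numerical values are hypothetical, and the parameter variance is obtained by a simple method-of-moments subtraction rather than by any estimator used in the original paper.

```python
import numpy as np

def decompose_slope_variance(y, x, school):
    """First-stage OLS of y on (1, x) within each school, then split the
    variance of the estimated slopes into sampling and parameter parts."""
    slopes, sampling_vars = [], []
    for j in np.unique(school):
        rows = school == j
        X = np.column_stack([np.ones(rows.sum()), x[rows]])
        b = np.linalg.lstsq(X, y[rows], rcond=None)[0]
        resid = y[rows] - X @ b
        s2 = resid @ resid / (rows.sum() - 2)          # residual variance
        sampling_vars.append(s2 * np.linalg.inv(X.T @ X)[1, 1])
        slopes.append(b[1])
    total = np.var(slopes, ddof=1)                     # estimate of V(beta_hat)
    sampling = np.mean(sampling_vars)                  # estimate of V(e)
    parameter = max(total - sampling, 0.0)             # estimate of V(u)
    return total, sampling, parameter

# Simulated data (hypothetical values): 200 schools, 50 students each.
rng = np.random.default_rng(0)
m, n = 200, 50
school = np.repeat(np.arange(m), n)
x = rng.normal(size=m * n)
beta0 = 1.0 + rng.normal(scale=1.0, size=m)           # random intercepts
beta1 = 2.0 + rng.normal(scale=0.5, size=m)           # random slopes, V(u) = 0.25
y = beta0[school] + beta1[school] * x + rng.normal(size=m * n)

total, sampling, parameter = decompose_slope_variance(y, x, school)
print(f"total={total:.3f} sampling={sampling:.3f} parameter={parameter:.3f}")
```

With large within-school samples the sampling part is small and most of the dispersion in the estimated slopes is parameter variance; shrinking `n` reverses that balance.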
Do economists care about variance components?

Sometimes. Rural income in El Salvador: Arias, Marchionni and Sosa-Escudero (Journal of Income Distribution, 2011).
Goals: 1) Is persistent poverty due to time-invariant individual characteristics or to the persistence of 'bad luck'? 2) How much do these two dimensions explain income inequality?
$y_{it} = x_{it}'\beta + u_{it}$
$u_{it} \equiv \mu_i + v_{it}$, $\mu_i \sim (0, \sigma_\mu^2)$
$v_{it} = \psi v_{i,t-1} + e_{it}$, $e_{it} \sim (0, \sigma_e^2)$

Notation
$\sigma_u^2 = \sigma_\mu^2 + \sigma_v^2$, $\sigma_v^2 = \sigma_e^2/(1 - \psi^2)$, $\lambda \equiv \sigma_\mu^2/\sigma_u^2$
It is easy to show that
$\rho_s \equiv \mathrm{Cor}(u_{it}, u_{i,t-s}) = \lambda + (1 - \lambda)\psi^s$
Persistence is a weighted average of both sources of persistence.
Technically: an error components model with random effects and first-order serial correlation. The goal is to decompose persistence.

• Volatility is the major source of variation in incomes across rural families in El Salvador, much more so than in developed countries.
• In any given year, most of the variation in incomes is not accounted for by education or observed family characteristics but is instead due to transitory shocks.
• As for the persistence of incomes over a lifetime, most (around two-thirds) of the persistence in low- and high-income states is due to idiosyncratic differences between families related to endowments, including unobserved income determinants (unobserved heterogeneity).
• The persistence of bad shocks is of second order, given that the correlation of bad shocks is relatively low (0.24) in these data. So over a lifetime transitory components average out.
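The persistence formula $\rho_s = \lambda + (1-\lambda)\psi^s$ can be checked numerically against the covariance matrix implied by the error components model with AR(1) shocks. The sketch below is illustrative only: the function names are hypothetical, the variance values are made up, and using 0.24 for $\psi$ simply echoes the shock correlation quoted above (reading that figure as $\psi$ is an assumption).

```python
import numpy as np

def rho(s, s2_mu, s2_e, psi):
    """Closed form: Cor(u_it, u_i,t-s) = lambda + (1 - lambda) * psi**s."""
    s2_v = s2_e / (1.0 - psi**2)          # stationary variance of v_it
    lam = s2_mu / (s2_mu + s2_v)          # share of permanent heterogeneity
    return lam + (1.0 - lam) * psi**s

def error_cov(T, s2_mu, s2_e, psi):
    """T x T covariance of (u_i1, ..., u_iT) with u_it = mu_i + v_it."""
    s2_v = s2_e / (1.0 - psi**2)
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return s2_mu + s2_v * psi**lags       # sigma_mu^2 in every cell + AR(1) part

# Hypothetical variance components; psi = 0.24 is an assumed reading of the slide.
s2_mu, s2_e, psi = 1.0, 0.5, 0.24
cov = error_cov(5, s2_mu, s2_e, psi)
corr_from_cov = cov[0, 1:] / cov[0, 0]
corr_from_formula = np.array([rho(s, s2_mu, s2_e, psi) for s in range(1, 5)])
print(corr_from_cov)
print(corr_from_formula)
```

As $s$ grows, $\psi^s$ vanishes and $\rho_s$ falls towards $\lambda$, which is the sense in which transitory components average out over a lifetime.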
Inferential Strategies

OLS and WLS based

We will focus on the slopes as outcomes model (SOM)
$Y = X\beta + u$
$\beta = Z\gamma + \delta$,
which can be expressed as
$Y = U\gamma + X\delta + u$
with $U \equiv XZ$.
We will assume $V(u) = \sigma^2 I$ and $V(\delta) = \Omega$.
We will assume $X$ is always exogenous with respect to $Z\gamma$, $\delta$ and $u$. We'll explore this later on.

Two-step method: first, $\hat\beta = (X'X)^{-1}X'Y$ for every school. Then $\hat\gamma = (Z'Z)^{-1}Z'\hat\beta$.
One-step method: $\hat\gamma = (U'U)^{-1}U'Y$

• Both are unbiased and consistent.
• Neither is efficient in the Gauss-Markov sense.
• Be very careful with variance estimation. In both cases the non-spherical error variances render standard variance estimators inconsistent.
• Note that the second step is a multivariate regression.

Recall
$Y = U\gamma + X\delta + u = U\gamma + \eta$, $\eta \equiv X\delta + u$
Then
$V \equiv V(\eta) = X\Omega X' + \sigma^2 I$
Then, a GLS (BLUE) estimator is
$\hat\gamma_{gls} = (U'V^{-1}U)^{-1}U'V^{-1}Y$
• FGLS replaces $\Omega$ and $\sigma^2$ by consistent estimates. This is a two-stage strategy: variance components first.
• FGLS is not BLUE.
• Careful: $\Omega \neq \sigma_\delta^2 I$ is more the rule than the exception (think of simple random effects), due to individual variation alone.
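The one-step OLS estimator and the GLS estimator of $\gamma$ can be sketched as follows. This is a minimal illustration, assuming that schools are independent (so $V$ is block diagonal and can be handled school by school) and that $\Omega$ and $\sigma^2$ are known; the function names `som_ols` and `som_gls`, the simulated data and all parameter values are hypothetical.

```python
import numpy as np

def som_ols(X_list, Z_list, y_list):
    """One-step OLS: regress stacked Y on stacked U = XZ."""
    U = np.vstack([X @ Z for X, Z in zip(X_list, Z_list)])
    y = np.concatenate(y_list)
    return np.linalg.lstsq(U, y, rcond=None)[0]

def som_gls(X_list, Z_list, y_list, Omega, sigma2):
    """GLS with V_j = X_j Omega X_j' + sigma2 I, accumulated over schools."""
    q = Z_list[0].shape[1]
    lhs, rhs = np.zeros((q, q)), np.zeros(q)
    for X, Z, y in zip(X_list, Z_list, y_list):
        U = X @ Z
        V = X @ Omega @ X.T + sigma2 * np.eye(len(y))
        Vinv_U = np.linalg.solve(V, U)     # V^{-1} U without forming V^{-1}
        lhs += U.T @ Vinv_U                # U' V^{-1} U
        rhs += Vinv_U.T @ y                # U' V^{-1} y (V is symmetric)
    return np.linalg.solve(lhs, rhs)

# Simulated SOM (hypothetical values): intercept and slope depend on a dummy.
rng = np.random.default_rng(1)
m, n = 40, 25
gamma_true = np.array([1.0, 0.5, 2.0, -1.0])
Omega = np.array([[1.0, 0.3], [0.3, 0.5]])
sigma2 = 1.0
X_list, Z_list, y_list = [], [], []
for j in range(m):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    catholic = float(j % 2)
    Z = np.kron(np.eye(2), np.array([[1.0, catholic]]))   # 2 x 4 block
    delta = rng.multivariate_normal(np.zeros(2), Omega)
    y = X @ (Z @ gamma_true + delta) + rng.normal(scale=np.sqrt(sigma2), size=n)
    X_list.append(X); Z_list.append(Z); y_list.append(y)

gamma_ols = som_ols(X_list, Z_list, y_list)
gamma_gls = som_gls(X_list, Z_list, y_list, Omega, sigma2)
gamma_gls_spherical = som_gls(X_list, Z_list, y_list, np.zeros((2, 2)), sigma2)
print(gamma_ols, gamma_gls)
```

Setting $\Omega = 0$ makes the errors spherical, and GLS then collapses to OLS; in an FGLS implementation `Omega` and `sigma2` would be replaced by first-stage estimates.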
The BLUE and the BLUP

Go back to our original formulation of the SOM
$Y = X\beta + u$
$\beta = Z\gamma + \delta$
We have already dealt with how to estimate $\gamma$ optimally (BLUE).
Now $\beta$ is a random variable: it cannot be 'estimated', only predicted.
The best linear unbiased predictor (BLUP) of $\beta$ is the linear unbiased predictor that minimizes the mean squared prediction error.

Result (see de Leeuw and Meijer, p. 27): the BLUP is given by
$\hat\beta_B = \Omega W^{-1}\hat\beta + (I - \Omega W^{-1})Z\hat\gamma$
with $W \equiv \Omega + \sigma^2 (X'X)^{-1}$
• First: this is computed at the school level (for every school).
• Striking? The BLUP is NOT the first-stage estimate $\hat\beta$, nor is it $Z\hat\gamma$.
• It is a (matrix-)weighted average of $\hat\beta$ and $Z\hat\gamma$.
• It is a standard 'shrinkage' argument.
• In practice, unknown magnitudes are replaced by estimates.

Variance components

Note that the residual variance of the within model (estimated first by OLS) gives a consistent estimate of $\sigma^2$.
Also note that, from the between model, $V(\beta - Z\gamma) = V(\delta) = \Omega$, so an estimate can be based on
$\hat\Omega = \frac{1}{m}\sum_{j=1}^{m} (\hat\beta_j - Z_j\hat\gamma)(\hat\beta_j - Z_j\hat\gamma)'$
which is consistent when the within-school samples are large enough for the sampling error in $\hat\beta_j$ to be negligible.
This is a big topic in the literature. See Searle, Casella and McCulloch (2006).
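The shrinkage formula and the moment estimate of $\Omega$ can be written down almost literally. The following is a sketch rather than a reference implementation: the function names `blup` and `omega_hat` and the toy numbers are hypothetical, and $\Omega$ and $\sigma^2$ are treated as known in the BLUP.

```python
import numpy as np

def blup(beta_hat, Z, gamma_hat, Omega, sigma2, XtX):
    """School-level BLUP: Omega W^{-1} beta_hat + (I - Omega W^{-1}) Z gamma_hat,
    with W = Omega + sigma2 (X'X)^{-1}."""
    W = Omega + sigma2 * np.linalg.inv(XtX)
    weight = Omega @ np.linalg.inv(W)
    identity = np.eye(len(beta_hat))
    return weight @ beta_hat + (identity - weight) @ (Z @ gamma_hat)

def omega_hat(beta_hats, Z_list, gamma_hat):
    """Average outer product of the second-stage residuals beta_hat_j - Z_j gamma_hat."""
    resid = np.array([b - Z @ gamma_hat for b, Z in zip(beta_hats, Z_list)])
    return resid.T @ resid / len(beta_hats)

# Toy scalar example: equal parameter and sampling variance gives weight 1/2.
beta_hat = np.array([4.0])
Z = np.array([[1.0]])
gamma_hat = np.array([2.0])
shrunk = blup(beta_hat, Z, gamma_hat, np.array([[1.0]]), 1.0, np.array([[1.0]]))

# A very informative school (huge X'X) is barely shrunk.
precise = blup(beta_hat, Z, gamma_hat, np.array([[1.0]]), 1.0, np.array([[1e8]]))

# With no parameter variance the prediction collapses to Z gamma_hat.
pooled = blup(beta_hat, Z, gamma_hat, np.array([[0.0]]), 1.0, np.array([[1.0]]))

Om = omega_hat([np.array([1.0, 0.0]), np.array([-1.0, 0.0])],
               [np.eye(2), np.eye(2)], np.zeros(2))
print(shrunk, precise, pooled)
```

The three calls illustrate the two limits of the weighted average: schools with little information are pulled towards $Z\hat\gamma$, schools with a lot of information keep their own $\hat\beta$.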
Full Information ML

Start from
$Y = U\gamma + X\delta + u = U\gamma + \eta$, $\eta \equiv X\delta + u$,
with $V \equiv V(\eta) = X\Omega X' + \sigma^2 I$.
Then, under the assumption that $u$ and $\delta$ are jointly normal, minus twice the log-likelihood of the SOM is, up to a constant,
$L(\gamma, \Omega, \sigma^2) = \ln|V| + (Y - U\gamma)'V^{-1}(Y - U\gamma)$
Joint minimization of $L$ (that is, maximization of the likelihood) leads to the full information maximum likelihood estimator of all parameters.

• The usual good properties of MLE hold (consistency, asymptotic normality, CUAN efficiency, invariance).
• Some usual concerns apply as well.
• Beyond normality, $L$ is an interesting loss function per se (the deviance), in a 'quasi-likelihood' context. It penalizes lack of fit in terms of mean and variance simultaneously.
• Note that, given $V$, it leads to GLS. Iterative estimation (IGLS).

Residual ML

Bayesian approaches

• A twilight zone for applied economists...
• Parameters are now treated as random variables (we have mentioned this before...): $\theta = (\gamma, \Omega, \sigma^2)$.
• Assume 'knowledge' means a completely specified prior distribution for $\theta$, called $\pi(\theta)$.
• The goal is to update this knowledge by looking at the data. This is done through Bayes' Theorem:
$p(\theta|y) = \frac{f(y|\theta)\pi(\theta)}{f(y)} = C f(y|\theta)\pi(\theta)$, with $C \equiv 1/f(y)$
• $p(\theta|y)$ is the posterior distribution, which arises from mixing the information already contained in $\pi(\theta)$ (known) with that learned from the data through $f(y|\theta)$.
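The deviance on the Full Information ML slide, $\ln|V| + (Y - U\gamma)'V^{-1}(Y - U\gamma)$, is straightforward to evaluate. The sketch below assumes independent schools, so that $\ln|V|$ and the quadratic form both add up over school-level blocks; the function name `deviance` and the two-observation toy data are hypothetical.

```python
import numpy as np

def deviance(gamma, Omega, sigma2, X_list, Z_list, y_list):
    """Sum over schools of ln|V_j| + r_j' V_j^{-1} r_j,
    with V_j = X_j Omega X_j' + sigma2 I and r_j = y_j - X_j Z_j gamma."""
    total = 0.0
    for X, Z, y in zip(X_list, Z_list, y_list):
        V = X @ Omega @ X.T + sigma2 * np.eye(len(y))
        r = y - X @ Z @ gamma
        _, logdet = np.linalg.slogdet(V)   # stable log-determinant
        total += logdet + r @ np.linalg.solve(V, r)
    return total

# Toy data: one school, two students, intercept-only model.
X_list = [np.ones((2, 1))]
Z_list = [np.array([[1.0]])]
y_list = [np.array([1.0, 3.0])]

dev_at_mean = deviance(np.array([2.0]), np.array([[0.0]]), 1.0, X_list, Z_list, y_list)
dev_at_zero = deviance(np.array([0.0]), np.array([[0.0]]), 1.0, X_list, Z_list, y_list)
dev_random = deviance(np.array([2.0]), np.array([[1.0]]), 1.0, X_list, Z_list, y_list)
print(dev_at_mean, dev_at_zero, dev_random)
```

Passing this function to a numerical optimizer over $(\gamma, \Omega, \sigma^2)$ would be one way to obtain FIML estimates; for fixed variance parameters, the minimizer over $\gamma$ is the GLS estimator, which is the idea behind IGLS.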
• 'Parameters' arise through a conditioning argument.
• Once we get the posterior we can do anything we want (compute means, variances, etc.).
• Implementation is a whole new world. See de Leeuw and Meijer (2008) for an EM algorithm for the SOM.

Robust covariance estimation

Recall that the two-step estimator of $\gamma$ is
$\hat\gamma = (Z'Z)^{-1}Z'\hat\beta = AZ'\hat\beta$, with $A \equiv (Z'Z)^{-1}$
In terms of the school-level blocks, it can be written as
$\hat\gamma = A\sum_{j=1}^{m} Z_j'\hat\beta_j$
Then, with independent schools, its variance is
$V(\hat\gamma) = A\left[\sum_{j=1}^{m} Z_j' V(\hat\beta_j) Z_j\right]A'$

A cluster-robust covariance matrix is
$\hat V(\hat\gamma) = A\left[\sum_{j=1}^{m} Z_j' \hat t_j \hat t_j' Z_j\right]A'$
where $\hat t_j \equiv \hat\beta_j - Z_j\hat\gamma$.
This is a consistent estimator (large $m$) that takes cluster correlation into account.
Using similar reasoning, a cluster-robust covariance estimator can be produced for the one-step estimator.

Endogeneities and the Hausman-Taylor estimator

Consider the following model:
$y_{it} = x_{it}'\beta + z_i'\gamma + \mu_i + v_{it}$
• $z_i$ are explanatory variables that do not change over time (e.g., the size of a country).
• Suppose $x_{it}$ and $z_i$ are correlated with $\mu_i$.
• Fixed effects (FE) consistently estimates $\beta$, but it cannot estimate $\gamma$ (why?).
• We will look for an instrumental variables estimator that, besides estimating $\gamma$, is potentially more efficient than FE.
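One way to see why fixed effects recovers $\beta$ but not $\gamma$ is to apply the within transformation to a tiny panel. This is a deliberately noiseless toy sketch (the function name `within` and every number are hypothetical): the time-invariant regressor and the individual effect are both wiped out by demeaning, so nothing is left to identify $\gamma$.

```python
import numpy as np

def within(a, ids):
    """Deviations from unit-specific time means (the FE transformation)."""
    a = np.asarray(a, dtype=float)
    out = a.copy()
    for i in np.unique(ids):
        out[ids == i] -= a[ids == i].mean(axis=0)
    return out

# Toy panel: N = 3 units, T = 4 periods, no idiosyncratic noise.
N, T = 3, 4
ids = np.repeat(np.arange(N), T)
mu = np.repeat([0.5, -1.0, 2.0], T)                  # individual effects
z = np.repeat([1.0, 2.0, 5.0], T)                    # time-invariant regressor
x = mu + np.tile([0.0, 1.0, 2.0, 4.0], N)            # x is correlated with mu
beta, gamma = 2.0, 3.0
y = beta * x + gamma * z + mu

x_w, y_w, z_w = within(x, ids), within(y, ids), within(z, ids)
beta_fe = (x_w @ y_w) / (x_w @ x_w)                  # FE (within) estimator
print(beta_fe)   # recovers beta despite Cov(x, mu) != 0
print(z_w)       # identically zero: gamma cannot be estimated
```

The Hausman-Taylor idea is to bring back the between variation in a controlled way, using only the part of it that is assumed to be exogenous.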
Let $u$ be an $NT$ vector with typical element $u_{it} = \mu_i + v_{it}$, and let $\Omega \equiv V(u)$.
The original model can be rewritten in matrix terms as
$y = X\delta + u$
with $X = [x\ z]$ and $\delta' = [\beta'\ \gamma']$.
Premultiply by $\Omega^{-1/2}$:
$\Omega^{-1/2}y = \Omega^{-1/2}X\delta + \Omega^{-1/2}u$
$y^* = X^*\delta + u^*$
with $y^* \equiv \Omega^{-1/2}y$, $X^* \equiv \Omega^{-1/2}X$, and $u^* \equiv \Omega^{-1/2}u$.

Note that
$E(u^*u^{*\prime}) = E(\Omega^{-1/2}uu'\Omega^{-1/2\prime}) = \Omega^{-1/2}\Omega\,\Omega^{-1/2\prime} = \Omega^{-1/2}\Omega^{1/2}\Omega^{1/2\prime}\Omega^{-1/2\prime} = I$
Then, if all the explanatory variables, $x$ and $z$, were exogenous, the efficient estimator would be OLS on the transformed model, which is the GLS estimator (or random effects estimator).
• This deals with the presence of random effects.
• But it does not solve the endogeneity problem.

Hausman and Taylor (1981): suppose the explanatory variables can be partitioned as follows:
• $x \equiv [x_1\ x_2]$, where $x_1$ are $k_1$ explanatory variables uncorrelated with $\mu$ (the remaining variables, $x_2$, may be correlated with $\mu$).
• $z \equiv [z_1\ z_2]$, where $z_2$ are $g_2$ explanatory variables correlated with $\mu$ and $z_1$ are $g_1$ variables uncorrelated with $\mu$.
and define:
• $\tilde x$: all the $x$ variables as deviations from their time means.
• $\bar x_1$: the time means of $x_1$.

In this case the valid instruments can be:
• $\tilde x$: the deviations from the mean of all the variables are, by definition, orthogonal to anything that is time invariant.
• $\bar x_1$: the time means of the exogenous variables are valid instruments.
• $z_1$: the exogenous time-invariant variables are valid instruments as well.

Logic:
• The problem is the correlation between the variables and $\mu_i$ (which does not vary over time).
• Each variable $x_{it}$ actually has two sources of variation: its time mean and its deviations from that mean ($x_{it} \equiv \bar x_i + (x_{it} - \bar x_i)$).
• Time-invariant variables have a single source of variability.

Hausman-Taylor: estimate
$\Omega^{-1/2}y = \Omega^{-1/2}X\delta + \Omega^{-1/2}u$, where $X = [x\ z]$,
using $\tilde x$, $\bar x_1$ and $z_1$ as instruments.
• Problem: $\Omega$ is not observable!
• Solution: note that
$d_i \equiv \bar y_i - \bar x_i'\beta = z_i'\gamma + \mu_i + \bar v_i$
If $\beta$ were observable, the problem would consist of estimating this auxiliary model with $d_i$ as the dependent variable and $z_1$ and $\bar x_1$ as valid instruments.

Since $\beta$ is not observable:
1. Estimate $\beta$ with the FE estimator (it is consistent!). Define the residuals $\hat d_{it} \equiv y_{it} - x_{it}'\hat\beta_{FE}$, with unit means $\hat d_i$.
2. Estimate the auxiliary model by IV, replacing $d_i$ with $\hat d_i$.
3. From here it is possible to estimate the components of $\Omega$ (we have already seen this!) and hence obtain $\hat\Omega$.
4. Apply IV to the original model, replacing $\Omega$ with $\hat\Omega$.

Identification issues:
• If $k_1 < g_2$, $\delta$ is not identifiable and the HT estimator reduces to the fixed effects estimator (why?).
• If $k_1 = g_2$, $\delta$ is identifiable and the HT estimator reduces to the FE estimator (why?).
• If $k_1 > g_2$, $\delta$ is identifiable and the HT estimator is more efficient than FE (why?).
Notes:
• Identification of $\delta$ requires at least as many exogenous variables that vary over $i$ and $t$ as endogenous variables that vary only over $i$.
• The efficiency gain appears only in the last case.

Summary

• Under exogeneity, OLS is always consistent and unbiased. Never efficient. Standard variances are never consistent.
• GLS strategies recover BLUE, but require variance components estimation first.
• BLUP prediction is a relevant issue in many applications.
• Full MLE estimates regression and variance parameters jointly.
• Cluster-robust covariance matrices can be easily obtained, but they do not lead (trivially) to variance components (recall the earnings dynamics example).

Empirical example

Raudenbush and Bryk (1986) revisited: High School and Beyond survey.
10,231 students from 82 Catholic schools and 94 public schools.
• $Y_{ij}$: math achievement of student $i$ in school $j$.
• $X_{ij}$: socio-economic status (SES).
• $Z_j$: 1 iff the school is Catholic.

A first random coefficients model

• Intercept variability is mostly school variability.
• Slope variability is mostly sampling variability (school variability still accounts for about a third).
A slopes as outcomes model (Catholic dummy explaining slopes)

Estimates in equation form

Careful: we have used $X_{ij}$ already centered.
HW: how much of the parameter variability is due to 1) school variation? 2) the fact that schools are Catholic?

Multilevel models in economics

The verdict
• When the interest lies in the fixed part of the model, OLS with robust standard errors (the econometrics standard) works.
• Efficiency arguments are rather unconvincing in econometrics and require quantitative exploration and argumentation, especially in small samples (it is not enough to claim 'it is inefficient').
• SOM provides a useful, easy-to-communicate model to deal with a non-linearity (interaction) that induces heterogeneous effects. Careful: the argument is usually framed upside down.
• SOM or MLM are relevant when the interest lies in error components, variance estimation, ANOVA-type reasoning. Prediction.
• SOM or MLM can be seen as a 'canned' way to deal with heterogeneities and non-spherical errors (cluster correlation or heteroskedasticity).
• The reliability of cluster-robust variances is still an empirically open question.
• All 'Mostly Harmless' concerns apply to the literature.
• Be very careful with causal interpretations.

References and readings

Some useful readings for econometricians
• Most basic multilevel books are too simple, with far too many examples and little theory.
• de Leeuw and Meijer (2008). Very well written, with notation similar to that of the econometrics literature. Provides some useful links.
• Raudenbush and Bryk (2001) is a very useful, well-balanced book.
• The second edition of Hsiao's text (2003) has useful material, though jargon like 'hierarchical', 'multilevel' or 'SOM' never shows up!
• Croissant and Millo (2008) compare econometric panel and mixed linear approaches.

Software
• HLM and MLwiN.
• R: plm (panel), lme4 and nlme.
• Stata: xtmixed.