The Multiple Regression Model
ECON 371: Econometrics
September 22, 2021

Outline
Motivation for Multiple Regression
  The Model with k Independent Variables
Mechanics and Interpretation of OLS
  Interpreting the OLS Regression Line
The Expected Value of the OLS Estimators
  Inclusion of Irrelevant Variables
  Omitted Variable Bias
The Variance of the OLS Estimators
  Variances in Misspecified Models
  Estimating the Error Variance
Efficiency of OLS: The Gauss-Markov Theorem

The Model with k Independent Variables

- Consider an extension of the ln(wage) equation we used for simple regression:

  ln(wage) = β0 + β1 educ + β2 IQ + u    (1)

  where IQ is the IQ score (in the population it has a mean of 100 and a standard deviation of 15).
- We are primarily interested in β1, but β2 is of some interest, too.
- We took IQ out of the error term, explicitly including it in the equation. If IQ is a good proxy for intelligence, this may lead to a more persuasive estimate of the causal effect of schooling.
- As another example, what is the relationship between missing lectures and final exam scores?
- In a simple regression approach, we would relate final to missed (the number of lectures missed). But part of the error term would be a student's ability.
- Instead, how about

  final = β0 + β1 missed + β2 priGPA + u    (2)

  where priGPA is the grade point average at the beginning of the term.

Generally, we can write a model with two independent variables as

  y = β0 + β1 x1 + β2 x2 + u,    (3)

where β0 is the intercept, β1 measures the change in y with respect to x1, holding other factors fixed, and β2 measures the change in y with respect to x2, holding other factors fixed.

In the model with two independent variables, the key assumption about how u is related to x1 and x2 is

  E(u|x1, x2) = 0.    (4)

- For any values of x1 and x2 in the population, the average unobservable is equal to zero. (The value zero is not important, because we have an intercept, β0, in the equation.)
- In the wage equation, the assumption is E(u|educ, IQ) = 0. Now u no longer contains intelligence (we hope), and so this condition has a better chance of being true. In simple regression, we had to assume IQ and educ are unrelated to justify leaving IQ in the error term.
- Other factors, such as workforce experience and "motivation," are part of u. Motivation is very difficult to measure. Experience is easier:

  ln(wage) = β0 + β1 educ + β2 IQ + β3 exper + u.    (5)

- The multiple linear regression model can be written in the population as

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u    (6)

  where β0 is the intercept, β1 is the parameter associated with x1, β2 is the parameter associated with x2, and so on.
- The model contains k + 1 (unknown) population parameters. We call β1, ..., βk the slope parameters.
- Now we have multiple explanatory or independent variables. We still have one explained or dependent variable and an error term, u.
- Multiple regression allows more flexible functional forms:

  ln(wage) = β0 + β1 educ + β2 IQ + β3 exper + β4 exper² + u,    (7)

  so that exper is allowed to have a quadratic effect on ln(wage).
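With the quadratic term, the effect of exper on ln(wage) is not a single number: differentiating gives a slope of β3 + 2β4·exper at each experience level. A minimal sketch of evaluating that slope, with coefficient values that are invented for illustration (not estimates from these slides):

```python
# Slope of ln(wage) with respect to exper in the quadratic model:
# d lwage / d exper = b3 + 2*b4*exper.
# The values of b3 and b4 below are hypothetical, chosen only to illustrate.
def marginal_effect(exper, b3=0.020, b4=-0.0004):
    """Partial effect of one more year of experience on lwage."""
    return b3 + 2 * b4 * exper

# Multiply by 100 for the approximate percentage effect on wage.
for x in (5, 15, 25):
    print(f"exper = {x:2d}: {100 * marginal_effect(x):+.2f}% per extra year")
```

With β4 < 0, the percentage return to experience declines as experience accumulates, which is the usual motivation for including the quadratic term.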
- Take x1 = educ, x2 = IQ, x3 = exper, and x4 = exper². Note that x4 is a nonlinear function of x3.
- We already know that 100·β1 is the (ceteris paribus) percentage change in wage [not ln(wage)!] when educ increases by one year. 100·β2 has a similar interpretation (for a one-point increase in IQ).
- β3 and β4 are harder to interpret, but we can use calculus to get the slope of lwage with respect to exper:

  ∂lwage/∂exper = β3 + 2β4 exper    (8)

  Multiply by 100 to get the percentage effect.
- The key assumption for the general multiple regression model is easy to state in terms of a conditional expectation:

  E(u|x1, ..., xk) = 0    (9)

- Provided we are careful, we can make this condition closer to being true by "controlling for" more variables. In the wage example, we "control for" IQ when estimating the return to education.

Interpreting the OLS Regression Line

Suppose we have x1 and x2 (k = 2) along with y. We want to fit an equation of the form

  ŷ = β̂0 + β̂1 x1 + β̂2 x2    (10)

given data {(xi1, xi2, yi): i = 1, ..., n}. The sample size is again n.
Now the explanatory variables have two subscripts: i is the observation number (as always), and the second subscript (1 and 2 in this case) labels the particular variable. For example,

  xi1 = educi, i = 1, ..., n    (11)
  xi2 = IQi, i = 1, ..., n    (12)

- As in the simple regression case, there are different ways to motivate OLS. We choose β̂0, β̂1, and β̂2 (so three unknowns) to minimize the sum of squared residuals,

  Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2)²    (13)

- The case with k independent variables is easy to state: choose the k + 1 values β̂0, β̂1, β̂2, ..., β̂k to minimize

  Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − ... − β̂k xik)²    (14)

This can be solved with multivariable calculus. The OLS first-order conditions are the k + 1 linear equations in the k + 1 unknowns β̂0, β̂1, β̂2, ..., β̂k:

  Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − ... − β̂k xik) = 0    (15)
  Σ_{i=1}^n xi1 (yi − β̂0 − β̂1 xi1 − ... − β̂k xik) = 0    (16)
  Σ_{i=1}^n xi2 (yi − β̂0 − β̂1 xi1 − ... − β̂k xik) = 0    (17)
  ⋮    (18)
  Σ_{i=1}^n xik (yi − β̂0 − β̂1 xi1 − ... − β̂k xik) = 0    (19)

- We will discuss the condition needed for this set of equations to have a unique solution.
Computers are good at finding the solution.
- The OLS regression line is generally written

  ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk    (20)

- In applications, we will replace y and the xj with real variable names, and the β̂j with the numbers provided by Stata (or any other regression package).
- This is important: the slope coefficients now explicitly have ceteris paribus interpretations.
- Consider k = 2:

  ŷ = β̂0 + β̂1 x1 + β̂2 x2    (21)

  Then

  ∆ŷ = β̂1 ∆x1 + β̂2 ∆x2    (22)

  tells us how predicted y changes when x1 and x2 change by any amount.

What if we "hold x2 fixed"? Then

  ∆ŷ = β̂1 ∆x1 if ∆x2 = 0    (23)

Similarly,

  ∆ŷ = β̂2 ∆x2 if ∆x1 = 0    (24)

We call β̂1 and β̂2 partial effects or ceteris paribus effects.

Terminology
We say that β̂0, β̂1, ..., β̂k are the OLS estimates from the regression of

  y on x1, x2, ..., xk    (25)

or

  yi on xi1, xi2, ..., xik, i = 1, ..., n    (26)

when we want to emphasize the sample being used.
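The first-order conditions above are k + 1 linear equations that a computer solves directly. A minimal sketch on simulated data (variable names and true coefficients are invented, and numpy is assumed to be available):

```python
import numpy as np

# Solve the OLS normal equations X'X b = X'y for k = 2 regressors plus an
# intercept, on simulated data. True coefficients (1.0, 0.5, 0.02) are made up.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(12, 2, n)
x2 = rng.normal(100, 15, n)
y = 1.0 + 0.5 * x1 + 0.02 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])     # intercept column first
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # the k + 1 first-order conditions

resid = y - X @ beta_hat
print(beta_hat)        # slope estimates close to the true values
print(resid.sum())     # ~0, as the first first-order condition requires
```

Statistical packages such as Stata do exactly this (with more careful numerics) behind a one-line regression command.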
EXAMPLE: Final Exam Score and Missed Classes
- Sample of 680 MSU students in Economics 201 (from several years ago). Attendance was recorded electronically.
- We first regress final (out of 40 points) on missed (lectures missed out of 32) in a simple regression analysis. Then we add priGPA (prior GPA) as a control for the quality of the student.

  final̂ = 26.60 − .121 missed    (27)
  n = 680    (28)

- So the simple regression estimate implies that, say, 10 more missed classes reduces the predicted score by about 1.2 points (out of 40, with sd = 4.71).
- Now we control for prior GPA:

  final̂ = 17.42 + 0.017 missed + 3.24 priGPA    (29)
  n = 680    (30)

- The coefficient on missed actually becomes positive, but it is very small anyway. (Later, we will see it is not statistically different from zero.)
- The coefficient on priGPA means that one more point of prior GPA (for example, from 2.5 to 3.5) predicts a final exam score that is 3.24 points higher, holding the number of missed classes constant.
- Note that missed and priGPA are negatively correlated: Corr(missed, priGPA) = −0.427.
Now the wage example:

  lwagê = 5.97 + 0.0598 educ    (31)
  n = 935    (32)

  lwagê = 5.66 + 0.0391 educ + .0058 IQ    (33)
  n = 935    (34)

- The estimated return to a year of education falls from about 6.0% to about 3.9% when we control for IQ.
- In the multiple regression, we do this thought experiment: take two people, A and B, with the same IQ score. Suppose person B has one more year of schooling than person A. Then we predict B to have a wage that is 3.9% higher.
- Simple regression does not allow us to compare people with the same IQ score. The larger estimated return arises because we are attributing part of the IQ effect to education.
- Not surprisingly, there is a nontrivial, positive correlation between educ and IQ: 0.52.
- Multiple regression "partials out" the other independent variables when looking at the effect of, say, educ. It can be shown that β̂1 measures the effect of educ on lwage once the correlation of educ and IQ has been netted or partialled out.
- Another IQ point is worth much less than a year of education. Holding educ fixed, 10 more IQ points increase the predicted wage by about 5.8%.
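The "partialling out" interpretation can be verified numerically: regress x1 on the other regressor, keep the residuals, and a simple regression of y on those residuals reproduces the multiple-regression slope exactly. A sketch on simulated stand-ins for educ and IQ (all numbers invented; numpy assumed available):

```python
import numpy as np

# Frisch-Waugh-style check of the "partialling out" interpretation.
# x2 plays the role of IQ and x1 the role of educ; all numbers are invented.
rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(100, 15, n)
x1 = 6 + 0.04 * x2 + rng.normal(0, 2, n)   # x1 correlated with x2
y = 1.0 + 0.06 * x1 + 0.005 * x2 + rng.normal(0, 0.3, n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full multiple regression of y on (1, x1, x2).
b_full = ols(np.column_stack([np.ones(n), x1, x2]), y)

# Step 1: regress x1 on (1, x2) and keep the residuals r1.
g = ols(np.column_stack([np.ones(n), x2]), x1)
r1 = x1 - (g[0] + g[1] * x2)

# Step 2: a simple regression of y on r1 (no intercept needed, since r1 has
# mean zero) gives back the multiple-regression coefficient on x1 exactly.
b1_partial = (r1 @ y) / (r1 @ r1)
print(b_full[1], b1_partial)   # the two slope estimates coincide
```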
What Does It Mean to "Hold Other Factors Fixed"?
- The beauty of multiple regression is that it gives us the ceteris paribus interpretation without having to find two people with the same value of IQ who differ in education by one year. The estimation method does that for us.
- Multiple regression effectively mimics the ability to obtain a sample of individuals with the same, say, IQ and different levels of education.
- You can see in the first 20 observations that education and IQ score are free to vary. There are, of course, common values.

Fitted Values and Residuals

  ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + ... + β̂k xik    (35)
  ûi = yi − ŷi    (36)

1. The residuals always average to zero: Σ_{i=1}^n ûi = 0. This implies that the average of the fitted values equals ȳ.
2. Each explanatory variable is uncorrelated with the residuals in the sample. This follows from the first-order conditions. It implies that ŷi and ûi are also uncorrelated.
3. The sample averages always fall on the OLS regression line:

  ȳ = β̂0 + β̂1 x̄1 + β̂2 x̄2 + ...
+ β̂k x̄k    (37)

Goodness-of-Fit
As with simple regression, it can be shown that

  SST = SSE + SSR    (38)

where SST, SSE, and SSR are the total, explained, and residual sums of squares. We define the R-squared as before:

  R² = SSE/SST = 1 − SSR/SST    (39)

- In addition to 0 ≤ R² ≤ 1 and the interpretation of R², there is a useful fact to remember: using the same set of data and the same dependent variable, the R-squared can never fall when another independent variable is added to the regression. And it almost always goes up, at least a little.
- This means that, if we focus on R², we might include silly variables among the xj.
- Adding another x cannot make SSR increase. The SSR falls unless the coefficient on the new variable is identically zero.
- In the missed lectures example, the R² from final on missed is only .0196. When we add priGPA, R² = .1342, and priGPA is doing all the work. If we just regress final on priGPA, R² = .1339, which is barely below what we get when missed is included.
- In the wage example, the R² from lwage on educ is 0.0974, and it increases to 0.1297 when IQ is added.
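The algebraic properties of the residuals, and the fact that R² cannot fall when a regressor is added, are easy to confirm numerically. A sketch on invented data (numpy assumed available); note that the added regressor x2 is pure noise here, yet R² still does not decrease:

```python
import numpy as np

# Check the OLS algebraic properties and the R-squared monotonicity fact.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)          # irrelevant regressor by construction
y = 2 + 1.5 * x1 + rng.normal(size=n)

def fit(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u_hat = y - X @ b
    sst = (y - y.mean()) @ (y - y.mean())
    return u_hat, 1 - (u_hat @ u_hat) / sst

ones = np.ones(n)
u_small, r2_small = fit(np.column_stack([ones, x1]), y)
u_big, r2_big = fit(np.column_stack([ones, x1, x2]), y)

print(u_small.mean())    # ~0: residuals average to zero
print(x1 @ u_small)      # ~0: each regressor is uncorrelated with the residuals
print(r2_small, r2_big)  # r2_big >= r2_small, even though x2 is noise
```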
- Notice that, together (we cannot separate out the contribution of each, because they are correlated), educ and IQ explain less than 15% of the variation in lwage. Over 85% is left unexplained.

Comparing Simple and Multiple Regression Estimates
Consider the simple and multiple OLS regression lines:

  ỹ = β̃0 + β̃1 x1    (40)
  ŷ = β̂0 + β̂1 x1 + β̂2 x2    (41)

where a tilde (˜) denotes the simple regression and a hat (ˆ) denotes the multiple regression (using the same n observations).

- Question: Is there a simple relationship between β̃1, which does not control for x2, and β̂1, which does?
- Yes, but we need to define another simple regression. Let δ̃1 be the slope from the regression of

  xi2 on xi1, i = 1, ..., n    (42)

  (Yes, x2 plays the role of the dependent variable.)
- It is always true (that is, it is an algebraic fact) that

  β̃1 = β̂1 + β̂2 δ̃1    (43)

Notice how, in β̃1 = β̂1 + β̂2 δ̃1, the slope coefficient on x2 in the multiple regression also plays a role.
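Equation (43) is an algebraic fact, not an approximation, so it holds exactly in any sample. A sketch confirming it on invented data (numpy assumed available):

```python
import numpy as np

# Verify b1_tilde = b1_hat + b2_hat * d1_tilde on simulated data.
rng = np.random.default_rng(3)
n = 400
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # x1 and x2 correlated
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

def coefs(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_tilde = coefs(np.column_stack([ones, x1]), y)[1]   # simple regression slope
b_hat = coefs(np.column_stack([ones, x1, x2]), y)    # multiple regression
d_tilde = coefs(np.column_stack([ones, x1]), x2)[1]  # slope of x2 on x1

print(b_tilde, b_hat[1] + b_hat[2] * d_tilde)  # equal, up to rounding
```

In this simulation b̃1 exceeds β̂1 because both β̂2 and δ̃1 are positive, so the simple regression slope absorbs part of the x2 effect.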
Case 1: If the partial effect of x2 on y is positive, so β̂2 > 0, and x1 and x2 are positively correlated in the sample, so δ̃1 > 0, then

  β̃1 > β̂1    (44)

EXAMPLE: In the lwage example with x1 = educ, x2 = IQ: β̂2 = 0.00586, δ̃1 = 3.534, and we saw that β̃1 = 0.0598 > 0.039 = β̂1.

Case 2: If β̂2 > 0 and δ̃1 < 0 (x1 and x2 negatively correlated), or vice versa, then

  β̃1 = β̂1 + β̂2 δ̃1 = β̂1 + (+)(−) = β̂1 + (−) < β̂1    (45)

EXAMPLE: In the missed lectures example with x1 = missed and x2 = priGPA: β̂2 = 3.24, δ̃1 = −0.0427, and we saw that β̃1 = −0.121 < 0.017 = β̂1.

- β̃1 = β̂1 only when the partial effect of x2 on y is zero in the sample (β̂2 = 0), and/or x1 and x2 are uncorrelated in the sample (δ̃1 = 0).

Regression Through the Origin
- Occasionally, one wants to impose that the predicted y is zero when all the xj are zero. This means the intercept is set to zero rather than estimated:

  ỹ = β̃1 x1 + β̃2 x2 + ... + β̃k xk    (46)

- The costs of imposing a zero intercept when the population intercept is not zero are severe: all of the slope coefficients are biased, in general.
- Estimating an intercept when β0 = 0 does not cause bias in the slope coefficients (under the appropriate assumptions). We already know this from simple regression: nothing in SLR.1 to SLR.4 says the population parameters must be different from zero.
- Also, computing an appropriate R-squared becomes a bit tricky. In this class, we always estimate the intercept.

- We motivated OLS estimation using the population regression function

  E(y|x1, x2, ..., xk) = β0 + β1 x1 + β2 x2 + ... + βk xk    (47)

- We obtained the sample regression function,

  ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk    (48)

  by choosing the β̂j to minimize the sum of squared residuals.
- The β̂j have a ceteris paribus interpretation:

  ∆ŷ = β̂1 ∆x1 when ∆x2 = ... = ∆xk = 0    (49)

- We discussed algebraic properties of the fitted values and residuals. These hold for any sample. We also discussed features of R², the goodness-of-fit measure.
- The only assumption we needed to discuss the algebraic properties for a given sample is that we can actually compute the estimates.
- Now we turn to statistical properties: we study the sampling distributions of the estimators.
We go further than in the simple regression case, eventually covering statistical inference.

The Expected Value of the OLS Estimators
- As with simple regression, there is a set of assumptions under which OLS is unbiased. We also explicitly consider the bias caused by omitting a variable that appears in the population model.
- Now we label the assumptions with "MLR" (multiple linear regression).

Assumption MLR.1 (Linear in Parameters)
- The model in the population can be written as

  y = β0 + β1 x1 + β2 x2 + ... + βk xk + u    (50)

  where the βj are the population parameters and u is the unobserved error.
- We have already seen examples where y and the xj can be nonlinear functions of underlying variables, so the model is flexible.

Assumption MLR.2 (Random Sampling)
- We have a random sample of size n from the population, {(xi1, xi2, ..., xik, yi): i = 1, ..., n}.
- As with SLR, this introduces the data and means the data are a representative sample from the population. Sometimes we will plug the observations into the population equation:

  yi = β0 + β1 xi1 + β2 xi2 + ...
+ βk xik + ui    (51)

which emphasizes that, along with the observed variables, we effectively draw an unobserved error, ui, for each i.

Assumption MLR.3 (No Perfect Collinearity)
- In the sample (and, therefore, in the population), none of the explanatory variables is constant, and there are no exact linear relationships among them.
- Ruling out no sample variation in {xij: i = 1, ..., n} is familiar from simple regression.
- There is a new part to the assumption because we have more than one explanatory variable. We must rule out the (extreme) case that one (or more) of the explanatory variables is an exact linear function of the others.
- If, say, xi1 is an exact linear function of xi2, ..., xik, we say the model suffers from perfect collinearity.
- Under perfect collinearity, there are no unique OLS estimators. Stata and other regression packages will indicate a problem.
- Usually perfect collinearity arises from a bad specification of the population model. A small sample size or bad luck in drawing the sample can also be the culprit.
- Assumption MLR.3 can only hold if n ≥ k + 1; that is, we have at least as many observations as we have parameters to estimate.
Suppose that k = 2, x1 = educ, and x2 = exper. If we draw our sample so that

  educi = 2 experi    (52)

for every i, then Assumption MLR.3 is violated. This is very unlikely unless the sample is small. (In the population, there are plenty of people whose years of schooling is not twice their years of workforce experience.)

- With the samples we have looked at, the presence of perfect collinearity is usually a result of poor model specification, or of defining variables inappropriately.
- Such problems can almost always be detected by remembering the ceteris paribus nature of multiple regression.
- EXAMPLE: Do not include the same variable in an equation measured in different units. For example, in a CEO salary equation, it would make no sense to include firm sales measured in dollars along with sales measured in millions of dollars. There is no new information once we include one of these.
- The return on equity should be included as a percent or a proportion, not both.
- EXAMPLE: Be careful with functional forms!
Suppose we start with a constant elasticity model of family consumption:

  ln(cons) = β0 + β1 ln(inc) + u    (53)

- How might we allow the elasticity to be nonconstant, but include the above as a special case? The following does not work:

  ln(cons) = β0 + β1 ln(inc) + β2 ln(inc²) + u    (54)

  because ln(inc²) = 2 ln(inc); that is, x2 = 2x1, where x1 = ln(inc).
- Instead, we really mean something like

  ln(cons) = β0 + β1 ln(inc) + β2 [ln(inc)]² + u    (55)

  which means x2 = x1². So x2 is an exact nonlinear function of x1, but this (fortunately) is allowed by MLR.3.
- Tracking down perfect collinearity can be harder when it involves more than two variables.

EXAMPLE: Motivated by the data in VOTE.DTA:

  voteA = β0 + β1 expendA + β2 expendB + β3 totexpend + u    (56)

where expendA is campaign spending by candidate A, expendB is spending by candidate B, and totexpend is total spending. All are in thousands of dollars. Mechanically, the problem is that, by definition,

  expendA + expendB = totexpend    (57)

which, of course, will then be true for any sample we collect.
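The difference between an exact linear relationship and an exact nonlinear function shows up directly in the rank of the design matrix: under perfect collinearity the normal equations have no unique solution, while an exact nonlinear function such as [ln(inc)]² leaves full column rank. A sketch on invented income data (numpy assumed available):

```python
import numpy as np

# Perfect collinearity: ln(inc^2) = 2 ln(inc), so the "bad" design matrix is
# rank-deficient; [ln(inc)]^2 is an exact *nonlinear* function and is fine.
rng = np.random.default_rng(4)
inc = rng.uniform(20, 150, size=100)   # invented income values
log_inc = np.log(inc)
ones = np.ones(inc.size)

X_bad = np.column_stack([ones, log_inc, 2 * log_inc])   # 2*ln(inc) = ln(inc**2)
X_ok = np.column_stack([ones, log_inc, log_inc ** 2])

print(np.linalg.matrix_rank(X_bad))  # 2: fewer than 3 columns' worth of information
print(np.linalg.matrix_rank(X_ok))   # 3: full column rank, MLR.3 holds
```

This rank check is essentially what a regression package does when it reports a dropped variable under perfect collinearity.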
- One of the three variables has to be dropped. (Stata does this automatically, but we should rely on ourselves to properly construct a model and interpret it.)
- The model makes no sense from a ceteris paribus perspective. For example, β1 is supposed to measure the effect of changing expendA on voteA, holding fixed expendB and totexpend. But if expendB and totexpend are held fixed, expendA cannot change!
- We would probably drop totexpend and just use the two separate spending variables.
- The results of this regression seem sensible: spending by candidate A has a positive effect on the share of the vote received by A, and spending by B has essentially the opposite effect. (If expendA increases by 10, so $10,000, and expendB is held fixed, then voteA is estimated to increase by about .38 percentage points.)
- Note that shareA, which is a nonlinear function of expendA and expendB,

  shareA = 100 · expendA/(expendA + expendB)    (58)

  can be included. It allows the relative size of spending to matter.
▶ A Key Point: Assumption MLR.3 does not say the explanatory variables have to be uncorrelated. Nor does it say they cannot be "highly" correlated. It rules out perfect correlation, that is, correlations of ±1. Perfect correlation is very rare unless a mistake has been made in specifying the model.
▶ Multiple regression would be almost useless if we had to insist x1, ..., xk were uncorrelated in the sample (or population)!
▶ If the xj were all pairwise uncorrelated, we could just use a bunch of simple regressions.
▶ In an equation like
lwage = β0 + β1 educ + β2 IQ + β3 exper + u (59)
we fully expect correlation among educ, IQ, and exper. (We already saw that educ and IQ are positively correlated; educ and exper tend to be negatively correlated.)
▶ If educ were uncorrelated with all other variables that affect lwage, we could stick with the simple regression of lwage on educ to estimate β1.
▶ Multiple regression allows us to estimate ceteris paribus effects precisely when there is correlation among the xj.
▶ MLR.1: y = β0 + β1 x1 + β2 x2 + · · ·
+ βk xk + u
▶ MLR.2: random sampling
▶ MLR.3: no perfect collinearity
The last assumption ensures that the OLS estimators are unique and can be obtained from the first-order conditions (minimizing the sum of squared residuals). We need a final assumption for unbiasedness.
Assumption MLR.4 (Zero Conditional Mean)
E(u|x1, x2, ..., xk) = 0 (60)
▶ The average value of the error does not change across different slices of the population defined by x1, ..., xk.
▶ If u is correlated with any of the xj, MLR.4 is violated. This is usually a good way to think about the problem.
▶ MLR.4 would also be violated if we misspecify the functional relation between the explained and explanatory variables in MLR.1. For example, if we use wage when the true relationship in the population has y = ln(wage).
▶ EXAMPLE:
score = β0 + β1 classize + β2 income + u (61)
▶ Even at the same income level, families differ in their interest in and concern about their children's education. Family support and student motivation are in u. Are these correlated with class size?
▶ When Assumption MLR.4 holds, we say x1, ..., xk are exogenous explanatory variables. If xj is correlated with u, we often say xj is an endogenous explanatory variable. (This name makes more sense in more advanced contexts, but it is used generally.)
THEOREM (Unbiasedness of OLS)
▶ Under Assumptions MLR.1 through MLR.4, the OLS estimators are unbiased:
E(β̂j) = βj, j = 0, 1, 2, ..., k (62)
for any values of the βj.
▶ The easiest proof requires matrix algebra. See Appendix 3A for a proof based on summations.
▶ The hope is that if our focus is on, say, x1, we can include enough other variables in x2, ..., xk to make MLR.4 true, or close to true.
▶ It is important to see that the unbiasedness result allows the βj to take any values, including zero.
▶ Suppose, then, that we specify the model
lwage = β0 + β1 educ + β2 exper + β3 motheduc + u (63)
where MLR.1 through MLR.4 hold. Suppose that β3 = 0 (motheduc has no effect on lwage after educ and exper are controlled for), but we do not know that.
We estimate the full model by OLS:
lwage-hat = β̂0 + β̂1 educ + β̂2 exper + β̂3 motheduc (64)
▶ We automatically know from the unbiasedness result that
E(β̂j) = βj, j = 0, 1, 2 (65)
E(β̂3) = 0 (66)
▶ Hence, including an irrelevant variable, or overspecifying the model, does not cause bias in any of the coefficients; this follows from the general unbiasedness result. However, it is not costless: the variances of the OLS estimators will suffer.
▶ Leaving a variable out when it should be included in multiple regression is a serious problem. This is called excluding a relevant variable or underspecifying the model.
▶ We can perform a misspecification analysis in this case. The general case is more complicated, so consider the case where the correct model has two explanatory variables:
y = β0 + β1 x1 + β2 x2 + u (67)
and satisfies MLR.1 through MLR.4.
▶ If we regress y on x1 and x2, we know the resulting estimators will be unbiased. But suppose we leave out x2 and use the simple regression of y on x1:
ỹ = β̃0 + β̃1 x1 (68)
▶ In most cases, we omit x2 because we cannot collect data on it.
▶ We can easily derive the bias in β̃1 (conditional on the outcomes of xi1 and xi2).
▶ We already have a relationship between β̃1 and the multiple regression estimator, β̂1:
β̃1 = β̂1 + β̂2 δ̃1 (69)
where β̂2 is the multiple regression estimator of β2 and δ̃1 is the slope coefficient in the auxiliary regression of
xi2 on xi1, i = 1, ..., n (70)
▶ Now just use the fact that β̂1 and β̂2 are unbiased (or would be if we could compute them):
E(β̃1|x1, x2) = E(β̂1|x1, x2) + E(β̂2|x1, x2) δ̃1 (71)
= β1 + β2 δ̃1 (72)
▶ Therefore,
Bias(β̃1) = β2 δ̃1 (73)
▶ Recall that δ̃1 is a function of x1, x2 and has the same sign as Corr(x1, x2).
The simple regression estimator is unbiased in two cases.
1. β2 = 0. But this means that x2 does not appear in the population model, so simple regression is the right thing to do.
2. Corr(x1, x2) = 0 (in the sample). Then the simple and multiple regression estimators are identical because δ̃1 = 0.
If β2 ≠ 0 and Corr(x1, x2) ≠ 0, then β̃1 is generally biased. We do not know β2 and might only have a vague idea about the size of δ̃1.
But we often can guess at the signs.

Bias in the Simple Regression Estimator of β1:
            Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0    Positive Bias       Negative Bias
  β2 < 0    Negative Bias       Positive Bias

EXAMPLE: Omitted Ability Bias.
lwage = β0 + β1 educ + β2 abil + u (74)
where abil is "ability". Essentially by definition, β2 > 0. We also think
Corr(educ, abil) > 0 (75)
so that higher-ability people get more education, on average. In this scenario,
E(β̃1) > β1 (76)
so there is an upward bias in simple regression.
▶ Our failure to control for ability leads to (on average) overestimating the return to education. We attribute some of the effect of ability to education because ability and education are positively correlated.
▶ Remember, for a particular sample, we can never know whether β̃1 > β1. But we should be very hesitant to trust a procedure that produces too large an estimate on average.
EXAMPLE: Effects of a Tutoring Program on Student Performance
GPA = β0 + β1 tutor + β2 abil + u (77)
where tutor is hours spent in tutoring. Again, β2 > 0.
Suppose that students with lower ability tend to use more tutoring:
Corr(tutor, abil) < 0 (78)
Then
E(β̃1) = β1 + β2 δ̃1 = β1 + (+)(−) < β1 (79)
so that our failure to account for ability leads us to underestimate the effect of tutoring. In fact, it could happen that β1 > 0 but E(β̃1) ≤ 0, so we tend to find no effect or even a negative effect.
▶ Determining bias is more difficult in situations with more than two explanatory variables. The following simplification often works. Suppose
lwage = β0 + β1 educ + β2 exper + β3 abil + u (80)
and we omit ability, so we regress lwage on educ and exper. Call these estimators β̃j, j = 0, 1, 2.
▶ Even if exper is uncorrelated with abil, β̃2 is generally biased, just as β̃1 is. This is because educ and exper are usually correlated.
▶ If, as an approximation, we assume educ and exper are uncorrelated, and exper and abil are uncorrelated, then the bias analysis of β̃1 for the simpler case carries through, but it is now β3 that matters. So, as a rough guide, β̃1 will have an upward bias because β3 > 0 and Corr(educ, abil) > 0.
▶ In the general case, it should be remembered that correlation of any xj with an omitted variable generally causes bias in all of the OLS estimators, not just in β̃j.
▶ See Appendix 3A in Wooldridge for a more detailed treatment.
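The decomposition β̃1 = β̂1 + β̂2 δ̃1 behind these sign predictions is an exact in-sample identity, so it can be verified numerically. A minimal sketch on simulated data (variable names and parameter values are illustrative, not from any course dataset):

```python
import numpy as np

# Verify the omitted-variable identity: short-regression slope equals
# long-regression slope plus (partial effect of x2) x (slope of x2 on x1).
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)         # x1 and x2 positively correlated
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients by least squares; X includes a constant column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
b_long = ols(np.column_stack([const, x1, x2]), y)    # beta0^, beta1^, beta2^
b_short = ols(np.column_stack([const, x1]), y)       # beta0~, beta1~
delta1 = ols(np.column_stack([const, x1]), x2)[1]    # delta1~ from x2 on x1

lhs = b_short[1]                     # beta1~
rhs = b_long[1] + b_long[2] * delta1 # beta1^ + beta2^ * delta1~
print(lhs, rhs)                      # identical up to floating-point error
```

Here β2 > 0 and δ̃1 > 0, so the short-regression slope lands above the long-regression slope, matching the "positive bias" cell of the table.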
So far, we have assumed
▶ MLR.1: y = β0 + β1 x1 + β2 x2 + · · · + βk xk + u
▶ MLR.2: random sampling
▶ MLR.3: no perfect collinearity
▶ MLR.4: E(u|x1, x2, ..., xk) = 0
Under MLR.3 we can compute the OLS estimates in our sample. The other assumptions then ensure that OLS is unbiased (when we think of repeated sampling).
▶ As in the simple regression case, to obtain Var(β̂j) we add a simplifying assumption: homoskedasticity (constant variance).
Assumption MLR.5 (Homoskedasticity)
The variance of the error, u, does not change with any of x1, x2, ..., xk:
Var(u|x1, x2, ..., xk) = Var(u) = σ² (81)
▶ This assumption can never be guaranteed. We make it for now to get simple formulas, and to be able to discuss the efficiency of OLS.
▶ Assumptions MLR.1 and MLR.4 imply
E(y|x1, x2, ..., xk) = β0 + β1 x1 + · · · + βk xk (82)
and, when we add MLR.5,
Var(y|x1, x2, ..., xk) = Var(u|x1, x2, ..., xk) = σ² (83)
▶ Assumptions MLR.1 through MLR.5 are called the Gauss-Markov assumptions.
▶ If we have a savings equation,
sav = β0 + β1 inc + β2 famsize + β3 pareduc + u (84)
where famsize is the size of the family and pareduc is total parents' education, MLR.5 means that the variance in sav cannot depend on income, family size, or parents' education.
▶ Later we will show how to relax MLR.5, and how to test whether it is true.
▶ To set up the following theorem, we focus only on the slope coefficients. (A different formula is needed for Var(β̂0).)
▶ As before, we are computing the variance conditional on the values of the explanatory variables in the sample.
▶ We need to define two quantities associated with each xj. The first is the total variation in xj in the sample:
SSTj = Σⁿᵢ₌₁ (xij − x̄j)² (85)
(SSTj/n is the sample variance of xj.)
▶ The second is a measure of correlation between xj and the other explanatory variables in the sample. This is the R-squared from the regression of
xij on xi1, xi2, ..., xi,j−1, xi,j+1, ..., xik (86)
That is, we regress xj on all of the other explanatory variables (y plays no role here).
Call this R-squared Rj², j = 1, ..., k.
▶ Important: Rj² = 1 is ruled out by Assumption MLR.3, because Rj² = 1 means xj is an exact linear function (in the sample) of the other explanatory variables.
▶ Any value 0 ≤ Rj² < 1 is allowed. As Rj² gets closer to one, xj is more linearly related to the other independent variables.
THEOREM (Sampling Variances of the OLS Slope Estimators)
Under Assumptions MLR.1 through MLR.5,
Var(β̂j) = σ² / [SSTj (1 − Rj²)], j = 1, 2, ..., k. (87)
All five Gauss-Markov assumptions are needed to ensure this formula is correct.
Suppose k = 3, with
y = β0 + β1 x1 + β2 x2 + β3 x3 + u (88)
E(u|x1, x2, x3) = 0 (89)
Var(u|x1, x2, x3) = γ0 + γ1 x1 (90)
where x1 ≥ 0 (as are γ0 and γ1). This violates MLR.5, and the standard variance formula is generally incorrect for all of the OLS estimators, not just Var(β̂1).
The variance
Var(β̂j) = σ² / [SSTj (1 − Rj²)] (91)
has three components. Two are familiar from simple regression. The third, 1 − Rj², is new to multiple regression.
▶ As the error variance (in the population), σ², decreases, Var(β̂j) decreases. One way to reduce the error variance is to take more "stuff" out of the error; that is, add more explanatory variables.
▶ As the total sample variation in xj, SSTj, increases, Var(β̂j) decreases. As in the simple regression case, it is easier to estimate how xj affects y if we see more variation in xj.
▶ As we mentioned earlier, SSTj/n (or SSTj/(n − 1); the difference is unimportant here) is the sample variance of {xij : i = 1, 2, ..., n}. So we can use
SSTj ≈ nσj² (92)
where σj² > 0 is the population variance of xj.
▶ We can increase SSTj by increasing the sample size; SSTj is roughly a linear function of n. (Of the three components in Var(β̂j), this is the only one that depends systematically on n.)
Var(β̂j) = σ² / [SSTj (1 − Rj²)] (93)
▶ As Rj² → 1, Var(β̂j) → ∞. Rj² measures how linearly related xj is to the other explanatory variables.
▶ We get the smallest variance for β̂j when Rj² = 0:
Var(β̂j) = σ² / SSTj (94)
which looks just like the simple regression formula.
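The slope-variance formula can be checked against the matrix expression Var(β̂) = σ²(X′X)⁻¹: the diagonal entry for β̂j equals σ²/[SSTj(1 − Rj²)] exactly. A sketch on a simulated two-regressor design (σ² treated as known here purely for the check):

```python
import numpy as np

# Compare the matrix variance formula with sigma^2 / (SST_1 * (1 - R_1^2)).
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # x1 and x2 correlated, so R_1^2 > 0
sigma2 = 4.0                          # "known" error variance for the check
X = np.column_stack([np.ones(n), x1, x2])

# Matrix version: diagonal entry of sigma^2 (X'X)^{-1} for beta1^
V = sigma2 * np.linalg.inv(X.T @ X)
var_matrix = V[1, 1]

# Formula version: SST_1 and R_1^2 from the auxiliary regression of x1 on x2
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - fitted) ** 2) / sst1
var_formula = sigma2 / (sst1 * (1 - r2_1))

print(var_matrix, var_formula)        # agree up to floating-point error
```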
▶ If xj is unrelated to all of the other independent variables, it is easier to estimate its ceteris paribus effect on y.
▶ Rj² = 0 is very rare. Even small values are not especially common.
▶ In fact, Rj² ≈ 1 is somewhat common, and this can cause problems for getting a sufficiently precise estimate of βj.
▶ Loosely, Rj² "close" to one is called the "problem" of multicollinearity. Unfortunately, we cannot define "close" in a way that is relevant for all situations. We have ruled out the case of perfect collinearity, Rj² = 1.
▶ Here is an important point: one often hears discussions of multicollinearity as if high correlation among two or more of the xj is a violation of an assumption we have made. But it does not violate any of the Gauss-Markov assumptions.
▶ We know that if the zero conditional mean assumption is violated, OLS is not unbiased. If MLR.1 through MLR.4 hold but homoskedasticity (constant variance) does not, then
Var(β̂j) = σ² / [SSTj (1 − Rj²)] (95)
is not the correct formula.
▶ But multicollinearity does not cause the OLS estimators to be biased. We still have E(β̂j) = βj.
▶ Further, the claim that the OLS variance formula is "biased" is also wrong. The formula is correct under MLR.1 through MLR.5.
▶ In fact, the formula is doing its job: it shows that if Rj² is "close" to one, Var(β̂j) might be very large. If Rj² is "close" to one, xj does not have much variation separate from the other explanatory variables. We are trying to estimate the effect of xj on y, holding x1, ..., xj−1, xj+1, ..., xk fixed, but the data might not allow us to do that very precisely.
▶ Because multicollinearity violates none of our assumptions, it is essentially impossible to state hard rules about when it is a "problem". Of course, this has not stopped some from trying.
▶ Other than just looking at the Rj², a common measure of multicollinearity is called the variance inflation factor (VIF):
VIFj = 1 / (1 − Rj²) (96)
so that
Var(β̂j) = (σ² / SSTj) · VIFj (97)
▶ The VIFj tells us how many times larger the variance is than it would be in the "ideal" case of no correlation of xij with xi1, ..., xi,j−1, xi,j+1, ..., xik.
▶ This sometimes leads to silly rules of thumb. For example, one should be "concerned" if VIFj > 10 (equivalently, Rj² > .9).
▶ Is Rj² > .9 "large"? Yes, in the sense that it would be better to have Rj² smaller.
▶ But if we want to control for, say, x2, ..., xk to get a good ceteris paribus effect of x1 on y, we often have no choice.
▶ A large VIFj can be offset by a large SSTj:
Var(β̂j) = (σ² / SSTj) · VIFj (98)
With a large sample size, SSTj could be sufficiently large to offset VIFj. The value of VIFj per se is irrelevant; ultimately, it is Var(β̂j) that is important.
▶ Even so, at this point we have no way of knowing whether Var(β̂j) is "too large" for the estimate β̂j to be useful. Only when we discuss confidence intervals and hypothesis testing will this become apparent.
▶ Be wary of work that reports a set of multicollinearity "diagnostics" and concludes nothing useful can be learned. Sometimes any VIF above, say, 10 is used to make such a claim. Other "diagnostics" are even more difficult to interpret, and using them indiscriminately is often a mistake.
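Computing the VIFs only requires running each auxiliary regression. A minimal sketch on simulated data (the collinearity strength here is made up for illustration):

```python
import numpy as np

# VIF_j = 1 / (1 - Rj^2), where Rj^2 comes from regressing x_j on the
# other regressors (plus an intercept). Simulated data for illustration.
rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # x2 highly collinear with x1
x3 = rng.normal(size=n)                     # x3 unrelated to the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j via the auxiliary regression of x_j on the rest."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

for j in range(3):
    print(f"VIF_{j + 1} = {vif(X, j):.2f}")
# VIF_1 and VIF_2 come out large; VIF_3 is near 1.
```

As the slide argues, a large VIF here does not make the estimates biased; it only warns that the corresponding slope is imprecisely estimated for a given SSTj.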
▶ Consider an example:
y = β0 + β1 x1 + β2 x2 + β3 x3 + u
where β1 is the coefficient of interest. Here x2 and x3 act as controls, included so that we can hope to get a good ceteris paribus estimate of the effect of x1. Such controls are often highly correlated. But the correlation between x2 and x3 has nothing to do with Var(β̂1); it is only the correlation of x1 with (x2, x3) that matters.
▶ In an example to determine whether communities with larger minority populations are discriminated against in lending, we might have
percapproved = β0 + β1 percminority + β2 avginc + β3 avghouseval + u
where β1 is the key coefficient. We might expect avginc and avghouseval to be highly correlated across communities. But we do not care about β2 and β3.
▶ As with the bias calculations, we can study the variances of the OLS estimators in misspecified models.
▶ Consider the same case with (at most) two explanatory variables:
y = β0 + β1 x1 + β2 x2 + u (99)
which we assume satisfies the Gauss-Markov assumptions.
▶ We run the "short" regression of y on x1, and also the "long" regression of y on x1 and x2:
ỹ = β̃0 + β̃1 x1 (100)
ŷ = β̂0 + β̂1 x1 + β̂2 x2 (101)
▶ We know from the previous analysis that
Var(β̂1) = σ² / [SST1 (1 − R1²)] (102)
conditional on the values xi1 and xi2 in the sample.
▶ What about the simple regression estimator? We can show
Var(β̃1) = σ² / SST1 (103)
which is again conditional on {(xi1, xi2) : i = 1, ..., n}.
▶ Whenever xi1 and xi2 are correlated, R1² > 0, and
Var(β̃1) = σ² / SST1 < σ² / [SST1 (1 − R1²)] = Var(β̂1) (104)
▶ So, by omitting x2, we can in fact get an estimator with a smaller variance, even though it is biased. When we look at bias and variance, we have a tradeoff between simple and multiple regression.
In the case R1² > 0, we can draw two conclusions for
y = β0 + β1 x1 + β2 x2 + u (105)
1. If β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, but Var(β̃1) < Var(β̂1).
2. If β2 = 0, β̃1 and β̂1 are both unbiased and Var(β̃1) < Var(β̂1).
Case 2 is clear cut. If β2 = 0, x2 has no (partial) effect on y.
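Case 2 can be illustrated with a small Monte Carlo experiment (simulated data; all parameter values are arbitrary): with β2 = 0 but Corr(x1, x2) > 0, both estimators center on β1, and the short regression has the smaller spread.

```python
import numpy as np

# Monte Carlo sketch of the variance tradeoff when x2 is irrelevant
# (beta2 = 0) but correlated with x1. Illustrative simulation only.
rng = np.random.default_rng(4)
n, reps = 100, 2000
beta1, beta2 = 1.0, 0.0

short_est, long_est = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)   # correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    c = np.ones(n)
    long_est.append(
        np.linalg.lstsq(np.column_stack([c, x1, x2]), y, rcond=None)[0][1])
    short_est.append(
        np.linalg.lstsq(np.column_stack([c, x1]), y, rcond=None)[0][1])

print(np.mean(short_est), np.mean(long_est))  # both centered near beta1 = 1
print(np.var(short_est), np.var(long_est))    # short regression: smaller variance
```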
When x2 is correlated with x1, including it along with x1 in the regression makes it more difficult to estimate the partial effect of x1. Simple regression is clearly preferred; this can be seen as the cost of including irrelevant variables.
Case 1 is more difficult, but there are reasons to prefer the unbiased estimator, β̂1. First, the bias in β̃1 does not systematically change with the sample size: we should assume the bias is as large when n = 1,000 as when n = 10. By contrast, the variances Var(β̃1) and Var(β̂1) both shrink at the rate 1/n. With a large sample size, the difference between Var(β̃1) and Var(β̂1) is less important, especially considering that the bias in β̃1 is not shrinking.
The second reason for preferring β̂1 is more subtle. The formulas
Var(β̃1) = σ² / SST1 (106)
Var(β̂1) = σ² / [SST1 (1 − R1²)] (107)
because they condition on the same explanatory variables, act as if the error variance does not change when we add x2. But if β2 ≠ 0, the error variance does shrink.
In a more advanced course, we would be making a comparison between
Var(β̃1) = η² / SST1 (108)
Var(β̂1) = σ² / [SST1 (1 − R1²)] (109)
where η² > σ² reflects the larger error variance in the simple regression analysis.
We still need to estimate σ². With n observations and k + 1 parameters, we only have
df = n − (k + 1) (110)
degrees of freedom. Recall we lose the k + 1 df due to the k + 1 restrictions on the OLS residuals:
Σⁿᵢ₌₁ ûᵢ = 0 (111)
Σⁿᵢ₌₁ xij ûᵢ = 0, j = 1, 2, ..., k (112)
THEOREM (Unbiased Estimation of σ²)
Under the Gauss-Markov assumptions (MLR.1 through MLR.5),
σ̂² = (n − k − 1)⁻¹ Σⁿᵢ₌₁ ûᵢ² = SSR/df (113)
is an unbiased estimator of σ². This means that, if we divide by n rather than n − k − 1, the bias is
−σ² (k + 1)/n (114)
which means the estimated variance would be too small, on average. The bias disappears as n increases.
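The k + 1 residual restrictions and the degrees-of-freedom adjustment can be checked directly. A sketch on simulated data (true σ² = 4 here, chosen purely for illustration):

```python
import numpy as np

# Check the OLS residual restrictions and compute sigma^2-hat = SSR/(n-k-1)
# for a simulated model with k = 2 regressors plus an intercept.
rng = np.random.default_rng(5)
n, k = 400, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
sigma = 2.0                                   # true error sd, so sigma^2 = 4
y = 1.0 + 0.5 * x1 - 0.3 * x2 + sigma * rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
bhat = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ bhat

# The k + 1 first-order conditions: residuals sum to zero and are
# orthogonal to each regressor.
print(uhat.sum())              # ~ 0
print(x1 @ uhat, x2 @ uhat)    # ~ 0, ~ 0

sigma2_hat = uhat @ uhat / (n - k - 1)  # unbiased estimator of sigma^2
print(sigma2_hat)
```

In a single sample σ̂² will not equal 4 exactly; unbiasedness is a statement about its average over repeated samples.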
- The square root of σ̂², σ̂, is reported by all regression packages; it is called the standard error of the regression, or the root mean squared error.
- SSR falls when a new explanatory variable is added, but df falls, too. So σ̂ can increase or decrease when a new variable is added in multiple regression.
- The standard error of each slope estimate β̂j is computed as

  se(β̂j) = σ̂/√[SSTj(1 − Rj²)]   (115)

and it will be critical to report these along with the coefficient estimates.
- We have already discussed the three factors that affect se(β̂j).

Using WAGE2.DTA:

  lwage-hat = 5.198 + 0.057 educ + 0.0058 IQ + 0.0195 exper
             (0.121)  (0.007)      (0.001)     (0.0032)
  n = 935, R² = 0.1622

Efficiency of OLS: The Gauss-Markov Theorem

- Why do we use OLS, rather than some other estimation method?
- Consider simple regression,

  y = β0 + β1 x + u   (116)

and write, for each observation i,

  yi = β0 + β1 xi + ui.   (117)

- If we average across the n observations, we get

  ȳ = β0 + β1 x̄ + ū.   (118)

- For any i with xi ≠ x̄, subtract and rearrange:

  β1 = (yi − ȳ)/(xi − x̄) − (ui − ū)/(xi − x̄)   (119)

- The last term has a zero expected value under random sampling and E(u|x) = 0. If xi ≠ x̄ for all i, we could therefore use the estimator

  β̆1 = n⁻¹ Σ(i=1 to n) (yi − ȳ)/(xi − x̄)   (120)

- β̆1 is not the same as the OLS estimator,

  β̂1 = [Σ(i=1 to n) (xi − x̄)yi] / [Σ(i=1 to n) (xi − x̄)²]   (121)

- How do we know OLS is better than this new estimator, β̆1? Generally, we do not: under MLR.1 through MLR.4, both estimators are unbiased.
- Comparing their variances is difficult in general. But if we add the homoskedasticity assumption, MLR.5, we can show that

  Var(β̂1) < Var(β̆1)   (122)

- This means β̂1 has a sampling distribution that is less spread out around β1 than that of β̆1. When comparing unbiased estimators, we prefer the one with the smaller variance.
- We can make very general statements for the multiple regression case, provided the five Gauss-Markov assumptions hold. However, we must also limit the class of estimators that we compare with OLS.
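The comparison in (122) can be checked by simulation (an illustrative sketch, not from the slides; the design values β0 = 1, β1 = 2 and the fixed regressor values {0, 1, 3} are assumptions, chosen so that xi ≠ x̄ for every i). Both estimators center on β1, but the OLS slope has the smaller Monte Carlo variance:

```python
import numpy as np

rng = np.random.default_rng(3)

def compare(reps=5000, beta0=1.0, beta1=2.0):
    """Monte Carlo means and variances of the OLS slope and of the
    'average of slopes' estimator breve_b1 = mean((y_i - ybar)/(x_i - xbar))."""
    # fixed design: xbar = 4/3 exactly, never equal to any x_i
    x = np.tile([0.0, 1.0, 3.0], 17)
    dx = x - x.mean()
    ols, breve = [], []
    for _ in range(reps):
        y = beta0 + beta1 * x + rng.normal(size=x.size)
        dy = y - y.mean()
        ols.append(dx @ dy / (dx @ dx))
        breve.append(np.mean(dy / dx))
    return np.mean(ols), np.var(ols), np.mean(breve), np.var(breve)

m_ols, v_ols, m_breve, v_breve = compare()
print(m_ols, m_breve)   # both close to beta1 = 2 (unbiased)
print(v_ols, v_breve)   # the OLS variance is the smaller of the two
```

Note that if every |xi − x̄| were the same, the two estimators would essentially coincide; the variance gap opens up when the xi are spread unevenly around x̄.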
THEOREM (Gauss-Markov): Under Assumptions MLR.1 through MLR.5, the OLS estimators β̂0, β̂1, ..., β̂k are the best linear unbiased estimators (BLUEs).

Start from the end of "BLUE".

- E (estimator): We must be able to compute an estimate from the observable data, using a fixed rule.
- L (linear): The estimator is a linear function of the yi. Such estimators have the general form

  β̃j = Σ(i=1 to n) wij yi   (123)

where the wij are functions of (xi1, ..., xik); the estimator can be a nonlinear function of the explanatory variables. The OLS estimators can be written in this way.
- U (unbiased): We must impose enough restrictions on the wij – we omit them here – so that

  E(β̃j) = βj,  j = 0, 1, ..., k   (124)

conditional on {(xi1, ..., xik): i = 1, ..., n}. We know the OLS estimators are unbiased under MLR.1 through MLR.4, but so are a lot of other linear estimators.
- B (best): This means smallest variance, which makes sense once we impose unbiasedness. What can be shown is that, under MLR.1 through MLR.5,

  Var(β̂j) ≤ Var(β̃j) for all j   (125)

(and usually the inequality is strict). If we did not impose unbiasedness, we could use silly rules – such as β̃j = 1 always – to get estimators with zero variance.
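For the simple regression case, the "L" and "U" parts are easy to verify directly (a minimal numpy sketch, not from the slides; the sample size and coefficients are illustrative). The OLS slope is Σ wi·yi with weights wi = (xi − x̄)/SSTx that depend only on the xi and satisfy Σ wi = 0 and Σ wi·xi = 1 — exactly the restrictions unbiasedness requires:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# the OLS slope written as a linear function of the y_i: b1 = sum(w_i * y_i)
dx = x - x.mean()
w = dx / (dx @ dx)               # weights depend only on the x_i
b1_linear = w @ y
b1_ols = np.polyfit(x, y, 1)[0]  # ordinary least-squares slope
print(b1_linear, b1_ols)         # the two coincide

# the restrictions on the weights that deliver unbiasedness
print(w.sum(), w @ x)            # 0 and 1 (up to floating-point error)
```

Since b1 = Σ wi·yi = β1·(Σ wi·xi) + β0·(Σ wi) + Σ wi·ui = β1 + Σ wi·ui, the two printed restrictions are what make E(β̂1) = β1.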
- How do we use the GM Theorem? If the Gauss-Markov assumptions hold, and we insist on unbiased estimators that are also linear functions of {yi: i = 1, 2, ..., n}, then we need look no further than OLS: it delivers the smallest possible variances.
- It might be possible (though not practical) to find unbiased estimators that are nonlinear functions of {yi: i = 1, 2, ..., n} and have smaller variances than OLS. The GM Theorem only allows linear estimators into the comparison group.
- Appendix 3A contains a proof of the GM Theorem.

If MLR.5 fails – that is, if Var(u|x1, ..., xk) depends on one or more of the xj – the GM conclusions do not hold: there may be linear, unbiased estimators of the βj with smaller variance.

Remember: failure of MLR.5 does not cause bias in the β̂j, but it does have two consequences:

1. The usual formulas for Var(β̂j), and therefore se(β̂j), are wrong.
2. The β̂j are no longer BLUE.

The first consequence is more serious, as it directly affects statistical inference (taken up next). The second means we may want to search for estimators other than OLS; this is not so easy, and with a large sample it may not be very important.
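The first consequence can be seen in a short simulation (an illustrative sketch, not from the slides; the design with sd(u|x) = |x| is one assumed form of heteroskedasticity). Under this failure of MLR.5 the usual standard error badly understates the true sampling variability of the slope, while the two agree under homoskedasticity:

```python
import numpy as np

rng = np.random.default_rng(5)

def mc(n=200, reps=3000, het=True):
    """Compare the average 'usual' OLS standard error of the slope with
    the true (Monte Carlo) sampling sd, with and without homoskedasticity."""
    slopes, usual_se = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        scale = np.abs(x) if het else np.ones(n)   # sd(u|x) = |x| breaks MLR.5
        y = 1.0 + 0.5 * x + scale * rng.normal(size=n)
        dx = x - x.mean()
        sst = dx @ dx
        b1 = dx @ (y - y.mean()) / sst
        uhat = (y - y.mean()) - b1 * dx            # OLS residuals
        usual_se.append(np.sqrt((uhat @ uhat / (n - 2)) / sst))
        slopes.append(b1)
    return np.std(slopes), np.mean(usual_se)

true_sd, avg_se = mc(het=True)
print(true_sd, avg_se)   # the usual se understates the true sampling sd
true_sd, avg_se = mc(het=False)
print(true_sd, avg_se)   # under MLR.5 the two agree
```

Note that β̂1 remains centered on 0.5 in both designs: heteroskedasticity breaks the variance formula, not unbiasedness.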