ECON7035 & 8040
LEC7: INDICATOR (DUMMY) VARIABLES
WWLJ Chapter 8: 8.1 to 8.4
HGL Chapter 7: 7.1, 7.2, 7.3.1
Learning objectives
Describing qualitative information
A single dummy independent variable
Using dummy variables for multiple categories
Interaction involving dummy variables
Describing qualitative information
Qualitative information:
• Comes in the form of binary information
• Example 1: A person is male or female
• Example 2: A student attends a private or public school
A way to incorporate qualitative information is to use dummy
variables (i.e. binary variables or zero–one variables).
• Example: female is a dummy variable, which takes the value
1 if the student is female, and 0 otherwise.
Dummy variables may appear as the dependent variable or as independent
variables. In this chapter we only consider dummy variables
as explanatory variables.
Single dummy independent variable
How is binary information incorporated into a regression model?
wage = β1 + δ1 female + β2 educ + e
Dummy variable female:
= 1 if the person is a woman
= 0 if the person is a man
δ1 = the wage gain/loss if the person is a woman rather than a man
(holding other things fixed).
δ1 is the difference in hourly wage between females and males
given the same amount of education:
δ1 = E(wage | female = 1, educ) − E(wage | female = 0, educ)
Alternative interpretation of coefficient: δ1 = the difference in
mean wage between men and women with the same level of education.
men: wage = β1 + β2 educ
women: wage = (β1 + δ1) + β2 educ
[Figure: wage plotted against educ. Two parallel lines with common slope β2; the men's line has intercept β1 and the women's line has intercept β1 + δ1 (intercept shift).]
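The intercept shift can be illustrated with a small simulation. This is a minimal sketch, assuming NumPy is available; the data are simulated and the "true" parameter values (5, −2, 1.5) are made up for illustration, not taken from wages.xlsx.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data (illustrative values only, not the lecture's data set)
female = rng.integers(0, 2, size=n)          # 1 = woman, 0 = man
educ = rng.uniform(8, 20, size=n)            # years of education
e = rng.normal(0, 1, size=n)                 # error term
wage = 5.0 - 2.0 * female + 1.5 * educ + e   # assumed beta1=5, delta1=-2, beta2=1.5

# Design matrix: intercept, female dummy, educ
X = np.column_stack([np.ones(n), female, educ])
b1, d1, b2 = np.linalg.lstsq(X, wage, rcond=None)[0]

print(f"intercept for men (beta1):            {b1:.3f}")
print(f"intercept for women (beta1 + delta1): {b1 + d1:.3f}")
print(f"common slope (beta2):                 {b2:.3f}")
```

The estimate of δ1 should land near the assumed value of −2, which is the vertical distance between the two parallel lines.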
Dummy variable trap
wage = β1 + γ1 male + δ1 female + β2 educ + e
This model cannot be estimated: there is perfect collinearity because
female + male = 1, so male = 1 − female is an exact linear function of
female and the intercept. This is the dummy variable trap.
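One way to see the trap numerically is to check the rank of the design matrix. The sketch below assumes NumPy and uses six made-up observations; the variable names are illustrative.

```python
import numpy as np

# Six hypothetical observations
female = np.array([1, 0, 1, 0, 1, 0])
male = 1 - female                      # male + female = 1 for every row
educ = np.array([12, 16, 10, 14, 18, 11])
const = np.ones_like(female)

# Intercept plus BOTH dummies: the columns are linearly dependent
X_trap = np.column_stack([const, male, female, educ])
# Intercept plus one dummy: full column rank
X_ok = np.column_stack([const, female, educ])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(X_trap.shape[1], "columns, rank", rank_trap)
print(X_ok.shape[1], "columns, rank", rank_ok)
```

When the rank is below the number of columns, X'X is singular and the OLS coefficients are not uniquely determined.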
Base group or benchmark group
When using dummy variables, one category always has to be omitted:
wage = β1 + δ1 female + β2 educ + e   (the base category is men)
wage = β1 + γ1 male + β2 educ + e   (the base category is women)
Alternatively, one could omit the intercept:
wage = γ1 male + δ1 female + β2 educ + e
Disadvantages:
1. More difficult to test for differences between parameters
2. R-squared formula only valid if regression contains intercept.
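The three specifications describe the same fitted lines and differ only in how the intercepts are parameterised. A minimal sketch on simulated data, assuming NumPy (the helper ols and all numeric values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated data with assumed parameter values (illustration only)
female = rng.integers(0, 2, size=n)
male = 1 - female
educ = rng.uniform(8, 20, size=n)
wage = 5.0 - 2.0 * female + 1.5 * educ + rng.normal(0, 1, size=n)
ones = np.ones(n)

def ols(*columns):
    """OLS coefficients of wage on the given regressor columns."""
    X = np.column_stack(columns)
    return np.linalg.lstsq(X, wage, rcond=None)[0]

b1_m, delta1, _ = ols(ones, female, educ)       # base category: men
b1_w, gamma1, _ = ols(ones, male, educ)         # base category: women
g_male, d_female, _ = ols(male, female, educ)   # no intercept

print("delta1 =", round(delta1, 3), " gamma1 =", round(gamma1, 3))
print("intercepts (men, women):", round(g_male, 3), round(d_female, 3))
```

Switching the base category flips the sign of the dummy coefficient, and in the no-intercept form each dummy coefficient is that group's own intercept.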
Estimated wage equation
with intercept shift (wages.xlsx)
Holding education and experience
fixed, women earn $6.37 less per
hour than men.
Does that mean that women are discriminated against?

We can conclude that the $6.37 differential cannot be explained
by different levels of education and experience between men and
women and is due to gender or factors associated with gender
which have not been controlled for in the regression.
Program Evaluation
Example: effects of accessing online subject materials on assignment mark
(AS1.XLSX)
Dependent variable: assignment mark.
Regressor of interest: dummy indicating whether the student reviewed the online material.
This is an example of program evaluation:
Treatment group (= reviewed online) vs. control group (= not reviewed)
The estimated equation implies that a student who reviews the online materials
prior to handing in the assignment has a predicted assignment mark
about 1 point higher on average than a comparable student (in terms of
mid-term test mark and the number of tutorials attended) who did
not review the material.
Using a dummy independent variable
to account for outliers (gfc.xlsx)
Outliers can occur when sampling from a population.
They may differ in some relevant aspect from the rest of the population.
Example: Did countries that adopted large fiscal packages
outperform those that didn’t?
The IMF forecast error variable is defined as the excess of
2009 actual GDP growth over the IMF predicted GDP growth.
We expect Greece, Hungary, Iceland and Ireland to be outliers due
to their fiscal circumstances.
We discard these four observations.
Using a dummy independent variable
to account for outliers (cont.)

It was found that a positive and significant relationship existed
between fiscal stimulus and performance.

Create a dummy variable ‘DUM’ for observations that are Greece,
Hungary, Iceland and Ireland.


The dummy variable is not significant.
There is not a strong case to treat the observations on Greece,
Hungary, Iceland and Ireland differently.
Interpreting coefficients on dummy explanatory variables when the dependent
variable is log(y)
Example: housing price regression (hprice1.xlsx)
Dummy indicating whether the house is of colonial style.
As the dummy for colonial style changes from 0 to 1, the house
price increases by approximately 5.4 percent.
The coefficient on a dummy variable, when multiplied by 100, is interpreted
as the approximate percentage difference in y, holding all other factors constant.
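The "multiply by 100" reading is an approximation. With log(y) as the dependent variable, the exact percentage difference implied by a dummy coefficient δ is 100·(exp(δ) − 1). A small sketch comparing the two for the colonial coefficient, taken as 0.054 from the 5.4 figure above (the function name is hypothetical):

```python
import math

def exact_pct_change(coef):
    """Exact percentage difference in y implied by a dummy coefficient in a log(y) model."""
    return 100.0 * (math.exp(coef) - 1.0)

coef_colonial = 0.054                  # from the 5.4 figure quoted above
approx = 100.0 * coef_colonial         # approximate reading
exact = exact_pct_change(coef_colonial)

print(f"approximate: {approx:.1f}%   exact: {exact:.2f}%")
```

For small coefficients the two numbers are close; the gap widens as the coefficient grows in absolute value.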
Using dummy variables for multiple categories
Example: demand for Fords across LGAs (local government areas) in Australia
(PC_CARS.XLSX)
1. Define membership in each category by a dummy variable.
2. Leave out one category (which becomes the base category).
Dummy variables
are circled in red.
Holding other things fixed, the proportion of
Fords registered in Victoria is 15.4% higher
than in WA (= the base category).
Where m_fam is average family income and nemp is the unemployment
rate in the LGA.
Incorporating ordinal information
using dummy variables
Example: city credit ratings on government bond interest rates
GBR = β1 + β2 CR + other factors
where GBR is the government bond rate and CR is the credit rating from 0–4
(0 = worst, 4 = best).
This specification would probably not be appropriate, as the credit rating only contains
ordinal information: entering CR linearly forces every one-step change in rating to have
the same effect β2. A better way to incorporate this information is to define dummies:
GBR = β1 + δ1 CR1 + δ2 CR2 + δ3 CR3 + δ4 CR4 + other factors
Dummies indicating whether the particular rating applies; e.g. CR1=1 if CR=1 and CR1=0
otherwise. All effects are measured in comparison to the worst rating (= base category).
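Building the rating dummies is mechanical. A minimal sketch in plain Python, using made-up ratings; the function name rating_dummies is hypothetical:

```python
def rating_dummies(cr, levels=(1, 2, 3, 4)):
    """Return the dummies CR1..CR4 for one credit rating; CR = 0 is the base category."""
    return {f"CR{k}": int(cr == k) for k in levels}

ratings = [0, 3, 1, 4, 2]              # hypothetical credit ratings for five cities
rows = [rating_dummies(cr) for cr in ratings]

for cr, row in zip(ratings, rows):
    print(cr, row)
```

A city with the worst rating (CR = 0) has all four dummies equal to zero, so its expected bond rate is captured by the intercept and the other factors.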
Interactions among dummy variables
Examine the interaction term between female and married in the
log wage model.
We can examine the wage difference between single and married
individuals, separately for females and males.
We can test the null hypothesis that the gender differential does not
depend on marital status.
The ‘marriage premium’ for a female is .113 − .145 = −.032 (about
−3.2%) and for a male it is .113 (about 11.3%).
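The arithmetic behind the two premiums can be written out as a short derivation. The coefficient symbols below (δ for female, γ for married, θ for the interaction) are labels assumed here for illustration; only the values .113 and −.145 come from the estimates quoted above.

```latex
\begin{align*}
\log(wage) &= \beta_1 + \delta\, female + \gamma\, married
              + \theta\, (female \times married) + \dots \\
\text{marriage premium (male)}   &= \gamma = 0.113 \\
\text{marriage premium (female)} &= \gamma + \theta = 0.113 - 0.145 = -0.032
\end{align*}
```

Under this labelling, the null hypothesis that the gender differential does not depend on marital status corresponds to θ = 0.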
Allowing for different slopes
log(wage) = β1 + δ1 female + β2 educ + δ2 female × educ + u
where female × educ is the interaction term.
β1 = intercept men
β1 + δ1 = intercept women
β2 = slope men
β2 + δ2 = slope women
Interesting hypotheses
H0: δ2 = 0
The return to education is the same for men and women.
H0: δ1 = δ2 = 0
The whole wage equation is the same for men and women.
Allowing for different slopes (cont.)
Interacting both the
intercept and the slope
with the female dummy
enables one to model
completely independent
wage equations for men
and women.
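The claim that a full set of interactions reproduces two separate wage equations can be checked numerically. This is a sketch on simulated data, assuming NumPy; the parameter values and the helper fit_line are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Simulated data; the coefficients below are assumptions, not lecture estimates
female = rng.integers(0, 2, size=n)
educ = rng.uniform(8, 20, size=n)
u = rng.normal(0, 0.3, size=n)
log_wage = 1.0 - 0.3 * female + 0.09 * educ - 0.01 * female * educ + u

# Pooled regression with the dummy and the interaction term
X = np.column_stack([np.ones(n), female, educ, female * educ])
b1, d1, b2, d2 = np.linalg.lstsq(X, log_wage, rcond=None)[0]

def fit_line(mask):
    """Intercept and slope from regressing log_wage on educ within one group."""
    Xg = np.column_stack([np.ones(mask.sum()), educ[mask]])
    return np.linalg.lstsq(Xg, log_wage[mask], rcond=None)[0]

int_m, slope_m = fit_line(female == 0)   # men only
int_w, slope_w = fit_line(female == 1)   # women only

print("men:  ", round(b1, 4), round(b2, 4), "vs", round(int_m, 4), round(slope_m, 4))
print("women:", round(b1 + d1, 4), round(b2 + d2, 4), "vs", round(int_w, 4), round(slope_w, 4))
```

The pooled coefficients (β1, β2) match the men-only regression, and (β1 + δ1, β2 + δ2) match the women-only regression, up to floating-point error.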
Example: log hourly wage equation
The estimated return to education for men is 0.088 (about 8.8%).
The estimated return to education for women is 0.088 + 0.00005 = 0.08805
(also about 8.8%).
No evidence against the hypothesis that the return to education is
the same for men and women since the coefficient on the
interaction term is insignificant.
Example: log hourly wage equation (cont.)
Does this mean that there is no significant evidence of
lower pay for women at the same levels of educ and exper?
No: when we introduced female × educ, the SE of the
coefficient on female increased from 0.027 to 0.152 due to
multicollinearity between female and female × educ.
Testing for differences in regression
functions across groups
Unrestricted model (contains full set of interactions)
assignmark = β1 + δ1 female + β2 View + δ2 female × View + β3 midtest
+ δ3 female × midtest + β4 numtutes + δ4 female × numtutes + u
where
assignmark = assignment mark
View = 1 if viewed online material, = 0 otherwise
midtest = student's mark in test
numtutes = number of tutorials attended
Restricted model (same regression for both groups)
assignmark = β1 + β2 View + β3 midtest + β4 numtutes + u
Testing for differences in regression
functions across groups (cont.)
Null hypothesis
H0: δ1 = 0, δ2 = 0, δ3 = 0, δ4 = 0
All interaction effects are zero, i.e. the same regression
coefficients apply to men and women.
Estimation of the unrestricted model
Tested individually, the hypothesis that the interaction
effects are zero cannot be rejected.
Joint test with F statistic
F = [(SSRr − SSRur) / q] / [SSRur / (n − k − 1)]
= ([1291.56 − 1208.102] / 4) / (1208.102 / 215) = 3.71
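The calculation is easy to reproduce. A minimal sketch in plain Python using the sums of squared residuals quoted above; the function name f_statistic is hypothetical:

```python
def f_statistic(ssr_r, ssr_ur, q, df_ur):
    """F statistic for q restrictions: ((SSRr - SSRur) / q) / (SSRur / df_ur)."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# Values from the slide: q = 4 restrictions, n - k - 1 = 215
F = f_statistic(ssr_r=1291.56, ssr_ur=1208.102, q=4, df_ur=215)
print(round(F, 2))  # 3.71
```

The statistic is then compared with a critical value from the F distribution with 4 and 215 degrees of freedom.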
Alternative way to compute F statistic in the given case

Run separate regressions for men and for women; the unrestricted
SSR is given by the sum of the SSR of these two regressions.



Run a regression for the restricted model and store SSR.
If the test is computed in this way, it is called the Chow test.
Important: test assumes a constant error variance across groups.
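The Chow procedure can be sketched as follows on simulated data with a single regressor x. This assumes NumPy; the data, parameter values and helper names are made up for illustration and are not the AS1.XLSX example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Simulated data: one regressor x, two groups (assumed parameter values)
female = rng.integers(0, 2, size=n)
x = rng.normal(0, 1, size=n)
y = 1.0 + 0.5 * female + 2.0 * x + 0.4 * female * x + rng.normal(0, 1, size=n)

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def design(mask):
    """Intercept and x for the observations selected by mask."""
    return np.column_stack([np.ones(mask.sum()), x[mask]])

men, women = female == 0, female == 1

# Unrestricted SSR: sum of the SSRs from the two separate regressions
ssr_ur = ssr(design(men), y[men]) + ssr(design(women), y[women])
# Restricted SSR: one pooled regression without the dummy or interactions
ssr_r = ssr(np.column_stack([np.ones(n), x]), y)

k = 1                      # slope regressors in each group's equation
q = k + 1                  # restrictions: equal intercepts and equal slopes
df_ur = n - 2 * (k + 1)    # unrestricted degrees of freedom
chow_F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

# Cross-check: the fully interacted pooled model gives the same unrestricted SSR
ssr_full = ssr(np.column_stack([np.ones(n), female, x, female * x]), y)
print(round(chow_F, 2), np.isclose(ssr_ur, ssr_full))
```

The cross-check at the end shows why both routes give the same F statistic: the fully interacted regression and the two separate regressions have identical residuals.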
CONTROLLING FOR SEASONALITY &
EVENTS IN TIME SERIES DATA
(1) Seasonal Dummies
[Figure: Quarterly Retail Turnover: Department Stores. Quarterly series from Sep-83 to Mar-09; vertical axis from 0.00 to 7,000.00.]
Define the seasonal dummy variables as follows.
D1t = 1 if t is first quarter, = 0 otherwise;
D2t = 1 if t is second quarter, = 0 otherwise;
D3t = 1 if t is third quarter, = 0 otherwise; and
D4t = 1 if t is fourth quarter, = 0 otherwise.
In Gretl click on Add and choose Periodic Dummies to create the seasonal
dummies.
The model can be specified as
yt = β1 + δ2D2t + δ3D3t + δ4D4t + β2Xt + εt
• Reference category = first quarter,
• δk = difference between the intercepts of quarter k and the first quarter (k = 2, 3, 4)
or
yt = δ1D1t + δ2D2t + δ3D3t + δ4D4t + β2Xt + εt
• No reference category,
• δk = the intercept of quarter k (k = 1, ..., 4)
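The two seasonal specifications fit the data identically and differ only in how the quarterly intercepts are reported. A minimal sketch on simulated quarterly data, assuming NumPy (all numeric values are made up; this is not the retail turnover series):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 40                                       # ten years of quarterly data

quarter = np.arange(T) % 4 + 1               # 1, 2, 3, 4, 1, 2, ...
# Columns D1..D4: one 0/1 dummy per quarter
D = np.column_stack([(quarter == k).astype(float) for k in (1, 2, 3, 4)])

x = rng.normal(0, 1, size=T)
season = np.array([10.0, 12.0, 11.0, 15.0])  # assumed quarterly intercepts
y = season[quarter - 1] + 0.8 * x + rng.normal(0, 0.5, size=T)

# (a) Intercept plus D2, D3, D4: first quarter is the reference category
X_ref = np.column_stack([np.ones(T), D[:, 1:], x])
coef_ref = np.linalg.lstsq(X_ref, y, rcond=None)[0]   # beta1, d2, d3, d4, beta2

# (b) All four dummies and no intercept: no reference category
X_all = np.column_stack([D, x])
coef_all = np.linalg.lstsq(X_all, y, rcond=None)[0]   # d1, d2, d3, d4, beta2

print("reference form:   ", np.round(coef_ref, 3))
print("no-intercept form:", np.round(coef_all, 3))
```

In form (a) each δk equals the quarter-k intercept from form (b) minus the first-quarter intercept, and the coefficient on X is the same in both.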