Grado en Ingeniería (Degree in Engineering)
Asignatura (Subject): Estadística (Statistics)
Tema 4 (Topic 4): Regression
Regression
Introduction. Non-deterministic relationships
Simple Linear Regression
Model
Estimation
Diagnosis / Inference
Multiple Regression
Multiple dispersion graphs
Estimation
Multicollinearity
Dummy variables / Interactions
Objectives
To know how to analyse the relationship between variables using a
linear regression model that describes the influence of a variable X on
another variable Y.
To know how to obtain point estimates of the model parameters.
To know how to construct confidence intervals and carry out hypothesis
tests on those parameters.
To know how to estimate the mean of Y for a given value of X.
To know how to predict future values of the variable Y.
Relationships between variables
Regression studies the relationship between variables.
Two types of relationship can exist:
- Deterministic relationships (exact)
- Non-deterministic relationships (not exact)
Deterministic relationships
We call a relationship between two variables deterministic when,
knowing the value of one of the variables, we can determine the value
of the other EXACTLY.
This corresponds to an exact mathematical relationship, a function:
Y = f(X)
Non-deterministic relationships
The relationship between the two variables is not exact: knowing
the value of one does not allow us to know the exact value of the
other.
We know that a relationship exists between the variables,
but it isn't exact!
Regression
What does regression do?
It builds a linear model that approximates the relationship between
the variables.
The relationship isn't exact, and neither is the model,
but it is very useful!
Regression: residuals
If the relationship is not exact, we will always commit an error:
e = residual
The residual is the vertical distance between each point (real data) and the
regression line: the part of Y that the model can't predict.
We estimate the regression line so that these errors are minimised
(criterion: least squares, i.e. the sum of the squared residuals is as small
as possible); with this criterion the mean residual is zero.
How is the regression line calculated?
[Figure: regression line and its gradient]
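The formulas on this slide did not survive extraction. As a minimal sketch, assuming the standard ordinary least-squares formulas (gradient = sum of cross-deviations of X and Y divided by the sum of squared deviations of X; intercept chosen so the line passes through the point of means), the line can be computed as below. The helper `fit_line` and the data are hypothetical, for illustration only; they are not the production-cost sample.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x (hypothetical helper)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # gradient (slope)
    b0 = my - b1 * mx       # intercept: line passes through (mean x, mean y)
    return b0, b1

# Illustrative data only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.0]
b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"b0={b0:.3f}, b1={b1:.3f}")
print(f"mean residual = {fmean(residuals):.1e}")  # zero up to rounding error
```

The mean residual is zero because of the intercept formula, which matches the "mean error is zero" property of the least-squares criterion.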
How do we name the variables?
X: independent (explanatory) variable; the value that we know.
Y: dependent variable; the response to be explained, what we want to predict.
Regression: an example
An example: we will analyse the relationship between the production cost
of a process and the number of pieces produced.
[Figure: scatterplot of production cost against number of pieces produced]
Y = Production cost
X = Number of pieces produced
We will calculate the regression line using Statgraphics.
Regression: an example
[Figure: the same scatterplot with the fitted regression line]
coste prod = 0.783429 + 0.669509*piezas producidas
(coste prod = production cost; piezas producidas = pieces produced)
Regression: an example
[Figure: scatterplot with the fitted regression line]
coste prod = 0.783429 + 0.669509*piezas producidas
According to the model, a factory that produces one million units will have a
production cost of:
coste prod = 0.783429 + 0.669509*1 ≈ 1.45 million €
Will all the factories with this volume of production have the same cost?
Regression: an example
Will all the factories with this volume of production have the same cost?
[Figure: scatterplot showing the spread of production costs around the regression line]
There is a range of production costs, from 2.8 to 4.8 million €.
Specifically, for factory A: production cost = 1.66 million €
But the model says:
coste prod = 0.783429 + 0.669509*1 ≈ 1.45 million €
Therefore the error committed (the residual) is 1.66 - 1.45 ≈ 0.21 million €
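The same arithmetic as a short sketch: the coefficients are the Statgraphics estimates above and the observed cost of factory A (1.66 million €) is taken from the slide. The function name `predict_cost` is hypothetical.

```python
# Fitted line from the slides (cost in million €, X in millions of pieces)
B0, B1 = 0.783429, 0.669509

def predict_cost(pieces_millions):
    """Predicted production cost for a given production volume."""
    return B0 + B1 * pieces_millions

predicted = predict_cost(1.0)        # model prediction for 1 million pieces
observed_a = 1.66                    # factory A, from the slide
residual = observed_a - predicted    # e = observed - predicted
print(f"predicted = {predicted:.4f}, residual = {residual:.4f}")
```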
Assumptions of the model
Can we apply the regression model to all types of data?
No. For the conclusions we draw from our models to be correct, the data
we use must satisfy the following properties:
1. Linearity
2. Homoscedasticity
3. Independence
4. Normality (normally distributed data)
Linearity
This is a fundamental assumption: the data must follow a
linear trend and be highly correlated.
Linearity: what happens if the data are not linear?
The regression will not correctly represent the
relationship between the variables.
If the data are not linear, we can look for a mathematical
transformation (e.g., log, sqrt) that improves the linearity.
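As an illustration, assuming an exponential relationship Y = e^X (synthetic data, not from the slides), the Pearson correlation between X and Y is well below 1, while after taking the natural logarithm of Y the relationship becomes exactly linear. The helper `pearson_r` is hypothetical and written out so that the sketch needs only the standard library.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [float(i) for i in range(1, 11)]
y = [math.exp(xi) for xi in x]          # curved (exponential) relationship
log_y = [math.log(yi) for yi in y]      # log transform: log_y equals x

r_raw = pearson_r(x, y)
r_log = pearson_r(x, log_y)
print(f"r(X, Y) = {r_raw:.3f}   r(X, log Y) = {r_log:.3f}")
```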
Homoscedasticity
This assumption means that the data have constant variance around the
regression line, giving a graph of the following type:
[Figure: scatterplot with constant spread around the regression line]
• When the variance of the data is constant, we say that the data are
HOMOSCEDASTIC.
• What happens if the data are not homoscedastic?
Homoscedasticity: heteroscedastic data
When the variance is not constant (here it grows with the independent
variable), we say the data are HETEROSCEDASTIC.
How does this affect the regression?
[Figure: scatterplot of Gastos (expenses, ×10⁶) against Ingresos (income, ×10⁵)]
The prediction errors will be larger by an amount that grows
with the value of the variables!
We shouldn't apply regression directly to such heteroscedastic data.
We have to transform the data first, using the logarithm (LOG).
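A minimal numeric sketch of why the logarithm helps, assuming multiplicative errors (synthetic numbers, not the Gastos-Ingresos sample): if each observation deviates from the trend 2X by ±20%, the spread of Y grows with X, but the spread of log Y is the same at every X. The helper `spreads` is hypothetical.

```python
import math

def spreads(x, rel_error=0.2, slope=2.0):
    """Spread between the +/- rel_error observations around the trend slope*x,
    on the original scale and on the log scale."""
    trend = slope * x
    high, low = trend * (1 + rel_error), trend * (1 - rel_error)
    return high - low, math.log(high) - math.log(low)

xs = [1.0, 10.0, 100.0]
raw = [spreads(x)[0] for x in xs]   # about 0.8, 8, 80: grows with x
logs = [spreads(x)[1] for x in xs]  # log(1.2 / 0.8) = log(1.5) at every x
for x, r, l in zip(xs, raw, logs):
    print(f"x={x:6.1f}  raw spread={r:7.2f}  log spread={l:.4f}")
```

On the log scale, the multiplicative error becomes an additive one of constant size, which is exactly the homoscedasticity the model requires.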
Testing for linearity and homoscedasticity
We check the linearity and homoscedasticity assumptions by a graphical
analysis of the data (scatterplot / X-Y plot):
[Figure: scatterplot of production cost against number of pieces produced]
If the data satisfy these assumptions, we can continue with the analysis.
Independence
The observations that we analyse must be mutually independent:
- If we analyse production cost against production volume for
different factories, we assume that the data from one factory do not
affect the data from another.
You CANNOT use this regression analysis on data from a time series,
as each observation depends on the previous ones.
Normally distributed
The last assumption that the model requires is that the data analysed
are normally distributed. What does this mean?
[Figure: scatterplot of production cost against number of pieces produced]
We have said that for each value of X, Y can take values in a
certain range.
We assume that the values of Y for each value of X follow a
normal distribution.
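The four assumptions can be summarised compactly in the usual textbook notation for the simple linear regression model (added here as a summary, not taken from the slides), where $u_i$ is the random error:

```latex
y_i = \beta_0 + \beta_1 x_i + u_i, \qquad
u_i \sim N(0, \sigma^2) \ \text{independent}, \quad i = 1, \dots, n
\\
\Longrightarrow \quad Y \mid X = x \;\sim\; N(\beta_0 + \beta_1 x,\ \sigma^2)
```

Linearity is the straight-line mean $\beta_0 + \beta_1 x$, homoscedasticity is the constant $\sigma^2$, and independence and normality are the conditions on the errors $u_i$.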
The model
If the data satisfy the four assumptions discussed, we can use the
fitted model to estimate Y from X:
coste prod = 0.783429 + 0.669509*piezas producidas
The model
coste prod = 0.783429 + 0.669509*piezas producidas
β0 (here 0.783429) is the value of Y when X has value 0
(not always a feasible condition).
β1 (here 0.669509) is the gradient of the line:
A "+" sign indicates that the two variables grow together.
A "-" sign indicates that one variable grows as the other decreases.
It also tells us how Y changes when X changes:
∆Y = β1 ∆X
Therefore, in our previous example, how much will the production cost
increase if the number of pieces produced increases by one million?
∆(coste prod) = 0.669509 × ∆(piezas producidas) = 0.669509 × 1 ≈ 0.67 million €
Regression: a problem…
In regression we start with a data sample and from it we estimate
the model:
[Figure: scatterplot of production cost against number of pieces produced]
coste prod = 0.783429 + 0.669509*piezas producidas
Regression: a problem…
If we change the data sample, we will change the parameters of the
model (the numbers that we have calculated).
Is it possible to select a sample that would give us the following
result?
[Figure: scatterplot with no trend; the fitted line is horizontal]
If this happens, the gradient of the line, β1, is ZERO and we say that
THE REGRESSION IS NOT SIGNIFICANT.
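A small simulation sketch of this idea, under assumed values (a synthetic population with true line Y = 1 + 0.5X plus normal noise; none of these numbers come from the slides): each random sample gives a different estimate of β1, and the estimates scatter around the true gradient.

```python
import random
from statistics import fmean

def slope(x, y):
    """Least-squares gradient of y on x."""
    mx, my = fmean(x), fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

random.seed(1)                       # reproducible illustration
TRUE_B0, TRUE_B1, NOISE_SD, N = 1.0, 0.5, 1.0, 30   # assumed population values

estimates = []
for _ in range(200):                 # 200 independent samples of size N
    x = [random.uniform(0, 10) for _ in range(N)]
    y = [TRUE_B0 + TRUE_B1 * xi + random.gauss(0, NOISE_SD) for xi in x]
    estimates.append(slope(x, y))

print(f"min={min(estimates):.3f}  mean={fmean(estimates):.3f}  max={max(estimates):.3f}")
```

The spread of these estimates is what the standard error SE(β̂1) measures, and it is what the inference tools that follow take into account.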
Regression: a problem…
We want to be sure that our regression is valid regardless of the
sample considered:
valid for the whole population studied, and not just for one specific sample.
WE WANT TO BE SURE THAT β1 IS NOT EQUAL TO ZERO IN THE POPULATION.
Inferences about the regression
To analyse whether β1 is zero we have three tools:
1. Confidence intervals
2. Hypothesis tests using the t-statistic
3. Hypothesis tests using the p-value
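Confidence intervals
From the sample, we calculate a range of values that contains the true value
of β1 with a given confidence level (generally 95%):
$\left(\hat\beta_1 - 2\,\mathrm{SE}(\hat\beta_1);\ \hat\beta_1 + 2\,\mathrm{SE}(\hat\beta_1)\right)$
where $\mathrm{SE}(\hat\beta_1)$ is the standard error of the estimate (the factor 2
approximates the 95% critical value).
If the value "0" does not belong to the interval,
the parameter is SIGNIFICANT!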
Confidence intervals
coste prod = 0.783429 + 0.669509*piezas producidas
$\left(\hat\beta_1 - 2\,\mathrm{SE}(\hat\beta_1);\ \hat\beta_1 + 2\,\mathrm{SE}(\hat\beta_1)\right) = (0.67 - 2 \cdot 0.07;\ 0.67 + 2 \cdot 0.07) = (0.53;\ 0.81)$
"0" does not belong to the interval => the parameter is significant!
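The same interval as a short sketch, using the slide's estimate and its standard error (0.07, rounded as on the slide) and the approximate factor 2 instead of an exact t critical value. The function name `approx_ci` is hypothetical.

```python
def approx_ci(estimate, std_error, factor=2.0):
    """Approximate 95% confidence interval: estimate +/- factor * SE."""
    return estimate - factor * std_error, estimate + factor * std_error

b1, se_b1 = 0.669509, 0.07               # from the production-cost example
low, high = approx_ci(b1, se_b1)
significant = not (low <= 0.0 <= high)   # significant if 0 lies outside the interval
print(f"CI = ({low:.2f}; {high:.2f}), significant: {significant}")
```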
Hypothesis test
An alternative way of checking that β1 is not zero is to set up a
hypothesis test (in the standard form):
H0: β1 = 0
H1: β1 ≠ 0
Statgraphics gives us the p-value for this test directly:
if p < 0.05, we reject H0
=> the regression is significant.
Hypothesis test: t-statistic
Besides the p-value, we have another way to resolve the
hypothesis test, the t-statistic $t = \hat\beta_1 / \mathrm{SE}(\hat\beta_1)$:
H0: β1 = 0
H1: β1 ≠ 0
|t| > 2 => we reject H0
|t| < 2 => we do not reject H0
In our example |t| > 2 => we reject H0:
the regression is significant.
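A sketch of how the t-statistic and its two-sided p-value relate. As a simplifying assumption, the p-value here uses the standard normal approximation (via `math.erfc`), which is close to the Student-t value only for large samples; Statgraphics reports the exact t-based p-value. The inputs are the rounded estimate and standard error of the production-cost example, and the function name `t_test` is hypothetical.

```python
import math

def t_test(estimate, std_error):
    """t-statistic for H0: beta = 0 and an approximate two-sided p-value.

    The p-value uses the normal approximation P(|Z| > |t|) = erfc(|t| / sqrt(2)),
    which is only accurate for large samples.
    """
    t = estimate / std_error
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_approx

t, p = t_test(0.669509, 0.07)
print(f"t = {t:.2f}, approx. p = {p:.2g}")
print("reject H0" if abs(t) > 2 else "do not reject H0")
```

Under the normal approximation, |t| = 2 corresponds to p ≈ 0.046, which is why the "|t| > 2" rule of thumb and the "p < 0.05" rule usually agree.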
How good is the model? R²
The coefficient R² (R-squared) indicates the proportion of the variability
of Y that is explained by X (through the model).
Example: R² = 71.76%
In simple regression, R² equals the squared correlation coefficient between X and Y.
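A minimal sketch of how R² is computed, assuming the usual definition R² = 1 - SSE/SST (residual sum of squares over total sum of squares), with a check that in simple regression it matches the squared correlation. The data are illustrative, not from the slides.

```python
from statistics import fmean

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.0]

mx, my = fmean(x), fmean(y)
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)            # total sum of squares (SST)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

b1 = sxy / sxx
b0 = my - b1 * mx
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual sum of squares

r_squared = 1 - sse / syy
r = sxy / (sxx * syy) ** 0.5                   # Pearson correlation
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```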
Summary
We study the data and check whether the assumptions are satisfied.
If not, we transform the data using mathematical functions.
We fit the model.
We use confidence intervals and hypothesis tests to see whether X is
significant for Y (does X influence Y?).
Diagnostics
The decisions that we take based on the information given by a
regression model are important.
We need to be sure that our conclusions are correct.
For this we use:
Inference: hypothesis tests, confidence intervals…
Diagnostics: to check once more whether the assumptions made remain valid.
In the diagnosis of the model, we check that the random part of the
model (the residuals) does not contain any additional information
and does not reveal a violation of the assumptions (linearity,
homoscedasticity, independence, normality).
Diagnostics
The diagnosis is performed by visual inspection of the residual
plots.
They should have the following general appearance: a random, patternless
cloud of points around zero, with constant spread.
Diagnostics
We cannot accept residuals that display other types of behaviour:
[Figure: two residual plots showing unacceptable (systematic) patterns]
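One typical unacceptable pattern can be reproduced in a few lines. Assuming a curved relationship (synthetic data Y = X², not from the slides) fitted with a straight line, the residuals are positive at both ends and negative in the middle: a U shape that reveals a violation of linearity. The helper `fit_line` is hypothetical.

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares intercept and gradient of y on x."""
    mx, my = fmean(x), fmean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

x = [float(i) for i in range(-5, 6)]   # -5 ... 5
y = [xi ** 2 for xi in x]              # curved (quadratic) relationship
b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Crude text "residual plot": sign and value of each residual against X
for xi, e in zip(x, residuals):
    print(f"{xi:5.1f} {'+' if e > 0 else '-'} {e:7.2f}")
```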
Multiple Regression
In a multiple regression model, we want to know the value of a
response variable that results from more than one explanatory
variable:
$\hat{y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \dots + \hat\beta_k X_k$
In this expression, each of the β-coefficients represents the
individual influence that each X variable has on Y.
Advantages:
The assumptions of the model are the same as for simple regression.
So are the hypothesis tests, diagnosis, etc.
Slight inconveniences:
The visualisation of the graphs is slightly more complicated.
We need to redefine the R² coefficient.
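A minimal sketch of how the coefficients of a multiple regression can be estimated, assuming ordinary least squares solved through the normal equations (XᵀX)β = Xᵀy with plain Gaussian elimination; statistical packages such as Statgraphics use more numerically robust methods. The helper names `solve` and `fit_multiple` and the data are hypothetical; the data are built exactly from y = 1 + 2·X1 - 3·X2, so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve the linear system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def fit_multiple(xs, y):
    """OLS coefficients [b0, b1, ..., bk] for rows xs = [[x1, ..., xk], ...]."""
    design = [[1.0] + list(row) for row in xs]       # column of 1s for the constant
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]
print([round(b, 6) for b in fit_multiple(xs, y)])
```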
Multiple Regression: graphs
Each cell of the graph (a scatterplot matrix) represents the pairwise
relationship between two variables:
[Figure: scatterplot matrix of TOT_COST, UDS, MANPOWER, ENERGY, INVEST, MAINT, MAT and ENV]
Multiple Regression: adjusted R² value
The R² coefficient never decreases when variables are added to the
model, whether they are significant or not. To compensate for this
effect, in multiple regression we use the corrected (or adjusted)
R² value.
Dependent variable: log(TOT_COST)
-----------------------------------------------------------------
Parameter        Estimate     Standard Error  T Statistic  P-Value
-----------------------------------------------------------------
CONSTANT         -1.82352     0.313487        -5.81689     0.0000
log(UDS)         0.666417     0.116524        5.71913      0.0000
log(MANPOWER)    0.157212     0.0551564       2.85029      0.0052
log(ENERGY)      0.174001     0.0489637       3.55367      0.0005
log(INVEST)      0.216335     0.0365883       5.91267      0.0000
log(MAINT)       -0.0199751   0.0594171       -0.336185    0.7373
log(MAT)         0.139431     0.0221418       6.2972       0.0000
log(ENV)         0.0027926    0.0178724       0.156252     0.8761
-----------------------------------------------------------------
Adjusted R² = 81.73%
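The slides do not give the formula; a standard definition of the adjusted R², assumed here, with n observations and k explanatory variables, is R²adj = 1 - (1 - R²)(n - 1)/(n - k - 1). A short sketch shows how adding variables that barely raise R² can lower the adjusted value. The numbers are illustrative, not from the TOT_COST model, and `adjusted_r2` is a hypothetical helper.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 50
a_small = adjusted_r2(0.800, n, 3)   # 3 variables
a_large = adjusted_r2(0.805, n, 7)   # 4 extra variables add only 0.005 to R^2
print(f"{a_small:.4f}  {a_large:.4f}")
```

Here the plain R² rises from 0.800 to 0.805, while the adjusted R² falls, signalling that the extra variables do not earn their place in the model.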
Example
Number of accidents in Spanish provinces as a function of the number of
registered vehicles
[Figure: scatterplot of nacciden (number of accidents, ×1000) against matricul (registered vehicles, ×1000)]
Dependent variable: nacciden
-----------------------------------------------------------------
Parameter        Estimate     Standard Error  T Statistic  P-Value
-----------------------------------------------------------------
CONSTANT         278.24       102.518         2.71406      0.0265
matricul         0.0993373    0.00850344      11.682       0.0000
-----------------------------------------------------------------
R-squared (adjusted for d.f.) = 93.7703 percent
Example
Number of accidents in Spanish provinces as a function of the number of
driving licences
[Figure: scatterplot of nacciden (number of accidents, ×1000) against permisos (driving licences, ×1000)]
Dependent variable: nacciden
-----------------------------------------------------------------
Parameter        Estimate     Standard Error  T Statistic  P-Value
-----------------------------------------------------------------
CONSTANT         216.481      127.099         1.70325      0.1269
permisos         0.107617     0.0109657       9.81395      0.0000
-----------------------------------------------------------------
R-squared (adjusted for d.f.) = 91.3722 percent
Regressions
Accid = 278.2 + 0.1 Matriculas   (t-statistic = 11.68)
Accid = 216.5 + 0.1 Permisos     (t-statistic = 9.81)
Regression for both variables
Dependent variable: nacciden
-----------------------------------------------------------------
Parameter        Estimate     Standard Error  T Statistic  P-Value
-----------------------------------------------------------------
CONSTANT         250.63       113.216         2.21373      0.0625
matricul         0.0725492    0.0395634       1.83374      0.1093
permisos         0.0301069    0.043353        0.694461     0.5098
-----------------------------------------------------------------
Regressions (t-statistics in parentheses)
Accid = 278.2 + 0.1 Matriculas (11.68)
Accid = 216.5 + 0.1 Permisos (9.81)
Accid = 250.6 + 0.07 Matriculas (1.83) + 0.03 Permisos (0.69)
What's happening?
[Figure: scatterplot of matricul (registered vehicles, ×1000) against permisos (driving licences, ×1000); correlation = 0.975]
Regression: a problem
Sometimes the independent variables are very similar:
they contain the same information.
[Figure: diagram of the independent variables and the dependent variable]
The model cannot distinguish between the two variables.
In our example:
[Figure: registered cars and driving licences, both explaining the number of accidents (Num Accid)]
Both are too similar for the model to distinguish between them.
The solution? Eliminate one of the variables (for example, keep only
registered cars). We lose almost no information.
The problem of (multi)collinearity frequently appears in
statistics, because we tend to measure one thing in many ways.
It is detected when:
- in simple regressions, the variables are significant, but
- on introducing further variables, these variables stop
being significant.
A weight-height study
Is the relation the same for women and men?
[Figure: scatterplots of weight against height]
If the relation is not the same, we could commit serious errors:
[Figure: weight against height, two panels]
Examples
Variable Y                      Variable X               Group that could influence
Weight                          Height                   Sex: male or female
Consumption of a worker         Earnings of the worker   Labour status: unemployed or employed
Automobile consumption          Power / engine size      Engine type: diesel or petrol
Profit margin of a bank branch  Bank charges             Branch: urban or rural
It is necessary to introduce the group into the model.
In this case:
• we define a dummy variable Z that takes the following values:
Zi = 0 if observation i belongs to group A
Zi = 1 if observation i belongs to group B
• and we estimate the following regression model:
$\hat{y} = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 Z$
The model is estimated:
$\hat{y} = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 Z$
• Women are assigned Z = 0, so that:
$\hat{y} = \hat\beta_0 + \hat\beta_1 X$
• Men are assigned Z = 1, so that:
$\hat{y} = (\hat\beta_0 + \hat\beta_2) + \hat\beta_1 X$
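Therefore:
[Figure: weight against height with two parallel lines, $\hat{y} = (\hat\beta_0 + \hat\beta_2) + \hat\beta_1 X$ for men and $\hat{y} = \hat\beta_0 + \hat\beta_1 X$ for women, separated vertically by $\hat\beta_2$]
The effect is that a man of a given height weighs $\hat\beta_2$ kg more than a
woman of the same height.
Or does he? …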
Let's do it:
Dependent variable: peso (weight)
-----------------------------------------------------------------
Parameter        Estimate     Standard Error  T Statistic  P-Value
-----------------------------------------------------------------
CONSTANT         -77.7888     16.0908         -4.83438     0.0000
altura           0.842013     0.0905752       9.29628      0.0000
sexo             -5.17748     2.20877         -2.34405     0.0208
-----------------------------------------------------------------
R-squared = 60.8791 percent
R-squared (adjusted for d.f.) = 60.1927 percent
(altura = height in cm; sexo = sex)
Sexo = 0: men
Sexo = 1: women
Note that here Sexo = 1 denotes women, the opposite of the coding used above, so the coefficient of sexo is negative.
Therefore, a man 180 cm tall will weigh about -78 + 0.84 × 180 ≈ 73 kg
… and a woman of the same height will weigh about -78 + 0.84 × 180 - 5.18 ≈ 68 kg.
The difference is significant because t = -2.34 and its absolute value is > 2.
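The two predictions with the full-precision Statgraphics coefficients, as a short sketch (the function name `predict_weight` is hypothetical; coding as on the slide: sexo = 0 for men, 1 for women). The full-precision values are about 73.8 kg and 68.6 kg; the slide's rounded arithmetic gives 73 and 68. Either way, the gap between the two is exactly the absolute value of the sexo coefficient, about 5.18 kg.

```python
B0, B_HEIGHT, B_SEX = -77.7888, 0.842013, -5.17748   # from the Statgraphics output

def predict_weight(height_cm, sexo):
    """Predicted weight (kg); sexo = 0 for men, 1 for women."""
    return B0 + B_HEIGHT * height_cm + B_SEX * sexo

man = predict_weight(180, 0)
woman = predict_weight(180, 1)
print(f"man: {man:.1f} kg, woman: {woman:.1f} kg, difference: {man - woman:.2f} kg")
```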
The result
[Figure: weight against height, with parallel lines for men and women about 5 kg apart]
Interactions
We have assumed that the lines are parallel.
And if they aren't?
[Figure: Y against X, with non-parallel lines for groups A and B]
Including interactions in the model
Modelling an interaction is easy. We estimate a regression model
relating:
· the Y variable
· the X variable
· the Z variable
· the X-Z interaction, which is modelled by the product XZ.
$\hat{y} = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 Z + \hat\beta_3 XZ$
For the group with Z = 0: $\hat{y} = \hat\beta_0 + \hat\beta_1 X$
For the group with Z = 1: $\hat{y} = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 + \hat\beta_3 X = (\hat\beta_0 + \hat\beta_2) + (\hat\beta_1 + \hat\beta_3) X$
Therefore, analysing whether an interaction exists amounts to estimating this regression
model and checking whether the interaction parameter $\hat\beta_3$ is significant (absolute value of the t-statistic > 2).
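A sketch of a useful consequence of these formulas, assuming a single binary Z: the interaction model gives each group its own intercept and slope, so its least-squares coefficients can be recovered from two separate simple regressions, one per group (β̂0 and β̂1 from the group with Z = 0; β̂2 and β̂3 as the differences for the group with Z = 1). The data below are synthetic and noise-free, built with β0 = 1, β1 = 2, β2 = 3, β3 = -1; the helper names are hypothetical.

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares intercept and gradient of y on x."""
    mx, my = fmean(x), fmean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

def interaction_coefficients(x, z, y):
    """b0, b1, b2, b3 of y = b0 + b1*X + b2*Z + b3*X*Z for a 0/1 variable z."""
    a0, s0 = fit_line([xi for xi, zi in zip(x, z) if zi == 0],
                      [yi for yi, zi in zip(y, z) if zi == 0])
    a1, s1 = fit_line([xi for xi, zi in zip(x, z) if zi == 1],
                      [yi for yi, zi in zip(y, z) if zi == 1])
    return a0, s0, a1 - a0, s1 - s0

x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
z = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1 + 2 * xi + 3 * zi - 1 * xi * zi for xi, zi in zip(x, z)]
print([round(b, 6) for b in interaction_coefficients(x, z, y)])
```

This equivalence holds for the coefficient estimates; the standard errors and t-statistics of the single joint model differ, because the joint model pools the residual variance of both groups.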
Example: sales of companies in the service sector in Madrid as a
function of their investment in research and development (R&D)
[Figure: plot of ventas (sales) vs id (R&D investment, ×1000)]
[Figure: plot of log(ventas) vs log(id)]
LOG(VENTAS) = 1.762 + 0.393 Log(ID)
(t-statistics: 7.88, 10.34; R² = 45.7%)
Example: sales of companies in the service sector in Madrid as a
function of their investment in research and development (R&D)
We want to study whether being in the telecommunications sector makes
a difference:
TELECO = 1: if in the telecom sector
TELECO = 0: if not in the telecom sector
LOG(VENTAS) = 2.25 + 0.288 Log(ID) + 0.527 TELECO
(t-statistics: 11.12, 8.08, 7.03; R² = 61.05%)
• If the company is in the telecom sector:
Log(VENTAS) = 2.78 + 0.288 log(ID)
• If it is in another sector:
Log(VENTAS) = 2.25 + 0.288 log(ID)
We estimate the interaction:
Log(VENTAS) = 1.99 + 0.334 Log(ID) + 1.80 TELECO - 0.202 TELECO×Log(ID)
(t-statistics: 8.84, 8.40, 3.40, -2.43; R² = 62.8%)
• If the company is in the telecom sector:
Log(VENTAS) = 3.8 + 0.13 log(ID)
• If it is in another sector:
Log(VENTAS) = 1.99 + 0.334 log(ID)
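The group lines above follow directly from the estimated coefficients. A short sketch of the arithmetic, with the coefficients copied from the slide and a hypothetical function name `group_lines`:

```python
def group_lines(b0, b1, b2, b3=0.0):
    """(intercept, slope) for TELECO = 0 and TELECO = 1 in
    log(VENTAS) = b0 + b1*log(ID) + b2*TELECO + b3*TELECO*log(ID)."""
    other = (b0, b1)
    telecom = (b0 + b2, b1 + b3)
    return other, telecom

# Model without interaction: parallel lines (same slope)
other, telecom = group_lines(2.25, 0.288, 0.527)
print("no interaction:", other, tuple(round(v, 3) for v in telecom))

# Model with interaction: different slopes
other_i, telecom_i = group_lines(1.99, 0.334, 1.80, -0.202)
print("interaction:   ", other_i, tuple(round(v, 3) for v in telecom_i))
```

Since the interaction term is significant (t = -2.43, |t| > 2), the lines are not parallel: in this log-log model, the slope for telecom companies (about 0.13) is smaller than for the rest (0.334).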