Experimental design and analysis - Universidad de Castilla

Experimental design and analysis
[email protected]
University of Castilla-La Mancha
Department of Mathematics
Institute of Applied Mathematics to Science and Engineering
Inst. Appl. Math. Sci. & Eng.
Experimental design and analysis, Jesús López Fidalgo
OUTLINE
THIS COURSE.
1. MOTIVATING INTRODUCTION TO STATISTICS.
2. IMPORTANCE OF DESIGNING AN EXPERIMENT.
3. ANOVA.
4. REGRESSION AND CORRELATION.
5. EXPERIMENTAL DESIGN: MOTIVATION AND CRITICISMS.
6. OPTIMAL DESIGN THEORY (LINEAR MODELS).
7. OPTIMAL DESIGNS FOR NONLINEAR MODELS.
8. REAL APPLICATIONS.
THIS COURSE
- COURSE: Statistical modelling and analysis of stochastic processes (Design and analysis of experiments).
- LECTURER: [email protected]
- http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html
Recommended textbooks
- Atkinson A.C. and Donev A.N. (1992). Optimum experimental design. Oxford Science Publications. Oxford.
- Fedorov V.V. (1972). Theory of optimal experiments. Academic Press. New York.
- Fedorov V.V. and Hackl P. (1997). Model-oriented design of experiments. Springer. New York.
- Montgomery D.C. (1991). Diseño y Análisis de Experimentos. Grupo Editorial Iberoamericano. México.
- Peña Sánchez de Rivera, D. (2002). Regresión y Diseño de Experimentos. Alianza Editorial. Madrid.
Course notes and videos (web)
Notes: Optimal design.
Videos:
- Fundamentals of statistical modelling:
  - Probability (CLT error).
  - Descriptive statistics.
  - Introduction to hypothesis testing.
- Estimation and tests:
  - Typical estimation and tests.
  - Introduction to linear models: least squares, maximum likelihood...
  - Introduction to one-way ANOVA.
  - Introduction to experimental design: ANOVA (one factor).
- Analysis of variance:
  - One-way ANOVA (residual analysis and example).
  - More than one factor and interactions (example).
Assessment and information
Theory assessment (attendance, participation): 20%
Short assignments: 40%
Final project: 40%
Check the web page http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html and Moodle regularly for announcements and recommended assignments.
1. MOTIVATING INTRODUCTION TO STATISTICS
Misconceptions of Statistics
- Bernard Shaw: “If a man has his head in an oven and his feet in a freezer, then, on average, his body is at the ideal temperature”.
- The probability of a car accident increases with driving time, so driving faster should reduce it.
- 33% of fatal accidents involve a drunk driver ⇒ 67% involve someone who has not drunk much ⇒ drive drunk.
- The Vatican has two Popes per km².
- A sample tortured enough will confess whatever you want. Ways of manipulating:
  - Modifying the data.
  - Bad sampling plan or design.
  - Wrong model or analysis (e.g. treatment of non-response).
  - Inadequate interpretation.
What does Statistics do?
- Infer conclusions from experimental data.
- Discover relationships:
  - Genes related to a disease.
  - Influence of a diet in preventing a type of cancer.
- Measure how well a model fits reality.
- Support and reference tool.
- Scientific method: deduction and induction (irregular die).
- Proof:
  - Fast and efficient.
  - Not exact, but rigorous and scientific.
A healthy critical attitude towards the mass media
“67% of the young people drink alcohol during the weekends”
- What is a young person?
- What does drinking alcohol mean?
- What is a weekend?
- Who conducted or wrote the study?
Some things to take into account (for instance)
- How was the sample taken?
- Covariation does not imply a cause/effect relationship (e.g. police/delinquents or storks/births).
- Scale of graphics.
- Dealing with non-response.
Modern Statistics
- Union of two disciplines that developed independently:
  - Probability.
  - Descriptive Statistics.
- Result: inference, decision making.
Statistical procedure
- Choose the model.
- Experimental design / sampling.
- Preparing the data (e.g. transformations).
- Analysis.
- Interpretation and decision making.
Hypothesis testing
- Court trial: guilty vs. innocent (treatment vs. traditional).
- The system presumes innocence as long as guilt is not clear; convicting corresponds to rejecting the null hypothesis (significant).

Sentence        Truth: H0 (innocent)            Truth: H1 (guilty)
H0: Free        Innocent goes free              Guilty goes free (ERROR II)
H1: Convict     Innocent convicted (ERROR I)    Guilty convicted
Conditional probability (extra/prior information)
\[
P(B) = \frac{P(B)}{P(E)}, \qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A \cap E)} = \frac{P(A \cap B)}{P(A)}
\]
(E is the whole sample space, so P(E) = 1 and A ∩ E = A.)
p-value and test power
- Risk α = P(reject H0 | H0) = P(Type I Error).
- Risk β = P(accept H0 | H1) = P(Type II Error).
- Test power 1 − β (depends on each value of H1 and on α).
- From the sample, p-value:
  p = P(obtaining either these observations or any others farther from H0 | H0 true).
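As a minimal sketch of the p-value definition on this slide, consider one simple case: a z-test of H0: µ = µ0 with known σ, where "farther from H0" means a larger sample mean (or, optionally, a larger absolute deviation). The function name and the numbers used are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, two_sided=False):
    """p-value of a z-test of H0: mu = mu0 when sigma is known.

    One-sided version: probability, under H0, of a sample mean at least
    as large as the observed one. Two-sided version: at least as far
    from mu0 in either direction.
    """
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    upper_tail = 1.0 - NormalDist().cdf(z)
    if two_sided:
        return 2.0 * min(upper_tail, 1.0 - upper_tail)
    return upper_tail


# Hypothetical data: observed mean 1.5 from n = 10 observations, sigma = 3.
p_one_sided = z_test_p_value(1.5, mu0=0.0, sigma=3.0, n=10)
p_two_sided = z_test_p_value(1.5, mu0=0.0, sigma=3.0, n=10, two_sided=True)
```

With these hypothetical numbers the one-sided p-value is slightly above 0.05, so H0 would not be rejected at α = 0.05.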
Remarks
- p does not measure the magnitude of the association between two variables (e.g. the PISA report).
- It is not the probability of H0.
- Not rejecting H0 does not mean accepting H0 (test power).
- Importance of the design and the sample size in succeeding to reject H0 when it is false.
Hypothesis test
Sample from N(µ, σ² = 3²), n = 10,
H0: µ = 0
H1: µ = 2
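The power of this test can be sketched numerically. The block below assumes a one-sided z-test that rejects H0 for large sample means, reads the slide's variance as σ² = 3² (so σ = 3), and uses α = 0.05. The test direction and α are assumptions, since this slide states neither, and the function name is hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power 1 - beta of the one-sided z-test of H0: mu = mu0 vs H1: mu = mu1 > mu0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)   # rejection threshold on the z scale
    shift = (mu1 - mu0) * n ** 0.5 / sigma      # mean of the z statistic under H1
    return 1.0 - std_normal.cdf(z_alpha - shift)


# Values from the slide (sigma = 3 is the assumed reading of the variance).
power_n10 = power_one_sided_z(mu0=0.0, mu1=2.0, sigma=3.0, n=10)
beta_n10 = 1.0 - power_n10   # probability of a Type II error
```

Under these assumptions the power with n = 10 is roughly two thirds, so a false H0 would go undetected in about one experiment out of three.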
Central limit theorem (the magic)
What if the sample distribution is unknown?
\[
\bar{X} \approx N(\mu, \sigma^2/n).
\]
For n ≥ 30 the approximation usually works well.
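A small simulation can illustrate the approximation. This sketch draws samples of size n = 30 from an Exponential(1) distribution (an arbitrary skewed choice with µ = 1 and σ² = 1, not taken from the slides) and checks that the sample means behave roughly like N(µ, σ²/n). All names and the seed are illustrative.

```python
import random
from statistics import mean, pvariance


def sample_means(n, replications, rng):
    """Means of `replications` independent samples of size n from Exponential(1)."""
    return [mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(replications)]


rng = random.Random(0)          # fixed seed so the run is reproducible
n = 30
means = sample_means(n, replications=5000, rng=rng)

clt_mean = mean(means)          # CLT predicts about mu = 1
clt_var = pvariance(means)      # CLT predicts about sigma^2 / n = 1/30

# Share of sample means inside mu +/- 1.96 * sigma / sqrt(n);
# a normal distribution would put about 95% of them there.
half_width = 1.96 / n ** 0.5
coverage = sum(abs(m - 1.0) <= half_width for m in means) / len(means)
```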
Sampling
How many observations?
H0: µ = 0, H1: µ = 2, α = 0.05, σ² = 3².
[Figure: test power 1 − β as a function of the sample size n, for n from 10 to 50.]
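The question "how many observations?" can be answered by inverting the power formula. The sketch below assumes a one-sided z-test with known σ = 3 (reading the slide's variance as 3²) and a target power of 0.8; the target power and the function name are assumptions, not values given in the slides.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(mu0, mu1, sigma, alpha=0.05, target_power=0.8):
    """Smallest n for which the one-sided z-test reaches the target power.

    Solves (mu1 - mu0) * sqrt(n) / sigma >= z_(1-alpha) + z_(power) for n.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_power = std_normal.inv_cdf(target_power)
    return ceil(((z_alpha + z_power) * sigma / abs(mu1 - mu0)) ** 2)


n_for_80 = required_sample_size(0.0, 2.0, 3.0, alpha=0.05, target_power=0.8)
n_for_90 = required_sample_size(0.0, 2.0, 3.0, alpha=0.05, target_power=0.9)
```

Under these assumptions, 14 observations are enough for 80% power and 20 for 90% power.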
Example: Atypical cases of leukemia in a school
- National proportion: 0.0001 (1 in 10,000).
- Proportion of 0.0017 in a particular school (17 times the national reference).
- School A: 3000 students and 5 cases (p = 0.035).
- School B: 1200 students and 2 cases (p = 0.184).
Frequent statistical analyses
Y (response) vs. X (explanatory variables):

                  X quantitative           X qualitative
Y quantitative    Regression               t-test / ANOVA
                  Correlation              Mann–Whitney / Kruskal–Wallis
                                           Wilcoxon / Friedman
Y qualitative     Discriminant analysis    Fisher exact test
                  Logit, Probit...         chi-squared / log-linear
                  Neural networks
Interpretation
- “90% of lung cancer patients have been smokers”
is not the same as
- “90% of the smokers die of lung cancer”.
Reliability of a particular cancer test
- 90% “reliable”.
- Your test gives positive! But...
- In what sense is it 90% reliable?
  - If you really have cancer, the test gives positive with 90% probability (sensitivity).
  - If you do not have this cancer, the test gives negative in 90% of the cases (specificity).
- How many people currently have this particular cancer?
  - 1 in 10,000 (prevalence).
- Actual probability that you really have this cancer: about 1 in 1000.
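The final figure follows from Bayes' theorem applied to the sensitivity, specificity and prevalence quoted on this slide. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) by Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Values from the slide: 90% sensitivity, 90% specificity, prevalence 1 in 10,000.
ppv = positive_predictive_value(0.90, 0.90, 1 / 10_000)
one_in = round(1 / ppv)   # "1 in ..." form of the same probability
```

The result is about 0.0009, that is, roughly 1 in 1100, which the slide rounds to 1 in 1000. Almost all positives are false positives because healthy people vastly outnumber patients.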
Interpretation and use of graphics
The same, but well done
Rigorous proportion
Histograms
2. IMPORTANCE OF DESIGNING AN EXPERIMENT
Why?
- “Think before acting” (especially in the middle of a crisis).
- Saving time, money and risk.
- Correct analysis.
Basic principles
- Randomization.
- Replication (≠ repeated measurements, helicopter example).
- Blocking (e.g. to eliminate the variability due to nuisance factors).
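One common way to combine the three principles is a randomized complete block design: every treatment appears once in each block (so the blocks act as replicates), and the run order inside each block is randomized. The sketch below is only an illustration; the function name, treatment labels and block labels are hypothetical.

```python
import random


def randomized_complete_block(treatments, blocks, seed=None):
    """Return {block: treatments in random run order}.

    Each treatment occurs exactly once per block, and the order is
    shuffled independently within every block.
    """
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)      # randomization inside the block
        plan[block] = order
    return plan


# Hypothetical example: three treatments, with days as the nuisance factor.
plan = randomized_complete_block(
    ["A", "B", "C"], ["day 1", "day 2", "day 3", "day 4"], seed=1
)
```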
Guidelines for Designing an Experiment (Montgomery)
- Pre-experimental planning:
  - Recognition and statement of the problem.
  - Model: choice of factors (controllable, uncontrollable and noise), levels, and ranges.
  - Selection of the response variable.
- Choice of experimental design.
- Performing the experiment (monitor the process, wine...).
- Statistical analysis of the data.
- Conclusions and recommendations.
Examples
- Factorial (fractional).
- Screening: select the important factors from a large number of candidates.
- Nested or hierarchical.
- Split-plot designs:
  - Whole plot (main treatments): temperatures and times.
  - Split-plot: remaining variables.
  - Add as a block.
- Sequential and adaptive designs.
- Mixture experiments.
- Proper name designs.
- Response surface.
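To make the first item in the list concrete, here is a sketch of a two-level full factorial design and of one standard half fraction. It assumes levels coded as −1/+1 and the defining relation I = AB...K (keep the runs whose levels multiply to +1); the function names are hypothetical.

```python
from itertools import product
from math import prod


def full_factorial(k):
    """All 2**k runs of a two-level design with k factors, coded -1 / +1."""
    return list(product((-1, 1), repeat=k))


def half_fraction(k):
    """Half fraction 2**(k-1): runs whose coded levels multiply to +1."""
    return [run for run in full_factorial(k) if prod(run) == 1]


full_design = full_factorial(3)   # 8 runs for factors A, B, C
half_design = half_fraction(3)    # 4 runs, defining relation I = ABC
```

The half fraction needs half as many runs, at the price of confounding each main effect with a two-factor interaction (for k = 3, A is aliased with BC).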