Experimental design and analysis
[email protected]
University of Castilla-La Mancha
Department of Mathematics
Institute of Applied Mathematics to Science and Engineering

Inst. Appl. Math. Sci. & Eng. Experimental design and analysis, Jesús López Fidalgo

OUTLINE
- THIS COURSE.
- 1. MOTIVATING INTRODUCTION TO STATISTICS.
- 2. IMPORTANCE OF DESIGNING AN EXPERIMENT.
- 3. ANOVA.
- 4. REGRESSION AND CORRELATION.
- 5. EXPERIMENTAL DESIGN: MOTIVATION AND CRITICISMS.
- 6. OPTIMAL DESIGN THEORY (LINEAR MODELS).
- 7. OPTIMAL DESIGNS FOR NONLINEAR MODELS.
- 8. REAL APPLICATIONS.

THIS COURSE
- COURSE: Modelling and statistical analysis of stochastic processes (Design and analysis of experiments).
- LECTURER: [email protected]
- http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html

Recommended textbooks
- Atkinson A.C. and Donev A.N. (1992). Optimum experimental design. Oxford Science Publications. Oxford.
- Fedorov V.V. (1972). Theory of optimal experiments. Academic Press. New York.
- Fedorov V.V. and Hackl P. (1997). Model-oriented design of experiments. Springer. New York.
- Montgomery D.C. (1991). Diseño y Análisis de Experimentos. Grupo Editorial Iberoamericano. México.
- Peña Sánchez de Rivera D. (2002). Regresión y Diseño de Experimentos. Alianza Editorial. Madrid.

Course notes and videos (web)
Notes: Optimal design.
Videos:
- Fundamentals of statistical modelling:
  - Probability (CLT error).
  - Descriptive statistics.
  - Introduction to hypothesis testing.
- Estimation and tests:
  - Estimation and typical tests.
  - Introduction to linear models: least squares, maximum likelihood...
  - Introduction to one-factor ANOVA.
  - Introduction to experimental design: ANOVA (one factor).
- Analysis of variance:
  - One-factor ANOVA (residual analysis and example).
  - More than one factor and interactions (example).

Assessment and information
- Theory assessment (attendance, participation): 20%
- Short assignments: 40%
- Final project: 40%
Students are advised to check the web page http://www.uclm.es/profesorado/jesuslopezfidalgo/lect.html and Moodle regularly for announcements and recommended assignments.

1. MOTIVATING INTRODUCTION TO STATISTICS

Misconceptions of Statistics
- Bernard Shaw: "If a man has his head in an oven and his feet in a freezer, then his body is at the ideal average temperature".
- The probability of a car accident increases with driving time, so the probability will drop if you drive faster.
- 33% of fatal accidents involve a drunk driver ⇒ 67% involve drivers who have not been drinking ⇒ drive drunk.
- The Vatican has two Popes per km².
A sample tortured enough will confess whatever you want.
Manipulating:
- Modifying the data.
- Bad sampling plan or design.
- Wrong model or analysis (e.g. treatment of non-response).
- Inadequate interpretation.

What does Statistics do?
- Infers conclusions from experimental data.
- Discovers relationships:
  - Genes related to a disease.
  - Influence of a diet in preventing a type of cancer.
- Measures how well a model fits reality.
- Support and reference tool.
- Scientific method: deduction and induction (irregular die).
- Proof:
  - Fast and efficient.
  - Not exact, but rigorous and scientific.
Healthy critical spirit with mass media
"67% of young people drink alcohol at weekends"
- What is a young person?
- What does drinking alcohol mean?
- What is a weekend?
- Who conducted the study or wrote the report?

Some things to take into account (for instance)
- How was the sample taken?
- Covariation does not imply a cause/effect relationship (e.g. police/delinquents or storks/births).
- Scale of the graphics.
- Dealing with non-response.

Modern Statistics
- Union of two disciplines that were developed independently:
  - Probability.
  - Descriptive Statistics.
- Result: inference, decision making.

Statistical procedure
- Choose the model.
- Experimental design / sampling.
- Preparing the data (e.g. transformations).
- Analysis.
- Interpretation and decision making.

Hypothesis testing
- Court trial: Guilty vs. Innocent (treatment vs. traditional).
- The system presumes innocence (H0) while guilt is not clear; convicting corresponds to rejecting the null hypothesis (a significant result).

                     Truth
Sentence             H0: Innocent                    H1: Guilty
H0: Free             Innocent set free               Guilty set free (ERROR II)
H1: Convict          Innocent convicted (ERROR I)    Guilty convicted

Conditional probability (extra/prior information)
With E denoting the whole sample space:
$$P(B) = \frac{P(B)}{P(E)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A \cap E)} = \frac{P(A \cap B)}{P(A)}$$

p-value and test power
- Risk $\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) = P(\text{Type I error})$.
- Risk $\beta = P(\text{accept } H_0 \mid H_1 \text{ true}) = P(\text{Type II error})$.
- Test power: $1 - \beta$ (depends on $\alpha$ and on each value under $H_1$).
- From the sample, the p-value: $p = P(\text{obtaining these observations or any others farther from } H_0 \mid H_0 \text{ true})$.

Remarks
- p does not measure the magnitude of the association between two variables (e.g. the PISA report).
- It is not the probability of H0.
- Not rejecting H0 does not mean accepting H0 (test power).
- The design and the sample size are important for succeeding in rejecting H0 when it is false.

Hypothesis test
Sample from $N(\mu, \sigma^2 = 3^2)$, $n = 10$, $H_0: \mu = 0$, $H_1: \mu = 2$.

Central limit theorem (the magic)
What if the distribution of the sample is unknown? Approximately, $\bar{X} \approx N(\mu, \sigma^2/n)$. For $n \geq 30$ the approximation usually works well.

Sampling
How many observations? $H_0: \mu = 0$, $H_1: \mu = 2$, $\alpha = 0.05$, $\sigma^2 = 3^2$.
[Figure: test power $1 - \beta$ (vertical axis, 0.2 to 1.0) against sample size n (horizontal axis, 10 to 50).]

Example: Atypical cases of leukemia in a school
- National proportion: 0.0001 (1 in 10,000).
- Proportion of 0.0017 in a particular school (17 times the national reference).
- School A: 3000 students and 5 cases (p = 0.035).
- School B: 1200 students and 2 cases (p = 0.184).
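
The "How many observations?" question on the Sampling slide can be explored numerically. The following is a minimal sketch, assuming a one-sided z-test with known σ = 3 (the slides do not state which test lies behind the power curve). The function names `power_one_sided_z` and `smallest_n` are hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(n, mu0=0.0, mu1=2.0, sigma=3.0, alpha=0.05):
    """Power 1 - beta of a one-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0.

    Assumes a normal sample of size n with known standard deviation sigma.
    """
    std_normal = NormalDist()
    # Critical value: reject H0 when the standardized sample mean exceeds z_crit.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Shift of the standardized sample mean when H1 is true.
    shift = (mu1 - mu0) * sqrt(n) / sigma
    # Probability of rejecting H0 when H1 is true.
    return 1 - std_normal.cdf(z_crit - shift)


def smallest_n(target_power, **kwargs):
    """Smallest sample size whose power reaches target_power (simple linear search)."""
    n = 1
    while power_one_sided_z(n, **kwargs) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    # Power at the sample sizes marked on the horizontal axis of the slide.
    for n in (10, 20, 30, 40, 50):
        print(n, round(power_one_sided_z(n), 3))
    print("smallest n for power 0.8:", smallest_n(0.8))
```

Under these assumptions the power grows with n because the standard error σ/√n shrinks, which is why the sample size has to be planned before the experiment. A two-sided test or an unknown σ (t-test) would give somewhat different values.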
Frequent statistical analysis

Y (Response)     X (Explanatory variables)
                 Quantitative               Qualitative
Quantitative     Regression                 t-test / ANOVA
                 Correlation                Mann–Whitney / Kruskal–Wallis
                                            Wilcoxon / Friedman
Qualitative      Discriminant analysis      Fisher exact test
                 Logit, Probit...           Chi-squared / log-linear
                 Neural networks

Interpretation
- "90% of lung cancer patients have been smokers" is not the same as
- "90% of smokers die of lung cancer".

Reliability of a particular cancer test
- 90% "reliable".
- Your test comes back positive! But...
- In what sense is it 90% reliable?
  - If you really have this cancer, the test is positive with 90% probability (sensitivity).
  - If you do not have this cancer, the test is negative in 90% of cases (specificity).
- How many people currently have this particular cancer?
  - 1 in 10,000 (prevalence).
- Actual probability that you really have this cancer: about 1 in 1000, since by Bayes' rule $P(\text{cancer} \mid +) = \frac{0.9 \times 0.0001}{0.9 \times 0.0001 + 0.1 \times 0.9999} \approx 0.0009$.

Interpretation and use of graphics

The same, but well done

Rigorous proportion

Histograms

2. IMPORTANCE OF DESIGNING AN EXPERIMENT

Why?
- "Think before acting" (especially in the middle of a crisis).
- Saves time and money and reduces risk.
- Correct analysis.

Basic principles
- Randomization.
- Replication (≠ repeated measurements; helicopter example).
- Blocking (e.g. to eliminate variability due to nuisance factors).

Guidelines for Designing an Experiment (Montgomery)
- Pre-experimental planning:
  - Recognition and statement of the problem.
  - Model: choice of factors (controllable, uncontrollable and noise), levels, and ranges.
  - Selection of the response variable.
- Choice of experimental design.
- Performing the experiment (monitor the process, wine...).
- Statistical analysis of the data.
- Conclusions and recommendations.

Examples
- Factorial (fractional).
- Screening: select the important factors from a large number of candidates.
- Nested or hierarchical:
  - Split-plot designs:
    - Whole plot (main treatments): temperatures and times.
    - Split-plot: remaining variables.
    - Add as a block.
- Sequential and adaptive designs.
- Mixture experiments.
- Proper name designs.
- Response surface.
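
Two of the basic principles of this section, randomization and blocking, can be illustrated with a small sketch. It assumes a randomized complete block layout in which every treatment is run once in each block and the run order is shuffled independently within each block. The function name `randomized_block_design` and the treatment and block labels are hypothetical, chosen only for the example.

```python
import random


def randomized_block_design(treatments, blocks, seed=None):
    """Return a run order of all treatments for each block.

    Every treatment appears exactly once per block (blocking), and the
    order inside each block is drawn at random (randomization).
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    plan = {}
    for block in blocks:
        order = list(treatments)  # copy, so the caller's list is not modified
        rng.shuffle(order)        # independent random order within this block
        plan[block] = order
    return plan


if __name__ == "__main__":
    # Illustrative labels only: three treatments run on each of four days.
    plan = randomized_block_design(["A", "B", "C"], ["day 1", "day 2", "day 3", "day 4"], seed=2024)
    for block, order in plan.items():
        print(block, order)
```

In this layout the blocks also act as replicates of each treatment, so nuisance variability between blocks (here, between days) can be separated from the treatment effect in the analysis.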