Decision making
with two samples
Dr. Miguel Toro PhD
Introduction
• In the previous semester, we started studying
inferential statistics.
• In contrast with descriptive statistics, inferential statistics aims to establish facts about a population based on a sample.
• Two semesters ago, we saw the analysis for two samples; those students will review this material. For last semester’s students, this is a continuation of one-sample inferential statistics.
• For one variable, we used a sample to infer the population parameter with a certain degree of confidence.
Introduction
• The analysis now expands to two samples. How do we know that the liking of a product differs from the control beyond the limits of the data?
• Note that if our claims are valid only in our
laboratory, then they are not generalizable and,
therefore, are not knowledge contributions and,
therefore, are not a research outcome.
• Note that, if not properly made (statistically
speaking), whatever happens in the laboratory
lacks validity.
• We can claim that one treatment is different from the control (or another treatment) if we sample appropriately, we randomize the sampling (more on this later in the semester), and we analyze the data with the appropriate statistical tools.
Introduction
• In this and coming sessions we will talk about:
• Inference on the means of two populations,
variance known,
• Inference on the means of two populations,
variances unknown,
• The paired t-test,
• Inference on the ratio of variances of two
normal populations, and
• Inference on two population proportions.
Inference on the
means of two
populations, variance
known
Inference on the means of two
populations, variance known
• Imagine we manufacture two products and are
interested in whether people like one more than
the other.
• Let’s assume we use a 1–9 hedonic scale, from “dislike extremely” to “like extremely.”
• We could collect the data, compute the mean tasting score for each product, and compare them.
• Are we in the position of saying that our
results represent the populations?
Inference on the means of two
populations, variance known
• To WB 1
Inference on the means of two
populations, variance known
• No.
• We can make claims about the difference between these two samples if the following assumptions are true:
• Each sample is a random sample from its population,
• The two samples are independent of each other,
• Both populations are (approximately) normal, or the samples are large enough for the central limit theorem to apply, and
• The variances σ1² and σ2² are known.
Inference on the means of two
populations, variance known
• In previous semesters, we talked about estimators: numbers that help us examine the veracity of our inference.
• A logical estimator of the difference in
population means µ1 - µ2 is the difference in
the sample means 𝑋1 − 𝑋2.
• To WB2
Inference on the means of two
populations, variance known
• To test whether the difference in means equals a given constant ∆0, we can say that:
• WB3
Example 01
• A product developer is interested in reducing the
drying time in food products. Two formulations are
tested. Formulation 1 is the standard chemistry, and
formulation 2 has a new drying ingredient that should
reduce the drying time.
• From experience, it is known that the standard deviation of drying time is 8 minutes, and this inherent variability should be unaffected by the addition of the new ingredient.
• Ten specimens are tested with formulation 1 and another
10 specimens with formulation 2. The specimens were
tested in random order.
• The average drying times are 𝑋1 = 121 min and 𝑋2 = 112 min, respectively.
• What conclusions can the product developer draw about the effectiveness of the new ingredient?
Inference on the means of two
populations, variance known
• Type II error and choice of sample size
• Suppose that the null hypothesis µ1 − µ2 = ∆0 is false and that the true difference in means is µ1 − µ2 = ∆, where ∆ > ∆0.
• As we did for one sample inferential analysis,
we can estimate the sample size to obtain a
certain level of type II error (β) for a given
difference in means ∆ and a certain level of
significance α.
Example
• In the previous example suppose that the actual
difference in drying times is as much as 10
minutes. We want to detect this with a
probability of at least 0.9. What sample size
is appropriate?
• To WB5
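The sample-size calculation for the known-variance case can be sketched in a few lines. Here `NormalDist` supplies the normal quantiles; α = 0.05 (one-sided) is an assumption carried over from the previous example:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, beta, sigma1, sigma2, delta, two_sided=False):
    """Sample size per group so that a true difference `delta` is
    detected with power 1 - beta at significance level alpha
    (variances known)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    return ceil((z_alpha + z_beta) ** 2 * (sigma1**2 + sigma2**2) / delta**2)

# Drying-time example: detect a 10-minute difference with power 0.9
n = n_per_group(alpha=0.05, beta=0.10, sigma1=8, sigma2=8, delta=10)
print(n)   # 11 specimens per formulation
```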
Inference on the
means of two
populations, variance
unknown
Inference on the means of two
populations, variance unknown
• As we did for one sample, we can extend the
analysis of the means of two populations when
we do not know the variance.
• When the sample size exceeds 40, we can use the
previous procedure, even if we do not know the
variance. We can assume that the sample
variance is the population variance.
• We can't assume that the populations in question follow a normal distribution when small samples are taken. But if the sample was properly taken (randomly representative of the population), we can assume the populations are approximately normal.
Inference on the means of two
populations, variance unknown
• We will approach this analysis a bit differently: we will divide it into cases.
Inference on the means of two
populations, variance unknown
– Case 1
• In case 1, we do not know the variances, but we
know that they are equal.
• To WB
Example
• Two catalysts are being analyzed to determine
how they affect the mean yield of a chemical
process. Specifically, catalyst 1 is currently
in use, but catalyst 2 is also acceptable. Because catalyst 2 is cheaper, it should be adopted, provided it does not change the process yield.
• A test is run, and the results are in table 2.
Are there any differences in mean yields?
Assume equal variances.
Example
To WB7
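Table 2 is not reproduced here, so the yields below are hypothetical placeholders; the sketch only illustrates how the Case 1 (pooled) t statistic is computed:

```python
from math import sqrt
from statistics import mean, stdev

def pooled_t(sample1, sample2):
    """Pooled two-sample t statistic (case 1: equal, unknown variances)."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    # pooled variance combines both samples, weighted by degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t0 = (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2   # statistic and degrees of freedom

# Hypothetical yields (%) for the two catalysts -- not the actual Table 2 data
catalyst1 = [91.5, 94.2, 92.2, 95.6, 93.7, 89.4, 91.0, 92.8]
catalyst2 = [89.2, 90.9, 90.3, 93.6, 94.9, 90.1, 92.7, 93.2]
t0, df = pooled_t(catalyst1, catalyst2)
print(round(t0, 2), df)   # compare |t0| with t_{0.025,14} = 2.145
```

With these placeholder numbers |t0| falls below the critical value, so the hypothetical data would give no evidence of a difference in mean yields.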
Inference on the means of two
populations, variance unknown
– Case 2
• The second case is when we do not know the population variances, but we know that they are different.
• The statistic we use is an approximation of the
T statistic.
Example
• Arsenic concentration in public drinking
supplies is a potential health risk. An
article in the Arizona Republic (May 27, 2001)
reported drinking water arsenic concentrations
in parts per billion (ppb) for 10 metropolitan
Phoenix communities and 10 communities in rural
Arizona.
• The data (in ppb) for the two groups are shown in the accompanying table.
Example
• We wish to determine if there is any difference
in mean arsenic concentrations between
metropolitan Phoenix communities and
communities in rural Arizona.
• To WB8
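Since the article's data table is not reproduced on these slides, the summary statistics below are hypothetical stand-ins; the sketch shows the Case 2 (Welch) statistic and its approximate degrees of freedom:

```python
from math import sqrt

def welch_t(x1bar, s1, n1, x2bar, s2, n2):
    """Welch t statistic and approximate degrees of freedom
    (case 2: unknown, unequal variances)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t0 = (x1bar - x2bar) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation, rounded down to an integer
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t0, int(df)

# Hypothetical summary statistics (ppb), not the article's actual values
t0, df = welch_t(x1bar=12.0, s1=7.5, n1=10, x2bar=27.0, s2=15.0, n2=10)
print(round(t0, 2), df)   # t0 ≈ -2.83 with about 13 degrees of freedom
```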
Type II Error and choice of
sample size
• For case 1, the operating curves Va, Vb, Vc and Vd
are used with:
𝑑 = (∆ − ∆0) / (2σ),
where:
• ∆ is the true difference in means
• The sample size is calculated as follows:
𝑛∗ = 2𝑛 − 1
• Va and Vb are for two-sided alternative hypotheses,
• Vc and Vd are for one-sided alternative hypotheses.
• We do not know the population standard deviation, so we approximate this value with the sample standard deviation.
Example
• Consider the catalyst experiment we solved before. Suppose that catalyst 2 produces a mean yield that differs from the mean yield of catalyst 1 by as much as 4. We would like to reject the null hypothesis with a probability of at least 0.85. What sample size is required?
• To WB9
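The chart lookup itself cannot be automated here, but the surrounding arithmetic can be sketched. The pooled standard deviation `sp` and the chart reading `n_star` below are hypothetical values used only for illustration:

```python
# Chart parameters for the pooled-t operating-characteristic curves.
delta = 4.0        # difference in mean yield we want to detect
sp = 2.70          # hypothetical pooled standard deviation from the data
d = delta / (2 * sp)
print(round(d, 2))          # d ≈ 0.74; enter the OC chart with this value

# If the chart (at the chosen alpha and beta) returned n* = 15, say,
# the per-group sample size would follow from n* = 2n - 1:
n_star = 15
n = (n_star + 1) // 2
print(n)                    # n = 8 observations per catalyst
```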
The paired t-test
Introduction
• This is a special case of the two-sample t-test, where the observations on the two populations of interest are collected in pairs.
• Each pair of measurements is taken under homogeneous conditions, although the conditions may change from pair to pair.
• For example, in the lab, we are interested in
testing the likeability of meat products.
• We have two possible percentages of substitution
of meat in a known product. Let’s assume we have a
first batch of raw material and a second from
somewhere else.
• The conditions for the first batch are homogeneous, but there might be slight changes in the second batch.
Introduction
• If we randomized the measurements and ignored the possible differences between batches, part of the variability would be attributed to the batch differences and not to the likeability score.
• So, a more powerful experimental procedure is
to collect the data in pairs. In this way, we
can examine not only the differences across
substitutions but also across batches of raw
materials.
• This procedure is called the paired t-test.
Paired t-test
• (𝑋11, 𝑋21), (𝑋12, 𝑋22), … (𝑋1𝑛, 𝑋2𝑛) are a set of n paired observations.
• The difference within each pair is:
• 𝐷𝑗 = 𝑋1𝑗 − 𝑋2𝑗 ; j = 1, 2, …, n
• The differences are assumed to be normally distributed.
• The mean of the differences is µD and their variance is σD².
Example
• An article in the Journal of Strain Analysis reports a comparison of several methods for predicting the shear strength of steel plate girders.
• Shear strength is the ability of a material to
resist the force that causes it to slide or
fail. Adhesives like epoxy are known for their
high shear strength.
• Data for two of these methods, the Karlsruhe and Lehigh procedures, when applied to nine specific girders, are shown in Table 5-3. We wish to determine whether there are any differences between the two methods.
Example
• To WB10
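Table 5-3 is not reproduced here, so the nine pairs below are hypothetical placeholders; the sketch shows that the paired t-test is simply a one-sample t-test on the within-pair differences:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(sample1, sample2):
    """Paired t statistic: a one-sample t-test on the differences."""
    d = [a - b for a, b in zip(sample1, sample2)]
    n = len(d)
    t0 = mean(d) / (stdev(d) / sqrt(n))
    return t0, n - 1

# Hypothetical predicted strengths for nine girders -- not the Table 5-3 data
karlsruhe = [1.19, 1.22, 1.07, 1.14, 1.34, 1.08, 1.09, 1.21, 1.22]
lehigh    = [1.07, 1.00, 1.02, 0.99, 1.04, 0.98, 1.01, 1.03, 0.97]
t0, df = paired_t(karlsruhe, lehigh)
print(round(t0, 2), df)   # compare t0 with t_{0.025,8} = 2.306
```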
Paired versus unpaired
comparisons
• Depending on the conditions of the experiment,
the experimenter can choose between a pair or
unpaired (two samples) experiment.
• What conditions to look for?
1. If the experimental units are relatively
homogeneous (small σ) and the correlation
between the pairs is small, the gain in
precision attributable to pairing will be offset
by the loss of degrees of freedom, so an
independent-sample experiment should be used.
2. The paired experiment should be used if the experimental units are relatively heterogeneous (large σ) and there is a large positive correlation within pairs.
Inference on the
ratio of variances of
two normal
populations
Introduction
• Suppose two independent normal populations are
of interest (for example, the acidity of two
products). Let us assume that we do not know
the mean and variance of the two populations.
• Assume that there are two samples from those
populations of interest (samples we can get in
the laboratory or Innopolis).
• So, we are interested in testing whether the
variances of the two populations are similar.
• For this test, we introduce a new distribution:
the F distribution.
Introduction
• The random variable F is the ratio of two
independent chi-square random variables, each
divided by its number of degrees of freedom.
• To WB11
The F distribution
• Table IV in “Engineering Statistics” shows several tables for different significance levels.
• Note that the tables show only the upper-tail percentage points of the F distribution (𝑓α,𝑢,𝑣).
• To find the lower tail (𝑓1−α,𝑢,𝑣) we only have to apply the following relation:
𝑓1−α,𝑢,𝑣 = 1 / 𝑓α,𝑣,𝑢
The test procedure for the
equality of two variances
Note that under the null hypothesis of equality of population variances, the F statistic reduces to a relation between the sample variances. Then… WB12
Example
• Oxide layers on semiconductor wafers are etched in a
mixture of gases to achieve the proper thickness.
• The variability in the thickness of these oxide layers
is a critical characteristic of the wafer, and low
variability is desirable for subsequent steps.
• Two different mixtures of gases are being studied to
determine whether one is superior in reducing the oxide
thickness variability.
• Sixteen wafers are etched in each gas. The sample
standard deviations of oxide thickness are s1= 1.96
angstroms and s2 = 2.13 angstroms, respectively.
• Is there evidence, at α = 0.05, that either gas is preferable?
• To WB13
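A minimal sketch of this test, using the sample standard deviations given above; the critical value 𝑓0.025,15,15 ≈ 2.86 is read from an F table (its lower-tail counterpart follows from the reciprocal relation):

```python
# F statistic for the oxide-thickness example: s1 = 1.96, s2 = 2.13,
# n1 = n2 = 16, alpha = 0.05 (two-sided test).
s1, s2 = 1.96, 2.13
f0 = s1**2 / s2**2
print(round(f0, 2))            # f0 ≈ 0.85

# Critical values from an F table with 15 and 15 degrees of freedom:
f_upper = 2.86                 # f_{0.025, 15, 15}
f_lower = 1 / f_upper          # f_{0.975, 15, 15} = 1 / f_{0.025, 15, 15}
print(f_lower < f0 < f_upper)  # True -> fail to reject H0
```

Since f0 falls between the two critical values, there is no evidence that either gas produces lower oxide-thickness variability.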
Inference on two
population
proportions
Hypothesis testing on the
equality of two binomial
proportions
• Suppose we are interested in comparing two
manufacturing floors in terms of the quality of
their products. Let's imagine that the
microbiological content measures quality.
Anything beyond a certain limit is not
acceptable for human consumption.
• Then, for each of the manufacturing floors, we
will have a proportion of success (product
outside specifications). We want to know if
there is any difference between the two
populations.
Hypothesis testing on the
equality of two binomial
proportions
• Recall from our analysis with one sample that these binomial distributions (formed by consecutive Bernoulli events) can be approximated by normal distributions if the samples are large enough.
• To WB14
Hypothesis testing on the
equality of two binomial
proportions
𝑃 = (𝑥1 + 𝑥2) / (𝑛1 + 𝑛2)
Example
• Two types of polishing solutions are being evaluated for possible use in tumble-polish operations for manufacturing intraocular lenses used in the human eye following cataract surgery.
• Three hundred lenses were tumble-polished using the first polishing solution, and of this number, 253 had no polishing-induced defects. Another 300 lenses were tumble-polished using the second polishing solution, and 196 lenses were satisfactory upon completion. Is there reason to believe that the two polishing solutions differ?
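A sketch of the two-proportion test for this example, using the pooled estimate 𝑃 defined above:

```python
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                 # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Lens example: 253 of 300 defect-free vs. 196 of 300 satisfactory
z0 = two_proportion_z(253, 300, 196, 300)
p_value = 1.0 - erf(abs(z0) / sqrt(2.0))      # two-sided p-value
print(round(z0, 2))    # z0 ≈ 5.36, far beyond 1.96 -> reject H0
```

The two solutions clearly differ: solution 1 produces a much higher fraction of defect-free lenses.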
Type II error and choice of
sample size
Summary tables
• Look at the back cover of the book…
What if we have more than two
samples?
• This part will be just a quick introduction to
the analytics portion of the design of
experiments.
• We will go deeper into this later.
• You will see that analyzing data from
experiments compares samples, like what we have
been doing so far.
• The difference is that we create the conditions
for the groups we want to compare.
• However, the question remains. What if we have
more than two samples to compare?
What if we have more than two
samples?
• Think of an experiment where we manipulate the substitution percentage in the elaboration of a meat by-product.
• Let's imagine that the literature (past
research) points to three different percentage
substitutions.
• Once we have the data (likeability for example)
taken from three groups of people, we have
three groups.
• Before moving into the analysis, what have we changed to produce (presumably) a different likeability score?
What if we have more than two
samples?
• This is the factor. What we manipulate changes a particular characteristic of interest (a different likeability score).
• As we saw in our previous classes, if we are
interested in comparing the means of two
samples, we use the t-test.
• What would you do if you had three?
What if we have more than two
samples?
• There is an analysis that does precisely that: the analysis of variance (ANOVA).
• We will talk a lot more about this in the next
classes.