This is how to do Fisher’s exact test. First construct a 2x2 contingency table (this is a perfect result for the lady tasting tea experiment: 8 cups, 4 of each kind, all classified correctly)
(C = matrix(c(4,0,0,4), 2, 2))
## [,1] [,2]
## [1,] 4 0
## [2,] 0 4
Now run Fisher’s test on the table. The alternative = "greater" option means that we are computing the probability of getting a table this good or better by random luck (i.e., one with numbers on the diagonal at least this large)
fisher.test(C, alternative = "greater")
##
## Fisher's Exact Test for Count Data
##
## data: C
## p-value = 0.01429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 2.003768 Inf
## sample estimates:
## odds ratio
## Inf
Notice this is the same as the probability of getting exactly this table by random luck, because there is no better table. There are “8 choose 4” possible guesses, so the probability of the correct guess is:
(p = 1 / choose(8, 4))
## [1] 0.01428571
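As a sanity check, the same probability comes from the hypergeometric distribution, which is the null distribution underlying Fisher’s exact test: dhyper below gives the probability of correctly identifying all 4 milk-first cups when guessing 4 of the 8 at random
dhyper(4, 4, 4, 4)  # = 1/70 = 0.01428571, matching the p-value above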
Now try the case where she misclassifies two cups
(C = matrix(c(3,1,1,3), 2, 2))
## [,1] [,2]
## [1,] 3 1
## [2,] 1 3
fisher.test(C, alternative = "greater")
##
## Fisher's Exact Test for Count Data
##
## data: C
## p-value = 0.2429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 0.3135693 Inf
## sample estimates:
## odds ratio
## 6.408309
And now 4 misclassified cups
(C = matrix(c(2,2,2,2), 2, 2))
## [,1] [,2]
## [1,] 2 2
## [2,] 2 2
fisher.test(C, alternative = "greater")
##
## Fisher's Exact Test for Count Data
##
## data: C
## p-value = 0.7571
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 0.05093776 Inf
## sample estimates:
## odds ratio
## 1
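To see where this p-value comes from, note that tables at least this good have 2, 3, or 4 cups of each kind on the diagonal; summing their hypergeometric probabilities reproduces the p-value
sum(dhyper(2:4, 4, 4, 4))  # (36 + 16 + 1) / 70 = 0.7571429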
Now try six misclassified cups. Notice the p-value below is just \(1 - p\), where \(p\) was the probability above of a perfect answer. This is because there is exactly one table that is worse (the one with every cup misclassified), and it occurs with probability \(p\)
(C = matrix(c(1,3,3,1), 2, 2))
## [,1] [,2]
## [1,] 1 3
## [2,] 3 1
fisher.test(C, alternative = "greater")
##
## Fisher's Exact Test for Count Data
##
## data: C
## p-value = 0.9857
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 0.00326534 Inf
## sample estimates:
## odds ratio
## 0.156047
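We can check the \(1 - p\) claim directly, using the p computed earlier from choose(8, 4)
1 - p  # = 69/70 = 0.9857143, matching the p-value above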
Everything wrong! Notice the probability of doing this well or better is 1.
(C = matrix(c(0,4,4,0), 2, 2))
## [,1] [,2]
## [1,] 0 4
## [2,] 4 0
fisher.test(C, alternative = "greater")
##
## Fisher's Exact Test for Count Data
##
## data: C
## p-value = 1
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
## 0 Inf
## sample estimates:
## odds ratio
## 0
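This makes sense: with the worst possible table, every outcome is at least this good, so the p-value is the sum of the hypergeometric probabilities of all five possible tables
sum(dhyper(0:4, 4, 4, 4))  # = 1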
Let’s test the hypothesis that the Michelson-Morley measurements were on average higher than the true speed of light
Null hypothesis, \(H_0 : \mu =\) “true speed”
Alternative hypothesis, \(H_1 : \mu >\) “true speed”
The true speed of light, in the same units as the morley data (km/s minus 299,000 km/s)
trueSpeed = 792.458
Let’s test the sample mean from just the 4th run (this was the closest to correct)
x = morley$Speed[morley$Expt == 4]
(sampleMean = mean(x) )
## [1] 820.5
sampleSigma = sd(x)
n = length(x)
Here is the critical value for an \(\alpha = 0.05\) significance level
(criticalValue = trueSpeed + sampleSigma / sqrt(n) * qt(1 - 0.05, df = n - 1))
## [1] 815.6729
We reject the null hypothesis if the sampleMean is greater than this (it is)
(sampleMean > criticalValue)
## [1] TRUE
Use a one-sided t test to get a p-value
(tStat = (sampleMean - trueSpeed) / (sampleSigma / sqrt(n)))
## [1] 2.088677
(pValue = 1 - pt(tStat, df = n - 1))
## [1] 0.02521578
We reject the null hypothesis if this p-value is < 0.05 (our significance level). Note: the final answer (reject or don’t reject) is equivalent to the critical value test
(pValue < 0.05)
## [1] TRUE
All of these steps can be verified with a single R command. (This is what you would use in practice if you did not want to do all of the individual steps)
t.test(x - trueSpeed, alternative = "greater")
##
## One Sample t-test
##
## data: x - trueSpeed
## t = 2.0887, df = 19, p-value = 0.02522
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 4.827144 Inf
## sample estimates:
## mean of x
## 28.042
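Equivalently, t.test can take the null value directly through its mu argument, rather than shifting the data by hand
t.test(x, mu = trueSpeed, alternative = "greater")  # same t, df, and p-value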
Let’s test the hypothesis that Obama was below 50% in Florida in 2012. We’ll use the last Rasmussen poll from 2012, which had him at 48% with a sample size of 750.
sampleMean = 0.48
n = 750
sampleSigma = sqrt(0.48 * (1 - 0.48))  # standard deviation of a Bernoulli(0.48) variable
Null hypothesis, \(H_0 : p = 0.5\)
Alternative hypothesis, \(H_1 : p < 0.5\)
Let’s use the t test again on the sample mean. It is the same procedure as above (just notice the sign change, because now we are testing the hypothesis that the mean is less than 0.5)
(criticalValue = 0.5 - sampleSigma / sqrt(n) * qt(1 - 0.05, df = n - 1))
## [1] 0.4699561
The test now rejects if the sample mean is less than the critical value
(sampleMean < criticalValue)
## [1] FALSE
Same p-value procedure as above, but now with the left tail (notice the “1 -” goes away)
(tStat = (sampleMean - 0.5) / (sampleSigma / sqrt(n)))
## [1] -1.096323
(pValue = pt(tStat, df = n - 1))
## [1] 0.136645
(pValue < 0.05)
## [1] FALSE
The p-value is not below 0.05, so we fail to reject the null hypothesis: this poll does not give significant evidence that Obama was below 50% in Florida.
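R also has built-in tests for proportions that could be used here instead; they are based on the normal approximation and the exact binomial distribution rather than the t distribution, so their p-values will differ slightly from the calculation above. 48% of 750 respondents is 360 supporters:
prop.test(360, 750, p = 0.5, alternative = "less")   # normal approximation with continuity correction
binom.test(360, 750, p = 0.5, alternative = "less")  # exact binomial test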
Let’s look at the iris data again and test for differences between the species. First, reduce the data down to just two of the three species
iris2 = iris[iris$Species == "virginica" | iris$Species == "versicolor",]
iris2$Species = factor(iris2$Species, levels = c("versicolor", "virginica"))
Say we are interested in whether these two species have different sepal lengths. Our null hypothesis is
H0 : “average sepal lengths of the two species are equal”
Our alternative hypothesis is
H1 : “average sepal lengths are different”
First, let’s do some boxplots of the data
boxplot(Sepal.Length ~ Species, data = iris2, ylab = "Sepal Length")
Looks like there may be a difference, but there is some overlap in the boxplots
Set up the two vectors of data
vers = iris2$Sepal.Length[iris2$Species == "versicolor"]
virg = iris2$Sepal.Length[iris2$Species == "virginica"]
n = length(vers)
m = length(virg)
Assuming equal variances, we start by computing the squared standard error of the difference in means, which uses the pooled variance estimate
sp = ((n - 1) * var(vers) + (m - 1) * var(virg)) / (n + m - 2) * (1/n + 1/m)  # pooled variance times (1/n + 1/m)
Now, we construct the t-statistic
(t = (mean(vers) - mean(virg)) / sqrt(sp))
## [1] -5.629165
Finally, we get a p-value by looking it up in the t distribution. Note this is a two-sided test because of our alternative hypothesis: we have to calculate the probability of being “more extreme” in either direction, i.e., to the right of \(|t|\) plus to the left of \(-|t|\)
(p = (1-pt(abs(t), df = m + n - 2)) + pt(-abs(t), df = m + n - 2))
## [1] 1.724856e-07
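Since the t distribution is symmetric, the same two-sided p-value can be written more compactly
2 * pt(-abs(t), df = m + n - 2)  # same value as above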
That is how to do this hypothesis test step by step. In practice it is easiest to use the built-in t.test function in R
t.test(vers, virg, var.equal = TRUE)
##
## Two Sample t-test
##
## data: vers and virg
## t = -5.6292, df = 98, p-value = 1.725e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.8818516 -0.4221484
## sample estimates:
## mean of x mean of y
## 5.936 6.588
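Note that var.equal = TRUE is what makes t.test match the pooled-variance calculation above. By default, t.test runs Welch’s test, which does not assume equal variances and uses an adjusted degrees of freedom, so it would give a slightly different answer here
t.test(vers, virg)  # Welch two-sample t-test, the R default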
Now let’s say that, before we started the experiment, we had a hunch that versicolor would have shorter sepals than virginica. Then we might use a one-sided alternative hypothesis,
H1 : (versicolor mean) < (virginica mean)
Now our p-value is
p = pt(t, df = m + n - 2)
Notice that the t statistic and the degrees of freedom don’t change, just the computation of the p-value. Since \(t\) is negative here, this one-sided p-value is exactly half the two-sided one
Again, the simple t.test way looks like this:
t.test(vers, virg, var.equal = TRUE, alternative = "less")
##
## Two Sample t-test
##
## data: vers and virg
## t = -5.6292, df = 98, p-value = 8.624e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4596661
## sample estimates:
## mean of x mean of y
## 5.936 6.588