---
title: "Hypothesis Tests for Population Means"
author: "Evan L. Ray"
date: "November 13, 2017"
output: ioslides_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, cache = FALSE)
require(ggplot2)
require(scales)
require(dplyr)
require(tidyr)
require(readr)
require(mosaic)
```
## Outline of Hypothesis Tests (Again)
1) **Collect Data**: (For each of 8 attempts, was Paul's prediction right?)
2) Calculate a **test statistic**: $x = 8$ (observed number correct)
3) Write down **hypotheses**:
* **Null Hypothesis**: Paul was just guessing: $p = 0.5$
* **Alternative Hypothesis**: Paul is psychic: $p > 0.5$
4) **Sampling Distribution** of the test statistic, assuming the null hypothesis is true.
5) **p-value**: probability of getting a test statistic at least as extreme as what we observed, assuming the null hypothesis is true.
6) **Conclusion**: Compare the p-value to the significance level $\alpha$. If the p-value is small, it's unlikely that Paul would get 8/8 right if he were just guessing, so we reject the null hypothesis.
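## Sketch: Paul's p-value in R

The outline on the previous slide can be traced in a few lines of R. This is a minimal sketch, assuming the number of correct predictions follows a Binomial(8, 0.5) distribution when the null hypothesis is true; the variable names are illustrative.

```{r, echo = TRUE}
n_attempts <- 8      # number of predictions
x_correct <- 8       # test statistic: observed number correct
p_null <- 0.5        # success probability under the null hypothesis

# p-value: P(X >= 8) if Paul is just guessing.
# With lower.tail = FALSE, pbinom() gives P(X > q), so use q = x_correct - 1.
p_value_paul <- pbinom(x_correct - 1, size = n_attempts, prob = p_null,
                       lower.tail = FALSE)
p_value_paul
```

Since all 8 predictions were right, this is just $0.5^8 \approx 0.0039$.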
## Example: Body Temperatures
```{r, echo = FALSE}
bodytemp = read.table('http://www.amstat.org/publications/jse/datasets/normtemp.dat.txt')
names(bodytemp) = c('temp','sex','hr')
bodytemp$sex = factor(bodytemp$sex)
levels(bodytemp$sex) = c("Males","Females")
```
* It's generally believed that the average body temperature is 98.6 degrees Fahrenheit (37 degrees Celsius).
* Let's investigate with measurements of the temperatures of 130 adults.
```{r, fig.height=2}
ggplot() +
geom_density(mapping = aes(x = temp), data = bodytemp)
```
* Hypotheses:
* $H_0$: $\mu = 98.6$
* $H_A$: $\mu \neq 98.6$
* What should our test statistic be?
## A Key Result from Last Class
* $\bar{X} \sim \text{Normal}(\mu, \sigma / \sqrt{n})$
* Across all samples, on average the sample mean is equal to the population mean $\mu$.
* The standard deviation of $\bar{X}$ is $\frac{1}{\sqrt{n}}$ times the standard deviation $\sigma$ of values in the population.
* $$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim \text{Normal}(0, 1)$$
* $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ is the distance of $\bar{X}$ from $\mu$, in units of $SD(\bar{X})$.
* $$\frac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1} \text{ (replace $\sigma$ with its estimate, $s$).}$$
* $\frac{\bar{X} - \mu}{s / \sqrt{n}}$ is the distance of $\bar{X}$ from $\mu$, in units of $SE(\bar{X})$.
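## Sketch: Checking the Key Result by Simulation

A small simulation can make the first result concrete. This is an illustrative sketch, not part of the original analysis: the population values `mu_pop` and `sigma_pop`, the sample size, and the number of simulated samples are all assumed for the example.

```{r, echo = TRUE}
set.seed(1)            # make the simulation reproducible
mu_pop <- 98.6         # assumed population mean
sigma_pop <- 0.7       # assumed population standard deviation
n_sim <- 25            # size of each simulated sample
n_reps <- 10000        # number of simulated samples

# Draw many samples from a normal population and keep each sample mean
sample_means <- replicate(n_reps,
                          mean(rnorm(n_sim, mean = mu_pop, sd = sigma_pop)))

mean(sample_means)        # should be close to mu_pop
sd(sample_means)          # should be close to sigma_pop / sqrt(n_sim)
sigma_pop / sqrt(n_sim)   # theoretical standard deviation of the sample mean
```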
## Test Statistic for a Mean
* Let's define our test statistic to be
$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \text{, where}$$
$\mu_0$ is the value of $\mu$ specified in $H_0$ (98.6 in this case)
* How far was the sample mean from the hypothesized population mean, in units of our best guess at the standard deviation of $\bar{X}$?
* If the null hypothesis is true, then
$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n - 1}$$
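## Sketch: Test Statistic as a Function

The formula on the previous slide translates directly into R. This is a minimal sketch; the helper name `t_statistic` and the five example temperatures are made up for illustration.

```{r, echo = TRUE}
# Hypothetical helper: one-sample t statistic for H0: mu = mu_null
t_statistic <- function(x, mu_null) {
  n_obs <- length(x)
  (mean(x) - mu_null) / (sd(x) / sqrt(n_obs))
}

example_temps <- c(98.2, 98.6, 97.9, 98.4, 98.0)  # made-up data
t_statistic(example_temps, mu_null = 98.6)
```

A negative value means the sample mean fell below $\mu_0$.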
## Conditions to Check
* Observations are **independent**
* Population is **nearly normal** (unimodal, approximately symmetric)...
* ...and **sample size** $n$ is large enough (how big depends on how asymmetric distribution is)
## Back to Body Temperatures
```{r, fig.height=2}
ggplot() +
geom_density(mapping = aes(x = temp), data = bodytemp)
```
Assumptions for hypothesis tests about means:
* Independence
* Data distribution is nearly normal (unimodal and symmetric)
* Sufficient sample size
## Hypotheses
* Null Hypothesis ($H_0$): $\mu = 98.6$ (where $\mu$ is the population mean temperature)
* Alternative Hypothesis ($H_A$): $\mu \neq 98.6$
## Test Statistic
```{r, echo = TRUE}
nrow(bodytemp)
mean(bodytemp$temp)
sd(bodytemp$temp)
```
$$
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{98.2492 - 98.6}{0.7332 / \sqrt{130}} \approx -5.455
$$
## Test Statistic in R
$$
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}
$$
```{r, echo = TRUE}
n <- nrow(bodytemp)
x_bar <- mean(bodytemp$temp)
s <- sd(bodytemp$temp)
mu_0 <- 98.6
t <- (x_bar - mu_0) / (s / sqrt(n))
t
```
## P-value
* Probability of getting a test statistic at least as extreme as what we observed, assuming the null hypothesis is true.
* "At least as extreme" in either direction, since $H_A: \mu \neq 98.6$
* $t \sim t_{129}$ (since $n = 130$ and the degrees of freedom are $n - 1$)
```{r, echo = FALSE, fig.height=4, fig.width=7}
plot_df <- data.frame(
x = seq(from = -6,
to = 6,
length = 101)
)
ggplot() +
# geom_polygon(aes(x = x, y = density), fill = "blue", alpha = 0.4, data = plot_df2) +
stat_function(mapping = aes(x = x),
fun = dt,
args = list(df = 129),
data = plot_df) +
geom_vline(xintercept = t) +
geom_vline(xintercept = -t)
```
## Calculation of p-value
```{r, echo = TRUE}
pt(-5.455, df = 129) # probability to the left of -5.455
1 - pt(5.455, df = 129) # probability to the right of 5.455
```
* Adding the two tail probabilities gives a combined p-value of about 0.000000241
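## Sketch: Two-Sided p-value in One Step

Because the t distribution is symmetric around 0, the two tail areas are equal, so the p-value can also be computed as twice one tail. This is a minimal sketch using the rounded test statistic from the previous slide; the variable names are illustrative.

```{r, echo = TRUE}
t_obs <- -5.455     # observed test statistic (rounded)
df_t <- 129         # degrees of freedom, n - 1

left_tail <- pt(-abs(t_obs), df = df_t)                      # area left of -|t|
right_tail <- pt(abs(t_obs), df = df_t, lower.tail = FALSE)  # area right of |t|

left_tail + right_tail   # sum of both tails
2 * left_tail            # same value, by symmetry
```

Using `abs()` makes the calculation work whether the observed statistic is negative or positive.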
## Alternative Calculation in R
```{r, echo = TRUE}
t.test(bodytemp$temp, mu = 98.6, alternative = "two.sided")
```
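## Sketch: Pulling Numbers out of t.test()

The object returned by `t.test()` stores the pieces of the test, so they can be reused in later calculations. This sketch runs on simulated data (the sample below is randomly generated for illustration and is not the body temperature data).

```{r, echo = TRUE}
set.seed(42)
sim_temps <- rnorm(30, mean = 98.2, sd = 0.7)  # simulated, illustrative sample

sim_test <- t.test(sim_temps, mu = 98.6, alternative = "two.sided")
sim_test$statistic   # t statistic
sim_test$parameter   # degrees of freedom
sim_test$p.value     # two-sided p-value

# The p-value agrees with the manual two-tail calculation
2 * pt(-abs(sim_test$statistic), df = sim_test$parameter)
```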
## Conclusion
* Compare the p-value to the significance level $\alpha$. For example, if $\alpha = 0.001$ then
$$0.000000241 < 0.001 \text{, so}$$
* The data provide enough evidence to conclude that the mean temperature is not 98.6 degrees F, at the $\alpha = 0.001$ significance level.
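## Sketch: The Decision Rule in R

The comparison on the previous slide can be written as a single logical test. This is a minimal sketch, with the p-value copied from that slide and a helper name (`decide`) invented for illustration.

```{r, echo = TRUE}
# Hypothetical helper: compare a p-value to the significance level
decide <- function(p_value, alpha) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(p_value = 0.000000241, alpha = 0.001)
decide(p_value = 0.03, alpha = 0.001)   # a made-up larger p-value, for contrast
```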
## From Wikipedia
"The range for normal human body temperatures, taken orally, is 36.8 $\pm$ 0.5 °C (98.2 $\pm$ 0.9 °F). This means that any oral temperature between 36.3 and 37.3 °C (97.3 and 99.1 °F) is likely to be normal.
The normal human body temperature is often stated as 36.5-37.5 °C (97.7-99.5 °F). In adults a review of the literature has found a wider range of 33.2-38.2 °C (91.8-100.8 °F) for normal temperatures, depending on the gender and location measured."
* https://en.wikipedia.org/wiki/Human_body_temperature
* Never cite Wikipedia