--- title: "Hypothesis Tests for Population Proportions" author: "Evan L. Ray" date: "November 6, 2017" output: ioslides_presentation --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE, cache = FALSE) require(ggplot2) require(scales) require(dplyr) require(tidyr) require(readr) require(mosaic) ``` ## Is Paul the Octopus Psychic? Recall our procedure for hypothesis testing: 1. Collect **data**: for each of 8 trials, was the prediction correct? 2. Calculate a **sample statistic** (called the test statistic): * $x =$ total number correct (8 in our case) 3. Obtain the **sampling distribution** of the test statistic, assuming a **null hypothesis** of no effect (in this case, assuming Paul is just guessing) 4. Calculate the **p-value**: probability of getting a test statistic "at least as extreme" as what we observed in step 2 5. If the p-value is low, reject the null hypothesis and conclude that Paul is psychic! ## More Carefully... * **Test Statistic**: $x = 8$ (observed number correct) * **Null Hypothesis**: Paul was just guessing: $p = 0.5$ * **Alternative Hypothesis**: Paul is psychic: $p > 0.5$ * **Sampling Distribution**, assuming null hypothesis is true:
$$X \sim \text{Binomial}(8, 0.5)$$ (check assumptions!) * **p-value**: $$P(X \geq 8) = 0.0039$$ ```{r, echo = FALSE, fig.height=2.25, fig.width=3.3} Paul_success_probs <- data.frame( num_successes = seq(from = 0, to = 8), pv = factor(c(rep(0, 8), 1)), probability = dbinom(x = seq(from = 0, to = 8), size = 8, prob = 0.5)) ggplot() + geom_col(mapping = aes(x = num_successes, y = probability, fill = pv), data = Paul_success_probs) + xlab("Number of Successes") + scale_fill_manual("p-value", values = c("black", "red")) ```
* **Conclusion**: It's unlikely that Paul would get 8/8 right if he was just guessing, so we reject the null hypothesis and conclude that he is psychic! ## IMPORTANT FACT!!!
* Hypothesis tests are **guaranteed** to tell you the wrong thing sometimes!
* This is similar to the fact that confidence intervals are **guaranteed** to miss the population parameter sometimes.
* More on this in a few days.

(image credit: Paul J. Sullivan)
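This guarantee can be seen in a quick simulation (a sketch using base R only; the seed and the 100,000 simulated octopuses are arbitrary choices, not from the slides):

```{r, echo = TRUE}
# Simulate 100,000 octopuses that are all just guessing, and test each
# one: reject H0 whenever the one-sided p-value is below 0.05.
set.seed(42)
correct <- rbinom(100000, size = 8, prob = 0.5)
p_values <- pbinom(correct - 1, size = 8, prob = 0.5, lower.tail = FALSE)  # P(X >= correct)
mean(p_values < 0.05)  # fraction wrongly declared psychic; about 0.035
```

With $n = 8$ trials, the p-value drops below 0.05 whenever $x \geq 7$, so a guessing octopus is wrongly declared psychic with probability $P(X \geq 7) = 9/2^8 \approx 0.035$.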
## More on Hypotheses

* **Null Hypothesis**: (shorthand: $H_0$)
    * Nothing has changed since the past...
    * People are just guessing...
    * Nothing interesting is going on...
    * $p =$ (proportion from the past)/(chance of being right if just guessing)/(etc...)
* **Alternative Hypothesis**: (shorthand: $H_A$)
    * Times have changed!
    * People know what they're doing!
    * The world is fascinating!
    * $p \neq$ (value from null hypothesis)
    * $p >$ (value from null hypothesis)
    * $p <$ (value from null hypothesis)

## Examples of Hypotheses

* Paul the Octopus, 8 right out of 8
    * Null Hypothesis ($H_0$): $p = 0.5$
    * Alternative Hypothesis ($H_A$): $p > 0.5$
* Proportion of M\&M's that are blue (concerned it's lower now!); 12 blue out of 100
    * Null Hypothesis ($H_0$): $p = 0.16$
    * Alternative Hypothesis ($H_A$): $p < 0.16$
* The National Center for Education Statistics released a report in 1996 saying that 66% of students had missed at least one day of school in the past month. A more recent survey of 8302 students found that 5562 of them had missed at least one day of school. Has the rate of absenteeism changed?
    * Null Hypothesis ($H_0$): $p = 0.66$
    * Alternative Hypothesis ($H_A$): $p \neq 0.66$

## More on P-Values

* **p-value**: probability of getting a test statistic "at least as extreme" as what we observed, assuming $H_0$ is true
* What counts as "at least as extreme" depends on the form of the alternative hypothesis

## P-Values for One-Sided Tests
* Paul predicts 8 of 8 correctly * $H_0$: $p = 0.5$ * $H_A$: $p > 0.5$ * **p-value**: $P(X \geq 8) = 0.0039$ if $X \sim \text{Binomial}(8, 0.5)$ ```{r, echo = FALSE, fig.height=3.5, fig.width=3.75} Paul_success_probs <- data.frame( num_successes = seq(from = 0, to = 8), pv = factor(c(rep(0, 8), 1)), probability = dbinom(x = seq(from = 0, to = 8), size = 8, prob = 0.5)) ggplot() + geom_col(mapping = aes(x = num_successes, y = probability, fill = pv), data = Paul_success_probs) + xlab("Number of Successes") + scale_fill_manual("Included\nin p-value", labels = c("No", "Yes"), values = c("black", "red")) ```
* 12 Blue M\&M's out of 100 * $H_0$: $p = 0.16$ * $H_A$: $p < 0.16$ * **p-value**: $P(X \leq 12) = 0.1703$ if $X \sim \text{Binomial}(100, 0.16)$ ```{r, echo = FALSE, fig.height=3.5, fig.width=3.75} Paul_success_probs <- data.frame( num_successes = seq(from = 0, to = 100), pv = factor(c(rep(1, 13), rep(0, 88))), probability = dbinom(x = seq(from = 0, to = 100), size = 100, prob = 0.16)) ggplot() + geom_col(mapping = aes(x = num_successes, y = probability, fill = pv), data = Paul_success_probs) + xlab("Number of Successes") + scale_fill_manual("Included\nin p-value", labels = c("No", "Yes"), values = c("black", "red")) ```
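Both one-sided p-values above can be reproduced directly with base R's `pbinom` (a quick sketch; upper tail for Paul, lower tail for the blue M\&M's):

```{r, echo = TRUE}
# Paul: P(X >= 8) when X ~ Binomial(8, 0.5) under H0
pbinom(7, size = 8, prob = 0.5, lower.tail = FALSE)  # = 0.5^8 = 0.0039
# Blue M&M's: P(X <= 12) when X ~ Binomial(100, 0.16) under H0
pbinom(12, size = 100, prob = 0.16)  # = 0.1703
```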
## P-Values for Two-Sided Tests

* 5562 out of 8302 students missed at least one day of school.
* $H_0$: $p = 0.66$, $H_A$: $p \neq 0.66$
* If $H_0$ is true, $X \sim \text{Binomial}(8302, 0.66)$
* "At least as extreme": at least as far from the expected value
* $E(X) = np = 8302 \times 0.66 = 5479.32$
```{r, warning=FALSE, echo = FALSE, fig.height=2, fig.width=4} Paul_success_probs <- data.frame( num_successes = seq(from = 0, to = 8302), pv = 0, probability = dbinom(x = seq(from = 0, to = 8302), size = 8302, prob = 0.66)) diff <- 5562 - 5479 Paul_success_probs$pv[Paul_success_probs$num_successes >= 5562] <- 1 Paul_success_probs$pv[Paul_success_probs$num_successes <= 5479 - diff] <- 1 Paul_success_probs$pv <- factor(Paul_success_probs$pv) ggplot() + geom_col(mapping = aes(x = num_successes, y = probability, fill = pv), data = Paul_success_probs) + geom_vline(mapping = aes(xintercept = xintercept), color = "red", linetype = 2, data = data.frame(xintercept = 5479.32)) + xlab("Number of Successes") + scale_fill_manual("Included\nin p-value", labels = c("No", "Yes"), values = c("black", "red")) ``` ```{r, warning=FALSE, echo = FALSE, fig.height=2, fig.width=4} Paul_success_probs <- data.frame( num_successes = seq(from = 0, to = 8302), pv = 0, probability = dbinom(x = seq(from = 0, to = 8302), size = 8302, prob = 0.66)) diff <- 5562 - 5479 Paul_success_probs$pv[Paul_success_probs$num_successes >= 5562] <- 1 Paul_success_probs$pv[Paul_success_probs$num_successes <= 5479 - diff] <- 1 Paul_success_probs$pv <- factor(Paul_success_probs$pv) ggplot() + geom_col(mapping = aes(x = num_successes, y = probability, fill = pv), data = Paul_success_probs) + geom_vline(mapping = aes(xintercept = xintercept), color = "red", linetype = 2, size = 1, data = data.frame(xintercept = 5479.32)) + xlab("Number of Successes") + xlim(c(5250, 5700)) + scale_fill_manual("Included\nin p-value", labels = c("No", "Yes"), values = c("black", "red")) ```
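The shaded p-value is the sum of the two tail probabilities (a sketch using the same cutoffs as the plots: the observed $x = 5562$ is 83 above the rounded mean of 5479, so we count outcomes at least 83 away in either direction):

```{r, echo = TRUE}
# Two-sided p-value: outcomes at least as far from E(X) as observed
upper_tail <- pbinom(5561, size = 8302, prob = 0.66, lower.tail = FALSE)  # P(X >= 5562)
lower_tail <- pbinom(5396, size = 8302, prob = 0.66)                      # P(X <= 5396)
upper_tail + lower_tail
```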
* R actually does something slightly different, but the results will usually be the same as what's described here. ## Calculation of P-Values in R * Suppose we have a data frame with a variable indicating success/failure: ```{r, echo = FALSE} paul_guesses <- data.frame(result = rep("correct", 8)) ``` ```{r, echo = TRUE} paul_guesses ``` ## Calculation of P-Values in R (Cont'd) * One-sided: $H_A$: $p > 0.5$ ```{r, echo = TRUE} binom.test(paul_guesses$result, success = "correct", p = 0.5, alternative = "greater") ``` ## Calculation of P-Values in R (Cont'd) * One-sided: $H_A$: $p < 0.16$ * Suppose we know the number of trials ($n = 100$ M\&M's) and number of successes ($x = 12$ blue) ```{r, echo = TRUE} binom.test(x = 12, n = 100, p = 0.16, alternative = "less") ``` ## Calculation of P-Values in R (Cont'd) * Two-sided: $H_A$: $p \neq 0.66$ * Suppose we know the number of trials ($n = 8302$ students) and number of successes ($x = 5562$ missed school) ```{r, echo = TRUE} binom.test(x = 5562, n = 8302, p = 0.66, alternative = "two.sided") ``` ## Drawing Conclusions * **p-value**: probability of getting a test statistic at least as extreme as what we observed, assuming $H_0$ is true * e.g., probability of getting at least 8 predictions right if Paul is just guessing * If the p-value is **small**, that is evidence that the null hypothesis may not be true ## Drawing Conclusions * **p-value**: probability of getting a test statistic at least as extreme as what we observed, assuming $H_0$ is true * e.g., probability of getting at least 8 predictions right if Paul is just guessing * If the p-value is **small**, that is evidence that the null hypothesis may not be true * If we need to make a decision about whether or not the null hypothesis is true, we can see whether the p-value is smaller than a cutoff of our choosing * The cutoff is the **significance level** of the test * Denote the significance level by $\alpha$ (alpha) * A common significance level: $\alpha = 0.05$ * But 
this choice is arbitrary

## Drawing Conclusions

* If the p-value $< \alpha$, we "reject" $H_0$: the data offer enough evidence to conclude that $H_0$ is not true at the significance level $\alpha$.
* If the p-value $\geq \alpha$, we "fail to reject" $H_0$: the data **don't** offer enough evidence to conclude that $H_0$ is not true at the significance level $\alpha$.

## Note About the Book

* The procedure described in these slides is different from what's in the book.
* Our method uses
    * **sample statistic** = number of successes in the sample
    * **sampling distribution** = modeled with a Binomial
* The book's method uses
    * **sample statistic** = proportion of successes in the sample
    * **sampling distribution** = modeled with a Normal
* Everything else is the same (hypotheses, p-values, conclusions), and both methods are valid.
* Our procedure requires less work:
    * fewer assumptions to check (more broadly applicable)
    * fewer computations (e.g. no need to calculate $\sqrt{p(1-p)/n}$)
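For comparison, the book's Normal-based version of the two-sided test can be sketched as follows (using the absenteeism example; this is an illustration of the approach described above, not code from the book):

```{r, echo = TRUE}
# Book's method: the sample statistic is the proportion p_hat, modeled as
# Normal with mean p0 and standard error sqrt(p0 * (1 - p0) / n) under H0.
x <- 5562; n <- 8302; p0 <- 0.66
p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
2 * pnorm(-abs(z))  # two-sided p-value; close to the exact binomial answer
```

Here the z statistic is about 1.92 and the p-value about 0.055, agreeing closely with the binomial calculation.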