In the previous lab about confidence intervals for proportions, we looked at some data about religious views around the world. Let’s look at that data set again and conduct a hypothesis test.
Let’s load in the data and subset to obtain data for just the United States.
atheism <- read_csv("https://mhc-stat140-2017.github.io/data/openintro/atheism/atheism.csv")
## Parsed with column specification:
## cols(
## nationality = col_character(),
## response = col_character(),
## year = col_integer()
## )
head(atheism)
## # A tibble: 6 x 3
## nationality response year
## <chr> <chr> <int>
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 4 Afghanistan non-atheist 2012
## 5 Afghanistan non-atheist 2012
## 6 Afghanistan non-atheist 2012
us_2012 <- filter(atheism, nationality == "United States", year == 2012)
table(us_2012$response)
##
## atheist non-atheist
## 50 952
nrow(us_2012)
## [1] 1002
head(us_2012)
## # A tibble: 6 x 3
## nationality response year
## <chr> <chr> <int>
## 1 United States non-atheist 2012
## 2 United States non-atheist 2012
## 3 United States non-atheist 2012
## 4 United States non-atheist 2012
## 5 United States non-atheist 2012
## 6 United States non-atheist 2012
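Before testing, it can help to look at the sample proportion. The sketch below rebuilds the responses from the counts reported by table(us_2012$response) (50 atheists, 952 non-atheists) instead of downloading the data, so `response`, `n`, and `p_hat` are illustrative names rather than objects from the lab.

```r
# Rebuild the responses from the counts reported by table(us_2012$response)
response <- rep(c("atheist", "non-atheist"), times = c(50, 952))

# Sample size and sample proportion of atheists
n <- length(response)
p_hat <- mean(response == "atheist")

n
p_hat
```

With the real data, `mean(us_2012$response == "atheist")` should give the same proportion.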
According to Wikipedia, in 1991, 2% of U.S. citizens identified as atheists (https://en.wikipedia.org/wiki/Demographics_of_atheism#United_States). You suspect that this proportion has increased in the intervening years. Conduct a test to evaluate this hypothesis.
SOLUTION:
\(H_0: p = 0.02\)
\(H_A: p > 0.02\)
SOLUTION:
Each person has two outcomes (atheist or non-atheist)
Each person we pick has the same probability of being an atheist or non-atheist
The atheist/non-atheist status of the people in our sample is independent. Two things to check here:
The sample was a random sample
The sample size, 1002, is less than 10% of the population size.
SOLUTION:
Here’s the calculation using the binom.test function:
binom.test(us_2012$response, success = "atheist", p = 0.02, alternative = "greater")
##
##
##
## data: us_2012$response [with success = atheist]
## number of successes = 50, number of trials = 1002, p-value =
## 8.4e-09
## alternative hypothesis: true probability of success is greater than 0.02
## 95 percent confidence interval:
## 0.03908496 1.00000000
## sample estimates:
## probability of success
## 0.0499002
From the output, the p-value is \(8.4 \times 10^{-9} = 0.0000000084\).
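The `success =` argument in the call to binom.test is not part of base R’s version of the function; it most likely comes from an add-on package such as mosaic (an assumption, since the lab does not show which packages are loaded). If that package is not available, the same test can be sketched with base R by passing the counts directly. The names `result`, `p_value`, and `p_hat` are illustrative.

```r
# Base R exact binomial test from counts: 50 atheists among 1002 respondents
result <- stats::binom.test(x = 50, n = 1002, p = 0.02, alternative = "greater")

# Pull the p-value and the sample proportion out of the result object
p_value <- result$p.value
p_hat <- unname(result$estimate)

p_value
p_hat
```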
Here’s the calculation using the pbinom function. Since the alternative hypothesis is \(p > 0.02\), the p-value is the probability of getting a test statistic at least as large as the one we observed in our sample. The number of “successes” (atheists) in this data set is 50 (from the output of table(us_2012$response) above). The sample size \(n\) is 1002, and we use the value of \(p\) from the null hypothesis, 0.02. So, in short, the model for the number of atheists in our sample under the null hypothesis is
\[X \sim \text{Binomial}(1002, 0.02)\]
and the p-value is calculated as
\[P(X \geq 50) = 1 - P(X \leq 49)\]
We can calculate that probability in R as follows:
1 - pbinom(49, size = 1002, prob = 0.02)
## [1] 8.399696e-09
This is the same p-value we got using the binom.test function above.
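Because this p-value is so small, subtracting from 1 can lose a little floating-point precision. As a sketch of two equivalent calculations, pbinom can return the upper tail directly with `lower.tail = FALSE`, and summing dbinom over the counts 50 through 1002 gives the same probability. The names `p_upper` and `p_sum` are illustrative.

```r
# P(X >= 50) = P(X > 49) for X ~ Binomial(1002, 0.02), computed as an upper tail
p_upper <- pbinom(49, size = 1002, prob = 0.02, lower.tail = FALSE)

# The same probability as a sum of point probabilities P(X = 50), ..., P(X = 1002)
p_sum <- sum(dbinom(50:1002, size = 1002, prob = 0.02))

p_upper
p_sum
```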
SOLUTION:
This p-value is very small. It would be very unlikely to observe at least 50 atheists in a sample of size 1002 if the proportion of atheists in the population were 0.02. This offers evidence that the population proportion is greater than 0.02.
More formally, if we compare the p-value to a significance level cutoff such as \(\alpha = 0.001\), we see that the p-value is less than \(\alpha\). The data provide enough evidence to conclude that the population proportion is greater than \(0.02\) at the \(\alpha = 0.001\) significance level.
According to the 2010 Census, 11.4% of all housing units in the United States were vacant. A county supervisor wonders if her county is different from this. She randomly selects 850 housing units in her county and finds that 129 of the housing units are vacant.
Conduct a test to evaluate the county supervisor’s hypothesis.
SOLUTION:
\(H_0: p = 0.114\)
\(H_A: p \neq 0.114\)
(If you’re looking at the Rmd document, the symbol \(\neq\) gets turned into a “does-not-equal sign” when you knit the document.)
SOLUTION:
Each house has two outcomes (vacant or occupied)
Each randomly selected house we pick has the same probability of being vacant (equal to the proportion of vacant houses in the county).
The vacant/occupied status of the houses in our sample is independent. Two things to check here:
The sample was a random sample
The sample size, 850 houses, is less than 10% of the population size. We can’t really check this from the information given in the problem, so we’ll just have to assume that there are more than 8,500 houses in the county.
SOLUTION:
# Exact binomial test: 129 vacant units out of 850, two-sided alternative
binom.test(129, 850, p = 0.114, alternative = "two.sided")
##
##
##
## data: 129 out of 850
## number of successes = 129, number of trials = 850, p-value =
## 0.0008068
## alternative hypothesis: true probability of success is not equal to 0.114
## 95 percent confidence interval:
## 0.1282967 0.1776757
## sample estimates:
## probability of success
## 0.1517647
The p-value is 0.0008.
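For a two-sided exact test there is more than one way to define “at least as extreme”. To the best of our knowledge, base R’s binom.test adds up the null probabilities of every count that is no more likely than the observed count. The sketch below reproduces that calculation by hand; the small tolerance factor is an assumption that mirrors what base R appears to do to guard against floating-point ties, and the variable names are illustrative.

```r
n <- 850
x_obs <- 129
p0 <- 0.114

# Null probabilities of every possible count 0, 1, ..., n
probs <- dbinom(0:n, size = n, prob = p0)

# Probability of the observed count, with a small factor to absorb rounding in ties
cutoff <- dbinom(x_obs, size = n, prob = p0) * (1 + 1e-7)

# Two-sided p-value: total probability of counts no more likely than x_obs
p_two_sided <- sum(probs[probs <= cutoff])
p_two_sided
```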
SOLUTION:
We can compare the p-value to a specified significance level \(\alpha\). For example, we could use \(\alpha = 0.001\). Since the p-value is less than \(\alpha = 0.001\), we can reject the null hypothesis. The data provide statistically significant evidence that the proportion of vacant houses in the population is not equal to 0.114, at the \(\alpha = 0.001\) significance level.
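The comparison with \(\alpha\) can also be done in code rather than by reading the printed output. This is a minimal sketch using base R’s binom.test with the counts from the problem; `alpha`, `result`, and `reject_null` are illustrative names.

```r
# Significance level chosen for the test
alpha <- 0.001

# Two-sided exact binomial test: 129 vacant units out of 850, null value 0.114
result <- stats::binom.test(129, 850, p = 0.114, alternative = "two.sided")

# TRUE means the p-value falls below alpha, so we reject the null hypothesis
reject_null <- result$p.value < alpha
reject_null
```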