1. Religions Example

In the previous lab about confidence intervals for proportions, we looked at some data about religious views around the world. Let’s look at that data set again and conduct a hypothesis test.

Let’s load in the data and subset to obtain data for just the United States.

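library(readr)    # read_csv()
library(dplyr)    # filter()
library(mosaic)   # binom.test() that accepts a response vector and success = (used below)
atheism <- read_csv("https://mhc-stat140-2017.github.io/data/openintro/atheism/atheism.csv")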
## Parsed with column specification:
## cols(
##   nationality = col_character(),
##   response = col_character(),
##   year = col_integer()
## )
head(atheism)
## # A tibble: 6 x 3
##   nationality    response  year
##         <chr>       <chr> <int>
## 1 Afghanistan non-atheist  2012
## 2 Afghanistan non-atheist  2012
## 3 Afghanistan non-atheist  2012
## 4 Afghanistan non-atheist  2012
## 5 Afghanistan non-atheist  2012
## 6 Afghanistan non-atheist  2012
us_2012 <- filter(atheism, nationality == "United States", year == 2012)
table(us_2012$response)
## 
##     atheist non-atheist 
##          50         952
nrow(us_2012)
## [1] 1002
head(us_2012)
## # A tibble: 6 x 3
##     nationality    response  year
##           <chr>       <chr> <int>
## 1 United States non-atheist  2012
## 2 United States non-atheist  2012
## 3 United States non-atheist  2012
## 4 United States non-atheist  2012
## 5 United States non-atheist  2012
## 6 United States non-atheist  2012
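The table above gives counts, but the test is about a proportion. As a small illustration, the sketch below rebuilds a stand-in for `us_2012$response` from the counts shown by `table()` (50 atheists, 952 non-atheists), so it runs without downloading the data, and converts the counts to sample proportions with `prop.table()`. The vector name `response` is just for this example.

```r
# Stand-in for us_2012$response, rebuilt from the counts printed by table():
# 50 "atheist" and 952 "non-atheist" responses, 1002 in total.
response <- rep(c("atheist", "non-atheist"), times = c(50, 952))

counts <- table(response)       # same counts as table(us_2012$response)
props  <- prop.table(counts)    # divide each count by the total
props

# Sample proportion of atheists, p-hat = 50 / 1002
p_hat <- unname(props["atheist"])
p_hat
```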

According to Wikipedia, in 1991, 2% of U.S. citizens identified as atheists (https://en.wikipedia.org/wiki/Demographics_of_atheism#United_States). You suspect that this proportion has increased in the intervening years. Conduct a test to evaluate this hypothesis.

Write down the null and alternative hypotheses

SOLUTION:

\(H_0: p = 0.02\)

\(H_A: p > 0.02\)

Check assumptions and conditions

SOLUTION:

  1. Each person has two possible outcomes (atheist or non-atheist).

  2. Each person we pick has the same probability of being an atheist or non-atheist.

  3. The atheist/non-atheist status of the people in our sample is independent. Two things to check here:

      a. The sample was a random sample.

      b. The sample size, 1002, is less than 10% of the population size.
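The 10% condition can be turned into a concrete threshold: with a sample of 1002 people, the population must contain more than 10 × 1002 people. A quick check of that arithmetic (variable names are illustrative):

```r
# 10% condition: n < 0.10 * N, i.e. the population size N must exceed 10 * n
n <- 1002
min_population <- 10 * n
min_population
```

The population of U.S. citizens is far larger than 10,020, so this part of the condition is comfortably met.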

Calculate a p-value. Do this using the binom.test function and by hand using the pbinom function. You should get the same answer both ways.

SOLUTION:

Here’s the calculation using the binom.test function:

binom.test(us_2012$response, success = "atheist", p = 0.02, alternative = "greater")
## 
## 
## 
## data:  us_2012$response  [with success = atheist]
## number of successes = 50, number of trials = 1002, p-value =
## 8.4e-09
## alternative hypothesis: true probability of success is greater than 0.02
## 95 percent confidence interval:
##  0.03908496 1.00000000
## sample estimates:
## probability of success 
##              0.0499002

From the output, the p-value is \(8.4 \times 10^{-9} = 0.0000000084\).
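The call above passes the whole response vector along with `success = "atheist"`. That signature does not belong to base R's `binom.test`; it appears to be the version from the mosaic package (an assumption, since the lab's setup code is not shown). If you only have base R, `stats::binom.test` takes the number of successes and the number of trials directly. A sketch using the counts from the table above:

```r
# Base R's binom.test() takes counts rather than a vector of responses.
x <- 50     # atheists in the 2012 U.S. sample (from table(us_2012$response))
n <- 1002   # total number of respondents

res <- stats::binom.test(x, n, p = 0.02, alternative = "greater")
res$p.value    # one-sided p-value for H_A: p > 0.02
res$estimate   # sample proportion, 50 / 1002
res$conf.int   # one-sided 95% confidence interval (upper limit is 1)
```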

Here’s the calculation using the pbinom function. For this calculation, note that since the alternative hypothesis is that \(p > 0.02\), the p-value will be the probability of getting a test statistic at least as large as what we got in our sample. The number of “successes” (atheists) in this data set is 50 (from the output of table(us_2012$response) above). The sample size \(n\) is 1002. And we use the value of \(p\) from the null hypothesis, 0.02. So, in short, the model for the number of atheists in our sample corresponding to the null hypothesis is

\[X \sim \text{Binomial}(1002, 0.02)\]

and the p-value is calculated as

\[P(X \geq 50) = 1 - P(X \leq 49)\]
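Written out term by term with the binomial probability formula, this upper-tail probability is a sum over every possible count from 50 up to the sample size:

\[P(X \geq 50) = \sum_{k=50}^{1002} \binom{1002}{k} (0.02)^k (0.98)^{1002-k}\]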

We can calculate that probability in R as follows:

1 - pbinom(49, size = 1002, prob = 0.02)
## [1] 8.399696e-09

This is the same p-value we got using the binom.test function above.
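The same upper-tail probability can be computed in more than one way in R. This sketch (variable names are illustrative) checks three equivalent forms against each other. The `lower.tail = FALSE` form asks `pbinom` for \(P(X > 49)\) directly, which avoids the rounding error that comes from subtracting a number very close to 1 from 1 when the tail probability is tiny.

```r
n     <- 1002   # sample size
p0    <- 0.02   # proportion under the null hypothesis
x_obs <- 50     # observed number of atheists

# 1. Complement of the lower tail: 1 - P(X <= 49)
p_complement <- 1 - pbinom(x_obs - 1, size = n, prob = p0)

# 2. Upper tail directly: P(X > 49), which equals P(X >= 50)
p_upper <- pbinom(x_obs - 1, size = n, prob = p0, lower.tail = FALSE)

# 3. Sum of the individual probabilities P(X = 50), ..., P(X = 1002)
p_sum <- sum(dbinom(x_obs:n, size = n, prob = p0))

c(complement = p_complement, upper_tail = p_upper, sum = p_sum)
```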

Write down your conclusions

SOLUTION:

This p-value is very small. It would be unlikely to observe at least 50 atheists in a sample of size 1002 if the proportion of atheists in the population were 0.02. This offers evidence that the population proportion is greater than 0.02.

More formally, if we compare the p-value to a significance level cutoff such as \(\alpha = 0.001\), we see that the p-value is less than \(\alpha\). The data provide enough evidence to conclude that the population proportion is greater than \(0.02\) at the \(\alpha = 0.001\) significance level.
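The decision rule itself is just a comparison of two numbers. A minimal sketch, where `reject_null` is a hypothetical helper made up for this illustration (it is not part of base R or the lab):

```r
# Hypothetical helper: reject H0 when the p-value is below alpha
reject_null <- function(p_value, alpha) {
  p_value < alpha
}

# One-sided p-value from the atheism example, P(X >= 50)
p_value <- pbinom(49, size = 1002, prob = 0.02, lower.tail = FALSE)
reject_null(p_value, alpha = 0.001)   # TRUE: about 8.4e-09 is below 0.001
```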

2. Empty Houses (Adapted from SDM 4.6)

According to the 2010 Census, 11.4% of all housing units in the United States were vacant. A county supervisor wonders if her county is different from this. She randomly selects 850 housing units in her county and finds that 129 of the housing units are vacant.
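Before running the test, it helps to see how far the sample proportion is from the national rate. A quick sketch using the numbers from the problem (variable names are illustrative):

```r
x  <- 129    # vacant housing units in the sample
n  <- 850    # housing units sampled
p0 <- 0.114  # national vacancy rate from the 2010 Census

p_hat <- x / n
p_hat         # sample proportion of vacant units
p_hat - p0    # difference from the national rate
```

The test then asks whether a difference this large is plausible from sampling variability alone if the county's true rate were 0.114.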

Conduct a test to evaluate the county supervisor’s hypothesis.

Write down the null and alternative hypotheses

SOLUTION:

\(H_0: p = 0.114\)

\(H_A: p \neq 0.114\)

(If you’re looking at the Rmd document, the symbol \(\neq\) gets turned into a “does-not-equal sign” when you knit the document.)

Check assumptions and conditions

SOLUTION:

  1. Each house has two outcomes (vacant or occupied)

  2. Each randomly selected house we pick has the same probability of being vacant (equal to the proportion of vacant houses in the county).

  3. The vacant/occupied status of the houses in our sample is independent. Two things to check here:

      a. The sample was a random sample

      b. The sample size, 850 houses, is less than 10% of the population size. We can’t really check this from the information given in the problem, so we’ll just have to assume that there are more than 8,500 houses in the county.

Calculate a p-value. Use the binom.test function (you don’t need to do it again with the pbinom function).

SOLUTION:

# Exact binomial test of H0: p = 0.114 against the two-sided alternative p != 0.114
binom.test(129, 850, p = 0.114, alternative = "two.sided")
## 
## 
## 
## data:  129 out of 850
## number of successes = 129, number of trials = 850, p-value =
## 0.0008068
## alternative hypothesis: true probability of success is not equal to 0.114
## 95 percent confidence interval:
##  0.1282967 0.1776757
## sample estimates:
## probability of success 
##              0.1517647

The p-value is 0.0008.
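For a two-sided alternative, the p-value is not simply twice a one-sided tail. As far as we can tell from the source code of base R's `stats::binom.test`, it adds up the null probabilities of every count that is no more likely than the observed count of 129. The sketch below reproduces that idea by hand and compares it with `stats::binom.test`; the tolerance factor `1 + 1e-7` mirrors the one used in that source code to guard against floating-point ties.

```r
x  <- 129    # observed vacant units
n  <- 850    # sample size
p0 <- 0.114  # null-hypothesis proportion

# Null distribution: probability of every possible count 0, 1, ..., n
probs <- dbinom(0:n, size = n, prob = p0)
d_obs <- dbinom(x, size = n, prob = p0)

# Two-sided p-value: total probability of all counts no more likely than x
p_two_sided <- sum(probs[probs <= d_obs * (1 + 1e-7)])
p_two_sided

# Compare with base R's built-in exact test
stats::binom.test(x, n, p = p0, alternative = "two.sided")$p.value
```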

Write down your conclusions

SOLUTION:

We can compare the p-value to a specified significance level \(\alpha\). For example, we could use \(\alpha = 0.001\). Since the p-value is less than \(\alpha = 0.001\), we can reject the null hypothesis. The data provide statistically significant evidence that the proportion of vacant houses in the population is not equal to 0.114, at the \(\alpha = 0.001\) significance level.