There are three main goals/learning objectives for this lab:
Get experience with checking conditions and interpreting confidence intervals for proportions.
Get some experience using R to calculate confidence intervals for proportions.
Understand how the width of the confidence interval depends on various factors:
the confidence level
the proportion that is being estimated
the sample size
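All three factors in the last objective appear in the margin of error of the Wald interval that this lab computes (the `ci.method = "wald"` option used below). That margin is z* * sqrt(p * (1 - p) / n). The base R sketch below evaluates that formula while changing one factor at a time. `wald_margin` is a hypothetical helper name, and the values of p and n are illustrative only. They are not survey results.

```r
# Margin of error of a Wald (normal-approximation) interval for a proportion:
#   z* * sqrt(p * (1 - p) / n)
# wald_margin is a hypothetical helper written for illustration.
wald_margin <- function(p, n, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # critical value, about 1.96 for 95%
  z_star * sqrt(p * (1 - p) / n)
}

# Vary one factor at a time, holding the other two fixed
wald_margin(p = 0.05, n = 1000, conf_level = c(0.90, 0.95, 0.99))  # higher level: wider
wald_margin(p = c(0.05, 0.25, 0.50), n = 1000)                     # p closer to 0.5: wider
wald_margin(p = 0.05, n = c(250, 1000, 4000))                      # larger n: narrower
```

The interval is the estimate plus or minus this margin, so its width is twice the returned value. Because n sits under a square root, quadrupling the sample size only halves the margin. The product p * (1 - p) is largest at p = 0.5.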
This lab will not be graded for credit. However, I will ask you to turn it in. There are several common errors in interpreting confidence intervals, and I want a chance to read your interpretations and give you feedback before you are responsible for this material on graded assignments. Please email the completed lab (Rmd file only) to me, cc-ing anyone you worked with, by 5pm on Monday, Nov 6. You will have class time on Wednesday and Friday to work on this, but we will not spend time on this lab in class on Monday. To download the Rmd file from RStudio, check the box next to the file name in the Files panel (lower right of RStudio), click “More” (top right of that panel), choose “Export…”, and save the file to your computer. Then you can attach it to an email.
In August of 2012, news outlets ranging from the Washington Post to the Huffington Post ran a story about the rise of atheism in America. The source for the story was a poll that asked people, “Irrespective of whether you attend a place of worship or not, would you say you are a religious person, not a religious person or a convinced atheist?” The full press release for the poll, conducted by WIN-Gallup International, is available at the following address:
SOLUTION:
SOLUTION:
Turn your attention to Table 6 of the press release (pages 15 and 16), which reports the sample size and response percentages for all 57 countries. While this is a useful format to summarize the data, we will base our analysis on the original data set of individual responses to the survey. Load this data set into R with the following commands.
library(tidyverse)  # attaches readr (read_csv) and dplyr (filter), used below
atheism <- read_csv("https://mhc-stat140-2017.github.io/data/openintro/atheism/atheism.csv")
## Parsed with column specification:
## cols(
## nationality = col_character(),
## response = col_character(),
## year = col_integer()
## )
head(atheism)
## # A tibble: 6 x 3
## nationality response year
## <chr> <chr> <int>
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 4 Afghanistan non-atheist 2012
## 5 Afghanistan non-atheist 2012
## 6 Afghanistan non-atheist 2012
str(atheism)
## Classes 'tbl_df', 'tbl' and 'data.frame': 88032 obs. of 3 variables:
## $ nationality: chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ response : chr "non-atheist" "non-atheist" "non-atheist" "non-atheist" ...
## $ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 3
## .. ..$ nationality: list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ response : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ year : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
atheism correspond to?
SOLUTION:
To investigate the link between these two ways of organizing the data, look at the estimated proportion of atheists in the United States. Toward the bottom of Table 6, we see that this is 5%. We can check this number using the atheism data by running the commands below. After running them, make sure you understand what each command does.
# Keep only the rows for respondents in the United States in 2012 (year is an integer column)
us_2012 <- filter(atheism, nationality == "United States", year == 2012)
nrow(us_2012)
## [1] 1002
head(us_2012)
## # A tibble: 6 x 3
## nationality response year
## <chr> <chr> <int>
## 1 United States non-atheist 2012
## 2 United States non-atheist 2012
## 3 United States non-atheist 2012
## 4 United States non-atheist 2012
## 5 United States non-atheist 2012
## 6 United States non-atheist 2012
table(us_2012$response)
##
## atheist non-atheist
## 50 952
table(us_2012$response) / nrow(us_2012)
##
## atheist non-atheist
## 0.0499002 0.9500998
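Dividing the table() counts by nrow() gives the same number as averaging a logical vector, because R treats TRUE as 1 and FALSE as 0. So mean(response == "atheist") is the sample proportion. The sketch below rebuilds the 2012 US responses from the counts shown above (50 atheist, 952 non-atheist) so that it runs without downloading the data. With the lab data loaded, you would use us_2012$response instead. The commented last line shows a base R way to get a per-country, Table 6 style summary. It assumes the atheism data frame from this lab.

```r
# Rebuild the 2012 US responses from the counts reported by table() above
us_response <- rep(c("atheist", "non-atheist"), times = c(50, 952))

length(us_response)             # sample size n
sum(us_response == "atheist")   # number of atheists in the sample
mean(us_response == "atheist")  # sample proportion p-hat (TRUE counts as 1)

# With the full atheism data loaded, the 2012 proportion for every country:
# with(subset(atheism, year == 2012), tapply(response == "atheist", nationality, mean))
```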
SOLUTION:
# Your code goes here
As Exercise 1 hinted, Table 6 provides statistics, that is, calculations made from the sample of 51,927 people. What we’d like, though, is insight into the population parameters. The question “What proportion of people in your sample reported being atheists?” is answered with a statistic, while the question “What proportion of people on earth would report being atheists?” is answered with an estimate of the parameter.
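A small simulation can make the distinction concrete. In the sketch below, we pretend the population parameter is known. The value p_true = 0.05 is a hypothetical choice for illustration, not a claim about any real population. We then draw several samples of the same size as the US poll. The parameter stays fixed, but each sample produces a different statistic p-hat.

```r
set.seed(2012)  # arbitrary seed so the random draws are reproducible

p_true <- 0.05  # hypothetical population parameter (unknown in real surveys)
n <- 1002       # same sample size as the 2012 US poll

# Five independent samples, each summarized by its own statistic p-hat
p_hats <- rbinom(5, size = n, prob = p_true) / n
p_hats
```

In practice we see only one p-hat. A confidence interval is a way to say how far the unknown parameter might plausibly be from it.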
Here is how we’d compute a 95% confidence interval for the proportion of atheists in the United States in 2012.
library(mosaic)  # mosaic's binom.test() accepts a vector of responses and a ci.method argument
confint(binom.test(us_2012$response, conf.level = 0.95, ci.method = "wald"))
## probability of success lower upper level
## 1 0.0499002 0.03641833 0.06338206 0.95
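With ci.method = "wald", the interval is p-hat plus or minus z* * sqrt(p-hat * (1 - p-hat) / n). The sketch below computes it by hand in base R from the US numbers shown above (50 atheists out of 1002 respondents), so you can see where each piece of the output comes from.

```r
n <- 1002      # US 2012 sample size, from nrow(us_2012)
p_hat <- 50 / n  # sample proportion of atheists, from table(us_2012$response)
conf_level <- 0.95

z_star <- qnorm(1 - (1 - conf_level) / 2)  # critical value, about 1.96
se <- sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error of p-hat
margin <- z_star * se                      # margin of error

ci <- c(lower = p_hat - margin, upper = p_hat + margin)
ci
```

Up to rounding, the lower and upper limits should match the confint() output above. The margin of error is half the width of the interval.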
SOLUTION:
SOLUTION:
SOLUTION:
SOLUTION:
# Your R code goes here
SOLUTION:
austria_2012 <- filter(atheism, nationality == "Austria", year == 2012)
nrow(austria_2012)
## [1] 1002
cr_2012 <- filter(atheism, nationality == "Czech Republic", year == 2012)
nrow(cr_2012)
## [1] 1000
kenya_2012 <- filter(atheism, nationality == "Kenya", year == 2012)
nrow(kenya_2012)
## [1] 1000
## Add confidence interval calculations for the proportion of the population
## who identify as atheists in each of Austria, the Czech Republic, and Kenya.
SOLUTION:
SOLUTION:
SOLUTION:
saudi_arabia_12 <- filter(atheism, nationality == "Saudi Arabia", year == 2012)
nrow(saudi_arabia_12)
## [1] 500
south_africa_12 <- filter(atheism, nationality == "South Africa", year == 2012)
nrow(south_africa_12)
## [1] 202
## Add confidence interval calculations for the proportion of the population
## who identify as atheists in each of Saudi Arabia and South Africa.
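Before trusting a Wald interval, it helps to check the success-failure condition: the sample should contain enough successes and enough failures. The sketch below uses the common textbook threshold of at least 10 of each. That threshold is an assumption, and some texts use a different cutoff. `check_success_failure` is a hypothetical helper, not a function from any package. Conditions about random sampling and independence cannot be checked from the counts alone, so they still need to be argued from how the survey was conducted.

```r
# Hypothetical helper (not part of any package): counts successes and failures
# in a vector of responses and checks n * p-hat >= threshold and
# n * (1 - p-hat) >= threshold.
check_success_failure <- function(response, success = "atheist", threshold = 10) {
  successes <- sum(response == success)      # n * p-hat
  failures <- length(response) - successes   # n * (1 - p-hat)
  list(successes = successes,
       failures = failures,
       condition_met = successes >= threshold && failures >= threshold)
}

# Example: the 2012 US responses, rebuilt from the counts reported earlier
us_response <- rep(c("atheist", "non-atheist"), times = c(50, 952))
check_success_failure(us_response)

# With the lab data loaded, the same call works on a filtered data frame, e.g.
# check_success_failure(saudi_arabia_12$response)
```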
SOLUTION:
SOLUTION: