Goals

There are three main goals/learning objectives for this lab:

  1. Get experience with checking conditions and interpreting confidence intervals for proportions.

  2. Get some experience using R to calculate confidence intervals for proportions.

  3. Understand how the width of the confidence interval depends on various factors:

    1. the confidence level

    2. the proportion that is being estimated

    3. the sample size

Grading

This lab will not be graded for credit. However, I will ask you to turn it in; there are several common errors in interpreting confidence intervals, and I want to have a chance to read your answers and give you feedback before you are responsible for this material on graded assignments. Please email the completed lab (Rmd file only) to me, cc-ing anyone you worked with, by 5pm on Monday, Nov 6. You will have class time on Wednesday and Friday to work on this, but we will not spend time on this lab in class on Monday. To download the Rmd file from RStudio, click the check box next to the file name in the lower right panel of RStudio, then click “More” (top right of that Files panel), choose “Export…”, and save the file to your computer. Then you can attach it to an email.

Introduction

In August of 2012, news outlets ranging from the Washington Post to the Huffington Post ran a story about the rise of atheism in America. The source for the story was a poll that asked people, “Irrespective of whether you attend a place of worship or not, would you say you are a religious person, not a religious person or a convinced atheist?” The full press release for the poll, conducted by WIN-Gallup International, is found at the following address:

*<https://mhc-stat140-2017.github.io/labs/20171101_p_ci/Global_INDEX_of_Religiosity_and_Atheism_PR__6.pdf>*

Preliminary Questions

1. In the first paragraph of the press release, several key findings are reported. Do these percentages appear to be sample statistics (derived from the data sample) or population parameters?

SOLUTION:

2. The title of the report is “Global Index of Religiosity and Atheism”. To generalize the report’s findings to the global human population, what must we assume about the sampling method? Does that seem like a reasonable assumption?

SOLUTION:

The data

Turn your attention to Table 6 of the press release (pages 15 and 16), which reports the sample size and response percentages for all 57 countries. While this is a useful format to summarize the data, we will base our analysis on the original data set of individual responses to the survey. Load this data set into R with the following commands.

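# Packages used in this lab: read_csv() is from readr and filter() is from dplyr.
# The binom.test() call with a ci.method argument used later appears to be
# the version provided by the mosaic package.
library(readr)
library(dplyr)
library(mosaic)
atheism <- read_csv("https://mhc-stat140-2017.github.io/data/openintro/atheism/atheism.csv")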
## Parsed with column specification:
## cols(
##   nationality = col_character(),
##   response = col_character(),
##   year = col_integer()
## )
head(atheism)
## # A tibble: 6 x 3
##   nationality    response  year
##         <chr>       <chr> <int>
## 1 Afghanistan non-atheist  2012
## 2 Afghanistan non-atheist  2012
## 3 Afghanistan non-atheist  2012
## 4 Afghanistan non-atheist  2012
## 5 Afghanistan non-atheist  2012
## 6 Afghanistan non-atheist  2012
str(atheism)
## Classes 'tbl_df', 'tbl' and 'data.frame':    88032 obs. of  3 variables:
##  $ nationality: chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ response   : chr  "non-atheist" "non-atheist" "non-atheist" "non-atheist" ...
##  $ year       : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 3
##   .. ..$ nationality: list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ response   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ year       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

3. What does each row of Table 6 correspond to? What does each row of atheism correspond to?

SOLUTION:

To investigate the link between these two ways of organizing this data, take a look at the estimated proportion of atheists in the United States. Towards the bottom of Table 6, we see that this is 5%. We can check this number using the atheism data by running the commands below. Make sure you understand what each of the commands below does after running it.

us_2012 <- filter(atheism, nationality == "United States", year == 2012)
nrow(us_2012)
## [1] 1002
head(us_2012)
## # A tibble: 6 x 3
##     nationality    response  year
##           <chr>       <chr> <int>
## 1 United States non-atheist  2012
## 2 United States non-atheist  2012
## 3 United States non-atheist  2012
## 4 United States non-atheist  2012
## 5 United States non-atheist  2012
## 6 United States non-atheist  2012
table(us_2012$response)
## 
##     atheist non-atheist 
##          50         952
table(us_2012$response) / nrow(us_2012)
## 
##     atheist non-atheist 
##   0.0499002   0.9500998
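
As a cross-check on the last two commands, the same proportion can be obtained in a couple of other ways. The sketch below rebuilds a response vector from the counts printed above (50 and 952), so it runs without downloading the data. The object names `responses` and `p_hat` are made up for illustration.

```r
# Rebuild a vector of responses from the counts in the table above.
responses <- rep(c("atheist", "non-atheist"), times = c(50, 952))

# prop.table() converts a table of counts into proportions,
# which is equivalent to dividing the table by the number of rows.
prop.table(table(responses))

# The mean of a logical vector is the proportion of TRUE values.
p_hat <- mean(responses == "atheist")
p_hat
```

Both approaches should agree with the value of about 0.0499 shown above.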

4. Using a similar series of commands, confirm the calculation of the proportion of atheist responses in our neighboring country of Canada. Does it agree with the percentage of 9% in Table 6?

SOLUTION:

# Your code goes here

Inference on proportions

As was hinted at in Exercise 1, Table 6 provides statistics, that is, calculations made from the sample of 51,927 people. What we’d like, though, is insight into the population parameters. The question “What proportion of people in your sample reported being atheists?” is answered with a statistic, while the question “What proportion of people on earth would report being atheists?” is answered with an estimate of the parameter.
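
To make the distinction between a statistic and a parameter concrete, here is a small simulation sketch. It assumes a made-up population in which exactly 5% of people would answer “atheist” (this value is an assumption for illustration, not a finding from the survey) and repeatedly draws samples of the same size as the US sample. The names `p_true` and `p_hats` are hypothetical.

```r
set.seed(140)  # arbitrary seed so the sketch is reproducible

# Assumed (made-up) population parameter: 5% would answer "atheist".
p_true <- 0.05
n <- 1002  # same sample size as the US sample in this lab

# Draw 1000 independent samples and record the sample proportion from each.
p_hats <- rbinom(1000, size = n, prob = p_true) / n

# The parameter is one fixed number; the statistic varies from sample to sample.
summary(p_hats)
sd(p_hats)                       # observed spread of the sample proportions
sqrt(p_true * (1 - p_true) / n)  # theoretical standard error, for comparison
```

In practice we see only one of these sample proportions and never the parameter itself, which is why we report an interval rather than a single number.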

A confidence interval

Here is how we’d compute a 95% confidence interval for the proportion of atheists in the United States in 2012.

confint(binom.test(us_2012$response, conf.level = 0.95, ci.method = "wald"))
##   probability of success      lower      upper level
## 1              0.0499002 0.03641833 0.06338206  0.95
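
The interval above can also be reproduced by hand from the counts in the table of US responses (50 atheist responses out of 1002), assuming that `ci.method = "wald"` refers to the usual normal-approximation interval, $\hat{p} \pm z^* \sqrt{\hat{p}(1 - \hat{p})/n}$. The sketch below uses only base R; the helper name `wald_ci` is made up for illustration and is not part of any package.

```r
# Hypothetical helper (not from any package): normal-approximation ("Wald")
# confidence interval for a proportion, given x successes out of n trials.
wald_ci <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # critical value
  se <- sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error
  c(estimate = p_hat,
    lower = p_hat - z_star * se,
    upper = p_hat + z_star * se)
}

# Counts taken from the table of US responses: 50 atheist out of 1002.
us_ci <- wald_ci(x = 50, n = 1002)
round(us_ci, 5)
```

If the assumption about the method is right, the lower and upper values should match the output above up to rounding.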

5. Interpret this confidence interval in the context of the problem.

SOLUTION:

6. Write out the conditions for inference to construct a 95% confidence interval for the proportion of atheists in the United States in 2012. Are you confident all conditions are met?

SOLUTION:

7. Although formal confidence intervals don’t show up in the report, suggestions of inference appear at the bottom of page 7: “In general, the error margin for surveys of this kind is plus or minus 3-5% at 95% confidence”. Based on the R output, what is the margin of error for the estimate of the proportion of atheists in the US in 2012?

SOLUTION:

Confidence interval width and the confidence level

8. Calculate a 90% confidence interval for the proportion of atheists in the United States in 2012. Is this interval wider or narrower than the 95% confidence interval we already calculated? Does that make sense?

SOLUTION:

# Your R code goes here

Confidence interval width and the proportion being estimated

9. Modify the R chunk below to calculate 95% confidence intervals for the proportion of the population who identify as atheists in Austria, the Czech Republic, and Kenya. (We should check the conditions for constructing the confidence interval as in question 6 above, but let’s skip that step here to focus our time on other issues.) Note that for each of these countries, as well as the U.S., the sample size is similar (about 1000 respondents). Then answer the questions below.

SOLUTION:

austria_2012 <- filter(atheism, nationality == "Austria", year == 2012)
nrow(austria_2012)
## [1] 1002
cr_2012 <- filter(atheism, nationality == "Czech Republic", year == 2012)
nrow(cr_2012)
## [1] 1000
kenya_2012 <- filter(atheism, nationality == "Kenya", year == 2012)
nrow(kenya_2012)
## [1] 1000
## Add confidence interval calculations for the proportion of the population
## who identify as atheists in each of Austria, the Czech Republic, and Kenya.

(i) Is the width of the confidence intervals the same for these three countries and the U.S.? How does this relate to the formula we learned for the margin of error used in calculating the confidence interval?

SOLUTION:

(ii) If we surveyed a new country where 70% of the population were atheists, and obtained a sample size of about 1000 people, how wide would you expect the confidence interval to be? Would it have the same or similar width as the interval we’ve seen for one of the other countries we’ve already looked at, or would it be larger or smaller?

SOLUTION:

Confidence interval width and the sample size

10. Calculate 95% confidence intervals for the proportion of the population who identify as atheists in Saudi Arabia and South Africa (again, skip checking the conditions for now). Note that the proportions who identify as atheists in these countries are similar to the proportion in the U.S., but the sample sizes are different. Then answer the questions below.

SOLUTION:

saudi_arabia_12 <- filter(atheism, nationality == "Saudi Arabia", year == 2012)
nrow(saudi_arabia_12)
## [1] 500
south_africa_12 <- filter(atheism, nationality == "South Africa", year == 2012)
nrow(south_africa_12)
## [1] 202
## Add confidence interval calculations for the proportion of the population
## who identify as atheists in each of Saudi Arabia and South Africa.

(i) Is the width of the confidence intervals the same for these two countries and the U.S.? How does this relate to the formula we learned for the margin of error used in calculating the confidence interval?

SOLUTION:

(ii) Suppose we are planning a survey of a new country, and our initial guess is that the proportion of the population who identify as atheists is about 0.15 (or 15%). If we want to ensure that the total width of a 95% confidence interval for that proportion will be no larger than 0.04, how large should our sample size be? Use the formula for the margin of error for a confidence interval based on the normal approximation. (Recall that the margin of error is half of the width of the interval!)
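
As an illustration of the arithmetic involved, the sketch below works through the same kind of calculation with different, made-up planning values (a guessed proportion of 0.30 and a target width of 0.06), so it is not the answer to this question. It assumes the normal-approximation margin of error, $z^* \sqrt{p(1 - p)/n}$, and solves for $n$. All object names are hypothetical.

```r
# Hypothetical planning values, chosen to differ from those in the question.
p_guess <- 0.30        # initial guess at the proportion
target_width <- 0.06   # desired total width of the 95% interval
margin <- target_width / 2  # margin of error is half the width

z_star <- qnorm(0.975)  # critical value for 95% confidence

# Solve margin = z_star * sqrt(p * (1 - p) / n) for n, then round up
# so that the interval is no wider than the target.
n_needed <- ceiling(z_star^2 * p_guess * (1 - p_guess) / margin^2)
n_needed
```

Rounding up rather than to the nearest integer matters here, because rounding down would give an interval slightly wider than the target.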

SOLUTION: