Example 1: Chips Ahoy! (Adapted from SDM4 20.43)

Chips Ahoy claims that their 18-ounce bags of chocolate chip cookies contain over 1000 chocolate chips, on average. Dedicated statistics students at the Air Force Academy purchased some randomly selected bags of cookies and counted the chocolate chips. The following R chunk reads these data in:

library(readr)  # provides read_csv()
cookies <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Chips_ahoy.csv")
## Parsed with column specification:
## cols(
##   Chips = col_integer()
## )

Make a density plot of the number of chocolate chips in each bag (the variable is named Chips) and compute the mean and standard deviation of the number of chocolate chips.

SOLUTION:

library(ggplot2)  # provides ggplot(), aes(), and geom_density()
ggplot() +
  geom_density(mapping = aes(x = Chips), data = cookies)

mean(cookies$Chips)
## [1] 1238.188
sd(cookies$Chips)
## [1] 94.282

Would it be appropriate to conduct a hypothesis test about the mean number of chocolate chips per bag using these data? Check all assumptions.

SOLUTION:

To conduct a hypothesis test about the mean, we need to check three conditions:

  1. The distribution of values is nearly normal (unimodal and symmetric). This is satisfied in this case, as we can see from the density plot above.

  2. The sample size is “big enough”. Exactly how big depends on how far from normal the distribution is. The distribution looks quite close to normal in this case, so our sample of 16 bags of cookies is big enough.

  3. Independence. We’re told that the bags of cookies were sampled randomly.

Regardless of your answer above, let’s go ahead with the hypothesis test to see whether the data provide strong evidence that the mean number of chocolate chips is over 1000. State your null and alternative hypotheses.

SOLUTION:

\(H_0: \mu = 1000\)

\(H_A: \mu > 1000\)

Calculate the p-value using the t.test function in R.

SOLUTION:

t.test(cookies$Chips, mu = 1000, alternative = "greater")
## 
##  One Sample t-test
## 
## data:  cookies$Chips
## t = 10.105, df = 15, p-value = 2.176e-08
## alternative hypothesis: true mean is greater than 1000
## 95 percent confidence interval:
##  1196.867      Inf
## sample estimates:
## mean of x 
##  1238.188

The p-value is \(2.176 \times 10^{-8} = 0.00000002176\).
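
The lower limit on the confidence interval line of the output can also be reproduced by hand, which helps show how the one-sided interval relates to the test. The sketch below is an illustration rather than part of the original solution. It assumes the rounded summary statistics reported earlier (mean 1238.188, standard deviation 94.282, 16 bags), so it should agree with the t.test output only up to rounding; the names t_star and lower_bound are chosen here for illustration.

```r
# Summary statistics as reported earlier (rounded, so results are approximate)
x_bar <- 1238.188
s <- 94.282
n <- 16

# A one-sided 95% interval puts the full 5% error rate in one tail,
# so it uses the 0.95 quantile of the t distribution with n - 1 degrees of freedom
t_star <- qt(0.95, df = n - 1)
lower_bound <- x_bar - t_star * s / sqrt(n)
lower_bound
```

The result should land close to the 1196.867 shown in the output. Because that lower bound is far above 1000, the interval and the small p-value point to the same conclusion.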

Calculate the p-value again using the pt function in R. You will first need to calculate the test statistic, and then calculate the p-value: the probability of getting a “more extreme” test statistic, assuming the null hypothesis is true. You should get the same answer as above, up to rounding error.

SOLUTION:

Since the alternative hypothesis is that the population mean is greater than 1000, the p-value is the probability of getting a test statistic at least as large as what we got in this data set, assuming the null hypothesis is true. This is 1 - the probability of getting a test statistic that is smaller than what we got in this data set; see the last line in the R chunk below.

# Calculate test statistic:
x_bar <- mean(cookies$Chips)
s <- sd(cookies$Chips)
n <- nrow(cookies)
test_statistic <- (x_bar - 1000)/(s / sqrt(n))
test_statistic
## [1] 10.10532
1 - pt(test_statistic, df = n - 1)
## [1] 2.176348e-08
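
As a possible alternative to subtracting from 1, pt accepts a lower.tail argument that requests the upper-tail probability directly. The sketch below assumes the rounded summary statistics reported earlier instead of the raw data, so its result may differ slightly from the value above in the later digits; the name p_upper is chosen here for illustration.

```r
# Rounded summary statistics from the cookie data (assumed here in place of the raw data)
x_bar <- 1238.188
s <- 94.282
n <- 16
test_statistic <- (x_bar - 1000) / (s / sqrt(n))

# lower.tail = FALSE returns P(T > test_statistic) without computing 1 - pt(...)
p_upper <- pt(test_statistic, df = n - 1, lower.tail = FALSE)
p_upper
```

For p-values of this size both approaches agree. The direct upper-tail form is mainly useful when the p-value is so small that 1 - pt(...) would round to exactly 0.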

Draw a conclusion in the context of this problem.

SOLUTION:

Since the p-value is small, it would be very unlikely to observe a test statistic at least as large as 10.105 if the mean number of chocolate chips were 1000. Formally, the p-value is less than \(\alpha = 0.001\) or any other reasonable significance level cut-off. The data offer enough evidence to reject the null hypothesis and conclude that the mean number of chocolate chips per bag is greater than 1000 at the \(\alpha = 0.001\) significance level.

Example 2: Mercury Contamination in Food

Dolphins are at the top of the oceanic food chain; as a consequence, dangerous substances such as mercury tend to be present in their organs and muscles at high concentrations. In areas where dolphins are regularly consumed, it is important to monitor dolphin mercury levels. This example uses data from a random sample of 19 Risso’s dolphins from the Taiji area in Japan. (Taiji is a significant source of dolphin and whale meat in Japan. Thousands of dolphins pass through the Taiji area annually; assume that these 19 dolphins represent a simple random sample. Data reference: Endo T and Haraguchi K. 2009. High mercury levels in hair samples from residents of Taiji, a Japanese whaling town. Marine Pollution Bulletin 60(5):743-747.)

In a sample of 19 dolphins, the average concentration of mercury was 4.4 micrograms of mercury per wet gram of muscle, with a standard deviation of 2.3 micrograms of mercury per wet gram of muscle.

Based on guidelines from the Food and Agriculture Organization, a specialized agency of the United Nations, the maximum safe concentration of mercury for someone weighing 70 kg who wants to eat a serving of dolphin meat is about 1.32 micrograms of mercury per gram of muscle.

Do the data provide strong evidence that the concentration of mercury in dolphin meat is above the safe limit for consumption? That is, is the population mean concentration of mercury greater than 1.32 micrograms of mercury per gram of muscle?

We don’t have the data available to make a plot of the concentration of mercury in dolphin meat for this sample, so we can’t check all of the necessary assumptions. Regardless, list the assumptions necessary for conducting a hypothesis test about the population mean. Check the assumptions that you can check, and for the rest state what you would need to look at.

SOLUTION:

To conduct a hypothesis test about the mean, we need to check three conditions:

  1. The distribution of values is nearly normal (unimodal and symmetric). We can’t check this without looking at a plot of the data.

  2. The sample size is “big enough”. Exactly how big depends on how far from normal the distribution is. Since we don’t know how closely the data follow a normal distribution, it’s hard to assess whether a sample of 19 dolphins is large enough. We need to look at a plot of the data to check this assumption as well.

  3. Independence. We’re told that it’s safe to assume that these 19 dolphins represent a simple random sample, so the assumption of independence is reasonable.

Regardless of your answer above, let’s go ahead with the hypothesis test to see whether the data provide strong evidence that the concentration of mercury is greater than 1.32 micrograms per gram of muscle. State your null and alternative hypotheses.

SOLUTION:

\(H_0: \mu = 1.32\)

\(H_A: \mu > 1.32\)

Calculate the p-value using the pt function in R. You will first need to calculate the test statistic, and then calculate the p-value: the probability of getting a “more extreme” test statistic, assuming the null hypothesis is true.

SOLUTION:

# Calculate test statistic:
x_bar <- 4.4
s <- 2.3
n <- 19
test_statistic <- (x_bar - 1.32)/(s / sqrt(n))
test_statistic
## [1] 5.837134
1 - pt(test_statistic, df = n - 1)
## [1] 7.879384e-06

The p-value is \(7.88 \times 10^{-6} = 0.00000788\).
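
Because only summary statistics are available here, the same steps can be wrapped in a small function for reuse. The following is a minimal sketch, not a built-in R function: the name t_test_from_summary and its arguments are hypothetical, and it handles only the upper-tailed alternative used in this example.

```r
# Hypothetical helper: one-sample, upper-tailed t test from summary statistics only
t_test_from_summary <- function(x_bar, s, n, mu_0) {
  standard_error <- s / sqrt(n)
  test_statistic <- (x_bar - mu_0) / standard_error
  # lower.tail = FALSE gives P(T > test_statistic), the same quantity as 1 - pt(...)
  p_value <- pt(test_statistic, df = n - 1, lower.tail = FALSE)
  list(statistic = test_statistic, df = n - 1, p_value = p_value)
}

# Dolphin mercury example: sample mean 4.4, sd 2.3, n = 19, safe limit 1.32
dolphins <- t_test_from_summary(x_bar = 4.4, s = 2.3, n = 19, mu_0 = 1.32)
dolphins$statistic
dolphins$p_value
```

The statistic and p-value returned should match the values computed step by step above.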

Draw a conclusion in the context of this problem.

SOLUTION:

The p-value is less than any reasonable significance level cut-off such as \(\alpha = 0.001\). These data offer enough evidence to reject the null hypothesis at the \(\alpha = 0.001\) significance level, and conclude that the mean mercury concentration in dolphin meat is higher than the safe limit for consumption.
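
The same decision can be reached by comparing the test statistic to a critical value instead of comparing the p-value to \(\alpha\). The sketch below assumes the summary statistics given in the problem and \(\alpha = 0.001\); the names alpha and critical_value are chosen here for illustration.

```r
# Summary statistics from the problem statement
x_bar <- 4.4
s <- 2.3
n <- 19
alpha <- 0.001

test_statistic <- (x_bar - 1.32) / (s / sqrt(n))

# Cut-off that a t statistic with n - 1 degrees of freedom exceeds
# with probability alpha when the null hypothesis is true
critical_value <- qt(1 - alpha, df = n - 1)

# TRUE means the null hypothesis is rejected at significance level alpha
test_statistic > critical_value
```

For an upper-tailed test, the test statistic exceeds the critical value exactly when the p-value is below \(\alpha\), so the two approaches always give the same decision.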