Comparing Fuel Efficiency of Automatic vs. Manual Transmission Cars

Below are data on fuel efficiency (in miles per gallon) from random samples of 26 manual and 26 automatic transmission cars manufactured in 2012, together with summary statistics for each group. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions, as measured by their combined city/highway mileage (the comb_mpg variable)?

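# load packages and data --------------------------------------------
library(readr)    # read_csv()
library(dplyr)    # filter(), mutate()
library(ggplot2)  # ggplot(), geom_density(), geom_histogram()
fuel_eff <- read_csv("https://mhc-stat140-2017.github.io/data/misc/fuel_eff.csv")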
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   model_yr = col_integer(),
##   model_type_index = col_integer(),
##   engine_displacement = col_double(),
##   no_cylinders = col_integer(),
##   city_mpg = col_integer(),
##   hwy_mpg = col_integer(),
##   comb_mpg = col_integer(),
##   no_gears = col_integer()
## )
## See spec(...) for full column specifications.
# select a small sample ---------------------------------------------
# row indices of manual ("M") and automatic ("A") transmission cars
man_rows <- which(fuel_eff$transmission == "M")
aut_rows <- which(fuel_eff$transmission == "A")

# randomly sample 26 cars of each type (seed set for reproducibility)
set.seed(3583)
man_rows_samp <- sample(man_rows, 26)
aut_rows_samp <- sample(aut_rows, 26)

fuel_eff_samp <- fuel_eff[c(man_rows_samp, aut_rows_samp), ]
fuel_eff_samp$transmission <- factor(fuel_eff_samp$transmission)

# factor levels are sorted alphabetically ("A", "M"), so relabel in that order
levels(fuel_eff_samp$transmission) <- c("automatic", "manual")

ggplot() +
  geom_density(mapping = aes(x = comb_mpg, color = transmission), data = fuel_eff_samp)

fuel_eff_man <- filter(fuel_eff_samp, transmission == "manual")
fuel_eff_aut <- filter(fuel_eff_samp, transmission == "automatic")

mean(fuel_eff_man$comb_mpg)
## [1] 22.85
sd(fuel_eff_man$comb_mpg)
## [1] 4.73
mean(fuel_eff_aut$comb_mpg)
## [1] 18.65
sd(fuel_eff_aut$comb_mpg)
## [1] 4.137

State the null and alternative hypotheses

Define \(\mu_1\) to be the mean fuel efficiency among the population of all automatic transmission cars manufactured in 2012 and \(\mu_2\) to be the mean fuel efficiency among the population of all manual transmission cars manufactured in 2012.

\(H_0\): \(\mu_1 = \mu_2\), or \(\mu_1 - \mu_2 = 0\)

\(H_A\): \(\mu_1 \neq \mu_2\), or \(\mu_1 - \mu_2 \neq 0\)

Check Assumptions for Inference

Paired or Unpaired?

Are these data paired or unpaired?

SOLUTION:

There is no indication that these data are paired: the manual and automatic cars were sampled separately. For the data to be paired, we would need, for each car model, fuel efficiency measurements on both a manual and an automatic version of that model. We will treat these as unpaired data.

1. Independence

  • Among observations within each group and observations in different groups for a two-sample test
  • Among the different pairs for a paired test

SOLUTION:

Since we’re measuring fuel efficiency for different, randomly selected cars, it’s reasonable to assume that their fuel efficiencies are independent within each group and across the two groups.

2. Nearly Normal

  • Check separately for both groups for a two-sample test
  • Check for the differences for a paired test

SOLUTION:

The density plot above shows that, within each group, the distribution of fuel efficiency measurements is nearly normal.

3. Sample size big enough

  • Check separately for both groups for a two-sample test
  • Check for the number of pairs for a paired test

SOLUTION:

How large the sample size must be depends on how far from normal the distribution of values within each group is. Since the distributions are quite close to normal within each group, a sample size of 26 in each group should be large enough.

Do the mechanics of the test

You can use the t.test() function.

# Welch two-sample t-test (the t.test() default); x = automatic, y = manual,
# so the estimated difference is mu_1 - mu_2
t.test(fuel_eff_aut$comb_mpg, fuel_eff_man$comb_mpg)
## 
##  Welch Two Sample t-test
## 
## data:  fuel_eff_aut$comb_mpg and fuel_eff_man$comb_mpg
## t = -3.4, df = 49, p-value = 0.001
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.669 -1.716
## sample estimates:
## mean of x mean of y 
##     18.65     22.85
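To see what t.test() is computing, here is a minimal sketch that reproduces the Welch statistic, degrees of freedom, p-value, and 95% interval by hand from the summary statistics printed earlier. Because it uses the rounded means and standard deviations, its results agree with the t.test() output only to about two decimal places.

```r
# Summary statistics copied (rounded) from the output above
n_aut <- 26; mean_aut <- 18.65; sd_aut <- 4.137
n_man <- 26; mean_man <- 22.85; sd_man <- 4.73

# Estimated variance of each sample mean
v_aut <- sd_aut^2 / n_aut
v_man <- sd_man^2 / n_man

# Welch standard error and t statistic for mu_1 - mu_2 (automatic - manual)
se <- sqrt(v_aut + v_man)
t_stat <- (mean_aut - mean_man) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_aut + v_man)^2 / (v_aut^2 / (n_aut - 1) + v_man^2 / (n_man - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci_95 <- (mean_aut - mean_man) + c(-1, 1) * qt(0.975, df = df_welch) * se

round(c(t = t_stat, df = df_welch, p = p_value), 4)
round(ci_95, 2)
```

The Welch version does not assume the two populations share a common variance, which is why its degrees of freedom (about 49) are not simply 26 + 26 - 2 = 50.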

Draw a conclusion in context of the problem

SOLUTION:

Since the p-value is 0.001339, which is less than commonly used significance levels such as \(\alpha = 0.05\) and \(\alpha = 0.01\), we reject the null hypothesis. The data provide enough evidence to conclude, at the \(\alpha = 0.01\) significance level, that the mean fuel efficiency of automatic transmission cars differs from the mean fuel efficiency of manual transmission cars. In our samples, the automatic cars averaged 18.65 mpg and the manual cars averaged 22.85 mpg.

State a 95% confidence interval and interpret it in context.

SOLUTION:

We are 95% confident that the difference between the mean fuel efficiency of all automatic transmission cars manufactured in 2012 and the mean fuel efficiency of all manual transmission cars manufactured in 2012 (\(\mu_1 - \mu_2\)) is between -6.67 mpg and -1.72 mpg; that is, manual transmission cars get, on average, between about 1.72 and 6.67 more miles per gallon. If we were to take many samples from these populations and use each one to compute a 95% confidence interval for the difference in population means, about 95% of those intervals would contain the true difference in mean fuel efficiency between automatic and manual transmission cars.
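The "many samples" interpretation can be checked by simulation. The sketch below assumes, purely for illustration, two hypothetical normal populations whose means and standard deviations equal the sample values above; it repeatedly draws 26 cars from each, builds a Welch 95% interval, and records how often the interval captures the true difference.

```r
set.seed(140)  # arbitrary seed so the simulation is reproducible

# Hypothetical populations: normal, with means and SDs borrowed from the samples
mu_aut <- 18.65; sigma_aut <- 4.137
mu_man <- 22.85; sigma_man <- 4.73
true_diff <- mu_aut - mu_man

n_sims <- 2000
covered <- logical(n_sims)

for (i in seq_len(n_sims)) {
  # Draw a fresh sample of 26 cars from each population
  aut <- rnorm(26, mean = mu_aut, sd = sigma_aut)
  man <- rnorm(26, mean = mu_man, sd = sigma_man)
  # Welch 95% confidence interval for mu_aut - mu_man
  ci <- t.test(aut, man, conf.level = 0.95)$conf.int
  # Record whether this interval captured the true difference
  covered[i] <- ci[1] <= true_diff && true_diff <= ci[2]
}

# Proportion of intervals containing the true difference (should be near 0.95)
mean(covered)
```

Any single interval either contains the true difference or it does not; the 95% describes the long-run behavior of the procedure.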

Accidents on Friday the 13th

The British Medical Journal published an article titled “Is Friday the 13th Bad for Your Health?” The article examined the number of people admitted to emergency rooms for vehicular accidents on 12 Friday evenings (6 each on the 6th and 13th). Here are the data:

friday13 <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Friday_the_13th_Part_2.csv")
## Parsed with column specification:
## cols(
##   `Year and Month` = col_character(),
##   `6th` = col_integer(),
##   `13th` = col_integer()
## )
names(friday13) <- c("year_month", "accidents_6th", "accidents_13th")
friday13 <- mutate(friday13, difference = accidents_13th - accidents_6th)
head(friday13)
## # A tibble: 6 x 4
##   year_month accidents_6th accidents_13th difference
##        <chr>         <int>          <int>      <int>
## 1     Oct-89             9             13          4
## 2     Jul-90             6             12          6
## 3     Sep-91            11             14          3
## 4     Dec-91            11             10         -1
## 5     Mar-92             3              4          1
## 6     Nov-92             5             12          7

Is there a difference between rates of accidents on Friday the 13th and Friday the 6th?

State Hypotheses

SOLUTION:

Define \(\mu_1\) to be the mean number of accidents that occur on Friday the 13th and \(\mu_2\) to be the mean number of accidents that occur on Friday the 6th.

\(H_0\): \(\mu_1 = \mu_2\), or \(\mu_1 - \mu_2 = 0\)

\(H_A\): \(\mu_1 \neq \mu_2\), or \(\mu_1 - \mu_2 \neq 0\)

Check Assumptions for Inference

Paired or Unpaired?

Are these data paired or unpaired?

SOLUTION:

These are paired data since for each month, we have observations of the number of accidents on two consecutive Fridays.
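To see why the pairing matters, the sketch below runs the paired test next to an (inappropriate) unpaired Welch test on the same numbers. The six rows are copied from the head(friday13) output above; the df = 5 in the paired t-test output later confirms there are exactly 6 pairs.

```r
# The six months shown in head(friday13) above
accidents_6th  <- c(9, 6, 11, 11, 3, 5)
accidents_13th <- c(13, 12, 14, 10, 4, 12)

# Appropriate analysis: observations are matched by month
paired_test <- t.test(accidents_13th, accidents_6th, paired = TRUE)

# For comparison only: an unpaired test that ignores the matching
unpaired_test <- t.test(accidents_13th, accidents_6th)

c(paired = paired_test$p.value, unpaired = unpaired_test$p.value)
```

Counts on the 6th and 13th of the same month tend to move together, so the differences vary much less than the raw counts. Ignoring the pairing inflates the standard error, and here the unpaired p-value ends up well above 0.05.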

1. Independence

  • Among observations within each group and observations in different groups for a two-sample test
  • Among the different pairs for a paired test

SOLUTION:

Each pair of observations (for the number of accidents on the 6th and on the 13th) occurs in a different month and year. There is no reason to think there would be a connection between the different months and years in this data set.

2. Nearly Normal

  • Check separately for both groups for a two-sample test
  • Check for the differences for a paired test

SOLUTION:

ggplot() +
  geom_density(mapping = aes(x = difference), data = friday13)

# binwidth = 1 because the differences are whole numbers
ggplot() +
  geom_histogram(mapping = aes(x = difference), data = friday13, binwidth = 1)

With only 6 differences, it’s difficult to assess the shape of the distribution; it may be skewed slightly to the left. Still, there are no extreme values, and the mean (3.33) is close to the median (3.5), so the mean seems a reasonable summary of the center of this distribution.
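A normal Q-Q plot is another common check for small samples. The base R sketch below, using the six differences copied from the friday13 output above, compares the mean and median and draws the Q-Q plot; points lying close to the reference line would suggest a nearly normal distribution, although with n = 6 any such check has little power.

```r
# Differences (13th minus 6th) copied from the friday13 output above
difference <- c(4, 6, 3, -1, 1, 7)

# Two measures of center: if they are close, the mean is a fair summary
mean(difference)
median(difference)

# Normal Q-Q plot of the differences with a reference line
qqnorm(difference, main = "Normal Q-Q plot of differences")
qqline(difference)
```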

3. Sample size big enough

  • Check separately for both groups for a two-sample test
  • Check for the number of pairs for a paired test

SOLUTION:

A sample size of only 6 pairs is very small. With so few observations, the t-test relies heavily on the differences being nearly normal, which we cannot check well here, so any inference from these data should be treated with caution.
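One way to see why a small sample is risky is to simulate a skewed population. The sketch below uses a hypothetical exponential population (not the accident data) and estimates how often a nominal 95% t interval actually captures the true mean for samples of size 6 and size 30. With a skewed population, coverage at n = 6 typically falls noticeably short of 95%, while n = 30 comes closer.

```r
set.seed(2017)  # arbitrary seed for reproducibility

# Hypothetical skewed population: exponential with mean 1 (NOT the accident data)
true_mean <- 1

# Estimate how often a nominal 95% t interval captures the true mean
coverage <- function(n, n_sims = 4000) {
  hits <- replicate(n_sims, {
    x <- rexp(n, rate = 1 / true_mean)
    ci <- t.test(x, conf.level = 0.95)$conf.int
    ci[1] <= true_mean && true_mean <= ci[2]
  })
  mean(hits)
}

cov_6  <- coverage(6)
cov_30 <- coverage(30)
c(n_6 = cov_6, n_30 = cov_30)
```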

Calculate a p-value

You can use the t.test() function.

SOLUTION:

# paired t-test on the accident counts (13th minus 6th) with a 99% confidence interval
t.test(friday13$accidents_13th, friday13$accidents_6th, paired = TRUE, conf.level = 0.99)
## 
##  Paired t-test
## 
## data:  friday13$accidents_13th and friday13$accidents_6th
## t = 2.7, df = 5, p-value = 0.04
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
##  -1.623  8.290
## sample estimates:
## mean of the differences 
##                   3.333
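A paired t-test is the same as a one-sample t-test on the differences. The sketch below recomputes the statistic, p-value, and 99% interval by hand from the six differences in the friday13 output, and then confirms the interval with t.test() applied directly to the differences.

```r
# Differences (13th minus 6th) copied from the friday13 output above
difference <- c(4, 6, 3, -1, 1, 7)
n <- length(difference)

# One-sample t-test of H0: mean difference = 0
d_bar   <- mean(difference)
se      <- sd(difference) / sqrt(n)
t_stat  <- d_bar / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

# 99% confidence interval for the mean difference
ci_99 <- d_bar + c(-1, 1) * qt(0.995, df = n - 1) * se

round(c(t = t_stat, p = p_value), 3)
round(ci_99, 3)

# Same interval from t.test() on the differences
t.test(difference, conf.level = 0.99)$conf.int
```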

Draw a Conclusion

SOLUTION:

The p-value for this test is 0.042. This p-value is less than the significance cut-off of \(\alpha = 0.05\), so we can reject the null hypothesis. These data provide enough evidence to conclude that there is a difference in the mean number of accidents on Friday the 13th and Friday the 6th, at the \(\alpha = 0.05\) significance level. However, we should remain cautious about the strength of this inference given the small sample size noted above.

State a 99% confidence interval and interpret it in context.

SOLUTION:

We are 99% confident that the difference between the mean number of accidents on Fridays the 13th and the mean number of accidents on Fridays the 6th is between -1.62 and 8.29 accidents. If we took many different samples and computed a similar confidence interval from each one, about 99% of those intervals would contain the true difference in the population mean number of accidents on the 13th and on the 6th. Because this interval includes 0, it is consistent with the p-value of 0.042 being above \(\alpha = 0.01\): at the 1% significance level, we would not reject the null hypothesis.
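The link between confidence intervals and two-sided tests can be checked directly. Since the p-value (about 0.042) lies between 0.01 and 0.05, the sketch below, using the six differences from the friday13 output, should find 0 outside the 95% interval but inside the 99% interval.

```r
# Differences (13th minus 6th) copied from the friday13 output above
difference <- c(4, 6, 3, -1, 1, 7)

# Confidence intervals for the mean difference at two confidence levels
ci_95 <- t.test(difference, conf.level = 0.95)$conf.int
ci_99 <- t.test(difference, conf.level = 0.99)$conf.int

# Does each interval contain 0 (the null value)?
c(zero_in_95 = ci_95[1] <= 0 && 0 <= ci_95[2],
  zero_in_99 = ci_99[1] <= 0 && 0 <= ci_99[2])
```

A two-sided test at level \(\alpha\) rejects \(H_0\) exactly when the \(100(1 - \alpha)\%\) confidence interval excludes the null value, so both views lead to the same conclusion.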