Education and Mortality

The following data set has the mortality rate (deaths per 100,000 people) and the education level (average number of years in school) for 58 U.S. cities.

library(readr)
death <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Education_and_mortality.csv")
## Parsed with column specification:
## cols(
##   Mortality = col_double(),
##   Education = col_double()
## )
head(death)
## # A tibble: 6 x 2
##   Mortality Education
##       <dbl>     <dbl>
## 1     921.9      11.4
## 2     997.9      11.0
## 3     962.4       9.8
## 4     982.3      11.1
## 5    1071.3       9.6
## 6    1030.4      10.2
nrow(death)
## [1] 58

1. Initial Questions

  • What are the observational units?
  • Are the variables quantitative or categorical?

SOLUTION:

2. Scatterplot

Make a scatterplot of the data, thinking of Education as the explanatory variable and Mortality as the response.

SOLUTION:
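
One way to set up the plot, as a minimal sketch rather than a full solution. To keep it runnable without downloading the file, it uses only the six rows printed by head(death) above, stored in a data frame called death_head (a name made up for this sketch); swap in death to plot all 58 cities. It uses base R's plot function to avoid extra packages, though ggplot2 would work just as well.

```r
# First six rows of the data, copied from the head(death) output above.
# This is only a stand-in so the sketch runs without downloading the file.
death_head <- data.frame(
  Mortality = c(921.9, 997.9, 962.4, 982.3, 1071.3, 1030.4),
  Education = c(11.4, 11.0, 9.8, 11.1, 9.6, 10.2)
)

# Response ~ explanatory: Education goes on the x-axis, Mortality on the y-axis.
plot(Mortality ~ Education, data = death_head,
     xlab = "Education (average years in school)",
     ylab = "Mortality (deaths per 100,000 people)")
```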

3. Are the assumptions for a linear model met?

Check all the assumptions you can check without actually fitting the model.

SOLUTION:

4. Regardless of your answer above, go ahead and fit the linear model.

SOLUTION:
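
A minimal sketch of the model fit with lm. It again uses only the six rows shown by head(death) (stored in the made-up name death_head), so the estimates it prints will not match the ones you get from the full data set; replace death_head with death for the real fit.

```r
# First six rows of the data, copied from the head(death) output above.
death_head <- data.frame(
  Mortality = c(921.9, 997.9, 962.4, 982.3, 1071.3, 1030.4),
  Education = c(11.4, 11.0, 9.8, 11.1, 9.6, 10.2)
)

# Fit Mortality as a linear function of Education.
fit <- lm(Mortality ~ Education, data = death_head)

# Coefficient estimates, standard errors, t statistics, and p-values.
summary(fit)
```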

5. Check any assumptions you couldn’t check before you fit the model. Should we go ahead with using this model?

You’ll need to make a plot of the residuals.

SOLUTION:
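
A sketch of one common residual plot: residuals against fitted values, with a dashed reference line at zero. It assumes the six-row stand-in data frame death_head (a hypothetical name, built from the head(death) output), so the picture is only illustrative; use the fit from the full data when you answer the question.

```r
# First six rows of the data, copied from the head(death) output above.
death_head <- data.frame(
  Mortality = c(921.9, 997.9, 962.4, 982.3, 1071.3, 1030.4),
  Education = c(11.4, 11.0, 9.8, 11.1, 9.6, 10.2)
)
fit <- lm(Mortality ~ Education, data = death_head)

# Residuals versus fitted values: look for curvature or changing spread.
plot(fitted(fit), resid(fit),
     xlab = "Fitted mortality",
     ylab = "Residual")
abline(h = 0, lty = 2)  # dashed horizontal line at zero for reference
```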

6. Explain in context what the regression says about the relationship between education levels and mortality. Interpret both the intercept and the slope in context.

SOLUTION:

7. Conduct a hypothesis test of the claim that there is no relationship between education levels and mortality. State all hypotheses, defining all involved population parameters. Use the p-value from the R output above, and verify that you can also calculate it yourself from the coefficient estimate and standard error using the pt function. (When I did this, I got p-values between about 6.169e-08 and 6.228e-08, depending on exactly how I rounded.)

SOLUTION:
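
A sketch of the by-hand p-value calculation, checked against the value that summary reports. It uses pt, the cumulative distribution function of the t distribution, which is what turns a t statistic into a tail probability. The data are the six-row stand-in death_head (a hypothetical name, built from the head(death) output), so the p-value here will be nowhere near the one from the full data set.

```r
# First six rows of the data, copied from the head(death) output above.
death_head <- data.frame(
  Mortality = c(921.9, 997.9, 962.4, 982.3, 1071.3, 1030.4),
  Education = c(11.4, 11.0, 9.8, 11.1, 9.6, 10.2)
)
fit <- lm(Mortality ~ Education, data = death_head)

# Pull the slope estimate and its standard error out of the summary table.
coefs <- summary(fit)$coefficients
est <- coefs["Education", "Estimate"]
se <- coefs["Education", "Std. Error"]

# Test statistic for H0: beta_1 = 0, with n - 2 degrees of freedom.
t_stat <- est / se
resid_df <- df.residual(fit)

# Two-sided p-value: twice the area beyond |t| in the upper tail.
p_manual <- 2 * pt(abs(t_stat), df = resid_df, lower.tail = FALSE)

p_manual
coefs["Education", "Pr(>|t|)"]  # should agree with p_manual
```

Working from rounded values of the estimate and standard error instead of the stored ones will change the last few digits of the p-value.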

8. Obtain a 90% confidence interval for the population slope, \(\beta_1\). Do this using the confint function, and verify that you can also compute it from the estimate and standard error using the qt function in R. Interpret the confidence interval for \(\beta_1\) in context.

SOLUTION:
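
A sketch of both routes to the interval: confint with level = 0.90, and estimate plus or minus a critical value from qt times the standard error. A 90% interval leaves 5% in each tail, so the critical value is the 0.95 quantile. As before, the data are the six-row stand-in death_head (a hypothetical name, built from the head(death) output), so the interval itself is only illustrative.

```r
# First six rows of the data, copied from the head(death) output above.
death_head <- data.frame(
  Mortality = c(921.9, 997.9, 962.4, 982.3, 1071.3, 1030.4),
  Education = c(11.4, 11.0, 9.8, 11.1, 9.6, 10.2)
)
fit <- lm(Mortality ~ Education, data = death_head)

# Route 1: let confint do the work.
ci_confint <- confint(fit, "Education", level = 0.90)

# Route 2: estimate +/- t* x SE, with t* the 0.95 quantile on n - 2 df.
coefs <- summary(fit)$coefficients
est <- coefs["Education", "Estimate"]
se <- coefs["Education", "Std. Error"]
t_crit <- qt(0.95, df = df.residual(fit))
ci_manual <- c(est - t_crit * se, est + t_crit * se)

ci_confint
ci_manual  # should match the confint interval
```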