Education and Mortality
The following data set has the mortality rate (deaths per 100,000 people) and the education level (average number of years in school) for 58 U.S. cities.
library(readr)  # provides read_csv(); library(tidyverse) also loads it
death <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Education_and_mortality.csv")
## Parsed with column specification:
## cols(
## Mortality = col_double(),
## Education = col_double()
## )
head(death)
## # A tibble: 6 x 2
## Mortality Education
## <dbl> <dbl>
## 1 921.9 11.4
## 2 997.9 11.0
## 3 962.4 9.8
## 4 982.3 11.1
## 5 1071.3 9.6
## 6 1030.4 10.2
nrow(death)
## [1] 58
1. Initial Questions
- What are the observational units?
- Are the variables quantitative or categorical?
SOLUTION:
2. Scatterplot
Make a scatterplot of the data, treating Education as the explanatory variable and Mortality as the response.
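One possible sketch of this plot follows. It assumes the ggplot2 package (part of the tidyverse) is installed. The axis labels are my own wording, based on the variable descriptions above. Education goes on the x-axis because it is the explanatory variable.

```r
library(readr)
library(ggplot2)

# Reuse `death` if it is already loaded; otherwise read it from the course URL
if (!exists("death")) {
  death <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Education_and_mortality.csv")
}

# Explanatory variable (Education) on x, response (Mortality) on y
p <- ggplot(data = death, mapping = aes(x = Education, y = Mortality)) +
  geom_point() +
  labs(x = "Education (average years in school)",
       y = "Mortality (deaths per 100,000 people)")
print(p)
```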
SOLUTION:
3. Are the assumptions for a linear model met?
Check all the assumptions you can check without actually fitting the model.
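Before fitting, you can check conditions such as whether both variables are quantitative, whether the relationship looks straight enough, and whether there are extreme outliers. The sketch below assumes ggplot2 is installed. It overlays a loess smoother, which is a flexible curve and not the linear model. If that curve bends noticeably, the straight-enough condition is in doubt. The correlation is only a meaningful summary if the pattern is roughly linear.

```r
library(readr)
library(ggplot2)

# Reuse `death` if it is already loaded; otherwise read it from the course URL
if (!exists("death")) {
  death <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Education_and_mortality.csv")
}

# Scatterplot plus a loess smoother to judge whether the trend is roughly straight
p_smooth <- ggplot(death, aes(x = Education, y = Mortality)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
print(p_smooth)

# Correlation: a sensible summary only if the scatterplot looks linear
r <- cor(death$Education, death$Mortality)
r
```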
SOLUTION:
4. Regardless of your answer above, go ahead and fit the linear model.
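Here is a minimal sketch of the fit with `lm()`. The object name `fit` is just a label chosen here. The formula puts the response on the left of `~` and the explanatory variable on the right.

```r
library(readr)

# Reuse `death` if it is already loaded; otherwise read it from the course URL
if (!exists("death")) {
  death <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Education_and_mortality.csv")
}

# Least-squares regression of Mortality on Education
fit <- lm(Mortality ~ Education, data = death)
summary(fit)
```

In the printed summary, the `Education` row of the coefficient table shows the slope estimate, its standard error, the t statistic, and the p-value (`Pr(>|t|)`).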
SOLUTION:
5. Check any assumptions you couldn’t check before you fit the model. Should we go ahead and use this model?
You’ll need to make a plot of the residuals.
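The sketch below makes two residual plots and assumes ggplot2 is installed. The first plots residuals against fitted values. Look for curvature, or for a fan shape, which would mean the spread is not constant. The second is a histogram of the residuals, where you should look for a roughly symmetric, single-peaked shape. The bin count of 10 is an arbitrary choice.

```r
library(readr)
library(ggplot2)

# Reuse `death` if it is already loaded; otherwise read it from the course URL
if (!exists("death")) {
  death <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Education_and_mortality.csv")
}
fit <- lm(Mortality ~ Education, data = death)

# Collect fitted values and residuals from the model
resid_df <- data.frame(fitted = fitted(fit), residual = residuals(fit))

# Residuals vs. fitted values, with a reference line at 0
p_resid <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
print(p_resid)

# Histogram of residuals to judge whether they are nearly normal
p_hist <- ggplot(resid_df, aes(x = residual)) +
  geom_histogram(bins = 10)
print(p_hist)
```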
SOLUTION:
6. Explain in context what the regression says about the relationship between education levels and mortality. Interpret both the intercept and the slope in context.
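Writing the fitted line in symbols may help with the interpretation. Here \(b_0\) and \(b_1\) stand for the intercept and slope estimates in your `lm()` output. This notation is mine and not taken from the output itself.

```latex
\begin{aligned}
\widehat{\text{Mortality}} &= b_0 + b_1 \times \text{Education} \\
\widehat{\text{Mortality}}\big|_{\text{Education}=x+1} - \widehat{\text{Mortality}}\big|_{\text{Education}=x} &= b_1
\end{aligned}
```

So \(b_1\) is the predicted change in mortality rate (deaths per 100,000 people) associated with one additional year of average schooling. Likewise, \(b_0\) is the predicted mortality rate for a city whose residents average 0 years of school. That is an extrapolation far outside the observed data.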
SOLUTION:
7. Conduct a hypothesis test of the claim that there is no relationship between education levels and mortality. State all hypotheses, defining all involved population parameters. Use the p-value from the R output in part 4, and verify that you know how to calculate the p-value yourself using the pt function with the coefficient estimate and standard error. (When I did this, I got p-values of about 6.169e-08 to 6.228e-08, depending on exactly how I did the rounding.)
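The sketch below shows the by-hand calculation, with `slope_p_value` as a hypothetical helper name. Under \(H_0: \beta_1 = 0\), the statistic \(t = b_1 / SE(b_1)\) follows a t distribution with \(n - 2 = 56\) degrees of freedom. The two-sided p-value therefore comes from the t distribution's CDF, `pt`. `qt` goes the other way, from a probability to a quantile (critical value).

```r
# Hypothetical helper: two-sided p-value for H0: beta_1 = 0 vs. HA: beta_1 != 0,
# computed from the slope estimate and its standard error
slope_p_value <- function(estimate, std_error, df) {
  t_stat <- estimate / std_error   # test statistic
  2 * pt(-abs(t_stat), df = df)    # total area in both tails beyond |t|
}
```

With `fit` from part 4, you could run `s <- coef(summary(fit))` and then `slope_p_value(s["Education", "Estimate"], s["Education", "Std. Error"], df.residual(fit))`. Plugging in rounded values of the estimate and standard error gives slightly different p-values, which is the rounding effect mentioned above.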
SOLUTION:
8. Obtain a 90% confidence interval for the population slope, \(\beta_1\). Compute it with the confint function, and verify that you can also compute it in R from the estimate and standard error, using the qt function to find the critical value. Interpret the confidence interval for \(\beta_1\) in context.
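The sketch below computes the interval by hand, with `slope_conf_int` as a hypothetical helper name. The interval is \(b_1 \pm t^* \, SE(b_1)\), where \(t^*\) is the `qt` quantile that leaves 5% in each tail for a 90% interval.

```r
# Hypothetical helper: confidence interval for a regression slope
# from its estimate and standard error, using the t distribution
slope_conf_int <- function(estimate, std_error, df, level = 0.90) {
  # t* leaves (1 - level) / 2 in the upper tail (5% when level = 0.90)
  t_star <- qt(1 - (1 - level) / 2, df = df)
  margin <- t_star * std_error
  c(lower = estimate - margin, upper = estimate + margin)
}
```

With `fit` from part 4 and `s <- coef(summary(fit))`, `confint(fit, "Education", level = 0.90)` should agree with `slope_conf_int(s["Education", "Estimate"], s["Education", "Std. Error"], df.residual(fit))`, apart from any rounding you apply.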
SOLUTION: