This assignment is due by the start of class on Monday, December 11th.
SDM4 25.1, 25.3, 25.5, 25.7, 25.9, 25.11, 25.13, 25.15, 25.17, 25.19, 25.21, 25.23, 25.25, 25.31, 25.35, 25.37, 25.39
The following R code reads in data recording the mortality rate (age-adjusted deaths per 100,000 people) and the education level (average number of years in school) for 58 U.S. cities.
library(tidyverse)  # provides read_csv(), the dplyr verbs, and ggplot2 used below
cities <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Education_and_mortality.csv")
## Parsed with column specification:
## cols(
## Mortality = col_double(),
## Education = col_double()
## )
SOLUTION:
SOLUTION:
SOLUTION:
SOLUTION:
SOLUTION:
As part of your discussion, address the following points: i. the interpretation of the intercept; ii. the interpretation of the slope; and iii. what the model has to say about whether higher education rates cause reduced mortality rates.
SOLUTION:
SOLUTION:
Conduct a hypothesis test by using the pt() function to calculate a p-value. You will need some output from calling the summary() function on your linear model fit object. Verify that your p-value matches the p-value in the R output from the summary (mine matched the R output very closely).
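As a rough illustration of the mechanics only, the sketch below carries out this kind of calculation on R's built-in cars data set rather than the cities data, so its numbers are not the answer to this problem. It assumes a two-sided test of the null hypothesis that the slope is 0, and the object names (example_fit, coef_table, t_stat, and so on) are placeholders chosen for the example.

```r
# Illustration only: uses the built-in cars data, not the cities data.
example_fit <- lm(dist ~ speed, data = cars)
coef_table <- summary(example_fit)$coefficients

# Pull the slope estimate and its standard error from the summary output
slope_est <- coef_table["speed", "Estimate"]
slope_se <- coef_table["speed", "Std. Error"]

# Test statistic for H0: slope = 0, with n - 2 degrees of freedom
t_stat <- (slope_est - 0) / slope_se
df <- df.residual(example_fit)

# Two-sided p-value: area in both tails beyond |t_stat|
p_value <- 2 * pt(-abs(t_stat), df = df)

# Compare with the p-value that summary() reports
p_value
coef_table["speed", "Pr(>|t|)"]
```

The same steps should carry over to your own fit once you substitute your model object and the name of your explanatory variable.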
SOLUTION:
You should do this using the critical value from the qt() function as well as the output from calling the summary() function on your linear model fit object. Verify that your confidence interval is similar to the output from R’s confint() function (mine matched up to 1 decimal place).
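For reference, here is a minimal sketch of the estimate plus or minus critical value times standard error recipe, again on the built-in cars data rather than the cities data. The 95% confidence level is an assumption made for the example, and the object names are placeholders.

```r
# Illustration only: uses the built-in cars data, not the cities data.
example_fit <- lm(dist ~ speed, data = cars)
coef_table <- summary(example_fit)$coefficients
slope_est <- coef_table["speed", "Estimate"]
slope_se <- coef_table["speed", "Std. Error"]

# Critical value for a 95% interval: the 0.975 quantile of t with n - 2 df
t_star <- qt(0.975, df = df.residual(example_fit))

# Estimate plus or minus critical value times standard error
ci_by_hand <- c(slope_est - t_star * slope_se, slope_est + t_star * slope_se)
ci_by_hand

# Compare with R's built-in interval for the slope
confint(example_fit, "speed", level = 0.95)
```

For a different confidence level, change the quantile passed to qt() (for example, 0.95 for a 90% interval) and the level argument of confint() to match.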
SOLUTION:
SOLUTION:
SOLUTION:
SOLUTION:
This question is something of a throwback to Chapter 9 of the 4th edition of the book (Chapter 10 of the 3rd edition), updated to use the ideas we’ve been talking about more recently.
Biologists studying the effects of acid rain on wildlife collected data from 172 sites on streams in the Adirondack Mountains. Importantly, some of the sites are on the same stream. The researchers recorded the pH (acidity) of the water and the BCI, a measure of biological diversity. Here’s a look at the first 10 rows of the data and a scatterplot of BCI against pH for the 163 sites for which we have these data (we didn’t have measurements for all 172 sites), along with results from a linear model fit to the data.
streams <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Streams.csv")
## Parsed with column specification:
## cols(
## Stream = col_character(),
## SUB = col_character(),
## pH = col_double(),
## Temo = col_double(),
## BCI = col_character(),
## Hardness = col_double(),
## Alk = col_double(),
## Phos = col_double()
## )
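# BCI is read in as text; convert it to numeric (non-numeric entries become NA),
# then keep only the sites with both pH and BCI recorded
streams <- streams %>%
  mutate(BCI = as.numeric(BCI)) %>%
  filter(!is.na(pH) & !is.na(BCI)) %>%
  arrange(Stream)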
## Warning in evalq(as.numeric(BCI), <environment>): NAs introduced by
## coercion
head(streams, 10)
## # A tibble: 10 x 8
## Stream SUB pH Temo BCI Hardness Alk Phos
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ABIJAH S 6.75 8 1413 34.2 34.2 0.12
## 2 ABIJAH S 7.00 1 1366 51.3 51.3 0.20
## 3 ABIJAH178 S 7.00 2 1382 85.5 51.3 0.07
## 4 ABIJAHBULL S 7.00 2 1492 68.4 51.3 0.05
## 5 ABIJAHNICHOL S 7.00 3 1462 85.5 51.3 0.06
## 6 BARNSCOR S 7.00 2 1357 34.2 51.3 0.10
## 7 BEAR S 7.00 2 1389 51.3 0.0 0.05
## 8 BEAR L 7.50 9 1365 68.4 85.5 0.05
## 9 BEAR S 7.20 0 1289 51.3 51.3 0.83
## 10 BEAR M 7.00 1 1301 51.3 51.3 0.21
ggplot() +
  geom_point(mapping = aes(x = pH, y = BCI), data = streams)
lm_fit <- lm(BCI ~ pH, data = streams)
streams <- mutate(streams, residual = residuals(lm_fit))
ggplot() +
  geom_density(mapping = aes(x = residual), data = streams)
ggplot() +
  geom_point(mapping = aes(x = pH, y = residual), data = streams)
There are multiple valid answers to this question. Simple strategies could be based on ideas that we discussed in Chapter 9 (Chapter 10 of the 3rd edition of the book) and in the Frogs example on Monday, Nov 27; think about the first example, where we had multiple observations for each frog. Dealing with assumption violations like this is one of the big ideas of later statistics classes.
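One possible simple strategy, sketched here on made-up data, is to collapse repeated measurements to a single averaged row per stream before fitting the model, so that each row in the fit comes from a different stream. This is only one of several reasonable approaches. The toy_streams values below are hypothetical and are not taken from the Streams.csv file, and the names stream_means and fit_means are placeholders.

```r
library(dplyr)

# Toy data with hypothetical values: several sites on some streams
toy_streams <- data.frame(
  Stream = c("A", "A", "A", "B", "B", "C", "D", "D", "E", "F"),
  pH     = c(6.8, 7.0, 7.1, 7.4, 7.6, 8.0, 6.5, 6.7, 7.9, 7.2),
  BCI    = c(1400, 1380, 1350, 1290, 1260, 1150, 1450, 1420, 1200, 1330)
)

# Collapse to one row per stream by averaging pH and BCI across its sites
stream_means <- toy_streams %>%
  group_by(Stream) %>%
  summarize(mean_pH = mean(pH), mean_BCI = mean(BCI), n_sites = n())

# Fit the linear model to the per-stream averages
fit_means <- lm(mean_BCI ~ mean_pH, data = stream_means)
summary(fit_means)
```

Averaging trades sample size for independence: the model is fit to fewer rows, so the residual degrees of freedom drop, but each row now represents a distinct stream.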
SOLUTION:
ggplot() +
  geom_point(mapping = aes(x = pH, y = BCI), data = streams) +
  geom_smooth(mapping = aes(x = pH, y = BCI),
              data = streams,
              method = "lm",
              se = FALSE)
summary(lm_fit)
##
## Call:
## lm(formula = BCI ~ pH, data = streams)
##
## Residuals:
## Min 1Q Median 3Q Max
## -502.5 -59.9 12.0 87.3 387.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2733.4 187.9 14.55 < 2e-16 ***
## pH -197.7 25.6 -7.73 1.1e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 140 on 161 degrees of freedom
## Multiple R-squared: 0.271, Adjusted R-squared: 0.266
## F-statistic: 59.8 on 1 and 161 DF, p-value: 1.09e-12
SOLUTION:
SOLUTION:
SOLUTION:
SOLUTION:
For each of the following situations, state whether a Type I error, a Type II error, or neither has been made.
SOLUTION:
SOLUTION:
SOLUTION:
SOLUTION:
Public health officials believe that 98% of children have been vaccinated against measles. A random survey of medical records at many schools across the country found that among more than 13,000 children, only 97.4% had been vaccinated. A hypothesis test would reject the null hypothesis of 98% with a p-value of less than 0.0001.
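To see where a p-value that small could come from, the sketch below runs a one-proportion z-test using the numbers in the paragraph above. Two things are assumptions: the sample size is set to exactly 13,000 (the text says only "more than 13,000"), and the test is taken to be two-sided. The variable names are placeholders.

```r
# Assumed sample size: the text says only "more than 13,000"
n <- 13000
p_null <- 0.98   # proportion claimed by public health officials
p_hat <- 0.974   # proportion observed in the survey

# Standard error of the sample proportion under the null hypothesis
se_null <- sqrt(p_null * (1 - p_null) / n)

# z statistic and two-sided p-value from the normal model
z_stat <- (p_hat - p_null) / se_null
p_value <- 2 * pnorm(-abs(z_stat))

z_stat
p_value
```

The observed difference is only 0.6 percentage points, but with a sample this large the standard error is very small, so even a modest difference sits several standard errors from the null value.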
SOLUTION:
SOLUTION: