This homework is due at the start of class on Friday, September 29th.
SDM4 5.25, 5.35, 5.37, 5.43, 5.47, 5.51
SDM4 6.1, 6.5, 6.13, 6.15, 6.19, 6.21, 6.25, 6.27, 6.29, 6.31, 6.33, 6.35, 6.39
The Hopkins Forest is a research forest spanning parts of northwestern Massachusetts, Vermont, and New York. The Williams College Center for Environmental Studies studies the forest and records weather measurements on an ongoing basis. The box plots below show the average daily temperature recorded each day in 2011, broken down by month. (That is, the data frame has 365 rows, one for each day in 2011, and for each day we have a measurement of that day's average temperature.)
library(tidyverse)  # provides read_csv(), the %>% pipe, mutate(), and ggplot()
hopkins <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Hopkins_Forest_2011.csv") %>%
  mutate(
    Month = factor(Month, ordered = TRUE))
## Parsed with column specification:
## cols(
## Season = col_character(),
## `Avg Wind Speed(mph)` = col_double(),
## Month = col_integer(),
## Day = col_integer(),
## `Day of Year` = col_integer(),
## `Avg Temp(deg C)` = col_double(),
## `Avg Temp(deg F)` = col_double(),
## `Max Wind Speed(mph)` = col_double(),
## `Avg Barom(mb)` = col_double(),
## `Precip(in)` = col_double()
## )
levels(hopkins$Month) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
names(hopkins)[6] <- "ave_daily_temp_C"
ggplot() +
geom_boxplot(mapping = aes(x = Month, y = ave_daily_temp_C), data = hopkins)
Notice that there are relatively large outliers in January and July. Let's investigate those outliers (there are outliers in other months as well, but two months are enough for the sake of this homework problem). In the R chunk below, I create two data sets, one with just the observations from January and one with just the observations from July.
hopkins_jan <- filter(hopkins, Month == "Jan")
hopkins_jul <- filter(hopkins, Month == "Jul")
Make a plot showing the distribution of the ave_daily_temp_C variable for the data from January, and another for the data from July. In your judgment, is it reasonably appropriate to calculate means and standard deviations with these data? Why or why not?
SOLUTION:
# Your code goes here
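If it helps to see the general pattern, here is a sketch (not the official solution) of one way to plot the two distributions, reusing the plotting style from the box plot code above:
ggplot() +
  geom_histogram(mapping = aes(x = ave_daily_temp_C), data = hopkins_jan, bins = 10)
ggplot() +
  geom_histogram(mapping = aes(x = ave_daily_temp_C), data = hopkins_jul, bins = 10)
With only about 31 observations per month, a small number of bins keeps the histograms readable.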
SOLUTION:
# Your code goes here
Find the value of the largest outlier using the max function.
SOLUTION:
# Your code goes here
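For example, applied to the July data (swap in hopkins_jan if the outlier of interest is in January), a one-line sketch would be:
max(hopkins_jul$ave_daily_temp_C)  # largest average daily temperature recorded in July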
SOLUTION:
SOLUTION:
Two companies market new batteries targeted at owners of personal music players (OK, the book is old). DuraTunes claims a mean battery life of 11 hours, while RockReady advertises 12 hours.
SOLUTION:
Here’s some code that is relevant:
1 - pnorm(q = 8, mean = 11, sd = 2)
## [1] 0.9331928
Explain why the above line of code is relevant. What does it calculate, and how does that relate to the question?
SOLUTION:
Now, add another line to the R code chunk above to calculate the other number you'll need. Then, below, answer which battery is more likely to last for at least 8 hours:
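If a sketch of the pattern helps, the companion line would look like the following; the standard deviation here is only a placeholder for illustration, so substitute the value given in the problem:
rockready_sd <- 1.5  # placeholder value for illustration; replace with the SD given in the problem
1 - pnorm(q = 8, mean = 12, sd = rockready_sd)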
SOLUTION:
In the R chunk below, use similar commands to the ones you used for part b) above, updated to do the calculation for 16 hours. Then answer the question.
SOLUTION:
# Your code goes here
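For reference, the DuraTunes half of that calculation follows the same pattern as in part b) (a sketch only; you will also need the corresponding line for RockReady):
1 - pnorm(q = 16, mean = 11, sd = 2)  # P(a DuraTunes battery lasts at least 16 hours)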
A 1997 study of the movement patterns of house cats found that the average area of the region explored by suburban house cats at night was 2.54 hectares, with a standard deviation of 1.08 hectares (Barratt, David G. "Home range size, habitat utilisation and movement patterns of suburban and farm cats Felis catus." Ecography 20.3 (1997): 271-280). Let's assume that the area explored by suburban cats at night follows a normal distribution (though this is almost certainly not the case in reality).
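The calculations below use the same pnorm/qnorm style as the battery problem above. As an illustration only (not one of the required answers), here is how the stated mean and standard deviation plug in:
pnorm(q = 3.62, mean = 2.54, sd = 1.08)  # proportion of cats exploring less than 3.62 hectares (one SD above the mean)
qnorm(p = 0.5, mean = 2.54, sd = 1.08)   # area such that half of cats explore less: equals the mean, 2.54 hectares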
SOLUTION:
SOLUTION:
# Your code goes here.
SOLUTION:
SOLUTION:
# Your code goes here.
A researcher investigating the association between two variables collected some data and was surprised when he calculated the correlation. He had expected to find a fairly strong association, yet the correlation was near 0. Discouraged, he didn’t bother making a scatter plot. Explain to him how the scatter plot could still reveal the strong association he anticipated.
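As a quick simulated illustration of the idea (a sketch, not part of the required answer), a relationship can be perfectly predictable and still have a correlation near zero:
x <- seq(-3, 3, length.out = 101)
y <- x^2                  # y is completely determined by x, but not linearly
cor(x, y)                 # essentially 0, because the association is symmetric rather than linear
ggplot(data.frame(x = x, y = y)) +
  geom_point(mapping = aes(x = x, y = y))  # the scatter plot still shows a clear, strong pattern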
SOLUTION:
The correlation between Fuel Efficiency (as measured by miles per gallon) and Price of 140 cars at a large dealership is \(r = -0.34\). Explain whether or not each of these possible conclusions is justified:
SOLUTION:
SOLUTION:
SOLUTION:
SOLUTION:
The Minnesota Department of Transportation hoped that they could measure the weights of big trucks without actually stopping the vehicles, by using a newly developed "weight-in-motion" scale. To see if the new device was accurate, they conducted a calibration test. They weighed several stopped trucks (static weight) and assumed that this weight was correct. Then they weighed the trucks again while they were moving, to see how well the new scale could estimate the actual weight. We read in these data below:
truck_weights <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Vehicle_weights.csv")
## Parsed with column specification:
## cols(
## WeightinMotion = col_double(),
## StaticWeight = col_double()
## )
SOLUTION:
# Your code goes here.
SOLUTION:
SOLUTION:
Calculate the correlation between the WeightinMotion and StaticWeight variables. Use the cor function.
SOLUTION:
# add call to cor() here...
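A minimal sketch of that call, using the data frame created by the read_csv() chunk above:
cor(truck_weights$WeightinMotion, truck_weights$StaticWeight)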
SOLUTION:
SOLUTION: