This homework is due at the start of class on Friday, October 6th.

PRACTICE PROBLEMS (not to be turned in; may be helpful for exam review):

SDM4 7.1, 7.3, 7.7, 7.13, 7.15, 7.21, 7.23, 7.33, 7.35, 7.67, 7.69

SDM4 8.7, 8.9, 8.17, 8.19, 8.23, 8.25, 8.33, 8.37, 8.39, 8.41

PROBLEMS TO TURN IN:

Problem 1: SDM4 7.36 (More misinterpretations)

A sociology student investigated the association between a country’s Literacy Rate and Life Expectancy, and then drew the conclusions listed below. Explain why each statement is incorrect. Assume that all the calculations were done properly.

a) The \(R^2\) of 0.64 means that the Literacy Rate determines 64% of the Life Expectancy for a country.

SOLUTION:

b) The slope of the line shows that an increase of 5% in Literacy Rate will produce a 2-year improvement in Life Expectancy.

SOLUTION:

Problem 2: SDM4 7.69 (Climate change)

The Earth’s climate is getting warmer. The most common theory attributes the increase in temperatures to an increase in atmospheric levels of carbon dioxide (CO\(_2\)), a greenhouse gas. The following R chunk reads in data with measurements of the mean annual CO\(_2\) concentration in the atmosphere, measured in parts per million (ppm) at the top of Mauna Loa in Hawaii, and the mean annual air temperature over both land and sea across the globe, in degrees Celsius, for the years 1970 to 2013.

library(readr)  # read_csv() comes from the readr package
climate <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Climate_Change_2013.csv")
names(climate) <- c("year", "CO2", "global_ave_temp", "DJIA")
head(climate)
## # A tibble: 6 x 4
##    year    CO2 global_ave_temp     DJIA
##   <int>  <dbl>           <dbl>    <dbl>
## 1  1970 325.68           14.05 753.1181
## 2  1971 326.32           14.08 884.8719
## 3  1972 327.45           14.10 950.1170
## 4  1973 329.68           14.04 924.0697
## 5  1974 330.18           13.95 759.1294
## 6  1975 331.08           13.87 802.8886

a) Make a scatter plot with CO2 on the horizontal axis and global_ave_temp on the vertical axis.
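
As a starting point, here is a minimal sketch of a scatter plot. It uses only the six rows printed by head(climate) above, stored in a hypothetical data frame called climate_preview, and base R graphics so that it runs without extra packages. Your own solution should use the full climate data frame and whichever plotting tools we have used in class.

```r
# Hypothetical stand-in: only the six rows shown by head(climate) above.
climate_preview <- data.frame(
  CO2 = c(325.68, 326.32, 327.45, 329.68, 330.18, 331.08),
  global_ave_temp = c(14.05, 14.08, 14.10, 14.04, 13.95, 13.87)
)

# Explanatory variable on the horizontal axis, response on the vertical axis.
plot(
  x = climate_preview$CO2,
  y = climate_preview$global_ave_temp,
  xlab = "CO2 concentration (ppm)",
  ylab = "Global average temperature (degrees Celsius)",
  main = "Preview rows only (1970 to 1975)"
)
```

Six points are far too few to judge the shape of the relationship, so do not draw conclusions from this preview plot.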

SOLUTION:

# Your code goes here

b) What is the correlation between CO\(_2\) and temperature?
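
One possible way to compute a correlation is the cor() function from base R. The sketch below again assumes a hypothetical climate_preview data frame holding only the six preview rows, so the number it produces will not match the answer for the full data set.

```r
# Hypothetical stand-in: only the six rows shown by head(climate) above.
climate_preview <- data.frame(
  CO2 = c(325.68, 326.32, 327.45, 329.68, 330.18, 331.08),
  global_ave_temp = c(14.05, 14.08, 14.10, 14.04, 13.95, 13.87)
)

# Pearson correlation between the two quantitative variables.
r_preview <- cor(climate_preview$CO2, climate_preview$global_ave_temp)
r_preview
```

Correlation is symmetric, so swapping the two arguments gives the same value.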

SOLUTION:

# Your code goes here

c) Based on the scatter plot you made in part a), check the assumptions for fitting a linear regression model to these data.

SOLUTION:

d) Give the regression equation, and interpret the coefficients in the context of the problem.
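
A least-squares line can be fit with lm(), and coef() returns the intercept and slope. The following is a sketch on the hypothetical climate_preview data frame (the six preview rows only); the coefficients from the full data will be different.

```r
# Hypothetical stand-in: only the six rows shown by head(climate) above.
climate_preview <- data.frame(
  CO2 = c(325.68, 326.32, 327.45, 329.68, 330.18, 331.08),
  global_ave_temp = c(14.05, 14.08, 14.10, 14.04, 13.95, 13.87)
)

# Response on the left of ~, explanatory variable on the right.
fit_preview <- lm(global_ave_temp ~ CO2, data = climate_preview)

# Named vector: "(Intercept)" is the intercept, "CO2" is the slope.
coef(fit_preview)
```

The fitted equation has the form predicted global_ave_temp = intercept + slope × CO2, with the slope in degrees Celsius per ppm.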

SOLUTION:

# Your code goes here

e) Make a density plot of the residuals and a scatter plot of the residuals vs. the fitted values. Do the plots show evidence of violations of any assumptions of the linear regression model? If so, which ones?
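
Residuals and fitted values can be extracted from a fitted model with resid() and fitted(). The sketch below uses base R graphics and the hypothetical climate_preview data frame (six preview rows only); it is meant to show the mechanics, not the plots you should expect from the full data.

```r
# Hypothetical stand-in: only the six rows shown by head(climate) above.
climate_preview <- data.frame(
  CO2 = c(325.68, 326.32, 327.45, 329.68, 330.18, 331.08),
  global_ave_temp = c(14.05, 14.08, 14.10, 14.04, 13.95, 13.87)
)
fit_preview <- lm(global_ave_temp ~ CO2, data = climate_preview)

res <- resid(fit_preview)           # observed minus predicted
fitted_vals <- fitted(fit_preview)  # predicted values on the line

# Density plot of the residuals: look for a roughly symmetric, unimodal shape.
plot(density(res), main = "Density of residuals (preview rows)", xlab = "Residual")

# Residuals vs. fitted values: look for curvature or changing spread.
plot(fitted_vals, res, xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)  # reference line at zero
```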

SOLUTION:

# Your code goes here

f) What is the residual standard deviation? Interpret it in the context of this problem using the 68-95-99.7 rule.
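
The residual standard deviation is the square root of the sum of squared residuals divided by n - 2, and sigma() reports it for a fitted lm object. If the residuals are roughly Normal, about 68% of observations should fall within one residual standard deviation of the line and about 95% within two. A sketch on the hypothetical climate_preview data frame (six preview rows only):

```r
# Hypothetical stand-in: only the six rows shown by head(climate) above.
climate_preview <- data.frame(
  CO2 = c(325.68, 326.32, 327.45, 329.68, 330.18, 331.08),
  global_ave_temp = c(14.05, 14.08, 14.10, 14.04, 13.95, 13.87)
)
fit_preview <- lm(global_ave_temp ~ CO2, data = climate_preview)

# Residual standard deviation, two equivalent ways.
s_e <- sigma(fit_preview)
s_manual <- sqrt(sum(resid(fit_preview)^2) / df.residual(fit_preview))

# Share of observations within one and two residual SDs of the line.
within_1 <- mean(abs(resid(fit_preview)) <= s_e)
within_2 <- mean(abs(resid(fit_preview)) <= 2 * s_e)
c(s_e = s_e, within_1 = within_1, within_2 = within_2)
```

With only six rows the observed shares are very coarse, so treat them as a check on the mechanics rather than a test of the rule.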

SOLUTION:

# Your code goes here

g) What is the \(R^2\)? Interpret it in the context of this problem.

SOLUTION:

h) CO\(_2\) levels will probably reach 400 ppm by 2020. What mean Temperature does the linear regression model predict for that concentration of CO\(_2\)?
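
Predictions from a fitted line can be obtained with predict(), passing a data frame whose column name matches the explanatory variable. The sketch below uses the hypothetical climate_preview data frame (six preview rows only), so its prediction is purely illustrative and says nothing about the answer from the full data.

```r
# Hypothetical stand-in: only the six rows shown by head(climate) above.
climate_preview <- data.frame(
  CO2 = c(325.68, 326.32, 327.45, 329.68, 330.18, 331.08),
  global_ave_temp = c(14.05, 14.08, 14.10, 14.04, 13.95, 13.87)
)
fit_preview <- lm(global_ave_temp ~ CO2, data = climate_preview)

# The new data must use the same column name as the model formula.
new_level <- data.frame(CO2 = 400)
pred_400 <- predict(fit_preview, newdata = new_level)

# The same prediction by plugging into intercept + slope * CO2.
b <- coef(fit_preview)
pred_by_hand <- b["(Intercept)"] + b["CO2"] * 400
```

Note that 400 ppm lies well outside the range of CO2 values in these rows, so this is an extrapolation.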

SOLUTION:

i) Give two reasons why your answer to part h) does not mean that when the CO\(_2\) level hits 400 ppm, the temperature will reach the predicted level. Hint: Think about residuals, and causation… and explain your thinking in detail!

SOLUTION:

Problem 3: SDM4 7.70 (Climate change revisited)

In the previous exercise, we explored the relationship between atmospheric CO\(_2\) levels and global average temperatures from 1970 to 2013. Let’s now explore the relationship between the Dow Jones Industrial Average (DJIA), a stock market index, and global temperatures during the same time period.

a) Make a scatter plot with DJIA on the horizontal axis and global_ave_temp on the vertical axis.

SOLUTION:

# Your code goes here

b) What is the correlation between the DJIA and global average temperatures?

SOLUTION:

# Your code goes here

c) Give the regression equation, and interpret the coefficients in the context of the problem. You already showed in the last problem that you know how to check the assumptions, so you don’t need to do that again here. (But in real life, you should always check the assumptions!)

SOLUTION:

# Your code goes here

d) Report the \(R^2\) and interpret it in the context of this problem.

SOLUTION:

# Your code goes here

e) Suppose the DJIA hits 20,000 in the year 2020. What mean temperature does the regression model predict for that level of the DJIA?

SOLUTION:

f) Give two reasons why your answer to part e) does not mean that when the DJIA hits 20,000, the temperature will reach the predicted level.

SOLUTION:

Problem 4: SDM4 8.17 (Human Development Index 2012)

The United Nations Development Programme (UNDP) uses the Human Development Index (HDI) in an attempt to summarize in one number the progress in health, education, and economics of a country. In 2012, the HDI was as high as 0.955 for Norway and as low as 0.304 for Niger. The gross national income per capita, by contrast, is often used to summarize the overall economic strength of a country. The following R chunk reads in data including the HDI and gross national income per capita for 187 countries as of 2012.

country_data <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/HDI_2012.csv")
names(country_data) <- c("hdi_rank", "country", "country_abbr", "hdi", "life_expectancy", "mean_school_years", "exp_chool_years", "gni_per_cap", "gni_rank_minus_hdi_rank", "non_income_hdi", "type")
head(country_data)
## # A tibble: 6 x 11
##   hdi_rank             country country_abbr   hdi life_expectancy
##      <int>               <chr>        <chr> <dbl>           <dbl>
## 1      175         Afghanistan          AFG 0.374            49.1
## 2       70             Albania          ALB 0.749            77.1
## 3       93             Algeria          DZA 0.713            73.4
## 4       33             Andorra          AND 0.846            81.1
## 5      148              Angola          AGO 0.508            51.5
## 6       67 Antigua and Barbuda          ATG 0.760            72.8
## # ... with 6 more variables: mean_school_years <dbl>,
## #   exp_chool_years <dbl>, gni_per_cap <int>,
## #   gni_rank_minus_hdi_rank <int>, non_income_hdi <dbl>, type <chr>

a) Make a scatter plot with gni_per_cap on the horizontal axis and hdi on the vertical axis. Would it be appropriate to fit a linear model to describe the relationship between these variables? Check all of the assumptions for the linear model and assess whether or not each is violated.

SOLUTION:

Problem 5: SDM4 8.24 (Tracking Hurricanes)

The National Hurricane Center is responsible for making predictions of the path of hurricanes. The following R chunk reads in a data set with the average error in their predictions for each year from 1970 through 2012. There is a separate variable in the data set for errors in predictions made 24 hours in advance, 48 hours in advance, and 72 hours in advance. Let’s see whether they’ve been improving at predictions made 24 hours in advance.

hurricanes <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Tracking_hurricanes_2012.csv")
head(hurricanes)
## # A tibble: 6 x 4
##    Year Error24h Error48h Error72h
##   <int>    <dbl>    <dbl>    <dbl>
## 1  1970     84.3    185.8    253.8
## 2  1971    112.4    242.0    381.9
## 3  1972    142.3    390.6    689.2
## 4  1973    116.7    246.2    363.2
## 5  1974     97.1    206.5    348.3
## 6  1975    117.0    256.9    402.1

a) Make a scatter plot with Year on the horizontal axis and Error24h on the vertical axis. Would it be appropriate to fit a linear model to describe the relationship between these variables? Check all of the assumptions for the linear model and assess whether or not each is violated.

SOLUTION:

Problem 6: SDM4 8.30 (What’s the effect)

A researcher studying violent behavior in elementary school children asks the children’s parents how much time each child spends playing computer games and has their teachers rate each child on the level of aggressiveness they display while playing with other children. Suppose that the researcher finds a moderately strong positive correlation. Describe three different possible cause-and-effect explanations for this relationship.

SOLUTION: