Wildfires (based on SDM4 5.58)

The following R chunk loads a data set with the average number of acres burned per wildfire (in hundreds of thousands of acres) in each year from 1985 to 2012. I’ve written this code out for you, but at this point in the class you should be able to explain what each line of code in this R chunk does. Let me know if any of this isn’t clear!

library(readr)  # provides read_csv
library(dplyr)  # provides mutate and arrange
wildfires <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Wildfires_2012.csv")
names(wildfires) <- c("num_fires", "years_since_1985", "ave_acres_burned")
wildfires <- mutate(wildfires, year = years_since_1985 + 1985)
wildfires <- arrange(wildfires, year)
head(wildfires)
## # A tibble: 6 x 4
##   num_fires years_since_1985 ave_acres_burned  year
##       <int>            <int>            <dbl> <dbl>
## 1     82591                0             29.0  1985
## 2     85907                1             27.2  1986
## 3     71300                2             24.5  1987
## 4     72750                3             50.1  1988
## 5     48949                4             18.3  1989
## 6     66481                5             46.2  1990
str(wildfires)
## Classes 'tbl_df', 'tbl' and 'data.frame':    28 obs. of  4 variables:
##  $ num_fires       : int  82591 85907 71300 72750 48949 66481 75754 87394 58810 79107 ...
##  $ years_since_1985: int  0 1 2 3 4 5 6 7 8 9 ...
##  $ ave_acres_burned: num  29 27.2 24.5 50.1 18.3 ...
##  $ year            : num  1985 1986 1987 1988 1989 ...

(a) We suspect that there may be a trend over time in the average number of acres burned per fire. Make a scatter plot with years_since_1985 on the horizontal axis and ave_acres_burned on the vertical axis.

SOLUTION:

# Your code goes here

(b) Is a linear regression model appropriate for these data? Check for (1) quantitative variables, (2) straight enough, (3) outliers, and (4) does the plot thicken? Note that we can’t yet check whether the residuals follow a nearly normal distribution, since we haven’t yet fit the model in order to get the residuals to plot!

SOLUTION:

(c) Regardless of your answer to (b), fit the regression model using ave_acres_burned as the response variable and years_since_1985 as the explanatory variable.

SOLUTION:

# Your code goes here
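
As a reminder of the general pattern, here is a minimal sketch of fitting a simple linear regression with lm. The data frame toy and its columns x and y are made up for illustration and are not the wildfires data; substitute your own data frame and variable names.

```r
# Hypothetical toy data (not the wildfires data): y is exactly 2 + 3 * x
toy <- data.frame(x = c(0, 1, 2, 3, 4))
toy$y <- 2 + 3 * toy$x

# The formula syntax is response ~ explanatory
toy_fit <- lm(y ~ x, data = toy)

# coef() returns a named vector: "(Intercept)" first, then the slope for x
coef(toy_fit)
```

Because the toy points lie exactly on a line, the fitted intercept and slope recover 2 and 3 (up to floating point error). Real data will not fit this cleanly.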

(d) Make a scatter plot with the regression line. You can use the same code you used for part (a) as a starter, but now add a second layer to the plot using geom_smooth.

SOLUTION:

# Your code goes here
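
If you are unsure how layers combine, the following sketch shows one common way to draw a least squares line over a scatter plot with ggplot2. The toy data frame and its column names are assumptions made for illustration, and the arguments method = "lm" and se = FALSE are one reasonable choice rather than the only one.

```r
library(ggplot2)

# Hypothetical toy data (not the wildfires data)
toy <- data.frame(x = c(0, 1, 2, 3, 4, 5),
                  y = c(1.1, 2.9, 5.2, 6.8, 9.1, 11.0))

# First layer: the points. Second layer: the fitted line.
# method = "lm" requests a linear fit; se = FALSE hides the confidence band.
toy_plot <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

toy_plot
```

Leaving out method = "lm" makes geom_smooth draw a flexible curve instead of a straight line, which is not what we want here.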

(e) Now that we’ve fit a tentative model, we can check whether the residuals follow a nearly normal distribution. Use the mutate function to add two new variables to the wildfires data frame: one with the predicted values from the model, and one with the residuals (we won’t need the predicted values in this lab, but we will use them in future classes and it’s good to get in the habit of saving them). Then, make a density plot of the residuals. Is the assumption that the residuals follow a nearly normal distribution satisfied?

SOLUTION:

# Your code goes here
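
Here is a hedged sketch of the general pattern on made-up data: predict and residuals each return one value per row, so they can be used directly inside mutate. The names toy, toy_fit, predicted, and residual are hypothetical; choose names that suit the wildfires data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (not the wildfires data)
toy <- data.frame(x = c(0, 1, 2, 3, 4, 5),
                  y = c(1.1, 2.9, 5.2, 6.8, 9.1, 11.0))
toy_fit <- lm(y ~ x, data = toy)

# Add the model's predicted values and residuals as new columns
toy <- mutate(toy,
  predicted = predict(toy_fit),
  residual = residuals(toy_fit)
)

# Density plot of the residuals
ggplot(data = toy, mapping = aes(x = residual)) +
  geom_density()
```

Each residual equals the observed value minus the predicted value, so toy$y - toy$predicted reproduces the residual column.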

(f) Regardless of your answer to part (e), let’s go ahead with using this model. Use the coef function to print out the estimated coefficients. Write down the equation of the regression line predicting the average number of acres burned per fire from the number of years since 1985. Interpret the intercept and slope in the context of the problem. Does the interpretation of the intercept make sense?

SOLUTION:

# Your code goes here

(g) Using the equation you wrote down in part (f), calculate the linear model’s predicted values for 1985 and for 2018.

SOLUTION:
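
A hint on the setup: the model from part (f) uses years_since_1985, so each calendar year must first be converted. The year 1985 corresponds to 0, and 2018 corresponds to 2018 − 1985 = 33. Writing b_0 and b_1 for your estimated intercept and slope (these symbols are introduced here for illustration), the two predictions take the form:

```latex
\begin{align*}
\widehat{y}_{1985} &= b_0 + b_1 \cdot 0 = b_0, \\
\widehat{y}_{2018} &= b_0 + b_1 \cdot (2018 - 1985) = b_0 + 33\, b_1 .
\end{align*}
```

Plug in your own coefficient estimates from part (f) to get the numbers.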

(h) Fit a new model using year as the explanatory variable. Use the coef function to obtain the estimated model coefficients, and write down the estimated regression equation. Interpret the coefficients in this model. In this model, does the interpretation of the intercept make sense? Compare the slope to what you got in part (f).

SOLUTION:

# Your code goes here

(i) Using the equation you wrote down in part (h), calculate the linear model’s predicted values for 1985 and for 2018. Compare to the answer you got in part (g) (you should have the same results, up to rounding errors).

SOLUTION:
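
To see why the two parameterizations should agree, note that b_0 + b_1 (year − 1985) = (b_0 − 1985 b_1) + b_1 · year, so shifting the explanatory variable changes the intercept but not the slope or the predictions. The sketch below checks this on made-up data; the data frame toy and the model names fit_since and fit_year are hypothetical and are not the wildfires results.

```r
# Hypothetical toy data (not the wildfires data)
toy <- data.frame(years_since = c(0, 1, 2, 3, 4, 5),
                  y = c(1.1, 2.9, 5.2, 6.8, 9.1, 11.0))
toy$year <- toy$years_since + 1985

fit_since <- lm(y ~ years_since, data = toy)
fit_year  <- lm(y ~ year, data = toy)

# The slopes agree; the intercepts differ by 1985 times the slope
coef(fit_since)
coef(fit_year)

# Predictions for the same calendar year, computed from each model
pred_since <- predict(fit_since, newdata = data.frame(years_since = 2018 - 1985))
pred_year  <- predict(fit_year, newdata = data.frame(year = 2018))

# Compare the two predictions, allowing for rounding error
all.equal(unname(pred_since), unname(pred_year))
```

The same comparison applies to the wildfires models from parts (f) and (h).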