The following R chunk loads a data set with the average number of acres burned per wildfire (in hundreds of thousands of acres) in each year from 1985 to 2012. I’ve written this code for you, but at this point in the class you should be able to understand what each line of code in this chunk does. Let me know if any of it isn’t clear!
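library(readr)  # provides read_csv()
library(dplyr)  # provides mutate() and arrange()
wildfires <- read_csv("https://mhc-stat140-2017.github.io/data/sdm4/Wildfires_2012.csv")
names(wildfires) <- c("num_fires", "years_since_1985", "ave_acres_burned")  # shorter column names
wildfires <- mutate(wildfires, year = years_since_1985 + 1985)  # add the calendar year
wildfires <- arrange(wildfires, year)  # sort the rows chronologically
head(wildfires)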
## # A tibble: 6 x 4
## num_fires years_since_1985 ave_acres_burned year
## <int> <int> <dbl> <dbl>
## 1 82591 0 29.0 1985
## 2 85907 1 27.2 1986
## 3 71300 2 24.5 1987
## 4 72750 3 50.1 1988
## 5 48949 4 18.3 1989
## 6 66481 5 46.2 1990
str(wildfires)
## Classes 'tbl_df', 'tbl' and 'data.frame': 28 obs. of 4 variables:
## $ num_fires : int 82591 85907 71300 72750 48949 66481 75754 87394 58810 79107 ...
## $ years_since_1985: int 0 1 2 3 4 5 6 7 8 9 ...
## $ ave_acres_burned: num 29 27.2 24.5 50.1 18.3 ...
## $ year : num 1985 1986 1987 1988 1989 ...
Make a scatter plot with years_since_1985 on the horizontal axis and ave_acres_burned on the vertical axis.
SOLUTION:
# Your code goes here
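A minimal sketch of one way to make this scatter plot with ggplot2 (assumed to be installed). To keep the block runnable offline, it uses wildfires_head, a hypothetical stand-in built from the six rows printed by head() above (values as displayed, so rounded). With the full data, replace wildfires_head with wildfires.

```r
library(ggplot2)

# Hypothetical stand-in: the six rows shown by head(wildfires) above
wildfires_head <- data.frame(
  years_since_1985 = 0:5,
  ave_acres_burned = c(29.0, 27.2, 24.5, 50.1, 18.3, 46.2)
)

# Explanatory variable on x, response on y, one point per year
p <- ggplot(data = wildfires_head,
            mapping = aes(x = years_since_1985, y = ave_acres_burned)) +
  geom_point() +
  labs(x = "Years since 1985",
       y = "Average acres burned (hundreds of thousands)")
p
```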
SOLUTION:
Fit a linear regression model with ave_acres_burned as the response variable and years_since_1985 as the explanatory variable.
SOLUTION:
# Your code goes here
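A minimal sketch using base R's lm(), which takes a formula of the form response ~ explanatory. The model name fire_fit is hypothetical, and wildfires_head is a hypothetical stand-in made from the six rows printed above; with the full data, use data = wildfires.

```r
# Hypothetical stand-in: the six rows shown by head(wildfires) above
wildfires_head <- data.frame(
  years_since_1985 = 0:5,
  ave_acres_burned = c(29.0, 27.2, 24.5, 50.1, 18.3, 46.2)
)

# Simple linear regression: ave_acres_burned (response) on years_since_1985
fire_fit <- lm(ave_acres_burned ~ years_since_1985, data = wildfires_head)
summary(fire_fit)
```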
Use geom_smooth to add the estimated regression line to your scatter plot.
# Your code goes here
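A minimal sketch, assuming ggplot2 is installed and using the hypothetical six-row stand-in wildfires_head from the rows printed above. Setting method = "lm" asks geom_smooth to draw the least-squares line rather than its default smoother.

```r
library(ggplot2)

# Hypothetical stand-in: the six rows shown by head(wildfires) above
wildfires_head <- data.frame(
  years_since_1985 = 0:5,
  ave_acres_burned = c(29.0, 27.2, 24.5, 50.1, 18.3, 46.2)
)

p <- ggplot(wildfires_head,
            aes(x = years_since_1985, y = ave_acres_burned)) +
  geom_point() +
  # method = "lm": least-squares line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p
```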
Use the mutate function to add two new variables to the wildfires data frame: one with the predicted values from the model and one with the residuals. (We won’t need the predicted values in this lab, but we will use them in future classes, and it’s good to get in the habit of saving them.) Then make a density plot of the residuals. Is the assumption that the residuals follow a nearly normal distribution satisfied?
SOLUTION:
# Your code goes here.
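A minimal sketch, assuming dplyr and ggplot2 are installed. The column names predicted and residual, the model name fire_fit, and the six-row stand-in wildfires_head are all hypothetical; with only six points the density curve is very rough, so judge normality on the full data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in: the six rows shown by head(wildfires) above
wildfires_head <- data.frame(
  years_since_1985 = 0:5,
  ave_acres_burned = c(29.0, 27.2, 24.5, 50.1, 18.3, 46.2)
)
fire_fit <- lm(ave_acres_burned ~ years_since_1985, data = wildfires_head)

# Save fitted values and residuals as new columns
wildfires_head <- mutate(wildfires_head,
                         predicted = predict(fire_fit),
                         residual = residuals(fire_fit))

# Density plot of the residuals
p <- ggplot(wildfires_head, aes(x = residual)) +
  geom_density()
p
```

Each residual is the observed value minus the predicted value, so predicted + residual recovers ave_acres_burned exactly.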
Use the coef function to print out the estimated coefficients. Write down the equation of the regression line predicting the average number of acres burned from the number of years since 1985. Interpret the intercept and slope in the context of the problem. Does the interpretation of the intercept make sense?
SOLUTION:
# Your code goes here
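A minimal sketch of pulling out the coefficients and writing them into the fitted equation. fire_fit and wildfires_head are hypothetical names, and the stand-in uses only the six printed rows, so its numbers will differ from the model fit to the full data.

```r
# Hypothetical stand-in: the six rows shown by head(wildfires) above
wildfires_head <- data.frame(
  years_since_1985 = 0:5,
  ave_acres_burned = c(29.0, 27.2, 24.5, 50.1, 18.3, 46.2)
)
fire_fit <- lm(ave_acres_burned ~ years_since_1985, data = wildfires_head)

# Named vector: "(Intercept)" then the slope for years_since_1985
b <- coef(fire_fit)
b

# Plug the estimates into: predicted response = intercept + slope * x
cat(sprintf("predicted ave_acres_burned = %.3f + %.3f * years_since_1985\n",
            b[1], b[2]))
```

The intercept is the model's prediction when years_since_1985 is 0 (that is, 1985), and the slope is the predicted change in ave_acres_burned for each additional year.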
SOLUTION:
Fit a linear regression model with ave_acres_burned as the response variable and year as the explanatory variable. Use the coef function to obtain the estimated model coefficients, and write down the estimated regression equation. Interpret the coefficients in this model. Does the interpretation of the intercept make sense for this model? Compare the slope to what you got in part (f).
SOLUTION:
# Your code goes here
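A minimal sketch comparing the two fits side by side, again on the hypothetical six-row stand-in wildfires_head (the model names fit_years and fit_year are also hypothetical). It checks numerically how the coefficients change when the explanatory variable is shifted by a constant.

```r
# Hypothetical stand-in: the six rows shown by head(wildfires) above
wildfires_head <- data.frame(
  years_since_1985 = 0:5,
  ave_acres_burned = c(29.0, 27.2, 24.5, 50.1, 18.3, 46.2)
)
wildfires_head$year <- wildfires_head$years_since_1985 + 1985

# Same response, explanatory variable shifted by 1985
fit_years <- lm(ave_acres_burned ~ years_since_1985, data = wildfires_head)
fit_year  <- lm(ave_acres_burned ~ year, data = wildfires_head)

coef(fit_years)
coef(fit_year)
```

Because year is years_since_1985 plus 1985, the two slopes are identical, and the new intercept equals the old intercept minus 1985 times the slope. It is the model's prediction for the calendar year 0, far outside the range of the data.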
SOLUTION: