The following R chunk reads in a data set called anscombe. It has four pairs of x and y variables: (x1, y1), (x2, y2), (x3, y3), and (x4, y4). You will examine one of these pairs of variables, and then we will discuss them all as a class.
library(readr)
anscombe <- read_csv("https://mhc-stat140-2017.github.io/data/base_r/anscombe.csv")
SOLUTION:
I’ll just do this for the (x1, y1) pair.
## Use the lm() function to fit a linear model.
## It should look like this: anscombe_fit <- lm( ~, data = anscombe)
## You will need to fill in the proper formula with the
## response and explanatory variables you're using.
anscombe_fit <- lm(y1 ~ x1, data = anscombe)
Note: y1 is the response variable and x1 is the explanatory variable. The formula in lm() takes the form response ~ explanatory.
SOLUTION:
## Use the coef() function to print out the regression coefficients,
## or use summary(anscombe_fit) to print out more information, including the
## regression coefficients
coef(anscombe_fit)
## (Intercept)          x1 
##   3.0000909   0.5000909
summary(anscombe_fit)
##
## Call:
## lm(formula = y1 ~ x1, data = anscombe)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.92127 -0.45577 -0.04136  0.70941  1.83882
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   3.0001     1.1247   2.667  0.02573 *
## x1            0.5001     0.1179   4.241  0.00217 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared: 0.6665, Adjusted R-squared: 0.6295
## F-statistic: 17.99 on 1 and 9 DF, p-value: 0.00217
The estimated coefficients are b0 ≈ 3.0 and b1 ≈ 0.5. If the x1 variable is 0, the predicted value of y1 is about 3. For each unit increase in the value of x1, the predicted value of y1 increases by about 0.5.
Note that in the output from summary, the coefficient estimates are in the “Estimate” column of the coefficients table.
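As a sketch of how the coefficient interpretation works, we can compute a prediction by hand as b0 + b1 * x and compare it to R's predict() function. This uses base R's built-in anscombe data set (which contains the same variables) rather than the CSV above, and the value x1 = 10 is just an illustrative choice, not part of the assignment.

```r
# Base R ships Anscombe's quartet as a built-in data set,
# so this sketch does not need the CSV from the assignment.
fit <- lm(y1 ~ x1, data = anscombe)

# Hand-computed prediction b0 + b1 * x for an illustrative x1 value
b <- coef(fit)
x_new <- 10
by_hand <- unname(b[1] + b[2] * x_new)   # about 3.0 + 0.5 * 10 = 8.0

# The same prediction via predict() -- the two should agree exactly
by_predict <- unname(predict(fit, newdata = data.frame(x1 = x_new)))
```

predict() is doing nothing more than plugging x_new into the fitted line, which is why the two numbers match.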
SOLUTION:
## If you didn't already print out the summary(anscombe_fit) for part 2,
## you'll have to do it now
The residual standard deviation is about 1.24 (it is labeled "Residual standard error" in the R output). For about 95% of the data set, the observed value of y1 was within plus or minus 2 × 1.24 = 2.48 of the predicted value from the linear model. There aren't any units in this problem because it's a made-up data set; in a real problem, I'd give the units.
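To see where that number comes from, sigma() extracts the residual standard deviation directly from the fitted model, and we can check what fraction of residuals fall within plus or minus 2 of it. This is a sketch using base R's built-in anscombe data set rather than the CSV above.

```r
# Residual standard deviation via sigma(), using base R's built-in anscombe data
fit <- lm(y1 ~ x1, data = anscombe)
s <- sigma(fit)   # about 1.237; the number summary() labels "Residual standard error"

# Fraction of observations whose residual is within +/- 2s of the prediction
within_2s <- mean(abs(residuals(fit)) <= 2 * s)
```

With only 11 observations, "about 95%" is a rough rule of thumb; here every residual happens to fall within plus or minus 2s, so within_2s comes out to 1.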
SOLUTION:
The \(R^2\) value is about 0.667 (labeled as “Multiple R-squared” in the R output). The linear model using x1 as an explanatory variable accounts for about 66.7 percent of the variability in the response variable, y1.
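For a simple linear regression with one explanatory variable, \(R^2\) equals the squared correlation between the two variables, which gives a quick way to check the value from summary(). Again a sketch using base R's built-in anscombe data set:

```r
# R-squared from summary() matches the squared correlation between x1 and y1,
# using base R's built-in anscombe data
fit <- lm(y1 ~ x1, data = anscombe)
r_squared <- summary(fit)$r.squared              # about 0.667
cor_squared <- cor(anscombe$x1, anscombe$y1)^2   # the same number
```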