Anscombe’s data

The following R chunk reads in a data set called anscombe. It has four pairs of x variables and y variables: (x1, y1), (x2, y2), (x3, y3), and (x4, y4). You will examine one of these pairs of variables, and then we will discuss them all as a class.

library(readr)  # provides read_csv()
anscombe <- read_csv("https://mhc-stat140-2017.github.io/data/base_r/anscombe.csv")
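
If the URL is unavailable, base R's datasets package also ships a data frame named anscombe. The sketch below inspects that built-in copy. It assumes, without having checked, that the course CSV holds the same values, and the name anscombe_builtin is only an illustrative choice.

```r
# Built-in copy of Anscombe's data from the datasets package.
# Assumption: the course CSV holds the same columns and values.
anscombe_builtin <- datasets::anscombe

# Dimensions and column names: 11 rows and the eight columns x1..x4, y1..y4
dim(anscombe_builtin)
names(anscombe_builtin)

# First few rows
head(anscombe_builtin)
```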

1) Fit a linear model

SOLUTION:

I’ll just do this for the (x1, y1) pair.

## use the lm() function to fit a linear model.
## It should look like this: anscombe_fit <- lm( ~, data = anscombe)
## You will need to fill in the proper formula with the
## response and explanatory variables you're using.
anscombe_fit <- lm(y1 ~ x1, data = anscombe)

Note: y1 is the response variable and x1 is the explanatory variable. The format of the formula in lm is “response ~ explanatory”.
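
The order of the variables around the ~ matters. As a rough illustration, the sketch below fits both orderings and prints the coefficients side by side. It uses the built-in datasets::anscombe as a stand-in for the CSV (an assumption, made so the block runs offline), and the names fit_y_on_x and fit_x_on_y are illustrative.

```r
# Assumption: the built-in datasets::anscombe stands in for the CSV read above.
dat <- datasets::anscombe

# y1 is the response, x1 is the explanatory variable (the model used in this lab)
fit_y_on_x <- lm(y1 ~ x1, data = dat)

# Roles swapped: x1 is now the response, y1 the explanatory variable
fit_x_on_y <- lm(x1 ~ y1, data = dat)

# The two fits describe different lines, so their coefficients differ
coef(fit_y_on_x)
coef(fit_x_on_y)
```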

2) Report and interpret the regression coefficients

SOLUTION:

## Use the coef() function to print out the regression coefficients,
## or use summary(anscombe_fit) to print out more information, including the
## regression coefficients
coef(anscombe_fit)
## (Intercept)          x1 
##   3.0000909   0.5000909
summary(anscombe_fit)
## 
## Call:
## lm(formula = y1 ~ x1, data = anscombe)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.92127 -0.45577 -0.04136  0.70941  1.83882 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.0001     1.1247   2.667  0.02573 * 
## x1            0.5001     0.1179   4.241  0.00217 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared:  0.6665, Adjusted R-squared:  0.6295 
## F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.00217

The estimated coefficients are approximately b0 = 3 and b1 = 0.5 (3.0001 and 0.5001 to four decimal places). When x1 is 0, the predicted value of y1 is about 3. For each one-unit increase in x1, the predicted value of y1 increases by about 0.5.
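
One way to check this interpretation is to ask the fitted model for predictions at x1 = 0 and x1 = 1. The sketch below does that with predict(). It refits the model on the built-in datasets::anscombe (an assumption, so the block runs on its own), and the names fit and preds are illustrative.

```r
# Assumption: the built-in datasets::anscombe stands in for the CSV read above.
dat <- datasets::anscombe
fit <- lm(y1 ~ x1, data = dat)

# Predicted y1 at x1 = 0 and x1 = 1
preds <- predict(fit, newdata = data.frame(x1 = c(0, 1)))
preds

# The prediction at x1 = 0 is the intercept (about 3)
unname(preds[1])

# A one-unit increase in x1 changes the prediction by the slope (about 0.5)
unname(preds[2] - preds[1])
```

In the built-in copy, x1 ranges from 4 to 14, so the prediction at x1 = 0 is an extrapolation beyond the observed data.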

Note that in the output from summary, the coefficient estimates are in the “Estimate” column of the coefficients table.

3) Report the residual standard deviation and interpret it using the “95” part of the 68-95-99.7 rule.

SOLUTION:

## If you didn't already print out the summary(anscombe_fit) for part 2,
## you'll have to do it now

The residual standard deviation is about 1.24 (it is mislabeled as “Residual standard error” in the R output). By the 68-95-99.7 rule, for about 95% of the observations in the data set, the observed value of y1 is within plus or minus 2*1.24, or plus or minus 2.48, of the value predicted by the linear model. There aren’t any units in this problem because it’s a made-up data set, but in a real problem I’d give the units.
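
The residual standard deviation can also be pulled out directly and compared with the residuals. The sketch below uses sigma() from the stats package, recomputes the same number by hand, and counts the share of residuals within two residual standard deviations. It refits the model on the built-in datasets::anscombe (an assumption, so the block runs on its own), and the variable names are illustrative.

```r
# Assumption: the built-in datasets::anscombe stands in for the CSV read above.
dat <- datasets::anscombe
fit <- lm(y1 ~ x1, data = dat)

# Residual standard deviation as reported by summary(): about 1.24
s <- sigma(fit)

# Same quantity by hand: sqrt(sum of squared residuals / residual degrees of freedom)
s_by_hand <- sqrt(sum(resid(fit)^2) / df.residual(fit))

# Share of observations whose residual is within 2 residual standard deviations
within_2s <- mean(abs(resid(fit)) <= 2 * s)

c(sigma = s, by_hand = s_by_hand, share_within_2s = within_2s)
```

With only 11 observations, the observed share need not be exactly 95%. Here the summary output shows the most extreme residuals are -1.92 and 1.84, so every observation falls inside the plus or minus 2.48 band.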

4) Report and interpret \(R^2\)

SOLUTION:

The \(R^2\) value is about 0.667 (labeled as “Multiple R-squared” in the R output). The linear model using x1 as an explanatory variable accounts for about 66.7 percent of the variability in the response variable, y1.
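
To connect the number to “variability accounted for”, the sketch below recomputes \(R^2\) as one minus the ratio of the residual sum of squares to the total sum of squares, and also as the squared correlation between x1 and y1 (which agrees for a simple linear regression with an intercept). It refits the model on the built-in datasets::anscombe (an assumption, so the block runs on its own), and the variable names are illustrative.

```r
# Assumption: the built-in datasets::anscombe stands in for the CSV read above.
dat <- datasets::anscombe
fit <- lm(y1 ~ x1, data = dat)

# R-squared as reported by summary(): about 0.667
r2_summary <- summary(fit)$r.squared

# By hand: 1 - (variation left in the residuals) / (total variation in y1)
sse <- sum(resid(fit)^2)
sst <- sum((dat$y1 - mean(dat$y1))^2)
r2_by_hand <- 1 - sse / sst

# For one explanatory variable, R-squared is the squared correlation
r2_from_cor <- cor(dat$x1, dat$y1)^2

c(summary = r2_summary, by_hand = r2_by_hand, from_cor = r2_from_cor)
```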

5) Put your regression coefficients, residual standard deviation, and \(R^2\) on the board.