Scatterplots and correlation

A First Example

Rail Trails

Data were collected on the volume of users on the Northampton Rail Trail in Florence, Massachusetts. Variables in the dataset include the number of crossings on a particular day (volume, measured by a sensor near the intersection with Chestnut Street), the average of the minimum and maximum temperature in degrees Fahrenheit for that day (avgtemp), and a dichotomous indicator of whether the day was a weekday or a weekend/holiday (weekday).

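# RailTrail is assumed to be loaded already (for example, from the mosaicData package)
library(dplyr)  # provides mutate()
RailTrail <- mutate(RailTrail, daytype = ifelse(weekday == 1, "Weekday", "Wkend/Holiday"))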
head(RailTrail)
##   hightemp lowtemp avgtemp spring summer fall cloudcover precip volume
## 1       83      50    66.5      0      1    0        7.6   0.00    501
## 2       73      49    61.0      0      1    0        6.3   0.29    419
## 3       74      52    63.0      1      0    0        7.5   0.32    397
## 4       95      61    78.0      0      1    0        2.6   0.00    385
## 5       44      52    48.0      1      0    0       10.0   0.14    200
## 6       69      54    61.5      1      0    0        6.6   0.02    375
##   weekday       daytype
## 1       1       Weekday
## 2       1       Weekday
## 3       1       Weekday
## 4       0 Wkend/Holiday
## 5       1       Weekday
## 6       1       Weekday
str(RailTrail)
## 'data.frame':    90 obs. of  11 variables:
##  $ hightemp  : int  83 73 74 95 44 69 66 66 80 79 ...
##  $ lowtemp   : int  50 49 52 61 52 54 39 38 55 45 ...
##  $ avgtemp   : num  66.5 61 63 78 48 61.5 52.5 52 67.5 62 ...
##  $ spring    : int  0 0 1 0 1 1 1 1 0 0 ...
##  $ summer    : int  1 1 0 1 0 0 0 0 1 1 ...
##  $ fall      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cloudcover: num  7.6 6.3 7.5 2.6 10 ...
##  $ precip    : num  0 0.29 0.32 0 0.14 ...
##  $ volume    : int  501 419 397 385 200 375 417 629 533 547 ...
##  $ weekday   : Factor w/ 2 levels "0","1": 2 2 2 1 2 2 2 1 1 2 ...
##  $ daytype   : chr  "Weekday" "Weekday" "Weekday" "Wkend/Holiday" ...

Make a scatter plot using the RailTrail data with the volume variable on the vertical axis and the avgtemp variable on the horizontal axis.
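As a reminder of the general syntax, here is a minimal sketch of a scatter plot on a small made-up data frame. It assumes the ggplot2 package is installed; the data frame `toy` and its columns `hours_studied` and `quiz_score` are hypothetical and are not part of the RailTrail data.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only (not from RailTrail)
toy <- data.frame(
  hours_studied = c(1, 2, 3, 4, 5, 6),
  quiz_score    = c(52, 60, 61, 70, 74, 83)
)

# x = goes on the horizontal axis, y = goes on the vertical axis
p <- ggplot(data = toy, mapping = aes(x = hours_studied, y = quiz_score)) +
  geom_point()
p
```

The same pattern should carry over to RailTrail once you swap in the data frame and the two variable names from the prompt.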

# Your code goes here

Describe the relationship between the number of crossings and avgtemp (average of min and max temperatures). Be sure to describe the direction, form, strength, and any unusual features.

SOLUTION:

Report and interpret the correlation between average temperature and number of crossings. Use the cor function.
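For reference, a minimal sketch of how cor is called, again on a hypothetical toy data frame rather than on RailTrail. The second calculation spells out the usual sample correlation formula so you can see what cor computes by default (the Pearson correlation).

```r
# Hypothetical toy data, invented for illustration only (not from RailTrail)
toy <- data.frame(
  hours_studied = c(1, 2, 3, 4, 5, 6),
  quiz_score    = c(52, 60, 61, 70, 74, 83)
)

# cor() takes the two numeric vectors; the order does not matter
r <- cor(toy$hours_studied, toy$quiz_score)
r

# The same quantity by hand: sum of products of deviations,
# divided by (n - 1) times the two standard deviations
x <- toy$hours_studied
y <- toy$quiz_score
n <- length(x)
r_manual <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))
r_manual
```

Both values should agree, and a correlation always falls between -1 and 1.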

# Your code goes here.

SOLUTION:

Thinking about correlation and causation (THEY ARE NOT THE SAME!!!)

Arcade Revenue and Lawyers in Wyoming

Here are some data about the total revenue of arcades in the US (in millions of dollars) and the number of lawyers in Wyoming in each year from 2000 to 2009.

library(readr)  # provides read_csv()
Arcades_and_Lawyers <-
  read_csv("https://mhc-stat140-2017.github.io/labs/20170922_correlation/data/arcade_revenue_lawyers_Wyoming.csv")
## Parsed with column specification:
## cols(
##   Year = col_integer(),
##   `Total revenue generated by arcades (US, millions of dollars)` = col_integer(),
##   `Lawyers in Wyoming` = col_integer()
## )
names(Arcades_and_Lawyers) <- c("year", "arcade_revenue", "lawyers_in_Wyoming")

Make a scatter plot with arcade_revenue on the horizontal axis and lawyers_in_Wyoming on the vertical axis.

SOLUTION:

# Your code goes here.

Describe the direction, form, and strength of the relationship between arcade revenue and the number of lawyers in Wyoming.

SOLUTION:

Calculate the correlation between arcade_revenue and lawyers_in_Wyoming.

SOLUTION:

# Your code goes here

Do the relationship you described and the correlation you calculated above mean that there is a causal relationship between arcade revenue and the number of lawyers in Wyoming?

SOLUTION:

Spurious Correlations

Go to http://www.tylervigen.com/spurious-correlations and browse through some of the plots there. Become fully and deeply convinced that a high correlation between two variables does not, by itself, tell you that one variable causes the other.
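One way such correlations can arise is a shared trend over time. The sketch below is a simulation under stated assumptions, not an analysis of any real data: `series_a` and `series_b` are made-up yearly series generated independently of each other, each with its own upward drift and noise. The seed, slopes, and noise levels are arbitrary choices.

```r
set.seed(140)  # arbitrary seed so the sketch is reproducible

year <- 2000:2009

# Two made-up series, generated independently of each other.
# Each one simply drifts upward over time, plus its own random noise.
series_a <- 100 + 5 * (year - 2000) + rnorm(length(year), sd = 3)
series_b <- 400 + 20 * (year - 2000) + rnorm(length(year), sd = 10)

# Raw correlation: driven by the time trend the two series have in common
r_raw <- cor(series_a, series_b)
r_raw

# Correlation of what is left after removing each series' linear trend in year
r_detrended <- cor(resid(lm(series_a ~ year)), resid(lm(series_b ~ year)))
r_detrended
```

Neither series was built to influence the other, yet the raw correlation comes out strong because both move with `year`. Comparing `r_raw` with `r_detrended` shows how much of the association is carried by the shared trend alone.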