Data were collected on the number of users of the Northampton Rail Trail in Florence, Massachusetts. Variables in the dataset include the number of crossings on a particular day, measured by a sensor near the intersection with Chestnut Street (volume); the average of the minimum and maximum temperatures for that day, in degrees Fahrenheit (avgtemp); and a dichotomous indicator of whether the day was a weekday or a weekend/holiday (weekday).
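library(dplyr)  # provides mutate()
# Add a daytype column with readable labels for the 0/1 weekday indicator
RailTrail <- mutate(RailTrail, daytype = ifelse(weekday == 1, "Weekday", "Wkend/Holiday"))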
head(RailTrail)
## hightemp lowtemp avgtemp spring summer fall cloudcover precip volume
## 1 83 50 66.5 0 1 0 7.6 0.00 501
## 2 73 49 61.0 0 1 0 6.3 0.29 419
## 3 74 52 63.0 1 0 0 7.5 0.32 397
## 4 95 61 78.0 0 1 0 2.6 0.00 385
## 5 44 52 48.0 1 0 0 10.0 0.14 200
## 6 69 54 61.5 1 0 0 6.6 0.02 375
## weekday daytype
## 1 1 Weekday
## 2 1 Weekday
## 3 1 Weekday
## 4 0 Wkend/Holiday
## 5 1 Weekday
## 6 1 Weekday
str(RailTrail)
## 'data.frame': 90 obs. of 11 variables:
## $ hightemp : int 83 73 74 95 44 69 66 66 80 79 ...
## $ lowtemp : int 50 49 52 61 52 54 39 38 55 45 ...
## $ avgtemp : num 66.5 61 63 78 48 61.5 52.5 52 67.5 62 ...
## $ spring : int 0 0 1 0 1 1 1 1 0 0 ...
## $ summer : int 1 1 0 1 0 0 0 0 1 1 ...
## $ fall : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cloudcover: num 7.6 6.3 7.5 2.6 10 ...
## $ precip : num 0 0.29 0.32 0 0.14 ...
## $ volume : int 501 419 397 385 200 375 417 629 533 547 ...
## $ weekday : Factor w/ 2 levels "0","1": 2 2 2 1 2 2 2 1 1 2 ...
## $ daytype : chr "Weekday" "Weekday" "Weekday" "Wkend/Holiday" ...
Make a scatterplot with the volume variable on the vertical axis and the avgtemp variable on the horizontal axis.
# Your code goes here
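The axis convention described above maps directly onto R's formula interface: the variable to the left of ~ goes on the vertical axis and the one to the right goes on the horizontal axis. Below is a minimal base-R sketch. It uses the built-in cars data set, which is chosen only for illustration and is not part of this lab.

```r
# Built-in cars data (used here only as an example): speed (mph) and stopping distance (ft)
# The formula y ~ x puts dist on the vertical axis and speed on the horizontal axis
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```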
Describe the relationship between volume and avgtemp (the average of the minimum and maximum temperatures). Be sure to describe the direction, form, strength, and any unusual features.
SOLUTION:
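Form and unusual features matter because a correlation measures only the strength of a *linear* relationship. The sketch below uses small toy vectors invented for illustration (not the RailTrail data) to show two cases: a perfect curved relationship whose correlation is zero, and a perfect line whose correlation flips sign after a single outlying point is added.

```r
# Form: y is completely determined by x, but the relationship is curved
x_curve <- -5:5
y_curve <- x_curve^2
r_curve <- cor(x_curve, y_curve)   # zero: there is no *linear* trend

# Unusual features: a perfect line, then the same line plus one outlying point
r_line <- cor(1:10, 1:10)          # exactly linear, so r is 1 (up to rounding)
x_out <- c(1:10, 11)
y_out <- c(1:10, -20)
r_out <- cor(x_out, y_out)         # the single outlier drags r below zero

round(c(curve = r_curve, line = r_line, with_outlier = r_out), 2)
```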
Find the correlation between volume and avgtemp using the cor function.
# Your code goes here.
SOLUTION:
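As a check on what cor computes by default (the Pearson correlation), the value equals the sum of the products of the two variables' z-scores, divided by n - 1. The sketch below verifies this on a few hypothetical numbers chosen only for illustration.

```r
# Hypothetical toy values, chosen only for illustration
x <- c(1, 3, 4, 6, 8)
y <- c(2, 3, 5, 5, 9)

# Standardize each variable: subtract its mean, then divide by its standard deviation
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)

# Pearson correlation: sum of the products of z-scores, divided by n - 1
n <- length(x)
r_by_hand <- sum(z_x * z_y) / (n - 1)

# Compare with R's built-in cor(); the two values agree up to rounding error
c(by_hand = r_by_hand, cor = cor(x, y))
```

Because sd() also divides by n - 1, the hand calculation and cor() use the same convention and agree.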
Here are some data about the total revenue of arcades in the US (in millions of dollars) and the number of lawyers in Wyoming in each year from 2000 to 2009.
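library(readr)  # provides read_csv()
# Read annual US arcade revenue and the number of lawyers in Wyoming, 2000-2009
Arcades_and_Lawyers <-
  read_csv("https://mhc-stat140-2017.github.io/labs/20170922_correlation/data/arcade_revenue_lawyers_Wyoming.csv")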
## Parsed with column specification:
## cols(
## Year = col_integer(),
## `Total revenue generated by arcades (US, millions of dollars)` = col_integer(),
## `Lawyers in Wyoming` = col_integer()
## )
# Replace the long original column names with short ones that are easier to type
names(Arcades_and_Lawyers) <- c("year", "arcade_revenue", "lawyers_in_Wyoming")
Make a scatterplot with arcade_revenue on the horizontal axis and lawyers_in_Wyoming on the vertical axis.
SOLUTION:
# Your code goes here.
SOLUTION:
Calculate the correlation between arcade_revenue and lawyers_in_Wyoming.
SOLUTION:
# Your code goes here
SOLUTION:
Go to http://www.tylervigen.com/spurious-correlations and browse through some of the plots there. Become fully and deeply convinced that a high correlation between two variables does not, on its own, tell you that one variable causes the other.
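One common way such high correlations arise is that both series simply trend over time. The sketch below builds two invented series for 2000 to 2009 that are connected only by a shared upward trend (each has its own made-up wiggles), and their correlation still comes out close to 1. All numbers are hypothetical.

```r
year <- 2000:2009

# Two hypothetical series that share nothing but an upward time trend,
# each with its own small, hand-picked wiggles
series_a <- 1000 + 50 * (year - 2000) + c(3, -8, 5, 0, -4, 7, -2, 6, -5, 1)
series_b <-   60 +  2 * (year - 2000) + c(-1, 1, 0, 2, -2, 1, -1, 0, 2, -1)

# High correlation, even though neither series has any influence on the other
r_trend <- cor(series_a, series_b)
round(r_trend, 2)
```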