September 20, 2017

Warmup with a neighbor (~5 min)

  • What are the observational units, variable(s), and variable type(s)?
  • What did the code I used to make the plot look like?

(image source: Wikipedia)

Summarizing Scatter Plots

  • Recall: we summarize the distribution of one continuous variable with:
    • center (mean, median)
    • spread (standard deviation, IQR)
    • shape (symmetric/skewed, unimodal/bimodal/multimodal)
    • unusual features (gaps, outliers)
  • For two continuous variables, describe:
    • direction (positive association, negative association)
    • shape (linear, curved)
    • unusual features (gaps, outliers)
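The one-variable summaries recalled above can be computed directly in R. This is a minimal sketch, assuming R's built-in iris data frame, whose columns are named Petal.Length, Petal.Width, and Species rather than the snake_case names used in the plotting code on these slides.

```r
# Assumption: built-in iris data (column Petal.Length), not the
# snake_case version of the data used elsewhere in these slides.
x <- iris$Petal.Length

# Center: mean and median
center <- c(mean = mean(x), median = median(x))

# Spread: standard deviation and interquartile range
spread <- c(sd = sd(x), IQR = IQR(x))

print(center)
print(spread)
```

Shape and unusual features are easier to judge from a histogram or density plot than from a single number.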

Describe the relationship…

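library(ggplot2)  # provides ggplot(), geom_point(), aes(), ggtitle()

# Scatter plot: petal length on the x axis, petal width on the y axis
ggplot() +
  geom_point(mapping = aes(x = petal_length, y = petal_width),
    data = iris) +
  ggtitle("Petal Length (cm) vs. Petal Width (cm)\nfor 150 Iris Flowers")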

  • direction (positive association, negative association)
  • shape (linear, curved)
  • unusual features (gaps, outliers)

Coloring by Species…

ggplot() +
  geom_point(mapping = aes(x = petal_length, y = petal_width,
      color = species),
    data = iris) +
  ggtitle("Petal Length (cm) vs. Petal Width (cm)\nfor 150 Iris Flowers")

Just the versicolor species

library(dplyr)  # provides filter() and mutate()

# Keep only the rows for the versicolor species
versicolor <- filter(iris, species == "versicolor")
ggplot() +
  geom_point(mapping = aes(x = petal_length, y = petal_width),
    data = versicolor) +
  ggtitle("Petal Length (cm) vs. Petal Width (cm)\nfor 50 Versicolor Iris Flowers")

Units I understand: 1 cm = 0.3937 in

versicolor <- mutate(versicolor,
  petal_length_in = petal_length * 0.3937,
  petal_width_in = petal_width * 0.3937)
ggplot() +
  geom_point(mapping = aes(x = petal_length_in, y = petal_width_in),
    data = versicolor) +
  ggtitle("Petal Length (in) vs. Petal Width (in)")

Shape of Plot Doesn't Depend on Units

versicolor <- mutate(versicolor,
  z_score_length = (petal_length - mean(petal_length))/sd(petal_length),
  z_score_width = (petal_width - mean(petal_width))/sd(petal_width))
ggplot() +
  geom_point(mapping = aes(x = z_score_length, y = z_score_width),
    data = versicolor) +
  ggtitle("Petal Length vs. Petal Width")
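A quick check of why the z-score plot has the same shape regardless of units: standardizing removes both the location and the scale of the measurements, so centimeters and inches lead to identical z-scores. This is a sketch under the assumption that the built-in iris data frame (columns Petal.Length and Species) stands in for the snake_case data used above; the helper z_score() is a hypothetical name introduced only for this example.

```r
# Assumption: built-in iris data; z_score() is a hypothetical helper.
versicolor_builtin <- subset(iris, Species == "versicolor")
length_cm <- versicolor_builtin$Petal.Length
length_in <- length_cm * 0.3937  # same conversion factor as on the slide

# Standardize: subtract the mean, divide by the standard deviation
z_score <- function(v) (v - mean(v)) / sd(v)

z_cm <- z_score(length_cm)
z_in <- z_score(length_in)

print(mean(z_cm))             # essentially 0, up to floating-point error
print(sd(z_cm))               # 1
print(all.equal(z_cm, z_in))  # TRUE: the units cancel out
```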

Correlation

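  • The (almost) average of products of \(z\)-scores: \(r = \frac{\sum_{i=1}^n z^x_{i} z^y_{i}}{n - 1}\), where \(z^x_i = (x_i - \bar{x})/s_x\) and \(z^y_i = (y_i - \bar{y})/s_y\). It is "almost" an average because we divide by \(n - 1\) rather than \(n\).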

  • Using \(z\)-scores instead of original units means \(r\) has no units and always satisfies \(-1 \leq r \leq 1\)
  • For versicolor petal length and width, \(r\) is about 0.79
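The formula for r can be applied step by step and compared with R's cor() function. This is a minimal sketch, assuming the built-in iris data frame (columns Petal.Length, Petal.Width, Species) in place of the snake_case data used on these slides; the variable names are illustrative.

```r
# Assumption: built-in iris data stands in for the slides' data set.
vc <- subset(iris, Species == "versicolor")
x <- vc$Petal.Length
y <- vc$Petal.Width
n <- length(x)

# z-scores for each variable
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)

# Sum of products of z-scores, divided by n - 1
r_by_hand <- sum(z_x * z_y) / (n - 1)

print(r_by_hand)
print(all.equal(r_by_hand, cor(x, y)))  # TRUE: matches cor()
```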

Calculation in R

cor(versicolor$petal_length, versicolor$petal_width)
## [1] 0.7866681
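Because r is built from z-scores, converting the measurements to other units should leave it unchanged. The sketch below checks this for centimeters versus inches, again assuming the built-in iris data frame (columns Petal.Length, Petal.Width, Species) rather than the snake_case version used above.

```r
# Assumption: built-in iris data stands in for the slides' data set.
vc_cm <- subset(iris, Species == "versicolor")

# Correlation in the original units (cm)
r_cm <- cor(vc_cm$Petal.Length, vc_cm$Petal.Width)

# Correlation after converting both variables to inches
r_in <- cor(vc_cm$Petal.Length * 0.3937, vc_cm$Petal.Width * 0.3937)

print(all.equal(r_cm, r_in))  # TRUE: correlation does not depend on units
```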