September 13, 2017

Warm Up (with a neighbor)

Here are the first few rows of a data set with information about participants in a randomized controlled trial designed to evaluate a substance abuse treatment program. What are the observational units and variables? Are the variables categorical or quantitative?

head(HELPrct)
##   age homeless substance dep_score mental_score
## 1  37   housed   cocaine        49    25.111990
## 2  37 homeless   alcohol        30    26.670307
## 3  26   housed    heroin        39     6.762923
## 4  39   housed    heroin        15    43.967880
## 5  32 homeless   cocaine        39    21.675755
## 6  47   housed   cocaine         6    55.508991

The observational units, variables, and variable types matter!

They dictate:

  • What questions you can ask
  • What plots you can make
  • What statistical models are appropriate
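A rough way to check the warm-up answers in R is sketched below. It retypes the six rows printed above into a small data frame; the name `help_small` and the helper logic are illustrative assumptions, not part of the course materials. Treating every numeric column as quantitative is only a rule of thumb (numeric ID codes, for example, would be misclassified).

```r
# The six rows shown by head(HELPrct) above, retyped by hand.
# `help_small` is a hypothetical name used only for this sketch.
help_small <- data.frame(
  age          = c(37, 37, 26, 39, 32, 47),
  homeless     = c("housed", "homeless", "housed", "housed", "homeless", "housed"),
  substance    = c("cocaine", "alcohol", "heroin", "heroin", "cocaine", "cocaine"),
  dep_score    = c(49, 30, 39, 15, 39, 6),
  mental_score = c(25.111990, 26.670307, 6.762923, 43.967880, 21.675755, 55.508991),
  stringsAsFactors = TRUE
)

# One row per participant, so the participant is the observational unit.
nrow(help_small)

# Rule of thumb: numeric columns are quantitative, factor columns are categorical.
var_types <- sapply(help_small, function(col) {
  if (is.numeric(col)) "quantitative" else "categorical"
})
var_types
```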

The Grammar of Graphics

A statistical graphic is a mapping of data variables

to aes()thetic attributes of geom_etric objects

ggplot() +
  geom_<geometry type>(
    mapping = aes(<attribute1> = <variable1>, <attribute2> = <variable2>),
    data = <data frame>)

ggplot() +
  geom_point(
    mapping = aes(x = dep_score, y = mental_score),
    data = HELPrct)
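To see what the mapping does, here is a small sketch that fills in the template with the six rows from the warm-up (retyped as `help_small`, a hypothetical name) and then asks ggplot2 for the values it will draw. `layer_data()` is a ggplot2 helper; the rest is an illustration under those assumptions.

```r
library(ggplot2)

# Six rows from the warm-up, retyped by hand (hypothetical object name).
help_small <- data.frame(
  dep_score    = c(49, 30, 39, 15, 39, 6),
  mental_score = c(25.111990, 26.670307, 6.762923, 43.967880, 21.675755, 55.508991)
)

# Template filled in: geometry = point, x and y mapped to two variables.
p <- ggplot() +
  geom_point(
    mapping = aes(x = dep_score, y = mental_score),
    data = help_small)

# Typing p (or print(p)) at the console draws the plot.

# layer_data() returns what the first layer will draw: one row per point,
# with the mapped variables now stored under the aesthetic names x and y.
pts <- layer_data(p, 1)
pts[, c("x", "y")]
```

Each data variable ends up in a column named after the aesthetic it was mapped to, which is the "mapping" the slide describes.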

1 Categorical Variable: Bar Plot

ggplot() +
  geom_bar(mapping = aes(x = homeless),
    data = HELPrct)

2 Categorical Variables: Bar Plot

ggplot() +
  geom_bar(mapping = aes(x = homeless, fill = substance),
    data = HELPrct)

2 Categorical Variables: Bar Plot (Side by Side)

ggplot() +
  geom_bar(mapping = aes(x = homeless, fill = substance),
    position = position_dodge(),
    data = HELPrct)
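Both versions of this bar plot show the same numbers: by default geom_bar counts the rows in each combination of categories, and the position only controls whether the bars are stacked or placed side by side. As a sketch, the counts for the six warm-up rows (retyped as `help_small`, a hypothetical name) can be tabulated directly:

```r
# Six rows from the warm-up, retyped by hand (hypothetical object name).
help_small <- data.frame(
  homeless  = c("housed", "homeless", "housed", "housed", "homeless", "housed"),
  substance = c("cocaine", "alcohol", "heroin", "heroin", "cocaine", "cocaine")
)

# Rows = housing status (the x aesthetic), columns = substance (the fill aesthetic).
# Each cell is the height of one bar segment.
counts <- table(homeless = help_small$homeless, substance = help_small$substance)
counts
```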

1 Quantitative Variable: Histograms

ggplot() +
  geom_histogram(mapping = aes(x = dep_score),
    data = HELPrct)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

1 Quantitative Variable: Histograms

ggplot() +
  geom_histogram(mapping = aes(x = dep_score),
    binwidth = 4,
    data = HELPrct)

1 Quantitative Variable: Histograms

ggplot() +
  geom_histogram(mapping = aes(x = dep_score, y = after_stat(density)),
    binwidth = 4,
    data = HELPrct)
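On the density scale, each bar's height is its count divided by (total count × bin width), so the bar areas add up to 1. The sketch below checks this on the six warm-up rows (retyped as `help_small`, a hypothetical name). It assumes ggplot2 3.3.0 or later, where `after_stat(density)` is the current spelling of what older code writes as `..density..`.

```r
library(ggplot2)

# Six dep_score values from the warm-up (hypothetical object name).
help_small <- data.frame(dep_score = c(49, 30, 39, 15, 39, 6))
bin_width <- 4

p <- ggplot() +
  geom_histogram(mapping = aes(x = dep_score, y = after_stat(density)),
    binwidth = bin_width,
    data = help_small)

# One row per bin, including the computed count and density.
bins <- layer_data(p, 1)

# Recompute the density by hand and compare with ggplot2's value.
manual_density <- bins$count / (sum(bins$count) * bin_width)
all.equal(bins$density, manual_density)

# Total area of the bars (equals 1 up to rounding).
sum(bins$density * bin_width)
```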

1 Quantitative Variable: Density Plots

ggplot() +
  geom_density(mapping = aes(x = dep_score),
    data = HELPrct)

2 Quantitative Variables: Scatter Plots

ggplot() +
  geom_point(
    mapping = aes(x = dep_score, y = mental_score),
    data = HELPrct)

2 Quantitative Variables: Scatter Plots

ggplot() +
  geom_point(
    mapping = aes(x = dep_score, y = mental_score,
      color = substance),
    data = HELPrct)

1 Quantitative, 1 Categorical: Box Plots

ggplot() +
  geom_boxplot(mapping = aes(x = substance, y = dep_score),
    data = HELPrct)

1 Quantitative, 1 Categorical: Density

ggplot() +
  geom_density(mapping = aes(x = dep_score, color = substance),
    data = HELPrct)

Summary

Variables                         Plot Type      Geometry        Aesthetics
1 Categorical                     Bar Plot       geom_bar        x
1 Quantitative                    Histogram      geom_histogram  x
1 Quantitative                    Density        geom_density    x
2 Categorical                     Bar Plot       geom_bar        x, fill
2 Quantitative                    Scatter Plot   geom_point      x, y
1 Categorical and 1 Quantitative  Box Plot       geom_boxplot    x (categorical), y (quantitative)
1 Categorical and 1 Quantitative  Density Plot   geom_density    x (quantitative), color (categorical)

Common Error 1

Missing + at the end of the first line:

ggplot()

  geom_point(
    mapping = aes(x = dep_score, y = mental_score),
    data = HELPrct)
## mapping: x = dep_score, y = mental_score 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
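Why this happens: without a trailing +, R treats ggplot() as a complete statement and the geom_point(...) call as a second, separate one, so the layer is never added to the plot and is simply printed. A small sketch using base R's parse(), which reads code without running it, makes the difference visible (the strings below just hold the code from this slide):

```r
# parse() only reads code; it does not run it, so ggplot2 is not needed here.
without_plus <- "ggplot()\n  geom_point(mapping = aes(x = dep_score, y = mental_score), data = HELPrct)"
with_plus    <- "ggplot() +\n  geom_point(mapping = aes(x = dep_score, y = mental_score), data = HELPrct)"

n_without <- length(parse(text = without_plus))
n_with    <- length(parse(text = with_plus))

n_without  # 2: R sees two separate statements
n_with     # 1: R sees one statement spanning both lines
```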

Common Error 2

Forgot the data = argument:

ggplot() +
  geom_point(
    mapping = aes(x = dep_score, y = mental_score))

Error in FUN(X[[i]], ...) : object 'dep_score' not found
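The error appears only when the plot is built, because aes() just records the variable names and ggplot2 looks them up in the data later. The sketch below reproduces the problem and the fix on the six warm-up rows (retyped as `help_small`, a hypothetical name). The exact wording of the error message differs between ggplot2 versions, and the sketch assumes no object named dep_score exists in the workspace.

```r
library(ggplot2)

# Six rows from the warm-up, retyped by hand (hypothetical object name).
help_small <- data.frame(
  dep_score    = c(49, 30, 39, 15, 39, 6),
  mental_score = c(25.111990, 26.670307, 6.762923, 43.967880, 21.675755, 55.508991)
)

# No data = argument: creating the plot object succeeds...
broken <- ggplot() +
  geom_point(
    mapping = aes(x = dep_score, y = mental_score))

# ...but building it fails, because dep_score cannot be found anywhere.
result <- tryCatch(layer_data(broken, 1), error = function(e) e)
inherits(result, "error")

# Fix: tell the layer which data frame holds the variables.
fixed <- ggplot() +
  geom_point(
    mapping = aes(x = dep_score, y = mental_score),
    data = help_small)

nrow(layer_data(fixed, 1))  # 6, one row per participant
```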