September 20, 2017

Warmup with a neighbor (~10 min)

  • What are the observational units, variable(s), and variable type(s) in each plot?
  • What did the code I used to make the plots look like?
  • What statistics should we use for the center and spread?

All three are Nearly Normal

  • The Nearly Normal condition:
    • Distribution is unimodal
    • Distribution is (approximately) symmetric

Why does this matter?

  • For any variable with a nearly normal distribution, we can use the same rules to calculate:
    1. Percentiles/quantiles
    2. The proportion of the data that are less than a given value.
  • Lots of variables have a nearly normal distribution!

The normal model

  • \(N(\mu, \sigma)\)
  • Read: "normal distribution with mean \(\mu\) and standard deviation \(\sigma\)"
  • \(\mu\) and \(\sigma\) are parameters

  • To use the model with real data, we estimate \(\mu\) with the sample mean \(\bar{y}\) and \(\sigma\) with the sample standard deviation \(s\)

Example

summarize(car_speeds, mean_speed = mean(speed), sd_speed = sd(speed))
## # A tibble: 1 x 2
##   mean_speed sd_speed
##        <dbl>    <dbl>
## 1    23.8439 3.563338
  • Example: red curve is a \(N(23.8, 3.6)\) distribution

The 68-95-99.7 rule:
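
The rule says that in a normal model about 68% of the data fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. A minimal sketch to check those numbers, using base R's pnorm (covered later in these slides); the helper name within_k_sd is made up for illustration:

```r
# Proportion of a normal distribution within k standard deviations of the mean.
# With no mean/sd given, pnorm uses the standard normal N(0, 1).
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

rule <- sapply(1:3, within_k_sd)
round(rule, 4)
## [1] 0.6827 0.9545 0.9973
```

The rounded values 68, 95, and 99.7 are what give the rule its name.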

Examples: Using the 68-95-99.7 rule

  • If driver speeds in a 20 MPH speed zone can be represented by a \(N(23.8, 3.6)\) model, find the following:
    • The proportion of drivers who drive between 20.2 and 27.6 MPH.
    • The proportion of drivers who drive less than 20.2 MPH
    • The 2.5th percentile of driver speeds
    • The 50th percentile of driver speeds
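
One way to organize these calculations is to first express each value as a number of standard deviations from the mean, then read the answer off the rule. The sketch below does only that arithmetic; the variable names are illustrative, and the answers are approximate because the rule itself is approximate.

```r
mu <- 23.8
sigma <- 3.6

# 20.2 and 27.6 are 1 standard deviation below and above the mean
sd_from_mean <- (c(20.2, 27.6) - mu) / sigma   # approximately -1 and 1

# About 68% of drivers are within 1 sd of the mean
prop_between <- 0.68

# The remaining 32% is split evenly between the two tails
prop_below <- (1 - 0.68) / 2                   # 0.16

# 95% are within 2 sd, so 2.5% are more than 2 sd below the mean
pctl_2_5 <- mu - 2 * sigma                     # 16.6 MPH

# A symmetric distribution has its 50th percentile at the mean
pctl_50 <- mu                                  # 23.8 MPH
```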

Your turn: Using the 68-95-99.7 rule

summarize(pizza, mean_price = mean(Price), sd_price = sd(Price))
## # A tibble: 1 x 2
##   mean_price  sd_price
##        <dbl>     <dbl>
## 1   2.619038 0.1559685

  • If the cost of a slice of pizza can be represented by a \(N(2.62, 0.16)\) model, find the following:
    • The proportion of pizza shops where a slice of pizza costs less than $2.30.
    • The 84th percentile of pizza slice costs
    • A lower and upper bound on the 99th percentile of pizza slice costs

\(z\)-scores

  • To calculate percentiles, we only need to know how many standard deviations above or below the mean a particular value is.
  • This is the \(z\)-score:

\[z = \frac{y - \mu}{\sigma}\]
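
The formula translates directly into R. This is a small sketch; z_score is a hypothetical helper name, not a built-in function, and the input values are chosen only to illustrate it with the \(N(23.8, 3.6)\) model.

```r
# z-score: number of standard deviations that y lies above (+) or below (-) the mean
z_score <- function(y, mu, sigma) (y - mu) / sigma

# Values 1 sd below the mean, at the mean, and 1 sd above the mean
z <- z_score(c(20.2, 23.8, 27.4), mu = 23.8, sigma = 3.6)
z   # approximately -1, 0, 1
```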

\(z\)-scores: examples

  • Ex: Suppose a police officer pulls over someone who was going 31 MPH in a 20 MPH zone. Assume a \(N(23.8, 3.6)\) model applies.
    • How many standard deviations above the mean was that driver going?
    • What percentile of driving speeds were they at?
  • Ex: Suppose a slice of pizza costs $2.94. Assume a \(N(2.62, 0.16)\) model applies.
    • How many standard deviations above the mean did that piece of pizza cost?
    • What percentile of costs was that slice at?

  • Apparently, driving 31 MPH in a 20 MPH zone is as unusual as paying $2.94 for a slice of pizza: both values have a \(z\)-score of 2.
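
The comparison can be checked with a few lines of arithmetic. This sketch assumes the two models stated in the examples; the variable names are illustrative. The percentile from the 68-95-99.7 rule is approximate, and pnorm (covered later in these slides) gives a more precise value.

```r
# z-scores for the two examples
z_speed <- (31 - 23.8) / 3.6      # 2 sd above the mean speed
z_pizza <- (2.94 - 2.62) / 0.16   # 2 sd above the mean price

# Rule: 95% of the data are within 2 sd, so 2.5% are more than 2 sd above the mean
pctl_rule <- 1 - (1 - 0.95) / 2   # 0.975, roughly the 97.5th percentile

# More precise proportion below z = 2 under a standard normal model
pctl_exact <- pnorm(2)            # about 0.977
```

Because percentiles depend only on the z-score, the same answer applies to both the driver and the slice of pizza.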

The normal model in R: quantiles

  • Use qnorm to calculate quantiles (recall that quantiles are essentially the same thing as percentiles)
    • What is the 90th percentile of speeds in a 20 MPH speed zone? Assume a \(N(23.8, 3.6)\) model applies.
qnorm(p = 0.90, mean = 23.8, sd = 3.6)
## [1] 28.41359

The normal model in R: proportions

  • Use pnorm to calculate the proportion of the data that are less than a particular value
    • What proportion of drivers travel less than 30 MPH?
pnorm(q = 30, mean = 23.8, sd = 3.6)
## [1] 0.9574854
  • For you to do (draw a picture!):
    • What proportion of drivers travel more than 30 MPH?
    • What proportion of drivers travel between 20 and 25 MPH?
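
Since pnorm always returns the proportion below a value, "more than" and "between" questions need one extra step. The sketch below shows the general pattern with different cutoffs (28, 22, and 26 MPH are chosen here only for illustration, so the questions above are left for you):

```r
mu <- 23.8
sigma <- 3.6

# Proportion above 28 MPH: everything that is not below 28
p_above_28 <- 1 - pnorm(q = 28, mean = mu, sd = sigma)

# Equivalent: ask pnorm for the upper tail directly
p_above_28_alt <- pnorm(q = 28, mean = mu, sd = sigma, lower.tail = FALSE)

# Proportion between 22 and 26 MPH: (proportion below 26) - (proportion below 22)
p_22_to_26 <- pnorm(q = 26, mean = mu, sd = sigma) -
  pnorm(q = 22, mean = mu, sd = sigma)
```

Drawing the curve and shading the region of interest makes it easier to see which of these patterns applies.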

Summary

  1. Everything in this chapter is for
    • one quantitative variable
    • that satisfies the nearly normal condition (unimodal, symmetric)
  2. There are 2 basic types of calculations:
    • Find a percentile/quantile
    • Find the proportion of the data that are in a given range of values.
  3. We can do calculations using either:
    • The 68-95-99.7 rule – often only approximate
    • R (qnorm for quantiles, pnorm for proportion of data less than a given number)
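
To tie the two types of calculation together: qnorm and pnorm undo each other, and the 68-95-99.7 rule gives answers close to, but not exactly equal to, what R reports. A short sketch using the \(N(23.8, 3.6)\) speed model from the examples (variable names are illustrative):

```r
# qnorm goes from a proportion to a value; pnorm goes from a value back to a proportion
q90 <- qnorm(p = 0.90, mean = 23.8, sd = 3.6)      # 90th percentile, about 28.4 MPH
p_back <- pnorm(q = q90, mean = 23.8, sd = 3.6)    # returns 0.90

# 97.5th percentile: the rule (mean + 2 sd) versus qnorm
rule_q <- 23.8 + 2 * 3.6                           # 31 MPH
exact_q <- qnorm(p = 0.975, mean = 23.8, sd = 3.6) # about 30.86 MPH
```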