October 23, 2017

Warm Up

The best place to go Trick-or-Treating is at Willy Wonka's house, because he has the best candy. This year, he'll be giving out two types of candy: Everlasting Gobstoppers and Scrumdiddlyumptious candies. Define the following events:

A = the event that he gives you an Everlasting Gobstopper

B = the event that he gives you a Scrumdiddlyumptious candy.

I heard from an Oompa-Loompa that \(P(A) = \frac{1}{3}\), \(P(B) = \frac{2}{3}\), and \(P(A\text{ and }B) = \frac{2}{9}\).

Are the events \(A\) and \(B\) independent?

Are the events \(A\) and \(B\) disjoint?
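
One way to check is to compare the numbers directly. Here is a minimal R sketch (the probabilities are the ones quoted above; the conclusions follow from the definitions of independence and disjointness):

p_A <- 1/3          # P(A)
p_B <- 2/3          # P(B)
p_A_and_B <- 2/9    # P(A and B)
p_A * p_B           # 1/3 * 2/3 = 2/9, which matches P(A and B), so A and B are independent
p_A_and_B           # 2/9 is not 0, so A and B are not disjoint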

Random Variables

  • A random variable is a variable whose possible values are numerical outcomes of a random phenomenon.

  • Example: Number of times Paul the Octopus correctly predicts the winner of a World Cup Soccer/Football game in 8 attempts

(image credit: Wolfgang Rattay/Reuters)

Bernoulli Trials

  • Bernoulli trials are sequences of random experiments with three characteristics:
    1. On each trial, there are exactly two possible outcomes
      • success (prediction correct!)
      • failure (prediction incorrect!)
    2. The probability of success is the same on all trials
      • probability of making the right prediction is the same across all 8 games
    3. The trials are independent
      • Knowing how well Paul did in one game does not give you any information about how well he will do on the next game (other than that the probability of success is the same in both games)

Bernoulli Random Variables

  • A Bernoulli random variable represents the outcome of a single Bernoulli trial:
    • \(X = 1\) if the trial is a success
    • \(X = 0\) if the trial is a failure
    • \(P(X = 1) = p\) and \(P(X = 0) = 1 - p\)
  • Write \(X \sim \text{Bernoulli}(p)\)
    • "\(X\) follows a Bernoulli distribution with probability \(p\)"
  • Notation:
    • Use capital letters (like \(X\), \(Y\)) for random variables
    • Use lower case letters (\(x\), \(y\)) for observed values, or values we might possibly observe at some point
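  • One way (a quick sketch, not part of the definition) to simulate a single Bernoulli trial in R is to draw from a binomial with a single trial, using rbinom with size = 1; here p = 0.9 is just an illustrative value:
# One draw from a Bernoulli(0.9) distribution: returns 1 (success) or 0 (failure)
rbinom(n = 1, size = 1, prob = 0.9)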

The Binomial Model

  • The binomial model describes the number of successes in \(n\) Bernoulli trials.

  • \(X \sim \text{Binomial}(n, p)\)
    • "\(X\) follows a Binomial distribution with \(n\) trials and probability of success \(p\)"
  • Back to Paul…
    • Define \(X\) = the number of successful predictions in 8 attempts.
    • Suppose \(p = 0.9\) (Paul's predictions are pretty good!)
    • We could use the model \(X \sim \text{Binomial}(8, 0.9)\)
    • Don't forget to check assumptions (2 possible outcomes; same probability of success on all trials; trials are independent)
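  • As a quick sketch (not a required step), we could simulate from this model in R to see what typical values of \(X\) look like:
# Each simulated value is the number of correct predictions out of 8 games,
# assuming the Binomial(8, 0.9) model above
rbinom(n = 10, size = 8, prob = 0.9)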

Calculations with the Binomial

  • Suppose \(X \sim \text{Binomial}(8, 0.9)\)
  • It's helpful to think of \(X\) as the sum of 8 independent Bernoulli random variables: \[X = X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8\] where each \(X_i \sim \text{Bernoulli}(0.9)\) records whether prediction \(i\) is correct.
  • (\(X_1\) and \(X_2\) are independent if knowing the value of \(X_1\) doesn't tell you anything about the value of \(X_2\).)
  • What's the probability that Paul gets 8 out of 8 predictions correct?
  • What's the probability that Paul gets the first 7 predictions correct and the last one wrong?
  • What's the probability that Paul gets 7 out of 8 predictions correct? (we're not specifying which ones he got right)
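  • A sketch of the arithmetic in R, multiplying probabilities of independent trials (the last value matches the dbinom output on the next slide):
0.9^8            # P(all 8 predictions correct)
0.9^7 * 0.1      # P(first 7 correct and the last one wrong)
8 * 0.9^7 * 0.1  # P(exactly 7 correct): 8 choices for which game is missed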

More Calculations with the Binomial

  • If \(X \sim \text{Binomial}(n, p)\), then \[P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}\]
  • \(\binom{n}{x}\) (read "\(n\) choose \(x\)") is the number of ways of picking which \(x\) of the \(n\) trials will be successes.
  • There's a formula for \(\binom{n}{x}\) as well: \[\binom{n}{x} = \frac{n!}{x! (n - x)!}\]
  • Let's not do this by hand; you don't need to remember the contents of this slide
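  • Just to see how the formula connects to the R output on the next slide: with \(n = 8\), \(x = 7\), and \(p = 0.9\), \[P(X = 7) = \binom{8}{7} (0.9)^7 (0.1)^1 = 8 \times 0.9^7 \times 0.1 \approx 0.383\]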

Calculations with the Binomial in R

  • To calculate \(P(X = 7)\), use
dbinom(x = 7, size = 8, prob = 0.9)
## [1] 0.3826375
  • To calculate \(P(X \leq 7)\), use
pbinom(q = 7, size = 8, prob = 0.9)
## [1] 0.5695328
  • Note: the size argument in the R functions matches up to \(n\) in the mathematical notation for a \(\text{Binomial}(n, p)\) distribution.
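  • To calculate \(P(X \geq 7)\), one option (a quick sketch) is to use the complement \(1 - P(X \leq 6)\):
# P(X >= 7) = 1 - P(X <= 6), i.e. P(X = 7) + P(X = 8)
1 - pbinom(q = 6, size = 8, prob = 0.9)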

The Full Binomial Distribution

  • We can use dbinom to calculate the probability of \(x\) successes for each possible \(x\) from 0 to 8:
Paul_success_probs <- data.frame(
  num_successes = seq(from = 0, to = 8),
  probability = dbinom(x = seq(from = 0, to = 8), size = 8, prob = 0.9))
Paul_success_probs
##   num_successes probability
## 1             0  0.00000001
## 2             1  0.00000072
## 3             2  0.00002268
## 4             3  0.00040824
## 5             4  0.00459270
## 6             5  0.03306744
## 7             6  0.14880348
## 8             7  0.38263752
## 9             8  0.43046721
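  • As a quick check, these probabilities cover every possible outcome, so they add up to 1:
sum(Paul_success_probs$probability)  # equals 1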

Expected Value

  • How many predictions do we expect Paul to get right? (The expected value is the same as the average, or mean, of \(X\).)

  • \[ \begin{align*} &\mu = E(X) = \sum_x x P(X = x) \\ &\qquad = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + \cdots + 8 \cdot P(X = 8) \\ &\qquad = 7.2 \end{align*} \]
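  • A sketch of the same sum in R, reusing the Paul_success_probs data frame from earlier:
# E(X) = sum over x of x * P(X = x)
sum(Paul_success_probs$num_successes * Paul_success_probs$probability)  # 7.2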

Variance and Standard Deviation

  • Variance:
  • \[ \begin{align*} &\sigma^2 = Var(X) = \sum_x (x - \mu)^2 P(X = x) \\ &\qquad = (0 - 7.2)^2 \cdot P(X = 0) + \cdots \\ &\qquad \qquad \qquad + (8 - 7.2)^2 \cdot P(X = 8) \\ &\qquad = 0.72 \end{align*} \]
  • Standard Deviation: \(\sigma = SD(X) = \sqrt{Var(X)}\)
  • \(\sigma = \sqrt{\sigma^2} = 0.85\)
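  • The same kind of sum gives the variance and standard deviation in R (a sketch, again reusing Paul_success_probs):
mu <- 7.2
sigma_sq <- sum((Paul_success_probs$num_successes - mu)^2 * Paul_success_probs$probability)
sigma_sq        # variance: 0.72
sqrt(sigma_sq)  # standard deviation: about 0.85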

Expected Value, Variance and SD of a Binomial

If \(X \sim \text{Binomial}(n, p)\), then:

  • \(E(X) = np\)
    • (Note, for Paul: \(8 \times 0.9 = 7.2\))
  • \(Var(X) = np(1 - p)\)
    • (Note, for Paul: \(8 \times 0.9 \times (1 - 0.9) = 0.72\))
  • \(SD(X) = \sqrt{np(1 - p)}\)

General Properties

  • If \(X\) and \(Y\) are random variables and \(a\) is a number, then
    • \(E(aX) = aE(X)\)
    • \(E(X \pm Y) = E(X) \pm E(Y)\)
  • If \(X\) and \(Y\) are independent random variables and \(a\) is a number, then
    • \(Var(aX) = a^2Var(X)\)
    • \(Var(X \pm Y) = Var(X) + Var(Y)\)
  • Note: this means standard deviations do not simply add, so things are a little more complicated:
    • \(SD(X \pm Y) = \sqrt{SD(X)^2 + SD(Y)^2}\)
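  • A quick simulation sketch of the variance property, using two independent binomial random variables (the particular choices of \(n\) and \(p\) are just for illustration):
x <- rbinom(n = 100000, size = 8, prob = 0.9)  # Var(X) = 8 * 0.9 * 0.1 = 0.72
y <- rbinom(n = 100000, size = 8, prob = 0.5)  # Var(Y) = 8 * 0.5 * 0.5 = 2, independent of X
var(x + y)                                     # should be close to 0.72 + 2 = 2.72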