October 30, 2017

Sampling Distribution of Sample Mean

  • The sampling distribution is the distribution of values of the sample mean, across all different samples of a certain size \(n\).

Sampling Distribution Depends on \(n\)

  • Always centered at the population mean, but as \(n\) increases:
    • standard deviation gets smaller
    • distribution looks more normal
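
A minimal simulation sketch of the two bullets above. The Exponential(1) population (mean 1, standard deviation 1), the number of repetitions, the seed, and the function name `simulate_sample_means` are all illustrative assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size n from an assumed Exponential(1)
    population and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

# Center stays near the population mean (1); spread shrinks as n grows.
for n in (2, 10, 50):
    means = simulate_sample_means(n)
    center = statistics.fmean(means)
    spread = statistics.stdev(means)
    print(f"n={n:3d}  mean of sample means={center:.3f}  SD={spread:.3f}")
```

Plotting a histogram of `means` for each \(n\) would show the shape becoming more symmetric and bell-shaped as well.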

Sample Mean: Central Limit Theorem

  • \(Y_1, Y_2, \ldots, Y_n\) are independent observations of a quantitative variable
  • Population has mean \(\mu\) and standard deviation \(\sigma\)
  • Compute the sample mean: \(\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i\)
  • The sampling distribution of \(\bar{Y}\):
    • has mean \(\mu\)
    • has standard deviation \(\sigma/\sqrt{n}\)
    • for large enough \(n\), it is approximately normal
    • putting this together: the sampling distribution of \(\bar{Y}\) is approximately Normal(\(\mu\), \(\sigma/\sqrt{n}\)) for large enough \(n\).
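
As a worked sketch of how the approximation is used, suppose (hypothetical numbers, chosen only for illustration) that \(\mu = 50\), \(\sigma = 10\), and \(n = 25\). The helper name `sample_mean_distribution` is also an assumption.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Approximate sampling distribution of the sample mean under the CLT:
    Normal with mean mu and standard deviation sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))

# Hypothetical population values, for illustration only.
dist = sample_mean_distribution(mu=50, sigma=10, n=25)
print(dist.mean, dist.stdev)   # 50 and 10 / sqrt(25) = 2

# Approximate probability that the sample mean exceeds 53,
# i.e. 1.5 standard deviations above mu.
prob = 1 - dist.cdf(53)
print(round(prob, 4))          # about 0.067
```

The approximation is only as good as the "large enough \(n\)" condition: for a strongly skewed population, \(n = 25\) may not be enough.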

More on \(n\)

  • The sample size required for the sample mean to be approximately normally distributed depends on the shape of the population distribution
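
One way to see this, sketched under assumptions: compare the skewness of simulated sample means from a symmetric population (Uniform(0, 1)) and a skewed one (Exponential(1)). Both populations, the sample sizes, and the helper names are illustrative choices; skewness near 0 is used here as a rough stand-in for "looks normal".

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(draw, n, reps=4000, seed=0):
    """Skewness of `reps` simulated sample means, each from n draws.
    `draw` takes a random.Random instance and returns one observation."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

# Symmetric population: sample means look symmetric even at n = 5.
skew_uniform = sample_mean_skewness(lambda rng: rng.random(), n=5)
# Skewed population: still clearly skewed at n = 5, much less so at n = 100.
skew_expo = sample_mean_skewness(lambda rng: rng.expovariate(1.0), n=5)
skew_expo_large = sample_mean_skewness(lambda rng: rng.expovariate(1.0), n=100)

print(round(skew_uniform, 2), round(skew_expo, 2), round(skew_expo_large, 2))
```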

Estimating the Success Probability

  • Suppose we want to estimate the proportion \(p\) of US households who own the home they live in.

  • We take a sample of size \(n\) and count the number of households in our sample who own their home:

\[X \sim \text{Binomial}(n, p)\]

  • How can we estimate \(p\) using \(X\)?

Sampling distribution of \(\hat{p}\)

  • We will estimate the probability of success using \[\hat{p} = \frac{X}{n}\]

  • Remember that we can write \(X\) as a sum of independent Bernoulli random variables: \(X = X_1 + X_2 + \cdots + X_n\)

  • So \(\hat{p} = \frac{X}{n} = \frac{1}{n} \sum_i X_i\) is a sample mean of independent Bernoulli random variables

  • Since \(\hat{p} = \frac{1}{n} \sum_i X_i\), the Central Limit Theorem tells us the approximate sampling distribution of \(\hat{p}\), for large enough \(n\).

  • https://istats.shinyapps.io/SampDist_Prop/
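
A small sketch of the idea that \(\hat{p}\) is just a sample mean of 0/1 values. The true proportion `p_true = 0.65`, the sample size, and the function name are hypothetical and used only to generate illustrative data.

```python
import random

def p_hat_from_indicators(indicators):
    """Sample proportion as the mean of 0/1 Bernoulli outcomes."""
    return sum(indicators) / len(indicators)

rng = random.Random(1)
p_true = 0.65   # hypothetical value, for illustration only
n = 200

# Each X_i is 1 (success) with probability p_true, else 0.
sample = [1 if rng.random() < p_true else 0 for _ in range(n)]

x = sum(sample)                        # X = X_1 + ... + X_n, the success count
p_hat = p_hat_from_indicators(sample)  # same number as X / n
print(x, p_hat)
```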

Sampling distribution of \(\hat{p}\)

  • For a single Bernoulli random variable,
    • \(E(X_i) = p\)
    • \(SD(X_i) = \sqrt{p(1 - p)}\)
  • The CLT says that for large enough \(n\), the sampling distribution of \(\hat{p}\) is approximately \[\hat{p} \sim \text{Normal}(p, \sqrt{p(1 - p)/n})\]

  • For estimating a proportion/probability \(p\), we say \(n\) is large enough if the success-failure condition is satisfied:
    • \(np \geq 10\) and \(n(1 - p) \geq 10\)
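
The condition and the CLT standard deviation can be checked with a short simulation. This is a sketch under assumptions: the values \(n = 100\) and \(p = 0.3\), the number of repetitions, and all function names are illustrative.

```python
import random
import statistics
from math import sqrt

def success_failure_ok(n, p):
    """Success-failure condition: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def clt_sd(n, p):
    """Standard deviation of p-hat given by the CLT: sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)

def simulate_p_hats(n, p, reps=2000, seed=0):
    """Simulate `reps` sample proportions, each from n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 100, 0.3   # hypothetical values, for illustration only
p_hats = simulate_p_hats(n, p)

print(success_failure_ok(n, p))      # True: np = 30 and n(1 - p) = 70
print(success_failure_ok(20, 0.1))   # False: np = 2
print(round(clt_sd(n, p), 4), round(statistics.stdev(p_hats), 4))
```

The two numbers on the last line should be close when the condition holds. In practice \(p\) is unknown, so any check of this kind has to rely on a hypothesized or estimated value.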