- The sampling distribution of the sample mean is the distribution of its values across all possible samples of a given size
October 30, 2017
Suppose we want to estimate the proportion \(p\) of US households that own the home they live in.
We take a random sample of \(n\) households and let \(X\) be the number of households in the sample that own their home:
\[X \sim \text{Binomial}(n, p)\]
We estimate the probability of success \(p\) with the sample proportion \[\hat{p} = \frac{X}{n}\]
Remember that we can write \(X\) as a sum of independent Bernoulli(\(p\)) random variables: \(X = X_1 + X_2 + \cdots + X_n\), where \(X_i = 1\) if household \(i\) owns its home and \(X_i = 0\) otherwise.
So \(\hat{p} = \frac{X}{n} = \frac{1}{n} \sum_i X_i\) is a sample mean of independent Bernoulli random variables.
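To make the identity concrete, here is a minimal Python sketch that draws \(n\) independent Bernoulli values, sums them to get \(X\), and checks that \(X/n\) equals the mean of the individual draws. The function name `sample_proportion`, the seed, and the values `n=1000` and `p=0.64` are illustrative assumptions, not figures from the notes.

```python
import random

def sample_proportion(n, p, rng):
    """Draw n independent Bernoulli(p) values; return the draws, X, and p_hat."""
    # Each draw is 1 ("owns home") with probability p, otherwise 0.
    draws = [1 if rng.random() < p else 0 for _ in range(n)]
    x = sum(draws)      # X = X_1 + X_2 + ... + X_n
    p_hat = x / n       # sample proportion X / n
    return draws, x, p_hat

rng = random.Random(0)  # fixed seed so the run is repeatable
draws, x, p_hat = sample_proportion(n=1000, p=0.64, rng=rng)  # p is a made-up value

# The sample proportion is exactly the sample mean of the 0/1 draws.
mean_of_draws = sum(draws) / len(draws)
print(x, p_hat, mean_of_draws)
```

Because the draws are coded as 0 and 1, counting successes and averaging the draws are the same computation up to the factor \(1/n\).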
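Since \(\hat{p} = \frac{1}{n} \sum_i X_i\) is a sample mean, the Central Limit Theorem gives its approximate sampling distribution for large enough \(n\).
Each \(X_i\) has mean \(p\) and variance \(p(1 - p)\), so \(\hat{p}\) has mean \(p\) and standard deviation \(\sqrt{p(1 - p)/n}\). The CLT then says that for large enough \(n\), the sampling distribution of \(\hat{p}\) is approximately \[\hat{p} \sim \text{Normal}(p, \sqrt{p(1 - p)/n}),\] where the second parameter is the standard deviation.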
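One way to check the approximation is by simulation. The sketch below repeatedly draws samples of size \(n\), records \(\hat{p}\) for each, and compares the simulated mean and standard deviation with \(p\) and \(\sqrt{p(1 - p)/n}\). The helper `simulate_p_hats` and the values `n = 400`, `p = 0.64`, and `reps = 2000` are assumptions chosen for illustration only.

```python
import math
import random
import statistics

def simulate_p_hats(n, p, reps, seed=0):
    """Return reps simulated values of p_hat, each from a sample of size n."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        # X counts the successes among n independent Bernoulli(p) draws.
        x = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(x / n)
    return p_hats

n, p, reps = 400, 0.64, 2000   # illustrative values, not from the notes
p_hats = simulate_p_hats(n, p, reps)

clt_sd = math.sqrt(p * (1 - p) / n)    # standard deviation given by the CLT
sim_mean = statistics.mean(p_hats)     # should be close to p
sim_sd = statistics.stdev(p_hats)      # should be close to clt_sd

# Under a normal distribution, roughly 95% of values fall within 2 SDs of the mean.
within_2sd = sum(abs(ph - p) <= 2 * clt_sd for ph in p_hats) / reps

print(f"mean of p_hat: {sim_mean:.4f} (p = {p})")
print(f"sd of p_hat:   {sim_sd:.4f} (CLT value = {clt_sd:.4f})")
print(f"share within 2 sd of p: {within_2sd:.3f}")
```

Rerunning with a smaller `n`, or with `p` close to 0 or 1, should show the normal approximation getting worse, which is what "large enough \(n\)" is guarding against.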