Suppose your friend claims to be able to tell the difference between the tastes of Coke and Pepsi. Within your groups, design an experiment to test this claim. You should answer the following:
SOLUTION:
What are the experimental units?
What aspects of the experiment (if any) will you control?
What aspects of the experiment (if any) will you randomize?
What aspects of the experiment (if any) will you replicate?
What aspects of the experiment (if any) will you block?
Will you use any sort of blinding? (Why?)
SOLUTION:
SOLUTION:
SOLUTION:
Let’s suppose that you decided to run a fairly simple experiment, as follows: You prepare 5 cups of soda; each cup is an experimental unit. For each cup, you randomly decide whether that cup has Coke or Pepsi in it by flipping a coin. You control the temperature of the sodas so that both Coke and Pepsi are the same temperature; you control the cups, using the same cups for both sodas; and so on. You give the cups to your friend one at a time and ask her to determine whether each cup has Coke or Pepsi in it. You record the number of cups where your friend made a correct determination of the soda type.
Suppose your friend correctly identifies the soda on four out of five trials. That’s pretty good, but is it enough to conclude that she can tell the difference between Coke and Pepsi?
Let’s examine this using a kind of reverse psychology: What if your friend were not able to distinguish between Coke and Pepsi? How likely would she be to guess correctly at least four times out of five? We will say that there is strong evidence that she can distinguish between Coke and Pepsi if she would be unlikely to guess correctly at least 4 times by random guessing.
Our exploration will have three phases. The first two will be familiar to most of you already, since we did something very similar on the first day of class:
1. A physical simulation flipping coins
2. A simulation in R
3. A more formal mathematical analysis
The key question here is to determine what results would occur in the long run under the assumption that your friend is not able to tell the difference between Coke and Pepsi. (We will call this assumption the null model or null hypothesis.) We will answer this question by simulating the process of determining soda types in five trials over and over, assuming that each guess is random and has a 50/50 chance of being correct.
R
The coin-flipping process got us an approximation to the sampling distribution of the total number of correct guesses if the null hypothesis is true, but our approximation would be improved by repeating the simulation process thousands of times. Let’s use R.
The command to flip a coin in R is called rbinom. The following command flips a coin 5 times and counts the number of heads; try it out:
rbinom(n = 1, size = 5, prob = 0.5)
## [1] 5
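To see what a single rbinom draw represents, here is a minimal sketch that simulates the five guesses one at a time, assuming each guess is independently correct with probability 0.5 (the null model). The names guesses and num_correct_one_run are illustrative and are not part of the lab's code.

```r
# One simulated run of the experiment under the null model:
# each of the 5 guesses is "C" (correct) or "I" (incorrect) with equal chance.
guesses <- sample(c("C", "I"), size = 5, replace = TRUE)

# Count the correct guesses; this plays the same role as one rbinom() draw.
num_correct_one_run <- sum(guesses == "C")

guesses
num_correct_one_run
```

Because the guesses are random, the output changes each time the chunk is run.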
Just as in the Capture-Recapture lab, we can repeat this simulation process 1000 times as follows:
simulation_results <- do(1000) * {
data.frame(num_correct = rbinom(n = 1, size = 5, prob = 0.5))
}
Run the above chunk. You should see a new data frame created in your Global Environment, with 1000 rows (one for each simulation) and a variable called num_correct with the number of correct guesses in each simulation run.
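The do() function used above comes from the mosaic package. If that package is not loaded, a rough base R equivalent can be sketched with replicate(); the data frame name sim_base is an assumption chosen for illustration.

```r
# Base R alternative to do(1000) * {...}:
# repeat the 5-guess simulation 1000 times and store the results in a data frame.
sim_base <- data.frame(
  num_correct = replicate(1000, rbinom(n = 1, size = 5, prob = 0.5))
)

nrow(sim_base)   # one row per simulated run
head(sim_base)   # first few simulated counts of correct guesses
```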
Make a histogram of the num_correct variable in the simulation_results data frame. Use a binwidth = 1 argument.
ggplot() +
geom_histogram(aes(x = num_correct), binwidth = 1, data = simulation_results)
The following command calculates the number of simulation runs that resulted in a num_correct of 0, 1, 2, and so on:
table(simulation_results$num_correct)
##
## 0 1 2 3 4 5
## 43 159 302 314 156 26
The proportion of simulation runs in which your friend guessed correctly at least 4 times is called an approximate p-value. A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. A small p-value casts doubt on the null model/hypothesis used to perform the calculation (in this case, that your friend cannot tell the difference between Coke and Pepsi).
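Using the counts in the table above, 156 + 26 = 182 of the 1000 runs had at least 4 correct guesses, which gives an approximate p-value of 0.182. The sketch below shows one way to do the same calculation in R. It generates its own simulated values in base R so that it runs on its own, so its result will differ slightly from 0.182 and will change from run to run; the names num_correct and approx_p_value are illustrative.

```r
# Simulate 1000 runs of 5 guesses under the null model (base R, no extra packages).
num_correct <- rbinom(n = 1000, size = 5, prob = 0.5)

# Proportion of runs with at least 4 correct guesses: the approximate p-value.
approx_p_value <- mean(num_correct >= 4)
approx_p_value

# The same calculation from the table of counts shown above.
(156 + 26) / 1000
```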
SOLUTION:
SOLUTION:
SOLUTION:
Repeating the simulation 1000 times is better than just a few times, but the approximation to the sampling distribution we got above by doing the simulation 1000 times is still not perfect. We can do better. Our ultimate goal here is to calculate the exact p-value. That is, if the null hypothesis were true and our friend could not distinguish between Coke and Pepsi, what are the chances that she would guess correctly at least 4 times? The next few questions will lead you through an exact calculation of this probability.
The first step is to think about all possible outcomes of the experiment. In our case, we can represent a single outcome by a sequence of five letters of the form “CCIIC”, where a C means that the guess was Correct, and an I means the guess was Incorrect. For example, the sequence “CCIIC” means that the first two guesses were correct, the third and fourth guesses were incorrect, and the fifth guess was correct.
The sample space is the set of all possible outcomes of a random phenomenon. We use the capital letter S to denote the sample space.
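As a check on a hand-written list, the sample space for five guesses can also be generated by machine. This is a sketch using base R's expand.grid(); the names outcomes and S are chosen here to match the notation above and are otherwise illustrative.

```r
# All combinations of C (correct) and I (incorrect) across five guesses.
outcomes <- expand.grid(rep(list(c("C", "I")), 5), stringsAsFactors = FALSE)

# Collapse each row into a five-letter string such as "CCIIC".
S <- apply(outcomes, 1, paste, collapse = "")

length(S)   # 2^5 = 32 outcomes
head(S)
```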
SOLUTION:
SOLUTION:
A =
B =
\[P(A) = \frac{\text{Number of outcomes in the event A}}{\text{Number of outcomes in the sample space, S}}\]
Note that in our example, if your friend is actually just guessing randomly, then all of the outcomes you wrote down in part i) are equally likely. Use this to find the probabilities of the events A and B we defined in part j).
SOLUTION:
\[P(A \text{ or } B) = P(A) + P(B)\]
Use this formula, in combination with your answers from part k), to calculate the probability that your friend would guess correctly at least 4 times out of five. This is the exact p-value.
SOLUTION:
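One way to check an exact answer is sketched below, assuming all 32 outcomes are equally likely under the null model. It counts the outcomes with exactly 4 and exactly 5 correct guesses using choose(), adds the two probabilities, and cross-checks the result against the binomial probabilities from dbinom(). The variable names are illustrative.

```r
n_outcomes  <- 2^5            # size of the sample space
n_exactly_4 <- choose(5, 4)   # outcomes with exactly 4 correct guesses: 5
n_exactly_5 <- choose(5, 5)   # outcomes with exactly 5 correct guesses: 1

# The two events share no outcomes, so their probabilities add.
exact_p_value <- n_exactly_4 / n_outcomes + n_exactly_5 / n_outcomes
exact_p_value                 # 6/32 = 0.1875

# Cross-check with the binomial distribution.
sum(dbinom(4:5, size = 5, prob = 0.5))
```

The exact value, 0.1875, is close to the approximate p-value of 0.182 obtained from the 1000 simulation runs tabulated earlier.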
The process we followed in this lab to analyze the strength of evidence from the experiment had four steps:
All of these steps are important, but the conceptual foundation of this approach is in step 3. As we’ve seen here, calculating probabilities of events directly can be tedious – and our experimental setup was pretty simple! We’ll spend the next week or two looking at rules for how to do these calculations and make them less tedious.