---
title: "Experiments and Observational Studies"
author: "Evan L. Ray"
date: "October 6, 2017"
output: ioslides_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
# library() stops with an error if a package is missing; require() only warns
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
```
## Context
* Evaluating potential cost savings of housing programs:
> "We examined changes in service use in a Housing First (HF) pilot program for adults who were homeless with medical illnesses and high prior acute-care use relative to a similar comparison group. ... Reductions in estimated costs for participants and comparison group members were $62,504 and $25,925 per person per year-a difference of $36,579, far outweighing program costs of $18,600 per person per year. ... HF participants showed striking reductions in acute-care use relative to the comparison group, demonstrating that HF can be a successful model for people with complex medical conditions and high prior acute-care use."
Srebnik et al., American Journal of Public Health 103, no. 2 (2013).
* **What are the explanatory and response variables?**
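
The figures in the quote can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and "net savings" here simply means the between-group difference minus the quoted program cost, which is one reasonable reading of "far outweighing".

```r
# Per-person, per-year figures quoted from Srebnik et al. (2013)
reduction_hf         <- 62504  # cost reduction, Housing First participants
reduction_comparison <- 25925  # cost reduction, comparison group
program_cost         <- 18600  # cost of the Housing First program

# Difference in reductions between the two groups
difference <- reduction_hf - reduction_comparison

# Assumed definition: savings left over after paying for the program
net_savings <- difference - program_cost

difference   # 36579, matching the quoted difference
net_savings  # 17979
```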
## How Could We Study This?
There are two main types of studies:
1. **Observational** studies: The explanatory variable(s) are not manipulated or controlled by the researcher.
    * Subjects either end up in the Housing First program or not, for reasons outside of the researcher's control.
2. **Experiments**: The explanatory variable(s) are controlled by the researcher, who randomly assigns a value of the explanatory variable to each subject.
    * The researcher (randomly) determines who will be enrolled in the Housing First program.

* To demonstrate a **causal** relationship, we need to run an **experiment**.
## Factors and Treatments
* **Factor**: An explanatory variable whose levels are manipulated (i.e., assigned to the subjects/experimental units) by the researcher.
* In our example: Does the subject enroll in the Housing First program?
* There may be more than one factor in a given study.
* Note: This is different from R's use of the word "factor", which refers to any categorical variable.
* **Treatment**: The combination of factor levels a given subject or experimental unit is assigned to.
* In our example: "Assigned to Housing First" or "Not assigned to Housing First"
* If we had two factors, treatment would be the combined levels of those factors
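
With two factors, the treatments are all combinations of their levels. The sketch below enumerates them with `expand.grid()`. The second factor (`case_manager`) and its levels are hypothetical and are not part of the study.

```r
# Factor 1 comes from the example; factor 2 is hypothetical, for illustration only
treatments <- expand.grid(
  housing_first = c("assigned", "not assigned"),
  case_manager  = c("weekly visits", "monthly visits"),
  stringsAsFactors = FALSE
)

# Each row is one treatment: 2 levels x 2 levels = 4 treatments
treatments
nrow(treatments)
```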
## Confounding
* **Confounding**: When the levels of one factor (explanatory variable) are associated with the levels of another factor, we can't tell which one is responsible for changes in the response.
* **(Made Up) Example:**
* Costs to the public depend on whether a subject is enrolled in the Housing First program
* Costs to the public depend on medical history (e.g., maybe subjects with a history of substance abuse incur more costs to the public)
* Suppose the Housing First program doesn't accept subjects with a history of substance abuse
* An individual's **enrollment status in Housing First** and **history of substance abuse** would be confounded, so we would not be able to isolate their separate effects on costs to the public.
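
One way to spot this kind of confounding is to cross-tabulate the two explanatory variables. The data below are entirely made up to mirror the example. The empty cell shows that no subject has both a history of substance abuse and Housing First enrollment.

```r
# Made-up data: the program accepts no one with a history of substance abuse
subjects <- data.frame(
  substance_abuse = c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
  housing_first   = c("no",  "no",  "no",  "yes", "yes", "yes", "no", "no")
)

# Rows: substance abuse history; columns: Housing First enrollment
counts <- table(
  substance_abuse = subjects$substance_abuse,
  housing_first   = subjects$housing_first
)
counts

# No enrolled subjects have a history of substance abuse
counts["yes", "yes"]
```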
## Four Principles of Experimental Design
* **Control**: Control sources of variation other than the factors we are testing by making conditions as similar as possible for all treatment groups.
* Ensure that all conditions other than whether or not a subject is in Housing First are the same for all subjects.
* **Randomization**: Subjects/experimental units are assigned to treatments at random to equalize the effects of unknown or uncontrollable sources of variation.
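
Random assignment can be sketched in a few lines of R. The pool of 20 subjects, the equal group sizes, and the seed are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed so the assignment can be reproduced

subject_id <- 1:20  # hypothetical pool of 20 eligible subjects

# Shuffle a balanced vector of labels: 10 subjects per treatment
treatment <- sample(rep(c("Housing First", "Control"), each = 10))

assignment <- data.frame(subject_id, treatment)
table(assignment$treatment)
```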
## Four Principles of Experimental Design
* **Replication**: Each treatment is applied to more than one subject/experimental unit.
* **Blocking**: Group together subjects/experimental units that are similar in important ways that you cannot control, then randomize the assignment of treatments within each of these groups, or blocks.
* If we think a person's medical conditions are related to possible costs to the public, form a group of people with similar medical histories, then randomly assign some people in that group to Housing First and some to control.
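
Blocked randomization repeats the random assignment separately inside each block. In the sketch below, the blocks are hypothetical medical history categories with four made-up subjects each.

```r
set.seed(1)  # arbitrary seed so the assignment can be reproduced

# Hypothetical subjects, grouped into blocks by medical history
subjects <- data.frame(
  subject_id      = 1:12,
  medical_history = rep(c("cardiac", "respiratory", "diabetes"), each = 4)
)

# Randomize separately within each block
subjects$treatment <- NA_character_
for (block in unique(subjects$medical_history)) {
  in_block <- subjects$medical_history == block
  labels <- rep(c("Housing First", "Control"), length.out = sum(in_block))
  subjects$treatment[in_block] <- sample(labels)
}

# Every block contributes equally to both treatment groups
table(subjects$medical_history, subjects$treatment)
```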
## Other Terms
* **Matching**: In an observational study, participants who are similar in ways that are not directly being studied, but who have different levels of the explanatory variable(s) of interest, are *matched*, and the response is compared between the matched participants.
* If we think a person's medical conditions are related to possible costs to the public, find two people with similar medical histories: one in Housing First and one not in Housing First; compare costs for those subjects.
* **Blinding**: An individual involved in an experiment (a subject or a researcher) is **blinded** if they do not know which subjects/experimental units have been assigned to which treatments.
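
Returning to matching, the idea can be sketched by pairing subjects who share a medical history and comparing costs within each pair. All data here are made up, and matching on a single category is a simplification.

```r
# Made-up observational data: one row per subject
subjects <- data.frame(
  medical_history = c("cardiac", "cardiac", "diabetes", "diabetes"),
  housing_first   = c("yes", "no", "yes", "no"),
  cost            = c(30000, 45000, 20000, 26000)  # hypothetical annual costs
)

enrolled     <- subjects[subjects$housing_first == "yes", ]
not_enrolled <- subjects[subjects$housing_first == "no", ]

# Pair each enrolled subject with a non-enrolled subject who has the same history
pairs <- merge(enrolled, not_enrolled, by = "medical_history",
               suffixes = c("_hf", "_no_hf"))

# Compare the response within each matched pair
pairs$cost_difference <- pairs$cost_no_hf - pairs$cost_hf
pairs
```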
## A Final Thought
* **This study found preliminary evidence that a Housing First program targeted to "adults who were homeless with medical illnesses and high prior acute-care use" could result in cost savings to the public**.
* Is a statistical analysis of costs to the public a good way of justifying this program or arguing for its funding?
* Should we only provide housing to homeless adults who have sufficiently high medical costs that savings of at least $18,600 can be achieved?
* Before you do a study and run a fancy statistical analysis, make sure you're answering the right question.