---
title: "Experiments and Observational Studies"
author: "Evan L. Ray"
date: "October 6, 2017"
output: ioslides_presentation
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
# library() stops with an error if a package is missing;
# require() would only return FALSE with a warning.
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
```

## Context

* Evaluating potential cost savings of housing programs:

> "We examined changes in service use in a Housing First (HF) pilot program for adults who were homeless with medical illnesses and high prior acute-care use relative to a similar comparison group. ... Reductions in estimated costs for participants and comparison group members were $62,504 and $25,925 per person per year-a difference of $36,579, far outweighing program costs of $18,600 per person per year. ... HF participants showed striking reductions in acute-care use relative to the comparison group, demonstrating that HF can be a successful model for people with complex medical conditions and high prior acute-care use."

Srebnik et al., American Journal of Public Health 103, no. 2 (2013).

* **What are the explanatory and response variables?**

## How Could We Study This?

There are two main types of studies:

1. **Observational** studies: The explanatory variable(s) are not manipulated or controlled by the researcher.
    * Subjects either end up in the Housing First program or not, for reasons outside of the researcher's control.
2. **Experiments**: The explanatory variable(s) are controlled by the researcher (and the researcher randomly assigns the value of the explanatory variable to each subject).
    * The researcher (randomly) determines who will be enrolled in the Housing First program.

* To demonstrate a **causal** relationship, we need to run an **experiment**.

## Factors and Treatments

* **Factor**: An explanatory variable whose levels are manipulated (i.e., assigned to the subjects/experimental units) by the researcher.
    * In our example: Does the subject enroll in the Housing First program?
    * There may be more than one factor in a given study.
    * This is different from R's use of the word "factor" to refer to any categorical variable!
* **Treatment**: The combination of factor levels a given subject or experimental unit is assigned to.
    * In our example: "Assigned to Housing First" or "Not assigned to Housing First"
    * If we had two factors, the treatment would be the combined levels of those factors.

## Confounding

* **Confounding**: When the levels of one factor (explanatory variable) are associated with the levels of another factor, we can't tell which one is responsible for changes in the response.
* **(Made Up) Example:**
    * Costs to the public depend on whether a subject is enrolled in the Housing First program.
    * Costs to the public depend on medical history (e.g., maybe subjects with a history of substance abuse incur more costs to the public).
    * Suppose the Housing First program doesn't accept subjects with a history of substance abuse.
    * An individual's **enrollment status in Housing First** and history of **substance abuse** would be confounded, so we would not be able to isolate their effects on costs to the public.

## Four Principles of Experimental Design

* **Control**: Control sources of variation other than the factors we are testing by making conditions as similar as possible for all treatment groups.
    * Ensure that all conditions other than whether or not a subject is in Housing First are the same for all subjects.
* **Randomization**: Subjects/experimental units are assigned to treatments at random to equalize the effects of unknown or uncontrollable sources of variation.

## Four Principles of Experimental Design

* **Replication**: Each treatment is applied to more than one subject/experimental unit.
* **Blocking**: Group together subjects/experimental units that are similar in important ways that you cannot control, then randomize the assignment of treatments within each of these groups, or blocks.
    * If we think a person's medical conditions are related to possible costs to the public, form a group of people with similar medical histories, then randomly assign some to Housing First and some to control.

## Other Terms

* **Matching**: In an observational study, participants who are similar in ways that are not directly being studied, but who have different levels of the explanatory variables of interest, are *matched*, and the response is compared between these matched participants.
    * If we think a person's medical conditions are related to possible costs to the public, find two people with similar medical histories: one in Housing First and one not in Housing First; compare costs for those subjects.
* **Blinding**: Any individual involved in an experiment (including the subjects and the researchers) who does not know which subjects/experimental units have been assigned to which treatments is **blinded**.

## A Final Thought

* **This study found preliminary evidence that a Housing First program targeted to "adults who were homeless with medical illnesses and high prior acute-care use" could result in cost savings to the public**.
* Is a statistical analysis of costs to the public a good way of justifying this program or arguing for its funding?

## A Final Thought

* **This study found preliminary evidence that a Housing First program targeted to "adults who were homeless with medical illnesses and high prior acute-care use" could result in cost savings to the public**.
* Is a statistical analysis of costs to the public a good way of justifying this program or arguing for its funding?
* Should we only provide housing to homeless adults who have sufficiently high medical costs that savings of at least $18,600 can be achieved?

## A Final Thought

* **This study found preliminary evidence that a Housing First program targeted to "adults who were homeless with medical illnesses and high prior acute-care use" could result in cost savings to the public**.
* Is a statistical analysis of costs to the public a good way of justifying this program or arguing for its funding?
* Should we only provide housing to homeless adults who have sufficiently high medical costs that savings of at least $18,600 can be achieved?
* Before you do a study and run a fancy statistical analysis, make sure you're answering the right question.
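
## Sketch: Randomizing Within Blocks

The blocking and randomization principles from the experimental design slides can be sketched in a few lines of base R. Everything here is made up for illustration: the 12 subjects, the `history` categories used as blocks, the seed, and the helper name `assign_within_block` are assumptions, not part of the Srebnik et al. study (which was not a randomized experiment).

```{r blocked-randomization, echo=TRUE}
# Hypothetical roster: 12 made-up subjects, with a made-up medical-history
# category serving as the blocking variable.
set.seed(2017)  # arbitrary seed so the assignment is reproducible
subjects <- data.frame(
  id = 1:12,
  history = rep(c("cardiac", "diabetes", "respiratory"), each = 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a balanced vector of treatment labels for a
# block of size n, then shuffle it so assignment is random.
assign_within_block <- function(n) {
  labels <- rep(c("Housing First", "Control"), length.out = n)
  sample(labels)
}

# Blocking + randomization: randomize separately inside each block.
subjects$treatment <- NA_character_
for (block in unique(subjects$history)) {
  in_block <- subjects$history == block
  subjects$treatment[in_block] <- assign_within_block(sum(in_block))
}

# Replication check: count subjects per treatment within each block.
table(subjects$history, subjects$treatment)
```

Because labels are shuffled within each block of four, every block ends up with two subjects per treatment, so medical history cannot be confounded with treatment in this toy roster. In an observational study no such assignment step exists, which is why matching is used there instead.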