Causal inference in ecology – links to the series
Last week I described a straightforward example of why inferring a causal relationship from an observed association can be problematic. The authors of the study on the “Scully effect” are mostly pretty careful to write things like “regular viewers of The X-Files have far more positive beliefs about STEM than other women in the sample” rather than claiming that viewing of the X Files caused women to have more positive beliefs about STEM. In the end, though, they can’t help themselves:
The findings of this study confirm what previous research has established, that entertainment media is influential in shaping life choices.
As I pointed out last time, in order to make that claim from these data, we’d need to know that there wasn’t already a difference between women in the sample that caused women with positive beliefs about STEM to watch the X Files more often than other women.
So let’s suppose that, in addition to asking women in their sample (a) whether they had watched the X Files and (b) whether they had positive beliefs about STEM, the authors had also asked them (c) how many science and math courses they took during junior high and high school. Then a statistical model describing the data they collected would look like this:
\(y_i = \alpha_{\mathrm{treat}[i]} + \beta x_i\)

where \(y_i\) is a measure of positive belief for individual \(i\),¹ \(\alpha_{\mathrm{treat}[i]}\) is an intercept whose subscript indicates whether or not the individual was part of the treatment (watching the X Files),² \(\beta\) is a regression coefficient indicating the amount that taking one science or math course affects the measure of positive belief, and \(x_i\) is the number of science or math courses that individual \(i\) took. If \(\alpha_t > \alpha_c\), then we have some evidence that watching the X Files causally contributes to more positive impressions of STEM in women.³
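To make the model concrete, here is a minimal sketch in Python (NumPy) of fitting it by ordinary least squares. The data are simulated, not taken from the Scully effect study. The function name `fit_ancova`, the sample size, and the "true" values used to generate the data are all assumptions chosen for illustration. The simulation also adds a normally distributed noise term, which the equation above leaves implicit.

```python
import numpy as np

def fit_ancova(y, treated, x):
    """Least-squares fit of y_i = alpha_{treat[i]} + beta * x_i.

    Returns (alpha_c, alpha_t, beta).
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    x = np.asarray(x, dtype=float)
    # One intercept column per group plus one shared slope column.
    design = np.column_stack([~treated, treated, x]).astype(float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha_c, alpha_t, beta = coef
    return alpha_c, alpha_t, beta

# Simulated data only: every number below is made up for illustration.
rng = np.random.default_rng(0)
n = 500
x = rng.integers(0, 9, size=n)    # courses taken, 0 to 8
treated = rng.random(n) < 0.5     # hypothetical "watched the show" flag
# Assumed truth: alpha_c = 1.0, alpha_t = 2.0, beta = 0.3, noise sd = 0.5.
y = np.where(treated, 2.0, 1.0) + 0.3 * x + rng.normal(0.0, 0.5, size=n)

alpha_c, alpha_t, beta = fit_ancova(y, treated, x)
print(f"alpha_c={alpha_c:.2f}, alpha_t={alpha_t:.2f}, beta={beta:.2f}")
print(f"estimated treatment difference: {alpha_t - alpha_c:.2f}")
```

In this simulation treatment is assigned independently of the number of courses, so the estimated difference between the two intercepts should land near the assumed value of 1.0. With observational data there is no such guarantee.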
This approach only works, though, if the range in number of science courses taken by the two groups of women is roughly the same. If all of the women who watched the X Files took more science courses than any of the women who didn’t, we couldn’t tell whether the difference in their positive impressions was due to watching the X Files or to taking more science courses (or to the personality traits that caused them to take more science courses).
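A quick way to check for this problem is to look at the range of course counts the two groups have in common. The sketch below is only a crude range check under my own assumptions: `common_support` is a hypothetical helper and the course counts are invented. Overlapping ranges are necessary for the comparison to make sense, but they do not show that the groups are well balanced.

```python
import numpy as np

def common_support(x_treated, x_control):
    """Return the (low, high) range of x shared by both groups, or None."""
    low = float(max(np.min(x_treated), np.min(x_control)))
    high = float(min(np.max(x_treated), np.max(x_control)))
    return (low, high) if low <= high else None

# Invented course counts for illustration.
overlapping = common_support([2, 4, 5, 7], [1, 3, 4, 6])
separated = common_support([6, 7, 8], [1, 2, 3])

print("overlapping groups:", overlapping)
print("separated groups:", separated)
```

The second call corresponds to the case described above, where every viewer took more courses than any non-viewer: there is no shared range, so the two explanations cannot be separated.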
That’s the basic idea behind the Rubin causal model: identify all of the factors that might reasonably influence the outcome of interest, include those factors in an analysis of covariance (or something similar), and infer a causal effect of the difference between two groups if (i) there’s an effect of the grouping variable after controlling for all of the other factors and (ii) the groups broadly overlap on the other potential causal factors. The degree to which you can be confident in your causal inference depends (a) on how well you’ve done at identifying and measuring plausible causal factors and (b) on how closely your two groups are matched on those other causal factors. Matching here plays the same conceptual role as randomization in a controlled experiment.
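As a rough illustration of what matching on a covariate can mean in practice, the sketch below compares the two groups only within strata that share the same number of courses, then averages the within-stratum differences. This is exact stratification on a single covariate, a deliberately simple stand-in and not the full Rubin framework. The function name `matched_difference`, the weighting by number of treated individuals, and the data are all assumptions for illustration.

```python
import numpy as np

def matched_difference(y, treated, x):
    """Average treated-minus-control difference in y within strata of equal x.

    Strata lacking either group are skipped. Each remaining stratum is
    weighted by its number of treated individuals. Returns None when no
    stratum contains both groups.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    x = np.asarray(x)
    diffs, weights = [], []
    for value in np.unique(x):
        in_stratum = x == value
        y_t = y[in_stratum & treated]
        y_c = y[in_stratum & ~treated]
        if len(y_t) == 0 or len(y_c) == 0:
            continue  # no counterpart in the other group
        diffs.append(y_t.mean() - y_c.mean())
        weights.append(len(y_t))
    if not diffs:
        return None
    return float(np.average(diffs, weights=weights))

# Invented data: the individual with x == 2 has no control counterpart.
y = [1.0, 2.0, 3.0, 5.0, 10.0]
treated = [False, True, False, True, True]
x = [0, 0, 1, 1, 2]
estimate = matched_difference(y, treated, x)
print("matched estimate:", estimate)
```

Dropping unmatched strata is the code-level version of requiring overlap: individuals with no comparable counterpart in the other group contribute nothing to the estimate.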
1. Where I assume that larger values correspond to more positive beliefs.
2. Notice that the subscript on \(\alpha\) will only take two values. I’ll denote them \(\alpha_c\) and \(\alpha_t\) for “control” and “treatment”, respectively.
3. Provided we’re willing to extrapolate from our sample to women in general, or at least to women in the US.