A couple of years ago I wrote a series of posts on causal inference in ecology. In it I explored the Rubin causal model and concluded that
the Rubin causal model isn’t likely to help me make causal inferences with the kinds of observational data I collect.
I haven’t changed my mind about that, but I do have an update.
I’ve been reading Regression and Other Stories, by Andrew Gelman, Jennifer Hill, and Aki Vehtari, which I highly recommend if you use regression for any purpose in your research. I just finished Chapter 21, “Additional topics in causal inference”, and its last section, 21.5 “Causes of effects and effects of causes”, is particularly relevant to my earlier conclusion. Not surprisingly, Gelman, Hill, and Vehtari (GHV) explain the role that regression can play in generating hypotheses better than I did. You’ll need to read the chapters on causal inference (or be familiar with the Rubin causal model) to fully appreciate their insight, but here it is in a nutshell.
We can make inferences about the effect of a cause when we (a) identify an intervention (a cause) that may have an effect and (b) randomize the intervention across experimental units (or do something that mimics random assignment, such as balancing on potential pre-observation confounders or using an instrumental-variable, regression-discontinuity, or difference-in-differences approach). Viewed this way, the purpose of the statistical analysis is to estimate the magnitude of the effect.
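To make the "effect of a cause" framing concrete, here is a minimal simulation sketch in Python (standard library only). Everything in it is made up for illustration: each hypothetical unit has two potential outcomes, `y0` if untreated and `y1` if treated, with an assumed constant effect of 2.0, and a coin flip decides which one we get to observe. Because assignment is random, the simple difference in group means estimates the size of the effect. This is not an example from GHV, just an illustration of the idea.

```python
import random
import statistics


def simulate_experiment(n=10_000, true_effect=2.0, seed=1):
    """Hypothetical randomized experiment under the Rubin causal model.

    Each unit has two potential outcomes, y0 (untreated) and y1 (treated),
    but we observe only the one selected by random assignment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)  # unit-to-unit variation (arbitrary scale)
        y0 = baseline
        y1 = baseline + true_effect      # constant additive effect, by assumption
        if rng.random() < 0.5:           # randomize the intervention
            treated.append(y1)
        else:
            control.append(y0)
    # With random assignment, the difference in means estimates the average effect.
    return statistics.fmean(treated) - statistics.fmean(control)


estimate = simulate_experiment()
print(f"estimated effect: {estimate:.2f} (true effect: 2.0)")
```

The analysis step is trivial here on purpose: randomization does the heavy lifting, and the statistics only estimate how big the effect is.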
The regression analyses I typically do can be cast as an attempt to make inferences about the cause of an effect.¹ Here’s where GHV have a better way of thinking about it than I did. Suppose I’m interested in the environmental features that influence stomatal density, the example I discussed on 11 June 2018. There I showed that three principal components describing aspects of the environment are strongly associated with stomatal density. GHV remind us that some other variable (or set of variables) could cause the observed differences in stomatal density, in which case, once we’ve taken that variable into account, none of the PCs would show an association with stomatal density.² More importantly, they point out that the association suggests causal hypotheses that could account for it. To the extent that it’s important to us to dissect those causes, we can then do new experiments or make new observations (using Rubin’s causal model as a framework if we’re going to make causal inferences from an observational study) designed to estimate the effects those hypotheses suggest.
1. I wrote “can be cast as an attempt” because I do my damnedest to make it clear that I’m asserting only that certain variables have stronger associations with the outcome I’m studying than others do, not that those variables cause the outcome.
2. Fortunately (for me), that’s consistent with what I wrote two years ago.