Causal inference in ecology – links to the series
Last week I concluded that the Rubin causal model isn’t likely to help me make causal inferences with the kinds of observational data I collect. I also argued that
It does, however, illuminate the ways in which additional data from different systems could be combined (informally) with the data I collect1 to make plausible causal inferences.
From the one data set I analyzed last week, I concluded that we could see an association between rainfall and stomata density in Protea sect. Exsertae but that we couldn’t claim (on the basis of this evidence alone) that the differences in rainfall caused differences in stomata density. Why do I claim that “additional data from different systems [can] be combined (informally) with [these] data to make plausible causal inferences”? Here’s why.
Think back to when we discussed controlled experiments. I pointed out that by randomizing individuals across treatments we statistically control for the chance that there’s some unmeasured factor that influences the results. It’s not as good as a perfectly controlled experiment in which the individuals are identical in every way except for the one factor whose causal influence we are trying to estimate, but it’s pretty good. Well, if we have a lot of observations from different systems – different taxa, different ecosystems, different climates – and we get higher stomata densities in areas with more annual rainfall, as we did in Protea sect. Exsertae, we also know that these other systems differ from Protea sect. Exsertae in many different ways in addition to those having to do with annual rainfall. That’s not as good as randomization, but it suggests that the association we saw in that small group of plants in the Cape Floristic Region is similar to associations elsewhere. That means the association is stable across a broader range of taxa or ecosystems or climates, or all three than our limited data showed, suggesting that there is a causal relationship.
Now it still doesn’t show that it’s mean annual rainfall, per se, that matters. It could still be something that’s associated with mean annual rainfall not only in the CFR but also in the other systems we studied. If we happened to find that the association always held, that it was never violated in any system we still couldn’t exclude the possibility that the “true” causal factor was this other thing we aren’t measuring, but it begins to become a bit implausible – rather like claiming that it’s not smoking that causes cancer, it’s something else that’s associated with smoking that causes cancer.2
This kind of argument doesn’t produce logical certainty, but re-read the post on falsification and you’ll see that even if a well-controlled experiment fails to give the results predicted by a hypothesis, it is very difficult to be sure that it’s the hypothesis that’s wrong. It may be that the experimental conditions don’t match those presumed by the hypothesis, in which case we can’t say anything about the truth or falsity of the hypothesis. In other words, even the classical hypothesis test can’t reject a hypothesis with certainty. There’s always judgment involved. It can’t be escaped.
Bottom line: If you’re willing to reject a hypothesis based on a failed experiment because you’re willing to examine all of the factors influencing the experimental conditions and conclude that none of them are the problem,3 you should be as willing to use evidence from a range of associational studies combined with some theory (whether a formal mathematical model or verbal description of the mechanics of a system) to build a case for a causal relationship from observational data. In neither case will you be certain of your conclusions. Your conclusions will merely be more or less plausible depending on how much and how strong your evidence is.
As scientists,4 we are more like detectives than logicians. We build cases. We don’t build syllogisms.
- Remember what I wrote in that last footnote. ↩
- You could argue that if the two factors, the “true” causal factor and the one we measure, are invariably connected that there is really only one factor. That’s a longer philosophical discussion that I don’t have the energy to get into – at least not now. ↩
- Notice that reaching this conclusion depends on your background knowledge about the system and its components, i.e., prior knowledge, not observations from the experiment itself. ↩
- Or at least as ecologists and evolutionists. ↩