Uncommon Ground

Author Archive: kent

Causal inference in ecology – The Rubin causal model (part 1)

Causal inference in ecology – links to the series

Last week I described an experiment that was reasonably well controlled. We1 randomized genotypes within populations across two experimental gardens to determine whether certain leaf traits changed with the age of the plant, the garden in which they were grown, or both. Interpreting the results of that experiment is reasonably straightforward. At the end, I was setting up for an exploration of what might be required to infer a causal influence of home environment on plant traits from an observed statistical association between the environment in the sites from which seeds were collected and the traits of the plants grown from those seeds in the gardens.

I’m going to shift gears, because I ran across a (non-ecological) example that should be easier to understand. Putting off discussion of Protea for another week also gives me more time to get a simplified version of the data prepared and analyzed (in an R notebook). So I’m going to examine a recent report that claims evidence for the “Scully effect”: the claim that young women who watched Dr. Dana Scully in the X Files have a more positive impression of STEM fields, are more likely to have pursued a career in STEM, and are more likely to see Scully as a role model than those who didn’t watch the X Files.2

I’ll focus on only the first claim. The report3 states it more precisely this way:

Women who are medium/heavy watchers of The X-Files hold more positive views of STEM than non/light watchers, and several survey questions link this directly to the influence of Scully’s character.

And I’ll focus on only one of the pieces of evidence in the report, namely

A greater percentage of medium/heavy viewers of The X-Files strongly believe that young women should be encouraged to study STEM than non/light viewers (56% compared to 47%).

Let’s not worry about statistics. What we have is a positive association between watching the X Files and encouraging young women to study STEM. Let’s assume that the reported difference in the sample is a reliable indication of a similar difference in the population as a whole.4 Can we conclude that watching the X Files caused that association?
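(If you want a rough sense of the statistics anyway, a two-sample test of proportions is the natural check. The report doesn’t give the group sizes for this comparison, so the sketch below in R assumes a purely hypothetical 1,000 women per group; only the 56% and 47% come from the report.)

    # Hypothetical check of the 56% vs. 47% difference. Group sizes are
    # invented for illustration; only the percentages come from the report.
    n <- c(watchers = 1000, non_watchers = 1000)
    x <- round(c(0.56, 0.47) * n)   # counts who "strongly believe"
    prop.test(x, n)                 # two-sided test of equal proportions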

The easy answer is “No”, since we all know that correlation does not imply causation. But let’s unpack that “No” and see why it holds. Maybe, just maybe, there’s a way we can infer causation from correlation, provided we make some additional assumptions.

One of the first observations I made in this series is that causes precede effects. That condition is clearly satisfied. The women included in the survey were chosen specifically to be old enough (a) to have watched the original X Files or the current seasons and (b) “to have entered the post-college workforce.” One reason we might be skeptical of the claim “Having watched Scully increases the probability that women believe young women should be encouraged to study STEM” is that women who watched Scully may differ from women who didn’t watch Scully in ways that would have predisposed them to encourage young women to study STEM. For example, it’s reasonable to think that women who watched Scully might have already had a greater interest in science than those who didn’t, since the X Files was a science fiction drama series, not a crime fiction or other drama series.

So to conclude that a positive association between watching Scully and encouraging young women to study STEM is evidence that watching Scully causes viewers to encourage young women to study STEM, we need to know that watching Scully or not is the only relevant difference between the women included in the observational sample. If it is, then even though we didn’t design the experiment and randomly assign women to one treatment or the other, it is as if we did. This is where Rubin’s causal model comes in.5 It helps us think about how we might be able to determine that the watching and non-watching groups are equivalent (or how we might make them essentially equivalent if they aren’t already). That’s where we’ll pick up next week.6
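As a preview of where we’re headed, the core of Rubin’s model is that each woman i has two potential outcomes: Y_i(1), whether she would encourage young women to study STEM if she watched Scully, and Y_i(0), the same outcome if she did not. This is the standard potential-outcomes notation, not anything taken from the report. The average causal effect of watching is then

    \mathrm{ATE} = \mathbb{E}[\,Y_i(1) - Y_i(0)\,]

The fundamental problem is that we only ever observe one of Y_i(1) and Y_i(0) for any given woman; the other is a counterfactual. That is exactly why the equivalence of the watching and non-watching groups matters so much.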

  1. Remember the disclaimer about “we”. “We” really means Jane Carlson, Ann Marie Gawel, and Rachel Prunier. My role was primarily to sit on the sidelines and provide a little advice. I recorded some of the data when it was dictated to me, but Jane, Ann Marie, and Rachel did all of the real work.
  2. In the interest of full disclosure, I should mention that I watched only a few episodes of the X Files. That won’t surprise anyone who knows me, because anyone who knows me knows that I watch very little TV.
  3. from the Geena Davis Institute on Gender in Media
  4. One challenge that is too often overlooked is determining what “the population as a whole” means. One of the most fundamental questions to answer in interpreting any experimental or observational result is “Over what population can the pattern I observed be generalized?” As a point of information, the authors of the report note that “All differences reported here are statistically significant at the .10 level.”
  5. Or at least this is where it comes in for me.
  6. Which also means that it will be at least two weeks before we get back to Protea.

Why didn’t I investigate R Notebooks sooner?

I don’t remember when I first heard about R Notebooks, but it was quite a while ago. I finally decided to investigate them last weekend, and I’m hooked. I expect to be doing much of my R work in R notebooks from now on. For purposes of reproducibility, I still plan to extract R code from what are referred to as “chunks” to produce standalone R scripts that I can rerun from the R console to verify my results, but the interactive notebooks will allow me to run chunks as I’m developing new code and to document what I’m doing.
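As it happens, knitr’s purl() function does exactly that extraction; here’s a minimal sketch (the file names are made up):

    library(knitr)
    # Pull the R code out of the notebook's chunks into a standalone
    # script that can be rerun from the console with source().
    purl("notebook.Rmd", output = "notebook.R")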

The link above will take you to documentation on R Notebooks. The only downside is that to use them you have to use RStudio. That’s not a big downside, since there is a free version available, but I’m still enough of an old fogey that my fingers are used to Emacs, and Emacs is where I generally prefer to edit code. It will take me a while to develop a new workflow, but you can be sure that it will include R Notebooks.

I put together a Randomization demo very quickly once I updated my version of RStudio. Producing HTML was as simple as saving my notebook to disk: the .nb.html file was produced automatically in the same directory, and all I had to do was upload it to an appropriate directory here. If you have a recent version of RStudio that supports R Notebooks, you can download the R Markdown code using the “Code” dropdown at the top right of the page. Simply open the .Rmd file, and RStudio will open it as a notebook. You can then execute the chunks yourself to redo the simulation in any way that you care to. That’s even easier than copying and pasting the code I posted earlier.
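If you haven’t seen one before, an R Notebook is just an R Markdown file whose header asks for html_notebook output. A minimal skeleton (contents invented for illustration) looks like this:

    ---
    title: "Randomization demo"
    output: html_notebook
    ---

    Explanatory text goes here, interleaved with code chunks.

    ```{r}
    # Chunks run interactively in RStudio; saving the notebook writes
    # the rendered .nb.html file alongside the .Rmd automatically.
    set.seed(123)
    hist(rnorm(1000))
    ```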

Causal inference in ecology – Setting the stage for the Rubin causal model

Causal inference in ecology – links to the series

If you’ve been following along, you realize by now that it’s not easy to infer the cause of a phenomenon, even in a well-controlled experiment. What about observational studies, which are all that many ecologists and evolutionary biologists have? Take one paper of mine that I’m reasonably happy with (PLoS One e52035; 2012. doi: 10.1371/journal.pone.0052035). One part of that paper included an experiment of sorts. We1 established experimental gardens at the Kirstenbosch National Botanical Garden and at a mid-elevation site on Jonaskop mountain about 100km due east of Kirstenbosch. Among other things, we were interested in how certain traits in newly formed leaves (specific leaf area, stomatal pore index, and leaf area) differed depending both on the age of the plant and on the garden in which the plants were grown.

This part of the paper is analogous to the thought experiment on corn that we’ve discussed so far. Since the plants were grown from wild-collected seed, we obviously couldn’t duplicate genotypes across gardens. We also didn’t have enough seed from individual maternal plants to replicate families across the gardens. So we did the best we could: we randomized seedlings within populations and split populations across gardens. You’ll see the results from this part of the analysis in the figure below.

Figure 4 from PLoS One e52035; 2012. doi: 10.1371/journal.pone.0052035

The trends with plant age are clear.2 Specific leaf area (SLA) declines with plant age, stomatal pore index increases with plant age, and leaf area increases with plant age. Given that the trends are consistent across species and gardens, I’m reasonably confident that plant age influences these traits in this group of Protea.3 Notice that I wrote “influences”, which is a short way (for me) of writing that plant age is a causal factor that influences the traits but that I am not claiming that it is the causal factor.4

Figure 3 from PLoS One e52035; 2012. doi: 10.1371/journal.pone.0052035

Similarly, the figure above makes it clear that the garden in which the plants are grown influences these (and other) traits. These results won’t surprise anyone who’s worked with plants. The traits plants have depend both on how old the plant is and on where it’s grown. So far, a reasonably straightforward experiment. But how about this? We also wanted to know whether the amount of change in leaf traits depended on measures of resource availability and rainfall seasonality in the places from which the seeds were collected. Here we’re asking a more complicated question.

Take SLA, for example, and imagine that we’re asking the question just about changes in SLA for plants grown at Jonaskop. Now the fully fleshed-out question is something like this:

  • I know that plants are often adapted to the local circumstances in which they are growing.
  • If SLA reflects plant characteristics that are important in local adaptation to nutrients or water availability, then plants that grow in places that differ in nutrients or water availability should also differ in SLA in ways that make them well-suited to the place where they occur.
  • Do we have evidence of an association between changes in SLA and nutrient availability or precipitation patterns in the sites from which the seeds were derived?

It’s that middle step that’s tricky. We don’t need to do anything special to run a regression on changes in SLA and home site characteristics, but to interpret that regression as evidence for the causal story in that middle step we need to do something more. Unlike the nicely randomized experiment with which we began this post, we aren’t randomizing plants across sites and allowing them to adapt. What we have is purely observational data to address this question. To what extent can we make a causal inference from these data? That’s the question I’ll turn to in the next installment.
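For concreteness, the regression itself is a one-liner in R. This is only a sketch: the data frame and variable names (jonaskop, delta_sla, and so on) are invented for illustration, not taken from the paper.

    # delta_sla: change in SLA between measurement years for each plant
    # site_nutrients, rain_seasonality: home-site environmental measures
    fit <- lm(delta_sla ~ site_nutrients + rain_seasonality, data = jonaskop)
    summary(fit)  # the association is the easy part; the causal reading isn't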

  1. By “we” I should be clear that Jane Carlson, Rachel Prunier, and Ann Marie Gawel collected all of the seeds that “we” used to establish the gardens, and they did all of the work of germinating seedlings and establishing the gardens. They also collected nearly all of the data. I helped collect a little, but my help mostly consisted of standing there with a clipboard and data sheet and writing down numbers in the appropriate columns.
  2. The same plants were measured in 2009 and 2010. In both cases, measurements were made on newly formed, but fully expanded leaves. I’m not reporting P-values, but you can find them in Table 1.
  3. The species are all members of a small, recently evolved monophyletic clade, Protea sect. Exsertae.
  4. Notice also that I am discounting the possibility that it is weather in the year the plants were growing that influences their traits rather than their age.

Honoring Ruth Millikan

Ruth Millikan is Emeritus Board of Trustees Distinguished Professor of Philosophy at UConn. Quoting from her web page, Ruth’s “research interests span many topics in the philosophy of biology, philosophy of mind, philosophy of language, and ontology.” She is a highly respected and influential philosopher. From her Wikipedia page:

She was awarded the Jean Nicod Prize and gave the Jean Nicod Lectures in Paris in 2002. She was elected to the American Academy of Arts and Sciences in 2014 and received, in 2017, both the Nicholas Rescher Prize for Systematic Philosophy from the University of Pittsburgh and the Rolf Schock Prize in Logic and Philosophy.

On April 30th I had the great honor of presenting a few remarks at an event held to celebrate Ruth’s contributions and to inaugurate the Ruth Garrett Millikan Endowment to support graduate students. Daniel Dennett was the featured speaker, and he highlighted Ruth’s contributions, focusing especially on one of her early books – Language, Thought, and Other Biological Categories – and her most recent one – Beyond Concepts. If you want to understand why her work is so important, you’ll need to read those books yourself. Her Wikipedia page provides only a very brief summary.

My comments focused on why graduate education, particularly PhD education, and financial support for graduate education is vital. On the off chance you’re interested in reading what I had to say, the full text of my remarks follows.


Causal inference in ecology – The challenge of falsification

Causal inference in ecology – links to the series

It sounds so simple. You have a hypothesis. You design an experiment to test it. If the predicted result doesn’t happen, reject the hypothesis and start over. That’s how science works, right? We can’t prove a hypothesis, but we can reject one. That’s how we make progress. That’s what makes science empirical. End of story, right? Would I be asking that question if it were?

Let’s look at the logic a bit more carefully.

The hypothesis we’ve been using as an example is simple: If we apply nitrogen fertilizer, the yield of corn will increase. Our experiment is to till the soil in a field thoroughly, plant genetically uniform1 corn, and apply fertilizer on one part of the field and not the other. The test of our hypothesis is whether yield in the fertilized part of the field exceeds yield in the unfertilized part of the field. For the sake of argument, let’s suppose that the yield in the fertilized part of the field is the same as (or less than) the yield in the unfertilized part. Would you conclude that adding nitrogen fertilizer doesn’t increase corn yield? I wouldn’t, and I’ll bet you wouldn’t either. Why wouldn’t we conclude that? My logic would run like this:

  • I’m aware of a lot of other experiments, including some I’ve run myself, where adding nitrogen fertilizer to corn (and to other plants for that matter) increases yield.2 There must have been something wrong with the experimental conditions.
  • The experimental conditions include everything about the experiment.
    • It could be that I didn’t do a good job of tilling the field and mixing the soil. Maybe the part of the field that I left unfertilized happened to have much higher soil fertility, more than enough to compensate for the added nitrogen in the part of the field with lower fertility. Maybe the part of the field I fertilized happened to have minerals in the soil that immediately bound the nitrogen so that it wasn’t available to the plants.
    • It could be that there was something wrong with the fertilizer. Maybe it was a bad batch and for some reason the nitrogen wasn’t in a form that’s available for plants.
    • Maybe I didn’t do a good job of randomizing the genetic background, and I happened to have families of low-yield plants in the nitrogen fertilizer treatment.
    • Maybe I put on so much nitrogen that I “burned” the corn.

The bottom line is that there are a lot of ways the experiment could have gone wrong. When an experiment fails to give the result we predicted, our natural tendency is to reject the hypothesis we were testing. But strictly speaking, we don’t know whether our hypothesis is wrong or whether something about our experimental conditions made the experiment a bad test of the hypothesis.

In short, falsifying a hypothesis is hard, and we can never be certain that it’s false. It’s only by assessing the reasonableness of the experimental conditions that we can determine whether it’s our hypothesis or the experimental conditions that are faulty.

To my mind this is why we trust causal inferences from carefully controlled experiments more than those from observational studies. In a carefully controlled experiment, we make everything about the treatment and control as similar as possible, except for the difference in treatment. That way, if we see a treatment effect, we have a lot more confidence in ascribing the result to the treatment rather than to something else, and we have a lot more confidence in saying that the treatment has no effect (and that our hypothesis is falsified) if we fail to observe the expected result.

Next time we’ll talk about how to apply similar logic to observational studies and explore the challenge of making causal inferences from them.

  1. Or genetically randomized
  2. To be honest, I’ve never run such an experiment with corn, but I’ve run crude, unintentional experiments on a lot of plants I grow in my yard. I forget to fertilize some, and the difference is obvious.

Causal inference in ecology – no post this week

Last week was finals week at UConn, which means Commencement weekend began on Saturday. I represented the Provost’s Office at the PharmD ceremony at 9:00am Saturday morning, and our Master’s Commencement Ceremony was held at 1:30pm. I had yesterday off, but because of all the things that accumulated last week, I didn’t have time to write the next installment of this series. It will return next Monday, barring unforeseen complications.

In the meantime, this page contains links to the posts that have appeared so far. It also contains links to some related posts that you might find interesting. You may also have noticed the “Causal inference in ecology” label at the top of this page. That’s a link to the same page of posts in case you want to find it again.

Causal inference in ecology – Randomization and sample size

Causal inference in ecology – links to the series

Last week I explored the logic behind controlled experiments and why they are typically regarded as the gold standard for identifying and measuring causal effects.1 Let me tie that post and the preceding one on counterfactuals together before we proceed with the next idea. To make things as concrete as possible, let’s return to our hypothetical example of determining whether applying nitrogen fertilizer increases the yield of corn. We do so by

  • Randomly assigning individual corn plants to different plots within a field.
  • Applying nitrogen fertilizer to some plots, the treatment plots, and not to others, the control plots.
  • Determining whether the yield in treatment plots exceeds that in the control plots.

Where do counterfactuals come in? If the yield of treatment plots exceeds that of control plots, aren’t we done? Well, not quite. You see, the plants that were in the treatment plots are different individuals from those in the control plots. To infer that nitrogen fertilizer increases yield, we have to extrapolate the results from the treatment plots to the control plots. We have to be willing to conclude that the yield in the control plots would have been greater if we had applied nitrogen fertilizer there. That’s the counterfactual: we are asserting what would have happened if the facts had been different. In practice, we don’t usually worry about this step in the logic, because we presume that our random assignment of corn plants to different plots means that the plants in the two plots are essentially equivalent. As I pointed out last time, that inference depends on having done the randomization well and having a reasonably large sample.

Let’s assume that we’ve done the randomization well, say by using a pseudorandom number generator in our computer to assign individual plants to the different plots. But let’s also assume that there is genetic variation among our corn plants that influences yield. To make things really simple, let’s assume that there’s a single locus with two alleles associated with yield differences, that high yield is dominant to low yield, and that the two alleles are in equal frequency, so that 75% of the individuals are high yield and 25% are low yield. Let’s further assume that high yield plants produce 1kg of corn (sd=0.1kg) and that low yield plants produce 0.5kg of corn (sd=0.1kg).2 Finally, let’s assume that applying nitrogen fertilizer has absolutely no effect on yield. Then a simple simulation in R produces the following results:3

Sample size:  5 
          lo:  133 
          hi:  140 
     neither:  9727 
Sample size:  10 
          lo:  201 
          hi:  175 
     neither:  9624 
Sample size:  20 
          lo:  255 
          hi:  217 
     neither:  9528 
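The actual code is on the linked page, but here is a minimal sketch of the kind of simulation described above (the function and variable names are mine):

    # One simulated experiment: n plants per plot, genotypes drawn at random
    # (75% high yield, 25% low yield), and no true fertilizer effect.
    sim_once <- function(n) {
      yield <- function(n) {
        hi <- runif(n) < 0.75                     # high-yield genotype?
        rnorm(n, mean = ifelse(hi, 1.0, 0.5), sd = 0.1)
      }
      trt <- yield(n)                             # "fertilized" plot
      ctl <- yield(n)                             # control plot
      p <- t.test(trt, ctl)$p.value               # two-sided t-test
      if (p >= 0.05) "neither" else if (mean(trt) < mean(ctl)) "lo" else "hi"
    }

    set.seed(1234)
    for (n in c(5, 10, 20)) {
      res <- replicate(10000, sim_once(n))
      cat("Sample size: ", n,
          "\n         lo: ", sum(res == "lo"),
          "\n         hi: ", sum(res == "hi"),
          "\n    neither: ", sum(res == "neither"), "\n", sep = "")
    }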

What you can see from these results is that I was only half right. You need to do the randomization well,4 but your sample size doesn’t need to be all that big to ensure that you get reasonable results. Keep in mind that “reasonable results” here means that (a) you reject the null hypothesis of no difference in yield about 5% of the time and (b) you reject it either way at about the same frequency.5 There are, however, other reasons that you want to have reasonable sample sizes. Refer to the posts linked to on the Causal inference in ecology page for more information about that.

With counterfactuals, controlled experiments, and randomization out of the way, our next stop will be the challenge of falsification.

  1. I didn’t discuss the “and measuring” part last week, only the “identifying” part. We’ll return to measuring causal effects later in this series after we’ve explored issues associated with identifying causal effects (or exhausted ourselves trying).
  2. That corresponds to an effect size of 0.2 standard deviations.
  3. Click through to the next page to see the R code.
  4. OK, you can’t see that you need to do the randomization well, but I did it well and it worked, so why not do it well and be safe?
  5. Since I used a two-sided t-test with a 5% significance threshold, this is just what you should expect.


Causal inference in ecology – Controlled experiments

Causal inference in ecology – links to the series

Randomized controlled experiments are generally regarded as the gold standard for identifying a causal factor.1 Let’s describe a really simple one first. Then we’ll explore why they’re regarded as the gold standard.

Picking up with the example I used last time, let’s suppose we’re trying to test the hypothesis that applying nitrogen fertilizer increases the yield of corn.2 As I pointed out, in setting up our experiment, we’d seek to control for every variable that could influence corn yield so that we can isolate the effect of nitrogen. In the simplest possible case, we’d have two adjacent plots in a field that have been plowed and tilled thoroughly so that the soil in the two plots is completely mixed and indistinguishable in every way – same content of nitrogen, phosphorus, other macronutrients, and other micronutrients; same soil texture; same percent of (the same kind of) soil organic matter; same composition of clay, silt, and sand; everything.3 We’d also have plants that were genetically uniform (or as genetically uniform as we can make them), either highly inbred lines or an F1 cross between two highly inbred lines. We’d make sure the field was level, maybe using high-tech laser leveling devices, and we’d make sure that every plant in the entire field received the same amount of water. Since we know that the microclimate at the perimeter of the field differs from that in the middle, we’d make the field big enough that we could focus our measurements on a part of the field isolated from these edge effects. Then we’d randomly choose one side of the field to be the “low N” treatment and the other to be the “high N” treatment.4 After allowing the plants to grow for an appropriate amount of time, we’d harvest them, dry them, and weigh them.

Our hypothesis has the form

If N is applied to a corn field, then the yield will be greater than if it had not been applied.

Notice that we can’t both apply N and not apply N to the same set of plants. We have to compare what happens when we apply N to one set of plants and don’t apply it to another. If we find that the “high N” plants have a greater yield than the “low N” plants, we infer that the “low N” plants would also have had a greater yield if we had applied N to them (which we didn’t). Why is that justified? Because everything about the two treatments is identical, by design, except for the amount of N applied. If there’s a difference in yield, it can only be attributed to something that differs between the treatments, and the only thing that differs is the amount of N applied.

I can hear you thinking, “Couldn’t the difference just be due to chance?” Well, yes it could. If we do a statistical test and demonstrate that the yields are statistically distinguishable, that increases our confidence that the difference in yield is real, but nothing can ever make the conclusion logically certain5 in the way we can be logically certain that 2+2=4. To my mind there are two things that make us accept the outcome of this experiment as evidence that applying N increases corn yield:

  1. It’s not just this experiment. If the same experiment is repeated in different places with different soil types, different corn genotypes, and different weather patterns, we get the same result. We can never be certain, but the consistency of that result increases our confidence that the association isn’t just a fluke.
  2. What we understand about plant growth and physiology leads us to expect that providing nitrogen in fertilizer should enhance plant growth. In other words, this particular hypothesis is part of a larger theoretical framework about plant physiology and development. That framework provides a coherent and repeatable set of predictions across a wide empirical domain.

Put those two together, and we have good reason for thinking that the observed association between N fertilizer and corn yield is actually a causal association.

In experiments where we can’t completely control all relevant variables except the one that we’re interested in, we rely on randomization. Suppose, for example, we couldn’t produce genetically uniform corn. Then we’d randomize the assignment of individuals to the “high” and “low” treatments. The results aren’t quite as solid as if we’d had complete uniformity. It’s always possible that by some statistical fluke a factor we aren’t measuring ends up overrepresented in one treatment and underrepresented in the other, but if we’ve randomized well and we have a reasonably large sample, the chances are small. So our inference isn’t quite as firm, but it’s still pretty good.
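Doing the randomization itself is easy with a pseudorandom number generator; here’s a toy sketch in R (the plant IDs are invented):

    # Randomly assign 40 plants to a balanced two-treatment design.
    set.seed(42)                              # make the assignment reproducible
    plants    <- sprintf("plant_%02d", 1:40)  # hypothetical plant IDs
    treatment <- sample(rep(c("high N", "low N"), each = 20))
    head(data.frame(plants, treatment))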

We’ll explore the “reasonably large sample” question in the next installment.

  1. See, for example, Rubin (Annals of Applied Statistics 2:808-840; 2008. https://projecteuclid.org/euclid.aoas/1223908042)
  2. If you know me or my work, you know that I’m not at all crazy about the null hypothesis testing approach to investigating ecology. We’ll get to that later, but let’s start with a simple case. Even those of us who don’t like null hypothesis testing as a general approach recognize that it has value. We’ll focus on one way in which it has value here.
  3. If we were really fastidious we might even set up the experiment in a large growth chamber in which we mixed the soil together and distributed it evenly ourselves.
  4. If we were really paranoid about controlling for all possible factors, we’d even randomly assign a nitrogen fertilizer level (high or low) to every different plant in the field, and we’d probably do the whole experiment in a very large growth chamber where we could mix the soil ourselves and ensure that light, humidity, and temperature were as uniform as possible across all individuals in the experiment.
  5. If you don’t see why, Google “problem of induction” and you’ll get some idea. If that doesn’t satisfy you, ask, and I’ll see what I can do to provide an explanation.

Made it to Brisbane

It’s among the longest stretches of (planned) travel that I’ve done.1 I

  • left Hartford at 11:45am EDT on Thursday, April 19
  • arrived in Cincinnati at 1:53pm,
  • left Cincinnati at 2:45pm,
  • arrived in Los Angeles at 4:45pm PDT,
  • left Los Angeles at 10:30pm, and
  • arrived in Brisbane at 5:30am Australian Eastern time on Saturday, April 21.

There’s a 14-hour time difference between Brisbane and Hartford. That makes the total travel time 27 hours, 45 minutes gate to gate. I arrived at my hotel about 2 1/2 hours ago. Remarkably, they had a room they could give me, even though the official check-in time isn’t until 2:00pm. It’s a very comfortable room in what appears to be a very nice part of the city. I don’t have any meetings until tomorrow. Once I’ve finished up a couple of things I want to do, I’m going to put on some comfortable shoes and go for a walk around town with my camera. First stop, the City Botanic Gardens. I’ll miss the farmer’s market, which is held tomorrow, and I’m not sure what I’ll visit after the botanic gardens, but I’m going to keep going all day. If I can go to bed at something resembling a normal time, there’s a good chance I’ll escape the worst effects of jet lag tomorrow.

The photo is the view from my hotel room.

  1. A few years ago it took me 3 1/2 days to get home from Cape Town. I was stranded in Amsterdam for 2 nights. Yes, I mean stranded. The first night I was stuck in the airport. The second night I was at an airport hotel, but I didn’t get there until 3 in the afternoon – too late to go into the city and enjoy anything.