# Academics, biodiversity, genetics, & evolution

## You really need to check your statistical models, not just fit them

I haven’t had a chance to read the paper I mention below yet, but it looks like a very good guide to model checking – a step that is too often forgotten. It doesn’t do us much good to estimate parameters of a statistical model that doesn’t do well at fitting the data we have. That’s what model checking is all about. In a Bayesian context, posterior predictive model checking is particularly useful.1 If the parameters and the model you used to estimate them can’t reproduce the data you collected reasonably well, the model isn’t doing a good job of fitting the data, and you shouldn’t trust the parameter estimates.
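To make the idea concrete, here is a minimal sketch of a posterior predictive check in plain Python. Everything in it is an assumption for illustration: the counts are made up, the model is a simple Poisson with a conjugate Gamma prior (so posterior draws need no MCMC), and the sample variance is the discrepancy statistic. It is not the workflow in Stan or bayesplot, just the logic behind it.

```python
import math
import random
import statistics


def draw_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def posterior_predictive_p(counts, n_draws=2000, prior_shape=0.01, prior_rate=0.01, seed=1):
    """Share of replicated data sets whose variance is at least the observed variance."""
    rng = random.Random(seed)
    observed_stat = statistics.variance(counts)
    # Conjugate update: Gamma(shape, rate) prior + Poisson likelihood -> Gamma posterior.
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + len(counts)
    exceed = 0
    for _ in range(n_draws):
        rate = rng.gammavariate(post_shape, 1.0 / post_rate)  # second argument is a scale
        replicate = [draw_poisson(rate, rng) for _ in counts]
        if statistics.variance(replicate) >= observed_stat:
            exceed += 1
    return exceed / n_draws


# Hypothetical, deliberately overdispersed counts (not from any real study).
observed_counts = [0, 0, 1, 0, 12, 0, 3, 0, 0, 15, 1, 0, 9, 0, 0, 2, 0, 20, 0, 1]
print(posterior_predictive_p(observed_counts))
```

A value very close to 0 or 1 says that data simulated from the fitted model rarely look like the data in hand on that statistic. For these counts the Poisson model cannot reproduce the observed variance, so its estimate of the rate shouldn't be trusted.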

If you happen to be using Stan (via rstan) or rstanarm, posterior predictive model checking is either immediately available (rstanarm) or easy to make available (rstan) in Shinystan. It’s built on the functions in bayesplot, which provides the underlying functions for posterior prediction for virtually any package (provided you coerce the result into the right format). I’ve been using bayesplot lately, because it integrates nicely with R Notebooks, meaning that I can keep a record of my model checking in the same place that I’m developing and refining the code that I’m working on.

Here’s the title, abstract, and a link:

A guide to Bayesian model checking for ecologists

Paul B. Conn, Devin S. Johnson, Perry J. Williams, Sharon R. Melin, Mevin B. Hooten

Ecological Monographs doi: 10.1002/ecm.1314

Checking that models adequately represent data is an essential component of applied statistical inference. Ecologists increasingly use hierarchical Bayesian statistical models in their research. The appeal of this modeling paradigm is undeniable, as researchers can build and fit models that embody complex ecological processes while simultaneously accounting for observation error. However, ecologists tend to be less focused on checking model assumptions and assessing potential lack of fit when applying Bayesian methods than when applying more traditional modes of inference such as maximum likelihood. There are also multiple ways of assessing the fit of Bayesian models, each of which has strengths and weaknesses. For instance, Bayesian P values are relatively easy to compute, but are well known to be conservative, producing P values biased toward 0.5. Alternatively, lesser known approaches to model checking, such as prior predictive checks, cross‐validation probability integral transforms, and pivot discrepancy measures may produce more accurate characterizations of goodness‐of‐fit but are not as well known to ecologists. In addition, a suite of visual and targeted diagnostics can be used to examine violations of different model assumptions and lack of fit at different levels of the modeling hierarchy, and to check for residual temporal or spatial autocorrelation. In this review, we synthesize existing literature to guide ecologists through the many available options for Bayesian model checking. We illustrate methods and procedures with several ecological case studies including (1) analysis of simulated spatiotemporal count data, (2) N‐mixture models for estimating abundance of sea otters from an aircraft, and (3) hidden Markov modeling to describe attendance patterns of California sea lion mothers on a rookery. 
We find that commonly used procedures based on posterior predictive P values detect extreme model inadequacy, but often do not detect more subtle cases of lack of fit. Tests based on cross‐validation and pivot discrepancy measures (including the “sampled predictive P value”) appear to be better suited to model checking and to have better overall statistical performance. We conclude that model checking is necessary to ensure that scientific inference is well founded. As an essential component of scientific discovery, it should accompany most Bayesian analyses presented in the literature.

1. Andrew Gelman introduced the idea more than 20 years ago (link), but it’s only really caught on since his Stan group made some general purpose packages available that simplify the process of producing the predictions. (See the next paragraph for references.)

## Alan Gelfand on the history of MCMC and the future of statistics (in a world of data science)

I am fortunate to have known Alan Gelfand for a couple of decades. I first met him in the late 1990s when I walked over to the Math/Science building to talk with him about some problems I was having in my early exploration of Bayesian inference for F-statistics. I was using BUGS (this was pre-WinBUGS), but it was the modeling I needed some advice on. I didn’t realize until a couple of years later that Alan was the Gelfand of Gelfand and Smith, “Sampling-Based Approaches to Calculating Marginal Densities” (Journal of the American Statistical Association 85:398-409; 1990 – doi: 10.1080/01621459.1990.10476213)  and Gelfand et al. “Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling” (Journal of the American Statistical Association 85:972-985; 1990 – doi: 10.1080/01621459.1990.10474968). Fortunately, Alan is too nice to have pointed out how naive I was. He simply gave me a lot of help. I haven’t seen him as often since he moved to Duke, but our paths still cross every year or two, because he and John Silander continue to collaborate on various problems in community ecology.

Alan was a keynote speaker at the Statistics in Ecology and Environmental Monitoring Conference in Queenstown, NZ last December, and David Warton posted a YouTube interview on the Methods Blog of the British Ecological Society. Alan describes the early history of MCMC, mentions his concern about the emergence of “data science”, and talks about what excites him most now – applying statistics to difficult problems in ecology and environmental science.

## Trait-environment relationships in Pelargonium

Almost 15 years ago Wright et al. (Nature 428:821–827; 2004 – doi: 10.1038/nature02403) described the worldwide leaf economics spectrum, “a universal spectrum of leaf economics consisting of key chemical, structural and physiological properties.” Since then, an enormous number of articles have been published that examine or refer to it – more than 4000 according to Google Scholar. In the past few years, many authors have pointed out that it may not be as universal as originally presumed. For example, in Mitchell et al. (The American Naturalist 185:525-537; 2015 – http://www.jstor.org/stable/10.1086/680051) we found a negative relationship between an important component of the leaf economics spectrum (leaf mass per area) and mean annual temperature in Pelargonium from the Cape Floristic Region of southwestern South Africa, while the global pattern is for a positive relationship.1

Now Tim Moore and several of my colleagues follow up with a more detailed analysis of trait-environment relationships in Pelargonium. They demonstrate several ways in which the global pattern breaks down in South African samples of this genus. Here’s the abstract and a link to the paper.

• Functional traits in closely related lineages are expected to vary similarly along common environmental gradients as a result of shared evolutionary and biogeographic history, or legacy effects, and as a result of biophysical tradeoffs in construction. We test these predictions in Pelargonium, a relatively recent evolutionary radiation.
• Bayesian phylogenetic mixed effects models assessed, at the subclade level, associations between plant height, leaf area, leaf nitrogen content and leaf mass per area (LMA), and five environmental variables capturing temperature and rainfall gradients across the Greater Cape Floristic Region of South Africa. Trait–trait integration was assessed via pairwise correlations within subclades.
• Of 20 trait–environment associations, 17 differed among subclades. Signs of regression coefficients diverged for height, leaf area and leaf nitrogen content, but not for LMA. Subclades also differed in trait–trait relationships and these differences were modulated by rainfall seasonality. Leave‐one‐out cross‐validation revealed that whether trait variation was better predicted by environmental predictors or trait–trait integration depended on the clade and trait in question.
• Legacy signals in trait–environment and trait–trait relationships were apparently lost during the earliest diversification of Pelargonium, but then retained during subsequent subclade evolution. Overall, we demonstrate that global‐scale patterns are poor predictors of patterns of trait variation at finer geographic and taxonomic scales.

doi.org/10.1111/nph.15196

1. If you read The American Naturalist paper, you’ll see that we wrote in the Discussion that “We could not detect a relationship between LMA and MAT in Protea….” I wouldn’t write it that way now. Look at Table 2. You’ll see that the posterior mean for the relationship is 0.135 with a 95% credible interval of (-0.078, 0.340). I would now write that “We detected a weakly supported positive relationship between LMA and MAT….” Why the difference? I’ve taken to heart Andrew Gelman’s observation that “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant” (blog post; article in The American Statistician). I am training myself to pay less attention to which coefficients in a regression are significant and which aren’t, and more to reporting the best guess we have about each relationship (the posterior means) and the amount of confidence we have about them (the credible intervals). I recently learned about hypothesis() in brms, which will provide an estimate of the posterior probability that you’ve got the sign of the relationship right. I need to investigate that. I suspect that’s what I’ll be using in the future.
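As a rough illustration of what a "posterior probability of the sign" looks like, here is a small sketch that works only from the summary reported above. It assumes the posterior is approximately normal and recovers a standard deviation from the width of the 95% credible interval. That is an approximation I am imposing, not what brms does (it works from the posterior draws), and the reported interval is not quite symmetric about the mean.

```python
from statistics import NormalDist


def prob_positive(post_mean, lower, upper, level=0.95):
    """Approximate P(coefficient > 0), assuming a normal posterior."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    post_sd = (upper - lower) / (2 * z)         # interval width -> standard deviation
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)


# Posterior mean and 95% credible interval for LMA vs. MAT quoted in the footnote.
print(round(prob_positive(0.135, -0.078, 0.340), 2))
```

Under that approximation the probability that the relationship is positive comes out at roughly 0.9, which fits "weakly supported positive relationship" much better than "could not detect a relationship."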

## Causal inference in ecology – Concluding thoughts

Causal inference in ecology – links to the series

Last week I concluded that the Rubin causal model isn’t likely to help me make causal inferences with the kinds of observational data I collect. I also argued that

It does, however, illuminate the ways in which additional data from different systems could be combined (informally) with the data I collect1 to make plausible causal inferences.

From the one data set I analyzed last week, I concluded that we could see an association between rainfall and stomata density in Protea sect. Exsertae but that we couldn’t claim (on the basis of this evidence alone) that the differences in rainfall caused differences in stomata density. Why do I claim that “additional data from different systems [can] be combined (informally) with [these] data to make plausible causal inferences”? Here’s why.

Think back to when we discussed controlled experiments. I pointed out that by randomizing individuals across treatments we statistically control for the chance that there’s some unmeasured factor that influences the results. It’s not as good as a perfectly controlled experiment in which the individuals are identical in every way except for the one factor whose causal influence we are trying to estimate, but it’s pretty good. Well, if we have a lot of observations from different systems – different taxa, different ecosystems, different climates – and we get higher stomata densities in areas with more annual rainfall, as we did in Protea sect. Exsertae, we also know that these other systems differ from Protea sect. Exsertae in many different ways in addition to those having to do with annual rainfall. That’s not as good as randomization, but it suggests that the association we saw in that small group of plants in the Cape Floristic Region is similar to associations elsewhere. That means the association is stable across a broader range of taxa or ecosystems or climates, or all three, than our limited data showed, suggesting that there is a causal relationship.

Now it still doesn’t show that it’s mean annual rainfall, per se, that matters. It could still be something that’s associated with mean annual rainfall not only in the CFR but also in the other systems we studied. If we happened to find that the association always held, that it was never violated in any system, we still couldn’t exclude the possibility that the “true” causal factor was this other thing we aren’t measuring, but it begins to become a bit implausible – rather like claiming that it’s not smoking that causes cancer, it’s something else that’s associated with smoking that causes cancer.2

This kind of argument doesn’t produce logical certainty, but re-read the post on falsification and you’ll see that even if a well-controlled experiment fails to give the results predicted by a hypothesis, it is very difficult to be sure that it’s the hypothesis that’s wrong. It may be that the experimental conditions don’t match those presumed by the hypothesis, in which case we can’t say anything about the truth or falsity of the hypothesis. In other words, even the classical hypothesis test can’t reject a hypothesis with certainty. There’s always judgment involved. It can’t be escaped.

Bottom line: If you’re willing to reject a hypothesis based on a failed experiment because you’re willing to examine all of the factors influencing the experimental conditions and conclude that none of them are the problem,3 you should be as willing to use evidence from a range of associational studies combined with some theory (whether a formal mathematical model or verbal description of the mechanics of a system) to build a case for a causal relationship from observational data. In neither case will you be certain of your conclusions. Your conclusions will merely be more or less plausible depending on how much and how strong your evidence is.

As scientists,4 we are more like detectives than logicians. We build cases. We don’t build syllogisms.

1. Remember what I wrote in that last footnote.
2. You could argue that if the two factors, the “true” causal factor and the one we measure, are invariably connected that there is really only one factor. That’s a longer philosophical discussion that I don’t have the energy to get into – at least not now.
3. Notice that reaching this conclusion depends on your background knowledge about the system and its components, i.e., prior knowledge, not observations from the experiment itself.
4. Or at least as ecologists and evolutionists.

## Causal inference in ecology – The Rubin causal model in ecology

Causal inference in ecology – links to the series

Evaluating the claim that viewing of the X Files caused women to have more positive beliefs about science illustrated how the Rubin causal model can be used to make causal inferences from observational data. The basic idea is that you make the observational sample similar to a randomized experiment by using statistical adjustments to make the “treatment” and “control” conditions as similar as possible – except for the “treatment” difference.1 Several weeks ago, I promised to describe how we might use the Rubin causal model in ecology, drawing on data from a paper in PLoS One that I’m reasonably happy with. After playing with that data a bit, I changed gears. I’m going to use data from a more recent paper (Carlson et al., Annals of Botany 117:195-207; 2016 – doi: https://dx.doi.org/10.1093/aob/mcv146).

I’ll focus on a subset of the data that explores the relationship between stomatal density of Protea repens seedlings grown in an experimental garden at Kirstenbosch National Botanical Garden and three principal components associated with the environment in the populations from which seed was collected. You’ll find the details of the analysis, an R notebook, and the data in Github. The HTML produced by the R notebook showing the results is at http://darwin.eeb.uconn.edu/pages/Protea-causal-analysis.nb.html. To run the analyses from the code you can download there, you’ll need to retrieve the CSV from Github: https://github.com/kholsinger/Protea-causal-analysis/blob/master/traits-environment-pca.csv.

Here’s the bottom line. If we run a simple regression (treating year of observation as a random effect), we get the following results for the regression coefficients:

| | Mean | 2.5%tile | 97.5%tile |
|---|---|---|---|
| PCA 1 (annual temperature) | 2.422 | 1.597 | 3.216 |
| PCA 2 (summer rainfall) | -2.125 | -2.980 | -1.277 |
| PCA 3 (annual rainfall) | 1.317 | 0.538 | 2.099 |

All three principal components are strongly associated with stomatal density. We’ve all been told repeatedly that “correlation does not equal causation,” but it’s still very tempting to conclude that warmer climates favor higher stomatal densities (PCA 1), more summer rainfall favors lower stomatal densities (PCA 2), and more annual rainfall favors higher stomatal densities (PCA 3). Given what I wrote last week about the Rubin causal model, we might even feel justified in reaching this conclusion, since we’ve statistically controlled for relevant differences among populations (other than those that we measured). But go back and read that post again, and pay particular attention to this sentence:

The degree to which you can be confident in your causal inference depends (a) on how well you’ve done at identifying and measuring plausible causal factors and (b) how closely your two groups are matched on those other causal factors.

Notice (a) in particular. We have good evidence for the associations noted above,2 but the principal components we identified were based on only 7 environmental descriptors, six from the South African Atlas of Agrohydrology and Climatology and elevation (from a NASA digital elevation model). There could easily be other environmental factors correlated with one (or all) of the principal components we identified that drive the association we observe. Now if similar associations had been observed in worldwide datasets involving many different groups of plants, it might not be unreasonable to conclude that there is a causal relationship between the principal components we analyzed and stomatal density, but that conclusion wouldn’t be based solely on the data and analysis here. It would depend on seeing the same pattern repeatedly in different contexts, which gives us something analogous to haphazard (not random) assignment to experimental conditions.

There is, however, a further caveat.

In Carlson et al., we obtained the following results for the mean and 95% credible interval on the association between stomatal density and each of the three principal component axes:

| | Mean | 2.5%tile | 97.5%tile |
|---|---|---|---|
| PCA 1 (annual temperature) | 0.258 | 0.077 | 0.441 |
| PCA 2 (summer rainfall) | -0.216 | -0.394 | -0.040 |
| PCA 3 (annual rainfall) | 0.155 | -0.043 | 0.349 |

Don’t worry about the difference in magnitude of the coefficients. In Carlson et al. we transformed the response variables to a mean of 0 and a standard deviation of 1 before the analysis. Focus on the credible intervals. Here the credible interval for PCA 3 overlaps zero. In a conventional interpretation, we’d say that we don’t have evidence for a relationship between annual rainfall and stomatal density.3 I’d prefer to say that the relationship with annual rainfall appears to be positive, but the evidence is weaker than for the relationships with annual temperature or summer rainfall. However you say it though, there seems to be a difference in the results. Why would that be?

Because in Carlson et al. we analyzed stomatal density as one of a suite of leaf traits (length-width ratio, stomatal density, stomatal pore index, specific leaf area, and leaf area) that are correlated with one another. In particular, leaf area and stomatal density are associated with one another, perhaps because of the way that leaves develop. Leaf area is associated with annual rainfall. Thus, the association between leaf area and stomatal density intensifies the observed relationship between annual rainfall and stomatal density.
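Here is a small simulation of that mechanism, offered as a sketch and not as the model fitted in Carlson et al. All of the numbers are invented: rainfall has a modest direct association with stomatal density (0.2), rainfall also shifts leaf area (0.8), and leaf area is itself associated with stomatal density (0.5). Regressing density on rainfall alone picks up both paths. Holding leaf area fixed (done here by regressing residuals on residuals) leaves only the direct one.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - mean_y - b * (xi - mean_x) for xi, yi in zip(x, y)]


def simulate(n=5000, direct=0.2, via_leaf_area=0.5, seed=42):
    """Return (slope ignoring leaf area, slope holding leaf area fixed)."""
    rng = random.Random(seed)
    rainfall = [rng.gauss(0, 1) for _ in range(n)]
    leaf_area = [0.8 * r + rng.gauss(0, 1) for r in rainfall]
    density = [direct * r + via_leaf_area * a + rng.gauss(0, 1)
               for r, a in zip(rainfall, leaf_area)]
    total = slope(rainfall, density)
    # Partial slope: remove leaf area from both variables, then regress.
    partial = slope(residuals(leaf_area, rainfall), residuals(leaf_area, density))
    return total, partial


total, partial = simulate()
print(f"rainfall alone: {total:.2f}; holding leaf area fixed: {partial:.2f}")
```

With these made-up coefficients the first slope should land near 0.2 + 0.8 × 0.5 = 0.6 and the second near 0.2, which is the same kind of shrinkage seen for PCA 3 between the two tables.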

In short, we should modify that sentence from last week to add a condition (c):

The degree to which you can be confident in your causal inference depends (a) on how well you’ve done at identifying and measuring plausible causal factors, (b) how closely your two groups are matched on those other causal factors, and (c) whether or not your response variable is associated with something else (measured or not) that is influenced by the causal factors you’re studying.

Bottom line: For the types of observations I make4 the Rubin causal model doesn’t seem likely to help me make causal inferences. It does, however, illuminate the ways in which additional data from different systems could be combined (informally) with the data I collect5 to make plausible causal inferences. At least they should be plausible enough to motivate careful experimental or observational tests of those inferences (if the causal processes are interesting enough to warrant those tests).

1. Implementing this approach in analysis of a real data set can become very complicated. There’s a large literature on the Rubin causal model in social science. I’ve read almost none of it. What I’ve learned about the Rubin causal model comes from reading Gelman and Hill’s regression modeling book and from reading Imbens and Rubin.
2. That’s overstating it a bit. See the discussion that follows this paragraph.
3. There are serious problems with this kind of interpretation. See Andrew Gelman’s post explaining why “the difference between ‘significant’ and ‘not significant’ is not itself statistically significant.”
4. Remember, when I write “I make” I really mean “my students, postdocs, and collaborators make.” I just follow along and help with the statistics.
5. Remember what I wrote in that last footnote.

## Causal inference in ecology – The Rubin causal model (part 2)

Causal inference in ecology – links to the series

Last week I described a straightforward example of why inferring a causal relationship from an observed association can be problematic. The authors of the study on the “Scully effect” are mostly pretty careful to write things like “regular viewers of The X-Files have far more positive beliefs about STEM than other women in the sample” rather than claiming that viewing of the X Files caused women to have more positive beliefs about STEM. In the end, though, they can’t help themselves:

The findings of this study confirm what previous research has established, that entertainment media is influential in shaping life choices.

As I pointed out last time, in order to make that claim from these data, we’d need to know that there wasn’t already a difference between women in the sample that caused women with positive beliefs about STEM to watch the X Files more often than other women.

So let’s suppose that in addition to asking women in their sample (a) whether they had watched the X Files and (b) whether they had positive beliefs about STEM they had also asked them (c) how many courses in science and math they took during junior high and high school. Then a statistical model describing the data they collected would look like this:

$$y_i = \alpha_{\mathrm{treat}[i]} + \beta x_i$$

where $y_i$ is a measure of positive belief for individual $i$,1 $\alpha_{\mathrm{treat}[i]}$ is an intercept whose subscript indicates whether or not the individual was part of the treatment (watching the X Files),2 $\beta$ is a regression coefficient indicating the amount that taking one science or math course affects the measure of positive belief, and $x_i$ is the number of science or math courses that individual $i$ took. If $\alpha_t > \alpha_c$, then we have some evidence that watching the X Files causally contributes to more positive impressions of STEM in women.3

This approach only works, though, if the range in number of science courses taken by the two groups of women is roughly the same. If all of the women who watched the X Files took more science courses than any of the women who didn’t, we couldn’t tell whether the difference in their positive impressions was due to watching the X Files or to taking more science courses (or to the personality traits that caused them to take more science courses).

That’s the basic idea behind the Rubin causal model: Identify all of the factors that might reasonably influence the outcome of interest, include those factors in an analysis of covariance (or something similar), and infer a causal effect of the difference between two groups if there’s an effect of the grouping variable after controlling for all of the other factors and if the groups broadly overlap on other potential causal factors. The degree to which you can be confident in your causal inference depends (a) on how well you’ve done at identifying and measuring plausible causal factors and (b) how closely your two groups are matched on those other causal factors. Matching here plays the same conceptual role as randomization in a controlled experiment.
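The matching idea can be sketched with a toy simulation. Nothing here comes from the Scully report: the sample is simulated, the "true" effect of watching (0.5) and the effect of each course (1.0) are invented, and women who took more courses are simply made more likely to have watched. The naive comparison mixes the two effects. Comparing watchers and non-watchers only among women with the same number of courses, and only where both groups are present, separates them.

```python
import random
from collections import defaultdict


def simulate_survey(n=4000, true_effect=0.5, beta=1.0, seed=7):
    """Simulated (watched, courses, belief) rows; all parameter values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        courses = rng.randint(0, 8)
        watched = rng.random() < 0.2 + 0.07 * courses  # more courses -> more likely to watch
        belief = (true_effect if watched else 0.0) + beta * courses + rng.gauss(0, 1)
        rows.append((watched, courses, belief))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean belief between watchers and non-watchers, ignoring courses."""
    treated = [y for w, _, y in rows if w]
    control = [y for w, _, y in rows if not w]
    return mean(treated) - mean(control)


def matched_difference(rows):
    """Size-weighted average of within-stratum differences, matching exactly on courses."""
    strata = defaultdict(lambda: {True: [], False: []})
    for watched, courses, belief in rows:
        strata[courses][watched].append(belief)
    weighted_sum, total_weight = 0.0, 0
    for groups in strata.values():
        if groups[True] and groups[False]:  # use only strata where the groups overlap
            weight = len(groups[True]) + len(groups[False])
            weighted_sum += weight * (mean(groups[True]) - mean(groups[False]))
            total_weight += weight
    return weighted_sum / total_weight


rows = simulate_survey()
print(f"naive: {naive_difference(rows):.2f}; matched: {matched_difference(rows):.2f}")
```

The matched estimate should sit close to the invented effect of 0.5, while the naive one is several times larger. If the two groups never shared a course count, `matched_difference` would have no strata to use, which is the overlap problem described above.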

1. Where I assume that larger values correspond to more positive beliefs.
2. Notice that the subscript on $\alpha$ will only take two values. I’ll denote them $\alpha_c$ and $\alpha_t$ for “control” and “treatment”, respectively.
3. Provided we’re willing to extrapolate from our sample to women in general, or at least to women in the US.

## Causal inference in ecology – The Rubin causal model (part 1)

Causal inference in ecology – links to the series

Last week I described an experiment that was reasonably well controlled. We1 randomized genotypes within populations across two experimental gardens to determine whether certain leaf traits changed with the age of the plant, the garden in which they were grown or both. Interpreting the results of those experiments is reasonably straightforward. At the end, I was setting up for an exploration of what might be required to infer a causal influence of home environment on plant traits from an observed statistical association between the environment in the sites from which seeds were collected and the traits of the plants grown from those seeds in the gardens.

I’m going to shift gears, because I ran across a (non-ecological) example that should be easier to understand. Putting off discussion of Protea for another week also gives me more time to get a simplified version of the data prepared and analyzed (in an R notebook). So I’m going to examine a recent report that claims evidence for the “Scully effect”, which is the claim that young women who watched Dr. Dana Scully in the X Files have a more positive impression of STEM fields, were more likely to pursue a career in STEM, and see Scully as a role model than those who didn’t watch the X Files.2

I’ll focus on only the first claim. The report3 states it more precisely this way:

Women who are medium/heavy watchers of The X-Files hold more positive views of STEM than non/light watchers, and several survey questions link this directly to the influence of Scully’s character.

And I’ll focus on only one of the pieces of evidence in the report, namely

A greater percentage of medium/heavy viewers of The X-Files strongly believe that young women should be encouraged to study STEM than non/light viewers (56% compared to 47%).

Let’s not worry about statistics. What we have is a positive association between watching the X Files and encouraging young women to study STEM. Let’s assume that the reported difference in the sample is a reliable indication of a similar difference in the population as a whole.4 Can we conclude that watching the X Files caused that association?

The easy answer is “No”, since we all know that correlation does not imply causation, but let’s unpack that “No” and see why correlation does not imply causation. Maybe, just maybe there’s a way we can infer causation from correlation, provided we make some additional assumptions.

One of the first observations I made in this series is that causes precede effects. That condition is clearly satisfied. The women included in the survey were chosen specifically to be old enough (a) to have watched the original X Files or the current seasons and (b) “to have entered the post-college workforce.” One reason we might be skeptical of the claim “Having watched Scully increases the probability that women believe young women should be encouraged to study STEM” is that women who watched Scully may differ from women who didn’t watch Scully in ways that would have predisposed them to encourage young women to study STEM. For example, it’s reasonable to think that women who watched Scully might have already had a greater interest in science than those who didn’t, since the X Files was a science fiction drama series, not a crime fiction or other drama series.

So to conclude that a positive association between watching Scully and encouraging young women to study STEM is evidence that watching Scully causes viewers to encourage young women to study STEM, we need to know that watching Scully or not is the only relevant difference between the women included in the observational sample. If it is, then even though we didn’t design the experiment and randomly assign women to one treatment or the other, it is as if we did. This is where Rubin’s causal model comes in.5 It helps us think about how we might be able to determine that the watching and non-watching groups are equivalent (or how we might make them essentially equivalent if they aren’t already). That’s where we’ll pick up next week.6

1. Remember the disclaimer about “we”. “We” really means Jane Carlson, Ann Marie Gawel, and Rachel Prunier. My role was primarily to sit on the sidelines and provide a little advice. I recorded some of the data when it was dictated to me, but Jane, Ann Marie, and Rachel did all of the real work.
2. In the interest of full disclosure, I should mention that I watched only a few episodes of the X Files. That won’t surprise anyone who knows me, because anyone who knows me knows that I watch very little TV.
3. from the Geena Davis Institute on Gender in Media
4. One challenge that is too often overlooked is determining what “the population as a whole” means. One of the most fundamental questions to answer in interpreting any experimental or observational result is “Over what population can the pattern I observed be generalized?” As a point of information, the authors of the report note that “All differences reported here are statistically significant at the .10 level.”
5. Or at least this is where it comes in for me.
6. Which also means that it will be at least two weeks before we get back to Protea.

## Why didn’t I investigate R Notebooks sooner?

I don’t remember when I first heard about R Notebooks, but it was quite a while ago. I finally decided to investigate them last weekend, and I’m hooked. I expect to be doing much of my R work in R notebooks from now on. For purposes of reproducibility, I still plan to extract R code from what are referred to as “chunks” to produce standalone R scripts that I can rerun from the R console to verify my results, but the interactive notebooks will allow me to run chunks as I’m developing new code and to document what I’m doing.

The link above will take you to documentation on R Notebooks. The only downside to them is that to use them you have to use RStudio. That’s not a big downside, since there is a free version available, but I’m still enough of an old fogey that my fingers are used to Emacs, and Emacs is where I generally prefer to edit code. It will take me a while to develop a new workflow, but you can be sure that it will include R Notebooks.

I produced a Randomization demo very quickly once I updated my version of RStudio. I produced HTML simply by saving my notebook to disk. The .nb.html file was automatically produced in the same directory. All I had to do was to upload it to an appropriate directory here. If you have a recent version of RStudio that supports R Notebooks, you can download the Markdown code using the “Code” dropdown at the top right of the page. Simply open the .Rmd file, and RStudio will open it as a notebook. You can then execute chunks yourself to redo the simulation in any way that you care to. That’s even easier than copying and pasting the code I posted earlier.

## Causal inference in ecology – Setting the stage for the Rubin causal model

Causal inference in ecology – links to the series

If you’ve been following along, you realize by now that it’s not easy to infer the cause of a phenomenon, even in a well-controlled experiment. What about observational experiments, which are what many ecologists and evolutionary biologists have? Take one paper of mine that I’m reasonably happy with (PLoS One e52035; 2012. doi: 10.1371/journal.pone.0052035). One part of that paper included an experiment of sorts. We1 established experimental gardens at the Kirstenbosch National Botanical Garden and at a mid-elevation site on Jonaskop mountain about 100 km due east of Kirstenbosch. Among other things, we were interested in how certain traits in newly formed leaves (specific leaf area, stomatal pore index, and leaf area) differed depending both on the age of the plant and on the garden in which it was grown.

This part of the paper is analogous to the thought experiment on corn that we’ve discussed so far. Since the plants were grown from wild-collected seedlings, we obviously couldn’t duplicate genotypes across gardens. We also didn’t have enough seed from individual maternal plants to replicate families across the gardens. So we did the best we could. We randomized seedlings within populations and split populations across gardens. You’ll see the results from this part of the analysis in the figure below.
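As a rough illustration of what “randomize seedlings within populations and split populations across gardens” can look like, here is a minimal Python sketch. It is not the procedure we actually used. The function name `assign_gardens`, the population labels, and the seedling counts are all hypothetical, and the sketch assumes we simply want each population divided as evenly as possible between the two gardens.

```python
import random
from collections import defaultdict


def assign_gardens(seedlings, gardens=("Kirstenbosch", "Jonaskop"), seed=None):
    """Randomly split the seedlings of each population across gardens.

    seedlings: list of (seedling_id, population) tuples (hypothetical format).
    Returns a dict mapping seedling_id -> garden.
    """
    rng = random.Random(seed)

    # Group seedling IDs by the population they were collected from.
    by_population = defaultdict(list)
    for seedling_id, population in seedlings:
        by_population[population].append(seedling_id)

    assignment = {}
    for ids in by_population.values():
        # Shuffle within the population, then deal seedlings out to the
        # gardens in turn so each garden gets a near-equal share.
        rng.shuffle(ids)
        for position, seedling_id in enumerate(ids):
            assignment[seedling_id] = gardens[position % len(gardens)]
    return assignment


# Made-up example: three populations with six seedlings each.
populations = ("pop_A", "pop_B", "pop_C")
seedlings = [(f"{pop}-{i}", pop) for pop in populations for i in range(6)]
assignment = assign_gardens(seedlings, seed=1)

for seedling_id in sorted(assignment):
    print(seedling_id, assignment[seedling_id])
```

Shuffling within each population, rather than across the whole pool of seedlings, is what keeps every population represented in both gardens.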

Figure 4 from PLoS One e52035; 2012. doi: 10.1371/journal.pone.0052035

The trends with plant age are clear.2 Specific leaf area (SLA) declines with plant age, stomatal pore index increases with plant age, and leaf area increases with plant age. Given that the trends are consistent across species and gardens, I’m reasonably confident that plant age influences these traits in this group of Protea.3 Notice that I wrote “influences”, which is a short way (for me) of writing that plant age is a causal factor that influences the traits but that I am not claiming that it is the causal factor.4

Figure 3 from PLoS One e52035; 2012. doi: 10.1371/journal.pone.0052035

Similarly, the figure above makes it clear that which garden the plants are grown in influences these (and other) traits. These results won’t surprise anyone who’s worked with plants. The traits plants have depend both on how old the plant is and on where it’s grown. So far a reasonably straightforward experiment, but how about this? We also wanted to know whether the amount of change in leaf traits depended on measures of resource availability and rainfall seasonality in the places that seeds were collected from. Here we’re asking a more complicated question.

Take SLA, for example, and imagine that we’re asking the question just about changes in SLA for plants grown at Jonaskop. Now the fully fleshed-out question is something like this:

• I know that plants are often adapted to the local circumstances in which they are growing.
• If SLA reflects plant characteristics that are important in local adaptation to nutrients or water availability, then plants that grow in places that differ in nutrients or water availability should also differ in SLA in ways that make them well-suited to the place where they occur.
• Do we have evidence that there is an association between changes in SLA and nutrient availability or precipitation patterns at the sites from which the plants are derived?

It’s that middle step that’s tricky. We don’t need to do anything special to run a regression on changes in SLA and home site characteristics, but to interpret that regression as evidence for the causal story in that middle step we need to do something more. Unlike the nicely randomized experiment with which we began this post, we aren’t randomizing plants across sites and allowing them to adapt. What we have is purely observational data to address this question. To what extent can we make a causal inference from these data? That’s the question I’ll turn to in the next installment.
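To make the “nothing special” part concrete, here is a minimal Python sketch of regressing change in SLA on a single home-site characteristic. Everything in it is an assumption for illustration: the data are simulated, not taken from the paper, the variable names `home_site` and `delta_sla` are hypothetical, and the model is a plain least-squares line rather than the analysis we actually ran.

```python
import random


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rng = random.Random(42)

# Simulated data: one home-site characteristic (say, an index of rainfall
# seasonality) for 20 hypothetical populations, and the change in SLA
# between years for plants from each population.
home_site = [rng.uniform(0.0, 1.0) for _ in range(20)]
delta_sla = [-2.0 + 1.5 * h + rng.gauss(0.0, 0.5) for h in home_site]

slope, intercept = ols_slope_intercept(home_site, delta_sla)
print(f"estimated slope: {slope:.2f}, intercept: {intercept:.2f}")
```

The fitted slope describes an association and nothing more. Nothing in the arithmetic distinguishes local adaptation from any other reason that home-site characteristics and changes in SLA might covary.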

1. By “we” I should be clear that Jane Carlson, Rachel Prunier, and Ann Marie Gawel collected all of the seeds that “we” used to establish the gardens, and they did all of the work of germinating seedlings and establishing the gardens. They also collected nearly all of the data. I helped collect a little, but my help mostly consisted of standing there with a clipboard and data sheet and writing down numbers in the appropriate columns.
2. The same plants were measured in 2009 and 2010. In both cases, measurements were made on newly formed, but fully expanded leaves. I’m not reporting P-values, but you can find them in Table 1.
3. The species are all members of a small, recently evolved monophyletic clade, Protea sect. Exsertae.
4. Notice also that I am discounting the possibility that it is weather in the year the plants were growing that influences their traits rather than their age.

## Honoring Ruth Millikan

Ruth Millikan is Emeritus Board of Trustees Distinguished Professor of Philosophy at UConn. Quoting from her web page, Ruth’s “research interests span many topics in the philosophy of biology, philosophy of mind, philosophy of language, and ontology.” She is a highly respected and influential philosopher. From her Wikipedia page:

She was awarded the Jean Nicod Prize and gave the Jean Nicod Lectures in Paris in 2002. She was elected to the American Academy of Arts and Sciences in 2014 and received, in 2017, both the Nicholas Rescher Prize for Systematic Philosophy from the University of Pittsburgh and the Rolf Schock Prize in Logic and Philosophy.

On April 30th I had the great honor of presenting a few remarks at an event held to celebrate Ruth’s contributions and to inaugurate the Ruth Garrett Millikan Endowment to support graduate students. Daniel Dennett was the featured speaker, and he highlighted Ruth’s contributions, focusing especially on one of her early books – Language, Thought, and Other Biological Categories – and her most recent one – Beyond Concepts. If you want to understand why her work is so important, you’ll need to read those books yourself. Her Wikipedia page provides only a very brief summary.

My comments focused on why graduate education, particularly PhD education, and financial support for graduate education are vital. On the off chance you’re interested in reading what I had to say, the full text of my remarks follows.