Uncommon Ground

Biology

On the importance of making observations (and inferences) at the right hierarchical level

I mentioned a couple of weeks ago that trait-environment associations observed at a global scale across many lineages don’t necessarily correspond to those observed within lineages at a smaller scale (link). I didn’t mention it then, but this is just another example of the general phenomenon known as the ecological fallacy, in which associations evident at the level of a group are attributed to individuals within the group. The ecological fallacy is related to Simpson’s paradox, in which associations within groups differ from, and can even reverse, those seen between groups.
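To make the ecological fallacy concrete, here is a minimal Python sketch built on made-up numbers. The three hypothetical “lineages” and their values are invented for illustration and come from no study. Within each lineage the trait declines along the environmental gradient. The lineage means, and the pooled data, nevertheless show a positive association.

```python
# Hypothetical data: three lineages, each measured at three sites along an
# environmental gradient. Values are invented purely for illustration.
lineages = {
    "lineage_1": ([0.0, 1.0, 2.0], [2.0, 1.0, 0.0]),
    "lineage_2": ([3.0, 4.0, 5.0], [5.0, 4.0, 3.0]),
    "lineage_3": ([6.0, 7.0, 8.0], [8.0, 7.0, 6.0]),
}


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Slope within each lineage (the association among individuals in a group).
within = {name: ols_slope(x, y) for name, (x, y) in lineages.items()}

# Slope among lineage means (the group-level association).
mean_x = [mean(x) for x, _ in lineages.values()]
mean_y = [mean(y) for _, y in lineages.values()]
among = ols_slope(mean_x, mean_y)

# Slope when all observations are pooled and lineage is ignored.
pooled_x = [v for x, _ in lineages.values() for v in x]
pooled_y = [v for _, y in lineages.values() for v in y]
pooled = ols_slope(pooled_x, pooled_y)

print(within)  # every within-lineage slope is -1.0
print(among)   # 1.0
print(pooled)  # 0.8
```

Inferring the within-lineage slope from either the among-lineage slope or the pooled slope would get the sign wrong.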

A recent paper in Proceedings of the National Academy of Sciences gives practical examples of why it’s important to make observations at the level you’re interested in and why you should be very careful about extrapolating associations observed at one level to another. The authors report on six repeated-measure studies, each following 87–94 participants over time.1 That design let them estimate both the variation within individuals over time and the variation among individuals at a single time. They found that within-individual variance was two to four times larger than among-individual variance. Why do we care? Suppose you wanted to know whether administering imipramine reduces symptoms of clinical depression (sample 4 in the paper), and you used the among-individual variance in depression, measured once, to decide whether an observed difference was statistically meaningful. You’d be using a standard error that’s too small. A standard error scales with the square root of the variance, so a variance that is two to four times too small gives a standard error roughly 1.4 to 2 times too small. As a result, you’d be more confident that a difference exists than the variation within individuals justifies.
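The comparison above can be sketched in a few lines of Python, assuming the data are arranged as a matrix with individuals in rows and time points in columns. The toy numbers are hypothetical, and this mirrors the idea of a symmetric comparison, not the authors’ exact procedure. The sketch averages the variance within each row (within individuals), averages the variance within each column (among individuals at one time), and converts the ratio into the factor by which a standard error would be understated.

```python
import math
import statistics


def mean_within_variance(data):
    """Average variance over time within each individual (rows)."""
    return statistics.fmean([statistics.variance(row) for row in data])


def mean_among_variance(data):
    """Average variance among individuals at each time point (columns)."""
    return statistics.fmean([statistics.variance(col) for col in zip(*data)])


# Hypothetical toy data: rows are individuals, columns are time points.
data = [
    [1.0, 5.0, 1.0, 5.0],
    [2.0, 6.0, 2.0, 6.0],
    [3.0, 7.0, 3.0, 7.0],
]

within = mean_within_variance(data)  # 16/3, about 5.33
among = mean_among_variance(data)    # 1.0
ratio = within / among

# A standard error scales with the square root of the variance, so using the
# smaller variance understates the standard error by a factor of sqrt(ratio).
se_factor = math.sqrt(ratio)

# Variance ratios of 2 and 4 correspond to SE factors of about 1.41 and 2.
for r in (2, 4):
    print(r, round(math.sqrt(r), 2))
```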

Why does this matter to an ecologist or an evolutionary biologist? Have you ever heard of “space-time substitution”? Do a Google search and near the top you’ll find a link to this chapter by Steward Pickett in Long-Term Studies in Ecology. The idea is that because longitudinal studies take a very long time, we can use variation in space as a substitute for variation in time. The assumption is rarely tested (see this paper for an exception), but it is widely used. The problem is that in any spatially structured system with a finite number of populations or sites, the variance among sites at any one time (the spatial variance we’d measure) is substantially less than the variance at any one site across time (the temporal variance). If we’re interested in the spatial variance, that’s fine. If we’re interested in how variable the system is over time, though, it’s a problem. It’s also a problem if we assume that associations we see across populations at one point in time characterize any one population across time.
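One simple way to see why spatial variance can understate temporal variance is a hypothetical model in which every site shares a regional year-to-year fluctuation (a common weather signal, say) plus its own local noise. The model, the parameter values, and the function names below are assumptions for illustration and are not taken from Pickett’s chapter. A snapshot across sites in one year never sees the shared fluctuation. A single site followed over time sees all of it.

```python
import random
import statistics


def simulate(n_sites=20, n_years=1000, sd_regional=2.0, sd_local=1.0, seed=1):
    """Hypothetical model: the value at site s in year t is R[t] + L[s][t].

    R[t] is a regional fluctuation shared by every site in year t;
    L[s][t] is independent site-level noise.
    """
    rng = random.Random(seed)
    regional = [rng.gauss(0.0, sd_regional) for _ in range(n_years)]
    return [[regional[t] + rng.gauss(0.0, sd_local) for t in range(n_years)]
            for _ in range(n_sites)]


def spatial_variance(y):
    """Variance among sites within a year, averaged over years."""
    return statistics.fmean([statistics.variance(year) for year in zip(*y)])


def temporal_variance(y):
    """Variance across years within a site, averaged over sites."""
    return statistics.fmean([statistics.variance(site) for site in y])


y = simulate()
print(round(spatial_variance(y), 2))   # close to sd_local**2 = 1
print(round(temporal_variance(y), 2))  # close to sd_regional**2 + sd_local**2 = 5
```

With these assumed settings, a space-for-time substitution would understate the variability that any one site experiences by roughly a factor of five.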

In the context of the leaf economics spectrum, most of the global associations that have been documented are associations between species mean trait values. Space-time substitution may not work, and among-group associations in humans don’t reliably predict individual associations (as this recent PNAS paper illustrates), for the same reason. If we want to understand the mechanistic basis of trait-environment or trait-trait associations, by which I mean the evolutionary mechanisms acting on individuals that produce those associations, we need to measure traits on individuals and measure the environments where those individuals occur.

Here’s the title and abstract of the paper that inspired this post, along with a link.

Lack of group-to-individual generalizability is a threat to human subjects research

Aaron J. Fisher, John D. Medaglia, and Bertus F. Jeronimus

Only for ergodic processes will inferences based on group-level data generalize to individual experience or behavior. Because human social and psychological processes typically have an individually variable and time-varying nature, they are unlikely to be ergodic. In this paper, six studies with a repeated-measure design were used for symmetric comparisons of interindividual and intraindividual variation. Our results delineate the potential scope and impact of nonergodic data in human subjects research. Analyses across six samples (with 87–94 participants and an equal number of assessments per participant) showed some degree of agreement in central tendency estimates (mean) between groups and individuals across constructs and data collection paradigms. However, the variance around the expected value was two to four times larger within individuals than within groups. This suggests that literatures in social and medical sciences may overestimate the accuracy of aggregated statistical estimates. This observation could have serious consequences for how we understand the consistency between group and individual correlations, and the generalizability of conclusions between domains. Researchers should explicitly test for equivalence of processes at the individual and group level across the social and medical sciences.

doi: 10.1073/pnas.1711978115

  1. The studies are on human subjects.

You really need to check your statistical models, not just fit them

I haven’t had a chance to read the paper I mention below yet, but it looks like a very good guide to model checking – a step that is too often forgotten. It doesn’t do us much good to estimate parameters of a statistical model that doesn’t do well at fitting the data we have. That’s what model checking is all about. In a Bayesian context, posterior predictive model checking is particularly useful.1 If the parameters and the model you used to estimate them can’t reproduce the data you collected reasonably well, the model isn’t doing a good job of fitting the data, and you shouldn’t trust the parameter estimates.
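To make the idea concrete, here is a minimal posterior predictive check in Python. It assumes a simple conjugate Poisson–gamma model, so posterior draws can be taken directly instead of from Stan. It also assumes made-up count data and uses the sample variance as the test statistic. None of these choices come from the paper below. If replicate data sets drawn from the fitted model almost never show as much spread as the real data, the model is missing something important.

```python
import math
import random
import statistics

# Hypothetical, deliberately overdispersed counts (invented for illustration).
y = [0, 0, 1, 0, 12, 0, 2, 15, 0, 1, 0, 9, 0, 0, 3, 20]


def rpois(lam, rng):
    """Draw a Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def posterior_predictive_p(y, n_draws=2000, a=1.0, b=1.0, seed=42):
    """Posterior predictive p-value for the sample variance under a Poisson model.

    With a Gamma(a, b) prior (shape a, rate b) on the Poisson rate, the
    posterior is Gamma(a + sum(y), b + n). random.gammavariate takes a
    scale parameter, so we pass 1 / (b + n).
    """
    rng = random.Random(seed)
    n, observed = len(y), statistics.variance(y)
    exceed = 0
    for _ in range(n_draws):
        lam = rng.gammavariate(a + sum(y), 1.0 / (b + n))  # posterior draw
        y_rep = [rpois(lam, rng) for _ in range(n)]         # replicate data set
        if statistics.variance(y_rep) >= observed:
            exceed += 1
    return exceed / n_draws


p = posterior_predictive_p(y)
print(p)  # near 0: a Poisson model can't reproduce this much spread
```

This is the kind of posterior predictive P value that, according to the abstract below, catches extreme lack of fit but can miss subtler problems.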

If you happen to be using Stan (via rstan) or rstanarm, posterior predictive model checking is either immediately available (rstanarm) or easy to make available (rstan) in ShinyStan. It’s built on bayesplot, which provides the underlying functions for posterior predictive checking with output from virtually any package (provided you coerce the result into the right format). I’ve been using bayesplot lately because it integrates nicely with R Notebooks, so I can keep a record of my model checking in the same place where I’m developing and refining my code.

Here’s the title, abstract, and a link:

A guide to Bayesian model checking for ecologists

Paul B. Conn, Devin S. Johnson, Perry J. Williams, Sharon R. Melin, Mevin B. Hooten

Ecological Monographs doi: 10.1002/ecm.1314

Checking that models adequately represent data is an essential component of applied statistical inference. Ecologists increasingly use hierarchical Bayesian statistical models in their research. The appeal of this modeling paradigm is undeniable, as researchers can build and fit models that embody complex ecological processes while simultaneously accounting for observation error. However, ecologists tend to be less focused on checking model assumptions and assessing potential lack of fit when applying Bayesian methods than when applying more traditional modes of inference such as maximum likelihood. There are also multiple ways of assessing the fit of Bayesian models, each of which has strengths and weaknesses. For instance, Bayesian P values are relatively easy to compute, but are well known to be conservative, producing P values biased toward 0.5. Alternatively, lesser known approaches to model checking, such as prior predictive checks, cross‐validation probability integral transforms, and pivot discrepancy measures may produce more accurate characterizations of goodness‐of‐fit but are not as well known to ecologists. In addition, a suite of visual and targeted diagnostics can be used to examine violations of different model assumptions and lack of fit at different levels of the modeling hierarchy, and to check for residual temporal or spatial autocorrelation. In this review, we synthesize existing literature to guide ecologists through the many available options for Bayesian model checking. We illustrate methods and procedures with several ecological case studies including (1) analysis of simulated spatiotemporal count data, (2) N‐mixture models for estimating abundance of sea otters from an aircraft, and (3) hidden Markov modeling to describe attendance patterns of California sea lion mothers on a rookery. 
We find that commonly used procedures based on posterior predictive P values detect extreme model inadequacy, but often do not detect more subtle cases of lack of fit. Tests based on cross‐validation and pivot discrepancy measures (including the “sampled predictive P value”) appear to be better suited to model checking and to have better overall statistical performance. We conclude that model checking is necessary to ensure that scientific inference is well founded. As an essential component of scientific discovery, it should accompany most Bayesian analyses presented in the literature.

  1. Andrew Gelman introduced the idea more than 20 years ago (link), but it’s only really caught on since his Stan group made some general-purpose packages available that simplify the process of producing the predictions. (See the next paragraph for references.)

Trait-environment relationships in Pelargonium

Almost 15 years ago, Wright et al. (Nature 428:821–827; 2004 – doi: 10.1038/nature02403) described the worldwide leaf economics spectrum as “a universal spectrum of leaf economics consisting of key chemical, structural and physiological properties.” Since then, an enormous number of articles have examined or referred to it: more than 4000 according to Google Scholar. In the past few years, many authors have pointed out that it may not be as universal as originally presumed. For example, in Mitchell et al. (The American Naturalist 185:525-537; 2015 – http://www.jstor.org/stable/10.1086/680051) we found a negative relationship between an important component of the leaf economics spectrum (leaf mass per area) and mean annual temperature in Pelargonium from the Cape Floristic Region of southwestern South Africa, while the global pattern is for a positive relationship.1

Now Tim Moore and several of my colleagues follow up with a more detailed analysis of trait-environment relationships in Pelargonium. They demonstrate several ways in which the global pattern breaks down in South African samples of this genus. Here’s the abstract and a link to the paper.

  • Functional traits in closely related lineages are expected to vary similarly along common environmental gradients as a result of shared evolutionary and biogeographic history, or legacy effects, and as a result of biophysical tradeoffs in construction. We test these predictions in Pelargonium, a relatively recent evolutionary radiation.
  • Bayesian phylogenetic mixed effects models assessed, at the subclade level, associations between plant height, leaf area, leaf nitrogen content and leaf mass per area (LMA), and five environmental variables capturing temperature and rainfall gradients across the Greater Cape Floristic Region of South Africa. Trait–trait integration was assessed via pairwise correlations within subclades.
  • Of 20 trait–environment associations, 17 differed among subclades. Signs of regression coefficients diverged for height, leaf area and leaf nitrogen content, but not for LMA. Subclades also differed in trait–trait relationships and these differences were modulated by rainfall seasonality. Leave‐one‐out cross‐validation revealed that whether trait variation was better predicted by environmental predictors or trait–trait integration depended on the clade and trait in question.
  • Legacy signals in trait–environment and trait–trait relationships were apparently lost during the earliest diversification of Pelargonium, but then retained during subsequent subclade evolution. Overall, we demonstrate that global‐scale patterns are poor predictors of patterns of trait variation at finer geographic and taxonomic scales.

doi.org/10.1111/nph.15196
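The abstract’s comparison of environmental predictors against trait–trait integration rests on leave-one-out cross-validation. The paper used Bayesian phylogenetic mixed models. The sketch below swaps in ordinary least-squares lines and invented numbers, and the function names are hypothetical. It is meant only to show the logic: refit with each observation held out, predict the held-out value, and compare average prediction error across candidate predictors. Lower error means better out-of-sample prediction.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def loo_mse(x, y):
    """Mean squared error of leave-one-out predictions from a straight line."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return sum(errors) / len(errors)


# Hypothetical values for six species: an environmental predictor, a second
# trait, and the trait we want to predict. All numbers are invented.
environment = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
other_trait = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
focal_trait = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

print(loo_mse(environment, focal_trait))  # environment as predictor
print(loo_mse(other_trait, focal_trait))  # another trait as predictor
```

In these toy data the environmental predictor wins. The abstract’s point is that in Pelargonium, which kind of predictor wins depends on the clade and trait.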

  1. If you read The American Naturalist paper, you’ll see that we wrote in the Discussion that “We could not detect a relationship between LMA and MAT in Protea….” I wouldn’t write it that way now. Look at Table 2. You’ll see that the posterior mean for the relationship is 0.135 with a 95% credible interval of (-0.078, 0.340). I would now write that “We detected a weakly supported positive relationship between LMA and MAT….” Why the difference? I’ve taken to heart Andrew Gelman’s observation that “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant” (blog post; article in The American Statistician). I am training myself to pay less attention to which coefficients in a regression are “significant” and which aren’t, and more to reporting the best guess we have about each relationship (the posterior means) and the amount of confidence we have about them (the credible intervals). I recently learned about hypothesis() in brms, which will provide an estimate of the posterior probability that you’ve got the sign of the relationship right. I need to investigate that, but I suspect it’s what I’ll be using in the future.
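The quantity described in the footnote, the posterior probability that a coefficient has a particular sign, is easy to estimate. The sketch below does not reproduce brms’s hypothesis(), and the function names are hypothetical. Given posterior draws, the estimate is simply the fraction of draws above zero. Given only a reported mean and credible interval, a normal approximation gives a rough answer. That approximation is an assumption, reasonable only when the posterior is roughly symmetric, as this interval nearly is.

```python
from statistics import NormalDist


def prob_positive_from_draws(draws):
    """Share of posterior draws above zero: an estimate of Pr(beta > 0)."""
    return sum(d > 0 for d in draws) / len(draws)


def prob_positive_normal_approx(mean, lower, upper, level=0.95):
    """Pr(beta > 0) assuming the posterior is roughly normal.

    The standard deviation is backed out from the width of a central
    credible interval at the given level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    sd = (upper - lower) / (2 * z)
    return 1.0 - NormalDist(mean, sd).cdf(0.0)


# Posterior mean and 95% credible interval for LMA vs. MAT from Table 2.
print(round(prob_positive_normal_approx(0.135, -0.078, 0.340), 2))  # 0.9
```

Under that approximation, there is roughly a 90% posterior probability that the LMA–MAT relationship is positive.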

Trait-climate evolution in Protea

Protea compacta

If you’re reading this post, you know that my colleagues and I have been studying Protea for more than a decade. A lot of our work has focused on documenting and understanding trait-environment associations. We’ve studied those associations both among populations within species (Protea repens: https://doi.org/10.1093/aob/mcv146), among populations within a small, closely related clade (Protea sect. Exsertae: https://doi.org/10.1111/j.1558-5646.2010.01131.x and https://doi.org/10.1111/j.1420-9101.2012.02548.x), and across the entire genus (https://doi.org/10.1086/680051). But all of those studies look at the relationship between the climate as it is now (as reflected in the South African Atlas of Agrohydrology and Climatology). They haven’t examined how traits have evolved in response to changes in climate.

Our latest paper begins to address that shortcoming. We use the highly resolved phylogeny of Protea that Nora Mitchell constructed as part of her dissertation (http://darwin.eeb.uconn.edu/uncommon-ground/blog/2017/01/23/a-new-phylogeny-for-protea/ and https://doi.org/10.3732/ajb.1600227), and we reconstruct how traits changed over evolutionary time in concert (or not) with climate. Our reconstructions depend on particular models of evolutionary change, and we explore several alternatives. Here’s the abstract:

Evolutionary radiations are responsible for much of Earth’s diversity, yet the causes of these radiations are often elusive. Determining the relative roles of adaptation and geographic isolation in diversification is vital to understanding the causes of any radiation, and whether a radiation may be labeled as “adaptive” or not. Across many groups of plants, trait–climate relationships suggest that traits are an important indicator of how plants adapt to different climates. In particular, analyses of plant functional traits in global databases suggest that there is an “economics spectrum” along which combinations of functional traits covary along a fast–slow continuum. We examine evolutionary associations among traits and between trait and climate variables on a strongly supported phylogeny in the iconic plant genus Protea to identify correlated evolution of functional traits and the climatic-niches that species occupy. Results indicate that trait diversification in Protea has climate associations along two axes of variation: correlated evolution of plant size with temperature and leaf investment with rainfall. Evidence suggests that traits and climatic-niches evolve in similar ways, although some of these associations are inconsistent with global patterns on a broader phylogenetic scale. When combined with previous experimental work suggesting that trait–climate associations are adaptive in Protea, the results presented here suggest that trait diversification in this radiation is adaptive.

Mitchell, N., J.E. Carlson, and K.E. Holsinger.  2018.  Correlated evolution between climate and suites of traits along a fast–slow continuum in the radiation of Protea. Ecology and Evolution 8:1853–1866. doi: 10.1002/ece3.3773.
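Correlated evolution of the kind the abstract describes is often introduced through Felsenstein’s phylogenetically independent contrasts, which assume Brownian motion. The paper explored several models of evolutionary change and may not have used this method. The tree, the trait values, and the function names below are all hypothetical. The sketch computes a standardized contrast at each internal node. It then correlates the contrasts for two traits through the origin, which is the usual way contrasts are analyzed.

```python
import math

# Hypothetical four-species tree ((A,B),(C,D)) with every branch length 1.0.
# A node is (children, branch_length); a tip's "children" is its name.
AB = ((("A", 1.0), ("B", 1.0)), 1.0)
CD = ((("C", 1.0), ("D", 1.0)), 1.0)
TREE = ((AB, CD), 0.0)


def contrasts(node, traits):
    """Felsenstein's independent contrasts under Brownian motion.

    Returns (estimated node value, branch length adjusted for estimation
    error, list of standardized contrasts in the subtree).
    """
    children, length = node
    if isinstance(children, str):  # tip: observed trait value
        return traits[children], length, []
    left, right = children
    x1, v1, c1 = contrasts(left, traits)
    x2, v2, c2 = contrasts(right, traits)
    standardized = (x1 - x2) / math.sqrt(v1 + v2)
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted average
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [standardized]


def contrast_correlation(tree, trait_x, trait_y):
    """Correlation of contrasts, computed through the origin."""
    _, _, cx = contrasts(tree, trait_x)
    _, _, cy = contrasts(tree, trait_y)
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


# Invented trait and climate values for the four species.
size = {"A": 1.0, "B": 2.0, "C": 5.0, "D": 6.0}
temperature = {"A": 10.0, "B": 11.5, "C": 14.0, "D": 16.0}
print(round(contrast_correlation(TREE, size, temperature), 3))
```

A tree with n species yields n - 1 contrasts. These are independent under Brownian motion, which is why they can be correlated without double-counting shared ancestry.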

The origin of a bipolar moss (i.e., one that occurs in the far North and the far South)

One of the great pleasures of serving as an associate advisor on a PhD committee is that sometimes you contribute enough to the analysis and interpretation of the data that you end up as a co-author on a paper. That’s why I have papers on New Zealand cicadas, deer mice, and tapeworms, among other things. Now I’ve added another group to my list: mosses. Lily Lewis finished her PhD at UConn in the spring of 2015 working with Bernard Goffinet. I was a member of her committee, and now a chapter of her dissertation that I was able to help with has appeared in the American Journal of Botany.1 Here’s the title and abstract. You’ll find the DOI and a link to the paper below.

Resolving the northern hemisphere source region for the long-distance dispersal event that gave rise to the South American endemic dung moss Tetraplodon fuegianus.

PREMISE OF THE STUDY: American bipolar plant distributions characterize taxa at various taxonomic ranks but are most common in the bryophytes at infraspecific and infrageneric levels. A previous study on the bipolar disjunction in the dung moss genus Tetraplodon found that direct long-distance dispersal from North to South in the Miocene–Pleistocene accounted for the origin of the Southern American endemic Tetraplodon fuegianus, congruent with other molecular studies on bipolar bryophytes. The previous study, however, remained inconclusive regarding a specific northern hemisphere source region for the transequatorial dispersal event that gave rise to T. fuegianus.
METHODS: To estimate spatial genetic structure and phylogeographic relationships within the bipolar lineage of Tetraplodon, which includes T. fuegianus, we analyzed thousands of restriction-site-associated DNA (RADseq) loci and single nucleotide polymorphisms using Bayesian individual assignment and maximum likelihood and coalescent model based phylogenetic approaches.
KEY RESULTS: Northwestern North America is the most likely source of the recent ancestor to T. fuegianus.
CONCLUSIONS: Tetraplodon fuegianus, which marks the southernmost populations in the bipolar lineage of Tetraplodon, arose following a single long-distance dispersal event involving a T. mnioides lineage that is now rare in the northern hemisphere and potentially restricted to the Pacific Northwest of North America. Furthermore, gene flow between sympatric lineages of Tetraplodon mnioides in the northern hemisphere is limited, possibly due to high rates of selfing or reproductive isolation.

DOI: 10.3732/ajb.1700144

Climate change and Pelargonium in South Africa

For more than a decade my colleagues Margaret Rubega and Bob Wyss have co-taught a course to graduate students in science and engineering and undergraduates in Journalism.1 The purpose of the course is to help science students improve their skills in working with journalists and to help journalists improve their skills in interviewing scientists and developing stories from those interviews. One of the projects in this fall’s edition of the course was for the journalism students to interview one of the science graduate students and produce a short video describing the student’s research. Daniela Doncel interviewed Tanisha Williams, a PhD student in EEB whom I co-advise with Carl Schlichting. In addition to interviewing Tanisha, Daniela also interviewed Cindi Jones and me. She assembled a video that explains Tanisha’s work very well. I think Daniela did a very nice job of weaving the disparate interviews into a compelling story, and I think the video looks very good (even though it has me in it). I hope that you agree.

AIBS Emerging Public Policy Leadership Award

The American Institute of Biological Sciences works to ensure that the public, legislators, and others have access to the best scientific information available, especially in the fields of environmental and organismal biology. In addition to individual members, more than 100 professional societies are organizational members. One of the most interesting programs AIBS offers is its Emerging Public Policy Leadership Award. Here is the text of an e-mail I recently received announcing this year’s award.

Each year, the American Institute of Biological Sciences (AIBS) recognizes graduate students in the biological sciences who have demonstrated initiative and leadership in science policy. Recipients obtain first-hand experience at the interface of science and public policy. Winners receive:

  • A trip to Washington, DC, to participate in the AIBS Congressional Visits Day, an annual event that brings scientists to the nation’s capital to advocate for federal investment in the biological sciences, with a primary focus on the National Science Foundation. The event will be held on April 17-18, 2018. Domestic travel and hotel expenses will be paid for the winners.
  • Policy and communications training, including information on the legislative process and trends in federal science funding.
  • Meetings with congressional policymakers to discuss the importance of federal investment in the biological sciences.
  • A one-year AIBS membership, including a subscription to the journal BioScience and a copy of “Communicating Science: A Primer for Working with the Media.”

The 2018 award is open to U.S. citizens and U.S. permanent residents enrolled in a graduate degree program in the biological sciences, science education, or a closely allied field. Applicants should have a demonstrated interest in and commitment to science policy and/or science education policy.

Applications are due by 11:59 PM Eastern Time on 9 January 2018. The application can be downloaded at http://www.aibs.org/public-policy/eppla.html.

Plants, People, and the Mother City

Tanisha Williams, Fulbright 2015-2016, South Africa, at Boulders Beach visiting the penguins.

Some of you know that Carl Schlichting and I co-advise Tanisha Williams. If you know that, you almost certainly know that Tanisha spent the 2015-2016 academic year as a Fulbright Fellow in South Africa. She was based at the Cape Peninsula University of Technology, and she used her time not only to collect seeds of Pelargonium and establish experimental gardens at Kirstenbosch Botanical Garden and Rhodes University but also to work with two non-profit environmental organizations. She posted an article about her experience on the blog of the Fulbright Student Program. Here’s an excerpt to whet your appetite:

Among the many experiences I had, I must say the residents from the Khayelitsha township have taken a special place in my heart. This is where I taught girls and young women math, science, computer tutoring, life skills, and female empowerment through a community center program. It was such an impactful experience, as these girls are growing up in a community with high rates of unemployment, violence, and other socioeconomic issues. It was empowering for me to see the curiosity and determination these girls had for learning and changing their community. They thought I was there to teach them from my own experiences being raised in a comparable situation and now working on my doctorate as a scientist, but I know I was the one that gained the most from our time together. I learned what it truly means to have hope and persevere. These lessons, along with the ecological and evolutionary insights from my academic research, will be ones that I always remember.

Announcing the BioOne Career Center

BioOne is a collaboration between libraries and non-profit scholarly publishers in organismal and environmental life sciences. It was founded in 1999 to help publishers obtain the revenue they need to support their publishing program while ensuring affordable access to scholarly journals for libraries and their patrons. I am proud to have served as Chair of the BioOne Board of Directors since 2000.

BioOne’s primary service is to provide BioOne Complete, a database of 207 journals including many open access titles. As the title of this post suggests, BioOne is now offering a new service, the BioOne Career Center. Anyone looking for an opportunity can create a free account, set up a job alert, and post a CV. Employers can post jobs on the site for free until the end of October (you’ll find the necessary code in the announcement), and postings for internships, volunteer opportunities, and conferences will always be free. We hope that the BioOne Career Center will become a valuable resource.