Uncommon Ground

Biology

Sharing a new version of my genetic drift simulation

You may be aware that I wrote a series of applications in RShiny several years ago to illustrate some principles of population genetics. I just finished revising the genetic drift application. If you’ve used it in the past, you’ll know that it would get hung up when you tried to simulate a long time series or a lot of populations. After some digging around, I realized that the problem isn’t with running the simulation or with collecting the results. It’s with converting the results to a form that allows the simulation to unfold over time.

As a result, this version allows you to turn the animation off. Now you can run long time series with lots of populations (where “lots” equals “up to 10”). You won’t see the results played as a movie, but you’ll see them displayed very quickly. As you’ll see from the first link above, all of the source code is available on Github. If you find any of these applications useful, you’ll want to take a look at the Google Doc that Katie Lotterhos put together and announced on Twitter last January. It includes screenshots and links to applications written by CJ Battey, Graham Coop, and Chris Muir.
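The application itself is written in R Shiny, and the Python below is not its code. It is a minimal sketch of the standard Wright-Fisher model of drift that simulations like this are usually built on. It assumes N diploid individuals, no mutation, migration, or selection, and binomial sampling of the 2N gene copies each generation. The function and parameter names are hypothetical.

```python
import random


def simulate_drift(n_individuals, p0, n_generations, n_populations, seed=None):
    """Wright-Fisher drift in several independent populations.

    Each generation, the 2N gene copies are drawn independently using the
    previous generation's allele frequency (binomial sampling). Returns one
    list of allele frequencies per population, starting with p0.
    """
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    trajectories = []
    for _ in range(n_populations):
        p = p0
        freqs = [p]
        for _ in range(n_generations):
            # Count how many of the 2N draws carry allele A (probability p each).
            count = sum(1 for _ in range(two_n) if rng.random() < p)
            p = count / two_n
            freqs.append(p)
        trajectories.append(freqs)
    return trajectories


# Ten populations of 50 diploids, followed for 200 generations.
runs = simulate_drift(n_individuals=50, p0=0.5, n_generations=200, n_populations=10, seed=1)
for i, freqs in enumerate(runs, start=1):
    print(f"population {i}: final frequency {freqs[-1]:.2f}")
```

Because every trajectory is computed up front and returned as a plain list, all of them can be drawn in a single static plot. An animated display also has to turn those results into one frame per generation, which is extra work on top of the simulation itself.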

Genetic structure and clonal diversity in an important Chinese grass

Since you’re reading this blog, you must know that I don’t have a lot of time for research these days. My duties as Vice Provost for Graduate Education and Dean of The Graduate School at UConn take up most of my time. I do manage to contribute to some research, so long as other people do the real work and I contribute some ideas or some statistical analyses. Here’s another example of that.

Last fall I was asked about the old C++ program Hickory that I had written to facilitate analysis of Wright’s F-statistics with dominant markers. It was never terribly widely used, and it was difficult to maintain. I gave up about 10 years ago. In the meantime, I realized that there’s an easy way to rewrite Hickory using Stan. After being contacted, I finally bit the bullet and did the rewrite in a combination of Stan and R. I even mentioned the R/Stan implementation last September.

Yesterday, we posted a pre-print on bioRxiv that uses the new version of Hickory as one of a variety of analytical methods that provide some insight into the genetic structure of Leymus chinensis. Here’s the abstract and a link.

Genetic structure in patchy populations of a candidate foundation plant: a case study of Leymus chinensis (Poaceae) using genetic and clonal diversity

Jian Guo, Christina L. Richards, Kent E. Holsinger, Gordon A. Fox, Zhuo Zhang, Chan Zhou

doi: https://doi.org/10.1101/2021.06.12.448174

PREMISE The distribution of genetic diversity on the landscape has critical ecological and evolutionary implications. This may be especially the case on a local scale for foundation plant species since they create and define ecological communities, contributing disproportionately to ecosystem function.

METHODS We examined the distribution of genetic diversity and clones, which we defined first as unique multi-locus genotypes (MLG), and then by grouping similar MLGs into multi-locus lineages (MLL). We used 186 markers from inter-simple sequence repeats (ISSR) across 358 ramets from 13 patches of the foundation grass Leymus chinensis. We examined the relationship between genetic and clonal diversities, their variation with patch-size, and the effect of the number of markers used to evaluate genetic diversity and structure in this species.

RESULTS Every ramet had a unique MLG. Almost all patches consisted of individuals belonging to a single MLL. We confirmed this with a clustering algorithm to group related genotypes. The predominance of a single lineage within each patch could be the result of the accumulation of somatic mutations, limited dispersal, some sexual reproduction with partners mainly restricted to the same patch, or a combination of all three.

CONCLUSIONS We found strong genetic structure among patches of L. chinensis. Consistent with previous work on the species, the clustering of similar genotypes within patches suggests that clonal reproduction combined with somatic mutation, limited dispersal, and some degree of sexual reproduction among neighbors causes individuals within a patch to be more closely related than among patches.
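The abstract doesn't say exactly how similar MLGs were grouped into MLLs, so the sketch below is only an illustration of one common, simple approach, not necessarily the procedure in the paper. It treats ISSR data as dominant presence/absence (1/0) band profiles, collapses identical profiles into unique MLGs, and then joins MLGs into lineages by single linkage under a mismatch threshold. The profiles, the threshold, and the function names are all hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two band profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def group_lineages(profiles, threshold):
    """Collapse profiles to unique MLGs, then join MLGs into lineages.

    Two MLGs end up in the same lineage if they are linked by a chain of
    MLGs, each differing by at most `threshold` loci (single linkage).
    Returns a dict mapping each MLG to a lineage label.
    """
    mlgs = sorted(set(profiles))          # unique multilocus genotypes
    parent = list(range(len(mlgs)))       # union-find forest over MLGs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mlgs)), 2):
        if hamming(mlgs[i], mlgs[j]) <= threshold:
            parent[find(i)] = find(j)
    return {mlg: find(i) for i, mlg in enumerate(mlgs)}


# Hypothetical presence/absence profiles for five ramets at six loci.
ramets = ["110010", "110011", "110001", "001100", "001101"]
lineages = group_lineages(ramets, threshold=1)
print(len(lineages), "MLGs in", len(set(lineages.values())), "lineages")
```

Here every ramet is its own MLG, but a threshold of one mismatch collapses them into two lineages. That is the same qualitative pattern the abstract reports: unique MLGs, few MLLs.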

The link between traits and performance in Protea

If you’re reading this, you probably know enough about me to know that my students and I have been working on Protea for the last 10-15 years. Today I am pleased to report that the most recent work, from Kristen Nolting’s PhD dissertation, has appeared in Annals of Botany. The advance publication version appeared nearly a year ago, but the paper is now officially out in a special issue focusing on intraspecific trait variation in plants. Here’s the abstract and a link.

Intraspecific trait variation influences physiological performance and fitness in the South Africa shrub genus Protea (Proteaceae)

Kristen M Nolting, Rachel Prunier, Guy F Midgley, Kent E Holsinger

Background and Aims

Global plant trait datasets commonly identify trait relationships that are interpreted to reflect fundamental trade-offs associated with plant strategies, but often these trait relationships are not identified when evaluating them at smaller taxonomic and spatial scales. In this study we evaluate trait relationships measured on individual plants for five widespread Protea species in South Africa to determine whether broad-scale patterns of structural trait (e.g. leaf area) and physiological trait (e.g. photosynthetic rates) relationships can be detected within natural populations, and if these traits are themselves related to plant fitness.

Methods

We evaluated the variance structure (i.e. the proportional intraspecific trait variation relative to among-species variation) for nine structural traits and six physiological traits measured in wild populations. We used a multivariate path model to evaluate the relationships between structural traits and physiological traits, and the relationship between these traits and plant size and reproductive effort.

Key Results

While intraspecific trait variation is relatively low for structural traits, it accounts for between 50 and 100 % of the variation in physiological traits. Furthermore, we identified few trait associations between any one structural trait and physiological trait, but multivariate regressions revealed clear associations between combinations of structural traits and physiological performance (R² = 0.37–0.64), and almost all traits had detectable associations with plant fitness.

Conclusions

Intraspecific variation in structural traits leads to predictable differences in individual-level physiological performance in a multivariate framework, even though the relationship of any particular structural trait to physiological performance may be weak or undetectable. Furthermore, intraspecific variation in both structural and physiological traits leads to differences in plant size and fitness. These results demonstrate the importance of considering measurements of multivariate phenotypes on individual plants when evaluating trait relationships and how trait variation influences predictions of ecological and evolutionary outcomes.

Annals of Botany 127:519–531; 2021 https://doi.org/10.1093/aob/mcaa060
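The “variance structure” in the Methods is the share of trait variation found among individuals within species rather than among species. A simple sum-of-squares version of that partition is sketched below. The paper’s own estimates may come from a different, model-based estimator, and the trait values and names here are made up for illustration.

```python
from statistics import fmean


def within_group_share(groups):
    """Fraction of total variation (sum of squares) that lies within groups.

    `groups` maps a group label (e.g. a species) to a list of trait values
    measured on individuals in that group.
    """
    values = [x for g in groups.values() for x in g]
    grand_mean = fmean(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups.values() for x in g)
    return ss_within / ss_total


# Hypothetical photosynthetic rates for individuals of two species.
rates = {"species A": [1.0, 3.0], "species B": [5.0, 7.0]}
print(f"{within_group_share(rates):.0%} of the variation is within species")
```

A value near 1 means individuals differ almost as much within a species as across species. The abstract reports values in that range (50 to 100 %) for physiological traits.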

Causal inference in ecology – An update

Causal inference in ecology – links to the series

A couple of years ago I wrote a series of posts on causal inference in ecology. In it I explored the Rubin causal model and concluded that

the Rubin causal model isn’t likely to help me make causal inferences with the kinds of observational data I collect.

I haven’t changed my mind about that, but I do have an update.

I’ve been reading Regression and Other Stories, by Andrew Gelman, Jennifer Hill, and Aki Vehtari, which I highly recommend if you use regression for any purpose in your research. I just finished Chapter 21, “Additional topics in causal inference”, and its last section, 21.5 “Causes of effects and effects of causes”, is particularly relevant to my earlier conclusion. Not surprisingly, Gelman, Hill, and Vehtari (GHV) explain the role that regression can play in generating hypotheses better than I did. You’ll need to read the chapters on causal inference (or be familiar with the Rubin causal model) to fully appreciate their insight, but here it is in a nutshell.

We can make inferences about the effect of a cause when we (a) identify an intervention (a cause) that may have an effect and (b) randomize the intervention across experimental units (or do something that mimics random assignment by balancing on potential pre-observation confounders or by using an instrumental variable, a regression discontinuity, or difference-in-differences approach). Thought about in this way, the purpose of statistical analysis is to estimate the magnitude of an effect.

The regression analyses I typically do can be cast as an attempt to make inferences about the cause of an effect.1 Here’s where GHV have a better way of thinking about it than I did. Let’s suppose that I’m interested in environmental features that influence stomatal density, the example I discussed on 11 June 2018. In that post I showed that three principal components describing aspects of the environment have strong associations with stomatal density. GHV remind us that some other variable (or set of variables) could cause the observed differences in stomatal density and that, once we’ve taken that variable into account, none of the PCs would show an association with stomatal density.2 More importantly, they point out that the association suggests causal hypotheses that could account for it. To the extent that it’s important to us to dissect those causes, we can then do new experiments or make new observations (using Rubin’s causal model as a framework if we’re going to make causal inferences from an observational study) structured to estimate the effects those hypotheses suggest.

  1. I wrote “can be cast as an attempt” because I do my damnedest to make it clear that I’m only asserting that certain variables have stronger associations with the outcome I’m studying than others, not that those variables cause the outcome.
  2. Fortunately (for me), that’s consistent with what I wrote two years ago.

An R-Stan implementation of Bayesian inference for Wright’s F-statistics

Some of you know that many years ago Paul Lewis, Dipak Dey, and I wrote a paper describing a Bayesian approach to inferring population structure from dominant markers.1 You may also know that Paul Lewis and I wrote a Windoze program in C++, Hickory, that implemented the approach. We later extended Hickory for analysis of co-dominant markers. Later still, Feng Guo, Dipak, and I wrote another paper describing a Bayesian approach to (a) estimating population- and locus-specific effects on FST and (b) identifying loci where the posterior distribution of FST is markedly different from the overall estimate.2 If you know that (and maybe even if you don’t know all of that), you also know that Paul and I stopped maintaining Hickory a number of years ago. I moved from Windoze to Mac, and the library we were using to support the graphical user interface became too complicated for me to keep up with.
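For readers new to Wright’s F-statistics, one standard way to define F_ST at a biallelic locus is as the variance of allele frequencies among populations, scaled by its maximum possible value p̄(1 − p̄). Bayesian models in this tradition often treat population allele frequencies as Beta-distributed around a mean π, with a parameter θ that plays the role of F_ST. The sketch below illustrates that relationship by simulation. It is not Hickory’s code, its function names are hypothetical, and it ignores the complication that dominant markers don’t reveal genotypes directly, which is what Hickory was designed to handle.

```python
import random
from statistics import fmean, pvariance


def population_frequencies(pi, theta, n_populations, seed=None):
    """Draw allele frequencies for n populations around a mean frequency pi.

    Uses Beta(pi*(1-theta)/theta, (1-pi)*(1-theta)/theta), whose variance is
    theta * pi * (1 - pi), so theta plays the role of F_ST.
    """
    rng = random.Random(seed)
    scale = (1 - theta) / theta
    return [rng.betavariate(pi * scale, (1 - pi) * scale) for _ in range(n_populations)]


def fst(freqs):
    """Wright's F_ST as Var(p) / (p_bar * (1 - p_bar))."""
    p_bar = fmean(freqs)
    return pvariance(freqs, p_bar) / (p_bar * (1 - p_bar))


freqs = population_frequencies(pi=0.3, theta=0.1, n_populations=20000, seed=7)
print(f"estimated F_ST: {fst(freqs):.3f}")
```

With many simulated populations the estimate lands close to θ = 0.1. A full Bayesian analysis instead treats θ as unknown and estimates its posterior from a limited sample of populations and loci.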

I’ve had a few requests from people who were interested in using Hickory, but I just haven’t had the time to find a way to support them – until now.

Over the past several years, I’ve been using Stan for many different statistical analyses. When I received another request for Hickory a couple of weeks ago, I realized that I could pretty easily develop a new version of Hickory in R/Stan. This approach has several advantages over the standalone C++ code in the original Hickory.

  1. I don’t have to worry about writing the MCMC sampler myself. I use the very sophisticated Hamiltonian Monte Carlo sampler in Stan. Not only do I avoid the bother of writing my own sampler, I also have much greater confidence that the sampler is performing correctly. It’s written and maintained by experts, and its convergence diagnostics are far more sophisticated than those available for Metropolis-Hastings.
  2. It should be readily portable to any platform on which R is supported. The only requirement, for now, is that you have a C++ compiler installed. If you’re running a Mac, you may need to download Xcode. If you’re running Linux, you should be all set. If you’re running Windows, you can download Rtools from CRAN. I intend to submit the R package I’ve written to CRAN once I’ve tested it more thoroughly and provided some extensions to the crude functionality currently available. Once it’s on CRAN, you won’t even need a C++ compiler.
  3. I can develop an interface to adegenet and other R packages used for analysis and manipulation of genetic data so that Hickory can use data in many different formats supported by other packages.

A very early release of Hickory is available at GitHub. You should find all of the information you need to install and use it there. Let me know if you run into problems. I’ll do my best to walk you through them (and probably correct some errors I’ve made or at least improve the meager documentation in the process).

  1. Holsinger, K. E., Lewis, P. O., and Dey, D. K. 2002. A Bayesian approach to inferring population structure from dominant markers. Molecular Ecology 11:1157–1164.
  2. Guo, F., Dey, D. K., and Holsinger, K. E. 2009. A Bayesian hierarchical model for analysis of SNP diversity in multilocus, multipopulation samples. Journal of the American Statistical Association 104:142–154. http://doi.org/10.1198/jasa.2009.0010

Making accessible HTML from LaTeX sources — an additional experiment

Last week I reported on my initial experiments using Pandoc and LaTeXML to convert LaTeX to HTML. Here are links to the PDF produced with pdfLaTeX and the HTML:

If you’re like me, you’ll prefer the LaTeXML version to the Pandoc version, but as I pointed out, the LaTeXML version includes CSS to customize the styling and the Pandoc version doesn’t. I did a quick Google search, figured out how to add CSS (and a table of contents) to the HTML output from Pandoc, and found a very nice stylesheet to use (from Pascal Hertlief on Github). I may fiddle with Pascal’s CSS a bit, but there’s a good chance I won’t change it at all. It makes the HTML look really, really nice:

What I haven’t tried yet is converting LaTeX source that includes PDF figures. Let’s try that now and see how it works.

It took a while to get ImageMagick installed and to write a short Perl script that changes all of the references to EPS files into references to PNG files and converts the EPSs to PNGs, but I really like the results. With that, two of my three “to-dos” are out of the way:

  • Check CSS styling for Pandoc. (Done)
  • Show the results to an accessibility expert at UConn and get some feedback on the different approaches.
  • See what happens with figures when they’re included in a LaTeX document. (Done)

Now I just (just?) need to check with an accessibility expert to confirm that the HTML is accessible. If it is, I’m all set.

By the way, if you’re interested in seeing the Perl script, let me know. It will be posted in the Github archive where I post the LaTeX source for my notes later this fall, but I’d be happy to send you a copy now if you drop me a line.
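The Perl script itself isn’t reproduced here. As a rough illustration of the same two steps, here is a Python sketch that rewrites `\includegraphics{...eps}` references to point at PNG files and builds the ImageMagick commands that would render each EPS as a PNG. It assumes figures are included with `\includegraphics` and an explicit `.eps` extension; the function names and the 300 dpi density are hypothetical. The commands use ImageMagick 6’s `convert` (ImageMagick 7 prefers `magick`), and rasterizing EPS requires Ghostscript.

```python
import re

# \includegraphics, an optional [...] argument, then {name.eps}
INCLUDE_EPS = re.compile(r"(\\includegraphics(?:\[[^\]]*\])?\{)([^}]+?)\.eps\}")


def rewrite_eps_references(tex_source):
    """Return (new_source, basenames): .eps figure references become .png."""
    basenames = []

    def swap(match):
        basenames.append(match.group(2))
        return f"{match.group(1)}{match.group(2)}.png}}"

    return INCLUDE_EPS.sub(swap, tex_source), basenames


def convert_commands(basenames, density=300):
    """ImageMagick command lines to render each EPS as a PNG (not run here).

    `density` (dots per inch for rasterizing) is an assumed value.
    """
    return [["convert", "-density", str(density), f"{b}.eps", f"{b}.png"] for b in basenames]


source = r"\includegraphics[width=3in]{figs/drift.eps} and \includegraphics{het.eps}"
new_source, figures = rewrite_eps_references(source)
print(new_source)
for cmd in convert_commands(figures):
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` to do the actual conversion. References written without an extension (e.g. `{figs/drift}`) are deliberately left alone by this pattern.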

Making accessible HTML from LaTeX sources – some initial impressions

Some of you know that I’ve been making notes from my graduate course in Population Genetics available online for nearly 20 years (http://darwin.eeb.uconn.edu/uncommon-ground/eeb348/notes/). What a smaller number of you know is that I use LaTeX to write my notes and pdfLaTeX to produce PDFs from the LaTeX source. So far as I can tell (using ANDI), the PDFs produced in this way provide some elements that aid accessibility, but I am exploring options to produce HTML from the same source that might produce documents that are accessible to more readers. For my first experiment, I used the LaTeX file from 2019 that produced notes on resemblance among relatives. Here are links to three versions of the notes:

Both approaches to producing HTML are straightforward.

For Pandoc:

pandoc --standalone --mathjax -o quant-resemblance-pandoc.html quant-resemblance.tex

For LaTeXML:

latexml --includestyles --dest=quant-resemblance.xml quant-resemblance.tex
latexmlpost --dest=quant-resemblance-latexml.html quant-resemblance.xml

With the default options, I like the look of the LaTeXML version better, but it also includes CSS customizations and the Pandoc version doesn’t. It’s probably possible to include customized CSS with Pandoc, but I haven’t had a chance to investigate that yet. I also haven’t had a chance to consult anyone who knows how to judge the accessibility of documents. When I’ve had a chance to do that, I’ll return with a report. (Don’t hold your breath. I am a dean, so I don’t have a lot of time on my hands.)
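Pandoc’s documented `--css` option links a stylesheet into standalone HTML output, and `--toc` adds a table of contents. A small Python helper that adds those options to the Pandoc command above might look like this. `notes.css` is a hypothetical stylesheet name, and the helper only builds the command rather than running Pandoc.

```python
import shlex


def pandoc_command(tex_file, html_file, css=None, toc=False):
    """Build the pandoc call shown above, optionally linking a stylesheet
    (--css) and generating a table of contents (--toc)."""
    cmd = ["pandoc", "--standalone", "--mathjax"]
    if toc:
        cmd.append("--toc")
    if css:
        cmd.append(f"--css={css}")
    cmd += ["-o", html_file, tex_file]
    return cmd


cmd = pandoc_command("quant-resemblance.tex", "quant-resemblance-pandoc.html",
                     css="notes.css", toc=True)
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Because `--css` only adds a link, the stylesheet has to be published alongside the HTML for the styling to show up.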

Here’s my to-do list, so that I don’t forget:

  • Check CSS styling for Pandoc.
  • Show the results to an accessibility expert at UConn and get some feedback on the different approaches.
  • See what happens with figures when they’re included in a LaTeX document.

If you have additional questions, let me know, and I’ll add them to the list.

Some thoughts about my career and about biology over the last several decades

It’s a little frightening when it dawns on me that the world of biology that existed when I started graduate school is as far away in time as the New Synthesis was then. Introns had been discovered only a few years before. Allozymes were the rage, and Sanger sequencing was “the hot new thing.” Suffice it to say that a lot has changed.

Some of you know that I had the privilege of serving as President of the American Institute of Biological Sciences in 2006. As a result of that, I also have the privilege of being featured this month in BioScience. The piece is part of the In Their Own Words series. Here’s the abstract and a link:

In Their Own Words chronicles the stories of scientists who have made great contributions to their fields, particularly within the biological sciences. These short oral histories provide our readers a way to learn from and share their experiences. Each month, we will publish in the pages of BioScience and in our podcast, BioScience Talks (http://bioscienceaibs.libsyn.com), the results of these conversations. This second oral history is with Dr. Kent Holsinger, board of trustees distinguished professor of biology in the Department of Ecology and Evolutionary Biology at the University of Connecticut. He previously served as president of the American Institute of Biological Sciences.

https://doi.org/10.1093/biosci/biz136

If for some reason you’d like to hear the interview, there’s also a podcast link: http://bioscienceaibs.libsyn.com. I haven’t listened to the podcast, but the article came out reasonably well.

Microscale trait-environment associations in Protea

If you follow me (or Nora Mitchell) on Twitter, you saw several weeks ago that a publish-before-print version of our most recent paper appeared in the American Journal of Botany. This morning I noticed that the full published version is available on the AJB website. Here’s the citation and abstract:

Mitchell, N., and K. E. Holsinger. 2019. Microscale trait‐environment associations in two closely‐related South African shrubs. American Journal of Botany 106:211-222. doi: 10.1002/ajb2.1234

Premise of the Study
Plant traits are often associated with the environments in which they occur, but these associations often differ across spatial and phylogenetic scales. Here we study the relationship between microenvironment, microgeographical location, and traits within populations using co‐occurring populations of two closely related evergreen shrubs in the genus Protea.
Methods
We measured a suite of functional traits on 147 plants along a single steep mountainside where both species occur, and we used data‐loggers and soil analyses to characterize the environment at 10 microsites spanning the elevational gradient. We used Bayesian path analyses to detect trait‐environment relationships in the field for each species. We used complementary data from greenhouse grown seedlings derived from wild collected seed to determine whether associations detected in the field are the result of genetic differentiation.
Key Results
Microenvironmental variables differed substantially across our study site. We found strong evidence for six trait‐environment associations, although these differed between species. We were unable to detect similar associations in greenhouse‐grown seedlings.
Conclusions
Several leaf traits were associated with temperature and soil variation in the field, but the inability to detect these in the greenhouse suggests that differences in the field are not the result of genetic differentiation.

Saturday afternoon at Trail Wood

OK. This is mildly embarrassing. I moved to Connecticut in 1986, I was one of the co-founders of the Edwin Way Teale Lecture Series on Nature and the Environment in 1996, I’ve read A Naturalist Buys an Old Farm at least half a dozen times, and Trail Wood is less than 30 miles (40 minutes) from my home in Coventry, but it wasn’t until Saturday that I finally visited. It won’t be the last time. I expect to return once or twice a year to the Beaver Pond Trail, to cross Starfield and Firefly Meadow, and to visit the Summerhouse and Writing Cabin.

Black-eyed susan (Rudbeckia hirta) photographed at Trail Wood

A nice patch of black-eyed susan (Rudbeckia hirta) greeted me near the parking area, which is just a short walk from the house at Trail Wood. Rather than following Veery Lane, I turned left and followed the path through Firefly Meadow towards the small pond.

Edwin Way Teale’s writing cabin at Trail Wood

The Writing Cabin is on the southwest shore of the pond. I turned right and followed the northeast shore to the Summerhouse. From there I followed a path along the stone wall bordering Woodcock Pasture until it met the Shagbark Hickory Trail.

Spotted wintergreen (Chimaphila maculata) photographed at Trail Wood

I found spotted wintergreen (Chimaphila maculata) along the Shagbark Hickory Trail, which I followed to the Old Colonial Road. From there I followed the Beaver Pond Trail to the edge of the pond.

Beaver Pond at Trail Wood

After sitting for a while on a nice bench at the south end of the pond, I backtracked on the Beaver Pond Trail and followed the Fern Brook Trail through Starfield back to the house and then to the parking area. The whole walk was less than a mile and a half, and the total elevation gain was only 55 feet. It was definitely an easy walk, not a hike, but it was very pleasant, and it was nice to spend time on the old farm where Teale spent so much of his time.

So to anyone from UConn (or nearby) who reads this and hasn’t been to Trail Wood yet, take a couple of hours some afternoon, drive to Hampton, and explore. Trail Wood is easy to find, and it’s open from dawn to dusk. It’s a gem in our own backyard. And if you haven’t read A Naturalist Buys an Old Farm, do it now. You’ll enjoy your visit to Trail Wood even more if you do.