Uncommon Ground

Monthly Archive: September 2020

An R-Stan implementation of Bayesian inference for Wright’s F-statistics

Some of you know that many years ago Paul Lewis, Dipak Dey, and I wrote a paper describing a Bayesian approach to inferring population structure from dominant markers.1 You may also know that Paul Lewis and I wrote a Windoze program in C++, Hickory, that implemented the approach. We later extended Hickory for analysis of co-dominant markers. Later still, Feng Guo, Dipak, and I wrote another paper describing a Bayesian approach to (a) estimating population- and locus-specific effects on FST and (b) identifying loci where the posterior distribution of FST is markedly different from the overall estimate.2 If you know that (and maybe even if you don’t know all of that), you also know that Paul and I stopped maintaining Hickory a number of years ago. I moved from Windoze to Mac, and the library we were using to support the graphical user interface became too complicated for me to keep up with.

I’ve had a few requests from people who were interested in using Hickory, but I just haven’t had the time to find a way to support them – until now.

Over the past several years, I’ve been using Stan for many different statistical analyses. When I received another request for Hickory a couple of weeks ago, I realized that I could pretty easily develop a new version of Hickory in R/Stan. This approach has several advantages over the standalone C++ code in the original Hickory.

  1. I don’t have to worry about writing the MCMC sampler myself. I use the very sophisticated Hamiltonian Monte Carlo in Stan. I not only avoid me the bother of writing my own sampler, I have much greater confidence that the sampler is performing correctly. It’s written and maintained by experts, and the convergence diagnostics are far more sophisticated than for Metropolis-Hastings.
  2. It should be readily portable to any platform on which R is supported. The only requirement, for now, is that you have a C++ compiler installed. If you’re running a Mac, you may need to download Xcode. If you’re running Linux, you should be all set. If your running Windows, you can download Rtools from CRAN. I intend to submit the R package I’ve written to CRAN once I’ve tested it more thoroughly and provided some extensions to the crude functionality currently available. Once it’s on CRAN, you won’t even need a C++ compiler.
  3. I can develop an interface to adegenet and other R packages used for analysis and manipulation of genetic data so that Hickory can use data in many different formats supported by other packages.

A very early release of Hickory is available at GitHub. You should find all of the information you need to install and use it there. Let me know if you run into problems. I’ll do my best to walk you through them (and probably correct some errors I’ve made or at least improve the meager documentation in the process).

  1. Holsinger, K. E., Lewis, P. O., and Dey, D. K. 2002. A Bayesian approach to inferring population structure from dominant markers. Molecular Ecology 11:1157–1164.
  2. Guo, F., Dey, D. K., and Holsinger, K. E. 2009. A Bayesian hierarchical model for analysis of SNP diversity in multilocus, multipopulation samples. Journal of the American Statistical Association 104:142–154. http://doi.org/10.1198/jasa.2009.0010