Uncommon Ground

Author Archive: kent

Keep your data tidy

If you’ve spent any time using R, you probably know the name Hadley Wickham. He’s chief scientist at RStudio, the author of 4 books on R, and the author of several indispensable R packages, including ggplot2, dplyr, and devtools. I was reminded recently that several years ago, he wrote a very useful paper for the Journal of Statistical Software, “Tidy data” (August 2014, Volume 59, Issue 10, https://www.jstatsoft.org/article/view/v059i10).

If you are familiar with Hadley’s contributions to R, you won’t be surprised that tidy data has a simple, clean – tidy – set of requirements:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

That sounds simple, but it requires that many of us rethink the way we structure our data: no more column headers as values, no more storing of multiple variables in one column, no more storing some variables in rows and others in columns. Fortunately, Hadley is also the author of tidyr. I haven’t used it yet, but given how bad I am at starting with tidy data, I suspect I’ll be using it a lot in the future.
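
To make the first of those problems concrete (column headers that are really values), here’s a minimal sketch in plain Python. The table, its column names, and the `melt` helper are all hypothetical, invented for this illustration. tidyr is the R tool for this kind of reshaping; the sketch uses Python’s standard library only so that it runs without installing anything.

```python
# Hypothetical "messy" table: the year columns are values, not variables.
messy = [
    {"site": "A", "2015": 12, "2016": 15},
    {"site": "B", "2015": 7, "2016": 9},
]


def melt(rows, id_column, variable_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is carried along, not melted
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy_rows


# Each variable (site, year, count) is now a column,
# and each observation (one site in one year) is a row.
tidy = melt(messy, id_column="site", variable_name="year", value_name="count")
for row in tidy:
    print(row)
```

After reshaping, the two rows with two year columns each become four rows, one per site-year observation, which matches requirements 1 and 2 above.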

I’m sorry. P < 0.005 won't save you.

Recently, a group of distinguished psychologists posted a preprint on PsyArXiv arguing for a re-definition of the traditional significance threshold, lowering it from P < 0.05 to P < 0.005. They are concerned about reproducibility, and they argue that “changing the P value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.” That’s all true, but I’m afraid it won’t necessarily “improve the reproducibility of scientific research in many fields.”

Why? Let’s take a little trip down memory lane.

Almost a year ago I pointed out that we need to “Be wary of results from studies with small sample sizes, even if the effects are statistically significant.” I illustrated why with the following figure, produced using R code available at GitHub: https://github.com/kholsinger/noisy-data

What that figure shows is the distribution of P-values that pass the P < 0.05 significance threshold when the true difference between two populations is mu standard deviations (with the same standard deviation in both populations) and with equal sample sizes of n. The results are from 1000 random replications. As you can see, when the sample size is small, there’s a good chance that a significant result will have the wrong sign, i.e., the observed difference will be negative rather than positive, even if the between-population difference is 0.2 standard deviations. When the between-population difference is 0.05, you’re almost as likely to say the difference is in the wrong direction as to get it right.

Does changing the threshold to P < 0.005 help? Well, I changed the number of replications to 10,000, reduced the threshold to P < 0.005, and here’s what I got.

Do you see a difference? I don’t. I haven’t run a formal statistical test to compare the distributions, but I’m pretty sure they’d be indistinguishable.
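
If you’d like to poke at this yourself, here’s a rough sketch of a simulation in the same spirit. It is not the R code linked above. It is a Python stand-in that makes its own simplifying assumptions: both populations are normal with a known standard deviation of 1, significance is judged with a two-sided z-test, and the sample size of 10 per group is a value I picked for illustration. The function name `wrong_sign_share` is also made up for this example. Among the replicates that pass a given threshold, it counts the share whose observed difference has the wrong (negative) sign.

```python
import math
import random
from statistics import NormalDist


def wrong_sign_share(mu, n, alpha, reps, seed=1):
    """Simulate `reps` two-sample comparisons.

    Returns (number of significant results, share of those significant
    results whose observed difference is negative even though mu > 0).

    Assumption: known standard deviation of 1 in both populations, so a
    two-sided z-test is used. This is a simplification for illustration.
    """
    rng = random.Random(seed)
    std_error = math.sqrt(2.0 / n)  # SE of a difference of two means, sd = 1
    standard_normal = NormalDist()
    n_significant = 0
    n_wrong_sign = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # population with mean 0
        y = [rng.gauss(mu, 1.0) for _ in range(n)]   # population with mean mu
        diff = sum(y) / n - sum(x) / n
        z = diff / std_error
        p_value = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
        if p_value < alpha:
            n_significant += 1
            if diff < 0:
                n_wrong_sign += 1
    share = n_wrong_sign / n_significant if n_significant else float("nan")
    return n_significant, share


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        for mu in (0.05, 0.2):
            count, share = wrong_sign_share(mu=mu, n=10, alpha=alpha, reps=10_000)
            print(f"alpha={alpha}, mu={mu}: {count} significant, "
                  f"wrong-sign share = {share:.2f}")
```

Changing `n` shows how the wrong-sign share depends on sample size, and changing `alpha` lets you compare the two thresholds directly. Because of the simplifications, the numbers won’t match the figures exactly.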

In short, reducing the significance threshold to P < 0.005 will result in fewer investigators reporting statistically significant results. But unless the studies they do also have reasonably large sample sizes relative to the expected magnitude of any effect and the amount of variability within classes, they won’t be any more likely to know the direction of the effect than with the traditional threshold of P < 0.05.

The solution to the reproducibility crisis is better data, not better statistics.


Using weather to predict growth of forest trees

Last January I mentioned that I co-authored a paper that appeared on bioRxiv in which we combined tree ring and growth increment data to predict growth from weather and biophysical data. The paper has now appeared in Ecosphere, an open access journal from the Ecological Society of America. Here’s the abstract. You’ll find the full citation below.

Fusing tree-ring and forest inventory data to infer influences on tree growth

Better understanding and prediction of tree growth is important because of the many ecosystem services provided by forests and the uncertainty surrounding how forests will respond to anthropogenic climate change. With the ultimate goal of improving models of forest dynamics, here we construct a statistical model that combines complementary data sources, tree-ring and forest inventory data. A Bayesian hierarchical model was used to gain inference on the effects of many factors on tree growth—individual tree size, climate, biophysical conditions, stand-level competitive environment, tree-level canopy status, and forest management treatments—using both diameter at breast height (dbh) and tree-ring data. The model consists of two multiple regression models, one each for the two data sources, linked via a constant of proportionality between coefficients that are found in parallel in the two regressions. This model was applied to a data set of ~130 increment cores and ~500 repeat measurements of dbh at a single site in the Jemez Mountains of north-central New Mexico, USA. The tree-ring data serve as the only source of information on how annual growth responds to climate variation, whereas both data types inform non-climatic effects on growth. Inferences from the model included positive effects on growth of seasonal precipitation, wetness index, and height ratio, and negative effects of dbh, seasonal temperature, southerly aspect and radiation, and plot basal area. Climatic effects inferred by the model were confirmed by a dendroclimatic analysis. Combining the two data sources substantially reduced uncertainty about non-climate fixed effects on radial increments. This demonstrates that forest inventory data measured on many trees, combined with tree-ring data developed for a small number of trees, can be used to quantify and parse multiple influences on absolute tree growth. 
We highlight the kinds of research questions that can be addressed by combining the high-resolution information on climate effects contained in tree rings with the rich tree- and stand-level information found in forest inventories, including projection of tree growth under future climate scenarios, carbon accounting, and investigation of management actions aimed at increasing forest resilience.

Evans, M. E. K., D. A. Falk, A. Arizpe, T. L. Swetnam, F. Babst, and K. E. Holsinger. 2017. Fusing tree-ring and forest inventory data to infer influences on tree growth. Ecosphere 8(7):e01889. doi: 10.1002/ecs2.1889

A journal I don’t trust – Journal of Forensic Medicine Forecast

I won’t say that the Journal of Forensic Medicine Forecast is a predatory open-access journal. I will say that I am suspicious. Why? Because I received an invitation over the weekend to join its editorial board. If you’re reading this post, you know enough about me to know that I have no expertise at all in forensic medicine. How could I not be suspicious of any journal, open access or not, that invites me to join its editorial board when I lack expertise relevant to its subject?

Here’s the text of the e-mail I received, with a name redacted to protect the individual’s privacy:

Dear Dr. Kent E Holsinger,


Journal of Forensic Medicine Forecast is an international, Open Access and peer-reviewed journal has been launched by ScienceForecast Publications. The journal devoted to Clinical forensic medicine, digital and multimedia sciences, forensic analytical chemistry, forensic anthropology, forensic biology, forensic education, forensic entomology, forensic genetics, forensic microbiology, forensic odontology, forensic osteology, forensic pathology, forensic physical evidence, forensic psychiatry, forensic radiology, forensic serology, blood spatter analysis, drug delivery, crime scene investigation, dna fingerprinting, toxicological, human toxicology, applied toxicology, experimental toxicology, environmental toxicology, investigative toxicology.

At the onset, we are going to invite editorial board members, journal is seeking energetic, qualified and high profile researchers to join its editorial board. We believe the quality of a journal is depends on the quality of its Editorial Board.

Based on your high expertise in the field of forensic medicine, we are inviting you to join as editorial board member of our journal. As an EB member of our journal you may be required to occasionally review papers, solicit articles from your colleagues/acquaintances and help promote the journal at initial stage.

If you are interested to join as our Editorial Board member, please reply with your latest CV, photograph along with research interests as an email attachment.

To visit Desktop site: www.scienceforecastoa.com

We look forward to receiving your valuable response.

Best Regards,
<name redacted>
Editorial Office: Journal of Forensic Medicine Forecast
Tipple, View drive W
Ohio – 43016

This is not a spam email. If you are not interested to join as Editorial Board Member of this journal, please reply to this email with “unsubscribe” in the subject line.

Lecture notes in population genetics – final version from Spring 2017

I’ve finally had time to clean and post the final version of lecture notes from my graduate course in population genetics last spring. The individual lectures have been available since I revised them for class, meaning that the last set of them was available in late April. You will find links to the individual lecture notes at http://darwin.eeb.uconn.edu/uncommon-ground/eeb348/notes/. If you’re interested in a particular topic in population genetics and I have a lecture that covers the topic, that’s probably where you’ll want to go.

If you want a single-volume reference to population genetics (including some old notes that I no longer maintain), you’ll find a PDF (5.89MB, 322 pages) at Figshare (doi: 10.6084/m9.figshare.100687.v2). If you want to print the PDF, I recommend that you print it on a double-sided printer. You can then put the pages in a binder and flip through them as if it were a bound book.

If you use LaTeX (and you’re a glutton for punishment), the LaTeX source and EPS files (for figures) are available in a GitHub repository (https://kholsinger.github.io/Lecture-Notes-in-Population-Genetics/).

These notes are released under a Creative Commons Attribution-ShareAlike license (http://creativecommons.org/licenses/by-sa/4.0/). I hope you find them useful. If you find errors in them, please let me know.

Doctoral Dissertation Improvement Grants – RIP

The National Science Foundation released this Dear Colleague letter yesterday:

June 6, 2017

Dear Colleague:

With this Dear Colleague Letter, the Directorate for Biological Sciences (BIO) is notifying members of the research communities served by the Division of Integrative Organismal Systems (IOS) and the Division of Environmental Biology (DEB) to changes to the Doctoral Dissertation Improvement Grant (DDIG) Program.

Following a process of internal review and discussion regarding available resources, both the DEB and IOS Divisions will no longer accept DDIG proposals. This difficult decision was necessitated because of increasing workload and changes in Division priorities. This change is consistent with decisions made by other programs in BIO, which have not participated in the DDIG competition for more than a decade. This decision does not affect DDIGs that are already awarded.

We recognize that the independent research that was encouraged by the DDIGs has been an important aspect of training the next generation of scientists; we hope that this culture will continue. BIO continues to support graduate student participation in PI-led research across the entire spectrum of topics supported by its programs. Proposals for conferences are encouraged to include support for graduate and postdoctoral trainee travel and attendance. Further, NSF continues to support graduate research through the Graduate Research Fellowship Program (GRFP) and the NSF Research Traineeship Program (NRT).

Please see the Frequently Asked Questions (FAQs) (NSF 17-095) related to this DCL for more information.

If you have any questions pertaining to graduate student support under existing awards or future grant proposals, please contact the cognizant program director in the relevant Division.

James L. Olds
Assistant Director
Directorate for Biological Sciences

I hesitate to second-guess my colleagues at NSF. I know many of the program officers in the Biological Sciences Directorate and especially those in the Division of Environmental Biology. I know that they reached this decision because they believe that NSF can more effectively support research in life sciences by redirecting resources currently used to support Doctoral Dissertation Improvement Grants to other purposes, and I am confident their evaluation included an assessment of the impact on training of the next generation of life scientists. I also agree with many comments I’ve seen that DDIGs provide a great return on investment, at least in terms of the quality and quantity of research (and training) done with DDIG support. But I also know from conversations with current and former NSF officials that there are large costs in time and money associated with reviewing DDIGs. I don’t have access to the data I would need to make a fair evaluation of the costs and benefits of the program, so I have no choice but to trust the judgment of my NSF colleagues.

Still, it saddens me to see this program go away. It has been an important part of PhD training in environmental biology for decades.

Discussing privilege in environmental conservation

I last taught my graduate course in conservation biology in Fall 2015. Holly Brown, my teaching assistant in the course, had to fill in for me a couple of times because of commitments that took me out of town. She designed a creative and powerful exercise for one of the times I was out of town. In written evaluations of the course, almost every student reported that it was eye opening and, quite possibly, the most useful exercise in the course. What was this creative and powerful exercise? Holly’s version of a privilege walk. If you don’t know what that is or you want to know how she used a privilege walk in the context of conservation or both, it’s your lucky day. A paper describing the exercise recently appeared in Conservation Biology. Here’s the citation and a link.

Brown, H.M., A. Kamath, and M. Rubega. 2017. Facilitating discussions about privilege among future conservation practitioners. Conservation Biology 31:727-730. doi: 10.1111/cobi.12810

Causes of genetic differentiation in Protea repens

American Journal of Botany Volume 104, Number 5. May 2017.

Protea repens is the most widespread member of the genus. It was one of the focal species in our recently completed Dimensions of Biodiversity project. Part of the project involved genotyping-by-sequencing analyses of 663 individuals from 19 populations spanning most of the geographical range of the species. We summarize results of those analyses in a paper that just appeared in advance of the May issue (cover photo featured above) of the American Journal of Botany. Here’s the abstract. You’ll find the citation and a link at the bottom.

PREMISE OF THE STUDY: The Cape Floristic Region (CFR) of South Africa is renowned for its botanical diversity, but the evolutionary origins of this diversity remain controversial. Both neutral and adaptive processes have been implicated in driving diversification, but population-level studies of plants in the CFR are rare. Here, we investigate the limits to gene flow and potential environmental drivers of selection in Protea repens L. (Proteaceae L.), a widespread CFR species.
METHODS: We sampled 19 populations across the range of P. repens and used genotyping by sequencing to identify 2066 polymorphic loci in 663 individuals. We used a Bayesian FST outlier analysis to identify single-nucleotide polymorphisms (SNPs) marking genomic regions that may be under selection; we used those SNPs to identify potential drivers of selection and excluded them from analyses of gene flow and genetic structure.
RESULTS: A pattern of isolation by distance suggested limited gene flow between nearby populations. The populations of P. repens fell naturally into two or three groupings, which corresponded to an east-west split. Differences in rainfall seasonality contributed to diversification in highly divergent loci, as do barriers to gene flow that have been identified in other species.
CONCLUSIONS: The strong pattern of isolation by distance is in contrast to the findings in the only other widespread species in the CFR that has been similarly studied, while the effects of rainfall seasonality are consistent with well-known patterns. Assessing the generality of these results will require investigations of other CFR species.

Prunier, R., M. Akman, C.T. Kremer, N. Aitken, A. Chuah, J. Borevitz, and K. E. Holsinger. Isolation by distance and isolation by environment contribute to population differentiation in Protea repens (Proteaceae L.), a widespread South African species. American Journal of Botany doi: 10.3732/ajb.1600232 

This is cool (if you’re a typography nerd)

A portion of the fontmap of Google Fonts generated by a designer from Ideo (http://fontmap.ideo.com)

Kevin Ho, software design lead at Ideo, created a fascinating tool to explore the 750+ typefaces available on Google Fonts. I’m not a designer, and I use only a small number of fonts, so I don’t need this tool, but I’ve been fascinated by printing and typography for several decades. I can’t stop playing with this fontmap, and I had to share the fun. There’s a nice article at Fast Company describing the project and how Ho used two open source algorithms to create it. Maybe future versions of Word will provide a fontmap to explore choices instead of vertical lists.

2017 Graduate Commencement Ceremonies @UConn

The University of Connecticut celebrated its 138th Commencement exercises last weekend. The Graduate School now confers so many degrees that we have two ceremonies, a ceremony for recipients of master’s degrees on Saturday afternoon and a ceremony for recipients of doctoral degrees on Monday evening.

Stuart Rothenburg, who received his PhD in Political Science from UConn, addressed the graduating class at the master’s ceremony. If you’d like to see his remarks, follow the link below, click on “Graduate School Ceremony: Masters Candidates, May 6, 2017”, and then click on “Commencement Address” at the left.

I addressed the graduating class at the doctoral ceremony on behalf of Elizabeth Jockusch, this year’s winner of the Edward C. Marth Award for Mentorship, and Takiyah Harper-Shipman was our student speaker. If you’d like to see my remarks, follow the link below, click on “Graduate School Ceremony: Doctoral Candidates, May 8, 2017”,  and then click on “Welcome Remarks” at the left. After a brief welcome from Interim Provost Jeremy Teitelbaum, you’ll see me. If you’d like to see Takiyah’s remarks, click on “Commencement Address” instead. If for some reason you’d like to read my remarks, keep scrolling down (or click through if you’re on the home page).

University of Connecticut Commencement Ceremonies 2017 (from Total Webcasting)