Uncommon Ground

Monthly Archive: August 2017

The solar eclipse at UConn


Solar eclipse prediction for Storrs from Wolfram

This afternoon at 2:45pm EDT, the solar eclipse will reach its maximum in Storrs, with about 70% coverage. The figure above is a screenshot from my iPhone of the Wolfram Precision Eclipse Computation for Storrs. Follow that link to get the results for your location. The Physics Department at UConn will be hosting an eclipse viewing party on Horsebarn Hill, with solar telescopes, a short public lecture, and other activities. Unfortunately, I won’t be able to attend. From 2:00 to 3:00pm I will be welcoming the new class of graduate fellows, and from 3:00 to 4:00pm I will be welcoming all new and continuing fellows and their faculty advisors at an ice cream social. We will be meeting in the Alumni House, so we should see the darkening outside, and I may suggest that we take a short break a little before 2:45pm to go outside.

It’s almost certainly too late to get eclipse glasses, so if you don’t already have them, you’ll need to find welder’s goggles (shade 14 or darker) or a solar telescope. If you can’t find any of those, you can still build a pinhole eclipse viewer from a cardboard box and a few simple tools. Whatever you do, don’t look at the eclipse without eye protection. A man in Portland, Oregon, looked at an eclipse for no more than 20 seconds when he was in high school in 1963. It burned a hole in the retina of his right eye. Don’t let that happen to you.

Keep your data tidy

If you’ve spent any time using R, you probably know the name Hadley Wickham. He’s chief scientist at RStudio, the author of 4 books on R, and the author of several indispensable R packages, including ggplot2, dplyr, and devtools. I was reminded recently that several years ago, he wrote a very useful paper for the Journal of Statistical Software, “Tidy data” (August 2014, Volume 59, Issue 10, https://www.jstatsoft.org/article/view/v059i10).

If you are familiar with Hadley’s contributions to R, you won’t be surprised that tidy data has a simple, clean – tidy – set of requirements:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

That sounds simple, but it requires many of us to rethink the way we structure our data: no more column headers that are really values, no more storing multiple variables in one column, and no more storing some variables in rows and others in columns. Fortunately, Hadley is also the author of tidyr. I haven’t used it yet, but given how bad I am at starting with tidy data, I suspect I’ll be using it a lot in the future.
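To make “column headers as values” concrete, here is a minimal pure-Python sketch of the most common fix: turning a wide table, where each year is its own column, into a long one with a single year column. The data (counts per plot per year) and the `melt` helper are hypothetical illustrations; in R this is the job of tidyr’s `gather()`, and in Python, `pandas.melt` does the same thing on data frames.

```python
def melt(rows, id_col, var_name, value_name):
    """Reshape wide rows into long rows: one output row per (id, column) pair.

    Every column other than id_col is treated as a header that is really a
    value of the variable var_name (e.g., a year).
    """
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            long_rows.append({id_col: row[id_col], var_name: col, value_name: value})
    return long_rows


# Hypothetical untidy table: the years 2015 and 2016 are stored as column headers.
wide = [
    {"plot": "A", "2015": 12, "2016": 15},
    {"plot": "B", "2015": 9, "2016": 11},
]

# Tidy version: each variable (plot, year, count) is a column, each observation a row.
long_rows = melt(wide, "plot", "year", "count")
for r in long_rows:
    print(r)
```

Each row of the result is one observation (one plot in one year), which is exactly the second rule in the list above.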

I’m sorry. P < 0.005 won't save you.

Recently, a group of distinguished psychologists posted a preprint on PsyArXiv arguing for a redefinition of the traditional significance threshold, lowering it from P < 0.05 to P < 0.005. They are concerned about reproducibility, and they argue that “changing the P value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.” That’s all true, but I’m afraid it won’t necessarily “improve the reproducibility of scientific research in many fields.”

Why? Let’s take a little trip down memory lane.

Almost a year ago I pointed out that we need to “Be wary of results from studies with small sample sizes, even if the effects are statistically significant.” I illustrated why with the following figure produced using R code available at Github: https://github.com/kholsinger/noisy-data

What that figure shows is the distribution of P-values that pass the P < 0.05 significance threshold when the true difference between two populations is mu standard deviations (with the same standard deviation in both populations) and both samples have size n. The results are from 1000 random replications. As you can see, when the sample size is small there’s a good chance that a significant result will have the wrong sign, i.e., the observed difference will be negative rather than positive, even if the between-population difference is 0.2 standard deviations. When the between-population difference is 0.05 standard deviations, you’re almost as likely to say the difference is in the wrong direction as to get it right.
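For readers who want to experiment without R, here is a minimal Python sketch of this kind of simulation. It is not the R code in the noisy-data repository. As a simplifying assumption, it uses a two-sided z-test with the within-population standard deviation known to be 1 instead of a t-test, and the function names are hypothetical. It draws two samples of size n, keeps the observed differences that pass the threshold, and reports how often a significant difference has the wrong (negative) sign.

```python
import math
import random
from statistics import NormalDist, fmean


def significant_differences(mu, n, alpha, replications, seed=1):
    """Simulate populations with means 0 and mu (SD 1); keep significant differences.

    Simplifying assumption: a two-sided z-test with the SD known to be 1.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n)  # standard error of a difference in means when SD = 1
    kept = []
    for _ in range(replications):
        sample_1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_2 = [rng.gauss(mu, 1.0) for _ in range(n)]
        diff = fmean(sample_2) - fmean(sample_1)
        if abs(diff) / se > z_crit:
            kept.append(diff)
    return kept


def wrong_sign_fraction(mu, n, alpha, replications=10_000, seed=1):
    """Fraction of significant differences that are negative (the wrong sign when mu > 0)."""
    kept = significant_differences(mu, n, alpha, replications, seed)
    if not kept:
        return float("nan")  # nothing passed the threshold
    return sum(d < 0 for d in kept) / len(kept)


# Compare the two thresholds for a small true difference and a small sample.
for alpha in (0.05, 0.005):
    print(alpha, wrong_sign_fraction(mu=0.05, n=10, alpha=alpha))
```

Plotting the values returned by `significant_differences` for several combinations of mu and n gives a figure of the same general kind as the one above.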

Does changing the threshold to P < 0.005 help? Well, I increased the number of replications to 10,000, reduced the threshold to P < 0.005, and here’s what I got.

Do you see a difference? I don’t. I haven’t run a formal statistical test to compare the distributions, but I’m pretty sure they’d be indistinguishable.

In short, reducing the significance threshold to P < 0.005 will result in fewer investigators reporting statistically significant results. But unless the studies they do also have reasonably large sample sizes relative to the expected magnitude of any effect and the amount of variability within classes, they won’t be any more likely to know the direction of the effect than with the traditional threshold of P < 0.05.
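The role of sample size relative to effect size and within-group variability can be made explicit with a closed-form calculation, under the assumption of a two-sided z-test with a known, common standard deviation sigma. The observed difference, in standard-error units, is normally distributed around delta = mu / (sigma * sqrt(2/n)), so the chance that a significant result has the wrong sign (sometimes called a Type S, or sign, error) depends on mu, sigma, and n only through delta, plus the critical value set by the threshold. The function name below is hypothetical.

```python
import math
from statistics import NormalDist


def wrong_sign_probability(mu, sigma, n, alpha):
    """P(observed difference < 0 | significant two-sided z-test), for a true difference mu >= 0.

    Assumes two groups of size n with a common, known standard deviation sigma.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    delta = mu / (sigma * math.sqrt(2 / n))  # true difference in standard-error units
    wrong = std.cdf(-z_crit - delta)  # significant with a negative estimate
    right = std.cdf(delta - z_crit)   # significant with a positive estimate
    return wrong / (wrong + right)


# The wrong-sign rate falls as n grows relative to (mu / sigma).
for n in (10, 100, 1000):
    print(n, wrong_sign_probability(mu=0.2, sigma=1.0, n=n, alpha=0.05))
```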

The solution to the reproducibility crisis is better data, not better statistics.


Using weather to predict growth of forest trees

Last January I mentioned that I co-authored a paper that appeared on bioRxiv in which we combined tree ring and growth increment data to predict growth from weather and biophysical data. The paper has now appeared in Ecosphere, an open access journal from the Ecological Society of America. Here’s the abstract. You’ll find the full citation below.

Fusing tree-ring and forest inventory data to infer influences on tree growth

Better understanding and prediction of tree growth is important because of the many ecosystem services provided by forests and the uncertainty surrounding how forests will respond to anthropogenic climate change. With the ultimate goal of improving models of forest dynamics, here we construct a statistical model that combines complementary data sources, tree-ring and forest inventory data. A Bayesian hierarchical model was used to gain inference on the effects of many factors on tree growth—individual tree size, climate, biophysical conditions, stand-level competitive environment, tree-level canopy status, and forest management treatments—using both diameter at breast height (dbh) and tree-ring data. The model consists of two multiple regression models, one each for the two data sources, linked via a constant of proportionality between coefficients that are found in parallel in the two regressions. This model was applied to a data set of ~130 increment cores and ~500 repeat measurements of dbh at a single site in the Jemez Mountains of north-central New Mexico, USA. The tree-ring data serve as the only source of information on how annual growth responds to climate variation, whereas both data types inform non-climatic effects on growth. Inferences from the model included positive effects on growth of seasonal precipitation, wetness index, and height ratio, and negative effects of dbh, seasonal temperature, southerly aspect and radiation, and plot basal area. Climatic effects inferred by the model were confirmed by a dendroclimatic analysis. Combining the two data sources substantially reduced uncertainty about non-climate fixed effects on radial increments. This demonstrates that forest inventory data measured on many trees, combined with tree-ring data developed for a small number of trees, can be used to quantify and parse multiple influences on absolute tree growth. 
We highlight the kinds of research questions that can be addressed by combining the high-resolution information on climate effects contained in tree rings with the rich tree- and stand-level information found in forest inventories, including projection of tree growth under future climate scenarios, carbon accounting, and investigation of management actions aimed at increasing forest resilience.

Evans, M. E. K., D. A. Falk, A. Arizpe, T. L. Swetnam, F. Babst, and K. E. Holsinger. 2017. Fusing tree-ring and forest inventory data to infer influences on tree growth. Ecosphere 8(7):e01889. doi: 10.1002/ecs2.1889
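The core idea in the abstract, two regressions whose coefficients are tied together by a constant of proportionality, can be illustrated with a toy. This is not the Bayesian hierarchical model from the paper. It is a hypothetical least-squares sketch in which every coefficient is shared between the two regressions, the data are made up and noise-free, and the fit uses alternating least squares (solve for the coefficients b with the constant k fixed, then for k with b fixed). All function names are illustrative.

```python
def solve(a, rhs):
    """Solve the square system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(rows, b):
    """Linear predictor X b."""
    return [sum(v * bj for v, bj in zip(row, b)) for row in rows]


def fit_linked(x1, y1, x2, y2, iterations=500):
    """Fit y1 ~ X1 b and y2 ~ k * X2 b (shared b, proportionality constant k)."""
    k = 1.0  # starting value for the proportionality constant
    for _ in range(iterations):
        # Given k, b is an ordinary least-squares fit to the stacked data [X1; k * X2].
        stacked = x1 + [[k * v for v in row] for row in x2]
        b = least_squares(stacked, y1 + y2)
        # Given b, k is the no-intercept slope of y2 on the linear predictor X2 b.
        eta = predict(x2, b)
        k = sum(e * y for e, y in zip(eta, y2)) / sum(e * e for e in eta)
    return b, k


# Hypothetical noise-free data generated with b = [0.5, -0.3] and k = 2.0.
x1 = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7], [1.0, 0.9]]
x2 = [[1.0, -1.0], [1.0, 0.4], [1.0, 2.0]]
y1 = predict(x1, [0.5, -0.3])
y2 = [2.0 * v for v in predict(x2, [0.5, -0.3])]
b, k = fit_linked(x1, y1, x2, y2)
print(b, k)
```

Because both data sets inform the shared coefficients, the stacked fit uses all of the observations at once, which is the intuition behind the abstract’s statement that combining the two data sources reduced uncertainty about the non-climate effects.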

A journal I don’t trust – Journal of Forensic Medicine Forecast

I won’t say that the Journal of Forensic Medicine Forecast is a predatory open-access journal. I will say that I am suspicious. Why? Because I received an invitation over the weekend to join its editorial board. If you’re reading this post, you know enough about me to know that I have no expertise at all in forensic medicine. How could I not be suspicious of any journal, open access or not, that invites me to join its editorial board when I lack expertise relevant to its subject?

Here’s the text of the e-mail I received, with a name redacted to protect the individual’s privacy:

Dear Dr. Kent E Holsinger,


Journal of Forensic Medicine Forecast is an international, Open Access and peer-reviewed journal has been launched by ScienceForecast Publications. The journal devoted to Clinical forensic medicine, digital and multimedia sciences, forensic analytical chemistry, forensic anthropology, forensic biology, forensic education, forensic entomology, forensic genetics, forensic microbiology, forensic odontology, forensic osteology, forensic pathology, forensic physical evidence, forensic psychiatry, forensic radiology, forensic serology, blood spatter analysis, drug delivery, crime scene investigation, dna fingerprinting, toxicological, human toxicology, applied toxicology, experimental toxicology, environmental toxicology, investigative toxicology.

At the onset, we are going to invite editorial board members, journal is seeking energetic, qualified and high profile researchers to join its editorial board. We believe the quality of a journal is depends on the quality of its Editorial Board.

Based on your high expertise in the field of forensic medicine, we are inviting you to join as editorial board member of our journal. As an EB member of our journal you may be required to occasionally review papers, solicit articles from your colleagues/acquaintances and help promote the journal at initial stage.

If you are interested to join as our Editorial Board member, please reply with your latest CV, photograph along with research interests as an email attachment.

To visit Desktop site: www.scienceforecastoa.com

We look forward to receiving your valuable response.

Best Regards,
<name redacted>
Editorial Office: Journal of Forensic Medicine Forecast
Tipple, View drive W
Ohio – 43016

This is not a spam email. If you are not interested to join as Editorial Board Member of this journal, please reply to this email with “unsubscribe” in the subject line.

Lecture notes in population genetics – final version from Spring 2017

I’ve finally had time to clean up and post the final version of the lecture notes from my graduate course in population genetics last spring. The individual lectures have been available since I revised them for class, meaning that the last set of them was available in late April. You will find links to the individual lecture notes at http://darwin.eeb.uconn.edu/uncommon-ground/eeb348/notes/. If you’re interested in a particular topic in population genetics and I have a lecture that covers the topic, that’s probably where you’ll want to go.

If you want a single-volume reference to population genetics (including some old notes that I no longer maintain), you’ll find a PDF (5.89MB, 322 pages) at Figshare (doi: 10.6084/m9.figshare.100687.v2). If you want to print the PDF, I recommend that you print it on a double-sided printer. You can then put the pages in a binder and flip through them as if it were a bound book.

If you use LaTeX (and you’re a glutton for punishment), the LaTeX source and EPS files (for figures) are available in a GitHub repository (https://kholsinger.github.io/Lecture-Notes-in-Population-Genetics/).

These notes are released under a Creative Commons Attribution-ShareAlike license (http://creativecommons.org/licenses/by-sa/4.0/). I hope you find them useful. If you find errors in them, please let me know.