Uncommon Ground

Author Archive: kent

Being Bayesian won’t save you

Last week I pointed out that you should

Be wary of results from studies with small sample sizes, even if the effects are statistically significant.

Now you may be thinking to yourself: “I’m a Bayesian, and I use somewhat informative priors. This doesn’t apply to me.” Well, I’m afraid you’re wrong. Here are results from analysis of data simulated according to the same conditions I used last week in exploring P-values. The prior on each mean is N(0, 1), and the prior on each standard deviation is half-N(0, 1).

Mean   Sample size   Power      Wrong sign
0.05   10            39/1000    18/39
0.05   50            59/1000    12/59
0.05   100           47/1000    5/47
0.10   10            34/1000    8/34
0.10   50            81/1000    10/81
0.10   100           115/1000   6/115
0.20   10            62/1000    7/62
0.20   50            158/1000   2/158
0.20   100           292/1000   0/292

Here “Power” refers to the number of times (out of 1000 replicates) the symmetric 95% credible intervals do not overlap 0, which is when we’d normally conclude we have evidence that the means of the two populations are different. Notice that when the effect and sample size are small (0.05 and 10, respectively), we would infer the wrong sign for the difference almost half of the time (18/39). We’re less likely to make a sign error when the effect is larger (7/62 for an effect of 0.20) or when the sample size is large (5/47 for a sample size of 100). But the bottom line remains the same:

Be wary of results from studies with small sample sizes, even if the effects are statistically significant.
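For readers who want to poke at the idea without fitting the full model, here is a minimal sketch in Python. It is not the code from the repository, and it makes a simplifying assumption the analysis above does not: the standard deviation is treated as known (equal to 1), so the N(0, 1) prior on each mean gives a normal posterior in closed form and no sampler is needed. The function names `posterior_difference` and `simulate_bayes` are made up for this illustration, and the counts it produces will not match the table exactly.

```python
import math
import random
import statistics

# 97.5% quantile of the standard normal, used for symmetric 95% intervals.
Z975 = statistics.NormalDist().inv_cdf(0.975)


def posterior_difference(x, y, sigma=1.0, prior_sd=1.0):
    """Posterior mean and sd of mu_y - mu_x.

    Assumes (for this sketch only) a known sigma and independent
    N(0, prior_sd^2) priors on each population mean.
    """
    def posterior(sample):
        n = len(sample)
        precision = 1 / prior_sd**2 + n / sigma**2
        mean = (sum(sample) / sigma**2) / precision
        return mean, 1 / precision

    mean_x, var_x = posterior(x)
    mean_y, var_y = posterior(y)
    return mean_y - mean_x, math.sqrt(var_x + var_y)


def simulate_bayes(effect, n, reps=1000, seed=1):
    """Count replicates whose 95% interval excludes 0, and how many of
    those have the wrong sign (the true difference, effect, is positive)."""
    rng = random.Random(seed)
    excluded = 0
    wrong_sign = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(effect, 1.0) for _ in range(n)]
        mean, sd = posterior_difference(x, y)
        lower, upper = mean - Z975 * sd, mean + Z975 * sd
        if lower > 0 or upper < 0:
            excluded += 1
            if upper < 0:
                wrong_sign += 1
    return excluded, wrong_sign


if __name__ == "__main__":
    for effect in (0.05, 0.10, 0.20):
        for n in (10, 50, 100):
            excluded, wrong = simulate_bayes(effect, n)
            print(f"mean={effect:.2f} N={n}: {excluded}/1000, {wrong}/{excluded} wrong sign")
```

Because the prior shrinks each mean toward 0, the intervals in this simplified version exclude 0 somewhat less often than a t-test rejects, but the sign errors do not go away when the effect and sample size are small.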

This figure summarizes results from the simulation, and you’ll find the code in the same GitHub repository as the P-value code I mentioned last week: https://github.com/kholsinger/noisy-data. Remember that Gelman and Carlin (Perspectives on Psychological Science 9:641; 2014 http://dx.doi.org/10.1177/1745691614551642) also have advice on how to tell whether your data are too noisy for your sample to give confidence in your inferences.

[Figure: bayesian]

Thought for the day

If a man walk in the woods for love of them half of each day, he is in danger of being regarded as a loafer; but if he spends his whole day as a speculator, shearing off those woods and making earth bald before her time, he is esteemed an industrious and enterprising citizen.

Henry David Thoreau, Life without principle

Edwin Way Teale Series on Nature and the Environment

Every year since 1997 the University of Connecticut has hosted the Edwin Way Teale Lecture Series on Nature and the Environment. The series features distinguished natural scientists, social scientists, authors, artists, performers, and policy makers whose work informs our understanding of nature and the environment. The lectures are free and open to the public. Many lectures in recent years are also available online. You can find the full list of past lectures and links to videos (where available) at this link: http://lib.uconn.edu/about/events/nature-the-environment-the-edwin-way-teale-lecture-series-past-lectures/.

Here is a quick list of this year’s events:

  • Julian Agyeman, “Just Sustainabilities: Re-imagining e/quality, Living Within Limits”
  • Emma Rosi-Marshall, “Our Rivers on Drugs: Pharmaceuticals and Personal Care Products as Agents of Ecological Change in Aquatic Ecosystems”
  • Harriet Ritvo, “Wanting the Wild”
  • Elizabeth Kolbert, “The Sixth Extinction”
  • Maria Carmen Lemos, “Building Capacity for Adapting to Climate Change”
  • Mina Girgis, “The Nile Project”

The dates and times for the events are available on the Teale Series website. If you are close to Storrs, please stop by and join us. If you are far away or other commitments mean that you can’t join us, please check back to see if a recorded version of the presentation that interests you is available online.

Legos and graduate school

Graduate students are very creative, and I recently learned about an anonymous graduate student in her/his sixth year at a private, West Coast university who is more creative than most – @legogradstudent. I’ve been out of graduate school for more years than I like to admit,[1] but I can still relate to the feelings @legogradstudent captures in her/his tweets. S/he has just short of 2600 followers now, but I’m sure that number is going to grow. Inside Higher Ed described her/him this way in the article that brought her/him to my attention:

Lego Grad Student has fans across disciplines, who often use some variation of “devastatingly true” to describe his experiences. Indeed, his tableaux focus not on the intricacies of his research but rather on the human experience of graduate school: feelings of being on a treadmill to nowhere, being beaten to the intellectual punch by colleagues, using sophisticated avoidance techniques during a class discussion and the horror of seeing free food disappear before his eyes at departmental events.

If you’re in graduate school, if you have friends or relatives who are in graduate school, or if you’re just interested in graduate school, you owe it to yourself to follow @legogradstudent on Twitter or Instagram.


[1] 34 years last June, if you must know.

Inference from noisy data with small samples

From a blog post Andrew Gelman made over a decade ago that I first came across about five or six years ago (http://andrewgelman.com/2004/12/29/type_1_type_2_t/):

In statistics, we learn about Type 1 and Type 2 errors. For example, from an intro stat book:

  • A Type 1 error is committed if we reject the null hypothesis when it is true.
  • A Type 2 error is committed if we accept the null hypothesis when it is false.

That’s a standard definition that anyone who’s had a basic statistics course has probably heard (even if they’ve forgotten it by now). Gelman points out, however, that it is arguably more useful to think about two different kinds of error,

  • Type S errors occur when you claim that an effect is positive even though it’s actually negative.
  • Type M errors occur when you claim that an effect is large when it’s really small (or vice versa).

You’re probably thinking to yourself, “Why should I care about Type S or Type M errors? Surely if I do a typical null hypothesis test and reject the null hypothesis, I won’t make a Type S error, right?”[1] Wrong! More precisely, you’re wrong if your sample size is small and your data are noisy.

Let me illustrate this with a really simple example. Suppose we’re comparing the means of two different populations x and y. To make that comparison, we take a sample of size N from each population and perform a t-test (assuming equal variances in x and y). To make this concrete, let’s assume that the variance is 1 in both populations, that the mean in population y is 0.05 greater than the mean in population x, and that N = 10. Now you’re probably thinking that the chances of detecting a difference between x and y aren’t great, and you’d be right. In fact, in the simulation below only 50 out of 1000 replicates had a P-value < 0.05. What may surprise you is that among those 50 samples with P < 0.05, the mean of the sample from x was often larger than the mean of the sample from y. In other words, more than 30% of the time we would have drawn the wrong conclusion about which population had the larger mean, even though the difference in our sample was statistically significant. With a sample size of 100, we don’t pick up a significant difference between x and y that much more often (66 out of 1000 instead of 50 out of 1000), but only 9 of the 66 samples have the wrong sign. Obviously, if the difference in means is greater, sample size is less of an issue, but the bottom line is this:

If you are studying effects where between-group differences are small relative to within-group variation, you need a large sample to be confident in the sign of any effect you detect, even if the effect is statistically significant.
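The simulation described above is easy to sketch. The version below is in Python rather than the R code in the repository, and it is only an illustration of the same setup: two normal populations with variance 1, a pooled-variance t-test, and a count of how many significant replicates point in the wrong direction. To stay within the standard library it compares the t statistic to hard-coded two-sided 5% critical values for 2N − 2 degrees of freedom instead of computing P-values, so it only handles the three sample sizes listed. The name `simulate_ttest` is made up for this example, and with a different random number generator the counts will differ from the ones quoted above.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for df = 2N - 2
# (standard table values, hard-coded to avoid a SciPy dependency).
T_CRIT = {10: 2.1009, 50: 1.9845, 100: 1.9720}


def simulate_ttest(effect, n, reps=1000, seed=1):
    """Return (significant, wrong_sign) counts.

    x ~ N(0, 1) and y ~ N(effect, 1), with effect > 0, so a significant
    result with mean(y) < mean(x) is a sign (Type S) error.
    """
    rng = random.Random(seed)
    significant = 0
    wrong_sign = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(effect, 1.0) for _ in range(n)]
        diff = statistics.fmean(y) - statistics.fmean(x)
        # Equal sample sizes, so the pooled variance is the simple average.
        pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
        t = diff / math.sqrt(pooled_var * 2 / n)
        if abs(t) > T_CRIT[n]:
            significant += 1
            if diff < 0:
                wrong_sign += 1
    return significant, wrong_sign


if __name__ == "__main__":
    for n in (10, 50, 100):
        sig, wrong = simulate_ttest(0.05, n)
        print(f"N={n}: {sig}/1000 significant, {wrong}/{sig} wrong sign")
```

Changing the seed gives a feel for how much the counts bounce around from one batch of 1000 replicates to the next.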

The figure below illustrates results for 1000 replicates drawn from two different populations with the specified difference in means and sample sizes. Source code (in R) to replicate the results and explore different combinations of sample size and mean difference is available on GitHub: https://github.com/kholsinger/noisy-data.

[Figure: P-values]

Gelman and Carlin (Perspectives on Psychological Science 9:641; 2014 http://dx.doi.org/10.1177/1745691614551642) provide a lot more detail and useful advice, including this telling paragraph from the conclusions:

[W]e believe that too many small studies are done and preferentially published when “significant.” There is a common misconception that if you happen to obtain statistical significance with low power, then you have achieved a particularly impressive feat, obtaining scientific success under difficult conditions.

Bottom line: Be wary of results from studies with small sample sizes, even if the effects are statistically significant.
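If you want a rough sense of how bad things are for a particular design before running a simulation, a design calculation in the spirit of the one Gelman and Carlin describe can be sketched in a few lines. The Python version below is my own approximation, not their code: it assumes the estimate is normally distributed around a positive true effect with a known standard error, and it gets the exaggeration ratio (the Type M side of things) by simulation. The function name `design_analysis` and its arguments are made up for this illustration.

```python
import math
import random
from statistics import NormalDist


def design_analysis(true_effect, se, alpha=0.05, n_sims=10000, seed=1):
    """Return (power, type_s, exaggeration) for a two-sided test.

    Assumes a normally distributed estimate with standard error se and
    a true effect greater than 0 (both are assumptions of this sketch).
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    # Probability of a significant estimate with the right (positive) sign.
    p_hi = 1 - std.cdf(z - true_effect / se)
    # Probability of a significant estimate with the wrong (negative) sign.
    p_lo = std.cdf(-z - true_effect / se)
    power = p_hi + p_lo
    type_s = p_lo / power

    # Exaggeration ratio: average size of significant estimates
    # relative to the true effect, estimated by simulation.
    rng = random.Random(seed)
    significant = [
        abs(estimate)
        for estimate in (rng.gauss(true_effect, se) for _ in range(n_sims))
        if abs(estimate) > z * se
    ]
    if significant:
        exaggeration = (sum(significant) / len(significant)) / true_effect
    else:
        exaggeration = float("nan")
    return power, type_s, exaggeration


if __name__ == "__main__":
    # The example above: difference of 0.05, variance 1, N = 10 per group,
    # so the standard error of the difference is sqrt(2 / 10).
    power, type_s, exaggeration = design_analysis(0.05, math.sqrt(2 / 10))
    print(f"power={power:.3f} type S={type_s:.3f} exaggeration={exaggeration:.1f}")
```

The normal approximation ignores the extra uncertainty from estimating the variance, so treat the output as a ballpark figure rather than a replacement for the simulation.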


[1] I’m not going to talk about Type M errors, because in my work I’m usually happy just determining whether or not a given effect is positive and less worried about whether it’s big or small. If you’re worried about Type M errors, read the paper by Gelman and Tuerlinckx (PDF).

Letters from graduate school

I recently learned of a new website that is worth putting in your bookmark list or adding to the subscription list of your RSS reader: Letters from graduate school. Here’s what they say about themselves.

For every graduate student, graduate school is a different experience filled with ups, downs, failures, and successes. The goal of Letters from Graduate School is to build a collective of graduate school experiences—your experience, in your own voice! (http://lettersfromgradschool.org)

There are four essays in Issue 1 (August 2016):

  • Love and abuse in graduate school, by an anonymous contributor, which makes a plea for teaching graduate students “that their love for research doesn’t have to be siphoned out of a finite pool of respect they’re allowed to show towards themselves.”
  • Writing on an island, by Becky Vartabedian, which describes a physical practice that helped her find her way out of the isolation that is an inevitable part of doctoral study.
  • Post-PhD: the jobs I didn’t get and the one that I did, by A. Seun Ajiboye, which talks about how he figured out what profession he wanted to be in (and how he got there) after realizing that he didn’t want to be in academia.
  • Don’t check your optimism at the door, by Renee Geck, which provides some excellent advice – “Grad school doesn’t have to be a year of ignorant bliss and then a dreary trudge to the end. If you find people whom you trust to help you through the worst patches, chances are you’ll come out the other side a lot better than the people who go it alone.”

If Issue 1 is any indication, Letters from Graduate School will be a valuable resource for graduate students and graduate advisors. I look forward to reading the essays in future issues.

Beloit mindset list 2020

Every year Beloit College releases its Mindset List. Although the list has its critics (http://www.beloitmindlessness.com/must-be-destroyed/) and it’s been parodied by The Onion, I always get a kick out of looking it over. It reminds me just how old I am. Here are a few gems from this year’s list:

  • There has always been a digital swap meet called eBay.
  • There have always been Cadillac Escalades.
  • West Nile has always been a virus found in the US.
  • They have never had to watch or listen to programs at a scheduled time.
  • Vaccines have always erroneously been linked to autism.
  • They have no memory of Bob Dole promoting Viagra.
  • John Elway and Wayne Gretzky have always been retired.

National Park Service – Happy 100th birthday

100 years ago today the National Park Service was born. National parks are, as the Ken Burns documentary put it, America’s Best Idea. Unfortunately, I will not be able to participate in any of the celebrations today, nor am I likely to make it to a National Park this year, but I am delighted to live in a country that has placed such value on wild and beautiful places. I practically grew up in Yellowstone, and I’ve visited many other National Parks. Please take some time today to celebrate our good fortune, and if you’re close enough to a National Park or National Monument to visit, please consider taking the time to stop by and thank the Park Service employees for their service to our country.

What counts? Evaluating public communication in tenure and promotion

Last Friday, the American Sociological Association released a subcommittee report entitled What counts? Evaluating public communication in tenure and promotion. It suggests that public communication of research can be assessed along three axes:

  • Type of content: explanatory journalism, opinion, application of research to a practical issue
  • Rigor and quality of the communication: peer-reviewed, edited, non-edited, effectiveness of communication for the intended audience
  • Public impact: number of readers, breadth of influence on policy or practice

Amy Schalet, Director of the Public Engagement Project at UMass Amherst, argues that public communication should be included in faculty evaluations, because when we include it, “we encourage [faculty] to share their knowledge with the members of society who could most benefit from it.”

I agree, but as always, the devil is in the details. Among the questions I wrestle with are:

  • To what extent is public communication of scholarly work an aspect of scholarly achievement?
  • Should public communication be regarded primarily as service in the tenure triad of teaching, research, and service?
  • Is public communication worthwhile only if it leads to changes in public policy or professional practice? Is it worthwhile if it leads “only” to greater aesthetic appreciation of the human or natural world?

The University of Connecticut has been recognized as a “Community Engaged” campus by the Carnegie Foundation for the Advancement of Teaching since 2010 (ref). These are clearly among the questions that we need to answer.

Don’t overinterpret STRUCTURE plots

[Screenshot of the tweet announcing the preprint]
Several weeks ago[1] Daniel Falush (@DanielFalush) posted a preprint on bioRxiv, “A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots”. I finally had a chance to read it this weekend. Here’s the abstract:

Genetic clustering algorithms, implemented in popular programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is reconstruction of the genetic history of African Americans who are a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups which do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach (available at www.paintmychromsomes.com) to assessing the goodness of fit of the model using the ancestry ‘palettes’ estimated by CHROMOPAINTER and apply it to both simulated and real examples. Combining these complementary analyses with additional methods that are designed to test specific hypothesis allows a richer and more robust analysis of recent demographic history based on genetic data.

A key observation Falush and his co-authors make is that different demographic scenarios can lead to the same STRUCTURE diagram. They illustrate three different scenarios. In all of them, they simulate data from 12 populations but sample from only four of them. In all of the scenarios, population P4 has been isolated from the other three populations in the sample for a long time. It’s the relationship between P1, P2, and P3 that differs among the scenarios.

  • Recent admixture: P1 and P3 have also been distinct for some time, and P2 is a recent admixture of P1, P3, and P4.
  • Ghost admixture: P1 and P3 diverged some time ago, and P2 is a recent admixture of P1 and a “ghost” population more closely related to P3 than to P1.
  • Recent bottleneck: P1 is sister to P2 but underwent a strong recent bottleneck.

[Figure: STRUCTURE diagrams for the three scenarios]

As you can see, the STRUCTURE diagrams estimated from data simulated in each scenario are indistinguishable. They also show that if you have additional data available, specifically if you are lucky enough to be working in an organism with a lot of SNPs that are mapped, then you can combine estimates from CHROMOPAINTER with those from STRUCTURE to distinguish the recent admixture scenario from the other two – assuming that you’ve picked a reasonable number for K, the number of subpopulations.[2]

The authors also refer to Puechmaille’s recent work demonstrating that estimates of genetic structure are greatly affected by uneven sampling among populations. Bottom line: Read both this paper and Puechmaille’s if you use STRUCTURE, tread cautiously when interpreting results, and don’t expend too much effort trying to estimate the “right” K.


[1] OK, as you can see from the tweet, it was almost a month ago.

[2] The paper contains a brief remark about how hard it is to estimate K: “Unless the demographic history of the sample is particularly simple, the value of K inferred according to any statistically sensible criterion is likely to be smaller than the number of distinct drift events that have significantly impacted the sample. What the algorithm often does is in practice use variation in admixture proportions between individuals to approximately mimic the effect of more than K distinct drift events without estimating ancestral populations corresponding to each one.”

Falush, D., L. van Dorp, and D. Lawson. 2016. A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots. bioRxiv doi: 10.1101/066431
Lawson, D.J., G. Hellenthal, S. Myers, and D. Falush. 2012. Inference of population structure using dense haplotype data. PLoS Genetics 8:e1002453. doi: 10.1371/journal.pgen.1002453
Puechmaille, S.J. 2016. The program structure does not reliably recover the correct population structure when sampling is uneven: subsampling and new estimators alleviate the problem. Molecular Ecology Resources 16:608-627. doi: 10.1111/1755-0998.12512