# Author Archive: kent

## Not every credible interval is credible

Lauren Kennedy and co-authors (citation below) worry about the effect of “contamination” on estimates of credible intervals. The effect arises because we often assume that values are drawn from a normal distribution, even though there are “outliers” in the data, i.e., observations drawn from a different distribution that “contaminate” our observations. Not surprisingly, they find that a model including contamination does a “better job” of estimating the mean and credible intervals than one that assumes a simple normal distribution.

They consider the following data as an example:

`-2, -1, 0, 1, 2, 15`

They used the following model for the data (writing in JAGS notation):

```
x[i] ~ dnorm(mu, tau)
tau ~ dgamma(0.0001, 0.0001)
mu ~ dnorm(0, 100)
```

That prior on tau should be a red flag. Gelman (citation below) pointed out a long time ago that such a prior is a long way from being vague or non-informative. It puts a tremendous amount of weight on very small values of tau, meaning a very high weight on large values of the variance. Similarly, the N(0, 100) prior on mu may seem like a “vague” choice, but it puts more than 80% of the prior probability on outcomes with x < -20 or x > 20, substantially more extreme than any that were observed.
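The tail-mass figure can be checked with a few lines of Python. This is only a sketch: it assumes the 100 in N(0, 100) is read as a standard deviation, which is the reading that reproduces the “more than 80%” figure, and the helper name `prob_outside` is made up for illustration.

```python
import math

def prob_outside(limit, sd):
    """Probability that a Normal(0, sd) draw falls below -limit or above +limit."""
    z = limit / sd
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# Assumption: 100 is treated as the standard deviation of the prior on mu.
p_sd_100 = prob_outside(20.0, sd=100.0)
print(f"P(|mu| > 20) with sd = 100: {p_sd_100:.3f}")
```

Under that assumption the prior puts about 0.84 of its mass outside (-20, 20). The answer depends heavily on whether the second argument is taken as a standard deviation, a variance, or a precision, so it is worth checking which one your software expects.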

Before we begin an analysis we typically have some idea what “reasonable” values are for the variable we’re measuring. For example, if we are measuring the height of adult men, we would be very surprised to find anyone in our sample with a height greater than 3m or less than 0.5m. It wouldn’t make sense to use a prior for the mean that put appreciable probability on outcomes more extreme.

In this case the data are made up, so there isn’t any prior knowledge to work from, but the authors say that “[i]t is immediately obvious that the sixth data point is an outlier” (emphasis in the original). Let’s take them at their word. A reasonable choice of prior might then be N(0,1), since all of the values (except for the “outlier”) lie within two standard deviations of the mean. Similarly, a reasonable choice for the prior on sigma (sqrt(1/tau)) might be a half-normal with mean 0 and standard deviation 2, which will allow for standard deviations both smaller and larger than observed in the data.

I put that all together in a little R/Stan program (test.R, test.stan). When I run it, these are the results I get:

```
         mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff  Rhat
mu      0.555   0.016 0.899  -1.250  -0.037   0.558   1.156   2.297  3281 0.999
sigma   4.775   0.014 0.841   3.410   4.156   4.715   5.279   6.618  3466 1.000
lp__  -16.609   0.021 0.970 -19.229 -17.013 -16.314 -15.903 -15.663  2086 1.001
```
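As a rough cross-check that does not need Stan, the same model can be approximated on a grid in plain Python. This is a sketch, not the contents of test.R or test.stan: it assumes a normal likelihood with the N(0, 1) prior on mu and the half-normal(0, 2) prior on sigma described above, and the grid limits and spacing are arbitrary choices of mine.

```python
import math

data = [-2.0, -1.0, 0.0, 1.0, 2.0, 15.0]

def log_posterior(mu, sigma):
    """Unnormalized log posterior for a normal likelihood with a N(0, 1)
    prior on mu and a half-normal(0, 2) prior on sigma (sigma > 0)."""
    log_lik = sum(-math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2 for x in data)
    log_prior_mu = -0.5 * mu ** 2
    log_prior_sigma = -0.5 * (sigma / 2.0) ** 2
    return log_lik + log_prior_mu + log_prior_sigma

# Hypothetical grid: wide enough to cover essentially all of the posterior mass.
step = 0.05
mu_grid = [-6.0 + step * i for i in range(281)]     # -6 to 8
sigma_grid = [step * (j + 1) for j in range(400)]   # 0.05 to 20

log_post = [[log_posterior(m, s) for s in sigma_grid] for m in mu_grid]
peak = max(max(row) for row in log_post)  # subtract before exponentiating

total = mu_sum = sigma_sum = 0.0
for m, row in zip(mu_grid, log_post):
    for s, lp in zip(sigma_grid, row):
        w = math.exp(lp - peak)
        total += w
        mu_sum += w * m
        sigma_sum += w * s

mu_mean = mu_sum / total
sigma_mean = sigma_sum / total
print(f"posterior mean of mu:    {mu_mean:.2f}")
print(f"posterior mean of sigma: {sigma_mean:.2f}")
```

If the priors here match the ones in the Stan program, the two posterior means should land close to the values in the summary above, up to grid and Monte Carlo error.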

Let’s compare those results to what Kennedy and colleagues report:

| Analysis | Posterior mean | 95% credible interval |
| --- | --- | --- |
| Stan + "reasonable priors" | 0.56 | (-1.25, 2.30) |
| Kennedy et al. - Normal | 2.49 | (-4.25, 9.08) |
| Kennedy et al. - Contaminated normal | 0.47 | (-2.49, 4.88) |

So if you use “reasonable” priors, you get a posterior mean from a model without contamination that isn’t very different from what you get from the more complicated contaminated normal model, and the credible intervals are actually narrower. If you really think a priori that 15 is an unreasonable observation, which estimate (point estimate and credible interval) would you prefer? I’d go for the model assuming a normal distribution with reasonable priors.

It all comes down to this. Your choice of priors matters. There is no such thing as an uninformative prior. If you think you are playing it safe by using very vague or flat priors, think carefully about what you’re doing. There’s a good chance that you’re actually putting a lot of prior weight on values that are unreasonable. You will almost always have some idea about what observations are reasonable or possible. Use that information to set weakly informative priors. See the discussion at https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations for more detailed advice.

## Photos from Bonn

Christmas market in Bonn, December 2016. Click on the photo to see the entire album on Flickr.

Late last November I visited Bonn to participate in a small workshop sponsored by the Crop Trust (https://www.croptrust.org/) intended to develop criteria to assess whether germplasm collections of crops and crop wild relatives are sufficient to meet the Millennium Development Goals. I was there not because I know a lot about crop germplasm (I don’t), but because my expertise in analysis of genetic structure in plant populations and my work on plant conservation genetics provided some (I hope) useful context for the discussions.

I didn’t have a lot of time to explore Bonn, but I did have a couple of hours on the afternoon I arrived and most of the morning on the day that I left. If you’d like to see the photos I thought were worth saving, click on the image above to visit a Flickr album where you can see them all.

## I feel appreciated

This week the University of Connecticut expressed its appreciation for employees. I am nearing the end of my 31st year with the University, and I’ve heard through the grapevine that I’ll be receiving a “surprise” gift recognizing 30 years of service at a meeting I’m attending next week. On Tuesday, I presented a gift to an employee of The Graduate School recognizing 35 years of service as a state employee. I was surprised and gratified to receive the unofficial “Certificate of Awesomeness” pictured above from members of The Graduate School staff. I feel very appreciated – and fortunate to have been a part of UConn for more than 30 years.

## It appears that I’m a New Conservationist

I recently discovered the Future of Conservation Project. It’s a project designed “to explore the views of conservationists on a range of issues, as a way of informing debates on the future of conservation.” As the About page says,

Recent debates about the future of conservation have been dominated by a few high-profile individuals, whose views seem to fit fairly neatly into polarised positions. In this survey, we are exploring the range of views that exist within the conservation movement globally, and how this varies by key demographic characteristics such as age, gender, geography and educational background.

The blue dots in the figure above are results from the 99 people who responded to the survey before me. Here’s how to interpret my results:

#### How to interpret your results

Your position is weakly negative along the people & nature axis and weakly positive along the conservation & capitalism axis.

Your position on the two axes above reflects your survey answers. A move from left to right along the horizontal axis (people/nature) implies a shift from seeing conservation as a means of improving human welfare to conservation for nature’s own sake.

The vertical axis (conservation & capitalism) indicates a spectrum of willingness to embrace markets and capitalism as conservation tools: the higher up the graph your score is, the more pro-markets it is. This places you in the top left quadrant of the graph – a position suggesting your views on these particular dimensions of the debate are most closely related to those of ‘new conservationists’ as set out in the literature.

#### Your thinking most closely aligns with: New Conservation

Central to the ‘new conservation’ position is a shift towards framing conservation as being about protecting nature in order to improve human wellbeing (especially that of the poor), rather than for biodiversity’s own sake. ‘New conservationists’ believe that win-win situations in which people benefit from conservation can often be achieved by promoting economic growth and partnering with corporations.

Although new conservation advocates have been criticised for doing away with nature’s intrinsic value, key authors within the movement have responded by clarifying that their motive is not so much an ethical as a strategic or pragmatic one. In other words, they claim that conservation needs to emphasise nature’s instrumental value to people because this better promotes support for conservation compared to arguments based solely on species’ rights to exist.

If you’re interested in taking the survey, here’s the link: http://www.futureconservation.org/.

## A few photos from Dublin

I visited Dublin a couple of weeks ago for meetings of the Deans and Directors of Graduate Studies (DDoGS) from Universitas 21 (http://www.universitas21.com/article/research/details/55/deans-and-directors-of-graduate-studies-ddogs). The meeting began on 20 March with a workshop on “The Future of the Doctorate.” It continued on 21 and 22 March with discussions of graduate research project grants, the possibility of a 3-MT competition for master’s students, plans for a supervisor development project, and a discussion of wellness and mental health. This was the fifth DDoGS meeting I’ve attended, and I’ve always found them useful.

There is a direct flight from Hartford to Dublin. I left in the evening on 18 March and arrived in Dublin a little after 5:00am on 19 March. Since there weren’t any activities until 6:00pm that night, I spent most of the day on the 19th wandering around Dublin taking photographs. I also spent the day on 23 March taking photographs, since I didn’t return home until the 24th. The photograph above is of Temple Bar, one of the most famous pubs in Dublin. If you’d like to see more of the photographs I took while I was there, here’s a link: https://www.flickr.com/photos/billandkent/albums/72157679862838432.

## The beauty of fynbos

In case you’ve ever wondered why I have spent so much time working in, thinking about, and writing about Protea, this video from CapeNature will give you a bit of a clue. The fynbos is a very interesting place. It has an enormous diversity of plants, many of which are found nowhere else in the world, and much of that diversity is concentrated in a relatively small number of big evolutionary radiations, one of which is Protea. One of my students, Kristen Nolting (@KristenNolting on Twitter), pointed me to this video. Thanks, Kristen.

## A new phylogeny for Protea

Protea compacta near Kleinmond, Western Cape, South Africa

The genus Protea is one of the iconic evolutionary radiations in the Greater Cape Floristic Region of southwestern South Africa. Its range extends north through Mozambique into parts of central Africa, but the vast majority of species are found in South Africa. From 2011 to 2014 we collected samples from most of the South African species (59 in total), and for most of the species we collected samples from several individuals from different populations. Over the last couple of years, we extracted DNA, built libraries for next-generation sequencing using targeted phylogenomics, and constructed a highly resolved estimate of phylogenetic relationships in the genus. The paper describing our results is now out in “early view” in American Journal of Botany. Most species from which we have multiple samples are supported as monophyletic units, and most relationships we identify are strongly supported (> 90% support in ASTRAL-II and SVDquartets analyses). We use the species tree from our data as a backbone to provide reliable estimates of relationship for additional species included in a paper by Schnitzler and colleagues for which we did not have samples.

Mitchell, N., P.O. Lewis, E.M. Lemmon, A.R. Lemmon, and K.E. Holsinger.  2017.  Anchored phylogenomics improves the resolution of evolutionary relationships in the rapid radiation of Protea L. American Journal of Botany doi: 10.3732/ajb.1600227

## Conservation, economic inequality, and privilege

Increased equity and pro-poor actions are not only moral issues to be kept in mind by conservationists. They are, rather, central to the larger goal of protecting the planet. – Bill Murdoch

In the past 15-20 years, conservation biologists have become increasingly aware that successful conservation efforts require the support and involvement of local communities, but only more recently have we become fully aware that getting that support and involvement requires that we pay attention to what communities need, not only to what we want. Reducing economic inequality, and in particular improving the lives of those in poverty, is not only the right thing to do on its own terms. It’s the only way we can protect the natural systems we care about in the long term.

In Fall 2015, I discussed some of these issues in my graduate course in conservation biology. The problem is that it’s easy to say the “right” words and to congratulate ourselves for our wisdom and generosity. It’s harder to see how the attitudes of those of us who live in relatively prosperous communities are influenced by the economic privileges we have. Those privileges are part of the reason it’s hard for us to understand farmers, ranchers, and oilmen who seem to have little regard for the land. I grew up among farmers and ranchers in southern Idaho, and the people I knew care as deeply about the land as I do. The difference? They draw their livelihood directly from the land, and their livelihood is less secure. Not unreasonably, they focus on immediate needs, not far-off benefits.

Fortunately, I had a very talented teaching assistant for the course, Holly Brown. She had the idea of using a “privilege walk” to illustrate the ways in which we – graduate students and faculty at UConn – are privileged when compared to many of those living in areas where conservation action is needed. Holly, Ambika Kamath, and Margaret Rubega describe the exercise in a recent article in Conservation Biology. This anonymous comment from one of our students was particularly striking:

The main thing I took away was that, when it comes to issues that are controversial (including climate change or biodiversity preservation), approaching those who might oppose ecologists with an understanding of my own privilege and how it differs from the background of others can help me to open myself up to innovative solutions, instead of imposing my beliefs on others.

If you teach a course in conservation biology, I encourage you to read Holly’s article and use some of the ideas in it the next time you teach your course.

## Against null hypothesis testing – the elephants and Andrew Gelman edition

Last week I pointed out a new paper by Denes Szucs and John Ioannidis, When null hypothesis significance testing is unsuitable for research: a reassessment. I mentioned that P-values from small, noisy studies are likely to be misleading. Last April, Raghu Parthasarathy at The Eighteenth Elephant had a long post on a more fundamental problem with P-values: they encourage binary thinking. Why is this a problem?

1. “Binary statements can’t be sensibly combined” when measurements have noise.
2. “It is almost never necessary to combine boolean statements.”
3. “Everything always has an effect.”

Those brief statements probably won’t make any sense, so head over to The Eighteenth Elephant to get the full explanation. The post is a bit long, but it’s easy to read, and well worth your time.

Andrew Gelman recently linked to Parthasarathy’s post and adds one more observation on how P-values are problematic: they are “interpretable only under the null hypothesis, yet the usual purpose of the p-value in practice is to reject the null.” In other words, P-values are derived assuming the null hypothesis is true. They tell us the chance of getting data at least as extreme as ours if the null hypothesis were true. Since we typically don’t believe the null hypothesis is true, the P-value doesn’t correspond to anything meaningful.

To take Gelman’s example, suppose we had an experiment with a control, treatment A, and treatment B. Our data suggest that treatment A is not different from control (P=0.13) but that treatment B is different from the control (P=0.003). That’s pretty clear evidence that treatment A and treatment B are different, right? Wrong.

P=0.13 corresponds to a treatment-control difference of 1.5 standard deviations; P=0.003, to a treatment-control difference of 3.0 standard deviations. The two treatments therefore differ from each other by 1.5 standard deviations, which corresponds to a P-value of 0.13. Why the apparent contradiction? Because if we want to say that treatment A and treatment B are different, we need to compare them directly to each other. When we do so, we realize that we don’t have any evidence that the treatments are different from one another.
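The arithmetic is easy to reproduce. The sketch below converts z-scores to two-sided P-values under a standard normal null. The last step rests on an assumption of mine that the post does not state: if the two treatment estimates are independent and have equal standard errors, the standard error of their difference is sqrt(2) times larger, which makes the direct comparison even less impressive.

```python
import math

def two_sided_p(z):
    """Two-sided P-value for a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

p_a = two_sided_p(1.5)  # treatment A vs. control
p_b = two_sided_p(3.0)  # treatment B vs. control

# Assumption: independent estimates with equal standard errors, so the
# difference between A and B has a standard error sqrt(2) times larger.
z_diff = (3.0 - 1.5) / math.sqrt(2.0)
p_diff = two_sided_p(z_diff)

print(f"A vs. control: z = 1.50, P = {p_a:.3f}")
print(f"B vs. control: z = 3.00, P = {p_b:.3f}")
print(f"A vs. B:       z = {z_diff:.2f}, P = {p_diff:.3f}")
```

Taking the 1.5 difference at face value gives P of about 0.13, as in the text; with the sqrt(2) adjustment it rises to roughly 0.29. Either way, one “significant” and one “non-significant” comparison against the control says little about whether the two treatments differ from each other.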

As Parthasarathy points out in a similar example, a better interpretation is that we have evidence for the ordering (control < treatment A < treatment B). Null hypothesis significance testing could easily mislead us into thinking that what we have instead is (control = treatment A < treatment B). The problem arises, at least in part, because no matter how often we remind ourselves that it’s wrong to do so, we act as if a failure to reject the null hypothesis is evidence for the null hypothesis. Parthasarathy describes nicely how we should be approaching these problems:

It’s absurd to think that anything exists in isolation, or that any treatment really has “zero” effect, certainly not in the messy world of living things. Our task, always, is to quantify the size of an effect, or the value of a parameter, whether this is the resistivity of a metal or the toxicity of a drug.

We should be focusing on estimating the magnitude of effects and the uncertainty associated with those estimates, not testing null hypotheses.

## The influence of climate on tree growth

Northern Hemisphere temperature changes estimated from various proxy records shown in blue (Mann et al. 1999). Instrumental data shown in red. Note the large uncertainty (grey area) as you go further back in time.

Ecologists and paleoecologists have used the width of tree rings for years as a way of inferring past climates. In fact, tree ring data were an important component of the proxy data Mann et al. (1998) used when they constructed their famous hockey stick representing global surface temperatures over the last millennium. I don’t have anything as earth-shattering as a hockey stick to share with you, but I am pleased to report that a paper on which I am a co-author demonstrates how to combine tree ring and growth increment data (with other data) to predict growth of forest trees. Here’s the abstract and a link to the paper on bioRxiv.

https://doi.org/10.1101/097535

#### Fusing tree-ring and forest inventory data to infer influences on tree growth

Better understanding and prediction of tree growth is important because of the many ecosystem services provided by forests and the uncertainty surrounding how forests will respond to anthropogenic climate change. With the ultimate goal of improving models of forest dynamics, here we construct a statistical model that combines complementary data sources: tree-ring and forest inventory data. A Bayesian hierarchical model is used to gain inference on the effects of many factors on tree growth (individual tree size, climate, biophysical conditions, stand-level competitive environment, tree-level canopy status, and forest management treatments) using both diameter at breast height (DBH) and tree-ring data. The model consists of two multiple regression models, one each for the two data sources, linked via a constant of proportionality between coefficients that are found in parallel in the two regressions. The model was applied to a dataset developed at a single, well-studied site in the Jemez Mountains of north-central New Mexico, U. S. A. Inferences from the model included positive effects of seasonal precipitation, wetness index, and height ratio, and negative effects of seasonal temperature, southerly aspect and radiation, and plot basal area. Climatic effects inferred by the model compared well to results from a dendroclimatic analysis. Combining the two data sources did not lead to higher predictive accuracy (using the leave-one-out information criterion, LOOIC), either when there was a large number of increment cores (129) or under a reduced data scenario of 15 increment cores. However, there was a clear advantage, in terms of parameter estimates, to the use of both data sources under the reduced data scenario: DBH remeasurement data for ~500 trees substantially reduced uncertainty about non-climate fixed effects on radial increments. 
We discuss the kinds of research questions that might be addressed when the high-resolution information on climate effects contained in tree rings are combined with the rich metadata on tree- and stand-level conditions found in forest inventories, including carbon accounting and projection of tree growth and forest dynamics under future climate scenarios.