Lauren Kennedy and co-authors (citation below) worry about the effect of “contamination” on estimates of credible intervals.^{1} The effect arises because we often assume that values are drawn from a normal distribution, even though there are “outliers” in the data, i.e., observations drawn from a different distribution that “contaminate” our observations. Not surprisingly, they find that a model including contamination does a “better job” of estimating the mean and credible intervals than one that assumes a simple normal distribution.^{2}

They consider the following data as an example:

`-2, -1, 0, 1, 2, 15`

They used the following model for the data (writing in JAGS notation):

x[i] ~ dnorm(mu, tau)
tau ~ dgamma(0.0001, 0.0001)
mu ~ dnorm(0, 100)

That prior on `tau` should be a red flag. Gelman (citation below) pointed out a long time ago that such a prior is a long way from being vague or non-informative. It puts a tremendous amount of weight on very small values of `tau`, meaning a very high weight on large values of the variance. Similarly, the `N(0, 100)`; prior on `mu`; may seem like a “vague” choice, but it puts more than 80% of the prior probability on outcomes with `x < -20` or `x > 20`, substantially more extreme than any that were observed.

Before we begin an analysis we typically have some idea what “reasonable” values are for the variable we’re measuring. For example, if we are measuring the height of adult men, we would be very surprised to find anyone in our sample with a height greater than 3m or less than 0.5m. It wouldn’t make sense to use a prior for the mean that put appreciable probability on outcomes more extreme.

In this case the data are made up, so there isn’t any prior knowledge to work from. but the authors say that “[i]t is immediately obvious that the sixth data point is an *outlier*” (emphasis in the original). Let’s take them at their word. A reasonable choice of prior might then be `N(0,1)`, since all of the values (except for the “outlier”) lie within two standard deviations of the mean.^{3} Similarly, a reasonable choice for the prior on `sigma` (`sqrt(1/tau)`) might be a half-normal with mean 0 and standard deviation 2, which will allow for standard deviations both smaller and larger than observed in the data.

I put that all together in a little R/Stan program (test.R, test.stan). When I run it, these are the results I get:

mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 0.555 0.016 0.899 -1.250 -0.037 0.558 1.156 2.297 3281 0.999
sigma 4.775 0.014 0.841 3.410 4.156 4.715 5.279 6.618 3466 1.000
lp__ -16.609 0.021 0.970 -19.229 -17.013 -16.314 -15.903 -15.663 2086 1.001

Let’s compare those results to what Kennedy and colleagues report:

Analysis | Posterior mean | 95% credible interval |

Stan + "reasonable priors" | 0.56 | (-1.25, 2.30) |

Kennedy et al. - Normal | 2.49 | (-4.25, 9.08) |

Kennedy et al. - Contaminated normal | 0.47 | (-2.49, 4.88) |

So if you use “reasonable” priors, you get a posterior mean from a model without contamination that isn’t very different from what you get from the more complicated contaminated normal model, and the credible intervals are actually narrower. If you really think *a priori* that 15 is an unreasonable observation, which estimate (point estimate and credible interval) would you prefer? I’d go for the model assuming a normal distribution with reasonable priors.

It all comes down to this. *Your choice of priors matters. There is no such thing as an uninformative prior. *If you think you are playing it safe by using very vague or flat priors, think carefully about what you’re doing. There’s a good chance that you’re actually putting a lot of prior weight on values that are unreasonable.^{4} You will almost always have some idea about what observations are reasonable or possible. *Use that information to set weakly informative priors.* See the discussion at https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations for more detailed advice.

(more…)