{"id":279,"date":"2017-04-19T08:30:39","date_gmt":"2017-04-19T12:30:39","guid":{"rendered":"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/?p=279"},"modified":"2017-04-16T12:24:32","modified_gmt":"2017-04-16T16:24:32","slug":"not-every-credible-interval-is-credible","status":"publish","type":"post","link":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2017\/04\/19\/not-every-credible-interval-is-credible\/","title":{"rendered":"Not every credible interval is credible"},"content":{"rendered":"<p>Lauren Kennedy and co-authors (citation below) worry about the effect of &#8220;contamination&#8221; on estimates of credible intervals.<sup>1<\/sup> The effect arises because we often assume that values are drawn from a normal distribution, even though there are &#8220;outliers&#8221; in the data, i.e., observations drawn from a different distribution that &#8220;contaminate&#8221; our observations. Not surprisingly, they find that a model including contamination does a &#8220;better job&#8221; of estimating the mean and credible intervals than one that assumes a simple normal distribution.<sup>2<\/sup><\/p>\n<p>They consider the following data as an example:<br \/>\n<code>-2, -1, 0, 1, 2, 15<\/code><br \/>\nThey used the following model for the data (writing in JAGS notation):<\/p>\n<pre>x[i] ~ dnorm(mu, tau)\r\ntau ~ dgamma(0.0001, 0.0001)\r\nmu ~ dnorm(0, 100)\r\n<\/pre>\n<p>That prior on <tt>tau<\/tt> should be a red flag. Gelman (citation below) pointed out a long time ago that such a prior is a long way from being vague or non-informative. It puts a tremendous amount of weight on very small values of <tt>tau<\/tt>, meaning a very high weight on large values of the variance. 
Similarly, the N(0, 100) prior on `mu` (that is, a standard deviation of 100) may seem like a "vague" choice, but it puts more than 80% of the prior probability on outcomes with `x < -20` or `x > 20`, substantially more extreme than anything that was observed.

Before we begin an analysis we typically have some idea of what "reasonable" values are for the variable we're measuring. For example, if we are measuring the height of adult men, we would be very surprised to find anyone in our sample taller than 3m or shorter than 0.5m. It wouldn't make sense to use a prior for the mean that put appreciable probability on outcomes more extreme than that.

In this case the data are made up, so there isn't any prior knowledge to work from, but the authors say that "[i]t is immediately obvious that the sixth data point is an *outlier*" (emphasis in the original). Let's take them at their word. A reasonable choice of prior might then be N(0, 1), since all of the values (except for the "outlier") lie within two standard deviations of the mean.³ Similarly, a reasonable choice for the prior on `sigma` (`sqrt(1/tau)`) might be a half-normal with mean 0 and standard deviation 2, which allows for standard deviations both smaller and larger than those observed in the data.

I put that all together in a little R/Stan program ([test.R](http://darwin.eeb.uconn.edu/uncommon-ground/documents/test.R), [test.stan](http://darwin.eeb.uconn.edu/uncommon-ground/documents/test.stan)).
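The code itself is linked rather than reproduced, but given the priors just described — N(0, 1) on the mean and half-normal(0, 2) on the standard deviation — a minimal Stan model consistent with that description might look like this (my reconstruction, not necessarily the author's actual file):

```stan
data {
  int<lower=1> N;
  vector[N] x;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 1);       // weakly informative prior on the mean
  sigma ~ normal(0, 2);    // half-normal via the lower=0 constraint
  x ~ normal(mu, sigma);   // likelihood for all six observations
}
```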
When I run it, these are the results I get:

```
         mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff  Rhat
mu      0.555   0.016 0.899  -1.250  -0.037   0.558   1.156   2.297  3281 0.999
sigma   4.775   0.014 0.841   3.410   4.156   4.715   5.279   6.618  3466 1.000
lp__  -16.609   0.021 0.970 -19.229 -17.013 -16.314 -15.903 -15.663  2086 1.001
```

Let's compare those results to what Kennedy and colleagues report:

| Analysis | Posterior mean | 95% credible interval |
|---|---|---|
| Stan + "reasonable priors" | 0.56 | (-1.25, 2.30) |
| Kennedy et al. - Normal | 2.49 | (-4.25, 9.08) |
| Kennedy et al. - Contaminated normal | 0.47 | (-2.49, 4.88) |

So if you use "reasonable" priors, you get a posterior mean from a model without contamination that isn't very different from what you get from the more complicated contaminated-normal model, and the credible interval is actually narrower. If you really think *a priori* that 15 is an unreasonable observation, which estimate (point estimate and credible interval) would you prefer? I'd go for the model assuming a normal distribution with reasonable priors.

It all comes down to this. ***Your choice of priors matters.***
***There is no such thing as an uninformative prior.*** If you think you are playing it safe by using very vague or flat priors, think carefully about what you're doing. There's a good chance that you're actually putting a lot of prior weight on values that are unreasonable.⁴ You will almost always have some idea of what observations are reasonable or possible. ***Use that information to set weakly informative priors.*** See the discussion at [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations) for more detailed advice.

Gelman, A. 2006. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). *Bayesian Analysis* 1:515-534. [https://projecteuclid.org/euclid.ba/1340371048](https://projecteuclid.org/euclid.ba/1340371048)

Kennedy, L.A., D.J. Navarro, A. Perfors, and N. Briggs. 2017. Not every credible interval is credible. *Behavior Research Methods* doi: [10.3758/s13428-017-0854-1](http://dx.doi.org/10.3758/s13428-017-0854-1)

¹ They note that the problem isn't unique to Bayesian credible intervals. The same problems apply to classical confidence intervals.

² If you want to know what the authors mean by "better", read the paper. That's not the focus of this post.

³ If you're following closely, you're likely to be bothered that I'm using the data to set the prior. You're right to be bothered, because you should use *prior* knowledge, not the data, to set your prior.
In this case I have no choice, since there isn't any prior knowledge to draw on.

⁴ There are cases where a flat prior is completely reasonable. For example, if you're a population geneticist (like me) estimating allele frequencies in populations, it's entirely sensible to presume that any value between 0 and 1 is possible, so a flat uniform (or flat Dirichlet) prior is a defensible choice. Even so, we might be better off with a Beta(2,2) prior (for a single allele frequency) or a Dirichlet with all parameters equal to 2, because the fact that we're estimating allele frequencies from a finite sample at all means the frequency is unlikely to be very close to either 0 or 1.
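The allele-frequency point is easy to make concrete. With `k` copies of an allele in a sample of `n` gene copies, a Beta(a, b) prior gives a Beta(k + a, n - k + b) posterior, so the flat Beta(1, 1) and the Beta(2, 2) differ mainly in how they treat samples near the boundary. A short sketch (the counts here are made up for illustration):

```python
# Conjugate beta-binomial update for an allele frequency.
# Posterior is Beta(k + a, n - k + b); posterior mean is (k + a) / (n + a + b).
def posterior_mean(k, n, a, b):
    return (k + a) / (n + a + b)

k, n = 0, 20  # no copies of the allele observed among 20 gene copies
flat = posterior_mean(k, n, 1, 1)    # uniform Beta(1, 1) prior
beta22 = posterior_mean(k, n, 2, 2)  # Beta(2, 2) pulls away from the boundary
print(flat, beta22)
```

Even when the allele is never observed, the Beta(2, 2) posterior keeps the estimate a little further from 0, reflecting the prior judgment that frequencies pinned exactly at the boundary are unlikely.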