Illustrating the influence of priors

We’ll go back to our really simple example of binomial sampling.
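The R code below drives a small Stan program, binomial-model.stan, that isn't reproduced in this document. Here is a minimal sketch of what it plausibly contains, assuming a flat Beta(1,1) prior on p; that prior is an assumption on my part, but it is consistent with the Beta(k+1, N-k+1) posterior the plots below use for comparison.

data {
  int<lower=0> N;            // sample size
  int<lower=0, upper=N> k;   // number of A1 alleles observed
}
parameters {
  real<lower=0, upper=1> p;  // frequency of the A1 allele
}
model {
  k ~ binomial(N, p);        // likelihood
  p ~ beta(1.0, 1.0);        // flat prior (assumed; the actual file may differ)
}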

rm(list = ls())

N <- 20
k <- 2
## Load the rstan library
##
library(rstan)

## run the chains in parallel, using all of the computer's cores
##
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

## set up the data
##   N: sample size
##   k: number of A1 alleles
stan_data <- list(N = N,
                  k = k)

## Invoke stan
##
fit <- stan("binomial-model.stan",
            data = stan_data,
            refresh = 0)
## print the results on the console with 3 digits after the decimal
##
print(fit, digits = 3)
Inference for Stan model: binomial-model.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean    sd    2.5%    25%    50%    75%  97.5% n_eff  Rhat
p     0.136   0.002 0.070   0.034  0.084  0.125  0.176  0.298  1447 1.002
lp__ -9.262   0.021 0.706 -11.262 -9.435 -8.993 -8.812 -8.763  1121 1.003

Samples were drawn using NUTS(diag_e) at Tue Sep 14 06:39:25 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Looking at the posterior distribution

What you see in the table above is a summary of the posterior distribution. If you read the text at the top of the table, you'll see that there were

4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

What I haven’t described so far is how Stan (or Hickory) produces estimates. It does a fancy random walk in such a way that the number of times it visits a point is proportional to the posterior probability associated with that point.1
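To make the random-walk idea concrete, here is a bare-bones Metropolis sampler for this binomial posterior. This is only an illustration: Stan actually uses a more sophisticated algorithm (NUTS, as the output above notes), and the tuning choices below (proposal width, starting point) are arbitrary picks of mine, not anything from the document.

## A toy random-walk (Metropolis) sampler for the posterior of p.
## Uses N and k as defined above (N = 20, k = 2).
set.seed(42)
log_post <- function(p) dbinom(k, N, p, log = TRUE)  # flat prior: posterior ∝ likelihood
n_iter <- 2000
p_cur <- 0.5                        # arbitrary starting point
draws <- numeric(n_iter)
for (i in seq_len(n_iter)) {
  p_prop <- p_cur + rnorm(1, 0, 0.1)       # propose a nearby point
  if (p_prop > 0 && p_prop < 1 &&
      log(runif(1)) < log_post(p_prop) - log_post(p_cur)) {
    p_cur <- p_prop                        # accept moves with the right frequency
  }
  draws[i] <- p_cur                        # a rejected move repeats the current point
}
mean(draws[-(1:1000)])   # discard the first half as warmup; compare with Stan's ~0.136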

But you want to be sure that you haven’t wandered off somewhere strange. That’s why we do it 4 times - the 4 chains. And you want to be sure you’ve taken enough steps so that you have a good sense of what those probabilities are - the iter = 2000.
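The Rhat column in the table is the cross-chain check: values near 1 (here 1.002 and 1.003) mean the 4 chains agree with one another. If you want those numbers programmatically rather than from the printout, a standard rstan call is:

## potential scale reduction factor for each quantity; ~1 at convergence
summary(fit)$summary[, "Rhat"]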

You also want to be sure that you only pay attention after you’re in the right neighborhood. You may be starting somewhere a long way from where you want to be. So you only start paying attention to where you are after a while - after the warmup = 1000.
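One way to see the warmup idea is to plot the chains' paths, warmup included (a standard rstan call; the resulting figure isn't part of this document):

## paths of all 4 chains for p; with inc_warmup = TRUE the warmup
## iterations are shown too. Chains that start far apart should
## overlap once warmup ends.
traceplot(fit, pars = "p", inc_warmup = TRUE)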

Since we take a total of 2000 steps per chain and we ignore the first 1000, that leaves 1000 post-warmup steps we’re paying attention to in any one chain, or a total of 4000 steps across all four.
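You can verify that count by extracting the draws from the fitted object:

## all post-warmup draws of p, pooled across chains
draws <- rstan::extract(fit, pars = "p")
length(draws$p)   ## 4000 = 4 chains x 1000 post-warmup iterations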

We can visualize what the posterior distribution looks like pretty easily. The filled area in blue is the density of the posterior draws we got from our sample. The black line shows the theoretical expectation, a Beta(k+1, N-k+1) density.

library(bayesplot)

## density of the posterior draws of p, with the theoretical
## Beta(k+1, N-k+1) density overlaid as a black line
p <- mcmc_dens(fit, pars = "p") +
  geom_function(fun = dbeta, args = list(shape1 = k + 1, shape2 = N - k + 1))
print(p)

Visualizing the results

library(ggplot2)

p <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  ## likelihood curve (black)
  geom_function(fun = dbeta, args = list(shape1 = k, shape2 = N - k)) +
  ## flat Beta(1, 1) prior (blue)
  geom_function(fun = dbeta, args = list(shape1 = 1, shape2 = 1),
                color = "blue") +
  ## Beta(k+1, N-k+1) posterior (salmon)
  geom_function(fun = dbeta, args = list(shape1 = k + 1, shape2 = N - k + 1),
                color = "salmon") +
  ## black dashed line: mode of the likelihood curve, (k-1)/(N-2)
  geom_vline(xintercept = (k - 1)/(N - 2), linetype = "dashed") +
  ## red dashed line: posterior mean, (k+1)/(N+2)
  geom_vline(xintercept = (k + 1)/(N + 2), color = "red",
             linetype = "dashed") +
  annotate("text", x = 0.75, y = 6, label = "Likelihood") +
  annotate("text", x = 0.75, y = 4, label = "prior",
           color = "blue") +
  annotate("text", x = 0.75, y = 5, label = "posterior",
           color = "red") +
  theme_bw()
p


  1. Technically, it’s a bit more complicated than that, but the way I described it is close enough for our purposes.↩︎
