Let's suppose we've collected data from a population of Desmodium
cuspidatum15 and have found 7 alleles coding for the fast allele at a enzyme locus encoding glucose-phosphate
isomerase in a sample of 20 alleles. We want to estimate the frequency
of the fast allele. The maximum-likelihood estimate is
, which we got by finding the value of
that maximizes

Until a little over fifteen years ago it was necessary to do a bunch of complicated calculus to combine the prior with the likelihood to get a posterior. Since the early 1990s statisticians have used a simulation approach, Monte Carlo Markov Chain sampling, to construct numerical samples from the posterior. For the problems encountered in this course, we'll mostly be using the freely available software package WinBUGS to implement Bayesian analyses. For the problem we just encountered, here's the code that's needed to get our results:17
model {
# likelihood
k ~ dbin(p, N)
# prior
p ~ dunif(0,1)
}
list(k = 7, n = 20)
Running this in WinBUGS produces the results in
Figure 1.
|
The column headings in Figure 1 should be
fairly self-explanatory, except for the one labeled MC
error.18 mean
is the posterior mean. It's our best guess of the value for the
frequency of the fast allele. s.d. is the posterior
standard deviation. It's our best guess of the uncertainty associated
with our estimate of the frequency of the fast allele. The
2.5%, 50%, and 97.5% columns are the percentiles of the posterior
distribution. The [2.5%, 97.5%] interval is the 95% credible
interval, which is analogous to the 95% confidence interval in
classical statistics, except that we can say that there's a 95%
chance that the frequency of the fast allele lies within this
interval.19 Since the results
are from a simulation, different runs will produce slightly different
results. In this case, we have a posterior mean of about 0.36 (as
opposed to the maximum-likelihood estimate of 0.35), and there is a
95% chance that
lies in the interval [0.18, 0.57].20