next up previous
Next: Another approach to selecting Up: Analyzing the genetic structure Previous: Applying assignment to understand

More details on selecting $K$

In the documentation for Strucutre, Pritchard et al. provide a few more suggestions on picking $K$. That ``thing'' I called ``Mean L(K)'' in Table 1 is the mean natural logarithm of the probability of the data, given the unknown parameters. To give it a mathematical formula

\begin{displaymath}
\mbox{Mean }L(K) = \frac{1}{N}\sum \ln \left(P_i(X\vert K)\right) \quad ,
\end{displaymath}

where $N$ is the number of times we ran Structure for a particular $K$ and $\ln\left(P_i(X\vert K)\right)$ is the natural logarithm of the probability of the data on the $i$th run.2 That quantity $P_i(X\vert K)$ should ring a bell. It looks a lot like a likelihood--probably because it is (approximately) a likelihood, meaning that $\exp\left(\mbox{Mean }L(K)\right)$ is approximately the mean log likelihood of the data.

So if we don't have any idea what $K$ is, we might assume a uniform prior, i.e., that all of the $K$ we considered are a priori equally likely. then by using Bayes' Theorem, we can calculate the posterior probability of each $K$. For the Berberis thunbergii data and $K=2$,

\begin{eqnarray*}
P(K=2\vert X) &=& \frac{\exp\left(-2553.2\right)}
{\exp\left(-...
...p\left(-2476.3\right)} \\
&\approx& 7.8 \times 10^{-97} \quad .
\end{eqnarray*}

If you try to put a number like $\exp\left(-2553.2\right)$ into your calculator, you're going to get a numerical overflow. So I used a little re-scaling trick to do the calculation. Here's a function in R that will do it for you:
   K.post <- function(K, post) {
      min.post <- min(post)
      denominator <- sum(exp(post-min.post))
      numerator <- exp(post[K] - min.post)
      numerator/denominator
   }
post is an array that contains the mean log probability of the data for each $K$ you're considering, in this case c(-2553.2, -2331.9, -2402.0, -2476.3). K is the index of the $K$ whose posterior probability you want to calculate, in this case 1.3 So putting this into R we get
   > K.post(1, c(-2553.2, -2331.9, -2402.0, -2476.3))
   [1] 7.77376e-97


next up previous
Next: Another approach to selecting Up: Analyzing the genetic structure Previous: Applying assignment to understand
Kent Holsinger 2010-09-20