
What is a maximum-likelihood estimate?

I just told you that the method I described produces ``maximum-likelihood estimates'' for the allele frequencies, but I haven't told you what a maximum-likelihood estimate is. The good news is that you've been using maximum-likelihood estimates for as long as you've been estimating anything, without even knowing it. Although it will take me a while to explain, the idea is actually pretty simple.

Suppose we had a sock drawer with two colors of socks, red and green. And suppose we were interested in estimating the proportion of red socks in the drawer. One way of approaching the problem would be to mix the socks well, close our eyes, take one sock from the drawer, record its color and replace it. Suppose we do this $N$ times. We know that the number of red socks we'll get might be different the next time, so the number of red socks we get is a random variable. Let's call it $K$. Now suppose in our actual experiment we find $k$ red socks, i.e., $K=k$. If we knew $p$, the proportion of red socks in the drawer, we could calculate the probability of getting the data we observed, namely

\begin{displaymath}
\mbox{P}(K=k\vert p) = {N \choose k} p^k (1-p)^{(N-k)} \quad .
\end{displaymath} (4)

This is the binomial probability distribution. The part on the left side of the equation is read as ``The probability that we get $k$ red socks in our sample given the value of $p$.'' The word ``given'' means that we're calculating the probability of our data conditional on the (unknown) value $p$.
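To make equation (4) concrete, here is a minimal Python sketch that evaluates it. The function name and the numbers (10 draws, 3 red socks, a drawer that is 30% red) are made up for illustration; they are not part of the example above.

```python
from math import comb


def binomial_probability(k, n, p):
    """P(K = k | p) from equation (4): n draws with replacement, k of them red."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical numbers: 10 draws, 3 red socks, a drawer that is 30% red.
n_draws, n_red, p_red = 10, 3, 0.3
print(binomial_probability(n_red, n_draws, p_red))

# For a fixed p, the probabilities of all possible outcomes k = 0..n add up to 1.
print(sum(binomial_probability(k, n_draws, p_red) for k in range(n_draws + 1)))
```

Note that $p$ is held fixed here and $k$ is what varies, which is exactly what ``given the value of $p$'' means.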

Of course we don't know $p$, so what good does writing (4) do? Well, suppose we reverse the question to which equation (4) is an answer: instead of fixing $p$ and asking how probable different values of $k$ are, we hold the observed $k$ fixed and regard the expression in (4) as a function of $p$. We call that function the ``likelihood of the data.'' Suppose further that we find the value of $p$, call it $\hat p$, that makes the likelihood bigger than any other value we could pick would. Then $\hat p$ is the maximum-likelihood estimate of $p$.
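One way to see what ``the value of $p$ that makes the likelihood biggest'' means is to try many candidate values and keep the best one. The sketch below does that with a crude grid search. The data (3 red socks in 10 draws) and the grid spacing of 0.001 are assumptions chosen for illustration, and a grid search is only a stand-in for a proper maximization.

```python
from math import comb


def likelihood(p, k, n):
    """Equation (4) read as a function of p, with the observed k and n held fixed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical data: 3 red socks in 10 draws.
n_draws, n_red = 10, 3

# Candidate values of p from 0 to 1 in steps of 0.001.
grid = [i / 1000 for i in range(1001)]

# Keep the candidate that gives the observed data the largest likelihood.
p_hat = max(grid, key=lambda p: likelihood(p, n_red, n_draws))

print(p_hat, n_red / n_draws)
```

For these numbers the grid maximum coincides with the sample proportion $k/N$, which is also what you get by setting the derivative of the log of (4) with respect to $p$ to zero.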

In the case of the ABO blood group that we just talked about, the likelihood is a bit more complicated:

\begin{displaymath}
{N \choose N_A \; N_{AB} \; N_B \; N_O} p_a^{N_A} p_{ab}^{N_{AB}} p_b^{N_B} p_o^{N_O} \quad .
\end{displaymath} (5)

This is a multinomial probability distribution.
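Expression (5) can be evaluated the same way as (4). The sketch below is a minimal illustration only: the function name, the phenotype counts, and the four frequencies are all hypothetical, and the frequencies are treated simply as given numbers rather than being computed from allele frequencies.

```python
from math import factorial, prod


def multinomial_probability(counts, freqs):
    """Expression (5): probability of the observed category counts
    given the probability of each category."""
    n = sum(counts)
    # Multinomial coefficient N! / (N_A! N_AB! N_B! N_O!), built with exact integers.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(f**c for f, c in zip(freqs, counts))


# Hypothetical sample of 10 individuals, in the order A, AB, B, O.
counts = [4, 1, 2, 3]
# Hypothetical phenotype frequencies in the same order; they must sum to 1.
freqs = [0.4, 0.1, 0.2, 0.3]

print(multinomial_probability(counts, freqs))
```

With only two categories the same function reduces to the binomial probability in (4), which is a handy sanity check.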


Kent Holsinger 2008-08-13