
Estimating allele frequencies

Before we can determine whether genotypes in a population are in Hardy-Weinberg proportions, we need to be able to estimate the frequency of both genotypes and alleles. This is easy when you can identify all of the alleles within genotypes, but suppose that we're trying to estimate allele frequencies in the ABO blood group system in humans. Then we have a situation that looks like this:

Phenotype        A          AB          B          O
Genotype(s)      aa, ao     ab          bb, bo     oo
No. in sample    $N_A$      $N_{AB}$    $N_B$      $N_O$

Now we can't directly count the number of $a$, $b$, and $o$ alleles. What do we do? Well, about 50 years ago, some statisticians came up with a sneaky approach called the EM algorithm. It uses a trick you'll see repeatedly through this course. When we don't know something we want to know, we pretend that we know it and do some calculations with it. If we're lucky, we can fiddle with our calculations a bit to relate the thing that we pretended to know to something we actually do know so we can figure out what we wanted to know. Make sense? Probably not. But let's try an example.

If we knew $p_a$, $p_b$, and $p_o$, we could figure out how many individuals with the $A$ phenotype have the $aa$ genotype and how many have the $ao$ genotype, namely

\begin{eqnarray*}
N_{aa} &=& N_A \left({p_a^2 \over p_a^2 + 2p_ap_o}\right) \\
N_{ao} &=& N_A \left({2p_ap_o \over p_a^2 + 2p_ap_o}\right) \quad .
\end{eqnarray*}

Obviously we could do the same thing for the $B$ phenotype:

\begin{eqnarray*}
N_{bb} &=& N_B \left({p_b^2 \over p_b^2 + 2p_bp_o}\right) \\
N_{bo} &=& N_B \left({2p_bp_o \over p_b^2 + 2p_bp_o}\right) \quad .
\end{eqnarray*}
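
Here is a minimal Python sketch of that split, in case seeing it as code helps. The function name `split_phenotype` is made up for illustration, and the inputs are the sample count and starting guesses used in the worked example later in this section.

```python
def split_phenotype(n_pheno, p_allele, p_o):
    """Split an A (or B) phenotype count into expected genotype counts.

    Returns (homozygotes, heterozygotes), i.e. (N_aa, N_ao) or (N_bb, N_bo).
    Assumes p_allele > 0 so that the denominator is not zero.
    """
    hom = p_allele ** 2          # Hardy-Weinberg frequency of aa (or bb)
    het = 2 * p_allele * p_o     # Hardy-Weinberg frequency of ao (or bo)
    n_hom = n_pheno * hom / (hom + het)
    return n_hom, n_pheno - n_hom


# 25 individuals with phenotype A, guessing p_a = 0.33 and p_o = 0.34
n_aa, n_ao = split_phenotype(25, 0.33, 0.34)
print(round(n_aa, 3), round(n_ao, 3))  # 8.168 16.832
```

The same function handles the $B$ phenotype if you pass it the $B$ count and $p_b$ instead.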

Notice that $N_{ab} = N_{AB}$ and $N_{oo} = N_O$ (lowercase subscripts refer to genotypes, uppercase to phenotypes). If we knew all this, then we could calculate $p_a$, $p_b$, and $p_o$ from

\begin{eqnarray*}
p_a &=& {2N_{aa} + N_{ao} + N_{ab} \over 2N} \\
p_b &=& {2N_{bb} + N_{bo} + N_{ab} \over 2N} \\
p_o &=& {2N_{oo} + N_{ao} + N_{bo} \over 2N} \quad ,
\end{eqnarray*}

where $N$ is the total sample size.
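
The allele-counting step can be sketched in Python as follows. The function name `allele_frequencies` and the genotype counts passed to it are hypothetical, chosen only to show the arithmetic; they are not data from any real sample.

```python
def allele_frequencies(n_aa, n_ao, n_bb, n_bo, n_ab, n_oo):
    """Count alleles in a set of genotype counts and return (p_a, p_b, p_o)."""
    n = n_aa + n_ao + n_bb + n_bo + n_ab + n_oo   # total sample size N
    # Each individual carries two alleles, so there are 2N alleles in all.
    p_a = (2 * n_aa + n_ao + n_ab) / (2 * n)
    p_b = (2 * n_bb + n_bo + n_ab) / (2 * n)
    p_o = (2 * n_oo + n_ao + n_bo) / (2 * n)
    return p_a, p_b, p_o


# Hypothetical genotype counts, pretending we could observe them directly.
p_a, p_b, p_o = allele_frequencies(10, 15, 10, 15, 50, 15)
print(p_a, p_b, p_o)
```

Because every allele is counted exactly once, the three frequencies always add up to 1.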

Surprisingly enough, we can actually estimate the allele frequencies by using this trick. Just take a guess at the allele frequencies. Any guess will do, as long as none of the frequencies is zero. Then calculate $N_{aa}$, $N_{ao}$, $N_{bb}$, $N_{bo}$, $N_{ab}$, and $N_{oo}$ as described in the preceding paragraph. That's the Expectation part of the EM algorithm. Now take the values for $N_{aa}$, $N_{ao}$, $N_{bb}$, $N_{bo}$, $N_{ab}$, and $N_{oo}$ that you've calculated and use them to calculate new values for the allele frequencies. That's the Maximization part of the EM algorithm. Chances are your new values for $p_a$, $p_b$, and $p_o$ won't match your initial guesses, but if you take these new values and start the process over and repeat the whole sequence several times, eventually the allele frequencies you get out at the end match those you started with. These are maximum-likelihood estimates of the allele frequencies.
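
Putting the two steps in a loop gives something like the Python sketch below. This is one way to write it, not the only one: the function name `em_abo`, the default starting guess of 1/3 for each allele, and the stopping rule (quit when no frequency changes by more than `tol`) are all assumptions made for illustration. The counts in the usage line are the ones from the worked example that follows.

```python
def em_abo(n_A, n_AB, n_B, n_O, p_a=1/3, p_b=1/3, tol=1e-8, max_iter=1000):
    """Estimate (p_a, p_b, p_o) from ABO phenotype counts by EM iteration."""

    def split(n_pheno, p, p_o):
        # Expected homozygote and heterozygote counts for an A or B phenotype.
        if n_pheno == 0:
            return 0.0, 0.0          # nothing to split; avoids 0/0
        hom, het = p * p, 2 * p * p_o
        n_hom = n_pheno * hom / (hom + het)
        return n_hom, n_pheno - n_hom

    n = n_A + n_AB + n_B + n_O       # total sample size N
    p_o = 1.0 - p_a - p_b
    for _ in range(max_iter):
        # Expectation: pretend the current frequencies are right.
        n_aa, n_ao = split(n_A, p_a, p_o)
        n_bb, n_bo = split(n_B, p_b, p_o)
        # Maximization: count alleles in the expected genotype counts.
        new_a = (2 * n_aa + n_ao + n_AB) / (2 * n)
        new_b = (2 * n_bb + n_bo + n_AB) / (2 * n)
        new_o = (2 * n_O + n_ao + n_bo) / (2 * n)
        change = max(abs(new_a - p_a), abs(new_b - p_b), abs(new_o - p_o))
        p_a, p_b, p_o = new_a, new_b, new_o
        if change < tol:             # assumed stopping rule
            break
    return p_a, p_b, p_o


estimates = em_abo(25, 50, 25, 15)
print([round(p, 3) for p in estimates])  # [0.372, 0.372, 0.256]
```

Trying a few different starting guesses is a cheap way to convince yourself that the answer doesn't depend on where you start.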

Consider the following example:

Phenotype        A      AB     B      O
No. in sample    25     50     25     15

We'll start with the guess that $p_a = 0.33$, $p_b = 0.33$, and $p_o = 0.34$. With that assumption we would calculate that $25(0.33^2/(0.33^2 + 2(0.33)(0.34))) = 8.168$ of the A phenotypes in the sample have genotype $aa$, and the remaining 16.832 have genotype $ao$. Similarly, we can calculate that 8.168 of the B phenotypes in the sample have genotype $bb$, and the remaining 16.832 have genotype $bo$. Now that we have a guess about how many individuals of each genotype we have, we can calculate a new guess for the allele frequencies, namely $p_a = 0.362$, $p_b = 0.362$, and $p_o = 0.277$. By the time we've repeated this process four more times, the allele frequencies aren't changing anymore. So the maximum-likelihood estimate of the allele frequencies is $p_a = 0.372$, $p_b = 0.372$, and $p_o = 0.256$.
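
To make the first update explicit: the sample size is $N = 25 + 50 + 25 + 15 = 115$, $N_{ab} = N_{AB} = 50$, and $N_{oo} = N_O = 15$, so plugging the expected genotype counts into the allele-counting formulas gives

\begin{eqnarray*}
p_a &=& {2(8.168) + 16.832 + 50 \over 2(115)} \approx 0.362 \\
p_b &=& {2(8.168) + 16.832 + 50 \over 2(115)} \approx 0.362 \\
p_o &=& {2(15) + 16.832 + 16.832 \over 2(115)} \approx 0.277 \quad .
\end{eqnarray*}

The three values add up to 1.001 rather than 1 only because each has been rounded to three decimal places.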



Kent Holsinger 2008-08-13