The EM algorithm is a method of calculating maximum-likelihood estimates through a two-step iteration: Expectation and Maximization. In this application we start with a guess for the underlying allele frequencies (a, b, and o). If those were the correct frequencies, then, given the observed number of individuals with each blood-type phenotype (N for types A, B, AB, and O), the expected number of individuals with each genotype is:
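| AA: | $\dfrac{a^2}{a^2 + 2ao}\,N_A$ |
| AO: | $\dfrac{2ao}{a^2 + 2ao}\,N_A$ |
| BB: | $\dfrac{b^2}{b^2 + 2bo}\,N_B$ |
| BO: | $\dfrac{2bo}{b^2 + 2bo}\,N_B$ |
| AB: | $N_{AB}$ (AB is the only genotype that gives phenotype AB) |
| OO: | $N_O$ (OO is the only genotype that gives phenotype O) |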
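This is the Expectation (E) stage. Given those expected genotype counts (written AA, AO, BB, BO, AB, and OO), new guesses for the allele frequencies can be calculated from the maximum-likelihood estimates associated with them, which amount to counting alleles, two per individual, i.e.,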
| a = (2AA + AB + AO)/(2(AA + AO + BB + BO + AB +OO)) |
| b = (2BB + AB + BO)/(2(AA + AO + BB + BO + AB +OO)) |
| o = (2OO + AO + BO)/(2(AA + AO + BB + BO + AB +OO)) |
This is the Maximization (M) stage. The estimates of a, b, and o obtained from this stage are used for another round of Expectation and Maximization, and the process is repeated until the frequencies no longer change appreciably from one round to the next.
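
The iteration described here can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `em_abo`, the equal starting guess of 1/3 for each allele, the tolerance `tol`, and the cap `max_iter` are all assumptions introduced for the example. It also assumes that the expected AB and OO genotype counts are simply the observed AB and O phenotype counts, since each of those phenotypes arises from a single genotype.

```python
def em_abo(n_a, n_b, n_ab, n_o, a=1/3, b=1/3, o=1/3, tol=1e-10, max_iter=1000):
    """Estimate ABO allele frequencies (a, b, o) from phenotype counts by EM.

    n_a, n_b, n_ab, n_o are the observed numbers of individuals with
    blood types A, B, AB, and O. The starting guesses must be positive.
    Returns (a, b, o, number_of_iterations_used).
    """
    n_total = n_a + n_b + n_ab + n_o
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # E stage: split each phenotype count into expected genotype counts.
        aa = n_a * (a * a) / (a * a + 2 * a * o)
        ao = n_a * (2 * a * o) / (a * a + 2 * a * o)
        bb = n_b * (b * b) / (b * b + 2 * b * o)
        bo = n_b * (2 * b * o) / (b * b + 2 * b * o)
        ab = n_ab  # assumption: phenotype AB has only one genotype
        oo = n_o   # assumption: phenotype O has only one genotype

        # M stage: count alleles; every individual carries two of them.
        a_new = (2 * aa + ab + ao) / (2 * n_total)
        b_new = (2 * bb + ab + bo) / (2 * n_total)
        o_new = (2 * oo + ao + bo) / (2 * n_total)

        # Stop once no frequency moves by more than tol in one round.
        change = max(abs(a_new - a), abs(b_new - b), abs(o_new - o))
        a, b, o = a_new, b_new, o_new
        if change < tol:
            break
    return a, b, o, iteration


if __name__ == "__main__":
    # Hypothetical counts, constructed to match a = 0.3, b = 0.1, o = 0.6
    # exactly under random mating in a sample of 10,000 individuals.
    a_hat, b_hat, o_hat, rounds = em_abo(4500, 1300, 600, 3600)
    print(f"a={a_hat:.4f} b={b_hat:.4f} o={o_hat:.4f} after {rounds} rounds")
```

The stopping rule compares successive estimates rather than the likelihood itself, which mirrors the "repeat until the frequencies stop changing" description; a stricter or looser `tol` trades precision against the number of rounds.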