

Statistical expectation and biased estimates

The concept of statistical expectation is actually quite an easy one. It is an arithmetic average, just one calculated from probabilities instead of being calculated from samples. So, for example, if $\mbox{P}(k)$ is the probability that we find $k$ $A_1$ alleles in our sample, the expected number of $A_1$ alleles in our sample is just

\begin{eqnarray*}
\mbox{E}(k) &=& \sum k \mbox{P}(k) \\
&=& n p \quad , \\
\end{eqnarray*}

where $n$ is the total number of alleles in our sample and $p$ is the frequency of $A_1$ in the population.
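As a quick numerical check of $\mbox{E}(k) = np$, the sum $\sum k \mbox{P}(k)$ can be evaluated directly. The sketch below assumes that alleles are sampled independently, so that $\mbox{P}(k)$ is binomial; the function names and the values of `n` and `p` are hypothetical choices made only for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k): probability of finding k A_1 alleles among n sampled alleles.

    Assumes independent (binomial) sampling with population frequency p.
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_count(n, p):
    """E(k), computed as the probability-weighted average sum_k k * P(k)."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))


# Hypothetical sample size and population allele frequency.
n, p = 20, 0.3
print(expected_count(n, p), n * p)  # the two values agree up to rounding error
```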

Now consider the expected value of our sample estimate of the population allele frequency, $\hat p = k/n$, where $k$ now refers to the number of $A_1$ alleles we actually found.

\begin{eqnarray*}
\mbox{E}(\hat p) &=& \mbox{E}(k/n) \\
&=& \sum (k/n) \mbox{P}(k) \\
&=& (1/n)\left(\sum k \mbox{P}(k)\right) \\
&=& (1/n)(n p) \\
&=& p \quad . \\
\end{eqnarray*}

Because $\mbox{E}(\hat p) = p$, $\hat p$ is said to be an unbiased estimate of $p$. An estimate is unbiased if, were we to repeat the sampling experiment an infinite number of times, the average of the resulting estimates would equal the (unknown) parameter value.
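We cannot repeat an experiment infinitely often, but a simulation with many replicates gives the flavor of the idea. The following is a minimal sketch, assuming independent sampling of alleles; `sample_p_hat`, `mean_p_hat`, the seed, and the parameter values are all hypothetical and chosen only for illustration.

```python
import random


def sample_p_hat(n, p, rng):
    """Draw n alleles independently and return the sample frequency k/n."""
    k = sum(1 for _ in range(n) if rng.random() < p)
    return k / n


def mean_p_hat(n, p, reps, seed=1):
    """Average of p-hat over `reps` independent repetitions of the sampling."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return sum(sample_p_hat(n, p, rng) for _ in range(reps)) / reps


# Hypothetical values: 20 alleles per sample, true frequency 0.3.
n, p = 20, 0.3
for reps in (10, 1000, 20000):
    print(reps, mean_p_hat(n, p, reps))
```

Any single estimate may be far from $p$, but the average over replicates should settle near $p$ as the number of replicates grows.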

What about estimating the frequency of heterozygotes within a population? The obvious estimator is $\tilde H = 2\hat p (1 - \hat p)$. Its expectation is

\begin{eqnarray*}
\mbox{E}(\tilde H) &=& \mbox{E}\left(2\hat p (1 - \hat p)\right) \\
&=& 2\left(\mbox{E}(\hat p) - \mbox{E}({\hat p}^2)\right) \\
&=& ((n-1)/n)2p(1-p) \quad . \\
\end{eqnarray*}

Because $\mbox{E}(\tilde H) \ne 2p(1-p)$, $\tilde H$ is a biased estimate of $2p(1-p)$. If we set $\hat H = (n/(n-1))\tilde H$, however, $\hat H$ is an unbiased estimator of $2p(1-p)$.
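The size of the bias can be checked exactly for a small sample by summing over every possible value of $k$. This sketch again assumes binomial sampling of alleles; the helper names (`expectation`, `h_tilde`, `h_hat`) and the values of `n` and `p` are hypothetical.

```python
from math import comb


def expectation(f, n, p):
    """E[f(k)] for k ~ Binomial(n, p), computed as sum_k f(k) * P(k)."""
    return sum(
        f(k) * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)
    )


def h_tilde(k, n):
    """The obvious (biased) estimator 2 * p_hat * (1 - p_hat)."""
    p_hat = k / n
    return 2 * p_hat * (1 - p_hat)


def h_hat(k, n):
    """The corrected estimator (n / (n - 1)) * h_tilde."""
    return n / (n - 1) * h_tilde(k, n)


# Hypothetical values: a small sample makes the bias easy to see.
n, p = 10, 0.3
true_h = 2 * p * (1 - p)
e_tilde = expectation(lambda k: h_tilde(k, n), n, p)
e_hat = expectation(lambda k: h_hat(k, n), n, p)

print("2p(1-p)       :", true_h)
print("E(H tilde)    :", e_tilde)
print("(n-1)/n * true:", (n - 1) / n * true_h)
print("E(H hat)      :", e_hat)
```

With these values $\tilde H$ falls short of $2p(1-p)$ by the factor $(n-1)/n = 0.9$ on average, while $\hat H$ recovers it.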

If you've ever wondered why you typically divide the sum of squared deviations about the mean by $n-1$ instead of $n$ when estimating the variance of a sample, this is why. Dividing by $n$ gives you a (slightly) biased estimator.
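The same effect can be seen for the variance by enumerating every possible sample from a tiny population. The sketch below assumes sampling with replacement from a hypothetical population of six equally likely values; `pvariance` (divides by $n$) and `variance` (divides by $n-1$) are from the Python standard library.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical population: six equally likely values (a fair die).
population = [1, 2, 3, 4, 5, 6]
n = 3  # sample size

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

true_var = pvariance(population)  # variance of the population itself
avg_div_n = mean(pvariance(s) for s in samples)  # divide by n
avg_div_n_minus_1 = mean(variance(s) for s in samples)  # divide by n - 1

print("population variance      :", true_var)
print("average, dividing by n   :", avg_div_n)
print("average, dividing by n-1 :", avg_div_n_minus_1)
```

Averaged over all samples, dividing by $n$ gives $(n-1)/n$ times the population variance, while dividing by $n-1$ gives the population variance itself.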



Kent Holsinger 2012-09-08