
Fay and Wu's $H$

Let $\xi_i$ be the number of sites at which a sequence occurring $i$ times in the sample differs from the sequence of the most recent common ancestor for all the sequences. Fu [2] showed that

\begin{displaymath}
\mbox{E}(\xi_i) = \frac{\theta}{i} \quad .
\end{displaymath}

Remember that $i$ is the number of times this haplotype occurs in the sample. Using this result, we can rewrite $\hat\theta_\pi$ and $\hat\theta_k$ as

\begin{eqnarray*}
\hat\theta_\pi &=& {n \choose 2}^{-1}\sum_{i=1}^{n-1}i(n-i)\hat\xi_i \\
\hat\theta_k &=& \frac{1}{a_n}\sum_{i=1}^{n-1}\hat\xi_i
\end{eqnarray*}
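
As a quick check (a sketch, assuming $a_n = \sum_{i=1}^{n-1} 1/i$ as in Watterson's estimator, a definition not restated in this section), substituting $\mbox{E}(\xi_i) = \theta/i$ into each sum suggests that both statistics have expectation $\theta$ under neutrality:

\begin{eqnarray*}
\mbox{E}(\hat\theta_\pi) &=& {n \choose 2}^{-1}\sum_{i=1}^{n-1}i(n-i)\frac{\theta}{i}
  = {n \choose 2}^{-1}\theta\sum_{i=1}^{n-1}(n-i) = \theta \\
\mbox{E}(\hat\theta_k) &=& \frac{1}{a_n}\sum_{i=1}^{n-1}\frac{\theta}{i} = \theta \quad .
\end{eqnarray*}

The first line uses $\sum_{i=1}^{n-1}(n-i) = n(n-1)/2 = {n \choose 2}$.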

There are also at least three other statistics that could be used to estimate $\theta$ from these data:

\begin{eqnarray*}
\xi_e &=& \hat\xi_1 \\
\theta_H &=& {n \choose 2}^{-1}\sum_{i=1}^{n-1}i^2\hat\xi_i \\
\theta_L &=& \frac{1}{n-1}\sum_{i=1}^{n-1}i\hat\xi_i \quad .
\end{eqnarray*}
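
To make the weighting schemes concrete, here is a minimal Python sketch that evaluates the five statistics from a vector of $\hat\xi_i$ values. The function name `theta_estimators`, the list layout (`xi[i-1]` holds $\hat\xi_i$), the use of $i^2$ weights with the ${n \choose 2}^{-1}$ normalization for $\theta_H$, and $a_n = \sum_{i=1}^{n-1} 1/i$ are assumptions made for illustration, not part of the original notes.

```python
from math import comb


def theta_estimators(xi):
    """Evaluate five estimators of theta from frequency-class counts.

    xi[i-1] is assumed to hold xi_i for i = 1, ..., n-1, with sites
    polarized against an outgroup (hypothetical input layout).
    """
    n = len(xi) + 1                           # sample size
    pairs = comb(n, 2)                        # n choose 2
    a_n = sum(1.0 / i for i in range(1, n))   # assumed Watterson constant
    classes = list(enumerate(xi, start=1))    # (i, xi_i) pairs
    return {
        "pi": sum(i * (n - i) * x for i, x in classes) / pairs,
        "k": sum(x for _, x in classes) / a_n,
        "e": xi[0],
        "H": sum(i * i * x for i, x in classes) / pairs,
        "L": sum(i * x for i, x in classes) / (n - 1),
    }


# Illustrative spectrum set equal to its neutral expectation theta / i,
# with theta = 12 and n = 5 (made-up numbers, not real data).
expected_neutral = [12, 6, 4, 3]
estimates = theta_estimators(expected_neutral)
for name, value in estimates.items():
    print(name, round(value, 6))
```

When the counts match the neutral expectation exactly, all five values coincide with $\theta$; they diverge only when some frequency classes are over- or under-represented.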

Notice that to estimate $\xi_e$, $\theta_H$, or $\theta_L$, you'll need information on the sequence of an ancestral haplotype. To get this you'll need an outgroup. As we've already seen, we can get estimates of $\theta_\pi$ and $\theta_k$ without an outgroup.

Fay and Wu [1] suggest using the statistic

\begin{displaymath}
H = \hat\theta_\pi - \theta_H
\end{displaymath}

to detect departures from neutrality. So what's the difference between Fay and Wu's $H$ and Tajima's $D$? Well, notice that there's an $i^2$ term in $\theta_H$. The largest contributions to this estimate of $\theta$ come from alleles in relatively high frequency, i.e., those with lots of copies in our sample. In contrast, intermediate-frequency alleles contribute most to estimates of $\theta_\pi$. Thus, $H$ measures departures from neutrality that are reflected in the difference between high-frequency and intermediate-frequency alleles. In contrast, $D$ measures departures from neutrality that are reflected in the difference between low-frequency and intermediate-frequency alleles. Thus, while $D$ is sensitive to population expansion (because the number of segregating sites responds more rapidly to changes in population size than the nucleotide heterozygosity), $H$ will not be. As a result, combining both tests may allow you to distinguish population expansion from purifying selection.
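
The contrast between the two weightings can be sketched numerically. The snippet below assumes $\theta_H$ uses $i^2$ weights with the same ${n \choose 2}^{-1}$ normalization as $\hat\theta_\pi$, so the net weight on class $i$ in $H$ is proportional to $i(n-i) - i^2 = i(n-2i)$: positive for classes below $n/2$ and negative above it. The function name `fay_wu_h` and both example spectra are hypothetical, chosen only to illustrate the sign of $H$.

```python
from math import comb


def fay_wu_h(xi):
    """Return H = theta_pi - theta_H for frequency-class counts.

    xi[i-1] is assumed to hold xi_i for i = 1, ..., n-1, polarized
    with an outgroup (hypothetical input layout).
    """
    n = len(xi) + 1
    pairs = comb(n, 2)
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(xi, start=1)) / pairs
    theta_h = sum(i * i * x for i, x in enumerate(xi, start=1)) / pairs
    return theta_pi - theta_h


# Made-up spectra for n = 5: the first equals the neutral expectation
# theta / i with theta = 12; the second adds five sites to class i = 4.
neutral = [12, 6, 4, 3]
high_frequency_excess = [12, 6, 4, 8]

print(fay_wu_h(neutral))                # 0.0
print(fay_wu_h(high_frequency_excess))  # -6.0
```

In this toy case the extra high-frequency sites raise $\theta_H$ more than $\hat\theta_\pi$, so $H$ becomes negative. Judging whether an observed $H$ is unusually negative still requires its null distribution, which this sketch does not attempt.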


Kent Holsinger 2008-09-07