
Introduction

We saw last time that comparing two estimators of $\theta = 4N_e\mu$ can help us determine whether patterns of diversity within populations are consistent with neutral expectations. Specifically, let

\begin{displaymath}
\hat\pi = \sum \hat x_i\hat x_j\delta_{ij}/N \quad ,
\end{displaymath}

be the observed nucleotide heterozygosity, and let $\hat k$ be the observed number of segregating sites in a sample. Then

\begin{eqnarray*}
\hat \theta_\pi &=& \hat \pi \\
\hat \theta_k &=& \frac{\hat k}{\sum_{i=1}^{n-1}\frac{1}{i}} \quad ,
\end{eqnarray*}

where $n$ is the number of sequences in your sample, and

\begin{displaymath}
\hat D = \hat\theta_\pi - \hat\theta_k
\quad.
\end{displaymath}
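
The two estimators and their difference can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. It assumes aligned sequences of equal length with no gaps or missing data, it computes $\hat\pi$ as the mean number of differences over all pairs of sequences (not divided by sequence length, so that it is on the same scale as $\hat k$), and it returns the unnormalized difference defined above; Tajima's published statistic additionally divides this difference by an estimate of its standard deviation. The function names and the toy sample are hypothetical.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean number of nucleotide differences per pair of sequences.

    Assumption: pi is measured per sequence (whole locus), not per site.
    """
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajima_difference(seqs):
    """Return (theta_pi, theta_k, D), where D = theta_pi - theta_k.

    This is the unnormalized difference; it is not divided by its
    estimated standard deviation.
    """
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))  # sum_{i=1}^{n-1} 1/i
    theta_pi = pairwise_diversity(seqs)
    theta_k = segregating_sites(seqs) / a_n
    return theta_pi, theta_k, theta_pi - theta_k


# Hypothetical toy sample: four aligned sequences, two variable sites.
sample = ["ATGCA", "ATGCT", "ACGCA", "ATGCA"]
theta_pi, theta_k, d_hat = tajima_difference(sample)
print(theta_pi, theta_k, d_hat)
```

In this toy sample both variable sites are singletons, i.e., each variant appears in only one sequence, so $\hat\theta_k$ slightly exceeds $\hat\theta_\pi$ and the difference is negative.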

A value of $\hat D > 0$ indicates an excess of intermediate-frequency variants and suggests either a recent population bottleneck or some form of balancing selection. A value of $\hat D < 0$ indicates an excess of rare variants and suggests either population expansion or purifying selection. A quick check in Web of Science reveals that the paper in which Tajima described this approach [3] has been cited almost 2200 times since 1994, so it has clearly been widely used for interpreting patterns of nucleotide sequence variation. Although Tajima's $D$ is a very useful statistic, Zeng et al. [4] point out that it does not consider some important aspects of the data. As a result, it may be less powerful, i.e., less able to detect departures from neutrality, than some alternatives.



Kent Holsinger 2008-09-07