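Let $\xi_i$ be the number of sites at which a sequence occurring $i$
times in the sample differs from the sequence of the most recent
common ancestor for all the sequences. Fu [2] showed that

\[ E(\xi_i) = \frac{\theta}{i} . \]

Remember that $i$ is the number of times this haplotype occurs in the
sample. Using this result, we can rewrite $\hat\theta_\pi$ (the
estimate based on nucleotide heterozygosity) and $\hat\theta_W$ (the
estimate based on the number of segregating sites) as

\[ \hat\theta_\pi = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} i(n-i)\,\xi_i \]

\[ \hat\theta_W = \frac{1}{a_n} \sum_{i=1}^{n-1} \xi_i , \qquad a_n = \sum_{i=1}^{n-1} \frac{1}{i} , \]

where $n$ is the number of sequences in the sample.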
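As a minimal sketch of how these two estimators can be computed from the counts $\xi_i$, the Python below assumes the standard forms $\hat\theta_\pi = \binom{n}{2}^{-1}\sum_i i(n-i)\xi_i$ and $\hat\theta_W = \sum_i \xi_i / a_n$ with $a_n = \sum_{i=1}^{n-1} 1/i$. The function names and the list layout (`xi[i-1]` holds $\xi_i$) are illustrative choices, not part of the notes.

```python
from math import comb

def theta_pi(xi):
    """Heterozygosity-based estimate of theta.

    xi[i-1] is the number of sites at which the derived variant
    occurs i times in a sample of n = len(xi) + 1 sequences.
    """
    n = len(xi) + 1
    return sum(i * (n - i) * x for i, x in enumerate(xi, start=1)) / comb(n, 2)

def theta_w(xi):
    """Segregating-sites-based estimate of theta (sum of xi_i divided by a_n)."""
    n = len(xi) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(xi) / a_n

# Sanity check under the neutral expectation E(xi_i) = theta / i:
# plugging the expected counts in should return theta from both estimators.
theta, n = 6.0, 5
expected_xi = [theta / i for i in range(1, n)]
print(theta_pi(expected_xi), theta_w(expected_xi))
```

Feeding both functions the expected counts $\theta/i$ is a quick way to see that the two formulas estimate the same quantity when the neutral model holds.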
There are also at least three other statistics that could be used to
estimate $\theta$ from these data:
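For reference, two estimators of this kind as they are usually written in the literature are Fay and Wu's $\hat\theta_H$ and Zeng et al.'s $\hat\theta_L$. The forms below use the same $\xi_i$ and sample size $n$ as above; that these are the exact forms intended here is an assumption.

\[ \hat\theta_H = \binom{n}{2}^{-1} \sum_{i=1}^{n-1} i^2\,\xi_i \]

\[ \hat\theta_L = \frac{1}{n-1} \sum_{i=1}^{n-1} i\,\xi_i \]

Substituting $E(\xi_i) = \theta/i$ gives $E(\hat\theta_H) = \binom{n}{2}^{-1}\theta\sum_{i=1}^{n-1} i = \theta$ and $E(\hat\theta_L) = \frac{1}{n-1}\sum_{i=1}^{n-1}\theta = \theta$, so both are unbiased under neutrality.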
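Notice that to estimate any of these three statistics, you'll need
information on the sequence of an ancestral haplotype. To get this
you'll need an outgroup. As we've already seen, we can get estimates
of $\theta_\pi$ (from the nucleotide heterozygosity) and $\theta_W$
(from the number of segregating sites) without an outgroup.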
Fay and Wu [1] suggest using the statistic

\[ H = \hat\theta_\pi - \hat\theta_H \]

to detect departures from neutrality. So what's the difference between
Fay and Wu's $H$ and Tajima's $D$? Well, notice that there's an $i^2$
term in $\hat\theta_H$, so each site is weighted by the square of the
number of copies of the derived allele. The largest contributions to
this estimate of $\theta$ are coming from alleles in relatively high
frequency, i.e., those with lots of copies in our sample. In contrast,
intermediate-frequency alleles contribute most to estimates of
$\theta_\pi$. Thus, $H$ measures departures from neutrality that are
reflected in the difference between high-frequency and
intermediate-frequency alleles. In contrast, $D$ measures departures
from neutrality that are reflected in the difference between
low-frequency and intermediate-frequency alleles. Thus, while $D$ is
sensitive to population expansion (because the number of segregating
sites responds more rapidly to changes in population size than the
nucleotide heterozygosity), $H$ will not be. As a result, combining
both tests may allow you to distinguish population expansion from
purifying selection.
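The weighting argument above can be sketched numerically. The Python below assumes the standard forms $\hat\theta_\pi = \binom{n}{2}^{-1}\sum_i i(n-i)\xi_i$ and $\hat\theta_H = \binom{n}{2}^{-1}\sum_i i^2\xi_i$, with `xi[i-1]` holding the number of sites whose derived variant occurs $i$ times. The function names and the example counts are made up for illustration, and no variance normalization or significance test is attempted.

```python
from math import comb

def theta_pi(xi):
    """Heterozygosity-based estimate; per-site weight is i * (n - i)."""
    n = len(xi) + 1
    return sum(i * (n - i) * x for i, x in enumerate(xi, start=1)) / comb(n, 2)

def theta_h(xi):
    """Estimate that weights each site by i**2 (derived-allele copies squared)."""
    n = len(xi) + 1
    return sum(i * i * x for i, x in enumerate(xi, start=1)) / comb(n, 2)

def fay_wu_h(xi):
    """Unnormalized H: difference between the two estimates of theta."""
    return theta_pi(xi) - theta_h(xi)

def weights(n):
    """Per-site weights for theta_pi and theta_h at derived counts 1..n-1."""
    pi_w = [i * (n - i) for i in range(1, n)]
    h_w = [i * i for i in range(1, n)]
    return pi_w, h_w

n = 5
pi_w, h_w = weights(n)
print("theta_pi weights:", pi_w)  # largest at intermediate counts
print("theta_H weights: ", h_w)   # largest at high counts

# Hypothetical spectra for a sample of n = 5 sequences.
neutral = [6.0 / i for i in range(1, n)]  # expected counts, theta = 6
high_freq_excess = [1, 0, 0, 5]           # many high-frequency derived variants

print("H, neutral expectation:", fay_wu_h(neutral))
print("H, high-frequency excess:", fay_wu_h(high_freq_excess))
```

With the expected neutral counts the two estimates agree and $H$ is zero, while an excess of high-frequency derived variants inflates $\hat\theta_H$ relative to $\hat\theta_\pi$ and pushes $H$ negative.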
Kent Holsinger
2008-09-07