- ... site.1
- Of course, we know this isn't
true. Multiple substitutions can occur at any site. That's
why the percent difference between two sequences isn't equal to the
number of substitutions that have happened at any particular
site. We're simply assuming that the sequences we're comparing are
closely enough related that nearly all mutations have occurred at
different positions.
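To see how quickly that assumption breaks down, here is a minimal sketch that compares the observed proportion of differing sites with a distance corrected for multiple substitutions. It uses the Jukes-Cantor correction, which is an assumption brought in for illustration and not necessarily the model used in these notes; the function name `jukes_cantor_distance` is hypothetical.

```python
import math

def jukes_cantor_distance(p):
    """Expected substitutions per site under the Jukes-Cantor model,
    given the observed proportion p of sites that differ.
    Illustrative only: assumes equal base frequencies and equal rates."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The gap between observed and corrected distance widens as p grows.
for p in (0.01, 0.10, 0.30):
    print(p, round(jukes_cantor_distance(p), 4))
```

For closely related sequences the two numbers are nearly identical, which is the sense in which the assumption is harmless.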
- ...
sequence.2
- I lied, but you must be getting used to that by
now. This isn't quite the way you estimate it. To get an unbiased
estimate of $\pi$, you have to multiply this equation by
$n/(n-1)$, where $n$ is the number of haplotypes in your sample. And, of
course, if you're Bayesian you'll be even a little more
careful. You'll estimate the haplotype frequencies using an
appropriate prior, and you'll estimate the probability that
two haplotypes are different at a randomly chosen position
given the observed number of differences and the sequence
length. That probability will be close to the observed proportion
of differing sites, but it won't be identical.
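A minimal sketch of the correction, assuming the uncorrected estimate is the sum over all ordered pairs of haplotypes of (frequency of one) x (frequency of the other) x (proportion of sites at which they differ), and assuming the correction factor is $n/(n-1)$ with $n$ the number of sampled sequences. The function names and the toy sample are hypothetical.

```python
from collections import Counter
from itertools import combinations

def prop_diff(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pi_uncorrected(seqs):
    """Sum over ordered haplotype pairs of x_i * x_j * (proportion differing)."""
    n = len(seqs)
    counts = Counter(seqs)
    return sum((ci / n) * (cj / n) * prop_diff(hi, hj)
               for hi, ci in counts.items()
               for hj, cj in counts.items())

def pi_unbiased(seqs):
    """Apply the n/(n-1) correction to the uncorrected estimate."""
    n = len(seqs)
    return n / (n - 1) * pi_uncorrected(seqs)

def pi_pairwise_mean(seqs):
    """Average proportion of differences over all distinct pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(prop_diff(a, b) for a, b in pairs) / len(pairs)

# Toy aligned sample (made up for illustration).
sample = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "ACGTACGT"]
print(pi_uncorrected(sample), pi_unbiased(sample), pi_pairwise_mean(sample))
```

The corrected value equals the average over distinct pairs of sampled sequences, which is one way to see why the factor is needed: the uncorrected sum also counts each sequence paired with itself.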
- ...
time.3
- This is not the same $\theta$ we encountered
when discussing $F$-statistics. Weir and Cockerham's $\theta$ is a
different beast. I know it's confusing, but that's the way it
is. When reading a paper, the context should make it clear which
conception of $\theta$ is being used. Another thing to be careful of
is that sometimes authors think of $\theta$ in terms of a haploid
population. When they do, it's $2N_e\mu$. Usually the context makes
it clear which definition is being used, but you have to remember to
pay attention to be sure.
- ... sample.4
- The
``E'' refers to expectation. It is the average value of a random
variable. For a random variable $X$, $E(X)$ is read as ``the expectation of $X$.''
- ...
sample.5
- If your memory is really good, you may recognize that
those estimates are method of moments estimates, i.e., parameter
estimates obtained by equating sample statistics with their expected
values.
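As a sketch of what "equating sample statistics with their expected values" looks like in practice, the code below assumes the two estimates in question are estimates of $\theta$ based on the standard neutral expectations: the average number of pairwise differences has expectation $\theta$, and the number of segregating sites $k$ in a sample of $n$ sequences has expectation $\theta \sum_{i=1}^{n-1} 1/i$. The function names are hypothetical.

```python
def harmonic_sum(n):
    """Sum of 1/i for i = 1 .. n-1, where n is the number of sampled sequences."""
    return sum(1.0 / i for i in range(1, n))

def theta_from_segregating_sites(k, n):
    """Solve E(k) = theta * sum_{i=1}^{n-1} 1/i for theta, using the observed k."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return k / harmonic_sum(n)

def theta_from_pairwise_differences(mean_pairwise_differences):
    """Solve E(pi) = theta for theta: the observed mean is itself the estimate."""
    return mean_pairwise_differences

# With n = 4 the harmonic sum is 1 + 1/2 + 1/3 = 11/6.
print(theta_from_segregating_sites(11, 4))
print(theta_from_pairwise_differences(5.2))
```

In both cases the estimate comes from setting the observed statistic equal to its expectation and solving for the parameter.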
- ... severe.6
- Why? Because most of
the heterozygosity is due to alleles of moderate to high
frequency, and those are not the ones likely to be lost in a
bottleneck. See the Appendix
for more details.
- ... locus.7
- Please remember that the failure to detect a difference
from 0 could mean that your sample size is too small to detect an
important effect. If you can't detect a difference, you should try
to assess what values of the statistic are consistent with your data and be
appropriately circumspect in your conclusions.
- ...,8
- Because it has an
rather than an
in its
formula