... site.1
Of course, we know this isn't true. Multiple substitutions can occur at any site. That's why the percent difference between two sequences underestimates the number of substitutions per site that have actually occurred. We're simply assuming that the sequences we're comparing are closely enough related that nearly all mutations have occurred at different positions.
... sequence.2
I lied, but you must be getting used to that by now. This isn't quite the way you estimate it. To get an unbiased estimate of $\pi$, you have to multiply this equation by $n/(n-1)$, where $n$ is the number of haplotypes in your sample. And, of course, if you're a Bayesian you'll be even more careful. You'll estimate $x_i$ using an appropriate prior on haplotype frequencies, and you'll estimate the probability that haplotypes $i$ and $j$ differ at a randomly chosen position given the observed number of differences and the sequence length. That probability will be close to $\delta_{ij}/N$, but it won't be identical.
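A minimal sketch of the corrected (non-Bayesian) estimate, assuming, as the $\delta_{ij}/N$ above suggests, that $\pi = \sum_{ij} x_i x_j \delta_{ij}/N$, where $x_i$ is the sample frequency of haplotype $i$, $\delta_{ij}$ is the number of positions at which haplotypes $i$ and $j$ differ, and $N$ is the sequence length. The function name `nucleotide_diversity` and the input format (a list of aligned, equal-length strings, one per sampled haplotype) are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import product


def nucleotide_diversity(sample):
    """Per-site pi with the n/(n-1) small-sample correction.

    sample: list of aligned haplotype sequences (equal-length strings),
    one entry per haplotype in the sample (hypothetical input format).
    """
    n = len(sample)                 # number of haplotypes sampled
    length = len(sample[0])         # N, the sequence length
    counts = Counter(sample)
    freqs = {h: c / n for h, c in counts.items()}  # x_i for each distinct haplotype

    pi = 0.0
    for h_i, h_j in product(freqs, repeat=2):
        delta_ij = sum(a != b for a, b in zip(h_i, h_j))  # number of differences
        pi += freqs[h_i] * freqs[h_j] * delta_ij / length
    return pi * n / (n - 1)         # correction for sampling without replacement


print(nucleotide_diversity(["AAAA", "AAAA", "AAAT", "AAAT"]))
```

With two copies each of `AAAA` and `AAAT`, the uncorrected sum is 0.125, and the $n/(n-1) = 4/3$ correction raises it to $1/6$, which is exactly the mean per-site difference over the six distinct pairs of sampled sequences.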
... time.3
This is not the same $\theta$ we encountered when discussing $F$-statistics. Weir and Cockerham's $\theta$ is a different beast. I know it's confusing, but that's the way it is. When reading a paper, the context should make it clear which conception of $\theta$ is being used. Another thing to watch for is that some authors define $\theta$ in terms of a haploid population. When they do, it's $2N_e\mu$ rather than $4N_e\mu$. The context usually makes clear which definition is being used, but you have to pay attention to be sure.
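The factor of two is easy to lose track of when comparing papers. A tiny sketch with a hypothetical helper, `population_theta`, that makes the convention explicit; it assumes the diploid definition $\theta = 4N_e\mu$ as the default and the haploid one, $2N_e\mu$, on request.

```python
def population_theta(ne, mu, ploidy=2):
    """theta = 4*Ne*mu for diploids (default) or 2*Ne*mu for haploids.

    ne: effective population size; mu: mutation rate per generation.
    """
    if ploidy == 2:
        return 4 * ne * mu
    if ploidy == 1:
        return 2 * ne * mu
    raise ValueError("only haploid (1) and diploid (2) conventions are handled")


# Same Ne and mu, two conventions: the estimates differ by a factor of two.
print(population_theta(10_000, 1e-8), population_theta(10_000, 1e-8, ploidy=1))
```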
... sample.4
The ``E'' refers to expectation. It is the average value of a random variable. $\mbox{E}(\pi)$ is read as ``the expectation of $\pi$.''
... sample.5
If your memory is really good, you may recognize that those estimates are method of moments estimates, i.e., parameter estimates obtained by equating sample statistics with their expected values.
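To make the idea concrete, assume the estimates in question are the usual ones based on $\mbox{E}(\pi) = \theta$ and $\mbox{E}(S) = \theta\sum_{i=1}^{n-1} 1/i$, where $\pi$ is the mean number of pairwise differences between sequences, $S$ is the number of segregating sites, and $n$ is the number of sequences sampled. Setting each statistic equal to its expectation and solving for $\theta$ gives the two estimators sketched below; the function names are hypothetical.

```python
def harmonic_a(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the constant in E(S) = theta * a_n."""
    return sum(1.0 / i for i in range(1, n))


def theta_pi(pi_hat):
    # E(pi) = theta, so equating pi with its expectation gives theta = pi.
    return pi_hat


def theta_s(s, n):
    # E(S) = theta * a_n, so equating S with its expectation gives theta = S / a_n.
    return s / harmonic_a(n)


# Four sequences: a_4 = 1 + 1/2 + 1/3 = 11/6, so S = 11 gives theta_S = 6.
print(theta_pi(3.5), theta_s(11, 4))
```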
... severe.6
Why? Because most of the heterozygosity is due to alleles of moderate to high frequency, and those are not the ones likely to be lost in a bottleneck. See the Appendix for more details.
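A back-of-the-envelope illustration, under assumptions chosen only for this example (they are not the Appendix's): a one-generation bottleneck in which $2N$ gene copies are drawn binomially from a large population with made-up allele frequencies. An allele at frequency $p$ is lost with probability $(1-p)^{2N}$, while expected heterozygosity, $1 - \sum p^2$, falls only by the factor $1 - 1/(2N)$.

```python
def heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum of p^2."""
    return 1.0 - sum(p * p for p in freqs)


def expected_h_after(freqs, n_individuals):
    """Expected heterozygosity after binomially sampling 2N gene copies."""
    return heterozygosity(freqs) * (1.0 - 1.0 / (2 * n_individuals))


def expected_alleles_retained(freqs, n_individuals):
    """Expected number of alleles still present; each is lost with prob (1-p)^(2N)."""
    return sum(1.0 - (1.0 - p) ** (2 * n_individuals) for p in freqs)


# Hypothetical allele frequencies: two common alleles and three rare ones.
freqs = [0.5, 0.3, 0.1, 0.05, 0.05]
n = 5  # hypothetical bottleneck: 5 diploid individuals for one generation
h0 = heterozygosity(freqs)
print(f"fraction of heterozygosity retained: {expected_h_after(freqs, n) / h0:.3f}")
print(f"fraction of alleles retained: {expected_alleles_retained(freqs, n) / len(freqs):.3f}")
```

With these numbers about 90% of the heterozygosity survives but only about 68% of the alleles do, because the alleles most likely to vanish are the rare ones that contribute little to $1 - \sum p^2$.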
... locus.7
Please remember that the failure to detect a difference from 0 could mean that your sample size is too small to detect an important effect. If you can't detect a difference, you should try to assess what values of $D$ are consistent with your data and be appropriately circumspect in your conclusions.
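One way to assess which values of $D$ are consistent with your data is a bootstrap interval. The sketch below is an illustration, not a method prescribed in these notes: it assumes phase-known haplotype counts for two biallelic loci in the order $A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$, computes $D = x_{11} - p_1q_1$, and builds a percentile bootstrap interval by resampling haplotypes with replacement. The function names are hypothetical.

```python
import random


def ld_d(n11, n12, n21, n22):
    """D = x11 - p1*q1 from counts of A1B1, A1B2, A2B1, A2B2 haplotypes."""
    n = n11 + n12 + n21 + n22
    x11 = n11 / n
    p1 = (n11 + n12) / n  # frequency of A1
    q1 = (n11 + n21) / n  # frequency of B1
    return x11 - p1 * q1


def bootstrap_d_interval(counts, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for D, resampling haplotypes with replacement."""
    rng = random.Random(seed)
    # Expand counts into a list of haplotype labels 0..3.
    haplotypes = [k for k, c in enumerate(counts) for _ in range(c)]
    n = len(haplotypes)
    estimates = []
    for _ in range(reps):
        resample = rng.choices(haplotypes, k=n)
        estimates.append(ld_d(*(resample.count(k) for k in range(4))))
    estimates.sort()
    alpha = 1.0 - level
    lo = estimates[int(alpha / 2 * reps)]
    hi = estimates[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lo, hi


print(ld_d(26, 24, 24, 26), bootstrap_d_interval((26, 24, 24, 26)))
```

For a sample like (26, 24, 24, 26) the point estimate is $D = 0.01$; the width of the interval, not the point estimate, tells you how large an association you have failed to rule out.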
...,8
Because it has an $i$ rather than an $i^2$ in its formula.