... polymorphic.1
They were discovered as a result of investigations into rejection of transplanted organs and tissues. They are the loci governing acceptance/rejection of transplants in vertebrates.
... substitutions.2
No surprise there. That's the ``sledgehammer'' principle in operation.
... site.3
Of course, we know this isn't true. Multiple substitutions can occur at any site. That's why the percent difference between two sequences isn't equal to the number of substitutions that have happened at any particular site. We're simply assuming that the sequences we're comparing are closely enough related that nearly all mutations have occurred at different positions.
... sequence.4
I lied. This isn't quite the way you estimate it. To get an unbiased estimate of $\pi$, you have to multiply this equation by $n/(n-1)$, where $n$ is the number of haplotypes in your sample. And, of course, if you're Bayesian you'll be even a little more careful. You'll estimate $x_i$ using an appropriate prior on haplotype frequencies, and you'll estimate the probability that haplotypes $i$ and $j$ are different at a randomly chosen position given the observed number of differences and the sequence length. That probability will be close to $\delta_{ij}/N$, but it won't be identical.
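The correction above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: it assumes the uncorrected estimate is the double sum $\sum_{ij} x_i x_j \delta_{ij}/N$ over all ordered pairs of distinct haplotypes, that haplotypes are given as aligned strings of equal length $N$, and that $x_i$ is simply the observed sample frequency. The function name and its arguments are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes, counts):
    """Sketch of the bias-corrected estimate of pi.

    haplotypes: list of distinct, aligned sequences of equal length N (assumed).
    counts:     number of times each haplotype was observed in the sample.
    """
    n = sum(counts)            # total number of sampled haplotypes
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    N = len(haplotypes[0])     # sequence length
    if any(len(h) != N for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")

    freqs = [c / n for c in counts]   # x_i, the sample frequencies

    raw = 0.0
    for i, j in combinations(range(len(haplotypes)), 2):
        # delta_ij: number of positions at which haplotypes i and j differ
        delta = sum(a != b for a, b in zip(haplotypes[i], haplotypes[j]))
        # factor of 2 because the double sum counts (i, j) and (j, i)
        raw += 2 * freqs[i] * freqs[j] * delta / N

    # multiply by n/(n-1) to remove the bias
    return raw * n / (n - 1)
```

With this correction the result equals the average proportion of differing sites over all pairs of sampled sequences, which is one way to check the arithmetic by hand on a small sample.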
... time.5
This is not the same $\theta$ we encountered when discussing $F$-statistics. Weir and Cockerham's $\theta$ is a different beast. I know it's confusing, but that's the way it is. When reading a paper, the context should make it clear which conception of $\theta$ is being used. Another thing to be careful of is that authors sometimes think of $\theta$ in terms of a haploid population. When they do, it's $2N_e\mu$. Again, the context usually makes it clear which definition is being used, but you have to pay attention to be sure.
... sample.6
The ``E'' refers to expectation. It is the average value of a random variable. $\mbox{E}(\pi)$ is read as ``the expectation of $\pi$.''
... sample.7
If your memory is really good, you may recognize that those estimates are method of moments estimates, i.e., parameter estimates obtained by equating sample statistics with their expected values.
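To make ``equating sample statistics with their expected values'' concrete, here is a small Python sketch. It assumes the two expectations in question are the standard neutral-model results $\mbox{E}(\pi) = \theta$ and $\mbox{E}(k) = \theta \sum_{i=1}^{n-1} 1/i$, where $k$ is the number of segregating sites in a sample of $n$ sequences; this footnote does not restate them, so treat that as an assumption. The function names are hypothetical, and $\pi$ and $k$ must be on the same scale (both per site or both per sequence) for the two estimates to be comparable.

```python
def theta_from_pi(pi):
    """Method of moments: set the observed pi equal to E(pi) = theta."""
    return pi


def theta_from_segregating_sites(k, n):
    """Method of moments: set the observed k equal to
    E(k) = theta * sum_{i=1}^{n-1} 1/i and solve for theta."""
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))   # 1 + 1/2 + ... + 1/(n-1)
    return k / a_n
```

In both cases nothing is optimized or fitted. The observed statistic is substituted for its expectation and the equation is solved for $\theta$.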
... severe.8
Why? Because most of the heterozygosity is due to alleles of moderate to high frequency, and those are not the ones likely to be lost in a bottleneck. See the Appendix for more details.
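A quick numerical illustration of that argument, as a Python sketch. It assumes expected heterozygosity is computed as $1 - \sum_i p_i^2$ and mimics a bottleneck very crudely by deleting every allele below a frequency threshold and renormalizing the rest. That deletion step is a hypothetical stand-in for illustration, not a model of drift, and the function names and the example frequencies are made up.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def drop_rare_alleles(freqs, threshold):
    """Crude stand-in for a bottleneck: remove alleles rarer than
    `threshold`, then renormalize the survivors to sum to 1."""
    kept = [p for p in freqs if p >= threshold]
    total = sum(kept)
    return [p / total for p in kept]


# Two common alleles plus five rare ones (made-up frequencies).
before = [0.50, 0.45, 0.01, 0.01, 0.01, 0.01, 0.01]
after = drop_rare_alleles(before, threshold=0.05)

h_before = expected_heterozygosity(before)
h_after = expected_heterozygosity(after)
print(len(before), h_before)
print(len(after), h_after)
```

In this made-up example the number of alleles falls from seven to two, while heterozygosity falls by less than a tenth of its original value, because the squared frequencies of the rare alleles contribute almost nothing to the sum.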