One limitation of the way I've described things so far is that
$\mathrm{Var}(p)$ doesn't provide a convenient way to compare population
structure from different samples. $\mathrm{Var}(p)$ can be much larger
if both alleles are about equally common in the whole sample than if
one occurs at a mean frequency of 0.99 and the other at a frequency of
0.01. Moreover, if you stare at equations (4)-(6)
for a while, you begin to realize that they look a lot like some
equations we've already
encountered.
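Namely, if we were to define
$F_{st}$[6] as $\frac{\mathrm{Var}(p)}{\bar p(1-\bar p)}$,
then we could rewrite equations (4)-(6) as

\begin{eqnarray*}
\bar x_{11} &=& \bar p^2 + F_{st}\,\bar p(1-\bar p) \\
\bar x_{12} &=& 2\bar p(1-\bar p)(1 - F_{st}) \\
\bar x_{22} &=& (1-\bar p)^2 + F_{st}\,\bar p(1-\bar p)
\end{eqnarray*}
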
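To see why dividing by $\bar p(1-\bar p)$ makes samples comparable, here is a minimal Python sketch. It assumes $F_{st}$ is defined as $\mathrm{Var}(p)/\bar p(1-\bar p)$, that subpopulations are weighted equally, and that the variance is the simple population variance (dividing by the number of subpopulations). The function name `wahlund_summary` and both sets of allele frequencies are hypothetical, made up purely for illustration.

```python
def wahlund_summary(freqs):
    """Return (mean p, Var(p), F_st) for a list of subpopulation allele frequencies.

    Assumes equal weighting of subpopulations and a population variance
    (divide by K, not K - 1). Illustrative sketch only.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k
    max_var = p_bar * (1.0 - p_bar)  # largest variance possible at this mean
    if max_var == 0.0:
        raise ValueError("F_st is undefined when the allele is fixed or absent overall")
    return p_bar, var_p, var_p / max_var


# Hypothetical sample A: two subpopulations fixed for alternate alleles.
sample_a = [0.0, 1.0]
# Hypothetical sample B: 99 subpopulations fixed for one allele, 1 for the other.
sample_b = [1.0] * 99 + [0.0]

for label, freqs in (("A", sample_a), ("B", sample_b)):
    p_bar, var_p, f_st = wahlund_summary(freqs)
    print(f"sample {label}: mean p = {p_bar:.2f}, Var(p) = {var_p:.4f}, F_st = {f_st:.2f}")
```

In both samples every subpopulation is fixed for one allele or the other, so they are as differentiated as they can be. Even so, the variance in sample A is roughly 25 times the variance in sample B, simply because the mean frequencies differ (0.5 versus 0.99). After scaling by $\bar p(1-\bar p)$ the two samples give essentially the same value, 1.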
There may, of course, be inbreeding within populations as well, and it's
easy to incorporate this into the framework, too.[8] Let $H_i$ be the actual
heterozygosity in individuals within subpopulations, $H_s$ be the
expected heterozygosity within subpopulations assuming Hardy-Weinberg
within populations, and $H_t$ be the expected heterozygosity in the
combined population assuming Hardy-Weinberg over the whole
sample.[9] Then
thinking of $f$ as a measure of departure from Hardy-Weinberg and
assuming that all populations depart from Hardy-Weinberg to the same
degree, i.e., that they all have the same $f$, we can define

\[
F_{it} = 1 - \frac{H_i}{H_t}
\]
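The three heterozygosities are easy to compute from genotype counts. Here is a minimal Python sketch that does so and then evaluates $F_{it}$, assuming the definition $F_{it} = 1 - H_i/H_t$, two alleles at the locus, and equal weighting of subpopulations. The function name `heterozygosities` and the genotype counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def heterozygosities(counts):
    """Return (H_i, H_s, H_t) from per-subpopulation genotype counts.

    counts: list of (n11, n12, n22) tuples, one per subpopulation.
    Subpopulations are weighted equally (an assumption of this sketch).
    """
    het_obs, het_exp, freqs = [], [], []
    for n11, n12, n22 in counts:
        n = n11 + n12 + n22
        p = (2 * n11 + n12) / (2 * n)      # frequency of allele A1
        het_obs.append(n12 / n)            # observed heterozygote frequency
        het_exp.append(2 * p * (1 - p))    # Hardy-Weinberg expectation within the subpopulation
        freqs.append(p)
    k = len(counts)
    h_i = sum(het_obs) / k
    h_s = sum(het_exp) / k
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)          # Hardy-Weinberg expectation for the pooled sample
    return h_i, h_s, h_t


# Hypothetical counts: both subpopulations in Hardy-Weinberg proportions (p = 0.7 and p = 0.3).
no_inbreeding = [(49, 42, 9), (9, 42, 49)]
# Hypothetical counts: same allele frequencies, but a heterozygote deficit in each subpopulation.
with_inbreeding = [(55, 30, 15), (15, 30, 55)]

for label, counts in (("no inbreeding", no_inbreeding), ("with inbreeding", with_inbreeding)):
    h_i, h_s, h_t = heterozygosities(counts)
    f_it = 1 - h_i / h_t
    print(f"{label}: H_i = {h_i:.2f}, H_s = {h_s:.2f}, H_t = {h_t:.2f}, F_it = {f_it:.2f}")
```

With no inbreeding within subpopulations, $H_i$ and $H_s$ coincide and $F_{it}$ comes out at about 0.16, the same value that $\mathrm{Var}(p)/\bar p(1-\bar p)$ gives for allele frequencies of 0.7 and 0.3. With a heterozygote deficit inside each subpopulation, $H_i$ drops below $H_s$ and $F_{it}$ rises to about 0.4.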