
Wright's $F$-statistics

One limitation of the way I've described things so far is that $\hbox{Var}(p)$ doesn't provide a convenient way to compare population structure across different samples. $\hbox{Var}(p)$ can be much larger if both alleles are about equally common in the whole sample than if one occurs at a mean frequency of 0.99 and the other at a frequency of 0.01. Moreover, if you stare at equations (4)-(6) for a while, you begin to realize that they look a lot like equations we've already encountered: the genotype frequencies in a population with inbreeding coefficient $f$, namely $p^2 + fpq$, $2pq(1-f)$, and $q^2 + fpq$. In fact, if we were to define $F_{st}$ as $\mbox{Var}(p)/\bar p\bar q$,$^{8}$ then we could rewrite equations (4)-(6) as

$$
\begin{aligned}
\frac{1}{k}\sum p_i^2 &= \bar p^2 + F_{st}\bar p\bar q && (9) \\
\frac{1}{k}\sum 2p_iq_i &= 2\bar p\bar q(1 - F_{st}) && (10) \\
\frac{1}{k}\sum q_i^2 &= \bar q^2 + F_{st}\bar p\bar q && (11)
\end{aligned}
$$
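Equations (9)-(11) are easy to check numerically. The sketch below is only an illustration: it assumes the $k$ subpopulations are weighted equally (matching the $1/k$ sums), the allele frequencies are made up, and the function names `f_st` and `genotype_means` are hypothetical rather than anything defined in these notes.

```python
from math import isclose
from statistics import fmean, pvariance


def f_st(p):
    """F_st = Var(p) / (p_bar * q_bar), weighting the k subpopulations equally."""
    p_bar = fmean(p)
    # pvariance divides by k, which matches the 1/k averages in (9)-(11).
    return pvariance(p, mu=p_bar) / (p_bar * (1.0 - p_bar))


def genotype_means(p):
    """Return (left side, right side) pairs for equations (9), (10), and (11)."""
    k = len(p)
    p_bar = fmean(p)
    q_bar = 1.0 - p_bar
    F = f_st(p)
    eq9 = (sum(pi**2 for pi in p) / k, p_bar**2 + F * p_bar * q_bar)
    eq10 = (sum(2 * pi * (1 - pi) for pi in p) / k, 2 * p_bar * q_bar * (1 - F))
    eq11 = (sum((1 - pi)**2 for pi in p) / k, q_bar**2 + F * p_bar * q_bar)
    return eq9, eq10, eq11


# Hypothetical allele frequencies in three subpopulations.
p = [0.2, 0.5, 0.8]
print(round(f_st(p), 4))

# Each average over subpopulations should match its F_st form.
for lhs, rhs in genotype_means(p):
    assert isclose(lhs, rhs), (lhs, rhs)
```

Here $\bar p = 0.5$ and $\mbox{Var}(p) = 0.06$, so the script prints $F_{st} = 0.06/0.25 = 0.24$, and each left side agrees with its right side up to floating-point rounding.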

And it's not even completely artificial to define $F_{st}$ the way I did. After all, the effect of geographic structure is to cause matings to occur among genetically similar individuals, which is rather like inbreeding. Moreover, the extent to which this local mating matters depends on the extent to which populations differ from one another. $\bar p\bar q$ is the maximum allele frequency variance possible, given the observed mean frequency; it is reached only when every population is fixed for one allele or the other. So one way of thinking about $F_{st}$ is that it measures the amount of allele frequency variance in a sample relative to the maximum possible.$^{9}$
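To see the ceiling at work, the sketch below compares made-up samples; the helper name `variance_and_fst` is illustrative, and subpopulations are again weighted equally. In two of the samples every subpopulation is fixed for one allele or the other, but their mean frequencies differ, so $\mbox{Var}(p)$ differs a lot while $F_{st}$ does not.

```python
from statistics import fmean, pvariance


def variance_and_fst(p):
    """Return (Var(p), p_bar * q_bar, F_st) for equally weighted subpopulations.

    Assumes the allele is polymorphic in the pooled sample (0 < p_bar < 1);
    otherwise p_bar * q_bar is zero and F_st is undefined.
    """
    p_bar = fmean(p)
    max_var = p_bar * (1.0 - p_bar)   # ceiling on Var(p) for this mean frequency
    var_p = pvariance(p, mu=p_bar)    # divides by k, like the 1/k sums above
    return var_p, max_var, var_p / max_var


# Hypothetical samples. In "balanced" and "skewed" every subpopulation is
# fixed for one allele or the other; "partial" is only partly differentiated.
samples = {
    "balanced": [1.0, 0.0],            # mean frequency 0.5
    "skewed": [1.0] + [0.0] * 99,      # mean frequency 0.01
    "partial": [0.6, 0.4],             # mean frequency 0.5
}

for label, p in samples.items():
    var_p, max_var, fst = variance_and_fst(p)
    print(f"{label:>8}: Var(p)={var_p:.4f}  p_bar*q_bar={max_var:.4f}  F_st={fst:.3f}")
```

Both fixed samples report $F_{st} = 1.000$ even though their variances (0.2500 and 0.0099) differ about 25-fold, while the partly differentiated sample gives $F_{st} = 0.040$.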

There may, of course, be inbreeding within populations, too, but it's easy to incorporate this into the framework.$^{10}$ Let $H_i$ be the actual heterozygosity in individuals within subpopulations, $H_s$ be the expected heterozygosity within subpopulations assuming Hardy-Weinberg within populations, and $H_t$ be the expected heterozygosity in the combined population assuming Hardy-Weinberg over the whole sample.$^{11}$ Then, thinking of $f$ as a measure of departure from Hardy-Weinberg and assuming that all populations depart from Hardy-Weinberg to the same degree, i.e., that they all have the same $f$, we can define

$$F_{it} = 1 - \frac{H_i}{H_t}$$
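A minimal sketch of how the three heterozygosities and $F_{it}$ might be computed from genotype counts. The counts are invented, subpopulations are weighted equally, sampling error is ignored, and `heterozygosities` is a hypothetical helper. The counts were chosen so that both subpopulations have the same $f = 0.2$, in keeping with the equal-$f$ assumption.

```python
def heterozygosities(counts):
    """Return (H_i, H_s, H_t) from per-subpopulation genotype counts.

    counts: list of (n_A1A1, n_A1A2, n_A2A2) tuples, one per subpopulation.
    Subpopulations are weighted equally; sampling error is ignored.
    """
    k = len(counts)
    p = []      # frequency of allele A1 in each subpopulation
    h_obs = []  # observed heterozygote frequency in each subpopulation
    for n11, n12, n22 in counts:
        n = n11 + n12 + n22
        p.append((2 * n11 + n12) / (2 * n))
        h_obs.append(n12 / n)
    H_i = sum(h_obs) / k                             # actual heterozygosity
    H_s = sum(2 * pi * (1 - pi) for pi in p) / k     # Hardy-Weinberg within each subpopulation
    p_bar = sum(p) / k
    H_t = 2 * p_bar * (1 - p_bar)                    # Hardy-Weinberg in the pooled sample
    return H_i, H_s, H_t


# Hypothetical counts: p = 0.5 and p = 0.8, both with f = 0.2.
counts = [(30, 40, 30), (672, 256, 72)]
H_i, H_s, H_t = heterozygosities(counts)
F_it = 1 - H_i / H_t
print(f"H_i={H_i:.3f}  H_s={H_s:.3f}  H_t={H_t:.3f}  F_it={F_it:.3f}")
```

With these counts $H_i = 0.328$, $H_s = 0.41$, $H_t = 0.455$, and $F_{it} \approx 0.279$.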

Let's fiddle with that definition of $F_{it}$ a bit.

$$
\begin{aligned}
1 - F_{it} &= \frac{H_i}{H_t} \\
&= \left(\frac{H_i}{H_s}\right)\left(\frac{H_s}{H_t}\right) \\
&= (1 - F_{is})(1 - F_{st}) \quad ,
\end{aligned}
$$

where $F_{is} = 1 - H_i/H_s$ is the inbreeding coefficient within populations, i.e., $f$, and $F_{st} = 1 - H_s/H_t$ has the same definition as before.$^{12}$ To see why, note that $H_s = \frac{1}{k}\sum 2p_iq_i$ and $H_t = 2\bar p\bar q$, so equation (10) gives $H_s/H_t = 1 - F_{st}$. $H_t$ is often referred to as the genetic diversity in a population. So another way of thinking about $F_{st} = (H_t - H_s)/H_t$ is that it's the proportion of the diversity in the sample that's due to allele frequency differences among populations.
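The decomposition and the two readings of $F_{st}$ can be checked with the same kind of sketch. It starts from hypothetical subpopulation allele frequencies and observed heterozygosities (equal weights, no sampling correction; `f_statistics` is an illustrative name), with the heterozygosities chosen so every subpopulation has $f = 0.2$. It then confirms that $1 - F_{it} = (1 - F_{is})(1 - F_{st})$ and that $(H_t - H_s)/H_t$ equals $\mbox{Var}(p)/\bar p\bar q$.

```python
from math import isclose
from statistics import fmean, pvariance


def f_statistics(p, h_obs):
    """Return (F_is, F_st, F_it) from allele frequencies p and observed heterozygosities h_obs.

    One entry per subpopulation; subpopulations are weighted equally.
    """
    H_i = fmean(h_obs)
    H_s = fmean([2 * pi * (1 - pi) for pi in p])
    p_bar = fmean(p)
    H_t = 2 * p_bar * (1 - p_bar)
    F_is = 1 - H_i / H_s
    F_st = (H_t - H_s) / H_t   # share of diversity due to among-population differences
    F_it = 1 - H_i / H_t
    return F_is, F_st, F_it


# Hypothetical data for three subpopulations, each with h_obs = 2pq(1 - 0.2).
p = [0.3, 0.5, 0.9]
h_obs = [0.336, 0.4, 0.144]
F_is, F_st, F_it = f_statistics(p, h_obs)

# 1 - F_it = (1 - F_is)(1 - F_st), because (H_i/H_s)(H_s/H_t) = H_i/H_t.
assert isclose(1 - F_it, (1 - F_is) * (1 - F_st))

# The heterozygosity form of F_st agrees with Var(p) / (p_bar * q_bar).
p_bar = fmean(p)
assert isclose(F_st, pvariance(p, mu=p_bar) / (p_bar * (1 - p_bar)))

print(f"F_is={F_is:.3f}  F_st={F_st:.3f}  F_it={F_it:.3f}")
```

The multiplicative identity holds for any data, since it only rearranges ratios of heterozygosities; the equal-$f$ assumption matters for reading $F_{is}$ as $f$, which is why the made-up heterozygosities enforce it.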

Kent Holsinger 2014-12-28