
Analysis of molecular variance (AMOVA)

The notation now becomes just a little bit more complicated. We will now use $x_{ik}$ to refer to the frequency of the $i$th haplotype in the $k$th population. Then

\begin{displaymath}
x_i = \frac{1}{K}\sum_{k=1}^K x_{ik}
\end{displaymath}

is the mean frequency of haplotype $i$ across all populations, where $K$ is the number of populations. We can now define

\begin{eqnarray*}
\pi_t &=& \sum_{ij} x_ix_j \delta_{ij} \\
\pi_s &=& \frac{1}{K}\sum_{k=1}^K\sum_{ij} x_{ik}x_{jk}\delta_{ij} \quad ,
\end{eqnarray*}

where $\pi_t$ is the nucleotide sequence diversity across the entire set of populations and $\pi_s$ is the average nucleotide sequence diversity within populations. Then we can define
\begin{displaymath}
\Phi_{st} = \frac{\pi_t - \pi_s}{\pi_t} \quad , \eqno(1)
\end{displaymath}

which is the direct analog of Wright's $F_{st}$ for nucleotide sequence diversity. Why? Well, that requires you to remember stuff we covered eight or ten weeks ago.

To be a bit more specific, refer back to http://darwin.eeb.uconn.edu/eeb348/lecture-notes/wahlund/node4.html. If you do, you'll see that we defined

\begin{displaymath}
F_{it} = 1 - \frac{H_i}{H_t} \quad ,
\end{displaymath}

where $H_i$ is the average heterozygosity in individuals and $H_t$ is the expected panmictic heterozygosity. Defining $H_s$ as the average panmictic heterozygosity within populations, we then observed that

\begin{eqnarray*}
1 - F_{it} &=& \frac{H_i}{H_t} \\
&=& \frac{H_i}{H_s}\frac{H_s}{H_t} \\
&=& (1 - F_{is})(1 - F_{st}) \quad .
\end{eqnarray*}

In short, since $1 - F_{st} = H_s/H_t$, another way to think about $F_{st}$ is
\begin{displaymath}
F_{st} = \frac{H_t - H_s}{H_t} \quad . \eqno(2)
\end{displaymath}

Now if you compare equation (1) and equation (2), you'll see the analogy.
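
To make the analogy concrete, here is a minimal Python sketch of the calculation of $\pi_t$, $\pi_s$, and $\Phi_{st}$ from the definitions above. The function names and the data layout are assumptions made for illustration: \texttt{x[k][i]} holds $x_{ik}$ (note the index order, population first), and \texttt{delta[i][j]} holds $\delta_{ij}$. If $\delta_{ij}$ is 1 when $i \ne j$ and 0 otherwise, then $\sum_{ij} x_ix_j\delta_{ij} = 1 - \sum_i x_i^2$, so the diversities reduce to panmictic heterozygosities and $\Phi_{st}$ reduces to the $F_{st}$ of equation (2).

```python
def mean_frequencies(x):
    """Mean frequency of each haplotype across populations.

    x[k][i] is the frequency of haplotype i in population k
    (hypothetical layout chosen for this sketch).
    """
    K = len(x)
    n_hap = len(x[0])
    return [sum(x[k][i] for k in range(K)) / K for i in range(n_hap)]


def diversity(freqs, delta):
    """Sum over all pairs (i, j) of freqs[i] * freqs[j] * delta[i][j]."""
    n = len(freqs)
    return sum(freqs[i] * freqs[j] * delta[i][j]
               for i in range(n) for j in range(n))


def phi_st(x, delta):
    """Return (pi_t, pi_s, Phi_st) for frequencies x and distances delta."""
    K = len(x)
    pi_t = diversity(mean_frequencies(x), delta)       # whole set of populations
    pi_s = sum(diversity(pop, delta) for pop in x) / K  # average within populations
    return pi_t, pi_s, (pi_t - pi_s) / pi_t


# Two populations fixed for different haplotypes: all diversity is among populations.
fixed = [[1.0, 0.0], [0.0, 1.0]]
# Two populations with identical frequencies: no diversity is among populations.
same = [[0.5, 0.5], [0.5, 0.5]]
# delta_ij = 1 for different haplotypes, 0 for identical ones.
zero_one = [[0, 1], [1, 0]]

print(phi_st(fixed, zero_one))
print(phi_st(same, zero_one))
```

The two toy data sets are the extremes: populations fixed for different haplotypes give $\Phi_{st} = 1$, and populations with identical haplotype frequencies give $\Phi_{st} = 0$.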

Excoffier et al. [1] pointed out that other types of molecular data can easily be fit into this framework. We simply need an appropriate measure of the ``distance'' between different haplotypes or alleles. Even with nucleotide sequences the appropriate $\delta_{ij}$ may reflect something about the mutational pathway likely to connect sequences rather than the raw number of differences between them. The idea is illustrated in Figure 1. This procedure for partitioning diversity in molecular markers is referred to as an analysis of molecular variance or AMOVA (by analogy with the ubiquitous statistical procedure analysis of variance, ANOVA). Like Wright's $F$-statistics, the analysis can include several levels in the hierarchy.

Figure 1: Converting raw differences in sequence (or presence and absence of restriction sites) into a minimum spanning tree and a mutational measure of distance for an analysis of molecular variance (from [1]).
\resizebox{!}{8cm}{\includegraphics{amova-procedure.eps}}
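
One way to read Figure 1 is sketched below in Python. It is an illustration under stated assumptions, not the procedure of Excoffier et al.: it counts raw differences between haplotypes, builds a minimum spanning tree over those counts with Prim's algorithm, and then takes $\delta_{ij}$ to be the path length between haplotypes $i$ and $j$ along the tree. All function names and the example sequences are hypothetical. When several trees are equally short, this sketch simply keeps the first one it finds.

```python
def raw_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def minimum_spanning_tree(dist):
    """Prim's algorithm. Returns tree edges as (i, j, weight) tuples."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge from a node inside the tree to a node outside it.
        w, i, j = min((dist[i][j], i, j)
                      for i in in_tree
                      for j in range(n) if j not in in_tree)
        edges.append((i, j, w))
        in_tree.add(j)
    return edges


def tree_path_distances(n, edges):
    """Path length between every pair of nodes along the tree."""
    adjacent = {i: [] for i in range(n)}
    for i, j, w in edges:
        adjacent[i].append((j, w))
        adjacent[j].append((i, w))
    delta = [[0] * n for _ in range(n)]
    for source in range(n):
        seen = {source}
        stack = [(source, 0)]
        while stack:
            node, d = stack.pop()
            delta[source][node] = d
            for neighbor, w in adjacent[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append((neighbor, d + w))
    return delta


# Hypothetical haplotypes, each one mutational step from the next.
haplotypes = ["AAAA", "AAAT", "AATT", "TATT"]
n = len(haplotypes)
raw = [[raw_differences(a, b) for b in haplotypes] for a in haplotypes]
tree = minimum_spanning_tree(raw)
delta = tree_path_distances(n, tree)
print(tree)
print(delta)
```

The resulting matrix can be used wherever $\delta_{ij}$ appears in the definitions of $\pi_t$ and $\pi_s$. In this example the tree distances equal the raw differences; with other data the path along the tree can be longer than the raw count.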


Kent Holsinger 2006-11-30