
Nested clade analysis

Once we have constructed the haplotype network, we're then faced with the problem of identifying nested clades. Templeton et al. [3] propose the following algorithm to construct a unique set of nested clades:

Step 1.
Each haplotype in the sample comprises a 0-step clade, i.e., each copy of a particular haplotype in the sample is separated by zero evolutionary steps from other copies of the same haplotype. ``Tip'' haplotypes are those that are connected to only one other haplotype. ``Interior'' haplotypes are those that are connected to two or more haplotypes. Set $j=0$.

Step 2.
Pick a tip haplotype that is not part of any $j+1$-step clade.

Step 3.
Identify the interior haplotype with which it is connected by $j+1$ mutational steps.

Step 4.
Identify all tip haplotypes connected to that interior haplotype by $j+1$ mutational steps.

Step 5.
The set of all such tip and interior haplotypes constitutes a $j+1$-step clade.

Step 6.
If there are tip haplotypes remaining that are not part of a $j+1$-step clade, return to step 2.

Step 7.
Identify interior $j$-step clades that are not part of a $j+1$-step clade and are separated by $j+1$ steps.

Step 8.
Designate these clades as ``terminal'' and return to step 2.

Step 9.
Increment $j$ by one and return to step 2.

That sounds fairly complicated, but if you look at the example in Figure [*], you'll see that it isn't all that horrible.

Figure: Nesting of haplotypes at the Adh locus in Drosophila melanogaster.
\includegraphics[height=8cm]{nca-nesting.eps}
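
The first pass of the algorithm (steps 1 through 6 with $j=0$) can be sketched in a few lines of Python. This is only an illustration, not Templeton et al.'s implementation: the input format (a dictionary mapping each haplotype to the set of haplotypes one mutational step away) and the haplotype names `H1` to `H5` are assumptions made up for the example, and the sketch assumes that every tip is attached to an interior haplotype. Steps 7 through 9, which handle stranded interior clades and higher nesting levels, are not covered.

```python
from collections import defaultdict


def one_step_clades(network):
    """Group haplotypes into 1-step clades (steps 1-6 with j = 0).

    `network` maps each haplotype to the set of haplotypes that are
    one mutational step away (a hypothetical input format).
    """
    # Step 1: tips are connected to exactly one other haplotype.
    tips = {h for h, nbrs in network.items() if len(nbrs) == 1}
    clades = defaultdict(set)
    # Steps 2-6: attach every tip to the interior haplotype it touches.
    for tip in sorted(tips):
        (interior,) = network[tip]  # a tip has exactly one neighbor
        clades[interior].add(interior)
        clades[interior].add(tip)
    return tips, dict(clades)


# Hypothetical network: H1 and H2 hang off H3, and H5 hangs off H4.
network = {
    "H1": {"H3"},
    "H2": {"H3"},
    "H3": {"H1", "H2", "H4"},
    "H4": {"H3", "H5"},
    "H5": {"H4"},
}
tips, clades = one_step_clades(network)
```

Here `H1`, `H2`, and `H5` are tips, and the two 1-step clades are `{H1, H2, H3}` and `{H4, H5}`.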

This algorithm produces a set of nested clades, i.e., a 1-step clade is contained within a 2-step clade, a 2-step clade is contained within a 3-step clade, and so on. Once such sets of nested clades have been identified, we can calculate statistics related to the geographical distribution of each clade in the sample. Templeton et al. [6] define two statistics that are used in an inferential key (the most recent version of the key is in [4]; see Figure [*]):

Clade distance
The average distance of each haplotype in the particular clade from the center of its geographical distribution. ``Distance'' may be the great circle distance or it might be the distance measured along a presumed dispersal corridor. The clade distance for clade $X$ is symbolized $D_c(X)$, and it measures how far this clade has spread.

Nested clade distance
The average distance of the center of distribution for this clade from the center of distribution for the clade within which it is nested. So if clade $X$ is nested within clade $Y$, we calculate $D_n(X)$ by determining the geographic centers of clade $X$ and clade $Y$ and measuring the distance between those centers. $D_n(X)$ measures how far the clade has changed position relative to the clade from which it originated.
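
If ``distance'' is taken to be the great circle distance, one common way to compute it from latitude and longitude is the haversine formula. The sketch below is an assumption about how you might do this, not something prescribed by Templeton et al.: the function name `great_circle_km` and the use of a spherical Earth with a mean radius of 6371 km are choices made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points, in km.

    Latitudes and longitudes are given in decimal degrees.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the way around the equator.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

Distances along a presumed dispersal corridor would need a different calculation, e.g., path lengths along a river network.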

Figure: Each number corresponds to a haplotype in the sample. Haplotypes 1 and 2 are ``tip'' haplotypes. Haplotype 3 is an interior haplotype. The numbers in square boxes illustrate the center for each 0-step clade (a haplotype). The hexagonal ``N'' represents the center for the clade containing 1, 2, and 3. Numbers in ovals are the distances from the center of each collecting area to the clade center. $D_c(1)=0$, $D_c(2)=(3/9)(2)+(6/9)(1)=1.33$, $D_c(3) = (4/12)(1.9) + (4/12)(1.9) + (4/12)(1.9)=1.9$. $D_n(1) = 1.6$, $D_n(2)=(3/9)(1.6)+(6/9)(1.5)=1.53$, $D_n(3)=(4/12)(1.6)+(4/12)(1.5)+(4/12)(2.3)=1.8$.
\includegraphics[scale=0.4]{nca-calculations.eps}
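
The arithmetic in the caption is just a weighted average: each collecting area contributes its distance to the relevant center, weighted by the fraction of the clade's copies found there. A minimal Python check of those numbers follows; the function name `weighted_mean_distance` is made up for this example, and the counts and distances are read directly from the caption.

```python
def weighted_mean_distance(counts, distances):
    """Average distance, weighting each collecting area by its copy count.

    `counts[i]` is the number of copies of the clade in area i and
    `distances[i]` is the distance from area i to the chosen center.
    """
    total = sum(counts)
    return sum(n * d for n, d in zip(counts, distances)) / total


# Haplotype 2: 3 copies in one area and 6 in another.
dc_2 = weighted_mean_distance([3, 6], [2.0, 1.0])  # distances to its own center
dn_2 = weighted_mean_distance([3, 6], [1.6, 1.5])  # distances to the center N

# Haplotype 3: 4 copies in each of three areas.
dc_3 = weighted_mean_distance([4, 4, 4], [1.9, 1.9, 1.9])
dn_3 = weighted_mean_distance([4, 4, 4], [1.6, 1.5, 2.3])

print(round(dc_2, 2), round(dn_2, 2), round(dc_3, 2), round(dn_3, 2))
```

The same function gives $D_c$ when the distances are measured to the clade's own center and $D_n$ when they are measured to the center of the nesting clade.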

Once you've calculated these distances, you randomly permute the clades across sample locations. This shuffles the data, keeping the number of haplotypes and the sample size per location the same as in the original data set. For each of these permutations, you calculate $D_c(X)$ and $D_n(X)$. If the observed clade distance, the observed nested clade distance, or both are significantly different from what is expected by chance, then you have evidence of (a) geographical expansion of the clade (if $D_c(X)$ is greater than the null expectation) or (b) a range-shift (if $D_n(X)$ is greater than the null expectation). Using these kinds of statistics, you run your data set through Templeton's inference key to reach a conclusion. For example, applying this procedure to data from Ambystoma tigrinum (Figure [*]), Templeton et al. [6] construct the scenario in Figure [*].
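
The permutation step can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the procedure as implemented in any particular program: locations are treated as points in a plane with Euclidean distances, only $D_c$ is computed, and the names `clade_distance` and `permutation_test` are hypothetical. Shuffling the clade labels across individuals while leaving each individual's location fixed keeps both the per-location sample sizes and the number of copies of each clade unchanged.

```python
import math
import random


def clade_distance(points):
    """Mean Euclidean distance of the points from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)


def permutation_test(coords, labels, clade, n_perm=999, seed=0):
    """Compare the observed D_c of `clade` with its permutation distribution.

    `coords[i]` is the location of individual i and `labels[i]` is the
    clade it carries. Returns the observed D_c and the proportions of
    permutations (counting the observed data) that are as small or as large.
    """
    rng = random.Random(seed)

    def stat(lbls):
        return clade_distance([c for c, l in zip(coords, lbls) if l == clade])

    observed = stat(labels)
    shuffled = list(labels)
    n_small = n_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign clades to individuals at random
        d = stat(shuffled)
        n_small += d <= observed
        n_large += d >= observed
    return observed, (n_small + 1) / (n_perm + 1), (n_large + 1) / (n_perm + 1)


# Made-up data: three sites with 10 individuals each; clade X is
# confined to the first site, clade Y occupies the other two.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
coords = [s for s in sites for _ in range(10)]
labels = ["X"] * 10 + ["Y"] * 20
observed, p_small, p_large = permutation_test(coords, labels, "X")
```

In this made-up data set clade X has an observed $D_c$ of zero, which is far smaller than almost any random reshuffling produces, so `p_small` is small. A clade that had spread unusually widely would instead show a small `p_large`.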

Figure: Geographic distribution of mtDNA haplotypes in Ambystoma tigrinum.

Figure: Inference key for Ambystoma tigrinum.


Kent Holsinger 2010-12-13