Once we have constructed the haplotype network, we're then faced with
the problem of identifying nested clades. Templeton et
al. [3] propose the following algorithm to
construct a unique set of nested clades:
 Step 1.
 Each haplotype in the sample comprises a 0-step clade,
i.e., each copy of a particular haplotype in the sample is separated
by zero evolutionary steps from other copies of the same
haplotype. ``Tip'' haplotypes are those that are connected to only
one other haplotype. ``Interior'' haplotypes are those that are
connected to two or more haplotypes. Set $n = 1$.
 Step 2.
 Pick a tip haplotype that is not part of any $n$-step
network.
 Step 3.
 Identify the interior haplotype with which it is
connected by $n$ mutational steps.
 Step 4.
 Identify all tip haplotypes connected to that interior
haplotype by $n$ mutational steps.
 Step 5.
 The set of all such tip and interior haplotypes
constitutes an $n$-step clade.
 Step 6.
 If there are tip haplotypes remaining that are not part
of an $n$-step clade, return to step 2.
 Step 7.
 Identify internal $n$-step clades that are not part of
an $(n+1)$-step clade and are separated by $n+1$ steps.
 Step 8.
 Designate these clades as ``terminal'' and return to
step 2.
 Step 9.
 Increment $n$ by one and return to step 2.
That sounds fairly complicated, but if you look at the
example in the figure below, you'll see that it isn't all
that horrible.
Figure:
Nesting of haplotypes at the Adh locus in Drosophila melanogaster.
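
The first pass of the nesting procedure (steps 1 through 6 with one mutational step) can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the network is a hypothetical six-haplotype example, every connection is assumed to be exactly one mutational step (no inferred missing intermediates), and the handling of leftover interior haplotypes in steps 7 through 9 is not implemented. The function names `classify` and `one_step_clades` are made up for this example.

```python
def classify(network):
    """Split haplotypes into tips (one connection) and interiors (two or more)."""
    tips = {h for h, nbrs in network.items() if len(nbrs) == 1}
    interiors = {h for h, nbrs in network.items() if len(nbrs) >= 2}
    return tips, interiors


def one_step_clades(network):
    """Group each tip with the haplotype it is connected to (steps 2-6, one step).

    Returns the clades, keyed by the haplotype the tips attach to, and the
    set of haplotypes that have not been placed in any 1-step clade yet.
    """
    tips, _ = classify(network)
    clades = {}
    assigned = set()
    for tip in sorted(tips):
        if tip in assigned:
            continue  # already nested, e.g. in a two-haplotype network
        (hub,) = network[tip]  # a tip has exactly one neighbor
        clades.setdefault(hub, {hub}).add(tip)
        assigned |= {hub, tip}
    leftover = set(network) - assigned
    return clades, leftover


# Hypothetical network: A and B hang off C, F hangs off E, and D links C to E.
network = {
    "A": {"C"},
    "B": {"C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}

tips, interiors = classify(network)
clades, leftover = one_step_clades(network)
```

In this example the tips A and B are grouped with C, the tip F is grouped with E, and the interior haplotype D is left over for the later steps of the algorithm to deal with.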

This algorithm produces a set of nested clades, i.e., a 1-step clade
is contained within a 2-step clade, a 2-step clade is contained within
a 3-step clade, and so on. Once such sets of nested clades have been
identified, we can calculate statistics related to the geographical
distribution of each clade in the sample. Templeton et
al. [6] define two statistics that are used in
an inferential key (the most recent version of the key is
in [4]; see the figure below):
 Clade distance
 The average distance of each haplotype in the
particular clade from the center of its geographical
distribution. ``Distance'' may be the great circle distance or it
might be the distance measured along a presumed dispersal
corridor. The clade distance for clade $X$ is symbolized $D_c(X)$,
and it measures how far this clade has spread.
 Nested clade distance
 The average distance of the center of
distribution for this clade from the center of distribution for
the clade within which it is nested. So if clade $X$ is nested
within clade $Y$, we calculate $D_n(X)$ by determining the
geographic centers of clade $X$ and clade $Y$ and measuring the
distance between those centers. $D_n(X)$ measures how far the clade
has changed position relative to the clade from which it originated.
Figure:
Each number corresponds to a haplotype in the
sample. Haplotypes 1 and 2 are ``tip'' haplotypes. Haplotype 3 is an
interior haplotype. The numbers in square boxes illustrate the
center for each 0-step clade (a haplotype). The hexagonal ``N''
represents the center for the clade containing 1, 2, and 3. Numbers
in ovals are the distances from the center of each collecting area
to the clade center.
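
As a rough illustration of the two statistics, the sketch below computes a clade distance and a nested clade distance from per-site sample counts. It follows the center-to-center description of the nested clade distance given here. Everything in it is an assumption made for the example: the site names, the planar coordinates, the counts, the use of a count-weighted mean as the "center", and straight-line distance (`math.dist`) standing in for great circle or corridor distance. A different distance function can be passed through the `dist` argument.

```python
import math


def center(counts, coords):
    """Count-weighted mean position of the sampled copies of a clade."""
    total = sum(counts.values())
    x = sum(n * coords[site][0] for site, n in counts.items()) / total
    y = sum(n * coords[site][1] for site, n in counts.items()) / total
    return (x, y)


def clade_distance(counts, coords, dist=math.dist):
    """Average distance of a clade's copies from the clade's own center."""
    c = center(counts, coords)
    total = sum(counts.values())
    return sum(n * dist(coords[site], c) for site, n in counts.items()) / total


def nested_clade_distance(counts, nesting_counts, coords, dist=math.dist):
    """Distance between a clade's center and the center of its nesting clade."""
    return dist(center(counts, coords), center(nesting_counts, coords))


# Hypothetical data: two collecting sites on a line, 4 distance units apart.
coords = {"west": (0.0, 0.0), "east": (4.0, 0.0)}
clade_x = {"west": 3, "east": 1}   # copies of the nested clade per site
clade_y = {"west": 4, "east": 4}   # copies of the nesting clade per site

d_c = clade_distance(clade_x, coords)
d_n = nested_clade_distance(clade_x, clade_y, coords)
```

With these made-up numbers the nested clade is centered one unit from the western site, so its clade distance is (3 x 1 + 1 x 3) / 4 = 1.5, and its center lies 1.0 unit from the center of the nesting clade.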

Once you've calculated these distances, you randomly permute the
clades across sample locations. This shuffles the data randomly,
keeping the number of haplotypes and the sample size per location the
same as in the original data set. For each of these permutations, you
calculate the clade distance, $D_c$, and the nested clade distance,
$D_n$. If the observed clade distance, the
observed nested clade distance, or both are significantly different
from what is expected by chance, then you have evidence of (a) geographical
expansion of the clade (if $D_c$ is greater than null expectation)
or (b) a range shift (if $D_n$ is greater than null
expectation). Using these kinds of statistics, you run your data set
through Templeton's inference key to reach a conclusion. For example,
applying this procedure to data from Ambystoma tigrinum (the first
figure below), Templeton et al. [6]
construct the scenario in the second figure below.
Figure:
Geographic distribution of mtDNA haplotypes in Ambystoma
tigrinum.

Figure:
Inference key for Ambystoma
tigrinum.
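
The permutation step described above can be sketched as follows. This is a minimal illustration, not the procedure as implemented in any particular program: the data are invented, positions are planar coordinates, the nesting clade is taken to be all of the individuals supplied, and only the upper tail (statistic at least as large as observed) is counted. Shuffling the clade labels while leaving the positions fixed keeps both the number of copies of each clade and the sample size at each location unchanged. The function names are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean position of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def distances(labels, positions, clade):
    """Return (clade distance, nested clade distance) for one nested clade."""
    own = [p for lab, p in zip(labels, positions) if lab == clade]
    c = centroid(own)
    d_c = sum(math.dist(p, c) for p in own) / len(own)
    d_n = math.dist(c, centroid(positions))  # nesting clade = all individuals
    return d_c, d_n


def permutation_p_values(labels, positions, clade, n_perm=999, seed=1):
    """Upper-tail permutation p-values for the two distances."""
    rng = random.Random(seed)
    obs_c, obs_n = distances(labels, positions, clade)
    shuffled = list(labels)
    ge_c = ge_n = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign clades to individuals at random
        perm_c, perm_n = distances(shuffled, positions, clade)
        ge_c += perm_c >= obs_c - 1e-12  # small tolerance for rounding
        ge_n += perm_n >= obs_n - 1e-12
    # The observed arrangement is counted as one of the permutations.
    return (ge_c + 1) / (n_perm + 1), (ge_n + 1) / (n_perm + 1)


# Hypothetical sample: clade X found only in the west, clade Y only in the east.
labels = ["X"] * 4 + ["Y"] * 4
positions = [(0.0, 0.0)] * 4 + [(10.0, 0.0)] * 4

p_c, p_n = permutation_p_values(labels, positions, "X")
```

In this invented sample clade X is confined to a single site, so its observed clade distance is zero and cannot be unusually large, while its center sits as far from the overall center as the data allow, which gives a small p-value for the nested clade distance.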

Kent Holsinger
2010-12-13