Templeton et al. [5] lay out the theory and
procedures involved in statistical parsimony in great detail. Those details get
a little complicated, and we'll get to the complications soon
enough, but in outline the process is pretty simple:
- Evaluate the limits of parsimony, i.e., the number of mutational
steps that can be reliably inferred without having to worry about
multiple substitutions.
- Construct ``the set of parsimonious and non-parsimonious
cladograms that is consistent with these limits'' (p. 619).
So why use parsimony? Within species, the time available for substitutions to
accumulate is relatively short. As a result, it may be reasonable to assume
that multiple substitutions at the same site have not occurred, at least
between the haplotypes that are most closely related. To ``identify the
limits of parsimony'' we first estimate $\theta$ from our data. Then we plug
it into a formula that allows us to assess the probability that the
difference between two randomly drawn haplotypes in our sample is the result
of more than one substitution. If that probability is small, say less than
5%, we can connect all of the haplotypes into a parsimonious network.
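The formula itself is given in Templeton et al. [5] and is not reproduced here. To make the idea of a "limit of parsimony" concrete, the sketch below swaps in a much cruder, hypothetical approximation; it is not the Templeton et al. calculation, which is based on $\theta$. It assumes that two haplotypes of $m$ sites differing at $j$ sites are separated by $\lambda = j/m$ expected substitutions per site, that the number of hits at each site is Poisson, and that sites are independent. Under those assumptions, the probability that each of the $j$ differing sites reflects exactly one substitution is $[\lambda/(e^{\lambda}-1)]^j$. The function names are illustrative only.

```python
import math


def prob_single_hits(j: int, m: int) -> float:
    """Hypothetical stand-in for the probability of parsimony for j differences.

    Assumes lam = j / m expected substitutions per site between the two
    haplotypes, Poisson-distributed hits at each site, and independent sites.
    Returns the probability that every one of the j differing sites was hit
    exactly once (not twice or more).
    """
    if j == 0:
        return 1.0
    lam = j / m
    # P(exactly one hit | at least one hit) = lam*exp(-lam) / (1 - exp(-lam))
    #                                       = lam / (exp(lam) - 1)
    per_site = lam / math.expm1(lam)
    return per_site ** j


def limit_of_parsimony(m: int, threshold: float = 0.95) -> int:
    """Largest number of differences j whose probability stays above threshold."""
    j = 0
    while prob_single_hits(j + 1, m) > threshold:
        j += 1
    return j
```

Under these assumptions, `limit_of_parsimony(1000)` returns 10. Two 1000-site haplotypes differing at up to 10 sites would be treated as connectable by parsimony. The real calculation in [5] gives different numbers.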
More likely than not, we won't be able to connect all of the
haplotypes parsimoniously, but there's still a decent chance that
we'll be able to identify subsets of the haplotypes for which the
assumption of parsimonious change is reasonable. Templeton et
al. [5] suggest the following procedure to
construct a haplotype network:
- Step 1:
- Estimate $P_1$, the probability that haplotype pairs
differing by a single change are the result of a single
substitution. If $P_1 > 0.95$, as is likely, connect all pairs of
haplotypes that differ by a single change. There may be ambiguities
in the reconstruction, including loops. Keep these in the network.
- Step 2:
- Identify the products of recombination by inspecting
the 1-step network to determine if postulating recombination between
a pair of sequences can remove ambiguity identified in step 1.
- Step 3:
- Augment $j$ by one and estimate $P_j$. If $P_j > 0.95$,
join $(j-1)$-step networks into a $j$-step network by connecting
the two haplotypes that differ by $j$ steps. Repeat until either all
haplotypes are included in a single network or $P_j$ falls to 0.95 or
below, leaving you with two or more non-overlapping networks.
- Step 4:
- If you have two or more networks left to connect,
estimate the smallest number of non-parsimonious changes that will
occur with greater than 95% probability, and connect the networks.
Refer to Templeton et al. [5] for
details of the calculations. Figure 2 provides an
example of the resulting analysis.
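Steps 1 and 3 amount to adding links between haplotypes in order of increasing distance. Below is a minimal sketch under several assumptions. Haplotypes are aligned strings of equal length. The number of steps between two haplotypes is their Hamming distance. The limit of parsimony has already been computed and is passed in as `max_steps`. All names here are hypothetical. A union-find structure tracks which haplotypes already share a network. The sketch omits Step 2 (recombination) and Step 4 (non-parsimonious joins). When several pairs at distance $j$ join the same two networks, it keeps all of them, so the resulting ambiguity stays in the network.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of sites at which two aligned haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_network(haplotypes: dict[str, str], max_steps: int):
    """Sketch of Steps 1 and 3 (hypothetical helper, not the TCS program).

    Links all 1-step pairs, then for j = 2, 3, ..., max_steps links pairs
    j steps apart that lie in different networks. Stops early once all
    haplotypes form one network. Returns (edges, networks).
    """
    names = list(haplotypes)
    parent = {n: n for n in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    dist = {(a, b): hamming(haplotypes[a], haplotypes[b])
            for a, b in combinations(names, 2)}
    edges = []
    for j in range(1, max_steps + 1):
        # Networks as they stood at the end of the previous round.
        root_at_start = {n: find(n) for n in names}
        for (a, b), d in dist.items():
            # Step 1 keeps every 1-step link, loops included;
            # later rounds only link haplotypes in different networks.
            if d == j and (j == 1 or root_at_start[a] != root_at_start[b]):
                edges.append((a, b, j))
                parent[find(a)] = find(b)
        if len({find(n) for n in names}) == 1:
            break
    networks = {}
    for n in names:
        networks.setdefault(find(n), []).append(n)
    return edges, list(networks.values())
```

Take the toy haplotypes `AAAA`, `AAAT`, `AATT` and `TTTT`. With `max_steps=1`, the first three are linked and `TTTT` is left on its own. Raising `max_steps` to 2 adds a 2-step link from `AATT` to `TTTT` and yields a single network.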
Figure 2:
Statistical parsimony network for the Amy locus of
Drosophila melanogaster.
Kent Holsinger
2006-12-06