The diversity of laboratory techniques used to reveal molecular
variation is even greater than the diversity of underlying physical
structures. I'll mention only the most important techniques.
- Immunological distance
- Some molecules, notably proteins, induce an immune
response in common laboratory mammals. The cross-reactivity of an
antiserum raised against a human protein with the corresponding
chimpanzee protein, for example, can be used as a measure of
evolutionary distance, the immunological distance (ID). The ID between
humans and chimps is smaller than it is between humans and orangutans,
suggesting that humans and chimps share a more recent common ancestor.
- DNA-DNA hybridization
- Once the repetitive sequences have been
"subtracted out", the rate and temperature at which single-stranded DNA
from two different species anneals reflect the average percent
sequence divergence between them. The percent sequence divergence
can be used as a measure of evolutionary distance. Immunological
distances and DNA-DNA hybridization were used primarily to identify
phylogenetic relationships among species. Neither is now widely used
in molecular evolution studies.
- Isozymes
- Biochemists recognized in the late 1950s that many
soluble enzymes occurred in multiple forms within a single
individual. Population geneticists, notably Hubby and Lewontin, later
recognized that in many cases these different forms corresponded to
different alleles at a single locus, called allozymes. Allozymes are
relatively easy to score in most macroscopic organisms; they are
typically co-dominant (the allelic composition of heterozygotes can
be inferred); and they allow investigators to identify both variable
and non-variable loci. Patterns of variation at allozyme loci may not,
however, be representative of genetic variation that does not result
from differences in protein structure or that is related to variation
in insoluble proteins.
- RFLPs
- In the 1970s molecular geneticists discovered restriction
enzymes, enzymes that cleave DNA at specific 4, 5, or 6 base pair
sequences, their recognition sites. A single nucleotide change in
a recognition site is usually enough to eliminate it. Thus, the presence
or absence of a restriction site at a particular position in a
genome, detected as a restriction fragment length polymorphism (RFLP),
provides compelling evidence of an underlying difference in
nucleotide sequence at that position.
- RAPDs, AFLPs, ISSRs
- With the advent of the polymerase chain
reaction (PCR) in the late 1980s, several related techniques were
developed for the rapid assessment of genetic variation in organisms
for which little or no prior genetic information was available. These
methods differ in details of how the laboratory procedures are
performed, but they are similar in that they (a) use PCR to amplify
anonymous stretches of DNA, (b) generally reveal more variation than
allozyme analyses of the same taxa, and (c) are bi-allelic, dominant
markers. Relative to allozymes, they have the advantage that they
sample more or less randomly throughout the genome. They have the
disadvantage that heterozygotes cannot be distinguished from
dominant homozygotes, which makes it difficult to use them to
estimate the level of inbreeding within populations.
- Microsatellites
- Satellite DNA, highly repetitive DNA associated
with heterochromatin, had been known since biochemists first began
to characterize the large-scale structure of genomes in DNA
hybridization studies. In the mid-to-late 1980s several investigators
identified smaller repetitive units dispersed throughout many
genomes. Microsatellites, which consist of short sequence motifs (2-6
nucleotides) repeated many times in tandem, have proven particularly
useful for analyses of variation within populations since the
mid-1990s. Because mutation rates at microsatellite loci are high, each
locus commonly has many alleles. Moreover, microsatellites are typically
co-dominant, making them more generally useful than dominant
markers. Identifying variable microsatellite loci is, however, more
laborious than identifying AFLPs, RAPDs, or ISSRs.
- Nucleotide sequence
- The advent of automated sequencing has
greatly increased the amount of population-level data available on
nucleotide sequences. Nucleotide sequence data has an important
advantage over most of the types of data discussed so far:
allozymes, RFLPs, AFLPs, RAPDs, and ISSRs may all hide
variation, because a difference in nucleotide sequence need not be
reflected in any of those markers. On the other hand, each of those
markers provides information on variation at several or many
independently inherited loci, whereas nucleotide sequence information
reveals differences only within a single stretch of DNA that rarely
extends more than 2-3 kb.
- Single nucleotide polymorphisms
- In organisms that are
genetically well-characterized it may be possible to identify a
large number of single nucleotide positions that harbor
polymorphisms. These SNPs potentially provide high-resolution
insight into patterns of variation within the genome. For example,
the HapMap project has identified approximately 3.2 million SNPs in the
human genome, or about one per kilobase [1].
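The isozyme entry above stresses two practical virtues of allozymes: because they are co-dominant, allele frequencies can be counted directly from the genotypes read off a gel, and because non-variable loci are scored too, one can compute the proportion of loci that are polymorphic. Here's a minimal Python sketch of both calculations. The gel scores are hypothetical, the function names are my own rather than those of any particular package, and the 0.95 cutoff for calling a locus polymorphic is an assumed convention.

```python
from collections import Counter

def allele_frequencies(genotype_counts):
    """Allele frequencies at one co-dominant locus.

    `genotype_counts` maps a genotype, written as a pair of allele labels
    such as ("F", "S"), to the number of individuals with that genotype.
    Because the marker is co-dominant, heterozygotes are scored directly.
    """
    allele_counts = Counter()
    for (a1, a2), n in genotype_counts.items():
        allele_counts[a1] += n
        allele_counts[a2] += n
    total = sum(allele_counts.values())
    return {allele: c / total for allele, c in allele_counts.items()}

def is_polymorphic(freqs, cutoff=0.95):
    """Call a locus polymorphic if its commonest allele has frequency <= cutoff.

    ASSUMPTION: 0.95 is one conventional cutoff; others (e.g. 0.99) are also used.
    """
    return max(freqs.values()) <= cutoff

# Hypothetical gel scores for three loci in one population of 100 individuals.
# The monomorphic locus (Mdh) still counts in the denominator below.
loci = {
    "Adh": {("F", "F"): 30, ("F", "S"): 50, ("S", "S"): 20},
    "Pgi": {("A", "A"): 98, ("A", "B"): 2},
    "Mdh": {("M", "M"): 100},
}

freqs = {locus: allele_frequencies(g) for locus, g in loci.items()}
polymorphic = [locus for locus, f in freqs.items() if is_polymorphic(f)]
proportion_polymorphic = len(polymorphic) / len(loci)
print(polymorphic, f"{proportion_polymorphic:.3f}")  # ['Adh'] 0.333
```

Note that Pgi is variable but, with its commonest allele at 0.99, does not meet the 0.95 criterion; being able to score loci like Mdh that show no variation at all is what makes a proportion like this meaningful.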
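To see how losing a single recognition site shows up as a restriction fragment length polymorphism, here's a small sketch that digests two alleles in silico with EcoRI, which recognizes GAATTC and cuts between the G and the first A. The two alleles are invented for illustration and differ by one A-to-G substitution inside the second site. Only a linear molecule is modelled, and because GAATTC is its own reverse complement it is enough to scan one strand.

```python
def site_positions(seq, site):
    """Return the 0-based start of every occurrence of a recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def fragment_lengths(seq, site, cut_offset):
    """Lengths of the fragments produced by cutting a linear sequence.

    `cut_offset` is where, within the recognition site, the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

ECORI = "GAATTC"
# Hypothetical alleles: allele_2 carries GAGTTC where allele_1 has GAATTC.
allele_1 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
allele_2 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

print(fragment_lengths(allele_1, ECORI, 1))  # [5, 14, 9]
print(fragment_lengths(allele_2, ECORI, 1))  # [5, 23]
```

On a gel, the single substitution is seen as the 14- and 9-base fragments of one allele being replaced by a single 23-base fragment in the other.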
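The RAPD/AFLP/ISSR entry notes that dominant markers make inbreeding hard to measure. This sketch illustrates why. With a co-dominant marker you can count heterozygotes and estimate F = 1 - H_obs/H_exp. With a dominant marker, band-present individuals lump AA and Aa together, so the usual estimate of the band-absent allele's frequency, q = sqrt(frequency of band-absent individuals), has to assume Hardy-Weinberg proportions. The genotype counts are hypothetical, chosen for a population with p = q = 0.5 and F = 0.5, and "A" stands for the band-producing allele.

```python
import math

def codominant_summary(n_AA, n_Aa, n_aa):
    """Return (p, F) from genotype counts at a co-dominant, two-allele locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # every allele copy can be counted
    h_obs = n_Aa / n                  # heterozygotes are visible directly
    h_exp = 2 * p * (1 - p)           # Hardy-Weinberg expectation
    if h_exp == 0:
        raise ValueError("monomorphic locus: F is undefined")
    return p, 1 - h_obs / h_exp

def dominant_null_allele_freq(n_band, n_no_band):
    """Estimate q, the frequency of the band-absent allele, from dominant data.

    Band-present individuals may be AA or Aa, so only band-absent (aa)
    individuals are informative. Taking q = sqrt(freq(aa)) ASSUMES
    Hardy-Weinberg proportions (F = 0); F itself cannot be estimated.
    """
    return math.sqrt(n_no_band / (n_band + n_no_band))

# Hypothetical population with p = q = 0.5 and F = 0.5 (n = 200):
# AA = 200 * (p^2 + F*p*q) = 75, Aa = 200 * 2*p*q*(1 - F) = 50, aa = 75
p, f = codominant_summary(75, 50, 75)
# A dominant marker sees only 125 band-present and 75 band-absent individuals
q = dominant_null_allele_freq(125, 75)
print(f"co-dominant: p = {p:.2f}, F = {f:.2f}")   # co-dominant: p = 0.50, F = 0.50
print(f"dominant:    q = {q:.3f} (true q = 0.5)")  # dominant:    q = 0.612 (true q = 0.5)
```

The co-dominant data recover both p and F, while the dominant-marker estimate of q is about 0.61 rather than 0.5: the excess of homozygotes produced by inbreeding is misread as a higher frequency of the band-absent allele.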
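For microsatellites, scoring an allele amounts to counting repeat units. Because both alleles of a heterozygote are visible, the genotype can be read directly. Here's a sketch that assumes a hypothetical (CA)n dinucleotide locus with invented flanking sequence. It scores the two alleles of one individual, then computes expected heterozygosity, H_e = 1 - sum(p_i^2), to show why loci with many alleles are so informative: a two-allele locus can never exceed 0.5.

```python
import re

def repeat_count(seq, motif):
    """Number of motif copies in the longest uninterrupted tandem run."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)

def expected_heterozygosity(allele_counts):
    """H_e = 1 - sum(p_i^2): chance that two allele copies drawn at random
    (with replacement) are different alleles."""
    total = sum(allele_counts.values())
    return 1 - sum((c / total) ** 2 for c in allele_counts.values())

# Hypothetical (CA)n locus: the two haplotypes of one diploid individual
hap_1 = "TTGA" + "CA" * 12 + "GGT"
hap_2 = "TTGA" + "CA" * 15 + "GGT"
# Co-dominance: both allele lengths are observed, so the genotype is read directly
genotype = tuple(sorted((repeat_count(hap_1, "CA"), repeat_count(hap_2, "CA"))))
print(genotype)  # (12, 15)

# Hypothetical allele counts: 8 equally common repeat lengths vs. a 50:50 biallelic locus
many_alleles = {n: 10 for n in range(10, 18)}
two_alleles = {"A": 50, "G": 50}
print(expected_heterozygosity(many_alleles))  # 0.875
print(expected_heterozygosity(two_alleles))   # 0.5
```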
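The nucleotide-sequence entry says that sequence differences may be invisible to other markers. Here's an illustration using four short, invented, aligned sequences. All four share an intact EcoRI site (GAATTC) at the same position, so an RFLP survey with that enzyme would see no difference among them. Direct comparison, however, finds two segregating sites. The sketch counts segregating sites and nucleotide diversity (pi, the average proportion of sites that differ between pairs of sequences). It assumes the sequences are already aligned, of equal length, and free of gaps.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Count alignment columns at which not all sequences carry the same base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def nucleotide_diversity(seqs):
    """pi: proportion of sites that differ, averaged over all pairs of sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)

# Four hypothetical aligned sequences, each with one EcoRI site (GAATTC) at position 3
seqs = [
    "ATGGAATTCCTGACCGTAAC",
    "ATGGAATTCCTGACTGTAAC",
    "ATGGAATTCCTAACCGTAAC",
    "ATGGAATTCCTGACCGTAAC",
]
ecori_sites = [seq.find("GAATTC") for seq in seqs]
print(ecori_sites)                            # [3, 3, 3, 3]
print(segregating_sites(seqs))                # 2
print(round(nucleotide_diversity(seqs), 3))   # 0.05
```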
As you can see from these brief descriptions, each of the markers
reveals different aspects of underlying hereditary differences among
individuals, populations, or species. There is no single "best"
marker for evolutionary analyses. Which is best depends on the
question you are asking. In many cases in molecular evolution, the
interest is intrinsically in the evolution of the molecule itself, so
the choice of marker depends less on what the molecules reveal about
the organisms that carry them than on which molecules raise the most
interesting questions.
Kent Holsinger
2008-09-03