
Revealing molecular variation

The diversity of laboratory techniques used to reveal molecular variation is even greater than the diversity of underlying physical structures. I'll mention only the most important techniques.

Immunological distance
Some molecules, notably proteins, induce an immune response in common laboratory mammals. The degree of cross-reactivity between antibodies raised against a protein from one species, say humans, and the homologous protein from another, say chimps, can be used as a measure of evolutionary distance: the weaker the cross-reaction, the greater the immunological distance (ID). The ID between humans and chimps is smaller than the ID between humans and orangutans, suggesting that humans and chimps share a more recent common ancestor.

DNA-DNA hybridization
Once the repetitive sequences have been "subtracted out", the rate and temperature at which DNA strands from two different species anneal reflect the average percent sequence divergence between them. The percent sequence divergence can be used as a measure of evolutionary distance. Immunological distances and DNA-DNA hybridization were used primarily to identify phylogenetic relationships among species. Neither is now widely used in molecular evolution studies.
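To make "percent sequence divergence" concrete, here is a minimal sketch of the quantity itself. DNA-DNA hybridization estimates it indirectly from annealing behavior, without ever comparing sequences base by base; the direct comparison below assumes two sequences that are already aligned, and the function name and the two short fragments are made up for illustration.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count the positions where the two bases are not identical.
    differences = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * differences / len(seq_a)


# Hypothetical aligned fragments from two species (illustrative only).
species_1 = "ACGTACGTACGTACGTACGT"
species_2 = "ACGTACGAACGTACGTACCT"

if __name__ == "__main__":
    print(percent_divergence(species_1, species_2))
```

The two fragments differ at 2 of 20 positions, so the sketch reports 10 percent divergence.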

Isozymes
Biochemists recognized in the late 1950s that many soluble enzymes occurred in multiple forms within a single individual. Population geneticists, notably Hubby and Lewontin, later recognized that in many cases these different forms corresponded to different alleles at a single locus, allozymes. Allozymes are relatively easy to score in most macroscopic organisms, they are typically co-dominant (the allelic composition of heterozygotes can be inferred), and they allow investigators to identify both variable and non-variable loci. Patterns of variation at allozyme loci may not be representative of genetic variation that does not result from differences in protein structure or that is related to variation in insoluble proteins.

RFLPs
In the 1970s molecular geneticists discovered restriction enzymes, enzymes that cleave DNA at a specific sequence of 4, 5, or 6 base pairs, the recognition site. A single nucleotide change in a recognition site is usually enough to eliminate it. Thus, presence or absence of a restriction site at a particular position in a genome provides compelling evidence of an underlying difference in nucleotide sequence at that position.
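A small sketch may help show how one nucleotide change turns into a different fragment pattern. It uses the EcoRI recognition site GAATTC, which is cut after the first G. The two alleles are invented, the molecule is treated as linear, and only one strand is scanned (sufficient here because GAATTC is its own reverse complement); the function names are hypothetical.

```python
def find_sites(sequence, site="GAATTC"):
    """Return the 0-based start position of every occurrence of the site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence, site="GAATTC", cut_offset=1):
    """Lengths of the fragments left after cutting a linear molecule.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_b differs from allele_a by a single
# A -> G change inside the second recognition site (GAATTC -> GAGTTC).
allele_a = "TTAG" + "GAATTC" + "CGATCG" + "GAATTC" + "AAGT"
allele_b = "TTAG" + "GAATTC" + "CGATCG" + "GAGTTC" + "AAGT"

if __name__ == "__main__":
    print(fragment_lengths(allele_a))  # two sites, three fragments
    print(fragment_lengths(allele_b))  # one site lost, two fragments
```

Allele a is cut twice and allele b only once, so the two alleles give different sets of fragment lengths. That difference is what an RFLP assay scores.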

RAPDs, AFLPs, ISSRs
With the advent of the polymerase chain reaction in the late 1980s, several related techniques were developed for the rapid assessment of genetic variation in organisms for which little or no prior genetic information was available. These methods differ in details of how the laboratory procedures are performed, but they are similar in that they (a) use PCR to amplify anonymous stretches of DNA, (b) generally produce larger amounts of variation than allozyme analyses of the same taxa, and (c) are bi-allelic, dominant markers. They have the advantage, relative to allozymes, that they sample more or less randomly through the genome. They have the disadvantage that heterozygotes cannot be distinguished from dominant homozygotes, meaning that it is difficult to use them to obtain information about levels of within-population inbreeding.
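The difficulty with inbreeding can be sketched numerically. With a dominant marker the only observable is the fraction of individuals lacking the band. Under the standard model with null-allele frequency q and inbreeding coefficient F, that fraction is q^2 + F q (1 - q), so one observation has to account for two unknowns. With a co-dominant marker, all three genotype counts are observed and F can be estimated from the deficit of heterozygotes. The function names and the counts below are assumptions chosen for illustration.

```python
import math


def band_absent_frequency(q, f):
    """Expected fraction of band-absent (null homozygote) individuals."""
    return q * q + f * q * (1.0 - q)


def q_assuming_hardy_weinberg(absent_fraction):
    """Null-allele frequency from a dominant marker, ASSUMING F = 0."""
    return math.sqrt(absent_fraction)


def f_from_codominant(n_aa, n_ab, n_bb):
    """Estimate F as 1 - (observed / expected heterozygosity)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    expected_het = 2 * p * (1 - p)
    if expected_het == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    return 1 - (n_ab / n) / expected_het


if __name__ == "__main__":
    # Two very different populations look identical at a dominant marker:
    print(band_absent_frequency(0.4, 0.0))   # random mating, q = 0.4
    print(band_absent_frequency(0.2, 0.75))  # strong inbreeding, q = 0.2
    # Hypothetical co-dominant genotype counts (AA, AB, BB):
    print(f_from_codominant(30, 20, 50))
```

Both dominant-marker scenarios give a band-absent fraction of about 0.16, so the data alone cannot separate them. The co-dominant counts, by contrast, give an estimate of F directly.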

Microsatellites
Satellite DNA, highly repetitive DNA associated with heterochromatin, had been known since biochemists first began to characterize the large-scale structure of genomes in DNA hybridization studies. In the mid-to-late 1980s several investigators identified smaller repetitive units dispersed throughout many genomes. Microsatellites, which consist of short sequences (2-6 nucleotides) repeated many times, have proven particularly useful for analyses of variation within populations since the mid-1990s. Because of high mutation rates at each locus, they commonly have many alleles. Moreover, they are typically co-dominant, making them more generally useful than dominant markers. Identifying variable microsatellite loci is more laborious than identifying AFLPs, RAPDs, or ISSRs.
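As a rough illustration of what a microsatellite looks like in sequence data, the sketch below scans a sequence for a 2-6 nucleotide motif repeated in tandem. The function name, the minimum of five repeats, and the two alleles are assumptions for the example. In practice alleles are usually scored by fragment length rather than by scanning sequence.

```python
import re


def find_microsatellites(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    A motif is 2-6 nucleotides long; the shortest motif that explains
    the repeat is preferred, and single-base runs are ignored.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    loci = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. AAAAAAAAAA is a homopolymer run, skip it
        repeats = len(match.group(0)) // len(motif)
        loci.append((match.start(), motif, repeats))
    return loci


# Hypothetical alleles at one locus: same flanking sequence,
# different numbers of CA repeats.
allele_1 = "GGTACT" + "CA" * 8 + "TGGATC"
allele_2 = "GGTACT" + "CA" * 11 + "TGGATC"

if __name__ == "__main__":
    print(find_microsatellites(allele_1))
    print(find_microsatellites(allele_2))
```

The two alleles share a locus and a motif but differ in repeat count, which is the kind of length variation that gives microsatellite loci their many alleles.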

Nucleotide sequence
The advent of automated sequencing has greatly increased the amount of population-level data available on nucleotide sequences. Nucleotide sequence data has an important advantage over most of the types of data discussed so far: allozymes, RFLPs, AFLPs, RAPDs, and ISSRs may all hide variation. Nucleotide sequence differences need not be reflected in any of those markers. On the other hand, each of those markers provides information on variation at several or many independently inherited loci. Nucleotide sequence data, by contrast, reveal differences only within a single region that rarely extends more than 2-3 kb.

Single nucleotide polymorphisms
In organisms that are genetically well-characterized it may be possible to identify a large number of single nucleotide positions that harbor polymorphisms. These SNPs potentially provide high-resolution insight into patterns of variation within the genome. For example, the HapMap project has identified approximately 3.2M SNPs in the human genome, or about one every kb [1].
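The two ideas in this entry, finding polymorphic positions and expressing their density, can be sketched as follows. The haplotypes are invented and assumed to be aligned. The density calculation assumes a human genome of roughly 3.2 billion base pairs, a figure that is not given in the text above.

```python
def snp_positions(haplotypes):
    """0-based positions at which aligned haplotypes are not all identical."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    # zip(*haplotypes) walks the alignment one column at a time.
    return [i for i, column in enumerate(zip(*haplotypes))
            if len(set(column)) > 1]


def snps_per_kb(n_snps, n_bases):
    """Average number of SNPs per 1000 bases."""
    return 1000.0 * n_snps / n_bases


# Hypothetical aligned haplotypes sampled from one population.
sample = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACATACGTAC",
    "ACGTACGTTC",
]

if __name__ == "__main__":
    print(snp_positions(sample))
    # 3.2 million SNPs over an ASSUMED genome size of 3.2 billion bases.
    print(snps_per_kb(3_200_000, 3_200_000_000))
```

With the assumed genome size, 3.2 million SNPs works out to one SNP per kb on average, matching the figure quoted above.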

As you can see from these brief descriptions, each of the markers reveals different aspects of underlying hereditary differences among individuals, populations, or species. There is no single "best" marker for evolutionary analyses. Which is best depends on the question you are asking. In many cases in molecular evolution, the interest is intrinsically in the evolution of the molecule itself, so the choice is based not on what a molecule reveals about the organism that contains it but on which molecules raise the most interesting questions.


Kent Holsinger 2008-09-03