With the exception of RNA viruses, the hereditary information in all
organisms is carried in DNA. Ultimately, differences in any of the
molecular markers we study (and in genetically based morphological,
behavioral, or physiological traits) are associated with some
difference in the physical structure of DNA, and molecular
evolutionists study a variety of its aspects.
- Nucleotide sequence
- A difference in nucleotide sequence is the
most obvious way in which two homologous stretches of DNA may
differ. The differences may be in translated portions of protein
genes (exons), portions of protein genes that are transcribed but
not translated (e.g., introns, 5' or 3' untranslated regions),
non-transcribed functional regions (e.g., promoters), or regions
without apparent function.
- Protein sequence
- Because of redundancy in the genetic code, a
difference in nucleotide sequence at a protein-coding locus may or
may not result in proteins with a different amino acid
sequence. Important note: Don't forget that some loci code for
RNA that has an immediate function without being translated to a
protein, e.g., ribosomal RNA and various small nuclear RNAs.
- Secondary, tertiary, and quaternary structure
- Differences in
amino acid sequence may or may not lead to a different distribution
of α-helices and β-sheets, to a different
three-dimensional structure, or to different multisubunit
combinations.
- Imprinting
- At certain loci in some organisms the expression
pattern of a particular allele depends on whether that allele was
inherited from the individual's father or its mother.
- Expression
- Functional differences among individuals may arise
because of differences in the patterns of gene expression, even if
there are no differences in the primary sequences of the genes that
are expressed.³
- Sequence organization
- Particular genes may differ between
organisms because of differences in the position and number of
introns. At the whole-genome level, there may be differences in the
amount and kind of repetitive sequences, in the amount and type of
sequences derived from transposable elements, in the proportion of
G-C relative to A-T base pairs, or even in the identity and
arrangement of the genes that are present. In microbial species, only a
subset of genes is present in all strains. For example, in
Streptococcus pneumoniae the "core genome" contains only 73%
of the loci present in one fully sequenced reference
strain [7]. Similarly, a survey of 20 strains of
Escherichia coli and one of E. fergusonii, E. coli's closest
relative, identified only 2,000 homologous loci present in all strains
out of 18,000 orthologous loci identified [9].
- Copy number variation
- Even in diploid organisms, where each gene is normally
expected in two copies, there may be substantial differences among
individuals in the number of copies of particular genes. In humans,
for example, 76 copy-number polymorphisms (CNPs) were identified in a
sample of only 20 individuals, and individuals differed from one
another by an average of 11 CNPs [8].
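The point about redundancy in the genetic code can be made concrete with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1) and two aligned, in-frame coding sequences without gaps or ambiguity codes. The function names `translate` and `compare_codons` are illustrative and do not come from any particular library. In the example, a third-position change from CTT to CTC still encodes leucine. A first-position change from AAA to GAA, by contrast, replaces lysine with glutamic acid.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...), matching this 64-letter string;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame DNA coding sequence; a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def compare_codons(seq1, seq2):
    """Classify each codon that differs between two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    differences = []
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 != c2:
            same_aa = CODON_TABLE[c1] == CODON_TABLE[c2]
            kind = "synonymous" if same_aa else "nonsynonymous"
            differences.append((i // 3, c1, c2, kind))
    return differences


reference = "CTTGCAAAA"  # Leu-Ala-Lys
variant = "CTCGCAGAA"    # two nucleotide differences from the reference
print(translate(reference), translate(variant))  # LAK LAE
print(compare_codons(reference, variant))
# [(0, 'CTT', 'CTC', 'synonymous'), (2, 'AAA', 'GAA', 'nonsynonymous')]
```

Real comparisons would normally rely on an established package such as Biopython and would need to handle alignment gaps and alternative genetic codes (for example, mitochondrial codes). This sketch handles neither.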
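The contrast between a microbial "core genome" and the full set of loci found in any strain (often called the pan-genome) amounts to set intersection versus set union. The Python sketch below assumes that genes have already been grouped into homologous families. The strain and family names are made-up placeholders, so the resulting numbers are illustrative only. It computes the two kinds of proportion quoted in the text: core loci as a fraction of one reference strain's loci, and core loci as a fraction of all loci identified in any strain.

```python
# Hypothetical presence/absence data: each strain maps to the set of
# gene families detected in its genome (names are placeholders).
strains = {
    "reference": {"g1", "g2", "g3", "g4", "g5"},
    "strain2": {"g1", "g2", "g3", "g6"},
    "strain3": {"g1", "g2", "g3", "g4", "g7"},
}


def core_and_pan(gene_sets):
    """Return (core, pan): families present in every strain, and in at least one."""
    sets = list(gene_sets)
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan


core, pan = core_and_pan(strains.values())

# Share of the reference strain's loci that are found in every strain.
core_vs_reference = len(core) / len(strains["reference"])
# Share of all loci found in any strain that are found in every strain.
core_vs_pan = len(core) / len(pan)

print(sorted(core), len(pan))                    # ['g1', 'g2', 'g3'] 7
print(core_vs_reference, round(core_vs_pan, 3))  # 0.6 0.429
```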
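The two copy-number summaries mentioned in the text can be computed directly from a table of copy numbers. The first is the number of loci that are polymorphic in a sample. The second is the average number of CNPs by which two individuals differ. This Python sketch assumes each individual has already been genotyped as an integer copy number at each locus, and the individuals and values are invented. Counting the loci at which two individuals have unequal copy numbers is one simple way to compare them. It is not necessarily the procedure used in [8].

```python
from itertools import combinations
from statistics import mean

# Hypothetical data: total (diploid) copy number at five loci per individual.
copy_numbers = {
    "ind1": (2, 2, 3, 1, 2),
    "ind2": (2, 4, 2, 1, 2),
    "ind3": (1, 2, 3, 2, 2),
}


def polymorphic_loci(profiles):
    """Indices of loci at which not every individual has the same copy number."""
    return [
        i for i, values in enumerate(zip(*profiles.values()))
        if len(set(values)) > 1
    ]


def n_differences(a, b):
    """Number of loci at which two individuals have different copy numbers."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_differences(profiles):
    """Average number of differing loci over all pairs of individuals."""
    return mean(n_differences(a, b) for a, b in combinations(profiles.values(), 2))


print(polymorphic_loci(copy_numbers))                     # [0, 1, 2, 3]
print(round(mean_pairwise_differences(copy_numbers), 3))  # 2.667
```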
It is worth remembering that in nearly all eukaryotes there
are two different genomes whose characteristics may be analyzed: the
nuclear genome and the mitochondrial genome. In plants there is a
third: the chloroplast genome. In some protists, there may be even
more, because of secondary or tertiary endosymbiosis. The
mitochondrial and chloroplast genomes are typically inherited only
through the maternal line, although some instances of biparental
inheritance are known.
Kent Holsinger
2010-12-13