Next: Concerted evolution Up: Evolution in multigene families Previous: Introduction

Globin evolution

I've just pointed out the distinction between myoglobin and hemoglobin. You may also remember that hemoglobin is a multimeric protein consisting of four subunits: 2 $\alpha $ subunits and 2 $\beta $ subunits. What you may not know is that in humans there are actually two types of $\alpha $ hemoglobin and four types of $\beta $ hemoglobin, each coded by a different genetic locus (see Table 1). The five $\alpha $-globin loci ($\alpha_1$, $\alpha_2$, $\zeta$, and two non-functional pseudogenes) are found in a cluster on chromosome 16. The six $\beta $-globin loci ($\epsilon$, $\gamma_G$, $\gamma_A$, $\delta $, $\beta $, and a pseudogene) are found in a cluster on chromosome 11. The myoglobin locus is on chromosome 22.


Table 1: Human hemoglobins arranged in developmental sequence. Adult hemoglobins composed of 2$\alpha $ and 2$\delta $ subunits typically account for less than 3% of hemoglobins in adults (http://sickle.bwh.harvard.edu/hbsynthesis.html).
Developmental stage   $\alpha $ globin   $\beta $ globin
Embryo                $\zeta$            $\epsilon$
                      $\alpha $          $\epsilon$
Fetus                 $\alpha $          $\beta $
                      $\alpha $          $\gamma$
Adult                 $\alpha $          $\beta $
                      $\alpha $          $\delta $


Not only do we have all of these different types of globin genes in our bodies, they're all related to one another. Comparative sequence analysis has shown that vertebrate myoglobin and hemoglobins diverged from one another about 450 million years ago. Figure 1 shows a phylogenetic analysis of globin genes from humans, mice, and a variety of Archaea. Focus your attention on the part of the tree that has human and mouse sequences. You'll notice two interesting things:

This pattern is exactly what we expect as a result of duplication and divergence. Up to the time that a gene becomes duplicated, its evolutionary history matches the evolutionary history of the organisms containing it. Once there are duplicate copies, each follows an independent evolutionary history, and each traces the history of speciation and divergence that follows the duplication. As a result, when the duplication is older than the split between two species, each copy shares more recent common ancestry with the corresponding copy in the other species than it does with the other duplicates in its own genome.
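The reasoning above can be checked on a toy example. The sketch below is only an illustration: the four-gene tree is an assumed topology in which a single duplication precedes the human-mouse split, not the tree in Figure 1, and the names `toy_tree`, `leaves`, and `smallest_clade` are hypothetical helpers introduced here.

```python
# Toy gene tree as nested tuples (an assumption for illustration only):
# one duplication produced the alpha and beta lineages BEFORE the
# human-mouse split, so each lineage then records that speciation event.
toy_tree = (("human_alpha", "mouse_alpha"), ("human_beta", "mouse_beta"))


def leaves(node):
    """Return the set of gene names found below a node of the tree."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def smallest_clade(node, a, b):
    """Return the genes in the smallest clade containing both a and b.

    Assumes both a and b are present somewhere below node.
    """
    if not isinstance(node, str):
        for child in node:
            if {a, b} <= leaves(child):
                # Both genes sit inside this child, so look deeper.
                return smallest_clade(child, a, b)
    return leaves(node)


# Same locus, different species: the clade contains just the two genes.
across_species = smallest_clade(toy_tree, "human_beta", "mouse_beta")
# Different loci, same genome: the clade is the whole tree.
within_genome = smallest_clade(toy_tree, "human_beta", "human_alpha")
print(sorted(across_species))
print(sorted(within_genome))
```

The smaller the clade, the more recent the common ancestor. In this toy tree the human and mouse $\beta $ copies share a clade of two genes, while the human $\alpha $ and $\beta $ copies are united only at the root, which is the pattern described above.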

Figure 1: Evolution of globin genes in Archaea and mammals (from [2]).
\resizebox{\textwidth}{!}{\includegraphics{globin.eps}}

A history of duplication and divergence in multigene families makes it important to distinguish between two classes of related loci. Orthologs are loci that represent the same locus in different species; divergence between them is a result of species divergence. Paralogs are different loci; divergence between them occurred after duplication of an ancestral gene. The $\beta $-globin loci of humans and chickens are orthologous. The $\alpha $- and $\beta $-globin loci of any pair of taxa are paralogous.
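The two definitions can be written down as a simple rule. This is a minimal sketch, assuming each gene has already been assigned to a locus (in practice that assignment comes from a phylogenetic analysis, not from a label); the `Gene` type and `relationship` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    species: str
    locus: str  # assumed label for the locus, e.g. "beta-globin"


def relationship(g1: Gene, g2: Gene) -> str:
    """Classify a pair of related genes using the definitions in the text."""
    if g1.locus == g2.locus:
        if g1.species == g2.species:
            return "same locus"  # not a pair of distinct loci
        return "orthologs"       # same locus, different species
    return "paralogs"            # different loci, in any pair of taxa


human_beta = Gene("human", "beta-globin")
chicken_beta = Gene("chicken", "beta-globin")
human_alpha = Gene("human", "alpha-globin")
chicken_alpha = Gene("chicken", "alpha-globin")

print(relationship(human_beta, chicken_beta))   # orthologs
print(relationship(human_beta, human_alpha))    # paralogs
print(relationship(human_beta, chicken_alpha))  # paralogs
```

Notice that the last case matches the statement above: $\alpha $- and $\beta $-globin loci are paralogous whether you compare them within one genome or between two species.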

As multigene families go, the globin family is relatively simple and easy to understand. There are only about a dozen loci involved: one isolated locus (myoglobin) and two clusters of loci ($\alpha $- and $\beta $-globins). You'll find a diagram of the $\beta $-globin cluster in Figure 2. As you can see, the $\beta $-globins are not only evolutionarily related to one another, they also occur relatively close to one another on chromosome 11 in humans.

Figure 2: Structure of the human $\beta $-globin gene cluster. % identity refers to similarity to the mouse $\beta $-globin sequence. From http://globin.cse.psu.edu/html/pip/betaglobin/iplot.ps (retrieved 28 Nov 2006).
\includegraphics{beta-globin.eps}

Other families are far more complex. Class I and class II MHC loci, for example, are part of the same multigene family. Moreover, immunoglobulins, T-cell receptors, and MHC loci are part of a larger superfamily of genes, i.e., all are ultimately derived from a common ancestral gene by duplication and divergence. Table 2 lists a few examples of multigene families and superfamilies in the human genome and the number of proteins produced.


Table 2: A few gene families from the human genome (adapted from [5,6]).
Protein family domain          Number of proteins
Actin                          61
Fibronectin type I             5
Fibronectin type II            11
Fibronectin type III           106
Histone
    H2A/H2B/H3/H4              75
Homeobox                       160
Immunoglobulin                 381
MHC Class I                    18
MHC Class II$\alpha $          5
MHC Class II$\beta $           7
T-cell receptor $\alpha $      16
T-cell receptor $\beta $       15
T-cell receptor $\gamma$       1
T-cell receptor $\delta $      1
Zinc finger, C2H2              564
Zinc finger, C3HC4             135



Kent Holsinger 2006-11-28