
Divergence of nucleotide sequences

Underlying everything else we're going to discuss in this last part of the course is the idea that the degree of difference between nucleotide sequences, proteins, or anything else can be described as the result of evolutionary processes acting on them. To illustrate the principle, let's start with nucleotide sequences and develop a fairly simple model that describes how they become different over time.

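Let $q_t$ be the probability that two homologous nucleotides are identical after they have been evolving independently for $t$ generations since the gene in which they were found was replicated in their common ancestor. Let $\lambda$ be the rate of substitution at this nucleotide position in each of the two genes, so that $\lambda\Delta t$ is the probability that a substitution occurs at this position in a given gene during a small time interval, $\Delta t$. Suppose also that when a substitution occurs, each of the three alternative nucleotides is equally likely to be the replacement. Then, ignoring terms of order $(\Delta t)^2$ and noting that $q_0 = 1$ because the two copies are identical when they are first produced,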

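\begin{eqnarray*}
q_{t+\Delta t} &=& (1 - \lambda\Delta t)^2q_t
+ 2\left(1 - \lambda\Delta t\right)\left({\lambda\Delta t \over 3}\right)(1 - q_t) \\
q_{t+\Delta t} - q_t &\approx& -2\lambda\Delta t\,q_t + {2 \over 3}\lambda\Delta t\,(1 - q_t) \\
{dq_t \over dt} &=& {2 \over 3}\lambda - {8 \over 3}\lambda q_t \\
q_t &=& 1 - {3 \over 4}\left(1 - e^{-8\lambda t/3}\right) \\
\end{eqnarray*}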
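One way to convince yourself that the closed-form solution really follows from the recurrence is to step the recurrence forward numerically with smaller and smaller $\Delta t$ and watch it converge. The sketch below does just that; the function names and the values of $\lambda$ and $t$ are hypothetical choices for illustration, not part of the model itself.

```python
import math


def iterate_recurrence(lam, t_max, dt):
    """Step the recurrence for q_t forward in increments of dt, starting at q_0 = 1."""
    q = 1.0  # the two copies are identical when the ancestral gene is replicated
    x = lam * dt  # probability of a substitution in one lineage per step
    for _ in range(round(t_max / dt)):
        # Still identical: neither lineage changed.
        stay = (1 - x) ** 2 * q
        # Newly identical: exactly one lineage changed, to the one base
        # (out of three alternatives) that matches the other lineage.
        converge = 2 * (1 - x) * (x / 3) * (1 - q)
        q = stay + converge
    return q


def closed_form(lam, t):
    """Jukes-Cantor solution q_t = 1 - (3/4)(1 - exp(-8 lam t / 3))."""
    return 1 - 0.75 * (1 - math.exp(-8 * lam * t / 3))


if __name__ == "__main__":
    lam, t = 1e-3, 500.0  # hypothetical rate per generation and elapsed generations
    for dt in (1.0, 0.1, 0.01):
        print(f"dt={dt}: recurrence {iterate_recurrence(lam, t, dt):.5f}, "
              f"closed form {closed_form(lam, t):.5f}")
```

As $\Delta t$ shrinks, the recurrence approaches the closed form; the gap at larger $\Delta t$ reflects the higher-order terms in $\lambda\Delta t$ that vanish in the limit.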

The expected number of nucleotide substitutions separating the two sequences at any one position since they diverged is $d = 2\lambda t$, because each of the two lineages accumulates substitutions at rate $\lambda$ for $t$ generations. Thus,

\begin{eqnarray*}
q_t &=& 1 - {3 \over 4}\left(1 - e^{-4d/3}\right) \\
d &=& -{3 \over 4}\ln\left[1 - {4 \over 3}(1 - q_t)\right] \\
\end{eqnarray*}
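In practice we estimate $q_t$ as the fraction of aligned positions at which two sequences are identical and plug it into the second equation. Here is a minimal sketch, assuming equal-length, gap-free sequences over A, C, G, and T; the function name is hypothetical, and real data would need handling for gaps and ambiguity codes.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance d = -(3/4) ln[1 - (4/3)(1 - q)] for two aligned sequences.

    q is estimated as the fraction of aligned positions with the same nucleotide.
    This sketch assumes equal-length, gap-free sequences over A, C, G, T.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    q = matches / len(seq_a)
    inner = 1 - (4 / 3) * (1 - q)
    if inner <= 0:
        # With 3/4 or more of sites differing, the sequences are no more similar
        # than random sequences and the distance is not finite.
        return math.inf
    return -0.75 * math.log(inner)


if __name__ == "__main__":
    # Hypothetical 10-site alignment differing at one position.
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTTC"))
```

For one difference in ten sites this gives $d \approx 0.107$ substitutions per site, slightly more than the observed proportion of 0.1, because the formula allows for positions that have changed more than once.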

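This is the simplest possible model of nucleotide substitution, the Jukes-Cantor model. It assumes that

1. the rate of substitution is the same at every nucleotide position, and
2. every type of substitution is equally likely, i.e., each nucleotide is replaced by each of the other three at the same rate.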

Let's examine the second of those assumptions first. Observed differences between nucleotide sequences show that some types of substitutions, i.e., transitions ($A \leftrightarrow G$, $C \leftrightarrow T$), occur much more frequently than others, i.e., transversions ($A \leftrightarrow C$, $A \leftrightarrow T$, $G \leftrightarrow C$, $G \leftrightarrow T$). There are a variety of different substitution models corresponding to different assumed patterns of mutation: Kimura 2-parameter (K2P), Felsenstein 1984 (F84), Hasegawa-Kishino-Yano 1985 (HKY85), Tamura and Nei (TrN), and generalized time-reversible (GTR). The GTR is, as its name suggests, the most general time-reversible model. It allows substitution rates to differ between each pair of nucleotides. That's why it's general. It requires, however, that the rate parameter for each pair of nucleotides be the same in both directions, so that at equilibrium substitutions from one nucleotide to another occur as often as substitutions in the reverse direction. That's why it's time reversible. While it would be possible to construct a model in which the rate parameter differs depending on the direction of substitution, it leads to something of a paradox: with non-reversible substitution models the distance between two sequences $A$ and $B$ depends on whether we measure the distance from $A$ to $B$ or from $B$ to $A$.
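To make "general" and "time reversible" concrete, the sketch below builds a rate matrix in the usual GTR parameterization, which is an assumption here: one exchangeability $r_{ij}$ per unordered pair of nucleotides, equilibrium base frequencies $\pi_j$, and off-diagonal rates $q_{ij} = r_{ij}\pi_j$ (left unnormalized for simplicity). In this form the instantaneous rates $q_{ij}$ and $q_{ji}$ can differ when base frequencies differ, but the equilibrium flows $\pi_i q_{ij}$ and $\pi_j q_{ji}$ always balance. Function names and parameter values are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"


def gtr_rate_matrix(exchangeabilities, freqs):
    """Build a GTR-style rate matrix Q as a dict of dicts.

    exchangeabilities maps each unordered pair of bases (a frozenset) to r_ij,
    so r_ij = r_ji by construction. freqs maps each base to pi_j.
    Off-diagonal rates are q_ij = r_ij * pi_j; diagonals make rows sum to zero.
    """
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i][j] = exchangeabilities[frozenset((i, j))] * freqs[j]
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


def is_time_reversible(q, freqs, tol=1e-12):
    """Detailed balance: pi_i * q_ij equals pi_j * q_ji for every pair of bases."""
    return all(abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) < tol
               for i, j in combinations(BASES, 2))


if __name__ == "__main__":
    # Hypothetical parameters: transitions (A<->G, C<->T) 4x faster than
    # transversions, with unequal base frequencies.
    r = {frozenset(pair): 1.0 for pair in combinations(BASES, 2)}
    r[frozenset("AG")] = r[frozenset("CT")] = 4.0
    pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    Q = gtr_rate_matrix(r, pi)
    print(Q["A"]["C"], Q["C"]["A"])   # rates differ by direction...
    print(is_time_reversible(Q, pi))  # ...but the flows balance: True
```

Setting every $r_{ij}$ equal and every $\pi_j = 1/4$ makes all off-diagonal rates identical, which recovers the Jukes-Cantor model as a special case.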

There are two ways in which the rate of nucleotide substitution can be allowed to vary from position to position, a phenomenon known as among-site rate variation. First, in protein-coding genes we expect the rate of substitution to depend on codon position, since many third-position changes are synonymous while nearly all second-position changes alter the amino acid. The sequence can be divided into first, second, and third codon positions and rates calculated separately for each of those positions. Second, we can assume a priori that there is a distribution of possible rates and that this distribution is described by one of the standard distributions from probability theory. We then imagine that the substitution rate at any given site is determined by a random draw from that distribution. The gamma distribution is widely used to describe the pattern of among-site rate variation, because it can approximate a wide variety of different distributions (Figure 1).
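The "random draw" picture is easy to simulate. This sketch assumes each site's rate is an independent draw from a gamma distribution with a fixed mean; the function name and the particular $\alpha$ values are hypothetical. It uses Python's `random.gammavariate(alpha, beta)`, whose mean is `alpha * beta`, so setting `beta = mean_rate / alpha` holds the mean fixed while $\alpha$ changes only the shape.

```python
import random
import statistics


def sample_site_rates(n_sites, mean_rate, alpha, seed=None):
    """Draw an independent substitution rate for each site from a gamma distribution.

    random.gammavariate(alpha, beta) has mean alpha * beta, so beta = mean_rate / alpha
    fixes the mean at mean_rate and leaves alpha (the shape parameter) free.
    """
    rng = random.Random(seed)
    beta = mean_rate / alpha
    return [rng.gammavariate(alpha, beta) for _ in range(n_sites)]


if __name__ == "__main__":
    # Mean rate 0.1, as in Figure 1; the alpha values are arbitrary examples.
    for alpha in (0.5, 1.0, 5.0):
        rates = sample_site_rates(100_000, mean_rate=0.1, alpha=alpha, seed=1)
        print(f"alpha={alpha}: mean {statistics.fmean(rates):.4f}, "
              f"sd {statistics.pstdev(rates):.4f}")
```

The sample means all stay near 0.1, while the spread of rates among sites shrinks as $\alpha$ grows.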

Figure 1: Examples of a gamma distribution.

The mean substitution rate in each curve above is 0.1. The curves differ only in the value of a parameter, $\alpha$, called the ``shape parameter.'' The shape parameter gives a nice numerical description of how much rate variation there is, except that it's backwards. The larger the parameter, the less among-site rate variation there is.
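The ``backwards'' behavior of the shape parameter follows directly from the moments of the gamma distribution. Writing the mean rate as $\mu$ (0.1 for the curves in Figure 1) and, as an assumed but common convention, parameterizing the distribution of the rate $r$ at a site by its shape $\alpha$ and scale $\mu/\alpha$,

\begin{eqnarray*}
f(r) &=& {r^{\alpha-1}e^{-\alpha r/\mu} \over \Gamma(\alpha)\left(\mu/\alpha\right)^{\alpha}} \\
E(r) &=& \alpha\left({\mu \over \alpha}\right) = \mu \\
{\rm Var}(r) &=& \alpha\left({\mu \over \alpha}\right)^2 = {\mu^2 \over \alpha} \\
{\rm CV}(r) &=& {\sqrt{{\rm Var}(r)} \over E(r)} = {1 \over \sqrt{\alpha}} \\
\end{eqnarray*}

So the coefficient of variation of rates among sites is $1/\sqrt{\alpha}$: as $\alpha \to \infty$ every site evolves at essentially the same rate, as the Jukes-Cantor model assumes; $\alpha = 1$ gives an exponential distribution of rates; and $\alpha < 1$ gives an L-shaped distribution in which most sites evolve very slowly while a few evolve very rapidly.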


Kent Holsinger 2006-11-04