Inbreeding and self-fertilization

Introduction

Remember that long list of assumptions associated with derivation of the Hardy-Weinberg principle that we just finished? Well, we’re about to begin violating assumptions to explore the consequences, but we’re not going to violate them in order. We’re first going to violate Assumption #2:

Genotypes mate at random with respect to their genotype at this particular locus.

There are many ways in which this assumption might be violated:

When there is sexual selection or disassortative mating, genotypes differ in their chances of being included in the breeding population. As a result, allele and genotype frequencies will tend to change from one generation to the next. We’ll talk a little about these types of departures from random mating when we discuss the genetics of natural selection in a few weeks, but we’ll ignore them for now. In fact, we’ll also ignore assortative mating, since its properties are fairly similar to those of inbreeding, and inbreeding is easier to understand. We’ll also ignore asexual reproduction, since genotypes simply reproduce themselves and the genetic composition of the population doesn’t change.

Self-fertilization

Self-fertilization is the most extreme form of inbreeding possible, and it is characteristic of many flowering plants and some hermaphroditic animals, including freshwater snails and that darling of developmental genetics, Caenorhabditis elegans. It’s not too hard to figure out what the consequences of self-fertilization will be without doing any algebra: all progeny of a selfed homozygote are themselves homozygous, while half the progeny of a selfed heterozygote are heterozygous and half are homozygous.

So you’d expect that the frequency of heterozygotes would be halved every generation, that the frequency of homozygotes would increase, and that the allele frequencies wouldn’t change, and you’d be right. To see why, consider the following mating table:

\[\begin{array}{llccc}
 & & & \hbox{Offspring genotype} & \\
\hbox{Mating} & \hbox{Frequency} & A_1A_1 & A_1A_2 & A_2A_2 \\
\hline
A_1A_1 \times A_1A_1 & x_{11} & 1 & 0 & 0 \\
A_1A_2 \times A_1A_2 & x_{12} & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\
A_2A_2 \times A_2A_2 & x_{22} & 0 & 0 & 1
\end{array}\]

Using the same technique we used to derive the Hardy-Weinberg principle, we can calculate the frequency of the different offspring genotypes from the above table. \[\begin{aligned} x_{11}' &= x_{11} + x_{12}/4 \\ x_{12}' &= x_{12}/2 \\ x_{22}' &= x_{22} + x_{12}/4 \end{aligned}\]

I use the \('\) to indicate the next generation. Notice that in making this calculation I assume that all other conditions associated with Hardy-Weinberg apply (meiosis is fair, no differences among genotypes in probability of survival, no input of new genetic material, etc.). We can also calculate the frequency of the \(A_1\) allele among offspring, namely \[\begin{aligned} p' &= x_{11}' + x_{12}'/2 \\ &= x_{11} + x_{12}/4 + x_{12} /4 \\ &= x_{11} + x_{12}/2 \\ &= p \end{aligned}\]

These equations illustrate two very important principles that are true with any system of strict inbreeding:

  1. Inbreeding does not cause allele frequencies to change, but it will generally cause genotype frequencies to change.

  2. Inbreeding reduces the frequency of heterozygotes relative to Hardy-Weinberg expectations. It need not eliminate heterozygotes entirely, but it is guaranteed to reduce their frequency.
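Both principles can be checked numerically for complete selfing. The following is a minimal Python sketch of the recursion derived above; the function names and the starting genotype frequencies are illustrative assumptions, not part of the notes.

```python
def self_one_generation(x11, x12, x22):
    """Apply one generation of complete self-fertilization.

    Arguments are the frequencies of A1A1, A1A2, and A2A2 among parents.
    Returns the corresponding frequencies among offspring.
    """
    return x11 + x12 / 4, x12 / 2, x22 + x12 / 4


def allele_frequency(x11, x12, x22):
    """Frequency of A1: all of A1A1 plus half of A1A2."""
    return x11 + x12 / 2


# Assumed starting point: Hardy-Weinberg proportions with p = 0.5.
genotypes = (0.25, 0.5, 0.25)
history = [genotypes]
for _ in range(5):
    genotypes = self_one_generation(*genotypes)
    history.append(genotypes)

# Heterozygote frequency halves each generation; p stays where it started.
for generation, (x11, x12, x22) in enumerate(history):
    p = allele_frequency(x11, x12, x22)
    print(f"gen {generation}: x12 = {x12:.5f}, p = {p:.5f}")
```

Starting from any other set of genotype frequencies that sums to one should show the same pattern: the heterozygote column shrinks by half each generation while the allele frequency column stays constant.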

Partial self-fertilization

Many plants reproduce by a mixture of outcrossing and self-fertilization. To a population geneticist that means that they reproduce by a mixture of selfing and random mating. Now I’m going to pull a fast one and derive the equations that determine how genotype frequencies change from one generation to the next without using a mating table. To do so, I’m going to imagine that our population consists of a mixture of two populations. In one part of the population all of the reproduction occurs through self-fertilization and in the other part all of the reproduction occurs through random mating. If you think about it for a while, you’ll realize that this is equivalent to imagining that each plant reproduces some fraction of the time through self-fertilization and some fraction of the time through random mating. Let \(\sigma\) be the fraction of progeny produced through self-fertilization. Then \[\begin{aligned} x_{11}' &= p^2(1-\sigma) + (x_{11} + x_{12}/4)\sigma \\ x_{12}' &= 2pq(1-\sigma) + (x_{12}/2)\sigma \\ x_{22}' &= q^2(1-\sigma) + (x_{22} + x_{12}/4)\sigma \end{aligned}\] Notice that I use \(p^2\), \(2pq\), and \(q^2\) for the genotype frequencies in the part of the population that’s mating at random. Question: Why can I get away with that?
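The three equations for partial self-fertilization translate directly into code. This is a minimal Python sketch; the function name `partial_selfing_generation` and the example starting frequencies are assumptions made for illustration.

```python
def partial_selfing_generation(x11, x12, x22, sigma):
    """One generation with a fraction `sigma` of progeny produced by selfing.

    The remaining 1 - sigma of progeny come from random mating, so they are
    in Hardy-Weinberg proportions for the current allele frequency p.
    """
    p = x11 + x12 / 2
    q = 1 - p
    new_x11 = p**2 * (1 - sigma) + (x11 + x12 / 4) * sigma
    new_x12 = 2 * p * q * (1 - sigma) + (x12 / 2) * sigma
    new_x22 = q**2 * (1 - sigma) + (x22 + x12 / 4) * sigma
    return new_x11, new_x12, new_x22


# Assumed example: parents are all homozygous, half A1A1 and half A2A2,
# and half of the progeny are produced by selfing.
offspring = partial_selfing_generation(0.5, 0.0, 0.5, sigma=0.5)
print("offspring genotype frequencies:", offspring)
print("sum of frequencies:", sum(offspring))
```

Setting `sigma=0` should return Hardy-Weinberg proportions, and `sigma=1` should reproduce the complete-selfing equations, which is a quick way to sanity-check the function.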

It takes a little more algebra than it did before, but it’s not difficult to verify that the allele frequencies don’t change between parents and offspring. \[\begin{aligned} p' &=& \left\{p^2(1-\sigma) + (x_{11} + x_{12}/4)\sigma\right\} + \left\{pq(1-\sigma) + (x_{12}/4)\sigma\right\} \\ &=& p(p+q)(1-\sigma) + (x_{11} + x_{12}/2)\sigma \\ &=& p(1-\sigma) + p\sigma \\ &=& p \end{aligned}\] Because homozygous parents can always have heterozygous offspring (when they outcross), heterozygotes are never completely eliminated from the population as they are with complete self-fertilization. In fact, we can solve for the equilibrium frequency of heterozygotes, i.e., the frequency of heterozygotes reached when genotype frequencies stop changing.11 By definition, an equilibrium for \(x_{12}\) is a value such that if we put it in on the right side of equation ([eq:het]) we get it back on the left side, or in equations \[\begin{aligned} \hat x_{12} &=& 2pq(1-\sigma) + (\hat x_{12}/2)\sigma \\ \hat x_{12}(1 - \sigma/2) &=& 2pq(1-\sigma) \\ \hat x_{12} &=& \frac{2pq(1-\sigma)}{(1-\sigma/2)} \end{aligned}\]

It’s worth noting several things about this set of equations:

  1. I’m using \(\hat x_{12}\) to refer to the equilibrium frequency of heterozygotes. I’ll be using hats over variables to denote equilibrium properties throughout the course.

  2. I can solve for \(\hat x_{12}\) in terms of \(p\) because I know that \(p\) doesn’t change. If \(p\) changed, the calculations wouldn’t be nearly this simple.

  3. The equilibrium is approached gradually (or asymptotically as mathematicians would say). While a single generation of random mating will put genotypes in Hardy-Weinberg proportions (assuming all the other conditions are satisfied), many generations may be required for genotypes to approach their equilibrium frequency with partial self-fertilization.
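To make "gradually" concrete, one can subtract the equilibrium condition from the recursion for \(x_{12}'\). The generation subscript \(t\) below is notation introduced only for this sketch; it is not used elsewhere in these notes. \[\begin{aligned} x_{12}' - \hat x_{12} &= \left\{2pq(1-\sigma) + (x_{12}/2)\sigma\right\} - \left\{2pq(1-\sigma) + (\hat x_{12}/2)\sigma\right\} \\ &= \frac{\sigma}{2}\left(x_{12} - \hat x_{12}\right) \\ x_{12,t} - \hat x_{12} &= \left(\frac{\sigma}{2}\right)^t \left(x_{12,0} - \hat x_{12}\right) \end{aligned}\] So the distance between the heterozygote frequency and its equilibrium value is multiplied by \(\sigma/2\) every generation. With \(\sigma = 0\) the gap closes in a single generation, which is the Hardy-Weinberg result. With \(\sigma > 0\) the gap shrinks geometrically but never reaches exactly zero unless the population starts at equilibrium.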

Inbreeding coefficients

Now that we’ve found an expression for \(\hat x_{12}\) we can also find expressions for \(\hat x_{11}\) and \(\hat x_{22}\). The complete set of equations for the genotype frequencies with partial selfing is: \[\begin{aligned} \hat x_{11} &= p^2 + \frac{\sigma pq}{2(1-\sigma/2)} \\ \hat x_{12} &= 2pq - 2\left(\frac{\sigma pq}{2(1-\sigma/2)}\right) \\ \hat x_{22} &= q^2 + \frac{\sigma pq}{2(1-\sigma/2)} \end{aligned}\] Notice that all of those equations have a term \(\sigma/(2(1-\sigma/2))\). Let’s call that term \(f\). Then we can save ourselves a little hassle by rewriting the above equations as: \[\begin{aligned} \hat x_{11} &= p^2 + fpq \\ \hat x_{12} &= 2pq(1-f) \\ \hat x_{22} &= q^2 + fpq \end{aligned}\] Now you’re going to have to stare at this a little longer, but notice that \(\hat x_{12}\) is the frequency of heterozygotes that we observe and \(2pq\) is the frequency of heterozygotes we’d expect under Hardy-Weinberg in this population. So if we divide both sides of the equation for \(\hat x_{12}\) by \(2pq\), we get

\[\begin{aligned} 1-f &= \frac{\hat x_{12}}{2pq} \\ f &= 1 - \frac{\hat x_{12}}{2pq} \\ &= 1 - \frac{\hbox{observed heterozygosity}}{\hbox{expected heterozygosity}} \end{aligned}\]

\(f\) is the inbreeding coefficient. When defined as 1 - (observed heterozygosity)/(expected heterozygosity), it can be used to measure the extent to which a particular population departs from Hardy-Weinberg expectations. When \(f\) is defined in this way, I refer to it as the population inbreeding coefficient.

But \(f\) can also be regarded as a function of a particular system of mating. With partial self-fertilization the population inbreeding coefficient when the population has reached equilibrium is \(\sigma/(2(1-\sigma/2))\). When regarded as the inbreeding coefficient predicted by a particular system of mating, I refer to it as the equilibrium inbreeding coefficient.
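The two definitions can be compared numerically. The sketch below is illustrative only: the function names `equilibrium_f` and `population_f`, the allele frequency, and the selfing rate are all assumptions chosen for the example.

```python
def equilibrium_f(sigma):
    """Equilibrium inbreeding coefficient predicted by partial selfing."""
    return sigma / (2 * (1 - sigma / 2))


def population_f(x11, x12, x22):
    """1 - (observed heterozygosity) / (expected heterozygosity).

    Undefined when the population is fixed for one allele (p = 0 or p = 1),
    because the expected heterozygosity 2pq is then zero.
    """
    p = x11 + x12 / 2
    q = 1 - p
    return 1 - x12 / (2 * p * q)


# Assumed example values: p = 0.6 and a selfing rate of 0.5.
p, sigma = 0.6, 0.5
q = 1 - p
f = equilibrium_f(sigma)

# Equilibrium genotype frequencies from the equations in the text.
x11 = p**2 + f * p * q
x12 = 2 * p * q * (1 - f)
x22 = q**2 + f * p * q

print(f"equilibrium f = {f:.4f}")
print(f"f recovered from genotype frequencies = {population_f(x11, x12, x22):.4f}")
```

At equilibrium the two numbers should agree. In a population that has not yet reached equilibrium, `population_f` still describes the departure from Hardy-Weinberg proportions, but it need not match the value that the mating system predicts.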

We’ll encounter one more definition for \(f\) once I’ve introduced the idea of identity by descent.

Identity by descent

Self-fertilization is, of course, only one example of the general phenomenon of inbreeding, that is, non-random mating in which individuals mate with close relatives more often than expected at random. We’ve already seen that the consequences of inbreeding can be described in terms of the inbreeding coefficient, \(f\), and I’ve introduced you to two ways in which \(f\) can be defined. I’m about to introduce you to one more, but first I have to tell you about identity by descent.

Two alleles at a single locus are identical by descent if they are identical copies of the same allele in some earlier generation, i.e., both are copies that arose by DNA replication from the same ancestral sequence without any intervening mutation.

We’re more used to classifying alleles by type than by descent. Although we don’t usually say it explicitly, we regard two alleles as the “same,” i.e., identical by type, if they have the same phenotypic effects. Whether or not two alleles are identical by descent, however, is a property of their genealogical history, not of their phenotypic effects. Consider the following two scenarios:

Identity by descent

\[\begin{array}{ccccc}
 & & A_1 & \rightarrow & A_1 \\
 & \nearrow & & & \\
A_1 & & & & \\
 & \searrow & & & \\
 & & A_1 & \rightarrow & A_1
\end{array}\]

Identity by type

\[\begin{array}{ccccc}
 & & A_1 & \rightarrow & A_1 \\
 & \nearrow & & & \\
A_1 & & & & \\
 & \searrow & & & \\
 & & A_2 & \rightarrow & A_1 \\
 & \uparrow & & \uparrow & \\
 & \hbox{mutation} & & \hbox{mutation} &
\end{array}\]

In both scenarios, the alleles at the end of the process are identical in type, i.e., they’re both \(A_1\) alleles and they have the same phenotypic effect. In the second scenario, however, they are identical in type only because one of the alleles has two mutations in its history. So alleles that are identical by descent will also be identical by type, but alleles that are identical by type need not be identical by descent.

A third definition for \(f\) is the probability that two alleles chosen at random are identical by descent. Of course, there are several aspects to this definition that need to be spelled out more explicitly.

Let’s imagine for a moment, however, that we’ve traced back the ancestry of all alleles in a particular population to what we call a reference population, i.e., a population in which we regard all alleles as unrelated. That’s equivalent to saying that alleles chosen at random from this population have zero probability of being identical by descent, even if they are identical by type. Given this assumption we can write down the genotype frequencies in a descendant population once we know \(f\), where we define \(f\) as the probability that two alleles chosen at random in the descendant population are identical by descent, i.e., descended from just one of the alleles in the reference population. \[\begin{aligned} x_{11} &= p^2(1-f) + fp \\ x_{12} &= 2pq(1-f) \\ x_{22} &= q^2(1-f) + fq \quad . \end{aligned}\] It may not be immediately apparent, but you’ve actually seen these equations before in a different form. Since \(p - p^2 = p(1-p) = pq\) and \(q - q^2 = q(1-q) = pq\) these equations can be rewritten as \[\begin{aligned} x_{11} &= p^2 + fpq \\ x_{12} &= 2pq(1-f) \\ x_{22} &= q^2 + fpq \quad . \end{aligned}\]
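A quick numerical check may help confirm that the two forms agree. The values \(p = 0.6\) (so \(q = 0.4\)) and \(f = 0.25\) are assumed purely for illustration. \[\begin{aligned} x_{11} &= (0.36)(0.75) + (0.25)(0.6) = 0.27 + 0.15 = 0.42 \\ x_{12} &= 2(0.6)(0.4)(0.75) = 0.36 \\ x_{22} &= (0.16)(0.75) + (0.25)(0.4) = 0.12 + 0.10 = 0.22 \end{aligned}\] The second form gives the same homozygote frequencies, \(0.36 + (0.25)(0.24) = 0.42\) and \(0.16 + (0.25)(0.24) = 0.22\), and the three frequencies sum to one. The heterozygote frequency, 0.36, is three-quarters of the Hardy-Weinberg expectation \(2pq = 0.48\), as \(1 - f = 0.75\) requires.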

This is the third way in which \(f\) can be defined and used that we will discuss, and you can now probably see why population geneticists tend to play fast and loose with the definitions. If we ignore the distinction between identity by type and identity by descent, then the equations we used earlier to show the relationship between genotype frequencies, allele frequencies, and \(f\) (defined as a measure of departure from Hardy-Weinberg expectations) are identical to those used to show the relationship between genotype frequencies, allele frequencies, and \(f\) (defined as the probability that two randomly chosen alleles in the population are identical by descent).

Creative Commons License

These notes are licensed under the Creative Commons Attribution License. To view a copy of this license, visit or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.