Resemblance among relatives

Introduction

Just as individuals may differ from one another in phenotype because they have different genotypes, because they developed in different environments, or both, relatives may resemble one another more than they resemble other members of the population because they have similar genotypes, because they developed in similar environments, or both. In an experimental situation, we typically try to randomize individuals across environments. If we are successful, any tendency for relatives to resemble one another more than non-relatives must be due to similarities in their genotypes.

Using this insight, we can develop a statistical technique that allows us to determine how much of the variance among individuals in phenotype is a result of genetic variance and how much is due to environmental variance. Remember, we can only ask about how much of the variability is due to genetic differences, and we can only do so in a particular environment and with a particular set of genotypes, and we can only do it when we randomize genotypes across environments.

An outline of the approach

The basic approach to the analysis is either to use a linear regression of offspring phenotype on parental phenotype, which as we’ll see estimates $h_N^2$, or to use a nested analysis of variance. One of the most complete designs is a full-sib, half-sib design in which each male sires offspring from several dams but each dam mates with only one sire.

The offspring of a single dam are full-sibs (they are nested within dams). Differences among the offspring of dams indicate that there are differences in maternal “genotype” in the trait being measured. (Footnote 1: Assuming that we’ve randomized siblings across environments. If we haven’t, siblings may resemble one another because of similarities in the environment they experienced, too.)

The offspring of different dams mated to a single sire are half-sibs. Differences among the offspring of sires indicate that there are differences in paternal “genotype” in the trait being measured. (Footnote 2: You’ll see the reason for the quotes around genotype in this paragraph and the last a little later. It’s a little more complex than what I’ve suggested.)

As we’ll see, this design has the advantage that it allows both additive and dominance components of the genetic variance to be estimated. It has the additional advantage that we don’t have to assume that the distribution of environments in the offspring generation is the same as it was in the parental generation. To use the regression approach to estimate heritability, we have to assume that the distribution of environmental effects is the same in parental and offspring generations.

The gory details

OK, so I’ve given you the basic idea. Where does it come from, and how does it work? Funny you should ask. The whole approach is based on calculations of the degree to which different relatives resemble one another. For these purposes we’re going to continue our focus on phenotypes influenced by one locus with two alleles, and we’ll do the calculations in detail only for half-sib families. We start with something that may look vaguely familiar. (Footnote 3: Remember our mother-offspring combinations with Zoarces viviparus?) Take a look at Table 1.

Maternal                 Offspring genotype
genotype    Frequency    A1A1    A1A2    A2A2
A1A1        p^2          p       q       0
A1A2        2pq          p/2     1/2     q/2
A2A2        q^2          0       p       q
Table 1: Half-sib family structure in a population with genotypes in Hardy-Weinberg proportions.

Note that the probabilities in Table 1 are appropriate only if the progeny are from half-sib families. If the progeny are from full-sib families, we must specify the frequency of each of the nine possible matings (keeping track of the genotype of both mother and father) and the offspring that each will produce. (Footnote 4: To check your understanding of all of this, you might want to try to produce the appropriate table.)
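The entries of Table 1 can also be generated mechanically. The sketch below is only an illustration: the function name `half_sib_table` and the dictionary layout are my own choices, and it assumes random mating, so that the paternal gamete carries A1 with probability p regardless of the mother's genotype.

```python
def half_sib_table(p):
    """Offspring genotype probabilities for each maternal genotype.

    Assumes random mating: the paternal gamete carries A1 with
    probability p and A2 with probability q = 1 - p.
    """
    q = 1.0 - p
    # Probability that the mother transmits A1, by maternal genotype.
    maternal_a1 = {"A1A1": 1.0, "A1A2": 0.5, "A2A2": 0.0}
    # Hardy-Weinberg frequencies of the maternal genotypes.
    frequency = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}
    table = {}
    for mother, m in maternal_a1.items():
        offspring = {
            "A1A1": m * p,                # A1 from mother, A1 from father
            "A1A2": m * q + (1 - m) * p,  # one A1 and one A2, either way
            "A2A2": (1 - m) * q,          # A2 from mother, A2 from father
        }
        table[mother] = (frequency[mother], offspring)
    return table


for mother, (freq, offspring) in half_sib_table(0.4).items():
    print(mother, round(freq, 2), {g: round(v, 2) for g, v in offspring.items()})
```

With p = 0.4 the rows should agree with the numerical table used in the worked example later in these notes.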

Covariance of two random variables

Let $p_{xy}$ be the probability that random variable $X$ takes the value $x$ and random variable $Y$ takes the value $y$. Then the covariance between $X$ and $Y$ is:

$$\mathrm{Cov}(X,Y) = \sum_{x,y} p_{xy}(x-\mu_x)(y-\mu_y),$$

where $\mu_x$ is the mean of $X$ and $\mu_y$ is the mean of $Y$. The covariance between two random variables is a measure of how much they vary together, that is, how much they covary. If the covariance is large and positive, they tend to vary in the same way. Positive deviations from the mean in one are associated with positive deviations from the mean in the other, and negative deviations are similarly associated. If the covariance is large and negative, they tend to vary in opposite ways. Positive deviations from the mean in one variable are associated with negative deviations in the other, and vice versa. If the covariance is small, it means there isn’t a strong tendency for deviations from the mean in one variable to be associated with deviations in the other.
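To make the definition concrete, here is a minimal sketch that computes a covariance directly from a joint probability table. The function name `covariance` and the small joint distribution are hypothetical, chosen only for illustration.

```python
def covariance(joint):
    """Covariance of X and Y from a joint distribution.

    `joint` maps (x, y) pairs to probabilities that sum to 1.
    """
    mean_x = sum(p * x for (x, _), p in joint.items())
    mean_y = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())


# Hypothetical joint distribution in which X and Y usually agree.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(covariance(joint))  # positive (about 0.15): X and Y tend to vary together
```

Swapping the probabilities so that X and Y usually disagree gives a negative covariance of the same size.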

Covariance between half-siblings

Here’s how we can calculate the covariance between half-siblings. First, imagine selecting a huge number of half-sib pairs at random. The phenotype of the first half-sib in the pair is a random variable (call it $S_1$), as is the phenotype of the second (call it $S_2$). The mean of $S_1$ is just the mean phenotype in all the progeny taken together, $\bar{x}$. Similarly, the mean of $S_2$ is just $\bar{x}$. (Footnote 5: The reasoning here gets a little tricky, since the means of different half-sib families may be different. Think about it this way. We picked this particular half-sib family at random from among all half-sib families in the population. It takes a bit of algebra to show it, but the mean phenotype of a randomly chosen half-sib family is $\bar{x}$, meaning that $\bar{x}$ is the mean phenotype for both $S_1$ and $S_2$. They’re part of the same family, so they share the same family mean.) Now with one locus and two alleles we have three possible phenotypes: $x_{11}$ (corresponding to the genotype A1A1), $x_{12}$ (corresponding to the genotype A1A2), and $x_{22}$ (corresponding to the genotype A2A2). So all we need to do to calculate the covariance between half-sibs is to write down all possible pairs of phenotypes and the frequency with which they will occur in our sample of randomly chosen half-sibs, based on the frequencies in Table 1 above and the frequency of maternal genotypes. It’s actually a bit easier to keep track of it all if we write down the frequency of each maternal genotype and the frequency with which each possible phenotypic combination will occur in her progeny.

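$$
\begin{aligned}
\mathrm{Cov}(S_1,S_2) &= p^2\big[p^2(x_{11}-\bar{x})^2 + 2pq(x_{11}-\bar{x})(x_{12}-\bar{x}) + q^2(x_{12}-\bar{x})^2\big] \\
&\quad + 2pq\big[\tfrac{1}{4}p^2(x_{11}-\bar{x})^2 + \tfrac{1}{2}p(x_{11}-\bar{x})(x_{12}-\bar{x}) + \tfrac{1}{2}pq(x_{11}-\bar{x})(x_{22}-\bar{x}) \\
&\qquad\qquad + \tfrac{1}{4}(x_{12}-\bar{x})^2 + \tfrac{1}{2}q(x_{12}-\bar{x})(x_{22}-\bar{x}) + \tfrac{1}{4}q^2(x_{22}-\bar{x})^2\big] \\
&\quad + q^2\big[p^2(x_{12}-\bar{x})^2 + 2pq(x_{12}-\bar{x})(x_{22}-\bar{x}) + q^2(x_{22}-\bar{x})^2\big] \\
&= p^2\big[p(x_{11}-\bar{x}) + q(x_{12}-\bar{x})\big]^2 \\
&\quad + 2pq\big[\tfrac{1}{2}p(x_{11}-\bar{x}) + \tfrac{1}{2}q(x_{12}-\bar{x}) + \tfrac{1}{2}p(x_{12}-\bar{x}) + \tfrac{1}{2}q(x_{22}-\bar{x})\big]^2 \\
&\quad + q^2\big[p(x_{12}-\bar{x}) + q(x_{22}-\bar{x})\big]^2 \\
&= p^2\big[px_{11} + qx_{12} - \bar{x}\big]^2 \\
&\quad + 2pq\big[\tfrac{1}{2}(px_{11} + qx_{12} - \bar{x}) + \tfrac{1}{2}(px_{12} + qx_{22} - \bar{x})\big]^2 \\
&\quad + q^2\big[px_{12} + qx_{22} - \bar{x}\big]^2 \\
&= p^2\big[\alpha_1 - \tfrac{\bar{x}}{2}\big]^2 + 2pq\big[\tfrac{1}{2}(\alpha_1 - \tfrac{\bar{x}}{2}) + \tfrac{1}{2}(\alpha_2 - \tfrac{\bar{x}}{2})\big]^2 + q^2\big[\alpha_2 - \tfrac{\bar{x}}{2}\big]^2 \\
&= p^2\big[\tfrac{1}{2}(2\alpha_1 - \bar{x})\big]^2 + 2pq\big[\tfrac{1}{2}(\alpha_1 + \alpha_2 - \bar{x})\big]^2 + q^2\big[\tfrac{1}{2}(2\alpha_2 - \bar{x})\big]^2 \\
&= \tfrac{1}{4}\big[p^2(2\alpha_1 - \bar{x})^2 + 2pq(\alpha_1 + \alpha_2 - \bar{x})^2 + q^2(2\alpha_2 - \bar{x})^2\big] \\
&= \tfrac{1}{4}V_a
\end{aligned}
$$

The step that introduces $\alpha_1$ and $\alpha_2$ uses the additive effects of the two alleles, $\alpha_1 = px_{11} + qx_{12} - \bar{x}/2$ and $\alpha_2 = px_{12} + qx_{22} - \bar{x}/2$, so that $px_{11} + qx_{12} - \bar{x} = \alpha_1 - \bar{x}/2$ and $px_{12} + qx_{22} - \bar{x} = \alpha_2 - \bar{x}/2$.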

A numerical example

Now we’ll return to an example we saw earlier (Table 2). This set of genotypes and phenotypes may look familiar. It is the same one we encountered earlier when we calculated additive and dominance components of variance. Let’s assume that p=0.4. Then we know that

$$
\begin{aligned}
\bar{x} &= 54.4 \\
V_a &= 1505.28 \\
V_d &= 207.36.
\end{aligned}
$$
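These three numbers can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions: the function name `variance_components` is mine, the additive effects are taken to be α1 = p·x11 + q·x12 − x̄/2 and α2 = p·x12 + q·x22 − x̄/2 (the definitions implied by the half-sib derivation above), and the dominance variance is obtained as the total genetic variance minus Va.

```python
def variance_components(p, x11, x12, x22):
    """Mean phenotype, additive variance and dominance variance
    for one locus with two alleles in Hardy-Weinberg proportions."""
    q = 1.0 - p
    mean = p * p * x11 + 2 * p * q * x12 + q * q * x22
    # Additive effects of A1 and A2 (assumed definitions, see lead-in).
    alpha1 = p * x11 + q * x12 - mean / 2
    alpha2 = p * x12 + q * x22 - mean / 2
    va = (p * p * (2 * alpha1 - mean) ** 2
          + 2 * p * q * (alpha1 + alpha2 - mean) ** 2
          + q * q * (2 * alpha2 - mean) ** 2)
    # Total genetic variance among the three genotypes.
    vg = (p * p * (x11 - mean) ** 2
          + 2 * p * q * (x12 - mean) ** 2
          + q * q * (x22 - mean) ** 2)
    return mean, va, vg - va


mean, va, vd = variance_components(0.4, 100, 80, 0)
print(round(mean, 2), round(va, 2), round(vd, 2))
```

Rounded to two decimals, the output should match the values quoted above for p = 0.4 and phenotypes 100, 80, and 0.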

We can also calculate the numerical version of Table 1, which you’ll find in Table 3.

Genotype     A1A1    A1A2    A2A2
Phenotype    100     80      0
Table 2: An example of a non-additive relationship between genotypes and phenotypes.

Maternal                 Offspring genotype
genotype    Frequency    A1A1    A1A2    A2A2
A1A1        0.16         0.4     0.6     0.0
A1A2        0.48         0.2     0.5     0.3
A2A2        0.36         0.0     0.4     0.6
Table 3: Mother-offspring combinations (half-sib) when the frequency of A1 is 0.4.

So now we can follow the same approach we did before and calculate the numerical value of the covariance between half-sibs in this example:

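$$
\begin{aligned}
\mathrm{Cov}(S_1,S_2) &= \big[(0.4)^2(0.16) + (0.2)^2(0.48)\big](100-54.4)^2 \\
&\quad + \big[(0.6)^2(0.16) + (0.5)^2(0.48) + (0.4)^2(0.36)\big](80-54.4)^2 \\
&\quad + \big[(0.3)^2(0.48) + (0.6)^2(0.36)\big](0-54.4)^2 \\
&\quad + 2\big[(0.4)(0.6)(0.16) + (0.2)(0.5)(0.48)\big](100-54.4)(80-54.4) \\
&\quad + 2(0.2)(0.3)(0.48)(100-54.4)(0-54.4) \\
&\quad + 2\big[(0.5)(0.3)(0.48) + (0.4)(0.6)(0.36)\big](80-54.4)(0-54.4) \\
&= 376.32 \\
&= \tfrac{1}{4}(1505.28).
\end{aligned}
$$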
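The same sum is easy to get wrong by hand, so here is a short sketch that carries it out by looping over maternal genotypes and over ordered pairs of offspring genotypes. The function name `half_sib_covariance` is hypothetical; the family structure is the one in Table 1, which assumes random mating and Hardy-Weinberg proportions.

```python
def half_sib_covariance(p, phenotypes):
    """Covariance between two half-sibs sharing a mother.

    `phenotypes` is (x11, x12, x22) for genotypes A1A1, A1A2, A2A2.
    """
    q = 1.0 - p
    x11, x12, x22 = phenotypes
    mean = p * p * x11 + 2 * p * q * x12 + q * q * x22
    # (maternal genotype frequency, offspring genotype probabilities)
    families = [
        (p * p, (p, q, 0.0)),            # A1A1 mothers
        (2 * p * q, (p / 2, 0.5, q / 2)),  # A1A2 mothers
        (q * q, (0.0, p, q)),            # A2A2 mothers
    ]
    cov = 0.0
    for freq, probs in families:
        # Ordered pairs (first sib, second sib), so each mixed pair is
        # visited twice; that supplies the factor of 2 on cross terms.
        for pi, xi in zip(probs, phenotypes):
            for pj, xj in zip(probs, phenotypes):
                cov += freq * pi * pj * (xi - mean) * (xj - mean)
    return cov


print(round(half_sib_covariance(0.4, (100, 80, 0)), 2))
```

For this example the result should agree with the hand calculation, one quarter of the additive genetic variance.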

Covariances among relatives

Well, if we can do this sort of calculation for half-sibs, you can probably guess that it’s also possible to do it for other relatives. I won’t go through all of the calculations, but the results for common forms of relationship are summarized in Table 4.

Relationship                        Covariance
MZ twins (Cov_MZ)                   Va + Vd
Parent-offspring (Cov_PO) [1]       (1/2)Va
Full sibs (Cov_FS)                  (1/2)Va + (1/4)Vd
Half sibs (Cov_HS)                  (1/4)Va
[1] One parent or mid-parent.
Table 4: Genetic covariances among relatives.

Estimating heritability

Galton introduced the term regression to describe the inheritance of height in humans. He noted that there is a tendency for adult offspring of tall parents to be tall and of short parents to be short, but he also noted that offspring tended to be less extreme than the parents. He described this as a “regression to mediocrity,” and statisticians adopted the term to describe a standard technique for describing the functional relationship between two variables. (Footnote 6: It’s worth noting that Galton is often “credited” with establishing the field of eugenics. He was a proponent of encouraging the “best” people to marry one another to “improve” the human race. There is a building at University College London named in his honor, the Galton Laboratory. The University is considering changing its name (https://www.dailymail.co.uk/sciencetech/article-6466845/UCL-rename-buildings-honouring-Sir-Francis-Galton-known-father-eugenics.html).)

Regression analysis

Measure the parents. Regress the offspring phenotype on: (1) the phenotype of one parent or (2) the mean of the parental phenotypes. In either case, the covariance between the parental phenotype and the offspring genotype is $\tfrac{1}{2}V_a$. Now the regression coefficient between one parent and offspring, $b_{PO}$, is

$$
\begin{aligned}
b_{PO} &= \frac{\mathrm{Cov}_{PO}}{\mathrm{Var}(P)} \\
&= \frac{\tfrac{1}{2}V_a}{V_p} \\
&= \tfrac{1}{2}h_N^2.
\end{aligned}
$$

In short, the slope of the regression line is equal to one-half the narrow sense heritability. In the regression of offspring on mid-parent value,

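$$
\begin{aligned}
\mathrm{Var}(MP) &= \mathrm{Var}\left(\frac{M+F}{2}\right) \\
&= \tfrac{1}{4}\mathrm{Var}(M+F) \\
&= \tfrac{1}{4}\big(\mathrm{Var}(M)+\mathrm{Var}(F)\big) \\
&= \tfrac{1}{4}(2V_p) \\
&= \tfrac{1}{2}V_p.
\end{aligned}
$$

Thus, $b_{MPO} = \tfrac{1}{2}V_a / \tfrac{1}{2}V_p = h_N^2$. In short, the slope of the regression line is equal to the narrow sense heritability.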
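A small simulation can make the two slopes tangible. The sketch below is not the analysis described in these notes, and it rests on assumptions I am adding for illustration: a purely additive trait with normally distributed breeding values, random mating, independent environmental effects, and a Mendelian sampling term with variance Va/2. The function names and the values Va = 0.4 and Ve = 0.6 are hypothetical.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(va, ve, n_families=20000, seed=1):
    """Return (offspring-on-one-parent, offspring-on-mid-parent) slopes."""
    rng = random.Random(seed)
    sd_a, sd_e = va ** 0.5, ve ** 0.5
    sd_seg = (va / 2) ** 0.5  # Mendelian sampling (assumed additive model)
    one_parent, mid_parent, offspring = [], [], []
    for _ in range(n_families):
        a_m, a_f = rng.gauss(0, sd_a), rng.gauss(0, sd_a)  # breeding values
        p_m = a_m + rng.gauss(0, sd_e)                     # mother phenotype
        p_f = a_f + rng.gauss(0, sd_e)                     # father phenotype
        a_o = (a_m + a_f) / 2 + rng.gauss(0, sd_seg)       # offspring breeding value
        one_parent.append(p_m)
        mid_parent.append((p_m + p_f) / 2)
        offspring.append(a_o + rng.gauss(0, sd_e))
    return slope(one_parent, offspring), slope(mid_parent, offspring)


b_po, b_mpo = simulate_slopes(va=0.4, ve=0.6)
print(b_po, b_mpo)  # expected to be near h2/2 = 0.2 and h2 = 0.4
```

Because the slopes are estimated from a finite sample, they will only approximate one-half the heritability and the heritability, respectively.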

Sib analysis

Mate a number of males (sires) with a number of females (dams). Each sire is mated to more than one dam, but each dam mates only with one sire. Do an analysis of variance on the phenotype in the progeny, treating sire and dam as main effects. The result is shown in Table 5.

Source                         d.f.       Mean square    Composition of mean square
Among sires                    s-1        MS_S           $\sigma_W^2 + k\sigma_D^2 + dk\sigma_S^2$
Among dams (within sires)      s(d-1)     MS_D           $\sigma_W^2 + k\sigma_D^2$
Within progenies               sd(k-1)    MS_W           $\sigma_W^2$
s = number of sires
d = number of dams per sire
k = number of offspring per dam
Table 5: Analysis of variance table for a full-sib analysis of quantitative genetic variation.

Now we need some way to relate the variance components ($\sigma_W^2$, $\sigma_D^2$, and $\sigma_S^2$) to $V_a$, $V_d$, and $V_e$. (Footnote 7: $\sigma_W^2$, $\sigma_D^2$, and $\sigma_S^2$ are often referred to as the observational components of variance, because they are estimated from observations we make on phenotypic variation. $V_a$, $V_d$, and $V_e$ are often referred to as the causal components of variance, because they represent the genetic and environmental influences on trait expression.) How do we do that? Well,

$$V_p = \sigma_T^2 = \sigma_S^2 + \sigma_D^2 + \sigma_W^2.$$

$\sigma_S^2$ estimates the variance among the means of the half-sib families fathered by each of the different sires or, equivalently, the covariance among half-sibs. (Footnote 8: To see why this is so, consider the following: The mean genotypic value of half-sib families with an A1A1 mother is $px_{11} + qx_{12}$; with an A1A2 mother, $px_{11}/2 + qx_{12}/2 + px_{12}/2 + qx_{22}/2$; with an A2A2 mother, $px_{12} + qx_{22}$. The equation for the variance among these means is identical to the equation for the covariance among half-sibs.)

$$
\begin{aligned}
\sigma_S^2 &= \mathrm{Cov}_{HS} \\
&= \tfrac{1}{4}V_a.
\end{aligned}
$$

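Now consider the within progeny component of the variance, $\sigma_W^2$. In general, it can be shown that any among group variance component is equal to the covariance among the members within the groups. (Footnote 9: With $x_{ij} = a_i + \epsilon_{ij}$, where $a_i$ is the mean group effect and $\epsilon_{ij}$ is a random effect on individual $j$ in group $i$ (with mean 0), $\mathrm{Cov}(x_{ij}, x_{ik}) = E[(a_i + \epsilon_{ij} - \mu)(a_i + \epsilon_{ik} - \mu)] = E[(a_i - \mu)^2 + (a_i - \mu)(\epsilon_{ij} + \epsilon_{ik}) + \epsilon_{ij}\epsilon_{ik}] = \mathrm{Var}(A)$, since the last two terms have expectation zero when the individual effects are independent of the group effect and of one another.) Thus, a within group component of the variance is equal to the total variance minus the covariance within groups. In this case,

$$
\begin{aligned}
\sigma_W^2 &= V_p - \mathrm{Cov}_{FS} \\
&= V_a + V_d + V_e - \big[\tfrac{1}{2}V_a + \tfrac{1}{4}V_d\big] \\
&= \tfrac{1}{2}V_a + \tfrac{3}{4}V_d + V_e.
\end{aligned}
$$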
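The claim that an among-group variance component equals the covariance among members of the same group can be checked exactly on a toy example. The sketch below enumerates every outcome for two members of one group; the discrete distributions for the group effect and the individual effects are hypothetical, and the individual effects are assumed independent of the group effect and of each other.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distributions as (value, probability) pairs.
group_effects = [(-2, Fraction(1, 4)), (0, Fraction(1, 2)), (2, Fraction(1, 4))]
residuals = [(-1, Fraction(1, 2)), (1, Fraction(1, 2))]


def variance(dist):
    """Variance of a discrete distribution of (value, probability) pairs."""
    mean = sum(p * v for v, p in dist)
    return sum(p * (v - mean) ** 2 for v, p in dist)


def within_group_covariance(group_effects, residuals):
    """Covariance between two members j and k of the same group."""
    outcomes = [
        ((a + ej, a + ek), pa * pj * pk)
        for (a, pa), (ej, pj), (ek, pk) in product(group_effects, residuals, residuals)
    ]
    mean_j = sum(p * xj for (xj, _), p in outcomes)
    mean_k = sum(p * xk for (_, xk), p in outcomes)
    return sum(p * (xj - mean_j) * (xk - mean_k) for (xj, xk), p in outcomes)


print(variance(group_effects), within_group_covariance(group_effects, residuals))
```

Both quantities come out equal for these distributions, and the total variance of a single member exceeds them by the variance of the individual effects.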

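There remains only $\sigma_D^2$. Now $\sigma_W^2 = V_p - \mathrm{Cov}_{FS}$, $\sigma_S^2 = \mathrm{Cov}_{HS}$, and $\sigma_T^2 = V_p$. Thus,

$$
\begin{aligned}
\sigma_D^2 &= \sigma_T^2 - \sigma_S^2 - \sigma_W^2 \\
&= V_p - \mathrm{Cov}_{HS} - (V_p - \mathrm{Cov}_{FS}) \\
&= \mathrm{Cov}_{FS} - \mathrm{Cov}_{HS} \\
&= \big[\tfrac{1}{2}V_a + \tfrac{1}{4}V_d\big] - \tfrac{1}{4}V_a \\
&= \tfrac{1}{4}V_a + \tfrac{1}{4}V_d.
\end{aligned}
$$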

So if we rearrange these equations, we can express the genetic components of the phenotypic variance, the causal components of variance, as simple functions of the observational components of variance:

$$
\begin{aligned}
V_a &= 4\sigma_S^2 \\
V_d &= 4(\sigma_D^2 - \sigma_S^2) \\
V_e &= \sigma_W^2 - 3\sigma_D^2 + \sigma_S^2.
\end{aligned}
$$
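One way to convince yourself that the rearrangement is right is to go from causal components to observational components and back again. The sketch below does that round trip; both function names and the test values are hypothetical, and the forward mapping simply restates the three relationships derived above.

```python
def observational_components(va, vd, ve):
    """Expected (sigma_S^2, sigma_D^2, sigma_W^2) given Va, Vd, Ve."""
    sigma_s2 = va / 4
    sigma_d2 = va / 4 + vd / 4
    sigma_w2 = va / 2 + 3 * vd / 4 + ve
    return sigma_s2, sigma_d2, sigma_w2


def causal_components(sigma_s2, sigma_d2, sigma_w2):
    """Recover (Va, Vd, Ve) from the observational components."""
    va = 4 * sigma_s2
    vd = 4 * (sigma_d2 - sigma_s2)
    ve = sigma_w2 - 3 * sigma_d2 + sigma_s2
    return va, vd, ve


# Hypothetical causal components, chosen only to exercise the round trip.
obs = observational_components(8.0, 4.0, 2.0)
print(obs, causal_components(*obs))
```

The second tuple printed should reproduce the values fed into the first function.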

Furthermore, the narrow-sense heritability is given by

$$h_N^2 = \frac{4\sigma_S^2}{\sigma_S^2 + \sigma_D^2 + \sigma_W^2}.$$

An example: body weight in female mice

The analysis involves 719 offspring from 74 sires and 192 dams, each with one litter. The offspring were spread over 4 generations, and the analysis is performed as a nested ANOVA with the genetic analysis nested within generations. An additional complication is that the design was unbalanced, i.e., unequal numbers of progeny were measured in each sibship. As a result the degrees of freedom don’t work out to be quite as simple as what I showed you. (Footnote 10: What did you expect from real data? This example is extracted from Falconer and Mackay, pp. 169–170. See the book for details.) The results are summarized in Table 6.

Source                         d.f.    Mean square    Composition of mean square
Among sires                    70      17.10          $\sigma_W^2 + k'\sigma_D^2 + dk'\sigma_S^2$
Among dams (within sires)      118     10.79          $\sigma_W^2 + k\sigma_D^2$
Within progenies               527     2.19           $\sigma_W^2$
d = 2.33
k = 3.48
k' = 4.16
Table 6: Quantitative genetic analysis of the inheritance of body weight in female mice (from Falconer and Mackay, pp. 169–170).

Using the expressions for the composition of the mean square we obtain

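$$
\begin{aligned}
\sigma_W^2 &= MS_W \\
&= 2.19 \\
\sigma_D^2 &= \frac{1}{k}(MS_D - \sigma_W^2) \\
&= 2.47 \\
\sigma_S^2 &= \frac{1}{dk'}(MS_S - \sigma_W^2 - k'\sigma_D^2) \\
&= 0.48.
\end{aligned}
$$

Here $k = 3.48$ and, because the design is unbalanced, the among-sires mean square uses the coefficient $k' = 4.16$ in place of $k$.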
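The arithmetic can be reproduced as follows. This is a sketch, not the original analysis: the function name is hypothetical, and because Table 6 lists two coefficients for the number of offspring per dam (3.48 and 4.16), I assume that 3.48 applies to the among-dams mean square and 4.16 to the among-sires mean square. That reading is the one that reproduces the reported estimates.

```python
def nested_variance_components(ms_sires, ms_dams, ms_within, d, k, k_prime):
    """Observational variance components from nested ANOVA mean squares.

    k is the coefficient in the among-dams mean square; k_prime is the
    coefficient in the among-sires mean square (assumed, see lead-in).
    """
    sigma_w2 = ms_within
    sigma_d2 = (ms_dams - sigma_w2) / k
    sigma_s2 = (ms_sires - sigma_w2 - k_prime * sigma_d2) / (d * k_prime)
    return sigma_s2, sigma_d2, sigma_w2


sigma_s2, sigma_d2, sigma_w2 = nested_variance_components(
    ms_sires=17.10, ms_dams=10.79, ms_within=2.19, d=2.33, k=3.48, k_prime=4.16
)
print(round(sigma_s2, 2), round(sigma_d2, 2), round(sigma_w2, 2))
```

In a balanced design the two coefficients coincide, and the function can be called with `k_prime` equal to `k`.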

Thus,

$$
\begin{aligned}
V_p &= 5.14 \\
V_a &= 1.92 \\
V_d + V_e &= 3.22 \\
V_d &= (0.00\text{ to }1.64) \\
V_e &= (1.58\text{ to }3.22)
\end{aligned}
$$

Why didn’t I give a definite number for $V_d$ after my big spiel above about how we can estimate it from a full-sib crossing design? Two reasons. First, if you plug the estimates for $\sigma_D^2$ and $\sigma_S^2$ into the formulas above you get $V_d = 7.96$ and $V_e = -4.74$, which is clearly impossible: $V_d$ has to be less than $V_p$, and $V_e$, being a variance, cannot be negative. Second, the experimental design confounds two sources of resemblance among full siblings: (1) genetic covariance and (2) environmental covariance. The full-sib families were all raised by the same mother in the same pen. Hence, we don’t know to what extent their resemblance is due to a common natal environment. (Footnote 11: Notice that this doesn’t affect our analysis of half-sib families, i.e., the progeny of different sires, since each father was bred with several females.) If we assume $V_d = 0$, we can estimate the amount of variance accounted for by exposure to a common natal environment, $V_{Ec} = 1.99$, and by environmental variation within sibships, $V_{Ew} = 1.23$. (Footnote 12: See Falconer for details.) Similarly, if we assume $V_{Ew} = 0$, then $V_d = 1.64$ and $V_{Ec} = 1.58$. In any case, we can estimate the narrow sense heritability as

$$
\begin{aligned}
h_N^2 &= \frac{1.92}{5.14} \\
&= 0.37.
\end{aligned}
$$
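The two bounding scenarios and the heritability can be recomputed from the rounded variance components. The sketch below is my reconstruction rather than the procedure in Falconer and Mackay: it assumes that a common natal environment adds V_Ec to the among-dams component, so that σD² = Va/4 + Vd/4 + V_Ec and σW² = Va/2 + (3/4)Vd + V_Ew. The function names are hypothetical.

```python
def heritability(sigma_s2, sigma_d2, sigma_w2):
    """Narrow-sense heritability from the observational components."""
    return 4 * sigma_s2 / (sigma_s2 + sigma_d2 + sigma_w2)


def assume_no_dominance(sigma_s2, sigma_d2, sigma_w2):
    """Return (V_Ec, V_Ew) when Vd is assumed to be zero."""
    va = 4 * sigma_s2
    return sigma_d2 - va / 4, sigma_w2 - va / 2


def assume_no_within_environment(sigma_s2, sigma_d2, sigma_w2):
    """Return (Vd, V_Ec) when V_Ew is assumed to be zero."""
    va = 4 * sigma_s2
    vd = (sigma_w2 - va / 2) * 4 / 3
    return vd, sigma_d2 - va / 4 - vd / 4


components = (0.48, 2.47, 2.19)  # rounded sigma_S^2, sigma_D^2, sigma_W^2
print(round(heritability(*components), 2))
print([round(v, 2) for v in assume_no_dominance(*components)])
print([round(v, 2) for v in assume_no_within_environment(*components)])
```

Under these assumptions the printed values should agree with the figures quoted in the text; the heritability does not depend on which scenario is adopted, because it uses only the sire component and the total.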

Creative Commons License

These notes are licensed under the Creative Commons Attribution License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.