As I’ve tried to make clear in lecture, drift is complicated. As I’m sure you’ve noticed, it can also be confusing. The purpose of the simulations last week, this week, and next week is to let you explore the complications a bit more by presenting more realistic models of the drift process and comparing them to the analytical expectations we’ve derived (or I’ve presented in class).1
Although it is very simple, Sewall Wright’s infinite island model of migration is a useful way to explore how drift, migration, and mutation interact to influence the genetic structure of populations that are divided into subpopulations. In this model we assume that
Every allele introduced into a population differs from any of the resident alleles.
Every population receives the same fraction, \(m\), of migrants.2
Given those assumptions and assuming an infinite alleles model of mutation, we showed that3
\[ F_{ST} = \frac{1}{4N_e(m + \mu) + 1} \quad . \]
A more realistic model of migration is the finite island model of migration.4 It’s still very simple, but it adds one thing that makes it a bit more realistic:
It assumes that there are a finite number of populations exchanging migrants with one another. Every population still receives the same fraction of migrants, \(m\), but we now also specify that the fraction of migrants coming from any one population into this one is \(m/(k - 1)\), where \(k\) is the number of populations.
The formula for \(F_{ST}\) under the
finite island model is a lot more complicated than the one above. If
you’re interested, you can look in the R
code below.5 Let’s
compare the predictions for a few values of \(m\) and \(k\) with \(N_e =
100\) and \(\mu = 10^{-3}\).
rm(list = ls())
infinite_island <- function(n_e, m, mu) {
return(1/(4*n_e*(m + mu) + 1))
}
lambda <- function(mu, m, k) {
m_prime <- m*k/(k - 1)
l <- (1/2)*((1 - mu)^(-2)*(1- m_prime)^(-2) - 1)
return(l)
}
gamma <- function(mu, m, k) {
m_prime <- m*k/(k - 1)
g <- (1 + m_prime*(2 - m_prime)/(k*mu*(2 - mu)*(1 - m_prime)^2))^(-1)
return(g)
}
finite_island <- function(n_e, m, mu, n_islands) {
l <- lambda(mu, m, n_islands)
g <- gamma(mu, m, n_islands)
h_s <- 4*n_e*l*g/(4*n_e*l*g + 1)
h_b <- h_s*(1 + 4*n_e*l)/(4*n_e*l)
h_t <- ((n_islands - 1)/n_islands)*h_b + (1/n_islands)*h_s
f_st <- (h_t - h_s)/h_t
return(f_st)
}
n_e <- 100
mu <- 10^(-4)
k_vals <- rep(c(5, 25, 250, 2500), 2)
m_vals <- c(rep(0.01, 4), rep(0.05, 4))
dat <- data.frame(k = k_vals,
m = m_vals,
Infinite = rep(NA, length(k_vals)),
Finite = rep(NA, length(m_vals)),
Ratio = rep(NA, length(k_vals)))
for (i in 1:nrow(dat)) {
dat$Infinite[i] <- infinite_island(n_e, dat$m[i], mu)
dat$Finite[i] <- finite_island(n_e, dat$m[i], mu, dat$k[i])
dat$Ratio[i] <- dat$Infinite[i]/dat$Finite[i]
}
round(dat, 3)
From the table above, it’s apparent that predictions of the infinite island and finite island models are very different when the number of islands is small. Let’s run a simulation of the finite island model and see how well the simulation results match up to each prediction.
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ───────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5 ✔ purrr 1.0.1
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 0.5.1
── Conflicts ──────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(hierfstat)
initialize <- function(k, n_loci) {
p <- array(dim = c(n_loci, k, 2))
for (j in 1:n_loci) {
for (i in 1:k) {
p[j, i, 1] <- runif(n = 1, min = 0, max = 1)
p[j, i, 2] <- 1 - p[j, i, 1]
}
}
return(p)
}
make_symmetric_matrix <- function(m, k) {
M <- diag(x = 1 - m, nrow = k)
for (i in 1:(k - 1)) {
for (j in (i+1):k) {
M[i, j] <- m/(k-1)
M[j, i] <- M[i, j]
}
}
return(M)
}
sample_two_alleles <- function(p) {
allele_1 <- ((runif(1) < p) + 1)*10
allele_2 <- (runif(1) < p) + 1
return(allele_1 + allele_2)
}
make_sample <- function(p, n_sample) {
n_pops <- dim(p)[2]
n_loci <- dim(p)[1]
names <- paste("Locus", seq(1:n_loci), sep = "")
pops <- numeric(n_pops*n_sample)
ct <- 1
for (k in 1:n_pops) {
population <- paste("Population", k, sep = "")
for (i in 1:n_sample) {
pops[ct] <- population
ct <- ct + 1
}
}
locus_data <- matrix(nrow = n_pops*n_sample, ncol = n_loci)
for (i in 1:n_loci) {
ct <- 1
for (k in 1:n_pops) {
for (n in 1:n_sample) {
locus_data[ct, i] <- sample_two_alleles(p[i, k, 1])
ct <- ct + 1
}
}
}
df <- data.frame(locus_data)
colnames(df) <- paste("Locus", seq(1:n_loci), sep = "")
df$Population <- pops
df <- relocate(df, Population)
return(df)
}
simulate <- function(n_e, m, mu, n_islands, n_loci, n_gen, n_sample) {
p <- initialize(n_islands, n_loci)
M <- make_symmetric_matrix(m, n_islands)
V <- make_symmetric_matrix(mu, 2)
g_st <- numeric(n_loci)
p_star <- array(dim = c(n_loci, n_islands, 2))
for (i in 1:n_gen) {
for (k in 1:n_loci) {
p_star[k, , ] <- M %*% p[k, , ] %*% V
for (j in 1:n_islands) {
p[k, j, ] <- rmultinom(1, 2*n_e, p_star[k, j, ])/(2*n_e)
}
mu_p <- mean(p[k, , 1])
g_st[k] <- ((n_islands - 1)/n_islands)*var(p[k, , 1])/(mu_p*(1 - mu_p))
}
}
df <- make_sample(p, n_sample)
return(wc(df)$FST)
}
run_simulation <- function(n_e, m, mu, n_islands, n_loci, n_gen,
n_sample, n_repetitions)
{
f_st <- numeric(n_repetitions)
for (i in 1:n_repetitions) {
if (n_repetitions > 50) {
cat(".", sep="")
if ((i %% 50) == 0) {
cat(i, "\n", sep ="")
}
}
f_st[i] <-simulate(n_e, m, mu, n_islands, n_loci, n_gen, n_sample)
}
return(f_st)
}
plot_simulation <- function(f_st, n_e, m, mu, n_islands) {
df <- data.frame(F_st = f_st)
p <- ggplot(df, aes(x = F_st)) +
geom_histogram(bins = 20, alpha = 0.4) +
geom_vline(xintercept = infinite_island(n_e, m, mu),
linetype = "dashed",
color = "red") +
geom_vline(xintercept = finite_island(n_e, m, mu, n_islands),
linetype = "dashed",
color = "blue") +
ggtitle(paste("N_e = ", n_e, ", m = ", m, ", mu = ", mu,
", n_islands = ", n_islands, sep = "")) +
theme_bw()
return(p)
}
f_st <- run_simulation(100, 0.05, 1e-4, 5, 10, 1000, 25, 1000)
..................................................50
..................................................100
..................................................150
..................................................200
..................................................250
..................................................300
..................................................350
..................................................400
..................................................450
..................................................500
..................................................550
..................................................600
..................................................650
..................................................700
..................................................750
..................................................800
..................................................850
..................................................900
..................................................950
..................................................1000
plot_simulation(f_st, 100, 0.05, 1e-4, 5)
The dashed red line is the prediction from the infinite island model. The dashed blue line is the prediction from the finite island model.
The last plot shows that neither the infinite island nor the finite island prediction captures the results of this particular simulation very well, although the finite island prediction is reasonably close to the average outcome from the simulations. Even so, it only predicts on average and there’s a lot of variation from one replicate to the next. This outcome reflects the “evolutionary” or “genetic” sampling we discussed when comparing Nei’s \(G_{ST}\) with Weir and Cockerham’s \(\theta\). This week’s lab exercise will explore that variability a bit further and explore conditions that affect the amount of variability and how useful the infinite island approximation is. To do so, you’ll want to use the following code.6 You’ll be changing
n_loci
- The number of loci included in the genetic
sample andn_gen
- The number of generations for which to run the
simulation.In addition to n_loci
and n_gen
there are
two more parameters
n_sample
- The number of individuals sampled in each
population (and used in making an estimate of \(F_{ST}\)) andn_repetitions
- The number of times that results from a
particular combination of N_e
, m
,
mu
, and n_islands
will be simulated for a
particular choice of n_loci
, n_gen
, and
n_sample
.For each choice of these four parameters, the simulation will run
through all combinations of N_e = (25, 200, 250, 500)
,
m = (0.05, 0.01, 0.005, 0.001)
,
mu = (0.001, 0.0001)
, and
n_islands = (10, 25, 50)
.
## These are the parameters to change
##
n_loci <- 10
n_gen <- 1000
## Leave these parameters unchanged
##
n_sample <- 25
n_repetitions <- 10
df <- data.frame(N_e = NA, m = NA, mu = NA, n_islands = NA, Observed = NA,
Infinite = NA, Finite = NA)
for (n_e in c(25, 100, 250, 500)) {
for (m in c(0.05, 0.01, 0.005, 0.001)) {
for (mu in c(0.001, 1e-4)) {
for (n_islands in c(10, 25, 50)) {
cat("N_e = ", n_e, ", m = ", m, ", mu = ", mu,
"n_islands = ", n_islands, "\n" )
f_st <- run_simulation(n_e, m, mu, n_islands, n_loci, n_gen,
n_sample, n_repetitions)
df <- add_row(df,
N_e = rep(n_e, n_repetitions),
m = rep(m, n_repetitions),
mu = rep(mu, n_repetitions),
n_islands = rep(n_islands, n_repetitions),
Observed = f_st,
Infinite = infinite_island(n_e, m, mu),
Finite = finite_island(n_e, m, mu, n_islands))
}
}
}
}
N_e = 25 , m = 0.05 , mu = 0.001 n_islands = 10
N_e = 25 , m = 0.05 , mu = 0.001 n_islands = 25
N_e = 25 , m = 0.05 , mu = 0.001 n_islands = 50
N_e = 25 , m = 0.05 , mu = 1e-04 n_islands = 10
N_e = 25 , m = 0.05 , mu = 1e-04 n_islands = 25
N_e = 25 , m = 0.05 , mu = 1e-04 n_islands = 50
N_e = 25 , m = 0.01 , mu = 0.001 n_islands = 10
N_e = 25 , m = 0.01 , mu = 0.001 n_islands = 25
N_e = 25 , m = 0.01 , mu = 0.001 n_islands = 50
N_e = 25 , m = 0.01 , mu = 1e-04 n_islands = 10
N_e = 25 , m = 0.01 , mu = 1e-04 n_islands = 25
N_e = 25 , m = 0.01 , mu = 1e-04 n_islands = 50
N_e = 25 , m = 0.005 , mu = 0.001 n_islands = 10
N_e = 25 , m = 0.005 , mu = 0.001 n_islands = 25
N_e = 25 , m = 0.005 , mu = 0.001 n_islands = 50
N_e = 25 , m = 0.005 , mu = 1e-04 n_islands = 10
N_e = 25 , m = 0.005 , mu = 1e-04 n_islands = 25
N_e = 25 , m = 0.005 , mu = 1e-04 n_islands = 50
N_e = 25 , m = 0.001 , mu = 0.001 n_islands = 10
N_e = 25 , m = 0.001 , mu = 0.001 n_islands = 25
N_e = 25 , m = 0.001 , mu = 0.001 n_islands = 50
N_e = 25 , m = 0.001 , mu = 1e-04 n_islands = 10
N_e = 25 , m = 0.001 , mu = 1e-04 n_islands = 25
N_e = 25 , m = 0.001 , mu = 1e-04 n_islands = 50
N_e = 100 , m = 0.05 , mu = 0.001 n_islands = 10
N_e = 100 , m = 0.05 , mu = 0.001 n_islands = 25
N_e = 100 , m = 0.05 , mu = 0.001 n_islands = 50
N_e = 100 , m = 0.05 , mu = 1e-04 n_islands = 10
N_e = 100 , m = 0.05 , mu = 1e-04 n_islands = 25
N_e = 100 , m = 0.05 , mu = 1e-04 n_islands = 50
N_e = 100 , m = 0.01 , mu = 0.001 n_islands = 10
N_e = 100 , m = 0.01 , mu = 0.001 n_islands = 25
N_e = 100 , m = 0.01 , mu = 0.001 n_islands = 50
N_e = 100 , m = 0.01 , mu = 1e-04 n_islands = 10
N_e = 100 , m = 0.01 , mu = 1e-04 n_islands = 25
N_e = 100 , m = 0.01 , mu = 1e-04 n_islands = 50
N_e = 100 , m = 0.005 , mu = 0.001 n_islands = 10
N_e = 100 , m = 0.005 , mu = 0.001 n_islands = 25
N_e = 100 , m = 0.005 , mu = 0.001 n_islands = 50
N_e = 100 , m = 0.005 , mu = 1e-04 n_islands = 10
N_e = 100 , m = 0.005 , mu = 1e-04 n_islands = 25
N_e = 100 , m = 0.005 , mu = 1e-04 n_islands = 50
N_e = 100 , m = 0.001 , mu = 0.001 n_islands = 10
N_e = 100 , m = 0.001 , mu = 0.001 n_islands = 25
N_e = 100 , m = 0.001 , mu = 0.001 n_islands = 50
N_e = 100 , m = 0.001 , mu = 1e-04 n_islands = 10
N_e = 100 , m = 0.001 , mu = 1e-04 n_islands = 25
N_e = 100 , m = 0.001 , mu = 1e-04 n_islands = 50
N_e = 250 , m = 0.05 , mu = 0.001 n_islands = 10
N_e = 250 , m = 0.05 , mu = 0.001 n_islands = 25
N_e = 250 , m = 0.05 , mu = 0.001 n_islands = 50
N_e = 250 , m = 0.05 , mu = 1e-04 n_islands = 10
N_e = 250 , m = 0.05 , mu = 1e-04 n_islands = 25
N_e = 250 , m = 0.05 , mu = 1e-04 n_islands = 50
N_e = 250 , m = 0.01 , mu = 0.001 n_islands = 10
N_e = 250 , m = 0.01 , mu = 0.001 n_islands = 25
N_e = 250 , m = 0.01 , mu = 0.001 n_islands = 50
N_e = 250 , m = 0.01 , mu = 1e-04 n_islands = 10
N_e = 250 , m = 0.01 , mu = 1e-04 n_islands = 25
N_e = 250 , m = 0.01 , mu = 1e-04 n_islands = 50
N_e = 250 , m = 0.005 , mu = 0.001 n_islands = 10
N_e = 250 , m = 0.005 , mu = 0.001 n_islands = 25
N_e = 250 , m = 0.005 , mu = 0.001 n_islands = 50
N_e = 250 , m = 0.005 , mu = 1e-04 n_islands = 10
N_e = 250 , m = 0.005 , mu = 1e-04 n_islands = 25
N_e = 250 , m = 0.005 , mu = 1e-04 n_islands = 50
N_e = 250 , m = 0.001 , mu = 0.001 n_islands = 10
N_e = 250 , m = 0.001 , mu = 0.001 n_islands = 25
N_e = 250 , m = 0.001 , mu = 0.001 n_islands = 50
N_e = 250 , m = 0.001 , mu = 1e-04 n_islands = 10
N_e = 250 , m = 0.001 , mu = 1e-04 n_islands = 25
N_e = 250 , m = 0.001 , mu = 1e-04 n_islands = 50
N_e = 500 , m = 0.05 , mu = 0.001 n_islands = 10
N_e = 500 , m = 0.05 , mu = 0.001 n_islands = 25
N_e = 500 , m = 0.05 , mu = 0.001 n_islands = 50
N_e = 500 , m = 0.05 , mu = 1e-04 n_islands = 10
N_e = 500 , m = 0.05 , mu = 1e-04 n_islands = 25
N_e = 500 , m = 0.05 , mu = 1e-04 n_islands = 50
N_e = 500 , m = 0.01 , mu = 0.001 n_islands = 10
N_e = 500 , m = 0.01 , mu = 0.001 n_islands = 25
N_e = 500 , m = 0.01 , mu = 0.001 n_islands = 50
N_e = 500 , m = 0.01 , mu = 1e-04 n_islands = 10
N_e = 500 , m = 0.01 , mu = 1e-04 n_islands = 25
N_e = 500 , m = 0.01 , mu = 1e-04 n_islands = 50
N_e = 500 , m = 0.005 , mu = 0.001 n_islands = 10
N_e = 500 , m = 0.005 , mu = 0.001 n_islands = 25
N_e = 500 , m = 0.005 , mu = 0.001 n_islands = 50
N_e = 500 , m = 0.005 , mu = 1e-04 n_islands = 10
N_e = 500 , m = 0.005 , mu = 1e-04 n_islands = 25
N_e = 500 , m = 0.005 , mu = 1e-04 n_islands = 50
N_e = 500 , m = 0.001 , mu = 0.001 n_islands = 10
N_e = 500 , m = 0.001 , mu = 0.001 n_islands = 25
N_e = 500 , m = 0.001 , mu = 0.001 n_islands = 50
N_e = 500 , m = 0.001 , mu = 1e-04 n_islands = 10
N_e = 500 , m = 0.001 , mu = 1e-04 n_islands = 25
N_e = 500 , m = 0.001 , mu = 1e-04 n_islands = 50
df <- filter(df, !is.na(N_e))
Running one set of simulations for n_loci
= 10 and
n_gen
= 1000 takes close to half an hour on my MacBook. If
you increase the number of loci or the number of generations by a factor
of two, it will take about twice as long for the simulations to finish.
You’ll get a printout periodically that lets you know what the current
set of parameters being simulated is, and you’ll see the set of
parameters that’s already been completed. Because running the simulation
for one set of parameters takes so long, you can’t possibly explore all
of the combinations of those parameters you might be interested in, but
I’d like you to explore at least four combinations of
n_loci
and n_gen
. Use the results from those
simulations to address the following questions:7
n_loci = 5
, n_gen = 1000
n_loci = 15
, n_gen = 1000
n_loci = 5
, n_gen = 100
n_loci = 25
, n_gen = 100
How close is the relationship between observed estimates of \(F_{ST}\) from the simulations and those predicted from the infinite island model? from the finite island model?
Under what conditions, if any, is the observed variation in \(F_{ST}\) relatively small?
The easiest way to run different combinations of parameters is to
copy the code block above and replace n_loci
and
n_gen
with new choices. To keep track of what you’re doing,
I recommend doing that for one, examining the results as described
below, and moving on to a new set of parameter (with a new code block)
after that.
You can plot the observed \(F_{ST}\) versus the \(F_{ST}\) predicted from either model using the code below.8 In addition to showing how well the simulations match the predictions, you can get a feel for how much variability there is among the simulations by seeing how scattered the points are.
## for the infinite island model
##
p <- ggplot(df, aes(x = Infinite, y = Observed)) +
geom_point() +
geom_abline(intercept = 0, slope = 1, linetype = "dashed",
color = "red") +
theme_bw() +
ggtitle("Infinite island model")
p
## for the finite island model
##
p <- ggplot(df, aes(x = Finite, y = Observed)) +
geom_point() +
geom_abline(intercept = 0, slope = 1, linetype = "dashed",
color = "red") +
theme_bw() +
ggtitle("Finite island model")
p
If you’d like a more quantitative way to compare the models,9 a simple
way is to run linear regressions of Observed
on
Infinite
and Expected
like this:
summary(lm(Observed ~ Infinite, data = df))
Call:
lm(formula = Observed ~ Infinite, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.137396 -0.009049 0.004256 0.013500 0.204770
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.005941 0.001488 -3.993 7.01e-05 ***
Infinite 0.946782 0.003974 238.253 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0313 on 958 degrees of freedom
Multiple R-squared: 0.9834, Adjusted R-squared: 0.9834
F-statistic: 5.676e+04 on 1 and 958 DF, p-value: < 2.2e-16
summary(lm(Observed ~ Finite, data = df))
Call:
lm(formula = Observed ~ Finite, data = df)
Residuals:
Min 1Q Median 3Q Max
-0.109311 -0.008866 0.000041 0.013271 0.240050
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.003874 0.001483 2.613 0.00911 **
Finite 0.965530 0.004122 234.214 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03183 on 958 degrees of freedom
Multiple R-squared: 0.9828, Adjusted R-squared: 0.9828
F-statistic: 5.486e+04 on 1 and 958 DF, p-value: < 2.2e-16
If predictions were perfect the (Intercept)
term in the
regression would be 0, and the other term (Infinite
in the
first regression and Finite
in the second) would be 1.
Given the variability of the simulations, you won’t see numbers that are
exactly 0 or 1. You can judge whether they are “close” by noticing
whether the number in the Estimate
column is more than two
standard errors away from what you predict.
The residual standard error
below the table measures how
much error there is around the regression, i.e., it gives you a way to
compare quantitatively how much variation there is in \(F_{ST}\).
Wright’s island model, whether the infinite island model or the
finite island model, is very unrealistic in one important respect.
Populations close to one another are no more likely to exchange migrants
than those that are far apart. Another model that Wright introduced is
more realistic, the one-dimensional stepping stone model. In this model
we assume that only adjacent populations can exchange migrants. It’s
still pretty unrealistic, but it may not be a bad approximation for a
set of populations that are distributed along a stream, a narrow beach,
or other habitat that is much narrower than it is long. Here’s the code
you’ll need to simulate results from a one-dimensional stepping stone
model.10 Play around with different values of
n_pops
. Compare results from the finite island model and
the stepping stone and see what you find. WARNING: This takes a long
time to run. You might decide to stop after trying it once with the
default parameters.
make_M_1d_stepping_stone <- function(m, k) {
M <- diag(1 - m, nrow = k)
M[1, 2] <- m
M[k, k - 1] <- m
for (i in 2:(k-1)) {
M[i, i - 1] <- m/2
M[i, i + 1] <- m/2
}
return(M)
}
simulate <- function(n_e, m, mu, n_pops, n_loci, n_gen, n_sample) {
## the finite island model
##
p <- initialize(n_pops, n_loci)
M <- make_symmetric_matrix(m, n_pops)
V <- make_symmetric_matrix(mu, 2)
g_st <- numeric(n_loci)
p_star <- array(dim = c(n_loci, n_pops, 2))
for (i in 1:n_gen) {
for (k in 1:n_loci) {
p_star[k, , ] <- M %*% p[k, , ] %*% V
for (j in 1:n_pops) {
p[k, j, ] <- rmultinom(1, 2*n_e, p_star[k, j, ])/(2*n_e)
}
mu_p <- mean(p[k, , 1])
}
}
df <- make_sample(p, n_sample)
fst_island <- wc(df)$FST
## the stepping stone model
##
p <- initialize(n_pops, n_loci)
M <- make_M_1d_stepping_stone(m, n_pops)
V <- make_symmetric_matrix(mu, 2)
g_st <- numeric(n_loci)
p_star <- array(dim = c(n_loci, n_pops, 2))
for (i in 1:n_gen) {
for (k in 1:n_loci) {
p_star[k, , ] <- M %*% p[k, , ] %*% V
for (j in 1:n_pops) {
p[k, j, ] <- rmultinom(1, 2*n_e, p_star[k, j, ])/(2*n_e)
}
mu_p <- mean(p[k, , 1])
}
}
df <- make_sample(p, n_sample)
fst_step <- wc(df)$FST
return(list(fst_step = fst_step, fst_island = fst_island))
}
run_simulation <- function(n_e, m, mu, n_pops, n_loci, n_gen,
n_sample, n_repetitions)
{
df <- tibble(Island = NA, Step = NA)
for (i in 1:n_repetitions) {
if (n_repetitions > 50) {
cat(".", sep="")
if ((i %% 50) == 0) {
cat(i, "\n", sep ="")
}
}
f_st <-simulate(n_e, m, mu, n_pops, n_loci, n_gen, n_sample)
df <- add_row(df,
Island = f_st$fst_island,
Step = f_st$fst_step)
}
df <- filter(df, !is.na(N_e))
return(df)
}
result <- run_simulation(100, 0.01, 0.0001, 100, 100, 1000, 25, 100)
...........
Notice that I said more realistic. There are still important ways in which these models are not entirely realistic. As the statistician, George Box, once put it “All models are wrong, but some are useful.”↩︎
Notice that \(m\) is the backwards migration rate, i.e., the fraction of a population composed of migrants that come from another population, not the fraction of residents that migrate to another population.↩︎
Actually, we didn’t show what happens with both migration and mutation, but from what we saw you can probably guess where this came from.↩︎
Latter, B.D.H. 1973. The island model of population differentiation: a general solution. Genetics 73:147-157 doi: 10.1093/genetics/73.1.147↩︎
The formula for \(F_{ST}\) comes from Fu et al., Theoretical Population Biology 63:231-243; 2003.↩︎
Notice that in order to run this code, you’ll also need
to run the code above so that the relevant functions are loaded in
R
’s memory.↩︎
Keep in mind that you’ll only be able to answer these questions for the set of simulations that you complete. Nick will be grading your responses on the simulation results that you show, not on what an exhaustive examination of all possibilities might be.↩︎
The diagonal red dashed line is the 1:1 line. Every point should fall on this line if observed exactly matches the prediction.↩︎
And I’d encourage you to take a more quantitative approach↩︎
Note: It draws on some of the simulation code above, so you’ll need to run the code that comes before Exercise #6 in order to run it.↩︎