This is an R notebook that contains the code I used in the Randomization and sample size post in my series on Causal inference in ecology.

First, we set the default parameters for the simulation: the allele frequency at the locus influencing yield (p), the mean yield for high and low genotypes (hi, lo: high is dominant to low), the standard deviation of yield around the mean (sd), the number of simulations at each of the sample sizes (n_sim, and the P-value threshold for statistical significance (thresh).

## allele frequency
##
p <- sqrt(0.5)
## yield
##
hi <- 1.0
lo <- 0.5
## standard deviation
##
sd <- 0.1
## number of simulations at each sample size
##
n_sim <- 10000
## P-value threshold for significance
##
thresh <- 0.05

The we define the functions simulate() and report() that perform the simulation for one sample size and report the results, respectively.

simulate <- function(n_sim, n, p, hi, lo, sd, thresh) {
  hi_freq <- p^2 + 2*p*(1.0-p)
  lo_ct <- 0
  hi_ct <- 0
  ## n_hi is the total number of high-yield plants to be sampled across
  ## the two treatments
  ##
  ## 2*n because the sample size in each treatment is n
  ##
  n_hi <- rbinom(n_sim, 2*n, hi_freq)
  for (i in 1:n_sim) {
    ## select high-yield plants for high-N treatment
    ##
    n_hi_in_hi_N <- rhyper(1, n_hi[i], 2*n-n_hi[i], n)
    ## high-yield plants in low-N treatment is simply the number left
    ## out of n_hi
    ##
    n_hi_in_lo_N <- n_hi[i] - n_hi_in_hi_N
    ## low-yield plants in high-N
    ##
    n_lo_in_hi_N <- n - n_hi_in_hi_N
    ## low-yield plants in low-N
    ##
    n_lo_in_lo_N <- n - n_hi_in_lo_N
    hi_n <- c(rnorm(n_hi_in_hi_N, hi, sd),
              rnorm(n_lo_in_hi_N, lo, sd))
    lo_n <- c(rnorm(n_hi_in_lo_N, hi, sd),
              rnorm(n_lo_in_lo_N, lo, sd))
    test <- t.test(hi_n, lo_n)
    if ((test$statistic < 0.0) && (test$p.value < thresh)) {
      lo_ct <- lo_ct + 1
    } else if ((test$statistic > 0.0) && (test$p.value < thresh)) {
      hi_ct <- hi_ct + 1
    }
  }
  return(list(n=n,
              lo_ct=lo_ct,
              hi_ct=hi_ct))
}
report <- function(result, n_sim) {
  cat("Sample size: ", result$n, "\n",
      "         lo: ", result$lo_ct, "\n",
      "         hi: ", result$hi_ct, "\n",
      "    neither: ", n_sim - result$lo_ct - result$hi_ct, "\n")
}

With those in place, we’re ready to run the simulations and report the results. Notice that I’ve hardcoded small, medium, and large as 5, 10, and 20 (the second argument in the call to simulate).

small <- simulate(n_sim, 5, p, hi, lo, sd, thresh)
medium <- simulate(n_sim, 10, p, hi, lo, sd, thresh)
large <- simulate(n_sim, 20, p, hi, lo, sd, thresh)
report(small, n_sim)
Sample size:  5 
          lo:  141 
          hi:  135 
     neither:  9724 
report(medium, n_sim)
Sample size:  10 
          lo:  177 
          hi:  195 
     neither:  9628 
report(large, n_sim)
Sample size:  20 
          lo:  211 
          hi:  220 
     neither:  9569 

I produced the webpage you see here simply by selecting “Run->Restart R and Run All Chunks”.

LS0tCnRpdGxlOiAiUmFuZG9taXphdGlvbiBkZW1vIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tClRoaXMgaXMgYW4gUiBub3RlYm9vayB0aGF0IGNvbnRhaW5zIHRoZSBjb2RlIEkgdXNlZCBpbiB0aGUgW1JhbmRvbWl6YXRpb24gYW5kIHNhbXBsZSBzaXplXShodHRwOi8vZGFyd2luLmVlYi51Y29ubi5lZHUvdW5jb21tb24tZ3JvdW5kL2Jsb2cvMjAxOC8wNC8zMC9jYXVzYWwtaW5mZXJlbmNlLWluLWVjb2xvZ3ktcmFuZG9taXphdGlvbi1hbmQtc2FtcGxlLXNpemUvKSBwb3N0IGluIG15IHNlcmllcyBvbiBbQ2F1c2FsIGluZmVyZW5jZSBpbiBlY29sb2d5XShodHRwOi8vZGFyd2luLmVlYi51Y29ubi5lZHUvdW5jb21tb24tZ3JvdW5kL2NhdXNhbC1pbmZlcmVuY2UtaW4tZWNvbG9neS8pLgoKCkZpcnN0LCB3ZSBzZXQgdGhlIGRlZmF1bHQgcGFyYW1ldGVycyBmb3IgdGhlIHNpbXVsYXRpb246IHRoZSBhbGxlbGUgZnJlcXVlbmN5IGF0IHRoZSBsb2N1cyBpbmZsdWVuY2luZyB5aWVsZCAoPGVtPnA8L2VtPiksIHRoZSBtZWFuIHlpZWxkIGZvciBoaWdoIGFuZCBsb3cgZ2Vub3R5cGVzICg8ZW0+aGk8L2VtPiwgPGVtPmxvPC9lbT46IGhpZ2ggaXMgZG9taW5hbnQgdG8gbG93KSwgdGhlIHN0YW5kYXJkIGRldmlhdGlvbiBvZiB5aWVsZCBhcm91bmQgdGhlIG1lYW4gKDxlbT5zZDwvZW0+KSwgdGhlIG51bWJlciBvZiBzaW11bGF0aW9ucyBhdCBlYWNoIG9mIHRoZSBzYW1wbGUgc2l6ZXMgKDxlbT5uX3NpbTwvZW0+LCBhbmQgdGhlIFAtdmFsdWUgdGhyZXNob2xkIGZvciBzdGF0aXN0aWNhbCBzaWduaWZpY2FuY2UgKDxlbT50aHJlc2g8L2VtPikuCmBgYHtyfQojIyBhbGxlbGUgZnJlcXVlbmN5CiMjCnAgPC0gc3FydCgwLjUpCiMjIHlpZWxkCiMjCmhpIDwtIDEuMApsbyA8LSAwLjUKIyMgc3RhbmRhcmQgZGV2aWF0aW9uCiMjCnNkIDwtIDAuMQojIyBudW1iZXIgb2Ygc2ltdWxhdGlvbnMgYXQgZWFjaCBzYW1wbGUgc2l6ZQojIwpuX3NpbSA8LSAxMDAwMAojIyBQLXZhbHVlIHRocmVzaG9sZCBmb3Igc2lnbmlmaWNhbmNlCiMjCnRocmVzaCA8LSAwLjA1CmBgYApUaGUgd2UgZGVmaW5lIHRoZSBmdW5jdGlvbnMgPHR0PnNpbXVsYXRlKCk8L3R0PiBhbmQgPHR0PnJlcG9ydCgpPC90dD4gdGhhdCBwZXJmb3JtIHRoZSBzaW11bGF0aW9uIGZvciBvbmUgc2FtcGxlIHNpemUgYW5kIHJlcG9ydCB0aGUgcmVzdWx0cywgcmVzcGVjdGl2ZWx5LgpgYGB7cn0Kc2ltdWxhdGUgPC0gZnVuY3Rpb24obl9zaW0sIG4sIHAsIGhpLCBsbywgc2QsIHRocmVzaCkgewogIGhpX2ZyZXEgPC0gcF4yICsgMipwKigxLjAtcCkKICBsb19jdCA8LSAwCiAgaGlfY3QgPC0gMAogICMjIG5faGkgaXMgdGhlIHRvdGFsIG51bWJlciBvZiBoaWdoLXlpZWxkIHBsYW50cyB0byBiZSBzYW1wbGVkIGFjcm9zcwogICMjIHRoZSB0d28gdHJlYXRtZW50cwogICMjCiAgIyMgMipuIGJlY2F1c2UgdGhlIHNhbXBsZSBzaXplIGluIGVhY2ggdHJlYXRtZW50IGlzIG4KICAjIwogIG5faGkgPC0gcmJpbm9tKG5fc2ltLCAyKm4sIGhpX2ZyZXEpCiAgZm9yIChpIGluIDE6bl9zaW0pIHsKICAgICMjIHNlbGVjdCBoaWdoLXlpZWxkIHBsYW50cyBmb3IgaGlnaC1OIHRyZWF0bWVudAogICAgIyMKICAgIG5faGlfaW5faGlfTiA8LSByaHlwZXIoMSwgbl9oaVtpXSwgMipuLW5faGlbaV0sIG4pCiAgICAjIyBoaWdoLXlpZWxkIHBsYW50cyBpbiBsb3ctTiB0cmVhdG1lbnQgaXMgc2ltcGx5IHRoZSBudW1iZXIgbGVmdAogICAgIyMgb3V0IG9mIG5faGkKICAgICMjCiAgICBuX2hpX2luX2xvX04gPC0gbl9oaVtpXSAtIG5faGlfaW5faGlfTgogICAgIyMgbG93LXlpZWxkIHBsYW50cyBpbiBoaWdoLU4KICAgICMjCiAgICBuX2xvX2luX2hpX04gPC0gbiAtIG5faGlfaW5faGlfTgogICAgIyMgbG93LXlpZWxkIHBsYW50cyBpbiBsb3ctTgogICAgIyMKICAgIG5fbG9faW5fbG9fTiA8LSBuIC0gbl9oaV9pbl9sb19OCiAgICBoaV9uIDwtIGMocm5vcm0obl9oaV9pbl9oaV9OLCBoaSwgc2QpLAogICAgICAgICAgICAgIHJub3JtKG5fbG9faW5faGlfTiwgbG8sIHNkKSkKICAgIGxvX24gPC0gYyhybm9ybShuX2hpX2luX2xvX04sIGhpLCBzZCksCiAgICAgICAgICAgICAgcm5vcm0obl9sb19pbl9sb19OLCBsbywgc2QpKQogICAgdGVzdCA8LSB0LnRlc3QoaGlfbiwgbG9fbikKICAgIGlmICgodGVzdCRzdGF0aXN0aWMgPCAwLjApICYmICh0ZXN0JHAudmFsdWUgPCB0aHJlc2gpKSB7CiAgICAgIGxvX2N0IDwtIGxvX2N0ICsgMQogICAgfSBlbHNlIGlmICgodGVzdCRzdGF0aXN0aWMgPiAwLjApICYmICh0ZXN0JHAudmFsdWUgPCB0aHJlc2gpKSB7CiAgICAgIGhpX2N0IDwtIGhpX2N0ICsgMQogICAgfQogIH0KICByZXR1cm4obGlzdChuPW4sCiAgICAgICAgICAgICAgbG9fY3Q9bG9fY3QsCiAgICAgICAgICAgICAgaGlfY3Q9aGlfY3QpKQp9CgpyZXBvcnQgPC0gZnVuY3Rpb24ocmVzdWx0LCBuX3NpbSkgewogIGNhdCgiU2FtcGxlIHNpemU6ICIsIHJlc3VsdCRuLCAiXG4iLAogICAgICAiICAgICAgICAgbG86ICIsIHJlc3VsdCRsb19jdCwgIlxuIiwKICAgICAgIiAgICAgICAgIGhpOiAiLCByZXN1bHQkaGlfY3QsICJcbiIsCiAgICAgICIgICAgbmVpdGhlcjogIiwgbl9zaW0gLSByZXN1bHQkbG9fY3QgLSByZXN1bHQkaGlfY3QsICJcbiIpCn0KYGBgCldpdGggdGhvc2UgaW4gcGxhY2UsIHdlJ3JlIHJlYWR5IHRvIHJ1biB0aGUgc2ltdWxhdGlvbnMgYW5kIHJlcG9ydCB0aGUgcmVzdWx0cy4gTm90aWNlIHRoYXQgSSd2ZSBoYXJkY29kZWQgc21hbGwsIG1lZGl1bSwgYW5kIGxhcmdlIGFzIDUsIDEwLCBhbmQgMjAgKHRoZSBzZWNvbmQgYXJndW1lbnQgaW4gdGhlIGNhbGwgdG8gPHR0PnNpbXVsYXRlPC90dD4pLgpgYGB7cn0Kc21hbGwgPC0gc2ltdWxhdGUobl9zaW0sIDUsIHAsIGhpLCBsbywgc2QsIHRocmVzaCkKbWVkaXVtIDwtIHNpbXVsYXRlKG5fc2ltLCAxMCwgcCwgaGksIGxvLCBzZCwgdGhyZXNoKQpsYXJnZSA8LSBzaW11bGF0ZShuX3NpbSwgMjAsIHAsIGhpLCBsbywgc2QsIHRocmVzaCkKCnJlcG9ydChzbWFsbCwgbl9zaW0pCnJlcG9ydChtZWRpdW0sIG5fc2ltKQpyZXBvcnQobGFyZ2UsIG5fc2ltKQpgYGAKCkkgcHJvZHVjZWQgdGhlIHdlYnBhZ2UgeW91IHNlZSBoZXJlIHNpbXBseSBieSBzZWxlY3RpbmcgIlJ1bi0+UmVzdGFydCBSIGFuZCBSdW4gQWxsIENodW5rcyIuIA==