Uncommon Ground

Causal inference in ecology – Randomization and sample size

Causal inference in ecology – links to the series

Last week I explored the logic behind controlled experiments and why they are typically regarded as the gold standard for identifying and measuring causal effects.1 Let me tie that post and the preceding one on counterfactuals together before we proceed with the next idea. To make things as concrete as possible, let’s return to our hypothetical example of determining whether applying nitrogen fertilizer increases the yield of corn. We do so by

  • Randomly assigning individual corn plants to different plots within a field.
  • Applying nitrogen fertilizer to some plots, the treatment plots, and not to others, the control plots.
  • Determining whether the yield in treatment plots exceeds that in the control plots.

Where do counterfactuals come in? If the yield of treatment plots exceeds that of control plots, aren’t we done? Well, not quite. You see, the plants in the treatment plots are different individuals from those in the control plots. To infer that nitrogen fertilizer increases yield, we have to extrapolate the results from the treatment plots to the control plots. We have to be willing to conclude that the yield in the control plots would have been greater if we had applied nitrogen fertilizer there. That’s the counterfactual: we are asserting what would have happened if the facts had been different. In practice, we don’t usually worry about this step in the logic, because we presume that our random assignment of corn plants to different plots means that the plants in the two sets of plots are essentially equivalent. As I pointed out last time, that inference depends on having done the randomization well and having a reasonably large sample.
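The random assignment step can be sketched in a few lines of R. This is a minimal illustration of the idea, not the simulation code that appears later in the post; the plant labels and the total of 20 plants are made up for the example:

```r
## Randomly assign 20 corn plants to treatment and control plots,
## 10 plants each, using R's pseudorandom number generator
set.seed(42)                            # make the assignment reproducible
plants <- paste0("plant_", 1:20)
treatment <- sample(plants, size = 10)  # 10 plants chosen at random
control   <- setdiff(plants, treatment) # the 10 that remain
length(treatment)
length(control)
```

Because every plant has the same chance of landing in either group, genetic and environmental differences among plants should, on average, balance out between treatment and control.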

Let’s assume that we’ve done the randomization well, say by using a pseudorandom number generator in our computer to assign individual plants to the different plots. But let’s also assume that there is genetic variation among our corn plants that influences yield. To keep things really simple, let’s assume that there’s a single locus with two alleles associated with yield differences, that high yield is dominant to low yield, and that the two alleles are in equal frequency, so that 75% of the individuals are high yield and 25% are low yield. Let’s further assume that high yield plants produce 1.0kg of corn on average and low yield plants 0.5kg, each with a standard deviation of 0.1kg.2 Finally, let’s assume that applying nitrogen fertilizer has absolutely no effect on yield. Then a simple simulation in R produces the following results:3

Sample size:  5 
          lo:  133 
          hi:  140 
     neither:  9727 
Sample size:  10 
          lo:  201 
          hi:  175 
     neither:  9624 
Sample size:  20 
          lo:  255 
          hi:  217 
     neither:  9528 

What you can see from these results is that I was only half right. You need to do the randomization well,4 but your sample size doesn’t need to be all that big to ensure that you get reasonable results. Keep in mind that “reasonable results” here means that (a) you reject the null hypothesis of no difference in yield about 5% of the time and (b) you reject it in either direction at about the same frequency.5 There are, however, other reasons to want reasonable sample sizes. Refer to the posts linked on the Causal inference in ecology page for more information about that.
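To spell out what “about 5% of the time” means here (my arithmetic, not the post’s): with 10,000 simulated data sets and a two-sided test at the 5% level, we expect roughly 500 rejections in total, split about evenly between the two directions. The n = 20 counts above come close to this; the smaller sample sizes fall somewhat short, which may reflect the t-test being conservative with so few, strongly bimodal observations per group.

```r
## Expected number of type-I errors across the simulations:
## the fertilizer has no real effect, so every rejection is a
## false positive
n_sim <- 10000
alpha <- 0.05
expected_total <- n_sim * alpha       # about 500 rejections overall
expected_each  <- expected_total / 2  # about 250 in each direction
expected_total
expected_each
```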

With counterfactuals, controlled experiments, and randomization out of the way, our next stop will be the challenge of falsification.

  1. I didn’t discuss the “and measuring” part last week, only the “identifying” part. We’ll return to measuring causal effects later in this series after we’ve explored issues associated with identifying causal effects (or exhausted ourselves trying).
  2. That corresponds to an effect size of 0.2 standard deviations.
  3. Click through to the next page to see the R code.
  4. OK, you can’t see that you need to do the randomization well, but I did it well and it worked, so why not do it well and be safe?
  5. Since I used a two-sided t-test with a 5% significance threshold, this is just what you should expect.


## frequency of the (dominant) high-yield allele; the two alleles are
## in equal frequency (p = q = 0.5), so 75% of plants are high yield
p <- 0.5
## yield
hi <- 1.0
lo <- 0.5
## standard deviation
sd <- 0.1
## number of simulations at each sample size
n_sim <- 10000
## P-value threshold for significance
thresh <- 0.05

simulate <- function(n_sim, n, p, hi, lo, sd, thresh) {
  hi_freq <- p^2 + 2*p*(1.0-p)
  lo_ct <- 0
  hi_ct <- 0
  ## n_hi is the total number of high-yield plants to be sampled across
  ## the two treatments
  ## 2*n because the sample size in each treatment is n
  n_hi <- rbinom(n_sim, 2*n, hi_freq)
  for (i in 1:n_sim) {
    ## select high-yield plants for high-N treatment
    n_hi_in_hi_N <- rhyper(1, n_hi[i], 2*n-n_hi[i], n)
    ## high-yield plants in low-N treatment is simply the number left
    ## out of n_hi
    n_hi_in_lo_N <- n_hi[i] - n_hi_in_hi_N
    ## low-yield plants in high-N
    n_lo_in_hi_N <- n - n_hi_in_hi_N
    ## low-yield plants in low-N
    n_lo_in_lo_N <- n - n_hi_in_lo_N
    hi_n <- c(rnorm(n_hi_in_hi_N, hi, sd),
              rnorm(n_lo_in_hi_N, lo, sd))
    lo_n <- c(rnorm(n_hi_in_lo_N, hi, sd),
              rnorm(n_lo_in_lo_N, lo, sd))
    test <- t.test(hi_n, lo_n)
    if ((test$statistic < 0.0) && (test$p.value < thresh)) {
      lo_ct <- lo_ct + 1
    } else if ((test$statistic > 0.0) && (test$p.value < thresh)) {
      hi_ct <- hi_ct + 1
    }
  }
  ## return the counts so that report() can print them
  list(n = n, lo_ct = lo_ct, hi_ct = hi_ct)
}

report <- function(result, n_sim) {
  cat("Sample size: ", result$n, "\n",
      "         lo: ", result$lo_ct, "\n",
      "         hi: ", result$hi_ct, "\n",
      "    neither: ", n_sim - result$lo_ct - result$hi_ct, "\n")
}

small <- simulate(n_sim, 5, p, hi, lo, sd, thresh)
medium <- simulate(n_sim, 10, p, hi, lo, sd, thresh)
large <- simulate(n_sim, 20, p, hi, lo, sd, thresh)

report(small, n_sim)
report(medium, n_sim)
report(large, n_sim)
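As a sanity check on the `hi_freq` line in `simulate` (my own arithmetic, using the post’s stated assumption of equal allele frequencies): with p = 0.5 and high yield dominant, the Hardy-Weinberg genotype frequencies recover the 75% figure quoted in the text.

```r
## With equal allele frequencies (p = q = 0.5) and high yield dominant,
## high-yield plants are the p^2 dominant homozygotes plus the
## 2*p*(1-p) heterozygotes
p <- 0.5
hi_freq <- p^2 + 2 * p * (1 - p)
hi_freq  # 0.75
```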
