{"id":591,"date":"2018-04-30T08:30:00","date_gmt":"2018-04-30T12:30:00","guid":{"rendered":"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/?p=591"},"modified":"2018-04-29T13:50:29","modified_gmt":"2018-04-29T17:50:29","slug":"causal-inference-in-ecology-randomization-and-sample-size","status":"publish","type":"post","link":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2018\/04\/30\/causal-inference-in-ecology-randomization-and-sample-size\/","title":{"rendered":"Causal inference in ecology &#8211; Randomization and sample size"},"content":{"rendered":"<p><a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/causal-inference-in-ecology\/\">Causal inference in ecology &#8211; links to the series<\/a><\/p>\n<p><a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2018\/04\/23\/causal-inference-in-ecology-controlled-experiments\/\">Last week<\/a> I explored the logic behind controlled experiments and why they are typically regarded as the gold standard for identifying and measuring causal effects.<sup><a id=\"ffn1\" class=\"footnote\" href=\"#fn1\">1<\/a><\/sup> Let me tie that post and the preceding one on <a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2018\/04\/16\/causal-inference-in-ecology-counterfactuals\/\">counterfactuals<\/a> together before we proceed with the next idea. To make things as concrete as possible, let\u2019s return to our hypothetical example of determining whether applying nitrogen fertilizer increases the yield of corn. We do so by<\/p>\n<ul>\n<li>Randomly assigning individual corn plants to different plots within a field.<\/li>\n<li>Applying nitrogen fertilizer to some plots, the treatment plots, and not to others, the control plots.<\/li>\n<li>Determining whether the yield in treatment plots exceeds that in the control plots.<\/li>\n<\/ul>\n<p>Where do counterfactuals come in? If the yield of treatment plots exceeds that of control plots, aren\u2019t we done? Well, not quite. 
You see, the plants that were in the treatment plots are different individuals from those in the control plots. To infer that nitrogen fertilizer increases yield, we have to extrapolate the results from the treatment plots to the control plots. We have to be willing to conclude that the yield <em><strong>in the control plots<\/strong><\/em> would have been greater <em><strong>if we had applied nitrogen fertilizer there<\/strong><\/em>. That\u2019s the counterfactual. We are asserting what would have happened if the facts had been different. In practice, we don\u2019t usually worry about this step in the logic, because we presume that our random assignment of corn plants to different plots means that the plants in the treatment and control plots are essentially equivalent. As I pointed out last time, that inference depends on having done the randomization well and having a reasonably large sample.<\/p>\n<p>Let\u2019s assume that we\u2019ve done the randomization well, say by using a pseudorandom number generator on our computer to assign individual plants to the different plots. But let\u2019s also assume that there is genetic variation among our corn plants that influences yield. To make things really simple, let\u2019s assume that there\u2019s a single locus with two alleles associated with yield differences, that high yield is dominant to low yield, that the two alleles are in equal frequency, and that genotypes are in Hardy-Weinberg proportions, so that 75% of the individuals are high yield and 25% are low yield. 
Let\u2019s further assume that high-yield plants produce an average of 1 kg of corn (sd = 0.1 kg) and that low-yield plants produce an average of 0.5 kg of corn (sd = 0.1 kg).<sup><a id=\"ffn2\" class=\"footnote\" href=\"#fn2\">2<\/a><\/sup> Finally, let\u2019s assume that applying nitrogen fertilizer has absolutely no effect on yield. Then a simple simulation in R produces the following results:<sup><a id=\"ffn3\" class=\"footnote\" href=\"#fn3\">3<\/a><\/sup><\/p>\n<pre><code>Sample size:  5 \r\n          lo:  133 \r\n          hi:  140 \r\n     neither:  9727 \r\nSample size:  10 \r\n          lo:  201 \r\n          hi:  175 \r\n     neither:  9624 \r\nSample size:  20 \r\n          lo:  255 \r\n          hi:  217 \r\n     neither:  9528 \r\n<\/code><\/pre>\n<p>In this output, the sample size is the number of plants in each treatment, \u201clo\u201d counts the simulations in which the fertilized plots had a significantly lower mean yield than the unfertilized plots, \u201chi\u201d counts those in which they had a significantly higher mean yield, and \u201cneither\u201d counts those with no significant difference.<\/p>\n<p>What you can see from these results is that I was only half right. You need to do the randomization well,<sup><a id=\"ffn4\" class=\"footnote\" href=\"#fn4\">4<\/a><\/sup> <em><strong>but<\/strong><\/em> your sample size doesn\u2019t need to be all that big to ensure that you get reasonable results. Keep in mind that \u201creasonable results\u201d here means that (a) you reject the null hypothesis of no difference in yield no more than about 5% of the time (2.7%, 3.8%, and 4.7% of the 10,000 simulations at sample sizes of 5, 10, and 20) and (b) you reject it in either direction at about the same frequency.<sup><a id=\"ffn5\" class=\"footnote\" href=\"#fn5\">5<\/a><\/sup> There are, however, other reasons to want reasonably large samples. Refer to the posts linked to on the <a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/causal-inference-in-ecology\/\">Causal inference in ecology page<\/a> for more information about that.<\/p>\n<p>With counterfactuals, controlled experiments, and randomization out of the way, our next stop will be the challenge of falsification.<\/p>\n<ol id=\"footnotes\">\n<li id=\"fn1\">I didn\u2019t discuss the \u201cand measuring\u201d part last week, only the \u201cidentifying\u201d part. 
We\u2019ll return to measuring causal effects later in this series after we\u2019ve explored issues associated with identifying causal effects (or exhausted ourselves trying). <a href=\"#ffn1\">&#x21a9;<\/a><\/li>\n<li id=\"fn2\">That corresponds to an effect size of 0.2 standard deviations. <a href=\"#ffn2\">&#x21a9;<\/a><\/li>\n<li id=\"fn3\">Click through to the next page to see the R code. <a href=\"#ffn3\">&#x21a9;<\/a><\/li>\n<li id=\"fn4\">OK, the results don\u2019t actually show that you need to do the randomization well, but I did it well and it worked, so why not do it well and be safe? <a href=\"#ffn4\">&#x21a9;<\/a><\/li>\n<li id=\"fn5\">Since I used a two-sided Welch t-test (the default for R\u2019s <code>t.test()<\/code>) with a 5% significance threshold, this is roughly what you should expect. <a href=\"#ffn5\">&#x21a9;<\/a><\/li>\n<\/ol>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<pre><code>## frequency of the high-yield allele; 0.5 gives the equal allele\r\n## frequencies (75% high-yield plants) described in the text\r\n##\r\np &lt;- 0.5\r\n## mean yield (kg) of high- and low-yield plants\r\n##\r\nhi &lt;- 1.0\r\nlo &lt;- 0.5\r\n## standard deviation of yield (kg) within each genotype class\r\n##\r\nsd &lt;- 0.1\r\n## number of simulations at each sample size\r\n##\r\nn_sim &lt;- 10000\r\n## P-value threshold for significance\r\n##\r\nthresh &lt;- 0.05\r\n\r\nsimulate &lt;- function(n_sim, n, p, hi, lo, sd, thresh) {\r\n  ## Hardy-Weinberg frequency of plants carrying at least one\r\n  ## (dominant) high-yield allele\r\n  ##\r\n  hi_freq &lt;- p^2 + 2*p*(1.0-p)\r\n  lo_ct &lt;- 0\r\n  hi_ct &lt;- 0\r\n  ## n_hi is the total number of high-yield plants to be sampled across\r\n  ## the two treatments\r\n  ##\r\n  ## 2*n because the sample size in each treatment is n\r\n  ##\r\n  n_hi &lt;- rbinom(n_sim, 2*n, hi_freq)\r\n  for (i in 1:n_sim) {\r\n    ## select high-yield plants for high-N treatment\r\n    ##\r\n    n_hi_in_hi_N &lt;- rhyper(1, n_hi[i], 2*n-n_hi[i], n)\r\n    ## high-yield plants in low-N treatment is simply the number left\r\n    ## out of n_hi\r\n    ##\r\n    n_hi_in_lo_N &lt;- n_hi[i] - n_hi_in_hi_N\r\n    ## low-yield plants in high-N\r\n    ##\r\n    n_lo_in_hi_N &lt;- n - n_hi_in_hi_N\r\n    ## low-yield plants in low-N\r\n    ##\r\n    n_lo_in_lo_N &lt;- n - n_hi_in_lo_N\r\n  
  ## yields (kg) in the high-N (fertilized) and low-N (unfertilized) plots\r\n    hi_n &lt;- c(rnorm(n_hi_in_hi_N, hi, sd),\r\n              rnorm(n_lo_in_hi_N, lo, sd))\r\n    lo_n &lt;- c(rnorm(n_hi_in_lo_N, hi, sd),\r\n              rnorm(n_lo_in_lo_N, lo, sd))\r\n    ## Welch two-sample t-test; a negative statistic means the fertilized\r\n    ## plots had the lower mean yield\r\n    test &lt;- t.test(hi_n, lo_n)\r\n    if ((test$statistic &lt; 0.0) &amp;&amp; (test$p.value &lt; thresh)) {\r\n      lo_ct &lt;- lo_ct + 1\r\n    } else if ((test$statistic &gt; 0.0) &amp;&amp; (test$p.value &lt; thresh)) {\r\n      hi_ct &lt;- hi_ct + 1\r\n    }\r\n  }\r\n  return(list(n=n,\r\n              lo_ct=lo_ct,\r\n              
hi_ct=hi_ct))\r\n}\r\n\r\nreport &lt;- function(result, n_sim) {\r\n  cat(\"Sample size: \", result$n, \"\\n\",\r\n      \"         lo: \", result$lo_ct, \"\\n\",\r\n      \"         hi: \", result$hi_ct, \"\\n\",\r\n      \"    neither: \", n_sim - result$lo_ct - result$hi_ct, \"\\n\")\r\n}\r\n\r\nsmall &lt;- simulate(n_sim, 5, p, hi, lo, sd, thresh)\r\nmedium &lt;- simulate(n_sim, 10, p, hi, lo, sd, thresh)\r\nlarge &lt;- simulate(n_sim, 20, p, hi, lo, sd, thresh)\r\n\r\nreport(small, n_sim)\r\nreport(medium, n_sim)\r\nreport(large, n_sim)\r\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Causal inference in ecology &#8211; links to the series Last week I explored the logic behind controlled experiments and why they are typically regarded as the gold standard for identifying&#8230; <a class=\"read-more-button\" href=\"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2018\/04\/30\/causal-inference-in-ecology-randomization-and-sample-size\/\">Read more 
&gt;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[],"class_list":["post-591","post","type-post","status-publish","format-standard","hentry","category-statistics"],"_links":{"self":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/591","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/comments?post=591"}],"version-history":[{"count":3,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/591\/revisions"}],"predecessor-version":[{"id":594,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/591\/revisions\/594"}],"wp:attachment":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/media?parent=591"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/categories?post=591"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/tags?post=591"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}