Eric Menges has been studying rare plants and fire ecology at Archbold Biological Station for more than 30 years. I just learned that he’s the subject (and narrator) of a short, 16-minute video on Vimeo that I’ve embedded above. Please take the time to watch it. You’ll learn a lot.
Last week was finals week at UConn, which means Commencement weekend began on Saturday. I represented the Provost’s Office at the PharmD ceremony at 9:00am Saturday, and our Master’s Commencement Ceremony was held at 1:30pm. I had yesterday off, but because of all the things that accumulated last week, I didn’t have time to write the next installment of this series. It will return next Monday, barring unforeseen complications.
In the meantime, this page contains links to the posts that have appeared so far. It also contains links to some related posts that you might find interesting. You may also have noticed the “Causal inference in ecology” label at the top of this page. That’s a link to the same page of posts in case you want to find it again.
Last week I explored the logic behind controlled experiments and why they are typically regarded as the gold standard for identifying and measuring causal effects.1 Let me tie that post and the preceding one on counterfactuals together before we proceed with the next idea. To make things as concrete as possible, let’s return to our hypothetical example of determining whether applying nitrogen fertilizer increases the yield of corn. We do so by
- Randomly assigning individual corn plants to different plots within a field.
- Applying nitrogen fertilizer to some plots, the treatment plots, and not to others, the control plots.
- Determining whether the yield in treatment plots exceeds that in the control plots.
Where do counterfactuals come in? If the yield of treatment plots exceeds that of control plots, aren’t we done? Well, not quite. You see, the plants that were in the treatment plots are different individuals from those that were in the control plots. To infer that nitrogen fertilizer increases yield, we have to extrapolate the results from the treatment plots to the control plots. We have to be willing to conclude that the yield in the control plots would have been greater if we had applied nitrogen fertilizer there. That’s the counterfactual. We are asserting what would have happened if the facts had been different. In practice, we don’t usually worry about this step in the logic, because we presume that our random assignment of corn plants to different plots means that the plants in the two sets of plots are essentially equivalent. As I pointed out last time, that inference depends on having done the randomization well and having a reasonably large sample.
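One way to make the counterfactual step concrete is to write down both potential yields for every plant, the yield with fertilizer and the yield without, and then hide the one we never get to observe. The sketch below is a hypothetical illustration in Python, not code from this series. The function name, the baseline yield distribution, and the 0.2kg fertilizer effect are all assumptions chosen for illustration.

```python
import random
import statistics


def potential_outcomes_demo(n_plants=1000, effect=0.2, seed=42):
    """Compare the true average effect with the estimate from a randomized experiment.

    All numbers are illustrative assumptions, not data.
    """
    rng = random.Random(seed)
    # Each plant has two potential yields (kg): without and with fertilizer.
    y_control = [rng.gauss(1.0, 0.1) for _ in range(n_plants)]
    y_treated = [y + effect for y in y_control]

    # Randomly assign half of the plants to the treatment plots.
    ids = list(range(n_plants))
    rng.shuffle(ids)
    treated_ids = ids[: n_plants // 2]
    control_ids = ids[n_plants // 2:]

    # Only one potential outcome per plant is ever observed.
    observed_treated = [y_treated[i] for i in treated_ids]
    observed_control = [y_control[i] for i in control_ids]

    # The true effect uses both potential outcomes, which no real experiment can see.
    true_effect = statistics.fmean(t - c for t, c in zip(y_treated, y_control))
    estimated_effect = (
        statistics.fmean(observed_treated) - statistics.fmean(observed_control)
    )
    return true_effect, estimated_effect


true_effect, estimated_effect = potential_outcomes_demo()
print(f"true average effect: {true_effect:.3f} kg")
print(f"estimated from observed plots: {estimated_effect:.3f} kg")
```

The estimate comes only from the observed yields, yet with random assignment and a reasonably large sample it lands close to the true average effect. That closeness is what licenses the extrapolation from treatment plots to control plots.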
Let’s assume that we’ve done the randomization well, say by using a pseudorandom number generator in our computer to assign individual plants to the different plots. But let’s also assume that there is genetic variation among our corn plants that influences yield. To make things really simple, let’s assume that there’s a single locus with two alleles associated with yield differences, that high yield is dominant to low yield, and that the two alleles are in equal frequency, so that 75% of the individuals are high yield and 25% are low yield. Let’s further assume that high yield plants produce 1kg of corn on average (sd=0.1kg) and that low yield plants produce 0.5kg of corn on average (sd=0.1kg).2 Finally, let’s assume that applying nitrogen fertilizer has absolutely no effect on yield. Then a simple simulation in R produces the following results:3
Sample size: 5 lo: 133 hi: 140 neither: 9727
Sample size: 10 lo: 201 hi: 175 neither: 9624
Sample size: 20 lo: 255 hi: 217 neither: 9528
What you can see from these results is that I was only half right. You need to do the randomization well,4 but your sample size doesn’t need to be all that big to ensure that you get reasonable results. Keep in mind that “reasonable results” here means that (a) you reject the null hypothesis of no difference in yield about 5% of the time and (b) you reject it either way at about the same frequency.5 There are, however, other reasons that you want to have reasonable sample sizes. Refer to the posts linked to on the Causal inference in ecology page for more information about that.
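For readers who want to experiment, here is a minimal sketch of a simulation along these lines. It is written in Python rather than R, it is not the code behind the numbers above, and it will not reproduce them exactly. Several details are assumptions: that “lo” and “hi” count replicates in which the treatment plot tests as significantly lower or higher than the control, that “sample size” is the number of plants per plot, and that the test is an equal-variance two-sample t-test. The names `plant_yield`, `t_statistic`, `simulate`, and `T_CRIT` are hypothetical.

```python
import random

# Two-sided 5% critical values of Student's t with 2n - 2 degrees of freedom,
# keyed by the number of plants per plot (n).
T_CRIT = {5: 2.306, 10: 2.101, 20: 2.024}


def plant_yield(rng):
    """Yield (kg) of one plant: 75% are high yield, 25% are low yield."""
    mean = 1.0 if rng.random() < 0.75 else 0.5
    return rng.gauss(mean, 0.1)


def t_statistic(a, b):
    """Equal-variance two-sample t statistic for two groups of the same size."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    pooled_var = (var_a + var_b) / 2
    return (mean_a - mean_b) / (2 * pooled_var / n) ** 0.5


def simulate(n, n_reps=10_000, seed=1):
    """Count how often the treatment plot tests as lower, higher, or neither."""
    rng = random.Random(seed)
    counts = {"lo": 0, "hi": 0, "neither": 0}
    for _ in range(n_reps):
        plants = [plant_yield(rng) for _ in range(2 * n)]
        rng.shuffle(plants)  # random assignment of plants to plots
        treatment, control = plants[:n], plants[n:]
        # Fertilizer has no effect, so treatment yields are left unchanged.
        t = t_statistic(treatment, control)
        if t < -T_CRIT[n]:
            counts["lo"] += 1
        elif t > T_CRIT[n]:
            counts["hi"] += 1
        else:
            counts["neither"] += 1
    return counts


for n in (5, 10, 20):
    result = simulate(n)
    print(f"Sample size: {n} lo: {result['lo']} hi: {result['hi']} neither: {result['neither']}")
```

Changing `seed` shows how much the counts vary from run to run. Making the 0.75 in `plant_yield` depend on the plot would mimic a randomization that was done badly.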
With counterfactuals, controlled experiments, and randomization out of the way, our next stop will be the challenge of falsification.
- I didn’t discuss the “and measuring” part last week, only the “identifying” part. We’ll return to measuring causal effects later in this series after we’ve explored issues associated with identifying causal effects (or exhausted ourselves trying). ↩
- That corresponds to an effect size of 0.2 standard deviations. ↩
- Click through to the next page to see the R code. ↩
- OK, you can’t see that you need to do the randomization well, but I did it well and it worked, so why not do it well and be safe? ↩
- Since I used a two-sided t-test with a 5% significance threshold, this is just what you should expect. ↩
Randomized controlled experiments are generally regarded as the gold standard for identifying a causal factor.1 Let’s describe a really simple one first. Then we’ll explore why they’re regarded as the gold standard.
Picking up with the example I used last time, let’s suppose we’re trying to test the hypothesis that applying nitrogen fertilizer increases the yield of corn.2 As I pointed out, in setting up our experiment, we’d seek to control for every variable that could influence corn yield so that we can isolate the effect of nitrogen. In the simplest possible case, we’d have two adjacent plots in a field that have been plowed and tilled thoroughly so that the soil in the two plots is completely mixed and indistinguishable in every way – same content of nitrogen, phosphorus, other macronutrients, other micronutrients; same soil texture; same percent of (the same kind of) soil organic matter; same composition of clay, silt, and sand; everything.3 We’d also have plants that were genetically uniform (or as genetically uniform as we can make them), either highly inbred lines or an F1 cross produced between two highly inbred lines. We’d make sure the field was level, maybe using high-tech laser leveling devices, and we’d make sure that every plant in the entire field received the same amount of water. Since we know that the microclimate at the perimeter of the field is different from that in the middle of the field, we’d make the field big enough that we could focus our measurements on a part of the field isolated from these edge effects. Then we’d randomly choose one side of the field to be the “low N” treatment and the other to be the “high N” treatment.4 After allowing the plants to grow for an appropriate amount of time, we’d harvest them, dry them, and weigh them.
Our hypothesis has the form
If N is applied to a corn field, then the yield will be greater than if it had not been applied.
Notice that we can’t both apply N and not apply N to the same set of plants. We have to compare what happens when we apply N to one set of plants and don’t apply it to another. If we find that the “high N” plants have a greater yield than the “low N” plants, we infer that the “low N” plants would also have had a greater yield if we had applied N to them (which we didn’t). Why is that justified? Because everything about the two treatments is identical, by design, except for the amount of N applied. If there’s a difference in yield, it can only be attributed to something that differs between the treatments, and the only thing that differs is the amount of N applied.
I can hear you thinking, “Couldn’t the difference just be due to chance?” Well, yes it could. If we do a statistical test and demonstrate that the yields are statistically distinguishable, that increases our confidence that the difference in yield is real, but nothing can ever make the conclusion logically certain in the way that “2+2=4” is logically certain.5 To my mind there are two things that make us accept the outcome of this experiment as evidence that applying N increases corn yield:
- It’s not just this experiment. If the same experiment is repeated in different places with different soil types, different corn genotypes, and different weather patterns, we get the same result. We can never be certain, but the consistency of that result increases our confidence that the association isn’t just a fluke.
- What we understand about plant growth and physiology leads us to expect that providing nitrogen in fertilizer should enhance plant growth. In other words, this particular hypothesis is part of a larger theoretical framework about plant physiology and development. That framework provides a coherent and repeatable set of predictions across a wide empirical domain.
Put those two together, and we have good reason for thinking that the observed association between N fertilizer and corn yield is actually a causal association.
In experiments where we can’t completely control all relevant variables except the one that we’re interested in, we rely on randomization. Suppose, for example, we couldn’t produce genetically uniform corn. Then we’d randomize the assignment of individuals to the “high” and “low” treatments. The results aren’t quite as solid as if we’d had complete uniformity. It’s always possible that by some statistical fluke a factor we aren’t measuring ends up overrepresented in one treatment and underrepresented in the other, but if we’ve randomized well and we have a reasonably large sample, the chances are small. So our inference isn’t quite as firm, but it’s still pretty good.
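To get a feel for how small “small” is, here is a rough sketch in Python. Everything in it is an assumption for illustration: an unmeasured factor carried by 25% of plants, a definition of “badly unbalanced” as a gap of more than 20 percentage points between the two treatments, and the hypothetical function name `imbalance_probability`.

```python
import random


def imbalance_probability(n_per_group, freq=0.25, threshold=0.2,
                          n_reps=5000, seed=0):
    """Estimate how often random assignment leaves an unmeasured factor
    badly unbalanced between the "high" and "low" treatments."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n_reps):
        # 1 = plant carries the unmeasured factor, 0 = it does not.
        plants = [1 if rng.random() < freq else 0
                  for _ in range(2 * n_per_group)]
        rng.shuffle(plants)  # random assignment to treatments
        high = plants[:n_per_group]
        low = plants[n_per_group:]
        # Difference between treatments in the proportion carrying the factor.
        gap = abs(sum(high) - sum(low)) / n_per_group
        if gap > threshold:
            bad += 1
    return bad / n_reps


for n in (5, 20, 80):
    print(f"{n} plants per treatment: "
          f"P(badly unbalanced) is about {imbalance_probability(n):.3f}")
```

Under these assumptions the chance of a bad split shrinks quickly as the number of plants per treatment grows, which is the sense in which randomization plus a reasonably large sample protects the inference.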
We’ll explore the “reasonably large sample” question in the next installment.
- See, for example, Rubin (Annals of Applied Statistics 2:808-840; 2008. https://projecteuclid.org/euclid.aoas/1223908042) ↩
- If you know me or my work, you know that I’m not at all crazy about the null hypothesis testing approach to investigating ecology. We’ll get to that later, but let’s start with a simple case. Even those of us who don’t like null hypothesis testing as a general approach recognize that it has value. We’ll focus on one way in which it has value here. ↩
- If we were really fastidious we might even set up the experiment in a large growth chamber in which we mixed the soil together and distributed it evenly ourselves. ↩
- If we were really paranoid about controlling for all possible factors, we’d even randomly assign a nitrogen fertilizer level (high or low) to every different plant in the field, and we’d probably do the whole experiment in a very large growth chamber where we could mix the soil ourselves and ensure that light, humidity, and temperature were as uniform as possible across all individuals in the experiment. ↩
- If you don’t see why, Google “problem of induction” and you’ll get some idea. If that doesn’t satisfy you, ask, and I’ll see what I can do to provide an explanation. ↩
It’s among the longest stretches of (planned) travel that I’ve done.1 I
- left Hartford at 11:45am EDT on Thursday, April 19
- arrived in Cincinnati at 1:53pm,
- left Cincinnati at 2:45pm,
- arrived in Los Angeles at 4:45pm PDT,
- left Los Angeles at 10:30pm, and
- arrived in Brisbane at 5:30am Australian Eastern time on Saturday, April 21.
There’s a 14-hour time difference between Brisbane and Hartford. That makes the total travel time 27 hours, 45 minutes gate to gate. I arrived at my hotel about 2 1/2 hours ago. Remarkably, they had a room they could give me, even though the official check-in time isn’t until 2:00pm. It’s a very comfortable room in what appears to be a very nice part of the city. I don’t have any meetings until tomorrow. Once I’ve finished up a couple of things I want to do, I’m going to put on some comfortable shoes and go for a walk around town with my camera. First stop, the City Botanic Gardens. I’ll miss the farmer’s market, which is held tomorrow, and I’m not sure what I’ll visit after the botanic gardens, but I’m going to keep going all day. If I can go to bed at something resembling a normal time, there’s a good chance I’ll escape the worst effects of jet lag tomorrow.
The photo is the view from my hotel room.
- A few years ago it took me 3 1/2 days to get home from Cape Town. I was stranded in Amsterdam for 2 nights. Yes, I mean stranded. The first night I was stuck in the airport. The second night I was at an airport hotel, but I didn’t get there until 3 in the afternoon – too late to go into the city and enjoy anything. ↩
UConn is a member of Universitas 21, an international group of universities dedicated to excellence in research and education. Every year Deans and Directors of Graduate Studies (DDoGS) of U21 universities gather at one of the member universities to exchange ideas about improving graduate education. This year the discussions will include sessions on providing support for career planning, advising and mentoring graduate students and postdocs, and entrepreneurship. The meetings begin Sunday morning and continue through Tuesday – three very full days of vigorous discussion.
This year, the University of Queensland is hosting the meeting, hence my travel to Brisbane. I’m sitting in the departure lounge at Bradley as I type this, and I expect to land in Brisbane about 36 hours from now. If time permits, I’ll send an update or two from the DDoGS meeting. If not, expect a short report after I return.
Let’s start with a few preliminaries.1
- A causal factor (“cause” for short) is something that is predictably related to a particular outcome. For example, fertilizing crops generally increases their yield, so fertilizer is a causal factor related to yield. The way I think about it, a causal factor need not always lead to the outcome. It’s enough if it merely increases the probability of the outcome. For example, smoking doesn’t always lead to lung cancer among those who smoke, but it does increase the probability that you will suffer from lung cancer if you smoke.
- Causes precede effects.2 That’s one reason why teleology is problematic. A teleological explanation explains the current state of things as a result of, i.e., as caused by, something in the future, namely a purpose.3
- Effects may have multiple causes. The world, or at least the world of biology, is a complicated place. Regardless of what phenomenon you’re studying, there are likely to be several (or many) causal factors that influence it.
The last point is one of the most important ones for purposes of this series. When we are investigating a phenomenon,4 we’re trying to discern which of several plausible causal factors plays a role and, possibly, the relative “importance” of those causal factors.5
To make this concrete, let’s suppose that we’re trying to determine whether application of nitrogen fertilizer increases the yield of corn. That means we have to determine whether adding nitrogen and adding nitrogen alone increases corn yield. Why the emphasis on “adding nitrogen alone”? Suppose that we added nitrogen to a corn field by adding manure. Then increases in the amount of applied nitrogen are associated with increases in the amount of a host of other substances. If yields increased, we’d know that adding manure increases yield, but not whether it’s because of the nitrogen in manure or something else. Why does this matter?
From very early on in our education we’re taught that “correlation is not the same as causation.” We want to distinguish cases where A causes B from cases where A is merely correlated with B. Yet, as David Hume pointed out long ago, experience6 alone can only show us that A and B actually occur together, not that they must occur together (link). One way of distinguishing cause from correlation is that causes support counterfactual statements. They provide us with a reason to believe statements like “If we had applied nitrogen to the field, the corn yield would have increased” even if we never applied nitrogen to the field at all. The only reason I can see that we could believe such a statement is if we had already determined that adding nitrogen and adding nitrogen alone increases corn yield.7
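The manure example can be turned into a small simulation. The sketch below is a hypothetical illustration in Python. It assumes, purely for the sake of argument, that yield responds only to the other substances in manure and not to nitrogen at all. The function names and all of the numbers are made up.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def manure_fields(n_fields=500, seed=3):
    """Simulate fields that receive nitrogen only as part of manure."""
    rng = random.Random(seed)
    nitrogen, yields = [], []
    for _ in range(n_fields):
        manure = rng.uniform(0.0, 10.0)  # amount of manure (arbitrary units)
        nitrogen.append(5.0 * manure)    # manure carries nitrogen ...
        other = 2.0 * manure             # ... and a host of other substances
        # Assumption: yield depends ONLY on the other substances, never on nitrogen.
        yields.append(1.0 + 0.05 * other + rng.gauss(0.0, 0.1))
    return nitrogen, yields


nitrogen, yields = manure_fields()
r = pearson_r(nitrogen, yields)
print(f"correlation between applied nitrogen and yield: {r:.2f}")
```

Applied nitrogen and yield are strongly correlated in these simulated fields even though nitrogen does nothing by construction. The data alone would not justify the counterfactual claim about nitrogen, which is why we need to establish that adding nitrogen alone increases yield.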
How do we determine that? Randomized controlled experiments are the most widely known approach, and they are typically regarded as the gold standard against which all other means of inference are compared. That’s where we’ll pick up in the next installment.
- As I warned in the introduction to the series, I am not an expert in causal inference. The terminology I use is likely both to be imprecise and to be somewhat different from the terminology experts use. ↩
- Philosophers have argued about whether backward causation is possible, but I’m going to ignore that possibility. ↩
- Biologists sometimes use teleological language to explain adaptation, e.g., land animals evolved legs to provide mobility. It is, however, relatively easy (if a bit long-winded) to eliminate the teleological language, because natural selection shows how adaptations arise from differential reproduction and survival (link). ↩
- Or at least this is how it is when I’m investigating a phenomenon. ↩
- I’ll come back to the idea of identifying the relative importance of causal factors in a future post. ↩
- Or experiment. ↩
- If there are any philosophers reading this, you’ll recognize that this account is horribly sketchy and amounts to little more than proof by vigorous assertion. If you’re so inclined, I invite you to flesh out more complete explanations for readers who are interested. ↩
If you’ve been following posts here since the first of the year, you know that I’ve been writing about how I keep myself organized. Today I’m starting a completely different series in which I begin to collect my thoughts on how we can make judgments about the cause (or causes) of ecological phenomena1 and the circumstances under which judgments are possible. Before I start, I need to offer a few disclaimers.
- Any evolutionary biologist or ecologist who knows me and my work knows that it’s not uncommon for my ideas to represent a minority opinion. (Think pollen discounting for those of you who know my work on the evolution of plant mating systems.) I make no claim that anything I write here is broadly representative of what my fellow evolutionary biologists and ecologists think, only that it’s what I think. Please challenge me on anything you think I’ve got wrong, because I’m sure there will be things I get wrong, and the easiest way for me to discover those errors is for someone else to point them out.
- I had a minor in Philosophy as an undergraduate and there is an enormous literature on causality in the philosophy of science. I’ll be using a very crude understanding of “cause.” I don’t think it is wildly misleading, but I’m certain it wouldn’t stand up to serious scrutiny.2
- I’ll be thinking about causal inference in the specific context of trying to infer causes from observational data using statistics rather than inferring causes from controlled experiments.3 I’ll be using an approach developed in the 1970s by Donald Rubin, the Rubin Causal Model.4
- There is a very large literature on causal inference in the social sciences. I’ll be drawing heavily on Imbens and Rubin, Causal Inference for Statistics, Social and Biomedical Sciences: An Introduction,5 but there’s an enormous amount of material there that I won’t attempt to cover. I am also pretty new to the concepts associated with the Rubin causal model, so it’s entirely possible that I’ll misrepresent or misinterpret a point that the real experts got right. In other words, if something I say doesn’t make any sense, it’s more likely I got it wrong than that Imbens and Rubin got it wrong.
Although I will be thinking about causal inference in the context of observational data and statistics, I don’t plan to write much (if at all) about the problems with P-values, Bayes factors, credible/confidence intervals overlapping 0 (or not), and the like. If you’d like to know the concerns I have about them, here are links to old posts on those issues.
- Inference from noisy data with small samples
- Being Bayesian won’t save you
- Even an informative prior doesn’t help much
- Noisy data and small samples are a bad combination
- I’m calling the post “Causal inference in ecology” only because “Causal inference in ecology, evolutionary biology, and population genetics” would be too long. ↩
- There’s a good chance that a moderately competent undergraduate Philosophy major would find it woefully inadequate. ↩
- To be more precise, we don’t infer causes from controlled experiments. Rather, we have pre-existing hypotheses about possible causes, and we use controlled experiments to test those hypotheses. ↩
- In my relatively limited reading on the subject, I’ve most often seen it referred to as the Rubin causal model, but it is sometimes referred to as the Neyman causal model. ↩
- Reminder: If you click on that link, it will take you to Amazon.com. I use that link simply because it’s convenient. You can buy the book, if you’re so inclined, from many other outlets. I am not an Amazon affiliate, and I will not receive any compensation if you decide to buy the book, regardless of whether you buy it at Amazon or elsewhere. By the way, Chapter 23 in Gelman and Hill’s book, Data Analysis Using Regression and Multilevel/Hierarchical Models, has an excellent overview of the Rubin causal model. ↩
When I started this series I didn’t think it would take me three months to finish, but it did. If you’ve been following along, you’ve read about how I keep myself organized. In this last post, I’ll put it all together by running through the process with links to the individual steps. If you’re familiar with David Allen’s Getting Things Done, this will look pretty familiar, although I discovered most of these practices on my own well before I read his book.1
It all starts on Sunday morning. I brew myself a nice cup of coffee (black, no sugar), sit down at my laptop, and boot up OmniFocus. I move tasks that may have accumulated in my OmniFocus Inbox to the appropriate Project folder or subfolder.2 Then I use the Review perspective to review all of my tasks. I’ve set different projects for different review frequencies. Some I review every week, some I review once a month, and some I review only once every 3-6 months. But everything gets reviewed at a frequency experience has taught me is appropriate. Every week the review will reveal tasks that need to be rescheduled (sometimes earlier, sometimes later) or dropped. And every week the review gives me ideas for new tasks or projects that get entered into the appropriate place (sometimes it’s Someday/Maybe for things that I just need to think about, sometimes it’s a new project or a new task in an existing project). With that review done, I’m confident that I’ve planned for anything I can plan for in the following week and that my complete list of projects and tasks is in good order so that I’ll be prompted about other important things when the right time arrives.
I review my calendar for the week ahead at the same time. Before I became a dean, I made appointments with myself for blocks of time that I could use for focused work. I treated those time blocks as real appointments and did my best not to let other commitments break them up. As a Dean, I can’t be that inflexible. Too many things arise that need prompt, if not immediate, attention. I’ve cut back on scheduling blocks of time for focused work. Only when I have a really important project that has a looming deadline, a grant proposal for example, will I put a “Do not disturb” block of time on my calendar with instructions to my administrative assistant to check with me before scheduling anything short of a meeting request from the President or the Provost in that time block. That’s as close as I can get to planning deep work time ahead of time. Mostly, I have to take advantage of time blocks when they appear, and they are rarely more than a couple of hours.
On any given day, my calendar and OmniFocus keep me on track. Some of my OmniFocus tasks have specific times of day associated with them, meeting preparation for example. Many have only the end of the day, 5:00pm. I review today’s task list every morning. As a result, I can often pick something to do without checking OmniFocus first, but I do check it frequently throughout the day, often because I’m entering something new that just came up.
At meetings I rarely take paper. I’ve either saved the electronic versions of documents that were sent or scanned paper versions to PDF. Either way, any documents I have before the meeting are in Dropbox, Evernote, or both. Any notes I’ve made before the meeting were probably made with Emacs using Markdown, and published to Evernote with Byword. At the meetings I use pen and paper, my everything notebook. At the end of the day, I’ll scan notes to PDF and save them to Dropbox or I’ll scan them directly to Evernote. As I wrote earlier, I don’t have a clear plan for what goes to Dropbox and what goes to Evernote, but either way I can get it from any electronic device I have handy. If there are action items I need to follow up on, I will have marked them with an arrow (==>) in my notebook, and I transfer them to OmniFocus. I also check over my everything notebook during my weekly review to make sure I haven’t missed any action items that need to be recorded.
Writing it all out like this may make it sound pretty time consuming and complicated, but it’s not. The daily task management is a natural part of the activity and it doesn’t add any time. It just uses the time differently. The weekly review takes a bit longer, but spending 15 minutes or half an hour with a nice cup of coffee looking over the week to come is a nice way to spend a quiet Sunday morning.
Last week I introduced the idea of deep work,
Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit.
The key words there are distraction-free. I picked up some useful tips from reading Deep Work, but there’s also at least one limit to be aware of.1
In Deep Work Cal Newport describes the working style of two people who have been exceptionally productive and who exemplify what he calls the monastic philosophy of deep work scheduling, Adam Grant and Don Knuth. I don’t think I’d heard of Adam Grant before,2 but anyone who’s done more than a little programming has heard of Don Knuth. Not only is he the author of the monumental The Art of Computer Programming, he grew frustrated with the typesetting for TAoCP and wrote TeX and Metafont to compensate. He is also famously inaccessible by e-mail. He stopped answering e-mail in 1990. If you want to contact him, you’ll need to send him a letter to his postal mailing address. His administrative assistant will sort through them and pass along any that seem relevant. Grant isn’t quite as extreme as Knuth, but he batches his availability. He stacks all of his teaching into the fall semester, turning his attention fully to research for the rest of the year. He’ll answer e-mail, but if you happen to e-mail him during one of the 3-4 day periods when he’s focused on a research task, you’ll get an auto-response telling you that you’ll have to wait to hear back from him.
There’s no question that a monastic approach to deep work allows those who can adopt it to accomplish an enormous amount. But there’s also no question that society can continue to function only so long as there are only a few people who adopt that approach. A functioning society depends on functioning institutions, and functioning institutions depend on people to keep them functioning. If you work with a very small group of people, you might be able to agree among yourselves that interruptions are allowed only between 11:00am and 1:00pm or only on Tuesdays and Thursdays, but if you work with more than four or five people you’re unlikely to be able to set aside consistent “do not interrupt” hours except relatively early in the morning or relatively late in the day.3
And it’s not just the people you work with face to face. If you’re an academic, the functioning of your scholarly community depends on your willingness to review papers and grant proposals and to serve as a leader in your scholarly society. I know a few people4 who have made many important scientific contributions, in the sense that they’ve published important papers and discovered important things, who have also made few or no contributions at all to supporting the scholarly community on which they depend. If you decide to adopt a monastic approach, you’d better be sure that you can make contributions large and important enough that they compensate for your lack of community spirit.
Most of us won’t even be able to adopt the bimodal philosophy that Jung employed – periods of intense deep work in seclusion interspersed with periods of involvement in day-to-day life and work.5 It’s most likely that we’ll have to adopt the journalistic philosophy – developing the discipline to do concentrated deep work whenever the opportunity presents itself. That’s why setting up your workspace in a way that you can avoid distraction is important. Any time you find yourself with more than 15-20 minutes of uninterrupted time, ask yourself,
- How much time can I set aside right now for work that needs concentrated attention?
- What is the most important work I can do right now that needs concentrated attention?
Then do that work, and don’t allow yourself to be interrupted. Close the door. Don’t answer the phone. Ignore e-mail, Twitter, and Facebook.6 Turn off notifications on your cellphone. Do the work. Then take a break and reward yourself.
Your position in life will determine both how often you find yourself with those uninterrupted blocks of time and how long they are. If you ever find yourself in a position like mine, whether department head or dean or any other administrative position, you’ll soon learn something a friend of mine told me a long time ago.
It’s not that Provosts or Presidents spend that much more time working than the average faculty member. It’s that Provosts and Presidents have little control over their own time.7
That’s more true for me now as a Vice Provost and Dean than it was when I served as Interim Department Head, and it was more true for me as a faculty member than it was as a graduate student or postdoc.
One last piece of advice: if you’re a graduate student or postdoc reading this, take advantage of your relative freedom to develop good deep work habits now. The more you practice, the better you get at it, and the older you get, the more you’re going to need those good habits – no matter what career path you follow.
- To be fair, Cal Newport acknowledges the limit I’m about to describe, but I don’t think his discussion of depth philosophies fully captures it. ↩
- It turns out he was the youngest person ever promoted to full professor at Wharton, and he’s the author of a New York Times bestseller (link). ↩
- I say “relatively” because the meaning of early and late depends on where you work. Many businesses operate on an 8:00am-5:00pm schedule, so early might be before 8:00am and late might be after 5:00pm. I’m a morning person. I’m usually in the office before 6:30am. Since I rarely have scheduled meetings before 9:00am (except for meetings with my students), I typically have 2 1/2 hours to myself every morning. ↩
- Who shall remain nameless. ↩
- If you’re not familiar with Jung’s work habits, buy Deep Work or do a little web surfing. Same thing for Walter Isaacson who follows. ↩
- Use a new, clean workspace if you’re on your computer. Use a utility that blocks Internet access if you doubt your willpower. ↩
- The friend who told me this is a former Provost at a major research university (not UConn). ↩