Uncommon Ground

Author Archive: kent

A few thoughts on how to structure a scientific paper

I mentioned last week that I’m reading Williams & Bizup, Style: The Basics of Clarity and Grace. Yesterday I came across this very succinct advice for the early stages of writing a paper and thinking about how to structure it.

When you plan a paper, look for a question that is small enough to answer but is also connected to a question large enough for you and your readers to care about.

If you’re a scientist and writing a paper,1 you already have the data and most or all of the statistical analyses done. So the “look for a question” part has to happen twice in writing a scientific paper.2 You need to “look for a question that is small enough to answer but is also connected to a question large enough for you and your readers to care about” before you begin collecting data. Then you need to collect data that will answer that question.

Science being what it is,3 after you’ve collected the data you’ll find that there are data you couldn’t collect that you wanted to collect4 and there are data you collected that you didn’t anticipate collecting. In writing the paper you now have to look at the data you have in hand, identify a question that the data in hand can answer that is connected to a larger, interesting question, and (this is the hard part) write the paper using only the data that answer that larger, interesting question. If you’re like me, you5 will have collected other data that don’t fit in this paper. That doesn’t mean they’re useless, and it doesn’t mean you should discard them. It merely means that they’re not useful for this paper. With any luck you’ll find that they are useful for another paper that you’ll write in the future.

  1. Or at least if you’re a scientist like me and writing a paper.
  2. Or at least it has to happen twice if you’re me.
  3. Or at least science being what it is in the way that I do it.
  4. Especially if your research involves work in the field.
  5. In the interest of full disclosure, I have to point out that I almost never collect data myself. It’s my students and collaborators who collect the data. Even when I’m in the field, I mostly hold the field notebook and write down the measurements someone else is making. I rarely make the measurements myself. The closest I usually come to collecting data myself is collecting samples from which someone else derives data.

A few thoughts on writing (inspired by Williams & Bizup, Style)

I do not claim to write well, but I have been writing for nearly 40 years, and I’ve been helping students with writing for more than 30. Along the way I’ve figured out a few things that work for me, so I thought I’d pass a few of them along. Keep in mind that I have no training, and I have no credentials suggesting that anything I write is worth reading. If you find something useful here, use it. If you don’t, ignore it. Better yet, if you find something here you think is fundamentally misguided, leave a comment so that others won’t be misled.

Nearly 30 years ago I bought a copy of Williams, Style: Lessons in Clarity and Grace. I’ve referred to it frequently ever since. On Wednesday I bought the Kindle edition of Williams & Bizup, Style: The Basics of Clarity and Grace. I’m just getting started on it, but I can already recommend it. It’s a shorter version of Lessons, but even in the shorter version there’s a lot here that anyone who writes can use.

The first and most important lesson is not to worry about style or principles of style at all until you’ve written something down. The only bad first draft is the first draft you haven’t written. Before you worry about whether readers can understand you or whether what you’ve written will capture or hold their interest, write something down so that you can start revising it. This lesson took me a very long time to learn. When I was in graduate school I literally had to start writing with the first paragraph of the Introduction and then write every other paragraph in sequence until I was done. I also struggled to make every sentence and paragraph perfect as I was writing them, because that was how I imagined writers wrote. Even though I’d read and heard it before, it wasn’t until some time after I joined the faculty at UConn that I finally understood that 90 percent or more of writing is rewriting. As Williams and Bizup put it,

Most experienced writers get something down as fast as they can. Then as they revise that first draft into something clearer, they understand their ideas better. And when they understand their ideas better, they express them more clearly, and the more clearly they express them, the better they understand them—and so it goes, until they run out of energy, interest, or time.

They also point out that you can exercise your revising chops on other people’s writing. When you’re reading something that seems complicated and confusing, take a good, hard look at it and see if you can find a way to express the ideas more clearly. If you can, you’ll have the satisfaction not only of having worked out the meaning of that complicated thing, but also of knowing that you had the skill to make something understandable when the author couldn’t or wouldn’t do the same thing.

Just telling you to revise doesn’t help, of course. You have to know how to revise. The good news is that Williams and Bizup provide a set of principles that anyone can learn and apply. It’s not easy to apply them, and sometimes you won’t have the time to apply them, but keep in mind that

Everything that can be thought at all can be thought clearly. Everything that can be said can be said clearly. —Ludwig Wittgenstein

BioOne is collaborating with SPIE and moving to a new platform

A few of you know that one of the hats I wear is that of Chair for the Board of Directors of BioOne. BioOne is a non-profit organization that provides low-cost access to journals in organismal and environmental life sciences while providing the society and non-profit publishers of journals in the BioOne collection with substantial revenue. Leadership of BioOne includes representatives of both the scholarly publishers and academic libraries. I have found it very rewarding to be associated with such a productive collaboration. The focus on low-cost access increases the availability of the journals to students and scholars everywhere. The focus on providing income to publishers ensures that they can continue to publish the journals. Working together, we help to ensure that scholarly communication within the fields represented in BioOne is accessible and sustainable.

Today BioOne is announcing a new collaboration with SPIE, the international society for optics and photonics. What do optics and photonics have to do with life sciences, you ask? Well, like BioOne, SPIE is committed to providing electronic access to a wide audience, and BioOne’s journal collection will be hosted on a new, high-performance web site in collaboration with SPIE that launches on January 1, 2019. SPIE is already providing the technology behind the BioOne Career Center, and we look forward to working with them to provide even better access to journal resources than we do now and to develop new ways of serving the life science community that we haven’t even thought of yet.

Here’s the press release:

BioOne, the nonprofit publisher of more than 200 journals from 150 scientific societies and independent presses, has announced the forthcoming launch of a new website for its content aggregation, BioOne Complete. The new website, to launch on January 1, 2019, will be powered by a nonprofit collaboration with SPIE, the international society for optics and photonics.

This significant partnership leverages SPIE’s proprietary platform technology to meet the needs of BioOne’s community, including its more than 4,000 accessing libraries worldwide. The new BioOne platform (remaining at bioone.org) will give BioOne Complete a more modern and intuitive look and feel, while enhancing user functionality.

Lauren Kane, BioOne Chief Strategy and Operating Officer, notes, “This exciting partnership better positions BioOne for growth in the future, all while redirecting a major cost center to a fellow not-for-profit organization. SPIE has already proven to be a responsive and creative collaborator with an appreciation for BioOne’s mission and stakeholder needs. We are excited to share this news, and soon, our new site, with the community.”

Scott Ritchey, SPIE Chief Technology Officer, adds, “Our partnership with BioOne demonstrates the value that compatible, not-for-profit organizations can create when working together. The SPIE mission is better fulfilled with the shared insights and economies of scale created by our relationship with BioOne.”

BioOne’s goal is to ensure that this will be a seamless and transparent transition for all stakeholder groups. All aggregation content, subscriber licenses, and user profiles are being migrated to the new site. The BioOne team will be in touch throughout the fall with updates, required actions, and educational resources.

Proposed revisions to US Endangered Species Act regulations

On Monday I pointed out that the US Fish & Wildlife Service and the National Marine Fisheries Service planned to propose revisions to regulations that affect how the Endangered Species Act is implemented. The proposed changes were published in the Federal Register today. There are three sets of changes. Here are links and the accompanying summary for each:

We, the U.S. Fish and Wildlife Service (FWS) and the National Marine Fisheries Service (NMFS) (collectively referred to as the “Services” or “we”), propose to revise portions of our regulations that implement section 4 of the Endangered Species Act of 1973, as amended (Act). The proposed revisions to the regulations clarify, interpret, and implement portions of the Act concerning the procedures and criteria used for listing or removing species from the Lists of Endangered and Threatened Wildlife and Plants and designating critical habitat. We also propose to make multiple technical revisions to update existing sections or to refer appropriately to other sections. https://www.federalregister.gov/documents/2018/07/25/2018-15810/endangered-and-threatened-wildlife-and-plants-revision-of-the-regulations-for-listing-species-and

We, the U.S. Fish and Wildlife Service, propose to revise our regulations extending most of the prohibitions for activities involving endangered species to threatened species. For species already listed as a threatened species, the proposed regulations would not alter the applicable prohibitions. The proposed regulations would require the Service, pursuant to section 4(d) of the Endangered Species Act, to determine what, if any, protective regulations are appropriate for species that the Service in the future determines to be threatened. https://www.federalregister.gov/documents/2018/07/25/2018-15811/endangered-and-threatened-wildlife-and-plants-revision-of-the-regulations-for-prohibitions-to

We, FWS and NMFS (collectively referred to as the “Services” or “we”), propose to amend portions of our regulations that implement section 7 of the Endangered Species Act of 1973, as amended. The Services are proposing these changes to improve and clarify the interagency consultation processes and make them more efficient and consistent. https://www.federalregister.gov/documents/2018/07/25/2018-15812/endangered-and-threatened-wildlife-and-plants-revision-of-regulations-for-interagency-cooperation

The period for public comment ends on 24 September 2018.

Proposed revisions to regulations implementing the US Endangered Species Act

The US Fish & Wildlife Service and the National Marine Fisheries Service are charged with implementing the US Endangered Species Act. On Wednesday, they will publish three proposed rules in the Federal Register that modify existing regulations by which they implement the act. The proposed rules deal specifically with

  • Criteria for listing of species as endangered or threatened and for designation of critical habitat,
  • Aligning the way in which protections to threatened species are applied between USFWS and NMFS, and
  • Changing requirements and procedures associated with interagency cooperation on activities that affect endangered species.

If you are interested in how the Endangered Species Act is implemented in the United States, I urge you to read the proposed changes. If you want to comment on them, you have two options (on or after Wednesday, 25 July):

  1. Go to the Federal eRulemaking Portal (http://www.regulations.gov), enter FWS-HQ-ES-2018-0006 in the search box, click on the “Proposed Rules” link, click on “Comment Now!”, and submit your comment.
  2. Deliver a hard copy of your comments by US mail or hand delivery to
    • Public Comments Processing, Attn: FWS–HQ–ES–2018–0006; U.S. Fish & Wildlife Service, MS: BPHC, 5275 Leesburg Pike, Falls Church, VA 22041–3803
    • National Marine Fisheries Service, Office of Protected Resources, 1315 East West Highway, Silver Spring, MD 20910.

If you submit comments, they will be posted at http://www.regulations.gov.

I expect to review the proposed changes over the next few weeks and to post my comments on each of the proposals here. Then I’ll collect them into a single comment and post them at http://www.regulations.gov. If you read my comments and disagree, please explain how and why you disagree in the comments. Your comments will make the comments I share with USFWS and NMFS much better.

Saturday afternoon at Trail Wood

OK. This is mildly embarrassing. I moved to Connecticut in 1986, I was one of the co-founders of the Edwin Way Teale Lecture Series on Nature and the Environment in 1996, I’ve read A Naturalist Buys an Old Farm at least half a dozen times, and Trail Wood is less than 30 miles (40 minutes) from my home in Coventry, but it wasn’t until Saturday that I finally visited. It won’t be the last time. I expect to return once or twice a year to the Beaver Pond Trail, to cross Starfield and Firefly Meadow, and to visit the Summerhouse and Writing Cabin.

Black-eyed susan (Rudbeckia hirta) photographed at Trail Wood

A nice patch of black-eyed susan (Rudbeckia hirta) greeted me near the parking area, which is just a short walk from the house at Trail Brook. Rather than following Veery Lane, I turned left and followed the path through Firefly Meadow towards the small pond.

Edwin Way Teale’s writing cabin at Trail Wood

The Writing Cabin is on the southwest shore of the pond. I turned right and followed the northeast shore to Summerhouse. From there I followed a path along the stone wall bordering Woodcock Pasture until it met the Shagbark Hickory Trail.

Spotted wintergreen (Chimaphila maculata) photographed at Trail Wood

I found spotted wintergreen (Chimaphila maculata) along the Shagbark Hickory Trail, which I followed to the Old Colonial Road. From there I followed the Beaver Pond Trail to the edge of the pond.

Beaver Pond at Trail Wood

After sitting for a while on a nice bench at the south end of the pond, I backtracked on the Beaver Pond Trail and followed the Fern Brook trail through Starfield back to the house and then to the parking area. The whole walk was less than a mile and a half, and the total elevation gain was only 55 feet. It was definitely an easy walk, not a hike, but it was very pleasant, and it was nice to spend time on the old farm where Teale spent so much of his time.

So to anyone from UConn (or nearby) who reads this and hasn’t been to Trail Wood yet, take a couple of hours some afternoon, drive to Hampton, and explore. Trail Wood is easy to find, and it’s open from dawn to dusk. It’s a gem in our own backyard. And if you haven’t read A Naturalist Buys an Old Farm, do it now. You’ll enjoy your visit to Trail Wood even more if you do.

On the importance of making observations (and inferences) at the right hierarchical level

I mentioned a couple of weeks ago that trait-environment associations observed at a global scale across many lineages don’t necessarily correspond to those observed within lineages at a smaller scale (link). I didn’t mention it then, but this is just another example of the general phenomenon known as the ecological fallacy, in which associations evident at the level of a group are attributed to individuals within the group. The ecological fallacy is related to Simpson’s paradox in which within-group associations differ from those between groups.
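To make the distinction concrete, here is a minimal sketch with made-up numbers (they are not from any of the papers mentioned here). Three hypothetical groups each show a perfectly negative association between two variables, yet pooling all the observations gives a strongly positive one. The group labels and values are assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: each group is a list of (x, y) observations.
# Within a group y falls as x rises; across groups both rise together.
GROUPS = {
    "A": [(0, 2), (1, 1), (2, 0)],
    "B": [(5, 12), (6, 11), (7, 10)],
    "C": [(10, 22), (11, 21), (12, 20)],
}


def within_group_correlations(groups):
    """Correlation between x and y computed separately inside each group."""
    return {
        name: pearson([x for x, _ in points], [y for _, y in points])
        for name, points in groups.items()
    }


def pooled_correlation(groups):
    """Correlation between x and y after ignoring group membership."""
    points = [point for group in groups.values() for point in group]
    return pearson([x for x, _ in points], [y for _, y in points])


print(within_group_correlations(GROUPS))  # negative in every group
print(pooled_correlation(GROUPS))         # positive when pooled
```

Anyone who inferred the within-group relationship from the pooled correlation would get the sign wrong, which is the pattern described above.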

A recent paper in Proceedings of the National Academy of Sciences gives practical examples of why it’s important to make observations at the level you’re interested in and why you should be very careful about extrapolating associations observed at one level to associations at another. The authors report on six repeated-measure studies in which the responses of multiple participants (87-94 per sample)1 were assessed across time. Thus, they could assess both the amount of variation within individuals over time and the amount of variation among individuals at one time. They found that the amount of within-individual variation was between two and four times higher than the amount of among-individual variation. Why do we care? Well, suppose you wanted to know, for example, whether administering imipramine reduced symptoms of clinical depression (sample 4 in the paper), and you used the among-individual variance in depression measured once to assess whether or not an observed difference was statistically meaningful. You’d be using a standard error that’s too small by a factor of roughly 1.4 to 2 (the square root of the two- to fourfold difference in variance). As a result, you’d be more confident that a difference exists than you should be based on the amount of variation within individuals.
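The arithmetic behind that last point can be sketched in a few lines. Because a standard error scales with the square root of the variance, a variance that is two to four times too small gives a standard error that is about 1.4 to 2 times too small. The function names, the sample size, and the variance values below are all hypothetical; only the two- to fourfold range comes from the paper.

```python
from math import sqrt


def standard_error(variance, n):
    """Standard error of a mean estimated from n observations."""
    return sqrt(variance / n)


def se_understatement(variance_ratio):
    """Factor by which the standard error is too small when the variance
    actually used is variance_ratio times smaller than the right one."""
    return sqrt(variance_ratio)


# Hypothetical numbers: among-individual variance of 1.0, and a
# within-individual variance two or four times as large, with n = 90.
n = 90
among = standard_error(1.0, n)
for ratio in (2.0, 4.0):
    within = standard_error(ratio * 1.0, n)
    print(ratio, within / among, se_understatement(ratio))
```

The ratio of the two standard errors does not depend on the sample size, so the same factor applies whatever n happens to be.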

Why does this matter to an ecologist or an evolutionary biologist? Have you ever heard of “space-time substitution”? Do a Google search and near the top you’ll find a link to this chapter from Long Term Studies in Ecology by Steward Pickett. The idea is that because longitudinal studies take a very long time, we can use variation in space as a substitute for variation in time. The assumption is rarely tested (see this paper for an exception), but it is widely used. The problem is that in any spatially structured system with a finite number of populations or sites, the variance among sites at any one time (the spatial variation we’d measure) is substantially less than the variance in any one site across time (the temporal variance). If we’re interested in the spatial variance, that’s fine. If we’re interested in how variable the system is over time, though, it’s a problem. It’s also a problem if we believe that associations we see across populations at one point in time are characteristics of any one population across time.

In the context of the leaf economics spectrum, most of the global associations that have been documented involve associations between species mean trait values. Space-time substitution may not work, and this recent paper in PNAS illustrates that among-group associations in humans don’t reliably predict individual associations. For the same reason, if we want to understand the mechanistic basis of trait-environment or trait-trait associations, by which I mean the evolutionary mechanisms acting at the individual level that produce those associations within individuals, we need to measure the traits on individuals and measure the environments where those individuals occur.

Here’s the title and abstract of the paper that inspired this post. I’ve also included a link.

Lack of group-to-individual generalizability is a threat to human subjects research

Aaron J. Fisher, John D. Medaglia, and Bertus F. Jeronimus

Only for ergodic processes will inferences based on group-level data generalize to individual experience or behavior. Because human social and psychological processes typically have an individually variable and time-varying nature, they are unlikely to be ergodic. In this paper, six studies with a repeated-measure design were used for symmetric comparisons of interindividual and intraindividual variation. Our results delineate the potential scope and impact of nonergodic data in human subjects research. Analyses across six samples (with 87–94 participants and an equal number of assessments per participant) showed some degree of agreement in central tendency estimates (mean) between groups and individuals across constructs and data collection paradigms. However, the variance around the expected value was two to four times larger within individuals than within groups. This suggests that literatures in social and medical sciences may overestimate the accuracy of aggregated statistical estimates. This observation could have serious consequences for how we understand the consistency between group and individual correlations, and the generalizability of conclusions between domains. Researchers should explicitly test for equivalence of processes at the individual and group level across the social and medical sciences.

doi: 10.1073/pnas.1711978115

  1. The studies are on human subjects.

You really need to check your statistical models, not just fit them

I haven’t had a chance to read the paper I mention below yet, but it looks like a very good guide to model checking – a step that is too often forgotten. It doesn’t do us much good to estimate parameters of a statistical model that doesn’t do well at fitting the data we have. That’s what model checking is all about. In a Bayesian context, posterior predictive model checking is particularly useful.1 If the parameters and the model you used to estimate them can’t reproduce the data you collected reasonably well, the model isn’t doing a good job of fitting the data, and you shouldn’t trust the parameter estimates.

If you happen to be using Stan (via rstan) or rstanarm, posterior predictive model checking is either immediately available (rstanarm) or easy to make available (rstan) in shinystan. It’s built on bayesplot, which provides the underlying functions for posterior predictive checks for virtually any package (provided you coerce the result into the right format). I’ve been using bayesplot lately, because it integrates nicely with R Notebooks, meaning that I can keep a record of my model checking in the same place that I’m developing and refining the code that I’m working on.
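For readers who haven’t seen a posterior predictive check, here is a minimal sketch of the idea in plain Python rather than Stan. It is not the bayesplot or shinystan workflow. It assumes a deliberately simple model (Poisson counts with a conjugate Gamma prior, so posterior draws can be sampled directly), a made-up set of overdispersed counts, and the sample variance as the discrepancy statistic. All of those choices, and every name in the code, are assumptions for illustration.

```python
import math
import random
from statistics import variance


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def posterior_predictive_p(counts, n_draws=2000, prior_shape=0.01,
                           prior_rate=0.01, seed=1):
    """Fraction of replicated data sets whose variance is at least as
    large as the observed variance, under a Gamma-Poisson model."""
    rng = random.Random(seed)
    n = len(counts)
    observed = variance(counts)
    # Conjugate update: Gamma(shape, rate) prior, Poisson likelihood.
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + n
    exceed = 0
    for _ in range(n_draws):
        # gammavariate takes a scale parameter, i.e. 1 / rate.
        lam = rng.gammavariate(post_shape, 1.0 / post_rate)
        replicate = [draw_poisson(lam, rng) for _ in range(n)]
        if variance(replicate) >= observed:
            exceed += 1
    return exceed / n_draws


# Hypothetical counts: mostly zeros with a few large values, so the
# variance is far larger than the mean and a Poisson model fits badly.
OVERDISPERSED = [0, 0, 0, 0, 1, 0, 12, 0, 15, 0,
                 0, 9, 0, 0, 1, 0, 20, 0, 0, 2]

print(posterior_predictive_p(OVERDISPERSED))
```

If the model could reproduce the data, a reasonable share of the replicated data sets would be at least as variable as the observed one. Here almost none should be, which is the signal that the parameter estimate from this model shouldn’t be trusted.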

Here’s the title, abstract, and a link:

A guide to Bayesian model checking for ecologists

Paul B. Conn, Devin S. Johnson, Perry J. Williams, Sharon R. Melin, Mevin B. Hooten

Ecological Monographs doi: 10.1002/ecm.1314

Checking that models adequately represent data is an essential component of applied statistical inference. Ecologists increasingly use hierarchical Bayesian statistical models in their research. The appeal of this modeling paradigm is undeniable, as researchers can build and fit models that embody complex ecological processes while simultaneously accounting for observation error. However, ecologists tend to be less focused on checking model assumptions and assessing potential lack of fit when applying Bayesian methods than when applying more traditional modes of inference such as maximum likelihood. There are also multiple ways of assessing the fit of Bayesian models, each of which has strengths and weaknesses. For instance, Bayesian P values are relatively easy to compute, but are well known to be conservative, producing P values biased toward 0.5. Alternatively, lesser known approaches to model checking, such as prior predictive checks, cross‐validation probability integral transforms, and pivot discrepancy measures may produce more accurate characterizations of goodness‐of‐fit but are not as well known to ecologists. In addition, a suite of visual and targeted diagnostics can be used to examine violations of different model assumptions and lack of fit at different levels of the modeling hierarchy, and to check for residual temporal or spatial autocorrelation. In this review, we synthesize existing literature to guide ecologists through the many available options for Bayesian model checking. We illustrate methods and procedures with several ecological case studies including (1) analysis of simulated spatiotemporal count data, (2) N‐mixture models for estimating abundance of sea otters from an aircraft, and (3) hidden Markov modeling to describe attendance patterns of California sea lion mothers on a rookery. 
We find that commonly used procedures based on posterior predictive P values detect extreme model inadequacy, but often do not detect more subtle cases of lack of fit. Tests based on cross‐validation and pivot discrepancy measures (including the “sampled predictive P value”) appear to be better suited to model checking and to have better overall statistical performance. We conclude that model checking is necessary to ensure that scientific inference is well founded. As an essential component of scientific discovery, it should accompany most Bayesian analyses presented in the literature.

  1. Andrew Gelman introduced the idea more than 20 years ago (link), but it’s only really caught on since his Stan group made some general purpose packages available that simplify the process of producing the predictions. (See the next paragraph for references.)

Alan Gelfand on the history of MCMC and the future of statistics (in a world of data science)

I am fortunate to have known Alan Gelfand for a couple of decades. I first met him in the late 1990s when I walked over to the Math/Science building to talk with him about some problems I was having in my early exploration of Bayesian inference for F-statistics. I was using BUGS (this was pre-WinBUGS), but it was the modeling I needed some advice on. I didn’t realize until a couple of years later that Alan was the Gelfand of Gelfand and Smith, “Sampling-Based Approaches to Calculating Marginal Densities” (Journal of the American Statistical Association 85:398-409; 1990 – doi: 10.1080/01621459.1990.10476213) and Gelfand et al. “Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling” (Journal of the American Statistical Association 85:972-985; 1990 – doi: 10.1080/01621459.1990.10474968). Fortunately, Alan is too nice to have pointed out how naive I was. He simply gave me a lot of help. I haven’t seen him as often since he moved to Duke, but our paths still cross every year or two, because he and John Silander continue to collaborate on various problems in community ecology.

Alan was a keynote speaker at the Statistics in Ecology and Environmental Monitoring Conference in Queenstown, NZ last December, and David Warton posted a YouTube interview on the Methods Blog of the British Ecological Society. Alan describes the early history of MCMC, mentions his concern about the emergence of “data science”, and talks about what excites him most now – applying statistics to difficult problems in ecology and environmental science.

Trait-environment relationships in Pelargonium

Almost 15 years ago Wright et al. (Nature 428:821–827; 2004 – doi: 10.1038/nature02403) described the worldwide leaf economics spectrum as “a universal spectrum of leaf economics consisting of key chemical, structural and physiological properties.” Since then, an enormous number of articles have been published that examine or refer to it – more than 4000 according to Google Scholar. In the past few years, many authors have pointed out that it may not be as universal as originally presumed. For example, in Mitchell et al. (The American Naturalist 185:525-537; 2015 – http://www.jstor.org/stable/10.1086/680051) we found a negative relationship between an important component of the leaf economics spectrum (leaf mass per area) and mean annual temperature in Pelargonium from the Cape Floristic Region of southwestern South Africa, while the global pattern is for a positive relationship.1

Now Tim Moore and several of my colleagues follow up with a more detailed analysis of trait-environment relationships in Pelargonium. They demonstrate several ways in which the global pattern breaks down in South African samples of this genus. Here’s the abstract and a link to the paper.

  • Functional traits in closely related lineages are expected to vary similarly along common environmental gradients as a result of shared evolutionary and biogeographic history, or legacy effects, and as a result of biophysical tradeoffs in construction. We test these predictions in Pelargonium, a relatively recent evolutionary radiation.
  • Bayesian phylogenetic mixed effects models assessed, at the subclade level, associations between plant height, leaf area, leaf nitrogen content and leaf mass per area (LMA), and five environmental variables capturing temperature and rainfall gradients across the Greater Cape Floristic Region of South Africa. Trait–trait integration was assessed via pairwise correlations within subclades.
  • Of 20 trait–environment associations, 17 differed among subclades. Signs of regression coefficients diverged for height, leaf area and leaf nitrogen content, but not for LMA. Subclades also differed in trait–trait relationships and these differences were modulated by rainfall seasonality. Leave‐one‐out cross‐validation revealed that whether trait variation was better predicted by environmental predictors or trait–trait integration depended on the clade and trait in question.
  • Legacy signals in trait–environment and trait–trait relationships were apparently lost during the earliest diversification of Pelargonium, but then retained during subsequent subclade evolution. Overall, we demonstrate that global‐scale patterns are poor predictors of patterns of trait variation at finer geographic and taxonomic scales.

doi: 10.1111/nph.15196

  1. If you read The American Naturalist paper, you’ll see that we wrote in the Discussion that “We could not detect a relationship between LMA and MAT in Protea….” I wouldn’t write it that way now. Look at Table 2. You’ll see that the posterior mean for the relationship is 0.135 with a 95% credible interval of (-0.078,0.340). I would now write that “We detected a weakly supported positive relationship between LMA and MAT….” Why the difference? I’ve taken to heart Andrew Gelman’s observation that “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant” (blog post; article in The American Statistician). I am training myself to pay less attention to which coefficients in a regression are significant and which aren’t and more to reporting the best guess we have about each relationship (the posterior means) and the amount of confidence we have about them (the credible intervals). I recently learned about hypothesis() in brms, which will provide an estimate of the posterior probability that you’ve got the sign of the relationship right. I need to investigate that. I suspect that’s what I’ll be using in the future.
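As a rough illustration of what a posterior probability for the sign looks like, the sketch below uses only the posterior mean and 95% credible interval quoted above. It assumes the posterior is approximately normal, which is an assumption made here for convenience and not something the paper reports. A proper calculation would use the posterior draws themselves. The function name is hypothetical, and this is not how brms does it.

```python
from statistics import NormalDist


def prob_positive(post_mean, ci_lower, ci_upper, level=0.95):
    """Approximate posterior probability that a coefficient is positive,
    treating the posterior as normal with the given mean and interval."""
    # Half-width of a central interval in standard-deviation units.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    post_sd = (ci_upper - ci_lower) / (2 * z)
    return 1 - NormalDist(post_mean, post_sd).cdf(0.0)


# Posterior mean and 95% credible interval from Table 2, as quoted above.
print(prob_positive(0.135, -0.078, 0.340))
```

Under that approximation the probability that the relationship is positive comes out at roughly 0.9, which fits the “weakly supported positive relationship” wording better than “could not detect a relationship” does.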