May 2019 – Uncommon Ground

Presenting science to the public in a post-truth era – implications for public policy

Last Friday I attended a very interesting symposium entitled Presenting science to the public in a post-truth era, jointly sponsored by the Science of Learning & Art of Communication¹ and the University of Connecticut Humanities Institute, more specifically its project on Humility & Conviction in Public Life.² The speakers – Åsa Wikforss (Stockholm University), Tali Sharot (University College London), and Michael Lynch (UConn) – argued that the primary function³ of posts on social media is to express emotion, not to impart information; that we are more likely to accept new evidence that confirms what we already believe than new evidence that contradicts it; and that knowledge resistance often arises because we resist the consequences that would follow from believing the evidence presented to us.

I can’t claim expertise in the factors influencing whether people accept or reject the evidence for climate change, but Merchants of Doubt makes a compelling case that the resistance among some prominent doubters arises because they believe that accepting the evidence that climate change is happening and that humans are primarily responsible will require massive changes in our economic system and, quite possibly, severe limits on individual liberty. In other words, the case Oreskes and Conway make in Merchants of Doubt is consistent with a form of knowledge resistance in which the evidence for human-caused climate change is resisted because of the consequences accepting that evidence would have. It also illustrates a point I do my best to drive home when I teach my course in conservation biology.

As scientists, we discover empirical facts about the world, e.g., CO2 emissions have increased the atmospheric concentration of CO2 far above pre-industrial levels and much of the associated increase in global average temperature is a result of those emissions. Too often, though, we proceed immediately from discovering those empirical facts to concluding that particular policy choices are necessary. We think, for example, that because CO2 emissions are causing changes in global climate we must therefore reduce or eliminate CO2 emissions. There is, however, a step in the logic that’s missing.

To conclude that we must reduce or eliminate CO2 emissions we must first decide that the climate changes associated with increasing CO2 emissions are bad things that we should avoid. It may seem obvious that they are. After all, how could flooding of major metropolitan areas and the elimination of low-lying Pacific Island nations be good things? They aren’t. But avoiding them isn’t free. It involves choices. We can spend some amount of money now to avoid those consequences, we can spend money later when the threats are more imminent, or we can let the people who live in those places move out of the way when the time comes. I’m sure you can think of some other choices, too. Even if those three are the only choices, the empirical data alone don’t tell us which one to pick. The choice depends on what kind of world we want to live in. It is a choice based on moral or ethical values. The empirical evidence must inform our choice among the alternatives, but it isn’t sufficient to determine the choice.

Perhaps the biggest challenge we face in developing a response to climate change is that emotions are so deeply engaged on both sides of the debate that we cannot agree on the empirical facts. A debate that should be played out in the realm of “What kind of world do we want to live in? What values are most important?” is instead played out in the realm of tribal loyalty.

The limits to knowledge Wikforss, Sharot, and Lynch identified represent real, important barriers to progress. But overcoming knowledge resistance, in particular, seems more likely if we remember that translating knowledge to action requires applying our values. When we are communicating science, that means either stopping at the point where empirical evidence ends and application of values begins, or making it clear that science ends with the empirical evidence and that our recommendation for action derives from our values.⁴

¹ A training grant funded through the National Science Foundation Research Traineeship (NRT) Program.
² Funded by the John Templeton Foundation (story in UConn Today).
³ Note: Lynch used the phrase “primary function” in a technical, philosophical sense inspired by Ruth Millikan’s idea of a “proper function,” but the plain sense of the phrase conveys its basic meaning.
⁴ In the real world it may sometimes, perhaps even often, be difficult to make a clean distinction between the realm of empirical research and the realm of ethical values. Distinguishing between them to the extent possible is still valuable, and it is even more valuable to be honest about the ways in which your personal values influence any actions you recommend.

kent May 29, 2019 Environment, Science policy

How to organize data in spreadsheets

I recently discovered an article by Karl Broman and Kara Woo in The American Statistician entitled “Data organization in spreadsheets” (https://doi.org/10.1080/00031305.2017.1375989). It is the first article in the April 2018 special issue on data science. Why, you might ask, would a journal published by the American Statistical Association devote the first paper in a special issue on data science to spreadsheets instead of something more statistical? Well, among other things, it turns out that the risks of using spreadsheets poorly are so great that there’s a European Spreadsheet Risks Interest Group that keeps track of “horror stories” (http://www.eusprig.org/horror-stories.htm). For example, Wisconsin initially estimated that the cost of a recount in the 2016 Presidential election would be $3.5M. After correcting a spreadsheet error, the cost climbed to $3.9M (https://www.wrn.com/2016/11/wisconsin-presidential-recount-will-cost-3-5-million/).

My favorite example, though, dates from 2013. Thomas Herndon, then a third-year doctoral student at UMass Amherst, showed that a spreadsheet error in a very influential paper published by two eminent economists, Carmen Reinhart and Kenneth Rogoff, magnified the apparent effect of debt on economic growth (https://www.chronicle.com/article/UMass-Graduate-Student-Talks/138763). That paper was widely cited by economists arguing against economic stimulus in response to the financial crisis of 2008-2009.

That being said, Broman and Woo correctly point out that

Amid this debate, spreadsheets have continued to play a significant role in researchers’ workflows, and it is clear that they are a valuable tool that researchers are unlikely to abandon completely.

So since you’re not going to stop using spreadsheets (and I won’t either), you should at least use them well. If you don’t have time to read the whole article, here are twelve points you should remember:

Be consistent – “Whatever you do, do it consistently.”
Choose good names for things – “It is important to pick good names for things. This can be hard, and so it is worth putting some time and thought into it.”
Write dates as YYYY-MM-DD. https://imgs.xkcd.com/comics/iso_8601.png
No empty cells – Fill in all cells. Use some common code for missing data.¹
Put just one thing in a cell – “The cells in your spreadsheet should each contain one piece of data. Do not put more than one thing in a cell.”
Make it a rectangle – “The best layout for your data within a spreadsheet is as a single big rectangle with rows corresponding to subjects and columns corresponding to variables.”²
Create a data dictionary – “It is helpful to have a separate file that explains what all of the variables are.”
No calculations in raw data files – “Your primary data file should contain just the data and nothing else: no calculations, no graphs.”
Do not use font color or highlighting as data – “Analysis programs can much more readily handle data that are stored in a column than data encoded in cell highlighting, font, etc. (and in fact this markup will be lost completely in many programs).”
Make backups – “Make regular backups of your data. In multiple locations. And consider using a formal version control system, like git, though it is not ideal for data files. If you want to get a bit fancy, maybe look at dat (https://datproject.org/).”
Use data validation to avoid errors
Save the data in plain text files
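
Several of these points can be checked mechanically once the data are saved as plain text. What follows is only a minimal sketch, written in Python with the standard library rather than in R, and it is not something taken from Broman and Woo's article. The function names (is_iso_date, check_table), the sample data, and the choice of “NA” as the missing-data code are all assumptions made for illustration. It checks three of the points: rectangular layout, no empty cells, and dates written as YYYY-MM-DD.

```python
import csv
import io
import re
from datetime import datetime

# Shape of an ISO 8601 calendar date: four digits, two digits, two digits.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(cell):
    """Return True if cell is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(cell):
        return False
    try:
        datetime.strptime(cell, "%Y-%m-%d")
    except ValueError:
        # Right shape, but not a real date (e.g. 2019-02-30).
        return False
    return True


def check_table(text, date_columns=(), missing="NA"):
    """Return a list of problems found in CSV text (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    for line_no, row in enumerate(body, start=2):
        # "Make it a rectangle": every row has as many cells as the header.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        for name, cell in zip(header, row):
            if cell.strip() == "":
                # "No empty cells": missing data should carry an explicit code.
                problems.append(f"line {line_no}: empty cell in column {name!r}")
            elif name in date_columns and cell != missing and not is_iso_date(cell):
                problems.append(
                    f"line {line_no}: {name!r} is not YYYY-MM-DD: {cell!r}"
                )
    return problems


# Made-up data with one bad date, one empty cell, and one short row.
sample = """site,date,height_cm
A,2019-05-22,12.5
B,5/22/2019,NA
C,2019-05-23,
D,2019-05-24
"""

for problem in check_table(sample, date_columns=("date",)):
    print(problem)
```

A check like this belongs in a separate script, not in the spreadsheet itself, which keeps the raw data file free of calculations.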

¹ R likes “NA”, but it’s easy to use “.” or something else. Just use “na.strings” when you use read.csv or “na” when you use read_csv.
² If you’re a ggplot user you’ll recognize that this is wide format, while ggplot typically needs long format data. I suggest storing your data in wide format and using ddply() to reformat for plotting.
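
The two ideas in these notes, declaring a missing-data code when reading a file and reshaping wide data to long, are not specific to R. Here is a rough sketch of both in Python using only the standard library. The helper names read_table and wide_to_long, the column names, and the sample data are assumptions invented for this illustration. The na_strings parameter merely mimics the R argument of the same name and is not a real Python API.

```python
import csv
import io


def read_table(text, na_strings=("NA",)):
    """Read CSV text into a list of dicts, mapping missing-data codes to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value in na_strings else value) for key, value in row.items()}
        for row in reader
    ]


def wide_to_long(rows, id_column, variable_name="variable", value_name="value"):
    """Turn one row per subject into one row per (subject, variable) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_column:
                long_rows.append(
                    {id_column: row[id_column], variable_name: key, value_name: value}
                )
    return long_rows


# Made-up wide-format data that uses "." as the missing-data code.
wide = """plant,height_cm,leaf_count
p1,12.5,8
p2,.,11
"""

rows = read_table(wide, na_strings=(".",))
long_rows = wide_to_long(rows, id_column="plant")
for row in long_rows:
    print(row)
```

The values stay as strings here; converting them to numbers is left out to keep the sketch short.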

kent May 22, 2019 Statistics

New version of RStudio released

If you use R, there’s a good chance that you also use RStudio. I just noticed that the RStudio folks released v1.2 on April 30th. I haven’t had a chance to give it a spin yet, but here’s what they say on the blog:

Over a year in the making, this new release of RStudio includes dozens of new productivity enhancements and capabilities. You’ll now find RStudio a more comfortable workbench for working in SQL, Stan, Python, and D3. Testing your R code is easier, too, with integrations for shinytest and testthat. Create, and test, and publish APIs in R with Plumber. And get more done with background jobs, which let you run R scripts while you work.

Underpinning it all is a new rendering engine based on modern Web standards, so RStudio Desktop looks sharp on displays large and small, and performs better everywhere – especially if you’re using the latest Web technology in your visualizations, Shiny applications, and R Markdown documents. Don’t like how it looks now? No problem–just make your own theme.

You can read more about what’s new this release in the release notes, or our RStudio 1.2 blog series.

I look forward to exploring the new features, and I encourage you to do the same. Running jobs in the background will be especially useful.

kent May 3, 2019 Statistics
