The review in the New Scientist confuses Bayes' Theorem¹ and Bayesian inference, in a way that I presume the book doesn't, but it concludes:
[T]o have crafted a page-turner out of the history of statistics is an impressive feat.

Indeed. I can imagine it being a page-turner for a stats geek like me, but David Robson doesn't sound like a stats geek. If Sharon Bertsch McGrayne really turned a history of Bayesian inference into something that more normal people find interesting, she's accomplished a task many of us would envy.
If you're wondering what the review (I hope not the book) got wrong about Bayes' Theorem versus Bayesian inference, read on.

P(B|A) = P(B and A)/P(A)

That's the definition of conditional probability. But if that's true for B given A, this is true for A given B:

P(A|B) = P(A and B)/P(B)

Since P(A and B) = P(B and A), multiplying each definition through by its denominator gives

P(B|A)P(A) = P(A|B)P(B)

and, dividing both sides by P(A),

P(B|A) = P(A|B)P(B)/P(A)

Everybody who understands probability would agree with that. That's why they call it a theorem.
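To see the theorem at work, here is a minimal Python check on a made-up joint distribution for two yes/no events A and B. The four probabilities are hypothetical, chosen only so that they sum to 1. Exact fractions avoid floating-point noise, so the two sides can be compared directly.

```python
from fractions import Fraction

# Hypothetical joint distribution over two yes/no events (A, B); sums to 1.
joint = {
    (True, True): Fraction(1, 10),    # A and B
    (True, False): Fraction(3, 10),   # A and not B
    (False, True): Fraction(2, 10),   # not A and B
    (False, False): Fraction(4, 10),  # neither
}

p_A = sum(p for (a, _), p in joint.items() if a)  # P(A) = 2/5
p_B = sum(p for (_, b), p in joint.items() if b)  # P(B) = 3/10
p_A_and_B = joint[(True, True)]                   # P(A and B) = P(B and A)

# Definition of conditional probability, in both directions
p_B_given_A = p_A_and_B / p_A
p_A_given_B = p_A_and_B / p_B

# Bayes' Theorem recovers P(B|A) from P(A|B), P(B) and P(A)
bayes = p_A_given_B * p_B / p_A

print(p_B_given_A, bayes)  # 1/4 1/4
```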
So what's Bayesian inference, and why is it controversial?
Well, suppose we collect a bunch of data; call it X. R.A. Fisher pointed out a long time ago that we can calculate a quantity called the likelihood, which is, roughly, the probability of getting that data given a particular probability distribution and the (unknown) parameters that govern that distribution, like the mean and the variance of a normal distribution. That's P(X|theta), where theta stands for the unknown parameters.
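A minimal sketch of the likelihood for the normal case, assuming five made-up observations drawn independently. For continuous data like these, P(X|theta) is a product of probability densities rather than probabilities, which is part of why the likelihood is only "roughly" a probability. The function name `normal_log_likelihood` is hypothetical; it works on the log scale because products of many small numbers can underflow, and taking logs doesn't change which theta scores highest.

```python
import math

def normal_log_likelihood(data, mean, sd):
    """Log of P(X | theta) for independent draws from Normal(mean, sd); theta = (mean, sd)."""
    n = len(data)
    squared_dev = sum((x - mean) ** 2 for x in data)
    return -n * math.log(sd * math.sqrt(2 * math.pi)) - squared_dev / (2 * sd ** 2)

X = [4.8, 5.1, 5.6, 4.9, 5.3]  # hypothetical data

# The data are far more likely under a mean near 5 than under a mean of 7.
print(normal_log_likelihood(X, mean=5.0, sd=0.5))
print(normal_log_likelihood(X, mean=7.0, sd=0.5))
```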
Fisher proposed to estimate theta by finding the value(s) of theta that maximize P(X|theta), i.e., the value(s) that make the observed data X more likely than any other possible value would. That's what we mean when we say that we have a maximum-likelihood estimate of some parameter.
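For the normal distribution the maximization can be done on paper: the maximum-likelihood estimate of the mean is the sample mean, and that of the variance is the average squared deviation (dividing by n, not n - 1). The sketch below, on made-up data, computes those closed-form estimates and then confirms the mean by brute force, scoring a hypothetical grid of candidate means and keeping the one that makes the data most likely.

```python
import math

def normal_log_likelihood(data, mean, sd):
    """Log of P(X | theta) for independent draws from Normal(mean, sd)."""
    n = len(data)
    squared_dev = sum((x - mean) ** 2 for x in data)
    return -n * math.log(sd * math.sqrt(2 * math.pi)) - squared_dev / (2 * sd ** 2)

X = [4.8, 5.1, 5.6, 4.9, 5.3]  # hypothetical data

# Closed-form maximum-likelihood estimates for a normal distribution
mean_hat = sum(X) / len(X)
var_hat = sum((x - mean_hat) ** 2 for x in X) / len(X)  # divides by n, not n - 1
sd_hat = math.sqrt(var_hat)

# Brute-force check: try candidate means 4.00, 4.01, ..., 6.00 (sd held at its MLE)
grid = [4.0 + i * 0.01 for i in range(201)]
best = max(grid, key=lambda m: normal_log_likelihood(X, m, sd_hat))

print(mean_hat, var_hat, best)
```

The grid search is only a sanity check; with more than one or two parameters, people maximize the likelihood with calculus or a numerical optimizer instead.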
Bayesians like me use the same likelihood, but we get our estimates from it in a different way. We find it more natural and informative to ask how probable it is that the unknown parameters take on a particular value, given the data we already have, than to ask how probable the data we already have would be, given some unknown parameters. To "invert" the likelihood (the probability of the data given the parameters) into the more natural probability of the parameters given the data, we use Bayes' Theorem:
P(theta|X) = P(X|theta)P(theta)/P(X)

That looks pretty easy. So why is it controversial? Because we have to specify P(theta), the prior probability. We have to say what values of theta are possible (or likely) before we look at any data. That's where the subjectivity (apparently) comes in. I say "apparently" because I think the subjectivity is more apparent than real. Andrew Gelman has a series of papers that delve into this much more deeply and authoritatively than I can. If you're interested, click on the link below.
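Here is a minimal sketch of that inversion, under simplifying assumptions of my own for illustration: a normal model with the standard deviation treated as known (0.5), five made-up observations, and theta (the mean) restricted to a grid so that P(X) is just a sum over candidate values. The two priors are hypothetical: a flat one, and a "skeptical" one concentrated near 4. With the flat prior the posterior mean lands at about the sample mean (5.14); the skeptical prior pulls it to roughly 4.5. That difference is exactly the choice of P(theta) the controversy is about.

```python
import math
from statistics import NormalDist

X = [4.8, 5.1, 5.6, 4.9, 5.3]  # hypothetical data
SD = 0.5                       # assumed known, so theta is just the mean

# Candidate values of theta: 3.00, 3.01, ..., 7.00
GRID = [3.0 + i * 0.01 for i in range(401)]

def likelihood(theta):
    """P(X | theta): product of normal densities for independent draws."""
    dist = NormalDist(theta, SD)
    return math.prod(dist.pdf(x) for x in X)

def posterior(prior):
    """Bayes' Theorem on the grid: P(theta|X) = P(X|theta) P(theta) / P(X)."""
    weights = [prior(t) for t in GRID]
    total = sum(weights)
    prior_probs = [w / total for w in weights]  # P(theta) on the grid, sums to 1
    joint = [likelihood(t) * p for t, p in zip(GRID, prior_probs)]
    p_x = sum(joint)  # P(X) = sum over theta of P(X|theta) P(theta)
    return [j / p_x for j in joint]

def posterior_mean(post):
    return sum(t * p for t, p in zip(GRID, post))

def flat_prior(theta):
    """Every grid value equally plausible before seeing the data."""
    return 1.0

# Hypothetical informative prior: theta believed to be near 4.
skeptical_prior = NormalDist(4.0, 0.2).pdf

for name, prior in [("flat", flat_prior), ("skeptical", skeptical_prior)]:
    print(name, round(posterior_mean(posterior(prior)), 3))
```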
Related articles
- The holes in my philosophy of Bayesian data analysis (stat.columbia.edu)
¹ As you can see, the book refers to "Bayes' rule". I presume, not having read the book yet (I will place an order on Amazon.com to have it delivered to my Kindle as soon as I finish writing this), that by "Bayes' rule" the author means what in my world is more commonly called Bayes' Theorem.


