Last week I pointed out that you should

Be wary of results from studies with small sample sizes, even if the effects are statistically significant.

Now you may be thinking to yourself: “I’m a Bayesian, and I use somewhat informative priors. This doesn’t apply to me.” Well, I’m afraid you’re wrong. Here are results from an analysis of data simulated under the same conditions I used last week when exploring P-values. The prior on each mean is N(0, 1), and the prior on each standard deviation is half-N(0, 1).

| Mean | Sample size | Power | Wrong sign |
|---|---|---|---|
| 0.05 | 10 | 39/1000 | 18/39 |
| 0.05 | 50 | 59/1000 | 12/59 |
| 0.05 | 100 | 47/1000 | 5/47 |
| 0.10 | 10 | 34/1000 | 8/34 |
| 0.10 | 50 | 81/1000 | 10/81 |
| 0.10 | 100 | 115/1000 | 6/115 |
| 0.20 | 10 | 62/1000 | 7/62 |
| 0.20 | 50 | 158/1000 | 2/158 |
| 0.20 | 100 | 292/1000 | 0/292 |

Here “Power” refers to the number of times (out of 1000 replicates) the symmetric 95% credible interval does not overlap 0, the situation in which we’d normally conclude we have evidence that the means of the two populations are different. Notice that when the effect and sample size are both small (0.05 and 10, respectively), we would infer the wrong sign for the difference almost half of the time (18/39). We’re less likely to make a sign error when the effect is larger (7/62 for an effect of 0.20 with a sample size of 10) or when the sample size is larger (5/47 for an effect of 0.05 with a sample size of 100). But the bottom line remains the same:
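If you’d like to see the logic of the simulation without digging into the repository, here is a minimal sketch in Python. It is not the original code: for simplicity it plugs the sample variance into a normal-normal conjugate update rather than placing a half-N(0, 1) prior on each standard deviation, so the counts will differ somewhat from the table above. The function names and the choice of a normal approximation for the posterior of the difference are mine, not from the original analysis.

```python
import numpy as np

def conjugate_posterior(x, prior_mean=0.0, prior_sd=1.0):
    """Normal-normal conjugate update for one group's mean.

    Simplifying assumption: the sampling variance is estimated from the
    data and plugged in, instead of getting its own half-normal prior
    as in the original model.
    """
    n = len(x)
    sigma2 = np.var(x, ddof=1)  # plug-in estimate of the sampling variance
    post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma2)
    post_mean = post_var * (prior_mean / prior_sd**2 + np.sum(x) / sigma2)
    return post_mean, post_var

def simulate(delta, n, n_rep=1000, seed=42):
    """Count replicates whose 95% credible interval for the difference
    excludes 0 ('power'), and how many of those get the sign wrong."""
    rng = np.random.default_rng(seed)
    power = wrong_sign = 0
    for _ in range(n_rep):
        x1 = rng.normal(0.0, 1.0, n)    # first group, mean 0
        x2 = rng.normal(delta, 1.0, n)  # second group, true effect = delta
        m1, v1 = conjugate_posterior(x1)
        m2, v2 = conjugate_posterior(x2)
        diff_mean = m2 - m1
        diff_sd = np.sqrt(v1 + v2)  # posteriors are independent normals
        lo, hi = diff_mean - 1.96 * diff_sd, diff_mean + 1.96 * diff_sd
        if lo > 0 or hi < 0:             # interval excludes 0
            power += 1
            if diff_mean * delta < 0:    # posterior mean has the wrong sign
                wrong_sign += 1
    return power, wrong_sign
```

Running, say, `simulate(0.05, 10)` versus `simulate(0.20, 100)` reproduces the qualitative pattern in the table: more power and fewer sign errors as the effect and sample size grow.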

Be wary of results from studies with small sample sizes, even if the effects are statistically significant.

This figure summarizes results from the simulation, and you’ll find the code in the same GitHub repository as the P-value code I mentioned last week: https://github.com/kholsinger/noisy-data. Remember that Gelman and Carlin (*Perspectives on Psychological Science* 9:641; 2014 http://dx.doi.org/10.1177/1745691614551642) also have advice on how to tell whether your data are too noisy for your sample to support confident inferences.