Uncommon Ground

Using projection predictiion for variable selection in a Bayesian regression

Variable selection in multiple regression

Horseshoe priors are very easy to use if you’re using rstanarm. You should consider using them in any analysis where you use stan_glm() or stan_glmer(). If you were paying attention, though, I did a bit (OK, more than a bit) of handwaving in deciding which covariates were “important”. In this example, it was pretty easy, because there were some covariates with posterior distributions well away from zero and others with posterior distributions (close to) centered on zero and the difference between the two sets of coefficients was easy to see. That won’t always be the case. In fact, it probably won’t usually be the case. So we’d like to have some way of more “objectively” identifying which covariates are important and which aren’t.

Thats where projection predictive variable selection comes in. It’s an approach that uses a statistically meaningful criterion to guide your choice of variables that are “important” in the sense that including those variables (and only those variables) is sufficient to give predictions roughly equivalent to including all of them. Again, if you’re using rstanarm, it’s very easy to take advantage of the approach thanks to the projpred package available on CRAN.

In case you missed the link to projection predictive variable selection, here it is again: http://darwin.eeb.uconn.edu/pages/variable-selection/projection-predictive-variable-selection.nb.html.

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.