Variable selection in multiple regression

Horseshoe priors are very easy to use if you’re using `rstanarm`. You should consider using them in any analysis where you use `stan_glm()` or `stan_glmer()`. If you were paying attention, though, you noticed that I did a bit (OK, more than a bit) of handwaving in deciding which covariates were “important”. In this example, it was pretty easy, because some covariates had posterior distributions well away from zero, others had posterior distributions centered on (or close to) zero, and the difference between the two sets of coefficients was easy to see. That won’t always be the case. In fact, it probably won’t usually be the case. So we’d like to have some way of more “objectively” identifying which covariates are important and which aren’t.

That’s where projection predictive variable selection comes in. It’s an approach that uses a statistically meaningful criterion to guide your choice of variables that are “important” in the sense that including those variables (and only those variables) is sufficient to give predictions roughly equivalent to those you get from including all of them. Again, if you’re using `rstanarm`, it’s very easy to take advantage of the approach thanks to the `projpred` package available on CRAN.
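To give a feel for what that looks like in practice, here’s a minimal sketch. It is not the analysis from the linked notebook: the data are simulated, and the names `dat`, `x1` through `x6`, and `y` are made up for illustration. It assumes you have both `rstanarm` and `projpred` installed, fits a reference model with a horseshoe prior, and then asks `projpred` how many covariates a submodel needs to predict about as well as the full model.

```r
library(rstanarm)
library(projpred)

set.seed(1234)

# Simulated data (hypothetical): 100 observations and 6 covariates,
# of which only x1 and x2 actually influence y
n <- 100
dat <- as.data.frame(matrix(rnorm(n * 6), nrow = n))
names(dat) <- paste0("x", 1:6)
dat$y <- 1.0 * dat$x1 - 0.8 * dat$x2 + rnorm(n, sd = 0.5)

# Reference model: all covariates, horseshoe prior on the coefficients
# (short chains to keep the sketch quick; use more for a real analysis)
fit <- stan_glm(y ~ x1 + x2 + x3 + x4 + x5 + x6,
                data = dat,
                family = gaussian(),
                prior = hs(),
                chains = 2, iter = 1000, refresh = 0)

# Search over submodels and compare their predictive performance
# to that of the reference model
vs <- varsel(fit)

# Heuristic suggestion for the number of covariates to retain
# (may be NA if no submodel size meets the criterion)
n_terms <- suggest_size(vs)
print(n_terms)

# Predictive performance as a function of submodel size
plot(vs, stats = c("elpd", "rmse"))
```

For a real analysis you would probably want `cv_varsel()` rather than `varsel()`, since it cross-validates the search instead of evaluating submodels on the data used to fit them. It takes longer to run, which is why the sketch uses `varsel()`.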

In case you missed the link to projection predictive variable selection, here it is again: http://darwin.eeb.uconn.edu/pages/variable-selection/projection-predictive-variable-selection.nb.html.