Variable selection in multiple regression

If you’ve been following along, you’ve now seen some fairly simple approaches for reducing the number of covariates in a linear regression. It shouldn’t come as a shock that statisticians have been worried about the problem for a long time or that they’ve come up with some pretty sophisticated approaches to the problem.^{1} The first one we’ll explore is the Lasso (**l**east **a**solute **s**hrinkage and **s**election **o**perator), which Rob Tibshirani introduced the Lasso to statistics and machine learning more than 20 years ago.^{2} You’ll find more details in the R notebook illustrating using the Lasso to select covariates, but here are the basic ideas.

The “shrinkage” part of the name refers to the idea that we don’t expect all of the covariates we’re including in the model to be important. And if a covariate isn’t important, we want the magnitude of the regression coefficient associated with that component to be zero (or nearly zero). In other words, we want the estimate to be “shrunk” towards zero rather than taking the value it would if we included it in the full multiple regression.

The “selection” part of the name refers to the idea that we don’t know ahead of time which of the covariates are important (and shouldn’t be shrunk towards 0) and which are important (and should be shrunk towards 0). We want the data to tell us which covariates are important and which aren’t, i.e., we want the data to “select” important covariates.

The Lasso accomplishes this by adding a penalty to the typical least squares estimates. Instead of simply minimizing the sum of squared deviations from the regression line, we do so subject to a constraint that the total magnitude of all regression coefficients is less than some value. We’ll use `glmnet()`

to fit the Lasso. If you explore the accompanying documentation, you’ll see that the Lasso is just one method along a continuum of constrained optimization approaches. I’ll let you explore those on your own if you’re interested.

- I’m not going to discuss forward, backward, or all subsets approaches to selecting variables. They don’t seem to be used much anymore (for good reason). If you’re interested in them, take a look at the Wikipedia page on stepwise regression. ↩
- Wikipedia points out that it was originally introduced 10 years earlier in geophysics, but Tibshirani discovered it independently, and it was his discovery that led to its wide use in statistics and machine learning. ↩