I was talking with one of my graduate students a few days ago about variable selection in multiple regression. She was looking for a published “cheat sheet.” I told her I didn’t know of any. “Why don’t you write one?” “The world’s too complicated for that. There will always be judgment involved. There will never be a simple recipe to follow.” That was the end of it, for then.
From the title you can tell that I decided I needed to get my own thoughts in order about variable selection. If you know me, you also know that I find one of the best ways to get my thoughts straight is to write them down. So that’s what I’m starting now.
Expect to see a new entry every week or so. I’ll be posting the details in R notebooks so that you can download the code, run it yourself, and play around with it if you’re so inclined. As I develop notebooks, I’ll develop a static page with links to them. Unlike the page on causal inference in ecology, which links to blog posts, these will link directly to HTML versions of R notebooks that discuss the aspect of the issue I’m working through that week, along with the R code that facilitated my thinking. All of the source code will be available in a GitHub repository, but you’ll also be able to download the .Rmd file when you have the HTML version open simply by clicking on the “Code” button at the top right of the page and selecting “Download Rmd” from the dropdown.
If you’re still interested after all of that, here’s a link to the first installment: