{"id":790,"date":"2019-08-26T08:00:00","date_gmt":"2019-08-26T12:00:00","guid":{"rendered":"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/?p=790"},"modified":"2019-08-24T10:56:40","modified_gmt":"2019-08-24T14:56:40","slug":"trying-out-a-couple-of-simple-strategies-for-reducing-the-number-of-covariates","status":"publish","type":"post","link":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/08\/26\/trying-out-a-couple-of-simple-strategies-for-reducing-the-number-of-covariates\/","title":{"rendered":"Trying out a couple of simple strategies for reducing the number of covariates"},"content":{"rendered":"\r\n<p class=\"wp-block-paragraph\"><a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/variable-selection-in-multiple-regression\/\">Variable selection in multiple regression<\/a><\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If you\u2019ve been following this series, you now know that <a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/08\/12\/collecting-my-thoughts-about-variable-selection-in-multiple-regression\/\">multiple regression can be very useful<\/a> but that its usefulness depends on <a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/08\/19\/challenges-of-multiple-regression-or-why-we-might-want-to-select-variables\/\">overcoming several challenges<\/a>. One of those challenges is that if we use all of the covariates available to us and some of them are highly correlated with one another, our assessment of which covariates have an association with the response variable may be misleading and any prediction we make about new observations may be very unreliable. That leads us to the problem of variable selection. 
Rather than using all of the covariates we have available, maybe we\u2019d be better off if we used only a few.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">In <a href=\"http:\/\/darwin.eeb.uconn.edu\/pages\/variable-selection\/reducing-the-number-of-covariates.nb.html\">this R notebook<\/a>, I explore a couple of approaches to variable selection:<\/p>\r\n\r\n\r\n\r\n<ol class=\"wp-block-list\">\r\n<li>Restricting the covariates to those we know have an association with the response variable.<sup><a id=\"ffn1\" class=\"footnote\" href=\"#fn1\">1<\/a><\/sup><\/li>\r\n<li>Identifying clusters of covariates that are highly associated with one another and (relatively) unassociated with those in other clusters, then picking one covariate from each cluster for the analysis.<sup><a id=\"ffn2\" class=\"footnote\" href=\"#fn2\">2<\/a><\/sup> <\/li>\r\n<\/ol>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">As you\u2019ll see, for the sample data set we\u2019ve been exploring, in which there are two clusters of covariates with strong associations within clusters and weak to non-existent associations between clusters, neither of these approaches serves us particularly well. The next installment will explore another commonly used approach &#8211; <a href=\"https:\/\/en.wikipedia.org\/wiki\/Principal_component_regression\">principal components regression<\/a>.<\/p>\r\n\r\n\r\n\r\n<ol class=\"wp-block-list\">\r\n<li id=\"fn1\">There\u2019s at least one obvious problem with this approach that I don\u2019t discuss in the notebook. In the work I\u2019ve been involved with, we rarely know ahead of time which covariates, if any, have \u201creal\u201d relationships with the response variable. Most often we\u2019ve measured covariates because we anticipate that they have some relationship to what we\u2019re interested in and we\u2019re trying to figure out which one(s) are most important. 
<a href=\"#ffn1\">&#x21a9;<\/a><\/li>\r\n<li id=\"fn2\">This approach has some practical problems that I don\u2019t discuss in the notebook. How strong do associations have to be to be \u201chighly associated\u201d? How weak do they have to be to be \u201c(relatively) unassociated\u201d? What do we do if there isn\u2019t a clear cutoff between \u201chighly associated\u201d and \u201c(relatively) associated\u201d? <a href=\"#ffn2\">&#x21a9;<\/a><\/li>\r\n<\/ol>\r\n","protected":false},"excerpt":{"rendered":"<p>Variable selection in multiple regression If you\u2019ve been following this series, you now know that multiple regression can be very useful but that its usefulness depends on overcoming several challenges&#8230;. <a class=\"read-more-button\" href=\"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/08\/26\/trying-out-a-couple-of-simple-strategies-for-reducing-the-number-of-covariates\/\">Read more &gt;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[],"class_list":["post-790","post","type-post","status-publish","format-standard","hentry","category-statistics"],"_links":{"self":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/790","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/comments?post=790"}],"version-history":[{"count":2,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/790\/revisions"}],"predecessor-version":[{"id":792,"href":
"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/790\/revisions\/792"}],"wp:attachment":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/media?parent=790"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/categories?post=790"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/tags?post=790"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}