{"id":797,"date":"2019-09-02T08:00:00","date_gmt":"2019-09-02T12:00:00","guid":{"rendered":"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/?p=797"},"modified":"2019-08-31T08:36:38","modified_gmt":"2019-08-31T12:36:38","slug":"principal-components-regression","status":"publish","type":"post","link":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/09\/02\/principal-components-regression\/","title":{"rendered":"Principal components regression"},"content":{"rendered":"\r\n<p><a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/variable-selection-in-multiple-regression\/\">Variable selection in multiple regression<\/a><\/p>\r\n<p>In the <a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/08\/26\/trying-out-a-couple-of-simple-strategies-for-reducing-the-number-of-covariates\/\">last installment<\/a> of this series, we explored a couple of simple strategies to reduce the number of covariates in a multiple regression<sup><a id=\"ffn1\" class=\"footnote\" href=\"#fn1\">1<\/a><\/sup>, namely retaining only covariates that have a \u201creal\u201d relationship with the response variable<sup><a id=\"ffn2\" class=\"footnote\" href=\"#fn2\">2<\/a><\/sup> and selecting one covariate from each cluster of (relatively) uncorrelated covariates.<sup><a id=\"ffn3\" class=\"footnote\" href=\"#fn3\">3<\/a><\/sup> Unfortunately, we found that neither approach worked very well in our toy example.<sup><a id=\"ffn4\" class=\"footnote\" href=\"#fn4\">4<\/a><\/sup><\/p>\r\n\r\n\r\n\r\n<p>One reason the second approach (picking \u201cweakly\u201d correlated covariates) may not have worked very well is that, in our toy example, we know that both <code>x1<\/code> and <code>x3<\/code> contribute positively to <code>y<\/code>, but our analysis included only <code>x1<\/code>. 
Another approach that is sometimes used when covariates are strongly associated with one another is to first perform a principal components analysis and then to regress the response variable on the scores from the first few principal components. The newest <a href=\"http:\/\/darwin.eeb.uconn.edu\/pages\/variable-selection\/principal-components-regression.nb.html\">R notebook<\/a> in this series explores principal components regression.<\/p>\r\n\r\n\r\n\r\n<p>Spoiler alert: It doesn\u2019t help the point estimates much either, but the uncertainty around those point estimates is so large that we can\u2019t legitimately say they\u2019re different from one another.<\/p>\r\n\r\n\r\n\r\n<ol class=\"wp-block-list\">\r\n<li id=\"fn1\">If you\u2019ve forgotten why we might want to reduce the number of covariates, look back at <a href=\"http:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/08\/19\/challenges-of-multiple-regression-or-why-we-might-want-to-select-variables\/\">this post<\/a>. <a href=\"#ffn1\">&#x21a9;<\/a><\/li>\r\n<li id=\"fn2\">The paradox lurking here is that if we knew which covariates these were, we probably wouldn\u2019t have measured the others (or at least we wouldn\u2019t have included them in the regression analysis). <a href=\"#ffn2\">&#x21a9;<\/a><\/li>\r\n<li id=\"fn3\">There isn\u2019t a good criterion to determine how weak the correlation needs to be to regard clusters as \u201crelatively\u201d uncorrelated. <a href=\"#ffn3\">&#x21a9;<\/a><\/li>\r\n<li id=\"fn4\">If you\u2019re reading footnotes, you\u2019ll realize that the situation isn\u2019t quite as dire as it appears from looking only at point estimates. Using the <code>rstanarm<\/code> package for a Bayesian analysis shows that the credible intervals are very broad and overlapping. We don\u2019t have good evidence that the point estimates are different from one another. 
<a href=\"#ffn4\">&#x21a9;<\/a><\/li>\r\n<\/ol>\r\n","protected":false},"excerpt":{"rendered":"<p>Variable selection in multiple regression In the last installment of this series we explored a couple of simple strategies to reduce the number of covariates in a multiple regression.1, namely&#8230; <a class=\"read-more-button\" href=\"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/blog\/2019\/09\/02\/principal-components-regression\/\">Read more &gt;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[],"class_list":["post-797","post","type-post","status-publish","format-standard","hentry","category-statistics"],"_links":{"self":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/797","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/comments?post=797"}],"version-history":[{"count":6,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/797\/revisions"}],"predecessor-version":[{"id":800,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/posts\/797\/revisions\/800"}],"wp:attachment":[{"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/media?parent=797"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/categories?post=797"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/darwin.eeb.uconn.edu\/uncommon-ground\/wp-json\/wp\/v2\/tags?po
st=797"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}