Often we want to use only part of the data that are available for further analysis. The subset() function in base R provides a convenient way to do that. I’ll illustrate with a simple example using data from Protea repens that will be part of the lab exercise for week 2.
library(tidyverse)
## I almost always begin my work with this line. rm() removes objects
## from memory. list = ls() specifies all of the objects in memory. The
## effect is to remove all objects from memory so that I begin with a
## clean slate.
##
rm(list = ls())
dat <- read_csv("http://darwin.eeb.uconn.edu/eeb348-resources/repens-outliers.csv")
── Column specification ───────────────────────────────────────────────
cols(
.default = col_double(),
pop = col_character()
)
ℹ Use `spec()` for the full column specifications.
If you look in the “Data” window in RStudio (probably the upper right window), you’ll see that there’s an object called dat that has 662 observations and 174 variables. For this example we’ll work only with the first column. Let’s see how many observations we have from each population.
table(dat$pop)
ALC ANY BAN BAV BRD CDB CER GAR KAR KLM KSW LOE MGU POT RIV RND SWA
12 52 8 36 46 22 41 72 24 39 47 22 12 28 36 42 58
UNI VAN
30 35
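To see what table() is doing on a smaller scale, here is a minimal sketch using a short vector of population labels. The vector pops and its contents are invented for illustration and are not taken from the Protea repens data.

```r
# Hypothetical population labels, invented for illustration only.
pops <- c("BRD", "ALC", "BRD", "KAR", "BRD", "ALC")

# table() counts how many times each distinct value occurs.
counts <- table(pops)
counts

# A single count can be pulled out by name.
counts[["BRD"]]

# The counts add up to the number of observations.
sum(counts) == length(pops)
```

The same check works on the real data: the counts reported by table(dat$pop) should add up to the number of rows in dat.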
Suppose we’re interested only in individuals from BRD. We’ll use subset() to create a new object that includes only those individuals.
brd <- subset(dat, pop == "BRD")
Now you’ll see brd in the “Data” window. It has 46 observations (which matches the sample size we saw using table()) and 174 variables. brd retains all of the variables, but only for individuals from BRD. We can verify that by running table() again.
table(brd$pop)
BRD
46
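subset() can also combine several conditions and keep only some of the columns. The sketch below shows both, along with the equivalent bracket indexing, on a small data frame. The data frame toy, its columns height and width, and all of its values are made up for illustration; they are not columns of the real data set.

```r
# Toy data frame standing in for dat; all values are invented for illustration.
toy <- data.frame(
  pop    = c("BRD", "BRD", "ALC", "KAR", "BRD"),
  height = c(1.2, 0.8, 1.5, 1.1, 1.4),
  width  = c(0.5, 0.4, 0.7, 0.6, 0.9)
)

# Rows from BRD only, as in the example above.
brd_toy <- subset(toy, pop == "BRD")

# The same rows selected with bracket indexing.
brd_idx <- toy[toy$pop == "BRD", ]

# Combine conditions with & and keep only some columns with select.
brd_tall <- subset(toy, pop == "BRD" & height > 1, select = c(pop, height))

nrow(brd_toy)
nrow(brd_idx)
brd_tall
```

One difference worth knowing: when the condition evaluates to NA for a row, subset() drops that row, while bracket indexing returns a row of NAs in its place.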