Often we want to use only part of the data that are available for further analysis. The subset() function in base R provides a convenient way to do that. I’ll illustrate with a simple example using data from Protea repens that will be part of the lab exercise for week 2.

library(tidyverse)

## I almost always begin my work with this line. rm() removes objects
## from the workspace, and list = ls() lists the objects currently in
## it, so the effect is to begin with a clean slate. Packages that are
## already loaded, such as tidyverse above, are not affected.
##
rm(list = ls())

dat <- read_csv("http://darwin.eeb.uconn.edu/eeb348-resources/repens-outliers.csv")

── Column specification ───────────────────────────────────────────────
cols(
  .default = col_double(),
  pop = col_character()
)
ℹ Use `spec()` for the full column specifications.

If you look in the “Data” window in RStudio (probably the upper right window), you’ll see that there’s an object called dat that has 662 observations and 174 variables. For this example we’ll work only with the first column, pop. Let’s see how many observations we have from each population.

table(dat$pop)

ALC ANY BAN BAV BRD CDB CER GAR KAR KLM KSW LOE MGU POT RIV RND SWA 
 12  52   8  36  46  22  41  72  24  39  47  22  12  28  36  42  58 
UNI VAN 
 30  35 
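
The same check can be done from the console instead of the “Data” window. The sketch below uses a small made-up data frame called toy (hypothetical, standing in for dat) so that it runs without downloading anything. It shows dim() reporting the number of observations and variables, and confirms that the counts from table() add up to the number of rows, which holds whenever the column has no missing values.

```r
## Hypothetical toy data standing in for dat; the values are made up.
toy <- data.frame(
  pop = c("ALC", "BRD", "BRD", "CDB", "BRD"),
  height = c(1.2, 0.8, 1.1, 0.9, 1.4)
)

## Number of rows (observations), then number of columns (variables)
dim(toy)

## Observations per population
counts <- table(toy$pop)
counts

## table() drops missing values by default, so with no NAs in pop
## the counts sum to the number of rows
sum(counts) == nrow(toy)
```

With the real data, dim(dat) should report the same 662 and 174 shown in the “Data” window.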

Suppose we’re interested only in individuals from BRD. We’ll use subset() to create a new object that includes only those individuals.

brd <- subset(dat, pop == "BRD")

Now you’ll see brd in the “Data” window. It has 46 observations (which matches the sample size we saw using table()) and 174 variables. brd retains all of the variables, but only the rows for individuals from BRD. We can verify that by running table() again.

table(brd$pop)

BRD 
 46 
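
subset() can do a bit more than pick out one population. The sketch below again uses a made-up data frame (toy, with hypothetical height and width columns that are not in the real data) to show three variations: keeping several populations at once with %in%, combining conditions with & while keeping only some columns through the select argument, and getting the same rows with ordinary logical indexing.

```r
## Hypothetical toy data; the column names and values are made up.
toy <- data.frame(
  pop = c("ALC", "BRD", "BRD", "CDB", "BRD"),
  height = c(1.2, 0.8, 1.1, 0.9, 1.4),
  width = c(0.5, 0.4, 0.6, 0.3, 0.7)
)

## Keep individuals from either of two populations
two_pops <- subset(toy, pop %in% c("BRD", "CDB"))

## Combine conditions with & and keep only the columns named in select
tall_brd <- subset(toy, pop == "BRD" & height > 1, select = c(pop, height))

## The same rows as subset(toy, pop == "BRD"), using logical indexing
brd_index <- toy[toy$pop == "BRD", ]

table(two_pops$pop)
tall_brd
nrow(brd_index)
```

One difference worth knowing: subset() drops rows where the condition evaluates to NA, while logical indexing with [ ] returns those rows filled with NA. When the column being tested has no missing values, as in the toy data here, the two approaches give the same result.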