We will discuss STRUCTURE
in lecture on Thursday. Unfortunately, I have not been able to get STRUCTURE
to run under Mac OS X Big Sur (v11.5.2). Those of you who use Windows may be able to install and run STRUCTURE
. Those of you running earlier versions of Mac OS X may also be able to install and run it. If you can, and you’d like to play around with it, let me know and I’ll arrange a time to walk you through how to use it.[^ADMIXTURE
provides maximum likelihood estimates of individual assignment using the same sampling model as STRUCTURE
. Binaries are available for Mac OS X and Linux, but not for Windows. I haven’t checked to see whether the Mac OS X binaries work on my machine.]
Fortunately, there is an R
package that we can use to explore the properties of individual assignement methods, although the mathematical approach it uses is quite different from STRUCTURE
. We’ll get to that a bit later. First, a bit about the data we’ll be using this week and next.
The data
Many species of trout and salmon are found in fragmented populations. Some of them persist only as relicts. Gila trout (Onchorhyncus gilae), for example, is restricted to a few relict populations in the southwestern United States where it is threatened by hybridization from non-native rainbow trout (O. mykiss). Camak et al. (2021) collected samples from five populations of O. gilae in southwestern New Mexico and genotyped those samples at 2381 SNP loci. We will use these data to explore population structure in O. gilae this week. In Project #1 (next week) we’ll extend the analysis to address some aspects of the conservation questions that motivated this study.
Introducing LEA
The ugly math
We will be using the package LEA
to explore individual assignment. The principles behind this approach are described in Frichot et al. (2014). Briefly, Frichot et al. note that the ancestry proportions for any individual can be written in matrix form as \[
{\bf P} = {\bf Q}{\bf G} \quad ,
\] where \(\bf Q\) is the ancestry proportion for each individual and \(\bf G\) describes the genotype frequencies in the base populations. So far, this is equivalent to what STRUCTURE
does. Where LEA
differs is that Frichot et al. use a special form of least squares optimization where \(\bf Q\) and \(\bf G\) are constrained to be non-negative their rows are constrained to sum to one.
If none of that makes sense, don’t worry about it. All you really need to know is that LEA
is trying to estimate the same thing as STRUCTURE
and that it’s taking a different approach.
Installing LEA
LEA
is not available on CRAN
. It is a Bioconductor
package. The good news is that it’s less difficult to install that Hickory
. Here’s what you do. First, install BiocManager
.
install.packages("BiocManager")
Then install LEA
.
BiocManager::install("LEA")
As always, let me know if you run into difficulties and I’ll do what I can to help.
Lab exercise 3
Read through section 3.2 of the tutorial on LEA
: https://bioconductor.org/packages/release/bioc/vignettes/LEA/inst/doc/LEA.pdf.
Use snmf()
to identify the best choice for \(K\) (or the best choices if there are several that seem pretty similar). Use values of \(K\) from 2 to 10, by specifying K = 2:10
in the call to snmf()
. Here’s a link to the data: http://darwin.eeb.uconn.edu/eeb348-resources/gila-trout.stru.geno.
Visualize the results from #2 using the function below.. This will put the function plot_LEA.R
in your workspace, and you can call it to plot the results as follows:
sites_table <- c(31, 31, 30, 31, 31)
names(sites_table) <- c("Iron Creek", "Main Diamond", "South Diamond",
"Spruce Creek", "Whiskey Creek")
plot_LEA("your_object_name_here", "your K here", sites_table)
where you substitute the name you used to store the results from snmf()
and the K
you want to visualize.
Do you see substantial evidence of admixture? If so, in which populations? How are you recognizing the presence or absence of admixture?
plot_LEA
Here’s the plotting code that I promised above.
library(tidyverse)
library(ggplot2)
plot_LEA <- function(fit, k, sites_table) {
sites_end <- cumsum(sites_table)
ce <- cross.entropy(fit, K = k)
best <- which.min(ce)
q <- Q(fit, K = k, run = best)
q <- data.frame(individual = seq(from = 1, to = nrow(q)), q)
colnames(q) <- c("Individual", paste("Cluster_", 1:k, sep = ""))
df <- pivot_longer(q, starts_with("Cluster"), names_to = "cluster")
p <- ggplot(df, aes(x = Individual, y = value, fill = cluster)) +
geom_bar(stat = "identity") +
xlab("Individual") +
ylab("Admixture proportion") +
ggtitle(paste("K = ", k, sep = ""))
guides(fill = "none") +
theme_bw()
for (i in 1:length(sites_end)) {
x_increment <- ifelse(i > 1, cumsum(sites_table[2:i])[i-1], 0)
p <- p + geom_vline(xintercept = sites_end[i]) +
annotate("text", x = sites_end[1]/2 + x_increment,
y = 1.05, label = names(sites_table)[i])
}
print(p)
}
LS0tCnRpdGxlOiAiSW5kaXZpZHVhbCBhc3NpZ25tZW50IHdpdGggYExFQWAiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCldlIHdpbGwgZGlzY3VzcyBgU1RSVUNUVVJFYCBpbiBsZWN0dXJlIG9uIFRodXJzZGF5LiBVbmZvcnR1bmF0ZWx5LCBJIGhhdmUgbm90IGJlZW4gYWJsZSB0byBnZXQgYFNUUlVDVFVSRWAgdG8gcnVuIHVuZGVyIE1hYyBPUyBYIEJpZyBTdXIgKHYxMS41LjIpLiBUaG9zZSBvZiB5b3Ugd2hvIHVzZSBXaW5kb3dzIG1heSBiZSBhYmxlIHRvIGluc3RhbGwgYW5kIHJ1biBgU1RSVUNUVVJFYC4gVGhvc2Ugb2YgeW91IHJ1bm5pbmcgZWFybGllciB2ZXJzaW9ucyBvZiBNYWMgT1MgWCBtYXkgYWxzbyBiZSBhYmxlIHRvIGluc3RhbGwgYW5kIHJ1biBpdC4gSWYgeW91IGNhbiwgYW5kIHlvdSdkIGxpa2UgdG8gcGxheSBhcm91bmQgd2l0aCBpdCwgbGV0IG1lIGtub3cgYW5kIEknbGwgYXJyYW5nZSBhIHRpbWUgdG8gd2FsayB5b3UgdGhyb3VnaCBob3cgdG8gdXNlIGl0LlteYEFETUlYVFVSRWAgcHJvdmlkZXMgbWF4aW11bSBsaWtlbGlob29kIGVzdGltYXRlcyBvZiBpbmRpdmlkdWFsIGFzc2lnbm1lbnQgdXNpbmcgdGhlIHNhbWUgc2FtcGxpbmcgbW9kZWwgYXMgYFNUUlVDVFVSRWAuIEJpbmFyaWVzIGFyZSBhdmFpbGFibGUgZm9yIE1hYyBPUyBYIGFuZCBMaW51eCwgYnV0IG5vdCBmb3IgV2luZG93cy4gSSBoYXZlbid0IGNoZWNrZWQgdG8gc2VlIHdoZXRoZXIgdGhlIE1hYyBPUyBYIGJpbmFyaWVzIHdvcmsgb24gbXkgbWFjaGluZS5dIAoKRm9ydHVuYXRlbHksIHRoZXJlIGlzIGFuIGBSYCBwYWNrYWdlIHRoYXQgd2UgY2FuIHVzZSB0byBleHBsb3JlIHRoZSBwcm9wZXJ0aWVzIG9mIGluZGl2aWR1YWwgYXNzaWduZW1lbnQgbWV0aG9kcywgYWx0aG91Z2ggdGhlIG1hdGhlbWF0aWNhbCBhcHByb2FjaCBpdCB1c2VzIGlzIHF1aXRlIGRpZmZlcmVudCBmcm9tIGBTVFJVQ1RVUkVgLiBXZSdsbCBnZXQgdG8gdGhhdCBhIGJpdCBsYXRlci4gRmlyc3QsIGEgYml0IGFib3V0IHRoZSBkYXRhIHdlJ2xsIGJlIHVzaW5nIHRoaXMgd2VlayBhbmQgbmV4dC4KCiMjIFRoZSBkYXRhCgpNYW55IHNwZWNpZXMgb2YgdHJvdXQgYW5kIHNhbG1vbiBhcmUgZm91bmQgaW4gZnJhZ21lbnRlZCBwb3B1bGF0aW9ucy4gU29tZSBvZiB0aGVtIHBlcnNpc3Qgb25seSBhcyByZWxpY3RzLiBHaWxhIHRyb3V0ICgqT25jaG9yaHluY3VzIGdpbGFlKiksIGZvciBleGFtcGxlLCBpcyByZXN0cmljdGVkIHRvIGEgZmV3IHJlbGljdCBwb3B1bGF0aW9ucyBpbiB0aGUgc291dGh3ZXN0ZXJuIFVuaXRlZCBTdGF0ZXMgd2hlcmUgaXQgaXMgdGhyZWF0ZW5lZCBieSBoeWJyaWRpemF0aW9uIGZyb20gbm9uLW5hdGl2ZSByYWluYm93IHRyb3V0ICgqTy4gbXlraXNzKikuIENhbWFrIGV0IGFsLiAoMjAyMSleW0NhbWFrLCBELlQuLCBNLkouIE9zYm9ybmUsIGFuZCBULkYuIFR1cm5lci4gMjAyMS4gUG9wdWxhdGlvbiBnZW5vbWljcyBhbmQgY29uc2VydmF0aW9uIG9mIEdpbGEgdHJvdXQgKCpPbmNob3JoeW5jdXMgZ2lsYWUqKS4gKkNvbnNlcnZhdGlvbiBHZW5ldGljcyogW2h0dHBzOi8vZG9pLm9yZy8xMC4xMDA3L3MxMDU5Mi0wMjEtMDEzNTUtMF0oaHR0cHM6Ly9kb2kub3JnLzEwLjEwMDcvczEwNTkyLTAyMS0wMTM1NS0wKS5dIGNvbGxlY3RlZCBzYW1wbGVzIGZyb20gZml2ZSBwb3B1bGF0aW9ucyBvZiAqTy4gZ2lsYWUqIGluIHNvdXRod2VzdGVybiBOZXcgTWV4aWNvIGFuZCBnZW5vdHlwZWQgdGhvc2Ugc2FtcGxlcyBhdCAyMzgxIFNOUCBsb2NpLiBXZSB3aWxsIHVzZSB0aGVzZSBkYXRhIHRvIGV4cGxvcmUgcG9wdWxhdGlvbiBzdHJ1Y3R1cmUgaW4gKk8uIGdpbGFlKiB0aGlzIHdlZWsuIEluIFByb2plY3QgIzEgKG5leHQgd2Vlaykgd2UnbGwgZXh0ZW5kIHRoZSBhbmFseXNpcyB0byBhZGRyZXNzIHNvbWUgYXNwZWN0cyBvZiB0aGUgY29uc2VydmF0aW9uIHF1ZXN0aW9ucyB0aGF0IG1vdGl2YXRlZCB0aGlzIHN0dWR5LgoKIyMgSW50cm9kdWNpbmcgYExFQWAKCiMjIyBUaGUgdWdseSBtYXRoCgpXZSB3aWxsIGJlIHVzaW5nIHRoZSBwYWNrYWdlIGBMRUFgIHRvIGV4cGxvcmUgaW5kaXZpZHVhbCBhc3NpZ25tZW50LiBUaGUgcHJpbmNpcGxlcyBiZWhpbmQgdGhpcyBhcHByb2FjaCBhcmUgZGVzY3JpYmVkIGluIEZyaWNob3QgZXQgYWwuICgyMDE0KS5eW0ZyaWNob3QsIEUuLCBGLiBNYXRoaWV1LCBULiBUcm91aWxsb24sIEcuIEJvdWNoYXJkIGFuZCBPLiBGcmFuw6dvaXMuIDIwMTQuIEZhc3QgYW5kIGVmZmljaWVudCBlc3RpbWF0aW9uIG9mIGluZGl2aWR1YWwgYW5jZXN0cnkgY29lZmZpY2llbnRzLiAqR2VuZXRpY3MqIDE5Njo5NzPigJM5ODMuIFtodHRwczovL2RvaS5vcmcvMTAuMTUzNC9nZW5ldGljcy4xMTMuMTYwNTcyXShodHRwczovL2RvaS5vcmcvMTAuMTUzNC9nZW5ldGljcy4xMTMuMTYwNTcyKV0gQnJpZWZseSwgRnJpY2hvdCBldCBhbC4gbm90ZSB0aGF0IHRoZSBhbmNlc3RyeSBwcm9wb3J0aW9ucyBmb3IgYW55IGluZGl2aWR1YWwgY2FuIGJlIHdyaXR0ZW4gaW4gbWF0cml4IGZvcm0gYXMKJCQKe1xiZiBQfSA9IHtcYmYgUX17XGJmIEd9IFxxdWFkICwKJCQKd2hlcmUgJFxiZiBRJCBpcyB0aGUgYW5jZXN0cnkgcHJvcG9ydGlvbiBmb3IgZWFjaCBpbmRpdmlkdWFsIGFuZCAkXGJmIEckIGRlc2NyaWJlcyB0aGUgZ2Vub3R5cGUgZnJlcXVlbmNpZXMgaW4gdGhlIGJhc2UgcG9wdWxhdGlvbnMuIFNvIGZhciwgdGhpcyBpcyBlcXVpdmFsZW50IHRvIHdoYXQgYFNUUlVDVFVSRWAgZG9lcy4gV2hlcmUgYExFQWAgZGlmZmVycyBpcyB0aGF0IEZyaWNob3QgZXQgYWwuIHVzZSBhIHNwZWNpYWwgZm9ybSBvZiBsZWFzdCBzcXVhcmVzIG9wdGltaXphdGlvbiB3aGVyZSAkXGJmIFEkIGFuZCAkXGJmIEckIGFyZSBjb25zdHJhaW5lZCB0byBiZSBub24tbmVnYXRpdmUgdGhlaXIgcm93cyAgYXJlIGNvbnN0cmFpbmVkIHRvIHN1bSB0byBvbmUuIAoKSWYgbm9uZSBvZiB0aGF0IG1ha2VzIHNlbnNlLCBkb24ndCB3b3JyeSBhYm91dCBpdC4gQWxsIHlvdSByZWFsbHkgbmVlZCB0byBrbm93IGlzIHRoYXQgYExFQWAgaXMgdHJ5aW5nIHRvIGVzdGltYXRlIHRoZSBzYW1lIHRoaW5nIGFzIGBTVFJVQ1RVUkVgIGFuZCB0aGF0IGl0J3MgdGFraW5nIGEgZGlmZmVyZW50IGFwcHJvYWNoLgoKIyMjIEluc3RhbGxpbmcgTEVBCgpgTEVBYCBpcyBub3QgYXZhaWxhYmxlIG9uIGBDUkFOYC4gSXQgaXMgYSBgQmlvY29uZHVjdG9yYCBwYWNrYWdlLiBUaGUgZ29vZCBuZXdzIGlzIHRoYXQgaXQncyBsZXNzIGRpZmZpY3VsdCB0byBpbnN0YWxsIHRoYXQgYEhpY2tvcnlgLiBIZXJlJ3Mgd2hhdCB5b3UgZG8uIEZpcnN0LCBpbnN0YWxsIGBCaW9jTWFuYWdlcmAuCgo8cHJlPgppbnN0YWxsLnBhY2thZ2VzKCJCaW9jTWFuYWdlciIpCjwvcHJlPgoKVGhlbiBpbnN0YWxsIGBMRUFgLgoKPHByZT4KQmlvY01hbmFnZXI6Omluc3RhbGwoIkxFQSIpCjwvcHJlPgoKQXMgYWx3YXlzLCBsZXQgbWUga25vdyBpZiB5b3UgcnVuIGludG8gZGlmZmljdWx0aWVzIGFuZCBJJ2xsIGRvIHdoYXQgSSBjYW4gdG8gaGVscC4KCiMjIExhYiBleGVyY2lzZSAzCgoxLiBSZWFkIHRocm91Z2ggc2VjdGlvbiAzLjIgb2YgdGhlIHR1dG9yaWFsIG9uIGBMRUFgOiBbaHR0cHM6Ly9iaW9jb25kdWN0b3Iub3JnL3BhY2thZ2VzL3JlbGVhc2UvYmlvYy92aWduZXR0ZXMvTEVBL2luc3QvZG9jL0xFQS5wZGZdKGh0dHBzOi8vYmlvY29uZHVjdG9yLm9yZy9wYWNrYWdlcy9yZWxlYXNlL2Jpb2MvdmlnbmV0dGVzL0xFQS9pbnN0L2RvYy9MRUEucGRmKS4KCjIuIFVzZSBgc25tZigpYCB0byBpZGVudGlmeSB0aGUgYmVzdCBjaG9pY2UgZm9yICRLJCAob3IgdGhlIGJlc3QgY2hvaWNlcyBpZiB0aGVyZSBhcmUgc2V2ZXJhbCB0aGF0IHNlZW0gcHJldHR5IHNpbWlsYXIpLl5bWW91IG1heSBub3Qgc2VlIGEgY2xlYXIgbWluaW11bSBpbiB0aGUgY3Jvc3MtZW50cm9weS4gSWYgbm90LCBwaWNrIGEgJEskIHdoZXJlIGl0IHNlZW1zIGFzIGlmIHRoZSBjcm9zcy1lbnRyb3B5IGJlZ2lucyB0byBwbGF0ZWF1IGFuZCBleGFtaW5lIHRoZSByZXN1bHRzIGZvciB2YWx1ZXMgb2YgJEskIG9uZSBvciB0d28gc3RlcHMgc21hbGxlciBhbmQgb25lIG9yIHR3byBzdGVwcyBncmVhdGVyLl0gVXNlIHZhbHVlcyBvZiAkSyQgZnJvbSAyIHRvIDEwLCBieSBzcGVjaWZ5aW5nIGBLID0gMjoxMGAgaW4gdGhlIGNhbGwgdG8gYHNubWYoKWAuIEhlcmUncyBhIGxpbmsgdG8gdGhlIGRhdGE6IFtodHRwOi8vZGFyd2luLmVlYi51Y29ubi5lZHUvZWViMzQ4LXJlc291cmNlcy9naWxhLXRyb3V0LnN0cnUuZ2Vub10oaHR0cDovL2Rhcndpbi5lZWIudWNvbm4uZWR1L2VlYjM0OC1yZXNvdXJjZXMvZ2lsYS10cm91dC5zdHJ1Lmdlbm8pLgoKMy4gVmlzdWFsaXplIHRoZSByZXN1bHRzIGZyb20gIzIgdXNpbmcgdGhlIGZ1bmN0aW9uIGJlbG93Ll5bVGhlIGVhc2llc3Qgd2F5IHRvIGRvIHRoaXMgaXMgdG8gY29weSBhbmQgcGFzdGUgaXQgdG8gYW4gZWRpdG9yIGFuZCBzYXZlIGl0IGluIGEgZmlsZSBuYW1lZCBgcGxvdF9MRUEuUmAuXltCZSBzdXJlIHRvIHNhdmUgaXQgYXMgYSBwbGFpbiB0ZXh0IGZpbGUsIG5vdCBhcyBhIFdvcmQgZG9jdW1lbnQuIEl0J3MgZWFzaWVzdCB0byBkbyB0aGlzIGlmIHlvdSB1c2UgTm90ZXBhZCBpbiBXaW5kb3dzIG9yIFRleHRlZGl0IG9uIGEgTWFjLl0gVGhlbiBmcm9tIHRoZSBjb25zb2xlIHdpbmRvdyBpbiBgUlN0dWRpb2AgdHlwZSBgc291cmNlKCJwbG90X0xFQS5SIilgXS4gVGhpcyB3aWxsIHB1dCB0aGUgZnVuY3Rpb24gYHBsb3RfTEVBLlJgIGluIHlvdXIgd29ya3NwYWNlLCBhbmQgeW91IGNhbiBjYWxsIGl0IHRvIHBsb3QgdGhlIHJlc3VsdHMgYXMgZm9sbG93czoKPHByZT4Kc2l0ZXNfdGFibGUgPC0gYygzMSwgMzEsIDMwLCAzMSwgMzEpCm5hbWVzKHNpdGVzX3RhYmxlKSA8LSBjKCJJcm9uIENyZWVrIiwgIk1haW4gRGlhbW9uZCIsICJTb3V0aCBEaWFtb25kIiwKICAgICAgICAgICAgICAgICAgICAgICAgIlNwcnVjZSBDcmVlayIsICJXaGlza2V5IENyZWVrIikKcGxvdF9MRUEoInlvdXJfb2JqZWN0X25hbWVfaGVyZSIsICJ5b3VyIEsgaGVyZSIsIHNpdGVzX3RhYmxlKQo8L3ByZT4Kd2hlcmUgeW91IHN1YnN0aXR1dGUgdGhlIG5hbWUgeW91IHVzZWQgdG8gc3RvcmUgdGhlIHJlc3VsdHMgZnJvbSBgc25tZigpYCBhbmQgdGhlIGBLYCB5b3Ugd2FudCB0byB2aXN1YWxpemUuCgo0LiBEbyB5b3Ugc2VlIHN1YnN0YW50aWFsIGV2aWRlbmNlIG9mIGFkbWl4dHVyZT8gSWYgc28sIGluIHdoaWNoIHBvcHVsYXRpb25zPyBIb3cgYXJlIHlvdSByZWNvZ25pemluZyB0aGUgcHJlc2VuY2Ugb3IgYWJzZW5jZSBvZiBhZG1peHR1cmU/CgojIyMgYHBsb3RfTEVBYAoKSGVyZSdzIHRoZSBwbG90dGluZyBjb2RlIHRoYXQgSSBwcm9taXNlZCBhYm92ZS4KCjxwcmU+CmxpYnJhcnkodGlkeXZlcnNlKQpsaWJyYXJ5KGdncGxvdDIpCgpwbG90X0xFQSA8LSBmdW5jdGlvbihmaXQsIGssIHNpdGVzX3RhYmxlKSB7CiAgIHNpdGVzX2VuZCA8LSBjdW1zdW0oc2l0ZXNfdGFibGUpCiAgIGNlIDwtIGNyb3NzLmVudHJvcHkoZml0LCBLID0gaykKICAgYmVzdCA8LSB3aGljaC5taW4oY2UpCiAgIHEgPC0gUShmaXQsIEsgPSBrLCBydW4gPSBiZXN0KQogICBxIDwtIGRhdGEuZnJhbWUoaW5kaXZpZHVhbCA9IHNlcShmcm9tID0gMSwgdG8gPSBucm93KHEpKSwgcSkKICAgY29sbmFtZXMocSkgPC0gYygiSW5kaXZpZHVhbCIsIHBhc3RlKCJDbHVzdGVyXyIsIDE6aywgc2VwID0gIiIpKQogICBkZiA8LSBwaXZvdF9sb25nZXIocSwgc3RhcnRzX3dpdGgoIkNsdXN0ZXIiKSwgbmFtZXNfdG8gPSAiY2x1c3RlciIpCiAgIHAgPC0gZ2dwbG90KGRmLCBhZXMoeCA9IEluZGl2aWR1YWwsIHkgPSB2YWx1ZSwgZmlsbCA9IGNsdXN0ZXIpKSArCiAgICAgIGdlb21fYmFyKHN0YXQgPSAiaWRlbnRpdHkiKSArCiAgICAgIHhsYWIoIkluZGl2aWR1YWwiKSArCiAgICAgIHlsYWIoIkFkbWl4dHVyZSBwcm9wb3J0aW9uIikgKwogICAgICBnZ3RpdGxlKHBhc3RlKCJLID0gIiwgaywgc2VwID0gIiIpKQogICAgICBndWlkZXMoZmlsbCA9ICJub25lIikgKyAKICAgICAgdGhlbWVfYncoKQogICBmb3IgKGkgaW4gMTpsZW5ndGgoc2l0ZXNfZW5kKSkgewogICAgICB4X2luY3JlbWVudCA8LSBpZmVsc2UoaSA+IDEsIGN1bXN1bShzaXRlc190YWJsZVsyOmldKVtpLTFdLCAwKQogICAgICBwIDwtIHAgKyBnZW9tX3ZsaW5lKHhpbnRlcmNlcHQgPSBzaXRlc19lbmRbaV0pICsKICAgICAgICAgYW5ub3RhdGUoInRleHQiLCB4ID0gc2l0ZXNfZW5kWzFdLzIgKyB4X2luY3JlbWVudCwgCiAgICAgICAgICAgICAgICAgIHkgPSAxLjA1LCBsYWJlbCA9IG5hbWVzKHNpdGVzX3RhYmxlKVtpXSkKICAgfQogICBwcmludChwKQp9CjwvcHJlPg==