September 2010 Archives
Question: After running Structure last week, we all found that as K got larger, so did the corresponding mean log probability of the data. That would mean K=8 has the largest value, so, logically, shouldn't we pick K=8 when we run the new data? I remember you talking about this in class, but I can't find in my notes whether this is the most important method for finding K in our situation.
My response: When we first talked about Structure in class, I presented two methods for selecting K: (1) looking at the log probability of the data directly from Structure (in our case, a mean of ~20 runs for each K from the spreadsheet) and (2) calculating Delta-K. I pointed out that simply using the log probability of the data is likely to overestimate K (a) if the underlying groups have genotype frequencies that depart from Hardy-Weinberg and/or (b) if the multilocus genotype frequencies aren't simply the product of the single-locus frequencies. In our case both (a) and (b) are likely to hold, so basing our choice of K purely on the log probability of the data probably isn't a good idea.
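Here's a minimal Python sketch of criterion (1), assuming you've copied the ln P(D) value from each Structure run into a dictionary keyed by K. The function names and the toy numbers are hypothetical, for illustration only, and are not our data. Averaging the runs and taking the largest mean reproduces the pattern in the question: the mean keeps creeping upward as K grows, so the largest K tested wins by default.

```python
from statistics import mean


def mean_log_prob_by_k(runs):
    """Average ln P(D) across Structure runs for each K.

    runs: dict mapping K -> list of ln P(D) values, one per run.
    (Hypothetical helper; assumes you've typed the values in yourself.)
    """
    return {k: mean(values) for k, values in sorted(runs.items())}


def k_with_highest_log_prob(runs):
    """Return the K whose mean ln P(D) is largest (criterion 1)."""
    means = mean_log_prob_by_k(runs)
    return max(means, key=means.get)


# Hypothetical ln P(D) values, three runs per K, for illustration only.
toy_runs = {
    1: [-5200.0, -5201.5, -5199.8],
    2: [-4800.2, -4801.0, -4799.5],
    3: [-4700.4, -4702.1, -4698.9],
    4: [-4690.0, -4695.3, -4688.1],
}

for k, m in mean_log_prob_by_k(toy_runs).items():
    print(f"K={k}: mean ln P(D) = {m:.1f}")
print("K with highest mean ln P(D):", k_with_highest_log_prob(toy_runs))
```

Notice how little the mean changes between K=3 and K=4 in the toy data; a tiny gain at a larger K still "wins" under this criterion, which is part of why it tends to overestimate K.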
Looking at Delta-K isn't a horrible alternative, but if you read the paper in which it was proposed, you'll see that this approach to selecting K was tested only under one very specific kind of population structure, which may or may not apply in our case. So I'd certainly suggest looking at Delta-K, but I wouldn't treat what it tells you as gospel either.
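For comparison, here's a sketch of Delta-K in the form Evanno et al. (2005) proposed: the absolute second difference of ln P(D) across successive values of K, divided by the standard deviation of ln P(D) at K. This version takes the second difference on the mean ln P(D) at each K, which is a common simplification; the function name and toy values are hypothetical. Delta-K is undefined for the smallest and largest K you ran, and it needs at least two runs per K so that a standard deviation exists.

```python
from statistics import mean, stdev


def delta_k(runs):
    """Delta-K for each interior K (one with both K-1 and K+1 present).

    runs: dict mapping K -> list of ln P(D) values (at least two runs per K).
    Delta-K(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K)
    """
    means = {k: mean(v) for k, v in runs.items()}
    sds = {k: stdev(v) for k, v in runs.items()}
    result = {}
    for k in sorted(runs):
        if k - 1 not in runs or k + 1 not in runs:
            continue  # undefined at the smallest and largest K tested
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        # If every run at K gave the same ln P(D), sd is 0 and Delta-K is undefined.
        result[k] = second_diff / sds[k] if sds[k] > 0 else float("nan")
    return result


# Hypothetical ln P(D) values, three runs per K, for illustration only.
toy_runs = {
    1: [-5200.0, -5201.5, -5199.8],
    2: [-4800.2, -4801.0, -4799.5],
    3: [-4700.4, -4702.1, -4698.9],
    4: [-4690.0, -4695.3, -4688.1],
}

scores = delta_k(toy_runs)
for k, dk in scores.items():
    print(f"K={k}: Delta-K = {dk:.1f}")
print("K with largest Delta-K:", max(scores, key=scores.get))
```

With these toy numbers Delta-K peaks at K=2 even though the mean log probability is highest at K=4, which is exactly why it helps to look at both criteria side by side rather than relying on either one alone.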
So what do you do? Well, take a look at what we did in our Molecular Ecology paper. We combined what both criteria had to tell us about K with a look at where populations fell on the map and what the different choices of K seemed to tell us about their relationships. In other words, we combined a quantitative understanding of population structure with biological intuition about which relationships are likely to be important, and used both to settle on the K that best helped us understand the data.
That's what I'm suggesting you do here. Look at both the log probability of the data and Delta-K. Then combine what they tell you with what the geographical distribution of populations might suggest about relationships to select the value of K (or possibly 2-3 values) that seems most informative. Be sure to explain your reasoning for the value or values of K you choose. Then use that choice (or those choices) to set up the analyses for questions 2 and 3.
The commentary: A bit of Texas in Florida
The full article: Genetic restoration of the Florida panther
We've also posted a slightly modified version of the Structure data file that has the populations sorted into taxonomic order. You'll need to run Structure again to get the bar plot with the populations sorted into taxa, but you only need to do it for the K (or Ks) you think are important for interpreting the data, and you only need to do it once for each of them. Here's the link. We've also posted a copy of an Excel spreadsheet with the data you'll need for the Delta-K calculations.
Kathryn also put together a document describing how she solved the questions in Problem 1.
To ease my grading, please use the following instructions:
1) Save your file as a .doc or .docx with the file name LastName_Problem1 (example: Theiss_Problem1).
2) Please answer each question on its own page. You should write 2-3 sentences to answer the question, and then paste your code and stats underneath.
I've also uploaded a new version of james.arp (for use with Arlequin). If you downloaded it earlier, please replace that version with this one. There were a couple of glitches in it that I needed to fix.
It also dawned on me that I should post versions of the R/JAGS code for the solution to the example problem for anyone who's interested. The R code is in the (creatively named) zoarces.R, and the JAGS code, which is equivalent to the WinBUGS code, is in zoarces.txt.
Even if you've already figured out how to use JAGS and R, you may want to download this too. It's easier to see where errors in your code are when you use the graphical interface. And if you're like me, you'll make plenty of errors, so finding them (relatively) easily is a big plus.
If you're connecting from off-campus, you should get an error message saying that you don't have permission to retrieve the file. I did that because even though OpenBUGS is freely available, I don't want to be responsible for distributing it to anyone other than folks at UConn.