An exercise using Tajima’s D

We discussed Tajima’s D in class last week. As you’ll notice when you read this week’s exercise, I realized that I’ve been describing a couple of details concerning Tajima’s D incorrectly for several years. I’ve corrected the online notes, and you’ll find a brief reference to the differences in this week’s exercise. Fortunately, the details I got wrong don’t affect the interpretation of any results, only the way in which Tajima’s D is calculated. As usual, you can also find the exercise linked from the Lab Schedule page.
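For anyone who wants to see the arithmetic spelled out, here is a minimal Python sketch of the standard formulation of Tajima's D (Tajima 1989). It is not the code from this week's exercise, and it does not show which details were corrected in the notes. The function name `tajimas_d` and the input format (a list of equal-length aligned strings) are assumptions made for illustration.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (hypothetical helper)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])

    # S: number of segregating (polymorphic) sites in the sample
    seg_sites = sum(
        1 for i in range(length) if len({seq[i] for seq in sequences}) > 1
    )
    if seg_sites == 0:
        return float("nan")  # D is undefined without any variation

    # pi: mean number of pairwise differences
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants that depend only on the sample size n
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D: difference between the two estimates of theta, scaled by its
    # estimated standard deviation
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

A sample in which every variant is a singleton gives a negative D, while one in which variants sit at intermediate frequency gives a positive D.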

In grading Lab 7, the one on exploring the coalescent, I realized that the 1-dimensional stepping stone and the finite island model didn’t have quite the properties I expected when the number of populations is low. I verified the pattern many of you found, and the results are displayed here. You can also find a link to the results from the Lab Schedule page. We’ll talk more about what they mean and what they tell us about drift and migration on Tuesday.

Coalescence and self-incompatibility

Lab 9 is now posted. As usual, you can find it from the Lab Schedule page or from the direct link below. There is only one simulation this week, and it runs pretty quickly (4-5 seconds on my laptop). As with Project #2, the emphasis here is on interpreting the results. If you’d like an overview of sRNases (the proteins produced by the loci used in this exercise), the Igic and Kohn paper below is a bit old (OK, two decades old), but it provides a good overview of the phenomena.

  • Comparing simulated and “observed” coalescent times

Igic, B., and J.R. Kohn. 2001. Evolutionary relationships among self-incompatibility RNases. Proceedings of the National Academy of Sciences USA 98:13167-13171. doi: 10.1073/pnas.231386798

Conservation genetics of Pacific salmon – Project #2

I’ve just uploaded Project #2. As usual, you can find it either from the Lecture Schedule page or from the direct link below. This project is different from any of the lab exercises you’ve done so far. There isn’t data to analyze, and there aren’t simulations to do. Instead, there’s a paper by Robin Waples and David Teel to read: Conservation Genetics of Pacific Salmon I. Temporal Changes in Allele Frequency. [1] Once you’ve read the paper, I have five questions for you to answer. When I grade this project, I will be evaluating how well you use what you’ve learned about genetic drift and natural selection to answer the questions. None of the answers needs to be more than a couple of paragraphs. Feel free to submit them in whatever form you find convenient: Word document, R notebook, PDF, and Pages are formats I know I can handle easily. If you send it in a form I don’t recognize, I’ll be in touch.


1 This link takes you to the website of Conservation Biology. I can’t seem to get to the full text of the paper from off-campus, even through the VPN, but there is a freely available version at:

Exploring the coalescent

I’ve just posted Lab 7. You’ll find it in the usual places, i.e., from the Lab Schedule page or from the direct link below. As you’ll see, this lab exercise will allow you to explore the coalescent by comparing the time to coalescence of all alleles in a sample as a function of the rate of migration, the number of populations exchanging genes, and the migration model, i.e., either the finite island model you explored in last week’s exercise or the one-dimensional stepping stone. As noted in the exercise itself, the run_simulation() function will run 1000 samples by default. I recommend picking a smaller number first, say 50 or 100, to get a sense of how long a full run will take before you start the simulation. It shouldn’t be surprising that simulations take longer the more populations that you specify. There’s no need to try simulating more than 100 populations. You’ll see any patterns associated with differing numbers of populations by the time you get to that many.
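The "pilot run first" advice can be sketched as follows. This is a Python illustration only: the lab's `run_simulation()` is not reproduced here, so `simulate` stands in for any function that takes a number of samples. The helper name `estimate_full_runtime` is hypothetical, and the estimate assumes that run time grows roughly in proportion to the number of samples.

```python
import time


def estimate_full_runtime(simulate, pilot_samples, full_samples,
                          timer=time.perf_counter):
    """Time a small pilot run and extrapolate linearly to the full run.

    `simulate` is a hypothetical stand-in for the lab's simulation function;
    it is called once with `pilot_samples`. Linear scaling with the number
    of samples is an assumption, not a measured property of the lab code.
    """
    start = timer()
    simulate(pilot_samples)
    elapsed = timer() - start
    return elapsed * full_samples / pilot_samples
```

For example, if a 50-sample pilot takes 2 seconds, the projected time for 1000 samples is about 40 seconds. Repeating the pilot with a couple of different numbers of populations shows how quickly the cost grows along that axis as well.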

Lab exercise for Week 6 posted

I’ll have grades returned for Week 5’s lab by the end of the day tomorrow. This week’s lab is a bit more involved for a couple of reasons.

  1. Each set of simulation conditions will take you longer to run. Each set takes about half an hour on my MacBook. You’ll want to get an early start on the exercise so you have a chance to explore at least five different sets of conditions.
  2. The questions ask you to stretch a little and interpret the results of the simulations relative to theoretical expectations.

As usual, you can get directly to the exercise from the link below, or you can get to it through the Lab Schedule page.

Lab exercise for week 5 posted

I just posted the lab exercise for week 5. You can find it from the Lab Schedule page, or you can click on the link below. As you’ll see, this week’s exercise is a bit different. Rather than analyzing data, you’ll be running a small simulation to explore some properties of drift. Provided that you show results from at least 10 different sets of parameters, you’ll get 10 out of 10 points for this exercise. The next two weeks will be similar, except that I’m liable to ask you to stretch a bit and explain some aspect of the results you see in weeks 6 and 7.
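To give a sense of what a drift simulation of this kind does, here is a minimal Python sketch of a Wright-Fisher model for one locus with two alleles. It is not the simulation used in the exercise. The function name `wright_fisher` and its parameters (`p0`, `n_diploid`, `generations`, `seed`) are assumptions chosen for illustration.

```python
import random


def wright_fisher(p0, n_diploid, generations, seed=None):
    """Allele-frequency trajectory under pure drift (hypothetical sketch).

    Each generation, 2N gene copies are drawn independently, each carrying
    the focal allele with probability equal to its current frequency
    (binomial sampling).
    """
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        p = count / two_n
        trajectory.append(p)
    return trajectory
```

Running it with different values of `n_diploid` and `p0` shows the familiar patterns: smaller populations wander further per generation, and once the frequency reaches 0 or 1 it stays there.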

Project 1 now available

Project 1 is now available. You can find it from the Lab Schedule, or you can download it directly from this link. You’ll be analyzing the same data as you did last week, but it’s in a different format. If you continue to use population genetics in your research after this course, one of the things you’ll discover is that there are a variety of different formats used by different packages. In addition to using R for analyses, you’ll find that you need to learn how to wrangle data in R (or in Python), but I’ll do my best to protect you from that this semester.

Updated notes on Bayesian estimates of Fst

I included a few details about the Bayesian model that Hickory uses that weren’t in the notes I posted earlier: the among-population allele frequency distribution and the extension of the model to include locus- and population-specific effects. I’ve edited the notes to include those details, and you’ll find new HTML and PDF versions on the website. They’ll appear in the consolidated “book” version of the notes in mid- to late December.

Quick update on today’s lecture

In addition to the extra reading on F-statistics that I already provided in the lecture detail for today’s lecture, I just added a link to an R notebook that delves a bit more into how Bayesian inference is typically implemented and that illustrates a bit better how the likelihood, prior, and posterior are related to one another. We’ll spend some time exploring these ideas at the start of today’s lecture.
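As a small illustration of how the three pieces fit together, here is a Python sketch of a grid approximation for an allele frequency. It is not the R notebook mentioned above. The function name `grid_posterior`, the binomial likelihood, and the flat default prior are all assumptions made for this example.

```python
import math


def grid_posterior(k, n, grid_size=101, prior=None):
    """Posterior for an allele frequency p on an evenly spaced grid.

    k copies of the allele are observed in a sample of n gene copies.
    posterior is proportional to likelihood times prior, then normalized
    so that it sums to 1 over the grid.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    if prior is None:
        prior = [1.0] * grid_size  # flat prior (an assumption)
    likelihood = [
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for p in grid
    ]
    unnormalized = [like * pr for like, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    return grid, posterior
```

With a flat prior, the posterior peaks at the sample frequency k/n. Passing a different `prior` list (one weight per grid point) shows how the peak shifts as the prior becomes more informative.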