Project #5 – Estimating heritability

I’ve posted the assignment, data, and R code for Project #5. Feel free to take a look at it. Kristen will show you how to use rstanarm (specifically stan_glmer) for the analysis. I know that you probably don’t want to learn another R package, but this is one that you are likely to find very useful for analysis of all kinds of data, not just genetic data.

Update on Project #4

It appears that strataG doesn’t work with the latest version of R (v3.5.3). I did, however, realize that it’s possible to do the analysis using pegas. Read your FASTA file in with the same command I showed in the notes, then run tajima.test(seqs) on the result (where seqs is the name of the object holding the DNA sequences). I’ll update the project assignment to show the new syntax later today.
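For anyone curious about what tajima.test() computes, here is a minimal Python sketch of Tajima's D itself. It assumes the sequences are already aligned, equal in length, and free of gaps or missing data. It is not pegas's implementation and computes no p-value. It only shows how the statistic compares the mean number of pairwise differences (pi) with the number of segregating sites (S) scaled by a1.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (gaps/missing data not handled)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences; the variance term vanishes for n < 4")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # S: number of segregating (polymorphic) sites
    S = sum(1 for j in range(length) if len({s[j] for s in seqs}) > 1)
    if S == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences across all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Tajima's (1989) constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

print(tajimas_d(["A", "A", "A", "T"]))  # single singleton site
print(tajimas_d(["A", "A", "T", "T"]))  # single site with a 2/2 split
```

With four sequences, a lone singleton site gives a negative D (an excess of rare variants relative to S), while a 2/2 split gives a positive D.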

Project #4 posted

In case you want to get an early start on Project #4, I’ve posted it to the lecture detail page for April 2nd. After you’ve installed strataG in R, you may encounter an error that prevents the library from loading. I had that problem with v3.5.3 on an iMac, but v3.5.1 works fine on my MacBook. I’ll try to sort out the problem before tomorrow, but if I can’t, Kristen and I will develop a workaround.

Updated R Shiny app illustrating F-statistics

I updated the R Shiny app that compares Nei’s Gst and Weir and Cockerham’s θ. This version allows you to set Fis, although Fis is set to the same value in every population. I’ll use it for a demonstration in class tomorrow, and Kristen may have you play with it a bit more during lab.

Speaking of lab, come prepared to install at least one new R package. It estimates Fst using Weir and Cockerham’s approach. If you ever collect your own data and need Fst estimates, this package is the most convenient tool to use.
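As a reminder of one of the two quantities the app compares, here is a minimal Python sketch of Nei's Gst for a single biallelic locus, computed as (Ht - Hs)/Ht from population allele frequencies with every population weighted equally. It works with population parameters only. Weir and Cockerham's θ additionally corrects for sample size and the number of populations sampled, which this sketch does not attempt.

```python
def nei_gst(freqs):
    """Nei's Gst at one biallelic locus; freqs holds one allele frequency per population."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Hs: mean expected heterozygosity within populations
    hs = sum(2 * p * (1 - p) for p in freqs) / k
    # Ht: expected heterozygosity if all populations were pooled
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        raise ValueError("locus is fixed in every population; Gst is undefined")
    return (ht - hs) / ht

print(nei_gst([0.2, 0.8]))       # strongly differentiated populations
print(nei_gst([0.3, 0.3, 0.3]))  # identical populations
```

Two populations with allele frequencies 0.2 and 0.8 give Gst = (0.5 - 0.32)/0.5 = 0.36 (up to floating-point rounding), and identical populations give 0.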

Project #1 is posted and there’s new binomial sampling code

I’ve just posted Project #1 for anyone who wants to get an early look. If you do take an early look, don’t be frightened. You’ll see some things that we haven’t covered yet. We will cover them in lecture tomorrow. I will also take time towards the end of lecture tomorrow to describe the project and what we’ll be looking for in the answers, so there’s no need to rush out and look at it now. And if you do, remember I warned you not to be frightened.

I also updated the R code associated with binomial.jags. The new version of binomial.R will display a histogram showing the posterior density corresponding to your sample. Kristen will walk you through some exercises with it tomorrow in lab.
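Here is a minimal Python sketch of what that histogram summarizes. It assumes a uniform Beta(1, 1) prior on the allele frequency, so check binomial.jags for the prior it actually uses. With k copies of an allele among n sampled alleles, the posterior is then Beta(k + 1, n - k + 1), which can be sampled directly rather than by MCMC. The counts k = 7 and n = 20 are hypothetical.

```python
import random

def binomial_posterior_draws(k, n, draws=10_000, seed=1):
    """Draw p | k ~ Beta(k + 1, n - k + 1), the posterior under a uniform Beta(1, 1) prior."""
    rng = random.Random(seed)
    return [rng.betavariate(k + 1, n - k + 1) for _ in range(draws)]

def text_histogram(values, bins=10):
    """Print a crude text histogram of values in [0, 1] and return the bin counts."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    for i, c in enumerate(counts):
        print(f"{i / bins:.1f}-{(i + 1) / bins:.1f} {'#' * (c * 60 // len(values))}")
    return counts

draws = binomial_posterior_draws(k=7, n=20)  # hypothetical sample
print("posterior mean:", sum(draws) / len(draws))
text_histogram(draws)
```

The posterior mean should land close to (k + 1)/(n + 2) = 8/22, about 0.36. Increasing n while keeping k/n fixed narrows the histogram, the same pattern you should see in binomial.R.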

Posterior distribution of allele frequencies in the ABO blood group system

In Thursday’s lecture there was a very good question about the posterior distribution of allele frequencies in the ABO blood group system. I promised to produce a histogram illustrating the posterior distribution. It took a bit longer than I had hoped, but I have results to share.

[Figure: Posterior distribution of allele frequencies (full sample)]

What you see above are histograms showing how often each of the three allele frequencies took on a particular value in one run of the MCMC sampler in JAGS. The sample size is quite large (862 A, 131 AB, 365 B, and 702 O), so the distributions are very narrow, meaning that we have a high degree of confidence in our estimates: 0.281 for a, 0.129 for b, and 0.589 for o. If we reduce the sample size to 29 A, 4 AB, 14 B, and 23 O, you’d expect the posterior distributions to be broader, and you’d be right. Here’s the result:

[Figure: Posterior distribution of allele frequencies (reduced sample)]
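The point estimates for the full sample can be cross-checked without MCMC. Below is a minimal Python sketch of the EM (gene-counting) algorithm for ABO allele frequencies, assuming Hardy-Weinberg proportions. It finds maximum-likelihood estimates rather than a posterior, so it is a different technique from the JAGS sampler. It is included only to show where numbers like 0.281, 0.129, and 0.589 come from. The key step is splitting the A and B phenotypes into their unobserved homozygous and heterozygous genotypes.

```python
def abo_em(nA, nAB, nB, nO, iters=500, tol=1e-12):
    """ML allele frequencies (p = A, q = B, r = O) from ABO phenotype counts via EM."""
    n_alleles = 2 * (nA + nAB + nB + nO)
    p = q = r = 1 / 3  # starting values
    for _ in range(iters):
        # E-step: expected AA among A phenotypes is p^2 / (p^2 + 2pr) = p / (p + 2r); same for BB
        nAA = nA * p / (p + 2 * r)
        nBB = nB * q / (q + 2 * r)
        nAO, nBO = nA - nAA, nB - nBB
        # M-step: each frequency is its expected allele count over the total number of alleles
        p_new = (2 * nAA + nAO + nAB) / n_alleles
        q_new = (2 * nBB + nBO + nAB) / n_alleles
        r_new = (nAO + nBO + 2 * nO) / n_alleles
        converged = max(abs(p_new - p), abs(q_new - q), abs(r_new - r)) < tol
        p, q, r = p_new, q_new, r_new
        if converged:
            break
    return p, q, r

p, q, r = abo_em(862, 131, 365, 702)
print(f"a: {p:.3f}  b: {q:.3f}  o: {r:.3f}")
```

Because the sample is large and the prior has little influence, these ML estimates should agree closely with the posterior estimates quoted above.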

I’ve updated multinomial.R to produce the histograms you see here after you run the code. You’ll need to install ggplot2 and bayesplot in order to produce the histograms. If you have trouble with that, Kristen can help you on Tuesday.

I encourage you to fiddle with the sample sizes specified in the code and see how the posterior distributions change. Also compare the posterior distributions with the 95% credible intervals reported in the printout.
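To see how posterior samples turn into histograms and 95% credible intervals, here is a minimal data-augmentation Gibbs sampler in Python for the reduced sample (29 A, 4 AB, 14 B, 23 O). It is a hand-rolled sketch, not the model in multinomial.R or what JAGS does internally. It assumes Hardy-Weinberg proportions and a flat Dirichlet(1, 1, 1) prior on the three allele frequencies. Each iteration imputes how many A and B individuals are homozygous, then draws new allele frequencies from the resulting Dirichlet posterior.

```python
import random

def binomial(rng, m, prob):
    """Stdlib-only binomial draw: count successes in m Bernoulli(prob) trials."""
    return sum(rng.random() < prob for _ in range(m))

def dirichlet(rng, alphas):
    """Dirichlet draw built from independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

def abo_gibbs(nA, nAB, nB, nO, iters=5000, burn_in=1000, seed=42):
    """Posterior samples of (p, q, r) = freq(A, B, O) under a flat Dirichlet prior."""
    rng = random.Random(seed)
    p, q, r = 1 / 3, 1 / 3, 1 / 3
    samples = []
    for it in range(iters):
        # Impute hidden genotypes: A phenotypes are AA or AO, B phenotypes are BB or BO
        nAA = binomial(rng, nA, p / (p + 2 * r))
        nBB = binomial(rng, nB, q / (q + 2 * r))
        nAO, nBO = nA - nAA, nB - nBB
        # Allele counts implied by the imputed genotypes
        cA = 2 * nAA + nAO + nAB
        cB = 2 * nBB + nBO + nAB
        cO = nAO + nBO + 2 * nO
        # Conjugate update given the complete data
        p, q, r = dirichlet(rng, [cA + 1, cB + 1, cO + 1])
        if it >= burn_in:
            samples.append((p, q, r))
    return samples

def credible_interval(values, level=0.95):
    """Equal-tailed credible interval from posterior samples."""
    vals = sorted(values)
    lo = vals[int((1 - level) / 2 * (len(vals) - 1))]
    hi = vals[int((1 + level) / 2 * (len(vals) - 1))]
    return lo, hi

samples = abo_gibbs(29, 4, 14, 23)
for i, name in enumerate(["a", "b", "o"]):
    vals = [s[i] for s in samples]
    lo, hi = credible_interval(vals)
    print(f"{name}: mean {sum(vals) / len(vals):.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Rerunning with the full sample should give much narrower intervals. It will take longer, though, because the binomial draws here are made one trial at a time.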

Course overview for EEB 5348 (Spring 2018) updated and finalized

I’ve updated the final pieces of the course website: the Course Overview and the Lab Schedule. When you look at the Lab Schedule page, you’ll notice that it has only the dates for the labs. That’s because from now on Kristen will be updating the Lab Schedule page. She and I will be determining the specifics of each lab assignment as we go along, based on the interests you express and the parts of the course where more hands-on experience will be of the greatest benefit. Kristen will do her best to have materials posted by noon on the Monday before each lab, if not sooner. If she doesn’t manage to pull that off, it will be my fault for not letting her know where we’re going sooner, not hers.

Notes on resemblance among relatives, evolution of quantitative traits, and association mapping

I just added three more sets of notes to the website:

  1. Resemblance among relatives – The mathematical underpinnings of how we use resemblance among relatives to estimate components of the genetic variance when we don’t know the underlying genes.
  2. Evolution of quantitative traits – The mathematics of how selection on phenotypes results in changes in allele frequency from one generation to the next that then result in a new set of phenotypes in the following generation. That’s R = h^2 S. There’s also a derivation of Fisher’s Fundamental Theorem of Natural Selection at one locus with two alleles.
  3. Association mapping – A very cursory introduction to the principles of association mapping, including some notes on 2-locus population genetics.
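The one-locus, two-allele case makes the pieces of R = h^2 S concrete. This minimal Python sketch assumes Hardy-Weinberg proportions and midpoint coding of genotypic values (A1A1 = +a, A1A2 = d, A2A2 = -a). That is a common convention, but it may not match the notation in the notes. It uses the standard results Va = 2pq[a + d(q - p)]^2 and Vd = (2pqd)^2, takes narrow-sense heritability as h^2 = Va/(Va + Vd + Ve), and predicts the response to a selection differential S. The parameter values in the usage lines are hypothetical.

```python
def variance_components(p, a, d, ve):
    """Additive, dominance, and phenotypic variance at one locus (HWE, midpoint coding)."""
    q = 1 - p
    alpha = a + d * (q - p)        # average effect of an allele substitution
    va = 2 * p * q * alpha ** 2    # additive genetic variance
    vd = (2 * p * q * d) ** 2      # dominance variance
    return va, vd, va + vd + ve    # phenotypic variance Vp = Va + Vd + Ve

def predicted_response(p, a, d, ve, S):
    """Breeder's equation R = h^2 S, with h^2 = Va / Vp."""
    va, vd, vp = variance_components(p, a, d, ve)
    return (va / vp) * S

va, vd, vp = variance_components(p=0.5, a=1.0, d=0.5, ve=1.0)  # hypothetical values
print(va, vd, vp)                                             # 0.5 0.0625 1.5625
print(predicted_response(0.5, 1.0, 0.5, 1.0, S=2.0))          # h^2 = 0.32, so R = 0.64
```

Only Va enters the numerator of h^2. That is why dominance variance contributes to phenotypic resemblance among some relatives but not to the response to selection.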

I expect to get the notes on genomic prediction written next weekend. Once I do, I’ll also be posting an updated one-volume version of the notes.

An R Shiny application illustrating resemblance between parents and offspring

In my continuing effort to develop R Shiny applications that illustrate principles of population genetics, I’ve just added one that illustrates the resemblance between parents and offspring. It’s based on a really simple model (one locus, two alleles, and the same environmental variance for all genotypes). You can view phenotype distributions and components of genetic variance (calculated from the underlying genotypic values and allele frequency), and you can simulate a parent-offspring regression with different sample sizes.
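Here is a minimal Python sketch of the kind of simulation the app runs, under the same simple model (one locus, two alleles, random mating, and the same environmental variance for every genotype). It is not the app's code. The function names and the midpoint coding of genotypic values (A1A1 = +a, A1A2 = d, A2A2 = -a) are assumptions. Because the parent-offspring covariance is Va/2, the regression of offspring on a single parent should have a slope near Va/(2Vp), which is h^2/2.

```python
import random

def genotypic_value(k, a, d):
    """k = number of A1 alleles (0, 1, 2); midpoint coding A2A2 = -a, A1A2 = d, A1A1 = +a."""
    return (-a, d, a)[k]

def simulate_parent_offspring(n_pairs, p, a, d, ve, seed=1):
    """Phenotypes of one parent and one offspring per family under random mating."""
    rng = random.Random(seed)
    sd_e = ve ** 0.5
    parents, offspring = [], []
    for _ in range(n_pairs):
        parent = [rng.random() < p for _ in range(2)]    # True = A1 allele; HWE genotype
        mate = [rng.random() < p for _ in range(2)]      # random mate from the same population
        child = [rng.choice(parent), rng.choice(mate)]   # one allele transmitted from each
        parents.append(genotypic_value(sum(parent), a, d) + rng.gauss(0, sd_e))
        offspring.append(genotypic_value(sum(child), a, d) + rng.gauss(0, sd_e))
    return parents, offspring

def regression_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical parameters: p = 0.5, a = 1, d = 0, Ve = 0.5 give Va = 0.5 and Vp = 1.0
x, y = simulate_parent_offspring(20_000, p=0.5, a=1.0, d=0.0, ve=0.5)
print("slope:", regression_slope(x, y))  # expected to be near Va / (2 Vp) = 0.25
```

Shrinking n_pairs, as the app lets you do, shows how much the estimated slope (and thus any heritability estimate based on it) bounces around with small samples.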