R is an open source statistical package with versions available for Linux, Mac OS X, and Windows. In addition to being available without charge, it is very powerful and very flexible. It's been my statistical package of choice for the past 4-5 years. Running a χ² test or Fisher's exact test, as you need to do for the first question in Problem #1, is very straightforward once you get the data in. In the example that follows, we'll imagine that you've determined the number of A1 and A2 alleles contributed by fathers to offspring of each maternal genotype as follows:
Let me show you the code first. Then I'll explain it.
```
> alleles <- matrix(c(12, 4, 15, 17, 25, 4), nr=3,
+   dimnames=list(c("A1A1", "A1A2", "A2A2"), c("A1", "A2")))
> alleles
     A1 A2
A1A1 12 17
A1A2  4 25
A2A2 15  4
> chisq.test(alleles)

	Pearson's Chi-squared test

data:  alleles
X-squared = 20.2851, df = 2, p-value = 3.937e-05

> fisher.test(alleles)

	Fisher's Exact Test for Count Data

data:  alleles
p-value = 2.957e-05
alternative hypothesis: two.sided
```
The first thing to know about R is that it's command-line oriented. If you've ever used Linux, the DOS box in Windows, or the terminal in Mac OS X, you'll be familiar with the idea of a "prompt". That's the > that you see at the start of some lines. It indicates that R is waiting for you to type something. Sometimes you'll have a long command to type, as I do on the first line. If you haven't finished your command yet when you hit return, you'll get a different prompt, the + that you see on the second line. This reminds you that you're in the middle of typing a command. So what does that first line do?
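It creates a matrix with three rows and two columns and stores it under the name alleles (<- is R's assignment operator). The c() function combines the six counts into a single vector, and matrix() fills the matrix from that vector column by column, so the first three numbers become the first column and the last three become the second; nr=3 (short for nrow=3) says the matrix has three rows. The dimnames argument attaches labels: the first vector in the list names the rows (the maternal genotypes) and the second names the columns (the paternal alleles). Typing alleles by itself at the prompt prints the matrix. If you left off dimnames and typed just

```
> alleles <- matrix(c(12, 4, 15, 17, 25, 4), nr=3)
```

you'd get a table that looks like this: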
```
> alleles
     [,1] [,2]
[1,]   12   17
[2,]    4   25
[3,]   15    4
```

As you can see, including dimnames makes it easier to check that the data you've entered match what you actually wanted to enter.
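Because matrix() fills column by column by default, the counts have to be typed as "all the A1 counts, then all the A2 counts." A minimal sketch of two equivalent ways to build the same table, in case typing one maternal genotype per row feels more natural: byrow = TRUE switches to row-by-row filling, and dimnames() can attach the labels after the matrix already exists. The names by_col and by_row are illustrative only.

```r
# Column-wise fill (R's default): the first three numbers form the A1 column
by_col <- matrix(c(12, 4, 15, 17, 25, 4), nrow = 3)

# Row-wise fill: each maternal genotype's A1 and A2 counts typed together
by_row <- matrix(c(12, 17,
                    4, 25,
                   15,  4), nrow = 3, byrow = TRUE)

# Labels can also be attached after the matrix is created
dimnames(by_row) <- list(c("A1A1", "A1A2", "A2A2"), c("A1", "A2"))

print(by_row)

# Compare values only (by_col has no labels); should print TRUE
print(all(by_col == by_row))
```

Either form works; the explicit dimnames are what make it easy to spot a mis-typed count.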
You can run a χ² test by typing chisq.test(alleles):

```
	Pearson's Chi-squared test

data:  alleles
X-squared = 20.2851, df = 2, p-value = 3.937e-05
```

You can run a Fisher's exact test by typing fisher.test(alleles):
```
	Fisher's Exact Test for Count Data

data:  alleles
p-value = 2.957e-05
alternative hypothesis: two.sided
```

In most cases the P-values won't differ much, but if they do, trust the exact test. Since it's just as easy to run a Fisher's exact test in R as a χ² test, I prefer the Fisher's exact test. The χ² test is based on an approximation and was useful before the advent of powerful desktop computers, but the only reason to use it now is if you don't have a computer handy and need to do the calculations by hand.
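To see what "by hand" means, here is a sketch that reproduces Pearson's χ² statistic step by step: expected counts under independence, the sum of (observed − expected)²/expected, the degrees of freedom, and the upper-tail probability from the χ² distribution, which is the approximation. The variable names are illustrative. Note that chisq.test() applies Yates' continuity correction only to 2 × 2 tables, so for this 3 × 2 table the hand calculation should agree with its output.

```r
alleles <- matrix(c(12, 4, 15, 17, 25, 4), nrow = 3,
                  dimnames = list(c("A1A1", "A1A2", "A2A2"), c("A1", "A2")))

# Expected count in each cell: (row total * column total) / grand total
expected <- outer(rowSums(alleles), colSums(alleles)) / sum(alleles)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((alleles - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof <- (nrow(alleles) - 1) * (ncol(alleles) - 1)

# P-value from the chi-squared approximation (upper tail)
p <- pchisq(x2, df = dof, lower.tail = FALSE)

print(round(expected, 3))
cat("X-squared =", round(x2, 4), " df =", dof, " p-value =", signif(p, 4), "\n")
```

The smallest expected count here is about 7.6; when expected counts get small, the approximation becomes less reliable, which is where the exact test earns its keep.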
In this case the χ² test gives us a P-value of 3.937 × 10⁻⁵ and Fisher's exact test gives us a P-value of 2.957 × 10⁻⁵. So regardless of which test we choose, we have very strong evidence that there are significant differences in the proportion of sperm fertilizing eggs of the three different maternal genotypes.
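If you'd rather pull the P-values out than read them off the printed report, both functions return an object whose p.value component holds the number. A minimal sketch comparing the two side by side; the 0.05 cutoff is an assumed conventional threshold for illustration, not something the problem specifies.

```r
alleles <- matrix(c(12, 4, 15, 17, 25, 4), nrow = 3,
                  dimnames = list(c("A1A1", "A1A2", "A2A2"), c("A1", "A2")))

# Each test returns an "htest" object; p.value is one of its components
p_chisq  <- chisq.test(alleles)$p.value
p_fisher <- fisher.test(alleles)$p.value

p_values <- c(chisq = p_chisq, fisher = p_fisher)

# Four significant digits, matching the printed test reports
print(signif(p_values, 4))

# Assumed significance threshold, for illustration only
alpha <- 0.05
print(p_values < alpha)
```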