`R` is an open source statistical package with versions available
for Linux, Mac OS X, and Windows. In addition to being available
without charge, it is *very* powerful and very flexible. It's
been my statistical package of choice for the past 4-5 years. Running
a chi-squared test or Fisher's exact test, as you need to do for the
first question in Problem #1, is very straightforward - once you
get the data in. In the example that follows, we'll imagine that
you've determined the number of `A1` and `A2` alleles contributed by
fathers to offspring of each maternal genotype as follows:

| Maternal genotype | Paternal gamete `A1` | Paternal gamete `A2` |
|---|---|---|
| `A1A1` | 12 | 17 |
| `A1A2` | 4 | 25 |
| `A2A2` | 15 | 4 |

Let me show you the code first. Then I'll explain it.

    > alleles <- matrix(c(12, 4, 15, 17, 25, 4), nr=3,
    + dimnames=list(c("A1A1", "A1A2", "A2A2"), c("A1", "A2")))
    > alleles
         A1 A2
    A1A1 12 17
    A1A2  4 25
    A2A2 15  4
    > chisq.test(alleles)
    Pearson's Chi-squared test
    data:  alleles
    X-squared = 20.2851, df = 2, p-value = 3.937e-05
    > fisher.test(alleles)
    Fisher's Exact Test for Count Data
    data:  alleles
    p-value = 2.957e-05
    alternative hypothesis: two.sided

The first thing to know about `R` is that it's command-line
oriented. If you've ever used Linux, the DOS box in Windows, or the
terminal in Mac OS X, you'll be familiar with the idea of a
"prompt". That's the `>` that you see at the start of some lines. It
indicates that `R` is waiting for you to type something. Sometimes
you'll have a long command to type, as I do on the first line. If you
haven't finished your command yet when you hit return, you'll get a
different prompt, the `+` that you see on the second line. This
reminds you that you're in the middle of typing a command. So what
does that first line do?

- `alleles`: This is the name of an object that the first command creates in which to store the data. I do this so that I can easily refer to the data later. You can give the object pretty much any name you want, as long as it doesn't start with a number. You could call it `Fred`, if you wanted to, or `x`, if you don't want to type so much.
- `<-`: This is the "operator" that assigns the result of the command we put on the right side to the object, `alleles`, on the left side.
- `matrix`: This tells `R` that we're going to construct a matrix with our data. The stuff in between the pair of parentheses that start on this line and end on the next line tells `R` how to construct the matrix.
- `c(12, 4, 15, 17, 25, 4),`: As you can probably guess, this is the data. The `c()` constructs a column of data (each element separated by a comma). Notice that we list the data by going down the first column to the bottom and beginning again at the top of the second column. The comma at the end of this parenthesis lets `R` know that the next "argument" is coming.
- `nr=3,`: This argument tells `R` that there are three rows in our data. This means that the column of data you just created has to have a multiple of three elements in it. In our case we have six, so we're fine. The comma tells `R` that the next argument is coming.^{1} Notice that I hit the return key after the comma, so we now go to the next line, which has a prompt of `+` because the command isn't finished. Instead of the comma, you could put a parenthesis here, `)`. When you hit return, you'd get a `>` instead of a `+`, indicating that the command was done.
- This second line is optional, but I rather like it. It allows me to check to make sure I've entered the data the way that I thought I did. `dimnames` means that I'm going to give names to the rows and columns. There's a `list` with two elements, a column of maternal genotypes, `c("A1A1", "A1A2", "A2A2")`, and a column of paternal alleles, `c("A1", "A2")`. There are three parentheses at the end of the line to balance the three that preceded it.^{2} When you hit return you get the primary prompt, `>`.
- Now I just type `alleles` at the prompt and I see the table of data displayed before me. If you'd left off the `dimnames`, i.e., if you'd entered only

      > alleles <- matrix(c(12, 4, 15, 17, 25, 4), nr=3)

  you'd get a table that looks like this

      > alleles
           [,1] [,2]
      [1,]   12   17
      [2,]    4   25
      [3,]   15    4

  As you can see, including `dimnames` makes it easier to see that the data you've entered matches what you actually wanted to enter.
- That's the hard part. To run a chi-squared test just type `chisq.test(alleles)` and hit return. You'll get the following result.

      Pearson's Chi-squared test
      data:  alleles
      X-squared = 20.2851, df = 2, p-value = 3.937e-05

You can run a Fisher's exact test by typing `fisher.test(alleles)`.

    Fisher's Exact Test for Count Data
    data:  alleles
    p-value = 2.957e-05
    alternative hypothesis: two.sided
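If you'd rather not read the p-values off the printout, both functions return an object that you can store and pick apart. What follows is only a minimal sketch and not part of the session shown earlier: the names `chisq_result`, `fisher_result`, `chisq_stat`, `chisq_df`, `chisq_p`, and `fisher_p` are hypothetical, chosen for illustration, and `nrow` is written out in full where the session abbreviates it to `nr`.

```r
# Rebuild the table from the text so that this block runs on its own.
alleles <- matrix(c(12, 4, 15, 17, 25, 4), nrow = 3,
                  dimnames = list(c("A1A1", "A1A2", "A2A2"), c("A1", "A2")))

# Store the results instead of just printing them.
chisq_result  <- chisq.test(alleles)
fisher_result <- fisher.test(alleles)

# Each result is a list-like object; pull out the pieces by name.
chisq_stat <- unname(chisq_result$statistic)  # the X-squared value
chisq_df   <- unname(chisq_result$parameter)  # the degrees of freedom
chisq_p    <- chisq_result$p.value
fisher_p   <- fisher_result$p.value

# Show the two p-values side by side, rounded to four significant digits.
signif(c(chisq = chisq_p, fisher = fisher_p), 4)
```

Typing `chisq_result` or `fisher_result` by itself at the prompt prints the same summary you get from calling the test directly.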

In most cases the p-values won't be too different, but if they are different, the exact test is right. Since it's just as easy to run a Fisher's exact test in `R` as a chi-squared test, I prefer the Fisher's exact test. The chi-squared test is based on an approximation and was useful before the advent of powerful desktop computers, but the only reason to use it now is if you don't have a computer handy and need to do the calculations by hand.

In this case the chi-squared test gives us a p-value of 3.937e-05 and Fisher's exact test gives us a p-value of 2.957e-05. So regardless of which test we choose, we have very strong evidence that there are significant differences in the proportion of sperm fertilizing eggs of the three different maternal genotypes.
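To make the "calculations by hand" concrete, here is a sketch of the standard Pearson chi-squared calculation applied to the same table, with `R` doing only the arithmetic. This is the usual textbook formula and my assumption about what the hand calculation would look like, not something taken from the session above; the names `expected`, `x_squared`, `deg_free`, and `p_value` are hypothetical.

```r
# Rebuild the table from the text so that this block runs on its own.
alleles <- matrix(c(12, 4, 15, 17, 25, 4), nrow = 3,
                  dimnames = list(c("A1A1", "A1A2", "A2A2"), c("A1", "A2")))

# Expected count in each cell if the paternal allele is independent of
# the maternal genotype: (row total * column total) / grand total.
expected <- outer(rowSums(alleles), colSums(alleles)) / sum(alleles)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
x_squared <- sum((alleles - expected)^2 / expected)

# Degrees of freedom for a table with r rows and c columns: (r - 1) * (c - 1).
deg_free <- (nrow(alleles) - 1) * (ncol(alleles) - 1)

# The approximation: look up the upper tail of the chi-squared distribution.
p_value <- pchisq(x_squared, df = deg_free, lower.tail = FALSE)

# Check the hand calculation against what chisq.test() computes.
all.equal(x_squared, unname(chisq.test(alleles)$statistic))
```

The values of `x_squared`, `deg_free`, and `p_value` should agree with the `X-squared`, `df`, and `p-value` that `chisq.test(alleles)` reports. A common rule of thumb, which is not from these notes, is to be wary of the approximation when any expected count falls below about 5; here the smallest entry of `expected` is about 7.6.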