Finding patterns

Scientists understand the world first by noticing regularities and second by explaining them. Linnaeus noticed that living things could be grouped into hierarchical categories more than a century before Darwin provided the explanation. Mendeleev noticed that elements could be grouped into a periodic table decades before the theory of atomic structure explained why.

One of the challenges we always face is figuring out what patterns make sense. Take a bunch of nucleotide sequence data from a bunch of different nuclear genes within a large population of an outcrossing species, throw it into your favorite phylogeny program, and you'll get out a tree, one tree[1] -- even if each gene has a different evolutionary history.[2] Or take a bunch of data from a single gene and a bunch of different populations and throw it into the same programs, and you'll get out a tree -- even if the populations show a linear cline or isolation by distance.

Wouldn't it be great if you could throw your data into a program and have it figure out whether a tree is the best way to structure your data or if some linear order or a dominance hierarchy or something else made more sense? Well, hang on to your hats. There's a recent paper suggesting that it might just be possible.
Figure: Structures discovered in five sample datasets: (a) animals, (b) justices' decisions from the U.S. Supreme Court, (c) colors, (d) human faces, and (e) the geographical position of world cities. (from Kemp and Tenenbaum 2008)
Charles Kemp and Joshua Tenenbaum describe a Bayesian approach in which the structure of the underlying relationships is itself a part of the model being estimated. As the figure shows, using their approach they were able to infer that the relationships among animals are best represented by a tree, while those among justices' decisions on the U.S. Supreme Court are best represented along a spectrum from right to left.
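To make the idea of treating structure as part of the model a little more concrete, here is a minimal Python sketch. It is not the code distributed by Kemp and Tenenbaum, and it leaves out most of their method: there is no prior over structures, no latent cluster nodes, no search over graphs, and no estimation of edge lengths. It only scores three hand-picked candidate forms (a chain, a ring, and a star) by how well features that vary smoothly over each graph explain the data, using a Gaussian whose precision matrix is the graph Laplacian plus a ridge term. The function names, the unit edge weights, and `sigma = 1` are all assumptions made for illustration.

```python
import math

def laplacian_precision(n, edges, sigma=1.0):
    """Precision matrix: graph Laplacian (unit edge weights) plus I / sigma^2."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1.0 / sigma ** 2
    for i, j in edges:
        P[i][i] += 1.0
        P[j][j] += 1.0
        P[i][j] -= 1.0
        P[j][i] -= 1.0
    return P

def log_det_spd(P):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
                total += 2.0 * math.log(L[i][i])
            else:
                L[i][j] = s / L[j][j]
    return total

def log_likelihood(features, n, edges, sigma=1.0):
    """Sum of zero-mean Gaussian log-densities, one per feature vector."""
    P = laplacian_precision(n, edges, sigma)
    logdet = log_det_spd(P)
    ll = 0.0
    for f in features:
        quad = sum(f[i] * P[i][j] * f[j] for i in range(n) for j in range(n))
        ll += 0.5 * logdet - 0.5 * quad - 0.5 * n * math.log(2.0 * math.pi)
    return ll

# Hypothetical candidate forms over five entities (indices 0..4).
FORMS = {
    "chain": [(0, 1), (1, 2), (2, 3), (3, 4)],
    "ring": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)],
    "star": [(2, 0), (2, 1), (2, 3), (2, 4)],  # entity 2 is the hub
}

def best_form(features, n=5, sigma=1.0):
    """Return the highest-scoring form and the score of every form."""
    scores = {name: log_likelihood(features, n, edges, sigma)
              for name, edges in FORMS.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Made-up features that change gradually from entity 0 to entity 4.
    cline = [[-2, -1, 0, 1, 2], [1, 1, 0, -1, -1]]
    name, scores = best_form(cline)
    print(name, scores)
```

With cline-like features the chain scores highest, which is the behavior you would want from data showing a linear cline. The full method goes further by weighing a prior over forms and integrating over parameters, so the comparison here should be read as a toy illustration of the scoring step only.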

I'm no expert on the Supreme Court, but the placement of the various justices on that spectrum looks pretty reasonable to me. The animal tree has a problem, since it puts birds closer to mammals than to alligators and iguanas. But alligators and iguanas are the only tetrapods in the tree other than birds and mammals, and there are only two fish. So the problem may be with the long branches in the middle of the tree rather than a fundamental problem with the approach. Moreover, the data used to generate the tree consisted of 106 binary characters like "perceptual features (is black), anatomical features (has feet), ecological features (lives in the ocean), and behavioral features (makes loud noises)." Given that none of those characters are good homologies, it's surprising how well the result turns out.

All of the datasets and code for running the model are available from http://charleskemp.com. I can see myself spending some time playing around with this in the not too distant future. The results look very promising.

[1] Even if you throw it into MrBayes, chances are you'll focus on the majority-rule consensus tree as your one best guess for the evolutionary history.
[2] In a large population of an outcrossing species, different (unlinked) genes will have different evolutionary histories because the process of drift happens independently at each locus, leading to different coalescent histories.

Holyoak, K.J. (2008). Induction as model selection. Proceedings of the National Academy of Sciences, 105(31), 10637-10638. DOI: 10.1073/pnas.0805910105
Kemp, C., Tenenbaum, J.B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31), 10687-10692. DOI: 10.1073/pnas.0802631105
