A better approach to citation analysis

Many people use the impact factor of a journal as a quick and dirty assessment of its quality, in spite of its many limitations.1 Some administrators go further and use the impact factor of the journals in which an author's work appears as a quick and dirty assessment of the quality of that author's work. A better index would probably be how often that work is cited, but sometimes work becomes so broadly influential and so taken for granted that it's simply common knowledge and no longer cited. Who cites Darwin's On the Origin of Species (1859) when referring to the theory of evolution by natural selection, for example? Still, counting citations gives us some indication of the influence an individual has had on their field, but it's far from perfect.

It is a slightly more robust measure, but it is still silly because 90% of citations are shallow: most authors haven't even read the paper they are citing. We tend to cite famous authors and famous venues in the hope that some of the prestige will get reflected. (Daniel Lemire)
Unlike me, Daniel Lemire doesn't just point out the inadequacy of citation counting. He proposes to do something about it.

We have the technology to measure the usage made of a cited paper. Some citations are more significant: for example it can be an extension of the cited paper. Machine learning techniques can measure the impact of your papers based on how much following papers build on your results.
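To make the difference between counting citations and weighting them concrete, here is a toy sketch in Python. It is not Lemire's method, which the post doesn't describe. It simply assumes that each citation already carries a hypothetical `weight` between 0 and 1 saying how much the citing paper builds on the cited one (the kind of score a trained classifier might produce). The paper names and weights are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing: str    # paper that makes the citation
    cited: str     # paper being cited
    weight: float  # hypothetical significance in [0, 1], e.g. from a classifier


def raw_counts(citations):
    """Classic citation count: every citation counts as 1."""
    counts = defaultdict(int)
    for c in citations:
        counts[c.cited] += 1
    return dict(counts)


def weighted_scores(citations):
    """Sum of significance weights: shallow citations contribute little."""
    scores = defaultdict(float)
    for c in citations:
        scores[c.cited] += c.weight
    return dict(scores)


def rank(scores):
    """Paper names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up data: P1 extends paper A; P2-P4 mention paper B only in passing.
citations = [
    Citation("P1", "A", 0.9),
    Citation("P2", "B", 0.1),
    Citation("P3", "B", 0.1),
    Citation("P4", "B", 0.1),
]

print(rank(raw_counts(citations)))       # ['B', 'A']
print(rank(weighted_scores(citations)))  # ['A', 'B']
```

Under raw counts, paper B looks three times as influential as A; once shallow citations are discounted, the single paper that actually builds on A puts it ahead. The hard part, which this sketch assumes away, is producing trustworthy weights from the text of the citing papers.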
He's starting a project to develop such an approach, and if you've published one or more scientific papers, he needs your help. He's asking authors to fill out a short form on his site; the responses will give him and his collaborators the data they need to start building text-analysis tools that can automatically identify which papers have the largest influence on how a field develops. Please head over and help him out.

In case you want to see the link before you click on it, here it is:


1 The Wikipedia entry on impact factors has a good summary of the major criticisms, centering on the validity of the scores, editorial policies that can affect them, ways in which they can be manipulated, and ways in which they may be misused.