
Mammal tree from Wikipedia

Following on from my previous post about visualising the mammalian classification in Wikipedia, I've extracted the largest component from the graph for all mammal taxa in Wikipedia, and it is a tree. This wasn't apparent in the previous diagram, where the component appeared as a big ball due to the layout algorithm used.
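For readers who want to try this on their own data, here is a minimal sketch of one way to pull out the largest component and test whether it is a tree. It is not the code used for this post; the function names and the edge-list input format (pairs of page titles) are assumptions made for illustration. It relies on the standard fact that a connected undirected graph is a tree exactly when it has one fewer edge than it has nodes.

```python
from collections import defaultdict, deque

def largest_component(edges):
    """Return the set of nodes in the largest connected component.

    `edges` is an iterable of (child, parent) pairs; direction is ignored.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in comp:
                    comp.add(nxt)
                    queue.append(nxt)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def is_tree(nodes, edges):
    """Check the edge count for a set of nodes already known to be connected.

    Duplicate edges are collapsed; a self-loop counts as an edge and so
    makes the test fail.
    """
    inside = {frozenset(e) for e in edges if e[0] in nodes and e[1] in nodes}
    return len(inside) == len(nodes) - 1
```

Calling `is_tree(largest_component(edges), edges)` returns `True` only if the largest component contains no cycles.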
tree.jpg


What this suggests is that Wikipedia contributors are quite capable of generating trees; it's just that not all the bits of the tree are connected (hence all the components in the previous post).

As Cyndy Parr suggested in her comments, it would be useful to compare the Wikipedia-derived tree with other trees, say from Mammal Species of the World or ITIS.

Visualising the Wikipedia classification of mammals

As part of my on-going experiments with Wikipedia as a repository of taxonomic information, I've extracted mammal pages from Wikipedia. There's a lot to be done with these, but the first thing I wanted to ask was whether the Wikipedia pages would form a tree (i.e., had the authors of these pages managed to ensure the pages formed a single, coherent taxonomic classification). The answer, as shown in the graph below, is no.
m.jpg


The graph contains 7750 nodes, each one representing a Wikipedia page with a Taxobox containing the class Mammalia. A node is connected to the node corresponding to its parent in the mammalian classification.

If the graph formed a single classification there would be just one component. Instead, it contains 841 distinct components, many of which you can see at the bottom. If you want to explore the graph, I've made an image map here using the wonderful graph editor yEd. You'll need to move the browser's scroll bars to see the graph. If you click on a node you'll be taken to the corresponding Wikipedia page.
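Counting components is straightforward once each page has been reduced to a (page, parent) pair. The sketch below uses a union-find structure; the `parent_of` dictionary and the function name are hypothetical, and it assumes a page with no parent is recorded as `None`. Note that a parent with no page of its own still becomes a node here, so the count may differ from one based on pages alone.

```python
def count_components(parent_of):
    """Count connected components in a page -> parent mapping."""
    root = {}

    def find(x):
        # Follow links to the representative, halving the path as we go.
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for child, parent in parent_of.items():
        find(child)  # register the page even if it has no parent
        if parent is not None:
            root[find(child)] = find(parent)  # merge the two groups

    return len({find(x) for x in list(root)})
```

A single, coherent classification would give a count of 1.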

Note: The graph has been laid out using yEd's organic layout command, so it won't look tree-like. The diagram is intended to test for connectedness only.

Some of these components may be due to errors in my parser, but many are due to inconsistencies in Wikipedia. Typical problems are Taxoboxes containing taxa for which there is no page in Wikipedia (these are visible as redlinks), or monotypic taxa where the pages for the genus and species are the same.
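Both kinds of problem can be flagged automatically. This is a rough sketch only, assuming the same hypothetical page -> parent dictionary (with `None` for pages that have no parent); a real check would also need to handle redirects and variant spellings of names.

```python
def find_problems(parent_of):
    """Return (missing_parents, self_parents) for a page -> parent mapping."""
    # Parents named in a Taxobox that have no page of their own (redlinks).
    missing = sorted(
        {p for p in parent_of.values() if p is not None and p not in parent_of}
    )
    # Pages listed as their own parent, as can happen when a monotypic
    # genus and its species share one page.
    self_parents = sorted(c for c, p in parent_of.items() if c == p)
    return missing, self_parents
```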

Of course, the joy of Wikipedia is that these problems can be easily fixed, but the trick is discovering the problems in the first place. There is a distinct lack of tools to enable Wikipedia editors to view the entire classification of interest and identify areas that need fixing (something Roger Hyam alluded to in his comment on an earlier posting). It would, of course, be great to be able to edit the graph shown above and have those changes automatically transmitted to Wikipedia.

Scientific citations in Wikipedia

wikipediaisaccuratecitationneeded.jpg
While thinking about measuring the quality of Wikipedia articles by counting the number of times they cite external literature, and conversely measuring the impact of papers by how many times they're cited in Wikipedia, I discovered, as usual, that somebody has already done it. I came across this nice paper by Finn Årup Nielsen (arXiv:0705.2106v1) (originally published in First Monday as an HTML document; I've embedded the PDF from arXiv below).

Nielsen retrieved 30,368 citations from Wikipedia, and summarised how many times each journal is cited within Wikipedia. He then compared this with a measure of citations within the scientific literature by multiplying the journal's impact factor by the total number of citations. In general there's a pretty good correlation.
1997-20088-1-PB.gif


What is striking to me is this observation from the paper:
When individual journals are examined Wikipedia citations to astronomy journals stand out compared to the overall trend (Figure 2). Also Australian botany journals received a considerable number of citations, e.g., Nuytsia (101 [citations]), in part due to concerted effort for the genus Banksia, where several Wikipedia articles for Banksia species have reached "featured article" status.


In the diagram, note also that Australian Systematic Botany (ISSN 1030-1887), which has an impact factor of 1.351, is punching well above its weight in Wikipedia. What I want to find out is whether this is true for other taxonomic journals. Nielsen's study was based on a Wikipedia dump from 2 April 2007, and a lot has been added since then (and the journal Zootaxa has become a major publisher of new taxonomic names).

But what I'm also wondering is whether this is not a great opportunity for the taxonomic community. By responding to {{citation needed}}, we can improve the quality of Wikipedia, and increase the visibility of our work. Given that many Wikipedia taxon pages are in the top 10 Google hits {{citation needed}}, our work is but one click away from the Google results page. Instead of endlessly moaning about the low impact factor of taxonomic journals, we can actively do something that increases the quality and visibility of taxonomic information, and by extension, taxonomy itself.
