Visualising edit history of a Wikipedia page

Quick post (really should be doing something else). Reading Jeff Atwood's post Mixing Oil and Water: Authorship in a Wiki World led me to IBM's wonderful history flow tool for visualising the edit history of a Wikipedia page.

Imagine a scenario where three people will make contributions to a Wiki page at different points in time. Each person edits the page and then saves their changes to what becomes the latest version of that page.

history-flow-animation.gif

History Flow connects text that has been kept the same between consecutive versions. Pieces of text that do not have correspondence in the next (or previous) version are not connected and the user sees a resulting "gap" in the visualization; this happens for deletions and insertions. (animated GIF from Jeff Atwood's post).


There's a nice paper describing history flow (doi:10.1145/985692.985765, free PDF here). Inspired by this I decided to try and implement history flow in PHP and SVG. Here's a preliminary result:

afrotheria.png

This is the edit history for the Afrotheria page. Click on the image above, or here, to see the SVG image (you need a decent web browser for this; IE users will need an SVG plugin).

The SVG image is clickable. The columns represent revisions; click on a column to go to that revision. The columns are evenly spaced (i.e., the gaps don't correspond to time). The bands between revisions trace individual blocks of text (in this case lines in the Wikipedia page source). If you click on a band you get taken to the page of the Wikipedia user who contributed that text.
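To give a feel for how the bands might be computed, here is a minimal sketch in Python. It is not the PHP code behind the image: it assumes each revision is simply a list of source lines, and it uses the standard library's difflib to find runs of lines that survive from one revision to the next. The function names, sample text, and user names are all made up for illustration.

```python
from difflib import SequenceMatcher


def line_bands(old_lines, new_lines):
    """Return (old_start, new_start, length) for each run of lines kept unchanged."""
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # The final block returned by difflib is a zero-length sentinel, so skip it.
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size > 0]


def attribute(revisions, authors):
    """Say who first contributed each line of the latest revision.

    revisions: list of revisions, each a list of lines (assumed input format).
    authors:   the user who saved each revision, in the same order.
    """
    owners = [authors[0]] * len(revisions[0])
    for i in range(1, len(revisions)):
        # Lines are credited to the current editor unless a band links them
        # back to a line in the previous revision.
        new_owners = [authors[i]] * len(revisions[i])
        for old_start, new_start, size in line_bands(revisions[i - 1], revisions[i]):
            new_owners[new_start:new_start + size] = owners[old_start:old_start + size]
        owners = new_owners
    return owners


# Three hypothetical revisions by three hypothetical users.
revisions = [
    ["Afrotheria is a clade.", "It includes elephants."],
    ["Afrotheria is a clade.", "{{Taxobox}}", "It includes elephants."],
    ["{{Taxobox}}", "It includes elephants."],
]
authors = ["alice", "bob", "carol"]

bands = [line_bands(a, b) for a, b in zip(revisions, revisions[1:])]
owners = attribute(revisions, authors)
```

Each tuple in bands corresponds to one band drawn between two neighbouring columns, and lines with no tuple are the "gaps" (insertions and deletions). Matching on whole lines is the crudest possible choice; a moved or lightly edited line counts as a deletion plus an insertion.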

This is all done in a rush, but it gives an idea of what can be done. The history flow carries all sorts of information about how an article has developed over time, including major changes (such as the introduction of Taxoboxes), and makes the content of a page traceable, in the sense that you can see who contributed what to a page.

Google and Wikipedia revisited

One response to my post on Fungi in Wikipedia was to say that fungi are also charismatic, so maybe I should try [insert unsexy taxon name here]. So, I've now taken all the species I extracted from Wikipedia (nearly 72,000), run the Google searches, and here are the results:

Site                              How many times is it the top hit?
en.wikipedia.org                  42515
www.birdlife.org                  2125
commons.wikimedia.org             1522
plants.usda.gov                   1496
species.wikimedia.org             1487
animaldiversity.ummz.umich.edu    1419
amphibiaweb.org                   851
www.calflora.org                  770
www.fishbase.org                  727
ibc.lynxeds.com                   699
davesgarden.com                   659
www.arkive.org                    510
ukmoths.org.uk                    414
zipcodezoo.com                    368
www.itis.gov                      304
calphotos.berkeley.edu            294
www.floridata.com                 234
www.planetcatfish.com             234
www.eol.org                       226
www.arthurgrosset.com             213


The table lists the top twenty sites, based on the number of times each site occupies the number one place in the Google search results. Surprise, surprise, Wikipedia wins hands down.
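For anyone wanting to build a similar table, the tally itself is simple. Here is a rough sketch in Python, assuming you already have the URL of the first Google result for each species name (how you obtain those is a separate problem, and not shown). The function name and the sample URLs are invented for illustration and are not real search results.

```python
from collections import Counter
from urllib.parse import urlparse


def top_hit_counts(top_hits):
    """Count how often each site is the number one hit.

    top_hits: dict mapping species name -> URL of the first search result
              (an assumed input format).
    Returns a list of (host, count) pairs, most frequent first.
    """
    hosts = (urlparse(url).netloc for url in top_hits.values())
    return Counter(hosts).most_common()


# Made-up example data: three species and the site that ranked first for each.
top_hits = {
    "Loxodonta africana": "http://en.wikipedia.org/wiki/African_Bush_Elephant",
    "Orycteropus afer": "http://en.wikipedia.org/wiki/Aardvark",
    "Danio rerio": "http://www.fishbase.org/summary/example",
}

counts = top_hit_counts(top_hits)
```

Grouping by host name is why en.wikipedia.org, commons.wikimedia.org, and species.wikimedia.org appear as separate rows in a table like the one above.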

What is interesting is that the other top-ranking sites tend to be taxon-specific, such as FishBase, AmphibiaWeb, and USDA Plants. To me this suggests that the argument that Wikipedia dominates the search results because it focusses on charismatic taxa doesn't hold. In fact, the truly charismatic taxa are likely to have their own, richly informative web sites that will often beat Wikipedia in the search rankings. If your taxon is not charismatic, then it's a different story. This suggests one of two strategies for making taxon web sites that people will find. Either go for the niche market, and make a rich site for a set of taxa that you (and ideally some others) like, or add content to Wikipedia. Sites that span all taxa will always come up against Wikipedia's dominance in the search rankings. So, it's a choice of being a specialist, or trying to compete with an über-generalist.

Fungi in Wikipedia

One response to the analysis I did of the Google rank of mammal pages in Wikipedia is to suggest that Wikipedia does well for mammals because these are charismatic. It's been suggested that for other groups of taxa Wikipedia might not be so prominent in the search results.

As a quick test I extracted the 1552 fungal species I could find in Wikipedia and repeated the analysis. If anything, the results are more dramatic:
Untitled Image.png


Once again, Wikipedia dominates the search rankings. Over 75% of the pages are the top hit in Google. More specialist fungal sites, such as CAB Abstracts Plus and the American Phytopathological Society's online database, do pretty well. EOL and the nomenclatural database Index Fungorum barely make an appearance.

If fungi are less "charismatic" than mammals, the implication is that the less charismatic the taxon, the better Wikipedia does (perhaps there is less competition from other sites). Of course, Wikipedia is severely underpopulated with fungal pages, so one could argue that for fungi not in Wikipedia, sites like EOL may do better (relative to other sites), but that would need to be tested. I suspect that sites that provide more broadly useful information (such as APSnet) will continue to dominate the search rankings, followed by scientific articles (for the fungi in Wikipedia, the publishers Springer, Wiley, and Elsevier all appear among the top sites in the Google rankings).