Search this keyword

Gene Wiki and Google

Andrew Su has posted an analysis of Gene Wiki, a project to provide Wikipedia pages on every human gene:
Here's the take home message: in terms of online gene annotation resources, Gene Cards is the most common top-ranked resource, followed closely by the Gene Wiki / Wikipedia, with NCBI in a very distant third (note the log scale).
top_sites.png

This result is interesting in that an existing resource (Gene Cards) beats Wikipedia, but only just. There are various ways we could interpret this, but from the point of view of biodiversity resources I suspect it emphasises that if there is a good, existing resource that has a lot of traction (i.e., Gene Cards) it will do well in Google Searches. If there is no single dominant resource (as is the case for biodiversity), then it leaves the field open to be dominated by Wikipedia.

Visualising edit history of a Wikipedia page

Quick post (really should be doing something else). Reading Jeff Atwood's post Mixing Oil and Water: Authorship in a Wiki World lead me to IBM's wonderful history flow tool to visualise the edit history of a Wikipedia page.

Imagine a scenario where three people will make contributions to a Wiki page at different points in time. Each person edits the page and then saves their changes to what becomes the latest version of that page.

history-flow-animation.gif

History Flow connects text that has been kept the same between consecutive versions. Pieces of text that do not have correspondence in the next (or previous) version are not connected and the user sees a resulting "gap" in the visualization; this happens for deletions and insertions. (animated GIF from Jeff Atwood's post).


There's a nice paper describing history flow (doi:10.1145/985692.985765, free PDF here). Inspired by this I decided to try and implement history flow in PHP and SVG. Here's a preliminary result:

afrotheria.png

This is the edit history for the Afrotheria page. Click on the image above (or here to see the SVG image -- you need a decent web browser for this, IE uses will need a SVG plugin).

The SVG image is clickable. The columns represent revisions, click on those to go to that revision. The columns are evenly spaced (i.e., the gaps don't correspond to time). The bands between revisions trace individual blocks of text (in this case lines in the Wikipedia page source). If you click on a band you get taken to that Wikipedia user's page.

This is all done in a rush, but it gives an idea of what can be done. The history flow carries all sorts of information about how an article has developed over time, major changes (such as the introduction of Taxoboxes), and makes the content of a page traceable, in the sense that you can see who contributed what to a page.

Google and Wikipedia revisited

Given that one response to my post on Fungi in Wikipedia was to say that fungi are also charismatic, so maybe I should try [insert unsexy taxon name here]. So, I've now looked at all the species I extracted from Wikipedia (nearly 72,000), ran the Google searches, and here are the results:

SiteHow many times is it the top hit?
en.wikipedia.org42515
www.birdlife.org2125
commons.wikimedia.org1522
plants.usda.gov1496
species.wikimedia.org1487
animaldiversity.ummz.umich.edu1419
amphibiaweb.org851
www.calflora.org770
www.fishbase.org727
ibc.lynxeds.com699
davesgarden.com659
www.arkive.org510
ukmoths.org.uk414
zipcodezoo.com368
www.itis.gov304
calphotos.berkeley.edu294
www.floridata.com234
www.planetcatfish.com234
www.eol.org226
www.arthurgrosset.com213


The table lists the top twenty sites, based on the number of times each site occupies the number one place in the Google search results. Surprise, surprise, Wikipedia wins hands down.

What is interesting is that the other top-ranking sites tend to be taxon-specific, such as FishBase, Amphibia Web, and USDA Plants. To me this suggests that the argument that Wikipedia's dominance of the search results is because it focusses on charismatic taxa doesn't hold. In fact, the truly charismatic taxa are likely to have their own, richly informative webs sites that will often beat Wikipedia in the search rankings. If your taxon is not charismatic, then it's a different story. This suggests one of two strategies for making taxon web sites that people will find. Either go for the niche market, and make a rich site for a set of taxa that you (and ideally some others) like, or add content to Wikipedia. Sites that span across all taxa will always come up against Wikipedia's dominance in the search rankings. So, it's a choice of being a specialist, or trying to compete with an über-generalist.