
NCBI Taxonomy IDs and Wikipedia

I've written a note on the Wikipedia Taxobox page making the case for adding NCBI taxonomy IDs to the standard Taxobox used to summarise information about a taxon. Here is what I wrote:

Wikipedia's taxon pages have a huge web presence (see my blog post "Google and Wikipedia revisited" and Page, R. D. M. (2010). "Wikipedia as an encyclopaedia of life". Nature Precedings hdl:10101/npre.2010.4242.1). If a taxon is in Wikipedia, its page is almost always the first search result in Google. Researchers in other areas of biology are using Wikipedia as a tool to annotate genes (Gene Wiki) and RNA families (Wikipedia:WikiProject_RNA). Pages for genes, such as Cytochrome_b, have numerous external identifiers in their equivalent of the Taxobox (the Pfam_box). I think we are missing a huge opportunity by not including NCBI taxonomy IDs. The advantages would be:

  • It would provide a valuable service to Wikipedia readers by enabling them to go to NCBI to discover more about a taxon

  • It would help Wikipedia contributors by providing a standardised way to refer to NCBI (and enable bots to add missing NCBI taxonomy ids). Putting them in an External links section makes it harder to be consistent (there are various ways to write a URL linking to the NCBI taxonomy)

  • It would facilitate linking from NCBI to Wikipedia. A mapping of Wikipedia pages to NCBI taxonomy ids could be added to NCBI Linkout, generating more traffic to the Wikipedia pages

  • Projects that are trying to integrate information from different sources would be able to combine genomic information from NCBI with other information much more readily

Note that I am not arguing that Wikipedia should "follow" NCBI taxonomy, merely that where the potential to link exists, the links would create value, both within and outside the Wikipedia community.
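To illustrate the consistency problem mentioned in the second point, here is a minimal Python sketch that pulls the taxonomy ID out of two common ways of writing an NCBI Taxonomy URL and re-emits one form. The function names are hypothetical, the two URL shapes handled are only the ones I am sure of, and the choice of "canonical" form is an assumption for illustration, not a Wikipedia or NCBI standard.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: we pick the Taxonomy Browser form as the single output style.
CANONICAL = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={taxid}"


def extract_taxid(url):
    """Return the NCBI taxonomy ID embedded in a URL, or None if not found."""
    parsed = urlparse(url)
    if not parsed.netloc.lower().endswith("ncbi.nlm.nih.gov"):
        return None
    # Form 1: .../wwwtax.cgi?id=9606 (possibly with extra parameters)
    ids = parse_qs(parsed.query).get("id")
    if ids and ids[0].isdigit():
        return int(ids[0])
    # Form 2: .../taxonomy/9606
    match = re.search(r"/taxonomy/(\d+)/?$", parsed.path, re.IGNORECASE)
    return int(match.group(1)) if match else None


def canonical_url(taxid):
    """Write a taxonomy ID as a URL in the single chosen form."""
    return CANONICAL.format(taxid=taxid)


variants = [
    "http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606",
    "https://www.ncbi.nlm.nih.gov/taxonomy/9606",
]
for url in variants:
    print(canonical_url(extract_taxid(url)))
```

Storing just the numeric ID in the Taxobox and generating the link from it would avoid this kind of clean-up altogether, which is also what would make a bot-maintained mapping practical.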

Some discussion has ensued on the Taxobox page, all positive. I'm blogging this here to encourage anyone who has any more thoughts on the matter to contribute to the discussion.

Viewing a BioStor reference in Cooliris

Cooliris is a web browser plugin that can display a large number of images as a moving "infinite" wall. It's Friday, so for fun I added a media RSS feed to BioStor to make the BHL page scans available to Cooliris. The result is easier to show than describe, so take a peek at the video I made of A review of the Centrolenid frogs of Ecuador, with descriptions of new species (http://biostor.org/reference/20844):

Cooliris view of BioStor from Roderic Page on Vimeo.


Cooliris is a little flaky under Snow Leopard, but still works (the plug-in is cross platform). It is also available for the iPhone (and I'm assuming the iPad), which means you can get the experience on a mobile device.
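For anyone curious what a media RSS feed of page scans might look like, here is a minimal Python sketch using only the standard library. It is not BioStor's actual code: the function name and the example.org image URLs are placeholders made up for illustration. The only fixed pieces are the Media RSS namespace and its `media:thumbnail` and `media:content` elements.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


def build_media_rss(title, link, pages):
    """Build an RSS 2.0 feed with Media RSS elements.

    pages is a list of (page_title, image_url, thumbnail_url) tuples.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for page_title, image_url, thumb_url in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = page_title
        ET.SubElement(item, "link").text = image_url
        # One thumbnail and one full-size image per scanned page
        ET.SubElement(item, "{%s}thumbnail" % MEDIA_NS, {"url": thumb_url})
        ET.SubElement(item, "{%s}content" % MEDIA_NS,
                      {"url": image_url, "type": "image/jpeg"})
    return ET.tostring(rss, encoding="unicode")


# Placeholder page images; real scan URLs would go here.
pages = [
    ("Page 1", "http://example.org/scans/1.jpg", "http://example.org/thumbs/1.jpg"),
    ("Page 2", "http://example.org/scans/2.jpg", "http://example.org/thumbs/2.jpg"),
]
feed = build_media_rss(
    "A review of the Centrolenid frogs of Ecuador, with descriptions of new species",
    "http://biostor.org/reference/20844",
    pages,
)
print(feed)
```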

Linking biodiversity data

Time for a Friday folly. I've made a clunky screencast showing an example of linking biodiversity data together, using bioGUID as the universal wrapper around various data sources. I started with GenBank sequence EF013683, added another, EF013555, then explored some links (specimen, publication, taxon, journal), using the OpenLink RDF Browser:



You can try the URIs I used in the linked data browser of your choice:


The demo is a bit clunky, partly because the linked data browser is generic. What we really need is a browser that is tailored to displaying the kind of data we're interested in, and hides the gory details under the hood. But the goal is to show that, once everything we care about has a resolvable URI that provides data in a consistent form, and we re-use identifiers, then we can glue stuff together with relative ease. In principle we can simply crawl this web of data (you can append other DOIs, ISSNs, and GenBank accession numbers to http://bioguid.info and get RDF to your heart's content).
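The "crawl this web of data" idea can be sketched in a few lines of Python. This is a toy under stated assumptions: the `genbank:` prefix used to build the bioGUID URI is my guess at the pattern, the `urn:example:` nodes and the links between them are invented placeholders (not real bioGUID output), and `fetch_links` stands in for whatever would actually dereference a URI and pull the linked URIs out of the returned RDF.

```python
from collections import deque

BASE = "http://bioguid.info/"


def bioguid_uri(prefixed_id):
    """Append an identifier to the bioGUID base (prefix scheme is an assumption)."""
    return BASE + prefixed_id


def crawl(start_uris, fetch_links, max_nodes=50):
    """Breadth-first walk over linked URIs, visiting each URI at most once."""
    seen = set()
    order = []
    queue = deque(start_uris)
    while queue and len(order) < max_nodes:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        order.append(uri)
        for linked in fetch_links(uri):
            if linked not in seen:
                queue.append(linked)
    return order


# Invented toy graph: two sequences that share one publication.
TOY_WEB = {
    bioguid_uri("genbank:EF013683"): ["urn:example:specimen-1", "urn:example:publication-1"],
    bioguid_uri("genbank:EF013555"): ["urn:example:specimen-2", "urn:example:publication-1"],
    "urn:example:publication-1": ["urn:example:journal-1"],
}


def toy_fetch(uri):
    """Stand-in for dereferencing a URI and extracting the URIs it links to."""
    return TOY_WEB.get(uri, [])


start = [bioguid_uri("genbank:EF013683"), bioguid_uri("genbank:EF013555")]
visited = crawl(start, toy_fetch)
for uri in visited:
    print(uri)
```

The point of the toy graph is the shared publication: because both sequences re-use the same identifier for it, the crawler reaches it from either starting point and visits it only once, which is the "glue" that re-using identifiers buys you.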

None of this is particularly new. We've had RDF in biodiversity informatics for at least five years, there are various linked data-style projects, such as GeoSpecies and the first iteration of bioGUID, and some people (such as Roger Hyam) have been pushing HTTP URIs + RDF for a while, but we seem remarkably unable to get traction on this. Notably, no major biodiversity provider provides RDF (by major I mean GenBank or GBIF size). We make diagrams like the one I drew for GBIF last year, we make the case that linking is a Good Thing™, and yet nothing much happens. This suggests that the idea is still not being presented in a compelling enough fashion. Certainly, clunky demos like the one above probably won't help much. Linked Data clients are generally pretty awful things to use. I think we're going to need some compelling applications that really grab people's attention.