Search this keyword

Sherborn presentation on Open Taxonomy

Here is my presentation from today's Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond meeting.


All the presentations will be posted online, along with podcasts of the audio. Meantime, presentations by Dave Remsen and Chris Freeland are already online.

Linking taxonomic names to literature: beyond digitised 5×3 index cards

Pubs
Tomorrow is the Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond meeting. It should be an interesting gathering, albeit overshadowed by the sudden death of Frank Bisby.

I'm giving a talk entitled "Open Taxonomy", in which I argue that most taxonomic databases are little more than digitised collections of 5×3 index cards, where literature is treated as dumb citation strings rather than as resources with digital identifiers. To make the discussion concrete I've created a mapping between the Index to Organism Names (ION) database and a range of bibliographic sources, such as CrossRef (for DOIs), BioStor, JSTOR, etc.

This mapping is online at http://iphylo.org/~rpage/itaxon/.

So far I've managed to link some 200,000 animal names to a literature identifier, and a good fraction of these articles are freely available, either as images in BioStor and Gallica (for I've created a simple viewer) or as PDFs (which are displayed using Google Docs.

Some examples are:


The site is obviously a work in progress, and there's a lot to be done to the interface, but I hope it conveys the key point: a significant fraction of the primary taxonomic literature is online, and we should be linking to this. The days of digitised 5×3 index cards are past.



Final thoughts on TDWG RDF challenge

Quick final comment on the TDWG Challenge - what is RDF good for?. As I noted in the previous post, Olivier Rovellotti (@orovellotti) and Javier de la Torre (@jatorre) have produced some nice visualisations of the frog data set:
Cartodb
Nice as these are, I can't help feeling that they actually help make my point about the current state of RDF in biodiversity informatics. The only responses to my challenge have been to use geography, where the shared coordinate system (latitude and longitude) facilitates integration. Having geographic coordinates means we don't need to have shared identifiers to do something useful, and I think it's no accident that GBIF is one of the most important resources we have. Geography is also the easiest way to integrate across other fields (e.g., climate).

But what of the other dimensions? What I'm really after are links across datasets that enable us to make new inferences, or address interesting questions. The challenge is still there...