Thoughts on the International Year of Biodiversity 2010

Given that a new decade prompts predictions, as well as New Year's resolutions, and that 2010 is the International Year of Biodiversity, which comes complete with glossy web sites and calls for action, I'm making some predictions of my own, inspired in part by Eric Hellman's Ten Predictions for the Next Ten Years. I won't be nearly as bold as Eric: I'm limiting myself to biodiversity informatics and the coming year. Here are my predictions:


  1. The Encyclopedia of Life will continue its slow decline into irrelevance. Nobody will care, as we have Wikipedia.

  2. Catalogue of Life (CoL) will issue another release, complete with much fanfare. The LSIDs for the 2009 release (which have never worked) will continue to fail; LSIDs for 2010 either won't be released, or will fail. Nobody will care.

  3. There will be much talk of integrating biodiversity data. Unless GBIF adopts resolvable identifiers for specimens, and a major nomenclator or taxonomic name database (re)uses resolvable identifiers for literature (e.g., DOIs and BHL URLs), nothing of significance in this area will happen. Database providers will continue to confuse "link integration" (i.e., sharing URLs, doi:10.1038/nrg1065) with genuine integration.

  4. For most young scientists GenBank will be the dominant source of information about biodiversity. If it hasn't been sequenced, they won't care about it.

  5. DNA barcoding by itself will become boring, but will be the best tool with which to engage the public about taxonomy (e.g., Barcoding, taxonomy and citizen CSI).

  6. Literature that is not online will cease to be read. Taxonomic groups where the literature is not online will effectively cease to be studied.

  7. The major databases will continue to be riddled with errors. These will be numerous enough to be annoying, but not so numerous as to prevent useful work being done. The databases will make no (serious) effort to fix these (doi:10.1126/science.319.5870.1598).

  8. No major database effort will adopt wikis.

  9. Data providers such as Thomson Reuters (Index of Organism Names) will continue to cling to debilitating notions of "intellectual property." As the coverage of the Biodiversity Heritage Library and the reach of Google's indexing increase, commercial indexing services will become irrelevant.

  10. The chasm between the classifications that underlie efforts such as EOL, and phylogenetic trees being generated by systematists will grow. Neither community will care.


What are your predictions?

BioStor

Today I finally got a project out the door. BioStor is my take on what an interface to the Biodiversity Heritage Library (BHL) could look like. It features the visualisations I've mentioned in earlier posts, such as Google maps based on extracted localities, and tag trees. It also has a modified version of my earlier BHL viewer.

There are a number of ideas I want to play with using BioStor, but the main goal of this site is to provide article-level metadata for BHL. As I've discussed earlier (see also Chris Freeland's post But where are the articles?), BHL has very little article-level metadata, making searching for articles a frustrating experience. BioStor aims to make this easier by providing an OpenURL resolver that tries to find articles in BHL.

BioStor supports the OpenURL standard, which means it can be used from within EndNote and Zotero. Web sites that support COinS (such as Drupal-based Scratchpads and EOL's LifeDesks) can also use BioStor (see http://biostor.org/referrer.php for details).
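
To give a sense of what a client sends to an OpenURL resolver, here is a minimal Python sketch that builds an OpenURL 0.1-style query string. The resolver address `http://biostor.org/openurl` is an assumption (this post does not state it), the helper name `build_openurl` is made up for illustration, and the volume, page and year are placeholder values rather than a real article.

```python
from urllib.parse import urlencode

# ASSUMPTION: the resolver endpoint is not given in the post; this URL is a guess.
RESOLVER = "http://biostor.org/openurl"


def build_openurl(resolver, genre="article", **fields):
    """Build an OpenURL 0.1-style query URL from citation fields.

    Typical keys are issn, title, volume, spage (start page), date and aulast.
    Empty fields are dropped so the resolver only sees what we actually know.
    """
    params = {"genre": genre}
    params.update({key: value for key, value in fields.items() if value})
    return resolver + "?" + urlencode(params)


# Placeholder citation: the ISSN is the one used in the journal example in this
# post; volume, start page and year are illustrative only.
url = build_openurl(RESOLVER, issn="0007-1498", volume="10", spage="1", date="1963")
print(url)
```

A reference manager does essentially this for each citation, then follows the resulting URL to see whether the resolver can locate the article.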

My approach to finding articles in BHL is to take existing metadata from bibliographies and databases, and use this to search BHL using techniques ranging from reasonably elegant (Smith-Waterman alignment on words to match titles) to down-and-dirty regular expression matching. Since this metadata may contain errors, BioStor provides basic editing tools (using reCAPTCHA rather than user logins at this point).
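
For readers unfamiliar with the title-matching idea, the following Python sketch shows Smith-Waterman local alignment applied to words rather than characters. It is not BioStor's code: the scoring values, the tokeniser and the function names are assumptions chosen for illustration, and the sample strings are invented.

```python
import re

# ASSUMPTION: illustrative scoring values, not the ones BioStor uses.
MATCH, MISMATCH, GAP = 2, -1, -1


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def smith_waterman_words(a, b):
    """Return the best local alignment score between two word lists."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def title_similarity(title, page_text):
    """Score in [0, 1]: 1.0 means every word of the title aligns exactly."""
    words = tokenize(title)
    if not words:
        return 0.0
    return smith_waterman_words(words, tokenize(page_text)) / (MATCH * len(words))


title = "On a new species of octopus"
clean = title_similarity(title, "PROCEEDINGS. On a new species of octopus from the Pacific")
noisy = title_similarity(title, "On a new specics of octopus")  # simulated OCR error
```

Because the alignment is local, the title can sit anywhere in the page text, and a single OCR error lowers the score instead of breaking the match outright.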

There's much to be done: the article finding is somewhat error-prone, and the search requires a local copy of BHL, and mine is rather out of date. However, it is a start.

To get a flavour of BioStor, try browsing some references:

http://biostor.org/reference/1
http://biostor.org/reference/4
http://biostor.org/reference/12

or view information for a journal:

http://biostor.org/issn/0007-1498


or an author:

http://biostor.org/author/41
http://biostor.org/author/16

or a taxon name:

http://biostor.org/name/Atelophryniscus%20chrysophorus

BHL interface ideas

I've been buried in programming (and it's exam time at Glasgow) so I've not blogged for a month (gasp). I've been playing with ways to visualise Biodiversity Heritage Library content for a while (click here for a list of previous posts), and have occasionally surfaced to tweet a screenshot via twitpic. The more I play with the BHL content the more I think it's a gold mine, and that many of the ideas I played with for my ill-fated Elsevier Challenge entry (website here, background paper at hdl:10101/npre.2009.3173.1) are taking on a new life with this project.

I'm hoping to release my BHL article finding and visualising web site by the end of the month, but meantime I'm gathering the screenshots here.

The first shows a Google map generated from latitudes and longitudes extracted from the OCR text of page 7705952 in the BHL using some simple regular expressions. There's quite a bit of latitude and longitude information in BHL, and that's before trying georeferencing tools.

46740423.png
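
As a rough illustration of the "simple regular expressions" approach, here is a Python sketch that pulls degree-and-minute coordinates out of text and converts them to decimal degrees. The pattern, the pairing rule (a latitude followed by the next longitude) and the sample sentence are all assumptions; real OCR text is messier and would need more patterns than this.

```python
import re

# ASSUMPTION: coordinates look like 12°30'S or 5° N; seconds are not handled.
COORD = re.compile(
    r"(?<!\d)(?P<deg>\d{1,3})\s*°\s*(?:(?P<min>\d{1,2})\s*['′]\s*)?(?P<hem>[NSEW])"
)


def to_decimal(deg, minutes, hem):
    """Convert degrees and optional minutes to signed decimal degrees."""
    value = int(deg) + (int(minutes) if minutes else 0) / 60
    return -value if hem in "SW" else value


def extract_points(text):
    """Return (latitude, longitude) pairs found in the text, in order."""
    points = []
    pending_lat = None
    for m in COORD.finditer(text):
        value = to_decimal(m["deg"], m["min"], m["hem"])
        if m["hem"] in "NS":
            pending_lat = value  # remember it until a longitude turns up
        elif pending_lat is not None:
            points.append((pending_lat, value))
            pending_lat = None
    return points


sample = "Collected at 12°30'S, 45°15'W, and again near 5° N 100° E."
points = extract_points(sample)
```

Each pair can then be handed to a mapping library as a marker.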


The idea is to display this map next to the article so that the user gets an immediate sense of which region of the world the article covers, such as this article about Riekia wasps:

46744940.png


I'm also interested in useful ways to display search results. Here's an experiment using TileBars to visualise how relevant a search result is. The width of the bar is a function of how many pages are in the article; the vertical stripes indicate pages that have the search term. The idea is to get a quick visual impression of whether the article mentions the term in passing, or treats it in some detail.

48350737.png
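
The bar described above can be sketched in a few lines of Python, using text characters in place of graphics. This follows the page-based variant described in this post, one cell per page, and is not necessarily how Hearst's original TileBars are drawn; the function name and sample pages are invented for illustration.

```python
def tilebar(page_texts, term, hit="#", miss="."):
    """Return one character per page: `hit` if the page mentions the term."""
    needle = term.lower()
    return "".join(hit if needle in text.lower() else miss for text in page_texts)


# Invented sample article of five pages.
pages = [
    "Introduction to deep sea cephalopods",
    "The octopus was collected by trawl",
    "Station data and depth records",
    "Octopus arms bear a single row of suckers",
    "References",
]
bar = tilebar(pages, "octopus")
```

A long bar with a single stripe suggests a passing mention, while a bar that is mostly stripes suggests the article treats the term in detail.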


TileBars were developed by Marti Hearst, whose web site has some great resources. Partly inspired by her BioText project, as well as the thumbnail page display in JSTOR, I'm now experimenting with showing thumbnails in search results. For example, here's a search for the deep sea octopus Graneledone pacifica, showing two articles:


48832222-196574b7b6d6a2bc5764a5e853cd478b.4b228b85-full.png


I display thumbnails for pages that (a) have the name on the page, and (b) have what look like figure captions on them. The idea is that an article that figures a taxon is likely to be a fairly important article to look at, so displaying thumbnails will highlight those articles. The second article in the search results is the paper that published the name Graneledone pacifica, and the figures illustrate the taxon.
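
The two-part rule can be sketched as a simple filter in Python. What counts as something that "looks like" a figure caption is an assumption here (a line starting with "Fig.", "Figure" or "Plate" followed by a number), and the page data is invented.

```python
import re

# ASSUMPTION: a caption is a line beginning with Fig./Figs./Figure/Plate and a number.
CAPTION = re.compile(r"^\s*(?:fig(?:ure)?s?\.?|plate)\s*\d+", re.IGNORECASE | re.MULTILINE)


def pages_to_thumbnail(pages, name):
    """Return ids of pages that mention the name and contain a caption-like line."""
    needle = name.lower()
    return [
        page_id
        for page_id, text in pages
        if needle in text.lower() and CAPTION.search(text)
    ]


# Invented (page id, OCR text) pairs.
pages = [
    (1, "Graneledone pacifica is described here."),
    (2, "Fig. 3. Graneledone pacifica, dorsal view."),
    (3, "Fig. 4. Another species."),
]
selected = pages_to_thumbnail(pages, "Graneledone pacifica")
```

Only the page that both names the taxon and carries a caption is selected, which is what pushes illustrated articles to the top visually.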

These are all pretty rough, but they give some idea of what I've been working on over the last month.