
Linnaeus meets the Internet: PLoS + Botany = #fail

To much fanfare (e.g., Nature News, "Linnaeus meets the Internet" doi:10.1038/news.2010.221), on May 5th PLoS ONE published Sandy Knapp's "Four New Vining Species of Solanum (Dulcamaroid Clade) from Montane Habitats in Tropical America" doi:10.1371/journal.pone.0010502. To quote the Nature News piece:
The paper represents the culmination of a campaign to institute the electronic publication of scientific names, a case Knapp and others have made in journals including Nature[doi:10.1038/446261a]. Allowing electronic publication should make accessing information easier for scientists worldwide — especially those in developing countries who may not have access to fully stocked libraries. This, in turn, will aid conservation efforts, Knapp says.

Given the profile of this paper, "...the first time new plant names have been published in a purely electronic journal and still complied with ICBN rules", you'd think the participants would ensure the electronic aspects of the publication worked. Sadly, this is not the case.

The four names in question have apparently been deposited in IPNI with the following LSIDs:

  • Solanum aspersum: urn:lsid:ipni.org:names:77103633-1

  • Solanum luculentum: urn:lsid:ipni.org:names:77103634-1

  • Solanum sanchez-vegae: urn:lsid:ipni.org:names:77103635-1

  • Solanum sousae: urn:lsid:ipni.org:names:77103636-1


Today is May 6th. None of these names are returned by a search of IPNI, for example http://www.ipni.org/ipni/simplePlantNameSearch.do?find_wholeName= returns this:

[Screenshot of the IPNI search results: ipni1.png]

Resolving the LSID returns this:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:tn="http://rs.tdwg.org/ontology/voc/TaxonName#"
xmlns:tm="http://rs.tdwg.org/ontology/voc/Team#"
xmlns:tcom="http://rs.tdwg.org/ontology/voc/Common#"
xmlns:p="http://rs.tdwg.org/ontology/voc/Person#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#">
<tn:TaxonName rdf:about="urn:lsid:ipni.org:names:77103633-1">
<tcom:versionedAs rdf:resource="urn:lsid:ipni.org:names:77103633-1:1.2"/>
<tcom:Deleted>Yes</tcom:Deleted>
</tn:TaxonName>
</rdf:RDF>

Hmmm, so apparently this record has been "deleted"?
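
To make that check repeatable, here is a minimal sketch that parses an RDF response like the one above and lists any names flagged as deleted. It assumes the response has the same shape as the sample (a `tn:TaxonName` element with a `tcom:Deleted` child); the function name `deleted_lsids` is mine, and the sample is embedded as a string so nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET

# Namespaces copied from the RDF returned by the LSID resolver.
TN = "http://rs.tdwg.org/ontology/voc/TaxonName#"
TCOM = "http://rs.tdwg.org/ontology/voc/Common#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the response shown above (unused namespaces dropped).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:tn="http://rs.tdwg.org/ontology/voc/TaxonName#"
xmlns:tcom="http://rs.tdwg.org/ontology/voc/Common#">
<tn:TaxonName rdf:about="urn:lsid:ipni.org:names:77103633-1">
<tcom:versionedAs rdf:resource="urn:lsid:ipni.org:names:77103633-1:1.2"/>
<tcom:Deleted>Yes</tcom:Deleted>
</tn:TaxonName>
</rdf:RDF>"""


def deleted_lsids(rdf_xml):
    """Return the LSIDs of all TaxonName records marked Deleted = Yes."""
    # Parse from bytes so the XML encoding declaration is honoured.
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    flagged = []
    for name in root.iter(f"{{{TN}}}TaxonName"):
        deleted = name.findtext(f"{{{TCOM}}}Deleted", default="")
        if deleted.strip().lower() == "yes":
            flagged.append(name.get(f"{{{RDF}}}about"))
    return flagged


print(deleted_lsids(SAMPLE))
```

Running the same function over the responses for all four LSIDs would show at a glance which records are still marked as deleted.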

The paper also states that:
The IPNI LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID contained in this publication to the prefix http://ipni.org/.

This sentence mirrors similar ones in other PLoS ONE papers saying we can resolve ZooBank LSIDs by appending the LSID to http://zoobank.org (e.g., see doi:10.1371/journal.pone.0001787).

Thing is, URLs such as http://ipni.org/urn:lsid:ipni.org:names:77103633-1 return a 404 from Kew (any IPNI LSID I've tried does this).
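
The rule quoted from the paper is just string concatenation, so it is easy to sketch. The code below is my own illustration, not anything provided by IPNI or PLoS: it splits an LSID into its parts (following the general `urn:lsid:authority:namespace:object[:revision]` layout) and builds the URL the paper describes. The names `Lsid`, `parse_lsid` and `resolver_url` are hypothetical, and the code only builds the URL; it does not test whether the server answers.

```python
from typing import NamedTuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: str  # empty string when the LSID has no revision part


def parse_lsid(lsid):
    """Split an LSID into authority, namespace, object id and revision."""
    parts = lsid.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {lsid!r}")
    revision = parts[5] if len(parts) == 6 else ""
    return Lsid(parts[2], parts[3], parts[4], revision)


def resolver_url(lsid, prefix="http://ipni.org/"):
    """Append the LSID to the prefix, as the paper instructs."""
    parse_lsid(lsid)  # raises ValueError if the string is not an LSID
    return prefix.rstrip("/") + "/" + lsid


print(resolver_url("urn:lsid:ipni.org:names:77103633-1"))
print(parse_lsid("urn:lsid:ipni.org:names:77103633-1:1.2"))
```

The same function covers the ZooBank case by passing `prefix="http://zoobank.org"`.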


Update: As per Alan Paton's comment below, the http://ipni.org prefix now works.


So, to recap:

  1. The names aren't in IPNI

  2. The LSIDs state the record has been deleted

  3. The LSIDs can't be resolved by the means stated in the paper

Now, I don't know what happened (perhaps IPNI wanted to hold off until the paper actually appeared before releasing the names), but the paper is out, the buzz in Nature is out, and IPNI doesn't have the resolver in place, let alone the names.

Given the milestone this paper represents, and the fuss over the publication of the name Darwinius, you'd expect the bioinformatics side of it to be, you know, actually working. In these circumstances, how on Earth do we make the case that the LSID and name databasing side of taxonomic publication is useful?

Mendeley Open API and the Biodiversity Heritage Library

Mendeley have called for proposals to use their forthcoming API. The API will be publicly available soon, but in a clever move Mendeley will provide early access to developers with cool ideas.
Given that the major limitation of the Biodiversity Heritage Library (from my perspective) is the lack of article-level metadata, and Mendeley has potentially lots of such data, I wonder whether this is something that could be explored. My BioStor project takes article metadata and finds articles in BHL, so an attractive work flow would be:
  1. People upload bibliographies to Mendeley (e.g., bibliographies for particular taxa, journals, etc.)

  2. BioStor uses Mendeley's API to find articles likely to be in BHL, then locates the actual article in BHL.

  3. The user could then grab a PDF of the article from BioStor that contains XMP metadata (which Mendeley, and other tools, can read)
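
Step 2 could be sketched roughly as follows. Everything here is an assumption for illustration: the records are made-up dictionaries standing in for whatever the Mendeley API ends up returning, the endpoint URL is a guess at an OpenURL-style lookup rather than a documented BioStor address, and the helper names are hypothetical. The sketch only builds the lookup URLs; it makes no network calls.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
BIOSTOR_OPENURL = "http://biostor.org/openurl"


def has_lookup_fields(record):
    """True if the record has enough metadata to attempt a lookup."""
    return all(record.get(key) for key in ("journal", "volume", "spage"))


def openurl_query(record):
    """Build an OpenURL-style query string from a citation record."""
    fields = {
        "genre": "article",
        "title": record.get("journal"),
        "volume": record.get("volume"),
        "spage": record.get("spage"),
        "date": record.get("year"),
    }
    # Drop any fields the record does not supply.
    return urlencode({k: v for k, v in fields.items() if v})


def lookup_urls(records):
    """Return one lookup URL per record that is complete enough to try."""
    return [
        BIOSTOR_OPENURL + "?" + openurl_query(record)
        for record in records
        if has_lookup_fields(record)
    ]


# Made-up records standing in for a bibliography uploaded to Mendeley.
records = [
    {"journal": "Example Journal", "volume": "12", "spage": "34", "year": "1901"},
    {"journal": "Example Journal", "year": "1950"},  # too sparse, skipped
]

for url in lookup_urls(records):
    print(url)
```

Filtering on journal, volume and start page is a deliberately crude stand-in for "likely to be in BHL"; a real filter would presumably also check the journal against BHL's holdings.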

Users would gain a tool to manage their bibliographies (assuming that they prefer Mendeley to other tools, or are happy to sync with Mendeley). They would be contributing to a database of taxonomic literature (and biological literature in general; BHL's content is pretty diverse), and would also gain easy access to PDFs for BHL content (this last feature depends on whether Mendeley can associate a PDF with an existing bibliographic record automatically). In the same way, a tool such as BioStor (and, by implication, BHL) could gain usage statistics (i.e., who is reading these articles?).

Our community's efforts at assembling bibliographies haven't amounted to much yet. The tools we use tend to be poor. I find CiteBank to be underwhelming, and Drupal's bibliographic modules (used by CiteBank and ScratchPads) lack key features. We also seem reluctant to contribute to aggregated bibliographies. Perhaps encouraging people to use a nicer tool, and at the same time providing additional benefits (e.g., XMP PDFs) might help move things forward.

Anyway, food for thought. Perhaps other tools might make more sense, such as using the API to upload metadata and PDFs direct from BioStor to Mendeley, and making the collection public. But, if I were Mendeley, what I'd be looking for are tools that enhance the Mendeley experience. There's some obvious scope for visualising the output and social networks of authors, such as the sparklines and coauthor graphs I've been playing with in BioStor (for example, for W E Duellman):

[Images: sparkline and coauthor graph for W E Duellman in BioStor]
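
For anyone unfamiliar with sparklines, the idea is simply a tiny chart of publications per year. Below is a text-only sketch using Unicode block characters; it is not how BioStor draws them, the function name is mine, and the years in the example are invented purely to show the output shape.

```python
from collections import Counter

BARS = "▁▂▃▄▅▆▇█"


def sparkline(years):
    """Render publication counts per year as a row of block characters."""
    if not years:
        return ""
    counts = Counter(years)
    # Cover every year in the range so gaps in output stay visible.
    series = [counts.get(y, 0) for y in range(min(counts), max(counts) + 1)]
    peak = max(series)
    # Scale each count to a bar height; years with no papers are blank.
    return "".join(
        BARS[(c * (len(BARS) - 1)) // peak] if c else " " for c in series
    )


# Invented publication years, for illustration only.
print(sparkline([1970, 1970, 1971, 1973]))
```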

Before this blog post starts to veer irretrievably off course, I'd be interested in thoughts of anyone interested in matters BHL. There's nothing like a deadline (Friday, May 14th) to concentrate the mind...

Time for some decent service

The BBC web site has an article entitled "Giant deep sea jellyfish filmed in Gulf of Mexico", which has footage of Stygiomedusa gigantea and mentions an associated fish, Thalassobathia pelagica.



One thing that frustrates me beyond belief is how hard it is to get more information about these organisms. Put another way, the biodiversity informatics community is missing a huge opportunity here. There are a slew of services, such as Zemanta and OpenCalais.com, that can enrich the content of a document by identifying terms and adding links. Imagine a similar service that took taxonomic names and could provide information and links about that name, so that sites such as the BBC could enrich their pages. We've had various attempts at this[1], but we are still far from creating something genuinely useful.
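
To give a feel for what the simplest possible version of such a service does, here is a toy sketch that wraps known taxonomic names in links. It is not how Zemanta, OpenCalais, uBio LinkIT or EOL NameLink work; the lookup table is a stand-in for a real names database, the `example.org` URL is a placeholder, and the input is assumed to be plain text that is already safe to embed in HTML.

```python
import html
import re

# Stand-in for a names database: name -> page to link to.
KNOWN_NAMES = {
    # Placeholder URL, not a real service.
    "Stygiomedusa gigantea": "https://example.org/taxon/stygiomedusa-gigantea",
    # BioStor reference for the original description of the fish.
    "Thalassobathia pelagica": "http://biostor.org/reference/4339",
}


def link_names(text, names=KNOWN_NAMES):
    """Wrap each known taxonomic name found in the text in an HTML link."""
    if not names:
        return text
    # Try longer names first so a binomial wins over a shorter overlap.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b"
    )

    def replace(match):
        name = match.group(0)
        url = html.escape(names[name])
        return f'<a href="{url}"><i>{html.escape(name)}</i></a>'

    return pattern.sub(replace, text)


print(link_names("Footage of Stygiomedusa gigantea and Thalassobathia pelagica."))
```

Exact string matching is the easy part; a genuinely useful service would also have to cope with abbreviated genera, synonyms and misspellings, and would need something worth linking to.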

Part of the problem is that the plethora of taxonomic databases we have are often of little use. After fussing with Google I discover that Stygiomedusa gigantea (Browne, 1910) has the synonym Stygiomedusa fabulosa Russell, 1959 (see, e.g., the WoRMS database), but no database tells me that the genus Stygiomedusa was published by Russell in Nature in 1959 (doi:10.1038/1841527a0). Nor can I readily find the original reference for (Browne, 1910) in these databases[2]. Why is this so hard?

Then when we do have information, we fail to make it digestible. For example, the EOL page for Thalassobathia pelagica links to BHL pages, but fails to point out that the pages it links belong to a single article, and that this article (http://biostor.org/reference/4339) is the original description of the fish.

Publishers are increasingly interested in any tools that can embellish their content. The organisation that gets their act together and provides a decent service for publishers (including academic journals, and news services such as the BBC) is going to own this space. Any takers...?

  1. Such as uBio LinkIT and EOL NameLink.
  2. After finding another taxon with the author Browne 1910 in BHL, I found Diplulmaris (?) gigantea, which looked like a good candidate for the original name for the jellyfish, see http://biodiversitylibrary.org/page/1727009. This is confirmed by the Smithsonian's Antarctic Invertebrates site.