Search this keyword

Anchoring Biodiversity Information: from Sherborn to the 21st century and beyond

Next month I'll be speaking in London at The Natural History Museum at a one day event Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond. This meeting is being organised by the International Commission on Zoological Nomenclature and the Society for the History of Natural History, and is partly a celebration of his major work Index Animalium and partly a chance to look at the future of zoological nomenclature.

Details are available from the ICZN web site. I'll be giving a a talk entitled "Towards an open taxonomy" (no, I don't know what I mean by that either). But it should be a chance to rant about the failure of taxonomy to embrace the Interwebs.

SherbornPoster Sept 11

I think I now "get" the Encylopedia of Life

The Encylopedia of Life (EOL) has been relaunched, with a new look and much social media funkiness. I've been something of an EOL sceptic, but looking at the new site I think I can see what EOL is for. Ironically, it's not really about E. O. Wilson's original vision (doi:10.1016/S0169-5347(02)00040-X:
Imagine an electronic page for each species of organism on Earth, available everywhere by single access on command. The page contains the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits. The page opens out directly or by linking to other data bases, such as ARKive, Ecoport, GenBank and MORPHOBANK. It comprises a summary of everything known about the species’ genome, proteome, geographical distribution, phylogenetic position, habitat, ecological relationships and, not least, its practical importance for humanity.
We still lack a decent database that does this. EOL tries, but in my opinion still falls short, partly because it isn't nearly aggressive enough in harvesting and linking data (links to the primary literature anyone?), and has absolutely no notion of phylogenetics.

In terms of doing science I don't see much that I'd want to do with EOL, as opposed, say, to Wikipedia or existing taxonomic databases. But thinking about other applications, EOL has a lot of potential. One nice feature is the ability to make "collections". For example, Cyndy Parr has created a collection called Fascinating textures, which is simply a collection of images in EOL (I've included some below):

Textures
What is nice about this is that it cuts across any existing classification and assembles a set of taxa that share nothing other than having "fascinating textures". This ability to tag taxa means we could create all sorts of interest sets of taxa based on criteria that are meaningful in a particular context. For example, egotist that I am, I created a collection called Taxa described by Roderic Page, which includes the one crab and 6 bopyrid isopods that I described in the 80's.

Putting on my teaching hat, I'm involved in teaching a course on animal diversity and could imagine assembling collections of taxa relevant to a particular lecture (either taxonomically, or based on some other criteria, such as all parasites of a particular taxon, or all organisms found associated with deep sea vents. Other collections could be built by people or organisations with content. For example, lists of top ten new species, lists of species for which the BBC has content, etc.

In this sense, EOL becomes a tagging service for life, a bit like delicious. The social network side of things is still a little clunky —there doesn't seem to be a notion of "contacts" or "friends", and it needs integration with existing social networks — but I think I now "get" what EOL is for.

Phantom articles: why Mendeley needs to make duplication transparent

Browsing Mendeley I found the following record: http://www.mendeley.com/research/description-larva/. This URL is for a paper
Costa, J. M., & Santos, T. C. (2008). Description of the larva of. Zootaxa, 99(2), 129-131
which apparently has the DOI doi:10.1645/GE-2580.1. This is strange because Zootaxa doesn't have DOIs. The DOI given resolves to a paper in the Journal of Parasitology:
Harriman, V. B., Galloway, T. D., Alisauskas, R. T., & Wobeser, G. A. (2011). Description of the larva of Ceratophyllus vagabundus vagabundus (Siphonaptera: Ceratophyllidae) from nests of Rossʼs and lesser snow geese in Nunavut, Canada. The Journal of parasitology, 93(2), 197-200
Now, this paper has it's own record in Mendeley.

OK, so this is weird..., but it gets weirder. If you look at the Mendeley page for this chimeric article there is a PDF preview of yet another article:
LOPES, Maria José Nascimento; FROEHLICH, Claudio Gilberto and DOMINGUEZ, Eduardo (2003). Description of the larva of Thraulodes schlingeri (Ephemeroptera, Leptophlebiidae). Iheringia, Sér. Zool. 92(2), 197-200 2003 doi:10.1590/S0073-47212003000200011
Mendeley duplicate

But it gets even more interesting. The abstract for the phantom Zootaxa article belongs to yet another paper:
Marques, K. I. D. S., & Xerez, R. D.Description of the larva of Popanomyia kerteszi James & Woodley (Diptera: Stratiomyidae) and identification key to immature stages of Pachygastrinae. Neotropical Entomology, 38(5), 643-648.
which also exists in Mendeley.

To investigate further I used Mendeley's API to retrieve this record (I had to look at the source of the web page to find the internal identifier used by Mendeley, namely 010c48d0-edb5-11df-99a6-0024e8453de6 to do this, why does Mendeley hide these?). Here's the abbreviated JSON for this record.

{
...
"website": "http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21506868",
"identifiers": {
"pmid": "21506868",
"issn": "19372345",
"doi": "10.1645\/GE-2580.1"
},
...
"issue": "2",
"pages": "129-131",
"public_file_hash": "fe7eed3f6c43a3be1480a0937229b9ad33666df4",
"publication_outlet": "Zootaxa",
"type": "Journal Article",
"mendeley_url": "http:\/\/www.mendeley.com\/research\/description-larva\/",
"uuid": "010c48d0-edb5-11df-99a6-0024e8453de6",
"authors": [
{
"forename": "J M",
"surname": "Costa"
},
{
"forename": "T C",
"surname": "Santos"
}
],
"title": "Description of the larva of",
"volume": "99",
"year": 2008,
"categories": [
39,
203,
37,
52,
43,
40,
210
],
"oa_journal": false
}

Doesn't add much to the story, but does give us the sha1 for the PDF for the chimeric article (fe7eed3f6c43a3be1480a0937229b9ad33666df4). If I download the PDF for the article in Iheringia, Sér. Zool. it has the same sha1:


openssl sha1 a11v93n2.pdf
SHA1(a11v93n2.pdf)= fe7eed3f6c43a3be1480a0937229b9ad33666df4

This article doesn't exist
So, to summarise, this paper doesn't exist. It is credited to a journal that doesn't have DOIs, the DOI resolves to an article in a different journal, the abstract comes from another article in another journal, and the PDF is from a third article. OMG!

This is just weird
So, something about the way Mendeley merges references is broken. Merging references is a tough problem so there will always be cases where things go wrong. But it would be really, really helpful if Mendeley could display the set of articles that it has merged to create each canonical reference (say by listing the UUIDs for each article). Users could then see if badness had happened, and provide feedback, for example by highlighting references that are clearly the same, and those that are clearly different. Until this happens I'm a bit nervous about trusting Mendeley with my bibliographic data, I don't want it mangled into chimeric papers that don't exist.