Search this keyword

These are my species - finding the taxonomic names I published using Mendeley

The latest addition to my mapping of taxonomic names to the literature (http://iphylo.org/~rpage/itaxon/) is the ability for authors with Mendeley accounts to find the names they've published. This is an extension of the "I wrote that" tool I developed earlier.

Let's say I want to show the names that a given author has published. I could search by that author's name, but that raises all sorts of issues (see my earlier posts ReaderMeter: what's in a name? and Equivalent author names), especially for this database where I have incomplete citations and in many cases lack author names beyond surname.

Another way to tackle the problem is if I have a list of publications for an author, then all I need to do is match that list to the publications in my taxonomic database. If both lists have identifiers for the publications, such as DOIs, then the task is trivial. But, where do I get these lists?

An obvious source is Mendeley, where people are building lists of their own publications (as well as other publications that they are interested in). For example, my publications are listed at http://www.mendeley.com/profiles/roderic-page/.

But I don't want to have to get these lists myself, I'd much rather that a Mendeley user could go to my taxonomic database, say "I have this Mendeley account, show me the names I've published". One reason I'd like to do this is that if I want people to engage with this project it would be nice to be able to offer an immediate reward, in this case, a place where you can show your contribution to the task of cataloguing life on this planet.

Finding my taxonomic names

If you have a Mendeley account here's what you do:

Go to http://iphylo.org/~rpage/itaxon/. At the top right you will see a "Sign in using Mendeley" link.

M1
Click this and you will be taken to Mendeley where you will be asked if you'd like to allow http://iphylo.org/~rpage/itaxon/ to connect to your account (if you're already logged in to Mendeley then you'll see an Accept button, otherwise Mendeley will ask you to log in).

M2
If you click on Accept then you will be taken back to my site and you should now see your profile name and picture on the top right:

M3

If you click on the Profile link then my site will talk to Mendeley and get a list of your papers and look for them in my database. If it find a paper it outputs the taxonomic names published in that paper. For example, here is my profile:

M4

Listed are the species of bird lice in the genus Dennyus described in a paper on which I was a coauthor (http://dx.doi.org/10.1046/j.1365-3113.1996.d01-13.x).

This list is incomplete as earlier papers of mine on crab and isopod taxonomy aren't listed because these lack identifiers. This is something I need to work on, but for now this seems like a simple way to enable someone to go to the http://iphylo.org/~rpage/itaxon/ mapping between taxonomic names and literature and find the names they've authored.

If you have a Mendeley account, and your list of publications in Mendeley includes papers describing new animal species, go to http://iphylo.org/~rpage/itaxon/ and try it out.


Mapping names to literature: closing in on 250,000 names

Following on from my earlier post Linking taxonomic names to literature: beyond digitised 5×3 index cards I've been slowly updating my latest toy:

http://iphylo.org/~rpage/itaxonAlpheus

This site displays a database mapping over 200,000 animal names to the primary literature, using a mix of identifiers (DOIs, Handles, PubMed, URLs) as well as links to freely available PDFs where they are available. Lots still to do as about a third of the 1.5 million names in the database have citations that my code hasn't been able to parse. There are also lots of gaps that need to be filled in, for example missing DOIs or PubMed identifiers, and a lot of the earlier names are linked by "microcitations" to names, and I'll need to handle those (using code from my earlier project Nomenclator Zoologicus meets Biodiversity Heritage Library: linking names directly to literature).

The mapping itself is stored in a database that I'm constantly editing, so this is far from production quality, but I've found it eye-opening just how much literature is available. There is a lot of scope for generating customised lists of papers, for example, primary taxonomic sources for taxa currently on the IUCN Red List, or those taxa which have sequences in GenBank (building on the mapping of NCBI taxa onto Wikipedia). Given that a lot of the relevant literature is in BHL, or available as PDFs, we could do some data mining, such as extracting geographical coordinates, taxonomic names, and citations. And if linked data is your thing, the 110,000 DOIs and nearly 9,000 CiNiii URLs all serve RDF (albeit not without a few problems).

I've set a "goal" of having 250,000 names mapped to the primary literature, at which point the database interface will get some much-needed attention, but for now have a look for your favourite animal and see if it's original description has been digitised.

Towards the bibliography of life

David King et al.'s paper "Towards the bibliography of life" http://dx.doi.org/10.3897/zookeys.150.2167 has just appeared in a special issue of ZooKeys. I've written a number of posts on this topic, so I've a few comments.

King et al. survey some of the issues, but don't really tackle the big issue of how we're going to build this. If we define the "bibliography of life" somewhat narrowly as the list of all papers that have published a scientific name (or a new combination, such as moving a species from one genus to another), then this is a large, but measurable undertaking. According to ION's metrics page, these are the numbers involved (for animals and protozoa):

Total New Names1,510,402
Total New Genera / Subgenera215,242
Total New Species / Subspecies1,192,366
Total Other New Names102,794
Total New Combinations241,296
Total New Synonyms260,544


Even in the worse case scenario of one name per publication (clearly not the case) this is big, but not insurmountable, task.

Publications not taxa
Part of the challenge is figuring out the best way to tackle the problem. In the past, most efforts at building taxonomic bibliographies have focussed on specific taxa, which is natural — the bibliographies are being built by taxonomists and they specialise in particular groups. But I'd argue that this is not the most efficient way to tackle the problem. Because the taxonomic literature is so widely dispersed, after the obvious "low hanging fruit" have been collected, considerable effort must be spent tracking down the harder to find citations. There are few economies of scale in this approach. In contrast, if we focus on publications at, say, the level of journal, then we can build a bibliography much more quickly. Once we've found the source, say, for one article, often we could use that information to harvest many articles from the same source (e.g., write scripts to harvest from a digital repository such as a DSpace server, or a digital library such as Gallica). But if we are focussed on a particular taxon, we will ignore the other articles in that journal ("what do I care about fish, I like turtles").

Put another way, if we imagine a taxa × publication matrix, then we can either go after rows (i.e., a bibliography for a specific taxonomic group), or columns (a list of articles in a specific journal). The article-based approach will be faster, albeit at the cost of finding articles that aren't necessarily relevant to taxonomy. This is why I'm spending what feels like far too much time harvesting article lists and uploading these to Mendeley. It is also one reason BHL has been so successful. They've simply gone after scanning the literature wholesale, rather than focussing on particular taxonomic groups.

TaxapublicationmatrixWikispecies logo enCrowd sourcing and Wikispecies
Crowd sourcing often strikes me as a euphemism for "we can't be bothered doing the tedious stuff, lets get the public to do it for us (plus it will look like we're engaged with the public)." I'm not denying can work, but I suspect it's not a magic bullet. Perhaps the best crowd sourcing is not to try and bring the crowd to a project, but go where the crowd has already gathered. In this case, an obvious crowd is the Wikispecies community. Working with the ION database for my Sherborn presentation, it's clear that the quality of bibliographic data in ION is variable, and rather poor for older references. In contrast, the reference lists on Wikispecies can be very good (e.g., the bibliography for George Boulenger). There are some issues with Wikispecies, notably the lack of a decent bibliographic template (unlike Wikipedia) so parsing references can be *cough* interesting, but there is scope here to use it to improve other databases. Citation matching can be a challenge, but in this case we have citations indexed by taxonomic name (in both ION and Wikispecies), which greatly reduces the scope of possible matches.

Summary
I think building the "bibliography of life" needs a combination of aggressive data gathering, and avoiding building additional tools unless absolutely needed. There are great tools and communities that can already be leveraged (e.g., Mendeley, Wikispecies), let's make use of them.