
Journals I'd like BHL to scan

I've recently updated my database of links between animal taxonomic names and literature identifiers, which now has over 280,000 names linked to some form of identifier (127,000 of these being DOIs). You can see the current version here:

http://iphylo.org/~rpage/itaxon/

As an experiment I've added a feature to list the number of names for each journal. Based on this list (limited to journals that I've found an ISSN for), here are some journals I'd like to see digitised by the Biodiversity Heritage Library (BHL). Note that by digitised I mean beyond the 1923 cutoff applied to many journals. This will mean negotiating with the journal publishers, but in a number of cases these are scientific societies or institutions, some associated with BHL. Given that major partners in BHL have made post-1923 content available, it would be nice to extend this to other key taxonomic journals.

Revue Suisse de Zoologie

Revue Suisse de Zoologie has published nearly 10,000 taxonomic names but has essentially zero digital presence, which is extraordinary. Another Swiss journal, Entomologica Basiliensia, is also an obvious candidate.

Revue de Zoologie et de Botanique Africaines

Revue de Zoologie et de Botanique Africaines has published over 5,000 names, and given the interest in providing information resources for Africa (e.g., http://www.mendeley.com/groups/1681811/bhl-africa/) this seems an obvious journal to scan completely.

Bulletin of the British Museum (Natural History) journals and books

The Natural History Museum [formerly British Museum (Natural History)] is a member of BHL, so I'd expect it to have better coverage of its own publications in BHL. There are gaps in journals such as Bulletin of the British Museum (Natural History) Entomology, which means there is a significant chunk of research published by Museum staff that simply doesn't exist digitally. At one point The Natural History Museum renamed the journals and moved them to Cambridge University Press, resulting in further gaps in digitisation. It's interesting that museums that haven't changed the title of their publications (such as the American Museum of Natural History and the Australian Museum) have better digital coverage than the NHM, which has flirted with various title changes in the last few decades. The Museum also published a series of monographs in the 20th century, many of which aren't in BHL.

Memoirs of the Queensland Museum

The Memoirs of the Queensland Museum is an important journal (> 3,000 names) but has only early issues scanned in BHL and recent issues as PDFs on the Museum web site (vulnerable to link rot when the site gets redesigned, as I've discovered to my cost).

Russian journals

Russian journals contain large numbers of taxonomic descriptions, but their digital presence is patchy. Springer has started to publish translations online (e.g., http://dx.doi.org/10.1134/S0013873810050155 in Entomological Review, which is a translation of an article in Zoologicheskii Zhurnal), but much of the Russian literature seems unavailable in digital form. BHL has spread from its US-UK origins to BHL-Europe, BHL_China, and BHL_Australia. Maybe it's time for BHL-Russia?

Summary

There are huge holes in the availability of taxonomic literature (where I equate "availability" with being digitised and online, free or otherwise). But on the other hand I've been pleasantly surprised by just how much taxonomic literature is online. It looks quite feasible to link at least 300,000 animal names to digital publications.

The journals I've highlighted are just a few obvious candidates for scanning. I suspect that as one goes down the list of taxonomic journals the rate of return will decline, to the point where scanning entire journals will be less efficient than scanning targeted articles.



Towards an interactive taxonomic article: displaying an article from ZooKeys

One of the things I keep revisiting is the way we display scientific articles. Apart from Nature's excellent iPhone and iPad apps, most efforts to re-imagine how we display articles are little more than glorified PDF viewers (e.g., the PLoS iPad app).

Part of the challenge is that if we make the article more interactive we immediately confront the problem of how to link to other content. For example, we may have a lovingly crafted ePub view (e.g., Nature's apps), but what happens when the user clicks on a citation to another paper? If the paper is published by the same journal, then potentially it could be viewed using the same viewer, but if not then we are at the mercy of the other publisher. They will have their own ideas of how to display articles, so the simplest fallback is to display the cited article in a web browser view. The problem with this is that it breaks the user experience - the other publisher is unlikely to follow the same conventions for displaying an article and its links. If we are lucky the cited article might be published in an Open Access journal that provides, say, XML based on the NLM DTD standard. Knowing whether an article is Open Access or not is not straightforward, and different journals have their own unique interpretation of the NLM standard.

Then there is the issue of other kinds of content, such as taxonomic names, specimens, DNA sequences, geographic localities, etc. We lack decent services for many of these objects; as a result, efforts like PLoS Biodiversity Hub end up being underwhelming collections of reformatted journal articles, rather than innovative integrations of biodiversity knowledge.

With these issues in mind I've started playing with ZooKeys XML, initially looking at ways to display the article beyond the conventional format. Ultimately I'd like to embed the article in a broader web of citations and data. ZooKeys articles are available in PDF, HTML, and XML. The HTML has links to taxon pages, maps, etc., which is nice, but I personally find this a little jarring because it interrupts the reading experience. The ZooKeys web site also surrounds the article with all the paraphernalia of a publisher's web site:

[Screenshot: the article as displayed on the ZooKeys web site]

As a first experiment, I've taken the XML for the article "At the lower size limit for tetrapods, two new species of the miniaturized frog genus Paedophryne (Anura, Microhylidae)" (http://dx.doi.org/10.3897/zookeys.154.1963) and used an XSLT style sheet to reformat the article. I've borrowed some ideas from Nature's apps, such as the font for the title, displaying the abstract in bold, and showing all the figures in the article as thumbnails near the top. I've also added some basic interactivity, which you can see in the video below. Instead of figures being in one place in the article, clicking on any mention of a figure in the text (e.g., "Fig. 1") makes that figure appear in place. If the article displays a point locality using latitude and longitude, then instead of launching a separate browser window with a Google map, clicking on the locality makes the map appear. The idea is that the flow of reading isn't interrupted: figures, maps, and citations all appear in the text.


This demo (which you can see live at http://iphylo.org/~rpage/zookeys) is limited, but most of its functionality comes from simply reformatting XML using XSLT. There's a little bit of jQuery for animation, and I ended up having to write a PHP script to convert verbatim latitude and longitude coordinates to the decimal coordinates expected by Google Maps, but it's all very lightweight. It wouldn't take much to add some JSON queries to make the taxon names clickable (e.g., showing a summary of a taxon from EOL). Because ZooKeys uses the NLM DTD for its XML, some of this code could also be applied to other journals, such as PLoS, so we could start to grow a library of linked, interactive taxonomic articles.
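
To give a flavour of the coordinate conversion mentioned above, here is a minimal sketch in JavaScript. It is not the PHP script used in the demo: the function name verbatimToDecimal, the accepted input format (degrees, optional minutes and seconds, then a hemisphere letter), and the sample coordinates are all assumptions made for illustration.

```javascript
// Hypothetical helper (not the PHP script from the demo): convert a verbatim
// coordinate such as 9°58'45"S into signed decimal degrees.
// South and west are returned as negative values.
function verbatimToDecimal(verbatim) {
  // degrees, optional minutes, optional seconds, then N/S/E/W
  const pattern = /^\s*(\d+(?:\.\d+)?)\s*°\s*(?:(\d+(?:\.\d+)?)\s*['′]\s*)?(?:(\d+(?:\.\d+)?)\s*["″]\s*)?([NSEW])\s*$/i;
  const match = pattern.exec(verbatim);
  if (!match) {
    return null; // format not recognised; the caller decides what to do
  }
  const degrees = parseFloat(match[1]);
  const minutes = match[2] ? parseFloat(match[2]) : 0;
  const seconds = match[3] ? parseFloat(match[3]) : 0;
  const hemisphere = match[4].toUpperCase();
  const decimal = degrees + minutes / 60 + seconds / 3600;
  return (hemisphere === 'S' || hemisphere === 'W') ? -decimal : decimal;
}

// Made-up coordinate, for illustration only.
console.log(verbatimToDecimal("149°30'E")); // 149.5
```

Returning null for anything the pattern doesn't recognise keeps the sketch honest: real verbatim coordinates come in many more formats than this handles.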

Exporting data from Australian Faunal Directory on CouchDB

Quick note to self about exporting data from my Australian Faunal Directory on CouchDB project. To export data from a CouchDB view you can use a list function (see Formatting with Show and List). Following the example on the Kanapes IDE blog, I created the following list function:

{
  "_id": "_design/publication",
  "_rev": "14-467dee8248e97d874f1141411f536848",
  "language": "javascript",
  "lists": {
    "tsv": "function(head,req) {
      var row;
      start({
        'headers': {
          'Content-Type': 'text/tsv'
        }
      });
      while(row = getRow()) {
        send(row.value + '\\t' + row.key + '\\n');
      }
    }"
  },
  "views": {
    .
    .
    .
  }
}


I can use this function with the view below, which lists Australian Faunal Directory publications by UUID ("value"), indexed by DOI ("key").
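
For orientation, a map function that produces an index of this shape might look roughly like the sketch below. This is an assumption, not the actual view from the project: the field name doi, the use of the document _id as the publication UUID, and the sample documents are all hypothetical. The emit stand-in is only there so the snippet runs outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), so the sketch can run outside CouchDB.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Assumed map function: index publications by DOI ("key"), with the
// document id (assumed here to be the publication UUID) as the "value".
function map(doc) {
  if (doc.doi) {
    emit(doc.doi, doc._id);
  }
}

// Made-up documents, for illustration only.
map({ _id: 'uuid-a', doi: '10.1234/example.1' });
map({ _id: 'uuid-b' }); // no DOI, so nothing is emitted
console.log(rows);
```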

[Screenshot: the "doi" view in CouchDB]

I can get the tab-delimited dump from http://localhost:5984/afd/_design/publication/_list/tsv/doi. Note that instead of, say, /afd/_design/publication/_view/doi to get the view, we use /afd/_design/publication/_list/tsv/doi to get the tab-delimited dump.
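
The relationship between the two URLs, and the layout of the dump, can be sketched as a couple of small JavaScript helpers. The function names viewUrlToListUrl and parseDump are hypothetical, and the sample rows in the usage below are made up; the only things taken from the setup above are the URL pattern and the column order (UUID, tab, DOI) written by the list function.

```javascript
// Turn a view URL into the matching list URL by swapping
// /_view/<view> for /_list/<list>/<view>.
function viewUrlToListUrl(viewUrl, listName) {
  return viewUrl.replace(/\/_view\/([^\/?]+)/, '/_list/' + listName + '/$1');
}

// Parse the dump written by the "tsv" list function. Each line holds
// row.value (the UUID), a tab, then row.key (the DOI).
function parseDump(text) {
  const doiToUuid = {};
  text.split('\n').forEach(function (line) {
    if (line === '') return; // skip the trailing empty line
    const parts = line.split('\t');
    doiToUuid[parts[1]] = parts[0];
  });
  return doiToUuid;
}

console.log(viewUrlToListUrl(
  'http://localhost:5984/afd/_design/publication/_view/doi', 'tsv'));
```

Note that the value comes first on each line of the dump, so the parser reads the second column as the DOI.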

I've created files listing DOIs and BioStor ids for publications in the Australian Faunal Directory. I'll play with lists a bit more, especially as I would like to extract the mapping from the Australian Faunal Directory on CouchDB project and add it to the iTaxon project.