Accounting Careers

Continuing with RSS feeds, I've now added wrappers around IPNI that return, for each plant family, a list of names added to the IPNI database in the last 30 days. You can see the list here.

One thing which is a constant source of frustration for me is the disconnect between nomenclators (lists of published names for species) and scientific publishing. The unit of digitisation for a publisher is the scientific article, but nomenclators often cite not the article in which a name was published, but the page on which the name appears.

For example, consider IPNI record 77096979-1 (or, if you prefer LSIDs, urn:lsid:ipni.org:names:77096979-1). It is for the begonia Begonia ozotothrix, and the citation is:

Edinburgh J. Bot. 66(1): 105 (-110; figs. 1, 4-5, map). 2009 [Mar 2009]

Very detailed, and great if I have access to a physical library that has the Edinburgh Journal of Botany -- I just find volume 66 on the shelf and turn to page 105. But I want this on my computer now ("library" - who they?). How do I find this reference on the web? The answer is: not easily. Tools such as OpenURL, which could be used, assume that I know at least the starting page of the article, but IPNI doesn't tell me that. Nor do I have an article title, which might help. However, a Google search on "Begonia ozotothrix" finds the article:

TWO NEW SPECIES OF BEGONIA (BEGONIACEAE) FROM CENTRAL SULAWESI, INDONESIA
D C Thomas, W H Ardi and M Hughes
Edinburgh Journal of Botany 66, 103 (2009)
doi:10.1017/S0960428609005320

Note the DOI! This article exists on the web, so why can't IPNI give me the DOI? They've gone to a lot of trouble to describe the citation in great detail, but adding the DOI would bring the record into the 21st century and onto the web (the DOI is even printed on the article!).

I think nomenclators need to make a concerted effort to integrate with the digital scientific literature, otherwise they will remain digital backwaters that make the implicit assumption that their users have access to libraries such as that at the Royal Botanic Garden Edinburgh.

For recently published articles there's absolutely no reason not to store the DOI. Finding DOIs retrospectively is a pain, but I need them for my RSS feed (and other projects), so one thing I added a while ago to bioGUID's OpenURL resolver is the ability to search for an article given an arbitrary page. For example,

http://bioguid.info/openurl/?genre=article&title=Edinburgh J. Bot.&volume=66&pages=105

will search various sources (such as CrossRef) to find an article that includes page 105. Now, I just have to have a parser that can make sense of IPNI bibliographic citations...
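As a starting point, here is a minimal sketch of such a parser in Python. It only handles citations shaped like the Begonia example above (abbreviated journal, volume, optional issue, page, year), and the names parse_citation and openurl_query are made up for illustration; this is not the code behind bioGUID. The query parameters are the ones used in the example URL above.

```python
import re
from urllib.parse import urlencode

# Pattern for citations shaped like "Journal Abbrev. VOL(ISSUE): PAGE ... YEAR".
# Assumption: this covers the example above; other IPNI styles need more cases.
CITATION_RE = re.compile(
    r"""^(?P<journal>.+?)\s+        # abbreviated journal title
        (?P<volume>\d+)             # volume number
        (?:\((?P<issue>[^)]+)\))?   # optional issue in parentheses
        :\s*(?P<page>\d+)           # page on which the name appears
        .*?                         # page range, figures, maps and so on
        (?P<year>\d{4})             # first four-digit number after the page
    """,
    re.VERBOSE,
)


def parse_citation(citation):
    """Return a dict of citation parts, or None if the pattern does not match."""
    match = CITATION_RE.match(citation)
    if match is None:
        return None
    return match.groupdict()


def openurl_query(parts, base="http://bioguid.info/openurl/"):
    """Build a query URL with the same parameters as the example above."""
    params = {
        "genre": "article",
        "title": parts["journal"],
        "volume": parts["volume"],
        "pages": parts["page"],
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    citation = "Edinburgh J. Bot. 66(1): 105 (-110; figs. 1, 4-5, map). 2009 [Mar 2009]"
    parts = parse_citation(citation)
    print(parts)
    print(openurl_query(parts))
```

Note that urlencode writes the spaces in the journal title as "+", so the generated URL is an encoded form of the one shown above. Citations to books, or to journals cited by series and part, would need their own patterns.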

Although I'd been thinking of getting the wiki project ready for e-Biosphere '09 as a challenge entry, lately I've been playing with RSS as a complementary, but quicker, way to achieve some simple integration.

I've been playing with RSS on and off for a while, but what reignited my interest was the swine flu timemap I made last week. The neatest thing about the timemap was how easy it was to make. Just take some RSS that is geotagged and you get the timemap (courtesy of Nick Rabinowitz's wonderful Timemap library).

So, I began to think about taking RSS feeds for, say, journals and taxonomic and genomic databases, adding them together, and displaying them using tools such as timemap (see here for an earlier mock up of some GenBank data). Two obstacles are in the way. The first is that not every data source of interest provides RSS feeds. To address this I've started to develop wrappers around some sources, the first of which is ZooBank.

The second obstacle is that integration requires shared content (e.g., tags, identifiers, or localities). Some integration will be possible geographically (for example, adding geotagged sequences and images to a map), but this won't work for everything. So, I need to spend some time trying to link stuff together. In the case of ZooBank there's some scope for this, as ZooBank metadata sometimes includes DOIs, which enable us to link to the original publication, as well as to bookmarking services such as Connotea. I'm aiming to include these links within the feed, as shown in this snippet (see the <link rel="related"...> element):

<entry>
<title>New Protocetid Whale from the Middle Eocene of Pakistan: Birth on Land, Precocial Development, and Sexual Dimorphism</title>
<link rel="alternate" type="text/html" href="http://zoobank.org/urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C"/>
<updated>2009-05-06T18:37:34+01:00</updated>
<id>urn:uuid:c8f6be01-2359-1805-8bdb-02f271a95ab4</id>
<content type="html">Gingerich, Philip D., Munir ul-Haq, Wighart von Koenigswald, William J. Sanders, B. Holly Smith &amp; Iyad S. Zalmout&lt;br/&gt;&lt;a href="http://dx.doi.org/10.1371/journal.pone.0004366"&gt;doi:10.1371/journal.pone.0004366&lt;/a&gt;</content>
<summary type="html">Gingerich, Philip D., Munir ul-Haq, Wighart von Koenigswald, William J. Sanders, B. Holly Smith &amp; Iyad S. Zalmout&lt;br/&gt;&lt;a href="http://dx.doi.org/10.1371/journal.pone.0004366"&gt;doi:10.1371/journal.pone.0004366&lt;/a&gt;</summary>
<link rel="related" type="text/html" href="http://dx.doi.org/10.1371/journal.pone.0004366" title="doi:10.1371/journal.pone.0004366"/>
<link rel="related" type="text/html" href="http://bioguid.info/urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C" title="urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C"/>
</entry>
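For anyone wanting to generate entries like this, here is a rough sketch using Python's standard xml.etree.ElementTree. The function name build_entry and its arguments are invented for illustration and are not taken from the actual wrapper. As in the snippet, the Atom namespace declaration (which would normally sit on the enclosing feed element) is left out.

```python
import xml.etree.ElementTree as ET


def build_entry(title, lsid, doi, authors, updated, entry_id):
    """Build an Atom-style <entry> that carries rel="related" links."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "link", rel="alternate", type="text/html",
                  href="http://zoobank.org/" + lsid)
    ET.SubElement(entry, "updated").text = updated
    ET.SubElement(entry, "id").text = entry_id

    doi_url = "http://dx.doi.org/" + doi
    # The HTML fragment is stored as text; ElementTree escapes the markup
    # and the ampersand when the entry is serialised.
    content = ET.SubElement(entry, "content", type="html")
    content.text = '{}<br/><a href="{}">doi:{}</a>'.format(authors, doi_url, doi)

    # One related link to the publication, one to the LSID resolver.
    ET.SubElement(entry, "link", rel="related", type="text/html",
                  href=doi_url, title="doi:" + doi)
    ET.SubElement(entry, "link", rel="related", type="text/html",
                  href="http://bioguid.info/" + lsid, title=lsid)
    return entry


if __name__ == "__main__":
    entry = build_entry(
        title="New Protocetid Whale from the Middle Eocene of Pakistan",
        lsid="urn:lsid:zoobank.org:pub:8625FB9A-1FC3-43C3-9A99-7A3CDE0DFC9C",
        doi="10.1371/journal.pone.0004366",
        authors="Gingerich, Philip D., B. Holly Smith & Iyad S. Zalmout",
        updated="2009-05-06T18:37:34+01:00",
        entry_id="urn:uuid:c8f6be01-2359-1805-8bdb-02f271a95ab4",
    )
    print(ET.tostring(entry, encoding="unicode"))
```

Letting the XML library do the escaping avoids hand-writing entities inside the type="html" content.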

What I'm hoping is that there will be enough links to create something rather like my Elsevier Challenge entry, but with a much more diverse set of sources.
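On the consuming side, one way to find those links is to index entries from several feeds by their rel="related" URLs and keep the URLs that turn up in more than one feed. The sketch below assumes well-formed Atom feeds that declare the standard Atom namespace; the function names are hypothetical, and this is only one possible approach.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace


def related_links(feed_xml):
    """Yield (entry title, related href) pairs from one Atom feed document."""
    root = ET.fromstring(feed_xml)
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            if link.get("rel") == "related":
                yield title, link.get("href")


def index_by_link(feeds):
    """Map each related href to the (feed name, entry title) pairs citing it."""
    index = defaultdict(list)
    for name, feed_xml in feeds.items():
        for title, href in related_links(feed_xml):
            index[href].append((name, title))
    return index


def shared_links(index):
    """Keep only the hrefs that are cited by more than one feed."""
    return {
        href: hits
        for href, hits in index.items()
        if len({name for name, _ in hits}) > 1
    }
```

Here feeds is a dict from a feed name of your choosing to the feed's XML text. A DOI that appears in both a ZooBank entry and a journal entry would then show up as a key in the result of shared_links.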

