
Finding scientific articles in a large digital archive: BioStor and the Biodiversity Heritage Library

Yesterday I uploaded a manuscript to Nature Precedings that describes the inner workings of BioStor. The title is "Finding scientific articles in a large digital archive: BioStor and the Biodiversity Heritage Library", and you can grab it here: hdl:10101/npre.2010.4928.1.

Manuscripts describing databases are usually pretty turgid affairs, and this isn't an exception, despite my attempts to spice it up with the tale of Leviathan, oops, Livyatan (see doi:10.1038/nature09381 and Wikipedia). Plus, I can't escape the thought that BioStor would have been a lot more fun to write if I'd used a key-value database like CouchDB. I fear this is often the way of things. By the time it comes to writing something up, you realise that if you could start over you'd do it rather differently.

BHL and the iPad

@elyw I'd leave bookmarking to 3rd party, e.g. Mendeley. #bhlib specific issues incl. displaying DjVu files, and highlighting taxon names



Quick mock-up of a possible BHL iPad app (made using OmniGraffle), showing a paper from BioStor (http://biostor.org/reference/50335). Idea is to display one scanned page at a time, with the taxonomic names on the page being clickable (for example, the user might get a list of other BHL content for that name). To enable quick navigation, all the pages in the document being viewed are displayed in a scrollable gallery below the main page.

bhlipad.jpg

Key to making this happen is being able to display DjVu files in a sensible way, maybe building on DjVu XML to HTML. Because BHL content is scanned, it makes sense to treat content as pages. We could extract OCR text and display that as a continuous block of text, but the OCR is sometimes pretty poor, and we'd also have to parse the text and interpret its structure (e.g., this is the title, these are section headings, etc.), and that's going to be hard work.
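To make the page-based approach a bit more concrete, here is a minimal Python sketch of turning one page of DjVu XML into HTML, with each OCR word positioned over the spot it occupies in the scan. It is not code from BioStor or BHL. It assumes the page is a single `OBJECT` element with `width` and `height` attributes, and that each `WORD` carries `coords` in the order left, bottom, right, top (check this against your own files). The function name `djvu_page_to_html`, the `taxon_names` set, the CSS class names and the `scale` factor are all made up for illustration, and matching names one word at a time is a simplification, since a binomial spans two words.

```python
import xml.etree.ElementTree as ET
from html import escape


def djvu_page_to_html(page_xml, taxon_names=frozenset(), scale=0.25):
    """Render one DjVu XML page (an OBJECT element) as positioned HTML words.

    Assumption: WORD coords are "left,bottom,right,top" in scan pixels,
    measured from the top-left corner of the page.
    """
    page = ET.fromstring(page_xml)
    width = int(page.get("width")) * scale
    height = int(page.get("height")) * scale
    parts = [
        '<div class="page" style="position:relative;'
        f'width:{width:.0f}px;height:{height:.0f}px">'
    ]
    for word in page.iter("WORD"):
        text = (word.text or "").strip()
        if not text or not word.get("coords"):
            continue
        # Only the first four values are used; any extra value is ignored.
        left, bottom, right, top = [int(v) for v in word.get("coords").split(",")[:4]]
        style = (
            "position:absolute;"
            f"left:{left * scale:.0f}px;top:{top * scale:.0f}px;"
            f"width:{(right - left) * scale:.0f}px;"
            f"height:{(bottom - top) * scale:.0f}px"
        )
        # Hypothetical hook: words found in taxon_names get an extra class
        # so the app can make them clickable.
        css = "word taxon" if text in taxon_names else "word"
        parts.append(f'<span class="{css}" style="{style}">{escape(text)}</span>')
    parts.append("</div>")
    return "\n".join(parts)
```

The spans could sit transparently on top of the page image, so the reader sees the scan rather than the (sometimes poor) OCR text, while the words remain touchable.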

Touching citations on the iPad

Quick demo of the mockup I alluded to in the previous post. Here's a screen shot of the article "PhyloExplorer: a web server to validate, explore and query phylogenetic trees" (doi:10.1186/1471-2148-9-108) displayed as a web app on the iPad. You can view this at http://iphylo.org/~rpage/ipad/touch/ (you don't need an iPad, although it does work rather better on one).

touch.png
I've taken the XML for the article and redisplayed it as HTML, with most of the citations highlighted in blue. If you touch one (or click on it if you're using a desktop browser) then you'll see a popover with some basic bibliographic details. For some Open Access papers I've extracted thumbnails of the figures, such as for "PhyloFinder: an intelligent search engine for phylogenetic tree databases" (doi:10.1186/1471-2148-8-90), shown above (and in more detail below):

popover.png
The idea is to give the reader a sense of what the paper is about, beyond what can be gleaned from just the title and authors. The idea was inspired by the Biotext search engine from Marti Hearst's group, as well as Elsevier's "graphical abstract" noted by Alex Wild (@Myrmecos).
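For anyone wanting to try the XML-to-HTML step, the following Python sketch shows one way to turn in-text citations into touchable links. It is not the code behind the demo, and the markup is an assumption: it supposes citations look like JATS-style `<xref ref-type="bibr" rid="...">` elements inside a paragraph, which may not match the XML the publisher actually supplies. The function name `paragraph_to_html`, the `refs` lookup and the `citation` class are invented for the example, and a plain `title` attribute stands in for the popover.

```python
import xml.etree.ElementTree as ET
from html import escape


def paragraph_to_html(p, refs):
    """Convert one paragraph element to HTML, turning citations into links.

    p    -- an ElementTree element for the paragraph
    refs -- dict mapping reference ids (e.g. "B5") to a short description
    Assumption: citations are <xref ref-type="bibr" rid="..."> elements.
    """
    out = [escape(p.text or "")]
    for child in p:
        label = escape("".join(child.itertext()))
        if child.tag == "xref" and child.get("ref-type") == "bibr":
            rid = child.get("rid", "")
            details = escape(refs.get(rid, ""))
            out.append(
                f'<a class="citation" href="#{escape(rid)}" '
                f'title="{details}">{label}</a>'
            )
        else:
            # Other inline markup is flattened to its text in this sketch.
            out.append(label)
        # Text following the child element belongs to the paragraph.
        out.append(escape(child.tail or ""))
    return "<p>" + "".join(out) + "</p>"
```

A stylesheet rule on the `citation` class would give the blue highlighting, and a small script could replace the `title` tooltip with a proper popover containing the bibliographic details and any figure thumbnails.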

Here's a quick screencast showing it "live":



The next step is to enable the reader to then go and read this paper within the iPad web-app (doh!), which is fairly trivial to do, but it's Friday and I'm already late...