
Tag trees: displaying the taxonomy of names in BHL

I've added a feature to my Biodiversity Heritage Library viewer that should help make sense of the names found on a page. Until now I've displayed them as a list of "tags", which ignores the relations among the names. Based on some code I'd developed for my e-Biosphere 09 challenge entry, I've added a "tag tree" that displays the classification of the names found on a BHL page:

big.png

The idea is that a set of names can make much more sense if you know what kind of organism they are referring to. For example, I don't know what Onetes is, but if I look at BHL page 2298380 I can see that it's an insect:

Onetes.png

The names in gray don't occur on the page, but do occur in the tree that links the names that are on the page (the latter are highlighted in black). The tag tree can be useful for separating out host and parasite, e.g. BHL page 2298491 is about a flea and its mammalian hosts:

flea.png

The tag tree can also flag names that might be mistaken, such as those found on page 2298330:

mistake.png

This page has names of some grasshoppers from Madagascar, as well as the name of a butterfly (Tsaratanana), which seems a little odd. Looking at the text, we discover that "Tsaratanana" is Mont. Tsaratanana, a mountain in Madagascar. It would be fun to develop tools to annotate such cases so that somebody looking for the butterfly won't be presented with this page.

How it works

The inspiration for this tag tree came from several sources. David Remsen has often used an example of finding a fly name in the middle of a book on birds as being of interest, and the NCBI have a subtree view of taxa in a PubMed article. My own tag tree is constructed by finding for each name the ancestor-descendant path in a local, modified copy of the Catalogue of Life database, then assembling those paths into a tree. Because not all the names on a BHL page are in the Catalogue of Life, there may be names that aren't classified. These are simply listed below the tag tree (see image above).
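The path-merging step described above can be sketched in a few lines of JavaScript. This is only an illustration, not the viewer's actual code: the function names `buildTagTree` and `renderTree`, the `onPage` flag, and the example paths are all made up for the sketch, and it assumes each name has already been resolved to a root-to-name path by some lookup against the classification.

```javascript
// Hypothetical helper: merge root-to-name paths into one nested tree.
// Each path is an array of names, from the root down to a name found on the page.
function buildTagTree(paths) {
  const root = { name: "root", onPage: false, children: new Map() };
  for (const path of paths) {
    let node = root;
    path.forEach((name, i) => {
      // Shared ancestors are created once and reused by later paths.
      if (!node.children.has(name)) {
        node.children.set(name, { name, onPage: false, children: new Map() });
      }
      node = node.children.get(name);
      // Only the last element of a path is a name that occurs on the page.
      if (i === path.length - 1) node.onPage = true;
    });
  }
  return root;
}

// Render the tree as indented text; names found on the page are marked with "*".
function renderTree(node, depth = 0, lines = []) {
  for (const child of node.children.values()) {
    lines.push("  ".repeat(depth) + child.name + (child.onPage ? " *" : ""));
    renderTree(child, depth + 1, lines);
  }
  return lines;
}

// Illustrative paths only; a real lookup would query the classification database.
const examplePaths = [
  ["Animalia", "Arthropoda", "Insecta", "Siphonaptera"],
  ["Animalia", "Chordata", "Mammalia", "Rodentia"],
  ["Animalia", "Chordata", "Mammalia"],
];
console.log(renderTree(buildTagTree(examplePaths)).join("\n"));
```

Nodes that are only ancestors keep `onPage` set to false, which corresponds to the gray names in the images; names for which no path is found would never enter the tree and could be listed separately.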

iTaxon screencast

Sadly I won't be at TDWG 2009, at least not in person. However, there is a session on wikis, which may contain this brief screencast of my iTaxon experiments. The screencast was made in haste, but tries to convey some of the ideas behind these experiments, especially the idea that by linking data together we can generate more interesting and rich views of objects such as scientific publications. The screencast starts with the page for The amphibian tree of life.


BHL Viewer now with go faster stripes

One of the more glaring limitations of my BHL viewer described in the previous post is that it can take a while to load all the page thumbnails (there can be hundreds). Given that one of the original motivations for this project was a faster viewer, this kinda sucks. What I'd like to do is load the thumbnails only when I need them, rather than all at once at the start -- in other words I'd like to implement lazy loading.

I'm using the Prototype JavaScript library, and to my delight Bram Van Damme has written lazierLoad, inspired by Lazy Load for jQuery. lazierLoad works by attaching a listener to each image that listens for scroll events -- when the browser window scrolls each image receives a notification event and works out whether it needs to load the image. In theory, all you do is add the lazierLoad JavaScript to your page, and only images that are currently visible will be fetched from the server. I say "in theory" because I needed to tweak the script a little because the thumbnails are inside a DIV element that has its own scrollbar (thanks to the CSS style overflow:auto). Hence I needed to add the listener to this DIV, and compute coordinates for the image taking the DIV into account. Like most things, easy once you know how (translation: after numerous failed attempts, and the occasional "doh!", it seems to work).
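The idea of listening on the scrolling DIV rather than the window can be sketched with plain DOM calls. This is not lazierLoad's code and does not use Prototype: the names `isInView` and `lazyLoadIn`, and the convention of keeping the real thumbnail URL in a `data-src` attribute, are assumptions made for the example.

```javascript
// Pure geometry check: does the image's box overlap the container's visible
// box vertically? Both arguments are {top, bottom} objects, such as those
// returned by getBoundingClientRect(), so both are in viewport coordinates.
function isInView(imgRect, containerRect) {
  return imgRect.bottom >= containerRect.top && imgRect.top <= containerRect.bottom;
}

// Hypothetical wiring: thumbnails keep their real URL in data-src until needed.
function lazyLoadIn(container) {
  const pending = Array.from(container.querySelectorAll("img[data-src]"));
  function check() {
    const box = container.getBoundingClientRect();
    // Walk backwards so that removing loaded images doesn't skip any entries.
    for (let i = pending.length - 1; i >= 0; i--) {
      const img = pending[i];
      if (isInView(img.getBoundingClientRect(), box)) {
        img.src = img.dataset.src; // start fetching the thumbnail
        img.removeAttribute("data-src");
        pending.splice(i, 1);
      }
    }
  }
  // Listen on the scrolling DIV itself, not on the window.
  container.addEventListener("scroll", check);
  check(); // load whatever is visible before any scrolling happens
}
```

Comparing the two bounding rectangles sidesteps the question of which element the image's offsets are relative to, which is one way of taking the DIV into account. In a page, something like `lazyLoadIn(document.getElementById("thumbnails"))` would start it, where the `thumbnails` id is again only a placeholder.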

You can see lazy loading in action if you view a BHL item, such as Item 26140. Note that this implementation of lazy loading doesn't work in Safari, much to my chagrin (it's my default browser). It works fine in Firefox.