
Reading the Biodiversity Heritage Library using Readmill

tl;dr Readmill might be a great platform for shared annotation and correction of Biodiversity Heritage Library content.

Thinking about access to the taxonomic literature, I started revisiting previous ideas. One is DeepDyve (see DeepDyve - renting scientific articles). Imagine not having to pay large sums for an article, but being able to rent it. Yes, open access would be great, but ultimately it's all a question of money (who pays and when), and the challenge is to find the mix of models that encourages people to digitise the relevant literature. Instead of publishers insisting we pay US$30 for an article, how about renting it for the short time we actually need to read it?

Another model is unglue.it, a Kickstarter-like company that seeks to raise funds to digitise e-books and make them freely available. unglue.it runs campaigns where people pledge donations, and if sufficient pledges are made the book's rights-holder has the book digitised and released DRM-free.

Looking at unglue.it I stumbled across Readmill, "a curious community of readers, highlighting and sharing the books they love." Readmill has an iPad app where you can highlight passages of text and add your own annotation. These annotations can be shared, and multiple people can read and comment on the same book. Imagine doing this on BHL content. You could highlight parts of the text where the OCR has failed, and provide a correction. You could highlight taxonomic names that automatic parsers have missed, geographic localities, cited literature, etc. All within a nice, social app.

Even better, Readmill has an API. You can retrieve highlights and the comments on those highlights. So, if someone flags a sentence as mangled OCR and provides a correction, that correction could be harvested and fed back to, say, BHL. These corrections could be used to improve searches, as well as the text delivered when generating searchable PDFs, etc.
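To make the harvesting idea concrete, here is a minimal sketch of how flagged corrections might be applied to OCR text once the highlights have been retrieved. I haven't tied it to the actual Readmill API: the `Highlight` fields, the `OCR:` comment convention, and the sample text are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Highlight:
    """A reader's highlight. Field names are hypothetical, not Readmill's schema."""
    start: int    # character offset where the highlighted span begins
    end: int      # character offset just past the highlighted span
    comment: str  # the reader's comment on the highlight


# Assumed convention: a comment starting with this prefix is an OCR correction.
CORRECTION_PREFIX = "OCR:"


def apply_corrections(text: str, highlights: list[Highlight]) -> str:
    """Return the text with every flagged OCR correction applied."""
    corrections = [h for h in highlights if h.comment.startswith(CORRECTION_PREFIX)]
    # Work from the end of the text backwards so earlier offsets stay valid.
    for h in sorted(corrections, key=lambda h: h.start, reverse=True):
        replacement = h.comment[len(CORRECTION_PREFIX):].strip()
        text = text[:h.start] + replacement + text[h.end:]
    return text


def highlight_for(text: str, passage: str, comment: str) -> Highlight:
    """Build a highlight covering the first occurrence of a passage (demo helper)."""
    start = text.index(passage)
    return Highlight(start, start + len(passage), comment)


# Made-up example of mangled OCR and the highlights readers might leave on it.
ocr_text = "Notonemoura is a genus of stonefiies from New Zeaiand."
highlights = [
    highlight_for(ocr_text, "stonefiies", "OCR: stoneflies"),
    highlight_for(ocr_text, "New Zeaiand", "OCR: New Zealand"),
    highlight_for(ocr_text, "Notonemoura", "Is this the type genus?"),
]
corrected = apply_corrections(ocr_text, highlights)
print(corrected)
```

Ordinary comments (like the question about the type genus) are left alone; only highlights that follow the assumed correction convention change the text. A real pipeline would also need to handle overlapping or conflicting corrections, which this sketch ignores.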

You can even add highlights via the API, so we could upload an ePub book and then add all the taxonomic names found by uBio or NetiNeti, enabling users to see which bits of text are probably names, correcting any mistakes along the way. Instead of giving readers a blank canvas, they could already have annotations to start with.
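As a rough sketch of the pre-annotation step, the code below turns a list of names into character spans that could then be sent to an annotation service as highlights. The list of names stands in for the output of a name finder such as uBio or NetiNeti (I'm not calling either service here), and the function name and sample text are hypothetical.

```python
import re


def find_name_spans(text, names):
    """Return a sorted list of (start, end, name) for each occurrence of each name."""
    spans = []
    for name in names:
        # Match whole words only, and escape the name so it is treated literally.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), name))
    return sorted(spans)


# Made-up page text and a made-up list of names from a name finder.
page_text = "The Notonemouridae and Gripopterygidae are families of stoneflies."
found_names = ["Gripopterygidae", "Notonemouridae"]

spans = find_name_spans(page_text, found_names)
for start, end, name in spans:
    print(start, end, page_text[start:end])
```

Each span is just a pair of offsets plus the name, so a reader who spots a false positive, or a name the parser missed, only has to delete or add one highlight.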

Building an app from scratch to read and annotate BHL content would be a major undertaking. From my cursory initial look I wonder if Readmill might just provide the platform we need to clean up and annotate key parts of the BHL corpus?

Towards a biogeographic search engine

We all have a "past" that we might not advertise widely, and my past includes flirting with panbiogeography. Indeed my PhD thesis hdl:2292/1999 is entitled "Panbiogeography: a cladistic approach." Shortly after graduating I moved on to host-parasite cospeciation and the gene tree/species tree problem ("reconciled trees", see Katz et al. http://dx.doi.org/10.1093/sysbio/sys026 for a recent example of this approach), but part of me misses the glory days of vicariance, dispersal, and panbiogeography.

One thing that strikes me is how little use large-scale historical biogeography makes of GBIF data. One of the things that made Croizat's panbiogeography so interesting was the way he exposed similar distribution patterns in unrelated groups of organisms. He did this by hand, producing map after map, some embellished with all manner of annotations ("gates", "nodes", "massings", etc.). In some ways, Croizat was an early data miner. Now that we are awash in distributional data, where are the people revisiting global-scale historical patterns? In particular, wouldn't it be cool to have a biogeographic search engine that could pull out taxa with particular distribution patterns that we could then analyse?

For example, while working on a project to map taxonomic names to literature and genomics data, I embedded a widget to display GBIF maps. Every so often I come across taxa that have the classic "Gondwana" distribution pattern. For example, below is a map from GBIF for stoneflies of the family Notonemouridae.



Below is a map for the Notonemouridae using an orthographic projection (see earlier post for details):

Notonemouridae
Another family of stoneflies, the Gripopterygidae, shows a similar pattern:

Gripopterygidae

What I'd like is to be able to query a database like GBIF for patterns such as these Gondwanic distributions, then be able to pull out associated phylogenetic information (e.g., via sequences in GenBank) so that we could determine the antiquity of these patterns, and whether they are consistent with geological models. We could begin to do large-scale testing of biogeographic hypotheses in a (semi-)automated way. At present we generally rely on a few well-studied examples that are either broadly consistent with
Bocxlaer, I. V., Roelants, K., Biju, S. D., Nagaraju, J., & Bossuyt, F. (2006). Late Cretaceous Vicariance in Gondwanan Amphibians. (M. Hofreiter, Ed.) PLoS ONE, 1(1), e74. doi:10.1371/journal.pone.0000074

or contradict
Cook, L. G., & Crisp, M. D. (2005). Not so ancient: the extant crown group of Nothofagus represents a post-Gondwanan radiation. Proceedings of the Royal Society B: Biological Sciences, 272(1580), 2535–2544. doi:10.1098/rspb.2005.3219

the hypothesis that the history of biota of the southern hemisphere has been largely structured by the break-up of Gondwana.

A first step might be to index distributions at, say, family level and above, and provide a series of polygons representing different distribution patterns. We then search for distributions that are largely concordant with those patterns, and query GenBank (or TreeBASE) for sequences (or phylogenies) for those taxa. We then ask the questions "how old are these taxa?" and "what biogeographic histories do they have?"
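Here is a minimal sketch of what scoring a distribution against a pattern could look like. To keep it short I've replaced real polygons with crude latitude/longitude boxes, and the box coordinates, region names, scoring scheme, and occurrence points are all my own rough assumptions for illustration, not GBIF data or an established method.

```python
# Each region is a crude box: (min_lat, max_lat, min_lon, max_lon).
# These boxes are rough, assumed stand-ins for real polygons.
GONDWANA_PATTERN = {
    "southern South America": (-56, -30, -76, -53),
    "southern Africa": (-35, -15, 10, 41),
    "Australia": (-44, -10, 112, 154),
    "New Zealand": (-48, -34, 166, 179),
}


def region_of(lat, lon, pattern):
    """Return the name of the first region containing the point, or None."""
    for name, (min_lat, max_lat, min_lon, max_lon) in pattern.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return name
    return None


def concordance(occurrences, pattern):
    """Score how well a set of (lat, lon) records fits a pattern.

    Returns (fraction of records inside the pattern,
             fraction of the pattern's regions that are occupied).
    """
    if not occurrences:
        return 0.0, 0.0
    hits = [region_of(lat, lon, pattern) for lat, lon in occurrences]
    inside = [hit for hit in hits if hit is not None]
    return len(inside) / len(occurrences), len(set(inside)) / len(pattern)


# Made-up occurrence records for a hypothetical family.
records = [
    (-41.3, 174.8),  # New Zealand
    (-33.9, 18.4),   # southern Africa
    (-42.9, 147.3),  # Australia (Tasmania)
    (-41.1, -71.3),  # southern South America
    (51.5, -0.1),    # outside the pattern
]
inside_fraction, region_fraction = concordance(records, GONDWANA_PATTERN)
print(inside_fraction, region_fraction)
```

Using two numbers rather than one is deliberate: a taxon found only in Australia would have every record inside the pattern but occupy just one of its regions, and so shouldn't count as a "Gondwanan" distribution. Taxa scoring high on both would be the candidates to look up in GenBank or TreeBASE.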

Touching the tree of life

Prompted by a conversation with Vince Smith at the recent Online Taxonomy meeting at the Linnean Society in London I've been revisiting touch-based displays of large trees. There are a couple of really impressive examples of what can be done.

Perceptive Pixel


I've blogged about this before, but came across another video that better captures the excitement of touch-based navigation of a taxonomy. Jeff Han of Perceptive Pixel (recently acquired by Microsoft) demos browsing an animal classification. The underlying visualisation is fairly straightforward, but the speed and ease with which you can interact with it clearly make it fun to use.

DeepTree



DeepTree comes from the Life on Earth lab, and there's a paper coming out by @blockflorian and colleagues (I was reminded of this project by @treevisproject):



For technical details on the layout algorithm see https://lifeonearth.seas.harvard.edu/downloads/DeepTree.pdf. Below is a video of it in use:



Both of these are really nice, but what I really want is to have this on my iPad…