
Towards a biogeographic search engine

We all have a "past" that we might not advertise widely, and my past includes flirting with panbiogeography. Indeed my PhD thesis hdl:2292/1999 is entitled "Panbiogeography: a cladistic approach." Shortly after graduating I moved on to host-parasite cospeciation and the gene tree/species tree problem ("reconciled trees", see Katz et al. http://dx.doi.org/10.1093/sysbio/sys026 for a recent example of this approach), but part of me misses the glory days of vicariance, dispersal, and panbiogeography.

One thing that strikes me is how little use large-scale historical biogeography makes of GBIF data. One of the things that made Croizat's panbiogeography so interesting was the way he exposed similar distribution patterns in unrelated groups of organisms. He did this by hand, producing map after map, some embellished with all manner of annotations ("gates", "nodes", "massings", etc.). In some ways, Croizat was an early data miner. Now that we are awash in distributional data, where are the people revisiting global-scale historical patterns? In particular, wouldn't it be cool to have a biogeographic search engine that could pull out taxa with particular distribution patterns that we could then analyse?

For example, while working on a project to map taxonomic names to literature and genomics data, I embedded a widget to display GBIF maps. Every so often I come across taxa that have the classic "Gondwana" distribution pattern. For example, below is a map from GBIF for stoneflies of the family Notonemouridae.



Below is a map for the Notonemouridae using an orthographic projection (see earlier post for details):

Notonemouridae
Another family of stoneflies, the Gripopterygidae, shows a similar pattern:

Gripopterygidae

What I'd like is to be able to query a database like GBIF for patterns such as these Gondwanic distributions, then be able to pull out associated phylogenetic information (e.g., via sequences in GenBank) so that we could determine the antiquity of these patterns, and whether they are consistent with geological models. We could begin to do large-scale testing of biogeographic hypotheses in a (semi-)automated way. At present we generally rely on a few well-studied examples that are either broadly consistent with
Bocxlaer, I. V., Roelants, K., Biju, S. D., Nagaraju, J., & Bossuyt, F. (2006). Late Cretaceous Vicariance in Gondwanan Amphibians. (M. Hofreiter, Ed.) PLoS ONE, 1(1), e74. doi:10.1371/journal.pone.0000074.t002

or contradict
Cook, L. G., & Crisp, M. D. (2005). Not so ancient: the extant crown group of Nothofagus represents a post-Gondwanan radiation. Proceedings of the Royal Society B: Biological Sciences, 272(1580), 2535–2544. doi:10.1098/rspb.2005.3219

the hypothesis that the history of biota of the southern hemisphere has been largely structured by the break-up of Gondwana.

A first step might be to index distributions at, say, family level and above, and provide a series of polygons representing different distribution patterns. We then search for distributions that are largely concordant with those patterns, and query GenBank (or TreeBASE) for sequences (or phylogenies) for those taxa. We then ask the questions "how old are these taxa?" and "what biogeographic histories do they have?"
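To make the "largely concordant" step a little more concrete, here is a minimal Python sketch of how occurrence points for a taxon might be scored against a set of pattern polygons. Everything in it is an assumption for illustration: the function names, the scoring rule (fraction of records inside the pattern, plus the number of regions occupied), and especially the crude rectangular "Gondwana" boxes, which are not real biogeographic regions. It also treats longitude and latitude as planar coordinates and ignores polygons that cross the antimeridian.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (longitude, latitude) in decimal degrees
Polygon = List[Point]         # vertices in order, not closed


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray going east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def score_pattern(occurrences: List[Point],
                  pattern: Dict[str, Polygon]) -> Tuple[float, int]:
    """Return (fraction of records inside the pattern, number of regions occupied)."""
    if not occurrences:
        return 0.0, 0
    hits = 0
    occupied = set()
    for record in occurrences:
        for region, polygon in pattern.items():
            if point_in_polygon(record, polygon):
                hits += 1
                occupied.add(region)
                break  # count each record once
    return hits / len(occurrences), len(occupied)


# Hypothetical, very rough boxes standing in for a "Gondwanan" pattern.
GONDWANA_SKETCH: Dict[str, Polygon] = {
    "southern South America": [(-76, -56), (-53, -56), (-53, -30), (-76, -30)],
    "southern Africa": [(15, -35), (33, -35), (33, -22), (15, -22)],
    "Australasia": [(112, -48), (179, -48), (179, -10), (112, -10)],
}

# Made-up occurrence points, not real GBIF records.
example_records = [(-72.0, -40.0), (20.0, -33.0), (147.0, -42.0), (10.0, 50.0)]
fraction_inside, regions_occupied = score_pattern(example_records, GONDWANA_SKETCH)
```

A taxon would count as a candidate only if most of its records fall inside the pattern and more than one region is occupied; the thresholds for "most" are something you would have to tune against real data.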

Touching the tree of life

Prompted by a conversation with Vince Smith at the recent Online Taxonomy meeting at the Linnean Society in London I've been revisiting touch-based displays of large trees. There are a couple of really impressive examples of what can be done.

Perceptive Pixel


I've blogged about this before, but came across another video that better captures the excitement of touch-based navigation of a taxonomy. Jeff Han of Perceptive Pixel (recently acquired by Microsoft) demos browsing an animal classification. The underlying visualisation is fairly straightforward, but the speed and ease with which you can interact with it clearly make it fun to use.

DeepTree



DeepTree comes from the Life on Earth lab, and there's a paper coming out by @blockflorian and colleagues (I was reminded of this project by @treevisproject):



For technical details on the layout algorithm see https://lifeonearth.seas.harvard.edu/downloads/DeepTree.pdf. Below is a video of it in use:



Both of these are really nice, but what I really want is to have this on my iPad…

Decoding Nature's ENCODE iPad app - OMG it's full of ePUB

Encode
The release of the ENCODE (ENCyclopedia Of DNA Elements) project has generated much discussion (see Fighting about ENCODE and junk). Perhaps perversely, I'm more interested in the way Nature has packaged the information than the debate about how much of our DNA is "junk."

Nature has a website (http://www.nature.com/encode/) that demonstrates the use of "threads" to navigate through a set of papers. Instead of having to read every paper you can pick a topic and Nature has collected a set of extracts on that topic (such as a figure and its caption) from the relevant papers and linked them together as a thread. Here is a video outlining the rationale behind threads.


Threads can be viewed on Nature's web site, and also in the iPad app. The iPad app is elegant, and contains full text for articles from Nature, Genome Research, Genome Biology, and BMC Genetics. Despite being from different journals, the text and figures from these articles are displayed in the same format in the app. Curious as to how this was done, I "disassembled" the iPad app (see Extract and Explore an iOS App in Mac OS X for how to do this). If you've downloaded the app on your iPad and synced the iPad with your Mac, then the apps are in the "iTunes/iTunes Media/Mobile Applications" folder inside your "Music" folder. The app contains a file called encode.zip, and inside that archive are the articles and threads, all as ePub files. ePub is the format used by a number of book-reading apps, such as Apple's iBooks. Nature has a lot of experience with ePub, using it in their iPhone and iPad journal apps (see my earlier article on these apps, and my web-based clone for more details).

ePub has several advantages in this context over, say, PDFs. Because ePub is essentially HTML, the text and images can be reflowed, and it is possible to style the content consistently (imagine how much clunkier things would have looked if the app had used PDFs of the articles, each in a different journal's house style). Having the text in ePub also makes creating threads easy: you simply extract the relevant chunks and combine them into a new ePub file.
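As a rough illustration of why this is easy, here is a minimal Python sketch of the "extract the relevant chunks and combine them" idea. It relies only on the fact that an ePub is a zip archive containing XHTML documents. The function names, the choice of pulling out `figure` elements, and the file path in the usage note are all my assumptions; I have no idea how Nature actually builds its threads, and the sketch stops at producing a single XHTML page rather than packaging a complete ePub (which also needs a mimetype file, a container file, and a package document).

```python
import html
import xml.etree.ElementTree as ET
import zipfile
from typing import List

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialise XHTML elements with a default namespace rather than an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def extract_chunks(epub_path: str, tag: str = "figure") -> List[str]:
    """Return every <tag> element found in the XHTML files of an ePub, as strings."""
    chunks = []
    with zipfile.ZipFile(epub_path) as epub:
        for name in sorted(epub.namelist()):
            if not name.lower().endswith((".xhtml", ".html")):
                continue
            try:
                root = ET.fromstring(epub.read(name))
            except ET.ParseError:
                # Skip files that are not well-formed XML (e.g. undeclared entities).
                continue
            for element in root.iter(f"{{{XHTML_NS}}}{tag}"):
                element.tail = None  # drop text that follows the element
                chunks.append(ET.tostring(element, encoding="unicode"))
    return chunks


def build_thread(title: str, chunks: List[str]) -> str:
    """Combine extracted chunks into one XHTML page (not yet a packaged ePub)."""
    body = "\n".join(chunks)
    return (
        f'<html xmlns="{XHTML_NS}"><head><title>{html.escape(title)}</title></head>'
        f"<body>\n{body}\n</body></html>"
    )
```

Given a hypothetical file such as `article.epub`, `build_thread("My thread", extract_chunks("article.epub"))` would return a single page containing all of that article's figures. A real thread would also need each chunk to link back to its source article.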

Threads are an interesting approach, particularly as they cut across the traditional boundaries of individual articles to create a kind of "mash up." Of course, in the ENCODE app these are preselected for you; you can't create your own thread. But you could imagine having an app that would enable you not just to collect the papers relevant to a topic (as we do with bibliographic software), but also to extract the relevant chunks and create a personalised mash up across papers from multiple journals, each linked back to the original article (much like Ted Nelson envisioned for the Xanadu project). It will be interesting to see whether thread-like approaches get more widely adopted. Whatever happens, Nature are consistently coming up with innovative approaches to displaying and navigating the scientific literature.