Search this keyword

Visualising taxonomic classifications using SpaceTrees

The problem of displaying large taxonomic classifications on a web page continues to be an on again-off again obsession. My latest experiment makes use of Nicolas Garcia Belmonte's wonderful JavaScript Infovis Toolkit (JIT), which provides implementations of classic visualisations such as treemaps, hyperbolic trees, and SpaceTrees.

SpaceTrees were developed at Maryland's HCIL lab, and that lab has applied them to biodiversity informatics. The LepTree project has also used them (see LepTaxonTree). I've not been a huge fan, mainly because the existing implementation is a stand-alone Java program, which somewhat limits it's utility. But JIT changes all that.

To get a sense of whether SpaceTrees would be useful, I took Belmonte's second SpaceTree example as a starting point. In this example, nodes are created on demand (rather than loading the entire tree into memory). It proved relatively straightforward (after getting my head around making Ajax requests using Mootools) to modify the example to load nodes from a local copy of the Catalogue of Life 2008 classification.



I've put a live version of the Catalogue of Life SpaceTree up at http://bioguid.info/demos/spacetree. It doesn't do much, beyond display the tree, together with some basic information about the node. But I think it shows the power of Javacsript to create pleasing visualisations, and the potential of SpaceTrees as a simple tool to browse large taxonomic classifications.

ChrisFreeland.com: #ebio09, silverbacks, & haiku

Chris Freeland has written a thoughtful summary of his experiences of the two-day closed session to create a road map for biodiversity informatics, entitled #ebio09, silverbacks, & haiku.

Taxonomy on a hard disk

This post is likely to seem somewhat off the wall, given the rush to getting everything in the cloud, but it's Friday, so let's give it a whirl.

One idea I've been toying with is dispensing with relational databases, wikis, etc. and just storing taxonomic data using files and folders on a disk. There are several reasons for this:

  • File system naturally enforces hierachy

  • There are existing systems for putting files and folders under version control (e.g., CVS, Subversion, git)

  • Native text and image editors handily beat web-based ones

  • Some file systems have great tools for searching on metadata (e.g., "smart folders" and Spotlight on Mac OS X)

  • Some of the visualisations that we would like for classifications (such as treemaps) already exist in very polished form for viewing file systems

By way of background, I've been prompted to think along these lines by David Shorthouse's observation that we could place a taxonomic hierachy under version control (e.g., github) and deal with changes/multiple versions that way. I've also been inspired by tools such as couchdb, which is a schema-less database that one can talk to directly via HTTP. This reflects a trend where people are starting to exploit the untapped power in some basic, well-known technologies (such as the HTTP protocol), avoiding the need for lots of middle-ware in the process (why write stuff in Java/Ruby/PHP, etc. when HTTP GET, PUT, POST, etc. cover the bases?). Another inspiration is Dropbox, which enables replication of files across multiple machines and the web. The web interface to Dropbox is very clean, and essentially mirrors the local folder structure.

So, in some ways this probably sounds silly, and closely resembles the naive way many of us started making digital versions of taxonomies, and it will have many database people rolling their eyes and muttering about "data consistency" and "queries". But, a key thing to remember is that the file system is a database that resides under a graphical user interface, and it maintains some forms of consistency that classical relational databases are poor at handling. For example, file systems enforce hierarchical consistency (if I move a folder to another folder, all the files and folders below that folder move as well). Of course, we can program this with a relational database, but our track record in doing this is pretty miserable. I've found inconsistencies in versions of ITIS (haven't checked recently), and last years' Catalogue of Life database had all sorts of orphans lurking in the tree table.

Then there's the GUI. If I write a taxonomic database in the classical way, I need to write code to talk to the database, edit records, support user authentication, data versioning, etc. If I use the file system, I get this pretty much for free. Authentication? It's called the login screen. Versioning? I put it in a public repository like Google Code or github, and that takes care of that (plus I get online authentication for free). Editing? Well, I can drag and drop items onto folders, and I can open them in native editors.

What I envisage is replicating a taxonomic hierarchy on disk, and representing key-value pairs of attributes (such as taxon name authorship, bibliographic details) as text files where the name of the file is the key (e.g., publishedIn) and the content of the file is the value (e.g., doi:10.1590/S0101-81752005000300004). I could add images and PDFs, and the neat thing is that they have lots of useful metadata embedded inside (where, arguably, it belongs).

I'm also toying with the idea of using symbolic links (Windows users, look away now) to represent relationships such as basionym links to original names.

This is all a bit half-baked at present, but it seems worth pursing. One could argue that having a full taxonomic hierachy is overkill (and raises the issue of which one to use), but binomial names are themselves hierachical (species epithet nested inside genus name), so we need some degree of hierarchy anyway. I like the idea that copying a folder called "behreae" in the folder "Pinnixa" and placing the copy under "Austinixa", then within Austinixa/behreae adding a symbolic link to Pinnixa/behreae pretty much takes care of synonomy. I also like the idea that one could download an entire taxonomy, and using just the native tools on your computer, edit and annotate it, then merge changes with a remote copy. It makes mamnaging the data little different from writing code.

In practise we'll want to add some things. It would be nice to have a web interface for browsing, but this could be as trivial as having a script that read the contents of a folder, display folders as HTML links, and list the files (keys) and their contents (values) in the web page.

Perhaps this is a little silly, but I like the idea of having data on my machine that is trivially easy to edit. I also like the idea of getting functionality for free, rather than having to invent it from scratch.