Using a zoomable treemap to visualise a taxonomic classification

One visualisation method I keep coming back to is the treemap. Each time I experiment with them I learn a little bit more, but I usually end up abandoning them (with the exception of using quantum treemaps to display bibliographic data). But they keep calling me back.

My latest experiment builds on some earlier thoughts on quantum treemaps, but tackles two issues that have kept bugging me. The first is that quantum treemaps are limited to hierarchies that are only two levels deep (e.g., family → genus → species). Unlike regular treemaps, where you are slicing and dicing a rectangle of predetermined size, when you construct a quantum treemap you don't know how big it will be until you've made it: you want every item in the hierarchy to be displayed at the same size, and fitting them in may require you to tweak the size of the treemap. Given that taxonomic classifications have more than two levels, this is a problem. One approach is to construct quantum treemaps for the lower parts of the classification, then pack those into a larger rectangle. This is an instance of the packing problem. After Googling for a bit I came across this code for packing rectangles, which was easy to follow and gave reasonable results.
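To make the packing step concrete, here is a minimal sketch in Python. It is not the code linked above; it uses a simple "shelf" heuristic (fill a row, then start a new one) as a stand-in, and the function name, labels, and rectangle sizes are all made up for illustration. Each input rectangle stands for one already-built quantum treemap.

```python
from typing import List, Tuple

Rect = Tuple[str, float, float]                    # (label, width, height)
Placed = Tuple[str, float, float, float, float]    # (label, x, y, width, height)


def shelf_pack(rects: List[Rect], max_width: float) -> Tuple[List[Placed], float, float]:
    """Pack rectangles left to right on horizontal shelves.

    A new shelf is started whenever the next rectangle would cross max_width.
    Returns the placements plus the width and height of the enclosing rectangle.
    """
    placed: List[Placed] = []
    x = y = 0.0
    shelf_height = 0.0
    total_width = 0.0
    # Tallest first tends to waste less space on each shelf.
    for label, w, h in sorted(rects, key=lambda r: r[2], reverse=True):
        if x > 0 and x + w > max_width:
            # Start a new shelf below the current one.
            y += shelf_height
            x = 0.0
            shelf_height = 0.0
        placed.append((label, x, y, w, h))
        x += w
        shelf_height = max(shelf_height, h)
        total_width = max(total_width, x)
    return placed, total_width, y + shelf_height


# Hypothetical sizes for four lower-level treemaps (arbitrary units).
treemaps = [("A", 6, 4), ("B", 5, 5), ("C", 3, 2), ("D", 2, 2)]
layout, width, height = shelf_pack(treemaps, max_width=10)
for label, x, y, w, h in layout:
    print(label, x, y, w, h)
```

A shelf packer is about the simplest thing that works. It leaves gaps when heights vary a lot, so a smarter packer would be worth the effort for a large classification.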

The second problem is that I want the treemap to be interactive. I want to be able to zoom in and out and navigate around the treemap. After more Googling, I came across the Zoomooz.js library which makes web page elements zoom (for a pretty mind-blowing example of what can be done see impress.js), but I decided I want to work with SVG. After playing with examples from Keith Wood's jQuery SVG plugin I started to get the hang of creating zoomable visualisations in SVG.
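One way to think about zooming in SVG is as changing the `viewBox` attribute, which takes four numbers: min-x, min-y, width and height. The sketch below is in Python rather than the jQuery SVG plugin, and the helper names and coordinates are hypothetical. It only computes the `viewBox` values you would set for each frame of an animated zoom onto one rectangle of the treemap.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, width, height)


def view_box_for(rect: Box, padding: float = 0.05) -> Box:
    """Return a viewBox that frames rect with a margin (a fraction of its size)."""
    x, y, w, h = rect
    pad_x, pad_y = w * padding, h * padding
    return (x - pad_x, y - pad_y, w + 2 * pad_x, h + 2 * pad_y)


def interpolate(start: Box, end: Box, t: float) -> Box:
    """Linearly blend two viewBoxes; t=0 gives start, t=1 gives end."""
    return tuple(a + (b - a) * t for a, b in zip(start, end))


def to_attribute(box: Box) -> str:
    """Format a box as the value of an SVG viewBox attribute."""
    return " ".join(f"{v:g}" for v in box)


# Hypothetical coordinates: the whole treemap, and one cell to zoom in on.
whole = (0.0, 0.0, 900.0, 1100.0)
target = view_box_for((100.0, 200.0, 50.0, 40.0), padding=0.5)

# Five frames of a zoom from the full view to the target cell.
frames = [to_attribute(interpolate(whole, target, i / 4)) for i in range(5)]
print(frames[0])
print(frames[-1])
```

Linear interpolation is the crudest option; an easing function would make the zoom feel smoother, but the idea is the same.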

Here's a video of what I've come up with so far (you can see this live at http://iphylo.org/~rpage/zoomrect/primates.html). This is an interactive display of the Catalogue of Life 2010 classification of primates, with images from EOL. It's crude, there are some obvious issues with redrawing images, labels, etc., but it gives a sense of what can be done. With care this could probably be scaled up to handle the entire Catalogue of Life classification. With a bit more care, it could probably be optimised for the iPad, which would be a fun way to navigate through the diversity of life.

Discovering species descriptions in digitised newspapers: Trove and The Brisbane Courier


While exploring ways to visually compare classifications I came across the Australian snake name Demansia atra, and ended up reading a series of papers in the Bulletin of Zoological Nomenclature discussing the status of the name (more fun than it sounds, trust me). For example, Smith and Wallach, in "Case 2920. Diemenia atra Macleay, 1884 (currently Demansia atra; Reptilia, Serpentes): proposed conservation of the specific name", asked the ICZN to conserve the name, whereas Shea, in "On the proposed conservation of the specific name of Diemenia atra Macleay, 1884 (currently Demansia atra; Reptilia, Serpentes)", argued that Hoplocephalus vestigiatus was the correct name for the snake (OK, perhaps not that much fun).

The reason I bring this up is that the original description of Hoplocephalus vestigiatus was published in an Australian newspaper, The Brisbane Courier, 13 September 1884 (!). This newspaper has been digitised and is available in Trove, a digital archive hosted by the National Library of Australia. The description of Hoplocephalus vestigiatus appears in an account of a meeting of the Royal Society of Queensland: http://trove.nla.gov.au/ndp/del/article/3434083.

The Trove newspaper archive has both scanned images and OCR text, rather like BHL, but also enables users to correct OCR errors. The original text looked like this:

Mr De Vis then read a communication en
titled " Desciiptioua of >icw Snakes' -
This papei fe ive the descriptions of four
udditions to our Austi alian snake fauna,
and was prefaced by a synopsis and ditfii ential
characters of the genoa Huplo fjihaliui, to
which two of the anakca were referred-a genus
which Mr De Via stated w as out of all propor
tion larger than any other of anakea in Aus
traha, and, though consisting for the most
deadly reptile of Queensland-the brown
banded or "tiger snake " The new sn ikes were
Hoplocephalus snlcans-the furrow snake, so
called from the faculty the reptile has of con
verting ita ventral surface into a continuous
furrow, forwaidcd by Mr. C W de Burgh
Birch from the Mitchell district Hnplore
phalus lestigialiis, the foot punt snake, a name
of the white mai kings upon its back to tracks
of feet, Cacoplm, Wari o from Warroo station,
in the Port Curtía district, where it was
collected by Mi Llackman, one of the mern
bcis of the soeietj , and IJrarln/ioma Snther
lundi, a snakefrom Carl Creek, Norman River,
dcdicatcel to Mi J Sutherland, of normanton,


A few quick edits in Trove and it looks like this:

Mr. De Vis then read a communication en-
titled "Descriptions of New Snakes."—
This paper gave the descriptions of four
additions to our Australian snake fauna,
and was prefaced by a synopsis and differential
characters of the genus Hoplocephalus to
which two of the snakes were referred—a genus
which Mr. De Vis stated was out of all propor-
tion larger than any other of snakes in Aus-
tralia, and, though consisting for the most
deadly reptile of Queensland—the brown
banded or "tiger snake." The new snakes were
Hoplocephalus sulcans—the furrow snake, so
called from the faculty the reptile has of con-
verting its ventral surface into a continuous
furrow, forwarded by Mr. C. W. de Burgh
Birch from the Mitchell district . Hoploce-
phalus vestigitatus, the foot-print snake, a name
said to be suggested by the fancied resemblance
of the white markings upon its back to tracks
of feet; Cacophis Warro, from Warroo station,
in the Port Curtis district, where it was
collected by Mr. Blackman, one of the mem-
bers of the society; and Brachysoma Suther-
landi, a snake from Carl Creek, Norman River,
dedicated to Mr. J. Sutherland, of Normanton.

Searching Trove for Hoplocephalus I discovered a number of articles on snakes, some of which have also had their OCR text corrected, a measure of the success the project has had in engaging users. Trove has come up several times in discussions about OCR correction and BHL, but this is the first time I've taken a closer look. I didn't expect to find species descriptions in an Australian newspaper.

Linking NCBI taxonomy to GBIF


In response to Rutger Vos's question I've started to add GBIF taxon ids to the iPhylo Linkout website. If you've not come across iPhylo Linkout, it's a Semantic Mediawiki-based site where I maintain links between the NCBI taxonomy and other resources, such as Wikipedia and the BBC Nature Wildlife finder. For more background see

Page, R. D. M. (2011). Linking NCBI to Wikipedia: a wiki-based approach. PLoS Currents, 3, RRN1228. doi:10.1371/currents.RRN1228

I'm now starting to add GBIF ids to this site. This is potentially fraught with difficulties. There's no guarantee that the GBIF taxonomy ids are stable, unlike NCBI tax_ids which are fairly persistent (NCBI publish deletion/merge lists when they make changes). Then there are the obvious problems with the GBIF taxonomy itself. But, if you want a way to generate a distribution map for a taxon in the NCBI taxonomy, the quickest way is going to be via GBIF.

The mapping is being made automatically, with some crude checks to try and avoid too many erroneous links (e.g., due to homonyms). It will probably take a few days to complete (the mapping is quick, uploading to the wiki is a bit slower). Using a wiki to manage the mapping makes it easy to correct any spurious matches.
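As an illustration of the kind of crude check I mean (not the actual mapping code), here is a Python sketch that matches on exact name and refuses to link when the name is ambiguous. The function names and record layout are hypothetical. The GBIF id for Hyla japonica is the real one used later in this post; the two ids for Morus (a genus name shared by mulberries and gannets) are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

GbifRecord = Tuple[int, str, str]          # (gbif_id, scientific name, kingdom)
Index = Dict[str, List[Tuple[int, str]]]   # name -> [(gbif_id, kingdom), ...]


def build_index(records: Iterable[GbifRecord]) -> Index:
    """Group GBIF records by name so that homonyms end up in the same list."""
    index: Index = defaultdict(list)
    for gbif_id, name, kingdom in records:
        index[name].append((gbif_id, kingdom))
    return index


def match_name(name: str, index: Index, kingdom: Optional[str] = None) -> Optional[int]:
    """Return a GBIF id only if exactly one candidate survives the checks."""
    candidates = index.get(name, [])
    if kingdom is not None:
        # Use the kingdom, when known, to separate cross-kingdom homonyms.
        candidates = [c for c in candidates if c[1] == kingdom]
    if len(candidates) == 1:
        return candidates[0][0]
    return None  # no match, or still ambiguous: leave it for manual curation


gbif = build_index([
    (2427601, "Hyla japonica", "Animalia"),
    (1, "Morus", "Plantae"),    # placeholder id
    (2, "Morus", "Animalia"),   # placeholder id
])

print(match_name("Hyla japonica", gbif))
print(match_name("Morus", gbif))
print(match_name("Morus", gbif, kingdom="Animalia"))
```

Returning nothing for an ambiguous name errs on the side of missing links, not wrong ones, which suits a wiki where gaps can be filled in by hand later.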

As an example, the page http://iphylo.org/linkout/Ncbi:109175 is for the frog Hyla japonica (NCBI tax_id 109175) and shows links to Wikipedia (http://en.wikipedia.org/wiki/Japanese_Tree_Frog) and to GBIF (http://data.gbif.org/species/2427601/). There's even a link to TreeBASE. I display a GBIF map so you can see what data GBIF currently has for that taxon.

Hyla

So, we have a wiki page, how do we answer Rutger's original question: how to get GBIF occurrence records via web service?

To do this we can use the RDF output by the Semantic Mediawiki software that underpins the wiki. You can get this by clicking on the RDF icon near the bottom of the page, or by going to http://iphylo.org/linkout/Special:ExportRDF/Ncbi:109175. The RDF this produces is really, really ugly (and people wonder why the Semantic Web has been slow to take off...). In this RDF you will see the statement:

<rdfs:seeAlso rdf:resource="http://data.gbif.org/species/2427601/"/>

So, arm yourself with XPath, a regular expression, or if you are a serious RDF geek break out the SPARQL, and you can extract the GBIF taxon id for an NCBI taxon. Given that id you can query the GBIF web services. One service that I like is the occurrence density service, which you can use to recreate the 1°×1° density maps shown by GBIF. For example, http://data.gbif.org/ws/rest/density/list?taxonconceptkey=2427601 will get you the squares shown in the screen shot above.
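Here is a rough Python sketch of that extraction step, combining an XML parse with a regular expression. The sample RDF is a cut-down stand-in that I have wrapped around the `rdfs:seeAlso` statement shown above; the real export is much larger and its surrounding structure is assumed, not copied. The function names are made up, and the sketch does not fetch anything over the network.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Minimal stand-in for the wiki's RDF export (wrapper elements are assumed).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description>
    <rdfs:seeAlso rdf:resource="http://data.gbif.org/species/2427601/"/>
  </rdf:Description>
</rdf:RDF>"""

GBIF_SPECIES = re.compile(r"^http://data\.gbif\.org/species/(\d+)/?$")


def gbif_ids(rdf_xml: str) -> List[int]:
    """Collect GBIF taxon ids from every rdfs:seeAlso that points at GBIF."""
    root = ET.fromstring(rdf_xml)
    ids = []
    for node in root.iter(f"{{{RDFS_NS}}}seeAlso"):
        found = GBIF_SPECIES.match(node.get(f"{{{RDF_NS}}}resource", ""))
        if found:
            ids.append(int(found.group(1)))
    return ids


def density_url(taxon_id: int) -> str:
    """Build the occurrence density request for a GBIF taxon id."""
    return f"http://data.gbif.org/ws/rest/density/list?taxonconceptkey={taxon_id}"


for taxon_id in gbif_ids(SAMPLE):
    print(density_url(taxon_id))
```

To use this for real you would fetch the Special:ExportRDF URL for the NCBI tax_id you care about and pass the response body to `gbif_ids`.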

Of course, I have glossed over several issues, such as the errors and redundancy in the GBIF classification, the mismatch between NCBI and GBIF classifications (NCBI has many more ranks than GBIF), and whether the taxon concepts used by the two databases are equivalent (this is likely to be more of an issue for higher taxa). But it's a start.