
Visualising differences between classifications using cluster maps

As part of a project to build a tool to navigate through taxonomic names and classifications I've become interested in quick ways to compare classifications. For example, EOL has multiple classifications for the same taxon, and I'd like to quickly discover what the similarities and differences are.

One promising approach is to use "cluster maps", a technique described by Fluit et al. (see Aduna Cluster Map for an implementation):

Fluit, C., Sabou, M., & Harmelen, F. (2006). Visualizing the Semantic Web. (V. Geroimenko & C. Chen, Eds.) (pp. 45–58). Springer Science + Business Media. doi:10.1007/1-84628-290-X_3 (see also http://www.cs.vu.nl/~frankh/abstracts/VSW05.html)

Cluster map details

Cluster maps can be thought of as fancy Venn Diagrams, in that they can be used to depict the overlap between sets of objects. The diagram is a graph with two kinds of nodes. One represents categories (in the example above, file formats and search terms), the other represents sets of objects that occur in one or more categories (in the example above, these are files that match the search terms "rdf" and "aperture").
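To make that structure concrete, here is a minimal Python sketch (not the Aduna implementation) that groups objects by the exact combination of categories they occur in. Each distinct combination becomes one set node, which is linked to its category nodes and written out as Graphviz DOT. The function names `cluster_sets` and `to_dot`, and the toy source contents, are hypothetical.

```python
from collections import defaultdict

def cluster_sets(categories):
    """Group objects by the exact combination of categories they occur in.

    categories maps a category name (e.g. a classification source) to an
    iterable of object names (e.g. sub-taxa). Returns a dict mapping each
    frozenset of category names to the sorted objects found in exactly
    those categories.
    """
    membership = defaultdict(set)
    for category, objects in categories.items():
        for obj in objects:
            membership[obj].add(category)
    clusters = defaultdict(list)
    for obj, cats in membership.items():
        clusters[frozenset(cats)].append(obj)
    return {cats: sorted(objs) for cats, objs in clusters.items()}

def to_dot(clusters):
    """Emit an undirected Graphviz graph: box nodes for categories, circle
    nodes (labelled with their size) for sets, and an edge from each set
    to every category it belongs to."""
    lines = ["graph clustermap {"]
    for category in sorted({c for cats in clusters for c in cats}):
        lines.append(f'  "{category}" [shape=box];')
    ordered = sorted(clusters.items(), key=lambda item: sorted(item[0]))
    for i, (cats, objs) in enumerate(ordered):
        lines.append(f'  set{i} [shape=circle, label="{len(objs)}"];')
        for category in sorted(cats):
            lines.append(f'  set{i} -- "{category}";')
    lines.append("}")
    return "\n".join(lines)

# Toy data (hypothetical source contents, not real EOL results).
sources = {
    "ITIS": ["sp_a", "sp_b", "sp_c"],
    "CoL": ["sp_a", "sp_b", "sp_e"],
    "NCBI": ["sp_c", "sp_d"],
}
clusters = cluster_sets(sources)
print(to_dot(clusters))
```

Saving the printed text to a file and running `dot -Tsvg clustermap.dot -o clustermap.svg` produces an SVG version of the graph. A set node that touches a single category holds the names unique to that source.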

I've cobbled together a crude version of cluster maps. For a given taxon (e.g., a genus) I list all the immediate sub-taxa (e.g., species) in each classification in EOL, and then find the sets of sub-taxa that are shared across the classification sources (e.g., ITIS, NCBI, etc.) and those that are unique to one source. I then create the cluster map using Graphviz. Inspired by the hexagonal packing used by Aduna, I've done something similar to display the taxa in each set. Adding these to the output of Graphviz took a little fussing. First I get Graphviz to output the graph in SVG, then I load the SVG into a program that locates each node in the graph and inserts SVG for the packed circles (given that SVG is XML, this is fairly straightforward).
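A minimal sketch of that SVG post-processing step follows. It assumes Graphviz's usual SVG output, in which each node is a `<g class="node">` element containing a `<title>` with the node's name and an `<ellipse>` for its shape. Small circles are laid out on a hexagonal lattice, filled ring by ring outwards from the node's centre. The names `hex_positions` and `add_packed_circles`, the spacing and the styling are all hypothetical choices, not the code behind the figures here.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write plain <svg>, <circle> tags back out

# Axial-coordinate steps around a hexagon (pointy-top layout).
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_positions(n, spacing):
    """Return n (dx, dy) offsets on a hexagonal lattice, filled ring by ring
    outwards from (0, 0); neighbouring centres are `spacing` apart."""
    cells = [(0, 0)]
    k = 1
    while len(cells) < n:
        q, r = -k, k  # first cell of ring k
        for dq, dr in DIRECTIONS:
            for _ in range(k):
                cells.append((q, r))
                q, r = q + dq, r + dr
        k += 1
    return [(spacing * (q + r / 2), spacing * math.sqrt(3) / 2 * r)
            for q, r in cells[:n]]

def add_packed_circles(svg_text, counts, radius=3.0):
    """Add counts[name] small circles inside each Graphviz node whose
    <title> is name, centred on that node's <ellipse>."""
    root = ET.fromstring(svg_text)
    for g in list(root.iter(f"{{{SVG_NS}}}g")):
        if g.get("class") != "node":
            continue
        title = g.find(f"{{{SVG_NS}}}title")
        ellipse = g.find(f"{{{SVG_NS}}}ellipse")
        if title is None or ellipse is None or title.text not in counts:
            continue
        cx, cy = float(ellipse.get("cx")), float(ellipse.get("cy"))
        for dx, dy in hex_positions(counts[title.text], 2.2 * radius):
            ET.SubElement(g, f"{{{SVG_NS}}}circle",
                          cx=f"{cx + dx:.2f}", cy=f"{cy + dy:.2f}",
                          r=str(radius), fill="steelblue")
    return ET.tostring(root, encoding="unicode")
```

Filling whole rings keeps each cluster roughly hexagonal. The node sizes in the DOT file would need to be large enough to hold the biggest set, something this sketch doesn't check.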

As an example, consider the genus Demansia (http://eol.org/pages/34967/overview). EOL reports four classifications for this genus. Below is a cluster map for this genus:

[Figure: cluster map for Demansia (EOL page 34967)]

This diagram shows that, for example, the Catalogue of Life (CoL) and the Reptile Database share four names, and these two databases share three other names with ITIS. Every database has names unique to itself, and one database (NCBI) is completely disconnected from the other three.

One important caveat here is that I'm mapping the scientific names as returned by EOL, and in many cases these contain the taxonomic authority. This is a major headache, prompting this outburst:


If we clean the names by removing the taxonomic authority the clusters overlap rather more:
[Figure: cluster map for Demansia with taxonomic authorities removed]
Now we see that only ITIS and the Reptile Database have unique names. This is one reason why I get stroppy when taxonomists start saying databases shouldn't have to supply cleaned "canonical" names. If the names have authorities then I have to clean them, because in many cases the authorities (while useful to know) are inconsistent across databases. For example:

  • Demansia olivacea GRAY 1842 versus Demansia olivacea (Gray, 1842)
  • Demansia torquata GÜNTHER 1862 versus Demansia torquata (Günther, 1862)

Taxonomic authorities are frequently misspelt, and people seem confused about when to use parentheses or not. Databases should spare the user some pain and provide clean names (and authority strings separately where they have them).
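As a rough illustration of the cleaning this forces on users, here is a heuristic Python sketch. It keeps the genus and any following lower-case epithets, and it stops at the first token that looks like the start of an authority (an upper-case word, a parenthesis, a year, or a lower-case author particle such as "de"). The function name and particle list are hypothetical, and the heuristic is deliberately crude: it will also truncate names containing a subgenus in parentheses or rank markers such as "subsp.".

```python
import re

# An epithet is taken to be a run of lower-case letters (hyphens allowed).
EPITHET = re.compile(r"[a-z][a-z-]*")
# Lower-case particles that start some author names (crude, incomplete list).
AUTHOR_PARTICLES = {"de", "da", "du", "del", "van", "von"}

def canonical_name(name):
    """Keep the genus plus following lower-case epithets; drop everything
    from the first token that looks like part of an authority."""
    tokens = name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]
    for token in tokens[1:]:
        if token in AUTHOR_PARTICLES or not EPITHET.fullmatch(token):
            break
        kept.append(token)
    return " ".join(kept)

print(canonical_name("Demansia olivacea GRAY 1842"))
print(canonical_name("Demansia olivacea (Gray, 1842)"))
```

Both calls print "Demansia olivacea", so the two database entries collapse onto the same key. Guessing like this is exactly what a separately supplied canonical name would make unnecessary.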

The visualisation is still incomplete (I need to make it interactive), but it shows promise. The names that are unique to one database are usually worth investigating. In some cases they are names other databases regard as synonyms, in other cases they represent spelling variations. The goal of this visualisation is to highlight the names that the user might want to investigate further.
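One way to help triage names that are unique to one database is to flag those that closely resemble a name in another source, since these may be spelling variants. The Python sketch below uses the standard library's `difflib.get_close_matches`. The 0.85 cutoff is an arbitrary assumption, and string similarity says nothing about synonymy, which still needs a human (or a synonym list) to judge.

```python
import difflib

def possible_variants(unique_names, other_names, cutoff=0.85):
    """For each name unique to one source, list up to three similar names
    from the other sources (best match first)."""
    others = sorted(set(other_names))
    suggestions = {}
    for name in unique_names:
        matches = difflib.get_close_matches(name, others, n=3, cutoff=cutoff)
        if matches:
            suggestions[name] = matches
    return suggestions

# "Demansia olivaceus" is a made-up misspelling used only for illustration.
print(possible_variants(["Demansia olivaceus", "Hoplocephalus sulcans"],
                        ["Demansia olivacea", "Demansia torquata"]))
```

Names that have no close match elsewhere are left out of the result, so in an interactive version they could be shown as the ones most worth a closer look.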

Using a zoomable treemap to visualise a taxonomic classification

One visualisation method I keep coming back to is the treemap. Each time I experiment with them I learn a little bit more, but I usually end up abandoning them (with the exception of using quantum treemaps to display bibliographic data). But they keep calling me back.

My latest experiment builds on some earlier thoughts on quantum treemaps, but tackles two issues that have kept bugging me. The first is that quantum treemaps are limited to hierarchies that are only two levels deep (e.g., family → genus → species). Unlike regular treemaps, where you slice and dice a rectangle of predetermined size, when you construct a quantum treemap you don't know how big it will be until you've made it. This is because you want every item in the hierarchy to be displayed at the same size, and fitting them all in may require you to tweak the size of the treemap. Given that taxonomic classifications have more than two levels, this is a problem. One approach is to construct quantum treemaps for the lower parts of the classification, then pack those into a larger rectangle. This is an instance of the packing problem. After Googling for a bit I came across this code for packing rectangles, which was easy to follow and gave reasonable results.
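The two-stage idea can be sketched in Python under some simplifying assumptions. Each genus's species are laid out as a near-square grid of equal cells, which mimics the "every item the same size" property but is not Bederson's full quantum treemap algorithm. The resulting rectangles are then packed with a simple next-fit shelf packer, a swapped-in technique and not the rectangle-packing code mentioned above. `grid_size`, `shelf_pack` and the genus counts are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Placed:
    name: str
    x: float
    y: float
    w: float
    h: float

def grid_size(n_items, cell):
    """Width and height of the most nearly square grid of cell x cell
    squares that holds n_items, so every item is drawn at the same size."""
    cols = max(1, math.ceil(math.sqrt(n_items)))
    rows = max(1, math.ceil(n_items / cols))
    return cols * cell, rows * cell

def shelf_pack(rects, width):
    """Next-fit shelf packing: sort (name, w, h) rectangles by height,
    tallest first, and fill rows left to right, opening a new row when
    the next rectangle would overflow `width`. Returns (placements, height)."""
    placed = []
    x = y = shelf_h = 0.0
    for name, w, h in sorted(rects, key=lambda r: r[2], reverse=True):
        if w > width:
            raise ValueError(f"{name} is wider than the container")
        if x + w > width:  # start a new shelf below the current one
            x, y, shelf_h = 0.0, y + shelf_h, 0.0
        placed.append(Placed(name, x, y, w, h))
        x += w
        shelf_h = max(shelf_h, h)
    return placed, y + shelf_h

# Hypothetical genera with species counts; each species cell is 10 units wide.
genera = {"GenusA": 5, "GenusB": 2, "GenusC": 9}
rects = [(g, *grid_size(n, 10)) for g, n in genera.items()]
placements, height = shelf_pack(rects, width=60)
```

The packed block (here 60 wide and 40 high) can itself be treated as a single rectangle at the next level up, which is how this approach extends beyond two levels. Shelf packing is easy to follow but can waste space when rectangle heights vary a lot.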

The second problem is that I want the treemap to be interactive. I want to be able to zoom in and out and navigate around the treemap. After more Googling, I came across the Zoomooz.js library, which makes web page elements zoom (for a pretty mind-blowing example of what can be done, see impress.js), but I decided I wanted to work with SVG. After playing with examples from Keith Wood's jQuery SVG plugin I started to get the hang of creating zoomable visualisations in SVG.
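One common way to zoom an SVG drawing is to change the root element's `viewBox` so that it frames the rectangle you clicked on. Whether the primates demo works this way is an assumption on my part. The arithmetic is sketched below in Python only to keep these examples in one language; in the browser it would live in JavaScript, with the result written to the `viewBox` attribute. `zoom_viewbox`, `interpolate`, `viewbox_attr` and the 5% margin are hypothetical.

```python
def zoom_viewbox(target, aspect, padding=0.05):
    """Return an SVG viewBox (x, y, w, h) that frames the rectangle
    target=(x, y, w, h) with a relative margin, widened or heightened so that
    w / h equals the viewport's aspect ratio (width / height)."""
    x, y, w, h = target
    cx, cy = x + w / 2, y + h / 2
    w, h = w * (1 + 2 * padding), h * (1 + 2 * padding)
    if w / h < aspect:
        w = h * aspect  # target too tall for the viewport: widen the box
    else:
        h = w / aspect  # target too wide: make the box taller
    return (cx - w / 2, cy - h / 2, w, h)

def interpolate(a, b, t):
    """Linearly blend two viewBoxes; step t from 0 to 1 to animate a zoom."""
    return tuple(p + (q - p) * t for p, q in zip(a, b))

def viewbox_attr(box):
    """Format a viewBox tuple as the value of an SVG viewBox attribute."""
    return " ".join(f"{v:g}" for v in box)

box = zoom_viewbox((0, 0, 10, 10), aspect=2.0, padding=0)
print(viewbox_attr(box))
```

Matching the viewport's aspect ratio keeps the zoomed treemap from being letterboxed or distorted. Stepping `t` through a handful of values between the current and target viewBox gives a smooth zoom in or out.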

Here's a video of what I've come up with so far (you can see this live at http://iphylo.org/~rpage/zoomrect/primates.html). This is an interactive display of the Catalogue of Life 2010 classification of primates, with images from EOL. It's crude, there are some obvious issues with redrawing images, labels, etc., but it gives a sense of what can be done. With care this could probably be scaled up to handle the entire Catalogue of Life classification. With a bit more care, it could probably be optimised for the iPad, which would be a fun way to navigate through the diversity of life.

Discovering species descriptions in digitised newspapers: Trove and The Brisbane Courier


While exploring ways to visually compare classifications I came across the Australian snake name Demansia atra, and ended up reading a series of papers in the Bulletin of Zoological Nomenclature discussing the status of the name (more fun than it sounds, trust me). For example, Smith and Wallach, in "Case 2920. Diemenia atra Macleay, 1884 (currently Demansia atra; Reptilia, Serpentes): proposed conservation of the specific name", asked the ICZN to conserve the name, whereas Shea, in "On the proposed conservation of the specific name of Diemenia atra Macleay, 1884 (currently Demansia atra; Reptilia, Serpentes)", argued that Hoplocephalus vestigiatus was the correct name for the snake (OK, perhaps not that much fun).

The reason I bring this up is that the original description of Hoplocephalus vestigiatus was published in an Australian newspaper, The Brisbane Courier, 13 September 1884 (!). This newspaper has been digitised and is available in Trove, a digital archive hosted by the National Library of Australia. The description of Hoplocephalus vestigiatus appears in an account of a meeting of the Royal Society of Queensland: http://trove.nla.gov.au/ndp/del/article/3434083.

The Trove newspaper archive has both scanned images and OCR text, rather like BHL, but also enables users to correct OCR errors. The original text looked like this:

Mr De Vis then read a communication en
titled " Desciiptioua of >icw Snakes' -
This papei fe ive the descriptions of four
udditions to our Austi alian snake fauna,
and was prefaced by a synopsis and ditfii ential
characters of the genoa Huplo fjihaliui, to
which two of the anakca were referred-a genus
which Mr De Via stated w as out of all propor
tion larger than any other of anakea in Aus
traha, and, though consisting for the most
deadly reptile of Queensland-the brown
banded or "tiger snake " The new sn ikes were
Hoplocephalus snlcans-the furrow snake, so
called from the faculty the reptile has of con
verting ita ventral surface into a continuous
furrow, forwaidcd by Mr. C W de Burgh
Birch from the Mitchell district Hnplore
phalus lestigialiis, the foot punt snake, a name
of the white mai kings upon its back to tracks
of feet, Cacoplm, Wari o from Warroo station,
in the Port Curtía district, where it was
collected by Mi Llackman, one of the mern
bcis of the soeietj , and IJrarln/ioma Snther
lundi, a snakefrom Carl Creek, Norman River,
dcdicatcel to Mi J Sutherland, of normanton,


A few quick edits in Trove and it looks like this:

Mr. De Vis then read a communication en-
titled "Descriptions of New Snakes."—
This paper gave the descriptions of four
additions to our Australian snake fauna,
and was prefaced by a synopsis and differential
characters of the genus Hoplocephalus to
which two of the snakes were referred—a genus
which Mr. De Vis stated was out of all propor-
tion larger than any other of snakes in Aus-
tralia, and, though consisting for the most
deadly reptile of Queensland—the brown
banded or "tiger snake." The new snakes were
Hoplocephalus sulcans—the furrow snake, so
called from the faculty the reptile has of con-
verting its ventral surface into a continuous
furrow, forwarded by Mr. C. W. de Burgh
Birch from the Mitchell district . Hoploce-
phalus vestigitatus, the foot-print snake, a name
said to be suggested by the fancied resemblance
of the white markings upon its back to tracks
of feet; Cacophis Warro, from Warroo station,
in the Port Curtis district, where it was
collected by Mr. Blackman, one of the mem-
bers of the society; and Brachysoma Suther-
landi, a snake from Carl Creek, Norman River,
dedicated to Mr. J. Sutherland, of Normanton.

Searching Trove for Hoplocephalus, I discovered a number of articles on snakes, some of which have also had their OCR text corrected, a measure of the success the project has had in engaging users. Trove has come up several times in discussions about OCR correction and BHL, but this is the first time I've taken a closer look. I didn't expect to find species descriptions in an Australian newspaper.