James Rosindell's OneZoom tree viewer is out and the paper describing the viewer has been published in PLoS Biology (disclosure, I was a reviewer):
Rosindell, J., & Harmon, L. J. (2012). OneZoom: A Fractal Explorer for the Tree of Life. PLoS Biology, 10(10), e1001406. doi:10.1371/journal.pbio.1001406.g004
Below is a video where James describes OneZoom.
OneZoom is fun, and is deservedly attracting a lot of attention. But as visually striking as it is, I confess I have reservations about fractal-based viewers. For a start they make it hard to get a sense of the relative size of taxonomic groups. Looking at the mammal tree shown in the video above your eye is drawn to the monotremes, one of the smallest mammalian lineages. That the greatest number of extant mammals are either rodents or bats is not readily apparent. Fractal geometry also removes the timescale, so you can't discover whether radiations in different clades are the same age (unlike, say, if the tree was drawn in a "traditional" fashion with a linear timescale). In some ways I think fractal viewers are rather like the hyperbolic viewers that attracted attention about a decade ago - visually striking but ultimately difficult to interpret. What I'd like to see are studies which evaluate how easily people can navigate different trees and accomplish specific tasks (such as determining closest relationships, relative clade diversity, etc.).
In some ways OneZoom resembles Google Maps with its zoomable interface. But ironically this only serves to illustrate a key difference between OneZoom and Google Maps. Part of the strength of the latter is the consistent conventions for drawing maps (e.g., north is up, south is down) which, when coupled with agreed co-ordinates (latitude and longitude), enables people to mash up geographic data. What I'd like is the equivalent of CartoDB for trees.
Prompted by a conversation with Vince Smith at the recent Online Taxonomy meeting at the Linnean Society in London I've been revisiting touch-based displays of large trees. There are a couple of really impressive examples of what can be done.
Perceptive Pixel
I've blogged about this before, but came across another video that better captures the excitement of touch-based navigation of a taxonomy. Perceptive Pixel's (recently acquired by Microsoft) Jeff Han demos browsing an animal classification. The underlying visualisation is fairly straightforward, but the speed and ease with which you can interact with it clearly makes it fun to use.
As part of a project to build a tool to navigate through taxonomic names and classifications I've become interested in quick ways to compare classifications. For example, EOL has multiple classifications for the same taxon, and I'd like to quickly discover what the similarities and differences are.
One promising approach is to use "cluster maps", a technique described by Fluit et al. (see Aduna Cluster Map for an implementation):
Cluster maps can be thought of as fancy Venn Diagrams, in that they can be used to depict the overlap between sets of objects. The diagram is a graph with two kinds of nodes. One represents categories (in the example above, file formats and search terms), the other represents sets of objects that occur in one or more categories (in the example above, these are files that match the search terms "rdf" and "aperture").
I've cobbled together a crude version of cluster maps. For a given taxon (e.g., a genus) I list all the immediate sub-taxa (e.g., species) in each classification in EOL, and then find the sets of sub-taxa that are shared across the classification sources (e.g., ITIS, NCBI, etc.) and those that are unique to one source. I then create the cluster map using Graphviz. Inspired by the hexagonal packing used by Aduna, I've done something similar to display the taxa in each set. Adding these to the output of Graphviz required a little fussing. First I get Graphviz to output the graph in SVG, then I load the SVG into a program that locates each node in the graph and inserts SVG for the packed circles (given that SVG is XML this is fairly straightforward).
This diagram shows that, for example, the Catalogue of Life (CoL) and the Reptile Database share 4 names, and these two databases share three other names with ITIS. All databases have names unique to themselves, and one database (NCBI) is completely disconnected from the other three.
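The grouping step behind a cluster map can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the diagram: the function name `cluster_sets` and the placeholder names ("Aus bus" and so on) are invented for the example, and it assumes the sub-taxon names for each source have already been fetched.

```python
from collections import defaultdict

def cluster_sets(classifications):
    """Group names by the exact combination of sources that list them.

    `classifications` maps a source label (e.g. "ITIS") to the set of
    sub-taxon names that source lists for the taxon of interest.
    """
    sources_for_name = defaultdict(set)
    for source, names in classifications.items():
        for name in names:
            sources_for_name[name].add(source)

    # Each distinct combination of sources becomes one node in the cluster map
    clusters = defaultdict(set)
    for name, sources in sources_for_name.items():
        clusters[frozenset(sources)].add(name)
    return dict(clusters)

# Made-up placeholder names, purely for illustration
example = {
    "CoL": {"Aus bus", "Aus cus", "Aus dus"},
    "ITIS": {"Aus bus", "Aus eus"},
    "NCBI": {"Aus fus"},
}
clusters = cluster_sets(example)
for sources, names in sorted(clusters.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(sources), sorted(names))
```

Each key of `clusters` corresponds to one "set" node in the graph, with edges to the category nodes named in that key. A source such as NCBI in this toy example, which shares no names with the others, ends up disconnected.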
One important caveat here is that I'm mapping the scientific names as returned by EOL, and in many cases these contain the taxonomic authority. This is a major headache, prompting this outburst:
Argh, taxonomic databases that don't provide canonical names (i.e., with authority names REMOVED) will be first against the wall @eol
If we clean the names by removing the taxonomic authority the clusters overlap rather more: Now we see that only ITIS and the Reptile Database have unique names. This is one reason why I get stroppy when taxonomists start saying databases shouldn't have to supply cleaned "canonical" names. If the names have authorities then I have to clean them, because in many cases the authorities (while useful to know) are inconsistent across databases. For example:
Demansia olivacea GRAY 1842 versus Demansia olivacea (Gray, 1842)
Demansia torquata GÜNTHER 1862 versus Demansia torquata (Günther, 1862)
Taxonomic authorities are frequently misspelt, and people seem confused about when to use parentheses or not. Databases should spare the user some pain and provide clean names (and authority strings separately where they have them).
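To give a flavour of the cleaning involved, here is a rough Python sketch that strips the authority from a name string. It is an assumption-laden heuristic, not a proper name parser: it assumes a capitalised genus followed by up to two lowercase epithets, and the `PARTICLES` list of lowercase author-name particles is a hypothetical stop-list added for the example.

```python
import re

# Hypothetical stop-list: lowercase particles that begin some author names
PARTICLES = {"de", "van", "von", "der", "den", "du", "la", "le", "da", "di"}

def canonical_name(name_string):
    """Return the genus plus up to two epithets, dropping the authority.

    Returns None if the string does not start with a capitalised genus.
    """
    tokens = name_string.split()
    if not tokens or not re.fullmatch(r"[A-Z][a-z]+", tokens[0]):
        return None
    parts = [tokens[0]]
    for token in tokens[1:3]:
        # Epithets are all lowercase; anything else is treated as authority
        if re.fullmatch(r"[a-z][a-z-]+", token) and token not in PARTICLES:
            parts.append(token)
        else:
            break
    return " ".join(parts)

for raw in ["Demansia olivacea GRAY 1842",
            "Demansia olivacea (Gray, 1842)",
            "Demansia torquata GÜNTHER 1862",
            "Demansia torquata (Günther, 1862)"]:
    print(raw, "->", canonical_name(raw))
```

With the four strings above, both spellings of each name reduce to the same two-word name, which is what lets the clusters overlap. Real data will contain cases this heuristic gets wrong (hybrids, subgenera in parentheses, and so on).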
The visualisation is still incomplete (I need to make it interactive), but it shows promise. The names that are unique to one database are usually worth investigating. In some cases they are names other databases regard as synonyms, in other cases they represent spelling variations. The goal of this visualisation is to highlight the names that the user might want to investigate further.
My latest experiment builds on some earlier thoughts on quantum treemaps, but tackles two issues that have kept bugging me. The first is that quantum treemaps are limited to hierarchies that are only two levels deep (e.g., family → genus → species). This is because, unlike regular treemaps where you are slicing and dicing a rectangle of predetermined size, when you construct a quantum treemap you don't know how big it will be until you've made it (this is because you want to ensure that every item in the hierarchy can be displayed at the same size, and fitting them in may require you to tweak the size of the treemap). Given that taxonomic classifications have > 2 levels this is a problem. One approach is to construct quantum treemaps for the lower parts of the classification, then pack those into a larger rectangle. This is an instance of the packing problem. After Googling for a bit I came across this code for packing rectangles, which was easy to follow and gave reasonable results.
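To make the packing step concrete, here is a minimal "shelf" packer in Python. This is not the code referred to above, just one of the simplest packing heuristics, swapped in for illustration: rectangles are sorted by height and laid out left to right in rows. The function name `shelf_pack` and the `max_width` parameter are assumptions made for this sketch.

```python
def shelf_pack(sizes, max_width):
    """Place rectangles left to right on horizontal shelves.

    `sizes` is a list of (width, height) pairs. Returns the (x, y) of each
    rectangle's top-left corner, in input order, plus the overall
    (width, height) of the packed layout.
    """
    # Tallest first, so each shelf is only as high as its first rectangle
    order = sorted(range(len(sizes)), key=lambda i: sizes[i][1], reverse=True)
    positions = [None] * len(sizes)
    x = y = shelf_height = used_width = 0
    for i in order:
        w, h = sizes[i]
        if x > 0 and x + w > max_width:
            # Row is full: start a new shelf below the current one
            y += shelf_height
            x = shelf_height = 0
        positions[i] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
        used_width = max(used_width, x)
    return positions, (used_width, y + shelf_height)

# Five child treemaps of arbitrary sizes, packed into rows at most 8 units wide
sizes = [(4, 3), (3, 3), (5, 2), (2, 2), (2, 1)]
positions, bounds = shelf_pack(sizes, max_width=8)
print(positions)  # [(0, 0), (4, 0), (0, 3), (5, 3), (0, 5)]
print(bounds)     # (7, 6)
```

In the treemap setting each input rectangle would be a finished quantum treemap for a lower-level taxon, and the returned bounds give the size of the parent rectangle.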
The second problem is that I want the treemap to be interactive. I want to be able to zoom in and out and navigate around the treemap. After more Googling, I came across the Zoomooz.js library which makes web page elements zoom (for a pretty mind-blowing example of what can be done see impress.js), but I decided I want to work with SVG. After playing with examples from Keith Wood's jQuery SVG plugin I started to get the hang of creating zoomable visualisations in SVG.
Here's a video of what I've come up with so far (you can see this live at http://iphylo.org/~rpage/zoomrect/primates.html). This is an interactive display of the Catalogue of Life 2010 classification of primates, with images from EOL. It's crude, there are some obvious issues with redrawing images, labels, etc., but it gives a sense of what can be done. With care this could probably be scaled up to handle the entire Catalogue of Life classification. With a bit more care, it could probably be optimised for the iPad, which would be a fun way to navigate through the diversity of life.
One of the things I find frustrating about TreeBASE is that there's no easy way to get an overview of what it contains. What is its taxonomic coverage like? Is it dominated by plants and fungi, or are there lots of animal trees as well? Are there obvious gaps in our phylogenetic knowledge, or do the phylogenies it contains pretty much span the tree of life?
As part of my phyloinformatics course I've put together a simple browser to navigate through TreeBASE. The inspiration comes from genome browsers (e.g., the UCSC Genome Browser) where the genome is treated as a linear set of co-ordinates, and features of the genome are displayed as "tracks".
For my browser, I've used the order in which nodes appear in the NCBI tree as you go from left to right as the set of co-ordinates (actually, from top to bottom as my browser displays the co-ordinate axis vertically).
I then place each TreeBASE tree within this classification by taking the TreeBASE → NCBI mapping provided by TreeBASE and finding the "majority rule" taxon for each tree (in a sense, the taxon that summarises what the tree is about). Each tree is represented by a vertical line depicting the span of the corresponding NCBI taxon (corresponding to a "track" in a genome browser). Taking the majority-rule taxon rather than, say, the span of the tree, makes it possible to pack the vertical lines tightly together so that they take up less space (the ordering from left to right is determined by the NCBI taxonomy).
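One way to read "majority rule" here is: the deepest NCBI taxon that contains more than half of the tips in the tree. The Python sketch below implements that reading. It is an interpretation, not necessarily the rule used in the browser, and it assumes each tip has already been mapped to a root-to-tip NCBI lineage.

```python
from collections import Counter

def majority_rule_taxon(lineages):
    """Return the deepest taxon shared by more than half of the lineages.

    Each lineage is a list of taxon names ordered from the root downwards.
    """
    counts = Counter()
    depth = {}
    for lineage in lineages:
        for level, taxon in enumerate(lineage):
            counts[taxon] += 1
            depth[taxon] = level
    threshold = len(lineages) / 2
    majority = [taxon for taxon, n in counts.items() if n > threshold]
    # Majority taxa form a single path from the root; take the deepest one
    return max(majority, key=lambda taxon: depth[taxon]) if majority else None

# Abbreviated lineages for the tips of a hypothetical tree
tips = [
    ["Eukaryota", "Metazoa", "Chordata", "Mammalia"],
    ["Eukaryota", "Metazoa", "Chordata", "Aves"],
    ["Eukaryota", "Metazoa", "Chordata", "Mammalia"],
    ["Eukaryota", "Metazoa", "Arthropoda", "Insecta"],
    ["Eukaryota", "Fungi"],
]
print(majority_rule_taxon(tips))  # Chordata
```

Three of the five tips are chordates, so the tree is filed under Chordata even though its full span would be Eukaryota. This is why majority-rule bars are shorter and pack more tightly.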
If you mouse-over a vertical bar you can see the title of the study that published the tree. If you click on the vertical bar you'll see the tree displayed on the right (if your web browser understands SVG, that is). If you click on the background you will drill down a level in the NCBI classification. To go back up the classification, click on the arrow at the top left of the browser.
There is also a live version at http://iphylo.org/~rpage/wikihistoryflow. If you enter the name of a Wikipedia page the tool will display the edit history with columns representing page versions and individual contributors (people and bots) distinguished by different colours.
This tool will fall over for pages with a lengthy history of edits, and requires a web browser that can support SVG, but it's a fun visualisation, and may inspire someone to do this properly.
One side effect of playing with ways to visualise and integrate biology databases is that you stumble across the weird and wonderful stuff that living organisms get up to. My earliest papers were on crustacean taxonomy, so I thought I'd try my latest toy on them.
What lives on crustaceans?
The "symbiome" graph for crustacea shows a range of associations, including marine bacteria (Vibrio), fungi (microsporidians), and other organisms, including other crustacea (crustaceans are at the top of the circle, I'll work on labelling these diagrams a little better).
What do crustaceans live on?
Crustacea (in addition to parasitising other crustacea) parasitise several vertebrate groups, including fish and whales. But they also occur in terrestrial vertebrates. For example, sequence EF583871 is from the pentastomid worm Porocephalus crotali from a dog. When people think of terrestrial crustacea they usually don't think of parasites. There's also a prominent line from crustaceans to what turns out to be corals, representing coral-living barnacles.
It's instructive to compare this with insects, which similarly parasitise vertebrates. The striking difference is the association between insects and flowering plants.
I guess these really need to be made interactive, so we could click on them and discover more about the association represented by each line in the diagram.
Back in 2006 in a short post entitled "Building the encyclopedia of life" I wrote that GenBank is a potentially rich source of information on host-parasite relationships. Often sequences of parasites will include information on the name of the host (the example I used was sequence AF131710 from the platyhelminth Ligophorus mugilinus, which records the host as the Flathead mullet Mugil cephalus).
I've always wanted to explore this idea a bit more, and have finally made a start, in part inspired by the recent VIZBI 2011 meeting. I've grabbed a large chunk of GenBank, mined the sequences for host records, and created some simple visualisations of what I'm terming (with tongue firmly in cheek) the "symbiome". Jonathan Eisen will not be happy, but I need a word that describes the complete set of hosts, mutualists, symbionts with which an organism is associated, and "symbiome" seems appropriate.
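For the curious, pulling host records out of sequence records can be sketched as below. This is a minimal Python illustration, not the script actually used. It assumes EMBL-style flat files, where the source feature carries `/organism` and `/host` qualifiers on `FT` lines and each record ends with `//`. The sample records are hand-written and heavily abridged.

```python
import re

# Matches feature-table qualifier lines such as: FT   /host="Mugil cephalus"
QUALIFIER = re.compile(r'^FT\s+/(organism|host)="([^"]*)"')

def host_records(flat_file_text):
    """Yield (organism, host) pairs from EMBL-style flat file text."""
    fields = {}
    for line in flat_file_text.splitlines():
        if line.startswith("//"):  # end of record
            if "organism" in fields and "host" in fields:
                yield fields["organism"], fields["host"]
            fields = {}
            continue
        match = QUALIFIER.match(line)
        if match:
            # Keep the first value seen for each qualifier in a record
            fields.setdefault(match.group(1), match.group(2))

# Hand-written, abridged records for illustration only
SAMPLE = "\n".join([
    "ID   EXAMPLE1",
    "FT   source          1..100",
    'FT                   /organism="Ligophorus mugilinus"',
    'FT                   /host="Mugil cephalus"',
    "//",
    "ID   EXAMPLE2",
    "FT   source          1..100",
    'FT                   /organism="Homo sapiens"',
    "//",
])
records = list(host_records(SAMPLE))
print(records)  # [('Ligophorus mugilinus', 'Mugil cephalus')]
```

Records without a host qualifier are simply skipped. Qualifier values that wrap over several lines are not handled in this sketch.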
Human symbiome
To illustrate the idea, below is the human "symbiome". This diagram shows all the taxa in GenBank arranged in a circle, with lines connecting those organisms that have DNA sequences where humans are recorded as their host.
At a glance, we have a lot of bacteria (the gray bar with E. coli) and fungi (blue bar with Yeast), and a few nematodes and arthropods.
Fig tree symbiome
Next up are organisms collected from fig trees (genus Ficus).
Fig trees have wasp pollinators (the dark line landing near the honey bee Apis), as well as nematodes (dark line landing near Caenorhabditis elegans). There are also some associations with fungi and other arthropods.
Which taxa host insects?
Next up is a plot of all associations involving insects and a host.
The diagram is dominated by insect-flowering plant interactions, followed by insect-vertebrate associations (most likely bird and mammal lice).
Which taxa are hosted by insects?
We can reverse the question and ask what organisms are hosted by insects:
Lots of associations between insects and fungi, as well as bacteria, and a few other organisms, such as nematodes, and Plasmodium (the organism which causes malaria).
Frog symbiome
Lastly, below is the symbiome of frogs. "Worms" feature prominently, as well as the fungus that causes chytridiomycosis.
How the visualisation was made
The symbiome visualisations were made as follows. Firstly DNA sequences were downloaded from EMBL and run through a script that extracted as much metadata as possible, including the contents of the host field (where present). I then took the NCBI taxonomy and generated an ordered list of taxa by walking the tree in postorder, which determines where on the circumference of the circle the taxon lies. Pairs of taxa in an association are connected by a quadratic Bezier curve. The illustration was created using SVG.
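The geometry is simple enough to sketch in Python. What follows is an illustration of the approach just described, not the original code: the toy tree, the radius, and the choice of the circle's centre as the Bezier control point are all assumptions made for the example.

```python
import math

def postorder(tree, root):
    """Return the nodes of `tree` (a dict of parent -> children) in postorder."""
    order = []

    def visit(node):
        for child in tree.get(node, []):
            visit(child)
        order.append(node)

    visit(root)
    return order

def circle_point(index, total, radius, centre):
    """Position of the index-th of `total` taxa on the circumference."""
    angle = 2 * math.pi * index / total
    return (centre + radius * math.cos(angle), centre + radius * math.sin(angle))

def association_path(a, b, order, radius=200, centre=250):
    """SVG path joining taxa `a` and `b` with a quadratic Bezier curve."""
    position = {taxon: i for i, taxon in enumerate(order)}
    x1, y1 = circle_point(position[a], len(order), radius, centre)
    x2, y2 = circle_point(position[b], len(order), radius, centre)
    # "Q" is SVG's quadratic Bezier command; using the centre of the circle
    # as the control point is an assumption that bows each line inwards
    return (f'<path d="M {x1:.1f} {y1:.1f} Q {centre} {centre} '
            f'{x2:.1f} {y2:.1f}" fill="none" stroke="black"/>')

# A toy classification standing in for the NCBI taxonomy
tree = {"life": ["Bacteria", "Eukaryota"], "Eukaryota": ["Fungi", "Metazoa"]}
order = postorder(tree, "life")
path = association_path("Bacteria", "Metazoa", order)
print(order)
print(path)
```

Wrapping the generated `<path>` elements in an `<svg>` element with a 500 by 500 viewport gives a drawable diagram. Because the ordering comes from a tree traversal, related taxa sit next to each other on the circle, which is what makes the coloured bars for groups possible.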
Next steps
There are several ways this visualisation could be improved. It's based only on a subset of data (I haven't run all of the sequence databases through the parser yet), and the matching of host taxa is based on exact string matching. All manner of weird and wonderful things get entered in the host field, so we'll need some more sophisticated parsing (see "LINNAEUS: A species name identification system for biomedical literature" doi:10.1186/1471-2105-11-85 for a more general discussion of this issue).
The visualisation is fairly crude at this stage. Circle plots like this are fairly simple to create, and pop up in all sorts of situations (e.g., RNA secondary structure methods, which I did some work on years ago). Of course, Circos would be an obvious tool to use to create the visualisations, but the overhead of installing it and learning how to use it meant I took a shortcut and wrote some SVG from scratch.
Although I've focussed on GenBank as a source of data, this visualisation could also be applied to other data. I briefly touched on this in Tag trees: displaying the taxonomy of names in BHL where a page in the Biodiversity Heritage Library contains the names of a flea and its mammalian hosts. I think these circle plots would be a great way to highlight possible ecological associations mentioned in a text.
Given that the Twitter stream tagged #vizbi will fade away soon, I've grabbed most of the links I tweeted during VIZBI 2011 and have put them here. This isn't intended as a comprehensive list, merely the things which caught my eye, and didn't flash by faster than I could tweet.
I've spent the last three days at VIZBI, a Workshop on Visualizing Biological Data, held at the Broad Institute in Boston (note that "Broad" rhymes with "Code"). A great conference in a special venue that includes the DNAtrium. Videos of the talks will be online "real soon now", look for the keynotes, which were full of great ideas and visualisations. To get a flavour of the meeting search for the hashtag #vizbi on Twitter (you can also see the tweet stream on the VIZBI home page). All the keynotes were great, but I personally found Tamara Munzner's the most enlightening. She drew on lots of research in visual perception to outline what works and what doesn't when presenting information visually. You can grab a PDF of her presentation here.
One aspect of the meeting which worked really well was the poster presentations. Poster sessions were held during coffee breaks, and after the last talk of the session but before the audience broke for coffee, each author of a poster got 90 seconds to introduce their poster (there were typically around 10 posters per break). This meant the poster authors got a chance to introduce themselves and their work to the workshop audience, and the audience could discover what posters were being displayed. Neat idea.
More zoom viewer experiments (see previous post), this time with a linked map that updates as you browse the tree (SVG-capable browser required). As you browse the frog classification the map updates to show the location of georeferenced sequences in GenBank from the taxa in the part of the tree you are looking at. The map is limited to not more than 200 localities, and many frog sequences aren't georeferenced, but it's a fun way to combine classification and geography. You can try it at:
Continuing experiments with a zoom viewer for large trees (see previous post), I've now made a demo where the labels are clickable. If the NCBI taxon has an equivalent page in Wikipedia the demo displays a link to that page (and, if present, a thumbnail image). Give it a try at
Here's a quick demo of a 2D large tree viewer that I'm working on. The aim is to provide a simple way to view and navigate very large trees (such as the NCBI classification) in a web browser using just HTML and JavaScript. At the moment this is simply a viewer, but the goal is to add the ability to show "tracks" like a genome browser. For example, you could imagine columns appearing to the right of the tree showing you whether there are phylogenies available for these taxa in TreeBASE, images from Wikipedia, sparklines for sequencing activity over time, etc. I'll blog some more on the implementation details when I get the chance, but it's pretty straightforward. Image tiles are generated from SVG images of the tree using ImageMagick, labelling is applied on the fly using GIS-style queries to a MySQL database that holds the "world coordinates" of the nodes in the tree (see discussion of world coordinates on Google's Map API pages), and the zooming and tile fetching is based on Michal Migurski's Giant-Ass Image Viewer. Once I've tidied up a few things I'll put up a live demo so people can play with it.
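As a hint of how world coordinates relate to tiles, here is a small Python sketch following the Google Maps convention: the whole image occupies a 256 × 256 "world" at zoom 0, pixel coordinates are world coordinates multiplied by 2^zoom, and tiles are 256 pixels square. Whether the viewer uses exactly these numbers is an assumption, and the function name is made up for the example.

```python
import math

TILE_SIZE = 256  # pixels per tile edge, as in Google Maps

def world_to_tile(world_x, world_y, zoom):
    """Map world coordinates (0 <= value < 256) to a tile and pixel offset.

    Returns ((tile_x, tile_y), (offset_x, offset_y)) at the given zoom level.
    """
    scale = 2 ** zoom
    pixel_x = math.floor(world_x * scale)
    pixel_y = math.floor(world_y * scale)
    tile = (pixel_x // TILE_SIZE, pixel_y // TILE_SIZE)
    offset = (pixel_x % TILE_SIZE, pixel_y % TILE_SIZE)
    return tile, offset

# A node stored at world coordinates (64.5, 200.0), viewed at zoom level 3
tile, offset = world_to_tile(64.5, 200.0, 3)
print(tile, offset)  # (2, 6) (4, 64)
```

Storing node positions once in world coordinates means a label query for any tile at any zoom level reduces to a bounding-box lookup, which is the kind of query a spatially indexed database handles well.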
Matt Yoder (@mjyoder) and I had a Twitter conversation yesterday about phylogeny viewers, prompted by my tweeting about my latest displacement activity, a 2D tree browser using the tiling approach made popular by Google Maps.
This issue deserves more exploration, but here are some quick thoughts. 3D has been used in a number of phylogeny browsers, such as Mike Sanderson's Paloverde, Walrus, and the Wellcome Trust's Tree of Life. I don't find any terribly successful, pretty as they may be. I think there are several problems with trees in general, and 3D versions in particular.
Trees aren't real
Trees aren't real in the same way that the physical world is (or even imagined physical worlds). Trees are conceptual structures. The history of web interfaces is littered with attempts to visualise conceptual space, for example to summarise search results. These have been failures, a simple top ten list as used by Google wins. I don't think this is because Google's designers lack imagination, it's because it works. Furthermore, this is actually a very successful visualisation:
I think elaborate attempts to depict conceptual spaces on screens are mostly going to fail.
Trees are empty
Compared to, say, a geographic map, trees are largely empty space. In a map every pixel counts, in that it potentially represents something. Think of the satellite view in Google Maps. Each pixel on the screen has information. Trees are largely empty, hence much of the display space is wasted. Moving trees to 3D just gives us more space to waste.
Trees don't have a natural ordering
Even if we accept that trees are useful visualisations, they have problems. Given the tree ((1,2),(3,4)); we have a lot of (perhaps too much) freedom in how we can depict that tree. For example, both diagrams below depict this tree. In the x-axis there is a partial order of internal nodes (the ancestor of {1,2} must be to the right of the ancestor of {1,2,3,4}), but the tree ((1,2),(3,4)); says nothing about the relative ordering of {1,2} versus {3,4}. We are free to choose. A natural linear ordering would be divergence time, but estimates of those times can be contested, or unavailable.
Phylogenies are unordered trees in the sense that I can rotate any node about its ancestor and still have the same tree (compare the two trees above). Phylogenies are like mobiles:
The practical consequence of this is that different tree viewers can render the same tree in very different ways, making navigation across viewers unpredictable. Compare this to maps. Even if I use different projections, the maps remain recognisably similar, and most maps retain similar relationships between areas. If I look at a map of Glasgow and move left I will end up in the Atlantic Ocean, no matter if I use Google Maps or Microsoft Maps. Furthermore, trees grow in a way that maps don't (at least, not much). If I add nodes to a tree it may radically change shape, destroying navigation cues that I may have relied on before. Typically maps change by the addition of layers, not by moving bits around (paleogeographic maps excepted).
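The point about rotation can be made concrete in a few lines of Python. This sketch parses very simple Newick strings (leaf labels only, no branch lengths or internal node labels, which is an assumption of the example) and sorts children recursively, so that two drawings of the same unordered tree compare equal.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf labels."""
    text = text.strip().rstrip(";")
    pos = 0

    def subtree():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [subtree()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(subtree())
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return subtree()

def canonical(tree):
    """Sort children recursively so that rotated trees compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))

a = parse_newick("((1,2),(3,4));")
b = parse_newick("((4,3),(2,1));")  # same tree with every node rotated
c = parse_newick("((1,3),(2,4));")  # a genuinely different tree

print(a == b)                        # False: the drawings differ
print(canonical(a) == canonical(b))  # True: the trees are the same
print(canonical(a) == canonical(c))  # False
```

A viewer is free to pick any of these orderings, which is exactly why two viewers can present the same tree so differently.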
Trees aren't 3D
There's nothing intrinsically 3D about trees, which means any mapping to 3D space is going to be arbitrary. Indeed, most 3D viewers simply avoid any mapping and show a 2D tree in 3D space, which seems rather pointless.
Perhaps it's because I don't play computer games much (went through an Angry Birds phase, and occasionally pick up an X-Box controller, only to be mercilessly slaughtered by my son), but I'm not inspired by the analogy with computer games. I'm not denying that there are useful things to learn from games (I'm sure the controls in Google Earth owe something to games). But games also rely on a visceral connection with the play, and an understanding of the visual vocabulary (how to unlock treasure, etc.). Matt's 3D model requires users to learn a whole visual vocabulary, much of which (e.g., "Fruit on your tree? Someone has left comment(s) or feedback. ") seems forced.
My sense is that the most successful interfaces make the minimal demands on users, don't fight their intuition, and don't force them to accept a particular visualisation of their own cognitive space.
I'll write more about this once I get my 2D tree viewer into shape where it can be shown. It will be a lot less imaginative than Matt's vision, all I'm shooting for is that it is usable.
My views on TreeBASE are pretty well known. Lately I've been thinking a lot about how to "fix" TreeBASE, or indeed, move beyond it. I've made a couple of baby steps in this direction.
The first step is that I've created a group for TreeBASE papers on Mendeley. I've uploaded all the studies in TreeBASE as of December 13 (2010). Having these in Mendeley makes it easier to tidy up the bibliographic metadata, add missing identifiers (such as DOIs and PubMed ids), and correct citations to non-existent papers (which can occur if at the time the authors uploaded their data they planned to submit their paper to one journal, but it ended up being accepted in another). If you've a Mendeley account, feel free to join the group. If you've contributed to TreeBASE, you should find your papers already there.
The second step is playing with CouchDB (this year's new hotness), exploring ways to build a database of phylogenies that has nothing much to do with either a relational database or a triple store. CouchDB is a document store, and I'm playing with taking NeXML files from TreeBASE, converting them to something vaguely usable (i.e., JSON), and adding them to CouchDB. For fun, I'm using my NCBI to Wikipedia mapping to get images for taxa, so if TreeBASE has mapped a taxon to the NCBI taxonomy, and that taxon has a page in Wikipedia with an image, we get an image for that taxon. The reason for this is I'd really like a phylogeny database that was visually interesting. To give you some examples, here are trees from TreeBASE (displayed using SVG), together with thumbnails of images from Wikipedia:
Everything (tree and images) is stored within a single document in CouchDB, making the display pretty trivial to construct. Obviously this isn't a proper interface, and there are things I'd need to do, such as order the images in such a way that they matched the placement of the taxa on the tree, but at a glance you can see what the tree is about. We could then envisage making the images clickable so you could find out more about that taxon (e.g., text from Wikipedia, lists of other trees in the database, etc.).
We could expand this further by extracting geographical information (say, from the sequences included in the study) and make a map, or eventually a phylogeny on Google Earth (see David Kidd's recent "Geophylogenies and the Map of Life" for a manifesto doi:10.1093/sysbio/syq043).
One of the big things missing from databases like TreeBASE is a sense of "fun", or serendipity. It's hard to find stuff, hard to discover new things, make new connections, or put things in context. And that's tragic. Try a Google image search for treebase+phylogeny:
Call me crazy, but I looked at that and thought "Wow! This phylogeny stuff is cool!" Wouldn't it be great if that's the reaction people had when they looked at a database of evolutionary trees?
Being in an unusually constructive mood, I've spent the last couple of days playing with the TreeBASE II API, in an effort to find out how hard it would be to replace TreeBASE's frankly ghastly interface.
After some hair pulling and bad language I've got something to work. It's very crude, but gives a glimpse at what can be done. If you visit http://iphylo.org/~rpage/mytreebase/ and enter a taxon name, my code paddles off and queries TreeBASE to see if it has any phylogenies for that taxon. Gears grind, RSS feeds are crunched, a triple store is populated, NEXUS files are grabbed and Newick trees extracted, small creatures are needlessly harmed, and at last some phylogeny thumbnails are rendered in SVG (based on code I mentioned earlier), grouped by study. Functionality is limited (you can't click on the trees to make them bigger, for example), and the bibliographic information TreeBASE stores for studies is a bit ropey, but you get the idea.
What I'm looking for at this stage is a very simple interface that answers the question "show me the trees", which I think is the most basic question you can ask of TreeBASE (and one its own web interface makes unnecessarily hard). I've also gained some inspiration from the BioText search engine.
If you want to give it a try, here are some examples. These examples should be fairly responsive as the data is cached, but if you try searching for other taxa you may have a bit of a wait while my code talks to TreeBASE.
Having made a first stab at mapping NCBI taxa to Wikipedia, I thought it might be fun to see what could be done with it. I've always wanted to get quantum treemaps working (quantum treemaps ensure that the cells in the treemap are all the same size, see my 2006[!] blog post for further description and links). After some fussing I have some code that seems to do the trick. As an example, here is a quantum treemap for Laurasiatheria.
The diagram shows the NCBI taxonomy subtree rooted on Laurasiatheria, with images (where available) from Wikipedia for the children of the children of that node. In other words, the images correspond to the tips of the tree below:
There's a lot to be done to tidy this up, but there is potential to create a nice, visual way to navigate through the NCBI taxonomy (it might work well on the iPhone or iPad, for example).
Cooliris is a web browser plugin that can display a large number of images as a moving "infinite" wall. It's Friday, so for fun I added a media RSS feed to BioStor to make the BHL page scans available to Cooliris. The result is easier to show than describe, so take a peek at the video I made of A review of the Centrolenid frogs of Ecuador, with descriptions of new species (http://biostor.org/reference/20844):
Cooliris is a little flaky under Snow Leopard, but still works (the plug-in is cross platform). It is also available for the iPhone (and I'm assuming the iPad), which means you can get the experience on a mobile device.
Some serious displacement activity. I'm toying with adding phylogenies to iSpecies, probably sourced from the PhyLoTA browser. This raises the issue of how to display trees on a web page. PhyLoTA itself uses bitmap images, such as this one: but I'd like to avoid bitmaps. I toyed with using SVG, but that has its own series of issues (it basically has to be served as a separate file). So, I've spent a couple of hours playing with the <canvas> element. This enables some quite nice drawing to be done in a browser window, without plugins, SVG, or Flash. I wrote a quick PHP script to parse a Newick tree and draw it using <canvas>. It's really pretty simple, and the results are quite nice: One minor gotcha is interacting with the diagram (this is one advantage of SVG). Turns out we need a hack, so I've used the trick of a blank, transparent GIF and a usemap (see Greg Houston's Canvas Pie Chart with Tooltips). The picture above is a screen shot, you can see a live example here.
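The drawing itself boils down to computing a handful of line segments. The Python sketch below (the original script was PHP, so this is an illustration of the idea rather than that code) lays out a tree as a rectangular cladogram. It assumes the Newick string has already been parsed into nested tuples, and the spacing parameters are arbitrary.

```python
def layout(tree, x_step=40, y_step=20):
    """Compute label positions and line segments for a rectangular cladogram.

    `tree` is nested tuples with leaf labels as strings. Returns
    (labels, segments): labels are (name, x, y) and segments are
    ((x1, y1), (x2, y2)) pairs ready to be drawn as straight lines.
    """
    labels = []
    segments = []
    next_leaf = 0

    def place(node, depth):
        nonlocal next_leaf
        x = depth * x_step
        if isinstance(node, str):
            # Leaves are stacked top to bottom in the order encountered
            y = next_leaf * y_step
            next_leaf += 1
            labels.append((node, x, y))
            return x, y
        child_points = [place(child, depth + 1) for child in node]
        ys = [cy for _, cy in child_points]
        y = sum(ys) / len(ys)  # centre the node on its children
        segments.append(((x, min(ys)), (x, max(ys))))  # vertical bar
        for cx, cy in child_points:
            segments.append(((x, cy), (cx, cy)))  # horizontal branch
        return x, y

    place(tree, 0)
    return labels, segments

labels, segments = layout((("A", "B"), ("C", "D")))
print(labels)
print(len(segments), "segments")
```

Each segment maps onto a `moveTo`/`lineTo` pair on a canvas 2D context, and each label onto a text draw. Leaves at different depths are not aligned in this sketch.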
Random half-formed idea time. Thinking about marking up an article (e.g., from PLoS) with a phylogeny (such as the image below, see doi:10.1371/journal.pone.0001109.g001), I keep hitting the fact that existing web-based tree viewers are, in general, crap.
Given that a PLoS article is an XML document, it would be great if the tree diagram was itself XML, in particular SVG. But, in one sense, we don't want just a diagram, we want access to the underlying tree (for example, so we can play with it in other software). The tree may or may not be available in TreeBASE, but what if the diagram itself was the tree? In other words, imagine a tree viewing program could output SVG, structured in such a way that with an XSLT stylesheet the underlying tree could be extracted (say in Newick or, gack, NeXML) from the SVG, but users could take the SVG and embellish it (in Adobe Illustrator or Inkscape). The nice illustration and the tree data structure would be one and the same thing! No getting tree and illustration out of sync, and no hoping authors have put tree in a database somewhere -- the article contains the tree.
In order for this to happen, we need a tree viewer that exports SVG, and ideally would allow annotation so that the author could do most of the work within that program (ensuring that the underlying tree object isn't broken by graphic editing). Then export the SVG, add extra bits in Illustrator/Inkscape if needed, and have it incorporated into the article XML (which is what the publisher uses to render the article on the web). Simples.