
Fun things about crustaceans

One side effect of playing with ways to visualise and integrate biology databases is that you stumble across the weird and wonderful stuff that living organisms get up to. My earliest papers were on crustacean taxonomy, so I thought I'd try my latest toy on them.

What lives on crustaceans?

The "symbiome" graph for crustacea shows a range of associations, including marine bacteria (Vibrio), fungi (microsporidians), and other organisms, including other crustacea. Crustaceans are at the top of the circle (I'll work on labelling these diagrams a little better).

Crusthost

What do crustaceans live on?

Crustpara

Crustacea (in addition to parasitising other crustacea) parasitise several vertebrate groups, including fish and whales. But they also occur in terrestrial vertebrates. For example, sequence EF583871 is from the pentastomid worm Porocephalus crotali from a dog. When people think of terrestrial crustacea they usually don't think of parasites. There's also a prominent line from crustaceans to what turns out to be corals, representing coral-living barnacles.

It's instructive to compare this with insects, which similarly parasitise vertebrates. The striking difference is the association between insects and flowering plants.

Insect

I guess these really need to be made interactive, so we could click on them and discover more about the association represented by each line in the diagram.

Visualising the symbiome: hosts, parasites, and the Tree of Life

Back in 2006 in a short post entitled "Building the encyclopedia of life" I wrote that GenBank is a potentially rich source of information on host-parasite relationships. Often sequences of parasites will include information on the name of the host (the example I used was sequence AF131710 from the platyhelminth Ligophorus mugilinus, which records the host as the Flathead mullet Mugil cephalus).

I've always wanted to explore this idea a bit more, and have finally made a start, in part inspired by the recent VIZBI 2011 meeting. I've grabbed a large chunk of GenBank, mined the sequences for host records, and created some simple visualisations of what I'm terming (with tongue firmly in cheek) the "symbiome". Jonathan Eisen will not be happy, but I need a word that describes the complete set of hosts, mutualists, and symbionts with which an organism is associated, and "symbiome" seems appropriate.

Human symbiome
To illustrate the idea, below is the human "symbiome". This diagram shows all the taxa in GenBank arranged in a circle, with lines connecting those organisms that have DNA sequences where humans are recorded as their host.

Human

At a glance, we have a lot of bacteria (the gray bar with E. coli) and fungi (blue bar with Yeast), and a few nematodes and arthropods.

Fig tree symbiome
Next up are organisms collected from fig trees (genus Ficus).

Ficus
Fig trees have wasp pollinators (the dark line landing near the honey bee Apis), as well as nematodes (dark line landing near Caenorhabditis elegans). There are also some associations with fungi and other arthropods.

Which taxa host insects?
Next up is a plot of all associations involving insects and a host.

Insect
The diagram is dominated by insect-flowering plant interactions, followed by insect-vertebrate associations (most likely bird and mammal lice).

Which taxa are hosted by insects?
We can reverse the question and ask what organisms are hosted by insects:

Insectashost
Lots of associations between insects and fungi, as well as bacteria, and a few other organisms, such as nematodes, and Plasmodium (the organism which causes malaria).

Frog symbiome
Lastly, below is the symbiome of frogs. "Worms" feature prominently, as well as the fungus that causes chytridiomycosis.

Frog

How the visualisation was made

The symbiome visualisations were made as follows. Firstly, DNA sequences were downloaded from EMBL and run through a script that extracted as much metadata as possible, including the contents of the host field (where present). I then took the NCBI taxonomy and generated an ordered list of taxa by walking the tree in postorder; a taxon's position in this list determines where on the circumference of the circle it lies. Pairs of taxa in an association are connected by a quadratic Bezier curve. The illustration was created using SVG.
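To make the layout step concrete, here is a minimal Python sketch of the approach described above. It is not the script I actually used. The function names, the toy dictionary-based tree representation, and the choice of the circle's centre as the Bezier control point are all assumptions made for illustration.

```python
import math


def postorder(tree, root):
    """Return node names in postorder (children before their parent).

    `tree` maps a node name to a list of child names (hypothetical format).
    """
    order, stack = [], [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        else:
            stack.append((node, True))
            # Push children reversed so they are visited in listed order.
            for child in reversed(tree.get(node, [])):
                stack.append((child, False))
    return order


def circle_positions(order, cx, cy, r):
    """Place each taxon on the circle according to its index in `order`."""
    n = len(order)
    return {
        name: (cx + r * math.cos(2 * math.pi * i / n),
               cy + r * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(order)
    }


def association_path(p, q, control):
    """SVG path for a quadratic Bezier from p to q via one control point."""
    (x1, y1), (x2, y2), (qx, qy) = p, q, control
    return (f'<path d="M {x1:.1f} {y1:.1f} Q {qx:.1f} {qy:.1f} '
            f'{x2:.1f} {y2:.1f}" fill="none" stroke="black" '
            f'stroke-opacity="0.3"/>')


def symbiome_svg(tree, root, associations, size=400):
    """Draw one curve per (taxon, host) pair; unknown names are skipped."""
    c = size / 2
    pos = circle_positions(postorder(tree, root), c, c, 0.9 * c)
    # Assumption: the control point is the centre of the circle, which
    # pulls every curve inwards.
    paths = [association_path(pos[a], pos[b], (c, c))
             for a, b in associations if a in pos and b in pos]
    body = "\n".join(paths)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size}" height="{size}">\n{body}\n</svg>')
```

Calling `symbiome_svg` with a small tree and a list of (parasite, host) name pairs returns an SVG string that can be saved to a file and opened in a browser. Using the centre as the control point means that associations between distant taxa cut across the circle, while associations between neighbouring taxa stay near the rim.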


Next steps
There are several ways this visualisation could be improved. It's based only on a subset of the data (I haven't run all of the sequence databases through the parser yet), and the matching of host taxa is based on exact string matching. All manner of weird and wonderful things get entered in the host field, so we'll need some more sophisticated parsing (see "LINNAEUS: A species name identification system for biomedical literature" doi:10.1186/1471-2105-11-85 for a more general discussion of this issue).
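For the curious, the exact-matching step amounts to something like the sketch below. This is an illustration rather than my actual parser. The function names and the `name_to_taxid` lookup are hypothetical, and the regular expression only handles a `/host="..."` qualifier that fits on a single line of the flat file.

```python
import re

# Matches a single-line /host="..." qualifier in the feature table of a
# GenBank or EMBL flat file. Values that wrap over several lines are not
# handled in this sketch.
HOST_RE = re.compile(r'/host="([^"]*)"')


def extract_hosts(flatfile_text):
    """Return the raw host strings found in a flat-file record, in order."""
    return [m.group(1).strip() for m in HOST_RE.finditer(flatfile_text)]


def match_host(host, name_to_taxid):
    """Exact string match of a host value against known taxon names.

    `name_to_taxid` is a hypothetical dict built from the taxonomy names.
    Returns the taxon id, or None if the string is not a known name.
    """
    return name_to_taxid.get(host)
```

Anything other than a bare scientific name (a common name, a name followed by extra notes, a misspelling) returns `None` from `match_host`, which is exactly the limitation described above.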

The visualisation is fairly crude at this stage. Circle plots like this are fairly simple to create, and pop up in all sorts of situations (e.g., RNA secondary structure methods, which I did some work on years ago). Of course, Circos would be an obvious tool to use to create the visualisations, but the overhead of installing it and learning how to use it meant I took a shortcut and wrote some SVG from scratch.

Although I've focussed on GenBank as a source of data, this visualisation could also be applied to other data. I briefly touched on this in "Tag trees: displaying the taxonomy of names in BHL", where a page in the Biodiversity Heritage Library contains the names of a flea and its mammalian hosts. I think these circle plots would be a great way to highlight possible ecological associations mentioned in a text.

TreeBASE meets NCBI, again

Déjà vu is a scary thing. Four years ago I released a mapping between names in TreeBASE and other databases called TBMap (described here: doi:10.1186/1471-2105-8-158). Today I find myself releasing yet another mapping, as part of my NCBI to Wikipedia project. Because this mapping is embedded in a wiki, it can be edited, so the kinds of problems I encountered with TBMap (recounted here, here, and here) can be fixed. The mapping in and of itself isn't terribly exciting, but it's the starting point for some things I want to do regarding how to visualise the data in TreeBASE.

Because TreeBASE 2 has issued new identifiers for its taxa (see TreeBASE II makes me pull my hair out), and now contains its own mapping to the NCBI taxonomy, as a first pass I've taken their mapping and added it to http://iphylo.org/linkout. I've also added some obvious mappings that TreeBASE has missed. There are a lot more taxa which could be added, but this is a start.

The TreeBASE taxa that have a mapping each get their own page with a URL of the form http://iphylo.org/linkout/<TreeBase taxon identifier>, e.g. http://iphylo.org/linkout/TB2:Tl257333. This page simply gives the name of the taxon in TreeBASE and the corresponding NCBI taxon id. It uses a Semantic Mediawiki template to generate a statement that the TreeBASE and NCBI taxa are a "close match". If you go to the corresponding page in the wiki for the NCBI taxon (e.g., http://iphylo.org/linkout/Ncbi:448631) you will see any corresponding TreeBASE taxa listed there. If a mapping is erroneous, we simply need to edit the TreeBASE taxon page in the wiki to fix it. Nice and simple.
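If you want to generate these page URLs in a script, the pattern can be captured in a couple of lines of Python. The helper names here are my own invention for this sketch. Only the URL patterns come from the examples above.

```python
LINKOUT_BASE = "http://iphylo.org/linkout"


def treebase_page(treebase_taxon_id):
    """Wiki page for a TreeBASE taxon identifier such as 'TB2:Tl257333'."""
    return f"{LINKOUT_BASE}/{treebase_taxon_id}"


def ncbi_page(ncbi_taxon_id):
    """Wiki page for an NCBI taxon id such as 448631."""
    return f"{LINKOUT_BASE}/Ncbi:{ncbi_taxon_id}"
```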

At the time of writing the initial mapping is still being loaded (this can take a while). I'll update this post when the uploading has finished.

VIZBI 2011

I've spent the last three days at VIZBI, a Workshop on Visualizing Biological Data, held at the Broad Institute in Boston (note that "Broad" rhymes with "Code"). A great conference in a special venue that includes the DNAtrium. Videos of the talks will be online "real soon now"; look for the keynotes, which were full of great ideas and visualisations. To get a flavour of the meeting search for the hashtag #vizbi on Twitter (you can also see the tweet stream on the VIZBI home page). All the keynotes were great, but I personally found Tamara Munzner's the most enlightening. She drew on lots of research in visual perception to outline what works and what doesn't when presenting information visually. You can grab a PDF of her presentation here.

One aspect of the meeting which worked really well was the poster presentations. Poster sessions were held during coffee breaks, and after the last talk of the session but before the audience broke for coffee, each author of a poster got 90 seconds to introduce their poster (there were typically around 10 posters per break). This meant the poster authors got a chance to introduce themselves and their work to the workshop audience, and the audience could discover what posters were being displayed. Neat idea.

I gave a presentation on phylogenies, which I've put on SlideShare. After explaining that I thought phylogeny visualisation was mostly a solved problem (as evidenced by the large number of tree viewers available), I continued the theme of why I don't think 3D works for phylogeny (except for geophylogenies), made the pitch for building a phylogeny viewer on the iPad, and finished with my recent work on Google Maps-style viewing of very large trees.