
Time for some decent service

The BBC web site has an article entitled Giant deep sea jellyfish filmed in Gulf of Mexico which has footage of Stygiomedusa gigantea, and mentions an associated fish, Thalassobathia pelagica.


One thing that frustrates me beyond belief is how hard it is to get more information about these organisms. Put another way, the biodiversity informatics community is missing a huge opportunity here. There are a slew of services, such as Zemanta and OpenCalais.com, that can enrich the content of a document by identifying terms and adding links. Imagine a similar service that took taxonomic names and could provide information and links about that name, so that sites such as the BBC could enrich their pages. We've had various attempts at this [1], but we are still far from creating something genuinely useful.
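To make the idea concrete, here is a minimal sketch in Python of the simplest possible version of such a service: it looks for names it already knows in a piece of text and wraps them in links. Everything here is an assumption for illustration: the function name `link_names` and the hard-coded `KNOWN_NAMES` table are hypothetical, and exact string matching against a dictionary stands in for real taxonomic name recognition. A genuinely useful service would have to find names it has never seen and resolve them against a names database.

```python
import html
import re

# Hypothetical lookup table for illustration only; a real service would
# query a taxonomic names database. The links are ones mentioned in this post
# (the jellyfish is mapped to the paper that published its genus).
KNOWN_NAMES = {
    "Stygiomedusa gigantea": "https://doi.org/10.1038/1841527a0",
    "Thalassobathia pelagica": "http://biostor.org/reference/4339",
}


def link_names(text, names=KNOWN_NAMES):
    """Return text as HTML with each known taxonomic name wrapped in a link."""
    if not names:
        return html.escape(text)
    # Try longer names first so a full name wins over any shorter name inside it.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        name = match.group(0)
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append(
            f'<a href="{html.escape(names[name])}"><i>{html.escape(name)}</i></a>'
        )
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(link_names("Footage of Stygiomedusa gigantea and the fish Thalassobathia pelagica."))
```

The hard parts (spotting names in the first place, synonyms, and deciding what to link to) are exactly what this sketch leaves out.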

Part of the problem is that the plethora of taxonomic databases we have are often of little use. After fussing with Google I discover that Stygiomedusa gigantea (Browne, 1910) has the synonym Stygiomedusa fabulosa Russell, 1959 (see, e.g., the WoRMS database), but no database tells me that the genus Stygiomedusa was published by Russell in Nature in 1959 (doi:10.1038/1841527a0). Nor can I readily find the original reference for (Browne, 1910) in these databases [2]. Why is this so hard?

Then when we do have information, we fail to make it digestible. For example, the EOL page for Thalassobathia pelagica links to BHL pages, but fails to point out that the pages it links to belong to a single article, and that this article (http://biostor.org/reference/4339) is the original description of the fish.

Publishers are increasingly interested in any tools that can embellish their content. The organisation that gets their act together and provides a decent service for publishers (including academic journals, and news services such as the BBC) is going to own this space. Any takers...?

  1. Such as uBio LinkIT and EOL NameLink.
  2. After finding another taxon with the author Browne 1910 in BHL, I found Diplulmaris (?) gigantea, which looked like a good candidate for the original name for the jellyfish, see http://biodiversitylibrary.org/page/1727009. This is confirmed by the Smithsonian's Antarctic Invertebrates site.

What I want from a web phylogeny viewer - XML, SVG and Newick round tripping

Random half-formed idea time. Thinking about marking up an article (e.g., from PLoS) with a phylogeny (such as the image below, see doi:10.1371/journal.pone.0001109.g001), I keep hitting the fact that existing web-based tree viewers are, in general, crap.
53BC7C85-7D00-475D-AE8A-7D91FBE75068.jpg

Given that a PLoS article is an XML document, it would be great if the tree diagram was itself XML, in particular SVG. But, in one sense, we don't want just a diagram, we want access to the underlying tree (for example, so we can play with it in other software). The tree may or may not be available in TreeBASE, but what if the diagram itself was the tree? In other words, imagine a tree viewing program could output SVG, structured in such a way that with an XSLT stylesheet the underlying tree could be extracted (say in Newick or, gack, NeXML) from the SVG, but users could take the SVG and embellish it (in Adobe Illustrator or Inkscape). The nice illustration and the tree data structure would be one and the same thing! No getting tree and illustration out of sync, and no hoping authors have put the tree in a database somewhere -- the article contains the tree.

In order for this to happen, we need a tree viewer that exports SVG, and ideally would allow annotation so that the author could do most of the work within that program (ensuring that the underlying tree object isn't broken by graphic editing). Then export the SVG, add extra bits in Illustrator/Inkscape if needed, and have it incorporated into the article XML (which is what the publisher uses to render the article on the web). Simples.
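Here is a rough sketch of what round tripping might look like, in Python rather than XSLT. It rests on one assumption that is mine, not any existing viewer's: the SVG mirrors the tree, with every node drawn inside a `<g class="node">` element nested in its parent's `<g>`, and the node label stored in a `data-name` attribute. The function names (`parse_newick`, `to_svg`, `svg_to_newick`) are hypothetical, and the Newick parser handles names only, with no branch lengths, quoting or comments.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)


def parse_newick(text):
    """Parse a simple Newick string (names only) into nested (name, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1
        return (text[start:pos], children)

    return node()


def to_svg(tree, dx=40, dy=20):
    """Draw the tree as SVG, nesting each node's <g> inside its parent's <g>."""
    svg = ET.Element(f"{{{SVG_NS}}}svg")
    next_row = 0  # row index of the next leaf to be drawn

    def draw(node, parent_el, depth):
        nonlocal next_row
        name, children = node
        g = ET.SubElement(parent_el, f"{{{SVG_NS}}}g",
                          {"class": "node", "data-name": name})
        x = depth * dx
        if children:
            ys = [draw(child, g, depth + 1) for child in children]
            y = sum(ys) / len(ys)  # place the parent midway between its children
            for child_y in ys:
                # Elbow connector from this node to one child.
                ET.SubElement(g, f"{{{SVG_NS}}}path",
                              {"d": f"M{x} {y} V{child_y} H{x + dx}",
                               "fill": "none", "stroke": "black"})
        else:
            next_row += 1
            y = next_row * dy
            label = ET.SubElement(g, f"{{{SVG_NS}}}text",
                                  {"x": str(x + 4), "y": str(y + 4)})
            label.text = name
        return y

    draw(tree, svg, 0)
    return ET.tostring(svg, encoding="unicode")


def svg_to_newick(svg_text):
    """Recover the Newick string from the nesting of <g class="node"> elements."""
    g_tag = f"{{{SVG_NS}}}g"

    def walk(g):
        kids = [walk(c) for c in g if c.tag == g_tag and c.get("class") == "node"]
        subtree = "(" + ",".join(kids) + ")" if kids else ""
        return subtree + g.get("data-name", "")

    root = ET.fromstring(svg_text)
    return walk(root.find(g_tag)) + ";"


if __name__ == "__main__":
    svg = to_svg(parse_newick("((A,B),C);"))
    print(svg_to_newick(svg))
```

Because the tree lives in the nesting of the groups rather than in the coordinates, restyling lines or moving labels should not change what is extracted. Whether a given editor preserves the grouping and the `data-name` attributes when it saves is something that would need testing.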

Elsevier Grand Challenge paper out

At long last the peer-reviewed version of the paper "Enhanced display of scientific articles using extended metadata" (doi:10.1016/j.websem.2010.03.004), in which I describe my entry in the Elsevier Grand Challenge, has appeared in the journal Web Semantics: Science, Services and Agents on the World Wide Web. The pre-print version of this paper was online (hdl:10101/npre.2009.3173.1) for a year before the published version appeared (24 April 2009 versus 3 April 2010), and the Challenge entry itself went online in December 2008. Unfortunately the published version has an awful typo in the title (that was in neither the manuscript nor the proofs).

Given this typo, the time lag between doing the work, writing the manuscript, and seeing it published, and the fact that I've already been to meetings where my invitation was based on the entry and the pre-print, I do wonder why on Earth I would bother with traditional publication (which is somewhat ironic, given the topic of the paper).