BLAST a sequence and get a tree

For this week's sessions of my phyloinformatics course I'm developing some phylogeny tools. The first is a simple AJAX-based BLAST tool. I've always wanted a quick way to see a GenBank sequence in its phylogenetic context, so I've built a simple tool that takes a GenBank accession number or GI number, submits a BLAST job, retrieves the sequences, aligns them using CLUSTALW, builds a quick and dirty neighbour-joining tree using PAUP*, then displays the tree using SVG (if your browser doesn't support this you won't see the tree). One use for this is to quickly get a sense of whether an unnamed ("dark") taxon is related to sequences that have been identified.

Nothing fancy, but it was a chance to display the whole process in the browser without opening new windows or refreshing the page. Here's an example for the GenBank sequence FJ559186:



For the technically-minded, the calls to BLAST and the alignment and tree construction tools all use AJAX, and there's a simple JavaScript timer to count down the seconds that the NCBI BLAST web service estimates the BLAST job will take, before we poll NCBI to see if the job has in fact finished. The code is on GitHub.
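As a rough illustration of the wait-then-poll logic described above, here is a minimal Python sketch. It is not the code from the demo (which runs as JavaScript in the browser). The function name `wait_for_job`, the `check_status` callback, the polling parameters and the status strings "WAITING" and "READY" are all assumptions made for the example, and the callback stands in for whatever request actually asks NCBI about the job.

```python
import time
from typing import Callable


def wait_for_job(check_status: Callable[[], str],
                 estimated_seconds: float,
                 poll_interval: float = 5.0,
                 max_polls: int = 20,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait out the server's estimate, then poll until the job is ready.

    check_status is a hypothetical callback that returns "WAITING",
    "READY", or anything else to signal failure. Returns True when the
    job is ready and False if we give up after max_polls attempts.
    """
    # Count down the estimate one second at a time. A browser version
    # would update a visible timer on each tick.
    remaining = estimated_seconds
    while remaining > 0:
        step = min(1.0, remaining)
        sleep(step)
        remaining -= step

    # The estimate is only an estimate, so keep asking until the job is done.
    for _ in range(max_polls):
        status = check_status()
        if status == "READY":
            return True
        if status != "WAITING":
            raise RuntimeError(f"job did not complete: {status}")
        sleep(poll_interval)
    return False
```

Passing `sleep` in as a parameter makes the timing easy to test without actually waiting, and capping the number of polls stops a stuck job from hanging the page indefinitely.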

Extracting museum specimen codes from text

Quick note about a tool I've cobbled together as part of the phyloinformatics course, which addresses a long-standing need I and others have to extract specimen codes from text. I've had this code kicking around for a while (as part of various never-finished data mining projects), but never got around to releasing it, until now. It is very crude (basically a bunch of regular expressions), and there's a lot which could be done to improve it (not least starting with a complete list of museum specimen codes, rather than just those I've come across in, say, Zootaxa and BioStor).

You can try the tool at http://iphylo.org/~rpage/phyloinformatics/services/specimenparser.php. Paste in some text and it will try to extract museum codes. The tool tries to handle ranges of specimens (e.g., MHNSM 1808-09), and some of the more common specimen numbering schemes.
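To give a flavour of the regular-expression approach, including ranges such as MHNSM 1808-09, here is a minimal Python sketch. It is not the tool's actual code: the pattern, the function names `expand_range` and `extract_specimens`, and the rule that a shortened range end inherits the leading digits of the start are all assumptions for illustration, and a real parser would need a proper list of museum acronyms to avoid false positives.

```python
import re
from typing import List

# Hypothetical pattern: an upper-case acronym of 2-6 letters, whitespace,
# a catalogue number, and optionally a range end after a hyphen or en dash.
SPECIMEN_RE = re.compile(r"\b([A-Z]{2,6})\s+(\d+)(?:\s*[-\u2013]\s*(\d+))?\b")


def expand_range(start: str, end: str, max_span: int = 1000) -> List[str]:
    """Expand a numeric range, e.g. ('1808', '09') -> ['1808', '1809']."""
    # A shortened end such as '09' is assumed to reuse the start's leading digits.
    if len(end) < len(start):
        end = start[: len(start) - len(end)] + end
    first, last = int(start), int(end)
    # Reversed or implausibly wide: keep only the first number rather than guess.
    if last < first or last - first > max_span:
        return [start]
    # zfill preserves any zero padding used in the original number.
    return [str(n).zfill(len(start)) for n in range(first, last + 1)]


def extract_specimens(text: str) -> List[str]:
    """Return specimen codes found in text, with ranges expanded."""
    codes: List[str] = []
    for match in SPECIMEN_RE.finditer(text):
        acronym, start, end = match.groups()
        numbers = expand_range(start, end) if end else [start]
        codes.extend(f"{acronym} {number}" for number in numbers)
    return codes


print(extract_specimens("Holotype MHNSM 1808-09, paratype KU 12345."))
```

A pattern this loose will happily match things like an upper-case abbreviation followed by a year, which is exactly why a curated list of collection acronyms would be such an improvement.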

Comments welcome. If you are looking for a source of text, papers in ZooKeys or Zootaxa are a good place to start (especially papers on vertebrates, where specimen numbers are often used). BioStor is also a good source: if you're looking at a paper in BioStor, click on the "Text" link to get the OCR text for an article and paste that into the form. For example, the text for Systematics of the Bufo coccifer complex (Anura: Bufonidae) of Mesoamerica is available at http://biostor.org/reference/97426.text.

The extraction tool can also be called as a web service using POST to get back the results in JSON.
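Calling the service from a script might look something like the Python sketch below. The URL is the one given above, but the form field name (`text` here) and the shape of the JSON that comes back are guesses on my part for the purposes of the example, so check the form's source before relying on them. The helper names `build_request` and `extract_codes` are likewise made up.

```python
import json
import urllib.parse
import urllib.request

SERVICE_URL = "http://iphylo.org/~rpage/phyloinformatics/services/specimenparser.php"


def build_request(text: str, field: str = "text") -> urllib.request.Request:
    """Build a form-encoded POST request.

    The field name is an assumption; pass a different one if the
    service expects something else.
    """
    body = urllib.parse.urlencode({field: text}).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def extract_codes(text: str, field: str = "text", timeout: float = 30.0):
    """POST the text to the service and return the decoded JSON response."""
    request = build_request(text, field)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Something like `extract_codes("Holotype MHNSM 1808-09")` would then send the text and hand back whatever JSON structure the service returns.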

Open course on phyloinformatics

As part of a postgraduate course here at the University of Glasgow I'm teaching five sessions on "phyloinformatics", which I've decided to define broadly enough to encompass most of biodiversity informatics.

Given that this module is being developed on the fly, and will make use of lots of little "toys" I've developed and discussed on this blog, I've decided to put the course notes online, along with the interactive demos and the source code. So, if you want to follow along for the next couple of weeks, here are the links:



Each course page supports comments (see the bottom of the page), so feel free to add comments or suggestions. The notes are at a crude stage, and will be developed over the duration of the course (2 weeks). I'm also endeavouring to get all the source code for the demonstration apps into GitHub. None of these demos is polished, but they will hopefully provide some ideas for taking them further. There will be iSpecies-like mashups, iPad webapps, classification visualisations, TreeBASE search tools, geophylogenies and other phylogeny viewers.