To wiki or not to wiki?

What follows are some random thoughts as I try and sort out what things I want to focus on in the coming days/weeks. If you don't want to see some wallowing and general procrastination, look away now.

I see four main strands in what I've been up to in the last year or so:
  1. services
  2. mashups
  3. wikis
  4. phyloinformatics
Let's take these in turn.

Services
Not glamorous, but necessary. This is basically bioGUID (see also hdl:10101/npre.2009.3079.1). bioGUID provides OpenURL services for resolving articles (it has nearly 84,000 articles in its cache), looking up journal names, resolving LSIDs, and RSS feeds.
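
For readers who haven't met OpenURL, here is a minimal sketch of what a client-side article lookup could look like. The key names (`ctx_ver`, `rft.jtitle`, `rft.volume`, `rft.spage`, `rft.date`) come from the OpenURL Z39.88-2004 key/value format for journals. The base URL, the function name `build_openurl`, and the citation values are placeholders made up for illustration; nothing here is meant to describe the exact parameters bioGUID accepts.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_openurl(base_url, journal, volume, start_page, year=None):
    """Build an OpenURL (Z39.88-2004 key/value) query for a journal article.

    base_url is whatever resolver you are pointing at; the one used below
    is a placeholder, not a real service.
    """
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": journal,
        "rft.volume": str(volume),
        "rft.spage": str(start_page),
    }
    if year is not None:
        params["rft.date"] = str(year)
    # urlencode takes care of escaping spaces, colons and slashes
    return base_url + "?" + urlencode(params)


# Hypothetical resolver and made-up citation details
url = build_openurl("http://example.org/openurl", "Example Journal", 12, 345, 2009)
print(url)
```

The point is that a citation (journal, volume, starting page) is enough to form the request; the resolver's job is to turn that into an identifier such as a DOI.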

Mashups
iSpecies is my now aging tool for mashing up data from diverse sources, such as Wikipedia, NCBI, GBIF, Yahoo, and Google Scholar. I tweak it every so often (mainly to deal with Google Scholar forever mucking around with their HTML). The big limitation of iSpecies is that it doesn't make its results reusable (i.e., you can't write a script to call iSpecies and return data). However, it's still the place I go to to quickly find out about a taxon.

The other mashups I've been playing with focus on taking standardised RSS feeds (provided by bioGUID, see above) and mashing them up, sometimes with a nice front end (e.g., my e-Biosphere 09 challenge entry).

Wiki
I've invested a huge amount of effort in learning how wikis (especially Mediawiki and its semantic extensions) work, documented in earlier posts. I created a wiki of taxonomic names as a sandbox to explore some of these ideas.

I've come to the conclusion that for basic taxonomic and biological information, the only sensible strategy for our community is to use (and contribute to) Wikipedia. I'm struggling to see any justification for continuing with a proliferation of taxonomic databases. After e-Biosphere 09 the game's up: people have started to notice that we've an excess of databases (see Claire Thomas in Science, "Biodiversity Databases Spread, Prompting Unification Call", doi:10.1126/science.324_1632).

Phyloinformatics
In truth I've not been doing much on this, apart from releasing tvwidget (code available from Google Code), and playing with a mapping of TreeBASE studies to bibliographic identifiers (available as a featured download from here). I've played with tvwidget in Mediawiki, and it seems to work quite well.

Where now?
So, where now? Here are some thoughts:
  1. I will continue to hack bioGUID (it's now consuming RSS feeds from journals, as well as Zotero). Pretty much everything I do depends on the services bioGUID provides.

  2. iSpecies really needs a big overhaul to serve data in a form that can be built upon. But this requires decisions on what that format should be, so this isn't likely to happen soon. But I think the future of mashup work is to use RDF and triple stores (providing that some degree of editing is possible). I think a tool linking together different data sources (along the lines of my ill-fated Elsevier Challenge entry) has enormous potential.

  3. I'm exploring Wikipedia and Wikispecies. I'm tempted to do a quantitative analysis of Wikipedia's classification. I think there needs to be some serious analysis of Wikipedia if people are going to use it as a major taxonomic resource.

  4. If I focus on Wikipedia (i.e., using an existing wiki rather than try to create my own), then that leaves me wondering what all the playing with iTaxon was for. Well, actually I think the original goal of this blog (way back in December 2005) is ideally suited to a wiki. Pretty much all the elements are in place to dump a copy of TreeBASE into a wiki and open up the editing of links to literature and taxonomic names. I think this is going to handily beat my previous efforts (TbMap, doi:10.1186/1471-2105-8-158), especially as errors will be easy to fix.

So, food for thought. Now, I just need to focus a little and get down to actually doing the work.

Nexus Data Editor and Windows Vista

Sometimes it's just amazing/frightening how long a piece of software remains useful. I wrote Nexus Data Editor (NDE) in the late 1990s, mainly to keep my then PhD student Vince Smith happy. Vince was constructing a morphological dataset for lice, and he didn't like Macs (in those days; he's seen the light now). Even if he had, MacClade didn't allow him to wax lyrical about character states, so I wrote NDE for Windows (in those days this meant Windows 95 and NT). Vince and other students found it useful, so I wrote a manual and released it.

Turns out people still use NDE, but it doesn't install on Vista. I finally bit the bullet, installed a copy of Vista in VMware Fusion on my MacBook, and confirmed that the installation was broken. I feared I'd have to recompile NDE for Vista (a challenge, as it was built using the wonderful Borland C++ 5.02 compiler and IDE, now defunct), but it turns out it's the install package itself that's broken (built using InstallShield). The Inno Setup installer I use for TreeView X works fine, however, so I've used that for NDE as well.

The upshot is, if you use(d) NDE and have Vista, download a new copy of NDE from the web site, and it should work. Thanks to Mike Polcyn at the Southern Methodist University, Dallas, for the prompting that finally got this done.

GBIF and Linked Data

At the end of day two of the GBIF LSID-GUID Task Group I put together this crude diagram to summarise some of the possible links between biodiversity data and the larger linked data cloud, which I, among others, have argued is where biodiversity informatics should be heading. Here's my hastily put together diagram (created using the wonderful OmniGraffle):
[Figure: Links.jpg]


I've put GBIF at the centre since we're at GBIF, and it's them we are trying to convince. Yellow circles are biodiversity data sources, which aren't linked data providers (but some can be made so using, for example, my LSID proxy resolver); white circles are linked data sources.

The "sales pitch" is that if we join the linked data cloud we open up the possibility of some very powerful queries, especially ones that are outside the relatively narrow scope of what GBIF and TDWG concern themselves with. Imagine being able to query biodiversity data with respect to population and economic data across countries. These are the sort of things we could realistically aim for.

On a practical level, it also means biodiversity databases could devolve a lot of their tasks to other databases (via reusing identifiers). Some taxonomists have DBPedia URIs, and more could be added to Wikipedia (and so will find their way into DBPedia). Geonames provides geographic URIs which we could reuse, and so on. Within our own community we could do a better job of reusing our own identifiers, and reusing external ones (such as taxa in Wikipedia).
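
To show why shared identifiers matter, here is a toy sketch of a cross-dataset query. It uses plain Python tuples in place of a real triple store, and every URI and number in it is a made-up placeholder (the `ex:` identifiers merely stand in for things like Geonames or DBPedia URIs). The helper names `match` and `populations_where_taxon_occurs` are likewise invented for this example.

```python
# Toy (subject, predicate, object) triples. All identifiers and values are
# hypothetical placeholders, not real Geonames/DBPedia/GBIF data.
TRIPLES = {
    # "biodiversity" records
    ("ex:occurrence/1", "ex:taxon", "ex:taxon/A"),
    ("ex:occurrence/1", "ex:country", "ex:place/X"),
    ("ex:occurrence/2", "ex:taxon", "ex:taxon/A"),
    ("ex:occurrence/2", "ex:country", "ex:place/Y"),
    ("ex:occurrence/3", "ex:taxon", "ex:taxon/B"),
    ("ex:occurrence/3", "ex:country", "ex:place/X"),
    # an "external" dataset that happens to use the same place identifiers
    ("ex:place/X", "ex:population", 1000),
    ("ex:place/Y", "ex:population", 250),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def populations_where_taxon_occurs(triples, taxon):
    """Join occurrences -> places -> population via the shared place URIs."""
    result = {}
    for occurrence, _, _ in match(triples, p="ex:taxon", o=taxon):
        for _, _, place in match(triples, s=occurrence, p="ex:country"):
            for _, _, population in match(triples, s=place, p="ex:population"):
                result[place] = population
    return result


populations = populations_where_taxon_occurs(TRIPLES, "ex:taxon/A")
print(populations)
```

The join only works because both datasets name the place with the same identifier. If each database mints its own country identifiers, the second hop has nothing to match against, which is the practical argument for reuse.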

It's late, this is a rushed diagram, and I don't know if it's going to end up in whatever report we manage to assemble tomorrow (our final day). But I hope it captures some of the scope of what we're looking at. I know there are some problems (as have been pointed out to me on Twitter); I'll try and deal with these tomorrow.