Search this keyword

EOL Computable Data Challenge community

17823 130 130Now we are awash in challenges! EOL has announced its Computable Data Challenge:
We invite ideas for scientific research projects that use EOL, including the Biodiversity Heritage Library (BHL), to answer questions in biology. The specific field of biological interest for the challenge is open; projects in ecology, evolution, behavior, conservation biology, developmental biology, or systematics may be most appropriate. Projects advancing informatics alone may be less competitive. EOL may be used as a source of biological information, to establish a sampling strategy, to assist the retrieval of computable data by mapping identifiers across sources (e.g. to accomplish name resolution), and/or in other innovative ways. Projects involving data or text or image mining of EOL or BHL content are encouraged. Current EOL data and API shall be used; suggestions for modification of content or the API could be a deliverable of the project. We encourage the use of data not yet in EOL for analyses. In all cases projects must honor terms of use and licensing as appropriate.

Some $US 50,000 is on offer. "Challenge" is perhaps a misnomer, as EOL is offering this money not as a prize at the end, but rather to fund one or more proposals (submitted by 22 May) that are accepted. So, it's essentially a grant competition (with a pleasingly minimal amount of administrivia). There is also a Computable Data Challenge community to discuss the challenge.

It's great to see EOL trying different strategies to engage with developers. Of the different challenges EOL is running this one is perhaps the most appealing to me, because one of my biggest complaints about EOL is that it's hard to envisage "doing science" with it. For example, we can download GenBank and cluster sequences into gene families, or grab data from GBIF and model species distributions, but what could we do with EOL? This challenge will be a chance to explore the extent to which EOL can support science, which I would argue will be a key part of its long term future.

BHL and GBIF as biomedical databases

When I think of the Biodiversity Heritage Library (BHL) or GBIF I tend to think of taxonomy and biodiversity. Folk wisdom has it that BHL is full of old books, mostly pre-1923. Great for finding old taxonomic names, or nice artwork, but not exactly "modern" biology. GBIF is mainly about displaying organism distributions based on museum specimens, the primary data of taxonomic research. Again, great stuff, but aren't museums simply full of dead stuff that people have collected and forgotten about?

But BHL has a lot more post-1923 content than I suspect most people realise (several museum or society journals have 21st century issues in BHL's archives, for example). Continuing the theme of linking BHL and GBIF content, as part of a forthcoming project on taxonomic names (to be made available "real soon now") I stumbled across this 1976 paper in BHL (now in BioStor):

Monograph on "Lithoglyphopsis" aperta, the snail host of Mekong River Schistosomiasis by Davis et al..

Malacologia157576inst 0263

This paper has been indexed in PubMed (PMID:948206, but as far as I'm aware, BHL (and BioStor) has the only digital copy of this paper. (As a side note, wouldn't it be great if PubMed could link to BHL content?).

The article page in BioStor shows a map derived from the OCR text, showing a two localities:

Mekong

Below the map are the specimen codes I've automatically extracted from the OCR text, linked to the corresponding records in GBIF, which are georeferenced (e.g., ANSP Malacology 330925).

If we joined these things up just a little more, we could do some useful things. For example, what if a researcher searching in PubMed for schistosomiasis in South East Asia could find the Davis et al. paper, and then go to BHL or BioStor to read it? What if a researcher looking at gastropod distributions in the Mekong River in the GBIF portal could see that BHL had publications on diseases associated with these organisms (as well as their taxonomy and biology). We could also traverse the link from GBIF to BHL to PubMed and provide a direct route from distribution maps to biomedical literature.

It seems there's scope for trying to connect BHL, GBIF, and PubMed, and that BHL and GBIF may have important roles to play in providing access to basic information about organisms that have a serious impact on human populations.

iEvoBio 2012 Challenge: Synthesizing phylogenies

0150The iEvoBio 2012 Challenge has been announced, and the topic is synthesizing phylogenies. The task:

Somewhere, buried in large sets of trees, lies a stunning new revelation, a baffling discovery, the answer to a longstanding controversy, or simply something not obvious to the naked eye. The mission of the 2012 iEvoBio challenge is to find those revelations, discoveries and answers within your own data and/or within one of the datasets provided by the challenge. What new scientifically interesting results can you pull from these trees, using any combination of techniques at your disposal?


The rules of this challenge are:
  1. The set of trees you use must have at least 10,000 leaves in total. Acceptable entries could be a set comprising 2,500 distinct trees covering the same four taxa, a single tree with 10,000+ leaves, or anything in between.
  2. Your results must be scientifically new.
  3. The data, or at least a description of the data, must be publicly available. If working with your own dataset, you must at least provide a summary of the data you used (see below for the minimum description that must be provided).
  4. The source code of any tool and/or method developed as part of your challenge submission must be publicly downloadable under an OSI-approved open-source license (or dedicated to the public domain) at the latest by the time of the conference.


For more details see the challenge site. Deadline for submission is June 25, 2012.