The top-ten new species described in 2010 and the failure of taxonomy to embrace Open Access publication

Each year the grandly titled International Institute for Species Exploration (IISE) publishes a list of the top 10 species described in the previous year. This year's list is reproduced below, to which I've added the links to the original publications (why do people still think it's OK to omit links to the primary literature when all of these articles are online?).

The striking thing is that only 2 of the 10 species were described in Open Access publications (and I use that term loosely, as Arthropod Systematics & Phylogeny PDFs are freely available, but the licensing isn't clear). Sadly much of our knowledge of the planet's diversity is still locked up behind a paywall.

| Species | Reference | DOI/PDF | Open Access |
| --- | --- | --- | --- |
| Caerostris: Darwin's Bark Spider | Kuntner, M. and I. Agnarsson. 2010. Web gigantism in Darwin's bark spider, a new species from Madagascar (Araneidae: Caerostris). The Journal of Arachnology 38(2):346-356 | 10.1636/B09-113.1 | No |
| Mycena: Bioluminescent Mushroom | Desjardin, D.E., B.A. Perry, D.J. Lodge, C.V. Stevani, and E. Nagasawa. 2010. Luminescent Mycena: new and noteworthy species. Mycologia 102(2):459-477 | 10.3852/09-197 | No |
| Halomonas: Bacterium | Sanchez-Porro, C., B. Kaur, H. Mann and A. Ventosa. 2010. Halomonas titanicae sp. nov., a halophilic bacterium isolated from the RMS Titanic. International Journal of Systematic and Evolutionary Microbiology 60(12):2768-2774 | 10.1099/ijs.0.020628-0 | No |
| Varanus: Monitor Lizard | Welton, L.J., C.D. Siler, D. Bennett, A. Diesmos, M.R. Duya, R. Dugay, E.L.B. Rico, M. van Weerd and R.M. Brown. 2010. A spectacular new Philippine monitor lizard reveals a hidden biogeographic boundary and a novel flagship species for conservation. Biology Letters 6(5):654-658 | 10.1098/rsbl.2010.0119 | No |
| Glomeremus: Pollinating cricket | Hugel, S., C. Micheneau, J. Fournel, B.H. Warren, A. Gauvin-Bialecki, T. Pailler, M.W. Chase and D. Strasberg. 2010. Glomeremus species from the Mascarene islands (Orthoptera, Gryllacrididae) with the description of the pollinator of an endemic orchid from the island of Réunion. Zootaxa 2545:58-68 | PDF | No |
| Philantomba: Duiker | Colyn, M., J. Hulselmans, G. Sonet, P. Oudé, J. de Winter, A. Natta, Z.T. Nagy and E. Verheyen. 2010. Discovery of a new duiker species (Bovidae: Cephalophinae) from the Dahomey Gap, West Africa. Zootaxa 2637:1-30 | PDF | No |
| Tyrannobdella: Leech | Phillips, A.J., R. Arauco-Brown, A. Oceguera-Figueroa, G.P. Gomez, M. Beltran, Y.-T. Lai and M.E. Siddall. 2010. Tyrannobdella rex n. gen. n. sp. and the evolutionary origins of mucosal leech infestations. PLoS ONE 5(4):e10057 | 10.1371/journal.pone.0010057 | Yes |
| Psathyrella: Underwater mushroom | Frank, J.L., R.A. Coffan and D. Southworth. 2010. Aquatic gilled mushrooms: Psathyrella fruiting in the Rogue River in southern Oregon. Mycologia 102(1):93-107 | 10.3852/07-190 | No |
| Saltoblattella: Jumping cockroach | Bohn, H., M. Picker, K.-D. Klass and J. Colville. 2010. A jumping cockroach from South Africa, Saltoblattella montistabularis, gen. nov., spec. nov. (Blattodea: Blattellidae). Arthropod Systematics and Phylogeny 68(1):53-69 | PDF | Yes |
| Halieutichthys: Pancake Batfish | Ho, H.-C., P. Chakrabarty and J.S. Sparks. 2010. Review of the Halieutichthys aculeatus species complex (Lophiiformes: Ogcocephalidae), with descriptions of two new species. Journal of Fish Biology 77(4):841-869 | 10.1111/j.1095-8649.2010.02716.x | No |

BioStor article published (finally)

My article describing BioStor, "Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library", has finally seen the light of day in BMC Bioinformatics (doi:10.1186/1471-2105-12-187; the DOI is not working at the moment, so give it a little while to go live. In the meantime you can access the article here).

Getting this article published was more work than I expected. There seems to be an inverse correlation between how important I think the work is and how easy it is to get published: the more straightforward I think the article is, the more work it is to convince the referees of its merits. Of course, it may be that my judgement of the article's merits influences how much effort I put into making the manuscript as rigorous and clear as possible. And perhaps having a blog has spoiled me. I really struggle with the notion that it takes months to publish a paper, especially as most of the intellectual debate involved (i.e., the refereeing process) is behind closed doors, compared to the open and immediate nature of commentary on a blog post.

However, despite my frustrations with the refereeing process, there's no doubt that it did improve the manuscript (you can see the original version at Nature Precedings, hdl:10101/npre.2010.4928.1).

With the publication of this article, and last week's conversation with Anurag Acharya and Darcy Dapra about getting BioStor indexed by Google Scholar, it has been a good few days for BioStor.



BHL, DjVu, and reading the f*cking manual

One of the biggest challenges I've faced with the BioStor project, apart from dealing with messy metadata, has been handling page images. At present I get these from the Biodiversity Heritage Library. They are big (typically 1 MB in size), and have the caramel colour of old paper. Nothing fills up a server quicker than thousands of images.

A while ago I started playing with ImageMagick to resize the images, making them smaller, as well as with ways to remove the background colour, leaving just black text and lines on a white background.

Before and after converting BHL image


I think this makes the page image clearer, as well as removing the impression that this is some ancient document, rather than a scientific article. Yes, it's the Biodiversity Heritage Library, but the whole point of the taxonomic literature is that it lasts forever. Why not make it look as fresh as when it was first printed?

Working out how to best remove the background colour takes some effort, and running ImageMagick on every image that's downloaded starts putting a lot of stress on the poor little Mac Mini that powers BioStor.
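For illustration, here is a minimal sketch of the kind of ImageMagick call involved, wrapped in Python. It is not the exact command BioStor uses: the function names, the 50% resize, and the 60% threshold are all assumptions chosen for the example, and in practice the threshold needs tuning per scan.

```python
import subprocess


def clean_page_command(src, dest, scale_percent=50, threshold_percent=60):
    """Build an ImageMagick command that shrinks a page scan and whitens its background.

    The default percentages are illustrative assumptions, not tuned values.
    """
    return [
        "convert",                      # ImageMagick 6 CLI name; ImageMagick 7 uses "magick"
        str(src),
        "-resize", f"{scale_percent}%",  # make the image smaller
        "-colorspace", "Gray",           # discard the caramel tint
        # Pixels brighter than the threshold become white, the rest black.
        "-threshold", f"{threshold_percent}%",
        str(dest),
    ]


def clean_page(src, dest, **options):
    """Run the command. Requires ImageMagick to be installed and on the PATH."""
    subprocess.run(clean_page_command(src, dest, **options), check=True)
```

Calling `clean_page("page.jpg", "page.png")` spawns one ImageMagick process per image, which is exactly the per-download cost that adds up on a small server.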

Then there's the issue of having an iPad viewer for BHL, and making it interactive. So, I started looking at the DjVu files generated by the Internet Archive, and wondering whether it would make more sense to download those and extract images from them, rather than go via the BHL API. I'll need the DjVu files for the text layout anyway (see Towards an interactive DjVu file viewer for the BHL).

I couldn't remember the command to extract images from DjVu, but I did remember that Google is my friend, which led me to this question on Stack Overflow: Using the DjVu tools to for background / foreground seperation?.

OMG! DjVu tools can remove the background? A quick look at the documentation confirmed it. So I did a quick test. The page on the left is the default page image, the page on the right was extracted using ddjvu with the option -mode=foreground.
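A test like the one just described could be scripted roughly as follows. This is a sketch, assuming DjVuLibre's `ddjvu` is installed; the function names, the file names in the usage note, and the choice of TIFF output are my own assumptions, while `-mode=foreground` is the option mentioned above.

```python
import subprocess


def ddjvu_foreground_command(djvu_path, out_path, page=1):
    """Build a ddjvu command that renders one page with only its foreground layer."""
    return [
        "ddjvu",
        "-format=tiff",       # output format (an assumption; ddjvu also writes pnm and pdf)
        f"-page={page}",      # page to render, counted from 1
        "-mode=foreground",   # drop the background layer, keeping text and lines
        str(djvu_path),
        str(out_path),
    ]


def extract_foreground(djvu_path, out_path, page=1):
    """Run ddjvu. Requires DjVuLibre to be installed and on the PATH."""
    subprocess.run(ddjvu_foreground_command(djvu_path, out_path, page), check=True)
```

For example, `extract_foreground("book.djvu", "page507.tif", page=507)` would write a single cleaned page, with hypothetical file names.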

Default page image (left) and foreground-only page extracted with ddjvu (right)


Much, much nicer. But why didn't I know this? Why did I waste time playing with ImageMagick when it's a trivial option in a DjVu tool? And why does BHL serve the discoloured page images when it could serve crisp, clean versions?

So, I felt like an idiot. But the other good thing that's come out of this is that I've taken a closer look at the Internet Archive's BHL-related content, and I'm beginning to think that perhaps the more efficient way to build something like BioStor is not through downloading BHL data and using their API, but by going directly to the Internet Archive and downloading the DjVu and associated files. Maybe it's time to rethink everything about how BioStor is built...
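As a starting point for that rethink, fetching a DjVu file directly might look something like the sketch below. It assumes the DjVu file for an item is named after the item's identifier and sits under the archive.org download path; that naming is an assumption and should be checked against the item's file listing. The function names and the identifier in the usage note are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def djvu_url(identifier):
    """Guess the download URL of an item's DjVu file.

    Assumes the file is called <identifier>.djvu, which may not hold for every item.
    """
    name = quote(identifier)
    return f"https://archive.org/download/{name}/{name}.djvu"


def fetch_djvu(identifier, dest):
    """Download the DjVu file to dest (needs network access)."""
    urlretrieve(djvu_url(identifier), dest)
```

With a made-up identifier, `fetch_djvu("exampleitem00auth", "exampleitem00auth.djvu")` would save the file locally, ready for page extraction with the DjVu tools.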