
Taxonomy - crisis, what crisis?

Following on from the last post, "How many species are there, and why do we get two very different answers from same data?", another interesting paper has appeared in TREE:

Lucas N. Joppa, David L. Roberts, Stuart L. Pimm The population ecology and social behaviour of taxonomists Trends in Ecology & Evolution doi:10.1016/j.tree.2011.07.010

The paper analyses the "ecology and social habits of taxonomists" and concludes:

Conventional wisdom is highly prejudiced. It suggests that taxonomists were a formerly more numerous people, are in 'crisis', are becoming endangered and are generally asocial. We consider these hypotheses and reject them to varying degrees.

Cue flame war on TAXACOM, no doubt, but it's a refreshing conclusion, and it's based on actual data. Here I declare an interest. I was a reviewer, and in a fit of pique recommended rejection simply because the authors don't make the data available (they do, however, provide the R scripts used to do the analyses). As the authors patiently pointed out in their response to reviews, the various explicit or implicit licensing statements attached to taxonomic data mean they can't provide the data (and I'm assuming that in at least some cases the dark art of screen scraping was used to get the data).

There's an irony here. Taxonomic databases are becoming hot topics, generating estimates of the scale of the task facing taxonomy, and diagnosing the state of the discipline itself (according to Joppa et al. it's in rude health). This is the sort of thing that can have a major impact on how people perceive the discipline (and may influence how many resources are allocated to the subject). If taxonomists take issue with the analyses then they will find them difficult to repeat, because the taxonomic data they've spent their careers gathering are under lock and key.

How many species are there, and why do we get two very different answers from same data?

Two papers estimating the total number of species have recently been published, one in the open access journal PLoS Biology:

Camilo Mora, Derek P. Tittensor, Sina Adl, Alastair G. B. Simpson, Boris Worm. How Many Species Are There on Earth and in the Ocean?. PLoS Biol 9(8): e1001127. doi:10.1371/journal.pbio.1001127
the second in Systematic Biology (which has an open access option but the authors didn't use it for this article):

Mark J. Costello, Simon Wilson and Brett Houlding. Predicting total global species richness using rates of species description and estimates of taxonomic effort. Syst Biol (2011) doi:10.1093/sysbio/syr080

The first paper has gained a lot of attention, in part because Jonathan Eisen (in his post "Bacteria & archaea don't get no respect from interesting but flawed #PLoSBio paper on # of species on the planet") was mightily pissed off about the estimates of the number:
Their estimates of ~ 10,000 or so bacteria and archaea on the planet are so completely out of touch in my opinion that this calls into question the validity of their method for bacteria and archaea at all.

The fuss over the number of bacteria and archaea seems to me to be largely a misunderstanding of how taxonomic databases count taxa. Databases like Catalogue of Life record described species, and most bacteria aren't formally described because they can't be cultured. Hence there will always be a disparity between the extent of diversity revealed by phylogenetics and by classical taxonomy.

The PLoS Biology paper has garnered a lot more reaction (e.g., Carl Zimmer's commentary in the New York Times, "How Many Species? A Study Says 8.7 Million, but It’s Tricky") than the Systematic Biology paper, which arguably has the more dramatic conclusion.

How many species, 8.7 million, or 1.8 to 2.0 million?

Whereas Mora et al. in PLoS Biology concluded that there are some 8.7 million (±1.3 million SE) species on the planet, Costello et al. in Systematic Biology arrive at a much more conservative figure (1.8 to 2.0 million). The implications of these two studies are very different: one implies there's a lot of work to do, the other leads to headlines such as 'Every species on Earth could be discovered within 50 years'.

What is intriguing is that both studies use the same databases, Catalogue of Life and the World Register of Marine Species, and yet arrive at very different results.

So, the question is, how did we arrive at two very different answers from the same data?


Taylor and Francis Online breaks DOIs - lots of DOIs

DOIs are meant to be the gold standard in bibliographic identifiers for articles. They are not supposed to break. Yet some publishers seem to struggle to get them to work. In the past I've grumbled about BioOne, Wiley, and others as culprits with broken, duplicate, or disappearing DOIs.

Today's source of frustration is Taylor and Francis Online. T&F Online is powered by Atypon, which recently issued this glowing press release:

SANTA CLARA, Calif.—20 September 2011—Atypon®, a leading provider of software to the professional and scholarly publishing industry, today announced that its Literatum™ software is powering the new Taylor & Francis Online platform (www.TandFOnline.com). Taylor & Francis Online hosts 1.7 million articles.
...
"The performance of Taylor & Francis Online has been excellent," said Matthew Jay, Chief Technology Officer for the Taylor & Francis Group. "Atypon has proven that it can deliver on schedule and achieve tremendous scale. We're thrilled to expand the scope of our relationship to include new products and developments."

Great, except that lots of T&F DOIs are broken. I've come across two kinds of fail.

DOI resolves to server that doesn't exist
The first is where a DOI resolves to a phantom web address. For example, the DOI doi:10.1080/00288300809509849 resolves to http://tandfprod.literatumonline.com/doi/abs/10.1080/00288300809509849. But the domain tandfprod.literatumonline.com doesn't exist, so the DOI is a dead end.

DOI doesn't resolve
Taylor and Francis have digitised the complete Annals and Magazine of Natural History, a massive journal comprising nearly 20,000 articles from 1841 to 1966, and which has published some seminal papers, including A. R. Wallace's "On the law which has regulated the introduction of new species" doi:10.1080/037454809495509, which forced Darwin's hand (see the Wikipedia page for the successor journal, Journal of Natural History). Taylor and Francis are to be congratulated for putting such a great resource online.

Problem is, I've not found a single DOI for any article in Annals and Magazine of Natural History that actually works. If you try to resolve the DOI for Wallace's paper, doi:10.1080/037454809495509, you get the dreaded "Error - DOI not found" web page. So something like 20,000 DOIs simply don't work. The only way to make the DOI work is to append it to "http://www.tandfonline.com/doi/abs/", e.g. http://www.tandfonline.com/doi/abs/10.1080/037454809495509. This gets us to the article, but rather defeats the purpose of DOIs.
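For anyone who needs the workaround in a script, here is a minimal Python sketch of the "append the DOI to the T&F Online address" trick. The function names are hypothetical, the URL pattern is simply the one quoted above (not a documented T&F interface), and the check for the 10.1080 prefix is an assumption based on the two DOIs mentioned in this post.

```python
# Prefix quoted in the post; assumed to work for T&F Online articles in general.
TANDF_PREFIX = "http://www.tandfonline.com/doi/abs/"


def normalize_doi(raw: str) -> str:
    """Strip surrounding whitespace and a leading 'doi:' label (hypothetical helper)."""
    doi = raw.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return doi.strip()


def tandf_fallback_url(raw: str) -> str:
    """Build a direct T&F Online URL for a DOI, bypassing the DOI resolver."""
    doi = normalize_doi(raw)
    # Assumption: 10.1080 is the prefix seen on both T&F DOIs in this post.
    if not doi.startswith("10.1080/"):
        raise ValueError(f"not a Taylor and Francis DOI: {doi!r}")
    return TANDF_PREFIX + doi
```

Calling tandf_fallback_url("doi:10.1080/037454809495509") gives the address for the Wallace paper shown above. It is a stopgap only: the whole point of a DOI is that nobody should need publisher-specific code like this.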

Why?
Something is seriously wrong with CrossRef's quality control. It can't be too hard to screen all domains to see if they actually exist (this would catch the first error). It can't be too hard to take a random sample of DOIs and check that they work, or automatically check DOIs that are reported as missing. In the case of the Annals and Magazine of Natural History, the web page for the Wallace article states that it has been available online since 16 December 2009. That's a long time for a DOI to be dead.
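To give a sense of how little is involved, here is a rough Python sketch of the two checks suggested above: screening landing-page domains to see if they exist, and drawing a random sample of DOIs for spot checks. It is not how CrossRef works internally (I have no idea what they do). The function names are hypothetical, and the sketch assumes you already have a mapping from each DOI to the landing URL the publisher registered for it.

```python
import random
import socket
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the host part of a landing-page URL ('' if there is none)."""
    return urlparse(url).hostname or ""


def domain_exists(host: str, resolver=socket.getaddrinfo) -> bool:
    """True if the host name resolves in DNS. The resolver is injectable for testing."""
    if not host:
        return False
    try:
        resolver(host, 80)
        return True
    except socket.gaierror:
        return False


def sample_dois(dois, k, seed=None):
    """Draw a random sample of at most k DOIs for manual or automated spot checks."""
    dois = list(dois)
    return random.Random(seed).sample(dois, min(k, len(dois)))


def dead_end_dois(landing_urls, resolver=socket.getaddrinfo):
    """Return the DOIs whose registered landing URL points at a host that does not resolve.

    landing_urls: dict mapping DOI -> registered landing URL (assumed input).
    """
    cache = {}  # look each host up only once
    dead = []
    for doi, url in landing_urls.items():
        host = host_of(url)
        if host not in cache:
            cache[host] = domain_exists(host, resolver)
        if not cache[host]:
            dead.append(doi)
    return dead
```

Because each host is looked up only once, millions of DOIs that share a handful of publisher domains cost a handful of DNS queries. A check like this would flag every DOI pointing at tandfprod.literatumonline.com, though it would not catch DOIs that were never registered at all, which is where the random sample comes in.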

There is a wealth of great content that is being made hard to find by some pretty basic screw ups. So CrossRef, Atypon and Taylor and Francis, can we please sort this out?