
Showing posts with label API. Show all posts

BioNames update - API documentation

One of the fun things about developing web sites is learning new tricks, tools, and techniques. Typically I hack away on my MacBook, and when something seems vaguely usable I stick it on a web server. For BioNames things need to be a little more formalised, especially as I'm collaborating with another developer (Ryan Schenk). Ryan is focussing on the front end, I'm working on the data (harvesting, cleaning, storing).

In most projects I've worked on, the code to talk to the database and the code to display results have been the same; it was ugly but it got things done. For this project these two aspects have to be much more cleanly separated so that Ryan and I can work independently. One way to do this is to have a well-defined API that Ryan can develop against. This means I can hide the sometimes messy details of how to communicate with the data, and Ryan doesn't need to worry about how to get access to the data.

Nice idea, but to be workable it requires that the API is documented (if it's just me then the documentation is in my head). Documentation is a pain, and it is easy for it to get out of sync with the code, such that what the docs say an API does and what it actually does are two separate things (sound familiar?). What would be great is a tool that enables you to write the API documentation and make that documentation "live", so that the API's output can be tested against it. In other words, a tool like apiary.io.

Apiary.io is free, very slick, and comes with GitHub integration. I've started to document the BioNames API at http://docs.bionames.apiary.io/. These documents are "live" in that you can try out the API and get live results from the BioNames database.

I'm sure this is all old news to real software developers (as opposed to people like me who know just enough to get themselves into trouble), but it's quite liberating to start with the API first before worrying about what the web site will look like.

EOL iPad web app using jQueryMobile

As part of a course on "phyloinformatics" that I'm about to teach I've been making some visualisations of classifications. Here's one I've put together using jQuery Mobile and the Encyclopedia of Life API. It's pretty limited, but is a simple way to explore EOL using three different classifications. You can view this live at http://iphylo.org/~rpage/phyloinformatics/eoliphone/ (looks best on an iPad or iPhone). Once I've tidied it up I'll put the code online. Meantime here's a quick demo:

More BHL app ideas

Following on from my previous post on BHL apps, and a Twitter discussion in which I appealed for a "sexier" interface for BHL (to which @elyw replied that this is what BHL Australia are trying to do), here are some further thoughts on improving BHL's web interface.
Build a new interface
A fun project would be to create a BHL website clone using just the BHL API. This would give you the freedom to explore interface ideas without having to persuade BHL to change its site. In a sense, the app itself would provide the persuasion.

Third party annotations
It would be nice if the BHL web site made use of third party annotations. For example, BHL itself is extracting some of the best images and putting them on Flickr. How about if you go to the page for an item in BHL and you see a summary of the images from that item in Flickr? At a glance you can see whether the item has some interesting content. For example, if you go to http://biodiversitylibrary.org/item/109846 you see this:

[Screenshot: the item's plain BHL page]

which gives you no idea that it contains images like this:

[One of the item's images on Flickr]

Tables of contents
Another source of annotations is my own BioStor project, which finds articles in scanned volumes in BHL. If you are looking at an item in BHL it would be nice to see a list of articles that have been found in that item, perhaps displayed in a drop down menu as a table of contents. This would help provide a way to navigate through the volume.

Who links to BHL?
When I suggested third party annotations on Twitter @stho002 chimed in asking about Wikispecies, Species-ID, ZooBank, etc. These resources are different, in that they aren't repurposing BHL content but are linking to it. It would be great if a BHL page for an item could display reverse links (i.e., the pages in those external databases that link to that BHL item).

Implementing reverse links (essentially citation linking) can be tricky, but two ways to do it might be:
  1. Use BHL web server logs to find and extract referrals from those projects
  2. Perhaps more elegantly, encourage external databases to link to BHL content using an OpenURL which includes the URL of the originating page. OpenURL can be messy, but especially in Mediawiki-based projects such as Wikispecies and Species-ID it would be straightforward to make a template that generated the correct syntax. In this way BHL could harvest the inbound links and display them on the item page.
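To make the second idea concrete, here's a rough sketch (in Python rather than wiki template syntax) of what such a link might look like. The resolver URL and the exact parameter usage here are my own illustrative assumptions, not a documented BHL convention:

```python
from urllib.parse import urlencode

def bhl_openurl(page_id, referring_url,
                resolver="https://www.biodiversitylibrary.org/openurl"):
    """Build an OpenURL-style link to a BHL page that also carries the URL
    of the page doing the citing, so BHL could harvest inbound links.
    Parameter names follow the general OpenURL pattern but are illustrative."""
    params = {
        "url_ver": "Z39.88-2004",  # OpenURL 1.0 version tag
        "rft_id": "https://www.biodiversitylibrary.org/page/%s" % page_id,
        "rfr_id": referring_url,   # who is linking to this item
    }
    return resolver + "?" + urlencode(params)

link = bhl_openurl(999, "http://example.org/page")
```

A Mediawiki template on Wikispecies or Species-ID would just interpolate the page id and the originating page's URL into this pattern.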





Mendeley Hack4Knowledge: towards an "ego wall"

I'm taking part virtually in Mendeley's Hack4Knowledge event. I'm using this as a chance to explore some ideas about building novel interfaces to bibliographic data in Mendeley. One idea is to display a user's entire library in one screen. I think the user interfaces employed by most bibliographic software are too conservative, and there are some cool things that could be done. For example, see A fluid treemap interface for personal digital libraries (doi:10.1145/1065385.1065512, PDF available from CiteSeer).

One idea I'm playing with is to display all a Mendeley user's papers as a quantum treemap, with thumbnails of the papers and "badges" indicating, for example, how many readers each paper has. The idea is that at a glance you can see all your publications, and which ones are being read the most. You can think of it as an "ego wall" — a quick way to see what others think about your work. Below is part of my library. You can see the full treemap here as an SVG file. Imagine this as an iPad interface to a user's Mendeley library.

[Treemap of part of my Mendeley library]
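For the curious, the layout behind a treemap is simple. A quantum treemap adds constraints so that thumbnails keep a fixed size, but the basic idea can be sketched with the classic slice-and-dice algorithm (this is my own illustration, not the code behind the visualisation above):

```python
def treemap(weights, x, y, w, h, vertical=True):
    """Slice-and-dice treemap: split the rectangle (x, y, w, h) into strips
    proportional to each weight. Real treemaps alternate direction at each
    level of nesting; this is the simplest one-level layout."""
    total = float(sum(weights))
    rects, offset = [], 0.0
    for wt in weights:
        frac = wt / total
        if vertical:
            rects.append((x, y + offset * h, w, frac * h))
        else:
            rects.append((x + offset * w, y, frac * w, h))
        offset += frac
    return rects

# e.g. three papers with 1, 1, and 2 readers share a 100x100 canvas
rects = treemap([1, 1, 2], 0, 0, 100, 100)
```

Weight each rectangle by readership and you have the "ego wall": the most-read papers get the most screen space.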

Eventually I'll make this live. I'm not doing this yet as the script to create the visualisation is slow, due to the multiple requests I need to make to get the necessary information. I have to get the list of a user's papers from Mendeley, then call the API for each paper to get basic bibliographic details. I have to screen scrape the corresponding paper's web page to get the thumbnail and the paper's UUID, which I can then use to get the readership stats via yet another Mendeley API call. Sigh.

Anyway, this is enough hacking for one day. Hope to spend some more time on this project tomorrow.



I wrote that: asserting authorship using the Mendeley API

Inspired by the forthcoming Hack4Knowledge I've put together a service that enables you to assert that you are the author of a paper using the Mendeley API.

If you are impatient, give it a try at:

http://iphylo.org/~rpage/hack4knowledge/iwrotethat/

To use it you need a Mendeley account. When you go to I wrote that you will be asked to connect to your Mendeley account. Once you've done that, enter the DOI or PubMed ID of a paper and, if the paper is in your Mendeley library and flagged as a paper you've authored, you should see something like this:

[Screenshot confirming authorship]

The site can be a little sluggish as it needs to go through all of your publications one by one until it finds a match.

Why?
Imagine you have a web database that includes publications, and you want people to join your site as users. If they have publications in your database, you'd like your users to be able to say "I'm the author of those papers" or, more generally, the author you have as "Roderic D. M. Page" is me.

One way to do this would be to enable the users to sign in to your site using Mendeley (see my blog post Mendeley connect). Once they've done that, the user could select a publication and say "that's mine". How do we test this assertion? Well, if the user is indeed the author it is likely that they will have added it to their "My Publications" section in their Mendeley library. So, we can use the Mendeley API to get a list of the author's publications and see whether the publication they claim is, in fact, one of theirs.

The inspiration for this came from tools like Google Analytics, where in order to add the tool to your web site you need to convince Google that you own the site. One way to do this is to add some text supplied by Google to the HTML of your site, on the assumption that only you can do this (because it's your site). In the same way, only you can add papers to your Mendeley library. Of course, I'm assuming that Mendeley users are being trustworthy when they add papers to "My Publications" (i.e., they're not claiming authorship of papers they didn't write).

How?
This hack uses Mendeley's OAuth support (the same technology used by Twitter and Facebook to connect to other sites) to enable you to connect your Mendeley account to the "I wrote that" application (note that my app never sees your account name or password). I use the Mendeley API user authored method to get a list of your publications, and user library document details to retrieve details of each publication. I then compare the DOI or PMID you supplied with each publication, until I find one that matches. If none matches, then I've no evidence you authored that paper.
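Stripped of the OAuth plumbing, the matching step is trivial. A sketch (in Python for brevity; the real app is PHP, and the dictionary keys here are my own, not the Mendeley API's field names):

```python
def wrote_that(claimed, authored_docs):
    """Check a claimed identifier (DOI or PubMed ID) against the documents
    in the user's 'My Publications', fetched elsewhere via the Mendeley API.
    Each doc is a dict that may carry a 'doi' and/or 'pmid' entry."""
    claimed = claimed.strip().lower()
    for doc in authored_docs:
        for key in ("doi", "pmid"):
            value = doc.get(key)
            if value and str(value).strip().lower() == claimed:
                return doc
    return None  # no evidence the user wrote this paper

docs = [{"doi": "10.1186/1471-2156-10-16"}, {"pmid": "12345"}]
match = wrote_that("10.1186/1471-2156-10-16", docs)
```

The linear scan is exactly why the site is sluggish: with no lookup-by-identifier in the API, every claim means walking the whole publication list.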

Moan
No post about the Mendeley API would be complete without a moan about the state of the API. Apart from the fact that there is no function to directly find a publication in your library by DOI or PMID (hence I have to look at them all), there is virtually no support for retrieving any details about the user. For example, I wanted to brighten the web page up a little by adding a picture of the Mendeley user once they've logged in. There is no API function for this, nor a function to retrieve an identifier or URL for the user. Hence, in order to get a picture I screen scrape (yes, screen scrape) the Mendeley web page for the reference to get the URL for the linked author of the paper, then scrape the author's profile page and extract the URL for the image. This is insane. Please, please can we have a better API?

The Mendeley API Binary Battle - win $US 10,001

Now we'll bring the awesome. Mendeley have announced The Mendeley API Binary Battle, with a first prize of $US 10,001, and some very high-profile judges (Juan Enriquez, Tim O'Reilly, James Powell, Werner Vogels, and John Wilbanks). Deadline for submission is August 31st 2011, with the results announced in October.

The criteria for judging are:
  1. How active is your application? We’ll look at your API key usage.

  2. How viral is the app? We’ll look at the number of sign ups on Mendeley and/or your application, and we’ll also have an eye on Twitter.

  3. Does the application increase collaboration and/or transparency? We’ll look at how much your application contributes to making science more open.

  4. How cool is your app? Does it make our jaws drop? Is it the most fun that you can have with your pants on? Is it making use of Facebook, Twitter, etc.?

  5. The Binary Battle is open to apps built previous to this announcement.


Start your engines...

ReaderMeter: what's in a name?

Dario Taraborelli has released ReaderMeter, an elegant app built on top of the Mendeley API. You enter an author's name and it summarises that author's readership in Mendeley. The app provides some summary statistics (mine are shown below), and if you click on the horizontal bar corresponding to a paper, you can see a visualisation of who is reading that paper, including a nice map.

[ReaderMeter summary statistics for my papers]

As ever with author names, there are issues of people's names having more than one spelling. In Mendeley I'm known as Roderic D. M. Page, R. D. M. Page, Rod Page, Roderic Page, and doubtless some others. Searching ReaderMeter using different spellings of my name gives different results. There are various approaches to tackling this problem; I've touched on one approach earlier.
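One crude approach is to collapse each variant to a surname-plus-initials key before comparing. A sketch (my own heuristic, and note it only catches initial-style variants; "Rod" versus "Roderic D. M." still produces different keys):

```python
import re

def name_key(name):
    """Collapse an author name to 'surname:initials' so spelling variants
    like 'Roderic D. M. Page' and 'R. D. M. Page' compare equal. A crude
    heuristic: serious disambiguation needs far more than this."""
    parts = [p for p in re.split(r"[\s.]+", name.strip()) if p]
    surname = parts[-1].lower()
    initials = "".join(p[0].lower() for p in parts[:-1])
    return surname + ":" + initials
```

An app like ReaderMeter could merge results whose keys collide, then let the user confirm or reject the merge.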

However, there's a different way to tackle this problem in the context of apps like ReaderMeter, because if you're a Mendeley user you can assert that you are the author of a paper (these papers live in your "My Publications" collection). Using Mendeley's API, an app could retrieve this list of publications (providing the user gave it access), and we could compute readership statistics from the set of articles "known" to be authored (leaving aside the issue of people gaming the system by spuriously claiming authorship). In this way the app relies on the default behaviour of Mendeley users - uploading and self-identifying the articles they've written.

Implementing a feature like this poses two problems. The first is access to a user's data. Mendeley's API supports OAuth, so it could be done in such a way that only the account's user could authorise the app to access this list. The app could store the fact that the user has verified the list of publications. Think of it as a bit like Amazon's Real Name™ feature.

The other obstacle is Mendeley's API, which returns readership statistics for public documents (i.e., those in the central aggregation). At present, using the API there is no way to link the global id for a Mendeley reference (e.g., ae7dd6a0-6d09-11df-936c-0026b95e484c) with the local id (e.g., 3582682802) that reference has in a user's collection, unless we resort to trying to match articles by searching by identifiers or article titles. If the API exposed these links, apps like ReaderMeter could become even more powerful (and personalised).
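Until the API exposes those links, matching on titles is about the only workaround. Something like this sketch (my own code, with made-up dictionary keys standing in for the two API responses):

```python
def link_ids(user_docs, public_docs):
    """Pair a user's local document ids with Mendeley's global ids by
    matching normalised titles: the workaround described above, since the
    API itself doesn't expose the link. Unmatched documents map to None."""
    def norm(title):
        return " ".join(title.lower().split())
    public_by_title = {norm(d["title"]): d["id"] for d in public_docs}
    return {d["id"]: public_by_title.get(norm(d["title"])) for d in user_docs}

user_docs = [{"id": "3582682802", "title": "Are your citations clean?"}]
public_docs = [{"id": "ae7dd6a0-6d09-11df-936c-0026b95e484c",
                "title": "Are Your citations clean?"}]
mapping = link_ids(user_docs, public_docs)
```

Fragile, obviously: any variation beyond case and whitespace breaks the match, which is exactly why the API should expose the link directly.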

Navigating the Encyclopedia of Life tree on the desktop and the iPhone

This week seems to be API week. The Encyclopedia of Life API Beta Test has been out since August 12th. By comparison with the Mendeley API that I've spent rather too much time trying to get to grips with, the EOL API release seems rather understated.

However, I've spent the last couple of days playing with it in order to build a simple tree navigating widget, which you can view at http://iphylo.org/~rpage/eoltree/.

The widget resembles Aaron Thompson's Taxonomy (formerly called KPCOFGS) iPhone app in that it uses the iPhone table view to list all the taxa at a given level in a taxonomic tree. Clicking on a row in this table takes you to the descendants of the corresponding taxon; clicking "Back" takes you back up the tree. If you've reached a leaf node (typically a species) the widget displays a snippet of information about that taxon. It also resembles Javier de la Torre's taxonomic browser written in Flex.

Here's a screen shot of the widget running in a desktop web browser:

[Screenshot: browsing Insecta in the widget]

Here's the same widget in the iPhone web browser:

[The widget in the iPhone web browser]

Using the API
The EOL API is pretty straightforward. I call the http://www.eol.org/api/docs/hierarchy_entries API to get the tree rooted at a given node, then populate each child of that node using http://www.eol.org/api/docs/pages. The result is a simple JSON file that I cache locally to speed up performance and avoid hitting the EOL servers for the same information. Because I'm caching the API calls locally I need a couple of PHP scripts to do this, but everything else is HTML and Javascript.
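The caching itself is nothing fancy. Here's the idea as a Python sketch (the real scripts are PHP; the `fetch` function is injected so you can swap in urllib, curl, or whatever actually does the HTTP GET):

```python
import hashlib
import json
import os

def cached_fetch(url, fetch, cache_dir="cache"):
    """Fetch a JSON API response, caching it on disk so repeated calls for
    the same URL don't hit the EOL servers again. `fetch` performs the
    actual HTTP request and returns the parsed JSON."""
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)   # cache hit: no network traffic
    data = fetch(url)
    with open(path, "w") as f:
        json.dump(data, f)        # cache miss: store for next time
    return data
```

Hash the URL for the filename and every distinct API call gets its own cache file; add a timestamp check if you ever want the cache to expire.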

iPhone and iPad
I've not really developed this for the iPhone. I've cobbled together some crude Javascript to simulate some iPhone-like effects, but if I was serious about the phone I'd look into one of the Javascript kits available for iPhone development. However, I did want something that was similar in size to the iPhone screen. The reason is I'm looking at adding taxonomic browsing to the geographic browser I described in the post Browsing a digital library using a map, so I wanted something easy to use but which didn't take up too much space. In the same way that the Pygmybrowse tree viewer I played with in 2006 was a solution to viewing a tree on a small screen, I think developing for the iPhone forces you to strip things down to the bare essentials.

I'm also keeping the iPad in mind. In portrait mode some apps display lists in a popover like this:

[Screenshot: an iPad popover in portrait mode]

This popover takes up a similar amount of screen space to the entire iPhone screen, so if I was to have a web app (or native app) that had taxonomic navigation, I'd want it to be about the size of the iPhone.

Let me know what you think. Meantime I need to think about bolting this onto the map browser, and providing a combined taxonomic and geographic perspective on a set of documents.

Mendeley API PHP client

Following on from my earlier post about the Mendeley API, I've bundled up my code for OAuth access to the Mendeley API for anyone who's interested in playing with the API using PHP. You can browse the code on Google Code, or grab a tarball here. You'll need a consumer key and a consumer secret from Mendeley for the demos to work, and if you're behind a HTTP proxy you'll have to tweak the code (this is explained in the ReadMe.txt file that comes with the code).

The code is pretty rough, and doesn't use all the Mendeley API calls, but I've other things to do, and it felt like a case of either bundle this up now, or it will get lost among a host of other projects. The Mendeley API still feels woefully under-developed. I'd be more interested in developing this client further if the API was powerful enough to do the kinds of things I'd like to do.

Browsing a digital library using a map

Every so often I revisit the idea of browsing a collection of documents (or specimens, or phylogenies) geographically. It's one thing to display a map of localities for single document (as I did most recently for Zootaxa), it's quite another to browse a large collection.

Today I finally bit the bullet and put something together, which you can see at http://biostor.org/maps/. The website comprises a Google Map showing localities extracted from papers in BioStor, and a list of the papers that have one or more points visible on the map.

[Screenshot of the map browser]


In building this I hit a few obstacles. The first is the number of localities involved. I've extracted several thousand point localities from articles in BioStor, and displaying all of these on a Google Map individually would be painful. Fortunately, there's a wonderful library called MarkerCluster, part of the google-maps-utility-library-v3, that handles this problem. MarkerCluster clusters markers based on zoom level: if you zoom out the markers cluster together, and as you zoom in these clusters resolve into their component points. Very, very cool.

The second challenge was to have the list of references update automatically as we move around or zoom in and out on the map. To do this I need to know the bounding box currently displayed in the map; I can then query the MySQL database underlying BioStor for the localities within that bounding box, using MySQL's spatial extensions. The query is easy enough to implement using Ajax, but the trick was knowing when to call it. Initially, listening for the bounds_changed event seemed a good idea. However, this event fires while the map is being moved (i.e., if the user is panning or dragging the map a whole series of bounds_changed events are fired), whereas what I want is something that signals that the user has stopped moving the map, at which point I can query the database for articles in the region the map is currently displaying. It turns out the event I need to listen for is idle (see Issue 1371: map.bounds_changed event fires repeatedly when the map is moving), so I have a function that captures that event and loads the corresponding set of articles.

Another "gotcha" occurs when the region being viewed crosses longitude 180° (or -180°) (see diagram below from http://georss.org/Encodings).

[Diagram: a region crossing longitude ±180°]


In this case the polygon used to query MySQL would be incorrectly interpreted, so I create two polygons, each with 180° or -180° as one of the boundaries, and merge the articles with points in either of those two polygons.
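The splitting logic is simple enough to sketch (Python here for illustration; the real code builds MySQL polygons from these boxes):

```python
def split_bbox(west, south, east, north):
    """Split a bounding box into one or two boxes so that none of them
    crosses longitude +/-180. If west > east the viewport wraps around the
    antimeridian, so we query each half separately and merge the results."""
    if west <= east:
        return [(west, south, east, north)]
    return [(west, south, 180.0, north),    # eastern half, up to +180
            (-180.0, south, east, north)]   # western half, from -180
```

Run each returned box as its own spatial query, then take the union of the matching articles.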

I've made a short video showing the map in action. Although I've implemented this for BioStor, the code is actually pretty generic, and could easily be adapted to other cases where we want to navigate through a set of objects geographically.


More on the Mendeley API

After playing with the public API for Mendeley over the weekend (see Social citations: using Mendeley API to measure citation readership) I've had a quick play with the user specific part of the API. This API enables apps to connect with a user's account, so you could imagine using it to personalise citations lists (as I mentioned in the previous post), or building apps to handle a user's reading list (to complement Mendeley's existing desktop and iPhone clients).

Once again, it's frustrating just how rough the API is. The documentation is incomplete and contains errors, and some of the API calls simply don't work (see this post). I know I'm sounding like a broken record, but this API really needs a test suite. The quickest way to annoy potential users of the API is to get them to find really obvious bugs for you.

With a test suite in mind, I've created a simple app that enables you to connect to your Mendeley account and perform a bunch of simple tasks. The hardest part of getting this working was getting my head around OAuth. Luckily, @abraham has written a PHP library to support OAuth access to Twitter's API, so I grabbed that and replaced Twitter-specific code with the equivalent code for Mendeley.

Demo
You can try the app here: http://iphylo.org/~rpage/mendeley/moauth/.

The first time you go to the app it shows a button to connect to Mendeley. If you click on it you'll see something like this:
[Mendeley authorisation screen]
(if you're not already logged in to Mendeley it may ask you to log in; note that all of this happens on Mendeley's site, so my app never knows your username or password details). If you're willing to try the app, allow it to connect to your account. You'll then see a bunch of API requests and results. All but one of the requests simply display information. One request does try to add a test document (the one listed on the Mendeley developer's site), but at the moment this part of the API doesn't seem to work (nor does the call to get the list of papers that you've authored).

If and when Mendeley get the API working fully (and documented) there's a lot of scope here. But what I'd really like to see is Mendeley develop a test suite that runs through every API call and checks that the methods work as advertised.

Social citations: using Mendeley API to measure citation readership

Quick note on an app I threw together using the Mendeley API that I discussed in the previous post. This app is crude, and given that the Mendeley API is rate-limited and in flux it might not work for you.

The basic idea is to embellish the list of literature cited in an article with information that might help a reader decide whether a given citation is worth reading. One clue might be how many people on Mendeley are reading that article. So, my app takes an article, extracts the list of cited literature, and for each article with a PubMed identifier it asks Mendeley "how many readers does this article have?" For now the app is restricted to articles from the BiomedCentral series, as these have Open Access XML with literature cited lists that contain PubMed numbers (PLoS articles, for instance, don't have these, and for now I'm avoiding the overhead of finding identifiers for the articles). I'm using PubMed identifiers as the Document Details method in the Mendeley API doesn't handle DOIs at present.
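The core loop is just identifier-by-identifier lookups. A sketch (Python for illustration; `reader_count` stands in for the rate-limited Mendeley API call, and the PubMed ids below are placeholders, not real data):

```python
def annotate_citations(pmids, reader_count):
    """Given the PubMed ids cited by an article, ask Mendeley (via the
    injected `reader_count` function, which wraps the rate-limited API)
    how many readers each has, and rank the citations by readership."""
    counts = [(pmid, reader_count(pmid)) for pmid in pmids]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)

# stub standing in for the API, keyed by (made-up) PubMed ids
stub = {"pmid_a": 4, "pmid_b": 208}
ranked = annotate_citations(["pmid_a", "pmid_b"], lambda p: stub.get(p, 0))
```

Injecting the lookup function also makes it easy to wrap it in a cache, which this app badly needs given the rate limits.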

The app is at http://iphylo.org/~rpage/mendeley/, and the default article I've chosen to demonstrate the app is Robust physical methods that enrich genomic regions identical by descent for linkage studies: confirmation of a locus for osteogenesis imperfecta doi:10.1186/1471-2156-10-16, but you can enter the DOI of any BMC article to give it a try. Below is a screenshot of part of the list of literature cited by this paper, together with readership numbers:

[Part of the literature cited list, with readership counts]

The default article has 4 readers in Mendeley. The readership of the articles it cites varies, but one article stands out with 208 readers.

There are huge limitations with this app (it doesn't cache the Mendeley results, so repeated use will exceed the rate limits), it is limited to citations in PubMed (could add support for DOIs and title searches), and only BMC articles can be processed.

What would be interesting is to extend this in other directions. For example, if the user had a Mendeley account, it would be nice to flag which articles the reader already had in their library (and perhaps have the ability to add those that weren't to the library). To personalise the citation readership display I'd need to add support for OAuth, which Mendeley uses to authorise access to user accounts.

If Mendeley were to provide more social features in their API then we could add flags indicating whether any of a user's contacts have any of these articles in their libraries (your decision to read a paper might be influenced by whether a contact of yours has read it -- think of it as resembling the Facebook "Like" button). Or we could display the readers themselves, so you could discover people with potentially similar interests to your own.

My twitter stream has been full of complaints about the Mendeley API — life on the bleeding edge is not always fun. But the API does have the potential to support some cool applications, once it gets the kinks ironed out.

Mendeley API: we'll bring the awesome if you bring the documentation

Mendeley's API has been publicly launched at http://dev.mendeley.com/, accompanied by various announcements such as:
Mendeley's Research API is now open to the public. Developers, go forth and bring the awesome :) http://dev.mendeley.com/ (@subcide)

Finally saw the awesome Easter Egg that @subcide hid on the new dev.mendeley.com Developer Portal! Whoaaa! (@mendeley_com)

All good fun to be sure, but it's a pity more effort has been spent on Easter eggs than on documenting and testing the API. If you visit the API development site there's precious little in the way of documentation, and few examples. As well as making a developer's life harder, adding examples would have helped catch some bugs, such as the failure of the API calls to return details such as volume, issue, and page numbers for articles, and the inability to retrieve a document using a DOI (the '/' that a DOI contains breaks the API). These are fairly obvious things. If resources are limited, perhaps the Mendeley API team should open up the development web site to others to help create documentation and examples. A wiki would be one way to do this.
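The DOI problem, at least, has an easy client-side workaround: percent-encode the DOI (including its '/') before it goes into the URL path. A sketch (the endpoint shown here is illustrative, not the documented one):

```python
from urllib.parse import quote

def doi_request_url(doi,
                    base="http://api.mendeley.com/oapi/documents/details/"):
    """Percent-encode a DOI before putting it in a URL path; the raw slash
    is exactly what breaks the lookup described above. safe='' forces '/'
    to be encoded as %2F rather than left alone."""
    return base + quote(doi, safe="") + "?type=doi"

url = doi_request_url("10.1186/1471-2156-10-16")
```

Of course the server also has to decode the path correctly, which is the part only Mendeley can fix.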

Mendeley is a great idea, but on occasion the hype gets ahead of reality. The product has a lot of potential, but also has some significant problems. Using the search API you pretty quickly encounter its number one problem: duplicates. I get the sense that Mendeley is about three things:

  1. Managing personal bibliographies and generating citations (desktop client)

  2. Networking ("the Last.fm of research") (web site)

  3. Bibliographic data

Number 3 is, I suspect, the hardest problem to tackle, and it is where the ultimate value lies (think citation networks, audience data, iTunes-like business model for selling articles, etc.). I'd like Mendeley a lot more if I was confident that they had a good handle on the complexities of bibliographic data (and didn't drop pagination from API calls). Good places to start are "Are your citations clean?" (doi:10.1145/1323688.1323690) and "Learning metadata from the evidence in an on-line citation matching scheme" (doi:10.1145/1141753.1141817), both currently duplicated in Mendeley (try searching for Are your citations clean and Learning metadata from the evidence in an on-line citation matching scheme).
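Even a crude normalised-title pass would catch duplicates like those two. A sketch (my own code, and only a first approximation; real record matching also needs authors, years, and fuzzy comparison):

```python
def dedupe(records):
    """Collapse obvious duplicate bibliographic records by a normalised
    title key: lowercase, alphanumeric characters only, so punctuation
    and capitalisation differences don't create spurious duplicates."""
    seen = {}
    for rec in records:
        key = "".join(c for c in rec["title"].lower() if c.isalnum())
        seen.setdefault(key, rec)  # keep the first record for each key
    return list(seen.values())

records = [{"title": "Are your citations clean?"},
           {"title": "Are Your Citations Clean"},
           {"title": "Learning metadata from the evidence"}]
unique = dedupe(records)
```

Picking which duplicate to keep (and merging their readership counts) is the genuinely hard part, but detection alone would be a big step.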

Mendeley Open API and the Biodiversity Heritage Library

Mendeley have called for proposals to use their forthcoming API. The API will publicly available soon, but in a clever move Mendeley will provide early access to developers with cool ideas.
Given that the major limitation of the Biodiversity Heritage Library (from my perspective) is the lack of article-level metadata, and Mendeley has potentially lots of such data, I wonder whether this is something that could be explored. My BioStor project takes article metadata and finds articles in BHL, so an attractive work flow would be:
  1. People upload bibliographies to Mendeley (e.g., bibliographies for particular taxa, journals, etc.)

  2. BioStor uses Mendeley's API to find articles likely to be in BHL, then locates the actual article in Mendeley.

  3. The user could then grab a PDF of the article from BioStor that contains XMP metadata (which Mendeley, and other tools, can read)

Users would gain a tool to manage their bibliographies (assuming that they prefer Mendeley to other tools, or are happy to sync with Mendeley), they would be contributing to a database of taxonomic literature (and of biological literature in general; BHL's content is pretty diverse), and they would gain easy access to PDFs for BHL content (this last feature depends on whether Mendeley can automatically associate a PDF with an existing bibliographic record). In the same way, a tool such as BioStor (and, by implication, BHL) could gain usage statistics (i.e., who is reading these articles?).

Our community's efforts at assembling bibliographies haven't amounted to much yet. The tools we use tend to be poor. I find CiteBank to be underwhelming, and Drupal's bibliographic modules (used by CiteBank and ScratchPads) lack key features. We also seem reluctant to contribute to aggregated bibliographies. Perhaps encouraging people to use a nicer tool, and at the same time providing additional benefits (e.g., XMP PDFs), might help move things forward.

Anyway, food for thought. Perhaps other tools might make more sense, such as using the API to upload metadata and PDFs direct from BioStor to Mendeley, and making the collection public. But, if I were Mendeley, what I'd be looking for are tools that enhance the Mendeley experience. There's some obvious scope for visualising the output and social networks of authors, such as the sparklines and coauthor graphs I've been playing with in BioStor (for example, for W E Duellman):

[Publication sparkline and coauthor graph for W E Duellman]

Before this blog post starts to veer irretrievably off course, I'd be interested in thoughts of anyone interested in matters BHL. There's nothing like a deadline (Friday, May 14th) to concentrate the mind...