Why Linked Open Data Makes Sense for Biodiversity Informatics

I came across an issue in my own work that I think serves as a good example of the advantages of the Linked Open Data approach.

I have been working to create Linked Open Data-compliant identifiers for species. A species is traditionally described in a published paper. These species descriptions, along with the type specimens, serve as documentation of the species concept. Occasionally, others revise the species concept through published "revisions."

The TaxonConcept.org species concepts would be clearer and more useful if I included links to the original species description in the species concept RDF.

Since the Biodiversity Heritage Library has already been working on collecting, scanning and databasing this information, it seems that the most sensible and efficient approach would be simply to link to their identifiers for the appropriate publications in my RDF.

It does not make sense to replicate this data and functionality in my application when the BHL is already doing a great job databasing biodiversity publications.

These links could take the form either of a URL to the PDF version of the species description or of a link to an RDF file containing the title, author, and journal of the original species description. All that is really needed is a resolvable identifier for each published species description that exposes enough information to make clear which specific article I am linking to.
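As a rough sketch of what such a link might look like, the Python below builds a single triple connecting a species concept to a publication identifier and serializes it as a line of N-Triples. The URIs are placeholders under example.org, not actual TaxonConcept.org or BHL identifiers, and the choice of the Dublin Core "references" predicate is my assumption for illustration only.

```python
# Hypothetical identifiers; real TaxonConcept.org and BHL URIs would differ.
SPECIES_CONCEPT = "http://example.org/taxonconcept/species/1234"
DESCRIPTION = "http://example.org/bhl/publication/5678"

# Dublin Core Terms "references" predicate, chosen here as an assumption.
DCTERMS_REFERENCES = "http://purl.org/dc/terms/references"


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple of three URIs as a line of N-Triples."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# One statement: the species concept references its original description.
link = to_ntriple(SPECIES_CONCEPT, DCTERMS_REFERENCES, DESCRIPTION)
print(link)
```

The point of the sketch is that my side only stores the identifier. The title, author, and journal stay with whoever curates the publication record.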

This first diagram represents how I might have modeled TaxonConcept.org as a traditional "walled garden" web application. In this example, I recreate tables and data for occurrence records and references. I then curate and expose my reference and occurrence data separately from other, often more complete, data sets.

A Linked Open Data approach would make more sense. One could simply link to occurrence records and references that are already being curated by others.

These links would improve the value of my data set, and other groups could use the same links to improve the quality of the information they provide. In the process, data sets that link to the same BHL identifier also become interlinked and "findable." From the perspective of the BHL, these links could serve as a way to measure the utility of their service and obtain metrics on each publication.
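To illustrate how shared identifiers make data sets findable, here is a minimal sketch that groups links by the publication they point to. The data set names and URIs are invented for the example. A real system would query an RDF store rather than a Python list.

```python
from collections import defaultdict

# Hypothetical (data set, subject URI, publication URI) links.
links = [
    ("taxonconcept", "http://example.org/taxonconcept/species/1234",
     "http://example.org/bhl/publication/5678"),
    ("other-dataset", "http://example.org/other/record/42",
     "http://example.org/bhl/publication/5678"),
    ("taxonconcept", "http://example.org/taxonconcept/species/9999",
     "http://example.org/bhl/publication/1111"),
]


def index_by_publication(rows):
    """Map each publication URI to the set of data sets that link to it."""
    index = defaultdict(set)
    for dataset, _subject, publication in rows:
        index[publication].add(dataset)
    return dict(index)


def shared_publications(rows):
    """Keep only the publications linked from more than one data set."""
    index = index_by_publication(rows)
    return {pub: names for pub, names in index.items() if len(names) > 1}


# Data sets that became interlinked through a common publication identifier.
shared = shared_publications(links)

# A simple per-publication metric: how many data sets link to it.
usage_counts = {pub: len(names) for pub, names in index_by_publication(links).items()}
```

Here `shared` holds the publications that connect otherwise separate data sets, and `usage_counts` is the kind of simple metric a publication provider could derive from incoming links.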

If each dataset is assigned to a separate graph, it becomes easy to include or exclude the data sets and statements made by other groups.
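The idea of one graph per data set can be sketched with plain tuples, where every statement carries the name of the graph it came from. The graph names, prefixes, and predicates below are hypothetical, and a real triple store would do this filtering through its own named-graph support rather than a list comprehension.

```python
# Each statement is a quad: (graph, subject, predicate, object).
# All names here are made up for illustration.
quads = [
    ("http://example.org/graph/taxonconcept",
     "ex:species1234", "ex:describedIn", "ex:pub5678"),
    ("http://example.org/graph/occurrences",
     "ex:occurrence1", "ex:identifiedAs", "ex:species1234"),
    ("http://example.org/graph/third-party",
     "ex:species1234", "ex:relatedTo", "ex:otherSpecies"),
]


def select_statements(quads, include=None, exclude=()):
    """Return the triples from the chosen graphs.

    include=None means "all graphs"; exclude always wins over include.
    """
    excluded = set(exclude)
    included = None if include is None else set(include)
    return [
        (s, p, o)
        for g, s, p, o in quads
        if g not in excluded and (included is None or g in included)
    ]


# Drop the statements made by one group while keeping everything else.
trusted = select_statements(
    quads, exclude=["http://example.org/graph/third-party"]
)

# Or look only at a single data set.
own = select_statements(
    quads, include=["http://example.org/graph/taxonconcept"]
)
```

Because the graph name travels with each statement, including or excluding a group's contributions is a filter on one field rather than a data cleanup job.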

The diagram below shows some of the potentially linkable data sets. There are many more on the Linked Open Data Cloud, but I wanted a reasonably sized diagram. Some of these resources already exist and are interlinked, while others, such as GBIF, the BHL, and the Encyclopedia of Life, are either not yet available as Linked Open Data or are still in the planning stage.

In summary, the Linked Open Data approach makes the best use of everyone's efforts, reduces data redundancy, and makes additional data sets, of which you might not have been originally aware, findable and usable.