Linked data – currently a bumpy road

So, in its Smarter Government white paper, the UK Government has made a firm commitment, as part of its new “public data principles”, to publish any “raw” dataset in linked data form.  What exactly does this mean?

I tried to find a clear definition of “raw” data.  In almost all definitions you will find on the Internet, it means “unedited” or “unprocessed” data.  Is that really what is intended by those who use the term?  If so, would all of our statistical outputs be excluded?  The term “raw data” is not at all helpful, especially if, in this context, it is not intended to mean the same thing as the more generally accepted definition, which comes from measurements taken by scientific instruments.

I suspect that the intention is that it means the underlying datasets that were used to reach policy recommendations.  These are almost always not “raw” datasets – they are statistical outputs that most certainly have been processed in order to ensure that they really are meaningful.

So, perhaps we really mean we will publish meaningful processed data (which is almost always not “raw” data) in linked data form.  How easy is it to ensure that the provenance of linked data can be well understood, and taken into account by those using it?  Not very, if we follow the example set by Brian Kelly of the University of Bath.  He set a challenge to some students to use Linked Data to find out which UK city has the highest proportion of students.  Within a few hours, a student produced the answer…Cambridge, where students apparently outnumber the population by a factor of about 3,225.  According to the Linked Data web of data, there are 38,696 students living in Cambridge, which has a total population (according to the web of linked data) of 12.

Oh dear!  The wonderfully elegant query created by the student looks fine, but the underlying quality of the data in some of the sources would seem to be a bit dodgy.  Provenance remains a serious issue for Linked Data, and if we are to start publishing official statistics in Linked Data form it is not at all clear how Brian’s student would know how to write the query to pick up the right population of Cambridge (which I think must be more than 12).
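To make the problem concrete, here is a minimal sketch of the kind of plausibility check a data consumer might run before trusting a result like this.  It is not the student's query, and the function names are hypothetical; it simply assumes that students are counted among residents, so a ratio above 1 signals that at least one of the sources is suspect.

```python
def student_proportion(students: int, population: int) -> float:
    """Return students as a fraction of the total population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return students / population


def is_plausible(students: int, population: int) -> bool:
    """Assumption: students are a subset of residents, so the ratio cannot exceed 1."""
    return student_proportion(students, population) <= 1.0


# Figures quoted in the post for Cambridge, as found on the web of linked data.
cambridge_students = 38_696
cambridge_population = 12

ratio = student_proportion(cambridge_students, cambridge_population)
plausible = is_plausible(cambridge_students, cambridge_population)
print(f"students per resident: {ratio:.1f}, plausible: {plausible}")
```

A check like this only tells you that something is wrong; it cannot tell you which source is at fault, which is exactly why provenance matters.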

This does not mean we should ignore this world – it is early days, and the concepts make good sense.  But equally, we should not rush into thinking that there is a quick fix.  Those who are interested in finding a consistent way to represent statistical data in the Linked Data world are working together at this Google group.  If you are interested in this, please join us.

February 26, 2010 • Posted in: Uncategorized

2 Responses to “Linked data – currently a bumpy road”

  1. Luke - March 5, 2010

    An interesting point you make regarding users’ ability to extract the information they need. The problem is, we are always catering to the lowest common denominator when disseminating information, and this comes at a cost – how the information is presented, how it can be accessed and the time and effort spent on making it so – not to mention the delights of putting it through disclosure control checks.

    If only all our users were as intelligent as we are, right? ;-P

    There is also an obvious issue of cost vs benefit (as always) and despite the noble intentions of making such data available, while I support the concept I do not support the vehicle – users will always wind up giving producers of data a ring/sending an e-mail asking for such-and-such. I would suggest having the data available online but not spending too much effort on presentation or advanced accessibility…then when a request comes in that has not been done before, uploading that dataset for everyone to see, with appropriate titles and notes accompanying.

  2. Simon - March 6, 2010

    Thanks Luke.

    You’ve reminded me of some interesting comments on the experience of open data in New Zealand. One of the key messages here is to find what people want, and value, first, otherwise you will waste a lot of money and effort freeing up information that nobody wants. I fear we may have to re-learn this lesson the hard way in the UK.
