What is the Internet?
A nice article here in The Observer by Professor John Naughton, which neatly captures much of the difficulty that many have in understanding the Internet. I do like his distinction between the Web and the Internet – the train / track analogy seems a good one, worth a go when trying to explain this to colleagues who are stuck in a browser-based view of the Internet.
By coincidence, I read this today having attended an earlier discussion at work about “Web Strategy”, in which we debated whether we meant Web or Internet, or just plain dissemination! It would be good to have that discussion again after everyone has read this article.
I’m less sure about “web 3.0” – the Semantic Web looks more like a different kind of train running on the same track than an enhancement of the “web 2.0” train, but that is a minor point, and as he says in his postscript, “It’s too early to say”.
Semantic Web at ONS
I’m giving a series of talks with Paul Richards at ONS exploring what the semantic web is, and what it means to ONS.
Here’s the Prezi we use; halfway through, Paul shows a BIS application and then gives a great demo using the Talis platform, constructing a SPARQL query on the fly.
I have to admit that the more I describe Linked Data, the more the inherent contradictions come home to me. I suspect that proponents should stop talking about grand but unrealisable visions, and focus on some narrow but deep practical applications.
We have yet to figure out how best to represent statistical data in this emerging world. Hopefully some of my colleagues responsible for producing statistics will step forward and help.
Taking an architectural view of services
I’m 18 months into my research at Glamorgan Business School, and at last have reached the stage where I am doing real stuff, and collecting real feedback and data. These case studies are being conducted at two different government departments, and I’ve plans for one or two more during this year.
I’ve developed an approach to evaluating service designs, using a service quality model, based on experience and models drawn from the software architecture community. One feature of this work has been to give all evaluations a set of stakeholder perspectives. This seems to be opening up a host of research opportunities far beyond the scope of my project.
One of my case studies involves developing an improved public service, for which we have described 21 scenarios and identified 28 distinct stakeholders (which we have clustered into 12 groups). Looking at each scenario from the perspective of each stakeholder group has produced a very rich and varied picture of the service, which can be viewed through the lens of a common service quality model. This is more like a by-product of my method than an intended consequence, but the use of a common stable model opens up the possibility of benchmarking and classifying stakeholders according to their profile.
At the top level, the model breaks service quality into six characteristics: Functionality, Reliability, Usability, Efficiency, Maintainability, Adaptability. For these case study projects, we are now building profiles of the different stakeholder groups: which ones care most about Efficiency, and which are focused on Usability. With enough data, I am sure we would find repeated patterns – no prizes for guessing what the finance department are most interested in.
Will these vary by industry? By type of service? If the patterns are consistently repeated, can we use them to develop service design patterns, driving the quality of service design up, and the cost down?
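To make the idea of a stakeholder profile concrete, here is a minimal sketch in Python. It is not the method used in the case studies: the 1 to 5 importance scale, the group names, the scenario labels and every rating below are invented for illustration, and only the six characteristic names come from the model described above.

```python
from collections import defaultdict
from statistics import mean

# The six top-level characteristics named in the post.
CHARACTERISTICS = (
    "Functionality", "Reliability", "Usability",
    "Efficiency", "Maintainability", "Adaptability",
)


def build_profiles(ratings):
    """Average score per (stakeholder group, characteristic).

    `ratings` is an iterable of (scenario, group, characteristic, score)
    tuples. The 1-5 score scale is an assumption made for this sketch.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for _scenario, group, characteristic, score in ratings:
        if characteristic not in CHARACTERISTICS:
            raise ValueError(f"unknown characteristic: {characteristic}")
        collected[group][characteristic].append(score)
    return {
        group: {c: mean(scores) for c, scores in by_char.items()}
        for group, by_char in collected.items()
    }


def top_concern(profile):
    """Return the characteristic with the highest average score."""
    return max(profile, key=profile.get)


# Invented ratings, purely illustrative.
sample_ratings = [
    ("S1", "Finance", "Efficiency", 5),
    ("S1", "Finance", "Usability", 2),
    ("S2", "Finance", "Efficiency", 4),
    ("S1", "Front-line staff", "Usability", 5),
    ("S2", "Front-line staff", "Efficiency", 3),
]

profiles = build_profiles(sample_ratings)
for group, profile in profiles.items():
    print(group, "->", top_concern(profile))
```

Because every group is scored against the same six characteristics, profiles built this way could be compared across projects, which is what would make benchmarking possible.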
I’ll be discussing these themes in my presentation at the Enterprise Architecture Conference Europe 2010 on 18th June.
Linked Data and Statistics
Semantic web seems to be dominating my thinking at work at the moment. Paul Richards and I attended the SDMX Expert Group in Geneva last week, and we presented the outcome of the Sunningdale Workshop we organised on Publishing Statistics in SDMX and the Semantic Web to the SDMX community. We received some very positive comments, and there is real recognition that the community should take advantage of semantic web developments (and has valuable experience and expertise to contribute).
I used Prezi again – which is open and shareable. Is this better than PowerPoint, or more distracting?
Part of the purpose of the presentation was to raise awareness of the community that has grown up around, and continues to develop, the ideas formed in Sunningdale. We’re already up to 60 members, and still growing. We seem to have made real progress, but have yet to prove that there is a good reason to break apart the observations that make up a dataset into countless RDF triples. I see good reasons to expose dimensions and other dataset metadata as linked data, to make them findable and linkable.
But why disassemble the actual dataset?
This seems a bit like separating all the words of a document so that they can be reassembled by a client application. The effort to make sure that the resulting document is the same as the original is substantial, and of doubtful value. I think we are close to showing that it can be done with datasets, though perhaps not very safely. I’m not sure we’ve yet shown why we should do it, but I’m happy to go along with a number of experiments that might shed some light on the matter.
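A toy sketch of what disassembly and reassembly involve, using plain Python tuples rather than a real RDF library. The dataset identifier, the `inDataset` predicate and the two observations are all hypothetical, and real vocabularies for statistical data are considerably richer than this.

```python
DATASET = "dataset/example"   # hypothetical dataset identifier
MEMBER_OF = "inDataset"       # hypothetical predicate linking an observation to its dataset

# Two invented observations: two dimensions and one measured value each.
rows = [
    {"area": "A", "year": 2009, "value": 100},
    {"area": "B", "year": 2009, "value": 120},
]


def to_triples(dataset_id, rows):
    """Break each observation into (subject, predicate, object) triples."""
    triples = []
    for index, row in enumerate(rows):
        observation = f"{dataset_id}/obs/{index}"
        triples.append((observation, MEMBER_OF, dataset_id))
        for key, value in row.items():
            triples.append((observation, key, value))
    return triples


def from_triples(dataset_id, triples):
    """Reassemble observations on the client side from a bag of triples."""
    members = [s for s, p, o in triples if p == MEMBER_OF and o == dataset_id]
    return [
        {p: o for s, p, o in triples if s == observation and p != MEMBER_OF}
        for observation in members
    ]


triples = to_triples(DATASET, rows)
print(len(triples), "triples for", len(rows), "observations")
print("round trip intact:", from_triples(DATASET, triples) == rows)

# Drop a single triple: reassembly still succeeds, but one observation
# has silently lost its value and nothing raises an error.
damaged = [t for t in triples if t != (f"{DATASET}/obs/1", "value", 120)]
print("after losing one triple:", from_triples(DATASET, damaged))
```

The round trip works here, but only because the client knows exactly which predicates to collect. The last lines illustrate the safety concern: an incomplete set of triples reassembles into a plausible-looking but wrong dataset.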
Linked data – currently a bumpy road
So, in the Smarter Government white paper, the UK Government has made a firm commitment, as part of its new “public data principles”, to publish any “raw” dataset in linked data form. What exactly does this mean?
I tried to find a clear definition of “raw” data. In almost all definitions you will find on the Internet, it means “unedited” or “unprocessed” data. Is that really what is intended by those who use the term? If so, would all of our statistical outputs be excluded? The term “raw data” is not at all helpful, especially if, in this context, it is not intended to mean the same thing as the more generally accepted definition, which comes from measurements made by scientific instruments.
I suspect that the intention is that it means the underlying datasets that were used to reach policy recommendations. These are almost always not “raw” datasets – they are statistical outputs that most certainly have been processed in order to ensure that they really are meaningful.
So, perhaps we really mean we will publish meaningful processed data (which is almost always not “raw” data) in linked data form. How easy is it to ensure that the provenance of linked data can be well understood, and taken into account by those using it? Not very, if we follow the example set by Brian Kelly of the University of Bath. He set a challenge to some students to use Linked Data to find out which UK city has the highest proportion of students. Within a few hours, a student produced the answer…Cambridge, whose students apparently number more than 3,224 times its population. According to the Linked Data web of data, there are 38,696 students living in Cambridge, which has a total population (according to the web of linked data) of 12.
Oh dear! The wonderfully elegant query created by the student looks fine, but the underlying quality of the data in some of the sources would seem to be a bit dodgy. Provenance remains a serious issue for Linked Data, and if we are to start publishing official statistics in Linked Data form it is not at all clear how Brian’s student would know how to write the query to pick up the right population of Cambridge (which I think must be more than 12).
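The figures quoted above can be checked with a few lines of Python. This is only a sketch of the kind of sanity check a consumer of linked data might apply before trusting a result. The function names are invented, and the rule that a proportion must lie between 0 and 1 assumes the students are counted as part of the population.

```python
def student_share(students, population):
    """Students as a fraction of the total population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return students / population


def is_plausible(share):
    """A genuine proportion of a population cannot fall outside 0..1."""
    return 0 <= share <= 1


# The two figures returned by the student's query, as quoted in the post.
share = student_share(38_696, 12)
print(f"{share:,.1f} students per resident")
print("plausible proportion:", is_plausible(share))
```

A check like this flags the result as impossible, but it cannot say which of the two source figures is wrong. That still requires knowing where each number came from.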
This does not mean we should ignore this world – it is early days, and the concepts make good sense. But equally, we should not rush into thinking that there is a quick fix. Those who are interested in finding a consistent way to represent statistical data in the Linked Data world are working together at this Google group. If you are interested in this, please join us.
