Fear of RDF

Yesterday I received a tweet that said: @philarcher1 when I hear "Open Data" I always fear it's RDF. Is that the case or are these new open data initiatives more Web friendly?

Even a cursory look at the sender's Web site shows you that he's a very knowledgeable programmer who is particularly skilled in JavaScript. I know him virtually through the W3C's Mobile Web courses (one of which I teach) and it's clear he knows his stuff.

This tweet, however, strongly suggests that he's so entrenched in one world view that he rejects any other out of hand. I know this not to be the case, but it is the impression given in the limited circumstances of 140 quickly written characters. This is not helpful.

JavaScript allows you to create highly engaging Web content that is dynamic and interactive. If you’re good at it (and this person is) then you can get a browser to do extraordinary things these days, including, literally, making it sing (through the audio API). The data exchange method used by JavaScript, JSON, is very powerful and all the modern browsers have built-in support for it. Nothing wrong with JSON, nothing wrong with JavaScript.

What saddens me of course is the rejection of RDF. Some basics:

JavaScript is a programming language that allows you to do clever things in a Web browser;
JSON is a simple data exchange format that encodes name/value pairs and arrays;
RDF is a data model that allows you to describe the real world;
the real world can be complicated and messy;
Linked Data is an application of RDF that allows disparate data sources to enrich each other.

To 'fear RDF' is as silly as to fear wooden spoons — different tools for different jobs.

Can you not enrich (mix) data with JSON? Of course you can, but you do it by matching one table with another in a relational-database sort of way. Again, nothing wrong with relational databases; it's just that RDF is better for some things, like publishing datasets in a way that encourages re-use in ways that the publisher themselves may not have thought of. A tabular data set is essentially two dimensional and, to emphasise, very good for a lot of things. RDF is multi-dimensional and good for a lot of other things. The two overlap, but not by 100%.
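To make the point about mixing data concrete, here is a minimal sketch in plain Python, with no RDF library involved. It treats a graph as a set of (subject, predicate, object) triples, which is the core of the RDF data model. Every URI, property name and figure below is invented for illustration. Because both hypothetical publishers use the same URI for the same town, combining their data is a set union rather than a join on matching columns.

```python
# All URIs, property names and values here are made up for illustration.
EX = "http://example.org/"
TOWN = EX + "place/exampleton"
COUNTRY = EX + "place/exampleland"

# Source A: a hypothetical statistics office.
source_a = {
    (TOWN, EX + "prop/population", 50000),
    (TOWN, EX + "prop/inCountry", COUNTRY),
}

# Source B: a hypothetical energy agency that knows nothing about source A.
source_b = {
    (TOWN, EX + "prop/solarCapacityMW", 12),
    (COUNTRY, EX + "prop/label", "Exampleland"),
}

# Merging two graphs is a set union: shared URIs line the data up.
merged = source_a | source_b


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject."""
    pairs = [(p, o) for (s, p, o) in graph if s == subject]
    return sorted(pairs, key=lambda pair: pair[0])


for predicate, value in describe(merged, TOWN):
    print(predicate, value)
```

Neither source needed to know about the other's properties in advance, which is the kind of unplanned re-use described above. A real application would use an RDF toolkit and a query language rather than list comprehensions.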

At the Uses of Open Data workshop I ran recently, I saw a couple of examples where the power of linked data is making a real difference. The Renewable Energy and Energy Efficiency Partnership, Reegle, collects and triplifies data from many different sources to present a lot of domain-specific information in different ways. Similarly, publicspending.gr collects data from different sources, triplifies it and presents it in useful ways. In both cases, the aggregated data is made available as linked data that can be accessed using the same query language and tools that are used in the applications themselves. Try doing that in JSON. You can, yes, but it's a lot easier to use RDF for that kind of thing.

The idea that RDF is not Web friendly is, well, arrant nonsense. RDF is based on using URIs to identify things and relationships between them. If you restrict the use of RDF to using URIs that begin http: and follow a set of simple best practices, you get Linked Data which is how you do data at Web scale. That is, a massively distributed data set that anyone can contribute to, just like they can to the Web of documents. The idea of using URIs to link data is right there in TimBL's original paper from 1989. The Web of Data is woven into the very fabric of the Web.

Tim Berners-Lee's original proposal for what became the World Wide Web

But there's more.

One of the reasons some public sector officials are reluctant to publish their data is that they fear that it will be misused. Notably the fear is that some of the important detail in the data will be missed. Tabular data typically includes annotations such as "beware this is an estimate" or "this value is missing because of XYZ". That kind of data is an important part of the story and it's vital that developers take note. Those annotations, and the general metadata that goes with any data set, are what tell you the context in which the data was gathered and what it does and does not represent. To encode that you need rather more than name/value pairs and the odd array. I said more about this in a recent keynote.
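As a rough sketch of what carrying the annotations with the data can look like, the Python below contrasts a flat name/value record with the same figure modelled as an observation that has its own statements attached. The namespace, property names and numbers are all hypothetical, and the triples are plain tuples rather than the output of any RDF library.

```python
import json

EX = "http://example.org/"  # hypothetical namespace

# A flat name/value record: the figure is there, the caveat is not.
flat = json.loads('{"region": "North", "unemploymentRate": 7.2}')

# The same figure as an observation node. The caveat is one more statement
# about the observation, so it travels with the value.
obs = EX + "obs/north-2012"
graph = {
    (obs, EX + "region", EX + "region/north"),
    (obs, EX + "unemploymentRate", 7.2),
    (obs, EX + "status", EX + "status/estimate"),
    (obs, EX + "note", "Small survey sample; treat as an estimate."),
}


def is_estimate(graph, observation):
    """True if the graph flags this observation as an estimate."""
    return (observation, EX + "status", EX + "status/estimate") in graph


print(is_estimate(graph, obs))
```

Nested JSON could hold a similar note, but only by a convention private to one publisher. In the triple form the status is identified by a URI that other datasets and tools can share.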

Religious wars that reduce to "my technology's better than your technology" are always counterproductive. That said, I do think that the linked data community does need to do more to make developers' lives easier. JSON-LD is a really good start, so is R2RML, and I'm looking forward to seeing what comes out of the heavily JSON-centric Data Protocols work announced this week by the Open Knowledge Foundation.

But please don't fear RDF and don't tell me that it's not Web friendly.

25 July The original Twitter user wrote:

I'm a bit upset that you called me out like that in your blog post. I was asking honestly and I think you unfairly made me sound like an ignorant idiot. I don't just speak from one side: I sunk a lot of years into RDF and got my initial PhD funding based on the premise of using RDF to build scalable games. I actually did 1.5 honours theses (with Jane Hunter, who you may have heard of… she was big in metadata some years back). The first honours project I did was building an RDF repo to catalogue children's art by embedding RDF into images. However, I had to quit that project because RDF caused me so many issues that I gave up. I sent you the link to my second thesis, so I feel I do kinda know what I am talking about - so I was not just trolling.

I take the role of open data extremely seriously and see access to that information as vital to society. However, I fear that instead of data being provided as "raw" data, much of it will be pumped out as RDF. All data is hard to deal with - but RDF is particularly tricky and tool support is, AFAIK, extremely poor (I speak from experience - not as someone who is just some one-sided person).

Anyway, I think you could have made your points about RDF and its relation to DBs, etc. without attacking me directly. If there is anything "the other side" wants to see, it is tools that show the potential and that RDF is as easy to deal with on Web clients as JSON or plain text. The fact remains that browsers support JSON out of the box, but not RDF… and that might make it harder to use. There are lots of web developers that would love to play with sources that are currently in RDF (e.g., stuff from museums in the UK), but are put off by the complexity. I don't think that is one-sided.

Anyway, I wish we could have a good discussion about this. I'm an open minded person and I'm always willing to try things, like RDF, again.

Reply

First of all, my apologies — it certainly was not my intention to make you sound like an ignorant idiot. I have anonymised this page very rapidly and deleted my Tweet that alerted you to it.

Interesting that you make a distinction between 'raw data' and RDF. That strikes me as odd - even more so given your evident experience. RDF is just a way of encoding data, but one that is flexible enough to make a reasonable job of modelling the real world in ways that simpler technologies struggle with. I would argue that, in some circumstances, 'raw data' - by which I assume you mean something tabular - has been processed very heavily to force it into a table, and that such processing might eliminate details of its meaning. This can be true in RDF as well of course - my bugbear is imprecise but accurate dates like "July 2012" that get turned into 2012-07-01T00:00:00Z, which is very precise but could be wrong by as much as 31 days, just so it fits the regex.
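The date bugbear can be shown in a few lines. This is a sketch only, with hypothetical function names: one function pads "July 2012" so it fits a full date-time pattern, the other keeps just what is known, using the YYYY-MM lexical form that XML Schema defines for xsd:gYearMonth.

```python
import calendar
from datetime import datetime, timezone


def force_to_datetime(year, month):
    """Pad the unknown day and time so the value fits a dateTime pattern."""
    padded = datetime(year, month, 1, tzinfo=timezone.utc)
    return padded.strftime("%Y-%m-%dT%H:%M:%SZ")


def keep_year_month(year, month):
    """Keep only what is known, in the xsd:gYearMonth lexical form."""
    return f"{year:04d}-{month:02d}"


def days_in_month(year, month):
    """Upper bound, in days, on the error introduced by assuming day 1."""
    return calendar.monthrange(year, month)[1]


print(force_to_datetime(2012, 7))  # 2012-07-01T00:00:00Z
print(keep_year_month(2012, 7))    # 2012-07
print(days_in_month(2012, 7))      # 31
```

The padded value looks more precise than the source ever was. The year-month form says exactly as much as the publisher knew.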

We agree entirely that there is a need to make RDF easier to deal with and support in the browsers would be an obvious aid there but my broader point is that the world is messy. I don't know whether Einstein ever actually said this but it's attributed to him: Everything should be made as simple as possible, but not simpler.

To emphasise my final para, I do agree that the Linked Data community is where the work needs to be done to make RDF easier to process, and there are serious efforts in that direction like Sindice. But I don't think that means that we can only work with data models that can always be handled in a particular way. Sometimes you need more expressivity and yes, that can be hard.
