Time to update this page again! Jonathan Rees has taken on the substantial task of trying to resolve this issue that has raged for years. He's seeking comments on a new document that tries to capture the arguments and proposals in one place: Providing and discovering definitions of URIs.
My own (short) comments on that are in the TAG mailing list archive.
Incidentally, I have removed the links to Ian Davis's blog posts from this page as they are no longer online. Originally they were at:
- Is 303 Really Necessary? (http://iand.posterous.com/is-303-really-necessary)
- A Guide to Publishing Linked Data Without Redirects (http://iand.posterous.com/a-guide-to-publishing-linked-data-without-red)
Since I posted this, Ian has written a new blog post that centres on making use of the Content-Location HTTP Response header. I admit I'd not heard of this header (but I seem to have been in good company). The HTTP-bis Working Group's current draft on this topic includes a set of rules for parsing the headers that show how a 200 response could be given for a non-information resource as the Content-Location header points to the information resource that describes it. Details in Ian's post.
This might work and, if adopted and implemented consistently, could be an alternative to 303 redirects. Time will tell.
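As a rough sketch of how a client might apply the Content-Location idea: on a 200 response, a Content-Location that differs from the requested URI is taken to name the document that the returned representation belongs to. This is my simplified reading, not the text of the HTTP-bis draft, and the function name is purely illustrative.

```python
from urllib.parse import urljoin


def described_representation(request_uri, status, headers):
    """Return the URI that the returned payload is taken to represent.

    Simplified assumption: on a 2xx response, a Content-Location header
    names the resource the payload represents; without one, the payload
    is taken to represent the requested URI itself.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_location = lowered.get("content-location")
    if 200 <= status < 300 and content_location:
        # Content-Location may be relative; resolve it against the request URI.
        return urljoin(request_uri, content_location)
    return request_uri


thing = "http://data.ordnancesurvey.co.uk/id/postcodeunit/IP45TW"
document = described_representation(
    thing, 200, {"Content-Location": "/doc/postcodeunit/IP45TW"}
)
print(document)
```

With no Content-Location header the function simply returns the requested URI, which is the ordinary reading of a 200.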
I wrote a short piece about this on the W3C Team blog that is primarily about publicising the erratum in the POWDER specification of wdrs:describedby. That piece, and the various things it links to, mean that any discussion of the errors made would be redundant here.
Talis CTO Ian Davis has posted a blog entry in which he argues that the convention in the linked data world of giving URIs to 'things' that, when dereferenced, point to a document which is at a different location but that describes the identified thing, is a barrier to LD adoption.
Where I disagree is on Ian's solution, which is that we should stop using HTTP's 303 response and just get over ourselves and use 200, adding some triples along the way to retain the distinction between Information Resources (electronic documents, images, etc.) and non-Information Resources (everything else).
Quoting from RFC 2616
HTTP Response Code 200 OK means "The request has succeeded."
HTTP Response Code 303 See Other means "The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource. The new URI is not a substitute reference for the originally requested resource."
And we're being asked to choose between these. I think that's a false dichotomy. Actually, we need a different response altogether.
Let's take a look at a real example: http://data.ordnancesurvey.co.uk/id/postcodeunit/IP45TW
This is the identifier for my post code, as defined by the Ordnance Survey in the data recently published through our platform. If I dereference that and don't follow the redirect, the HTTP Headers include:
303 See Other
Connection: close
Location: http://data.ordnancesurvey.co.uk/doc/postcodeunit/IP45TW
Content-Type: text/html; charset=UTF-8
Linked Data folks interpret this as saying "I know what you mean by http://data.ordnancesurvey.co.uk/id/postcodeunit/IP45TW and if you want to find out about it, look at http://data.ordnancesurvey.co.uk/doc/postcodeunit/IP45TW" (note the path change from /id/ to /doc/).
But the server is also spitting out its default MIME type of text/html with a character encoding of UTF-8. That's nonsense.
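To make that interpretation concrete, here is a minimal sketch of the reading a Linked Data client applies to the status code. The function name and the tuple it returns are my own illustration, not part of any specification or library.

```python
from urllib.parse import urljoin


def linked_data_reading(request_uri, status, headers):
    """Return (thing_uri, document_uri) under the 303 convention.

    303: the requested URI names a thing, and Location names a document
         that describes it.
    200: the requested URI is itself the document, so no separate thing
         is implied.
    Anything else is outside this sketch and yields (None, None).
    """
    if status == 303 and "Location" in headers:
        # Location may be relative; resolve it against the request URI.
        return request_uri, urljoin(request_uri, headers["Location"])
    if status == 200:
        return None, request_uri
    return None, None


# The headers returned for the postcode identifier above.
response_headers = {
    "Connection": "close",
    "Location": "http://data.ordnancesurvey.co.uk/doc/postcodeunit/IP45TW",
    "Content-Type": "text/html; charset=UTF-8",
}
thing, document = linked_data_reading(
    "http://data.ordnancesurvey.co.uk/id/postcodeunit/IP45TW",
    303,
    response_headers,
)
print(thing, "is described by", document)
```

Note that the sketch ignores the Content-Type on the 303 entirely, which is exactly what a client has to do to make sense of the response.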
Let's look again at what 303 says:
- The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource.
So the response to a request for the identifier for IP4 5TW can be found somewhere else, somewhere with a different URI. That makes it a different resource and that makes sense. It fits the architecture of the Web. OK, it's not a perfect fit but it will serve our purpose.
- This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource.
OK, it says "primarily" so it can be used for something else but we're not using it as intended and we're stretching the semantics a lot. We're a long way from responding to an HTTP POST here.
- The new URI is not a substitute reference for the originally requested resource.
That, for me, is the strongest argument in favour of using 303.
It's not a bad fit, but, for all the reasons Ian elucidates, it's not a developer-friendly solution and, from the above, it's not really doing what we want.
So Ian's solution is to use 200 and add in some triples. Let's try that. We'd have:
<http://data.ordnancesurvey.co.uk/id/postcodeunit/IP45TW> <http://data.ordnancesurvey.co.uk/ontology/postcode/county> <http://data.ordnancesurvey.co.uk/id/7000000000015934>
<http://data.ordnancesurvey.co.uk/id/postcodeunit/IP45TW> ex:isDescribedBy <http://data.ordnancesurvey.co.uk/doc/postcodeunit/IP45TW.html>
But we'll also get a load of HTTP headers. Assuming they'll be much the same as what we see at http://data.ordnancesurvey.co.uk/doc/postcodeunit/IP45TW.html now that means we'll also have:
Cache-Control: max-age=7200, must-revalidate
Content-Type: text/html; charset=UTF-8
And that's where it is clearly wrong. The data we have may be in different formats but it's still data. It's saying that IP4 5TW is in Suffolk (it is) and that it has a maximum age of 7200 seconds and is encoded in UTF-8, which is obvious nonsense.
It's that obvious nonsense that led to the adoption of 303 as the solution, which is a hack: making the best of what is available.
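The problem can be sketched by treating every response header as a statement about whatever URI was requested. This is a deliberately naive model of my own for illustration, not how any real client stores headers.

```python
def naive_statements(subject_uri, headers):
    """Treat each HTTP header as a statement about subject_uri."""
    return [(subject_uri, name, value) for name, value in headers.items()]


postcode = "http://data.ordnancesurvey.co.uk/id/postcodeunit/IP45TW"
document = "http://data.ordnancesurvey.co.uk/doc/postcodeunit/IP45TW.html"
headers = {
    "Cache-Control": "max-age=7200, must-revalidate",
    "Content-Type": "text/html; charset=UTF-8",
}

# With a 200 on the /id/ URI, the headers attach to the postcode itself.
for statement in naive_statements(postcode, headers):
    print(statement)

# With a 303 followed by a 200 on the /doc/ URI, they attach to the document.
for statement in naive_statements(document, headers):
    print(statement)
```

The first loop claims that a postcode has a content type and a cache lifetime; the second makes the same claims about an HTML page, where they make sense.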
Faced with a choice of 200 or 303, I'm firmly in the 303 camp. If we care about semantics and the detail in the data then we can't just cherry pick the bits of data that suit us from the response the server is sending, conveniently ignoring the other data we're getting.
But that is not to say I think 303 is a good solution. It's a compromise. A better solution would be a set of data points that actually described the situation. That is, an encoding of:
The resource you have requested is not an information resource and therefore cannot be returned over HTTP. Instead, you're receiving a document that describes that resource. The document is of the type text/html, is encoded in UTF-8 etc.
That's not in the HTTP spec. But if that's what we want to say, and I believe it is, then we jolly well need to get it into the spec. There are plenty of numbers available as 2xx codes!
Messing with HTTP is a huge undertaking. Getting an addition to the spec is a bit of a hurdle but that's all it is, a hurdle. And it can be overcome, especially if what is proposed is an addition that doesn't change any existing features. The bigger hurdle by far is getting such a new response code implemented.
I don't underestimate the latter, but given the false dichotomy of 200 or 303, I think it's worth a go.
Finally, a relatively minor point in this context but one that, if I don't make it, no one else will!
The POWDER describedby property is formally defined at http://www.w3.org/TR/powder-dr/#appD as meaning:
The relationship A 'describedby' B asserts that resource B provides a description of resource A. There are no constraints on the format or representation of either A or B, neither are there any further constraints on either resource.
If I say so myself, that seems a pretty good fit for what we need here. And, a non-trivial point I hope, is that the term is also included in the list of link relationship types in the new Web Linking RFC 5988.
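Because describedby is a registered link relation, the same assertion could in principle travel in an HTTP Link header as well as in triples. The sketch below builds such a header value and reads it back; the function names are illustrative, and the parser is a minimal one that assumes rel is the first parameter after the target, not a full RFC 5988 implementation.

```python
import re


def describedby_link(document_uri):
    """Build a Link header value pointing at a describing document."""
    return f'<{document_uri}>; rel="describedby"'


# Matches <target>; rel="..." where rel directly follows the target.
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def find_describedby(link_header_value):
    """Return the first target whose rel includes describedby, else None."""
    for target, rel in LINK_PATTERN.findall(link_header_value):
        # A rel value may list several space-separated relation types.
        if "describedby" in rel.lower().split():
            return target
    return None


value = describedby_link(
    "http://data.ordnancesurvey.co.uk/doc/postcodeunit/IP45TW"
)
print("Link:", value)
print(find_describedby(value))
```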