Phil Archer

phil@philarcher.org

Online Privacy, Do Not Track etc.

As part of my role as an advisor for MetaCert, I've been catching up on the issue of privacy on the Web, specifically around the Do Not Track header, Tracking Protection Lists, the W3C activities in this area and the comments of policy makers in the EU and US. Lots to look at and I've probably missed a lot more. Reading through a few things today I wanted to capture some thoughts.

Web Tracking Protection

Microsoft made a Member Submission to W3C in February called Web Tracking Protection. The features it describes are already implemented in Internet Explorer 9 and there are similar features in Firefox 4+ and a Chrome extension — and it's not even a stable standard yet.

The submission sets out two things:

First, a Do Not Track header, that is, an HTTP Header that would be sent with every request your browser makes. It's as simple as it can be:

DNT: 1

Whether this is set or not is exposed in the DOM so it's available to developers via JavaScript too.
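To make the header concrete, here is a minimal sketch in Python of both ends of the exchange: a client attaching DNT: 1 to a request, and a server-side check for it. The function names with_dnt and wants_no_tracking are hypothetical, invented for this illustration, and the sketch says nothing about what a site should actually do once it sees the header.

```python
from typing import Mapping
from urllib.request import Request


def with_dnt(url: str) -> Request:
    """Build a request that carries the Do Not Track header (hypothetical helper)."""
    return Request(url, headers={"DNT": "1"})


def wants_no_tracking(headers: Mapping[str, str]) -> bool:
    """Server side: True when the client sent DNT: 1.

    HTTP header names are case-insensitive, so normalise them before the lookup.
    """
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("dnt") == "1"


# Nothing is sent over the network here; the request object is only constructed.
req = with_dnt("http://example.com/")
print(wants_no_tracking(dict(req.header_items())))
```

Note that urllib stores the header name as "Dnt", which is equivalent because header names are not case-sensitive.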

But the bulk of the spec describes the syntax for Tracking Protection Lists (TPLs). This is a way to define lists of sites that are allowed and blocked from collecting data from you. That's a simplification. It only kicks in if the blocked or allowed site is providing a third party service to the site you're actually visiting. So if I were to include content from example.com on this site, you could use a TPL to block that site from collecting data when you visited philarcher.org. If you go to example.com itself, that's the primary site you're visiting and the TPL doesn't apply.
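The third-party rule can be sketched in a few lines of Python. This is a simplified model and not the submission's actual syntax or matching algorithm: it compares host names only, treats "first party" as an exact host match, and assumes that an allow entry wins over a block entry. The names host_in and is_blocked are hypothetical.

```python
from urllib.parse import urlsplit


def host_in(host: str, domains: set) -> bool:
    """True if host equals, or is a subdomain of, any listed domain."""
    return any(host == d or host.endswith("." + d) for d in domains)


def is_blocked(page_url: str, resource_url: str, allowed: set, blocked: set) -> bool:
    """Decide whether a request made while viewing page_url should be blocked."""
    page_host = urlsplit(page_url).hostname or ""
    resource_host = urlsplit(resource_url).hostname or ""
    # The list only applies to third-party content (assumption: exact host comparison).
    if resource_host == page_host:
        return False
    # Assumption: an allow entry takes precedence over a block entry.
    if host_in(resource_host, allowed):
        return False
    return host_in(resource_host, blocked)


blocked = {"example.com"}
# Content from example.com embedded in philarcher.org: third party, so blocked.
print(is_blocked("http://philarcher.org/", "http://example.com/widget.js", set(), blocked))
# Visiting example.com itself: first party, so the list does not apply.
print(is_blocked("http://example.com/", "http://example.com/widget.js", set(), blocked))
```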

The design of the TPL, or Filter List, looks very simple — which is good. But the flexibility that has been built in, allowing any string match in the optional path section and so on, seems rather loose. We struggled with this in POWDER Grouping. URIs are a very compact data format and syntax is critical. You need to handle it carefully and that means that some complexity is inevitable.

What we ended up with was a way to express any group of URIs, no matter how complex. The simplification step was to allow you to specify the pattern bit by bit with things like "includepathstartswith" etc.
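As a rough illustration of the "bit by bit" idea, the Python sketch below tests a URI against separate constraints on its host and on the start of its path, rather than against one free-form string. It borrows the name includepathstartswith from the text; the includehosts parameter, the in_group function and the matching rules are assumptions made for this example and are not a faithful rendering of POWDER's semantics.

```python
from urllib.parse import urlsplit


def in_group(uri: str, includehosts=(), includepathstartswith=()) -> bool:
    """True if the URI satisfies every constraint that was supplied.

    Each constraint is a list of alternatives; an empty constraint is ignored.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if includehosts and not any(
        host == h or host.endswith("." + h) for h in includehosts
    ):
        return False
    if includepathstartswith and not any(
        parts.path.startswith(prefix) for prefix in includepathstartswith
    ):
        return False
    return True


# Each component of the URI is checked on its own terms.
print(in_group(
    "http://www.example.com/books/sf/title",
    includehosts=["example.com"],
    includepathstartswith=["/books/sf/"],
))
```

Splitting the URI first means a path constraint can never accidentally match text in the host name, which is the kind of looseness a plain substring match allows.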

My expectation is that in reality, filter lists will be used only to block and allow domains and that the optional path section will rarely be used. I adduce the poster boy TRUSTe Tracking Protection List as evidence! This makes sense in context. If you trust example.com/page1.html to track your behaviour, why would you not trust example.com/page2.html? And anyway, it's not the site you're looking at that is or is not allowed to track you. It's the third party site whose content you probably don't even realise you're looking at because it's not that site showing in your address bar.

The only scenario I can think of where switching tracking on and off by path might be useful is if you were buying different goods online from the same vendor and didn't want to be tracked when buying one thing but didn't mind being tracked when buying another. For example, massive disclosure coming up, I've pre-ordered Stephen Baxter's latest novel Bronze Summer. The URL for the page on the Waterstone's Web site for this is http://www.waterstones.com/waterstonesweb/products/stephen+baxter/bronze+summer/7037858/.

I don't mind Waterstone's tracking me on this. I have a loyalty card with them, for goodness' sake, so I know that they already know everything about my literary preferences. But I might prefer not to be tracked if I were buying a book for my son who, for example, likes Darren Shan (http://www.waterstones.com/waterstonesweb/products/darren+shan/hell27s+heroes/6769747/), so maybe I'd add

+d stephen*baxter
-d darren*shan

to a personal TPL. That way I'd be covered whether I was being tracked buying books from Waterstone's or Amazon (http://www.amazon.co.uk/Bronze-Summer-Northland-Stephen-Baxter/dp/0575089229/). But no, hang on: when I'm looking at Waterstone's or Amazon, that is the primary site, so the TPL doesn't apply. From a user perspective, I think I'd want to be able to say "don't track me when I'm on waterstones.com, irrespective of who is doing the tracking on your behalf." Also, I think a method of just listing domains would be sufficient. And even then, would I be bothered to maintain a TPL? Probably not, in all honesty, but I would be interested in using a list of sites that did and did not "do the right thing" that someone I trusted had compiled.

And that's one aspect I like about the TPL specification and IE9 implementation — the way the data is transmitted. Having imported the TRUSTe list into IE9 it will automatically be updated every 2 days (a period set by TRUSTe) and a user can easily add in more lists from other suppliers (like MetaCert in due course).

The DNT header only works if it is respected by the companies that would otherwise track you.

The publication of TPLs by companies like TRUSTe and (in future) MetaCert means that the user's choice of rules is implemented in the browser, so sites that want to track you and that don't respect the DNT header will be thwarted. Good.

The Broader Debate

W3C held a workshop on Web Tracking and User Privacy in April and, looking through the report and the slides, it was clearly a good discussion. Lorrie Cranor was there. She was one of the principal architects of P3P, which was a well-designed and detailed specification for exchanging privacy preferences. P3P was fully implemented in Internet Explorer too. Tracking protection is only one part of privacy and, in my opinion, the biggest weakness of P3P was that it enshrined its own vocabulary within the spec. Bridges, water and all that.

There's a lot of talk among developers about the EU directive on cookies. This is the one that appears to require Web sites to seek explicit permission from every visitor before any cookies can be set. The most typical comment from developers is along the lines of "hang on, the only way you can know whether a visitor to the site has refused permission for you to set cookies… is to set a cookie." Like most UK reporting of EU law, the coverage is alarmist and inaccurate.

The directive (actually an amendment to the Privacy and Electronic Communications Directive) only applies to cookies that carry user-specific data. It doesn't apply to per-session cookies, shopping baskets etc. More on this from Scottish SEO company Hobo: New EU Privacy Directive On Cookies — "12 Months To Get Your House In Order" in UK.

But the policy makers and politicians are a long way from being satisfied. Both the American FTC and the EU Commission are looking well beyond cookies as reported at the second W3C workshop on this subject that took place last week in Brussels.

Metadata, browsers, user choice, user interface. It all sounds terribly familiar but I'm hopeful that we can get it right. It's part of my role at MetaCert to make sure of it!