Talks Given in 2017

Data on the Web Cambridge,

Data on the Web

Through Muriel Foulonneau (formerly at the Luxembourg Institute of Science and Technology), I was invited to give a talk at one of Amazon's UK offices in Cambridge, where they primarily work on Alexa (the original company developed a lot of the technology and was then bought by Amazon). I gave a general intro to W3C (using the usual team photo), highlighting the horizontal review aspect. Then on to TimBL's campaign for 'raw data now' from which a thousand flowers will bloom. Well … maybe. Except that people don't use the Web as designed; they use it as a glorified USB stick.

So here's something I'd like the Web to be able to do (perhaps via Alexa). Here are two familiar UK actors. The one on the left is well known; Hugh Bonneville is best known for Downton Abbey but he's done a lot of other things including W1A, the comedy about the BBC. But who's the chap on the right? I want to be able to say 'Alexa, who's that chap there?' There's a lot about him online. He won a BAFTA for his portrayal of the landlord who was wrongly charged with the murder of Jo Yeates, and he was on TV at the time of this talk playing Solomon Coop in the drama Taboo. But what's his name? If data is shared inside zip files for download and local processing you'll never find out that his name is Jason Watkins. Hence: Data on the Web Best Practices - how to share data intelligently - which has a specialised version about spatial data.

I then went on to talk about the concept of content negotiation by application profile (which the then-planned Dataset Exchange WG will work on) and the importance of schema.org for discovery. And maybe schema.org can also help with the problem of fake news (a term strongly associated with Trump). A huge amount of online content, particularly the professionally produced content, is marked up with schema.org and other embedded data. Can machines other than search engines not make better use of that? Dan Brickley tells me that he'd love to see schema.org data being used by applications other than search engines. I referred to NIF 2.0 as a possible future standard - there is a huge amount of old documentation to be processed and data extracted.
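
To make the idea of negotiating by profile a little more concrete, here is a minimal sketch in Python. It assumes a server that holds several representations of the same dataset description, differing in both media type and the profile they conform to, and picks one based on what the client asks for. The profile URIs, file names and the `negotiate` function are all hypothetical; how a client would actually signal its preferred profile is exactly what the working group was being set up to decide, so nothing here should be read as the eventual standard.

```python
from typing import Optional

# Hypothetical representations of one dataset description.
# Profile URIs and file names are made up for illustration.
REPRESENTATIONS = [
    {"media_type": "text/turtle",
     "profile": "https://example.org/profile/dcat-ap",
     "body": "dataset.dcat-ap.ttl"},
    {"media_type": "text/turtle",
     "profile": "https://example.org/profile/geo",
     "body": "dataset.geo.ttl"},
    {"media_type": "application/ld+json",
     "profile": "https://example.org/profile/dcat-ap",
     "body": "dataset.dcat-ap.jsonld"},
]


def negotiate(media_type: str, profile: Optional[str] = None) -> Optional[dict]:
    """Return the first representation matching the media type and,
    if one is requested, the profile. Return None if nothing matches."""
    for rep in REPRESENTATIONS:
        if rep["media_type"] != media_type:
            continue
        if profile is None or rep["profile"] == profile:
            return rep
    return None


if __name__ == "__main__":
    # Same media type, different profile, different answer.
    print(negotiate("text/turtle", "https://example.org/profile/geo"))
```

The point of the sketch is that media type alone is not enough to choose: two Turtle files can describe the same thing using different vocabularies and rules.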

As this was a general talk, I thought I'd touch on the security disclosure best practices that W3C is promoting. They say that best practice is not to go after someone who points out a security flaw in your system. It's part of the response to the very significant criticism we're facing over EME. Then, looking for more possible relevance to Amazon, I mentioned François Daoust's work on Web and TV, second screen etc., Dave Raggett's work on Web of Things, and the strategy funnel. I ended with the general exhortation that decisions are made by people who turn up - so by all means, turn up (Amazon is not a W3C Member).

In the Q&A, I ended up mentioning the idea of annotating RDF statements with temporal and probabilistic qualifiers - this went down very well with the engineers in the audience as it was right in their field.
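
As a rough illustration of what such qualifiers might look like, here is a small Python sketch. It is not RDF and not a proposal: the class, the field names and the example values are all invented for this note. It simply attaches a validity period and a probability to a subject-predicate-object statement and checks whether the statement is asserted to hold on a given day.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class QualifiedStatement:
    """A triple plus hypothetical temporal and probabilistic qualifiers."""
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[date] = None   # None means "no known start"
    valid_to: Optional[date] = None     # None means "no known end"
    probability: float = 1.0            # confidence that the statement is true

    def holds_on(self, day: date) -> bool:
        """True if the day falls inside the validity period (inclusive)."""
        if self.valid_from is not None and day < self.valid_from:
            return False
        if self.valid_to is not None and day > self.valid_to:
            return False
        return True


# Made-up example data.
stmt = QualifiedStatement(
    "ex:alice", "ex:worksFor", "ex:acme",
    valid_from=date(2015, 1, 1), valid_to=date(2016, 12, 31),
    probability=0.9,
)

if __name__ == "__main__":
    print(stmt.holds_on(date(2016, 6, 1)), stmt.probability)
```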

Linked Data implementation - how can and should it work Manchester,

W3C Update

Video of the evening. I followed the BBC's Augustine Kwanashie, who spoke about the latest developments in their Linked Data Platform.

This was a general update on W3C work in which I tried to cover recent past, present and near future work.

It begins with the usual team photo (now more than 3 years old) that I use to highlight the breadth of work at W3C (I only cover a bit of it). Then I talked a little about the Data on the Web Best Practices. This led to highlighting the 3 layers of metadata: discovery, assessment and structure. We have standards in all those areas. schema.org is almost certainly what should be used now to improve discovery of data, then for assessment there are a variety of things. These include Prov, Data Quality and Dataset Usage vocabs etc. I pointed specifically to ODRL which should be at Candidate Rec soon. Then the CSV on the Web standards - with the usual lines about the network effect and using the Web as originally designed and not as a glorified USB stick.
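
For the discovery layer, the practical upshot is that any program, not just a search engine, can read schema.org data embedded in a page. The following Python sketch, using only the standard library, pulls JSON-LD blocks out of an HTML document. The sample page and its dataset description are made up for illustration, and real pages may embed data as Microdata or RDFa instead, which this sketch does not handle.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD object found in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


# Made-up sample page describing a dataset with schema.org terms.
SAMPLE_PAGE = """
<html><head><title>Example street trees</title>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "Dataset",
 "name": "Example street trees",
 "license": "https://creativecommons.org/licenses/by/4.0/"}
</script>
</head><body><p>Human-readable description here.</p></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_PAGE):
        print(item["@type"], "-", item["name"])
```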

Time to speed up... so I rattled through the Spatial Data on the Web WG outputs, highlighting BP 1 on using persistent URIs - again back to the network effect thing. Linked Data Notifications became a Rec the day before the meeting so I mentioned that although it's outside my area in W3C. Then SHACL - now in CR - and on to the Dataset Exchange WG which I knew was going to launch before the end of the week. Future work on vocabularies has been a long time coming but we should soon make it easy to publish to w3.org/ns/* from a GitHub repo. And then we get to blue sky thinking. What's next? I mentioned possible work on annotating RDF statements with temporal and probabilistic modifiers. In the Q&A, that got a "please do that" as I'm now hearing wherever I mention it.

IPTC Spring Meeting London,

Publishing and more at W3C

Not for the first time, I stood in for Ivan Herman to give a presentation to IPTC, this time in London. IPTC was meeting as part of a bigger set of events that week and the next day was the beginning of the two-day face-to-face meeting of the Permissions & Obligations Expression WG at Canary Wharf.

I was using my version of Ivan's slides for the early part of the talk, setting out what the Digital Publishing Activity at W3C was doing, the different groups, and their function. Slide 12 shows the draft DPub WG charter that we had expected to have been approved by the membership by the time of this talk but a couple of members have objected and so discussions are ongoing. An important deliverable of the proposed WG is 'Web Publications.' I used a snapshot of the book on Sogn Fjord that Heather Broomfield kindly gave me on my visit to Difi at Leikanger last year as an example (slide 14). It has two images, text in two languages and different colours, fonts, non-ASCII characters, page numbers, and a bookmark. All of those things would have their own URI on the Web but we need to be able to refer to the publication as a whole, knowing that it is a collection of resources. Then that publication needs to be available on and off line, which may need the help of Service Workers (slide 15).

I then gave a quick overview of some other relevant work at W3C, starting with Web Annotations. I wanted to mention Big Data Europe as this has a pilot around NLP, i.e. extracting value from archive material of which newspapers have a great deal. Then a quick mention of Permissions & Obligations Expression (ODRL) and the Dataset Exchange WG. This was the first time I used the image in slide 19 that, I hope, conveys the idea of an exchange between different parties. Profiles are very important for ODRL. The flower picture is not used here in the way I normally do to talk about opening up data and letting a thousand flowers bloom; rather, this is about colour management - different ways of defining colour beyond hex, HSL and RGB. I included a link to Doug Schepers' work on accessible diagrams, emphasising that this is a way to package data and make it shareable in a meaningful way - something that I hope might be useful for infographics.

The final substantive slide, with the photographer looking at all her equipment, is about the sheer variety and potential of the Web. Lots of power and potential to completely change publishing. There's plenty of room for blue sky thinking - if you turn up and take part.

Geonovum, 10th Anniversary Open Geo Day Amersfoort,

Spatial Data on the Web

I was invited to give a talk about the Spatial Data on the Web WG by one of its key members, Linda van den Brink. She's one of the editors of the Best Practices document so it seems a little odd that I should be the one to present it but that's the way of these things. I based the talk on the presentation given to the OGC Membership by the other principal author, Jeremy Tandy. This meant following his narrative reasonably closely.

I begin with a very brief history of the WG, starting with the March 2014 workshop and the WG charter. Then I quickly show the other docs produced by the WG before getting to the main topic of the talk - the Best Practices doc itself.

Slide 12/13: This is what we expect on the Web. Information on a Web page, perhaps with a map embedded, that we can use to find our way (the map shows the pedestrian route from Geonovum's office to the event venue in Amersfoort).

Slide 14/15: But in the spatial world that's not what we get. Rather, we use specialist portals, such as the INSPIRE portal, and search results are in the form of lots of metadata that may be unfamiliar. This is not useful to non-specialist users. I couldn't even find a URI for that dataset, or the search results.

Slide 16: Asks whether something is truly on the Web if a regular search engine can't find it.

Slide 17: Shows the markup behind the Geonovum address page - designed for human interpretation, not machines. It requires a lot of NLP and inference to extract the actual data - which doesn't scale.

Slide 18: The SDW-BP doc works within the 5 star paradigm - although I only use that with caution. 5 Star data is more valuable than 3 star, but most developers will only thank you for 3 stars.

Slide 19: SDW-BP offers a route to 5 star LOD - with a spatial rosette.

Then Jeremy talked about the introductory material (there's a lot of very good stuff, but it's not a textbook). And then to the actual BPs themselves. The first 3 are Web fundamentals, then 4-11 are about specific aspects of spatial data. BP 12 talks about APIs (and builds on the DWBP work) and then there are a couple on spatial metadata.

Slide 22: The SDW BP doc ends with an analysis of what isn't included. The talk is quite long by now so I've shortened this as a way to squeeze in a mention of the Dataset Exchange WG and the JWOC that will maintain the SDW BP doc and others from the original WG.

Then the much-used Cliffs of Moher sequence to end.

Göteborgs Stad

The Public Sector Web of Data

I went to Göteborg primarily for a meeting with the RDA but the way things worked out meant that I was going to be in the city for the whole of the previous day. I'd met Fredric Landqvist of Findwise a couple of years previously when I'd given a talk at a Linked Data event he and Kerstin Forsberg organised so I got in touch to let him know I'd be there and he set up two meetings for me. The first was a breakfast meeting with about 15 people from various public and private sector organisations. After a few introductory remarks I fielded questions in a wide-ranging discussion for about an hour and a half. After a very pleasant lunch, we then headed to the offices of Göteborgs Stad, for which Fredric and Findwise do a lot of work. There I gave a variation of my by now usual presentation about data on the Web.

I began with the W3C team photo to talk about the range of things we do, and then highlighted the Share-PSI and Data on the Web Best Practices, both of which were relevant to the audience. An overview of the DWBPs was followed by a discussion of the different levels of metadata (Discovery, Assessment and Structure). schema.org must be part of the discovery story these days, ODRL is part of the assessment. For structure I used the rows of trees image to refer to CSV on the Web - an image I'd first used for that talk for Fredric in April 2015. As ever, this is all about highlighting the need to use the Web properly to link data points and not as a glorified USB stick.
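
For the structure layer, the essence of CSV on the Web is a small JSON metadata file that sits alongside a CSV file and says what each column means. The Python sketch below shows the idea under some loud assumptions: the file name, column names and data are invented, only three datatypes are handled, and this is in no way a conforming CSVW processor. It reads the column definitions from the metadata and uses them to turn raw strings into typed values.

```python
import csv
import io

# A cut-down metadata description in the style of CSV on the Web.
# File name and columns are made up for illustration.
METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "trees.csv",
    "tableSchema": {
        "columns": [
            {"name": "id", "datatype": "integer"},
            {"name": "species", "datatype": "string"},
            {"name": "height_m", "datatype": "number"},
        ]
    },
}

# Only the datatypes used above are supported in this sketch.
CASTS = {"integer": int, "number": float, "string": str}

# Made-up CSV content standing in for trees.csv.
CSV_TEXT = "id,species,height_m\n1,Oak,12.5\n2,Birch,8\n"


def read_typed(csv_text: str, metadata: dict) -> list:
    """Parse CSV text and cast each cell according to the metadata."""
    columns = metadata["tableSchema"]["columns"]
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        rows.append({
            col["name"]: CASTS[col["datatype"]](raw[col["name"]])
            for col in columns
        })
    return rows


if __name__ == "__main__":
    for row in read_typed(CSV_TEXT, METADATA):
        print(row)
```

Without the metadata, every cell is just a string and a consumer has to guess; with it, the publisher's intent travels with the data.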

Now that the Spatial Data on the Web BP document is complete, it's easy to point to that as an extension of DWBP; ditto SHACL. Again, that cut-out orange peel was an image I first used in Göteborg 2 years previously. So much for the completed work - time to mention the current work which at this time is the Dataset Exchange WG, just getting under way. Government bodies like Göteborgs Stad are very much the target market for the revision of DCAT and the formalisation of vocabulary profiles. I also highlighted the desire at W3C to support the development of vocabularies and that - blue sky thinking - we're always open to other needs.

As I have done so often, I ended with the Cliffs of Moher sequence to talk about the interdependence of different datasets.

RDA WG/IG Chairs Meeting Göteborg

The Dataset Exchange WG & Future Prospects for W3C/RDA Collaboration

This was a very different kind of presentation from the norm. The aim was to try and rekindle the relationship between W3C and the Research Data Alliance that I had tried to set up some years previously. As a historical note, the RDA's first plenary and kick off meeting was held in Göteborg, at Chalmers University, 4 years ago so there was a sense of anniversary celebration in the air throughout the dinner the previous evening and the session I was invited to attend the next day. 4 years ago was when I ran the Open Data on the Web workshop, which included a lightning talk about the RDA, that led to the DWBP and CSV on the Web WGs so it all resonated with me too.

My talk was delivered to a small number of folks from the RDA but they included some of the key people working on how the RDA might evolve in future. They have a lot of working groups, some of which have created some useful outputs, but these are not formal standards in the way that W3C standards are, primarily because the process is less well defined and far from being as rigorous.

I began with a credit to the VRE4EIC project, which was the banner under which Keith Jeffery and I were there, and a reference back to the SDSVoc workshop that spent a lot of time throwing rocks at DCAT. A key feature of DCAT is that it distinguishes between the abstract concept of a dataset and a distribution of it. That distinction is made by other vocabularies, but not all of them. And DCAT is only one of many vocabularies for describing data. My slides listed several of them, as well as pointing to the EC's DCAT Application Profile, showing that having a vocabulary isn't enough - you need to define how it's used in specific circumstances for increased interoperability. All of which leads to the Dataset Exchange Working Group in which Keith and I would really like RDA members to participate.
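
The dataset/distribution distinction is easy to state and easy to lose, so here is a minimal Python sketch of it. The classes are my own illustration rather than an implementation of DCAT: the comments note which DCAT or Dublin Core term each field loosely corresponds to, and the titles and URLs are made up. The point is that one abstract dataset can be made available as several concrete files or endpoints.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete, accessible form of a dataset."""
    download_url: str   # loosely dcat:downloadURL
    media_type: str     # loosely dcat:mediaType


@dataclass
class Dataset:
    """The abstract dataset, independent of any particular file."""
    title: str                                       # loosely dct:title
    distributions: List[Distribution] = field(default_factory=list)  # loosely dcat:distribution


def media_types(dataset: Dataset) -> List[str]:
    """List the distinct media types in which a dataset is available."""
    return sorted({d.media_type for d in dataset.distributions})


# Made-up example: one dataset, two distributions.
trees = Dataset(
    title="Example street trees",
    distributions=[
        Distribution("https://example.org/data/trees.csv", "text/csv"),
        Distribution("https://example.org/data/trees.json", "application/json"),
    ],
)

if __name__ == "__main__":
    print(trees.title, "is available as", media_types(trees))
```

An application profile would then add rules on top of this, for example that every dataset must have a title and at least one distribution.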

Not all the members of the DXWG will be relevant to the RDA community but many will and these were highlighted. Given the context of the meeting with the RDA, I talked at some length about the W3C process, emphasising its openness, with all agendas, minutes and discussions held in public, documents evolving on the wiki and in GitHub, the use of Specref etc. Given all that, could RDA and W3C work together? The Spatial Data on the Web WG outputs are relevant here as they are joint standards with OGC. The RDA folks seemed genuinely interested in the idea of joint RDA/W3C standards and in the Q&A we spent a lot of time talking about how that might work (W3C Fellows, joint projects and more). W3C is keen to do this too but: such collaboration can only work if we agree that data should be shared using Web architecture, Web identifiers, links etc - i.e. Linked Data in some form. If we can agree on that - and there was no dissent in the room on this point - then let's make it work.

This potential collaboration is highlighted in my W3C swan song blog post.

OAI10 - CERN - UNIGE Workshop on Innovations in Scholarly Communication Geneva

The Web Is Not A Glorified USB Stick

This talk was recorded

Close up of Phil Archer giving the talk in Geneva
Attribution Share Alike Some rights reserved by Elena Giglia

This was an emotional talk for me. I'd first joined the W3C team in February 2009 and a month later found myself at CERN for the 20th anniversary of the Web event. I returned to the birthplace of the Web on almost my last day at W3C to deliver a talk in which I was able to recap on many of the themes I'd been expounding for so long.

But I wasn't supposed to be there.

The organiser, Herbert van de Sompel, was very keen to invite a woman to present the Data on the Web Best Practices as he'd been criticised for gender imbalance in his sessions previously. He'd met Caroline Burle in Amsterdam at the SDSVoc workshop a few months previously and first asked her. Caroline wasn't available so she suggested Herbert ask the other female editor of the document, Bernadette Farias Lóscio but, as she was heavily pregnant at the time, that wasn't possible. Herbert also asked two of the three female co-chairs of the working group, Deirdre Lee and Hadley Beeman, but neither of them could make it. Which leads me to an apology:

The slides and, looking at the video, my presentation of them, suggest that the four women mentioned so far were the only ones involved in the work and the only ones who could have given the talk. This is not true. Sincere apologies are due, without hesitation or excuse, to Yaso Córdova and Annette Greiner, both of whom made substantial contributions to the Data on the Web Best Practices WG, Yaso as co-chair and Annette as one of the most diligent and knowledgeable commentators.

So it was yet another overweight, balding, middle-aged white Anglo-Saxon male who gave the talk.

The content of the talk was, sorry Geneva, very much along the lines of many I've done in recent years: the best practices provide an intended outcome and then suggestions for how to implement them (other methods that achieve the same thing are fine), levels of metadata (discovery, assessment and structure), the importance of schema.org, the Dataset Exchange WG, vocabularies etc.

I knew, of course, that this was my last talk for W3C and so by the end I was feeling rather emotional, and I ended with an emotional plea. Looking back, my final 5 minutes were OK, but I wasn't as eloquent as I would have wished so, if you'll forgive a certain amount of l'esprit d'escalier, here's what I was trying to say:

550 years separate Johannes Gutenberg (movable type was first used in 1439) and Tim Berners-Lee (the original proposal is dated March 1989). Gutenberg brought in a revolution in the way information is shared and the Web is the biggest revolution in that regard since then. It was designed at CERN in Geneva to link the results of experiments, the equipment used, the people, the places and, yes, their documents. It was not designed as a straight replacement for the printing press so don't use it as one; use the Web as a Web, make the links between facts, don't just link datasets as a whole. In other words, please, don't use the Web as a glorified USB stick.
Johannes Gensfleisch zur Laden zum Gutenberg, made after his death (public domain) | Tim Berners-Lee with an early demonstration of the Web, circa 1992

phila@w3.org out.