Talks Given in 2017

Data on the Web Cambridge,

Data on the Web

Through Muriel Foulonneau (formerly at the Luxembourg Institute of Science and Technology), I was invited to give a talk at one of Amazon's UK offices in Cambridge, where they primarily work on Alexa (the original company developed a lot of the technology and was then bought by Amazon). I gave a general intro to W3C (using the usual team photo), highlighting the horizontal review aspect. Then on to TimBL's campaign for 'raw data now', from which a thousand flowers will bloom. Well … maybe. Except that people don't use the Web as designed; they use it as a glorified USB stick.

So here's something I'd like the Web to be able to do (perhaps via Alexa). Here are two familiar UK actors. The one on the left is well known: Hugh Bonneville is best known for Downton Abbey but he's done a lot of other things, including W1A, the comedy about the BBC. But who's the chap on the right? I want to be able to say 'Alexa, who's that chap there?' There's a lot about him online. He won a BAFTA for his portrayal of the landlord who was wrongly arrested over the murder of Jo Yeates, and at the time of this talk he was on TV playing Solomon Coop in the drama Taboo. But what's his name? If data is shared inside zip files for download and local processing, you'll never find out that his name is Jason Watkins. Hence: Data on the Web Best Practices - how to share data intelligently - which has a specialised version about spatial data.

I then went on to talk about the concept of content negotiation by application profile (which the then-planned Dataset Exchange WG will work on) and the importance of schema.org for discovery. And maybe schema.org can also help with the problem of fake news (a term strongly associated with Trump). Huge amounts of online content, particularly professionally produced content, is marked up with schema.org and other embedded data. Couldn't machines other than search engines make better use of that? Dan Brickley tells me that he'd love to see schema.org data being used by applications other than search engines. I referred to NIF 2.0 as a possible future standard - there is a huge amount of old documentation to be processed and data to be extracted.
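The negotiation idea can be sketched as follows. This is a minimal illustration only: the `Accept-Profile` header name and the profile URIs are assumptions for the sketch, not the mechanism the Dataset Exchange WG would eventually specify.

```python
# Sketch of content negotiation by application profile: a client asks
# for a dataset description conforming to a particular profile (via a
# hypothetical Accept-Profile header) and the server picks the best
# matching representation. Header name and URIs are illustrative.

REPRESENTATIONS = {
    "http://example.org/profile/dcat": {
        "@type": "dcat:Dataset", "dct:title": "UK actors"},
    "http://example.org/profile/schemaorg": {
        "@type": "Dataset", "name": "UK actors"},
}

DEFAULT_PROFILE = "http://example.org/profile/schemaorg"

def negotiate(accept_profile):
    """Return (profile, body) for the first acceptable profile,
    falling back to a server-chosen default."""
    if accept_profile:
        for token in accept_profile.split(","):
            profile = token.strip().strip("<>")
            if profile in REPRESENTATIONS:
                return profile, REPRESENTATIONS[profile]
    return DEFAULT_PROFILE, REPRESENTATIONS[DEFAULT_PROFILE]

profile, body = negotiate("<http://example.org/profile/dcat>")
print(profile)  # the client's preferred profile wins
```

The point of the design is that one URI can serve several communities: the same dataset description is available as DCAT for catalogue harvesters and as schema.org for search engines, without minting separate URLs.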

As this was a general talk, I thought I'd touch on the security disclosure best practices that W3C is promoting. In short, the best practice is not to go after someone who points out a security flaw in your system. It's part of the response to the very significant criticism we're facing over EME. Then, looking for more possible relevance to Amazon, I mentioned François Daoust's work on Web and TV, second screen etc., Dave Raggett's work on the Web of Things, and the strategy funnel. I ended with the general exhortation that decisions are made by people who turn up - so by all means, turn up (Amazon is not a W3C Member).

In the Q&A, I ended up mentioning the idea of annotating RDF statements with temporal and probabilistic qualifiers - this went down very well with the engineers in the audience as it was right in their field.
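The idea those engineers liked can be sketched as reification-style qualification: each statement carries a validity interval and a confidence, and queries filter on both. Everything below - the class, the field names, the example facts - is illustrative, not an existing W3C vocabulary.

```python
from dataclasses import dataclass

# Sketch of annotating RDF-style statements with temporal and
# probabilistic qualifiers. A plain triple (s, p, o) gains a validity
# interval (years, for simplicity) and a probability.

@dataclass
class QualifiedTriple:
    subject: str
    predicate: str
    obj: str
    valid_from: int
    valid_to: int
    probability: float

facts = [
    QualifiedTriple("ex:JasonWatkins", "ex:playsIn", "ex:Taboo",
                    2017, 2017, 1.0),
    QualifiedTriple("ex:HughBonneville", "ex:playsIn", "ex:DowntonAbbey",
                    2010, 2015, 1.0),
]

def believed_at(facts, year, threshold=0.5):
    """Statements valid in the given year whose probability
    meets the threshold."""
    return [f for f in facts
            if f.valid_from <= year <= f.valid_to
            and f.probability >= threshold]

print([f.subject for f in believed_at(facts, 2017)])
```

In RDF itself this would need reification, named graphs, or (later) RDF-star-style statement annotation; the sketch only shows the query semantics such qualifiers would enable.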

Linked Data implementation - how can and should it work Manchester,

W3C Update

Video of the evening. I followed the BBC's Augustine Kwanashie who spoke about latest developments in their Linked Data Platform.

This was a general update on W3C work in which I tried to cover recent past, present and near-future work.

It begins with the usual team photo (now more than 3 years old) that I use to highlight the breadth of work at W3C (I only cover a bit of it). Then I talked a little about the Data on the Web Best Practices. This led to highlighting the three layers of metadata: discovery, assessment and structure. We have standards in all those areas. schema.org is almost certainly what should be used now to improve discovery of data; for assessment there are a variety of things, including the Prov, Data Quality and Dataset Usage vocabularies. I pointed specifically to ODRL, which should be at Candidate Rec soon. Then the CSV on the Web standards - with the usual lines about the network effect and using the Web as originally designed, not as a glorified USB stick.
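For illustration, all three layers can surface in a single schema.org Dataset description: name and keywords for discovery, licence and creator for assessment, and a distribution pointer whose columns a CSV on the Web file could describe for structure. The values and URLs below are invented.

```python
import json

# A schema.org Dataset description touching the three metadata layers.
# All values and URLs are illustrative.

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    # discovery layer: what the dataset is and how to find it
    "name": "Example transport counts",
    "description": "Hourly counts at example locations.",
    "keywords": ["transport", "example"],
    # assessment layer: can I use it, and who made it?
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Agency"},
    # structure layer: the file itself, which a CSVW
    # metadata document could describe column by column
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/counts.csv",
    },
}

doc = json.dumps(dataset, indent=2)
print(doc)
```

Embedded as JSON-LD in a landing page, this is exactly the markup that makes a dataset findable by general-purpose search engines rather than only specialist portals.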

Time to speed up... so I rattled through the Spatial Data on the Web WG outputs, highlighting BP 1 on using persistent URIs - again back to the network effect. Linked Data Notifications became a Rec the day before the meeting so I mentioned that, although it's outside my area in W3C. Then SHACL - now in CR - and on to the Dataset Exchange WG, which I knew was going to launch before the end of the week. Future work on vocabularies has been a long time coming but we should soon make it easy to publish to w3.org/ns/* from a GitHub repo. And then we get to blue sky thinking. What's next? I mentioned possible work on annotating RDF statements with temporal and probabilistic modifiers. In the Q&A, that got a "please do that" - as I'm now hearing wherever I mention it.

IPTC Spring Meeting London,

Publishing and more at W3C

Not for the first time, I stood in for Ivan Herman to give a presentation to IPTC, this time in London. IPTC was meeting as part of a bigger set of events that week, and the next day was the beginning of the two-day face-to-face meeting of the Permissions & Obligations Expression WG at Canary Wharf.

I used my version of Ivan's slides for the early part of the talk, setting out what the Digital Publishing Activity at W3C was doing, the different groups, and their functions. Slide 12 shows the draft DPub WG charter that we had expected to have been approved by the membership by the time of this talk, but a couple of members have objected and so discussions are ongoing. An important deliverable of the proposed WG is 'Web Publications.' As an example, I used a snapshot of the book on Sogn Fjord that Heather Broomfield kindly gave me on my visit to Difi at Leikanger last year (slide 14). It has two images, text in two languages and different colours, fonts, non-ASCII characters, page numbers, and a bookmark. All of those things would have their own URI on the Web, but we need to be able to refer to the publication as a whole, knowing that it is a collection of resources. Then that publication needs to be available on and offline, which may need the help of a Service Worker (slide 15).

I then gave a quick overview of some other relevant work at W3C, starting with Web Annotations. I wanted to mention Big Data Europe as this has a pilot around NLP, i.e. extracting value from archive material, of which newspapers have a great deal. Then a quick mention of Permissions & Obligations Expression (ODRL) and the Dataset Exchange WG. This was the first time I used the image in slide 19 that, I hope, conveys the idea of an exchange between different parties. Profiles are very important for ODRL. The flower picture is not used here in the way I normally do to talk about opening up data and letting a thousand flowers bloom; rather, this is about colour management - different ways of defining colour beyond hex, HSL and RGB. I included a link to Doug Schepers' work on accessible diagrams, emphasising that this is a way to package data and make it shareable in a meaningful way - something that I hope might be useful for infographics.

The final substantive slide, with the photographer looking at all her equipment, is about the sheer variety and potential of the Web. Lots of power and potential to completely change publishing. There's plenty of room for blue sky thinking - if you turn up and take part.

Geonovum, 10th Anniversary Open Geo Day Amersfoort,

Spatial Data on the Web

I was invited to give a talk about the Spatial Data on the Web WG by one of its key members, Linda van den Brink. She's one of the editors of the Best Practices document so it seems a little odd that I should be the one to present it but that's the way of these things. I based the talk on the presentation given to the OGC Membership by the other principal author, Jeremy Tandy. This meant following his narrative reasonably closely.

I begin with a very brief history of the WG, starting with the March 2014 workshop and the WG charter. Then I quickly show the other docs produced by the WG before getting to the main topic of the talk - the Best Practices doc itself.

Slide 12/13: This is what we expect on the Web. Information on a Web page, perhaps with a map embedded, that we can use to find our way (the map shows the pedestrian route from Geonovum's office to the event venue in Amersfoort).

Slide 14/15: But in the spatial world that's not what we get. Rather, we use specialist portals, such as the INSPIRE portal, and search results are in the form of lots of metadata that may be unfamiliar. This is not useful to non-specialist users. I couldn't even find a URI for that dataset, or the search results.

Slide 16: Asks whether something is truly on the Web if a regular search engine can't find it.

Slide 17: Shows the markup behind the Geonovum address page - designed for human interpretation, not machines. It requires a lot of NLP and inference to extract the actual data - which doesn't scale.

Slide 18: The SDW-BP doc works within the 5 star paradigm - although I only use that with caution. 5 Star data is more valuable than 3 star, but most developers will only thank you for 3 stars.
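As a rough sketch of what the jump from 3 to 5 stars means in practice: rows in a CSV file (3-star, open and non-proprietary) become triples whose subjects, predicates and linked objects are URIs (5-star, linked to other things). The base URIs below are invented.

```python
import csv
import io

# Sketch: lifting 3-star data (a CSV file) to 5-star linked data
# (triples built on persistent URIs). Base URIs are illustrative.

three_star = "id,name,partOf\n1,Amersfoort,Utrecht\n"

BASE = "http://example.org/place/"    # hypothetical persistent URI space
VOCAB = "http://example.org/def/"     # hypothetical vocabulary

def csv_to_triples(text):
    """Turn each CSV row into URI-based triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row["id"]
        triples.append((subject, VOCAB + "name", row["name"]))
        # linking to another URI rather than a bare string
        # is what earns the fifth star
        triples.append((subject, VOCAB + "partOf", BASE + row["partOf"]))
    return triples

for t in csv_to_triples(three_star):
    print(t)
```

The caveat from the slide applies here too: the lifted form is more valuable because it links out, but the original CSV is what most developers will actually want to download.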

Slide 19: SDW-BP offers a route to 5 star LOD - with a spatial rosette.

Then, following Jeremy's narrative, I talked about the introductory material (there's a lot of very good stuff, but it's not a text book), and then the actual BPs themselves. The first 3 are Web fundamentals, then 4-11 are about specific aspects of spatial data. BP 12 talks about APIs (and builds on the DWBP work) and then there are a couple on spatial metadata.

Slide 22: The SDW BP doc ends with an analysis of what isn't included. The talk is quite long by now so I've shortened this as a way to squeeze in a mention of the Dataset Exchange WG and the JWOC that will maintain the SDW BP doc and others from the original WG.

Then the much-used Cliffs of Moher sequence to end.