Phil Archer

phil@philarcher.org

Is schema.org Yet Another Labelling Idea?

[Image: screen grab of the text "schema.org" from that site]

The other day I was asked by Robert Chapin (@miqrogroove) whether I thought schema.org would be another labelling non-starter or a new Google-powered phenomenon.

The search engines' promotion of a relatively simple system for including structured data has enormous potential for success. Getting more real estate when your site appears in a list of search results is a strong incentive to add the markup. That's one reason for the success of the GoodRelations ontology. How the shill and spam specialists will exploit this, and how the search engines will combat those efforts, remains to be seen. As an advocate of the Semantic Web, I cannot but wince at the re-invention and loose interpretation of established vocabularies like FOAF and Dublin Core but, as Mike Bergman says in his piece on schema.org:

Google and the search engine triumvirate understand well … that use and adoption trump elegance and sophistication.

schema.org is a logical and obvious step alongside HTML Microdata. It is the search engines effectively saying "this is the structured data we need if we're going to help you."

So does this have anything to add to what I was saying about labels and online safety the other day?

Not really.

The schema.org CreativeWork vocabulary includes an isFamilyFriendly property that takes a Boolean value. It's simple to understand and will be sufficient for many situations. It also has a contentRating property, for which it says: "Official rating of a piece of content - for example, 'MPAA PG-13'."
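To make those two properties concrete, here is a minimal sketch in Python that renders them as HTML Microdata. The helper name `creative_work_microdata` is my own invention, and the `True` passed for isFamilyFriendly is purely illustrative, not a claim about the film.

```python
from html import escape


def creative_work_microdata(name, is_family_friendly, content_rating):
    """Render a minimal schema.org CreativeWork as an HTML Microdata snippet.

    Hypothetical helper for illustration; it covers only the properties
    discussed above plus a name.
    """
    # Microdata carries the Boolean as the text "true" or "false".
    family = "true" if is_family_friendly else "false"
    return "\n".join([
        '<div itemscope itemtype="http://schema.org/CreativeWork">',
        f'  <span itemprop="name">{escape(name)}</span>',
        f'  <meta itemprop="isFamilyFriendly" content="{family}">',
        f'  <span itemprop="contentRating">{escape(content_rating)}</span>',
        '</div>',
    ])


# Illustrative values only: the Boolean here is an assumption.
snippet = creative_work_microdata("Avatar", True, "MPAA PG-13")
print(snippet)
```

Note that contentRating is just a string: nothing in the markup says which version of which film the rating belongs to.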

If we continue with schema.org's example of Avatar (and if we can work our way through all the Flash) we can visit the MPAA's Film Ratings site and find this film's rating.

[Image: screenshot of filmratings.com results for Avatar, showing 3 separate results]

Oh, hang on, there are 3 results.

There are two versions of the James Cameron film you probably meant and a 2005 film called Cyber Wars that has an alternative title of, yes, Avatar. As it happens, all 3 have a PG-13 rating, although the unrelated film earns its PG-13 for different reasons. One motivation the studios have for producing alternative versions of films is to secure different ratings: in-flight versions are often edited to get a lower classification, for example. So film classification organisations don't classify a film; they classify a specific version of a film (or TV programme, or DVD, or whatever).
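A rough sketch of what that means for the data: a rating hangs off a version, not a title. The class name, the catalogue and the version labels below are all hypothetical; only the three results and their PG-13 ratings come from the search described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedVersion:
    title: str    # the title a user would search for
    version: str  # hypothetical label distinguishing the versions
    rating: str   # classification given to this specific version


# Illustrative catalogue mirroring the three search results.
CATALOGUE = [
    RatedVersion("Avatar", "James Cameron film, version A", "PG-13"),
    RatedVersion("Avatar", "James Cameron film, version B", "PG-13"),
    RatedVersion("Avatar", "alternative title of Cyber Wars (2005)", "PG-13"),
]


def ratings_for(title):
    """Return every rated version whose title matches, ignoring case."""
    wanted = title.lower()
    return [v for v in CATALOGUE if v.title.lower() == wanted]


matches = ratings_for("avatar")
for match in matches:
    print(f"{match.title} ({match.version}): {match.rating}")
```

A lookup by title alone returns several records, which is why a single rating string attached to a title is ambiguous in general, even if the ratings happen to agree here.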

But the schema.org CreativeWork vocabulary isn't designed to describe a film fully. Going to the specialised Movie vocabulary adds in just 7 properties:

This is a very simple list that, self-evidently, excludes a whole raft of other roles associated with film production (costume designer and screenplay writer to name just two). Given that, it strikes me as odd that they've included musicBy and productionCompany. As a consumer, who cares that Warner Bros is the company behind the Harry Potter films, for example?

schema.org is there to disambiguate terms in text, primarily for the benefit of search engines. It's not designed as a means of publishing data in the way that RDF and its Semantic Web cousins are. Different job, different technology.

In short…

If you're publishing a bit of text about something, HTML Microdata and schema.org give you a way to add a bit of search-engine-friendly, machine-readable structure to what you've written. Good news if you're in the SEO business.

If you're publishing data, whether it be film classifications or anything else, then RDF is the format to use. If you publish text that is closely associated with RDF data, then use HTML + RDFa, not HTML Microdata, to include structure in your markup.
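For contrast, here is a minimal sketch of the same kind of information published as data, written out as N-Triples from plain Python. The Dublin Core `title` term is real; the `example.org` URIs for the film version and the rating property are placeholders I have made up, not an established vocabulary.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
EX_RATING = "http://example.org/vocab/rating"          # hypothetical property
VERSION = "http://example.org/film/avatar/version-a"   # hypothetical resource


def ntriple(subject, predicate, value):
    """Format one N-Triples statement whose object is a plain string literal."""
    # Escape backslashes and double quotes inside the literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


triples = [
    ntriple(VERSION, DCTERMS_TITLE, "Avatar"),
    ntriple(VERSION, EX_RATING, "PG-13"),
]
print("\n".join(triples))
```

The point of the sketch is that the specific version gets its own URI, so the rating attaches to that version rather than to a string of text on a page.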