This article has been translated into Serbo-Croatian by Vera Djuraskovic from Webhostinggeeks.com.
Every Web developer is excited about HTML5 — and rightly so. It adds new features to the primary language of the Web and is designed with experience and practicality in mind.
However, it's not yet complete. As of today, browser manufacturers are all working hard on implementing it, but we're a long way from being able to assume full support for HTML5 across the board. The specification documents are in Last Call, meaning that the working group believes the document to be complete, subject to receiving and dealing with comments submitted by the community. After that there are further critical stages to go before the press release goes out saying "HTML5 is a W3C Recommendation." (Don't hold me to it, but I'd say 2013 at the earliest.)
What is stable and fully implemented is XHTML (I hope you'll allow me to leave aside the peculiarities of Internet Explorer or we'll be here all night). So what should you use — the not quite stable but very exciting HTML5 or the older, stable XHTML?
You can pretty much do both at the same time.
I'm about to do it on my own site. Time me. It's .
In other words, it took me less than 3 minutes to change my site from being exclusively written in HTML5 to being written in both HTML5 and XML: what's called a polyglot document.
Now, OK, I may be being a bit unfair on the timing. I knew what I was about to do and everything was ready before I started. You may also have noticed that I said HTML5 and XML, not HTML5 and XHTML. But let's work through it.
As I noted on , I made a few changes to the markup on this site to change it from XHTML 1.0 strict to HTML5. Now that's an easy transition to make since I was already working in the stricter markup language and I have long been used to validating my pages. So every element was properly closed, ampersands were encoded, element names were written in lower case and so on. You don't have to do this in HTML5 but you can and I do, as much out of habit as anything. The advice is that you should continue as you begin. In other words, for me now to stop closing tags and quoting attribute values, or to start using anything other than lower case element names, would be bad practice — so I haven't stopped doing any of the things I always did in XHTML 1.0 strict.
Because of that, all I had to do to go from XHTML 1.0 Strict to HTML5 back in April was to change the page template so that the top lines went from:
<?xml version="1.0" encoding="windows-1252"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en-GB">
<head>
<meta http-equiv="content-language" content="en-GB" />
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />

to:

<!doctype html>
<html lang="en-GB">
<head>
<meta charset="windows-1252" />
And that was it. I then played around with some of the nice new HTML elements like article but, just to emphasise the point, continued to make sure that elements were closed so that, for example, every <p> is matched by a </p>.
So what I wanted to do today was to make a few changes so that my Web pages could be parsed as either HTML5 or XML (remember the point of XHTML: it's HTML encoded in XML). My reference for all this is Polyglot Markup: HTML-Compatible XHTML Documents. As W3C documents go it's remarkably short. The first line of the abstract tells you what it's about:
A document that uses polyglot markup is a document that is a stream of bytes that parses into identical document trees (with the exception of the xmlns attribute on the root element) when processed as HTML and when processed as XML.
Incidentally, notice that the aim is to please both an HTML and an XML parser, not an XHTML parser. Polyglot documents are not valid XHTML.
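The definition quoted above can be spot-checked with a rough Python sketch. It assumes a small made-up sample document (the POLYGLOT string below is illustrative, not this site's markup) and uses only the standard library. Python's html.parser is a tokenizer rather than a full HTML5 tree builder, so this compares only the order of start tags each parser reports, not complete document trees.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample document, written to follow the polyglot rules.
POLYGLOT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-GB" xml:lang="en-GB">
<head>
<title>Polyglot test</title>
</head>
<body>
<p>Fish &amp; chips<br /></p>
</body>
</html>"""


class TagCollector(HTMLParser):
    """Records element names in the order their start tags appear."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br />.
        self.tags.append(tag)


def tags_as_html(source):
    """Element names seen when the text is tokenized as HTML."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags


def tags_as_xml(source):
    """Element names seen when the text is parsed as XML (namespace stripped)."""
    root = ET.fromstring(source)
    return [element.tag.split("}", 1)[-1] for element in root.iter()]


if __name__ == "__main__":
    print(tags_as_html(POLYGLOT) == tags_as_xml(POLYGLOT))
```

For a real comparison of document trees you would want a proper HTML5 parser on the HTML side; this only shows that the same bytes are acceptable to both kinds of parser.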
All of which sounds terribly complicated, not to say arcane, but let me cut to the chase. The first steps I took today were:
So the top few lines now look like this:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-GB" xml:lang="en-GB">
<head>
Notice that I've included the
These are optional in HTML5 but required for polyglot documents.
There were a couple of other things to take care of though.
Notice that I don't use the preferred character encoding of UTF-8. This is simply because I use a Windows PC and am used to using an HTML editor that doesn't support UTF-8. I could use a different editor of course but, well, I'm comfortable with the one I've used for years (CuteHTML). Looking at the relevant WHATWG FAQ entry, I notice that for polyglot documents, UTF-8 is the only character encoding that can be declared using the <meta charset="…" /> element. That's because XML character encoding is declared in the XML declaration (<?xml version="1.0" encoding="UTF-8"?>). There is no meta element through which you can declare the charset for XML. To get round this I've finally done what I should have done ages ago and set the character encoding at server level using a one-line .htaccess file that simply says:
I was able to use my HTTP Header viewer to confirm that this worked as expected. Doing this, however, does produce another warning in the W3C validator, which recommends that you include a document-level character encoding. Well, I have a reason not to and I'm sticking with it. Let's move on!
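A header check of this kind can be approximated offline. The sketch below is only an illustration: it assumes you already have the value of the Content-Type response header as a string (the sample values are made up, not captured from this site) and uses Python's standard email.message module to pull out the charset parameter. The function name is hypothetical.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


if __name__ == "__main__":
    # Illustrative header values only.
    print(charset_from_content_type("text/html; charset=windows-1252"))
    print(charset_from_content_type("text/html"))
```

If the second form comes back (no charset at all), the server is not declaring an encoding and the browser has to guess.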
Scripts and style definitions can be included within polyglot documents, but there are restrictions on the characters you can use and it's easy to forget those little details, so the advice is clear: define all your styles and scripts in external files.
On this very simple Web site I don't use any document.writeln() calls, which is just as well since document.writeln() is not valid in XML.
I do, however, include the Google Analytics code and that had been written within a script element. Not any more: it's now in an external file. This, incidentally, is good practice anyway, especially for mobile. The script is included in every page and so it's better to make it a separate file that can be cached rather than shipping the code with every page. There is no noscript element in polyglot documents, by the way.
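The character restrictions on inline scripts are easy to demonstrate. Here is a minimal Python sketch, assuming two made-up pages (the script body and the file name site.js are hypothetical): one embeds a script containing < and &&, the other points at an external file. An XML parser rejects the first and accepts the second.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(source):
    """True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# Inline script: the bare "<" and "&&" are fine in HTML but not in XML.
INLINE = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>
<script>if (a < b && c) { go(); }</script>
</head><body></body></html>"""

# Same page with the script moved to a hypothetical external file.
EXTERNAL = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>
<script src="site.js"></script>
</head><body></body></html>"""

if __name__ == "__main__":
    print(is_well_formed_xml(INLINE), is_well_formed_xml(EXTERNAL))
```

Moving the code out of the page sidesteps the problem entirely, which is why the external-file advice is the simplest one to follow.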
Finally, I had two style definitions specific to the home page that were embedded at document level. That approach, document-level definitions, seems right but, well, for the sake of copying the content into a little text file and replacing it with a link element, it hardly seems worth arguing with. What I didn't do, though, was copy the styles into the primary stylesheet for the site, since that would mean shipping those few bytes with the stylesheet even when they weren't required. Again, for mobile, every byte matters.
I've covered what I had to do to this site to make its documents polyglot. As you can see, it wasn't much. But that's because this is a very simple site, hand coded with a bit of PHP templating. I don't have any need to use tables anywhere but if I did I'd have to make sure that all tr elements were wrapped in one of thead, tbody or tfoot. Likewise any col elements would need to be wrapped in a colgroup.
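The table rule exists because an HTML parser quietly inserts a tbody around bare rows while an XML parser does not, so the two trees would differ. A rough way to look for offending markup, sketched in Python under the assumption that the page is already well-formed XML in the XHTML namespace (the function name and sample tables are made up for illustration):

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def count_bare_rows(source):
    """Count tr elements that are direct children of a table element."""
    root = ET.fromstring(source)
    total = 0
    for table in root.iter(XHTML + "table"):
        # findall with a plain tag name matches direct children only.
        total += len(table.findall(XHTML + "tr"))
    return total


WRAPPED = ('<table xmlns="http://www.w3.org/1999/xhtml">'
           "<tbody><tr><td>1</td></tr></tbody></table>")
BARE = ('<table xmlns="http://www.w3.org/1999/xhtml">'
        "<tr><td>1</td></tr></table>")

if __name__ == "__main__":
    print(count_bare_rows(WRAPPED), count_bare_rows(BARE))
```

Any non-zero count points at rows that need an explicit thead, tbody or tfoot around them.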
This very simple site does not include any SVG or MathML but if it did, I'd have to follow a few extra rules on those. Chapter and verse can be seen in the Polyglot Markup standard.
Converting an XHTML document into a polyglot document is easy. By following a few relatively simple rules — some of which actively encourage good practice — your markup can be parsed as either XML or HTML5. Add in an HTML5 shiv (I use the one created by Remy Sharp) and you're good to go with a document that is very likely to work as you'd expect in just about any browser.