What is XHTML?

written in front-end

In a recent conversation, I was asked if I knew what XHTML was. Much to my dismay, I did not. I mean, I could make some guesses (and did): the HTML part was pretty clear, and scrabbling together context clues I correctly inferred that the ‘X’ stood for extensible. But as to what XHTML was and how it differed from HTML or XML, I didn’t know.

So, on that note, today we’re going to learn about XHTML.

The Basics: XML

In short, XHTML is HTML written in XML. At this point, I’m assuming you’re familiar with HTML, which is often the first thing we ever learn in web development - but that might not be the case with XML, or ‘Extensible Markup Language’. XML is a markup language designed for storing and transporting data - it’s similar in appearance to HTML, but whereas the tags in HTML provide information about how documents should be displayed in the browser, the tags in XML allow you to describe the nature of the data being conveyed. HTML’s tags are pre-defined, while in XML you can create your own tags. Here’s some sample XML:

<?xml version="1.0" encoding="UTF-8"?>
<food>
  <name>Cheddar</name>
  <type>Cheese</type>
  <flavour>Sharp, pungent and slightly earthy.</flavour>
</food>

As you can see, the XML tags provide information about the nature of the data - the <food> tag is descriptive (in this case serving to name an object), as are the properties we have given it through the <name>, <type>, and <flavour> tags. This XML code does not do anything in and of itself. Another piece of software is needed to do something with it. XML is extensible because new data can be added and removed, and programs will typically still function. For example, we could add a <nationality> tag to the example above and it would still work.
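To make the "another piece of software is needed" point concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The describe function and the value given to the extra <nationality> element are assumptions made up for illustration; the point is only that a program reading <name> and <type> keeps working when an unrelated element is added.

```python
import xml.etree.ElementTree as ET

# The sample document from above (XML declaration omitted for brevity).
CHEDDAR = """<food>
  <name>Cheddar</name>
  <type>Cheese</type>
  <flavour>Sharp, pungent and slightly earthy.</flavour>
</food>"""

# The same document with an extra, hypothetical <nationality> element added.
CHEDDAR_EXTENDED = CHEDDAR.replace(
    "</food>", "  <nationality>English</nationality>\n</food>"
)


def describe(xml_text):
    """Return a one-line summary using only the <name> and <type> elements."""
    food = ET.fromstring(xml_text)
    # findtext returns the text of the first matching child element.
    return f"{food.findtext('name')} is a kind of {food.findtext('type')}"


# Both calls produce the same summary: the extra element is simply ignored.
print(describe(CHEDDAR))
print(describe(CHEDDAR_EXTENDED))
```

Because describe only asks for the elements it cares about, the new element does not disturb it, which is roughly what "extensible" means in practice.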

XML vs HTML Syntax

Several characteristics of XML differ from HTML and become important when we look at XHTML. Firstly, XML is case sensitive, while HTML tag and attribute names are case insensitive. Secondly, all XML tags must be closed. In HTML, code will sometimes still function without a closing tag, like so:

<body>
  <h1>Hello world!</h1>
  <p>This is a simple piece of code
</body>

In XML, an unclosed tag is illegal and will produce an error. Similarly, XML tags must be properly nested. That is to say, tags must be opened and closed in order, and cannot overlap. This is not the case in HTML, where the following may still function as intended:

<div>
  <p>Here is some sample text.
</div></p>

So, generally speaking, the rules of syntax for XML are stricter than they are for HTML, and certain suboptimal practices that will escape unpunished in HTML will produce errors in XML. This strictness played a role in the rationale for developing XHTML.
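The difference in strictness can be sketched with two parsers from Python's standard library: xml.etree.ElementTree, which rejects anything that is not well-formed XML, and html.parser.HTMLParser, a lenient HTML tokenizer. The sample strings and the helper names (is_well_formed, TagCollector, html_start_tags) are assumptions made up for this illustration, and HTMLParser only stands in for a browser's forgiving behaviour: it does not reproduce how a real browser builds the page.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative snippets, one per rule discussed above.
SAMPLES = {
    "well-formed": "<body><h1>Hello world!</h1><p>Some text</p></body>",
    "unclosed tag": "<body><h1>Hello world!</h1><p>Some text</body>",
    "overlapping tags": "<div><p>Here is some sample text.</div></p>",
    "case mismatch": "<body><P>Some text</p></body>",
}


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer that simply records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        self.start_tags.append(tag)


def html_start_tags(markup):
    """Feed markup to the lenient HTML tokenizer and return its start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


for label, markup in SAMPLES.items():
    print(f"{label}: XML well-formed = {is_well_formed(markup)}, "
          f"HTML start tags = {html_start_tags(markup)}")
```

Only the first sample passes the XML check, while the HTML tokenizer reads all four without complaint.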

The History of XHTML

In 1998, the World Wide Web Consortium (W3C), the main international organisation responsible for setting standards for the World Wide Web, published a document entitled ‘Reformulating HTML in XML’. The document outlined a rationale for reformulating HTML4 as an XML application. Firstly, as new methods of browsing the Internet were being introduced, the shift to XML was intended to increase user agent interoperability, with the ultimate goal being that XHTML could be used by a broader range of agents than HTML. Secondly, the extensibility of XML would allow for the easier introduction of new elements and element attributes through XHTML modules, meaning that it would be relatively easy to introduce ways of expressing new ideas through markup.

XHTML was also introduced in an attempt to promote ‘well-formed’ code. Since, as described above, browsers are generally lenient regarding certain syntax errors in HTML, websites often contained numerous errors. The W3C viewed this as problematic - and by introducing HTML written in XML, whose syntactic rules were stricter, web developers would be forced to abide by these rules or have their website produce an error message in the browser.

It would appear that the reaction to XHTML was less than enthusiastic - in part because so many web pages contained errors that the prospect of rewriting in XHTML (which provided little in the way of additional features over HTML4) was not very appealing. In fact, loopholes in the XHTML 1.0 specification meant that text that looked like XHTML could still be served as the text/html MIME type (rather than application/xhtml+xml) and so avoid the stricter error handling.

Meanwhile, in the early 2000s, a competing vision for an evolved version of the HTML4 standard began to emerge amongst actors like Mozilla and Opera, and was articulated by the Web Hypertext Application Technology Working Group (WHATWG). This vision aimed to develop specifications based on HTML that would expand HTML’s functionality, ease the deployment of interoperable web applications, and (and this is key) not break backwards compatibility.

In 2009, the W3C announced that it would not renew the charter of the XHTML 2 Working Group, and in 2014 the competing vision was formally released as a stable W3C recommendation: HTML5. Its widespread adoption has hindered any further progress in the adoption of XHTML.

Conclusion

With the rise of HTML5, it would seem that XHTML will recede into the annals of internet history as a mere footnote. It is, however, fun to note that to this day it is still easy to find developers defending its use (as well as other developers decrying it). This Quora post, asking ‘Why do developers still use XHTML?’ is a fascinating read - though most of the argument in favour of XHTML is that it forces you to code with discipline.

There you have it - a quick trip down the Internet’s memory lane, sparked by a simple reference in a conversation.