XML is a semantic markup language: it identifies information ‘types’ and, through an XML schema, the relationships between them.
HTML is primarily a presentation language: it tells your browser how large text should be, which colour and font to use, and so on.
For example, a company’s contact webpage could list a direct phone line, a facsimile (fax) number and an ICQ number. In HTML, this information might be formatted as follows:
<p><strong>Phone: +64 4 3 800 800</strong><br />
Fax: +64 4 3 800 800<br />
ICQ: 700516</p>
This would be displayed as:
Phone: +64 4 3 800 800
Fax: +64 4 3 800 800
ICQ: 700516
Code that describes the meaning of the data is referred to as semantic markup.
The bold formatting of the phone number prioritises it above the fax and ICQ numbers, which may help a person reading your webpage find the right number. A search engine, however, cannot rely on formatting cues: contact details may be displayed in a range of colours and type treatments determined by brand guidelines rather than by adherence to a universal standard for presenting information.
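To see the problem from a program’s point of view, this minimal sketch feeds the HTML fragment to Python’s standard html.parser module. The only markup it finds is p, strong and br; to pick out the fax number it has to match the human-readable “Fax:” label. That label-matching rule is an assumption made purely for illustration, and it would fail on a page that wrote “Facsimile” instead.

```python
from html.parser import HTMLParser

# The HTML fragment from the contact-page example.
HTML = """<p><strong>Phone: +64 4 3 800 800</strong><br />
Fax: +64 4 3 800 800<br />
ICQ: 700516</p>"""


class TagAndTextCollector(HTMLParser):
    """Records which tags occur and the text chunks between them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> arrive here.
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


parser = TagAndTextCollector()
parser.feed(HTML)
parser.close()

print(parser.tags)   # only presentation/structure tags: p, strong, br
print(parser.texts)

# Nothing in the markup says which chunk is the fax number, so the program
# falls back on the "Fax:" label text (an illustrative assumption).
fax = next(t.split(":", 1)[1].strip() for t in parser.texts if t.startswith("Fax:"))
print(fax)
```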
With XML, custom tags can be created that make the ‘meaning’ of each number clear at the code level:
<contact>
  <phone>+64 4 3 800 800</phone>
  <fax>+64 4 3 800 800</fax>
  <icq>700516</icq>
</contact>
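A program can then ask for each value by what it is. The following sketch, assuming Python’s built-in xml.etree.ElementTree module, parses the custom tags and reads each number by its tag name. A well-formed XML document needs a single root element, so the tags are wrapped in a <contact> element; that name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The custom tags from the example, wrapped in a single root element.
# <contact> is a hypothetical name chosen for illustration.
XML = """<contact>
  <phone>+64 4 3 800 800</phone>
  <fax>+64 4 3 800 800</fax>
  <icq>700516</icq>
</contact>"""

root = ET.fromstring(XML)

# Each value is retrieved by its tag, not by how it looks on the page
# or by whatever label text happens to precede it.
phone = root.findtext("phone")
fax = root.findtext("fax")
icq = root.findtext("icq")
print(f"phone={phone} fax={fax} icq={icq}")
```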
For other software to interpret custom tags consistently, an XML vocabulary is usually accompanied by a ‘how-to’ guide, or schema.
An XML schema establishes the relationships between XML tags, just as grammar rules establish the relationships between parts of speech.
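Real schemas are usually written in a schema language such as W3C XML Schema (XSD) and checked by a validating parser (in Python, for example, lxml’s etree.XMLSchema class). The sketch below is only a hand-rolled stand-in that mimics the idea: a small rule table, invented for illustration, stating which elements a hypothetical <contact> may contain and how often, plus a function that reports where a document breaks those rules.

```python
import xml.etree.ElementTree as ET

# A hand-rolled stand-in for a schema, NOT real XML Schema (XSD).
# The permitted elements and occurrence limits are illustrative assumptions.
CONTACT_RULES = {
    "phone": (1, 1),  # (minimum, maximum) occurrences
    "fax": (0, 1),
    "icq": (0, 1),
}


def check_contact(element):
    """Return a list of rule violations for a <contact> element."""
    problems = []
    if element.tag != "contact":
        problems.append(f"root must be <contact>, found <{element.tag}>")
    counts = {}
    for child in element:
        if child.tag not in CONTACT_RULES:
            problems.append(f"unexpected element <{child.tag}>")
        counts[child.tag] = counts.get(child.tag, 0) + 1
    for tag, (low, high) in CONTACT_RULES.items():
        n = counts.get(tag, 0)
        if not low <= n <= high:
            problems.append(f"<{tag}> occurs {n} time(s), expected {low}..{high}")
    return problems


good = ET.fromstring(
    "<contact><phone>+64 4 3 800 800</phone><fax>+64 4 3 800 800</fax></contact>"
)
bad = ET.fromstring("<contact><fax>+64 4 3 800 800</fax><email>x</email></contact>")
print(check_contact(good))
print(check_contact(bad))
```

Like grammar rules, the table says nothing about what a phone number means; it only constrains how the tags may be combined.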
Adding a semantic layer to the web could make finding information more precise. With XML, a question put to a search engine, such as “What is XXX’s fax number?”, could be answered with the number itself rather than with a link to a webpage.
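A minimal sketch of that kind of direct answer, assuming a hypothetical directory in which each company’s contact details use the tags from the example above. The directory element, the company names and the second phone number are invented; the point is that the lookup returns the number itself, or None when no fax is recorded, rather than a page to read.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory of contact records; names and numbers are illustrative.
DIRECTORY = """<directory>
  <company name="Example Ltd">
    <phone>+64 4 3 800 800</phone>
    <fax>+64 4 3 800 800</fax>
  </company>
  <company name="Sample Co">
    <phone>+64 9 000 0000</phone>
  </company>
</directory>"""


def fax_number(directory_xml, company):
    """Answer "what is <company>'s fax number?" with the value, or None."""
    root = ET.fromstring(directory_xml)
    for record in root.iter("company"):
        if record.get("name") == company:
            # findtext returns None when the record has no <fax> element.
            return record.findtext("fax")
    return None  # company not in the directory


print(fax_number(DIRECTORY, "Example Ltd"))
print(fax_number(DIRECTORY, "Sample Co"))
```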