A Statistical Review of 1 Billion Web Pages
chrisd writes "As part of a recent examination of the most popular HTML authoring techniques, my colleague Ian Hickson parsed a billion web pages from the Google repository to find the most popular class names, elements, attributes, and related metadata. We decided that publishing this would be of significant utility to developers. It's also a fascinating look into how people create web pages. For instance, one thing that surprised me was that <title> is more popular than <br>. The graphs in the report require a browser with SVG and CSS support (like Firefox 1.5!). Enjoy!"
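For readers curious what this kind of tally looks like in practice, here is a minimal sketch using Python's standard-library `html.parser`. It is not the method used for the report, whose tooling the summary does not describe. The names `MarkupCounter` and `count_markup` are made up for illustration, and the sketch counts every occurrence of an element, which may differ from how the report counts.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupCounter(HTMLParser):
    """Hypothetical helper: tallies elements, attributes, and class names."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.attributes = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        self.elements[tag] += 1
        for name, value in attrs:
            self.attributes[name] += 1
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def count_markup(pages):
    """Feed each page (a string of HTML) through one shared counter."""
    counter = MarkupCounter()
    for page in pages:
        counter.feed(page)
        counter.close()
        counter.reset()  # clears parser state, keeps the tallies
    return counter


pages = [
    '<html><head><title>A</title></head>'
    '<body class="main wide"><p class="main">x<br>y</p></body></html>',
    '<title>B</title><p>z</p>',
]
result = count_markup(pages)
print(result.elements.most_common(3))
print(result.classes.most_common(3))
```

On a real crawl the pages would be streamed rather than held in a list, and malformed markup would need more care than this sketch takes.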
"In general, the tone of the article seems to be that many people should not be allowed on the web because they can't follow standards"
/ /www.google.co.uk/
And that would include, I think, the article's authors. DOCTYPE HTML is not a standard DTD, they haven't specified any character encoding, and whilst they've cottoned on to the HTML entity for an opening angle bracket, they haven't yet cottoned on to the entity for a closing one. I don't think this last is invalid, just stupid.
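To illustrate the entity point: the opening bracket is written as `&lt;` and the closing one as `&gt;`. A minimal sketch with Python's standard `html.escape`, which escapes both (the sample string is made up for illustration):

```python
from html import escape

# quote=False leaves quotation marks alone and escapes only &, < and >.
sample = "<title> vs <br>"
escaped = escape(sample, quote=False)
print(escaped)  # &lt;title&gt; vs &lt;br&gt;
```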
Also, check out the source for google.com for some awful, non-standard HTML:
http://validator.w3.org/check?verbose=1&uri=http:
This is the sort of markup muck that they generally spit out. So with respect to "I wonder how much HTML coding the authors do", I'd say slightly more than the Google web devs.