A Statistical Review of 1 Billion Web Pages
chrisd writes "As part of a recent examination of the most popular HTML authoring techniques, my colleague Ian Hickson parsed through a billion web pages from the Google repository to find out what are the most popular class names, elements, attributes, and related metadata. We decided that publishing this would be of significant utility to developers. It's also a fascinating look into how people create web pages. For instance, one thing that surprised me was that the <title> is more popular than <br>. The graphs in the report require a browser with SVG and CSS support (like Firefox 1.5!). Enjoy!"
With CSS you really do not need to use br; maybe that is the reason for the tag's low usage numbers?
This is my sig. There are thousands more, but this one is mine.
Proof that most people (and WYSIWYGs) don't know how to produce valid and accessible markup. The img alt attribute (an accessibility requirement) was found significantly less often than width, height, and border.
I'm working on a site now where the project owner is continually reducing the usability and accessibility of the entire site (never mind that he secretly had a third party come up with an ugly design and ambushed the dev team with it).
I keep telling everyone to deconstruct the adage "form follows function". It means function comes first. He doesn't care what anything *is* or how it *works*, only what it looks like. And, of course, that it's ugly.
If you can have a larger sample, why not use it? It's more accurate that way.
It looks like a subtle push against IE: many mentions of the HTML 5 spec (which is being written by WHATWG, a working group that includes many browser companies but not MS); use of SVG; written by a major FF developer.
Way to go Google! Pour on the pressure!
Whilst <title> may appear on more distinct pages, surely <br> is used more frequently in the aggregate; that is, the multiplicity of occurrences of <br> on many pages far exceeds the single(?) occurrence of <title> on most pages.
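The distinction above (how many pages contain an element versus how many times it occurs in total) can be sketched in a few lines. This is only an illustration using Python's standard html.parser, not the method used in the report; the TagCounter class, the count_tags helper, and the three sample pages are all made up for the example.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every start tag seen in a single document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        self.tags[tag] += 1


def count_tags(pages):
    """Return (pages containing each tag, total occurrences of each tag)."""
    pages_with_tag = Counter()
    total_occurrences = Counter()
    for html in pages:
        parser = TagCounter()
        parser.feed(html)
        parser.close()
        total_occurrences.update(parser.tags)      # adds the per-page counts
        pages_with_tag.update(parser.tags.keys())  # adds 1 per page per tag
    return pages_with_tag, total_occurrences


# Hypothetical sample pages, purely for illustration.
SAMPLE_PAGES = [
    "<html><head><title>One</title></head><body>a<br>b<br>c<br>d</body></html>",
    "<html><head><title>Two</title></head><body><p>no breaks</p></body></html>",
    "<html><head><title>Three</title></head><body>x<br>y</body></html>",
]

pages_with_tag, total_occurrences = count_tags(SAMPLE_PAGES)
for tag in ("title", "br"):
    print(tag, "pages:", pages_with_tag[tag], "total:", total_occurrences[tag])
```

On this toy input title wins on distinct pages while br wins on total occurrences, which is exactly the situation the comment describes.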
I would be interested in seeing how many web pages use Java applets, Flash, Shockwave, Quicktime, ActiveX controls etc, etc. Sadly the authors did not include this information.
sheep.horse - does not contain information on sheep or horses.
It's explicitly mentioned on the very first page ("Note: You will need a browser with SVG and CSS support to view the result graphs correctly. We recommend Firefox 1.5.").
Bogtha Bogtha Bogtha
It's even dumber to state that someone is presenting pictures with Flash when they're actually using SVG.
Gecko fascism indeed, I mean what a bunch of bastards, using completely valid SVG files, oooh the nerve of them blokes...
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
>>> "Can anyone tell me what's here that can't be visualized with GIF's?"
I don't think that's the point ... it's about the creation of the images, not their visualisation. These images can be created on the fly from varying data with only textual manipulation of the code - the processing will be extremely light as will the data load on the servers. Presumably the xml-to-image parsing in the browser incurs a processing penalty though.
If you view the code of one of the graphs http://code.google.com/webstats/2005-12/charts/unique-classes-per-page.svg you'll see that it is less than 10k. It also has a theoretically infinite resolution, which might be useful if the graphs are to be used for a presentation (like printing them on the moon using lasers!!?).
Use of FF isn't too surprising as the section code.google.com is for promotion of OSS.
It looks to be an internal project that we have just happened to be given access to ... assuming the officers of Google that need access have FF1.5 then the web devs have probably met their brief?!
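To make the "created on the fly with only textual manipulation" point concrete, here is a rough sketch of building an SVG bar chart as a plain string. It is not how the report's graphs were generated (that isn't stated anywhere); the bar_chart_svg function, its parameters, and the sample numbers are invented for illustration.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, bar_height=20, gap=5, scale=2, label_width=80):
    """Return an SVG document (as a string) with one horizontal bar per item."""
    rows = []
    for i, (label, value) in enumerate(data):
        y = i * (bar_height + gap)
        # Label on the left, escaped so that '<' or '&' cannot break the XML.
        rows.append(
            f'<text x="0" y="{y + bar_height - 5}" font-size="12">'
            f"{escape(label)}</text>"
        )
        # Bar whose width is proportional to the value.
        rows.append(
            f'<rect x="{label_width}" y="{y}" width="{value * scale}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
    height = len(data) * (bar_height + gap)
    width = label_width + max((v for _, v in data), default=0) * scale
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">\n'
        + "\n".join(rows)
        + "\n</svg>\n"
    )


# Made-up numbers, purely for illustration (not figures from the report).
SAMPLE = [("title", 98), ("br", 61), ("img", 83)]
svg = bar_chart_svg(SAMPLE)
print(len(svg.encode("utf-8")), "bytes")
```

The output is ordinary text, so changing the data means re-running a string template rather than re-rendering a bitmap, and a small chart like this stays far below the 10k mentioned above.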
I don't know. I rather prefer the straightforwardness of the "This is a title. You know how to format it." approach.
With FONT tags, you need to specify the font and color on a single passage of text. Then on another. And then another. And then another. And for good measure, just another. And by the way, one more. And that one too. And that one there, even when you just described that other one back there to have the exact same font and color. Oh, and that one too. And almost forgot that one there.
After Netscape & IE 4 died, CSS just works.
1 billion pages! Talk about a violation of privacy! The justice department is only asking for a random sample of 1 million addresses and the search results for any 1 week period. This guy gets access to 1 billion pages via the google repository (whatever that is), conducts detailed analysis of the contents of those pages, and nary a word of dissent from the vast Slashdot audience.