Slashdot Mirror


A Statistical Review of 1 Billion Web Pages

chrisd writes "As part of a recent examination of the most popular HTML authoring techniques, my colleague Ian Hickson parsed through a billion web pages from the Google repository to find out what are the most popular class names, elements, attributes, and related metadata. We decided that publishing this would be of significant utility to developers. It's also a fascinating look into how people create web pages. For instance, one thing that surprised me was that <title> is more popular than <br>. The graphs in the report require a browser with SVG and CSS support (like Firefox 1.5!). Enjoy!"

14 of 294 comments

  1. BR tag? by p0 · · Score: 5, Insightful

    With the power of CSS you really do not need to use br; maybe that is the reason for the small stats for the tag's use?

    --
    This is my sig. There are thousands more, but this one is mine.
    1. Re:BR tag? by Bogtha · · Score: 2, Insightful

      The <br> element type is kept around for a few minority uses. Things like poetry, code listings, etc, where dividing something up into lines is necessary. These things are rare, which is why masklinn said "should almost never be used" and not "should never be used".

      What SHOULD never happen, I think, is for BR to be treated as a substitute for proper block-level delineation.

      Yes, and if you take into account the idea that most pages that use the <br> element type do so in precisely this way, you'll end up agreeing with masklinn and myself.

      --
      Bogtha Bogtha Bogtha
    2. Re:BR tag? by Metasquares · · Score: 2, Insightful

      Because I don't know if the user wants to enter a paragraph. What the user entered is a line break (that's what hitting return does), thus br is the tag to use. If the user wants to enter in a paragraph, he can enter his own p tag or skip a line (which is the default p tag behavior anyway) and the p tag will be used.

      My site is XHTML, so the closing tag is required (not that that's stopping me).
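
A minimal sketch of the rule described in this comment, assuming plain-text input: a single return becomes a br, and a skipped line starts a new p. The function name `text_to_html` is hypothetical, and escaping the input is an assumption added here for safety; the commenter's own site apparently lets users type their own p tags, which this sketch would not allow.

```python
import html
import re


def text_to_html(text):
    """Turn user-entered plain text into XHTML-style paragraphs and line breaks."""
    # Normalise Windows line endings so the rules below only deal with "\n".
    text = text.replace("\r\n", "\n").strip()
    # A skipped (blank) line separates paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p]
    rendered = []
    for paragraph in paragraphs:
        # Assumption: all markup typed by the user is escaped.
        escaped = html.escape(paragraph)
        # A single return inside a paragraph is a line break, not a new paragraph.
        rendered.append("<p>" + escaped.replace("\n", "<br />\n") + "</p>")
    return "\n".join(rendered)


result = text_to_html("line one\nline two\n\nnext paragraph")
```

Whether a lone return should ever produce a br, rather than being ignored or treated as a paragraph break, is exactly the point under dispute in this thread; the sketch only shows the behaviour this commenter describes.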

  2. Some of these results... by Dracos · · Score: 3, Insightful

    Prove that most people (and WYSIWYGs) don't know how to produce valid and accessible markup. The img alt attribute (an accessibility requirement) was found significantly less often than width, height, and border.

    I'm working on a site now where the project owner is continually reducing usability and accessibility of the entire site (never mind that he secretly had a third party come up with an ugly design and ambushed the dev team with it).

    I keep telling everyone to deconstruct the adage "form follows function". It means function comes first. He doesn't care what anything *is* or how it *works*, only what it looks like. And, of course, that it's ugly.

  3. Re:what's the point of a 1 billion page sample? by Durinthal · · Score: 5, Insightful

    If you can have a larger sample, why not use it? It's more accurate that way.

  4. Ad for anti-IE by jamienk · · Score: 4, Insightful

    It looks like a subtle push against IE: many mentions of the HTML 5 spec (which is being written by the WHATWG, a workgroup that includes many browser companies but not MS); use of SVG; written by a major FF developer.

    Way to go Google! Pour on the pressure!

  5. <title> is NOT more popular than <br> by Anonymous Coward · · Score: 1, Insightful

    Whilst <title> may appear on more distinct pages,
    surely <br> is used more frequently in the aggregate; that is, the multiplicity of occurrences of <br>
    on many pages far exceeds the single(?) occurrence of <title> on most pages.
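
The distinction drawn in this comment, pages that contain an element versus total uses of it, can be sketched as below. This is only an illustration using Python's standard `html.parser`; the three sample pages are invented, the names `TagCounter` and `tally` are hypothetical, and nothing here is claimed to be the method used in the actual study.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every start tag seen in a single page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tally(pages):
    """Return (pages containing each tag, total occurrences of each tag)."""
    pages_with = Counter()
    occurrences = Counter()
    for page in pages:
        parser = TagCounter()
        parser.feed(page)
        parser.close()
        occurrences.update(parser.tags)        # adds the per-page counts
        pages_with.update(parser.tags.keys())  # adds 1 per page per tag
    return pages_with, occurrences


# Invented sample data, for illustration only.
SAMPLE_PAGES = [
    "<html><head><title>a</title></head><body>x<br>y<br>z<br></body></html>",
    "<html><head><title>b</title></head><body><p>no breaks</p></body></html>",
    "<html><head><title>c</title></head><body>one<br>two</body></html>",
]

pages_with, occurrences = tally(SAMPLE_PAGES)
```

In this toy sample title appears on more pages than br, while br has more occurrences overall, so both readings of "more popular" can hold at once.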

  6. What about plugins? by AndrewStephens · · Score: 2, Insightful

    I would be interested in seeing how many web pages use Java applets, Flash, Shockwave, Quicktime, ActiveX controls etc, etc. Sadly the authors did not include this information.

    --
    sheep.horse - does not contain information on sheep or horses.
  7. Re:Pretty crappy page authoring... by Bogtha · · Score: 2, Insightful

    Pretty crappy page authoring...not to tell a poor end user that he/she was missing a required viewer

    It's explicitly mentioned on the very first page ("Note: You will need a browser with SVG and CSS support to view the result graphs correctly. We recommend Firefox 1.5.").

    --
    Bogtha Bogtha Bogtha
  8. Re:Dumb by Spad · · Score: 4, Insightful

    It's even dumber to state that someone is presenting pictures with Flash when they're actually using SVG.

  9. Re:Pretty crappy page authoring... by masklinn · · Score: 2, Insightful

    Gecko fascism indeed, I mean what a bunch of bastards, using completely valid SVG files, oooh the nerve of them blokes...

    --
    "The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
  10. Re:Poor style by Google by pbhj · · Score: 2, Insightful

    >>> "Can anyone tell me what's here that can't be visualized with GIF's?"

    I don't think that's the point ... it's about the creation of the images, not their visualisation. These images can be created on the fly from varying data with only textual manipulation of the code - the processing will be extremely light as will the data load on the servers. Presumably the xml-to-image parsing in the browser incurs a processing penalty though.

    If you view the code of one of the graphs (http://code.google.com/webstats/2005-12/charts/unique-classes-per-page.svg) you'll see that it is less than 10k. It also has a theoretically infinite resolution, which might be useful if the graphs are to be used for a presentation (like printing them on the moon using lasers!!?).

    Use of FF isn't too surprising as the section code.google.com is for promotion of OSS.

    It looks to be an internal project that we have just happened to be given access to ... assuming the officers of Google that need access have FF1.5 then the web devs have probably met their brief?!
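
As a rough illustration of the point above about building SVG images "with only textual manipulation", here is a minimal sketch that writes a bar chart as an SVG string. The function name `bar_chart_svg`, the layout constants, and the sample values are all assumptions made for this example; it says nothing about how the report's own charts were produced.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(values, bar_width=40, gap=10, height=100):
    """Return an SVG document (as a string) with one bar per (label, value) pair."""
    peak = max((v for _, v in values), default=0) or 1  # avoid dividing by zero
    width = len(values) * (bar_width + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height + 20}">'
    ]
    for i, (label, value) in enumerate(values):
        x = gap + i * (bar_width + gap)
        bar_height = round(height * value / peak, 2)  # tallest bar fills the height
        y = round(height - bar_height, 2)             # SVG's y axis points down
        parts.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" height="{bar_height}"/>'
        )
        parts.append(
            f'<text x="{x + bar_width / 2}" y="{height + 15}" '
            f'text-anchor="middle" font-size="10">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Invented values, for illustration only.
svg = bar_chart_svg([("title", 3), ("br", 2)])
```

Because the result is plain text with a viewBox rather than fixed pixel dimensions, it stays small and scales to any output size, which is the trade-off the comment describes.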

  11. Re:Font still popular by WWWWolf · · Score: 2, Insightful

    You know, there's something to be said for the straightforwardness of the "Font. Color. Red. Do it." approach.

    I don't know. I rather prefer the straightforwardness of the "This is a title. You know how to format it." approach.

    With FONT tags, you need to specify the font and color on a single passage of text. Then on another. And then another. And then another. And for good measure, just another. And by the way, one more. And that one too. And that one there, even when you just described that other one back there to have the exact same font and color. Oh, and that one too. And almost forgot that one there.

    After Netscape & IE 4 died, CSS just works.

  12. I'm feeling violated by Sontas · · Score: 2, Insightful

    1 billion pages! Talk about a violation of privacy! The Justice Department is only asking for a random sample of 1 million addresses and the search results for any 1 week period. This guy gets access to 1 billion pages via the Google repository (whatever that is), conducts detailed analysis of the contents of those pages, and nary a word of dissent from the vast Slashdot audience.