Slashdot Mirror


A Statistical Review of 1 Billion Web Pages

chrisd writes "As part of a recent examination of the most popular HTML authoring techniques, my colleague Ian Hickson parsed a billion web pages from the Google repository to find the most popular class names, elements, attributes, and related metadata. We decided that publishing this would be of significant utility to developers. It's also a fascinating look into how people create web pages. For instance, one thing that surprised me was that <title> is more popular than <br>. The graphs in the report require a browser with SVG and CSS support (like Firefox 1.5!). Enjoy!"

14 of 294 comments

  1. Re:what's the point of a 1 billion page sample? by Anonymous Coward · · Score: 5, Informative

    You get a decrease of the variance of the mean.

  2. Opera also supports SVG by TheJavaGuy · · Score: 4, Informative

    FYI, Opera also supports SVG. I'm surprised that Ian Hickson didn't have Opera also mentioned on that Google page, after all he worked at Opera until a few months ago.

    --
    Opera Watch - An Opera browser blog.
  3. Re:what's the point of a 1 billion page sample? by shoolz · · Score: 2, Informative

    Because with statistics, increasing the sample size does not result in a uniform increase in accuracy.

    If you start with a sample size of 1000 and add an additional 10000, the accuracy will increase dramatically. But if you start with 1,000,000,000, and increase it by another 1,000,000,000, the accuracy won't go up even by as much as 0.0001%

    Yes, I'm pulling the numbers out of the air, but the point is that there exists a sweet spot where the additional effort does not pay off.
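    The diminishing return can be made concrete with the standard error of a sample proportion, sqrt(p(1-p)/n), which shrinks with the square root of the sample size. The sketch below is only an illustration: it assumes pages are sampled independently (something the report does not claim), and the function name `standard_error` and the value p = 0.5 are made up for this example.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)


# Assumption for illustration: p = 0.5 is the worst case (largest variance).
P = 0.5

for n in (1_000, 11_000, 1_000_000_000, 2_000_000_000):
    print(f"n = {n:>13,}   standard error = {standard_error(P, n):.2e}")

# Absolute improvement from adding samples at two very different scales.
small_gain = standard_error(P, 1_000) - standard_error(P, 11_000)
large_gain = standard_error(P, 1_000_000_000) - standard_error(P, 2_000_000_000)
print(f"gain from 1,000 -> 11,000 samples:  {small_gain:.2e}")
print(f"gain from 1e9 -> 2e9 samples:       {large_gain:.2e}")
```

    Under these assumptions, doubling a billion-page sample improves the estimate by far less than adding ten thousand pages to a thousand-page sample.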

  4. table with no by saigon_from_europe · · Score: 4, Informative
    From the article:
    If someone can explain why so many pages would use a <table> tag and then not put any cells in it, please let us know.
    I don't know if they counted dynamic pages, but I guess they did. In dynamic pages, an empty table is quite normal.

    Your code usually goes like this:
    <table>
    <% for each element in collection %>
    <tr><td> something </td></tr>
    <% end for %>
    </table>

    So it is quite easy to get the empty table if the collection is empty.
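    The same effect can be reproduced outside a template engine. This is a minimal sketch in Python rather than the template syntax above; `render_table` and `render_table_or_nothing` are hypothetical helper names, and nothing here is taken from the report itself.

```python
from html import escape


def render_table(rows):
    """Mimic the template loop: the <table> wrapper is always emitted."""
    parts = ["<table>"]
    for row in rows:
        parts.append(f"<tr><td>{escape(str(row))}</td></tr>")
    parts.append("</table>")
    return "\n".join(parts)


def render_table_or_nothing(rows):
    """One possible guard: skip the wrapper when there is nothing to show."""
    rows = list(rows)
    return render_table(rows) if rows else ""


print(render_table(["first", "second"]))
print(render_table([]))  # an empty collection still yields <table></table>
print(repr(render_table_or_nothing([])))
```

    Without a guard like the second function, every page generated from an empty collection contains a table with no cells.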
    --
    No sig today.
  5. The reason not to do this by winkydink · · Score: 4, Informative

    Capitalization makes all the difference in the sentence:

    i helped my uncle jack off a horse

    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

  6. Re:Not so fast - I'm pulling up mostly blank pages by stunt_penguin · · Score: 2, Informative

    Try using an SVG-compatible browser. SVG graphics *tend to* work better that way.

    --
    When the posters fear their moderators, there is tyranny; when the moderators fear the posters, there is liberty.
  7. Re:Ad for anti-IE by Bogtha · · Score: 3, Informative

    written by a major FF developer

    I don't believe Ian Hickson has been involved with Firefox; if I remember correctly, he used to hack on Mozilla, but then started work at Opera before Firefox took off.

    I don't think it's a jab at Internet Explorer, it's just that he knows that the target audience is likely to have a decent browser, so he's used the features likely to be available.

    --
    Bogtha Bogtha Bogtha
  8. Re:Heh by Blink Tag · · Score: 2, Informative
    Most people (roughly 98%) include head, html, title and body elements. This is somewhat ironic, since three of those four elements are optional in HTML

    Somewhat true. The HEAD tag is technically optional (per spec), but TITLE is required, and must be in the HEAD. Thus HEAD is required in practice.

    From the HTML 4.01 spec:

    Every HTML document must have a TITLE element in the HEAD section.

    Though marked as "start tag optional"/"end tag optional", the BODY and HTML tags do provide useful semantic relevance.

  9. Re:Cool statistics by Anonymous Coward · · Score: 1, Informative

    Image maps are often used on banner ads. I would guess that this is the main reason why they are so popular in this analysis.

  10. For folks who do not (want to) run Firefox by Ilgaz · · Score: 3, Informative

    http://www.adobe.com/svg/viewer/install/main.html has suitable plugins for your browser/OS of choice.

    Note that I have had the SVG plugin installed for ages, yet Safari didn't display the graphs. Is it because I am not using "a browser with CSS"? Well, never mind really...

    This is why I and others have negative views of Firefox, SVG and even .ogg. Rootless promotion of this kind...

  11. Re:BR tag? is used in 7 out of 8 pages by TekGoNos · · Score: 3, Informative

    The summary got it wrong: the study states that more pages use title than use br, NOT that more title tags are used than br tags.

    Approximately 98% of all pages have a title tag, and approximately 7 out of 8 pages have (at least one, probably more) br tags.
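    The difference between the two measurements can be shown on a toy corpus. This is a rough sketch using Python's standard `html.parser`; it is not how the study was done, and `TagCounter`, `count_usage` and the two sample pages are invented for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count every start tag seen in a single page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def count_usage(pages):
    """Return (pages using each tag, total occurrences of each tag)."""
    pages_using = Counter()
    total_tags = Counter()
    for html in pages:
        parser = TagCounter()
        parser.feed(html)
        parser.close()
        total_tags.update(parser.tags)          # every occurrence counts
        pages_using.update(parser.tags.keys())  # each page counts once per tag
    return pages_using, total_tags


# Hypothetical two-page corpus, invented for illustration.
sample_pages = [
    "<html><head><title>a</title></head><body>x<br>y<br>z<br></body></html>",
    "<html><head><title>b</title></head><body>no line breaks</body></html>",
]

pages_using, total_tags = count_usage(sample_pages)
print("pages using title:", pages_using["title"], " title tags:", total_tags["title"])
print("pages using br:   ", pages_using["br"], " br tags:   ", total_tags["br"])
```

    In this toy corpus title appears on more pages than br, even though br tags outnumber title tags, which is the distinction the study draws.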

    --
    I have discovered a truly remarkable proof for my post which this sig is too small to contain.
  12. Fix for Firefox 1.5 by bigbadbuccidaddy · · Score: 3, Informative

    If your Firefox 1.5 doesn't display the graphs, or crashes, do the following as suggested by the Google webstats author:

    Apparently there's a problem in Firefox 1.5 regarding SVG images if you
    had SVG in the registry. Try following the steps described here:

          https://bugzilla.mozilla.org/show_bug.cgi?id=303581#c3

  13. Re:Set-Cookie2 insecure? by hixie · · Score: 2, Informative

    Yeah, I misspoke on this. Set-Cookie is insecure (due to domain-crossing problems -- should a cookie sent to a.b.c get sent to z.b.c? Depends on "b" and "c" in ways that depend on month-to-month political changes around the globe), but as far as I can tell, Set-Cookie2 is also insecure. I had thought it fixed this, but apparently not.

  14. Re:Not so fast - I'm pulling up mostly blank pages by hixie · · Score: 2, Informative

    It has nothing to do with "cool"; SVG happens to be easier for us to produce than bitmaps, and anyone who is going to be able to read this report and view graphics will be using an SVG-capable browser. The fact that it found bugs in every SVG browser out there is merely a bonus; it means that SVG support will get better.

    We used standards. It's not our fault if there was only one released browser that supported those standards well enough for you to be able to see the graphics.