A Statistical Review of 1 Billion Web Pages
chrisd writes "As part of a recent examination of the most popular html authoring techniques, my colleague Ian Hickson parsed through a billion web pages from the Google repository to find out what are the most popular class names, elements, attributes, and related metadata. We decided that to publish this would be of significant utility to developers. It's also a fascinating look into how people create web pages. For instance one thing that surprised me was that the <title> is more popular than <br>. The graphs in the report require a browser with SVG and CSS support (like Firefox 1.5!). Enjoy!"
if the tag isn't on the top elements list.
the tag.
well when people talk like this and dont bother using punctuation spacekeys or any of the skills that they have been taught in school its no wonder why webpages turn out like this not to mention those long runon sentences and also all that broken code that are the fist attempt at a webpage by a twelve year old kid who tried to steal someone elses layout and replaced the word with his own then you start to look at all of those dynamically generated webpages and the layouts and the style sheets and its no wonder why the good old br tag never get a work out.
An un-slashdottable server.
With CSS you really don't need br anymore; maybe that explains the tag's low usage numbers?
This is my sig. There are thousands more, but this one is mine.
It didn't have everything of course. Some elements were censored on behalf of the Chinese government.
I have to ask, what's the purpose of a 1-BILLION page sample? That's the beautiful thing about statistics. If you can say something about the distribution of characteristics within a population, you don't have to survey the entire population to get meaningful results. Are the study authors proposing that no standard distribution can be applied to the entire universe of web pages? If that's the case, then do the statistics they apply to their sample of one billion really say anything predictive about the entire population?
Aside from the cool factor of saying they sampled a billion pages, I don't see what extra benefits are gained from that extra effort.
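To put a rough number on that: here is a back-of-the-envelope sketch of the usual margin-of-error formula for a proportion. It assumes a simple random sample, which a crawl of the Google repository may well not be, and the function name is just something I made up for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Compare a modest sample against the billion-page one.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"n = {n:>13,}: +/- {100 * margin_of_error(n):.4f} percentage points")
```

For a common element like br, a few thousand pages would already pin the percentage down to within a few points. The larger sample might only pay off for very rare class names or attributes.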
The 'br' element
The br element is a simple one, yet used on so many pages that it is the 8th most-used element. It is used more than the p element.
clear, style, class, soft, id, and \.
Wow! I never knew you guys were that popular.
He who knows best knows how little he knows. - Thomas Jefferson
Prove that most people (and WYSIWYGs) don't know how to produce valid and accessible markup. The img alt attribute (an accessibility requirement) was found significantly less often than width, height, and border.
I'm working on a site now where the project owner is continually reducing the usability and accessibility of the entire site (never mind that he secretly had a third party come up with an ugly design and ambushed the dev team with it).
I keep telling everyone to deconstruct the adage "form follows function". It means function comes first. He doesn't care what anything *is* or how it *works*, only what it looks like. And, of course, that it's ugly.
How about:
IF(Post=Old_And_Tired) GOTO Mod_Down
It looks like a subtle push against IE: many mentions of the HTML 5 spec (which is being written by WHAT, a workgroup that includes many browser companies but not MS); use of SVG; written by a major FF developer.
Way to go Google! Pour on the pressure!
FYI, Opera also supports SVG. I'm surprised that Ian Hickson didn't have Opera also mentioned on that Google page, after all he worked at Opera until a few months ago.
Opera Watch - An Opera browser blog.
I wonder how much of what they found is influenced by how people learned to write HTML - which in all likelihood was to copy code from existing pages... might explain parts of what they found, such as:
ClutterMe.com - easiest site creation on the Net. Just click and type.
Your code usually goes like this:
So it is quite easy to get the empty table if the collection is empty.
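A hypothetical sketch of the kind of pattern being described (this is not the poster's actual code, and `render_table` is a made-up name): the table wrapper is emitted unconditionally, and only the rows depend on the data.

```python
def render_table(rows):
    # Hypothetical helper: the <table> wrapper is written no matter what,
    # and one row is added per item in the collection.
    html = "<table>"
    for row in rows:
        html += f"<tr><td>{row}</td></tr>"
    html += "</table>"
    return html

print(render_table(["a", "b"]))  # <table><tr><td>a</td></tr><tr><td>b</td></tr></table>
print(render_table([]))          # <table></table>
```

Unless the code checks for an empty collection first, the empty table ends up in the page.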
No sig today.
Capitalization makes all the difference in the sentence:
i helped my uncle jack off a horse
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
I had an interesting run-in with Benford's law a bit ago. I had this typed up already, so here goes (description of the law omitted; read the Wikipedia link in the parent -- it's really cool):
You see, my hard drive crashed about two weeks ago. It had three partitions on it, and two of them are still perfectly readable. The third is pretty well shot. (Fortunately, it was the most useless partition; its main content was Windows itself. This does mean ANOTHER Windows installation -- after having to do one a few weeks before -- but really that's no biggie compared with my actual data. And while I'm on that subject, I had two hard drives; when I got the newer one, I put all my work stuff on it as well as a new Linux installation specifically because it was less likely to fail, and I look back at that decision now with great happiness, because it is that foresight that has made this no big deal at all.)
I've been trying to recover data off of the third partition, and it seems that if you do a full scan of the partition it appears as if the data was just deleted. Most of the time it's able to recover information, but not always: folder names are often lost. They show up in the recovery programs I tried as just Folder2393 for example. (Numbers ranged from 2 to 5 digits.)
The folder numbers approximately follow Benford's law.
Here is the approximate distribution:
M.S. digit    % of folders    Ideal Benford %
1             32              30.1
2             15              17.6
3             12              12.5
4             12               9.7
5             19               7.9
6              3               6.7
7              3               5.8
8              2               5.1
9              2               4.6
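For anyone who wants to check the "ideal" column: Benford's law says the leading digit d shows up with probability log10(1 + 1/d). A minimal sketch that prints it next to the folder percentages from the table above (the `observed` dict is just those numbers typed in by hand, and the function name is made up):

```python
import math

def benford_percent(d):
    """Expected share (in percent) of numbers whose leading digit is d."""
    return 100 * math.log10(1 + 1 / d)

# Observed folder percentages, copied from the table above.
observed = {1: 32, 2: 15, 3: 12, 4: 12, 5: 19, 6: 3, 7: 3, 8: 2, 9: 2}

for d in range(1, 10):
    print(f"{d}  observed {observed[d]:>2}%  expected {benford_percent(d):4.1f}%")
```

The digit 5 is the obvious outlier; the rest track the expected curve loosely.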
It's even dumber to state that someone is presenting pictures with Flash when they're actually using SVG.
http://www.adobe.com/svg/viewer/install/main.html got suitable plugins for browsers/OS of choice.
.ogg. Rootless promotion of this kind...
Note that I've had the SVG plugin installed for ages, and Safari still didn't display the graphs. Is it because I am not using "a browser with CSS"? Well, never mind really...
This is the thing why I and others have negative views against firefox, svg and even
The summary got it wrong:
the study states that more pages use title than use br, NOT that more title tags are used than br tags.
Approximately 98% of all pages have a title tag, and approximately 7 out of 8 pages have at least one br tag (probably more).
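The difference between the two ways of counting is easy to show with a toy example. This is only a sketch: the three pages are made up, and the regex is a crude stand-in for a real HTML parser, not whatever the study actually used.

```python
import re
from collections import Counter

# Three made-up pages, purely for illustration.
pages = [
    "<title>A</title><p>one<br>two<br>three</p>",
    "<title>B</title><p>no line breaks here</p>",
    "<title>C</title>x<br>y<br>z",
]

def tag_names(html):
    # Crude match for opening tags only (closing tags start with "</").
    return [name.lower() for name in re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)]

total_uses = Counter()   # every occurrence of a tag
pages_using = Counter()  # number of pages with at least one occurrence

for html in pages:
    names = tag_names(html)
    total_uses.update(names)
    pages_using.update(set(names))

print("pages using title:", pages_using["title"], " pages using br:", pages_using["br"])
print("total title tags: ", total_uses["title"], " total br tags: ", total_uses["br"])
```

Here title appears on more pages than br (3 vs. 2) even though there are more br tags in total (4 vs. 3), which is the distinction the summary blurred.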
I have discovered a truly remarkable proof for my post which this sig is too small to contain.
If your Firefox 1.5 doesn't display the graphs, or crashes, do the following as suggested by the Google webstats author:
8 1#c3
Apparently there's a problem in Firefox 1.5 regarding SVG images if you
had SVG in the registry. Try following the steps described here:
https://bugzilla.mozilla.org/show_bug.cgi?id=3035