Netcraft Web Server Stats Challenged
kolchak writes "An article in The Age has an interesting analysis of the Netcraft Web Server Usage Reports. According to Port80 Software, Netcraft's surveys are biased towards domain name parkers and very small web sites, not taking into account how popular a site may be - there's some interesting results in the competing Port80 survey." However, it should be pointed out that Port80 "develops software products to enhance the security, performance and user experience of Microsoft's Internet Information Services (IIS) Web server."
This is wrong on soooooo many levels. I could understand trying to twist the truth by redefining what a webserver is... but thier sampling method is straight out wrong.
Want proof? Here it is. Go to the linked article, (or click here) and where they have the box to check your server header (about half way down the page) type in www.microsoft.com - you will see its running IIS/6. A nice happy IIS server.
Now, type in my web server - http://www.isthatdamngood.com - its a nice Linux/Apache server. My server will CRASH thier app! Actually, a lot of linux servers will crash it...
Kinda hard to claim your results are more indicitative of the market when your scanning technology is flat out broken.
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
and this was their response:
We detect that homepage.mac.com is running Apache/1.3.27 (Darwin).
but with this caveat
Note:
No matter what the above results show, this company may be running Microsoft IIS and protecting its Web server identity with ServerMask.
Nope, no bias there.
The dogcow says "Moof!"
It is not only funny that according to their "survey" IIS has more market share than Apache, but *gasp* Netscape has a larger market share than Apache too!
That is as big of a red flag as I have ever seen.
Of course the fact that they indeed produce softs for IIS is in no way shape or form any sort of indication to a possible, slight, minimal... bias.
LOL, a nice laugh... and they may even get slashdotted, which will bring joy to their sorry operation since they will now be able to claim that they are now one of the nets most popular companies/sites. I am sure this is some sort of ploy to get traffic, it will be funny to see if indeed their beloved IIS can stand the slashdot effect. LOL
Didn't Netcraft themselves cover this topic last year? IIRC, some pro-MS group made the same argument, that you should only count the big guys. They looked at the Fortune N (I forget what N was) and found that lo and behold, IIS came out on top.
Then Netcraft came back with another study, where they ranked companies not by their Fortune ranking (i.e., total revenue), which would tend to favor MS as that's the "safe" choice for big companies. Instead, they ranked companies by how much revenue they made on the Net (so companies like Amazon would rank much higher), and found that by that measure, Apache was again on top.
I tried several sites myself with my own javascript and guess what?
My results were were different than their's more than half the time! I figured they had multiple servers running, etc., so I rechecked at least 5 times on all sites (all sites checked, that is ~50)...NO CHANGE!
Take disney.com, for example. Their site says IIS 5.0. I got netscape...so did netcraft.
One word... BULL#%&*!
-Pride
a product .... to confuse script kiddies
I am running Apache on Linux, and I still get 1000 hits a day trying to crack MSADC with buffer overflows, and FrontPage exploit attempts. It's not like the script kiddies check the server ID or pay any attention to it even if they do.
I'll ignore for the moment the question of the quality of their data. I'm sure others will endlessly debate it (and I'll probably join in). Let's look at something else: The quality of their presentation.
First, let's take a look at the most recent Netcraft server survey. Let's see, clean display. The scale grid is subtle and doesn't draw attention to itself, but makes it easy to see exactly where a line falls. There is little wasted pixel data. It's easy to see trends and make comparisons. For the curious the exact numbers for the last two samples is listed (regrettably one two samples are listed). The graph labels the data it shows ("Market Share for Top Servers Across All Domains August 1995 - November 2003") leaving the reader to form his own opinions. On the down side, the scale confusingly marks 7% increments and the yellow line for Netscape/SunOne almost disappears into the background. Still, a well above average for graph. Definately room to improve, but better than most people expect to see.
Now let's example the Port80 server survey. Wow, what a difference. The grid is a much more dominant element. The 3d effect means that bars further in the back appear taller (by up to 15 pixels, or about 7%) and makes it hard to compare a specific data point against the scale. The complexity of the 3d bars complicates things, the "top" of the bar is actually larger than the month to month shift in the numbers. The "area" of the bars implies size (intellectually you know it isn't, but your gut says otherwise), this means that the largely obscured middle bars (Netscape and Apache) seem smaller. Ultimately bars are the wrong choice, we're examining points over time (suggesting a line chart), not clusters of data. The chart is labeled with a conclusion ("Microsoft IIS Maintains Dominance Of the Corporate Web Server Market"), suggesting interpretations to the reader. On the up side, they provide heavily broken up information for the most recent sample point (regrettably it's a graphic). They include a worthless pie chart. If you want to show market share a line chart showing historical data would be much more enlightening.
Conclusion? Port80's graphs suck. Hard. It's a stunning example of how not to create high quality graphs. The creators need to be beaten with copies of Tufte's information display books until they get it. This is the sort of amateur crap I expect on PowerPoint slides from people more interested in being cool than being useful, or perhaps from the graphics department at USA Today. As an engineer I'm disappointed.
Search 2010 Gen Con events
They list the 995 sites they include (they're using the Fortune 1,000, and (looking at some of the earlier reports), apparently 5 Fortune 1,000 companies don't have sites. (If they're still Slashdotted, you can download the pages from Google's cache. start here.)
A bit of quick Perl hackery pulls back the following values, roughly in line with what they report. The second column is actual sites found.
That said, I doubt the usefulness of the survey. It's a survey of Fortune 1,000 companies. These are often companies whose web presence is minimal. What does a giant holding company need with a web site? Heck, five of the companies didn't have any site at all! Of those sites that exist, many lack any sort of complexity (say, thousands of pages, or lots of dynamic pages). Simply put, many of these sites would run fine an almost anything, they don't represent Hard Work. I'm a lot more interested in what Google and Yahoo choose to run than in what the Radian Group and the Kiewit run.
Now Netcraft does have the problem they cite: Netcraft weights everyone equally. Perhaps that introduces bias. Perhaps we should select a set of sites that is high bandwidth, typically has at least some dynamic systems in place (say, to handle selling accounts), and is a popular target for hackers? How about porn sites? Porn operators have a hard job, thanks to Smutcraft you can see what they run.
Second, it looks like they've chosen one site for each company. For Amerco, for example, they chose UHaul.com running IIS. Reasonable enough (UHaul is part of Amerco), but it's interesting that they skipped amerco.com (running Apache). Not a great example, surely (especially since uhaul.com is certainly doing more real work than the very thin amerco.com), but it shows that there is a selection process of some sort, and any selection process risks introducing bias.
Search 2010 Gen Con events
Now do the following commands:
(With Apache 2.x, cd os/unix)
#define PLATFORM "Unix"
(With Apache 2.x, vi ap_release.h)
#define SERVER_BASEVENDOR "Apache Group"
#define SERVER_BASEPRODUCT "Apache"
#define SERVER_BASEREVISION "1.x.xx"
(With Apache 2.x, cd
You're done. Congratulations. You just saved yourself $49 dollars!!!
i tried their header check for www.apache.org [link is here]
Port80 returned this result:
"We detect that www.apache.org is running Apache/2.0.48-dev (Unix)."
But further down the page is this gem:
"No matter what the above results show, this company may be running Microsoft IIS and protecting its Web server identity with ServerMask."
WTF?!
Yet Socrates himself is particularly missed.
A lovely little thinker but a bugger when he's pissed.