Netcraft Web Server Stats Challenged
kolchak writes "An article in The Age has an interesting analysis of the Netcraft Web Server Usage Reports. According to Port80 Software, Netcraft's surveys are biased towards domain name parkers and very small web sites, not taking into account how popular a site may be - there's some interesting results in the competing Port80 survey." However, it should be pointed out that Port80 "develops software products to enhance the security, performance and user experience of Microsoft's Internet Information Services (IIS) Web server."
This is wrong on soooooo many levels. I could understand trying to twist the truth by redefining what a webserver is... but their sampling method is straight-out wrong.
Want proof? Here it is. Go to the linked article, (or click here) and where they have the box to check your server header (about halfway down the page) type in www.microsoft.com - you will see it's running IIS/6. A nice happy IIS server.
Now, type in my web server - http://www.isthatdamngood.com - it's a nice Linux/Apache server. My server will CRASH their app! Actually, a lot of Linux servers will crash it...
Kinda hard to claim your results are more indicative of the market when your scanning technology is flat out broken.
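For anyone who wants to check a server header without going through their form, here is a minimal Python sketch. It assumes the check is nothing more than a plain HTTP HEAD request on port 80 and a read of the Server response header; how Port80's tool actually works is not known. The function names are made up for illustration.

```python
import http.client
from typing import Optional


def parse_server_header(raw_headers: str) -> Optional[str]:
    """Return the Server header value from a raw header block, or None."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Skip the status line and anything else without a "name: value" shape.
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None


def fetch_server_header(host: str, timeout: float = 10.0) -> Optional[str]:
    """Send HEAD / to the host and return whatever Server header comes back."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.getheader("Server")
    finally:
        conn.close()
```

Calling fetch_server_header("www.microsoft.com") needs network access, and the answer is only what the server chooses to report, which is exactly what a header-masking product changes.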
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
From their Partners page:
"Port80 Software's Strategic Partners:
Microsoft, Inc."
Strategic in what way? FUD?
and this was their response:
We detect that homepage.mac.com is running Apache/1.3.27 (Darwin).
but with this caveat
Note:
No matter what the above results show, this company may be running Microsoft IIS and protecting its Web server identity with ServerMask.
Nope, no bias there.
The dogcow says "Moof!"
It is not only funny that according to their "survey" IIS has more market share than Apache, but *gasp* Netscape has a larger market share than Apache too!
That is as big of a red flag as I have ever seen.
Of course the fact that they indeed produce software for IIS is in no way, shape or form any sort of indication of a possible, slight, minimal... bias.
LOL, a nice laugh... and they may even get slashdotted, which will bring joy to their sorry operation since they will now be able to claim that they are one of the net's most popular companies/sites. I am sure this is some sort of ploy to get traffic; it will be funny to see if indeed their beloved IIS can stand the slashdot effect. LOL
Even if these Port80 guys are on Microsoft's payroll, the point they make is still quite correct - it makes no sense to measure market share by simply counting web hosts. If all the high-traffic web sites on the Internet are running IIS while the numerically greater but less popular remainder are running Apache, can you meaningfully say that Apache has a higher 'market share'?
Unfortunately, short of tracking people's surfing habits or getting access to web server logs, there is no easy way of working out the popularity of a site. Netcraft's method of polling every known webserver is really the only practical method available, even if it is not truly accurate.
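To make the difference concrete, here is a small Python sketch that computes share both ways: once by counting sites equally and once weighted by traffic. The site names and hit counts are invented purely for illustration, and nothing here reflects either survey's real data or method.

```python
from collections import defaultdict

# Hypothetical sample: (site, server software, daily hits). All values are made up.
SITES = [
    ("big-portal.example", "IIS", 900_000),
    ("small-1.example", "Apache", 1_000),
    ("small-2.example", "Apache", 2_000),
    ("small-3.example", "Apache", 500),
]


def share_by_site_count(sites):
    """Every site counts once, regardless of how busy it is."""
    counts = defaultdict(int)
    for _site, server, _hits in sites:
        counts[server] += 1
    total = sum(counts.values())
    return {server: n / total for server, n in counts.items()}


def share_by_traffic(sites):
    """Every site counts in proportion to its traffic."""
    hits = defaultdict(int)
    for _site, server, site_hits in sites:
        hits[server] += site_hits
    total = sum(hits.values())
    return {server: n / total for server, n in hits.items()}
```

With this made-up sample, Apache wins by site count and IIS wins by traffic, so the same data supports either headline. The hard part, as noted, is that nobody outside the site operators has the traffic column.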
You have to look at their survey. It's talking about the CORPORATE web servers. I work for a major corporate America company. We have close to 4000 servers handling our "web" environment. That consists of web, app, and database servers. There's more IIS than anything else out there for sure in corporate America. Especially on the WEB front end. In a corporate environment there are about 20 Windows boxes to 1 Unix box. Mostly because Windows servers are so cheap and can't handle as much load per server. But on the DATABASE backend there is much more UNIX than Windows.
Another thing is that corporate America is barely getting its feet wet with Linux/Apache. The UNIX boxes that are installed are not running Apache, they're running something from a major vendor (e.g. Netscape). Up until this year there was NO Linux in the company I work for. If a MAJOR vendor will not support a product, corporate America will not install it. They love to point the finger at the vendors. If there's nobody to point a finger at when something goes wrong, it will not get installed.
Until Red Hat started selling Linux for $5k, corporate America wouldn't even bat an eye at it. Now they're eating it up like hot cakes 'cause it's EXPENSIVE! Linux is no longer a free thing. Now powerful execs can point fingers and also throw around the "L" buzzword and feel like they're pushing the envelope.
I tried several sites myself with my own javascript and guess what?
My results were different from theirs more than half the time! I figured they had multiple servers running, etc., so I rechecked at least 5 times on all sites (all sites checked, that is ~50)...NO CHANGE!
Take disney.com, for example. Their site says IIS 5.0. I got netscape...so did netcraft.
One word... BULL#%&*!
-Pride
Port80 Survey header check /surveys/top1000webservers/headercheck.asp, line 121
Microsoft OLE DB Provider for ODBC Drivers error '80040e57'
[Microsoft][ODBC SQL Server Driver][SQL Server]String or binary data would be truncated.
A suggestion for their ServerMask product: COVER UP ERRORS THAT GIVE AWAY INFORMATION. Seriously, if they think that headers are going to give away a lot of info, then forced errors will, too. But there is a boatload of other techniques (including passive techniques) that get around their security-through-obscurity program.
HIV Crosses Species Barrier... into Muppets
It doesn't matter if the domain is parked or serving thousands of pages...domains are just as easily parked on IIS as on Apache.
slashdot, news for crazed liberal socialist zealots
I could not help but notice that Google, Yahoo, and Slashdot are omitted from their "top 1000" list. Yet rumors persist that these three web sites get a fair amount of traffic.
a product .... to confuse script kiddies
I am running Apache on Linux, and I still get 1000 hits a day trying to crack MSADC with buffer overflows, and FrontPage exploit attempts. It's not like the script kiddies check the server ID or pay any attention to it even if they do.
If you are conducting a survey to find out what is the "best of the best" in server software, why survey Family Dollar Store? Or Land 'O Lakes? You should be choosing technically savvy, solution-neutral companies that are likely to choose the best. These are the actual companies that have a big web presence, and you would not expect them to choose a platform which would affect their bottom line badly... As opposed to Sears Roebuck, whose online presence can be compared to Amazon's retail presence. Would we ask Amazon how to organize endcaps? Let's pick a few technically adept companies at random here...
Amazon - Apache
AT&T - Netscape
Bell South - Apache
Cisco - Unix
Dell - IIS5
Earthlink - Netscape
E-Bay - IIS4
HP - Apache
Intel - IIS6
Lucent - Netscape
Motorola - Apache
National Semiconductor - Netscape
Nextel - Netscape
Qualcomm - Netscape
PC Connection - IIS5
I can't survey any more companies, because Port80's IIS6 server is slashdotted. However, it is apparent from this data that nearly 1/3rd of all websites that count are hosted on Netscape platforms. Apache and IIS share 1/4th each, and Cisco's odd unix variant wraps up the rest.
Personally I'm amazed that Netscape is holding on to a lead... I would have expected them to be out of the running long ago. I'll have to check them out.
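The tally over the list above is easy to reproduce. This is a short Python sketch that uses only the fifteen entries as posted and lumps IIS4, IIS5 and IIS6 together as "IIS"; the grouping rule (strip trailing digits) is my own assumption.

```python
from collections import Counter

# The fifteen companies and reported servers, exactly as listed in the post.
REPORTED = {
    "Amazon": "Apache", "AT&T": "Netscape", "Bell South": "Apache",
    "Cisco": "Unix", "Dell": "IIS5", "Earthlink": "Netscape",
    "E-Bay": "IIS4", "HP": "Apache", "Intel": "IIS6",
    "Lucent": "Netscape", "Motorola": "Apache",
    "National Semiconductor": "Netscape", "Nextel": "Netscape",
    "Qualcomm": "Netscape", "PC Connection": "IIS5",
}


def server_family(label: str) -> str:
    """Drop a trailing version number so IIS4/IIS5/IIS6 all count as IIS."""
    return label.rstrip("0123456789")


tally = Counter(server_family(server) for server in REPORTED.values())
shares = {family: count / len(REPORTED) for family, count in tally.items()}
```

On these fifteen entries that comes to 6 for Netscape and 4 each for Apache and IIS, with the single "Unix" entry making up the rest.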
The ______ Agenda
I'll ignore for the moment the question of the quality of their data. I'm sure others will endlessly debate it (and I'll probably join in). Let's look at something else: The quality of their presentation.
First, let's take a look at the most recent Netcraft server survey. Let's see, clean display. The scale grid is subtle and doesn't draw attention to itself, but makes it easy to see exactly where a line falls. There is little wasted pixel data. It's easy to see trends and make comparisons. For the curious, the exact numbers for the last two samples are listed (regrettably, only two samples are listed). The graph labels the data it shows ("Market Share for Top Servers Across All Domains August 1995 - November 2003"), leaving the reader to form his own opinions. On the down side, the scale confusingly marks 7% increments and the yellow line for Netscape/SunOne almost disappears into the background. Still, a well above average graph. Definitely room to improve, but better than most people expect to see.
Now let's examine the Port80 server survey. Wow, what a difference. The grid is a much more dominant element. The 3d effect means that bars further in the back appear taller (by up to 15 pixels, or about 7%) and makes it hard to compare a specific data point against the scale. The complexity of the 3d bars complicates things: the "top" of the bar is actually larger than the month to month shift in the numbers. The "area" of the bars implies size (intellectually you know it isn't, but your gut says otherwise), which means that the largely obscured middle bars (Netscape and Apache) seem smaller. Ultimately bars are the wrong choice; we're examining points over time (suggesting a line chart), not clusters of data. The chart is labeled with a conclusion ("Microsoft IIS Maintains Dominance Of the Corporate Web Server Market"), suggesting interpretations to the reader. On the up side, they provide heavily broken up information for the most recent sample point (regrettably it's a graphic). They include a worthless pie chart. If you want to show market share, a line chart showing historical data would be much more enlightening.
Conclusion? Port80's graphs suck. Hard. It's a stunning example of how not to create high quality graphs. The creators need to be beaten with copies of Tufte's information display books until they get it. This is the sort of amateur crap I expect on PowerPoint slides from people more interested in being cool than being useful, or perhaps from the graphics department at USA Today. As an engineer I'm disappointed.
Search 2010 Gen Con events
You can't make an accurate comparison unless you can remove all the other factors which directly affect how the server will perform.
"I have a porkchop, you have a porkchop. I have a veal, you have a veal".
Port80Software has been slashdotted. As of 23:41 MTN Standardtime Nov 26th, 2003.. their box is completely down.
...
Wonder what they're running
What about boxes like the ones where I work that run many (dozens, hundreds even) domains on one physical server? That's where the real difference creeps in; it's how 60-whatever % of sites run on Linux while 60-whatever % of boxes running web servers run Windows. Lots of the Linux boxes run multiple sites (and I don't just mean www.foo.com and images.foo.com; I mean they run www.foo.com and www.bar.com and www.baz.com and www.qxt.com on the single box).
So, take one of my boxes at work: it currently hosts 53 second-level domains and about 200 subdomains from them. The one I'm thinking of has its own class C netblock, but we have similar ones that just have a single IP address for their dozens of sites. Do you want that counted as one server, as 53, or as 200? Netcraft says it's 200. Port80 says it's 1. I'd like to count it as 53. Netcraft's way tells you what people who make web hosting decisions like. Port80's way tells you what people who make hardware and software buying decisions like.
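Here is a rough Python sketch of the three ways of counting the same box. The hostnames and the box label are invented, and the second-level-domain rule (keep the last two labels) is a simplifying assumption that breaks on names like foo.co.uk. It is not a description of how either Netcraft or Port80 actually counts.

```python
# Hypothetical hosting data: (hostname, physical box it is served from).
HOSTED = [
    ("www.foo.com", "box-1"),
    ("images.foo.com", "box-1"),
    ("www.bar.com", "box-1"),
    ("mail.bar.com", "box-1"),
    ("www.baz.com", "box-1"),
]


def second_level_domain(hostname: str) -> str:
    """Naive rule: keep the last two labels, e.g. images.foo.com -> foo.com."""
    return ".".join(hostname.lower().split(".")[-2:])


def count_by_box(hosted) -> int:
    """One count per physical server."""
    return len({box for _host, box in hosted})


def count_by_second_level_domain(hosted) -> int:
    """One count per registered domain, ignoring subdomains."""
    return len({second_level_domain(host) for host, _box in hosted})


def count_by_hostname(hosted) -> int:
    """One count per distinct hostname."""
    return len({host.lower() for host, _box in hosted})
```

For this toy data the same machine comes out as 1, 3, or 5 "servers" depending on which function you call, which is the 1 versus 53 versus 200 question in miniature.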
All's true that is mistrusted
They list the 995 sites they include. They're using the Fortune 1,000, and (looking at some of the earlier reports) apparently 5 Fortune 1,000 companies don't have sites. (If they're still Slashdotted, you can download the pages from Google's cache. Start here.)
A bit of quick Perl hackery pulls back the following values, roughly in line with what they report. The second column is actual sites found.
That said, I doubt the usefulness of the survey. It's a survey of Fortune 1,000 companies. These are often companies whose web presence is minimal. What does a giant holding company need with a web site? Heck, five of the companies didn't have any site at all! Of those sites that exist, many lack any sort of complexity (say, thousands of pages, or lots of dynamic pages). Simply put, many of these sites would run fine on almost anything; they don't represent Hard Work. I'm a lot more interested in what Google and Yahoo choose to run than in what the Radian Group and Kiewit run.
Now Netcraft does have the problem they cite: Netcraft weights everyone equally. Perhaps that introduces bias. Perhaps we should select a set of sites that is high bandwidth, typically has at least some dynamic systems in place (say, to handle selling accounts), and is a popular target for hackers? How about porn sites? Porn operators have a hard job, and thanks to Smutcraft you can see what they run.
Second, it looks like they've chosen one site for each company. For Amerco, for example, they chose UHaul.com running IIS. Reasonable enough (UHaul is part of Amerco), but it's interesting that they skipped amerco.com (running Apache). Not a great example, surely (especially since uhaul.com is certainly doing more real work than the very thin amerco.com), but it shows that there is a selection process of some sort, and any selection process risks introducing bias.
Search 2010 Gen Con events
Now do the following commands:
(With Apache 2.x, cd os/unix)
#define PLATFORM "Unix"
(With Apache 2.x, vi ap_release.h)
#define SERVER_BASEVENDOR "Apache Group"
#define SERVER_BASEPRODUCT "Apache"
#define SERVER_BASEREVISION "1.x.xx"
(With Apache 2.x, cd
You're done. Congratulations. You just saved yourself $49!!!
http://news.netcraft.com/archives/2003/11/03/november_2003_web_server_survey.html
is the latest survey. Apache has 67.41% of all domains (well, all that Netcraft knows about anyway) at 30298060 domains.
If you look only at "active" domains, Apache has 68.60%, so actually an even *higher* market share, of a total of 14370515 active domains. (So according to Netcraft, about half of all registered domains are "active" and the other half are "parked".)
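The "about half" figure can be checked with a line of arithmetic. This Python sketch just divides the two counts as quoted in this comment; whether they are Apache-only counts or survey-wide totals is not clear from the wording, so treat the variable names as my own labels.

```python
# Both figures are copied from the comment above, not taken from Netcraft directly.
quoted_all_domains = 30_298_060     # the "all domains" figure as quoted
quoted_active_domains = 14_370_515  # the "active domains" figure as quoted

# Fraction of the quoted "all domains" count that is also counted as active.
active_fraction = quoted_active_domains / quoted_all_domains
print(f"active / all = {active_fraction:.3f}")
```

The ratio lands a little under one half, which matches the "about half are parked" reading.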
I tried their header check for www.apache.org [link is here]
Port80 returned this result:
"We detect that www.apache.org is running Apache/2.0.48-dev (Unix)."
But further down the page is this gem:
"No matter what the above results show, this company may be running Microsoft IIS and protecting its Web server identity with ServerMask."
WTF?!
Yet Socrates himself is particularly missed.
A lovely little thinker but a bugger when he's pissed.
So typical of "open sores" zealots...
"EXPERTS CONFIRM: CONFIGURING OPEN SOURCE SOFTWARE IS 300% MORE DIFFICULT THAN ORIGINALLY CLAIMED"