Netcraft Web Server Stats Challenged
kolchak writes "An article in The Age has an interesting analysis of the Netcraft Web Server Usage Reports. According to Port80 Software, Netcraft's surveys are biased towards domain name parkers and very small web sites, not taking into account how popular a site may be - there's some interesting results in the competing Port80 survey." However, it should be pointed out that Port80 "develops software products to enhance the security, performance and user experience of Microsoft's Internet Information Services (IIS) Web server."
From their Partners page:
"Port80 Software's Strategic Partners:
Microsoft, Inc."
Strategic in what way? FUD?
umm, how can you claim that they are sampling correctly when your only evidence of the way they sample is by way of an app that crashes on linux/apache servers?
I am the Alpha and the Omega-3
If it wasn't so sad that people can charge $50 for what in Apache is a one-line config change, it'd be pretty funny.
It's hard to be religious when certain people are never incinerated by bolts of lightning.
You have to look at their survey. It's talking about the CORPORATE web servers. I work for a major corporate america company. We have close to 4000 servers handling our "web" environment. That consists of web, app, and database servers. There's more IIS than anything else out there for sure in corporate america. Especially on the WEB front end. In a corporate environment there are about 20 Windows boxes to 1 Unix box. Mostly because Windows servers are so cheap and can't handle as much load per server. But on the DATABASE backend there is much more UNIX than Windows.
Another thing is Corporate America is barely getting their feet wet with Linux/Apache. The UNIX boxes that are installed are not running Apache, they're running something from a major vendor (e.g. Netscape). Up until this year there was NO linux in the corporate company I work for. If a MAJOR vendor will not support a product, corporate america will not install it. They love to point the finger at the vendors. If there's nobody to point a finger at when something goes wrong, it will not get installed.
Until Redhat started selling Linux for $5k corporate america wouldn't even bat an eye at it. Now they're eating it up like hot cakes cause it's EXPENSIVE! Linux is no longer a free thing. Now powerful execs can point fingers and plus be able to throw around the "L" buzz word and feel like they're pushing the envelope.
It doesn't matter if the domain is parked or serving thousands of pages...domains are just as easily parked on IIS as on Apache.
slashdot, news for crazed liberal socialist zealots
I'll ignore for the moment the question of the quality of their data. I'm sure others will endlessly debate it (and I'll probably join in). Let's look at something else: The quality of their presentation.
First, let's take a look at the most recent Netcraft server survey. Let's see, clean display. The scale grid is subtle and doesn't draw attention to itself, but makes it easy to see exactly where a line falls. There is little wasted pixel data. It's easy to see trends and make comparisons. For the curious the exact numbers for the last two samples are listed (regrettably only two samples are listed). The graph labels the data it shows ("Market Share for Top Servers Across All Domains August 1995 - November 2003") leaving the reader to form his own opinions. On the down side, the scale confusingly marks 7% increments and the yellow line for Netscape/SunOne almost disappears into the background. Still, a well above average graph. Definitely room to improve, but better than most people expect to see.
Now let's examine the Port80 server survey. Wow, what a difference. The grid is a much more dominant element. The 3d effect means that bars further in the back appear taller (by up to 15 pixels, or about 7%) and makes it hard to compare a specific data point against the scale. The complexity of the 3d bars complicates things: the "top" of the bar is actually larger than the month to month shift in the numbers. The "area" of the bars implies size (intellectually you know it doesn't, but your gut says otherwise); this means that the largely obscured middle bars (Netscape and Apache) seem smaller. Ultimately bars are the wrong choice: we're examining points over time (suggesting a line chart), not clusters of data. The chart is labeled with a conclusion ("Microsoft IIS Maintains Dominance Of the Corporate Web Server Market"), suggesting interpretations to the reader. On the up side, they provide heavily broken up information for the most recent sample point (regrettably it's a graphic). They include a worthless pie chart. If you want to show market share, a line chart showing historical data would be much more enlightening.
Conclusion? Port80's graphs suck. Hard. It's a stunning example of how not to create high quality graphs. The creators need to be beaten with copies of Tufte's information display books until they get it. This is the sort of amateur crap I expect on PowerPoint slides from people more interested in being cool than being useful, or perhaps from the graphics department at USA Today. As an engineer I'm disappointed.
Search 2010 Gen Con events
What about boxes like the ones where I work that run many (dozens, hundreds even) domains on one physical server? That's where the real difference creeps in; it's how 60-whatever % of sites run on Linux while 60-whatever % of boxes running web servers run Windows. Lots of the Linux boxes run multiple sites (and I don't just mean www.foo.com and images.foo.com; I mean they run www.foo.com and www.bar.com and www.baz.com and www.qxt.com on the single box).
So, take one of my boxes at work: it currently hosts 53 second-level domains and about 200 subdomains from them. The one I'm thinking of has its own class C netblock, but we have similar ones that just have a single IP address for their dozens of sites. Do you want that counted as one server, as 53, or as 200? Netcraft says it's 200. Port80 says it's 1. I'd like to count it as 53. Netcraft's way tells you what people who make web hosting decisions like. Port80's way tells you what people who make hardware and software buying decisions like.
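To make the three counting rules concrete, here is a rough Python sketch. It is only an illustration of the comment above, not how Netcraft or Port80 actually count: the `Site` record, the `box` identifier, and the sample hostnames are all hypothetical, and `second_level_domain` naively keeps the last two labels (which is wrong for suffixes such as .co.uk).

```python
from collections import namedtuple

# Hypothetical record: one hostname served from one physical box.
Site = namedtuple("Site", ["hostname", "box"])

def second_level_domain(hostname):
    """Naive assumption: the last two labels form the second-level domain."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])

def count_three_ways(sites):
    """Count the same set of sites per hostname, per domain, and per box."""
    return {
        "hostnames": len({s.hostname.lower() for s in sites}),
        "second_level_domains": len({second_level_domain(s.hostname) for s in sites}),
        "boxes": len({s.box for s in sites}),
    }

# Made-up sample data: five hostnames, four domains, one machine.
sample = [
    Site("www.foo.com", "box-a"),
    Site("images.foo.com", "box-a"),
    Site("www.bar.com", "box-a"),
    Site("www.baz.com", "box-a"),
    Site("www.qxt.com", "box-a"),
]

counts = count_three_ways(sample)
print(counts)
```

The same five rows give three different "server counts", which is the whole disagreement in miniature. In practice a survey run from outside cannot see the `box` field directly, and an IP address is a poor stand-in when one machine owns a whole netblock.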
All's true that is mistrusted
while it is not a substitute for a good security policy, it is an excellent augmentation. the old saying goes that the only secure computer is one that isn't connected to the network. well, that's not really possible if yr running a web server, but you definitely don't need to advertise that you're connected... or how you're connected.
let's use a military analogy (ugh). you may put your soldiers in an armoured transport... but they still wear camouflage.
i mean, after all, we all turn off ping before we put our servers up... don't we?
2 1337 4 u!
They list the 995 sites they include (they're using the Fortune 1,000 and, looking at some of the earlier reports, apparently 5 Fortune 1,000 companies don't have sites). (If they're still Slashdotted, you can download the pages from Google's cache. Start here.)
A bit of quick Perl hackery pulls back the following values, roughly in line with what they report. The second column is actual sites found.
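For anyone who wants to repeat that kind of tally, here is a minimal sketch in Python. It is not the Perl mentioned above, just one plausible way to do it: it assumes you have already collected the HTTP `Server` header for each site, and the function names and the sample headers are made up for illustration.

```python
from collections import Counter

def server_family(server_header):
    """Reduce a Server header such as 'Apache/1.3.27 (Unix)' to 'apache'."""
    header = (server_header or "").strip()
    if not header:
        return "unknown"  # header missing or blanked out
    # Keep the product name before the version, e.g. 'Microsoft-IIS/5.0'.
    return header.split("/", 1)[0].split()[0].lower()

def tally(server_headers):
    """Count sites per server family."""
    return Counter(server_family(h) for h in server_headers)

# Illustrative headers only, not survey data.
headers = [
    "Microsoft-IIS/5.0",
    "Apache/1.3.27 (Unix)",
    "Microsoft-IIS/6.0",
    "Netscape-Enterprise/4.1",
    None,
]

result = tally(headers)
for family, count in result.most_common():
    print(f"{family:22s} {count:4d} {100.0 * count / len(headers):5.1f}%")
```

Note that this trusts whatever the server says about itself, so a site that masks or fakes its `Server` header lands in the wrong bucket.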
That said, I doubt the usefulness of the survey. It's a survey of Fortune 1,000 companies. These are often companies whose web presence is minimal. What does a giant holding company need with a web site? Heck, five of the companies didn't have any site at all! Of those sites that exist, many lack any sort of complexity (say, thousands of pages, or lots of dynamic pages). Simply put, many of these sites would run fine on almost anything; they don't represent Hard Work. I'm a lot more interested in what Google and Yahoo choose to run than in what the Radian Group and the Kiewit run.
Now Netcraft does have the problem they cite: Netcraft weights everyone equally. Perhaps that introduces bias. Perhaps we should select a set of sites that is high bandwidth, typically has at least some dynamic systems in place (say, to handle selling accounts), and is a popular target for hackers? How about porn sites? Porn operators have a hard job; thanks to Smutcraft you can see what they run.
Second, it looks like they've chosen one site for each company. For Amerco, for example, they chose UHaul.com running IIS. Reasonable enough (UHaul is part of Amerco), but it's interesting that they skipped amerco.com (running Apache). Not a great example, surely (especially since uhaul.com is certainly doing more real work than the very thin amerco.com), but it shows that there is a selection process of some sort, and any selection process risks introducing bias.
Search 2010 Gen Con events
A script kiddie might still attack you because he's just a brute forcer. Anybody with brains won't trust your server's self-identification... so who are we fooling here?