Slashdot Mirror


Netcraft Web Server Stats Challenged

kolchak writes "An article in The Age has an interesting analysis of the Netcraft Web Server Usage Reports. According to Port80 Software, Netcraft's surveys are biased towards domain name parkers and very small web sites, not taking into account how popular a site may be - there's some interesting results in the competing Port80 survey." However, it should be pointed out that Port80 "develops software products to enhance the security, performance and user experience of Microsoft's Internet Information Services (IIS) Web server."

4 of 461 comments (clear)

  1. Re:A bit more than the average MS bias by damiam · · Score: 5, Insightful

    If it wasn't so sad that people can charge $50 for what in Apache is a one-line config change, it'd be pretty funny.

    --
    It's hard to be religious when certain people are never incinerated by bolts of lightning.
  2. Cheap and flashy graphics by ChaosDiscord · · Score: 5, Insightful

    I'll ignore for the moment the question of the quality of their data. I'm sure others will endlessly debate it (and I'll probably join in). Let's look at something else: The quality of their presentation.

    First, let's take a look at the most recent Netcraft server survey. Let's see, clean display. The scale grid is subtle and doesn't draw attention to itself, but makes it easy to see exactly where a line falls. There is little wasted pixel data. It's easy to see trends and make comparisons. For the curious the exact numbers for the last two samples is listed (regrettably one two samples are listed). The graph labels the data it shows ("Market Share for Top Servers Across All Domains August 1995 - November 2003") leaving the reader to form his own opinions. On the down side, the scale confusingly marks 7% increments and the yellow line for Netscape/SunOne almost disappears into the background. Still, a well above average for graph. Definately room to improve, but better than most people expect to see.

    Now let's example the Port80 server survey. Wow, what a difference. The grid is a much more dominant element. The 3d effect means that bars further in the back appear taller (by up to 15 pixels, or about 7%) and makes it hard to compare a specific data point against the scale. The complexity of the 3d bars complicates things, the "top" of the bar is actually larger than the month to month shift in the numbers. The "area" of the bars implies size (intellectually you know it isn't, but your gut says otherwise), this means that the largely obscured middle bars (Netscape and Apache) seem smaller. Ultimately bars are the wrong choice, we're examining points over time (suggesting a line chart), not clusters of data. The chart is labeled with a conclusion ("Microsoft IIS Maintains Dominance Of the Corporate Web Server Market"), suggesting interpretations to the reader. On the up side, they provide heavily broken up information for the most recent sample point (regrettably it's a graphic). They include a worthless pie chart. If you want to show market share a line chart showing historical data would be much more enlightening.

    Conclusion? Port80's graphs suck. Hard. It's a stunning example of how not to create high quality graphs. The creators need to be beaten with copies of Tufte's information display books until they get it. This is the sort of amateur crap I expect on PowerPoint slides from people more interested in being cool than being useful, or perhaps from the graphics department at USA Today. As an engineer I'm disappointed.

  3. Numbers look legit, but of questionable value. by ChaosDiscord · · Score: 5, Insightful

    They list the 995 sites they include (they're using the Fortune 1,000, and (looking at some of the earlier reports), apparently 5 Fortune 1,000 companies don't have sites. (If they're still Slashdotted, you can download the pages from Google's cache. start here.)

    A bit of quick Perl hackery pulls back the following values, roughly in line with what they report. The second column is actual sites found.

    54.0% 537 Microsoft-IIS
    18.2% 182 Netscape-Enterprise
    16.1% 161 Apache
    _3.6% _36 OTHER
    _3.4% _34 IBM_HTTP_SERVER
    _2.7% _27 UNKNOWN
    _1.8% _18 Lotus-Domino
    _____ 995 TOTAL

    That said, I doubt the usefulness of the survey. It's a survey of Fortune 1,000 companies. These are often companies whose web presence is minimal. What does a giant holding company need with a web site? Heck, five of the companies didn't have any site at all! Of those sites that exist, many lack any sort of complexity (say, thousands of pages, or lots of dynamic pages). Simply put, many of these sites would run fine an almost anything, they don't represent Hard Work. I'm a lot more interested in what Google and Yahoo choose to run than in what the Radian Group and the Kiewit run.

    Now Netcraft does have the problem they cite: Netcraft weights everyone equally. Perhaps that introduces bias. Perhaps we should select a set of sites that is high bandwidth, typically has at least some dynamic systems in place (say, to handle selling accounts), and is a popular target for hackers? How about porn sites? Porn operators have a hard job, thanks to Smutcraft you can see what they run.

    Second, it looks like they've chosen one site for each company. For Amerco, for example, they chose UHaul.com running IIS. Reasonable enough (UHaul is part of Amerco), but it's interesting that they skipped amerco.com (running Apache). Not a great example, surely (especially since uhaul.com is certainly doing more real work than the very thin amerco.com), but it shows that there is a selection process of some sort, and any selection process risks introducing bias.

  4. Re:A bit more than the average MS bias by timeOday · · Score: 5, Insightful

    A script kiddie might still attack you because he's just a brute forcer. Anybody with brains won't trust your server's self-identification... so who are we fooling here?