Slashdot Mirror


Netcraft Web Server Stats Challenged

kolchak writes "An article in The Age has an interesting analysis of the Netcraft Web Server Usage Reports. According to Port80 Software, Netcraft's surveys are biased towards domain name parkers and very small web sites, not taking into account how popular a site may be; there are some interesting results in the competing Port80 survey." However, it should be pointed out that Port80 "develops software products to enhance the security, performance and user experience of Microsoft's Internet Information Services (IIS) Web server."

19 of 461 comments

  1. A bit more than the average MS bias by SeanTobin · · Score: 5, Informative

    This is wrong on soooooo many levels. I could understand trying to twist the truth by redefining what a webserver is... but their sampling method is straight-out wrong.

    Want proof? Here it is. Go to the linked article, and where they have the box to check your server header (about halfway down the page), type in www.microsoft.com: you will see it's running IIS/6. A nice happy IIS server.

    Now, type in my web server, http://www.isthatdamngood.com. It's a nice Linux/Apache server. My server will CRASH their app! Actually, a lot of Linux servers will crash it...

    Kinda hard to claim your results are more indicative of the market when your scanning technology is flat-out broken.
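
For what it's worth, a header checker shouldn't fall over just because a server replies with something it didn't expect. Here is a minimal Python sketch, assuming the raw response text has already been read off the socket (server_header is a hypothetical name; this is not Port80's code, which nobody here has seen). Anything it can't parse comes back as "unknown" instead of raising an error:

```python
from typing import Optional

def server_header(raw_response: str) -> Optional[str]:
    """Return the Server header from a raw HTTP response, or None.

    Never raises on odd input: anything that isn't a recognizable HTTP
    response, or has no Server header, comes back as None ("unknown").
    """
    # The header block ends at the first blank line; accept CRLF or bare LF.
    head = raw_response.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    if not lines[0].startswith("HTTP/"):
        return None  # no HTTP status line at all
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive ("Server", "server", ...).
        if sep and name.strip().lower() == "server":
            return value.strip() or None
    return None

print(server_header("HTTP/1.1 200 OK\r\nServer: Apache/1.3.27 (Darwin)\r\n\r\n"))
print(server_header("<html>something else entirely</html>"))
```

The first call prints the Apache banner; the second prints None rather than blowing up.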

    --
    Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
    1. Re:A bit more than the average MS bias by _xeno_ · · Score: 5, Informative
      Worked for me. I tried "slashdot.org" and "www.theregister.co.uk", and both of them worked just fine. However, "www.isthatdamngood.com" did indeed cause a scripting error, but I doubt it would affect their actual surveying; it's just an ASP error, not an actual "crash."

      Anyway, it's long been known that Netcraft's methods are flawed, since it counts an individual web server multiple times, once for each virtual domain. It should only count unique sites. (For example, Slashdot counts as something like 13 sites: the individual sections (like apple.slashdot.org; I'm not listing all of them), slashdot.org, www.slashdot.org, and images.slashdot.org.)

      It's still debatable what the correct survey method is (and whether Port80's method is any better), but Netcraft is biased towards sites with lots of virtual domain names. (I'd imagine SourceForge gets counted many times, too...) Of course, it's also questionable whether individual servers in a round-robin load-balancing solution should be counted, so counting by IP instead of domain name has its own problems.
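
To make the counting problem concrete, here is a minimal Python sketch over made-up survey records (every hostname, IP, and server in it is hypothetical). The same five records give opposite answers depending on whether you count distinct hostnames, which is how this comment says Netcraft counts, or distinct IP addresses:

```python
from collections import Counter

# Hypothetical survey records: (hostname, IP address, Server product).
# Three virtual hosts share one Apache box; one IIS site sits behind
# two round-robin addresses.
records = [
    ("example.org", "10.0.0.1", "Apache"),
    ("www.example.org", "10.0.0.1", "Apache"),
    ("images.example.org", "10.0.0.1", "Apache"),
    ("www.example.com", "10.0.0.2", "Microsoft-IIS"),
    ("www.example.com", "10.0.0.3", "Microsoft-IIS"),
]

def count_by(records, key_index):
    """Count servers, treating records that share a key as one site.

    key_index 0 = count distinct hostnames; 1 = count distinct IPs.
    """
    seen = {}
    for record in records:
        seen[record[key_index]] = record[2]
    return Counter(seen.values())

print(count_by(records, 0))  # by hostname
print(count_by(records, 1))  # by IP
```

Apache wins 3 to 1 by hostname and IIS wins 2 to 1 by IP, purely because of virtual hosting on one side and round-robin addresses on the other.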

      As is often said, "there are lies, damned lies, and statistics" - any counting method has issues.

      Blah, I can't preview because Mozilla is f***ing broken and won't display the preview page, so please pardon any typos.

      --
      You are in a maze of twisty little relative jumps, all alike.
    2. Re:A bit more than the average MS bias by damiam · · Score: 5, Insightful

      If it wasn't so sad that people can charge $50 for what in Apache is a one-line config change, it'd be pretty funny.

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
    3. Re:A bit more than the average MS bias by orthogonal · · Score: 5, Funny

      So.... If you are running MS IIS your best security measure is to pretend to be running Apache?

      No. It's to wave your hands and intone "These are not the servers you're looking for."

      It requires the Obi Wan Server Mask, however.

    4. Re:A bit more than the average MS bias by boneshintai · · Score: 5, Interesting

      i mean, after all, we all turn off ping before we put our servers up... don't we?

      No, as a matter of fact I don't turn off ECHO responses on boxes I manage. I prefer to be able to tell if an operating system or TCP/IP stack has fallen over without having to go over and hook up a console. I'm actually rather annoyed at certain ISPs for continuing to block ping even after Welchia and Slammer have mostly abated.

      Which is not to say you can't turn off pings on your boxes, but neither your preference nor mine is everyone's preference.

    5. Re:A bit more than the average MS bias by panaceaa · · Score: 5, Informative

      The parent poster's point is that their site grabber program can get IIS sites but crashes on some Apache sites. Port80 Software may use the same code to run their surveys since both the grabber and survey programs need the core feature of analyzing a site's HTTP headers.

      So if their survey script also returns invalid data for Apache sites, then the IIS numbers would be much higher than they actually are. I would at least like to see some actual numbers rather than pure percents before I believed their data. They surveyed 1000 sites -- how many sites are included in the survey's data?
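
A quick Python illustration of that worry, with entirely hypothetical numbers: if a checker silently drops half of the Apache sites it chokes on, the IIS share goes up even though no extra IIS site was found.

```python
def shares(counts):
    """Convert raw counts into percentage shares of whatever was counted."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}

# Hypothetical "true" makeup of a 1,000-site sample.
true_counts = {"Microsoft-IIS": 500, "Apache": 300, "Other": 200}

# Assume the checker fails on half of the Apache sites and those sites
# are silently dropped instead of being reported as errors.
apache_failure_rate = 0.5
observed = dict(true_counts)
observed["Apache"] = round(true_counts["Apache"] * (1 - apache_failure_rate))

for label, counts in (("true", true_counts), ("observed", observed)):
    pcts = shares(counts)
    print(label, sum(counts.values()),
          {name: round(p, 1) for name, p in pcts.items()})
```

With these made-up numbers IIS moves from 50% to about 58.8% of what was measured, while the total quietly shrinks from 1,000 to 850. That is why raw counts, including how many sites were actually measured, say more than percentages alone.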

      Another thing that seems odd to me is Netscape iPlanet usage is higher than Apache. Where's the primary data to support that?

    6. Re:A bit more than the average MS bias by timeOday · · Score: 5, Insightful

      A script kiddie might still attack you because he's just a brute forcer. Anybody with brains won't trust your server's self-identification... so who are we fooling here?

  2. I tried homepage.apple.com by fidget42 · · Score: 5, Interesting

    and this was their response:

    We detect that homepage.mac.com is running Apache/1.3.27 (Darwin).

    but with this caveat

    Note:
    No matter what the above results show, this company may be running Microsoft IIS and protecting its Web server identity with ServerMask.

    Nope, no bias there.

    --
    The dogcow says "Moof!"
  3. LOL by javiercero · · Score: 5, Interesting

    It is not only funny that according to their "survey" IIS has more market share than Apache, but *gasp* Netscape has a larger market share than Apache too!

    That is as big of a red flag as I have ever seen.

    Of course, the fact that they produce software for IIS is in no way, shape, or form any sort of indication of a possible, slight, minimal... bias.

    LOL, a nice laugh... and they may even get slashdotted, which will bring joy to their sorry operation, since they will then be able to claim that they are one of the net's most popular companies/sites. I am sure this is some sort of ploy to get traffic; it will be funny to see if their beloved IIS can indeed stand the Slashdot effect. LOL

  4. Re:Not so inaccurate .. by Prof.+Pi · · Score: 5, Interesting
    it makes no sense to measure market share by simply counting web hosts. If all the high-traffic web sites on the Internet are running IIS while the numerically greater but less popular remainder are running Apache, can you meaningfully say that Apache has a higher 'market share'?

    Didn't Netcraft themselves cover this topic last year? IIRC, some pro-MS group made the same argument, that you should only count the big guys. They looked at the Fortune N (I forget what N was) and found that lo and behold, IIS came out on top.

    Then Netcraft came back with another study, where they ranked companies not by their Fortune ranking (i.e., total revenue), which would tend to favor MS as that's the "safe" choice for big companies. Instead, they ranked companies by how much revenue they made on the Net (so companies like Amazon would rank much higher), and found that by that measure, Apache was again on top.
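
The whole disagreement comes down to weighting. Here is a minimal Python sketch with a hypothetical sample (eight small Apache sites, two big IIS sites, weights invented for illustration) showing that the same data can hand either server "the lead", depending on whether every site counts once or counts in proportion to its traffic or revenue:

```python
from collections import defaultdict

def share(sites, weighted):
    """Return {server: percent} for a list of (server, weight) pairs.

    weighted=False counts every site once (equal weighting);
    weighted=True counts each site in proportion to its weight,
    e.g. its traffic or revenue.
    """
    totals = defaultdict(float)
    for server, weight in sites:
        totals[server] += weight if weighted else 1.0
    grand_total = sum(totals.values())
    return {server: 100.0 * t / grand_total for server, t in totals.items()}

# Hypothetical sample: eight small Apache sites (weight 1 each) and
# two large IIS sites (weight 40 each). The weights are invented.
sites = [("Apache", 1.0)] * 8 + [("Microsoft-IIS", 40.0)] * 2

print(share(sites, weighted=False))
print(share(sites, weighted=True))
```

Counting each site once gives Apache 80%; weighting by the assumed figures gives IIS about 90.9%. Which weight you pick (total revenue, online revenue, traffic) largely decides the headline.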

  5. Something smells... by pridefinger · · Score: 5, Interesting

    I tried several sites myself with my own javascript and guess what?

    My results were different from theirs more than half the time! I figured they had multiple servers running, etc., so I rechecked every site (about 50 in all) at least 5 times...NO CHANGE!

    Take disney.com, for example. Their site says IIS 5.0. I got Netscape...so did Netcraft.

    One word... BULL#%&*!

    -Pride

    1. Re:Something smells... by a.koepke · · Score: 5, Interesting
      I just checked this too... Port80 displays MS IIS and Netcraft displays Netscape. I thought I would do my own check, and it shows a flaw in both checks, Netcraft's and Port80's.

      andreas:/var/mail# telnet disney.com 80
      Trying 198.187.189.55...
      Connected to disney.com.
      Escape character is '^]'.
      HEAD / HTTP/1.0

      HTTP/1.1 302 Moved Temporarily
      Server: Netscape-Enterprise/3.6 SP3
      Date: Thu, 27 Nov 2003 06:44:12 GMT
      Location: http://disney.go.com/
      Content-length: 0
      Content-type: text/html
      Connection: close

      Connection closed by foreign host.
      andreas:/var/mail# telnet disney.go.com 80
      Trying 198.187.189.93...
      Connected to disney.go.com.
      Escape character is '^]'.
      HEAD / HTTP/1.0

      HTTP/1.0 200 OK
      Server: Microsoft-IIS/5.0
      P3P: CP="CAO DSP COR CURa ADMa DEVa TAIa PSAa PSDa IVAi IVDi CONi OUR SAMo OTRo BUS PHY ONL UNI PUR COM NAV INT DEM CNT STA PRE"
      Set-Cookie: SWID=E4481904-1BC1-4D6B-A21F-5FB993D69628; path=/; expires=Thu, 27-Nov-2023 06:44:39 GMT; domain=.go.com;
      Cache-Expires: Thu, 27 Nov 2003 06:47:13 GMT
      Cache-Control: max-age=300
      Date: Thu, 27 Nov 2003 06:44:39 GMT
      Content-Type: text/html
      Accept-Ranges: bytes
      Last-Modified: Thu, 27 Nov 2003 06:42:13 GMT
      ETag: "ba9b4197b1b4c31:b10"
      Content-Length: 6260
      Vary: Accept-Encoding, User-Agent
      Via: 1.1 redline-7 (Redline Networks Accelerator 2.2.8 0)

      Connection closed by foreign host.


      Interesting: Disney.com is a Netscape web server that just sends a 302 Moved header, redirecting the client to Disney.go.com, which is an IIS box.

      So the actual Disney site you end up at (Disney.go.com) is IIS, and in that sense Port80 is sort of right to report it as such. But Netcraft is also right to report Netscape for the Disney.com domain, since that is what Disney.com is running; Disney.go.com is a separate domain and would be counted separately.
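
The answer to "what does disney.com run?" depends on whether a checker stops at the first response or follows the redirect. Here is a minimal Python sketch that records the Server header at every hop, assuming a caller-supplied fetch function as a stand-in for a real HEAD request (server_chain and the fetcher are hypothetical names). The canned replies simply replay the two telnet responses above; nothing is queried live:

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin

# A fetcher maps a URL to (status code, response headers). In real use it
# would wrap an HTTP HEAD request; here it is left abstract.
Fetcher = Callable[[str], Tuple[int, Dict[str, str]]]

def server_chain(url: str, fetch: Fetcher,
                 max_hops: int = 5) -> List[Tuple[str, Optional[str]]]:
    """Follow 3xx redirects, recording the Server header at each hop."""
    chain: List[Tuple[str, Optional[str]]] = []
    for _ in range(max_hops):  # bound the walk so redirect loops terminate
        status, headers = fetch(url)
        # HTTP header names are case-insensitive.
        lowered = {k.lower(): v for k, v in headers.items()}
        chain.append((url, lowered.get("server")))
        location = lowered.get("location")
        if not (300 <= status < 400 and location):
            break
        url = urljoin(url, location)  # Location may be relative
    return chain

# Canned replies replaying the two telnet responses above (not live data).
replies = {
    "http://disney.com/": (302, {"Server": "Netscape-Enterprise/3.6 SP3",
                                 "Location": "http://disney.go.com/"}),
    "http://disney.go.com/": (200, {"Server": "Microsoft-IIS/5.0"}),
}
for hop_url, server in server_chain("http://disney.com/", replies.__getitem__):
    print(hop_url, "->", server)
```

The first hop reports Netscape-Enterprise and the second reports Microsoft-IIS. Both are "correct" answers to different questions, which is the Netcraft/Port80 split described above.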
      --


      (\(\
      (^.^)
      (")")
      *This is the cute bunny virus, please copy this into your sig so it can spread
  6. Like that's going to work by BigRedFish · · Score: 5, Informative

    a product .... to confuse script kiddies

    I am running Apache on Linux, and I still get 1000 hits a day trying to crack MSADC with buffer overflows, and FrontPage exploit attempts. It's not like the script kiddies check the server ID or pay any attention to it even if they do.

  7. Cheap and flashy graphics by ChaosDiscord · · Score: 5, Insightful

    I'll ignore for the moment the question of the quality of their data. I'm sure others will endlessly debate it (and I'll probably join in). Let's look at something else: The quality of their presentation.

    First, let's take a look at the most recent Netcraft server survey. Let's see: clean display. The scale grid is subtle and doesn't draw attention to itself, but makes it easy to see exactly where a line falls. There is little wasted pixel data. It's easy to see trends and make comparisons. For the curious, the exact numbers for the last two samples are listed (regrettably, only two samples are listed). The graph labels the data it shows ("Market Share for Top Servers Across All Domains August 1995 - November 2003"), leaving the reader to form his own opinions. On the down side, the scale confusingly marks 7% increments, and the yellow line for Netscape/SunOne almost disappears into the background. Still, a well-above-average graph. Definitely room to improve, but better than most people expect to see.

    Now let's examine the Port80 server survey. Wow, what a difference. The grid is a much more dominant element. The 3D effect means that bars further in the back appear taller (by up to 15 pixels, or about 7%) and makes it hard to compare a specific data point against the scale. The complexity of the 3D bars complicates things further: the "top" of a bar is actually larger than the month-to-month shift in the numbers. The "area" of the bars implies size (intellectually you know it doesn't, but your gut says otherwise), which means that the largely obscured middle bars (Netscape and Apache) seem smaller. Ultimately bars are the wrong choice: we're examining points over time (suggesting a line chart), not clusters of data. The chart is labeled with a conclusion ("Microsoft IIS Maintains Dominance Of the Corporate Web Server Market"), suggesting an interpretation to the reader. On the up side, they provide a detailed breakdown for the most recent sample point (regrettably, as a graphic). They also include a worthless pie chart. If you want to show market share, a line chart showing historical data would be much more enlightening.

    Conclusion? Port80's graphs suck. Hard. It's a stunning example of how not to create high quality graphs. The creators need to be beaten with copies of Tufte's information display books until they get it. This is the sort of amateur crap I expect on PowerPoint slides from people more interested in being cool than being useful, or perhaps from the graphics department at USA Today. As an engineer I'm disappointed.

  8. Numbers look legit, but of questionable value. by ChaosDiscord · · Score: 5, Insightful

    They list the 995 sites they include. They're using the Fortune 1,000, and (looking at some of the earlier reports) apparently 5 Fortune 1,000 companies don't have sites. (If they're still Slashdotted, you can download the pages from Google's cache.)

    A bit of quick Perl hackery pulls back the following values, roughly in line with what they report. The second column is the actual number of sites found.

    Share   Sites  Server
    54.0%    537   Microsoft-IIS
    18.2%    182   Netscape-Enterprise
    16.1%    161   Apache
     3.6%     36   OTHER
     3.4%     34   IBM_HTTP_SERVER
     2.7%     27   UNKNOWN
     1.8%     18   Lotus-Domino
             995   TOTAL
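
The Perl itself wasn't posted, so here is a rough Python equivalent of that kind of tally, assuming you already have one raw Server header per site (tally and print_table are hypothetical names). It groups headers by the product name before the first "/". It does not reproduce the OTHER bucket above, which would need an explicit list of which products to lump together.

```python
from collections import Counter

def tally(server_headers):
    """Group raw Server headers by product name.

    Returns (percent, count, product) rows, largest first. The product is
    the text before the first "/", so "Apache/1.3.27 (Darwin)" and
    "Apache/2.0.48-dev (Unix)" both count as "Apache". Missing or empty
    headers are counted as UNKNOWN.
    """
    families = Counter()
    for header in server_headers:
        product = header.split("/", 1)[0].strip() if header else ""
        families[product or "UNKNOWN"] += 1
    total = sum(families.values())
    return [(100.0 * n / total, n, name) for name, n in families.most_common()]

def print_table(rows):
    """Print rows in a percent / count / name layout with a TOTAL line."""
    for pct, n, name in rows:
        print(f"{pct:5.1f}% {n:4d} {name}")
    print(f"{'':6} {sum(n for _, n, _ in rows):4d} TOTAL")

print_table(tally(["Microsoft-IIS/5.0", "Microsoft-IIS/6.0",
                   "Apache/1.3.27 (Darwin)", "Netscape-Enterprise/3.6 SP3",
                   None]))
```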

    That said, I doubt the usefulness of the survey. It's a survey of Fortune 1,000 companies. These are often companies whose web presence is minimal. What does a giant holding company need with a web site? Heck, five of the companies didn't have any site at all! Of those sites that exist, many lack any sort of complexity (say, thousands of pages, or lots of dynamic pages). Simply put, many of these sites would run fine on almost anything; they don't represent Hard Work. I'm a lot more interested in what Google and Yahoo choose to run than in what the Radian Group and Kiewit run.

    Now, Netcraft does have the problem they cite: Netcraft weights everyone equally. Perhaps that introduces bias. Perhaps we should select a set of sites that is high-bandwidth, typically has at least some dynamic systems in place (say, to handle selling accounts), and is a popular target for hackers? How about porn sites? Porn operators have a hard job; thanks to Smutcraft, you can see what they run.

    Second, it looks like they've chosen one site for each company. For Amerco, for example, they chose UHaul.com running IIS. Reasonable enough (UHaul is part of Amerco), but it's interesting that they skipped amerco.com (running Apache). Not a great example, surely (especially since uhaul.com is certainly doing more real work than the very thin amerco.com), but it shows that there is a selection process of some sort, and any selection process risks introducing bias.

  9. Free Software Wins again. by Anonymous Coward · · Score: 5, Informative
    and what would that one line be? I want my $50 worth on my apache server


    • Unpack the Apache distro file (apache_1.x.xx.tar.gz) and run the configure script.

      Now do the following commands:

    • cd src/os/unix
      (With Apache 2.x, cd os/unix)
    • vi os.h
    • Search for:
      #define PLATFORM "Unix"
    • Replace "Unix" with whatever you want your OS identification to be. (Some of the more creative ones I've done are 'NachOS,' 'PathOS,' 'StratOS,' 'ZerOS,' and 'WinDos'...anything.)
    • Save the file.
    • cd ../../include
    • vi httpd.h
      (With Apache 2.x, vi ap_release.h)
    • Search for:
      #define SERVER_BASEVENDOR "Apache Group"
      #define SERVER_BASEPRODUCT "Apache"
      #define SERVER_BASEREVISION "1.x.xx"
    • Replace "Apache" and "1.x.xx" with whatever you want your Server and version number to be. (I recommend "Port80Software-Is-A-Fucking-Ripoff" and "Holy-Jumping-Jesus-This-Was-Easy", respectively.)
    • Save the file.
    • cd ../..
      (With Apache 2.x, cd ..)
    • make

    You're done. Congratulations. You just saved yourself $49!!!
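
If you'd rather script the editing steps than do them in vi, a small Python sketch could rewrite the macros in place. This assumes the macro names and file layout shown in the steps above (they differ between 1.x and 2.x trees, so check your own headers first); set_define is a hypothetical helper, not part of Apache's build:

```python
import re

def set_define(source: str, name: str, new_value: str) -> str:
    """Return source with the string in '#define NAME "..."' replaced.

    Raises ValueError if the macro isn't found, so a typo in the macro
    name can't silently leave the old banner in place.
    """
    # Escape backslashes and quotes so the result is still a valid C string.
    escaped = new_value.replace("\\", "\\\\").replace('"', '\\"')
    pattern = re.compile(
        r'^([ \t]*#[ \t]*define[ \t]+' + re.escape(name) + r'[ \t]+)"[^"]*"',
        re.MULTILINE,
    )
    # A function replacement stops re from treating backslashes as escapes.
    updated, count = pattern.subn(lambda m: m.group(1) + '"' + escaped + '"', source)
    if count == 0:
        raise ValueError(f"#define {name} not found")
    return updated

# Example (1.x layout, matching the steps above; adjust for 2.x):
# with open("src/include/httpd.h") as f:
#     text = f.read()
# text = set_define(text, "SERVER_BASEPRODUCT", "MyServer")
# with open("src/include/httpd.h", "w") as f:
#     f.write(text)
```

After writing the edited headers back out, you still run make as in the last step.
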
    1. Re:Free Software Wins again. by ivan.ristic · · Score: 5, Informative

      If you're using mod_security on your Apache server, you only need to add one line to the configuration file:

      SecServerSignature "MyServer/19.5.1"

  10. Yes they are... check this out by imtheguru · · Score: 5, Funny

    I tried their header check for www.apache.org.

    Port80 returned this result:
    "We detect that www.apache.org is running Apache/2.0.48-dev (Unix)."

    But further down the page is this gem:
    "No matter what the above results show, this company may be running Microsoft IIS and protecting its Web server identity with ServerMask."

    WTF?!

    --
    Yet Socrates himself is particularly missed.
    A lovely little thinker but a bugger when he's pissed.
    1. Re:Yes they are... check this out by kyrre · · Score: 5, Interesting

      Apparently ServerMask is their product. When I try a site I know is running IIS, the response is like so:

      Protect your Web server identity with ServerMask!
      Why let anyone find out you're running a Microsoft IIS server? Don't tempt potential hackers!

      Try ServerMask FREE for 30 days. Download Now!
      Buy ServerMask for only $49.95 today!


      What you don't get is: "No matter what the above results show, this company may be running Apache and protecting its Web server identity with ServerMask."

      Security through masking the server string sounds very secure. sigh.