Slashdot Mirror


Netcraft Web Server Stats Challenged

kolchak writes "An article in The Age has an interesting analysis of the Netcraft Web Server Usage Reports. According to Port80 Software, Netcraft's surveys are biased towards domain name parkers and very small web sites, not taking into account how popular a site may be - there are some interesting results in the competing Port80 survey." However, it should be pointed out that Port80 "develops software products to enhance the security, performance and user experience of Microsoft's Internet Information Services (IIS) Web server."

24 of 461 comments

  1. A bit more than the average MS bias by SeanTobin · · Score: 5, Informative

    This is wrong on soooooo many levels. I could understand trying to twist the truth by redefining what a webserver is... but their sampling method is straight out wrong.

    Want proof? Here it is. Go to the linked article (or click here) and where they have the box to check your server header (about half way down the page) type in www.microsoft.com - you will see it's running IIS/6. A nice happy IIS server.

    Now, type in my web server - http://www.isthatdamngood.com - it's a nice Linux/Apache server. My server will CRASH their app! Actually, a lot of Linux servers will crash it...

    Kinda hard to claim your results are more indicative of the market when your scanning technology is flat out broken.

    --
    Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
    1. Re:A bit more than the average MS bias by ejaw5 · · Score: 4, Informative

      Check out the ad below the detection test:

      Note:
      No matter what the above results show, this company may be running Microsoft IIS and protecting its Web server identity with ServerMask.

      Try ServerMask FREE for 30 days. Download Now!
      Buy ServerMask for only $49.95 today!

      --

      $cat /dev/random > Sig
    2. Re:A bit more than the average MS bias by _xeno_ · · Score: 5, Informative
      Worked for me. I tried "slashdot.org" and "www.theregister.co.uk" - both of them worked just fine. However, "www.isthatdamngood.com" did indeed cause a scripting error - but I doubt it would affect their actual surveying, it's just an ASP error, not an actual "crash."

      Anyway, it's long been known that Netcraft's methods are flawed, since it counts individual web servers multiple times for each virtual domain. It should only count unique sites. (For example, Slashdot counts for something like 13 sites - the individual sections (like apple.slashdot.org - I'm not listing all of them), slashdot.org, www.slashdot.org, images.slashdot.org.)

      It's still debatable what the correct survey method is (and whether Port80's method is any better), but Netcraft is biased towards sites with lots of virtual domain names. (I'd imagine SourceForge gets counted many times, too...) Of course, it's also questionable if individual servers in a round-robin load-balancing solution should be counted, so counting by IP instead of domain name is questionable too.
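
A minimal sketch of the counting problem described above, assuming a made-up sample. The hostnames echo the Slashdot example, but the IP addresses, the server labels, and the `share_by_hostname` / `share_by_ip` helpers are all hypothetical and are not Netcraft's or Port80's actual method.

```python
from collections import Counter

# Hypothetical sample: (hostname, ip_address, server_software).
# The IPs are documentation addresses and the server labels are made up.
SAMPLE = [
    ("slashdot.org", "192.0.2.10", "Apache"),
    ("www.slashdot.org", "192.0.2.10", "Apache"),
    ("images.slashdot.org", "192.0.2.10", "Apache"),
    ("apple.slashdot.org", "192.0.2.10", "Apache"),
    ("www.example.com", "192.0.2.20", "Microsoft-IIS"),
    ("www.example.net", "192.0.2.30", "Microsoft-IIS"),
]


def share_by_hostname(rows):
    """Count every hostname once, so each virtual domain is a separate vote."""
    return Counter(server for _host, _ip, server in rows)


def share_by_ip(rows):
    """Count every distinct IP address once, whatever names point at it."""
    server_for_ip = {}
    for _host, ip, server in rows:
        # Keep the first server seen for an IP; later names are ignored.
        server_for_ip.setdefault(ip, server)
    return Counter(server_for_ip.values())


if __name__ == "__main__":
    print("by hostname:", dict(share_by_hostname(SAMPLE)))
    print("by IP:      ", dict(share_by_ip(SAMPLE)))
```

With this sample, counting by hostname gives Apache 4 and IIS 2, while counting by IP gives Apache 1 and IIS 2. Same machines, opposite headline.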

      As is often said, "there are lies, damned lies, and statistics" - any counting method has issues.

      Blah, I can't preview because Mozilla is f***ing broken and won't display the preview page, so please pardon any typos.

      --
      You are in a maze of twisty little relative jumps, all alike.
    3. Re:A bit more than the average MS bias by servoled · · Score: 2, Informative

      I never claimed that their sampling method was correct. I only claimed that there is insufficient evidence to say that it is incorrect, especially when the evidence presented tells absolutely nothing about the sampling method in question (ie, which sites they chose to sample, how many times they sample the sites, what weighting they give to each site, etc...).

      From the evidence at hand all you can say is that they aren't the best ASP/SQL programmers which is completely unrelated to the sampling of websites from a statistical point of view.

      Please take the time to carefully read a post before responding.

      --
      "I have a porkchop, you have a porkchop. I have a veal, you have a veal".
    4. Re:A bit more than the average MS bias by panaceaa · · Score: 5, Informative

      The parent poster's point is that their site grabber program can get IIS sites but crashes on some Apache sites. Port80 Software may use the same code to run their surveys since both the grabber and survey programs need the core feature of analyzing a site's HTTP headers.

      So if their survey script also returns invalid data for Apache sites, then the IIS numbers would be much higher than they actually are. I would at least like to see some actual numbers rather than pure percents before I believed their data. They surveyed 1000 sites -- how many sites are included in the survey's data?

      Another thing that seems odd to me is Netscape iPlanet usage is higher than Apache. Where's the primary data to support that?

    5. Re:A bit more than the average MS bias by tkittel · · Score: 2, Informative

      > Kinda hard to claim your results are more indicitative
      > of the market when your scanning technology is flat out broken.

      Worse than broken.

      I just checked www.fys.ku.dk and www.nbi.dk which are running on some old unix. But Port80 happily claims Microsoft-IIS/5.0. (Netcraft sees them correctly).

      Now that is just plain cheating!

    6. Re:A bit more than the average MS bias by Chris-Port80 · · Score: 2, Informative

      Thanks for catching a bug in Port80's real-time header check tool. We will look into the tool's SQL error on the URL www.isthatdamngood.com.

      That's not too damn good...

      Our online tools are not perfect, but they do work for most Apache sites. For instance, here is another version of the tool and a report for apache.org:

      http://www.port80software.com/products/httpzip/compresscheck?url=www.apache.org

      The actual Web server survey (www.port80software.com/surveys/top1000webservers) is conducted by another offline tool developed in Python by Port80's folks. Our published results have been verified independently on this thread today for the Fortune 1000 sites -- in terms of the current and ongoing Web server market share among the main corporate sites of Fortune 1000 companies.

      Here's the methodology we followed (http://www.port80software.com/surveys/top1000webservers/methodology), and the results from our November survey can be accessed online in our archive reports:

      http://www.port80software.com/surveys/top1000webservers/#checkacompanyout

      Happy Turkey Day,

      Chris @ Port80

  2. Re:This makes sense.. by An+Anonymous+Hero · · Score: 2, Informative
    Netcraft's method *is* unfair, because there's no weight as to the location to which the domains point.

    What's the alternative, counting by IP? It could be interesting, but not necessarily more representative. I'm on a shared host with dozens of other domains: by choosing that host, we 'cast votes' for Apache, didn't we?

  3. Hahaha! Yep, good old IIS is SOOOO reliable by Anonymous Coward · · Score: 0, Informative

    Trying to access http://port80software.com/:

    Microsoft OLE DB Provider for ODBC Drivers error '80040e31'

    [Microsoft][ODBC SQL Server Driver]Timeout expired /includes/Referer.asp, line 7

    Hahahah! Yeah, I'll trust ANYTHING those MS lackeys have to say.

  4. Like that's going to work by BigRedFish · · Score: 5, Informative

    a product .... to confuse script kiddies

    I am running Apache on Linux, and I still get 1000 hits a day trying to crack MSADC with buffer overflows, and FrontPage exploit attempts. It's not like the script kiddies check the server ID or pay any attention to it even if they do.

  5. Ok, so use the surveys at securityspace.com by Anonymous Coward · · Score: 2, Informative

    The surveys at securityspace.com attempt to weight webserver popularity by site popularity.
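
To make "weight by site popularity" concrete, here is a rough sketch. It is not securityspace.com's actual method, which the thread does not describe; the `weighted_share` helper and the hit counts are assumptions for illustration only.

```python
def weighted_share(sites):
    """Return each server's share of total traffic as a fraction of 1.

    `sites` is a list of (server_software, hits) pairs, one pair per site.
    """
    total = sum(hits for _server, hits in sites)
    if total == 0:
        raise ValueError("no traffic recorded, shares are undefined")
    hits_by_server = {}
    for server, hits in sites:
        hits_by_server[server] = hits_by_server.get(server, 0) + hits
    return {server: hits / total for server, hits in hits_by_server.items()}


# Hypothetical data: one busy Apache site and two quiet IIS sites.
SITES = [("Apache", 900), ("Microsoft-IIS", 50), ("Microsoft-IIS", 50)]

if __name__ == "__main__":
    print(weighted_share(SITES))
```

Counted per site, IIS holds two of the three sites in this made-up sample. Weighted by hits, it holds a tenth of the traffic. Which number is "market share" is exactly the argument in this thread.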

  6. A good methodology by cgenman · · Score: 4, Informative

    If you are conducting a survey to find out what is the "best of the best" in server software, why survey Family Dollar Store? Or Land O'Lakes? You should be choosing technically savvy, solution-neutral companies that are likely to choose the best. These are the actual companies that have a big web presence and you would not expect them to choose a platform which would affect their bottom line badly... As opposed to Sears Roebuck, whose online presence can be compared to Amazon's retail presence. Would we ask Amazon how to organize endcaps? Let's pick a few technically adept companies at random here...

    Amazon - Apache
    AT&T - Netscape
    Bell South - Apache
    Cisco - Unix
    Dell - IIS5
    Earthlink - Netscape
    E-Bay - IIS4
    HP - Apache
    Intel - IIS6
    Lucent - Netscape
    Motorola - Apache
    National Semiconductor - Netscape
    Nextel - Netscape
    Qualcomm - Netscape
    PC Connection - IIS5

    I can't survey any more companies, because Port80's IIS6 server is slashdotted. However, it is apparent from this data that 40% (6 of 15) of all websites that count are hosted on Netscape platforms. Apache and IIS share roughly 1/4th each, and Cisco's odd Unix variant wraps up the rest.
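
For anyone who wants to check the arithmetic, a quick tally of the fifteen results listed above. The data is copied from the list; the `family` and `tally` helpers are just illustrative names, and lumping IIS4, IIS5 and IIS6 together as "IIS" is an assumption of this sketch.

```python
from collections import Counter

# Company -> reported server, exactly as listed in the comment above.
REPORTED = {
    "Amazon": "Apache",
    "AT&T": "Netscape",
    "Bell South": "Apache",
    "Cisco": "Unix",
    "Dell": "IIS5",
    "Earthlink": "Netscape",
    "E-Bay": "IIS4",
    "HP": "Apache",
    "Intel": "IIS6",
    "Lucent": "Netscape",
    "Motorola": "Apache",
    "National Semiconductor": "Netscape",
    "Nextel": "Netscape",
    "Qualcomm": "Netscape",
    "PC Connection": "IIS5",
}


def family(server):
    """Collapse IIS4/IIS5/IIS6 into one 'IIS' bucket; leave others as-is."""
    return "IIS" if server.startswith("IIS") else server


def tally(reported):
    """Count how many companies fall into each server family."""
    return Counter(family(server) for server in reported.values())


if __name__ == "__main__":
    counts = tally(REPORTED)
    for name, count in counts.most_common():
        print(f"{name:10s} {count:2d}  {100 * count / len(REPORTED):.0f}%")
```

That works out to Netscape 6, Apache 4, IIS 4 and Unix 1 out of 15. A sample of fifteen hand-picked companies proves about as much as you would expect.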

    Personally I'm amazed that Netscape is holding on to a lead... I would have expected them to be out of the running long ago. I'll have to check them out.

  7. And they are running.... by MavEtJu · · Score: 2, Informative

    We detect that www.port80software.com is running Yes we are using ServerMask.

    Date: Thu, 27 Nov 2003 07:15:24 GMT
    Server: Yes we are using ServerMask
    Set-Cookie: It works on cookies too=8, SM130P.5Q..NS12H57M64MP00.N2356; path=/
    Cache-control: private
    Content-Length: 21881
    Connection: keep-alive
    Connection: Keep-Alive
    Content-Type: text/html
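
A small sketch of what any header-based survey has to do with a response like the one above: split the header lines and read the `Server` field. The `parse_headers` helper is hypothetical, not Port80's or Netcraft's code, and it assumes one header per line with no folded continuation lines.

```python
# The header dump from the comment above, verbatim.
RAW_HEADERS = """\
Date: Thu, 27 Nov 2003 07:15:24 GMT
Server: Yes we are using ServerMask
Set-Cookie: It works on cookies too=8, SM130P.5Q..NS12H57M64MP00.N2356; path=/
Cache-control: private
Content-Length: 21881
Connection: keep-alive
Connection: Keep-Alive
Content-Type: text/html
"""


def parse_headers(raw):
    """Map lower-cased header names to the list of values seen for each."""
    headers = {}
    for line in raw.splitlines():
        # Split on the first colon only; values such as dates contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


if __name__ == "__main__":
    parsed = parse_headers(RAW_HEADERS)
    print("Server says:", parsed["server"][0])
    print("Connection sent", len(parsed["connection"]), "times")
```

The `Server` value is whatever the site chooses to send, so a survey built on it measures what servers claim to be, not what they are.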

    --
    bash$ :(){ :|:&};:
  8. Re:Corporate Web Servers by Sevn · · Score: 3, Informative

    As a career admin who has worked for 15 Fortune 100 companies as either an employee or a consultant in the past decade, and currently as the project lead replacing an aging proprietary UNIX solution for a telecom spanning an ENTIRE STATE, you are on crack. To dot the I's and cross the T's I hired FIVE independent firms to do cost-benefit analysis on proprietary versus open source even though I already knew the answer. The long and the short of it is, over a 5 year period for our particular needs the BEST case scenario for cost with the cheapest possible proprietary solution factoring in maintenance, upfront costs, and scale was 10 million dollars. The highest price for an open source solution was 4.3 million and that was because it was a hybrid solution that was about 50 percent proprietary and not purely open source. The solution I went with was 90 percent Debian based (since Red Hat is doing its thing, and SuSE is uncertain because of the merger) and 10 percent Solaris/Oracle and will cost an estimated 2.3 million. And for the record I freaking HATE Debian but it makes the most sense for this particular situation.

    --
    For every annoying gentoo user, are three even more annoying anti-gentoo crybabies. Take Yosh from #Gimp for example.
  9. Re:Where's Google? by ChaosDiscord · · Score: 4, Informative
    I could not help but notice that Google, Yahoo, and Slashdot are omitted from their "top 1000" list.

    The "top 1,000" list is based on the Fortune 1,000. Google, Yahoo, and Slashdot aren't on the Fortune 1,000. The theory is that the Fortune 1,000 indicates Real Companies, and that this is what Real Companies chose. However, many of these Real Companies are holding companies or target highly specialized audiences (like people needing drilling supplies). Many of these Real Companies are actually running what we would consider toy web sites: almost no content, entirely static pages, very few pages, and almost no visitors. So while this may represent what Real Companies chose, it does not necessarily represent what people with Real Work chose.

  10. Free Software Wins again. by Anonymous Coward · · Score: 5, Informative
    and what would that one line be? I want my $50 worth on my apache server


    • Unpack the Apache distro file (apache_1.x.xx.tar.gz) and run the configure script.

      Now do the following commands:

    • cd src/os/unix
      (With Apache 2.x, cd os/unix)
    • vi os.h
    • Search for:
      #define PLATFORM "Unix"
    • Replace "Unix" with whatever you want your OS identification to be. (Some of the more creative ones I've done are 'NachOS,' 'PathOS,' 'StratOS,' 'ZerOS,' and 'WinDos'...anything.)
    • Save the file.
    • cd ../../include
    • vi httpd.h
      (With Apache 2.x, vi ap_release.h)
    • Search for:
      #define SERVER_BASEVENDOR "Apache Group"
      #define SERVER_BASEPRODUCT "Apache"
      #define SERVER_BASEREVISION "1.x.xx"
    • Replace "Apache" and "1.x.xx" with whatever you want your Server and version number to be. (I recommend "Port80Software-Is-A-Fucking-Ripoff" and "Holy-Jumping-Jesus-This-Was-Easy", respectively.)
    • Save the file.
    • cd ../..
      (With Apache 2.x, cd ..)
    • make

    You're done. Congratulations. You just saved yourself $49.95!!!
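
If you would rather script the edit than open vi, here is a rough sketch of the same search-and-replace. The `set_define` helper is hypothetical; it only rewrites a quoted `#define` value in header text, and it assumes the defines look exactly like the ones quoted in the steps above. You still run configure and make yourself.

```python
import re


def set_define(source, name, value):
    """Replace the quoted value in a line of the form: #define NAME "..."

    Returns the modified text, or raises ValueError if NAME is not defined
    with a quoted string anywhere in `source`.
    """
    pattern = re.compile(
        r'^(\s*#define\s+' + re.escape(name) + r'\s+)"[^"]*"',
        re.MULTILINE,
    )
    # A function replacement avoids backslash surprises in `value`.
    new_source, count = pattern.subn(
        lambda match: match.group(1) + '"' + value + '"', source
    )
    if count == 0:
        raise ValueError(f"no quoted #define found for {name}")
    return new_source


# The three lines quoted in the steps above, used here as sample input.
SAMPLE_HEADER = (
    '#define SERVER_BASEVENDOR "Apache Group"\n'
    '#define SERVER_BASEPRODUCT "Apache"\n'
    '#define SERVER_BASEREVISION "1.x.xx"\n'
)

if __name__ == "__main__":
    edited = set_define(SAMPLE_HEADER, "SERVER_BASEPRODUCT", "MyServer")
    edited = set_define(edited, "SERVER_BASEREVISION", "19.5.1")
    print(edited)
```

To use it on a real tree you would read the header file, pass its text through `set_define`, and write it back. Keep a copy of the original first.
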
    1. Re:Free Software Wins again. by ivan.ristic · · Score: 5, Informative

      If you're using mod_security on your Apache server then you only need to add one line to the configuration file:

      SecServerSignature "MyServer/19.5.1"

  11. Re:This makes sense.. by Eivind · · Score: 4, Informative
    Except if you'd bothered to check you would notice that Netcraft is fully aware of this, and thus produces different numbers for "web-servers" and "active web-servers"; the latter excludes domains which are only parked somewhere.

    http://news.netcraft.com/archives/2003/11/03/november_2003_web_server_survey.html is the latest survey; Apache has 67.41% of all domains (well, all that Netcraft knows about anyway) at 30298060 domains.

    If you look only at "active" domains, Apache has 68.60%, so actually even a *higher* market share, of a total of 14370515 active domains. (So according to Netcraft, about half of all registered domains are "active" and the other half are "parked".)

  12. The TRUTH is ... by Jerry · · Score: 2, Informative

    that Microsoft's web server installs across ALL TOP DOMAINS have dropped to their 1997 levels, while Apache has almost doubled their 1997 levels. No amount of MS PR cash can change that fact.

    Hiding your IIS server behind a server mask or mis-identifying it as an Apache server isn't going to stop a virus or trojan... they can't read. They just try the exploit and if it works... it works. Not only has that been happening a lot on IIS servers, and MS software in general, the rates of infections/infectors seem to be growing... which explains why Apache had another large jump since last month, and MS has fallen by almost the same amount.

    It's one thing to have your web site broken into, it's another thing to pay to have it broken into. That's what you're doing when you buy & install MS web servers and the anti-viral software which supposedly will 'protect' them. It's obvious something is not working....

    --

    Running with Linux for over 20 years!

  13. Re:Where's Google? by jrumney · · Score: 2, Informative
    The "top 1,000" list is based on the Fortune 1,000.

    No, it's not. Look at the examples they gave of "Top 1000" sites that switched to IIS in the last month: CDW (CDWC, Nasdaq-100), Martin Marietta Materials (MLM, not part of any index), Warnaco (WRNC, not part of any index)

  14. Re:It's just plain wrong. by polyp2000 · · Score: 2, Informative

    Why would anybody do that?

    I had a mate that needed to do exactly that. He was running an apache webserver, and as such he was unable to get tech support. His way round this was to have Apache look like IIS by getting it to serve IIS headers.

    nick

    --
    Electronic Music Made Using Linux http://soundcloud.com/polyp
  15. salt by Minna+Kirai · · Score: 2, Informative

    should be taken with a mountain-sized grain of salt

    People who enjoy the taste of salt add it in proportion to the amount of food they intend to eat. "Take with a grain of salt" means "Eat so little that just one grain is adequate seasoning", or just "eat very little". The suggestion to only consume a small amount is meant to imply a low level of trust. It is the opposite of expressions like "Swallow it whole" and "Swallow it hook, line, and sinker".

    Expanding the salt grain to mountainous proportions therefore means that you will accept the survey results with total credulity.

  16. Re: More results by rduke15 · · Score: 2, Informative
    Well, I should have better things to do, but I couldn't resist looking at the results.

    So with "the nation's 500 fastest-growing private companies, from Inc magazine" data (see parent), the dominance of MS, to my great chagrin, is even worse:
    Total: 440

    57% (254) Microsoft-IIS
    34% (153) Apache
    2% ( 12) Rapidsite
    0% ( 3) Lotus-Domino
    0% ( 3) ConcentricHost-Ashurbanipal
    0% ( 2) Netscape-Enterprise
    0% ( 2) WebSTAR
    0% ( 2) Apache Tomcat
    0% ( 1) Sun-ONE-Web-Server
    0% ( 1) Lasso
    0% ( 1) Apache-AdvancedExtranetServer
    0% ( 1) Stronghold
    0% ( 1) WebSitePro
    0% ( 1) Xitami
    0% ( 1) Zeus
    0% ( 1) NetPr
    0% ( 1) Resin
    Who can find some interesting top-something companies list on which MS would get the low rating it deserves?
  17. Greetings from Port80 Software by jflima · · Score: 2, Informative

    Sorry not to be replying to any particular post, but the sheer volume makes that a little difficult to manage.

    It was good to see that, after a relatively brief spate of misdirected criticisms of our survey as being tainted by pro-Microsoft 'bias,' many contributors here saw that the data itself is pretty uncontroversial (and in fact easily reproducible), and instead began to address themselves to the questions that the survey was intended to raise -- namely, questions about what is an appropriate sampling methodology when attempting to measure HTTP server 'market share.'

    Those are the sorts of conversations we were hoping to start, and it's good to see them under way here with such vigor.

    Just to be clear: We have no real objection to the Netcraft results per se -- only to their being marketed as an unambiguously accurate picture of something called 'Web server market share.' We simply think that sampling this market is a more complicated affair than the endless recitation of the most commonly cited Netcraft numbers would suggest.

    A number of the contributors here who grant the legitimacy of our criticisms of Netcraft's methodology have raised the point that a sample based on Fortune 1000 sites isn't necessarily a good proxy for Web server market share either. (Since some of these sites are nothing more than glorified brochureware, and so on.) I think that's entirely correct.

    In a sense, our survey simply sets one type of partial snapshot, with its own kind of built-in sampling bias, alongside another. But then our aim wasn't to be definitive. It was simply to remove the halo of definitiveness from the Netcraft survey -- and to get people thinking about what it would take to be definitive in this context.

    And as I say, some of that thinking is on display here. Folks like ChaosDiscord are almost certainly right to suggest that it would be more accurate (or interesting) to sample the server choices of high-traffic sites. We hope to cover some of this territory in future surveys.

    Thanks to all those who looked past the fact that we happen to make commercial software for IIS, and actually engaged with our survey's findings and implications. And happy Thanksgiving to one and all.

    Joe

    Port80 Software