Alexa Top-500 Sites: Microsoft-IIS vs Apache
on "Apache down, IIS up" · Score: 2, Informative
Rather than an all-encompassing census of every IP address that answers to port 80, I'm much more interested in the web server software that organizations rely upon to handle the biggest sites on the web, those with the most traffic. I downloaded the Alexa Global 500 list of most-visited sites today and polled each of those 500 sites to see what they're running. Here's a summary of what I got back:
230 sites with a Server: Apache/* header
115 sites with some other product in their Server: header
86 sites with a Server: Microsoft-IIS/* header
68 sites with an empty Server: header
Apache is used by almost three times as many high-traffic sites as IIS (230 vs. 86).
If you're curious what the 115 other sites announced in their HTTP Server: header, it was mostly GWS and Netscape-Enterprise. Here are the top five "others":
43 GWS
17 Netscape-Enterprise
9 Sun-ONE-Web-Server
5 Zeus
5 lighttpd
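For anyone who wants to repeat the poll, here is a minimal Python sketch of how the tally could be produced. It is not the script used for the numbers above. The function names are made up for illustration, the hosts would come from whatever list you download, and counting a bare "Apache" value (no version) as Apache is my assumption.

```python
from collections import Counter
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def classify_server(header):
    """Bucket a raw Server: header value into the four groups used above."""
    value = (header or "").strip()
    if not value:
        return "empty"
    # "Apache/1.3.33 (Unix)" -> "apache"; "Apache-Coyote/1.1" stays distinct
    product = value.split("/", 1)[0].strip().lower()
    if product == "apache":
        return "Apache"
    if product == "microsoft-iis":
        return "Microsoft-IIS"
    return "other"


def fetch_server_header(host, timeout=10):
    """HEAD request to http://host/; returns the Server: value,
    '' if the header is absent, or None if the site could not be reached."""
    request = Request("http://%s/" % host, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.headers.get("Server", "")
    except HTTPError as err:
        # 4xx/5xx responses still carry a Server: header
        return err.headers.get("Server", "")
    except (URLError, OSError):
        return None


def tally(headers):
    """Count headers per bucket, skipping sites that never answered (None)."""
    return Counter(classify_server(h) for h in headers if h is not None)
```

Usage would be something like `tally(fetch_server_header(h) for h in hosts)`, where `hosts` is your own list of hostnames.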
a comment about the opposite situation -- injecting "code" into your html -- you should consider externalizing javascript/css if it is widely used throughout your website. imagine you have 20 kb of javascript navigation menu code and 10 kb of css that is common to all or almost all pages in your website. by externalizing these two documents, you will improve:
the user experience - users will download the javascript file and the css file ONCE, after which their browser will reload them from its local cache. by eliminating 30 kb of payload per page-view, you will make the site load more quickly for everyone.
your bandwidth costs & server performance - if you have a high-traffic site and you reduce the bandwidth per page view, you should improve server performance (child apache daemons will not be wasting as much time sending down repeat content) and possibly reduce your bandwidth bill, serve more pages inside of your web-hosting traffic allowance, etc.
SEO - i can imagine some decisions about search relevance based on the ratio of code to content in your documents. if you have 30k of css and javascript in each page, to support an average of 1k of unique content, that makes your documents look awfully similar. imagine if you strip out the redundant code via external css/javascript links. now each document in your site contains just the unique content of interest to web search users.
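to put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch in python. the 30 kb of shared code and 1 kb of unique content come from the example above; the visitor and page-view counts are hypothetical, and it assumes every visitor fetches the external files exactly once and serves them from cache afterwards, which real browsers only approximate.

```python
def transfer_kb(page_views, visitors, shared_kb=30, unique_kb=1, external=False):
    """Approximate total KB sent for a given amount of traffic.

    inline:   every page view carries the shared js/css plus the content
    external: each visitor fetches the shared files once, then only content
    """
    if external:
        return visitors * shared_kb + page_views * unique_kb
    return page_views * (shared_kb + unique_kb)


# hypothetical traffic: 1,000 visitors viewing 10 pages each
inline = transfer_kb(10000, 1000)
external = transfer_kb(10000, 1000, external=True)
print("inline: %d KB, external: %d KB" % (inline, external))
```

the more pages each visitor views, the bigger the gap gets, since the shared 30 kb is paid once per visitor instead of once per page.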
there are lots of challenges to accurately sizing an audience. i think that acceptance of audience size based solely upon unique IP addresses waned around 1999 when methods employing javascript and cookies came into vogue. there's always the option of counting registered users too.
all of these techniques help quantify unique users and monitor the trends in their online behavior. as far as noise in unique IP counting, i think that the biggest issue with relying solely upon unique IPs is that a simple count of unique IP addresses will include all robot noise. the major web search engine spiders will not influence this count much, but the gratuitous IPs logged by script-kiddie bots can eclipse the human traffic on smaller sites.
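to see how much robot noise is in a unique-IP count, something like the following python sketch could compare the raw count with a filtered one. it assumes apache's combined log format, and the list of user-agent hints is only illustrative. treating a missing user agent as a robot is my assumption too. keep in mind that bots sending a browser-like user agent will slip through, so this gives a lower bound on the noise.

```python
import re

# combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

# illustrative substrings only; extend for your own logs
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def looks_like_robot(agent, bot_hints=BOT_HINTS):
    """True for self-identified robots and for empty user agents."""
    agent = agent.strip().lower()
    if agent in ("", "-"):
        return True
    return any(hint in agent for hint in bot_hints)


def unique_ips(lines, bot_hints=BOT_HINTS):
    """Return (all unique IPs, unique IPs with a non-robot user agent)."""
    all_ips, human_ips = set(), set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        ip, agent = match.groups()
        all_ips.add(ip)
        if not looks_like_robot(agent, bot_hints):
            human_ips.add(ip)
    return len(all_ips), len(human_ips)
```

feed it an open log file, e.g. `unique_ips(open(path))` with `path` pointing at your own access log, and compare the two numbers.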
the market for home routers is very competitive, and there is little to help customers distinguish between d-link vs. netgear vs. linksys. learning that my d-link di-624 router is a "gross polluter" is a big incentive to upgrade to another brand -- they're cheap devices that get replaced every couple of years anyway.
hearing about d-link's inept implementation of ntp makes me wonder what other shortcomings may be baked into the various d-link products i've purchased over the years. when the product is a commodity such as a network card or a home router, it's a very easy decision for customers to switch brands when they learn that d-link has made a major mistake that they were unable to correct after ~120 days of private communication with the victim of their DDoS.
Is it possible to create an internet that relies instead on peer-to-peer connectivity? How would the hardware work? How would the information be passed? What would be the incentive for average people to buy into it if it meant they'd have to host someone else's packets on their hard drive?
we had this type of backboneless internet, once upon a time, operating under various names:
Oracle Mergers & Acquisitions probably got wind of MySQL via this Slashdot Thread: MySQL To Be Ikea Of The Database Market and moved to snap up a piece of the Ikea action.
Seriously, MySQL doesn't perceive itself as an Oracle competitor, but Oracle may very well perceive MySQL as a competitive threat. Employing the developers who write the back-end for the paying customers of MySQL is a nice way Oracle can influence their competitor.
My work has an application based on mysql that needs to add ACID compliance, and PostgreSQL just got a whole lot more attractive.
the commercial ssh.com site appears to draw a bigger audience (and thus, a better alexa ranking) than the free openssh.com site. if the more popular, better-known software (ssh, commercial) wants to call attention to a free competitor (openssh, free), that's their mistake, and i hope the openssh community benefits from it!
have you looked into scalable vector graphics format (svg)?
it is a mature spec published by the W3C, and like macromedia flash, svg stands to become much more popular once it is distributed with web browsers.
there have been svg browser plugins for some time; now native svg is included with firefox on ms-windows, and scheduled for inclusion with firefox and mozilla. here are some SVG and SVG animation links for you:
if you download and install it as of 10am PST today, it's going to try to install a cron job that begins:
-*/10 * * * * ps x | grep...
which vixie cron (and presumably others) rejects as invalid. i just changed it to run every 10 minutes like:
*/10 * * * * ps x | grep...
hth
i would view contract-to-hire as a mutual try-before-you-buy period, during which you get a feel for the workplace and the employer evaluates your performance. if the contract is not renewed, your resume just lists a contract job that was completed.
contractors in the usa are often reported to the IRS via form 1099, with no withholdings or employer contributions for medicare, social security, health insurance, state and federal tax, disability, etc., so you should expect to pay for those out of pocket before you count your loot.
good luck!
unless the author of this sensational article reviewed their httpd logs for the user agent 'msnbot' clear back to 2003, they have not ruled out the possibility that microsoft's spider simply crawled the site in question before msn search was a tech news feature. brett tabke's webmasterworld forums mention sightings of msrbot from microsoft in april 2003, and widespread msnbot activity starting december 2003.
it's also possible that microsoft seeded their search index by licensing it from a comparable index source, e.g. the alexa crawl.
i sometimes liken system and network admin to being a coal stoker in the basement of a big building, just shoveling coal into the furnace 24/7 to keep the business above running.
punchline of your story is that they fired the (only?) full time system administrator.
haven't seen any thoughts on afbackup and burt. i auditioned both last summer for network backups of multiple 9GB machines over ssh and settled on afbackup to a DDS-2 DAT.
afbackup is pretty painless to set up, backs up quickly, can run over ssh, prompts by email when tape changes are needed, and does reasonable restores of entire backup sets, but it is very slow for selected file restores.
burt is wicked fast for backups, tcl-based interface, imho elegant, and can run over ssh. afbackup was better documented and offered an emergency restore option that i preferred at the time.
i ruled out amanda because it is complex and tends to want a holding disk the size of an entire backup set.
web-dav (web distributed authoring and versioning) is a good first place to check. i think with the appropriate apache module it supports in-place page locking and editing via msie 5. if your user community is semi-literate, you might also look into cvs to manage web development, which is easier and more effective imho. HTH
scoop certainly doesn't owe the linux community anything -- we owe him, for putting up a much-needed service just in time for the linux mass-media explosion, then continuing to refine it with countless unpaid hours. scoop's unheralded work is similar to that of many major contributors to the linux community over the years. scoop's decision to turn off freshmeat in response to unfavorable email leaves me wondering two things about the freshmeat meltdown and the linux community:
stability? what other linux community services are subject to disappearing when their creator/maintainer has a bad email day? in the big picture of linux advocacy it shows a scary achilles heel when one of the information hubs of a major Internet phenomenon can be closed down by hurt feelings. alternatives for solving community gripes may be preferable to highly visible linux web sites going on strike like this. it will be interesting to see if the news.com and wired type sites cover this incident...
language? i have corresponded with scoop briefly 2-3x and recall he is in germany (true?). is there any chance that scoop is not only being harassed, but having to wade through a number of carelessly worded attacks that lose everything but their nasty tone in the translation to german?
hope everything can be resolved soon. i enjoyed the freshmeat redesign for a few hours before the entire site was yanked. --sean
it would be great if tesseract could augment the gocr-based FuzzyOCR and OCR plugins for SpamAssassin.
has anybody succeeded in verifying one of the domainkey headers from a gmail message?
after reading the ietf draft:
http://www.ietf.org/internet-drafts/draft-delany-domainkeys-base-01.txt
if this is in the message header:
DomainKey-Signature: ... s=beta; d=gmail.com; ...
i think you should be able to retrieve the public key necessary to verify it by querying dns for a txt record for
beta.gmail.com
but i don't get anything back in the answer section when i run
dig TXT beta.gmail.com
anyone have better luck verifying one of these messages? or is the gmail domainkeys implementation incomplete at present?
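one thing worth checking: as i read the domainkeys spec, the public key record lives under a `_domainkey` subdomain, so the query name is built as selector + `._domainkey.` + domain rather than selector + domain. here is a small python sketch of that lookup-name construction. the function names are made up for illustration, it only parses the tag list (it does not verify any signature), and the sample header value in the usage line is a shortened stand-in, not a real gmail header.

```python
def parse_tags(signature_value):
    """Split a DomainKey-Signature value such as
    'a=rsa-sha1; s=beta; d=gmail.com; c=nofws; q=dns' into a dict."""
    tags = {}
    for part in signature_value.split(";"):
        if "=" not in part:
            continue
        name, value = part.split("=", 1)
        # drop whitespace introduced by header folding
        tags[name.strip()] = "".join(value.split())
    return tags


def key_query_name(signature_value):
    """Build the DNS name whose TXT record should hold the public key."""
    tags = parse_tags(signature_value)
    return "%s._domainkey.%s" % (tags["s"], tags["d"])


print(key_query_name("a=rsa-sha1; s=beta; d=gmail.com; c=nofws; q=dns"))
```

the resulting name is what you would hand to `dig TXT`. whether gmail actually publishes a record there is something to check for yourself.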