I used to analyze corporate web metrics, and my comment about this article is that it should have given a bit more information about the methods he used to filter out spiders and other automated retrieval systems, especially if you are going to make your own claims about browser/operating system share.
I have seen as much as 30% of the activity on a single page come from some sort of automated retrieval.
It looks like he paid quite a bit of attention to the wide variety of user agent strings (kudos!), and I'm guessing that much or all of his filtering was based on his analysis of these strings.
Unfortunately, this is not enough. Many spiders identify themselves as some Windows/Internet Explorer variation without any indicator that they are a spider, business intelligence application, or other non-eyeball-on-the-page type browser.
One lo-fi suggestion for folks who want to check their current spider filter setup is to create a temporary honeypot link on one of their pages. An invisible hyperlink to a non-production page, wrapped around a 1x1px transparent gif, is enough. No human visitor will see or click that link, so whatever requests the non-production page is almost certainly automated. Then run a metrics report on that page and add the IP addresses it shows (watch for entire subnets of crawlers) to your production filters.
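Here is a rough sketch of how the honeypot report could feed a filter, in Python. Everything in it is an assumption for illustration: the honeypot path `/trap/index.html`, the `(ip, path)` tuple format for log hits, the function names, and the choice of treating two or more trapped IPv4 addresses in the same /24 as a crawler subnet. Adapt it to whatever your metrics package actually exports.

```python
import ipaddress
from collections import Counter

# Hypothetical non-production page that only the invisible link points to.
HONEYPOT_PATH = "/trap/index.html"


def honeypot_ips(hits):
    """Return the set of client IPs that requested the honeypot page."""
    return {ip for ip, path in hits if path == HONEYPOT_PATH}


def crawler_subnets(ips, min_hosts=2, prefix=24):
    """Return networks in which at least min_hosts trapped IPs fall.

    Assumes IPv4 addresses; the /24 grouping is an arbitrary starting point.
    """
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips
    )
    return {net for net, n in counts.items() if n >= min_hosts}


def filter_hits(hits, bad_ips, bad_nets):
    """Drop hits from trapped IPs and from any flagged subnet."""
    kept = []
    for ip, path in hits:
        addr = ipaddress.ip_address(ip)
        if ip in bad_ips or any(addr in net for net in bad_nets):
            continue
        kept.append((ip, path))
    return kept
```

You would call `honeypot_ips` on the log, pass the result to `crawler_subnets`, and then run `filter_hits` over the production traffic. Flagging a whole subnet is aggressive, so it is worth eyeballing the list before it goes into a permanent filter.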
Also, I assume that whenever the author says "requests" he means "page views." It is good to make that distinction explicit and to focus solely on page views for any kind of metrics reporting that is about eyeballs rather than server load. That gives you an apples-to-apples comparison between browsers, some of which might be more conservative than others about caching and re-using images, etc.
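To make the requests vs. page views distinction concrete, here is a minimal Python sketch that counts only page views by skipping requests for embedded assets. The extension list and the function names are my own assumptions, not anything from the article, and a real site would need its own rules for what counts as a page.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of embedded-asset extensions; adjust for your own site.
ASSET_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}


def is_page_view(url):
    """True if the request looks like a page rather than an embedded asset."""
    path = urlsplit(url).path  # drops any query string
    ext = posixpath.splitext(path)[1].lower()
    return ext not in ASSET_EXTENSIONS


def count_page_views(urls):
    """Count requests that are not for images, stylesheets, or scripts."""
    return sum(1 for url in urls if is_page_view(url))
```

Counting this way means a browser that re-fetches every image on each visit does not look busier than one that caches them.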
cheers
thx
check the 100,000 to 600,000 page and Market Share pages for WoW and EQ2 numbers.