Statistical Analyzers for HTTP Logs?
krishnaD asks: "I have been using webalizer to generate access log reports for the site, but lately my customers are asking for statistics like the average amount of time visitors spend on the site; if a person reaches page X, what is the probability that he goes on to Y from there; which link people left the site from; and so on. Basically, they are asking for a detailed flow analysis of visitors' usage patterns. Are there any tools that will do this kind of analysis? I'd love to know what kind of tools other sysadmins use to generate reports for their clients."
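For anyone wondering what those three numbers involve, here is a rough Python sketch, not a replacement for a real package. It assumes Common Log Format lines, treats each client host as one visitor, and ends a visit after 30 idle minutes; all of those are assumptions, and the names (`flow_report`, `probability`, `SESSION_GAP`) are made up for illustration. Proxies, shared IPs, and people browsing in several windows will skew the results.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Common Log Format prefix: host ident user [timestamp] "METHOD path PROTO" ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*"')
SESSION_GAP = timedelta(minutes=30)  # assumption: 30 idle minutes ends a visit


def parse(lines):
    """Yield (host, time, path) for each line that looks like a CLF entry."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            host, stamp, path = m.groups()
            yield host, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), path


def sessions(hits):
    """Group hits per host, splitting whenever the idle gap is exceeded."""
    by_host = defaultdict(list)
    for host, when, path in hits:
        by_host[host].append((when, path))
    for visits in by_host.values():
        visits.sort()
        current = [visits[0]]
        for hit in visits[1:]:
            if hit[0] - current[-1][0] > SESSION_GAP:
                yield current
                current = [hit]
            else:
                current.append(hit)
        yield current


def flow_report(lines):
    """Return (page -> next-page counts, exit-page counts, average visit seconds)."""
    transitions = defaultdict(Counter)
    exits = Counter()
    durations = []
    for s in sessions(parse(lines)):
        for (_, a), (_, b) in zip(s, s[1:]):
            transitions[a][b] += 1
        exits[s[-1][1]] += 1
        durations.append((s[-1][0] - s[0][0]).total_seconds())
    avg = sum(durations) / len(durations) if durations else 0.0
    return transitions, exits, avg


def probability(transitions, exits, x, y):
    """Estimate P(next page is y | visitor is on x); leaving at x counts as an outcome."""
    total = sum(transitions[x].values()) + exits[x]
    return transitions[x][y] / total if total else 0.0
```

Feed `flow_report` an open log file and then ask `probability(transitions, exits, "/X", "/Y")`. Note that the "time on site" figure only measures first hit to last hit, so a one-page visit counts as zero seconds.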
A couple of years ago, I did some research on webstats packages for our websites and came up with one that I haven't seen mentioned yet: Sawmill. It is the best tool for the kinds of questions you mentioned -- it can run as a CGI program (or as its own daemon) and does on-the-fly limiting, different reports, etc. So if they want to know what kind of browsers people were using in the Support section at 3am, they can get that.
I put together a Perl CGI to handle combining logs from all of our different servers, and then feed the combined log to Sawmill (or FunnelWeb, the other package we wound up using).
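The original was Perl and isn't shown here, so this is only a guess at the general idea, written in Python: if each server's log is already in time order, the logs can be merged lazily by timestamp before being handed to the analyzer. It assumes Common Log Format timestamps in square brackets; `timestamp` and `merge_logs` are hypothetical names.

```python
import heapq
from datetime import datetime


def timestamp(line):
    """Extract the bracketed CLF timestamp, e.g. [10/Oct/2000:13:55:36 -0700]."""
    stamp = line[line.index("[") + 1:line.index("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*streams):
    """Lazily merge already-sorted log streams into one time-ordered stream."""
    return heapq.merge(*streams, key=timestamp)
```

Passing open file objects works, since they iterate line by line; write the merged lines to a new file and point the stats package at that. Because the timestamps carry a UTC offset, servers in different time zones still sort correctly.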
-Esme
Do not assume that people browse with just one browser window. I cannot speak for others, but normally, when I leave a site, I close the browser window that site was in. It is not often that I follow a link out. If there are interesting links, I open them in new windows. It is not uncommon for me to have 16-32 windows open, often on 2-4 desktops.
Yes, I know there are tricks to discourage this sort of browsing. Those also discourage me from visiting the sites, if I can find friendlier alternatives.
In Murphy We Turst
...is the statistics tool which is probably sufficient for the 99% of us outside the elite statistician circle.
ModLogAn is the successor of Webalizer.
It produces similar reports, but it can work with logs from a lot of servers, including FTP servers, firewalls, a bunch of web servers, realserver, shoutcast, squid, etc.
I used WebTrends in a large corporate environment where we were getting over a million hits per week. I can tell you that, at least at that time (1999), WebTrends was complete garbage. The numbers were way off (+/- 25% or more) from what the actual logs said. It actually told me once that I had 900,000 hits for the week -- and that seven million of them occurred on Saturday! In a support call, I actually got one of their engineers to admit that the product sucked and we couldn't count on the numbers to be accurate.
I wrote some Perl scripts and used the GD modules to simulate something close to WebTrends output until I came up with something better.
Soon I found analog. The charts were not nearly as pretty as WebTrends, but the numbers were accurate and it ran about 15-20 times faster.
Finally, I found the ReportMagic add-on for analog, and I started creating accurate -- and attractive -- reports again.
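Checking a package's numbers against the raw logs, as described above, doesn't take much. A minimal Python sketch, assuming Common Log Format lines (the function name `hits_per_day` is made up), that counts hits per calendar day so the daily figures can be compared with whatever the report claims:

```python
from collections import Counter


def hits_per_day(lines):
    """Count log lines per day, keyed by the date part of the [timestamp]."""
    days = Counter()
    for line in lines:
        start = line.find("[")
        end = line.find(":", start)
        if start != -1 and end != -1:
            # The slice is the "10/Oct/2000" part of [10/Oct/2000:13:55:36 -0700]
            days[line[start + 1:end]] += 1
    return days
```

The daily counts must add up to the number of lines counted, so no single day can ever exceed the weekly total. If a report says otherwise, the report is wrong.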