Statistical Analyzers for HTTP Logs?
krishnaD asks: "I have been using webalizer to generate access log reports for the site, but lately my customers are asking for statistics like the average amount of time visitors spend on the site; if a person reaches page X, the probability that he goes from there to Y; from which link people exited the site; and so on. Basically, they are asking for a detailed flow analysis of visitors' usage patterns. Are there any tools that will do this kind of analysis? I'd love to know what kind of tools other sysadmins use to generate reports for their clients."
Freshmeat Internet::Log Analysis
I have tried webalizer and webtrends, but without a doubt, nr. 1 is Urchin. It really is the cream of the crop, but it costs money too. You can check out a sample here.
If you get an account with Verio, you will get your stats in Urchin for free.
Hurra for Knark!
Access Analyzers (Uppsala University)
Log Analyzers (reallybig.com)
Web Log Analyzers (2K Communications)
If I weren't nailed to the penis, I'd be pushing up the daisies!
WebTrends offers software like this. We outsource a lot of our web stuff and one of our providers runs WebTrends and our people who like looking at pretty pictures really seem to like it. I have never installed or configured their software so I can't speak for ease of use, but the end user reports are easy to navigate. IIRC you can download a demo from their site and play with it. They do seem to have a demo report you can look at and see if this meets your needs.
It's not very difficult to implement one from scratch. On the pages you want to track, just call a function that sends the HTTP server variables (and other desired information) to a database. You can then use the IP address as an identifier and track a reader's history through the site. Trace the IPs, and you can get even more information. I've implemented a system on my site that basically tells me that folks from Los Angeles spend less time on a certain page than folks from Newark.
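For anyone who wants to try the from-scratch route, here is a rough sketch in Python with SQLite. It is not the poster's actual system: the table layout and the names open_db, log_hit, history_for and seconds_on_page are made up for illustration, and using the IP address as the visitor identifier is the same simplification described above (proxies and shared addresses will blur it).

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the (hypothetical) hits table if it does not exist yet."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " ts REAL, ip TEXT, page TEXT, referer TEXT, agent TEXT)"
    )
    return db

def log_hit(db, ts, ip, page, referer="", agent=""):
    """Call this from every tracked page with the request's server variables."""
    db.execute("INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
               (ts, ip, page, referer, agent))
    db.commit()

def history_for(db, ip):
    """Return the pages one IP address requested, oldest first."""
    rows = db.execute(
        "SELECT page FROM hits WHERE ip = ? ORDER BY ts", (ip,))
    return [page for (page,) in rows]

def seconds_on_page(db, ip):
    """Time on a page = gap until the same IP's next hit (last page is unknown)."""
    rows = db.execute(
        "SELECT ts, page FROM hits WHERE ip = ? ORDER BY ts", (ip,)).fetchall()
    return [(page, next_ts - ts)
            for (ts, page), (next_ts, _) in zip(rows, rows[1:])]
```

Note that the time spent on the last page of a visit cannot be measured this way, because there is no following request to compare against.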
Got Rhinos?
Well, if you are looking for free stuff...
/index.html
:-)
I'd recommend W3Perl http://www.w3perl.com/softs/index.html which is kind of a mess of Perl scripts, but is surprisingly fast (much faster than other Perl-only stats packages), and is the most full-featured free package I've ever come across.
Setup is kind of a pain - it's rather complex, owing to the vast array of configurable thingies, but it works pretty well once it's put together.
There are some genuinely innovative features, such as a tree view of your website weighted by the popularity of each branch from
Worth a look if you are on a feature hunt. It requires some arcane image generation program to make the pretty graphs.
Oh, and if you were hoping to explore the code - be aware that the guy who wrote it is French
-----
A couple of years ago, I did some research for webstats packages for our websites, and came up with a package that I haven't seen mentioned yet: Sawmill is the best tool for the kinds of questions you mentioned -- it can run as a CGI program (or as its own daemon) and does on-the-fly limiting, different reports, etc. So if they want to know what kind of browsers people were using in the Support section at 3am, they can get that.
I put together a Perl CGI to handle combining logs from all of our different servers, and then feed the combined log to Sawmill (or FunnelWeb, the other package we wound up using).
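The log-combining step is easy to reproduce. What follows is not the Perl CGI mentioned above, just a Python sketch of the same idea. It assumes every server writes Common or Combined Log Format with a bracketed timestamp, and that each individual log is already in chronological order; log_time and merge_logs are names invented for this example.

```python
import heapq
import re
from datetime import datetime

# Timestamp field of the Common/Combined Log Format,
# e.g. [10/Oct/2000:13:55:36 -0700]
_TS = re.compile(r"\[([^\]]+)\]")

def log_time(line):
    """Parse the bracketed timestamp of one log line into an aware datetime."""
    match = _TS.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")

def merge_logs(*streams):
    """Lazily merge per-server logs (each already sorted) into one stream."""
    return heapq.merge(*streams, key=log_time)
```

Since heapq.merge is lazy, you can pass open file objects (say, one per server) and write the merged lines straight to the file the analyzer reads, without loading everything into memory. Timestamps with different time zone offsets are compared correctly because they are parsed as offset-aware datetimes.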
-Esme
Did you ever try netgenesis? It ain't cheap, but it does a lot of ad hoc reporting rather than the static reports that things like webtrends produce (and that urchin appears to produce, although I'm not really familiar with it).
They also have an API that you can use to build custom functionality and/or match data against other systems (like a customer database).
"Karma can only be portioned out by the cosmos." -- Homer Simpson
Do not assume that people browse with just one browser window. I can not speak for others, but normally, when I leave a site, I close that browser where that site was. It is not often I follow a link out. If there are interesting links, I open them in new windows. It is not uncommon for me to have 16-32 windows open, often on 2-4 desktops.
Yes, I know there are tricks to discourage this sort of browsing. Those also discourage me from visiting the sites, if I can find friendlier alternatives.
In Murphy We Turst
Even if you don't find stats packages that do what you want, you can make webalizer a lot better.
...is the statistics tool which is probably sufficient for the 99% of us outside the elite statistician circle.
ModLogAn is the successor of Webalizer.
It produces similar reports, but it can work with a lot of servers, including FTP servers, firewalls, a bunch of web servers, realserver, shoutcast, squid, etc.
{{.sig}}
Advanced Web Statistics: a Perl script that parses a combined log and spits out a lot of useful info.
yah, I brake it all.....
It's not free, but it is very nice.
Jeremy
phpOpenTracker does not rely on logfiles, but it seems to address your needs.
If you don't mind paying for such a program, I would recommend Sawmill for this task.
FunnelWeb is quite good, you can even DL a demo.
http://www.quest.com/funnel_web/analyzer/
At my former company, we used ILux. It started as a simple log analyzer with a Java front end, and then evolved into a campaign-analysis/trip-through-the-site-tracking/customize-email-marketing-according-to-their-path-through-the-site behemoth. It costs, and (when I used it) it was cookie-based to enable the site tracking. We also didn't have much luck with it as it evolved, probably because we were using underpowered hardware (we really only wanted the log analyzer it started as) and it was a first release of the expanded product, but you may want to check into an eval.
Save Maine's economy: write stuff down. All comments are exclusively my own, not my employer.
Analog is a nice log analyzer and fills all of my requirements, but a 100% accurate flow analysis of usage patterns is not possible. Since HTTP is a stateless protocol, it is difficult to know exactly what the client is doing.
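Because the protocol is stateless, anything an analyzer reports about "visits" is an estimate built on heuristics. A rough Python sketch of one common heuristic: treat hits from the same IP as one session until there is a gap longer than some cutoff. The 30-minute cutoff, the (timestamp, ip, page) tuple layout and all function names here are assumptions for illustration, not how any particular package works. From such sessions you can estimate the numbers asked for in the question: the chance of going from X to Y, exit pages, and average visit length.

```python
from collections import Counter, defaultdict

TIMEOUT = 30 * 60  # assumed inactivity cutoff in seconds; a convention, not a fact

def sessions(hits, timeout=TIMEOUT):
    """Group (timestamp, ip, page) hits into per-visitor sessions."""
    by_ip = defaultdict(list)
    for ts, ip, page in sorted(hits):
        visits = by_ip[ip]
        if visits and ts - visits[-1][-1][0] <= timeout:
            visits[-1].append((ts, page))   # continue the current session
        else:
            visits.append([(ts, page)])     # gap too long: start a new one
    return [visit for visits in by_ip.values() for visit in visits]

def transition_probability(all_sessions, x, y):
    """Estimate P(next page is y | visitor is on x) from observed page pairs."""
    nexts = Counter()
    for visit in all_sessions:
        pages = [page for _, page in visit]
        # Pair each page with its successor; None marks leaving the site.
        for here, there in zip(pages, pages[1:] + [None]):
            if here == x:
                nexts[there] += 1
    total = sum(nexts.values())
    return nexts[y] / total if total else 0.0

def exit_pages(all_sessions):
    """Count how often each page was the last one seen in a session."""
    return Counter(visit[-1][1] for visit in all_sessions)

def average_duration(all_sessions):
    """Mean seconds from first to last hit; time on the final page is unknowable."""
    spans = [visit[-1][0] - visit[0][0] for visit in all_sessions]
    return sum(spans) / len(spans) if spans else 0.0
```

The results are only as good as the session guess: several people behind one proxy look like a single visitor, and someone browsing in many windows produces a "path" that nobody actually followed.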
--
"Trying is the first step towards failure."
-Homer Simpson
WebTrends Log Analyzer is a great program and easy to find a crack on older versions.
Was looking into this. I work with a data-mining group and we were going to do a POP project for them. But then things fell through ... so I assume that they still went ahead with this project.
~N~