Slashdot Mirror


Statistical Analyzers for HTTP Logs?

krishnaD asks: "I have been using webalizer to generate access log reports for the site, but lately my customers are asking for statistics like the average amount of time visitors spend on the site, the probability that a person who reaches page X goes on to page Y, which link people exited the site from, and so on. Basically, they are asking for a detailed flow analysis of visitors' usage patterns. Are there any tools that will do this kind of analysis? I'd love to know what kind of tools other sysadmins use to generate reports for their clients."
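
The "reaches X, then goes to Y" question can be sketched as a simple count of page-to-page transitions. This is only an illustration, assuming the log has already been reduced to (visitor, timestamp, page) tuples; the function name, the tuple layout and the `EXIT` marker are hypothetical, and how a "visitor" is identified (IP address, cookie) is left open.

```python
from collections import Counter, defaultdict

EXIT = "(exit)"  # hypothetical marker for "left the site after this page"

def transition_probabilities(hits):
    """hits: iterable of (visitor, timestamp, page) tuples (assumed layout).

    Returns {page: {next_page: probability}}, where EXIT stands for the
    visitor's last recorded page view.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, page in hits:
        by_visitor[visitor].append((ts, page))

    counts = defaultdict(Counter)  # page -> Counter of what came next
    for visits in by_visitor.values():
        visits.sort()  # order each visitor's page views by time
        pages = [page for _, page in visits] + [EXIT]
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1

    probabilities = {}
    for page, nexts in counts.items():
        total = sum(nexts.values())
        probabilities[page] = {nxt: n / total for nxt, n in nexts.items()}
    return probabilities
```

With this layout, `transition_probabilities(hits)["/x"].get("/y", 0.0)` is the share of views of `/x` that were followed by `/y`, and the `EXIT` entry for a page is the share of views after which the visitor left.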

20 of 39 comments

  1. Karma whore link by Anonymous Coward · · Score: 2, Informative
    1. Re:Karma whore link by balamw · · Score: 2, Informative
      Of that bunch, I must say that I really like Webalizer. It produces really nice looking reports with pie and bar charts, and the level of detail can be customized to almost any need. It's also nice that it'll work on both web server logs and squid logs.

      Analog may be the most popular, but I also found it rather difficult to set up and get useful data into and out of.

      Balam

  2. Urchin - cream of the crop by proxybyproxy · · Score: 3, Informative

    I have tried webalizer and webtrends, but without a doubt, nr. 1 is Urchin. It really is the cream of the crop, but it costs too. You can check out a sample here.

    If you get an account with Verio, you will get your stats in Urchin for free.

    --

    Hurra for Knark!
    1. Re:Urchin - cream of the crop by twodiddyliddy · · Score: 2, Insightful

      I completely agree, urchin rules.

      When I worked for a .gone I was in charge of the stats analysis, and I looked into most of the major programs mentioned in this thread, and nothing comes close to urchin. As far as I see it, it only has two weaknesses:

      - cost
      - not open (there were a few features I would have loved to add/alter)

      One really beautiful feature it has is that you can incorporate your sales stats with the program. I haven't tried it yet, but from what I remember, it allows you to directly check your sales stats/visitors ratio, where your purchasers came from, how long they stayed etc.

      --
      Two men sat on a raft
    2. Re:Urchin - cream of the crop by demaria · · Score: 3, Informative

      Urchin is fast and awesome.

      But it doesn't have as much detail as other vendors like Webtrends. You can't really do campaign analysis.

      Review of these two

  3. More linkage by Dead+Penis+Bird · · Score: 3, Informative
    --

    If I weren't nailed to the penis, I'd be pushing up the daisies!

  4. WebTrends by krangomatik · · Score: 2, Insightful

    WebTrends offers software like this. We outsource a lot of our web stuff, and one of our providers runs WebTrends. Our people who like looking at pretty pictures really seem to like it. I have never installed or configured their software so I can't speak for ease of use, but the end user reports are easy to navigate. IIRC you can download a demo from their site and play with it. They also seem to have a demo report you can look at to see if it meets your needs.

    1. Re:WebTrends by jslag · · Score: 5, Informative

      I've used WebTrends for about a year, and couldn't be less impressed. Randomly chokes on logs that webalizer handles without trouble. Hard-to-use interface. Reports a number of things that you really can't tell from web logs.


      On the plus side, the PHBs love it.

    2. Re:WebTrends by Anonymous Coward · · Score: 2, Interesting

      I used WebTrends in a large corporate environment in which we were getting like 1+ million hits per week. I can tell you that, at least at that time (1999), WebTrends was complete garbage. The numbers were way off (+/- 25% or more) from what the actual logs said. It actually told me once that I had 900,000 hits this week -- and seven million of them occurred on Saturday! In a support call, I actually got one of their engineers to admit that the product sucked and we couldn't count on the numbers to be accurate.

      I wrote some perl scripts and used the GD modules to simulate something close to WebTrends output until I came up with something better.

      Soon I found analog. The charts were not nearly as pretty as WebTrends, but the numbers were accurate and it ran about 15-20 times faster.

      Finally, I found the ReportMagic add-on for analog, and I started creating accurate -- and attractive -- reports again.

  5. Not difficult by zpengo · · Score: 2

    It's not very difficult to implement one from scratch. On the pages you want to track, just call a function that sends the HTTP server variables (and other desired information) to a database. You can then use the IP address as an identifier and track a reader's history through the site. Trace the IPs, and you can get even more information. I've implemented a system on my site that basically tells me that folks from Los Angeles spend less time on a certain page than folks from Newark.
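
    A minimal sketch of that approach, assuming a CGI-style dict of server variables and an SQLite table. The table layout and the function names `open_db`, `record_hit` and `history` are made up for illustration, not taken from any particular package.

```python
import sqlite3
import time

def open_db(path=":memory:"):
    """Open (or create) the hit database; the schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(ip TEXT, ts REAL, page TEXT, referrer TEXT, agent TEXT)"
    )
    return db

def record_hit(db, env, now=None):
    """Store one page view; env maps CGI-style variable names to values."""
    db.execute(
        "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
        (
            env.get("REMOTE_ADDR"),
            time.time() if now is None else now,
            env.get("REQUEST_URI"),
            env.get("HTTP_REFERER"),
            env.get("HTTP_USER_AGENT"),
        ),
    )
    db.commit()

def history(db, ip):
    """Return the pages requested from one IP address, oldest first."""
    rows = db.execute("SELECT page FROM hits WHERE ip = ? ORDER BY ts", (ip,))
    return [page for (page,) in rows]
```

    Each tracked page would call `record_hit(db, env)` once per request, and `history(db, ip)` then gives the path one address took through the site.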

    --


    Got Rhinos?
    1. Re:Not difficult by fdragon · · Score: 5, Informative

      But you cannot tie a particular IP address to a user. You have the problems of AOL users (each request from a different IP address) and corporations (and now many homes) using NAT or PAT devices to make 1 or more users have the same IP address.

      The best way to get around this is setting a session cookie via Apache. Then you key off that.
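
      Once every request carries a session id, keying off it is straightforward. The sketch below assumes the log has already been reduced to (session id, Unix timestamp) pairs; how the id gets into the log is not shown, and the function names are hypothetical. It estimates time on site as last request minus first request, so the time spent reading the final page is not counted.

```python
def session_durations(hits):
    """hits: iterable of (session_id, unix_timestamp) pairs (assumed layout).

    Returns {session_id: seconds between first and last request}.
    """
    first, last = {}, {}
    for sid, ts in hits:
        first[sid] = min(ts, first.get(sid, ts))
        last[sid] = max(ts, last.get(sid, ts))
    return {sid: last[sid] - first[sid] for sid in first}

def average_time_on_site(hits):
    """Mean session duration in seconds, or 0.0 when there are no sessions."""
    durations = session_durations(hits)
    if not durations:
        return 0.0
    return sum(durations.values()) / len(durations)
```

      Single-request sessions come out as zero seconds, which pulls the average down; whether to exclude them is a reporting decision.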

      --
      The program isn't debugged until the last user is dead.
    2. Re:Not difficult by damiangerous · · Score: 2
      The best way to get around this is setting a session cookie via Apache. Then you key off that.

      Then you run into people like me who routinely deny cookies unless the site has a valid reason for issuing them. This has become easier than ever for the average user with IE6's cookie management.

  6. W3Perl by Jon+Peterson · · Score: 4, Informative

    Well, if you are looking for free stuff...

    I'd recommend W3Perl (http://www.w3perl.com/softs/index.html), which is a kind of mess of perl scripts, but is surprisingly fast (much faster than other perl-only stats packages), and is the most full-featured free package I've ever come across.

    Set up is kind of a pain - it's rather complex, owing to the vast array of configurable thingies, but it works pretty well once it's put together.

    There are some genuinely innovative features, such as a tree view of your website weighted by the popularity of each branch from /index.html

    Worth a look if you are on a feature hunt. It requires some arcane image generation program to make the pretty graphs.

    Oh, and if you were hoping to explore the code - be aware that the guy who wrote it is French :-)

    --
    ----- .sig: file not found
  7. Sawmill by esme · · Score: 2, Interesting

    A couple of years ago, I did some research for webstats packages for our websites, and came up with a package that I haven't seen mentioned yet: Sawmill is the best tool for the kinds of questions you mentioned -- it can run as a CGI program (or as its own daemon) and does on-the-fly limiting, different reports, etc. So if they want to know what kind of browsers people were using in the Support section at 3am, they can get that.

    I put together a Perl CGI to handle combining logs from all of our different servers, and then feed the combined log to Sawmill (or FunnelWeb, the other package we wound up using).
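
    The original was a Perl CGI; the same idea is sketched here in Python, not as a reconstruction of that script. It assumes each server's log is already in time order and uses the Common Log Format timestamp, e.g. `[10/Oct/2000:13:55:36 -0700]`. The names `log_time` and `merge_logs` are hypothetical.

```python
import heapq
import re
from datetime import datetime

STAMP = re.compile(r"\[([^\]]+)\]")  # first [...] field on the line

def log_time(line):
    """Parse the bracketed timestamp of one Common Log Format line.

    Note: %b matches English month abbreviations in the default C locale.
    """
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no timestamp in line: %r" % line)
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")

def merge_logs(*streams):
    """Lazily merge several time-ordered line streams into one.

    Each stream (an open file or a list of lines) must already be sorted
    by timestamp; the zone offset is honored when comparing.
    """
    return heapq.merge(*streams, key=log_time)
```

    Since `heapq.merge` reads its inputs lazily, open file objects can be passed straight in and the combined log written out line by line without loading everything into memory.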

    -Esme

  8. Assumptions by heikkile · · Score: 2, Interesting
    From which link people exited the site etc

    Do not assume that people browse with just one browser window. I cannot speak for others, but normally, when I leave a site, I close the browser window where that site was. It is not often I follow a link out. If there are interesting links, I open them in new windows. It is not uncommon for me to have 16-32 windows open, often on 2-4 desktops.

    Yes, I know there are tricks to discourage this sort of browsing. Those also discourage me from visiting the sites, if I can find friendlier alternatives.

    --

    In Murphy We Turst

  9. add to webalizer by DeadSea · · Score: 2
    Webalizer itself is very configurable but its default configuration leaves a lot to be desired. I maintain a list of search engines and sites that should be added to the configuration of Webalizer to make it a lot more powerful. I also have a log sorting tool there to prevent webalizer from croaking on logs that are just a little bit out of order.

    Even if you don't find stats packages that do what you want, you can make webalizer a lot better.

  10. One word: Excel by YE · · Score: 2, Interesting

    ...is the statistics tool which is probably sufficient for the 99% of us outside the elite statistician circle.

    1. Re:One word: Excel by xrayspx · · Score: 2

      If you have 700MB per day per server across 5 servers, Excel falls down hard. WebTrends falls down hard. The last two companies I worked for both overcame these difficulties with web-bug pixel GIFs in the pages, which sent user agent, referrers and such to a database where we could analyze it.

      The good thing about doing the raw logs is that they give a better idea of how much traffic has passed off the site. This is usually more use to smaller sites that have to pay by bandwidth used, or more specifically, more use to their providers.

      If you're just looking at tracking specific data, there's no easier way than to have all that data written to a database. You can have your web bugs tweaked to save off exactly what you want: you get all your data, no extraneous crap, and you can track whatever.
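
      A web bug of this kind can be sketched as a tiny WSGI application that records the request and answers with a 1x1 transparent GIF. This is an illustration under assumptions, not the setup described above: `make_web_bug` is a hypothetical name, and a plain list stands in for the database.

```python
import base64

# A 1x1 transparent GIF, the usual payload for a web bug.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

def make_web_bug(store):
    """Return a WSGI app that appends one dict per request to `store`.

    `store` is any object with append(); a real deployment would write
    to a database instead (assumption for illustration).
    """
    def app(environ, start_response):
        store.append({
            "ip": environ.get("REMOTE_ADDR"),
            "agent": environ.get("HTTP_USER_AGENT"),
            # For an embedded image, the referrer is the page showing it.
            "referrer": environ.get("HTTP_REFERER"),
            "query": environ.get("QUERY_STRING", ""),
        })
        start_response("200 OK", [
            ("Content-Type", "image/gif"),
            ("Content-Length", str(len(PIXEL))),
            ("Cache-Control", "no-store"),  # ask clients not to cache the pixel
        ])
        return [PIXEL]
    return app
```

      Pages would reference the pixel with an ordinary image tag, and anything extra worth recording can be passed in the query string.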

  11. ModLogAn by chrysalis · · Score: 4, Interesting

    ModLogAn is the successor of Webalizer.

    It produces similar reports, but it can work with a lot of servers, including FTP servers, firewalls, a bunch of web servers, realserver, shoutcast, squid, etc.

  12. We used to use ILux by markhb · · Score: 2

    At my former company, we used ILux. It started as a simple log analyzer with a Java front end, and then evolved into a campaign-analysis/trip-through-the-site-tracking/customize-email-marketing-according-to-their-path-through-the-site behemoth. It costs, and (when I used it) it was cookie-based to enable the site tracking. We also didn't have much luck with it as it evolved, probably because we were using underpowered hardware (we really only wanted the log analyzer it started as) and it was a first release of the expanded product, but you may want to check into an eval.

    --
    Save Maine's economy: write stuff down. All comments are exclusively my own, not my employer.