Slashdot Mirror


Statistical Analyzers for HTTP Logs?

krishnaD asks: "I have been using webalizer to generate access log reports for the site, but lately my customers are asking for statistics like the average amount of time visitors spend on the site; if a person reaches page X, the probability that he goes from there to Y; which link people exited the site from; and so on. Basically, they are asking for a detailed flow analysis of the usage patterns of visitors. Are there any tools that will do this kind of analysis? I'd love to know what kind of tools other sysadmins use to generate reports for their clients."
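
The "page X to page Y" question above can be framed as a simple counting problem once hits have been grouped into per-visitor sessions (which is the hard part and is assumed to be done already here). The following is a minimal sketch, not the method of any tool named in this thread: the `sessions` data and the function name are made up for illustration, and an exit is treated as a move to a pseudo-page called "(exit)".

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate P(next page = Y | current page = X) from ordered page lists.

    Leaving the site is counted as a transition to the pseudo-page "(exit)".
    """
    counts = defaultdict(Counter)
    for pages in sessions:
        # Pair every page with the page that followed it; the last page
        # of a session is paired with "(exit)".
        for current, following in zip(pages, pages[1:] + ["(exit)"]):
            counts[current][following] += 1
    return {
        page: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for page, nexts in counts.items()
    }

# Hypothetical sessions: each is the ordered list of pages one visitor requested.
sessions = [
    ["/", "/products", "/order"],
    ["/", "/products"],
    ["/", "/about"],
    ["/products", "/order"],
]

probs = transition_probabilities(sessions)
print(probs["/products"])
```

In this toy data two of the three visits to /products went on to /order and one ended there, so the estimate for /products to /order is 2/3. The "(exit)" column doubles as an exit-page report.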

39 comments

  1. Karma whore link by Anonymous Coward · · Score: 2, Informative
    1. Re:Karma whore link by balamw · · Score: 2, Informative
      Of that bunch, I must say that I really like Webalizer. It produces really nice looking reports with pie and bar charts and the level of detail can be customized to almost any need. It's also nice that it'll work on both web server logs as well as squid logs....

      Analog may be the most popular, but I also found it rather difficult to set up and to get useful data into and out of.

      Balam

  2. Urchin - cream of the crop by proxybyproxy · · Score: 3, Informative

    I have tried webalizer and webtrends, but without a doubt, nr. 1 is Urchin. It really is the cream of the crop, but it costs too. You can check out a sample here.

    If you get an account with Verio, you will get your stats in Urchin for free.

    --

    Hurra for Knark!
    1. Re:Urchin - cream of the crop by twodiddyliddy · · Score: 2, Insightful

      I completely agree, urchin rules.

      When I worked for a .gone I was in charge of the stats analysis, and I looked into most of the major programs mentioned in this thread, and nothing comes close to urchin. As far as I see it, it only has two weaknesses:

      - cost
      - not open (there were a few features I would have loved to add/alter)

      One really beautiful feature it has is that you can incorporate your sales stats with the program. I haven't tried it yet, but from what I remember, it allows you to directly check your sales stats/visitors ratio, where your purchasers came from, how long they stayed etc.

      --
      To mænd sad i en tømmerflåde
    2. Re:Urchin - cream of the crop by demaria · · Score: 3, Informative

      Urchin is fast and awesome.

      But it doesn't have as much detail as other vendors like Webtrends. You can't really do campaign analysis.

      Review of these two

  3. More linkage by Dead+Penis+Bird · · Score: 3, Informative
    --

    If I weren't nailed to the penis, I'd be pushing up the daisies!

  4. WebTrends by krangomatik · · Score: 2, Insightful

    WebTrends offers software like this. We outsource a lot of our web stuff and one of our providers runs WebTrends and our people who like looking at pretty pictures really seem to like it. I have never installed or configured their software so I can't speak for ease of use, but the end user reports are easy to navigate. IIRC you can download a demo from their site and play with it. They do seem to have a demo report you can look at and see if this meets your needs.

    1. Re:WebTrends by jslag · · Score: 5, Informative

      I've used WebTrends for about a year, and couldn't be less impressed. Randomly chokes on logs that webalizer handles without trouble. Hard-to-use interface. Reports a number of things that you really can't tell from web logs.


      On the plus side, the PHBs love it.

    2. Re:WebTrends by ClickNMix · · Score: 1

      I've used WebTrends for about a year, and couldn't be less impressed. ... On the plus side, the PHBs love it.
      I have to pretty much agree with both points here, after using it a couple of years back. It actually seemed to get worse with new versions. And it was pretty costly for what it actually did.

      --
      I saw the light at the end of the tunnel... But it was just someone with a flashlight bringing more work.
    3. Re:WebTrends by Anonymous Coward · · Score: 2, Interesting

      I used WebTrends in a large corporate environment in which we were getting like 1+ million hits per week. I can tell you that, at least at that time (1999), WebTrends was complete garbage. The numbers were way off (+/- 25% or more) from what the actual logs said. It actually told me once that I had 900,000 hits this week -- and seven million of them occurred on Saturday! In a support call, I actually got one of their engineers to admit that the product sucked and we couldn't count on the numbers to be accurate.

      I wrote some perl scripts and used the GD modules to simulate something close to WebTrends output until I came up with something better.

      Soon I found analog. The charts were not nearly as pretty as WebTrends, but the numbers were accurate and it ran about 15-20 times faster.

      Finally, I found the ReportMagic add on for analog, and I started creating accurate -- and attractive -- reports again.

    4. Re:WebTrends by TheAlmightyQ · · Score: 1

      I've been running WebTrends reports for the management types for about 2 years now, and it's painfully slow, taking about an hour to generate the reports for our company intranet site. Just recently I got set up with WebTrendsLive, and I have to say it's a big improvement. It generates almost the same reports with the same pretty pictures and graphs, but makes them all up on the fly. Which means less work for me, woohoo!

      --
      I hope you're not pretending to be evil while secretly being good. That would be dishonest.
    5. Re:WebTrends by dubiousmike · · Score: 1

      Webtrends does the same as a lot of the free alternatives out there, but looks a lot prettier.

      Note: Many Mac users by default cannot properly navigate through Webtrends Reports (Log Analyzer) due to a Java issue, which is a problem where I work, since over half of the boxes are Macs.

  5. Not difficult by zpengo · · Score: 2

    It's not very difficult to implement one from scratch. On the pages you want to track, just call a function that sends the HTTP server variables (and other desired information) to a database. You can then use the IP address as an identifier and track a reader's history through the site. Trace the IPs, and you can get even more information. I've implemented a system on my site that basically tells me that folks from Los Angeles spend less time on a certain page than folks from Newark.
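
    As a rough illustration of the idea above, here is a minimal sketch that works from an existing access log rather than a database. It assumes Apache Common/Combined Log Format lines in chronological order, treats the IP address as the visitor identifier, and uses an arbitrary 30-minute idle timeout to split visits; the function name, the timeout, and the sample lines are all made up for the example.

```python
import re
from datetime import datetime, timedelta

# Matches the start of an Apache Common/Combined Log Format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)')
TIMEOUT = timedelta(minutes=30)  # assumed idle gap that ends a visit

def sessions_by_ip(lines):
    """Group hits into per-IP visits; returns {ip: [[(time, path), ...], ...]}."""
    visits = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, stamp, path = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        history = visits.setdefault(ip, [])
        # Continue the latest visit if the previous hit was recent enough.
        if history and when - history[-1][-1][0] <= TIMEOUT:
            history[-1].append((when, path))
        else:
            history.append([(when, path)])
    return visits

# Hypothetical log lines, already in time order.
sample = [
    '10.0.0.1 - - [10/Oct/2002:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2002:13:57:36 +0000] "GET /products.html HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2002:14:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2002:16:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
]

visits = sessions_by_ip(sample)
for ip, history in visits.items():
    for visit in history:
        duration = visit[-1][0] - visit[0][0]  # first hit to last hit
        print(ip, [path for _, path in visit], duration)
```

    The duration here only runs from the first hit to the last hit of a visit; the log says nothing about how long the visitor stayed on the final page, so any "time on site" figure built this way is an underestimate.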

    --


    Got Rhinos?
    1. Re:Not difficult by fdragon · · Score: 5, Informative

      But you cannot tie a particular IP address to a user. You have the problems of AOL users (each request from a different IP address) and corporations (and now many homes) using NAT or PAT devices to make 1 or more users have the same IP address.

      The best way to get around this is setting a session cookie via Apache. Then you key off that.

      --
      The program isn't debugged until the last user is dead.
    2. Re:Not difficult by ClickNMix · · Score: 1
      The best way to get around this is setting a session cookie via Apache. Then you key off that.

      That's fine for any new logs, but you also want something that works with your old data, even if it's not as easy a solution or requires a couple of different approaches. I don't think any manager type would be pleased without a retrospective view. (And if they didn't ask for it, adding it anyway can only help next time you ask them for a pay rise.)

      --
      I saw the light at the end of the tunnel... But it was just someone with a flashlight bringing more work.
    3. Re:Not difficult by damiangerous · · Score: 2
      The best way to get around this is setting a session cookie via Apache. Then you key off that.

      Then you run into people like me who routinely deny cookies unless the site has a valid reason for issuing them. This has become easier than ever for the average user with IE6's cookie management.

    4. Re:Not difficult by Anonymous Coward · · Score: 0

      Cookies? No. Try using all the HTTP header stuff the browser sends before any of that. You'll get make/model/version/OS. Then resort to cookies.

  6. W3Perl by Jon+Peterson · · Score: 4, Informative

    Well, if you are looking for free stuff...

    I'd recommend W3Perl http://www.w3perl.com/softs/index.html which is a kind of mess of perl scripts, but is surprisingly fast (much faster than other perl-only stats packages), and is the most full featured free package I've ever come across.

    Set up is kind of a pain - it's rather complex, owing to the vast array of configurable thingies, but it works pretty well once it's put together.

    There are some genuinely innovative features, such as a tree view of your website weighted by the popularity of each branch from /index.html

    Worth a look if you are on a feature hunt. It requires some arcane image generation program to make the pretty graphs.

    Oh, and if you were hoping to explore the code - be aware that the guy who wrote it is French :-)

    --
    ----- .sig: file not found
  7. Sawmill by esme · · Score: 2, Interesting

    A couple of years ago, I did some research for webstats packages for our websites, and came up with a package that I haven't seen mentioned yet: Sawmill is the best tool for the kinds of questions you mentioned -- it can run as a CGI program (or as its own daemon) and does on-the-fly limiting, different reports, etc. So if they want to know what kind of browsers people were using in the Support section at 3am, they can get that.

    I put together a Perl CGI to handle combining logs from all of our different servers, and then feed the combined log to Sawmill (or FunnelWeb, the other package we wound up using).
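
    For anyone wanting to try the log-combining step, here is a minimal sketch in Python (not the Perl CGI described above). It assumes each server's log is already in time order and that every line carries a `[dd/Mon/yyyy:HH:MM:SS +zzzz]` timestamp; the helper names and sample lines are invented for the example.

```python
import heapq
import re
from datetime import datetime

STAMP = re.compile(r"\[([^\]]+)\]")

def timestamp(line):
    """Parse the [dd/Mon/yyyy:HH:MM:SS +zzzz] field of a log line."""
    return datetime.strptime(STAMP.search(line).group(1), "%d/%b/%Y:%H:%M:%S %z")

def merge_logs(*logs):
    """Lazily merge already-sorted iterables of log lines into one ordered stream."""
    return heapq.merge(*logs, key=timestamp)

# Hypothetical per-server logs; real input would be open file objects.
server_a = [
    '10.0.0.1 - - [10/Oct/2002:13:00:00 +0000] "GET /a.html HTTP/1.0" 200 100',
    '10.0.0.1 - - [10/Oct/2002:13:00:10 +0000] "GET /c.html HTTP/1.0" 200 100',
]
server_b = [
    '10.0.0.2 - - [10/Oct/2002:13:00:05 +0000] "GET /b.html HTTP/1.0" 200 100',
]

merged = list(merge_logs(server_a, server_b))
for line in merged:
    print(line)
```

    Because `heapq.merge` only holds one pending line per input, this works on logs far larger than memory, which matters when the combined log is fed straight into an analyzer.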

    -Esme

  8. Net.Genesis by JAZ · · Score: 1

    Did you ever try netgenesis? It ain't cheap, but it does a lot of ad hoc reporting rather than the static reports that things like webtrends produce (and that urchin appears to produce, although I'm not really familiar with it).

    They also have an API that you can use to build custom functionality and/or match data against other systems (like a customer database)

    --


    "Karma can only be portioned out by the cosmos." -- Homer Simpson
  9. Assumptions by heikkile · · Score: 2, Interesting
    From which link people exited the site etc

    Do not assume that people browse with just one browser window. I can not speak for others, but normally, when I leave a site, I close that browser where that site was. It is not often I follow a link out. If there are interesting links, I open them in new windows. It is not uncommon for me to have 16-32 windows open, often on 2-4 desktops.

    Yes, I know there are tricks to discourage this sort of browsing. Those also discourage me from visiting the sites, if I can find friendlier alternatives.

    --

    In Murphy We Turst

    1. Re:Assumptions by elfkicker · · Score: 1

      How does browsing like you described affect the "exit page" feature of these programs?

      I always figured they were using some kind of best-guess algorithm, i.e. the first page of a session would be one without a local referer, and the last page of a session would be the last page visited with a local referer since the session started. Pulling the links over to another window, I'm pretty sure, sends the referer over. It might screw with the "visit path" features, but not with session time or exit page.
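
      The guess described above can be written down in a few lines. This is only a sketch of that heuristic, not how any of the packages in this thread are known to work; `LOCAL_HOST`, the function names, and the sample hits are hypothetical, and the hits are assumed to be one visitor's requests in time order.

```python
from urllib.parse import urlparse

LOCAL_HOST = "www.example.com"  # hypothetical name of the site being analyzed

def is_local(referrer):
    """True when the referrer points at a page on our own site."""
    return urlparse(referrer).hostname == LOCAL_HOST

def entry_and_exit_pages(hits):
    """hits: time-ordered (path, referrer) pairs for one visitor.

    Returns one (entry_page, exit_page) pair per guessed visit.
    """
    visits = []
    for path, referrer in hits:
        if visits and is_local(referrer):
            visits[-1][1] = path         # still inside the visit: update the exit guess
        else:
            visits.append([path, path])  # external or missing referrer: new visit
    return [tuple(visit) for visit in visits]

# Hypothetical hits; "-" is what the log shows when no referrer was sent.
hits = [
    ("/index.html", "http://search.example.net/?q=widgets"),
    ("/products.html", "http://www.example.com/index.html"),
    ("/order.html", "http://www.example.com/products.html"),
    ("/faq.html", "-"),
]

print(entry_and_exit_pages(hits))
```

      With this rule a page opened in a second window still counts as part of the same visit, because the browser sends a local referer for it; only the order of pages within the visit becomes unreliable.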

      Can anyone with more experience shed some light on how it is done?

    2. Re:Assumptions by Anonymous Coward · · Score: 0

      You are a strange person.

      No marketing person gives a shite about you.

  10. add to webalizer by DeadSea · · Score: 2
    Webalizer itself is very configurable but its default configuration leaves a lot to be desired. I maintain a list of search engines and sites that should be added to the configuration of Webalizer to make it a lot more powerful. I also have a log sorting tool there to prevent webalizer from croaking on logs that are just a little bit out of order.

    Even if you don't find stats packages that do what you want, you can make webalizer a lot better.

  11. One word: Excel by YE · · Score: 2, Interesting

    ...is the statistics tool which is probably sufficient for us 99% of the population outside the elite statistician circle.

    1. Re:One word: Excel by xrayspx · · Score: 2

      If you have 700MB per day per server across 5 servers Excel falls down hard. WebTrends falls down hard. The last two companies I worked for both overcame these difficulties with web-bug pixel gifs in the pages to track user agent, referrers and such to a database where we could analyze it.

      The good thing about doing the raw logs is that they give a better idea of how much traffic has passed off the site. This is usually more use to smaller sites that have to pay by bandwidth used, or more specifically, more use to their providers.

      If you're just looking at tracking specific data, there's no easier way than to have all that data written to a database. You can have your web bugs tweaked to save off exactly what you want; you get all your data, no extraneous crap, and you can track whatever.
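
      A minimal sketch of the database side of that approach, assuming SQLite as a stand-in for whatever database is actually used. The table layout and the `record_hit` function are invented for the example; in a real setup the function would be called by whatever script serves the 1x1 GIF, using the values from the request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real setup would use a persistent database
conn.execute(
    "CREATE TABLE hits ("
    " ts TEXT DEFAULT CURRENT_TIMESTAMP,"
    " page TEXT, user_agent TEXT, referrer TEXT)"
)

def record_hit(page, user_agent, referrer):
    """Store one tracking-pixel request with only the fields we care about."""
    conn.execute(
        "INSERT INTO hits (page, user_agent, referrer) VALUES (?, ?, ?)",
        (page, user_agent, referrer),
    )

# Hypothetical requests, as the pixel handler might report them.
record_hit("/products.html", "Mozilla/4.0 (compatible; MSIE 6.0)", "http://www.example.com/")
record_hit("/products.html", "Mozilla/5.0 (X11; Linux i686)", "http://search.example.net/")
record_hit("/order.html", "Mozilla/4.0 (compatible; MSIE 6.0)", "http://www.example.com/products.html")

# Reports then become ordinary SQL queries.
top = conn.execute(
    "SELECT page, COUNT(*) FROM hits GROUP BY page ORDER BY COUNT(*) DESC"
).fetchall()
print(top)
```

      The trade-off is the one already mentioned: only pages carrying the web bug are counted, so this complements the raw logs for bandwidth accounting rather than replacing them.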

    2. Re:One word: Excel by FatRatBastard · · Score: 1

      Yucko. Spreadsheets are fine for certain types of dataset, but a real pain in the ass in terms of usability for all but the most rudimentary statistical analysis. Give me a Stata command line any day, even if just calculating simple means and standard deviations. For calculating statistics on subsets of data a spreadsheet is going to be an exercise in torture.

  12. ModLogAn by chrysalis · · Score: 4, Interesting

    ModLogAn is the successor of Webalizer.

    It produces similar reports, but it can work with a lot of servers, including FTP servers, firewalls, a bunch of web servers, realserver, shoutcast, squid, etc.

    --
    {{.sig}}
  13. simple but effective by johnnycal · · Score: 0

    Advanced Web Statistics: a perl script that parses a combined log and spits out a lot of useful info.

    --
    yah, I brake it all.....
  14. Sawmill by Enoch · · Score: 1
    We use Sawmill where I work. It is very thorough, and it supports every different type of web log file format I can think of (Apache and IIS are supported, of course). Additionally, it is very thorough in its reports, graphs, and statistics, with ways to customize them.

    It's not free, but it is very nice.

    Jeremy

  15. phpOpenTracker by g_dancer · · Score: 1

    phpOpenTracker does not rely on logfiles, but seems to address your needs.

  16. Sawmill by PatSmarty · · Score: 1

    If you don't mind paying for such a program, I would recommend Sawmill for this task.

  17. if buying a soft is not a problem... by patpro · · Score: 1

    FunnelWeb is quite good, you can even DL a demo.

    http://www.quest.com/funnel_web/analyzer/

  18. We used to use ILux by markhb · · Score: 2

    At my former company, we used ILux. It started as a simple log analyzer with a Java front end, and then evolved into a campaign-analysis/trip-through-the-site-tracking/customize-email-marketing-according-to-their-path-through-the-site behemoth. It costs, and (when I used it) it was cookie-based to enable the site tracking. We also didn't have much luck with it as it evolved, probably because we were using underpowered hardware (we really only wanted the log analyzer it started as) and it was a first release of the expanded product, but you may want to check into an eval.

    --
    Save Maine's economy: write stuff down. All comments are exclusively my own, not my employer.
  19. Analog by tornado_norway · · Score: 0

    Analog is a nice log analyzer and fills all of my requirements, but a 100% accurate flow analysis of usage patterns is not possible. Since HTTP is a stateless protocol, it is difficult to know exactly what the client is doing.

    --
    "Trying is the first step towards failure."
    -Homer Simpson
  20. WebTrends Log Analyzer by wessman · · Score: 1

    WebTrends Log Analyzer is a great program and easy to find a crack on older versions.

  21. Blue Martini by Niet3sche · · Score: 1

    Was looking into this. I work with a data-mining group and we were going to do a POP project for them. But then things fell through ... so I assume that they still went ahead with this project. ~N~