Slashdot Mirror


Web Log Analyzers?

sammy.lost-angel.com asks: "What's the best web log analyzer out there today? It's time to upgrade our horribly out of date one and I'm not sure what's good out there at this time. Our site receives about 50,000 hits a day, so things like remembering what's already been analyzed can save a lot of time." What about log analyzers that can work on more than one type of web server? An analyzer that could parse access data for, say, IIS and Apache would be a nice tool!
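
For anyone scripting around an analyzer, "remembering what's already been analyzed" can be approximated by storing a byte offset between runs. What follows is a minimal Python sketch, not how any package named in this thread works; the function name `read_new_lines` and the JSON state file are assumptions made up for illustration.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f).get("offset", 0)

    # If the log is now shorter than the saved offset, assume it was
    # rotated or truncated and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial trailing line waits for next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    with open(state_path, "w") as f:
        json.dump({"offset": offset + end}, f)
    return lines
```

Each scheduled run would then feed only the returned lines into whatever running totals are kept, so old entries are never re-read.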

20 of 31 comments

  1. My 2 bits about 2 packages by azephrahel · · Score: 2, Insightful

    Well the two web log analyzers I worked with at my old job were,
    WebTrends Professional
    and WebSphere Site Analyzer.

    Bottom line with WebTrends is, it's junk. It costs a bundle, is more expensive for the Unix version, and you need one base license for the first machine whose logs you want to analyze, plus one supplemental license for each additional machine. If your site spans four boxes, you need a base + 3 additional. PRICEY! To boot, it is not very configurable, and it has a helluva time counting user sessions by custom cookies.

    WebSphere Site Analyzer, on the other hand, is a behemoth of a program. It requires far too many resources to run, takes forever to configure properly, and needs a tweaked version of DB2. On the plus side, it's highly configurable and comes "free" with WebSphere server, AFAIK. You can count anything on anything if you really want to, and you don't need a special version to run your own queries against the data. All the data is in DB2, so you are free to probe it all on your lonesome. With WebTrends you need a special version to get access to the database, and then the access is only through their proprietary libs. Of course, the other big plus for Site Analyzer is that it has a client-server model, and both parts can run on Linux, Solaris, HP-UX, Windows, etc.

    To be honest, those are the two biggies for commercial site analysis software, and neither is that good. Check out some of the open source offerings; perhaps one of them will work for you :)

    --
    You are only young once, but you can stay immature indefinitely.
  2. Analog and Webalizer by rakerman · · Score: 3

    http://www.analog.cx/

    http://www.webalizer.com/

    1. Re:Analog and Webalizer by larien · · Score: 2
      The fact is, crunching 6 months of logs isn't that big a problem. You probably only want to run reports covering:
      1. The total year (run once a year)
      2. The total month (run once a month)
      3. The last N days/weeks/months (I'd have said the last 30 days would be good enough; run this daily or weekly, depending on your tastes)
      Only the 3rd option there should cause any trouble; the others are run infrequently enough that you don't care if they take a while to run. If you're worried about affecting your web server while crunching the data, remember that you don't have to run the analyzer on the web server!

      Personally, I've used Analog as listed above and found it to be pretty good, once you get the configuration working (which is a once off thing). It will also work on compressed logs, IIRC, so you can even save some disk space (at the expense of more CPU time at analysis).
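
The "last N days, straight from compressed logs" idea above can be sketched in a few lines of Python. This is a rough illustration rather than what Analog does internally; it assumes Apache-style timestamps such as `[10/Oct/2000:13:55:36 -0700]`, and the names `open_log` and `hits_per_day` are invented for the example.

```python
import gzip
from collections import Counter
from datetime import datetime, timedelta


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def hits_per_day(paths, days, now):
    """Count requests per day over the last `days` days.

    `now` must be a timezone-aware datetime so it can be compared
    with the timestamps parsed from the log.
    """
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for path in paths:
        with open_log(path) as f:
            for line in f:
                # The timestamp sits between the first '[' and ']'.
                start, end = line.find("["), line.find("]")
                if start == -1 or end == -1:
                    continue
                try:
                    ts = datetime.strptime(
                        line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z"
                    )
                except ValueError:
                    continue  # skip lines that do not parse
                if ts >= cutoff:
                    counts[ts.date().isoformat()] += 1
    return counts
```

Run daily over the current log plus the most recent rotated `.gz` files, this covers the third report type without touching the older archives.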

    2. Re:Analog and Webalizer by Stephen · · Score: 2
      [Analog] will also work on compressed logs, IIRC, so you can even save some disk space (at the expense of more CPU time at analysis).
      ... but less elapsed time. Analog is primarily limited by disk speed, so you will get the results sooner from compressed logs than uncompressed ones. Strange but true.
      --
      11.00100100001111110110101010001000100001011010001 1000010001101001100010011
    3. Re:Analog and Webalizer by Stephen · · Score: 3, Informative
      ...it can be hard to dig through the [analog] documentation.
      I (the author) have some sympathy with this; but the main problem is that it's so configurable that there just are a lot of commands.

      I have done some work recently on presenting the documentation in different ways. As well as the main topic-based documentation, there's now a page with only the most basic commands for beginners; a comprehensive index; all the commands on a single page with a BNF-type grammar; and two sample configuration files with all the commands in, one in topic order and one in report order. There's also the beginnings of a collection of third-party HOWTO's (for which I need more volunteers, HINT HINT!).

      I do take a lot of time and trouble over documentation, I suspect much more than most open source projects. My rule is that no change can be committed until it's fully documented. So you will never find the documentation lagging behind the reality, or options missed out of the documentation. I also spend a lot of time rephrasing the existing documentation.

      --
      11.00100100001111110110101010001000100001011010001 1000010001101001100010011
    4. Re:Analog and Webalizer by larien · · Score: 2
      I guess it depends on the disks and the CPU; you can run a SCSI-2/Fibre hard drive (not to mention striping/mirroring etc) off a fairly crummy CPU and you'd probably find the CPU as the bottleneck. On the other hand, an old hard drive on a P4/Athlon would probably be better compressed.

      The point is well made, though; hard disks tend to be the bottleneck on today's systems with clock speeds in the gigahertz.

  3. I have to agree by Mustang+Matt · · Score: 2

    I'm not sure about the other package, but Webtrends leaves much to be desired.

    --
    The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
  4. What's it matter what server generates the logs? by barzok · · Score: 2, Informative

    Last I checked, both IIS and Apache generate (or can be set to generate) W3C standard format logfiles. Part of the reason for having/using that standard is so that you don't get locked into a proprietary tool.

  5. seconded by mattdm · · Score: 2

    yah. And analog is *fast*.

  6. webalizer & awstats by EvilStein · · Score: 2, Informative

    awstats (awstats.sourceforge.net) for the IIS logs, but it's kind of funky to set up...

    and webalizer (www.mrunix.net/webalizer) for the Apache logs.

    awstats is Perl, too...

    I've used them both and since I have only Apache to log, I've stuck with webalizer. Plus, you can easily customize it for each user/domain with its own webalizer.conf file.

  7. AWstats rocks! by OctaneZ · · Score: 3, Informative

    I have been running AWStats since July, and I absolutely love it. It does not provide the fine-grained detail that many people need, and which Analog can provide. But it does provide exactly what 90 percent of us need, in an easy-to-view package. It creates an easy-to-understand page covering many aspects of your site, including users, page hits, countries, languages, OS, browser, spiders/robots, and access times; it's great! It is also a GPLed Perl script! The development team is over at SourceForge and is actively releasing new code all the time. It also has the added benefit of allowing CGI updating through a web page; simply putting the script in your /www/cgi-bin/ directory and setting appropriate permissions lets you get up-to-the-second information about your site without having to dig up a terminal! Definitely check this package out!
    -OctaneZ

    1. Re:AWstats rocks! by damiam · · Score: 2, Insightful
      "I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones."

      If you're going to quote someone, at least give them proper credit:
      "I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones." - Albert Einstein

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
  8. Analog by Stephen · · Score: 5, Informative
    I'd like to plug analog. I'm the author, so read my comments in that light. :-)

    First, as others have commented, the commercial programs suck, especially Webtrends.

    Analog is over six years old, but it's still actively developed, and I think it's still the leading free log analyser. The main contender is the Webalizer. To some extent it depends what you want (why not try out both?). The Webalizer's biggest advantage is that it produces prettier pictures. Some of analog's advantages are that it is more configurable; that it runs on any OS (the Webalizer is Unix only); and that it can analyse logfiles from any web server.

    Besides, analog's author reads Slashdot.

    --
    11.00100100001111110110101010001000100001011010001 1000010001101001100010011
    1. Re:Analog by Stephen · · Score: 2
      Webalizer's biggest advantage is that it produces prettier pictures.
      I should add that you can make analog's output prettier for your PHB if you use Report Magic with analog.
      --
      11.00100100001111110110101010001000100001011010001 1000010001101001100010011
    2. Re:Analog by frankie · · Score: 4, Interesting

      I use Analog exclusively (well, after DNSTran for name lookups and Perl to sort out sub-logs) and I have found little reason to complain. As Stephen mentioned, you can use ReportMagic to prettify the output. I don't bother.

      My only complaint is Stephen's dogmatic insistence on not performing any form of speculative analysis. For example, he refuses to even attempt visitor counting, path tracking, etc. The sort of stuff that bosses like to see, whether or not it's strictly accurate.

      Stephen could put WebTrends out of business with a couple hours of coding, but he has his principles.

  9. Re:What's it matter what server generates the logs by Stephen · · Score: 3, Insightful
    Last I checked, both IIS and Apache generate (or can be set to generate) W3C standard format logfiles. Part of the reason for having/using that standard is so that you don't get locked into a proprietary tool.
    You might think so, but IIS breaks the standard in several ways. And it's not even really a standard, just an early working draft that was never finished.

    In my opinion, a good logfile analysis tool should be able to recognise and analyse all commonly-used formats, and provide a means to specify custom formats. In other words, it should work with what the server has already produced, rather than force the server administrator to reconfigure the server and ignore old logfiles. My program analog does all this, but most programs don't.
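
To make the "recognise several formats" point concrete, here is a small Python sketch that accepts either NCSA Common Log Format lines or W3C-extended-style lines introduced by a `#Fields:` directive. It is an illustration only and is not taken from analog; the regular expression, the function name `parse_log`, and the plain whitespace split for W3C fields are simplifying assumptions, and real IIS output may need more special cases.

```python
import re

# NCSA Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)


def parse_log(lines):
    """Yield one dict per request from CLF or W3C-extended-style lines."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # W3C style: the directive names the columns that follow.
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#") or not line.strip():
            continue  # other directives and blank lines
        elif fields is not None:
            values = line.split()
            if len(values) == len(fields):
                yield dict(zip(fields, values))
        else:
            m = CLF.match(line)
            if m:
                yield m.groupdict()
```

Because the format is detected from the data itself, old logfiles can be fed in unchanged, with no server reconfiguration.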

    --
    11.00100100001111110110101010001000100001011010001 1000010001101001100010011
  10. speculative analysis by KyleCordes · · Score: 2

    I agree.

    I have customers who want to see something that guesses at number of unique visitors, guesses at paths through the site, etc. They don't want to study, and don't care about the details of why it's unknowable information... they're used to seeing it from other packages, and complain that it's missing.

    Any kind of wild guess, with a bunch of caveats on the output, would be much more useful than the explanation of why this analysis is not done.
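
A "wild guess with caveats" of the kind described above might look like the following Python sketch. The heuristic (treat each host plus user-agent pair as one visitor, and start a new visit after 30 minutes of silence) is a common convention assumed here for illustration, not something any package in this thread is confirmed to use. Proxies, NAT, and dynamic addresses all distort the result, so the numbers are estimates only. The function name `estimate_visits` is hypothetical.

```python
from datetime import timedelta


def estimate_visits(requests, timeout=timedelta(minutes=30)):
    """Guess visitor and visit counts from (time, host, user_agent) tuples.

    A "visitor" is a distinct (host, user_agent) pair; a new "visit"
    starts when that pair has been idle for longer than `timeout`.
    """
    last_seen = {}
    visits = 0
    for when, host, agent in sorted(requests, key=lambda r: r[0]):
        key = (host, agent)
        previous = last_seen.get(key)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[key] = when
    return {"visitors": len(last_seen), "visits": visits}
```

Any report built on this should print the caveats next to the figures, since the underlying quantity cannot be known from the log alone.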

  11. Wusage by angry_beaver · · Score: 2, Interesting

    Our company uses Wusage and it's quite a nice package IMHO.
    It doesn't generate very pretty reports by default, but it is highly customizable and provides a truck load of data.

    Note: I am not affiliated with the makers of Wusage in any way.

  12. Sawmill is great. by tdyson · · Score: 2, Interesting

    Sawmill, by Flowerfire, is pretty cool. It understands virtually every log format you can imagine. It'll run as a CGI, via the CLI, or as a standalone web server. There are versions for many different platforms. With the web interface, the marketing group can do their own drill-down and queries, so I can do some real work. Performance is good. I think of it as the program that WebTrends wishes it was. Get the eval version and take it for a spin.

  13. Pointy clicky lower end analyzer by crisco · · Score: 2
    In case your pointy clicky windows boss or marketing people want to use something, I can suggest Open Web Scope, a shareware type app.

    In the meantime, use analog or webalizer to get the full skinny on your traffic.

    --

    Bleh!