Webtrends - Reporting Site Usage and Other Stats?
gammoth asks: "My company has a successful web site which gets roughly 1,800,000 hits from 45,000 sessions a day. A few years ago, our web stats software, HitList, broke when we crossed its capacity threshold (~1,000,000 hits). I replaced it with a tailored version of Webalizer supported by an array of perl scripts and a Suitespot server plugin. My reporting system runs with little intervention, managing log files from 4 hosts, and competently reports on hits, popular pages, referrers, etc. But it's not perfect and I'm the first to admit it doesn't provide the kind of info the marketing department would find really useful. I have plans of a comprehensive system using a DB and a report engine, but I've not had the time to implement it. (We're interested in info on marketing campaign success, path through site, etc). Meanwhile, marketing is tired of waiting and the otherwise exceptionally supportive IT management (truly) is considering contracting out some of our site usage reporting. Webtrends is being looked at seriously. I was wondering if any readers out there had had any experience with Webtrends or other software package or service provider. Are there any OS packages that provide features well beyond Webalizer?"
WebTrends annoys me greatly, because it is poorly documented, has a sucky interface, and misleads naive users into thinking they are getting reports on "visitors" and "sessions" when in fact they are simply getting stats on a window of visits from an IP number.
Read the document "Why web usage statistics are worse than meaningless" and memorise it.
Also, remind your marketing folks that quantitative data from your logfiles can only be interpreted with qualitative data from interviews/focus groups/usability studies. If people stay for less time on your site than before, is it because your design sucks, or because they found what they wanted and left quickly? Only qualitative research can tell you.
Whenever marketing people spot trend variations, they will ask you why. You will need to know the above in order to respond properly.
It's closed-source, commercial software, but I've been a big fan of NetTracker from Sane Solutions for a few years now.
I use it in an ISP environment, running with Apache logs on FreeBSD, and haven't had a problem with it yet. Plus, their support is outstanding.
It's one of the few pieces of closed-source software I have recommended. They have a demo version, so you can try it out on your logfiles and see if it works for you. But I highly recommend it.
Disclaimer: I have no relationship with Sane aside from being a happy customer.
I used WebTrends for several sites with about the same traffic you're looking at analyzing. In short, WebTrends sucks bigtime. It would crash for no reason almost every day, and their DNS resolver code is sloooooooow. I had to write a custom DNS resolver that would replace all of the IPs with the hostnames in the logfiles before running them through WebTrends. I've used both the Windows version and the Linux WebTrends server. The Windows version actually worked better, but it still sucked bigtime. Their customer support sucks too. A week after I spent $2000 on their software, which was filled with bugs, a new version came out. The new version fixed most of the bugs, but they were going to make me buy it again to get the upgrade. Analog with Report Magic did the same things WebTrends did, but it was free, and it worked much better.
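For anyone wanting to try the same pre-resolving trick, here is a rough sketch of the idea in Python. It is not the resolver described above, just one way to do it, and it assumes Common Log Format lines that start with an IPv4 address. The names `resolve_log_lines` and `reverse_lookup` are made up for this example. Each distinct IP is looked up only once, which is where most of the time is saved.

```python
import re
import socket

# Assumption: each log line starts with a dotted-quad IPv4 address
# followed by whitespace, as in Common Log Format.
LEADING_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})(?=\s)")


def reverse_lookup(ip):
    """Return the hostname for ip, or the ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # No PTR record, timeout, etc.: keep the raw address.
        return ip


def resolve_log_lines(lines, lookup=reverse_lookup):
    """Yield log lines with the leading IP replaced by its hostname.

    Results are cached in a dict so each distinct IP is resolved once.
    """
    cache = {}
    for line in lines:
        match = LEADING_IP.match(line)
        if match:
            ip = match.group(1)
            if ip not in cache:
                cache[ip] = lookup(ip)
            line = cache[ip] + line[match.end():]
        yield line
```

You would feed it an open logfile and write the output to a new file, then point the analyzer at that file with its own DNS lookups switched off. The `lookup` parameter is there so you can swap in a faster or parallel resolver later.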
Another package I've used is Accrue. I think this is by the same people that make HitList, but it's much better. It's not without its problems, but it would work great for a site with the amount of traffic that you are analyzing. We didn't run into problems until trying to analyze more than 150 million hits/day. It has a sniffer that sits on your network and watches web traffic. It generates its own logs, which are more comprehensive than your webserver logs. Every hour, it uploads its data to the "warehouse" box, which analyzes it at the end of the day. It requires beefy hardware: big, expensive Sun Enterprise systems. It has some nice marketing stats stuff, like path analysis, and other crap. Very expensive though; expect to spend 5 to 6 figures on the software, and another 5 to 6 figures on hardware. They purchased another company that did nearly the same thing about a year ago, and they have a new version based on the technology from the other company, version 6.0/6.1. I haven't used the new version, but supposedly it's much better. The price is still insane though, so unless this is something you really really need, I'd stay away. It also requires a good DBA who knows RedBrick or Oracle (you can use either for a database).
Another option is a managed log service like Digimine. They work well, but it's a recurring fee since it's a service, not software. And you have to upload your logs to them every day.
There's a company that's been hitting me up lately, I forget their name now. But they have a Linux-based version which has clustering capability. The database is stored compressed in chunks across the entire cluster. It scales linearly, so you can add machines as you need them. They've been taking business away from Digimine and Accrue. They are based in Minneapolis I think, but like I said, I forget their name now. Their software can correlate different logs together too, and get you stats on email campaigns, video streaming, and your webservers. If you're into spending money, this would likely be your best bet.
I would stay far far away from WebTrends if I were you. WebTrends is a sucky product, and you can get the same info with Analog and ReportMagic, for free, and with better performance. 1.8 million hits isn't really that much, so a product like Accrue would likely be overkill. And most companies balk at services since they can't depreciate the expenditure over time; it's an operating cost, not a capital expense.
I've read that document before, and I suggest that perhaps you need to re-read it with a more jaundiced eye towards your prejudices.
The document now contains several disclaimers admitting that the author's original conclusions have been undermined somewhat by his own hyperbole, ignorance and by new technology (the original was written in 1995 - in web terms, it may as well be written in hieroglyphics on decaying papyrus) (ok, so that's a little exaggeration of my own... :P ). It's still worth reading, but only after you filter it a little.
In particular, he doesn't account for cookies, which are great for web tracking (personally, I block nearly all cookies, but I don't think that session tracking is a malicious use). Cookies can give you very accurate data on visitor use, and proper reporting can turn that into very useful information.
Also, the points he (or she) and you make about IP addresses vs. sessions vs. users are valid, but overblown. Very few people access the same site from different IP addresses in a given session. You wouldn't want to bet your life savings on these numbers, but they're accurate over 90% of the time, and that's more than enough to get good information (as someone else once said, "Don't believe me? Next time you have a blood test, tell them to take it all to make sure they get an accurate reading.").
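To make the "session = window of visits from an IP" idea concrete, here is a minimal sketch of how IP-based session counting is commonly done. It is not WebTrends' actual algorithm. The 30-minute idle timeout and the name `count_sessions` are assumptions for this example.

```python
from datetime import datetime, timedelta


def count_sessions(hits, timeout=timedelta(minutes=30)):
    """Count sessions in an iterable of (ip, datetime) pairs.

    A new session starts whenever an IP has been idle for longer than
    timeout (30 minutes here, an assumed value). Two people behind the
    same proxy look like one session, which is exactly the limitation
    being argued about.
    """
    last_seen = {}
    sessions = 0
    for ip, when in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > timeout:
            sessions += 1
        last_seen[ip] = when
    return sessions


if __name__ == "__main__":
    start = datetime(2001, 1, 1, 12, 0)
    sample = [
        ("1.1.1.1", start),
        ("2.2.2.2", start + timedelta(minutes=5)),
        ("1.1.1.1", start + timedelta(minutes=10)),  # same session
        ("1.1.1.1", start + timedelta(hours=2)),     # idle too long: new session
    ]
    print(count_sessions(sample))  # prints 3
```

Swapping the IP for a cookie value as the key gives the cookie-based version of the same count.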
We've used WebTrends for months, and I like them quite a lot. For some things they are excellent; for others, not so. A word about methodology: WebTrends' tracking code consists of a primary method and a fallback. The primary method uses JavaScript to compute a compressed string of data including much client information and appends this to an HTML image tag - this data is slurped into a database at WebTrends. If JavaScript is disabled, the hit still gets recorded, but without all the fancy extra info. They try to place a unique, persistent cookie with each image load (once per page).
According to WebTrends, over 95% of our visitors have both cookies and JavaScript enabled.
Their reporting tools are very good and comprehensive, containing everything I've seen from the best log analysis software and some things that software can't get (average screen resolution and window size, for instance - I love this). You can customize content groups to your heart's content by modifying some variables in their JS. Their site itself is well made and smart: their help system pops up a content-sensitive window with information for each specific page; if you click to a new page, the help window is updated. Yes, this is relatively easy to implement, but how many sites do it? Too few.
Now, not all is Madam George and roses (to coin a phrase). I've found that WebTrends reports at best 95% of our traffic. Periodically I run a couple of home-brew Perl scripts on our logs, and they always count more hits than WebTrends shows (not an issue with my Perl-fu, BTW). Their tech support is decent, but not wonderful - if you have a real issue, you might run around a little. A couple of times they've flat-out dropped large chunks of our traffic (e.g. 40% for a day), never to be seen again.
Finally, we get about 10% of the traffic the original poster does, so I can't tell you how well they scale. They'll charge a pretty penny for that amount of traffic, too.
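If you want to run the same sanity check on your own logs, something like the following would do it. It is a rough Python sketch, not the Perl scripts mentioned above, and it assumes Common Log Format timestamps such as `[10/Oct/2000:13:55:36 -0700]`. The function names and the `reported` numbers are hypothetical; you would type in whatever daily totals the hosted service shows you.

```python
import re
from collections import Counter

# Assumption: Common Log Format timestamps, e.g. [10/Oct/2000:13:55:36 -0700].
CLF_DATE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def hits_per_day(lines):
    """Count log lines per day, keyed by the date string from the log."""
    counts = Counter()
    for line in lines:
        match = CLF_DATE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def coverage(log_counts, reported_counts):
    """Return {day: reported / logged} for every day found in the logs."""
    return {
        day: reported_counts.get(day, 0) / logged
        for day, logged in log_counts.items()
        if logged
    }
```

A day where the ratio drops well below the usual level is a day where the service lost traffic.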
To summarize (whew): (a) WebTrends is pretty decent, and excellent for some things; (b) IP-based assumptions and cookie tracking can get you very accurate statistics as long as you can live with the limitations.
This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."