Slashdot Mirror


Webtrends - Reporting Site Usage and Other Stats?

gammoth asks: "My company has a successful web site which gets roughly 1,800,000 hits from 45,000 sessions a day. A few years ago, our web stats software, HitList, broke when we crossed its capacity threshold (~1,000,000 hits). I replaced it with a tailored version of Webalizer supported by an array of perl scripts and a Suitespot server plugin. My reporting system runs with little intervention, managing log files from 4 hosts, and competently reports on hits, popular pages, referrers, etc. But it's not perfect and I'm the first to admit it doesn't provide the kind of info the marketing department would find really useful. I have plans of a comprehensive system using a DB and a report engine, but I've not had the time to implement it. (We're interested in info on marketing campaign success, path through site, etc). Meanwhile, marketing is tired of waiting and the otherwise exceptionally supportive IT management (truly) is considering contracting out some of our site usage reporting. Webtrends is being looked at seriously. I was wondering if any readers out there had had any experience with Webtrends or other software package or service provider. Are there any OS packages that provide features well beyond Webalizer?"

11 of 28 comments

  1. Kill Marketing by Jester998 · · Score: 2

    "supported by an array of perl scripts... doesn't provide the kind of info the marketing department would find really useful"

    I think that a LART, applied tactfully, is in order. Obviously, the marketing department needs a crash course in the elegance of Perl. =)

  2. Hating WebTrends, and dealing with Marketing by judd · · Score: 5, Insightful
    I much prefer less flashy but more capable tools such as Analog (http://www.analog.cx).

    WebTrends annoys me greatly, because it is poorly documented, has a sucky interface, and misleads naive users into thinking they are getting reports on "visitors" and "sessions" when in fact they are simply getting stats on a window of visits from an IP number.
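
    To make the "window of visits from an IP number" point concrete, here is a minimal sketch of how a log analyzer might count sessions. It is not WebTrends' actual algorithm; the 30-minute timeout and the function name `count_sessions` are assumptions chosen for illustration.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed, conventional cutoff


def count_sessions(hits, timeout=SESSION_TIMEOUT):
    """Count 'sessions' as runs of hits from one IP with gaps <= timeout.

    hits: iterable of (ip, unix_timestamp) pairs, in any order.
    """
    by_ip = defaultdict(list)
    for ip, ts in hits:
        by_ip[ip].append(ts)

    sessions = 0
    for timestamps in by_ip.values():
        timestamps.sort()
        sessions += 1  # the first hit from this IP opens a session
        for prev, cur in zip(timestamps, timestamps[1:]):
            if cur - prev > timeout:
                sessions += 1  # a long gap starts a new session
    return sessions


# Made-up data: two people behind one proxy and one returning visitor.
hits = [
    ("10.0.0.1", 0),     # person A via a proxy
    ("10.0.0.1", 60),    # person B via the same proxy
    ("10.0.0.2", 0),     # person C
    ("10.0.0.2", 4000),  # person C again, more than 30 minutes later
]
print(count_sessions(hits))
```

    In the made-up data, two people behind one proxy collapse into a single "session" while one returning person is counted twice, which is exactly the distortion described above.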

    Read this document Why web usage statistics are worse than meaningless and memorise it.

    Also, remind your marketing folks that quantitative data from your logfiles can only be interpreted with qualitative data from interviews/focus groups/usability studies. If people stay for less time in your site than before, is it because your design sucks, or because they found what they wanted and left quickly? Only qualitative research can tell you.

    Whenever marketing people spot trend variations, they will ask you why. You will need to know the above in order to respond properly.

  3. I've had good luck with NetTracker by lunenburg · · Score: 4, Informative

    It's closed-source, commercial software, but I've been a big fan of NetTracker from Sane Solutions for a few years now.

    I use it in an ISP environment, running with Apache logs on FreeBSD, and haven't had a problem with it yet. Plus, their support is outstanding.

    It's one of the few pieces of closed-source software I have recommended. They have a demo version, so you can try it out on your logfiles and see if it works for you. But I highly recommend it.

    Disclaimer: I have no relationship with Sane aside from being a happy customer

    1. Re:I've had good luck with NetTracker by speleo · · Score: 2, Informative

      Same here.

      We've been using NetTracker for over 3 years now and our customers love it.

      Great support and it runs on most platforms. We use it on Solaris now but ran it under Linux for several years with no problems.

  4. Rings a bell... by twodiddyliddy · · Score: 2, Informative

    This story looks a lot like this ask slashdot.

    --
    Two men sat on a raft
  5. WebTrends sucks by austad · · Score: 4, Informative

    I used WebTrends for several sites with about the same traffic you're looking at analyzing. In short, WebTrends sucks bigtime. It would crash for no reason almost every day, and their DNS resolver code is sloooooooow. I had to write a custom DNS resolver that would replace all of the IPs with the hostnames in the logfiles before running it through webtrends. I've used both the Windows version, and the Linux Webtrends server. The windows version actually worked better, but it still sucked bigtime. Their customer support sucks too. A new version came out a week after I spent $2000 on their software, which was filled with bugs. The new version fixed most of the bugs, but they were going to make me buy it again to get the upgrade. Analog with Report Magic did the same things webtrends did, but it was free, and it worked much better.
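
    A pre-resolving step of the kind described above could look roughly like the following. This is only a sketch, not the poster's actual resolver: it assumes the client IPv4 address is the first field on each log line (as in Common Log Format), and the names `resolve_line` and `resolve_log` are made up for illustration.

```python
import re
import socket
from functools import lru_cache

# Matches a dotted-quad IPv4 address at the very start of a log line.
LEADING_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})(?=\s)")


@lru_cache(maxsize=None)
def reverse_lookup(ip):
    """Return the hostname for ip, or the ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def resolve_line(line, lookup=reverse_lookup):
    """Replace a leading IP address with its hostname; leave other lines alone."""
    return LEADING_IP.sub(lambda m: lookup(m.group(1)), line, count=1)


def resolve_log(src_path, dst_path, lookup=reverse_lookup):
    """Copy a logfile, resolving the leading IP on each line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(resolve_line(line, lookup))
```

    The cache matters more than anything else here: a busy site sees the same addresses over and over, so each one is only looked up once per run.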

    Another package I've used is Accrue. I think this is by the same people that make HitList, but it's much better. It's not without its problems, but it would work great for a site with the amount of traffic that you are analyzing. We didn't run into problems until trying to analyze more than 150 million hits/day. It has a sniffer that sits on your network and watches web traffic. It generates its own logs which are more comprehensive than your webserver logs. Every hour, it uploads its data to the "warehouse" box which analyzes it at the end of the day. It requires beefy hardware, big expensive Sun Enterprise systems. It has some nice marketing stats stuff, like path analysis, and other crap. Very expensive though, expect to spend 5 to 6 figures on the software, and another 5 to 6 figures on hardware. They purchased another company that did nearly the same thing about a year ago, and they have a new version based on the technology from the other company, version 6.0/6.1. I haven't used the new version, but supposedly it's much better. The price is still insane though, so unless this is something you really really need, I'd stay away. It also requires a good DBA who knows RedBrick or Oracle (you can use either for a database).

    Another option is a managed log service like Digimine. They work well, but it's a recurring fee since it's a service, not software. And you have to upload your logs to them every day.

    There's a company that's been hitting me up lately, I forget their name now. But they have a linux based version which has clustering capability. The database is stored compressed in chunks across the entire cluster. It scales linearly, so you can add machines as you need them. They've been taking business away from Digimine and Accrue. They are based in Minneapolis I think, but like I said, I forget their name now. Their software can correlate different logs together too, and get you stats on email campaigns, video streaming, and your webservers. If you're into spending money, this would likely be your best bet.

    I would stay far far away from WebTrends if I were you. Webtrends is a sucky product, and you can get the same info with Analog and ReportMagic, for free, and with better performance. 1.8 million hits isn't really that much, so a product like Accrue would likely be overkill. And most companies balk at services since they can't depreciate the expenditure over time, it's an operating cost not a capital expense.

    --
    Need Free Juniper/NetScreen Support? JuniperForum
    1. Re:WebTrends sucks by Wanker · · Score: 4, Informative

      My biggest gripe with WebTrends is how they try to "dumb it down" so that any bozo who can spell HTML can use it. This in itself is not all bad, but there is absolutely no facility to have it reveal how it arrived at the numbers it did.

      You have to have blind faith in the product.

      Try feeding WebTrends a custom log that isn't in its predefined types. It will not error out, it will not complain a bit, it just parses the log incorrectly and produces completely meaningless output.

      How can you tell this completely meaningless garbage output from a properly parsed logfile?

      You can't.

  6. NOT meaningless by legLess · · Score: 4, Interesting

    I've read that document before, and I suggest that perhaps you need to re-read it with a more jaundiced eye towards your prejudices.

    The document now contains several disclaimers admitting that the author's original conclusions have been undermined somewhat by his own hyperbole, ignorance and by new technology (the original was written in 1995 - in web terms, it may as well be written in hieroglyphics on decaying papyrus)(ok, so that's a little exaggeration of my own... :P ). It's still worth reading, but only after you filter it a little.

    In particular, he doesn't account for cookies, which are great for web tracking (personally, I block nearly all cookies, but I don't think that session tracking is a malicious use). Cookies can give you very accurate data on visitor use, and proper reporting can turn that into very useful information.

    Also, the points he (or she) and you make about IP addresses vs. sessions vs. users are valid, but overblown. Very few people access the same site from different IP addresses in a given session. You wouldn't want to bet your life savings on these numbers, but they're accurate over 90% of the time, and that's more than enough to get good information (as someone else once said, "Don't believe me? Next time you have a blood test, tell them to take it all to make sure they get an accurate reading.").

    We've used WebTrends for months, and I like them quite a lot. For some things they are excellent; for others, not so. A word about methodology: WebTrends tracking code consists of a primary method and a fallback. The primary method uses JavaScript to compute a compressed string of data including much client information and appends this to an HTML image tag - this data is slurped into a database at WebTrends. If JavaScript is disabled, the hit still gets recorded, but without all the fancy extra info. They try to place a unique, persistent cookie with each image load (once per page).
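
    The image-tag trick is easier to see with the data laid out. The sketch below is not WebTrends' real format: the collector address and the parameter names `page`, `res` and `js` are invented for illustration. It shows the general idea of packing client details into the query string of an image request and unpacking them again on the collecting side.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_beacon_url(base, client_info):
    """Append client details as a query string, as the page script would."""
    return base + "?" + urlencode(client_info)


def parse_beacon(request_url):
    """Recover the client details from a logged image request."""
    query = urlsplit(request_url).query
    # parse_qs returns a list of values per key; keep the first of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_beacon_url(
    "http://stats.example.com/hit.gif",  # hypothetical collector address
    {"page": "/products/index.html", "res": "1024x768", "js": "1"},
)
print(url)
print(parse_beacon(url))

# The no-JavaScript fallback is a plain image request: the hit is still
# recorded, but no extra details come with it.
print(parse_beacon("http://stats.example.com/hit.gif"))
```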

    According to WebTrends, over 95% of our visitors have both cookies and JavaScript enabled.

    Their reporting tools are very good and comprehensive, containing everything I've seen from the best log analysis software and some things that software can't get (average screen resolution and window size, for instance - I love this). You can customize content groups to your heart's content by modifying some variables in their JS. Their site itself is well made and smart: their help system pops up a content-sensitive window with information for each specific page; if you click to a new page, the help window is updated. Yes, this is relatively easy to implement, but how many sites do it? Too few.

    Now, not all is Madam George and roses (to coin a phrase). I've found that WebTrends reports at best 95% of our traffic. Periodically I run a couple home-brew Perl scripts on our logs and they always count more hits than WebTrends shows (not an issue with my Perl-fu, BTW). Their tech support is decent, but not wonderful - if you have a real issue, you might run around a little. A couple times they've flat-out dropped large chunks of our traffic (e.g. 40% for a day), never to be seen again.
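
    A cross-check like this needs very little code. The sketch below is in Python rather than Perl, and it is not the poster's script: it assumes Common Log Format timestamps, and the reported figures have to be copied in by hand from whatever the service shows.

```python
import re
from collections import Counter

# Date part of a Common Log Format timestamp, e.g. [10/Oct/2002:13:55:36 -0700]
CLF_DATE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def hits_per_day(lines):
    """Count log lines per calendar day, skipping lines without a timestamp."""
    counts = Counter()
    for line in lines:
        match = CLF_DATE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def coverage(log_counts, reported_counts):
    """Per day, the fraction of logged hits that the service also reported."""
    return {
        day: reported_counts.get(day, 0) / total
        for day, total in log_counts.items()
        if total
    }


# Made-up sample: four hits in the log, three shown by the reporting service.
sample = [
    '1.2.3.4 - - [10/Oct/2002:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '1.2.3.4 - - [10/Oct/2002:13:55:40 -0700] "GET /a.gif HTTP/1.0" 200 512',
    '5.6.7.8 - - [10/Oct/2002:14:01:02 -0700] "GET / HTTP/1.0" 200 2326',
    '5.6.7.8 - - [10/Oct/2002:14:01:09 -0700] "GET /b.html HTTP/1.0" 200 900',
]
print(coverage(hits_per_day(sample), {"10/Oct/2002": 3}))
```

    A day where the fraction suddenly falls well below its usual level is the kind of dropped traffic described above.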

    Finally, we get about 10% the traffic the original poster does, so I can't tell you how well they scale. They'll charge a pretty penny for that amount of traffic, too.

    To summarize (whew): (a) WebTrends is pretty decent, and excellent for some things; (b) IP-based assumptions and cookie tracking can get you very accurate statistics as long as you can live with the limitations.

    --
    This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."
    1. Re:NOT meaningless by judd · · Score: 3, Insightful

      The author of the article I linked did say it was a rant.

      Cookies only work their magic if you have full control of the hosting environment; ie if you can set a unique cookie in the first place, and record it in the logs. (Yes, I know, you ought to be able to do this everywhere, but it's not a perfect world). In their absence, I don't know how you measure the effect of proxies in conflating IP numbers.

      I think there's WebTrends, and WebTrends. The versions that I have seen (WebTrends Enterprise) did not operate in the manner you described - it was a product that you ran on your own boxes that did pure log file analysis. It did not have a server component that could "tag" a page, and WT themselves were not involved. The feature you describe is very neat - although whether it justifies WT's very high price tag is a good question.

      I still feel that if there is a particular metric that is important to you, you are better off coding it yourself or using Analog.

    2. Re:NOT meaningless by babbage · · Score: 2
      Yes. At my last job, we used the self-run version of WebTrends. There were good things about it, and there were things that weren't so good.

      Personally, as the monkey that had to run it every week, I didn't like it very much -- the version we had could only do log analysis on NT, but all our servers were Linux, so once a week I had to download the weekly reports to my computer (which meant that I wasn't able to run Linux even though all the work I did other than WT had to run on Linux; the download wasn't *that* bad, we were a small site and all the logs would "only" amount to a couple hundred megs -- usually it would finish by the time lunch was over on Mondays... :/), the actual analysis would take quite a long time to complete, and at the end of it I'd have to upload the summary reports back to the web site so that everyone that needed access would be able to see it.

      Now to be fair, counter to what I just wrote, the catch is that the copy of Web Trends we had was old when I started the job, and it's even older now -- I'm thinking it's at least three or four years old now. Presumably there are better versions of it now that can run on the same platform as the server OS. Better still, I could have saved a lot of bother by just leaving an NT box at the co-lo facility and having it automatically access the Apache log files via FTP or Samba as needed. But I didn't think of that then, and the product documentation certainly didn't make such a useful suggestion.

      I think the earlier poster hit the nail on the head with his comment about quantitative (log based data) vs. qualitative (questionnaire / survey / observation type data). You can whine all you want about how web gathered data is imperfect as a data collection for marketing analysis, but hey, just look at what *every other* form of advertising has to offer: surveys only. Hey, the web can do surveys too, and the web can also gather all this technical information that aren't available to TV, radio, or print ads. The issue isn't that this web data is imperfect -- so what? -- the issue is that you at least have something to work with. As long as you keep the imperfections in the back of your mind when analyzing that data, you can still draw conclusions that are at least as solid as those gatherable by any other form of media.

      Also, playing into another point raised by that same poster, I have heard of companies hitting the same problems with internal web data numbers being consistently lower than those obtained by third parties. There are a lot of reasons that this will happen, and not all of them are ones you want to try to defeat -- caching for example allows many people to see your content without running up your bandwidth costs, but the tradeoff is that you never know for sure how many people got to see that content. The best you can hope for is consistency -- to always have numbers that are 1.2x that of the third party counts, or 1.6x, or 3x or if you're lucky 0.75x. Whatever. The point being, getting the numbers to agree is difficult because everyone has a different counting strategy; as long as you can account for the differences & accurately predict how the third party numbers will agree (or disagree) with yours, that's enough to work with.
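
      One way to check that kind of consistency is to track the ratio between the two sets of counts and see how much it wanders. This is just a sketch with made-up numbers; the function names are invented, and using the relative spread (standard deviation divided by the mean) as the measure of "consistent" is an assumption.

```python
from statistics import mean, pstdev


def ratio_stability(ours, theirs):
    """Return (average ratio, relative spread) of our counts vs. theirs.

    ours, theirs: counts for the same periods, in the same order.
    Periods where the third party reported zero are skipped.
    """
    ratios = [o / t for o, t in zip(ours, theirs) if t]
    avg = mean(ratios)
    spread = pstdev(ratios) / avg  # 0.0 means a perfectly steady ratio
    return avg, spread


def predict_third_party(our_count, avg_ratio):
    """Estimate what the third party will report for one of our counts."""
    return our_count / avg_ratio


# Made-up weekly counts where ours run a steady 1.2x the third party's.
ours = [1200, 2400, 3600]
theirs = [1000, 2000, 3000]
avg, spread = ratio_stability(ours, theirs)
print(avg, spread)
print(predict_third_party(6000, avg))
```

      If the spread stays small from period to period, the two counting strategies disagree in a predictable way, which is all you need.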

  7. Hosted service or local app? by selan · · Score: 2
    The first thing you need to decide is whether you want to use a hosted service where the service keeps track of your hits on their servers, or a local app that runs on your server and analyzes your logs. I think that Webtrends has both versions, hence the very contradictory comments so far.

    I have used the Webtrends hosted service, WebTrendsLive, and have no complaints.

    • It's easy to implement--just insert some javascript into each page you want to track and set a few variables to customize it.
    • As a hosted service, they keep track of all data and crunch the numbers, so there is no extra load on your servers.
    • The web interface is nice and provides all the info marketing wants and more. You can set up the service to email reports to your marketeers as often as they want.
    • However, with your level of hits, it will probably cost you big bucks.

    Applications that run locally are much less expensive, but they put a bigger load on your servers (I don't have a lot of experience with them, though).