Slashdot Mirror


Best website statistics package?

goodminton asks: "As the webmaster for a small but growing e-commerce site, I'm becoming increasingly interested in the quality of our site metrics. We currently use a Javascript-based counter that provides good but basic information; however, a recent Slashdot posting has me thinking the stats from our system may not be as accurate as we'd like. What do you think is the best website statistics package, and why?"

79 comments

  1. Google by $exyNerdie · · Score: 3, Interesting

    You might want to try this: http://www.google.com/analytics/.

    It's free!!
    (you can register for the invite until it becomes publicly available)

    1. Re:Google by celardore · · Score: 1

      Google Analytics seems to be excellent. All sorts of statistics are available. I use it, and I can't really grumble about anything.

    2. Re:Google by bugg · · Score: 0

      Does google really need any MORE information about you and your website?

      I'm sorry, but I'm creeped out by the amount of data google already has on everyone, I don't need to let them watch who is visiting what on my websites as well.

      --
      -bugg
    3. Re:Google by jrockway · · Score: 1

      Agreed. I'm fine with google collecting data from their own websites, but I hate seeing "Waiting for google-analytics.com..." at every website that I visit. It creeps me out for some reason. To that end, I installed the NoScript extension and haven't looked back. I learned that 99% of javascript on the web is doing something to benefit the site owner (rather than me, the visitor), so I've turned javascript off globally and only turn it on for sites that use it for something good. NoScript is powerful enough that I can block Google's spying while still using the site's AJAX features.

      Big Brother might not be watching, but Google sure is.

      --
      My other car is first.
    4. Re:Google by bugg · · Score: 1
      Big Brother might not be watching, but Google sure is.

      And "Big Brother" -- say, the NSA, is most probably watching Google. I mean, assuming that anyone at NSA has any clue at all, don't you think they know as much as Google does?

      You want to install a network tap that gets the most interesting data and is easily analyzed? Install taps on google's uplink providers. Even assuming the lower-tech solutions (getting someone at google to give you access to the data, getting an inside person at google, rooting google's machines, etc. etc.) don't work, you're still golden. Your google searches are plaintext...

      --
      -bugg
  2. New Discussion System by perlionex · · Score: 3, Interesting

    For those subscribers using Slashdot's new discussion system, this link will work better.

    From the posting, though, I don't understand why you think your (Javascript-based) stats would be inaccurate, since only about 1.34% of users disabled or did not support Javascript.

    That said -- I personally use Analog, and although it does give some fairly useful statistics such as search engine terms, most popular directories, referers, etc., I don't find it gives me a very high level of insight into surfing habits. A log analysis tool such as that may be a good starting point for you, though, if you don't currently do analysis of that sort.
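
    As a rough illustration of the kind of log-file analysis mentioned above (most popular pages, referers), here is a minimal Python sketch. It assumes Apache's "combined" log format; the regular expression is a simplification and the sample lines are made up, so treat it as a starting point rather than a replacement for Analog.

```python
import re
from collections import Counter

# Apache "combined" log format. This regex is a simplified assumption and
# will not match every real-world line (e.g. requests containing quotes).
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Count successful (2xx) requests per path and per referer."""
    paths, referers = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or not m.group("status").startswith("2"):
            continue  # skip unparseable lines and non-2xx responses
        paths[m.group("path")] += 1
        if m.group("referer") not in ("", "-"):
            referers[m.group("referer")] += 1
    return paths, referers

# Made-up sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [01/Jan/2006:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512 "http://example.com/" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2006:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Jan/2006:10:00:09 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
paths, referers = summarize(sample)
print(paths.most_common(5))
print(referers.most_common(5))
```

    In practice you would pass an open log file instead of the sample list; any object that yields lines works.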

    1. Re:New Discussion System by xmas2003 · · Score: 1

      Ditto parent's comments. Analog is an oldie, but goldie - provides all the basic functionality you need and does it via log file analysis ... so no tweaks required on your HTML, nor dependent on Javascript enabled in the browser. Not all the bells and whistles of the newer stuff, but a great way to start.

      --
      Hulk SMASH Celiac Disease
  3. Webalizer or Analytics by Toveling · · Score: 3, Interesting

    Webalizer. Just feed it some nice Apache logs, and let it do the talking. Or, if you're less of the command-line guy, I've heard Google Analytics is great.

    1. Re:Webalizer or Analytics by Anonymous Coward · · Score: 0

      Have a look at modlogan (http://jan.kneschke.de/projects/modlogan/). While no formal development is happening anymore on it, it is stable and just does the job you want it to do. The output can be fixed (webalizer like) or fully customized. It not only is able to process apache logs, but a lot more log formats (squid, sendmail, ...).

  4. AWStats by Anonymous Coward · · Score: 0

    I'm a big fan of AWStats (awstats.sourceforge.net). It breaks things down nicely and is under active development. I used webalizer for a little bit last year, but it hasn't been updated in years and honestly stinks compared to aw.

    1. Re:AWStats by eddeye · · Score: 4, Informative

      What I decided on was http://awstats.sourceforge.net/. It's got a pretty impressive feature list, and I like the look, and the sheer volume of data it can collect.

      As someone who set up awstats for a high-traffic site last year, let me warn you -- beyond the available options, it ain't customizable. At all. The html generation is embedded in bits and pieces throughout their perl code. Some of the nastiest, spaghettiest mess I've ever seen. They don't even use stylesheets for proper styling. If it does exactly what you want, then fine. But be forewarned: if your needs ever change, don't expect awstats to change with them.

      --
      Democracy is two wolves and a sheep voting on lunch.
    2. Re:AWStats by eddeye · · Score: 3, Informative

      I'm a big fan of AWStats (awstats.sourceforge.net).

      We got sucked in by the pretty graphs too. Internally, awstats is a mess. Some of the worst code spaghetti I've seen in a while. As I already said, I'm not optimistic about their ability to improve going forward.

      --
      Democracy is two wolves and a sheep voting on lunch.
    3. Re:AWStats by hemp · · Score: 1

      Webalizer also does a great job of parsing your logfiles and producing graphs and charts:

      http://www.mrunix.net/webalizer/

      --
      Skip ------ See the latest from http://www.anArchyFortWorth.com
    4. Re:AWStats by bcrowell · · Score: 2, Insightful

      Mod parent up. I can't imagine why that would have been modded -1, redundant. Awstats has had a string of security problems, which caused me to give up on it.

    5. Re:AWStats by tdemark · · Score: 1

      They don't even use stylesheets for proper styling.

      The horror!

      Do they use tables and eat babies, too?

    6. Re:AWStats by gbjbaanb · · Score: 2, Informative

      So awstats is not configurable. That's not necessarily a bad point - nearly everyone wants awstats on their sites, and they're happy with the look of it out of the box. But people should be aware that they cannot change it. 99% of the time, this is an absolute non-issue, it gets installed, works, looks pretty. Job sorted.

      For the 1% of people who would like to change it, well, they should be aware that it isn't going to be for them, before they start working with it. Again, this is not that big an issue.

      For all the users of awstats, the biggest problem is parse time. Awstats can be a lot slower than other stats packages, which isn't a problem until you start hosting 1000 sites on a server when it suddenly will be an issue. But, again, if you host 1000 sites, awstats just isn't for you and you should check out something else.

    7. Re:AWStats by lagerbottom · · Score: 1

      "beyond the available options, it ain't customizable. At all."

      Sure it is...in the header part of the site: GNU GPL

      --
      "He was a wise man who invented beer." - Plato
    8. Re:AWStats by hunte · · Score: 2, Informative
      One caveat - the current version (6.5) has a command-injection vulnerability when run in cgi mode (as opposed to statically-created pages), so watch where & how you install it.


      Right: simply put awstats behind an .htaccess (http://www.javascriptkit.com/howto/htaccess3.shtml) and you are pretty safe. I also use awstats under windows+iis (it needs only ActivePerl to run under win32) and it rocks.

      Bye.
      --
      about me A - B
  5. AWStats by GuruBuckaroo · · Score: 5, Informative

    I just went through this process for my employer. While I like Google Analytics (and currently use it for my personal web pages), it's a bit more focused on e-commerce than I need - although that may be good for you.

    What I decided on was http://awstats.sourceforge.net/. It's got a pretty impressive feature list, and I like the look, and the sheer volume of data it can collect.

    One caveat - the current version (6.5) has a command-injection vulnerability when run in cgi mode (as opposed to statically-created pages), so watch where & how you install it.

    --
    Poor means hoping the toothache goes away.
  6. None by Bogtha · · Score: 4, Interesting

    If you are trying to find out how many people are visiting your site, or how popular particular browsers are, just give up now. No stats package can tell you that. Some pretend to, but it's snake oil.

    The basic problem is that not only are you fighting against the basic nature of a stateless protocol, but the things that skew your numbers (proxies, caching, etc) skew your numbers by an unknowable amount. Some things inflate your numbers, some things hide visitors from you. They don't cancel each other out like some people tell you (just think about it). In some cases, your visitors might not even communicate with your server at all.

    Web statistics are good for measuring server load and monitoring things like search terms people use to find your site, inbound links from referrers, etc. What you will find is that you can install any old stats package, and it will give you lots of pretty charts and numbers, but at the end of the day, you might as well make the numbers up, because they don't reflect reality. And yet for some reason, people still like having them, even when they know the numbers are totally wrong. I have yet to figure out why.

    --
    Bogtha Bogtha Bogtha
    1. Re:None by Jester998 · · Score: 2, Funny

      And yet for some reason, people still like having them, even when they know the numbers are totally wrong. I have yet to figure out why.

      These metrics empower us by quantifying the effectiveness of our economic paradigm, and allows us to leverage synergies with other business divisions. Furthermore, we can collect empirical datums, which may allow us to project customer interactions with our portal, and allow our business methods to expand dynamically going forward. Oh, and it'll help with that Web 2.0 thingy project...

      Did I miss any management buzzwords-du-jour?

    2. Re:None by grammar+fascist · · Score: 1

      Oh, and it'll help with that Web 2.0 thingy project...

      I had no idea "thingy" was a management buzzword.

      I definitely need to get out more.

      --
      I got my Linux laptop at System76.
    3. Re:None by tomhudson · · Score: 1

      Oh, and it'll help with that Web 2.0 thingy project...

      I had no idea "thingy" was a management buzzword.

      I definitely need to get out more.

      Just wait until NEXT year, when they come out with Thingy 2.0 to compete with Apple's iThingy

    4. Re:None by Jester998 · · Score: 1

      I didn't actually use 'thingy' strictly as a management buzzword -- it was a reference to concepts usually seen in tech cartoons like Dilbert or User Friendly (i.e. that management doesn't actually have a clue about the technologies they want to implement).

    5. Re:None by math0ne · · Score: 1

      Let me just start by saying, I'm no expert on web statistics. But I am a web developer.

      Assuming that I:

      * Only care about PEOPLE (not robots) visiting my site
      * Use a javascript tracker
      * Account for a margin of error (Only 99% of traffic to my sites use JS)


      How can my statistics be skewed by caching, when the individual people will still report statistics by JS regardless of the cache? Plus you can get a good measure of stat skewedness by comparing JS tracking with log analyzing.

      Anyways, I'm of the camp that stats may not be fully accurate but they're very useful.
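
      The comparison between JS tracking and log analysis suggested above can be sketched in a few lines of Python. The function name and the daily totals are hypothetical, and the sketch assumes robots have already been filtered out of the log-derived counts.

```python
def tracker_coverage(js_page_views, log_page_views):
    """Fraction of log-derived page views that the JavaScript tracker also saw."""
    if log_page_views <= 0:
        raise ValueError("log_page_views must be positive")
    return js_page_views / log_page_views

# Hypothetical daily totals as (JS tracker views, log-derived views).
daily = {"2006-01-01": (940, 1000), "2006-01-02": (1210, 1250)}
for day, (js_views, log_views) in sorted(daily.items()):
    print(day, round(tracker_coverage(js_views, log_views), 3))
```

      A ratio above 1.0 is possible: pages served from a cache never reach the server log, but the tracker script may still report them.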

    6. Re:None by Bogtha · · Score: 1

      Well I can spot two problems with that approach off the top of my head. Firstly, how did you establish that only 1% of your visitors do not have JavaScript available, and how do you measure this on an ongoing basis? Secondly, JavaScript is not a binary yes/no question. There are different levels of support for JavaScript. The real statistic you need to worry about is how many of your visitors support the level of HTML and JavaScript necessary for your tracker to operate properly.

      Anyways, I'm of the camp that stats may not be fully accurate but they're very useful.

      I'm not asking for fully accurate. I'm asking for reliable. In other words, inaccuracy is tolerable as long as you have a reliable way of determining how far out you might be.

      --
      Bogtha Bogtha Bogtha
  7. Sawmill rocks by toybuilder · · Score: 4, Informative

    Sawmill is an awesome slicer-and-dicer of your web logs. I haven't done web stuff in several years, but the package was awesome five years ago, and it looks like they've been refining the product over the years.

    1. Re:Sawmill rocks by toybuilder · · Score: 3, Informative

      BTW, they have an online demo. Play with it... I think it's pretty impressive.

    2. Re:Sawmill rocks by WuphonsReach · · Score: 1

      I'll second Sawmill. While I haven't upgraded in a while, I find their pricing to be at least somewhat reasonable (as opposed to WebTrends). Even for a small business with multiple domains, you're only going to pay a few hundred bucks. (Pricing)

      Sawmill also did a good job of analyzing our load-balanced set of web servers (allowing us to roll-up a set of combined stats).

      --
      Wolde you bothe eate your cake, and have your cake?
  8. Omniture SiteCatalyst by Anonymous Coward · · Score: 0

    If you want "the best" that's it. However, it's quite expensive (some % of ecomm revenue, I believe)

  9. One word by dereference · · Score: 0
    And yet for some reason, people still like having them, even when they know the numbers are totally wrong. I have yet to figure out why.

    Trends.

    1. Re:One word by Vancorps · · Score: 1

      Webtrends, haha, NetIQ's package gives lots of great info in regards to trends although it is quite expensive. It's amazing the parent completely missed the whole idea of web log analysis like that.

    2. Re:One word by Bogtha · · Score: 2, Insightful

      Sorry, no. Let's say that AOL tune their caching parameters and all of a sudden a hundred thousand of your visitors get a page from AOL's cache instead of from your server. The "trend" will show a massive decrease in visitors, even if the number of visitors you have remains static.

      Looking at the difference between two incorrect numbers will not result in a correct number.
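
      The arithmetic behind this scenario can be written out as a small Python sketch. All of the numbers are hypothetical except the hundred thousand cached visitors from the example above, and the function is only a toy model of what the origin server gets to see.

```python
def logged_requests(direct_visitors, cached_visitors, cache_fetches):
    """Requests the origin server sees: direct visits plus cache refreshes.

    cached_visitors is accepted only to make the point explicit: it has no
    effect on the result, because those visitors never reach the server log.
    """
    return direct_visitors + cache_fetches

# True audience is 500,000 in both periods (hypothetical figure).
before = logged_requests(direct_visitors=500_000, cached_visitors=0, cache_fetches=0)
after = logged_requests(direct_visitors=400_000, cached_visitors=100_000, cache_fetches=50)

print(before, after)
```

      The log shows a drop of roughly 20% although the audience is unchanged, and nothing in the log itself reveals how many people each of the 50 cache fetches stood in for.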

      --
      Bogtha Bogtha Bogtha
    3. Re:One word by dereference · · Score: 1
      Sorry, no. Let's say that AOL tune their caching parameters and all of a sudden a hundred thousand of your visitors get a page from AOL's cache instead of from your server. The "trend" will show a massive decrease in visitors, even if the number of visitors you have remains static.

      Sorry, but yes. You can easily account for large events like this when you see them. If these changes happen to coincide with some recent marketing campaign it can be tricky, of course, to analyze the source of the variances, but certainly not impossible. Your confidence level in your findings will no doubt decrease, but generally not so significantly as to render the data useless. The continual and otherwise-random (or at least chaotic) variations in the underlying fabric of the Internet that affect your raw data collection do not completely neutralize your ability to filter the signal from the noise.

      Looking at the difference between two incorrect numbers will not result in a correct number.

      Correct, no; but useful, yes. Your overall premise--that keeping stats is inherently useless--is quite simply wrong. In fact your position is essentially equivalent to saying that background cosmic radiation makes all radio-astronomy meaningless. There is most certainly value in analyzing trends.

      You may choose to disagree, continue to wonder why other people seem to care, and chide them about the futility of their task; or you can perhaps acknowledge that maybe this information is more useful than you had presumed, as long as one applies sufficiently robust statistical techniques.

    4. Re:One word by Bogtha · · Score: 1

      You can easily account for large events like this when you see them.

      That's just it - events like this look identical to a drop in visitors. So "when you see them" never applies, because you don't know when you are seeing an event like this and when you are simply seeing fewer visitors.

      And even if you could tell when an event like this happens - how are you going to account for them? You don't know how much they are affecting your numbers, because (for example) a single cache could be serving your resources to ten people or a hundred thousand.

      There is most certainly value in analyzing trends.

      Only if you know that new factors have not been introduced between data points or if you can account for them. Neither is true in this case.

      you can perhaps acknowledge that maybe this information is more useful than you had presumed, as long as one applies sufficiently robust statistical techniques.

      The thing is, most people use "sufficiently robust statistical techniques" as a synonym for "I'll ignore the sources of error I can't account for". If you think you can do better than this, then please point out how (and which stats packages do this).

      --
      Bogtha Bogtha Bogtha
    5. Re:One word by vidarh · · Score: 1
      That's just it - events like this look identical to a drop in visitors. So "when you see them" never applies, because you don't know when you are seeing an event like this and when you are simply seeing fewer visitors.

      The only thing you are demonstrating with your comments is that you haven't got the faintest clue about what you are talking about. In any real web company, the marketing and ad sales and data mining people go frantic at ANY change that isn't predicted in advance, and spend a lot of time and effort ensuring they understand exactly why, and how to compensate for those effects, and do so very well.

      Why?

      Because it translates in very real terms into huge amounts of money lost if they can't get accurate estimates of the return on investment of the sales and marketing expenditure, and experience has shown tracking trends in website accesses works for that purpose.

      People commit to multi-million dollar advertising campaigns and sales campaigns on a regular basis based on the track record of trend tracking of website traffic, and do so successfully because tracking these things is a lot easier than what you seem to think.

      Yes, there are deviations and errors on a regular basis, but except for large events which are easily spotted and corrected for (if you think the sales people won't wonder WHY the data change, or even why they don't change, you are sadly mistaken), they generally cancel each other out.

      Think what you will, but real life experience contradicts you.

    6. Re:One word by Bogtha · · Score: 1

      In any real web company, the marketing and ad sales and data mining people go frantic at ANY change that isn't predicted in advance, and spend a lot of time and effort ensuring they understand exactly why, and how to compensate for those effects

      I think you are a little disconnected from reality here. The vast majority of "real" web companies are tiny. They certainly don't employ data miners, and the marketing and ad sales people - assuming that there are full-time employees handling this - have enough work to do without analysing statistics.

      However, your logic seems to be "it's possible to have reliable numbers because there are people who go nuts if they think they are getting unreliable numbers". Surely you can see that logic just doesn't work? Placebo explanations are far more plausible.

      and do so very well.

      I'll need more than your say-so to believe that.

      People commit to multi-million dollar advertising campaigns and sales campaigns on a regular basis based on the track record of trend tracking of website traffic, and do so successfully because tracking these things is a lot easier than what you seem to think.

      The Catholic Church is one of the wealthiest organisations on earth because people believe giving money to the Church will help them get into heaven. I'm an atheist. According to your logic, heaven must exist because people throw billions at the Church. Sorry, I still don't believe in heaven.

      Instead of repeatedly telling me that it simply must work because people really want to believe the numbers and spend a lot of money by doing so, how about explaining how you can get reliable statistics?

      Yes, there are deviations and errors on a regular basis, but except for large events which are easily spotted and corrected for (if you think the sales people won't wonder WHY the data change, or even why they don't change, you are sadly mistaken), they generally cancel each other out.

      Example: You hire a new columnist, and AOL changes their caching setup at the same time. The AOL changes cause your visitor count to drop sharply. Your new columnist draws in a lot more visitors. However, because of the AOL change, it shows up overall as a moderate decrease in visitors. You're telling me that this - entirely predictable - change would cause sales people to demand an explanation? That they wouldn't write it off as a poor response to the new columnist? You are giving sales people far too much credit.

      --
      Bogtha Bogtha Bogtha
    7. Re:One word by Anonymous Coward · · Score: 0

      Just out of curiosity, you seem to be so against web traffic analysis and down on every possible reason there is for using it, in your view, what is the point of web logs and the huge amount of information that is gathered?

      So far, all your comments have left me with the impression you have absolutely no 'real world' experience either as a web master or as part of an ISP/IAP of any kind. Please correct me if I'm wrong.

      I DO employ people to keep track of what's happening across my web properties. I DO make decisions based on the trends that show up in the logs. I do account for many variables such as proxied traffic and so on.

      You're right in that web traffic analysis will never give me accurate numbers, but you're completely wrong in your assumption that I cannot track trends. When dealing with millions of hits a day, AOL changing their proxy setup isn't exactly that big of a deal. I'm more concerned about where the traffic is coming from in regards to inbound links than the host ISP of the user. AOL can change their proxy, great. But they still have to get refreshed copies of the content when it changes, and given 90% of my pages are dynamic, they'd need to refresh them quite often. That is unless they can figure out some way to discern the content of an article or whatever from the ancillary information or advertising.

      Trending is important and trending does provide useful metrics. It's not accurate, but it is close enough to accurate to make it extremely useful.

    8. Re:One word by Bogtha · · Score: 1

      in your view, what is the point of web logs and the huge amount of information that is gathered?

      I said this in my first comment:

      Web statistics are good for measuring server load and monitoring things like search terms people use to find your site, inbound links from referrers, etc.

      I never said that statistics are all bad, just that you can't derive certain information from them.

      So far, all your comments have left me with the impression you have absolutely no 'real world' experience either as a web master or as part of an ISP/IAP of any kind. Please correct me if I'm wrong.

      I've been a web developer for almost a decade, with my own business for over three years. I've worked on well over a hundred websites and web applications, both big and small. What makes you think that I have no experience? Just because you disagree with me? If anything, I think that naïve faith in stats packages implies lack of experience. It's easy to accept numbers that software gives you if you know nothing; you need to actually know what you are talking about to even begin to disagree with them.

      I DO employ people to keep track of what's happening across my web properties.

      Then not only are you not the norm, but you aren't relevant to the discussion, as the Ask Slashdotter clearly wants a piece of software to do the job and not employees.

      I do account for many variables such as proxied traffic and so on.

      How? Simply claiming that you do isn't convincing.

      When dealing with millions of hits a day, AOL changing their proxy setup isn't exactly that big of a deal.

      I think you are underestimating both the size of AOL and the number of different things that can throw your numbers off.

      I'm more concerned about where the traffic is coming from in regards to inbound links than the host ISP of the user.

      Well seeing as I explicitly mentioned inbound link detection as a valid use of logs, I completely agree with that.

      AOL can change their proxy, great. But they still have to get refreshed copies of the content when it changes, and given 90% of my pages are dynamic, they'd need to refresh them quite often.

      The rate at which they refresh them is unimportant. The point is that you don't know how many people their cached copy is served to, so one "hit" corresponds to an unknown number of visitors.

      That is unless they can figure out some way to discern the content of an article or whatever from the ancillary information or advertising.

      I'm not sure if I'm understanding you right; you're saying that you vary your content pages to rotate your advertising? That's really inefficient. The efficient way of doing rotating advertising is to refer to a static URI that performs 302s. That way, both the content and the advertising can be cached effectively but the ads still rotate properly.
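
      A minimal Python sketch of the static-URI-plus-302 approach, using only the standard library. The /ad path, the banner URLs, and the round-robin policy are all assumptions made for illustration; a real ad server would pick creatives differently.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical ad creatives; the URLs are placeholders.
AD_URLS = ["/ads/banner-a.png", "/ads/banner-b.png", "/ads/banner-c.png"]
_rotation = itertools.cycle(AD_URLS)

def next_ad_location():
    """Return the next creative in a simple round-robin rotation."""
    return next(_rotation)

class AdRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ad":
            # Only the redirect is uncacheable; the content page and the
            # banner images themselves can still be cached normally.
            self.send_response(302)
            self.send_header("Location", next_ad_location())
            self.send_header("Cache-Control", "no-store")
            self.end_headers()
        else:
            self.send_error(404)

def serve(port=8000):
    """Run the redirector. Call this manually; it blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), AdRedirectHandler).serve_forever()
```

      The content page then embeds the fixed URI (here /ad) and never has to change when the advertising rotates. This sketch does not serve the banner files; it only issues the redirects.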

      And yes, you can separate content from ancillary information, but I'm not sure why you think this is relevant. Webstemmer is one approach.

      It's not accurate, but it is close enough to accurate to make it extremely useful.

      How accurate is it? How did you measure how accurate it was?

      --
      Bogtha Bogtha Bogtha
    9. Re:One word by dereference · · Score: 1
      That's just it - events like this look identical to a drop in visitors.

      No they don't. It's very easy to determine whether AOL has suddenly proxy-cached your site, and to differentiate that from a sudden fundamental drop in the actual number of visitors arriving at your site from AOL. There are both technical (meaning carefully crafting your site to collect better raw data) and mathematical (meaning sophisticated analysis of existing historical, current, and ongoing future data) approaches to this. Even intuitively one can easily imagine that the data patterns might have inherent differences; consider the trivial case of comparing your site statistics with another unrelated control site, and finding both sites showed the same decrease in AOL hits at the same time. Google "scientific method" and apply what you find to this problem. It's not nearly as intractable a problem as you might imagine.
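
      The control-site comparison can be sketched as a simple heuristic in Python. The function names, the 10% tolerance, and the weekly figures are all hypothetical, and a matching drop on both sites is only a hint of a shared cause, not proof of one.

```python
def relative_change(before, after):
    """Fractional change from one period to the next."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before

def looks_like_shared_cause(site, control, tolerance=0.10):
    """True when both sites changed by a similar fraction in the same period."""
    return abs(relative_change(*site) - relative_change(*control)) <= tolerance

# Hypothetical AOL-origin hits per week as (before, after) pairs.
print(looks_like_shared_cause((20_000, 12_000), (5_000, 3_100)))
print(looks_like_shared_cause((20_000, 12_000), (5_000, 5_050)))
```

      In the first call both sites lose about 40% of their AOL hits, which points at something upstream; in the second only one site drops, which points back at that site.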

      And even if you could tell when an event like this happens - how are you going to account for them? You don't know how much they are affecting your numbers, because (for example) a single cache could be serving your resources to ten people or a hundred thousand.

      Look, at first I thought you might have been intellectually curious, but it's clear to me now that you've got an agenda and you're just trolling to prove your point. It would take hours to explain it to you, assuming you'd even care to listen rather than argue. I'm sorry, but the simple truth here is that you apparently lack sufficient knowledge to debate this question on any useful level. Seriously, go take an advanced course on statistics at a local university, go to your library, or just consult google.

      The thing is, most people use "sufficiently robust statistical techniques" as a synonym for "I'll ignore the sources of error I can't account for". If you think you can do better than this, then please point out how (and which stats packages do this).

      Gee, I guess you don't suppose that maybe I'm not one of those "most people" you mention. Of course I'm not talking about something rudimentary like AWStats or anything similar. While those are indeed useful on their own, they are incredibly limited. You're right that, using only these primitive tools, you're not going to be able to readily account for any unexpected variances. However, the raw log data can be analyzed by any of a vast number of statistical analysis and/or data mining packages. To specifically respond to your quaint challenge, please check out SAS.

      Again, please do yourself a favor and get some education in statistics. I'm not going to waste my time to "point out how" but that doesn't invalidate any of my previous statements. You may be pleasantly surprised to find there's more to the world of numbers than counting and computing averages. Your intuition about statistics, and your assumptions about what is possible, are both failing you.

      As a sibling poster mentioned, you're only making yourself look ignorant; you're not demonstrating a willingness to learn. Or, of course, you could choose to remain ignorant and continually marvel as to why so many companies are spending so much time and money on a set of totally flawed assumptions, and why they just can't see things as clearly as you can.

    10. Re:One word by Bogtha · · Score: 1

      it's clear to me now that you've got an agenda

      Wow, talk about paranoid. It's easy to see why pro-stats packages people might have an agenda (e.g. they might work for a company selling snake oil), but what could I possibly gain from criticising the results?

      This is probably the most ridiculous response I've had so far. Let's sum up the responses I've had:

      • I'm wrong because I have an agenda.
      • I'm wrong because I'm trolling.
      • I'm wrong because people make decisions based on the numbers.
      • I'm wrong because you incorrectly speculate that I have no experience.
      • I'm wrong because some people go nuts if they think they are being fed misleading numbers.
      • I'm wrong because... well I'm too stupid to understand the explanation, so I'm just going to have to trust you.

      What do all of these responses have in common? They don't actually explain why what I am saying is wrong, they just attack me or are simply non-sequiturs.

      Feel free to actually talk specifics, but leave the pointless red herrings and personal attacks out of it.

      --
      Bogtha Bogtha Bogtha
    11. Re:One word by dereference · · Score: 1
      It's easy to see why pro-stats packages people might have an agenda (e.g. they might work for a company selling snake oil), but what could I possibly gain from criticising the results?

      Your agenda, apparently, is to argue. Some people thrive on this. Perhaps to prove a point, perhaps to feel important, perhaps because you feel ripped off or betrayed or left out as the only one who can't see what others are seeing. We're actually trying to help, but your argumentative style is getting in the way. If you don't mean to do this, I can't help beyond pointing it out; if it's intentional, well, here's one last laugh of a reply for you.

      * I'm wrong because... well I'm too stupid to understand the explanation, so I'm just going to have to trust you.

      No, if you had actually bothered to carefully read my post, rather than jump to argue and defend yourself, you'd notice I explicitly mentioned you shouldn't trust me (or anybody else, for that matter) and that you should go educate yourself regarding statistics. Search google, read a book, take a class, etc. There are many ways you can learn, but if you go in with a closed mind, it will be a long uphill battle.

      What do all of these responses have in common?

      Quite frankly they indicate to me that you are either baiting us for sport (trolling) or you're honestly having significant troubles with your communication skills. This is not meant as an attack--although I'm guessing you'll take it as such--it's just an observation.

      They don't actually explain why what I am saying is wrong, they just attack me or are simply non-sequiturs.

      I did bother to explain; again you're either having trouble communicating your own viewpoint to us, or we're all just ignorant of your insight that is so clearly beyond us we are lost.

      Feel free to actually talk specifics, but leave the pointless red herrings and personal attacks out of it.

      Here again, I did mention specifics. Go learn about the products from SAS Institute (I even provided a link for crying out loud) and maybe you'll understand it was not at all pointless. I don't have anything to do with them, except as a user of their products from long ago. Still, you don't have to trust their literature at all, but you will need at least an introductory level of statistics under your belt before you'll understand why it can be relevant. Despite your qualitative/intuitive arguments to the contrary (which, by the way, you've provided nothing but hand-waving to support) there are a plethora of well-known techniques available to cull trends out of noisy data. As I mentioned, the very existence of the field of radio-astronomy belies your assertions; it would be a futile field of science if your arguments were mathematically sound.

      You're clearly not "too stupid" to learn this stuff, but you are most certainly ill-informed on this subject. None of us can force knowledge into your head; when you see lots of otherwise helpful people getting frustrated while trying to explain something to you, it's because you are communicating to us (perhaps, unintentionally; we can't know for certain) that you are unwilling to listen to reason. If you are willing to reason, don't trust me, don't trust any other respondent; just please, please go read a textbook or two (no, not product literature; you wouldn't and shouldn't trust that anyway) and try to learn about this very interesting subject before looking at the various products, and only then should you reply.

    12. Re:One word by Anonymous Coward · · Score: 0

      Maybe you're not inexperienced, maybe you're just a dick...

  10. AWSTATS / WEBALIZER / ANALYTICS by madstork2000 · · Score: 2, Informative

    If you can get invited to analytics it really rocks. Awstats is good, and I have always been fond of webalizer. I run a small hosting company, and I have found that awstats and webalizer can be a bit processor intensive under certain conditions. The nice thing about analytics is that the processing takes place off site. Analytics also has a lot more information geared toward marketing, and the metrics that can help make marketing decisions. Awstats and, especially, webalizer are more about presenting data from the logs.

    -MS2k

  11. Flawed does not mean worthless by ChaosDiscord · · Score: 5, Interesting
    Just because the information you get is flawed doesn't mean the information is worthless. Most data in the real world is deeply flawed, and yet useful information can be extracted, useful trends determined. Sure, your log files will be skewed by who chooses to participate (that is, who isn't caught by caches and proxies and, if you're using Javascript, who is allowing the Javascript in question). But any survey is skewed by those who choose to participate.

    Throwing your hands up in the air and declaring that because you cannot be sure it's all garbage is foolishness. Know the limitations of your tools, accept the error, and take what you can get.

    1. Re:Flawed does not mean worthless by Bogtha · · Score: 1

      Just because the information you get is flawed doesn't mean the information is worthless.

      Perhaps you missed it, but I covered this in my original comment. It's not simply that the information is incorrect, it's that you have no way of knowing how incorrect it is. It is that which makes it worthless. You could be a few dozen visitors out or you could be a million visitors out. No useful conclusions can be drawn when the margin of error is unknowable.

      --
      Bogtha Bogtha Bogtha
    2. Re:Flawed does not mean worthless by Anonymous Coward · · Score: 0

      I disagree. Okay, so you will never know the exact number of visitors, or even be able to reasonably estimate it, but you can see the month-by-month differences in traffic, and know with some degree of certainty that your visitor count is increasing by a certain percent. This is useful.

      Yes, you're right that it's futile to ever hope to get a decent measure of visitor numbers, but that isn't what a stats package gives you. It gives you trends. And for many purposes, trends can actually be more useful than solid figures.

    3. Re:Flawed does not mean worthless by Bogtha · · Score: 1

      you can see the month-by-month differences in traffic, and know with some degree of certainty that your visitor count is increasing by a certain percent.

      Such differences can be caused by any number of different things unrelated to the number of visitors you have. You are assuming that the effect of all the different ways in which your visitor counts are wrong stays constant from month to month. That's a very unreliable assumption.
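      To put numbers on that, here is a minimal sketch in Python. Every figure in it is invented for illustration (the visitor counts and the "hidden" fractions are assumptions, not measurements), and `measured_growth` is a hypothetical helper, not part of any stats package.

```python
def measured_growth(true_prev, true_curr, hidden_prev, hidden_curr):
    """Return the apparent month-over-month growth seen in the logs.

    hidden_prev / hidden_curr are the (hypothetical) fractions of real
    visitors that never reach the logs, e.g. because a cache served them.
    """
    seen_prev = true_prev * (1 - hidden_prev)
    seen_curr = true_curr * (1 - hidden_curr)
    return (seen_curr - seen_prev) / seen_prev


# Invented numbers: real traffic is flat at 1000 visitors in both months,
# but the hidden fraction drops from 30% to 20%.
apparent = measured_growth(1000, 1000, 0.30, 0.20)
print(f"apparent growth: {apparent:.1%}")
```

      With flat real traffic, the logs in this made-up case would still show roughly 14% growth, purely because fewer visitors happened to be hidden in the second month.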

      --
      Bogtha Bogtha Bogtha
  12. AWStats by anderiv · · Score: 1

    I'm a big fan of AWStats. It primarily gets its stats from parsing your access_log, but it also includes a javascript portion you can elect to use if you're interested in collecting more detailed information about your visitors (screen resolution, flash versions, etc.).

    One caveat, though, if you choose to implement AWStats is that you should keep it in an access-restricted area of your webserver. There have been some pretty nasty vulnerabilities in AWStats. As long as you keep it secured, you should be fine, though.

  13. BBClone + Webalizer by Shawn+is+an+Asshole · · Score: 4, Informative

    If you're using PHP, you need to give BBClone a try. Just do an include from your scripts and it's good to go. The stats it generates are quite nice. I also use Webalizer on the server logs.

    --
    "It ain't a war against drugs.it's a war against personal freedom" --Bill Hicks
    1. Re:BBClone + Webalizer by Anonymous Coward · · Score: 0

      Actually BBclone can be used for html files too once you rename your example.html file example.php!

  14. Er, nope by cliveholloway · · Score: 2, Interesting

    We use Urchin - now "Google Analytics". Unless you want to delete cookies every page hit, and use the Web Developer Firefox plugin to remove hidden fields for every form submission, we pretty much have you tracked. This isn't 1995 y'know...

    --
    -- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
    1. Re:Er, nope by mabinogi · · Score: 2, Interesting

      > Unless you want to delete cookies every page hit, and use the Web Developer Firefox plugin to remove hidden fields for every form submission, we pretty much have you tracked. This isn't 1995 y'know...

      or just completely block *.google-analytics.com because urchin is the single most annoying thing on the internet.

      I'm so sick of waiting for pages to load, only to see "contacting google-analytics.com" in the status bar.

      It's the one thing that made me install the adblock extension. I don't care if you're tracking me. I do care if you're ruining my browsing experience.

      --
      Advanced users are users too!
    2. Re:Er, nope by stoborrobots · · Score: 1

      It's the one thing that made me install the adblock extension. I don't care if you're tracking me. I do care if you're ruining my browsing experience.

      Ditto... For me it wasn't so much the delay, as it was the fact that if you selectively allow the slashdot authentication cookies, then block their urchin cookies, it forcibly logs you out on all following pages... Most annoying for my slashdot browsing experience... (Yes, I did just say that my slashdot addiction made me block google analytics.)

      Interestingly, though, some sites (sourceforge, for example) appear to have started hosting their urchin.js file locally... I don't know what effect that has on the system...

    3. Re:Er, nope by cliveholloway · · Score: 1

      Nope, we host urchin (we bought it a while back - version 4, I believe). It's all local, the js, cookies etc. Unless you want to start selectively deleting individual cookies after each page visit, there's not much you can do right now.

      I don't think it will be long until there's a cookie wildcard blocker available - like adblock, but for cookies. But, I'm sure that when that arrives, the analytics firms will just start creating randomly named cookies that would pass such filters - not an enormous task. Or just use query string session tracking and an Apache module.

      I haven't looked at what Google's done with Urchin, but for anything over the basic level, you're going to be running these analytics in house. I'm assuming that the google-analytics.com is their hosted solution. Doesn't apply to us ;-)

      --
      -- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
    4. Re:Er, nope by Raphael · · Score: 1
      We use Urchin - now "Google Analytics". Unless you want to delete cookies every page hit, and use the Web Developer Firefox plugin to remove hidden fields for every form submission, we pretty much have you tracked.

      Funny that you mention this... I use Firefox with the Adblock extension, and here are some rules found in my preferences:

      *.google-analytics.com
      *urchin.js

      This works quite well for me. I added these rules several months ago, when urchin started appearing on some web pages and making them load slower. These rules block the "urchin.js" from Google but also from other sites using their local copy of that script. The pages load faster and this makes me feel better.

      The only minor issue is when I activate the FireBug extension and view Slashdot, it reports a Javascript error saying that "urchinTracker" is not defined. But this is to be expected, since the corresponding script is not loaded.

      --
      -Raphaël
  15. DIY by jginspace · · Score: 1
    As the webmaster for a small but growing e-commerce site...

    You say *small* so I say do your own. My code that collects and saves the actual data is less than a hundred lines of very simple PHP.

    When you're just beginning you want to follow each individual session. Most of the packages just aggregate data; they don't give info on a session level. You want info on *pages* viewed; not gifs and css and what-not

    Other advantages:

    * Send yourself a jabber message every time special things happen. Just a case of slotting in a class and adding five lines of code.
    * Log which countries they're coming from. Install a couple of files and add five lines of code.
    * Do things based on user-agent, country, whatever.
    * In future you might like to block or redirect users ... well this is harder but you've got the foundations there.

    The code for actually presenting the data you collect is a lot more involved, if you want it to be versatile. But you're free to download the data and write that part in the language of your choice ... at your leisure.
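    For a rough idea of the collecting side, here is a minimal sketch. It is in Python rather than the PHP I actually use, and the names (`record_page_view`, `ASSET_SUFFIXES`, the session id) are all hypothetical. It only illustrates the two points above: follow each session, and count pages rather than gifs and css.

```python
import time

# File extensions treated as page furniture rather than pages (an assumption).
ASSET_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def record_page_view(sessions, session_id, path, user_agent, now=None):
    """Append a page view to the session's trail; ignore asset requests.

    Returns True if the hit was recorded, False if it was skipped.
    """
    if path.lower().split("?", 1)[0].endswith(ASSET_SUFFIXES):
        return False
    timestamp = time.time() if now is None else now
    sessions.setdefault(session_id, []).append(
        {"time": timestamp, "path": path, "user_agent": user_agent}
    )
    return True


sessions = {}
record_page_view(sessions, "abc123", "/products/socks", "Firefox/1.5", now=0)
record_page_view(sessions, "abc123", "/images/logo.gif", "Firefox/1.5", now=1)
record_page_view(sessions, "abc123", "/checkout", "Firefox/1.5", now=2)

# The trail for one session, in order, with the gif left out.
trail = [view["path"] for view in sessions["abc123"]]
print(trail)
```

    A real version would write to a file or database instead of a dict, but the per-session trail is the part most packages don't give you.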

    1. Re:DIY by Anonymous Coward · · Score: 0

      For our (small) website I ran into the problem that the hosting company does not let us access the webserver logs. But we do have MySQL.
      As you are right in that you do not want all the images analyzed anyway, I just added some code to the PHP script that serves the actual pages. It inserts a record into a logging database for every access. This record is similar to what a webserver logline would contain, but it is already conveniently split into fields.

      Now I can run everything on this growing database. I configured AWstats to read it and we have nice graphs, but when I want something different it is always possible to make or install it and have it read all the existing data.
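      Roughly, the logging step looks like the sketch below. This is Python with an in-memory SQLite database standing in for our PHP and MySQL setup, and the table and column names are made up for the example, so treat it as an illustration of the idea rather than our actual code.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE access_log (
        requested_at TEXT,
        remote_addr  TEXT,
        path         TEXT,
        referrer     TEXT,
        user_agent   TEXT
    )
    """
)


def log_access(conn, requested_at, remote_addr, path, referrer, user_agent):
    """Insert one row per page access, using bound parameters."""
    conn.execute(
        "INSERT INTO access_log VALUES (?, ?, ?, ?, ?)",
        (requested_at, remote_addr, path, referrer, user_agent),
    )
    conn.commit()


log_access(conn, "2006-02-01 10:00:00", "192.0.2.1", "/", "", "Firefox/1.5")
log_access(conn, "2006-02-01 10:00:30", "192.0.2.1", "/shop", "/", "Firefox/1.5")
log_access(conn, "2006-02-01 10:05:00", "192.0.2.7", "/shop", "", "MSIE 6.0")

# Because the fields are already split, ad-hoc questions are plain SQL.
rows = conn.execute(
    "SELECT path, COUNT(*) FROM access_log GROUP BY path ORDER BY path"
).fetchall()
print(rows)
```

      The point of storing split fields is the last query: anything a stats package doesn't report can be answered directly from the table.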

  16. web mining by Anonymous Coward · · Score: 1, Interesting

    For me the most interesting feature of a statistics package is being able to do web mining. Very few do this, I only know the one I am using (metriserve web analytics). Basically it allows you to find hidden links between pages of your site even when they do not directly link to each other. This gives interesting results on one of my pr0n sites. You would not believe the hidden relations you can find between models, poses, etc as surfed by my visitors. I am sure the data could actually be used for an interesting phd study on the matter ;-) Or maybe I should implement a nice "we also recommend..." feature.
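    I don't know how the package does it internally, but one simple way to surface such hidden links is to count which pages turn up together in the same session. A minimal Python sketch of that idea follows; the session data is invented and `co_visit_counts` is a hypothetical helper, not anything from metriserve.

```python
from collections import Counter
from itertools import combinations


def co_visit_counts(sessions):
    """Count how many sessions contain each unordered pair of pages."""
    pair_counts = Counter()
    for pages in sessions:
        # Deduplicate and sort so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(pages)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical sessions: each list is the pages one visitor viewed.
sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/c"],
    ["/b", "/d"],
    ["/c", "/a", "/a"],
]
counts = co_visit_counts(sessions)
print(counts.most_common(1))
```

    Pairs with a high count that do not link to each other are the candidates for a "we also recommend..." box.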

    1. Re:web mining by Skal+Tura · · Score: 1

      It's a shame it's that costly, I would need to take that 1 million pageviews/month plan for a single site alone, and it does not generate that much revenue to justify it...

      Altho, for an another website it might just work, low traffic, but higher revenues per pageview...

      Anyways, this is out of the reach for most people because of the price :(

  17. 3 options by matt+me · · Score: 1

    The google one is supposedly good and it doesn't cost anything.

    But frankly, I'd rather not let anyone hold all the details of my site, even Google who of course want it for the mass of applicable marketing data. And the concept of linking to foreign scripts on my site isn't too tasty. Really, you should go for a SERVER solution, that you run yourself. Apache can give you a wealth of information already. Posting every users details every minute to Google is just crude.

    I've read about this one, which looks like an iPod if you're some trendy blogger: http://www.haveamint.com/ (I don't know of any free equivalents).

  18. Take a look at visitors by schlenk · · Score: 1

    I think you might like visitors: http://www.hping.org/visitors/

  19. Tracewatch is my favourite by Proto23 · · Score: 1

    I ran AWstats for some time, but it had too many security issues. Now I am using Tracewatch at http://www.tiouw.com/. Free, open source and great. See http://www.tracewatch.com/ Nice features are that you are able to follow individual visitors through your sites, find most common paths etc.

    1. Re:Tracewatch is my favourite by gbjbaanb · · Score: 1

      I never knew it existed, but that looks good. At last awstats has a competitor :)

      However.. it is not opensource, and it looks like it might be quite processor intensive.

  20. How about Wusage? by Anonymous Coward · · Score: 0

    When I was looking for a weblog analyzer I could find packages in two categories: $800+ visitor trail analysis software (really cool, but far too expensive for my clients) and free packages that only give very basic information (analog, webalizer).

    Wusage gives me very detailed statistics, most followed trails, keyword effectiveness, history of data that goes back infinitely, far more than webalizer or AWstats ever gives you, for $25.

    Looks totally uncool though, with a tables and frames based interface. It hasn't been updated for quite some time. But it is by far the most functional analyzer I have ever found for less than $300.

    http://www.boutell.com/wusage/

  21. Phpmyvisites by amran · · Score: 1

    I used phpmyvisites before, and it isn't too bad - setup was a breeze, it gives good stats. But I've moved to google analytics now.

    1. Re:Phpmyvisites by IoN_PuLse · · Score: 1

      I'm using phpMyVisites right now, does exactly what I need :)

  22. Useful tools, but log-files are flawed by Anonymous+Brave+Guy · · Score: 2, Insightful

    Webalizer is very useful: we recently set up a new web site, and the information it provides has been handy for tweaking. It doesn't seem to provide everything we could want - there's no obvious way to gauge the relative popularity of different links on a given page, for example - but it does provide an idea of relative browser popularity among our visitors, which pages are most important (or at least most visited), and other useful information.

    Of course, like all log file-based tools, it suffers from the modern day curse of webmasters everywhere: caching. For example, the site I mentioned is for a university club. Around 1/3 of our hits are from the university cache servers, which all students are strongly encouraged to use. That messes up any analysis of total hits on each page of the site, and it would also mess up analysis of which links people tend to find most useful (assuming those they follow from one page to another are representative of this) if we had tools to do that.

    I'm sure anyone who reads Slashdot regularly will see the upside of caching, but a lot of people forget that it has a downside as well. As a webmaster trying to set up the most useful site possible (this is a non-profit group, run by volunteers, so my interests here are entirely benevolent) I would be more than happy to have accurate stats for all visits to our site in the past month, say, rather than lower bandwidth use.

    AFAICS, the only way to get anything close to accurate stats at the moment is to install some sort of "web bug" that will make it through the caches. However, this has rather sinister overtones, and I'm reluctant to do something that might be perceived as "spying". Would the crowd here consider it reasonable to go down that road, given that as stated above we have no ulterior motive and are just trying to monitor the way our new site design is working with a view to improving links etc? Would it reassure you if we did the "privacy policy" thing? (Personally, I don't find them particularly reassuring in most cases. If I don't trust a site not to screw me, why would I trust it more because they say they won't? Then again, full disclosure and all that...) Do we really have to resort to the tried-and-tested "visit counter" graphics? :o)

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  23. Keep in mind its flaws. by Anonymous Coward · · Score: 1, Insightful

    First, google now has very detailed info on your customers. Those customers like me who do a view source and see if you are providing such info to third parties without telling me will leave your site and find somewhere else to shop. Also, anyone with a brain has google analytics blocked in their hosts file:

    $ grep goog /etc/hosts
    0.0.0.0 www.google-analytics.com
    0.0.0.0 google-analytics.com

    1. Re:Keep in mind its flaws. by baudbarf · · Score: 1

      Finally. Someone other than me has a brain. People need to stop voluntarily bugging their websites. Maybe Google isn't evil today. I suppose that, at one point, Microsoft wasn't either. And suppose Google never becomes evil... the recent subpoena attempt made by an increasingly-nosy government highlights the danger of inviting even the most innocent corporation to share your visitors' most intimate secrets.

      --
      You can run but you can't hide, except, apparently, along the Afghan-Pakistani border.
  24. online service: Onestat by Abstract · · Score: 1

    The company I used to work for, a top 10 Dutch e-commerce business, used Onestat. The marketing managers went crazy over it. They liked it a lot.

    On the downside: it was expensive and sometimes the backend was very slow.

    On the upside: it was possible to measure the visitors on a very low as well as a high level.

  25. Amen by Anonymous Coward · · Score: 0

    They will not only have to give these services away for free,
    they will need to pay people to install their f&cking
    trackers on their websites .. too bad not many realize that.

  26. Visitors by mc_barron · · Score: 1

    I really like the Visitors package (http://www.hping.org/visitors/). It does a really good job identifying multiple hits as a single visitor (if timestamp, user-agent, etc match). It also has some very good summaries of Google hits.
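    I haven't read the Visitors source, so the following is only a sketch of the general idea in Python, not its actual algorithm: treat hits from the same IP and user-agent as one visit until there is a long enough gap. The 30-minute timeout and the name `count_visits` are my own assumptions.

```python
from datetime import datetime, timedelta

# Hits from the same key more than this far apart start a new visit (assumed).
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(hits):
    """Count visits in an iterable of (timestamp, ip, user_agent) hits."""
    last_seen = {}
    visits = 0
    for when, ip, user_agent in sorted(hits):
        key = (ip, user_agent)
        previous = last_seen.get(key)
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[key] = when
    return visits


hits = [
    (datetime(2006, 2, 1, 10, 0), "192.0.2.1", "Firefox/1.5"),
    (datetime(2006, 2, 1, 10, 5), "192.0.2.1", "Firefox/1.5"),   # same visit
    (datetime(2006, 2, 1, 10, 6), "192.0.2.1", "MSIE 6.0"),      # other browser
    (datetime(2006, 2, 1, 12, 0), "192.0.2.1", "Firefox/1.5"),   # timed out
]
print(count_visits(hits))
```

    The four sample hits come out as three visits: two hits merge, a different browser on the same IP counts separately, and the late hit starts a new visit.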

  27. ClickTracks and NetTracker by stu42j · · Score: 1

    ClickTracks has some interesting features geared towards visitor behavior. However, I've only just started using it and I have some doubts about the accuracy of their numbers. It also is missing some of the basic information you would expect from a traditional web stats program.

    I was pretty impressed with our demo of NetTracker but it requires some serious cash if you have a busy site.

  28. First think about what you need by Ankh · · Score: 2, Interesting

    As with most things, it's not really that one package is "better" than another so much as that one might be more useful to you at any given time.

    I use my own package when a Web site is smaller (say, below a million hits per month) because I would rather sample some actual sessions and see where people went and what they were searching for than get an overview. If you see people are searching for Argyle Socks and are finding your page about the Duke of Argyll, you might want to add an extra page and link to it, "if you were looking for...".

    The statistic you most want is the things people looked for that might have reached your Web site and didn't, and that's the one you can't easily find!

    For a site getting under 1,000 hits per day, look at the server logs in detail at least once a week, and make navigation easier, add more content where it looks promising, think about why some areas don't get traffic, etc etc.

    When you're getting 10,000 hits/day, unless most of them are for graphics, the data can become overwhelming. And if you're over 100,000 hits per day you probably need to go to the sorts of reports that give you a very broad overview.

    A link checker and a 404 report can be useful -- Cool URIs don't change!
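    If your stats package doesn't give you a 404 report, one is easy to pull from the server log. Here is a minimal Python sketch, assuming Common Log Format lines; the sample lines are invented and `not_found_report` is just a name I picked for the example.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line.
LOG_LINE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def not_found_report(lines):
    """Return a Counter of paths that were answered with status 404."""
    missing = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            missing[match.group("path")] += 1
    return missing


sample = [
    '192.0.2.1 - - [01/Feb/2006:10:00:00 +0000] "GET /old-page HTTP/1.1" 404 209',
    '192.0.2.2 - - [01/Feb/2006:10:01:00 +0000] "GET / HTTP/1.1" 200 5120',
    '192.0.2.3 - - [01/Feb/2006:10:02:00 +0000] "GET /old-page HTTP/1.1" 404 209',
    '192.0.2.4 - - [01/Feb/2006:10:03:00 +0000] "GET /typo.html HTTP/1.1" 404 209',
]
for path, count in not_found_report(sample).most_common():
    print(count, path)
```

    The paths at the top of that list are the URIs that changed, or the pages people expected to find and didn't.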

    Oh -- for anyone interested, although I do have hololog set up on for example my words and pictures from old books Web site (in a private directory, sorry), the sourceforge page doesn't have a download, mea culpa. If it looks useful to anyone I've shared copies of "hololog" in the past. It could do with some cleaning up, alas!

    Liam

    --
    Live barefoot!
    free engravings/woodcuts
  29. Your site visitors pay the price in eroded privacy by Anonymous Coward · · Score: 0

    If you let a search engine company provide web stats for you, then what you are doing is contributing additional information to the search engine, and this comes at the expense of your web users' privacy.

    This is more than just information about traffic to your web site. When a user's browser processes the Google code snippet embedded within your web page, the browser also forwards to Google's web host any cookies stored in the browser which originated from domain google.com, and the information forwarded will include a global ID number which is unique for that particular web user. Google's database will already include everything the user with that global ID has ever searched for; a tremendous amount of information. But now Google will have the additional information that the user visited a specific page on your site at a specific time; and perhaps even the amount of time spent there.

    When you combine all of this information with the data similarly collected from all of the Google Ads which are everywhere, then what you end up with is a search engine company that starts to look more like an aggregator, and specifically, you get a company that looks increasingly like the notorious DoubleClick.

    And don't fall for the line that nothing personally identifiable is retained, because your IP address alone is very often considered to be personally identifiable information, and they definitely do retain this.

  30. summary.net by Hallow · · Score: 1

    I have to recommend summary.net

    It works great, is easy to set up and customize, and has lots of different kinds of reports available. Along with definitions for the different stat types (great for management to be able to understand what they're looking at).

    Plus the developer is very responsive.

  31. Advice from the field.... by steppin_razor_LA · · Score: 2, Informative

    I've done web analytics implementations for smaller (i.e. $10M e-commerce sites) and larger (i.e. hundreds of millions PV/month) companies.

    I'm not much of a fan of log file based analytics systems. They are simply too much work to maintain from an infrastructure POV and caching wreaks havoc with the accuracy of the stats. I therefore recommend 1x1 transparent pixel based systems. If you insist on log file based systems, NetTracker and WebTrends make some decent products.

    Google analytics is a great package for smaller companies. It is free and offers a nice chunk of functionality. Caveat emptor -- you get what you pay for. When I audited my last employer's GA e-commerce metrics against actual online sales, there was a substantial error (I think ~10%)! However it is still a good tool for understanding trends and issues w/ your analytics.

    Webside Story (HBX) and Omniture rule the high end market. It has been a while since I checked pricing, but I think you can expect to start out at the ~$10-$20K/yr range. Both of these products are excellent.

    Webside Story sells a lower end package (Hitbox Professional) that has limited commerce metrics but is also pretty decent and affordable. They have an enterprise system: HBX that is excellent.

    Omniture also has an impressive system. I don't think they have much in terms of entry level offerings.

    Web trends has a product Web Trends live that is about 1/2 the price of the enterprise products from Webside Story and Omniture. It has been a good 5 years since I've used their product, but I wasn't especially impressed with it at the time.

    --
    Evolution: love it or leave it
  32. The Shell by SlashBoot.org · · Score: 1

    For me, you really can't beat a bit of grep, awk, wc and other bits of shell jiggery-pokery. I don't feel the need for webstats beautifiers, although they do have their place. With the vulnerabilities in awstats I wouldn't touch it with somebody else's barge pole these days, which is a shame because I used to really like the look and feel of it.

    Analog and webalizer, from ports, might get used on some deployments, from time to time, but that is probably as far as it goes. Hit the console and be your own log file analyser, it is more fun and far more flexible.