Best website statistics package?
goodminton asks: "As the webmaster for a small but growing e-commerce site, I'm becoming increasingly interested in the quality of our site metrics. We currently use a Javascript-based counter that provides good but basic information, however, a recent Slashdot posting has me thinking the stats from our system may not be as accurate as we'd like. What do you think is the best website statistics package, and why?"
You might want to try this: http://www.google.com/analytics/.
It's free!!
(you can register for the invite until it becomes publicly available)
For those subscribers using Slashdot's new discussion system, this link will work better.
From the posting, though, I don't understand why you think your (Javascript-based) stats would be inaccurate, since only about 1.34% of users disabled or did not support Javascript.
That said -- I personally use Analog, and although it does give some fairly useful statistics such as search engine terms, most popular directories, referers, etc., I don't find it gives me a very high level of insight into surfing habits. A log analysis tool such as that may be a good starting point for you, though, if you don't currently do analysis of that sort.
Gan Family Homepage
Webalizer. Just feed it some nice Apache logs, and let it do the talking. Or, if you're less of the command-line guy, I've heard Google Analytics is great.
I just went through this process for my employer. While I like Google Analytics (and currently use it for my personal web pages), it's a bit more focused on e-commerce than I need - although that may be good for you.
What I decided on was http://awstats.sourceforge.net/. It's got a pretty impressive feature list, and I like the look, and the sheer volume of data it can collect.
One caveat - the current version (6.5) has a command-injection vulnerability when run in cgi mode (as opposed to statically-created pages), so watch where & how you install it.
Poor means hoping the toothache goes away.
If you are trying to find out how many people are visiting your site, or how popular particular browsers are, just give up now. No stats package can tell you that. Some pretend to, but it's snake oil.
The basic problem is that not only are you fighting against the basic nature of a stateless protocol, but the things that skew your numbers (proxies, caching, etc) skew your numbers by an unknowable amount. Some things inflate your numbers, some things hide visitors from you. They don't cancel each other out like some people tell you (just think about it). In some cases, your visitors might not even communicate with your server at all.
Web statistics are good for measuring server load and monitoring things like search terms people use to find your site, inbound links from referrers, etc. What you will find is that you can install any old stats package, and it will give you lots of pretty charts and numbers, but at the end of the day, you might as well make the numbers up, because they don't reflect reality. And yet for some reason, people still like having them, even when they know the numbers are totally wrong. I have yet to figure out why.
Bogtha Bogtha Bogtha
Sawmill is an awesome slicer-and-dicer of your web logs. I haven't done web stuff in several years, but the package was awesome five years ago, and it looks like they've been refining the product over the years.
What I decided on was http://awstats.sourceforge.net/. It's got a pretty impressive feature list, and I like the look, and the sheer volume of data it can collect.
As someone who set up awstats for a high-traffic site last year, let me warn you -- beyond the available options, it ain't customizable. At all. The HTML generation is embedded in bits and pieces throughout their Perl code. Some of the nastiest, spaghettiest mess I've ever seen. They don't even use stylesheets for proper styling. If it does exactly what you want, then fine. But be forewarned: if your needs ever change, don't expect awstats to change with them.
Democracy is two wolves and a sheep voting on lunch.
I'm a big fan of AWStats (awstats.sourceforge.net).
We got sucked in by the pretty graphs too. Internally, awstats is a mess. Some of the worst code spaghetti I've seen in awhile. As I already said, I'm not optimistic of their ability to improve going forward.
Democracy is two wolves and a sheep voting on lunch.
If you can get invited to analytics it really rocks. Awstats is good, and I have always been fond of webalizer. I run a small hosting company, and I have found that awstats and webalizer can be a bit processor intensive under certain conditions. The nice thing about analytics is that the processing takes place off site. Analytics also has a lot more information geared toward marketing, and the metrics that can help make marketing decisions. Awstats and webalizer and especially webalizer are more about presenting data from the logs.
-MS2k
Webtrends, haha. NetIQ's package gives lots of great info in regards to trends, although it is quite expensive. It's amazing the parent completely missed the whole idea of web log analysis like that.
Throwing your hands up in the air and declaring that because you cannot be sure it's all garbage is foolishness. Know the limitations of your tools, accept the error, and take what you can get.
I'm a big fan of AWStats. It primarily gets its stats from parsing your access_log, but it also includes a javascript portion you can elect to use if you're interested in collecting more detailed information about your visitors (screen resolution, flash versions, etc.).
One caveat, though, if you choose to implement AWStats is that you should keep it in an access-restricted area of your webserver. There have been some pretty nasty vulnerabilities in AWStats. As long as you keep it secured, you should be fine, though.
Webalizer also does a great job of parsing your logfiles and producing graphs and charts:
http://www.mrunix.net/webalizer/
Skip ------ See the latest from http://www.anArchyFortWorth.com
If you're using PHP, you need to give BBClone a try. Just do an include from your scripts and it's good to go. The stats it generates are quite nice. I also use Webalizer on the server logs.
"It ain't a war against drugs. It's a war against personal freedom" --Bill Hicks
Mod parent up. I can't imagine why that would have been modded -1, redundant. Awstats has had a string of security problems, which caused me to give up on it.
We use Urchin - now "Google Analytics". Unless you want to delete cookies every page hit, and use the Web Developer Firefox plugin to remove hidden fields for every form submission, we pretty much have you tracked. This isn't 1995 y'know...
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
You say *small* so I say do your own. My code that collects and saves the actual data is less than a hundred lines of very simple PHP.
When you're just beginning you want to follow each individual session. Most of the packages just aggregate data; they don't give info on a session level. You want info on *pages* viewed; not gifs and css and what-not
Other advantages:
* Send yourself a jabber message every time special things happen. Just a case of slotting in a class and adding five lines of code.
* Log which countries they're coming from. Install a couple of files and add five lines of code.
* Do things based on user-agent, country, whatever.
* In future you might like to block or redirect users ... well this is harder but you've got the foundations there.
The code for actually presenting the data you collect is a lot more involved, if you want it to be versatile. But you're free to download the data and write that part in the language of your choice ... at your leisure.
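To make the "pages, not gifs and css" point concrete, here is a minimal Python sketch of the filtering and per-session grouping. It is not the poster's PHP code: the extension list, the session ids, and the function names `is_page` and `pages_by_session` are all assumptions made up for illustration.

```python
import os
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: these extensions mark supporting files rather than pages.
ASSET_EXTENSIONS = {".gif", ".png", ".jpg", ".jpeg", ".css", ".js", ".ico"}


def is_page(path):
    """Return True when the requested path looks like a page, not an asset."""
    extension = os.path.splitext(urlsplit(path).path)[1].lower()
    return extension not in ASSET_EXTENSIONS


def pages_by_session(hits):
    """Group page requests by session id, preserving request order.

    `hits` is an iterable of (session_id, path) pairs; how the session id
    is issued (cookie, PHP session, etc.) is left to the collecting code.
    """
    sessions = defaultdict(list)
    for session_id, path in hits:
        if is_page(path):
            sessions[session_id].append(path)
    return dict(sessions)


# Hypothetical sample data.
hits = [
    ("s1", "/index.php"),
    ("s1", "/style.css"),
    ("s1", "/products.php?id=3"),
    ("s2", "/index.php"),
    ("s2", "/logo.gif"),
]
trails = pages_by_session(hits)
for session_id, trail in trails.items():
    print(session_id, " -> ".join(trail))
```

Each session ends up as an ordered trail of pages, which is the session-level view most aggregate packages don't show.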
For me the most interesting feature of a statistics package is being able to do web mining. Very few do this; I only know of the one I am using (metriserve web analytics). Basically it allows you to find hidden links between pages of your site even when they do not directly link to each other. This gives interesting results on one of my pr0n sites. You would not believe the hidden relations you can find between models, poses, etc as surfed by my visitors. I am sure the data could actually be used for an interesting PhD study on the matter ;-) Or maybe I should implement a nice "we also recommend..." feature.
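One generic way to surface "hidden links" of this kind is to count how often two pages turn up in the same session. The sketch below is only that generic co-occurrence count, in Python; it makes no claim about how metriserve works internally, and the sample sessions are invented.

```python
from collections import Counter
from itertools import combinations


def co_visit_counts(sessions):
    """Count how many sessions viewed each unordered pair of pages."""
    pair_counts = Counter()
    for pages in sessions:
        # Deduplicate and sort so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(pages)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical sessions, each a list of pages viewed.
sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/c"],
    ["/b", "/c", "/b"],
]
counts = co_visit_counts(sessions)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts that don't link to each other are candidates for a "we also recommend..." box.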
The google one is supposedly good and it doesn't cost anything.
But frankly, I'd rather not let anyone hold all the details of my site, even Google, who of course want it for the mass of applicable marketing data. And the concept of linking to foreign scripts on my site isn't too tasty. Really, you should go for a SERVER solution that you run yourself. Apache can give you a wealth of information already. Posting every user's details every minute to Google is just crude.
I've read about this - if you're some trendy blogger this looks like an ipod http://www.haveamint.com/ , I don't know of any free equivalents.
I think you might like visitors: http://www.hping.org/visitors/
I ran AWStats for some time, but it had too many security issues. Now I am using Tracewatch at http://www.tiouw.com/. Free, open source and great. See http://www.tracewatch.com/ Nice features are that you are able to follow individual visitors through your sites, find most common paths etc.
They don't even use stylesheets for proper styling.
The horror!
Do they use tables and eat babies, too?
So awstats is not configurable. That's not necessarily a bad point - nearly everyone wants awstats on their sites, and they're happy with the look of it out of the box. But people should be aware that they cannot change it. 99% of the time, this is an absolute non-issue, it gets installed, works, looks pretty. Job sorted.
For the 1% of people who would like to change it, well, they should be aware that it isn't going to be for them, before they start working with it. Again, this is not that big an issue.
For all the users of awstats, the biggest problem is parse time. Awstats can be a lot slower than other stats packages, which isn't a problem until you start hosting 1000 sites on a server, when it suddenly will be an issue. But, again, if you host 1000 sites, awstats just isn't for you and you should check out something else.
I used phpmyvisites before, and it isn't too bad - setup was a breeze, it gives good stats, but I've moved to google analytics now.
"beyond the available options, it ain't customizable. At all."
Sure it is...in the header part of the site: GNU GPL
"He was a wise man who invented beer." - Plato
Sorry, no. Let's say that AOL tune their caching parameters and all of a sudden a hundred thousand of your visitors get a page from AOL's cache instead of from your server. The "trend" will show a massive decrease in visitors, even if the number of visitors you have remains static.
Looking at the difference between two incorrect numbers will not result in a correct number.
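To put rough numbers on that scenario, here is a toy Python calculation. Every figure in it is hypothetical, and it assumes the simplest possible model (one request per visitor, with a fixed fraction of visitors served entirely from upstream caches).

```python
def logged_requests(visitors, cached_fraction):
    """Requests that reach the origin server when a share is served by caches."""
    return round(visitors * (1 - cached_fraction))


visitors = 500_000  # hypothetical, held constant across both months
before = logged_requests(visitors, cached_fraction=0.10)
after = logged_requests(visitors, cached_fraction=0.30)

# Apparent month-over-month change seen in the logs.
apparent_change = (after - before) / before
print(before, after, f"{apparent_change:.1%}")
```

The audience is identical in both months, yet the logs show a drop of roughly a fifth, purely because the cached share moved.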
Bogtha Bogtha Bogtha
Right: simply put awstats behind an
Bye.
Sorry, but yes. You can easily account for large events like this when you see them. If these changes happen to coincide with some recent marketing campaign it can be tricky, of course, to analyze the source of the variances, but certainly not impossible. Your confidence level in your findings will no doubt decrease, but generally not so significantly as to render the data useless. The continual and otherwise-random (or at least chaotic) variations in the underlying fabric of the Internet that affect your raw data collection do not completely neutralize your ability to filter the signal from the noise.
Looking at the difference between two incorrect numbers will not result in a correct number.
Correct, no; but useful, yes. Your overall premise--that keeping stats is inherently useless--is quite simply wrong. In fact your position is essentially equivalent to saying that background cosmic radiation makes all radio-astronomy meaningless. There is most certainly value in analyzing trends.
You may choose to disagree, continue to wonder why other people seem to care, and chide them about the futility of their task; or you can perhaps acknowledge that maybe this information is more useful than you had presumed, as long as one applies sufficiently robust statistical techniques.
Webalizer is very useful: we recently set up a new web site, and the information it provides has been handy for tweaking. It doesn't seem to provide everything we could want - there's no obvious way to gauge the relative popularity of different links on a given page, for example - but it does provide an idea of relative browser popularity among our visitors, which pages are most important (or at least most visited), and other useful information.
Of course, like all log file-based tools, it suffers from the modern day curse of webmasters everywhere: caching. For example, the site I mentioned is for a university club. Around 1/3 of our hits are from the university cache servers, which all students are strongly encouraged to use. That messes up any analysis of total hits on each page of the site, and it would also mess up analysis of which links people tend to find most useful (assuming those they follow from one page to another are representative of this) if we had tools to do that.
I'm sure anyone who reads Slashdot regularly will see the upside of caching, but a lot of people forget that it has a downside as well. As a webmaster trying to set up the most useful site possible (this is a non-profit group, run by volunteers, so my interests here are entirely benevolent) I would be more than happy to have accurate stats for all visits to our site in the past month, say, rather than lower bandwidth use.
AFAICS, the only way to get anything close to accurate stats at the moment is to install some sort of "web bug" that will make it through the caches. However, this has rather sinister overtones, and I'm reluctant to do something that might be perceived as "spying". Would the crowd here consider it reasonable to go down that road, given that as stated above we have no ulterior motive and are just trying to monitor the way our new site design is working with a view to improving links etc? Would it reassure you if we did the "privacy policy" thing? (Personally, I don't find them particularly reassuring in most cases. If I don't trust a site not to screw me, why would I trust it more because they say they won't? Then again, full disclosure and all that...) Do we really have to resort to the tried-and-tested "visit counter" graphics? :o)
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
First, google now has very detailed info on your customers. Those customers like me who do a view source and see if you are providing such info to third parties without telling me will leave your site and find somewhere else to shop. Also, anyone with a brain has google analytics blocked in their hosts file:
$ grep goog /etc/hosts
0.0.0.0 www.google-analytics.com
0.0.0.0 google-analytics.com
The company I used to work for, a top 10 Dutch e-commerce business, used Onestat. The marketing managers went crazy over it. They liked it a lot.
On the downside: it was expensive and sometimes the backend was very slow.
On the upside: it was possible to measure the visitors on a very low as well as a high level.
I really like the Visitors package (http://www.hping.org/visitors/). It does a really good job identifying multiple hits as a single visitor (if timestamp, user-agent, etc match). It also has some very good summaries of Google hits.
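For anyone curious what "identifying multiple hits as a single visitor" can look like, here is a minimal Python sketch. It is not the Visitors algorithm: the (IP, user-agent) key and the 30-minute timeout are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # assumed cutoff, not taken from Visitors


def count_visits(hits):
    """Count visits from (timestamp, ip, user_agent) hits.

    Hits from the same (ip, user_agent) pair belong to one visit until
    the gap between consecutive hits exceeds VISIT_TIMEOUT.
    """
    last_seen = {}
    visits = 0
    for timestamp, ip, user_agent in sorted(hits):
        key = (ip, user_agent)
        previous = last_seen.get(key)
        if previous is None or timestamp - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[key] = timestamp
    return visits


# Hypothetical hits: same IP throughout, two browsers, one long gap.
t0 = datetime(2006, 1, 1, 12, 0)
hits = [
    (t0, "10.0.0.1", "Firefox"),
    (t0 + timedelta(minutes=5), "10.0.0.1", "Firefox"),
    (t0 + timedelta(minutes=6), "10.0.0.1", "MSIE"),
    (t0 + timedelta(hours=2), "10.0.0.1", "Firefox"),
]
print(count_visits(hits))
```

Note that a heuristic like this still merges everyone behind one proxy who shares a user-agent string, which is the caching objection raised elsewhere in the thread.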
That's just it - events like this look identical to a drop in visitors. So "when you see them" never applies, because you don't know when you are seeing an event like this and when you are simply seeing fewer visitors.
And even if you could tell when an event like this happens - how are you going to account for them? You don't know how much they are affecting your numbers, because (for example) a single cache could be serving your resources to ten people or a hundred thousand.
Only if you know that new factors have not been introduced between data points or if you can account for them. Neither is true in this case.
The thing is, most people use "sufficiently robust statistical techniques" as a synonym for "I'll ignore the sources of error I can't account for". If you think you can do better than this, then please point out how (and which stats packages do this).
Bogtha Bogtha Bogtha
The only thing you are demonstrating with your comments is that you haven't got the faintest clue about what you are talking about. In any real web company, the marketing and ad sales and data mining people go frantic at ANY change that isn't predicted in advance, and spend a lot of time and effort ensuring they understand exactly why, and how to compensate for those effects, and do so very well.
Why?
Because it translates in very real terms into huge amounts of money lost if they can't get accurate estimates of the return on investment of the sales and marketing expenditure, and experience has shown tracking trends in website accesses works for that purpose.
People commit to multi-million dollar advertising campaigns and sales campaigns on a regular basis based on the track record of trend tracking of website traffic, and do so successfully because tracking these things is a lot easier than what you seem to think.
Yes, there are deviations and errors on a regular basis, but except for large events which are easily spotted and corrected for (if you think the sales people won't wonder WHY the data change, or even why they don't change, you are sadly mistaken), they generally cancel each other out.
Think what you will, but real life experience contradicts you.
I think you are a little disconnected from reality here. The vast majority of "real" web companies are tiny. They certainly don't employ data miners, and the marketing and ad sales people - assuming that there are full-time employees handling this - have enough work to do without analysing statistics.
However, your logic seems to be "it's possible to have reliable numbers because there are people who go nuts if they think they are getting unreliable numbers". Surely you can see that logic just doesn't work? Placebo explanations are far more plausible.
I'll need more than your say-so to believe that.
The Catholic Church is one of the wealthiest organisations on earth because people believe giving money to the Church will help them get into heaven. I'm an atheist. According to your logic, heaven must exist because people throw billions at the Church. Sorry, I still don't believe in heaven.
Instead of repeatedly telling me that it simply must work because people really want to believe the numbers and spend a lot of money by doing so, how about explaining how you can get reliable statistics?
Example: You hire a new columnist, and AOL changes their caching setup at the same time. The AOL changes cause your visitor count to drop sharply. Your new columnist draws in a lot more visitors. However, because of the AOL change, it shows up overall as a moderate decrease in visitors. You're telling me that this - entirely predictable - change would cause sales people to demand an explanation? That they wouldn't write it off as a poor response to the new columnist? You are giving sales people far too much credit.
Bogtha Bogtha Bogtha
ClickTracks has some interesting features geared towards visitor behavior. However, I've only just started using it, and I have some doubts about the accuracy of their numbers. It is also missing some of the basic information you would expect from a traditional web stats program.
I was pretty impressed with our demo of NetTracker but it requires some serious cash if you have a busy site.
I said this in my first comment:
I never said that statistics are all bad, just that you can't derive certain information from them.
I've been a web developer for almost a decade, with my own business for over three years. I've worked on well over a hundred websites and web applications, both big and small. What makes you think that I have no experience? Just because you disagree with me? If anything, I think that naïve faith in stats packages implies lack of experience. It's easy to accept numbers that software gives you if you know nothing; you need to actually know what you are talking about to even begin to disagree with them.
Then not only are you not the norm, but you aren't relevant to the discussion, as the Ask Slashdotter clearly wants a piece of software to do the job and not employees.
How? Simply claiming that you do isn't convincing.
I think you are underestimating both the size of AOL and the number of different things that can throw your numbers off.
Well seeing as I explicitly mentioned inbound link detection as a valid use of logs, I completely agree with that.
The rate at which they refresh them is unimportant. The point is that you don't know how many people their cached copy is served to, so one "hit" corresponds to an unknown number of visitors.
I'm not sure if I'm understanding you right; you're saying that you vary your content pages to rotate your advertising? That's really inefficient. The efficient way of doing rotating advertising is to refer to a static URI that performs 302s. That way, both the content and the advertising can be cached effectively but the ads still rotate properly.
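A rough Python sketch of the "static URI that performs 302s" idea follows. The `/ad` path, the banner URLs, and the round-robin policy are all hypothetical; the point is only that pages embed one fixed URL while the redirect target changes per request.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from itertools import cycle

# Hypothetical ad targets; a real deployment would load these from config.
AD_URLS = cycle(["/ads/banner1.gif", "/ads/banner2.gif", "/ads/banner3.gif"])


def next_ad_url():
    """Return the next ad location in round-robin order."""
    return next(AD_URLS)


class AdRedirectHandler(BaseHTTPRequestHandler):
    """Serve a fixed /ad URI that redirects to a rotating ad image."""

    def do_GET(self):
        if self.path == "/ad":
            self.send_response(302)
            self.send_header("Location", next_ad_url())
            # Ask caches not to store the redirect itself, so rotation
            # keeps working; the banner images remain cacheable.
            self.send_header("Cache-Control", "no-store")
            self.end_headers()
        else:
            self.send_error(404)


def serve(port=8000):
    """Run the redirector on localhost (call this manually to try it)."""
    HTTPServer(("127.0.0.1", port), AdRedirectHandler).serve_forever()
```

Content pages then reference `/ad` and never change, so both the pages and the individual banners can sit in caches while only the tiny redirect hits the server each time.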
And yes, you can separate content from ancillary information, but I'm not sure why you think this is relevant. Webstemmer is one approach.
How accurate is it? How did you measure how accurate it was?
Bogtha Bogtha Bogtha
No they don't. It's very easy to determine whether AOL has suddenly proxy-cached your site, and to differentiate that from a sudden fundamental drop in the actual number of visitors arriving at your site from AOL. There are both technical (meaning carefully crafting your site to collect better raw data) and mathematical (meaning sophisticated analysis of existing historical, current, and ongoing future data) approaches to this. Even intuitively one can easily imagine that the data patterns might have inherent differences; consider the trivial case of comparing your site statistics with another unrelated control site, and finding both sites showed the same decrease in AOL hits at the same time. Google "scientific method" and apply what you find to this problem. It's not nearly as intractable a problem as you might imagine.
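The control-site comparison mentioned above can be sketched in a few lines of Python. This is a toy heuristic, not a statistical test: the hit counts and the 10-point tolerance are hypothetical, and it assumes you actually have comparable numbers from an unrelated site.

```python
def relative_change(before, after):
    """Fractional change between two hit counts."""
    return (after - before) / before


def looks_like_upstream_change(own, control, tolerance=0.10):
    """Compare hits from one source network on two unrelated sites.

    `own` and `control` are (before, after) hit counts. If both sites
    moved by a similar fraction, the cause is more plausibly at the
    source (e.g. a proxy cache change) than anything site-specific.
    """
    return abs(relative_change(*own) - relative_change(*control)) <= tolerance


# Hypothetical hit counts from the same ISP, before and after the event.
print(looks_like_upstream_change(own=(40_000, 24_000), control=(10_000, 6_200)))
print(looks_like_upstream_change(own=(40_000, 24_000), control=(10_000, 9_900)))
```

In the first case both sites lose about 40% of that ISP's hits together; in the second only your own site drops, which points back at your own content or campaigns.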
And even if you could tell when an event like this happens - how are you going to account for them? You don't know how much they are affecting your numbers, because (for example) a single cache could be serving your resources to ten people or a hundred thousand.
Look, at first I thought you might have been intellectually curious, but it's clear to me now that you've got an agenda and you're just trolling to prove your point. It would take hours to explain it to you, assuming you'd even care to listen rather than argue. I'm sorry, but the simple truth here is that you apparently lack sufficient knowledge to debate this question on any useful level. Seriously, go take an advanced course on statistics at a local university, go to your library, or just consult google.
The thing is, most people use "sufficiently robust statistical techniques" as a synonym for "I'll ignore the sources of error I can't account for". If you think you can do better than this, then please point out how (and which stats packages do this).
Gee, I guess you don't suppose that maybe I'm not one of those "most people" you mention. Of course I'm not talking about something rudimentary like AWStats or anything similar. While those are indeed useful on their own, they are incredibly limited. You're right that, using only these primitive tools, you're not going to be able to readily account for any unexpected variances. However, the raw log data can be analyzed by any of a vast number of statistical analysis and/or data mining packages. To specifically respond to your quaint challenge, please check out SAS.
Again, please do yourself a favor and get some education in statistics. I'm not going to waste my time to "point out how" but that doesn't invalidate any of my previous statements. You may be pleasantly surprised to find there's more to the world of numbers than counting and computing averages. Your intuition about statistics, and your assumptions about what is possible, are both failing you.
As a sibling poster mentioned, you're only making yourself look ignorant; you're not demonstrating a willingness to learn. Or, of course, you could choose to remain ignorant and continually marvel as to why so many companies are spending so much time and money on a set of totally flawed assumptions, and why they just can't see things as clearly as you can.
As with most things, it's not really that one package is "better" than another so much as that one might be more useful to you at any given time.
I use my own package when a Web site is smaller (say, below a million hits per month) because I would rather sample some actual sessions and see where people went and what they were searching for than get an overview. If you see people are searching for Argyle Socks and are finding your page about the Duke of Argyll, you might want to add an extra page and link to it, "if you were looking for...".
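Pulling the search phrase out of a referrer is straightforward when the engine leaves it in the query string. Below is a minimal Python sketch; the parameter names in `QUERY_PARAMS` are an assumption about what engines use, and the sample URLs are made up.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: the engine puts the phrase in one of these parameters.
QUERY_PARAMS = ("q", "p", "query")


def search_terms(referrer):
    """Return the search phrase embedded in a referrer URL, or None."""
    query = parse_qs(urlsplit(referrer).query)
    for name in QUERY_PARAMS:
        if name in query:
            return query[name][0]
    return None


print(search_terms("http://www.google.com/search?q=argyle+socks&hl=en"))
print(search_terms("http://example.org/links.html"))
```

Pairing each extracted phrase with the landing page is what lets you spot the Argyle Socks visitor arriving at the Duke of Argyll page.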
The statistic you most want is the things people looked for that might have reached your Web site and didn't, and that's the one you can't easily find!
For a site getting under 1,000 hits per day, look at the server logs in detail at least once a week, and make navigation easier, add more content where it looks promising, think about why some areas don't get traffic, etc etc.
When you're getting 10,000 hits/day, unless most of them are for graphics, the data can become overwhelming. And if you're over 100,000 hits per day you probably need to go to the sorts of reports that give you a very broad overview.
A link checker and a 404 report can be useful -- Cool URIs don't change!
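A 404 report needs very little code. The Python sketch below assumes log lines in Apache's common or combined format; the regular expression only looks at the request and status fields, and the sample lines are invented.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def not_found_report(lines):
    """Return (path, count) pairs for 404 responses, most frequent first."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts.most_common()


# Hypothetical log lines.
sample = [
    '10.0.0.1 - - [01/Jan/2006:12:00:00 +0000] "GET /old-page.html HTTP/1.1" 404 512',
    '10.0.0.2 - - [01/Jan/2006:12:01:00 +0000] "GET /index.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2006:12:02:00 +0000] "GET /old-page.html HTTP/1.1" 404 512',
    '10.0.0.4 - - [01/Jan/2006:12:03:00 +0000] "GET /missing.gif HTTP/1.1" 404 512',
]
report = not_found_report(sample)
for path, count in report:
    print(count, path)
```

The paths at the top of the list are the ones worth redirecting or restoring first.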
Oh -- for anyone interested, although I do have hololog set up on for example my words and pictures from old books Web site (in a private directory, sorry), the sourceforge page doesn't have a download, mea culpa. If it looks useful to anyone I've shared copies of "hololog" in the past. It could do with some cleaning up, alas!
Liam
Live barefoot!
free engravings/woodcuts
I have to recommend summary.net
It works great, is easy to set up and customize, and has lots of different kinds of reports available. Along with definitions for the different stat types (great for management to be able to understand what they're looking at).
Plus the developer is very responsive.
I've done web analytics implementations for smaller (i.e. $10M e-commerce sites) and larger (i.e. hundreds of millions PV/month) companies.
I'm not much of a fan of log file based analytics systems. They are simply too much work to maintain from an infrastructure POV and caching wreaks havoc with the accuracy of the stats. I therefore recommend 1x1 transparent pixel based systems. If you insist on log file based systems, NetTracker and WebTrends make some decent products.
Google analytics is a great package for smaller companies. It is free and offers a nice chunk of functionality. Caveat emptor -- you get what you pay for. When I audited my last employer's GA e-commerce metrics against actual online sales, there was a substantial error (I think ~10%)! However it is still a good tool for understanding trends and issues w/ your analytics.
WebSideStory (HBX) and Omniture rule the high end market. It has been a while since I checked pricing, but I think you can expect to start out at the ~$10-$20K/yr range. Both of these products are excellent.
WebSideStory sells a lower end package (HitBox Professional) that has limited commerce metrics but is also pretty decent and affordable. They have an enterprise system, HBX, that is excellent.
Omniture also has an impressive system. I don't think they have much in terms of entry level offerings.
WebTrends has a product, WebTrends Live, that is about 1/2 the price of the enterprise products from WebSideStory and Omniture. It has been a good 5 years since I've used their product, but I wasn't especially impressed with it at the time.
Evolution: love it or leave it
For me, you really can't beat a bit of grep, awk, wc and other bits of shell jiggery-pokery. I don't feel the need for webstats beautifiers, although they do have their place. With the vulnerabilities in awstats I wouldn't touch it with somebody else's barge pole these days, which is a shame because I used to really like the look and feel of it.
Analog and webalizer, from ports, might get used on some deployments, from time to time, but that is probably as far as it goes. Hit the console and be your own log file analyser, it is more fun and far more flexible.
Wow, talk about paranoid. It's easy to see why pro-stats packages people might have an agenda (e.g. they might work for a company selling snake oil), but what could I possibly gain from criticising the results?
This is probably the most ridiculous response I've had so far. Let's sum up the responses I've had:
What do all of these responses have in common? They don't actually explain why what I am saying is wrong, they just attack me or are simply non-sequiturs.
Feel free to actually talk specifics, but leave the pointless red herrings and personal attacks out of it.
Bogtha Bogtha Bogtha
Your agenda, apparently, is to argue. Some people thrive on this. Perhaps to prove a point, perhaps to feel important, perhaps because you feel ripped off or betrayed or left out as the only one who can't see what others are seeing. We're actually trying to help, but your argumentative style is getting in the way. If you don't mean to do this, I can't help beyond pointing it out; if it's intentional, well, here's one last laugh of a reply for you.
* I'm wrong because... well I'm too stupid to understand the explanation, so I'm just going to have to trust you.
No, if you had actually bothered to carefully read my post, rather than jump to argue and defend yourself, you'd notice I explicitly mentioned you shouldn't trust me (or anybody else, for that matter) and that you should go educate yourself regarding statistics. Search google, read a book, take a class, etc. There are many ways you can learn, but if you go in with a closed mind, it will be a long uphill battle.
What do all of these responses have in common?
Quite frankly they indicate to me that you are either baiting us for sport (trolling) or you're honestly having significant troubles with your communication skills. This is not meant as an attack--although I'm guessing you'll take it as such--it's just an observation.
They don't actually explain why what I am saying is wrong, they just attack me or are simply non-sequiturs.
I did bother to explain; again you're either having trouble communicating your own viewpoint to us, or we're all just ignorant of your insight that is so clearly beyond us we are lost.
Feel free to actually talk specifics, but leave the pointless red herrings and personal attacks out of it.
Here again, I did mention specifics. Go learn about the products from SAS Institute (I even provided a link for crying out loud) and maybe you'll understand it was not at all pointless. I don't have anything to do with them, except as a user of their products from long ago. Still, you don't have to trust their literature at all, but you will need at least an introductory level of statistics under your belt before you'll understand why it can be relevant. Despite your qualitative/intuitive arguments to the contrary (which, by the way, you've provided nothing but hand-waving to support) there are a plethora of well-known techniques available to cull trends out of noisy data. As I mentioned, the very existence of the field of radio-astronomy belies your assertions; it would be a futile field of science if your arguments were mathematically sound.
You're clearly not "too stupid" to learn this stuff, but you are most certainly ill-informed on this subject. None of us can force knowledge into your head; when you see lots of otherwise helpful people getting frustrated while trying to explain something to you, it's because you are communicating to us (perhaps, unintentionally; we can't know for certain) that you are unwilling to listen to reason. If you are willing to reason, don't trust me, don't trust any other respondent; just please, please go read a textbook or two (no, not product literature; you wouldn't and shouldn't trust that anyway) and try to learn about this very interesting subject before looking at the various products, and only then should you reply.