Best website statistics package?
goodminton asks: "As the webmaster for a small but growing e-commerce site, I'm becoming increasingly interested in the quality of our site metrics. We currently use a Javascript-based counter that provides good but basic information, however, a recent Slashdot posting has me thinking the stats from our system may not be as accurate as we'd like. What do you think is the best website statistics package, and why?"
You might want to try this: http://www.google.com/analytics/.
It's free!!
(you can register for the invite until it becomes publicly available)
For those subscribers using Slashdot's new discussion system, this link will work better.
From the posting, though, I don't understand why you think your (Javascript-based) stats would be inaccurate, since only about 1.34% of users disabled or did not support Javascript.
That said -- I personally use Analog, and although it does give some fairly useful statistics such as search engine terms, most popular directories, referers, etc., I don't find it gives me a very high level of insight into surfing habits. A log analysis tool such as that may be a good starting point for you, though, if you don't currently do analysis of that sort.
Webalizer. Just feed it some nice Apache logs, and let it do the talking. Or, if you're less of a command-line guy, I've heard Google Analytics is great.
I'm a big fan of AWStats (awstats.sourceforge.net). It breaks things down nicely and is under active development. I used webalizer for a little bit last year, but it hasn't been updated in years and honestly stinks compared to aw.
I just went through this process for my employer. While I like Google Analytics (and currently use it for my personal web pages), it's a bit more focused on e-commerce than I need - although that may be good for you.
What I decided on was http://awstats.sourceforge.net/. It's got a pretty impressive feature list, and I like the look, and the sheer volume of data it can collect.
One caveat - the current version (6.5) has a command-injection vulnerability when run in cgi mode (as opposed to statically-created pages), so watch where & how you install it.
Poor means hoping the toothache goes away.
If you are trying to find out how many people are visiting your site, or how popular particular browsers are, just give up now. No stats package can tell you that. Some pretend to, but it's snake oil.
The basic problem is that not only are you fighting against the basic nature of a stateless protocol, but the things that skew your numbers (proxies, caching, etc) skew your numbers by an unknowable amount. Some things inflate your numbers, some things hide visitors from you. They don't cancel each other out like some people tell you (just think about it). In some cases, your visitors might not even communicate with your server at all.
Web statistics are good for measuring server load and monitoring things like search terms people use to find your site, inbound links from referrers, etc. What you will find is that you can install any old stats package, and it will give you lots of pretty charts and numbers, but at the end of the day, you might as well make the numbers up, because they don't reflect reality. And yet for some reason, people still like having them, even when they know the numbers are totally wrong. I have yet to figure out why.
Bogtha Bogtha Bogtha
Sawmill is an awesome slicer-and-dicer of your web logs. I haven't done web stuff in several years, but the package was awesome five years ago, and it looks like they've been refining the product over the years.
If you want "the best" that's it. However, it's quite expensive (some % of ecomm revenue, I believe)
Trends.
If you can get invited to Analytics, it really rocks. AWStats is good, and I have always been fond of Webalizer. I run a small hosting company, and I have found that AWStats and Webalizer can be a bit processor intensive under certain conditions. The nice thing about Analytics is that the processing takes place off site. Analytics also has a lot more information geared toward marketing, and the metrics that can help make marketing decisions. AWStats, and especially Webalizer, are more about presenting data from the logs.
-MS2k
Throwing your hands up in the air and declaring that because you cannot be sure it's all garbage is foolishness. Know the limitations of your tools, accept the error, and take what you can get.
I'm a big fan of AWStats. It primarily gets its stats from parsing your access_log, but it also includes a javascript portion you can elect to use if you're interested in collecting more detailed information about your visitors (screen resolution, flash versions, etc.).
One caveat, though, if you choose to implement AWStats: you should keep it in an access-restricted area of your webserver. There have been some pretty nasty vulnerabilities in AWStats. As long as you keep it secured, you should be fine.
If you're using PHP, you need to give BBClone a try. Just do an include from your scripts and it's good to go. The stats it generates are quite nice. I also use Webalizer on the server logs.
"It ain't a war against drugs.it's a war against personal freedom" --Bill Hicks
We use Urchin - now "Google Analytics". Unless you want to delete cookies every page hit, and use the Web Developer Firefox plugin to remove hidden fields for every form submission, we pretty much have you tracked. This isn't 1995 y'know...
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
You say *small* so I say do your own. My code that collects and saves the actual data is less than a hundred lines of very simple PHP.
When you're just beginning you want to follow each individual session. Most of the packages just aggregate data; they don't give info on a session level. You want info on *pages* viewed, not gifs and css and what-not.
Other advantages:
* Send yourself a Jabber message every time special things happen. Just a case of slotting in a class and adding five lines of code.
* Log which countries they're coming from. Install a couple of files and add five lines of code.
* Do things based on user-agent, country, whatever.
* In future you might like to block or redirect users ... well this is harder but you've got the foundations there.
The code for presenting the data you collect is actually a lot more involved, if you want it to be versatile. But you're free to download the data and write that part in the language of your choice ... at your leisure.
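For anyone wondering what the collecting part might look like, here is a minimal sketch. It is in Python rather than the PHP the poster used, and it is not their code. The extension list, the session id "abc123" and the function names are all made up for illustration. It keeps a per-session trail of pages and ignores images, stylesheets and the like.

```python
from collections import defaultdict
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Hypothetical list of asset extensions to ignore; adjust for your own site.
ASSET_EXTENSIONS = (".gif", ".png", ".jpg", ".jpeg", ".css", ".js", ".ico")


def is_page_view(url):
    """Return True if the request looks like a page rather than a static asset."""
    path = urlsplit(url).path.lower()
    return not path.endswith(ASSET_EXTENSIONS)


def record_hit(sessions, session_id, url, when=None):
    """Append a page view to the trail kept for one session; skip assets."""
    if not is_page_view(url):
        return False
    when = when or datetime.now(timezone.utc)
    sessions[session_id].append((when, urlsplit(url).path))
    return True


# In-memory store for the sketch; a real site would write to a file or database.
sessions = defaultdict(list)
record_hit(sessions, "abc123", "/index.php")
record_hit(sessions, "abc123", "/img/logo.gif")       # ignored: static asset
record_hit(sessions, "abc123", "/products.php?id=7")  # query string is dropped
trail = [path for _, path in sessions["abc123"]]
print(trail)
```

Keeping the trail per session is what lets you follow one visitor page by page instead of only seeing totals.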
For me the most interesting feature of a statistics package is being able to do web mining. Very few do this; the only one I know of is the one I am using (metriserve web analytics). Basically it allows you to find hidden links between pages of your site even when they do not directly link to each other. This gives interesting results on one of my pr0n sites. You would not believe the hidden relations you can find between models, poses, etc as surfed by my visitors. I am sure the data could actually be used for an interesting PhD study on the matter ;-) Or maybe I should implement a nice "we also recommend..." feature.
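One simple way to look for that kind of hidden relation is to count which pages turn up together in the same session. This is only a sketch of plain co-occurrence counting. I have no idea how metriserve actually does it, and the session data below is invented.

```python
from collections import Counter
from itertools import combinations


def co_visit_counts(sessions):
    """Count how often each pair of pages appears in the same session."""
    pairs = Counter()
    for pages in sessions:
        # Sorted set: each pair counts once per session, in a stable order.
        for a, b in combinations(sorted(set(pages)), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up sessions, one list of visited pages per visitor.
sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/c"],
    ["/b", "/c", "/b"],
    ["/c", "/a"],
]
counts = co_visit_counts(sessions)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with a high count that do not link to each other are the candidates for a "we also recommend..." box.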
The google one is supposedly good and it doesn't cost anything.
But frankly, I'd rather not let anyone hold all the details of my site, even Google, who of course want it for the mass of applicable marketing data. And the concept of linking to foreign scripts on my site isn't too tasty. Really, you should go for a SERVER solution that you run yourself. Apache can give you a wealth of information already. Posting every user's details every minute to Google is just crude.
I've read about this - if you're some trendy blogger, this looks like an iPod: http://www.haveamint.com/. I don't know of any free equivalents.
I think you might like visitors: http://www.hping.org/visitors/
I ran AWStats for some time, but it had too many security issues. Now I am using Tracewatch at http://www.tiouw.com/. Free, open source and great. See http://www.tracewatch.com/. Nice features are that you are able to follow individual visitors through your sites, find most common paths, etc.
When I was looking for a weblog analyzer I could find packages in two categories: $800+ visitor trail analysis software (really cool, but far too expensive for my clients) and free packages that only give very basic information (Analog, Webalizer).
Wusage gives me very detailed statistics, most followed trails, keyword effectiveness, history of data that goes back infinitely, far more than webalizer or AWstats ever gives you, for $25.
Looks totally uncool though, with a tables-and-frames-based interface. It hasn't been updated for quite some time. But it is by far the most functional analyzer I have ever found for less than $300.
http://www.boutell.com/wusage/
I used phpmyvisites before, and it isn't too bad - setup was a breeze, and it gives good stats. But I've moved to Google Analytics now.
Webalizer is very useful: we recently set up a new web site, and the information it provides has been handy for tweaking. It doesn't seem to provide everything we could want - there's no obvious way to gauge the relative popularity of different links on a given page, for example - but it does provide an idea of relative browser popularity among our visitors, which pages are most important (or at least most visited), and other useful information.
Of course, like all log file-based tools, it suffers from the modern day curse of webmasters everywhere: caching. For example, the site I mentioned is for a university club. Around 1/3 of our hits are from the university cache servers, which all students are strongly encouraged to use. That messes up any analysis of total hits on each page of the site, and it would also mess up analysis of which links people tend to find most useful (assuming those they follow from one page to another are representative of this) if we had tools to do that.
I'm sure anyone who reads Slashdot regularly will see the upside of caching, but a lot of people forget that it has a downside as well. As a webmaster trying to set up the most useful site possible (this is a non-profit group, run by volunteers, so my interests here are entirely benevolent) I would be more than happy to have accurate stats for all visits to our site in the past month, say, rather than lower bandwidth use.
AFAICS, the only way to get anything close to accurate stats at the moment is to install some sort of "web bug" that will make it through the caches. However, this has rather sinister overtones, and I'm reluctant to do something that might be perceived as "spying". Would the crowd here consider it reasonable to go down that road, given that as stated above we have no ulterior motive and are just trying to monitor the way our new site design is working with a view to improving links etc? Would it reassure you if we did the "privacy policy" thing? (Personally, I don't find them particularly reassuring in most cases. If I don't trust a site not to screw me, why would I trust it more because they say they won't? Then again, full disclosure and all that...) Do we really have to resort to the tried-and-tested "visit counter" graphics? :o)
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
First, Google now has very detailed info on your customers. Customers like me, who view source to see whether you are providing such info to third parties without telling us, will leave your site and find somewhere else to shop. Also, anyone with a brain has Google Analytics blocked in their hosts file:
$ grep goog /etc/hosts
0.0.0.0 www.google-analytics.com
0.0.0.0 google-analytics.com
The company I used to work for, a top 10 Dutch e-commerce business, used Onestat. The marketing managers went crazy over it. They liked it a lot.
On the downside: it was expensive and sometimes the backend was very slow.
On the upside: it was possible to measure the visitors at a very low as well as a high level.
They will not only have to give these services away for free ... too bad not many realize that. They will need to pay people to install their f&cking trackers on their websites.
I really like the Visitors package (http://www.hping.org/visitors/). It does a really good job identifying multiple hits as a single visitor (if timestamp, user-agent, etc match). It also has some very good summaries of Google hits.
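To show roughly what "multiple hits as a single visitor" means, here is a sketch. It is not how Visitors itself is implemented. The rule used (same IP and user agent, with hits no more than 30 minutes apart) and all names and data are my own assumptions.

```python
from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=30)  # assumed cutoff, not taken from Visitors


def count_visits(hits):
    """Count visits, where hits is an iterable of (ip, user_agent, timestamp).

    Hits sharing IP and user agent belong to one visit as long as each
    follows the previous one within VISIT_GAP.
    """
    last_seen = {}
    visits = 0
    for ip, agent, when in sorted(hits, key=lambda h: h[2]):
        key = (ip, agent)
        previous = last_seen.get(key)
        if previous is None or when - previous > VISIT_GAP:
            visits += 1
        last_seen[key] = when
    return visits


# Made-up hits for illustration.
t0 = datetime(2006, 1, 1, 12, 0)
hits = [
    ("10.0.0.1", "Firefox", t0),
    ("10.0.0.1", "Firefox", t0 + timedelta(minutes=5)),  # same visit
    ("10.0.0.1", "MSIE", t0 + timedelta(minutes=6)),     # different user agent
    ("10.0.0.1", "Firefox", t0 + timedelta(hours=2)),    # new visit after the gap
]
total = count_visits(hits)
print(total)
```

As others in the thread point out, proxies and caches mean a count like this is still an estimate and not a headcount.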
ClickTracks has some interesting features geared towards visitor behavior. I've only just started using it, but I have some doubts about the accuracy of their numbers. It also is missing some of the basic information you would expect from a traditional web stats program.
I was pretty impressed with our demo of NetTracker but it requires some serious cash if you have a busy site.
As with most things, it's not really that one package is "better" than another so much as that one might be more useful to you at any given time.
I use my own package when a Web site is smaller (say, below a million hits per month) because I would rather sample some actual sessions and see where people went and what they were searching for than get an overview. If you see people are searching for Argyle Socks and are finding your page about the Duke of Argyll, you might want to add an extra page and link to it, "if you were looking for...".
The statistic you most want is the things people looked for that might have reached your Web site and didn't, and that's the one you can't easily find!
For a site getting under 1,000 hits per day, look at the server logs in detail at least once a week, and make navigation easier, add more content where it looks promising, think about why some areas don't get traffic, etc etc.
When you're getting 10,000 hits/day, unless most of them are for graphics, the data can become overwhelming. And if you're over 100,000 hits per day you probably need to go to the sorts of reports that give you a very broad overview.
A link checker and a 404 report can be useful -- Cool URIs don't change!
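A 404 report is easy enough to knock together yourself. The sketch below assumes Apache common or combined log format. The regular expression and the sample lines are mine, so check them against your own logs.

```python
import re
from collections import Counter

# Matches the request and status fields of an Apache common/combined log line.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def not_found_report(lines):
    """Return a Counter of requested paths that produced a 404 status."""
    missing = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "404":
            missing[match.group(1)] += 1
    return missing


# Invented log lines for illustration.
sample = [
    '1.2.3.4 - - [01/Jan/2006:12:00:00 +0000] "GET /socks.html HTTP/1.1" 404 209',
    '1.2.3.4 - - [01/Jan/2006:12:00:05 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '5.6.7.8 - - [01/Jan/2006:12:01:00 +0000] "GET /socks.html HTTP/1.0" 404 209',
]
report = not_found_report(sample)
for path, n in report.most_common():
    print(n, path)
```

Paths that show up repeatedly are either broken links worth fixing or pages people expect you to have.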
Oh -- for anyone interested: although I do have hololog set up on, for example, my "words and pictures from old books" Web site (in a private directory, sorry), the SourceForge page doesn't have a download, mea culpa. If it looks useful to anyone, I've shared copies of "hololog" in the past. It could do with some cleaning up, alas!
Liam
Live barefoot!
If you let a search engine company provide web stats for you, then what you are doing is contributing additional information to the search engine, and this comes at the expense of your web users' privacy.
This is more than just information about traffic to your web site. When a user's browser processes the Google code snippet embedded within your web page, the browser also forwards to Google's web host any cookies stored in the browser which originated from domain google.com, and the information forwarded will include a global ID number which is unique for that particular web user. Google's database will already include everything the user with that global ID has ever searched for; a tremendous amount of information. But now Google will have the additional information that the user visited a specific page on your site at a specific time; and perhaps even the amount of time spent there.
When you combine all of this information with the data similarly collected from all of the Google Ads which are everywhere, what you end up with is a search engine company that starts to look more like an aggregator, and specifically, a company that looks increasingly like the notorious DoubleClick.
And don't fall for the line that nothing personally identifiable is retained, because your IP address alone is very often considered to be personally identifiable information, and they definitely do retain this.
I have to recommend summary.net
It works great, is easy to set up and customize, and has lots of different kinds of reports available, along with definitions for the different stat types (great for management to be able to understand what they're looking at).
Plus the developer is very responsive.
I've done web analytics implementations for smaller (e.g. $10M e-commerce sites) and larger (e.g. hundreds of millions PV/month) companies.
I'm not much of a fan of log file based analytics systems. They are simply too much work to maintain from an infrastructure POV, and caching wreaks havoc with the accuracy of the stats. I therefore recommend 1x1 transparent pixel based systems. If you insist on log file based systems, NetTracker and WebTrends make some decent products.
Google Analytics is a great package for smaller companies. It is free and offers a nice chunk of functionality. Caveat emptor -- you get what you pay for. When I audited my last employer's GA e-commerce metrics against actual online sales, there was a substantial error (I think ~10%)! However, it is still a good tool for understanding trends and issues w/ your analytics.
WebSideStory (HBX) and Omniture rule the high end market. It has been a while since I checked pricing, but I think you can expect to start out at the ~$10-$20K/yr range. Both of these products are excellent.
WebSideStory sells a lower end package (HitBox Professional) that has limited commerce metrics but is also pretty decent and affordable. They have an enterprise system, HBX, that is excellent.
Omniture also has an impressive system. I don't think they have much in terms of entry level offerings.
WebTrends has a product, WebTrends Live, that is about 1/2 the price of the enterprise products from WebSideStory and Omniture. It has been a good 5 years since I've used their product, but I wasn't especially impressed with it at the time.
Evolution: love it or leave it
For me, you really can't beat a bit of grep, awk, wc and other bits of shell jiggery-pokery. I don't feel the need for webstats beautifiers, although they do have their place. With the vulnerabilities in awstats I wouldn't touch it with somebody else's barge pole these days, which is a shame because I used to really like the look and feel of it.
Analog and webalizer, from ports, might get used on some deployments, from time to time, but that is probably as far as it goes. Hit the console and be your own log file analyser, it is more fun and far more flexible.
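The sort of one-off counting meant here is usually a pipeline along the lines of awk '{print $7}' access_log | sort | uniq -c | sort -rn | head. To keep the examples in this thread in one language, here is the same idea sketched in Python. It assumes Apache common log format, where the seventh whitespace-separated field is the requested path, and the sample lines are invented.

```python
from collections import Counter


def top_paths(lines, n=10):
    """Return the n most requested paths as (path, hits) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Field 7 (index 6) is the request path in common log format.
        if len(fields) > 6:
            counts[fields[6]] += 1
    return counts.most_common(n)


# Invented log lines for illustration.
sample = [
    '1.2.3.4 - - [01/Jan/2006:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '1.2.3.4 - - [01/Jan/2006:12:00:09 +0000] "GET /about.html HTTP/1.1" 200 2048',
    '5.6.7.8 - - [01/Jan/2006:12:01:00 +0000] "GET /index.html HTTP/1.0" 200 5120',
]
top = top_paths(sample, 1)
print(top)
```

To run it on a real log, pass an open file object in place of the sample list.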