Domain: mrunix.net
Stories and comments across the archive that link to mrunix.net.
Comments · 24
-
Cephalopod
Squid works well as a transparent proxy and, when used in conjunction with a log parser, might be just what you're looking for.
-
I've had something similar with nedstat ...
I've been using their "free" basic service for years; it was always their small little 16x16/32x32 icon; not really intrusive.
Then suddenly my pages using their stats service had a nasty pop-under. I've seen this at other sites too, and I found out about their "new" advertising methods after a few weeks, when I got bothered by seeing the same pop-unders over and over while I wasn't even on any other sites.
These pop-unders all fired under Firefox, and it's clearly in their TOS that they can advertise on websites; only, what I had on my website was anything but "good" for my site: the pop-under involved pornography because of a reference to some articles about STDs from a couple of years ago. It made me sick to always get that XXX commercial on my own website, and I have been rid of Nedstat ever since.
webalizer for the win! less eye candy but still enough stats to chew on without all the nastiness...
-
Don't use cookies to log traffic
Use IP address. Use webalizer. People using cookies to log traffic raise internet traffic by 150%, how about that!
-
Good old Webalizer and newer stuff
Awstats seems to be the usual modern answer (http://awstats.sourceforge.net/), used and recommended by many admins and groups (in my case EGEE, European Science Grid initiative, http://www.eu-egee.org/), but for traditionalists with no eye-candy desires, there is a copy of Webalizer (http://www.mrunix.net/webalizer/) lurking on most servers and in almost all distribution package repositories. It's worth looking at the Wikipedia page for specials, extended versions and general info on web server statistics and analysis: http://en.wikipedia.org/wiki/Webalizer.
In particular, Stone Steps Webalizer is an interesting feature-full and candy-enabled version: http://www.stonesteps.ca/projects/webalizer/. Others can be easily found on Freshmeat: http://freshmeat.net/search/?q=webalizer&section=projects (e.g. Webalizer Extended, with included Geolizer and extensive 404 analysis support, http://www.patrickfrei.ch/webalizer/, and AwFull, with usability, CSS and geo-IP features, http://www.stedee.id.au/awffull, etc.).
Still others can be found on Freshmeat (117 hits at this time: http://freshmeat.net/search/?q=web&trove_cat_id=245&section=trove_cat) and Wikipedia (a very short and poor stub of a list that you might want to improve after your extensive testing :-) : http://en.wikipedia.org/wiki/Category:Free_web_analytics_software).
There is also Sherlog, an Apache log analyser specialized in user experience tracking more than statistics, which makes it an interesting complementary tool (http://sherlog.europeanservers.net/).
-
Re:AWStats
Webalizer also does a great job of parsing your logfiles and producing graphs and charts:
http://www.mrunix.net/webalizer/
-
Webalizer or Analytics
Webalizer. Just feed it some nice Apache logs, and let it do the talking. Or, if you're less of a command-line guy, I've heard Google Analytics is great.
-
Just count visits... why count IPs?
If you use http://www.mrunix.net/webalizer/ then it counts the number of visits: the same IP is still the same visit if the user of that IP (re)loads a page within 30 minutes. After that it's a new visit... And it should be safe to assume one visit is one user. Saying that 1:10 is the ratio for IP/users is simply saying that every user will visit the site ten times, which seems like a worthless number without limiting it to a time period; also, the number seems to be taken out of nowhere.
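The visit rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Webalizer's actual code: the function name `count_visits`, the `(ip, timestamp)` input shape, and the choice to measure the 30 minutes from the previous hit of the same IP are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed timeout, taken from the 30 minutes mentioned in the comment.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(hits):
    """Count visits in an iterable of (ip, timestamp) pairs.

    A hit starts a new visit when its IP has not been seen before, or
    when more than VISIT_TIMEOUT has passed since that IP's previous hit.
    """
    last_seen = {}  # ip -> timestamp of that IP's most recent hit
    visits = 0
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[ip] = ts
    return visits


if __name__ == "__main__":
    start = datetime(2005, 1, 1, 10, 0)
    sample = [
        ("192.0.2.1", start),
        ("192.0.2.2", start + timedelta(minutes=5)),
        ("192.0.2.1", start + timedelta(minutes=10)),  # same visit
        ("192.0.2.1", start + timedelta(minutes=50)),  # new visit
    ]
    print(count_visits(sample))  # 3 visits from 2 distinct IPs
```

With the sample data, two IP addresses produce three visits, which is the sense in which a visit count says more about usage than a raw IP count.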
-
Re:Big Brother-esque (again)
Webalizer does incremental batches and keeps running totals.
-
webalizer referrer work-a-round patch
We started seeing this type of spam back in June of 2004. In our case the referrer spam was attempting to get webalizer to create links in the "top N referrer" table back to their pron sites.
Our initial attempt to solve this was to complain to the ISP of the referrer spammers. That did no good. The ISP was willing to listen, but not to act.
We did manage to actually track down the jerks who were doing the referrer spam. They told us that they were attempting to create links back to their sites for better search engine placement.
Our work-a-round was twofold. For various reasons we wanted to keep our webalizer stats externally accessible. So we asked bots (the ones that follow the rules, at least) not to index our external stats, and we modified webalizer to not form links back to the referrers.
We edited our robots.txt file to exclude legit bots from our stats:
User-agent: *
Disallow: /stats
We also patched webalizer v2.01-10 to no longer form URLs to referrers. Now only a plain text line without the leading http:// shows up in the table. The original referrer spammers gave up when they lost all of the links back to their sites.
The bottom of the 0.basic.patch prevents webalizer from forming links back to referrers. See README-FIRST for details on this patch set.
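Both halves of that work-a-round can be illustrated in Python. The sketch below is not the patch itself (Webalizer is written in C, and the patch is not reproduced here). It uses the standard library's `urllib.robotparser` to check that the robots.txt rules quoted above really hide `/stats` from a rule-following bot, and shows one assumed way of printing a referrer as escaped plain text instead of a link. The function names, the `ExampleBot` agent and the example URLs are made up for illustration.

```python
import html
from urllib.robotparser import RobotFileParser

# The two rules quoted in the comment above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats
"""


def hidden_from_bots(robots_txt, url, agent="ExampleBot"):
    """Return True if a rule-following bot may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


def referrer_as_plain_text(referrer):
    """Drop the leading scheme and HTML-escape the rest.

    The result is meant to be written into a report table as text,
    never wrapped in an <a href="..."> element.
    """
    for scheme in ("http://", "https://"):
        if referrer.lower().startswith(scheme):
            referrer = referrer[len(scheme):]
            break
    return html.escape(referrer)


if __name__ == "__main__":
    print(hidden_from_bots(ROBOTS_TXT, "http://example.com/stats/usage.html"))
    print(hidden_from_bots(ROBOTS_TXT, "http://example.com/index.html"))
    print(referrer_as_plain_text("http://spam.example/?q=<b>"))
```

Note that robots.txt only affects crawlers that choose to obey it, which is why the comment pairs it with the change to the report output.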
-
Re:No, *I* am Spartacus!
"How can you using Lynx - not being able to access a website - teach anyone else about accessibility? The result is simply going to be that you're not going to see the website."
no, there's another result; hits from lynx browsers will appear in the site's web access logs, and any statistics (cf. webalizer) derived therefrom.
the hope is that the site operators will see this and realize that they need to make their sites accessible to such browsers.
whether this is a realistic hope is another question. i'd guess that the vast majority of website operators are completely ignorant of such things. it's not that they don't care about accessibility to text browsers, but worse: they don't even know such things exist.
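To make the point about access logs concrete, here is a rough Python sketch of how an operator could see what share of hits comes from a given browser. It assumes Apache's "combined" log format, in which the user agent is the last quoted field; the name `agent_share` and the regular expression are illustrative, and real logs with escaped quotes inside fields would need a more careful parser.

```python
import re

# Tail of a "combined" log line:
#   "request" status bytes "referrer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$')


def agent_share(lines, needle):
    """Return (matching_hits, total_hits) for agents containing `needle`.

    Matching is case-insensitive; lines that do not look like
    combined-format entries are skipped.
    """
    matching = 0
    total = 0
    for line in lines:
        match = LOG_TAIL.search(line)
        if match is None:
            continue
        total += 1
        if needle.lower() in match.group("agent").lower():
            matching += 1
    return matching, total


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        '192.0.2.10 - - [01/Jan/2005:10:00:00 +0000] "GET / HTTP/1.0" '
        '200 1043 "-" "Lynx/2.8.5rel.1 libwww-FM/2.14"',
        '192.0.2.11 - - [01/Jan/2005:10:00:05 +0000] "GET / HTTP/1.1" '
        '200 1043 "http://example.com/" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    ]
    hits, total = agent_share(sample, "lynx")
    print(f"{hits} of {total} hits came from Lynx")
```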
-
Re:Analog
Yes, Analog, AWStats, and Webalizer (here's Webalizer Win32) are the three packages that my web host installs for all its users.
-
Re:Log spammers
Many people (including me) use a tool like the webalizer, which generates a page of server statistics. This page links back to the referrers. So yes, Google has access to the server logs.
The "referrer spam" phenomenon began in the weblogging community, which uses things like the webalizer extensively.
I recommend asking Google not to cache the webalizer stats page (via robots.txt).
-
Re:that stinks
That's my strategy too. I would take it one further, however. Pick apart the email headers that you get bounced back to you and alert the ISPs. Many ISPs (not all) have policies against mass emails. If the email links to a website, find the domain admin and technical contacts and let their ISP know.
In my experience, ISPs aren't as responsive to SPAM-type abuse issues as webmasters are.
My homepage (vanity domain) has a "Webserver Stats" section that logs, among other things, referrers to my site. Some unscrupulous types found this out and decided to take advantage of this public advertising medium. What resulted were literally hundreds of requests per night for each of about 30 domains (almost all of them pornographic in nature) for nonexistent files (I suppose they figured my 404 page was the smallest thing on my site). With their URLs in the referrer field, Webalizer dutifully added them up and created a referrers graph that, not surprisingly, was filled with the top ten of these porn sites.
These attacks (which also flooded my ADSL line's bandwidth, I might add) were carried out from two major U.S. ISPs. E-mail to the ISPs got me little more than automated "Thanks for the heads-up" responses, so I decided to go after the websites themselves. A little whois work on the domains and I found that they were all hosted at the same hosting company, which responded immediately requesting more information and then acted on the complaint in short order, and the problem went away.
A little bit of diligence and these people can be nailed down. Many of them don't seem to host their own websites, so use their webhosting companies against them. Track them down and have them ousted. The transition time to a new company will be a real bite in the keester and should make their job a little less worthwhile. At the very worst, they'll give up on third-party hosting companies and have to shoulder the cost of hosting the sites themselves.
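The pattern described above (many requests for nonexistent files, all carrying the same referrer) is easy to look for in a log. The Python sketch below is one assumed way to do it for Apache "combined" format logs; the function name `suspicious_referrers` and both thresholds are hypothetical and would need tuning against real traffic.

```python
import re
from collections import Counter

# Tail of a "combined" log line:
#   "request" status bytes "referrer" "user-agent"
LOG_TAIL = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"\s*$'
)


def suspicious_referrers(lines, min_hits=100, min_404_ratio=0.9):
    """Return referrers whose hits are numerous and almost all 404s.

    Both thresholds are illustrative guesses, not tuned values.
    """
    total = Counter()
    missing = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if match is None:
            continue  # not a combined-format entry
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # no referrer was sent
        total[referrer] += 1
        if match.group("status") == "404":
            missing[referrer] += 1
    return sorted(
        referrer
        for referrer, hits in total.items()
        if hits >= min_hits and missing[referrer] / hits >= min_404_ratio
    )
```

The returned list is a starting point for the whois work described above, and could also serve as an ignore list for the stats report.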
-
Re:Corporations
I'm sure that when a Red Hat rep walks into a company his materials leave out the $ in Micro$oft.
Sometimes it will slip in even when you aren't the one mangling names like a six-year-old. At my last job, I installed Webalizer on the web server because I wanted an excuse to make the site not quite so hostile to non-IE browsers, and I had heard that it could report browser statistics (which it can). Unfortunately, the author of Webalizer apparently thought it was "cute" to report occurrences of Microsoft Internet Explorer as Micro$oft Internet Exploder. Take a look at the sample report for the software (from the Webalizer website) for an example of what I mean. There is no mention of this misfeature anywhere. Luckily I noticed it and figured out how to change the entry name before I showed it to my boss, because he would not have found it funny, and most definitely not professional. Unfortunately, there wasn't a large enough percentage of non-IE users of the site to even pretend to be concerned about cross-browser issues.
-
Re:Karma whore link
Of that bunch, I must say that I really like Webalizer. It produces really nice looking reports with pie and bar charts and the level of detail can be customized to almost any need. It's also nice that it'll work on both web server logs as well as squid logs....
Analog may be the most popular, but I also found it rather difficult to set up and get useful data into and out of.
Balam
-
I've done this myself on eBay....
Once, a long long time ago, I was checking out the stats for my webpage with the Webalizer and was noticing an awful lot of referrals from eBay. Manually parsing my Apache log files I found the auction number and looked it up...
Imagine my surprise when I found it was some lamer selling burned CDs of encoded anime fansubs. Being friends with people who encode fansubs (freely), I was most put out by the fact that some scumbag was attempting to profit from it. There was only one thing I could do...
Since the lamer had linked to a (huge) wallpaper image on my site to use as his page background I did the sensible thing: renamed the wallpaper, downloaded the picture of Sting3r (the goatse guy) and stuck it in place of the wallpaper's original filename.
Needless to say eBay pulled the auction in short order, something they wouldn't have done if I'd simply cried "copyright infringement!"
-
reporting
While obviously your situation may preclude it, I've always found Perl's built-in formatting capability to be incredibly easy to use, and it also performs nicely. It's so nice that I've often gone to the trouble of adding a Perl reporter to my C++/Java/et cetera applications. They don't call it "Practical Extraction and Report Language" for nothing. (Then again, they don't call it "Pathologically Eclectic Rubbish Lister" for nothing, either. ;-)
Regardless of the program you use, try to store the data in XML format. Why? Because then you can use one XSLT for conversion to HTML for web use, another XSLT for conversion to PostScript for printing, another XSLT for conversion to Excel spreadsheets -- you get the idea. While I hate to say so on this site, SQL Server 2000 offers some particularly nice functionality that can be used to implement this -- such as automatic transformation of tables to XML documents.
If you require graphics as well as text, check out the gd graphics library. The Webalizer is an absolutely delicious example of how gd can be used to create slick graph images on the fly.
You mention that you'd like to integrate with J2EE... I'm somewhat of a Java guru and can say without wavering that Java is not a first-choice solution for text-based reporting. If your reports are being generated by a Perl or PL/SQL script and you're just outputting the results from Java, it's fine ;), but text processing and transformation aren't too hot in the standard Java APIs. Now if you want to pay for a third-party API, you may be able to get around this...
For graphical reporting, however, Java is one of the best solutions. There is a plethora of Java charting tools available, although the decent ones will cost some dough...
Anyhoo, if you provide some more details on your specific task I can give a better recommendation.
-
Re:Questions on making your own stats
but I'd like to find out what others are doing for making their own web stats.
I use Webalizer. It's GPL, it works, it's fast and very configurable. It logs search-strings (very configurable) so you know what people were looking for. It can detect a 'visit' reasonably (with a configurable timeout).
-
Web Logs
I highly recommend Webalizer for your web logs. It's the best I've seen, since its incremental feature means that deleting/cropping your Apache logs doesn't faze it, and it really tracks everything you could possibly want: views/hits/files/visits, referrals, search engine keywords, daily/hourly stats, pretty graphs, the works. I've got it cronned to run every hour, and it parses my logs in a few seconds on a p166. You can grab it from FreeBSD ports, or at its webpage here.
-
Usage Info
To generate page stats, check out the Webalizer.
-
Re:On that note, Web Log Parsers?
Here is Analog
I use Webalizer, too.
-
Question
Here's the problem I have with all of this. I run a small web hosting business out of my house, and I DEPEND on free and open source software to do this.
I have no problem changing all the gifs on my servers and all the related html, HOWEVER...
Because of my dependence on free software (and automatically created images), I now have to worry about the programs I have that generate GIF images to display information, like Webalizer & MRTG.
My clients expect web statistics for their sites, so doing without is not an option.
Am I going to have to go and buy my software now (shudder)???