Hits or Misses: Who is Your Website's Audience?
securitas writes "The Christian Science Monitor's Gregory M. Lamb wrote a
story interesting to anyone who runs a website: How do you accurately and reliably measure the audience for your website? From the article: 'Most websites have no idea how many people view their content. This inherent fuzziness is causing problems for commercial websites, especially online publications desperate to make money from Internet advertising... How can you charge for ads when it's nearly impossible to tell advertisers how many people will see them?' The article discusses the flaws and problems with Nielsen/NetRatings and comScore Media Metrix - they grossly undersample workplace users - and the rise in the number of sites requiring user registration."
By height.
I always just set a cookie with a tracking ID, and then use that to keep track of the anon user. Counting the number of tracking cookies given out each day, and the time they were used for, seems to work sufficiently for me... or is there some problem with that I don't know about?
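A minimal sketch of the tracking-cookie approach described above, assuming a cookie named `tracking_id` and an in-memory log (both are hypothetical; the post does not say how the ID is generated or stored):

```python
import uuid
from collections import defaultdict
from http.cookies import SimpleCookie

COOKIE_NAME = "tracking_id"  # hypothetical cookie name, not from the post


def get_or_assign_tracking_id(cookie_header):
    """Return (tracking_id, set_cookie_value).

    set_cookie_value is None when the browser already sent an ID,
    otherwise it is the value for a Set-Cookie response header.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value, None
    new_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = new_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 365 * 24 * 3600  # keep the ID for a year
    return new_id, out[COOKIE_NAME].OutputString()


class DailyVisitorLog:
    """Counts distinct tracking IDs seen per day (in memory only)."""

    def __init__(self):
        self.ids_by_day = defaultdict(set)

    def record(self, day, tracking_id):
        self.ids_by_day[day].add(tracking_id)

    def unique_visitors(self, day):
        return len(self.ids_by_day.get(day, set()))
```

This counts browsers, not people: a shared machine counts once, and a visitor who clears or rejects cookies counts as new on every visit.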
Call me oblivious, but wasn't this one of the reasons why cookies were created?
Hmmm.
that is why most online advertising consists of fees based on the 'per click' methodology?
"How can you charge for ads when it's nearly impossible to tell advertisers how many people will see them" --- These people use access logs??
I don't know what the real strategy of most online newspaper websites is, but they seem to follow this pattern:
1. Make content available online, free of cost
2. Wait for people to start using and monitor the growth in number of hits
3. Reduce the website response to a crawl with mind-numbing popups, Flash ads, QuickTime ads, and generally anything that would make sure the user "spends" more than a few minutes on the homepage
4. Wait for most users to go away to some other website.
5. The brave few who remain - force them to register and read all the content, since you want to chart your users by demography.
6. Finally, now make most of the content premium - based upon the data collected in step 5, however inaccurate it is. Flood the site with more ads, if possible
7. Moan and bitch that there is no revenue generated.
8. Repeat cycle
http://efil.blogspot.com/
From the article: 'Most websites have no idea how many people view their content.
They don't employ a unique ID stored within a lightly encrypted cookie, then? Of course, those merely provide a statistic related to the number of individual computers viewing the Web site, not the number of people. They obviously fail to account for computers with multiple users, such as household machines and public terminals.
Do you like German cars?
may think their audience is a bunch of nerds, but in reality it's a bunch of suave playboys that get to have sex with many hot women. I suggest they make the appropriate content changes.
This is completely backwards. In fact, it's exactly the opposite. It's quite simple to tell how many people view your webpage, and a hell of a lot easier (and more accurate) than radio or TV.
This is the source of the problem with web advertising: your numbers are fairly accurate and based on actual events, not some statistically questionable sampling method. There's little room for fudging.
Demographics on the other hand are a little more complicated. There, you actually have to ask.
---
Less Talk, More Beer.
I suppose the CSM is about to discover how many slashdotters view the content of this website...
Advertisers shouldn't care how many people visit.
Here's a good example. The website Xlr8yourmac.com is easily the single most valuable website to me on the internet. I would imagine its hit totals are pretty low. That said, it generates a good bit of revenue from advertisers, mainly Other World Computing.
My main website generates traffic through eBay sales and by posting in forums such as this around the internet.
I mention that I have a website in all my ebay winning bidder emails.
I also generate traffic by posting in forums about Apple or Mac Community topics.
Further I have a traffic generating site called JackWhispers that follows Mac Community Scams and provides different perspectives on Macintosh news. I could care less if anyone visits. All I know is that the target IS getting my message or at least finding it in Google.
My traffic to jackwhispers has risen from 400 to about 4000 a month.
I also sell to people here on slashdot - people that post in my journal and see my various postings.
Yahoo GeoCities premium accounts (as my site is) monitor traffic without cookies (if you want them to)
Yell & scream & rant & rave... it's no use... you need a shaaaave ~ Bugs Bunny
Most hosting services come with tracking tools. My host has tools that will even break down IPs to general locations, I believe. It has so many options that it gets difficult to use. So, if you have a good host, you should be able to find out who uses your site w/o any additional work.
If not, as most have said, set a cookie with a tracking ID. Basically, if you make a website without a decent hit counter (when you need one), you're not much of a web designer / developer. I usually log IPs, user agents, and dates, even though I never look at them. Just in case.
I would put a CGI page counter at the bottom of every page. I think the one with flame numbers works the best for this, but the digital-looking one also works well.
Great ideas often receive violent opposition from mediocre minds. - Albert Einstein
I don't care who looks at my site as long as my statistics page reports more than just me.
Anyway, the exact numbers don't really tell you anything. You really need to know the differences between two sub-populations (are visitors from pay-per-click ads or visitors from standard search results more likely to buy?). A program which makes this sort of comparison easy will give you far more insight than one which tries to get the total number of visitors closer to some mythical "true" number.
(I am the author of analog and CTO of ClickTracks, but I'm writing in a personal capacity).
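To illustrate the kind of sub-population comparison described above, here is a rough sketch. The source labels ("ppc", "search") and the input shape are assumptions for illustration, not the output of any particular log analyzer:

```python
def conversion_rates(visits):
    """Compute the share of buying visits per traffic source.

    visits: iterable of (source, bought) pairs, where source is a label
    such as "ppc" or "search" and bought is True if the visit ended in
    a purchase.
    Returns {source: fraction of visits that bought}.
    """
    totals = {}
    buyers = {}
    for source, bought in visits:
        totals[source] = totals.get(source, 0) + 1
        buyers[source] = buyers.get(source, 0) + (1 if bought else 0)
    return {source: buyers[source] / totals[source] for source in totals}
```

Both groups are measured with the same imperfect counting, so the difference between the two rates can be informative even when neither absolute visitor count is exactly right.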
11.0010010000111111011010101000100010000101101000
For small sites (weblogs and the like), an analysis of the logs, the IPs you're visited from and the HTTP referrers would give you an approximate snapshot of the average visitor's likes and dislikes. And, what's more: you can feed that information back to the search engines (via web programming, be it CGI, PHP or ASP).
So, you track the HTTP referrers that browsers send to your server. You display them, summarized, in your web page. Search engines are fed that information. More people will get to your page due to this feedback.
That's in theory. Back here in the real world, if you run a weblog, put "g-string" somewhere in your page, "teens" twenty paragraphs later, "15-year-old" forty paragraphs later, and "nude" sixty paragraphs later (about unrelated topics). Just by chance, somebody will visit your page searching for "15-year-old teens in g-string" and that will be fed back to the search engines. Lather, rinse, repeat. In about two months, paedophiles from all around the world will be visiting your tiny weblog and you'll be sure about that thanks to the referrer processing and feedback. Guaranteed.
I found this article to be rather insightful. I personally run a small IT/science-news site (in Finnish) and I'm really having a hard time figuring out the visitors of the site. Of course I can get some data from the log analyzing software (awstats and webalizer are being used for the site) but it really doesn't tell me what I want. It seems that the website logs don't always tell the truth. For example, I'm getting about 20-30 hits a day with a referrer pointing to a site that's a search engine for blogs (${god} knows why the site has been tagged as a blog), but browsing through the actual logs reveals that the hits belong to an indexing robot of that site that's a little too enthusiastic.
The most reliable way to find out about the visitors on a given site would be a user survey. It would not be complete, as not everyone would fill it out, but it would give an idea about the habits of your most frequent visitors. If I were an advertiser, I would be interested in more than just the number of hits and visits, and most advertisers would be baffled by stuff like "we got XXXYYYZZZ HTTP requests last month". Personally I would prefer to advertise on sites with a well-built sense of community and an active userbase that's keen to interact with the website. When I browse a site for the first time, or a site that I visit infrequently, I rarely click on banners or ads. I'm more prone to clicking ads on sites which I visit daily or so: it gives me a feeling of supporting the site I like, and I just might buy something from the advertiser if they are offering something that I need. Therefore focused advertising is the key, and hence again you need to know your users.
Logs tell you numbers but you need the visitors themselves to tell you who they really are and how often they visit your site.
http://www.mrunix.net/webalizer/ http://awstats.sourceforge.net/
...amaze me. I recently helped a friend put together a website for his bakery. Why did he want a website? Because it was something to do that he hadn't done before. Will it drive customers to his place? I doubt it; most small companies like that survive on local ads and word of mouth. I guess my point is that I am still, after all this time, doubtful when it comes to the accuracy or usefulness of ads or sites based on visits, click-throughs, etc. I don't think knowledge of the availability of a product is enough; a site must be informative and interactive above and beyond what other forms of advertising can do. While some companies do a great job of this, too many others are like my friend's site---little more than a billboard.
Don't be a looter...and yes, I know that it's spelled with an "A" instead of an "E".
Like television, it's all page views. It's not complicated. It does not matter who viewed, or when, or repeated, or same computer but another person. Eyeballs, a pair of eyeballs come to the page x times a day. This is the number to go with. Now if you have a membership and can get demographics in the sign up form, that is another story.
Anyone seen my jagged little pill?
Alexa's model is interesting - they hand out a "free" toolbar that gives you google search, as well as pinging Alexa and showing you every page's Alexa rank.
Unfortunately, the toolbar also slows down your browsing (especially if you're on dialup). And the more tech-savvy a user is, the less likely they are to want that toolbar on their system. Thus tech sites are going to be depressed in those rankings, always.
Alexa also can't tell a subdomain from a regular domain - so subpages of IGN.com or UGO wind up just increasing IGN or UGO's rank, and blogs hosted at X.BlogHost.Com just raise BlogHost.com's rank without being able to tell what the particular blog's rank might be.
Finally, the biggest flaw in Alexa's ranking system is that it's based on voluntary input; rather than finding 'Net users and trying to get a representative sample (which is the goal of the Nielsen TV setup), they take anyone who'll put in their toolbar. Sure, they can get a pretty large number of idiots to install the thing, but they're still idiots - there are demographics that the toolbar just won't get adopted by in that fashion.
The other sad thing is, there are companies that use Alexa's page rankings to decide how much they'll pay for ads. Go figure.
I use webalizer, cookies, and two stats packages for my CMS (Geeklog). One stats package is restricted to admin privileges and gives me very detailed, accurate info such as time, IP, which page was viewed, referrers, UID (user ID), links followed, country, browser, platform, etc. All open source. Does the job for me.
"If the facts don't fit the theory, change the facts." -Albert Einstein
Karma? There's a serial modder out there.
This is about +2 Funny, not 0 Offtopic. The problem is that the mods here are all little kiddies who have never seen such classic movies and have no idea what it is the post is about.
If you think Caddyshack is a "classic" then it's you who is probably a "little kiddie." It's a pretty funny film but certainly not a classic!
The CSM is essentially secular. See the 'about us' pages. Seems that the naming of the CSM was a rather unpopular move by the paper's creator, Mary Baker Eddy - the rest of the staff didn't seem to want to call it that, since it's not really Christian at all...
---
"I did nothing. I did absolutely nothing and it was everything that I thought it could be."
Way to pimp your website two times in one post. Maybe you could show a little modesty and put your advertisement into your sig so that we can avoid it? You already got a Ask Slashdot story. Isn't that enough? kthxbye
Most websites have no idea how many people view their content
We normally use our leaves to view content. Hope this helps the analysis.
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
Tracking unique visitors?
Not that hard if small margin of error is ok.
Charging for ads when you don't know how many page views you will get?
What about CPM (cost per 1k impression) rates? Want 10k impressions? Pay for 10k impressions.
Target demographics?
How about track what article topics are popular, how many return readers per topic, etc?
These are not that hard to do with the right people. The guy who writes the "techie column" in many cases is not the right person.
I guess if you think like a newspaper, you end up with these problems seeming impossible to figure out.
Have I lost my marbles, or is this really not that hard?
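The CPM arithmetic mentioned above is simple enough to write down. This is only a sketch: the function names are made up here and any rate passed in (for example $5 per thousand) is a hypothetical figure, not one from the article:

```python
def cpm_cost(impressions, cpm_rate):
    """Price of an ad buy: cpm_rate is the cost per 1,000 impressions."""
    return impressions / 1000 * cpm_rate


def impressions_for_budget(budget, cpm_rate):
    """How many impressions a given budget buys at a given CPM rate."""
    return int(budget / cpm_rate * 1000)
```

The site only has to deliver the number of impressions that were sold, so it does not need to predict its total audience in advance.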
-Pete
Soccer Goal Plans
Their ISP killed their account after 3 reported strikes.
Then there's em3.net, a scumware site that tried this last year. Following the links triggered attempted spyware downloads.
(If anyone is truly interested I have a partial list at http://idunno.org/misc/referralSpammers.aspx)
From the article: 'Most websites have no idea how many people view their content. This inherent fuzziness is causing problems for commercial websites, especially online publications desperate to make money from Internet advertising... How can you charge for ads when it's nearly impossible to tell advertisers how many people will see them?'
Well, websites can just do things to make up numbers. Dead tree publications do it all the time. Ever notice how the nation's most popular newspaper is probably so popular because almost every hotel room in the US has one at the hotel door in the morning (where it is most likely then placed in the trash)? I would bet that it's much easier to figure out how many people are actually reading what on a website vs any other medium.
> This inherent fuzziness is causing problems for commercial websites, especially online publications desperate to make money from Internet advertising... How can you charge for ads when it's nearly impossible to tell advertisers how many people will see them?'
Then use performance-based advertising, such as cj.com (or buyat.co.uk in the UK). You don't have to sell CPM (i.e. impressions) but instead get paid on results (e.g. sales generated). This solution has been around for a long, long time.
Read the article. They are complaining that one user may read the content from work and from home, and so count as two users. One might also point out that sometimes two people may use the same computer, and only count as one person.
My wife and I both read the same article/section in the newspaper we got yesterday, even though we only got a single paper. (We "logged" 1 impression even though 2 were made.)
I understand that is the opposite of what you suggest, so...
Not only that, but we had some sections delivered to us that we (gasp!) threw out without even reading even though we may have been part of the target demographic. (We "logged" 1 impression even though 0 were made.)
And the web is different how?
-Pete
Soccer Goal Plans
Alternatively, you could throw in a pop-up box that tricks the user into loading Bonzi Buddy, then count how many angry emails you get from users with Bonzi infest^W installations.
This is a very permanent solution, as after this you no longer have to worry about monitoring traffic to your website.
dinner: it's what's for beer
After careful review of our target audience, we have have begun work on our new bulk Prozac and Lithium banner ad campaign.
One time the summarizer displayed a search string that consisted solely of pornographic terms: "pussy", "fuck", and the like. I was pretty confused because my site is just an HTML guide. Turns out it found me because of the word "maypole"... I still have no idea what that means in a porn context.
I've found that my posts don't format quite right w/o a sig.
I find the value of web logs is more the relative growth of traffic, or from section to section. Since one can assume relatively the same degree of error each month (i.e. 2 users on the same computer, 1 user on 2 computers, etc.) you can gain a lot of information just by comparing logs over time. The same goes for section by section. If your web site has 5 distinct sections you can compare within them and then over time. Advertisers like to know absolute numbers, but if you can tell them that they'll get 2x if they advertise on a particular portion of your site and it's likely that section gets a certain type of visitor that is very valuable. In the least it gives you some solid direction about what your users want so you can build a better site, and eventually get more ad revenue from it.
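A small sketch of the month-over-month comparison described above, assuming you already have per-section hit totals for two months (the section names in any example input are made up):

```python
def relative_growth(previous, current):
    """Fractional change in traffic per section between two periods.

    previous, current: {section: hits} for the earlier and later period.
    Sections missing from either period, or with zero earlier traffic,
    are skipped because a ratio is undefined for them.
    """
    growth = {}
    for section, before in previous.items():
        if section in current and before > 0:
            growth[section] = (current[section] - before) / before
    return growth
```

If the counting error is roughly the same every month, it largely cancels out of these ratios, which is the point the post makes about relative versus absolute numbers.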
If you think Caddyshack is a "classic" then it's you who is probably a "little kiddie." It's a pretty funny film but certainly not a classic!
By actual poll of caddies Caddyshack is the best movie ever made about Caddies. That meets the definition of classic.
Caddyshack II was voted the worst movie about caddies.
Of course they are also the only two movies ever made about caddies, but we'll overlook that for now.
Personally I'm an aficionado of dancing gopher puppets, so it's Caddyshack all the way, the undisputed classic of the genre.
Everybody! "I'm alright. . . "
KFG
and they could call it metamoderation? Yeah, they should implement that.
My user number is prime. Is yours?
I'm not going to click on your banner. Nope. Not a chance. Not happening.
It's not that I'm not interested in your product. Online adverts I see actually tend to be:
1. Something unavailable to me (wrong country).
2. Something of no interest to me.
3. Something I own already (this happens a _lot_ with Gamespy).
But that's not the point. The point is, I'm at the web site because I'm looking for something, and it's probably not your product. When watching TV, I never watch an advert, and immediately decide to research/buy that product. At best I'll make a mental note to have a look out for information on it later, in most cases I won't think about it until I'm looking for that kind of product, at which point I'll probably remember your advert.
An example might be easier. I frequently see adverts for car insurance. I don't drive, for a variety of reasons, but if I was going to learn and buy a car, I'd probably start calling around the companies whose names I remembered from adverts. Well, actually I'd Google for a comparison site, but let's pretend I'm too lazy to do that, okay?
Oh, also, pop-ups/unders are a really good way of persuading me to avoid your company, your advertiser, and whatever site I got the pop-up/under from.
Who cares about demographics? We're trying to figure out what people's interests are, what types of ads they'll respond to.
Well, duh. If a visitor looks at the sports pages during work hours, you have a fair deal of information about that person already. Isn't that already enough to serve up ads that would likely be relevant?
If these dead-tree publishers of yesterday's news got a clue, they might also realize that web-ads are actionable, and actions can be counted. Do people click on the ads? Do they generate leads or sales? There's this interesting industry called affiliate marketing they should look into (my guess is they'd make good money off personals and job ads).
What they read, when they read it, and what ads they want to learn more about. WTF more do they need?
Information: "I want to be anthropomorphized"
For those who haven't figured it out already, the web is not an advertising medium. Yes, you can find people who will pay for advertising, but it's a peripheral and unimportant element of the service.
Hasn't the dot-com-bust taught us anything? Revenue models based on advertising are not going to work except for the rare few who have market share and a steady stream of gullible businesses that want to cheat and try to buy an audience instead of building one.
Anyone who needs to know how many people are on his/her site and their nature, will already know, and will already have things in place to measure and qualify this. The most obvious of which is sales of their products/services. Traffic reports are amusing but otherwise irrelevant unless you're in the business of selling traffic reports (like Nielsen - another bottom feeder that is providing a crutch to businesses in an effort to continue to perpetuate the myth that online advertising is worthwhile).
As noted, people who use quality browsers and have them set to reject cookies will be undercounted. However, this may provide a more accurate count of the number of people who look at ads. What percentage of Firefox users have set their browsers to block ads from DoubleClick?
1. Promise to quadruple the traffic to a company's site within 24 hours, in exchange for $$
2. Post story regarding stupidity of company's advertising model to slashdot, company's server is slashdotted
3. Profit!!!
This is all about earning the right to count your visitors.
There is NO WAY I am going to spend time giving up my privacy and demographic information if the site has not earned the right to waste my time.
When you walk into any store in the mall there is a small laser that is counting foot traffic. Each person or close-walking couple breaks the beam once to enter and once to exit. It isn't precise, but it is close enough, and further the store EARNED THE RIGHT to count visitors because there is a reward - viewing the merchandise. Plus, there is a very low cost (exposure to a low-powered laser).
Compare this to a website that would require you to fill out a form, presumably with valid info (the article mentions 90210 as the most popular zip code on the web), and THEN you get to see the content. No thanks. Potentially valuable content is not worth the bother.
Now if there was some technology that would allow you to store this reader profile and it would be transmitted when you visited a website without the need to fill out a form, I bet some people would use it.
But no one wants to give their driver's license to the GAP store clerk before entering, and there will never be a time when, no matter how valuable it would be for a web site owner, people provide valid, accurate data on who they are to view site content that has not earned the right to ask for that information.
I only came here to do two things; kick some ass, and drink some beer...looks like we're almost out of beer.
With a relatively compact bit of javascript embedded into a page, the user gets hopefully relevant ads that are not obtrusive or flashy, same as the Google Adwords text-only ads you see on the right side of the Google results pages. And you can customize the colors and format to suit your own pages. Google, while they do serve the ads based on your site's content, do allow you to prohibit certain keywords, so you can block out competitors' ads.
To make it useful to the host, Google allows you to create "channels", so within one AdSense account you can track different pages. You can get a detailed report of how many pageviews each channel generates, as well as click-thrus (which of course leave your site).
To sweeten the deal, you get paid for click thrus. That means you get paid when someone leaves your site, but my philosophy is that if they do that, they weren't planning on sticking around anyway, so I might as well profit from it.
In my case, my site generates about 3000 pageviews and 15 clickthrus, and that translates into about $1 a day in revenue. It's not much, but I roll that back into the Google AdWords campaigns that I run, which generate inbound traffic. I'd rather have people coming to my site that want to be here, than those that don't, so I see it as a fair trade.
And in the end, the reporting and tracking are handled by Google, and provide a tangible benefit to my business.
Oh, and if you want to see an example in operation, look at the very bottom of our site's main page.
--Brandon / Split Infinity Music
Troll... but I'll bite.
Mostly secular yes, but there is a daily article on Christian Science in the editorial section.
Also, the staff at the CSM was nervous about putting the church's name in the title of the paper because they thought people wouldn't take it seriously, would have bias against it.
Also, the CSM is owned by the Christian Science Church.
Nah - I wasn't trolling. All that information's in the link I posted. It seems as if they try to keep as objective as possible in their reporting, and Christianity doesn't seem to get a mention, so it doesn't matter who's writing the thing. It could be written by the Church of Satan for all it matters.
---
"I did nothing. I did absolutely nothing and it was everything that I thought it could be."
My methods are:
- The webalizer weblogs provided by my hosting provider. Disadvantage is that they mostly provide top-10s. So I get no data on the other pages. Also you count a lot of bots.
- Google adwords and other advertisers with tracking pixels (like CJ). Problem is that if you compare them they give widely different values for the same page.
- Nedstat. I like the referer information. But I find it too much work to give every page its own counter.
- My own counter. Basically a piece of javascript that says "myimage.src=mycounter.php?url=theurl". It is primitive but it is my own so that I can easily extend it.
Next I also occasionally look at the 404 errors that are generated on my site. Unfortunately nearly half of them come from Yahoo Slurp, which finds it necessary to check for URLs that haven't been on my site for nearly a year.
For those of you who don't know, it's from Caddyshack.
Then post a link. I saw that movie when it first ran (and still like it), but I didn't recognize the quote.
When I worked for a newspaper we did some research (by polling readers) and found out that each paper was read by 1.5 people (or some such figure). So that's what we told advertisers.
"Give a man a fish and he will ask for tartar sauce and French fries!"
With the high-end commercial packages (WebTrends, Omniture, CoreMetrics, and WebSideStory), you add a tagged link to each Web page, and the "outsourced" service does the rest. No logs to collect, no servers or databases to mess with. These services are targeted at the sponsors of the site, not so much the operators. Cost is $20K and up per year, based on page views. Some high-end sites are paying over $500K per year, a few over $1M. The retail sites you frequent get to know you: they can add demographics to the tags and do some very sophisticated click stream analysis. With these tools, you can find out what percentage of the visitors buy something after they have visited your privacy page. You just build the scenario you want to analyze, and a report is produced. They also produce funnel reports, and can compare conversion rates on two different sets of content (i.e., search bar on the top or bottom; red or green background, etc). These tools are not for everyone: typically you want to be making 20% of your revenue via a Web channel first. The user needs some skills in merchandising too, and a content staff that can make changes when the tools suggest one way is better than the other.
Since the above was written I discovered that a common practice of sysadmins and help desks is to suggest manually deleting all cookies (since you can't do it selectively with MS-IE) to get over site bugs. And now the increasingly popular spyware removal tools (e.g., Spybot) remove 3rd party cookies used just to count unique visitors, in the name of removing spyware and viruses from your computer.
Originally I thought of defining a visitor for HTTP domains as the cookie if it exists, and the client IP address otherwise. But the flaw in this is that it will double count first-time HTTP visitors: once for the log line of their first hit with no cookie, and again for the subsequent hit. With streaming logs, using the GUID (effectively a cookie these days) and the client IP address is more useful as a unique visitor. The log lines in streaming are actually the summary of a sequence of request/reply transactions and so the first "hit" log line does have a GUID/cookie logged.
What follows is additional research I turned up:
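The double counting described in this paragraph is easy to reproduce. A minimal sketch, assuming each log line has been reduced to an (IP, cookie) pair with `None` for a missing cookie (that shape is an assumption for illustration):

```python
def naive_unique_visitors(hits):
    """Count visitors as 'cookie if present, else client IP'.

    hits: iterable of (ip, cookie) pairs; cookie is None when the
    request carried no cookie.
    """
    keys = set()
    for ip, cookie in hits:
        if cookie:
            keys.add(("cookie", cookie))
        else:
            keys.add(("ip", ip))
    return len(keys)


# One first-time visitor: the first hit has no cookie yet, later hits do.
first_visit = [
    ("10.0.0.1", None),
    ("10.0.0.1", "abc123"),
    ("10.0.0.1", "abc123"),
]
```

Calling `naive_unique_visitors(first_visit)` returns 2 for what is a single visitor, because the cookieless first hit is keyed by IP and the later hits by cookie.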
says: `` A visitor is defined as "a unique IP address with heuristic." To properly account for visits, the Web site needs to identify a "visitor" so that visitor activity is properly tracked. Registration and/or cookies are the best way to track a visitor's activity through the Web site. Unfortunately, a lot of Web sites do not require registration, nor do they use cookies [and browsers can disable cookies]. If cookies are used, it is the clients' responsibility to provide the auditor with details on how the server sets the cookie, the cookie format and how the cookies are used. An alternative that has been suggested is to use the IP address AND user-agent in combination, to identify a unique visitor. The interaction with the site by this "visitor" is then analyzed to determine the number of visits which should be recorded. Using only the IP address to identify a visitor is not acceptable due to the number of visitors that may not be accurately reported because they are operating behind a proxy server or firewall. ''
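A rough sketch of the IP-plus-user-agent alternative from the quoted text. The quote does not say how visits are derived from a visitor's activity, so the 30-minute idle timeout below is purely an assumption, as is the input shape:

```python
from datetime import datetime, timedelta

# Assumption: a gap longer than this starts a new visit. The quoted
# definition does not specify the heuristic.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visitors_and_visits(hits):
    """Return (unique_visitors, visits).

    hits: iterable of (timestamp, ip, user_agent) tuples in any order.
    A visitor is identified by the (ip, user_agent) combination.
    """
    last_seen = {}
    visits = 0
    for timestamp, ip, user_agent in sorted(hits):
        key = (ip, user_agent)
        if key not in last_seen or timestamp - last_seen[key] > VISIT_TIMEOUT:
            visits += 1
        last_seen[key] = timestamp
    return len(last_seen), visits
```

Two people behind the same proxy with identical browsers still collapse into one visitor here, so this narrows the proxy problem without removing it.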
The article is slanted to sound like it's bad that accurate visitor data can't be had for a website. They fail to mention that compared to other forms of advertising, the Web is a gold mine of information.
Can a magazine tell you how many people saw a particular ad (without lying that is)? NO. Same for magazines, TV and even junk mail. They might have numbers that are reasonable as to how many people MIGHT see an ad, though take with salt. But how many people actually act on the ad? No sir. How many people blocked the ad? Ah, nope! One could get that from web server logs though.
It's all just the ad industry promoting the ad industry, ignoring the fact that it doesn't work.
Anything is possible given time and money.
If you're getting 3000 impressions and only 15 click thrus you're oversaturating your visitors with ads. The net result is that they see an ad they aren't interested in and then ignore the rest of the ads or are simply bothered by them to begin with.
You need to figure out which pages are generating the most impressions and fewest click thrus and pull the ads.
You should be getting at least a 1.0% clickthru rate. 0.5% is a big sign something isn't working. It could also be a sign you lack the content to get Google to effectively target ads at your audience.
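For reference, 15 click-thrus on 3000 impressions is 15 / 3000 = 0.5%. A sketch of that arithmetic, plus a way to list the pages worth pulling ads from; the per-page input shape and the use of the 1.0% figure as a default cutoff are assumptions made for illustration:

```python
def click_through_rate(clicks, impressions):
    """Fraction of ad impressions that were clicked."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def low_ctr_pages(page_stats, threshold=0.01):
    """List pages whose click-through rate is below the threshold.

    page_stats: {page: (impressions, clicks)}
    Returns (page, impressions, ctr) tuples, highest impressions first,
    so the pages showing the most ads for the fewest clicks come first.
    """
    flagged = [
        (page, impressions, click_through_rate(clicks, impressions))
        for page, (impressions, clicks) in page_stats.items()
        if impressions > 0 and click_through_rate(clicks, impressions) < threshold
    ]
    return sorted(flagged, key=lambda row: row[1], reverse=True)
```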
You can tell who your audience is by induction based on what kind of content you have. I have a lot of programming related content so my visitors are obviously coders. It's not an advanced site so my demographic is beginner to intermediate programmers which fall mainly in the early teens to mid twenties.
You can also get a good idea who your audience is by looking at the search terms they use to find your site in search engines.
Ben
Work Safe Porn
For my personal site, I don't care. With the online high school yearbooks for my alumni group, I'm looking for 404 errors from the hundreds of static HTML pages I hand-edited from the initial template, and getting a general idea whether the alums are using it. For the old man's specialty CD-ROM site, I'm just looking for a general idea of what parts of the nation/world the lookie-loos and orders are coming from.
Lately, I've also been using the log to see which spiders rate a disallow in robots.txt.
Luke, help me take this mask off
Plus, I can divide page-views by unique visitors and have a pretty good idea (along with looking at the numbers from certain directories) how long my visitors are staying and what they are doing. (Largely they are looking at upcoming events in the area.)
I also look at *when* people are coming and from *where*. Thus I can reliably say to my advertisers, "Well, two-thirds of our traffic is from .com hosts and shows up between 9-5 M-F, so those are largely coming into the site from their workplace." Since we are in an affluent area, this probably means that these are people with fairly large disposable income. *That* is exactly what they want to hear.
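For anyone who wants to try the same numbers, here is a rough sketch. It assumes you have already pulled (hostname, timestamp) pairs out of the log, that the hostnames are resolved (e.g. via reverse lookups), and that the timestamps use Apache's default layout. The times are taken as logged, in the server's timezone, not the visitor's. All names and sample hits are made up.

```python
from datetime import datetime

def parse_time(stamp):
    # Apache's default timestamp layout, e.g. "11/Oct/2004:10:15:00 -0700"
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

def is_workday_com_hit(host, stamp):
    """True for a hit from a .com host between 9:00 and 17:00, Monday to Friday."""
    when = parse_time(stamp)
    office_hours = when.weekday() < 5 and 9 <= when.hour < 17
    return host.endswith(".com") and office_hours

def summarize(hits):
    """hits: list of (host, timestamp) pairs, one per pageview."""
    total = len(hits)
    visitors = len({host for host, _ in hits})
    workday = sum(1 for host, stamp in hits if is_workday_com_hit(host, stamp))
    return {
        "pageviews_per_visitor": total / visitors,
        "workday_com_share": workday / total,
    }

hits = [
    ("pc1.example.com", "11/Oct/2004:10:15:00 -0700"),  # Monday morning
    ("pc1.example.com", "11/Oct/2004:11:02:00 -0700"),  # Monday morning
    ("pc2.example.com", "12/Oct/2004:14:30:00 -0700"),  # Tuesday afternoon
    ("dsl.example.net", "16/Oct/2004:20:00:00 -0700"),  # Saturday evening
]

print(summarize(hits))
```

Treating each hostname as one visitor is a simplification; a cookie or an IP plus user-agent pair would be a better key if you have it.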
If I am really interested, I can run a little program that does path-tracing through the site based on the raw logs, but I generally don't do that too often anymore.
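The path-tracing program does not have to be anything fancy. This is not my actual script, just a minimal sketch of the idea: group hits by visitor and list the pages in time order. It assumes the log has already been reduced to (visitor, timestamp, path) tuples, with the timestamp as seconds since the epoch; the sample data is invented.

```python
from collections import defaultdict

def trace_paths(hits):
    """hits: iterable of (visitor_id, timestamp, path) tuples.

    Return {visitor_id: [paths in the order they were requested]}.
    """
    by_visitor = defaultdict(list)
    for visitor, when, path in hits:
        by_visitor[visitor].append((when, path))
    # Sorting the (timestamp, path) pairs puts each visitor's hits in time order.
    return {
        visitor: [path for _, path in sorted(entries)]
        for visitor, entries in by_visitor.items()
    }

# Invented sample, deliberately out of order as merged logs sometimes are.
hits = [
    ("visitor-a", 1097500000, "/"),
    ("visitor-b", 1097500010, "/events/"),
    ("visitor-a", 1097500090, "/events/october.html"),
    ("visitor-a", 1097500030, "/events/"),
]

for visitor, paths in trace_paths(hits).items():
    print(visitor, " -> ".join(paths))
```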
Of course, I *only* see about 50,000 pageviews a month, so I don't run a big site, but I *can* talk fairly knowledgeably about my users without resorting to fancy-schmancy "network traffic analysis" or any of that other useless and expensive stuff.
DNA is a Turing machine. You, however, being dynamic and emergent, are not.
This is not a troll and I'm not trying to offend you, but this makes no sense to me at all. While you might think you want to know how many visitors, i.e. humans, you had visiting you, it really is quite meaningless because, as others have pointed out, some people read ads and others don't.
:-) - will give you the number of hits. All you need to do is filter out the spiders - though I can make an argument that leaves them in also - seeing as they are representative of eyeballs viewing the gathered data - like a google search.
A more meaningful statistic is how many times an ad was served, because at the other end of the served ad were some eyeballs. You could filter out spiders, whose behaviour is pretty simple to detect, and you'd have a number that actually meant something.
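A rough sketch of that count, assuming ads are served from a path prefix of their own (the `/ads/` prefix here is hypothetical) and that spiders identify themselves in the user-agent string. The marker list is a small illustrative sample, not a complete one, and a spider that lies about its user-agent will slip through.

```python
from collections import Counter

# Substrings that well-behaved crawlers put in their user-agent (illustrative list).
SPIDER_MARKERS = ("googlebot", "slurp", "msnbot", "crawler", "spider")

def is_spider(agent):
    agent = agent.lower()
    return any(marker in agent for marker in SPIDER_MARKERS)

def ad_serves(requests, ad_prefix="/ads/"):
    """requests: iterable of (path, user_agent) pairs.

    Return a Counter of ad path -> number of times served to non-spiders.
    """
    serves = Counter()
    for path, agent in requests:
        if path.startswith(ad_prefix) and not is_spider(agent):
            serves[path] += 1
    return serves

# Made-up requests.
requests = [
    ("/ads/banner1.gif", "Mozilla/4.0 (compatible; MSIE 6.0)"),
    ("/ads/banner1.gif", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("/index.html", "Mozilla/4.0 (compatible; MSIE 6.0)"),
    ("/ads/banner2.gif", "Mozilla/5.0 (X11; Linux) Firefox/1.0"),
]

print(ad_serves(requests))
```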
The article shows as an example a person viewing the same site from two places, at work and at home. They want to count that person as one. This makes no sense. The person is exposed to the content twice. Advertising is about repeating the message.
TV advertising is stabbing in the dark and tallying up imaginary numbers. Marketers make money from that imagination. The web is different. You can actually count the number of eyeballs that visited...
Your Apache Logs - I'm assuming you actually use a real server
Anyway. Hope this cleared up some things...
|>>?
I found it very helpful to do ad-hoc analysis
by piping the output of standard Unix commands into
a free software analyzer (Visitors, http://www.hping.org/visitors).
This program is a web log analyzer that can
process logs from standard input, so you are
free to use grep, sed, Perl scripts, or whatever
you want to change on the fly which
part of the log lines you analyze.
I found this mixed human/analyzer interaction
very helpful for understanding who my web audience is.
This is an old problem, and it's been solved by auditing.
If site owners and advertisers care about whether the traffic on sites is "real" in any way, then they're probably best off paying for an independent audit of the site's logs. Organisations like ABC//e, for example, here in the UK will do it, as will various others around the world. All use very similar methods and definitions (in fact they collaborate to define standard ways to audit metrics).
Sure, it's not perfect, any more than ABC's magazine circulation and readership audits are perfect, but that's not the point. It's the fact that the *same* measurement standards are applied to all that counts. So it's a level playing field (almost) and makes all these nit-picky arguments about cookies and stuff pretty much pointless.
"And the meaning of words; when they cease to function; when will it start worrying you?"