Delete Cookies, Inflate Net Traffic Estimates
eldavojohn writes "In my browser, I regularly go to the tools menu and clear my private data. This includes my cookies. As a result, people like me who destroy cookies by the thousands may be inflating estimates of Web traffic by up to 150 percent. People have good reasons for clearing out cookies — we've heard about bad cookies before (and I think the FCC is still investigating the issue). But every time you delete cookies, many of the sites you've visited count you as a new visitor next time."
...you could be like me--I block all cookies from all sites until I've added them to my whitelist.
News at 11 -- Water still wet.
I hadn't thought about counting it this way until this article appeared but, now that it's said, I'm not surprised. It doesn't matter what the consumer does. The business analysts will always find a way to spin it for their profit. Initially the business analysts thought that this would be a perfect way to track all of the visitors. When some of the visitors decided they didn't want to be tracked, the business analysts decided that, well, maybe tracking them (in that particular way) wasn't the important metric for the shareholders to see. The more important number, obviously, is how many discrete visitors they have.
Brilliant.
If the primary concern is for unique visitor tallies for traffic-based advertising, wouldn't web sites be affected (mostly) across the board? If all web traffic is artificially inflated close to the same amount, then this becomes a non-issue.
That assumes an awful lot of people do that.
I don't do it because it is a pain to constantly log back in everywhere. But I seriously doubt more than 2% of the non-slashdot crowd does it.
The FCC has little reason to investigate cookies.
Whether I delete cookies, permit them, or leave them on is all my business. I am under no obligation to provide web site operators a reliable count of how many unique visitors they get. They should stop complaining and develop better ways to count unique visitors. If they can't, it is still not my problem.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Cookiesafe allows me to keep my permanent cookies to a minimum, yet allows me all the functionality of session cookies. Of course, it does inflate the stats as the article mentions. In my previous job I worked with stats quite a bit (using WebSideStory/Hitbox), and it is such an inexact science that it ranks right up there with Lies and Damn Lies.
https://addons.mozilla.org/en-US/firefox/addon/2497
Anyone have other suggested software they prefer?
...though it may be to some people.
Anonymous user stats are always going to be an estimate. Cookies aren't reliable, because people clear them. IP addresses aren't reliable, because some are dynamically generated, some are shared, and people move around.
You can only really know how many users you have if (a) they're registered and (b) they visit the site while logged in. (And even then, people could be sharing accounts -- bugmenot, anyone?)
Personally, I don't think this is a problem, as long as you're willing to look at the estimates for what they are and not treat them as if they were precise.
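To put a rough number on the cookie side of this, here is a minimal sketch of how a naive cookie-based counter overcounts. Everything in it is made up for illustration: the `count_unique_by_cookie` function, the visit log, and the names are assumptions, not how any real stats package works.

```python
import uuid

def count_unique_by_cookie(visits):
    """Count "unique visitors" the way a naive cookie-based tracker would.

    visits: iterable of (person, cleared_cookies) tuples in time order.
    cleared_cookies is True when the person wiped cookies before the visit.
    """
    cookie_jar = {}       # person -> cookie currently stored in their browser
    seen_cookies = set()  # every cookie ID the server has ever seen
    for person, cleared_cookies in visits:
        if cleared_cookies:
            cookie_jar.pop(person, None)
        if person not in cookie_jar:
            # No cookie came back, so the server issues a fresh ID.
            cookie_jar[person] = uuid.uuid4().hex
        seen_cookies.add(cookie_jar[person])
    return len(seen_cookies)

# Hypothetical log: alice keeps her cookies, bob clears his twice.
visits = [
    ("alice", False), ("bob", False),
    ("alice", False),
    ("bob", True),
    ("bob", True),
]
reported = count_unique_by_cookie(visits)
actual = len({person for person, _ in visits})
print(reported, actual)  # 4 2
```

Two real people show up as four "unique visitors", and the server has no way to tell from the cookies alone.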
Hmm... how long before someone claims that Firefox's/Opera's/Safari's stats are inflated because they make it easier to wipe cookies than IE?
I think this is very wrong. Who counts the number of cookies as bandwidth? Bandwidth is measured at the routers; if it's not, then don't read too much into bandwidth estimates, as they're nothing more than a wet finger in the air.
Why UNIX?
But every time you delete cookies, many of the sites you've visited count you as a new visitor next time.
I have Firefox clear my cookies on browser close... So I look like a new visitor every time I visit a site.
Perhaps someone would explain to me why I should care about this? The only use I can see for unique visitor counts (other than the trivia value) involves ad revenue - And I aggressively block almost all adverts, so don't care about that, either.
Unfortunately IP address doesn't work. NAT can put anywhere from a couple (small home network) to thousands (corporate networks) of individual machines behind a single IP address. The common ISP practice of using dynamic addresses can result in a single machine having anywhere from one address for years at a time to a different address every hour. Most web-statistics companies have abandoned IP addresses as a valid identifier.
Most of them do in fact rely on cookies of one sort or another. Most rely on browser cookies; a few are using Flash or media-player cookies. All of them suffer from the fact that cookie deletion or filtering in the browser corrupts the statistics. Complete blocking of cookies is the easiest form to deal with: the server-side code can check whether cookies were in fact set and simply discard data from browsers that don't accept cookies. Cookie deletion, or forcing cookies to have session lifetimes, is harder to deal with, since to the server it looks like the cookies are good but in reality they can't provide information about visitors, only sessions. The worst are one-shot cookies, where the browser will let a new cookie be set but then won't permit it to be modified or removed. The big problem with them is that any test will overlap to some degree with normal cookie behavior, so you end up having to balance how much corruption you're getting relative to how much good data you're throwing out by mistake.
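For the easy case (browsers that block cookies outright), the check can be sketched roughly like this. The hit format, the `issued` set, and the `visitors_with_working_cookies` name are all hypothetical; a real collector would work from its own logs.

```python
def visitors_with_working_cookies(hits, issued):
    """Return the cookie IDs that a browser actually sent back.

    hits:   list of dicts with a "cookie" key (str, or None if none was sent).
    issued: set of cookie IDs the server handed out.
    A cookie that was issued and later returned proves the browser accepts
    cookies; hits that never return one are discarded.
    """
    confirmed = set()
    for hit in hits:
        cookie = hit.get("cookie")
        if cookie is not None and cookie in issued:
            confirmed.add(cookie)
    return confirmed

# Hypothetical traffic: one normal browser, one that blocks cookies.
issued = {"c1", "c2", "c3"}
hits = [
    {"path": "/",  "cookie": None},      # first hit, server issues c1
    {"path": "/a", "cookie": "c1"},      # c1 comes back: cookies work
    {"path": "/",  "cookie": None},      # blocker, server issues c2
    {"path": "/a", "cookie": None},      # c2 never returned, server issues c3
    {"path": "/b", "cookie": "forged"},  # never issued by us, ignored
]
print(visitors_with_working_cookies(hits, issued))  # {'c1'}
```

Note that this simplification also throws out anyone who loads a single page and leaves, since their cookie never gets a chance to come back. It does nothing about deleted or session-only cookies, which look perfectly healthy from the server side.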
Most web-statistics firms are working to avoid the worst of the problems by moving their machines into the DNS namespace of the sites they're collecting statistics on. That helps get around third-party cookie behavior in browsers, and should work until browsers either start having extensive host-specific block lists or start allowing cookie filtering based on IP address instead of URL hostname.
I always considered the intricacies an interesting puzzle, and wringing every bit of validity possible out of the system a challenge. Management, unfortunately, doesn't want to hear about the intricacies; they just want to hear that there are no problems, everything's fine and the numbers they're giving their customers are perfect. Customers, even more unfortunately, don't want to hear about any problems; they just want to hear that the numbers they're getting are perfect. Sooner or later the cluebat will get applied.
...god kills a kitten!
I'm sure there are lots of reasons for doing it, but most bulletin boards that require registration in order to read, at least in my experience, do it in order to limit traffic, not count it. It's a way of keeping costs down, albeit at the expense of making the board less useful as a resource to the general public.
Unfortunately the best board relating to Knoppmyth is like this; it was just too expensive for the maintainer to run openly; the traffic cost too much. By requiring registration to read, it cut down on traffic enough to make it affordable. Given the choice between a register-before-reading board and no board at all, I think the public is best served by the former.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I use a PC at work.
And another one at home, well even two sometimes.
And a smart phone equipped with a browser.
So I inflate web usage statistics by 100 to 300%?
And then there are people sharing the same PC/account deflating the stats...
All of us who host websites know how unreliable statistics are. Nothing new there...
X.
Oh boo hoo, cry me a river. Produce something people want and they'll come back time and again and you won't have to worry about your traffic.
We will bankrupt ourselves in the vain search for absolute security. -- Dwight D. Eisenhower
As soon as you log on to a site connected with certain advertisers, your brand-new "not you" unique cookie is linked back to your old account through backend calls between advertisers and accounts. Yeah, a small percentage is wrong because of people using others' computers, but that's better than counting everyone who deletes cookies as a new customer again. Yeah, a lot of random sites you'll probably never visit again can't tell you from one visit to the next, but others learn who you are from your cookie linked to their advertiser, and as soon as you log in to any site that uses the same advertiser, you're linked up again. Some sites do it retroactively. Of course, if you want privacy, better than a cookie blocker is actually Adblock and the filterset.g updater. Those give you more privacy than deleting your cookies. But yes, it's possible to track you past the cookies.
There are a few fingerprinting companies out there that track you by the stuff plugins give away (dates, versions, etc.; anything the plugin will give up). I've even heard of a company using your computer's clock offset as reported by your web browser (which passes the time back in milliseconds since 1970, IIRC); combined with some other methods, it really helps track people down. Not to mention you can combine all this with your IP address and you're pretty good. But deleting cookies doesn't really help you. It's more of a minor inconvenience to the small companies who don't really care to track you that much, and a tiny hurdle to larger companies who do care, who are already doing it, and some of whom even know you before the cookie. (Don't accept cookies? Check for that, plus IP address, Flash version, time offset (if it's possible), and what plugins are installed via navigator.plugins, and you're pretty close to a positive ID.) Of course there are many other ways and I don't know any of them. So, delete your cookies if you want, but realize it's not much of a help.
Adblock is, and ultimately those who really want to track you probably can.
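To make the idea concrete, here is a minimal sketch of combining a handful of browser-revealed attributes into one identifier. The attribute names and values are invented for illustration, and hashing them with SHA-256 is my assumption about how one might do it, not a description of what any fingerprinting company actually does.

```python
import hashlib

def fingerprint(attributes):
    """Hash a dict of browser-revealed attributes into a short identifier.

    Keys are sorted so the same attributes always give the same result,
    regardless of the order they were collected in.
    """
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values a page could collect without setting any cookie.
visit_monday = {
    "ip": "203.0.113.7",
    "flash_version": "9.0.28",
    "clock_offset_ms": -4312,
    "plugins": "QuickTime,Flash,Java",
}
# Same browser a day later, cookies deleted, attributes gathered in another order.
visit_tuesday = {
    "plugins": "QuickTime,Flash,Java",
    "clock_offset_ms": -4312,
    "flash_version": "9.0.28",
    "ip": "203.0.113.7",
}
other_machine = dict(visit_monday, clock_offset_ms=1250)

print(fingerprint(visit_monday) == fingerprint(visit_tuesday))  # True
print(fingerprint(visit_monday) == fingerprint(other_machine))  # False
```

An exact hash like this breaks as soon as one attribute changes (a new IP, a plugin update), so anything serious would presumably need fuzzier matching.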
I simply don't care, and can't fathom why I should care. It is not, never has been, and never will be my responsibility to ensure the accuracy of statistical reports on sites that I visit. What data is stored on my personal computer is my business, and nobody else's. Is there seriously anybody who thinks that this is actual news? Are there seriously people who are able to get funding for such intuitively obvious research? Where do I get my cut?
Apache guy, Open Source enthusiast, runner
What you say there is absolutely correct, but it begs the question: How would it ever be the fault of the user in any possible case? I have a newsflash for the advertisers -- you do not have a God-given inalienable right to store data on my computer. It's mine, I paid for it, and I will selectively accept or freely remove any data that you attempt to place on it, for any reason or for no reason at all. The world does not owe anyone a reliable way to track the Web surfing of others.
This and DRM are two categories where marketers act like my personal property is theirs to do with as they please, and I'm sick of the way the average "consumer" puts up with this concept or anything resembling it.
Any Web site owner who doesn't like this can feel free to block me from their Web site; since it is theirs after all, I certainly do not dispute their right to do that (they would do so to find that I can live quite well without them). But please, let's dispose of this idea that some marketer not being able to track me is somehow my fault or my problem.
I say that if your business model relies on the ability to effectively spy on people, often without their knowledge or consent, then your business model is flawed and any difficulties you encounter are well-earned. I further say that the current situation exists only because of widespread ignorance; that is, if every single person who ever went online were a thoroughly educated uber-geek and fully aware of all tracking techniques used, then no one or practically no one would ever allow any of it and the marketers would have to come up with a more reasonable way to make money.
It is a miracle that curiosity survives formal education. - Einstein
A single TCP connection is identified by a quad: the IP address and port of each of the two endpoints.
So, you only really need a new source-port for every internal user who visits the same site.
NAT is implemented by maintaining an internal table of what external ips/ports should be mapped to which internal ip/port. An example:
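One way to picture such a table, as a rough sketch rather than how any particular router does it: the `NatTable` class, the port range, and all the addresses here are made up, and entry timeouts are left out entirely.

```python
class NatTable:
    """Toy port-translating NAT table for one external IP (no timeouts)."""

    def __init__(self, external_ip, first_port=1024, last_port=65535):
        self.external_ip = external_ip
        self.first_port = first_port
        self.last_port = last_port
        self.next_port = {}  # (dst_ip, dst_port) -> next free external port
        self.outbound = {}   # (int_ip, int_port, dst_ip, dst_port) -> external port
        self.inbound = {}    # (external port, dst_ip, dst_port) -> (int_ip, int_port)

    def translate_out(self, internal_ip, internal_port, dst_ip, dst_port):
        """Map an internal connection to (external_ip, external_port)."""
        key = (internal_ip, internal_port, dst_ip, dst_port)
        if key not in self.outbound:
            dst = (dst_ip, dst_port)
            port = self.next_port.get(dst, self.first_port)
            if port > self.last_port:
                raise RuntimeError("no free external ports for this destination")
            self.next_port[dst] = port + 1
            self.outbound[key] = port
            self.inbound[(port, dst_ip, dst_port)] = (internal_ip, internal_port)
        return (self.external_ip, self.outbound[key])

    def translate_in(self, external_port, src_ip, src_port):
        """Find which internal host a reply belongs to, or None."""
        return self.inbound.get((external_port, src_ip, src_port))

nat = NatTable("198.51.100.1")
a = nat.translate_out("10.0.0.2", 40000, "203.0.113.80", 80)
b = nat.translate_out("10.0.0.3", 40000, "203.0.113.80", 80)
c = nat.translate_out("10.0.0.2", 40001, "203.0.113.81", 80)
print(a, b, c)
print(nat.translate_in(1025, "203.0.113.80", 80))  # ('10.0.0.3', 40000)
```

Ports are handed out per destination, so the same external port can be reused for a different website. That is why the limit only bites when too many users hit the same site at once.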
Practical result?
You can use a single external IP for a group of websurfers. The size of the group has a limit: you run into trouble the moment more than 65000 of your internal users want to visit the same website simultaneously, with "simultaneously" defined as within the timeout of the NAT table (typically 1-5 minutes).
At least a million websurfers can easily hide behind a single IP using this technique. 10 million if they're not hugely active, or if they don't visit the same sites all the time. Not that there's any reason to. IPs aren't *that* hard to come by.
You could increase this by another order of magnitude or two by also taking sequence numbers into the NAT tables. Two different users connecting to the same service at the same time are likely to get sequence numbers different enough that the two connections can be told apart based on this. This ain't really a good idea though, because if you did this, you could get unlucky and have two connections accidentally get sequence numbers close to one another.
Besides, you don't really have a *reason* for hiding a billion websurfers behind a single IP, now do you?
Dear Anonymous Coward,
So, you're the little bastard who keeps forwarding me that crap.
This year... no presents for you!!!
Sincerely,
Santa H. Claus
santa@northpole.net
I am defenseless. Use your button. Mod me down with all of your hatred.
I'm fully aware of the tracking techniques used.. and I don't delete my cookies. I'm an anonymous number to them.
I bet you go shopping in a ski mask too, because every store video tapes you.
I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.