Cookieless Web Tracking Using HTTP's ETag
An anonymous reader writes "There is a growing interest in who tracks us, and many folks are restricting the use of web cookies and Flash to cut down how advertisers (and others) can track them. Those things are fine as far as they go, but some sites are using the ETag header as an identifier: Attentive readers might have noticed already how you can use this to track people: the browser sends the information back to the server that it previously received (the ETag). That sounds an awful lot like cookies, doesn't it? The server can simply give each browser a unique ETag, and when they connect again it can look it up in its database. Neither JavaScript, nor any other plugin, has to be enabled for this to work either, and changing your IP is useless as well. The only usable workaround seems to be clearing one's cache, or using private browsing with HTTPS on sites where you don't want to be tracked. The Firefox add-on SecretAgent also does ETag overwriting."
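To make the mechanism in the summary concrete, here is a minimal sketch of the server side. It is not taken from any real tracker: the function name `handle_request` and the in-memory `visitors` dict (standing in for "its database") are assumptions for illustration only.

```python
import uuid


def handle_request(request_headers, visitors):
    """Handle one request for a tracked resource (for example, a tiny image).

    `visitors` is a hypothetical in-memory stand-in for the server's database:
    it maps an ETag value to the number of times that browser has been seen.
    Returns (status, response_headers, visitor_tag).
    """
    tag = request_headers.get("If-None-Match")
    if tag in visitors:
        # The browser echoed back the tag we gave it earlier: same visitor.
        visitors[tag] += 1
        return 304, {"ETag": tag}, tag

    # No known tag: hand out a fresh, unique one. ETag values are quoted strings.
    tag = '"' + uuid.uuid4().hex + '"'
    visitors[tag] = 1
    # Ask the browser to revalidate on every use so that the tag keeps coming back.
    headers = {"ETag": tag, "Cache-Control": "private, max-age=0, must-revalidate"}
    return 200, headers, tag
```

The browser does the rest on its own: as long as the cached copy survives, every revalidation carries `If-None-Match` with the tag, which is why clearing the cache breaks the link.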
Here we come. :-)
Add this feature to a chaff-creating plugin, to crapflood servers with fake tags.
"Flyin' in just a sweet place,
Never been known to fail..."
Changes were made in the past few years to make it much more difficult to clear the cache frequently and easily.
You must jump through various menus and dropdowns. The team argued that this was progress, and it helped prevent inadvertent cache clearing. Their argument was very weak.
It forces me to hassle with yet another plugin to make my very frequent cache clearing quicker. But at least it is now an icon on the toolbar, with no prompting.
Did they just invent ETag or what? This "feature" has been known for a few years and there are existing implementations, including this one: http://samy.pl/evercookie/ from 2010.
Tracking information is worth billions of dollars. With that much money on the line - we'll be tracked like escaped inmates - one way or another.
The addon's homepage appears to be this:
https://www.dephormation.org.uk/?page=81
On all of our PCs, Opera and Firefox are set to clear their caches and delete all cookies etc. every time they exit.
Also, I occasionally clear all private data while browsing in Opera, including the cache, cookies, history, and so forth (passwords are never saved by the browser). Obviously, I have to log in again the next time I visit slashdot.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
The RequestPolicy add-on should handle this too. RequestPolicy blocks cross-site references by default and lets you whitelist individual cases. If you don't even talk to the tracker websites then they can't track you.
If the main website you access tracks you via etags the risk is limited to tracking your actions on that website which you'd have problems avoiding anyway since they can track you via ip address or if you have an account on that website.
When information is power, privacy is freedom.
I always imagine the webserver as having an internal conversation that goes sort of like this...
You might think at this point that companies and advertisers start getting the message. Instead, they just keep finding more and sleazier ways. All these technologies have valid uses but have been so abused by corporations and marketing that people increasingly don't trust it anywhere. It just further antagonizes the very people they are trying to connect with. And then they wonder why they lose the respect and trust of their customers, resulting in an ever-more aggressive relationship between the two.
Some days I dream about what the Internet might have been like had Canter and Siegel been definitively smacked down back in '94, setting an inviolable precedent that the 'Net was not a platform welcoming /any/ advertising. What repercussions might that have had on the world as a whole?
The ETag method is a clever solution to cookieless tracking. I find this method I stumbled upon a couple of weeks ago a bit startling. I had no idea the amount of information routinely sent from my browser/computer to web servers-- information about plug-ins, time zone, screen resolution, accepted headers, etc WITHOUT letting me know. It is enough to give more than 21 bits of identifying information and uniquely identifies me among the 3M visits.
https://panopticlick.eff.org/
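A rough way to see where a figure like "21 bits" comes from, assuming the 3M visitor count quoted above: singling out one browser among N needs about log2(N) bits, and each attribute contributes -log2(p) bits when a fraction p of visitors share your value. The function names and the example fractions below are made up for illustration; they are not Panopticlick's code or data.

```python
import math


def bits_to_single_out(population):
    """Bits of identifying information needed to pick one browser out of `population`."""
    return math.log2(population)


def surprisal_bits(fraction):
    """Bits revealed by an attribute value that `fraction` of all visitors share."""
    return -math.log2(fraction)


# Assumption: roughly 3,000,000 visitors, as in the comment above.
needed = bits_to_single_out(3_000_000)  # between 21 and 22 bits

# Hypothetical attribute frequencies; bits from independent attributes add up.
example = surprisal_bits(1 / 16) + surprisal_bits(1 / 1024)  # 4 + 10 bits
```

Real attributes are correlated, so simply adding the bits overestimates the total; treat this as back-of-the-envelope arithmetic only.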
Want to get back at the folks tracking? Blocking or changing the communications with things like Ghostery or SecretAgent is great. However, if there was software that connected to the tracking servers but never completed the TCP connection, thus leaving the tracker with a bunch of half open TCP connections, then one could effectively DDoS the trackers. There are several other techniques along these lines that can be employed. What good is a tracking system that is clogged up with connections that never complete or fail in various unfriendly ways?
Captcha: capacity
OK, then please tell me how host files can at the same time stop third-party requests to a site (like embedded YouTube videos, or Facebook like buttons) and at the same time allow explicit access of the very same site (that is, when you explicitly go to Youtube or Facebook).
With RequestPolicy that's trivial (indeed, it's the default, you don't even need to know the third party site to be sure that it is blocked, let alone explicitly deny it).
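The difference can be sketched in a few lines. This is not RequestPolicy's actual code, just an illustration of the idea under a simplifying assumption: the "site" is taken to be the last two labels of the host name, which is wrong for suffixes like co.uk. The names `site_of` and `allow_request` are invented for the example.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive 'site' of a URL: the last two labels of its host name (assumption)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def allow_request(page_url, request_url, whitelist=frozenset()):
    """Allow same-site requests; allow cross-site ones only if whitelisted.

    `whitelist` holds (origin_site, destination_site) pairs the user approved.
    """
    origin, dest = site_of(page_url), site_of(request_url)
    return origin == dest or (origin, dest) in whitelist
```

A hosts file only ever sees the destination name, so it has to answer the same way whether youtube.com is the page you opened or something embedded in another page. The check above also looks at where the request came from, which is what lets it block the embed while leaving direct visits alone.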
The Tao of math: The numbers you can count are not the real numbers.
It also seems to leak info between regular windows and incognito mode in chromium. I assume the cache is shared between the modes, and they need separate caches.
My browser passed because of the way I start it. A whole new user/home environment is dynamically created every time I start a browser. I originally did this so that as I browse hundreds of sites, I don't end up with extreme memory waste. This was done back in an older version that was quite memory leaky. It would build up too much in-process memory as I visited sites, and eventually crash. So I ended up with multiple browsers running (separate processes). At first that might seem to have used even more memory. But that was at the OS level where I did have more, including swap space. And it was at least finite, since when I left some website, its browser actually exited, rather than just unlinking fragmented virtual pages. These days I keep the setup mostly because of the tracking breakage it creates. I can still be tracked within a site like Slashdot. Slashdot knows what articles I read and what articles I ignore. Slashdot knows what I post. But I am logged in, so "duh". No, it's not perfect at all, as the Slashdot advertisers can see my repeat appearances, too. But at least they can't so easily figure out what other sites I visit, besides the IP address (which I plan to work on some day).
now we need to go OSS in diesel cars
Session cookies.
The Tao of math: The numbers you can count are not the real numbers.
It's not the loading of the HTML file which is avoided with ETags, but the loading of the image. Basically, if the image today is still the same as the image last week, and the image from last week is still in the cache, then it makes sense not to load the image again.
The Tao of math: The numbers you can count are not the real numbers.
You can't correlate access across multiple URLs, since every URL has a different ETag.
I know, replying to APK about magical hosts files is pointless, but here we go anyway:
Can you answer these two questions:
How many domains and subdomains does Facebook operate?
Please make sure to include those added in the last 4 hours!
Can you enumerate every domain used to host advertising and/or malware on the planet?
Please make sure to account for dynamically changing and the infinite number of wildcard domains!
If you cannot give me exact answers, then your hosts file method is useless and obsolete. Please wake up and stop peddling your crap here.
"What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
ETags can be used to track unique users,[2] as HTTP cookies are increasingly deleted by privacy-aware users. In July 2011, Ashkan Soltani and a team of researchers at UC Berkeley reported that a number of websites, including Hulu.com, were using ETags for tracking purposes.[3] Hulu and KISSmetrics have both ceased "respawning" as of 29 July 2011,[4] as KISSmetrics and over 20 of its clients are facing a class-action lawsuit over the use of "undeletable" tracking cookies partially involving the use of ETags.
systemd is Roko's Basilisk.
I'm not speaking about ads served on YouTube (actually I didn't remember that those actually exist, having seen none for quite some time). I'm speaking about YouTube embedded videos on non-YouTube sites.
I know for sure that those embedded videos, when they come from YouTube, are loaded directly from youtube.com (or one of the other common YouTube domains, like youtube.de). I know that because every now and then, I decide that I want to see the embedded video, and decide that it's worth more for me than the tiny bit of possible tracking that involves (since most accesses are blocked, I don't think that the little tracking data from the very few videos on third party sites I actually watch is too valuable anyway). BTW, that's another thing you cannot do with host files: Easy temporary enabling.
As far as ads on YouTube go, if they were served from the same server as YouTube itself, RequestPolicy wouldn't work either (but then, I've also got AdBlock Plus installed, as a second line of defense). It's not only ads which cause cross-site requests (and thus potential information leaks).
No, the proof of the pudding is in the eating. But since you were speaking about a whole different pudding than I did, your conclusion doesn't hold.
BTW, I notice the absence of an argument about Facebook (whose "like" buttons are the far more important tracking mechanism anyway).
And the slowdown by RequestPolicy is certainly not noticeable (and my computer is over seven years old; I wonder how old yours must be to notice the difference). Nor should it be; after all, it's just a comparison of short strings.
The Tao of math: The numbers you can count are not the real numbers.
When I open a new web page I would like the newest version, but I don't think much time is saved by having the web server generate the page, calculate a checksum or whatever (I mean for pages not using ETags for tracking...), compare it to the ETag the browser sent, and then, if they are equal, reply that they are equal, instead of just sending the page it generated! It is just an HTML file; it shouldn't be that many kB.
Well, first off, it's not "just an html file", because ETags also apply to the images. So once the html is downloaded, do you want it fetching multiple MB-scale images (in the case of, e.g. a photo gallery) from scratch even though you've got a cached copy? (No.) Do you want it using the cached images regardless of whether the images have been changed? (No.) So you need to use one of four schemes:
1. TTL-based. If the server knows when the new image will be modified, or knows some acceptable time that things can lag, it could state a TTL when you first download the image. Your browser keeps the TTL in cache with the image, and next time you load that image, if TTL has expired, you fetch a fresh copy; if not, use the cached copy. Done with Expires: or Cache-Control: max-age.
2. client-timestamp-based. The client provides a timestamp of when their cached image was retrieved, and makes a request using If-Modified-Since: header; the server makes the determination whether that version's the same or not, and responds appropriately.
3. server-timestamp-based. The server provides a Last-Modified: timestamp, the client uses this (instead of the last retrieval) when making the If-Modified-Since: request and the server determines if it's changed since then and responds appropriately.
4. server-tag-based. The server assigns an ETag: tag to the image, which is cached along with the image. When requesting a cached image, the client includes this tag in an If-None-Match: header, the web server compares the tag to the current version's tag, and responds appropriately.
From a functionality perspective, 1. is horrible for anything not updated on a strict schedule (e.g. at the top of each hour) -- you end up reloading a bunch of stuff that hasn't changed because the TTL has to be set short. 2. is almost perfect if you're honest, but not very good if the client lies for better privacy. 3. is similarly almost perfect. 4. is perfect, slightly edging out 2. or 3. in the practically-rare case where there's a change followed by a reversion, and your cache holds the old version (which now matches the current version again). 4. will correctly skip the download while 2. will reload needlessly. (Actually, 2. or 3. can work around this, at the expense of the server maintaining a log of checksums at every change, but this breaks things even further for the dishonest client.) Additionally, 4. removes the requirement for a coherent clock on the server, which might matter in embedded web servers.
From a privacy perspective, 1. is pretty good. 2. leaks information about when you last visited, but the client can lie (basically, reduce the granularity, rounding to the previous hour or day) to increase collisions. 3. is of course bad for privacy as the server can give you a false Last-Modified:, but if you trust the server to be honest, it is good because the granularity is automatically reduced as far as possible, but no further -- if the data goes unchanged for 3 months, the web server can only tell you accessed it in those three months, but if it's changed multiple times in 1 hour, you will only download it when you need a new version -- whereas the lying-client version of 2. will redownload it every time if it's been changed since the last rounded time. 4. is likewise bad for privacy, and should only be used with servers you trust not to use any user- or session-specific information in generating the tags (i.e. tag=f(content) only). If the tag depends solely on the content, though, it's better than 2. for the same reason and in the same way 3. is.
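For scheme 4 done honestly (tag=f(content) only), a minimal sketch might look like the following. The helper names `content_etag` and `respond` are invented for illustration, and truncating a SHA-256 digest to 16 hex characters is an arbitrary choice, not something any particular server does.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Derive the tag from the content alone, so every client sees the same value."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict):
    """Answer a (possibly conditional) GET for `body`.

    Returns (status, response_headers, payload). On a match the payload is
    empty and the client keeps using its cached copy.
    """
    tag = content_etag(body)
    if request_headers.get("If-None-Match") == tag:
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, body
```

Because the tag depends only on the bytes, the change-then-revert case mentioned above comes out right: a client holding the old version gets a 304 once the content matches again. The client cannot verify that a server really computes its tags this way, which is the trust problem described above.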
Wikipedia, for a start (whenever you upload a new version of the image).
Also, the image may be dynamically generated from changing data, say stock charts, or captured from a web cam.
The Tao of math: The numbers you can count are not the real numbers.
No need to wait for a TCP connection to time out. As soon as the page has finished loading, all connections are closed. HTTP is a stateless protocol; just because you have a web page open in front of you doesn't mean there's any connection to the server right now.
If you're not using cookies, you can use query strings to track state. For every link on the page, you add a query string to the URL containing a session ID number, so when the user clicks any link, the session ID is passed in the query string. But that looks ugly, so you should just use cookies.
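A small sketch of the query-string approach, using only the standard library. The parameter name `sid` and the function name `add_session_id` are arbitrary choices for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url, session_id, param="sid"):
    """Return `url` with the session ID carried in its query string.

    Any existing value of `param` is replaced; other parameters and the
    fragment are kept as they are.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The server would run every link on the page through something like this before sending the HTML, and read the parameter back on each request.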
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
FTP is a stupid protocol and needs to die. Please use something else (such as SFTP).
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Vodafone makes tracking of users possible which does not require access to the user's equipment. The HTTP request is enriched with a piece of identifying information. This involves an HTTP header called X-VF-ACR: 'Vodafone Anonymous Customer Recognition.'
See also: http://referaat.cs.utwente.nl/conference/16/paper/7306/using-browser-properties-for-fingerprinting-purposes.pdf (pdf)
Simply not allowing 3rd party URLs on any website. Sure it might break some ancient things but you shouldn't really be including iframes, cookies, JavaScript or anything else from a 3rd party domain anyway.
Custom electronics and digital signage for your business: www.evcircuits.com
is it impossible to set the web browser to never use etags?
(without clearing the cache but never store any etags it gets)
I've been using Modify Headers since Firefox 3.6 to filter and modify ETag and some other headers. http://www.garethhunt.com/modifyheaders/
I realised it was used for tracking some years ago when I happened to notice some cached images carried the tag.
I don't think you can avoid storing the tag as it is image meta data.
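What a header filter of this kind does can be sketched as below. This is not how Modify Headers is implemented; the function name and the two blocklists are assumptions for illustration.

```python
# Hypothetical blocklists: header names compared case-insensitively.
RESPONSE_BLOCKLIST = ("etag",)
REQUEST_BLOCKLIST = ("if-none-match",)


def strip_headers(headers, blocked):
    """Return a copy of `headers` without any header whose name is in `blocked`."""
    return {name: value for name, value in headers.items()
            if name.lower() not in blocked}
```

If the tag is dropped from the response before it reaches the cache, there is nothing to send back later. The browser may then fall back to Last-Modified: and If-Modified-Since:, which a dishonest server can abuse in the same way, so those may need the same treatment.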
You just proved that your reading comprehension is a complete failure. Here's a hint to you: If your answer contains the word "ad", it is most probably not a proper answer to my post.
The Tao of math: The numbers you can count are not the real numbers.
OK, so how does the NY Times track me? I'm running Firefox on Win 7, I've cleared my cache, I've cleared my cookies, I've cleared the Flash cookies, no luck.
Incognito modes have never been about being anonymous to the web sites you visit. It's all about leaving no trace on the local machine.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
There are more sneaky ways than ETags to track you without cookies. Some of the more diabolic schemes involve sending you a specially crafted PNG file, then reading its pixel values using HTML5 canvas, or inserting invisible links into pages and then checking if they have the ":visited" pseudo-class. For more information, see the Wikipedia entry for Evercookie.
Anyway, most of these techniques can be mitigated by clearing your cache. I clear mine after each browsing session, so while I might get tracked for a few hours, I should appear as a different person the next time I come online.
Has anyone noticed that the xpi file downloaded from the Dephormation website does not agree with the values published on that website?
From my PC:
26-Aug-13 01:40 PM 497,689 SecretAgent.xpi
F:\downloads>sha1sum SecretAgent.xpi
294673877b38e6044248cfd51f91542886297090 SecretAgent.xpi
F:\downloads>md5sum SecretAgent.xpi
d60880a495465aa0df69c4bb3312799e *SecretAgent.xpi
From: https://www.dephormation.org.uk/?page=2 website:
Latest version 5.21 (released 2013-04-14).
Please follow the installation instructions below carefully. Protect your right to communication privacy, security, and integrity. Stop Phorm.
MD5 Checksum: 7458753a7f54aac38e56f802fa7eb731
SHA1 Checksum: 9f12928d15eccf92bd376638097d3451f2141f09
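For anyone without sha1sum/md5sum at hand, the same check can be sketched in Python with hashlib. The function names are invented for the example; the file path and the published values are whatever you downloaded and whatever the website shows.

```python
import hashlib


def digests_of_chunks(chunks):
    """Return (md5_hex, sha1_hex) for data supplied as an iterable of byte chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    for chunk in chunks:
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def file_digests(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large downloads need little memory."""
    with open(path, "rb") as fh:
        return digests_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def matches_published(local, published):
    """Compare (md5, sha1) pairs, ignoring case and surrounding whitespace."""
    clean = lambda pair: tuple(value.strip().lower() for value in pair)
    return clean(local) == clean(published)
```

A mismatch like the one above only says that the file and the published values differ. It does not say whether the download was altered or the page was simply not updated for a newer build.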
I have pointed this out numerous times to APK myself. Sadly, he doesn't get that either.
Change is certain; progress is not obligatory.
Maxwell Demon, this is why there is little point replying to the guy, you can see through the numerous posts his reading comprehension is poor regardless.
Change is certain; progress is not obligatory.
Except that on a LAN FTP is almost the only protocol I can rely on to get high speed data transfers. SFTP blocks at about 30 MB/s when FTP can easily get 90.
If the Linux distros would be reasonable and enable the "none" crypto on SSH it would be a good thing. If I explicitly ask for no crypto then why are they making it hard for me to get what I want?
Yes, I know I can recompile OpenSSH for "none" crypto but it is easier to set up FTP, or even use tar and netcat.
OK, then please tell me, when sequentially numbering the words in my post you linked, which numbers go to the words "cookies" and "ads". Because I cannot find either in the post.
The Tao of math: The numbers you can count are not the real numbers.
For Mozilla-based browsers such as Firefox and SeaMonkey, the SecretAgent extension conflicts with the PrefBar User Agent menulist.
Because some Web sites I visit are sensitive to what user agent they see, I unchecked (disabled) the "Rotate User Agent" checkbox in SecretAgent. Then, if I used the PrefBar User Agent menulist to spoof some other browser, it kept resetting to my actual user agent. Since I consider the PrefBar capability to be very important, I removed SecretAgent. The PrefBar capability was then restored.
Evil. Seriously, this shit is getting messed up.
No sig for you! Come back one year!
I don't care about whether you do or want to block YouTube videos on third party sites. But the point is that you claimed that host files are better than RequestPolicy. Which they are not because they simply offer different functionality. You cannot replace RequestPolicy with host file entries. And blocking embedded videos from third-party sites (and especially YouTube) is one thing I use RequestPolicy for.
And no, I do not want to completely block YouTube (if I would want that, then blocking in the hosts file would probably be the better alternative). I want to block embedded YouTube videos in third-party sites (and moreover, I want to easily unblock them in the case that I decide I want to see that video, which happens in the minority of cases). Why? Well, because I don't see why I should let YouTube (that is, Google) know that I'm on that third-party site when I don't have the desire to watch that video.
And BTW, you seem to have the misconception that ring 0 code runs faster than ring 3 code. That is wrong. Unless you try to execute privileged instructions (which code that just compares strings certainly does not do), the code is executed exactly the same.
The Tao of math: The numbers you can count are not the real numbers.
I didn't "run from this". But sometimes I also do other things than reading/writing on Slashdot (like sleeping, working, or listening to radio broadcasts which certainly don't wait for me while I write comments).
And BTW, your claim that you answered my questions is completely wrong. You answered questions which I never asked, while leaving the questions I asked unanswered (maybe because the answers could have not been in favour of hosts files?)
The Tao of math: The numbers you can count are not the real numbers.
I had written a large point-to-point reply, but the lame(ness) filter won't let me post it (and doesn't even tell me why; Slashdot is really going downhill!). Therefore here's a summary:
1. RequestPolicy doesn't do all from the list (but most). But I never claimed it was the solution to everything. And BTW, against your claim, hosts files also do not do all of that.
2. RequestPolicy can do things hosts files cannot do. Which was my whole point: It is not made redundant by hosts files. Now it is true that hosts files can also do something RequestPolicy cannot do. But I never claimed otherwise.
So, now let's see if the lame(ness) filter likes this.
(BTW, is there any place where I can ask why my original post did not pass the lame(ness) filter?
Or do you have a hosts file solution to it? ;-))
The Tao of math: The numbers you can count are not the real numbers.
I already wrote that. But why am I not surprised that you didn't (or pretend you didn't) notice it?
The Tao of math: The numbers you can count are not the real numbers.
My blog post from 2007 (sorry, it's in German):
http://blog.laxu.de/2007/09/23/browser-raten-und-e-tag-cookies/