Security Fears Over Google Accelerator
Espectr0 writes "A software tool launched by Google on Wednesday that speeds up the process of downloading Web sites (covered recently on Slashdot) has caused some users to worry about their privacy.
A ZDNet article discusses problems that users have been experiencing with the information that is cached by the software. On a Google Labs discussion group, one user said that 'I went to the Futuremark forums and noticed that I'm logged in as someone I don't know...'" Commentary also available on Signal vs. Noise and BlogNewsChannel.
It's a caching proxy server for crying out loud. It caches web pages and feeds you the cached version. This is not new nor is it surprising, especially for a new service offering.
Feed the need: Digitaladdiction.net
I had to remove it from my system. It hijacked my browser, and I was not able to browse my company's internal websites because it overrode our proxy. Bummer too... it worked great.
I'm not a troll, but I play one on Slashdot.
I ran it for about an hour; turns out it's lumpy when one deals with multiple proxy servers (work vs. home) and it broke Rhapsody in a BIG way. I'm sure the good folks at Google will sort it out eventually.
OTOH, one must consider whether or not one trusts Google with one's information that way. I wanted to check it out, but probably, in the long run, wouldn't have used it. But it's worth noting that millions of people use ISP proxy servers without even knowing it (think transparent proxies) or without understanding it (think "proxy.isp.com"). I can't imagine that Google's Accelerator would expose one *more* than that.
Thinking outside my Head
The accelerator prefetches the links on web pages, in effect clicking on all of them (except ads), which includes links that say 'delete this' or 'unsubscribe' etc. Many webpages use GET links to do these actions, and this is causing pages to disappear. Until web apps are rewritten to take note of the prefetch header, it's probably unsafe to use the accelerator. (Which seems to be offline at the moment - the page redirects you to the toolbar)
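To make the point above concrete, here is a minimal Python sketch of how a web app might protect itself. It assumes the prefetch hint arrives as an "X-moz: prefetch" request header (the header the PHP snippet further down the thread checks for); the path names and the status_for helper are made up for illustration and are not from any real application.

```python
# Hypothetical set of paths whose handlers change state (illustration only).
DESTRUCTIVE_PATHS = {"/delete", "/unsubscribe"}


def is_prefetch(headers):
    """True if the request carries the prefetch hint (assumed: 'X-moz: prefetch')."""
    for name, value in headers.items():
        if name.lower() == "x-moz" and value.strip().lower() == "prefetch":
            return True
    return False


def status_for(method, path, headers):
    """Return the HTTP status a cautious app might send for this request."""
    if is_prefetch(headers):
        return 403  # refuse speculative fetches outright
    if path in DESTRUCTIVE_PATHS and method.upper() != "POST":
        return 405  # state-changing actions should not be reachable via GET
    return 200
```

The second check is the more durable fix: if "delete" and "unsubscribe" only respond to POST, no link-following client can trigger them, whether or not it announces itself as a prefetcher.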
"When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
This site was pretty useful for information. So was AOL webmaster resources info.
AC comments get piped to
For more info about these known issues with HTTP caching, see the following
Build it, and they will come^Hplain.
From a user's point of view:
1 - Ignores the hosts file, so I end up seeing ads I normally wouldn't see
2 - Cookies work weirdly, if at all. A lot more of the sites I visit frequently appear to use cookies than I realized, and I've noticed some definite weirdness
3 - The time saved on a broadband connection really seems minimal; after an hour or two of surfing it has saved only a few seconds
4 - The prefetching it supports is already in Firefox and probably other browsers
From a webmaster's point of view:
1 - No way to limit caching of certain pages short of moving them to SSL. Robots.txt isn't being followed (although probably rightly so, given the application).
2 - Because of the flawed cookie support (at least right now), a lot of affiliate and other advertising methods have to be modified to support this.
I'm a big Google fan, and I use most of their applications daily, but this one definitely needs some work. :)
http://www.somethingawful.com/articles.php?a=2858
Really insightful.
Most web statistics are complete crap.
Build it, and they will come^Hplain.
Who said it was a cookie that was cached, and not the page content? Much of the discussion thus far seems based on an anonymous quote in a ZDNet article. As far as I can tell, the guy saw "Welcome back, Bob!" and freaked, when he wasn't -actually- logged in as Bob. Furthermore, who says it isn't Futuremark (or their forum software, because we all know how security-conscious PHP/MySQL forum software is) tagging their pages as cacheable when they shouldn't be? If Google is ignoring "don't cache this page", then yes, we have a problem, but the ZDNet story is at the technical level I'd expect of a community newspaper, so it's kind of hard to tell. It's like reading a story in your city newspaper that says "somebody killed by a cop!" and going off on a rant about police brutality... only to find out later the guy was a bank robber with an Uzi.
Before you get all excited about bank sites etc- keep in mind those often use very unique URLs for each page and other tricks.
Please help metamoderate.
Or the Google system is delivering a cached page which should not have been cached in the first place. AOL and a few other providers do this all the time to members of my site - and I don't use an IP as the identifier (just a hash-digest).
What if that page was my account information?
Ooooooooh looky here, I can see the details people would rather keep private.
It's not just about clicking anywhere afterwards.
liqbase
Here are the headers that the Futuremark forums give me when I am logged in:
As you can see, neither "Cache-Control: private" nor "Vary: Cookie" is given. In fact, the server doesn't even give an expiration date for the content. Under these conditions, the HTTP/1.1 protocol says that it is perfectly OK for a cache to keep this page for a while and serve it to other people.
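As a rough illustration of that reasoning, the Python sketch below checks whether a set of response headers gives a shared cache any reason not to hand the page to a different user. It is deliberately simplified and is not the full HTTP/1.1 caching algorithm; the function name may_serve_to_other_users is invented for this example.

```python
def may_serve_to_other_users(response_headers):
    """Rough check: could a shared cache reuse this response for another user?

    Simplified on purpose; real HTTP/1.1 caching has many more rules
    (freshness lifetimes, validators, request directives, and so on).
    """
    headers = {k.lower(): v for k, v in response_headers.items()}

    # Collect Cache-Control directive names, ignoring any "=value" part.
    directives = {
        d.strip().split("=")[0].lower()
        for d in headers.get("cache-control", "").split(",")
        if d.strip()
    }
    if directives & {"private", "no-store", "no-cache"}:
        return False

    # "Vary: Cookie" keys the cached copy on the cookie, so a request
    # with a different cookie does not match it.
    vary = {v.strip().lower() for v in headers.get("vary", "").split(",") if v.strip()}
    if "cookie" in vary or "*" in vary:
        return False

    return True
```

A response with neither header, like the one described above, comes back True: nothing tells the cache the page is personal.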
This problem is firmly the fault of the people who wrote Futuremark's forums. This constitutes a major security hole in the WWWThreads forum package, because this problem will occur when using any standards-compliant HTTP cache. I would strongly recommend against the use of these forums on any web site until they fix their security problems.
(I do not know if other forum software has this problem, but frankly it would not surprise me. It seems lots of PHP developers and other high-level web programmers have no idea how HTTP/1.1 works, and assume that headers are completely unimportant. I have written a web server and forum software myself, though, and I made damned sure that mine produces the right headers.)
Uhh... right, let's see, here is a page http://www.google.com/options/index.html with 16 services and 8 tools that they offer, none of which are in Beta, and here is a page http://www.google.com/downloads/ of six software downloads and ooooohhh, one of them is Beta, and here is their "labs" page http://labs.google.com/ that has all their Beta products. Note the list on the right-hand side of their seven "Graduates of Labs" non-Beta products.
Here's some code to add to your web pages to block GWA. This will leave static media alone, which is fine.
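PHP:
// Refuse any request flagged as a prefetch via the X-moz header.
if (array_key_exists('HTTP_X_MOZ', $_SERVER))
{
    // Compare case-insensitively; the original strtoupper() test could never match.
    if (strtolower($_SERVER['HTTP_X_MOZ']) == 'prefetch')
    {
        header("HTTP/1.1 403 Forbidden");
        header("Content-Type: text/html; charset=iso-8859-1");
        header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
        header("Cache-Control: no-store, no-cache, must-revalidate");
        // FALSE appends this as a second Cache-Control header instead of replacing the first.
        header("Cache-Control: post-check=0, pre-check=0", FALSE);
        header("Pragma: no-cache");
        header('Accept-Ranges:');
        exit();
    }
}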
CFML:
Damien
Yup exactly, so the problem is with the website, not the Google proxy. The same problem would occur with any proxy in front of a website that uses the IP address to identify a user. Websites should be managing sessions.
How do you think PHP attaches that session to your computer? Cookies....dipshit.
The web accelerator is not a robot, so this is correct behavior.
NOARCHIVE is a Google-specific robots meta tag value, not part of the robots.txt specification, and again, this is not a robot.
I'd be absolutely shocked if that were actually the case. I also believe it respects the Expires header as well as the Cache-Control header.
If they're following the proper standards, then it's not their place to care or not. If your website doesn't properly specify cache-control (many don't) then you get what you get.
For any pages with user-specific content, add the "Cache-Control: private" header and voila, problem solved for you.
If you want to opt out entirely, then a simple "Cache-Control: no-cache" header in your HTTP responses would do the trick, as would "Pragma: no-cache", I bet.
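A small Python sketch of the two options just described, for anyone who wants to see them side by side. The helper name cache_headers and its flags are made up for this example; the header values are the ones quoted in this thread ("Vary: Cookie" comes from the earlier comment about the Futuremark headers).

```python
def cache_headers(user_specific=False, opt_out=False):
    """Return the extra response headers for a page.

    opt_out        -> ask caches not to reuse the page without checking back
    user_specific  -> let only the user's own browser cache keep the page
    """
    headers = {}
    if opt_out:
        headers["Cache-Control"] = "no-cache"
        headers["Pragma"] = "no-cache"  # fallback for HTTP/1.0 caches
    elif user_specific:
        headers["Cache-Control"] = "private"
        headers["Vary"] = "Cookie"
    return headers
```

For example, an account page would call cache_headers(user_specific=True), while a plain static page would call cache_headers() and add nothing, leaving it cacheable.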
Furthermore, there is no cookie mishandling that I've actually seen, and I've tested it. It passes cookies through just fine, without caching them, near as I can tell.
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
Cookies are used for session tracking.
You can do this with URL encoding but it's ugly and non-architectural.
Cookies are perfect for this; the only thing wrong with using them for this is that there is a small percentage of paranoid people who don't like cookies, never mind the time to live, never mind what's in them, never mind that your session can be tracked by other means.
It's just knee-jerk reactionism.