Security Fears Over Google Accelerator

Posted by Zonk on Friday May 6, 2005 @05:31AM from the road-to-hell-is-paved-with-web-caching dept.

Espectr0 writes "A software tool launched by Google on Wednesday that speeds up the process of downloading Web sites (covered recently on Slashdot) has caused some users to worry about their privacy. A ZDNet article discusses problems that users have been experiencing with the information that is cached by the software. On a Google Labs discussion group, one user said that 'I went to the Futuremark forums and noticed that I'm logged in as someone I don't know...'" Commentary also available on Signal vs. Noise and BlogNewsChannel.

Does this surprise anyone? by Jailbrekr · 2005-05-06 05:36 · Score: 5, Informative

It's a caching proxy server, for crying out loud. It caches web pages and feeds you the cached version. This is not new, nor is it surprising, especially for a new service offering.

--
Feed the need: Digitaladdiction.net
1. Re:Does this surprise anyone? by 44BSD · 2005-05-06 05:46 · Score: 4, Informative
  
  It is more than a caching proxy.
  
  The client-side portion of the architecture aggressively prefetches content. It's a two-stage proxy, really, and the issue some people have with it is that the content in the portion on the end-user's hard drive is not content that the user asked for, but content that the proxy predicts the user will soon ask for.
Had to remove it from my computer by PenguinBoyDave · 2005-05-06 05:37 · Score: 4, Informative

I had to remove it from my system. It hijacked my browser, and I was not able to browse my company's internal websites because it overrode our proxy. Bummer, too...it worked great.

--
I'm not a troll, but I play one on Slashdot.
1. Re:Had to remove it from my computer by Chyeld · 2005-05-06 05:41 · Score: 5, Informative
  
  Why didn't you just tell it not to get involved when browsing that domain? It does have exclusion rules built in.
Well, it *is* beta, after all by NixLuver · 2005-05-06 05:40 · Score: 2, Informative

I ran it for about an hour; turns out it's lumpy when one deals with multiple proxy servers (work vs. home) and it broke Rhapsody in a BIG way. I'm sure the good folks at Google will sort it out eventually.

OTOH, one must consider whether or not one trusts Google with one's information that way. I wanted to check it out, but probably, in the long run, wouldn't have used it. But it's worth noting that millions of people use ISP proxy servers without even knowing it (think transparent proxies) or without understanding it (think "proxy.isp.com"). I can't imagine that Google's Accelerator would expose one *more* than that.

--
Thinking outside my Head
Bigger problems with web accelerator by alphakappa · 2005-05-06 05:41 · Score: 4, Informative

The accelerator prefetches the links on web pages, in effect clicking on all of them (except ads), which includes links that say 'delete this' or 'unsubscribe' etc. Many webpages use GET links to do these actions, and this is causing pages to disappear. Until web apps are rewritten to take note of the prefetch header, it's probably unsafe to use the accelerator. (Which seems to be offline at the moment - the page redirects you to the toolbar)
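
To make the failure mode concrete, here is a minimal Python sketch (not code from any site discussed here) of a handler that keeps state changes off GET and turns away requests carrying the "X-moz: prefetch" header that other comments in this thread check for. The function name handle_request, the /delete/ path, and the dict-backed store are hypothetical, chosen only for illustration.

```python
SAFE_METHODS = {"GET", "HEAD"}


def handle_request(method, path, headers, store):
    """Return (status, body) for a toy comment store.

    `store` is a dict of comment_id -> text. All names are hypothetical.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    if normalized.get("x-moz", "").lower() == "prefetch":
        # Refuse speculative fetches outright.
        return 403, "prefetch requests are not served"

    if path.startswith("/delete/"):
        comment_id = path[len("/delete/"):]
        if method in SAFE_METHODS:
            # A GET must not change state: show a confirmation page instead.
            return 200, f"confirm deletion of {comment_id} via POST"
        if method == "POST":
            store.pop(comment_id, None)
            return 200, f"deleted {comment_id}"
        return 405, "method not allowed"

    return 200, "ok"
```

With a handler shaped like this, a prefetcher that follows every link on the page only ever reaches the confirmation text; the deletion itself needs a POST, which a link-following prefetcher does not issue.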

--
"When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
1. Re:Bigger problems with web accelerator by Anonymous Coward · 2005-05-06 06:02 · Score: 2, Informative
  
  in effect clicking on all of them (except ads), which includes links that say 'delete this' or 'unsubscribe' etc. Many webpages use GET links to do these actions
  
  Then they were coded by morons. Section 9.1.1. of RFC 2616 (the HTTP 1.1 specification) explicitly states that GET should not be used for unsafe actions:
  
  In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe".
  
  This is nothing new, "web accelerators" have been doing this for a decade or so, and every time one becomes popular, these moron developers get bitten and end up blaming the web accelerators instead of accepting responsibility.
2. Re:Bigger problems with web accelerator by Anonymous Coward · 2005-05-06 06:43 · Score: 1, Informative
  
  If you are logged on, you can see the delete icon near all your comments (or all comments if you are the blog author), which is just a simple link that deletes the comment without any server side confirmation.
  
  I don't see that at all, I see a link that takes me to a confirmation page that has a button on it.
Bad caching directives by Sebby · 2005-05-06 05:42 · Score: 5, Informative

We encountered similar problems when we implemented aggressive caching on our site, mostly because we didn't set the headers properly.

this site was pretty useful for information. So was AOL webmaster resources info.

--

AC comments get piped to /dev/null
Cache-Control is your friend. by oneiros27 · 2005-05-06 05:43 · Score: 5, Informative
If Google is ignoring Cache-Control headers, then that's one thing to complain about. There's also a good chance that some of these sites are using improper systems for session control (eg, using HTTP_ADDR without checking X_FORWARDED_FOR, and not setting Cache-Control on their response).

For more info about these known issues with HTTP caching, see the following
- RFC 3143 - Known HTTP Proxy/Caching Problems
- RFC 2068 - Hypertext Transfer Protocol -- HTTP/1.1 (see sections 8.1.3 and 13, 14.9 and 15)
--
Build it, and they will come^Hplain.
Some things I've noticed by Palos · 2005-05-06 05:44 · Score: 3, Informative

I tried this for a little bit, and really am not impressed. Some basic issues:
From a user's point of view:

1 - Ignores hosts file, so I end up seeing ads I normally wouldn't see

2 - Cookies work weirdly if at all; a lot more of the sites I visit frequently appear to use cookies than I expected, and I've noticed some definite weirdness

3 - The time saved on a broadband connection really seems minimal, after an hr or two of surfing it takes a few seconds

4 - The pre-fetching it supports is already in firefox and probably other browsers

From a webmaster's point of view:

1 - No way to limit caching of certain pages outside of moving them to SSL. Robots.txt isn't being followed (although probably rightly so, based on the application ).

2 - Because of the flawed cookie support (at least right now), a lot of affiliate and other advertising methods have to be modified to support this.

I'm a big Google fan, and I use most of their applications daily, but this one definitely needs some work. :)
Something Awful's take on this by grazzy · 2005-05-06 05:44 · Score: 2, Informative

http://www.somethingawful.com/articles.php?a=2858

Really insightful.
Re:I have another concern though by oneiros27 · 2005-05-06 05:49 · Score: 3, Informative

There are three types of lies : lies, damned lies, and statistics.
(attributed to way too many people)
If you thought web statistics meant anything, you're lying to yourself. Anyone who's done any work with collecting web statistics has had to deal with the AOL Proxies for the last decade. (and with IE deciding it was going to start lying, and say that it was Netscape, etc.)

Most web statistics are complete crap.

--
Build it, and they will come^Hplain.
caching personalized content != caching cookies by SuperBanana · 2005-05-06 05:54 · Score: 5, Informative

How does caching your cookies to the internet help speed up your local browsing?
Who said it was a cookie that was cached, and not the page content? Much of the discussion thus far seems to be based on an anonymous quote in a ZDNet article. Far as I can tell, the guy saw "Welcome back, Bob!" and freaked, when he wasn't -actually- logged in as Bob. Furthermore, who says it isn't Futuremark (or their forum software- because we all know how security-conscious PHP/MySQL forum software is) tagging their pages as cacheable when they shouldn't be? If Google is ignoring "don't cache this page", now yes, we have a problem- but the ZDNet story is of a technical level I'd expect of a community newspaper, so it's kind of hard to tell. It's like reading a story in your city newspaper headlined "somebody killed by a cop!" and going off on a rant about police brutality...only to find out later the guy was a bank robber with an Uzi.

Before you get all excited about bank sites etc- keep in mind those often use very unique URLs for each page and other tricks.

--
Please help metamoderate.
Re:Maybe i don'd understand how it works? by Seumas · 2005-05-06 06:02 · Score: 2, Informative

Or the Google system is delivering a cached page which should not have been cached in the first place. AOL and a few other providers do this all the time to members of my site - and I don't use an IP as the identifier (just a hash-digest).
Re:Not quite as serious as it sounds.. by LiquidCoooled · 2005-05-06 06:18 · Score: 2, Informative

What if that page was my account information?

Ooooooooh looky here, I can see the details people would rather keep private.
It's not just about clicking anywhere afterwards.

--
liqbase :: faster than paper
Futuremark's problem, not Google's by Temporal · 2005-05-06 06:21 · Score: 4, Informative

I assume Google has properly implemented the HTTP/1.1 caching mechanisms. Among these, it is possible for a server to mark a page as being "private", meaning that it should never be cached in a public cache like Google's. Another thing the server can do is set "Vary: Cookie", which indicates that the server will produce different pages for people who give it different cookies.

Here are the headers that the Futuremark forums give me when I am logged in:
HTTP/1.1 200 OK
Date: Fri, 06 May 2005 18:10:16 GMT
Server: Apache/1.3.29 (Unix) mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html
As you can see, neither "Cache-Control: private" nor "Vary: Cookie" is given. In fact, the server doesn't even give an expiration date for the content. Under these conditions, the HTTP/1.1 protocol says that it is perfectly OK for a cache to keep this page for a while and serve it to other people.

This problem is firmly the fault of the people who wrote Futuremark's forums. This constitutes a major security hole in the WWWThreads forum package, because this problem will occur when using any standards-compliant HTTP cache. I would strongly recommend against the use of these forums on any web site until they fix their security problems.

(I do not know if other forum software has this problem, but frankly it would not surprise me. It seems lots of PHP developers and other high-level web programmers have no idea how HTTP/1.1 works, and assume that headers are completely unimportant. I have written a web server and forum software myself, though, and I made damned sure that mine produces the right headers.)
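
The reasoning above can be sketched in Python as a deliberately simplified model of a shared cache's decision. It is not Google's implementation, and it ignores expiry, revalidation, no-cache, s-maxage, "Vary: *", and request methods. The function names parse_cache_control, shared_cache_may_store, and cache_key are hypothetical.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a set of lowercase directive names."""
    directives = set()
    for part in value.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return directives


def shared_cache_may_store(response_headers):
    """Simplified check: may a shared (public) cache keep this response?"""
    headers = {k.lower(): v for k, v in response_headers.items()}
    directives = parse_cache_control(headers.get("cache-control", ""))
    # "private" and "no-store" both forbid storage in a shared cache.
    return not ({"private", "no-store"} & directives)


def cache_key(url, request_headers, response_headers):
    """Build a cache key that honours the response's Vary header."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    varied = sorted(
        h.strip().lower() for h in resp.get("vary", "").split(",") if h.strip()
    )
    # Each header named in Vary becomes part of the key, so requests that
    # differ in that header are stored as separate entries.
    return (url,) + tuple((h, req.get(h, "")) for h in varied)
```

Fed the Futuremark headers quoted above, shared_cache_may_store returns True and cache_key is identical for every visitor whatever cookie they send. Adding "Cache-Control: private" flips the first result, and adding "Vary: Cookie" makes the keys differ per cookie.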
1. Re:Futuremark's problem, not Google's by Godeke · 2005-05-06 06:35 · Score: 2, Informative
  
  Interesting. Microsoft is doing the "right thing" with IIS6:
  Date: Fri, 06 May 2005 18:31:39 GMT
  Server: Microsoft-IIS/6.0
  X-Powered-By: ASP.NET
  Content-Length: 5905
  Content-Type: text/html
  Expires: Thu, 05 May 2005 18:31:38 GMT
  Cache-Control: private
  This is apparently the default.
  
  --
  Sig under construction since 1998.
2. Re:Futuremark's problem, not Google's by Temporal · 2005-05-06 06:59 · Score: 2, Informative
  
  Of course, you should only slap the "private" header on pages which are actually private. Otherwise you're just killing the ability of the cache to do its job. But, marking everything private is better than marking nothing private; the former just reduces performance while the latter is a security problem.
  
  It looks like microsoft.com, which simply redirects to www.microsoft.com, is marked "private". That's excessive, and indicates to me that Microsoft's web designers don't understand cache-friendliness or weren't interested in implementing it.
3. Re:Futuremark's problem, not Google's by Godeke · 2005-05-06 08:02 · Score: 3, Informative
  
  No, my point was exactly that "marking everything private is better than marking nothing private": this was the header from a site I built. Now that I'm aware of the ramifications, I can remove that header from the appropriate pages (the few that are not data driven). But I far prefer the default this way than discovering "oh yay, all my data driven pages are stupidly cached". Right now the site is just rude and uninformed, not broken.
  
  As far as Microsoft's sites, I really could care less how stupid their choices are, I'm just glad I can now implement it properly by adding the change where necessary instead of having egg on my face for not having a piece of information when I built the site. During building the site, the only cache I considered was the browser cache. Bad, but not as bad as what I'm finding on my personal PHP driven sites on this same issue. There I just look stupid:
  Date: Fri, 06 May 2005 20:00:49 GMT
  Server: Apache/1.3.33 (Unix) mod_jk2/2.0.0 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.22 OpenSSL/0.9.6b PHP-CGI/0.1b
  Last-Modified: Fri, 30 Nov 2001 20:02:22 GMT
  Etag: "2bed0b-1b27-3c07e5ce"
  Accept-Ranges: bytes
  Content-Length: 6951
  Keep-Alive: timeout=10, max=100
  Connection: Keep-Alive
  Content-Type: text/htm
  (Um, yeah, haven't updated that ugly site in four years).
  
  --
  Sig under construction since 1998.
4. Re:Futuremark's problem, not Google's by Temporal · 2005-05-06 08:17 · Score: 2, Informative
  
  Yes, I agree, "Cache-Control: private" should be default for any dynamic site unless the developer states otherwise. It was a good idea for IIS to do it that way.
5. Re:Futuremark's problem, not Google's by supersat · 2005-05-06 09:09 · Score: 2, Informative
  
  You are quite correct. I observed the same thing with the Futuremark forums.
  
  A lot of people on LiveJournal were whining about it, but LiveJournal correctly includes a Cache-Control header, and after several extensive tests, I've found Google's Web Accelerator to not cache anything it shouldn't.
  
  When you first request a page, it sends the request to Google along with some of the request headers (which may contain cookies). Google then sends back a response with a special X-Google-Cache-Control header that instructs the client what to do next. In LiveJournal's case, it sends back X-Google-Cache-Control: remote-fetch, which causes the client to directly fetch the page from LiveJournal. The page contents are not transferred back to Google. Subsequent loads of the page cause only a few bytes to be exchanged with the Web Accelerator server.
  
  Interestingly enough, with a packet sniffer, you can see what it prefetches. When you go to Google.com, it begins fetching hotmail.com, ebay.com, and cnn.com. That says a lot about the typical user.
Re:All Together Now... by MCraigW · 2005-05-06 06:24 · Score: 3, Informative

and all their biggest products, sans *one*, are in beta. Ball back to you
Uhh... right, let's see, here is a page http://www.google.com/options/index.html with 16 Services and 8 tools that they offer, none of which are in Beta, and here is a page http://www.google.com/downloads/ of six software downloads and ooooohhh, one of them is Beta, and here is their "labs" page http://labs.google.com/ that has all their Beta products, note the list on the right hand side of their seven "Graduates of Labs" non-Beta products.
Some code to block GWA from application pages by DamienMcKenna · 2005-05-06 06:41 · Score: 2, Informative

Here's some code to add to your web pages to block GWA. This will leave static media alone, which is fine.

PHP:
// Refuse prefetch requests (sent with an "X-moz: prefetch" header).
if (array_key_exists('HTTP_X_MOZ', $_SERVER))
{
    if (strtolower($_SERVER['HTTP_X_MOZ']) == 'prefetch')
    {
        header("HTTP/1.1 403 Forbidden");
        header("Content-Type: text/html; charset=iso-8859-1");
        // Mark the response as expired and uncacheable.
        header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
        header("Cache-Control: no-store, no-cache, must-revalidate");
        // FALSE appends a second Cache-Control header instead of replacing the first.
        header("Cache-Control: post-check=0, pre-check=0", FALSE);
        header("Pragma: no-cache");
        header('Accept-Ranges:');
        exit();
    }
}

CFML:

Damien
1. Re:Some code to block GWA from application pages by DamienMcKenna · 2005-05-06 06:57 · Score: 1, Informative
  
  CFML:
  
  
  <cfif structKeyExists(cgi, 'HTTP_X_MOZ')>
  <cfif cgi.HTTP_X.MOZ EQ 'prefetch'>
  <cfheader statuscode="403" statustext="Google Web Accelerator requests are forbidden." />
  <cfabort />
  </cfif>
  </cfif>
  
  Damien
2. Re:Some code to block GWA from application pages by DamienMcKenna · 2005-05-06 08:13 · Score: 2, Informative
  
  Rather, it should be...
  
  
  <cfif structKeyExists(cgi, 'HTTP_X_MOZ')>
  <cfif cgi.HTTP_X_MOZ EQ 'prefetch'>
  <cfheader statuscode="403" statustext="Google Web Accelerator requests are forbidden." />
  <cfabort />
  
  </cfif>
  </cfif>
  
  Small typo on the second variable name. Doh!
  
  Damien
Re:Maybe i don'd understand how it works? by Sinus0idal · 2005-05-06 06:51 · Score: 3, Informative

Yup exactly, so the problem is with the website, not the google proxy. The same problem would occur with any proxy for a website which uses IP address to determine the same user. Websites should be managing sessions.
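
As a rough illustration of "managing sessions" rather than trusting the client IP, here is a minimal Python sketch. The SessionStore class and its method names are hypothetical, the table lives in memory only, and a real site would also need expiry and secure cookie handling.

```python
import secrets


class SessionStore:
    """In-memory session table keyed by an unguessable token, not by IP."""

    def __init__(self):
        self._sessions = {}

    def create(self, username):
        # A random token identifies the session; the client address is never
        # consulted, so two users behind one proxy stay distinct.
        token = secrets.token_urlsafe(32)
        self._sessions[token] = username
        return token  # would be sent to the browser in a Set-Cookie header

    def lookup(self, token):
        # Returns None for unknown tokens.
        return self._sessions.get(token)
```

Two visitors who reach the site through the same proxy address each get their own token, so the server can tell them apart even though their requests share an IP.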
Re:PHP by Anonymous Coward · 2005-05-06 07:01 · Score: 1, Informative

How do you think PHP attaches that session to your computer? Cookies....dipshit.
Response by Otto · 2005-05-06 11:17 · Score: 5, Informative

The web accelerator ignores robots.txt.

The web accelerator is not a robot, so this is correct behavior.

The web accelerator ignores the NOARCHIVE meta.

NOARCHIVE is a Google specific extension to the robots.txt specification, and again, this is not a robot.

I believe, but have yet to confirm, that it ignores any no-cache pragma headers.

I'd be absolutely shocked if that were actually the case. I also believe it respects the Expires header as well as the Cache-Control header.

It avoids prefetching anything with a question mark in the URL, but what about all those PATH_INFO dynamic links we've been installing for the last four years so that our dynamic pages look like static URLs? Google prefetches many of these, and there are numerous reports that this prefetching, along with some cookie mishandling by Google, is breaking sites out there. Does Google care?

If they're following the proper standards, then it's not their place to care or not. If your website doesn't properly specify cache-control (many don't) then you get what you get.

For any pages with user-specific content, add the "Cache-Control: private" header and voila, problem solved for you.

If you want to opt out entirely, then a simple "Cache-Control: no-cache" header in your HTTP responses would do the trick, as would "Pragma: no-cache", I bet.

Furthermore, there is no cookie mishandling that I've actually seen, and I've tested it. It passes cookies through just fine, without caching them, near as I can tell.
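
For the origin-server side of that advice, a minimal sketch in Python as a plain WSGI callable. The name app is hypothetical, treating "any cookie present" as "logged in" is an assumption made only to keep the example short, and the max-age value is arbitrary.

```python
def app(environ, start_response):
    """Toy WSGI app: personalised pages are marked private."""
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # The same URL yields different bodies depending on the cookie.
        ("Vary", "Cookie"),
    ]
    if environ.get("HTTP_COOKIE"):
        # The body depends on who is asking, so shared caches must not keep it.
        headers.append(("Cache-Control", "private"))
        body = b"Welcome back!"
    else:
        # Anonymous content is the same for everyone and may be shared briefly.
        headers.append(("Cache-Control", "public, max-age=300"))
        body = b"Hello, anonymous visitor."
    start_response("200 OK", headers)
    return [body]
```

Sending "Vary: Cookie" on both branches matters here: without it, a shared cache could hand the anonymous copy to a logged-in visitor requesting the same URL.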

--
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
Re:Privacy eh? by Anonymous Coward · 2005-05-06 21:36 · Score: 1, Informative

Cookies are used for session tracking.

You can do this with URL encoding but it's ugly and non-architectural.

Cookies are perfect for this; the only thing wrong with using them for this is that there is a small percentage of paranoid people who don't like cookies, never mind the time to live, never mind what's in them, never mind that your session can be tracked by other means.

It's just knee-jerk reactionism.