Slashdot Mirror


Security Fears Over Google Accelerator

Espectr0 writes "A software tool launched by Google on Wednesday that speeds up the process of downloading Web sites (covered recently on Slashdot) has caused some users to worry about their privacy. A ZDNet article discusses problems that users have been experiencing with the information that is cached by the software. On a Google Labs discussion group, one user said that 'I went to the Futuremark forums and noticed that I'm logged in as someone I don't know...'" Commentary also available on Signal vs. Noise and BlogNewsChannel.

15 of 355 comments

  1. Does this surprise anyone? by Jailbrekr · · Score: 5, Informative

    It's a caching proxy server, for crying out loud. It caches web pages and feeds you the cached version. This is neither new nor surprising, especially for a new service offering.

    --
    Feed the need: Digitaladdiction.net
    1. Re:Does this surprise anyone? by 44BSD · · Score: 4, Informative

      It is more than a caching proxy.

      The client-side portion of the architecture aggressively prefetches content. It's a two-stage proxy, really, and the issue some people have with it is that the content in the portion on the end-user's hard drive is not content that the user asked for, but content that the proxy predicts the user will soon ask for.

  2. Had to remove it from my computer by PenguinBoyDave · · Score: 4, Informative

    I had to remove it from my system. It hijacked my browser, and I was not able to browse my company's internal websites because it overrode our proxy. Bummer too...it worked great.

    --
    I'm not a troll, but I play one on Slashdot.
    1. Re:Had to remove it from my computer by Chyeld · · Score: 5, Informative

      Why didn't you just tell it not to get involved when browsing that domain? It does have exclusion rules built in.

  3. Bigger problems with web accelerator by alphakappa · · Score: 4, Informative

    The accelerator prefetches the links on web pages, in effect clicking on all of them (except ads), which includes links that say 'delete this' or 'unsubscribe' etc. Many webpages use GET links to do these actions, and this is causing pages to disappear. Until web apps are rewritten to take note of the prefetch header, it's probably unsafe to use the accelerator. (Which seems to be offline at the moment - the page redirects you to the toolbar)
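    A minimal sketch of the two defenses implied above: refuse requests that announce themselves as prefetches, and never let a destructive action run from a plain GET link. The header name and value ("X-moz: prefetch") are an assumption based on what prefetching clients were reported to send, and decide() is a hypothetical helper, not part of any framework.

```python
# Assumption: prefetching clients identify themselves with this header.
PREFETCH_HEADER = ("x-moz", "prefetch")

# Methods that a prefetcher or cache is entitled to issue speculatively.
SAFE_METHODS = {"GET", "HEAD"}


def is_prefetch(headers):
    """Return True if the request headers announce a prefetch."""
    name, value = PREFETCH_HEADER
    lowered = {k.lower(): v.strip().lower() for k, v in headers.items()}
    return lowered.get(name) == value


def decide(method, headers, changes_state):
    """Return an HTTP status code for a hypothetical request handler.

    changes_state is True when the handler deletes, unsubscribes, etc.
    """
    if is_prefetch(headers):
        return 403  # refuse speculative fetches outright
    if changes_state and method.upper() in SAFE_METHODS:
        return 405  # destructive actions must not hang off a GET link
    return 200
```

    The second check is the one that matters in the long run: if 'delete this' only works via POST, a prefetcher following links cannot trigger it whether or not it sends a prefetch header.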

    --
    "When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
  4. Bad caching directives by Sebby · · Score: 5, Informative
    We encountered similar problems when we implemented aggressive caching on our site, mostly because we didn't set the headers properly.

    this site was pretty useful for information. So was AOL webmaster resources info.

    --

    AC comments get piped to /dev/null
  5. Cache-Control is your friend. by oneiros27 · · Score: 5, Informative
    If Google is ignoring Cache-Control headers, then that's one thing to complain about. There's also a good chance that some of these sites are using improper systems for session control (e.g., using REMOTE_ADDR without checking HTTP_X_FORWARDED_FOR, and not setting Cache-Control on their response).
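    As a rough illustration of the address problem, here is a sketch of reading the client address from a CGI/WSGI-style environment. REMOTE_ADDR and HTTP_X_FORWARDED_FOR are the usual spellings of those variables; client_address() is a hypothetical helper. Behind a proxy, REMOTE_ADDR is the proxy's address, so every user of that proxy looks identical.

```python
def client_address(environ):
    """Best-effort client address from a CGI/WSGI-style environ dict."""
    forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
    # The header is a comma-separated chain of addresses; the first entry
    # is the address the proxy nearest the client claims to have seen.
    first_hop = forwarded.split(",")[0].strip()
    # Fall back to the connecting address when no proxy header is present.
    return first_hop or environ.get("REMOTE_ADDR", "")
```

    Even the forwarded value is supplied by whoever sent the request and can be forged, so it is fine for logging but is not a substitute for a real session identifier.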

    For more info about these known issues with HTTP caching, see the following
    --
    Build it, and they will come^Hplain.
  6. Some things I've noticed by Palos · · Score: 3, Informative
    I tried this for a little bit, and really am not impressed. Some basic issues:

    From a user's point of view:

    1 - Ignores hosts file, so I end up seeing ads I normally wouldn't see

    2 - Cookies work weirdly if at all, a lot more sites that I visit frequently appear to use cookies, and I've noticed some definite weirdness

    3 - The time saved on a broadband connection really seems minimal, after an hr or two of surfing it takes a few seconds

    4 - The pre-fetching it supports is already in firefox and probably other browsers

    From a webmaster's point of view:

    1 - No way to limit caching of certain pages outside of moving them to SSL. Robots.txt isn't being followed (although probably rightly so, based on the application).

    2 - Because of the flawed cookie support (at least right now) a lot of affiliate and different advertising methods have to be modified to support this.

    I'm a big google fan, and I use most of their applications daily, but this one definitely needs some work. :)

  7. Re:I have another concern though by oneiros27 · · Score: 3, Informative
    There are three types of lies : lies, damned lies, and statistics.
    (attributed to way too many people)
    If you thought web statistics meant anything, you're lying to yourself. Anyone who's done any work with collecting web statistics has had to deal with the AOL Proxies for the last decade. (and with IE deciding it was going to start lying, and say that it was Netscape, etc.)

    Most web statistics are complete crap.
    --
    Build it, and they will come^Hplain.
  8. caching personalized content != caching cookies by SuperBanana · · Score: 5, Informative
    How does caching your cookies to the internet help speed up your local browsing?

    Who said it was a cookie that was cached, and not the page content? Much of the discussion thus far seems based on an anonymous quote in a ZDNet article. As far as I can tell, the guy saw "Welcome back, Bob!" and freaked, when he wasn't -actually- logged in as Bob. Furthermore, who says it isn't Futuremark (or their forum software- because we all know how security-conscious PHP/MySQL forum software is) tagging their pages as cacheable when they shouldn't be? If Google is ignoring "don't cache this page", now yes, we have a problem- but the ZDNet story is of a technical level I'd expect of a community newspaper, so it's kind of hard to tell. It's like reading a story in your city newspaper that says "somebody killed by a cop!" and going off on a rant about police brutality...only to find out later the guy was a bank robber with an Uzi.

    Before you get all excited about bank sites etc- keep in mind those often use very unique URLs for each page and other tricks.

  9. Futuremark's problem, not Google's by Temporal · · Score: 4, Informative
    I assume Google has properly implemented the HTTP/1.1 caching mechanisms. Among these, it is possible for a server to mark a page as being "private", meaning that it should never be cached in a public cache like Google's. Another thing the server can do is set "Vary: Cookie", which indicates that the server will produce different pages for people who give it different cookies.

    Here are the headers that the Futuremark forums give me when I am logged in:
    HTTP/1.1 200 OK
    Date: Fri, 06 May 2005 18:10:16 GMT
    Server: Apache/1.3.29 (Unix) mod_perl/1.29
    Transfer-Encoding: chunked
    Content-Type: text/html
    As you can see, neither "Cache-Control: private" nor "Vary: Cookie" is given. In fact, the server doesn't even give an expiration date for the content. Under these conditions, the HTTP/1.1 protocol says that it is perfectly OK for a cache to keep this page for a while and serve it to other people.
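    To make the reasoning concrete, here is a deliberately simplified sketch of the decision a shared cache makes. It only models the two mechanisms discussed here ("Cache-Control: private" and "Vary: Cookie") plus "no-store"; real HTTP/1.1 caching has many more rules (expiration, validation, authenticated requests). The function names are hypothetical.

```python
def directives(value):
    """Split a Cache-Control value into a set of lowercase directive names."""
    return {
        part.split("=")[0].strip().lower()
        for part in value.split(",")
        if part.strip()
    }


def shared_cache_may_store(status, headers):
    """Simplified check: may a shared (multi-user) cache keep this response?"""
    lowered = {k.lower(): v for k, v in headers.items()}
    if directives(lowered.get("cache-control", "")) & {"private", "no-store"}:
        return False
    return status == 200


def cache_key(url, request_headers, response_headers):
    """Key a stored response by URL plus the request headers named in Vary."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    names = sorted(
        n.strip().lower() for n in resp.get("vary", "").split(",") if n.strip()
    )
    return (url, tuple((n, req.get(n, "")) for n in names))
```

    Fed the headers shown above, shared_cache_may_store() says yes, and cache_key() is the same for every visitor because no Vary header is present. Adding "Vary: Cookie" makes the key differ per cookie; adding "Cache-Control: private" keeps the page out of the shared cache altogether.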

    This problem is firmly the fault of the people who wrote Futuremark's forums. This constitutes a major security hole in the WWWThreads forum package, because this problem will occur when using any standards-compliant HTTP cache. I would strongly recommend against the use of these forums on any web site until they fix their security problems.

    (I do not know if other forum software has this problem, but frankly it would not surprise me. It seems lots of PHP developers and other high-level web programmers have no idea how HTTP/1.1 works, and assume that headers are completely unimportant. I have written a web server and forum software myself, though, and I made damned sure that mine produces the right headers.)
    1. Re:Futuremark's problem, not Google's by Godeke · · Score: 3, Informative
      No, my point was exactly that "marking everything private is better than marking nothing private": this was the header from a site I built. Now that I'm aware of the ramifications, I can remove that header from the appropriate pages (the few that are not data driven). But I far prefer the default this way than discovering "oh yay, all my data driven pages are stupidly cached". Right now the site is just rude and uninformed, not broken.

      As far as Microsoft's sites, I really could care less how stupid their choices are, I'm just glad I can now implement it properly by adding the change where necessary instead of having egg on my face for not having a piece of information when I built the site. During building the site, the only cache I considered was the browser cache. Bad, but not as bad as what I'm finding on my personal PHP driven sites on this same issue. There I just look stupid:
      Date: Fri, 06 May 2005 20:00:49 GMT
      Server: Apache/1.3.33 (Unix) mod_jk2/2.0.0 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.22 OpenSSL/0.9.6b PHP-CGI/0.1b
      Last-Modified: Fri, 30 Nov 2001 20:02:22 GMT
      Etag: "2bed0b-1b27-3c07e5ce"
      Accept-Ranges: bytes
      Content-Length: 6951
      Keep-Alive: timeout=10, max=100
      Connection: Keep-Alive
      Content-Type: text/htm
      (Um, yeah, haven't updated that ugly site in four years).
      --
      Sig under construction since 1998.
  10. Re:All Together Now... by MCraigW · · Score: 3, Informative
    and all their biggest products, sans *one*, are in beta. Ball back to you

    Uhh... right, let's see, here is a page http://www.google.com/options/index.html with 16 Services and 8 tools that they offer, none of which are in Beta, and here is a page http://www.google.com/downloads/ of six software downloads and ooooohhh, one of them is Beta, and here is their "labs" page http://labs.google.com/ that has all their Beta products, note the list on the right hand side of their seven "Graduates of Labs" non-Beta products.

  11. Re:Maybe i don'd understand how it works? by Sinus0idal · · Score: 3, Informative

    Yup exactly, so the problem is with the website, not the google proxy. The same problem would occur with any proxy for a website which uses IP address to determine the same user. Websites should be managing sessions.
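    For illustration, a minimal sketch of what "managing sessions" means in place of keying on IP address: each login gets an unguessable random token (normally sent back as a cookie), and the server looks the user up by that token. SessionStore is a hypothetical in-memory class; a real site would also need expiry and persistent storage.

```python
import secrets


class SessionStore:
    """Hypothetical in-memory map from session token to username."""

    def __init__(self):
        self._users = {}

    def log_in(self, username):
        """Create a fresh random token for this login and return it."""
        token = secrets.token_urlsafe(32)
        self._users[token] = username
        return token

    def user_for(self, token):
        """Return the username for a token, or None if it is unknown."""
        return self._users.get(token)
```

    Two people behind the same proxy share an IP address but never share a token, so neither can be mistaken for the other.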

  12. Response by Otto · · Score: 5, Informative
    The web accelerator ignores robots.txt.


    The web accelerator is not a robot, so this is correct behavior.

    The web accelerator ignores the NOARCHIVE meta.


    NOARCHIVE is a Google-specific extension to the robots.txt specification, and again, this is not a robot.

    I believe, but have yet to confirm, that it ignores any no-cache pragma headers.


    I'd be absolutely shocked if that were actually the case. I also believe it respects the Expires header as well as the Cache-Control header.

    It avoids prefetching anything with a question mark in the URL, but what about all those PATH_INFO dynamic links we've been installing for the last four years so that our dynamic pages look like static URLs? Google prefetches many of these, and there are numerous reports that this prefetching, along with some cookie mishandling by Google, is breaking sites out there. Does Google care?


    If they're following the proper standards, then it's not their place to care or not. If your website doesn't properly specify cache-control (many don't) then you get what you get.

    For any pages with user-specific content, add the "Cache-Control: private" header and voila, problem solved for you.

    If you want to opt out entirely, then a simple "Cache-Control: no-cache" header in your HTTP responses would do the trick, as would "Pragma: no-cache", I bet.
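    A small sketch of the two suggestions above as a helper that picks response headers per page type. cache_headers() and its "kind" labels are hypothetical names for illustration. One caveat worth stating: "no-cache" tells a cache to revalidate with the server before reusing a response, while "no-store" is the stricter directive if nothing should be kept at all.

```python
def cache_headers(kind):
    """Return (name, value) header pairs for a hypothetical page type."""
    if kind == "personalized":
        # User-specific content: keep it out of shared caches.
        return [("Cache-Control", "private")]
    if kind == "opt-out":
        # Ask caches not to reuse the response without checking back.
        # Pragma is included for older HTTP/1.0 intermediaries.
        return [("Cache-Control", "no-cache"), ("Pragma", "no-cache")]
    raise ValueError("unknown page kind: " + repr(kind))
```

    Usage would be something like extending the response's header list with cache_headers("personalized") on any page that greets the user by name.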

    Furthermore, there is no cookie mishandling that I've actually seen, and I've tested it. It passes cookies through just fine, without caching them, as near as I can tell.
    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.