Slashdot Mirror


Security Fears Over Google Accelerator

Espectr0 writes "A software tool launched by Google on Wednesday that speeds up the process of downloading Web sites (covered recently on Slashdot) has caused some users to worry about their privacy. A ZDNet article discusses problems that users have been experiencing with the information that is cached by the software. On a Google Labs discussion group, one user said that 'I went to the Futuremark forums and noticed that I'm logged in as someone I don't know...'" Commentary also available on Signal vs. Noise and BlogNewsChannel.

74 of 355 comments

  1. I, for one, welcome by xmas2003 · · Score: 4, Funny
    --
    Hulk SMASH Celiac Disease
    1. Re:I, for one, welcome by NETHED · · Score: 3, Insightful

      I made the point a while ago about Google. I know others have said it too. Google is amazing, I rely on Google daily. Before Google went public, I was less afraid of them going bad, but now...I'm not so sure. If Google outgrows itself, it becomes Microsoft. If the left hand no longer knows what the left is doing, then it's bad news for everyone, especially the consumer. The difference (for now) between Microsoft and Google is that Google is not a standard install on nearly every consumer computer.

      Is G-os coming?

      --
      --sig fault--
    2. Re:I, for one, welcome by biglig2 · · Score: 3, Funny

      Well, you say that now, but if I uninstall the accelerator I bet you'll say something different.

      --
      ~~~~~ BigLig2? You mean there's another one of me?
    3. Re:I, for one, welcome by Sinus0idal · · Score: 5, Funny

      Even more worrying, Google has two left hands.

  2. Google Privacy-b-gone! by ShaniaTwain · · Score: 5, Funny

    'I went to the Futuremark forums and noticed that I'm logged in as someone I don't know...'

    That's not a bug, it's a feature.

    1. Re:Google Privacy-b-gone! by Silverlancer · · Score: 4, Interesting

      That's a common proxy bug, actually, and the person who we all "appear" to be logged in as at Futuremark is St34lthW4rrior, a guy I actually know. No, we aren't actually logged in as him; it's simply how the page is cached, and as our school proxy causes this problem basically every day, I'm used to it. Just disable it for dynamic pages such as forums.

  3. Maybe i don't understand how it works? by mobiux · · Score: 2, Interesting

    How does caching your cookies to the internet help speed up your local browsing?

    1. Re:Maybe i don't understand how it works? by Rakshasa+Taisab · · Score: 2, Insightful

      It helps because the site you are browsing will require your cookie to display correctly.

      What I *think* might have happened to the user in the above article is that the site used the IP address, not a cookie, to identify the user. Thus there was no cookie being misplaced; rather, the site assumed Google's IP belonged to the same user.

      --
      - These characters were randomly selected.
    2. Re:Maybe i don't understand how it works? by Seumas · · Score: 2, Informative

      Or the Google system is delivering a cached page which should not have been cached in the first place. AOL and a few other providers do this all the time to members of my site - and I don't use an IP as the identifier (just a hash-digest).

    3. Re:Maybe i don't understand how it works? by Sinus0idal · · Score: 3, Informative

      Yup exactly, so the problem is with the website, not the google proxy. The same problem would occur with any proxy for a website which uses IP address to determine the same user. Websites should be managing sessions.
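
      A minimal sketch of the difference described above, assuming a hypothetical in-memory session store and a made-up proxy address (neither comes from the thread). It compares identifying a visitor by source address with identifying them by a session token carried in a cookie:

```python
import secrets

# Hypothetical in-memory session store: token -> username.
SESSIONS = {}


def log_in(username):
    """Create a session and return the token the site would set as a cookie."""
    token = secrets.token_hex(16)
    SESSIONS[token] = username
    return token


def user_by_ip(ip_table, remote_addr):
    """Fragile: everyone behind the same proxy arrives from one address."""
    return ip_table.get(remote_addr)


def user_by_session(cookie_token):
    """Identify the visitor by session token, ignoring the source address."""
    return SESSIONS.get(cookie_token)


# Documentation-range address standing in for a shared proxy (an assumption).
PROXY_IP = "192.0.2.10"

# Alice logs in through the proxy; an IP-based site remembers the proxy address.
alice_token = log_in("alice")
ip_table = {PROXY_IP: "alice"}

# Bob arrives through the same proxy without logging in and without a cookie.
bob_seen_by_ip = user_by_ip(ip_table, PROXY_IP)
bob_seen_by_session = user_by_session(None)
```

      With address-based identification Bob is treated as Alice, while the session lookup finds no user for him. Any shared proxy would produce the same result, which is the point made above.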

  4. Aaaaaaaah! by Anonymous Coward · · Score: 4, Funny

    It's true, it's true! People are logging on to this account and acting like me on /. but it really isn't me! Impostors!

  5. All Together Now... by Future+Linux-Guru · · Score: 5, Insightful

    B
    E
    T
    A

    You'll get better results filing a report with Google as opposed to complaining on /.

    As for me, I used the 3.7 minutes I've saved so far to spend some quality time with my friends.

    1. Re:All Together Now... by Chyeld · · Score: 2, Funny

      Are their names Louise and Rosey?

    2. Re:All Together Now... by Chicane-UK · · Score: 5, Funny

      As for me, I used the 3.7 minutes I've saved so far to spend some quality time with my friends.

      Rosie Palm and her 5 sisters? ;)

      --
      "Hey! Unless this is a nude love-in, get the hell off my property!!"
    3. Re:All Together Now... by arkanes · · Score: 5, Insightful

      I think a more obvious answer here is that GWA is exposing web security bugs in a wide variety of applications. It's worth noting that if GWA can compromise your security, then it can be done intentionally as well. Which is not to say that caching issues should be ignored, or that there may not be a real problem with users getting some other user's cookies. But if GWA can seriously affect your website, then instead of bitching that GWA is breaking your website like SomethingAwful did, you need to realize that your security was already flawed and you need to fix it.

    4. Re:All Together Now... by Anonymous+Custard · · Score: 4, Funny

      >As for me, I used the 3.7 minutes I've saved so far to spend some quality time with my friends.

      Rosie Palm and her 5 sisters? ;)


      Probably, but then what about the other 3.2 minutes?

    5. Re:All Together Now... by MyLongNickName · · Score: 4, Funny

      Ah yes... the Palm sisters. I know them well.

      --
      See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
    6. Re:All Together Now... by MCraigW · · Score: 3, Informative
      and all their biggest products, sans *one*, are in beta. Ball back to you

      Uhh... right, let's see, here is a page http://www.google.com/options/index.html with 16 services and 8 tools that they offer, none of which are in Beta, and here is a page http://www.google.com/downloads/ of six software downloads and ooooohhh, one of them is Beta, and here is their "labs" page http://labs.google.com/ that has all their Beta products. Note the list on the right-hand side of their seven "Graduates of Labs" non-Beta products.

  6. Links.... by Mz6 · · Score: 4, Interesting

    Perhaps this is just Google's way of finding more links to add to its search index? Imagine gathering millions of websites that it may not have indexed or found yet. All from links that users of the GWA have visited... possible?

    --
    Hmmm.
  7. Privacy eh? by funny-jack · · Score: 5, Interesting

    I found it a bit amusing that when I clicked the story link, the destination site, as well as three other sites, each attempted to save a cookie on my computer. Four cookies. To read a news story. That's necessary.

    --
    You probably shouldn't click this.
    1. Re:Privacy eh? by baadger · · Score: 4, Interesting

      Cookies are horrendously abused. There should never be a need for cookies until you choose preferences or log in to a website.

      It's about time the net at large woke up to P3P, or better yet webmasters started thinking before they mindlessly implement cookies for tracking their visitors.

  8. Does this surprise anyone? by Jailbrekr · · Score: 5, Informative

    It's a caching proxy server for crying out loud. It caches web pages and feeds you the cached version. This is not new nor is it surprising, especially for a new service offering.

    --
    Feed the need: Digitaladdiction.net
    1. Re:Does this surprise anyone? by 44BSD · · Score: 4, Informative

      It is more than a caching proxy.

      The client-side portion of the architecture aggressively prefetches content. It's a two-stage proxy, really, and the issue some people have with it is that the content in the portion on the end-user's hard drive is not content that the user asked for, but content that the proxy predicts the user will soon ask for.

    2. Re:Does this surprise anyone? by smittyoneeach · · Score: 5, Funny

      It all makes sense now.
      /.ers are worried about TFA actually being downloaded to their machine, diminishing the /. effect and utterly wrecking their cred.
      I, for one, think that in Soviet Googlia, cache prefetches you.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  9. Comment removed by account_deleted · · Score: 2, Interesting

    Comment removed based on user account deletion

  10. Re:Maybe i don't understand how it works? by Enigma_Man · · Score: 3, Interesting

    It doesn't just cache your cookies, it acts as a proxy that compresses the data as you browse, much like the ISPs that offer "high speed" compressed modem surfing.

    -Jesse

    --
    Nothing says "unprofessional job" like wrinkles in your duct tape.
  11. I have another concern though by Tamerlan · · Score: 2, Interesting

    Not only that, but Google will conceal real web statistics from websites.

    Remember acquisition of Urchin? Here is my concern about Google Webaccelerator.

    1. Re:I have another concern though by oneiros27 · · Score: 3, Informative
      There are three types of lies : lies, damned lies, and statistics.
      (attributed to way too many people)
      If you thought web statistics meant anything, you're lying to yourself. Anyone who's done any work with collecting web statistics has had to deal with the AOL Proxies for the last decade. (and with IE deciding it was going to start lying, and say that it was Netscape, etc.)

      Most web statistics are complete crap.
      --
      Build it, and they will come^Hplain.
    2. Re:I have another concern though by apoc.famine · · Score: 3, Funny

      I'm doing a report on web statistics for school. Could you tell me what percentage are crap?

      T/Y...

      --
      Velociraptor = Distiraptor / Timeraptor
  12. Re:Looking suspicious... by Anonymous Coward · · Score: 2, Insightful

    Do you think Slashdot will ever arrive at a time when a joke about the error message "Move along. Nothing to see here." isn't made on /every/ single article and modded +5 /every/ single time?

  13. Had to remove it from my computer by PenguinBoyDave · · Score: 4, Informative

    I had to remove it from my system. It hijacked my browser, and I was not able to browse my company's internal websites because it overrode our proxy. Bummer too...it worked great.

    --
    I'm not a troll, but I play one on Slashdot.
    1. Re:Had to remove it from my computer by Chyeld · · Score: 5, Informative

      Why didn't you just tell it not to get involved when browsing that domain? It does have exclusion rules built in.

  14. Well, it *is* beta, after all by NixLuver · · Score: 2, Informative

    I ran it for about an hour; turns out it's lumpy when one deals with multiple proxy servers (work vs. home) and it broke Rhapsody in a BIG way. I'm sure the good folks at Google will sort it out eventually.

    OTOH, one must consider whether or not one trusts Google with one's information that way. I wanted to check it out, but probably, in the long run, wouldn't have used it. But it's worth noting that millions of people use ISP proxy servers without even knowing it (think transparent proxies) or without understanding it (think "proxy.isp.com"). I can't imagine that Google's Accelerator would expose one *more* than that.

  15. Re:Looking suspicious... by frkiii · · Score: 3, Funny

    Answer:

    No. This isn't the article you're looking for. You can go about your business. Move along, move along. :P

  16. Bigger problems with web accelerator by alphakappa · · Score: 4, Informative

    The accelerator prefetches the links on web pages, in effect clicking on all of them (except ads), which includes links that say 'delete this' or 'unsubscribe' etc. Many webpages use GET links to do these actions, and this is causing pages to disappear. Until web apps are rewritten to take note of the prefetch header, it's probably unsafe to use the accelerator. (Which seems to be offline at the moment - the page redirects you to the toolbar)

    --
    "When the only tool you own is a hammer, every problem begins to resemble a nail." - Abraham Maslow (1908-1970)
    1. Re:Bigger problems with web accelerator by Anonymous Coward · · Score: 2, Informative

      in effect clicking on all of them (except ads), which includes links that say 'delete this' or 'unsubscribe' etc. Many webpages use GET links to do these actions

      Then they were coded by morons. Section 9.1.1. of RFC 2616 (the HTTP 1.1 specification) explicitly states that GET should not be used for unsafe actions:

      In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe".

      This is nothing new, "web accelerators" have been doing this for a decade or so, and every time one becomes popular, these moron developers get bitten and end up blaming the web accelerators instead of accepting responsibility.
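
      As a rough illustration of the two defences discussed in this subthread, here is a sketch of a request check that refuses state-changing routes over GET or HEAD and declines requests marked as prefetches. The route names are hypothetical, and treating `X-moz: prefetch` as the marker is an assumption; the comments above only refer to "the prefetch header" without naming it:

```python
# Methods that RFC 2616 section 9.1.1 says ought to be considered "safe".
SAFE_METHODS = {"GET", "HEAD"}

# Hypothetical routes that change state on the server.
STATE_CHANGING_PATHS = {"/delete", "/unsubscribe"}


def is_prefetch(headers):
    """True if the request carries the assumed prefetch marker header."""
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    return normalized.get("x-moz") == "prefetch"


def decide(method, path, headers):
    """Return the HTTP status code this sketch would answer with."""
    if is_prefetch(headers):
        return 403  # decline speculative fetches outright
    if path in STATE_CHANGING_PATHS and method.upper() in SAFE_METHODS:
        return 405  # Method Not Allowed: destructive actions need POST
    return 200
```

      Under these assumptions, `decide("GET", "/delete", {})` answers 405 while `decide("POST", "/delete", {})` answers 200, so a prefetcher that only follows links cannot trigger the action. The POST handler would still need its own authorization and input checks.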

    2. Re:Bigger problems with web accelerator by Jeff+DeMaagd · · Score: 3, Interesting

      If it is prefetching everything, then I would have a problem with that, from a different perspective. That increases the amount of bandwidth used by fetching a lot of pages that might not be followed. That means increased bandwidth costs unless enough users use the system that Google's caching means most of the given files are already in their cache.

    3. Re:Bigger problems with web accelerator by poot_rootbeer · · Score: 4, Insightful

      links that say 'delete this' or 'unsubscribe' etc. Many webpages use GET links to do these actions

      In which case, many webpages are BROKEN AS HELL.

      Come on, "webmasters". I knew well enough to implement any irreversible actions as a form with method=POST to prevent spiders from triggering them back in 1998. There's no excuse for a professional web developer to make that mistake in 2005.

      Google being the global aggregator that it is, though, should have expected the worst and foreseen that this kind of thing would happen and planned for it. Disappointing.

    4. Re:Bigger problems with web accelerator by That's+Unpossible! · · Score: 2, Insightful

      Come on, "webmasters". I knew well enough to implement any irreversible actions as a form with method=POST to prevent spiders from triggering them back in 1998.

      So did these people. But this isn't a spider. This is a monkey piggy-backing on an AUTHENTICATED USER SESSION.

      And I, for one, say it is time to punch that monkey.

      --
      Ironically, the word ironically is often used incorrectly.
    5. Re:Bigger problems with web accelerator by mr3038 · · Score: 2, Insightful
      Someone I work for uses GET for everything. [...] This is why he uses so many if statements to accommodate people altering the links.

      You know that <form>s can be modified too, right? If you're writing an application that works through HTTP/Web browser then you just have to do a lot of checking (that's where the "if" comes in) to make sure that the client (browser/user agent/the real user) isn't trying to hack your system. If you don't do input validation for everything you might as well use GET for everything.

      I do sometimes use GET for state changing actions in web applications I write. In some cases some toggle actions that reflect only data display result in much better user interface if I use normal links instead of <button>s. Sometimes you have to accept some compromises when you're writing webapps that have to work without CSS, images and javascript and still be usable.

      --
      _________________________
      Spelling and grammar mistakes left as an exercise for the reader.
  17. Bad caching directives by Sebby · · Score: 5, Informative
    We encountered similar problems when we implemented aggressive caching on our site; mostly that we didn't set the headers properly.

    this site was pretty useful for information. So was AOL webmaster resources info.

    --

    AC comments get piped to /dev/null
  18. Cache-Control is your friend. by oneiros27 · · Score: 5, Informative
    If Google is ignoring Cache-Control headers, then that's one thing to complain about. There's also a good chance that some of these sites are using improper systems for session control (e.g., using REMOTE_ADDR without checking HTTP_X_FORWARDED_FOR, and not setting Cache-Control on their response).

    For more info about these known issues with HTTP caching, see the following
    --
    Build it, and they will come^Hplain.
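
    A small sketch of the server-side half of this advice, assuming a hypothetical helper that picks response headers depending on whether a page is personalized. The helper name and the 300-second default are illustrative, not taken from any particular site or framework:

```python
def cache_headers(personalized, max_age=300):
    """Pick response headers for a page (illustrative helper)."""
    if personalized:
        # Tell shared caches the response belongs to a single user,
        # and that it depends on the Cookie request header.
        return {"Cache-Control": "private", "Vary": "Cookie"}
    # Static or anonymous pages can be shared and reused for a while.
    return {"Cache-Control": f"public, max-age={max_age}"}


forum_page = cache_headers(personalized=True)
logo_page = cache_headers(personalized=False, max_age=3600)
```

    A logged-in forum view would go out with `Cache-Control: private`, while a shared asset stays cacheable, so the proxy still does useful work for content that is the same for everyone.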
  19. Some things I've noticed by Palos · · Score: 3, Informative
    I tried this for a little bit, and really am not impressed. Some basic issues:

    From a user's point of view:

    1 - Ignores hosts file, so I end up seeing ads I normally wouldn't see

    2 - Cookies work weirdly if at all; a lot more sites that I visit frequently appear to use cookies, and I've noticed some definite weirdness

    3 - The time saved on a broadband connection really seems minimal, after an hr or two of surfing it takes a few seconds

    4 - The pre-fetching it supports is already in Firefox and probably other browsers

    From a webmaster's point of view:

    1 - No way to limit caching of certain pages outside of moving them to SSL. Robots.txt isn't being followed (although probably rightly so, based on the application ).

    2 - Because of the flawed cookie support (at least right now) a lot of affiliate and different advertising methods have to be modified to support this.

    I'm a big Google fan, and I use most of their applications daily, but this one definitely needs some work. :)

  20. Something Awful's take on this by grazzy · · Score: 2, Informative
  21. Adsense clicks by broothal · · Score: 4, Interesting

    Has anyone read how Google will deal with AdSense clicks? Since all users of the accelerator will come from the same IP, will that IP decrease in value? (It's well known that the same IP can't just click again and again and generate revenue).

    1. Re:Adsense clicks by broothal · · Score: 3, Interesting

      Uhm yeah about that. I can see in my logs when I visit my own pages that I get two hits. One from a Google IP and one from my own IP. What gives?

  22. NoCache directive by Sir+Pallas · · Score: 4, Insightful

    Shouldn't those sites be using the NoCache directive and shouldn't Google be honoring it? I wonder which side is at fault. At any rate, fears about information leakage are kind of silly because of the volume of traffic that Google services. The accelerator allows them to see link patterns, but no one could store, let alone process, an entire day's worth of data after the fact. The same is true for Google Mail: no person ever sees your email; an algorithm does, and tailors simple, pertinent advertising in exchange for an otherwise free service. The accelerator can only make the search engine better for everyone. Anyone that uses it is giving back, contributing to the synergistic knowledge of Google.

  23. Not quite as serious as it sounds.. by Chris_Jefferson · · Score: 3, Interesting

    The business with appearing to be logged on isn't quite as serious as it sounds (although it is still bad).

    The problem appears to be that you will sometimes be given a page that was personalised for someone else. However, if you attempt to do anything from that page (for example if you find yourself looking like admin of a web board) you'll find that it doesn't work, any more than it would if someone emailed you a copy of a page where they were logged in as admin and you clicked on links (if you are on a website where doing that would work, you already have serious security problems). It also doesn't occur with SSL, as Google doesn't do anything with SSL pages (as you would hope).

    This is still a problem if that page shows something private of course, and should be fixed. (a password of course being the worst case, but how often do you see your actual passwords printed on a webpage?)

    --
    Combination - fun iPhone puzzling
    1. Re:Not quite as serious as it sounds.. by LiquidCoooled · · Score: 2, Informative

      What if that page was my account information?

      Ooooooooh looky here, I can see the details people would rather keep private.
      It's not just about clicking anywhere afterwards.

      --
      liqbase :: faster than paper
  24. For Webmasters : Blog Google Accelerator by lorenbake · · Score: 2, Interesting

    Read about all of the username, forum, and security risks?

    Since such activity could pose both a security risk to web surfers and site owners, there are some web sites which are interested in not having Web Accelerator pick up their material.

    A very fast and efficacious method of denying Google Web Accelerator (GWA) funneled traffic access to your web site is blocking the IPs it is calling your pages from:

    http://www.searchenginejournal.com/index.php?p=1676

  25. Re:Sooooo by bogie · · Score: 2, Insightful

    Did you Read The Fine Article?

    "I went to the Futuremark forums and noticed that I'm logged in as someone I don't know. Great, I've used Google's Web Accelerator for a couple of hours, visited lots of sites where I'm logged in. Now I wonder how many people used my cache. I understand it's a beta, sure, but something like that is totally unacceptable."

    I frankly don't know a ton about it since it fucked up my Firefox install, but others are giving the example of user X, who has mod status, browsing www.popularforum.com/modforum/userspasswords, so that Google now has a cache of that page that anyone can access. I don't know if that's true, but this is exactly why companies don't knowingly open their proxies to the outside world. Here you have the Entire World granted access to almost any page a user running Google's software goes to.

    If those claims are true then Google has a duty to pull this from the market immediately which they may very well do.

    --
    If you wanna get rich, you know that payback is a bitch
  26. one unhappy webmaster's account by august+sun · · Score: 3, Interesting
    http://www.somethingawful.com/articles.php?a=2858

    lowtax of SomethingAwful makes some interesting points amidst all his fuming but I'll have to defer to the /. tech wizards to vet his technical claims.

  27. If you're worried about privacy... by Momoru · · Score: 2, Insightful

    Don't use it! Google is a public corporation; everything they make is designed to somehow make a profit (which I see nothing wrong with, btw)...even if it doesn't cache your personal information like the article claims, there is some angle to it that will make money for them; maybe they will look at your web surfing habits and target ads to you. If you're one of those people who blindly trusts Google because of their "don't be evil" mission statement, then use it and trust that Google is taking care of you. I personally don't trust them, so I won't use it. There is no free lunch.

  28. caching personalized content != caching cookies by SuperBanana · · Score: 5, Informative
    How does caching your cookies to the internet help speed up your local browsing?

    Who said it was a cookie that was cached, and not the page content? Much of the discussion thus far seems based on an anonymous quote in a ZDNet article. Far as I can tell, the guy saw "Welcome back, Bob!" and freaked, when he wasn't -actually- logged in as Bob. Furthermore, who says it isn't Futuremark (or their forum software, because we all know how security-conscious PHP/MySQL forum software is) tagging their pages as cacheable when they shouldn't be? If Google is ignoring "don't cache this page", now yes, we have a problem, but the ZDNet story is of a technical level I'd expect of a community newspaper, so it's kind of hard to tell. It's like reading a story in your city newspaper that says "somebody killed by a cop!" and going off on a rant about police brutality...only to find out later the guy was a bank robber with an Uzi.

    Before you get all excited about bank sites etc- keep in mind those often use very unique URLs for each page and other tricks.

  29. Time to try this out on EBAY! by Thud457 · · Score: 4, Funny

    SEE?!!! I told you that if these corporate identity thefts kept up, we'd all end up having the same identity!

    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  30. Much more "beta" than most Google betas by RebornData · · Score: 4, Interesting

    I just deleted the accelerator from my system after trying it for the last day, and I must say that it is much less mature than most of the "Beta" products Google releases. It caused several significant issues with Firefox on my system, including:

    1. Links that open another window stopped working entirely (although they worked if I right-clicked and selected "open in new tab")

    2. Even after closing all Firefox windows, a firefox.exe process would remain running, and prevent any new firefox windows from being opened until it was manually killed

    3. "Proxy not available" errors when opening several pages at once, such as when using the Firefox "open in tabs" on a folder of bookmarks.

    And I haven't even checked into some of these cookie / privacy issues. Perhaps these issues are unique to my system, but my environment is pretty vanilla... I just run a few of the more popular Firefox plugins. Removing the GWA cleared up all of the problems cited above.

    Up to this point, I've always been very impressed with the level of testing that has gone into Google software products before they enter Beta. In this case, I'm not. Hope this isn't a sign of things to come.

    -R

  31. You say that, but... by Sialagogue · · Score: 4, Insightful

    How long has Google Groups been labelled Beta now, two years maybe? How many users does it have?

    If a wide number of even adventurous, risk-taking users could be exposed to a potentially significant security hole, then word should get out more widely than just Google's "thanks for the feedback" e-mail addresses.

    Beta is not the Greek word for "without responsibility." As much as we criticize Microsoft for making the idea of a "release date" (or "security") meaningless, I think Google's well on its way to making the idea of the "Beta Release" meaningless.

    They act like a small, groovy coding lab with Beta releases and all, but seemingly aren't simultaneously recognizing that because of their prominence in consumers' minds, *anything* they do has widespread impact on ordinary Net consumers. So a true, uncontrolled Beta release? That's fine for me when I just coded a little midi tool and want to run it past my friends, but there's really no such thing when you're Google.

    I think that the number of users that adopt even their least publicized tools takes them out of the realm of the real intent of a Beta release, especially when security issues are involved.

    --
    The only acceptable defense of scientific results is to say that they were the product of the Scientific Method.
    1. Re:You say that, but... by ajs · · Score: 4, Interesting

      "How long has Google Groups been labelled Beta now, two years maybe? How many users does it have?"

      So you would have them move it out of beta sooner? Not beta it? What's the solution you're proposing?

      Are you saying that software that Google issues in beta should be bug free, or are you suggesting that Google, being a search engine and all, should be scraping all of the Web's most popular forums as their bug reporting mechanism?

      I'm really not sure what you're proposing, here.

    2. Re:You say that, but... by nmk · · Score: 5, Insightful

      I think he's probably proposing that they should stop acting like pussies and start taking some responsibility for their software. Like he said Google has turned the very concept of the Beta into a joke. If MS was to keep a major piece of software in Beta for three or four years (as does Google), they would be accused of incompetence. I think the same should apply to Google.

  32. Futuremark's problem, not Google's by Temporal · · Score: 4, Informative
    I assume Google has properly implemented the HTTP/1.1 caching mechanisms. Among these, it is possible for a server to mark a page as being "private", meaning that it should never be cached in a public cache like Google's. Another thing the server can do is set "Vary: Cookie", which indicates that the server will produce different pages for people who give it different cookies.

    Here are the headers that the Futuremark forums give me when I am logged in:
    HTTP/1.1 200 OK
    Date: Fri, 06 May 2005 18:10:16 GMT
    Server: Apache/1.3.29 (Unix) mod_perl/1.29
    Transfer-Encoding: chunked
    Content-Type: text/html
    As you can see, neither "Cache-Control: private" nor "Vary: Cookie" is given. In fact, the server doesn't even give an expiration date for the content. Under these conditions, the HTTP/1.1 protocol says that it is perfectly OK for a cache to keep this page for a while and serve it to other people.

    This problem is firmly the fault of the people who wrote Futuremark's forums. This constitutes a major security hole in the WWWThreads forum package, because this problem will occur when using any standards-compliant HTTP cache. I would strongly recommend against the use of these forums on any web site until they fix their security problems.

    (I do not know if other forum software has this problem, but frankly it would not surprise me. It seems lots of PHP developers and other high-level web programmers have no idea how HTTP/1.1 works, and assume that headers are completely unimportant. I have written a web server and forum software myself, though, and I made damned sure that mine produces the right headers.)
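
    To make the reasoning above concrete, here is a deliberately simplified sketch of the decision a shared cache makes. The function names are hypothetical and the logic covers only the two headers discussed in this comment, not the full HTTP/1.1 caching rules; the header sets are the ones quoted in this subthread:

```python
def _tokens(value):
    """Split a comma-separated header value into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def shared_cache_may_store(headers):
    """False if Cache-Control forbids storing the response in a shared cache."""
    normalized = {name.lower(): value for name, value in headers.items()}
    directives = {t.split("=")[0] for t in _tokens(normalized.get("cache-control", ""))}
    return not (directives & {"private", "no-store"})


def varies_on_cookie(headers):
    """True if the response declares that it depends on the Cookie header."""
    normalized = {name.lower(): value for name, value in headers.items()}
    return "cookie" in _tokens(normalized.get("vary", ""))


# Response headers as quoted for the Futuremark forums.
futuremark = {
    "Date": "Fri, 06 May 2005 18:10:16 GMT",
    "Server": "Apache/1.3.29 (Unix) mod_perl/1.29",
    "Transfer-Encoding": "chunked",
    "Content-Type": "text/html",
}

# Response headers as quoted for the IIS 6 site in the reply to this comment.
iis6 = {
    "Server": "Microsoft-IIS/6.0",
    "Content-Type": "text/html",
    "Expires": "Thu, 05 May 2005 18:31:38 GMT",
    "Cache-Control": "private",
}

futuremark_shareable = shared_cache_may_store(futuremark) and not varies_on_cookie(futuremark)
iis6_shareable = shared_cache_may_store(iis6)
```

    In this sketch the Futuremark response comes out as storable and not keyed on cookies, so a shared cache could hand one user's page to another, while the IIS response is kept out of the shared cache entirely.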
    1. Re:Futuremark's problem, not Google's by Godeke · · Score: 2, Informative
      Interesting. Microsoft is doing the "right thing" with IIS6:
      Date: Fri, 06 May 2005 18:31:39 GMT
      Server: Microsoft-IIS/6.0
      X-Powered-By: ASP.NET
      Content-Length: 5905
      Content-Type: text/html
      Expires: Thu, 05 May 2005 18:31:38 GMT
      Cache-Control: private
      This is apparently the default.
      --
      Sig under construction since 1998.
    2. Re:Futuremark's problem, not Google's by Temporal · · Score: 2, Informative

      Of course, you should only slap the "private" header on pages which are actually private. Otherwise you're just killing the ability of the cache to do its job. But, marking everything private is better than marking nothing private; the former just reduces performance while the latter is a security problem.

      It looks like microsoft.com, which simply redirects to www.microsoft.com, is marked "private". That's excessive, and indicates to me that Microsoft's web designers don't understand cache-friendliness or weren't interested in implementing it.

    3. Re:Futuremark's problem, not Google's by Godeke · · Score: 3, Informative
      No, my point was exactly that "marking everything private is better than marking nothing private": this was the header from a site I built. Now that I'm aware of the ramifications, I can remove that header from the appropriate pages (the few that are not data driven). But I far prefer the default this way to discovering "oh yay, all my data driven pages are stupidly cached". Right now the site is just rude and uninformed, not broken.

      As far as Microsoft's sites go, I really couldn't care less how stupid their choices are; I'm just glad I can now implement it properly by adding the change where necessary instead of having egg on my face for not having a piece of information when I built the site. While building the site, the only cache I considered was the browser cache. Bad, but not as bad as what I'm finding on my personal PHP driven sites on this same issue. There I just look stupid:
      Date: Fri, 06 May 2005 20:00:49 GMT
      Server: Apache/1.3.33 (Unix) mod_jk2/2.0.0 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.22 OpenSSL/0.9.6b PHP-CGI/0.1b
      Last-Modified: Fri, 30 Nov 2001 20:02:22 GMT
      Etag: "2bed0b-1b27-3c07e5ce"
      Accept-Ranges: bytes
      Content-Length: 6951
      Keep-Alive: timeout=10, max=100
      Connection: Keep-Alive
      Content-Type: text/htm
      (Um, yeah, haven't updated that ugly site in four years).
      --
      Sig under construction since 1998.
    4. Re:Futuremark's problem, not Google's by Temporal · · Score: 2, Informative

      Yes, I agree, "Cache-Control: private" should be default for any dynamic site unless the developer states otherwise. It was a good idea for IIS to do it that way.

    5. Re:Futuremark's problem, not Google's by supersat · · Score: 2, Informative

      You are quite correct. I observed the same thing with the Futuremark forums.

      A lot of people on LiveJournal were whining about it, but LiveJournal correctly includes a Cache-Control header, and after several extensive tests, I've found that Google's Web Accelerator does not cache anything it shouldn't.

      When you first request a page, it sends the request to Google along with some of the request headers (which may contain cookies). Google then sends back a response with a special X-Google-Cache-Control header that instructs the client what to do next. In LiveJournal's case, it sends back X-Google-Cache-Control: remote-fetch, which causes the client to directly fetch the page from LiveJournal. The page contents are not transferred back to Google. Subsequent loads of the page cause only a few bytes to be exchanged with the Web Accelerator server.

      Interestingly enough, with a packet sniffer, you can see what it prefetches. When you go to Google.com, it begins fetching hotmail.com, ebay.com, and cnn.com. That says a lot about the typical user.
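      The exchange described above could be modelled roughly like this. It is a guess at the client's decision, not Google's code: only the X-Google-Cache-Control header name and its remote-fetch value come from what I observed; the function name, the returned labels, and the fallback branch are assumptions.

```python
def next_step(proxy_response_headers: dict) -> str:
    """Decide what the client does after the proxy answers (hypothetical model)."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    headers = {name.lower(): value for name, value in proxy_response_headers.items()}
    directive = headers.get("x-google-cache-control", "").strip().lower()
    if directive == "remote-fetch":
        # The client fetches the page straight from the origin site,
        # so the page contents never pass through the proxy.
        return "fetch-from-origin"
    # Assumed fallback: use whatever the proxy returned.
    return "use-proxy-response"


if __name__ == "__main__":
    print(next_step({"X-Google-Cache-Control": "remote-fetch"}))
```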

  33. Re:Maybe i don't understand how it works? by Enigma_Man · · Score: 2, Insightful

    I'll stop when people stop deserving it. I haven't missed the whole point of this discussion at all; in fact, I was the one who originally explained to the parent why he was wrong. Google caching might cache cookies, but not ONLY cookies; understand, comprende?

    -Jesse

    --
    Nothing says "unprofessional job" like wrinkles in your duct tape.
  34. Some code to block GWA from application pages by DamienMcKenna · · Score: 2, Informative

    Here's some code to add to your web pages to block GWA. This will leave static media alone, which is fine.

    PHP:
    // GWA marks its prefetch requests with an "X-moz: prefetch" header.
    if (array_key_exists('HTTP_X_MOZ', $_SERVER))
    {
        if (strtolower($_SERVER['HTTP_X_MOZ']) == 'prefetch')
        {
            // Refuse the prefetch and tell caches not to keep the refusal.
            header("HTTP/1.1 403 Forbidden");
            header("Content-Type: text/html; charset=iso-8859-1");
            header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
            header("Cache-Control: no-store, no-cache, must-revalidate");
            // FALSE appends this header instead of replacing the previous one.
            header("Cache-Control: post-check=0, pre-check=0", FALSE);
            header("Pragma: no-cache");
            header('Accept-Ranges:');
            exit();
        }
    }

    CFML:

    Damien

    1. Re:Some code to block GWA from application pages by DamienMcKenna · · Score: 2, Informative

      Rather, it should be...

      <!--- block Google Web Accelerator prefetch requests --->
      <cfif structKeyExists(cgi, 'HTTP_X_MOZ')>
        <cfif cgi.HTTP_X_MOZ EQ 'prefetch'>
          <cfheader statuscode="403" statustext="Google Web Accelerator requests are forbidden." />
          <cfabort />
        </cfif>
      </cfif>

      Small typo on the second variable name. Doh!

      Damien
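
      For sites that run on neither PHP nor ColdFusion, the same check can be sketched as WSGI middleware in Python. This is only an illustration of the idea: the names block_prefetch and middleware are made up, and it assumes the prefetch requests carry the X-moz: prefetch header tested for above.

```python
def block_prefetch(app):
    """Wrap a WSGI app and answer 403 to requests flagged as prefetches."""
    def middleware(environ, start_response):
        # WSGI exposes the "X-moz" request header as HTTP_X_MOZ.
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                # Keep the refusal itself out of any cache.
                ("Cache-Control", "no-store"),
            ])
            return [b"Prefetch requests are forbidden.\n"]
        # Ordinary requests pass through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

      Wrapping only the application entry point leaves static media alone, as long as the web server serves those files directly.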

  35. Google is becoming a threat by Everyman · · Score: 2, Interesting

    Why is Google doing this?

    If the purpose is to speed up web access, then why couldn't all this gzip compression, prefetching, and so forth, be handled on your local drive without going through Google? Wouldn't that be faster? Not everyone lives next door to a Google data center (not yet, anyway), and there is latency when you hop around the web to get stuff from Google. The accelerator installation file isn't exactly lean (1.4 meg), so I don't understand why Google has to broker all of this stuff on their servers.

    Google claims that there's no more of a privacy issue with this thing than there is with your ISP. However, I think most ISPs are a bit different than Google.

    My ISP has no reason to store its logs indefinitely. Google has every intention of storing everything about me forever. My ISP rotates their logs regularly, while Google indexes and compresses their logs using globally-unique IDs, and stashes it away for future reference. My ISP is not the world's largest advertiser, but Google is determined to "know more about you" (Eric Schmidt's words) for profiling purposes. My ISP has a real privacy policy, and I believe that they would demand a subpoena before giving out information about my surfing behavior. Google has never suggested that they even require a subpoena from officials, so I have to assume that they have a very cozy relationship with various governments.

    All that is from the user's perspective. What about webmasters?

    The web accelerator ignores robots.txt. The web accelerator ignores the NOARCHIVE meta. I believe, but have yet to confirm, that it ignores any no-cache pragma headers. It avoids prefetching anything with a question mark in the URL, but what about all those PATH_INFO dynamic links we've been installing for the last four years so that our dynamic pages look like static URLs? Google prefetches many of these, and there are numerous reports that this prefetching, along with some cookie mishandling by Google, is breaking sites out there. Does Google care?
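
    To see why the PATH_INFO links slip through, consider a naive question-mark test of the kind described above. This is a sketch of the heuristic as reported, not the accelerator's actual code, and the example.com URLs are made up.

```python
def looks_dynamic(url: str) -> bool:
    """Naive check: treat a URL as dynamic only if it contains a question mark."""
    return "?" in url


if __name__ == "__main__":
    # A classic query-string link is recognised as dynamic.
    print(looks_dynamic("http://example.com/forum/view.php?id=7"))
    # The same page addressed through PATH_INFO looks static to this test.
    print(looks_dynamic("http://example.com/forum/view.php/id/7"))
```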

    Why isn't there a sitewide opt-out option for this monster? Heck, it's so bloody dangerous for both the user and the webmaster that it ought to be opt-in instead of opt-out.

    All webmasters should block this thing. If a user cannot get to your site because of this block, then at least you as a webmaster won't be complicit. We have to protect users from Google's megalomania, because they've been so dumbed-down by Google worship over the last few years that they can no longer think straight.

    1. Re:Google is becoming a threat by JahToasted · · Score: 2, Insightful
      so I don't understand why Google has to broker all of this stuff on their servers.

      Never heard of the slashdot effect? Well if everyone is using this, it will eliminate it. Google downloads the site's content, everyone downloads from google, site stays up.

  36. Re:Um.... 6 fingers? by VE3ECM · · Score: 2

    Um, I'm an idiot. I was thinking of "Mrs. Thumb and her 4 lovely daughters", and not Rosie PALM and her five sisters.
    I wanna go home.

  37. Re:Sooooo by noidentity · · Score: 2, Insightful

    Your ISP could do the same stuff people claim google can do (as far as tracking).

    Except my ISP is much smaller and is in the internet service business rather than the advertising business.

  38. No, it's not a proxy bug... by Otto · · Score: 2, Interesting

    It's not a bug with the proxy software, it's a bug with those forums.

    Caching proxies have been around for several years now, and this is not a new problem. Any webmaster worth his salt should know about this, and any dynamic content (especially a piece of forum software) should know damn well to properly implement expiration dates and cache control directives.

    If the WWWBoard software at Futuremark was doing the right thing in the first place, this wouldn't be a problem. It's Futuremark's and WWWBoard's security bug, not GWA's or any other caching proxy's.

    The only reason people are bitching about this is because GWA is one of the first caching proxy systems out there to hit widespread use by people who've never used one before. The concept itself is not new by a long shot, and there are established guidelines to follow when you develop web software to deal with them. If you fail to follow these guidelines, then yeah, your site will break and you create a security risk like WWWBoard has clearly done. Upgrade/fix your forum software.

    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
  39. Response by Otto · · Score: 5, Informative
    The web accelerator ignores robots.txt.


    The web accelerator is not a robot, so this is correct behavior.

    The web accelerator ignores the NOARCHIVE meta.


    NOARCHIVE is a Google-specific extension to the robots meta tag (it is not part of robots.txt), and again, this is not a robot.

    I believe, but have yet to confirm, that it ignores any no-cache pragma headers.


    I'd be absolutely shocked if that were actually the case. I also believe it respects the Expires header as well as the Cache-Control header.

    It avoids prefetching anything with a question mark in the URL, but what about all those PATH_INFO dynamic links we've been installing for the last four years so that our dynamic pages look like static URLs? Google prefetches many of these, and there are numerous reports that this prefetching, along with some cookie mishandling by Google, is breaking sites out there. Does Google care?


    If they're following the proper standards, then it's not their place to care or not. If your website doesn't properly specify cache-control (many don't) then you get what you get.

    For any pages with user-specific content, add the "Cache-Control: private" header and voila, problem solved for you.

    If you want to opt out entirely, then a simple "Cache-Control: no-cache" header in your HTTP responses would do the trick, as would "Pragma: no-cache", I bet.

    Furthermore, there is no cookie mishandling that I've actually seen, and I've tested it. It passes cookies through just fine, without caching them, near as I can tell.
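
    Seen from the cache's side, the rules above boil down to something like the sketch below. It is deliberately conservative and simplified: strictly speaking no-cache allows storing as long as the copy is revalidated before reuse, and honoring Pragma on responses is a courtesy rather than a requirement. The function name is made up, and this is not a claim about how GWA is implemented.

```python
def shared_cache_may_store(response_headers: dict) -> bool:
    """Conservative check a shared cache could run before storing a response."""
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in response_headers.items()}
    # Split "private, max-age=0" into bare directive names: {"private", "max-age"}.
    directives = {
        part.strip().lower().split("=")[0]
        for part in headers.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"private", "no-store", "no-cache"}:
        return False
    # Older HTTP/1.0-style opt-out.
    if "no-cache" in headers.get("pragma", "").lower():
        return False
    return True
```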
    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.