Slashdot Mirror


Google Will Anonymize IP Logs Faster

An anonymous reader writes "The BBC reports on some changes to the data retention policy at Google in response to pressure from European authorities, but also included in the article is information about why Google claims they need to retain non-anonymised data for so long. Improving services, sure, but preventing fraud? Aiding 'valid legal orders'?" Reader s0ckratees points to some commentary on the change at Google's official blog. The upshot: IP addresses in Google's logs will be anonymized after nine months, rather than 18 as previously.

6 of 97 comments

  1. Improving services, sure, but preventing fraud? by Richard_at_work · · Score: 4, Insightful

    Improving services, sure, but preventing fraud?

    Sure - AdWord fraud. Scrubbing logs quicker means less leeway for click fraud to be discovered.

  2. Re:9 Months by lysergic.acid · · Score: 4, Insightful

    considering the amount of data Google processes on a regular basis, a 9 Month backlog isn't that unreasonable.

    i'm more concerned about Google not handing my data over to 3rd parties or governments than their retaining records of my searches. as long as they're willing to stand up for the rights of users, they can hold my search data for as long as they need to improve search results, reduce spam, and develop personalized search features.

  3. Re:Just out of interest by sakdoctor · · Score: 4, Insightful

    Salting goes without saying. -1, Uninsightful.

    I'm talking about the fact that it's 2008, and that search space could be exhaustively searched in a matter of hours on a desktop machine.

    As the poster below me points out, "throw away the salt" is an answer, but it means the logs can only be compared to other logs in the time frame that you were using that salt.

    Maybe IPv6 will make anonymized logs more feasible because of the 2^128 search space.
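    The point about the small search space can be illustrated with a minimal sketch. It assumes the logs store a salted SHA-256 of each IPv4 address and that the salt is known to whoever holds the logs; neither detail comes from Google, and the names anonymize and recover are hypothetical. To keep the run short it searches only a /16 (65,536 addresses), but the same loop over all 2^32 IPv4 addresses is the exhaustive search described above.

```python
import hashlib
import ipaddress


def anonymize(ip: str, salt: bytes) -> str:
    """Hypothetical log anonymizer: salted SHA-256 of the packed IPv4 address."""
    return hashlib.sha256(salt + ipaddress.IPv4Address(ip).packed).hexdigest()


def recover(digest: str, salt: bytes, network: str):
    """Hash every address in `network` until one matches `digest`.

    Returns the matching address as a string, or None if nothing matches.
    """
    for addr in ipaddress.ip_network(network):
        if hashlib.sha256(salt + addr.packed).hexdigest() == digest:
            return str(addr)
    return None


# Assumed values for illustration only (192.0.2.0/24 is a documentation range).
salt = b"example-salt"
digest = anonymize("192.0.2.77", salt)

# With the salt in hand, the "anonymized" value maps straight back to the IP.
recovered = recover(digest, salt, "192.0.0.0/16")
print(recovered)

# Once the salt is thrown away, the same search finds nothing.
print(recover(digest, b"some-other-salt", "192.0.2.0/24"))
```

    Discarding the salt defeats this search, at the cost already mentioned: hashes made under different salts can no longer be compared with each other.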

  4. Re:So if you live in china by Anonymous Coward · · Score: 4, Insightful

    You may not like that Google keeps data, but they have an almost perfect record for keeping it private from others. Or did you not see the fuss they raised over YouTube data, and how even after being ordered to turn over their data, they still fought to reach a compromise that protected user privacy?

    As for China, there's a reason Google keeps literally zero servers on Chinese soil. Even data for Chinese nationals is kept out of China, specifically so Google won't have to turn it over.

    Short of not keeping data at all, there is pretty much nothing more they can do to protect privacy. But that's never enough for Slashdot...

  5. Re:So if you live in china by yuna49 · · Score: 5, Insightful

    China is the least of my concerns. How about the Justice Department or the Department of Homeland Security?

    The Europeans might be pressuring Google to reduce its retention periods, but I suspect that Google heard the opposite point-of-view from the government here in the USA.

    Frankly I think that none of Google's logs should carry identifying information. If they need to track IPs for some reason, put them in a separate database table that's unconnected to the contents of the search strings. Keeping this information much beyond a week or two seems unreasonable to me.

  6. Re:9 Months by lysergic.acid · · Score: 4, Insightful

    first off, Google's processing capacity isn't static, it's constantly growing. just because it takes more than 24 hrs to process a certain set of data doesn't mean that the backlog will increase without limit. that isn't a logically sound argument.

    if you take that argument and reduce the time frame from 1 day to 1 hour->1 minute->1 millisecond... so on and so forth, you reach the conclusion that if Google is unable to instantaneously process/analyze every piece of data the exact moment it is received or created, then their backlog will increase without limit.
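    a rough sketch of the first point, with made-up numbers: arrivals are held flat at 100 units a day, capacity starts at 80 units a day and grows 5% daily. none of these figures describe Google, and the function name simulate_backlog is hypothetical. the backlog climbs while capacity is below the arrival rate, then drains once capacity overtakes it.

```python
def simulate_backlog(days, arrivals_per_day, initial_capacity, daily_growth):
    """Return the unprocessed backlog at the end of each day.

    Each day `arrivals_per_day` units arrive and up to `capacity` units are
    processed; capacity then grows by the fraction `daily_growth`.
    """
    backlog = 0.0
    capacity = initial_capacity
    history = []
    for _ in range(days):
        backlog = max(0.0, backlog + arrivals_per_day - capacity)
        history.append(backlog)
        capacity *= 1 + daily_growth
    return history


# Assumed figures for illustration only.
history = simulate_backlog(days=30, arrivals_per_day=100.0,
                           initial_capacity=80.0, daily_growth=0.05)

peak_day = history.index(max(history))
print(f"backlog peaks on day {peak_day} and ends at {history[-1]:.1f}")
```

    if arrivals grew faster than capacity the backlog would keep growing, so the conclusion depends entirely on which rate is larger, not on how long any one batch takes.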

    sometimes data needs to be accumulated before it can be processed. for instance, to observe search trends, or to compare e-mails for spam analysis, etc. sometimes logs need to be kept for extended periods of time--that's why they're called logs--or data is retained for repeat analysis.

    i don't know what exactly Google retains user data for or what kind of analysis they do, but it's understandable if some data needs to be retained in its original state for certain types of research or analysis. if they were going to release network measurement data to 3rd parties, as that paper you linked to discusses, then, yes, i would expect Google to follow their own anonymization guidelines. but like they've stated in their press release, it's all about finding a balance between protecting user privacy and improving the quality of their services.

    perhaps the best thing to do is to give users the option to have their search requests retained for improving personalized search results, and let them enable/disable this feature as it suits them. all other data will simply be processed for a set period of time and then expunged.

    if they're not releasing server logs to anyone, anonymization isn't really necessary. though i'm sure they allow users to access their services through anonymous proxies.