Google Will Anonymize IP Logs Faster
An anonymous reader writes "The BBC reports on some changes to the data retention policy at Google in response to pressure from European authorities, but also included in the article is information about why Google claims they need to retain non-anonymised data for so long. Improving services, sure, but preventing fraud? Aiding 'valid legal orders'?"
Reader s0ckratees points to some commentary on the change at Google's official blog. The upshot: IP addresses in Google's logs will be anonymized after nine months, rather than 18 as previously.
Scrape the log
To sparkling shine
So the chin
Hairless, divine
Burma Shave
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
And if the government wants to know who's been searching for things they don't approve of, they have to ask Google for the logs every 9 months rather than every 18 months.
but preventing fraud? Aiding 'valid legal orders'?
While I would say IP addresses shouldn't be the only method for these protections, they do help.
Wow, every site within 123.45.67.x seems to have a virus or malware on it. Oh, a new site was scanned and its address is in 123.45.67.x; let's not publish right away, let's put it through a full check. Or say 98.76.54.* has always had clean sites that were legit. A new site is found there? Well, let's put it through the quick checks, post it, and queue it for a full scan later.
Yes, knowing the IP address and keeping track of it for months, even years, can be handy. The more data you have, the better the decisions you can make.
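A minimal sketch of the subnet-reputation idea described above, assuming a /24 grouping and made-up thresholds. The class name, the policy labels, and the choice to treat unknown subnets cautiously are all hypothetical; nothing here describes how Google actually does it.

```python
import ipaddress
from collections import defaultdict


class SubnetReputation:
    """Counts clean vs. infected scan results per subnet (hypothetical helper)."""

    def __init__(self, prefix_len=24, min_samples=5, bad_ratio=0.5):
        self.prefix_len = prefix_len      # assumed grouping: /24, as in 123.45.67.x
        self.min_samples = min_samples    # assumed: history needed before trusting a subnet
        self.bad_ratio = bad_ratio        # assumed: infected share that marks a subnet as bad
        self.counts = defaultdict(lambda: [0, 0])  # subnet -> [clean, infected]

    def _subnet(self, ip):
        # strict=False lets us pass a host address and get its enclosing network
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def record(self, ip, infected):
        """Store the result of one completed scan."""
        self.counts[self._subnet(ip)][1 if infected else 0] += 1

    def policy(self, ip):
        """Decide how to treat a newly found site at this address."""
        clean, infected = self.counts.get(self._subnet(ip), (0, 0))
        total = clean + infected
        if total < self.min_samples:
            return "full_scan_first"      # assumption: unknown subnets are treated cautiously
        if infected / total >= self.bad_ratio:
            return "full_scan_first"      # bad neighbourhood: hold until fully checked
        return "quick_check_then_queue"   # good neighbourhood: publish, full scan later
```

The longer the scan history is kept, the more subnets clear the `min_samples` bar, which is the trade-off against retention the comment is pointing at.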
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
YouTube == Google So yes.
Improving services, sure, but preventing fraud?
Sure - AdWords fraud. Scrubbing logs sooner means less time in which click fraud can be discovered.
Throw away the salt.
You still have the "Netflix problem", but that was related to the users publicly disclosing large amounts of information on a different site, which probably wouldn't come into play here.
Nerd rage is the funniest rage.
Google is handing data over to a few 3 letter agencies. BIG SHOCK! OH NO! NSA Reads my email!
Seriously, I put google not handing over such data at somewhere between 0 and -1.
The summary reads "...Google claim they need to retain..." The use of *claim* rather than *claims* suggests that Google is being viewed as something other than a single entity. Am I missing something, or was that just a typo?
I'm sure their partners will retain the good stuff.
Figure out a rough average for a DHCP lease... say 72 hours, and anonymize after that?
Who is general failure, and why is he reading my hard drive?
Actually, the IP should not be stored at all. Google might want to analyze IPs to detect and prevent attacks on its servers, and additionally to get location information for its ad services. But there is no need to store them for a longer period -- unless you want to start massive data mining projects, which is exactly what is feared most from a privacy point of view.
So, any good news would be that the IP is not stored at all (except very temporarily).
British English, dude. In some registers, collectives are treated as plural, not singular.
Search BBC stories for "Microsoft are" and such. (Whether that somewhat informal register should be used in BBC pieces is another topic entirely...)
considering the amount of data Google processes on a regular basis, a 9 Month backlog isn't that unreasonable.
i'm more concerned about Google not handing my data over to 3rd parties or governments than their retaining records of my searches. as long as they're willing to stand up for the rights of users, they can hold my search data for as long as they need to improve search results, reduce spam, and develop personalized search features.
What difference does it make to reduce this 18 months to 9 months log retention period?
Will Google anonymize logs in other countries too?
How about Google China? It respectfully hands over logs to the authorities on demand anytime. Same with Google India.
slashdot rocks
Salting goes without saying -1 uninsightful
I'm talking about the fact that it's 2008, and that search space could be exhaustively searched in a matter of hours on a desktop machine.
As the poster below me points out, "throw away the salt" is an answer, but it means the logs can only be compared to other logs in the time frame that you were using that salt.
Maybe IPv6 will make anonymized logs more feasible because of the 2^128 search space.
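For a sense of scale, here is a back-of-the-envelope sketch of the search-space argument. The hash rate is an assumed, illustrative figure and not a measurement of any real machine; only the ratio between the two address sizes matters.

```python
def exhaustive_search_seconds(address_bits, hashes_per_second):
    """Time to hash every possible address of the given width once."""
    return 2 ** address_bits / hashes_per_second


RATE = 1_000_000  # assumed hashes per second; purely illustrative

ipv4_hours = exhaustive_search_seconds(32, RATE) / 3600
ipv6_years = exhaustive_search_seconds(128, RATE) / (3600 * 24 * 365)

print(f"IPv4 (2^32 addresses):  about {ipv4_hours:.1f} hours")
print(f"IPv6 (2^128 addresses): about {ipv6_years:.1e} years")
```

One caveat: IPv6 addresses are handed out in structured prefixes, so the space an attacker would actually need to search may be far smaller than the full 2^128.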
Get your own free personal location tracker
How do you Anonymize IP logs?
By using Scroogle.
Note to mods:
I got my karma for this post here, don't mod me up again for the same information <grin>.
Do not meddle in the affairs of geeks for they are subtle and quick to anger
It appears this 18 months, or 9 months as it is now, does not apply to Google Web History when you are logged into your google account. My Web History log goes back to April 2005.
I for one am glad they are not deleting the Web History log at 9 months. It is nice to be able to peruse through my searches done years ago.
...'George Will Anonymizes IP Logs Faster'?
I gotta loosen my bow tie a bit and get back to work.
Eagles may soar, but weasels don't get sucked into jet engines.
Yes, I was treating Google as a collective noun, and yes, I'm British.
(I submitted the article. Amusingly, I appear to have anonymised myself while doing so...)
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
And just what evil have they done with this data, exactly?
I'm hypertensive, you insensitive clod.
I always thought Google should buy Sealand or some other country and move its operations there, outside of United States laws. It would do a lot of good if we had a country that didn't have such crap... abuses of the new country's laws, or lack of laws, notwithstanding.
Even if IP addresses are removed or changed, search history is still not anonymous unless the records of search results get shuffled together. As long as all the search terms and results from one address are kept together in one record or otherwise tied to a unique identifier of any kind, your search habits and results can still be traced back to you. Anonymous search data is a myth.
Now I only have to escape 9 months after I blow up a school! And then they'll never know I searched google to make the bomb. Thanks Google!
"The best way to accelerate a Macintosh is at 9.8m/sec^2" -Marcus Dolengo
So I generate a table of all 2^32 IP addresses and their MD5 hashes, with the addresses themselves as the salt. The salt doesn't enlarge the search space in this situation, and I can then easily do a binary search to find what the original IP was.
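The table attack described above can be sketched on a small scale. This example assumes the "salt" is simply the address concatenated with itself, covers only a /24 documentation range so it runs instantly, and uses a dictionary lookup in place of the binary search; covering all 2^32 addresses is the same loop with more time and storage.

```python
import hashlib
import ipaddress


def pseudonymize(ip):
    """MD5 of the IP 'salted' with itself (assumed scheme, for illustration)."""
    return hashlib.md5((ip + ip).encode("ascii")).hexdigest()


def build_table(cidr):
    """Precompute digest -> IP for every address in the network."""
    return {pseudonymize(str(addr)): str(addr)
            for addr in ipaddress.ip_network(cidr)}


# 192.0.2.0/24 is a range reserved for documentation examples.
table = build_table("192.0.2.0/24")

logged_value = pseudonymize("192.0.2.77")  # what a "pseudonymized" log would hold
recovered = table[logged_value]            # the attacker reverses it by lookup
print(recovered)
```

Because the salt is derived from the input, the attacker can compute it for every candidate, so it adds nothing to the work required.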
Use a new encryption key every day. Throw away the key that is 9 months + 1 day old. Problem solved.
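A rough sketch of that key-rotation scheme, with two substitutions stated plainly: it uses a keyed hash (HMAC-SHA256) rather than encryption, and it approximates nine months as 270 days. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets
from datetime import date, timedelta


class RotatingPseudonymizer:
    """Pseudonymizes IPs with a per-day secret key that is later destroyed."""

    def __init__(self, retention_days=270):  # assumed stand-in for "9 months"
        self.retention_days = retention_days
        self.keys = {}  # date -> random 32-byte key

    def pseudonym(self, ip, day):
        # Create the day's key on first use; reuse it for the rest of that day.
        key = self.keys.setdefault(day, secrets.token_bytes(32))
        return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()

    def expire(self, today):
        """Delete every key older than the retention window."""
        cutoff = today - timedelta(days=self.retention_days)
        for day in [d for d in self.keys if d < cutoff]:
            del self.keys[day]
```

Once a day's key is deleted, brute-forcing the 2^32 address space no longer helps, because the attacker cannot recompute the pseudonyms without the key. The cost, as noted earlier in the thread, is that entries can only be correlated with other entries from the same key period.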
Their privacy policy specifically says that they DO keep logs of your IP and search queries.
Jeremy http://alucinari.net
Tor isn't great for high bandwidth connections, but I think it's just perfect to make sure all of those do-gooder large corporations don't get a choice about anonymizing IP addresses.
http://www.torproject.org/
Absolute statements are never true
considering the amount of data Google processes on a regular basis, a 9 Month backlog isn't that unreasonable.
Sure it is. Why? Because they are collecting data continuously and if it takes a long time to process what they've collected, more data is backlogged, and it keeps spiraling out of control. In fact, if it takes more than 24 hours to process 1 day of data, the backlog will increase without limit. The proper thing to do is to apply proper anonymization to the information immediately so you don't have to worry about it. There are plenty of methods that allow you to anonymize important information while retaining enough data for analysis. Here's one paper [Warning: PDF] on the subject.
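The backlog claim can be checked with a toy model. This sketch assumes constant daily arrival and processing rates, which may not hold for a system whose capacity is growing; the function name and units are made up for illustration.

```python
def backlog_after(days, arrival_per_day, capacity_per_day):
    """Unprocessed data left after `days`, in units of 'one day of data'."""
    backlog = 0.0
    for _ in range(days):
        # Each day new data arrives and a fixed amount is processed;
        # the backlog can never drop below zero.
        backlog = max(0.0, backlog + arrival_per_day - capacity_per_day)
    return backlog


# Processing only 80% of a day's data per day: the backlog grows steadily.
print(backlog_after(270, 1.0, 0.8))

# Processing 120% of a day's data per day: the backlog stays at zero.
print(backlog_after(270, 1.0, 1.2))
```

Under these assumptions the backlog grows linearly whenever capacity is below the arrival rate and stays bounded otherwise, so the conclusion depends entirely on whether capacity keeps pace.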
Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.
I'd like to know if they also commit to anonymizing the client ID that is associated with every Chrome installation and the associated history tied to your account. After all, what's the point of anonymizing the IP data if your Chrome installation is tracking everything anyway? The same company would hold all the same information.
I should have been way more specific here.
What I'm saying is that IP (v4) addresses are uniquely problematic for being pseudonymized from the perspective of a web master, because of the tiny search space.
You wouldn't choose a password made of only 10 digits, would you?
Say the threat model here is you are running a website and you get subpoenaed.
It would be great to be able to say, "OK here is a list of hashes of IP addresses, that's all I've got, have fun." ...but you can't do that for the reason I said above. If you then throw away the hash you can enjoy being held in civil contempt, and with the hash they could brute-force them all in trivial time.
they are scrubbing it out of good will more than anything.
Well, not really. The European advisory group was recommending a 6 month maximum, and Google were at 18. As Microsoft have learned, Europe is not shy about going after megacorps that think they are above its laws, and privacy and data protection issues are hot political topics in various EU countries right now, with a lot of media coverage of leaked data and rising public awareness of the dangers associated with such things.
This was done out of "good will" for the same reasons that industries accept "voluntary" regulation schemes that appear not to be in their own best interests: because the alternative is to have compulsory regulation and legal sanctions applied, and that costs a heck of a lot more and in some cases threatens negligent company directors personally. At 18 months, the first big leak could have seen the political powers turn strongly against Google at a time when their continued strength in several fields is already in jeopardy. At 9 months instead of 6, it's more of a misdemeanour than a felony, as it were.
first off, Google's processing capacity isn't static, it's constantly growing. just because it takes more than 24 hrs to process a certain set of data doesn't mean that the backlog will increase without limit. that isn't a logically sound argument.
if you take that argument and reduce the time frame from 1 day to 1 hour->1 minute->1 millisecond... so on and so forth, you reach the conclusion that if Google is unable to instantaneously process/analyze every piece of data the exact moment it is received or created, then their backlog will increase without limit.
sometimes data needs to be accumulated before it can be processed. for instance, to observe search trends, or to compare e-mails for spam analysis, etc. sometimes logs need to be kept for extended periods of time--that's why they're called logs--or data is retained for repeat analysis.
i don't know what exactly Google retains user data for or what kind of analysis they do, but it's understandable if some data needs to be retained in its original state for certain types of research or analysis. if they were going to release network measurement data to 3rd parties, as that paper you linked to discusses, then, yes, i would expect Google to follow their own anonymization guidelines. but like they've stated in their press release, it's all about finding a balance between protecting user privacy and improving the quality of their services.
perhaps the best thing to do is to give users the option to have their search requests retained for improving personalized search results, and let them enable/disable this feature as it suits them. all other data will simply be processed for a set period of time and then expunged.
if they're not releasing server logs to anyone, anonymization isn't really necessary. though i'm sure they allow users to access their services through anonymous proxies.
They are evil by their very nature - provided the definition of evil is watered down to Slashdot standards.
Evil meaning: they make money.
Bastards! How DARE they make money?!
I retract in full my previous statement in favour of the notion that anyone(thing?) with money is inherently evil.
The EU should not say anything unless they do something about the directive on "The retention of data generated or processed in connection with the provision of publicly available electronic communications services or of public communications networks and amending Directive 2002/58/EC", which makes member countries pass laws that require ISPs to save IP addresses etc. for anywhere from 6 months to 2 years.
7 days is good enough, why keep logs anyway? it just opens the system to exploitation.
An SQL query goes to a bar, walks up to a table and asks, "Mind if I join you?"
The same way /. Anonymizes email addresses.
216.34.REVOME.THIS.181.45
You sir have never run a large site.
depends on your definition.
If you mean more than a single server, then no. I do keep server logs, but I can't recall the last time I looked at them.