Google to Anonymize Users' Search Data
Google's official blog says the company is making an effort to anonymize its search data after 18-24 months. After previously fighting against turning over search data to the feds, it looks like they are striking another blow to the "think of the children" crowd. Any bets on whether MSN or Yahoo! will follow suit?
..the "off the record" button, in the first place!
gtkaml.org
All they have to do is erase the logs every day or just not keep them. It doesn't "take an effort". Anonymous proxies have been doing this for years.
Global warming is a cube.
And how are they going to comply with the EU regulations, which stipulate a much longer retention time?
Finally we search for those pics of Britneys vagina without fear of harming our permanent google record.
Although I did have to install the AnonymizeGoogle Firefox plugin to get it.
anonymizing it straight away! That would be an even quicker solution to the problem.
I used to have a better sig but it broke.
Why not anonymise the data after zero months? Are they required by law not to?
I, for one, will be very glad that they won't be able to pin my searches for "Goldfish porn" and "Kinky sofa covers" back to my IP.
Signed,
John Jacob Smith
123 Brookfield Lane
Towarg, South Carolina
Google should not be collecting any of that huge pile of information AT ALL, not just anonymising it after 18 months. As the AOL case showed, search queries can be used to identify individuals even after AOL anonymized them, so it's not IP addresses they are recording, it's PEOPLE.
There is no need to collect the IP addresses of searchers who haven't opted in to Google's personalized search. There is no law that requires it.
There is no need to store the IP addresses of individual visitors to websites when Google analytics is used on a web page.
There is no need to store IP addresses of pages delivered to adsense viewers. Clicks maybe for a short time to prevent click fraud, but viewers, no.
None of this information should be recorded, and further the EU privacy directive should be enforced to ensure that none of that information is recorded. The law says we have privacy, Google should be forced to comply with that law.
Google plan to make it "more anonymous". Like pregnancy, data either ARE anonymous or they ain't. You can't qualify an absolute, and "anonymous" is an absolute condition indicating lack of information.
Stop googling for "jihad death to american president" if you're worried about getting caught.
I should point out that your Google query goes over plaintext HTTP, so anyone in between can eavesdrop on your queries.
Tom
Someday, I'll have a real sig.
I bet that means the IAO has their project running properly now so they no longer need to use Google Logs ...
http://www.jihadwatch.org/archives/011685.php
Makes me wonder how fast does the CIA anonymize their material? Ha!
"you're gone" [you are]
since that data could be abused in any number of ways, including credit scoring, insurance scoring or leaks of "interesting details" to the press. Probably those would hurt Google's reputation more than any additional income it could generate, but it's still the better policy.
If you're worried about privacy, I recommend Firefox and the Customize Google extension. I'm also a fan of Googlepedia.
Geeks like to think that they can ignore politics, you can leave politics alone, but politics won't leave you alone.-rms
Which is it? 18, 19, 20, 21, 22, 23 or 24?
http://www.rense.com/general79/wdx1.htm
This means nothing. If you click the link: "By anonymizing our server logs after 18-24 months..." That's still far too long, and it is most likely motivated more by the logistical burden of retaining so much data than by any act of benevolence. However, it definitely makes good PR to paint this as 'Taking steps to improve privacy'...
I don't think it will mean much unless they publish their anonymization technique. Even Google seems to have doubts about it, and considering the resources of some attackers (e.g., national governments), if the anonymization can be broken it will be.
But Google's anonymization does not have to be perfect: Google isn't the only place your google.com activity is recorded: There's your personal computer, possibly your ISP, other sites (referrer links show Google search terms), etc. As long as Google makes their anonymity difficult enough to break that it's significantly easier to go elsewhere for the information, they've done their job. If you need to be anonymous, I hope you are taking other steps.
I, for one, welcome the merciful intentions of our benign new overlords.
Exactly, it's to Google's MONETARY benefit that they record this information. The EU Privacy law says THEY CANNOT RECORD MORE PERSONAL INFORMATION THAN IS NEEDED FOR A TRANSACTION. Now that it's clear that search data is personally identifiable, the EU Privacy law should be used to FORCE GOOGLE TO QUIT IT.
"The moment you sent your request out over the internet in plain text to a third party (that is a corporation out to make money you know) you lost that."
Not so, the law says we have to consent and we didn't consent!
And what about when that party isn't Google? Google Analytics is not on Google's site; it's embedded on third-party sites, and Google's AdSense is on other people's sites too. I didn't consent to handing my data to Google when I surfed to a third party's site. Google took that data and recorded it in violation of EU privacy laws.
There has also been a lawsuit over exactly this issue before, which resulted in DoubleClick backing down.
http://archives.cnn.com/2000/TECH/computing/01/28/double.click.lawsuit.idg/
"A California woman has filed suit against DoubleClick, accusing the U.S.-based online advertising company of unlawfully obtaining and selling consumers' personal information, according to a statement issued by her attorney's office."
"Hariett M. Judnick filed the suit in Marin County Superior Court in California, on behalf of the "general public of the state of California," the statement said.
The suit alleges that DoubleClick employs Internet cookies to identify users and track their movements on the Internet. The company tracks and records the sites an individual visits, as well as the information transmitted on the sites, such as names, ages, addresses, shopping patterns and financial information."
Didn't AOL get into a lot of trouble for this?
Personally... we knew this was going to happen. Anyone that's surprised is a fool.
Let's stop dilly-dallying and just change "-1: Overrated" to "-1: Disagree" or "-1: Doesn't Subscribe to Groupthink".
List of nifty little phrases that have bitten their speakers in the ass:
Now Google brings us:
Let's just be less evil, now that we've been caught.
Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong fix.
The 'think of the children crowd' should be very pleased by this - children search for sketchy things all the time... and then their parents get blamed for it.
'Twould be better if it all stayed anonymous, in my opinion
It's all about the advertising. Google's knowledge of you lets them advertise to you more effectively.
It isn't that Google necessarily cares that it is "you" (actually they might, but that is another thread...), but "you" are doing a search and then clicking on links in a particular order, and that context is important for ranking. At an abstract level, the relationship between what you searched for and the links you tried is what Google wants to track to improve relevancy and search results. The problem is that, with modern technology, to do this they need to know some things that aren't anonymous and that can be abused.
If they can come up with a way to do this without tying it all back to a computer and the individual who made the request, then we are probably all better off, not because of privacy issues (though that is a great side effect) but because you get better results from removing the irrelevant data from ranking consideration. The closer they get to a truly anonymous search system, the better the results should theoretically be.
"Not only that, but is the history of searches you made over 2 years ago relevant to your current searches performed today"
... it's like the old saying, "Knowledge Is Power".
It is relevant to Google, as they want to know more about you so they can build up a clearer profile of you. Just because they (say they) are going to delete the data after 2 years doesn't mean they will not use the data in those two years to build up a profile of what you like. Then they can keep updating that profile over time while deleting data. So even once they delete the data after two years, the profile will still persist (in an ever changing and growing form).
The whole Google "do no harm" talk sounds more like PR spin to cover up what their real intentions are.
From a research point of view, Google is basically a vast data mining research company. They are forever looking for more new ways to do data mining.
So now imagine that, say, a few years from now, you could work out how to build up a profile of searches from a company instead of a person. Then you would be able to know what that company is interested in. It's also the logical extension of profiling individuals. But it would also be pure industrial espionage. But we are told Google will do no harm, so it's OK then. Imagine how valuable that profile would be if sold to a competitor of that company.
I think that a few years from now we will see countries starting to create their own search engines so that all their research doesn't get fed through other countries' search engines, which are basically gigantic information filtering and collection systems for what people (and companies) are interested in.
There are 10 kinds of people in the world... those who understand binary and those who don't.
Personally I think it's all a load of BS. If they really cared about our privacy, and if all they really needed my IP addy for is to aggregate my searches to 'better serve me', then all they have to do is one-way hash my IP addy. Then they can still tie all my searches together, and my gmail and such, but they wouldn't be able to back track it. And the govn't could demand all they want... you want the IP of the user who searched this? Here it is Mr. Bush... go nuts: x867:%dsgfk435j>67&*g[fg
So forgive me if I don't get all thankful for Google's big gesture. Heh.
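A minimal Python sketch of the one-way hashing idea above. Everything in it is an assumption for illustration, not anything Google has described: the function name `pseudonymize_ip` and the secret key are hypothetical. It uses a keyed hash (HMAC-SHA256) instead of a bare hash because the IPv4 address space is small enough to enumerate, so the token is only hard to reverse for someone who does not hold the key.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. Anyone holding it can recompute tokens for
# candidate IPs, so it would have to be protected or periodically discarded.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable hex token for an IP address using HMAC-SHA256."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()

token_a = pseudonymize_ip("203.0.113.7")
token_b = pseudonymize_ip("203.0.113.7")
token_c = pseudonymize_ip("203.0.113.8")

# The same IP maps to the same token, so searches can still be tied together.
print(token_a == token_b)
# Different IPs map to different tokens.
print(token_a != token_c)
```

If the key were thrown away on a schedule, old tokens could no longer be linked to new traffic, which is one way to read "anonymize after N months".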
Just hard code the function that grabs "HTTP_REMOTE_ADDR" to return "127.0.0.1." That way the feds will think all the kiddie p0rn searches came from the computer they are using.
"Google could not exist without collecting this information. This data is central to its business model, and key to its differentiation from other search engines."
False, if it was essential to their business they would not be able to 'anonymize' the data after 18 months.
There is no gain to be had from targeting an advert to a site I surfed yesterday, let alone 18 months ago. There is no gain to be had from 'tweaking' the results to be more like something someone was searching for from my IP address yesterday, or the day before, or 18 months ago.
And even click fraud has its limits of detection: Google does, after all, decide the clicks are valid and then pays the bill once a month after a short delay.
"The great majority of web users see nothing wrong with the method even though concerns about it are getting a fair amount of publicity."
You don't speak for the majority of web users so you're not able to make this remark with any authority.
AOL got in trouble for releasing it publicly. Google isn't doing that.
It's scary being a Flash and Flex developer on Slashdot. You guys are unnaturally rabid.
...I can stop adding "-lolita" when searching for "Nabokov"?
Quidnam Latine loqui modo coepi?
Everyone is worried about their own personal privacy, without thinking about the power Google is accumulating even if the data is totally anonymous. E.g., if everyone suddenly starts searching for a certain product, Google knows before anyone else, and could buy out the company who makes it, or sell that information to others. As long as the Google data repository is limited to www searches and click-through behavior, there is some bound on their power. It would become really scary if they were able to analyze what people are talking about in e-mail each day.
What I thought was a future concern may already be happening. According to http://www.computers.net/2006/08/google_in_dange.html Google holds 5.8 billion in marketable securities. This is more than 40% of their assets, which by SEC rules means they are an "investment fund" and subject to different reporting and operating rules.
Why is Google getting any favorable press at all for this? They never should have been doing it in the first place.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
There is absolutely no reason for them to retain logs linking searches to IP addresses for even 18 seconds, let alone 18 months -- this isn't "improving Google" for any of their users, no matter how much they claim it is.
Keeping search history for logged-in users is one thing; I can see how some users could find that useful, just like browser history autocomplete. Perhaps they want to keep logs of non-logged-in users around for something like geographical targeting, but there's no reason they can't process out the IP information immediately, or on a quick rolling schedule such as every 24 hours. Or, just keep the /24 or /16 form of the IP address; that effectively anonymizes the data but still provides enough information for geo-targeting or other forms of aggregation. If they want to track the flow of requests (a user searched this, then that, clicked here, then...), they can use their cookie for that, or do something like generate a hash of each IP's hostname* and track requests by the hash.
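A rough sketch of the "keep only the /24 or /16" suggestion, using Python's standard `ipaddress` module. The helper name `truncate_ip` is made up for this example, and whether a /24 is coarse enough to count as anonymous is the claim made above, not something the code proves.

```python
import ipaddress

def truncate_ip(ip: str, prefix_len: int = 24) -> str:
    """Zero the host bits of an IPv4 address, keeping only the network part."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network.network_address)

print(truncate_ip("198.51.100.77"))      # 198.51.100.0
print(truncate_ip("198.51.100.77", 16))  # 198.51.0.0
```

Every address in the same block collapses to the same value, which is still enough for coarse geo-targeting or per-network aggregation.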
18-24 months, however, is about the right length of time for this data to be useful to the government for purposes of intelligence gathering or criminal prosecution.
* Hashing the IP itself is useless as there aren't enough IPs (4,294,967,296 in theory, far fewer in practice due to all the reserved /8s) to make reversing the hash back to the IP difficult. However, the domain of valid hostnames is incredibly large (any alphanumeric string up to 256 characters), such that one can be reasonably confident the hostname cannot be computed from the hash.
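To illustrate why a bare hash of an IP is reversible, here is a small Python sketch. It assumes the hash is unsalted SHA-256 over the dotted-quad string, and the names `hash_ip` and `reverse_hash` are hypothetical. It searches a single /24 to keep the run short; the same loop over the whole IPv4 space is only about 4.3 billion hashes.

```python
import hashlib
import ipaddress
from typing import Optional

def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of the dotted-quad string (the scheme being criticized)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def reverse_hash(target: str, cidr: str) -> Optional[str]:
    """Try every address in `cidr` until one hashes to `target`."""
    for addr in ipaddress.ip_network(cidr):
        if hash_ip(str(addr)) == target:
            return str(addr)
    return None

leaked = hash_ip("192.0.2.200")
print(reverse_hash(leaked, "192.0.2.0/24"))  # 192.0.2.200
```

The search succeeds because the attacker can enumerate every possible input, which is the point of the footnote above.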
Liberty in your lifetime
When their application server writes to the log file only log what and when, forget about the who. Let me have the code that writes to the log file, I will have it done in 1 week. If it's taking years to make this change, I expect they won't mind receiving an invoice for $1,000,000 from me. (Corporate reasoning: Years = $MILLIONS)
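A minimal sketch of "log what and when, forget about the who", in Python. The function name `format_log_entry` and the tab-separated layout are assumptions for illustration; nothing here reflects how Google's logging actually works.

```python
from datetime import datetime, timezone

def format_log_entry(query: str, client_ip: str) -> str:
    """Build a log line from the time and the query; client_ip is deliberately dropped."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # The client address is accepted but never written anywhere.
    return f"{timestamp}\t{query}"

line = format_log_entry("weather in paris", "203.0.113.7")
print(line)
```

As the AOL case mentioned earlier in the thread suggests, the query text itself can still identify people, so dropping the address is only part of the job.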
Why don't they just not save search data in the first place?
I am so sick of the myriad bad laws and regulations passed because they were supposedly "good for the children".
Bollocks.
People have been creating a world with a lid that is so "screwed down" by "authority" that if the trend continues, children will be growing up in a living hell, in which they are not allowed to think for themselves even after becoming adults.
Is this good for them? Is it good for *anybody*??
I think not.
For search: http://www.blackboxsearch.com/ For a full proxy: http://www.mysecureisp.com/
Google is gathering a huge trove of information about us, and this shows it is not anonymous. Search is only part of what they have. The more Google services you use, the more you let them build a very detailed profile of you. And the more you do that, the less privacy you have.
They know what you search for, who you IM and email and about what, where you have appointments and what you bought. You essentially have no privacy.
If you value your privacy, do not rely on any single provider; spread your searches, IM, email and purchases across multiple service providers. The government can use its powers to get your data and correlate it, but no commercial entity should have the equivalent power. Commercial interests of Google or any other provider run counter to protecting your privacy.
I doubt it... Seems like the only time Google is concerned about anonymous access is when some kind of kiddie pron is involved. They have no problem with helping the communist Chinese snoop on their citizens and block sites, right?
Remember, it's governments in general, not just the US, that they now have to report to! And either way they are just going to lose money on this: they will have to hand over the data, which will be resold to government business partners (Google's competition) for a hell of a lot less than it cost them to collect it.