Google to Anonymize Users' Search Data
Google's official blog states that they are making an effort to anonymize their search data after 18-24 months. After previously fighting turning over search data to the feds, it looks like they are striking another blow to the "think of the children" crowd. Any bets on whether MSN or Yahoo! will follow suit?
All they have to do is erase the logs every day or just not keep them. It doesn't "take an effort". Anonymous proxies have been doing this for years.
Global warming is a cube.
Although I did have to install the AnonymizeGoogle Firefox plugin to get it.
Why would Google have to comply with EU regulations? :?
Maybe because they do business in Europe?
My guess is they don't do it immediately because there is internal business value in mining the data. User patterns, length of stay, etc. After 18 or 24 months, the internal value has dropped significantly as things change quickly. I would have thought that the value would have dropped even quicker than that, say after 6 months or maybe a year.
I never got why Google needs to keep all that history without anonymizing it.
There is - as far as I can see - no rational argument that has to do with improving search results because you have them tied to individuals.
And yes, keeping tabs on half the globe is evil too...
Google should not be collecting any of that huge pile of information AT ALL, not just anonymising it after 18 months. As the AOL case showed, search queries can be used to identify individuals even after AOL anonymized them, so it's not IP addresses they are recording, it's PEOPLE.
There is no need to collect the IP addresses of searchers that haven't opted in to Google's personalized search. There is no law that requires it.
There is no need to store the IP addresses of individual visitors to websites when Google analytics is used on a web page.
There is no need to store IP addresses of pages delivered to adsense viewers. Clicks maybe for a short time to prevent click fraud, but viewers, no.
None of this information should be recorded, and further the EU privacy directive should be enforced to ensure that none of that information is recorded. The law says we have privacy, Google should be forced to comply with that law.
Google plan to make it "more anonymous". Like pregnancy, data either ARE anonymous or they ain't. You can't qualify an absolute, and "anonymous" is an absolute condition indicating lack of information.
Not only that, but is the history of searches you made over 2 years ago relevant to your current searches performed today?
liqbase
If you've got nothing to hide, you should have no problem with this.
http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32006L0024:EN:NOT
The data retention directive only applies to ISPs, and only deals with who you "communicate" with. It does not explicitly say that a record of which websites you visit should be retained, and it explicitly says that the content of the communication must not be retained.
However, as for all EU directives, it only contains the baseline of regulation. Directives are never law themselves, but have to be implemented in each respective member state by each respective legislative body. These, in turn, are free to implement whatever they want ABOVE the baseline, so some member states may have longer retention periods for this data, some member states may require ISPs to retain additional data.
The deadline for this directive is September this year, but if you read it, a few member states have reserved the option to postpone parts of the directive, typically the internet-related traffic. This basically means that they recognize the difficulties in implementing it, and want more time to think about how to do it, or possibly obstruct it.
What all of this boils down to is that maybe, sometime in the future, if you have a European ISP, they may be required to store all the URLs that you access. Google search data is transmitted as querystring parameters that are part of the URL, which means that your search data may be stored by your ISP, in a non-anonymized way. There's nothing in this possible future that Google has to comply with, as long as they are not a European ISP.
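To make the querystring point concrete, here is a minimal sketch of how anyone holding a log of plain URLs could pull the search terms back out. The example URL and the function name `search_terms_from_url` are made up for illustration, and it assumes the terms travel in a `q` parameter; it says nothing about how any real ISP stores or processes its logs.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_url(url, param="q"):
    """Return the search terms carried in a URL's query string, or None.

    Hypothetical helper: assumes the terms are sent in the `param` field.
    """
    query = urlsplit(url).query           # the part after the "?"
    values = parse_qs(query).get(param, [])
    return values[0] if values else None  # parse_qs decodes "+" and %xx

# A made-up URL of the kind that could end up in a retained access log.
logged_url = "http://www.google.com/search?q=data+retention+directive&hl=en"
print(search_terms_from_url(logged_url))
```

In other words, a retained URL is a retained search, with no cooperation from the search engine needed.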
Stop googling for "jihad death to american president" if you're worried about getting caught.
Excuse me?! I live in America and if I want to research the results of the search terms "jihad death to american president" I'm well within my fucking rights.
Fuck you for saying otherwise.
Well you're describing a law enforcement problem not a privacy issue.
Google is within their rights to gather as much information as you feed them (your IP, time of day, host strings, query string, etc).
My point was if you were planning on committing crimes, you shouldn't use Google to find tips.
Tom
Someday, I'll have a real sig.
If you've got nothing to hide, you should have no problem with this.
Yeah while we're there we can install the webcam in his bathroom and broadcast on the net every time he takes a crap. I have a pair of guys willing to do the commentary on wiping techniques to add to the video...
Seven puppies were harmed during the making of this post.
Ah, the out of context argument. My house is private by the definition that I have locks on the doors and blinds on the windows.
Funny - my computer is in my house, behind locks and blinds too. Hey, Google's computers are also behind lock and key, and they even have security guards and alarm systems. I don't ever remember giving Google permission to disclose any information shared between them and me - oh and heaven forbid I go around giving away the information Google found for me - I'd get sued!
Why would the whole world automatically be party to the information Google and I shared one evening? My computer sent that information to a specific internet address, and the answer came back specifically to my computer.
Not so out of context...
Seven puppies were harmed during the making of this post.
Which is it? 18, 19, 20, 21, 22, 23 or 24?
http://www.rense.com/general79/wdx1.htm
I don't think it will mean much unless they publish their anonymization technique. Even Google seems to have doubts about it, and considering the resources of some attackers (e.g., national governments), if the anonymization can be broken it will be.
But Google's anonymization does not have to be perfect: Google isn't the only place your google.com activity is recorded: There's your personal computer, possibly your ISP, other sites (referrer links show Google search terms), etc. As long as Google makes their anonymity difficult enough to break that it's significantly easier to go elsewhere for the information, they've done their job. If you need to be anonymous, I hope you are taking other steps.
I, for one, welcome the merciful intentions of our benign new overlords.
Not only that, but is the history of searches you made over 2 years ago relevant to your current searches performed today?
Studies have shown that 43% of all people who search for "Donkey Love" will buy our product within 3 years if they see our ads.
Seven puppies were harmed during the making of this post.
Exactly, it's to Google's MONETARY benefit that they record this information. The EU Privacy law says THEY CANNOT RECORD MORE PERSONAL INFORMATION THAN IS NEEDED FOR A TRANSACTION. Now that it's clear that search data is personally identifiable, the EU Privacy law should be used to FORCE GOOGLE TO QUIT IT.
/ double.click.lawsuit.idg/
"The moment you sent your request out over the internet in plain text to a third party (that is a corporation out to make money you know) you lost that."
Not so, the law says we have to consent and we didn't consent!
And what about when that party isn't Google? Google Analytics is not on Google's site, it's embedded on third-party sites, and Google's AdSense is on other people's sites too. I didn't consent to handing my data to Google when I surfed to a third party's site; Google took that data and recorded it in violation of EU privacy laws.
There has also been a lawsuit over this before, which resulted in DoubleClick backing down over exactly this issue.
http://archives.cnn.com/2000/TECH/computing/01/28
"A California woman has filed suit against DoubleClick, accusing the U.S.-based online advertising company of unlawfully obtaining and selling consumers' personal information, according to a statement issued by her attorney's office."
"Hariett M. Judnick filed the suit in Marin County Superior Court in California, on behalf of the "general public of the state of California," the statement said.
The suit alleges that DoubleClick employs Internet cookies to identify users and track their movements on the Internet. The company tracks and records the sites an individual visits, as well as the information transmitted on the sites, such as names, ages, addresses, shopping patterns and financial information."
This is why it pays to have a modicum of computer knowledge.
Assuming you're not trolling...
When you send a query to Google, it goes over the "internet" in the clear. That is, not encrypted. Anyone who can see it can read it. Well, who can read it? Turns out a lot of people. Between me and Google are probably 10 different boxes, 5 of which are just my ISP's routers. The other five are boxes on other networks, not even related to Google.
There is no inherent requirement for privacy like there is with telephones (maybe there ought to be one). But that said, you're giving your data to Google, willingly no less. That gives them every right to record it. You gave them permission by using their service. I guess you never read their TOS, which is your fault, not theirs. Think about the analogy in the real world. This is like you handing your driver's license to every stranger you meet, then getting upset when some of them write it down.
If you don't want your assets [IP, location, name, platform, etc] leaked to Google you should use an anonymous proxy.
Tom
Someday, I'll have a real sig.
I'm not against Google cleaning their logs. I'm against people claiming this is a privacy issue.
Google logging all your queries: Not a privacy problem.
Bank leaking your SSN via stolen laptop: Privacy problem.
AOL knowing that you like midget porn: Not a privacy problem.
Government using sub-standard contractor to manage passport data, later turns up on broken into computer: Privacy problem.
By crying wolf every time "data" is mentioned you desensitize people to real privacy problems.
Someday, I'll have a real sig.
List of nifty little phrases that have bitten their speakers in the ass:
Now Google brings us:
Let's just be less evil, now that we've been caught.
Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong fix.
It's all about the advertising. Google's knowledge of you lets them advertise to you more effectively.
It isn't that Google necessarily cares that it is "you" (actually they might but that is another thread...), but "you" are doing a search and then clicking on links in a particular order, which is a context that is important for ranking. At an abstract level, the relationship between what you searched and the links you tried is stuff Google wants to track to help enhance relevancy and search results. The problem is that with modern technology, to do this they need to know some things that aren't anonymous, which can be abused.
If they can come up with a way to do this without tying it all back to a computer and the individual who made the request then we are probably all better off, not because of privacy issues (but that is a great side effect) but because you get better results from removing the irrelevant data from ranking consideration. The closer they get to a true anonymous search system, the better the results should theoretically be.
"Not only that, but is the history of searches you made over 2 years ago relevant to your current searches performed today"
... it's like the old saying, "Knowledge Is Power".
It is to Google, as they want to know more about you so they can build up a clearer profile of you. Just because they (say they) are going to delete the data after 2 years doesn't mean they will not use the data in those two years to build up a profile of what you like. Then they can still keep updating that profile over time while deleting data. So even once they delete the data after two years the profile will still persist (in an ever changing and growing form).
The whole Google "do no harm" talk sounds more like PR spin to cover up what their real intentions are.
From a research point of view, Google is basically a vast data mining research company. They are forever looking for more new ways to do data mining.
So now imagine that, say, a few years from now, you could work out how to build up a profile of searches from a company instead of a person. Then you would be able to know what that company is interested in. It's also the logical extension of profiling individuals. But it would also be pure industrial espionage. But we are told Google will do no harm, so it's OK then. Imagine how valuable that profiling data would be if sold to a competitor of that company.
I think a few years from now we will see countries starting to create their own search engines so all their research doesn't get fed through other countries' search engines, which are basically gigantic information filtering and collection systems for what people (and companies) are interested in.
There are 10 kinds of people in the world... those who understand binary and those who don't.
Personally I think it's all a load of BS. If they really cared about our privacy, and if all they really needed my IP addy for is to aggregate my searches to 'better serve me', then all they have to do is one-way hash my IP addy. Then they can still tie all my searches together, and my gmail and such, but they wouldn't be able to back track it. And the govn't could demand all they want... you want the IP of the user who searched this? Here it is Mr. Bush... go nuts: x867:%dsgfk435j>67&*g[fg
So forgive me if I don't get all thankful for Google's big gesture. Heh.
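For what it's worth, the one-way hash idea above could look something like the sketch below. This is only an illustration under stated assumptions: the function name `pseudonymize_ip` and the key are hypothetical, and nothing here reflects what Google actually does. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, because there are only about 4 billion IPv4 addresses and an unkeyed hash of one can be reversed by simply hashing them all.

```python
import hashlib
import hmac

def pseudonymize_ip(ip, secret_key):
    """Map an IP address to a stable, hard-to-reverse token.

    Hypothetical helper. The same IP and key always give the same token,
    so searches can still be tied together without storing the address.
    """
    return hmac.new(secret_key, ip.encode("ascii"), hashlib.sha256).hexdigest()

# Assumed key, for illustration only; a real one would be random and secret.
key = b"example-only-key"
token_a = pseudonymize_ip("192.0.2.17", key)
token_b = pseudonymize_ip("192.0.2.17", key)
token_c = pseudonymize_ip("192.0.2.18", key)
print(token_a == token_b, token_a == token_c)
```

The catch is the key: whoever holds it can recompute the token for any IP they are curious about, so this only helps if the key is kept out of reach or thrown away.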
Just hard code the function that grabs "REMOTE_ADDR" to return "127.0.0.1". That way the feds will think all the kiddie p0rn searches came from the computer they are using.
People searching for their social security numbers just for the hell of it, or their CC numbers, and presto! Now real numbers exist in some "Google history list" for ever and ever.
There's a goldmine of data there. "Anonymizing" it doesn't affect this, unless they have filters to try to recognize such and get rid of it.
Still, if it's in the form of "User X" searched for these 132 terms last month, some terms might identify them and hence link them to other things like their unfortunate search for "donkey love".
E.g.
1234 Fake Street (suppose it's your real address)
+britney +bald +"bald down there"
What does "bedonk-i-donk" mean?
fat asses with tiny waists
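As a rough idea of what such a filter might look like, here is a sketch that scrubs things shaped like US social security numbers and payment card numbers out of a query. The patterns, placeholder strings, and function names are my own assumptions, not anything Google has described. Note that it does nothing about a street address or any other identifying text, which is exactly the problem.

```python
import re

# Assumed patterns: NNN-NN-NNNN for SSNs, 13 to 16 digits for card numbers.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_ok(digits):
    """Luhn checksum, used to cut down false positives on card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scrub_query(query):
    """Replace SSN-shaped and card-shaped numbers with placeholders."""
    query = SSN_RE.sub("[SSN?]", query)

    def replace_card(match):
        digits = re.sub(r"\D", "", match.group())
        return "[CARD?]" if luhn_ok(digits) else match.group()

    return CARD_RE.sub(replace_card, query)

print(scrub_query("my ssn 123-45-6789"))
print(scrub_query("1234 Fake Street"))
```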
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Studies have shown that 43% of all people who search for "Donkey Love" will buy our product within 3 years if they see our ads.
...and that number rises to 98.3% if we mention we found that item in their search history.
Step into a huge movement. Don't Tread In Me.
What I thought was a future concern may already be happening. According to http://www.computers.net/2006/08/google_in_dange.html Google holds 5.8 billion in marketable securities. This is more than 40% of their assets, which by SEC rules means they are an "investment fund" and subject to different reporting and operating rules.