Execs at AOL Approved Release of Private Data?
reporter writes "The New York Times has published a report providing further details about the release of private AOL search queries to the public. According to the report: 'Dr. Jensen, who said he had worked closely with Mr. Chowdhury on projects for AOL's search team, also said he had been told that the posting of the data had been approved by all appropriate executives at AOL, including Ms. [Maureen] Govern.' The report also identifies the other two people whom AOL management fired: they are Abdur Chowdhury and his immediate supervisor. Chowdhury is the employee who did the actual public distribution of the private search queries. He, apparently, has retained a lawyer."
First they demote him from being a lt. commander. Then they attach him to AOL. Somewhere Lore must be pulling the strings.
Where were you when the voynix came?
At this point, why would you want to stay at your present job if you need a lawyer to keep it? Even if you are successful, why would you want to stay? It's obvious management won't like you, since they're trying to get rid of you... Or am I missing something?
---
Programming is like sex... Make one mistake and support it the rest of your life.
Almost everything a company does, especially publicly, has to have multiple stamps of approval. You can't even order a pencil without paperwork. Right now AOL is headhunting for scapegoats to sacrifice to appease the masses. This had to have nearly everybody OKing it. If it were a mistake, it would have been yanked back a LOT faster and legal action would be pending. They aren't threatening anybody yet because they probably don't want their own records pulled out, which would make them massively liable.
I'm not at all sure why they thought it was a good idea. They must have thought the ID numbers were sufficient to conceal identities, which also shows the lack of security knowledge most executives have.
...a subpoena!
Who else would have the phenomenal insight to give us such gems as
l /8-21-06_21.gif
l /8-13-06_26.gif
l /8-21-06_9.gif
http://i.somethingawful.com//sasbi/2006/08/docevi
http://i.somethingawful.com//sasbi/2006/08/docevi
and of course
http://i.somethingawful.com//sasbi/2006/08/docevi
Just because they approved it doesn't mean they thoroughly understood how it worked. For example, what if they were told that lists of searches (and results, if I understand this correctly) were going to be released with user-identifying information hidden? If they weren't told that the identifiers were replaced with unique IDs (which could be connected to a person if identifying data had been entered), they could not have known this without doing a little research. Executives don't just get salaries for making decisions; sometimes they have to do work. But sometimes they just decide without doing the work.
I've completely forgotten what my original point was, so I'll stop here.
I bet they have a stamp that says that.
-Dave
What else would you expect from Mrs. Govern besides pretty much what the Government does with our search queries? Nothing but this! :D
-ld
...if ever there was any public doubt about how dangerous the release of search query data could be, this should do a lot to prove to "the public" otherwise.
And with this improvement in public awareness of how important it is to have private data safeguarded and controlled, I think we'll see a little more interest in what businesses and governments do with private data. I think that ultimately, we need to get a LOT more aggressive about the misuse of the SSN (Social Security number) and forever separate the SSN from the credit and banking systems.
Except that some reporters used it to personally identify a 60-year-old woman in Florida, IIRC. They had her verify that those were indeed her searches, and she explained what they were about. So the concern here is that you CAN be identified by your searches without any "personally identifiable information". Now tack that on to "how to murder wife" searches, "how to build bomb" searches and "child pornography" searches, hand it over to the government, and now there is a bit of an issue. You get arrested because a string of child porn searches came from your computer... you should have told your 17-year-old daughter to do her English final on child pornography. I think we have established a nice trend of arrest first, think... maybe later. God forbid you were searching for anything that could possibly be terrorism-related (Islam, bombs, etc.). Looks like your thriller novel will never hit the shelves because you have been "detained for questioning".
The only change I can believe in is what I find in my couch cushions.
>>> so correct me if I'm wrong
You're wrong.
The IP address or user name of the person who searched has been removed, but it was replaced with a unique identifier that tracked all of the searches by the same person.
Many people search for things related to themselves. For example, if you have looked for a job in the last four years, you were foolish if you didn't search for your own name to see if your friends' blogs had descriptions of your late-night drinking binges and drug use. (You are probably foolish if you used AOL search to do this, but that's a different discussion.)
CNN ran a story where they were able to track down one older lady just because she searched for her last name, searched for "drugstores near " or some such, and was the only person in her area with that name. They confirmed with her that the searches were hers. (She has a dog with problems urinating on her carpet, and she has friends with lots of diseases that she "researches" for them.) They picked someone to track down who hadn't searched for anything "naughty", but that doesn't mean they couldn't have if they had wanted to.
It doesn't hurt to be nice.
The search terms were not linked to any specific person; however, each search term was linked to a user ID, so you can compile a list of all the searches a specific person did.
Combine this with the fact that some people may search for their own name, credit card number, and Social Security number to see if they're posted anywhere, and you have some serious privacy concerns.
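To make the mechanism concrete, here is a minimal Python sketch, assuming each released row is just an (anonymized user ID, query) pair. The IDs and queries are made up for illustration. Grouping on the ID is all it takes to rebuild one person's search history.

```python
from collections import defaultdict

# Made-up released rows: (anonymized user ID, query).
# No names or IP addresses, but every query from one person shares one ID.
released = [
    (1001, "my own name"),
    (1002, "weather this weekend"),
    (1001, "restaurants near my street"),
    (1002, "cheap flights"),
    (1001, "divorce lawyers"),
]

def history_by_user(rows):
    """Group queries under their user ID, keeping the original order."""
    grouped = defaultdict(list)
    for user_id, query in rows:
        grouped[user_id].append(query)
    return dict(grouped)

if __name__ == "__main__":
    for query in history_by_user(released)[1001]:
        print(query)
```

Each query on its own says little; the combined list under one ID is what starts to describe a person.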
Take, for example (and I'm making this up), user #5. These are his search terms:
Joe Schmo
014-56-1234
4729-1234-5678-9012
Pizza stores near 1 main street, oakland, CA
Would you want this released to the public? What if some more of his search terms were:
How to divorce your wife
divorce lawyers
dating websites
how to cheat on your wife
russian brides
OK, granted, maybe you don't agree with what he's doing here, but is it right for this to be public?
It was a bit more than just search, it was complete records of internet usage from the ISP.
No, it wasn't. It was strictly search terms and, if they clicked on a link, which link they clicked on. That's it.
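For a sense of what "search terms plus clicked link" means at the record level, here is a minimal Python parser. It assumes a tab-separated layout of AnonID, Query, QueryTime, ItemRank and ClickURL, with timestamps like `2006-03-01 07:17:12`; that layout is an assumption about the released files, and the `SearchRecord` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class SearchRecord:
    anon_id: int                # persistent anonymized user ID
    query: str                  # what the user typed
    query_time: datetime        # when the query was issued
    item_rank: Optional[int]    # rank of the clicked result, if any
    click_url: Optional[str]    # clicked result URL, if any

def parse_line(line: str) -> SearchRecord:
    """Parse one tab-separated record in the assumed column order."""
    fields = line.rstrip("\n").split("\t")
    # Rows without a click may omit the last two columns; pad them out.
    fields += [""] * (5 - len(fields))
    anon_id, query, time_str, rank, url = fields[:5]
    return SearchRecord(
        anon_id=int(anon_id),
        query=query,
        query_time=datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S"),
        item_rank=int(rank) if rank else None,
        click_url=url or None,
    )
```

Under this assumed layout there is no field for browsing history beyond the clicked result, which is the distinction being drawn above.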
I am disrespectful to dirt! Can you see that I am serious?!
And empty suits like the weenies at AOL are just knee-jerking in response to some soccer mom who screamed at them at the PTA meeting last night. Heads will roll, I didn't know, thanks for your helpful criticism, etc. etc. etc.
Whereas they're probably just mad at someone for forgetting to SELL the information.
But the important question is: is AOL responsible for what users decide to search for?
How is AOL supposed to know if a subset of data includes private data? A 9-digit number could be an SSN, but it could also be a phone number (not all countries use 10 or 7 digits), an ISBN (minus the check digit), or any of a number of other identifiers. A 16-digit (or 15-digit) number isn't necessarily a credit card number. Just because someone puts an address into a search engine doesn't mean that it's their own address.
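A heuristic scanner illustrates the point. This is a minimal sketch, not anything AOL is known to have used: it flags 9-digit and 15/16-digit runs and applies the Luhn checksum, which valid payment card numbers satisfy, to rule out some false card hits. SSNs carry no check digit, so a bare 9-digit run stays ambiguous. The `classify` and `luhn_ok` names are hypothetical.

```python
import re
from typing import List

def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(query: str) -> List[str]:
    """Describe how each long digit run in a query *might* be read."""
    labels = []
    # Treat digits separated by single spaces or dashes as one run, e.g. 014-56-1234.
    for match in re.finditer(r"\d(?:[ -]?\d)*", query):
        digits = re.sub(r"[ -]", "", match.group())
        if len(digits) == 9:
            labels.append(f"{digits}: SSN? phone number? ISBN without check digit?")
        elif len(digits) in (15, 16):
            verdict = "passes Luhn, possibly a card" if luhn_ok(digits) else "fails Luhn, not a valid card number"
            labels.append(f"{digits}: {verdict}")
    return labels
```

Run on the made-up user #5 queries, the 16-digit number fails the checksum and could be discarded, but `014-56-1234` only comes back as a list of guesses.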
Don't the users have some responsibility for their own private data?
But then, I'll probably be modded down (like what happened in the last discussion) for not being slashdotically correct, or something.
WikiSearch anyone? It's about time that people started realizing that these companies are not going to make this easy on anyone. I would gladly pay $5-$10/month to pay for the bills of an open source, accountable search service that doesn't keep so much data on me it makes the Stasi look like amateurs.
How about...
* Filed lawsuit against former employer due to wrongful termination.
Whatever happened to the "information just wants to be free" argument? Where's that now?
People generally feel comfortable with the notion that their search queries are private. Sure, they may not be private, but they feel private. Sure, your phone conversations aren't completely private, but the phone company can't just dump your conversations onto the public.
I have read many articles on the analysis of the released AOL data. Some of the articles start off something like this:
"I think the release of this data is a breach of privacy and should never have been made public. But..."
Then they present their analysis. My question is if you are going to preach on the evils of releasing the data then do you have the moral right to analyze it? I think not.
I'd say this only proves the point - this information wanted to be free badly enough to escape from AOL, leaving a trail of career destruction in its wake!
Not even close to that simple. AOL didn't stand to make any money off this situation. The data was provided entirely "altruistically" for the benefit of researchers.
And what are these researchers "researching"? They are studying how to make searches more relevant, among other things.
Will more relevant searching make a buck for someone? Well, it's done wonders for Google, but the release of this data isn't making that research an AOL property. And we love Google because it gives us what we want to see up front, without digging for it.
In the end, the release of this data is a good thing, but the implementation of the release failed badly. Nevertheless, we *want* this data to get out, we just don't want it to get out in any way that can tag individuals.
Don't the users have some responsibility for their own private data?
How? By never giving it to anyone? Never getting a loan, insurance, or a magazine subscription? Always working for cash under the table and never filing taxes? Any one of those things releases your address, phone number, billing information, etc. out of your control. At some point you have to say the data has changed hands and so has the responsibility to protect it.
Sure, it was dumb to have put your own SSN or home address into the search engine (but was it really dumb at the time, or are we only saying this in hindsight, knowing now that the data would be given to whoever wanted it? Being able to find things near me is quite useful; I'd hate to always dine two cities over just to "throw off the trail"). But saying that it's the users' fault that the data got released is like saying that it's my fault if someone breaks into my house and shoots me in the face with my own gun, just because I kept it loaded. Philosophers have debated the nature of "fault" for millennia, leading to concepts such as "attractive nuisance"; generally speaking, however, blame is assigned to the person who committed the act, and only slightly shared with the people who set up the environment for the act to occur.
If I have been able to see further than others, it is because I bought a pair of binoculars.
Today the "little guy's" only defense against being taken advantage of by major corporations and the government is information and the ability to think for himself. A major problem, though, is that even those few trying to think for themselves are at the mercy of the information they are given. That's the information on which they base their decisions. The more corporations and governments know about what we are interested in and find important, the more they can tailor the information we receive to influence us in their direction.
Classic marketing and academic research isn't the issue here. The issue is our ability to choose. This is the same reason the Net Neutrality issue matters: it can directly affect our ability to find good (useful, true) information. Even if these issues weren't considered when the data was released (and I'm sure they were), such sharing of personal data amounts to criminal negligence toward other people's quality of life and, yes, their lives, because among those using this information are people who directly affect our ability to live and yet seem to be driven more by monetary concerns, such as pharmaceutical companies.
Maybe we should just leave it at that.
Find environmentally and socially responsible products on http://buy-right.net
That's a bit cynical, don't you think?
If they really wanted to make the most money possible, they would have sold these logs (non-anonymized) to the scores of direct marketers that I'm sure would love to have this data. Instead, they packaged it up and tried to make it available to academic researchers. These researchers honestly just want to make better search engines that run faster and return better results. Furthermore, when academics come up with a great new idea, it gets published so that anyone can read it.
Every once in a while, someone suggests an open source search engine. Check out Nutch if you want to see work in this area. However, if open source search solutions are going to be any good at all, they'll have to rely on the decades of public, published information retrieval research that's already out there.
We are entering a time when companies are capable of totally outpacing academia because they have query log data, so they know exactly what users actually do. There is no way that an academic can get this kind of data unless a company releases it. Researchers at AOL, in good faith, tried to release data so researchers could have a chance at success. Ultimately, of course, that's good for AOL since they're not in the top three search engines out there. Public research can only help raise AOL's standing by helping to level the playing field. But, it's good for you too, because you can build your open source solution based on this research too.
Yes, the release was botched, and yes, the long term user identifiers were a mistake. But don't make AOL out to be some evil company that was only out to destroy your privacy. They made a mistake!
Let's see here... we have various free services that are available to everyone on the Internet. These free services aren't free to operate - they cost significant amounts of money to do so. Where does the money come from?
The first place is of course advertising. Having people pay to push their message at the unsuspecting people that are using the service. Eventually the ads become all-pervasive and lose some of their value. Where we are today is that banner ads are almost worthless and Google has made selling text-only ad space the only means of support for many free sites.
The one that a lot of people do not understand is that just the use of the service also has value - and informational value. This information can be collated, organized and distilled and sold. There are people that would like to know how many times the word "Viagra" was searched for on Google. This information is available, for a price. Similarly, Ford would like to know how many people in Indiana search for Toyota, Dodge or Chevy. Again, this is a source of revenue for so-called free services.
What some people posting here do not seem to get is that their use of these free services is being tracked and data mined. Some of this is just to keep the service running. It is important to know that when a new album is released by Madonna, everyone will be searching for a way to download it. This can change the resources required to operate a search service. There are similar resource requirement changes in all such systems, and the data required to maintain them is certainly being tracked, monitored and used. Some of it is also sold because it has value.
Could there be a search engine or an IM service that didn't data mine or sell ads? Sure, but why would anyone pay $50 a month for a search service if there was one that was free? Some particularly paranoid types might to keep their porn searches private, but the majority would not. The amount of data that can be mined from free services (forums, blogs, search engines, IM systems, etc.) is incredible and as more and better data mining is implemented, the greater value this will have.
Isn't free wonderful?
I didn't say it was the user's fault, I asked whether they have some responsibility or not. That is a big difference.
In situations like this, there's usually more than enough blame to spread around. Sure, it's easy to just say that AOL was completely in the wrong and should have to pay for it, but that doesn't reflect the whole truth, does it?
Think of it this way, there are quite a few people who would never think of putting their own personal information into a search engine (I didn't, until this incident happened, and I still wouldn't put my credit card or SSN into any search engine). Yes, I've searched for my name, but I've found about a dozen others around the world with my same name (where are the privacy implications in that?).
Out of the tens of thousands of searches that they released, there were, what, about 100 or so numbers that may have been SSNs? We're talking about an infinitesimal proportion of the data.
At what point would it have been acceptable? If there was only one possible SSN? Or do you insist on perfection?
If you are insisting on perfection, then I would suggest that your problem isn't with the privacy data, but with any data being released. If that's the case, don't lie and make it seem like it's just the privacy data that you are up in arms about.
Ultimately, it does fall to the users to remember that any information that they give to another party becomes the property of that party (except as otherwise defined by law, or negotiated in a legal agreement or contract). At that point, they can do what they wish with that information.
Faster! Faster! Faster would be better!
I'm just waiting until AOL finally curls up and dies. Been waiting for over a decade now...but it seems like the wait may get substantially shorter now.
This sig will self destruct in 5 seconds.
For example, if you have looked for a job in the last four years, you were foolish if you didn't search for your own name to see if your friends' blogs had descriptions of your late-night drinking binges and drug use. (You are probably foolish if you used AOL search to do this, but that's a different discussion.)
Why would it be foolish? AOL search is just Google, anyway.
Here's a potential solution to your "industrywide problem": Stop treating us (your users) as nothing more than a market. We're individual human beings. Right now, we just look like sacks of money to you, and your "research" consists of trying to extract that money from us.
If you want to be treated as a person, then limit your interactions to other people, not corporations. What the hell do you expect from a FACELESS ENTITY?
Why are people continually shocked at the behavior of corporations (which are entities conceived for the sole purpose of MAKING PROFIT)? Does it suck? Yeah, it sucks. Does bitching about it make any more sense than complaining about the lack of whiskers on a lizard? Nope.
The real problem is that they shouldn't have been keeping it in the first place!
If it can harm a consumer by its release, then it can harm that same consumer by the fact that they have it in their possession in the first place. Just how is AOL that much better or more trustworthy than the world at large?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
I think you bring up a good point.
As a society, or at least as a subset of one, we need to discuss this. Where should the "expectation of privacy" be when one is using a search engine (or the Internet in general)? It's a very open question.
On one hand, most people I think realize that the query to the search engine is not 'private.' As in, you can go and view at any given time, all the things that are being typed in to Google. (At least you used to be able to, or maybe this was Yahoo.) At any rate, the queries themselves are not secret.
However, what freaks people out is that one query can be associated with another. So if I type in my name, I expect that somebody on the far end knows that I'm searching for my name. However, what people don't expect, is that it's possible to link together all the searches that they've made (potentially across multiple computers, if there's a login system). So that my search for my name today, could be cross-referenced with my search for restaurants in a particular area tomorrow, and cross-referenced further with some street address I search for the day after that.
Individually, only a very naive person would expect a query to be private. However, it's the cross-referenced information, sorted by particular users, that is conceivably private, because it reveals much more than simple queries do.
Let's imagine for instance that AOL had released the same number of searches, but instead of listing the IP address (or a unique identifier that's matched 1:1 with an IP address) they just gave a time/date stamp when it was made. We probably wouldn't be having this conversation, and a few executives would still have their jobs.
Where people expect some sort of privacy (reasonably or not) is in not having one particular "search session" linked to other ones. In fact, I bet that most un-technical people probably think that they can close their browser, and thus 'start over'...not realizing that when they start searching again, it just continues adding to a list of queries from earlier. That "recordkeeping" is where the perceived invasion occurs, not in the lack of secrecy of the terms themselves.
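One way to keep the "search session" structure while dropping the long-term link is to issue a fresh random token whenever a user starts a new session. This is a minimal sketch, not what AOL did: it assumes rows sorted by time and a 30-minute inactivity cutoff, both illustrative choices, and `session_pseudonyms` is a hypothetical name.

```python
import secrets
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff

def session_pseudonyms(rows):
    """Replace a persistent user ID with a fresh random token per session.

    rows: iterable of (user_id, timestamp, query), sorted by timestamp.
    A new session starts when the same user is idle longer than SESSION_GAP.
    """
    last_seen = {}  # user_id -> timestamp of that user's previous query
    token = {}      # user_id -> token for that user's current session
    out = []
    for user_id, ts, query in rows:
        prev = last_seen.get(user_id)
        if prev is None or ts - prev > SESSION_GAP:
            token[user_id] = secrets.token_hex(8)  # 64 random bits
        last_seen[user_id] = ts
        out.append((token[user_id], ts, query))
    return out
```

This only removes the explicit link between sessions. A single session in which someone searches for their own name and a nearby restaurant can still point straight at them.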
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Pope Catholic.
Bear craps in woods.
Why would it be foolish? AOL search is just Google, anyway.
Yeah, it's all the creepiness of Google, but without the "do no evil" oversight. What could possibly be wrong with that?
Telecom companies can do a lot of this stuff just as well as, if not better than, Google, and they are getting paid quite well. That does not stop them from selling your name to spam lists, or selling phone records to government spies.
There is a moral bankruptcy in this society that ensures that anything goes as long as you get what you want and can get away with it. Individuals have no strength or bargaining power to defend themselves against corporate predation, except by not using the service.
Until people start balking at bad TOSes en masse, there can be no change. Unfortunately, most individuals feel that as long as they get what they want, there is no need to make a fuss.
Children? After reading some of those search queries it's the dogs I'm worried about.
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists?
Yeah, it's all the creepiness of Google, but without the "do no evil" oversight. What could possibly be wrong with that?
Ever hear the phrase "hindsight is 20/20"? And do you really think some "do no evil" corporate mantra is going to protect you until the end of the universe? I could say you're just as stupid for using Google (after all, didn't you realize that in May 2009 they're going to release all your search queries?)
If you look at the Something Awful website, they have a bunch that are just like that. They also have some that, while hilarious, indicate an alarming criminal intent by the searcher. So one wonders how long before some agency gets a whiff of this and demands access to the whole thing, mapped to the names and addresses of the account holders...
No, actually they hoped for even more money this way.
See, just selling email addresses to the spammers doesn't actually bring _that_ much money. See the AOL idiot who exported the database and sold it to spammers. He's got, I think, some tens of thousands of dollars. That's small change for a corporation.
Even if they sold it together with the search keywords, how much do you think they'd get? Hundreds of thousands? Let's even be generous and say a couple of millions? Those guys have to think of their own profit, so don't think you'll get all their advertising money for the next decade. That's hardly worth the effort.
No, if you want to maximize your income, the way to go is to do the keyword matching _yourself_. See: Google.
So excuse me if I don't think that AOL did some noble altruistic act, out of sheer generosity towards the academia. It was a desperate "someone please please please kill Google for us" act, and they stand to gain a _lot_ more if it actually works. Maybe they packed it in some PR double-speak to sound like some kind act, so they'd get their public image polished too in the process, but rest assured that even that wasn't the primary motivation. It was about long term money, plain and simple.
And to that end, they had no qualms with raping their users' privacy. If it hadn't been so hare-brained and hadn't backfired, rest assured that the same CTO would now be getting a big fat bonus as a reward.
A polar bear is a cartesian bear after a coordinate transform.
If they really wanted to make the most money possible, they would have sold these logs (non-anonymized) to the scores of direct marketers that I'm sure would love to have this data.
No.
1. Why would they want to sell the whole cow instead of just selling milk? The less info sold, the more likely that telemarketers will keep coming back for more. Actual names are super valuable, so you don't sell those right away, you exhaust your first level of sales and then resell with more info.
2. Protection. If they attach names, customers might get angrier, and/or could find out who leaked the info. This whole business is untested in the courts so no corporation is going to want to be the test case.
Your theory on AOL's intentions seems pretty far-fetched and naive to me. But whatever their intent was, what they really did was advertise their goods to a lot of telemarketers who may soon become customers of AOL or of other engines.
If your SPAM mail starts to eerily echo your search queries, do not be surprised.
Your theory on AOL's intentions seems pretty far-fetched and naive to me.
Well, if their intentions were otherwise, they certainly covered their tracks well:
Given that their response after the public outcry hasn't been a textbook case study in crisis management, I'd be surprised if these actions were just some kind of smokescreen.
If I had any mod points left, parent would definitely be +1 Funny...
Web front-end to search the 2 GB log file: http://czern.homeip.net/aolsearch/
If she floats, she's a witch.
Sending a few lambs to slaughter won't change the direction of the flock. If all the execs and management of this company were completely different people, the same decision would have been made.
Did anyone really think this wasn't a preplanned action by AOL? They probably thought everyone would think they were cool and that they could get recognition from releasing this. Of course, it backfired, and everyone hates AOL more than ever.
And the solution is to stop trusting these guys and use your own head. Here's a good way to do just that, with a search proxy:
http://www.blackboxsearch.com/
Reminds me of the Simpsons episode where the town splits along the new area codes. Homer (after seeing a telephone company's promotional video): "I accuse the phone company of making that video... ON PURPOSE!"
Here is an example. The post above yours (threaded) has three links to some interesting examples of this data. The last one is a rather strange fellow who seems to like the idea of dogs and women together. He has a few other rather strange ideas as well. Obviously AOL did not release this data with their identifiers, but if you look at the full listing of this users history, it also shows the following lines:
9550729 bouie elementary school teacher exposed by nbc news
9550729 do female elementary school teachers get sexually aroused by their pre-teen male students
9550729 do female elementary school teachers get sexually aroused by their pre-teen male students
9550729 story on mr. mula the bouie elementary school teacher accused of assaulting a student
9550729 teacher at bouie elementary school in dekalb county accused of pushing a student
There are also other questions about similar locations. Now, it is getting interesting as this LOOKS like an identifiable teacher reading about child porn and checking search engines to see if his past is haunting him. It may very well be the case that Mr Mula is just an interest of his and this is out of context, but it would be worth looking into.
I hope that pedophiles are outed with this data, but there are going to be lots of innocents who are also caught up in it. For instance if you once looked up jihad and then looked up your name, you might get a tap on the shoulder one day. Of course investigators will be able to look at logs anyway, but there are a lot more eyes looking on this one.
Why would it be foolish? AOL search is just Google, anyway.
AOL search is just Google search, except that they are also your ISP and can conclusively and definitively aggregate all of your searches and assign them a unique identifier, then release them to the public.
Using Google search and clearing your cookies, you cannot be definitively matched to what you look for.
Yes, I've searched for my name, but I've found about a dozen others around the world with my same name (where are the privacy implications in that?).
Have you also searched for "Pizza Hut near ?" Voila, all those others with the same name are eliminated; it is clear that the search was about you. Piece together more searches, and it becomes clear that it was you, not about you. Then anything else you have ever searched for is correlated as well.
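The elimination step can be sketched as simple filtering. This is a minimal illustration with a made-up population and made-up clues (the names, towns and clue functions are all hypothetical): each search that mentions a name or a place discards everyone who doesn't match, and a couple of clues can be enough to leave a single candidate.

```python
# Hypothetical people who share the searched-for name.
people = [
    {"name": "Pat Smith", "town": "Oakland"},
    {"name": "Pat Smith", "town": "Leeds"},
    {"name": "Pat Smith", "town": "Perth"},
]

# Hypothetical clues drawn from one user ID's search history.
clues = [
    lambda p: p["name"] == "Pat Smith",  # searched for own name
    lambda p: p["town"] == "Oakland",    # searched for a pizza place in Oakland
]

def narrow(candidates, constraints):
    """Keep only the candidates consistent with every clue, one clue at a time."""
    for check in constraints:
        candidates = [p for p in candidates if check(p)]
    return candidates

if __name__ == "__main__":
    print(narrow(people, clues[:1]))  # name alone: three matches
    print(narrow(people, clues))      # name plus location: one match
```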
It isn't any one search that causes a privacy concern. As others point out, some search engines let you see what people are searching for, either a single search at random, or as aggregate data. However, no one else correlates all the searches from the same person into one list, easily identifiable as such.