Execs at AOL Approved Release of Private Data?
reporter writes "The New York Times has published a report providing further details about the release of private AOL search queries to the public. According to the report: 'Dr. Jensen, who said he had worked closely with Mr. Chowdhury on projects for AOL's search team, also said he had been told that the posting of the data had been approved by all appropriate executives at AOL, including Ms. [Maureen] Govern.' The report also identifies the other two people whom AOL management fired: they are Abdur Chowdhury and his immediate supervisor. Chowdhury is the employee who did the actual public distribution of the private search queries. He, apparently, has retained a lawyer."
First they demote him from lieutenant commander. Then they attach him to AOL. Somewhere Lore must be pulling the strings.
Where were you when the voynix came?
At this point, why would you want to stay at your current job if you need a lawyer to keep it? Even if you succeed, why stay? Management obviously won't like you, since they're already trying to get rid of you. Or am I missing something?
---
Programming is like sex... Make one mistake and support it the rest of your life.
Almost everything a company does, especially in public, needs multiple stamps of approval. You can't even order a pencil without paperwork. Right now AOL is hunting for scapegoats to sacrifice to appease the masses. Nearly everybody must have OKed this. If it had been a mistake, it would have been yanked back a LOT faster and legal action would be pending. They aren't threatening anybody yet, probably because they don't want their own records pulled out, which would make them massively liable.
I'm not at all sure why they thought it was a good idea. They must have thought the ID numbers were enough to conceal identities, which also shows how little most executives know about security.
Who else would have the phenomenal insight to give us such gems as
l /8-21-06_21.gif
l /8-13-06_26.gif
l /8-21-06_9.gif
http://i.somethingawful.com//sasbi/2006/08/docevi
http://i.somethingawful.com//sasbi/2006/08/docevi
and of course
http://i.somethingawful.com//sasbi/2006/08/docevi
I bet they have a stamp that says that.
-Dave
>>> so correct me if I'm wrong
You're wrong.
The IP address or user name of the person who searched has been removed, but it was replaced with a unique identifier that tracked all of the searches by the same person.
Many people search for things related to themselves. For example, if you have looked for a job in the last four years, you were foolish if you didn't search for your own name to see if your friends' blogs had descriptions of your late-night drinking binges and drug use. (You are probably foolish if you used AOL search to do this, but that's a different discussion.)
CNN ran a story where they were able to track down one older lady, just because she searched for her last name, searched for "drugstores near " or somesuch, and was the only person in her area with that name. They confirmed with her that the searches were hers. (She has a dog with problems urinating on her carpet, and she has friends with lots of diseases that she "researches" for them.) They picked someone to track down who hadn't searched for anything "naughty", but that doesn't mean they couldn't have if they had wanted to.
It doesn't hurt to be nice.
The search terms were not linked to any specific person; however, each search term was linked to a user ID, so you can compile a list of all the searches a specific person did.
Combine this with the fact that some people search for their own name, credit card number, and Social Security number to see if they're posted anywhere, and you have some serious privacy concerns.
Take, for example (and I'm making this up), user #5. These are his search terms:
Joe Schmo
014-56-1234
4729-1234-5678-9012
Pizza stores near 1 main street, oakland, CA
Would you want this released to the public? What if some more of his search terms were:
How to divorce your wife
divorce lawyers
dating websites
how to cheat on your wife
russian brides
OK, granted, maybe you don't agree with what he's doing here, but is it right for this to be public?
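To make the "compile a list" point concrete, here is a minimal sketch, assuming a log of (user ID, query) rows like the one described above. The rows, the IDs, and the function name `profiles_by_user` are all made up for illustration; nothing here is AOL's actual format. It shows that once the same pseudonymous ID is attached to every search, reassembling a person's history is a trivial group-by.

```python
from collections import defaultdict

# Hypothetical log rows: (pseudonymous user ID, query).
# The ID replaces the IP address or screen name, but it stays the
# same for every search the same person makes.
log = [
    (5, "joe schmo"),
    (7, "weather oakland"),
    (5, "pizza stores near 1 main street, oakland, ca"),
    (7, "cheap flights"),
    (5, "divorce lawyers"),
]

def profiles_by_user(rows):
    """Group queries by pseudonymous ID, keeping the order they appear in."""
    profiles = defaultdict(list)
    for user_id, query in rows:
        profiles[user_id].append(query)
    return dict(profiles)

profiles = profiles_by_user(log)
print(profiles[5])
```

No names were released, yet user #5's name, street address, and divorce search all end up in one list.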
Whatever happened to the "information just wants to be free" argument? Where's that now?
People generally feel that their search queries are private. They may not actually be private, but they feel private. Likewise, your phone conversations aren't completely private, but the phone company can't just dump your conversations on the public.
I have read many articles on the analysis of the released AOL data. Some of the articles start off something like this:
"I think the release of this data is a breach of privacy and should never have been made public. But..."
Then they present their analysis. My question is: if you are going to preach about the evils of releasing the data, do you have the moral right to analyze it? I think not.
I'd say this only proves the point - this information wanted to be free badly enough to escape from AOL, leaving a trail of career destruction in its wake!
Not even close to that simple. AOL didn't stand to make any money off this situation. The data was provided entirely "altruistically" for the benefit of researchers.
And what are these researchers "researching"? They are studying how to make searches more relevant, among other things.
Will more relevant searching make a buck for someone? Well, it's done wonders for Google, but the release of this data isn't making that research an AOL property. And we love Google because it gives us what we want to see up front, without digging for it.
In the end, the release of this data is a good thing, but the implementation of the release failed badly. Nevertheless, we *want* this data to get out, we just don't want it to get out in any way that can tag individuals.
Today the "little guy's" only defense against being taken advantage of by major corporations and the government is information and the ability to think for himself. A major problem, though, is that even those few trying to think for themselves are at the mercy of the information they are given, because that's the information on which they base their decisions. The more corporations and governments know about what we are interested in and find important, the more they can tailor the information we receive to influence us in their direction.
Classic marketing and academic research aren't the issue here. The issue is our ability to choose. This is the same reason the Net Neutrality issue matters: it can directly affect our ability to find good (useful, true) information. Even if these issues weren't considered when the data was released (and I'm sure they were), sharing personal data like this amounts to criminal negligence toward other people's quality of life, and yes, their lives. Among the people using this information are those who directly affect our ability to live, yet seem to be driven more by monetary concerns, such as pharmaceutical companies.
That's a bit cynical, don't you think?
If they really wanted to make the most money possible, they would have sold these logs (non-anonymized) to the scores of direct marketers that I'm sure would love to have this data. Instead, they packaged it up and tried to make it available to academic researchers. These researchers honestly just want to make better search engines that run faster and return better results. Furthermore, when academics come up with a great new idea, it gets published so that anyone can read it.
Every once in a while, someone suggests an open source search engine. Check out Nutch if you want to see work in this area. However, if open source search solutions are going to be any good at all, they'll have to rely on the decades of public, published information retrieval research that's already out there.
We are entering a time when companies are capable of totally outpacing academia because they have query log data, so they know exactly what users actually do. There is no way an academic can get this kind of data unless a company releases it. Researchers at AOL, in good faith, tried to release data so that outside researchers could have a chance at success. Ultimately, of course, that's good for AOL, since they're not in the top three search engines out there. Public research can only help raise AOL's standing by helping to level the playing field. But it's good for you as well, because you can build your open source solution on this research.
Yes, the release was botched, and yes, the long term user identifiers were a mistake. But don't make AOL out to be some evil company that was only out to destroy your privacy. They made a mistake!
The real problem is that they shouldn't have been keeping it in the first place!
If releasing the data can harm a consumer, then the company merely having it in their possession can harm that same consumer. Just how is AOL that much better or more trustworthy than the world at large?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
I think you bring up a good point.
As a society, or at least as a subset of one, we need to discuss this. Where should the "expectation of privacy" be when one is using a search engine (or the Internet in general)? It's a very open question.
On one hand, I think most people realize that a query to a search engine is not 'private.' At any given time you can go and view all the things being typed into Google. (At least you used to be able to, or maybe this was Yahoo.) At any rate, the queries themselves are not secret.
However, what freaks people out is that one query can be associated with another. If I type in my name, I expect that somebody on the far end knows I'm searching for my name. What people don't expect is that all the searches they've made can be linked together (potentially across multiple computers, if there's a login system). So my search for my name today could be cross-referenced with my search for restaurants in a particular area tomorrow, and further with some street address I search for the day after that.
Only a very naive person would expect an individual query to be private. However, the cross-referenced information sorted by user is conceivably private, because it reveals much more than the individual queries do.
Let's imagine for instance that AOL had released the same number of searches, but instead of listing the IP address (or a unique identifier that's matched 1:1 with an IP address) they just gave a time/date stamp when it was made. We probably wouldn't be having this conversation, and a few executives would still have their jobs.
Where people expect some sort of privacy (reasonably or not) is in not having one particular "search session" linked to other ones. In fact, I bet most non-technical people think they can close their browser and thus 'start over,' not realizing that when they start searching again, their queries just get added to the same list as before. That "recordkeeping" is where the perceived invasion occurs, not in the lack of secrecy of the terms themselves.
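As a rough sketch of the alternative floated above (release the queries with only a time/date stamp), assuming a hypothetical raw log of (user ID, timestamp, query) rows; the data and the name `release_without_ids` are invented for illustration. Dropping the ID column removes the explicit key that lets someone stitch queries together, though timestamps and query content can still leak information, so this is not a guarantee of anonymity.

```python
from datetime import datetime

# Hypothetical raw log: (persistent user ID, timestamp, query).
raw_log = [
    (5, datetime(2006, 3, 2, 18, 40), "divorce lawyers"),
    (5, datetime(2006, 3, 1, 9, 15), "joe schmo"),
    (7, datetime(2006, 3, 1, 9, 16), "weather oakland"),
]

def release_without_ids(rows):
    """Drop the user ID and keep only (timestamp, query), sorted by time."""
    return sorted((ts, query) for _user_id, ts, query in rows)

released = release_without_ids(raw_log)
for ts, query in released:
    print(ts.isoformat(), query)
```

With the ID gone, user #5's two searches sit in the released list as unrelated rows, with user #7's search in between.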
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
And the solution is to stop trusting these guys and use your own head. Here's a good way to do just that, with a search proxy:
http://www.blackboxsearch.com/