NSF Funds Data Anonymization Project

Posted by Soulskill on Tuesday November 2, 2010 @06:02AM from the maybe-facebook-will-donate dept.

Trailrunner7 writes "A group of researchers from Purdue University has been awarded $1.5 million from the National Science Foundation to help fund an ongoing project that's investigating how well current techniques for anonymizing data are working and whether there's a need for better methods. The grant will help to further research from computer scientists and linguists, who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed. The Purdue anonymization project has been ongoing for some time, and also includes researchers from a number of other institutions, including Indiana University and the Kinsey Institute."

Testing Slashdots Methods for Anonymization by Anonymous Coward · 2010-11-02 06:04 · Score: 4, Funny

It works!
Can I pick up my grant check now?
1. Re:Testing Slashdots Methods for Anonymization by drunkennewfiemidget · 2010-11-02 06:14 · Score: 4, Funny
  
  We wish you could, but we don't know who to write the cheque out to.
2. Re:Testing Slashdots Methods for Anonymization by countertrolling · 2010-11-02 06:18 · Score: 1
  
  Cash
  
  --
  For justice, we must go to Don Corleone
3. Re:Testing Slashdots Methods for Anonymization by teachknowlegy · 2010-11-02 06:24 · Score: 2, Funny
  
  Write the check out to Anonymous Coward, duh! When someone produces the ID of the same name, they can have it. Name changes are cheap, aren't they?
4. Re:Testing Slashdots Methods for Anonymization by stephanruby · 2010-11-02 06:25 · Score: 1
  
  Sorry Rob,
  With that number next to your "anonymous" name, #34103516, you might as well just have given us your full social security number.
5. Re:Testing Slashdots Methods for Anonymization by zkp · 2010-11-02 07:42 · Score: 3, Funny
  There are many bits of information we can glean!
  
  Your "anonymous" name, #34103516
  
  Date and Time: (Tuesday, November 02 @ 6:04PM)
  
  You were one of the first posts so you probably read Slashdot often. Also, you probably visit Slashdot regularly around 6:00 PM.
  
  Writing Style: Short messages, funny
  
  So I could search for regular Slashdot users who tend to be active around 6:00 PM, post brief messages, and are often among the first to comment. Narrow down that list to users who actually did log in on 11/02/2010. Since we know that you did read this article, there is also a decent chance that you commented on this article with your actual user name.
  
  We will find you!
6. Re:Testing Slashdots Methods for Anonymization by aapold · 2010-11-02 11:19 · Score: 1
  
  If you cash it, you might get an NSF fee....
  
  --
  "Waste not one watt!" - CZ
Hmmm by WrongSizeGlass · 2010-11-02 06:05 · Score: 2, Insightful

I wonder if they could get a larger grant from Google or Facebook or the NSA or [insert large organization name here] to get a guaranteed result of "things are just fine, nothing to see here"?
1. Re:Hmmm by natehoy · 2010-11-02 06:11 · Score: 1
  
  No, that would be wrong, of course. They'd never be able to accept a grant. It could never happen. Ever.
  But only because, technically, it's called a bribe, not a grant. If you want to call it a grant, you have to put it in quotes, as in: "I wonder if they could get a larger "grant" from..."
  
  --
  "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
2. Re:Hmmm by elrous0 · 2010-11-03 02:09 · Score: 1
  
  Now why would the NSA be interested in technology that could identify anonymous posters using "textual clues even after explicitly identifiable data has been removed"? That's just silly talk.
  
  --
  SJW: Someone who has run out of real oppression, and has to fake it.
Did anybody else read.... by tacarat · 2010-11-02 06:25 · Score: 1

NSFW?

--
"Common sense will be the death of us all"
1. Re:Did anybody else read.... by Anonymous Coward · 2010-11-02 06:59 · Score: 2, Funny
  
  No, but I read NSF Funds, and thought, why is slashdot doing a story on my wife?
2. Re:Did anybody else read.... by kmoser · 2010-11-02 16:05 · Score: 2, Insightful
  
  I read it as "NSFW" and thought the same thing: why is Slashdot doing a story on your wife?
How Benevolent Of The N.S.F. ( +3, Intrusive ) by Anonymous Coward · 2010-11-02 06:25 · Score: 5, Interesting

"The grant will help to further research from computer scientists and linguists, who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed." SHOULD READ
"The grant will help to further research from computer scientists, linguists, AND the N.S.A. who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed."
Yours In Krasnoyarsk,
Kilgore T.
NSF by Combatso · 2010-11-02 06:28 · Score: 2, Insightful

Headline had me thinking the science grants were returned Non Sufficient Funds... that's a sign of a really bad economy.
1. Re:NSF by bhcompy · 2010-11-02 06:30 · Score: 1
  
  Yea, seriously. It's the Non/In Sufficient Funds funds
2. Re:NSF by Combatso · 2010-11-02 06:38 · Score: 1
  
  You mean every time I drop an NSF and my bank charges me 20 dollars, that 20 dollars goes to fund Data Anonymization... but there is no way to know for sure, because they didn't catch the name of the guy they gave my money to.
3. Re:NSF by Philomage · 2010-11-02 08:16 · Score: 1
  
  Even after getting that it was about the National Science Foundation providing funding for a research grant, I was still reading (for a while) to see what it had to do with kiting cheques. :-/ "You can take the nerd out of the trailer park..."
Interesting Spin by Anonymous Coward · 2010-11-02 06:44 · Score: 3, Informative

The research is actually into data mining, not some new forms of encryption/anonymization.
I'm sure the results will provide insight that may lead to better anonymization, but I bet framing the whole thing around the more popular side of that spectrum makes it sell better.
Good or bad? by kwbauer · 2010-11-02 07:10 · Score: 1

So, is this a good development or a bad development? If finding better ways to identify people leads to better ways to remove that information, then is it better?
Or is it better because it will help us not remain anonymous when we donate to our favorite cause and that organization is in some way involved in US politics?
Re:How can you anonymize without removing meaning by poena.dare · 2010-11-02 07:17 · Score: 1

... in other words, meatloaf!
link to NSF grant by Danathar · 2010-11-02 07:38 · Score: 1

http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1012208
above is direct link to award
Re:How can you anonymize without removing meaning by Monkeedude1212 · 2010-11-02 07:42 · Score: 1

And I don't see how can you have meaningful data if you removed all the information that would enable you to recreate the individual sample.
You can still segregate them into groups even if you can't identify the individual sample - which is essentially what happens already. Data miners go and determine "People who like Penny Arcade also like video games - so let's put an ad for Fable 3 up on the main page" - whether that is Penny Arcade's decision to get more click revenue or whether they just let an ad server handle that obvious piece of info is irrelevant; you are still using relevant data with meaning to market to a large group of people instead of an individual.
Now - this article brings up the idea of whether I can retain my anonymity online. If it were up to me to run this experiment, I would do exactly as you said, some advanced behaviour analysis technique. First we'll start off here: I'm on Slashdot. You have an alias, and you have a few of my posts. You can tell that they tend to get a little long-winded sometimes, easily getting to 3 or more paragraphs if there isn't an immediate punchline in sight, or when responding to a question. You also get what stories I usually respond to - I often don't have much to say about Linux releases, but I am often avid in the gaming area.
So you go to a lot of the other sites that you can infer Slashdotters might frequent. All the tech news sites, and then those that fit my posting habits: a lot of gaming sites, yadda yadda yadda. The first thing you are looking for is similar aliases, then you cross-reference the posts on different sites to see the similarities. How many Monkeedudes are there on the Gamespy forums? Do any of them make really long posts? He's mentioned on Slashdot that he is Canadian - do any of the other sites have public profile info that says he's Canadian?
And so on and so forth. This is all automated - so it's much quicker than a person trying to build this file. After it's all built, a human can quickly skim the data and knock off any outliers that might have seemed similar to the computer.
Now - have I ever mentioned my name anywhere in all the data collected? My age? My city? Can you infer my age given the relative maturity of my posts - and my registered dates and other posts online? Can you infer my city based on my jokes about the weather around here? How hard would it be to nail me to a Facebook page with various likes and dislikes - if that information were available to you (either publicly or for sale)?
It's a scary world we live in. I don't know if any such systems exist, but I see it as definitely technically feasible. It also seems like a great product I could market and make lots of money off of - but I definitely don't believe in progressing that side of the internet.
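To make the cross-site matching above a bit more concrete, here is a rough sketch in Python. It is only an illustration under assumptions, not a description of any real system: the function names, the character-trigram style profile, and the 50/50 weighting between alias and style are all made up for the example, and a serious attempt would need far more posts and far more features (posting times, topics, profile info).

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def trigram_profile(posts):
    """Count character trigrams across all of an account's posts."""
    counts = Counter()
    for post in posts:
        text = " ".join(post.lower().split())  # normalise case and whitespace
        counts.update(text[i:i + 3] for i in range(len(text) - 2))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def alias_similarity(x, y):
    """Similarity ratio between two aliases, ignoring case."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def rank_candidates(known_alias, known_posts, candidates, alias_weight=0.5):
    """Rank candidate accounts by a weighted mix of alias and style similarity.

    `candidates` maps alias -> list of posts; `alias_weight` is an arbitrary
    illustrative choice, not a tuned value.
    """
    known_profile = trigram_profile(known_posts)
    scored = []
    for alias, posts in candidates.items():
        style = cosine(known_profile, trigram_profile(posts))
        name = alias_similarity(known_alias, alias)
        scored.append((alias_weight * name + (1 - alias_weight) * style, alias))
    return sorted(scored, reverse=True)
```

You would call `rank_candidates("Monkeedude1212", my_slashdot_posts, accounts_from_other_site)` and hand the top few results to a human to knock off the outliers, which is the manual skim step mentioned above.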
Simple equation by shoehornjob · 2010-11-02 07:44 · Score: 1

government entity (CIA+NSA) * national security + keylogger or trojan = we ownz all your base (where base = data). Anonymization HAH.

--
"We are just a war away from Amerikastan. When god vs god the undoing of man." Dave Mustaine
Google translate? by mveloso · 2010-11-02 11:13 · Score: 1

For better anonymization, you could run the data through Google Translate a few times. That'll guarantee that it's anonymized.