Slashdot Mirror


NSF Funds Data Anonymization Project

Trailrunner7 writes "A group of researchers from Purdue University has been awarded $1.5 million from the National Science Foundation to help fund an ongoing project that's investigating how well current techniques for anonymizing data are working and whether there's a need for better methods. The grant will help to further research from computer scientists and linguists, who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed. The Purdue anonymization project has been ongoing for some time, and also includes researchers from a number of other institutions, including Indiana University and the Kinsey Institute."

36 comments

  1. Testing Slashdots Methods for Anonymization by Anonymous Coward · · Score: 4, Funny

    It works!

    Can I pick up my grant check now?

    1. Re:Testing Slashdots Methods for Anonymization by drunkennewfiemidget · · Score: 4, Funny

      We wish you could, but we don't know who to write the cheque out to.

    2. Re:Testing Slashdots Methods for Anonymization by countertrolling · · Score: 1

      Cash

      --
      For justice, we must go to Don Corleone
    3. Re:Testing Slashdots Methods for Anonymization by teachknowlegy · · Score: 2, Funny

      Write the check out to Anonymous Coward, duh! When someone produces the ID of the same name, they can have it. Name changes are cheap, aren't they?

    4. Re:Testing Slashdots Methods for Anonymization by stephanruby · · Score: 1

      Sorry Rob,

      With that number next to your "anonymous" name, #34103516, you might as well just have given us your full social security number.

    5. Re:Testing Slashdots Methods for Anonymization by zkp · · Score: 3, Funny
      There are many bits of information we can glean!
      1. Your "anonymous" name, #34103516
      2. Date and Time: (Tuesday, November 02 @ 6:04PM)
      3. You were one of the first posts so you probably read Slashdot often. Also, you probably visit Slashdot regularly around 6:00 PM.
      4. Writing Style: Short messages, funny

        So I could search for regular Slashdot users who tend to be active around 6:00 PM, post brief messages, and are often among the first to comment. Narrow that list down to users who actually logged in on 11/02/2010. Since we know that you read this article, there is also a decent chance that you commented on it under your actual user name.

        We will find you!
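
      The narrowing described above can be sketched as a simple filter. This is only an illustration: the `UserProfile` fields, the thresholds, and the sample users are all made up, and nothing here reflects data Slashdot actually exposes.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    # Every field is a hypothetical feature an observer might have collected.
    name: str
    typical_hour: int            # hour of day (0-23) the user usually posts
    avg_words_per_post: float    # rough measure of "short messages"
    early_comment_rate: float    # share of posts that are among the first on a story
    active_dates: set = field(default_factory=set)


def narrow_candidates(users, hour, date, max_words=30, min_early_rate=0.5):
    """Keep users whose habits match the clues left by the anonymous post."""
    return [
        u.name
        for u in users
        if abs(u.typical_hour - hour) <= 1         # active around the same time
        and u.avg_words_per_post <= max_words      # writes brief messages
        and u.early_comment_rate >= min_early_rate # often one of the first posters
        and date in u.active_dates                 # was logged in that day
    ]


users = [
    UserProfile("alice", 18, 12.0, 0.7, {"2010-11-02"}),
    UserProfile("bob", 9, 10.0, 0.8, {"2010-11-02"}),
    UserProfile("carol", 18, 250.0, 0.6, {"2010-11-02"}),
    UserProfile("dave", 17, 20.0, 0.9, {"2010-11-01"}),
]

print(narrow_candidates(users, hour=18, date="2010-11-02"))  # ['alice']
```

      Each clue is weak by itself, but every added condition shrinks the candidate list further.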
    6. Re:Testing Slashdots Methods for Anonymization by Anonymous Coward · · Score: 0

      Not anon at all. Your UID is 666.
      And you stole my account you asshole!

    7. Re:Testing Slashdots Methods for Anonymization by aapold · · Score: 1

      If you cash it, you might get an NSF fee....

      --
      "Waste not one watt!" - CZ
    8. Re:Testing Slashdots Methods for Anonymization by Anonymous Coward · · Score: 0

      We got your time, You done left evidence and all, You are so dumb, for real you are really dumb

      You don’t have to come and confess, We’re lookin for you, We gon find you!

  2. Hmmm by WrongSizeGlass · · Score: 2, Insightful

    I wonder if they could get a larger grant from Google or Facebook or the NSA or [insert large organization name here] to get a guaranteed result of "things are just fine, nothing to see here"?

    1. Re:Hmmm by natehoy · · Score: 1

      No, that would be wrong, of course. They'd never be able to accept a grant. It could never happen. Ever.

      But only because, technically, it's called a bribe, not a grant. If you want to call it a grant, you have to put it in quotes, as in: "I wonder if they could get a larger "grant" from..."

      --
      "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
    2. Re:Hmmm by elrous0 · · Score: 1

      Now why would the NSA be interested in technology that could identify anonymous posters using "textual clues even after explicitly identifiable data has been removed"? That's just silly talk.

      --
      SJW: Someone who has run out of real oppression, and has to fake it.
  3. WTF by Anonymous Coward · · Score: 0

    This is a sheer waste of money.

  4. How can you anonymize without removing meaning by Anonymous Coward · · Score: 0

    You can remove the nominative information, but the data is still not anonymous, because you can use advanced techniques like behaviour analysis to segment your samples back to the individual and then correlate them with some known data about the individual to identify them.

    If you remove enough discriminative information (information that enables you to separate your sample into groups), your data starts losing meaning fast. And I don't see how you can have meaningful data if you removed all the information that would enable you to recreate the individual sample.
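
    One common way to picture this trade-off is k-anonymity: coarsen the fields until every record shares its values with at least k-1 others. The sketch below is a toy, assuming made-up records with only `age` and `zip` fields; the helper names are hypothetical.

```python
from collections import Counter

# Made-up records; "age" and "zip" stand in for discriminative information.
records = [
    {"age": 23, "zip": "47906"},
    {"age": 27, "zip": "47907"},
    {"age": 24, "zip": "47901"},
    {"age": 41, "zip": "47408"},
    {"age": 45, "zip": "47405"},
    {"age": 48, "zip": "47401"},
]


def generalize(record, age_bucket, zip_digits):
    """Coarsen a record: bucket the age and keep only a zip prefix."""
    low = (record["age"] // age_bucket) * age_bucket
    return (f"{low}-{low + age_bucket - 1}", record["zip"][:zip_digits])


def smallest_group(records, age_bucket, zip_digits):
    """Size of the smallest group of indistinguishable records (the 'k')."""
    groups = Counter(generalize(r, age_bucket, zip_digits) for r in records)
    return min(groups.values())


print(smallest_group(records, age_bucket=1, zip_digits=5))   # 1: everyone is unique
print(smallest_group(records, age_bucket=10, zip_digits=3))  # 3: safer, but coarser
```

    The second call hides each person among three, at the cost of knowing only a decade and a zip prefix, which is the loss of meaning described above.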

    1. Re:How can you anonymize without removing meaning by poena.dare · · Score: 1

      ... in other words, meatloaf!

    2. Re:How can you anonymize without removing meaning by Monkeedude1212 · · Score: 1

      And I don't see how you can have meaningful data if you removed all the information that would enable you to recreate the individual sample.

      You can still segregate them into groups even if you can't identify the individual sample - which is essentially what happens already. Data miners go and determine "People who like Penny Arcade also like video games - so let's put an ad for Fable 3 up on the main page". Whether that is Penny Arcade's decision to get more click revenue or whether they just let an ad server handle that obvious piece of info is irrelevant; you are still using relevant data with meaning to market to a large group of people instead of an individual.

      Now - this article brings up the idea of whether I can retain my anonymity online. If it were up to me to run this experiment, I would do exactly as you said: some advanced behaviour analysis technique. First we'll start off here: I'm on Slashdot. You have an alias, and you have a few of my posts. You can tell that they tend to get a little long-winded sometimes, easily getting to 3 or more paragraphs if there isn't an immediate punchline in sight, or when responding to a question. You also get what stories I usually respond to - I often don't have much to say about Linux releases, but I am often avid in the gaming area.

      So you go to a lot of the other sites that you can infer slashdotters might frequent. All the tech news sites, and then those that match my posting habits, a lot of gaming sites, yadda yadda yadda. The first thing you are looking for is similar aliases, then you cross-reference the posts on different sites to see the similarities. How many Monkeedudes are there on the Gamespy forums? Do any of them make really long posts? He's mentioned on Slashdot that he is Canadian - do any of the other sites have public profile info that says he's Canadian?

      And so on and so forth. This is all automated - so it's much quicker than a person trying to build this file. After it's all built, a human can quickly skim the data and knock off any outliers that might have seemed similar to the computer.

      Now - have I ever mentioned my name anywhere in all the data collected? My age? My city? Can you infer my age given the relative maturity of my posts - and my registered dates and other posts online? Can you infer my city based on my jokes about the weather around here? How hard would it be to nail me to a Facebook page with various likes and dislikes - if that information were available to you (either publicly or for sale?).

      It's a scary world we live in. I don't know if any such systems exist, but I see it as definitely technically feasible. It also seems like a great product I could market and make lots of money off of - but I definitely don't believe in progressing that side of the internet.
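
      As a rough illustration of the "cross-reference the posts" step, here is a minimal sketch that compares writing samples by character trigram counts and cosine similarity. It is a stand-in for real stylometry, not a description of any existing system, and the sample texts are invented.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping 3-character sequences in lowercased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity(text_a, text_b):
    return cosine_similarity(trigram_profile(text_a), trigram_profile(text_b))


known = "yadda yadda yadda, and so on and so forth"       # post under a known alias
candidate = "and so on and so forth, yadda yadda"         # post found on another site
unrelated = "zzz qqq xxx"                                 # shares no trigrams with `known`

print(similarity(known, candidate) > similarity(known, unrelated))  # True
```

      A real system would need far more text per author and better features, and a human would still have to knock off the false matches.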

  5. Did anybody else read.... by tacarat · · Score: 1

    NSFW?

    --
    "Common sense will be the death of us all"
    1. Re:Did anybody else read.... by Anonymous Coward · · Score: 2, Funny

      No, but I read NSF Funds, and thought, why is slashdot doing a story on my wife?

    2. Re:Did anybody else read.... by kmoser · · Score: 2, Insightful

      I read it as "NSFW" and thought the same thing: why is Slashdot doing a story on your wife?

  6. How Benevolent Of The N.S.F. ( +3, Instrusive ) by Anonymous Coward · · Score: 5, Interesting

    "The grant will help to further research from computer scientists and linguists, who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed." SHOULD READ

    "The grant will help to further research from computer scientists, linguists, AND the N.S.A. who are looking at ways in which people can still be identified through textual clues even after explicitly identifiable data has been removed."

    Yours In Krasnoyarsk,
    Kilgore T.

  7. NSF by Combatso · · Score: 2, Insightful

    Headline had me thinking the science grants were returned Non-Sufficient Funds... that's a sign of a really bad economy.

    1. Re:NSF by bhcompy · · Score: 1

      Yea, seriously. It's the Non/In Sufficient Funds funds

    2. Re:NSF by Combatso · · Score: 1

      You mean every time I drop an NSF and my bank charges me 20 dollars, that 20 dollars goes to fund Data Anonymization... but there is no way to know for sure, because they didn't catch the name of the guy they gave my money to.

    3. Re:NSF by Anonymous Coward · · Score: 0

      What bank charges only 20 dollars for NSF? All the banks I know of charge at least twice that.

    4. Re:NSF by Philomage · · Score: 1

      Even after getting that it was about the National Science Foundation providing funding for a research grant, I was still reading (for a while) to see what it had to do with kiting cheques. :-/ "You can take the nerd out of the trailer park..."

  8. Interesting Spin by Anonymous Coward · · Score: 3, Informative

    The research is actually into data mining, not some new forms of encryption/anonymization.

    I'm sure the results will provide insight that may lead to better anonymization, but I bet framing the whole thing around the more popular side of that spectrum makes it sell better.

  9. Your tax dollars going to waste! by Anonymous Coward · · Score: 0

    If private enterprise won't fund it, it isn't worth doing. Kill the NSF! Kill the DOE! Privatize NOAA & NIST!

    Sincerely,

    Citizen Tea

  10. Good or bad? by kwbauer · · Score: 1

    So, is this a good development or a bad development? If finding better ways to identify people leads to better ways to remove that information, then is it better?

    Or is it better because it will help us not remain anonymous when we donate to our favorite cause and that organization is in some way involved in US politics?

  11. The EFF already did this... by Anonymous Coward · · Score: 0

    Panopticlick already showed that it was child's play to track somebody, even with cookies disabled. Unless the way websites/browsers work is fundamentally changed, this will continue to be the case.
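
    Browser fingerprinting of this kind is usually scored in bits of identifying information: an attribute value shared by a fraction p of browsers contributes -log2(p) bits. The sketch below uses invented fractions and assumes the attributes are independent, which is a simplification (real attributes are correlated, so the true total would be lower).

```python
import math


def surprisal_bits(fraction_sharing):
    """Bits of identifying information in a value shared by this fraction of browsers."""
    return -math.log2(fraction_sharing)


def combined_bits(fractions):
    """Total bits, assuming (unrealistically) that the attributes are independent."""
    return sum(surprisal_bits(f) for f in fractions)


# Hypothetical fractions for three attributes, e.g. user agent, fonts, time zone.
fractions = [1 / 64, 1 / 16, 1 / 4]

bits = combined_bits(fractions)
print(bits)               # 12.0
print(round(2 ** bits))   # 4096: roughly one browser in 4096 would match
```

    No cookie is involved anywhere in this calculation, which is why disabling cookies does not help.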

  12. link to NSF grant by Danathar · · Score: 1
  13. Simple equation by shoehornjob · · Score: 1

    government entity (CIA + NSA) * national security + keylogger or trojan = we ownz all your base (where base = data). Anonymization, HAH.

    --
    "We are just a war away from Amerikastan. When god vs god the undoing of man." Dave Mustaine
  14. Re:truly anonymous data is often useless by Anonymous Coward · · Score: 0

    truly anonymous data is often useless

    And that differs from fully-attributed data HOW?

  15. Identification After Anonymization by Anonymous Coward · · Score: 0

    can be achieved with conceptual clustering with Galois lattices.

    You can now send me a cashier's check for the sum of Euro 100,000,000,000.

    Thanks in advance.

    Yours In Akademgorodok,
    Kilgore T.

  16. Google translate? by mveloso · · Score: 1

    For better anonymization, you could run the data through Google Translate a few times. That'll guarantee that it's anonymized.

  17. 1.5 million by Anonymous Coward · · Score: 0

    1.5 million won't do that much at all! They are going to need a lot more than that.