Slashdot Mirror


Anonymous Cowards, Deanonymized

mbstone writes "Arvind Narayanan writes: What if authors can be identified based on nothing but a comparison of the content they publish to other web content they have previously authored? Narayanan has a new paper to be presented at the 33rd IEEE Symposium on Security & Privacy. Just as individual telegraphers could be identified by other telegraphers from their 'fists,' Narayanan posits that an author's habitual choices of words, such as, for example, the frequency with which the author uses 'since' as opposed to 'because,' can be processed through an algorithm to identify the author's writing. Fortunately, and for now, manually altering one's writing style is effective as a countermeasure." In this exploration the algorithm's first choice was correct 20% of the time, with the poster being in the top 20 guesses 35% of the time. Not amazing, but: "We find that we can improve precision from 20% to over 80% with only a halving of recall. In plain English, what these numbers mean is: the algorithm does not always attempt to identify an author, but when it does, it finds the right author 80% of the time. Overall, it identifies 10% (half of 20%) of authors correctly, i.e., 10,000 out of the 100,000 authors in our dataset. Strong as these numbers are, it is important to keep in mind that in a real-life deanonymization attack on a specific target, it is likely that confidence can be greatly improved through methods discussed above (topic, manual inspection, etc.)."
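The summary's two ideas, word-choice frequencies as a fingerprint and trading recall for precision by sometimes declining to guess, can be illustrated with a small sketch. This is not the paper's method (its features and classifier are not described here): the `FUNCTION_WORDS` list, the cosine nearest-neighbour match, and the `min_gap` abstention rule are all hypothetical stand-ins chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of function words. The paper's real
# features and classifier are not described in the summary above.
FUNCTION_WORDS = ["since", "because", "the", "of", "and",
                  "however", "also", "such", "which", "that"]


def profile(text):
    """Relative frequency of each function word among all word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anon_text, known_texts, min_gap=0.05):
    """Guess the author of anon_text, or abstain by returning None.

    known_texts maps author -> sample text. A guess is made only when the
    best match beats the runner-up by at least min_gap; raising min_gap
    makes fewer guesses (lower recall) that are more often right.
    """
    target = profile(anon_text)
    scores = sorted(
        ((cosine(target, profile(text)), author)
         for author, text in known_texts.items()),
        reverse=True,
    )
    if len(scores) > 1 and scores[0][0] - scores[1][0] < min_gap:
        return None
    return scores[0][1]


def precision_recall(predictions, truth):
    """Precision over attempted guesses, recall over all authors."""
    attempted = [(p, t) for p, t in zip(predictions, truth) if p is not None]
    correct = sum(1 for p, t in attempted if p == t)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall
```

Plugging in the summary's own figures: a recall of 10% over 100,000 authors is 10,000 correct identifications, and a precision of over 80% means the algorithm made at most about 12,500 attempts (10,000 / 0.8).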

32 of 159 comments

  1. First by Bicx · · Score: 4, Funny

    First! Analyze this anon comment, suckers!

    1. Re:First by Macthorpe · · Score: 5, Funny

      Got you! Using the power of de-anonymisation, I have discovered that you are none other than...

      Bicx!

      This stuff really works.

      --
      "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien
    2. Re:First by FriendlyLurker · · Score: 5, Interesting

      This just begs a "reanonymize" browser plugin to alter one's writing style...

    3. Re:First by Anonymous Coward · · Score: 5, Funny

      "Now imagine 20 or more people doing the same."

      Then I wouldn't want to do any different.

      -- Ethanol-fueled

    4. Re:First by hairyfeet · · Score: 4, Interesting

      Yes, but just like speech patterns, folks got a habit of using similar phrases, which I'm sure this picks up. For example I use folks where some would use people or persons, or if I think something is lame I often say it "Sucks the big wet titty", and I often make reference to the south and southerners since that is my area. I'm sure if it went through every post of every place where I have the same UID (which is most of the places I hang out), it could quite trivially find both my real name (thanks to Yahoo comments using real first names and not UIDs) and any other places where I use a different UID.

      In the end we humans are creatures of habit; we easily fall into patterns and routines, and if there's one thing computers excel at, it's pattern matching, so frankly this doesn't surprise me at all. Given a little time to tweak it, I wouldn't be surprised if they hit 95%+ accuracy when given a large enough data set of a suspected poster. So you might pick up ONE of my phrases, hell maybe even two, but I seriously doubt you'd pick up enough of my mannerisms that this thing would mistake Ethanol Fueled for Hairyfeet or vice versa.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    5. Re:First by mrclisdue · · Score: 2

      Bah!

      Who needs software when we've got at least two dudes here who can identify hundreds of folks known as the Great Bonchime, or something like that....

      cheers,

    6. Re:First by hairyfeet · · Score: 3, Interesting

      But wouldn't that just butcher the flow? I mean a trivial way to do it would be to run it through a translator, say take your English, convert it to German, then have it converted back to English, and you'd have this Chingrish kinda speech that was kinda sorta similar to what you said but not. Would you really want your ideas that mangled? Hell why even post at all if nobody is gonna understand you clearly?

      --
      ACs don't waste your time replying, your posts are never seen by me.
    7. Re:First by garyebickford · · Score: 2

      Hell why even post at all if nobody is gonna understand you clearly?

      Well, this seems to work for about 1/2 the comments on slashdot! :D

      --
      It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
    8. Re:First by lightknight · · Score: 5, Interesting

      And easily-defeated. One of the projects of my senior class at university was the building of software to defeat that kind of detection. It was crafted primarily so dissidents in foreign countries could speak without fear, by analyzing the author's writing patterns, and offering solutions to shift the writing to a different style.

      --
      I am John Hurt.
    9. Re:First by TheLink · · Score: 2

      Even when the post is coherent and clear, half the time the people replying don't seem to be able to read and understand it correctly either :).

      --
    10. Re:First by TheLink · · Score: 3, Funny

      Imagine thousands of accounts doing the same thing then slashdot = stagnated.

      Anyway, why do you cower? What are you afraid of?

      Wait a minute... ;)

      --
    11. Re:First by NatasRevol · · Score: 2

      The Yoda mod?

      --
      There are two types of people in the world: Those who crave closure
  2. better way. by Anonymous Coward · · Score: 5, Interesting

    This is, of course, not really new.

    A couple of years ago, there was some news (cannot find the link now) that some researchers tried this with a more statistical approach. As an implementation they used a compression algorithm.

    I had a try with this on a forum. Somebody posted a long story anonymously, but I suspected the author. I gathered 10 posts from 5 authors, including the suspect. Then I cut the amount of text to equal length. Subsequently I added the anonymous text to each of the 10 samples and bzipped the resulting text.

    The resulting zipped file was shortest in the case where I added the unknown text to the samples from the suspected author. The bzip algorithm apparently decided there was more similarity between the posts.

    Although this was by no means a real scientific test, I turned out to be correct and was rather pleased with the result. Seems to me such an approach could be useful for other things too. Why log in on /. when it can just figure out who you are based on what you have just written?

    To maintain anonymity you would just have to insert random shit into your posts.

    Bonus points for the slashdotter who can deduce my identity based on the non-randomness of this post.

    1. Re:better way. by History's+Coming+To · · Score: 2

      According to I Write Like, you're H.P. Lovecraft. Ha, take that!

      --
      Please consider this account deleted, I just can't be bothered with the spam anymore.
    2. Re:better way. by pjt33 · · Score: 2

      I think they may need to work on that a bit. I just tested three samples of my writing, all in a similar style, and got three different authors.
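The bzip experiment described in the "better way." comment can be sketched with Python's standard `bz2` module. This is a minimal sketch under stated assumptions, not the researchers' implementation the commenter mentions: the function names are hypothetical, and instead of comparing raw compressed totals it scores each author by the extra compressed bytes needed once the anonymous text is appended (subtracting the sample's own compressed size), a small variation that cancels out differences in how compressible each author's sample is on its own.

```python
import bz2


def compressed_size(text):
    """Length in bytes of the bzip2-compressed UTF-8 text."""
    return len(bz2.compress(text.encode("utf-8")))


def rank_by_compression(samples, anon_text):
    """Rank candidate authors by how cheaply their sample 'explains' anon_text.

    samples maps author -> reference text. Samples are truncated to a common
    length, as in the experiment described above. The score is the number of
    extra compressed bytes needed once anon_text is appended; smaller means
    the compressor found more shared structure.
    """
    n = min(len(text) for text in samples.values())
    ranking = []
    for author, text in samples.items():
        ref = text[:n]
        extra = compressed_size(ref + "\n" + anon_text) - compressed_size(ref)
        ranking.append((extra, author))
    return sorted(ranking)
```

bzip2 compresses in blocks of up to 900 kB at the default level, so forum-sized samples plus the anonymous text fit in one block and the compressor can see both at once. On very short posts, fixed header and table overhead dominate, so the scores get noisy.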

  3. Tool to improve your writing skills by bigsexyjoe · · Score: 4, Interesting

    If it can identify you based on your idiosyncrasies, I suppose that means writers could use software based on these techniques to identify the idiosyncrasies in their own writing. From there, they can learn new ways to express themselves and write in a more colorful and varied manner.

    Heck, it can even be a tool that teaches you to think in a more varied manner.

    1. Re:Tool to improve your writing skills by FatLittleMonkey · · Score: 2

      If it can identify the idiosyncrasies in your writing, it can identify them in others'. I wonder if it can alter your "anonymous" controversial rant to look like someone else's.

      --
      Science is all about firing a drunk pig out of a cannon just to see what happens.
    2. Re:Tool to improve your writing skills by rednip · · Score: 2

      I've seen it in my own writings on this forum, as of late. Currently I'm actually trying to 'stay away from' using such words as 'such' (damn!). I also try to reconsider transitions like 'also' and 'however', but obviously it doesn't always work out well. In particular, such notable words are especially awkward when used twice in a single paragraph, as well as a 'double qualifier'. Single quotes can also be too 'notable', as I tend to overuse them as well, and I've been told of my 'addiction' to commas, but I think that I'm ok with those.

      --
      The force that blew the Big Bang continues to accelerate.
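The self-analysis tool bigsexyjoe imagines, flagging words like rednip's 'such' and 'also', can be approximated very crudely by comparing how often you use each word against some reference text. A minimal sketch, assuming a plain frequency ratio with add-one smoothing on the reference counts; this is a swapped-in technique, not the attribution algorithm from the paper, and `overused_words` and its parameters are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def overused_words(mine, reference, top=5, min_count=2):
    """Words used noticeably more often in `mine` than in `reference`.

    Returns (word, ratio) pairs sorted by ratio, highest first. Reference
    counts get +1 smoothing so words absent from it do not divide by zero.
    Words appearing fewer than min_count times in `mine` are ignored.
    """
    my_counts = Counter(tokenize(mine))
    ref_counts = Counter(tokenize(reference))
    my_total = sum(my_counts.values()) or 1
    ref_total = sum(ref_counts.values()) or 1
    scored = []
    for word, n in my_counts.items():
        if n < min_count:
            continue
        ratio = (n / my_total) / ((ref_counts[word] + 1) / ref_total)
        scored.append((word, ratio))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

In practice the reference should be a large, general body of text, since a small one makes almost any word look overused.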
  4. change your posting style.. by ardiri · · Score: 4, Interesting

    if you're stupid enough not to change your posting style when trolling, that's your own fault.

  5. Re:Not cool. by delinear · · Score: 4, Insightful

    What basic expectation of privacy is there on the internet? The misguided belief that there is privacy is a huge problem for society. If we all acted on the internet as if we had zero expectation of privacy there's a chance we might take security more seriously, or that people might actually be civil toward one another.

  6. Re:Fark you Jane, you ignorant slut by hairyfeet · · Score: 2

    Well, if you think about it, the REALLY scary question is not how well this works but whether the courts will accept it. Anybody remember bullet fingerprinting? That was where they supposedly could match a bullet to a specific batch, so they could tell whether a bullet came from a certain pack of shells or not. We all know now it was total bullshit, and that variations even within the same lots could be pretty wide, simply because the bullet manufacturers weren't that anal retentive about purity as long as the round went straight, but that junk science put untold numbers of people in PMITA prison.

    Now what if the courts accept this as evidence? Some troll could copy pasta phrases from your actual posts and stitch them together to make them say something else, and if they can trip this thing, all this technobabble like bullet fingerprinting sells REAL well to juries who sit around watching CSI. Frankly, after false flags like Fast and Furious I wouldn't even trust the feds not to decide to "frame the guilty man", or to decide you must be the guy and make the evidence fit. This is why shows like CSI scare me: all this technobabble sells well to juries who frankly don't understand WTF this crap is, only that it looks high tech like something from CSI, therefore it MUST be true.

    --
    ACs don't waste your time replying, your posts are never seen by me.
  7. meh by ewrong · · Score: 2

    Je pense que cela peut être facilement évité. (I think this can easily be avoided.)

    1. Re:meh by newcastlejon · · Score: 3, Funny

      Apparently not everyone can hablar francés, either.

      --
      If God forks the Universe every time you roll a die, he'd better have a damned good memory.
  8. Re:Tool to improve your writing skills - exists by rarrar · · Score: 5, Interesting

    Schools already use programs like "White Smoke" (http://www.whitesmoke.com/) and "Style Writer" (http://www.stylewriter-usa.com/) to identify grammatical and stylistic errors and suggest corrections. These programs are able to identify active and passive voice, clarity and readability of writing, ambiguous words, gender-specific words, clichés, and more. I'm not sure the use of such software is such a great idea. I guess it's OK as long as a teacher reviews the results. Then again, if the teacher doesn't do as good a job as the program does...

  9. This is why by Higgins_Boson · · Score: 4, Funny

    This is why I practice non-redundancy. Redundancy is too redundant, so constantly repeating words and/or redundant phrases becomes a redundant factor in helping people to determine who you are on the internet when you post as an anonymous coward redundantly.

    Remember, kids, practice redundant privacy measures to ensure you will never be exposed.

  10. floxinoxinihilipilification by Oswald+McWeany · · Score: 3, Funny

    Damn! I'll have to stop using floxinoxinihilipilification so much in my anonymous posts or people will know it's me!

    Using the logic proposed in the article, can we assume that all the anonymous cowards using "the other f word" are all Samuel L Jackson?

    --
    "That's the way to do it" - Punch
  11. Re:Software to change writing style by Oswald+McWeany · · Score: 2

    Arright la, I warant eh Scouse filta by tomorra.

    --
    "That's the way to do it" - Punch
  12. i help admin a small town web discussion site by circletimessquare · · Score: 2

    around 2,000 users

    #1. the smaller the town, the pettier the politics

    #2. there is one user we keep banning, and they keep coming back under a new name, and you can always tell with 100% accuracy that it is the same person, based on sentence cadence and agenda, and overall personality and attitude

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
  13. right author 100% of the time* - Gravatar by QuasiSteve · · Score: 2

    I've mentioned this before, but it's worth repeating as more and more services no longer use their own identity systems, relying instead on Gravatar, or doing away with their own comments system by relying on Disqus (which uses Gravatar).

    In the case of sites using Gravatar incorrectly*, which is pretty much all of them, 'anonymous' posts still have their Gravatar ID attached - which is just an MD5 of the person's e-mail address. All you then need to do is find that same MD5 on another site where the author opted not to post anonymously.

    The main reason this ties into the story at hand is in getting reference material together. With e.g. Disqus, you can be reasonably assured (unless account sharing occurs) that the anonymous post with MD5 X on site A is authored by the same person as that of the anonymous post with MD5 X on site B, and you can include both in the pool of reference material.
    ( This also means there are issues with anonymity even if the author always posts 'anonymous'. )

    * The worst part of this is the website owners. Aside from letting anonymous posts still grab their results from Gravatar (even if you don't have a Gravatar 'account', the MD5 of the e-mail address you use will be in the HTML), some sites implement Gravatar as an afterthought. You could have been posting to a site for years behind a pseudonym, knowing that you're reasonably anonymous, and then find your pseudonym, and all the posts made, linked to other posts at other sites because the website owner decided to use Gravatar to display users' avatars of choice, using the e-mail address in their account.

    Gravatar is a useful service, especially in that the website can save some bandwidth, and the users who do want it can just update a single avatar and have that immediately be used on any site that uses the service.

    But I implore webmasters to consider seriously the ramifications of using Gravatar or Disqus, and at least:
    1. Disallow Gravatar on posts, profiles, etc. that were created before your implementation of Gravatar.
    2. Create an opt-in system for the use of Gravatar, per-profile.
    3. Disable the Gravatar code when the post author has indicated that they want to post anonymously.
    4. If implementing Disqus, make clear that its service may not adhere to your site's own privacy policies, and posting anonymously is a façade.

    Much the same applies to other login, profile, and comment consolidation/aggregation/syndication systems (such as Facebook's), but especially in the case of Gravatar, which requires no user interaction (such as a login or an existing valid login state), it is all too easy to think only of the benefits.
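Both halves of QuasiSteve's point can be sketched in a few lines, assuming Gravatar's classic identifier: the MD5 hex digest of the e-mail address after trimming whitespace and lower-casing it. `shared_identities` shows why the same hash on two sites links the posts, and `avatar_hash_for` applies items 1 to 3 of the checklist above. The `Post` fields, the `GRAVATAR_ENABLED_AT` cut-over time, and all function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-over time (Unix seconds) when the site turned on Gravatar.
GRAVATAR_ENABLED_AT = 1_330_000_000


def gravatar_hash(email):
    """MD5 hex digest of the trimmed, lower-cased address (classic Gravatar ID)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def shared_identities(hashes_site_a, hashes_site_b):
    """Hashes seen on both sites: posts that can be tied to one address."""
    return set(hashes_site_a) & set(hashes_site_b)


@dataclass
class Post:
    email: str
    anonymous: bool
    created_at: int          # Unix seconds
    author_opted_in: bool


def avatar_hash_for(post: Post) -> Optional[str]:
    """Return a Gravatar hash only when the checklist above allows it."""
    if post.anonymous:                          # item 3: anonymous posts
        return None
    if not post.author_opted_in:                # item 2: per-profile opt-in
        return None
    if post.created_at < GRAVATAR_ENABLED_AT:   # item 1: pre-Gravatar posts
        return None
    return gravatar_hash(post.email)
```

Because the hash is unsalted, anyone holding a list of candidate addresses can also hash them and compare, so the MD5 in the HTML is effectively the address itself for anyone who already has a guess.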

  14. The circumvention? Plagiarism! by Kuukai · · Score: 2

    Go all T.S. Eliot on their asses and build your posts entirely out of things other people have said. First post overlord gritsneal!

    --
    Sendou Wave Kick!!
  15. This is called stylometry by Lillesvin · · Score: 2

    Stylometry on Wikipedia. Some linguists have been doing it for years and in some cases with more success, but apparently it's only newsworthy when someone outside of linguistics writes about it. (Why yes, I'm a linguist. How did you know?)

    --
    "Live free or don't."
  16. Re:butcher the flow by hairyfeet · · Score: 2

    While it has some similarities to the speech of Jamaican migrant workers, it's a LOT more slang-heavy. In fact, what really makes it a bitch is that, depending on the region, you may have as many as FOUR different kinds of slang mixed in! You'll have black slang, poor white trash slang, and in MS you'll often get Creole slang mixed in there as well. I'd give you a sample, but I'm afraid I wasn't joking: I actually DO have to have a translator if I stop around Yazoo, it's too slang-heavy. At least with the migrant workers it's usually English they are mangling; with bottoms talk they are mangling Mexican slang, Creole slang, as well as white and black English. Hell, if you go down by chemical row they even have some African slang from the Gambian workers they have down there. It's so mangled that IMHO it's more of a mess than a language.

    --
    ACs don't waste your time replying, your posts are never seen by me.