Slashdot Mirror


Linguistics Identifies Anonymous Users

mask.of.sanity writes "Researchers have examined writing styles to identify previously anonymous carders and hackers operating on underground forums. Up to 80 percent of users who wrote at least 5000 words across their posts could be identified using linguistic techniques. Techniques such as stylometric analysis were used to track users who posted across different forums, and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks."
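A minimal sketch of one common stylometric approach follows. It assumes a tiny hand-picked list of function words as the only features; the study's actual feature set is not described here. The sketch builds a relative-frequency profile for each known author and attributes an unknown text to the author whose profile has the highest cosine similarity. `FUNCTION_WORDS`, `profile` and `attribute` are illustrative names, not taken from the study.

```python
from collections import Counter
import math
import re

# Hypothetical feature set: a handful of common English function words.
# Real systems use hundreds of features (function words, character n-grams,
# punctuation, word lengths); this list is only illustrative.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "is",
                  "it", "but", "so", "just", "really", "i", "you"]

def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def attribute(unknown_text, known_texts):
    """Return (author, score) for the known author whose profile is closest."""
    target = profile(unknown_text)
    scores = {author: cosine(target, profile(text))
              for author, text in known_texts.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

known = {
    "alice": "the cat and the dog ran to the park and the lake",
    "bob": "i really just think you know it is so good but i just",
}
author, score = attribute("i just really think it is so", known)
```

Function words are a popular choice because they are largely independent of topic. That independence is what would let the same writer be matched across forums that discuss different things.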

32 of 215 comments

  1. Anonymous First Post by Anonymous Coward · · Score: 5, Informative

    Anonymous First Post... you'll never guess who I am

    1. Re:Anonymous First Post by Anonymous Coward · · Score: 4, Funny

      4990.5 more words please.

    2. Re:Anonymous First Post by Anonymous Coward · · Score: 5, Funny

      Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in tincidunt nisi. Vivamus quis ligula non lorem feugiat congue ut a ipsum. Vivamus iaculis elementum tellus eget ullamcorper. Nam sed lacus at felis volutpat egestas. Aliquam hendrerit mauris a felis fringilla tristique. Proin commodo eleifend leo suscipit pulvinar. Praesent velit lectus, venenatis ac volutpat vitae, scelerisque sed diam. Integer eu felis quis erat ultricies sodales. Etiam eu turpis massa. In vel velit nec purus tristique vestibulum. Cras eleifend diam ut dolor facilisis convallis. Morbi velit ligula, aliquam vitae ullamcorper et, dapibus sed augue. Nullam euismod urna in purus condimentum suscipit. Fusce dolor magna, dictum quis elementum quis, mollis in sem. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla imperdiet lectus sit amet risus interdum vel congue odio venenatis. Proin lobortis urna ac tortor auctor id porttitor urna auctor. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Integer viverra consequat nisl, ac adipiscing dui feugiat quis. Ut ut tortor urna. Pellentesque velit orci, mollis eu venenatis quis, convallis nec risus. Donec quis enim ac ante placerat accumsan. Fusce ut erat in tortor ullamcorper aliquam. Aenean ut est turpis. Nam ut elit justo. Suspendisse potenti. Praesent et nulla eget sem interdum pellentesque. Nunc sagittis metus sed mauris lacinia consequat. Fusce velit velit, semper at euismod a, euismod vitae enim. Vivamus elementum commodo faucibus. Suspendisse dictum rutrum leo at lobortis. Nam ac lectus id velit hendrerit rutrum vitae at mauris. Integer quis ante ullamcorper dui gravida auctor eu ut lectus. Curabitur laoreet sapien at tortor elementum consectetur. Etiam faucibus tempor sem, sed ultricies felis semper eget. Suspendisse odio lacus, interdum eu rhoncus ut, iaculis vitae enim. Morbi egestas ultricies lorem at tempus. Donec iaculis purus vel tellus cursus elementum. 
Nulla fermentum vulputate lorem sit amet pellentesque. Nunc quam lacus, consectetur et convallis non, pharetra dapibus diam. Maecenas laoreet ornare vehicula. Phasellus vitae odio diam. Ut facilisis nisi eu sapien elementum sit amet molestie arcu consectetur. Nulla in tortor urna, in elementum tellus. Maecenas convallis nunc purus, eget pretium purus. Suspendisse nec nibh ac augue condimentum adipiscing quis et lorem. Integer eget lorem velit. Nullam volutpat metus sit amet ante feugiat ac cursus sem congue. Pellentesque dolor nulla, facilisis id hendrerit eget, commodo eu urna. Donec ut interdum nibh. Sed nunc nisi, commodo non congue vitae, tempus ut ligula. Donec massa dui, viverra eget tempus ut, ornare eu ligula. Proin quis posuere diam. Phasellus at risus quam, id cursus odio. Sed fermentum, tortor eu iaculis sollicitudin, erat augue ornare nisi, eu mattis neque massa ac odio. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Sed varius, orci eget rhoncus egestas, mi nisl mattis sapien, non ultrices nulla elit porta nunc. Praesent mauris lectus, ultrices at interdum quis, euismod accumsan arcu. Pellentesque in dolor libero, vitae tincidunt dui. Nunc rhoncus ante in nulla sagittis ullamcorper. Curabitur velit odio, tempus sit amet lobortis eu, condimentum sit amet massa. Maecenas convallis facilisis arcu, quis accumsan velit tincidunt ut. Vivamus a ante orci, at mattis nibh. Nulla in diam est, vitae semper purus. Donec ut odio augue. Etiam tempor ultricies luctus. Quisque fringilla tincidunt rutrum. Phasellus et justo ut lorem imperdiet semper. Maecenas et justo lectus, ac dictum dui. Morbi sit amet venenatis neque. Donec interdum enim vel velit commodo pulvinar. Aenean nisl erat, bibendum id tincidunt sed, sagittis sagittis mi. Curabitur dui urna, venenatis id placerat nec, consectetur sagittis mi. Phasellus eleifend condimentum lorem et blandit. Pellentesque at lorem nisl, quis ullamcorper nisi. Suspendisse potenti. 
In id orci massa, in hendrerit ligula. Nunc elementum mi in nisl posuere ut tincidunt nibh placerat. Mauris venen

    3. Re:Anonymous First Post by Anonymous Coward · · Score: 5, Funny

      I identified you. You are Cicero.

    4. Re:Anonymous First Post by girlinatrainingbra · · Score: 5, Interesting

      Sock puppet accounts are also apparent from these linguistic tics. Sometimes resorting to a particular analogy, or getting hot-tempered about a specific topic or a certain kind of point of view, can also give away the identity of the author. So maybe limit oneself to 5000/144 ≈ 34 tweets per Twitter account so that you can't be figured out. Writing style and favorite kinds of rant were also how the Unabomber was found out: his family members recognized his particular pet peeves, rants, and writing patterns and sent their suspicions to the F.B.I.
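      Flagging likely sock puppets can be sketched the same way. This sketch assumes character 3-gram counts as the style signal and an arbitrary similarity threshold of 0.8. `suspected_sock_puppets` and the threshold are hypothetical, and real accounts would need far more text than this toy example to give meaningful scores.

```python
from collections import Counter
from itertools import combinations
import math

def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams (lowercased)."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(c1, c2):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(c1[g] * c2[g] for g in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def suspected_sock_puppets(posts_by_account, threshold=0.8):
    """Pairs of accounts whose pooled posts look stylistically alike."""
    grams = {acct: char_ngrams(" ".join(posts))
             for acct, posts in posts_by_account.items()}
    pairs = []
    for a, b in combinations(sorted(grams), 2):
        score = cosine(grams[a], grams[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: -p[2])

posts = {
    "alice": ["I dunno, seems legit to me!!!"],
    "alice_alt": ["I dunno, seems legit to me!!!"],
    "carol": ["Quarterly revenue exceeded projections."],
}
flagged = suspected_sock_puppets(posts)
```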

    5. Re:Anonymous First Post by mrbester · · Score: 2

      We can narrow it down to someone who is particular about correct capitalisation (and therefore probably spelling, punctuation and grammar) denoting an education and attention to detail not normally seen in forum posts. As this is a more technical forum you most likely program in a language where letter case is of paramount importance and have done so for at least 5 years in a professional position. You probably also write reports indicating a level of seniority.

      That should reduce the number of likely candidates somewhat.

      --
      "Wait. Something's happening. It's opening up! My God, it's full of apricots!"
    6. Re:Anonymous First Post by water-and-sewer · · Score: 2

      Classic - kudos to you for a great laugh. I was thinking though, "this study doesn't help much because it's rare to find places where people write more than a line or two anymore."

      Go back to the old days of Usenet (80s, early 90s) and posts were long, well thought-out, and useful. Look at OLGA, for example, which collected written music in TAB format for guitarists (ha - remember when THAT was the biggest threat to the music industry?). Tons of useful stuff. Hardly anyone does that anymore; it's mostly short sentences. The exceptions - like tech forums - are places where no one much cares about being anonymous anyway.

      It's been tough to get people to pay attention to the forum at www.dictatorshandbook.net for two reasons: I think people are reluctant to opine on various dictators, any of whom might put them in jail, and hardly anyone posts on forums anymore (yes, I know, there are some exceptions). Look at the length of the average comment on a Reddit thread, for example - a line or two, sometimes just a word or two.

      --
      If this were Usenet, I'd killfile the lot of you.
    7. Re:Anonymous First Post by girlinatrainingbra · · Score: 2

      Here are the two relevant paragraphs from the Wikipedia article on the Unabomber, which show why the "manifesto", its writing style, and its contents were key to the family suspecting his involvement/"identity". They appear in the Search section of the article:
      Before the publication of the manifesto, Theodore Kaczynski's brother, David Kaczynski, was encouraged by his wife Linda to follow up on suspicions that Ted was the Unabomber.[77] David Kaczynski was at first dismissive, but progressively began to take the likelihood more seriously after reading the manifesto a week after it was published in September 1995. David Kaczynski browsed through old family papers and found letters dating back to the 1970s written by Ted and sent to newspapers protesting the abuses of technology and which contained phrasing similar to what was found in the Unabomber Manifesto.[78]
      Prior to the publishing of the manifesto, the FBI held numerous press conferences requesting the help of the public in identifying the Unabomber. They were convinced that the bomber was from the Chicago area (where he began his bombings), had worked or had some connection in Salt Lake City, and by the 1990s was associated with the San Francisco Bay Area. This geographical information, as well as the wording in excerpts from the manifesto that were released prior to the entire manifesto being published, was what had persuaded David Kaczynski's wife, Linda, to urge her husband to read the manifesto.

    8. Re:Anonymous First Post by Will.Woodhull · · Score: 3, Interesting

      I used to post anonymously much more often, when I had a job with a guvmint agency and a young family to protect. I do not bother with that much any more. I am not invulnerable, but for the most part I know that I look like too small a fish to be worth going after.

      That said, I still occasionally post anonymously when I want to antagonize the astroturfers, Scientology nuts, etc., especially on Slashdot if I am concerned that my post might damage my karma.

      Interesting things to do when posting anonymously:

      Use a thesaurus to choose synonyms you would not ordinarily use.

      L33t 5p33k

      Write like Hemingway. Keep all sentences short. Sentences that do not have subordinate clawses do not have much style to analyse.

      Use creative misspellings. "claws" for "clause", etc.

      Use Google Translate to do a multilingual hash: translate your work into Russian, then the Russian version back to English. "The spirit is willing but the flesh is weak" becomes "The wine is passable but the meat has gone bad."

      Ideally, Anonymous will develop a set of tools that will rewrite any text into one of half a dozen different styles. Let the authorities chase after these six fictional characters.
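      The Hemingway tip can be checked mechanically. Here is a rough sketch, assuming sentences end at `.`, `!` or `?` and using a short, incomplete list of subordinating words chosen purely for illustration. It reports average words per sentence and subordinators per sentence, two simple features that short, flat sentences suppress. `sentence_stats` is a hypothetical helper.

```python
import re

# Illustrative, incomplete list of English subordinating words.
SUBORDINATORS = {"because", "although", "though", "while", "whereas",
                 "since", "unless", "whether", "if", "when", "which", "that"}

def sentence_stats(text):
    """Return (average words per sentence, subordinators per sentence)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(sentences) or 1
    avg_len = sum(len(w) for w in words) / n
    subs = sum(1 for ws in words for w in ws if w in SUBORDINATORS)
    return avg_len, subs / n

hemingway = sentence_stats("The sun rose. He fished. The fish was big.")
ornate = sentence_stats(
    "Although he was tired, he fished because the fish, which he loved, were biting.")
```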

      --
      Will
    9. Re:Anonymous First Post by Hotawa+Hawk-eye · · Score: 3, Insightful

      Nothing, as long as you have a large enough corpus of the framee's writing. If the framee is your friend, this probably isn't a problem. If they're a public figure, maybe not a problem (depending on how much editing and PRing their written statements undergo before they are released.) If they're $RANDOM_PASSERBY, not so easy.

      I think a more common usage would be to tweak your own writing just so it doesn't sound like you. Write something you don't want identified as yours (the test sample), and check it against a corpus of your own written work. If it detects as your work, rough up the test sample until it doesn't. This would be an easier problem than the framing case since you're not trying to make it look like a specific other person's work, you're trying to make it look like it's ANYONE else's (you don't really care whose) work.
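      The "rough it up until it no longer detects as yours" loop can be sketched like this. The sketch assumes three things: a plain word-frequency cosine similarity as the stand-in detector, a hand-made list of (habit, replacement) substitutions, and an arbitrary threshold of 0.5. `rough_up` is a hypothetical helper. A real detector like the one in the study would use different features, so passing this check says little about passing theirs.

```python
from collections import Counter
import math
import re

def profile(text):
    """Word-frequency vector as a Counter of lowercase tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(c1, c2):
    """Cosine similarity of two word-count Counters."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def rough_up(sample, own_corpus, substitutions, threshold=0.5):
    """Apply substitutions one at a time until the sample no longer scores
    as similar to your own corpus, or the substitutions run out."""
    reference = profile(own_corpus)
    score = cosine(profile(sample), reference)
    for habit, replacement in substitutions:
        if score < threshold:
            break
        # Whole-word, case-insensitive replacement of one habitual word.
        sample = re.sub(rf"\b{re.escape(habit)}\b", replacement, sample,
                        flags=re.IGNORECASE)
        score = cosine(profile(sample), reference)
    return sample, score

own = "basically I reckon it is basically fine, I reckon"
text, final_score = rough_up("basically I reckon it works", own,
                             [("basically", "frankly"), ("reckon", "suppose"),
                              ("I", "one")])
```

      The loop stops as soon as the score drops below the threshold, so it makes the smallest number of changes this particular detector needs, not the smallest number any detector would need.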

  2. Re:Damit by nospam007 · · Score: 4, Funny

    "They know who I am. I will now have to type in random styles."

    But not in Gangnam Style or they'll think you're Korean.

  3. I wrote a letter to the CEO once by Omnifarious · · Score: 5, Interesting

    I worked for a smallish (but not incredibly tiny, maybe 100 employees) company and wrote a letter to the CEO once. We'd been castigated by someone who'd taken over the local office because the company was doing poorly. A number of austerity measures were implemented. I did not find those to be that annoying because I realized it was either that or not have a job. But the castigation didn't sit well with me. We were in trouble because of the decisions of a few bad managers, not the behavior of average employees.

    So I wrote a letter about it. He stripped my name off and presented it in an executive meeting to all the people directly under him. He asked "Why am I getting letters like this?". Everybody who worked in my office immediately knew who it was. I had a distinctive writing voice, and a strong reputation.

    It did not lead to me being fired. I was actually highly respected there. It led to me being encouraged to have an honest sit-down talk with the new manager for our division (the guy who'd made the speech I wasn't happy about). I think we both came away from that meeting a lot happier about the other.

    But that was a strong lesson to me. If I ever really want to be anonymous I'm going to have to purposely work on adopting a completely different writing style. And I will have to keep a wall up between styles and never 'slip'.

    1. Re:I wrote a letter to the CEO once by Omnifarious · · Score: 3, Interesting

      I've thought about that. That's an interesting and tricky problem. Though, if there's a program that can detect it, that means the patterns are codified well enough that you can write a program to obscure them. The problem is, what about the program that detects these patterns that you don't know the implementation of? Will you actually be fooling it?

      Of course, you have the same problem if you adopt a different writing style. Is it different enough? Is something essential slipping through?

      You could use both techniques. Have a program assist you in avoiding the use of certain words when using one voice and the use of others when using a different voice.
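      A sketch of the "program that helps each voice avoid certain words" idea follows. It assumes each persona is defined only by a hypothetical list of banned words and reports where each forbidden word appears in a draft. `VOICES` and `voice_violations` are illustrative names.

```python
import re

# Hypothetical per-voice rules: words each persona must never use.
VOICES = {
    "work": {"gonna", "lol", "dude"},
    "anon": {"basically", "furthermore", "indeed"},
}

def voice_violations(draft, voice):
    """Return (character offset, word) for each banned word in the draft."""
    banned = VOICES[voice]
    return [(m.start(), m.group(0))
            for m in re.finditer(r"[A-Za-z']+", draft)
            if m.group(0).lower() in banned]

draft = "Basically, the build is broken. Indeed."
anon_hits = voice_violations(draft, "anon")
work_hits = voice_violations(draft, "work")
```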

    2. Re:I wrote a letter to the CEO once by GNious · · Score: 2

      Write it in a different language, then run it through 5 different translation engines across a dozen languages, ending in whichever is the native language of the recipient... that should throw them for a loop.

    3. Re:I wrote a letter to the CEO once by Pieroxy · · Score: 2

      Just Google translate it to and from any language other than English.

      The problem is, the meaning might be gone as well by the time it's English-y again.

    4. Re:I wrote a letter to the CEO once by famebait · · Score: 2

      Only you would do that.

      --
      sudo ergo sum
    5. Re:I wrote a letter to the CEO once by Renraku · · Score: 2

      Ahh, but real dialogue can get one into trouble when dealing with the political minded. You see, there are those out there that are not working towards the same goals as you. Even if you're a part of the same team and of the same company, there are those that think the illusion of them being correct is more important than the welfare of the team.

      It can be difficult to have a truly open dialogue with people of this sort, as they are quick to attack your reputation or pull rank and have you removed from the equation altogether. Imagine a World War I commanding officer that orders wave after wave of soldiers to run into the meat grinder of overlaid and well protected machine gun fire, and when it disastrously fails, they do it again. Those that complain are ordered into said meat grinder. The corporate world is no different.

      I think a bigger threat to geeks in business are when they approach such situations without due caution. If you make a claim, you must be prepared to back it up to everyone that could be interested. Real concrete evidence. References. Citations. Etc. Basically, the idea is to sell your idea rather than to challenge theirs or the one in place.

      --
      Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
    6. Re:I wrote a letter to the CEO once by Anonymous Coward · · Score: 2, Insightful

      And Google (a.k.a. "The Evil Empire" TM) will have a cached copy of the original with the IP address you posted from. In other words, you'll also need to go through the magic 7 proxies!

    7. Re:I wrote a letter to the CEO once by Anonymous Coward · · Score: 2, Insightful

      I think a bigger threat to geeks in business are when they approach such situations without due caution. If you make a claim, you must be prepared to back it up to everyone that could be interested. Real concrete evidence. References. Citations. Etc.

      And that IS approaching the situation without due caution. Geeks think that having real concrete evidence means that other people must believe you. Real world people are not like that, especially the political minded ones. Evidence be damned, political minded people play power games without regard to reality, all the way until the company goes bankrupt, then they play their game elsewhere.

      Approaching with due caution means you must first prepare by finding someone more powerful to back you up, and be ready to find another job even so.

      The OP survived the episode because he implicitly had the CEO's backing, as the CEO challenged the managers (i.e. he had already shown publicly that he agreed there was a problem with some manager). Had the CEO simply quietly sent a copy of the letter out to the managers and told them to "deal with it", the OP would likely have been fired or forced to leave.

    8. Re:I wrote a letter to the CEO once by Anonymous Coward · · Score: 3, Informative

      I give you the subject of my term paper, which landed me top marks in forensic linguistics:
      (tl;dr: yes, there is software that does precisely that: JStylo + Anonymouth)
      https://psal.cs.drexel.edu/index.php/JStylo-Anonymouth
      http://www.youtube.com/watch?v=-b0Ta9h62_E

  4. Re:Detect this by Anonymous Coward · · Score: 2, Funny

    Well, you're left-handed, given your frequent use of left-hand keys.
    You have small hands, given the fact that you were able to press w without pressing e immediately.
    The fact that you have said you look forward to our anonymous overlords or a Beowulf cluster of AC means you're reasonably intelligent for Slashdot.
    You're not aggressively hassling the editor, previous poster, or the writer, signifying you're female.
    You have too much time on your hands, posting on Slashdot.

      http://www.complex.com/girls/2009/08/sexy-southpaws-the-10-hottest-left-handed-women/page/11

    You're Oprah.

  5. I recognise my own writing by kawabago · · Score: 3, Insightful

    I'd be rather surprised if someone else couldn't.

    1. Re:I recognise my own writing by trev.norris · · Score: 2

      iz hard 2 change how u speek?

  6. Y U NO MAKE SENSE by Nossie · · Score: 2

    "Leetspeak, an alternative alphabet popular in some forum circles, cannot be translated."

    *sigh* does this mean I must resent people that use this form of communication less?

    I'm not so sure I can stoop so low.

  7. I can't think of a non-evil use for this by joshamania · · Score: 5, Interesting

    This is so bad I don't know where to begin. There is nothing, ever, that excuses this. For every Zodiac-style crazy serial killer or copyright scofflaw they try to apply this to (and fail), there will be thousands and thousands of people who will be persecuted by organizations and governments for expressing their opinions. While this won't have a big effect in the West for half a generation, oppressive governments are going to be all over this.

    And then, in ten or fifteen years, the youth will have grown up with this technology and become accustomed to it... accepting it. Just like Facebook has been accepted.

    I'd move to Mars when it's possible but some bureaucrat will analyze everything I've ever written on the interwebz (and I've been mostly not stupid about shit I've written online since 1995 or so) and make some arbitrary decision about how I'm not acceptable because I'm not a huge fan of authority or some such crap.

    Way to go humanity.

    1. Re:I can't think of a non-evil use for this by aaaaaaargh! · · Score: 5, Informative

      Are you serious?

      You write as if some new method had been invented. There is no news in the above article. Authorship identification has been a reliable tool for many decades; a whole branch of linguistics (forensic linguistics) deals with it and with similar topics like dialect recognition. Under certain circumstances you can even identify personality traits of the author; check out content analysis software like LIWC, for example.
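      Here is a toy illustration of dictionary-based category counting in the spirit of LIWC. It assumes a tiny made-up dictionary, because LIWC's actual dictionary and categories are not reproduced here. Each category score is the percentage of words in the text that belong to that category. `CATEGORIES` and `category_rates` are hypothetical.

```python
from collections import Counter
import re

# Tiny made-up category dictionary; LIWC's real dictionary is far larger.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine"},
    "negative_emotion": {"hate", "angry", "awful", "sad"},
    "certainty": {"always", "never", "definitely"},
}

def category_rates(text):
    """Percentage of tokens falling into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1
    counts = Counter()
    for tok in tokens:
        for name, words in CATEGORIES.items():
            if tok in words:
                counts[name] += 1
    return {name: 100 * counts[name] / total for name in CATEGORIES}

rates = category_rates("I always hate my awful commute")
```

      Going from scores like these to personality traits requires validated norms from the real tool; the counting step is the only part this sketch shows.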

      And, yes, plenty of serial killers and blackmailers have been captured with the help of these methods.

  8. google translate by sl149q · · Score: 4, Interesting

    One way to change a bunch of the stylistic cues would be to convert your message to another language and back using Google Translate. Depending on the intermediate language(s), and possibly using different translators, this should neutralize some of them.

    1. Re:google translate by sdnoob · · Score: 5, Funny

      using chinese as an intermediary will give you text written by motherboard manual writers. perfect cover.

    2. Re:google translate by MysteriousPreacher · · Score: 3, Funny

      Please to make explaining in swiftness.

      --
      -- Using the preview button since 2005
  9. College essays by nightgeometry · · Score: 2

    Isn't this just the same software that colleges use to detect plagiarism and whether someone else wrote your essay for you? I thought it was in common use in academia.

    --
    The best is the enemy of the good
    1. Re:College essays by ForgedArtificer · · Score: 4, Insightful

      Actually, it's the exact opposite.

      Anti-plagiarism software searches for the same content with completely different styles.

      Writer identification involves searching for the same style amongst completely different content.
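      The distinction can be sketched with two measures. For content matching, the sketch assumes word 3-gram "shingles" compared with Jaccard similarity; this is one common technique, not necessarily what any particular plagiarism checker uses. For style, it assumes function-word cosine similarity. Both functions are illustrative.

```python
from collections import Counter
import math
import re

FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "is", "it", "i", "so"]

def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def content_overlap(a, b, n=3):
    """Jaccard similarity of word n-gram shingles: same content?"""
    def shingles(t):
        w = words(t)
        return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def style_similarity(a, b):
    """Cosine similarity of function-word counts: same style?"""
    def vec(t):
        c = Counter(words(t))
        return [c[f] for f in FUNCTION_WORDS]
    u, v = vec(a), vec(b)
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

a = "the quick brown fox jumps over the lazy dog"
b = "the quick brown fox jumps over a sleepy dog"
c = "i think it is so true and i said so"
```

      A reworded copy (`a` vs `b`) shares many shingles, while an unrelated post (`a` vs `c`) shares none. A style measure ignores the topic words entirely and looks only at the small, habitual words.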

      --
      The right to offend is central to the right to free speech.
  10. No actual result in TFA? by toutankh · · Score: 3, Interesting

    After reading TFA I cannot find any convincing experimental validation. I see a lot of "can" and conditional tense (maybe that's the author's style), but nothing on the validation of the approach. Where is the experimental data, including the number of anonymous users correctly and incorrectly identified on forums?
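    A result like the one asked for would need to separate correct, incorrect and abstained identifications. A minimal sketch, assuming predictions map each anonymous account to a guessed author, or to `None` when the system declines to guess. The names are hypothetical, and no figures from the study are implied.

```python
def evaluate(predictions, truth):
    """predictions: account -> guessed author, or None if the system abstained.
    truth: account -> real author."""
    correct = sum(1 for acct, guess in predictions.items()
                  if guess is not None and guess == truth[acct])
    wrong = sum(1 for acct, guess in predictions.items()
                if guess is not None and guess != truth[acct])
    abstained = sum(1 for guess in predictions.values() if guess is None)
    attempted = correct + wrong
    return {
        "correct": correct,
        "incorrect": wrong,
        "abstained": abstained,
        # Of the guesses made, how many were right.
        "precision": correct / attempted if attempted else 0.0,
        # Of all accounts, how many were correctly identified.
        "recall": correct / len(predictions) if predictions else 0.0,
    }

result = evaluate(
    {"u1": "alice", "u2": "bob", "u3": None, "u4": "carol"},
    {"u1": "alice", "u2": "dave", "u3": "erin", "u4": "carol"},
)
```

    A headline figure like "up to 80 percent" means little until you know which of these two ratios it is and how many wrong guesses came with it.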