Linguistics Identifies Anonymous Users

Posted by Soulskill on Tuesday January 8, 2013 @07:21PM from the that's-why-i-run-my-emails-through-google-translate-a-few-times dept.

mask.of.sanity writes "Researchers have examined writing styles to identify previously anonymous carders and hackers operating on underground forums. Up to 80 percent of users who wrote at least 5000 words across their posts could be identified using linguistic techniques. Techniques such as stylometric analysis were used to track users who posted across different forums, and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks."

Anonymous First Post by Anonymous Coward · 2013-01-08 19:23 · Score: 5, Informative

Anonymous First Post... you'll never guess who I am
1. Re:Anonymous First Post by Anonymous Coward · 2013-01-08 19:25 · Score: 4, Funny
  
  4990.5 more words please.
2. Re:Anonymous First Post by Anonymous Coward · 2013-01-08 20:09 · Score: 5, Funny
  
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in tincidunt nisi. Vivamus quis ligula non lorem feugiat congue ut a ipsum. Vivamus iaculis elementum tellus eget ullamcorper. Nam sed lacus at felis volutpat egestas. Aliquam hendrerit mauris a felis fringilla tristique. Proin commodo eleifend leo suscipit pulvinar. Praesent velit lectus, venenatis ac volutpat vitae, scelerisque sed diam. Integer eu felis quis erat ultricies sodales. Etiam eu turpis massa. In vel velit nec purus tristique vestibulum. Cras eleifend diam ut dolor facilisis convallis. Morbi velit ligula, aliquam vitae ullamcorper et, dapibus sed augue. Nullam euismod urna in purus condimentum suscipit. Fusce dolor magna, dictum quis elementum quis, mollis in sem. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla imperdiet lectus sit amet risus interdum vel congue odio venenatis. Proin lobortis urna ac tortor auctor id porttitor urna auctor. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Integer viverra consequat nisl, ac adipiscing dui feugiat quis. Ut ut tortor urna. Pellentesque velit orci, mollis eu venenatis quis, convallis nec risus. Donec quis enim ac ante placerat accumsan. Fusce ut erat in tortor ullamcorper aliquam. Aenean ut est turpis. Nam ut elit justo. Suspendisse potenti. Praesent et nulla eget sem interdum pellentesque. Nunc sagittis metus sed mauris lacinia consequat. Fusce velit velit, semper at euismod a, euismod vitae enim. Vivamus elementum commodo faucibus. Suspendisse dictum rutrum leo at lobortis. Nam ac lectus id velit hendrerit rutrum vitae at mauris. Integer quis ante ullamcorper dui gravida auctor eu ut lectus. Curabitur laoreet sapien at tortor elementum consectetur. Etiam faucibus tempor sem, sed ultricies felis semper eget. Suspendisse odio lacus, interdum eu rhoncus ut, iaculis vitae enim. Morbi egestas ultricies lorem at tempus. Donec iaculis purus vel tellus cursus elementum. 
Nulla fermentum vulputate lorem sit amet pellentesque. Nunc quam lacus, consectetur et convallis non, pharetra dapibus diam. Maecenas laoreet ornare vehicula. Phasellus vitae odio diam. Ut facilisis nisi eu sapien elementum sit amet molestie arcu consectetur. Nulla in tortor urna, in elementum tellus. Maecenas convallis nunc purus, eget pretium purus. Suspendisse nec nibh ac augue condimentum adipiscing quis et lorem. Integer eget lorem velit. Nullam volutpat metus sit amet ante feugiat ac cursus sem congue. Pellentesque dolor nulla, facilisis id hendrerit eget, commodo eu urna. Donec ut interdum nibh. Sed nunc nisi, commodo non congue vitae, tempus ut ligula. Donec massa dui, viverra eget tempus ut, ornare eu ligula. Proin quis posuere diam. Phasellus at risus quam, id cursus odio. Sed fermentum, tortor eu iaculis sollicitudin, erat augue ornare nisi, eu mattis neque massa ac odio. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Sed varius, orci eget rhoncus egestas, mi nisl mattis sapien, non ultrices nulla elit porta nunc. Praesent mauris lectus, ultrices at interdum quis, euismod accumsan arcu. Pellentesque in dolor libero, vitae tincidunt dui. Nunc rhoncus ante in nulla sagittis ullamcorper. Curabitur velit odio, tempus sit amet lobortis eu, condimentum sit amet massa. Maecenas convallis facilisis arcu, quis accumsan velit tincidunt ut. Vivamus a ante orci, at mattis nibh. Nulla in diam est, vitae semper purus. Donec ut odio augue. Etiam tempor ultricies luctus. Quisque fringilla tincidunt rutrum. Phasellus et justo ut lorem imperdiet semper. Maecenas et justo lectus, ac dictum dui. Morbi sit amet venenatis neque. Donec interdum enim vel velit commodo pulvinar. Aenean nisl erat, bibendum id tincidunt sed, sagittis sagittis mi. Curabitur dui urna, venenatis id placerat nec, consectetur sagittis mi. Phasellus eleifend condimentum lorem et blandit. Pellentesque at lorem nisl, quis ullamcorper nisi. Suspendisse potenti. 
In id orci massa, in hendrerit ligula. Nunc elementum mi in nisl posuere ut tincidunt nibh placerat. Mauris venen
3. Re:Anonymous First Post by Anonymous Coward · 2013-01-08 20:52 · Score: 5, Funny
  
  I identified you. You are Cicero.
4. Re:Anonymous First Post by girlinatrainingbra · 2013-01-08 20:55 · Score: 5, Interesting
  
  Sock puppet accounts are also apparent from these linguistic tics. Sometimes, resorting to a particular analogy or getting hot-tempered at a specific topic or a certain kind of point of view can also give away the identity of the author. So maybe limit oneself to 5000/144 = 34 tweets per Twitter account so that you can't be figured out. And writing style and favorite kinds of rant were also how the Unabomber was found out: his family members recognized his particular pet peeves and rants and writing patterns and sent their suspicions in to the F.B.I.
5. Re:Anonymous First Post by Will.Woodhull · 2013-01-09 03:04 · Score: 3, Interesting
  
  I used to post anonymously much more often, when I had a job with a guvmint agency and a young family to protect. I do not bother with that much any more. I am not invulnerable, but for the most part I know that I look like too small a fish to be worth going after.
  That said, I still occasionally post anonymously when I want to antagonize the astroturfers, Scientology nuts, etc. Especially on slashdot if I am concerned that my post might damage my karma.
  Interesting things to do when posting anonymously:
  Use a thesaurus to choose synonyms you would not ordinarily use.
  L33t 5p33k
  Write like Hemmingway. Keep all sentences short. Sentences that do not have subordinate clawses do not have much style to analyse.
  Use creative misspellings. "claws" for "clause", etc.
  Use Google Translate to do a multilingual hash: translate your work into Russian, then the Russian version back to English. "The spirit is willing but the flesh is weak" becomes "The wine is passable but the meat has gone bad."
  Ideally, Anonymous will develop a set of tools that will rewrite any text into one of half a dozen different styles. Let the authorities chase after these six fictional characters.
  
  --
  Will
6. Re:Anonymous First Post by Hotawa+Hawk-eye · 2013-01-09 04:41 · Score: 3, Insightful
  
  Nothing, as long as you have a large enough corpus of the framee's writing. If the framee is your friend, this probably isn't a problem. If they're a public figure, maybe not a problem (depending on how much editing and PRing their written statements undergo before they are released.) If they're $RANDOM_PASSERBY, not so easy.
  I think a more common usage would be to tweak your own writing just so it doesn't sound like you. Write something you don't want identified as yours (the test sample), then check it against a corpus of your own written work. If it detects as your work, rough up the test sample until it doesn't. This would be an easier problem than the framing case since you're not trying to make it look like a specific other person's work, you're trying to make it look like it's ANYONE else's (you don't really care whose) work.
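
The check-and-rough-up loop described above can be sketched roughly as follows. This is a minimal illustration, not the method from the article: it assumes a style can be approximated by the relative frequency of a few common function words, and the word list, the 0.9 threshold, and the function names are all made up for the example.

```python
import math
import re
from collections import Counter

# Assumption: a tiny, hand-picked set of English function words stands in
# for a real stylometric feature set.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "but", "not", "with", "as", "on")


def style_profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def looks_like_mine(sample, corpus, threshold=0.9):
    """True if the sample's profile is close to the corpus profile.

    The threshold is an arbitrary illustrative value, not a calibrated one.
    """
    return cosine_similarity(style_profile(sample), style_profile(corpus)) >= threshold
```

You would call `looks_like_mine(test_sample, my_old_posts)` after each round of edits and keep roughing up the sample until it returns False. A real detector would likely use far more features than this, so passing this check says little about passing theirs.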
Re:Damit by nospam007 · 2013-01-08 19:33 · Score: 4, Funny

"They know who I am. I will now have to type in random styles."
But not in Gangnam Style or they'll think you're Korean.
I wrote a letter to the CEO once by Omnifarious · 2013-01-08 19:35 · Score: 5, Interesting

I worked for a smallish (but not incredibly tiny, maybe 100 employees) company and wrote a letter to the CEO once. We'd been castigated by someone who'd taken over the local office because the company was doing poorly. A number of austerity measures were implemented. I did not find those to be that annoying because I realized it was either that or not have a job. But the castigation didn't sit well with me. We were in trouble because of the decisions of a few bad managers, not the behavior of average employees.
So I wrote a letter about it. He stripped my name off and presented it in an executive meeting to all the people directly under him. He asked "Why am I getting letters like this?". Everybody who worked in my office immediately knew who it was. I had a distinctive writing voice, and a strong reputation.
It did not lead to me being fired. I was actually highly respected there. It led to me being encouraged to have an honest sit-down talk with the new manager for our division (the guy who'd made the speech I wasn't happy about). I think we both came away from that meeting a lot happier about the other.
But that was a strong lesson to me. If I ever really want to be anonymous I'm going to have to purposely work on adopting a completely different writing style. And I will have to keep a wall up between styles and never 'slip'.

--
Need a Python, C++, Unix, Linux develop
1. Re:I wrote a letter to the CEO once by Omnifarious · 2013-01-08 20:25 · Score: 3, Interesting
  
  I've thought about that. That's an interesting and tricky problem. Though, if there's a program that can detect it, that means the patterns are codified well enough that you can write a program to obscure them. The problem is, what about the program that detects these patterns that you don't know the implementation of? Will you actually be fooling it?
  Of course, you have the same problem if you adopt a different writing style. Is it different enough? Is something essential slipping through?
  You could use both techniques. Have a program assist you in avoiding the use of certain words when using one voice and the use of others when using a different voice.
  
  --
  Need a Python, C++, Unix, Linux develop
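
The "program assists you in avoiding certain words per voice" idea might look something like the sketch below. The voice names and the word lists are hypothetical, chosen only for illustration, and this covers vocabulary alone, so the worry above about something essential slipping through still applies.

```python
import re

# Hypothetical per-voice lists of words to stay away from.
AVOID = {
    "work": {"castigated", "austerity"},
    "anon": {"honestly", "basically"},
}


def flag_words(text, voice, avoid=AVOID):
    """Return the words banned for `voice` that occur in `text`.

    Words are reported once each, in order of first appearance.
    """
    banned = avoid[voice]
    seen = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in banned and word not in seen:
            seen.append(word)
    return seen
```

Running `flag_words(draft, "anon")` before posting lists the words to rewrite for that voice; an empty list means none of the flagged vocabulary appears.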
2. Re:I wrote a letter to the CEO once by Anonymous Coward · 2013-01-09 03:04 · Score: 3, Informative
  
  I give you the subject of my term paper that landed me top marks in forensic linguistics:
  (tl;dr: yes, there is software that does precisely that: JStylo+Anonymouth)
  https://psal.cs.drexel.edu/index.php/JStylo-Anonymouth
  http://www.youtube.com/watch?v=-b0Ta9h62_E
I recognise my own writing by kawabago · 2013-01-08 19:40 · Score: 3, Insightful

I'd be rather surprised if someone else couldn't.
I can't think of a non-evil use for this by joshamania · 2013-01-08 19:55 · Score: 5, Interesting

This is so bad I don't know where to begin. There is nothing, ever, that excuses this. For every zodiac crazy serial killer or copyright scofflaw they try to apply this to (and fail) there will be thousands and thousands of people that will be persecuted by organizations and governments for expressing their opinions. While this won't have a big effect in the West for half a generation, oppressive governments are going to be all over this.
And then, in ten or fifteen years, the youth will have grown with this technology and become accustomed to it...accepting it. Just like facebook has been accepted.
I'd move to Mars when it's possible but some bureaucrat will analyze everything I've ever written on the interwebz (and I've been mostly not stupid about shit I've written online since 1995 or so) and make some arbitrary decision about how I'm not acceptable because I'm not a huge fan of authority or some such crap.
Way to go humanity.
1. Re:I can't think of a non-evil use for this by aaaaaaargh! · 2013-01-08 22:07 · Score: 5, Informative
  
  Are you serious?
  You write as if some new method had been invented. There is no news in the above article. Authorship identification has been a reliable tool for many decades; a whole branch of linguistics (forensic linguistics) deals with it and similar topics like dialect recognition. Under certain circumstances you can even identify personality traits of the author; check out content analysis software like LIWC, for example.
  And, yes, plenty of serial killers and blackmailers have been captured with the help of these methods.
google translate by sl149q · 2013-01-08 20:03 · Score: 4, Interesting

One way to change a bunch of the stylistic cues would be to convert your message to another language and back using Google Translate. Depending on the intermediate language(s), and possibly using different translators, this should neutralize some things.
1. Re:google translate by sdnoob · 2013-01-08 20:08 · Score: 5, Funny
  
  using chinese as an intermediary will give you text written by motherboard manual writers. perfect cover.
2. Re:google translate by MysteriousPreacher · 2013-01-09 02:08 · Score: 3, Funny
  
  Please to make explaining in swiftness.
  
  --
  -- Using the preview button since 2005
Re:College essays by ForgedArtificer · 2013-01-08 20:35 · Score: 4, Insightful

Actually, it's the exact opposite.
Anti-plagiarism software searches for the same content with completely different styles.
Writer identification involves searching for the same style amongst completely different content.

--
The right to offend is central to the right to free speech.
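
The "same content" half of that distinction can be sketched as an overlap of word n-grams, which ignores who wrote the text and only asks whether the same word sequences appear. This is an illustrative assumption about how content matching could work, not a description of any real anti-plagiarism product; the function names and the trigram size are made up.

```python
import re


def shingles(text, n=3):
    """Set of overlapping word n-grams (as tuples) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def content_overlap(a, b, n=3):
    """Jaccard overlap of the two texts' n-gram sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Two essays copied from the same source score high here no matter how different their authors' habits are, while two posts by one author on unrelated topics score near zero. Writer identification needs features that behave the other way around.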
No actual result in TFA? by toutankh · 2013-01-08 23:05 · Score: 3, Interesting

After reading TFA I cannot find any convincing experimental validation. I see a lot of "can" and conditional tense (maybe that's the author's style), but nothing on the validation of the approach. Where is the experimental data, including the number of anonymous users correctly and incorrectly identified on forums?