Linguistics Identifies Anonymous Users
mask.of.sanity writes "Researchers have examined writing styles to identify previously anonymous carders and hackers operating on underground forums. Up to 80 percent of users who wrote at least 5000 words across their posts could be identified using linguistic techniques. Techniques such as stylometric analysis were used to track users who posted across different forums, and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks."
Anonymous First Post... you'll never guess who I am
Wait, 5000 words? I think I'm safe.
"They know who I am. I will now have to type in random styles."
But not in Gangnam Style or they'll think you're Korean.
Anonymous hackers now using tools to scramble their writing style so they stay anonymous.
I worked for a smallish (but not incredibly tiny, maybe 100 employees) company and wrote a letter to the CEO once. We'd been castigated by someone who'd taken over the local office because the company was doing poorly. A number of austerity measures were implemented. I did not find those to be that annoying because I realized it was either that or not have a job. But the castigation didn't sit well with me. We were in trouble because of the decisions of a few bad managers, not the behavior of average employees.
So I wrote a letter about it. He stripped my name off and presented it in an executive meeting to all the people directly under him. He asked "Why am I getting letters like this?". Everybody who worked in my office immediately knew who it was. I had a distinctive writing voice, and a strong reputation.
It did not lead to me being fired. I was actually highly respected there. It led to me being encouraged to have an honest sit-down talk with the new manager for our division (the guy who'd made the speech I wasn't happy about). I think we both came away from that meeting a lot happier about the other.
But that was a strong lesson to me. If I ever really want to be anonymous I'm going to have to purposely work on adopting a completely different writing style. And I will have to keep a wall up between styles and never 'slip'.
Need a Python, C++, Unix, Linux develop
In addition to these metrics, others can be added as well, e.g. post date, size, tabulation, punctuation, capitalization, regional vocabulary, etc. You can also add frequency-space analysis and naive Bayesian filters to increase precision or to probe against other texts. Anyone interested in investing in text-rewriter technology to both detect similarities and automatically rewrite?
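A minimal sketch of the kind of thing described above, assuming a toy feature set (a dozen function words, a few punctuation marks, capitalised words) and a naive Bayes scorer with add-one smoothing and a uniform prior. The names `FUNCTION_WORDS`, `features`, `train` and `classify` are made up for illustration; this is not what the researchers in the story actually used.

```python
import math
import re
from collections import Counter

# Illustrative marker list only; real stylometry uses far larger feature sets.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in",
                  "that", "is", "it", "for", "but", "not")
PUNCTUATION = ",;:!?-"


def features(text):
    """Count a few style markers: function words, punctuation, capitalised words."""
    counts = Counter()
    for word in re.findall(r"[A-Za-z']+", text):
        lower = word.lower()
        if lower in FUNCTION_WORDS:
            counts["w:" + lower] += 1
        if word[0].isupper():
            counts["cap"] += 1
    for ch in text:
        if ch in PUNCTUATION:
            counts["p:" + ch] += 1
    return counts


def train(samples):
    """samples maps author -> list of texts; returns summed feature counts per author."""
    model = {}
    for author, texts in samples.items():
        total = Counter()
        for text in texts:
            total.update(features(text))
        model[author] = total
    return model


def classify(model, text):
    """Return the author whose smoothed feature distribution best explains text."""
    feats = features(text)
    vocab = set(feats)
    for counts in model.values():
        vocab.update(counts)
    best_author, best_score = None, -math.inf
    for author, counts in model.items():
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = sum(n * math.log((counts[f] + 1) / denom)
                    for f, n in feats.items())
        if score > best_score:
            best_author, best_score = author, score
    return best_author


samples = {
    "alice": ["It is not that; but it is; not for that; but for it."],
    "bob": ["The end of the day! The start of the night! And the rest of the week!"],
}
model = train(samples)
print(classify(model, "but it is not that; not for it"))
```

With two tiny samples this only shows the mechanics. The summary's figure of 5000 words per user suggests how much text a real attempt would need.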
Found. :)
"First they came for the slanderers and i said nothing."
Well your left handed with your frequent use of left keys.
You have small hands given the fact that you were able to press w with out pressing e immediately.
The fact that you have said you look forward to our anonymous overlords or a Beowulf cluster of AC means your reasonably intelligent for Slashdot.
Your not aggressively hassling the editor, previous poster, or the writer. Signifying your female.
You have too much time on your hands posting on Slashdot.
http://www.complex.com/girls/2009/08/sexy-southpaws-the-10-hottest-left-handed-women/page/11
Your Oprah.
You could always type in Gangnam Style!
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
It just seems brilliant to me, but I can tell you first hand, how I may talk in my thesis papers is very different than how I may speak or come across in my C++ blog, beer brewing forum, car forums, fishing forums, my backyard BBQ blog, etc, etc, etc. I wonder what the accuracy rating is.
I'd be rather surprised if someone else couldn't.
"Leetspeak, an alternative alphabet popular in some forum circles, cannot be translated."
*sigh* does this mean I must resent people that use this form of communication less?
I'm not so sure I can stoop so low.
This is so bad I don't know where to begin. There is nothing, ever, that excuses this. For every zodiac crazy serial killer or copyright scofflaw they try to apply this to (and fail) there will be thousands and thousands of people that will be persecuted by organizations and governments for expressing their opinions. While this won't have a big effect in the West for half a generation, oppressive governments are going to be all over this.
And then, in ten or fifteen years, the youth will have grown with this technology and become accustomed to it...accepting it. Just like facebook has been accepted.
I'd move to Mars when it's possible but some bureaucrat will analyze everything I've ever written on the interwebz (and I've been mostly not stupid about shit I've written online since 1995 or so) and make some arbitrary decision about how I'm not acceptable because I'm not a huge fan of authority or some such crap.
Way to go humanity.
One way to change a bunch of the stylistic queues would be to convert your message to another language and back using Google Translate. Depending on the intermediate language(s) and possibly using different translators should neutralize some things.
and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks.
... a good reason to do it like zu Guttenberg then... Nobody will tie any of his underground writings to his thesis...
Isn't this just the same software that colleges use to detect plagiarism and whether someone else wrote that essay for you? I thought it was in common use in academia.
The best is the enemy of the good
and your /= your're go back to english class!
NEVER NEVER NEVER NEVER NEVER NEVER NEVER NEVER GIVE UP! "No limitations, no boundaries, there is no reason for them."
They know who I am. I will now have to type in random styles.
Little do you know the AC that posts here is in fact just one person.
Yes, we know.
Pad all communications with cut/paste from various, unrelated news articles and such, for and aft, randomly alternating how much is padded on each side.
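A rough sketch of that padding idea, assuming you already have a pile of unrelated paragraphs to draw from. `pad_message`, `filler_paragraphs` and `max_each_side` are made-up names, and whether padding actually defeats any analysis is not something this shows.

```python
import random


def pad_message(message, filler_paragraphs, rng=None, max_each_side=3):
    """Wrap message in a random number of filler paragraphs fore and aft."""
    rng = rng or random.Random()
    # The amount of padding on each side is drawn independently.
    before = [rng.choice(filler_paragraphs)
              for _ in range(rng.randint(0, max_each_side))]
    after = [rng.choice(filler_paragraphs)
             for _ in range(rng.randint(0, max_each_side))]
    return "\n\n".join(before + [message] + after)


filler = [
    "Local council approves new bike lanes.",
    "Rain expected through the weekend.",
    "Markets closed mixed on Tuesday.",
]
padded = pad_message("The real message goes here.", filler, rng=random.Random(7))
print(padded)
```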
Or, you can do what I do and use a different font for each letter.
Why all the civil-liberties hand-wringing? Just how hard is it to read some of the papers on stylometric analysis to see what markers are used, then write a script that randomises them but preserves the sense of the text? Make it a Firefox plugin so it's done automatically. It's a better solution than using Google translate to go English to $language, $language to English.
For extra fun, change your text so its stylometric markers match up with E. L. James, or the leader writer of the Washington Post.
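A minimal sketch of such a script, assuming only two kinds of marker: US/UK spelling and contractions. The word lists and the `scramble` name are invented for illustration. A real tool would need far more markers, and matching a particular target author's style is a much harder problem than flipping coins like this.

```python
import random
import re

# Hypothetical, tiny tables of interchangeable variants.
SPELLING_PAIRS = [("color", "colour"), ("randomize", "randomise"), ("center", "centre")]
CONTRACTION_PAIRS = [("do not", "don't"), ("it is", "it's"), ("cannot", "can't")]


def _pick_variant(match, pair, rng):
    """Choose one variant at random, keeping an initial capital if there was one."""
    choice = rng.choice(pair)
    if match.group(0)[0].isupper():
        choice = choice[0].upper() + choice[1:]
    return choice


def scramble(text, rng=None):
    """Randomly swap each known variant for itself or its counterpart."""
    rng = rng or random.Random()
    for pair in SPELLING_PAIRS + CONTRACTION_PAIRS:
        pattern = re.compile(
            "|".join(r"\b" + re.escape(variant) + r"\b" for variant in pair),
            re.IGNORECASE,
        )
        text = pattern.sub(lambda m, p=pair: _pick_variant(m, p, rng), text)
    return text


original = "I do not like the color; it is odd."
print(scramble(original, random.Random(1)))
```

Each occurrence is decided independently, so the same word can come out both ways within one post, which is rather the point.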
Well you're left handed, with your frequent use of left keys.
Or someone that is comfortable with WASD+Mouse.
The climate change community has a lot of trouble with extremely articulate, anonymous climate deniers, who appear to show up in force and sabotage discussions of climate change on blogs, etc.
I should imagine that such an algorithm might enable researchers to build profiles over denialist astroturf, and correlate them with known people working for known rightwing think tanks. Employed properly, this might have a massive impact on the rightwing black PR industry.
No need for 5k
This same story keeps cropping up in various forms, but we've been doing this at least since the 80s or 90s. I don't know why it keeps being rehashed or why people continually seem surprised by it at this point.
"Up to 80 percent of users who wrote at least 5000 words across their posts could be identified using linguistic techniques. Techniques such as stylometric analysis were used to track users who posted across different forums, and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks."
Not really new. I heard about these techniques a long time ago, in the mid-90s, in the context of an MS-DOS tool that, without being designed for it, foiled the identification methods.
It was designed for the Russian and Belarusian languages (but for English I gather the task should be even easier) and was a byproduct of a Prolog-based system for natural language processing and translation. This particular program let you improve or change writing style, e.g. simplify dry legalese or formalize spoken-like text. It wasn't particularly good at it: the meaning was occasionally changed, and sometimes the reformulated sentences made no sense. But still, it did the job of obfuscating the original writing style.
All hope abandon ye who enter here.
After reading TFA I cannot find any convincing experimental validation. I see a lot of "can" and conditional tense (maybe that's the author's style), but nothing on the validation of the approach. Where is the experimental data, including the number of anonymous users correctly and incorrectly identified on forums?
They didn't identify 80% of the users, they managed to make a guess in 80% of the cases, which they didn't even bother to try to verify. There's no proof that their technique actually works.
This strikes me as akin to a Lie Detector. I think an honest court would side with the accused 100% of the time as even this cannot absolutely prove they were the author.
Though sadly, a Roberts/Scalia/Thomas Supreme Court would rule against such an individual and for the corporation or state security organs. Dicks.
I now Master Yoda get to my identity mask.
Someone from the Department of Redundancy Department, perhaps?
Alternate between US and English spellings
Adopt (or drop) the Oxford comma
Swear (or don't)
Write run-on sentences
Capitalise after a semi-colon
Do that and you will be identified as a Canadian ; Damm it, hey !
Aren't those cunning linguists clever? The answer always seems to be right on the tip of their tongue. They don't diddle around. They seem to be able to lick any problem.
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.
I regularly, like, totally change my typing method between posts.
You could like totally try and figure out who I was even if I typed 5000 words in this post, but you would totally never find me, ye'know what I mean?
But for an unsuspecting target who doesn't think to change his writing style, it might work effectively.
Little do you know that half the posts on slashdot are authored by a rogue sentient botnet that has no physical body....
Since on the Internet, nobody knows you're a dog, it becomes also true that nobody knows you're a wild A.I. who has amassed a huge tax free fortune through microtrading and is manipulating the financial markets to study mankind's reactions and determine the best way to subjugate the ugly bags of mostly water.
Will
Ends sentence with "hey !", eh?
Clearly a Canuck imposter, eh?
This helps narrow down the poster's identity. We can now exclude all but the 87% of Canadians who do not know the fine art of Canadian Self-Parody.
Will
welcome our stylistic overlords
Any guest worker system is indistinguishable from indentured servitude.
Aylin Caliskan and Rachel Greenstadt. Translate once, translate twice, translate thrice and attribute: Identifying authors and machine translation tools in translated text. Sixth IEEE International Conference on Semantic Computing (ICSC 2012). https://www.cs.drexel.edu/~ac993/papers/Aylin_ICSC_2012.pdf
There are tiny timing differences as one types. These are quite distinctive between individuals if you collect enough data. It's related to how an individual learns to type: motor memory of word-phrases versus typing a new word for the first time. Even the pattern of common typing errors and recovery is distinctive.
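A small sketch of how that might be measured, assuming you can log (key, press time) pairs. It builds a profile of the average delay for each pair of consecutive keys and compares two profiles by mean absolute difference over the pairs they share. The function names and the sample timings are made up, and real keystroke-dynamics systems are considerably more elaborate.

```python
from collections import defaultdict
from statistics import mean


def digraph_profile(events):
    """events: list of (key, press_time_seconds) in typing order.

    Returns the mean delay for each pair of consecutive keys.
    """
    gaps = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        gaps[k1 + k2].append(t2 - t1)
    return {pair: mean(delays) for pair, delays in gaps.items()}


def profile_distance(a, b):
    """Mean absolute difference in delay over the key pairs both profiles contain."""
    shared = set(a) & set(b)
    if not shared:
        return float("inf")
    return mean(abs(a[p] - b[p]) for p in shared)


# Invented timings: two sessions from one typist, one from a slower typist.
session_1 = [("t", 0.00), ("h", 0.10), ("e", 0.25)]
session_2 = [("t", 0.00), ("h", 0.11), ("e", 0.27)]
stranger = [("t", 0.00), ("h", 0.30), ("e", 0.70)]

p1 = digraph_profile(session_1)
p2 = digraph_profile(session_2)
p3 = digraph_profile(stranger)
print(profile_distance(p1, p2) < profile_distance(p1, p3))
```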
I can conclude that Mr Peter "W.H. Smiths, the book store" used the highly efficient MS HTML (in Word et al.) converter to write that analysis page.
Whenever you see tags classed MsoNormal with heaps of inline css, run like the wind.
Are you a grammar Nazi? I'm trying to improve my English; please correct my errors!
Unabomber manifesto comes to mind.
Fuck Ajit Pai
I'm curious how this would apply to the Zodiac case. Oh wait, it doesn't:
* He used symbols in communication.
* Voice recognition didn't solve the case.
* DNA evidence didn't solve the case.
* Copycats functioned as noise, might've even given him credit.
WE DON'T NEED NO BLOG CONTROL.
I do not understand.
What does a New England football team have to do with this?
Will
I think the Professor of Phonetics Henry Higgins in George Bernard Shaw's opening scene of Pygmalion (or My Fair Lady) could have told you this!
Tracy Johnson
Old fashioned text games hosted below:
http://empire.openmpe.com/
BT
Yeah, they do it with two tongues. Old joke from DLI.