When Writing, How Anonymous Can You Be, Really?
An anonymous reader writes "Do you still think your online writing is, basically, anonymous? Think again! Research suggests that people put many of their personal traits into their writing, and computers may just be able to pick them up. That's at least what a recently announced competition on author identification (Given a document, who wrote it?) and author profiling (Given a document, what are its author's age and gender?) wants to find out. Alas, re-using other people's writing is no solution either; there's also a competition on plagiarism detection (Given a document, is it an original?). Wanna revisit your recent rants?"
As previously reported on Slashdot. Now, please identify me. Here's a hint: I have a 5 digit UID.
Most people would just use something like Tor (or Tor and another VPN/proxy service). If they wanted to be absolutely sure, they could probably host their own hidden Tor Bridge somewhere and connect to it via VPN (or even with Tor itself, depending on level of paranoia).
>throw machine at 4chan /mlp/
>"Identify!"
>all posters sound the same
>machine concludes all posters are part of a highly advanced AI
>machine becomes depressed that it will never create anything wonderful like the spaghetti threads or
>kills itself
>mfw
Based on the above, who am I?
Like facial recognition.... I am sure this works wonderfully when it only has 10 or 20 exemplars to compare against, but it fails miserably as it scales up. Good luck conclusively identifying an author when there are over a million profiles to potentially match with.
Google thinks I'm a 20-year-old male. I'm in my early thirties and a gal. I think visiting Slashdot so much throws off its algorithm, as do all the video game sites I hang out at. You'd think searches for things like "gel nails" might tip them off, but it's probably further confused by my lack of visits to Pinterest.
I'd be interested to see if this program can do any better at analyzing my writing than Google does analyzing my search history.
Occasionally living proof of the Ballmer peak.
This would have been a lot more fun about two months ago to detect paid political astroturfers.
The ultimate AI-ish application would be an astroturfer plugin for chrome probably called "AstroturfBlock". So the site is a "tech" site, the contents are pure politics, and the text analysis system indicates an unemployed liberal arts degree holder... Go ahead and block it.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
That way I can muddy the waters by creating extraneous sets of data that can't be ruled out.
Once and for all, IP is bullshit, because you can easily use a computer to forge all people's unoriginal, derivative work, which is really just a result of their environment, upbringing and brainwashing.
P.S. This rant was written in a different style and with better spelling and grammar than most of my other AC rants.
One example is company performance surveys, which are supposed to be anonymous. I can't answer questions like 'how do you think the company leadership is doing?' without effectively giving away who I am; my opinion is shaped by my position, so my position is easily inferred from it.
"But remember, most lynch mobs aren't this nice." (H.Simpson)
-- Joe
Of course, authors can use these tools too, and then iteratively change their texts until they cannot be correctly identified or profiled.
Just like spammers can check whether their e-mails end up in spam filters before sending them.
It will be a never-ending cat and mouse game.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
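The rewrite, re-test, repeat loop described above can be sketched like this. It is a minimal illustration, assuming a hypothetical toy "identifier" that just counts signature words per author; real attribution tools are far more sophisticated, and `SIGNATURES`, `toy_identify` and `obfuscate_until_unattributed` are invented names, not any real library's API.

```python
import re

# Hypothetical signature words per author: a stand-in for a real attribution model.
SIGNATURES = {
    "author_a": {"subtle", "whilst"},
    "author_b": {"gonna", "lol"},
}

def toy_identify(text):
    """Return the author whose signature words appear most, or None if none appear."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best = max(SIGNATURES, key=lambda author: len(SIGNATURES[author] & words))
    return best if SIGNATURES[best] & words else None

def obfuscate_until_unattributed(text, true_author, identify, rewrites, max_rounds=10):
    """Apply rewrite steps in turn until `identify` stops naming `true_author`.

    Returns (final_text, rounds_used, still_identified).
    """
    if not rewrites:
        raise ValueError("need at least one rewrite step")
    for round_no in range(max_rounds):
        if identify(text) != true_author:
            return text, round_no, False
        # Still identified: apply the next rewrite step and test again.
        text = rewrites[round_no % len(rewrites)](text)
    return text, max_rounds, identify(text) == true_author

rewrites = [
    lambda t: t.replace("subtle", "quiet"),
    lambda t: t.replace("whilst", "while"),
]
print(obfuscate_until_unattributed("A subtle plan, whilst the guards slept.",
                                   "author_a", toy_identify, rewrites))
# → ('A quiet plan, while the guards slept.', 2, False)
```

The same loop works against a smarter identifier; the mouse just needs more (and better) rewrite steps, which is exactly the arms race.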
When Writing, How Anonymous Can You Be, Really?
No.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
well?
It wouldn't be that hard to write a script that would randomly swap your words with ones from a thesaurus run through Berkeley's FrameNet so it makes sense. Boom, statistically impossible to detect the same author.
Additionally, with a little more effort you could alter your sentence length and swap prepositional phrases around with some pattern-matching algorithms.
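A rough sketch of the synonym-swapping step might look like this, assuming a tiny hand-made `TOY_THESAURUS` dictionary as a stand-in for a real thesaurus (FrameNet is not queried here). It only covers word replacement, not the sentence-length or prepositional-phrase rewriting.

```python
import random
import re

# Toy thesaurus: an illustrative assumption, not FrameNet or any real lexical database.
TOY_THESAURUS = {
    "big": ["large", "huge", "sizable"],
    "fast": ["quick", "rapid", "speedy"],
    "think": ["believe", "reckon", "suspect"],
    "easy": ["simple", "trivial", "straightforward"],
}

def swap_synonyms(text, rate=0.5, seed=None):
    """Replace each word found in TOY_THESAURUS with a random synonym, with probability `rate`."""
    rng = random.Random(seed)  # seeded so a given run can be reproduced

    def replace(match):
        word = match.group(0)
        options = TOY_THESAURUS.get(word.lower())
        if not options or rng.random() >= rate:
            return word  # leave unknown words (and unlucky draws) untouched
        choice = rng.choice(options)
        # Keep a leading capital so sentence-initial words still look right.
        return choice.capitalize() if word[0].isupper() else choice

    return re.sub(r"[A-Za-z]+", replace, text)

print(swap_synonyms("I think it is easy to build a big, fast model.", rate=1.0, seed=1))
```

With `rate` below 1.0 only some eligible words get swapped, so two posts using the same word don't always get the same substitution.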
n/t
PS: Your posts are shit.
I can tell a girl's age based on her trim.
prbly ez to id me.
Can we use this technology to have all the people who answer the following question in this way arrested?
"Is a or b true?"
"Yes."
Cast as wide a net as possible, please. I'll accept a high proportion of false positives in order to get the people who think this joke is funny off the street.
I wanna know who why the lucky stiff is
I bet it's Shakspeare.
As a professional writer, I wish to be less anonymous. Hello, New Yorker?
As one of billions who are exposed, I doubt that I will attract any attention regardless of this technology. Perhaps they will figure out who really wrote Shakespeare's plays, but surely they will devote fewer resources to the rest of us.
...omphaloskepsis often...
Because nobody would bother to go that far to find out who said anything that I say anonymously. If I was really worried about someone figuring out it was me I'd be a lot more careful.
We can all (I hope) recognise quotes from authors with whom we have some familiarity, even if we haven't read the passage in question before. Terry Pratchett quotes, for instance, stand out a mile; Frank Herbert can be identified by the fact that he'll use the word 'subtle' at least twice a paragraph. Even here on /., certain posters' styles identify them without having to read their UID. Girlintraining is an example (for me at least); hell, I can spot her posts purely based on the responses to them, for God's sake.
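That kind of recognition is roughly what stylometry automates. A minimal sketch, assuming a hypothetical short list of marker words (including 'subtle') and made-up sample sentences rather than real quotes, compares relative word frequencies with cosine similarity and picks the closest known author:

```python
import math
import re
from collections import Counter

# Hypothetical marker list; real stylometry uses far more features (often hundreds).
MARKER_WORDS = ["the", "of", "and", "a", "to", "in", "that", "subtle", "but", "which"]

def profile(text):
    """Relative frequency of each marker word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid dividing by zero on empty text
    return [counts[w] / total for w in MARKER_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attribute(unknown, known_samples):
    """Name the known author whose marker profile is closest to the unknown text."""
    target = profile(unknown)
    return max(known_samples,
               key=lambda author: cosine(target, profile(known_samples[author])))

# Made-up sample sentences, not real quotations.
known = {
    "author_a": "The plan was subtle, and the subtle part of it was that nobody noticed.",
    "author_b": "Look, I just think we go in fast. No plan. Just go.",
}
print(attribute("A subtle move, subtle and quiet, and the guards never saw it.", known))
# → author_a
```

Cosine similarity ignores text length, so a one-line post can still be compared against a large body of someone's earlier writing.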
With the privacy arms race going on right now on the internet, identifying people based on what they write *and* their style is not only the magic bullet for Big Brother, but quite achievable given a big enough sample.
In a cybernetic fit of rage she pissed off to another age...
The problem with anonymity is that we have become addicted to digital... well... everything. Once you have the data in a digital format, it is merely a matter of algorithms, storage, and computational power to wring pretty much whatever you want out of it. I was a loudmouth Libertarian for quite a few years. I ranted and threw in my 2 cents at a lot of places online. Then things like the AT&T closet data capture and Facebook image recognition started popping up, and the writing on the digital wall was pretty much done. I expect everything I do digitally to be intercepted, databased, scanned, weighted, and saved for future use. IMHO the only real option for any privacy is not to make it digital in any way, including cell phones, land lines, or any other type of mass communication... but that's just me and my tinfoil hat.
they use the analysis to identify a small pool of people to watch, in order to find certain confirmation that they have the right guy
law enforcement tools are not limited to 100% certain ones. the fuzzy ones are used to narrow down a list of targets, so that law enforcement's limited manpower can be better spent finding certain confirmation
if you live in a country with good law enforcement, this is hollywood fantasy and/or paranoid schizophrenia, not reality. you want to catch the actual perp because you actually want to prevent crimes
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Kevin Bacon.
It is truly hard to be truly anonymous online. It can still be done, but you have to be an expert at it, and not mess up at all.
Hint: even if you go to a cyber cafe, Starbucks, or sit in a hotel parking lot, odds are you have an electronic device (or your car) that connects and pings home, or at least gets logged someplace.
I'm not sure if that was a lame attempt at a joke or if you meant Francis Bacon.
Timothy's put-downs have been getting a lot of undeserved attention recently. For starters, I don't care what others say about Timothy. He's still nasty, two-faced, and he intends to dig a grave in which to bury liberty and freedom. Now stay with me a moment here; I am making a point. Specifically, if my own experience has taught me anything, it's that he thinks that he's a tribune of the oppressed. However, his endeavors are so lewd that they are easily taken up and assimilated by spiteful, fork-tongued authoritarians, whose intellectual level corresponds to the material offered. On the other hand, I admit I have a tendency to become a bit insensitive whenever I rebuke Timothy for trying to lay all of society open to the predations of organized criminality. While I am desirous of mending this tiny personality flaw, Timothy has made it known that he fully intends to emphasize the negative in our lives instead of accentuating the positive. If those words don't scare you, nothing will. If they are not a clear warning, I don't know what could be. Let me conclude by saying that we who want to deal summarily with unscrupulous snobs will not rest until we do.
are only afforded by the rich, connected and well-armed. For the others, be careful what you say, anywhere.
New Economic Perspectives
Nobody cares enough to identify me by my writings.
Isn't most of what we say pure vanity anyhow?
Oh shit!.....I mean, oh darn!
Table-ized A.I.
Which is pointless if your actual writing can give you away. No amount of Tor/VPN or whatever will do anything useful if your actual writing itself can lead back to you. If I use every anonymity trick in the book, the jig is still up the second I say "Hi, I'm Bob Smith, of 6424 N. 22nd Street, Akron OH".
Sure, you could make a magical anonymous internet, but it defeats the purpose of trying to disseminate whatever you're writing to an audience, unless you're only going for a very small, select audience of people using the same scheme. And even then, if others could access it, you still might not be anonymous.
A patriot must always be ready to defend his country against his government. -edward abbey
The only thing that Barr did correctly was look up WHOIS info on the People's Liberation Front's website after an Anonymous guy claimed to be "Supreme Commander" of the PLF... When Barr confronted him, the guy claimed it was a joke, so Barr pointed to an innocent man instead. (Ars Tech article on the 'correct' Commander X.) Otherwise, Barr's tactics -- including analyzing what the people wrote -- gave him completely wrong answers.
Now mostly at Usenet:comp.misc & SoylentNews.org (it's made of people!)
This message is encrypted with triple google translation.
This isn't new news. There have been KDD challenges on this exact subject for years now. I did one at Rutgers in 2005, trying to differentiate the various Suzukis who published to PubMed. Plenty of other research in this area too: http://scholar.google.com/scholar?q=kdd+challenge+author+identification&hl=en&as_sdt=0&as_vis=1&oi=scholart&sa=X&ei=ZZnOULLcNOHWiwL7t4GACQ&ved=0CDAQgQMwAA
I'm posting as AC.
If you type on a computer that runs a program NOT written by you, then you have already subjected yourself to "invasion of privacy". Here is a sample of how many lines of code some of the world's operating systems contain:
http://en.wikipedia.org/wiki/Source_lines_of_code#Example
Have you examined all of that source code for malware/0-day exploits?
Even if you type on a computer whose software you wrote yourself from scratch, are you sure the hardware chips it runs on don't have "rogue" circuits in them? And are you sure your program doesn't have a 0-day exploit because you forgot something?
I can't stand how every slashdot story submission has to end with a pink flamingo smoke grenade. I'm guessing that sober "just the facts, ma'am" submissions still exist, but rarely make it through the selection hoop of our post-counting overlords.
I have several online pseudonyms which I make an effort to keep separate. I rarely post the same idea under more than one identity. If I post it here, it doesn't go there. I prefer to keep things separate so far as I can. I also have some background in computational linguistics. I've known for fifteen years that there is absolutely no way to win this battle long term. Only the most insipid comments will escape long-term annealing. If the word "gay" is the all season tire on your social media K-car, then your identity is safely concealed within the deep-wank weeds.
If every post you write contains colourful language or idiom such as "all-season tire of deep-wank camouflage" you're toast and you know it, clap your hands. Merely getting my possessives and plurals and possessive plurals right more often than not narrows the net substantially. I might pedantically write Harry S Truman without putting a dot after the S (Snopes: "Although the 'S' was not technically an abbreviation and therefore did not need to be followed by a period, Truman's full name was generally rendered as 'Harry S. Truman' during his lifetime ..."). I make use of colons, semicolons (these come and go), mdash appositives, and parenthetical side-notes--at least one of these in almost every paragraph I write. I post way more links than the average person. My thoughts meander. There is playful use of language with double readings. I subvert cliche to achieve double readings that enable me to circle away from my target, then loop back from an unexpected angle. My unit of thought is the paragraph more so than the sentence.
Even with all those signatures, originality in word selection is my neon tattoo. The corpus analysis algorithms likely don't do much (yet) with originality. Hard to characterize. For a while my anonymity might pass through the gun-metal algorithms unmelded by virtue of my writing being too bright and distinctive and easy to trace. But not for long. Even the fractal filigrees of originality will be coded eventually. (Pay no attention to the alliteration: an accident, not a stylistic signature.)
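The habits listed above (colons, semicolons, parentheticals, double-hyphen dashes, lots of links) are easy to count. Here is a hypothetical feature extractor, a sketch rather than what any real attribution system uses, that reports each habit per paragraph after stripping URLs so their colons aren't counted:

```python
import re

URL_RE = re.compile(r"https?://\S+")

def style_markers(text):
    """Per-paragraph rates of a few punctuation and layout habits (hypothetical feature set)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    n = max(len(paragraphs), 1)  # treat empty text as one paragraph to avoid dividing by zero
    links = len(URL_RE.findall(text))
    prose = URL_RE.sub("", text)  # drop URLs so "http://" doesn't count as a colon
    words = re.findall(r"\w+", prose)
    return {
        "colons_per_para": prose.count(":") / n,
        "semicolons_per_para": prose.count(";") / n,
        "parentheticals_per_para": prose.count("(") / n,
        "dashes_per_para": (prose.count("--") + prose.count("\u2014")) / n,
        "links_per_para": links / n,
        "words_per_para": len(words) / n,
    }

sample = "First point: yes; no (maybe) -- see http://example.com\n\nSecond paragraph here."
print(style_markers(sample))
```

Reporting rates per paragraph rather than raw counts keeps long and short posts comparable.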
Frankly, my dear, I don't give a damn.
This is about respect. We all live a double life, pretty much all the time. We speak differently in front of our mothers (most of us) than with the lady-killing roughnecks at the peanut bar or power tie horn-dogs at the chichi sushi bar.
I value anonymity because I don't wish to own everything I say on a literal level, stripped of context, devoid of my original conceit or persona.
I happen to regard linearity as a social construct. Humans are not inherently linear in cognition or constitution. We learn how to cultivate linear facades in our areas of competence (but not necessarily around the edges: this is why a competent accountant consults his astrologer Madam Threenipple). If you like the primary facade you have, and it suits all purposes, then I suppose you'll see the charm in proclaiming it from the RealName rafters.
If you're a Baptist homosexual (I've known a few), you might wish to string your public identity by separate ropes.
Or maybe you've just got things to work out. You're figuring things out on the fly and trying them on for size and you don't wish to fall prey to the Joseph McCarthy clean-nose auto-da-fe "have you ever". Implication: Anything you've ever said will be permanently recorded and will classify you irretrievably. This despite 0/1 statistics never passing T-scores. If the same person also has an NRA membership and has been a career employee of the Hoover Institute for two decades? Still a communist. Ten times more dangerous.
The kind of person most willing t
Please formulate this into a query that is computable in polynomial time.
The Unabomber concurs.
About two years into my current job, I was able to guess which of my longer-serving colleagues had written or contributed to various anonymous documents and reports floating around the office. The process is easy: learning what words they use, misuse or confuse; who writes in a more formal or a more chatty style; who seems unable to leave out detail or write a precis when appropriate; etc. What confuses this is copy-editing and the numerous copied passages typically found in such documents.
I just have to turn my writing English Finnish, Russian, and, finally, through the back to English again. Analysis software!
Your ad here.
There are 4 simple rules that will help you to avoid this type of identification:
1. Be brief
2. Write seldom
3. Plagiarize!
4. Do not write in your mother tongue.
Tha iss why I only quote other people. It is for you to guess from whom this is comming.
Simple: write in one language and run it through an online translator into another.
If you only speak one language, write, then translate, then pick another translation service and run it through that.
I highly doubt that any algorithm is going to be able to figure that one out... today. The future? Well, who knows.
Now, if they could make their research apply also to conversations involving meme generators or lolcats, that would be something...
Re: see if you can identify alternate accounts girlintraining has used... Challenge accepted!
1 - are you a black gem?
2 - do you wear a training bra?
3 - is your sock a puppet at times?
4 - are you female exclusively, or are your accounts bisexually identifying?
5 - do you have an animal account and a vegetable account to go along with your mineral (onyxruby) account?
I wonder what would happen if you used automatic translation, like Google Translate, to translate into a different language and back. I bet that would make it a lot harder to match to other things you wrote, especially if you used a different intermediate language each time. Having to touch up the obvious errors might still provide a partial "fingerprint" of your writing style, though.
This isn't really news. I've been having discussions online since before AOL & Windows 3.1 existed, when the hot things were email lists and Usenet.
Trolls were around even then and once they would get booted off or blocked they would don new aliases, which fooled nobody.
Their style of writing gave them away.
I think back to Neal Stephenson's Baroque Cycle, where, whenever he switched POV from one lead character to another, he seemed to change his writing style. It felt like a collaboration of authors, not just one. I suspect those three books would confuse the analysis, and I wonder what other books/authors would also be likely to confound it.
Everybody under Pax Americana Rule will have to pay a price for seriously challenging whatever the ruling class and their minions currently do. Of course it is vastly different than in Russia, China or the Arab/Muslim tyrannies. They won't lock you up as easily as it is done in Turkey. They won't shoot you as in Russia.
But "free" speech is not totally "free". Rumors and Lies about you will come at a cost. Non-violent Intimidation will exact some cost on your life.
As an exercise, start with campaigning about the armed theft of land by Israel and make yourself identifiable. Then watch the rumors about you grow, watch the "terrorist collaborator" lies spread. Wait for the intel people sitting behind you in the train and making nasty remarks about you.
THAT is why using TOR is a very, very good idea. And I don't subscribe to the notion that "they can break it anyway, so it is useless" - that's the FUD spread by those who want to eavesdrop.
..why do you use a handle ???
..don't ever use pseudonyms. Go for Anonymous Coward and TOR all the time. Restart TOR on a regular basis.
People were manually analyzing style to uncover the authors of unattributed or intentionally anonymous writing for hundreds of years before the Internet ever existed.
All of a sudden you apply this same concept to computer algorithms against useless blog postings, and somehow this magically becomes something new no one ever thought about before?
As if humans are incapable of manually adjusting style to thwart automated correlation if they so desire. Machine algorithms are notoriously bad at dealing with false information.
I think We The Geeks should grow much more sceptical and cynical. If you really think you have to take things literally, you are essentially screwed. The "socially intelligent" use codewords all the time. It is their way to control those who are not inducted into their world of codewords.
Reflect on what happens around you and think about social dynamics, hidden agendas and wholly egotistic motives. Be as cynical as possible. Analyze the people-programmers (priests). Only then will you understand what is really going on.
This is a world of nastyballs and you have to understand and handle this if you intend to survive.
Do you seriously think Google Translate is not part of the NSAGOOGLE system? They will certainly log (and store forever) each and every input and output of their service. So that "obfuscation" works only against the lesser enemies, those who are not allied with NSAGOOGLE. And these are, generally speaking, incompetent guys anyway.
I stopped using my Slashdot account to make posts a few years ago for this exact reason. I realized that at some point author identification software would be good enough that any large body of text would be uniquely traceable to who wrote it. Slashdot posts never go away. By making all posts from my account, I was making a huge body of text all neatly tied together. That doesn't matter now, but it will eventually. Someone is going to make a "look up person X" site that uses such technology, and then it will be as easily available as Google is now. You won't even need person X's name, photo or any details about X. All you will need is a large text sample that X wrote.
Slashdot themselves could probably still tie my anonymous posts together if they log IPs, browser settings and such. I wouldn't be surprised if spy agencies the world over log a lot of the traffic across their part of the internet, especially in the US. They might even have a list of "potentially embarrassing or incriminating things person X said/wrote/did" for every X who is online, including things X posted that X thought were anonymous (like what I'm writing right now). It would be interesting indeed if someone went Bradley Manning on that database.
Forget about anonymity. Assume anything recorded, even if it appears anonymous now, is going to be sent to everyone you know with your name on it the day before your wedding.
Gracias! Pardon my misunderstanding, and Thanks for telling me. The first thing that came to my mind was M.I.T. the school in Massachusetts. (My dad said that when he first moved to San Diego he kept misreading the "S.D." initials in the local paper as "South Dakota" and kept wondering why they would print so much news about South Dakota here in California. His "L.A." frame of mind must have rubbed off on me) Good luck with your Masters Degree!