Slashdot Mirror


When Writing, How Anonymous Can You Be, Really?

An anonymous reader writes "Do you still think your online writing is, basically, anonymous? Think again! Research suggests people put many of their personal traits into their writing, and computers may just be able to pick them up. That's at least what a recently announced competition on author identification (Given a document, who wrote it?) and author profiling (Given a document, what are its author's age and gender?) wants to find out. Alas, re-using other people's writing is no solution either; there's also a competition on plagiarism detection (Given a document, is it an original?). Wanna revisit your recent rants?"

40 of 184 comments

  1. Re:Yes, we know by Anonymous Coward · · Score: 3, Funny

    I got this one. You sir are Anonymous Coward, with UID 00666. Now, what prize do I get for this?

  2. Re:Guess who I am! by Sasayaki · · Score: 2

    Moot? Is that you?

    --
    Check out my sci-fi book "Lacuna" at http://goo.gl/MVxX8
  3. Uh huh... by Anonymous Coward · · Score: 3, Interesting

    Like facial recognition.... I am sure this works wonderfully when it only has 10 or 20 exemplars to compare against, but it fails miserably as it scales up. Good luck conclusively identifying an author when there are over a million profiles to potentially match with.

    1. Re:Uh huh... by Mitreya · · Score: 2

      Like facial recognition.... I am sure this works wonderfully when it only has 10 or 20 exemplars to compare against, but it fails miserably as it scales up. Good luck conclusively identifying an author when there are over a million profiles to potentially match with.

      Or like fingerprints, which start giving off a larger number of false positives when compared against a large enough database of entries.

      Consider this: they don't have to conclusively identify the original author. It will be good enough to find someone with similar writing (i.e. also a subversive) and charge them instead of the original perpetrator. And good luck proving that you didn't write that.

      Mmmm, a national database of writing samples collected from everyone in school... that sounds like fun.

    2. Re:Uh huh... by russotto · · Score: 2

      Mmmm, a national database of writing samples collected from everyone in school... that sounds like fun.

      I never thought not doing my homework would pay off so well :-).

      There are definitely going to be false positives. I've seen other people's writing that was nearly word-for-word identical with my own, and there's no way they saw mine (nor I theirs) before writing it.

    3. Re:Uh huh... by Kjella · · Score: 2

      Quoting from that WP page:

      which led to his brother and his wife recognizing Kaczynski's style of writing and beliefs from the manifesto

      It's a whole different thing to recognize a person's beliefs (if possibly in a more extreme form) than to recognize what they've written on an entirely different subject. Quite possibly they recognized specific examples, theories, arguments or conclusions he had used as well. I'd wager this was 99% content and 1% style, and the content is what really clinched that it wasn't some other crazy nut bag with the same ideas. I recently ran into one online who had some rather unique conspiracy theories; if they started showing up anywhere else, I'd be 99.99% sure it was the same guy. He could write a whole book and I wouldn't recognize him on writing style alone, though.

      --
      Live today, because you never know what tomorrow brings
    4. Re:Uh huh... by Runaway1956 · · Score: 2

      The larger the sample of a person's writings, the more accurate this thing will become, of course. The nature of the writings will also influence the accuracy. In school, even an essay is going to be very similar to other people's essays, as they are unlikely to contain a lot of original thought. Everyone is doing their best to feed the teacher the responses that they believe the teacher wants to be fed.

      Now, if your ex-girlfriend were to give these researchers everything that you ever wrote to her, there would likely be more original work, giving more insight into your mental processes, than your homework ever contained.

      Even more revealing would be any discourses that you have ever written on politics or philosophy. With those, you would be revealing one hell of a lot about your mind and how it works. Given a hundred pages of such musings and ramblings, you would be pegged pretty accurately, and a genuine researcher wouldn't mistake anyone else for you.

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
    5. Re:Uh huh... by Anonymous Coward · · Score: 3, Interesting

      Well put.

      As a test, I just looked through my own posts on Slashdot and selected a four-word string I use pretty often that seemed somewhat unique, but not obviously so.

      I combined that string (in quotes) with site:slashdot.org on Google. At least two of the results returned in the first page were me, made over the course of the last few weeks.

      Now of course there are others that used that in their posts, but had someone picked that string from something I posted AC they'd know there was a good chance it was me. And they'd have my real name, website, etc.
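
      The trick described above is essentially phrase (word n-gram) matching. Below is a minimal Python sketch, assuming plain-text posts; the function names are hypothetical, and this is not how Google or any competition entrant actually works. It lists the four-word sequences two texts have in common:

```python
import re


def word_ngrams(text, n=4):
    """Return the set of lowercase word n-grams (as tuples) found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(text_a, text_b, n=4):
    """Word n-grams that appear in both texts, returned as sorted strings."""
    common = word_ngrams(text_a, n) & word_ngrams(text_b, n)
    return sorted(" ".join(gram) for gram in common)


# Hypothetical example: a signed post and an anonymous one.
known = "As far as I can tell, the whole thing is a solution in search of a problem."
anon = "Honestly this is a solution in search of a problem, as usual."
print(shared_ngrams(known, anon))
```

      A shared phrase is only evidence if it is rare in the wider corpus, which is exactly what the site:slashdot.org search checks; a common idiom like the one in this example would mostly produce false positives.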

  4. Can it beat Google? by sandytaru · · Score: 4, Insightful

    Google thinks I'm a 20-year-old male. I'm in my early thirties and a gal. I think visiting Slashdot so much throws off its algorithm, as do all the video game sites I hang out at. You'd think the searches for things like "gel nails" might tip them off, but it's probably further confused by my lack of visits to Pinterest.

    I'd be interested to see if this program can do any better at analyzing my writing than Google does analyzing my search history.

    --
    Occasionally living proof of the Ballmer peak.
    1. Re:Can it beat Google? by monkeyhybrid · · Score: 4, Funny

      Thank you for updating your age and gender details in our databases.

      Yours sincerely,
      Google.

    2. Re:Can it beat Google? by Kergan · · Score: 2

      I second that. According to Google, I'm an old, obese dude in desperate need of new abs and Viagra. Go figure.

    3. Re:Can it beat Google? by Anonymous Coward · · Score: 5, Insightful

      Google thinks I'm a 20-year-old male. I'm in my early thirties and a gal. I think visiting Slashdot so much throws off its algorithm, as do all the video game sites I hang out at.

      I think you misunderstand the purpose of the algorithm. A writing sample is, of course, insufficient to detect your age and gender precisely.

      There is a good chance that your writing style matches that expected of a male in his twenties, in which case the algorithm has done well. You may be a gal, but your interests and behavior are perhaps more similar to those of a male in his twenties, and for the purposes of predicting what to sell you or what to expect from you, that's actually more accurate than your actual stats.

    4. Re:Can it beat Google? by sandytaru · · Score: 2

      See, that's where the Google algorithm programmers got lazy. They assume that too.

      --
      Occasionally living proof of the Ballmer peak.
    5. Re:Can it beat Google? by demonlapin · · Score: 3, Funny

      You'd think the searches for things like "gel nails" might tip them off

      Nah, just makes it think you're emo.

    6. Re:Can it beat Google? by pla · · Score: 2

      I think visiting Slashdot so much throws off its algorithm, as do all the video game sites I hang out at.

      Back in my youth, a friend consciously chose a handwriting style specifically to throw off so-called "handwriting" analysts. Of course, he chose to incorporate all the worst traits possible, meaning anyone looking at a sample of his writing would either immediately get the joke, or would back away slowly in fear for their life.

      Funny to think that in the modern world, "handwriting" has become an all-but-deprecated "legacy" skill, but I did take a lesson from his example: I use an entirely synthetic online writing style, right down to an artificial regional dialect. Oddly, it isn't the dialect I'm aiming for; automated profilers such as the ones linked in the summary usually describe me as Midwestern for reasons I don't quite know. Still badly wrong, though, so no harm done.

    7. Re:Can it beat Google? by Jessified · · Score: 2

      He's just trying to throw you off his scent. The first guess was right.

  5. Re:Yes, we know by Anonymous Coward · · Score: 5, Funny

    Why are you replying to yourself?

    Wait. Why am I replying to myself again?

  6. I just don't try to be anonymous in writing by acroyear · · Score: 4, Interesting

    One example is company performance surveys, which are supposed to be anonymous. I can't answer questions like 'how do you think the company leadership is doing?' without effectively giving away who I am: my opinion is based on my position, so my identity is easily inferred.

    --
    "But remember, most lynch mobs aren't this nice." (H.Simpson)
    -- Joe
  7. Authors can use these tools too. by StripedCow · · Score: 4, Insightful

    Of course, authors can use these tools too, and then iteratively change their texts until they cannot be correctly identified or profiled.

    Just like spammers can check whether their e-mails end up in spam filters before sending them.

    It will be a never-ending cat and mouse game.

    --
    If Pandora's box is destined to be opened, *I* want to be the one to open it.
  8. Betteridge strikes again by complete loony · · Score: 2, Funny

    When Writing, How Anonymous Can You Be, Really?

    No.

    --
    09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    1. Re:Betteridge strikes again by complete loony · · Score: 3, Informative
      Betteridge's law of headlines:

      Any headline which ends in a question mark can be answered by the word "no".

      Whoosh.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
  9. Re:Yes, we know by Anonymous Coward · · Score: 3, Funny

    Warning: Infinite Loop. {Author} Identified: {Unidentified Author}.

  10. Re:That's pretty easy by SomePgmr · · Score: 4, Interesting

    Most people would just use something like Tor (or Tor and another VPN/proxy service).

    Erm... the transport doesn't matter if you're analyzing message composition.

    Wasn't this part of what that Barr guy was doing to try to figure out who members of Anonymous were? I think I read recently that he turned out to be right about the one that ran to Canada.

  11. Re:Guess who I am! by durrr · · Score: 2, Informative

    >Based on the above, who am I?
    Anonymous

  12. Re:astroturfers by sco08y · · Score: 3, Insightful

    This would have been a lot more fun about two months ago to detect paid political astroturfers.

    The ultimate AI-ish application would be an astroturfer plugin for Chrome probably called "AstroturfBlock". So the site is a "tech" site, the contents are pure politics, and the text analysis system indicates an unemployed liberal arts degree holder... Go ahead and block it.

    How is it going to detect whether people were paid to write something?

  13. Re:Guess who I am! by somersault · · Score: 2

    >mfw

    Based on the above, who am I?

    I'm guessing a retard who doesn't understand that this abbreviation means "my face when".

    --
    which is totally what she said
  14. Re:That's pretty easy by Spottywot · · Score: 4, Insightful

    Actually, Tor comes prepackaged with a browser with privacy settings enabled by default. The server shouldn't be able to differentiate you from any other user of the stock Tor bundle.

    That holds for the Tor bundle if used as they recommend, but the article is about identifying authors by what they write, not by technical means. On Slashdot, not RTFA-ing could be used as an identifying metric, but then again it's a rather wide net.

    --
    In a cybernetic fit of rage she pissed off to another age...
  15. We all do it, so why not an algorithm? by Spottywot · · Score: 4, Interesting

    We can all (I hope) recognise quotes from authors with whom we have some familiarity, even if we haven't read the passage in question before. Terry Pratchett quotes, for instance, stand out a mile; Frank Herbert can be identified by the fact that he'll use the word 'subtle' at least twice a paragraph. Even here on /., certain posters' styles identify them without having to read their UID. Girlintraining is an example (for me at least); hell, I can spot her posts purely based on the responses to them, for god's sake.

    With the privacy arms race going on right now on the internet, identifying people based on what they write *and* their style is not only the magic bullet for Big Brother, but quite achievable given a big enough sample.

    --
    In a cybernetic fit of rage she pissed off to another age...
  16. Re:That's pretty easy by Genda · · Score: 4, Interesting

    I have Dupuytren's contracture. It foreshortens the tendons in the ring fingers of both my hands. The result is that when I type fast I make repeatable typing mistakes, as well as common typographical errors due to muscle memory. The use of certain vocabulary fixes who you are for those who may be watching, illuminating social exposure, education or intelligence. There are simply so many ways to measure the content a person generates. In a world that increasingly abhors common anonymity, but reserves that right only for those with the wealth and power to build high walls, we need to ask whether or not we are willing to limit our self-expression to remain quietly safe.

    I for one would rather be known as a troublemaker than not known at all for what it is that I feel moved to say.

    Give me liberty or give me death is still the moral high ground.

  17. Re:Guess who I am! by Genda · · Score: 2

    No, it's his third cousin "Inane".

  18. Re:Guess who I am! by snspdaarf · · Score: 2

    Are you suggesting this poor soul has a butt at both ends?

    Yeah, like that's a rare condition in the world today.

    --
    Why, without your clothes, you're naked, Miss Dudley!
  19. Re:That's pretty easy by Anonymous Coward · · Score: 3, Informative

    Most people would just use something like Tor (or Tor and another VPN/proxy service).

    Erm... the transport doesn't matter if you're analyzing message composition.

    Right, it's not about the identity; it's about matching different pieces of text as written by the same author.

    Once the texts are matched, your identity is compromised as long as ONE of the texts is coming from a known identified source (email, etc.)
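
    Matching texts to one another is the core of authorship attribution. A widely used baseline compares character n-gram frequency profiles with cosine similarity. The sketch below assumes that approach, plain-text inputs, and hypothetical author labels and function names; it is not the method of any particular competition entry:

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Count overlapping character n-grams after case-folding and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = math.sqrt(sum(c * c for c in p.values())) * math.sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text, known_texts, n=3):
    """Rank known authors by similarity to an anonymous text, most similar first."""
    target = char_ngram_profile(anonymous_text, n)
    scores = {author: cosine(target, char_ngram_profile(text, n))
              for author, text in known_texts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical signed samples (e.g. e-mails) and one anonymous post.
known_texts = {
    "alice": "Frankly, I reckon it's daft; utterly daft.",
    "bob": "Lol ok whatever u say m8",
}
anonymous_post = "I reckon the whole thing is daft, frankly; utterly daft."
for author, score in rank_candidates(anonymous_post, known_texts):
    print(f"{author}: {score:.3f}")
```

    Character n-grams pick up habits such as punctuation, spelling and favorite word endings without needing a parser, but with only a sentence or two per author, as here, the scores are noisy; practical systems rely on much larger samples.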

  20. Re:astroturfers by Mitreya · · Score: 2

    The ultimate AI-ish application would be an astroturfer plugin for Chrome probably called "AstroturfBlock".

    How is it going to detect whether people were paid to write something?

    You also need a blacklist database of known astroturfers (well, their writing samples; you don't need their identity) for this system to work.

  21. Re:So the author of Hamlet can finally be identifi by M. Baranczak · · Score: 3, Funny

    Kevin Bacon.

  22. Re:That's pretty easy by Spottywot · · Score: 2

    I for one would rather be known as a troublemaker than not known at all for what it is that I feel moved to say.

    Have to agree with you there, however I imagine there are people out there for whom this style of tool would be a terrifying prospect, depends where you stand I guess.

    --
    In a cybernetic fit of rage she pissed off to another age...
  23. Re:That's pretty easy by Runaway1956 · · Score: 4, Informative

    Yep - that was part of Barr's stock in trade. He compared posts made by anon members in various venues, then traced some of those members to identify them. An IRC server was critical to Barr's process, as I recall. Or, more accurately, the IRC server was critical in this particular instance, as it maintained logs that some of the other servers did not.

    --
    "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
  24. Re:That's pretty easy by Omestes · · Score: 3, Interesting

    In a world that increasingly abhors common anonymity...

    I'm not even sure of this anymore. I'm beginning to think the death of anonymity is inevitable due to nothing but technology; ubiquitous networking, computing power, and near infinite storage. Even without the government, and unregulated corporate behaviors (how else do you stop data farming?), the ability would still be there, and someone would harness it.

    I'm not supporting killing the ability to be anonymous, or supporting the actions of people who would exploit it. I just think that it is going to get increasingly hard to maintain. Soon we'll see anonymity the way we see encryption: not a concrete, perfect thing, but a matter of degree. There will be no true anonymity, only a question of how much time and resources it would take to unmask someone. This is probably already true. A determined person with expensive resources could probably find almost anyone.

    Hell, a couple of months ago I got curious about a childhood friend, someone I haven't seen or talked to in over 20 years. It took about 15 minutes of half-hearted idle searching to figure out where he lived, how much his house cost and when he bought it (including a recent Google map of it and a builder's layout), where he worked, his rough income, the car he drives, his wife's name, where her parents live, that his mother recently died and that his father is in a retirement home, etc. I gave up after 15 minutes because I got a bit creeped out. I'm not a PI; I didn't buy any tools for this, I only used Google. I can't even imagine what I would have found if I had spent more time, effort and money on it.

    --
    A patriot must always be ready to defend his country against his government. -edward abbey
  25. No -- he got the guy's name from WHOIS by TheSeatOfMyPants · · Score: 5, Informative

    The only thing that Barr did correctly was look up the WHOIS info for the People's Liberation Front's website after an Anonymous guy claimed to be "Supreme Commander" of the PLF... When Barr confronted him, the guy claimed it was a joke, so Barr pointed to an innocent man instead. (See the Ars Technica article on the 'correct' Commander X.) Otherwise, Barr's tactics, including analyzing what the people wrote, gave him completely wrong answers.

    --
    Now mostly at Usenet:comp.misc & SoylentNews.org (it's made of people!)
  26. assimilation rape by epine · · Score: 5, Interesting

    Wanna revisit your recent rants?

    I can't stand how every Slashdot story submission has to end with a pink flamingo smoke grenade. I'm guessing that sober "just the facts, ma'am" submissions still exist, but rarely make it through the selection hoop of our post-counting overlords.

    I have several online pseudonyms which I make an effort to keep separate. I rarely post the same idea under more than one identity. If I post it here, it doesn't go there. I prefer to keep things separate so far as I can. I also have some background in computational linguistics. I've known for fifteen years that there is absolutely no way to win this battle long term. Only the most insipid comments will escape long-term annealing. If the word "gay" is the all-season tire on your social media K-car, then your identity is safely concealed within the deep-wank weeds.

    If every post you write contains colourful language or idiom such as "all-season tire of deep-wank camouflage" you're toast and you know it, clap your hands. Merely getting my possessives and plurals and possessive plurals right more often than not narrows the net substantially. I might pedantically write Harry S Truman without putting a dot after the S (Snopes: "Although the 'S' was not technically an abbreviation and therefore did not need to be followed by a period, Truman's full name was generally rendered as 'Harry S. Truman' during his lifetime ..."). I make use of colons, semicolons (these come and go), mdash appositives, and parenthetical side-notes--at least one of these in almost every paragraph I write. I post way more links than the average person. My thoughts meander. There is playful use of language with double readings. I subvert cliche to achieve double readings that enable me to circle away from my target, then loop back from an unexpected angle. My unit of thought is the paragraph more so than the sentence.

    Even with all those signatures, originality in word selection is my neon tattoo. The corpus analysis algorithms likely don't do much (yet) with originality. Hard to characterize. For a while my anonymity might pass through the gun-metal algorithms unmelded by virtue of my writing being too bright and distinctive and easy to trace. But not for long. Even the fractal filigrees of originality will be coded eventually. (Pay no attention to the alliteration: an accident, not a stylistic signature.)

    Frankly, my dear, I don't give a damn.

    This is about respect. We all live a double life, pretty much all the time. We speak differently in front of our mothers (most of us) than with the lady-killing roughnecks at the peanut bar or power-tie horn-dogs at the chichi sushi bar.

    I value anonymity because I don't wish to own everything I say on a literal level, stripped of context, devoid of my original conceit or persona.

    I happen to regard linearity as a social construct. Humans are not inherently linear in cognition or constitution. We learn how to cultivate linear facades in our areas of competence (but not necessarily around the edges: this is why a competent accountant consults his astrologer Madam Threenipple). If you like the primary facade you have, and it suits all purposes, then I suppose you'll see the charm in proclaiming it from the RealName rafters.

    If you're a Baptist homosexual (I've known a few), you might wish to string your public identity by separate ropes.

    Or maybe you've just got things to work out. You're figuring things out on the fly and trying them on for size and you don't wish to fall prey to the Joseph McCarthy clean-nose auto-da-fe "have you ever". Implication: Anything you've ever said will be permanently recorded and will classify you irretrievably. This despite 0/1 statistics never passing T-scores. If the same person also has an NRA membership and has been a career employee of the Hoover Institute for two decades? Still a communist. Ten times more dangerous.

    The kind of person most willing t

  27. Google translate by acid_andy · · Score: 2

    I just have to turn my writing English Finnish, Russian, and, finally, through the back to English again. Analysis software!

    --
    Your ad here.