Slashdot Mirror


Project Anonymizes Your Writing Style To Hide Your Identity

mikejuk writes "An open source project to combat 'stylometry,' the study of attributing authorship to documents based only on the linguistic style they exhibit, is proving that it is possible to change writing style to evade detection. Artificial Intelligence techniques are routinely used to detect plagiarism and recently were employed to reveal that Harry Potter author J. K. Rowling is indeed the author of The Cuckoo's Calling, which was published under the byline of Robert Galbraith. Now software is tackling the opposite problem — anonymizing writing style to protect the identity of the originator. The JStylo-Anonymouth (JSAN) framework is a work in progress at the Privacy, Security and Automation Lab (PSAL) at Drexel University. It analyzes a written text and detects features which could be used to identify the author. It then suggests changes that need to be made to avoid the author's stylistic fingerprint appearing in the work."
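
As a rough illustration of what a "stylistic fingerprint" can mean, here is a minimal Python sketch of a stylometric feature vector. It is not JSAN's actual feature set or API: the `FUNCTION_WORDS` list and the `style_vector` and `cosine_similarity` helpers are hypothetical names chosen for this example, and real tools use far richer features.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny function-word list (an assumption for illustration only).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def style_vector(text):
    """Relative frequency of each function word, plus a scaled average word length."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0] * (len(FUNCTION_WORDS) + 1)
    counts = Counter(words)
    freqs = [counts[w] / len(words) for w in FUNCTION_WORDS]
    avg_len = sum(len(w) for w in words) / len(words)
    # Divide by 10 so word length sits on a scale closer to the frequencies.
    return freqs + [avg_len / 10.0]


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Two long samples by the same author would be expected to score closer to each other than to a sample by someone else, although with short texts the numbers are too noisy to mean much.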

54 of 103 comments

  1. I don't know by i+kan+reed · · Score: 5, Funny

    How will it disguise my terrible opinions that are obviously wrong?

    1. Re:I don't know by 192939495969798999 · · Score: 4, Funny

      Those blend right in with the rest of the internet.

      --
      stuff |
    2. Re:I don't know by icebike · · Score: 1

      How will it disguise my terrible opinions that are obviously wrong?

      It won't, it will just attribute them to Francis Bacon.

      --
      Sig Battery depleted. Reverting to safe mode.
    3. Re:I don't know by colinrichardday · · Score: 1

      It will post them on slashdot.

    4. Re:I don't know by i+kan+reed · · Score: 3, Informative

      Dude, let it go, this thread was started on a post about how everyone's opinions are wrong. Not a good context for debate.

    5. Re:I don't know by NatasRevol · · Score: 1

      Great. Benjamin Franklin is going to end up the only person to have a valid opinion.

      --
      There are two types of people in the world: Those who crave closure
    6. Re:I don't know by plover · · Score: 2

      Cardinal Richelieu (supposedly) wrote: "If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him." Will the JStylo-Anonymouth mean that he'd be able to hang everyone who used it?

      --
      John
    7. Re:I don't know by Archangel+Michael · · Score: 1

      Intelligent comment. Exactly what I expect from a (D) lemming.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  2. The Cuckoo's Calling by Richard_at_work · · Score: 4, Informative

    Artificial Intelligence techniques are routinely used to detect plagiarism and recently were employed to reveal that Harry Potter author J. K. Rowling is indeed the author of The Cuckoo's Calling, which was published under the byline of Robert Galbraith.

    Uhm, what? It was revealed by someone at Rowling's agency tweeting it to a Sunday Times reporter, after the reporter commented on how good it was for a debut novel - that has all been confirmed by the agency.

    Unless the above line is badly phrased and is meant to say "recently were employed to confirm prior reports that..." - it didn't reveal anything of the sort, the link had already been revealed by plain old journalism.

    1. Re:The Cuckoo's Calling by jabuzz · · Score: 3, Informative

      No, it was revealed by a partner at the law firm who should have known better, and should now face sanctions from the Law Society. Being struck off the register would be about right.

      On the other hand, they have already reached an out-of-court settlement for a substantial sum, which probably came out of the partner's own pocket. I would also imagine the firm has lost the JKR account.

    2. Re:The Cuckoo's Calling by sribe · · Score: 1

      Well I heard it was revealed by the wife of a partner. Slightly better but not by much.

      Was the wife legal counsel to J.K. Rowling? No? Well, then, it was revealed by the partner. That he revealed it to his wife first, or perhaps only, is completely irrelevant.

  3. Hurry it up by paiute · · Score: 1

    A million college students are waiting anxiously for this tool now that some professors have started checking their essays electronically for plagiarism.

    --
    If Slashdot were chemistry it would look like this:Cadaverine
    1. Re:Hurry it up by EmperorArthur · · Score: 2

      Tools like this basically do: (step 1) build abstract representation of text - (step 2) rebuild it into a new text using random substitutions.

      A plagiarism detection tool will just have to do step 1 and then compare it with a database of saved essays in the same abstract form.

      How would that help if the plagiarism detection tool only has the randomized outcome of step 2?

      Simple plagiarism detection tools just use string matching. If a person used popular quotes and phrases in an essay, it is entirely possible for the software to give a high plagiarism percentage. That's why all the good software packages use highlighting with a link to what it thinks was plagiarized.

      More advanced tools can detect things like a student using a thesaurus for one-to-one word replacement. I do not know how much they can do in this regard though. String matching still works as long as the matching algorithm is willing to allow one or more words to not match. The problem is, doing this causes the false positive rate to jump even higher.
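
      Tolerant matching of this kind can be sketched with Python's standard `difflib`, comparing word sequences instead of raw characters. This is only an illustration, not how any particular plagiarism product works; `word_similarity`, `is_suspicious`, and the threshold values are assumptions made up for the example.

```python
import re
from difflib import SequenceMatcher


def word_similarity(a, b):
    """Similarity in [0, 1] between two texts, compared word by word."""
    wa = re.findall(r"[a-z']+", a.lower())
    wb = re.findall(r"[a-z']+", b.lower())
    if not wa or not wb:
        return 0.0
    # ratio() is 2 * (matching words) / (total words in both texts),
    # so a handful of swapped words only lowers the score slightly.
    return SequenceMatcher(None, wa, wb, autojunk=False).ratio()


def is_suspicious(a, b, threshold=0.8):
    """Flag a pair of texts whose word-level similarity reaches the threshold."""
    return word_similarity(a, b) >= threshold
```

      Lowering `threshold` catches more thesaurus-style rewrites, but it also flags more innocent pairs, which is the false positive trade-off described above.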

      Going over every possible thesaurus-based substitution for every word is combinatorially hard: with n words and k synonyms each, there are roughly k^n variants. If all text in the database was normalized, then we're back to a basic string compare. Normalized in this context means changing a word in all works to a common synonym. For instance, change every occurrence of the word proper to correct in the last paragraph.
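
      The normalization idea can be sketched in a few lines of Python. The `CANONICAL` table below is a made-up stand-in for a real thesaurus, and `normalize` and `same_after_normalization` are hypothetical helper names; a real system would also have to cope with word senses and inflections.

```python
import re

# Hypothetical synonym table: every word maps to one canonical form.
CANONICAL = {
    "proper": "correct",
    "right": "correct",
    "big": "large",
    "huge": "large",
}


def normalize(text):
    """Lowercase the text, drop punctuation, and replace words by their canonical synonym."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(CANONICAL.get(w, w) for w in words)


def same_after_normalization(a, b):
    """Plain string comparison, done on the normalized forms."""
    return normalize(a) == normalize(b)
```

      Normalizing each document once costs time linear in its length, so the database side stays a plain string compare instead of a search over synonym combinations.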

      It's possible to do more complicated things involving the actual meaning of a sentence, paragraph, or work. Unfortunately, I have no clue how to go about doing so. The rules of English grammar are hard. Worse still, both professional writers and amateurs violate them all the time.

      Remember kids, there's a huge difference between knowing the proper way to do something and still doing it improperly versus not knowing the correct way to begin with.

      --
      So lets pretend that we've just completed writing this code, as opposed to having just completed sabotaging it -Altera
    2. Re:Hurry it up by epine · · Score: 2

      A million college students are waiting anxiously for this tool now that some professors have started checking their essays electronically for plagiarism.

      This assumes that they're as stupid as we all suspect, because the next thing the administration begins to do is check whether the student's written oeuvre is self-consistent without bunkering down under a blander identity than a Milli Vanilli cover of Valium Spice.

      I'm so busted.

    3. Re:Hurry it up by Tsu+Dho+Nimh · · Score: 1

      Tools like this basically do: (step 1) build abstract representation of text - (step 2) rebuild it into a new text using random substitutions.

      Those are easily spotted by their near-miss of English. It's called "content spinning" and it is easy to spot.

  4. AI doesn't do shit to detect plagiarism by Shadow+of+Eternity · · Score: 1

    Profit does. When your bottom line depends on keeping schools convinced that you're indispensable in the War On Plagiarism, you damn well find plagiarism everywhere you can, whether or not it's actually there. There are approximately 80 MILLION students in the US. With our education system being as repetitive and formulaic as it is, it becomes a virtual certainty that out of 80,000,000 students a significant number will say the same thing the same way.

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
    1. Re:AI doesn't do shit to detect plagiarism by zildgulf · · Score: 1

      It is even worse when detecting plagiarism in computer programs, since there is a small subset of algorithms that would be the most efficient or easiest to code. If you have 50 students program a sort, you are going to get several nearly identical programs.

    2. Re:AI doesn't do shit to detect plagiarism by erikscott · · Score: 1

      Long long ago, in a computer teaching lab 30 miles away, I had 20 assignments turned in to me for grading. Of them, I had seventeen identical, bizarre wrong answers. Seriously, people... if you're going to cheat, at least copy from someone who isn't high/psycho/retarded.

    3. Re:AI doesn't do shit to detect plagiarism by saeedtawil · · Score: 1

      It's pretty easy to tell who plagiarized in a programming course when multiple students are wrong in the same way though.

    4. Re:AI doesn't do shit to detect plagiarism by EmperorArthur · · Score: 1

      Finding plagiarism when it comes to coding is mainly a matter of style. Students should be encouraged to talk to each other about doing their homework. That doesn't mean that they should copy whole problems verbatim from one another though.

      Look at the whole rangecheck(...) debacle. The algorithm wasn't secret by any means. The whole issue came about because the same coder wrote both functions. He has his own programming style that becomes immediately apparent when comparing small snippets of code like the function in question.

      Even when there are style guidelines, each person will implement them slightly differently. It might not be as ingrained in newer students but eventually they will choose something like one of these examples for their functions.

      Examples:
      void foo(...){
      }
      void foo ( ... ) {
      }
      void foo (...)
      {
      }

      Let the flame war about which is better begin.

      --
      So lets pretend that we've just completed writing this code, as opposed to having just completed sabotaging it -Altera
    5. Re:AI doesn't do shit to detect plagiarism by unrtst · · Score: 1

      Off topic, but the braces format question will get better answers if it's phrased differently, such as:

      a)
      if (...) {
      } else {
      }

      b)
      if (...)
      {
      }
      else
      {
      }

      c)
      if (...) {
      }
      else {
      }

      Prior to "Perl Best Practices", I preferred to use an inconsistent style of:

      if (...)
      {
      } else {
      }

      The different handling of elsif and else's compared to if's always bothered me, but I found the lined up braces much more pleasing. I didn't like option "b" because the else's take up WAY too much vertical room. Option "c" is now my personal preference.
      YMMV, but including the else's in the question provides a more complete view.

  5. Literature IS style! by war4peace · · Score: 1

    I am sorry, but as far as literature goes, writing style anonymization (is that a word?) would harm the original intent of the author. A literary work is valuable (when so) due to the author's style, among other factors, much like in movies, where a certain actor's voiceover is best for a certain character. The same character would become retarded if the actor's voice changes. Imagine Donkey (from Shrek) played by Morgan Freeman or Darth Vader played by Danny de Vito. Good characters, good actors, no match in style and intent.

    Yeah, students would love this in their paper, but literature? Hell, no.

    --
    ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    1. Re:Literature IS style! by Nemyst · · Score: 1

      I doubt this would be used to protect pen names of literary authors, but it could have important applications for whistleblowers and people who want to denounce things without getting traced down. Basically, any situation where the style is of little to no importance compared to the content.

    2. Re:Literature IS style! by saeedtawil · · Score: 1

      Imagine Donkey (from Shrek) played by Morgan Freeman or Darth Vader played by Danny de Vito.

      I'd pay to watch either of those.

    3. Re:Literature IS style! by wideBlueSkies · · Score: 1

      >>Darth Vader played by Danny de Vito

      Which is one reason why Spaceballs was so darned funny. Rick Moranis as Dark Helmet... almost the exact opposite of James Earl Jones, voice- and style-wise.

      --
      Huh?
    4. Re:Literature IS style! by war4peace · · Score: 1

      As a matter of fact, you'd probably pay to watch an excerpt of 2 minutes of either of those.
      I once watched "Twins" (Arnold & DeVito) dubbed in Hungarian. It was hilarious... for a few minutes. Then it was annoying, then I couldn't handle it anymore.

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    5. Re:Literature IS style! by RabidReindeer · · Score: 1

      Imagine Donkey (from Shrek) played by Morgan Freeman or Darth Vader played by Danny de Vito.

      Imagine Eddie Murphy playing a Chinese Dragon in Mulan. Oh wait...

  6. Only if you remain anonymous... by jalvarez13 · · Score: 1

    ... in the rest of your digital life.

    In light of recent events -and I'm not only referring to the NSA-gate, but also to all the known ways to get your private information- it is hard for me to figure out a digital way of keeping your identity secret in a high profile incident.

  7. Confirm by Impy+the+Impiuos+Imp · · Score: 1

    This is the next step in surveillance, if the government isn't doing it already. Binding together various accounts of yours based on statistics of phrases.

    And it's redundant since they have a database of all IP connections, web pages, and stuff you type in anyway. Sigh. I suppose it will make confirmation of these AI techniques trivial. Yay.

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  8. Google translate? by gregor-e · · Score: 1

    Surely one could simply auto-translate their prose into another language and back to avoid stylometric identification?

    1. Re:Google translate? by Anonymous Coward · · Score: 1

      Certainly one can simply translating their prose mechanism to another language and back to avoid identifying stylometric?

      Surely, one can only auto-interpretation of their prose to another language and back to avoid stylometric identification?

      Of course, you could just automatically translate your prose into another language and back again, in order to avoid the stylometric identification?

      Surely one will simply start their prose-translation to other languages and back to avoid stylometric about yourself?

    2. Re:Google translate? by eyenot · · Score: 2

      First of all, this: http://www.youtube.com/watch?v=LMkJuDVJdTw (YouTube)

      Second of all:

      "Of course you can, just stylometric identification and back home in order to prevent another language is automatically translated prose?" -- (Haitian Creole -> Azerbaijani -> Slovenian -> English ...)

      "Not even the same language at home and another stylometric can automatically translated into prose?" -- ( ... Irish -> Hebrew -> Czech -> English ...)

      "Not even in the same language and prose automatically translated differently stylometric?" -- ( ... Japanese -> Turkish -> Hmong -> English.)

      "However, different stylometric automatically translated prose, and the same language is not it?" -- (... Urdu -> Filipino -> Latin -> English ...)

      Depending on who you ask, you seem to have a different "answer" to your question.

      --
      "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
    3. Re:Google translate? by eyenot · · Score: 1

      I got the order of translation mixed up, but same story. The Urdu-led translation trip was second, then the one led by the Irish, then the Japanese.

      --
      "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
  9. conversion to another's style by greywire · · Score: 2

    So, can any mediocre author convert his story to the style of a known good author using this?

    --
    -- Senior Software Engineer, Attorney appearance services, locallawyerapp.com.
    1. Re:conversion to another's style by tgd · · Score: 2

      So, can any mediocre author convert his story to the style of a known good author using this?

      There's hope for Slashdot's editors! Huzzah!

    2. Re:conversion to another's style by dlenmn · · Score: 1

      Speaking as someone who's done a little work in stylometry, I'm sure that it's a lot easier to make your work look like it's not yours than it is to make your work look like a specific different person's. I haven't looked at this project, but I'm guessing that it'll do the former. If I made software that could do the latter, then I'd be loudly advertising that fact, or I'd keep silent and make use of it...

    3. Re:conversion to another's style by internerdj · · Score: 1

      I'm more curious what happens to the marketability of one's writing when they are no longer using their own writing style.

  10. yeahbutt by djupedal · · Score: 1

    Just don't lick the envelope.

  11. Wasn't used to out J. K. Rowling by Aurien · · Score: 2

    Sounds like some company is trying to toot their own horn here or something, but AI didn't out J.K. Rowling. Her lawyer's friend did. http://www.businessinsider.com/russells-apologizes-to-jk-rowling-2013-7

    1. Re:Wasn't used to out J. K. Rowling by tgd · · Score: 1

      Sounds like some company is trying to toot their own horn here or something, but AI didn't out J.K. Rowling. Her lawyer's friend did. http://www.businessinsider.com/russells-apologizes-to-jk-rowling-2013-7

      This is a privacy related story on Slashdot. Facts have as much of a place here as in a Microsoft story.

      Although Slashdot does hate lawyers, so maybe you can get some traction with this ...

  12. Stephen King by Okian+Warrior · · Score: 4, Insightful

    Stephen King seems to agree with you.

    In his book "On Writing", he explains (among many other good points) that one hallmark of good writing is finding the right combination of words for imagery.

    He uses examples like "I lit a cigarette, tasted like a plumber's handkerchief" from Raymond Chandler and "It was darker than a carload of assholes" by George V. Higgins.

    The Odyssey (IIRC) has the phrase "it was a wine dark sea", so this has been around for a very long time.

    For casual writing the project may be useful, but I wonder how much imagery will be lost in translation.

    Many of the works of revolutionaries, radicals, and dissenters are memorable for their specific imagery. Simon Sinek analyzed "I have a dream", and noted the difference between "I have a dream" and "I have a plan". The two are very different, and have different effects on people. (Viz. TED talk "How Great Leaders Inspire Action")

    I'm doubtful that AI has progressed to the point where the mood and emotional content will be preserved in such a translation.

    To be effective, defiant writing will still require courage.

    1. Re:Stephen King by NatasRevol · · Score: 1

      This isn't for people who want to be known by their writing.

      --
      There are two types of people in the world: Those who crave closure
    2. Re:Stephen King by war4peace · · Score: 1

      Just one mention: I think I agree with Stephen King, not the other way around. After all, I heard of him (as a matter of fact, I just finished reading The Long Walk and started Misery) but I highly doubt he ever heard of me :)

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    3. Re:Stephen King by TheCarp · · Score: 1

      > For casual writing the project may be useful, but I wonder how much imagery will be lost in
      > translation.

      Except, did they not say it "suggests changes"? Doesn't that still leave the author free to either take the suggestion, or select a different phrasing or imagery choice?

      I mean if it comes to "Wine dark sea" and suggests instead "deep red sea", or "sea of dark wine" I would assume the author would understand his original meaning and be able to work from there, and then iterate through it again to see if a different turn of phrase works better.

      --
      "I opened my eyes, and everything went dark again"
    4. Re:Stephen King by couchslug · · Score: 1

      "To be effective, defiant writing will still require courage."

      Surviving to be defiant may require anonymity.

      --
      "This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
  13. Thanks. by Okian+Warrior · · Score: 1

    An excellent point, I will try to remember this in future writing. It's the sort of thing you don't get in a writing course, for which I am grateful.

    Thanks.

  14. Been done before by UnknowingFool · · Score: 1

    MS did this years ago built into their speech recognition but failed to market it as a useful feature.

    Dear aunt, let's set so double the killer delete select all

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  15. Re:Style tester by geminidomino · · Score: 1

    Right here. :)

    Looks like a typical web toy, so I wouldn't quit your job and start working on your Great American Novel based on the results.

  16. Re:She admitted it :P by plover · · Score: 1

    It was confusingly worded in TFA. What I eventually figured from it is that it was not used as a discovery mechanism. It looks like it was a test they performed after it was revealed, and the test only confirmed that she was the author.

    It was not done to uncover any hidden truths, it was done to demonstrate the correctness of the tool.

    --
    John
  17. BUSTED! And on AOL! by Tsu+Dho+Nimh · · Score: 1

    Way back, in the dim, distant past of the bucolic walled gardens that preceded the Internet as we know it ... there was AOL. AOL had walled predator-free gardens within gardens, where only teens younger than 18 were supposed to be communicating.

    There were rumors that evil pedophiles were lurking in these gardens, so I made a sub-account for a totally bogus 16-year old boy named Alex. And Alex went forth to play.

    All was going well, Alex was quite a popular young man amongst his peers and had lured ZERO pedophiles when he got this e-mail from a fellow writer: "Alex, are you Tsu?"

    BUSTED ... not because of subject matter or vocabulary, but because of a @#$&%^ liking for compound, complex sentences and other arcane constructions ... and using them accurately.

  18. Re:Frist Post! by jones_supa · · Score: 1

    Which person posted this?

    There is simply not enough data in your post to find that out. You would probably have to write a few paragraphs of text in your natural style to give the algorithm any real chance.

  19. Re:Identimafy me then. by jones_supa · · Score: 1

    Also, you think this is going to identify people that type very little? Or have multiple personalities, bipolar disorders or similar?

    No, it probably can't. And there's likely to be many, many other scenarios in which it cannot detect the writer reliably. So what? It doesn't have to be completely perfect to be useful.

  20. Facepalm... by jones_supa · · Score: 1

    JStylo-Anonymouth (JSAN)?! Could you possibly have come up with a clunkier name than that? ;) Damn, I should set up some agency just to create punchy names for all these projects.