Project Anonymizes Your Writing Style To Hide Your Identity
mikejuk writes "An open source project to combat 'stylometry,' the study of attributing authorship to documents based only on the linguistic style they exhibit, is proving that it is possible to change writing style to evade detection. Artificial Intelligence techniques are routinely used to detect plagiarism and recently were employed to reveal that Harry Potter author J. K. Rowling is indeed the author of The Cuckoo's Calling, which was published under the byline of Robert Galbraith. Now software is tackling the opposite problem — anonymizing writing style to protect the identity of the originator. The JStylo-Anonymouth (JSAN) framework is a work in progress at the Privacy, Security and Automation Lab (PSAL) at Drexel University. It analyzes a written text and detects features which could be used to identify the author. It then suggests changes that need to be made to avoid the author's stylistic fingerprint appearing in the work."
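For anyone wondering what "detects features which could be used to identify the author" might look like in practice, here is a minimal sketch. It is not JSAN's actual feature set or code; the function-word list, the feature names, and the `style_features` / `most_distinctive` helpers are all made up for illustration, assuming only that simple counts such as word length, sentence length, and function-word frequency can serve as style features.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list; an assumption for illustration only.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "but")

def style_features(text):
    """Return a dict of simple style features for a text."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    features = {
        "avg_word_length": sum(len(w) for w in words) / total,
        "avg_sentence_length": len(words) / (len(sentences) or 1),
    }
    for fw in FUNCTION_WORDS:
        features["freq_" + fw] = counts[fw] / total
    return features

def most_distinctive(features, baseline, top=3):
    """Rank feature names by relative deviation from a baseline profile."""
    def deviation(name):
        base = baseline[name]
        return abs(features[name] - base) / base if base else abs(features[name])
    return sorted(features, key=deviation, reverse=True)[:top]
```

The features that deviate most from a baseline built from other people's writing would be the ones to suggest changing, which is roughly the idea the summary describes.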
How will it disguise my terrible opinions that are obviously wrong?
Uhm, what? It was revealed by someone at Rowling's agency tweeting it to a Sunday Times reporter, after the reporter commented on how good it was for a debut novel - that has all been confirmed by the agency.
Unless the above line is badly phrased and is meant to say "recently were employed to confirm prior reports that..." - it didn't reveal anything of the sort, the link had already been revealed by plain old journalism.
So, can any mediocre author convert his story to the style of a known good author using this?
-- Senior Software Engineer, Attorney appearance services, locallawyerapp.com.
Sounds like some company is trying to toot their own horn here or something, but AI didn't out J.K. Rowling. Her lawyer's friend did. http://www.businessinsider.com/russells-apologizes-to-jk-rowling-2013-7
Stephen King seems to agree with you.
In his book "On Writing", he explains (among many other good points) that one hallmark of good writing is finding the right combination of words for imagery.
He uses examples like "I lit a cigarette, tasted like a plumber's handkerchief" from Raymond Chandler and "It was darker than a carload of assholes" by George V. Higgins.
The Odyssey (IIRC) has the phrase "it was a wine dark sea", so this has been around for a very long time.
For casual writing the project may be useful, but I wonder how much imagery will be lost in translation.
Many of the works of revolutionaries, radicals, and dissenters are memorable for their specific imagery. Simon Sinek analyzed "I have a dream", and noted the difference between "I have a dream" and "I have a plan". The two are very different, and have different effects on people. (Viz. TED talk "How Great Leaders Inspire Action")
I'm doubtful that AI has progressed to the point where the mood and emotional content will be preserved in such a translation.
To be effective, defiant writing will still require courage.
First of all, this: http://www.youtube.com/watch?v=LMkJuDVJdTw (YouTube)
Second of all:
"Of course you can, just stylometric identification and back home in order to prevent another language is automatically translated prose?" -- (Haitian Creole -> Azerbaijani -> Slovenian -> English ...)
"Not even the same language at home and another stylometric can automatically translated into prose?" -- ( ... Irish -> Hebrew -> Czech -> English ...)
"Not even in the same language and prose automatically translated differently stylometric?" -- ( ... Japanese -> Turkish -> Hmong -> English.)
"However, different stylometric automatically translated prose, and the same language is not it?" -- (... Urdu -> Filipino -> Latin -> English ...)
Depending on who you ask, you seem to have a different "answer" to your question.
"Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
Tools like this basically do: (step 1) build an abstract representation of the text - (step 2) rebuild it into a new text using random substitutions.
A plagiarism detection tool will just have to do step 1 and then compare the result with a database of saved essays in the same abstract form.
How would that help if the plagiarism detection tool only has the randomized outcome of step 2?
Simple plagiarism detection tools just use string matching. If a person used popular quotes and phrases in an essay, it is entirely possible for the software to give a high plagiarism percentage. That's why all the good software packages use highlighting with a link to what it thinks was plagiarized.
More advanced tools can detect things like a student using a thesaurus for one-to-one word replacement. I do not know how much they can do in this regard though. String matching still works as long as the matching algorithm is willing to allow one or more words to not match. The problem is, doing this causes the false positive rate to jump even higher.
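To make the "allow one or more words to not match" idea concrete, here is a rough sketch. It is not how any particular plagiarism product works; the `flagged_windows` function, the window size `n`, and the `max_mismatch` knob are all assumptions for illustration.

```python
import re

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def windows(words, n):
    """All runs of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def matches(a, b, max_mismatch):
    """True if two equal-length windows differ in at most max_mismatch positions."""
    return sum(x != y for x, y in zip(a, b)) <= max_mismatch

def flagged_windows(essay, source, n=5, max_mismatch=0):
    """Return the essay windows that match some window of the source."""
    src = windows(tokens(source), n)
    return [w for w in windows(tokens(essay), n)
            if any(matches(w, s, max_mismatch) for s in src)]

source = "the quick brown fox jumps over the lazy dog"
essay = "the fast brown fox jumps over the lazy dog"
exact = flagged_windows(essay, source, max_mismatch=0)
fuzzy = flagged_windows(essay, source, max_mismatch=1)
print(len(exact), len(fuzzy))
```

With `max_mismatch=0` only the windows that avoid the swapped word are flagged; with `max_mismatch=1` every window is. Push the tolerance high enough and unrelated text gets flagged too, which is the false positive problem.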
Going over every possible thesaurus-based permutation of every word is an O(n!) hard problem. If all text in the database were normalized, then we'd be back to a basic string compare. Normalized in this context means changing a word in all works to a common synonym. For instance, change every occurrence of the word "proper" to "correct" in the last paragraph.
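A minimal sketch of that normalization, assuming a hand-made synonym table (the `CANONICAL` dict and the `normalize` function are hypothetical; a real tool would need a full thesaurus and some way to handle words with several senses):

```python
import re

# Hypothetical synonym table: each word maps to one canonical form.
CANONICAL = {"proper": "correct", "right": "correct", "big": "large", "huge": "large"}

def normalize(text):
    """Lowercase the text and replace each word with its canonical synonym."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(CANONICAL.get(w, w) for w in words)

original = "knowing the proper way to do something"
reworded = "knowing the correct way to do something"
print(normalize(original) == normalize(reworded))
```

Once both the database and the submitted essay go through the same `normalize` step, a one-to-one thesaurus swap no longer changes the string, so plain comparison works again.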
It's possible to do more complicated things involving the actual meaning of a sentence, paragraph, or work. Unfortunately, I have no clue how to go about doing so. The rules of English grammar are hard. Worse still, both professional writers and amateurs violate them all the time.
Remember kids, there's a huge difference between knowing the proper way to do something and still doing it improperly versus not knowing the correct way to begin with.
So lets pretend that we've just completed writing this code, as opposed to having just completed sabotaging it -Altera
This assumes that they're as stupid as we all suspect, because the next thing the administration begins to do is check whether the student's written oeuvre is self-consistent without bunkering down under a blander identity than a Milli Vanilli cover of Valium Spice.
I'm so busted.