Can Author Obfuscation Trump Forensic Linguistics? (webis.de)

Posted by timothy on Thursday January 21, 2016 @03:57AM from the abe-lincoln-predicted-this dept.

An anonymous reader writes: Everyone possesses their own writing style, which may be used to identify authors even if they wish to remain anonymous: linguists employ stylometry to settle disputes over the authorship of historic texts as well as more recent cases, and are called to verify the authors of suicide notes or threatening letters. Computer linguists carry out research on software for forensic text analyses, and a recent study shows many of these approaches to be reproducible. Now, a competition has been announced to develop obfuscation software to hide an author's style with the task: "Given a document, paraphrase it so that its writing style does not match that of its original author, anymore." We'll see what comes out of that. Meanwhile, the question remains: Who will win in the long run? Forensic linguists, or obfuscation technology?

84 comments

Ummm by wbr1 · 2016-01-21 04:00 · Score: 5, Funny

Want to obfuscate text? Just run it through a language or 5, then back to the original language using something like google translate. No paraphrasing needed.

--
Silence is a state of mime.
1. Re:Ummm by Anonymous Coward · 2016-01-21 04:08 · Score: 1
  
  I do something similar to encrypt my e-mails: I run them through ROT-13 6 times. It's foolproof.
2. Re:Ummm by Anonymous Coward · 2016-01-21 04:15 · Score: 1
  
  Well, with the current quality of machine translation you'll lose a lot of content too.
3. Re:Ummm by s.petry · 2016-01-21 05:15 · Score: 2
  
  I can one up you, because I use ROT6 13 times. Way more better!
  
  --
  -The wise argue that there are few absolutes, the fool argues that there are no probabilities.
4. Re:Ummm by Golddess · 2016-01-21 06:17 · Score: 1
  
  Which is why you proofread it after the final pass and make adjustments then. Yes, your adjustments might still be enough to identify you, but that seems much less likely to me.
  
  --
  "I'm not sure I like the fugnutish tone you used in your post!" -RogL (608926)-
5. Re:Ummm by ohieaux · 2016-01-21 07:02 · Score: 2, Interesting
  
  English
  Want to obfuscate text? Just run it through a language or 5, then back to the original language using something like google translate. No paraphrasing needed.
  Afrikaans
  Wil teks verduisteren ? Net hardloop dit deur 'n taal of 5 , dan terug na die oorspronklike taal gebruik van iets soos Google vertaal. Geen parafrasering nodig .
  Albanian
  Dëshironi tekstin errët ? Vetëm të drejtuar atë nëpërmjet një gjuhe ose 5 , pastaj kthehet për të përdorur gjuhën origjinale e diçka si Google Translate . Nuk ka parafrazuar nevojshme .
  Arabic
  5 . .
  Armenian
  , . Just 5, Google . .
  English
  You want the text in the dark. Just run it through the language or 5 , then return to the original source using something like Google language translation . A quote is necessary .
  Note: international characters may not show in comment.
  
  --
  Where all think alike, no one thinks very much.
6. Re:Ummm by drew_kime · 2016-01-21 08:59 · Score: 1
  
  That's pretty close, actually. Hmm ... are there languages with syntax sufficiently different from Romance languages to overcome this?
  
  --
  Nope, no sig
7. Re:Ummm by Anonymous Coward · 2016-01-21 09:03 · Score: 0
  
  Yes, the trick will be to obfuscate but have the result read as though it were written by a person.
8. Re: Ummm by IBME · 2016-01-21 23:39 · Score: 0
  
  Math would work. Remove enough content to still make it readable but without proper syntax.
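The ROT-13 and ROT6 replies in this thread are the same joke told twice: six shifts of 13 and thirteen shifts of 6 both add up to 78, a multiple of 26, so the "encrypted" text is the original. A minimal Python sketch to check the arithmetic (the helper names `rot` and `repeat` are made up for this illustration):

```python
import string

def rot(text: str, n: int) -> str:
    """Shift ASCII letters n places around the alphabet; leave other characters alone."""
    n %= 26
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lower + upper,
                          lower[n:] + lower[:n] + upper[n:] + upper[:n])
    return text.translate(table)

def repeat(text: str, n: int, times: int) -> str:
    """Apply rot(text, n) the given number of times."""
    for _ in range(times):
        text = rot(text, n)
    return text

message = "It's foolproof."
six_rot13 = repeat(message, 13, 6)       # total shift 78 = 3 * 26
thirteen_rot6 = repeat(message, 6, 13)   # total shift 78 = 3 * 26
print(six_rot13 == message, thirteen_rot6 == message)
```

Both comparisons come back true, so either recipe sends the e-mail in plain text.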
I saw "Trump" in the title by Anonymous Coward · 2016-01-21 04:01 · Score: 0

And thought for a moment it would be a different kind of article.
1. Re:I saw "Trump" in the title by bluefoxlucid · 2016-01-21 04:05 · Score: 2
  
  I read Trump as a noun and thought the title was nonsense.
  
  --
  Support my political activism on Patreon.
2. Re:I saw "Trump" in the title by Anonymous Coward · 2016-01-21 04:13 · Score: 0
  
  Obvious Hillary supporter is obvious.
3. Re:I saw "Trump" in the title by bondsbw · 2016-01-21 04:49 · Score: 2, Funny
  
  Of course it is, that's what obfuscation does.
  
  --
  All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
4. Re:I saw "Trump" in the title by Anne Thwacks · 2016-01-21 04:56 · Score: 1
  
  The sooner Trump is obfuscated, the better!
  
  --
  Sent from my ASR33 using ASCII
5. Re:I saw "Trump" in the title by mspohr · 2016-01-21 05:48 · Score: 1
  
  Maybe a computer could make sense of Palin's word salad.
  
  --
  I don't read your sig. Why are you reading mine?
6. Re:I saw "Trump" in the title by balbeir · 2016-01-21 06:02 · Score: 1
  
  Clearly the English language has deteriorated into a hybrid of hillbilly, valleygirl, inner-city slang and various grunts.
7. Re:I saw "Trump" in the title by bluefoxlucid · 2016-01-21 06:22 · Score: 1
  
  It is for this reason I've started a style guide to clear English. This guide includes communicative, informative, and persuasive styles, with a subsection on expletives for persuasive writing and speaking.
  Essentially, it's just Strunk and White, Dale Carnegie, and a few other pieces of broad research brought together. Informative style will provide the greatest difficulty, as I'll need to cobble it together from experience and abstract concepts, rather than other research. For example: SQ3R and its derivatives describe methods of study of informative texts (textbooks, essays, articles, etc.), and various books and papers on human memory have cited questioning and organization as ways to improve memorization; many writers incorporate these observations by asking and then answering questions--similar to the rhetorical question.
  My target audience encompasses copywriters of books, pamphlets, blogs, and news sites. The book *does* target general consumption, but I particularly want an improvement in mass media. We've reached an era where every person constantly faces the words of an educated man; yet the educated man now talks as the common man, instead of speaking in a way which the common man can easily understand. When the common man's speech deteriorates, the media deteriorates as well.
  It is perfectly well for the media to use the language of the common man, but the common man is served best by structuring that language to a higher standard, taking a form best suited to convey information clearly rather than to socialize. The common man is a man of intelligence, even if he is not a man of intellect: he can understand and learn, and he will imitate those behaviors which produce the greatest effect upon him and others. Expose him to clear, concise, vibrant writing and he will begin to speak in clear, concise, vibrant language, even if he is disinclined to study the use of language in such a way.
  
  --
  Support my political activism on Patreon.
8. Re:I saw "Trump" in the title by Anonymous Coward · 2016-01-21 07:13 · Score: 0
  
  Who, Trump?
9. Re:I saw "Trump" in the title by Anonymous Coward · 2016-01-21 07:51 · Score: 0
  
  Sarah Palin makes perfect sense to others of the Palin tribe. Theirs is a specialized argot composed of pidgin English learned when their kin interact with the "world outside." While within their own world, they devolve into high pitched squeals, clicks and grunts, only modulating their voices when in the presence of "outsiders" and mimicking the English phonetic alphabet to fit in.
10. Re:I saw "Trump" in the title by Anonymous Coward · 2016-01-21 10:33 · Score: 0
  
  The world clearly needs another style guide.
  Your intellect is truly staggering, I would like to subscribe to your newsletter.
11. Re:I saw "Trump" in the title by TheRealHocusLocus · 2016-01-21 15:55 · Score: 1
  
  The book *does* target general consumption
  I look forward to devouring it!
  I, for one, also saw "Trump" in the title.
  That makes two.
  
  --
  <blink>down the rabbit hole</blink>
12. Re: I saw "Trump" in the title by IBME · 2016-01-21 23:41 · Score: 0
  
  We know who you are and you are now on a grammar list.
Obfuscation always wins by deathcloset · 2016-01-21 04:02 · Score: 1

because of reasons which are not obvious and which I will not reveal although you already know them.
1. Re:Obfuscation always wins by ranton · 2016-01-21 04:49 · Score: 2
  
  Well, in this case the reason is fairly obvious. Since the question asked about the long run, it is safe to assume machines which can comprehend natural language will be used to obfuscate text in the long run. Once that happens, I would assume obfuscation will easily win. Not only could it win, it would almost certainly be able to produce false positives.
  
  --
  -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
I have the perfect solution. by Anonymous Coward · 2016-01-21 04:04 · Score: 0

A thousand monkeys on a thousand typewriters.
It was the best of times, it was the blurst of times...
Stupid monkey!
OK, a few bugs to work out.
1. Re:I have the perfect solution. by Anonymous Coward · 2016-01-21 10:31 · Score: 0
  
  A thousand monkeys on a thousand typewriters.
  That used to be the metric, a thousand monkeys on a thousand typewriters could produce the Bible (causing the whole evolution denial thing to begin).
  But now that we have the Internet (SlashDot is on the Internet), we know that is not true.
  Just a thousand tons of monkey shit.
Author Obfuscation by Anonymous Coward · 2016-01-21 04:05 · Score: 0

Also known as the perfect plagiarism defense.
Hemingway Editor by Anonymous Coward · 2016-01-21 04:06 · Score: 0

First thing to come to mind: the Hemingway Editor, which helps "improve" your wording.
Heh by Anonymous Coward · 2016-01-21 04:13 · Score: 0

This is basically what I did to Wikipedia articles in high school to get past plagiarism filters and google searches while writing papers.
1. Re:Heh by Anonymous Coward · 2016-01-21 10:33 · Score: 0
  
  Stealing from one is plagiarism. Stealing from many is research! -- From the little quotes at the bottom of SlashDot
Unlikely by DeathToBill · 2016-01-21 04:14 · Score: 1

I doubt this is possible to do very well. Consider [1], where they were able to identify authors from compiled code. Not with close to 100% accuracy, but it's still surprising that your source code style is identifiable with optimization enabled and symbols stripped out.
[1] ftp://ftp.cs.wisc.edu/paradyn/...

--
Slashdot - News for Nerds, Stuff that Matters, in ISO-8859-1 Has just realised that beta makes this signature redundant
1. Re:Unlikely by avandesande · 2016-01-21 04:21 · Score: 1
  
  Someone should write a English compiler.
  
  --
  love is just extroverted narcissism
2. Re:Unlikely by gstoddart · 2016-01-21 04:58 · Score: 1
  
  They'd fail utterly.
  Remember that Star Trek episode where the robots kept saying "Norman, coordinate" up until Kirk and Spock made his brain explode? Picture that.
  English is far too malleable and imprecise.
  
  --
  Lost at C:>. Found at C.
3. Re:Unlikely by thoromyr · 2016-01-21 05:20 · Score: 1
  
  They succeeded with nothing like 100% accuracy, and with a small sample set, which has the side effect of avoiding confusion.
  Put another way: face recognition seems promising, with a similar accuracy rate, when limited to a small set of faces. But once you open the floodgates the accuracy goes way down.
  Proponents fall back on "it works as a pre-filter," which, depending on the size of the population you are working with, might have a high enough true-positive rate and a low enough false-positive rate to be workable. But it is also a far cry from the claims of identification.
4. Re:Unlikely by worf_mo · 2016-01-21 10:15 · Score: 1
  
  Someone should write a English compiler.
  Your message wouldn't pass without a warning.
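The pre-filter point made in this thread (accuracy that looks fine on a small sample collapses once the candidate pool grows) is easy to see with a little arithmetic. A rough Python sketch, assuming exactly one real author in the pool; the 0.9 and 0.05 rates are invented for illustration and do not come from the linked paper:

```python
def expected_hits(population: int, true_positive_rate: float,
                  false_positive_rate: float):
    """Return (expected true hits, expected false hits, precision)
    when exactly one member of the population is the real author."""
    true_hits = true_positive_rate                          # the one real author
    false_hits = false_positive_rate * (population - 1)     # everyone else
    flagged = true_hits + false_hits
    precision = true_hits / flagged if flagged else 0.0
    return true_hits, false_hits, precision

# Hypothetical rates: the tool flags the real author 90% of the time
# and wrongly flags any given non-author 5% of the time.
small_pool = expected_hits(20, 0.9, 0.05)
large_pool = expected_hits(100_000, 0.9, 0.05)
print(small_pool)
print(large_pool)
```

With these made-up rates the tool wrongly flags about one person in a pool of 20 and about 5,000 in a pool of 100,000, which is the gap between "identification" and "pre-filter".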
All of your code by Anonymous Coward · 2016-01-21 04:17 · Score: 0

is belonging to us!
Stephen King is not dead. by I'm New Around Here · 2016-01-21 04:21 · Score: 4, Interesting

Back in the 1970's Stephen King wrote some novels under the pseudonym Richard Bachman. It worked for a while, but people were able to figure out that Bachman wrote in the same style as the famous Stephen King. Eventually the secret broke.
I wonder if those novels written under the pseudonym would make a good test of the system. Run them through the process, give the results to newer readers of King's known works, and see if they notice the similarities others did in the past.

--
If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
1. Re:Stephen King is not dead. by thoromyr · 2016-01-21 05:24 · Score: 1
  
  and then there are authors who have a diverse writing style. Try author identifying software | reader identification of anonymized works on a corpus including the work of Walter Jon Williams -- and I doubt that he is the only author to vary style.
2. Re:Stephen King is not dead. by david_thornley · 2016-01-21 05:52 · Score: 1
  
  Heck, separate Lord of the Rings into narrative and dialog and compare those. Tolkien used different styles there. The time I remember that he tried using the dialog-type language in narrative and description, at the first formal dinner Frodo attends in Rivendell, it sounded ridiculous.
  
  --
  "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
verb you must identify by Anonymous Coward · 2016-01-21 04:26 · Score: 0

Verbs, you must identify.
At end of sentence, they will be placed.
Yoda, the computer will suspect.
1. Re:verb you must identify by Anonymous Coward · 2016-01-21 05:18 · Score: 0
  
  This, I came here to say.
  Beaten by you I was.
2. Re: verb you must identify by Anonymous Coward · 2016-01-21 08:01 · Score: 0
  
  Worry you should not. Faggots you both are, yes?
3. Re:verb you must identify by techno-vampire · 2016-01-21 09:37 · Score: 1
  
  German also puts the verb(s) at the end of the sentence. Translate your work into proper German, have a computer make a literal translation back to English and you'll get much the same thing as Yoda-speak.
  
  --
  Good, inexpensive web hosting
4. Re:verb you must identify by HornWumpus · 2016-01-21 10:40 · Score: 1
  
  An average sentence, in a German newspaper, is a sublime and impressive curiosity; it occupies a quarter of a column; it contains all the ten parts of speech -- not in regular order, but mixed; it is built mainly of compound words constructed by the writer on the spot, and not to be found in any dictionary -- six or seven words compacted into one, without joint or seam -- that is, without hyphens; it treats of fourteen or fifteen different subjects, each inclosed in a parenthesis of its own, with here and there extra parentheses which reinclose three or four of the minor parentheses, making pens within pens: finally, all the parentheses and reparentheses are massed together between a couple of king-parentheses, one of which is placed in the first line of the majestic sentence and the other in the middle of the last line of it -- after which comes the VERB, and you find out for the first time what the man has been talking about; and after the verb -- merely by way of ornament, as far as I can make out -- the writer shovels in "haben sind gewesen gehabt haben geworden sein," or words to that effect, and the monument is finished. I suppose that this closing hurrah is in the nature of the flourish to a man's signature -- not necessary, but pretty. German books are easy enough to read when you hold them before the looking-glass or stand on your head -- so as to reverse the construction -- but I think that to learn to read and understand a German newspaper is a thing which must always remain an impossibility to a foreigner.
  Twain.
  
  --
  John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Only for noobs by penguinoid · 2016-01-21 04:27 · Score: 1

If someone is serious about obfuscating their writing, they will be able to. Especially once they get access to the software that would be used to examine it.
However, most people are not going to even bother attempting to obfuscate.

--
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
1. Re:Only for noobs by EdwardFurlong · 2016-01-21 07:00 · Score: 1
  
  This seems pretty true, if I was writing something that I would not want traced back to me I would not trust some program anyway.
  Maybe if I was super paranoid about the NSA or Google somehow linking my random internet comments all to me, then a program might have some use.
  It would be interesting to see if the program could go through /. AC postings and see if they can match them up to a user.
2. Re:Only for noobs by techno-vampire · 2016-01-21 09:48 · Score: 1
  
  I've actually done some obfuscation of my own communications. Years ago, I worked for a tech company where most of my co-workers were about half my age at best, and their word usage, grammar and syntax often made them look like high school dropouts, especially when compared to my writing. (No, I'm not bragging; it's just that unlike them, I cared about such things and tried harder than they did to get it right.)
  
  One of the ways we had for giving feedback was an internal website where we could "ask the suits." Officially, the questions were anonymous, but my writing style was distinctive enough to be a giveaway if the responder was familiar with me. To avoid that, I did my best to mimic the style, word-choice and syntax of the other techs, including one or two judicious spelling errors so that my questions looked about the same as anybody else's. I've no idea, of course, if this would fool a determined attempt to identify me, but I'm fairly sure that my identity wasn't obvious, and that's all that I needed.
  
  --
  Good, inexpensive web hosting
Lacking objective quality metric by GlobalEcho · 2016-01-21 04:30 · Score: 1

To quantify the degree of obfuscation, they have precise computational metrics based on their stylometric algorithms. But to judge the quality of the obfuscation, there are no objective metrics. Instead:

"To measure soundness and properness, obfuscations will be sampled and handed out to participants for peer-review."
That seems to me to make the contest rather less meaningful. Why not just peer review the quality of all obfuscations exceeding some minimum standard?
Great, another Trump story by Anonymous Coward · 2016-01-21 04:35 · Score: 0

There's no way this guy can obfuscate anything he says. Unless, did he endorse himself rather than Palin doing it? hmmmmm
Obfuscation will win...if it works by Anonymous Coward · 2016-01-21 04:45 · Score: 1

As a trained linguist, though not an expert on forensic linguistics, I believe that successful automated obfuscation will win and be essentially unbeatable, but probably also detectable. By rewriting a text automatically, valuable information is destroyed that a forensic linguist has to rely upon. (When humans try to obfuscate text, on the other hand, they tend to add such information, potentially even making the task easier for the forensic linguist. For example, blackmailers commonly imitate foreign accents in phone calls, which are easy to detect and allow even more conclusions about the person than without this attempt to deceive.)
I'm skeptical about the feasibility of the software, though. Rewriting a text automatically while keeping it readable and stylistically acceptable seems almost as hard as automated translation. Anyway, depending on how the software works, it will very likely be detectable by the same methods as are already used for authorship detection.
Basically you are looking for a translator by davidwr · 2016-01-21 04:48 · Score: 1

You are looking for a tool that extracts the meaning from a text then re-writes it in a standardized, canonical format, or at least "washes" it into one of a list of possible formats such that if you take a bunch of random input from a bunch of different authors, you can't tell from the output who wrote what.
I expect this will be successful within 10 years if we work hard on it.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
hard by bigdavex · 2016-01-21 04:50 · Score: 1

This strikes me as an extremely difficult task, assuming the tolerance for losing meaning is low. Maybe IBM Watson work applies.

--
-Dave
1. Re:hard by umghhh · 2016-01-21 08:04 · Score: 1
  
  That may be true, but most of what is written anywhere in this world is meaningless drivel, so the problem of losing meaning does not really exist. Neither does the need to obfuscate, I admit.
  This leaves us with people that actually have something to say. I reckon there would be a tiny minority among them, that would want to have such service but that also means its production would most likely be economically unfeasible.
  Then there are trolls and 50c soldiers which could use the service of course but I guess it is easier to automate them than to obfuscate their activity.
Depends by spaceman375 · 2016-01-21 04:56 · Score: 1

This will depend heavily on which language the original and end documents are in. Or: Success relies strongly on source and target vernacular.
English has numerous words for the same thing. Try to say a guy is cute, handsome, beautiful, or hot in Portuguese and it all translates to "Bonito".

--
On the one hand you take life too seriously, and on the other, you do not take playful existence seriously enough. Seth
TV needs this. by Anonymous Coward · 2016-01-21 04:57 · Score: 0

Have you ever seen a TV show where everyone talks the same?
The main characters are high school kids, and talk like high school kids, but so do their parents, the badguys, the police, the foreign exchange students.
Some TV show was on the last time I went to the gym. I didn't recognize the show, but _everyone_ talked like a fortune cookie.
It drives me crazy. I can't suspend my disbelief, and it just kills the show for me. It's like the actors are just puppets, and I can hear the writer pulling the strings.
I think that many (even successful) writers are phenomenally bad at using so much as two distinct voices.
If I ever write a screenplay or book, I'm going to get a different friend to paraphrase the main characters that should logically sound different.
Click bait title by Anonymous Coward · 2016-01-21 05:05 · Score: 4, Funny

This has nothing to do with Trump.
1. Re:Click bait title by Feral Nerd · 2016-01-21 05:10 · Score: 0
  
  This has nothing to do with Trump.
  Yes it does, the guy has a built-in obfuscation engine that sits between the part of his brain that handles rational thought and his mouth, but it only kicks in when he is giving political speeches.
Newspeak by techsoldaten · 2016-01-21 05:06 · Score: 1

We should all just move to newspeak to eliminate the detection / obfuscation arms race entirely.
apk by 110010001000 · 2016-01-21 05:08 · Score: 1

This is very true. I can identify every single post by apk, even if he posts Anonymously. I must be a genius.
1. Re:apk by Anonymous Coward · 2016-01-21 20:04 · Score: 0
  
  In case you haven't noticed, apk always posts anonymously on slashdot.
Polygraph 2.0 by jheath314 · 2016-01-21 05:12 · Score: 2

The TFA assumes that stylometry gives somewhat reliable results. It doesn't. Something as simple as an editor cleaning up a work can throw off the analysis.
Even in the optimal scenario (an unedited work by a single author who isn't trying to hide or imitate a different style), the best algorithms have abysmally high failure rates.
(KNN), 50 neighbors: 0.69 success, 0.28 fail
Decision Tree: 0.58 success, 0.42 fail
Mean Margins Tree: 0.65 success, 0.36 fail
Stylometry is reasonably effective at correctly identifying when two works by the same author have the same style. It is garbage when it comes to determining when two works have different authors. If I were to guess, I'd say the problem is that the variation in style between authors (compared to the variation within a single author's work) is not always wide enough to allow for reliable identification.
Stylometry is interesting, certainly, but the prospect of such an unreliable method being used for important is alarming.

--
Procrastination Man strikes again!
1. Re:Polygraph 2.0 by thoromyr · 2016-01-21 05:32 · Score: 1
  
  Indeed. I've been reading H. Beam Piper's "Fuzzy" stories to my kids and it is quite amusing to have the "veridicator" play such a prominent role as an infallible method of separating truth from lies (although the narration admits the possibility of unintentional deception wherein someone truly believes what they are saying it emphatically rejects the possibility of deliberate deception).
  To the topic at hand, it is certainly interesting and even useful when applied intelligently. For example, it is well established that single "books" (e.g., in the Bible) have multiple authors. Some of this is trivial (anyone reading the Noah story as related in the Bible should be able to immediately tell that there is a minimum of two separate traditions being merged), but serious textual analysis is good for making finer discrimination -- with the caveat that it does not provide absolute answers.
2. Re: Polygraph 2.0 by IBME · 2016-01-21 23:51 · Score: 0
  
  Important? Important what?? Dammit this is important.
More fun if they could do the reverse by Anonymous Coward · 2016-01-21 05:13 · Score: 0

It would be a lot more interesting if they could do the reverse: create software to give an individual's writings the style of an arbitrary target. Misdirection instead of obfuscation.
The most effective obfuscation tool is Powerpoint. Even the most stunning and original ideas can be easily reduced to an unremarkable set of slides.
Seems simple to me by Billy the Mountain · 2016-01-21 05:21 · Score: 1

All the obfuscation software has to do is change things so it casts enough doubt. I assume the stylometry analysis doesn't return a 1 or 0, it probably returns a probability. Once the probability is below a certain threshold, the job is done. An example of obfuscating: How about a simple machine translation to another language?

--
That was the turning point of my life--I went from negative zero to positive zero.
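The "stop once the probability drops below a threshold" idea in this comment can be sketched as a loop. Everything below is a toy under loud assumptions: `obfuscate_until`, `toy_score`, the pet-word list and the 0.15 threshold are all invented here, and a real stylometric scorer would look nothing like a word count.

```python
from typing import Callable, Iterable, Tuple

def obfuscate_until(text: str,
                    passes: Iterable[Callable[[str], str]],
                    score: Callable[[str], float],
                    threshold: float) -> Tuple[str, float]:
    """Apply rewriting passes in order, stopping as soon as score(text) < threshold."""
    for rewrite in passes:
        if score(text) < threshold:
            break
        text = rewrite(text)
    return text, score(text)

# Toy stand-in for an attribution score: the share of words that are
# one author's (hypothetical) pet words.
PET_WORDS = {"indeed", "moreover", "whilst"}

def toy_score(text: str) -> float:
    words = [w.strip(".,").lower() for w in text.split()]
    return sum(w in PET_WORDS for w in words) / len(words) if words else 0.0

# Toy rewriting passes, cheapest first.
passes = [
    lambda t: t.replace("Indeed, ", ""),
    lambda t: t.replace("whilst", "while"),
    lambda t: t.replace("Moreover, ", ""),
]

original = "Indeed, it rained whilst we waited. Moreover, the bus was late."
result, final_score = obfuscate_until(original, passes, toy_score, threshold=0.15)
print(result, final_score)
```

In this run the loop stops after the second pass, so the third rewrite is never applied: the job is done once the score is under the bar, not once every quirk is gone.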
1. Re:Seems simple to me by Hognoxious · 2016-01-21 06:44 · Score: 1
  
  An example of obfuscating: How about a simple machine translation to another language?
  That would certainly obfuscate it for people who didn't speak the other language.
  Were you suggesting a round trip? Things may have moved on, but I remember playing with this some years back and the results were changed way beyond style.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
text obfuscation : an all timer by jtayon · 2016-01-21 05:25 · Score: 0

Take Les Liaisons dangereuses by Choderlos de Laclos. It is an example of poly-semantics based on the changing register of the "writers", but with a single author: "fake polyphony" (the purpose of the contest is to not be detected doing so).
At the other end of the spectrum, "the 1001 tales" tries to unify, with a single tone, stories that clearly do not share the same origins (another failure).
I do guess that collective work with misdirections of fake semantic changes might do the trick with a language peculiarly engineered for that task: French.
If you cannot fuzz the tools, fuzz the input by using a carefully crafted language designed to make this detection hard.
http://beauty-of-imagination.b...
Wrong goal. by Anonymous Coward · 2016-01-21 06:15 · Score: 0

Given a document, paraphrase it so that its writing style does not match that of its original author, anymore.
The goal doesn't need to be obfuscate it to the point where it's completely distinct from the original author's style. The goal should be to obfuscate the style sufficiently that it electronically matches a sufficiently large pool of alternate authors at least as well as it matches the original author. Your goal shouldn't be described in terms of how far you get from the original author's style, but rather in terms of how many other people could have plausibly written the document.
For example, if I'm trying to leak a document I wrote at a state environmental protection agency to the press, what I really care about is sufficient obfuscation that it's hard to tell which employee of the agency wrote the document. I don't care if the obfuscation is "better than" that. In fact, it's probably WORSE to over-obfuscate, because any machine obfuscation has the potential to subtly change meaning. An obfuscation method that's so powerful that you can obfuscate the difference between me and someone in a remedial high school English class is not necessarily better at achieving my goal.
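The goal proposed here, counting how many other people could plausibly have written the document, can be written down directly. A minimal Python sketch, assuming some stylometry tool has already produced a similarity score per candidate; the function name `anonymity_set` and all the scores are hypothetical:

```python
from typing import Dict, List

def anonymity_set(scores: Dict[str, float], true_author: str) -> List[str]:
    """Other candidates who match the text at least as well as the real author."""
    bar = scores[true_author]
    return sorted(name for name, score in scores.items()
                  if name != true_author and score >= bar)

# Hypothetical similarity scores for one leaked document, before and after obfuscation.
before = {"me": 0.91, "colleague_a": 0.40, "colleague_b": 0.35, "colleague_c": 0.52}
after = {"me": 0.48, "colleague_a": 0.47, "colleague_b": 0.50, "colleague_c": 0.55}

print(anonymity_set(before, "me"))
print(anonymity_set(after, "me"))
```

Measured this way, the obfuscation is good enough as soon as the second list is comfortably long; pushing the author's own score lower buys nothing and risks changing the meaning.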
1. Re:Wrong goal. by dgatwood · 2016-01-21 07:40 · Score: 2
  
  You touched a key point there, without actually saying it, which is that the ability of forensic linguistics to recognize a person is inversely proportional to the number of people who could have written the content.
  For example, let's say that you're a native Russian speaker, and that your English grammar has certain linguistic quirks that are typical of Russian speakers writing English, e.g. missing all the definite and indefinite articles ("We read book, da?"). If exactly one Russian has access to some piece of information that is contained in the piece of writing, you're screwed. If there are a hundred Russians with access, those particular linguistic quirks no longer provide much help at identifying the author.
  One possible takeaway is that the best way to leak something is to anonymously post evidence somewhere without comment, then separately anonymously report that you noticed it, and bring it to someone's attention. This potentially vastly broadens the pool of people with access to the information, and thus makes your linguistic quirks less meaningful. However, this requires a significant time delay between the two posts. Otherwise, one would still strongly suspect that the original poster made the "discovery". But if you can stand to wait a year or two, you're golden.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
2. Re:Wrong goal. by dgatwood · 2016-01-21 07:42 · Score: 2
  
  Alternatively, delete all the definite and indefinite articles. Then they'll blame your one Russian coworker.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
3. Re:Wrong goal. by Anonymous Coward · 2016-01-21 18:40 · Score: 0
  
  In Soviet Russia, object direct YOU!
Yes, but... by Bearhouse · 2016-01-21 06:24 · Score: 1

Could it do anything for Trump's linguistics?
1. Re:Yes, but... by Anonymous Coward · 2016-01-21 07:13 · Score: 0
  
  +1 Lame, But Less Lame Than This So Called "Article" So Is Therefore Improvement Of Crap Article In This One Instance
Re:What does that headline say? by turning in circles · 2016-01-21 06:58 · Score: 1

Brevity is the soul of obfuscation. "Can a program designed to obfuscate author identity defeat a program designed to verify author identity?"

--
Might as well face it I'm addicted to data.
Supposing it works... by werepants · 2016-01-21 07:06 · Score: 1

Supposing it works (not saying it's likely), this would be a big problem for catching plagiarists. Copy somebody's text, run it through this, and then hand it in: boom, you're done. You could certainly have anti-plagiarism software that runs this in reverse (or you take your database of comparison docs and run them all through the obfuscator, something along those lines) but if they do it right and there's some degree of randomness, it introduces a massive dose of plausible deniability to any plagiarism case even with these efforts.
BTW, any typos, grammatical peculiarities, or other abnormalities with my post are due to my text obfuscation software. Don't blame me!
What spam house is funding this? by drew_kime · 2016-01-21 07:15 · Score: 1
If the intent is to obfuscate the style, just run it through a few languages and back as someone already suggested. But I'm guessing they want something that doesn't look like word salad.
We call an obfuscation software
- safe, if a forensic analysis does not reveal the original author of its obfuscated texts,
- sound, if its obfuscated texts are textually entailed with their originals, and
- proper, if its obfuscated texts are inconspicuous.
Yup, right there: proper. They're basically asking for someone to write the perfect Bayesian filter beater.
--
Nope, no sig
at first I read, oh never mind.. by Anonymous Coward · 2016-01-21 07:48 · Score: 0

"Trumps" ... was the first word that caught my eye. What the Donald?! has he gone and done now?
Then I laughed. As for obfuscation, I have this completely secure home-brew algorithm that I came up with in my 1st year at college and have used ever since...
Ob by Anonymous Coward · 2016-01-21 08:32 · Score: 0

Everyone possesses their own writing style
Joe_Dragon definatley does maybe that because no body else would wan't it.
Kraut Speak by Anonymous Coward · 2016-01-21 09:54 · Score: 0

Eferyone possesses zeir own v-r-ritingkt schtyle, vhich may be used to identify auzzors efen if zey visch to r-r-remain anonymous: lingktuists employ schtylometry to settle disputes ofer ze auzzorschip uff historic texts as vell as more r-r-recent kases, undt are kalled to ferify ze auzzors uff suicide nichtes or zreateningkt letters. Komputer lingktuists karry out r-r-research on software fur forensic text analyses, undt a r-r-recent schtudy schows many uff zese approaches to be r-r-reproducible. Now, a kompetition has been announced to defelop obfuscation software to hide an auzzor's schtyle mitt ze task: "Gifen a document, paraphrase it so zat its v-r-ritingkt schtyle does nicht match zat uff its original auzzor, anymore." Ve'll see vhat komes out uff zat. Meanvile, ze qfestion r-r-remains: Vho vill vin in ze long r-r-run? Forensic lingktuists, or obfuscation technology?
Slashdot is going down the toilet by Soccerguy1832 · 2016-01-21 10:03 · Score: 1

Enough with all the Trump articles jeez!
Yes, plenty of research out there already by SlideRuleGuy · 2016-01-21 10:12 · Score: 1

For one example, see
"Obfuscating Document Stylometry to Preserve Author Anonymity"
Gary Kacmarcik & Michael Gamon
This technique is not an automated one, but hey, all you need is more software.
Can Trump author forensic liguistics obfuscation? by Anonymous Coward · 2016-01-21 10:55 · Score: 0

Yes he can! The was read by a dyslexic first glance.
Isn't This Content Spinning? by Tsu Dho Nimh · 2016-01-21 13:03 · Score: 1

I hope part of the competition is to retain meaning and have correct grammar. Because if not, you might as well just do content spinning and declare it done.