Slashdot Mirror


Writing Style Fingerprint Tool Easily Fooled

Urchin writes "Some of the techniques used by literary detectives and courts of law to identify the authorship of text are easily fooled, say US researchers. They found that non-professional writers could hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy. Stylometric methods have been used in a number of high-profile legal cases in recent decades, including the 'Unabomber' trial. 'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."

4 of 96 comments

  1. Re:Could have told you writing analysis was bogus. by KibibyteBrain · · Score: 4, Insightful

    I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful. If one was an unpublished author in any significant form, and then "went unabomber" and started to write letters as a calling card, very similar writing styles and structures between the incriminating work and the unpublished/unpopularized previous work would be evidence to at least raise suspicion that the writer of the previous work was somehow uniquely tied to the crimes, even if not directly. Of course, all bets are off if it is plausible that someone could have pre-analyzed the author to imitate. It's also of note that this is only a positive test (i.e. a failed match in analysis makes no claim at all as to whether or not someone wrote it). A good example would be a set of writing that demonstrates an idiom used only in a certain locale, a business term used only in a certain company, and an ideological term used only in a certain fringe political movement. This is reasonable *evidence* of authorship, where of course evidence != proof. The polygraph, on the other hand, is complete BS because the only real thing a polygraph achieves is to psychologically motivate the taker to tell the truth due to "faith" in the fact he will be outed for lying by the device. It doesn't actually measure anything related to the statements, only the physiological condition, which can depend on millions of independent factors.

  2. Misrepresents forensic linguistics by digitig · · Score: 4, Insightful

    As the article says, "the study only attacked some of the less complex stylometry techniques". In fact, I'm surprised that they even considered lexical density because that varies greatly within a single author's writing. It's usually high at the beginning of a text, usually (not always) gradually falls off, jumps when they change subject, and so on. I'm not aware of its being used in forensic linguistics (although it is used in analysing texts to identify, for example, objective divisions within a text).
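
    To make the point about lexical density concrete, here is a minimal Python sketch that measures it over sliding windows of a text, so the drift within a single author's writing becomes visible. The function-word list, the window sizes and the function names are all assumptions for illustration; a real analysis would more likely use a part-of-speech tagger, and this is not the method used in the study.

```python
import re

# Assumption: a tiny hand-picked function-word list. Anything not in it is
# counted as a content word, which is only a rough approximation.
FUNCTION_WORDS = frozenset("""
a an the and or but if of in on at to for from by with as
is are was were be been being am do does did have has had
i you he she it we they me him her us them my your his its our their
this that these those not no so than then there here which who what when where
will would can could shall should may might must
""".split())


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(tokens):
    """Share of tokens that are content words (0.0 for an empty list)."""
    if not tokens:
        return 0.0
    content = sum(1 for t in tokens if t not in FUNCTION_WORDS)
    return content / len(tokens)


def windowed_density(text, window=50, step=25):
    """Lexical density of each sliding window of `window` tokens."""
    tokens = tokenize(text)
    if len(tokens) <= window:
        return [lexical_density(tokens)]
    return [
        lexical_density(tokens[i:i + window])
        for i in range(0, len(tokens) - window + 1, step)
    ]
```

    If the values from windowed_density swing widely across one text by one author, that illustrates why a single density figure makes a poor fingerprint.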

    The sort of thing that they used in the Derek Bentley case (which contributed to the partial posthumous pardon) was analysis of his statement, which had:

    • unusually high proportion of passive constructions
    • the use of police jargon
    • use of language that was not consistent with an educationally sub-normal 17-year-old
    • word frequencies that didn't correlate well with general spoken or written English but that did correlate very well with police reports
    • unusual precision in the expression of times
    • frequent post-positioning of "then" after the subject ("I then went..." instead of "then I went..."), again characteristic of police reports
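
    One of the markers above, the position of "then" relative to the subject, is simple enough to count mechanically. The following is a rough Python sketch under stated assumptions: it only recognises a handful of subject pronouns, the names are hypothetical, and it is not the procedure used in the actual Bentley analysis.

```python
import re

# Assumption: only these pronouns are treated as subjects; noun-phrase
# subjects ("the officer then went") are not detected.
SUBJECTS = r"(?:i|he|she|we|they|you|it)"

# "I then went ..." : subject, then "then", then another word
POST = re.compile(rf"\b{SUBJECTS}\s+then\s+\w+", re.IGNORECASE)
# "then I went ..." : "then" directly before the subject
PRE = re.compile(rf"\bthen\s+{SUBJECTS}\b", re.IGNORECASE)


def then_position_counts(text):
    """Count post-positioned and pre-positioned uses of 'then'."""
    return {"post": len(POST.findall(text)), "pre": len(PRE.findall(text))}


def post_ratio(text):
    """Fraction of counted 'then' uses that follow the subject, or None."""
    counts = then_position_counts(text)
    total = counts["post"] + counts["pre"]
    return counts["post"] / total if total else None
```

    A ratio close to 1.0 in a statement, compared with a much lower ratio in ordinary speech samples, would be one small piece of the overall picture rather than a verdict on its own.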

    That all pointed to the statement not being Bentley's own words, but rather being the police version of his answers to a series of police questions that had been removed from the statement. One aspect of his original trial was a statement "I did not know he was going to use the gun", which was taken as evidence that he knew his accomplice, Craig, had a gun (and the inconsistency with the denial that he knew this, later in the statement, was taken as evidence that he was lying). Since the linguistic analysis shows that this was probably a reply to a question, it seems more likely that it went something like:

    Police: Did you know he was going to use the gun?

    Bentley: No.

    Which makes sense because he knew at the time of the interview that Craig had a gun.

    Yes, of course this sort of thing can be gamed, but it wasn't credible that Bentley would have been capable of such sophisticated gaming. The important thing as far as this thread is concerned is that forensic linguistics doesn't plug in a single measure, turn a handle and come out with a yes/no answer; it uses a whole range of measures and builds up an overall picture of what probably happened.

    --
    Quidnam Latine loqui modo coepi?

  3. Re:Did you RTFA? by Opportunist · · Score: 5, Insightful

    No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else if I know in advance that my writing will be used to find me.

    You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am on other boards. I didn't try to hide my identity against comparison of writing styles.

    I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.

  4. No information is better than bad information... by Xenographic · · Score: 5, Insightful

    > I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful.

    One problem with that is the human tendency to be overconfident as to how good these tests are. This happens everywhere. Court, business, whatever.

    Say you have some metric at work (e.g. lines of code) that's easy to measure. If it's the only measure management has, it's what they'll use to measure how well you're doing. This applies even if the results are absurd, because they would rather believe that they have *some* idea what's going on than accept the fact that they have no idea what's going on.

    In summary, sometimes NO information is better than bad information, but people are very reluctant to accept that fact.