Slashdot Mirror


Writing Style Fingerprint Tool Easily Fooled

Urchin writes "Some of the techniques used by literary detectives and courts of law to identify the authorship of text are easily fooled, say US researchers. They found that non-professional writers could hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy. Stylometric methods have been used in a number of high-profile legal cases in recent decades, including the 'Unabomber' trial. 'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."

32 of 96 comments

  1. Could have told you writing analysis was bogus.... by Peter+Steil · · Score: 3, Insightful

    ....from the beginning. Sure it may work on a limited set of individuals. It's the same thing as a polygraph test, it's not based on any sort of quantifiable data but mere suspicion at best. It is completely subjective and there is no real hard science to support such tests. This is the reason why polygraphs are not admissible in court, and why writing analysis shouldn't be either. Be sure to watch for writing analysis to show up on the next Maury show!

  2. Concealing style by Anonymous Coward · · Score: 4, Funny

    hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy

    ... or Anonymous Coward.

    1. Re:Concealing style by Thanshin · · Score: 2, Informative

      What a crappy joke. I wish I could find you and kill you.

      I mean...

      Oh! A bad pun! Should we cross our paths, I'd rather extinguish your life.

      My dear sir.

  3. Re:Could have told you writing analysis was bogus. by Anonymous Coward · · Score: 5, Informative

    Some analysis of handwriting can be useful. In forgery, for instance, a signature can show as false when compared to an authentic one by the presence of a "forger's tremor", because the forger must proceed more slowly to produce the signature than the person to whom it properly belongs.

  4. Duh! by k.a.f. · · Score: 3, Insightful

    If the methods a stylometry analysis uses are known (and they couldn't very well be a secret to hold up in court), of course you can game them. As long as the algorithm outputs "no" for any reformulation of your message, you can easily find it, by generate-and-test if necessary. The only question is, how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.
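    The generate-and-test idea above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `points_to_me` is a hypothetical stand-in for a stylometric test (real tools use far richer features), and `TELLS` and `SLOTS` are invented word lists.

```python
from itertools import product

# Hypothetical "tell" words that a toy stylometric test associates with the author.
TELLS = {"whilst", "utterly"}

def points_to_me(text: str) -> bool:
    """Toy classifier: flags the text if it contains any tell word."""
    return any(word in TELLS for word in text.lower().split())

# Hypothetical table of interchangeable wordings, one list per slot of the sentence.
SLOTS = [
    ["I waited", "I stayed"],
    ["whilst", "while"],
    ["it was", "the thing was"],
    ["utterly", "completely"],
    ["dark"],
]

def first_safe_rewrite(slots):
    """Generate candidate sentences and return the first one the test does not flag."""
    for choice in product(*slots):
        candidate = " ".join(choice)
        if not points_to_me(candidate):
            return candidate
    return None  # no reformulation passes the test

print(first_safe_rewrite(SLOTS))
```

    With a test this simple the search ends after a handful of candidates; how fast it goes against a real analysis depends on how many features have to be dodged at once.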

    1. Re:Duh! by bitt3n · · Score: 2, Funny

      how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.

      as fast as: type it out, auto-translate it into french, auto-translate it back into: "the person who is being hated by myself is to be killed by myself by employment of the method of the bomb conflagration saving if it is the case that I am receiving the stipend of an amount that is one million of dollars. sandwich."

  5. Re:Could have told you writing analysis was bogus. by KibibyteBrain · · Score: 4, Insightful

    I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful. If one was an author unpublished in any significant form, and then "went Unabomber" and started to write letters as a calling card, very similar writing styles and structures between the incriminating work and the unpublished/unpopularized previous work would be evidence to at least raise suspicion that the writer of the previous work was somehow tied to the crimes, even if not directly. Of course, all bets are off if it is plausible that someone could have pre-analyzed the author to imitate. It's also of note that this is only a positive test (i.e. a failed match in analysis makes no claim at all as to whether or not someone wrote it). A good example would be a set of writing that demonstrates an idiom used only in a certain locale, a business term used only in a certain company, and an ideological term used only in a certain fringe political movement. This is reasonable *evidence* of authorship, where of course evidence != proof. The polygraph, on the other hand, is complete BS because the only real thing a polygraph achieves is to psychologically motivate the taker to tell the truth due to "faith" in the fact he will be outed for lying by the device. It doesn't actually measure anything related to the statements, only the physiological condition, which can depend on millions of independent factors.

  6. No surprise by AmiMoJo · · Score: 4, Interesting

    This should not really come as a surprise to anyone. Like all evidence that has to be interpreted, the interpretation can be flawed.

    Shows like CSI have computers getting an exact match on fingerprints and DNA, but the real world is not like that. Fingerprint matching is entirely subjective and the print recovered from a crime scene is rarely a nice clean one like they show on TV. DNA often has to be manipulated before a match can be made (due to the sample found at the scene being too small or of poor quality) and even then it often matches more than one person.

    Even when you do get a match, it's not proof that someone was at a specific place because DNA and fingerprints can easily be transferred. Someone broke into my car a few years ago and despite there being fingerprints the police decided not to prosecute because they were on the outside of the car and the accused could just claim he leant on it on his way home from the pub.

    There have been a few cases where fingerprint and DNA evidence have been challenged in the UK courts and shown to be unreliable, with innocent people spending years in jail before being cleared. Yet, the police seem to have started asking for everyone in the area of a crime to "volunteer" their DNA. Presumably if you don't "volunteer" you become a suspect.

    The idea that handwriting is any more unique than those two and at all reliable is laughable.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    1. Re:No surprise by abigsmurf · · Score: 3, Insightful

      There was a good article here (or possibly some other social news type site) about the inherent flaw in DNA databases and the weight given to DNA evidence.

      The theory goes like this: the chances of getting a false positive on a partial sample are something like 1 in 50 million. You have 50 million people on the database. This means you'd expect a false positive on every search. If you're unlucky enough to live close enough to a crime to have committed it, you could easily find yourself in court.

      You'll then have to defend yourself based on a 1 in 50 million probability to a jury who won't understand the statistics. If you haven't got a solid alibi, it would be a tough thing to do.
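      A rough sketch of the arithmetic above, using the comment's own illustrative figures (they are not real forensic statistics) and assuming each database entry is an independent comparison:

```python
# Figures taken from the comment; both are illustrative only.
false_positive_rate = 1 / 50_000_000   # chance that one innocent profile matches
database_size = 50_000_000             # number of profiles searched

# Average number of coincidental matches per search.
expected_false_matches = false_positive_rate * database_size

# Chance that a single search turns up at least one coincidental match,
# assuming independent comparisons.
p_at_least_one = 1 - (1 - false_positive_rate) ** database_size

print(f"expected false matches per search: {expected_false_matches:.2f}")
print(f"chance of at least one false match: {p_at_least_one:.3f}")
```

      Under these assumptions the average is about one coincidental match per search, while the chance that any given search produces at least one is roughly 63%, which is still far from the "1 in 50 million" a jury hears.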

      There's probably a good Terry Pratchett quote about 1 in a million chances to be used here.

  7. a common feature of correlations by Trepidity · · Score: 3, Insightful

    Stylometrics is essentially a correlational field: it's not that people inherently must write in unique styles that are identifiable from a few measurable features. There is no strong genetic causation for handwriting or anything like that, which would mean that a handwriting style really does truly identify an individual or narrow set of individuals. Rather, it's that, all else being equal, people in practice do tend to write in a way that lets the stylometric features distinguish them. But when all else isn't equal, and people are actively trying to thwart that sort of analysis, they are, unsurprisingly, able to do so in a lot of cases.

    I suspect that a lot of forensic analysis runs into this problem: it takes some fact that empirically is true among the general population, but only because the general population is not actively trying to thwart you. The set of robust empirical truths about people, that hold up even when the person is aware that you're trying to use it against them and actively trying to keep you from doing so, is much smaller.

    1. Re:a common feature of correlations by digitig · · Score: 3, Insightful

      [Sigh] Somebody else who thinks this is about handwriting. It isn't.

      --
      Quidnam Latine loqui modo coepi?
  8. Re:Could have told you writing analysis was bogus. by KibibyteBrain · · Score: 4, Interesting

    Again, that's why it's clear that writing analysis is only a positive test. If steps are taken to actively change the style of writing, of course it will fail. It is something like saying an audio recording of someone's voice in a phone call is invalid, because it is possible to speak in a different voice. While true, this doesn't significantly weaken the positive test value.

  9. Cormac McCarthy Style? by hansraj · · Score: 2, Interesting

    What exactly is the "Cormac McCarthy style"? The article doesn't mention it at all. I even skimmed through the paper and all it does is quote a paragraph from some work of Cormac McCarthy.

    I can't figure out what his style exactly is, and I certainly would not be able to fake it as the participants were supposed to. And the participants were supposed to not be literary geniuses.

    1. Re:Cormac McCarthy Style? by SappoMan · · Score: 2, Informative

      This is the epilogue from "Blood Meridian", a novel by McCarthy:
      "In the dawn there is a man progressing over the plain by means of holes, which he is making in the ground. He uses an implement with two handles and he chucks it into the hole and he enkindles the stone in the hole with his steel, hole by hole, striking the fire out of the rock, which God has put there. On the plain behind him are the wanderers in search of bones, and those who do not search. And they move haltingly in the light, like mechanisms whose movements are monitored with escapement and pallet, so that they appear restrained by a prudence or reflectiveness which has no inner reality. And they cross in their progress one by one that track of holes that runs to the rim of the visible ground and which seems less the pursuit of some continuance than the verification of a principle, a validation of sequence and causality. As if each round and perfect hole owed its existence to the one before it there on that prairie, upon which are the bones and the gatherers of bones, and those who do not gather. He strikes fire in the hole and draws out his steel. Then they all move on again."

    2. Re:Cormac McCarthy Style? by Anonymous Coward · · Score: 2, Interesting

      Ummm, not fair. 20 years ago the exam board just labeled that 'Bad Grammar' and failed me.

  10. Selfevident, isn't it? by Lundse · · Score: 2, Interesting

    If you can describe something in enough detail to put it in a certain category (X writes like this), then you can also imitate that category from that same description (I will now write like this in order to seem like X).

    I do not really see how you would ever expect different.

    --
    IAIFARSIJDPOOTV - I Am In Fact A Reality Star; I Just Don't Play One On TV
  11. Did you RTFA? by argent · · Score: 4, Informative

    If the methods a stylometry analysis uses are known (and they couldn't very well be a secret to hold up in court), of course you can game them.

    Their volunteer "attackers" lacked formal training in linguistics and had no access to stylometry software.

    1. Re:Did you RTFA? by Opportunist · · Score: 5, Insightful

      No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else. If I know in advance that my writing would be used to find me.

      You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am in other boards. I didn't try to hide my identity against comparison of writing styles.

      I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    2. Re:Did you RTFA? by k.a.f. · · Score: 2, Interesting

      No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else. If I know in advance that my writing would be used to find me.

      You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am in other boards. I didn't try to hide my identity against comparison of writing styles.

      I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.

      That's what I meant, sorry: even a computer program could outwit such analyses. Given the current state of automatic language analysis (Disclaimer: IAA computational linguist), I consider it obvious that a determined person can fool the discriminators enough to appear as someone else.

  12. Yes, but here's the problem by Moraelin · · Score: 5, Interesting

    Yes, but the problem is this:

    1. It's not just that it's possible to fake not being myself, it's also that I can pretty much frame someone else. E.g., given enough messages written by KibibyteBrain (and just clicking on the user name or id will give me a list of them), it's trivial to do a stylistic analysis on those and not just get an idea of how to write in the same style, but run the same analysis on the result and refine it until the match is outstanding.

    2. From what I understand, the people in this test fooled it by merely being told to write in the style of someone else, without the help of any analysis tools, and still fooled it majorly. That's some pretty damn fragile "evidence" if anyone asks me. It's something Joe Sixpack can do by himself. Add some tools and it can only get crappier.

    Even such idioms as you mention, are trivial to notice even without any tools. E.g., with only a little correspondence with another team here and reading some of their docs, I can tell that they use "solution" instead of "application".

    3. While it can be handwaved as "eh, nobody said it's perfect", some people do seem to take it as less fallible than it really is. Even you just called it "This is reasonable *evidence* of authorship, where of course evidence != proof." And that's the whole point. Something that can be fooled by almost any Joe Sixpack without any tools or much effort, isn't reasonable evidence at all.

    We allow evidence like handwriting, signatures, fingerprints, or DNA because they're supposedly very very hard to fake well. Ok, so DNA turned fakable as well, but you need a fair bit of expensive lab equipment and knowledge. It's something a biology prof at a medical college could probably do, but not something Joey Three-fingers the small time smuggler would even know where to start if he wants to plant someone else's fake blood at his latest shootout scene. Or fingerprints turned out easy to fake for the purpose of fooling a fingerprint reader, but it's still very very hard to transfer to an object in a way that looks genuine.

    But here we have something that untrained people fooled by just being told to try. I'm sorry, but for me then it shouldn't be evidence at all.

    --
    A polar bear is a cartesian bear after a coordinate transform.
    1. Re:Yes, but here's the problem by hairyfeet · · Score: 2, Insightful

      It sounds to me like this "evidence" is just another case of bullet matching, which, for those who haven't heard the term, was all the rage at the FBI for a while, and I'm sure there are innocent people rotting in jail right now over its bogus findings.

      What we have to be seriously careful about with these pseudoscience "tests", is the simple fact that juries love CSI style mumbo jumbo that makes solving a case little more than a magic box pointing out someone and saying "He did it". And just like bullet matching juries would put far too much weight onto this type of evidence, strictly because of the "CSI Factor" and how scientific it sounds. That is why I am always leery of these kinds of "helpful evidence" simply because juries will give them much more weight than the science behind them says they are worth.

      --
      ACs don't waste your time replying, your posts are never seen by me.
  13. Misrepresents forensic linguistics by digitig · · Score: 4, Insightful

    As the article says "the study only attacked some of the less complex stylometry techniques". In fact, I'm surprised that they even considered lexical density because that varies greatly within a single author's writing. It's usually high at the beginning of a text, usually (not always) gradually falls off, jumps when they change subject, and so on. I'm not aware of its being used in forensic linguistics (although it is used in analysing texts to identify, for example, objective divisions within a text).
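    For readers unfamiliar with the measure: lexical density is commonly taken as the share of content words among all words. A minimal sketch of it, computed both for a whole text and per window, might look like the following. The `FUNCTION_WORDS` list is a small invented stand-in (a real analysis would use a part-of-speech tagger or a much fuller list), and nothing here comes from the paper itself.

```python
import re

# Small illustrative list of function words (an assumption, not from the article).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "by", "for", "with", "it", "is", "was", "he", "she", "they", "i",
    "that", "this", "which", "who", "not", "as", "his", "her", "their",
}

def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_density(text: str) -> float:
    """Share of tokens that are not function words (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t not in FUNCTION_WORDS for t in tokens) / len(tokens)

def density_by_window(text: str, size: int = 50) -> list:
    """Lexical density of consecutive, non-overlapping windows of `size` tokens."""
    tokens = tokenize(text)
    windows = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return [sum(t not in FUNCTION_WORDS for t in w) / len(w) for w in windows]

print(lexical_density("the cat sat on the mat"))
print(density_by_window("the cat sat on the mat", size=3))
```

    Even in this tiny example the two windows come out quite different, which is the within-author variation described above.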

    The sort of thing that they used in the Derek Bentley case (which contributed to the partial posthumous pardon) was analysis of his statement, which had

    • unusually high proportion of passive constructions
    • the use of police jargon
    • use of language that was not consistent with an educationally sub-normal 17-year-old
    • word frequencies that didn't correlate well with general spoken or written English but that did correlate very well with police reports
    • unusual precision in the expression of times
    • frequent post-positioning of "then" after the subject ("I then went..." instead of "then I went..."), again characteristic of police reports

    That all pointed to the statement not being Bentley's own words, but rather being the police version of his answers to a series of police questions that had been removed from the statement. One aspect of his original trial was a statement "I did not know he was going to use the gun", which was taken as evidence that he knew his accomplice, Craig, had a gun (and the inconsistency with the denial that he knew this, later in the statement, was taken as evidence that he was lying). Since the linguistic analysis shows that this was probably a reply to a question, it seems more likely that it went something like:

    Police: Did you know he was going to use the gun?
    Bentley: No.

    Which makes sense because he knew at the time of the interview that Craig had a gun.

    Yes, of course this sort of thing can be gamed, but it wasn't credible that Bentley would have been capable of such sophisticated gaming. The important thing as far as this thread is concerned is that forensic linguistics doesn't plug in a single measure, turn a handle and come out with a yes/no answer; it uses a whole range of measures and builds up an overall picture of what probably happened.
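    One of the features listed above, the placement of "then" relative to the subject, is simple enough to count mechanically. This is only a sketch: it handles the first-person subject "I" and nothing else, the `sample` text is invented, and it is not the method used in the Bentley analysis.

```python
import re

def then_placement(text: str) -> dict:
    """Count 'then' placed before vs. after the first-person subject 'I'."""
    lowered = text.lower()
    return {
        "then I": len(re.findall(r"\bthen i\b", lowered)),
        "I then": len(re.findall(r"\bi then\b", lowered)),
    }

# Invented sample text, for illustration only.
sample = "I then went to the door. Then I saw him. I then ran."
print(then_placement(sample))
```

    A single count like this proves nothing on its own; as the comment says, it would be one measure among many.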

    --
    Quidnam Latine loqui modo coepi?
  14. No information is better than bad information... by Xenographic · · Score: 5, Insightful

    > I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful.

    One problem with that is the human tendency to be overconfident as to how good these tests are. This happens everywhere. Court, business, whatever.

    Say you have some metric at work (e.g. lines of code) that's easy to measure. If it's the only measure management has, it's what they'll use to measure how well you're doing. This applies even if the results are absurd, because they would rather believe that they have *some* idea what's going on than accept the fact that they have no idea what's going on.

    In summary, sometimes NO information is better than bad information, but people are very reluctant to accept that fact.

  15. as if law enforcement cares by BigHungryJoe · · Score: 2

    "'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."

    Of course, this assumes that law enforcement actually cares about the guilt or innocence of the people they convict. They don't. They only care about putting as many people in prison as they can.

    1. Re:as if law enforcement cares by TimSSG · · Score: 2, Informative

      They only care about putting as many people in prison as they can.

      Wrong!

      The Basic Metric used on the police is case closed.
      In other words, it is easy to say a dead person committed a crime; because it closes a case.

      Metrics have very bad sides.

      Tim S.

  16. Re:Could have told you writing analysis was bogus. by Jason+Levine · · Score: 5, Interesting

    I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors. For example, when we purchased our house I had to sign my name to a dozen or more papers. The first signature looked "normal" but the later signatures were glorified scribbles. If I needed to sign a check last and just scribbled my signature on the back, would the bank (not privy to my signature's declining quality in the previous paperwork) be able to tell that it wasn't a bad fake?

    --
    My sci-fi novel, Ghost Thief, is now available from Amazon.com.
  17. Re:Could have told you writing analysis was bogus. by Lillesvin · · Score: 2, Insightful

    It is completely subjective and there is no real hard science to support such tests.

    I beg to differ. There's very little subjective in stylometrics, the subjective part is interpreting the results, but definitely not producing them. Take a look at http://en.wikipedia.org/wiki/Stylometry and tell me which of the methods described there you think is "completely subjective".

    The main problem with stylometry is not the methods, but the data. As TFA describes, changing writing style throws off the results - at least to some extent. Stylometrics relies on the fact that old habits die hard, but if someone is aware that the text they are producing might be subjected to stylometric analyses, they can employ various mechanisms to avoid identification and will probably have a better chance at succeeding than if writing casually. However, most texts used in court have been produced casually (letters, emails, text messages) and almost always have some unique traits specific to their author. Even in cases where people plagiarize a known author, they always miss some subtlety in his/her style that gives away the plagiarism. These subtle differences in style are usually caught somewhere in the stylometric analysis.

    It occurs to me now that you may be talking about hand-writing analysis, in which case my reply is completely irrelevant and you have completely missed the point of the summary and TFA.

    --
    "Live free or don't."
  18. All evidence is tentative by JoshuaZ · · Score: 2, Insightful

    So handwriting analysis has problems. Another recent Slashdot article was about how DNA evidence might be falsifiable. And we all know that eye-witnesses have serious problems. We don't, however, reject any of these. Why not? Because we don't care about single pieces of evidence but rather about bodies of evidence. It is the collective narrative which matters. It might be possible for one or two types of evidence to be wrong or falsified. But it is extremely difficult to falsify four or five. The real problem is when overzealous prosecutors try to portray something like handwriting analysis as a CSI-style magic bullet. This is, moreover, being balanced by a problem in the opposite direction, with juries increasingly wanting all sorts of technical evidence to convict even when it would be unnecessary, prohibitively expensive or, in some cases, a form of evidence that really only exists in fiction.

  19. Re:Could have told you writing analysis was bogus. by jbudofsky · · Score: 5, Funny

    I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors.

    Signatures written on paper are not all that helpful for a few reasons. First off, they are easy to forge. Second off, a single person might sign his name twice and produce two signatures which look very different to both the naked eye and some forms of analysis - hence not accurate. Where they actually are accurate, however, is when written on pressure-sensitive pads (such as those seen on new-fangled credit card swipers). If you were to do an analysis of the pressure and speed at which the signer signed various parts of the signature, you would actually produce some very reliable information. This is because even when you sign your name in slightly different manners you have the tendency to use the same speed/pressure on certain parts of certain letters. Personally I would just use digital signatures...but calculating hash functions on the back of your restaurant receipt is never fun. It's also difficult to fit a 256-bit output on that minuscule "sign here" line.

  20. Article doesn't talk about incriminating others by neo · · Score: 2, Interesting

    While you can attempt to write in someone else's style, you're going to run into problems duplicating it strongly enough for a stylometric analysis to implicate them. Even if you lifted exact phrases from previous works you will invariably need to come up with original words, phrases, and sentence structures to fill the gaps where the original author has not written. These should be enough to put reasonable doubt as to the authorship of the faked text.

    Moreover, if it's identified as a fake, by eliminating the material that was copied from previous styles it's likely that your identity may be revealed from the pieces that you inserted to fill gaps. Obviously the longer the piece, the more likely this is.

    The technique of hiding one's own identity is a matter of using the same techniques in stylometrics to identify phrases, words, and structures that would identify you, and then changing these until they no longer give an indication of your identity.

    Attempting to create a work that duplicates someone else's stylometric signature would be fairly obvious to linguists.

  21. RTFA, seriously by Moraelin · · Score: 3, Informative

    From TFA: "Each volunteer was then asked to write a description of their neighbourhood in a way that masked their personal style, before writing a further passage in the style of novelist and playwright Cormac McCarthy." [...] "the techniques consistently identified Cormac McCarthy as the author of the imitations of his work."

    So, yes, the whole bloody experiment was precisely about disguising your style as someone else, and no, it did not give the tests any reasonable doubt. People trying to imitate Cormac McCarthy were consistently identified as Cormac McCarthy by the stylistic analysis techniques. It doesn't get more clear cut than this, really.

    So, yes, it is very possible for an average Joe Sixpack to incriminate someone else, if they so choose.

    --
    A polar bear is a cartesian bear after a coordinate transform.
  22. Re:Could have told you writing analysis was bogus. by a+whoabot · · Score: 2, Funny

    Dear Sirs and Madam,

    I wish to complain about that last complaint. I can assure you that all groomers of haddock and every other species in order Gadiformes are indeed transvestites. This is in fact a necessary grade to be reached in the apprenticeship process for the Gadiformes Groomers Guild (GGF). If the former complainant indeed knows of any non-transvestite groomers as such, then he should report them both to the GGF and to the Ministry of Fish Groomers in Luton at once!

    Angrily,
    Mr. Pint