Writing Style Fingerprint Tool Easily Fooled
Urchin writes "Some of the techniques used by literary detectives and courts of law to identify the authorship of text are easily fooled, say US researchers. They found that non-professional writers could hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy. Stylometric methods have been used in a number of high-profile legal cases in recent decades, including the 'Unabomber' trial. 'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."
Some analysis of handwriting can be useful. In forgery, for instance, a signature can show as false when compared to an authentic one by the presence of a "forger's tremor", because the forger must proceed more slowly to produce the signature than the person to whom it properly belongs.
Yes, but the problem is this:
1. It's not just that I can fake not being myself; I can also pretty much frame someone else. E.g., given enough messages written by KibibyteBrain (and just clicking on the user name or id will give me a list of them), it's trivial to run a stylistic analysis on those, not just to get an idea of how to write in the same style, but to run the same analysis on my own result and refine it until the match is outstanding.
2. From what I understand, the people in this test fooled it by merely being told to write in the style of someone else, without the help of any analysis tools, and still fooled it majorly. That's some pretty damn fragile "evidence" if anyone asks me. It's something Joe Sixpack can do by himself. Add some tools and it can only get crappier.
Even such idioms as you mention are trivial to notice without any tools. E.g., with only a little correspondence with another team here and reading some of their docs, I can tell that they use "solution" instead of "application".
3. While it can be handwaved as "eh, nobody said it's perfect", some people do seem to take it as less fallible than it really is. Even you just called it "This is reasonable *evidence* of authorship, where of course evidence != proof." And that's the whole point. Something that can be fooled by almost any Joe Sixpack without any tools or much effort isn't reasonable evidence at all.
We allow evidence like handwriting, signatures, fingerprints, or DNA because they're supposedly very, very hard to fake well. Ok, so DNA turned out to be fakable as well, but you need a fair bit of expensive lab equipment and knowledge. It's something a biology prof at a medical college could probably do, but Joey Three-fingers the small-time smuggler wouldn't even know where to start if he wanted to plant someone else's fake blood at his latest shootout scene. Or fingerprints turned out to be easy to fake for the purpose of fooling a fingerprint reader, but it's still very, very hard to transfer them to an object in a way that looks genuine.
But here we have something that untrained people fooled by just being told to try. I'm sorry, but for me then it shouldn't be evidence at all.
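For anyone who wants to see how little is needed for the "analyze, rewrite, re-analyze" loop in point 1, here is a minimal sketch. It is not the method the researchers or any court actually use; the tiny `FUNCTION_WORDS` list and the names `fingerprint` and `style_match` are assumptions made up for illustration. It reduces a text to relative function-word frequencies and scores two texts by cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set: a few common function words.
# Real stylometric tools use far larger and more varied feature sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "i"]


def fingerprint(text):
    """Return the relative frequency of each word in FUNCTION_WORDS."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def style_match(candidate, reference):
    """Score how closely the candidate's fingerprint matches the reference's."""
    return cosine_similarity(fingerprint(candidate), fingerprint(reference))
```

An imitator would score a draft against the target's collected posts with `style_match(draft, target_posts)`, tweak the draft, and repeat until the score stops improving.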
A polar bear is a cartesian bear after a coordinate transform.
No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else if I know in advance that my writing will be used to find me.
You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am on other boards. I didn't try to hide my identity against comparison of writing styles.
I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
> I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful.
One problem with that is the human tendency to be overconfident as to how good these tests are. This happens everywhere. Court, business, whatever.
Say you have some metric at work (e.g. lines of code) that's easy to measure. If it's the only measure management has, it's what they'll use to measure how well you're doing. This applies even if the results are absurd, because they would rather believe that they have *some* idea what's going on than accept the fact that they have no idea what's going on.
In summary, sometimes NO information is better than bad information, but people are very reluctant to accept that fact.
I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors. For example, when we purchased our house I had to sign my name to a dozen or more papers. The first signature looked "normal" but the later signatures were glorified scribbles. If I needed to sign a check last and just scribbled my signature on the back, would the bank (not privy to my signature's declining quality in the previous paperwork) be able to tell that it wasn't a bad fake?
My sci-fi novel, Ghost Thief, is now available from Amazon.com.
> I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors.
Signatures written on paper are not all that helpful, for a few reasons. First off, they are easy to forge. Second, a single person might sign his name twice and produce two signatures which look very different both to the naked eye and to some forms of analysis, so they are not accurate. Where they actually are accurate, however, is when written on pressure-sensitive pads (such as those seen on newfangled credit card swipers). If you were to do an analysis of the pressure and speed at which the signer signed various parts of the signature, you would actually produce some very reliable information. This is because even when you sign your name in slightly different manners, you have the tendency to use the same speed/pressure on certain parts of certain letters. Personally I would just use digital signatures... but calculating hash functions on the back of your restaurant receipt is never fun. It's also difficult to fit a 256-bit output on that minuscule "sign here" line.
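To put a number on the "256-bit output" quip, here is a small sketch using SHA-256 from Python's standard `hashlib`. The receipt text and the function name `receipt_digest` are made up for illustration. A hash on its own is not a digital signature (a real scheme would sign the digest with a private key); this only shows how much you would have to squeeze onto the line.

```python
import hashlib


def receipt_digest(receipt_text):
    """Return the SHA-256 digest of the receipt text as a hex string."""
    return hashlib.sha256(receipt_text.encode("utf-8")).hexdigest()


# Hypothetical receipt contents, for illustration only.
digest = receipt_digest("Dinner for two, total $42.50")

# Each hex character encodes 4 bits, so 256 bits come out as 64 characters.
print(len(digest) * 4, "bits,", len(digest), "hex characters")
```

That is 64 hex characters to write out by hand, before any actual signing step.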