Slashdot Mirror


Human and Machine Readable Handwritten Language?

darrint writes "In some obscure corner of the Earth, has someone developed a human handwritten language which can be easily read by a machine? Why is the visual divide between what can be written by a human and what can be read by a machine so wide? At one extreme is the bar code, which I certainly cannot hand write. Machines can read it easily. Bank checks have human-readable account and routing numbers printed in special ink running along their bottom margins. These numbers can be read by a machine and are clearly legible to a human, but I doubt I could write them for input to a machine. My old Palm handheld could read something like handwriting in its little box. OCR exists but I've never thought of it as reliable. I would like to dash off little notes on stickies or in a tiny spiral notebook and be able to suck them into vim, a browser text-input box, and so forth. Perhaps I'd have to learn some kind of machine readable 'shorthand.' Has it been done?"

13 of 119 comments

  1. Uh.... by Anonymous Coward · · Score: 1, Informative

    Have you ever used one of those tablet computers? They do exactly that.

    1. Re:Uh.... by MobileTatsu-NJG · · Score: 3, Informative

      "Most of them work by guessing what you wrote based on a dictionary (similar to cellphone texting). Give it anything it can't look up and it'll be close, but more often than not, not quite."

      Depends on what you have it set to. My TabletPC is set to read one character at a time. It provides little spaces to write each character in, so you don't have to worry about spacing or anything. That's been my favorite, honestly.

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

  2. Ideal handwriting style by philgross · · Score: 4, Informative

    Most of the responses seem to be missing the point of the post.

    OCR/handwriting recognition folks: what would the ideal handwriting for machine readability look like? Could simple variations on standard English cursive or printing approach 100% recognizability, or would the ideal have to be synthesized, like shorthand, and if so, what characteristics would such a script have?

  3. Re:morse code by sam1am · · Score: 2, Informative

    It's ternary - dots, dashes, and spaces.

  4. United States Postal Service by Hamled · · Score: 2, Informative

    USPS has been using handwriting recognition hardware and software for some time. They do, however, implement relatively state of the art neural nets and other AI algorithms to interpret the handwriting, so it's probably not feasible for most people. More information on the system they use is here.

  5. Apple Newton tried. by Nutria · · Score: 2, Informative

    But low-wattage CPUs were too understrength at the time.

    Maybe if someone tried again now, Newton would do a better job.

    --
    "I don't know, therefore Aliens" Wafflebox1
  6. Re:I believe it has been done by darkstar2a · · Score: 2, Informative

    But not old enough to know that Scantron et al. are not punch cards.

    Punch cards are not coded with pencil, they are coded with physical holes "punched" out of the paper (becoming... can we say it.. Chads!)

    Unless you're referring to MarkSense, where a machine would sense the marks on a card and punch them out, turning marked cards into punch cards.

    Thanks to the last US presidential election, the whole world knows the term chads, even if they don't all know what it means. :)

  7. Wouldn't Kana fit the bill here? by thewils · · Score: 2, Informative

    Hiragana or Katakana has a specific traditional form which should be machine readable. Japanese kids spend ages learning the correct stroke order and style.

    --
    Once I was a four stone apology. Now I am two separate gorillas.
  8. Re:PDAs cheat by DingerX · · Score: 3, Informative

    That's how humans have read handwriting for most of the papyrus/parchment/paper era.

    The problem now is that we're used to reading print. One of the main principles of palaeography is that you read the motions of the pen (or other writing tool) in the medium. Ink in particular is great for this sort of expression, because you can (especially with a flat nib) express all sorts of motions; and using a variety of analytical tools, you can reconstruct missed strokes, damage to the medium, overlapping words and the rest. Some of those analytical tools are, of course, analysis of the linguistic context. And that same context lets us get really fancy with our handwriting. For example, if something logically follows, I don't need to waste my time writing it out clearly.

    To muddy the waters further, no two people use the same handwriting. Even in contexts where the formation of letters is strictly determined, everybody has their individual variations, especially in pressure, speed, stroke order, stroke direction, and lifting the pen. They also vary in how they form the letters.

    So yeah, you can probably get decent success using handwriting OCR on things like addresses and bank account numbers -- because you've got a known context, and are basically looking for key numbers.

    And I'm sure there's decent software recognition out there. But to get something that reads human script -- even a forced "machine-friendly" hand -- takes a lot of work, and a lot of training in areas that machines are not good at. You'd need a pretty big neural net.

  9. Re:OCR and MICR Reliability - a minor correction by N3Bruce · · Score: 2, Informative

    According to the Wiki article I referenced, MICR has an error rate of about 1 in 20,000 checks. This does deserve a bit of explanation though. As someone who works as a technician with modern check processing equipment, I can say that an error rate of 1 in 20,000 does not mean that a MICR or OCR system can successfully read 19,999 out of 20,000 checks fed into the machine. It is the rate at which the MICR reader will read one account number and mistake it for another. In reality, the typical magnetic MICR reader can read about 96 or 97 percent of items. If it comes across an unreadable MICR character, it will reject that item. The account numbers and routing transit numbers on the MICR line of a check are also set up so that a checksum can be computed over the digits on the MICR line to verify that the information is valid. Inconsistencies in printing can affect MICR as they do OCR, but the fact that the data is printed in magnetic ink and read magnetically means that stray marks from customers' signatures, check decorations, etc. do not adversely affect the readability of the information.
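
    The comment above does not say which checksum is used. As a rough illustration only, here is a minimal sketch that assumes the weighted 3-7-1 check used for nine-digit US routing transit numbers. The function name is made up for this example, and account-number checks, which vary from bank to bank, are not covered.

```python
def routing_checksum_ok(routing: str) -> bool:
    """Return True if a 9-digit routing number passes the 3-7-1 weighted check.

    Assumption: this is the routing-number check digit scheme, not whatever
    check a particular bank applies to its own account numbers.
    """
    # Reject anything that is not exactly nine ASCII digits.
    if len(routing) != 9 or not (routing.isascii() and routing.isdigit()):
        return False
    # Weights repeat 3, 7, 1 across the nine positions.
    weights = (3, 7, 1) * 3
    total = sum(int(digit) * weight for digit, weight in zip(routing, weights))
    # A valid number gives a weighted sum that is a multiple of 10.
    return total % 10 == 0
```

    With these weights, "021000021" sums to 30 and passes, while changing its last digit to give "021000022" sums to 31 and fails. A reader that gets one digit wrong therefore tends to produce a rejected item rather than a wrong but plausible number.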

  10. Re:Why in the hell... by Anonymous Coward · · Score: 1, Informative

    I beg to differ - the fastest way is called "Dvorak". :)

  11. Re:OCR Reliability by Inda · · Score: 3, Informative

    Being an ex-postman (survived one month!), I've seen the automatic sorting machines that read hand-written postcodes (zip codes in the US). I forget how many letters the machine sorted a minute; it was somewhere between 500 and 1000, but I do remember the 90% accuracy number that was boasted. The machine 'cheated' in some respects because it only had to read a 6 or 7 character postcode, of which there are only a small number of valid combinations. The machine also checked the county and city if it needed clarification.

    Any postcodes that could not be read (dark paper, red ink, etc.) were scanned and transmitted to a postal worker drone in another part of the country, who would type in the postcode from their terminal. The machine would receive the code back a few seconds later and the letter would carry on its journey.
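
    The flow described above (accept only readings that are real postcodes, use the town to break ties, and hand everything else to a person) might be sketched roughly as follows. This is a guess at the logic, not the sorting machine's actual software: the function names, the idea that the reader returns several candidate readings, and the postcodes in the usage note are all invented for illustration.

```python
from typing import Callable, Dict, List


def normalise(postcode: str) -> str:
    """Strip spaces and upper-case a postcode so readings compare equal."""
    return postcode.replace(" ", "").upper()


def resolve_postcode(
    candidates: List[str],
    envelope_town: str,
    known_postcodes: Dict[str, str],
    ask_operator: Callable[[], str],
) -> str:
    """Pick a postcode from the reader's candidate readings.

    known_postcodes maps each valid (normalised) postcode to its town.
    ask_operator stands in for the remote human who types the code.
    """
    # Keep only readings that are real postcodes, dropping duplicates.
    valid = list(dict.fromkeys(
        code for code in map(normalise, candidates) if code in known_postcodes
    ))
    # If several real postcodes remain, use the town as a tie-breaker.
    if len(valid) > 1:
        town = envelope_town.strip().lower()
        valid = [code for code in valid if known_postcodes[code].lower() == town]
    if len(valid) == 1:
        return valid[0]
    # Nothing usable, or still ambiguous: send the image to a person.
    return normalise(ask_operator())
```

    For example, with an invented table `{"AB12CD": "Northtown", "AB12CO": "Southtown"}`, the readings `["AB1 2CD", "AB1 2CO"]` on an envelope addressed to Southtown resolve to `"AB12CO"`, and a reading that matches nothing in the table goes to the operator. The small set of valid codes is what lets the machine 'cheat'.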

    I was impressed.

    --
    This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
  12. Re:Obfuscated handwriting system by Anonymous Coward · · Score: 1, Informative

    Hey, that's really cool. Two points though. First, the capitalisation looks hard to do, so I suggest some other system. Bear in mind that capitals are always the first letter in a word, so you could lose the initial horizontal mark, since it's not required for joined-up writing.

    Second, there's too much backtracking. Try to avoid the 180 degree pen reverses. Lose them and you'll have a great system.