Human and Machine Readable Handwritten Language?
darrint writes "In some obscure corner of the Earth, has someone developed a human handwritten language which can be easily read by a machine? Why is the visual divide between what can be written by a human and what can be read by a machine so wide? At one extreme is the bar code, which I certainly cannot hand write. Machines can read it easily. Bank checks have human-readable account and routing numbers printed in special ink running along their bottom margins. These numbers can be read by a machine and are clearly legible to a human, but I doubt I could write them for input to a machine. My old Palm handheld could read something like handwriting in its little box. OCR exists but I've never thought of it as reliable. I would like to dash off little notes on stickies or in a tiny spiral notebook and be able to suck them into vim, a browser text-input box, and so forth. Perhaps I'd have to learn some kind of machine readable 'shorthand.' Has it been done?"
My guess would be something a little similar to braille. In theory, the computer would have an excellent time reading this, and a few simplifications might make it easier to write.
The problem with a machine-readable, human-writable language is that humans aren't neat enough. When I write the letter R it looks one way, which is different from how my sister, or my friend, or my butler writes it (okay, I don't have a butler...but a kid can dream!).
If someone were to develop a language that was machine readable, human writable, it would probably consist of a series of straight lines. Letters would have to be larger, but lines are probably the way to go.
|_|__|-__-__-_||_|__
^like that.
I guess what would be interesting would be to have OCR look at 100 people's handwriting, see which letters are typically difficult to recognize, and then come up with substitutes that would be easy for the computer to read. Block capital letters should be fairly unambiguous, but I think many people don't write solely in those. I tend to mix my caps and non-caps within words, and I could see the computer confusing my F and P, O and Q, and U and V.
Does anyone know how Palm came up with their Graffiti handwriting? They must have done some studies.
..........FULL STOP.
An alphabet based on entirely straight lines would be easy enough for a computer to read if letters never touched. The software would first detect the line of text, then along the row of letters, find the first black pixel, then find all the lines touching the line containing that pixel. Bonus points if all characters had a single vertical line (making this sort of a barcode of its own).
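A minimal sketch of the segmentation step described above, assuming the page has already been reduced to a binary grid (1 = ink, 0 = paper) and that letters never touch. The name `segment_letters` and the grid format are made up for illustration, not taken from any OCR package. It scans column by column for the first unvisited ink pixel, then flood-fills everything touching it, so each letter comes out as one bounding box, left to right.

```python
from collections import deque

def segment_letters(grid):
    """Group touching ink pixels into letters.

    Returns one bounding box (top, left, bottom, right) per letter,
    ordered by the column in which the letter first appears.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    # Scan columns left to right so letters are found in reading order.
    for c in range(cols):
        for r in range(rows):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # First unvisited ink pixel: flood-fill everything touching it.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Diagonal neighbours count as touching (8-connectivity).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Two toy "letters" separated by a blank column.
page = [
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
]
print(segment_letters(page))
```

Classifying each box (which strokes it contains) would be a separate step; this only covers finding where one letter ends and the next begins.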
You took us from having a human-readable, non-machine-readable alphabet to the exact opposite. I don't want to be a barcode scanner!! Hehe.
Graffiti on Palms doesn't really work that well. I've tried to get fast at it, but if I'm trying to write down something someone is saying on the phone, I usually resort to the on-screen keyboard. It just doesn't get much faster than that no matter how fast you can write.
Ultimately, handwriting recognition systems need a way to be customized; I should be able to make up my own alphabet from scratch and tell my OCR software about it. Sometimes my Palm mistakes a 'k' for an 'R', when in fact my 'k' and 'R' look totally different.
Having one system that works with every human being's style is unrealistic and just won't work as well as everyone wishes; humans can adapt a little, but even after years of "adapting", my Palm still hiccups on about every 5th word that I try to write.
How is it that we can produce software that can recall your face, but handwritten OCR is still so error-prone? It's 2006 already! 10 years ago I hoped it would be further along by now.
I should note that I've only tried my handwriting on Palm's Graffiti and my scanner's bundled OCR (which is worse). Are tablet PCs or Pocket PCs any better?
Punch cards aren't really easy for a human to read, unless you have only a handful of parts.
For the original problem, I think the issue with computers recognizing handwriting is that the shapes in everyone's handwriting vary so much. I couldn't get my PDA to recognize my handwriting even after training it for several weeks; I just gave up and scribble notes as pictures instead.
The main issue to remember is that computers process numbers, not letters. To completely solve this, we'd need a language that's based entirely on numbers.
Standardizing handwritten numbers shouldn't be an impossible task,
but plain numbers don't tell people anything, so we'd need a symbolic dictionary to survive, something like:
0 no
1 yes
2 life
3 maybe
4 meaning
Once you memorize it, you can easily build concepts like 42 and 02, but there's a problem for humans: we often need to express more than 10 things, or 100 if all 10 basic elements could be combined with each other.
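As a rough sketch of how the toy dictionary above would be read back, here is a digit-by-digit lookup. Only the five entries listed above are real; the `decode` helper and the "?" placeholder for undefined digits are assumptions made for the example.

```python
# The five meanings given in the list above; digits 5-9 are left undefined.
DICTIONARY = {
    "0": "no",
    "1": "yes",
    "2": "life",
    "3": "maybe",
    "4": "meaning",
}

def decode(number):
    """Translate a string of digits into words, one digit at a time."""
    return " ".join(DICTIONARY.get(digit, "?") for digit in number)

print(decode("42"))  # meaning life
print(decode("02"))  # no life
print(decode("52"))  # ? life  (5 has no entry in the list above)
```

With single digits there are only 10 symbols, and treating every pair as its own entry gives at most 100, which is the ceiling mentioned above.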
We'd soon face the problem that exists with Asian languages; you'd have symbolic meanings for 52 5322 and 34 3042. If you don't fully comprehend which symbols can be combined with which, don't understand why certain terms combine, or simply have no clue what a certain symbol means, you'll end up writing and speaking total gibberish to everyone else.
And as the size of the dictionary grows, long strings of numbers become hard to read, since the symbols don't vary that much and you'd end up with strings maybe 7 or 8 digits long, which together form sentences.
Sure, you could learn such a system, but it would take years to master. You'd have to start learning in numbers in childhood in order to become totally fluent.
The account information line printed at the bottom of your typical credit card statement or utility bill is printed in a font known as OCR-A. Equipment for machine reading this type of font has been around for over 25 years, such as some of the old Banctec 4300 series workstations used for processing bill payments and checks. Even these 1970s era machines had better than a 95 percent read rate of the entire account information line, provided that the printing was clear and properly placed. Later machines, such as the NCR 7780 or the OPEX Eagle, can have better than a 99 percent read rate of a full line of characters. Again, the usual limitations on reliability of OCR characters are a result of poor or mislocated printing, or stray marks in the OCR field. Here is the obligatory Wikipedia link if you are interested in finding out a bit more about the history of Optical Character Recognition.
MICR fonts, which are those funny looking numbers printed in magnetic ink at the bottom of most checks, are designed to be human recognizable but machine readable, and have been around since the '60s. OCR-A typically beats MICR today, but a good MICR line is still readable over 95 percent of the time.
Handwritten characters are the most difficult to read. The technology to read handwritten numbers and letters has been available for over 10 years, but typical read rates for something like a handwritten zip code or the numerical amount written on a check range from 60 to 80 percent, and are slowly getting better. Again, a lot depends on how much care is taken when writing out the text, and what kind of background clutter is present.
As for me, I typed out school reports in 8th grade in 1973, when our family's word processing hardware consisted of a 1940's vintage Underwood typewriter. Even humans had difficulty decoding my handwriting!
I think you've hit the nail on the head. Handwriting recognition etc. is for the same set of people who prioritize their laptop choice with how cool it looks in a coffee shop. (That, and special contexts or the disabled, but I digress...)
Given the speed differences between typing and handwriting (even in non-computational contexts), I consider attempts to do handwriting recognition as a kludge.
The real solutions will come in the form of portable/projectable/virtual keyboards or an entirely new input method--as far as I am concerned, handwriting has fallen by the wayside for all but the most light-weight tasks.
Somewhat off topic, but there was a certain language that functioned like what you described, just not with numbers. It is called aUI (with that capitalization) and was created in the 50s by Prof. John Weilgart, a (bored) psychologist. The language is composed of 42 very simple ideographic "letters" that each have both a meaning and set pronunciation. The letters combine to form concepts that can be as simple or as complex as you want to make them, and the latest edition of his book (1979) has a dictionary of over 4000 words. It was made so that only the most general concepts (plus the numbers 0-10) would be classified as single letters, and I think this system works very well. I really suggest you check it out if you have any interest in languages or communication, but the information available online is somewhat limited. I was able to get his book, aUI, the Language of Space, through an interlibrary loan, but I am pretty sure it is long out of print. I really think this language has a much greater chance of being useful than anything based on numbers, and since it only uses very basic shapes (e.g. number shapes, a spiral, circle, oval, etc.) it could probably be recognized pretty easily by OCR systems, probably as well as or better than current print-letter recognition.
This is what I don't get: about a decade after the invention of the Newton, why use the machine's language when you can use a Newton and it can read very, very bad handwriting? I know people whose families couldn't read their writing, but their Newton could! It was based on learning from what you cross out. The only trick was that if anybody else used the thing, it very quickly unlearned the awful writing and the owner had a day of hell teaching it again.
I made a handwriting system a long time ago with the following goals in mind in designing it:
1. It should NOT be easily readable by a casual observer (for notes I didn't want other people to read).
2. The most commonly used letters should be the simplest to draw, so it should be fairly fast to write, like cursive.
3. Letters should be as unambiguous as possible, so even the most scribbled/hurried writing would be distinctly recognizable.
4. Each letter should hint at the original Latin letter to some degree, whenever possible, although goal #2 would usually take priority over this one when the two conflict.
5. A clear mid-height horizontal stroke should mark the beginning/end of each letter.
6. (just for fun) It should look kinda weird and cool in a sci-fi sort of way, so if someone came across my notes they would be kind of baffled =)
While #2 and #3 might work towards making this an easy-to-OCR handwriting system, #1 and #6 probably make that moot, at least for the system I made. However, I imagine it wouldn't be too hard to make a less obfuscated, more practical writing system that tries to accomplish goals similar to #2-4 above.
I made a font out of my handwriting system a few years ago. If anyone is curious, here is an image chart of the font. =)
I'm curious what other more "efficient" writing systems may exist out there (other than standard and cursive). Does anyone know of any others?
This can read them better than I can (check out the crazy examples)!
What I find annoying on my PocketPC is that as long as you only use US English, it performs reasonably well at recognizing my writing and guessing words, but my native language is Dutch. This gives a few problems:
- It tries to guess Dutch words using a US English dictionary, which is so much of a PITA that I switch off the entire dictionary function.
- Dutch has a few characters that aren't in the standard US character set. This leaves me "international" as the only other option, but that also contains a lot of characters I will never use, which only cause confusion for the OCR system.
- Next to that, I don't like that it forces you to learn its alphabet instead of it learning yours.
In short, I am very disappointed with my PocketPC, also because of some other limitations I was unaware of when I bought it (remove the battery and it forgets everything, coupled with an ActiveSync backup that doesn't work; I'm left-handed, which makes the user interface very awkward). I now have a Nokia Series 60 phone and prefer that.
We are using new forms at work to take advantage of ICR or Intelligent Character Recognition Software for our service reports. Each letter is entered into little boxes. The reports are then scanned, gaps are filled by a data entry clerk, then we are emailed an electronic "grade" by the software, based on the percentage of fields that were machine readable. With reasonable care, most of the guys can get the machine to resolve 80 to 90 percent of the fields. Of course it slows down how fast we can fill out our service reports, so in the end I wonder how much time they really save.
A purely constructed alphabet that would be easy for humans to write and easy for machines to read would involve a group of connected strokes.
/|
---
|\
| X |
|/ \|
---
From the 6 strokes here you have 64 total possible combinations. Discard the 24 that are disjoint and you've still got plenty for 26 letters and 10 numbers.
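For anyone who wants to check counts like these on their own stroke layout, here is a rough sketch. The 64 is just 2^6 (every stroke either present or absent, including the blank). How many combinations are disjoint depends on which strokes actually touch, which the diagram above doesn't pin down, so the `touches` list here is a made-up three-stroke chain, not the 6-stroke layout, and `connected_subsets` is a hypothetical helper name.

```python
from itertools import combinations

def connected_subsets(n, touches):
    """Return every non-empty subset of strokes 0..n-1 that forms one connected shape.

    `touches` lists the pairs of strokes that share an endpoint or cross.
    """
    neighbours = {i: set() for i in range(n)}
    for a, b in touches:
        neighbours[a].add(b)
        neighbours[b].add(a)
    result = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            # Walk from the first stroke, staying inside the chosen subset.
            stack = [subset[0]]
            reached = {subset[0]}
            while stack:
                stroke = stack.pop()
                for other in neighbours[stroke] & chosen:
                    if other not in reached:
                        reached.add(other)
                        stack.append(other)
            # Connected only if the walk reached every chosen stroke.
            if reached == chosen:
                result.append(subset)
    return result

# Made-up layout: three strokes in a chain (0 touches 1, 1 touches 2).
chain = connected_subsets(3, [(0, 1), (1, 2)])
print(len(chain))  # 6: only the pair (0, 2) is disjoint
print(2 ** 6)      # 64 on/off combinations for six strokes
```

Plugging in the real touching pairs for the six strokes would show how many usable symbols are left over for 26 letters and 10 numbers.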
As to an English-based alphabet, the problem is that so many letters are far too similar, especially b / h / k, i / j, rn / m, and that handwriting varies too much. Capital letters are an obsolete idea that only further complicates things.
The outdated nature of most written languages is mirrored in spoken alphabets. There is absolutely no reason for 'w' to have a 3 syllable name. I have encountered a number of people who say "www" as "dub dub dub", and I am considering spending a week or two training myself to permanently replace "double-u" with "dub" in my vocabulary (that is how long it took me to unlearn 20 years of tying my shoelaces wastefully and ingrain a better, faster way).