Search Engines for Handwritten Documents
An anonymous reader writes "Researchers at the University of Massachusetts have created a tool for automatically searching handwritten historical documents, such as the 140,000 pages that make up George Washington's personal papers in the Library of Congress. The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images."
In America, handwriting is only for old people.
The most interesting part is that the papers are scanned versions of the originals and the search tool actually recognizes the handwritten text from these images.
How else would it search handwritten documents? Am I missing something here?
Huh? Well, let's see how well it keeps up with my doctor's handwriting...
Somebody invented a way for computers to recognize handwriting.
Like, so 10 years ago.
No OCR is performed on the documents. The search tool operates on the image.
Fair is where you take your cow to be judged.
Wow, looking at some of those examples, I was amazed by the fact that I couldn't READ most of the words. It looks completely foreign to me; I might as well be trying to read Japanese.
How good is the accuracy? Today's OCR technology might not be able to recognize the "flowery" text of most historical documents (look at "We the People" in the Constitution).
got sig?
RTFA :) It actually looks pretty cool, the software is looking through the actual handwritten pages.
These documents are old and handwritten. Why waste processing power deciphering them on every search when you can decipher the text once with a similar algorithm and search an index built that way? It's not like the information is ever going to change. (Unless we do rewrite history.)
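A minimal Python sketch of the "decipher once, then search an index" idea, assuming each page has already been transcribed to plain text by some recognizer. The `pages` dictionary and the `tokenize`, `build_index` and `search` names are hypothetical illustrations, not part of the UMass tool. The expensive recognition step runs once; every later query is only set lookups.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]  # drop tokens that were pure punctuation

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index once: word -> set of page IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the page IDs that contain every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

if __name__ == "__main__":
    # Hypothetical transcriptions standing in for recognizer output.
    pages = {"p1": "Dear Sir, the Governor arrived.", "p2": "The army marched."}
    idx = build_index(pages)
    print(search(idx, "the governor"))  # {'p1'}
```

The trade-off is the one the comment points at: the index is only as good as the one-time transcription, so recognition errors get baked in rather than re-evaluated per query.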
Google already did it! Well, it's not handwritten, but that's just a logical progression.
such as the 140,000 [handwritten] pages that make up George Washington's personal papers in the Library of Congress.
In related news, the family of Tobias Lear, George Washington's personal secretary, who took his own life (arguably due to the horrible pain in his wrists), has filed suit.
I hate reading/producing anything longer than a post-it note that's in handwriting.
The owls are not what they seem
... eh eh !gniddik tsuJ. !skoobeton inciV ad eht no esool ti teL.
I took a lot of notes in college. I took a lot more notes in graduate school. I've even taken notes on books I've read for the fun of it. If I could run all of these through my scanner & search them from an application on my desktop, I could be really obnoxious in an argument.
Trying to use sarcasm in text-based forums does not work.
You have to be able to handle a quill pen to use it.
Sometimes seventeen/Syllables aren't enough to/Express a complete
It's an interesting approach that should be extended to languages other than English. Most of the world's history is not about the US, and it has certainly not been written down in English. What I would really like to have is a similar tool that can search, say, Greek or Latin (or whatever) handwritten text. Imagine being able to query Ovid for an item of interest without having to consult everything he's written. I can imagine that this might encourage people to study the classics (a pet peeve of mine is that many people lack historical sense...), and it would certainly facilitate research in this area.
If you can put the queries in English, with the search engine taking care of translation, it would be even better. Then, extended historical study comes within everyone's reach and the classical studies (or humaniora) might be transformed.
----- One learns to itch where one can scratch.
How pleafant that they've done what waf neceffary to make this happen. How did they train the foftware to recognize the quirky 18th Century handwriting?
And the brethren went away edified.
We could use it as a jobs program for monks. Their predecessors wrote the manuscripts, and now they could transcribe them into digital form...
A fine is a tax you pay for doing wrong and a tax is a fine you pay for doing all right.
Their handwriting recognition system doesn't work for shit. It couldn't even correctly retrieve results from words that I know are in its scanned letters. The word "governor" appears as a result from one of their suggested queries (*cough* hard coded results *cough*), but if you do a separate search for governor it returns stuff that doesn't even contain the word.
Any man who afflicts the human race with ideas must be prepared to see them misunderstood. -- H. L. Mencken
It's "Pixelative Text Cognizance."
It's different. With OCR these rays of light scan the original, translate each scanpoint to discrete RGB values, and do pattern recognition.
With this system, they just read the discrete RGB values directly from pixels of documents scanned in with rays of light, then they do recognition of patterns. See, it's totally different.
They aren't doing OCR
Yes, they are. They are not using an off-the-shelf OCR package. The OCR functionality is embedded into their software, it is highly specialized, but it is OCR. For those who are fixated on the letter 'C', recognizing multiple characters as a single unit is nothing new.
Convert the search text into an image to look as written by hand.
Then do an image search on the documents. You will need powerful image recognition software.
This would be news.
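One generic way to "search the image" without recognizing individual characters is word spotting: reduce each word image to a feature sequence and compare sequences with a distance that tolerates differences in writing width and speed. The Python sketch below is an illustrative assumption, not necessarily what the UMass tool does. It uses the fraction of ink per column as its only feature and dynamic time warping (DTW) as the distance, on tiny binary images given as lists of 0/1 rows; `column_profile`, `dtw_distance` and `rank_matches` are hypothetical names. Rendering the query in a handwriting-like font, as suggested above, would supply `query_img`.

```python
def column_profile(img: list[list[int]]) -> list[float]:
    """Fraction of ink pixels (1s) in each column of a binary word image."""
    rows = len(img)
    return [sum(row[c] for row in img) / rows for c in range(len(img[0]))]

def dtw_distance(a: list[float], b: list[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences,
    normalized by the combined length so long and short words are comparable."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)

def rank_matches(query_img: list[list[int]],
                 word_imgs: dict[str, list[list[int]]]) -> list[str]:
    """Return word-image IDs ordered from most to least similar to the query."""
    q = column_profile(query_img)
    return sorted(word_imgs, key=lambda k: dtw_distance(q, column_profile(word_imgs[k])))

if __name__ == "__main__":
    query = [[1, 0, 1], [1, 0, 1]]
    words = {
        "wide": [[1, 1, 0, 1], [1, 1, 0, 1]],  # same pattern, written wider
        "other": [[0, 1, 0], [0, 1, 0]],       # different pattern
    }
    print(rank_matches(query, words))  # ['wide', 'other']
```

DTW lets the wider copy align with the query at zero cost by repeating a column, which is why the stretched word ranks first even though its image is a different size.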
No OCR is performed on the documents. The search tool operates on the image
The search tool is doing the OCR then. OCR is simply taking an image and analyzing it to recognize text.
If only Nicolas Cage had this tool at his disposal, it would have made things much, much easier.
Holy shnikes! Optical Character Recognition! Bah... I'm part of a research team at the Center for Cybermedia Research that is working on new algorithms for OCR with $4 million from Homeland Security. It's to be used on a ginormous database containing scanned images of documents relating to Yucca Mountain.
On top of that, OCR has been around for years. Yes, it isn't the best, but it's functional. Doesn't the Census Bureau use OCR for its census forms?
So, yeah.. where is the news in the article?
it has been universally agreed that the most fundamental unit of information capacity in computer science, the "Library of Congress"
Really? Whatever happened to the bit???????
Video Production Support
Somebody invented a way for computers to recognize handwriting. Like, so 10 years ago.
I worked on an OCR system about 20 years ago. No pre-defined bitmaps of text, you trained the system on the font to be recognized. After a few hours you could turn it loose and it did fairly well. While goofing off we tried handwritten text. With good penmanship it worked to a degree.
No, OCR stands for Optical Character Recognition. This is Digital Character Recognition on an Optically Acquired Digital Image. Don't you see the difference?
For sure it would cost five times as much, and need a more complicated algorithm, if it were used to search doctors' handwriting.
Danny Dunn and the Homework Machine.
This is really, really, really, really stupid. It would be faster just to hand-type the documents into the database and then search it; you could link to pictures of the documents if you really needed them.
There is no sig
I've been using this feature in OneNote for a long time now. It searches through my handwriting with amazing accuracy.
They already have a full time job. Praying.
Great. Now people are just going to spoof documents and put pr0n or enlargement spam in the PDFs when I search for anything academic. I'm glad I don't have that problem finding PDF papers via Google yet.
I think 3rd or 4th grade is maybe the last time you have to use cursive. I do, however, highly recommend giving your kids touch-typing classes, so that they, too, can keyboard with fluidity (and rapidly lose their writing skills too).
For me, it is a speed issue: I can type MUCH faster than I can write, so when I have a lot to do, typing on a computer is the way to go (plus, I can't live without speelcheking).
That said, I do agree with others that sometimes pen and paper is the right way to go. For me, that means pen and composition books that I scribble in on a daily basis to keep track of what I was doing, and when (I am a software consultant). There is nothing faster than flipping through a comp book with dates on every page to see what I was doing on, say, August 10 (testing an OSD application). However, the penmanship in those notes is really bad, and anything I learn and jot down I do type into at least a plain text document so that I can search for it later (and have it be legible when I find it)!
Heh, what about shorthand? My mother used to write in shorthand whenever she wanted to write notes to herself that no one else in the family could read.
Finally - how many of you have even tried to type, on a manual typewriter (if you can find one) lately? I learned on one, and was a speed demon, back in the day. Now, after years of these soft-touch keyboards, I tried punching a few keys on a manual and had a hard time making marks on the paper. Sheesh, you really need to whack those things. Good Riddance!
This issue is a bit more complicated than you think.
That's my point.
The only real threat is fire, and it is no more dangerous than it is to CDs or hard drives.
Go back and look at some old notebooks - if they used acid-based paper, then they'll be getting rather fragile.
Although it is hard to OCR text, and very hard to OCR the cursive text of historical documents, performing searches on those documents does not require a complete comprehension of the text and is therefore much easier to do.
For instance, the software may be unable to distinguish the word "bug" from "dog" in one person's handwriting, but it can still mark the word with probabilities for each of its possible readings.
If a person later searches for "bug" or "dog" along with other terms, the likelihood of a match can be calculated, and the searcher can make his/her own judgement about the meaning of the text.
---
Conrad Barski
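The probabilistic matching described in that comment can be sketched in a few lines of Python. The data layout is an assumption for illustration: each page is a list of word positions, and each position holds the recognizer's candidate readings with their probabilities; `expected_count` and `rank` are hypothetical names. A page's score for a term is the expected number of times the term occurs, and a multi-word query multiplies those scores, so a page must plausibly contain every term to rank at all.

```python
from math import prod

# Hypothetical layout: a page is a list of positions; each position maps
# candidate readings to the recognizer's probability for that reading.
Page = list[dict[str, float]]

def expected_count(page: Page, term: str) -> float:
    """Expected number of occurrences of `term` on the page."""
    return sum(position.get(term, 0.0) for position in page)

def rank(pages: dict[str, Page], query: str) -> list[str]:
    """Rank page IDs by the product of expected counts of all query terms.
    Pages with a zero score (some term cannot occur) are left out."""
    terms = query.lower().split()
    if not terms:
        return []
    scores = {pid: prod(expected_count(p, t) for t in terms) for pid, p in pages.items()}
    hits = [pid for pid, s in scores.items() if s > 0]
    return sorted(hits, key=lambda pid: scores[pid], reverse=True)

if __name__ == "__main__":
    pages = {
        "A": [{"bug": 0.6, "dog": 0.4}, {"the": 1.0}],  # ambiguous bug/dog
        "B": [{"dog": 0.9, "dig": 0.1}],
    }
    print(rank(pages, "dog"))  # ['B', 'A']
    print(rank(pages, "bug"))  # ['A']
```

Page A is returned for both "bug" and "dog", just ranked by how confident the recognizer was, which leaves the final judgement to the searcher as the comment suggests.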
Can it make sense of square roots?
Matthew Leung
Though it may not seem important to most of us who are used to Microsoft Word, the search engine for handwritten documents is important for two reasons: 1) It is an innovation in computer science, and this technology may have applications elsewhere. 2) There are old documents that are handwritten, and it is not practical to create typed versions of them. This is an inexpensive method of creating easy access to those documents on the Internet.
The difference might become more obvious when it is applied to the kind of digital authentication common on porn sites and free email providers.
You know, the little images of a couple of chars you have to type over to 'prove' you're human
I wonder what the next step in anti-bot techniques will be, now that this last hurdle seems to have been cleared as well...
According to this search, the famous Patrick Henry was noted to have actually said "Give me Liberty, or eat up martha!"
Of course, you use a ballpoint pen for lab notebooks, not fountain pens or other pens based on water-soluble inks. Of course, this won't help you if you spill vodka. :-)
Anyway, in lab situations you might not have a place nearby to put a laptop and you might be running between different laboratories so a laptop is often not very convenient. I was taught that you should write observations directly in a notebook instead of waiting and writing them down later. Moreover, you are not supposed to change the notes once they are written down, a temptation that might be hard to resist if the notes are in a computer file.
Avantslash: low-bandwidth mobile slashdot.
I for one enjoy the "guessing game." I find it rewarding to be able to read something without having to recognize everything letter-by-letter (the context should tell you what letter it had to be, unless it's a proper name).
Apparently, you haven't done one of those exercises they give in, er, high school publication classes... Well, it has to do with how one shouldn't write everything in capitals, because most people do not read (i.e. understand) a piece of writing or a sentence by recognizing everything letter-by-letter. People learn to recognize blocks of letters based on their relative sizes/positions (using that, it is possible to guess what the sentence/word was even if someone were to color all the letter-blocks black... of course, this doesn't work with Courier).
BTW, I guess this does lead to some sloppiness--I use cursive when I'm not sure how to spell a word exactly, because I know that most people able to read cursive do not rely on getting the exact spelling. ;)
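The "blocks of letters" effect can be illustrated with a crude word-shape code: classify each lowercase letter as having an ascender, a descender, or neither, and treat capitals as full height. The letter classes are a simplification chosen for illustration (they ignore the dots on i and j, letter widths, and capitals that descend), and `word_shape` and `group_by_shape` are hypothetical helpers. Note that "bug" and "dog" get the same outline, which is exactly where context has to do the rest.

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def word_shape(word: str) -> str:
    """Map a word to its outline: 'A' = tall (ascender or capital),
    'D' = descender, 'x' = x-height letter or anything else."""
    shape = []
    for ch in word:
        if ch.isupper() or ch in ASCENDERS:
            shape.append("A")
        elif ch in DESCENDERS:
            shape.append("D")
        else:
            shape.append("x")
    return "".join(shape)

def group_by_shape(words: list[str]) -> dict[str, list[str]]:
    """Group words that look alike when only their outlines are visible."""
    groups: dict[str, list[str]] = {}
    for w in words:
        groups.setdefault(word_shape(w), []).append(w)
    return groups

if __name__ == "__main__":
    print(word_shape("dog"), word_shape("bug"))  # AxD AxD
    print(group_by_shape(["dog", "bug", "cat"]))  # {'AxD': ['dog', 'bug'], 'xxA': ['cat']}
```

Writing everything in capitals turns every word into a run of "A"s, so the outline carries no information; that is the point made about all-caps text.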
"Do you think that OCR is actually the wrong way to think about this problem? After all, we don't really care about characters, but rather about what words and ideas have been written. Do you have a strong background in pattern recognition, machine learning, image processing and computer graphics? Google currently "reads" almost every web page in the world. Come help us read all the printed material as well!"
:)
Requires MS/PhD in CS/EE. Position available only in Mountain View.
http://www.google.com/jobs/eng/sw.html#ocre
(Note: I don't work for Google -- just thought someone on this thread would like
Corollary to Moore's Law: The IQ of new computer owners is declining.
For those with strong views (one way or another) about handwriting, please let me know what you think about the information on this web-page:
Somehow, the URL I mentioned didn't come through. So, again ...
http://www.global2000.net/handwritingrepair
Comparing a language to an operating system is quite ridiculous.
The user interface of an operating system is the language through which users interact with a computer. Its ABI is the language through which users teach a computer to do tasks. Its driver model is the language through which users teach a computer to interact with their devices.
To some experienced Windows users, learning to maintain a GNU/Linux system is like learning another language.