Fill Out CAPTCHAs, Digitize Books At The Same Time
alphadogg wrote with a link to a Networld article about a noble endeavor: putting CAPTCHAs to work for the good of humanity. A scientist at Carnegie Mellon is looking to create a new type of security check that will assist in a project meant to digitize and make searchable text from books and printed materials. Above and beyond that, the offering would probably be more secure than most current systems. "Instead of requiring visitors to retype random numbers and letters, they would retype text that otherwise is difficult for the optical character recognition systems to decipher when being used to digitize books and other printed materials. The translated text would then go toward the digitization of the printed material on behalf of the Internet Archive project."
What's the point here? If they can obfuscate the word, they know the word. What am I missing?
That was my thought. I suppose you could let the first five people through automatically, then use their answers to check everyone else; but what's the point of a CAPTCHA that lets a certain minimum portion through?
Turning people away when they actually got it right is worse, though; that way you potentially lose customers in trying to fight spam.
Seems like an interesting idea, but I don't see how it can work...
The typed-in text has to be verified against some known text, so wouldn't the source material already have to be known in order to verify that the CAPTCHA is correct? If the source text is already known, then this process doesn't seem to accomplish anything. Perhaps I'm missing the point.
"But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct."
http://recaptcha.net/learnmore.html
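For anyone who wants the quoted scheme spelled out, here is a rough sketch. It is only an illustration of the idea described above, not reCAPTCHA's actual code: the names `check_challenge`, `pending_votes`, and `normalize`, and the case-insensitive comparison, are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical store: guesses collected so far for each unknown word image.
pending_votes = defaultdict(list)


def normalize(text):
    """Compare answers case-insensitively, ignoring outer whitespace (an assumption)."""
    return text.strip().lower()


def check_challenge(control_answer, unknown_image_id, typed_control, typed_unknown):
    """Pass the user if they read the known (control) word correctly.

    Only then is their reading of the unknown word recorded as one guess.
    """
    if normalize(typed_control) != normalize(control_answer):
        # Failed the known word: reject, and discard the other answer too.
        return False
    pending_votes[unknown_image_id].append(normalize(typed_unknown))
    return True
```

So the server never has to know the unknown word in order to let someone in. It checks only the control word, and treats the other answer as a guess to be confirmed later by other users.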
Exactly, it doesn't make sense to automatically "trust" a user's entire response based on one correct word.
Why not show the same image to multiple users, and assume the response is correct only if two or more of them concur? This has the effect of doubling (or tripling) the effort required to solve, but gives you at least some verifiability. Sort of like using Google as a spell-checker.
Of course, you'd still have to fall back on the "one correct word" idea for verification when the user makes the entry, but in terms of adding text to the database, some statistical verification would be a good thing...
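The "two or more concur" rule is easy to sketch. This is a minimal illustration under assumptions, not anything the project has published: the function name `consensus` and the `min_agree` threshold are hypothetical, and guesses are assumed to be already normalized strings.

```python
from collections import Counter


def consensus(guesses, min_agree=2):
    """Return the most common guess if at least min_agree users typed it, else None.

    Ties go to whichever guess was seen first (a simplification for this sketch).
    """
    if not guesses:
        return None
    word, count = Counter(guesses).most_common(1)[0]
    return word if count >= min_agree else None
```

With `min_agree=2`, `consensus(["upon", "up0n", "upon"])` returns `"upon"`, while a single unconfirmed guess returns `None` and the image would keep being shown to more users.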
I thought the point of CAPTCHAs was to compare what a user types with information stored on the hosting server. If the hosting server doesn't know what the book says, then how can it validate the CAPTCHA?
Unlike porn, which yada yada rimshot hey-ooh!