reCAPTCHA Hard At Work, Rescuing Fading Texts
sciencehabit writes "Computer scientists have developed a program, called reCAPTCHA, which is being used in lieu of CAPTCHA by several sites, to help digitize old books and newspapers. The reCAPTCHA takes entries from old and faded texts that optical scanners and digital-text readers have trouble with. So every time you solve that string of crooked letters, you may actually be helping historians digitally reconstruct a page from the 1908 New York Times." The Science Now story links to the longer and more informative article at Ars Technica. (We last mentioned this program last year — and now it's good to get some sense of how well it's working.)
I am almost certain that it is not there in its entirety. There are bits that are not online specifically because of OCR errors. That is going to be true of any large volume of OCRed text.
Because that's so different than the thousands of useless geeks wasting their time on /.
I would imagine that they use multiple logins to verify one word - it's not like people don't mistype captchas in the first place.
Both words are from 'real old text'. You won't have any effect on the data output by putting 'penis' because more people will type the correct word.
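To make the "more people will type the correct word" point concrete, here is a rough sketch of majority voting over several people's answers for the same scanned word. This is only an illustration of the idea discussed in this thread, not reCAPTCHA's actual implementation: the function name `consensus_word` and the `min_votes` and `min_share` thresholds are made up for the example.

```python
from collections import Counter
from typing import List, Optional


def consensus_word(answers: List[str], min_votes: int = 3,
                   min_share: float = 0.6) -> Optional[str]:
    """Return the agreed transcription, or None if there is no clear winner.

    Hypothetical helper: the thresholds are assumptions chosen for this
    example, not values taken from reCAPTCHA.
    """
    # Normalize so differences in case or stray spaces don't split the vote.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if not cleaned:
        return None

    # Pick the most common answer and check that it is backed by enough votes.
    word, votes = Counter(cleaned).most_common(1)[0]
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None


# One joker and one typo are outvoted by four matching answers.
answers = ["morning", "morning", "penis", "Morning ", "mornirg", "morning"]
print(consensus_word(answers))
```

With too few answers, or with no answer holding a clear majority, the function returns None, which in a scheme like this would simply mean the word gets shown to more people before anything is accepted.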