Slashdot Mirror


reCAPTCHA Hard At Work, Rescuing Fading Texts

sciencehabit writes "Computer scientists have developed a program, called reCAPTCHA, which is being used in lieu of CAPTCHA by several sites, to help digitize old books and newspapers. The reCAPTCHA takes entries from old and faded texts that optical scanners and digital-text readers have trouble with. So every time you solve that string of crooked letters, you may actually be helping historians digitally reconstruct a page from the 1908 New York Times." The Science Now story links to the longer and more informative article at Ars Technica. (We last mentioned this program last year; now it's good to get some sense of how well it's working.)

4 of 112 comments

  1. Re:Huh? 1908 New York Times? by FlyingSquidStudios · · Score: 2, Insightful

    I am almost certain that it is not all there in its entirety. There are bits that are not online specifically because of OCR errors. That is going to be true with any large volume of OCRed text.

  2. Re:Not new by Tassach · · Score: 2, Insightful

    Because that's so different than the thousands of useless geeks wasting their time on /.

    --
    Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?

  3. Re:Not new by Kjella · · Score: 2, Insightful

    I would imagine that they use multiple logins to verify one word - it's not like people don't mistype captchas in the first place.

    --
    Live today, because you never know what tomorrow brings

  4. Re:Validate your data, guys! by Anonymous Coward · · Score: 1, Insightful

    Both words are from 'real old text'. You won't have any effect on the data output by putting 'penis' because more people will type the correct word.
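
The idea raised in comments 3 and 4 (several people see the same word, and the most common answer wins) can be sketched as a simple majority vote. This is only an illustration of that idea, not reCAPTCHA's actual implementation: the function name `consensus_word` and the `min_votes` and `min_share` thresholds are assumptions made up for this example.

```python
from collections import Counter


def consensus_word(answers, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if there is no clear consensus.

    answers   -- strings typed by different users for the same scanned word
    min_votes -- hypothetical minimum number of non-empty answers required
    min_share -- hypothetical fraction of answers that must agree
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough answers collected yet

    # Pick the most frequent answer and check that enough users agree on it.
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_share:
        return word
    return None


# One prankster is outvoted by three honest users.
print(consensus_word(["morning", "morning", "penis", "Morning "]))
```

With thresholds like these, a single bogus or mistyped entry does not change the result, and a word with no clear majority simply stays undecided until more answers arrive.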