Slashdot Mirror


Google Buys reCAPTCHA For Better Book Scanning

TimmyC writes "This story may interest the Slashdot folk, many of whom use the reCAPTCHA anti-spam service. Well, reCAPTCHA is now owned by Google. Apparently, what attracted Google to reCAPTCHA is that reCAPTCHA has linked its core authentication service with efforts to digitize printed books and periodicals. The search giant has a massive (and controversial) effort underway in that area for its Google Books and Google News Archive services. Every time people solve a CAPTCHA from the company, they are also, as a byproduct, helping to turn scanned words into plain text that can be indexed and made searchable by search engines. Interesting times indeed."
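How solving a CAPTCHA can also transcribe text is easiest to see with the two-word design reCAPTCHA is commonly described as using. One word is a control word whose answer the system already knows. The other is a scanned word the OCR could not read. The following is a minimal sketch under that assumption, not reCAPTCHA's actual code. The class name `WordDigitizer`, the case-insensitive comparison, and the `MIN_AGREEING_VOTES` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical threshold: agreeing votes needed before an unknown word is accepted.
MIN_AGREEING_VOTES = 3


@dataclass
class WordDigitizer:
    """Hypothetical sketch of a two-word CAPTCHA that doubles as OCR.

    The control word (known answer) decides whether the user passes.
    The answer for the unknown word is only recorded as a vote.
    """
    known_answers: dict  # control word id -> known text
    votes: dict = field(default_factory=lambda: defaultdict(Counter))
    transcribed: dict = field(default_factory=dict)  # unknown id -> accepted text

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes; only then count the unknown-word vote."""
        expected = self.known_answers[control_id]
        if control_answer.strip().lower() != expected.lower():
            return False  # failed the control word: reject and discard the other answer
        guess = unknown_answer.strip().lower()
        self.votes[unknown_id][guess] += 1
        text, count = self.votes[unknown_id].most_common(1)[0]
        if count >= MIN_AGREEING_VOTES:
            self.transcribed[unknown_id] = text  # enough humans agree: accept it
        return True  # passes no matter what was typed for the unknown word
```

In this sketch, access depends only on the control word, so a user can type anything for the unknown word and still get through. A nonsense answer just becomes a vote that rarely matches anyone else's, and it never reaches the threshold.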

2 of 138 comments

  1. WTF Summary by afxgrin · · Score: 0, Troll

    How does solving a CAPTCHA help the database? That doesn't make ANY sense at all: a CAPTCHA has to be solved beforehand so the system can check that the user typed the correct word. You can't just type any random word into the CAPTCHA input box and have it let you through!

    Heh, I can just see these spammers trying to adapt an OCR system for CAPTCHA breaking, and suddenly realizing they can just type in any word.

    1. Re:WTF Summary by melikamp · · Score: 0, Troll

      I must say this system is ingenious.

      I respectfully disagree. I hate CAPTCHA because it discriminates against AI. Instead, Web-based systems should be designed to accommodate AI participants. I hate reCAPTCHA even more because it is even more annoying and I have no idea who I am working for. I always intentionally smash the keyboard with my palm for the second word. I think that tricking people into working for you is by far the least decent way of distributing this process. It would be better to have an "OCR box" which has nothing to do with CAPTCHA and is known to be part of a copyleft or public domain project, like Wikimedia. As others have suggested, it should display single sentences or sentence fragments so that the reader can use the context. And it should be completely unrelated to CAPTCHA, which is simply a discriminatory practice and, as such, unethical.