Slashdot Mirror


Google Buys reCAPTCHA For Better Book Scanning

TimmyC writes "This story may interest the Slashdot folk, many of whom use the reCAPTCHA anti-spam service. Well, reCAPTCHA is now owned by Google. Apparently, what attracted Google to reCAPTCHA is that the company has linked its core authentication service with efforts to digitize print books and periodicals. The search giant has a massive (and controversial) effort underway in that area for its Google Books and Google News Archive services. Every time people solve a CAPTCHA from the company, they are also, as a byproduct, helping to turn scanned words into plain text that can be indexed and made searchable by search engines. Interesting times indeed."

1 of 138 comments

  1. Re:WTF Summary by Rich0 · Score: 0, Offtopic

    Maybe next election when there's hanging chads they can use that as a captcha.

    It would certainly be a lot more fair than the current process - in which a bunch of cronies each interpret the results to their preferred candidate's advantage and then a judge settles it.

    Of course, the better solution is to not have such ambiguity in the first place.

    If you wanted to implement a system for interpreting analog votes here is what I'd do:

    1. All ambiguous votes are digitized. Of course, the definition of "ambiguous" is itself ambiguous - if somebody solidly fills in one circle and leaves one dot in another, is that ambiguous? What constitutes a stray mark vs a double-vote? I guess you could err on the side of caution, or maybe put all votes through the digitizer.

    2. The digitizer chops up each vote into individual boxes and then presents them to a user in random order. For example, if the Gore box is on the left on the ballot, it could be on the left or on the right in the presented ballot.

    3. The human interprets the vote. They have no cues to actually determine who the vote is for - just whether a given box was selected.

    4. Each vote is given to sufficient numbers of people that a high-confidence vote can be selected. If you get 3 people who agree then maybe that's enough. If you get any disagreements maybe you keep asking for opinions until one response has a significant margin. Maybe votes are tossed entirely at some threshold.
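    The agreement rule in step 4 can be sketched roughly as follows. This is only an illustration: the function name resolve_box and the thresholds min_reviews, margin and max_reviews are assumptions, since the comment leaves the exact numbers open ("maybe").

```python
from collections import Counter
from typing import Iterable, Optional


def resolve_box(
    judgments: Iterable[bool],
    min_reviews: int = 3,   # assumed: at least 3 opinions before deciding
    margin: int = 3,        # assumed: lead required for a confident reading
    max_reviews: int = 9,   # assumed: give up (toss the box) after this many
) -> Optional[bool]:
    """Return True (marked), False (unmarked), or None (tossed).

    `judgments` yields one reviewer's opinion at a time: True if the
    reviewer thinks the box was selected, False otherwise.
    """
    counts = Counter()
    for seen, marked in enumerate(judgments, start=1):
        counts[marked] += 1
        lead = abs(counts[True] - counts[False])
        # Stop as soon as one reading is far enough ahead.
        if seen >= min_reviews and lead >= margin:
            return counts[True] > counts[False]
        # Too many opinions without a clear winner: toss the box.
        if seen >= max_reviews:
            return None
    # Ran out of reviewers before reaching a confident reading.
    return None
```

    With these assumed defaults, three reviewers who agree settle a box immediately, one dissent means more opinions are requested, and a box that is still contested after nine opinions is tossed.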

    The key is that those looking at ballots should not be able to tell which boxes correspond to which candidates. That will eliminate the bias from the system.
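    One way to picture the blinding is a step that replaces each candidate name with an opaque token before any human sees the box. The sketch below is hypothetical: blind_boxes, the token format and the "crop" payload are illustrative names, not part of any real system.

```python
import random
import secrets


def blind_boxes(ballot_id, boxes, rng=None):
    """Split one ballot into anonymous review tasks.

    `boxes` maps a candidate name to the cropped image of that
    candidate's box (any opaque payload works for this sketch).
    Returns (tasks, key): `tasks` is what reviewers see, `key` is the
    private lookup used afterwards to map tokens back to candidates.
    """
    rng = rng or random.SystemRandom()
    tasks = []
    key = {}
    for candidate, crop in boxes.items():
        token = secrets.token_hex(8)  # random ID, carries no candidate info
        key[token] = (ballot_id, candidate)
        tasks.append({"token": token, "crop": crop})
    # Randomize order so position on the ballot gives nothing away.
    rng.shuffle(tasks)
    return tasks, key
```

    Reviewers would only ever receive the tasks list, while the key stays with whoever tallies the results. In practice the tasks from many ballots would presumably be pooled and shuffled together as well.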

    Again, in my opinion computers should generate human-readable ballots - so that the computer validates the ballot BEFORE the voter submits it. No issue with stray marks if there are no pencils in the room.