Slashdot Mirror


Google Buys reCAPTCHA For Better Book Scanning

TimmyC writes "This story may interest the Slashdot folk, many of whom use the reCAPTCHA anti-spam service. Well, reCAPTCHA is now owned by Google. Apparently, what attracted Google to reCAPTCHA is that the company has linked its core authentication service with efforts to digitize print books and periodicals. The search giant has a massive (and controversial) effort underway in that area for its Google Books and Google News Archive services. Every time people solve a CAPTCHA from the company, they are also, as a byproduct, helping to turn scanned words into plain text that can be indexed and made searchable by search engines. Interesting times indeed."

10 of 138 comments

  1. Well... by vikhyat · · Score: 4, Interesting

    This should improve Google's indecipherable CAPTCHA.

  2. Re:WTF Summary by duguk · · Score: 5, Informative

    You're asked to enter TWO words: one known, one not.

    From recaptcha.net:
    But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
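
    A minimal Python sketch of the scheme quoted above, not reCAPTCHA's actual code. The class name, method names, and both thresholds are hypothetical; the quoted text only says that the new word goes to "a number of other people" and gives no figures.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds; the quoted text does not give the real numbers.
MIN_VOTES = 3         # answers needed before an unknown word can be accepted
MIN_AGREEMENT = 0.75  # share of the votes the most common answer must hold


def normalize(text):
    """Compare answers without regard to case or surrounding whitespace."""
    return text.strip().lower()


class TwoWordCaptcha:
    """Pairs a known (control) word with a word that OCR could not read."""

    def __init__(self):
        # unknown word id -> tally of answers from users who passed the control word
        self.votes = defaultdict(Counter)

    def submit(self, control_truth, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes; record their guess for the unknown word."""
        if normalize(control_answer) != normalize(control_truth):
            # Failed the known word: reject, and do not trust the other guess.
            return False
        self.votes[unknown_id][normalize(unknown_answer)] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed transcription, or None while confidence is too low."""
        counts = self.votes[unknown_id]
        total = sum(counts.values())
        if total < MIN_VOTES:
            return None
        answer, top = counts.most_common(1)[0]
        return answer if top / total >= MIN_AGREEMENT else None
```

    In this sketch a user passes or fails on the control word alone, so a wrong guess on the unknown word costs them nothing; it only delays agreement on that word.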

  3. Re:Why just words? by Canazza · · Score: 4, Insightful

    No, they don't. I was transferring flights at London Heathrow and there was only one window open, and a massive queue. I got to the front and found the woman at the computer typing with one finger... ONE FINGER, not even one on each hand, one feking finger. This was someone who was supposedly trained to do this job, and she couldn't even touch-type.
    I know a lot of people who still have to look at the keys when they type, and while they're generally faster than that bint, it's still painfully slow.
    Not to mention children: kids can be fast learners when it comes to touch typing, but before they get the hang of it, they can be very slow too.

    --
    It pays to be obvious, especially if you have a reputation for being subtle.
  4. I hope they have a couple of tests! by NoYob · · Score: 4, Funny

    As I get older, I find that I'm having a harder time reading from computer monitors and especially captchas. I confuse words all the time. For example: erection with election. Not so bad, but if Google doesn't pass that unknown to multiple folks, it could get embarrassing. Text from a Bill Clinton bio:

    After Bill Clinton's first erection as President, he proceeded .....

    --
    It's NOT me! It's the meds! I'm on 1000mg of Fukitol.
  5. Re:WTF Summary by iamhassi · · Score: 4, Insightful

    "Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. "

    That explains why half the time I can't even read the word. I swear every time I reach a captcha I have to refresh it 5x before I finally land on two words I can read.

    I must say this system is ingenious. Distributed OCR: let millions of internet users figure out what the words are. Maybe next election when there are hanging chads they can use that as a captcha.

    --
    my karma will be here long after I'm gone
  6. Re:WTF Summary by Sockatume · · Score: 4, Informative

    The best part is, it automatically selects for words which are invulnerable to OCR-based attacks. And if the user's presented with an illegible scanned CAPTCHA, they aren't penalised for getting it wrong.

    --
    No kidding!!! What do you say at this point?
  7. Re:WTF Summary by Anonymous Coward · · Score: 5, Funny

    "Hey everyone, let's all sit refreshing the google gmail account creation page, and always type "boobs" for the second captcha value..."

  8. Re:Won't this eventually defeat the purpose? by slim · · Score: 5, Insightful

    What you get in the CAPTCHA is the scanned word, plus some warping and obfuscation. Therefore, if OCR advances to the point where it has no trouble with the original scan, it would still have trouble with the CAPTCHA.

    Spammers already have a neat way around CAPTCHAs -- they proxy them to people on porn and warez sites. If you ever fill in a CAPTCHA on such a site, you're probably helping a spambot out.

  9. reCAPTCHA is awesome by Thaelon · · Score: 5, Funny

    I have to say, reCAPTCHA is one of the most elegant solutions I've ever seen to a problem.

    It's not even killing two birds with one stone, it's killing two birds with one of the birds.

    --
    Question everything

  10. Re:Mod up by mrcaseyj · · Score: 5, Interesting

    I agree that the idea is ingenious. But on the only one I ran into, the word was completely indecipherable. I don't mean that it was really hard, I mean that it was a word so thoroughly mangled that it was clearly impossible to read by anyone, especially without context. The lack of context is one of the big weaknesses of the system. When a word is unclear, it's the words around it that give critical clues to what it is.