Google Buys reCAPTCHA For Better Book Scanning
TimmyC writes "This story may interest the Slashdot folk, many of whom use the reCAPTCHA anti-spam service. Well, reCAPTCHA is now owned by Google. Apparently, what attracted Google to reCAPTCHA is that the company has linked its core authentication service with efforts to digitize print books and periodicals. The search giant has a massive (and controversial) effort underway in that area for its Google Books and Google News Archive services. Every time people solve a CAPTCHA from the company, they are also, as a byproduct, helping to turn scanned words into plain text that can be indexed and made searchable by search engines. Interesting times indeed."
How slow is searching the internet going to be if you have to fill out a stupid obscured word each time?!
This should improve Google's indecipherable CAPTCHA.
How does solving a captcha help the database? That doesn't make ANY sense at all - a captcha needs to be solved beforehand to make sure that the user enters the correct word. You don't just type any random word into the captcha input box and get let through!
Heh I can just see these spamming guys trying to modify an OCR system for captcha breaking, and suddenly realizing they can just input any word.
I suppose most people write fast enough to allow sentence captchas already.
Why didn't they just spend the money on improving their character recognition AI? Ultimately, they will end up having an AI that defeats the purpose of this company anyways...
Here's to the prospect, for those of us who don't permit random web sites to run code on our computers, of yet more JavaScript-dependent captchas to manually hack through.
In related (and more important) news, Mozilla at last has a working 64-bit JIT for TraceMonkey.
Check out this Google book.... about the 7th page down.
http://www.google.com/books?id=Y0OOlnDFUM8C&printsec=frontcover&dq=Le+Morte+d'Arthur&as_brr=1#v=onepage&q=&f=false
I thought these were scanned in by robots? If so, it looks like the robot has well-kept fingernails.
ReCAPTCHA is a free service that usually integrates into forums, blogs, and other such anonymous comment-posting services to help eliminate bot spamming. I think they will not use it on Google search pages, but exploit ReCAPTCHA users of all of those sites that do use it already. Sounds to me like a really good idea...
I'm interested though how they are going to know what a correct entry by a user would be for a scanned word in order to validate it if they only have a scan...
The interface uses two words: one which is verified and one which isn't. Assuming the first one is typed in correctly, they present the second to a bunch of people until they get a consensus (three the same, I think) and then it goes in the "verified" pile. Thus, even if the second word's not verified yet, a spammer will still get caught out by the other one.
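For anyone who wants to see that scheme spelled out, here is a minimal sketch in Python. The class name, method name, word IDs, and the threshold of three matching answers are all assumptions for illustration (the threshold just follows the "three the same, I think" guess above); this is not reCAPTCHA's actual code.

```python
from collections import Counter, defaultdict

CONSENSUS = 3  # assumed threshold, per the "three the same, I think" guess


class TwoWordCaptcha:
    """Toy model: one known (control) word plus one unknown scanned word."""

    def __init__(self):
        self.votes = defaultdict(Counter)  # unknown word id -> answer tallies
        self.verified = {}                 # unknown word id -> agreed text

    def submit(self, control_text, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes; record a vote for the unknown word."""
        # Only the control word can be checked, so it alone decides pass/fail.
        if control_answer.strip().lower() != control_text.lower():
            return False
        if unknown_id not in self.verified:
            guess = unknown_answer.strip().lower()
            self.votes[unknown_id][guess] += 1
            # Promote the word once enough users agree on the same reading.
            if self.votes[unknown_id][guess] >= CONSENSUS:
                self.verified[unknown_id] = guess
        return True


captcha = TwoWordCaptcha()
# A single wrong guess on the unknown word still passes, but it is one vote.
print(captcha.submit("morning", "morning", "scan-17", "8cloved"))
for _ in range(3):
    captcha.submit("morning", "morning", "scan-17", "beloved")
print(captcha.verified)
```

In this toy version a user can get through with a wrong reading of the unverified word, but that reading only enters the "verified" pile if enough other people type the same thing. Anyone who gets the control word wrong is rejected and their guess is never counted.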
sig:- (wit >= sarcasm)
Just wait until some soccer mom needs to protect her genius of a brat from all the bad things there are. Latest crusade? A 'bad' word in a CAPTCHA. Just you wait, it will happen.
After Bill Clinton's first erection as President, he proceeded .....
It's NOT me! It's the meds! I'm on 1000mg of Fukitol.
to allow people to send emails to "higher class of service" mailboxes. Hey, I should patent that idea before Nathan the ex-Microsoftie gets to it.
Google is doing this in order to prevent spam and to improve OCR. But once OCR is improved to the point where it can read poorer scans, won't spammers be able to use that new technology to eventually defeat CAPTCHA?
Don't get me wrong, I think this is a marvelous idea, potentially using volunteer labor of humans as OCR to interpret a book one poorly-scanned word at a time. But it does seem to have the side effect of eventually destroying the original purpose of what they bought. Maybe CAPTCHA is worth more as a "crowdsourced OCR solution" than it ever was as spam prevention anyway...
"This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
I totally agree, this is pure genius. Distributed Human-engined OCR is certainly the best solution to traditional OCR problems, and at the same time it leaves many doors to unforeseen traps ajar.
I have to say, reCAPTCHA is one of the most elegant solutions I've ever seen to a problem.
It's not even killing two birds with one stone, it's killing two birds with one of the birds.
Question everything
The other is to track how users browse the web, for ad targeting. All they need to do is put a cookie in your browser and read it next time you see a captcha or load a Google analytics script.
What is stopping them from including their analytics code (or else something that scrapes behaviour of a user over different websites) behind the scenes?
A corporate motto?
http://images.google.com/imagelabeler/
Have you paranoiacs figured out how Google is going to use this to spy on you or otherwise do evil?
Utilizing the synergization of benchmark e-solutions to pre-workaround action items!
No they don't. I was transferring flights at London Heathrow and there was only one window open, and a massive queue. I get to the front and I find the woman at the computer was typing with one finger... ONE FINGER, not even one on each hand, one feking finger. This was someone who was supposedly trained to do this job, and she can't even touch type.
I don't know about London, but in the U.S., the 1-2 finger typing is usually accomplished by a community college dropout, whose fingernail extensions are about 2 inches long, and who types either by carefully and slowly pressing one key at a time with the nail extension, or with the second knuckle of her middle finger. She will also scream: "Can I help you" with enough contempt to burn your eyebrows off. When you get to the counter, she will look you over with as much spite as humanly possible, then get her Sidekick out and text someone for a couple of minutes. And god help you if you are still with her (inevitably) when 12pm or 1pm comes about. She will get up and leave for lunch (or unroll her food), whether you're waiting or not. Actually, she'd prefer you to wait there.
She is a ubiquitous inhabitant of government offices of all sorts, as well as front desks in companies that don't respect themselves. She will need the supervisor/manager to resolve any issue that goes beyond typing your name (incorrectly), but she will march on city hall with the rest of her co-workers if they don't get another 5% raise in the middle of the recession.
I once caught my dad doing something similar via one of those "make money on the internet" sites. I told him that he was most likely assisting a programmer to design "character-by-shape"-recognition software....that he was in essence making the machine smarter.
-Oz
This was actually done by the guys at 4chan /b/:
http://musicmachinery.com/2009/04/27/moot-wins-time-inc-loses/
I thought I had some hazy recollection that reCAPTCHA was being used for some open projects, like helping to OCR out-of-copyright works...
...so now it is being used to fuel Google's massive, still-very-much-copyrighted, proprietary book scanning effort?
So how's this going to benefit people? I'm, of course, assuming the details are spotty at the moment and I'm terribly interested to hear more details from Google's official "do no evil" department on how they intend to contribute to the world.
I just got a correct response from a clearly incorrect answer.
The image was of "Beloved", but being difficult, I answered "8cloved" and got accepted.
It did the job of proving that I wasn't a bot, but if there are enough difficult people (like me) out there then we could really screw Google over.
[Intentionally left blank]
Brewster Kahle, aka the Internet Archive, is the beneficiary of reCaptcha's work. Convenient way to knock off someone who wants to release for free what you're hoping to make money off of.
Presumably there's something in the legal language to guarantee that the Internet Archive will continue to benefit from reCaptcha, but I'm afraid I see this as nothing more than a "slapping back" attempt by Google.
As a good, truly evil company should do...