Domain: recaptcha.net
Stories and comments across the archive that link to recaptcha.net.
Comments · 68
-
Re:Bad Hacking
4chan didn't quite break it, more like they broke Time's form implementation. They did a lot of 'hacks' but most were on how Time handled the poll - they didn't use any CAPTCHA at the beginning, then took the form offline, but not the voting script, so 4chan voted well past the cutoff time, with millions of monkeys voting.
-
Re:#4 Registering for an account
You ever try to use a forum that didn't require registration? Within 24 hours, 95% of the posts are spam.
Which is why the gods made reCAPTCHA.
-
Re:http://en.swpat.org/wiki/201001_acta.pdf_as_tex
More like this is what I had in mind.
;) -
Re:http://en.swpat.org/wiki/201001_acta.pdf_as_tex
-
Re:You can contribute time to publish free e-books
In fact, the proofreading is done by the Distributed Proofreaders: http://www.pgdp.net/c/
BTW, I'd like to know what is done with all the human OCR from the Recaptcha project: http://recaptcha.net/
Any link to the digitized books?
-
Re:Won't this eventually defeat the purpose?
No it's not warped and obfuscated. ReCaptcha gives you the word as-is.
Go here. Bounce on the reload button a few times to see some example reCAPTCHA. Tell me with a straight face that they're not warped. Perhaps they're scanning books printed on silly putty? As for obfuscated see the example here. They used to slap a line across each word. They don't appear to be doing so any more, but they used to.
-
Re:maybe they should use CAPTCHAs...
Funny you should say that
-
Re:WTF Summary
You're asked to enter TWO words; one known; one not.
From: recaptcha.net:
But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct. -
Totally Not Cool
I'm going to be wickedly pissed if all my hard reCAPTCHA work was all so they could sell a book.
-
Koine Greek huh?
The New Testament of the Codex Sinaiticus appears in Koine Greek, the original vernacular language, and the Old Testament in the version, known as the Septuagint, that was adopted by early Greek-speaking Christians.
I just hope they aren't using Recaptcha to digitize the text... My Koine Greek is a little rusty, and I'd like to be able to join forums.
-
Recaptcha might be able to help
-
Recaptcha!
Sounds like a job for this project.
Best part is, handwritten text is going to be more difficult for computers to solve...
-
Re:*rolleyes*
Recaptcha has a service specifically for email addresses, no obfuscation needed... Which also has the added benefit of aiding book digitizing!
-
one answer
-
Re:That wooshing sound....
I tend to think using Recaptcha just earns somebody money, it is not really doing any particular good for the world.
Would it be asking too much to suggest you check the FAQ or About Us links? Is it enough that "reCAPTCHA channels this human effort into helping to digitize books from the Internet Archive", or does it help that "reCAPTCHA is a project of the School of Computer Science at Carnegie Mellon University"?
Or perhaps you'll take the word of Science magazine. Of course, the link is to a .pdf reprint hosted at recaptcha.net, so YMMV (depending on the tightness of your tinfoil hat). It could all be an evil spammer plot. Yes. Yes it could. -
Best Alternative?
I'm asking for opinions as to what is the (current) best alternative? I am currently (literally...which is actually the reason I looked specifically at this article) working on putting in reCAPTCHA for my site because I figured I'd wait to annoy my users until bots started hitting it...which they started doing a few days ago. I've now had ~50 or so bot accounts get signed up. Although they haven't responded to my confirmation email (and aren't able to login) it is really annoying and each account causes a few emails to bounce.
Anyway, I'm genuinely interested in what people have done for small scale sites. I figure when/if my site starts really growing the solution will change. That said, I'd prefer something simple and easy to implement and I can move to more sophisticated solutions when the need arises. -
Re:That wooshing sound....
They may be trivial to bypass (for some definition of 'trivial'), but many applications only need a tiny speed-bump to make a huge difference in undesirable traffic.
Plus, if you're using ReCaptcha, you're making the spammers do a little bit of good for the world. If they can develop software that reliably cracks ReCaptcha, then they've solved a much tougher problem than just pushing v1@g@r@.
-
Re:Chapter: how to fight BOT spam
-
Re:hm
Not if they implement CAPTCHAs. http://recaptcha.net/learnmore.html has not been broken yet.
-
CAPTCHA rot getting spread?
Does this mean that recaptcha will be spammed soon?
- Oh, wait, they did *not* use the term V1aGrA in 18th century books?
SCNR, but I actually _do_ want to know.
-
What about reCAPTCHA?
Is something like reCAPTCHA as vulnerable? It would seem like with a virtually limitless supply of texts to be digitized, you could minimize the effect of image solvers. Wouldn't there be enough variations of phrases to not make it worth it to document every possibility? And if you've got OCR software good enough to solve scanned texts reliably, that's a win for everyone, right?
-
Re:reCAPTCHA
If you want to know how it works...
But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
-
reCAPTCHA
from the dude who coined CAPTCHA, comes reCAPTCHA. using words in old library books that existing OCR tech can't figure out, humans can help digitize books and stop spam at the same time!
http://recaptcha.net/ -
Turn it to advantage
If every site took up that reCaptcha thing all these paid captcha-solvers would be helping to digitise thousands upon thousands of old books
... on the spammers' dime. -
Use to hide your own email addy
You can also use reCaptcha for your own email address, and be more willing to provide it "publicly" since they'd have to answer the reCaptcha to get to the mailto... reCaptcha mailhide
-
Re:reCAPTCHA and Open Source
reCAPTCHA should not cause any licensing issues if all you do is link to their site via the "magic four lines of code" or use one of their plugins.
From "Why reCAPTCHA":
It's Easy. reCAPTCHA is a Web service. As such, adopting it is as simple as adding 4 lines of code on your site. For many applications and programming languages such as Wordpress and PHP we also have easy-to-install plugins available. We generate and check the distorted images, so you don't need to run costly image generation programs.
WordPress has a GPL license and a reCAPTCHA plugin; so I'd hazard a guess that the reCAPTCHA license is open source friendly.
-
Re:Not new
From the reCAPTCHA FAQ:
When showing reCAPTCHA to the user, is it possible not to show the reCAPTCHA logo? We allow you to customize the theme of reCAPTCHA with our Client API. You are still required to have text on your website which states that you are using reCAPTCHA, however with our theming API, you are free to do this in a way that blends in to your site.
-
Re:A good solution here...
What we do, is we pair these tests on a page. We'll include a known test, like the one above. And we'll also show an unclassified image and we might ask "how many people are in this picture?"
This is basically what reCAPTCHA does, although they only use words. They take images of words that off-the-shelf OCR software failed to read, apply more distortions, and serve them up two at a time. One of the words is known; the other is unknown but becomes known after enough people have submitted the same answer.
And as a bonus, the answers aren't just used to grant access to a web site - they're used to digitize the old books that the images came from in the first place.
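The pairing scheme described above can be sketched in a few lines. This is only a toy model under stated assumptions, not reCAPTCHA's actual implementation: the class name `WordPairChecker`, the `votes_needed` threshold, and the case-insensitive comparison are all hypothetical choices made for illustration.

```python
from collections import Counter


class WordPairChecker:
    """Toy model: one known control word is shown next to one unknown word."""

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed  # assumed agreement threshold
        self.votes = {}      # unknown image id -> Counter of submitted answers
        self.resolved = {}   # unknown image id -> agreed transcription

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes; record their guess for the unknown word."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            # The known word was wrong, so the guess for the unknown word is discarded.
            return False
        guess = unknown_answer.strip().lower()
        tally = self.votes.setdefault(unknown_id, Counter())
        tally[guess] += 1
        # Once enough people agree, the unknown word is treated as known.
        if tally[guess] >= self.votes_needed:
            self.resolved[unknown_id] = guess
        return True
```

A user who gets the control word wrong contributes nothing, so a bot typing random text cannot easily pollute the transcription of the unknown word.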
-
Still useful
CAPTCHA is still useful for small to medium sites that aren't specifically targeted. Your average blog, for example, is only hit by random bots that try to get quick and easy posts. Only the largest sites like GMail need to find something better today.
For example, I use reCAPTCHA on DocForge to block the standard wiki spam bots. Since my site's not large enough to be under heavy attack very little gets through. Someday CAPTCHA may be so easy to break that everyone's at risk, but not today.
-
Not the last nail in the coffin by far...
No one has cracked ReCAPTCHA yet. (This CAPTCHA had a Slashdot article a few months ago.) As it uses text digitized from old books that the best OCR technology couldn't read, it's continually different and already demonstrated to be unintelligible to machines.
Plus, using ReCAPTCHA instead of other solutions also helps Carnegie Mellon digitize old books for posterity.
From TFA: Microsoft, Google, and all other websites that currently use CAPTCHA, need to find a solution that puts them a step ahead of the spammers. This may well be it. -
Re:So, explain ...
"Preventing this is virtually impossible."
reCAPTCHA has a key system. It makes the user have private and public keys so that no one can simply take your CAPTCHA and use it on their site for others to solve for them. From the API Documentation:
"In order to use reCAPTCHA, you need a public/private API key pair. This key pair helps to prevent an attack where somebody hosts a reCAPTCHA on their website, collects answers from their visitors and submits the answers to your site."
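One way to read the quoted documentation is that every challenge is tied to the site it was issued for, so an answer collected elsewhere cannot be redeemed against your site. The sketch below is a toy model of that idea only; `ToyCaptchaService` and its methods are hypothetical names and do not reflect the real reCAPTCHA API or its wire format.

```python
import secrets


class ToyCaptchaService:
    """Toy stand-in for a CAPTCHA web service; not the real reCAPTCHA API."""

    def __init__(self):
        self.private_for = {}  # public key -> private key
        self.challenges = {}   # challenge id -> (public key, expected answer)

    def register_site(self):
        """Hand a site its (public, private) key pair."""
        public, private = secrets.token_hex(8), secrets.token_hex(8)
        self.private_for[public] = private
        return public, private

    def new_challenge(self, public_key, answer):
        """Issue a challenge bound to the site that requested it."""
        challenge_id = secrets.token_hex(8)
        self.challenges[challenge_id] = (public_key, answer)
        return challenge_id

    def verify(self, private_key, challenge_id, response):
        # Each challenge can be checked once, and only by the site it was issued for.
        public_key, answer = self.challenges.pop(challenge_id, (None, None))
        if public_key is None or self.private_for.get(public_key) != private_key:
            return False
        return response == answer
```

In this model, a challenge fetched under an attacker's public key fails verification when the victim site checks it with its own private key, which is the relay attack the documentation mentions.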
Unless I understood you wrong, if so, sorry. -
Re:CAPTCHA is for weak minds
http://recaptcha.net/
Except that its use has serious privacy implications (if you use reCaptcha, their server learns the IP addresses of your visitors, and could even track their surfing habits, if enough sites use reCaptcha). From that point of view, the implementation is seriously flawed, and using reCaptcha might even be illegal in some countries due to privacy laws.
-
Re:CAPTCHA is for weak minds
From http://recaptcha.net/learnmore.html:
But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
-
Re:CAPTCHA is for weak minds
This is already being done. Check out this BBC Story about an outfit called Re-Captcha
-
Re:CAPTCHA is for weak minds
That raises an interesting idea... why not use the capchas to perform some useful work? Example... display a scanned line of text from a project that needs a large volume of text OCR'd for free/cheap.
Someone already beat you to it.
-
Re:CAPTCHA is for weak minds
-
Just use reCAPTCHA
I don't understand why more people don't use reCAPTCHA. If the best book OCRs can't figure out a word, it is probably going to be difficult for a 3rd party OCR to figure out a distorted version of that word. Much less 2 words. Add on to that the fact that there is a central DB monitoring what IPs are solving these CAPTCHAs and on what sites these CAPTCHAs are being solved on and you allow the reCAPTCHA project the ability to improve the reliability of their service.
Plus you get to help digitize books for public access. Which is always a good thing. -
They use a Captcha to validate the scanned words
For those that missed the articles about C.M.'s associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. -
ReCaptcha
I suggested to my company that we use Carnegie Mellon's reCAPTCHA program to solve two problems 1) Improve our CAPTCHA implementation 2) Help Carnegie Mellon with their online publishing initiative. To my pleasant surprise I recently found the company decided to go ahead with reCAPTCHA. Sweet! If you are not familiar then check it out and do some good for everyone! http://recaptcha.net/
-
Relevant Link
In case you missed this discussion back on October 2, Carnegie Mellon has a service which helps to better digitize these books. It's called Recaptcha, and it uses otherwise wasted human cycles to convert text that was hard for computers to OCR.
-
Re:Yay lowest common denominator
Why is it my job (metaphorically speaking) to ensure those who are disabled can use my facilities?
Because most people believe the disabled have a right to equal access to services as everyone, firstly because those who use assistive technologies have no choice and secondly it's not their fault. Not only that but there's really no excuse for designing an inaccessible site, it's not difficult, in fact in most cases it's easier. Inaccessible usually means Flash/Javascript/IE only sites, which not only stops access for the disabled but for those of us who hate Flash/Javascript/Internet Explorer too, it also implies the Web designer/developer is incompetent.
There are circumstances where it's impossible to cater for people using assistive technologies: like wheelchair access to listed buildings (not uncommon in Europe) or prohibitive cost for small businesses to provide wheelchair access, I don't think Web sites are one of them though.
Think of it this way: do you use Firefox? Do you think all Web sites should work given your chosen technology? Or is it your job to somehow adapt to people who only code for Internet Explorer? Is it their fault that you don't use Internet Explorer? Frankly too bad on you. Life sucks. Now imagine someone's showing you that attitude, yet your body is set up such that you can't use anything but Firefox. If you ever go blind from looking at too much Natalie Portman smothered in hot grits I hope you remember your post.
Back on topic: the biggest problem I see for site owners is CAPTCHA as screen readers can't read the majority of CAPTCHAs out there, everyone had better make sure the system they use allows for a sound file alternative. reCAPTCHA looks like a good service, you get to encode books at the same time as fighting spammers! Personally haven't used it on a project, but did notice the sound file alternative link.
-
Re:Drupal Module makes it simple
I use both the Drupal module you've mentioned and the MediaWiki plugin that the CMU team (apparently) maintains. If you don't use either of those, you've still got a lot of options.
-
Re:I want to participate...
You can use a live demo on their about page so no sign-up required and you can start digitizing words immediately.
-
Does it stop spam?
From their learn more page:
If you get email spam we have a method that will help you to reduce it. Many spammers crawl the web looking for email addresses. When they see an email address on a web page, they send spam to the address. Mailhide allows you to safely post your email address on the web. Mailhide takes an address such as jsmith@example.com and turns it into jsm...@example.com. In order to reveal the address, a user must click on the "..." and solve a reCAPTCHA. If you use the Mailhide version of your email address, spammers won't be able to find your real email address and you'll get less spam.
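The display side of that transformation is simple to sketch. The function below only reproduces the masking shown in the quoted example (jsmith@example.com becoming jsm...@example.com). The name `mask_email` and the rule of keeping the first three characters are assumptions drawn from that single example, not Mailhide's documented behavior, and the CAPTCHA-gated reveal step is not modeled at all.

```python
def mask_email(address, keep=3):
    """Hide most of the local part of an email address for public display."""
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address: %r" % address)
    # Keep only the first few characters; the rest is revealed after a CAPTCHA.
    return local[:keep] + "...@" + domain


print(mask_email("jsmith@example.com"))  # jsm...@example.com
```

The masked string is all a crawler sees in the page source, so whether this stops spam depends on the full address being served only after the CAPTCHA is solved.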
Does that work? Or are there a thousand ways for the spammers to break this? -
Re:I want to participate...
Our demo at http://recaptcha.net/fastcgi/demo/recaptcha keeps track of the number of words you've digitized.
:) -
Re:Problems
"Testing the answer against another user's answer is a good idea in principle (it's how they make sure no one is cheating in distributed computing projects) but giving the same answer as another user is not difficult when they are using the same algorithm."
Please RTFA. How do you propose that the same bot gets the same word twice in one sitting, let alone with the same warping and strikethrough so as to guarantee the same word is typed both times?
Check out recaptcha.net to test it out.