reCAPTCHA Hard At Work, Rescuing Fading Texts

Posted by timothy on Thursday August 14, 2008 @01:05PM from the strange-confluence dept.

sciencehabit writes "Computer scientists have developed a program, called reCAPTCHA, which is being used in lieu of CAPTCHA by several sites, to help digitize old books and newspapers. The reCAPTCHA takes entries from old and faded texts that optical scanners and digital-text readers have trouble with. So every time you solve that string of crooked letters, you may actually be helping historians digitally reconstruct a page from the 1908 New York Times." The Science Now story links to the longer and more informative article at Ars Technica. (We last mentioned this program last year — and now it's good to get some sense of how well it's working.)

112 comments

Not new by JazzyMusicMan · 2008-08-14 13:07 · Score: 4, Informative

Ticketmaster and other sites have already been doing this for a while. Go to Ticketmaster and search for tickets, and you'll see two words. One is known and the other is unknown. If you don't believe me, try to guess which one they know and misspell the other one on purpose (or don't, this is for historic posterity =) )
1. Re:Not new by Dachannien · 2008-08-14 13:16 · Score: 2, Informative
  
  So is the US Patent and Trademark Office, as part of the process of using PAIR, the Patent Application Information Retrieval system, which lets the public look at information about patent applications that have been published.
2. Re:Not new by felipekk · 2008-08-14 13:39 · Score: 3, Funny
  
  Facebook uses reCAPTCHA. I guess you can make something useful out of the millions of useless teenagers wasting their time on Facebook.
3. Re:Not new by grahamd0 · 2008-08-14 13:45 · Score: 5, Funny
  
  "Facebook uses reCAPTCHA. I guess you can make something useful out of the millions of useless teenagers wasting their time on Facebook."
  That's not fair.
  Plenty of useless adults waste their time on Facebook.
4. Re:Not new by Firehed · 2008-08-14 14:02 · Score: 2, Informative
  
  Do they really? From what I was able to tell, it's not specified as reCAPTCHA anywhere in the window; having looked at the reCAPTCHA site from a development side I could swear that I read that you needed to give credit if developing a custom style for it. Either I'm remembering wrong, they've got a deal, or FB is undergoing one of the stupidest TOS violations ever.
  
  --
  How are sites slashdotted when nobody reads TFAs?
5. Re:Not new by erbmjw · 2008-08-14 14:21 · Score: 3, Informative
  
  from reCAPTCHA FAQ
  
  When showing reCAPTCHA to the user, is it possible not to show the reCAPTCHA logo? We allow you to customize the theme of reCAPTCHA with our Client API. You are still required to have text on your website which states that you are using reCAPTCHA, however with our theming API, you are free to do this in a way that blends in to your site.
6. Re:Not new by Tassach · 2008-08-14 14:24 · Score: 2, Insightful
  
  Because that's so different than the thousands of useless geeks wasting their time on /.
  
  --
  Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
7. Re:Not new by felipekk · 2008-08-14 14:27 · Score: 1
  
  Here, found one!
  Ok, you can go back to Facebook now Tassach =)
8. Re:Not new by Kjella · 2008-08-14 14:27 · Score: 2, Insightful
  
  I would imagine that they use multiple logins to verify one word - it's not like people don't mistype captchas in the first place.
  
  --
  Live today, because you never know what tomorrow brings
9. Re:Not new by Your Pal Dave · 2008-08-14 15:29 · Score: 3, Informative
  
  Quoting from the NPR story which aired earlier today:
  
  more than 40,000 Web sites -- including popular ones such as Ticketmaster, Facebook and Craigslist -- are using a new kind of security program called reCAPTCHA.
10. Re:Not new by sangreal66 · 2008-08-14 16:55 · Score: 2, Informative
  
  "Do they really? From what I was able to tell, it's not specified as reCAPTCHA anywhere in the window; having looked at the reCAPTCHA site from a development side I could swear that I read that you needed to give credit if developing a custom style for it. Either I'm remembering wrong, they've got a deal, or FB is undergoing one of the stupidest TOS violations ever."
  They do give attribution to reCAPTCHA. You have to click on "What's this?"
  
  This is a standard security test that we use to prevent spammers from creating fake accounts and spamming users. Our captchas are provided by ReCaptcha
11. Re:Not new by noidentity · 2008-08-14 17:04 · Score: 1
  
  "If you don't believe me, try to guess which one they know and misspell the other one on purpose (or don't, this is for historic posterity =) )"
  They most likely require several matching readings from different people before they consider it deciphered.
12. Re:Not new by Random Walk · 2008-08-14 21:18 · Score: 2, Interesting
  
  Quoting from the NPR story which aired earlier today:
  
  more than 40,000 Web sites -- including popular ones such as Ticketmaster, Facebook and Craigslist -- are using a new kind of security program called reCAPTCHA.
  That's scary. The way reCAPTCHA works allows the reCAPTCHA server to collect the IPs of reCAPTCHA users (along with the reCAPTCHA-enabled website they are using). If many websites are using reCAPTCHA, it allows them to track users as they move through the web, from one reCAPTCHA-enabled website to the next.
  The idea is cute, but the implementation is fundamentally broken and a huge breach of privacy.
13. Re:Not new by Anonymous Coward · 2008-08-14 21:32 · Score: 0
  
  FanFiction.net also uses reCAPTCHA.
14. Re:Not new by Dan541 · 2008-08-14 21:55 · Score: 1
  
  It would only be useful if teenagers knew how to spell.
  
  --
  An SQL query goes to a bar, walks up to a table and asks, "Mind if I join you?"
15. Re:Not new by Anonymous Coward · 2008-08-14 22:20 · Score: 2, Informative
  
  "That's scary. The way reCAPTCHA works allows the reCAPTCHA server to collect the IPs of reCAPTCHA users (along with the reCAPTCHA-enabled website they are using). If many websites are using reCAPTCHA, it allows them to track users as they move through the web, from one reCAPTCHA-enabled website to the next."
  Only if you actually use the JavaScript API. If you want to protect the privacy of your site's users, you are free to use the server side API of your choice. This gives them (at most) a count of how many recaptchas your users have solved. By the way, the recaptcha site provides - amongst others - ready-made server side bindings for PHP, Java, Ruby, Python and Perl.
16. Re:Not new by Kent Recal · 2008-08-14 22:27 · Score: 1
  
  Tinfoil hat much?
  Every bigger ad agency (google) can do the same thing.
17. Re:Not new by Fotherington · 2008-08-14 22:43 · Score: 0
  
  Yes, the Ars Technica article states that a particular transcription gets 1 point every time it's made by a human, and 0.5 points when made by a computer. If it gets to 2.5 points, it's accepted - this approach gets you 99% accuracy, which compares well to professional transcription agencies.
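The point scheme relayed above (1 point per human answer, 0.5 per OCR guess, accept at 2.5) can be sketched in a few lines of Python. This is only an illustration of the rule as the comment describes it, not reCAPTCHA's actual code; the function name, the `(text, source)` vote format and the constant names are all assumptions.

```python
from collections import defaultdict

# Weights and threshold as relayed in the comment above; the names are hypothetical.
HUMAN_VOTE = 1.0
OCR_VOTE = 0.5
ACCEPT_THRESHOLD = 2.5


def accepted_transcription(votes):
    """Return the first transcription whose score reaches the threshold.

    `votes` is an iterable of (text, source) pairs, where source is
    "human" or "ocr". Returns None if no candidate gets enough points.
    """
    scores = defaultdict(float)
    for text, source in votes:
        scores[text] += HUMAN_VOTE if source == "human" else OCR_VOTE
        if scores[text] >= ACCEPT_THRESHOLD:
            return text
    return None
```

Under these assumptions a single prankster cannot push a word through alone: one human vote is worth 1 point, so a bogus answer needs at least two more matching votes before it would be accepted.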
18. Re:Not new by Random Walk · 2008-08-15 00:21 · Score: 1
  
  Huh? Pardon me, but their website doesn't talk about a server-side API.. according to their docs, the server-side stuff (which is available for plenty of languages) is only for verifying the answer. The captcha itself is pulled by the browser from the reCaptcha site, so they know both the user IP as well as the website (which contacts them to verify the answer).
19. Re:Not new by Random Walk · 2008-08-15 00:30 · Score: 1
  
  ...and you can block requests to images (i.e. ads) not hosted on the original website, usually without loss of functionality.
20. Re:Not new by Alzheimers · 2008-08-15 02:11 · Score: 3, Funny
  
  But you...
  *sigh* ...Nevermind. It's Friday. Go have a beer or something.
21. Re:Not new by Anonymous Coward · 2008-08-15 02:58 · Score: 0
  
  The same is true for any image (or flash ad, iframe, etc, though I suppose you use Adblock). If you're that worried, you might not want to be on the internet.
22. Re:Not new by tuaris · 2008-08-15 03:35 · Score: 1
  
  Nah, the useless adults are the ones wasting their time on Craigslist's casual encounters section, thinking those "ads" are real men and women. At least they use reCAPTCHA there too!
  
  --
  President/CEO Pacy World http://www.pacyworld.com
23. Re:Not new by Arterion · 2008-08-15 08:24 · Score: 1
  
  If you wanted to blow the extra bandwidth, you could get around that, too. Grab the image onto your server, and let the user get it from there.
  Most sites won't do this, because I think it falls way into the tinfoil hat department. :P
  
  --
  "That which does not kill us makes us stranger." -Trevor Goodchild
24. Re:Not new by Arterion · 2008-08-15 08:26 · Score: 1
  
  Smart sites are doing something to check and see if you're blocking ads. I notice that I have to disable AdBlock on imeem.com or it will only play the first song in a playlist.
  I'm sure there is a way around it, but I haven't hacked at it enough. Not really bugging me that much. Every way I can think of to implement such an "ad checker" can be defeated.
  
  --
  "That which does not kill us makes us stranger." -Trevor Goodchild
25. Re:Not new by Dan541 · 2008-08-15 13:04 · Score: 1
  
  Go have a beer or something.
  Waaaaaay ahead of ya.
  
  --
  An SQL query goes to a bar, walks up to a table and asks, "Mind if I join you?"
26. Re:Not new by rootooftheworld · 2008-08-16 03:58 · Score: 1
  
  I'd rather they waste their time, instead of pestering me. Why can't they go annoy someone in marketing for a change? I prefer to use my time in creative discussions here, and possibly try to cure those emacs zealots from their festering mental illness, and see the light of vi.
  *duck and covers with flameproof suit under the steel table* *locks and loads AK-47*
  
  --
  I know full well that tobacco is bad for you, so I smoke weed with crack
Validate your data, guys! by Anonymous Coward · 2008-08-14 13:16 · Score: 3, Funny

I can usually tell which of the two words is from a real old text. With high probability (>90%) I can correctly answer the real CAPTCHA and replace someone's OCR'd word with "penis".
I've only ever done this maybe ten or twenty times, but it could easily become an automatic part of using the system.
1. Re:Validate your data, guys! by theguru · 2008-08-14 14:14 · Score: 1
  
  I'm sure they send the same unknown word out to multiple people, and wait for a consensus on it.
  Now, if we ALL started entering "penis" for the obvious unknown words.. :)
2. Re:Validate your data, guys! by Robotech_Master · 2008-08-14 15:33 · Score: 1
  
  The thing is, they're often actually both from old texts. It's just that one of them has already been verified.
  And TFA states that they do pass every word by multiple people so as to get more accuracy in what they say. I have little doubt that they're well acquainted with people who try spoofing them.
  
  --
  Editor Emeritus and Senior Writer, TeleRead.org
3. Re:Validate your data, guys! by PPH · 2008-08-14 15:54 · Score: 3, Interesting
  
  Since they use entries from several users to validate correct translations for OCR'ed text, this probably won't cause them major problems. OTOH, I wonder if they can track the accuracy of each user's inputs and, if it becomes evident that a user is either incompetent or attempting to screw with the system, take appropriate measures.
  When someone's karma starts dropping into the negative range, they should let us know how well this worked out. If anyone can see their posts, that is.
  
  --
  Have gnu, will travel.
4. Re:Validate your data, guys! by Spasmodeus · 2008-08-14 17:45 · Score: 2, Funny
  
  As soon as I heard about this project, I figured there'd be people finding ways to abuse it.
  I can see future generations sitting down for a good read:
  
  MOBY COCK
  Chapturd One
  Call me LOLOLFAG...
5. Re:Validate your data, guys! by x2A · 2008-08-14 18:11 · Score: 1
  
  Could be DEVISTATING to the poor fool who blindly follows details from a patent that describes a machine built with a random penis stuck in... that's a machine I don't even wanna think about *shudder*
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
6. Re:Validate your data, guys! by x2A · 2008-08-14 18:14 · Score: 1
  
  ...and this morning my spelling happens to also be devastating... grr
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
7. Re:Validate your data, guys! by hostyle · 2008-08-14 20:44 · Score: 0
  
  ...and this morning my penis happens to also be devastating... grr
  Fixed it for you!
  
  --
  Caesar si viveret, ad remum dareris.
8. Re:Validate your data, guys! by Elastri · 2008-08-15 00:24 · Score: 1
  
  Hopefully they are only accepting a piece of text when a lot of the people give the same thing.
9. Re:Validate your data, guys! by pz · 2008-08-15 00:25 · Score: 1
  
  Way to make the world a better place. I'm certain your parents are very proud of your accomplishments. Perhaps you can now go find someone else's sandbox to defecate in, I suggest your own, because I certainly would rather you not be here.
  I have no doubt that the reCAPTCHA folks understand that there are going to be people who find such childish behavior irresistible or entertaining, and will either start discounting such answers (based on IP address) or build in filtering to discount particular words.
  But, really, you ask people to be good, to help improve the world, to, as Abraham Lincoln so eloquently put it, listen to the better angels of our nature, and this is what you get? Teenage boys who find it titillating to type bad words. My cup runneth over.
  
  --
  
  Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
10. Re:Validate your data, guys! by Anonymous Coward · 2008-08-15 00:32 · Score: 0
  
  My cup runneth over.
  That's because it's got two girls, too.
11. Re:Validate your data, guys! by adamofgreyskull · 2008-08-15 01:08 · Score: 1
  
  Tracking peoples' IP addresses all across the internets? They're lucky if that's all they have their database poisoned with.
12. Re:Validate your data, guys! by Kaetemi · 2008-08-15 02:22 · Score: 1
  
  So, what's that first word here, anyways?
  http://dl.kaetemi.be/kaetemi/recaptcha.png
  
  --
  Kaetemi
13. Re:Validate your data, guys! by Anonymous Coward · 2008-08-15 04:55 · Score: 1, Insightful
  
  Both words are from 'real old text'. You won't have any effect on the data output by putting 'penis' because more people will type the correct word.
14. Re:Validate your data, guys! by ShawnDoc · 2008-08-15 05:08 · Score: 1
  
  They most likely give the same word to multiple users and choose the word most often entered.
15. Re:Validate your data, guys! by PPH · 2008-08-15 06:01 · Score: 1
  
  "They most likely give the same word to multiple users and choose the word most often entered."
  Exactly. But it's possible to adapt a technique used in some AI knowledge acquisition systems wherein the outcome of such scoring is 'back propagated' to rank the relative validity of various data sources, rules, etc. If one source (a user, in this case) consistently ranks low, they get a lower weight in future solutions, until eventually they get dropped off the bottom of the list (like bad karma on /.).
  
  --
  Have gnu, will travel.
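A rough sketch of the reputation idea in the comment above: each answer counts with its author's current weight, and users who disagree with the winning answer have their weight cut until they stop counting at all. Nothing in the thread says reCAPTCHA does this; the function names, the halving penalty and the 0.1 cut-off are purely illustrative assumptions.

```python
from collections import defaultdict


def weighted_consensus(answers, weights):
    """Pick the answer with the largest total weight.

    `answers` maps user -> submitted text; `weights` maps user -> weight.
    Users not yet seen count with a default weight of 1.0.
    """
    totals = defaultdict(float)
    for user, text in answers.items():
        totals[text] += weights.get(user, 1.0)
    return max(totals, key=totals.get)


def update_weights(answers, weights, winner, penalty=0.5, floor=0.1):
    """Return new weights after one round.

    Users who disagreed with `winner` are multiplied by `penalty`; once a
    weight falls below `floor` it is set to 0.0, so that user is ignored.
    """
    updated = dict(weights)
    for user, text in answers.items():
        weight = updated.get(user, 1.0)
        if text != winner:
            weight *= penalty
        updated[user] = weight if weight >= floor else 0.0
    return updated
```

With these particular numbers, a user who is wrong four rounds in a row goes 0.5, 0.25, 0.125 and then to 0.0, while everyone who agreed with the consensus stays at 1.0.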
16. Re:Validate your data, guys! by Anonymous Coward · 2008-08-15 07:30 · Score: 0
  
  Hopefully they are verifying with multiple reCAPTCHAs of the same text.
17. Re:Validate your data, guys! by Arterion · 2008-08-15 08:30 · Score: 1
  
  I'm going to guess "formal".
  
  --
  "That which does not kill us makes us stranger." -Trevor Goodchild
Cool possible uses by Irish_Samurai · 2008-08-14 13:19 · Score: 4, Interesting

Man, I would love to see the results if this technique was used for an ontological purpose.
Please type in the word from the choices below that most closely relates to this word: OLD
HISTORIC
LIFESPAN
Interesting shit indeed.
1. Re:Cool possible uses by burgundysizzle · 2008-08-14 13:50 · Score: 5, Funny
  
  Or perhaps SLASHDOT-READER:
  OVERWEIGHT
  GEEK
  SPENDS-TOO-MUCH-TIME-USING-COMPUTERS
  ALL-OF-THE-ABOVE
  I fit into the category ALL-OF-THE-ABOVE. The only generalisation that is missing about slashdotters is the one about girlfriends.
2. Re:Cool possible uses by DrInequality · 2008-08-14 15:03 · Score: 1
  
  Surely that'd be relatively easy to hack by use of a thesaurus?
  Or even google:
  old+historic: 66,500,000
  old+lifespan: 3,480,000
  
  pwned!
  
  --
  DROS - Open-Source Robot Software
3. Re:Cool possible uses by Irish_Samurai · 2008-08-14 15:42 · Score: 2, Informative
  
  The point is to see what the populace thinks the relation is.
  If you think google is the end all be all of absolute information then you already fail.
4. Re:Cool possible uses by x2A · 2008-08-14 18:20 · Score: 1
  
  I bet it's not far from the truth! What's google but the indexing of the [online] expressions of the populace of which you speak?
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
5. Re:Cool possible uses by Anonymous Coward · 2008-08-14 18:22 · Score: 0
  
  Could you explain this "Girlfriend" thing again, I have heard rumors of such things but I can't believe they exist
6. Re:Cool possible uses by dword · 2008-08-14 20:02 · Score: 1
  
  "The only generalisation that is missing about slashdotters is the one about girlfriends."
  Haha, loser, you don't have a girlfriend like all the other /.ers!
  Everybody knows that all /.ers have girlfriends. I can even remember my first couple of imaginary lesbian girlfriends I made up when I first joined /.
7. Re:Cool possible uses by fbjon · 2008-08-14 23:06 · Score: 1
  
  But if you think google isn't good enough for a spammer, then YOU fail. Hah!
  
  --
  True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
8. Re:Cool possible uses by SQLGuru · 2008-08-15 00:34 · Score: 1
  
  Here's your explanation: http://www.funnyhumor.com/jokes/575.php
  Layne
9. Re:Cool possible uses by Anonymous Coward · 2008-08-15 03:12 · Score: 0
  
  Use it to block spam. "Is this a real letter or not?" Make it a requirement to send an email and the spammers can cancel themselves out.
10. Re:Cool possible uses by burgundysizzle · 2008-08-25 16:58 · Score: 1
  
  "Haha, loser, you don't have a girlfriend like all the other /.ers!"
  LOL, I can feel quite content in the fact that you are a fellow loser who doesn't either.
Huh? 1908 New York Times? by mschuyler · 2008-08-14 13:27 · Score: 2, Funny

The New York Times is already online from 1851 onwards. The concept is cool, truly, but why not CAPTCHA something not already accomplished? Oh, I know. That was, like, a metaphor, right?

--
How about a moderation of -1 pedantic.
1. Re:Huh? 1908 New York Times? by FlyingSquidStudios · 2008-08-14 13:39 · Score: 2, Insightful
  
  I am almost certain that it is not all there in its entirety. There are bits that are not online specifically because of OCR errors. That is going to be true with any large volume of OCRed text.
  
  --
  http://twitter.com/OLDTELEGRAM
2. Re:Huh? 1908 New York Times? by Vectronic · 2008-08-14 14:44 · Score: 1
  
  Yeah I was kinda wondering about that too, but from a different perspective... I mean: "So every time you solve that string of crooked letters, you may actually be helping historians digitally reconstruct a page from the 1908 New York Times."
  What the hell is the problem with people? All text is apparently on a single page from the NY Times in 1908... I mean fuck, stop the press, 'cause it's obviously all redundant shit anyways, just keep redistributing that one page across the world!
3. Re:Huh? 1908 New York Times? by x2A · 2008-08-14 18:30 · Score: 1
  
  "Oh, I know. That was, like, a metaphor, right?"
  If it was like a metaphor, does that make it a simile? No wait, this means you're using metaphors as a simile? Hmm this could get confusing... perhaps we could make a reCAPTCHA-like technology but with old metaphors instead of letters and create a big database of abstraction...
  (ps: I am my own brother who wrote that above, so I am definitely confused, can someone help please?)
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
DMCA Violation by Nymz · 2008-08-14 13:30 · Score: 5, Funny

The feature known as FADING was designed to protect copyright works from being pirated by becoming illegible before the work could fall into the public domain.
Prior art by armanox · 2008-08-14 13:36 · Score: 4, Funny

I think that erosion on stone tablets predates fading by quite a bit....

--
I'm starting to think GNU is the problem with "GNU/Linux" these days.
1. Re:Prior art by Nymz · 2008-08-14 13:52 · Score: 2, Funny
  
  I really wish the RIAA (Rock Industry Association of the Archean eon) would update their business model to the current Phanerozoic eon.
2. Re:Prior art by kvezach · 2008-08-15 06:33 · Score: 1
  
  True freedom-lovers abhor that kind of DRM and carve on tungsten tablets before using CVD to coat them with diamond...
  
  ... you insensitive clod!
gmail captchas by v1 · 2008-08-14 13:41 · Score: 2

A little OT, I know, but is anyone else having a bad time with gmail's captchas? I've tried signing up several of our customers for gmail recently and it's becoming really hard to get them right. The "audio" playback used to be the saving grace, but the last two times I tried it, it sounded like ten people were talking to me all at once with no discernible key voice. (And the last time I succeeded, the string to be entered was spoken in three groups, by three different voices.)

--
I work for the Department of Redundancy Department.
1. Re:gmail captchas by TheModelEskimo · 2008-08-14 15:32 · Score: 1
  
  Yep, I do the same thing, signing clients up for Google services, and I get their captchas right about once every three or four tries. :-(
2. Re:gmail captchas by x2A · 2008-08-14 18:42 · Score: 1
  
  Wow, people are really goin to town with the offtopic mods *sigh* conversation nazis "you will follow STRICTLY the rules of conversation or your karma will be no more!!!". Such a waste, there are posts out there that need modding up more than these slightly-offtopic posts need modding down!
  And yes I have started coming across more captchas that do seem just impossible to read, they certainly know how to make you feel stupid 'n illiterate. Apparently it's a new system in place, like the one in the article, but for prescription notes from doctors... "if anyone can find someone who can read this, google can".
  Wonder what else is difficult to read we could use this for... which way my gf's moodswings are gonna go? (haha guess which way it is at the moment... I'm here on slashdot, that should be a big clue :-p)
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
Image Captchas by pembo13 · 2008-08-14 13:48 · Score: 3, Informative

I've found that implementing a simple "please choose the name of the item seen below" eliminates a large amount of spam (all?) but has the problem of not being viable for blind people.

--
"Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
1. Re:Image Captchas by x2A · 2008-08-14 18:54 · Score: 1
  
  I find a bit of simple javascript works well, and is out of sight of genuine users. If you wanna account for people who block javascript (rather than a note saying "please turn on javascript for a sec, think of the children") you can have a captcha in a span or div etc, then use javascript to remove it and replace it with a hidden field with a name<-->value pair that can be compared server side when they post the form and have the values checked. Yes, someone could look at the page source and see what's going on and write a script that gets round it, but (unless you're really big like google) most of what tries to hit you will just be automated scripts that don't have javascript interpreters and so aren't going to post the completed form.
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
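The server-side half of the hidden-field trick described above might look roughly like this. It is a sketch only: the field name and value are generated per form, the page's JavaScript (not shown) is assumed to insert them as a hidden input, and the server checks them on POST. `SERVER_SECRET`, `issue_form_token` and `passes_js_check` are hypothetical names, and deriving the value with an HMAC is a substitution made for this example; the comment itself only says a name/value pair is compared.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; a real site would load this from config.
SERVER_SECRET = secrets.token_bytes(32)


def _expected_value(name):
    """Derive the value that should accompany a given hidden-field name."""
    return hmac.new(SERVER_SECRET, name.encode(), hashlib.sha256).hexdigest()


def issue_form_token():
    """Return the (name, value) pair the page's JavaScript would add as a hidden field.

    The server is assumed to remember `name` (for example in the session).
    """
    name = secrets.token_hex(8)
    return name, _expected_value(name)


def passes_js_check(form_fields, name):
    """True if the posted form carries the expected value under `name`.

    A bot that never ran the JavaScript will not have the field at all.
    """
    posted = form_fields.get(name, "")
    return hmac.compare_digest(posted.encode(), _expected_value(name).encode())
```

As the comment notes, this only filters out clients that do not run JavaScript; anyone who reads the page source can still script around it.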
2. Re:Image Captchas by Martz · 2008-08-14 18:55 · Score: 4, Funny
  
  Just use an alt tag.
3. Re:Image Captchas by Anonymous Coward · 2008-08-15 00:26 · Score: 0
  
  Or for people whose first language isn't English. Oh you Americans, always so arrogant!
4. Re:Image Captchas by MobyDisk · 2008-08-15 01:30 · Score: 1
  
  But that is multiple choice, so it is easier to make a program that can guess the result.
5. Re:Image Captchas by pembo13 · 2008-08-15 08:46 · Score: 1
  
  I'm aware of that. Still does the job
  
  --
  "Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
Re:One Problem by Anonymous Coward · 2008-08-14 13:57 · Score: 3, Funny

The following security test allows us to validate you are a human and not an automated script.
please type the following two words in the text box below
you moron
____________ _____________
Re:One Problem by Psychotria · 2008-08-14 13:58 · Score: 1

From TFA:

The software presents one optically unreadable word and one "control" CAPTCHA word. Getting the control word right identifies the user as a human, and the program records his or her response to the unreadable word and adds it to a database.
So, there is the real CAPTCHA, and another reCAPTCHA.
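The flow quoted from TFA can be sketched as a small gate: the answer to the unreadable word is recorded only when the control word is right. This is a guess at the shape of the logic rather than the real implementation; `handle_submission`, the dict used as a "database" and the case-insensitive comparison are all assumptions.

```python
def handle_submission(control_answer, unknown_answer, control_truth, unknown_id, database):
    """Record the guess for the unknown word only if the control word is correct.

    `database` maps an unknown-word id to the list of guesses collected so far.
    Returns True if the user passed the control check (i.e. looks human).
    """
    # Assumption: matching ignores case and surrounding whitespace.
    if control_answer.strip().lower() != control_truth.strip().lower():
        return False
    database.setdefault(unknown_id, []).append(unknown_answer.strip())
    return True
```

Note that the unknown word never affects whether the user gets in, which is why a deliberately wrong answer to it still passes the test.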
Finally logged in by narcberry · 2008-08-14 13:58 · Score: 2, Funny

Took me a bit to get past the new security measures, but I got a coupon for 5 cents off my next shoe purchase.

--
Modding me -1 troll doesn't make me wrong.
reCAPTCHA and Open Source by pwizard2 · 2008-08-14 14:06 · Score: 1

Right about now, I'm wondering what the implications would be for including reCAPTCHA in an open source project (a PHP-based blog I'm working on). Right now the blog is read-only, since I have yet to build my own working CAPTCHA system, and putting up an unprotected reply form is sheer idiocy, since it will be a whole five minutes before the spam bots find it. My project is GPLv3, so would including reCAPTCHA cause me some sort of licensing problem?

--
"It is a denial of justice not to stretch out a helping hand to the fallen; that is the common right of humanity."
1. Re:reCAPTCHA and Open Source by erbmjw · 2008-08-14 14:31 · Score: 1
  
  reCAPTCHA should not cause any licensing issues if all you do is link to their site via the "magic four lines of code" or use one of their plugins
  
  from Why reCAPTCHA
  
  It's Easy. reCAPTCHA is a Web service. As such, adopting it is as simple as adding 4 lines of code on your site. For many applications and programming languages such as Wordpress and PHP we also have easy-to-install plugins available. We generate and check the distorted images, so you don't need to run costly image generation programs.
  WordPress has a GPL license and a reCAPTCHA plugin; so I'd hazard a guess that the reCAPTCHA license is open source friendly.
2. Re:reCAPTCHA and Open Source by corbettw · 2008-08-14 14:35 · Score: 3, Informative
  
  There are multiple libraries for reCAPTCHA already published, all under the MIT License. Just see http://code.google.com/p/recaptcha/ for a list of them.
  
  --
  God invented whiskey so the Irish would not rule the world.
3. Re:reCAPTCHA and Open Source by Anonymous Coward · 2008-08-14 21:42 · Score: 0
  
  Another licensing issue: who owns the books once they are digitized with reCAPTCHA? I can't find any mention of this on the reCAPTCHA website, or any digitized books.
Problems With ReCaptcha by Anonymous Coward · 2008-08-14 14:12 · Score: 1, Interesting

I've seen a number of issues with reCaptcha that I don't really know how to handle (i.e. what to enter): (1) multiple-word strings; (2) foreign characters; (3) illegible text; (4) a single word for both entries; (5) words that look like one thing initially, but are really another when you look closer.
1. Re:Problems With ReCaptcha by Robotech_Master · 2008-08-14 15:35 · Score: 3, Informative
  
  I've seen one ReCAPTCHA string that was just a distorted entirely illegible blob of ink.
  Just do what I did: click the "refresh" button to the right for a new word pair and enter that one.
  
  --
  Editor Emeritus and Senior Writer, TeleRead.org
Re:AC for the plain old CAPTCHA by grahamd0 · 2008-08-14 14:43 · Score: 4, Funny

Let me introduce you to my friend, the question mark.
your sig by Iamthecheese · 2008-08-14 14:51 · Score: 1

is full of hyperbole, dogma, propaganda, and meaningless blatherings.

--
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
1. Re:your sig by Irish_Samurai · 2008-08-14 15:45 · Score: 1
  
  That's kinda the point moron.
  Let me introduce you to the concept of context.
2. Re:your sig by Anonymous Coward · 2008-08-14 16:39 · Score: 0
  
  The obvious context and meaning of the site is, "OMFG the government is controlling your life! that's right, you!" The sensationalistic overtones, selection bias, question begging, hand waving, appeals to fear and emotion, and inductive fallacies of every flavour render the site a treasure trove of bad logic. I would definitely use this in a class about the application of dogma. The message is good, but the delivery is evil.
3. Re:your sig by Irish_Samurai · 2008-08-14 16:59 · Score: 1
  
  The linked page is self purporting. That's its purpose.
  If you are smart enough to see through it, you are smart enough to discredit it. In turn that makes it an example, not a message.
  The problem with trying to communicate a message of the sort I link to is that the goal is to get you to scream "BULLSHIT!"
  Parts apply and others don't, but they do provoke thought. Thought allows you to discard its catalyst for new ideas, but doesn't require it.
  If you take the link as truth, you miss the point.
4. Re:your sig by maxume · 2008-08-15 01:19 · Score: 1
  
  What context could possibly rescue those writings from being full of hyperbole, dogma, propaganda, and meaningless blatherings?
  
  --
  Nerd rage is the funniest rage.
Re:One Problem by RedWizzard · 2008-08-14 15:18 · Score: 4, Funny

One FUNDAMENTAL problem with this
... is that you didn't RTFA.
Known unknown? by halcyon1234 · 2008-08-14 15:24 · Score: 1

How are they able to tell if I've accurately solved an unknown? If the word is "Yesterday" and I enter "Fucktard", not only will the society get some very wrong data, but I'll also have passed the CAPTCHA without entering the actual letters.

--
UTF-8: There and Back Again
1. Re:Known unknown? by Peyna · 2008-08-14 15:29 · Score: 1
  
  RTFA.
  You get two captchas. One is your standard, let's find out if you're human captcha, where the program knows the answer. The other is the scanned text. It also presents the same scanned text to many people, and then uses the results to figure out which one is the most likely correct result.
  
  --
  What?
You forgot... by Anonymous Coward · 2008-08-14 15:43 · Score: 0

COWBOY NEAL
It turns out... by symbolset · 2008-08-14 16:02 · Score: 2, Informative

That slashdot's Goatse troll server guy proves useful.
Note: This is not a troll. One of the guys that offers open web services to slashdot trolls is also responsible for considerable development of CAPTCHA breakage and is an eminent Debian developer. This is why I've said that we should respect his efforts despite the unpleasant side effects. The truly brilliant we should grant exceptions from social behavior because they discover things more proper folk would not.

--
Help stamp out iliturcy.
1. Re:It turns out... by The+End+Of+Days · 2008-08-15 07:27 · Score: 1
  
  The truly brilliant we should grant exceptions from social behavior because they discover things more proper folk would not.
  No. Just no. Being brilliant is no excuse for being an asshole.
2. Re:It turns out... by argent · 2008-08-15 08:00 · Score: 2, Interesting
  
  How is being responsible for CAPTCHA breakage useful?
  Look, just because the guy who more or less invented both trolling and automated trolling is an eminent UNIX guru and textbook author that doesn't mean his trolling on net.suicide was any less disgusting. I was appalled at the people who laughed along with Pike when he revealed that he was behind Bimmler and Shaney. This kind of thing is just not acceptable no matter who you are.
3. Re:It turns out... by Anonymous Coward · 2008-08-15 18:00 · Score: 0
  
  LISTEN!
  I have told you time and again!
  PLEASE LEAVE ME OUT OF THIS DISCUSSION!
  .
  geeez....!
Recaptcha doesn't recapture context by Mumei+no+koshinuke · 2008-08-14 16:43 · Score: 5, Interesting

When solving these I sometimes find that there's more than one possibility for an illegible word, yet I can't tell which it is without knowing the context.
For example, in some fonts "cost" and "cast" might be indistinguishable in the image shown. But given the context of the sentence it's trivial for a human to tell the difference.
Suppose that they found these words on which people disagreed and had another captcha system which showed the full sentence. I'd guess they could improve their accuracy significantly in this case. Since they could prescreen for ambiguous words using the current captcha system, even if fewer people were willing to solve the "large" captcha, they would still get all the solutions they needed.
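
A rough sketch of the prescreening idea above, in Python: collect the answers people gave for one word image and flag the word for the full-sentence captcha when no single answer dominates. The function name `needs_context` and the 0.75 agreement threshold are assumptions made for illustration, not anything reCAPTCHA is known to use.

```python
from collections import Counter

def needs_context(votes, min_agreement=0.75):
    """Return True if the leading answer's share of the votes is below min_agreement.

    votes is a list of answers users typed for the same word image.
    The 0.75 default is an arbitrary illustrative threshold.
    """
    if not votes:
        return False  # nothing to judge yet
    top_count = Counter(votes).most_common(1)[0][1]
    return top_count / len(votes) < min_agreement
```

With votes split evenly between "cost" and "cast", the word would be flagged and could then be shown again with its sentence.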
Re:One Problem by brianez21 · 2008-08-14 17:08 · Score: 0

One FUNDAMENTAL problem with this, isn't the point of a captcha to descramble the letters to get access? If contents of the image shown is unknown, then doesn't that defeat the point entirely?
Actually, you are correct that it won't work if the *entire image* is unknown. But with reCAPTCHA it is not. You see, reCAPTCHA works by showing two words, one of which is known and the other unknown. When the user gets the known word correct, it is assumed that the unknown word is at least partially correct. This both validates the captcha and allows them to build their database of scanned "known" words. Of course, to prevent database poisoning, the "unknown" words are still given many times, in order to "cross reference" and reduce the chance of human error.
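
For the curious, here is a minimal sketch of that two-word flow in Python. It is only an illustration of the idea as described here: the names `submit` and `consensus`, the in-memory `guesses` store, and the `min_votes=3` cutoff are all hypothetical, not the real reCAPTCHA implementation or API.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word image id -> list of recorded guesses.
guesses = defaultdict(list)

def submit(control_answer, control_guess, unknown_id, unknown_guess):
    """Return True if the user passes the captcha.

    The guess for the unknown word is recorded only when the known
    (control) word was typed correctly.
    """
    if control_guess.strip().lower() != control_answer.strip().lower():
        return False  # failed the known word, so the other guess is discarded
    guesses[unknown_id].append(unknown_guess.strip().lower())
    return True

def consensus(unknown_id, min_votes=3):
    """Return the most common guess once enough users have answered, else None."""
    votes = guesses.get(unknown_id, [])
    if len(votes) < min_votes:
        return None
    word, _count = Counter(votes).most_common(1)[0]
    return word
```

A single user typing garbage for the unknown word still passes, but their entry is outvoted once several other people have seen the same image.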

--
kernel: lp0 on fire
If most captchas are already cracked... by Neoprofin · 2008-08-14 17:49 · Score: 1

why don't they just use whatever software is used by the crackers to bombard us with spam email to go through all of these books at whatever speed they're capable of? If compromised PCs can send tens of thousands of fake emails, why not just set a few up to figure out these words?

How much worse is this than trusting users to correctly identify the text? I ask because I honestly don't know the success rate of the automated system.
Re:RTFA by Psychotria · 2008-08-14 18:23 · Score: 2, Informative

The authors also tested software designed to crack CAPTCHAs against images created using reCAPTCHA, and found that they failed completely. The authors ascribe this to the fact that the letters in scanned images contain distortions that are not the result of a clean mathematical transformation. User response times were also measured, but there were no significant differences between the time it took users to handle traditional systems and that required to use reCAPTCHA.
Use to hide your own email addy by RJFerret · 2008-08-14 18:58 · Score: 5, Informative

You can also use reCaptcha for your own email address, and be more willing to provide it "publicly" since they'd have to answer the reCaptcha to get to the mailto... reCaptcha mailhide
Interesting field by Anonymous Coward · 2008-08-14 20:03 · Score: 2, Interesting

My company is working on digitizing a large volume of old text (19th century government documents). There are a number of problems unique to old text:
- OCR breaks down due to archaic letter shapes, smudging, letter damage and paper deterioration.
- we evaluated OCR versus having the entire text retyped by Indians, and ended up going with the Indians. The only way to get sufficient accuracy (>99%) was to have everything done twice and do a comparison.
- Even then, the typed text has to be checked using both automated and manual processes. The text is highly structured, which makes automatic checks possible, but we can't catch everything that way. Then again, the checks necessary for our text are more extensive than for an old newspaper.
- For old texts, your average spelling checker is useless. You end up adding loads of words to the dictionary.
ReCAPTCHA solves one of these problems (text entry), but I suspect a fair amount of work remains. E.g. sometimes you need context to decipher a word correctly.
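
The "done twice and compared" step can be sketched roughly like this in Python, using the standard library's `difflib` to line up two independent typings word by word and report where they disagree. The function name `keying_differences` is made up for this example, and a real pipeline would presumably work per page or per field rather than on one string.

```python
import difflib

def keying_differences(first_pass, second_pass):
    """Return (words_from_first, words_from_second) pairs where two typings disagree."""
    a, b = first_pass.split(), second_pass.split()
    # autojunk=False so that frequent words are not treated as noise on long texts
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((a[i1:i2], b[j1:j2]))
    return diffs
```

Each reported pair is a spot a human checker would look at against the scan; spans where both typists agree are skipped.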
Yeah... by doyoulikeworms · 2008-08-14 20:57 · Score: 1

I feel pretty good about opening my porn bookmarks now that they've adopted reCAPTCHA.
1. Re:Yeah... by JSund · 2008-08-17 03:54 · Score: 1
  
  It's quite possible that they are just harvesting the reCAPTCHA results to spam other sites that are using it for spam protection though.
Mechanical Turk by Wormholio · 2008-08-15 00:50 · Score: 1

The API for adding reCAPTCHA to your web site is fairly easy to use, and there are extensions or plug-ins for applications like MediaWiki, Joomla, Drupal, WordPress, phpBB, etc.
Just to try it out I set up a mechanical turk using reCAPTCHA. So if you like the idea you can keep at it, instead of just solving one of them once. It can be a bit addicting.

--
"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats
1. Re:Mechanical Turk by tiogaplanet · 2008-08-20 04:23 · Score: 1
  
  Wormholio, I too have created a mechanical turk for this. It has an added feature of recording your "score" when a player is logged in.
Re:One Problem by Anonymous Coward · 2008-08-15 05:06 · Score: 0

Ok, the guy didn't RTFA, like many don't, but is this kind of reply really necessary? I would moderate it as rude rather than funny. (like this one a lot better). Guess I am getting old. :-(
But, Where is the data??? by Tee+Mandel · 2008-08-15 09:48 · Score: 1

The recaptcha.net site does not have links to the OCRed text data they are accumulating. It's nowhere to be found in the F.A.Q. or the wiki. Everything just deals with implementing the API and such. If we, the public are helping to create this archive, where can we download plaintext results of the system?