Now Google's CAPTCHA Is Broken

The real problem is GMail by Animats · 2008-10-02 03:44 · Score: 5, Interesting

Google has become a key enabler of spam and scams, because it's so easy to create GMail accounts in bulk. Many sites block email addresses from Hotmail and AOL, because they're mostly either spammers or losers. GMail once had a better reputation, because it was launched as an "exclusive" service. But we're getting close to the point where it's probably time to start blocking GMail addresses too.

Want to see a GMail scammer in action right now? Read this.

Re:Why by erroneus · 2008-10-02 03:49 · Score: 3, Interesting

Because they are circumventing a computer security measure. That is a felony in the U.S.

Re:Great Source by mapsjanhere · 2008-10-02 03:50 · Score: 2, Interesting

Especially since there still seems to be doubt whether most cracks are actually done by computer or by humans. They all seem to be happening "off-line" at some unknown destination, which might be a server cluster in some Russian university or a sweatshop in Bangladesh.

--
I'm aging rapidly, I bought a new game and had no idea if my machine was good for it.

captchas, what about handwriting recognition? by theantix · 2008-10-02 03:50 · Score: 4, Interesting

OK can someone please hire these guys to work on handwriting recognition software? If they can read these bizarrely twisted captchas why can't Palm read my name?

--
501 Not Implemented

Re:captchas, what about handwriting recognition? by hankwang · 2008-10-02 04:45 · Score: 2, Interesting

"OK can someone please hire these guys to work on handwriting recognition software? If they can read these bizarrely twisted captchas why can't Palm read my name?"
Those OCR algorithms are manually tweaked for a specific CAPTCHA algorithm, in the case of Gmail a tightly spaced letter sequence with spatial distortion. Neural networks have been better than humans in recognizing individual letters for a while (see http://research.microsoft.com/~kumarc/ ); the hardest part is separating the letter glyphs so that the neural network knows where to look, which is the purpose of the clutter in old Hotmail captchas and the tight spacing in both Gmail and recent Hotmail captchas.
With normal 'connected' handwriting, separation is obviously pretty tough. Moreover, the handwriting of many persons cannot be deciphered unambiguously on the basis of letter shapes alone. The reader needs to know the context, which becomes painfully obvious if the handwriting is in a different language. Remember the time when medical prescriptions were handwritten? I would say that reading sloppy handwriting is much harder than deciphering a Captcha. If only a computer could generate sloppy handwriting automatically...
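To illustrate why glyph separation is the hard part, here is a minimal sketch of the simplest possible segmenter: cut a binary image wherever a column contains no ink. The function name `split_glyphs` and the toy bitmaps are assumptions made for illustration, not anything a real CAPTCHA breaker is known to use. With loosely spaced letters it finds the glyph boundaries; with tightly spaced or touching letters it returns one undivided blob, which is exactly the situation the tight spacing is meant to create.

```python
def split_glyphs(bitmap):
    """Split a binary image (list of rows, 1 = ink) at empty columns.

    Returns a list of (start, end) column ranges, end exclusive.
    Hypothetical helper for illustration only.
    """
    if not bitmap:
        return []
    width = len(bitmap[0])
    # A column "has ink" if any row has a 1 in that column.
    ink = [any(row[x] for row in bitmap) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                  # a glyph begins here
        elif not has_ink and start is not None:
            spans.append((start, x))   # a blank column ends the glyph
            start = None
    if start is not None:
        spans.append((start, width))   # glyph runs to the right edge
    return spans


# Two toy "letters" with a gap between them.
spaced = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
]
# The same idea with the gap closed up: no blank column remains.
touching = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
]

print(split_glyphs(spaced))
print(split_glyphs(touching))
```

The spaced image yields two column ranges that could each be handed to a per-letter classifier; the touching image yields a single range, so a classifier trained on individual letters has nothing well-formed to look at.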

--
Avantslash: low-bandwidth mobile slashdot.

IT salaries are just too low. by 140Mandak262Jamuna · 2008-10-02 03:53 · Score: 3, Interesting

If there are people who could write such sophisticated image processing software, and it pays them better to be bot runners or bot enablers, the pay must be good on the dark side of the force.

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact

Re:Why by thrillseeker · 2008-10-02 04:02 · Score: 2, Interesting

unless it's the ("wrong") VP candidate's private email ...

Re:Why by DriedClexler · 2008-10-02 04:05 · Score: 5, Interesting

"It's baffling that someone with that kind of talent would be working for spammers instead of in a tenured university position."

Not when you consider how much professors make vs. how much spammers who can beat captchas can make. Hint: if you find a quick way to factor semiprimes, don't snag $1 million from the Clay Institute. Reap $1 billion from credit cards. If you can easily toss aside ethics.

Incidentally, I was just reading Douglas Hofstadter's Metamagical Themas, where he goes into great depth talking about the difficulty of defining the letter "A", and how people are capable of recognizing A's in truly bizarre fonts. (And how it carries over to native readers of Chinese and defining Chinese characters.) He persuasively argues that the ability to recognize any 'A', including all the bizarre fonts with 'A', is AI-complete (though of course he didn't use that term). So it seems there's quite a ways to go in making captchas harder: don't just distort the image; use the craziest fonts you can.

--
Information theory is life. The rest is just the KL divergence.

Couldn't that be part of the test? by mengel · 2008-10-02 04:13 · Score: 2, Interesting

Couldn't you do a captcha where the first presentation has no cats? The user has to hit the refresh once or twice before seeing a cat, and then pick it; if they pick any of the non-cats, you call them a 'bot...
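A rough sketch of how that refresh-then-pick test might be judged, under assumptions not in the comment: image labels stand in for real pictures, `make_round` and `judge` are hypothetical names, and a refresh is always allowed while any pick of a non-cat fails the test.

```python
import random


def make_round(has_cat, rng, size=4):
    """Build one grid of image labels; 'cat' appears only if has_cat."""
    decoys = ["dog", "car", "tree", "boat", "lamp"]
    grid = rng.sample(decoys, size)
    if has_cat:
        grid[rng.randrange(size)] = "cat"  # overwrite one decoy with the cat
    return grid


def judge(rounds, actions):
    """Return True if the visitor behaves like a human.

    actions[i] is either the string 'refresh' or the index picked in
    rounds[i]. Refreshing moves on to the next grid; the first pick
    decides the outcome. Never picking anything counts as a failure.
    """
    for grid, action in zip(rounds, actions):
        if action == "refresh":
            continue
        return grid[action] == "cat"
    return False


rng = random.Random(0)
# First two presentations contain no cat; the third one does.
rounds = [make_round(False, rng), make_round(False, rng), make_round(True, rng)]
cat_index = rounds[2].index("cat")

print(judge(rounds, ["refresh", "refresh", cat_index]))  # patient human
print(judge(rounds, [0]))                                # picks a non-cat at once
```

A bot that always clicks something on the first grid fails immediately, which is the point of the idea; a bot that learns to refresh until a cat-like image appears is not stopped by this alone.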

--
- "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'

Re:Why by SnowZero · 2008-10-02 04:13 · Score: 2, Interesting

A 1% success rate is good enough to effectively "break" a captcha, but not good enough to really advance the state of machine vision by itself. In the end though, some good OCR work could come of these efforts, but not in comparison to the money and time everyone else loses from spam; we could have just funded the research. Sending spam, and unfortunately writing advanced spam tools, pays better than a university position.
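The arithmetic behind "1% is good enough" can be sketched in a few lines. The 1% figure is from the comment; the volume of 100,000 automated attempts is purely an assumed number for illustration, and the helper names are hypothetical.

```python
def expected_accounts(attempts, success_rate):
    """Expected number of solved captchas (accounts) for a given volume."""
    return attempts * success_rate


def expected_attempts_per_account(success_rate):
    """Mean number of tries until one success (geometric distribution)."""
    return 1 / success_rate


rate = 0.01          # success rate quoted in the comment
attempts = 100_000   # assumed volume, not a measured figure

print(expected_accounts(attempts, rate))
print(expected_attempts_per_account(rate))
```

Because a failed attempt costs an automated solver almost nothing, a solver that needs on the order of a hundred tries per account is still economically useful, even though the same accuracy would be useless as a general OCR result.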

Re:Why by swb · 2008-10-02 04:22 · Score: 4, Interesting

Another benefit is that the drug tests aren't "Have you?" they are "How much do you want?"

Re:Why by daem0n1x · 2008-10-02 04:29 · Score: 2, Interesting

Great. Let's forbid Nmap. Forget that it's a very useful network administration tool. Hackers use it a lot.

Let's forbid cars. Bank robbers use them to escape.

Re:My test: by Jaggo · 2008-10-02 04:38 · Score: 2, Interesting

"I've given up. Please just send me large amounts of email asking me to enlarge my pen15 while remortgaging my sub prime house!"

Actually, Google spam guard hasn't been reported broken just yet..

Re:Great Source by tsm_sf · 2008-10-02 04:58 · Score: 2, Interesting

Yeah I'm especially doubtful about the claim to have broken 'pick-the-cat'. Either they're using a tiny and generic sample pool, they're the most brilliant software authors of all time, or they're full of shit.

The brilliance of the cat idea is that any series of images can be used as long as they can be divided into either Cat or NotCat by a reasonable human. Think car with giant cat ears, person w/ (shudder) fursuit, letterhead of the California Attorney's Tennis league... you'd need to code the entire human concept of the "cat" gestalt and it's simply not possible right now.

This also raises the question of WHY pick-the-cat isn't implemented in more systems, but I'm guessing it's mainly a matter of captcha programmers being too enamored with their own work.

--
Literalism isn't a form of humor, it's you being irritating.

Re:Why by rockmuelle · 2008-10-02 05:02 · Score: 5, Interesting

"I think the real question is: why are these people not working in research institutes? Image recognition is a hard problem. It's baffling that someone with that kind of talent would be working for spammers instead of in a tenured university position."

So, I have a Ph.D. and know how to write this kind of software (well, I know how to go about writing this kind of software and have done it for other domains). Here's why I'm not working at a research institute or pursuing a tenured university position:

First off, research institutes don't really exist anymore. There are a few corporate labs left, but they all focus on medium term product development (5 years out). The national labs still exist, but they're managed like businesses now and it's more difficult to do pure research at them. University "institutes" are just glorified research labs. If you're not the PI, you're either a post-doc, grad student, or tech, none of which is a viable long-term career option.

To get tenure, you have to spend 4-8 years working non-stop writing grants to fund students to do research so you can build up a publication record that impresses the tenure committee. Note that grants and pubs are both necessary: grants show you can bring money into the university, publications get the approval of the committee members outside your domain who only know how to assess research abilities by impact factors.

During this time, all your research is done by graduate students, who are often at the beginning of their careers and have limited technical abilities. They may be brilliant, but they are not the most efficient workers. So, not only do you have to publish, but your labor pool consists of people with 1-3 years experience.

Before tenure, you'll also only pull in about $60-90k/yr (and I know two very smart people who worked for free their first year as "visiting professors" just to get their foot in the door). At the end of this, if you don't get tenure, you're unemployable until you build up some marketable skills.

Contrast this with industry positions. While you don't get to work on whatever you want, there are some very interesting problems out there if you take your time to find a good position. At work, you're hired to do a job, not chase down funding, so you can spend more time working on the fun stuff. The hours are reasonable, so you have time in the evenings for other projects/hobbies (you don't have free time in academia). If you're selective in your employer, you'll also work with people with a broad range of experience and skills. You'll also make more money. And, if you're good and publish from time to time, you can get a tenured position later in life without having to go through the tenure process.

Of course, if you're evil, you can also find work breaking CAPTCHAs and building bot nets.

Note that though this sounds bitter, I'm not... I had a blast going back to school and highly recommend it to people mid-career (hint: go to the mid-west where it's cheap to live and your quality-of-life will remain about the same). But, modern academic environments just don't present an enticing career path.

-Chris

captchas broken. by iam+shaman · 2008-10-02 05:03 · Score: 2, Interesting

who cares, i currently pay 10.00 for 100 social networking accounts from a data entry center in india. their normal business is to create captchas. they have a program that pops up the picture, they enter what they think they see, and when the picture gets a certain percentage of the same entries by multiple agents it completes it. even better, there is another program they use: if they need 1000 gmail accounts, it creates complete profiles on facebook, gmail, myspace, youtube, with pictures, and it just pops up the captcha. that's all they have to type and the account is created. their data entry captcha people work 6 hours a day, 6 days a week, and get between 75 and 100.00 US
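The "certain percentage of the same entries by multiple agents" step is essentially a majority vote. A minimal sketch, assuming a 60% agreement threshold, at least three entries, and case-insensitive matching; none of those numbers come from the comment, and `consensus` is a hypothetical name.

```python
from collections import Counter


def consensus(entries, threshold=0.6, min_votes=3):
    """Return the agreed answer, or None if agreement is not yet reached.

    threshold and min_votes are assumed values for illustration.
    Entries are compared after trimming whitespace and lowercasing,
    which is also an assumption (some captchas are case-sensitive).
    """
    if len(entries) < min_votes:
        return None
    normalized = [e.strip().lower() for e in entries]
    answer, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= threshold:
        return answer
    return None


print(consensus(["xk7q", "XK7Q ", "xk7g"]))  # two of three agree
print(consensus(["abcd", "abce", "abcf"]))   # no agreement
```

With agreement the image is marked solved and the answer submitted; without it the image simply goes back into the queue for more agents.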

Site-specific Q&A, in CAPTCHA form, might work by mickmel · 2008-10-02 06:29 · Score: 2, Interesting

It seems to me that Q&A is the answer, if done properly. The key is to ask something that can only be answered if you're on the site. For example: "Next to the Slashdot logo at the top-left of the page, there is a five-word phrase. What is the second word in that phrase?"

You'd obviously need to change it up fairly often (and large sites would have problems still), but spammers would have a difficult time keeping track of answers for thousands of sites.

To make it even better, have it rotate through a few similar questions for your site, and have the questions be buried CAPTCHA-style in an image.

All told, it would seem to help: they'd have to solve a very long CAPTCHA (117 characters in my example above) AND be on the site to get the answer.
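A minimal sketch of the rotation idea, assuming a small per-site question bank. The questions, the answers, and the names `QUESTIONS`, `pick_question` and `check_answer` are all made up for a hypothetical site; rendering the question into a distorted image is left out.

```python
import random

# Hypothetical question bank for one site. The answers depend on what the
# page actually shows, so they are placeholders here.
QUESTIONS = [
    ("What is the second word of the tagline next to the logo?", "widgets"),
    ("What is the last word of the footer copyright line?", "reserved"),
    ("What is the first word of the top menu item?", "home"),
]


def pick_question(bank, rng):
    """Rotate through the bank by choosing one (question, answer) pair."""
    return rng.choice(bank)


def check_answer(given, expected):
    """Compare answers, ignoring case and surrounding whitespace."""
    return given.strip().lower() == expected.lower()


rng = random.Random(1)
question, expected = pick_question(QUESTIONS, rng)
print(question)
print(check_answer("  " + expected.upper() + " ", expected))
```

The bank is small enough for a site owner to refresh by hand, which is the property the scheme relies on: one site changes a few lines, while a spammer would have to track answers across thousands of sites.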

Re:My test: by ScreamingCactus · 2008-10-02 09:59 · Score: 2, Interesting

I don't see why google doesn't just show a picture out of its index and ask for a word to describe it. The pictures from their index have been tagged by actual humans playing that little game they have, so odds are slim that someone's first and second guesses wouldn't already be tagged to that image. This would be almost impossible to break, because a picture could be anything from a group of words to a picture of a space suit to a painting of Alex Trebek during an earthquake. And they could easily discount images with text and disallow color words (any bot could scan an image and guess "red"). Not only would this deter bots, but it'd probably be easier for someone than trying to decipher a bunch of letters smushed together.
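The acceptance rule described here could look roughly like this. The color list, the sample tags, and the name `accept_guess` are assumptions for illustration; nothing here reflects how Google's image index or labeling game actually stores tags.

```python
# Words any bot could guess from pixel statistics alone (assumed list).
COLOR_WORDS = {"red", "green", "blue", "yellow", "black", "white",
               "orange", "purple", "pink", "brown", "gray", "grey"}


def accept_guess(guess, human_tags):
    """Accept a one-word description if humans already tagged the image with it.

    Color words are rejected even when they appear among the tags.
    """
    word = guess.strip().lower()
    if word in COLOR_WORDS:
        return False
    return word in {tag.lower() for tag in human_tags}


# Hypothetical tags collected from human labelers for one image.
tags = {"astronaut", "spacesuit", "helmet", "red"}

print(accept_guess("Astronaut", tags))
print(accept_guess("red", tags))
print(accept_guess("banana", tags))
```

Allowing a second guess, as the comment suggests, would just mean calling the check twice before failing the visitor.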

--
The path to enlightenment is truly through homemade drugs!
