Gmail CAPTCHA Cracked
I Don't Believe in Imaginary Property writes "Websense is reporting that Gmail's CAPTCHA has been broken, and that bots are beginning to sign up with a one in five success rate. More interestingly, they have a lot of technical details about how the botnet members coordinate with two different computers during the process. They believe that the second host is either trying to learn to crack the CAPTCHA or that it's a quality check of some sort. Curiously, the bots pretend to read the help information while breaking the CAPTCHA, probably to prevent Google from giving them a timeout message."
and I cannot help but wonder if this will increase our usually abysmal rate for reading handwriting. (and no, I don't design it myself so no ripping on me, just work with it)
This is a tangent, but I'm curious: this site blurs out a lot of text, presumably for privacy. How secure is that? It seems like it would be fairly easy (given knowledge of the font, which you have from other parts of the screenshot) to figure out what the underlying text is. I wish people would just black out things they don't want you to know.
This makes one wonder: Is it possible that it is cost effective for spammers to employ low-cost human labor and that they pipe all these captcha challenges to this set of humans whose sole job is to stare at computer screens with pending captcha challenges and answer them?
(I would imagine that this job would have high turnover :) )
Sigh.
Maybe the days of convenient on-demand service signup are coming to an end. Wikipedia already puts new accounts "on probation" for a few days - they can't edit certain articles and can't create new ones.
I see a time when Google and other free-mail providers limit new accounts to a few dozen outgoing messages a day, and raise the limit only when you've 1) logged in to check mail on 10 different days over at least a 30-day period, 2) sent at least 100 distinct messages to at least a few dozen distinct addresses, and 3) actually requested the limit be raised. Those needing higher limits sooner can pay $1 by credit card to have an override code mailed to them.
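For what it's worth, the probation rule above is easy to write down. This is only a rough sketch: the field names, the two limit values and the reading of "a few dozen" as 36 are my own assumptions, not anything a mail provider actually does.

```python
from dataclasses import dataclass

# Assumed numbers: "a few dozen" is taken as 36, the raised limit is made up.
DEFAULT_DAILY_LIMIT = 36
RAISED_DAILY_LIMIT = 500
MIN_DISTINCT_RECIPIENTS = 36


@dataclass
class AccountActivity:
    login_days: int           # distinct days on which the user checked mail
    account_age_days: int     # days since sign-up
    messages_sent: int        # distinct messages sent so far
    distinct_recipients: int  # distinct addresses written to
    raise_requested: bool     # user explicitly asked for a higher limit
    paid_override: bool = False  # the $1 credit-card override


def daily_limit(a: AccountActivity) -> int:
    """Return the outgoing-message limit for an account under the proposed rule."""
    if a.paid_override:
        return RAISED_DAILY_LIMIT
    earned = (
        a.login_days >= 10
        and a.account_age_days >= 30
        and a.messages_sent >= 100
        and a.distinct_recipients >= MIN_DISTINCT_RECIPIENTS
        and a.raise_requested
    )
    return RAISED_DAILY_LIMIT if earned else DEFAULT_DAILY_LIMIT
```

A freshly botted account fails every test and stays at the low limit unless someone pays the dollar, which is the whole point.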
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
How's that relevant?
A linux desktop O/S is just as insecure technically.
The linux (and Apple) desktops are just more secure by the same reason a hut in a small remote village is more secure than an apartment in a big city ghetto - a one room apartment with many locks, metal doors and chains, but where the occupants let in muggers just because they said they were from Ebay.
They're both not secure.
The trick is to NOT have a _one_room_ apartment or hut. You need an "airlock" (sandbox) for your browser (not just rooms for each person).
That raises an interesting idea... why not use the CAPTCHAs to perform some useful work? Example... display a scanned line of text from a project that needs a large volume of text OCR'd for free/cheap. Compare the texts from several submitters, and assume groups with a high match rate are reading it correctly.
This accomplishes three goals:
- fairly effective CAPTCHAs
- accomplishes something
- causes OCR quality to improve (via the hard work of the botnet coders)
Not saying the above example is ideal, just trying to illustrate the idea. Take advantage of available resources (be they real people or botnets) and harvest them to accomplish something practical.
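The "compare the texts from several submitters" step could look something like the sketch below. The function names and both thresholds (at least 3 submissions, 60% agreement) are numbers I picked for illustration, and the normalization is deliberately crude.

```python
from collections import Counter
from typing import Optional


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't split the vote."""
    return " ".join(text.lower().split())


def consensus(submissions: list[str],
              min_votes: int = 3,
              min_agreement: float = 0.6) -> Optional[str]:
    """Return the transcription most submitters agree on, or None if there is no clear winner."""
    cleaned = [normalize(s) for s in submissions if s.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough readers yet to trust anything
    answer, votes = Counter(cleaned).most_common(1)[0]
    if votes / len(cleaned) >= min_agreement:
        return answer
    return None
```

Anyone whose answer matches the consensus passes the test, and the agreed text goes into the OCR project. A line nobody agrees on just stays in the queue.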
I work for the Department of Redundancy Department.
CAPTCHAs are an awful abomination for website usability, and they are becoming increasingly common even though they just don't do what they are supposed to do any more.
So it seems that these companies have two options, either make the letters and numbers more unreadable and more frustrating to users, or scrap them completely and come up with a new anti-bot scheme.
My favorite so far is KittenAuth (http://www.thepcspy.com/kittenauth). It's easy to use, and would be a hell of a lot harder to crack than letters and numbers. Most importantly it's cute! So adorable.
> A linux desktop O/S is just as insecure technically.
Secure from what? Internal or external threats? In the internal case it exhibits better protection from escalation of privilege (than Windows, see the Sony rootkit for an example). In the external case it affords simpler accounting of the processes lying around.
>The linux (and Apple) desktops are just more secure by the same reason a hut in a small remote village is more secure than an apartment in a big city ghetto - a one room apartment with many locks, metal doors and chains, but where the occupants let in muggers just because they said they were from Ebay.
No, it is more secure for some applications because less of the network-facing executable code needs to run at as high a privilege level.
>They're both not secure.
That depends entirely on the threat model you are protecting against. If you want it really secure from the network, take it off the network. If you want it secure from users, put it in a locked room, have multi-person, multi-factor authentication to access it, and require dual operator controls so no individual can pull something off unobserved. This is how PKI centers work. If you want a secure online server, you need accounting of the trusted code. The extent to which Windows and Linux compare is quite different for those cases.
>The trick is to NOT have a _one_room_ apartment or hut. You need an "airlock" (sandbox) for your browser (not just rooms for each person).
Or you might document and analyze your threat model first, before protecting against those threats.
Evil people are out to get you.
If the bots are stalling for time, it's quite likely someone's home-grown version of a Mechanical Turk-style distributed "human" task service, similar to the one by Amazon.
The image is put on a queue, and a good number of, say, overseas employees get the image and fill in the solution as plain text. In the meantime the bot is "reading the manual".
When the bot gets the answer in time, it submits the form and there we go, account.
If the web browser guys could agree on a standard to inform people that their computers look like they're infected, the major email and associated portal providers could start inserting signed messages in web pages that will inform the users that their computers are infected based on this kind of information.
I wonder if it's worth it to Microsoft and Google and Yahoo and AOL to team up to fight these increasingly powerful and sophisticated bot nets.
The typical method, I believe, is to use about 9 or 16 images that you make a binary choice on. You can get every one right on your first try (1/2^9 is one in 512, not very good... especially if you stick in a 10 second or so throttle per IP), or miss one in each of your first two tries (getting 16/18 right is worse than 1/512 I think), but the more you miss the more likely your IP gets blocked for days.
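The odds above can be checked quickly. A small sketch, assuming a bot that guesses each of the 9 binary choices at random, and reading "16/18 right" as exactly one miss on each of two tries (that reading is my assumption):

```python
from fractions import Fraction
from math import comb


def p_exact_misses(images: int, misses: int) -> Fraction:
    """Chance that a random guesser gets exactly `misses` of `images` binary choices wrong."""
    return Fraction(comb(images, misses), 2 ** images)


# All 9 right on a single try: 1/512.
all_right = p_exact_misses(9, 0)

# Exactly one miss on each of two independent tries: (9/512)^2 = 81/262144.
one_miss_twice = p_exact_misses(9, 1) ** 2

# With a 10-second throttle per IP, a random guesser needs 512 tries on average
# for one perfect pass, i.e. 5120 seconds of waiting per IP.
expected_seconds = int(1 / all_right) * 10
```

So 81/262144 is roughly 1 in 3,236, which is indeed worse for the bot than 1 in 512.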
Aren't Google's CAPTCHAs basically the same for all their services (e.g. Google Groups)? I think Google Groups might be seeing quite a bit more spam... Blogger, YouTube/Google Video, and Groups are all services that I could conceivably see getting spammed (assuming that the CAPTCHAs are similar, if not the same; I haven't checked).
Of course, Google being the fast-responding company that it is, they will doubtlessly have a new CAPTCHA by 12 hours from now, if not before.
Would this not be a reliable way to bypass almost all captchas?
Since most have a spoken option for visually disabled people, would it not be possible to activate that and then run a voice recognition app on that sound clip?
Since many voice recognition apps are able to filter noise to some degree, even introducing background clutter would not make it difficult to pull the captcha information.
[All Your Fish Are Belong To Us]
Gmail is loved by spammers since it does not embed in the SMTP headers any tracking information about the client browser's IP address. Hotmail and Yahoo!, for all of their other problems, do, by adding X-Originating-IP headers, etc.
By breaking the CAPTCHA the spammers are basically creating the biggest SMTP IP address laundering system available on the net today. Who in their right mind is going to block Gmail, with the exception of domains that receive small amounts of personal email traffic and temporary IP address reputation scoring systems like SpamCop?
It's true no man is an island, but if you take a bunch of dead guys and tie 'em together, they make a good raft.
1) Spammers break Google CAPTCHA
2) Google responds by taking GMail offline for 12 hours
3) Users are pissed at Google, Google's stock tanks, Spammers keep using Hotmail and Yahoo to spam
4) Other groups realize they can pull off a DoS on Google just by signing up for GMail accounts and spamming.
Everyone has their own pet concerns. Some people worry about pesticides on the food, some about global warming, some about that devil music the kids listen to. There aren't enough hours in the day for everyone to worry about every problem.
^I'm with stupid.^
Imagine yourself in Google's place. You can go up the invitation tree from any node in a single, unique way, and always straight to the very top (or a handful of those). There will be, say, 100 hops from a known bot to the root. Which node is the first human?
There was a presentation at google talk: 'Using Data to "Brute Force" Hard Problems in Vision and Graphics' by A. Efros.
Basically it's not that hard to teach a computer to recognize things if you have a shitload of pre-tagged images.
Google and many universities already have programs that recruit people to do things computers can't do well. One of those that Google already uses is image tagging. Show images and ask people to write down words for what's in them. So they could simply do this with two or three images they recently obtained good label sets for. They could even throw in a fourth, not-yet-labeled image and use the sign-up process to gather new image labels.
There's all sorts of hard problems like this. Another single player game is to show an image with a lot of things in it. Then give a word describing one aspect of the image and ask them to click on the part of the image that conveys that meaning.
Then, if you have many concurrent sign-ups, there are lots of two-player games, both symmetric and asymmetric. For example, a short chat session in the vein of the game "Password", in which one person makes a series of statements about an object ("it is liquid", "it is white", "it is tasty", "you find it in the refrigerator of many homes", "it comes from cows"...) and the other person has to reply with "milk". Then both players are validated.
The last is a very useful AI product by the way especially if the first player is forced to use a controlled grammar where he just fills in some of the nouns or verbs but does not construct the sentence forms. This gathers a set of true assertions about an object that allow computers to learn semantics and meaning.
Some drink at the fountain of knowledge. Others just gargle.
I'm sure that this has been talked about somewhere, but the database of images for a system like this would need to be enormous. HUGE. Essentially a CAPTCHA is a picture, an automatically generated obscured picture where you have to explain the content. The kitten content is a difficult one to crack, but once it is, the picture doesn't change. Once that picture is solved it must be removed from the database. This requires a database that has to contain orders of magnitude more pictures than an individual can crack in real time. For CAPTCHAs these new images are created for each request, something that has to be done for KittenAuth as well, while encrypting content at the same time. Difficult.
Ingredients:
1) A web registration form with a CAPTCHA input;
2) 1 easily-OCRed image;
3) Some creative use of JS/CSS
Depending on how much you want to obfuscate, enclose the CAPTCHA input in a DIV tag, and set that div to display: none. The robot will see the image, OCR it, and fill it out.
Then you reject any application that actually has a value entered for the CAPTCHA, since a human never sees the hidden input and leaves it empty.
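The server-side half of that recipe is tiny. A minimal sketch, assuming the submitted form arrives as a plain dict of strings and that the hidden input is named `captcha_answer` (both the function and the field name are made up for illustration):

```python
def is_probably_bot(form: dict[str, str], honeypot_field: str = "captcha_answer") -> bool:
    """Flag a sign-up as automated if the hidden CAPTCHA input was filled in.

    A human never sees the input (its enclosing div is display: none), so it
    stays empty; a bot that OCRs the image and fills the field gives itself away.
    """
    return bool(form.get(honeypot_field, "").strip())


# Example: a bot helpfully solves the decoy, a human submits it blank.
bot_form = {"email": "a@example.com", "captcha_answer": "x7Kq"}
human_form = {"email": "b@example.com", "captcha_answer": ""}
```

This only works until the bot authors notice and start checking the CSS, but it costs almost nothing to deploy.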