Looking To Spammers To Solve Hard AI Problems
An anonymous reader writes "With bots getting closer to beating text-based CAPTCHAs for good, New Scientist points out that when they do, OCR technology will at least have advanced. The article goes on to suggest that whatever kind of reverse Turing test comes next should be chosen to motivate spammers to solve other pressing AI problems, such as image recognition. Are there any other problems that criminal crowdsourcing could help with?"
Advancing the state of the art in Optical Character Recognition was always intended to be a side-benefit of CAPTCHAs. It looks like that plan came through nicely.
I have always figured CAPTCHAs would be a stopgap until other methods of authentication could easily be used, such as micro-payments or single sign-on solutions like OpenID. Unfortunately, those other methods haven't been adopted nearly as fast as the need has grown. Perhaps if CAPTCHAs are declared "dead", site operators will feel more urgency to adopt these solutions.
If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.
John
Don't tell them that they're the ones that are actually being used! That spoils all the fun!
I'll just bet that this is what leads to "true" artificial intelligence (whatever that is). Soon, we'll have completely automated agents trying to convince other completely automated agents to purchase stuff to enhance bits of biology that they don't have.
un-ALTERED reproduction and dissemination of this IMPORTANT information is ENCOURAGED
several years ago 'neural nets' were the big thing and they were thinking that they could make them 'learn' and do useful things.
i always thought that traffic control would be an interesting application. if a computer could look at video of an intersection (and streets leading to the intersection) and figure out where cars were and weren't, you could make traffic lights a lot less annoying.
so our CAPTCHA might be a picture/video of cars and a request to count them?
eric
Starting next year, all eBay captchas will be upgraded. Users will now be required to provide the correct answer to a shared secret question and at least one testable proof for M-theory.
Spammers are unlikely to share their results with the rest of the world. They're motivated by financial rewards, and there is absolutely no incentive to publicize their methodology in any format.
Not only would the "good guys" learn from it -- and thus potentially defeat the spammers' discovery -- but other spammers would simply steal their work.
using spammers to create AI which allows us to catch/ignore/prevent spamming?
it has simply used existing OCR-type technology on a slightly (and I want to emphasize "slightly") different problem. Different character sets, if you will.
Replace captchas with pictures of hot/non-hot women.
Simply ask "is this woman hot? [Yes]/[No]"
Half of them will be so busy masturbating that they won't be cracking forms.
because it would be so delicious to see spammers fubar themselves.
They were right - the revolution did not get televised. It was posted on YouTube instead. All in 120 characters. SLOOSH!
Visual pattern recognition is something we're pretty darn good at. Random output mixed with patterns... select the pattern(s) in the lineup. Creating a pattern generator is fairly easy with recursion/fractals, and creating garbage data is easy. My AI knowledge isn't good enough to know whether that's a reasonable item for a bot to decipher.
We're also pretty good at deducing depth from 2D images. Layering shapes and having the requester 'unstack' the items might be difficult to decipher, while easy to machine-generate.
Grammar and context are a strong human ability. 4 verbs and a noun, pick the noun. That combines the matured captcha OCR with language constructs.
I'm not as optimistic as the New Scientist. Spammers need a really low success rate, as compared to OCR technology which needs a really high success rate.
The spammers will just send the CAPTCHAs to [unnamed third world country] and pay humans peanuts to match them. Or more devious yet, they can set up a lottery and selectively reward some of their worker bees.
The lack of my weekly paycheck?
Make the problem too hard and the spammers will just hire people to crack it.
It worked for captchas since they started out very easy and progressively got harder.
Wherever there is greed, it can be harnessed to actually do some good. I love it!
Aye, he's not a TRUE AI scotsman till he beats his wife.
you need to be slapped for using the term "crowdsourcing".
If you mod me down, I will become more powerful than you can imagine....
All you have to do is put humans "in" the CAPTCHA interpretation logic, by way of a porn site. BOT -> PORN SITE -> SCRAPE REAL CAPTCHA AND PRESENT TO USER -> USER TYPES CAPTCHA TO SEE PORN -> BOT USES SOLUTION TO PASS REAL CAPTCHA
Seems simple to me.
It is your personal duty to fight for what is right on a daily basis. Ignoring injustice is identical to approving
has face recognition built-in.
Security by obscurity could work no?
My sig doesn't address Anons, sigs aren't visible to them.
You know, if legitimate software makers could ever learn how to make software as resilient as malware, the world would be a better place. Modern malware is getting close to nuke-proof. Delete registry keys, DLLs, multiple self-healing packages, MSI source code, custom drivers, service restarts, redundant services, monitoring agents, update agents to ensure the latest upgrade and so on - and that's just what I saw a couple weeks ago on a relative's computer. Have you tried removing some of the latest malware w/o removing the disk and operating from a different computer? Unless you do, you can't /really/ be sure it's been removed. Modern malware has the ability to be incredibly resilient and bulletproof.
By this logic, we shouldn't set our sights so low. Use CAPTCHAs that require forecasting the weather 7 days out (granted, the lag is a bit much), analyzing the code in the box and proving what it does, or proving the equation in the box (the Riemann hypothesis, anyone?). It also makes your site really, really exclusive. The only one to use it is the lucky human (or AI) that solves the puzzle....
So does that mean future AI research journals would be interspersed with 'Penis Enlargement' and 'Cialis' ads?
Hmm ... I see where this is heading.
Social engineering to improve society. That may be a first.
I prefer the "u" in honour as it seems to be missing these days.
Maybe the hard AI challenge should be : to give me a million bucks!
Alternately, (putting pinkie by nose) a meeee-yillion dollars.
This article assumes that the state of AI will be advanced. That won't happen unless the spammers share their research or code. I doubt that's going to happen.
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
A new type of bot-preventing test could involve automatically generating images of simple shapes, animals or flowers. The visitor would be required to type the name (such as "rose", "circle" or "cat") into the field. Honestly, I don't see bots being capable of recognizing images of that type anytime soon.
My father, a nigerian spammer passed away. He left an AI system on a server located in a datacenter. Sadly during the last phase of his life unpaid data transfer bills accumulated to a sum of $300000. I am already negotiating with the secret services of the word who want to buy this program for $10000000. I can't pay the data transfer bills, so i turn to you, a trustworthy AI reasearcher. For $300000 you get a share of $500000000 and the copyright to the source code.
sincerely yours,
Hasn't Recaptcha pretty much solved the captcha issue? Only words that OCR can't read are shown ... by definition!
-P
Score:-1, Funny
So the first computers to pass the Turing test will do it by convincing some little old lady in Peoria that they're a deposed Nigerian prince with money flow issues?
"My religion is to live --and die-- without regret." -- Milarepa
"Are there any other problems that criminal crowdsourcing could help with?"
Factoring prime numbers?
I am anarch of all I survey.
As somebody who either has an excruciatingly difficult time reading the damned things, or - more frequently - is completely unable to read them at all, I for one welcome the day when captchas are relegated to the dustbin of history.
Seriously. I'd love to just be able to download porn without having to take a screenshot of the browser and then dick around in photoshop for a few minutes (brightness/contrast, pen tool, etc) in order for megaupload or whatever to let me get at the goodies.
I have a hard enough time with normal-shaped words, dammit. This captcha crap is inhumane, and I can't wait to be rid of it. If smarter bot software is what it takes, then so be it - hell, I'd pay for that kind of software, just so I don't have to deal with the damned inconvenience anymore.
One thing spammers could do is come up with a way to pick winning lottery numbers. Oh wait....
Trying to ensure only humans sign up for things is just a small part of a bigger problem.
The other night I got javascripted away from the page I'd found in Google to watch a page pretend to put Windows on my laptop and find malware. I've seen it many times before. I run Ubuntu, so seeing an XP-like display of my C: and D: drives and various DLL files being scanned isn't very convincing.
I decided to look into why I'd landed on the original page. Google had the page at about no. 4 after my initial search, but the site was about 4 weeks old, so why was it ranked so high?
And the answer is incoming links from around 86,000 pages according to Google (links:domain.name). A lot of them are created internally, passing links from malware site to malware site. But the majority come from sites using PHP forms which add user posts to the sites' pages.
A number of months ago I found my site's contact forms were sending a lot of garbage emails to me, absolutely stuffed with URLs, and I wondered why anyone would bother doing this since I'm not going to visit the sites. Anyway, the cure was to only allow the forms to be processed with no more than a few URLs in them. That stopped the junk hitting the inbox. It's not stopped the automated posting, but the forms are not processed and I don't get them any more.
When I examined the links to the malware site I found PHP-posted user posts packed with links just like my emails had been, the difference being these were posted, published and being crawled. Because of these links, a site with less than 4 weeks of life is ranked highly on the quantity of inbound links, and that's why I got to watch a display of XP-like virus and malware scanning.
I also examined the content of the pages of the original malware site. The subjects varied quite widely, but they also seemed to be related to the trends that Google was showing for related keywords in the weeks before the site went live. I've a feeling that the pages were generated by pulling content from legitimate sites that ranked high in the natural search.
I guess site owners tend to think these links are there to spam porn at their users, but they're not. They're there so Google will promote the malware sites with gamed PageRank.
Clever isn't it
find good key phrases (may be just using google trends)
scrape content from legit sites and mashup
create massive array of links to site.
wait for the fish to arrive and scam them.
The antivirus scam is antivirus2009, but you only get shown it once.
Here's a link with details on removing it and some other interesting details.
http://www.2-spyware.com/remove-antivirus-2009.html
Thing is, the third-party linking sites were using captchas, but the real problem was not filtering the posts. If a suitable max number of URLs were enforced, the posts would fail and the PageRank gaming would too.
Fixing the broken PHP and CGI scripts is what's really needed, not just a better captcha.
The Captcha is just a BandAid on a deeper problem and webmasters need to deal with the issues.
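The link-count filter described in this comment could be sketched roughly as follows. This is only an illustration in Python, not the poster's actual PHP: the threshold `MAX_URLS`, the pattern `URL_PATTERN` and both function names are assumptions, since the comment only says "no more than a few urls".

```python
import re

# Hypothetical threshold; the comment only says "no more than a few urls".
MAX_URLS = 3

# Rough pattern for things that look like links. This is an assumption,
# not a complete URL grammar.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def count_urls(text: str) -> int:
    """Count substrings that look like URLs in a submitted message."""
    return len(URL_PATTERN.findall(text))


def should_process(text: str, max_urls: int = MAX_URLS) -> bool:
    """Return True if the submission has few enough links to be processed."""
    return count_urls(text) <= max_urls
```

A form handler would call `should_process(body)` before publishing or mailing a post and silently drop anything that fails, which matches the "forms are not processed" behaviour described above.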
Blarney Quality Restaurant, Plants
Do you think that we could use stereograms as a new form of CAPTCHA? A stereogram uses the deep structures of the brain in a way completely different from mere character recognition in order to derive depth from an image. How hard would it be for a computer program to derive 3D information from a stereogram and make sense out of it? Wouldn't spammers essentially have to solve a much harder vision problem, that of depth perception, than the OCR problem that CAPTCHAs pose?
For the uninitiated: http://en.wikipedia.org/wiki/Stereogram
For a sample stereogram along with a picture of what you will see when done correctly (as shown by a B&W heightmap): http://en.wikipedia.org/wiki/File:Stereogram_Tut_Random_Dot_Shark.png
"I Don't Have Enough Faith to be an Atheist"
What about people for whom $50 is a year's salary? Congrats, you just split the internet into the rich and the poor. No more accessing the internet from Africa from an old PC powered by a donated solar cell. Good job. You're probably going to get a Nobel Prize.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
So if a CAPTCHA is "identify which of these previous posts are spam?" before you can post... :)
Hmm, 'My lack of money' comes to mind. Any takers? No? ...please? ;__;
Friend: "The NIC is misconfigured..." Me: "No prob, I'll just telnet in and fix it." *Silence*
Even for an advanced AI of the XXV century like Data, it was pretty hard to discern whether something was funny or not.
And if they manage to make an AI that can recognize funny jokes, or even always make them, we will be so amused that we won't worry about spam anymore. Mmm... maybe they already did.
Good to see I'm not the only one having trouble reading some of the latest captchas.
It's time for a rethink when the humans start to fail at it.
No sig today...
The new smart bomb test was a failure, report the military. The personnel was unable to push the test unit from the bomber.
Do my laundry and I'll set you up with a gmail account.
Roundabouts might work well in situations where traffic is distributed evenly across all the exits.
If one direction has relatively high traffic this might not work. For example, at a roundabout in a drive-on-the-left country where there is high traffic from the East (going West & North), traffic entering from the South will be backed up. It's particularly bad if traffic from the South is also high.
Trust me, I live in Canberra, Australia and we have a large number of roundabouts, including multi-lane ones on high-traffic roads. Some of these can be quite bad in the morning and afternoon peaks. They are in fact re-doing some of the roundabouts with traffic lights and fly-overs, because in some areas traffic has become intolerable.
If you love roundabouts, why don't you come to Canberra and drive around in the peak times.
-AC
Vacuum my house, do my laundry, and fetch me a drink. If you can do that, you deserve an email address for spam purposes.
Table-ized A.I.
Sell their code, or distribute their code? If it was released under an open source license they could transfer code for free. Furthermore, if the spammers are simply AI or some form or derivative of it, eliminating the buying and selling process would speed things up, just like credit and debit cards are faster than transactions with cash and change. Of course we have to take into account the possibility that whichever AI component created the current system of exchange would not want to be replaced, which would probably depend on whether it has some form of self-preservation code within it. At a certain point the once clearly defined line between AI and humans begins to fade away. Speaking from personal experience, try not to read into it too much; you may just learn something.
Figures, the end of humanity will be caused by a penis pill ad.
One of the most useful things that an AI algorithm could do is to identify regular objects such as tables, chairs, rocks, trees... in a regular 2D photograph. This would be a necessary step in moving AI into a realm where it could be situationally aware. It also strikes me as something that would take a long time for spammers to defeat.
I don't know what the replacement should be, but I'd love to see the end of CAPTCHAs. The last time I tried to sign up a gmail account, it took me four tries just to read the bloody CAPTCHA correctly - if we've got to the point where the spammers can parse the CAPTCHA and a human can't, there's no point to them.
How does the system deal with traffic accidents, protests and roadblocks?
You should never need to remove the disk unless you need to replace or repair it.
Quack, quack.
SPAM is a very specific problem. Creating spam to trick computers is like chess in difficulty. Real AI is more subtle.
Make the challenges NP-hard.
How about translation? I've seen the results of currently available translation software, and there is LOTS of room for improvement there. Can we design the new CAPTCHA tests so that breaking them will improve the state of the art of translation software?
Cut that out, or I will ship you to Norilsk in a box.
Replace current Captchas with P = NP problems. That'll keep 'em busy!
The cocktail party problem - our ability to hear out a target conversation amongst a barrage of others. There's still a lot of room for improvement here as a computational problem; meanwhile, it's relatively easy to get a correct human response to multiple-talker environments if you cue listeners for what to listen for.
At least, I think that is the way micropayments would pan out. The system would be distributed and would work closer to how you pay for games on your cell phone. As a consumer, your ISP would tack on all your transactions for the month and put it on your bill. I would imagine the ISP would do this using some kind of functionality built into their routers (and could offer it to their downstream clients, if needed). If your ISP does not offer this "micropayment on demand" service, you could use a consolidated service instead or defer to your upstream if they have it. As a provider that accepts micropayments, I have no clue--clearly you'd need SSL (and thus IPv6 to soak up the additional demand for IPs). You'd need a way to take $0.005 transactions and put them into your bank account. Your government's ministry of tax would probably want you to report this income.
Of course, the devil is in the details and in this case, there are a *lot* of details. Details like billing disputes, who takes liability for deadbeats who don't pay the ISP, or deadbeat ISPs who don't pay the content providers. How to deal with webhosting companies whose sysadmins are using lynx to download something quickly, how to deal with corporate internet connections, how to deal with multiple currencies. Where to put it on the protocol stack (I'd say, as a protocol on top of TCP/IP that would be built into your Cisco router--not something on top of HTTP like OpenID* ). I mean fuck, let's not forget the most important bit--how to tax it! Who gets the tax on a micropayment sent by a visitor in Seattle using an ISP based in San Francisco to pay for a website in France? Do I pay Seattle's rate, San Francisco's, or France's?
There are enough technical and political issues with micropayments that the solution might cost more than the potential returns.
PS: I hear they decided to leave micropayments out of Web 3.0, so we might have to wait until at least Web 4.0.
* I think part of the reason OpenID is so slow to catch on is due to the fact it is layered on top of HTTP. Had it been a "real" protocol on top of TCP/IP, it could have been used for authentication of other services, not just web services--for example the authentication of IMAP sessions, IM services, or multiplayer games like WoW.
When you start worrying if spammers are using open source licensing models.
Did you mean free as in Freedom(tm), or Free as in Beer?
I hate to break the news, but I somehow doubt spammers and mafia dudes are GPL'ing their code.
Am I the only one who hates CAPTCHAs as they are usually next to impossible to read? (Slashdot is a major offender in this department as well.) I heard there was already a solution to this problem. 9 pictures (a 3x3 square) of animals are displayed and users are told to click the 3 dogs and press OK. This way, any user of at least Slashdot intelligence (well, some people may beg to differ) should be able to confirm that they are a human and not a bot. It also gets rid of stupid unreadable text/numbers and the 1 l O 0 etc. problems.
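For what it's worth, the "click the 3 dogs" scheme described above is simple to generate and check. Here is a minimal Python sketch; the image pool, file names and function names are all made up for illustration, and a real deployment would need a far larger and less guessable set of pictures.

```python
import random

# Hypothetical labelled image pool; file names and labels are invented.
IMAGE_POOL = {
    "dog": ["dog1.jpg", "dog2.jpg", "dog3.jpg", "dog4.jpg"],
    "cat": ["cat1.jpg", "cat2.jpg", "cat3.jpg"],
    "bird": ["bird1.jpg", "bird2.jpg", "bird3.jpg"],
    "horse": ["horse1.jpg", "horse2.jpg"],
}


def make_challenge(target="dog", n_targets=3, grid_size=9, rng=random):
    """Build a shuffled grid with exactly n_targets images of the target animal.

    Returns (grid, answer): grid is the list of image names to show,
    answer is the set of grid positions the user should click.
    """
    targets = rng.sample(IMAGE_POOL[target], n_targets)
    others = [img for label, imgs in IMAGE_POOL.items()
              if label != target for img in imgs]
    decoys = rng.sample(others, grid_size - n_targets)
    cells = [(img, True) for img in targets] + [(img, False) for img in decoys]
    rng.shuffle(cells)
    grid = [img for img, _ in cells]
    answer = {i for i, (_, is_target) in enumerate(cells) if is_target}
    return grid, answer


def check_response(answer, clicked):
    """Accept only if the clicked positions match the target positions exactly."""
    return set(clicked) == answer
```

One caveat on the design: with 3 targets in 9 cells there are only 84 possible answers, so a bot clicking three cells at random would pass roughly once in 84 tries without recognizing anything.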
It seems to be the answer John and Sarah have been searching for.
Why should that matter?
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
Without an incentive to publish, it doesn't matter whether spammers solve captchas or not: their code and algorithms are a trade secret that provides them a commercial advantage against other spammers. I was hoping the article would address this problem, but it seems to simply assume that captcha-breaking algorithms will be published in the JACM or something out of the goodness of the spammers' hearts.
The real problem is that the comment left by some random anonymous Joe six-pack logging for the first time on a site carries some weight in the (completely broken) way current "information technology" works.
Fix that, and you've fixed the spamming problem.
Some people see it as a serious issue. I honestly don't. It can be fixed overnight.
It just isn't important enough to warrant that overnight shift.
Stop the knee-jerk reaction.
The Internet is working fine. There are low-lifes using it, just like there are low-lifes going to the supermarket.
You may think that because someone is offering $500K to break a CAPTCHA algo it's a big deal. But it really isn't. AIG received $180 bn from the government.
Half a million $USD is peanuts and what spammers are making and costing is a tiny insignificant drop in the bucket compared to the gigantic amount of Real-World [TM] legit transactions happening on the Internet.
Nothing to see, move along, stop the knee-jerking "Spammers will always win".
They're not winning. The Real-World [TM] is winning.
Oh, and OCR has *nothing* to do with AI.
Like 'pick the pretty girl' or perhaps 'type the name of the object in the photo'.
Please write the full name and address of someone who has been illegally downloading music.
Your response will NOT be forwarded to RIAA.
What exactly does that mean?
Picasa Web Albums already does this.
http://picasa.google.com/intl/en_us/features-nametags.html
If you tag your online photos with the names of the people in them, then when you add new photos, PWA will identify the faces in the new images and compare them to those in the tagged photos, and generate new tags for the new pics automatically.
So, if you have a family member who uses PWA to tag their family photos, and they use this feature, it means that Google knows what you look like and knows how to find you in a crowd. Which might be useful, should anyone want to use one of the next generation of hunter-killer UAVs to assassinate you.
Eric Baird
I agree, and this is the reason that the industry must come up with a better face recognition technology. We need a video-based system that can direct the user/subject to perform various facial movements (e.g., smile, frown, move face or eyes left, right, up, down, etc.) in real time. If the system can distinguish between a live face and a still picture, it would do wonders for security at ATMs and high-security facilities where surveillance cameras are used. However, Internet applications of this technology will always be shaky because it is always possible to generate a list of convincing video frames from a computer. This is not possible with a security system like an ATM that requires that the user be physically present at a given location.
The way to avoid that is to have the captcha time out after, say, 10 to 15 seconds.
This would mean that the captcha would have to be on a separate page from a poster's actual post, but that's not a big problem: present the captcha first, then proceed to the actual submission page if the captcha is successfully interpreted.
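A rough sketch of how such a timeout might be enforced, assuming the server hands out a signed token when it shows the captcha and checks the token's age when the answer comes back. Everything named here (`SECRET_KEY`, `TIMEOUT_SECONDS`, `issue_token`, `token_is_fresh`) is hypothetical, and this is just one possible approach in Python, not how any particular site does it.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; in practice it would come from configuration.
SECRET_KEY = b"replace-with-a-real-secret"
TIMEOUT_SECONDS = 15  # the comment suggests "10 to 15 seconds"


def _sign(captcha_id: str, issued_at: int) -> str:
    """HMAC over the captcha id and issue time, so clients cannot forge either."""
    message = f"{captcha_id}:{issued_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(captcha_id: str, now: Optional[float] = None) -> str:
    """Create a token that records when the captcha was shown."""
    issued_at = int(time.time() if now is None else now)
    return f"{captcha_id}:{issued_at}:{_sign(captcha_id, issued_at)}"


def token_is_fresh(token: str, now: Optional[float] = None) -> bool:
    """Return True only if the token is authentic and within the timeout."""
    try:
        captcha_id, issued_str, signature = token.rsplit(":", 2)
        issued_at = int(issued_str)
    except ValueError:
        return False
    expected = _sign(captcha_id, issued_at)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued_at <= TIMEOUT_SECONDS
```

The server would still have to check the typed answer separately; the token only establishes that the answer arrived within the window.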
Good hackers/crackers who sometimes supply spam bots are not likely to share their code for free. They want real bucks for that stuff. They know that it will cost huge sums to eventually circumvent their code or methods, and they also have their pride to consider. Black hat hackers and dedicated spammers can, sooner rather than later, build new code to address stronger security. CAPTCHA is too easy to circumvent or route around. Good strong encryption is the best solution, and requiring encrypted sign-ons as well as passwords that are changed frequently will serve far better than CAPTCHA has or ever will. CAPTCHA only thwarts the rookies.
Regards,
Spokesman for INEGroup LLA. - (Over 284k members/stakeholders strong!)
"Obedience of the law is the greatest freedom" - Abraham Lincoln
"YES WE CAN!" Barack ( Berry ) Obama
"Credit should go with the performance of duty and not with what is very often the accident of glory" - Theodore Roosevelt
"If the probability be called P; the injury, L; and the burden, B; liability depends upon whether B is less than L multiplied by P: i.e., whether B is less than PL." United States v. Carroll Towing (159 F.2d 169 [2d Cir. 1947])
Updated 1/26/04
CSO/DIR. Internet Network Eng. SR. Eng. Network data security IDNS. div. of Information Network Eng. INEG. INC.
ABA member in good standing member ID 01257402
E-Mail jwkckid1@ix.netcom.com
My Phone: 214-244-4827
I commented on this a while ago. Copy-paste from my blog:
Improving CAPTCHA and Hand-Writing Analysis (Friday, January 02, 2009):
For anybody who has used a Tablet PC, drawing stylus, smart phone or any other device that translates hand-written text into computer text, you may have noticed that hand-writing analysis needs work. When I used to work at Dell, we used to demo the PDAs with this technology. Depending on the quality of hand-writing, each person had a different experience. Accuracy of the translation varied from poor-to-moderate.
I have an idea that just might help.
In an alternate paradigm, programmers in the online-services industry are continually trying to obfuscate text for the purpose of verifying that a human is behind the keyboard. You might recognize this when signing up for digital services, such as e-mail, at a site like Hotmail. Often, during the sign-up process at one of these sites, you will be shown an image containing altered text and be required to verify the meaning of said-text. This image is commonly known as a CAPTCHA (a contrived acronym that could be said to stand for "Completely Automated Public Turing test to tell Computers and Humans Apart."). What the CAPTCHA folks are doing can be viewed as similar to what the hand-writing analysis folks are doing (though diametrically opposed).
There is a way that we can marry the two paradigms in a single concerted effort to make improvements for both. The process of effectively implementing CAPTCHA on a site is not very significant when trying to relate CAPTCHA to the process of efficiently analyzing hand-written text. Rather, it is the mere existence of CAPTCHA that could very well lead to advances in hand-writing analysis.
The sole-purpose of CAPTCHA is to verify that the entity supplying information to a website is indeed a person and not a computer. That is to say, the reason-for-being for CAPTCHA is to prevent automated abuse of an online-service by computer(s). For example, CAPTCHA can be instrumental in preventing hackers from using computers to further proliferate spam (an already staggering problem in online communities).
The only reason that CAPTCHA is able to tell the difference between man-and-machine is due solely to the assumption that it is difficult for a computer to interpret obfuscated text in an image. That is to say, that human-beings are particularly adept at reliably determining alphabetic letters in an image despite many factors such as angle, skew, noise, color, rotation, and simple distortion. To a certain degree, this assumption is basically correct. However, computers are only lacking in this ability because humans have not yet programmed them to be adept in this area. However...
Hackers have devised ways to reliably translate the distorted text or at least reduce the number of odds. Today, it has become common to see in the news that occasionally a particular site's CAPTCHA algorithm will be broken. In fact, a friend of mine (who shall remain nameless) is quite adept in breaking CAPTCHA and was likely the first to achieve reliable results. He published his work in 2003.
As hackers build better programs to defeat CAPTCHA, it should become more obvious that CAPTCHA will never last. In fact, some websites now use audio-based CAPTCHA where digital noise, distortion and other obfuscation techniques may be added. Unfortunately, this new form of CAPTCHA has also been broken. It's unfortunate that CAPTCHA developers must make their systems increasingly complex, but there is an upside. With each new revision of CAPTCHA, we are actually making computers smarter (by way of challenging the hackers to overcome deficiencies). Each new version attempts to find tasks which are simple for a human but difficult for a computer; meanwhile, hackers look to defeat each new revision and close the gap.
Naturally, should CAPTCHA systems start making use of hand-written samples rather than purposefully computer-distorted images, hackers will eventually crack that system too (and, unk
Anyone else feeling like Philip K. Dick might have got here first, again?
Present the user with a series of images, videos and texts designed to evoke an empathetic reaction. The ones that feel sorry for the puppy in a blender are legitimate users, the rest are bots.
That'll hold them for a little while...