Looking To Spammers To Solve Hard AI Problems

Posted by timothy on Saturday April 18, 2009 @02:39PM from the dating-site-matchmaking dept.

An anonymous reader writes "With bots getting closer to beating text-based CAPTCHAs for good, New Scientist points out that when they do, OCR technology will at least have advanced. The article goes on to suggest that whatever kind of reverse Turing test comes next should be chosen to motivate spammers to solve other pressing AI problems, such as image recognition. Are there any other problems that criminal crowdsourcing could help with?"

26 of 271 comments

It was supposed to happen. by plover · 2009-04-18 14:40 · Score: 4, Interesting

Advancing the state of the art in Optical Character Recognition was always intended to be a side-benefit of CAPTCHAs. It looks like that plan came through nicely.
I have always figured CAPTCHAs would be a stopgap until other methods of authentication could easily be used, such as micro-payments or single sign-on solutions like OpenID. Unfortunately, those other methods haven't been adopted nearly as fast as they are needed. Perhaps if CAPTCHAs are declared "dead", site operators will feel more urgency to adopt these solutions.
If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.

--
John
1. Re:It was supposed to happen. by Ilyakub · 2009-04-18 15:19 · Score: 4, Informative
  
  Facial recognition is not only pretty good, but is available in consumer applications. Google's Picasa does it quite well for your personal photos, and Face.com can go through your Facebook photos and quite accurately suggest tags.
2. Re:It was supposed to happen. by fuzzyfuzzyfungus · 2009-04-18 15:51 · Score: 5, Insightful
  
  But has it?
  
  Unfortunately, CAPTCHA is radically easier than actual OCR. When cracking a CAPTCHA, achieving a success rate of 5-10% is absolutely fine. Plus, when you submit your answer, you are told whether or not you got it right. With OCR, anything short of the high 90s is pretty much useless, and the only feedback available is through manual human intervention, which scales poorly.
  
  Arguably, the only significant OCR advance has been RECAPTCHA, which is just a clever way of making humans do the hard stuff in a way that actually helps, rather than just using makework problems.
  
  It is certainly true that CAPTCHA cracking has advanced considerably; that just doesn't apply too neatly to real OCR problems.
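
  A rough sketch of why a 5-10% success rate is "absolutely fine" for a CAPTCHA cracker: assuming each attempt is independent and retries are free (both assumptions, not claims about any real site), the chance of getting through at least once climbs quickly with the number of attempts. The function names here are hypothetical.

```python
def chance_of_any_success(per_attempt_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= per_attempt_rate <= 1.0:
        raise ValueError("per_attempt_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every single attempt fails".
    return 1.0 - (1.0 - per_attempt_rate) ** attempts


def expected_attempts_per_success(per_attempt_rate: float) -> float:
    """Average number of tries needed for one success (geometric distribution)."""
    if not 0.0 < per_attempt_rate <= 1.0:
        raise ValueError("per_attempt_rate must be in (0, 1]")
    return 1.0 / per_attempt_rate


if __name__ == "__main__":
    for rate in (0.05, 0.10):
        print(
            f"rate={rate:.2f}: "
            f"~{expected_attempts_per_success(rate):.0f} tries per success, "
            f"P(success within 100 tries)={chance_of_any_success(rate, 100):.4f}"
        )
```

  An OCR job gets no such retries: each character is read once and nobody tells the software whether it was right, which is why the same accuracy is useless there.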
3. Re:It was supposed to happen. by digitalchinky · 2009-04-18 16:29 · Score: 4, Informative
  
  CAPTCHAs have been dead for a long time already. Please direct me to the spam software that can actually read and interpret these for me, because I have about an 80% failure rate. I'm human, the very thing that is supposed to be able to figure all this out. If I see a site asking me to type in some obscure word or number, I click elsewhere. It's just too much trouble.
  Spammers aren't using software to solve this problem anyway! Bold statement, you might say? Maybe. Travel your backside to Asia or, from the comfort of your own chair, visit sites like sulit.com.ph (think Craigslist wannabe, that kind of thing). Every third advert is asking for 'writers' who can log in to forums and post at least 3 messages before getting banned. How much does the lucky employee earn? About $200 USD and up per month. It's real money. So who is paying for this? People like the PHB in the 1st world corporate wasteland, maybe your CEO thinking it's a good way to get more hits, maybe you. (No, not you personally.) Evidently it works, or the money wouldn't be flowing and you wouldn't have 3000 people advertising this service each and every day.
4. Re:It was supposed to happen. by Wannabe+Code+Monkey · 2009-04-18 17:17 · Score: 5, Interesting
  
  If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.
  How about this: The user is presented with a short message that they have to mark as "Spam" or "Not Spam". If the spammers get really good at solving this problem, they've effectively written themselves out of a job. And if they can't do it, then they can't get new accounts.
  
  --
  We always knew Comcast was corrupt, here's the proof: http://tech.slashdot.org/comments.pl?sid=1909890&cid=34545432
5. Re:It was supposed to happen. by Joce640k · 2009-04-18 18:23 · Score: 3, Funny
  
  How about a "hot or not" test? How good are computers at deciding if somebody is hot or not?
  (Yeah, it's a joke, I understand the statistical implications of multiple-choice Turing tests).
  
  --
  No sig today...
6. Re:It was supposed to happen. by gd2shoe · 2009-04-18 18:34 · Score: 4, Interesting
  
  I like it, but it has issues that may be hard to work out.
  (1) If they only needed to solve one (or any small number), then the spammer's automated system would only need to guess. Present the potential user with 3 of these and they'll get fed up. The spammer's system, on the other hand, will get all 3 right about 12.5% of the time (1 in 8) by guessing. That's enough for them to thwart the system.
  (2) It's really easy to get samples of spam. Any user who clicks the spam button has stated that it's not their mail. (Multiple users flagging the same message tell you it's practically certain to be spam) It's not a huge stretch to acquire or assume permission to use the message. Getting legitimate samples (of varieties of email) may be much harder.
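
  The arithmetic behind point (1) can be sketched as follows, assuming a bot that picks "Spam" or "Not Spam" uniformly at random and questions that are independent of each other. The function names are hypothetical.

```python
def guess_success_probability(num_questions: int, choices_per_question: int = 2) -> float:
    """Chance that uniform random guessing answers every question correctly."""
    if num_questions < 0:
        raise ValueError("num_questions must be non-negative")
    if choices_per_question < 1:
        raise ValueError("choices_per_question must be at least 1")
    return (1.0 / choices_per_question) ** num_questions


def expected_passes(attempts: int, num_questions: int) -> float:
    """Expected number of sign-ups that get through out of `attempts` blind tries."""
    return attempts * guess_success_probability(num_questions)


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(f"{n} questions: pass rate {guess_success_probability(n):.4%}")
```

  Each extra binary question only halves the bot's pass rate, while the annoyance to a human grows with every question, which is the trade-off the comment is pointing at.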
  
  --
  I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
7. Re:It was supposed to happen. by Captain+Hook · 2009-04-18 20:08 · Score: 3, Interesting
  
  How about a "hot or not" test?
  Since beauty seems to be largely evaluated on symmetry and the ratios of various parts of the face and body relative to other parts, and existing facial recognition systems already work by measuring distances and ratios between those points, I don't think that would be all that hard.
  
  --
  These comments are my personal opinions and do not necessarily reflect the opinions of the other voices in my head.
8. Re:It was supposed to happen. by Anonymous Coward · 2009-04-18 23:08 · Score: 3, Informative
  
  Actually this exists: http://spamornot.org/
SSSHHH!!!! by Anonymous Coward · 2009-04-18 14:44 · Score: 3, Funny

Don't tell them that they're the ones that are actually being used! That spoils all the fun!
True AI by not-my-real-name · 2009-04-18 14:46 · Score: 5, Funny

I'll just bet that this is what leads to "true" artificial intelligence (whatever that is). Soon, we'll have completely automated agents trying to convince other completely automated agents to purchase stuff to enhance bits of biology that they don't have.

--
un-ALTERED reproduction and dissimination of this IMPORTANT information is ENCOURAGED
1. Re:True AI by KPU · 2009-04-18 15:32 · Score: 4, Insightful
  
  This is a reasonably accurate description of the stock market.
a possible idea by ecalkin · 2009-04-18 14:50 · Score: 3, Insightful

several years ago 'neural nets' were the big thing and they were thinking that they could make them 'learn' and do useful things.
i always thought that traffic control would be an interesting application. if a computer could look at video of an intersection (and streets leading to the intersection) and figure out where cars were and weren't, you could make traffic lights a lot less annoying.
so our CAPTCHA might be a picture/video of cars and a request to count them?
eric
1. Re:a possible idea by MichaelSmith · 2009-04-18 15:16 · Score: 4, Interesting
  
  I used to work on a traffic signal system in Australia. At one point we hosted an experimental system from (I think) the CSIRO which displayed the speed you would have to travel at to get a green at the next intersection. The problem with that was that it gave really bad, but accurate, advice, like travel at 12km/h or 80km/h. This is where the limit is 60. So they changed it to only display speeds below and close to the limit, and then it was even more useless.
  
  The actual algorithms which determined the timing of the signals were hand assembled by traffic engineers in 12 bit PDP/11 machine code, so it was impossible to know exactly how they worked.
  
  Maybe that system was intelligent. It certainly had a lot of emergent properties.
  
  --
  http://michaelsmith.id.au
But will they share their code? by dameepster · 2009-04-18 14:51 · Score: 5, Insightful

Spammers are unlikely to share their results with the rest of the world. They're motivated by financial rewards, and there is absolutely no incentive to publicize their methodology in any format.
Not only would the "good guys" learn from it -- and thus potentially defeat the spammers' discovery -- but other spammers would simply steal their work.
1. Re:But will they share their code? by ceejayoz · 2009-04-18 15:08 · Score: 4, Informative
  
  Spammers sell their code to other spammers all the time.
how about... by inzy · 2009-04-18 14:53 · Score: 5, Interesting

using spammers to create AI which allows us to catch/ignore/prevent spamming?
Beat them with sex by Anonymous Coward · 2009-04-18 15:01 · Score: 3, Funny

Replace captchas with pictures of hot/non-hot women.
Simply ask "is this woman hot? [Yes]/[No]"
Half of them will be so busy masturbating that they won't be cracking forms.
Re:Busting captchas has not advanced anything... by Jane+Q.+Public · 2009-04-18 15:27 · Score: 4, Insightful

I would agree, if general-purpose captcha-beating software were available. But that isn't so. Each captcha system was beaten by custom code, individually written for that system. So in effect, it is not much different than adding a new font to existing OCR software.
Resilient software by onyxruby · 2009-04-18 15:49 · Score: 3, Interesting

You know, if legitimate developers could ever learn how to make software as resilient as malware, the world would be a better place. Modern malware is getting close to nuke-proof. Delete registry keys, DLLs, multiple self-healing packages, MSI source code, custom drivers, service restarts, redundant services, monitoring agents, update agents to ensure the latest upgrade and so on - and that's just what I saw a couple of weeks ago on a relative's computer. Have you tried removing some of the latest malware without removing the disk and working on it from a different computer? Unless you do, you can't /really/ be sure it's been removed. Modern malware has the ability to be incredibly resilient and bulletproof.
Dear Friend, by drolli · 2009-04-18 16:38 · Score: 5, Funny

My father, a nigerian spammer passed away. He left an AI system on a server located in a datacenter. Sadly during the last phase of his life unpaid data transfer bills accumulated to a sum of $300000. I am already negotiating with the secret services of the word who want to buy this program for $10000000. I can't pay the data transfer bills, so i turn to you, a trustworthy AI reasearcher. For $300000 you get a share of $500000000 and the copyright to the source code.
sincerely yours,
Ignoring the real problem. by blackest_k · 2009-04-18 17:32 · Score: 4, Interesting

Trying to ensure only humans sign up for things is just a small part of a bigger problem.
The other night I got JavaScript-redirected away from a page I'd found in Google to a page that pretended to put Windows on my laptop and find malware. I've seen it many times before; I run Ubuntu, so an XP-like display of my C: and D: drives and various DLL files being scanned isn't very convincing.
I decided to look into why I'd landed on the original page. Google had the page at about no. 4 for my initial search, but the site was only about 4 weeks old, so why was it ranked so high?
The answer is incoming links from around 86,000 pages, according to Google (links:domain.name). A lot of them are created internally, passing links from malware site to malware site. But the majority come from sites using PHP forms which add user posts to the site's pages.
A number of months ago I found my site's contact forms were sending me a lot of garbage emails absolutely stuffed with URLs, and I wondered why anyone would bother, since I'm not going to visit the sites. Anyway, the cure was to only process forms containing no more than a few URLs. That stopped the junk hitting the inbox. It hasn't stopped the automated posting, but the forms are not processed and I don't get them any more.
When I examined the links to the malware site, I found PHP-posted user posts packed with links, just like my emails had been, the difference being that these were published and being crawled. Because of these links, a site less than 4 weeks old is ranked highly on the quantity of inbound links, and that's why I got to watch a display of XP-like virus and malware scanning.
I also examined the content of the pages of the original malware site. The subjects varied quite widely, but they seemed to be related to the trends Google was showing for related keywords in the weeks before the site went live. I have a feeling that the pages were generated by pulling content from legitimate sites that ranked high in the natural search.
I guess site owners tend to think these links are there to spam porn at their users, but they're not; they're there so Google will promote the malware sites with gamed PageRank.
Clever, isn't it?
Find good key phrases (maybe just using Google Trends).
Scrape content from legit sites and mash it up.
Create a massive array of links to the site.
Wait for the fish to arrive and scam them.
The antivirus scam is antivirus2009, but you only get shown it once.
Here's a link with details on removing it and some interesting background:
http://www.2-spyware.com/remove-antivirus-2009.html
The thing is, the third-party linking sites were using CAPTCHAs, but the real problem was that they weren't filtering the posts. If a suitable maximum number of URLs were enforced, the posts would fail and the PageRank gaming would too.
Fixing the broken PHP and CGI scripts is what's really needed, not just a better CAPTCHA.
The CAPTCHA is just a Band-Aid on a deeper problem, and webmasters need to deal with the issues.
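
A minimal sketch of the "no more than a few URLs" check described above. The sites in question used PHP forms; this is Python for illustration only, and the limit of 3 and the names `MAX_URLS`, `count_urls` and `should_process` are assumptions, not anything taken from the original scripts.

```python
import re

# Hypothetical limit; the comment only says "no more than a few URLs".
MAX_URLS = 3

# Count scheme prefixes rather than whole URLs, so links jammed together
# without whitespace are still counted separately.
URL_SCHEME = re.compile(r"https?://", re.IGNORECASE)


def count_urls(text: str) -> int:
    """Return how many http:// or https:// links appear in the submitted text."""
    return len(URL_SCHEME.findall(text))


def should_process(text: str, max_urls: int = MAX_URLS) -> bool:
    """Accept a form submission only if it carries at most `max_urls` links."""
    return count_urls(text) <= max_urls


if __name__ == "__main__":
    normal_post = "Nice article, see also http://example.org/related"
    link_farm_post = " ".join(f"http://spam{i}.example/" for i in range(20))
    print(should_process(normal_post))     # few links: processed
    print(should_process(link_farm_post))  # stuffed with links: dropped
```

This does nothing to stop the automated submissions themselves; it only keeps link-stuffed posts from being emailed or published, which is what removes the PageRank payoff.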

--
Blarney Quality Restaurant, Plants
Nice going, you just invented the tiered net by SmallFurryCreature · 2009-04-18 17:39 · Score: 4, Insightful

What about people for whom $50 is a year's salary? Congrats, you just split the internet into the rich and the poor. No more accessing the internet from Africa on an old PC powered by a donated solar cell. Good job. You're probably going to get a Nobel Prize.

--

MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
1. Re:Nice going, you just invented the tiered net by nemesisrocks · 2009-04-19 12:07 · Score: 3, Insightful
  
  You messed up. CAPTCHA is not a test to tell if your viewers have any money. It is just a test if they are a human or computer.
  Actually, CAPTCHA is usually a test to see if the viewer can read English. The biggest problem with reCAPTCHA is that all of the words are English.
  I can't imagine it'd have anywhere near the success it's seen if it were trying to get you to do OCR for Japanese, or even Polish...
Re:How About Using Stereograms? by adnonsense · 2009-04-18 17:53 · Score: 3, Insightful

What about people like me who can't seem to get the hang of the darn things? (I personally wouldn't be surprised if they're some kind of elaborate hoax...)
Re:Recaptcha by sulliwan · 2009-04-18 20:54 · Score: 3, Informative

You only have to get the word that OCR can recognize right. Just try guessing which of the two words OCR can't recognize and type some random gibberish instead of that word; it will let you through.
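
A sketch of the behaviour described above, under the assumption that the check compares only the known (control) word and merely records whatever was typed for the unknown one. This is an illustration of the comment's claim, not reCAPTCHA's actual implementation; `check_response` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def check_response(
    typed_words: Sequence[str],
    control_index: int,
    control_answer: str,
    votes: Optional[Counter] = None,
) -> bool:
    """Pass the user if the control word matches; the other word is not verified.

    typed_words    -- the two words the user typed, in display order
    control_index  -- 0 or 1, position of the word whose answer is already known
    control_answer -- the known transcription of the control word
    votes          -- optional tally of guesses for the unknown word
    """
    if len(typed_words) != 2 or control_index not in (0, 1):
        return False
    typed_control = typed_words[control_index].strip().lower()
    if typed_control != control_answer.strip().lower():
        return False
    if votes is not None:
        # The unknown word is only collected as a vote, never checked.
        votes[typed_words[1 - control_index].strip().lower()] += 1
    return True


if __name__ == "__main__":
    tally = Counter()
    print(check_response(["overlooks", "xqzzy"], 0, "overlooks", tally))
    print(tally)
```

Under this assumption gibberish for the unknown word still passes, but a bot that cannot tell which word is the control has to solve both anyway.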