Slashdot Mirror


Looking To Spammers To Solve Hard AI Problems

An anonymous reader writes "With bots getting closer to beating text-based CAPTCHAs for good, New Scientist points out that when they do, OCR technology will at least have advanced. The article goes on to suggest that whatever kind of reverse Turing Test comes next should be chosen to motivate spammers to solve other pressing AI problems, such as image recognition. Are there any other problems that criminal crowdsourcing could help with?"

16 of 271 comments

  1. It was supposed to happen. by plover · · Score: 4, Interesting

    Advancing the state of the art in Optical Character Recognition was always intended to be a side-benefit of CAPTCHAs. It looks like that plan came through nicely.

    I have always figured CAPTCHAs would be a stopgap until other methods of authentication could easily be used, such as micro-payments or single sign-on solutions like OpenID. Unfortunately, those other methods haven't been adopted nearly as fast as the need has grown. Perhaps if CAPTCHAs are declared "dead", site operators will feel more urgency to adopt these solutions.

    If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.

    --
    John
    1. Re:It was supposed to happen. by Ilyakub · · Score: 4, Informative

      Facial recognition is not only pretty good, but is available in consumer applications. Google's Picasa does it quite well for your personal photos, and Face.com can go through your Facebook photos and quite accurately suggest tags.

    2. Re:It was supposed to happen. by fuzzyfuzzyfungus · · Score: 5, Insightful

      But has it?

      Unfortunately, CAPTCHA is radically easier than actual OCR. When cracking a CAPTCHA, achieving a success rate of 5-10% is absolutely fine. Plus, when you submit your answer, you are told whether or not you got it right. With OCR, anything short of high 90's is pretty much useless, and the only feedback available is through manual human intervention, which scales poorly.
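To put rough numbers on why a 5-10% success rate is "absolutely fine" for a cracker: if each attempt is treated as an independent try with a fixed success rate (a simplifying assumption, not something measured in this thread), the mean number of attempts per solved CAPTCHA is just the reciprocal of that rate. The function name below is made up for illustration.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of independent attempts until the first success.

    Assumes every attempt succeeds with the same probability
    (a geometric distribution), which is a simplification.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


for rate in (0.05, 0.10):
    attempts = expected_attempts(rate)
    print(f"{rate:.0%} success rate: about {attempts:.0f} attempts per solved CAPTCHA")
```

Ten or twenty automated requests per account is cheap for a bot, whereas an OCR engine that misreads 90% of a scanned page is useless.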

      Arguably, the only significant OCR advance has been RECAPTCHA, which is just a clever way of making humans do the hard stuff in a way that actually helps, rather than just using makework problems.

      It is certainly true that CAPTCHA cracking has advanced considerably; it just doesn't apply too neatly to real OCR problems.

    3. Re:It was supposed to happen. by digitalchinky · · Score: 4, Informative

      CAPTCHAs have been dead for a long time already. Please direct me to the spam software that can actually read and interpret these for me, because I have about an 80% failure rate. I'm human, the very thing that is supposed to be able to figure all this out. If I see a site asking me to type in some obscure word or number, I click elsewhere. It's just too much trouble.

      Spammers aren't using software to solve this problem anyway! Bold statement, you might say? Maybe. Travel your backside to Asia, or, from the comfort of your own chair, visit sites like sulit.com.ph (think Craigslist wannabe, it's that kind of thing). Every 3rd advert is asking for 'writers' who can log in to forums and post at least 3 messages before getting banned. How much does the lucky employee earn? About $200 USD and up per month. It's real money. So who is paying for this? People like the PHB in 1st world corporate wasteland, maybe your CEO thinking it's a good way to get more hits, maybe you. (No, not you personally.) Evidently it works, or the money wouldn't be flowing and you wouldn't have 3000 people advertising this service each and every day.

    4. Re:It was supposed to happen. by Wannabe Code Monkey · · Score: 5, Interesting

      If CAPTCHAs do continue, I'd like the next problem to be facial recognition software. I'd love a package that could look at a picture and tag it "Nicholas and Andrea" or "Glen and Helene". Digital camera software everywhere could benefit from this technology. Not sure how you'd bake that into a CAPTCHA, but it's a good problem to solve.

      How about this: The user is presented with a short message that they have to mark as "Spam" or "Not Spam". If the spammers get really good at solving this problem, they've effectively written themselves out of a job. And if they can't do it, then they can't get new accounts.

      --
      We always knew Comcast was corrupt, here's the proof: http://tech.slashdot.org/comments.pl?sid=1909890&cid=34545432
    5. Re:It was supposed to happen. by gd2shoe · · Score: 4, Interesting

      I like it, but it has issues that may be hard to work out.

      (1) If they only needed to solve one (or any small number), then the spammer's automated system would only need to guess. Present the potential user with 3 of these and they'll get fed up. The spammer's system, on the other hand, will get about 12.5% (1 in 8) correct by guessing. That's enough for them to thwart the system.

      (2) It's really easy to get samples of spam. Any user who clicks the spam button has stated that it's not their mail. (Multiple users flagging the same message tell you it's practically certain to be spam) It's not a huge stretch to acquire or assume permission to use the message. Getting legitimate samples (of varieties of email) may be much harder.
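The guessing argument in (1) can be checked with a little arithmetic. The sketch below assumes each message is an independent two-way choice ("Spam" or "Not Spam") and that the bot guesses uniformly at random; the function name is hypothetical.

```python
def blind_pass_rate(num_questions: int, num_choices: int = 2) -> float:
    """Probability that uniform random guessing answers every question correctly."""
    if num_questions < 1 or num_choices < 2:
        raise ValueError("need at least one question and two choices")
    return (1.0 / num_choices) ** num_questions


for n in (1, 3, 5, 10):
    print(f"{n:2d} questions: {blind_pass_rate(n):.2%} of blind guesses get through")
```

Under these assumptions three questions let roughly one guess in eight through, and it takes about ten questions to push the rate below 0.1%, which is far more than a human visitor will tolerate.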

      --
      I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
  2. True AI by not-my-real-name · · Score: 5, Funny

    I'll just bet that this is what leads to "true" artificial intelligence (whatever that is). Soon, we'll have completely automated agents trying to convince other completely automated agents to purchase stuff to enhance bits of biology that they don't have.

    --
    un-ALTERED reproduction and dissimination of this IMPORTANT information is ENCOURAGED
    1. Re:True AI by KPU · · Score: 4, Insightful

      This is a reasonably accurate description of the stock market.

  3. But will they share their code? by dameepster · · Score: 5, Insightful

    Spammers are unlikely to share their results with the rest of the world. They're motivated by financial rewards, and there is absolutely no incentive to publicize their methodology in any format.

    Not only would the "good guys" learn from it -- and thus potentially defeat the spammers' discovery -- but other spammers would simply steal their work.

    1. Re:But will they share their code? by ceejayoz · · Score: 4, Informative

      Spammers sell their code to other spammers all the time.

  4. how about... by inzy · · Score: 5, Interesting

    using spammers to create AI which allows us to catch/ignore/prevent spamming?

  5. Re:a possible idea by MichaelSmith · · Score: 4, Interesting

    I used to work on a traffic signal system in Australia. At one point we hosted an experimental system from (I think) the CSIRO which displayed the speed you would have to travel at to get a green at the next intersection. The problem was that it gave accurate but really bad advice, like travel at 12km/h or 80km/h, where the limit is 60. So they changed it to only display speeds below and close to the limit, and then it was even more useless.
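A toy model shows why such a display ends up either unhelpful or blank. This is not the experimental system described above; it is a sketch that assumes the distance to the next stop line and the start and end of the next green phase are known, and every name and the 0.8 "close to the limit" fraction are invented for illustration.

```python
from typing import Optional


def advisory_speed_kmh(distance_m: float,
                       green_start_s: float,
                       green_end_s: float,
                       limit_kmh: float = 60.0,
                       min_fraction: float = 0.8) -> Optional[float]:
    """Return a speed (km/h) that reaches the next signal during green, or None.

    Only speeds at or below the limit, and no lower than
    min_fraction * limit, are considered worth displaying.
    """
    if distance_m <= 0 or green_end_s <= 0 or green_end_s <= green_start_s:
        raise ValueError("invalid distance or green window")
    # Slowest usable speed arrives just as the green ends.
    slowest = distance_m / green_end_s * 3.6
    # Fastest usable speed arrives just as the green starts
    # (unbounded if the signal is already green).
    fastest = float("inf") if green_start_s <= 0 else distance_m / green_start_s * 3.6
    candidate = min(fastest, limit_kmh)
    if candidate < slowest or candidate < min_fraction * limit_kmh:
        return None  # nothing legal and sensible to display
    return candidate


print(advisory_speed_kmh(400, 20, 40))   # a speed at the limit works
print(advisory_speed_kmh(100, 25, 30))   # only a crawl would work: nothing shown
print(advisory_speed_kmh(400, 10, 18))   # only speeding would work: nothing shown
```

With the filter in place, the sign stays blank in exactly the cases where the unfiltered answer would have been a crawl or a speeding offence.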

    The actual algorithms which determined the timing of the signals were hand-assembled by traffic engineers in 12 bit PDP/11 machine code, so it was impossible to know exactly how they worked.

    Maybe that system was intelligent. It certainly had a lot of emergent properties.

  6. Re:Busting captchas has not advanced anything... by Jane Q. Public · · Score: 4, Insightful

    I would agree, if general-purpose captcha-beating software were available. But that isn't so. Each captcha system was beaten by custom code, individually written for that system. So in effect, it is not much different than adding a new font to existing OCR software.

  7. Dear Friend, by drolli · · Score: 5, Funny

    My father, a Nigerian spammer, passed away. He left an AI system on a server located in a datacenter. Sadly, during the last phase of his life unpaid data transfer bills accumulated to a sum of $300000. I am already negotiating with the secret services of the world, who want to buy this program for $10000000. I can't pay the data transfer bills, so I turn to you, a trustworthy AI researcher. For $300000 you get a share of $500000000 and the copyright to the source code.

    sincerely yours,

  8. Ignoring the real problem. by blackest_k · · Score: 4, Interesting

    Trying to ensure only humans sign up for things is just a small part of a bigger problem.

    The other night I got JavaScript-redirected away from a page I'd found in Google to a page that pretended to scan Windows on my laptop and find malware. I've seen it many times before; I run Ubuntu, so an XP-like display of my C: and D: drives and various DLL files being scanned isn't very convincing.

    I decided to look into why I'd landed on the original page. Google had the page at about No. 4 for my initial search, but the site was only about 4 weeks old, so why was it ranked so high?

    And the answer is incoming links from around 86,000 pages according to Google (links:domain.name). A lot of them are created internally, passing links from malware site to malware site. But the majority come from sites using PHP forms which add user posts to the sites' pages.

    A number of months ago I found my site's contact forms were sending me a lot of garbage emails absolutely stuffed with URLs, and I wondered why anyone would bother, since I'm not going to visit the sites. Anyway, the cure was to only process forms with no more than a few URLs in them. That stopped the junk hitting the inbox. It hasn't stopped the automated posting, but the forms are not processed and I don't get them any more.
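The form fix described above amounts to counting URLs in a submission and refusing to process it past a small threshold. The original was presumably PHP; what follows is a minimal Python sketch of the same idea, where the regular expression, the threshold of 2, and the function names are all assumptions rather than the poster's actual code.

```python
import re

# Rough URL matcher: anything starting with http://, https:// or www.
# This is deliberately simple and will miss obfuscated links.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

MAX_URLS = 2  # assumed threshold; "no more than a few" in the comment


def count_urls(text: str) -> int:
    """Count URL-looking substrings in a form submission."""
    return len(URL_PATTERN.findall(text))


def should_process(form_text: str, max_urls: int = MAX_URLS) -> bool:
    """Accept the submission only if it contains at most max_urls URLs."""
    return count_urls(form_text) <= max_urls


genuine = "Hi, I liked your article. My own site is http://example.com if you're curious."
junk = "buy now http://a.example http://b.example www.c.example https://d.example"

print(should_process(genuine))  # accepted: one URL
print(should_process(junk))     # rejected: four URLs
```

The same check applied to forum and guestbook scripts would keep link-stuffed posts from ever being published, whether or not the poster got past a CAPTCHA.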

    When I examined the links to the malware site, I found PHP-posted user posts packed with links just like my emails had been, the difference being that these were published and being crawled. Because of these links, a site less than 4 weeks old is ranked highly on the quantity of inbound links, and that's why I got to watch a display of XP-like virus and malware scanning.

    I also examined the content of the pages of the original malware site and the subjects varied quite widely but they also seemed to have a relation with the trends that google was showing for related keywords in the weeks before the site went live. I've a feeling that the pages were generated by pulling content from legitimate sites that ranked high in the natural search.

    I guess site owners tend to think these links are there to spam porn at their users, but they're not; they're there so Google will promote the malware sites with gamed PageRank.

    Clever, isn't it?
    find good key phrases (maybe just using Google Trends)
    scrape content from legit sites and mashup
    create massive array of links to site.
    wait for the fish to arrive and scam them.

    The antivirus scam is antivirus2009, but you only get shown it once.
    Here's a link with removal instructions and some interesting details.

    http://www.2-spyware.com/remove-antivirus-2009.html

    Thing is, the third-party linking sites were using CAPTCHAs, but the real problem was not filtering the posts. If a suitable maximum number of URLs were enforced, the posts would fail and the PageRank gaming would too.

    Fixing the broken PHP and CGI scripts is what's really needed, not just a better CAPTCHA.
    The CAPTCHA is just a Band-Aid on a deeper problem, and webmasters need to deal with the underlying issues.

  9. Nice going, you just invented the tiered net by SmallFurryCreature · · Score: 4, Insightful

    What about people for whom $50 is a year's salary? Congrats, you just split the internet into the rich and the poor. No more accessing the internet from Africa from an old PC powered by a donated solar cell. Good job. You're probably going to get a Nobel Prize.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.