Will Solve Captcha for Money?
alx_lo writes "Captchas are a nice idea to protect your blog or guestbook from being spammed by robots.
But what good is this protection when you can hire "data entry specialists" to solve captchas for $0.60 per hour for 50 hours a week?
Anyone here who can think up a solution that does not include drastically changing the global economy? How about captchas that require cultural background knowledge to solve?"
The cultural background idea sounds good, but that may just reduce the number of Captchas these laborers can solve in an hour. A simple internet search should be able to solve these questions. What would be a few examples of a good Captcha for Americans? You will always find a good portion of Americans who are unable to answer even the simplest questions.
US customs has been known to ask cultural questions at border crossings. My sister was once asked what Dan Quayle's parents did for a living after she said she lived in Indiana. This question is a bit before her time. (His parents ran a newspaper in Indiana.) This also raises the question of age. My parents kill me in the original version of Trivial Pursuit that they play, but I win when playing the newest version.
A temporary stopgap measure might be to use the current Captchas in combination with the user's geolocation. I can see, though, how this measure would really anger free speech advocates for the third world.
How about a mathematical Captcha that cannot be solved with a calculator? Well-educated foreigners will not even work for $.60. Then again, how many Americans could solve these?
quis custodiet ipsos custodes
I remember seeing an example of a captcha type game a while back where you would have to pick the hottest girl out of 3 pictures in order to continue..
problem of course is when people disagree on what's "hot"..
MABASPLOOM!
Just get rid of them. Who needs 'em? You don't solve captchas when sending e-mail either, or do you? What bloggers need is a good spam filter, like SpamAssassin is for e-mail.
I agree with the parent post...put up a captcha picture of a PDP-11/40, PDP-11/45, PDP-11/70 and I can identify all of them within half a second.
However....my wife will correctly identify it as a "PDP" but probably won't identify the model
My sister (who is smarter than me) will say "it looks like a computer of some sort"
My niece will identify that it is something electrical
I don't want to see captchas that start to depend on a specific culture to use.
I admin a PHPBB-based forum and the spam (from bots) was getting out of hand. They were going through the built-in CAPTCHA with no problem. The solution ended up being that I had to modify the registration form so that it wasn't just the default form. Throw a couple of oddball questions on the form, make them required, and bots can't deal with it since the bot script can't account for deviations from the norm.
Transistors and Beer!!
My team of fine Southeast Asian workers will remove spam from your web site/bulletin board/blog for a low low price of $.60 US/hour.
Incidentally, for those of you in the market to advertise your wares: My team of fine Southeast Asian workers will circumvent those inconvenient captchas on web sites/bulletin boards/blogs for a low low price of $.60 US/hour.
Here at SweatShopSoftware.com, we have a solution to every problem.
Why are you letting these clowns ruin our country?
"Captchas are a nice idea to protect your blog or guestbook from being spammed by robots."
Captchas? Do you mean cappuccino? And they are a nice idea, for some brick-and-mortar to make money off of selling in tiny amounts.
Seriously, did you know what the author meant before clicking on the link? Ugh, and it uses Wikipedia to translate it. My Zod, where in Houston have we come to?
Have you read my journal today?
from wiki
/. story took me back to my days in the college Theory courses. Oh Happy Days.
CAPTCHA is sometimes described as a reverse Turing test. This term, however, is ambiguous because it could also mean a Turing test in which the participants are both attempting to prove they are the computer.
For some odd reason, this
This still hurts spammers, because spamming is otherwise pretty cheap. Once you've grabbed bots, all you have to do is upload a few hundred KB of scripts to an IRC channel. It's practically zero overhead. This adds some to the equation. Adding overhead puts smaller spammers out of business, and it's the way to win. We can't stop spam, just make it harder.
Yesterday, I saw a presentation by Dr. Luis von Ahn (developer of the ESP Game and other CAPTCHA-type games). He claimed that spammers and porn companies are willing to pay about $2.50 an hour for 720 CAPTCHAs an hour, or about 1/3 cent per CAPTCHA. (The CAPTCHA solving is needed to create more free email spam accounts.) I don't know why people would solve them for so much less...
You mean I can make more than the $0.40/hour I currently make? I need to talk to my boss about a raise...
Perhaps a solution is making the captcha time-intensive? If it takes an additional 30 or 45 seconds, it might cut down on the number of captchas a person could solve in an hour.
This would probably work better for sites where you only enter the CAPTCHA once, say for creating an account.
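To put rough numbers on the time-delay idea, here is a small back-of-the-envelope sketch. It assumes the $0.60/hour wage from the story summary and a baseline of 5 seconds per captcha (which is what the 720-per-hour figure quoted elsewhere in this thread works out to); both are assumptions for illustration, not measurements.

```python
def cost_per_captcha(hourly_wage, seconds_per_captcha):
    """Labor cost of one solved captcha, in dollars."""
    captchas_per_hour = 3600 / seconds_per_captcha
    return hourly_wage / captchas_per_hour


BASE_SECONDS = 5   # assumed baseline: 720 captchas per hour
WAGE = 0.60        # dollars per hour, from the story summary

for delay in (0, 30, 45):
    seconds = BASE_SECONDS + delay
    print(f"+{delay:2d}s delay: {3600 / seconds:6.1f} solved/hour, "
          f"${cost_per_captcha(WAGE, seconds):.5f} each")
```

Under these assumptions a 30 second delay makes each solved captcha seven times as expensive (35 seconds instead of 5), and a 45 second delay ten times as expensive. Whether that is enough to matter depends on the spammer's margin, which nobody here knows.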
Nope. That doesn't look like spam. Looks like some bozos with very poor English copy-pasted from their standard template without understanding what the requirements of the 'project' are.
Refundable micropayments. Seriously. Require people pay $1 to post a comment, payable via paypal or whatever. Once you have checked their comment, you can add them to a whitelist that will never be charged again and refund them their $1. Spammers don't get their dollar back, don't get added to the whitelist, and have their comment removed. The result over the course of a large number of blog entries would be to significantly increase the cost of doing business for spammers, while providing only a very minor inconvenience for legitimate users.
Hmm, what exactly do these data entry folk do? Are they presented with a captcha and they enter the text with the software storing the image along with the text? Is this to do image comparison with or are they training some type of OCR software? Seems like in either case, having an image generator that has enough variations in the "noise" type (ideally it would randomly generate it) would defeat this? Or am I not getting exactly what they are being used for?
Maybe I missed the memo/boat on this, but aren't CAPTCHAs here specifically to stop automated spamming, automated account creation, etc.? After all CAPTCHA == Completely Automated Public Turing test to tell Computers and Humans Apart.
So the real problem is coming up with CAPTCHAs in real-time with no permanent (this session ID) correlation made between the image link and the answer. Then hiring "slave labor" to make this mapping for you will be completely useless.
Then the "other side" will volley back with an image algorithm to thwart CAPTCHA, then we'll get CAPTCHA 2.0 with synergistic AJAX-enabled authentication, and then we'll have Terminators ruling the world.
:wq
This issue quickly runs into the same sorts of problems that copy protection on software does. People who are dedicated to breaking the system will still be able to, but normal people trying to work with the system are just getting annoyed.
It's a mild pain in the ass to match a swirled up picture of letters (I've known the alphabet for about 25 years, and I still get them wrong sometimes), but I'll usually go through it. Make it much more difficult than that, however, and I'm pretty likely to decide it's not worth it, and go waste my time on another website.
The solution to this problem is not to make the visitor do more work, because you can easily drive your visitors away by making your website a hassle. The spam needs to be filtered on the server side, or just deleted as it appears.
I've encountered this problem on my own neglected website, and I haven't found a good solution that I have the skills to implement. I generally just delete the spam as it appears, and I turn off commenting on older posts. This works for my personal site, because it's low traffic, but I'd imagine someone who gets more readers and spam could find the motivation to set up some sort of filtering, similar to email spam filters.
One time I threw a brick at a duck.
I wish I had someone that could have answered the questions at the beginning of Leisure Suit Larry for me when I was 11...I would have broken open the piggy bank to play!
This is why I believe in the future there will be two Internets. The one we have now which is wild and woolly where you can remain anonymous, and one where you can't do anything without a Reputation ID that is tied to a biometric identification method (fingerprint, voiceprint, etc.). There will be third party companies like Google that have Reputation ID accounts and will handle the authentication. The Reputation ID based Internet is where eCommerce, government and medical records, etc. based web sites will live.
I hope to heaven that instead of a biometric authentication, someone can come up with a card reader for driver's licenses or some other ID method, but current events seem to indicate biometric authentication will prevail. Even in that case, I hope it is an "authenticated-user" token passing scheme so that the web site that you want to visit never knows who you are, just that you are a valid user that owns the account ID you claim to own (the Reputation ID web site acts as middleman and privacy shield, pray they are never hacked).
By the way, I don't like the thought of privacy problems and Reputation ID spoofing scenarios this implies. I just don't see any other way to build an Internet with a high degree of trust. As I type this I am looking at the Slashdot captcha box for comments.
Robert Oschler - RobotsRule.com
I know that this has little to do with captchas. However, when forming a group in World of Warcraft I simply ask the applicants to either tell me a joke or insult me. If the insult is good or the joke is comprehensible, then they are in.
...but haven't they been doing this for a few years now? I seem to remember a story, at least a year back, where spammers were giving porn away for free, as long as you solved a captcha every couple views.
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Between crap like this and all of the scam emails coming out of Nigeria, etc., I can't help but think that people in wealthy nations will eventually decide to just ban all Internet traffic coming from poor nations. The value of the legitimate traffic coming from those areas is becoming outweighed by all of the crap that people are willing to pull to earn a relatively small amount of money.
Me: Oh Madeline, You look so good in that nighty, would you like me to iron it for you?
Madeline: You insensitive clod, I am naked!
I helped develop one of the largest websites in Europe (in terms of traffic and volume of content). Human spammers have been bypassing our CAPTCHA for a while now. We still keep the CAPTCHA to block most bots. The data input goes through a custom spam filter. These human spammers are trying to spread their URLs, email addresses, and phone numbers just like most spam, so this helps to a large extent. Anything that gets through that can be flagged as spam by users. On top of all that there's some human moderation by the business which owns the site.
So in the end spam filters can help but human moderation is still the only real working solution today.
Developers: We can use your help.
It'd be a pain in the neck for human users, but requiring a Captcha for every sent message might be enough to make spammers lose money even with cheap labor.
I'd consider that spam.
Here's an idea:
This has no benefit over just requiring a formkey like Slashdot does. Basically you create a hash containing the user's IP and encode a recoverable time stamp into it. Forms are then locked to an IP and expire after a predetermined amount of time. I dunno if slashcode uses a db for its keys? I have a PHP version that works well without a DB, the only problem is around 12:00 AM when hashes generated the previous day are invalidated. Sure, you could hash the captcha text into the key but there's no real benefit in doing so.
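For anyone curious what a DB-less formkey along these lines might look like, here is a minimal sketch in Python (not the PHP version mentioned above, and not slashcode's actual scheme). The secret, the 15 minute lifetime, and the function names are all made up for illustration. It signs the IP together with the issue time using a keyed HMAC and ships the timestamp in the clear next to the digest, so it can be recovered and checked on submit.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"   # hypothetical server-side secret
MAX_AGE = 15 * 60       # assumed form lifetime, in seconds


def _sign(ip, issued_at):
    """Keyed digest over the client IP and the issue time."""
    msg = f"{ip}|{issued_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_formkey(ip, now=None):
    """Build a key of the form '<timestamp>:<digest>' for a hidden form field."""
    issued_at = int(time.time() if now is None else now)
    return f"{issued_at}:{_sign(ip, issued_at)}"


def check_formkey(key, ip, now=None):
    """Accept the key only if it is well formed, unexpired, and bound to this IP."""
    now = time.time() if now is None else now
    try:
        stamp, digest = key.split(":", 1)
        issued_at = int(stamp)
    except ValueError:
        return False
    if not 0 <= now - issued_at <= MAX_AGE:
        return False
    expected = _sign(ip, issued_at)
    return hmac.compare_digest(digest.encode(), expected.encode())
```

Because the expiry is computed from the embedded timestamp rather than from the calendar date, a sketch like this should not have the midnight problem. Like any stateless key, though, it can be replayed from the same IP until it expires.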
Match each band to the model of truck its music is emanating from:
1. Metallica
2. Billy Ray Cyrus
3. Lynyrd Skynyrd
a. GMC truck with double tires on the back
b. Primer-color El Camino with beer cans in the back
c. Shiny red F-150 with aerodynamic truckbed lid
Step into a huge movement. Don't Tread In Me.
Someone forgot this is the World Wide Web, and that not everyone logging onto a given website will necessarily be from any given "culture"!
To fight the war on terror, stop being afraid.
It would be a fine idea if you were trying to keep access down to certain sub-cultures (ie, a captcha showing a picture of Linus Torvalds and one of Linus from Peanuts, asking what they have in common), but on a larger scale it just isn't going to work.
To register, you have to be a "confident" user of a partnership website, like say ebay, paypal, amazon, yahoo, hotmail, google, etc, etc. They can prove that you are a real user, and an open API allows 1-1 relations between your accounts. If you are not registered with any of those websites, you have to get X points using Folding@Home to be trusted.
Animated captchas that change continuously, morphing into a recognizable 133t-speak version of the word to be entered for a short time during the animation. Require entry of this within 10 seconds, from the same IP where the pieces of the animation were sent (to prevent downloading and analyzing it).
Paleotechnologist and connoisseur of pretty shiny things.
Running with your cultural background idea:
Why not take this to the local level, ie, make your captcha refer to website content.
The spammers can circumvent captchas effectively because they make sense out of context. But if your captcha asks for the Author's surname, the name of the website, or the news item's title; suddenly you need to actually know about the blog before posting.
Take this too far though, and it starts to look like those discriminatory voter tests of yesteryear.
Many sites could survive by blocking out regions of the internet. (Many cannot...) So that solution should be implemented more often. When it gets to the point that certain countries are effectively isolated off of the internet, the government will be forced to crack down on the offending activities.
So yeah, let's talk about blocking out China and various Asian and African countries until they get their collective acts together.
For local offenders, let's talk about violence as the solution. At best, these people are a public nuisance; at worst, they are interfering with legitimate commerce, freedom of speech and public recreation. It's always gratifying to see a spammer get jail time and property seizures, but it doesn't happen frequently enough for me to enjoy. I'm still waiting for the news headlines about spammers getting the crap kicked out of them.
a solution that does not include drastically changing the global economy
You're obviously a yank, it's quite easy to tell. Getting rid of spam on your inane blog is a much bigger issue to you than getting rid of slave labour.
p|-|33ar |VI3y xxxtr3N\3 133t_spe4k captcha
fear my extreme leet-speak translation captcha
The real problem with spam after all is not the spammers but the people who respond to it, if nobody bought from spam then there would be no spam. Well at least much less of it. After all it is advertising and spammers are not selling say viagra but selling spam itself.
In any case, with this log of users who actually click on spam links you could then (A) compile an overview of what kind of user actually is stupid enough to respond, (B) educate them or (C) ban them for being too stupid to live.
Considering the budget offered in this ad (30-100 dollars), I don't think the guy is operating with that big a margin already. If you can reduce the number of people who respond to these spams then perhaps simple economics makes the problem go away.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
Just have a human authorize every account creation. For smaller sites (the vast majority of the web) this might introduce a load of one authorization a month. As site size scales upwards, you have more people available to help with authorization. Could use the principles of the Turing test to work through a 2 or 3 email exchange.
;)
Could make the supporting cgi scripts as simple or as complicated as one's willing to author. One forum I maintained for a while had a low level "all access" section where new users posted an application. Forum regulars would respond, and eventually grade the new user. If they passed, they were given full access to the board. Granted, this system was employed more to limit the quantity of asshats than spammers, but the same principles apply.
It might even benefit society in the long run as a spammer's urge to do his work forces him to develop a "true" AI.
Use a browser speech plugin to play a string of words randomly selected from a large dictionary, ask the user to repeat them.
Good for blind people, too.
I've visited a Japanese art site (ie pictures of characters from fighting games drawn in alarmingly extreme detail) which had roughly this on the front page:
"Because there have been some people coming in here and stealing pictures or linking without permission, I have had to put this small test up. Please enter the Emperor's birth date in Japanese calendar in the box below. I'm sorry for this inconvenience and I will remove it when they forget about this site."
I've also seen a site (again in the 'students with too much time on their hands' sector) that asked for some other date in Japanese calendar. There are also a fair few personal sites that have a front page with just one link that takes you in, and several spurious links, with the page being 100% Japanese text -- which I think serves about the same purpose.
On a related note, there also used to be WinMX groups which required that you say something in Japanese on entering or be booted. The point there was that otherwise you'd get masses of Korean 12-year-olds coming in and going 'Fuk Japanese bitch! dokdo nun uri tang!!lolz0rz!' and generally spamming the place. At least, I hope they were 12.
So, cultural captchas certainly exist... but it's easy to see why they work better on 'my pictures of Vampire Hunter D' sites than in the commercial world.
Whence? Hence. Whither? Thither.
Use a Java applet that contacts the server for a serial number, which is then used in conjunction with the client's IP address to generate a picture where the user has to click in a pattern to verify that he is not a robot. When the correct sequence is clicked, the applet contacts the server and informs it that the client with IP address so-and-so and session ID xxx using serial number yyy is an interactive user.
--- Reality doesn't care about your opinions, it happens anyway and if you are in the way you'll get squished.
Someone needs to set up a realtime blackhole list for blog spammers. I realize there are challenges there, but anything to put a dent in the problem will help.
Here's something I've not seen done yet: Put your blog or board under .htpasswd/.htaccess protection and include the username and password on your main site or even in the AuthName along with login instructions. That's probably easier than deciphering an image and something bot scripts are (hopefully for now) unprepared for.
-K
How about captchas that require cultural background knowledge to solve?
If the captcha does not itself contain all the information required to solve it, some legitimate users will be unable to solve it.
Now, simple riddles would at least require mastery of the language instead of mere character recognition skills. However, requiring language only raises the $/hour cost of solving them a little. More importantly, even easy riddles are much harder to generate for captchas than random strings. E.g., "What word is fourth in this sentence?"
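As a toy illustration of the generation problem, here is a sketch that produces self-referential riddles of exactly that form. The function names and the fixed sentence template are my own assumptions. Note that it can only ever produce seven distinct questions, which is rather the point: a solver needs one lookup table and is done.

```python
import random

ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh"]


def make_riddle(rng=random):
    """Return (question, answer) for a word-position riddle about the question itself."""
    index = rng.randrange(len(ORDINALS))
    question = f"What word is {ORDINALS[index]} in this sentence?"
    # The template always has seven words, so every ordinal points at a real word.
    words = question.rstrip("?").split()
    return question, words[index]


def check_riddle(answer, submitted):
    """Case-insensitive comparison that ignores surrounding whitespace."""
    return submitted.strip().lower() == answer.lower()
```

Getting real variety would mean varying the sentence template as well, and then you are back to hand-writing riddles.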
"We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
CAPTCHAs can either be easily bypassed by script, or you can get people to do it. The thing is, if you make it harder you start blocking out visitors, maybe those with sight problems who have to use a screenreader, or people with a text only browser.
My blog recently had issues with automated spam, and I found two possible ways of dealing with it.
1) Use a filter like email. WordPress has one available called Spam Karma 2, which measures time it took to fill in the form, JavaScript payload, URL levels, and other things. I found it rather good at catching spam after a little training, but it was quite resource heavy, and even scripts make mistakes once in a while.
2) Use something abnormal. I decided to add a math script. Basically, it produces a simple math question (4 + 9) and asks for the answer. The comment will only submit if a correct answer is provided: the form has a hidden input with a server-side produced hash, and the answer is checked against that hash (if the hash is missing it automatically fails). Many spam bots don't know how to handle math, so they fail. To disguise the question from 'alert' bots, one only needs to add surrounding characters or convert things (+ => plus, 9 => nine), etc.
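A rough sketch of how such a math question could be wired up, in Python rather than as a WordPress plugin. The post above does not say how the hash is produced, so the keyed HMAC, the per-form nonce, and all names here are assumptions. (A plain unkeyed hash of a small number would be trivial to brute-force, which is why a secret is mixed in.)

```python
import hashlib
import hmac
import random
import secrets

SECRET = b"change-me"  # hypothetical server-side secret

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def _token(answer, nonce):
    """Keyed digest of the expected answer, bound to a per-form nonce."""
    msg = f"{nonce}|{answer}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_question(rng=random):
    """Return (question, nonce, token); nonce and token go into hidden inputs."""
    a, b = rng.randrange(10), rng.randrange(10)
    nonce = secrets.token_hex(8)
    # Spell the numbers and the operator out, as suggested above.
    question = f"What is {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]}?"
    return question, nonce, _token(a + b, nonce)


def check_answer(submitted, nonce, token):
    """Fail closed when the hidden fields are missing or the answer is not a number."""
    if not token or not nonce:
        return False
    try:
        answer = int(submitted.strip())
    except ValueError:
        return False
    return hmac.compare_digest(token.encode(), _token(answer, nonce).encode())
```

One limitation of this sketch: a bot that has seen one valid question, nonce, and token can replay them forever unless the server also remembers used nonces or adds an expiry.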
It occurs that we need to revisit the problem here.
Captchas are a way of trying to determine that there is a human at the keyboard rather than a bot. If we verify that we have a human doing the work here and not a computer then we can be assured that the service we're providing is not being harvested for nefarious means.
The problem with this is that human labor is cheap, especially if you find some way around the various sweatshop protections we have in the civilized world. How much do Chinese gold farmers make?
Perhaps the questions we ought to be asking in order to prevent harvesting are 'Does this client seem like it's not a bot?'
1. Has this client's IP address requested this service in the last 2 minutes?
2. Does this client appear to be a fully-fledged web browser as opposed to a bot that understands HTTP 1.1?
Number 1 is easy. Chuck out requests from IPs that have already made requests in the last few minutes. Fairly intelligent flooding ban algorithms are relatively common.
Number 2 is a little harder, but still straight-forward. Provide the client with a problem that must be solved in JavaScript. The problem should be arranged so that the solution takes a few seconds or so to work out. The client has to send the correct answer back with the request for the service.
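For what it's worth, check number 1 could be as small as the following sketch. The class name is made up, and it keeps everything in one process's memory, which is an assumption that breaks as soon as there is more than one web server.

```python
import time

WINDOW_SECONDS = 120  # "the last 2 minutes" from question 1


class RecentRequestFilter:
    """Rejects a request if the same IP was served within the window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.last_seen = {}  # ip -> time of the last accepted request

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        previous = self.last_seen.get(ip)
        if previous is not None and now - previous < self.window:
            # Rejected requests do not extend the ban.
            return False
        self.last_seen[ip] = now
        return True
```

A real deployment would also want to expire old entries so the table does not grow without bound.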
If both these are observed, it puts a little more strain on those trying to harvest services. The amount they can harvest is limited to the number of IPs they have per requests the server will allow. The client must also understand Javascript and be willing to spend the CPU cycles to work out the math problem hurdle.
The 'live user' who wants to request the service does so fairly easily, but the bot who wants to harvest the service suddenly finds himself up to the ears in bans and 100% cpu usage.
Anyway, this is just a suggestion. I'm convinced that trying to determine that the client is alive is simply the wrong direction. Instead we should be proactive and try to find solutions that hamper the bots.
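One way to build the "problem that takes a few seconds" is a hashcash-style puzzle: the server hands out a random challenge and the client has to find a number whose hash, combined with the challenge, starts with a certain number of zero bits. That particular puzzle is my choice for illustration; the suggestion above does not name one. The sketch below shows the server side plus a reference solver in Python. In the scheme described, the solver would be JavaScript running in the browser, and the difficulty value is a guess that would need tuning.

```python
import hashlib
import secrets

DIFFICULTY_BITS = 12  # assumed; raise it until a browser needs a few seconds


def new_challenge():
    """Random challenge string the server sends along with the form."""
    return secrets.token_hex(16)


def _has_leading_zero_bits(challenge, nonce, bits):
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    # The top `bits` bits of the 256-bit digest must all be zero.
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def verify(challenge, nonce, bits=DIFFICULTY_BITS):
    """Cheap for the server: a single hash."""
    return _has_leading_zero_bits(challenge, nonce, bits)


def solve(challenge, bits=DIFFICULTY_BITS):
    """Expensive for the client: brute-force search over nonces."""
    nonce = 0
    while not _has_leading_zero_bits(challenge, nonce, bits):
        nonce += 1
    return nonce
```

The asymmetry is what makes it work: verifying costs one hash, solving costs about 2^bits hashes on average. The server still has to remember which challenges it issued and refuse reuse, otherwise one solved puzzle can be submitted over and over.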
i could see it if it was something related to the message board...
something that has the topic about electronics could have something like that.. it might also help keep idiots off..
but on slashdot.. all you have to do is bang on a keyboard
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Kitten authentication! It's perfect! Identifying small, cute, furry animals needs a basic cultural background in animals common to the West, but at the same time requires little or no intelligence (plus, it's fun!).
Try it out at http://www.kittenauth.com/node/5. It's currently being rewritten; if you can't see any animals the first time, click 'submit'.
What's purple and commutes? An Abelian grape.
One thing I did just 2 days ago has stopped the CAPTCHA attacks cold. I modified my registration page just slightly to alter its URL. Now, if some lackeys are manually doing every phase of the registration, this is no help at all, but they're trying to be more efficient than that. They don't make their lackeys click the "register" link, and then click on the link confirming they are over 13, etc, etc. Rather, they have tools that automatically traverse these paths or mimic their traversal, and those tools require your installation to literally be identical to all PHPBB installations, as it is their syntax it is capable of parsing and triggering.
The result is that no lackey, apparently, is ever getting rushed right to where s/he sees a CAPTCHA and has a textfield into which to type its text. I've fallen off the radar by opting out of a monoculture in a very tiny fashion. I'm glad to think I've turned the spammer's trick (obfuscation to defeat automated tools) against them.
tone
tone
What about reducing it to a single problem again by accepting comments only via email? Then you can bring the usual tools to bear - forcing server retries, greylists, whitelists, blacklists, analysis, etc.
Just provide the comment email address at the bottom of the article and a uid in the address would make it post to the proper article/story/whatever. Reply to email addresses would have a different uid as well.
Make the mail server moderate for you.
You are checking your backups, aren't you?
Use Flash's ability to do limited 3-D stuff. Show a single letter for a minimum of 3 seconds, requiring the user to rotate the image with the mouse to be able to identify the letter. At the end of that 3 second minimum time, they can click a button to go on to the next letter (or they can take longer if needed). Do this for 5 letters. So it will require at least 15 seconds, which will slow humans down a bit to make life harder for people using sweatshops, and will make it much harder for people to develop bots for. In addition to this, keep track of IPs requesting the captcha and only permit 10 per hour from an IP address.
Have three failed attempts to post lock that IP out for a day.
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
Unless you abandon anonymity and use something like digital certificates that have some kind of identity verification behind them, and ban abusers. Of course, that creates barriers to participation. But, if you can't hold people accountable for breaking the rules of a forum, and you let all the billions of people on Earth, or at least any of them that can find an internet connection, use the forum anonymously, then there will be abuse.
Why not insert a random number of hidden fields followed by the actual fields, interspersed with other random hidden fields, all of which are named a bunch of random letters? Because if you sent them the HTML in the first place, you should also be able to remember that for user X form field "XYZABC" = "name" ... OK, so it kills form autofillers too, but hey, I don't use those :-)
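A minimal sketch of that idea follows. The field list, function names, and the choice to treat any filled-in unknown field as a sign of a bot are all assumptions; the mapping it returns would live in the user's server-side session.

```python
import html
import secrets

REAL_FIELDS = ["name", "email", "comment"]  # hypothetical form fields


def build_form(decoys=4):
    """Return (html_inputs, mapping); mapping is random name -> real field name."""
    mapping = {secrets.token_hex(6): real for real in REAL_FIELDS}
    names = list(mapping) + [secrets.token_hex(6) for _ in range(decoys)]
    secrets.SystemRandom().shuffle(names)  # mix real and decoy fields
    lines = []
    for name in names:
        if name in mapping:
            label = html.escape(mapping[name])
            lines.append(f'<label>{label} <input type="text" name="{name}"></label>')
        else:
            lines.append(f'<input type="hidden" name="{name}" value="">')
    return "\n".join(lines), mapping


def decode_submission(posted, mapping):
    """Map random names back to real ones; return None if a decoy was filled in."""
    for name, value in posted.items():
        if name not in mapping and value:
            return None
    return {real: posted.get(name, "") for name, real in mapping.items()}
```

The visible labels still tell a human which box is which, so a bot that reads labels instead of field names would get through. This only defeats scripts that post to hard-coded field names.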
If you don't risk failure you don't risk success.
You can't have captchas solve your spam problems when the bots are human, that is, humans being paid to solve captchas.
I have seen this kind of attack in phpbb 2.x forums, where you can register a username and the user is displayed in the userlist. Spammers use the homepage field to point to phishing sites.
I was researching captchas a few weeks ago before launching http://www.tinymailto.com/ , and found that the best free PHP captcha out there is http://www.captcha.ru/en/kcaptcha/
Get my e-mail after a captcha test in: http://tinymailt
> How about captchas that require cultural background knowledge to solve?
Assuming, for the moment, that you don't want anybody outside of your own country to be able to use your blog/wiki/whatever (which would probably make you a nationalist). Then assume that you can create a large data set of cultural questions with answers that everybody from your culture/country could answer correctly 99.9% of the time (don't forget to include young people, old people, mentally handicapped people, etc.). Then assume that none of the humans-for-hire are intimately familiar with your culture (not all jobs get outsourced to India, you know?).
So first you have to be a nationalist who disregards the opinions of everybody from another culture. Then you have to come up with a good set of questions and answers. Then you have to personally track down everybody from your culture who is being hired out for CAPTCHA duty (don't forget the expatriates).
Then, and only then, will your idea be of any use.
Pay someone in Outsourcistan 60c/hr to delete the spam, and we can get rid of the damn captchas for good.
The latest Slashdot meme.
A selection of photos (4/5) with radio buttons, and "Select the banana". Obviously you might want to combine it with photos of lemons and melons so that scripts don't just look for a mostly yellow photo and are done. Still doesn't help screen readers much though.
If you don't risk failure you don't risk success.
One solution longer-term is to not allow any html links (or markup in general) in posts or profiles. With no Google-rank spamming possible and no direct way for prospective marks to get in touch it removes most of the incentive to post crap comments in the first place. And pure text-only posts can quite easily be filtered for objectionable content.
Trust the Computer. The Computer is your friend.
I can see Mensa having some really interesting CAPTCHAs :)
"A train leaves...."
"It takes 3 men 4 hours to...."
I've actually had to implement a (graphical) captcha on our company's support site because someone was trying to sell stuff once a day (I think they thought it was a blog, with public comments). It wasn't terribly difficult to do, and stopped it dead.
Have a central server that logs the IP addresses of the customer entering the CAPTCHA. A website owner can query this database with an IP address and find out how often this IP address has registered a CAPTCHA in the last hour. If the CAPTCHA entry rate is higher than a threshold you either throw them out or make them wait longer for entry. So the more often you enter a CAPTCHA the more time it takes you to enter your next CAPTCHA. Obviously for privacy concerns the database holds nothing more than a mapping between CAPTCHA rates and IP addresses.
Can the CAPTCHA drones have their IP addresses anonymized or randomly generated to bypass such a monitoring system?
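The central registry proposed above could be sketched roughly like this. The threshold of 10 entries per hour and the 30 second penalty step are invented numbers, and the class is an in-memory stand-in for what would really be a shared service that site owners query over the network.

```python
import time
from collections import defaultdict, deque

WINDOW = 3600       # one hour, as in the proposal
THRESHOLD = 10      # assumed number of penalty-free entries per hour
STEP_SECONDS = 30   # assumed extra wait per entry above the threshold


class CaptchaRateRegistry:
    """Stores only IP -> timestamps of recent CAPTCHA entries."""

    def __init__(self):
        self.entries = defaultdict(deque)

    def _prune(self, ip, now):
        # Drop timestamps that have fallen out of the one-hour window.
        q = self.entries[ip]
        while q and now - q[0] >= WINDOW:
            q.popleft()
        return q

    def record(self, ip, now=None):
        """Called by a site each time this IP submits a CAPTCHA."""
        now = time.time() if now is None else now
        self._prune(ip, now).append(now)

    def rate(self, ip, now=None):
        """Number of CAPTCHA entries from this IP in the last hour."""
        now = time.time() if now is None else now
        return len(self._prune(ip, now))

    def required_wait(self, ip, now=None):
        """Seconds the site should make this IP wait; grows with the entry rate."""
        excess = self.rate(ip, now) - THRESHOLD
        return max(0, excess) * STEP_SECONDS
```

It does nothing about the question just asked: if the solvers sit behind a pool of proxies, each address stays under the threshold.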
K
The bikini - security through obscurity since 1943
Here is an extremely useful new hack for vbulletin 3.6. http://www.vbulletin.org/forum/showthread.php?t=12 4828&highlight=captcha+questions/.
It allows you to create a list of question/answer combinations that are randomly presented in place of captcha images.
You can then, of course, tailor the questions to your intended audience. This would certainly help curtail 'unwanted' members.
I am surprised that all Slashdot can come up with so far is cultural or mathematical solutions.
I think some sort of game would be a good idea, sorta like the crappy games in Flash advertisements nowadays. Make it difficult enough that it is too time consuming for spammers, but easy enough that people do not get frustrated when trying to register or post.
Ultimately I think that better filtering is probably the solution
One of my message boards has been getting spammed a bit lately, despite the CAPTCHA..
We have recently installed a mod that we can add keywords and urls to. So posts from new users are checked with this.. it needs a bit of fine tuning, but I think eventually it should get rid of most of the spam.
In addition, users can flag posts as spam which are then checked by a moderator
Not a perfect solution of course. Someone could still pay for the answers, but it would take them more time to watch a video than look at one image. The videos might be related to the subject matter of the site and actually be entertaining or informative for valid users to watch. Captcha questions might be a little harder for a topically relevant video to further ensure a user is worth the price of admission.
Amazon went one step further and offers a tool to completely outsource and automate CAPTCHA breaking and almost any kind of human-only online activity: Amazon Mechanical Turk. One can outsource HITs to China or wherever for $0.005 per achieved task.
cut this signatures madness. stop reading them now!
How about presenting a small phrase or story and then asking a couple of questions about the text? Example: Mary and Jim took an empty 2 gallon jar to the well. They filled it up halfway with water. How many gallons of gasoline did they put in the jar? Or: Please sum up all of the occurrences of words that are bigger than 4 letters and less than 6 in the following sentence. Then add all of the vowels in your username: blah blah blah whatever
One interim solution is to split the image into two halves - even better would be to randomly split the image either horizontally or vertically or both. Their harvesting bots won't be programmed for this initially, and they will not know what to do, will grab only one half, etc. When they catch on to this you can use CSS to arrange transparent GIF images in a random order over a background, or even build the characters using background colors and tables. All of these things are easily worked around by the spammers, but you can at least try to stay one step ahead.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
At least if these guys are solving captchas, they're NOT answering customer service calls for cell phone companies...
For each client, send a series of captchas: "solving" "captchas" "formoney?" "one" "thousand" "usdollar" "reward" "for-arrest" "of-your" "employer".
Forget captchas, or at least forget trying to make them more than trivially solvable.
We should be treating blog spam the same way we do email spam - they are essentially the same thing. Bayesian filters and honey pots. Set up blogs that are full of lorem ipsum or something less obvious but meaningless to real humans. Then any posts that end up on those blogs can be considered spam and if similar posts show up on other blogs, just quarantine or even delete them automatically. Similarly we ought to be able to employ bayesian filters that have a broad anti-spam basic training and then are custom-trained to each site.
When information is power, privacy is freedom.
Now, I'm sure something like this exists, but I don't believe it exists for MediaWiki, the wiki engine behind Wikipedia and numerous other wikis, and certainly doesn't exist for a good many blogs or online discussion forums: filter wiki edits, the diffs specifically, and forum posts through a bayesian filter. If the text passes, allow it. If the text is spam, send it to a queue for possible human override. Admins, and possibly a large enough number of votes from other readers, can train the filter by marking posts as spam. Do any forums or wikis have this? Even better, the web area of likely spams could be available for public viewing and double-checking but marked with robots.txt to exclude search engines.
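For what it's worth, the filter-plus-training loop is not much code. Below is a minimal naive Bayes sketch with add-one smoothing, not anything MediaWiki or any forum package actually ships; the class and method names are invented here, and a real deployment would want better tokenizing and far more training data.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy word-count Bayesian filter; labels are 'spam' and 'ham'."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Called when an admin (or enough voters) marks a post or diff."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        total_docs = sum(self.doc_counts.values())
        # Smoothed prior, then smoothed per-word likelihoods, all in log space.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        tokens = self.tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        size = len(vocab) or 1
        spam = self._log_score(tokens, "spam", size)
        ham = self._log_score(tokens, "ham", size)
        return spam > ham
```

Anything is_spam() flags would go to the review queue instead of being dropped, and every human override becomes another train() call.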
I think the key is to have the captchas ask questions such as:
What is the minimum wage in the United States ?
a) four peanuts and a kick in the nuts per day
b) $5.15 per HOUR, that's right, HOUR as in 1/24 th of a day
How much does a spammer pay on average for each solved captcha ?
a) less than a penny
b) more than a penny
What is Falun Gong ?
a) an evil conspiracy to enslave old people into doing calisthenics in public parks
b) a competitor to the brainwashing cult of communism
c) all of the above
The goal would be to get the captchas blocked by the great firewall, trick the workers into doing a google search that will get them arrested, or cause them to rise up and eat their employer's still-beating heart when they realize how exploited they are.
Spamming is a fairly typical nonviolent crime; it's primarily economic in that it forces others to pay for things that benefit the spammer but do not benefit them (much like driving an SUV). That captchas have now forced them to begin paying even a tiny portion of the costs is wonderfully positive news. Unfortunately, the implementation of captchas themselves is an additional cost to the site operator, and of course the cost of the $0.60/hour spam is still imposed on them as well. An economic solution would have to accomplish two separate goals: first, increase the cost of spamming to a level equal to the cost to site operators (and individuals reading mail or text messages, according to the type of spam) to store and process it; second, ensure that those costs are in fact being paid to those people rather than to third parties. This would seem to argue for a combination of technology which is prohibitively expensive to bypass and the sale, if desired, of advertising space to spammers at prices favourable to those doing the storage and processing of spam. If the cost of bypassing captchas is $0.60 per hour, would people be willing to allow an hour's worth of comment spam in their blogs for $0.60? Probably not - at least, I wouldn't. But if it were $6.00? Or $60.00? The curves have to meet somewhere. Nevertheless, it seems unlikely that technology will be developed which will be 100 times as expensive to bypass. Therefore, the death penalty is our best bet for the near future.
... asking for wrong solution. Bots send spam to your comments log. How is it different from sending spam mail to your email address, once they learn about it? They earn money if they are successful, you gain nothing if they aren't. Therefore they can invest some of it in overcoming the barriers you put in their way.
:D (Bwahahahahuhuhahahaha, after going thru all that, they gave money for nothing, lol, rofl)
You can never win, but I suggest making it harder for them (hurting their profit): Unlike with email spam, they lose some money on human "burglars" and the process slows them a bit.
1) generate new ones at as high rate as possible - don't let them reuse one solution.
2) put multiple captchas ( a time-varying number, say... 1-3, so that hired solvers couldn't establish a rhythm - it'll get them tired sooner).
3) slow down both passing thru them and the posting process.
4) occasionally ask submitters to solve yet another captcha after the post was submitted.
5) And, most important, run an adaptive (Bayesian) antispam filter on each submitted post and reject those that fail.
Just moderate your blog if you don't want spam to show up! If you have time to blog, you have time to moderate. And if that's too much work? Outsource moderation (if you can find the same labour pricing!)
3 billion people (half the world's population) live on less than 2 dollars a day. http://www.un.org/esa/socdev/poverty/images/IDEP_flyer_A4.pdf There's going to be some well educated people in there.
How about: "What is the average airspeed velocity of an unladen swallow? "
:-)
If they answer 11 metres per second, then they're obviously African or Asians and so can be denied entry to your site.
IP Addresses are easily spoofed. Spammers just have to randomly generate an IP Address and include it in the IP header request.
Which is actually a huge problem. Something as subjective as "who's hot" is probably the most idiotic idea I've ever heard. It's like asking "what's your favourite food" and thinking that everyone certainly likes the exact same dish that you like.
Even people within the same culture have _vastly_ different tastes. That's why all the niche porn sites exist. E.g., as the "Big Beautiful Women" sites prove, there _are_ people who'd pick a 300 pound girl as the hottest. Or as the "Mature" sites prove, some people will pick the 70 year old grandma as uber-hot. Or if you have an otaku solving that captcha he might go for the bland Japanese schoolgirl just because she looks Japanese, and ignore the gorgeous Swedish supermodel in the next photo. Go figure. But that's the kind of variation that human tastes and fetishes present.
A polar bear is a cartesian bear after a coordinate transform.
... nice board full of bids there now .... :P
(obviously in later stages you need to make sure the division x/g is done to necessary precision, but keeping numbers in fractional rather than decimal form makes the mental calculation easier, if you can handle an answer in that form.)
this method converges quadratically whereas 'trial and error' or a 'binary search' converges linearly. this means by using this method a simpleton from the 16th century could beat you quite easily doing 3-4 digits of accuracy, and could probably find 6 or 7 digits faster than you could if you were doing the divisions on a calculator.
btw i'm not sure if this is the same method you outline above, or if by 'divide, refine' you are simply deciding whether your guess is too big or too small, based on whether g or x/g is bigger. taking the average of the 2 is much better, and not computationally expensive.
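a quick sketch of the averaging step, assuming the method meant here is the usual "replace g with the average of g and x/g" iteration. the function name and the starting guess are just for illustration, and exact fractions are used so no precision is lost between steps.

```python
from fractions import Fraction


def babylonian_sqrt(x, guess, steps):
    """Return the list of successive guesses for sqrt(x), as exact fractions."""
    x = Fraction(x)
    g = Fraction(guess)
    history = [g]
    for _ in range(steps):
        # Average the current guess with x/g; one of them is too big,
        # the other too small, so the mean lands much closer.
        g = (g + x / g) / 2
        history.append(g)
    return history


# sqrt(2) starting from 1: 1, 3/2, 17/12, 577/408
for g in babylonian_sqrt(2, 1, 3):
    print(g, float(g))
```

three steps from a guess of 1 already give 577/408, which is good to about five decimal places, and the number of correct digits roughly doubles on each further step.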
my password really is 'stinkypants'
> How about captchas that require cultural background knowledge to solve?"
Joe Dimaggio, how little we knew ye.
"How many testicles did Tu-pac have?"
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
About half the posters here are confused about this.
You will never see the captcha drones' IP addresses in your logs because they don't communicate with your webserver. The spam bots download the captcha image during the signup process, ask a drone for the solution with a separate request, and use the drone's answer to complete the signup.
IP blacklisting for the spam bots won't work either because they're usually home PCs (maybe owned by the same people you want signing up to your forum). These machines were infected with a worm and centrally controlled by the spammer.
Hands in my pocket
next time you play-- ask them questions from their good box, you get asked questions from your good box.
every day http://en.wikipedia.org/wiki/Special:Random
Isn't this what Google is doing with the Google Image Labeler? It seems the next step for this concept is to use it as a Captcha: if they can show that >75% of Americans label a given picture the same then that would seem to be an acceptable level for Captcha success (seeing that a large number of folks cannot copy text from the screen correctly). They could force you to get 3 out of 4 pictures labeled correctly to post. A long while back a search company came out with an image search tool where you draw what you want to search for. I.e. to search for images of airplanes you draw an airplane. It didn't work all that well but it was a good concept. If this were made to work well enough that most people could use it on a laptop mouse pad then this would seem successful. But if it's too good you won't get in, which would help prevent the image from being broken up and a script tracing it back to the screen. A combination of the two would be even better. Label the picture then doodle the drawing or a Or I could upload my students' homework and you could have a grading captcha. You have to read the question and correct the student's answer... sure it's probably not the best solution but it makes my job easier. Or we could charge 10 cents per captcha use like people want to do with email. That way the captcha slaves will lose money.
Captchas are a tool for discriminating between intelligent humans and stupid scripts. But this is the wrong tool for the job, because the goal isn't to stop scripts and give humans carte blanche.
The goal is to allow responsible behavior and disallow irresponsible or annoying behavior (such as spamming). IMHO the best way to do that is to authenticate identities, and associate a reputation with an identity.
Instead of making the user enter a graphicized word, make them upload a challenge response that has been signed by their OpenPGP key. Now you have a keyid. Check it against blacklists or whitelists, see who has vouched for this user and check their referral reputation, etc.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
1. Get advertisers to pay you to spam blogs and forums
2. Get hassled admins to pay you to delete your spam
3. Profit!
Actually, it is trivial to show that it is true for all n < 10^100.
Ben Hocking
Need a professional organizer?
because it would real difficult to write a computer program that could play tictactoe perfectly. i bet i'm the only person in the world who wrote one in BASIC at the age of 8. in fact this achievement probably makes me the smartest person that has ever lived.
my password really is 'stinkypants'
It should ask a question in Jive or Redneck. I'd love to see a native Chinese speaker try to answer this: "Is this hyar a pitcher of a right fine car?"
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
I prefer the Bilbo line of questioning.
"What's this in my pocket?"
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
The problems for captchas are even greater when you consider one scheme I've heard about:
a) Obtain some porn, make a little site that provides free porn.
b) However, before you see the porn, you have to fill out this captcha.
c) set up your bots to queue up the captchas they hit in their spidering on the porn site.
d) present your porn-hounds the captchas from the bots, the bot gets notified of the answer so it can continue.
This model basically provides you with a vast resource of real-person-answered captchas at a fixed one-time cost (the site setup, and possibly acquisition of the porn). I've been unable to come up with anything you could change the captcha to that would prevent this from working, though perhaps something like a graphical "choose the most seriously mutilated penis" would work...
I've managed to cut down blog spam significantly lately after installing an Anti-CAPTCHA: http://www.timtucker.com/weblog/?p=74
The basic idea is to present a CAPTCHA image that's as easy for a machine to understand as possible and then ask the user to type in something else. (in the system that I'm using, users are presented with an unobscured image of a 6-digit number and asked to type in a different 6-digit number).
One of the great things about asking a user to type in something other than what's shown is that it's much more accessible than a regular CAPTCHA, since there's only a 1/1,000,000 chance that someone who can't see will accidentally type in the "right" six digit number.
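A bare-bones sketch of the check, under one big assumption that the description above doesn't spell out: the number the user is supposed to type is given in the plain-text instructions, while the number in the image is a decoy for OCR bots. The function names are made up and this is not the code from the linked plugin.

```python
import secrets


def six_digits():
    """Random 6-digit string, leading zeros allowed."""
    return f"{secrets.randbelow(10**6):06d}"


def new_challenge():
    """Return (decoy, answer): decoy goes in the image, answer in the text."""
    decoy = six_digits()
    answer = six_digits()
    while answer == decoy:  # the two numbers must differ
        answer = six_digits()
    return decoy, answer


def check_submission(answer, typed):
    """Accept only the number from the instructions, never the decoy."""
    return typed.strip() == answer
```

A bot that reads the image and echoes it back fails every time, since the decoy never equals the answer.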
Yes, they teach that stuff in school. What they don't teach is the "culturally appropriate" stuff like how the square root of 3 is George Washington's birthday.
That's, like, 1932, right?
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
What, client-side auth? It's not like GPG keys don't grow on trees, y'know. OK, it might be slow for a spammer to regenerate a new key, but you wouldn't need to do that for every potential post, just the ones once it starts failing.
You're better off asking for a simple computation ("what's 2^3?") or even doing it in javascript ("here, what's the largest prime factor of this huge number?"), I think.
~Tim
--
Rushing on down to the circle of the turn
Keys grow on trees. Keys signed by someone you can trace a path through the WoT to, don't.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
It only means you won't get into Somethingaweful anymore.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
This captcha thing is not the only negative effect of sweatshops. But it's an effect -- one of many.
The cause of sweatshops is trade barriers, and the backwards political systems that keep their subjects in anti-modern societies.
Once the world is fully opened up, and all the anti-technological memes have been killed, the labor market will homogenize. At that time, it will no longer be possible to hire a brain for sixty cents an hour. Only then will this category of spammer/freerider problems be solved.
FATMOUSE + YOU = FATMOUSE
Making your customer "wait" isn't a good idea, I know. But it is a surefire way to keep people out who need to crank out a few dozen or hundred valid usernames per hour to generate revenue. This can also be coupled with questions only your "target audience" is likely to answer correctly (where you also don't let them know if any answers were right until you're fairly sure that it's not just guesswork). That would also ensure that trolls stay away.
That way you could also enforce a policy of "read for a while before raising your voice", where you ask them about a few topics that were discussed recently (so you're pretty sure people know what is acceptable in your board/journal and what is not) and require them to give correct answers.
This, of course, is only applicable if you know that people "want" to come to you; it is most certainly a deterrent for a fair lot of casual posters. But it is definitely something that I'd want in a high profile forum.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Which biological characteristics, exactly, cause someone to know who Britney Spears is?
Stupidity?
Peer pressure?
Infuriate left and right
I think it wouldn't be so crowded here if Slashdotters would actually bang on the keyboard.
Facts of the Problem:
0. You probably do not want to pimp some bogus product using your web site.
0a. You will not get paid for it.
0b. You could get embarrassed by pimping it.
0c. You have to then explain why you support said bogus product.
1. A computer program.
2. Created by a human.
3. Inserts 'weird' content into your Blog.
4. Biostatistics; Given 8 Billion living humans, there exists at least one person that will challenge you.
Solutions:
0. Bad Bloggers are on the job, 24/7.
1. Monitor has to Monitor.
1a. Do not worry about weaknesses, Bad Bloggers will only be willing to show them to you.
1b. Add another format to confuse the robot.
2. Consider using some kind of filtering methods.
2a. Keep a copy of the bad blog message in a separate blog table.
2b. Also keep a copy of all available input data in the separate blog table.
2c. keep notes on what you think is bad blogging.
2d. Add another format to confuse the robot.
3. Consider using at least DHTML solutions.
3a. Use of a random number generator for outputting content formats.
3a1. Makes the robot think.
3a2. Robots do not like to think.
3a3. Robot masters are then forced to think.
3a4. Robot masters do not like to think.
3a5. Add another format to confuse the robot.
3b. Make it more expensive to figure out the HTML formats.
3b1. Consider using XML/XSLT based formats.
3b1a. Use a 'XSLT Preprocessing' statement in the XML document.
3b1b. Place a random number generator statement in the XSLT program.
3b1c. Add another format to confuse the robot.
3c. Get input data of robot masters.
3c1. Do not forget those who would punk you.
3c2. Add another format to confuse the robot.
Use snail mail... Make EVERYONE that wants to sign in have a ID and password that must be MAILED to them to gain access to their account. Until then they use A.C.
--- Relax, that mass murderer is just trying to reduce our carbon footprint, one fetus at a time...
What's more interesting is that people are falling over themselves to work for a few cents an hour. People with internet access. People who can learn to program. See where this goes?
blow up the internet?
Autonomous Retard -- Is your camp safe? UnsafeCamp.com
This talk on Google Video has a bit of info about CAPTCHAs. Apparently some porn sites are displaying occasional CAPTCHAs that their users have to solve before seeing the next page of porn, and then using these solved CAPTCHAs to spam blogs and other sites. The developers get bonus points for creativity, anyway.
just allow everybody to register (no captchas) and put people on a 'probationary' status, their first 2-3 posts won't appear right away but will have to be approved by the moderator: once a user passes 'probation' things will start working as usual. I bet this would reduce spam to 0, as no spambot (human or otherwise) will be able/take the time to create 2-3 on topic posts.
Of course you can set things that if any user posts more than 10 posts before being approved all of them will be deleted and the account banned (to combat flood), auto-expiration of probationary accounts, etc. etc.
This would be a lot harder to do for email accounts etc., but for msg boards/blogs/... I think the above would be pretty much cutting the spam to 0.
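Roughly what I mean, as a sketch. The class names, the "3 approved posts" and "10 pending posts" numbers are placeholders, and I'm reading "more than 10 posts before being approved" as more than 10 posts sitting in the moderation queue at once.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    approved_posts: int = 0
    pending: list = field(default_factory=list)
    banned: bool = False


class ProbationBoard:
    """Hypothetical board where new accounts post through a moderator queue."""

    def __init__(self, posts_to_pass=3, flood_limit=10):
        self.posts_to_pass = posts_to_pass
        self.flood_limit = flood_limit
        self.published = []  # (account name, text) pairs visible to everyone

    def submit(self, account, text):
        if account.banned:
            return "banned"
        if account.approved_posts >= self.posts_to_pass:
            # Probation passed: things work as usual.
            self.published.append((account.name, text))
            return "published"
        account.pending.append(text)
        if len(account.pending) > self.flood_limit:
            # Flooding while still on probation: delete everything and ban.
            account.pending.clear()
            account.banned = True
            return "banned"
        return "queued"

    def approve(self, account):
        """Moderator approves the oldest pending post, if there is one."""
        if account.pending:
            self.published.append((account.name, account.pending.pop(0)))
            account.approved_posts += 1
```

Auto-expiring stale probationary accounts would be one more timestamp field and a periodic sweep, left out here to keep it short.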
-- the cake is a lie
There is a kitty captcha floating around - not kidding - where you get four pictures, and have to click on the one that isn't a kitty.
Captchas will get advanced enough , just like the technologies in decoding them, which will make it a cat and mouse game.
Open Source Java Web Forum with LDAP authentication
The captcha solvers are probably just sent an image, right? So I propose not having the data necessary to solve the captcha available in a predictable way. Perhaps use an image pre-loader which loads MANY captchas and then presents them through pure javascript (you can't parse the html if it doesn't exist, and random is a bit tricky to predict). This way, only one captcha is visible to a human but determining which one that is would require brute forcing 50 or so images with different styles.
Yes, a bit processor and bandwidth intensive, but that challenge can be passed onto somebody else who's had their caffeine today.
-Tim Louden
To which end I recently joined an ad-hoc group running one of the rides at Disneyland. Respectable people have been pinging me whuffie ever since.
And in random order. Just have the first page of your registration have a reverse CAPTCHA and the second page have a real CAPTCHA. But sometimes have it the other way around, randomly of course. You could have the directions as part of the CAPTCHA image. For example, tell people that if the CAPTCHA starts with a number, it's normal and if a letter it's not. Or use Chinese characters if it's not or something. This way, no script nor non-fluent English speaker would be able to pass the test.
Other countries just don't know common names.
Login: _______________
Password: ____________
Write a 500 word essay describing the critical political factors that led to the Mexican-American war: ____________
The best part is, we can pay foreigners $0.60 per hour to grade the questions.
paintball
What can we do with access to a person who will work (albeit limited in skill) for 60 cents an hour?
Maybe pay them to clear blog spam... or clean out our inboxes of spam...
Instead of focusing on eliminating the spammers, who seem to excel at annoying end-users and admins alike, would it not be far more practical to simply stem their source of revenue?
Understandably, some of the less scrupulous sellers may even be out of jurisdiction, but perhaps there could be a way of limiting their profits by confiscating their products at the border, a move which would then be complemented with an advertising campaign informing the populace that supporting spam is in nobody's interest.
Either way, filtering and patrolling can only go so far.
Live for the present, learn from the past, and dream of the future!
Right,
I was thinking you could require 20 seconds before accepting the captcha, and then timing out after 40, but this could still be proxied in real time.
The challenge might be to show a quick succession of letters and require a single keystroke reply to each in turn.
if javascript were used on the web page to create the imagery, it might be difficult to proxy, and if timing were critical and limited, the delays could be noticeable.
imagine a simple animated gif, with a java key capture. the letters/pairs/trios are shown in some timed sequence and compared to the responding keystrokes, this would test in principle, both the latency of the connection, and the latency of the reader - here again suggesting that second languages might have a rather pronounced latency in responding to English word patterns.
To be culturally exclusive, one might use obvious objects and ask for the first letter of each. That is the kind of thing that natives, even non-mathematical natives, can do fairly quickly while non-natives would always need the time induced by internal translation.
AIK
From wikipedia : Paying the human operators with access to pornography instead of money has also been considered.
Where do I sign ?
I didn't found something funny to put here.
Thanks for the insight on how these things work.
The bikini - security through obscurity since 1943
How about taking a page out of "Last Crusade" and having multiple "submit" links, only one of which works. In plain text near the links, say something like "click the blue triangle submit button to not have your post marked as spam." As long as there aren't too many choices to wade through, users won't be terribly inconvenienced.
Sign out, and you'll see a captcha when you try to post. (I think it's once per IP, but I could be wrong.) Also, when you sign up for a new account, I think there's a captcha as well.
Laws do not persuade just because they threaten. --Seneca
Any such will be defeated a very simple way:
Here are 10 p0rn pics. If you want 10 more, please solve this captcha.
Voila, hordes of people will struggle to help you post your spam. And they will be Americans!
vajk
The trouble is showing a picture of a code and typing it is an easy captcha to describe to a data-entry person, they just put a bunch of images and have the data-entry person send the code that matches with each picture.
I think better captchas can be made that aren't quite as easy to represent in a generic way. A possibility would be to show a picture and direct the user to click on certain objects in the picture, before the captcha is completed; the exact coordinates of the first click will be submitted before the next challenge is presented -- after the first click, the captcha will be time sensitive and start over if any click isn't made correctly within 30 seconds.
By having multiple captchas in a successive sequence, you don't get to see the challenge until you successfully answer the first captcha -- it means you can no longer e-mail the data entry person a set of pictures to send you matching codes for.
The representation of a ray used as the plotting cursor in the Logo standard library, right?
For my guestbook (sorry, no demonstration--the site that hosted my PHP is gone... setting it up at a new place later tonight), I use two layers of protection that work really well: One is a set of blacklisted words like "viagra" and "phentermine" that are only ever used by advertisers. The other uses a simple statistical method to determine whether the entry has a distribution of letters within an 0.995% confidence interval of a typical English entry. Surprisingly many spams fall victim to this test because they're either randomized to trick filters or degenerate from "Hi nice site buythis buythis buythis", and the repeated words trigger this filter.
I figured out it was being done by real people long ago. Ever seen that flash animation that goes "You are an idiot, ha-ha-ha-ha-ha!"? Glaring spam gets sent to that flash (I couldn't find the one that blares, "Hey everybody! I'm looking at gay porn!"). Part of my spam problem was that pissed-off spam slaves would simply enter lots of garbage just out of spite after getting that flash. My shiny new statistical filter takes care of that too!
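Since the PHP is offline, here is a rough Python sketch of the two layers rather than the original code. The letter-frequency table is the usual approximate one for English, and the statistic is my guess at the "simple statistical method": a chi-square test against that table, rejected above the 0.995 quantile for 25 degrees of freedom (about 46.93). All names are made up for the example.

```python
import re
from collections import Counter

# Approximate English letter frequencies, in percent.
ENGLISH_PERCENT = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7, "s": 6.3,
    "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8, "u": 2.8, "m": 2.4,
    "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0, "p": 1.9, "b": 1.5, "v": 1.0,
    "k": 0.8, "j": 0.15, "x": 0.15, "q": 0.10, "z": 0.07,
}
_TOTAL = sum(ENGLISH_PERCENT.values())

BLACKLIST = {"viagra", "phentermine"}  # words only advertisers use
CHI2_CRITICAL = 46.93                  # assumed cutoff: 0.995 quantile, 25 d.o.f.


def letter_chi_square(text):
    """Chi-square distance between the text's letters and typical English."""
    letters = [c for c in text.lower() if c in ENGLISH_PERCENT]
    n = len(letters)
    if n == 0:
        return 0.0
    observed = Counter(letters)
    stat = 0.0
    for letter, pct in ENGLISH_PERCENT.items():
        expected = n * pct / _TOTAL
        stat += (observed[letter] - expected) ** 2 / expected
    return stat


def looks_like_spam(text, critical=CHI2_CRITICAL):
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & BLACKLIST:
        return True  # layer one: blacklisted word
    return letter_chi_square(text) > critical  # layer two: odd letter mix
```

Very short legitimate entries can trip a test like this, so the cutoff would need tuning against real guestbook entries.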
~Ben
notice the ad to the right of this article (Related Links)?
If you want to restrict people of a certain culture/country, you can easily restrict access using IP. The problem is a lot of the sites are for everyone and personally I don't want my blogs restricted to people from the US.
With cultural captchas, what we are saying is I want this website restricted to a certain culture or country. There are easier ways to do that. Just use something like Geo IP with netfilter
Tests indicate that I (Anonymous Coward) am 75% human.
Who would've guessed?
Because...you are also a tortoise?
Please, for the good of Humanity, vote Obama.
The issue as I see it is having security and still getting people to fill out your form. I personally will not fill out any form that requires me to think more than a few seconds about the 'captcha' or any other security type measures.
Clearly CAPTCHAs are working as intended, to increase the cost of spamming. Before, the cost of spamming was close to zero; now it costs $0.60 an hour, for say 1000 spams per hour, or $0.0006 per spam. $0.0006 >> $0
Inefficient spam that returns less than $0.0006 will be stopped. Want to further reduce spam? Increase the time it takes to solve a CAPTCHA: instead of 5 letters, use 10 or 15 or 30! At some point legitimate posters will not bother either, but all but the most efficient spam will be removed.
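The arithmetic as a throwaway sketch, using the thread's $0.60/hour figure and my guessed 1000 solves per hour. The function names and the example return per spam are invented for illustration.

```python
def cost_per_spam(hourly_wage, solves_per_hour):
    """What one solved CAPTCHA costs the spammer."""
    return hourly_wage / solves_per_hour


def worth_sending(expected_return_per_spam, hourly_wage, solves_per_hour):
    """Spam only pays if each message earns more than its CAPTCHA cost."""
    return expected_return_per_spam > cost_per_spam(hourly_wage, solves_per_hour)


print(cost_per_spam(0.60, 1000))  # about $0.0006 per spam
print(cost_per_spam(0.60, 250))   # CAPTCHAs four times slower: about $0.0024
```

Slowing the solvers down by a factor of four quadruples the cost per spam without changing the wage at all.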
All these hard-to-answer CAPTCHAs aren't getting us anywhere. What do you think about Celebrity Jeopardy-style questions that commenters have to answer?
- Are horsies pretty?
- "All you have to do to win the game is write down the current year."
- Your Favorite Food
- Letters of the Alphabet
- Where Are You Right Now?
- "Just write a number."
- "Tell you what, you guys just decide. You each write your own question and then answer it."
- Things You Like
- Would You Like A Cookie?
- First Grade Math
(from, of course, Wikipedia)
Captchas need an overhaul anyway.
The false positive rate is too high already with the current lot of image-based captchas and it's only going to get worse as captcha recognition software gets better and requires that the captchas are harder to "see". The number of legitimate users who are turned away is not trivial either. Captchas don't work very well for people with poor eyesight, and the tiny little picture of a speaker or a wheelchair next to the image that is supposed to represent the audio version of the captcha isn't much use to those people because they can't see it either.
Most forum software requires that you have some form of image manipulation library on your server to generate these images, which seems fairly unnecessary anyway. How many forum maintainers actually use gd or ImageMagick other than for captcha generation ?
Language recognition:
Software isn't very good at discerning the meaning of a sentence. Ask a simple, self-contained question that a human will find trivial to answer. eg. "What is the second last word in this sentence ?". or "If I have one white horse and one brown horse, how many horses do I have ?". This, however, may not solve the parent's problem of stopping sweatshop captcha breaking.
Alternate Language:
The way the above method WILL solve the sweatshop problem is through the people in the sweatshop not knowing the language you are using. I suspect that the people involved don't even know what board they are posting to. They just get an image, a text box and a submit button. They don't know English and they almost certainly don't know Japanese. If your board is in a language that they don't understand then replacing image-based captchas with language-based ones will solve the problem. (Temporarily. See below for why...)
Combination image/language:
Ask the user to do something different with the captcha. Sometimes they are asked for it as per normal, sometimes backwards, sometimes only the even-index letters, sometimes you could ask for the odd numbers or only the numbers in a mixed numbers-letters image. Even if you only had a small number of variations that you could ask for, this would significantly reduce the number of successful captcha breaking attempts. Alternatively, it could increase the complexity of the captcha breaking software/sweatshop labourers. It's hard to get people with logic and reasoning skills for $.60/hour.
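A rough sketch of how the variations could be wired up, in Python. The instruction wording and the names `VARIATIONS`, `make_challenge` and `check` are made up for illustration; the image rendering itself is left out.

```python
import random

# Hypothetical instruction set: each entry maps the text shown in the
# captcha image to the answer the user is expected to type.
VARIATIONS = {
    "Type the characters as shown": lambda s: s,
    "Type the characters backwards": lambda s: s[::-1],
    "Type only the digits": lambda s: "".join(c for c in s if c.isdigit()),
    "Type only the letters": lambda s: "".join(c for c in s if c.isalpha()),
    "Type every second character, starting with the second": lambda s: s[1::2],
}

def make_challenge(captcha_text, rng=random):
    """Pick one instruction at random and compute the expected answer."""
    instruction = rng.choice(list(VARIATIONS))
    return instruction, VARIATIONS[instruction](captcha_text)

def check(expected, response):
    """Compare the user's response with the expected answer, ignoring case."""
    return response.strip().lower() == expected.lower()
```

The captcha text would need to mix letters and digits, otherwise the "only the digits" variation has an empty answer.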
Cultural:
The cultural idea mentioned in the parent is fraught with danger. I suspect that there will be a high false positive rate because the test is not self contained. It relies on the user having some knowledge and not all legitimate users will have that knowledge.
Lastly, the spammer's methods will likely evolve to meet whatever captchas we can dream up. It wouldn't be hard to devise a phishing scheme, ad-ware program or even an XSS attack that tricks a legitimate user into passing the captcha and then hijacking that and using the account for spam. This method wouldn't even cost $.60/hour. It would probably be almost free and it would have a higher success rate than their current methods. It would just involve some sort of high volume email-spamming for the phishing, normal spyware installation methods or breaking into many bulletin boards to effect the XSS attack.
I suspect that these methods will start to seem more attractive to spammers when captchas get harder.
That said, it's an arms race and you only lose when you give up. Bring on the next round of better captchas !
Sig matters not. Judge me by my sig, do you?
That would work on IMDB but not on any other site where the users are not necessarily expected to know about the works of Columbia, Disney, Fox, Paramount, Time Warner, and Universal.
But a Go computer plays as well as a casual (kyu-graded) human player.
You wonder who would tolerate a long audio based test. One example: blind people, who rely on screen reader software to speak the words that programs display. They would like any reasonable audio test a lot better than the currently popular visual tests, which are completely inaccessible to them.
Just do what I do. Block China, Pakistan, Russia and its ex-associated province state territories, India, Singapore, Korea (both parts), etc. from even making a TCP connection. Sorry, it's not that we don't love you guys, but you create a crapload of unnecessary work for the rest of us. Weighing that against the general contribution makes the decision easy.
My life got much easier once I found and/or created an IP-based block list for these and similar countries...
What about a reporting/temp ban/blacklisting service similar to Spamhaus or SpamCop, but for the web? The system would keep a list of active spammers and your web server would check for blocked IPs any time a POST, COOKIE or GET request is made (reg globals on) or when a POST or GET request is made (reg globals off). This way we can try to minimize the damage of these attacks just as we do with spam. Throw in an optional Bayesian filter so IPs closer to the spammers are scrutinized, along with investigation and banning of specific netblocks... Why hasn't this been done?
Sorry about your rules, but Joe Bob's a movie critic, and John-Boy's a farm kid and/or author, depending on how you interpret the time scales of the Waltons. And if the name's _really_ long, it's probably something Indonesian, or at least not European like those short example names you suggested :-)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
I hope this turns into a thriving industry and captcha technology fails, because despite its usefulness I think it is one of the most obnoxious things to have ever happened to the Internet.
Of course this method can be used to avoid division any time. To find the reciprocal of a number (x), square it and find the approximate reciprocal of the square root of the square using the method above. Then use Newton-Raphson iteration (f=f*(2-x*f)) to get the precision you require.
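For what it's worth, the Newton-Raphson step on its own looks like this in Python. The function name `reciprocal` and the iteration count are my own choices, and the starting guess is simply passed in, since the approximation method referred to above is not reproduced here.

```python
def reciprocal(x, guess, iterations=5):
    """Refine an approximate 1/x using only multiplication and subtraction.

    Each pass applies f = f * (2 - x * f), which roughly squares the
    relative error, so a handful of iterations is usually enough.
    The starting guess must satisfy 0 < guess < 2/x (for x > 0),
    otherwise the iteration does not converge.
    """
    f = guess
    for _ in range(iterations):
        f = f * (2 - x * f)
    return f
```

For example, `reciprocal(4.0, 0.2)` starts 20% off and ends up at 0.25 to within floating point precision, without a single division.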
Support SETI@home
People do shit jobs like this stuff because the wages are good in economically deprived areas. Well. Get rid of the causes of economic deprivation. Funnily enough this means getting rid of agricultural subsidies, getting rid of trade barriers for products from economically deprived areas.
There's a certain irony to this, protectionism in one area causes problems which affect other areas of your economy.
Deleted
Just put a 20 second timer into every captcha, maybe have to increase by 30 seconds every time the same IP hits in a single day. If you respond before the javascript timer counts down then your ip is automatically banned. If you solve after time is up, sure, go ahead, come on in. A few bad apples will get in, but not as many as possible, and new users will be very minorly inconvenienced for a minute or so. Maybe give them something to read while they wait.
Obviously this won't work if everyone does it, but why not force users to spend at least 10 minutes on your site and hit at least 4 pages before being allowed to post a comment? You could completely hide the comment forms until they meet these requirements so that the spammers would essentially never find the form to submit.
I had my blog harvested by spammers, long ago. I implemented a series of measures to protect my blog:
:)
1) I generate a token, combining a secret, the remote IP, the current time, and the blog entry in question, and produce an md5 of that token. I put the token and the time used to create the token into the form page for adding a comment to the blog entry.
2) On comment submission, I check the token: I have the time of the page generation (from the token), and the blog entry, and my secret, and I produce a new token. If they don't match, it's a spammer, and I ignore the comment.
3) If more than one hour has elapsed since token generation, I discard the comment.
This serves to block 95% of spam comments, without any visible change to real users. There was one spammer, however, who went to so much trouble that they fixed their script to work with my specific code. I added one final measure:
4) All comments must be approved before they appear.
No more spam, and very few spam comments to moderate to "fuckoff". But, in the end, I just disabled the comment system entirely, because I got no real comments anyway.
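Steps 1 to 3 above could be sketched like this in Python. This is not the original code: `SECRET`, `make_token` and `accept_comment` are made-up names, and HMAC-SHA256 is swapped in for the plain md5 mentioned above, since a keyed hash is the usual way to mix a secret into a token.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to the client

def make_token(remote_ip, entry_id, issued_at):
    """Step 1: derive a token from the IP, the blog entry and the issue time."""
    msg = f"{remote_ip}|{entry_id}|{issued_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def accept_comment(remote_ip, entry_id, issued_at, token, now=None, max_age=3600):
    """Steps 2 and 3: recompute the token, then reject anything over an hour old."""
    now = time.time() if now is None else now
    expected = make_token(remote_ip, entry_id, issued_at)
    if not hmac.compare_digest(expected, token):
        return False  # token was forged, or harvested for another IP or entry
    return 0 <= now - issued_at <= max_age
```

The form page carries `issued_at` and the token as hidden fields; the server needs no per-visitor storage because it can recompute everything from the submission.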
Seriously now, don't use Captchas, they suck. They don't stop spammers, and they annoy the FUCK out of real users. If bots signing up is a problem, require an email address and do email validation. That's annoying too (my default email address is behind a graylist, so I get a lovely 3 to 6 hour wait for unknown sites) but, IMO, far less annoying than squinting at some bullshit little box.
Lucky I'm not vision impaired...
The trick is to follow the money. A standard disclaimer with the advertising rate of $1000 per line per day that is legally enforceable. Then a co-op of members all over the world to catch the problem of the advertiser being outside the law.
The best solution of course is to start getting people arrested in the US under existing drug dealing laws. If you offer drugs to children within so many feet of a school, the punishment is years in jail. It's damn close to election time and thousands of Attorney Generals are up for reelection. Call their office and ask how many internet drug pushers they have prosecuted. When they say none, tell them that your vote will be going to someone who isn't a dinosaur and understands technology use in crimes. If every Slashdotter in the US did this today, spammers would be in jail by the end of the week.
Don't be so sure that all these "online sweatshop" workers are in third-world countries. There are large numbers of people in many countries, including the US, Canada, UK, etc. who are happy to "work" for 60 cents an hour, or even less, no matter how boring and repetitive the tasks required. And many of them aren't concerned about issues like ethics or legality.
A lot of it happens in a cottage industry created around what are often referred to as "Paid to Read Email" sites. It's also referred to as the "Get Paid To" or GPT industry. It started with "Paid to Surf" companies like AllAdvantage, and still continues on a much smaller scale today. To get an idea of how many of these sites are out there, a database at GPTInfo contains over 700 different sites.
There was an article about this industry published at Associated Content called The World of Paid to Read E-Mail Sites that offers a basic description of how things work. But it doesn't really look at how these sites can be used to pull off scams like this CAPTCHA data-entry thing and search engine click fraud. SearchEngineWatch describes them as Click Pirates, and in a lot of cases, that's exactly what they are. And they're most definitely not limited to third-world countries.
Dudley's Dungeon, a Nethack comic strip, does that. Really simple questions like "Which character represents a wand?" are both trivial for nethackers and almost unsolvable enigmas for spammers.
This approach is fine for a website oriented around a common, niche interest but I don't think a general public website should go for something like that. Salting the captchas is easier to implement and it will foil almost all attempts to defeat them. Something like: enter the number of kids in that picture, plus the number written in this captcha plus five. Any website doing that will be a pain to use though. I think the comment should be Bayes-checked first, and a captcha, possibly salted, should be presented only when the comment looks like spam.
The easiest solution I've found for preventing spam is to disable the posting of URLs.
It can be a little bit of a pain for users to have to break up their urls, though it's amazing how much automated spam it prevents on one of the anonymous/no rego forums that I run.
"And the first question is for you, Karl Marx. The Hammers - the Hammers is the nickname of what English football team?"
Easy. Require users to submit a DNA sample, which can be matched against a gov't registration list. Optionally include an ID# in a chip implant in all new babies - will take a few years to be useful - use Passport #s for now.
OK, so people are circumventing CAPTCHAs. One possible solution to this is to use some of the same techniques that are used for fighting e-mail spam. One such technology is real-time DNS-based blacklists. If a particular IP address is sending out spam, several people report it, and it gets added to a DNS-based blacklist. Then other servers know to refuse messages from addresses on that blacklist (or to give them a greater spam score if you want to take that info with a grain of salt). The same thing could be done with paid employees circumventing CAPTCHAs: if you run a web server where someone has entered a CAPTCHA and then gone on to post spam on your forum or whatever, report it to a realtime blacklist. Then other web sites can check the blacklist before they let you sign up for an account. Presumably these people being paid $0.60/hour won't be able to switch IP addresses several times an hour, so that should slow them down pretty good.
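To make the lookup concrete, here is a small Python sketch of the usual DNSBL convention: reverse the IPv4 octets, append the list's zone, and resolve the result; a listed address resolves, an unlisted one does not. The zone `bl.example.org` used in the usage note is a placeholder, not a real list, and `dnsbl_query_name` and `is_listed` are names made up for this example.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an entry for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # no such name: the address is not on the list
```

A signup handler would call something like `is_listed(client_ip, "bl.example.org")` before creating the account, and either refuse or add to a spam score.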
Another similar technique for fighting e-mail spam is another type of blacklist: blacklists of URIs contained in messages. With that type of blacklist, it doesn't matter where the message is coming from; what matters is what link they're trying to refer you to. The links in spam get listed in the blacklist, and then on your mail server, you can block all messages that link to that same site. This e-mail spam fighting technique could be adapted to the web: if someone makes a post to a forum and it contains a link to a blacklisted site, remove the post. Disable the entire account if they do it more than some number of times.
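On the forum side, the check might look roughly like this in Python. The blacklist is just a set of domain names here; how it gets populated (a shared service, a local file) is left open, and the function names are invented for the example.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern; it only catches http/https links.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def linked_hosts(post_text):
    """Return the set of host names that the post links to."""
    hosts = set()
    for url in URL_RE.findall(post_text):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, nothing to look up
        if host:
            hosts.add(host.rstrip(".").lower())
    return hosts

def links_to_blacklisted(post_text, blacklist):
    """True if any linked host, or one of its parent domains, is blacklisted."""
    for host in linked_hosts(post_text):
        parts = host.split(".")
        for i in range(max(1, len(parts) - 1)):
            if ".".join(parts[i:]) in blacklist:
                return True
    return False
```

Checking parent domains means that listing `pills.example` also catches `shop.pills.example`.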
Probably some other spam-fighting techniques could be used to fight CAPTCHA abuse as well. There are some distributed databases that take checksums of spam e-mail messages (or of portions of the message) and publish the checksum; if you get a message that matches one of those checksums, it is either the same or has large substrings in common with known spam. You could do the same thing with posts to web forums, because presumably these bots are pasting in some standard text when they post crap to web sites.
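One simple way to checksum portions of a message is to hash every run of a few consecutive words, so that a post sharing long substrings with known spam shares many hashes with it. The Python below is only an illustration of that idea, not how any particular e-mail checksum database works; the window size of five words is arbitrary.

```python
import hashlib
import re

def shingle_hashes(text, size=5):
    """Hash every run of `size` consecutive words in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        windows = [words] if words else []
    else:
        windows = [words[i:i + size] for i in range(len(words) - size + 1)]
    return {hashlib.sha1(" ".join(w).encode()).hexdigest() for w in windows}

def overlap(post, known_spam_hashes, size=5):
    """Fraction of the post's hashes that also appear in known spam."""
    hashes = shingle_hashes(post, size)
    if not hashes:
        return 0.0
    return len(hashes & known_spam_hashes) / len(hashes)
```

A forum would publish or fetch the hash sets rather than the spam text itself, and hide posts whose overlap is above some threshold.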
You could possibly even use a naive Bayes system for keywords in web forums, then automatically hide any messages that appear to be spam based on the keywords. Of course, you'd have to train the Bayes database, but that might not be so hard (maybe even have your users do it).
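A toy version of such a filter fits in a few lines of Python. This is a bare-bones naive Bayes with add-one smoothing, written for illustration; the class and method names are invented, and a real deployment would want far more training data and better tokenising.

```python
import math
import re
from collections import Counter

class NaiveBayes:
    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """label is 'spam' or 'ham'; users flagging posts could feed this."""
        self.docs[label] += 1
        self.words[label].update(self.tokens(text))

    def spam_log_odds(self, text):
        """Positive means the text looks more like spam than ham."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.words[label].values())
            for word in self.tokens(text):
                # add-one smoothing so unseen words don't zero everything out
                score += sign * math.log((self.words[label][word] + 1) / (total + vocab))
        return score
```

Posts scoring above zero (or some stricter threshold) would be hidden pending moderation rather than deleted outright.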
i've had to deal with some punk spamming a submission too. what i ended up doing is create a session variable with a random value and display the form with that value in a hidden variable (1). the accepting script will only accept form submissions with the matching variable in the session(2) and as long as the variable isn't tagged as already submitted (3). if accepted, tag the variable as submitted.
;)
(1 & 2) creates a key/value pair at runtime, the key being the session and the value being the random variable value. this will thwart the usual 'url harvesting for later spammage' cases.
(3) prevents the 'got an ax to grind' spammer who clicks on submit one too many times.
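in python-ish terms, (1) to (3) come out something like the sketch below. `session` is just a dict standing in for whatever per-visitor session store your framework gives you, and the function names are made up.

```python
import secrets

def render_form(session):
    """(1) make a random value, remember it in the session as 'not yet used',
    and return it so it can go into a hidden form field."""
    nonce = secrets.token_hex(16)
    session.setdefault("form_nonces", {})[nonce] = False
    return nonce

def accept_submission(session, submitted_nonce):
    """(2) the value must be one we issued for this session, and
    (3) it must not have been submitted before."""
    nonces = session.get("form_nonces", {})
    if nonces.get(submitted_nonce) is not False:
        return False  # unknown value, or already tagged as submitted
    nonces[submitted_nonce] = True
    return True
```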
of course you can easily defeat this system with a script that knows to: 1) first request the page with the form. 2) look for the hidden variable value, 3) send the submission with the session cookie and a matching variable value. to create one more hurdle for the script writer we could do:
1) create a collection of javascript functions that each compute a different string (numeric strings, what have you..).
2) before sending the form, pick one such javascript function, put the corresponding string it would return into the session for the magic variable mentioned at the beginning of my post.
3) send the javascript function to the client, along with the javascript code to populate the form variable with the return value of the javascript function that gets executed onLoad.
4) implement the scheme of checking the session variable from (2) to the variable in the form submission, if it matches:
a) the guy is being paid more than $5 an hour, he knows javascript, took the time to figure out what was happening. made the effort - he deserves to be heard
b) automated browser/browser-like tool (at least something with a javascript interpreter embedded.) not trivial. let him submit - made enough of an effort. (dcop-ed konqi script maybe?)
c) fair submission. we want those.
the drawback would be that anyone without javascript enabled is screwed. will have to think more about this..
Something I've experimented with a bit has been profiling of web traffic. By keeping track of actively connected users and monitoring habits, it is possible to put together a decent profile of a connection that will flag it as a regular user, a bot/script, or an irregular user.
Useful profiles cannot be based on a single page request though - you need to know what sort of pages are being requested, whether the requested pages are related to one another, what supporting materials (graphics, scripts, etc.) are being pulled down, what the user agent looks like, rate of requests, and in some cases sanity of sequence of requests.
Granted - it is possible to write a VERY convincing bot to beat this type of model, but a coder would have to work VERY hard to deliberately fool such a profiling system designed around a specific site.
For something like a blog site, for example, a request could be flagged as being a bot or irregular if they ended up reaching the REPLY form processing script without having viewed the thread that is being replied to, or pulling the graphics from that page, etc. In some cases I have bugged pages with JavaScript that pulls an image down to validate the page request; if the graphic is not pulled, then the request is not validated. If a user is flagged as a bot or an irregular user, then their submissions could be flagged for moderation and perhaps not even show up on your site until specifically approved - better safe than sorry.
Anyway, just a thought - it may be possible to flag the types of users described in this thread as being irregular based on their viewing habits of the site. I've used this technique to block anonymous proxy browsing and content bots from stealing content from a VBulletin site by doing "real-time" processing of HTTPD logged request data.
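A stripped-down illustration of the blog example in Python: remember which paths each client has fetched, and only treat a reply as regular if the thread page and the image pulled by the page's JavaScript were both requested first. The class name, the paths and the idea of keying on a client identifier are assumptions made for the sketch; a real profile would also look at request rate, user agent and so on.

```python
from collections import defaultdict

class VisitLog:
    def __init__(self):
        self.seen = defaultdict(set)  # client id -> set of requested paths

    def record(self, client, path):
        """Call this for every request, e.g. while tailing the HTTPD log."""
        self.seen[client].add(path)

    def looks_regular(self, client, thread_path, beacon_path):
        """True only if the client viewed the thread and pulled the beacon image."""
        paths = self.seen.get(client, set())
        return thread_path in paths and beacon_path in paths
```

Anything that fails `looks_regular` would go to the moderation queue instead of straight onto the site.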
- SK
Would it be possible to design something such that the use of anti-captcha software/humans would classify as circumventing access to my copyright data??? Access control should include access to my (copyright) website, no?
Sometimes boldness is in fashion. Sometimes only the brave will be bold.
Half of /.'s readers are from outside North America, and could easily fail a test that required specific American cultural knowledge. I wouldn't know what month the Superbowl is played, which is something I guess most Americans know, even those totally uninterested in sport.
But most websites wouldn't want to exclude half their audience, so maybe an interest area specific question, like "The name of the country where the founder of Linux was born". Or with the new, broader focus of /., "The name of Luke Skywalker's sister".
I wonder if PGP and cryptographic keys are the solution to all the spam, whether e-mail spam or comment spam.
As I see it the question comes down to identity, trust and filtering that is identity based. I discussed and explained the use of PGP for solving the E-Mail Spam problem at BarCamp Boston 2006 with quite positive feedback.
I think it is the same with comment spam. Have everybody create his/her own strong signature and have her/him sign the comment one supplies. O.K., this weeds out anonymous comments, but I don't care. For ease of use, send a signed e-mail to an auto generated address that does incorporate the article-id.
- Signing messages with PGP ensures the message comes from an identifiable person
- I can reliably filter on this identity
- I can use the signature trust to guide my filter
- I make the decision what is spam and what not
I think in all spam filtering algorithms, it is important to stress the last point, because otherwise I'm taking away the freedom to express certain things. "Some person's trash might be my treasure."
Could the spammer create a new key for each message? Yes, he could, but it would be quite a computational effort: it costs CPU cycles to sign the message, and he would also need to publish the public key so it can be used for verification. In addition, the key would be brand new and have no trusted signers.
In the long run I could see the browser incorporate a "sign the message of this field with my signature" feature and we would not need to send an e-mail.
By the way this mechanism is free to everybody. Although commercial entities could buy the signing of their keys from the usual "trusted" entities.
Busy helping non technical users of OpenOffice.org - http://plan-b-for-openoffice.org/
I had my guestbook spammed continuously (it's the one from Matt's Scripts archive). So I modified it.
On the URL entry field I now have the text "enter a URL here if you want this message to be marked as spam" (and if you do, it's not saved to the guestbook). I need no links in my guestbook anyway, only spammers need them.
Then I have some javascript which enables the submit button only after 5 seconds. If the entry is delivered earlier, it's a script, not a browser.
Then I check if the user took 30 seconds or more to submit. I don't need guestbook entries where people don't think half a minute about what they write.
The form is otherwise unchanged, so the bots still find a Matt's Script guestbook and try to spam it. They all fail. Even cut/paste manual spammers fail. Some are silly enough to put in a URL, the others just paste too fast.
Result: exactly zero spam
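The server-side part of those checks boils down to something like this Python sketch (not my actual guestbook code, which is a modified Perl script). It assumes the server can tell how many seconds passed between serving the form and receiving the entry; how that is measured is left out, and the function name is invented.

```python
def looks_like_spam(url_field, seconds_to_submit, min_seconds=30):
    """Apply the two guestbook checks described above.

    url_field: whatever was typed into the trap URL field.
    seconds_to_submit: time between serving the form and the submission.
    """
    if url_field.strip():
        return True  # only spammers fill in the URL field
    return seconds_to_submit < min_seconds  # pasted too fast to be a real entry
```

The 5 second delay on the submit button lives in the page's javascript and is not shown here.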
Atari rules... ermm... ruled.
I'm sure a lot of /.'ers will be doing the same come the next tech downturn. It probably beats cleaning toilets--even if that pays more! It's sad, but true. In the end, it beats panhandling or searching for cans--which may be the only option for an out-of-work programmer.
Your reference to Folding@home is just the same hash-cash issue that has been looked at plenty before. Compute time puzzles may work until we have quantum computing.
Back in the day when I ran a dial-up BBS, I used to personally validate each account before the caller could post a message. Other BBSs allowed the users to vote on new callers. Such a system would work on small web boards where everyone knows everyone in real life.
Another permutation of the above system is to allow new users to post freely, but only display the posts publicly after the webop validates the account.
No, I will not work for your startup
It's called PGP. Use it.
Luke-Jr