Asynchronous Programming for Spam Elimination
ttul writes "Stas Bekman (formerly the maintainer of mod_perl) has been quietly building an asynchronous programming framework to build high performance network applications in Perl. His recent Perl.com article describes how he has used the Event::Lib module (that lives on top of the popular libevent library) to write a traffic-shaping email proxy to get rid of spam. Asynchronous programming is challenging at the best of times. Read on to find out how to do it the easy way in Perl."
so they wrote an asynchronous proxy that slows down connections. cool trick, but not any kind of scalable solution.
:)
:) -- using a pool of apache/mod_perl instances to handle connections is grossly inefficient.)
the core assumption, and the only thing that makes this work, is that botnet spam software will _always_ just give up after 30 seconds; if this throttling technique ever became commonplace, spammers would just write their own asynchronous mailer -- it's not THAT hard. windows has the same kind of async networking support (either through the winsock API and/or IO completion ports, or what have you) and i'm sure the spam/botnet software authors have no qualms about holding open a couple thousand sockets on the rooted windows machine (times a few hundred thousand machines.) furthermore, i bet there are some shitty legitimate MTAs that would just give up too, causing actual mail to get discarded
(that, and they shoulda used twisted or something
ok, ok, maybe this sounds overly critical. it's a clever, thinking-out-of-the-box idea, but certainly not the panacea we're looking for to stop spam.
-fren
"Where are we going, and why am I in this handbasket?"
Sometimes I sits and programs, and sometimes I just sits ...
Your post advocates a
(X) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(X) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(X) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
(X) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
(X) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(X) Armies of worm riddled broadband-connected Windows boxes
(X) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
( ) Ideas similar to yours are easy to come up with, yet none have ever
been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
(X) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatiblity with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
(X) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, asshole! I'm going to find out where you live and burn your
house down!
"Where are we going, and why am I in this handbasket?"
[full disclosure and shillery alert: I work with Stas at MailChannels]
:)
You make some very good points -- and these are all concerns we had when we set out to build this software.
Fortunately for the world, these concerns have turned out to be unwarranted. Furthermore, our experience in actually deploying this technology has been far more breathtaking than we had imagined -- both in terms of spam mitigation and improvements in scalability.
> the core assumption, and the only thing that makes this work, is that botnet spam software will _always_ just
> give up after 30 seconds;
I have a theory that spammers will always be impatient. I believe this theory for several reasons:
1. Spam campaigns are now recognized by anti-spam companies in minutes or hours. New campaigns therefore have a very short life expectancy and have to be completed as fast as possible. If mail can't get delivered fast, it's time to move on to a new domain to get it moving again. With collaborative filters like Cloudmark recognizing campaigns in less than 60 seconds, spammers obviously have to move traffic fast.
2. Botnets are not unlimited in their size or bandwidth capacity. Typicaly botnets these days are between 1,000 and 10,000 hosts. Any larger and the command and control channels are very quickly noticed and shut down by service providers. Botnets cost money too -- $250/hour for a 10K botnet is typical.
3. Spammers raison d'etre is to send lots of mail and hope that a small percentage of recipients buy something. The only way to make the business profitable is to send huge amounts of mail. If all zombie traffic in the world was magically being slowed down, spamming would no longer be profitable and spammers would tend to focus more on things like highly targeted phishing instead. Not surprisingly, we're already starting to see this.
4. Because #3 isn't going to happen any time soon, and in light of the technical constraints (1 and 2), spammers have no choice but to abort their connections within a very short time frame. It's just the nature of the economic beast. Hanging on is just for posterity. It doesn't make economic sense.
5. It works. And it's very very scalable. By slowing down traffic and multiplexing what remains, mail server load drops by 90%. In big installations, that means no more being paged in the middle of the night because your cluster of 4-way Xeons with 8GB of RAM is borked by a distributed spam burst.
Oh -- and of course you can't just slow everything down. It's important to be very selective so as not to delay everything.
> if this throttling technique ever became commonplace, spammers would just write their
> own asynchronous mailer -- it's not THAT hard...
Actually, it is that hard. Even Stas got a headache working on this project.
But even if it was easy, it would be pointless for a spammer to launch more than one connection per zombie. If a sender is marked as suspicious, the sender's concurrency is severely limited. One connection per zombie, at 5 bytes per second -- that's just not economic.
> furthermore, i bet there are some shitty legitimate MTAs that would just give up too, causing actual
> mail to get discarded
Let's just say the gap between the patience of spammers and the patience of legitimate MTAs is very large indeed. And by carefully fingerprinting and assessing sender reputation, this problem can be minimized to the point where it is a far smaller problem than content filter false positives.
I also want to point out that this technology does not make email suck by slowing it down. It in fact speeds up delivery of legitimate mail in most cases because the load is so reduced on the rest of the infrastructure.
Just talk to our customers. One of them was running four 4-way Xeon boxes with 8GB of RAM each -- all this to service the spam filtering needs of just 10,000 end users. He told us he hadn't slept a full night in months because of load-based outages. Since installing the software Stas built, the only alert he's received is a notification that the load level dropped below the panic threshold!