Beat Spam Using Hashcash
Shell writes "If they want to send spam, make them pay a price. Built on the widely available SHA-1 algorithm, hashcash is a clever system that requires a parameterizable amount of work on the part of a requester while staying "cheap" for an evaluator to check. In other words, the sender has to do real work to put something into your inbox. You can certainly use hashcash in preventing spam, but it has other applications as well, including keeping spam off of Wikis and speeding the work of distributed parallel applications." If you're specifically interested in hashcash for your mail server, Camram has some interesting ideas -- their Frequently Raised Objections page may be illuminating.
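The asymmetry described in the summary (expensive to produce, cheap to check) can be sketched in a few lines. This is a simplified illustration, not the real hashcash stamp format: the `resource:counter` layout and the function names `mint` and `check` are assumptions made for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # A non-zero byte has (8 - bit_length) leading zero bits.
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int) -> str:
    """Sender side: try counters until SHA-1(stamp) starts with `bits` zero bits."""
    for counter in count():
        stamp = f"{resource}:{counter:x}"  # hypothetical layout, not hashcash v1
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def check(stamp: str, resource: str, bits: int) -> bool:
    """Recipient side: a single hash, however long the sender searched."""
    digest = hashlib.sha1(stamp.encode()).digest()
    return stamp.startswith(resource + ":") and leading_zero_bits(digest) >= bits
```

Minting needs about 2^bits hashes on average, while checking always costs one hash, so raising `bits` makes the sender's job harder without affecting the recipient.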
Those damn police dogs can smell through plastic pretty well!
Aren't there plenty of available solutions today that make the sender "work for it?"
Funny this story should appear today.. I have been trying to find a mirror of hashcash.org for the last few days to read up on the whole idea. It's been down for a while now (or is there just some problem on my end?)
Please post mirrors.
ERROR 144 - REBOOT ?
The previous stories weren't enough?
And remember, you can't spell "Budget" without "Get Bud".
Give me Classic Slashdot or give me death!
Your post advocates a
(*) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(*) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(*) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(*) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(*) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(*) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
The end effect of this is eventually bad, or utterly worthless.
Joe Sixpack wants to send a mail. If it takes him an hour to parse a key, he's not going to mail his mother anymore.
If a spammer has to spend an hour processing the key, he's just going to invest more of his time getting zombie PCs to get the work done for him.
Who wins here? Certainly no one.
Disclaimer: the hour was used as an example. I've no clue how long it takes, but the point should still hold.
The moral being, don't make the end users pay for the actions of spammers. We have laws for spammers now; it's time to start using them.
For example, Sourceforge sends site-wide update messages about once a month or so. They have tens, if not hundreds of thousands of users. If every one of those users used HashCash, Sourceforge would practically need a dedicated server farm computing hashes simply in order to send out its update notices.
This is a really, really stupid idea.
Easily countered - then you simply change the hash question on a per-email basis. So I ask potential emailer A a question about FOO and potential emailer B a question about OOF. There's no way to know in advance what I am going to ask. That way, the only way to email me is to actually compute the answer.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
In the future (if this takes off), these lists will simply contain the hashes along with the addresses. This temporarily makes the spammers' lives a bit difficult, but doesn't have a long-term impact.
Did you even RTFA? If there is *any* sort of time lag between when Supplier A generates the hashes and sends them to Spammer B and when the spammer sends the mail, the hashes will become invalid.
3. The date (and time) a stamp was minted. Stamps in the future and those too far in the past may be judged invalid.
Your hair look like poop, Bob! - Wanker.
Thanks for RTFA
Jeez.
How do you deal with large-scale legitimate mail sources (i.e. mailing lists, mail houses, etc.)?
There are two issues here. Mailing lists don't really have a good solution with the first generation of stamps. The traffic mailing lists generate is fundamentally indistinguishable from spammers, therefore whatever hurts spammers will hurt mailing lists. The answer for right now is to not do anything with mailing lists. Let them send unstamped mail and let the user whitelist mailing lists or deal with the trapped message issue manually.
In the future, it will become easier to deal with mailing lists because of the second generation of stamps (opportunistic signatures). If the list is signed with its own stamps, then it would be let through without problem. Spammers would still be barred because their signatures would be ignored.
The second issue is that mailing houses that deliver bulk e-mail for legitimate commercial ventures will need to generate stamps for some of their traffic. If they are sending newsletters to which users have subscribed, then the signature stamps method will work for them. Everything else is advertising mail and should be stamped. A circumstance in the future can be envisaged where mass mailers will try to cheat and use signature stamps for mailing lists to deliver commercial e-mail. Obviously there should be some method of responding, but that is not yet apparent.
In the meantime, these houses will need to generate stamps. While most of their server resources will be maxed out, they'll have idle resources on the desktop. A technique is being developed that allows a company to make use of its idle resources to generate stamps for its outbound mail. It will be up to each organization to determine what machines it wants to use and how high it wants to load them. If it's bulk e-mail with no particular need to deliver immediately, then a small number of heavily loaded machines should be sufficient. If it's urgent corporate mail, then they will want to have more machine resources than are needed for stamps.
You make me pay precious CPU time to e-mail my mother-in-law? you insensitive clods!
The author points out that a) a date is added to the string to be hashed and b) a database is kept of the hashes already used that day.
If you include the hash when you pass it out, step a) invalidates hashes from older days and step b) keeps the current day's hashes from being reused.
So it doesn't matter if the spammers share. The hashes are one-times.
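A rough sketch of those two checks, for illustration only. It assumes a colon-separated stamp laid out like a version 1 hashcash stamp (version, bits, YYMMDD date, resource, extension, random salt, counter). The function name `accept`, the `max_age_days` parameter, and the in-memory `spent` set are made up for the example, and the proof-of-work hash check itself is left out.

```python
from datetime import date, datetime, timedelta


def accept(stamp: str, recipient: str, today: date, spent: set,
           max_age_days: int = 2) -> bool:
    """Apply the date window and the one-time-use rule to a stamp."""
    fields = stamp.split(":")
    if len(fields) != 7:
        return False
    _version, _bits, stamp_date, resource, _ext, _rand, _counter = fields
    if resource != recipient:
        return False  # stamp was minted for somebody else
    try:
        minted = datetime.strptime(stamp_date, "%y%m%d").date()
    except ValueError:
        return False
    # Step a): reject stamps from the future or too far in the past.
    if minted > today + timedelta(days=1):
        return False
    if minted < today - timedelta(days=max_age_days):
        return False
    # Step b): reject stamps that have already been spent.
    if stamp in spent:
        return False
    spent.add(stamp)
    return True
```

Because stale stamps are rejected on date alone, the spent set only has to remember stamps from inside the acceptance window.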
You are checking your backups, aren't you?
...because I was out of hash cash.
Funny, isn't there a Microsoft Research project that did this already?
Oh yeah, so there is, along with papers explaining how it works. So much for giving credit for prior work.
My other car is a cons.
If you want a virus built to generate stamps on zombies, just go over to Spamforum.biz and advertise for one. New ads over there this week include "PushMail Webmailer v1.0.2 ~ New, Fast WAP Webmailer for Sale (Gets by Filters)". There's even a banner ad for a firm that wants spammers: "3 different sites - Pharma - OEM - Cigarettes".
We bought a vanilla smtp server for our gateway called Xwall. A few months ago they introduced greylisting.
Basically what it does is temporarily block suspicious emails. If it's a real SMTP server it will resend the message, and the second time it will be allowed to go through. Spammers never use RFC compatible SMTP servers and simply send once in bulk and forget about it. This cut down our spam by over 90%.
Fur sail 2 u nou: 5 mil-leeun facter numberz
Yuz cun b-u-l-k f4ster wit dis CD uv all-ready calcoolated leest uf numbors. Fer onlee $99.95, u getz ohver fiv milyun numz ant wee tos in freeee a miliun fresh A-O-L addys. Vizut us @ hotprimefactors.biz to ordur.
now we need to go OSS in diesel cars
Sort of like burning your harvest to keep grain prices high. Just send me a completed work unit of Seti-At-Home or Folding-At-Home in an email header. I am sure, given the incentive of every e-mail message advancing their goal, some of these projects can come up with work units that are difficult to calculate but easy to verify.
Maybe for once zombied Windows boxes will be more productive than they would be under their users' control.
the fact that not everyone is sending legitimate email with a powerful computing device. Something that could cause an inconvenience to a spammer with a boatload of cheap commodity 2GHz desktop systems (either their own or a zombie army) will bring more modest systems to their knees. Handhelds, phones, old 486 systems recycled for use in the 3rd world, set top boxes, embedded systems, etc. will no longer be viable systems with which to send mail. And what about web mail providers?
There's simply no reason to resort to kludge solutions that depend on penalizing those who cannot afford top-of-the-line systems.
To me greylisting seems like the best thing to do. See:
http://slett.net/spam-filtering-for-mx/greylisting.html
and/or:
http://projects.puremagic.com/greylisting/
In a nutshell, it simply uses a standard 451 SMTP response that says "Hey, I'm busy now, can you call back in a minute or so?" To my knowledge, all standard SMTP servers respect this request, and little to none of the mass mailers do. And if they do, their bandwidth will triple.
Here's a log example:
Oct 15 15:18:17 example1.example.com sendmail[6955]: [ID 801593 mail.info] i9FJIGH06953: to=, ctladdr= (168/601), delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=121994, relay=example2.example.com. [123.390.141.456], dsn=4.3.0, stat=Deferred: 451 4.7.1 Greylisting in action, please come back in 00:01:00
If the mail never comes back, then the sender is now blacklisted. If the mail does come back, the sender is whitelisted.
Simplest and most standards compliant thing that I've heard of, and it seems to work.
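A minimal sketch of the idea, assuming the usual triplet of (client IP, envelope sender, envelope recipient) as the key. The class name `Greylist`, the one-minute default delay, and the reply texts are illustrative. It does not implement the blacklisting of senders that never retry.

```python
from datetime import datetime, timedelta


class Greylist:
    """Defer the first delivery attempt; accept a retry after `delay`."""

    def __init__(self, delay: timedelta = timedelta(minutes=1)):
        self.delay = delay
        self.first_seen = {}    # triplet -> time of first attempt
        self.whitelist = set()  # triplets that retried properly

    def check(self, ip: str, sender: str, recipient: str, now: datetime):
        """Return an SMTP-style (code, text) reply for one delivery attempt."""
        triplet = (ip, sender, recipient)
        if triplet in self.whitelist:
            return 250, "OK"
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.delay:
            # The sender queued the message and came back: let it through.
            self.whitelist.add(triplet)
            return 250, "OK"
        return 451, "4.7.1 Greylisting in action, please come back later"
```

A real deployment would also expire old entries, so that a triplet seen once months ago does not pass on its next attempt.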
It's easily solved. Just buy the CD of pre-calculated prime factors from the spammers.
now we need to go OSS in diesel cars
You've never done distributed computing work, have you?
On average, the 1000 zombies will have CPUs roughly equivalent to a P4. Add to that network latency and all the work that has to go into coordination, and the equivalent CPU power goes down.
So if a spammer had 1000 zombies, he'd get at best 1000 hours of work in 1 hour, and on average maybe 100. To send a million emails, even under the best conditions and using the two- or three-second hash-compute time, he would need approximately 555-833 CPU-hours.
You can't defeat physics.
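For anyone who wants to check the arithmetic, here is a small sketch. The function names and the `efficiency` parameter (the fraction of each zombie's CPU that is actually usable) are assumptions for illustration. Note that the 555-833 figure is total CPU time, so the wall-clock time depends on how many machines share the work.

```python
def cpu_hours(messages: int, seconds_per_stamp: float) -> float:
    """Total CPU time needed to stamp every message, in hours."""
    return messages * seconds_per_stamp / 3600


def wall_clock_hours(messages: int, seconds_per_stamp: float,
                     machines: int, efficiency: float = 1.0) -> float:
    """Elapsed time when the work is spread over `machines` zombies."""
    return cpu_hours(messages, seconds_per_stamp) / (machines * efficiency)


low = cpu_hours(1_000_000, 2)    # about 555.6 CPU-hours
high = cpu_hours(1_000_000, 3)   # about 833.3 CPU-hours
best = wall_clock_hours(1_000_000, 2, machines=1000)                     # about 0.56 hours
typical = wall_clock_hours(1_000_000, 3, machines=1000, efficiency=0.1)  # about 8.3 hours
```

With the "on average maybe 100" figure from the post (efficiency 0.1), a million stamps take somewhere between about 5.6 and 8.3 hours on 1000 zombies.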
Only unsolicited mail needs a hashcash field.
Wikileaks, no DNS
Spammers never use RFC compatible SMTP servers
And spammer tactics remain static, so the same techniques that worked five hours or five years ago will continue to work indefinitely. Not.
My next sig will be ready soon, but subscribers can beat the rush
(x) Microsoft will not put up with it
Except that Microsoft are *ahead* of the hash cash scheme. They've developed a scheme that does the computation with something memory intensive.
Main memory is much, much slower than the CPU, and the difference in memory access speed between a cell phone and a PC is much smaller than the difference in their CPU speeds.
Memory-based computations are harder to run in parallel. In principle you could have many computers working on signing a single message.
They've made it very difficult to run their algorithm in parallel. The Microsoft scheme is much better.
More information here: http://research.microsoft.com/research/sv/PennyBlack/demo/lbdgn.pdf
Simon.
Someone with a valid stamp is less likely to be a spammer. Simply include it as a factor when calculating probabilities!
Or ignore the X-Hashcash field completely. As you choose.
If you read the article, you'd see that this is precisely the way in which SpamAssassin uses hashcash: as one factor amongst many in a general system of spam classification.
Wikileaks, no DNS
It's that in order for this to be useful, it has to be widely implemented. Anybody who sends a lot of legitimate email (e.g., hotmail) is going to need to buy a lot more CPU. So it's not going to get widely implemented. So it won't help. Sorry. :'(
I consider mailing lists a cute throwback to a much earlier time. Don't get me wrong, I subscribe to three or four myself. But every single one of them, I could just read on-line (and no, not all Yahoo lists, only one in fact).
To effectively eliminate spam, I would gladly visit a web page rather than have the same info appear in my mailbox.
Er... How does that differ from actual spam? I don't give two shakes of a rat's ass whether or not UCE comes from a "legitimate" source. I don't want it. Any of it. So, it really doesn't bother me that, for the benefit of no more "Free v1@6ra" email, I also lose out on "buy our totally legit ink cartridges" at the same time. I consider it a perk, not a problem.
The problem with technologies like this is that they need to gather widespread acceptance to become useful.
A quick grep of my mail archive (which is HUGE) failed to find a single message with an X-HashCash header. That means that even if I enabled it now, it would be practically useless.
Of course, wide acceptance could be achieved by means of a widespread grassroots campaign, but that is the hard way. If somebody big like GMail, Yahoo Mail, MS Outlook or Apple Mail started to use it, that would have a snowball effect.
Wow, they can see the future!
They've been telling people for YEARS that anything under the top-of-the-line computer won't be able to send email or browse the internet!
Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
Reminds me of what I used to call my student loans while in college...
100% Insightful
We use Lotus Notes at work and I have no trouble E-mailing my greylisting server at home. Our mail relays happily delay the message for 6 hours and then resend it.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Now, with that said... I should point out that the real error in this system is that spammers will just build a database of known hashes.
My personal belief is that the only viable solution to spam is a whitelist augmented with a CAPTCHA challenge-response system.
Just add some javascript that would hash the message, some part of the URL or page, or a salt and that would be a required part of sending.
Unfortunately this means that each installation would need its own javascript function. Otherwise you just take a look at the wiki package, see what sort of computations it does, write a program to perform the same computation in C, do a google search for the wiki engine and compute 1000 hashes in the same time the javascript has calculated one.
Just to give some practical information:
I'm using hashcash in its basic form, not with Camram. I wasn't aware of Camram until just now, but will probably look into it.
All my emails are sent out with hashcash, and I have SpamAssassin lower the score of emails with hashcash.
The recommended hash length is at least 20 bits. I calculate hashes of 23 bits (per recipient), which takes about 2/3 sec on my Athlon 800. My SpamAssassin config requires at least 20 bits to lower the score, and lowers it more and more up to 26 bits (at which point it has -5).
I think that this is the most effective use of hashcash: once it becomes widely used, then spam rules can become tougher with less chance of false positives.
From reading the article, it looks like Camram is mostly a recipient-side addon to basic hashcash, which involves automated whitelisting and sending challenges to senders of "maybe-spam". Somebody sending hashcash like me will (from the look of things) get past Camram recipients without problems.
Camram seems a bit less cooperative than I'd like, such as using its own Bayesian filter instead of letting the user have an external one like SpamAssassin take a crack at the email. But these are implementational issues, not problems with the Camram concept.
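To put the bit counts quoted above in perspective: each extra bit halves the chance that a random hash qualifies, so the expected number of tries doubles. The sketch below scales from the one measurement given in the post (23 bits in about 2/3 of a second). The function names are made up for the example, and the result is an average, since the time for any single stamp varies a lot.

```python
def expected_tries(bits: int) -> int:
    """Average number of hashes needed to find a stamp with `bits` zero bits."""
    return 2 ** bits


def scaled_seconds(bits: int, ref_bits: int = 23, ref_seconds: float = 2 / 3) -> float:
    """Estimate the average minting time from one reference measurement."""
    return ref_seconds * 2 ** (bits - ref_bits)


twenty = scaled_seconds(20)      # about 0.08 s on the same machine
twenty_six = scaled_seconds(26)  # about 5.3 s on the same machine
```

On that hardware the 20-bit minimum costs well under a tenth of a second per recipient, and the 26-bit maximum a little over five seconds.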
Adoption will be slow. Many companies already have maxed out mail servers. Adding even a 1-second compute cycle to all outbound mail requires a fairly hefty increase in available resources, especially since most mail systems are chosen for bandwidth and IO, not math processing power. What happens to a system during peak business hours when 100 people send mail with an average of 5 recipients each ... 500 seconds of computing ... ummm. Imagine a company that sends 5000 messages an hour, or 50000, or ...
If it's not at least a second on a reasonable machine then it's not going to cause ANY headaches for a spammer -- they are just text pumps; they can send SO much more mail than a normal server because they don't care about logging, errors, bounces, rejects and retries.
The "use clients inside the company" idea is idiotic -- my mail server is going to punch through the DMZ directly to the desktop of my accounting staff and ask it to generate a key? I don't think so. There is a reason every company with any brains bans Seti/IM/etc. from their internal desktops.
Zombie writers will just interleave writing packets of the current message with the SHA-1 calculation for the next message they are sending. Spammers have some really good programmers on their side. If you don't think of them as being at least as good as you are then you have already lost. They are already generating random text at the front and back of the payload; this SHA-1 thing isn't a big deal.
Like SPF, spammers will be the FIRST people to generate proper keys. For the near future a valid key will be a STRONG indicator of spam not a "potential whitelist" feature.
Judging by the +3 and higher comments, it seems that nobody is thinking outside the box. There is no mutual information between an e-mail not having a hashcash stamp on it and being spam. However, if an e-mail has a valid hashcash stamp, it's probably legitimate. Thus, while hashcash can't really help your spam filter reduce false negatives (spams that it lets through), it helps reduce false positives (legitimate e-mails that are blocked).
I personally stamp all of my outgoing e-mail with 20 bits of hashcash postage. It's easy to do and requires very little CPU time. Here's how I do it:
I have stunnel listening on port 465 which forwards connections to MEsmtpd. After authenticating the sender, MEsmtpd pipes the message to hashcash-sendmail which adds 20-bit stamps for each recipient to the e-mail and passes it on to sendmail. I don't have to do anything at all in my e-mail clients. There you have it, easy as pie.
Regarding that stupid "your spam solution won't work" checklist: spam classification is a hard problem. It can't be solved by any one approach. Even though Hashcash won't stop any spam, it can still make your spam filter more effective.
P.S. SpamAssassin supports Hashcash. See Mail::SpamAssassin::Plugin::Hashcash.
Why not have it compute the stamp before you send the mail? You start a new mail window, that least intensive of applications. In the background it calculates the stamp while you type.
Under that system, you could make the stamps as much as a minute. Very few e-mails are written in less than twenty seconds, most take a few minutes. Really short messages go via IM. You still queue it to go after the stamp is ready to deal with the short e-mails, of course.
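Something like the following is what I have in mind, as a rough sketch only. The `Draft` class is hypothetical, the `resource:counter` stamp is a simplified stand-in for a real hashcash stamp, and a real mail client would use its own event loop rather than a thread pool.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from itertools import count


def mint(resource: str, bits: int) -> str:
    """Brute-force a stamp whose SHA-1 has `bits` leading zero bits."""
    target = 1 << (160 - bits)  # a 160-bit digest below this has enough zeros
    for counter in count():
        stamp = f"{resource}:{counter:x}"  # simplified layout, not hashcash v1
        digest = hashlib.sha1(stamp.encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return stamp


class Draft:
    """A message whose stamp is minted while the user is still typing."""

    def __init__(self, recipient: str, bits: int, pool: ThreadPoolExecutor):
        self.recipient = recipient
        self.body = ""
        # Minting starts as soon as the compose window opens.
        self._stamp = pool.submit(mint, recipient, bits)

    def send(self, timeout=None) -> str:
        # Blocks only if the user finished typing before the stamp was ready.
        stamp = self._stamp.result(timeout)
        return f"X-Hashcash: {stamp}\n\n{self.body}"
```

The recipient has to be known when the window opens, so adding or changing a recipient later means minting another stamp.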
The reason this will not work is due to the way a typical SMTP connection actually works -
Steps:
1 - User writes email
2 - User sends email to their ISP's SMTP server
3 - The ISP SMTP server transfers the message to the destination SMTP server
4 - Destination SMTP server delivers mail to destination mailbox
5 - Profit (just kidding)
The checksum check will actually occur at step 3. The destination server will request the checksum from the ISP's SMTP server, NOT FROM THE USER'S MACHINE, which means that the cost to large or even medium-sized ISPs will be very significant. Unless end-user machines start sending email directly to destination ISPs (bypassing step 2, a practice some broadband providers block to curb spam bots), this scheme will cost ISPs a significant amount of money in processing power. This also means that what you propose, calculating the checksum on the user's machine while the message is being written, is impossible.
A true solution to the issue (or at least the BEGINNINGS of a solution) should start with authentication of authorized SMTP servers for domains, like what Yahoo/Google/Microsoft & others were trying to do via DNS a few months back. (Whatever happened to that, BTW??)
-Em
RelevantElephants: A Somatic WebComic...
Are there any other contenders for the most obnoxious recurring duplicate story? This one has come up so many times in the past couple years that it's not even funny. There are others in that category. Which one is the worst?
Are we really so hard-up for news that we're posting yesterday's failed spam solutions today? Why not post a story about breaking the color barrier in baseball - it may not be relevant to the site (although that's even questionable lately), but at least that one worked.
You're being silly. I dare wager that I've expended considerably more effort in administering email systems than you have. But just to be clear: I *want* to solve the problem of Unsolicited Bulk Email. *Solve*, that is, not mitigate. And re-read my post. Would you conclude from it that I don't use such tactics on my own mail servers? Or indeed a range of other measures that sure, work quite effectively today, but likely won't work tomorrow?
Another example: some spamware chokes on multi-line 220 greetings. That's a handy tip that those in the know can take advantage of, but it's not a solution to the problem of Unsolicited Bulk Email. Ditto for secondary MXs that always respond with a 451. Indeed, the irony is that the more widely such idiosyncrasies become known, the less effective the tactics become. That's the nature of the current arms race, and half-baked solutions that don't actually solve the problem just lead us in circles. Hash cash is a half-baked idea. TMDA and challenge response are half-baked ideas. SPF is partially baked at best. SenderID inhabits an alternative reality. DomainKeys shows a glimmer of potential. Internet Mail 2000 is an example of something that I think actually attempts to *solve* the problem, but I won't deny that it's radical.
So please, for everyone's sake, don't stop showering.
My next sig will be ready soon, but subscribers can beat the rush