Domain: rhyolite.com
Stories and comments across the archive that link to rhyolite.com.
Comments · 99
-
Re:Filtering out spam and black listing email serv
What I would like to see is a spam signature sharing,
Isn't this what Vipul's Razor and DCC are supposed to do?
-
Re:Thanks, but no thanks
This also points up a deeper problem with the entire model: knowing who a sender really is.
Not a problem, cryptographic signature.
forge well-known senders
You can't forge the signature unless you have that person's key.
Yes, of course, but that's a solution that exists in your head, and not one that is widely deployed in any way useful to the proposed system. The invention of concrete does not equate to the building of the Coliseum. To be useful, sender authentication (which would OF COURSE involve cryptographic signatures) has to happen synchronously in the mail transport conversation, because the rest of the model involving the hopeless idea of micropayments depends on having a positive ID before the micropayment is accepted. It has to reliably be done at every mail-accepting site in something less than 30 seconds (i.e. while the sender is waiting for the final ACK to DATA) even at peak times. Consider that even today there are some qmail users who insist that they cannot even validate the existence of a machine-local user synchronously in SMTP, and many sites where the architecture behind the external mail-accepting machine is so complex that it is impossible even on paper to come up with a way to confirm or deny the existence of an internal user during SMTP. Now mail systems will be asked to check a signature on every message and compare the sender against an internal user's whitelist? What are the security and logistical problems there?
(Hint: they are kinda big)
It depends on a financial clearinghouse to which all senders and all receivers have access... Will a million people like me sign up with a central micropayment clearinghouse?
Yes, that is an issue. First, there can be any number of clearing houses, no need to all sign up with the same one. Second, the vast majority of this could be handled by the ISP's. Sign up for ISP service and they add a few dollars deposit if you want e-stamps included. ISP's can pre-create accounts with a clearing house and simply hand you a key.
That ignores the reality of most bidirectional SMTP participants (i.e. "mail servers") today. Most of them are run by non-ISP businesses who buy nothing but IP connectivity from their ISP's. Being dependent on an ISP for anything more is simply not acceptable for many businesses, and ISP's generally do a lousy job with email. Being dependent on an ISP for 'e-postage' isn't a workable solution.
Could any existing financial service provider build a system capable of handling millions of users with the speed needed to make this system deployable? I doubt it.
Challenging, but I think doable.
You then go on to talk about the crypto verification system, which was not what I meant.
Any micropayment FUSSP depends on a financial clearinghouse system akin to the one used to clear paper bank checks. That system is the product of systemic evolution over centuries of growth and efficiency improvements to the point where there are now scores of thousands of direct participants and the median time to finally complete a transaction is on the order of a day. A clearinghouse system for micropayments to every mail server operator today would require about the same number of transactions from day one, but would have to directly serve an order of magnitude more direct users (i.e. mail server operators) and operate about 3 orders of magnitude faster.
And incidentally, the existing check clearinghouse system only works at all because of the regulatory oversight and core systems provided by governments, such as the Fedwire settlements system. Everywhere that checks have to cross regulatory jurisdictions they are slowed or excised, and in
-
Re:More info, in a less technical format
From the parent: Warning Signs of a Flawed Proposal
And I would say at least these apply:
(Quoted from the site above)
# You have discovered the Final Ultimate Solution to the Spam Problem (FUSSP).
# You are the first to think of the FUSSP.
# You started looking for the FUSSP after observing that it is impossible to filter more than 99% of spam with fewer than 0.1% false positives by currently available mechanisms.
# You don't plan to make a fortune from the FUSSP, but you do expect fame as its generous and public spirited netizen inventor.
# You are deeply hurt and angry because you are not respected as "spam fighter."
# People don't see the value of the FUSSP because they have axes to grind, are jealous, or are too stupid to understand it.
# You learned how to stop spam during the more than six whole weeks you've been fighting it.
# The FUSSP assumes that your attention is so important that strangers, other than advertisers, will pay money to send you mail.
# You cannot name several potentially fatal flaws in the FUSSP.
# All you need to do to get the FUSSP implemented and deployed is to publish an RFC or get a law passed.
# You don't recognize any significant difference between deploying and implementing the FUSSP.
# You plan to publish an RFC mandating the FUSSP but have never heard of RFC 2223 or RFC 2026.
# Inventing the FUSSP did not require that you know the difference between RFC 821 and RFC 822 or that they have been replaced by RFC 2821 and RFC 2822.
# You don't know the relevance of "consensus" or "IESG approval" to publishing RFCs.
# Spammers won't ignore, subvert, or exploit the FUSSP if you publish it as an RFC.
# The FUSSP depends on spammers or mail recipients changing their behavior without any immediate gain.
# The FUSSP won't be effective until it has been deployed at more than 60% of SMTP servers and that's not a problem.
# Your job is done after having explained the FUSSP to the IETF or The Industry.
# Programmers will drop everything to implement the FUSSP.
# You know that SMTP has no authentication and have never heard of SMTP-AUTH, SMTP-TLS, S/MIME, or PGP.
# You know that the failure of SMTP servers to authenticate the SMTP clients of strangers is a major bug in SMTP instead of an expression of a primary design goal.
# The FUSSP requires a small number of central servers to handle certificates, act as "pull servers" for bulk mail, account for mail charges, or whatever, but that is not a problem.
** Well, in this case worse -- It requires a whole banking system!
# The FUSSP requires that anyone wanting to send mail obtain a certificate that will be checked by all SMTP servers.
# You have found that most Internet users would be happy to pay $5/month to avoid spam and do not know the prices of anti-virus software or data.
# You have never heard of RFC 2554 or RFC 2487 and the FUSSP includes fixing the lack of authentication in SMTP.
# The FUSSP involves replacing SMTP.
# Your definition of spam differs significantly from "unsolicited bulk email."
# You frequently use math, statistics, and information theory, and almost as frequently notice people hiding grins or stifling laughs.
-
Re:More info, in a less technical format
Warning Signs of a Flawed Proposal.
The FUSSP depends on spammers or mail recipients changing their behavior without any immediate gain.
Frankly, all you're doing is injecting money into a system that is already rife with abuse and you're expecting that your new system won't be just as badly abused? Now, in addition to scam artists being able to phish for account information, they can hijack e-mail accounts and collect the warranty monies.
Hmmm, yeah, way to go, add more "for the taking" cash into the mix.
-
Here are your spam solutions
Why do solutions always have to cost money or put control in some company's hands? I call bullshit. So here, people, are your solutions to spam:
User-level: spamprobe, bogofilter, spamassassin and spambayes are all very effective statistical filters with bayesian components. Train them well and you will see next to 0 spam, with just about no false positives. I dare say these will filter mail better than a human could do visually.
Those statistical filters aren't scalable. Running a large ISP is more your thing? Then install DCC at your site and enable greylisting on top of it. This will catch nearly all your spam, and false positives are rather rare.
All this software is free and actively developed. There, I've just saved you from spam. Where's my 200 USD consulting fee?
-
No, not simple
Experience has shown that those who say "simply replace SMTP" do not understand the nature of the problem. It's no coincidence that one of the symptoms of being an anti-spam kook is that your solution involves replacing SMTP.
-
Re:E-mail needs to be "closed"
NO. A central authority-based communications system is not going to accomplish much... it will, however, put the power of communications in the hands of few companies (probably monopolies)... it will let them charge fees... and it will ruin the versatility, adaptability, and reliability that we have because there is a great diversity of small hosts handling all their own email.
You want to stop spam? Grab spamprobe or something and watch your spam disappear. You want a more efficient and scalable solution for a big organization? Install DCC and be done with spam for your whole site. Seriously, spam is no longer a problem because both user-side and server-side tools with near perfect accuracy exist. If you're seeing spam, it's because your ISP isn't taking advantage of the filtering solutions that are available.
I'm not talking out of my ass... I've been keeping a close eye on mail and spam issues for the past decade. Spam is dead, so if spam still bothers you force your ISP to employ modern filtering. My university did, and the flood of spam dropped from 100/day to 0 in my account (they're using DCC). At home I employ spamprobe and again I see next to 0 spam.
-
Re:Big Deal
I guess Bill Gates didn't read this.
-
Re:How to solve the spam problem
Looks like you've solved the Final Ultimate Solution to the Spam Problem. Congratulations!
-
You might be an anti-spam kook if...
http://www.rhyolite.com/anti-spam/you-might-be.html
- You have discovered the Final Ultimate Solution to the Spam Problem (FUSSP).
- You plan to make money by licensing the FUSSP.
- You don't plan to make a fortune from the FUSSP, but you do expect fame as its generous and public spirited netizen inventor.
- The FUSSP requires that anyone wanting to send mail obtain a certificate that will be checked by all SMTP servers.
- The FUSSP involves certificates, but there is no barrier to spammers buying many independent certificates.
(even more at the link)
-
Re:Stopping spam.
So, what technology is there right now that deals with certifying legitimacy?
Digital Certificates!
Sigh. Please read this and come back when you have.
-
Re:Fed up reading such non-working stuff
Your original proposal is dead on arrival. You're proposing a solution to spam that involves changing laws, changing SMTP, changing email clients and servers, rejecting email from clients/servers that don't conform to the new standard, putting email certificates in control of a few central servers, and of course, reliance on lawsuits for ultimately stopping spammers.
Congratulations. You have successfully hit almost every point on this list, which was written by someone who actually knows what they are talking about.
-
Re:useless patent
It is a method which is widely used: witness the Distributed Checksum Clearinghouse.
-
Re:The end of spam
Ah, here is another one who has found the Final Ultimate Solution to the Spam Problem.
-
Re:E-mail tax
Shrug, if you want to stop people sending forged email then use PKI.
Taken from the excellent list You Might Be an Anti-Spam Kook If:
the (Final Ultimate Solution to the Spam Problem) [FUSSP] requires that anyone wanting to send mail obtain a certificate that will be checked by all SMTP servers.
the FUSSP involves certificates, but there is no barrier to spammers buying many independent certificates.
you know that certifying that a user legitimately claims a name and has never used some other name is cheap and easy.
-
My toolset
I use a number of levels of filtering:
- Sendmail - Claus Aßmann has some suggested rules for eliminating bogus AOL addresses, bad message IDs, etc. I just use those, plus some of my own "Subject:" filters.
- DCC rejects spam based on how often others and I have seen it, with a distributed database of hard and fuzzy checksums. It is part of SpamAssassin, and I plan to include that soon, too.
- Procmail is my third level of filtering.
- For the crap that gets through, I mark it as spam to levels 2 (automatically) and 3 (manually), so I don't see that again.
-
You Might Be An Anti-Spam Kook If...
I suggest that everyone read You Might Be An Anti-Spam Kook If... and count the number of relevant items. I stopped counting after a few.
-
Don't get your hopes too high
-
Re:9 pages?
Summary:
Dslreports maintains an anti-spam forum, which discusses spam-fighting techniques. A recently registered user, AntiSpamCard, posts to the forum advertising its spam-fighting product, AntiSpamCard. This violates the rules of the forum, so another user, AmeritechTech, looks up the domain registration information (registration service: RegistryFly.com). It is full of false information (mostly na, na, na filled in everywhere). AntiSpamCard claims that false info is RegistryFly's fault. Further investigation leads AmeritechTech to believe AntiSpamCard are, in fact, spammers. The evidence:
- Privacy statement on antispamcard.com states that they have an opt-out policy on receiving info
- Domain listed as unwelcome here and here
From these sites, AmeritechTech discovers that antispamcard.com and putamericatowork.com are both owned by Brad Heckman in Palm Beach, FL. IP address for antispamcard.com seems to be within a block assigned to Crescive, Inc. (not to be confused with some car company), which is also mentioned on antispamcard.com. The host for this block of IPs is traci.net. Traci.net has a strict anti-spam policy. Name servers also appear to be owned by Brad, and hosted by traci.net. Registration of the domain names of the name servers also has na, na, na filled into most fields. Putamericatowork.com turns out to be hosted by aitcom.net, which has a very strict anti-spam policy. AmeritechTech also claims Brad owns spaminsurance.com, but I'm not sure why. IP in the same block (which it is) and identical layouts (can't check, antispamcard.com /.'ed), I think.
After various emails to the various hosting companies, antispamcard.com and spaminsurance.com magically have valid registration information. AmeritechTech also gets an email from Brad from igpbrad@hotmail.com (remember that email) saying the registration info is updated. Antispamcard.com registered to Brad, spaminsurance.com registered to Chad Deckard. Same guy? Associates? Who knows, but there seems to be a link (in later posts, this is contested by "mystery poster" Ry2k, but the link seems pretty strong). Hunting around for Chad Deckard stuff turns up claims on this board that he's associated with a scam to sell Kazaa "Gold", which is really just Kazaa Lite, but with a 9.95 price tag, plus it harvests your email. The site's still up, but I couldn't repeat the behaviour claimed by the message poster (posted back on Sept. 11, 2002) that takes you to infogeneratorpro.com, which seems to be the site registered to Chad. Also conspicuous is that Chad's name shows up on putamericatowork.com, a site owned by Brad (link). Also VERY conspicuous is that Brad emailed from igpbrad@hotmail.com, i.e. InfoGeneratorPro? Maybe a coincidence...
Some more looking uncovers other domains in Chad's name: infogenerator.com, usub.net, and finder-network.com. This is along with spaminsurance.com and infogeneratorpro.com. About this time Ry2k shows up to claim that Kazaa Gold was just a client of Chad's, and when Chad found out what they were doing, the account was eliminated. Ry2k claims to be a former employee of Chad's, and warns the forum of tarnishing the good name of legitimate businesses in their pursuit of spammers. I go to bullet mode, as it's getting late, and I'm tired:
- Reverse look-ups on contact info for antispamcard.com produce a fax number registered to infogenerator.com.
- Domain name servers (safeidentity.net) for antispamcard.com has contact info updated to Crescive, Inc.
- Someone points out that RegistryFly.com may be shady, something about "using CNAME for their MX records". Maybe someone can fill me in...
- Google Groups turns up complaints about spam from
-
Blackhole/blacklist is wrong approach
I don't like the idea of blacklisting IP netblocks, and here's why: when you see spam coming from any given host, it's rarely the netblock that's the problem, rather it's always the spam content that's the problem!
If you understand that point then you can see why all the collateral damage occurs unnecessarily. You're shooting down the wrong target. We're doing it now because it's easier (blackhole IP, bandwidth saved) but the consequence is too great to ignore: we're fracturing Internet-wide communication more and more every day!
We should focus instead on content-based spam filtering, and share that knowledge to improve efficiency. Accuracy skyrockets and collateral damage virtually disappears! You can use intelligent software like spamprobe to classify mail as spam, for instance. There's also the Distributed Checksum Clearinghouse, which lets mail servers around the world determine what's spam based on collective mail data.
A million mail servers sharing with each other what they know about the appearance of this week's spam would be killer. I'd love to see that.
-
Answers
You said you'd like to actually reject some mail. For this to work it has to be done during the SMTP transaction. You can't wait until the LDA gets its hands on the message. You have to do it at the MTA level. SpamAssassin can still do this. However now you need to glue it to Sendmail via a Milter. I highly recommend MIMEDefang for your milter. Actually if you're rolling it out for 50,000 users then I recommend you purchase the commercial version called CanIt. That way you get support and features that aren't in the open-source version. MIMEDefang is a wonder tool. David did a helluva job on it.
I personally use a large number of DNS blacklists. I call them from Sendmail and reject mail with them. Many people don't like DNSBLs; of course I believe these people are ignorant fools who couldn't admin a mail system if their life depended on it. That's ok. At the very least you should be able to use the DNSBLs that list open relays, open proxies, open SOCKS boxes, and vulnerable formmail.cgi web servers. We can surely all agree that you don't want your mail server talking to another mail server that's known to be vulnerable. Most of these specific lists require that an open * be abused before they list them. I'd also contend that we can all justify using Spamhaus's Spamhaus Block List (SBL). It lists known spammers and is very specific about it. You can block roughly 75% of spam with that list alone. Where you use these DNSBLs is up to you. Like I said above, I call all of mine straight from Sendmail. You can configure SpamAssassin to call these DNSBLs for you and assign a score you define. It's pretty easy. This way you can still use lists like SPEWS that rely on collateral damage to score mail but not outright block it. I use SPEWS and love it but it does block some legit mail by design. If you only score off of SPEWS you can minimize the FPs while still maximizing your spam filtering efforts. I am preparing to score foreign countries and RFC-Ignorant domains off of this as well.
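For anyone who hasn't looked at how a DNSBL lookup works under the hood, here is a rough Python sketch of the "score it instead of blocking it" idea from the paragraph above. It is only an illustration, not what Sendmail or SpamAssassin actually do internally. The zone names and weights in the usage note are made up, and the `lookup` parameter is a hypothetical hook so the scoring logic can be tried without touching the network.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets, then the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """True if the zone answers with an A record for this address (needs network)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name (or lookup failure): treat as "not listed".
        return False


def dnsbl_score(ip, zone_weights, lookup=is_listed):
    """Add up the weight of every zone that lists the address."""
    return sum(weight for zone, weight in zone_weights.items() if lookup(ip, zone))
```

Usage would look something like `dnsbl_score(client_ip, {"relays.example.org": 5.0, "aggressive.example.org": 1.5})`, with the result added to whatever other scores the filter produces. A list with known collateral damage gets a small weight so it can tip a borderline message without rejecting mail on its own.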
I do not recommend you use the DCC. I highly recommend you use Razor which IMHO addresses the shortcomings in DCC. Submissions to Razor have to be confirmed unlike in the DCC. This way other people confirm that the message someone submits is actually spam and not JCPenney's spring mailing list. SpamAssassin can make these calls as well.
The mail system you're describing is going to be fairly large. This isn't something you want a single box handling. Ideally you'd put the spam and AV checks on a mailhub ahead of the actual MTA or cluster of MTAs. These boxes act as a spam firewall of sorts and take the CPU-intensive tasks you mentioned off of the actual mail server. I'm not actually using this type of setup myself but I will be eventually. There was a Slashdot article a while back about a setup roughly your size and what a guy did to make it work. It was quite a nice setup. I can't find the link now. IIRC, he scored mail and then sent probable spam via a separate mail queue to a separate spool for each user. Then using IMAP the user could check their probable spam for FPs. It was a nice setup.
You also mentioned Bayesian filtering. Let me make something very clear. Bayesian filters must be applied on a user by user basis. You can't simply enable Bayes for all 50,000 as one lump sum. It will never be able to learn what is and isn't spam that way. You have to let it learn on a user by user basis. The existing Bayes abilities within SpamAssassin don't work well (or at least easily) when SA is called from MIMEDefang. There are supposedly hacks for this but I have yet to see a working one. Along those same lines user-defined preferences also don't work well (or at least easily) fro
-
Maybe not completely...
...but SpamAssassin in combination with Razor and Distributed Checksum Clearinghouse works quite well on most mail servers I've seen.
-
This is not the way to stop spam
New email registries will decrease spam? Set up by online marketers? No, sorry, I don't buy that at all. Remember what their interests are. The problem at hand is... most spammers don't care about creating inconveniences. They are like greedy undisciplined children, and won't stop spamming unless they are forced to (by law, vigilante retaliation, etc.)
To say something constructive now. There are two neat server side spam filtering projects I really like because neither uses IP-based blacklists (blacklists can bring a lot of collateral damage and require frequent judgement calls).
Spamprobe can be run from .procmailrc and uses a Bayesian scoring type of approach. It's a user-level solution which requires some training, but once it's accurate it's quite amazing. Currently it's missing only 3% of my incoming spam.
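To make "Bayesian scoring type of approach" a little more concrete, here is a toy Python sketch of a trainable token filter. It is a generic naive Bayes with add-one smoothing and is NOT spamprobe's actual algorithm; the class name `TinyBayes`, the tokenizer, and the smoothing constants are all assumptions made for illustration. It only looks at which tokens are present in a message and ignores absent ones, which is a simplification.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class TinyBayes:
    def __init__(self):
        # Per label: in how many training messages each token appeared.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one hand-classified message; label is 'spam' or 'ham'."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Combine per-token evidence in log-odds space; returns a value in (0, 1)."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

The "requires some training" part is the `train` call: every message the user classifies shifts the per-token counts, which is also why the counts need to be kept per user rather than shared across a whole site.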
The Distributed Checksum Clearinghouse also runs server side and uses fuzzy checksums to identify mail that is being received by a suspiciously large number of mail hosts around the world. A brilliant idea which works better than you may think. I have never seen a false positive with this system, and it misses about 1/4 of incoming spam. Effectiveness will improve as more hosts join the distributed checksum system!
-
Re:Bad Addresses
How is this different from the open-source Vipul's Razor, Pyzor or DCC, all of which are already in wide use through their easy integration with SpamAssassin?
Clearly a proprietary system just won't be as good because it needs, by its very nature, a lot of subscribers to be effective. Having said this, Cloudmark seems to do alright by using Razor's network.
-
Summary of IETF ASRG discussions
Four days ago when this was mentioned on slashdot, I posted the following summary of what had been discussed. Sadly, this summary is still pretty complete.
What I take from all this discussion is that the only "solution" to spam is to do the types of things that we have been doing for years, but to do more of it and quicker. Use well run DNS blacklists (Spamhaus SBL, ordb, dsbl, etc.), good content filters (bayesian filters, etc.), bulk mail detectors (such as DCC or Vipul's Razor), and per-user whitelists and blacklists.
Or, combine all of the above techniques by using SpamAssassin
--
I've been subscribed to the list since near the beginning and have been following it fairly closely. Much of the discussion has been rehashes of old topics such as "what exactly is spam?", "make the sender pay something, either money or CPU", etc.
The most interesting discussions that I've seen so far are:
- Mail transfer programs (MTA) such as sendmail, exim, qmail, etc., should keep track of sender-recipient pairs. The first time the sender-recipient pair shows up, sendmail (or whatever) should issue a "temporary delivery failure". This will force the sending mail transfer program to queue the mail and resend it later. This is completely backwards compatible and doesn't require end users to do anything.
Most spam specific programs will not queue and retry, and thus the spam will be dropped.
Spammers that use real mail transfer programs or open relays will need to be able to hold all their outgoing spam for a while, increasing the spammer's costs and slowing down the delivery of spam. Legitimate email will not be thrown out, it will only be delayed and only for the first time.
Of course, you don't really want the databases to remember every sender-recipient pair forever, nor do you want to remember pairs that were added by spam so this really isn't a "first time" database, but it is close.
Apparently the "canit" program already does this, but I had not heard of this technique before.
- Spam filtering really needs to be done while the email is being received. Sendmail can already do this with the milter filter, but other MTAs should also. Most mail servers are I/O bound, not CPU bound so this really isn't much of a burden on the server.
If you filter during the email receive process, you can make the sending MTA do the bounce. This means that you will not have to deal with spammers forging "from" and "reply-to" headers. You won't have to clean up bounces that never succeed, nor will you be responsible for bouncing spam to another victim that the spammer selected for the "from" or "reply-to" headers.
Also, false positives will receive a bounce message instead of just disappearing. This reduces the danger of important email being lost.
- There are also several proposals to deal with ways of verifying that email being sent from a given IP address and claiming to be from a certain domain is actually authorized to send email claiming it is from that domain.
Right now, there are DNS records that tell you which IP addresses are valid to try to send email to for a given domain (the MX records), but many ISPs have different machines for sending and receiving email. There are currently no DNS records to tell you which IP addresses a domain will send email from.
The problem with this kind of proposal is that there are many people who think they have legitimate reasons to forge "from" or "reply-to" addresses. It also forces ISPs to make sure that every time they add a new outgoing mail server, they need to update the list of valid IP addresses. If they forget to do this, then only bleeding edge spam filters will detect a problem.
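The first idea in the list above (temporary failure the first time a sender-recipient pair shows up) is simple enough to sketch in a few lines of Python. This is only a rough illustration under stated assumptions, not how canit or any real MTA implements it: the class name `Greylist`, the 5-minute minimum delay, the 30-day expiry, and the reply strings are all made up for the example. Entries here simply expire a fixed time after they were first seen, which covers the "don't remember pairs forever" point in the crudest possible way.

```python
import time


class Greylist:
    def __init__(self, min_delay=300, max_age=30 * 24 * 3600, clock=time.time):
        self.min_delay = min_delay  # seconds a new pair must wait before a retry is accepted
        self.max_age = max_age      # seconds after which a remembered pair is forgotten
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.first_seen = {}        # (sender, recipient) -> time of first attempt

    def check(self, sender, recipient):
        """Return an SMTP-style reply line for one delivery attempt."""
        now = self.clock()
        key = (sender, recipient)
        first = self.first_seen.get(key)
        if first is None or now - first > self.max_age:
            # Never seen (or forgotten): remember the pair and ask the sender to retry.
            self.first_seen[key] = now
            return "451 Temporary failure, please try again later"
        if now - first < self.min_delay:
            # Retried too quickly: still a temporary failure.
            return "451 Temporary failure, please try again later"
        # A real MTA queued the message and came back later: accept it.
        return "250 OK"
```

A sender that queues and retries gets through on a later attempt, while software that fires once and moves on never does. Nothing on the sending side has to change, which is the backwards-compatibility point made above.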
-
You can sign up for the mailing list here: https://www1.ietf.org/mailman/listinfo/asrg
Among many, many others, I saw Vernon Schryver, the guy behind Distributed Checksum Clearinghouse, on the list. It's been pretty high volume, though, and I haven't had a chance to really spend some time reading it yet.
-
Re:Always with the legislation...
An excellent idea, and such a system already exists, see the Distributed Checksum Clearinghouse:
http://www.rhyolite.com/anti-spam/dcc/
There's also Vipul's Razor:
http://razor.sourceforge.net/
Which works on a similar principle, but by checksum of reported spam mails rather than by volume. The more times a checksum is reported (and who by) the more likely it is to be spam - beyond a certain level checksums will be considered spam. Razor catches a good amount of spam for me.
The best way to fight spam imo is to employ a mix of anti-UCE tools, i.e. DNSBLs to block connections + rbl-milter to 'tag' mail based on a very wide range of DNSBLs + a spam checksum clearinghouse (e.g. Razor) + a content filter to rate mails according to content and whether they have headers inserted by the aforementioned anti-UCE tools.
-
Re:He missed DCC - Distributed Checksum Clearingho
Yes, DCC looks very promising. My university uses it and I have never seen it mark a message as spam when it wasn't (this is very good).
It often misses spams, but as more people run DCC servers the detection will improve. Detection also improves as spammers target more recipients at once - in a way, they're announcing their presence to the system.
Keep an eye on this one! See the dcc FAQ.
-
you need dcc.
Many people use this at their site in conjunction with SpamAssassin.
http://www.rhyolite.com/anti-spam/dcc/
Checksumming is a great solution today. It allows you to whitelist servers you do want, and it's possible to set up interfaces to allow individuals to control what's whitelisted, as well as tune the threshold of what's acceptable.
It allows you to choose which checksum provider(s) you want to trust. It allows you to contribute to the blockage of spam in a community architecture by providing checksums of external mail you receive to other checksum providers.
--
othermark
-
Re:Implement this idea
It's been done. You want the Distributed Checksum Clearinghouse.
-
Table turning
Anybody else remember Robert McElwaine?
Just wait until these bozos start getting tons of "political" e-mail from nut cases like McElwaine. I suspect that then they'll start saying "Oh, political spam is only OK if it comes from a legitimate candidate."
There's no hope, though. The junk-fax laws and the anti-telemarketing laws already exempt political appeals. Never mind that a ban would be perfectly constitutional (under the time, manner, and place doctrine). There's no way the politicians are going to write a law that makes it harder for them to "communicate with their constituents".
Fortunately for me, DCC is apolitical. It doesn't give a hoot what the content is, as long as it's unsolicited and bulk.
-
DCC vs Statistics
I'll always be wary of statistically filtered spam mail, especially if you're simply filtering on the probabilities of words. Plus I think this is something that spammers can figure a way around by altering their choice of words and phrases.
The only "trait" that all spam mail has is that the same message is sent to hundreds or thousands of recipients. A trait which can not be altered.
The Distributed Checksum Clearinghouse (DCC) filters on exactly this aspect. You can find it here
The mail server runs DCC on every incoming message and computes a fuzzy checksum for the message. This checksum is then reported to a central set of servers, which record the presence of this checksum and report back to the mail server the number of times others have reported a similar message. If you get a high number back, it's spam and the mail server rejects the message.
Similar messages generate identical checksums. So personalizations and random tokens do nothing to circumvent the filtering.
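To make the report-and-count loop above concrete, here is a rough sketch in Python. It is not the DCC code or protocol: `fuzzy_checksum`, `Clearinghouse`, `should_reject` and the threshold of 3 are all made-up names and values for illustration, and the "fuzzy" step is a toy that only ignores case, digits, punctuation and spacing (so a changed name would still change the hash here).

```python
import hashlib
import re
from collections import Counter

BULK_THRESHOLD = 3  # hypothetical value; a real site would tune this


def fuzzy_checksum(body: str) -> str:
    """Toy stand-in for a fuzzy checksum: hash only the runs of letters."""
    normalized = re.sub(r"[^a-z]+", " ", body.lower()).strip()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class Clearinghouse:
    """In-memory stand-in for the central servers that count reports."""

    def __init__(self) -> None:
        self.counts = Counter()

    def report(self, checksum: str) -> int:
        """Record one sighting and return the total seen so far."""
        self.counts[checksum] += 1
        return self.counts[checksum]


def should_reject(server: Clearinghouse, body: str,
                  threshold: int = BULK_THRESHOLD) -> bool:
    """Report the message's checksum; reject once the count reaches the threshold."""
    return server.report(fuzzy_checksum(body)) >= threshold
```

With this toy, three copies of a message that differ only in a numeric tracking token land on the same checksum, so the third one is rejected, while a one-off message is not.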
I think that if every existing sendmail/qmail server ran DCC then spam would simply cease to function instantly. Currently, though, I don't perceive there to be a sufficient number of mail servers computing and reporting checksums to make it 100% effective, but my server is currently filtering out about 95% of spam mail.
This is not as good as the 99.95% reported by this article, but DCC will be more resistant to spammers getting clever and attempting to use statistically rare words or phrases to defeat the anti-spam filter.
-
Re:Easy way to beat spam 100%
What good is that? Well, you've got a ready-made list of messages to filter *out* of your other mail boxes!
WOW, what a *great* idea! What if you could make it so that it knew not only about spam sent to your spam trap, but spam sent to thousands of spam traps and real users? Oh wait, that exists already. Look at Vipul's Razor and DCC.
Shayne
-
Admins: Use DCC!
At the government agency where I work, we get thousands of spam messages a day from slimeballs all over the world. Why? Well, another agency posted all our email addresses to the web once, people in the agency are clueless and "punch the monkey", etc. The usual reasons. We installed an anti-spam program from Trend (e-Manager), but it's a string-search program.
Note to newbies at server-based spam-blocking: String-search programs suck. Half the time I got false positives and had users parading outside my cube with pitchforks and torches. The other half of the time it was false negatives and the user received the spam...and then sent it to us. ALL the time, I was updating the list of banned phrases, which is essentially "shutting the barn door behind the horse".
Recently, I've been testing DCC. It operates on checksums, kind of a "word-of-mouth" approach to spam. The theory is that if you have enough DCC servers, keeping a count of the message checksums, then you can block it based on its "bulkiness". I tested my inbox on a CGI demo of it that they have on their server, and it had a 100% accuracy rate.
I'm not going to go into it much further, since you can read the docs, but this is the first day of the test, and so far, I've got a couple thousand hits; 90% of it is spam (I'm updating my whitelist as I write this). There are a couple programs like it (I heard on the Register that they're putting out one like it using a P2P client model), but I think the future of spam-busting is in this.
Gazing at the lewd/fraudulent/ridiculous subject lines cropping up in my DCC logfile, I realize: If the Internet had a body, this part would be the ass. Seeing all of it makes you almost despair for humanity....except for the fact that DCC caught it, and you know people won't have to look at it. ;)
As far as I can see, the more admins get involved in this, the harder it becomes for spam to propagate...and there are a dozen other tricks you can do to cut it down. So what are you waiting for? Join in the fun. There are some problems with this method (the worst being that you need to "whitelist" legitimate bulk mail or it'll get caught), but it's definitely the best approach to killing spam that I've seen yet.
-
Re:Anti spam p2p, what happened?
-
Re:Interested in MAPS? Also Check out DCC...
-
Try DCC for spam control
DCC, or Distributed Checksum Clearinghouse, is a system that adds a header to each of your e-mails. When the internet gets slammed with spam, you can use this header to strip out e-mails which are most likely spam. Here's an example header:
X-DCC-wanadoo-be-Metrics: thermonuclear.org 1016; From=0 Message-ID=0
Received=0 Body=many Fuz1=many Fuz2=many
Basically every e-mail you get, you pipe through a program. The program takes all the headers and the body, generates a checksum on them, and stores it in a database. As you can see from above, you have From, Message-ID, Received, Body, Fuz1 and Fuz2. If everyone on the net gets 10,000 e-mails from the same From: line, it would show "many" instead of 0 (zero). Here the Body of the spam, as well as two Fuzzy methods (lossy?:-) identify this e-mail as something that has gone to tons of people, and is marked as such. Then I just have procmail spit it into /dev/null and voila! It's gone.
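If you'd rather not eyeball the header, something like the following Python sketch can pull the counts out of it. The function names, the threshold of 100 and the choice of which fields to check are my own assumptions for illustration; only the header layout comes from the example above.

```python
import re


def parse_dcc_counts(header_value: str) -> dict:
    """Return {checksum name: count}; the word 'many' is mapped to infinity."""
    counts = {}
    for name, value in re.findall(r"([\w-]+)=(\w+)", header_value):
        if value == "many":
            counts[name] = float("inf")
        elif value.isdigit():
            counts[name] = int(value)
        # anything else is skipped rather than guessed at
    return counts


def looks_bulk(header_value: str, threshold: int = 100,
               fields: tuple = ("Body", "Fuz1", "Fuz2")) -> bool:
    """True if any of the chosen checksums has been seen at least `threshold` times."""
    counts = parse_dcc_counts(header_value)
    return any(counts.get(field, 0) >= threshold for field in fields)
```

Feeding it the value of the example header marks the message as bulk because Body, Fuz1 and Fuz2 all say "many"; a header with every count at 0 passes.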
There are hooks for sendmail and qmail if you want to do it enterprise wide. I've been real happy with it. Only on a few occasions do I lose mail, but mostly because I haven't set up my "white list" or approved senders.
More info on Rhyolite's site.
Peter
-
Re:Another spam system
Such systems can be cool, but they have two major shortcomings. The first is that they cannot start rejecting spam before it has been seen and manually reported by at least one good guy. From my logs, it seems the bad guys like to burst their spews at odd hours, such as when they get home from a hard day begging with a "homeless please help" sign.
Second, it is practically impossible to maintain a list of more than a tiny number of only good guys. If there is any real incentive, the bad guys will get on the list with as many aliases as they need to skew the system. You must either keep the list tiny enough that all members are known to all other members, or you must assume that bad guys are present. Voting or trust schemes can ensure that no more than 5% or perhaps even 1% of members are secret bad guys, but that's not good enough for an anti-spam system that hopes to have a false negative rate lower than 40% and a false positive rate of less than 1%.
As I understand it, this Razor can be used with spam traps (addresses that get no legitimate mail) to largely avoid the first problem. If you are extremely careful and lucky about keeping secrets, spam traps can fix the second problem. The need for lucky secrecy comes in keeping the bad guys from knowing about any of your spam traps lest they send them legitimate mail (e.g. CERT advisories).
A major problem with spam traps is getting the bad guys to spam them. It is easy to build a spam trap that receives some spam, but if you want to reject more than 10-20% of spam, you need more. For example, you need to get the big commercial and political outfits to send their wonderful news to your traps, but they're not going to scrape domain contacts or netnews or use the standard dictionary attack list. (My copy of the standard dictionary attack list is fairly complete. Used with a DCC client, it collects a lot of spam.)
All of that is why I believe in automated checksum reporting without any humans in the loop. I think you must start rejecting copies of a spew within minutes and ideally seconds of its start. That's why one of the design criteria of the DCC is that servers should send the checksums of a message to their peers within seconds of when its recipient count reaches "bulk."
There is a third problem with Fabien Penso's system as I understand it. That is that none of the SMTP envelope or headers are reliable indications of spam, if you want a low false negative rate. If there is one thing that spammers can invent, it is new usernames.
-
Re:Similar to DCC
Whether the checksum is SHA or MD5 is obviously completely irrelevant to whether the input of the hash is "fuzzy."
Some people think that SHA may be more secure than MD5. To date that supposed weakness in MD5 is at most a suspicion. For the purposes of detecting spam, it is also completely irrelevant, since the ability of a bad guy to compute collisions is not interesting. Using MD5 or SHA instead of a long CRC is mostly just good politics for dealing with people who don't understand or care to think about any relevant threat model. The hash must be long enough to have a probability of collision less than the probability of failures elsewhere, whether in hardware or software. For that you want 64 or 128 bits. There is very common and reasonably fast code to compute MD5, so I chose MD5 for the existing DCC checksums. There is nothing in the DCC protocol that requires the future DCC checksums to use MD5.
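The hash-length point can be checked with the usual birthday approximation. The sketch below is only that approximation, assuming an ideal hash that spreads messages uniformly; the figure of a billion distinct messages is an assumption picked for illustration, not a number from the DCC.

```python
import math


def collision_probability(num_messages: int, hash_bits: int) -> float:
    """Birthday approximation: chance that any two of num_messages share a hash."""
    pairs = num_messages * (num_messages - 1) / 2
    # 1 - exp(-x), computed with expm1 so tiny probabilities do not round to 0
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, collision_probability(10 ** 9, bits))
```

Under those assumptions a 32-bit checksum is all but certain to collide somewhere, a 64-bit one has a chance of a few percent, and at 128 bits the chance is far below anything hardware or software failures would contribute.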
"Normalizing" the message is the essence of "fuzziness." Whether you convert the message to a grammar tree, histogram of words, ignore typical spammer "customizing," or anything else before computing the checksum, you are doing no more or less than "normalizing," at least for any useful meaning of the word I can think of.
Vernon Schryver vjs@rhyolite.com
-
Re:I wouldn't trust this too much.
I'm inclined to trust the DCC far more, but only because it is my code. The DCC is completely independent of NANAE. I suspect most DCC users don't know what "NANAE" means.
Except that both this package and the DCC involve exchanges of checksums, I don't see major similarities between the two. Perhaps that is just my NIH syndrome talking. The DCC has been in use for a bunch of mailboxes since last year.
I think there is a major problem common to both that I deal with by saying "don't do that." That problem is dealing with bad guys. What happens if a bad guy subscribes to a mailing list you like such as CERT advisories, and submits checksums for those messages? My answer is that if your DCC server accepts checksums from DCC clients not under your personal thumb, then you must whitelist all of your incoming mailing lists because your DCC server only detects "bulkness" and not "unsolicited bulkness."
If you accept checksums from strangers, then the effectiveness of your system for detecting bulkness increases significantly, but you can't trust people you don't know. Worse, by the time you have a significant number of users, the hassles of bookkeeping force you to assume that at least a few of them are bad guys.
Then there are mistakes by good guys. What happens if a good guy accidentally submits the checksum(s) of a CERT advisory? The answer for the DCC is the same as for bad guys. If you feed your DCC server with anything except spam traps that cannot receive any legitimate mail, you can consider all hits to be unsolicited bulk email. If you let humans submit checksums of what they think is spam, you cannot trust them to never make mistakes, and so must treat your DCC server as telling you only about "bulkness."
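The distinction drawn above between "bulk" and "unsolicited bulk" can be sketched as a small decision function. This is an illustration in Python, not DCC code: `classify`, the threshold of 20 and the `fed_only_by_spam_traps` flag are hypothetical names and values.

```python
def classify(sender: str, bulk_count: int, whitelist: set,
             threshold: int = 20, fed_only_by_spam_traps: bool = False) -> str:
    """Decide what a checksum count tells you about one message."""
    if sender in whitelist:
        # Whitelisted mailing lists are wanted even though they are bulk.
        return "accept"
    if bulk_count < threshold:
        return "accept"
    # A high count proves only bulk, unless every report came from
    # spam traps that cannot receive legitimate mail.
    return "unsolicited bulk" if fed_only_by_spam_traps else "bulk"
```

So a high count for a whitelisted CERT list address is accepted, and the same count from an unknown sender is called "unsolicited bulk" only when the server is fed purely by spam traps; with strangers or humans submitting checksums, the most you can say is "bulk."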
Vernon Schryver vjs@rhyolite.com
-
I wouldn't trust this too much.
-
Similar to DCC
See also DCC, the distributed checksum clearinghouse. It uses a fuzzy hash so that bulk emails with minor differences are caught. I think the details differ a lot but the idea is more or less the same.
-
The checksum is fuzzy
Many posters seem to be naively assuming that dcc uses a checksum such as md5 which would change radically for a minor change in input. Dcc does in fact use md5 as a component but the actual checksum is adapted to the requirement.
Download the source tarball, uncompress, untar and read /dcclib/ckfuz1.c. This checksum is clearly designed to be resilient to minor changes.
On a deeper note, it's sad that so many Slashdot readers, including apparently CmdrTaco, underestimate others so severely. Do you really think someone put in the effort to make something like dcc and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found:
Because simplistic checksums of spam would not be very effective, the main DCC checksum is fuzzy and ignores various aspects of messages. The fuzzy checksum will need to be changed as spam evolves.
Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.