Distributed Spam Detection

Posted by CmdrTaco on Saturday December 1, 2001 @05:20AM from the interesting-ideas dept.

A reader writes "There's an interesting project at SourceForge called "Vipul's Razor" that uses a Gnutella-like system to let users exchange spam "signatures" to filter spam. I work at an ISP in Ottawa, and we have been using it for the last two weeks to stop the bulk of the spam coming to our POP3 accounts. More impressively, it hasn't tagged any valid mail as spam yet. Here's the scoop from its webpage: "Vipul's Razor is a distributed, collaborative, spam detection and filtering network. Razor establishes a distributed and constantly updating catalogue of spam in propagation. This catalogue is used by clients to filter out known spam. On receiving a spam, a Razor Reporting Agent (run by an end-user or a troll box) calculates and submits a 20-character unique identification of the spam (a SHA Digest) to its closest Razor Catalogue Server. The Catalogue Server echoes this signature to other trusted servers after storing it in its database. Prior to manual processing or transport-level reception, Razor Filtering Agents (end-users and MTAs) check their incoming mail against a Catalogue Server and filter out or deny transport in case of a signature match."" Cool idea. I'm up to around 80% spam a day on my main mail account. Might be worth a try.
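The report/check cycle in the quoted description can be sketched in a few lines of Python. This is a minimal, in-memory sketch that assumes the whole message body is hashed with SHA-1 (a 20-byte digest). The `Catalogue` class and its methods are hypothetical stand-ins. They are not Razor's actual protocol or API, and they do no networking.

```python
import hashlib


def signature(body: str) -> str:
    """SHA-1 digest of the message body as 40 hex digits (20 bytes)."""
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


class Catalogue:
    """Hypothetical stand-in for a Razor Catalogue Server (no networking)."""

    def __init__(self) -> None:
        self.known: set[str] = set()
        self.peers: list["Catalogue"] = []

    def report(self, body: str) -> None:
        # A Reporting Agent submits the signature of a spam it received.
        self.receive(signature(body))

    def receive(self, sig: str) -> None:
        # Store the signature, then echo it to trusted peers.
        if sig in self.known:
            return  # already stored; also prevents echo loops between peers
        self.known.add(sig)
        for peer in self.peers:
            peer.receive(sig)

    def check(self, body: str) -> bool:
        # A Filtering Agent asks whether this body matches known spam.
        return signature(body) in self.known
```

For example, if two catalogues list each other as peers, a spam reported to one of them is immediately rejected by a filter that checks against the other.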

26 of 304 comments

SpamBouncer by joib · 2001-12-01 05:27 · Score: 5, Informative

I'm personally using SpamBouncer, a procmail-based spam filter. Works fine for me.
Fighting spam by Brian+Kendig · 2001-12-01 05:36 · Score: 5, Informative

I'll post my usual public service announcements here:

SpamCop is a great service for reporting spam: just paste the spam message into the web form, and it'll automatically figure out where the spam came from and send complaints off to the appropriate people.

The Spam Bouncer is a procmail-based personal spam screening tool. It's got some interesting features, but I haven't used it in a long while.

The way I avoid spam is to have my mail client screen out any email which contains any of these phrases:

- to be removed
- to be permanently removed
- to get removed
- to get off the list
- to get off this list
- to be taken off
- to remove yourself
- removal instructions
- remove in subject line
- "remove" in subject line
- remove in the subject
- "remove" in the subject
- 'remove' in the subject
- S.1618
- S. 1618

This list by itself catches about 80% of the spam I get.
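Here is a minimal Python sketch of this kind of phrase screen. It assumes a case-insensitive substring match over the whole message. The phrase list is an abbreviated copy of the one above, and `looks_like_spam` is a hypothetical name. A plain substring match like this also catches legitimate mail that happens to quote these phrases.

```python
# A subset of the screening phrases listed above, lowercased for matching.
SPAM_PHRASES = [
    "to be removed",
    "to get off the list",
    "removal instructions",
    "remove in subject line",
    '"remove" in the subject',
    "s.1618",
    "s. 1618",
]


def looks_like_spam(message: str) -> bool:
    """True if any screening phrase appears anywhere in the message, ignoring case."""
    text = message.lower()
    return any(phrase in text for phrase in SPAM_PHRASES)
```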
1. Re:Fighting spam by sqlrob · 2001-12-01 05:40 · Score: 2, Informative
  
  don't forget:
  
  one time mailing
2. Re:Fighting spam by invenustus · 2001-12-01 05:47 · Score: 2, Informative
  
  The way I avoid spam is to have my mail client screen out any email which contains any of these phrases:
  
  Um, are you on any legitimate mailing lists? Don't those get filtered out? I'd imagine half of Slashdot's readership is on one or more of the Linux development lists. I'm on Yahoo! Groups mailing lists for any number of different interests....
  
  --
  grep -ri 'should work' /usr/src/linux | wc -l
3. Re:Fighting spam by suwain_2 · 2001-12-01 06:20 · Score: 2, Informative
  
  I think there's a potential problem with this... Not sure if you'll ever have any actual problems with it, but...
  Suppose you send me mail with the exact text in your post. Now, I don't actually get any spam, but it's not a problem. But let's say I reply, and leave the original text. Suddenly, my mail meets every single criterion that you're filtering on.
  
  --
  ________________________________________________
  suwain_2 :: quality slashdot p
Re:Great use of p2p by Anonymous Coward · 2001-12-01 05:38 · Score: 1, Informative

Reptile is a distributed publishing agent.
How do you compute a signature? by cperciva · 2001-12-01 05:41 · Score: 5, Informative

As far as I can tell from a quick glance at this, it looks like the entire message body is being used to compute the signature. This isn't going to work very well -- over half of the spam I receive is "personalized", and that fraction is growing every day.

This could work very well, but we need some way of computing signatures which will be invariant across different copies of personalized spam for this to be effective.

--
Tarsnap: Online backups for the truly paranoid
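To illustrate the problem: two copies of the same spam that differ only in the recipient's name produce unrelated SHA-1 digests. One partial workaround is to normalize the body before hashing. The sketch below assumes the filter knows the recipient's name or address and masks it out. That step and the `normalized_digest` helper are hypothetical; the post does not say Razor does anything like this.

```python
import hashlib
import re


def digest(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def normalized_digest(body: str, recipient_tokens: list[str]) -> str:
    """Hash the body after masking recipient-specific tokens (hypothetical step)."""
    for token in recipient_tokens:
        body = re.sub(re.escape(token), "<RCPT>", body, flags=re.IGNORECASE)
    # Collapse runs of whitespace so reflowed copies hash the same.
    body = " ".join(body.split())
    return digest(body)


template = "Dear {name}, you have been selected for a FREE vacation!"
copy_a = template.format(name="Alice")
copy_b = template.format(name="Bob")

print(digest(copy_a) == digest(copy_b))  # False
print(normalized_digest(copy_a, ["Alice"]) == normalized_digest(copy_b, ["Bob"]))  # True
```

This only handles personalization the filter can predict. Anything else that varies per copy still changes the digest.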
SpamAssassin uses Razor by wideangle · 2001-12-01 05:49 · Score: 5, Informative
From http://spamassassin.taint.org/:
SpamAssassin is a mail filter to identify spam.
Using its rule base, it uses a wide range of heuristic tests on mail headers and body text to identify "spam", also known as unsolicited commercial email.
The spam-identification tactics used include:
- header analysis: spammers use a number of tricks to mask their identities, fool you into thinking they've sent a valid mail, or fool you into thinking you must have subscribed at some stage. SpamAssassin tries to spot these.
- text analysis: again, spam mails often have a characteristic style (to put it politely), and some characteristic disclaimers and CYA text. SpamAssassin can spot these, too.
- blacklists: SpamAssassin supports many useful existing blacklists, such as mail-abuse.org, ordb.org or others.
- Razor: Vipul's Razor is a collaborative spam-tracking database, which works by taking a signature of spam messages. Since spam typically operates by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to the database -- at which point everyone else will automatically block it.
Once identified, the mail can then be optionally tagged as spam for later filtering using the user's own mail user-agent application.
SpamAssassin requires very little configuration; you do not need to continually update it with details of your mail accounts, mailing list memberships, etc. It accomplishes filtering without this knowledge, as much as possible.
Call your ISP and ask if they use it.
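SpamAssassin combines many weak heuristic tests into a single verdict. That approach can be sketched as a weighted score compared against a threshold. The rule names, weights and the 5.0 threshold below are illustrative assumptions. They are not SpamAssassin's actual rule set or configuration, and the Razor rule is only a placeholder for a catalogue lookup.

```python
import re
from typing import Callable

# Each hypothetical rule: (name, test on the raw message, weight).
Rule = tuple[str, Callable[[str], bool], float]

RULES: list[Rule] = [
    ("SUBJ_ALL_CAPS", lambda m: bool(re.search(r"^Subject: [A-Z0-9 !$]+$", m, re.M)), 1.5),
    ("REMOVE_INSTRUCTIONS", lambda m: "to be removed" in m.lower(), 2.0),
    ("CLAIMS_S1618", lambda m: "s.1618" in m.lower() or "s. 1618" in m.lower(), 2.5),
    ("RAZOR_HIT", lambda m: False, 3.0),  # placeholder: would query a Razor catalogue
]

THRESHOLD = 5.0


def score(message: str) -> tuple[float, list[str]]:
    """Return the total score and the names of the rules that fired."""
    hits = [name for name, test, _ in RULES if test(message)]
    total = sum(weight for name, _, weight in RULES if name in hits)
    return total, hits


def is_spam(message: str) -> bool:
    total, _ = score(message)
    return total >= THRESHOLD
```

No single rule here is enough to reject a message. Mail is tagged only when several independent signs agree.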
Re:Stopping bogus entries? by cwebster · 2001-12-01 05:52 · Score: 2, Informative

Search Google for SHA digest, read how it works, then take a good look at your question.
This is just a temporary solution. by mrsam · 2001-12-01 05:52 · Score: 5, Informative

Spam generators have been trying to hash-bust these kinds of filters for years now. Even four-year-old spam generators automatically append random junk to the end of the Subject header or to the tail end of the message, in order to defeat the early hash-based spam filters.

This is probably a 'fuzzy' hash function that should ignore minute variations. However, it goes without saying that if this hash-based spam filter becomes widespread, then the spammers will simply figure out how to hash-bust their way past it.

To have any hope of working over the long term, this kind of an approach must include the ability to distribute not just the hashes themselves, but the hash function as well, so that the hash function itself can be adjusted, when needed.
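The Python illustration below covers both points. First, appending a few random characters is enough to change an exact SHA-1 digest. Second, tagging every signature with an algorithm identifier is one way to let the hash function be replaced later. The `HASHERS` registry, the `v1:`/`v2:` prefixes and the "first 50 words" rule are all hypothetical. The v2 function is only a toy defence against junk appended at the tail.

```python
import hashlib
import secrets
from typing import Callable


def sha1_whole(body: str) -> str:
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def sha1_head(body: str, words: int = 50) -> str:
    # Toy defence: ignore everything after the first `words` words,
    # so junk appended at the tail does not change the digest.
    head = " ".join(body.split()[:words])
    return hashlib.sha1(head.encode("utf-8")).hexdigest()


# Hypothetical registry: new hash functions can be added without
# invalidating signatures already stored under older versions.
HASHERS: dict[str, Callable[[str], str]] = {"v1": sha1_whole, "v2": sha1_head}


def make_signature(body: str, version: str = "v2") -> str:
    return f"{version}:{HASHERS[version](body)}"


def matches(body: str, stored: str) -> bool:
    version, _, value = stored.partition(":")
    return HASHERS[version](body) == value


spam = "Buy cheap toner now. " * 15            # a 60-word spam body
busted = spam + secrets.token_hex(8)           # random junk appended at the end

print(sha1_whole(spam) == sha1_whole(busted))      # False
print(matches(busted, make_signature(spam, "v2")))  # True
```

A spammer who learns the v2 rule can simply put the junk at the top instead. That is why the post argues for distributing the hash function itself.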
Re:Stopping bogus entries? by Anonymous Coward · 2001-12-01 05:53 · Score: 3, Informative

You don't seem to understand the concept of a hash. A hash function splits a message into blocks of n bits. If the message length is not a multiple of n, it's padded using one of several techniques. Each block is then run through a compression function that's difficult to reverse, together with the output from the previous block. For instance, MD5 takes 512-bit input blocks and carries a 128-bit value from one block to the next; the value left after the last block is the message digest, or hash. The chance of two random messages having the same hash falls off exponentially with the length of the hash (about 1 in 2^n for an n-bit hash). The ability of an attacker to find two messages with the same hash depends on the strength of the hash. Hope this clarifies everything.
.derf
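Python's standard `hashlib` module exposes the block and digest sizes mentioned above. Its incremental `update()` interface reflects the block-by-block processing: feeding a message in pieces gives the same digest as hashing it in one go. The sizes also explain the "20-character" SHA signature: a SHA-1 digest is 20 bytes (160 bits), or 40 hex digits.

```python
import hashlib

md5 = hashlib.md5()
print(md5.block_size * 8, md5.digest_size * 8)    # 512 128

sha1 = hashlib.sha1()
print(sha1.block_size * 8, sha1.digest_size * 8)  # 512 160

message = b"Make money fast! " * 100

# Hash the whole message in one call...
one_shot = hashlib.md5(message).hexdigest()

# ...or feed it in uneven chunks; the internal state carries over between calls.
incremental = hashlib.md5()
for start in range(0, len(message), 37):
    incremental.update(message[start:start + 37])

print(one_shot == incremental.hexdigest())  # True
```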
Re:Stopping bogus entries? by cheebie · 2001-12-01 05:54 · Score: 2, Informative

In the first place, it's not 20 words, it's 20 characters. In the second place, those 20 characters are simply the SHA signature of the offending message. I assume they key on some of the more constant headers and (possibly part of) the body of the text. By the very nature of digital sigs, it would be difficult (impossible?) to key on something like "any post with the word 'carroway' in it".
Bogus hashes won't tag valid mail by morzel · 2001-12-01 05:57 · Score: 4, Informative

The beauty of a cryptographic hash function is that it's purely one-way: it is very easy to check if two messages are the same (they calculate to the same hash), but it is nearly impossible (or at least very very very hard) to calculate the message for any given hash.

Injecting random hashes into the network won't result in valid emails being tagged, but can flood/DOS the catalogue machines.

It would be possible to create hashes for a number of "probable" emails, but messages are so diverse that the chances of actually stopping a legitimate mail are quite slim.

--
Okay... I'll do the stupid things first, then you shy people follow.
[Zappa]
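Here is a rough worked estimate of why random bogus entries are harmless to legitimate mail. It assumes SHA-1 behaves like a random function with 160 output bits. A single injected random digest matches a given legitimate message with probability 2^-160. By the union bound, k injected digests match it with probability at most k times that. This bound covers random digests only. It does not apply to the "probable emails" case above, where the attacker hashes messages guessed in advance.

$$
P(\text{legitimate mail flagged}) \le k \cdot 2^{-160}, \qquad k = 10^{9} \;\Rightarrow\; P \le 10^{9} \cdot 2^{-160} \approx 6.8 \times 10^{-40}
$$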
Mailwasher by Heem · 2001-12-01 06:06 · Score: 3, Informative

I'm using Mailwasher; it works well for me. It lets you preview your message headers, and delete, blacklist and 'bounce' anything you don't want to receive. Works well on spam as well as email from your ex-girlfriend.

--
Don't Tread on Me
Re:Why this wont work. by glomph · 2001-12-01 06:19 · Score: 2, Informative

Spammers -have- been doing this for a long time, appending some randomly generated crap characters to the subject line, to avoid hash-recognition.
X-YahooFilteredBulk by Malc · 2001-12-01 06:24 · Score: 4, Informative

I noticed that a lot of spam coming through my Yahoo account had been tagged with the header "X-YahooFilteredBulk". I added this to my Exim system filter and I've gone from 20+ spams a day in my inbox to 2 in a week. Thank you Yahoo!

Unfortunately, a lot of anti-spam measures (including Exim 3's system filters) only take place after a message has been accepted for delivery. For me, this results in a lot of bounce messages frozen in the queue because they cannot be returned (Hotmail mailbox full, etc.). I've switched on features like verifying the sender and the headers, but this doesn't catch them all, and in some cases might even stop some legitimate mail (one of my mailing lists uses incorrect syntax for the "RCPT TO:").

More effective anti-spam systems need to filter before the message has been accepted. If you wait until then, it is already too late and it is on your system. Refusing to accept delivery is much more effective IMHO, and it forces the MTAs further up the chain to deal with it. They shouldn't have accepted it in the first place! When you get spam, return 550 (or whatever the code is) and let the SMTP client deal with it. In an ideal world, every provider (ISP, or free service like Yahoo) would implement stricter MTAs. If the spam rejection can be pushed far enough up the chain, life will be easier for everyone.

BTW, according to Philip Hazel (in a reply to a question I posed on the Exim mailing list), Exim 4 will offer much more functionality along these lines, including the invocation of C functions after the DATA phase of the SMTP input. I guess this would be the spot to plug in Vipul's Razor, although I don't know what kind of performance hit that would lead to. Mr. Hazel also pointed out that some stupid clients contravene the RFC and will keep trying to deliver a message if they receive a 5xx after the DATA phase... oh well: they'll be using my bandwidth, but they won't be putting any crap on my server.
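Where exactly such a check sits is MTA-specific; the post mentions Exim 4's hook after DATA. The decision itself can be sketched independently of any MTA. The `data_phase_reply` function below is hypothetical. Given the raw message received during DATA, it returns the SMTP reply line to send. It rejects with a 550 if the message carries the X-YahooFilteredBulk header or if the SHA-1 of its plain-text body is in a set of known spam digests.

```python
import hashlib
from email import message_from_bytes
from email.policy import default


def data_phase_reply(raw: bytes, known_spam: set[str]) -> str:
    """Hypothetical post-DATA hook: return the SMTP reply line to send."""
    msg = message_from_bytes(raw, policy=default)
    # Reject mail that an upstream provider already tagged as bulk.
    if msg["X-YahooFilteredBulk"] is not None:
        return "550 Message rejected as bulk mail"
    # Otherwise compare the plain-text body's digest with known spam signatures.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    if hashlib.sha1(text.encode("utf-8")).hexdigest() in known_spam:
        return "550 Message matches a known spam signature"
    return "250 OK"
```

Rejecting at this point leaves the spam on the sending MTA's side rather than in the local bounce queue, which is the behaviour the post argues for.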
Re:Great use of p2p by Sarcasmooo! · 2001-12-01 06:36 · Score: 5, Informative

Just because most people on a P2P network use it for piracy, it doesn't become a pirate app. I can, and have, used programs that are under attack by the RIAA to download speeches, text documents, etc. At the early point of the 2000 Nader campaign, when he couldn't get 30 seconds of time on M$NBC (much less a place in the debates later on), I used Napster and Scour to find speeches he'd given. And when the Department of Commerce kicked off its 'Safe Harbor' privacy program by failing to put the confidential information provided by the companies involved on a secure site, I downloaded the pages in a zip file despite the site being closed for a fix. Using programs like Scour, I found reading material on Scientology, COINTELPRO, and more, all the way up until the day that lawsuits shut them down.
Re:Great use of p2p -- Wont work. by Idolatre · 2001-12-01 06:47 · Score: 2, Informative

It will, however, require them to send each specific message separately rather than sending large CCs or using some sort of relay. That alone is a big step, since right now most spammers can get away with sending a single email message and relying on an open relay to retransmit it to a larger group.

Most spam I get has my real name somewhere in the body of the message, so it doesn't seem like a problem for spammers :(
Re:One way around potential abuse. by MindStalker · 2001-12-01 06:53 · Score: 3, Informative

Why bother? A hash is only going to affect a very specific mail. How often do you get a mail that isn't spam, yet is identical to the one many other people receive? Listservs might be a problem, but I'm sure you could filter for each of your subscribed servs so that they don't get deleted.
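One way to keep subscribed lists out of a signature filter is to check an allowlist before consulting the catalogue. This sketch assumes the lists set a `List-Id` header (RFC 2919), which many list managers do but not all. The allowlist entries and the `should_check_signature` name are hypothetical.

```python
from email import message_from_string
from email.policy import default

# Hypothetical List-Id values for lists you subscribe to.
SUBSCRIBED_LISTS = {"linux-kernel.example.org", "razor-users.example.org"}


def should_check_signature(raw: str) -> bool:
    """Return False (skip the spam-signature lookup) for mail from subscribed lists."""
    msg = message_from_string(raw, policy=default)
    list_id = msg.get("List-Id")
    if list_id is None:
        return True
    # A List-Id typically looks like: Description <list.id.example.org>
    value = str(list_id)
    if "<" in value and ">" in value:
        value = value[value.index("<") + 1 : value.index(">")]
    return value.strip().lower() not in SUBSCRIBED_LISTS
```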
Foreign spam removal by wideangle · 2001-12-01 07:23 · Score: 5, Informative

For the many /.ers who:
a. Use Outlook secretly
b. Receive loads of foreign spam
c. Don't know any foreign languages
d. Don't have any foreign friends
e. Don't have any friends

This Outlook rule is for you!
Apply this rule after the message arrives, with Ô or ¾ or Ç or É or ½ or Í or ò or Ë or ® or Ä or ã or Ï or Ö in the subject or body, delete it and stop processing more rules.
This blocks 99% of foreign spam. Sue Mosher wrote about other effective methods for killing spam in Outlook. Finally, before you reply saying "You dummy, that filter works in any client!" -- You're right.
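Here is the same rule as a Python check, for filtering outside Outlook. The marker set is copied from the rule above. As an assumption about why the rule works (the post does not say): these characters typically appear when mail in an East Asian encoding such as GB2312 is displayed as Latin-1. The rule also flags ordinary French or German mail containing É or Ö.

```python
# Marker characters from the Outlook rule above.
FOREIGN_MARKERS = frozenset("Ô¾ÇÉ½ÍòË®ÄãÏÖ")


def is_foreign_spam(subject: str, body: str) -> bool:
    """Mirror of the Outlook rule: any marker character in the subject or body."""
    return any(ch in FOREIGN_MARKERS for ch in subject + body)


# GB2312-encoded Chinese text viewed as Latin-1 produces such characters.
garbled = "你好".encode("gb2312").decode("latin-1")
print(is_foreign_spam("", garbled))  # True
```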
Similar to DCC by bedessen · 2001-12-01 08:56 · Score: 2, Informative

See also DCC, the distributed checksum clearinghouse. It uses a fuzzy hash so that bulk emails with minor differences are caught. I think the details differ a lot but the idea is more or less the same.
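DCC's own checksums are not reproduced here. The sketch below swaps in MinHash over word shingles as an illustration of the general fuzzy-matching idea. MinHash estimates how much two messages overlap, so copies that differ in a name or a few junk words still score as near-duplicates. The shingle size of 3 words and the 64 hash functions are arbitrary choices for the example.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """All runs of k consecutive words (lowercased) in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash(text: str, num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, the minimum hash value over the text's shingles."""
    result = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        result.append(min(
            int.from_bytes(hashlib.sha1(salt + s.encode("utf-8")).digest()[:8], "big")
            for s in shingles(text)
        ))
    return result


def similarity(a: str, b: str) -> float:
    """Estimated Jaccard similarity of the two messages' shingle sets."""
    ma, mb = minhash(a), minhash(b)
    return sum(x == y for x, y in zip(ma, mb)) / len(ma)
```

An exact SHA-1 treats any one-character change as a completely different message. Here, changing a name and appending a junk word leaves most shingles, and therefore most minimums, unchanged.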
The death of SpamCop by Animats · 2001-12-01 09:01 · Score: 3, Informative

I use SpamCop to filter the mail for four domains. SpamCop used to be quite effective, because it used a challenge/response system, sending new mail sources an autoreply E-mail with a URL that had to be visited before the mail was forwarded. While that's a pain for the sender, it's been 100% effective in stopping spam.
Recently, though, SpamCop switched to a heuristic spam-filter, which is quite leaky. Not only does spam get through, messages from well-known viruses come through. It stops maybe half the spam now.
So SpamCop is now no more effective than typical procmail filters. So there's no point in paying for SpamCop service any more.
Anyone know of a good challenge/response alternative to SpamCop?
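The challenge/response scheme described above works in three steps. Mail from an unknown sender is held. The sender gets a URL containing a token. Visiting that URL releases the mail and whitelists the sender. The sketch below is hypothetical and is not SpamCop's implementation. The HMAC-based token, the example URL and all names are assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Optional
from urllib.parse import urlencode

SECRET = secrets.token_bytes(32)  # server-side key used to sign challenge tokens


class ChallengeResponseFilter:
    """Hypothetical challenge/response filter."""

    def __init__(self, base_url: str = "https://mail.example.com/confirm") -> None:
        self.base_url = base_url
        self.whitelist: set[str] = set()
        self.held: dict[str, list[str]] = {}  # sender -> messages awaiting confirmation
        self.delivered: list[str] = []

    def _token(self, sender: str) -> str:
        # Deterministic per-sender token, so no per-challenge state needs storing.
        return hmac.new(SECRET, sender.lower().encode("utf-8"), hashlib.sha256).hexdigest()[:32]

    def receive(self, sender: str, message: str) -> Optional[str]:
        """Deliver mail from known senders; otherwise hold it and return the challenge URL."""
        key = sender.lower()
        if key in self.whitelist:
            self.delivered.append(message)
            return None
        self.held.setdefault(key, []).append(message)
        query = urlencode({"from": sender, "token": self._token(sender)})
        return f"{self.base_url}?{query}"

    def confirm(self, sender: str, token: str) -> bool:
        """Called when the URL is visited: release held mail and whitelist the sender."""
        if not hmac.compare_digest(token, self._token(sender)):
            return False
        key = sender.lower()
        self.whitelist.add(key)
        self.delivered.extend(self.held.pop(key, []))
        return True
```

As the post says, this is a pain for the sender. But bulk senders rarely visit the confirmation URL, which is where the high catch rate comes from.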
there are some scripts by 4444444 · 2001-12-01 09:05 · Score: 3, Informative

you can find some scripts here

http://www.lenny.com/spam

--

http://Lenny.com
4 great justice!
Answers to some questions raised on slashdot. by vipul_ved_prakash · 2001-12-01 09:48 · Score: 5, Informative

Hi,
Some of you point out that Razor's use of SHA-1 signatures can be defeated by introducing randomness in the message. This is true; SHA-1 will eventually be phased out and replaced by a fuzzy hashing mechanism like nilsimsa in the future. [http://lexx.shinn.net/cmeclax/nilsimsa.html] [http://www.geocrawler.com/archives/3/2539/2001/7/0/6173567/] The protocol is structured to allow hashing algorithms to be changed seamlessly, without breaking the existing system.

Regarding the possibility of poisoning the database, we are working on a reputation system that will assign credit to honest reporters. Once we have a critical mass of users, it would be hard for dishonest reporters to even join the reporting network, much less be able to mount a DOS attack.

Some of these issues have been discussed on the razor-users mailing list. The list archives are located at [http://www.geocrawler.com/archives/3/2539/2001/]

best, vipul.
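The reputation system is only outlined above. The sketch below shows one hypothetical way it could work. Each reporter has a trust score, and a signature counts as spam once the combined trust of its reporters reaches a threshold. Reporters gain trust when their reports are confirmed and lose it when a report turns out to be legitimate mail. All names, starting values and the threshold are assumptions, not Razor's design.

```python
from collections import defaultdict


class ReputationCatalogue:
    """Hypothetical trust-weighted catalogue."""

    def __init__(self, threshold: float = 1.0, initial_trust: float = 0.25) -> None:
        self.threshold = threshold
        self.trust: defaultdict[str, float] = defaultdict(lambda: initial_trust)
        self.reporters: defaultdict[str, set[str]] = defaultdict(set)

    def report(self, reporter: str, signature: str) -> None:
        self.reporters[signature].add(reporter)

    def is_spam(self, signature: str) -> bool:
        # A signature counts only once enough combined trust stands behind it.
        weight = sum(self.trust[r] for r in self.reporters.get(signature, ()))
        return weight >= self.threshold

    def confirm(self, signature: str) -> None:
        # Confirmed spam: credit everyone who reported it (capped at 1.0).
        for r in self.reporters.get(signature, ()):
            self.trust[r] = min(1.0, self.trust[r] + 0.25)

    def revoke(self, signature: str) -> None:
        # Turned out to be legitimate mail: drop the entry and halve its reporters' trust.
        for r in self.reporters.pop(signature, set()):
            self.trust[r] = self.trust[r] / 2
```

In this toy version, one dishonest reporter cannot push a signature past the threshold alone. A flood of brand-new identities still could, so the scheme would need to be paired with a network that is hard to join, as the post describes.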
Re:spammer said stopping spam in un-american. by zulux · 2001-12-01 09:51 · Score: 3, Informative

Watch out! In some cases an 888 or 800 number can act like a 900 number - It can cost you money!

http://www.bbbsouthland.org/topic110.html
for more information.

--
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
I used SpamBouncer for a year by Anonymous Coward · 2001-12-01 15:09 · Score: 1, Informative

and it was good, but I don't know anything about procmail, so adding my own rules was a pain.
I use Mail::Audit and Mail::SpamAssassin in a Perl filter script now. Works great and I can add custom rules easily enough.
See link for details.