Distributed Checksum Clearinghouse vs Spam

Posted by ryuzaki0 on Monday July 30, 2001 @03:43AM from the something-to-think-about dept.

AllSpammedOut writes: "Spam could be more easily detected if everyone were to compare the mail messages they received. Using the Distributed Checksum Clearinghouse, MTAs can report the checksums for all messages they receive and be notified when a checksum has already been reported by many other systems." Obviously there are issues with something like this (especially mailing lists, and worms that send attachments). I suspect spammers would just include a counter to break checksums tho.
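The report-and-count idea in the summary can be illustrated with a toy, in-memory stand-in for a shared server. This is only a sketch under stated assumptions: the `Clearinghouse` class, its method names, and the `bulk_threshold` value are hypothetical, and a plain MD5 of the body is used for simplicity. It is not the DCC protocol or its checksum.

```python
import hashlib
from collections import Counter


class Clearinghouse:
    """Toy in-memory stand-in for a shared checksum server (hypothetical)."""

    def __init__(self, bulk_threshold=3):
        # Assumed cutoff: a body reported this many times counts as bulk.
        self.bulk_threshold = bulk_threshold
        self.reports = Counter()

    @staticmethod
    def checksum(body):
        # Plain MD5 of the body; a real system would need something fuzzier.
        return hashlib.md5(body.encode("utf-8")).hexdigest()

    def report(self, body):
        """Record one sighting of a message body and return the total count."""
        digest = self.checksum(body)
        self.reports[digest] += 1
        return self.reports[digest]

    def is_bulk(self, body):
        """True once the body has been reported at least bulk_threshold times."""
        return self.reports[self.checksum(body)] >= self.bulk_threshold


server = Clearinghouse(bulk_threshold=3)
for _ in range(3):
    # Three different mail servers report the same body.
    server.report("Make money fast! Click here.")

print(server.is_bulk("Make money fast! Click here."))
print(server.is_bulk("Lunch at noon?"))
```

A mailing list post would trip the same threshold as spam, which is the objection raised in the summary; any real deployment would need a whitelist for legitimate bulk mail.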

I can't see this working by x+mani+x · 2001-07-30 00:03 · Score: 5

Checksums do not change gracefully given different inputs. As in, if there's the slightest change in a spam email, let's say the date and sendto in the email header change, the entire checksum will appear completely different. Therefore the checksums will only apply to specific spam messages, and not entire classes of similar spam emails (this would be the desirable solution). And most spam mails these days are smart enough to put your name or something in the email subject and body.
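To make the point concrete, here is a small sketch using MD5 from Python's standard `hashlib`. The two sample messages and the helper names are made up for illustration; they differ only in a trailing counter.

```python
import hashlib


def md5_hex(message):
    """Return the MD5 digest of a message as 32 hex digits."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()


def differing_digits(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two hypothetical spam messages that differ in a single character.
spam_a = "Subject: Make money fast\n\nDear Alice, click here. ID 0001"
spam_b = "Subject: Make money fast\n\nDear Alice, click here. ID 0002"

digest_a = md5_hex(spam_a)
digest_b = md5_hex(spam_b)

print(digest_a)
print(digest_b)
print(differing_digits(digest_a, digest_b), "of", len(digest_a), "hex digits differ")
```

A one-character change is enough to produce an unrelated digest, so an exact-match lookup on a cryptographic hash only catches byte-identical copies.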

A more robust method of spam detection, IMHO, would be to develop an algorithm that would take emails, and encode them in a way that they could be input to a neural network. The output of the network would be 0=not spam/1=spam ... there's definitely enough examples out there for it to learn from. The hardest part, as usual, would be to find a way to encode the emails. So let's say you receive an email. Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder.
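As a rough sketch of what that could look like, assume the encoding is a fixed-length hashed bag of words and the "network" is a single logistic unit, the smallest possible case. The vector length, learning rate, function names, and the four training messages are all assumptions chosen for illustration, not a tested design.

```python
import math
import zlib

N_FEATURES = 256  # assumed length of the encoded vector


def encode(text):
    """Encode an email as a fixed-length vector of hashed word counts."""
    vec = [0.0] * N_FEATURES
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % N_FEATURES] += 1.0
    return vec


def predict(weights, bias, vec):
    """Return a score between 0 (not spam) and 1 (spam)."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit the single unit with plain stochastic gradient descent."""
    weights = [0.0] * N_FEATURES
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            vec = encode(text)
            error = predict(weights, bias, vec) - label
            bias -= lr * error
            weights = [w - lr * error * x for w, x in zip(weights, vec)]
    return weights, bias


# Tiny made-up training set: label 1 = spam, 0 = not spam.
examples = [
    ("buy cheap pills now", 1),
    ("cheap pills cheap deals", 1),
    ("meeting notes attached", 0),
    ("lunch at noon tomorrow", 0),
]
weights, bias = train(examples)

# The client would round this score to 0 or 1 and file the mail accordingly.
print(predict(weights, bias, encode("cheap pills for you")))
```

In the client/server split described above, `encode` would run on the client and `predict` on the server holding the trained weights.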

If anyone with some machine learning experience wants to work on a project like this with me, send me an email!
Hashed bigrams count by jmv · 2001-07-30 00:34 · Score: 5

One way that would be much more effective is to take pairs of words (e.g. in this sentence: "One way", "way that", "that would", ...) and apply a hash function that returns a number between 0 and N (N usually between 1000 and 100000). You then compare the histogram (how many of each hash value) of a mail to the database. If the histogram is too close to that of a known spam message, you delete it.
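A minimal sketch of that scheme, under a few assumptions: CRC32 stands in for the hash function, cosine similarity stands in for "too close", words are lower-cased first, and the 0.8 threshold is an arbitrary illustrative value. All function names are hypothetical.

```python
import math
import zlib
from collections import Counter

N_BUCKETS = 10000  # N in the description above


def bigrams(text):
    """Return consecutive word pairs, lower-cased."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def hashed_histogram(text, n_buckets=N_BUCKETS):
    """Count how many bigrams fall into each hash bucket."""
    hist = Counter()
    for first, second in bigrams(text):
        key = (first + " " + second).encode("utf-8")
        hist[zlib.crc32(key) % n_buckets] += 1
    return hist


def similarity(a, b):
    """Cosine similarity between two bucket histograms (0.0 to 1.0)."""
    dot = sum(count * b[bucket] for bucket, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_like_known_spam(text, database, threshold=0.8):
    """True if the mail's histogram is close to any histogram in the database."""
    hist = hashed_histogram(text)
    return any(similarity(hist, known) >= threshold for known in database)


known = "buy cheap pills now and save big money today"
variant = "Dear Bob buy cheap pills now and save big money today"
other = "the meeting is moved to three on friday"

database = [hashed_histogram(known)]
print(similarity(hashed_histogram(variant), database[0]))
print(similarity(hashed_histogram(other), database[0]))
```

Because a personalized greeting only adds a couple of new bigrams, the variant's histogram stays close to the original, which is what an exact checksum cannot offer.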

--
Opus: the Swiss army knife of audio codec
The checksum is fuzzy by crucini · 2001-07-30 05:16 · Score: 5

Many posters seem to be naively assuming that dcc uses a checksum such as md5 which would change radically for a minor change in input. Dcc does in fact use md5 as a component but the actual checksum is adapted to the requirement.
Download the source tarball, uncompress, untar and read /dcclib/ckfuz1.c. This checksum is clearly designed to be resilient to minor changes.
On a deeper note, it's sad that so many Slashdot readers, including apparently CmdrTaco, underestimate others so severely. Do you really think someone put in the effort to make something like dcc and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found:
"Because simplistic checksums of spam would not be very effective, the main DCC checksum is fuzzy and ignores various aspects of messages. The fuzzy checksum will need to be changed as spam evolves."
Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.
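To illustrate the general idea of a checksum that "ignores various aspects of messages", here is a toy sketch that normalizes the text before hashing it. This is emphatically not the algorithm in ckfuz1.c; the normalization rules (lower-casing, dropping everything except letters) and the function names are assumptions made up for the example.

```python
import hashlib
import re


def normalize(text):
    """Lower-case the text and collapse every run of non-letters to one space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def fuzzy_checksum(text):
    """MD5 of the normalized text, so digits, case, and punctuation are ignored."""
    return hashlib.md5(normalize(text).encode("utf-8")).hexdigest()


def plain_checksum(text):
    """MD5 of the raw text, for comparison."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two hypothetical copies of one spam, varied with a counter and punctuation.
copy_a = "Buy CHEAP pills!!! id 12345"
copy_b = "buy cheap   pills, id 99871"

print(plain_checksum(copy_a) == plain_checksum(copy_b))
print(fuzzy_checksum(copy_a) == fuzzy_checksum(copy_b))
```

Both copies normalize to "buy cheap pills id", so the appended counter no longer changes the result. A spammer can still defeat rules this simple by inserting random words, which is presumably why the quoted text says the fuzzy checksum will need to change as spam evolves.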
Re:Cell phones are great by zulux · 2001-07-30 00:07 · Score: 5

Just leave a message, and tell them your phone number is one of those Bahama-$20-a-second numbers. Wheee!

Check out http://www.scambusters.org/809Scam.html if you don't know what I'm talking about.

--
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
Just Because they would counter it. by BiggestPOS · 2001-07-29 23:51 · Score: 5

Doesn't mean we shouldn't do it. It's an arms race, with each side consistently and constantly upping the ante. We really need to send the spammers a message that we DO still care.
One thing bothers me though: as I was clearing out a large 'stuck' email for one of our dial-up customers the other day, I happened to casually mention "Wow, you sure do get a lot of spam!" to which they replied "What's that?" "You know, junk email." "Junk e-mail? I read it all." People like that are why our boxes receive such garbage. You fire enough bullets and SOMEone is going to die.

--
What, me worry?
Duh... by ErikTheRed · 2001-07-30 01:57 · Score: 5

All you have to do is filter on the words "This e-mail is not spam!"

Leave it to the Slashdot crowd to make things a million times more complex than they need to be...

--

Help save the critically endangered Blue Iguana