Distributed Checksum Clearinghouse vs Spam

← Back to Stories (view on slashdot.org)

Distributed Checksum Clearinghouse vs Spam

Posted by ryuzaki0 on Monday July 30, 2001 @03:43AM from the something-to-think-about dept.

AllSpammedOut writes: "Spam could be more easily detected if everyone were to compare the mail messages they received. Using the Distributed Checksum Clearinghouse, MTAs can report the checksums for all messages they receive and be notified when a checksum has already been reported by many other systems." Obviously there are issues with something like this (especially mailing lists, and worms that do attachments). I suspect spammers would just include a counter to break checksums tho."

14 of 216 comments (clear)

Min score:

Reason:

Sort:

Relevant but somewhat off-topic question by Have+Blue · 2001-07-30 00:01 · Score: 4

Why do open relays exist? Is there some beneficial use for them that I'm not aware of? Is this a relay's default state and the sysadmin is too busy or dumb to lock it down? Why doesn't everyone just secure their mail servers and cut off spam before it gets out?
I can't see this working by x+mani+x · 2001-07-30 00:03 · Score: 5

Checksums do not change gracefully given different inputs. As in, if there's the slightest change in a spam email, let's say the date and sendto in the email header change, the entire checksum will appear completely different. Therefore the checksums will only apply to specific spam messages, and not entire classes of similar spam emails (this would be the desirable solution). And most spam mails these days are smart enough to put your name or something in the email subject and body.

A more robust method of spam detection, IMHO, would be to develop an algorithm that would take emails, and encode them in a way that they could be input to a neural network. the output of the network would be 0=not spam/1=spam ... there's definately enough examples out there for it to learn from. The hardest part, as usual, would be to find a way to encode the emails. So let's say you receive an email. Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder.

If anyone with some machine learning experience wants to work on a project like this with me, send me an email!
1. Re:I can't see this working by 11223 · 2001-07-30 00:24 · Score: 4
  
  A neural net anecdote from a teacher of mine:
  A few years ago, during the big push for a "smart army", millions of dollars were poared into having individual tanks recognize enemy tanks on the battlefield. Well, it turns out they did it with a neural network, and after quite a bit of training they got it to reliably recognize enemy tanks as such.
  Then, the eventual day when the general shows up arrived, and they had to give the demo. As you can probably predict, it crashed and burned. Why? Well, the system was trained on bright, sunny days in the middle of the desert (real sun!), and the demo was on the first overcast day in a year, and the neural net had trained itself to recognize the *shadow* of a tank, not the tank itself.
  Caveat neural-net-user.
Checksums? by Matt2000 · 2001-07-29 23:51 · Score: 4

This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.

However, a number the represented how closely related an incoming email and a known spam message would be a useful metric. Then you could have fuzzy filters that determined how close you would want to be before outright rejecting a similar message, or maybe just relocating it to a seperate inbox.
--
- Mod Parent Up by CmdrTaco (Score: 2) 02:41 PM April
Re:Issues... by Flounder · 2001-07-29 23:52 · Score: 4

I submitted a story about building a steam-powered microprocessor with RAM made out of banana peels, and that didn't get posted--why this?
Because everybody knows that Orange rinds offer better memory density than banana peels. And orange peels are more resistant to the excess steam from the CPU. Banana peels would just disintegrate with even a minimal amount of overclocking.

--
No boom today. Boom tomorrow. There's always a boom tomorrow. - Cmdr. Susan Ivanova
Hashed bigrams count by jmv · 2001-07-30 00:34 · Score: 5

One way that would be much more effective is to take pair of words (eg. in this sentence: "One way", "way that", "that would", ...) and apply a hash function that returns a number between 0 and N (N usually between 1000 and 100000). You then compare the histogram (how many of each hash value) of a mail to the database. If histograms are too close to a spam message, you delete it.

--
Opus: the Swiss army knife of audio codec
The checksum is fuzzy by crucini · 2001-07-30 05:16 · Score: 5

Many posters seem to be naively assuming that dcc uses a checksum such as md5 which would change radically for a minor change in input. Dcc does in fact use md5 as a component but the actual checksum is adapted to the requirement.
Download the source tarball, uncompress, untar and read /dcclib/ckfuz1.c. This checksum is clearly designed to be resilient to minor changes.
On a deeper note, it's sad that so many Slashdot readers, including apparently CmdrTaco, underestimate others so severely. Do you really thing someone put in the effort to make something like dcc and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found:
Because simplistic checksums of spam would not be very effective, the main DCC checksum is fuzzy and ignores various aspects of messages. The fuzzy checksum will need to be changed as spam evolves.
Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.
Re:Cell phones are great by zulux · 2001-07-30 00:07 · Score: 5

Just leave a message, and tell them your phone number is one of those Bahama-$20-a-second numbers. Wheee!

Check out http://www.scambusters.org/809Scam.html if you don't know what I'm talking about.

--
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
Just Because they would counter it. by BiggestPOS · 2001-07-29 23:51 · Score: 5

Doesn't mean we shouldn't do it. Its an arms race, with each side consistently and constantly upping the ante. We really need to send the spammers a message that we DO still care.
One thing bothers me though, as I was clearing out a large 'stuck' email for one of our dial-up customers the other day, I happened to casually mention "Wow, you sure do get alot of spam!" to which they replied "Whats that?" "You know, junk email" "Junk e-mail? I read it all" People like that are why our boxes receive such garbage. You fire enough bullets and SOMEone is going to die.

--
What, me worry?
Duh... by ErikTheRed · 2001-07-30 01:57 · Score: 5

All you have to do is filter on the words "This e-mail is not spam!"

Leave it to the Slashdot crowd to make things a million times more comples than they need to be...

--

Help save the critically endangered Blue Iguana
Spam Hunters by Alien54 · 2001-07-29 23:56 · Score: 4

I still think that we have to make it profitable for folks to go after spammers.
Spammers need to be licensed (preferably with an ear tag, but i'll consider substitutes) and fully identified. all spam needs to have a spam license number in the header someplace.
Fees can then be and need to be collected by your favorite government agencies (I think the IRS, the NSA, and BATF will do for now). ISPs and users need to be able to bill spammers some amount for the spam processed and received. Fees need to be large enough that it is worthwile to go after them, and then we can have bounty hunters. Fees can be high enough to reduce the cost of access. Penalities for abuse can be heavy (20 years in jail, for example)
Then we can have spam hunters who will go out and collect from the spammers for you in exchange for a percentage.

--
"It is a greater offense to steal men's labor, than their clothes"
Re:Laws about PRON by atheos · 2001-07-29 23:59 · Score: 4

Ya, this same argument is used when discussing censoring the entire internet. Ever though about running for office? Spammers aren't the only ones I blame. I run a small mail server (less than 1k messages a day), and every night I e-mail ISP's informing them of open relays, and dialup customers abusing their systems. I have received a few auto-replies, and not ONE god damn response from someone who cares. I'd like to assume that most people are way too busy fixing the problem, but the same culprits keep showing up in my mail log. When discussing legal action against spammers, I think the same legal repercussions should be directed to ISP's who don't know/care how to run a mail server.
Re:What's the big deal? by DeadMeat+(TM) · 2001-07-30 01:12 · Score: 4

The big difference is who pays for it.
When you get a telemarketing call, they pay their long distance company for the right to call you. It doesn't cost you a penny to pick up the phone. When you get junk (snail) mail, the marketer had to pay the postal service to send mail out to each and every address. Not only does it not cost you anything, but in the case of the U.S. Postal Service these bulk rates actually lower the cost of you sending mail, since they use it subsidize part of the cost of personal mail.
Bulk E-mail on the other hand is a different thing. First off, if you're not on a land-based U.S. phone line, odds are you're paying per-minute for your connection -- which sucks since you have to pay to get spam dumped in your E-mail program's inbox.
Even if you have a flat rate connection, you're still inevitably paying for spam mail, whether or not it's directly. Bandwidth isn't free -- take a 5k spam mail message and multiply it by 10 million messages, both of which are probably conversative estimates, and you're talking about 50 megabytes each time a spam is sent out. If you get 3 spam messages a day, that's 150 megabytes of bandwidth just for the messages that you received -- which is only a tiny fraction of all the spam sent out in a day. Multiply 50 megabytes by the countless number of messages, and that's a lot of bandwidth going up in smoke daily.
Guess who's paying for it? Hint: with spammers usually using stolen ISP accounts and fake credit card numbers, probably not them. Another hint: when ISPs' bandwidth costs go up, they pass it on to the users.
Not to mention the fact that spammers shoving millions of messages through creaky mail servers can take them down. So even excluding the monetary damage, what's it worth if a piece of E-mail sent to/from you was on that server when it went down in flames? Your message may be delayed, or it may never show up at all.
How I filter spam by koreth · 2001-07-30 05:04 · Score: 4
I do a few things that are extremely effective in filtering out spam. I have procmail rules to do the following:
- Mail that doesn't list one of my addresses, or the address of a mailing list I know I'm on, in the To: or Cc: lines gets filtered. This alone catches a solid 85-90% of my spam flow, though it seems to be getting less effective as time goes on.
- Mail that's from a free E-mail service (Hotmail, Angelfire, etc.) gets filtered.
- Mail that contains certain keyphrases (e.g. "free" in all caps, or "this is not spam" or "S.1618") gets filtered.
- Mail that has passed through a .cn or .tw or .kr host gets filtered. Those countries seem to have an abundance of open relays. At some point I hope to change this to check against ORBS/DUL instead.
Now, the interesting thing is what I do once I've decided to filter the mail. Since my rules catch legitimate mail, I don't just throw it away. I wrote a small collection of Perl scripts (which I'll release to the world someday soon, but they need documentation) that maintain a whitelist of sender addresses.
If a filtered message is from an address that's marked valid, it's delivered. If it's from an address that's marked invalid, it's discarded. If it's from an unknown address, the message is put in a holding area and an autoreply is sent back to the sender from a magic address asking them to reply in order to validate themselves.
The magic address is unique per filtered message -- it uses qmail's address extension mechanism -- and mail to the magic address never gets delivered to me, so I don't care if it gets added to spam lists. The Perl script behind the magic address does a quick check to make sure it's not processing a bounce, then marks the sender of the original message as valid and delivers the original message (or messages if more than one arrived while awaiting validation).
Held messages are cleaned out by a cron job when they get too old.
This is sort of similar in concept to the password mechanism of SpamBouncer or (a closer cousin) SpamCop's whitelist feature, but it doesn't require senders to retransmit their messages, which I always thought was pretty annoying to ask people to do since not everyone saves their outgoing mail. Granted, asking them to do anything is kind of annoying, but at least this is less so since they can just hit "reply" and "send".
This setup is cool because it allows friends to Bcc me on stuff without my "I must be listed as a recipient" rule trashing their messages, even if they've just switched E-mail addresses. It is admittedly based on the assumption that spammers don't read replies to their mail and/or wouldn't go to the effort of unlocking themselves; I have yet to see a spammer do that, and given the economics of spamming I think that'll be a safe assumption for the foreseeable future, unless this approach gets so popular that spammers start writing automated unlock bots!