Distributed Checksum Clearinghouse vs Spam

Relevant but somewhat off-topic question by Have+Blue · 2001-07-30 00:01 · Score: 4

Why do open relays exist? Is there some beneficial use for them that I'm not aware of? Is this a relay's default state and the sysadmin is too busy or dumb to lock it down? Why doesn't everyone just secure their mail servers and cut off spam before it gets out?

Re:Relevant but somewhat off-topic question by Skapare · 2001-07-30 00:54 · Score: 3
A network of authenticated mail servers could be very useful. But the effectiveness would be limited unless entry to the network requires agreement to terms to apply strong enforcement against spam, such as:
- Limit each dynamic IP host to not more than 1 email message every 2 minutes.
- Require dedicated network owners to agree to the same anti-spam agreement in writing to be allowed access to port 25 outbound or to access unthrottled mail servers.
- Require legitimate bulk mailers to agree to certain terms such as using only opt-in lists even though the law otherwise permits them to use an opt-out list.
- Must provide a contact address and/or telephone number for reporting abuse. Abuse reports from the general public must have a human response within 24 hours. Abuse reports from a member administrator/manager/engineer must have a human response within 2 hours.
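The first rule in the list above (one message per dynamic IP host every 2 minutes) could be sketched roughly as follows. This is only an illustration: the class name `PerHostThrottle`, the in-memory dictionary and the injectable clock are assumptions, not anything taken from a real MTA.

```python
import time


class PerHostThrottle:
    """Allow at most one message per host every `interval` seconds."""

    def __init__(self, interval=120.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock          # injectable so the limiter can be tested
        self.last_accepted = {}     # host IP -> time of last accepted message

    def allow(self, host_ip):
        now = self.clock()
        last = self.last_accepted.get(host_ip)
        if last is not None and now - last < self.interval:
            return False            # too soon: defer or reject this message
        self.last_accepted[host_ip] = now
        return True
```

A mail server would call `allow()` once per incoming message from a dynamic address and answer with a temporary failure whenever it returns False.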
--
now we need to go OSS in diesel cars
Re:Relevant but somewhat off-topic question by gorilla · 2001-07-30 00:06 · Score: 3

They exist because up until the early 90's, almost all SMTP servers were open relays. It wasn't until spam started that the MTA authors started putting in anti-relay code, and people started installing the new versions.
Unfortunately, there are always systems where the sysadmin hasn't updated for years, because it's not causing him any problems.

Re:Just use mail filters by Skapare · 2001-07-30 01:11 · Score: 3

Show me one that works on my mail server without overloading it. Mail comes in at a rate of about 20 per second. It will need to check it all. If you think the problem is solved at the client, you misunderstand the problem.

--
now we need to go OSS in diesel cars

I can't see this working by x+mani+x · 2001-07-30 00:03 · Score: 5

Checksums do not change gracefully given different inputs. As in, if there's the slightest change in a spam email, let's say the date and sendto in the email header change, the entire checksum will appear completely different. Therefore the checksums will only apply to specific spam messages, and not entire classes of similar spam emails (this would be the desirable solution). And most spam mails these days are smart enough to put your name or something in the email subject and body.
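For a cryptographic hash such as MD5 this is easy to see for yourself. The sketch below (message texts invented for illustration) hashes two mails that differ only in the recipient's name and counts how many hex digits of the two digests happen to line up.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


a = md5_hex("Dear Alice, make money fast!")
b = md5_hex("Dear Bob, make money fast!")

# Count how many of the 32 hex digits agree position by position.
# For unrelated digests only about 1 in 16 positions match by chance.
matching = sum(1 for x, y in zip(a, b) if x == y)
print(a)
print(b)
print("matching hex digits:", matching, "of", len(a))
```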

A more robust method of spam detection, IMHO, would be to develop an algorithm that would take emails, and encode them in a way that they could be input to a neural network. The output of the network would be 0=not spam/1=spam ... there are definitely enough examples out there for it to learn from. The hardest part, as usual, would be to find a way to encode the emails. So let's say you receive an email. Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder.

If anyone with some machine learning experience wants to work on a project like this with me, send me an email!

Re:I can't see this working by 11223 · 2001-07-30 00:24 · Score: 4

A neural net anecdote from a teacher of mine:
A few years ago, during the big push for a "smart army", millions of dollars were poured into having individual tanks recognize enemy tanks on the battlefield. Well, it turns out they did it with a neural network, and after quite a bit of training they got it to reliably recognize enemy tanks as such.
Then, the eventual day when the general shows up arrived, and they had to give the demo. As you can probably predict, it crashed and burned. Why? Well, the system was trained on bright, sunny days in the middle of the desert (real sun!), and the demo was on the first overcast day in a year, and the neural net had trained itself to recognize the *shadow* of a tank, not the tank itself.
Caveat neural-net-user.
Re:I can't see this working by Sven+Tuerpe · 2001-07-30 00:58 · Score: 3

Checksums do not change gracefully given different inputs.

It depends. If we think of cryptographic hash functions, you are right. They are designed that way in order to avoid collisions and forging of messages that are mapped to a given value by a particular function.

But if we think of error correcting codes, the situation is different. They are designed with the opposite goal in mind -- changing gracefully when certain errors (i.e., small changes for some definition of "small") occur, to allow for reconstruction of the original data.

Usually both the checksum and the corrupted data (or the corrupted data + checksum string, to be precise) are needed in the case of error correcting codes. But perhaps concepts from both -- closely related -- fields could be combined to create something usable for spam detection under hostile spammer conditions?

--
http://erichsieht.wordpress.com/category/english/

Checksums? by Matt2000 · 2001-07-29 23:51 · Score: 4

This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.

However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters that determined how close a message would have to be before rejecting it outright, or maybe just relocating it to a separate inbox.

--

Mod Parent Up by CmdrTaco (Score: 2) 02:41 PM April

Re:Checksums? by friscolr · 2001-07-30 01:15 · Score: 3

However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters
i tried that, had very good success. read more about it at:
http://www.blackant.net/code/oth/random/nlp-spamfilter.php
i collected a sample of 30-plus spam messages as well as 30-plus not spam messages and ran some word and phrase frequency counts on each group, then threw that data into a couple mysql tables. Next i match the phrase and word frequency counts to new mail that arrives, and depending on how closely the new mail matches the known groups, i can tell whether or not the mail is spam.
by tweaking the exact amount needed to be determined as spam or not-spam, i had very, very good success rate - out of 32 messages checked using this method, all were appropriately identified as either spam or not-spam.
I've been meaning to continue with this line of spam detection, increasing the size of the db and testing it on a larger sample of mail (read: all my mail) and then seeing if the results were still as good, but...
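A stripped-down version of this idea might look like the sketch below. It is not the code from the page linked above: it keeps word frequencies in plain dictionaries instead of MySQL tables, ignores phrases, and the scoring rule and threshold are assumptions chosen for brevity.

```python
import re
from collections import Counter


def words(text):
    return re.findall(r"[a-z']+", text.lower())


def train(messages):
    """Relative frequency of each word across one group of messages."""
    counts = Counter()
    for m in messages:
        counts.update(words(m))
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}


def spam_score(text, spam_freq, ham_freq):
    """Positive when the words look more like the spam group."""
    score = 0.0
    for w in words(text):
        score += spam_freq.get(w, 0.0) - ham_freq.get(w, 0.0)
    return score


def is_spam(text, spam_freq, ham_freq, threshold=0.0):
    # The threshold is the knob to tweak, as described in the comment.
    return spam_score(text, spam_freq, ham_freq) > threshold
```

Train one table on the spam sample and one on the not-spam sample, then call `is_spam()` on each new message; raising the threshold trades missed spam for fewer false positives.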
-f

--
-f
www.blackant.net

Better idea for checksum clearinghouse? by Hobart · 2001-07-30 00:41 · Score: 3

Seeing as a key element of spam messages is to get people to visit particular URLs, reply to particular email addresses, or call particular phone numbers, perhaps focusing on algorithms that identify these components and check their hashes against a database would be more effective?
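As a rough sketch of that suggestion: pull the URLs, email addresses and phone numbers out of a message body, normalize them, and compare their hashes with a set of hashes from known spam. The regular expressions are deliberately simple and the function names are made up for this example; a real extractor would need far more care.

```python
import hashlib
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def contact_points(body):
    """Normalized URLs, addresses and phone numbers found in the body."""
    found = set()
    for url in URL_RE.findall(body):
        # Lowercasing the whole URL is lossy (paths can be case-sensitive),
        # but it keeps the sketch short.
        found.add("url:" + url.lower().rstrip(".,;)"))
    for addr in EMAIL_RE.findall(body):
        found.add("mail:" + addr.lower())
    for phone in PHONE_RE.findall(body):
        found.add("tel:" + re.sub(r"\D", "", phone))
    return found


def hashes(body):
    return {hashlib.sha1(p.encode("utf-8")).hexdigest()
            for p in contact_points(body)}


def matches_known_spam(body, known_hashes):
    return not hashes(body).isdisjoint(known_hashes)
```

Because only the contact points are hashed, rewording the rest of the message does not change the result, while changing the URL or phone number does.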

--
o/~ Join us now and share the software ...

Re:Issues... by Flounder · 2001-07-29 23:52 · Score: 4

I submitted a story about building a steam-powered microprocessor with RAM made out of banana peels, and that didn't get posted--why this?

Because everybody knows that Orange rinds offer better memory density than banana peels. And orange peels are more resistant to the excess steam from the CPU. Banana peels would just disintegrate with even a minimal amount of overclocking.

--

No boom today. Boom tomorrow. There's always a boom tomorrow. - Cmdr. Susan Ivanova

Just use mail filters by yellowstone · 2001-07-30 00:11 · Score: 3

I've found that a handful of simple mail filters takes care of much of the spam I receive:

- Junk anything that comes BCC (preceded by a white-list of subscribed mailing lists). This takes care of 70-80% of the spam that comes my way.
- Filter out by keywords in the subject (like "marketing", "webmaster", and "viagra"). This takes care of a good chunk of the rest.
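The two rules above could be expressed along these lines. The addresses are placeholders, and letting whitelisted list mail skip the keyword check is an assumption about how the rules are ordered, not something stated in the comment.

```python
from email import message_from_string
from email.utils import getaddresses

MY_ADDRESSES = {"me@example.org"}            # hypothetical own address
LIST_ADDRESSES = {"dev-list@example.org"}    # hypothetical subscribed lists
SUBJECT_KEYWORDS = ("marketing", "webmaster", "viagra")


def visible_recipients(msg):
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def is_junk(raw_message):
    msg = message_from_string(raw_message)
    rcpts = visible_recipients(msg)
    if rcpts & LIST_ADDRESSES:
        return False                 # whitelisted mailing list comes first
    if not (rcpts & MY_ADDRESSES):
        return True                  # not in To: or Cc:, so it came via Bcc
    subject = (msg.get("Subject") or "").lower()
    return any(k in subject for k in SUBJECT_KEYWORDS)
```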

-- I have no fin no wing no stinger no claw no camouflage I have no more to say...

--
150 Opening BINARY mode data connection for slashdot.sig (129323052 bytes).

Hashed bigrams count by jmv · 2001-07-30 00:34 · Score: 5

One way that would be much more effective is to take pairs of words (e.g. in this sentence: "One way", "way that", "that would", ...) and apply a hash function that returns a number between 0 and N (N usually between 1000 and 100000). You then compare the histogram (how many of each hash value) of a mail to the database. If the histogram is too close to that of a spam message, you delete it.
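A minimal sketch of the hashed-bigram histogram, under a few assumptions: CRC32 stands in for the unspecified hash function (Python's built-in `hash()` of strings varies between runs), N is 10000, and "too close" is measured as the share of bigram counts the two histograms have in common.

```python
import re
import zlib
from collections import Counter

N_BUCKETS = 10000  # the N from the comment, picked from the suggested range


def bigram_histogram(text, n_buckets=N_BUCKETS):
    tokens = re.findall(r"\w+", text.lower())
    hist = Counter()
    for first, second in zip(tokens, tokens[1:]):
        pair = (first + " " + second).encode("utf-8")
        hist[zlib.crc32(pair) % n_buckets] += 1
    return hist


def similarity(h1, h2):
    """Shared bigram count divided by the larger total, in [0, 1]."""
    total = max(sum(h1.values()), sum(h2.values()))
    if total == 0:
        return 0.0
    shared = sum((h1 & h2).values())   # & keeps the minimum count per bucket
    return shared / total
```

A message would then be treated as spam when its similarity to any stored spam histogram exceeds some cutoff; where to put that cutoff is a tuning question this sketch does not answer.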

--
Opus: the Swiss army knife of audio codec

The checksum is fuzzy by crucini · 2001-07-30 05:16 · Score: 5

Many posters seem to be naively assuming that dcc uses a checksum such as md5 which would change radically for a minor change in input. Dcc does in fact use md5 as a component but the actual checksum is adapted to the requirement.
Download the source tarball, uncompress, untar and read /dcclib/ckfuz1.c. This checksum is clearly designed to be resilient to minor changes.
On a deeper note, it's sad that so many Slashdot readers, including apparently CmdrTaco, underestimate others so severely. Do you really think someone put in the effort to make something like dcc and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found:

Because simplistic checksums of spam would not be very effective, the main DCC checksum is fuzzy and ignores various aspects of messages. The fuzzy checksum will need to be changed as spam evolves.

Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.
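To make the general idea concrete, here is a toy "fuzzy" checksum. It is emphatically not the algorithm in dcc's source; it only illustrates the principle of throwing away the parts of a message that spammers vary cheaply (case, digits, punctuation, spacing) before handing the rest to MD5.

```python
import hashlib
import re


def fuzzy_checksum(body):
    # Keep only the letters a-z: case, digits, punctuation and whitespace
    # are all ignored, so varying them leaves the checksum unchanged.
    normalized = re.sub(r"[^a-z]", "", body.lower())
    return hashlib.md5(normalized.encode("ascii")).hexdigest()
```

Inserted punctuation and changing counters no longer alter the result, but appended random words still would, which is one reason a real fuzzy checksum has to keep evolving.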

hmm by Troed · 2001-07-29 23:49 · Score: 3

This system already exists on news-servers and clients, and the spammers have already countered with random data appended to the spam (and random numbers in the subject headers)

So ...

--
it's in my head

Re:hmm by Erasmus+Darwin · 2001-07-29 23:59 · Score: 3

the spammers have already countered with random data appended to the spam (and random numbers in the subject headers)
...and the worst of the bunch -- randomly inserting punctuation in the entire message:
M`A.K,E M:O'N"E,Y F.A`S'T
*shudder* Every now and again, I wish we would have optional accountability in Usenet, similar to how I can set my default read-level on Slashdot high enough that J. Random Anonymous Coward never shows up. Couple that with a clause in the ISPs contract that allows them to assess significant fines against spammers, and we'd be (theoretically) set.
Then I wake up and realize that people'll just steal accounts or even use litigation to block the ISP from cutting them off for spamming. That's when I wish we could just train those kids who want to go on school shooting rampages to just take out spammers instead, killing two birds with one stone.

Re:Cell phones are great by zulux · 2001-07-30 00:07 · Score: 5

Just leave a message, and tell them your phone number is one of those Bahama-$20-a-second numbers. Wheee!

Check out http://www.scambusters.org/809Scam.html if you don't know what I'm talking about.

--

Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.

spambouncer works great for me by misleb · 2001-07-30 02:22 · Score: 3

I am running the Spambouncer procmail filter on my shell/IMAP account. I used to get 10 SPAMS a day. Now I don't get ANY. It's pretty intelligent.

I guess this doesn't solve the problem of server resources getting stolen, but it certainly saves me from having to look at the crap.

-matthew

--
"THERE IS NO JUSTICE, THERE IS ONLY ME." -Death

Just Because they would counter it. by BiggestPOS · 2001-07-29 23:51 · Score: 5

Doesn't mean we shouldn't do it. It's an arms race, with each side consistently and constantly upping the ante. We really need to send the spammers a message that we DO still care.

One thing bothers me though, as I was clearing out a large 'stuck' email for one of our dial-up customers the other day, I happened to casually mention "Wow, you sure do get a lot of spam!" to which they replied "What's that?" "You know, junk email" "Junk e-mail? I read it all" People like that are why our boxes receive such garbage. You fire enough bullets and SOMEone is going to die.

--
What, me worry?

Duh... by ErikTheRed · 2001-07-30 01:57 · Score: 5

All you have to do is filter on the words "This e-mail is not spam!"

Leave it to the Slashdot crowd to make things a million times more complex than they need to be...

--

Help save the critically endangered Blue Iguana

Spam Hunters by Alien54 · 2001-07-29 23:56 · Score: 4

I still think that we have to make it profitable for folks to go after spammers.

Spammers need to be licensed (preferably with an ear tag, but i'll consider substitutes) and fully identified. all spam needs to have a spam license number in the header someplace.

Fees can then be and need to be collected by your favorite government agencies (I think the IRS, the NSA, and BATF will do for now). ISPs and users need to be able to bill spammers some amount for the spam processed and received. Fees need to be large enough that it is worthwhile to go after them, and then we can have bounty hunters. Fees can be high enough to reduce the cost of access. Penalties for abuse can be heavy (20 years in jail, for example)

Then we can have spam hunters who will go out and collect from the spammers for you in exchange for a percentage.

--
"It is a greater offense to steal men's labor, than their clothes"

Re:Laws about PRON by atheos · 2001-07-29 23:59 · Score: 4

Ya, this same argument is used when discussing censoring the entire internet. Ever thought about running for office? Spammers aren't the only ones I blame. I run a small mail server (less than 1k messages a day), and every night I e-mail ISPs informing them of open relays, and dialup customers abusing their systems. I have received a few auto-replies, and not ONE god damn response from someone who cares. I'd like to assume that most people are way too busy fixing the problem, but the same culprits keep showing up in my mail log. When discussing legal action against spammers, I think the same legal repercussions should be directed to ISPs who don't know/care how to run a mail server.

False Positives by Matthias+Wiesmann · 2001-07-30 00:01 · Score: 3

While the system could be broken by using counters, this could be countered by parsing only certain portions of the mail or counting the frequency of certain words. Would work very well on pure text spam, but not on attachment stuff.

What would be funny would be to see the false positives of such a system. Many mails I get from the administration all look the same, I wonder if they would be considered as spam - they are quite similar to spam: useless and too numerous...

Re:What's the big deal? by DeadMeat+(TM) · 2001-07-30 01:12 · Score: 4

The big difference is who pays for it.

When you get a telemarketing call, they pay their long distance company for the right to call you. It doesn't cost you a penny to pick up the phone. When you get junk (snail) mail, the marketer had to pay the postal service to send mail out to each and every address. Not only does it not cost you anything, but in the case of the U.S. Postal Service these bulk rates actually lower the cost of you sending mail, since they use it to subsidize part of the cost of personal mail.

Bulk E-mail on the other hand is a different thing. First off, if you're not on a land-based U.S. phone line, odds are you're paying per-minute for your connection -- which sucks since you have to pay to get spam dumped in your E-mail program's inbox.

Even if you have a flat rate connection, you're still inevitably paying for spam mail, whether or not it's directly. Bandwidth isn't free -- take a 5k spam mail message and multiply it by 10 million messages, both of which are probably conservative estimates, and you're talking about 50 gigabytes each time a spam is sent out. If you get 3 spam messages a day, that's 150 gigabytes of bandwidth just for the mailings that reached you -- which is only a tiny fraction of all the spam sent out in a day. Multiply 50 gigabytes by the countless number of mailings, and that's a lot of bandwidth going up in smoke daily.
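Working the quoted figures through (5 KB per message, 10 million recipients per mailing, three mailings a day), with decimal units assumed:

```python
KB = 1000                      # decimal kilobyte, an assumption
message_size = 5 * KB          # bytes per spam message
recipients = 10_000_000        # messages per mailing
runs_per_day = 3               # mailings that reach one particular user

bytes_per_run = message_size * recipients
print(bytes_per_run / 1e9, "GB per mailing")                    # 50.0
print(runs_per_day * bytes_per_run / 1e9, "GB for three mailings")  # 150.0
```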

Guess who's paying for it? Hint: with spammers usually using stolen ISP accounts and fake credit card numbers, probably not them. Another hint: when ISPs' bandwidth costs go up, they pass it on to the users.

Not to mention the fact that spammers shoving millions of messages through creaky mail servers can take them down. So even excluding the monetary damage, what's it worth if a piece of E-mail sent to/from you was on that server when it went down in flames? Your message may be delayed, or it may never show up at all.

"Pretty close" checksums? by geekplus · 2001-07-29 23:56 · Score: 3

Aren't there algorithms that will report messages that are pretty close, i.e., within N arbitrary bits of each other, as the same checksum? Or at least something approximating a checksum..., i.e. two different checksums that nonetheless return true when passed to an equals(cs1, cs2) method?
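One family of techniques that behaves this way is similarity hashing. The sketch below is a simplified SimHash-style fingerprint, offered as an illustration rather than as what dcc does: every word votes on each of 64 bits, and two fingerprints count as "equal" when they differ in only a few bits. The 12-bit tolerance is an arbitrary assumption.

```python
import hashlib
import re

BITS = 64


def simhash(text):
    tokens = re.findall(r"\w+", text.lower())
    weights = [0] * BITS
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash of the word
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when the majority of words had it set.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def equals(cs1, cs2, max_differing_bits=12):
    """True when the two fingerprints differ in at most N bits."""
    return bin(cs1 ^ cs2).count("1") <= max_differing_bits
```

Changing a few words in a long message typically flips only a handful of bits, so `equals()` still returns True, whereas unrelated messages tend to differ in roughly half of the 64 bits.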

Does someone have a link?

-- I had a female crustacean once, but I lobster...

How I filter spam by koreth · 2001-07-30 05:04 · Score: 4

I do a few things that are extremely effective in filtering out spam. I have procmail rules to do the following:

- Mail that doesn't list one of my addresses, or the address of a mailing list I know I'm on, in the To: or Cc: lines gets filtered. This alone catches a solid 85-90% of my spam flow, though it seems to be getting less effective as time goes on.
- Mail that's from a free E-mail service (Hotmail, Angelfire, etc.) gets filtered.
- Mail that contains certain keyphrases (e.g. "free" in all caps, or "this is not spam" or "S.1618") gets filtered.
- Mail that has passed through a .cn or .tw or .kr host gets filtered. Those countries seem to have an abundance of open relays. At some point I hope to change this to check against ORBS/DUL instead.

Now, the interesting thing is what I do once I've decided to filter the mail. Since my rules catch legitimate mail, I don't just throw it away. I wrote a small collection of Perl scripts (which I'll release to the world someday soon, but they need documentation) that maintain a whitelist of sender addresses.

If a filtered message is from an address that's marked valid, it's delivered. If it's from an address that's marked invalid, it's discarded. If it's from an unknown address, the message is put in a holding area and an autoreply is sent back to the sender from a magic address asking them to reply in order to validate themselves.

The magic address is unique per filtered message -- it uses qmail's address extension mechanism -- and mail to the magic address never gets delivered to me, so I don't care if it gets added to spam lists. The Perl script behind the magic address does a quick check to make sure it's not processing a bounce, then marks the sender of the original message as valid and delivers the original message (or messages if more than one arrived while awaiting validation).

Held messages are cleaned out by a cron job when they get too old.
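The hold-and-validate flow described above can be modelled in a few lines. This is a guess at the logic, not the Perl scripts themselves: the `Quarantine` class and its method names are invented, it issues one token per pending sender rather than one magic address per message, and the cron cleanup of old held mail is left out.

```python
import secrets


class Quarantine:
    def __init__(self):
        self.status = {}      # sender -> "valid" or "invalid"
        self.held = {}        # token -> (sender, [held messages])
        self.token_for = {}   # sender awaiting validation -> token

    def on_filtered(self, sender, message):
        """Return (action, token); token is set only for held mail."""
        state = self.status.get(sender)
        if state == "valid":
            return ("deliver", None)
        if state == "invalid":
            return ("discard", None)
        token = self.token_for.get(sender)
        action = "hold"
        if token is None:
            # First message from an unknown sender: send the autoreply
            # from an address that embeds this token.
            token = secrets.token_hex(8)
            self.token_for[sender] = token
            self.held[token] = (sender, [])
            action = "challenge"
        self.held[token][1].append(message)
        return (action, token)

    def on_reply(self, token, is_bounce=False):
        """Mail arrived at the magic address; release held messages."""
        if is_bounce or token not in self.held:
            return []
        sender, messages = self.held.pop(token)
        del self.token_for[sender]
        self.status[sender] = "valid"
        return messages
```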

This is sort of similar in concept to the password mechanism of SpamBouncer or (a closer cousin) SpamCop's whitelist feature, but it doesn't require senders to retransmit their messages, which I always thought was pretty annoying to ask people to do since not everyone saves their outgoing mail. Granted, asking them to do anything is kind of annoying, but at least this is less so since they can just hit "reply" and "send".

This setup is cool because it allows friends to Bcc me on stuff without my "I must be listed as a recipient" rule trashing their messages, even if they've just switched E-mail addresses. It is admittedly based on the assumption that spammers don't read replies to their mail and/or wouldn't go to the effort of unlocking themselves; I have yet to see a spammer do that, and given the economics of spamming I think that'll be a safe assumption for the foreseeable future, unless this approach gets so popular that spammers start writing automated unlock bots!

For USENET! by gnovos · 2001-07-30 01:14 · Score: 3

An idea similar to this could and should be tried to bring the USENET back into the hands of the masses. Having some sort of k5 style moderation used on USENET message id could potentially end spam as we know it. The simplest approach would be to have a few groups of competing "moderation" servers that you could query and rate messages by their message id and then build in some client plugins to filter based on a given threshold. Of course to really get the system to work, some thought would have to be put into authentication (say only 5 moderations allowed per IP per day, or even have an actual login process to moderate) to keep spammers from moderating up their own posts. If we have a loose network of many of these moderation servers, they all use different ways to pick out the good posts and user preference would dictate which system works best.

Anyway, just my 2 cents...

--
"Your superior intellect is no match for our puny weapons!"

Worms? by All+Dead+Homiez · 2001-07-29 23:53 · Score: 3

Obviously there are issues with something like this (especially mailing lists, and worms that do attachments)

Is there some hidden reason why we would want millions of copies of an email worm's attachment to get through? This could actually be part of the solution to two problems.

Also, do note that a common method of spamming is to connect to an open relay and have the relay take care of sending out thousands of identical messages by simply sending thousands of "RCPT TO:" commands. Checksumming spam would completely break this spamming method and would force the spammer to retransmit the entire message for every recipient in order to vary it, thus making the process more costly.
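The counting side of such a clearinghouse can be sketched as below. An exact MD5 of the body is used here purely to show why thousands of identical copies stand out; the class name and the threshold are assumptions, and a real deployment would use a fuzzier checksum.

```python
import hashlib
from collections import Counter


class Clearinghouse:
    def __init__(self, bulk_threshold=100):
        self.bulk_threshold = bulk_threshold
        self.reports = Counter()    # checksum -> number of reported copies

    @staticmethod
    def _digest(body):
        return hashlib.md5(body.encode("utf-8")).hexdigest()

    def report(self, body):
        """Record one delivery of this body; return how often it was seen."""
        digest = self._digest(body)
        self.reports[digest] += 1
        return self.reports[digest]

    def is_bulk(self, body):
        return self.reports[self._digest(body)] >= self.bulk_threshold
```

Each receiving mail server reports the checksum of every message it accepts; a message relayed unchanged to thousands of recipients crosses the threshold almost immediately, while one-off personal mail never does.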

-all dead homiez
