Distributed Checksum Clearinghouse vs Spam
AllSpammedOut writes: "Spam could be more easily detected if everyone were
to compare the mail messages they received. Using the Distributed
Checksum Clearinghouse, MTAs can report the checksums for all messages
they receive and be notified when a checksum has already been reported by many other systems." Obviously there are issues with something like this (especially mailing lists, and worms that send attachments). I suspect spammers would just include a counter to break the checksums, though.
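For readers who want the idea in concrete terms, here is a minimal sketch of a checksum clearinghouse. It is a toy under stated assumptions, not the real DCC protocol: the `Clearinghouse` class, the use of SHA-256, and the `bulk_threshold` value are all made up for illustration.

```python
import hashlib
from collections import Counter


class Clearinghouse:
    """Toy stand-in for a checksum clearinghouse (hypothetical, not DCC itself)."""

    def __init__(self, bulk_threshold=3):
        # Assumed cutoff: how many reports make a message count as bulk.
        self.bulk_threshold = bulk_threshold
        self.reports = Counter()

    @staticmethod
    def _digest(body: str) -> str:
        # SHA-256 is an assumption; the thread does not say which checksum is used.
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def report(self, body: str) -> int:
        """Record one sighting of a message and return the total seen so far."""
        digest = self._digest(body)
        self.reports[digest] += 1
        return self.reports[digest]

    def is_bulk(self, body: str) -> bool:
        """True once enough systems have reported the same checksum."""
        return self.reports[self._digest(body)] >= self.bulk_threshold
```

Each MTA would call `report()` for incoming mail and consult `is_bulk()` before delivery; what to do with a bulk message (reject, tag, whitelist a mailing list) is left open here.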
Why do open relays exist? Is there some beneficial use for them that I'm not aware of? Is this a relay's default state and the sysadmin is too busy or dumb to lock it down? Why doesn't everyone just secure their mail servers and cut off spam before it gets out?
Show me one that works on my mail server without overloading it. Mail comes in at a rate of about 20 per second. It will need to check it all. If you think the problem is solved at the client, you misunderstand the problem.
Checksums do not change gracefully given different inputs. As in, if there's the slightest change in a spam email, let's say the date and sendto in the email header change, the entire checksum will appear completely different. Therefore the checksums will only apply to specific spam messages, and not entire classes of similar spam emails (this would be the desirable solution). And most spam mails these days are smart enough to put your name or something in the email subject and body.
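A small sketch of that point, assuming a cryptographic hash such as SHA-256 (the thread does not say which checksum is actually used, and the two sample messages are invented): changing a single character gives an unrelated digest.

```python
import hashlib


def checksum(message: str) -> str:
    # SHA-256 is an assumed stand-in for whatever checksum a real system uses.
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions where the two hex digests disagree."""
    return sum(x != y for x, y in zip(checksum(a), checksum(b)))


# Hypothetical spam that differs only by a trailing counter.
spam_a = "Dear Alice, make money fast! Offer #0001"
spam_b = "Dear Alice, make money fast! Offer #0002"

print(checksum(spam_a))
print(checksum(spam_b))
print(differing_hex_digits(spam_a, spam_b), "of 64 hex digits differ")
```

An exact-match lookup on these digests would treat the two messages as unrelated, which is exactly the weakness described above.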
A more robust method of spam detection, IMHO, would be to develop an algorithm that would take emails and encode them in a way that they could be input to a neural network. The output of the network would be 0 = not spam, 1 = spam.
... there are definitely enough examples out there for it to learn from. The hardest part, as usual, would be to find a way to encode the emails. So let's say you receive an email. Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder.
If anyone with some machine learning experience wants to work on a project like this with me, send me an email!
This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.
However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters that determine how close a match you want before outright rejecting a similar message, or maybe just relocating it to a separate inbox.
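One way such a closeness number could be sketched, assuming Python's standard `difflib.SequenceMatcher` as the similarity measure. The `classify` helper and both thresholds are hypothetical values picked for illustration, not tuned ones.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def classify(message, known_spam, reject_at=0.9, quarantine_at=0.6):
    """Compare a message against known spam and pick an action.

    reject_at and quarantine_at are assumed thresholds, chosen arbitrarily.
    """
    best = max((similarity(message, spam) for spam in known_spam), default=0.0)
    if best >= reject_at:
        return "reject"
    if best >= quarantine_at:
        return "quarantine"
    return "deliver"
```

With this kind of score, a spam that only changes a counter still lands very close to the known copy, so it gets rejected instead of slipping past an exact checksum. Comparing every message against every known spam is expensive, though, so a real server would need something cheaper.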
Because everybody knows that Orange rinds offer better memory density than banana peels. And orange peels are more resistant to the excess steam from the CPU. Banana peels would just disintegrate with even a minimal amount of overclocking.
One way that would be much more effective is to take pairs of words (e.g. in this sentence: "One way", "way that", "that would", ...) and apply a hash function that returns a number between 0 and N (N usually between 1000 and 100000). You then compare the histogram (how many of each hash value) of a mail to the database. If the histogram is too close to that of a known spam message, you delete it.
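A rough sketch of the word-pair histogram idea. The choice of `zlib.crc32` as the hash, the default of 10000 buckets, and the overlap-based similarity score are all assumptions made for illustration; the post does not specify any of them.

```python
import zlib
from collections import Counter


def bigram_histogram(text: str, buckets: int = 10000) -> Counter:
    """Hash each pair of adjacent words into one of `buckets` bins."""
    words = text.lower().split()
    hist = Counter()
    for first, second in zip(words, words[1:]):
        pair = f"{first} {second}".encode("utf-8")
        # crc32 is stable across runs, unlike Python's built-in hash().
        hist[zlib.crc32(pair) % buckets] += 1
    return hist


def histogram_similarity(a: Counter, b: Counter) -> float:
    """Shared bin counts over total counts: 1.0 means identical histograms."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum((a & b).values())  # element-wise minimum of the counts
    return 2 * shared / total
```

A mail server could store the histograms of known spam and delete (or quarantine) anything whose similarity exceeds some cutoff. Changing one word only disturbs the two word pairs around it, so a counter or a personalized name barely moves the score.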
Download the source tarball, uncompress, untar and read
On a deeper note, it's sad that so many Slashdot readers, including apparently CmdrTaco, underestimate others so severely. Do you really think someone put in the effort to make something like DCC and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found: Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.
So ...
it's in my head
Check out http://www.scambusters.org/809Scam.html if you don't know what I'm talking about.
I guess this doesn't solve the problem of server resources getting stolen, but it certainly saves me from having to look at the crap.
-matthew
One thing bothers me though. As I was clearing out a large 'stuck' email for one of our dial-up customers the other day, I happened to casually mention "Wow, you sure do get a lot of spam!" to which they replied "What's that?" "You know, junk email." "Junk e-mail? I read it all." People like that are why our boxes receive such garbage. You fire enough bullets and SOMEone is going to die.
All you have to do is filter on the words "This e-mail is not spam!"
Leave it to the Slashdot crowd to make things a million times more complex than they need to be...
Spammers need to be licensed (preferably with an ear tag, but I'll consider substitutes) and fully identified. All spam needs to have a spam license number in the header someplace.
Fees can then be, and need to be, collected by your favorite government agencies (I think the IRS, the NSA, and BATF will do for now). ISPs and users need to be able to bill spammers some amount for the spam processed and received. Fees need to be large enough that it is worthwhile to go after them, and then we can have bounty hunters. Fees can be high enough to reduce the cost of access. Penalties for abuse can be heavy (20 years in jail, for example).
Then we can have spam hunters who will go out and collect from the spammers for you in exchange for a percentage.
Ya, this same argument is used when discussing censoring the entire internet. Ever thought about running for office? Spammers aren't the only ones I blame. I run a small mail server (less than 1k messages a day), and every night I e-mail ISPs informing them of open relays, and dialup customers abusing their systems. I have received a few auto-replies, and not ONE god damn response from someone who cares. I'd like to assume that most people are way too busy fixing the problem, but the same culprits keep showing up in my mail log. When discussing legal action against spammers, I think the same legal repercussions should be directed to ISPs who don't know/care how to run a mail server.
While the system could be broken by using counters, this could be countered by parsing only certain portions of the mail or counting the frequency of certain words. This would work very well on pure text spam, but not on attachment stuff.
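One hedged reading of "parsing only certain portions of the mail": checksum just the body, after throwing away the parts a spammer can vary cheaply. The `fuzzy_checksum` name and the specific normalization steps (lowercasing, dropping digits, collapsing whitespace) are assumptions for illustration, not something any existing filter is claimed to do.

```python
import hashlib
import re


def fuzzy_checksum(raw_message: str) -> str:
    """Checksum the normalized body only, ignoring headers and counters."""
    # Headers end at the first blank line; with no blank line, treat it all as body.
    _headers, separator, body = raw_message.partition("\n\n")
    if not separator:
        body = raw_message
    body = body.lower()
    body = re.sub(r"\d+", "", body)            # drop counters and serial numbers
    body = re.sub(r"\s+", " ", body).strip()   # collapse whitespace differences
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Two copies with different To: and Date: headers and a different trailing counter then hash to the same value. A spammer who inserts random words instead of digits would still get through, which is where the word-frequency idea comes in.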
What would be funny would be to see the false positives of such a system. Many mails I get from the administration all look the same; I wonder if they would be considered spam. They are quite similar to spam: useless and too numerous...
When you get a telemarketing call, they pay their long distance company for the right to call you. It doesn't cost you a penny to pick up the phone. When you get junk (snail) mail, the marketer had to pay the postal service to send mail out to each and every address. Not only does it not cost you anything, but in the case of the U.S. Postal Service these bulk rates actually lower the cost of you sending mail, since they use it to subsidize part of the cost of personal mail.
Bulk E-mail on the other hand is a different thing. First off, if you're not on a land-based U.S. phone line, odds are you're paying per-minute for your connection -- which sucks since you have to pay to get spam dumped in your E-mail program's inbox.
Even if you have a flat rate connection, you're still inevitably paying for spam mail, whether or not it's directly. Bandwidth isn't free -- take a 5k spam mail message and multiply it by 10 million messages, both of which are probably conservative estimates, and you're talking about 50 gigabytes each time a spam is sent out. If you get 3 spam messages a day, that's 150 gigabytes of bandwidth just for the mailings behind the messages that you received -- which is only a tiny fraction of all the spam sent out in a day. Multiply 50 gigabytes by the countless number of mailings, and that's a lot of bandwidth going up in smoke daily.
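A quick sketch of the multiplication, using the post's own figures of a 5k message and 10 million recipients. Treating "5k" as 5,000 bytes and using decimal gigabytes are assumptions.

```python
def mailing_bytes(message_bytes: int, recipients: int) -> int:
    """Total bytes moved when one message goes to every recipient."""
    return message_bytes * recipients


# Figures from the post; the decimal interpretation is assumed.
per_mailing = mailing_bytes(5_000, 10_000_000)
gigabytes_per_mailing = per_mailing / 1_000_000_000

# Three separate mailings a day, one for each spam that reaches your inbox.
gigabytes_for_three = 3 * gigabytes_per_mailing

print(gigabytes_per_mailing, gigabytes_for_three)
```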
Guess who's paying for it? Hint: with spammers usually using stolen ISP accounts and fake credit card numbers, probably not them. Another hint: when ISPs' bandwidth costs go up, they pass it on to the users.
Not to mention the fact that spammers shoving millions of messages through creaky mail servers can take them down. So even excluding the monetary damage, what's it worth if a piece of E-mail sent to/from you was on that server when it went down in flames? Your message may be delayed, or it may never show up at all.
Does someone have a link?
Now, the interesting thing is what I do once I've decided to filter the mail. Since my rules catch legitimate mail, I don't just throw it away. I wrote a small collection of Perl scripts (which I'll release to the world someday soon, but they need documentation) that maintain a whitelist of sender addresses.
If a filtered message is from an address that's marked valid, it's delivered. If it's from an address that's marked invalid, it's discarded. If it's from an unknown address, the message is put in a holding area and an autoreply is sent back to the sender from a magic address asking them to reply in order to validate themselves.
The magic address is unique per filtered message -- it uses qmail's address extension mechanism -- and mail to the magic address never gets delivered to me, so I don't care if it gets added to spam lists. The Perl script behind the magic address does a quick check to make sure it's not processing a bounce, then marks the sender of the original message as valid and delivers the original message (or messages if more than one arrived while awaiting validation).
Held messages are cleaned out by a cron job when they get too old.
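The flow described above can be sketched roughly as follows. This is not the poster's Perl code, which is not shown in the thread. The `ChallengeFilter` class, its method names, the token format, and the one-week expiry are all hypothetical, and the qmail address-extension plumbing is left out.

```python
import secrets
import time


class ChallengeFilter:
    """Toy model of the whitelist / hold / autoreply scheme (names are made up)."""

    def __init__(self, max_age_seconds=7 * 24 * 3600):
        self.status = {}  # sender address -> "valid" or "invalid"
        self.held = {}    # token -> (sender, message, time held)
        self.max_age_seconds = max_age_seconds  # assumed expiry, one week

    def handle_filtered(self, sender, message, now=None):
        """Decide what to do with a message that the filter rules caught."""
        now = time.time() if now is None else now
        state = self.status.get(sender)
        if state == "valid":
            return "deliver", None
        if state == "invalid":
            return "discard", None
        # Unknown sender: hold the message under a token unique to this message.
        # The caller would send the autoreply from a magic address built from it.
        token = secrets.token_hex(8)
        self.held[token] = (sender, message, now)
        return "hold", token

    def handle_reply(self, token, is_bounce=False):
        """Mail arrived at a magic address: validate the sender, release held mail."""
        if is_bounce or token not in self.held:
            return []
        sender = self.held[token][0]
        self.status[sender] = "valid"
        released = [msg for (who, msg, _) in self.held.values() if who == sender]
        self.held = {t: e for t, e in self.held.items() if e[0] != sender}
        return released

    def expire(self, now=None):
        """What the cron job would do: drop held messages that are too old."""
        now = time.time() if now is None else now
        self.held = {t: e for t, e in self.held.items()
                     if now - e[2] <= self.max_age_seconds}
```

Because the sender only has to reply to the autoreply, nothing needs to be retransmitted: `handle_reply` hands back every message that was waiting for that sender.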
This is sort of similar in concept to the password mechanism of SpamBouncer or (a closer cousin) SpamCop's whitelist feature, but it doesn't require senders to retransmit their messages, which I always thought was pretty annoying to ask people to do since not everyone saves their outgoing mail. Granted, asking them to do anything is kind of annoying, but at least this is less so since they can just hit "reply" and "send".
This setup is cool because it allows friends to Bcc me on stuff without my "I must be listed as a recipient" rule trashing their messages, even if they've just switched E-mail addresses. It is admittedly based on the assumption that spammers don't read replies to their mail and/or wouldn't go to the effort of unlocking themselves; I have yet to see a spammer do that, and given the economics of spamming I think that'll be a safe assumption for the foreseeable future, unless this approach gets so popular that spammers start writing automated unlock bots!
An idea similar to this could and should be tried to bring USENET back into the hands of the masses. Having some sort of k5 style moderation applied to USENET message IDs could potentially end spam as we know it. The simplest approach would be to have a few groups of competing "moderation" servers that you could query and rate messages on by their message ID, and then build some client plugins to filter based on a given threshold. Of course, to really get the system to work, some thought would have to be put into authentication (say only 5 moderations allowed per IP per day, or even an actual login process to moderate) to keep spammers from moderating up their own posts. If we have a loose network of many of these moderation servers, they can all use different ways to pick out the good posts, and user preference would dictate which system works best.
Anyway, just my 2 cents...
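For what it's worth, the moderation-server idea could be sketched like this. Everything here is hypothetical: the `ModerationServer` class, the plus-or-minus-one ratings, and keying the daily limit on an opaque moderator string (an IP or a login) are assumptions; only the limit of 5 per day comes from the post.

```python
from collections import defaultdict


class ModerationServer:
    """Toy rating server keyed by message ID (illustrative only)."""

    def __init__(self, daily_limit=5):
        self.daily_limit = daily_limit   # moderations allowed per moderator per day
        self.scores = defaultdict(int)   # message ID -> running score
        self.used = defaultdict(int)     # (moderator, day) -> moderations spent

    def rate(self, moderator, day, message_id, delta):
        """Apply a +1 or -1 rating; return False once the daily limit is used up."""
        if delta not in (-1, 1):
            raise ValueError("delta must be +1 or -1")
        if self.used[(moderator, day)] >= self.daily_limit:
            return False
        self.used[(moderator, day)] += 1
        self.scores[message_id] += delta
        return True

    def score(self, message_id):
        """Unrated messages score 0."""
        return self.scores.get(message_id, 0)


def visible(message_ids, server, threshold=0):
    """Client-side plugin: keep only messages at or above the user's threshold."""
    return [mid for mid in message_ids if server.score(mid) >= threshold]
```

A newsreader could query several such servers and let the user pick whichever one's scores they trust most.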
"Your superior intellect is no match for our puny weapons!"
Is there some hidden reason why we would want millions of copies of an email worm's attachment to get through? This could actually be part of the solution to two problems.
Also, do note that a common method of spamming is to connect to an open relay and have the relay take care of sending out thousands of identical messages by simply sending thousands of "RCPT TO:" commands. Checksumming spam would completely break this spamming method and would force the spammer to retransmit the entire message for every recipient in order to vary it, thus making the process more costly.