Distributed Checksum Clearinghouse vs Spam
AllSpammedOut writes: "Spam could be more easily detected if everyone were
to compare the mail messages they received. Using the Distributed
Checksum Clearinghouse, MTAs can report the checksums for all messages
they receive and be notified when a checksum has already been reported by many other systems." Obviously there are issues with something like this (especially mailing lists, and worms that do attachments). I suspect spammers would just include a counter to break checksums though.
Checksums do not change gracefully given different inputs. As in, if there's the slightest change in a spam email, let's say the date and sendto in the email header change, the entire checksum will appear completely different. Therefore the checksums will only apply to specific spam messages, and not entire classes of similar spam emails (this would be the desirable solution). And most spam mails these days are smart enough to put your name or something in the email subject and body.
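To make that concrete, here is a minimal sketch of how one changed character scrambles an exact checksum. It assumes SHA-256 from Python's hashlib purely for illustration (this is not a claim about what DCC itself computes), and the two message bodies are made up.

```python
import hashlib

def body_checksum(text: str) -> str:
    """Return the SHA-256 hex digest of a message body (illustrative choice of hash)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two made-up spam bodies that differ only in the recipient's name
original = "Dear Bob, buy cheap pills now!"
variant = "Dear Rob, buy cheap pills now!"

sum_a = body_checksum(original)
sum_b = body_checksum(variant)

# Count hex digit positions where the two digests happen to agree
matching = sum(a == b for a, b in zip(sum_a, sum_b))

print(sum_a)
print(sum_b)
print(f"{matching} of {len(sum_a)} hex digits match")
```

An exact-match lookup on either digest tells you nothing about the other message, which is why personalized spam slips past a naive checksum database.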
... there are definitely enough examples out there for it to learn from. The hardest part, as usual, would be to find a way to encode the emails. So let's say you receive an email. Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder.
A more robust method of spam detection, IMHO, would be to develop an algorithm that would take emails and encode them in a way that they could be input to a neural network. The output of the network would be 0 = not spam, 1 = spam.
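A rough sketch of what that could look like, under heavy assumptions: the encoding here is a hashed bag of words (each word is hashed into one of DIM slots and counted), and the "network" is the smallest possible one, a single sigmoid unit trained by gradient descent. The names encode, predict, train, and DIM, and the four training messages, are all made up for illustration; a real filter would need far more data and probably a bigger model.

```python
import math
import zlib

DIM = 256  # length of the input vector (assumed value)

def encode(text, dim=DIM):
    """Hash each word into one of `dim` slots and count occurrences."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec

def predict(weights, bias, vec):
    """Single sigmoid unit: output near 1 means spam, near 0 means not spam."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, rate=0.1):
    """Fit the unit with plain stochastic gradient descent on log loss."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for text, label in examples:
            vec = encode(text)
            error = predict(weights, bias, vec) - label
            bias -= rate * error
            weights = [w - rate * error * x for w, x in zip(weights, vec)]
    return weights, bias

# Tiny made-up training set: label 1 = spam, 0 = not spam
examples = [
    ("buy cheap pills now", 1),
    ("win free money today", 1),
    ("lunch meeting moved to noon", 0),
    ("patch for the mail server attached", 0),
]
weights, bias = train(examples)

spam_score = predict(weights, bias, encode("buy cheap pills now"))
ham_score = predict(weights, bias, encode("lunch meeting moved to noon"))
print(f"spam example: {spam_score:.2f}, ham example: {ham_score:.2f}")
```

The client would run encode locally and ship only the vector to whatever server holds the trained weights, then file the message by comparing the returned score against a threshold such as 0.5.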
If anyone with some machine learning experience wants to work on a project like this with me, send me an email!
One way that would be much more effective is to take pairs of words (e.g. in this sentence: "One way", "way that", "that would", ...) and apply a hash function that returns a number between 0 and N (N usually between 1000 and 100000). You then compare the histogram (how many of each hash value) of a mail to the database. If the histogram is too close to that of a known spam message, you delete it.
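The word-pair idea can be sketched in a few lines. Several details below are assumptions the post does not specify: CRC32 as the hash function, N = 1000 buckets, cosine similarity as the measure of how "close" two histograms are, and the sample messages themselves.

```python
import zlib
from collections import Counter
from math import sqrt

N = 1000  # number of hash buckets (assumed; the post suggests 1000 to 100000)

def bigram_histogram(text, n_buckets=N):
    """Hash every pair of adjacent words into a bucket and count the hits."""
    words = text.lower().split()
    hist = Counter()
    for first, second in zip(words, words[1:]):
        bucket = zlib.crc32(f"{first} {second}".encode("utf-8")) % n_buckets
        hist[bucket] += 1
    return hist

def cosine_similarity(h1, h2):
    """Return 1.0 for identical histograms and 0.0 for histograms with no shared bucket."""
    dot = sum(count * h2[bucket] for bucket, count in h1.items())
    norm1 = sqrt(sum(c * c for c in h1.values()))
    norm2 = sqrt(sum(c * c for c in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

# Made-up messages: a known spam, a personalized variant, and an unrelated mail
known_spam = "Dear Bob, buy cheap pills now and save big money today"
variant = "Dear Rob, buy cheap pills now and save big money today"
ham = "The meeting moved to Thursday, please bring the quarterly report"

spam_hist = bigram_histogram(known_spam)
sim_variant = cosine_similarity(spam_hist, bigram_histogram(variant))
sim_ham = cosine_similarity(spam_hist, bigram_histogram(ham))
print(f"variant: {sim_variant:.2f}, unrelated: {sim_ham:.2f}")
```

Changing the recipient's name only disturbs the two word pairs that contain it, so the variant still scores close to the known spam while the unrelated mail does not. Where to put the "too close" threshold is a tuning question this sketch does not answer.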
Download the source tarball, uncompress, untar and read
On a deeper note, it's sad that so many Slashdot readers, including apparently CmdrTaco, underestimate others so severely. Do you really think someone put in the effort to make something like dcc and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found: Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.
Check out http://www.scambusters.org/809Scam.html if you don't know what I'm talking about.
One thing bothers me though. As I was clearing out a large 'stuck' email for one of our dial-up customers the other day, I happened to casually mention "Wow, you sure do get a lot of spam!" to which they replied "What's that?" "You know, junk email." "Junk e-mail? I read it all." People like that are why our boxes receive such garbage. You fire enough bullets and SOMEone is going to die.
What, me worry?
All you have to do is filter on the words "This e-mail is not spam!"
Leave it to the Slashdot crowd to make things a million times more complex than they need to be...