Distributed Checksum Clearinghouse vs Spam
AllSpammedOut writes: "Spam could be more easily detected if everyone were to compare the mail messages they received. Using the Distributed Checksum Clearinghouse, MTAs can report the checksums for all messages they receive and be notified when a checksum has already been reported by many other systems. Obviously there are issues with something like this (especially mailing lists, and worms that spread through attachments). I suspect spammers would just include a counter to break checksums, though."
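The reporting idea can be sketched as a toy, in-memory clearinghouse that counts how often each checksum has been reported and calls a message bulk once the count reaches a threshold. This is not the real DCC protocol or its fuzzy checksums: the class name `Clearinghouse`, the exact SHA-256 digest and the threshold of 3 are assumptions made for illustration. The last line shows the weakness raised in the submission: appending a counter produces a brand-new checksum.

```python
import hashlib
from collections import Counter

BULK_THRESHOLD = 3  # hypothetical: reports needed before a checksum counts as bulk


class Clearinghouse:
    """Toy in-memory stand-in for a shared checksum server (not the real DCC)."""

    def __init__(self, threshold: int = BULK_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts = Counter()  # checksum -> number of reports

    @staticmethod
    def checksum(body: str) -> str:
        # Exact digest: changing a single character yields an unrelated key.
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def report(self, body: str) -> int:
        """Record one MTA's sighting of a message; return the running count."""
        key = self.checksum(body)
        self.counts[key] += 1
        return self.counts[key]

    def is_bulk(self, body: str) -> bool:
        return self.counts[self.checksum(body)] >= self.threshold


if __name__ == "__main__":
    dcc = Clearinghouse()
    spam = "Make money fast!"
    for _ in range(3):  # three different MTAs report the same message
        dcc.report(spam)
    print(dcc.is_bulk(spam))          # True
    print(dcc.is_bulk(spam + " #1"))  # False: a counter defeats an exact checksum
```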
...of Email The Coward Can Do Without:
1. Any email in big5.
2. Any email which is from .kr, .cn, or .tw (and many might add .ru too).
3. Any email in HTML.
4. Any email in script.
5. Any email that mentions a long-dead House Bill in the body.
6. Any email that mentions sex or sexual items in the subject.
7. With specific exceptions for subscribed mailing lists, email that isn't for (example:) thecoward@thecoward.tld.
8. Any email that has that crap bit about saving trees in the body.
9. Any email that has many variants of (example) thecoward@ in the To: line.
Chances are anything to 'thecoward' and 'thecowherd' and others isn't worth anyone's time.
10. Any email that somewhere proclaims "This is not spam."
Looking at the To: header and at the content seems more workable than merely using the subject line, though that level would at least skim off the less creative crud. No, The List is not complete.
--
The Coward
So why don't we have a spammers vs. hackers war? They could fight over who's the most annoying, winner take all. Spammers spam the crap outta hackers' sites and mailboxes, while hackers launch DoS attacks on the spammers' service providers. It might just keep both sides busy enough to buy the rest of us a little peace and quiet.
This is true. It is claimed that over 90% of spam is sent through open relays, meaning that the spammer uses multiple RCPT TO commands and sends the identical message to each recipient. Most spammers don't have the bandwidth that it takes to send each user a personalized message, because they are almost always on a throwaway dialup. Only the professionals can afford to send unique messages, because they often have a DSL line and a pink contract with their ISP (which permits them to continue spamming).
A lot of spam I get already has a unique identifying ID included in it. I assume this is to track valid e-mail addresses of people stupid enough to try to be "removed" from their lists.
No, that's what's called a hashbuster. It's used to counter the mailer software that checks outgoing messages in an attempt to prevent spam.
Some spammers, in particular the ones who turn around and resell e-mail addresses, use the removal drop-boxes to validate addresses, and/or remove addresses that are bounced to Errors-to: drop-boxes.
Why do open relays exist? Is there some beneficial use for them that I'm not aware of? Is this a relay's default state and the sysadmin is too busy or dumb to lock it down? Why doesn't everyone just secure their mail servers and cut off spam before it gets out?
Seriously though - 90% of all the Slashdot posts here are "wouldn't my email address break this?" or some variant thereof. Sure, if the programmers who built it were really, really stupid. Do this instead:
- Strip the headers (all you have left is the body)
- Remove all blank lines (not carriage returns)
- Remove the top 5 and bottom 5 lines
- Checksum
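The four steps above can be sketched with Python's standard `email` and `hashlib` modules. This assumes a single-part plain-text message and uses MD5 as the checksum, since the comment doesn't name one; `body_checksum` and its `trim` parameter are hypothetical names. Bodies with ten or fewer non-blank lines are hashed whole rather than trimmed to nothing.

```python
import hashlib
from email import message_from_string


def body_checksum(raw_message: str, trim: int = 5) -> str:
    """Checksum a single-part message body after the four steps above."""
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        raise ValueError("this sketch only handles single-part messages")
    # 1. Strip the headers: keep only the body text.
    body = msg.get_payload()
    # 2. Remove blank (and whitespace-only) lines.
    lines = [line for line in body.splitlines() if line.strip()]
    # 3. Drop the top and bottom `trim` lines, where greetings and
    #    personalised footers usually sit. Short bodies are kept whole.
    if len(lines) > 2 * trim:
        lines = lines[trim:len(lines) - trim]
    # 4. Checksum what is left (MD5 is an arbitrary choice here).
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()
```

Two copies that differ only in their headers, their greeting line or a reference counter on the last line produce the same value; a counter placed in the middle of the body still breaks it.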
Bulk emailers (the software) don't want to be adding random words or characters within the body of a message -- too much processing for something you're doing 500,000 times. It's pretty tough to do with changing content anyway (very difficult to make it work in a generic fashion). Of course, the original article alluded to this:
...the main DCC checksum is fuzzy and ignores various aspects of messages. But slashdot readers don't read the articles in much the same way the moderators don't read the postings... :-)
I am not interested in articles about life extension advancements.
Hello "Don't Spam JeffSketch's hotmail address", what's that address? JeffSketch@hot... hmmm something.com... JefSkatch@hotmail.com? no... that's not it. I wonder why it would be so dangerous to post an email address on a web forum.
Maybe I should forward you the contents of my Hotmail account. It is up to 540 pieces of filtered spam. Only about 50% of my spam gets successfully blocked. This renders my occasional-use Hotmail account nearly useless.
But wait, that's a free account. I guess that means nobody is paying for it, in either my time or Microsoft's money.
Alas dear troll, if indeed you were not afraid of spam you would not be hiding your email address at all.
I can show many spams that have a counter in the subject.
Slashdot won't allow me to post the comment if I quote them.
It seems to me that if there were a way to verify an email address with an ISP as legitimate, you could at least use this to filter out spam from addresses that you couldn't reply to. Could be a bit of a problem as those that first switch over to verifying email addresses would be the first targets for spam.
This of course leaves Yahoo Mail and other services where one can sign up for a valid email address online. But it seems that those services could implement some scheme whereby you can only send 50 pieces of email in the first two weeks of having your account. Does this sound workable?
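The probation idea amounts to a per-account counter. A minimal sketch, assuming the 50-message and two-week numbers from the comment and a hypothetical `SendQuota` class; a real mail service would also need to persist this state and decide what happens after probation (here, no cap at all).

```python
from datetime import datetime, timedelta

PROBATION = timedelta(days=14)  # "the first two weeks of having your account"
PROBATION_LIMIT = 50            # "only send 50 pieces of email"


class SendQuota:
    def __init__(self, created: datetime) -> None:
        self.created = created
        self.sent_in_probation = 0

    def may_send(self, now: datetime) -> bool:
        """Return True (and count the message) if this send is allowed."""
        if now - self.created >= PROBATION:
            return True  # probation over: this sketch imposes no further cap
        if self.sent_in_probation >= PROBATION_LIMIT:
            return False
        self.sent_in_probation += 1
        return True


if __name__ == "__main__":
    start = datetime(2001, 7, 30)
    quota = SendQuota(start)
    allowed = sum(quota.may_send(start + timedelta(days=1)) for _ in range(60))
    print(allowed)  # 50
```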
JET Program: see Japan, meet intere
This certainly looks familiar.
;)
No, I did propose something along these lines on Advogato back in February in a piece entitled "Realtime Worm Filtering System," but I'm not accusing the author of ripping off my blatantly obvious and not-uncommon idea. That system is intended to stop worms, obviously, not spam. Worms tend to be easier to stop because they're seldom wholly polymorphic, often retaining enough similarities that collaborative filtering is quite feasible.
-Waldo
This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.
...fuzzy filters...
;)
:)
Remove all digits? Although, if the spammers got smart and used hexadecimal or alphanumeric counters then you're stuffed.
Now you're talking. Simply do a word count for 'Make' and 'money' and 'fast' and '!!!!' and use that as your spam baseline
...or 'See', 'Natalie', 'Portman', 'naked'
Why is it that many people who claim to support standards have such atrocious spelling and grammar?
And if so, with cheap storage, why not store the whole spam? In case of a high number of checksum matches, a final precise double-check could be made.
--
Here is my method: http://slashdot.org/comments.pl?sid=01/07/30/1444247&cid=108
--
bp
So, which spamhaus do you work for?
Praise the Force Field! Praise the Laser Project! Slackware Loon #19830573
Checks like this are usually done by the MTA; as with RBL, you can just add a warning to the headers of the mails that this might be spam.
Actually, 5K times 10 million is 50 gigabytes, not 50 megabytes. So it's a lot worse than you state above.
Here in Norway, for example, which is probably about representative, about half the people dial into the Internet with modems or by ISDN. Flat rate on telephone calls is uncommon; the vast majority of that half pay about $1 an hour for the connection to the net. That works out as $50 a GB for those on ISDN, and $66 a GB for those on modem.
Even this estimate still assumes that the link is perfectly full, that is, that a person with an ISDN connection downloads email at a rate of 64 kbps, which isn't necessarily true (although it should be close for your ISP's local mail server).
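The arithmetic above can be checked in a few lines. `cost_per_gb` is a hypothetical helper that assumes the link runs flat out, decimal gigabytes (10^9 bytes) and the $1/hour rate from the comment; the 33.6 kbps modem speed is an assumption, chosen because it reproduces the $66 figure.

```python
def bytes_total(message_bytes: int, recipients: int) -> int:
    """Total volume of one spam run."""
    return message_bytes * recipients


def cost_per_gb(link_bps: float, dollars_per_hour: float) -> float:
    """Dollars to download 10**9 bytes over a link that runs flat out."""
    seconds = 10**9 * 8 / link_bps
    return seconds / 3600 * dollars_per_hour


if __name__ == "__main__":
    print(bytes_total(5_000, 10_000_000) / 10**9)  # 50.0 (gigabytes)
    print(f"{cost_per_gb(33_600, 1.0):.0f}")       # 66 (33.6 kbps modem)
    print(f"{cost_per_gb(64_000, 1.0):.0f}")       # 35 (full 64 kbps ISDN channel)
```

At a full 64 kbps the same formula gives about $35 per GB; the $50 ISDN figure corresponds to an effective rate of roughly 44 kbps, which is consistent with the point that the link is not always full.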
People forward their spam to a database. The database searches for similar entries using diff or keyword searches. Once the database gets two or three variants of a single piece of spam it should be able to come up with a pattern match. Sure it'd be CPU intensive, but someone clever could distribute it. It'd end up being kind of like a virus scanner.
Many spammers are already including the following tricks (I've seen them all):
While not all spammers are doing this, yet, that some are indicates that newer spamware has this capability. Spammers are already aware of the increased bandwidths they have and taking advantage of that to personalize the messages in some way. For example the spam I get to help me enter my website (which I get many times for each of my domains) on search engines generally lists the name of my site in the message body. This is a technique that might have worked 3 years ago, but it is not as effective now, and looks like it will be ineffective within a few months of broad use.
now we need to go OSS in diesel cars
That's old technology. Obviously it shows how inept you are. I've already had to deal with a spam attack on a server just this morning where even the rejected attempts (2-3 per second!) were slowing it down. Then even with an ipfilter they are still SYN pounding it. Your delete button doesn't solve the problem. You're years behind what's even going on. But maybe you can learn new stuff when you finally grow up to college age (if you can pass your exams).
Paper spam has never been as significant a problem as electronic spam, because the sender pays most of the costs for paper spam whereas the receiver pays most of the costs for electronic spam. There is an economic throttle for the sender of paper spam. If we allow electronic spam to simply continue, it will scale up, as most businesses would then perceive it to be legitimate. You'd end up having to delete thousands and tens of thousands per day. It would keep growing if there is the perception that it is legitimate and that it costs you nothing to delete.
Electronic spam does cost the receiver time and money. This includes the receiver's ISP. If you are on a dialup line (as most people still are because of the DSL debacle) the spam takes up more time on your mail downloads. As the problem grows it takes more time.
To sum it up, it might not appear to be that much of a problem for you at this moment, but if you scale it up to where it would be if no effort was made to stop it, you would not be able to handle the load. Some of us do understand the scaling issue. If every business in the world sent you ONE message PER YEAR, and somehow this were just evenly spread out in time, you would be deleting this crap every 2 to 3 seconds, 24 hours a day, 7 days a week, all year long. The scale of the internet is simply not suited for spam.
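The "every 2 to 3 seconds" figure can be turned around to show how many once-a-year senders it implies. The comment gives no business count, so this only computes what the claim assumes, using a 365-day year; `implied_senders` is a hypothetical name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 in a 365-day year


def implied_senders(seconds_between_messages: float) -> float:
    """Once-a-year senders needed to produce one message every N seconds."""
    return SECONDS_PER_YEAR / seconds_between_messages


if __name__ == "__main__":
    for interval in (2, 3):
        print(interval, f"{implied_senders(interval):,.0f}")
    # 2 15,768,000
    # 3 10,512,000
```

So the claim holds if there are somewhere between about 10.5 and 16 million businesses sending mail.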
If you really have to get back to work, what do you do? Do you send spam all day, or do you delete it? Or do you just not get much of it?
Show me one that works on my mail server without overloading it. Mail comes in at a rate of about 20 per second. It will need to check it all. If you think the problem is solved at the client, you misunderstand the problem.
New this fall on FOX:
Lorenzo Lamas stars in e-Renegade!
Reno Raines is back! After being forced at gunpoint to break RSA's strongest encryption while getting a blow-job, Reno is wanted by the Financial Businessmen Incorporated, the FBI, for violation of the DMCA! On the run from bought-and-paid-for law enforcement, Reno has changed his identity and now works for his Native American friend, Robbie Spamkiller.
Chasing down unlicensed spammers, Reno searches for the evidence that will clear his name, bring justice to those who "blew" his career and reputation, and let him marry Robbie's sister, Cheyenne "Shy" Phillipshead.
--
--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
(317) 872-2225
This is Customer Service for Comcast Cable in Indianapolis. I would guess it's as close as you can come on the phone.
Who knows if this actually happened. It's really too bad that AI professors can't come up with their own material. I'm sure EVERY compsci student who took a software engineering class heard the anecdote about the computer-controlled radiation/x-ray machine that killed a patient by giving them something like 10,000 times the normal dose. This error was traced to a lack of bounds checking in software.
--
I would love to read an interview with a spammer about their business. Obviously, spam works or they wouldn't do it. Who are their customers? How many people respond to their spam? How did they get involved in the spam market? Maybe Slashdot should have an "Ask a Spammer" interview??
cpeterso
I don't know about anyone else here, but I use the Spambouncer procmail filter to ferret out my inbox. It checks MAPS, ORBS, parses the message for obvious 'spam'-type words and phrases (Make Money Fast!!!) and then allows you to either route the mail to /dev/null, bounce it, report it or both bounce and report... Not too hard to configure for your individual users or on a global system-level either.
der dee der.
Actually they are already countering it without even knowing about it.
A lot of spam I get already has a unique identifying ID included in it. I assume this is to track valid e-mail addresses of people stupid enough to try to be "removed" from their lists.
--
However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters that determined how close a message would have to be before outright rejecting it, or maybe just relocating it to a separate inbox.
Well, with a CRC I guess a slightly changed message will only have a slightly different checksum. But there is a good chance that two dissimilar messages will have the same sum. You'd need something like a large MD5 sum to keep your false positives low. But the problem with MD5 is that changing just one byte greatly affects the sum, so there would be no fuzzy matching.
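Whether a CRC degrades gracefully is easy to test. This sketch changes one digit, as a spam counter would, and counts how many bits of the CRC32 (`zlib.crc32`) and MD5 (`hashlib.md5`) values change. Neither function is designed to keep similar inputs close, so typically a substantial share of the bits differ in both cases, which is why fuzzy matching needs a different kind of fingerprint. The helper names are hypothetical.

```python
import hashlib
import zlib


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def compare(msg1: bytes, msg2: bytes) -> tuple[int, int]:
    """Bits changed in CRC32 (of 32) and in MD5 (of 128) between two messages."""
    crc = bits_changed(zlib.crc32(msg1), zlib.crc32(msg2))
    md5 = bits_changed(
        int.from_bytes(hashlib.md5(msg1).digest(), "big"),
        int.from_bytes(hashlib.md5(msg2).digest(), "big"),
    )
    return crc, md5


if __name__ == "__main__":
    crc, md5 = compare(b"Make money fast! Offer 1041", b"Make money fast! Offer 1042")
    print(f"CRC32: {crc}/32 bits changed, MD5: {md5}/128 bits changed")
```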
--
that's funny, my first AI prof told us the exact same anecdote. It seems to be pretty popular in AI circles, as I've seen it on several machine learning websites as well. :)
Checksums do not change gracefully given different inputs. As in, if there's the slightest change in a spam email, let's say the date and sendto in the email header change, the entire checksum will appear completely different. Therefore the checksums will only apply to specific spam messages, and not entire classes of similar spam emails (this would be the desirable solution). And most spam mails these days are smart enough to put your name or something in the email subject and body.
... there are definitely enough examples out there for it to learn from. The hardest part, as usual, would be to find a way to encode the emails. So let's say you receive an email. Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder.
A more robust method of spam detection, IMHO, would be to develop an algorithm that takes emails and encodes them in a way that they can be input to a neural network. The output of the network would be 0 = not spam, 1 = spam.
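As the comment says, the hard part is the encoding. A minimal sketch, assuming the simplest possible choice: a bag-of-words vector over the training vocabulary, fed to a single perceptron (a one-neuron network with a 0/1 step output). The class names and the four training messages are made up for illustration; a useful filter would need far more data and a richer model.

```python
import re

Vector = dict[int, float]  # sparse vector: input index -> word count


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class Encoder:
    """Bag-of-words encoding: one network input per word seen in training."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def fit(self, texts: list[str]) -> None:
        for text in texts:
            for word in tokenize(text):
                self.vocab.setdefault(word, len(self.vocab))

    def encode(self, text: str) -> Vector:
        vec: Vector = {}
        for word in tokenize(text):
            if word in self.vocab:  # words unseen in training are ignored
                i = self.vocab[word]
                vec[i] = vec.get(i, 0.0) + 1.0
        return vec


class Perceptron:
    """Single neuron with a step output: 1 = spam, 0 = not spam."""

    def __init__(self, size: int) -> None:
        self.weights = [0.0] * size
        self.bias = 0.0

    def predict(self, x: Vector) -> int:
        score = self.bias + sum(self.weights[i] * v for i, v in x.items())
        return 1 if score > 0 else 0

    def train(self, data: list[tuple[Vector, int]], epochs: int = 100) -> None:
        for _ in range(epochs):
            mistakes = 0
            for x, label in data:
                error = label - self.predict(x)  # -1, 0 or +1
                if error:
                    mistakes += 1
                    for i, v in x.items():
                        self.weights[i] += error * v
                    self.bias += error
            if mistakes == 0:  # every training example classified correctly
                break


if __name__ == "__main__":
    examples = [("make money fast", 1), ("meeting notes for tuesday", 0),
                ("cheap money fast offer", 1), ("lunch on tuesday", 0)]
    enc = Encoder()
    enc.fit([text for text, _ in examples])
    net = Perceptron(len(enc.vocab))
    net.train([(enc.encode(text), label) for text, label in examples])
    print(net.predict(enc.encode("fast money")))       # 1
    print(net.predict(enc.encode("tuesday meeting")))  # 0
```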
If anyone with some machine learning experience wants to work on a project like this with me, send me an email!
This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.
However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters that determined how close a message would have to be before outright rejecting it, or maybe just relocating it to a separate inbox.
Also, when they are stupid enough to put an 800-type number up, call from a *pay phone*. Why? Because it is untraceable and it costs the spammer whatever the pay phone costs ($0.35 in most of the USA right now) plus long distance charges. I keep a list of all the spammers' phone numbers that I need to put on the internet for all to benefit.
o/~ Join us now and share the software
It is claimed that over 90% of spam is sent through open relays, meaning that the spammer uses multiple RCPT TO commands and sends the identical message to each recipient.
This also makes spamming "hit and run". By the time the spam starts arriving the spammer has gone.
Most spammers don't have the bandwidth that it takes to send each user a personalized message, because they are almost always on a throwaway dialup.
They also need processing power to do the personalisation, software which understands the full SMTP spec (rather than that required to get by sending to a relay) and can handle identd requests.
Only the professionals can afford to send unique messages, because they often have a DSL line and a pink contract with their ISP (which permits them to continue spamming).
They also need a frequently changing IP address...
re: Countermeasures: the spammer would integrate something random into the message that would foul identification. There is simply no way around this. So the question becomes: at what point does the countermeasure become so expensive and difficult that the spam itself reaches the point of diminishing returns?
Forcing spammers to customise each email would make spamming considerably more expensive, because they would then have to actually send each email, rather than being able to use third-party relay machines to duplicate their junk.
What you are describing is basically a "Teergrube" (German for tar pit).
The problem is that ISP-provided third-party relays render this method useless...
What you really need is some generic mail-message pattern-matching and a complaint & moderation system. You don't really need automatic detection of spam, since there would probably be plenty of people willing to complain if there was an effective place to complain to, and if mail clients as well as mail servers could consult the spam-detection service to eliminate confirmed spam before it reaches your eyeballs.
Because everybody knows that Orange rinds offer better memory density than banana peels. And orange peels are more resistant to the excess steam from the CPU. Banana peels would just disintegrate with even a minimal amount of overclocking.
No boom today. Boom tomorrow. There's always a boom tomorrow. - Cmdr. Susan Ivanova
Let's invent a cheap signature algorithm which you can run on a message to digest it into a form you can compare with a blacklist, but which can cope with simple measures to vary the contents. (I am using the term "signature" as in the "signature analysis" used by hardware engineers for fault detection.)
One idea (let's hear more):
Compute a word-frequency histogram. Reject noise words like "a" and "the". Include long words that are relatively rare. Take the top 5 words that are common and the top 5 words that are long and store them with a frequency count.
To compare similarity, consider each word as a separate dimension and calculate the Euclidean distance.
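Jamie's signature can be sketched as follows. The stop-word list, the eight-letter cutoff for "long" words and the longest-first ordering of long words are assumptions, since the comment leaves them open; `signature` and `distance` are hypothetical names. Digits are ignored entirely, so a numeric counter does not change the signature.

```python
import math
import re
from collections import Counter

# Assumed choices; the comment leaves both open.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "you", "your"}
LONG_WORD = 8


def signature(text: str, top: int = 5) -> dict[str, int]:
    """Top `top` frequent words plus top `top` long words, with their counts."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in NOISE_WORDS]
    counts = Counter(words)
    common = [w for w, _ in counts.most_common(top)]
    # Longest words first; ties broken by frequency, then alphabetically.
    long_words = sorted((w for w in counts if len(w) >= LONG_WORD),
                        key=lambda w: (-len(w), -counts[w], w))[:top]
    return {w: counts[w] for w in common + long_words}


def distance(sig1: dict[str, int], sig2: dict[str, int]) -> float:
    """Euclidean distance, treating every word as its own dimension."""
    dims = set(sig1) | set(sig2)
    return math.sqrt(sum((sig1.get(w, 0) - sig2.get(w, 0)) ** 2 for w in dims))


if __name__ == "__main__":
    s1 = signature("Make money fast! Money, money, money. Congratulations, offer 1041")
    s2 = signature("Make money fast! Money, money, money. Congratulations, offer 1042")
    s3 = signature("Minutes from the committee meeting are attached")
    print(distance(s1, s2))      # 0.0
    print(distance(s1, s3) > 0)  # True
```

A filter would then reject or quarantine anything whose distance to a known spam signature falls under some chosen threshold.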
-- Jamie
A lot of the spam I receive already contains random data in the subject line (or in the body of the message) to break this. This is why the subject of some spam looks like "Free pr0n 3j1I". I believe this practice goes way back to when bots would scan newsgroups and kill spam messages. The random subject lines would render them useless.
While the system could be broken by using counters, this could be countered by parsing only certain portions of the mail or counting the frequency of certain words. This would work very well on pure-text spam, but not on attachment stuff.
Actually, that technique works reasonably well.
I used to administer the trouble ticket system for a very large ISP that got so many complaints that they became unmanageable. (Not all their fault, but that's another story.) Anyway, we had software that would take the bodies of the emails being complained about, remove whitespace and anything that wasn't in the dictionary, sort it, uniq it and generate an MD5 of the list of words that came out. I never studied it over the long haul, but tests on live data showed a match rate of about 90%.
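The ISP's approach is simple to sketch. This assumes the dictionary is a set of lowercase words (on many Unix-like systems a list can be loaded from a file such as /usr/share/dict/words, though the path varies) and that "words" means runs of letters; `word_set_digest` and `load_dictionary` are hypothetical names, not the ISP's actual software.

```python
import hashlib
import re


def load_dictionary(path: str) -> set[str]:
    """Read one word per line, e.g. from a system word list (path varies)."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return {line.strip().lower() for line in f if line.strip().isalpha()}


def word_set_digest(body: str, dictionary: set[str]) -> str:
    """Keep dictionary words only, sort and de-duplicate them, then MD5 the list."""
    words = re.findall(r"[a-z]+", body.lower())           # whitespace and punctuation go
    kept = sorted({w for w in words if w in dictionary})  # the "sort | uniq" step
    return hashlib.md5(" ".join(kept).encode("ascii")).hexdigest()


if __name__ == "__main__":
    words = {"make", "money", "fast", "click", "here"}
    a = word_set_digest("MAKE MONEY FAST!!! Click here: xq7zz", words)
    b = word_set_digest("Click here to make fast money, ref 88231 abqz", words)
    print(a == b)  # True: same dictionary words, different order and junk
```

Because the words are sorted and de-duplicated, reordering the text or inserting random non-words leaves the digest unchanged; swapping in a different real word does not.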
The real flaw in DCC is that it doesn't protect early recipients of the spam, because it won't have built up enough hits to be considered bulky. The only way to make it work would be to submit the checksum and hold the letter for some amount of time to see how bulky it gets. Most people would probably not like the lag time they'd get on legitimate mail.
Female Prison Rape in NY
I had an idea for detecting and proving when a site has sold your email address or spammed you - I posted it to comp.mail.misc here:
http://groups.google.com/groups?q=author:greenrd%40hotmail.com&hl=en&safe=off&rnum=1&selm=f9cd2ccc.0107291915.68572f17%40posting.google.com
;-)
Here is the post:
---
Lots of websites now have privacy policies saying "we will not sell
your email address, or send you unsolicited emails" - sometimes you
have to check a checkbox to make it come into effect. But how can you
trust them? Well with this hypothetical idea, you wouldn't have to:
The first part is easy, and well-known. Generate a one-time email
address (various means are available). Associate it with the site
(e.g. by naming it something like fake-addy-ebay@mycomputer.com if
you're registering with ebay, say) Give it to the sign-up form,
purchase form or whatever. If you actually want to receive a limited
kind of email from them, or want to know if/when they've broken their
promise, ensure that this one-time email addy forwards to a real
address of yours, or at least ensure that you'll be able to read mail
sent to it.
Trivial extension (and too trivial to be patentable, besides, this
post constitutes sufficient Prior Art) - How can you prove that you've
never used this email address again, by accident or on purpose, in
order to nail the spammers in court? You can't on your own - but what
about a trusted third party? Call it TTP. In order to make the process
virtually beyond suspicion, TTP would provide special form-filling
software, activated by you the user. When you're asked for your email
address by a site you don't trust, you'd activate the software and it
would send the form to TTP's servers, which would generate a one-time
email address, store it in their database, and forward the filled-in
form to the real site (transferring an existing session to another IP
could be tricky, but you'd probably just have to log in to the site
again through TTP's proxy if you weren't already using it - and in
most cases you wouldn't be logged in to the site yet, you'd still be
registering). TTP database would also record which privacy options
you'd ticked on the form. The real generated email address is NEVER
transmitted to your machine - the software is designed so it's
virtually impossible for the user to surreptitiously find out what the
email address is. All mail sent to that address (up to say 100 emails)
would be logged in TTP's database, and forwarded to the user with the
To address replaced with the user's real address. Total storage space
required per user on TTP's servers: minuscule.
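The core of TTP's bookkeeping is a mapping from each generated address to the site it was given to, the privacy options ticked, and a log of incoming mail. A minimal in-memory sketch follows; the class and method names are hypothetical, the 100-message cap comes from the post, and reading that cap as "log the first 100, keep forwarding" is an interpretation. The generated address is returned to TTP's form-forwarding proxy, not shown to the user.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional

MAX_LOGGED = 100  # "up to say 100 emails", per the post


@dataclass
class Alias:
    site: str                         # where TTP forwarded the filled-in form
    real_address: str                 # the user's own mailbox
    privacy_options: dict[str, bool]  # what the user ticked on the form
    log: list[str] = field(default_factory=list)  # senders of mail received


class TrustedThirdParty:
    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.aliases: dict[str, Alias] = {}

    def register(self, site: str, real_address: str,
                 privacy_options: dict[str, bool]) -> str:
        """Create a one-time address for `site` and record the mapping.

        The address goes into the form TTP forwards to `site`; in the scheme
        described above it is never sent to the user's machine.
        """
        while True:
            address = f"{secrets.token_hex(8)}@{self.domain}"
            if address not in self.aliases:  # never issue the same address twice
                break
        self.aliases[address] = Alias(site, real_address, privacy_options)
        return address

    def receive(self, to_address: str, sender: str) -> Optional[str]:
        """Log incoming mail; return the real address to forward it to, if any."""
        alias = self.aliases.get(to_address)
        if alias is None:
            return None
        if len(alias.log) < MAX_LOGGED:
            alias.log.append(sender)
        return alias.real_address

    def evidence(self, to_address: str) -> tuple[str, dict[str, bool], list[str]]:
        """Site the address was given to, options ticked, and who mailed it."""
        alias = self.aliases[to_address]
        return alias.site, dict(alias.privacy_options), list(alias.log)


if __name__ == "__main__":
    ttp = TrustedThirdParty("ttp.example")
    addr = ttp.register("ebay.com", "me@example.org", {"share_with_third_parties": False})
    ttp.receive(addr, "bulk@spammer.example")
    print(ttp.evidence(addr))
```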
Possible problems:
1. Would a court trust TTP sufficiently to make their evidence pass
muster on its own?
2. How could TTP prevent a malicious user finding out the generated
email address by filling out a form on a server which THEY (the
malicious user) owned or had access to? Fortunately, they don't have
to PREVENT it - all they have to do is RECORD where the data was sent
- so if the address was actually sent to haxxors.com owned at the time
by J. Cracker, and the complaint is by J. Cracker against yahoo.com,
you can be pretty sure it's a scam. Heh.
3. Obviously, you have to trust TTP itself with your personal info!
That's why it's called a Trusted Third Party, duh!
If these problems can be overcome, the best part is, if someone is
stupid enough to sell your one-time email address to hundreds of
spammers, you could use this virtually cast-iron evidence from TTP to
sue both the list-seller for breach of contract (or whatever laws are
most suitable to sue them under) AND ALL the spammers you could track
down to a physical address (if you're in a suitable antispam
jurisdiction)! Catch them red-handed! If they're selling something
they have to be traceable to a physical address. And it's not only you
that benefits - ANY rogue company would think twice about selling
their email lists after one or two high-profile cases like that.
If you wanted to be REALLY REALLY secure against arguments that "TTP
could have issued same email addy twice by accident" - but this is
probably over the top - maybe you could get it notarised with a
"Trusted Fourth Party" specialising in notarising (but it'd have to be
cheap). Disclaimer: I know nothing about notarising.
Now, one unsolicited email is not necessarily enough to interest the
courts in all antispam jurisdictions. But with this process automated,
it'd be far easier to form a class action suit to make it more
sizeable (I would imagine - IANAL) - when one stupid company sent out
spams to 10,000 TTP users that had registered with them - especially
REPEATED "this is a one time mailing" spams, grrrrrr - they'd be
toast! And in some jurisdictions TTP could join the class action suit
and claim even more damages, because it'd in effect be the ISP for
those email addresses! (Remember, TTP is not just a spam honeypot -
you can choose to receive legitimate kinds of emails through it - so I
wouldn't imagine the defendants could seriously argue it was
entrapment)
If anyone's seen this idea before somewhere, please point me in the
direction...
If this works, someone who got in first with being a Trusted Third
Party in this scheme could clean up... if lots of people care that
much about nailing spammers... and I think they do! I would DEFINITELY
pay a modest amount to use this kind of service!
Let's look at the ideal scenario:
1. You never list your email addresses anywhere public (at least not
without spamproofing them first)
2. You use TTP software for all your transactions, because it's so
easy
3. ANY spam you get can be tracked down to either one of:
i) A rogue company who you can PROVE in court either spammed you, or
sold your address without your permission.
ii) If TTP has no record of it, you can be 99% sure it's because you
have a rogue email PROVIDER who sold your email address, or listed you
in a public member directory even though you told it not to. In this
case, you can't necessarily prove it, but there's a simple remedy -
switch provider.
Comments? Obvious flaws? It is very late so I might have missed
something obvious. Please let me know - but please DON'T email me -
I'll read replies on comp.mail.misc.
What is the phone equivalent of goatse.cx?
I, for one, am always pissed off when I spend hours on my dialup leeching pr0n from some newsgroup, only to discover that I already had it on my drive under a different name. Somewhere along the line, somebody renamed the series.
A database of image characteristics (like those used by D'peg!) would make this less likely. People would be discouraged from changing the file's originally agreed-upon universal name.
Publishers could upload their image characteristics into the database, along with a tag like "Originally from somepornsite.com". So if I someday come across an image I really like, I could check the database and see where to get the rest of the series. This would supersede obnoxious watermarking to indicate the source of an image.
This could of course be used for MP3s too, which are all too often renamed incorrectly. Checksums would be enough for a particular song encoded by a particular encoder with particular parameters, but audio fingerprinting would be necessary to accommodate different encoders. I don't think that's a deal-killer.
By the way, D'peg! is really neat, but it's amazingly slow the first time if you have a lot of images. (As in: My win98 uptime record is 11 days. Dpeg's projected completion time was 34. Good thing it can resume after a crash.)
Plus, mail filters have the benefit of not breaking in the face of a trivial change to the body (like a counter).
--
I have no fin
no wing no stinger
no claw no camouflage
I have no more to say...
150 Opening BINARY mode data connection for slashdot.sig (129323052 bytes).
In addition to the raw message checksum, possible filters include:
- checksum paragraphs individually
- ignore whitespace, punctuation and capitalization.
- drop HTML tags
- drop numbers
- drop all non-dictionary words.
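Most of the filters in that list compose into one normalization pass followed by a per-paragraph checksum. A sketch, assuming blank lines separate paragraphs, a crude regular expression for HTML tags, and MD5 as the checksum; the dictionary filter is optional and left to the caller, and the function names are hypothetical. Dropping digits alone leaves the letters of an alphanumeric counter such as "3j1I" behind, which is where the dictionary filter helps.

```python
import hashlib
import re
from typing import Optional

HTML_TAG = re.compile(r"<[^>]*>")  # crude: fine for a sketch, not a real HTML parser


def normalize(paragraph: str, dictionary: Optional[set[str]] = None) -> str:
    text = HTML_TAG.sub(" ", paragraph)          # drop HTML tags
    words = re.findall(r"[a-z]+", text.lower())  # drop case, whitespace, punctuation, digits
    if dictionary is not None:
        words = [w for w in words if w in dictionary]  # drop non-dictionary words
    return " ".join(words)


def paragraph_checksums(body: str, dictionary: Optional[set[str]] = None) -> list[str]:
    """One MD5 per blank-line-separated paragraph, computed after normalization."""
    digests = []
    for paragraph in re.split(r"\n\s*\n", body):
        normalized = normalize(paragraph, dictionary)
        if normalized:  # paragraphs that normalize to nothing are skipped
            digests.append(hashlib.md5(normalized.encode("ascii")).hexdigest())
    return digests


if __name__ == "__main__":
    a = "<b>Make MONEY fast!</b>\n\nOffer #1041 expires soon."
    b = "Make money   fast\n\n<p>Offer 1042 expires soon!</p>"
    print(paragraph_checksums(a) == paragraph_checksums(b))  # True
```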
Then analyze what gets by and add new filters as appropriate.
A while ago I thought of another way spam could be blocked. Instead of checksumming the whole message, why not just create a database of, say, phone numbers and fax numbers and domains included in spams? MTAs could check to see if an inbound email contains any spammer-advertised phone numbers or domains in a database and flag the message appropriately. Spammers cannot easily change telephone numbers.
Spammers could write the phone numbers or domains oddly in the email to try and pass the filter, but a sufficiently liberal regular expression could pick it out.
Speaking of regexps, maybe this database could be a giant database of regular expressions which match snippets of spam messages?
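A liberal pattern match of the kind suggested above might look like this. It assumes North American ten-digit numbers, simple domain syntax and a blacklist held in two Python sets; the patterns and function names are illustrative only, as are the fictional number 800-555-0199 and the reserved-TLD domain cheap-pills.example. Domains are matched by suffix so that www.cheap-pills.example hits an entry for cheap-pills.example. Numbers written digit by digit ("8 0 0 ...") would still slip through.

```python
import re

# Liberal on purpose: optional parentheses, and any mix of spaces, dots or
# dashes between the three groups of a North American ten-digit number.
PHONE = re.compile(r"\(?\b(\d{3})\)?[\s.\-]*(\d{3})[\s.\-]*(\d{4})\b")
DOMAIN = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


def find_phones(body: str) -> set[str]:
    """Phone numbers reduced to bare digits, e.g. '8005550199'."""
    return {"".join(m.groups()) for m in PHONE.finditer(body)}


def find_domains(body: str) -> set[str]:
    return {d.lower() for d in DOMAIN.findall(body)}


def is_flagged(body: str, bad_phones: set[str], bad_domains: set[str]) -> bool:
    """True if the body advertises a blacklisted number or domain (or subdomain)."""
    if find_phones(body) & bad_phones:
        return True
    return any(d == bad or d.endswith("." + bad)
               for d in find_domains(body) for bad in bad_domains)


if __name__ == "__main__":
    phones, domains = {"8005550199"}, {"cheap-pills.example"}
    print(is_flagged("Call (800) 555-0199 today!", phones, domains))         # True
    print(is_flagged("Visit WWW.Cheap-Pills.example now", phones, domains))  # True
    print(is_flagged("Lunch at noon?", phones, domains))                     # False
```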
Some of the researchers associated with Google have been working on identifying similar, but not identical web pages. At a talk I attended, J. Cho described the process of fingerprinting documents (rather than checksumming them).
These papers might be interesting:
The fact of the matter is that, no matter how difficult we try to make it to send out mass, unsolicited e-mail, it will always be a cheap form of advertisement. Compared with other forms, spam is cheap and easily automatable. With little effort and cost, I can send spam to millions of unique individuals. The core function of the computer is automation. Technology is blind. You can't get the computer to automate most forms of communication without allowing the automation of unsolicited advertisement.
Since we can't increase the technological costs of spam, the only good way to make spam more costly to the sender is to regulate it. The govt. should require that all spam have "[SPAM]" in the subject line, with additional labels for spam that advertises stuff that's inappropriate for certain groups of individuals (PORN, GOATSEX, etc.). Furthermore, the govt. should impose stiff fines and penalties for violators ($$$ & jail time, maybe even the chair?).
It's nice to think that you can fix everything with technology. Over time, everyone comes to the realization that government is there for a reason; it's a necessary evil that does, on occasion, make the world a better place.
Jason
Naive Bayes is a damn good text classifier that has already proven to be a good spam identifier. The problem is that no such automated classifier system will ever be able to get rid of most spam without throwing away a few non-spam messages too. It's a fact of life.
Btw, check out
http://www.picante.com/~gtaylor/spam/
to read about someone's efforts to get rid of spam via a slew of techniques, including an automated classification system (Naive Bayes).
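For reference, a textbook multinomial naive Bayes classifier with add-one smoothing fits in a few dozen lines. This is a generic sketch, not the system described at the linked page; it assumes both classes have at least one training message, and the labels, tokenizer and toy data are made up. The `margin` parameter is one way to trade missed spam for fewer discarded legitimate messages, the trade-off the comment describes.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing, two classes."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label); needs both classes trained."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denominator)
        return score

    def is_spam(self, text: str, margin: float = 0.0) -> bool:
        """A positive margin demands stronger evidence before calling it spam."""
        return self.log_score(text, "spam") - self.log_score(text, "ham") > margin


if __name__ == "__main__":
    nb = NaiveBayes()
    for text in ("make money fast", "cheap money offer"):
        nb.train(text, "spam")
    for text in ("meeting notes tuesday", "lunch tuesday"):
        nb.train(text, "ham")
    print(nb.is_spam("money fast"))     # True
    print(nb.is_spam("tuesday lunch"))  # False
```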
Jason
I think the next sendmail/postfix/whatever release should come with such a rule by default.
In the case of big ISPs, they should force POP auth or something to allow relaying, and if somebody sends spam, they should JUST CHARGE HIM an insane amount of money for each spam sent.
---
HTML is obsolete. It's time for a new, simpler and richer markup language.
Has it occurred to you that admins who run open relays probably don't check the mail of the postmaster account very often? They probably don't know how to set up the MTA to forward the postmaster's mail to a login account. (Maybe the box was set up by a friend, or the person who used to run it has left the company - either way, whoever's currently in charge of the box has no idea how to configure the MTA otherwise they wouldn't be wasting their resources running an open relay.)
--
diff message1 message2 | wc -l
--
This will never work! Once again we are trying to use the wrong tool for the job. The problem with this approach is that it focuses on the SPAM itself and not the SPAMMER. Killing SPAM will not stop the SPAMMER from SPAMMING again. However, there is an excellent research group that has developed tools specifically designed to eliminate the root cause of the SPAM problem. That's why I am proposing using Magnum Research's excellent anti-spammer utility. If every sysadmin would update their security tools with Magnum Research's hardware and use them daily against SPAMMERS, SPAM would be gone in a matter of months, if not weeks.
Strange women lying in ponds distributing swords is no basis for a system of government.
Sorry, put in a trailing slash. Here is the correct link.
Check out the services of spamcop.net. It lets you submit spam mail, extracts the IPs from the header (discarding the bogus ones), lets you automatically send a note to the abuse department of the offending ISP, and tells you exactly how many people have submitted the same message and how many times that ISP has been responsible for messages that generated a SpamCop complaint. Very cool.
There's 10 types of people in this world, those who understand binary and those who don't.
Because the government runs on kickbacks. If they don't get their pound of flesh, legislation doesn't get passed. You think those mongoloids in suits actually give a shit about the issue itself? They're just a bigger version of the mafia, and the Don requires his tithe for you to do business on his turf.
Deosyne
certain histogram patterns would be common in non-spam email messages
There is no such thing as a "common histogram". They will all be different. However, two identical messages will have identical histograms, and two almost identical messages will have almost identical histograms (while two almost identical messages usually have very different checksums).
The reverse is usually true as well (of course, there's no absolute guarantee): two almost identical histograms are very likely to come from two almost identical messages. The more you increase N (the range of the hash result and the size of the histogram), the more accurate the result. Also, using trigrams would likely be more accurate.
While it is possible for spammers to vary their messages, they cannot send thousands of messages that are really different from one another, and this is why this technique should work almost all the time. Of course, you'd need to get rid of the headers, any HTML tags and other garbage before computing the histograms.
Opus: the Swiss army knife of audio codec
They could look at the histogram of a bunch of regular emails and just send the spam messages whose histograms are close to a lot of the histograms of the regular emails. This assumes that spammers would have access to the hash function though.
Once again, you're assuming there is such a thing as a "normal histogram". Remember that we're not checking whether the "histogram" is normal or not. We're checking whether this particular histogram (from a spam e-mail) has been seen more than x times before. Even if they manage to get a piece of spam to match the exact same histogram as a valid e-mail, the spam will still be rejected. The unfortunate side effect is that the valid message might be rejected too, but since they cannot read your mail, they cannot deliberately get one of your e-mails rejected.
As for the CPU time, sure you don't want to make N too large...
what similarity function would you use?
Manhattan distance, aka L1 norm of the difference.
And the reason I said it should work is that I already tried it a while ago for a slightly different task. The only thing I'm not too sure about is CPU time.
As for histogram randomness, even if the N-dimensional (N ~ 1000) vectors (histograms) don't have a uniform distribution in the 1000-D space, you'd have to be very unlucky to get the same (or approximately the same) value in all 1000 bins.
One way that would be much more effective is to take pairs of words (e.g., in this sentence: "One way", "way that", "that would", ...) and apply a hash function that returns a number between 0 and N (N usually between 1000 and 100000). You then compare the histogram of the mail (how many times each hash value occurs) against the database. If the histogram is too close to that of a known spam message, you delete it.
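The word-pair histogram and the Manhattan (L1) distance mentioned in this thread can be sketched as follows. The choice of CRC32 as the hash, N = 1000 bins and the sample messages are all assumptions. Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import re
import zlib

N_BINS = 1000  # "N" from the post; an assumed value

def bigram_histogram(text, n_bins=N_BINS):
    """Hash each pair of adjacent words into one of n_bins buckets and count them."""
    words = re.findall(r"\w+", text.lower())
    hist = [0] * n_bins
    for first, second in zip(words, words[1:]):
        bucket = zlib.crc32(f"{first} {second}".encode("utf-8")) % n_bins
        hist[bucket] += 1
    return hist

def manhattan(h1, h2):
    """L1 norm of the difference between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

spam = "Make money fast with this amazing offer click here now"
variant = spam + " 48213"  # the same spam with a counter appended
print(manhattan(bigram_histogram(spam), bigram_histogram(variant)))  # 1
```

Appending a counter adds exactly one new word pair, so the distance moves by 1. A whole-message checksum would change completely. A real system would compare against stored spam histograms with a distance threshold scaled to message length.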
I've noticed, though, that since my throwaway accounts all have 'spam' in the user name, I can actually go months without having to delete the forwarder, despite using it regularly. Perhaps they automatically filter the 'spam' part out in an attempt to parse the actual address, as lots of people stick 'NOSPAM' et al. into their addresses to block mail harvesters.
--
Download the source tarball, uncompress, untar and read
On a deeper note, it's sad that so many Slashdot readers, including, apparently, CmdrTaco, underestimate others so severely. Do you really think someone put in the effort to make something like DCC and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found: Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.
So ...
it's in my head
This isn't hard. Most of the spam I get is addressed to me personally, i.e., my name or email/nickname is in it. That changes the checksum for everybody except people with the same name as me.
--------
--------
It's OK to be social, just don't tell anyone about it.
This method won't work because identical spam is often sent from many different relays. Of course most spam includes at least SOMETHING that is either random (random numbers in the subject is common) or personalized ("Dear xyz@example.com").
If spam were this easy to filter it would have been implemented a long time ago.
Actually, I just don't have any friends. Last week I probably got about 10-15 calls a day and I don't recall getting a single call from someone I knew. My girlfriend in Belize who usually calls me every week didn't even call :(.
--BEGIN SIG BLOCK--
I'd rather be trolling for goatse.cx.
Things you think are in the Constitution, but are not.
I never said I wouldn't try to reduce the amount of mail I get. But it doesn't bother me to the point where I would want laws against it or legal action to be taken against anyone who sent me an unsolicited e-mail.
I haven't figured out why the online community is so uptight about getting unsolicited e-mails and having companies selling out their e-mail addresses to people. About 80% of the mail I get at my house is unsolicited and 95% of the phone calls I get are salesmen. How did they get my number/address? Most likely the phone company (or credit card company) sold it to them and this is a very common practice. I guess I just don't see what the big deal is when e-mail is so much easier to delete/avoid than unsolicited real mail and phone calls.
:).
After all, e-mail is checked when I want to check it and when I see any subject asking me what the state of my sexual arousal is or offering me a university diploma or just something from 348djkea23@yahoo.com I know I can easily delete it. It's not like a phone call where I don't know who's calling me and I kind of have to answer it right then. I do have caller id, but that's an additional service I have to pay for and most of my friends are out of state so they show up as 'unavailable' along with all the other salesmen.
For unsolicited mail, I have to handle it no matter what, I can't just leave it in my mail box forever. But with e-mail I never really have to see it and I can delete it without having to ever give it a second thought and it's gone gone and not just taking up space in my trash can or recycle bin.
Perhaps someone here can enlighten me.
p.s. I'm sure I have more to say on this topic, but I really need to be getting back to work
Consider the way mammalian immune systems work:
1) an immune system cell gobbles the nasty virus or microbe.
2) it chops up the viral genome into little chunks of various sizes
3) the chunks are presented for recognition.
Now adapt this to a net-based system:
1) people get spam; they forward emails they definitely consider to be spam to spamocyte.com (or whatever), where it is chopped up into various overlapping chunks and the chunks are checksummed.
2) MTAs pick random chunks from each email and send them off to spamocyte.com (or a local copy/cache of the database) for checking.
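The two steps can be sketched as below, assuming fixed 24-character chunks hashed with MD5 and a plain in-memory set standing in for the spamocyte.com database. The chunk size, the sample count and the 60% threshold are assumptions, and spamocyte.com itself is only the post's placeholder name. The database stores a chunk at every offset, so the MTA's random samples do not need to line up with any particular grid.

```python
import hashlib
import random

CHUNK = 24  # characters per chunk (assumed; the post suggests chunks of various sizes)

def digest(chunk):
    return hashlib.md5(chunk.encode("utf-8")).hexdigest()

def spam_chunk_digests(text, size=CHUNK):
    """Step 1 (server side): chop a reported spam into overlapping chunks and checksum each.

    Every offset is stored, so a chunk sampled anywhere in a copy of the
    spam can be found without worrying about alignment.
    """
    return {digest(text[i:i + size]) for i in range(len(text) - size + 1)}

def mta_check(text, database, samples=5, threshold=0.6, rng=random):
    """Step 2 (MTA side): look up a few randomly chosen chunks of an incoming mail."""
    positions = range(len(text) - CHUNK + 1)
    if len(positions) == 0:
        return False
    picked = rng.sample(positions, min(samples, len(positions)))
    hits = sum(digest(text[i:i + CHUNK]) in database for i in picked)
    return hits / len(picked) >= threshold

spam = "Lose 30 pounds in 30 days with our amazing new herbal formula, guaranteed!"
database = spam_chunk_digests(spam)
print(mta_check(spam, database))  # True
```

Only a few digests per message travel over the network, which speaks to the bandwidth concern. Storing every offset is the price paid for that, in database size.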
Problems: 1) spammers can mutate emails via a madlibs engine much more than natural viruses can mutate their genome. 2) bandwidth issues.
Still, I think it's an approach worth considering.
"World Domination - a fun, family activity"
Check out http://www.scambusters.org/809Scam.html if you don't know what I'm talking about.
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
Most spammers use some sort of random character string in both the subject and the body to get around filters that look for identical messages being sent to the same system. I don't think checksums are going to do any better than the current filters that look for dupes. Sure, you could just look at the first N lines, but spammers are also inserting invalid HTML tags in their messages to foil pattern matching. Since the tags are invalid, people don't see them (considering that most people use some sort of HTML-enabled mail reader).
No replies made to AC posts. Please log in.
All a spammer would have to do is add invalid HTML tags all over his/her spam. Most users use some sort of HTML-based mail reader, and the invalid tags would not show. Look at the HTML source of this post to see for yourself. They can even put the tags in the middle of words, to be an even bigger bastard/bitch.
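One common countermeasure to the invalid-tag trick is to checksum only what the reader would actually see. The sketch below deletes anything between angle brackets, collapses whitespace and lowercases before hashing. The regex approach and the MD5 choice are assumptions. This is not a real HTML parser: entities, comments and CSS tricks are not handled.

```python
import hashlib
import re

TAG = re.compile(r"<[^>]*>")  # anything between angle brackets, valid tag or not

def visible_checksum(body):
    """Checksum the visible text: strip tags, collapse whitespace, lowercase."""
    text = TAG.sub("", body)
    text = " ".join(text.split()).lower()
    return hashlib.md5(text.encode("utf-8")).hexdigest()

plain = "Make money fast! Click here."
obfuscated = "Ma<qzx>ke mon<bogus id=7>ey fast!   Click <zz>here."
print(visible_checksum(plain) == visible_checksum(obfuscated))  # True
```

Deleting tags outright, rather than replacing them with a space, rejoins words that were split mid-word. The cost is that two words separated only by a tag (e.g. `fast!<br>Click`) get glued together.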
I have already posted a way to get around that. Look here. For the goatse.cx-paranoid, here is the link to cut and paste:
http://slashdot.org/comments.pl?sid=01/07/30/1444247&cid=48
If your mailer is smart enough, you can automatically filter on this info, e.g. whether the subject line ends in a number, with a grep something like: " [0-9]+$"
;)
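In Python, the same "subject ends in a number" test might look like the sketch below. It is roughly what `grep -E ' [0-9]+$'` matches, except that `\s` also accepts a tab. The function name is hypothetical.

```python
import re

# Whitespace followed by digits at the very end of the subject, e.g. "The Scale Moved!! 4821".
TRAILING_NUMBER = re.compile(r"\s[0-9]+$")

def has_trailing_counter(subject):
    return TRAILING_NUMBER.search(subject.rstrip()) is not None

for s in ["The Scale Moved!! 4821", "Lunch at 12 today?", "Re: release 2"]:
    print(s, "->", has_trailing_counter(s))
```

The third subject also matches, which is a reminder that a rule this crude should score or flag mail rather than silently delete it.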
Of course, I filter most of mine by searching for "The Scale Moved!!"
-shpoffo
I'll point out that many people (such as a company that I have worked with) don't read their admin mail at all.
Many spammers already put unique identifiers (sometimes wholly different gobs of text) in their spams so that they aren't easily spamcaught. Many also personalize their email so that it includes the (at least according to their records) name of the addressee.
In short, there are many types of spam that this mechanism will fail to catch today, much less if such a system becomes widespread. It's too late for such a half-hearted measure.
max
When trying to solve the issue of Spam mail, you invariably have to define Spam. Perhaps that's the real problem, or the first we have to solve... Most of us have an idea of Spam, and we can all agree that a certain e-mail IS, or IS NOT Spam. However, making a machine do something that we consider so trivial is nearly impossible.
However, there is a technology that is capable of performing this task: a neural network. Granted, setting up the input channels would be a little tricky, but once you did that, there is no end to the examples you could use to train this neural net. The net would even be able to categorize e-mails into "almost certainly Spam", "probably Spam", "probably not spam" and "almost certainly not Spam".
The prohibitive cost of such a system would actually be the hardware, since simulated neural nets require lots of FLOPS. On the other hand, you can mass produce a pre-trained neural net for relatively little. Therefore, if someone could train a net to do the job, you could sell the solution as a plug-in PCI card for a computer. Just filter all the emails through the card at the MTA level.
Perhaps I'm getting a little too carried away; does anyone know of someone who's tried applying neural nets like this?
"I have never let my schooling interfere with my education." - Mark Twain
You could fix that by checksumming individual paragraphs. If more than 95% of an email's paragraphs match the checksums of a known spam, it can safely be rejected. This will require more storage, but the processing time won't be significantly longer (the longest time is calculating the checksums, which will take the same time for individual paragraphs as for the whole message, since it's a per-character time).
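A minimal sketch of per-paragraph checksums with the "more than 95%" rule follows. The paragraph splitting on blank lines, the whitespace normalization and the MD5 choice are assumptions.

```python
import hashlib

def paragraph_digests(body):
    """Checksum each blank-line-separated paragraph after collapsing whitespace."""
    paragraphs = [" ".join(p.split()) for p in body.split("\n\n")]
    return [hashlib.md5(p.encode("utf-8")).hexdigest() for p in paragraphs if p]

def matches_known_spam(body, known_spam_digests, threshold=0.95):
    """Reject when MORE than `threshold` of the paragraphs match known spam."""
    digests = paragraph_digests(body)
    if not digests:
        return False
    hits = sum(d in known_spam_digests for d in digests)
    return hits / len(digests) > threshold

spam = "Dear friend,\n\nMake money fast.\n\nClick here now."
known = set(paragraph_digests(spam))
variant = spam.replace("Dear friend,", "Dear bob@example.com,")
print(matches_known_spam(variant, known), matches_known_spam(variant, known, threshold=0.6))
```

On this three-paragraph example, personalizing only the greeting drops the match to 2/3. That is enough to slip under a 95% threshold but not under 60%, which is exactly the tolerance problem raised below for short messages.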
You could even improve this when you've received several of the same by cross-comparing them and working out which paragraphs change and which stay the same. You could then combine the individual paragraph checksums into a single checksum, and only check that part of the message - that'll save on storage of lots of checksums.
The only trouble I can see is when this is one of those three-line ones that just says "Feeling horny? Go to here for XXX" or whatever. If those added some destination-specific heading, it would be difficult to set the filter tolerances tight enough so that genuine emails with one or two sentences that match don't get filtered.
Grab.
E.g., if we assume that much of the spam problem comes from open relays, then recognising that more than N% of local users have received a message sent through a given relay may be enough to flag it as suspicious.
Doesn't help the mailing list problem of course.
I think the best anti-spam measure is simply to divide email into high quality and low quality lists based on the sender and have the user say which senders should be treated as high quality in future. If people you sent mail to were added to the high quality list by default that would take much of the work out of it. Since this way you are trying to pick out good stuff rather than remove spam, it is harder to counter.
Add to that a magic word system. Messages with the magic word in the subject are tagged as high quality. Then you can give people you really want to hear from the magic word along with your email address. Change the word regularly and old information won't come back to spam you.
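The sender list and the magic word can be sketched in a few lines. The function names, the in-memory set and the example magic word are hypothetical, and a real mailer would persist the list. Note that sender addresses can be forged, so this raises the bar without closing the door.

```python
def remember_recipient(trusted, address):
    """Anyone you send mail to is treated as high quality from then on."""
    trusted.add(address.lower())

def quality(sender, subject, trusted, magic_word):
    """Sort incoming mail into 'high' or 'low' quality (low is set aside, not deleted)."""
    if sender.lower() in trusted:
        return "high"
    if magic_word.lower() in subject.lower():
        return "high"
    return "low"

trusted = set()
remember_recipient(trusted, "Alice@Example.com")
print(quality("alice@example.com", "lunch?", trusted, magic_word="aardvark"))  # high
```

Changing the magic word only requires changing one argument. Old copies of it then stop working, as the post suggests.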
_O_
_O_
.|< The named which can be named is not the true named
http://samba.anu.edu.au/rsync/
and
http://samba.anu.edu.au/rsync/tech_report/
discuss the "rolling checksum" that is used in the rsync technology - as the file is checksummed in pieces, a counter would have to be placed in each piece.
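The rsync technical report describes a weak checksum built from two running sums modulo 2^16. The first, a, is the plain sum of the bytes. The second, b, weights each byte by its distance from the end of the block. The combined value is a + 2^16 b. Sliding the window by one byte updates both sums in constant time, which is what lets every offset be checked cheaply. The sketch below follows that recurrence. It omits rsync's second, strong checksum, which confirms candidate matches, and the function names are not rsync's.

```python
M = 1 << 16

def weak_checksum(block):
    """rsync-style weak checksum (a, b) of a whole block of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def rolling_checksums(data, n):
    """Yield (offset, checksum) for every n-byte window, updating in O(1) per step."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield 0, (b << 16) | a
    for k in range(len(data) - n):
        out_byte, in_byte = data[k], data[k + n]
        a = (a - out_byte + in_byte) % M
        b = (b - n * out_byte + a) % M
        yield k + 1, (b << 16) | a

for offset, cs in rolling_checksums(b"Make money fast 48213 click here", 8):
    print(offset, hex(cs))
```

Because every window gets a checksum, a spam block is found even if random text of arbitrary length is inserted in front of it. That addresses the offset problem that a fixed split into pieces would have.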
I guess this doesn't solve the problem of server resources getting stolen, but it certainly saves me from having to look at the crap.
-matthew
"THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
After some hesitation, tech support informed me that they had been put on the Black Hole list because their email server had an open relay. He also told me it would take 4 to 6 weeks to fix the problem. Something about not wanting to disrupt service to their customers.
Now this is not some fly-by-night organization. I picked them a couple of years ago because I was looking for a professional hosting service, and everything about them seemed to indicate that they were one.
Anyone out there know how to close an open relay? Maybe they'll hire you.
Someone needs to collect all these ideas together and make a nice pluggable framework for it. I'm not sure how it does it, but hotmail's spam filter has stopped 100% of my spam so far, with no false positives. If they can do it, so can we.
ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
Dear jedwards, Look at this amazing
which renders checksumming the whole message a bit useless.
One thing bothers me, though. As I was clearing out a large 'stuck' email for one of our dial-up customers the other day, I casually mentioned "Wow, you sure do get a lot of spam!", to which they replied "What's that?" "You know, junk email." "Junk e-mail? I read it all." People like that are why our boxes receive such garbage. You fire enough bullets and SOMEone is going to die.
What, me worry?
Because you know the content is written in a human language, and because we know an awful lot about the nature of language, we can leverage this information to do intelligent processing on the content other than just doing a dumb-ass CRC on the byte values.
For example, tokenize the message into words, drop noise words, stem the rest, assign each an unvarying numeric value from a dictionary, histogram them, drop each extremity of relative abundance, and then checksum that. Hardly rocket science -- in fact pretty crude by text processing standards and just related as an example of the sort of things you can do to exploit linguistic characteristics. Other techniques like ngrams have a lot to offer here.
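The pipeline from this paragraph can be sketched as below. Several parts are stand-ins: the stop-word list, the toy suffix stripper (in place of a real stemmer such as Porter's), CRC32 as the "unvarying numeric value" (in place of a real dictionary), and trimming one rank from each end of the frequency list as one reading of "drop each extremity".

```python
import hashlib
import re
import zlib
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "you", "your", "for", "this", "it"}

def crude_stem(word):
    # Toy suffix stripper standing in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def linguistic_checksum(text, trim=1):
    words = re.findall(r"[a-z]+", text.lower())                    # tokenize (digits dropped)
    stems = [crude_stem(w) for w in words if w not in STOP_WORDS]  # drop noise words, stem
    ids = Counter(zlib.crc32(s.encode("utf-8")) for s in stems)    # stable IDs, histogram
    ranked = sorted(ids.items(), key=lambda kv: (kv[1], kv[0]))    # by relative abundance
    kept = ranked[trim:len(ranked) - trim] if len(ranked) > 2 * trim else ranked
    canonical = ",".join(f"{word_id}:{count}" for word_id, count in sorted(kept))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

a = "Click here for amazing offers! Offers end soon. 48213"
b = "Clicking here for AMAZING offer!! Offer ends soon 77"
print(linguistic_checksum(a) == linguistic_checksum(b))  # True
```

Case, punctuation, numeric counters and inflection all disappear before the checksum is taken, so these two variants collapse to the same value.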
There's a world of linguistic processing techniques around, and people in this business use them every day of the week. Checksums are stone axes.
I telnet'd to a couple of addresses similar to mine. Found an open relay on port 25 of the fifth system I tried.
Lord oh lord! We are in for a heck of a time...
-Ben
I have no problem with your religion until you decide it's reason to deprive others of the truth.
*Please* choose a Bahama-$20-a-second outfit that does not itself do bulk email. Maybe there aren't any... :)
If that was the case, then wouldn't they be violating the DMCA? Then the FBI would have to go after them, right? Unless the FBI would selectively enforce, and they wouldn't do that...
--- http://homepage.mac.com/gregjsmith
All you have to do is filter on the words "This e-mail is not spam!"
Leave it to the Slashdot crowd to make things a million times more complex than they need to be...
Help save the critically endangered Blue Iguana
The big issue is counters and other subtle changes to the emails that would destroy a naive checksum.
However, multiple checksums of subsets of the email would not usually all be changed by one or a few changes/counters, and such checksums are discriminating enough to screen emails and do a very good job of detecting any widespread junk email.
It would be difficult to vary junk emails so much that the checksums of ALL their substrings of a particular length (say 20 characters) are different.
(Checksums that checksum all the strings for a particular length are not difficult to generate as a matter of fact; little more than a circular buffer is required.)
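As an illustration of "little more than a circular buffer", the sketch below hashes every 20-character substring with a Rabin-Karp polynomial rolling hash. The technique is swapped in here as an assumption, since the post names no specific checksum. A `deque` acts as the buffer, and the result is the fraction of a message's windows also seen in a known junk mail. The base and modulus are arbitrary choices.

```python
from collections import deque

WINDOW = 20           # substring length from the post
BASE = 257
MOD = (1 << 61) - 1   # a large prime modulus (assumed)

def window_hashes(text, width=WINDOW):
    """Rabin-Karp hash of every width-byte substring, with a deque as the circular buffer."""
    high = pow(BASE, width - 1, MOD)  # weight of the byte about to leave the window
    buf, h, out = deque(), 0, set()
    for byte in text.encode("utf-8"):
        if len(buf) == width:
            h = (h - buf.popleft() * high) % MOD
        buf.append(byte)
        h = (h * BASE + byte) % MOD
        if len(buf) == width:
            out.add(h)
    return out

def shared_fraction(candidate, known_junk):
    """Fraction of the candidate's windows that also occur in a known junk mail."""
    cand = window_hashes(candidate)
    return len(cand & window_hashes(known_junk)) / len(cand) if cand else 0.0

junk = "Lose 30 pounds in 30 days with our amazing new herbal formula, guaranteed!"
print(round(shared_fraction(junk + " Ref 48213", junk), 2))
```

An appended counter only spoils the handful of windows that overlap it, so most of the message still matches.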
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"
Aren't there algorithms that will report messages that are pretty close?
Yeah, there are. rsync does this as a way of trying to send only the minimum set of changes necessary to keep two ftp/web sites synchronised.
Actually, to be precise, the checksum isn't imprecise, as rsync relies on checksums of subsets of the documents it is trying to synchronise.
This neatly sidesteps the counter issue...
Impossible? I don't think so. All you have to do is this: each time somebody receives a junk email, they mark it as junk, and the mail software calculates one checksum starting at a random place in the file and uploads it to a checksum server. For any frequently received junk email, the server will fairly quickly get enough checksums that the whole document is covered.
When anybody receives an email, they can check a handful of random checksums against the checksum server, if enough of them match, then do a few more to be sure and deal with the email according to any settings by the user.
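A toy version of the scheme is sketched below. Each junk report uploads one SHA-1 checksum of a 40-character window at a random offset, and a receiver checks a handful of random windows, then a few more to be sure. The `ChecksumServer` class, the window width and the thresholds are all hypothetical.

```python
import hashlib
import random
from collections import Counter

WIDTH = 40  # characters per checksummed window (an assumption)

def window_digest(text, start, width=WIDTH):
    return hashlib.sha1(text[start:start + width].encode("utf-8")).hexdigest()

class ChecksumServer:
    """Hypothetical stand-in for the proposed central checksum server."""

    def __init__(self):
        self.reports = Counter()

    def report_junk(self, text, rng=random):
        # A user marked this mail as junk: upload ONE checksum from a random offset.
        if len(text) >= WIDTH:
            start = rng.randrange(len(text) - WIDTH + 1)
            self.reports[window_digest(text, start)] += 1

    def known(self, digest):
        return digest in self.reports

def probably_junk(text, server, first=4, more=8, need=0.75, rng=random):
    """Check a handful of random windows; if enough match, check a few more to be sure."""
    if len(text) < WIDTH:
        return False
    starts = range(len(text) - WIDTH + 1)

    def hit_rate(k):
        picks = [rng.choice(starts) for _ in range(k)]
        return sum(server.known(window_digest(text, s)) for s in picks) / k

    return hit_rate(first) >= need and hit_rate(more) >= need
```

With enough reports of the same junk mail, every offset ends up on the server, so any window the receiver samples matches. A mail nobody has reported matches nothing.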
Still, there are issues. What happens if the email marketeers start appending random web pages to their email to dilute it down? What percentage of similarity is enough? There are some fixes- I think to be successful junk mail has to be fairly short- people rarely page down to cut to the chase; but adjusting the checksum points to emphasise the beginning and end of the email is probably a good thing.
You are expecting an email to confirm a massive contract. I send it. Your clever-fuzzy-friendly spam checker decides that it's spam and bins it. We both lose a lot of money.
Who do I sue?
...if it detects all those annoying "me too" messages and treats them as spam.
-elan
I submitted a story about building a steam-powered microprocessor with RAM made out of banana peels, and that didn't get posted--why this?
Where are we going and why am I in this handbasket?
This would at least stop spam from people with bogus addresses.
rm
Sci-Fi Storm
Well there is this classic from a couple years ago on Segfault:
Mafia Don Announces New Anti-Spam Venture
Posted on Fri 02 Apr 19:25:26 1999 PST
As the NSA and FBI fear, traditional crime organizations have been incorporating high-tech communication into their organizations. Although Janet Reno was quoted stating "This is law enforcement's worst nightmare.", techies around the world are sure to be pleased with one New York Syndicate's new venture.
It all started when Don Dominiqi signed onto his AOL account last Monday morning. His inbox was filled with "Make Money Fast", "Viagra On-Line", and "Teenybopper Web Sex" ads. Lost amidst the drivel was an important note detailing a non-taxed shipment of Marlboros, which were later confiscated by the BATF. Little did he know, as he shouted "Bring me the left hand of this f*cking gutterslime!" what would become of it all.
Later that same day, Billy "Run!" Brutekowski and Larry "My Eyes!" Plucker cornered the pasty-faced offender of the Family in a small cyber cafe in Greenwich Village. "This was by far the creepiest place the Boss has ever sent us," stated Billy, who only spoke on condition of anonymity. "Everyone in this place looked pale and sickly, like they had already been 'spoken to'. We asked for this punk, and several people quickly pointed him out. Most of the scum we find in gin joints aren't so quick to finger one of their own," Billy continued.
"He must not watch much TV, because this sh*t didn't even flinch when we came to the corner he was hiding in," Larry proceeded to relate. "We dropped this sheet of paper the Boss had given us on his table and he says 'So you guys want to make money fast, eh?' He puts out his hand and says to give him $20. This scrawny little dirtball tells me to give him $20!" Larry was quite agitated at this part in his story, and his description of how Sammy Spammer's hand fell off was quite garbled.
Billy continued, "Up till now, this was a routine visit. We was just being playful. The weird sh*t began when we tried to leave." "This pimply faced kid blocks the door as we try to leave, and I'm thinking to myself 'Great, a f*cking Karate Kid hero.' He just stands there, and then he hands me a $5 bill." Billy pulls out the $5, and holds it like it is his first quarter from his favorite grandmother. "They lined up after that, and we had $175 in 'tips' when we left the joint."
Later that day the Don himself visited the café, unwilling to believe the story. Although the details are unclear, sources at the café indicate that the Don has hired them to build and host a new Anti-Spam site. Through a SSL transaction system, the site will accept spam complaints and credit card donations towards 'solutions to problems'. Multiple complaints against the same spammer are added to the total until an acceptable solution has been found.
Larry tells us that a typical $250 solution is a broken hand, and for $2000 all anyone ever sees again of 'the problem' are his shoes.
The URL is to be announced next week, and the cyber café's phones have been jammed with requests for more information.
"It is a greater offense to steal men's labor, than their clothes"
Spammers need to be licensed (preferably with an ear tag, but i'll consider substitutes) and fully identified. all spam needs to have a spam license number in the header someplace.
Fees can then be, and need to be, collected by your favorite government agencies (I think the IRS, the NSA, and BATF will do for now). ISPs and users need to be able to bill spammers some amount for the spam processed and received. Fees need to be large enough that it is worthwhile to go after them, and then we can have bounty hunters. Fees can be high enough to reduce the cost of access. Penalties for abuse can be heavy (20 years in jail, for example).
Then we can have spam hunters who will go out and collect from the spammers for you in exchange for a percentage.
My cell phone offers free long distance. So I call the number on every piece of spam that I get. Mostly you get an answering machine, so I request a call back. This costs the spammers time plus hopefully a little money for the call back. Mostly they're semi-pathetic business-type people who really don't know anything about computers and are somewhat apologetic/embarrassed. I did get one asshole who hung up on me when I started asking where he got my email address from... so I called back (CallerId is great!). Anyways, call those spammers!
The intellectual property protection people have been thinking about this sort of problem for a long while now. Just as they want to be able to detect when something has been copied, the spam-haters want to detect when something is a copy. Both want to be successful in the presence of countermeasures. It's the same problem!
There's a vast amount of literature available out there. Any half-way decent search engine should throw up more than you can read in a reasonable time.
Paul
Lasciate ogne speranza, voi ch'intrate
Are you sure you work for an ISP? Obviously not in the IT department. Having your server configured to use the ORBS database and having an open relay are TWO DIFFERENT THINGS. You can close your open-relay server and continue to allow your users to be bombarded with e-mail. If you want to be really nice, and you have customers with virtual domains, you can run one server wide open (they get all mail+spam) and one mail server with ORDB, ORBL, or something else. You DON'T have to have an open relay to do this. ORBS is no longer in existence, btw.
Ya, this same argument is used when discussing censoring the entire internet. Ever thought about running for office? Spammers aren't the only ones I blame. I run a small mail server (fewer than 1k messages a day), and every night I e-mail ISPs informing them of open relays and of dialup customers abusing their systems. I have received a few auto-replies, and not ONE god damn response from someone who cares. I'd like to assume that most people are way too busy fixing the problem, but the same culprits keep showing up in my mail log. When discussing legal action against spammers, I think the same legal repercussions should be directed at ISPs who don't know or care how to run a mail server.
This would be a killer application for freenet, some kind of usenet, but with moderation to filter out the trolls/spam.
I am afraid this will just be a part of the arms race with spammers.
The idea of stopping spam is pretty old, and none of the approaches really works except an RBL-type one. What would be better, though, is to have humans look over the generated list of checksums (I prefer MD5 =) ) and do the check on the domain. If it's a mailing list, place it here; if it's SPAM, place it in the RBL. There's more to the verification process, but with nice graphs it would be easy to see who generates the most. It's not a perfect solution for spam, though. Anyone hoarding one? =)
p.
One problem I can see immediately with these "blockwise" checksums is that the spammers could easily insert not only text with random content but also random length. Do any of these "pretty close" methods handle offsets appropriately as well?
Yo dawg, I heard you like the Ackermann function, so OH GOD OH GOD OH GOD
I have been getting upwards of 5-15 of those %@#*(%#@#$ "The Scale Moved" emails a day lately. They are from a company called BerryTrim. I hate them. I have set up a filter in Pine to delete them before I ever see them, and still a few get through!
I HATE THEM
So, lately, I've taken to calling the 1-800 number for the BerryTrim website each day. Sometimes several times a day. I ask to talk to Customer Service and I argue with them about this Spam. They tell me that "those are our associates and we have nothing to do with them" --(sure)
I tell them I don't care and I want the email to stop.
The real reason why I am doing it is this: I want to stay on the phone with them for a while...Those 1-800 numbers are pretty pricy. I used to price out the cost of outsourcing help desks and call centers for Fortune 1000 corporations, and I can tell you that over 90% of the cost is either phone lines or getting warm bodies to sit in the chairs to take the calls.
SO... they spam us... we spam them. The 800 number for BerryTrim is 1-800-401-6327
DO NOT buy anything from them. (duh)
Just stay on the line for a while (heh)
If we can bring down websites with the Slashdot effect, let's do a little group action and take out a spammer or two!
Don't just read this post and chuckle and say, "Cute idea..." PICK UP YOUR PHONE AND CALL. It's toll-free and you will be helping to bankrupt a spammer!!!
Thanks!
I would have to say that explosives are the most abused technology in all of history.
I love you, A.C. I really do. I used to hate you -- usually your posts suck. But this one really made me change my mind about you!
While the system could be broken by using counters, this could be countered by parsing only certain portions of the mail or by counting the frequency of certain words. It would work very well on pure-text spam, but not on attachment stuff.
What would be funny would be to see the false positives of such a system. Many mails I get from the administration all look the same; I wonder if they would be considered spam. They are quite similar to spam: useless and too numerous...
When will somebody finally hack TiVo to do this with TV ads?
Does someone have a link?
-- I had a female crustacean once, but I lobster...
There's also the spam that includes customized URLs in the message (image downloads that, say, have your email address embedded in the query string -- sneaky little "live address" confirmation technique).
Liberty in your lifetime
Even the non-html spam has this pseudo-random text. I suspect they are using it to ID the spam recipient when complaints are sent to the ISP, who forwards the actual complaint without identifying the person who sent it.
All of this would tend to defeat the checksum algorithm.
I think we need a "spam tax", to be paid by the ISPs who originate the messages, and passed back to the spammers as a marked-up fee. As soon as the cost per message exceeds the cost of snail mail, the game is over. Of course, the overseas providers would not be subject to the tax -- until each government sees the gravy train and jumps onboard. It would be very easy for the ISPs to keep the entire cost of the "spam tax" limited to the people who send the spam. The only drawback is that once the revenue from the "spam tax" dries up, the same government entities would be looking for ways to replenish the revenue stream by taxing other things on the Internet. If you view taxation as inevitable, it might as well start with the spammers.
Maybe the tax could be disguised as a "fine", similar to a speeding ticket. After all, the government would be invoking a financial penalty for unacceptable behavior, very similar to a traffic violation. Since the actual enforcement would involve complaints from the recipients, it's very much like getting caught speeding.
The greatest part of all is that no one has to enact the "spam tax" to get the benefits. Just the mere possibility of something like this would have the spam-friendly ISPs running for cover.
I think the key to the spam problem is to raise the spammer's cost. I may not have the ideal method, but I think detection or filtering is not going to get the job done -- it's all about cost.
There is already a site that provides this service: check out www.sneakemail.com. The e-mail addresses generated consist of random alphabet soup, rather than anything user-selectable (IMO this is a feature), and a decent Web interface is provided both for managing numerous aliases and configuring sender-filtering independently on each.
...when you're writing a game...tweak the difficulty of "Easy" to something [your mother] can cope with. -- onion2k
I, for one, hate SPAM. I've been able to filter out around 90% of it using simple measures (like filtering out emails with a missing or invalid "From:" address, etc.). Yet the remaining 10% is annoying as hell.
I've got to try Razor myself. Thing is, this system will only work if enough people jump in. Let's see...
This solution only works if you have a unix computer delivering to your mailbox. However, if you do, it virtually eliminates spam.
The solution is to put a password system on your mailbox.
You set up filters on your email so that for the people you know and mailing lists, they can email you as before. The rest of the world will get a canned response saying that they have to send the email again with the password of 'xxxxx' in the subject.
Since spammers almost never send a valid email address, they never see this email, and so the spam disappears back into the ether, and I didn't have to use potentially expensive modem time to download the spam to check to see if I wanted it. (And I didn't have to participate in a global spam checking service that doesn't really work 100% anyway).
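A rough sketch of the decision this kind of password filter makes. The linked page describes the real recipe; KNOWN_SENDERS, PASSWORD, route and canned_reply here are hypothetical names, and actually mailing the reply is left out.

```python
from dataclasses import dataclass

# Hypothetical configuration; this is only a Python illustration of the idea,
# not the recipe from the linked page.
KNOWN_SENDERS = {"friend@example.org", "announce@lists.example.net"}
PASSWORD = "xxxxx"  # placeholder, as in the post


@dataclass
class Message:
    sender: str
    subject: str


def route(msg: Message) -> str:
    """Return 'deliver' for known senders or a correct password, else 'challenge'."""
    if msg.sender.lower() in KNOWN_SENDERS:
        return "deliver"
    if PASSWORD in msg.subject:
        return "deliver"
    return "challenge"


def canned_reply(msg: Message) -> str:
    # Addressed to msg.sender; forged return addresses mean spammers rarely see it.
    return (f"To: {msg.sender}\n"
            f"Subject: Re: {msg.subject}\n\n"
            f"Please resend your message with '{PASSWORD}' in the subject line.\n")
```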
Once in a while (2 times in the last 6 months) you'll get a spammer stupid enough to actually resend the spam but with the password. However, since they are sending it by hand the return headers are valid! *yay* Instant retribution!
Here's the link:
http://www.uwasa.fi/~ts/info/spamfoil.html
My spam rate was 10-20 spams a day. After putting this into place, I get maybe 1 a month sneaking through on top of some other filter match.
I can't recommend this enough!
-Fred
Go, Springboard, Go!
Well, as far as I'm concerned, I would say this is only trading one problem for another.
The traffic created by these checksum tests would be a serious problem. The SPAM will still be using my link to get to my MTA, and then I add some more traffic to check if it's SPAM (and will most certainly get tons of false negatives, as stated before in several posts).
Overall, this is a very bad idea.
---
morcego
You click a button in [your favorite mail application, any platform], which in turn sends a simple text message with the CRC or checksum. If they match (checked by either the client software or the server, your choice), the message is shown as matching and moved to a cleared folder, depending on the application.
Applications can compete on how they use the results. One good idea could be to filter out non-matching results, or to send them to a junk folder, or simply to show a certain icon.
The real key to the system is this: if spammers are creating a CRC which is being used over and over to send to multiple clients via redirectors and other clever tricks, hit a button and simply vote it spam. Use a weighted system to eventually filter out the same message. But running the headers through the checksum would stop most spammers, since the To: field would most likely change.
It's simple text messaging that any programmer can use, and there are many non-GPL examples of how to compare two checksums.
Guessing that the server would carry all the checksums, a good idea would be to add a revocation date, which can be set client side, either with a default or user-configurable.
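The weighted vote plus revocation date could look something like this on the server side. It's a sketch under assumptions: the threshold, the vote weights and the 30-day default lifetime are made up, and the lifetime is a server parameter here rather than a client setting as the post suggests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    votes: float
    expires: datetime


class VoteServer:
    """Hypothetical checksum-vote server: weighted spam votes that lapse after a lifetime."""

    def __init__(self, threshold: float = 10.0, lifetime: timedelta = timedelta(days=30)):
        self.threshold = threshold
        self.lifetime = lifetime
        self.records: dict[str, Record] = {}

    def vote(self, checksum: str, weight: float, now: datetime) -> None:
        rec = self.records.get(checksum)
        if rec is None or rec.expires <= now:
            # unseen or revoked checksum: start counting from zero again
            rec = Record(votes=0.0, expires=now)
            self.records[checksum] = rec
        rec.votes += weight
        rec.expires = now + self.lifetime  # every new vote pushes the revocation date out

    def is_spam(self, checksum: str, now: datetime) -> bool:
        rec = self.records.get(checksum)
        return rec is not None and rec.expires > now and rec.votes >= self.threshold
```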
Really, the whole thing is simple. Just block people from mass e-mailing. Test the system for a while, then add the spam blocking to see which CRCs were voted spam, and cross that with the volume of e-mails sent by that person. The system could suddenly become huge, but off-site computers could do the computing rather than the mail servers.
E-mail is a huge thing. Linux sends e-mails to my wireless phone without any user interaction. The system better be ready for people who use e-mail like an instant message.
Now it comes to mind: if my POP server software (and maybe all ISPs') would just check the CRC against the server, that would save everyone.
Even MS could get into the game with Hotmail and their own MS CRC server...
This is my manifesto:
Get your free hotmail address - Now with hailstorm and E-mail signing - Free (biometrics required)
----checksumurl--http://checksig.msn.com:7235----
ka;dddjdppwo3as-e34-44444uv2-84urrhpwerrupw34gdgh
4-0394uvm-03485umt5jt-5ut059u-02-95uy05u25uy5fdgh
442i0934it-09utury]==-04904g2-5t8528-b09-2ururt45
----email--checksum:--0x485ksro842---------------
Get your Unix fortune now!
Hi,
I thought of this long ago, but thought that Hotmail should implement the system. They can see tens of thousands of mailboxes and see which messages people have deleted, which are likely to be spam by rules, and who read or replied to a message. This would give a pretty accurate indication as to whether it was spam or not (e.g. contains $$$, no one replies, and most recipients delete it without reading).
As for using checksums... it's obvious that they wouldn't work (for reasons already mentioned). Instead, what's needed is a checksum whereby a message with sum 10 is very similar to a message with sum 11, etc., so that slight changes to the message don't make it appear unique. I'm sure fingerprint databases must do something like this to allow fast indexing and retrieval of similar prints... Does anyone know how that works?
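The kind of checksum described here exists under the name similarity-preserving (locality-sensitive) hashing; one well-known form is Charikar's SimHash. "Nearby" there means few differing bits (Hamming distance) rather than 10 vs 11 numerically. A minimal sketch; hashing each word with MD5 and the 64-bit width are arbitrary choices.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """SimHash fingerprint: each word's hash votes +1/-1 on every bit position,
    and a bit is set when its total is positive. Similar word sets give
    fingerprints that differ in only a few bits."""
    if not 1 <= bits <= 128:
        raise ValueError("bits must be between 1 and 128 (MD5 gives 128)")
    totals = [0] * bits
    for word in re.findall(r"\w+", text.lower()):
        h = int.from_bytes(hashlib.md5(word.encode()).digest(), "big")
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


base = " ".join(f"word{i}" for i in range(50))
print(hamming(simhash(base), simhash(base + " counter123")))
```

Appending one counter word to a 50-word message typically flips only a handful of the 64 bits, while unrelated messages differ in roughly half of them.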
Mike
I get all kinds of spam that obviously uses a program, because it says "this email was sent to: myemail@here.com"
Wouldn't that screw up the checksum? Of course, they could just take, say, the middle 75% of each email; that would be enough and would probably avoid those problems most of the time.
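A sketch of the "middle 75%" checksum. Measuring the 75% in lines rather than characters is an assumption, and so is SHA-256 as the hash; a spammer who moves the varying text into the middle of the body defeats it.

```python
import hashlib


def middle_checksum(body: str, keep: float = 0.75) -> str:
    """Hash only the central `keep` fraction of the body's lines, so a
    per-recipient first or last line does not change the result."""
    lines = body.splitlines()
    trim = int(len(lines) * (1 - keep) / 2)  # lines dropped from each end
    middle = lines[trim:len(lines) - trim]
    return hashlib.sha256("\n".join(middle).encode()).hexdigest()
```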
The problem with public versions of spam filters, is that spammers have access to the data too, and can tailor spam to pass much more quickly than you can tailor the filters to stop them.
I've wondered for quite some time why it is not a standard feature for a POP account to decline mail that is not addressed to one of a user-defined set of e-mail address regexps. The vast majority of my spam doesn't even mention my name in the address... and if it wasn't sent to me, then I don't want to know! I realise this would require additional configuration for each user (particularly if you frequently join or leave mailing lists), but I for one would immediately see the benefits if I didn't have to wait while downloading such obvious spam over dialup!
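Here's what a per-user recipient check might look like. The patterns and function name are hypothetical, and where a POP server would store them is left open. Note that mail Bcc'd to you does not list your address, so it would be declined too unless the patterns allow for it.

```python
import re

# Hypothetical per-user patterns; a real POP server would store these per account.
ACCEPT_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"^fred@example\.org$",
        r"^fred\+[\w.-]+@example\.org$",  # plus-style aliases handed out to shops
        r"^[^@]+@lists\.example\.net$",   # any list on a subscribed list host
    )
]


def addressed_to_me(recipients: list[str]) -> bool:
    """True if any To:/Cc: address matches one of the user's patterns."""
    return any(p.match(addr.strip()) for addr in recipients for p in ACCEPT_PATTERNS)
```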
Yeah, I forgot about that, but my spam comes from msn.com not hotmail.com. msn.com seems to allow forged headers, while hotmail doesn't. Makes you wonder...
www.lucernesys.com Horizon: Calendar-based personal finance
I'm seriously thinking of blocking the entire msn and hotmail domains from my inbox. I don't know anyone on msn, anyway.
OK... kids age 12 are getting PRON mail all day long. First of all, it should be illegal to send PRON mail to anyone whom you have not previously verified to be over the age of 18. Since most of these mails come from morons who buy CDs with your email address on them, and since they can't verify anything without first contacting you, this would stop 75% of all junk mail. The beauty is, the law wouldn't even be aimed at stopping junk mail; it would be aimed at protecting kids.
Certainly every man at his best state is but vapor
I don't think there is one, simple solution to the spam problem. This idea sounds like it will work, but as mentioned, what about mailing lists?
Okay, and what about the "counters" spammers would use? Maybe there should be a system that uses 'diff' to compare lines or something similar... mailings that have let's say 99% of the same message would be considered spam.
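Python's difflib can stand in for 'diff' here. This sketch compares two messages line by line and uses the post's 99% figure as the threshold; the function names are hypothetical. Comparing every new message against every stored one gets expensive quickly, which is likely why the hash-based schemes discussed elsewhere are preferred.

```python
import difflib


def line_similarity(a: str, b: str) -> float:
    """Share of lines two messages have in common (difflib's ratio over line lists)."""
    return difflib.SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def looks_like_bulk(a: str, b: str, threshold: float = 0.99) -> bool:
    return line_similarity(a, b) >= threshold
```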
As far as mailing lists, why wouldn't they be able to register as legitimate mail? Like an exception list, an MTA can contact this central repository with the mail it receives, send the checksum (or what have you) and the repository would still say "yeah, it matches, but it's from a valid mailing list located in our database." Therefore, the MTA would not block it.
I think one of the better solutions for getting rid of SPAM is for ISPs to do a better job of implementing and actively enforcing their anti-spam policies. The only way to do this is to claim a $500-per-e-mail check for every piece of unwanted e-mail you receive. If they give a sh*t or two, they will stop the spammer, and although they may not give you your $500, you surely did your part to stop at least one spammer.
I think you need to flash your brain's firmware.
Now, the interesting thing is what I do once I've decided to filter the mail. Since my rules catch legitimate mail, I don't just throw it away. I wrote a small collection of Perl scripts (which I'll release to the world someday soon, but they need documentation) that maintain a whitelist of sender addresses.
If a filtered message is from an address that's marked valid, it's delivered. If it's from an address that's marked invalid, it's discarded. If it's from an unknown address, the message is put in a holding area and an autoreply is sent back to the sender from a magic address asking them to reply in order to validate themselves.
The magic address is unique per filtered message -- it uses qmail's address extension mechanism -- and mail to the magic address never gets delivered to me, so I don't care if it gets added to spam lists. The Perl script behind the magic address does a quick check to make sure it's not processing a bounce, then marks the sender of the original message as valid and delivers the original message (or messages if more than one arrived while awaiting validation).
Held messages are cleaned out by a cron job when they get too old.
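The poster's scripts are Perl on qmail and unreleased, so this is only a Python sketch of the flow as described. The me-confirm-TOKEN address format and the class and method names are hypothetical; qmail's extension mechanism itself, sending the autoreply, and real bounce detection are out of scope.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Held:
    sender: str
    message: str
    held_at: float  # seconds since the epoch


@dataclass
class SenderFilter:
    """Hypothetical re-creation of the whitelist + magic-address flow."""
    domain: str = "example.org"
    valid: set = field(default_factory=set)
    invalid: set = field(default_factory=set)
    held: dict = field(default_factory=dict)  # token -> Held

    def incoming(self, sender: str, message: str, now: float) -> tuple[str, Optional[str]]:
        if sender in self.valid:
            return "deliver", None
        if sender in self.invalid:
            return "discard", None
        token = secrets.token_hex(8)  # one magic address per held message
        self.held[token] = Held(sender, message, now)
        return "hold", f"me-confirm-{token}@{self.domain}"

    def confirm(self, token: str, is_bounce: bool) -> list[str]:
        """Reply reached the magic address: whitelist the sender, release all their held mail."""
        if is_bounce or token not in self.held:
            return []
        sender = self.held[token].sender
        self.valid.add(sender)
        tokens = [t for t, h in self.held.items() if h.sender == sender]
        return [self.held.pop(t).message for t in tokens]

    def expire(self, now: float, max_age: float) -> None:
        """The cron job's work: drop held messages older than max_age seconds."""
        for t in [t for t, h in self.held.items() if now - h.held_at > max_age]:
            del self.held[t]
```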
This is sort of similar in concept to the password mechanism of SpamBouncer or (a closer cousin) SpamCop's whitelist feature, but it doesn't require senders to retransmit their messages, which I always thought was pretty annoying to ask people to do since not everyone saves their outgoing mail. Granted, asking them to do anything is kind of annoying, but at least this is less so since they can just hit "reply" and "send".
This setup is cool because it allows friends to Bcc me on stuff without my "I must be listed as a recipient" rule trashing their messages, even if they've just switched E-mail addresses. It is admittedly based on the assumption that spammers don't read replies to their mail and/or wouldn't go to the effort of unlocking themselves; I have yet to see a spammer do that, and given the economics of spamming I think that'll be a safe assumption for the foreseeable future, unless this approach gets so popular that spammers start writing automated unlock bots!
A good bitwise (or symbolwise) measure of distance between two sequences is the Hamming distance, which is the number of different symbols between the two sequences. A simple checksum will basically tell you whether the Hamming distance is zero (same checksum) or nonzero (different checksum).
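For concreteness, the Hamming distance itself is a few lines of Python. Note the equal-length requirement: one inserted character shifts everything after it, which is part of why a similarity measure for mail needs more than this.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


# A plain checksum only answers "is the distance zero?"; this says how far apart they are.
print(hamming_distance("karolin", "kathrin"))  # 3
```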
I'm sure it's possible to generalize the concept. I'm not aware of any specific work, but a simple solution would be something like a blockwise checksum. If enough blocks match up, it could raise a flag indicating the presence of possible spam. Ideally the blocks would be large enough that the concatenated checksum is short, but short enough that differences are easily captured.
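A minimal sketch of the blockwise checksum, assuming fixed 64-byte blocks and short MD5 prefixes (both arbitrary choices). Fixed blocks have the weakness that an insertion shifts every later block; content-defined chunking (cutting blocks where a rolling hash hits a pattern) is the usual way around that.

```python
import hashlib


def block_checksums(text: str, block_size: int = 64) -> list[str]:
    """Checksum each fixed-size block of the message separately (short MD5 prefixes)."""
    data = text.encode()
    return [hashlib.md5(data[i:i + block_size]).hexdigest()[:8]
            for i in range(0, len(data), block_size)]


def matching_fraction(a: list[str], b: list[str]) -> float:
    """Share of block positions whose checksums agree; extra blocks count as mismatches."""
    if not a and not b:
        return 1.0
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))
```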
You could try a keyword search for "error detection" or "checksums" using a publication search engine like Citeseer, or INSPEC if you have access through work or school.
Toronto-area transit rider? Rate your ride.
If so someone ought to help the author figure out how to target only the correct kind of files...
Hey!!! the parentheses are good for something
I work for an ISP. We run an open relay server. Know why? I tried subscribing to ORBS once. The next day I got about 20 calls from people saying that they couldn't receive emails from their clients ("something about blacklisted? what is that?"). I can't deny mail from open relay servers, so I can't stop the spamming, because my customers are too stupid.
"I do not fear computers. I fear lack of them." -Isaac Asimov
What we need then is a hash mechanism that is resistant to minor mechanical changes. Watermarking technologies (attempt to) do this for much harder things like sound and images. I'm sure it has been done for text.
Then to extend the concept we need to assign a probability to the spam. Perhaps your mail client would then sort it and show it to you colored yellow for probably spam so you could more easily delete it.
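One standard text technique of this kind is MinHash over word shingles, used for finding near-duplicate documents. The score it produces is an estimate of similarity to a known spam, not a true probability, but it could drive the "color it yellow" display. A sketch only: the shingle size of 3, the 64 seeded SHA-1 hashes and the 0.8 cutoff are arbitrary assumptions.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """All runs of k consecutive words; a small edit only touches a few shingles."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def minhash(text: str, num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    sh = shingles(text)
    return [
        min(int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big") for s in sh)
        for seed in range(num_hashes)
    ]


def similarity(a: list[int], b: list[int]) -> float:
    """Fraction of agreeing minimums: an estimate of the shingle sets' Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def display_color(score: float) -> str:
    # Hypothetical cut-off: a close match to a known spam is shown as "probably spam".
    return "yellow" if score >= 0.8 else "normal"
```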
Pat Niemeyer
Author of Learning Java, O'Reilly & Associates
Can't stop 'em but for a while... and they're absolutely obnoxious about it. (Well... script kiddies are. No offence to the bright and ethical ones.)
Screw 3...
An idea similar to this could and should be tried to bring USENET back into the hands of the masses. Having some sort of k5-style moderation applied to USENET message IDs could potentially end spam as we know it. The simplest approach would be to have a few groups of competing "moderation" servers that you could query to rate messages by their message ID, and then build some client plugins to filter based on a given threshold. Of course, to really get the system to work, some thought would have to be put into authentication (say, only 5 moderations allowed per IP per day, or even an actual login process to moderate) to keep spammers from moderating up their own posts. If we have a loose network of many of these moderation servers, each using a different way to pick out the good posts, user preference would dictate which system works best.
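A sketch of one such moderation server plus the client-side filter, keyed on Message-IDs and using the 5-per-IP-per-day limit from the post. The +1/-1 rating scale, the class and function names, and the threshold are assumptions; login-based authentication and sharing between servers are left out.

```python
from collections import defaultdict
from datetime import date


class ModerationServer:
    """Hypothetical message-ID moderation server with a per-IP daily quota."""

    def __init__(self, daily_limit: int = 5):
        self.daily_limit = daily_limit
        self.scores: dict[str, int] = defaultdict(int)               # Message-ID -> net rating
        self.spent: dict[tuple[str, date], int] = defaultdict(int)   # (IP, day) -> moderations used

    def moderate(self, ip: str, message_id: str, delta: int, day: date) -> bool:
        """Apply a +1/-1 rating; returns False once this IP's quota for the day is used up."""
        if delta not in (1, -1):
            raise ValueError("delta must be +1 or -1")
        if self.spent[(ip, day)] >= self.daily_limit:
            return False
        self.spent[(ip, day)] += 1
        self.scores[message_id] += delta
        return True

    def score(self, message_id: str) -> int:
        return self.scores.get(message_id, 0)


def client_filter(server: ModerationServer, message_ids: list[str], threshold: int) -> list[str]:
    """What a newsreader plugin might do: hide posts scored below the user's threshold."""
    return [m for m in message_ids if server.score(m) >= threshold]
```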
Anyway, just my 2 cents...
"Your superior intellect is no match for our puny weapons!"
Is there some hidden reason why we would want millions of copies of an email worm's attachment to get through? This could actually be part of the solution to two problems.
Also, do note that a common method of spamming is to connect to an open relay and have the relay take care of sending out thousands of identical messages by simply sending thousands of "RCPT TO:" commands. Checksumming spam would completely break this spamming method and would force the spammer to retransmit the entire message for every recipient in order to vary it, thus making the process more costly.
-all dead homiez
--
Aaron J. Shaver
http://aaronshaver.com/
... perhaps the best one I've heard of so far in this discussion. If I had mod points I'd give 'em to you.
501 Not Implemented
I ran some basic design concepts on this idea a few years back (nothing as sophisticated as DCC). I came to the same conclusion as the other readers, re: Countermeasures: the spammer would integrate something random into the message that would foul identification. There is simply no way around this. So the question becomes: at what point does the countermeasure become so expensive and difficult that the spam itself reaches the point of diminishing returns? Or, put another way, what can we track that would make the message so difficult to cloak that it wouldn't be worth it to do?
The cloak would have to be human-labour intensive, so it has to relate to the meaning of the text itself. I came up with a few variations, but in my own little thought-world, the most dependable signature for a spam was a key composed of the grammatical types of each word in the email. Chaff, or non-identifiable text, would be ignored. With this system, even the words could be randomly generated (Get {rich, wealthy, affluent}) and the signature would remain the same. How unique would the key be? I never did serious research, but it seems like it would be.
The major problem I encountered is that once this was done, the spam generator could then rotate the order of the sentences, or drop non-essential sentences altogether. You could make the key non-order-dependent, but that would drastically reduce the uniqueness of the key... Anyhow, the similarity index identified in this thread is a blazingly simple idea that somehow escaped me. Maybe it's time to dust off the docs...
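A toy version of the grammatical-type key, showing both the order-dependent key and the order-independent variant. The tiny lexicon is entirely hypothetical and stands in for a real part-of-speech tagger; splitting sentences on . ! ? is also a simplification.

```python
import re

# Hypothetical toy lexicon; a real system would need a proper part-of-speech tagger.
LEXICON = {
    "get": "VERB", "earn": "VERB", "make": "VERB",
    "rich": "ADJ", "wealthy": "ADJ", "affluent": "ADJ",
    "money": "NOUN", "cash": "NOUN", "dollars": "NOUN",
    "now": "ADV", "today": "ADV", "fast": "ADV",
}


def sentence_key(sentence: str) -> tuple[str, ...]:
    """Grammatical types of the known words; unknown words are treated as chaff and skipped."""
    return tuple(LEXICON[w] for w in re.findall(r"[a-z]+", sentence.lower()) if w in LEXICON)


def ordered_key(text: str) -> tuple[tuple[str, ...], ...]:
    keys = (sentence_key(s) for s in re.split(r"[.!?]+", text))
    return tuple(k for k in keys if k)


def unordered_key(text: str) -> tuple[tuple[str, ...], ...]:
    """Ignores sentence order, at the cost of a less unique key."""
    return tuple(sorted(ordered_key(text)))
```

Swapping synonyms leaves the ordered key unchanged; reordering sentences changes it, which the unordered key tolerates.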
If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.
I mean, polymorphic software is something we all want to see, and if the mass mailers are the only ones who are going to develop it, let them.
Let's hear it for polymorphic spam!
It seems that Brazilians have fairly recently deluded themselves into thinking that SPAMming is some sort of credible mass-marketing technique.
In the last couple of months, the amount of SPAM in my in-box seems to have increased by an order of magnitude. The interesting part is that I'm Pretty Darn Sure (tm) that my bleeding ISP sold my address to the Spam CD makers, as I only use an alias and not my ISP's domain, and yet most SPAM arrives at both addresses.
The Dumbest of All Spammers has to be the citizen who recently sent me six copies of his Curriculum Vitae, claiming to be a Support Analyst and looking for work! I tried to point out his stupidity, but his ISP account had already been blocked by the time my message got to him.
I'm not normally in favour of the death penalty, but am prepared to make an exception in the case of spammers...
On the server side this may be more practical, but I don't want my mail server to delete any mail I get, just because others got the same.
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Has anyone thought about an x86 port?
~xxxxx
For my own account, I add everyone whose email I am willing to read to my address book, then check all incoming mail's 'from' line against that list. If there's a match, it goes to my inbox, otherwise to a folder called 'Unknown Senders'.
If I'm in a hurry, I just read mail in my inbox. If I've got a few minutes, or I'm expecting mail from someone new, I look at the mail in unknown senders.
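The same routing rule in a few lines of Python, using the standard email package. The address book contents and function name are hypothetical. Since From: is trivially forged, this sorts mail rather than blocks it, which matches how it's used here.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical address book; in practice this would be exported from the mail client.
ADDRESS_BOOK = {"alice@example.com", "bob@example.net"}


def folder_for(raw_message: str) -> str:
    """Route to 'Inbox' if the From: address is in the address book, else 'Unknown Senders'."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    return "Inbox" if addr.lower() in ADDRESS_BOOK else "Unknown Senders"
```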
I've noticed the same spammers hit me again and again, but that just makes them easier to spot in a big list because they look the same.
Granted, this treats the symptoms and not the cause, but it can offer some soothing relief for bloated mail accounts until a cure is found.
I had a similar idea and wrote up some analysis which details how this can be made robust.