Slashdot Mirror


Distributed Checksum Clearinghouse vs Spam

AllSpammedOut writes: "Spam could be more easily detected if everyone were to compare the mail messages they received. Using the Distributed Checksum Clearinghouse, MTAs can report the checksums for all messages they receive and be notified when a checksum has already been reported by many other systems." Obviously there are issues with something like this (especially mailing lists, and worms that do attachments). I suspect spammers would just include a counter to break checksums, though.

216 comments

  1. The Coward's List.. by Anonymous Coward · · Score: 1

    ...of Email The Coward Can Do Without:

    1. Any email in big5.
    2. Any email which is from .kr, .cn, or .tw (and many might add .ru too)
    3. Any email in HTML.
    4. Any email in script.
    5. Any email that mentions a long dead House Bill in the body.
    6. Any email that mentions sex or sexual items in the subject.
    7. With specific exceptions for subscribed mailing lists, email that isn't for (example:)
    thecoward@thecoward.tld .
    8. Any email that has that crap bit about saving trees in the body.
    9. Any email that has many variants of (example) thecoward@ in the To: line.
    Chances are anything to 'thecoward' and 'thecowherd' and others isn't worth anyone's time.
    10. Any email that somewhere proclaims "This is not spam."

    Looking at the To: header and at the content seems more workable than merely using the subject line, though that level would at least skim off the less creative crud. No, The List is not complete.
    --
    The Coward

  2. at least hackers are smart by Anonymous Coward · · Score: 2

    so why don't we have a spammers vs. hackers war? they could fight over who's the most annoying, winner take all. spammers spam the crap outta hackers' sites and mailboxes, while hackers launch DoS attacks on the spammers' service provider. it might just keep both sides busy enough to buy the rest of us a little peace and quiet.

  3. Re:Worms? by Anonymous Coward · · Score: 2

    This is true. It is claimed that over 90% of spam is sent through open relays, meaning that the spammer uses multiple RCPT TO commands and sends the identical message to each recipient. Most spammers don't have the bandwidth that it takes to send each user a personalized message, because they are almost always on a throwaway dialup. Only the professionals can afford to send unique messages, because they often have a DSL line and a pink contract with their ISP (which permits them to continue spamming).

  4. Re:Just Because they would counter it. by hwolfe · · Score: 1

    A lot of spam I get already has a unique identifying ID included in it. I assume this is to track valid e-mail addresses of people stupid enough to try to be "removed" from their lists.

    No, that's what's called a hashbuster. It's used to counter the mailer software that checks outgoing messages in an attempt to prevent spam.

    Some spammers, in particular the ones who turn around and resell e-mail addresses, use the removal drop-boxes to validate addresses, and/or remove addresses that are bounced to Errors-to: drop-boxes.

  5. Relevant but somewhat off-topic question by Have+Blue · · Score: 4

    Why do open relays exist? Is there some beneficial use for them that I'm not aware of? Is this a relay's default state and the sysadmin is too busy or dumb to lock it down? Why doesn't everyone just secure their mail servers and cut off spam before it gets out?

    1. Re:Relevant but somewhat off-topic question by extra88 · · Score: 1

      Until last year, there was a lab around here running their own mail server on a NeXT cube running sendmail 1.0! While the length of its tour of duty was impressive, once the spammers found it, it was all over. There was no one left to support it (if there ever really was) so they just took it offline and used a different machine to redirect the old addresses.

    2. Re:Relevant but somewhat off-topic question by syntax · · Score: 1

      It exists, it's called ORBS.

    3. Re:Relevant but somewhat off-topic question by Skapare · · Score: 2

      Also, in countries like China, which are currently booming in regard to new businesses going online, there is a very common usage of pirated copies of older versions of Microsoft Exchange which did not have the capability to stop spam, or have it disabled by default. Not being licensed copies they don't get the latest patches. And they usually don't even have a sysadmin, or if they do, it's one who is incompetent or one who can't read English. Unfortunately, most of the help to close relays is primarily in English. This is bad as English is not really so universal as Yanks and Brits might like to think. Translations into all languages are needed.

      Spammers cost money to those who get spammed. Pushing the cost back to spammers and the ISP who (perhaps through inept management) support them, is one way to stop them. Laws will not since this is an international thing.

      --
      now we need to go OSS in diesel cars
    4. Re:Relevant but somewhat off-topic question by Skapare · · Score: 3

      A network of authenticated mail servers could be very useful. But the effectiveness would be limited unless entry to the network requires agreement to terms to apply strong enforcement against spam, such as:

      • Limit each dynamic IP host to not more than 1 email message every 2 minutes.
      • Require dedicated network owners to agree to the same anti-spam agreement in writing to be allowed access to port 25 outbound or to access unthrottled mail servers.
      • Require legitimate bulk mailers to agree to certain terms such as using only opt-in lists even though the law otherwise permits them to use an opt-out list.
      • Must provide a contact address and/or telephone number for reporting abuse. Abuse reports from the general public must have a human response within 24 hours. Abuse reports from a member administrator/manager/engineer must have a human response within 2 hours.
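
      For what it's worth, the first rule in the list is easy to sketch. The following Python is only an illustration, not anything a real MTA ships: the class name `PerHostThrottle`, the 120-second default, and the caller-supplied timestamps are all assumptions made for the example.

```python
class PerHostThrottle:
    """Allow at most one message per `interval` seconds per source IP (hypothetical helper)."""

    def __init__(self, interval=120.0):
        self.interval = interval
        self.last_accepted = {}  # source IP -> timestamp of the last accepted message

    def allow(self, ip, now):
        """Return True and record the send if `ip` may send a message at time `now`."""
        last = self.last_accepted.get(ip)
        if last is not None and now - last < self.interval:
            return False  # still inside the window: defer or reject
        self.last_accepted[ip] = now
        return True
```

      A rejected attempt does not restart the window in this sketch; whether it should is a policy choice.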
      --
      now we need to go OSS in diesel cars
    5. Re:Relevant but somewhat off-topic question by MindStalker · · Score: 2

      The best thing I can find is a program called blackhole http://freshmeat.net/projects/blackhole/
      This can do a bounce back on spam saying that your user doesn't exist. This is for Linux, I couldn't find any Windows applications that could do this.

    6. Re:Relevant but somewhat off-topic question by mpe · · Score: 2

      Why do open relays exist? Is there some beneficial use for them that I'm not aware of?

      A certain set of software requires a third party relay to work at all. It's quite possible for those setting up such relays to create an open relaying situation (especially with complex networks.)

    7. Re:Relevant but somewhat off-topic question by mpe · · Score: 2

      Open relays mainly exist because of legacy. Once upon a time we needed them, because most systems weren't connected 24/7

      How often was SMTP over UUCP (and the like) used anyway?

      That changed once TCP/IP became the norm, but relays were still necessary for the transition phase.

      MX records came into existence in the late 1980s...

      Even today, there are still people whose mailboxes aren't connected 24/7 that require a relay service, though they are definitely a minority.

      What they actually need is one or more (off site) secondary MX records.
      Which is totally transparent to any MTA which follows the spec.

      A depressing number of sites require that email come from the "correct" IP address (your From: address must have the same MX record as your IP address) which means your ISP must maintain a relay for your use, though it doesn't have to be an "open" relay.

      This is mixing up two things. The first is something like the DUL which requires use of an ISP provided third party relay. The second is ISP provided relays having restrictions on what they will relay based on the MAIL FROM: command.
      The actual major reason ISPs provide third party relays is that software such as Netscape Communicator and Outlook Express simply won't work without one.

      With most ISPs, it's easy to bypass relays and send email directly to port 25 on the target machine, so blocking open relays wouldn't help much, it would just push the problem back one step.

      Actually it helps a lot. A problem with all relays is that they can be used in the mode of send one message and a list of recipients and the relay machine will do the work of sending out N copies. Remove all relays and the spammer has to actually send every message themselves.

    8. Re:Relevant but somewhat off-topic question by mpe · · Score: 2

      Limit each dynamic IP host to not more than 1 email message every 2 minutes.

      Requires a rather complex algorithm to work this out. Also it would cause problems with machines on a dialup running proper MTAs attempting to process their mail queue on connection.
      A simpler method would be to start dropping packets at random if all (or more than a certain portion) of the traffic from an IP address consists of outgoing TCP connections to port 25.
      The only thing which needs examining is IP and TCP headers.
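
      Roughly, in Python (a sketch only: `smtp_fraction`, `should_drop`, the 0.9 threshold and the 0.5 drop rate are invented for illustration, and a real implementation would sit in a router or firewall rather than a script):

```python
import random

def smtp_fraction(dest_ports):
    """Fraction of a host's outgoing TCP connections that target port 25."""
    if not dest_ports:
        return 0.0
    return sum(1 for p in dest_ports if p == 25) / len(dest_ports)

def should_drop(dest_ports, threshold=0.9, drop_rate=0.5, rng=random.random):
    """Randomly drop a packet when nearly all of the host's traffic is outbound SMTP."""
    if smtp_fraction(dest_ports) < threshold:
        return False  # mixed traffic: leave the host alone
    return rng() < drop_rate
```

      Only destination ports are looked at, which matches the point that nothing beyond the IP and TCP headers needs examining.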

    9. Re:Relevant but somewhat off-topic question by gorilla · · Score: 2

      Yes, but this is still symptomatic of the original problem - originally SMTP servers normally acted as relays, and it's only the more recent versions which don't by default.

    10. Re:Relevant but somewhat off-topic question by gorilla · · Score: 3
      They exist because up until the early 90's, almost all SMTP servers were open relays. It wasn't until spam started that the MTA authors started putting in anti-relay code, and people started installing the new versions.

      Unfortunately, there are always systems where the sysadmin hasn't updated for years, because it's not causing him any problems.

    11. Re:Relevant but somewhat off-topic question by kimihia · · Score: 1
      there are always systems where the sysadmin hasn't updated for years

      Not always. I've come across open relays that are running fairly recent software. Witness the following (it is a genuine message):

      Received: from fodge.net (xxx [xxx]) by ldserver.liandung.com.tw with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.1960.3 ) id M92JH4VS; Thu, 19 Jul 2001 17:11:49 +0800
    12. Re:Relevant but somewhat off-topic question by TheCarp · · Score: 2

      Furthermore, it was considered good "netiquette" to have your relays be open to the world. It simplified things. MTA gets a message, it sees that it's not a local delivery, so it is nice and tries to forward it to the right place.

      Who ever thought people would ABUSE this sort of stuff?

      Hell, at one point it was an accepted practice of being a good net citizen to have guest accounts on your machines too.

      These are, of course, all legacy attitudes. Sorry to see them go, of course. Would be great to live in that world, wouldn't it?

      -Steve

      --
      "I opened my eyes, and everything went dark again"
    13. Re:Relevant but somewhat off-topic question by myov · · Score: 1
      Authentication would also help. Open relays could still relay from anywhere, but only if the user is authorized. If the user sends spam, they lose their account.

      If some sort of header was included, it would make it easier to track down the sender.

      --
      I use Macs to up my productivity, so up yours Microsoft!
    14. Re:Relevant but somewhat off-topic question by codeguy007 · · Score: 1
      • Limit each dynamic IP host to not more than 1 email message every 2 minutes.
        • Many non-spammers, including myself, send more than 1 email message within a 2 minute period. I play a play by mail game and it is nothing for me to send 3 or 4 short responses within 2 minutes of each other. Also many people run small mail-lists of friends where they end up sending multiple emails at once. I've seen lists like this at 40 or 50 people and it's not spam.
      • Require dedicated network owners to agree to the same anti-spam agreement in writing to be allowed access to port 25 outbound or to access unthrottled mail servers.
        • Fine but if you use RBMS, domains that allow spamming will get black balled already.
      • Require legitimate bulk mailers to agree to certain terms such as using only opt-in lists even though the law otherwise permits them to use an opt-out list.
        • This is a good idea but impossible to enforce. The internet deals with too many legal entities and they would all have to enforce this for it to work.
      • Must provide a contact address and/or telephone number for reporting abuse. Abuse reports from the general public must have a human response within 24 hours. Abuse reports from a member administrator/manager/engineer must have a human response within 2 hours.
        • See previous note. Same applies.
    15. Re:Relevant but somewhat off-topic question by AnotherBlackHat · · Score: 2
      Open relays mainly exist because of legacy. Once upon a time we needed them, because most systems weren't connected 24/7, and just routing traffic was a major issue. That changed once TCP/IP became the norm, but relays were still necessary for the transition phase. Even today, there are still people whose mailboxes aren't connected 24/7 that require a relay service, though they are definitely a minority.

      Sadly, relays are still needed today because of spam blockers. A depressing number of sites require that email come from the "correct" IP address (your From: address must have the same MX record as your IP address) which means your ISP must maintain a relay for your use, though it doesn't have to be an "open" relay.

      With most ISPs, it's easy to bypass relays and send email directly to port 25 on the target machine, so blocking open relays wouldn't help much, it would just push the problem back one step.

    16. Re:Relevant but somewhat off-topic question by Newtonian_p · · Score: 2

      Actually, if you send mail through your own mail server it goes through port 25. Nowadays, many ISPs block port 25 to prevent spamming and/or to force their subscribers to use their SMTP server.

      --

      There are 2 kinds of people in this world: Those who write in decimal and those who don't

    17. Re:Relevant but somewhat off-topic question by masoncooper · · Score: 1

      Or are there not any filters that ISPs can place on incoming messages that limit a specific sender from sending so many e-mails to the ISP's users per day? Maybe bouncing back any amount over 10-15 saying that if they'd like to send more they need to contact them for exclusion (thus requiring effort)?

      ...also, anyone know of mail client that can bounce back messages as user unknown, that seems to me the best way to be removed from spam...it seems to me that >send message, wait, no returned message=good addy versus send message, wait, bounced-user not found=remove from list(hopefully)

    18. Re:Relevant but somewhat off-topic question by denis-The-menace · · Score: 1

      Good idea. Why not also create a bot to scan the internet to detect open relays, make a blacklist of IP addresses and block them?
      Then if a spammer complains, too bad!
      If it's a lazy admin's system, the BIG BOSS will fire his lazy ass when the company's email gets bounced everywhere!

      Not perfect, but it makes the spamming business more expensive (changing ISP all the time) and less effective (cus' spam would get to fewer places).

      --
      Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
  6. Re:I can't see this working by lazarus · · Score: 1
    Of course, with your background this would be a "fun" project to try. There would be logistical problems - the prog. would have to be run on your mail server (it would be hellish to send the entire contents of every email to a special nnet server somewhere to ask if it was spam). You would probably want to install a pre-trained one on your own mail server and go from there.

    Seriously though - 90% of all the slashdot posts here are "wouldn't my email address break this?" or some variant thereof. Sure if the programmers who built it were really really stupid. Do this instead:

    • Strip the headers (all you have left is the body)
    • Remove all blank lines (not carriage returns)
    • Remove the top 5 and bottom 5 lines
    • Checksum
    Bulk emailers (the software) don't want to be adding random words or characters within the body of a message -- too much processing for something you're doing 500,000 times.... Pretty tough to do with changing content anyway (very difficult to make it work in a generic fashion).
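
    The four steps above might look something like this in Python. This is a sketch under assumptions, not how DCC actually computes its checksums: the function name `body_checksum` is made up, MD5 is an arbitrary choice of checksum, and messages too short to trim are checksummed whole.

```python
import hashlib

def body_checksum(raw_message, trim=5):
    """Checksum the body after stripping headers, blank lines, and the
    first and last `trim` lines (hypothetical helper following the steps above)."""
    text = raw_message.replace("\r\n", "\n")
    # Headers end at the first empty line; everything after it is the body.
    _, _, body = text.partition("\n\n")
    lines = [ln for ln in body.split("\n") if ln.strip()]
    # Assumption: very short messages are not trimmed at all.
    core = lines[trim:len(lines) - trim] if len(lines) > 2 * trim else lines
    return hashlib.md5("\n".join(core).encode("utf-8")).hexdigest()
```

    Two copies that differ only in headers, greeting lines, and trailing "remove me" footers would then produce the same value, while a counter in the middle of the body would still break it.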

    Of course the original article alluded to this:
    ...the main DCC checksum is fuzzy and ignores various aspects of messages. But slashdot readers don't read the articles in much the same way the moderators don't read the postings... :-)

    --
    I am not interested in articles about life extension advancements.
  7. Re:What's the big deal? by Dr.+Evil · · Score: 2

    Hello "Don't Spam JeffSketch's hotmail address", what's that address? JeffSketch@hot... hmmm something.com... JefSkatch@hotmail.com? no... that's not it. I wonder why it would be so dangerous to post an email address on a web forum.

    Maybe I should forward you the contents of my Hotmail account. It is up to 540 pieces of filtered spam. Only about 50% of my spam gets successfully blocked. This renders my occasional-use Hotmail account nearly useless.

    But wait, that's a free account. I guess that means that nobody is paying for it. Neither in my time nor Microsoft's money.

    Alas dear troll, if indeed you were not afraid of spam you would not be hiding your email address at all.

  8. Counters. by rew · · Score: 1

    I can show many spams that have a counter in the subject.

    Slashdot won't allow me to post the comment if I quote them.

  9. Verify email addresses? by ocie · · Score: 1

    It seems to me that if there were a way to verify an email address with an ISP as legitimate, you could at least use this to filter out spam from addresses that you couldn't reply to. Could be a bit of a problem as those that first switch over to verifying email addresses would be the first targets for spam.

    This of course leaves yahoo mail, and other services where one can sign up for a valid email address online. But it seems that those services could implement some scheme whereby you can only send 50 pieces of email in the first two weeks of having your account. Does this sound workable?

    --
    JET Program: see Japan, meet intere
  10. Say, I Recognize This by waldoj · · Score: 2

    This certainly looks familiar.

    ;)

    No, I did propose something along these lines on Advogato back in February in a piece entitled "Realtime Worm Filtering System," but I'm not accusing the author of ripping off my blatantly obvious and not-uncommon idea. That system is intended to stop worms, obviously, and not spam. Worms tend to be easier to stop because they're seldom wholly polymorphic, often retaining enough similarities that collaborative filtering is quite feasible.

    -Waldo

  11. Re:Checksums? by Si · · Score: 1

    This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.

    Remove all digits? Although, if the spammers got smart and used hexadecimal or alphanumeric counters then you're stuffed.

    ...fuzzy filters...

    Now you're talking. Simply do a word count for 'Make' and 'money' and 'fast' and '!!!!' and use that as your spam baseline ;)

    ...or 'See', 'Natalie', 'Portman', 'naked' :)


    --


    Why is it that many people who claim to support standards have such atrocious spelling and grammar?
  12. Re:Checksums? by Pig+Hogger · · Score: 2
    However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric.
    Not really. You could break each SPAM in 3 to 5 parts, and have a checksum on each part. Unless the "counter" spans two parts, only one of the checksums would be different.

    And, if so, with cheap storage, why not store the whole SPAM; in case of a high number of checksum matches, a final precise double-check could be made.
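
    A rough Python sketch of the per-part idea, with made-up names (`part_checksums`, `matching_parts`) and MD5 picked arbitrarily. It splits by character position, which is an assumption; a counter that changes the message length would shift every later part.

```python
import hashlib

def part_checksums(body, parts=4):
    """Split `body` into `parts` roughly equal pieces and checksum each one."""
    size = max(1, -(-len(body) // parts))  # ceiling division
    chunks = [body[i:i + size] for i in range(0, len(body), size)]
    return [hashlib.md5(c.encode("utf-8")).hexdigest() for c in chunks]

def matching_parts(a, b, parts=4):
    """Count positions where the two messages' part checksums agree."""
    return sum(x == y for x, y in zip(part_checksums(a, parts), part_checksums(b, parts)))
```

    With four parts and a counter confined to the last one, three of the four checksums still match.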

    --

  13. Re:"Pretty close" checksums? by Pig+Hogger · · Score: 2
  14. Re:Add invalid HTML tags by Black+Perl · · Score: 1
    So, add open and close tags at random

    e.g. "This is not spam"

    So, strip html before performing word/phrase analysis.
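
    Stripping the tags first is not much code. A minimal Python sketch using the standard library's `html.parser` (the names `_TextOnly` and `strip_tags` are invented here):

```python
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collect only the text nodes, discarding every tag (known or made up)."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)

def strip_tags(html_text):
    """Return `html_text` with all markup removed and the text pieces rejoined."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered after the last tag
    return "".join(parser.pieces)
```

    Tags inserted mid-word simply vanish, so the phrase matcher sees the words as the reader would.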
    --
    bp
  15. Re:What's the big deal? by Nickbot · · Score: 1

    So, which spamhaus do you work for?

    --
    Praise the Force Field! Praise the Laser Project! Slackware Loon #19830573
  16. Re:Add invalid HTML tags by emj · · Score: 1

    Checks like this are usually done by the MTA; as with RBL, you can just add a warning to the headers of the mails that this might be spam.

  17. Re:What's the big deal? by Eivind · · Score: 2

    Actually, 5K times 10 million is 50 gigabytes, not 50 megabytes. So it's a lot worse than you state above.

  18. Re:What's the big deal? by Eivind · · Score: 2
    $10 a GB is ridiculously low for most people on the Internet. It's possibly true for those with a flat-rate high-bandwidth connection, but if you think that's the majority, then you're in for a surprise.

    Here in Norway for example, which is probably about representative, about half the people dial into the Internet with modems, or by ISDN. Flat rate on telephone-calls is uncommon, the vast majority of that half pay about $1 an hour for the connection to the net. That works out as $50 a GB for those on ISDN, and $66 a GB for those on modem.

    Even this estimate still assumes that the link is perfectly full, that is, that a person with an ISDN connection downloads email at a rate of 64kbps, which isn't necessarily true (although it should be close for your ISP's local mailserver).
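
    The per-gigabyte arithmetic can be checked with a few lines of Python. The function name and the line rates plugged in are assumptions for illustration: $1 an hour comes from the post, while 33.6 kbit/s for a modem and a gigabyte of 10**9 bytes are guesses at what was meant.

```python
def dollars_per_gb(kbit_per_s, dollars_per_hour=1.0):
    """Cost to download 10**9 bytes over a fully used line billed by the hour."""
    bytes_per_second = kbit_per_s * 1000 / 8
    hours = 1e9 / bytes_per_second / 3600
    return hours * dollars_per_hour

modem = dollars_per_gb(33.6)  # roughly 66 dollars
isdn = dollars_per_gb(64)     # roughly 35 dollars at the full nominal rate
```

    The modem figure lands on the $66 quoted above. At a full 64 kbit/s the same formula gives less than the $50 quoted for ISDN, so that number presumably assumes lower effective throughput; either way it is several times the $10 a GB being disputed.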

  19. what about diff? by PotatoNO · · Score: 1
    I think that something like diff would lend itself better to spam detection.

    People forward their spam to a database. The database searches for similar entries using diff or keyword searches. Once the database gets two or three variants of a single piece of spam it should be able to come up with a pattern match. Sure it'd be CPU intensive, but someone clever could distribute it. It'd end up being kind of like a virus scanner.
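
    As a rough illustration of the idea, Python's `difflib` (a relative of diff, not the Unix tool itself) can score how alike two messages are. The helper names and the 0.8 threshold below are arbitrary assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Word-level similarity between two messages, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

def looks_like_known_spam(message, known_spam, threshold=0.8):
    """True if `message` is close to any stored spam sample (threshold is arbitrary)."""
    return any(similarity(message, s) >= threshold for s in known_spam)
```

    Comparing every incoming message against every stored sample is exactly the CPU-intensive part, so a real system would need some way to narrow the candidates first.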

  20. This already does not work by Skapare · · Score: 1

    Many spammers are already including the following tricks (I've seen them all):

    • Inserting your email address in the message body.
    • Inserting a sequence number or random hash.
    • Varying the message body content in many ways.

    While not all spammers are doing this, yet, that some are indicates that newer spamware has this capability. Spammers are already aware of the increased bandwidths they have and taking advantage of that to personalize the messages in some way. For example the spam I get to help me enter my website (which I get many times for each of my domains) on search engines generally lists the name of my site in the message body. This is a technique that might have worked 3 years ago, but it is not as effective now, and looks like it will be ineffective within a few months of broad use.

    --
    now we need to go OSS in diesel cars
  21. Re:Brand new spam filtering technology! by Skapare · · Score: 1

    That's old technology. Obviously it shows how inept you are. I've already had to deal with a spam attack on a server just this morning where even the rejected attempts (2-3 per second!) were slowing it down. Then even with an ipfilter they are still SYN pounding it. Your delete button doesn't solve the problem. You're years behind what's even going on. But maybe you can learn new stuff when you finally grow up to college age (if you can pass your exams).

    --
    now we need to go OSS in diesel cars
  22. Re:What's the big deal? by Skapare · · Score: 2

    Paper spam has never been as significant a problem as electronic spam, because the sender pays most of the costs for paper spam whereas the receiver pays most of the costs for electronic spam. There is an economic throttle for the sender of paper spam. If we allow electronic spam to simply continue, it will scale up as most businesses would then perceive it to be legitimate. You'd end up having to delete thousands and tens of thousands per day. It would keep growing if there is the perception that it is legitimate and that it cost you nothing to delete.

    Electronic spam does cost the receiver time and money. This includes the receiver's ISP. If you are on a dialup line (as most people still are because of the DSL debacle) the spam takes up more time on your mail downloads. As the problem grows it takes more time.

    To sum it up, it might not appear to be that much of a problem for you at this moment, but if you scale it up to where it would be if no effort was made to stop it, you would not be able to handle the load. Some of us do understand the scaling issue. If every business in the world sent you ONE message PER YEAR, and somehow this were just evenly spread out in time, you would be deleting this crap every 2 to 3 seconds, 24 hours a day, 7 days a week, all year long. The scale of the internet is simply not suited for spam.

    If you really have to get back to work, what do you do? Do you send spam all day, or do you delete it? Or do you just not get much of it?

    --
    now we need to go OSS in diesel cars
  23. Re:Just use mail filters by Skapare · · Score: 3

    Show me one that works on my mail server without overloading it. Mail comes in at a rate of about 20 per second. It will need to check it all. If you think the problem is solved at the client, you misunderstand the problem.

    --
    now we need to go OSS in diesel cars
  24. Re:Spam Hunters by sharkey · · Score: 2

    New this fall on FOX:

    Lorenzo Lamas stars in e-Renegade!

    Reno Raines is back! After being forced at gunpoint to break RSA's strongest encryption while getting a blow-job, Reno is wanted by the Financial Businessmen Incorporated, the FBI, for violation of the DMCA! On the run from bought-and-paid-for law enforcement, Reno has changed his identity and now works for his Native American friend, Robbie Spamkiller.

    Chasing down unlicensed spammers, Reno searches for the evidence that will clear his name, bring justice to those who "blew" his career and reputation, and let him marry Robbie's sister, Cheyenne "Shy" Phillipshead.

    --
    "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
  25. Phone #==goatse.cx by sharkey · · Score: 2

    (317) 872-2225

    This is Customer Service for Comcast Cable in Indianapolis. I would guess it's as close as you can come on the phone.

    --
    "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
  26. The way I heard it by CaptainSuperBoy · · Score: 2
    The way I heard it, they trained it using pictures of tanks, and pictures that weren't tanks. Of course, the pictures of tanks were taken in broad daylight, while the control group pictures were taken later the same day, when it wasn't as bright.

    Who knows if this actually happened.. It's really too bad that AI professors can't get their own material. I'm sure EVERY compsci student who took a software engineering class heard the anecdote about the computer-controlled radiation/x-ray machine, that killed a patient by giving them like 10,000 times the normal dose. This error was traced to a lack of bounds checking in software.

    --

    1. Re:The way I heard it by DNS-and-BIND · · Score: 1
      Yup, it's true. It didn't just happen one time...

      Reference here with the priceless quote, "There was only one person programming the code for this system and he largely did all the testing". A chilling excerpt from the URL:

      "A month later at the same hospital, with the same technician another fatal dosage was given. The technician made the same error of quickly changing the mode from X-ray mode to Electron mode using the 'cursor up' key. This again caused "Malfunction 54". The patient this time was receiving treatment on his face. When the overdose was administered he yelled and then began to moan. The audio equipment was working this time but the initial dose was too much for the man. He received severe neurological damage, fell into a coma and died only 3 weeks later.

      Another reference here.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
  27. My Life as a Spammer by cpeterso · · Score: 1


    I would love to read an interview with a spammer about their business. Obviously, spam works or they wouldn't do it. Who are their customers? How many people respond to their spam? How did they get involved in the spam market? Maybe Slashdot should have an "Ask a Spammer" interview??

    1. Re:My Life as a Spammer by crucini · · Score: 2

      I think you want Behind Enemy Lines.

  28. Spamblocker, anyone? by nion · · Score: 1

    I don't know about anyone else here, but I use the Spambouncer procmail filter to ferret out my inbox. It checks MAPS, ORBS, parses the message for obvious 'spam'-type words and phrases (Make Money Fast!!!) and then allows you to either route the mail to /dev/null, bounce it, report it or both bounce and report... Not too hard to configure for your individual users or on a global system-level either.

    --
    der dee der.
  29. Re:Just Because they would counter it. by Neon+Spiral+Injector · · Score: 2

    Actually they are already countering it without even knowing about it.

    A lot of spam I get already has a unique identifying ID included in it. I assume this is to track valid e-mail addresses of people stupid enough to try to be "removed" from their lists.

    --

  30. Re:Checksums? by Neon+Spiral+Injector · · Score: 2

    However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters that determined how close you would want to be before outright rejecting a similar message, or maybe just relocating it to a separate inbox.

    Well with a CRC I guess a slightly changed message will only have a slightly different checksum. But there is a good chance that 2 dissimilar messages will have the same sum. You'd need something like a large md5 sum to make sure your false positives are low. But the problem with md5 is that just changing 1 byte largely affects the sum. So there would be no fuzzy matching.
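
    The trade-off is easy to see in Python. Note the assumption: the "gentle" checksum below is a toy byte sum, not a CRC (a real CRC is not this tame), and the sample messages are invented.

```python
import hashlib

def additive_checksum(data):
    """Toy 16-bit checksum: the sum of all bytes, modulo 65536."""
    return sum(data) % 65536

def differing_bits(a, b):
    """Number of bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Make money fast! Offer 1000"
tweaked = b"Make money fast! Offer 1001"

# The byte sum moves by exactly the size of the edit...
sum_delta = additive_checksum(tweaked) - additive_checksum(original)
# ...while MD5 typically flips around half of its 128 output bits.
md5_delta = differing_bits(hashlib.md5(original).digest(), hashlib.md5(tweaked).digest())
```

    The byte sum changes by 1 but collides trivially (any reordering of the same bytes gives the same sum), whereas the MD5 digests share nothing useful for "closeness".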

    --

  31. Re:I can't see this working by x+mani+x · · Score: 2

    that's funny, my first AI prof told us the exact same anecdote. It seems to be pretty popular in AI circles, as I've seen it on several machine learning websites as well. :)

  32. I can't see this working by x+mani+x · · Score: 5

    Checksums do not change gracefully given different inputs. As in, if there's the slightest change in a spam email, let's say the date and sendto in the email header change, the entire checksum will appear completely different. Therefore the checksums will only apply to specific spam messages, and not entire classes of similar spam emails (this would be the desirable solution). And most spam mails these days are smart enough to put your name or something in the email subject and body.

    A more robust method of spam detection, IMHO, would be to develop an algorithm that would take emails, and encode them in a way that they could be input to a neural network. The output of the network would be 0=not spam/1=spam ... there's definitely enough examples out there for it to learn from. The hardest part, as usual, would be to find a way to encode the emails. So let's say you receive an email. Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder.
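
    To make the encoding step concrete, here is a deliberately tiny Python sketch. Everything in it is an assumption for illustration: a bag-of-words encoding over a hand-picked vocabulary, and a single perceptron standing in for the neural network (the simplest possible case, nowhere near a production classifier).

```python
def encode(text, vocabulary):
    """Encode an email as word counts over a fixed vocabulary (bag of words)."""
    words = text.lower().split()
    return [words.count(w) for w in vocabulary]

def predict(weights, bias, features):
    """Single-neuron network with a step activation: 1 = spam, 0 = not spam."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(samples, vocabulary, epochs=20, rate=1.0):
    """Perceptron learning rule over (text, label) pairs."""
    weights, bias = [0.0] * len(vocabulary), 0.0
    for _ in range(epochs):
        for text, label in samples:
            features = encode(text, vocabulary)
            error = label - predict(weights, bias, features)
            weights = [w + rate * error * x for w, x in zip(weights, features)]
            bias += rate * error
    return weights, bias
```

    Trained on a handful of labelled messages, `predict(weights, bias, encode(new_mail, vocabulary))` gives the 0/1 answer described above; choosing a vocabulary that spammers can't trivially dodge is the hard part.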

    If anyone with some machine learning experience wants to work on a project like this with me, send me an email!

    1. Re:I can't see this working by IIH · · Score: 2
      Your client then encodes it, and sends the encoding to a local or remote server with the trained neural net. It returns with the results, and your client either dumps the email to your inbox or your spam folder

      You'd also have to ensure that ISPs are clued up enough to turn off this feature for abuse@ mailboxes, for obvious reasons!
      --

      --
      Exigo spamos et dona ferentes
    2. Re:I can't see this working by hal200 · · Score: 2

      Actually, I've been slowly plunking away at a spam recognition tool based on Thomas Landauer's work on Latent Semantic Analysis. (Try http://lsa.colorado.edu)

      I attended a talk Dr.Landauer did on it a couple years ago, and one of the more interesting uses for the system is text categorization (They were using it to mark term papers...this paper is similar to an A paper...this paper is similar to a D, that sort of thing...they actually got a fairly high correlation with human markers)

      Anyway, I started to wonder if it could be applied to spam hunting...should I ever get the system to a usable state, that is ('training' the system requires some rather large matrix manipulations...and my poor dual Celeron just couldn't handle it...27-42 hours worth of processing time on the small samples I was working with for a term paper at the time)

      Given that I've upgraded to a significantly faster machine since then, if I were to take some time to optimize the code I might be able to get it down to the point where I could start training it on my ever-growing "Library Of Spam".

      Of course, I'm probably one of the few ppl on the planet who actually COLLECTS spam...and my friends tell me I need a gf! ;)

      Anyway, at the point it's at now, it's still at just a 'hey, wouldn't it be neat if' stage...I honestly haven't a clue how well it will work...

      Who knows? The analysis might make an interesting master's thesis some day...It would certainly be handy to have a research-class number cruncher to handle the matrices involved...

      --

      I just want to take over the world...Why does that automatically make me EVIL?

    3. Re:I can't see this working by 11223 · · Score: 4
      A neural net anecdote from a teacher of mine:

      A few years ago, during the big push for a "smart army", millions of dollars were poured into having individual tanks recognize enemy tanks on the battlefield. Well, it turns out they did it with a neural network, and after quite a bit of training they got it to reliably recognize enemy tanks as such.

      Then, the eventual day when the general shows up arrived, and they had to give the demo. As you can probably predict, it crashed and burned. Why? Well, the system was trained on bright, sunny days in the middle of the desert (real sun!), and the demo was on the first overcast day in a year, and the neural net had trained itself to recognize the *shadow* of a tank, not the tank itself.

      Caveat neural-net-user.

    4. Re:I can't see this working by Sven+Tuerpe · · Score: 3
      Checksums do not change gracefully given different inputs.

      It depends. If we think of cryptographic hash functions, you are right. They are designed that way in order to avoid collisions and forging of messages that are mapped to a given value by a particular function.

      But if we think of error correcting codes, the situation is different. They are designed with the opposite goal in mind -- changing gracefully when certain errors (i.e., small changes for some definition of "small") occur, to allow for reconstruction of the original data.

      Usually both the checksum and the corrupted data (or the corrupted data + checksum string, to be precise) are needed in the case of error correcting codes. But perhaps concepts from both -- closely related -- fields could be combined to create something usable for spam detection under hostile spammer conditions?

      --
      http://erichsieht.wordpress.com/category/english/
    5. Re:I can't see this working by paranoidia · · Score: 1

      Neural networks really need a good dataset to learn off of. This dataset has to have variables, some important, and some un-important. The important ones are then calculated out to see how important they are, and then put into the final equation. The spam e-mail could not be put into variables very easily. Any variables that we could think up (i.e. count of certain words) could be easily bypassed by the spammers.

  33. Checksums? by Matt2000 · · Score: 4


    This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.

    However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters that determined how close you would want to be before outright rejecting a similar message, or maybe just relocating it to a separate inbox.

    --

    1. Re:Checksums? by mpe · · Score: 2

      This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.

      You'd effectively be forcing the spammer to send every email, i.e. they could no longer rely on simply feeding a relay machine a string of RCPT TO commands.
      Thus spamming becomes far more difficult.

    2. Re:Checksums? by mpe · · Score: 2

      I believe Ron Rivest had an idea about how to handle spam: make anyone who sends email to you perform a small computational task in order for the message to get through. The task would be something like factoring an N-bit number, with N tweaked to adjust the difficulty.

      An alternative would be to send everything with public key encryption. Though you'd need to devise a DNS like mechanism for distributing public keys. (You also want to cut out as much relaying as possible, since a third party relay will never have access to the private key.)

    3. Re:Checksums? by greenrd · · Score: 1
      I actually had to do this once (it was a visual task - just click on an item in an imagemap). Cool idea. Can't remember what domain it was for though.

    4. Re:Checksums? by aallan · · Score: 1

      This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.

      Agreed, this doesn't really sound like a workable solution.

      However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters that determined how close you would want to be before outright rejecting a similar message, or maybe just relocating it to a separate inbox.

      I've been thinking about how the internet is evolving quite a bit over the last couple of weeks. I'm a regular on one of the few USENET groups left where the content to flamage ratio is still heavily skewed towards content. Except that in the last couple of weeks we've been hit hard by several trolls, and a bunch of people that don't know the difference between USENET and a web board.

      While I agree with you that some sort of fuzzy logic filter could do the job, and neural nets and genetic algorithms also spring to mind as possible solutions, I just think it says it all if we have to start integrating this sort of stuff into anything handling SMTP traffic just to keep the spam down.

      Al.
      --
      --
      The Daily ACK - Eclectic posts by yet another hacker
    5. Re:Checksums? by philfr · · Score: 1
      This sounds like a terrible plan. As mentioned, a simple counter would blow this thing out immediately.

      Well, I am not convinced spammers could do this without cost. In fact, they mostly use open relaying MTAs to send their spam once to thousands of recipients. So adding a counter to the body of their e-mail would force them to send it once per recipient, breaking the cost-effectiveness of spam.

      Just my two eurocents.

    6. Re:Checksums? by 4of12 · · Score: 2

      Pretty impressive procedure!

      I would have thought that going to the next level of spam filtering would require shoving messages of dubious origin into some delayed-delivery hopper that would be scrutinized carefully against the results of incoming messages from throw-away spam-gathering accounts on other machines.

      Your system of historical analysis makes it possible to defer the date when we will be forced to resort to multi-account inbox comparisons to filter out spam.

      --
      "Provided by the management for your protection."
    7. Re:Checksums? by friscolr · · Score: 3
      However, a number that represented how closely related an incoming email and a known spam message are would be a useful metric. Then you could have fuzzy filters

      i tried that, had very good success. read more about it at:

      http://www.blackant.net/code/oth/random/nlp-spamfilter.php

      i collected a sample of 30-plus spam messages as well as 30-plus not spam messages and ran some word and phrase frequency counts on each group, then threw that data into a couple mysql tables. Next i match the phrase and word frequency counts to new mail that arrives, and depending on how closely the new mail matches the known groups, i can tell whether or not the mail is spam.

      by tweaking the exact amount needed to be determined as spam or not-spam, i had very, very good success rate - out of 32 messages checked using this method, all were appropriately identified as either spam or not-spam.
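
      For anyone curious what that kind of frequency matching looks like, here is a rough sketch. It is not the code behind the link above: it keeps the counts in memory rather than mysql, uses single words only (no phrases), and the scoring rule, sample messages and threshold are all assumptions for illustration.

```python
import re
from collections import Counter


def word_counts(messages):
    """Total word frequencies across a list of message bodies."""
    counts = Counter()
    for body in messages:
        counts.update(re.findall(r"[a-z']+", body.lower()))
    return counts


def spam_score(body, spam_counts, ham_counts):
    """Fraction of the message's known words that are relatively more common in spam."""
    spam_total = sum(spam_counts.values()) or 1
    ham_total = sum(ham_counts.values()) or 1
    spammy = known = 0
    for word in re.findall(r"[a-z']+", body.lower()):
        s = spam_counts[word] / spam_total
        h = ham_counts[word] / ham_total
        if s == 0 and h == 0:
            continue  # word never seen in either sample group
        known += 1
        if s > h:
            spammy += 1
    return spammy / known if known else 0.0


# Tiny made-up sample groups standing in for the 30-plus messages of each kind.
spam_counts = word_counts(["Make money fast, click here now",
                           "Free money offer, click now"])
ham_counts = word_counts(["Meeting moved to Tuesday, see agenda",
                          "Patch attached, please review the code"])

THRESHOLD = 0.5  # the tweakable cut-off between spam and not-spam
score_a = spam_score("Click here for free money", spam_counts, ham_counts)
score_b = spam_score("Please review the agenda for Tuesday", spam_counts, ham_counts)
print(score_a > THRESHOLD, score_b > THRESHOLD)
```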

      I've been meaning to continue with this line of spam detection, increasing the size of the db and testing it on a larger sample of mail (read: all my mail) and then seeing if the results were still as good, but...

      -f

    8. Re:Checksums? by jongus · · Score: 1

      Naturally, but if you combine md5 with a CRC, you might have something.

      After that, you just let the user specify a threshold value. Of course, no one would receive a newsletter from Amazon or something similar, unless you introduced a concept of 'trusted addresses' or the like.

    9. Re:Checksums? by stevelaniel · · Score: 1
      I believe Ron Rivest had an idea about how to handle spam: make anyone who sends email to you perform a small computational task in order for the message to get through. The task would be something like factoring an N-bit number, with N tweaked to adjust the difficulty. (Actually, I believe they used a problem whose difficulty is known exactly, whereas the computational complexity of factoring is not presently known.) The idea is that only people who send out huge amounts of mail -- like spammers -- would find the total computational challenge daunting. The rest of us, sending even 300 messages a day, would see no speed penalty. I like the idea.
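
      To make the "small computational task" idea concrete, here is a sketch that swaps the factoring task for a hash puzzle: the sender must find a nonce so that a SHA-256 hash starts with a given number of zero bits. This is an illustration of the general idea only, not a reconstruction of the scheme in the paper mentioned; the function names, stamp format and difficulty value are all assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def stamp_digest(recipient: str, message_id: str, nonce: int) -> bytes:
    # Hypothetical stamp format: recipient, message id and nonce joined by colons.
    return hashlib.sha256(f"{recipient}:{message_id}:{nonce}".encode()).digest()


def mint_stamp(recipient: str, message_id: str, difficulty: int) -> int:
    """Sender side: brute-force a nonce, about 2**difficulty hashes on average."""
    for nonce in count():
        if leading_zero_bits(stamp_digest(recipient, message_id, nonce)) >= difficulty:
            return nonce


def check_stamp(recipient: str, message_id: str, nonce: int, difficulty: int) -> bool:
    """Receiver side: a single hash is enough to verify the work."""
    return leading_zero_bits(stamp_digest(recipient, message_id, nonce)) >= difficulty


nonce = mint_stamp("alice@example.com", "msg-1", difficulty=12)
print(nonce, check_stamp("alice@example.com", "msg-1", nonce, 12))
```

      The asymmetry is the point: minting costs thousands of hashes per recipient, checking costs one, and raising the difficulty by one bit doubles the sender's expected work.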

      If someone can find a citation on this, that would be great. Rivest's paper was cited in Peer-To-Peer: Harnessing The Power Of Disruptive Technologies, published by O'Reilly and edited by Andy Oram. I wish I had a copy of it on me.

  34. Re:Cell phones are great by macsforever2001 · · Score: 1

    Also, when they are stupid enough to put an 800 type number up, call from a *pay phone*. Why? Because it is untraceable and it costs the spammer whatever the pay phone costs ($0.35 in most of the USA right now) plus long distance charges. I keep a list of all the spammers' phone numbers that I need to put on the internet for all to benefit.

  35. Better idea for checksum clearinghouse? by Hobart · · Score: 3
    Seeing as a key element of spam messages is to get people to visit particular URLs, reply to particular email addresses, or call particular phone numbers, perhaps focusing on algorithms that identify these components and check their hashes against a database would be more effective?
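
    A minimal sketch of that idea, assuming the "components" are URLs and email addresses pulled out with deliberately simple regular expressions and hashed with SHA-1. The patterns, function name and sample messages are illustrative assumptions; a real extractor would need to be far more careful.

```python
import hashlib
import re

# Deliberately simple patterns; they will miss or mangle some real-world cases.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contact_hashes(body: str) -> set:
    """Hash every URL and email address found in a message body."""
    found = URL_RE.findall(body) + EMAIL_RE.findall(body)
    return {hashlib.sha1(item.lower().encode()).hexdigest() for item in found}


# A reported spam seeds the (here in-memory) database of known hashes.
reported = "Hi Bob, visit http://pills.example.com/buy or mail sales@pills.example.com today"
known_hashes = contact_hashes(reported)

# A personalised variant still advertises the same URL.
variant = "Dear Alice 4821 visit http://pills.example.com/buy now"
legit = "Lunch at noon? Reply to me at carol@example.org"

print(bool(contact_hashes(variant) & known_hashes))  # shares a hash with the report
print(bool(contact_hashes(legit) & known_hashes))    # shares nothing
```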
    --
    o/~ Join us now and share the software ...
  36. Re:Worms? by mpe · · Score: 2

    It is claimed that over 90% of spam is sent through open relays, meaning that the spammer uses multiple RCPT TO commands and sends the identical message to each recipient.

    This also makes spamming "hit and run". By the time the spam starts arriving the spammer has gone.

    Most spammers don't have the bandwidth that it takes to send each user a personalized message, because they are almost always on a throwaway dialup.

    They also need processing power to do the personalisation, software which understands the full SMTP spec (rather than that required to get by sending to a relay) and can handle identd requests.

    Only the professionals can afford to send unique messages, because they often have a DSL line and a pink contract with their ISP (which permits them to continue spamming).

    They also need a frequently changing IP address...

  37. Re:Countermeasures by mpe · · Score: 2

    re: Countermeasures: the spammer would integrate something random into the message that would foul identification. There is simply no way around this. So the question becomes: at what point does the countermeasure become so expensive and difficult that the spam itself reaches the point of diminishing returns?

    Forcing spammers to customise each email would make spamming considerably more expensive. Because they then have to actually send each email, rather than being able to use third party relay machines to duplicate their junk.

  38. Re:bulk-mail should be refused by default by mpe · · Score: 2

    What you are describing is basically a "Teergrube" (German for tar pit).

    Problem is that ISP provided third party relays render this method useless...

  39. Re:hmm by csbruce · · Score: 2

    What you really need is some generic mail-message pattern-matching and a complaint & moderation system. You don't really need automatic detection of spam, since there would probably be plenty of people willing to complain if there was an effective place to complain to, and if mail clients as well as mail servers could consult the spam-detection service to eliminate confirmed spam before it reaches your eyeballs.

  40. Re:Issues... by Flounder · · Score: 4
    I submitted a story about building a steam-powered microprocessor with RAM made out of banana peels, and that didn't get posted--why this?

    Because everybody knows that Orange rinds offer better memory density than banana peels. And orange peels are more resistant to the excess steam from the CPU. Banana peels would just disintegrate with even a minimal amount of overclocking.

    --

    No boom today. Boom tomorrow. There's always a boom tomorrow. - Cmdr. Susan Ivanova

  41. Cheap signature algorithm? by jamiefaye · · Score: 1

    Let's invent a cheap signature algorithm which you can run on a message to digest it into a form you can compare with a blacklist, but which can cope with simple measures to vary the contents. (I am using the term "signature" as in "signature analysis used by hardware engineers for fault detection".)

    One idea (let's hear more):

    Compute a word-frequency histogram. Reject noise words like "a" and "the". Include long words that are relatively rare. Take the top 5 words that are common and the top 5 words that are long and store them with a frequency count.

    To compare similarity, consider each word as a separate dimension and calculate the Euclidean distance.
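
    A rough sketch of that recipe, with a few liberties labelled as such: the noise-word list is a tiny stand-in, "long words that are relatively rare" is approximated by simply taking the longest distinct words, and digits are dropped by the tokenizer. All names here are made up for illustration.

```python
import math
import re
from collections import Counter

# Stand-in noise-word list; a real one would be much longer.
NOISE_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "for", "you", "your"}


def signature(body: str, top_n: int = 5) -> dict:
    """Map the top_n most common and top_n longest words to their frequency counts."""
    words = [w for w in re.findall(r"[a-z]+", body.lower()) if w not in NOISE_WORDS]
    counts = Counter(words)
    common = [w for w, _ in counts.most_common(top_n)]
    longest = sorted(counts, key=lambda w: (-len(w), w))[:top_n]
    return {w: counts[w] for w in set(common) | set(longest)}


def distance(sig_a: dict, sig_b: dict) -> float:
    """Euclidean distance, treating each word as its own dimension."""
    words = set(sig_a) | set(sig_b)
    return math.sqrt(sum((sig_a.get(w, 0) - sig_b.get(w, 0)) ** 2 for w in words))


spam_a = "Refinance your mortgage today! Refinance rates are low, mortgage approval guaranteed. Ref 1234"
spam_b = "Refinance your mortgage today! Refinance rates are low, mortgage approval guaranteed. Ref 9876"
other = "The quarterly budget meeting is moved to Thursday afternoon"

print(distance(signature(spam_a), signature(spam_b)))  # counter change is invisible
print(distance(signature(spam_a), signature(other)))   # unrelated text is far away
```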

    -- Jamie

  42. Spammers already break this by mattvd · · Score: 1

    A lot of the spam I receive already contains random data in the subject line (or in the body of the message) to break this. This is why the subject of some spam looks like "Free pr0n 3j1I". I believe this practice goes way back to when bots would scan newsgroups and kill spam messages. The random subject lines would render them useless.

  43. Re:False Positives by Blrfl · · Score: 2
    Matthias Wiesmann writes:

    While the system could be broken by using counters, this could be countered by parsing only a certain portion of the mail or counting the frequency of certain words. Would work very well on pure text spam, but not on attachment stuff.

    Actually, that technique works reasonably well.

    I used to administer the trouble ticket system for a very large ISP that got so many complaints that they became unmanageable. (Not all their fault, but that's another story.) Anyway, we had software that would take the bodies of the emails being complained about, remove whitespace and anything that wasn't in the dictionary, sort it, uniq it and generate an MD5 of the list of words that came out. I never studied it over the long haul, but tests on live data showed a match rate of about 90%.
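
    The pipeline described above (keep dictionary words, sort, uniq, MD5) can be sketched in a few lines. This is a guess at the shape of it, not the ISP's actual software; the tiny `DICTIONARY` set stands in for a real word list and the function name is made up.

```python
import hashlib
import re

# Stand-in word list; a real system would load a full dictionary file.
DICTIONARY = {"make", "money", "fast", "click", "here", "now", "free", "offer"}


def bag_of_words_md5(body: str, dictionary=DICTIONARY) -> str:
    """MD5 of the sorted, de-duplicated dictionary words in a message body."""
    words = re.findall(r"[a-z]+", body.lower())
    kept = sorted(set(w for w in words if w in dictionary))  # sort + uniq
    return hashlib.md5(" ".join(kept).encode()).hexdigest()


# Same pitch, different order, spacing, case and random junk.
first = "Make money FAST!!! Click here now. id=xk3j9"
second = "click here NOW -- make   money fast (ref qz7)"
different = "free offer"

print(bag_of_words_md5(first) == bag_of_words_md5(second))
print(bag_of_words_md5(first) == bag_of_words_md5(different))
```

    Random tokens and counters fall out because they are not dictionary words, so they never reach the hash.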

    The real flaw in DCC is that it doesn't protect early recipients of the spam, because it won't have built up enough hits to be considered bulky. The only way to make it work would be to submit the checksum and hold the letter for some amount of time to see how bulky it gets. Most people would probably not like the lag time they'd get on legitimate mail.

  44. Re:Just Because they would counter it. by greenrd · · Score: 1
    Maybe you just aren't on many lists yet.

  45. Better idea: Suing by greenrd · · Score: 1

    I had an idea for detecting and proving when a site has sold your email address or spammed you - I posted it to comp.mail.misc here:

    http://groups.google.com/groups?q=author:greenrd%40hotmail.com&hl=en&safe=off&rnum=1&selm=f9cd2ccc.0107291915.68572f17%40posting.google.com

    Here is the post:

    ---
    Lots of websites now have privacy policies saying "we will not sell
    your email address, or send you unsolicited emails" - sometimes you
    have to check a checkbox to make it come into effect. But how can you
    trust them? Well with this hypothetical idea, you wouldn't have to:

    The first part is easy, and well-known. Generate a one-time email
    address (various means are available). Associate it with the site
    (e.g. by naming it something like fake-addy-ebay@mycomputer.com if
    you're registering with ebay, say) Give it to the sign-up form,
    purchase form or whatever. If you actually want to receive a limited
    kind of email from them, or want to know if/when they've broken their
    promise, ensure that this one-time email addy forwards to a real
    address of yours, or at least ensure that you'll be able to read mail
    sent to it.

    Trivial extension (and too trivial to be patentable, besides, this
    post constitutes sufficient Prior Art) - How can you prove that you've
    never used this email address again, by accident or on purpose, in
    order to nail the spammers in court? You can't on your own - but what
    about a trusted third party? Call it TTP. In order to make the process
    virtually beyond suspicion, TTP would provide special form-filling
    software, activated by you the user. When you're asked for your email
    address by a site you don't trust, you'd activate the software and it
    would send the form to TTPs servers, which would generate a one-time
    email address, store it in their database, and forward the filled-in
    form to the real site (transferring an existing session to another IP
    could be tricky, but you'd probably just have to log in to the site
    again through TTP's proxy if you weren't already using it - and in
    most cases you wouldn't be logged in to the site yet, you'd still be
    registering). TTP database would also record which privacy options
    you'd ticked on the form. The real generated email address is NEVER
    transmitted to your machine - the software is designed so it's
    virtually impossible for the user to surreptitiously find out what the
    email address is. All mail sent to that address (up to say 100 emails)
    would be logged in TTP's database, and forwarded to the user with the
    To address replaced with the user's real address. Total storage space
    required per user on TTP's servers: minuscule.

    Possible problems:

    1. Would a court trust TTP sufficiently to make their evidence pass
    muster on its own?

    2. How could TTP prevent a malicious user finding out the generated
    email address by filling out a form on a server which THEY (the
    malicious user) owned or had access to? Fortunately, they don't have
    to PREVENT it - all they have to do is RECORD where the data was sent
    - so if the address was actually sent to haxxors.com owned at the time
    by J. Cracker, and the complaint is by J. Cracker against yahoo.com,
    you can be pretty sure it's a scam. Heh.

    3. Obviously, you have to trust TTP itself with your personal info!
    That's why it's called a Trusted Third Party, duh! ;-)

    If these problems can be overcome, the best part is, if someone is
    stupid enough to sell your one-time email address to hundreds of
    spammers, you could use this virtually cast-iron evidence from TTP to
    sue both the list-seller for breach of contract (or whatever laws are
    most suitable to sue them under) AND ALL the spammers you could track
    down to a physical address (if you're in a suitable antispam
    jurisdiction)! Catch them red-handed! If they're selling something
    they have to be traceable to a physical address. And it's not only you
    that benefits - ANY rogue company would think twice about selling
    their email lists after one or two high-profile cases like that.

    If you wanted to be REALLY REALLY secure against arguments that "TTP
    could have issued same email addy twice by accident" - but this is
    probably over the top - maybe you could get it notarised with a
    "Trusted Fourth Party" specialising in notarising (but it'd have to be
    cheap). Disclaimer: I know nothing about notarising.

    Now, one unsolicited email is not necessarily enough to interest the
    courts in all antispam jurisdictions. But with this process automated,
    it'd be far easier to form a class action suit to make it more
    sizeable (I would imagine - IANAL) - when one stupid company sent out
    spams to 10,000 TTP users that had registered with them - especially
    REPEATED "this is a one time mailing" spams, grrrrrr - they'd be
    toast! And in some jurisdictions TTP could join the class action suit
    and claim even more damages, because it'd in effect be the ISP for
    those email addresses! (Remember, TTP is not just a spam honeypot -
    you can choose to receive legitimate kinds of emails through it - so I
    wouldn't imagine the defendants could seriously argue it was
    entrapment)

    If anyone's seen this idea before somewhere, please point me in the
    direction...

    If this works, someone who got in first with being a Trusted Third
    Party in this scheme could clean up... if lots of people care that
    much about nailing spammers... and I think they do! I would DEFINITELY
    pay a modest amount to use this kind of service!

    Let's look at the ideal scenario:

    1. You never list your email addresses anywhere public (at least not
    without spamproofing them first)
    2. You use TTP software for all your transactions, because it's so
    easy

    3. ANY spam you get can be tracked down to either one of:

    i) A rogue company who you can PROVE in court either spammed you, or
    sold your address without your permission.
    ii) If TTP has no record of it, you can be 99% sure it's because you
    have a rogue email PROVIDER who sold your email address, or listed you
    in a public member directory even though you told it not to. In this
    case, you can't necessarily prove it, but there's a simple remedy -
    switch provider.

    Comments? Obvious flaws? It is very late so I might have missed
    something obvious. Please let me know - but please DON'T email me -
    I'll read replies on comp.mail.misc.

  46. Re:Cell phones are great by QuoteMstr · · Score: 2

    What is the phone equivalent of goatse.cx?

  47. Forget comparing spam! How about universal naming? by Myself · · Score: 2

    I, for one, am always pissed off when I spend hours on my dialup leeching pr0n from some newsgroup, only to discover that I already had it on my drive under a different name. Somewhere along the line, somebody renamed the series.

    A database of image characteristics (like those used by D'peg!) would make this less likely. People would be discouraged from changing the file's originally agreed-upon universal name.

    Publishers could upload their image characteristics into the database, along with a tag like "Originally from somepornsite.com". So if I someday come across an image I really like, I could check the database and see where to get the rest of the series. This would supersede obnoxious watermarking to indicate the source of an image.

    This could of course be used for mp3s too, which are all too often renamed incorrectly. Checksums would be enough for a particular song encoded by a particular encoder with particular parameters, but audio fingerprinting would be necessary to accommodate different encoders. I don't think that's a deal-killer.

    By the way, D'peg! is really neat, but it's amazingly slow the first time if you have a lot of images. (As in: My win98 uptime record is 11 days. Dpeg's projected completion time was 34. Good thing it can resume after a crash.)

  48. Re:Just use mail filters by yellowstone · · Score: 2
    Thus spake Skapare
    Show me one that works on my mail server without overloading it.
    Well, simple mail filters aren't going to overload your mail server any more than computing a checksum on each piece of email, and then querying some database to see if it matches the checksum for known spam.

    Plus, mail filters have the benefit of not breaking in the face of a trivial change to the body (like a counter).

    --
    I have no fin
    no wing no stinger
    no claw no camouflage
    I have no more to say...

    --
    150 Opening BINARY mode data connection for slashdot.sig (129323052 bytes).
  49. Just use mail filters by yellowstone · · Score: 3
    I've found that a handful of simple mail filters takes care of much of the spam I receive:
    • Junk anything that comes BCC (preceded by a white-list of subscribed mailing lists). This takes care of 70-80% of the spam that comes my way.
    • Filter out by keywords in the subject (like "marketing", "webmaster", and "viagra"). This takes care of a good chunk of the rest.


    --
    I have no fin
    no wing no stinger
    no claw no camouflage
    I have no more to say...
    --
    150 Opening BINARY mode data connection for slashdot.sig (129323052 bytes).
  50. Filter messages before checksumming by hamjudo · · Score: 2
    Checksumming the raw message isn't of much value. It's an arms race. We'll have to have a way of dynamically updating filters.

    In addition to the raw message checksum, possible filters include:

    • checksum paragraphs individually
    • ignore whitespace, punctuation and capitalization.
    • drop HTML tags
    • drop numbers
    • drop all non-dictionary words.
    Then analyze what gets by and add new filters as appropriate.
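
    Several of the filters listed above chain together naturally. The sketch below applies them per paragraph before checksumming; it leaves out the dictionary filter, treats anything outside a-z as droppable (which would also discard accented letters), and uses MD5 purely as an example checksum. The function names are made up for illustration.

```python
import hashlib
import re


def normalize(paragraph: str) -> str:
    """Apply the cheap filters: strip tags, case, digits, punctuation, extra whitespace."""
    text = re.sub(r"<[^>]+>", " ", paragraph)  # drop HTML tags
    text = text.lower()                         # ignore capitalization
    text = re.sub(r"[^a-z\s]", "", text)        # drop numbers and punctuation
    return " ".join(text.split())               # collapse whitespace


def paragraph_checksums(body: str) -> list:
    """Checksum each non-empty paragraph individually after normalizing it."""
    sums = []
    for paragraph in re.split(r"\n\s*\n", body):
        cleaned = normalize(paragraph)
        if cleaned:
            sums.append(hashlib.md5(cleaned.encode()).hexdigest())
    return sums


html_variant = "<b>BUY NOW!!!</b>\n\nOffer #1234 ends   soon."
text_variant = "buy now\n\noffer #9999 ends soon"

print(paragraph_checksums(html_variant) == paragraph_checksums(text_variant))
```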
  51. Another idea by altair1 · · Score: 1

    A while ago I thought of another way spam could be blocked. Instead of checksumming the whole message, why not just create a database of say, phone numbers and fax numbers and domains included in spams? MTAs could check to see if an inbound email contains any spammer-advertised phone numbers or domains in a database and flag the message appropriately. Spammers cannot easily change telephone numbers.

    Spammers could write the phone numbers or domains oddly in the email to try and pass the filter, but a sufficiently liberal regular expression could pick it out.

    Speaking of regexps, maybe this database could be a giant database of regular expressions which match snippets of spam messages?
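
    As an illustration of a "sufficiently liberal" pattern, here is a sketch that looks for ten digits with up to three non-digit characters between each pair, then reduces the match to bare digits for the database lookup. The ten-digit assumption (North American style numbers), the pattern itself and the sample blacklist entry are all made up, and a pattern this loose will produce false positives on other digit runs.

```python
import re

# Ten digits, each pair separated by at most three non-digit characters.
PHONE_RE = re.compile(r"(?<!\d)(?:\d\D{0,3}){9}\d(?!\d)")


def phone_numbers(body: str) -> set:
    """Return phone-like numbers found in the body, reduced to bare digits."""
    return {re.sub(r"\D", "", match.group()) for match in PHONE_RE.finditer(body)}


# Hypothetical database of spammer-advertised numbers.
blacklist = {"5550100142"}

plain = "Call (555) 010-0142 now"
obfuscated = "call 5 5 5 - 0 1 0 - 0 1 4 2 today"

print(phone_numbers(plain) & blacklist)
print(phone_numbers(obfuscated) & blacklist)
```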

    1. Re:Another idea by J'raxis · · Score: 1

      Basically an ORBS/RBL for phone numbers?

  52. Fingerprints rather than checksums by n-russo · · Score: 1

    Some of the researchers associated with Google have been working on identifying similar, but not identical web pages. At a talk I attended, J. Cho described the process of fingerprinting documents (rather than checksumming them).

    These papers might be interesting:

  53. there is no technical solution to spam by jrennie · · Score: 1

    The fact of the matter is that, no matter how difficult we try to make it to send out mass, unsolicited e-mail, it will always be a cheap form of advertisement. Compared with other forms, spam is cheap and easily automatable. With little effort and cost, I can send spam to millions of unique individuals. The core function of the computer is automation. Technology is blind. You can't get the computer to automate most forms of communication without allowing the automation of unsolicited advertisement.

    Since we can't increase the technological costs of spam, the only good way to make spam more costly to the sender is to regulate it. The govt. should require that all spam have "[SPAM]" in the subject line, with additional labels for spam that advertises stuff that's inappropriate for certain groups of individuals (PORN, GOATSEX, etc.). Furthermore, the govt. should impose stiff fines and penalties for violators ($$$ & jail time, maybe even the chair?).

    It's nice to think that you can fix everything with technology. Over time, everyone comes to the realization that government is there for a reason; it's a necessary evil that does, on occasion, make the world a better place.

    Jason

    1. Re:there is no technical solution to spam by spectro · · Score: 1
      Since we can't increase the technological costs of spam, the only good way to make spam more costly to the sender is to regulate it.

      If MTAs can restrict the amount of email from the same source to no more than X emails a minute, this will make spammers' lives way harder. This restriction should be enabled by default in all new releases of MTA software.

      Authentication should be required to send unrestricted emails through your MTA, and big ISPs should charge for unrestricted access to their MTA.

      SPAM is free to send, we have to make it more expensive. Spammers will have to charge more to send spam making their customers think twice about it.

      ---

      --
      HTML is obsolete. It's time for a new, simpler and richer markup language.
  54. Re:The problem is... by jrennie · · Score: 2

    Naive Bayes is a damn good text classifier that has already proven to be a good spam identifier. The problem is that no such automated classifier system will ever be able to get rid of most spam without throwing away a few non-spam messages too. It's a fact of life.
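
    For reference, a word-count Naive Bayes classifier fits in a few lines. This is a toy sketch with Laplace smoothing and a four-message training set invented for illustration; it is not the system described at the link below, and it assumes at least one message of each class has been trained before scoring.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label), with Laplace smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # +1 smoothing so an unseen word does not zero out the whole product
            score += math.log((self.word_counts[label][word] + 1) / (total + len(vocab)))
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


nb = NaiveBayes()
nb.train("free money click now", "spam")
nb.train("win free money today", "spam")
nb.train("meeting agenda for tuesday", "ham")
nb.train("please review the patch", "ham")

print(nb.classify("free money now"))
print(nb.classify("please review the agenda"))
```

    The false-positive problem mentioned above shows up as soon as a legitimate message happens to use mostly "spammy" words; the classifier has no other evidence to go on.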

    Btw, check out

    http://www.picante.com/~gtaylor/spam/

    to read about someone's efforts to get rid of spam via a slew of techniques, including an automated classification system (Naive Bayes).

    Jason

  55. bulk-mail should be refused by default by spectro · · Score: 1
    I wonder why we just don't implement rules in our MTAs to not allow more than 10 or so emails per minute from the same source (domain or IP). Administrators would have the option to add a list of domains or IPs allowed to bulk-email, to allow for mailing lists.

    I think the next sendmail/postfix/whatever release should come with such a rule by default.
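
    The rule itself is simple enough to sketch as a sliding one-minute window per source with a whitelist for bulk senders. This is only an illustration of the logic, not a configuration for any real MTA; the class name, the default limit and the sample addresses are assumptions.

```python
from collections import defaultdict, deque


class RateLimiter:
    def __init__(self, max_per_minute=10, whitelist=()):
        self.max_per_minute = max_per_minute
        self.whitelist = set(whitelist)    # sources allowed to send bulk mail
        self.recent = defaultdict(deque)   # source -> timestamps of accepted mail

    def allow(self, source: str, now: float) -> bool:
        """Return True if mail from `source` at time `now` (seconds) should be accepted."""
        if source in self.whitelist:
            return True
        window = self.recent[source]
        while window and now - window[0] >= 60:
            window.popleft()               # forget mail older than one minute
        if len(window) >= self.max_per_minute:
            return False
        window.append(now)
        return True


limiter = RateLimiter(max_per_minute=3, whitelist={"lists.example.org"})
burst = [limiter.allow("192.0.2.7", t) for t in (0, 1, 2, 3)]
later = limiter.allow("192.0.2.7", 61)
bulk = [limiter.allow("lists.example.org", t) for t in (0, 0, 0, 0, 0)]
print(burst, later, bulk)
```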

    In the case of big ISPs, they should force POP auth or something to allow relaying and, if somebody sends spam, they JUST CHARGE HIM an insane amount of money for each spam sent.

    ---

    --
    HTML is obsolete. It's time for a new, simpler and richer markup language.
    1. Re:bulk-mail should be refused by default by Lars+T. · · Score: 1

      What you are describing is basically a "Teergrube" (German for tar pit). Read here about them: http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  56. Re:Cell phones are great by spectro · · Score: 2
    This would work great if "caller pays" like cellphones work in South America. Here, however, they would suck all your minutes.

    ---

    --
    HTML is obsolete. It's time for a new, simpler and richer markup language.
  57. Re:Laws about PRON by mrogers · · Score: 1

    Has it occurred to you that admins who run open relays probably don't check the mail of the postmaster account very often? They probably don't know how to set up the MTA to forward the postmaster's mail to a login account. (Maybe the box was set up by a friend, or the person who used to run it has left the company - either way, whoever's currently in charge of the box has no idea how to configure the MTA otherwise they wouldn't be wasting their resources running an open relay.)

    --

  58. man diff by mrogers · · Score: 2
    For plain text, you could just measure the length of the diff between the two messages. A simple counter would only change one line of the message.

    diff message1 message2 | wc -l
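
    The same idea can be sketched with Python's standard difflib. This is only an approximation of the pipeline above: it counts added and removed lines, while `diff | wc -l` also counts diff's own marker lines. The function name is made up for illustration.

```python
import difflib


def changed_line_count(text_a, text_b):
    """Count lines that were added or removed between two messages."""
    a = text_a.splitlines()
    b = text_b.splitlines()
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines start with "  " and hint lines with "? ".
    return sum(
        1 for line in difflib.ndiff(a, b) if line.startswith(("+ ", "- "))
    )
```

    Two copies of a spam that differ only in a counter line would score 2 (one line removed, one added), while unrelated messages would score roughly the sum of their line counts.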

    --

  59. Wrong Approach... by toupsie · · Score: 1

    This will never work! Once again we are trying to use the wrong tool for the job. The problem with this approach is that it focuses on the SPAM itself and not the SPAMMER. Killing SPAM will not stop the SPAMMER from SPAMMING again. However, there is an excellent research group that has developed tools specifically designed to eliminate the root cause of the SPAM problem. That's why I am proposing using Magnum Research's excellent anti-spammer utility. If every sysadmin would update their security tools with Magnum Research's hardware and use them daily against SPAMMERS, SPAM would be gone in a matter of months, if not weeks.

    --
    Strange women lying in ponds distributing swords is no basis for a system of government.
    1. Re:Wrong Approach... by turbine216 · · Score: 1

      Well, if a spammer's primary objective in his spamming pursuits is to make money, then YES, this WILL stop the spammer. A spammer doesn't get paid to just send out as many e-mails as he can... he gets paid for referrals and such. Or he gets paid when the business he is advertising (spamming) for makes a profit. Neither of these things (referrals or direct profits) occurs if the advertising never reaches its intended audience. So the spamming becomes a futile practice, and the spammer gives up. No spam, no spammer.

      Now don't get me wrong... I agree with you wholeheartedly that the spammers should be the focus of our attention, as they are the ones to blame, but an easier and more immediate solution (as opposed to tracking down and prosecuting/threatening every last spammer) would be to eliminate spam on a massive scale.

  60. anti-spammer utility URL update by toupsie · · Score: 1

    Sorry, put in a trailing slash. Here is the correct link.

    --
    Strange women lying in ponds distributing swords is no basis for a system of government.
  61. Re:Why go through that much trouble to detect SPAM by Ryu2 · · Score: 2

    Check out the services of spamcop.net. It lets you submit spam mail, extracts the IPs from the headers (discarding the bogus ones), allows you to automatically send a note to the abuse department of the offending ISP, and tells you exactly how many people have submitted the same message and how many times that ISP has been responsible for messages that generated a SpamCop complaint. Very cool.

    --
    There's 10 types of people in this world, those who understand binary and those who don't.
  62. Re:Spam Hunters by Deosyne · · Score: 1

    Because the government runs on kickbacks. If they don't get their pound of flesh, legislation doesn't get passed. You think those mongoloids in suits actually give a shit about the issue itself? They're just a bigger version of the mafia, and the Don requires his tithe for you to do business on his turf.

    Deosyne

  63. Re:Hashed bigrams count by jmv · · Score: 2

    certain histogram patterns would be common in non-spam email messages

    There is no such thing as a "common histogram". They will all be different. However, two identical messages will have identical histograms. Two almost identical messages will have almost identical histograms (while two almost identical messages usually have very different checksums).

    The reverse is usually true (of course, there's no absolute guarantee): two almost identical histograms are very likely to come from two almost identical messages. The more you increase N (the bound for the hash result and the size of the histogram), the more accurate the result. Also, using trigrams would likely be more accurate.

    While it is possible for spammers to vary their messages, they cannot send thousands of messages that are really different one from the other and this is why this technique should work almost all the time. Of course, you'd need to get rid of headers and any html tags and garbage before computing the histograms.

  64. Re:Hashed bigrams count by jmv · · Score: 2

    They could look at the histogram of a bunch of regular emails and just send the spam messages whose histograms are close to a lot of the histograms of the regular emails. This assumes that spammers would have access to the hash function though.

    Once again, you're assuming there is such a thing as a "normal histogram". Remember that we're not checking whether the histogram is normal or not. We're checking to see if this particular histogram (from a spam e-mail) has been seen more than x times before. Even if they manage to get a piece of spam to match the exact same histogram as a valid e-mail, the piece of spam will still be rejected, with the unfortunate side effect that the valid message might be rejected too (but since they cannot read your mail, they cannot target one of your e-mails for rejection).

    As for the CPU time, sure you don't want to make N too large...

  65. Re:Hashed bigrams count by jmv · · Score: 2

    what similarity function would you use?

    Manhattan distance, aka L1 norm of the difference.

    And the reason I said it should work is that I already tried it a while ago for a slightly different task. The only thing I'm not too sure about is CPU time.

    As for histogram randomness, even if the N-dimensional (N ~ 1000) vectors (histograms) don't have a uniform distribution in the 1000-D space, you'd have to be very unlucky to get the same (or approximately the same) value for all of the 1000 bins.
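
    For reference, a minimal sketch of the Manhattan (L1) distance between two sparse histograms, assuming each histogram is a dict mapping bin number to count. The helper names and the threshold are illustrative assumptions, not values from this thread.

```python
def l1_distance(hist_a, hist_b):
    """Sum of absolute per-bin differences; missing bins count as zero."""
    bins = set(hist_a) | set(hist_b)
    return sum(abs(hist_a.get(b, 0) - hist_b.get(b, 0)) for b in bins)


def close_to_known(hist, known_hists, threshold):
    """True if `hist` is within `threshold` of any previously seen histogram."""
    return any(l1_distance(hist, known) <= threshold for known in known_hists)
```

    Iterating only over the bins that are actually occupied keeps the cost proportional to the message length rather than to N, which may ease the CPU-time worry for short messages.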

  66. Hashed bigrams count by jmv · · Score: 5

    One way that would be much more effective is to take pairs of words (e.g. in this sentence: "One way", "way that", "that would", ...) and apply a hash function that returns a number between 0 and N (N usually between 1000 and 100000). You then compare the histogram (how many of each hash value) of a mail to the database. If the histogram is too close to that of a spam message, you delete the mail.
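
    A rough sketch of building such a histogram, assuming CRC32 as the hash and a simple lowercase word tokenizer. Both are stand-ins chosen for illustration, since the comment does not name a hash function or say how to split words.

```python
import re
import zlib
from collections import Counter


def bigram_histogram(text, n_bins=1000):
    """Map each adjacent word pair to one of `n_bins` buckets and count them."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    hist = Counter()
    for first, second in zip(words, words[1:]):
        bucket = zlib.crc32(f"{first} {second}".encode("utf-8")) % n_bins
        hist[bucket] += 1
    return hist
```

    Changing one word in a message alters at most two bigrams, so the histogram moves only slightly, whereas a whole-message checksum would change completely.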

    1. Re:Hashed bigrams count by madPatter · · Score: 1

      I'm not sure, but I'd guess that certain histogram patterns would be common in non-spam email messages. If spammers wanted to get spam through, then all they would have to do is send out messages that have histograms that are common for non-spam email.

      Also, I would think that creating such messages is easier than it sounds. Here's a quick way to generate lots of messages that all say the same thing:
      {Hi, Hello, Howdy, Good Day}, would you {like to, enjoy, want to, be interested in} trying a free trial subscription, ...
      Just pick one phrase out of each set of brackets and you can generate a lot of different messages. (I saw something like this in a cryptography class where the prof got something like 2^30 messages all saying the same thing.) With all those messages, one of them is likely to match a common histogram and if so the system is broken.

    2. Re:Hashed bigrams count by madPatter · · Score: 1

      To me it seems that checking the histogram is going to take at least O(N) time (you're going to have to read how many occurrences of each hash value there were at some point). Thus, there is a practical limit on how big you can make N, since the histogram checking algorithm has to be run on every piece of email.

      I would think the distribution of histograms is not uniform. Common two word phrases like "if the", "i think", etc. would be more common than phrases like "zebra battleship" (same for 3 word phrases). Granted the hash values of the common phrases may all hash to different numbers, but even then some hash values will probably be more frequent than other hash values since certain phrases are more frequent than others. If anyone has any actual distributions, I'd be interested in how uniform they are.

      Also, the spammers wouldn't have to send all the messages. They could look at the histogram of a bunch of regular emails and just send the spam messages whose histograms are close to a lot of the histograms of the regular emails. This assumes that spammers would have access to the hash function though.

    3. Re:Hashed bigrams count by madPatter · · Score: 1

      Yes, I'm assuming that histograms fall into some sort of non-uniform distribution. But, with the regularity of language, I would think that this is a reasonable assumption. It may well be that the histograms fall randomly enough that this method works. However, without any evidence, arguing over whose intuition is right is pretty useless.

      Using exact matches would not work very well. Spammers could just make slight changes to every email (change the name of the person it's addressed to) to get spam past the system. So, you would have to use some sort of similarity measure.

      Which raises the questions: what similarity function would you use? What criteria determine when a group of histograms is similar enough and frequent enough to constitute spam?

  67. My Solution by alpinist · · Score: 1
    I simply use 'disposable' e-mail addresses. Having my own domain makes it rather simple. The only people who get my real e-mail address are trusted folks. Otherwise, I have about a dozen addresses forwarding to my actual POP3 account. So, if I need to sign up somewhere and receive a confirmation e-mail, I'll give them spam-a or spam-b or whatever my current spam address is. Once that box starts getting spam, I delete it. Since I don't use a catchall address, the spam bounces off the now nonexistent address.

    I've noticed though that since my throwaway accounts all have 'spam' in the user name, I actually can go months without having to delete the forwarder, despite using it regularly. Perhaps they automatically filter the 'spam' part out in an attempt to parse the actual address, as lots of people stick 'NOSPAM' et al. into their addresses in an attempt to block mail harvesters.
    --

  68. The checksum is fuzzy by crucini · · Score: 5
    Many posters seem to be naively assuming that dcc uses a checksum such as md5 which would change radically for a minor change in input. Dcc does in fact use md5 as a component but the actual checksum is adapted to the requirement.
    Download the source tarball, uncompress, untar and read /dcclib/ckfuz1.c. This checksum is clearly designed to be resilient to minor changes.
    On a deeper note, it's sad that so many Slashdot readers, including apparently CmdrTaco, underestimate others so severely. Do you really think someone put in the effort to make something like dcc and never thought about how a message could be varied to evade the checksum? And why not read the linked document first? You would have found:
    Because simplistic checksums of spam would not be very effective, the main DCC checksum is fuzzy and ignores various aspects of messages. The fuzzy checksum will need to be changed as spam evolves.
    Summary: read before you criticize, and recognize that others probably thought the same thing you're thinking.
  69. hmm by Troed · · Score: 3
    This system already exists on news-servers and clients, and the spammers have already countered with random data appended to the spam (and random numbers in the subject headers)

    So ...

    1. Re:hmm by pallex · · Score: 1

      I noticed spam getting past my filters recently... turns out they've taken to having

      t o b e r e m o v e d f r o m o u r m a i l i n g l i s t

      etc. Pity, as this was the easiest way to filter a lot of spam straight into my trash folder.

    2. Re:hmm by peccary · · Score: 2

      That one is easy. (\w\W){5,99}
      Or something like that, depending on what you use for filtering news and email. For me, it's got to be GNUS Score files and Procmail.
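
      For what it's worth, here is that pattern tried out in Python's re module (an assumption; the comment targets GNUS score files and Procmail, whose regex syntax may differ). It looks for at least five single word characters in a row, each followed by a non-word character.

```python
import re

# Five or more "letter, then separator" pairs in a row, e.g. "t o b e r".
SPACED_OUT = re.compile(r"(\w\W){5,99}")


def has_spaced_out_text(line):
    """True if the line contains a run of single characters split by separators."""
    return SPACED_OUT.search(line) is not None
```

      The same pattern also fires on punctuation-stuffed text such as M`A.K,E M:O'N,E,Y, and it would likely flag harmless strings like dotted version numbers, so it is better used as a scoring hint than as a hard reject.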

    3. Re:hmm by andyh1978 · · Score: 2
      Couple that with a clause in the ISPs contract that allows them to assess significant fines against spammers
      The ISP I use for my website has such a clause:

      19. You will not use the Service to send unsolicited commercial messages, Unsolicited Junk Messages, SPAM or any other bulk message to a recipient who has not expressly requested to receive that message. This shall apply to messages sent via electronic mail, USENET news postings or any other medium which may be intrusive. If you breach this Condition of Use you agree that you will pay us compensation of no less than one thousand pounds sterling plus interest at 8% above the base lending rate of the Bank of England at the date you breach this condition from that date. You agree that you will pay this compensation in respect of each recipient address of each message sent in breach of this Condition. You agree that you will not run an "open mail relay" on any computer system connected to the service. You will not seek to use the facilities offered as part of your account to run an email service using our equipment.
      A grand per message. Nice.
    4. Re:hmm by Erasmus+Darwin · · Score: 2
      Why do you feel so superior, exactly ?

      Pseudonymity provides more continuity (there are some Slashdot posters whom I recognize by name), gives people less incentive to be stupid ("FIRST POST! Natalie Portman and hot grits!"), means that the poster is more likely to catch a reply, and generally says, "I was willing to at least go through the trouble of getting a throw-away hotmail account so I could register on Slashdot." Is it a cure all? No. Are there worthwhile AC posts? Yes. But for the most part, it isn't worth the effort to wade through the garbage to catch the good ones. Besides, some of the good ones'll get caught by moderators, anyway.

      And, if you want accountability, don't go to usenet, or stay in moderated groups.

      Great! I propose a solution that doesn't stop anyone from posting, but allows me to selectively filter what I read, yet some genius AC declares, "If you don't like the way it is, go somewhere else." ...and yet he still wonders why I feel superior to the ACs of Slashdot.

      (As an aside, I'll generally read AC messages that reply directly to posts that I make. But more and more often, I wonder why I even bother.)

    5. Re:hmm by Erasmus+Darwin · · Score: 3
      the spammers have already countered with random data appended to the spam (and random numbers in the subject headers)

      ...and the worst of the bunch -- randomly inserting punctuation in the entire message:

      M`A.K,E M:O'N"E,Y F.A`S'T

      *shudder* Every now and again, I wish we would have optional accountability in Usenet, similar to how I can set my default read-level on Slashdot high enough that J. Random Anonymous Coward never shows up. Couple that with a clause in the ISPs contract that allows them to assess significant fines against spammers, and we'd be (theoretically) set.

      Then I wake up and realize that people'll just steal accounts or even use litigation to block the ISP from cutting them off for spamming. That's when I wish we could just train those kids who want to go on school shooting rampages to just take out spammers instead, killing two birds with one stone.

    6. Re:hmm by Placido · · Score: 1

      That was the first thing I thought of. Append an incrementing number to the end of each email.
      The only problem would be in a couple of years, when you receive emails where the number doubles the size of the message.

      Come and get your free pr0n! All your dirty fantasies at http://www.all_the_sex_in_the_world.com
      39834902094802304032943089535908345039475987345987 34958739458798375398475938475983759348573948573498 57394587394538479320384293857057



      --

      Pinky: "What are we going to do tomorrow night Brain?"
      Brain: "I would tell you Pinky but this 120 char limi
  70. Easy to break by RainbowSix · · Score: 1

    This isn't hard. Most of the spam I get is directed to me personally, ie. my name or email/nickname is on it. That changes the checksum for everybody except for people with my same name.
    --------

    --
    --------
    It's OK to be social, just don't tell anyone about it.
  71. Useless by Erik+Fish · · Score: 1

    This method won't work because identical spam is often sent from many different relays. Of course most spam includes at least SOMETHING that is either random (random numbers in the subject is common) or personalized ("Dear xyz@example.com").

    If spam were this easy to filter it would have been implemented a long time ago.

  72. Re:What's the big deal? by Mr.+Sketch · · Score: 1

    Actually, I just don't have any friends. Last week I probably got about 10-15 calls a day and I don't recall getting a single call from someone I knew. My girlfriend in Belize who usually calls me every week didn't even call :(.

    --BEGIN SIG BLOCK--
    I'd rather be trolling for goatse.cx.

  73. Re:What's the big deal? by Mr.+Sketch · · Score: 1

    I never said I wouldn't try to reduce the amount of mail I get. But it doesn't bother me to the point where I would want laws against it or legal action to be taken against anyone who sent me an unsolicited e-mail.


    --BEGIN SIG BLOCK--
    I'd rather be trolling for goatse.cx.

  74. What's the big deal? by Mr.+Sketch · · Score: 2

    I haven't figured out why the online community is so uptight about getting unsolicited e-mails and having companies selling out their e-mail addresses to people. About 80% of the mail I get at my house is unsolicited and 95% of the phone calls I get are salesmen. How did they get my number/address? Most likely the phone company (or credit card company) sold it to them and this is a very common practice. I guess I just don't see what the big deal is when e-mail is so much easier to delete/avoid than unsolicited real mail and phone calls.

    After all, e-mail is checked when I want to check it and when I see any subject asking me what the state of my sexual arousal is or offering me a university diploma or just something from 348djkea23@yahoo.com I know I can easily delete it. It's not like a phone call where I don't know who's calling me and I kind of have to answer it right then. I do have caller id, but that's an additional service I have to pay for and most of my friends are out of state so they show up as 'unavailable' along with all the other salesmen.

    For unsolicited mail, I have to handle it no matter what, I can't just leave it in my mail box forever. But with e-mail I never really have to see it and I can delete it without having to ever give it a second thought and it's gone gone and not just taking up space in my trash can or recycle bin.

    Perhaps someone here can enlighten me.

    p.s. I'm sure I have more to say on this topic, but I really need to be getting back to work :).



    --BEGIN SIG BLOCK--
    I'd rather be trolling for goatse.cx.

    1. Re:What's the big deal? by atheos · · Score: 2

      About 80% of the mail I get at my house is unsolicited and 95% of the phone calls I get are salesmen. 95% of your phone calls are salesman???? you must be one big sucker!

    2. Re:What's the big deal? by DeadMeat+(TM) · · Score: 4
      The big difference is who pays for it.

      When you get a telemarketing call, they pay their long distance company for the right to call you. It doesn't cost you a penny to pick up the phone. When you get junk (snail) mail, the marketer had to pay the postal service to send mail out to each and every address. Not only does it not cost you anything, but in the case of the U.S. Postal Service these bulk rates actually lower the cost of you sending mail, since they use it subsidize part of the cost of personal mail.

      Bulk E-mail on the other hand is a different thing. First off, if you're not on a land-based U.S. phone line, odds are you're paying per-minute for your connection -- which sucks since you have to pay to get spam dumped in your E-mail program's inbox.

      Even if you have a flat rate connection, you're still inevitably paying for spam mail, whether or not it's directly. Bandwidth isn't free -- take a 5k spam mail message and multiply it by 10 million messages, both of which are probably conservative estimates, and you're talking about 50 gigabytes each time a spam is sent out. If you get 3 spam messages a day, that's 150 gigabytes of bandwidth just for the mailings that reached you -- which is only a tiny fraction of all the spam sent out in a day. Multiply 50 gigabytes by the countless number of mailings, and that's a lot of bandwidth going up in smoke daily.

      Guess who's paying for it? Hint: with spammers usually using stolen ISP accounts and fake credit card numbers, probably not them. Another hint: when ISPs' bandwidth costs go up, they pass it on to the users.

      Not to mention the fact that spammers shoving millions of messages through creaky mail servers can take them down. So even excluding the monetary damage, what's it worth if a piece of E-mail sent to/from you was on that server when it went down in flames? Your message may be delayed, or it may never show up at all.

    3. Re:What's the big deal? by AnotherBlackHat · · Score: 2
      The usual reply is that I'm paying for it instead of the spammer.
      This is of course, bullshit.
      Email is so cheap that, for most people, the cost of throwing away the junk mail they receive is greater than the cost of downloading the spam. If you figure bandwidth at $10 / gigabyte, which is very high, then a 10K email costs a hundredth of a penny.
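
      The arithmetic behind that figure, as a small sketch. It assumes decimal units (10K = 10,000 bytes, 1 gigabyte = 10^9 bytes); the function name is just for illustration.

```python
def cost_per_message_cents(message_bytes, dollars_per_gb):
    """Bandwidth cost of one message, in cents, at a flat per-gigabyte price."""
    gigabyte = 10**9
    dollars = message_bytes / gigabyte * dollars_per_gb
    return dollars * 100


# 10,000 bytes at $10/GB: 0.00001 GB * $10 = $0.0001, i.e. 0.01 cents.
cents = cost_per_message_cents(10_000, 10)
```

      At that rate it takes about ten thousand such messages to add up to a dollar of bandwidth.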

      The true cost of spam is the time wasted reading the crap. And if people weren't up in arms about it, there would be a lot more of it in your email box. It's sort of like flaming people for bad posts on usenet - it's not that the posts/spam is so bad, it's that if we don't do it, they'll just get worse and worse.

    4. Re:What's the big deal? by Lars+T. · · Score: 1

      Well, Mr "Don't Spam JeffSketch's hotmail address", if you don't know?

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  75. Use immune system concepts by rjwoodhead · · Score: 1

    Consider the way mammalian immune systems work:

    1) an immune system cell gobbles the nasty virus or microbe.

    2) it chops up the viral genome into little chunks of various sizes

    3) the chunks are presented for recognition.

    Now adapt this to a net-based system:

    1) people get spam; they forward emails they definitely consider to be spam to spamocyte.com (or whatever), where it is chopped up into various overlapping chunks and the chunks are checksummed.

    2) MTAs pick random chunks from each email and send them off to spamocyte.com (or a local copy/cache of the database) for checking.

    Problems: 1) spammers can mutate emails via a madlibs engine much more than natural viruses can mutate their genome. 2) bandwidth issues.

    Still, I think it's an approach worth considering.
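
    A toy version of the chunk-and-check steps above, under several assumptions of my own: a single fixed chunk size, every overlapping window of a reported spam stored as a SHA-256 digest, and an in-memory set standing in for the spamocyte.com database. None of the names below refer to an existing service or library.

```python
import hashlib
import random


def windows(text, size):
    """All overlapping `size`-character windows of whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= size:
        return [normalized]
    return [normalized[i:i + size] for i in range(len(normalized) - size + 1)]


def digest(chunk):
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def register_spam(db, text, size=32):
    """Step 1: chop a reported spam into overlapping chunks and store digests."""
    db.update(digest(w) for w in windows(text, size))


def sample_hit_rate(db, text, size=32, samples=8, rng=random):
    """Step 2: check a few random chunks of an incoming mail against the db."""
    candidates = windows(text, size)
    picked = [rng.choice(candidates) for _ in range(samples)]
    return sum(digest(w) in db for w in picked) / samples
```

    Storing every offset is what lets a randomly placed sample line up with the database, and it is also where the storage and bandwidth problem listed above comes from.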

    --
    "World Domination - a fun, family activity"
  76. Re:Cell phones are great by zulux · · Score: 5
    Just leave a message, and tell them your phone number is one of those Bahama-$20-a-second numbers. Wheee!


    Check out http://www.scambusters.org/809Scam.html if you don't know what I'm talking about.

    --

    Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.

  77. randomised strings by 13013dobbs · · Score: 2

    Most spammers use some sort of random character string in both the subject and body to get around filters that look for identical messages being sent to the same system. I don't think checksums are going to do any better than the current filters that look for dupes. Sure, you could just look at the first N lines, but spammers are also inserting invalid HTML tags in their messages to foil pattern matching. Since the tags are invalid, people don't see them (considering that most people use some sort of HTML-enabled mail reader).

    --

    No replies made to AC posts. Please log in.

  78. Add invalid HTML tags by 13013dobbs · · Score: 2

    All a spammer would have to do is add invalid HTML tags all over his/her spam. Most users use some sort of HTML-based mail reader, and the invalid tags would not show. Look at the HTML source of this post to see for yourself. They can even put the tags in the middle of words, to be an even bigger bastard/bitch.

    --

    No replies made to AC posts. Please log in.

    1. Re:Add invalid HTML tags by 13013dobbs · · Score: 2

      Please read what I said again. Checking the entire message would be useless because there may be hundreds of random invalid HTML tags in the message. These tags would still be present in the message, but would be ignored by the mail reader. The tags would still be visible to the MTA.

      --

      No replies made to AC posts. Please log in.

    2. Re:Add invalid HTML tags by 13013dobbs · · Score: 2

      Sounds good, but what kind of processing power are you going to need to do all that? If you had a hundred or so users, it may not be that bad, but for large ISPs, it might be horrible.

      --

      No replies made to AC posts. Please log in.

    3. Re:Add invalid HTML tags by sqlrob · · Score: 1
      So, add open and close tags at random

      e.g. "This <i></i> is n<b></b>ot spam"

    4. Re:Add invalid HTML tags by NNKK · · Score: 1

      You don't understand how it'd be computed, it's not going to be by what you see displayed on the screen, it would be done by a CRC or md5sum-like check of the ENTIRE message, every byte.
      Nor would it likely be done by the reader, but instead the POP3 or IMAP server.

    5. Re:Add invalid HTML tags by 3-State+Bit · · Score: 2

      and, to make your point compreHENSIBLE (you're just not expressing yourself, dobbs), the html tags would differ from mailing to mailing. Thus, the seen text is the same, but the unseen text is different enough to mess any crc up. Simple solution: exclude all punctuation and html tags. Make all lowercase. Split the results on whitespace. Foreach(word), spell-check and accept the first suggestion of whatever spell-checker you're using (as long as it's deterministic, heh). Replace each word with a deterministic thesaurus's suggestion for what the most common word is that is sometimes its synonym. (This way simple thesaurusing can't mess us up). It doesn't matter if the 'whittled-down' version we're now working with doesn't make sense in English--as long as we can always get to it deterministically.
      Now discard all articles and very common words (ones that don't convey information and can't be used to form whole sentences. Don't eliminate any verbs). You're left with the bare essence of what the email conveys, and anything that's not in this can't be in the original. Then crc this one. Heheh, try to get around that, spammer.

      Er, actually, one thing I notice is that I didn't address "r a n d o m" spacing. My system wouldn't realize that "random" is a word there. Solution: don't split on white space; remove all white-space and then use a dictionary that lets you see how close something is to being a word, then add letters until you're farther from being a word than you originally were, and pop that off as a separate word. You can look ahead slightly, so that you don't pop "nation" just because it's more of a word than "nationa" is, if the letters afterward are "lity".

      Sound good?
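
      A cut-down sketch of the normalize-then-CRC idea in this comment. It covers only tag and punctuation stripping, lowercasing, and dropping a few very common words; the spell-check, thesaurus, and re-segmentation steps are left out, and the stop-word list is an arbitrary assumption.

```python
import re
import zlib

# A tiny, arbitrary stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "the", "of", "to", "and", "or", "in", "on", "for"})


def normalized_checksum(message):
    """CRC32 of the message after stripping tags, punctuation, case, stop words."""
    text = re.sub(r"<[^>]*>", "", message)           # drop anything tag-like
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())  # drop punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]
    return zlib.crc32(" ".join(words).encode("utf-8"))
```

      Because tags are removed before anything else, "n<b></b>ot" collapses back to "not". The crude tag pattern would also eat legitimate text between a stray "<" and ">".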

      --

    6. Re:Add invalid HTML tags by 3-State+Bit · · Score: 2

      a) You're right. quite a bit of processing.
      b) I've already figured out a way around it! As a spammer, have your spam engine combine your sentences in arbitrary order. What about sentence matching? Set it so it adds removable phrases, I repeat you will never be charged, with modifiers like "seriously", and "we're not kidding", and even "very", "extremely", etc.

      Your "Spam Engine Markup for Interception-Neutralization and -Avoidance Language" (Seminal!) can have special tags telling you where you can put filler phrases. At the end, you can include a lot of random words from a news site or whatever, to throw off word-frequency analysis.
      The idea is that it's a lot easier for a spammer to change things around in random order than it is for a mail server to order them back again for comparison. So, plan no-go :(

      --

    7. Re:Add invalid HTML tags by s20451 · · Score: 2

      Yes, but then invalid HTML would be an even easier giveaway for rejecting spam.

      --
      Toronto-area transit rider? Rate your ride.
  79. Re:"Pretty close" checksums? by 13013dobbs · · Score: 2

    I have already posted a way to get around that. Look here. For the goatsecx paranoid here is the link to cut and paste:
    http://slashdot.org/comments.pl?sid=01/07/30/1444247&cid=48

    --

    No replies made to AC posts. Please log in.

  80. Filter by this info.... by shpoffo · · Score: 1

    If your mailer is smart enough you can automatically filter by this info - for example, if the subject line ends in a number, with a grep pattern something like: " [0-9]+$"

    Of course, I filter most of mine by searching for "The Scale Moved!!" ;)
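
    A small sketch of the trailing-number check in Python, assuming the intent is "whitespace, then digits, at the very end of the subject". The names are made up for the example.

```python
import re

# Whitespace followed by digits at the end of the subject line.
TRAILING_NUMBER = re.compile(r"\s[0-9]+\s*$")


def subject_ends_in_number(subject):
    """True if the subject finishes with a standalone run of digits."""
    return TRAILING_NUMBER.search(subject) is not None
```

    This also matches legitimate subjects such as "Re: bug 4521", so it probably works better as one signal among several than as a filter on its own.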


    -shpoffo

  81. Re:Laws about PRON by shpoffo · · Score: 1

    I'll point out that many people (such as a company that I have worked with) don't read their admin mail at all.

    -shpoffo

  82. Spammers are already ahead of the curve by maxxon · · Score: 1

    Many spammers already put unique identifiers (sometimes wholly different gobs of text) in their spams so that they aren't easily spamcaught. Many also personalize their email so that it includes the (at least according to their records) name of the addressee.

    In short, there are many types of spam that this mechanism will fail to catch today, much less if such a system becomes widespread. It's too late for such a half-hearted measure.

    --
    max
  83. The problem is... by RobinH · · Score: 1

    When trying to solve the issue of Spam mail, you invariably have to define Spam. Perhaps that's the real problem, or the first we have to solve... Most of us have an idea of Spam, and we can all agree that a certain e-mail IS, or IS NOT Spam. However, making a machine do something that we consider so trivial is nearly impossible.

    However, there is a technology that is capable of performing this task: a neural network. Granted that setting up the input channels would be a little tricky, but once you did that, there is no end to the examples you could use to train this neural net. The net would even be able to categorize e-mails into "almost certainly Spam", "probably Spam", "probably not spam" and "almost certainly not Spam".

    The prohibitive cost of such a system would actually be the hardware, since simulated neural nets require lots of FLOPS. On the other hand, you can mass produce a pre-trained neural net for relatively little. Therefore, if someone could train a net to do the job, you could sell the solution as a plug-in PCI card for a computer. Just filter all the emails through the card at the MTA level.

    Perhaps I'm getting a little too carried away; does anyone know of someone who's tried applying neural nets like this?

    --
    "I have never let my schooling interfere with my education." - Mark Twain
    1. Re:The problem is... by RobinH · · Score: 1
      Naive Bayes is a damn good text classifier that has already proven to be a good spam identifier. The problem is that no such automated classifier system will ever be able to get rid of most spam without throwing away a few non-spam messages too.

      I'm not familiar with Naive Bayes... is that a neural network program? I was specifically talking about the ability of Neural Nets to *learn* based on previous cases. A neural net is basically a way to find a non-linear mapping between one set of variables, and another.

      The link you gave is for using iFile, which is nothing like a neural net.

      --
      "I have never let my schooling interfere with my education." - Mark Twain
    2. Re:The problem is... by Lars+T. · · Score: 1

      One thing a system like this will filter out is something encrypted with Spam Mimic. (Source: The Register)

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  84. Re:Personalised spam by Grab · · Score: 2

    You could fix that by checksumming individual paragraphs. If more than 95% of an email's paragraphs match the checksums of a known spam, it can safely be rejected. This will require more storage, but the processing time won't be significantly longer (the longest time is calculating the checksums, which will take the same time for individual paragraphs as for the whole message, since it's a per-character time).

    You could even improve this when you've received several of the same by cross-comparing them and working out which paragraphs change and which stay the same. You could then combine the individual paragraph checksums into a single checksum, and only check that part of the message - that'll save on storage of lots of checksums.

    The only trouble I can see is when this is one of those three-line ones that just says "Feeling horny? Go to here for XXX" or whatever. If those added some destination-specific heading, it would be difficult to set the filter tolerances tight enough so that genuine emails with one or two sentences that match don't get filtered.
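
    A minimal sketch of the per-paragraph idea above, in Python. The function names, the blank-line paragraph split and the choice of MD5 are illustrative assumptions, not part of any existing filter; only the 95% threshold comes from the text:

```python
import hashlib


def paragraph_digests(message):
    """Return one MD5 hex digest per non-empty, blank-line-separated paragraph."""
    paragraphs = [p.strip() for p in message.split("\n\n")]
    return [hashlib.md5(p.encode("utf-8")).hexdigest() for p in paragraphs if p]


def match_fraction(message, known_spam):
    """Fraction of the message's paragraphs whose digest is already known."""
    digests = paragraph_digests(message)
    if not digests:
        return 0.0
    return sum(d in known_spam for d in digests) / len(digests)


def looks_like_spam(message, known_spam, threshold=0.95):
    """Flag the message when enough of its paragraphs match known spam."""
    return match_fraction(message, known_spam) >= threshold
```

    With a 20-paragraph spam, a copy that only changes the "Dear so-and-so" line still matches 19 of 20 paragraphs. A three-line spam with a personalised heading would match only 2 of 3, which is the tolerance problem described above.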

    Grab.

  85. Countering Counters by R.Caley · · Score: 2
    To avoid the problem of trivial changes to the message one would need to check the bits of the message they don't have control of. The middle bit of the Received: list would seem like a candidate.

    E.g., if we assume that much of the spam problem is from open relays, then recognising that >N% of local users have gotten a message mailed through a given relay may be enough to flag it as suspicious.

    Doesn't help the mailing list problem of course.

    I think the best anti-spam measure is simply to divide email into high quality and low quality lists based on the sender and have the user say which senders should be treated as high quality in future. If people you sent mail to were added to the high quality list by default that would take much of the work out of it. Since this way you are trying to pick out good stuff rather than remove spam, it is harder to counter.

    Add to that a magic word system. Messages with the magic word in the subject are tagged as high quality. Then you can give people you really want to hear from the magic word along with your email address. Change the word regularly and old information won't come back to spam you.
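
    A rough sketch of the two rules above (trusted senders plus a magic word), in Python. The function names and the plain substring match on the subject are assumptions made for illustration:

```python
def note_outgoing(recipient, trusted_senders):
    """People you send mail to are treated as high quality by default."""
    trusted_senders.add(recipient.lower())


def classify(sender, subject, trusted_senders, magic_word):
    """Return 'high' or 'low' quality for an incoming message."""
    if sender.lower() in trusted_senders:
        return "high"
    # The magic word is shared only with people you want to hear from.
    if magic_word and magic_word.lower() in subject.lower():
        return "high"
    return "low"
```

    Changing the word regularly just means passing a new magic_word; senders already on the trusted list are unaffected.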
    _O_

    --
    _O_
    .|<
    The named which can be named is not the true named
  86. what about using rsync style checksums by dcd · · Score: 1

    http://samba.anu.edu.au/rsync/
    and
    http://samba.anu.edu.au/rsync/tech_report/
    discuss the "rolling checksum" that is used in the rsync technology - as the file is checksummed in pieces, a counter would have to be placed in each piece.
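
    For the curious, a small Python sketch of a weak rolling checksum in the style described in the rsync technical report: two 16-bit running sums that can be updated in constant time as the window slides one byte. The function names and the block size used below are my own choices, and this is only the weak checksum, not the whole rsync algorithm:

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block):
    """Return (a, b) for a block of bytes: a plain sum and a position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def combine(a, b):
    """Pack the two 16-bit sums into one 32-bit value."""
    return a | (b << 16)


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_checksums(data, n):
    """Checksum of every n-byte window of data, at every byte offset."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [combine(a, b)]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append(combine(a, b))
    return sums
```

    Because every byte offset is covered, a block of known spam is still found when junk of arbitrary length is inserted in front of it.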

  87. spambouncer works great for me by misleb · · Score: 3
    I am running the Spambouncer procmail filter on my shell/IMAP account. I used to get 10 SPAMS a day. Now I don't get ANY. It's pretty intelligent.

    I guess this doesn't solve the problem of server resources getting stolen, but it certainly saves me from having to look at the crap.

    -matthew

    --
    "THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
  88. Re:ISPs who don't care about Open Relays by Morris+Schneiderman · · Score: 1
    I recently had some problems with emails not getting through to everyone on a small list, so I contacted the ISP where I had the perl script running against a MySQL database.

    After some hesitation, tech support informed me that they had been put on the Black Hole list because their email server had an open relay. He also told me it would take 4 to 6 weeks to fix the problem. Something about not wanting to disrupt service to their customers.

    Now this is not some fly-by-night organization. I picked them a couple of years ago because I was looking for a professional hosting service, and everything about them seemed to indicate that they were one.

    Anyone out there know how to close an open relay? Maybe they'll hire you.

  89. the ultimate spam filter by aozilla · · Score: 2

    Someone needs to collect all these ideas together and make a nice pluggable framework for it. I'm not sure how it does it, but hotmail's spam filter has stopped 100% of my spam so far, with no false positives. If they can do it, so can we.

    --
    ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
  90. Personalised spam by jedwards · · Score: 1
    Most of the spam I get now starts off

    Dear jedwards, Look at this amazing

    which renders checksumming the whole message a bit useless.

  91. Just Because they would counter it. by BiggestPOS · · Score: 5
    Doesn't mean we shouldn't do it. It's an arms race, with each side consistently and constantly upping the ante. We really need to send the spammers a message that we DO still care.

    One thing bothers me though: as I was clearing out a large 'stuck' email for one of our dial-up customers the other day, I happened to casually mention "Wow, you sure do get a lot of spam!" to which they replied "What's that?" "You know, junk email." "Junk e-mail? I read it all." People like that are why our boxes receive such garbage. You fire enough bullets and SOMEone is going to die.

    --
    What, me worry?
    1. Re:Just Because they would counter it. by madPatter · · Score: 1

      There may be something I don't know about here, but I am one of those stupid people who ask to be removed from just about every spam email I get. This results in me getting virtually no spam.

  92. Come on Slashdotters - checksums are stone axes by waimate · · Score: 1
    It's surprising so many people here are hung up about the checksum angle to this. Using checksums is about the equivalent of your first "hello world" program in Basic. Crude, ineffective, and of little relevance to this particular real world situation. There are much better techniques around, and they are not all that hard. The trick is to use linguistic techniques to construct a fingerprint of the linguistic content of the message. This is tolerant of names or other perturbations in the content, without running undue risk of false positives.

    Because you know the content is written in a human language, and because we know an awful lot about the nature of language, we can leverage this information to do intelligent processing on the content other than just doing a dumb-ass CRC on the byte values.

    For example, tokenize the message into words, drop noise words, stem the rest, assign each an unvarying numeric value from a dictionary, histogram them, drop each extremity of relative abundance, and then checksum that. Hardly rocket science -- in fact pretty crude by text processing standards and just related as an example of the sort of things you can do to exploit linguistic characteristics. Other techniques like ngrams have a lot to offer here.
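
    A crude Python sketch of that pipeline, to show roughly what is meant. Everything named here is an assumption for illustration: the noise-word list is tiny, the "stemmer" just strips a few suffixes, the dictionary-number step is skipped (the stems themselves are checksummed), and the cut-offs for relative abundance are arbitrary:

```python
import hashlib
import re
from collections import Counter

# Tiny illustrative noise-word list; a real one would be much longer.
NOISE = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
         "this", "that", "you", "your", "for", "at", "on", "dear"}


def stem(word):
    """Very naive suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def fingerprint(text, min_count=2, max_share=0.5):
    """Checksum of the mid-frequency stems of a message."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(stem(w) for w in words if w not in NOISE)
    total = sum(counts.values())
    # Drop both extremes: one-off terms (names, counters) and dominant terms.
    kept = sorted(s for s, c in counts.items()
                  if c >= min_count and c / total <= max_share)
    return hashlib.md5(" ".join(kept).encode("utf-8")).hexdigest()
```

    Two copies of a pitch that differ only in the addressee's name and a random trailing token end up with the same fingerprint, because those one-off terms fall below min_count.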

    There's a world of linguistic processing techniques around, and people in this business use them every day of the week. Checksums are stone axes.

  93. OPEN MAIL RELAYS R EVERYWHERE! by mcrbids · · Score: 1
    Figured I'd do a quick check - I use Pacific Bell DSL.

    I telnet'd to a couple of addresses similar to mine. Found an open relay on port 25 of the fifth system I tried.

    Lord oh lord! We are in for a heck of a time...

    -Ben

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
  94. Re:Cell phones are great by bmidgley · · Score: 1
    one of those Bahama-$20-a-second numbers

    *Please* choose a Bahama-$20-a-second outfit that does not itself do bulk email. Maybe there aren't any... :)

  95. DMCA by RoofusPennymore · · Score: 1
    I suspect spammers would just include a counter to break checksums tho."

    If that was the case, then wouldn't they be violating the DMCA? Then the FBI would have to go after them, right? Unless the FBI would selectively enforce, and they wouldn't do that...

    --
    --- http://homepage.mac.com/gregjsmith
  96. Duh... by ErikTheRed · · Score: 5

    All you have to do is filter on the words "This e-mail is not spam!"

    Leave it to the Slashdot crowd to make things a million times more complex than they need to be...

    --

    Help save the critically endangered Blue Iguana
  97. Surprisingly, that can work! by WolfWithoutAClause · · Score: 2

    The big issue is counters and other subtle changes to the emails that would destroy a naive checksum.

    However, multiple checksums over subsets of the email would not usually all be changed by one or a few changes or counters, and such checksums are discriminating enough to screen emails and can do a very good job of detecting any widespread junk email.

    It would be difficult to alter every run of characters of a given length (say 20 characters) enough that ALL of the subset checksums of a junk email come out different.

    (Checksums that checksum all the strings for a particular length are not difficult to generate as a matter of fact; little more than a circular buffer is required.)
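
    To make that concrete, here is a small Python sketch, under the assumption that "subsets" means every run of 20 consecutive characters. The deque with maxlen plays the part of the circular buffer; the function names and the use of CRC32 are illustrative choices:

```python
import zlib
from collections import deque


def window_checksums(text, width=20):
    """CRC32 of every run of `width` consecutive characters in text."""
    window = deque(maxlen=width)  # circular buffer of the last `width` characters
    sums = set()
    for ch in text:
        window.append(ch)
        if len(window) == width:
            sums.add(zlib.crc32("".join(window).encode("utf-8")))
    return sums


def overlap(candidate, known, width=20):
    """Fraction of the candidate's window checksums already seen in `known`."""
    sums = window_checksums(candidate, width)
    if not sums:
        return 0.0
    return len(sums & known) / len(sums)
```

    A counter dropped into the middle of a message only disturbs the windows that touch it, so most of the candidate's checksums still match the known copy.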

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"
  98. Re:"Pretty close" checksums? by WolfWithoutAClause · · Score: 2

    >Aren't there algorithms that will report messages that are pretty close?

    Yeah, there are. 'rsync' does this as a way of sending only the minimum set of changes necessary to keep two ftp/web sites synchronised.

    Actually, to be precise, the checksum itself isn't imprecise: rsync relies on checksums of subsets of the documents it is trying to synchronise.
    This neatly sidesteps the counter issue...

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"
  99. Re:"Pretty close" checksums? by WolfWithoutAClause · · Score: 2

    Impossible? I don't think so. All you have to do is each time somebody receives a junk email they mark it as junk email, the mail software can calculate one checksum starting at a random place in the file, and upload it to a checksum server. For any frequently received junk email the server will fairly quickly get enough checksums that the whole document will be covered.

    When anybody receives an email, they can check a handful of random checksums against the checksum server, if enough of them match, then do a few more to be sure and deal with the email according to any settings by the user.

    Still, there are issues. What happens if the email marketeers start appending random web pages to their email to dilute it down? What percentage of similarity is enough? There are some fixes- I think to be successful junk mail has to be fairly short- people rarely page down to cut to the chase; but adjusting the checksum points to emphasise the beginning and end of the email is probably a good thing.
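
    A toy Python model of the reporting scheme above, just to pin down the moving parts. The class and method names, the 20-character chunk width and the in-memory "server" are all hypothetical; a real service would need networking, rate limits and some defence against false reports:

```python
import random
import zlib

WIDTH = 20  # hypothetical chunk length covered by each checksum


def chunk_checksum(message, offset, width=WIDTH):
    """CRC32 of the `width` characters starting at `offset`."""
    return zlib.crc32(message[offset:offset + width].encode("utf-8"))


class ChecksumServer:
    def __init__(self):
        self.reports = {}  # checksum -> number of times it was reported

    def report(self, message, offset=None, rng=random):
        """A user marks a mail as junk: upload one checksum from a random offset."""
        last = len(message) - WIDTH
        if last < 0:
            return
        if offset is None:
            offset = rng.randint(0, last)
        cs = chunk_checksum(message, offset)
        self.reports[cs] = self.reports.get(cs, 0) + 1

    def score(self, message, samples=5, rng=random):
        """Check a handful of random chunks; return the fraction already reported."""
        last = len(message) - WIDTH
        if last < 0:
            return 0.0
        hits = 0
        for _ in range(samples):
            offset = rng.randint(0, last)
            if chunk_checksum(message, offset) in self.reports:
                hits += 1
        return hits / samples
```

    The mail client would call score() on arrival and, if the fraction is high, sample a few more chunks before acting on the user's settings.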

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"
  100. Who do I sue? by Cardbox · · Score: 1

    You are expecting an email to confirm a massive contract. I send it. Your clever-fuzzy-friendly spam checker decides that it's spam, and bins it. We both lose a lot of money.
    Who do I sue?

  101. Good idea... by elan · · Score: 1

    ...if it detects all those annoying "me too" messages and treats them as spam.
    -elan

  102. Issues... by bribecka · · Score: 2
    If there are so many issues with this, and it seems like an idea that probably won't work, why is this posted?

    I submitted a story about building a steam-powered microprocessor with RAM made out of banana peels, and that didn't get posted--why this?

    --

    Where are we going and why am I in this handbasket?

  103. New anti-spam concept: Corporate-signed e-mail by Caduceus1 · · Score: 1
    Here is something I batted around a few weeks ago. E-mail servers of major sites would sign messages with a public key mechanism (such as PGP/GPG/whatever) whenever the message is guaranteed genuine. E-mail servers will have the option of checking the signature (stored in a header) against the known public key for that domain (cached for performance reasons). If a message claiming to be from that domain arrives unsigned or isn't verifiable, refuse the message.

    This would at least stop spam from people with bogus addresses.

    --
    rm /dev/mem
    Sci-Fi Storm
  104. Re:Spam Hunters by Alien54 · · Score: 2
    They're just a bigger version of the mafia, and the Don requires his tithe for you to do business on his turf.

    Well there is this classic from a couple years ago on Segfault:

    Mafia Don Announces New Anti-Spam Venture
    Posted on Fri 02 Apr 19:25:26 1999 PST

    As the NSA and FBI fear, traditional crime organizations have been incorporating high-tech communication into their organizations. Although Janet Reno was quoted stating "This is law enforcement's worst nightmare.", techies around the world are sure to be pleased with one New York Syndicate's new venture.

    It all started when Don Dominiqi signed onto his AOL account last Monday morning. His inbox was filled with "Make Money Fast", "Viagra On-Line", and "Teenybopper Web Sex" ads. Lost amidst the drivel was an important note detailing a non-taxed shipment of Marlboros, which were later confiscated by the BATF. Little did he know, as he shouted "Bring me the left hand of this f*cking gutterslime!" what would become of it all.

    Later that same day, Billy "Run!" Brutekowski and Larry "My Eyes!" Plucker cornered the pasty-faced offender of the Family in a small cyber cafe in Greenwich Village. "This was by far the creepiest place the Boss has ever sent us," stated Billy, who only spoke on condition of anonymity. "Everyone in this place looked pale and sickly, like they had already been 'spoken to'. We asked for this punk, and several people quickly pointed him out. Most of the scum we find in gin joints aren't so quick to finger one of their own," Billy continued.

    "He must not watch much TV, because this sh*t didn't even flinch when we came to the corner he was hiding in," Larry proceeded to relate. "We dropped this sheet of paper the Boss had given us on his table and he says 'So you guys want to make money fast, eh?' He puts out his hand and says to give him $20. This scrawny little dirtball tells me to give him $20!" Larry was quite agitated at this part in his story, and his description of how Sammy Spammer's hand fell off was quite garbled.

    Billy continued, "Up till now, this was a routine visit. We was just being playful. The weird sh*t began when we tried to leave." "This pimply faced kid blocks the door as we try to leave, and I'm thinking to myself 'Great, a f*cking Karate Kid hero.' He just stands there, and then he hands me a $5 bill." Billy pulls out the $5, and holds it like it is his first quarter from his favorite grandmother. "They lined up after that, and we had $175 in 'tips' when we left the joint."

    Later that day the Don himself visited the café, unwilling to believe the story. Although the details are unclear, sources at the café indicate that the Don has hired them to build and host a new Anti-Spam site. Through a SSL transaction system, the site will accept spam complaints and credit card donations towards 'solutions to problems'. Multiple complaints against the same spammer are added to the total until an acceptable solution has been found.

    Larry tells us that a typical $250 solution is a broken hand, and for $2000 all anyone ever sees again of 'the problem' are his shoes.

    The URL is to be announced next week, and the cyber café's phones have been jammed with requests for more information.

    --
    "It is a greater offense to steal men's labor, than their clothes"
  105. Spam Hunters by Alien54 · · Score: 4
    I still think that we have to make it profitable for folks to go after spammers.

    Spammers need to be licensed (preferably with an ear tag, but I'll consider substitutes) and fully identified. All spam needs to have a spam license number in the header someplace.

    Fees can then be and need to be collected by your favorite government agencies (I think the IRS, the NSA, and BATF will do for now). ISPs and users need to be able to bill spammers some amount for the spam processed and received. Fees need to be large enough that it is worthwhile to go after them, and then we can have bounty hunters. Fees can be high enough to reduce the cost of access. Penalties for abuse can be heavy (20 years in jail, for example).

    Then we can have spam hunters who will go out and collect from the spammers for you in exchange for a percentage.

    --
    "It is a greater offense to steal men's labor, than their clothes"
    1. Re:Spam Hunters by cmstremi · · Score: 1
      I realize you were suggesting that with tongue in cheek, but this still brings up an unfortunate point. We (the Internet community) may very well end up sacrificing our Free(tm) Internet before we'll see effective regulation of Spam and other forms of online abuse.

      If a federal body takes control of spammers, there's really not a line that separates other regulation. Next thing you know, we're facing a very real threat of e-mail tax, federal usage fees, etc. It's not a pretty thought.

    2. Re:Spam Hunters by eclectro · · Score: 2

      Spammers need to be licensed (preferably with an ear tag, but i'll consider substitutes)

      maybe spammers could be branded with a giant S with a hot-iron like they did with cattle in the old west....

      --
      Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
  106. Cell phones are great by AintTooProudToBeg · · Score: 2

    My cell phone offers free long distance. So I call the number on every piece of spam that I get. Mostly you get an answering machine, so I request a call back. This costs the spammers time plus hopefully a little money for the call back. Mostly they're semi-pathetic business-type people who really don't know anything about computers and are somewhat apologetic/embarrassed. I did get one asshole who hung up on me when I started asking where he got my email address from... so I called back (CallerId is great!). Anyways, call those spammers!

    1. Re:Cell phones are great by tmark · · Score: 2
      My cell phone offers free long distance.

      Wow, sounds like you have a great cell phone plan. Do you get local calls free too ? ;-)

  107. Fingerprinting required, not checksumming by Xilman · · Score: 2
    I've now read a whole bunch of comments saying that checksumming is useless because adding junk/serial numbers/whatnot will defeat the spam detectors. True, but irrelevant.

    The intellectual property protection people have been thinking about this sort of problem for a long while now. Just as they want to be able to detect when something has been copied, the spam-haters want to detect when something is a copy. Both want to be successful in the presence of countermeasures. It's the same problem!

    There's a vast amount of literature available out there. Any half-way decent search engine should throw up more than you can read in a reasonable time.

    Paul

    --
    Lasciate ogne speranza, voi ch'intrate
  108. Re:Corporate mail users are too stupid.... by atheos · · Score: 1

    Are you sure you work for an ISP? Obviously not in the IT department. Having your server configured to use the ORBS database and having an open relay are TWO DIFFERENT THINGS. You can close your open-relay server and continue to allow your users to be bombarded with e-mail. If you want to be really nice, and you have customers with virtual domains, you can run one server wide open (they get all mail plus spam) and one mail server with ORDB, ORBL, or something else. You DON'T have to have an open relay to do this. ORBS is no longer in existence, btw.

  109. Re:Laws about PRON by atheos · · Score: 4

    Ya, this same argument is used when discussing censoring the entire internet. Ever thought about running for office? Spammers aren't the only ones I blame. I run a small mail server (less than 1k messages a day), and every night I e-mail ISPs informing them of open relays, and dialup customers abusing their systems. I have received a few auto-replies, and not ONE god damn response from someone who cares. I'd like to assume that most people are way too busy fixing the problem, but the same culprits keep showing up in my mail log. When discussing legal action against spammers, I think the same legal repercussions should be directed to ISPs who don't know/care how to run a mail server.

  110. moderation Re:For USENET! by leuk_he · · Score: 1
    The problem with this is that you need a central server for authentication, or at least an authority that does this. And we have seen with anti-spam organisations (ORBS) and Usenet that you need a very thick skin if you want to be the regulator. A technical solution is wanted for this, but I cannot come up with one.

    This would be a killer application for freenet, some kind of usenet, but with moderation to filter out the trolls/spam.

    I am afraid this will just be a part of the arms race with spammers.

  111. mailers by zoftie · · Score: 1

    The idea of stopping spam is pretty old, and none
    of the schemes really work, except the RBL type.
    What would be better, though, is to have
    humans look over the generated list of checksums
    (I prefer md5 =) ) and do the check on the domain.
    If it's an email list, place it here; if it's SPAM,
    place it in the RBL. There's more to the process of
    verification, but with nice graphs it would be
    easy to see who generates the most.
    It's not a perfect solution for spam, though;
    anyone hoarding one? =)
    p.

  112. Re:"Pretty close" checksums? by exploder · · Score: 2

    One problem I can see immediately with these "blockwise" checksums is that the spammers could easily insert not only text with random content but also random length. Do any of these "pretty close" methods handle offsets appropriately as well?

    --
    Yo dawg, I heard you like the Ackermann function, so OH GOD OH GOD OH GOD
  113. The Scale Moved -- I HATE those BerryTrim People by uptownguy · · Score: 1

    I have been getting upwards of 5-15 of those %@#*(%#@#$ "The Scale Moved" emails a day lately. They are from a company called BerryTrim. I hate them. I have set up a filter in Pine to delete them before I ever see them, and still a few get through!

    I HATE THEM

    So, lately, I've taken to calling the 1-800 number for the BerryTrim website each day. Sometimes several times a day. I ask to talk to Customer Service and I argue with them about this Spam. They tell me that "those are our associates and we have nothing to do with them" --(sure)

    I tell them I don't care and I want the email to stop.

    The real reason why I am doing it is this: I want to stay on the phone with them for a while...Those 1-800 numbers are pretty pricy. I used to price out the cost of outsourcing help desks and call centers for Fortune 1000 corporations, and I can tell you that over 90% of the cost is either phone lines or getting warm bodies to sit in the chairs to take the calls.

    SO... they spam us... we spam them. The 800 number for BerryTrim is 1-800-401-6327

    DO NOT buy anything from them. (duh)
    Just stay on the line for a while (heh)

    If we can bring down websites with the Slashdot effect, let's do a little group action and take out a spammer or two!

    Don't just read this post and chuckle and say, "Cute idea..." PICK UP YOUR PHONE AND CALL. It's toll free and you will be helping to bankrupt a spammer!!!

    Thanks!

    --


    I would have to say that explosives are the most abused technology in all of history.
  114. Re:The Scale Moved -- I HATE those BerryTrim Peopl by uptownguy · · Score: 1

    I love you, A.C. I really do. I used to hate you -- usually your posts suck. But this one really made me change my mind about you!

    --


    I would have to say that explosives are the most abused technology in all of history.
  115. False Positives by Matthias+Wiesmann · · Score: 3

    While the system could be broken by using counters, this could be countered by parsing only certain portions of the mail or counting the frequency of certain words. It would work very well on pure text spam, but not on attachment stuff.

    What would be funny would be to see the false positives of such a system. Many mails I get from the administration all look the same; I wonder if they would be considered spam. They are quite similar to spam: useless and too numerous...

  116. how long we need to wait... by vla1den · · Score: 1

    before somebody finally hacks TiVo to do this with TV ads?

  117. "Pretty close" checksums? by geekplus · · Score: 3
    Aren't there algorithms that will report messages that are pretty close, i.e., within N arbitrary bits of each other, as the same checksum? Or at least something approximating a checksum..., i.e. two different checksums that nonetheless return true when passed to an equals(cs1, cs2) method?

    Does someone have a link?
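
    A rough Python sketch of what such an equals(cs1, cs2) might look like, using a simhash-style fingerprint (a locality-sensitive hash) compared by Hamming distance. This is only an assumption about one technique that fits the description, not a pointer to any particular tool; the names and the 8-bit tolerance are made up for illustration:

```python
import hashlib
import re


def simhash(text, bits=64):
    """Simhash-style fingerprint: similar texts tend to differ in only a few bits.

    `bits` must not exceed 64 here, since only 8 bytes of each MD5 digest are used.
    """
    totals = [0] * bits
    for word in re.findall(r"\w+", text.lower()):
        h = int.from_bytes(hashlib.md5(word.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            # Each word votes +1 or -1 on every bit position.
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, t in enumerate(totals) if t > 0)


def hamming(cs1, cs2):
    """Number of bit positions in which the two fingerprints differ."""
    return bin(cs1 ^ cs2).count("1")


def roughly_equal(cs1, cs2, max_bits=8):
    """The 'fuzzy equals': true when the fingerprints are within max_bits of each other."""
    return hamming(cs1, cs2) <= max_bits
```

    Swapping one word in a long message typically moves only a few bits, while unrelated messages differ in about half of them; where to set max_bits is a tuning question.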

    -- I had a female crustacean once, but I lobster...

  118. Counters? Already do. by J'raxis · · Score: 2
    Haven't you gotten the spam that says something like:
    EARN $$$$$ AT HOME!!! xyzzygx
    In each copy, that xyzzygx is a different string of crap. I think this technique was originally developed to be a filter-foiler (you see this in Usenet a lot more than in email), but that'd do it.

    There's also the spam that includes customized URLs in the message (image downloads that, say, have your email address embedded in the query string -- sneaky little "live address" confirmation technique).

  119. It's even more personalized than that by dcavanaugh · · Score: 1
    I often see lines of pseudo-random text in the spam that probably identify the recipient. If you click on the spamvertized link, their logs pick up the id of the individual spam recipient, so they can target the visitor for additional harassment (oops, I meant "direct marketing").

    Even the non-html spam has this pseudo-random text. I suspect they are using it to ID the spam recipient when complaints are sent to the ISP, who forwards the actual complaint without identifying the person who sent it.

    All of this would tend to defeat the checksum algorithm.

    I think we need a "spam tax", to be paid by the ISPs who originate the messages, and passed back to the spammers as a marked-up fee. As soon as the cost per message exceeds the cost of snail mail, the game is over. Of course, the overseas providers would not be subject to the tax -- until each government sees the gravy train and jumps onboard. It would be very easy for the ISPs to keep the entire cost of the "spam tax" limited to the people who send the spam. The only drawback is that once the revenue from the "spam tax" dries up, the same government entities would be looking for ways to replenish the revenue stream by taxing other things on the Internet. If you view taxation as inevitable, it might as well start with the spammers.

    Maybe the tax could be disguised as a "fine", similar to a speeding ticket. After all, the government would be invoking a financial penalty for unacceptable behavior, very similar to a traffic violation. Since the actual enforcement would involve complaints from the recipients, it's very much like getting caught speeding.

    The greatest part of all is that no one has to enact the "spam tax" to get the benefits. Just the mere possibility of something like this would have the spam-friendly ISPs running for cover.

    I think the key to the spam problem is to raise the spammer's cost. I may not have the ideal method, but I think detection or filtering is not going to get the job done -- it's all about cost.

  120. One-off email addresses implemented by jhantin · · Score: 1

    The first part is easy, and well-known. Generate a one-time email address (various means are available). Associate it with the site (e.g. by naming it something like fake-addy-ebay@mycomputer.com if you're registering with ebay, say) Give it to the sign-up form, purchase form or whatever. If you actually want to receive a limited kind of email from them, or want to know if/when they've broken their promise, ensure that this one-time email addy forwards to a real address of yours, or at least ensure that you'll be able to read mail sent to it.

    There is already a site that provides this service: check out www.sneakemail.com. The e-mail addresses generated consist of random alphabet soup, rather than anything user-selectable (IMO this is a feature), and a decent Web interface is provided both for managing numerous aliases and configuring sender-filtering independently on each.
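
    If you would rather roll your own, the bookkeeping side is tiny. A minimal Python sketch, assuming you control a domain whose mail you can route (example.com below is a placeholder) and that the class and method names are purely illustrative; actually forwarding the mail is left to your MTA:

```python
import secrets
import string


class AliasBook:
    """Remembers which site each throwaway address was handed to."""

    def __init__(self, domain):
        self.domain = domain.lower()
        self.site_for_alias = {}

    def new_alias(self, site, length=10):
        """Generate a random 'alphabet soup' address and record who received it."""
        alphabet = string.ascii_lowercase + string.digits
        local = "".join(secrets.choice(alphabet) for _ in range(length))
        address = local + "@" + self.domain
        self.site_for_alias[address] = site
        return address

    def leaked_by(self, recipient):
        """Given the address a spam was sent to, return the site it was given to."""
        return self.site_for_alias.get(recipient.lower())
```

    When junk shows up at one of these addresses, leaked_by() tells you which sign-up form broke its promise, and you can retire just that alias.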

    --
    ...when you're writing a game...tweak the difficulty of "Easy" to something [your mother] can cope with. -- onion2k
  121. Razor and other anti-spam measures by Lobsang · · Score: 1
    There's a project called Razor that does that. The hash generated from one message is numerically close to the hash generated from a similar message. This should defeat the 'let's add a counter to defeat the hash' fu that spammers might try.

    I for myself hate SPAM. I've been able to filter out around 90% of it using simple measures (like filtering out emails with missing or invalid "From:" addresses, etc). Yet the remaining 10% is annoying as hell.

    Got to try razor myself. Thing is: This system will only work if enough people jump in. Let's see...

  122. Getting rid of spam for good... by Fredflintston47 · · Score: 1

    This solution only works if you have a unix computer delivering to your mailbox. However, if you do, it virtually eliminates spam.

    The solution is to put a password system on your mailbox.

    You set up filters on your email so that for the people you know and mailing lists, they can email you as before. The rest of the world will get a canned response saying that they have to send the email again with the password of 'xxxxx' in the subject.

    Since spammers almost never send a valid email address, they never see this email, and so the spam disappears back into the ether, and I didn't have to use potentially expensive modem time to download the spam to check to see if I wanted it. (And I didn't have to participate in a global spam checking service that doesn't really work 100% anyway).

    Once in a while (2 times in the last 6 months) you'll get a spammer stupid enough to actually resend the spam but with the password. However, since they are sending it by hand the return headers are valid! *yay* Instant retribution!

    Here's the link:
    http://www.uwasa.fi/~ts/info/spamfoil.html

    My spam rate was 10-20 spams a day. After putting this into place, I get maybe 1 a month sneaking through on top of some other filter match.


    I can't recommend this enough!

    -Fred

    --
    Go, Springboard, Go!
  123. Trading one problem for another by morcego · · Score: 1

    Well, as far as I'm concerned, I would say this is only trading one problem for another.
    The traffic created by these checksum tests would be a serious problem. The SPAM will still be using my link to get to my MTA. Then I also add some more traffic to check if it's SPAM (and will most certainly get tons of false negatives, as stated before in several posts).
    Overall, this is a very bad idea.

    ---

    --
    morcego
  124. OpenS0urce S0lution; proposal, usual plot warning by ImaLamer · · Score: 1
    Of course this could be done for free and open to all operating systems. Linux servers all serving the checksums... at the bottom of the e-mail or as an attachment there is a checksum.

    You click a button in [your favorite mail application, any platform]; that in turn sends a simple text message with the crc or checksum, and if they match (checked by either the client software or the server, you choose) they show as matching, or are moved to a cleared folder, application dependent.

    Applications can compete on how they use the results. One good idea could be to filter out non matching results, or to send them to a junk folder - or simply showing a certain icon.

    The real key to the system is this: if spammers are creating a crc which is being used over and over to send to multiple clients via redirectors and other clever tricks, hit a button and simply vote it spam. Use a weighted system to eventually filter out the same message. But running the headers through the checksum would stop most spammers since the TO: field would most likely change.

    Simple text messaging that can be used by any programmer, and there are many non-GPL examples of how to compare two checksums.

    Guessing the server would carry all the checksums, a good idea would be to add a revocation date which can be set client side, either defaulted or user configurable.

    Really the whole thing is simple. Just block people from mass e-mailing. Test the system for a while, then add the spam blocking to see which crc's were voted spam, and cross that with the volume of e-mails by that person. The system has suddenly become huge, but off-site computers could do the computing, not the servers.

    E-mail is a huge thing. Linux sends e-mails to my wireless phone without any user interaction. The system better be ready for people who use e-mail like an instant message.

    Now it comes to mind - if my pop server software (and maybe all isp's) would just check the crc against the server that would save everyone.

    Even MS could get into the game with Hotmail and their own MS CRC server...

    This is my manifesto:

    Get your free hotmail address - Now with hailstorm and E-mail signing - Free (biometrics required)

    ----checksumurl--http://checksig.msn.com:7235----
    ka;dddjdppwo3as-e34-44444uv2-84urrhpwerrupw34gdgh
    4-0394uvm-03485umt5jt-5ut059u-02-95uy05u25uy5fdgh
    442i0934it-09utury]==-04904g2-5t8528-b09-2ururt45
    ----email--checksum:--0x485ksro842---------------

  125. Such an obvious idea.. by Mike+McTernan · · Score: 1

    Hi,

    I thought of this long ago, but thought that Hotmail should implement the system - they can see 10,000s of mailboxes and see which messages people have deleted, which are likely to be spam by rules and who read/replied to a message. This would give a pretty accurate indication as to whether it was spam or not (e.g. contains $$$, no one replies, and most deleted without reading).

    As for using checksums... it's obvious that it wouldn't work (for reasons already mentioned). Instead, a system that gives some checksum whereby a message with sum 10 is very similar to a message with sum 11 etc. is needed, so that slight changes to the message don't make the message appear unique. I'm sure fingerprint databases must do something like this to allow fast indexing/retrieval of similar prints... Anyone know how that works?

    Mike

    --
    -- Mike
  126. Minor Changes by NotAnotherReboot · · Score: 1

    I get all kinds of spam that obviously uses a program, because it says "this email was sent to: myemail@here.com"

    Wouldn't that screw up the checksum? Of course they could just take say the middle 75% of each email and that would be enough and probably not run into those problems quite as often.
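
    One way to read "the middle 75%" is by line count: drop an equal share of lines from the top and bottom of the body and hash what is left. This is only a sketch of that reading; the function name, the line-based split, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def middle_checksum(body: str, keep: float = 0.75) -> str:
    """Hash only the central `keep` fraction of the body's lines."""
    lines = body.splitlines()
    # Number of lines dropped at EACH end of the message.
    skip = int(len(lines) * (1 - keep) / 2)
    middle = lines[skip:len(lines) - skip]
    return hashlib.sha256("\n".join(middle).encode("utf-8")).hexdigest()
```

    A personalized greeting on the first line or a "this email was sent to:" footer on the last line then no longer changes the checksum, though a spammer who varies text in the middle still would.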

  127. Brightmail already does this one better. by AnotherBlackHat · · Score: 1
    Checksums don't work very well, but with a few refinements, you can produce a system that's just like the already existing brightmail which does.

    The problem with public versions of spam filters, is that spammers have access to the data too, and can tailor spam to pass much more quickly than you can tailor the filters to stop them.

  128. Bad idea - how about this though? by shic · · Score: 1
    I don't like the idea of check digits - while I am aware that it can be made arbitrarily unlikely that two different messages possess the same checksum, the possibility would reduce confidence in e-mail, which can't be a good thing!

    I've wondered for quite some time why it is not a standard feature for a POP account to decline mail which is not addressed to one of a user-defined set of e-mail address regexps. The vast majority of my spam doesn't even mention my name in the address... and if it wasn't sent to me then I don't want to know! I realise this would require additional configuration for each user (particularly if you frequently join/leave mailing lists), but I for one would immediately see the benefits if I didn't need to wait downloading such obvious spam over dialup!
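
    A minimal sketch of such a check, assuming the server hands over the To:/Cc: addresses as plain strings. The two patterns and the function name are hypothetical; a real account would supply its own regexps.

```python
import re
from typing import Iterable

# Hypothetical user-defined patterns for illustration.
ACCEPTED = [
    re.compile(r"^me(\+[\w.-]+)?@example\.org$", re.IGNORECASE),
    re.compile(r"^[\w.-]+-list@lists\.example\.net$", re.IGNORECASE),
]


def accepts(recipients: Iterable[str]) -> bool:
    """True if any recipient address matches one of the user's patterns."""
    return any(
        pattern.match(addr.strip())
        for addr in recipients
        for pattern in ACCEPTED
    )
```

    Mail for which accepts() is false would be declined before it is ever downloaded.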

  129. Re:No One Solution by tb3 · · Score: 1

    Yeah, I forgot about that, but my spam comes from msn.com not hotmail.com. msn.com seems to allow forged headers, while hotmail doesn't. Makes you wonder...

    --

    www.lucernesys.com - Horizon: Calendar-based personal finance

  130. Re:No One Solution by tb3 · · Score: 2
    Nice idea, but my single biggest source of spam is msn.com. Do you think Microsoft is going to be proactive about blocking spam? Do you think they know how?

    I'm seriously thinking of blocking the entire msn and hotmail domains from my inbox. I don't know anyone on msn, anyway.

    --

    www.lucernesys.com - Horizon: Calendar-based personal finance

  131. Laws about PRON by pudge_lightyear · · Score: 1

    OK...kids age 12 are getting PRON mail all day long. It should first of all be illegal to send PRON mail to anyone who you have not previously verified to be over the age of 18. Since most of these mails come from morons who buy cds with your email address on them, and since they can't verify anything without first contacting you, this would stop 75% of all junk mail. The beauty is, the law wouldn't even be aimed at stopping junk mail, it would be aimed at protecting kids.

  132. No One Solution by isa-kuruption · · Score: 1

    I don't think there is one, simple solution to the spam problem. This idea sounds like it will work, but as mentioned, what about mailing lists?

    Okay, and what about the "counters" spammers would use? Maybe there should be a system that uses 'diff' to compare lines or something similar... mailings that have let's say 99% of the same message would be considered spam.
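
    The 'diff' idea can be tried with Python's standard difflib, which compares two bodies line by line and reports a similarity ratio. This is a sketch only: the function name is made up, and the 99% threshold is just the figure suggested above.

```python
import difflib


def looks_like_same_mailing(a: str, b: str, threshold: float = 0.99) -> bool:
    """Compare two bodies line by line and test the similarity ratio."""
    matcher = difflib.SequenceMatcher(
        None, a.splitlines(), b.splitlines(), autojunk=False
    )
    # ratio() is 2 * matching_lines / total_lines_in_both_messages.
    return matcher.ratio() >= threshold
```

    Comparing by line means a one-line counter only passes a 99% threshold in a long message; for short messages a lower threshold or a character-level comparison would be needed.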

    As far as mailing lists, why wouldn't they be able to register as legitimate mail? Like an exception list, an MTA can contact this central repository with the mail it receives, send the checksum (or what have you) and the repository would still say "yeah, it matches, but it's from a valid mailing list located in our database." Therefore, the MTA would not block it.

    I think one of the better solutions for getting rid of SPAM is for ISPs to do a better job of implementing and actively enforcing their anti-spam policies. The only way to do this is to claim a $500 per e-mail check for every piece of unwanted e-mail you receive. If they give a sh*t or two, they will stop the spammer, and although maybe not give you your $500, you surely did your part to stop at least 1 spammer.


    I think you need to flash your brain's firmware.

    1. Re:No One Solution by isa-kuruption · · Score: 1

      I've always been a proponent of blocking *.msn.com *.hotmail.com anyway. ;)

      I think you need to flash your brain's firmware.

    2. Re:No One Solution by Kareena+Bhagnani · · Score: 2

      Nice idea, but my single biggest source of spam is msn.com.

      Are you sure? 95% of my spam is obviously sent using the same tool, and whatever this tool is always creates some random-garbage msn address in the From line (e.g. "4jceg@msn.com"), and identifies itself to the open relay as a machine in the msn domain (again with a random-garbage name, e.g. imldn.msn.com). But following the IP addresses in the Received lines, I've never yet found a single one of these actually coming from msn.

      I'm seriously thinking of blocking the entire msn and hotmail domains from my inbox. I don't know anyone on msn, anyway.

      That'd work, if the only mail you get that claims to be from msn is forged spam. Which is probably true..

    3. Re:No One Solution by Lars+T. · · Score: 2

      I guess you didn't read this Register article http://www.theregister.co.uk/content/6/20052.html. I'm too lazy to condense it for you ;-)

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  133. How I filter spam by koreth · · Score: 4
    I do a few things that are extremely effective in filtering out spam. I have procmail rules to do the following:

    • Mail that doesn't list one of my addresses, or the address of a mailing list I know I'm on, in the To: or Cc: lines gets filtered. This alone catches a solid 85-90% of my spam flow, though it seems to be getting less effective as time goes on.
    • Mail that's from a free E-mail service (Hotmail, Angelfire, etc.) gets filtered.
    • Mail that contains certain keyphrases (e.g. "free" in all caps, or "this is not spam" or "S.1618") gets filtered.
    • Mail that has passed through a .cn or .tw or .kr host gets filtered. Those countries seem to have an abundance of open relays. At some point I hope to change this to check against ORBS/DUL instead.

    Now, the interesting thing is what I do once I've decided to filter the mail. Since my rules catch legitimate mail, I don't just throw it away. I wrote a small collection of Perl scripts (which I'll release to the world someday soon, but they need documentation) that maintain a whitelist of sender addresses.

    If a filtered message is from an address that's marked valid, it's delivered. If it's from an address that's marked invalid, it's discarded. If it's from an unknown address, the message is put in a holding area and an autoreply is sent back to the sender from a magic address asking them to reply in order to validate themselves.

    The magic address is unique per filtered message -- it uses qmail's address extension mechanism -- and mail to the magic address never gets delivered to me, so I don't care if it gets added to spam lists. The Perl script behind the magic address does a quick check to make sure it's not processing a bounce, then marks the sender of the original message as valid and delivers the original message (or messages if more than one arrived while awaiting validation).

    Held messages are cleaned out by a cron job when they get too old.
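
    The flow above can be sketched as a small state machine. This is Python rather than the Perl scripts mentioned, it is not the author's code, and it simplifies one thing: the sender is passed in directly instead of being recovered from the per-message magic address. The in-memory dictionaries and function names are assumptions for illustration.

```python
from collections import defaultdict

# In-memory stand-ins for whatever storage real scripts would use.
status = {}                 # sender address -> "valid" or "invalid"
held = defaultdict(list)    # sender address -> messages awaiting validation


def handle_filtered(sender, message):
    """Decide what happens to a message the spam rules flagged."""
    state = status.get(sender)
    if state == "valid":
        return "deliver"
    if state == "invalid":
        return "discard"
    # Unknown sender: park the message; the caller sends the autoreply.
    held[sender].append(message)
    return "hold-and-challenge"


def handle_challenge_reply(sender, is_bounce):
    """A reply reached the magic address: validate and release held mail."""
    if is_bounce:
        return []           # a bounce must not validate anyone
    status[sender] = "valid"
    return held.pop(sender, [])
```

    The cron cleanup of old held messages is left out of the sketch.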

    This is sort of similar in concept to the password mechanism of SpamBouncer or (a closer cousin) SpamCop's whitelist feature, but it doesn't require senders to retransmit their messages, which I always thought was pretty annoying to ask people to do since not everyone saves their outgoing mail. Granted, asking them to do anything is kind of annoying, but at least this is less so since they can just hit "reply" and "send".

    This setup is cool because it allows friends to Bcc me on stuff without my "I must be listed as a recipient" rule trashing their messages, even if they've just switched E-mail addresses. It is admittedly based on the assumption that spammers don't read replies to their mail and/or wouldn't go to the effort of unlocking themselves; I have yet to see a spammer do that, and given the economics of spamming I think that'll be a safe assumption for the foreseeable future, unless this approach gets so popular that spammers start writing automated unlock bots!

  134. Hamming distance / generalized checksums by s20451 · · Score: 2

    A good bitwise (or symbolwise) measure of distance between two sequences is the Hamming distance, which is the number of different symbols between the two sequences. A simple checksum will basically tell you whether the Hamming distance is zero (same checksum) or nonzero (different checksum).

    I'm sure it's possible to generalize the concept. I'm not aware of any specific work, but a simple solution would be something like a blockwise checksum. If enough blocks match up, it could raise a flag indicating the presence of possible spam. Ideally the blocks would be large enough that the concatenated checksum is short, but short enough that differences are easily captured.
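
    A minimal sketch of the blockwise idea, assuming fixed 64-byte blocks and CRC32 as the per-block checksum (both choices, and the function names, are arbitrary picks for illustration). The fraction of agreeing blocks is one minus the normalized Hamming distance between the two checksum sequences.

```python
import zlib


def block_checksums(text: str, block_size: int = 64) -> list:
    """CRC32 of each fixed-size block of the UTF-8 encoded text."""
    data = text.encode("utf-8")
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def matching_fraction(a: str, b: str, block_size: int = 64) -> float:
    """Fraction of block positions whose checksums agree."""
    ca = block_checksums(a, block_size)
    cb = block_checksums(b, block_size)
    longest = max(len(ca), len(cb))
    if longest == 0:
        return 1.0          # two empty messages are trivially identical
    same = sum(x == y for x, y in zip(ca, cb))
    return same / longest
```

    Fixed block boundaries only tolerate same-length substitutions. Inserting even one character shifts every later block, so a practical version would need boundaries that follow the content, such as lines or sentences.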

    You could try a keyword search for "error detection" or "checksums" using a publication search engine like Citeseer, or INSPEC if you have access through work or school.

    --
    Toronto-area transit rider? Rate your ride.
  135. Comparing Email by Zillatron · · Score: 1
    So was Code Red just a way to help us compare emails?

    If so someone ought to help the author figure out how to target only the correct kind of files...

  136. Corporate mail users are too stupid.... by cisco_rob · · Score: 1

    I work for an ISP. We run an open relay server. Know why? I tried subscribing to ORBS once. The next day I got about 20 calls from people saying that they couldn't receive emails from their clients. ("something about blacklisted? what is that?") I can't deny mail from open relay servers, so I can't stop spamming because my customers are too stupid.

    --
    "I do not fear computers. I fear lack of them." -Isaac Asimov
  137. Watermark the email - probability of spam. by patniemeyer · · Score: 1

    What we need then is a hash mechanism that is resistant to minor mechanical changes. Watermarking technologies (attempt to) do this for much harder things like sound and images. I'm sure it has been done for text.

    Then to extend the concept we need to assign a probability to the spam. Perhaps your mail client would then sort it and show it to you colored yellow for probably spam so you could more easily delete it.

    Pat Niemeyer
    Author of Learning Java, O'Reilly & Associates

  138. Spammers are like hackers... by kypper · · Score: 2
    I suspect spammers would just include a counter to break checksums tho.

    Can't stop em but for a while... and they're absolutely obnoxious about it. (well... script kiddies are. No offence to you bright and ethical)

    Screw 3...

  139. For USENET! by gnovos · · Score: 3

    An idea similar to this could and should be tried to bring the USENET back into the hands of the masses. Having some sort of k5-style moderation used on USENET message IDs could potentially end spam as we know it. The simplest approach would be to have a few groups of competing "moderation" servers that you could query and rate messages by their message ID, and then build in some client plugins to filter based on a given threshold. Of course, to really get the system to work, some thought would have to be put into authentication (say only 5 moderations allowed per IP per day, or even have an actual login process to moderate) to keep spammers from moderating up their own posts. If we have a loose network of many of these moderation servers, they can all use different ways to pick out the good posts, and user preference would dictate which system works best.

    Anyway, just my 2 cents...

    --
    "Your superior intellect is no match for our puny weapons!"
  140. Worms? by All+Dead+Homiez · · Score: 3
    Obviously there are issues with something like this (especially mailing lists, and worms that do attachments)

    Is there some hidden reason why we would want millions of copies of an email worm's attachment to get through? This could actually be part of the solution to two problems.

    Also, do note that a common method of spamming is to connect to an open relay and have the relay take care of sending out thousands of identical messages by simply sending thousands of "RCPT TO:" commands. Checksumming spam would completely break this spamming method and would force the spammer to retransmit the entire message for every recipient in order to vary it, thus making the process more costly.

    -all dead homiez

  141. USEnet anti-spam tactics by p_trinli · · Score: 1
    Spam on USEnet doesn't bother me much. I find that the groups to which I subscribe (and I imagine most newsgroups are like this) have a core group of veterans that use:
    • Clear subject lines, so spam sticks out like a sore thumb
    • Kill filters to filter the most obnoxious spammers
    • Loads of on-topic messages (i.e. preceding subject lines with "F*** spam!", then proceeding to dump lots of goodies)


    --
    Aaron J. Shaver
    http://aaronshaver.com/
  142. hey now that's a great idea... by theantix · · Score: 1

    ... perhaps the best one I've heard of so far in this discussion. If I had mod points I'd give 'em to you.

    --
    501 Not Implemented
  143. Countermeasures by CommieLib · · Score: 2

    I ran some basic design concepts on this idea a few years back (nothing as sophisticated as DCC). I came to the same conclusion as the other readers, re: Countermeasures: the spammer would integrate something random into the message that would foul identification. There is simply no way around this. So the question becomes: at what point does the countermeasure become so expensive and difficult that the spam itself reaches the point of diminishing returns? Or, put another way, what can we track that would make the message so difficult to cloak that it wouldn't be worth it to do?

    The cloak would have to be human-labour intensive, so it has to relate to the meaning of the text itself. I came up with a few variations, but in my own little thought-world, the most dependable signature for a spam was a key composed of the grammatical types of each word in the email. Chaff, or non-identifiable text, would be ignored. With this system, even the words could be randomly generated (Get {rich, wealthy, affluent}) and the signature would remain the same. How unique would the key be? I never did serious research, but it seems like it would be.

    The major problem I encountered is that once this was done, the spam generator could then rotate the order of the sentences, or drop non-essential sentences altogether. You could make the key non-order dependent, but that would drastically reduce the uniqueness of the key...anyhow, the similarity index identified in this thread is a blazingly simple idea that somehow escaped me. Maybe it's time to dust off the docs...

    --
    If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.
  144. It's not all bad. by shimmin · · Score: 1
    Honestly, the evolution of spam has a certain aesthetic to it. Kind of like a computer virus or a deadly disease, after you get past the whole horrible consequences aspect of it, even spam might be appreciated as an engineering marvel.

    I mean, polymorphic software is something we all want to see, and if the mass mailers are the only ones who are going to develop it, let them.

    Let's hear it for polymorphic spam!

  145. SPAM in Brazil by pdcull · · Score: 1

    It seems that Brazilians have fairly recently deluded themselves into thinking that SPAMming is some sort of credible mass-marketing technique.

    In the last couple of months, the amount of SPAM in my in-box seems to have increased by an order of magnitude. The interesting part is that I'm Pretty Darn Sure (tm) that my bleeding ISP sold my address to the Spam CD makers, as I only use an alias and not my ISP's domain, and yet most SPAM arrives at both addresses.

    The Dumbest of All Spammers has to be the citizen who recently sent me six copies of his Curriculum Vitae, claiming to be a Support Analyst and looking for work! I tried to point out his stupidity, but his ISP account had already been blocked by the time my message got to him.

    I'm not normally in favour of the death penalty, but am prepared to make an exception in the case of spammers...

  146. Why go through that much trouble to detect SPAM? by Lars+T. · · Score: 1
    At least on the user side a couple of simple filters do the same job. Calculating the checksum, sending it to the server (imagine thousands of people doing so), having it checked against other messages (the first ones to report will fall through, because there is no comparable message there yet), then getting a reply like "1363 users got the same message." Too bad it was something from a mailing list.

    On the server side this may be more practical, but I don't want my mail server to delete any mail I get, just because others got the same.

    --

    Lars T.

    To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  147. It's a great idea. by xxxxx · · Score: 1

    Has anyone thought about an x86 port?

    --

    ~xxxxx
  148. A different kind of opt in by lunchlady55 · · Score: 1

    For my own account, I add everyone whose email I am willing to read to my address book, then check all incoming mail's 'from' line against that list. If there's a match, it goes to my inbox, otherwise to a folder called 'Unknown Senders'.
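
    That rule is short enough to sketch in Python using the standard library's address parser. The address book entries are placeholders, and 'Inbox' is an assumed name for the main folder; 'Unknown Senders' is the folder named above.

```python
from email.utils import parseaddr

# Hypothetical address book for illustration.
ADDRESS_BOOK = {"mom@example.com", "boss@example.org"}


def folder_for(from_header: str) -> str:
    """Pick a folder by looking at the From: header alone."""
    # parseaddr splits 'Name <addr>' into its two parts.
    _name, addr = parseaddr(from_header)
    if addr.lower() in ADDRESS_BOOK:
        return "Inbox"
    return "Unknown Senders"
```

    Nothing is deleted under this scheme, so a forged From: line that happens to match the address book is the main way spam still reaches the inbox.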

    If I'm in a hurry, I just read mail in my inbox. If I've got a few minutes, or I'm expecting mail from someone new, I look at the mail in unknown senders.

    I've noticed the same spammers hit me again and again, but that just makes them easier to spot in a big list because they look the same.

    Granted, this treats the symptoms and not the cause, but it can offer some soothing relief for bloated mail accounts until a cure is found.

  149. Checksums are not hopeless by John+Langford · · Score: 1
    The particular checksums you use can be chosen in a way which is robust against small alterations in the document.

    I had a similar idea and wrote up some analysis which details how this can be made robust.