Slashdot Mirror


Using Statistics to Cause Spammers Pain

mlamb writes "Statistical mail classifiers like PopFile save time on the part of their users, but don't do anything to actively combat spam. I just published an article that suggests a way to use classifier output against a spammer while they're connected to your SMTP server, and I'm launching a project called TarProxy to implement it."
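The summary's core idea is to turn the classifier's verdict into a penalty while the spammer is still connected. Below is a minimal sketch, assuming the classifier produces a spam probability between 0 and 1. The threshold and maximum delay are made-up parameters, not TarProxy's actual settings.

```python
def delay_for_score(score: float, threshold: float = 0.9, max_delay: float = 300.0) -> float:
    """Return how many seconds to stall the SMTP reply for one message.

    score: spam probability from a classifier such as PopFile, in [0, 1]
    threshold: hypothetical cut-off below which mail is treated as legitimate
    max_delay: hypothetical stall, in seconds, for a message scored 1.0
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    if score < threshold:
        return 0.0  # likely legitimate: answer at normal speed
    # Map the portion of the score above the threshold linearly onto [0, max_delay].
    return max_delay * ((score - threshold) / (1.0 - threshold))
```

A server using this would wait that long before sending its final reply to DATA, which holds the spammer's connection open. Legitimate mail below the threshold is not slowed at all.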

37 of 334 comments

  1. Nice idea by TheViciousOverWind · · Score: 2, Interesting

    But what if the spammer sends a message to a (good) SMTP server that doesn't have the system, and that SMTP server in turn tries to deliver the "spammail" to the right SMTP server? Won't that hurt the good SMTP server, which is just trying to do its job?

    --
    My <1000 UID is with a hot chick
    1. Re:Nice idea by TheViciousOverWind · · Score: 3, Interesting

      But then the only way the actual spammer would be sending from your server is if you have an open relay? So the idea would be to set up fake open relays? But wouldn't the spammer just blacklist or whitelist the servers? The place where I work once got hit by a spammer (because we used a Matt's FormMail script), and it all happened automatically, in steps:

      - Some web spider found the formmail.cgi.
      - The spider sent a mail to some Hotmail account.
      - 15 minutes later (I guess after confirming the mail got through), it started sending mail non-stop.
      - 30 minutes later, we could see some other type of traffic (the bot apparently passed details of the open relay on to other spammers, possibly people who had bought access to open relays).

      All the while we were on the phone with the police computer-crime department, which didn't know what to do. Then we denied those users access to the network and patched the security hole. (We had held off on that while talking to the police, in the hope that they could actually do something, since the spammer was spamming "right now"... but apparently they were quite clueless.)

      --
      My <1000 UID is with a hot chick
    2. Re:Nice idea by stand · · Score: 2, Interesting
      > Unfortunately the critical mass for this to really work is very, very large.

      I don't think this is necessarily true. As the article points out, setting it up on a few servers would be sufficient to get things started provided those few servers were the right ones. I'll leave it as an exercise to the reader to determine which servers they should be.

      I don't think they should be doing this in Java, though. Java is not a text-parsing language, and this thing really requires some text-parsing muscle. Cross-platform ability isn't as important.

      --
      Four fifths of all our troubles in this life would disappear if we would just sit down and keep still. -C. Coolidge
    3. Re:Nice idea by jonadab · · Score: 3, Interesting

      > > Unfortunately the critical mass for this to really work is
      > > very, very large.

      Yes, it is large.

      > I don't think this is necessarily true. As the article points
      > out, setting it up on a few servers would be sufficient to get
      > things started provided those few servers were the right ones.

      Let me guess: Yahoo's several dozen, AOL's however many, and
      the ones at Earthlink, demon.co.uk, and MSN. Am I close?

      That's a very large critical mass, not in terms of the number of
      servers, but in terms of the amount of mail handled (and, therefore,
      the amount of server beef needed to implement any such measures).

      > I don't think they should be doing this in Java though. Java is
      > not a text parsing language and this thing really requires some
      > text parsing muscle. Cross platform ability isn't as important.

      No need to sacrifice the cross-platformness. Perl is a GREAT
      text processing language, performs faster than Java, and as an
      added bonus is much more cross-platform (provided you don't need
      a GUI (which for this you don't)). It does use quite a bit of
      RAM sometimes, but so does Java. And doing SMTP stuff in Perl
      is really easy. (Net::SMTP rocks in a significant way.) And
      any operating system that's remotely appropriate for use as a
      mail server probably comes with Perl out of the box these days.

      --
      Cut that out, or I will ship you to Norilsk in a box.
    4. Re:Nice idea by minas-beede · · Score: 5, Interesting

      There are a few spammers who send directly from their own IPs. If you want to tarpit them, just tarpit the traffic from their IPs; you don't need to analyze anything.
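For spammers who send directly from their own IPs, the tarpit decision can be a plain address lookup, with no content analysis at all. Here is a minimal sketch using Python's standard `ipaddress` module. The network list is hypothetical and uses RFC 5737 documentation ranges, not real spammer addresses.

```python
import ipaddress

# Hypothetical list of networks that send spam directly; these are
# documentation ranges (RFC 5737), not real spammer addresses.
SPAMMER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def should_tarpit(client_ip: str, networks=SPAMMER_NETWORKS) -> bool:
    """Return True if the connecting address falls inside any listed network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is simply "not in" an IPv4 network, so mixed lists are safe.
    return any(addr in net for net in networks)
```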

      For other spam, sent through open proxies or open relays, tarpitting does not hurt the spammer. If the spammer is working through open proxies and enough tarpits were running, you could hurt him, but until there are enough tarpits there is still zero (0.000) percent pain to the spammer. Some open proxies become slow with one or two tarpits; the others are fast enough to keep the spammer's server fully busy. He only cares whether he's running his server flat out. Delays at one or more open proxies mean little.

      Right now I'm trapping spam on a relay spam honeypot. It comes to the honeypot from open proxies; there's nothing I can learn about the spammer by learning about the proxies. It comes (usually) as 99-recipient spam messages. This particular spammer uses embedded comments in his spam to evade Bayesian filters. That makes no difference to me: I can see it is spam. I have no valid email to filter out; everything is spam. That's one of the beauties of a honeypot: the spammer does your filtering for you.

      Somewhere over 20,000 recipients so far, since Wednesday. Here's a tiny sample, showing the URLs he advertises and the random comments he uses to defeat filters:

      [a href="http://www.directmailorderbrides.com/?oc=239 0]"A ni[!--HVtu--]ce la[!--HVtu--]dy

      [a href="http://www.flati.com/silagra/"]L[!--WPVizB-- ]im[!--WPVizB--]ited

      (I replaced angle brackets with square brackets; you'll have to imagine them restored.)

      I have no filter, no smarts of any kind. The honeypot is a mail server with the output queue stopped. I got the spammer to start sending spam by delivering three of his relay test messages to him; he'd sent so many that I decided to see who he was and what spam I'd get if I did deliver.

      I'm trying various ways to hurt the spammer, but I've not yet delivered enough hurt; he's still operating. Other spammers have succumbed more readily. This guy is better at hiding himself.

      Note, by the way, that he puts no comments in the URL; if you filter on those (or remove comments before filtering, which would be easy) the spam is instantly revealed. One guy simply rejects any email message with three repeated comments in a line (this spam is laced with the comments throughout, not just in the http lines). The spammer's clever way of obscuring the spam is itself useful in identifying the spam: no points for Spammy.
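Both countermeasures mentioned here are easy to sketch. The first removes comments before tokenizing. The second flags messages where the same comment keeps recurring. "Three repeated comments in a line" is ambiguous, so this sketch counts repeats anywhere in the body, which is only one possible reading. The function names and the threshold are illustrative.

```python
import re
from collections import Counter

# An HTML comment; "--\s*>" also catches the "-- >" variant seen in the sample.
COMMENT_RE = re.compile(r"<!--.*?--\s*>", re.DOTALL)

def strip_comments(body: str) -> str:
    """Delete HTML comments so 'ni<!--HVtu-->ce' is tokenized as 'nice'."""
    return COMMENT_RE.sub("", body)

def has_repeated_comments(body: str, min_repeats: int = 3) -> bool:
    """True if any one comment (ignoring whitespace) occurs min_repeats times."""
    counts = Counter(re.sub(r"\s+", "", c) for c in COMMENT_RE.findall(body))
    return any(n >= min_repeats for n in counts.values())
```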

      Windows users with a permanent connection can step into running a relay spam honeypot very easily: they can run Jackpot: http://jackpot.uk.net/

      There is at least one open proxy honeypot out there: search news.admin.net-abuse.email on Google Groups for it. These can be very wicked; create your own for even more fun. Or create your own open relay honeypot and see if you can make it even more wicked.

      (Oversize reply packets from an open proxy honeypot might have a very interesting effect.)

    5. Re:Nice idea by Shoten · · Score: 4, Interesting

      First off, you are incredibly wrong. Almost all spam is bounced off of servers that relay...that is, they forward mail for users of any domain. That's why this concept exists; spammers search for "open relays" (that's why they're called that, btw) and use them. TarProxy would look like a normal open relay to the spammer, and therefore he would use it.

      Unfortunately, there is a problem. Before TarProxy there was another thing, called a "teergrube" or "tarpit." What it did was slow down the connection (with things like ICMP source-quench and psychotically small TCP window sizes) so that it acted like a spam speed bump. In the meanwhile, it didn't actually forward any of the spam anyhow. Why didn't this technology become more widespread? I'm glad you asked! Because it was trivial for the guys who develop spammer software to recognize these systems, have their software detect such behavior, and cease using them within less than a minute. And that's what will happen with a TarProxy, alas.
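The teergrube's speed-bump effect can be approximated at the application level by trickling out each SMTP reply a byte at a time. The ICMP source-quench and tiny-TCP-window tricks described above work below the socket API and are not shown. This is a minimal sketch: `send` and `sleep` are parameters so the pacing logic can be tested without a network, and the delay value is arbitrary.

```python
import time
from typing import Callable

def drip_reply(send: Callable[[bytes], object], reply: str, delay: float = 1.0,
               sleep: Callable[[float], object] = time.sleep) -> int:
    """Send an SMTP reply one byte at a time, pausing between bytes.

    `send` would normally be a connected socket's sendall method.
    Returns the number of bytes sent.
    """
    data = reply.encode("ascii")
    for i in range(len(data)):
        if i:
            sleep(delay)  # pause between bytes, not before the first one
        send(data[i:i + 1])  # slicing keeps a one-byte bytes object
    return len(data)
```

In a real server, `send` would be something like `conn.sendall` on an accepted socket. With a 1-second delay, a 28-byte greeting such as `220 mail.example.com ESMTP\r\n` takes about 27 seconds to arrive. As the post points out, spamware can detect this kind of pacing and move on.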

      --

      For your security, this post has been encrypted with ROT-13, twice.
    6. Re:Nice idea by Bob+MacSlack · · Score: 2, Interesting

      I was just thinking about this. The poster said that one email came through initially to check that the relay works. If this email doesn't get sent, the spammer knows it's not an open relay and moves on. This is all automatic as well, so it wouldn't cause them any grief. But what if you set it up to allow that first message? The relay gets marked as open and distributed to other spammers, but when the real spam starts, it all goes to /dev/null. The spammer wouldn't even know it was happening unless they kept checking. Eventually it would get blacklisted, but not before it caused their servers to waste a bit of time and spared a few people's mailboxes a message. Maybe even combine this with the TarProxy idea of slowing the connections to maximize their wasted time.
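The "let the test message through, then black-hole the rest" idea needs only a per-client counter. This is a minimal sketch. Keying by client IP is an assumption, and a real setup might key by envelope sender instead. The class name is hypothetical.

```python
from collections import defaultdict

class RelayHoneypot:
    """Deliver the first message(s) each client sends, then silently discard.

    Every message still gets a normal "250 ok" reply, so the spammer
    sees a working open relay.
    """

    def __init__(self, free_messages: int = 1):
        self.free_messages = free_messages
        self.seen = defaultdict(int)  # client IP -> messages received so far

    def accept(self, client_ip: str) -> bool:
        """Return True to really deliver this message, False to drop it."""
        self.seen[client_ip] += 1
        return self.seen[client_ip] <= self.free_messages
```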

      But I agree, something definitely needs to be done about SMTP; it is WAY past its prime. Spam is a battle that must be fought on many fronts, but the servers are definitely the most important.

  2. Interesting idea by Quasar1999 · · Score: 4, Interesting

    Just one question... what if the spammer doesn't connect to your SMTP server to send billions of messages through it? What if the spammer (with half a brain and some scripting ability) only sends a few emails through your SMTP server? Most SMTP servers are still wide open, and sending 10 emails through one server before moving on to another open server would be such low volume that it wouldn't show up on the statistical radar screen... or did I not understand what you are trying to do?

    --

    ---
    Programming is like sex... Make one mistake and support it the rest of your life.
    1. Re:Interesting idea by minas-beede · · Score: 4, Interesting

      Spammers have done exactly that. A year ago almost all relay spam I trapped came as two 21-recipient spam messages followed by about an hour of silence.

      My current spammer is sending 99-recipient spam, and sometimes he sends as many as 10 in one session. All the spam stays on my system; he is totally wasting his time.

      I've seen a lot of recent single-recipient spam, and I've seen single spam messages with recipient counts in the thousands. Much relay spam reaches my relay spam honeypot from open proxies. I think there was some in January that came directly from the spammer.

      This (running a relay spam honeypot) is easy for many Windows users - try it yourself: http://jackpot.uk.net/

      Linux users can make Jackpot work (it's in Java), or they could jimmy sendmail (or some other MTA) into being a honeypot; do it on a second IP with no other email function. The MTA I use is so old it doesn't know EHLO. You don't need sophisticated tools to beat the spammers.

  3. OpenBSD Spam Blocking Engine by Incadenza · · Score: 5, Interesting

    The hurt-back part of the project is not new. Theo de Raadt is working on just that, driven by a list of IP addresses (much faster, so suitable for busy servers):

    Very simply, this hangs the full list of ~12,000 spam-sending IP/mask entries listed at www.spews.org off a pf(4) rdr-anchor (which is only entered for port 25). When connections from these spammers arrive they are redirected to a daemon which minimally fakes the SMTP protocol with very low overhead -- for multiple connections at the same time -- and then the message is left on the sender's queue by providing a 550 return code.

    The theory here is that most spam still comes in via open relays, and the only way we are going to convince them to clean up their act is to waste _their_ disk space, their time, and their network bandwidth more than they waste ours. For those spammers who drop messages when they receive a 550, well, we have not wasted any further time or network bandwidth, and even in that situation I think some of them might remove an address if they receive a 550.
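The "daemon which minimally fakes the SMTP protocol" can be reduced to a small state machine. It accepts every command, swallows the message body, and refuses the message only once the sender has transmitted all of it. This is a sketch in that spirit, not spamd's actual code. The pf redirection is not shown, and the hostname and reply texts are placeholders.

```python
class FakeSMTPSession:
    """Accept everything until the message ends, then refuse it with final_code."""

    def __init__(self, final_code: int = 550):
        self.final_code = final_code
        self.in_data = False

    def banner(self) -> str:
        return "220 mail.example.com ESMTP"

    def handle(self, line: str):
        """Return the reply for one input line (CRLF stripped), or None inside DATA."""
        if self.in_data:
            if line == ".":  # a lone dot ends the message (body dots are doubled)
                self.in_data = False
                return f"{self.final_code} message refused"
            return None  # swallow body lines without replying
        verb = line.split(" ", 1)[0].upper()
        # Simplification: a real EHLO reply lists extensions in a multiline 250.
        if verb in ("HELO", "EHLO", "MAIL", "RCPT", "RSET", "NOOP"):
            return "250 ok"
        if verb == "DATA":
            self.in_data = True
            return "354 end data with <CR><LF>.<CR><LF>"
        if verb == "QUIT":
            return "221 bye"
        return "500 command not recognized"
```

The spammer pays the full cost of transmitting the message before learning that it failed. Combined with slow replies, that cost grows.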

  4. Re:Anti-Spam software by stinky+wizzleteats · · Score: 5, Interesting

    I've been using bogofilter for a while now as a pass-through tagging mechanism. I filter on the client side based on the tag information. This sounds a lot like what you are doing.

    The only thing close to a false positive I've gotten was having to dumpster dive into my spam folder to retrieve an amazon order confirmation.

    Bayesian filtering really works, but you have to train the filter correctly and with as large a corpus as possible.

  5. Misunderstandings by MajroMax · · Score: 3, Interesting
    There seem to be some currently-popular misunderstandings about this article. This TarProxy is not intended to be running on outgoing SMTP servers -- it makes no sense to throttle clients that you're supposed to be monitoring anyway.

    Instead, this is meant to be run on the incoming SMTP server, the one that receives the mail. It will only hurt the spammer if he's trying to send a bunch of spam to your domain, but every server running this can help.

    --
    "Evil company X is threatening to restrict our rights! Let's all get together to stop--OOOH! SHINEY!!!" -- AC
  6. remove the open relays by vinnythenose · · Score: 4, Interesting

    The easiest solution is to have no open relays. I know I know, it ain't gonna happen, but perhaps this could convince more of those relays to close their doors:

    What we do is have a small app that plugs into Eudora, Outlook, Evolution, KMail, etc. Whenever you get a spam, you click a button; it scans the headers, finds the SMTP server that sent the spam, and then sends them one email informing them that they are sending spam (of course you need a way of getting the sysadmin's email address).
    If enough people did this then the bad relays would be swamped with emails informing them of the spam they've been relaying, and they might close their relay. And non-open relays that just allow spammers to spam might think about being less friendly to spammers.
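The "scan the header to find the SMTP server that sent the spam" step might look like this. Received headers are prepended at each hop, so the topmost one is written by your own server and names the host that connected to it. Lower headers can be forged by the spammer and should not be trusted. This sketch assumes a single receiving server with no internal hops. Finding the sysadmin's address is a separate problem, as the post notes, and is not shown.

```python
import re
from email import message_from_string

# Bracketed IPv4 literal, as many MTAs write in "from host (host [192.0.2.9])".
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def likely_relay(raw_message: str):
    """Return the IP that handed the message to your server, or None."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return None
    match = IP_RE.search(received[0])  # topmost header = your server's own record
    return match.group(1) if match else None
```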

    What do people think, is it lame?

    --
    --- I used to moderate, then I read the -1 articles and decided having to filter through them was not worth it.
  7. Actually, the author addresses that here... by Radical+Moderate · · Score: 2, Interesting

    Check out http://www.martiansoftware.com/nailgun/

    --
    Never let a lack of data get in the way of a good rant.
  8. Naughty idea: DDOS open relays according to RBL by rpresser · · Score: 2, Interesting

    Step 1: sysadmins band together in a DDOSOR alliance. Step 2a: Spammer uses open relay for spam campaign. Step 2b: Alliance member starts to receive spam. Step 2c: DDOSOR alliance is notified immediately and starts one-hour DDOS attack on open relay. Step 2d: open relay can't finish sending spam. Step 3: Profit!

  9. Re:Anti-Spam software by Scooter · · Score: 2, Interesting

    I didn't think there was a solution available to this either, but I have since implemented a SpamAssassin script that logs in to my IMAP mailbox at my ISP, deletes all the spam, and then fires up fetchmail to grab what's left. I did loads of testing and kept the spam in a separate folder for a few weeks just in case, but it never deleted anything that wasn't spam, so now I don't bother moving it; it just zaps it straight off the IMAP server. Yeah, one day it might delete some non-spam, but what the hell. It accepts "whitelists" for known good recipients. Some spam still gets through, but nothing like the 150-odd I used to get each day. Of course this doesn't really stop the spam being delivered to my ISP and wasting bandwidth etc., but at least I don't have to stare at 30 variants of the Nigerian scam, 10 invitations to a bigger penis and (more worryingly for me) bigger breasts, 15 or so attachments (.scr, .jpg.pif, and those really cunning ones with 100 spaces before the extension, lol), and, for some reason beyond my capacity, a fair old amount of email about septic tanks. About 35% of this email was from Korea/China, but most of it was from the USA.

    I can recommend SpamAssassin. I'd never used Perl before and probably never will again (nothing against Perl; I'd just rather use one scripting language for my own stuff, and I happened to see PHP first!), but like most scripting languages it was easy enough to cobble something together, using SA and the IMAP Perl module.

  10. Easy to defeat, just use dynamic spamming software by sanermind · · Score: 4, Interesting

    Easy to defeat: just use spamming software that dynamically increases its connection pool whenever it encounters a 'slow' SMTP recipient. Even if a large part of the net population were running this, the spammer could just spawn thousands of simultaneous (slowed down, yes) connections and still maximize his bandwidth utilization. If it takes 2 minutes to send each message, it doesn't matter if he's sending 5000 messages at once!

    I believe Linux, for example, allows up to 8192 open sockets, and I think this can be changed with a sysctl command, and most definitely could be with a few changes to kernel headers.

    Sure, it would take a machine with decent memory, but that's not too hard to find.
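The arithmetic behind "it doesn't matter if he's sending 5000 messages at once" is throughput = concurrency / time per message, which is Little's law rearranged. This is a sketch that assumes each connection carries one message at a time.

```python
def messages_per_second(connections: int, seconds_per_message: float) -> float:
    """Steady-state send rate when each connection handles one message at a time."""
    if seconds_per_message <= 0:
        raise ValueError("seconds_per_message must be positive")
    return connections / seconds_per_message
```

With the post's numbers, 5000 connections at 2 minutes (120 s) each still deliver about 41.7 messages per second, or roughly 150,000 per hour. In this model the tarpit's cost to the spammer shows up as memory and open sockets rather than as lost throughput.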

    --

    ---
    the pen is mightier than the sword, the sword is mightier than the court, the court is mightier than the pen.
  11. Same idea, different approach. by shadwwulf · · Score: 2, Interesting

    I'm thinking that SpamAssassin, along with qmail-qfilter and a small Perl script to tie them together that invokes a sleep() loop for every spam-like message, could easily be used to do the same thing, because SpamAssassin returns a score for the message's likelihood of being spam...
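Here is that glue sketched in Python rather than Perl. It assumes `spamc -c` prints the result as `score/threshold` (for example `7.4/5.0`) and that spamd is running. The stall policy (10 seconds per point over the threshold, capped at 10 minutes) is invented for illustration. The hookup to qmail-qfilter is left out.

```python
import subprocess
import time

def parse_spamc_check(output: str):
    """Split spamc -c output such as '7.4/5.0' into (score, threshold)."""
    score, threshold = output.strip().split("/")
    return float(score), float(threshold)

def stall_seconds(score: float, threshold: float,
                  per_point: float = 10.0, cap: float = 600.0) -> float:
    """Hypothetical policy: stall per_point seconds for each point over threshold."""
    return min(cap, max(0.0, (score - threshold) * per_point))

def filter_message(message: bytes) -> None:
    """Score one message with spamc, then sleep before letting it continue."""
    result = subprocess.run(["spamc", "-c"], input=message, capture_output=True)
    score, threshold = parse_spamc_check(result.stdout.decode())
    time.sleep(stall_seconds(score, threshold))
```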

    cheers..

  12. Training by gmuslera · · Score: 2, Interesting
    The idea sounds good, but as far as I understand, Bayesian filtering is based on training, and what is learned could differ from user to user.

    If you use a static word frequency list, spammers will get around it (check the POPFile site for the latest spammer tricks); if it is dynamic, then the users of your system must train it for a while (someone must say whether each message is spam or not, by reading it). You must have another way to access your server for the training, and that is another possible point of vulnerability.

    And beyond this, since it depends on the user, you should not use a common word frequency list; you should have one for each user, and check whether the message is spam against the recipient's word base.
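The per-recipient point can be sketched by keeping a separate token count per user, so the same text scores differently depending on whose training data it is checked against. For brevity, the scoring is plain naive Bayes log-odds with add-one smoothing and equal priors. That is an assumption made for the sketch, not POPFile's or TarProxy's actual algorithm.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9$'-]+")

class PerUserFilter:
    """Separate spam/ham token counts for each recipient."""

    def __init__(self):
        # user -> {"spam": Counter, "ham": Counter}
        self.counts = defaultdict(lambda: {"spam": Counter(), "ham": Counter()})

    def train(self, user: str, text: str, label: str) -> None:
        """Add a message's tokens to this user's counts; label is 'spam' or 'ham'."""
        self.counts[user][label].update(TOKEN_RE.findall(text.lower()))

    def spam_probability(self, user: str, text: str) -> float:
        spam, ham = self.counts[user]["spam"], self.counts[user]["ham"]
        vocab = len(set(spam) | set(ham)) or 1
        n_spam, n_ham = sum(spam.values()), sum(ham.values())
        log_odds = 0.0  # equal priors assumed
        for tok in TOKEN_RE.findall(text.lower()):
            p_spam = (spam[tok] + 1) / (n_spam + vocab)  # add-one smoothing
            p_ham = (ham[tok] + 1) / (n_ham + vocab)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-log_odds))
```

An untrained user gets a neutral 0.5 for everything. This matches the post's concern that the filter only helps people who bother to train it.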

    At best, it will work for the users who care to train this server; for the other users, who don't want to waste their time, spam will keep coming at the same speed as before. At worst, you'll be using a common list for everyone, you may slow down delivery from mailing lists and the like, and people on your server could get unsubscribed from some of them.

    It's a good idea, but some parts should be implemented with care, and it should apply only to the users who care about it. The others should not be slowed down, because it can put obstacles in the way of normal mail.

  13. "Stations of the Cross" Relays attacking relays. by Nonesuch · · Score: 5, Interesting
    We are working on a project called "Stations of the cross".

    I have several domain names that appear on many of the "million address" CDs and other popular spam lists, but which no longer have any legitimate recipients/users.

    We are also working on obtaining access to true "realtime" RBL lists of currently abused open relay servers. Assistance would be appreciated.

    The core of "stations of the cross" is a custom DNS server. This server is authoritative for these oft-spammed domains, and each time a request is made for an MX record, it returns (with a short TTL) a unique randomly generated list of MXes, each address on the list being a known open relay.

    So when a spammer or relay first goes to deliver a message, it selects an open relay from the list of MXes and hands the message off to that host. Being an open relay, the host accepts the message for my domain, then does a DNS lookup for the MX record. The relay receives a (different) list of other open relays...

    Usually, you can get a message to traverse a dozen or more open relays (most sendmail systems default to a maximum "hop count" of 25), after which the message will bounce.
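The hop limit that eventually stops the loop is just a count of Received headers, because each MTA prepends one. This is a minimal sketch of that check. The default of 25 is the sendmail figure the post cites, and real MTAs differ in exactly how they count.

```python
from email import message_from_string

MAX_HOPS = 25  # the sendmail default cited in the post

def too_many_hops(raw_message: str, max_hops: int = MAX_HOPS) -> bool:
    """True if the message has passed through more than max_hops MTAs."""
    msg = message_from_string(raw_message)
    return len(msg.get_all("Received") or []) > max_hops
```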

    Since the only traffic my server has to deal with is DNS queries and responses, this is very low-overhead for me, but depending on the size of the spammail, very high overhead for the open relay servers.

  14. Re:OpenBSD's spamd by isn't+my+name · · Score: 2, Interesting

    Actually, it isn't quite the same thing. What spamd does is to use up resources on open proxies by sending back a bunch of bounces. He identifies these by using SPEWS, or some other list of open proxies. The side effect of this is that you will be bouncing all messages from them. If you are unfortunate enough to have a business relationship with someone with an open proxy, then you have just stopped any ability to communicate via e-mail by running spamd.

    However, if the idea suggested in the article is implemented, you will still be using up resources on the open proxy, but only for those messages that are actually spam. You can still receive e-mail from idiots running open proxies if you have the misfortune of needing to.

  15. Predictable failure? by Euphonious+Coward · · Score: 3, Interesting
    The first two design principles they suggest:
    • Free: It's no good unless it's everywhere... or at least in lots of places. TarProxy is Open Source Software released under a BSD-style license and available on SourceForge (see project page for details).
    • Platform Independent: TarProxy is written in Java, so it runs on Linux, Windows, Solaris, OS X, and any other operating system with a Java Virtual Machine available.
    contradict one another, and therefore directly suggest incipient failure. Any program you want widely deployed had better not depend on having some buggy JVM installed.

    (Arguably that is the reason that Freenet has been a practical failure. Every time I have tried to use it, it has got stuck in an infinite loop, or consumed all my swap space, or crashed. I blame buggy JVMs.)

    If you want software to be widely and successfully deployed, it should (must!) resemble the software that already has been. Almost all such code (99%+) has been in C or in C++. Are there any Free Software programs written in Java successfully deployed outside of Java development shops? (Rhetorical question; the answer is "not enough to matter".)

    If you want portability to Unixes, to w32, and to Macosix, you already get that with Gcc and autoconf.

    If it's in Java, I certainly won't run it as a daemon.

  16. 550 is wrong. Use 450 instead! by laing · · Score: 4, Interesting

    A 550 error is a permanent rejection. The spam source knows that the mail cannot be delivered, so it gives up. A 450 error tells the connecting SMTP server that your server is temporarily unable to accept the mail, but that the error is not fatal and delivery should be retried. This is much more likely to keep the message in the spammer's mail queue.
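In SMTP, the first digit of a reply code carries the meaning: 2 for success, 3 to ask the client to send more data, 4 for a transient failure (keep the message queued and retry), and 5 for a permanent failure (give up and bounce). Here is a minimal sketch of how a conforming sender interprets the reply.

```python
def sender_should_retry(reply: str) -> bool:
    """True if a conforming sending MTA would keep the message queued for retry."""
    code = reply[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an SMTP reply: {reply!r}")
    return code[0] == "4"  # 4yz = transient negative completion
```

The 450 trick depends on the spamware following this convention. Software that treats every non-2xx reply as final would not keep retrying.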

  17. Argh! by SecretAsianMan · · Score: 3, Interesting
    It seems like every proposal I hear for a solution to the spam problem concludes with "If enough people did this, then...". That highlights the main problem with tarpits and similar mechanisms that only work when used en masse. Guess what? There's not an icicle's chance in hell of there being enough people to make any of these schemes work. As long as Johnny Sixpack and Patricia Partygirl (who probably outnumber the geeks at this point) keep using their spam-magnet Hotmail accounts and engaging in activities that get their addies harvested, spam will survive.

    Personally, the spam solution I like the best is to have procmail+formail or some other tool sitting on your mail server and making unknown senders go through a confirmation step. It doesn't work for everyone (for instance, people expecting email replies to résumés! NAGI...), but if it works for you it tends to work very well. It inconveniences everyone else, but hey, everyone else is not me. I can whitelist all the people I truly care about.
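The confirmation step can be sketched independently of procmail and formail. Known senders are delivered. Anyone else is challenged to echo back a token, and a correct token puts them on the whitelist. This is a Python sketch of the logic only. Holding the message and sending the challenge mail are not shown, and the secret and function names are hypothetical.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical per-installation secret

def challenge_token(sender: str, secret: bytes = SECRET) -> str:
    """Token an unknown sender must send back; derived from the address."""
    return hmac.new(secret, sender.lower().encode(), hashlib.sha256).hexdigest()[:16]

def route(sender: str, whitelist: set) -> str:
    """'deliver' for whitelisted senders, otherwise 'challenge' (hold and ask)."""
    return "deliver" if sender.lower() in whitelist else "challenge"

def confirm(sender: str, token: str, whitelist: set, secret: bytes = SECRET) -> bool:
    """Whitelist the sender if the token in their reply matches."""
    if hmac.compare_digest(token, challenge_token(sender, secret)):
        whitelist.add(sender.lower())
        return True
    return False
```

Deriving the token with an HMAC avoids having to store a token per pending sender.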

    Either that or we should throw out SMTP, email RFCs, sendmail, etc. and build a spam-free system from the ground up. Yeah, right.

    --

    Washington, DC: It's like Hollywood for ugly people.

  18. Bouncing? by pz · · Score: 2, Interesting

    How about a manual method where one creates a fictitious bounce message from spam that has made it to the mailbox?

    The idea is the following: spam gets through whatever filter you might have, but you still want to reject it, and given that some spammers MIGHT be trimming their lists based on bounces, you forge a bounce message from the spam.

    Does anyone know if this is possible with, eg, RMail or VM (or something else) running under Emacs?

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  19. relay honeypots are better by Charles+Dodgeson · · Score: 2, Interesting
    If more people would run relay honeypots such as jackpot that might make a dent in the economics of spam.

    I'm not saying that the recipient server tar-pitting is a bad idea, but I think that there are more effective ways of raising the cost for spammers. Blacklisting the entire /24 of anything supporting spam would pressure providers to nuke spammers (or at least pass on costs to spammers).

    --
    Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
  20. Re:"Stations of the Cross" Relays attacking relay by Anonymous Coward · · Score: 4, Interesting

    Want to find open relays? Here's a nice simple way I implemented a couple of years ago, and ran for awhile. It's quite simple, and detects single stage relays rather quickly.

    Write something that listens on port 25. When it receives a connection, connect back to the calling host on port 25. If the connection attempt succeeds, copy characters back and forth. Anything they send to you, you send to their port 25, and vice-versa.

    If it's a true open relay, it will gladly accept the mail over and over again. I had a few mail servers looping THOUSANDS of times through me since they didn't check Received: headers. I also realize that it would be trivial to *ahem* "break" the Received: line such that it wouldn't increment the counter.

    Granted, that sucks down bandwidth, so back to the point - proving that this is an open relay. What you do is stick a magic header in the message as it heads back to them. If you receive that header back from a host, it's something you've already looped, and they're an open relay.
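The "magic header" proof can be sketched on its own, without the port-25 forwarding. Tag each copy you send back with a unique token, and remember which host it went to. If a message later arrives carrying one of your tokens, that host relayed it. The header name is hypothetical.

```python
import uuid

LOOP_HEADER = "X-Loop-Probe"  # hypothetical header name

class LoopProbe:
    """Tag outgoing copies with a unique header and notice when one returns."""

    def __init__(self):
        self.sent = {}  # token -> host the tagged copy was sent to

    def tag(self, raw_message: str, host: str) -> str:
        token = uuid.uuid4().hex
        self.sent[token] = host
        return f"{LOOP_HEADER}: {token}\r\n" + raw_message

    def check(self, raw_message: str):
        """Return the host a returning probe was sent to, or None."""
        for line in raw_message.splitlines():
            if not line:
                break  # blank line ends the headers; don't scan the body
            if line.startswith(LOOP_HEADER + ":"):
                return self.sent.get(line.split(":", 1)[1].strip())
        return None
```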

    Now you know they're an open relay, so you can add them to your MX lists. You can also then avoid letting them run through your looper, since it won't provide any more data.

    The beauty of this plan is that you're only giving them what they pushed upon you first. If they leave you alone, you leave them alone. It's a nice implementation of a concept I wish more people would honor.

  21. Question by helix400 · · Score: 2, Interesting
    If I were the spammer, and these S L O W tarpits really messed me up... my first instinct would be to configure my program to keep track of the transmission rate of every outgoing email. If one started off fast but slowed down, I'd cut the connection immediately, log that address in some "do not spam again... he's a tarpitter" list, and move on to the next victim.

    Would that work? Or would trying to keep track of 20,000 outgoing emails' transmission rates simultaneously cause more problems than it's worth?

  22. Re:Anti-Spam software by bergeron76 · · Score: 2, Interesting

    Bayesian filtering is a great technology, but the OSS movement really needs to tread lightly, or get some legal beagles to help us analyze the implications of using it, because MSFT has a patent on it. We (the OSS community) need to make sure that we can easily and indisputably prove "prior art" in the event that MSFT tries to overwhelm some of our best projects with _expensive_ legal tactics.

    I can't help but think that we need to _really_ be on our guard with regard to things like this, because I wouldn't put it past MSFT et al. to soak up much of the good IP (Intellectual Property) and then try to "drop the hammer" on us down the road.

    Just my .02 cents...

    --
    Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.
  23. Re:Answer by minas-beede · · Score: 3, Interesting

    It's most fun to do the dirty work against the spammer. What he thinks is an open relay doesn't have to be one.

    This one whacked Ralsky hard for several months - Ralsky never caught on: http://www.corpit.ru/cgi-bin/h0n5yp0t

    You can do it, too:

    http://jackpot.uk.net/

    And please do.

  24. Spamminess Calculation Problem by 6e7a · · Score: 2, Interesting

    In my experience, the strongest indication of spam is near the very end of a message, where it says something like "click here to unsubscribe." If you've already accepted that much of the message, isn't it possible that the spammer only has to wait for the message acknowledgement before disconnecting? What the spammer sees is either a normal or a slow acknowledgement. Is that enough to make a difference?

  25. overly complex by Dossy · · Score: 2, Interesting

    I've been using qmail, qmail-scanner and SpamAssassin with a few very minor tweaks to deter spammers. Basically, qmail-scanner runs SpamAssassin, and if SA returns with a score above 15.0, instead of sending a "250 ok" to the spammer telling them the mail was accepted, I send back a "5.3.0 spam detected" -- this seems to have gotten me off a couple of spam lists where the spammers actually care enough to clean their lists.

    I made these tweaks because once the mail is sent and the spammer has disconnected, there really is no way of getting information back to them that you're rejecting their mail. So, you have to reject it at the time they've got the SMTP session established ... which I've done.

    TarPit seems like an exercise in overengineering with little proof that it'll do anything to hurt spammers -- they'll figure a way around the tarpit, somehow.

    -- Dossy

  26. Re:Anti-Spam software by stinky+wizzleteats · · Score: 2, Interesting

    We (the OSS community) need to make sure that we can easily and indisputably prove "prior-art"...

    Done.

  27. Re:Why use the statistics? Throttle it all! by Anonymous Coward · · Score: 1, Interesting

    Here's why: because that will penalize legitimate
    but large-scale mail servers.

    For instance, my ISP is a cable modem provider.
    They have probably tens of thousands of clients
    locally, all sharing an SMTP server. That SMTP
    server is presumably busy all day delivering a
    large volume of legitimate mail. If everything
    is throttled, then it will become much harder to
    run that server.

    (And yes, you can increase parallelism, but
    there is a limit to how much you can do that.
    Each thread or process has a certain overhead,
    and the maximum number you can have going at
    once is probably smaller than you think.)

  28. A much more backward approach by Anonymous Coward · · Score: 1, Interesting

    Why not attack spammers in the reverse way? Imagine for a moment that you are a spammer. You get paid more for more results, right? Well, imagine that I make an SMTP server that, when you spam me, searches through your message and starts crawling all embedded links, until it has hit about 20 of them. But it doesn't stop there. It keeps crawling. Maybe if there is a form on the site, it fills it out with bogus garbage and submits it.

    The idea would be that when someone sends out spam, this server would generate a flurry of activity. Suddenly, the spam would be so effective at bringing traffic, but not actually effective AT ALL at bringing valuable traffic. Imagine if you hired a spam company to promote your site, and suddenly, your site had the /. effect, but none of the traffic was genuine. You would be paying for lots of wasted bandwidth with less results. How could you distinguish good results from bad ones?

    If you want to stop spam, you have to attack the people who benefit from it, not the people who perform it.

  29. iffy at best by tacocat · · Score: 2, Interesting

    I understand this guy's theory of operation, but I am not convinced of its value, for the following reasons:

    • Each slow link consumes a port on my machine. If I have a limit of 64 simultaneous threads on my box, this can be used effectively as a denial-of-service tool.
    • Bayesian filters already suffer from a problem where spammers break up words with bogus HTML tags: Via<foo>gr for fr<bar>ee. This simply means that spammers will front-load their email messages with a lot of cleaner words in white-on-white text, or just keep using the bogus HTML tags.
    • You are going to have a tremendous negative impact on all the false positives, which are rampant at the beginning of any Bayesian implementation.
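The denial-of-service concern in the first point can be limited by capping how many sessions may be tarpitted at once. Once the cap is reached, further suspected spam is answered at normal speed, so slow sessions cannot use up every worker thread. This is a minimal sketch with an arbitrary cap.

```python
import threading

class TarpitBudget:
    """Cap the number of connections that may be slowed down simultaneously."""

    def __init__(self, max_slow_sessions: int = 32):
        self._slots = threading.BoundedSemaphore(max_slow_sessions)

    def try_acquire(self) -> bool:
        """Non-blocking: True if this session may be tarpitted."""
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        """Give the slot back when a tarpitted session ends."""
        self._slots.release()
```

A session handler would call `try_acquire()` and, on success, tarpit inside a `try`/`finally` that calls `release()`. If no slot is free, it replies at normal speed.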

    With all that aside, there may be some points in this that are valid. But I'm not certain that spammers' use of mail servers is going to be much affected by this technique.

    Wouldn't it be easier to simply challenge each incoming IP address to test it for being an open relay and if so, REJECT?

    I think that the postfix group has a similar concept for testing any incoming email address in the MAIL FROM tag to see if that address can in turn accept mail.

  30. Re:Bayesian filtering - no problem by waynemcdougall · · Score: 2, Interesting
    Russian wives. I was surprised directmailorderbrides wasn't picked up, but as it turned out, that's the first time that word (token) has appeared in any of my email.

    Note that while the Paul Graham rating of 0.999999999999999 is high, in practice I use Gary Robinson's calculations (more refined and use even infrequently occurring tokens - I get better, less extreme results). Gary Robinson's spam rating on this is: 0.61705129961986 That may seem relatively low, but is on a different scale and is firmly indicative of spam.
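The two combining rules mentioned here can be compared directly. Graham combines token probabilities as prod(p) / (prod(p) + prod(1 - p)). Robinson's geometric-mean method computes P = 1 - (prod(1 - p))^(1/n) and Q = 1 - (prod p)^(1/n), then rescales (P - Q)/(P + Q) onto [0, 1]. This sketch assumes the per-token probabilities are already computed and clamped away from 0 and 1. It does not reproduce the poster's exact figures.

```python
import math

def graham_combine(probs):
    """Paul Graham's combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)

def robinson_combine(probs):
    """Gary Robinson's geometric-mean combination, rescaled to [0, 1]."""
    n = len(probs)
    P = 1.0 - math.prod(1.0 - p for p in probs) ** (1.0 / n)  # spamminess
    Q = 1.0 - math.prod(probs) ** (1.0 / n)                   # hamminess
    return (1.0 + (P - Q) / (P + Q)) / 2.0
```

For token probabilities of 0.99, 0.99, and 0.2, Graham's rule gives about 0.9996, while Robinson's gives about 0.70. This illustrates the "less extreme results" the poster describes.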

    Unlike Paul Graham, I don't parse out (and ignore) HTML comments. I find all information is useful, and I find it just as effective (and simple) to treat the text as a straight byte stream.

    --
    Recycle PCs and build a wireless community network www.hillsborough.org.nz