Spam Trap Claims 10x-100x Accuracy Gain

Posted by kdawson on Monday December 3, 2007 @03:31PM from the see-it-when-i-believe-it dept.

SpiritGod21 writes in with a NYTimes article on a new approach to spam detection that claims out-of-the-box improvement of 1 or 2 orders of magnitude over existing approaches. The article wanders off into human-interest territory as the inventor, Steven T. Kirsch, has an incurable disease and an engineer's approach to fighting it. But a description of the anti-spam tech, based on the reputation of the receiver and not the sender, is worth a read.

Ummmm.... by rustalot42684 · 2007-12-03 15:36 · Score: 3, Insightful

I read part of TFA, and it seems to be saying that you can id spam mails because they are being sent to a person who gets lots of spam. But that still doesn't take into account the fact that that person also receives legit mail, AND the fact that what is spam to one person isn't spam to another.

Also, this seems like a bit of a slashvertisement for what is as yet an unproven technology - the only benchmarks we have are the ones they provide.
1. Re:Ummmm.... by Mundocani · 2007-12-03 18:04 · Score: 4, Insightful
  
  The main problem I can see is that even if this system works it is easily circumvented. The big assumption is that you can identify the recipients of a particular message, but spammers can easily ensure that information isn't easily obtained.
  
  First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically).
  
  Then they avoid batching recipients based on their domain so the SMTP server can't tell who else is receiving the message.
  
  The only way to derive the recipients now is to compare all messages against all others in order to match them up. So they hash every message and combine those with identical hashes.
  
  But putting a little unique text in each message during transmission foils that.
  
  Spammers: 1 New weapons: 0
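
To make the hashing point concrete, here is a minimal Python sketch. It assumes the filter groups messages by an exact SHA-256 hash of the body, which is only one way such matching could be done and is not taken from the article. The addresses and message text are made up for illustration.

```python
import hashlib
from collections import defaultdict


def group_by_body_hash(messages):
    """Group (recipient, body) pairs by the SHA-256 digest of the body."""
    groups = defaultdict(list)
    for recipient, body in messages:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(recipient)
    return dict(groups)


# Hypothetical recipients and message text.
recipients = ["a@example.invalid", "b@example.invalid", "c@example.invalid"]

# Identical bodies collapse into one group, so the recipient list is recoverable.
identical = [(r, "Cheap pills here") for r in recipients]
identical_groups = group_by_body_hash(identical)

# A unique token per copy gives every message its own hash, so nothing matches.
salted = [(r, f"Cheap pills here\nref-{i}") for i, r in enumerate(recipients)]
salted_groups = group_by_body_hash(salted)

print(len(identical_groups), len(salted_groups))
```

With identical bodies there is a single group holding all three recipients; with a per-copy token there are three groups of one, which is the circumvention described above. Fuzzy or locality-sensitive hashing would be the obvious countermeasure, but that is outside what the comment discusses.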
Yet another wrong answer... by damn_registrars · 2007-12-03 15:41 · Score: 5, Insightful

At least once a week there seems to be another flashy technique to filter or block spam. Great.

Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.

If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.

Once we can stop spam from being profitable, we will finally see it go away. But no sooner.

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
1. Re:Yet another wrong answer... by ender- · 2007-12-03 15:51 · Score: 5, Insightful
  
  If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.
  
  Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
  
  But why would the anti-spam software companies want that? If they succeed in actually eliminating spam, they'd also go out of business. It may be profitable for the spammers, but I suspect it's even more profitable for the anti-spam companies.
  
  --
  Nothing to see here
2. Re:Yet another wrong answer... by ucblockhead · 2007-12-03 15:53 · Score: 3, Insightful
  
  Yes, and once we can stop drugs from being profitable, we will see them go away too.
  
  Oh, and prostitution, too. And identity theft. And insurance fraud. Yup, it's simple to fix. Just make it unprofitable! Simplicity itself!
  
  --
  The cake is a pie
3. Re:Yet another wrong answer... by pclminion · 2007-12-03 15:58 · Score: 4, Insightful
  
  At least once a week there seems to be another flashy technique to filter or block spam. Great.
  
  It's not "flashy." It's called information theory and statistics. It is an extremely powerful concept that has far more important potential uses than simply filtering spam email. Every new advancement in automated classification and knowledge extraction is VITALLY IMPORTANT to our ability to cope in a world which has suddenly been flooded with SO MUCH information. This power tool is being applied to what some might see as a "silly" problem, but the fact remains that spam is a powerful motivation for researchers to push the limits further in the fields of pattern recognition, information processing and natural language processing.
  
  If you're against the advancement of information processing techniques, then... uh, okay, I guess. If you can't see beyond spam, you are terribly short sighted.
4. Re:Yet another wrong answer... by choongiri · 2007-12-03 16:38 · Score: 5, Insightful
  
  No, if you are harvesting email addresses and sending unsolicited commercial messages to them, it is quite simple:
  
  You are a spammer.
5. Re:Yet another wrong answer... by halcyon1234 · 2007-12-03 17:32 · Score: 3, Insightful
  
  how do you propose we remove the economic incentive for spam?
  
  Easy enough. Remove the customers. Set up a spam operation selling drugs. Except instead of sending what's advertised, send arsenic. Once all the customers have died, there won't be anyone left to buy spam-stuff. And, as a bonus, you help the genepool.
  
  --
  UTF-8: There and Back Again
6. Re:Yet another wrong answer... by Kadin2048 · 2007-12-03 18:22 · Score: 3, Insightful
  
  There's all sorts of commercial mail that's not spam. If I order something from you, and you send a reply back confirming my order, that's both commercial and definitely not spam. As is any other reply to an inquiry.
  
  Where it crosses the line and becomes spam is when it's unsolicited. That's the key. Unsolicited commercial email is the very definition of spam, and no amount of hand-waving about opt-outs or the selectivity of the lists is going to change that.
  
  Businesses that have relied on cold-calling via any medium to drum up sales have always been sleazy in my book, but when you do it via email, you're pushing the cost out onto the recipient and onto uninvolved third parties. That's at best unethical, and at worst flat-out theft.
  
  --
  "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Chicken-and-egg problem by sonikbeach · 2007-12-03 15:52 · Score: 3, Insightful

How does one initialize this system? Spam is determined by user reputation, yet user reputation is determined by quantity of spam received. Am I missing something? The logic seems circular.
Generalization of honeypots by CustomDesigned · 2007-12-03 16:41 · Score: 3, Insightful

Honeypots have been a published anti-spam technique for a decade. The idea is to publish bogus mailboxes that are not close to any legit mailbox. Any message with a honeypot as any recipient is spam. 100% accurate. (And I blacklist the IP for a week for good measure.) I use a variation, where any message with 3 or more invalid recipients is spam (blacklist IP). That is a little risky since someone may legitimately be trying various mailboxes manually with a telnet session because they forgot the exact name. This technique gives each recipient a score between 0 and 1 that reflects how close to a honeypot that recipient is, with actual honeypots (100% spam) being 1.0.
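
The two rules described above can be sketched in a few lines of Python. This is only an illustration: the mailbox names are hypothetical, and the `recipient_score` formula (fraction of a mailbox's mail that is spam) is an assumption about how a 0-to-1 "closeness to a honeypot" score might be computed, not something the article specifies.

```python
# Hypothetical addresses; none of these come from the thread.
HONEYPOTS = {"trap@example.invalid"}
VALID_MAILBOXES = {"alice@example.invalid", "bob@example.invalid"}
MAX_INVALID_RECIPIENTS = 3  # threshold mentioned in the comment


def is_spam(recipients):
    """Apply the honeypot rule, then the invalid-recipient rule."""
    # Rule 1: any honeypot among the recipients marks the message as spam.
    if any(r in HONEYPOTS for r in recipients):
        return True
    # Rule 2: three or more recipients that match no real mailbox.
    invalid = [r for r in recipients if r not in VALID_MAILBOXES]
    return len(invalid) >= MAX_INVALID_RECIPIENTS


def recipient_score(spam_count, total_count, is_honeypot=False):
    """Assumed score in [0, 1]: share of received mail that was spam."""
    if is_honeypot:
        return 1.0  # a honeypot receives nothing but spam by definition
    if total_count == 0:
        return 0.0  # no history yet; treating this as 0 is an assumption
    return spam_count / total_count
```

For example, `is_spam(["alice@example.invalid", "trap@example.invalid"])` is true because of the honeypot, while a message with one mistyped address alongside a valid one is let through.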
Re:No by arth1 · 2007-12-03 20:17 · Score: 3, Insightful

Ironically, you are completely wrong also - RTFA again. It isn't at all about senders, it's about recipients.

You didn't RTFA well enough. That it's about recipients is the selling point.
That's a truth with modifications, though. Look at the quote from the web site I put in my parent post to yours, which clearly shows that it's a block based on who the sender has sent an email to. I'll repeat it, in case you missed it:

"Because ratings are based on the most recent 25 emails for each sender, the system reacts instantly to spam attacks, usually within just a few messages."

Yes, it's a recipient based system in that it assigns a score to the sender based on what the recipients of the emails are. But the blocking occurs due to the score of the sender, based on previous emails, not on the recipient of the current email.

Just think -- if blocking were based on the recipient only, it would block either all or no e-mail sent to a single-recipient inbox. It would then only be effective for e-mails with multiple recipients, which doesn't match the claims made.
Again, think, and read the article (and that goes for the moderators too).
Re:You are also totally wrong by arth1 · 2007-12-03 20:30 · Score: 3, Insightful

You have got the system completely BACKWARDS.
Sorry for AC but i've already moderated in this discussion.

(Ah, that explains the completely asshat moderation here, then.)

No, I didn't get it backwards -- RTFA. It's called a recipient verification system, but when you look at their own description on how it operates, you'll find that:

- It looks at the recipients of a message, and based on how much spam each of the recipient accounts gets, assigns a score to the sender.

- This score is accumulated over the last 25 emails.
(The reason for this is rather obvious, if you think about it -- if it based its score on just the last e-mail, if you sent an e-mail to someone who receives a lot of spam, it'd be automatically blocked, and that person would not get any e-mail at all.)

Say a sender sends e-mails to foo@foo.invalid, then bar@bar.invalid, then a bunch more people, and finally baz@baz.invalid. If foo@foo.invalid receives 30% spam, and the overall average is 80%, that means that the e-mail is unlikely to be spam. So a score is saved in a table for the sender. Then it goes to bar@bar.invalid, who also has a low 40% spam rate, and another "good" score is saved for sender. When the sender then after a while sends an email to baz@baz.invalid, who has a spam rate of 95%, the fact that he sent an e-mail to foo and bar earlier will increase the likelihood of his email to baz going through.
Conversely, if foo and bar received more spam than average, an e-mail sent to baz would be scored as more likely to be spam, even if baz received a record low 10% spam.

Yes, in a way, it's receiver based, because it builds the score based on the receivers' ratio of spam to valid e-mails. But the score is applied to the sender, and they state this in clear text on the web site itself. You only have to read past the sales pitch and down to the technical details.
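
The scoring described above can be sketched roughly in Python. This is a guess at the mechanics, not the vendor's implementation: the plain average over the window, the comparison against the overall average, and all function names are assumptions. The spam rates and the 25-message window are the figures used in the example above.

```python
from collections import defaultdict, deque

WINDOW = 25             # most recent e-mails considered per sender
OVERALL_AVERAGE = 0.80  # overall spam rate from the example

# Spam rates per recipient, taken from the example above.
recipient_spam_rate = {
    "foo@foo.invalid": 0.30,
    "bar@bar.invalid": 0.40,
    "baz@baz.invalid": 0.95,
}

# Per-sender history; deque(maxlen=...) silently drops the oldest entry.
history = defaultdict(lambda: deque(maxlen=WINDOW))


def record(sender, recipient):
    """Store the recipient's spam rate against the sender."""
    history[sender].append(recipient_spam_rate[recipient])


def sender_score(sender):
    """Assumed score: mean spam rate of the sender's recent recipients."""
    rates = history[sender]
    if not rates:
        return OVERALL_AVERAGE  # unknown sender; this default is an assumption
    return sum(rates) / len(rates)


def looks_like_spam(sender):
    return sender_score(sender) > OVERALL_AVERAGE


# A sender who mailed foo and bar before baz stays below the average.
for rcpt in ("foo@foo.invalid", "bar@bar.invalid", "baz@baz.invalid"):
    record("good@example.invalid", rcpt)

# A sender who only ever mails the heavily spammed address does not.
record("bulk@example.invalid", "baz@baz.invalid")
```

Here the first sender averages roughly 0.55 and is let through even when writing to baz, while the second scores 0.95 and is flagged. That is the sense in which the score belongs to the sender even though it is built from recipient data.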