Spam Trap Claims 10x-100x Accuracy Gain

Ummmm.... by rustalot42684 · 2007-12-03 15:36 · Score: 3, Insightful

I read part of TFA, and it seems to be saying that you can id spam mails because they are being sent to a person who gets lots of spam. But that still doesn't take into account the fact that that person also receives legit mail, AND the fact that what is spam to one person isn't spam to another.

Also, seems like a bit of a slashvertisement for what is as yet an unproven technology - the only benchmarks we have are ones they provide.

Re:Ummmm.... by Mundocani · 2007-12-03 18:04 · Score: 4, Insightful

The main problem I can see is that even if this system works it is easily circumvented. The big assumption is that you can identify the recipients of a particular message, but spammers can easily ensure that information isn't easily obtained.

First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically).

Then they avoid batching recipients based on their domain so the SMTP server can't tell who else is receiving the message.

The only way to derive the recipients now is to compare all messages against all others in order to match them up. So they hash every message and combine those with identical hashes.

But putting a little unique text in each message during transmission foils that.
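The hash-matching step and the unique-text countermeasure described above can be illustrated with a minimal sketch. It assumes the filter hashes the whole message body with SHA-256; the article does not say how (or whether) the real system matches messages, and the function name and addresses here are hypothetical.

```python
import hashlib
from collections import defaultdict

def group_by_body_hash(deliveries):
    """Map the SHA-256 of each message body to the set of recipients that got it."""
    groups = defaultdict(set)
    for recipient, body in deliveries:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].add(recipient)
    return groups

# Identical bodies collapse into one group, so the recipient list is recovered.
bulk = [("a@example.invalid", "Buy now"), ("b@example.invalid", "Buy now")]
same = group_by_body_hash(bulk)

# A per-recipient token makes every body unique, so no two hashes match.
salted = [(rcpt, f"{body} [{i}]") for i, (rcpt, body) in enumerate(bulk)]
unique = group_by_body_hash(salted)
```

Here `same` holds one group with both recipients, while `unique` holds two groups of one recipient each, which is the circumvention the comment describes.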

Spammers: 1 New weapons: 0
Re:Ummmm.... by doom · 2007-12-03 19:16 · Score: 3, Interesting

First they can ensure that the message itself doesn't contain any recipient info (a big bcc basically).

How exactly is a message supposed to get somewhere if it doesn't have the recipient info? I think you're confusing what you see in your mail box with what the mail servers see.
In any case, as is typical the news article doesn't really provide enough information to determine how the system actually works. It does sound like it's working on the premise that since spam is done in "bulk", if you see lots of identical messages going through a server you can assume that that's spam. The obvious problem would be that spammers can include randomly generated content.
But that problem is so obvious, it seems likely to me that I don't understand the system they have in mind.

Yet another wrong answer... by damn_registrars · 2007-12-03 15:41 · Score: 5, Insightful

At least once a week there seems to be another flashy technique to filter or block spam. Great.

Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.

If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.

Once we can stop spam from being profitable, we will finally see it go away. But no sooner.

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.

Re:Yet another wrong answer... by ender- · 2007-12-03 15:51 · Score: 5, Insightful

If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.

Once we can stop spam from being profitable, we will finally see it go away. But no sooner.

But why would the anti-spam software companies want that? If they succeed in actually eliminating spam, they'd also go out of business. It may be profitable for the spammers, but I suspect it's even more profitable for the anti-spam companies.

--
Nothing to see here
Re:Yet another wrong answer... by ucblockhead · 2007-12-03 15:53 · Score: 3, Insightful

Yes, and once we can stop drugs from being profitable, we will see them go away too.

Oh, and prostitution, too. And identity theft. And insurance fraud. Yup, it's simple to fix. Just make it unprofitable! Simplicity itself!

--
The cake is a pie
Re:Yet another wrong answer... by pclminion · 2007-12-03 15:58 · Score: 4, Insightful

At least once a week there seems to be another flashy technique to filter or block spam. Great.

It's not "flashy." It's called information theory and statistics. It is an extremely powerful concept that has far more important potential uses than simply filtering spam email. Every new advancement in automated classification and knowledge extraction is VITALLY IMPORTANT to our ability to cope in a world which has suddenly been flooded with SO MUCH information. This power tool is being applied to what some might see as a "silly" problem, but the fact remains that spam is a powerful motivation to researchers to push further limits in the fields of pattern recognition, information and natural language processing.

If you're against the advancement of information processing techniques, then... uh, okay, I guess. If you can't see beyond spam, you are terribly short sighted.
Re:Yet another wrong answer... by wizardforce · 2007-12-03 16:04 · Score: 3, Informative

How do you propose we remove the economic incentive for spam? OK, let's see how this has been attempted or hypothesized in the past: charge a fee per email rather than a blanket fee from the ISP for access. OK, but most of the real spam that is being sent is done through compromised PCs, so attacking the problem by charging a fee per email is useless because the people in control of this spam-net are not the ones paying for bandwidth/email fees. OK, then pass laws against it. That doesn't work either; the remaining spam-nets will still work because it cannot be enforced in the host country, let alone all those who are not subject to the law. OK, then build better spam traps. Tried that, it isn't doing so well: spam is still getting through in large numbers. Educate people? That will certainly make things better in a lot of ways but there will still be that twat that actually wants to get spam... Have ISPs cut off high bandwidth connections from those suspected of spamming? Can anyone say privacy nightmare? As much as I hate spam I hate the idea of ISPs snooping through your email no matter what their reasons are. Now what?

--
Sigs are too short to say anything truly profound so read the above post instead.
Re:Yet another wrong answer... by MightyYar · 2007-12-03 16:14 · Score: 3, Funny

As much as I'd like to forget it, I think your post made me realize that some spam is actually filling a market need. Ugh. Yay, capitalism!

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:Yet another wrong answer... by choongiri · 2007-12-03 16:38 · Score: 5, Insightful

No, if you are harvesting email addresses and sending unsolicited commercial messages to them, it is quite simple:

You are a spammer.
Re:Yet another wrong answer... by Jimmy_B · 2007-12-03 16:44 · Score: 5, Interesting

Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.
Not exactly. It's making money for the spammer, but it probably isn't making money for the person who hired him. You see, even if no one ever bought anything advertised in spam, it would still be sent. The problem is multilevel marketing, which creates a lot of people desperate to sell unsellable inventory, some of whom pay spammers to advertise it for them. A perceived economic incentive is enough, even if there isn't a real one.
Re:Yet another wrong answer... by penix1 · 2007-12-03 17:01 · Score: 3, Interesting

...and get very few opt-outs and many reactions.

I can imagine the reactions you get...

There are two reasons for this. First, nobody is receiving your emails because you are blocked nine ways to hell in their spam filters. Second, because most spam (yours included) uses the opt-out crap for email verification of their lists. They know they have a live one so most sane people ignore opt-out links in email since they are dangerous.

What needs to be changed *IS* the opt-out crap. It needs to be confirmed-opt-in plain and simple. While they are at it, I wouldn't say no to outlawing email harvesting either. Throw in a $10,000.00 fine for each violation of either provision and call it pretty. Make half the fine go to the organization that hunts down violators and we got a sound business solution.

--
This is a sig. This is only a sig. Had this been an actual sig you would have been informed where to tune for more sigs.
Re:Yet another wrong answer... by halcyon1234 · 2007-12-03 17:32 · Score: 3, Insightful

how do you propose we remove the economic incentive for spam?

Easy enough. Remove the customers. Set up a spam operation selling drugs. Except instead of sending what's advertised, send arsenic. Once all the customers have died, there won't be anyone left to buy spam-stuff. And, as a bonus, you help the genepool.

--
UTF-8: There and Back Again
Re:Yet another wrong answer... by Kadin2048 · 2007-12-03 18:22 · Score: 3, Insightful

There's all sorts of commercial mail that's not spam. If I order something from you, and you send a reply back confirming my order, that's both commercial and definitely not spam. As is any other reply to an inquiry.

Where it crosses the line and becomes spam is when it's unsolicited. That's the key. Unsolicited commercial email is the very definition of spam, and no amount of hand-waving about opt-outs or the selectivity of the lists is going to change that.

Businesses that have relied on cold-calling via any medium to drum up sales have always been sleazy in my book, but when you do it via email, you're pushing the cost out onto the recipient and onto uninvolved third parties. That's at best unethical, and at worst flat-out theft.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Makes sense by Dan+East · 2007-12-03 15:45 · Score: 4, Informative

I own a number of domains, and receive all email to each domain in a catch-all account. I receive a great deal of emails to totally fictitious email accounts at my domains. Those recipients receive 0% legitimate emails, so anything sending to those accounts is 100% certainly a spammer. Basically what Abaca is doing is working with all the shades of gray in between. Also, this is a system that can only be employed at the server level. It's not like you could add this technology to your stand alone email client.

Dan East

--
Better known as 318230.

Re:x100 improvement in accuracy? by Dan+East · 2007-12-03 15:47 · Score: 3, Informative

Misquoted by the Slashdot story as usual. FTA:
Over 99 percent spam blocking means fewer than one mistake in every 100 messages processed. That's 10 to 100 times fewer mistakes than any other available systems.

Dan East

--
Better known as 318230.

The solution to spam by Anonymous Coward · 2007-12-03 15:49 · Score: 3, Funny

1) Issue a Fatwah that spam is an insult to Islam.
2) Behead those who insult Islam!
3) No more spam. Allah Akbar

Re:KInda flawed by pclminion · 2007-12-03 15:51 · Score: 4, Informative

So, if I understood the article correctly, this technology will classify more email as spam the more spam you have received.

No, that's not how it works at all. Let me try putting it as a concrete example. You have a friend, Jane, who likes to swap stupid chain emails, subscribes to all kinds of "voluntary spam," and generally receives 1000 spam mails a day. Jane's a great lady, don't get me wrong, but you know the type of person I mean. You talk to her in real life, but over email she is incredibly annoying, as most of her messages are essentially meaningless.

Now, let's say that BOTH YOU AND JANE receive the same message M. Now, you know Jane, and you know the kind of messages she typically receives (mindless, at least in YOUR eyes). What are the chances that this message M is something that YOU will be interested in? Probably very low. The vast majority of email Jane receives is "crap," at least according to your definition, and so the very fact that Jane received message M greatly increases the likelihood that it is "crap."

Does that make better sense?

Chicken-and-egg problem by sonikbeach · 2007-12-03 15:52 · Score: 3, Insightful

How does one initialize this system? Spam is determined by user reputation, yet user reputation is determined by quantity of spam received. Am I missing something? The logic seems circular.

Form letter by Anonymous Coward · 2007-12-03 15:54 · Score: 5, Funny

My first attempt at doing this, please feel free to amend/critique:

Your post advocates a
(X) technical ( ) legislative ( ) market-based ( ) vigilante

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
(X) Mailing lists and other legitimate email uses would be affected
(X) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(X) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
(X) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

Specifically, your plan fails to account for

( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
(X) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(X) Armies of worm riddled broadband-connected Windows boxes
(X) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
(X) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook

and the following philosophical objections may also apply:

( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
(X) Blacklists suck
(X) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(X) Countermeasures must work if phased in gradually
( ) Sending email should be free
(X) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough

Furthermore, this is what I think about you:

(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!

Re:Is linux for homos? by MightyYar · 2007-12-03 16:10 · Score: 3, Funny

Oooo! Can I play?

"Anonymous Coward" --> A Condom Warns You

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.

Generalization of honeypots by CustomDesigned · 2007-12-03 16:41 · Score: 3, Insightful

Honeypots have been a published anti-spam technique for a decade. The idea is to publish bogus mailboxes that are not close to any legit mailbox. Any message with a honeypot as any recipient is spam. 100% accurate. (And I blacklist the IP for a week for good measure.) I use a variation, where any message with 3 or more invalid recipients is spam (blacklist IP). That is a little risky since someone may legitimately be trying various mailboxes manually with a telnet session because they forgot the exact name. This technique gives each recipient a score between 0 and 1 that reflects how close to a honeypot that recipient is, with actual honeypots (100% spam) being 1.0.
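The two rules in this comment (any honeypot recipient means spam; three or more invalid recipients means spam) and the graded 0-to-1 recipient score can be sketched roughly as follows. The mailbox names are made up, and treating the recipient score as the plain fraction of spam that mailbox receives is an assumption, since how the real score is computed is not described here.

```python
# Hypothetical mailboxes, for illustration only.
HONEYPOTS = {"trap1@example.invalid"}
VALID = {"alice@example.invalid", "bob@example.invalid"}
MAX_INVALID = 3  # "3 or more invalid recipients is spam"

def is_spam(recipients):
    """Apply the honeypot rule, then the invalid-recipient-count rule."""
    if any(r in HONEYPOTS for r in recipients):
        return True
    invalid = sum(1 for r in recipients if r not in VALID)
    return invalid >= MAX_INVALID

def recipient_score(spam_received, total_received, is_honeypot=False):
    """Assumed score: fraction of spam in the mailbox; honeypots are pinned at 1.0."""
    if is_honeypot:
        return 1.0
    if total_received == 0:
        return 0.0  # assumption: no evidence yet, treat as clean
    return spam_received / total_received
```

Under these assumptions a honeypot is simply the extreme end of the scale, and an ordinary mailbox that receives three spam messages out of four scores 0.75.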

Re:No by arth1 · 2007-12-03 19:12 · Score: 3, Interesting

No, that's not what they're saying at all. RTFA, please, cause you're describing something completely different. (And moderators too, please at least skim TFA before moderating, because modding this "Informative" is bollocks.)

This is a system where they look at the history of who a person has sent e-mail to. If the sender has a short term history of sending e-mail to people who mostly receive spam, the e-mail is considered more likely to be spam. Conversely, if the sender has a short term history of sending email to people who don't receive much spam, the email is considered unlikely to be spam.
It's not about your inbox and its percentages, it's about the ratio of the inboxes the sender has previously sent to.

"Because ratings are based on the most recent 25 emails for each sender, the system reacts instantly to spam attacks, usually within just a few messages."

The system has one big flaw, though -- it only works with static senders. A spammer who changes the envelope from address won't get caught, and might even by luck pick a forged sender address that has a positive latest-25-score.
So the solution for the spammers to defeat this system is to send the spams multiple times to the same recipients, but with different senders. This will increase the overall spam, which I don't see as a good service.
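The sender-history reading given in this comment can be sketched as a rolling window per sender. The window of 25 comes from the quoted text; everything else is an assumption made for illustration, in particular the class name, the use of a plain average to combine recipient spam rates, and the default rate for unknown recipients. The vendor has not disclosed how the inputs are actually combined.

```python
from collections import defaultdict, deque

WINDOW = 25  # "the most recent 25 emails for each sender"

class SenderHistory:
    """Hypothetical rolling score: mean spam rate of a sender's recent recipients."""

    def __init__(self, recipient_spam_rate, default_rate=0.5):
        self.recipient_spam_rate = recipient_spam_rate  # recipient -> fraction of spam
        self.default_rate = default_rate                # assumed rate for unknown recipients
        self.recent = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, sender, recipient):
        rate = self.recipient_spam_rate.get(recipient, self.default_rate)
        self.recent[sender].append(rate)  # the deque drops entries older than WINDOW

    def score(self, sender):
        window = self.recent.get(sender)
        if not window:
            return None  # unseen sender: no history to judge
        return sum(window) / len(window)

rates = {"trap@example.invalid": 1.0, "alice@example.invalid": 0.1}
history = SenderHistory(rates)
for _ in range(30):
    history.record("bulk@example.invalid", "trap@example.invalid")
```

After the loop, `history.score("bulk@example.invalid")` is 1.0 and only 25 entries are retained, while a freshly forged sender address returns `None`, which is the weakness the comment points out.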

The inventor responds... by propelCEO · 2007-12-03 20:11 · Score: 5, Informative

Thank you for all the comments on the NY Times article.

It would be difficult for me to answer each and every comment, so I'll try to just hit the high points here.

It's quite easy to poke fun at an algorithm which is unknown to you as demonstrated by all the comments.

But what's more relevant is whether really smart people who know the algorithm can find fault with it. There are only two people outside of Abaca who know the algorithm: Stephen Wolfram (author of Mathematica) and University of Waterloo Professor Gordon V. Cormack (a well known figure in the anti-spam community). I picked Wolfram because he's the smartest pure math guy I know. I picked Cormack because I think he is one of the smartest and most respected scientists in the spam field. You could contact either of them and ask them what they think of the approach. I can tell you what they'd say if you did that. They'd tell you it is a simple, elegant algorithm that has no obvious (to them) holes. I know that because the reason I disclosed it to them was to see if I overlooked anything. Neither found any holes. That doesn't prove that there aren't holes. All systems have holes. What this does mean is that a couple of pretty respected experts think it appears to be pretty solid logic.

In fact, Gordon was kind enough to go even further and gave me permission to use the following quote: "This is, by far, the most clever technique I'm aware of for spam filtering." Since Gordon is conference chair for a lot of spam conferences, this is a pretty significant endorsement from someone who KNOWS the full algorithm and who knows the spam space better than just about anyone.

I spent about 4 years studying what others had done in the space. As one commenter pointed out, the recipient reputation system can be thought of as a generalization of the honeypot technique that was first patented by Brightmail.

That's exactly right. My realization is that every email address has statistical value, not just honeypots. So instead of just "black" feedback, the system incorporates "grey" and "white" feedback; every recipient has an a priori odds associated with receiving mail. For many years, Brightmail was the "de facto" standard for spam filtering. Brightmail is just a special case of the algorithm I invented. So instead of learning from honeypots, we learn from ALL recipients and incorporate that statistical input in a mathematically rigorous way in order to compute a statistical likelihood that our prediction was correct. That gives us much more input than a honeypot system: it gives us white, black, and grey values. That is critical to avoiding false positives because good sites (like Yahoo and Hotmail) send email to honeypots all the time. And we incorporate that feedback into a statistical framework that is much more accurate than what Brightmail used.

Exactly how we incorporate that input into spam scoring has not been publicly disclosed. It is not obvious.

People who say that this must be snake oil or cannot work ignore the fact that the system has been in use by real customers for more than a year. We have over 100 customers and are just announcing our existence to the world, so that number should increase quite rapidly now that we are starting to market our product. There are customer testimonials on our website. You can contact them directly to verify that these quotes are legitimate.

Here are statistics from one of our rating servers. There were 1,380,140 messages since the last counter reset. 96% were rated spam. There were 176 false positives and 66 false negatives reported. I just grabbed those stats from one of our live servers right now as I was composing this message. Sometimes we're better, sometimes we're worse, but those numbers are pretty typical.
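For scale, the error rate implied by those counters can be worked out directly. This is only arithmetic on the figures quoted above; it covers reported mistakes, so it says nothing about errors nobody reported, and the rates are per message processed rather than per legitimate message.

```python
total = 1_380_140       # messages since the last counter reset
false_positives = 176   # reported
false_negatives = 66    # reported

errors = false_positives + false_negatives   # 242 reported mistakes
error_rate = errors / total                  # fraction of all messages processed
messages_per_error = total / errors

print(f"reported errors: {errors}")
print(f"error rate: {error_rate:.4%}")
print(f"about one reported error per {messages_per_error:.0f} messages")
```

That works out to roughly 0.0175% of messages, or about one reported mistake in every 5,700 messages.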

It's not perfect, but I think those are pretty good error rates for where we are now. And the stats always get better as we add more customers since we get more statistical input and this is just a statistical estimation problem. The more data, the more accurate

Re:No by arth1 · 2007-12-03 20:17 · Score: 3, Insightful

Ironically, you are completely wrong also - RTFA again. It isn't at all about senders, it's about recipients.

You didn't RTFA well enough. That it's about recipients is the selling point.
That's a truth with modifications, though. Look at the quote from the web site I put in my parent post to yours, which clearly shows that it's a block based on who the sender has sent an email to. I'll repeat it, in case you missed it:

"Because ratings are based on the most recent 25 emails for each sender, the system reacts instantly to spam attacks, usually within just a few messages."

Yes, it's a recipient based system in that it assigns a score to the sender based on what the recipients of the emails are. But the blocking occurs due to the score of the sender, based on previous emails, not on the recipient of the current email.

Just think -- if it was based on blocking based on recipient only, it would either block all or no e-mail to an inbox with a single recipient. It would then only be effective for e-mails with multiple recipients, which doesn't match the claims made.
Again, think, and read the article (and that goes for the moderators too).

Re:You are also totally wrong by arth1 · 2007-12-03 20:30 · Score: 3, Insightful

You have got the system completely BACKWARDS.
Sorry for AC but i've already moderated in this discussion.

(Ah, that explains the completely asshat moderation here, then.)

No, I didn't get it backwards -- RTFA. It's called a recipient verification system, but when you look at their own description on how it operates, you'll find that:

- It looks at the recipients of a message, and based on how much spam each of the recipient accounts gets, assigns a score to the sender.

- This score is accumulated over the last 25 emails.
(The reason for this is rather obvious, if you think about it -- if it based its score on just the last e-mail, if you sent an e-mail to someone who receives a lot of spam, it'd be automatically blocked, and that person would not get any e-mail at all.)

Say a sender sends three e-mails, to foo@foo.invalid, bar@bar.invalid, a bunch more people, and finally baz@baz.invalid. If foo@foo.invalid receives 30% spam, and the overall average is 80%, that means that the e-mail is unlikely to be spam. So a score is saved in a table for the sender. Then it goes to bar@bar.invalid, who also has a low 40% spam rate, and another "good" score is saved for sender. When the sender then after a while sends an email to baz@baz.invalid, who has a spam rate of 95%, the fact that he sent an e-mail to foo and bar earlier will increase the likelihood of his email to baz going through.
Conversely, if foo and bar received more spam than average, an e-mail sent to baz would be scored as more likely to be spam, even if baz received a record low 10% spam.
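The foo/bar/baz example can be put into numbers with a small sketch. It assumes, purely for illustration, that the sender's history is summarized as the mean spam rate of earlier recipients and compared against the 80% overall average; the function name is hypothetical, and the 90% and 95% rates in the converse case are made-up stand-ins for "more spam than average".

```python
OVERALL_AVERAGE = 0.80  # overall spam rate used in the example above

def history_verdict(previous_recipient_rates, overall_average=OVERALL_AVERAGE):
    """Judge the next email from the sender's earlier recipients only.

    The spam rate of the current recipient (baz) is deliberately not an input.
    """
    mean_rate = sum(previous_recipient_rates) / len(previous_recipient_rates)
    return "likely legitimate" if mean_rate < overall_average else "likely spam"

# foo at 30% and bar at 40%: the mail to baz (95% spam) is still favoured.
good_history = history_verdict([0.30, 0.40])

# Converse case: earlier recipients above average, so the mail to baz is
# penalised even if baz itself only gets 10% spam.
bad_history = history_verdict([0.90, 0.95])
```

With these inputs `good_history` is "likely legitimate" and `bad_history` is "likely spam", regardless of baz's own spam rate.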

Yes, in a way, it's receiver based, because it builds the score based on the receivers' ratio of spam to valid e-mails. But the score is applied to the sender, and they state this in clear text on the web site itself. You only have to read past the sales pitch and down to the technical details.

Re:You are also totally wrong by Garridan · 2007-12-03 21:19 · Score: 5, Funny

No, you are totally wrong. The system measures the ratio of the sender to the spam of the ratio receiver receiver, and establishes a negative false-positive ratio by building a score based on the spam-spam ratio of the sender receiver. By collecting the sum total products of the receiver sender spam ratio dividend, the sales pitch drives the likelihood of three emails through the foobar baz@incompatible.

In summary, I have no idea what I'm talking about because I didn't RTFA. That I am aware of this fact makes me superior to the lot of you who are arguing over the inner workings of this week's spam-filter vaportech -- which was probably written up in an incomprehensible and inconsistent manner such that it will go over the heads of foolish investors, and part them from their money.

Re:You are also totally wrong by Jonathan_S · 2007-12-04 04:58 · Score: 3, Informative

But doesn't this assume that the spam is addressed to multiple recipients? 99% of the spam I get is addressed only to me

I think the confusion here is that you (and many other posters) are trying to evaluate this as a personal anti-spam product.

But it's really designed to be a corporate product. So even if each spam email contains only one recipient they all go through the corporate email server, allowing it to see all the various recipients a given sender is emailing.

And there were even hints that the software stored on your corporate mail server might be sharing some information with a central data store, allowing it to get the score of all the recipients that the sender is sending to on any network that is a customer of this product. (So it doesn't matter so much if your company only has 10 people to average across because it is somehow cross checking against the global dataset which is tens of thousands of recipients.)

Re:Is linux for homos? by myowntrueself · 2007-12-04 07:11 · Score: 3, Informative

Linux is not gay, homosexuals are gay.

Not all homosexuals are happy, cheerful people either.

--
In the free world the media isn't government run; the government is media run.

Slashdot Mirror
