Spam Trap Claims 10x-100x Accuracy Gain
SpiritGod21 writes in with a NYTimes article on a new approach to spam detection that claims out-of-the-box improvement of 1 or 2 orders of magnitude over existing approaches. The article wanders off into human-interest territory as the inventor, Steven T. Kirsch, has an incurable disease and an engineer's approach to fighting it. But a description of the anti-spam tech, based on the reputation of the receiver and not the sender, is worth a read.
I read part of TFA, and it seems to be saying that you can id spam mails because they are being sent to a person who gets lots of spam. But that still doesn't take into account the fact that that person also receives legit mail, AND the fact that what is spam to one person isn't spam to another.
Also, this seems like a bit of a slashvertisement for what is as yet an unproven technology - the only benchmarks we have are the ones they provide.
At least once a week there seems to be another flashy technique to filter or block spam. Great.
Except that this ignores the truth behind the spam problem, one that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.
If you want to stop spam, you need to take away the economic incentive. How many spam filtering / blocking programs have we already seen produced in the past 5 years? And yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.
Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Seriously, I don't see how anything working remotely as described can work. First, it guarantees that any OSS mailing list will be flagged as spam, because our emails tend to be on the web and we all receive lots of spam. Then how the hell is someone going to know what percentage of spam I receive (or do they expect everyone to give them access to their inbox)? Even if that were to work, all the spammers would have to do is let the zombies send one email at a time, at which point either they block all my email or they let it all through. Dumb idea or dumb reporting?
Opus: the Swiss army knife of audio codec
How does one initialize this system? Spam is determined by user reputation, yet user reputation is determined by quantity of spam received. Am I missing something? The logic seems circular.
Not much.
Two issues: First, how does the system know that Jane's e-mail is mostly spam? Who tells it? Does it use some other filters to identify the spam in order to determine her spam rate?
Second, how does the system know that the message you received and the message Jane received are the same? Spammers have long been randomizing parts of messages in order to block older spam filters.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
"MightyYar" --> "him gay, try!"
Honeypots have been a published anti-spam technique for a decade. The idea is to publish bogus mailboxes that are not close to any legit mailbox. Any message with a honeypot as any recipient is spam. 100% accurate. (And I blacklist the IP for a week for good measure.) I use a variation, where any message with 3 or more invalid recipients is spam (blacklist IP). That is a little risky, since someone may legitimately be trying various mailboxes manually with a telnet session because they forgot the exact name. The technique in TFA gives each recipient a score between 0 and 1 that reflects how close to a honeypot that recipient is, with actual honeypots (100% spam) being 1.0.
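For anyone who wants to see the honeypot and invalid-recipient rules spelled out, here is a minimal sketch. The addresses, the `Blacklist` class and the `is_spam` function are all made up for illustration; a real setup would hook into the mail server's own recipient checks rather than a hard-coded set.

```python
import time
from dataclasses import dataclass, field

# Hypothetical mailboxes, for illustration only.
HONEYPOTS = {"trap1@example.invalid", "trap2@example.invalid"}
VALID_MAILBOXES = {"alice@example.invalid", "bob@example.invalid"}

BLACKLIST_SECONDS = 7 * 24 * 3600  # block the sending IP for a week
MAX_INVALID_RECIPIENTS = 3         # 3 or more invalid recipients counts as spam


@dataclass
class Blacklist:
    # Maps a sending IP to the time at which its block expires.
    expires: dict = field(default_factory=dict)

    def block(self, ip, now):
        self.expires[ip] = now + BLACKLIST_SECONDS

    def is_blocked(self, ip, now):
        return self.expires.get(ip, 0) > now


def is_spam(recipients, sender_ip, blacklist, now=None):
    """Return True if the message should be rejected under the two rules."""
    now = time.time() if now is None else now
    if blacklist.is_blocked(sender_ip, now):
        return True
    hit_honeypot = any(r in HONEYPOTS for r in recipients)
    # Count recipients that are neither real mailboxes nor honeypots.
    invalid = sum(
        1 for r in recipients
        if r not in VALID_MAILBOXES and r not in HONEYPOTS
    )
    if hit_honeypot or invalid >= MAX_INVALID_RECIPIENTS:
        blacklist.block(sender_ip, now)
        return True
    return False
```

Note that the week-long block is what makes a single honeypot hit expensive for the sender: every later message from that IP is rejected without looking at its recipients.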
That's the problem I have with this. Spam stopped being truly mass produced years ago. Each spam is now normally sent to each user with a different mix of nonsense. The probability of two different people receiving the same message is virtually zero.
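To make the randomization point concrete, here is a rough sketch of why exact matching between two people's copies of a spam fails. The `randomized_spam` and `fingerprint` helpers are hypothetical, and SHA-256 over the whole body stands in for whatever "same message" check a filter might use; nothing here is taken from the product in TFA.

```python
import hashlib
import random
import string


def randomized_spam(template, rng, filler_words=5):
    """Append a few random nonsense words, the way per-recipient spam does."""
    words = [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        for _ in range(filler_words)
    ]
    return template + "\n" + " ".join(words)


def fingerprint(message):
    """Exact-match fingerprint: any change to the body changes the digest."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


rng = random.Random(0)
template = "Discount software, click here"
copy_for_you = randomized_spam(template, rng)
copy_for_jane = randomized_spam(template, rng)

# Same campaign, but the two copies no longer compare equal.
print(fingerprint(copy_for_you) == fingerprint(copy_for_jane))
```

Fuzzy matching can get around some of this, but that is a different technique from comparing messages directly.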
Over 99 percent spam blocking means fewer than one mistake in every 100 messages processed. That's 10 to 100 times fewer mistakes than any other available systems.
That still means that the best other systems make a mistake on 1 out of every 10 messages, and the worst ones make a mistake on every single message. That's still ridiculous hyperbole.
(Personally, I'll take the system that makes 100% mistakes, and I'll use the Spam folder as my Inbox.)
Now if you said that it has 1/10 to 1/100 the error rate of normal clients (which is what they're actually claiming, I think), THAT would make mathematical sense AND be an achievement. The Slashdot title of the story is just bad no matter how you spin it.
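The arithmetic behind that complaint, as a quick sketch. The function name is made up, and the 1% figure is just the vendor's "fewer than one mistake in every 100 messages" taken at face value.

```python
def implied_competitor_error(claimed_error, improvement_factor):
    """Error rate a competitor must have if ours is improvement_factor times lower."""
    return claimed_error * improvement_factor


claimed_error = 0.01  # "fewer than one mistake in every 100 messages"
for factor in (10, 100):
    rate = implied_competitor_error(claimed_error, factor)
    print(f"{factor}x fewer mistakes implies competitors err on {rate:.0%} of messages")
```

So the claim only holds if the best competitor gets 1 message in 10 wrong and the worst gets every message wrong, which is the hyperbole being pointed out above.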
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
You didn't RTFA well enough. That it's about recipients is the selling point.
That's only partly true, though. Look at the quote from the web site I put in my parent post to yours, which clearly shows that it's a block based on who the sender has sent email to. I'll repeat it, in case you missed it:
"Because ratings are based on the most recent 25 emails for each sender, the system reacts instantly to spam attacks, usually within just a few messages."
Yes, it's a recipient based system in that it assigns a score to the sender based on what the recipients of the emails are. But the blocking occurs due to the score of the sender, based on previous emails, not on the recipient of the current email.
Just think -- if blocking were based on the recipient only, it would block either all or none of the e-mail addressed to a single recipient's inbox. It would then only be effective for e-mails with multiple recipients, which doesn't match the claims made.
Again, think, and read the article (and that goes for the moderators too).
(Ah, that explains the completely asshat moderation here, then.)
No, I didn't get it backwards -- RTFA. It's called a recipient verification system, but when you look at their own description on how it operates, you'll find that:
- It looks at the recipients of a message, and based on how much spam each of the recipient accounts gets, assigns a score to the sender.
- This score is accumulated over the last 25 emails.
(The reason for this is rather obvious, if you think about it -- if it based its score on just the last e-mail, if you sent an e-mail to someone who receives a lot of spam, it'd be automatically blocked, and that person would not get any e-mail at all.)
Say a sender sends a series of e-mails: to foo@foo.invalid, to bar@bar.invalid, to a bunch more people, and finally to baz@baz.invalid. If foo@foo.invalid receives 30% spam, and the overall average is 80%, that means that the e-mail is unlikely to be spam. So a score is saved in a table for the sender. Then it goes to bar@bar.invalid, who also has a low 40% spam rate, and another "good" score is saved for the sender. When the sender then after a while sends an e-mail to baz@baz.invalid, who has a spam rate of 95%, the fact that he sent e-mail to foo and bar earlier will increase the likelihood of his e-mail to baz going through.
Conversely, if foo and bar received more spam than average, an e-mail sent to baz would be scored as more likely to be spam, even if baz received a record low 10% spam.
Yes, in a way, it's receiver based, because it builds the score based on the receivers' ratio of spam to valid e-mails. But the score is applied to the sender, and they state this in clear text on the web site itself. You only have to read past the sales pitch and down to the technical details.
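Here is how the scoring described above might look in code. This is only a sketch under assumptions: the web site says ratings use the most recent 25 emails per sender, but how the per-recipient spam rates are combined is not stated, so the plain average and the "above the overall average means spam" cutoff are guesses, and `SenderReputation` is a made-up name.

```python
from collections import defaultdict, deque

WINDOW = 25          # "most recent 25 emails for each sender"
AVERAGE_SPAM = 0.80  # overall average spam rate from the example above


class SenderReputation:
    def __init__(self, recipient_spam_rate):
        # Maps a recipient address to the fraction of its mail that is spam.
        self.recipient_spam_rate = recipient_spam_rate
        # Per-sender sliding window of recipient spam rates.
        self.history = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, sender, recipient):
        """Log one e-mail: store the spam rate of the mailbox it went to."""
        rate = self.recipient_spam_rate.get(recipient, AVERAGE_SPAM)
        self.history[sender].append(rate)

    def score(self, sender):
        """Average spam rate of the sender's recent recipients (assumed formula)."""
        rates = self.history.get(sender)
        if not rates:
            return AVERAGE_SPAM  # unknown sender: assume the overall average
        return sum(rates) / len(rates)

    def looks_like_spam(self, sender):
        return self.score(sender) > AVERAGE_SPAM


rates = {"foo@foo.invalid": 0.30, "bar@bar.invalid": 0.40, "baz@baz.invalid": 0.95}
rep = SenderReputation(rates)
for recipient in ("foo@foo.invalid", "bar@bar.invalid", "baz@baz.invalid"):
    rep.record("sender@example.invalid", recipient)
print(rep.looks_like_spam("sender@example.invalid"))
```

With foo at 30%, bar at 40% and baz at 95%, the sender's average works out to roughly 0.55, below the 0.80 overall average, so the e-mail to baz goes through on the strength of the earlier ones.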
Right back atcha:
courseofhumanevents -> "Must Fence A Nervous Ho"
Linux is not gay, homosexuals are gay.
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
> My point is, the "powers" that be, in the particular case, are likely incompetent - incapable of successfully pulling off such a conspiracy.
They're the ones creating the successful antispam systems -- you know, the ones that actually scale up on the gateway. The popular vision of bumbling PHB buffoons everywhere is just another stupid slashdot stereotype, fostered by insecure social retards who have to foist their apparent superiority over everyone by scoffing at everything. Sure, they exist, but long-term successful tech companies generally have -- get ready for it -- smart people working for them.
Anyway, the antispam companies don't have the leverage to pull off an end to spam. Symantec and Cloudmark and Ironport and so forth could stand up and scream and rant and rave at ISPs and yell about the need to secure email infrastructure, to block outbound port 25 from residential ranges, to deploy SPF, or hell, just to stop bouncing (I'm looking at you, Barracuda), but as long as the ISPs run their ranges as open sewers, and just slap in a few boxes to stop everyone else's spam, the spam problem will continue. And they don't like having vendors telling them how to run their business. The people with the power to stop the spam problem, who won't, are not the antispam vendors; they're the ISPs sending spam. So perhaps I was too harsh in my assessment of the PHB problem -- they certainly do seem to be the norm at ISPs (with notable exceptions like AOL and parts of Roadrunner).
Done with slashdot, done with nerds, getting a life.