Spam Trap Claims 10x-100x Accuracy Gain
SpiritGod21 writes in with a NYTimes article on a new approach to spam detection that claims out-of-the-box improvement of 1 or 2 orders of magnitude over existing approaches. The article wanders off into human-interest territory as the inventor, Steven T. Kirsch, has an incurable disease and an engineer's approach to fighting it. But a description of the anti-spam tech, based on the reputation of the receiver and not the sender, is worth a read.
I read part of TFA, and it seems to be saying that you can id spam mails because they are being sent to a person who gets lots of spam. But that still doesn't take into account the fact that that person also receives legit mail, AND the fact that what is spam to one person isn't spam to another.
Also, seems like a bit of a slashvertisement for what is as yet an unproven technology - the only benchmarks we have are the ones they provide.
of slashvertisements in the morning. /sarcasm
Was the previous technology less than 1% accurate?
Engineering is the art of compromise.
Does this mean I can't receive new ways to "enlarge my pen15 and please my significant other while keeping my bank info for safeness"?
At least once a week there seems to be another flashy technique to filter or block spam. Great.
Except that this ignores the truth behind the spam problem, that many people don't seem to care about. Spam is, at its root, an economic problem. Spam is sent by people who are making money helping someone sell something. The spam you got this afternoon for discount v!@gra or 0EM software is making money for someone. And as long as someone can still make money off of it, they'll keep doing it.
If you want to stop spam, you need to take away the economic incentive. We've already seen how many spam filtering / blocking programs produced in the past 5 years? But yet the spam problem just keeps growing as the number of "solutions" grows. This tells us that the spammers are more than willing to work on ways to circumvent these reactive techniques, so that they can continue to make money off their deeds.
Once we can stop spam from being profitable, we will finally see it go away. But no sooner.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Is this a self-defeating strategy? It depends on some members of the group receiving a lot of spam. But once they're in they receive less spam.
So, if I understood the article correctly, this technology will classify more email as spam the more spam you have received. Wouldn't this eventually classify everything as spam, forcing you to trawl through catch folders to find all your legit email?
A game has objectives and is competitive, anything else is just play
I own a number of domains, and receive all email to each domain in a catch-all account. I receive a great deal of emails to totally fictitious email accounts at my domains. Those recipients receive 0% legitimate emails, so anything sending to those accounts is 100% certainly a spammer. Basically what Abaca is doing is working with all the shades of gray in between. Also, this is a system that can only be employed at the server level. It's not like you could add this technology to your stand alone email client.
Dan East
Better known as 318230.
So the way I read this is that it works like a reverse karma system. It doesn't really make much sense though. Remember the old adage about lies and statistics. Without seeing their analysis, who knows what they're twisting. I would very much like to see actual data about this system. The idea that a person's amount of spam would fit any sort of predictable distribution seems like a bit of a stretch to me. If anyone with actual numbers could come forth, I think we would all appreciate it. Even if there was a regular distribution of spam for a recipient, it would have a tenuous relationship with any one single element at best. I call snake oil without any hard statistical analysis. The best the article gave was a board meeting style feel good chart with no basis in real statistics, only assumed aggregates.
I got a catholic block.
1) Issue a Fatwah that spam is an insult to Islam.
2) Behead those who insult Islam!
3) No more spam. Allah Akbar
Seriously, I don't see how anything working remotely as described can work. First, it guarantees that any OSS mailing list will be flagged as spam because our emails tend to be on the web and we all receive lots of spam. Then how the hell is someone going to know what percentage of spam I receive (or do they expect everyone to give them access to their inbox?)? Even if that were to work, all the spammers would have to do is let the zombies send one email at a time, at which point either they block all my email or they let it all through. Dumb idea or dumb reporting?
Opus: the Swiss army knife of audio codec
How does one initialize this system? Spam is determined by user reputation, yet user reputation is determined by quantity of spam received. Am I missing something? The logic seems circular.
It totally takes how much legitimate email each individual gets into account. What they are saying is that if 30% of the emails I receive are usually spam, then my personal spam filter should mark about 30% of my email as spam. It should sort my mail based on how spammy it looks and then kill the top 25%, pass through the bottom 65%, and maybe give some extra scrutiny to the middle 10%. It's a pretty interesting idea.
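A rough sketch of that per-user triage, in case it helps. This is only my reading of the comment above, not the vendor's algorithm: the `triage` function, the made-up spamminess scores, and the use of the 25% / 10% / 65% split as fixed fractions are all assumptions for illustration.

```python
def triage(scored, kill_frac=0.25, review_frac=0.10):
    """Split (message, spamminess) pairs into kill / review / pass lists.

    The fractions mirror the example in the comment: kill the spammiest 25%,
    give the next 10% extra scrutiny, and pass the remaining 65%.
    """
    # Spammiest messages first.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    n = len(ranked)
    n_kill = round(n * kill_frac)
    n_review = round(n * review_frac)
    kill = [msg for msg, _ in ranked[:n_kill]]
    review = [msg for msg, _ in ranked[n_kill:n_kill + n_review]]
    passed = [msg for msg, _ in ranked[n_kill + n_review:]]
    return kill, review, passed


# 20 hypothetical messages with invented spamminess scores 0.00 .. 0.95.
inbox = [(f"msg{i}", i / 20) for i in range(20)]
kill, review, passed = triage(inbox)
```

Note that the cut points come from the user's historical spam share, not from a fixed score threshold, which is the part that differs from an ordinary content filter.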
My first attempt at doing this, please feel free to amend/critique:
Your post advocates a
(X) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(X) Mailing lists and other legitimate email uses would be affected
(X) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(X) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
(X) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
(X) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(X) Armies of worm riddled broadband-connected Windows boxes
(X) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
(X) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
(X) Blacklists suck
(X) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(X) Countermeasures must work if phased in gradually
( ) Sending email should be free
(X) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
I've never once had a spam message in my Gmail inbox, it all gets caught by their spam filters and ends up in the appropriate folder. There's 150 in the spam folder right now, and they get deleted automatically after 30 days, so I get around 5 a day. That's probably just the ones google thinks are possibly spam, who knows how much they filter out that we never even see. Their filtering tech is pretty close to perfect, but it's always those last few points that are the hardest. So I seriously doubt this as yet unproven tech that claims such substantial increases in accuracy over traditional filtering. But the article was still interesting to learn more about Kirsch, his prior inventions and work, and battle with terminal blood cancer.
If you build it, nerds will come. Soylentnews.org
Oooo! Can I play?
"Anonymous Coward" --> A Condom Warns You
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Because they're going to be needing an OC-256 or the fucking spammers will be able to ddos the servers that compute aggregate scores off the 'Net and break the system.
This is clever: filtering spam by exploiting properties of spam pumps in general, vs. straight content analysis. The competition of ever-more-sophisticated content scanning techniques on one side, and spammers' escalating workarounds and huge botnets on the other side, is an arms race that shows no sign of abating.
Of course, this approach does still depend on something—probably content analysis—to determine which messages are spam and which are not, so that receivers' spam statistics can be computed.
The smartest (and reportedly most effective) anti-spam technique I know is spamd, which completely sidesteps content analysis. In a nutshell, it's an SMTP proxy that issues a temporary error code to unknown senders; legitimate MTAs retry delivery (at which point spamd lets the message through), while spam pumps don't bother. Voilà—spam gets stopped before it's ever received. A friend of mine reports that spam volume has dropped to zero since he set up spamd for his department.
If I understand the "receiver reputation" approach correctly, it could use spamd (rather than content analysis) to identify spam; similarly, content analysis can supplement spamd. The two are potentially complementary.
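For anyone who hasn't seen greylisting, here is a toy sketch of the retry logic described above. It is not spamd's actual implementation; the `Greylist` class, the (IP, sender, recipient) key, the 300-second delay, and the reply strings are assumptions chosen for illustration.

```python
class Greylist:
    """Toy greylisting check, keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a sender must wait before a retry counts
        self.first_seen = {}         # triplet -> time of the first delivery attempt

    def check(self, ip, sender, recipient, now):
        """Return the SMTP reply for a delivery attempt at time `now` (in seconds)."""
        key = (ip, sender, recipient)
        # Remember the first attempt; later attempts reuse that timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 OK"
        # Unknown (or too-eager) sender: temporary error, a real MTA will retry.
        return "451 Temporary failure, please retry later"


gl = Greylist()
first_try = gl.check("192.0.2.10", "list@example.org", "me@example.com", now=0)
retry = gl.check("192.0.2.10", "list@example.org", "me@example.com", now=900)
```

A legitimate MTA that queues and retries gets through on the second attempt; a spam pump that fires once and moves on never does.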
The Religion of Peace (tm)
is not upset over Teddy Bears, but over Mo'
(really it is just grievance theatre)
Just remember in Sudan:
Raping a killing thousands, A-OK
Naming a teddy bear Mohammad, death/flogging/prison
Just so we clear that up.
Your post advocates a
(x) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
(x) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
(x) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(x) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
(x) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
(x) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
(x) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
(x) Good original thinking for a change
Charge money to send emails. That idea has been discussed before, I know, but there is a twist to make it work - make it so that the recipient is the one who gets paid. After all, it's their time the spammers are wasting so they should be fairly compensated. This would cause serious problems for people who run listservs, so this would have to be combined with user customizable white-lists. In the ideal case, each recipient can even name their own price, have a white list, and retroactively forgive debt. For most users the charges will roughly balance out and/or they'll have the people who send them the most email on their white list. The ISP and money shuffler makes money by charging the owners of the account a fixed fee for providing this premium spam-free service.
Then, of course, you get the problem of spammers trying to weasel their way into as many white-lists as possible, but it is easy to kick them off the white list and the spammers would be subject to criminal prosecution if they are hacking or otherwise resorting to dirty means to get themselves on white lists.
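To make the bookkeeping concrete, a minimal sketch of the recipient-gets-paid scheme as described above. Everything here (the `Mailbox` class, prices in cents, the method names) is hypothetical; it only shows the three pieces mentioned in the post: a per-recipient price, a white list, and retroactive forgiveness.

```python
class Mailbox:
    """Toy account for a pay-the-recipient email scheme."""

    def __init__(self, owner, price_cents=5):
        self.owner = owner
        self.price_cents = price_cents   # each recipient names their own price
        self.whitelist = set()           # senders who never pay
        self.balance_cents = 0
        self.paid_by = {}                # sender name -> total cents paid to this mailbox

    def deliver_from(self, sender):
        """Charge `sender` (another Mailbox) for one message, unless whitelisted."""
        if sender.owner in self.whitelist:
            return 0
        fee = self.price_cents
        sender.balance_cents -= fee
        self.balance_cents += fee
        self.paid_by[sender.owner] = self.paid_by.get(sender.owner, 0) + fee
        return fee

    def forgive(self, sender):
        """Retroactively refund everything `sender` has paid this mailbox."""
        refund = self.paid_by.pop(sender.owner, 0)
        self.balance_cents -= refund
        sender.balance_cents += refund
        return refund


alice = Mailbox("alice", price_cents=10)
alice.whitelist.add("listserv")
spammer, listserv, friend = Mailbox("spammer"), Mailbox("listserv"), Mailbox("friend")

alice.deliver_from(spammer)    # spammer pays alice
alice.deliver_from(listserv)   # whitelisted, free
alice.deliver_from(friend)     # charged at first...
alice.forgive(friend)          # ...then forgiven after the fact
```

The sketch ignores the hard parts (who runs the ledger, how senders are authenticated), which is where the checklist replies in this thread usually aim.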
Had one, and only one, false positive that I noticed. So they suck, period.
And the false positive? An actual JOB INQUIRY/RATING notification for an application with the Dept. of Labor. How the fuck you can be so stupid to filter the HR systems of a .gov is beyond me.
Even one false positive can cause significant financial damage to an individual, make gmail of questionable value for even small business, and greatly increase the costs of using their service. I mean what's the point if I have to check the spam folder for legitimate emails every two days?
While this is a rare case of the algorithm actually being original (as opposed to rehashing an old idea "on the web"), it is yet another software patent. I'll lump it with RSA - the kind of software patent you might actually want to read if all software patents were that original.
"MightyYar" --> "him gay, try!"
Honeypots have been a published anti-spam technique for a decade. The idea is to publish bogus mailboxes that are not close to any legit mailbox. Any message with a honeypot as any recipient is spam. 100% accurate. (And I blacklist the IP for a week for good measure.) I use a variation, where any message with 3 or more invalid recipients is spam (blacklist IP). That is a little risky since someone may legitimately be trying various mailboxes manually with a telnet session because they forgot the exact name. This technique gives each recipient a score between 0 and 1 that reflects how close to a honeypot that recipient is, with actual honeypots (100% spam) being 1.0.
That gives you endpoints for your curve. Are there any math geeks reading tonight that can tell me if having just the endpoints would be good enough to extrapolate the middle? Calibrating the middle percentiles seems harder, since you can't control the number of spams you'd receive.
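A small sketch of the two rules described above, under stated assumptions: the addresses are invented, `recipient_score` simply uses spam received divided by total mail received (with honeypots pinned at 1.0), and a recipient with no history is scored 0.0, which is my choice and not anything from the article.

```python
HONEYPOTS = {"trap@example.com"}                   # hypothetical bait address
VALID = {"alice@example.com", "bob@example.com"}   # hypothetical real mailboxes


def recipient_score(addr, spam_seen, total_seen):
    """Score in [0, 1]: how close this recipient is to a honeypot."""
    if addr in HONEYPOTS:
        return 1.0
    total = total_seen.get(addr, 0)
    if total == 0:
        return 0.0   # assumption: no history means no evidence of spam
    return spam_seen.get(addr, 0) / total


def reject_on_recipients(recipients, invalid_limit=3):
    """Reject if any recipient is a honeypot, or 3+ recipients are invalid."""
    if any(r in HONEYPOTS for r in recipients):
        return True
    invalid = [r for r in recipients if r not in VALID]
    return len(invalid) >= invalid_limit


alice_score = recipient_score(
    "alice@example.com",
    spam_seen={"alice@example.com": 1},
    total_seen={"alice@example.com": 4},
)
```

The honeypot rule fixes the 1.0 end of the curve; the middle values depend entirely on how the spam counts were obtained in the first place.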
How hard is it to spoof 1 million IP addresses during a bulk transfer? That would appear to be a way to defeat this system, since the system assumes a particular batch of spam will originate from a single IP address.
From TFA with commentary:
"he has started four companies, all based on his frustrations with existing products or services"
Unless they're all still in business that's probably 3 failures on record.
"Along the way he has amassed a personal fortune of about $230 million"
But he got out before the ship sank and with a bundle of cash too. I wonder what his ex-employees got...
"This is harder on my wife than it is on me," he said during a recent interview. "I just look at it as a problem. Here's a problem and you have four years to solve it or you don't get to solve any more problems."
How philosophical...So he's going to cure himself single handedly of a rare disease in 4 years, because medical research is as easy (and cheap) as writing software or tinkering with a home engineering project. I think he's been watching Crusade and sniffing glue.
"His perspective on his disease is also clear. Fourth on his list is "Why human beings will be extinct in 90 years." He writes, "My incurable blood cancer is minor compared to what is happening with the planet. We have somewhat more than 90 years before humanity is virtually extinct.""
Don't even know where to start on this one. I can't be bothered reading about his reasoning, but he's not the first person to predict the end of the world just beyond his own lifetime.
Oh and by the way he has a bridge, I mean some anti-spam software to sell you.
Gimme a break! Nothing to see here.
These posts express my own personal views, not those of my employer
This approach is quite similar to that taken by the DCC. Quoting from its home page: "The DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001."
As is often the case, those who are new to the spam problem frequently believe they are inventing something new, when it's most likely that they're not -- the remaining question being whether it's workable or long-since abandoned as (mostly) useless. Reputation systems like this are presently somewhat useful, but it's worth noting that should they become widely used, spammers might then find it worth the effort to exercise the control they have over the 100M+ hijacked systems out there and thereby poison the reputation system. While this could be done by generating appropriate traffic, and that'd be moderately disruptive, exerting control over a sufficient number of systems participating in reputation assessment would be worse.
This therefore joins a long parade of specious claims (e.g., "Spam as a technical problem is solved by SPF") made to announce the mythical "solution" to spam, which of course does not exist. Does it have possible value in mitigation? It would appear so, based on the track record of similar work (see above). Is it The Answer? Not even remotely close.
"Spam Trap" Claims 10x-100x Accuracy Gain
A "Spamtrap" is an email account set up only to receive spam mail. That email address is never given to any legitimate user. ;-)
;-)
;-)
The title might give up their secret industrial patented algorithm
So maybe they just set up spamtraps, then publish those email addresses in some honey pot places where spammers scrape email addresses, et voila !
Of course, any emails sent to the spamtraps will be guaranteed to be spam. Now, the Marketing department steps in and says: Let's call this : "The concept of receiver reputation"
By the way, I already block way more than 99.9% of spam using the following, this was a one-time setup with no need for white/black listing maintenance:
-Spam Assassin
-Real time blacklists
-Greeting delays
-Rate control
-Max senders by message and various other sendmail options. You can view the configuration here.
-Priority 1 and Priority 10 mail servers are always down, Priority 5 mailservers are the real ones
-Spam trap addresses
It is so efficient that I didn't have to resort to graylisting yet but I could always use it to achieve even better results. I am not ready for the downsides of graylisting yet.
Since correctly using available open-source tools already gives a better than 99.9% result (1 spam in every 1000 forwarded messages), I am not sure of the relevance of the advertised product.
Everything I write is lies, read between the lines.
From TFA:
Um, wait a minute. Given two hypothetical spam filters, one with 99.8% rejection but a nasty habit of discarding legitimate emails, and another with 95% rejection but effectively zero false positives, I'd rather take the 95% filter, thank you!
Here we go, yet again. The New York Times, of all places, reports nothing but the "spam catch rate". But the false positive rate is a far more important indicator of a spam filter's effectiveness than the "spam catch rate". I'd rather have to delete the occasional spam than miss an important email from a long-lost friend.
Why are people still comfortable talking exclusively about the "spam catch rate"? Are we really that gullible to the marketing drivel of anti-spam companies? Shouldn't we be holding the discourse to a higher standard?
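To put numbers on the trade-off, a quick sketch. The 99.8% and 95% catch rates are from the comment above; the daily volume (900 spam, 100 legitimate) and the 1% false positive rate for the aggressive filter are invented purely for illustration.

```python
def daily_outcome(spam, legit, catch_rate, false_positive_rate):
    """Return (spam delivered to the inbox, legitimate mail wrongly discarded)."""
    spam_delivered = spam * (1 - catch_rate)
    legit_lost = legit * false_positive_rate
    return spam_delivered, legit_lost


# Hypothetical day: 900 spam and 100 legitimate messages.
aggressive = daily_outcome(900, 100, catch_rate=0.998, false_positive_rate=0.01)
cautious = daily_outcome(900, 100, catch_rate=0.95, false_positive_rate=0.0)
```

Under these assumed numbers the aggressive filter lets through about 2 spams a day but loses about 1 real message a day, while the cautious one lets through about 45 spams and loses nothing. Which is "better" depends on what a lost message costs you, which is exactly what a bare catch rate hides.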
As I understand it, this method looks at a message and analyzes it based on the users to whom it has been sent. What is not clear to me is how the system would cope with individually customized spams.
Spammers already have systems in place to randomly mutate the spam messages, to defeat systems that block spam based on identity. For example, consider Vipul's Razor, where people cooperate to flag messages as spam. Suppose a spammer sends a message with the subject "Panda Obligate Greenspan" to Joe, and Joe dutifully flags it as spam. But that same spammer sent another spam to Mary with the subject "Goldfish Dutiful Jones".
This new spam trap uses a clever technique, and I believe that if the same message is spammed out to many people, this trap could detect it. But I think that with enough randomness in the spam messages, this won't be able to stop the spam.
Imagine that a spammer has a botnet at his disposal, and the botnet has thousands of servers. He could send a single random spam from each of his servers to each of the users on an email server; each message thus has different gibberish in it, and a different sender.
You could block a bunch of spam by blocking pure gibberish, if you had a reliable gibberish detector. But then the spammers start pulling complete sentences out of any available source texts (Mark Twain novels, news stories, etc.). So I think any content-based spam filtering is also ultimately doomed.
I think the only possible solution to spam will be to create a whitelist system that doesn't suck. Any attempt to guess whether a message is spam can guess wrong. (As the article notes, even humans make errors when classifying messages.) I want digital signatures; then, if I get an email that is correctly signed with my wife's signature, I'm pretty sure that's not spam. But a whitelist system is doomed unless there is an escape mechanism; if my old friend from college suddenly sends me an email message, I want to get it, even if he's not in my whitelist. It's not a trivial problem.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
But doesn't the fact that *I* received the message equally indicate that it's *not* spam? I don't understand. Jane getting the message indicates that it's spam, me getting it indicates that it's not.
"If you're not passionate about your operating system, you're married to the wrong one."
"Road Akim" --> "I am a dork"?!
Getting a 99% accuracy is still almost useless. To be useful you need four nines at least.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Something I didn't totally see in that is the following scenario.
.. let's just say it's NANOG or a similar list .. there might be a large number of subscribers to the list who also have "bad karma" according to the system.
I've had an E-Mail address for.. well, we'll just say "forever" that's so old it was used to post on USENET before using a "real" E-Mail address was a problem. Additionally, it's also been used on some domain registrations, and in general seems to wind up on quite a few spam lists.
Using current filtering, somewhere around 80% of all E-mails this account gets are spam.
On the other hand, I'm also on a number of popular mailing lists with that E-mail address. One of these lists gets a good number of messages a day.
How does the system detect which mail is "good" and which is "bad" solely looking at my reputation? I'd gather based on the nature of the mailing list
How does it prevent a false positive?
Yuor psot acvotaeds a
(X) tehnccial ( ) lavsilegtie ( ) mkreat-based ( ) vgntiiale
apprcoah to fgthiing spam. Yuor ieda will not wrok. Here is why it won't work. (One or mroe of the flnwoilog may aplpy to your ptaruicalr idea, and it may have otehr flwas wihch used to vray form satte to satte bfoere a bad freeadl law was passed.)
( ) Smpreams can eislay use it to hrsevat eiaml aerdessds
(X) Milnaig ltiss and other laeititgme eaiml uess wulod be acteffed
(X) No one will be albe to fnid the guy or colelct the meony
( ) It is delfesesnes agasnit bture frcoe atctaks
(X) It will sotp sapm for two weeks and then we'll be sutck wtih it
( ) Usres of eimal will not put up with it
( ) Mofisrcot wlil not put up with it
( ) The pciole will not put up with it
( ) Rqriuees too much ctoaprooien form srepmmas
( ) Rruiqees idietmmae tatol coiarooeptn from eeovydbry at ocne
(X) Many eamil urses cnonat aofrfd to lsoe bsisneus or ataleine ptntieaol eermlypos
( ) Smrmaeps don't care aubot ivainld aedsesdrs in thier litss
( ) Anynoe culod anmnylosuoy dsreoty aynnoe esle's caerer or bussneis
Spciicalfely, your plan fials to acocnut for
( ) Lwas erplsxsey piroibihntg it
( ) Lack of cantlelry cnlonoilrtg artohtuiy for eamil
( ) Oepn ryelas in fiorgen ceiuotnrs
( ) Ease of senarcihg tiny ahirmelapunc asedrds sapce of all eimal adsesders
(X) Assaths
( ) Jctrdisiuoainl plmreobs
( ) Uiaortnluppy of wierd new taexs
( ) Pibulc rlunaeccte to aepcct wreid new forms of money
( ) Huge esitnixg sfwroate isvtneenmt in SMTP
( ) Stsicetuilipby of potcorlos ohter tahn STMP to actatk
( ) Wlsielnings of users to iltansl OS pehctas rceeived by eamil
(X) Aermis of wrom rdelidd bobaardnd-cnocented Wdiwnos bxoes
(X) Etanrel arms race ilvnveod in all fltrineig aprhpocaes
( ) Eetrmxe ptitairfloiby of spam
( ) Joe jobs and/or ientitdy thfet
( ) Tahllcicney ittillaere picantloiis
(X) Emxerte sudtiipty on the prat of ppolee who do beiunsss wtih seprmams
( ) Dossnhetiy on the prat of spmermas teleevhmss
( ) Binatwddh ctoss taht are unafecfted by cilent frliitneg
( ) Otooulk
and the foilownlg pspcoiahoilhl obntoecijs may also alppy:
( ) Ideas samilir to yrous are easy to cmoe up wtih, yet nnoe hvae eevr been sowhn paarticcl
( ) Any schmee bsaed on opt-out is uancbalcptee
( ) STMP hredeas souhld not be the seubjct of lilaigteson
(X) Btaiklscls scuk
(X) Wiitehtsls scuk
( ) We slouhd be able to talk aoubt Vaigra whtuiot bineg creonsed
( ) Cusemruteonreas sulhod not ivonlve wire fruad or cierdt crad fruad
( ) Cmaeetsruouners slouhd not ivnlove soaatbge of piublc nkterwos
(X) Cntrmurosueeeas must work if phased in gluardlay
( ) Sninedg email sluohd be free
(X) Why soluhd we hvae to tusrt you and yuor severrs?
( ) Ipinlocmitatby with open scuore or open scuroe leiencss
( ) Feel-good masreeus do noihtng to slove the prolebm
( ) Teprmoary/one-tmie eamil asseeddrs are csuebrmmoe
( ) I don't want the geoervnnmt rinadeg my eiaml
( ) Klnilig them taht way is not solw and pauinfl eugonh
Frtruermhoe, tihs is waht I thnik about you:
(X) Srory ddue, but I don't tinhk it wloud wrok.
( ) This is a situpd idea, and you're a sptiud preosn for sgnseutgig it.
( ) Ncie try, assh0le! I'm gniog to find out whree you lvie and brun your huose down!
This issue is a bit more complicated than you think.
Let's see the Pseudo-code:
* Step 1) The system classifies a message as SPAM because SPAM messages are more likely to be sent to people that receive a lot of spam. So, if a certain email message is sent to X people (where X is a threshold) that get a lot of spam (bad reputation, step 2), the message will be classified as SPAM.
* Step 2) To calculate the reputation of someone, the system needs to know the ratio of "SPAM Mail" to "Good Mail" (step 3) for this particular person.
* Step 3) To calculate this ratio, the system has to know BEFOREHAND whether messages to this particular person are SPAM or not (step 4).
* Step 4) To know if a message is SPAM (or not) goto step 1.
Conclusion: IMHO, this system will always depend on older techniques to pre-classify SPAM messages. This classification might even be less strict, but it has to be done.
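The loop above can be broken the way the conclusion suggests: seed the reputations from mail that some older filter has already labelled, then score new messages from the recipients alone. A minimal sketch, assuming labelled history is available and assuming a plain average as the aggregate (the article only says the reputations are "aggregated"); the function names and the 0.5 default for unknown recipients are mine.

```python
from collections import defaultdict


def receiver_reputations(history):
    """history: (recipient, is_spam) pairs, pre-labelled by some older filter."""
    spam = defaultdict(int)
    total = defaultdict(int)
    for recipient, is_spam in history:
        total[recipient] += 1
        spam[recipient] += int(is_spam)
    # Reputation = fraction of this recipient's past mail that was spam.
    return {r: spam[r] / total[r] for r in total}


def message_score(recipients, reputations, default=0.5):
    """Average the reputations of everyone a message was sent to."""
    scores = [reputations.get(r, default) for r in recipients]
    return sum(scores) / len(scores)


# Hypothetical labelled history: a pure spam trap and two real users.
history = (
    [("trap", True)] * 4
    + [("alice", True)] + [("alice", False)] * 3
    + [("bob", True)] * 2 + [("bob", False)] * 2
)
reps = receiver_reputations(history)
suspicious = message_score(["trap", "bob"], reps)
ordinary = message_score(["alice"], reps)
```

So step 3 does need an outside classifier (or known spam traps) at least once; after that the reputations could in principle be updated from the system's own verdicts.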
So he's describing a spam-filtering system which is basically saying "if this bunch of people are all getting this email, it smells like spam".
While I'll admit I'm ludicrously overgeneralizing his technique, and I have no real knowledge of exactly how Google identifies spam, I'd say his method smells distinctly similar to essentially what Google must be doing (broadly speaking).
If I were him, I'd be seriously researching how close his work is to The Big G, and make sure there's no conflict/overlap; or he'll just be wasting his time.
Of course, it could be that his tweak may add value to the G, and they offer to buy him out, give him a luxurious position doing more of the same.
Visit CryptoGnome in his home.
This system relies on having large volume spam to stay effective. Messages that don't reach a lot of people would be hard to identify. So one needs to either:
-hope that large volume spam senders don't give up and keep sending once the effectiveness of the scheme goes down (assuming this method is really amazingly accurate)
-hope to have a very large network to administer (the smaller your network, the less effective this method is for large spam).
Funny enough, this method (without a whitelist) should always return a false positive for a mailing-list that sends announcements about new spam fighting methods to large volume receivers of spam. :)
I'm sorry -- do you have a suggestion for how to make drugs, prostitution, identity theft, or insurance fraud unprofitable?
We've had proposals for how to make spam unprofitable, or at least, far less profitable. One has the rather silly name "Internet Mail 2000". The basic premise is to move the cost of bandwidth and storage from the recipients back to the servers.
I don't see how you can make prostitution unprofitable, but it's not hard to design an email system to take advantage of the properties of computers, rather than be hindered by having to be structured just like the post office.
Over 99 percent spam blocking means fewer than one mistake in every 100 messages processed. That's 10 to 100 times fewer mistakes than any other available systems.
That still means that the best other systems make a mistake on 1 out of every 10 messages, and the worst ones make a mistake on every single message. That's still ridiculous hyperbole.
(Personally, I'll take the system that makes 100% mistakes, and I'll use the Spam folder as my Inbox.)
Now if you said that it has 1/10 to 1/100 the error rate of normal clients (which is what they're actually claiming, I think), THAT would make mathematical sense AND be an achievement. The Slashdot title of the story is just bad no matter how you spin it.
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
Yes. The article says "Aggregating the reputations of all recipients of a particular message, therefore, is equivalent to combining those users' rating power to estimate the legitimacy of the sender and the message." If you're able to even count all the recipients of a particular message, a large count is a good indicator of spam.
But it's been a long time since spammers simply sent the same text to a large number of addresses. Identifying multiple transmissions as instances of a "particular message" is the big problem today. This new scheme doesn't help with that.
If a message contains URLs, filtering today works quite well. The few spams that get through SpamAssassin typically don't contain URLs. The spammers are getting desperate. I just had a spam come in that expresses a domain as "nartbx. com /* O, mit Empty Space". It got through the spam filter, but it hardly seems worth it for the sender.
Around cosy woman
The "receiver reputation" method of anti-spam in the article sounds like bullshit. Sweet bullshit.
The ratio of good email to spam as an identifier? Who the heck cares? How the heck would that help? Many emails have only one receiver so comparing the reputation of a receiver isn't relevant.
Can you spell last chance to make money before he croaks from cancer? Hey, he doesn't need to make sure that it has a return... as no one can punish him when he's dust.
Ok, maybe this is a bit cynical... but his technique sure sounds like bullshit to me - just looking at the receiver of an email. Sheesh... what the gullible will fall for. Their IPO is next week. I've got a bridge to sell you, cash only please.
The decision process between ham or spam (or unclear) clearly uses a technique based on Wald's SPRT.
http://en.wikipedia.org/wiki/Sequential_probability_ratio_test
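For anyone who hasn't met the SPRT: here is a minimal sketch of the generic test, not the vendor's actual (undisclosed) algorithm. It assumes each new piece of evidence arrives as a log-likelihood ratio log(P(obs | spam) / P(obs | ham)); the function name and the default error rates are made up for illustration.

```python
import math

def sprt_decision(log_likelihood_ratios, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test over a stream of evidence.

    alpha: tolerated rate of calling ham "spam" (assumed value).
    beta:  tolerated rate of calling spam "ham" (assumed value).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts "spam"
    lower = math.log(beta / (1 - alpha))   # crossing this accepts "ham"
    total = 0.0
    for llr in log_likelihood_ratios:
        total += llr
        if total >= upper:
            return "spam"
        if total <= lower:
            return "ham"
    # Evidence ran out before either threshold was crossed.
    return "unclear"
```

The point of the sequential form is that it stops as soon as the accumulated evidence is strong enough, which would give exactly the ham / spam / unclear split described in the article.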
But does it run Li[*BOP!*]
Table-ized A.I.
This seems like a very interesting idea, but if it's not implemented on the spammer's email server, how will this stop spammers who bcc: all their recipients or use mail merge (i.e. most spammers)? I suppose if our email servers (or many reputable email servers) notice the same (well, "same") incoming message to many recipients it could deduce it was such a message, but then there's time-delay workarounds and other problems (welcome emails for signing up for a new service/site come to mind). And if bcc: or mail merge is used, I don't see how you can implement this client-side.
Idea: cool! Chance of success: IMH(layman)O, slim.
It implies that in order to detect spam, you are relying on "the spam target". Thus, in order for the system to "work", some people have to "do their duty" and receive lots of spam. It reminds me of using the cigarette tax to pay for health care. To improve the overall health of the population, "just smoke more".
You might argue that if they can increase the accuracy of detection, it's good. It seems likely that the spammers will just increase the volume of spam, bringing us back to square one.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
While I think this is a spam blocking method that could be very effective against the current trends of spammers, I think there's an easy work-around...
Spammers who get mailing lists could cross-reference several of them and spam the addresses unique to one list much more than those that exist in all lists. That will create the same trend line that causes emails to be treated as ham.
If they continue to do that, and this system were the only one in use, I'm guessing it'd lead to a large disparity between people that get lots of spam, and people that get little or no spam.
Read TFA. It's a receiver net. The receivers kibitz. If it were simply weighted then the odds of a message being spam would be something proportional to the number of people receiving the same message or a message with the same attributes, times their frequency of getting spam. Thus messages that mainly go to a lot of known spam recipients are marked spam for everyone.
Some drink at the fountain of knowledge. Others just gargle.
The stuff in the can "spam" = SPicy hAM. That's where the name comes from.
I actually like it once a year or so, fried up with some eggs and peppers and stuff
That potted beef stuff though, yukky. Braunschweiger is better for that on crackers.
Works fine when the receiver list is long, but NOT when the spam is sent to one receiver only or using BCC-address.
You have got the system completely BACKWARDS.
Sorry for AC but I've already moderated in this discussion.
I have been struggling with a large mail services provider who ends up blocking addresses just because one person reported an email as spam.
... "take a look at who is reporting this as spam".
Our problem is people register on our site to spam our forums, then get banned and when they get the newsletter they opted into, they report it as spam. Seen this happen with one or two users... and it ends up getting our newsletter banned from every damn ISP they provide their services to.
Can't get it through to their spam fighters
(Actually the issue here is they handle things manually... pretend it's algorithmic and then hide behind all sorts of "trade secret" excuses)
Thank you for all the comments on the NY Times article.
It would be difficult for me to answer each and every comment, so I'll try to just hit the high points here.
It's quite easy to poke fun at an algorithm which is unknown to you as demonstrated by all the comments.
But what's more relevant is whether really smart people who know the algorithm can find fault with it. There are only two people outside of Abaca who know the algorithm: Stephen Wolfram (author of Mathematica) and University of Waterloo Professor Gordon V. Cormack (a well known figure in the anti-spam community). I picked Wolfram because he's the smartest pure math guy I know. I picked Cormack because I think he is one of the smartest and most respected scientists in the spam field. You could contact either of them and ask them what they think of the approach. I can tell you what they'd say if you did that. They'd tell you it is a simple, elegant algorithm that has no obvious (to them) holes. I know that because the reason I disclosed it to them was to see if I overlooked anything. Neither found any holes. That doesn't prove that there aren't holes. All systems have holes. What this does mean is that a couple of pretty respected experts think it appears to be pretty solid logic.
In fact, Gordon was kind enough to go even further and gave me permission to use the following quote: "This is, by far, the most clever technique I'm aware of for spam filtering." Since Gordon is conference chair for a lot of spam conferences, this is a pretty significant endorsement from someone who KNOWS the full algorithm and who knows the spam space better than just about anyone.
I spent about 4 years studying what others had done in the space. As one commenter pointed out, the recipient reputation system can be thought of as a generalization of the honeypot technique that was first patented by Brightmail.
That's exactly right. My realization is that every email address has statistical value, not just honeypots. So instead of just "black" feedback, the system incorporates "grey" and "white" feedback; every recipient has a priori odds associated with receiving mail. For many years, Brightmail was the de facto standard for spam filtering. Brightmail is just a special case of the algorithm I invented. So instead of learning from honeypots, we learn from ALL recipients and incorporate that statistical input in a mathematically rigorous way in order to compute a statistical likelihood that our prediction was correct. That gives us much more input than a honeypot system: it gives us white, black, and grey values. That is critical to avoiding false positives because good sites (like Yahoo and Hotmail) send email to honeypots all the time. And we incorporate that feedback into a statistical framework that is much more accurate than what Brightmail used.
Exactly how we incorporate that input into spam scoring has not been publicly disclosed. It is not obvious.
People who say that this must be snake oil or cannot work ignore the fact that the system has been in use by real customers for more than a year. We have over 100 customers and are just announcing our existence to the world, so that number should increase quite rapidly now that we are starting to market our product. There are customer testimonials on our website. You can contact them directly to verify that these quotes are legitimate.
Here are statistics from one of our rating servers. There were 1,380,140 messages since the last counter reset. 96% were rated spam. There were 176 false positives and 66 false negatives reported. I just grabbed those stats from one of our live servers right now as I was composing this message. Sometimes we're better, sometimes we're worse, but those numbers are pretty typical.
It's not perfect, but I think those are pretty good error rates for where we are now. And the stats always get better as we add more customers since we get more statistical input and this is just a statistical estimation problem. The more data, the more accurate
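To put those server counters into rates, here is a quick back-of-the-envelope sketch. It assumes the 96% figure is exact (it is rounded in the post) and that every mistake was actually reported, so the real rates could be higher; the function and key names are invented for illustration.

```python
def error_rates(total, spam_share, false_pos, false_neg):
    """Turn raw counters into rates; spam_share is the fraction rated spam."""
    spam = total * spam_share
    ham = total - spam
    return {
        # Share of all messages that were misclassified either way.
        "overall": (false_pos + false_neg) / total,
        # Share of legitimate mail wrongly rated spam.
        "fp_rate": false_pos / ham,
        # Share of spam wrongly let through.
        "fn_rate": false_neg / spam,
    }

# Counters quoted above: 1,380,140 messages, 96% spam, 176 FP, 66 FN.
rates = error_rates(1_380_140, 0.96, 176, 66)
for name, value in rates.items():
    print(f"{name}: {value:.4%}")
```

On these numbers the overall reported error rate comes out at roughly 0.02% of all messages, but because only about 4% of the traffic is legitimate, the reported false positives amount to roughly 0.3% of the good mail.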
This is precisely correct. Never forget Rule #1: spammers lie. That includes lying to their clients.
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
I haven't opened my spam folders in ages. I get maybe 1 spam leaking through in my Yahoo! inbox a day, and maybe 1 a month in GMail. Each account has about 700-800 spam in the spam folder with a 30 day autodelete. This means I'm getting about 50 spam/day. I can live with deleting one bogus email per day on average. 1 in 50 spam getting through? Not bad!
Are there false positives? Not that I've noticed in a long time. There might be, but the last few times I've deigned to wander through those swamps, I've found nothing of value.
That's not to say all spam filters are good. Yahoo! and Google seem to have done a stellar job. My email account at Global Crossing, though, which used d-spam, had a 27% false positive rate when I finally gave up on it. I just forwarded everything to Google instead. :-)
--JoeProgram Intellivision!
For the record, if we killed all the lawyers, I think we could all get away with it; we'd be found innocent when not a single prosecuting attorney could be located (if we missed one he wouldn't point it out!), and we'd be brought to trial without appropriate defense... Let your competitors live, we'll get the lawyers (NewYorkCountyLawyer is a notable exception).
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Basically I imagined redoing things at the protocol level, with "payment" sent with the email, encrypted, and it would all require digital signatures, etc, etc. Naturally, I had imagined this as a service that people would pay money for, but it would require a critical mass, etc.
:)
It's just fun to come up with imaginative ideas, I guess.
> The big assumption is that you can identify the recipients
> of a particular message, but spammers can easily ensure
> that information isn't easily obtained.
Nonsense. You're confusing the body from/to with the envelope from/to.
Spammers can't hide the envelope from/to.
There is no essential difference to a checksum clearing house like Razor. But their system needs way too much coerced effort from different parties for it to ever make it out of very large mail providers.
As I understand it, the system works best when multiple messages are sent in reasonably quick succession from a single IP address (which is usually the case, even if tens of thousands of zombies are involved).
It treats those messages as a group, and rates the whole group according to the "reputation" of the named recipients (which of course are in each individual email). Emails addressed to honeypots and bad addresses will assign a high probability that all emails in the group are spam. Legit addresses with little or no history of receiving spam will decrease that probability for the group. Legit addresses that do receive spam will increase it according to their typical spam content, and so on. If enough emails from that group go to typical spam recipients, that whole group of emails from that IP is considered spam, receiver reputations are updated, and the group is dropped.
The claim is that this method is usually quite effective even for small groups of a handful of emails, and it does sound plausible. I don't see that it could make any effective judgement if it received only a single email from a given IP address, however, and would have to let that through unchallenged. It's possible that, even at the ISP level, a large enough botnet could send no more than a single email from each bot to a given ISP, which would render the system largely useless. However, it could be very effective against spam sent from a small number of machines, especially for larger ISPs, and results could optionally be pooled from multiple ISPs which would increase effectiveness for larger botnets.
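A rough sketch of that reading, in case it helps: treat each recipient's historical spam fraction as evidence and add up the log-odds for the whole group. This is only my guess at the shape of it, since the real scoring hasn't been disclosed; the naive independence assumption, the clamping values, and the neutral 0.5 for unknown addresses are all assumptions.

```python
import math

def group_spam_probability(recipients, spam_fraction, prior=0.5):
    """Combine recipient reputations for one group of messages.

    spam_fraction maps an address to the share of its past mail that was
    spam (close to 1.0 for a honeypot, close to 0.0 for a clean address).
    """
    log_odds = math.log(prior / (1 - prior))
    for address in recipients:
        p = spam_fraction.get(address, 0.5)   # unknown address: neutral
        p = min(max(p, 0.01), 0.99)           # clamp so no one address decides alone
        log_odds += math.log(p / (1 - p))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)
```

With this shape a single email to an unknown or average address stays at the prior, which matches the worry above about one-message-per-bot botnets.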
Why would anyone engrave "Elbereth"?
MightyYar -> Hi try a gym
Your post advocates a
( ) technical ( ) legislative (X) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
(X) Mailing lists and other legitimate email uses would be affected
(X) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(X) It will stop spam for two weeks and then we'll be stuck with it
(X) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
(X) Requires immediate total cooperation from everybody at once
(X) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
(X) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
(X) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
(X) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
(X) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
(X) Eternal arms race involved in all filtering approaches
(X) Extreme profitability of spam
(X) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(X) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
(X) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses [hey, it's Microsoft... they've probably already submitted the patent...]
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
Help poke pirates in the eyepatch, arr.
I was really wondering... do people actually buy items advertised in spam?
If you delay pleasure infinitely, the pleasure will be infinite. (YM)
Yes, the originator of a message can forge headers, but trustworthy relays won't. They all record the IP address of where they got it from, which can't be spoofed so it's as reliable as the relay itself.
All you need to do is traverse back through the relay chain, looking at IP addresses, until you get to the last relay considered reliable (by whatever criteria you like). The IP address it received the email from (which might be a legit relay or a zombie or even the real originator) can be investigated further, or just considered the originator for your purpose.
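The walk back through the relay chain can be sketched like this. It is a simplification under stated assumptions: the top-most Received header was written by your own server, relays put the peer's IPv4 address in square brackets (common, but not guaranteed), and `trusted_ips` is a hypothetical set you maintain yourself.

```python
import re
from email import message_from_string

# Matches a bracketed IPv4 address such as [192.0.2.10].
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def effective_origin(raw_message, trusted_ips):
    """Return the first non-trusted IP a trusted relay received the mail from."""
    msg = message_from_string(raw_message)
    # Received headers are prepended, so the newest hop comes first.
    for header in msg.get_all("Received", []):
        match = IP_RE.search(header)
        if match is None:
            return None          # cannot parse this hop; give up
        ip = match.group(1)
        if ip not in trusted_ips:
            return ip            # last hop vouched for by a trusted relay
        # The peer is itself trusted, so the next (older) header is believable too.
    return None
```

Anything below the returned hop was written by a machine you don't trust and may be forged, so it is ignored.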
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
I've had to do that on many occasions because lusers tend to put their own email into their email client incorrectly. It is always something close, so trying variations of hyphen/underscore, period separators, etc, will often get it.
I am sure many people have great things to sell to you and they will find it a great marketing tool to send you emails selling really useful things to you. When you are at it, why don't you post your real mailing address? I will subscribe to a thousand catalog companies on your behalf. I am sure you would not mind your credit card bills, mortgage notices, tax bills being mixed up with all the catalogs you are getting in your regular mail.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
There is already reputation-based anti-SPAM in a commercial product that works well - Secure Computing's Ironmail (http://www.securecomputing.com/index.cfm?skey=1612) powered by TrustedSource. No need to reinvent the wheel...
But why would the anti-spam software companies want that? If they succeed in actually eliminating spam, they'd also go out of business. It may be profitable for the spammers, but I suspect it's even more profitable for the anti-spam companies.
"Hold on Joe, we can't implement that algorithm, we'll lose our jobs." Probably _not_ something any flies on the wall will have heard in the anti-spam industry. The "boss" probably doesn't know what a hashtable is, and finds his lead programmer's attitude annoyingly expensive. After all, you can hire VB programmers out of school for less than half as much, and it's just software after all. What's the big deal with these code reviews?
My point is, the "powers" that be, in the particular case, are likely incompetent - incapable of successfully pulling off such a conspiracy. The CEO probably blows his load whenever he thinks of outselling his economic rivals. If they could make their product an order of magnitude better, and *own* the market, then they could be the next M$ of their industry. If companies produce poor software, it's probably because they're more interested in the business side of things, than any real care about producing good software.
Don't hold your breath for spam killer software - not because the anti-spam industry isn't trying, but because the problem is genuinely hard, and because of the PHB qualities of management in the software industry.
Like all pain, suffering is a signal that something isn't right
Right back atcha:
courseofhumanevents -> "Must Fence A Nervous Ho"
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Since there are over 20 replies to my message, I'll reply to my own message rather than replying to the individual replies.
First, I'll point out that a large amount of spam comes from a small number of spamming operations. Check out SpamHaus and read their listings of the top spammers. You'll find that if you could stop just the top handful, you would have a huge impact on the total amount of spam. And I'm not going to suggest hunting them down with cops and guns, either.
If you look further into the work of these spammers (I'll call it work, you can call it whatever you like), you'll find that one commonality is that the top spammers have registered lots of domains themselves that they spamvertise. If you dig deeper into these domains, you'll find that the spammers use only a small number of registrars and ISPs for their spamvertised domains. And if you bother to do a WHOIS on said domains, you'll find that many of the spammers don't even bother to make up new registration data for the domains, they just stick to a couple of repeated aliases each.
Therefore, the registrars that sell the domains could choose to deny the sale of the domains based on the identity of the people buying them. For example, "Leo Kuvayev" is currently ranked number one at spamhaus. His list of aliases for registration is quite short. Yet the registrars choose to do business with him, even knowing that he is linked to criminal activity.
I therefore say that the fault for much of the spam lies in the hands of registrars and ISPs that willingly keep criminals as customers.
Which of course leads to the question of why these companies would do such a thing, which has a simple answer - money. These companies are making money off of these criminals who they do business with.
Therefore, I propose that the solution lies in better regulation of the registrars and ISPs. In particular, if ICANN actually enforced some codes of decency on the registrars, by way of hitting bad registrars with hefty fines, the registrars would be forced to pass on the higher costs of business to their customers. If domains become expensive, then we will succeed in increasing the cost of business for the spammers.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Linux is not gay, homosexuals are gay.
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
When I saw that statement, I laughed so hard I cried. A marketing strategy that works by claiming other techniques fail every time - what could possibly go wrong?
If you were able to detect zombies and spam, why not detect illegal downloads for the MAFIAA, or kiddie pron, or people looking for plans on how to build a bomb?
If the world were operated by reasonable people, then you have a fantastic concept, but at least here in the US of A, it may be opening a can of worms.
Do you really want to stop spam email? It is simple, really: change email standards. You see, by and large phone numbers are only given to people you want to call you. Email should be the same way. Instead of accepting all email coming in, the standard should require some form of public key-private key encryption on emails coming in. Even something as simple as requiring an encryption password would hurt spam like nothing else. (Heck, you could just have your email client required to put a plain-text password in the header, then filter out everything that doesn't have it.)
Just think about it: it could work much like MSN or AOL instant messengers work: "So and so has given you permission to email them" or "so and so has requested that you allow them to email you".
While you could possibly brute-force the method above, or get the email from an unsecured client, the amount of spam you get would dramatically decrease, and if by some fluke it starts to come, you just change your password; in the background it could easily change the password for everyone in your profile ("so and so has updated his settings").
But, of course, that would require a fairly large change in the current email substructure. I think it would work fairly well though.
I think it sounds interesting but there may be some hole in it that I don't see.
I don't care if it's 90,000 hectares. That lake was not my doing.
I read the article; it sounds like it depends on the recipients list. But if that's true, and this starts to block a lot of spam, won't they just change what they send so that each target is sent its own message? Then there is only 1 recipient (you) in the list.
Also, I don't follow how they decide if a message is spam or not. Yes, once you have the ratings for all the users, you can use their up slope/down slope test. But how does it get the original ratings for the users?
And how does it keep current data on what 10000000000 other users have for spam ratios? Does it have some kind of statistics exchange? (which seems vulnerable to attack/manipulation)
It's gotten to the point where I crafted some procmail recipes that will explicitly override certain messages (in the event they're flagged by SA) given certain subject words.
I decided to give 100% Bayesian (and other statistical) filters a try. I found every package in FreeBSD's ports tree that did such filtering, and whittled away programs that either were scripted (SpamBayes is a Python app) or required a running daemon (for example, dspam and Spam Assassin). That left me with 7 lean, compiled binary programs that *only* used classification.
In procmail, I pipe each message through them, which will yield a unique header. If all 7 unanimously tag that message as spam, it gets dumped into a "spam" folder. Likewise, a unanimous ham message gets a direct trip to the inbox. Next, based on a weighted recipe, I count how many "spam" and "ham" votes, and a score of 4 or more results in the messages I sort into "unsurespam" and "unsureham" folders. After the initial training with your saved mail, plus a day or two of close monitoring, this classification is 99.5% trustworthy. I have a script that runs every 5 minutes that determines which program mis-classified the message and then retrains it to the majority vote (ham or spam). Then the message is removed from its "unsure..." folder, and relocated to the inbox or spam folder.
A message that doesn't result in a "simple majority" vote in either direction (several of the filters have an "unsure" or similar tag), ends up in the generic "unsure" folder, which is very very low volume (maybe a couple of messages a week), and is reviewed by a human. A couple of 'mutt' macros will move the message to "unsurespam" or "unsureham", where the re-training script will deal with it.
The beauty of this system is that each filter, while alone is not much better/worse than Spam Assassin, continually helps to retrain the others, so that drift in spam patterns/vocabulary, which may throw off a filter or two, doesn't appear to have much of an effect on the overall system.
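The routing logic described above boils down to something like the following sketch. It is plain Python rather than the actual procmail recipes, it uses an unweighted vote where the real setup is described as weighted, and the tag and folder names are just the ones mentioned in this post.

```python
from collections import Counter

def route(votes):
    """Pick a folder from per-filter tags: 'spam', 'ham', or 'unsure'."""
    if not votes:
        raise ValueError("need at least one filter vote")
    counts = Counter(votes)
    n = len(votes)
    majority = n // 2 + 1            # 4 when there are 7 filters
    if counts["spam"] == n:
        return "spam"                # unanimous: straight to the spam folder
    if counts["ham"] == n:
        return "inbox"               # unanimous: straight to the inbox
    if counts["spam"] >= majority:
        return "unsurespam"          # retrain the dissenting filters as spam
    if counts["ham"] >= majority:
        return "unsureham"           # retrain the dissenting filters as ham
    return "unsure"                  # no simple majority: human review
```

The retraining script would then feed anything in "unsurespam" or "unsureham" back to whichever filters voted against the majority.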
I've only used this for about a month on 2 accounts: a relatively low-volume personal gmail account (via mutt/fetchmail/postfix) and my day-job account, which gets *tons* of spam since I get all webmaster/postmaster/domainmaster/etc system aliases. The system seems to lean towards false negatives, resulting in maybe 5-to-10% of the number Spam Assassin resulted in. I've yet to see a single false positive, and maybe 1% of the Spam Assassin volume of false negatives end up in the generic "unknown" folder for manual review.
I plan on rolling this system out for the critical high-volume mailboxes (support, sales, etc.) for my company very soon, probably during the lull between Christmas and New Year's.
The programs (in order called from my .procmailrc): bmf, CRM114, Bogofilter, SpamProbe, qsf (Quick Spam Filter), annoyance-filter, and SpamOracle.
All are compiled programs, and each runs very quickly. I wish I had the time and know-how to do a proper statistical analysis of this system, but at a glance it not only works much better than Spam Assassin did, it seems to take up less memory/cpu resources and doesn't use any network resources (DNS, domain keys, or RBL lookups).
The only notable weakness so far that I've observed is annoyance-filter. Unlike the others, it doesn't have a dynamic dat
Method of processing duck feet
Linux is not gay, homosexuals are gay.
Not all homosexuals are happy, cheerful people either.
In the free world the media isn't government run; the government is media run.
I have had an idea for stopping SPAM for a while; I think it would work...
I observe that nearly 100% of my SPAM messages are from email addresses that do not really exist. In fact, that is how I would differentiate between "SPAM" and being on someone's rightful advertising list (for something legitimately obtained).
So, here's my idea:
SCENARIO 1:
Mr. Spammer sends me an email message. My email client silently receives it and says, "hm.. I don't know this person." So, it replies back with a special message containing a serial number or special word with instructions saying, "You are unknown to me. Simply reply to this message, and your original will get through." (Perhaps I have no idea that the first message was even received, and didn't know that my computer just sent another message out on my behalf begging for a reply.)
If the original "From" email address isn't legitimate, then Mr. Spammer's message will die and I will simply never see it.
If I get a bounce-back message, I may not see that because most bounce-backs usually include the original message (so it would have the original serial number/special code), and the email program would just go "hm... I guess it was SPAM."
If Mr. Spammer does get the message, then I at least have a verifiable identity, and later I can ask him politely to stop sending me SPAM or I can simply block the address for any future messages from him. But at least I know for certain where the message came from!
If Mr. Spammer has an email client with this feature, neither he nor I may ever know that this conversation took place between our email programs. It could be done transparently. It could also be done at the mail server level, so old email programs don't even have to be upgraded.
If Mr. Spammer has an old email client or old email server without this feature, it still works because he can still use his old program to reply to my email program's auto-generated one with the serial number/special code.
SCENARIO 2:
My long-lost pal, "Bob" found my email address and sends me a first email message. I do not know of Bob's email address... it's "out-of-the-blue" and kinda looks like SPAM.
So, my email program replies to Bob and says, "hm... I do not know you. Please reply to this email!" (and of course, there is the special serial number/special code).
If Bob's email program is the same or has this feature, he never even sees the message that comes back from me; his email program simply replies and he is validated: I see his original message.
If Bob's email program is not the same, or doesn't have this feature, he can manually reply.
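The exchange in both scenarios can be sketched roughly as below. This is only a toy model under my own assumptions: no real mail is sent, and the names `ChallengeResponseFilter`, `receive` and `confirm` are made up for illustration, not taken from any existing mail program.

```python
import secrets


class ChallengeResponseFilter:
    """Toy model of the challenge/response exchange; no real mail is sent."""

    def __init__(self):
        self.trusted = set()   # senders who have answered a challenge
        self.pending = {}      # token -> (sender, held message body)
        self.inbox = []        # messages the user actually gets to see

    def receive(self, sender, body):
        """Handle an incoming message; return a challenge token or None."""
        if sender in self.trusted:
            self.inbox.append((sender, body))
            return None
        token = secrets.token_hex(8)        # the "serial number/special code"
        self.pending[token] = (sender, body)
        return token                        # would be mailed back to `sender`

    def confirm(self, sender, token):
        """Handle a reply to a challenge; release the held message if valid."""
        held = self.pending.get(token)
        if held is None or held[0] != sender:
            return False                    # unknown code or wrong sender
        del self.pending[token]
        self.trusted.add(sender)
        self.inbox.append(held)
        return True
```

A challenge that is never answered (forged "From" address, bounce-back) just stays in `pending` and the held message never reaches `inbox`. Once Bob has confirmed, his later messages skip the challenge entirely.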
If I were a SPAMMER, I could not think of any way around this. It is backward-compatible, works with existing technologies, can be added seamlessly to new email programs and servers, and doesn't require new laws to be passed (except maybe just an update to an RFC). It will cause neither false positives nor false negatives. It tacks on verifiable addressing (which is what email needs).
The worst I can think that will happen is that Bob's original email will be delayed a short time; but if he has this automatic feature too, he could be verified within seconds or minutes. But once his address is trusted, there will be no delay.
What do you all think?
"They said I probly shouldn't fly with just one eye," "I am Bender. Please insert girder."
Here is the published patent application for this spam technology.
http://snipurl.com/1un6a
The strange part of the application is that it lists "Kirsch; Steven T" as the inventor but "Google" as handling all correspondence for the patent.
Oh, great. Anonymous coward renames spam to Mohammad, in an effort to trigger religious persecution of spam. Unfortunately, the violation is the naming. Notice how the teddy bear skated through the crisis unscathed. Now all anonymous cowards are to be flogged and deported (at least the ones operating out of Sudan).
Explain to us the problem.
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
Problem is a lot of the actual sending is done by PCs which have been taken over.
So the spammers CAN have their systems set up to auto-reply to your message. And when you get mad about the spam, all you have is some grandma's email address, because she doesn't know to run a firewall or because she made her password the name of her grandson.
You *won't* get the spammer. And you will get the spam.
On the other hand, your approach might work to help build a list of compromised PCs...
"spammers can get around this"
Correction. Should have read "spammers CAN NOT get around this." Sorry for the typo.
How I see this working:
1) System is based upon the amount of spam received by customers and depends on some users having a high proportion of spam.
2) The customers are using the system, so their level of spam would reduce to a very low average.
3) Nobody in the system is receiving a lot of spam so the system can no longer easily identify spam.
4) The level of spam increases again.
5) Goto 1.
It would be interesting to see where the spam level received balances out too.
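For what it's worth, the loop in steps 1-5 can be played with in a few lines. Everything numeric here is an assumption of mine: the detection rule `seen / (seen + k)`, the constant `k`, and the figure of 100 spams a day are invented just to have something to iterate, and say nothing about how the real product behaves.

```python
import math


def simulate(sent=100.0, k=5.0, rounds=50):
    """Iterate the loop: less spam seen -> weaker detection -> more spam seen.

    sent -- spam aimed at each user per day (assumed constant)
    k    -- hypothetical constant: spam level at which half gets caught
    """
    seen = sent                          # before anyone uses the filter
    history = [seen]
    for _ in range(rounds):
        detection = seen / (seen + k)    # assumed: detection grows with spam seen
        seen = sent * (1.0 - detection)  # spam that still reaches the inbox
        history.append(seen)
    return history


def equilibrium(sent=100.0, k=5.0):
    """Closed-form fixed point of seen = sent * k / (seen + k)."""
    return (-k + math.sqrt(k * k + 4.0 * sent * k)) / 2.0
```

Under this toy rule the level swings up and down with shrinking swings rather than cycling forever, and settles where `seen * (seen + k) == sent * k`. A different detection rule could behave quite differently, so treat it as a way to think about the question, not an answer.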
But both this new approach and the spambait-user approaches suffer from the problem of identifying identical messages. Spammers often try hard to make messages slightly different, or images slightly different, so that message hashes will be different for otherwise-identical spam. Anti-spammers try to make hashes that ignore easily-modifiable stuff at the beginning and end of messages, and spammers try to work around those techniques, in the usual arms race.
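A minimal sketch of the "ignore the easily-modifiable stuff" kind of hash, assuming the variation sits in the first and last line plus case and punctuation. The function name `fuzzy_digest` and the `skip` parameter are made up here; real systems such as Razor use their own, more robust signatures.

```python
import hashlib
import re


def fuzzy_digest(body, skip=1):
    """Hash a message while ignoring its first/last `skip` lines,
    letter case, and everything that is not a letter or digit."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    # Only trim the edges when enough lines remain to hash afterwards.
    core = lines[skip:len(lines) - skip] if len(lines) > 2 * skip else lines
    normalized = "".join(re.sub(r"[^a-z0-9]", "", ln.lower()) for ln in core)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two copies of a spam that differ only in the greeting, the trailing tracking code, and some punctuation get the same digest. A spammer who inserts random words into the middle of the body defeats it immediately, which is the arms race described above.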
On the other hand, you don't need to be Google to get enough email volume that you've got some mailboxes that receive a lot more spam than others. Almost any domain is going to have some addresses like "sales@" and "info@" and "webmaster@" and such that get lots of spam, and just about any address on your web pages is going to get harvested. Statistically you're unlikely to kill 99% of the spam unless you've got over 100 users, and probably 1000 is better, but that's probably enough, especially if you use some kind of shared spam-filtration system like Razor / Cloudmark. Also, while this guy's blazingly optimistic about his technique stopping most spam, it doesn't have to be your only tool - you can adapt it to use as a SpamAssassin weight or whatever.
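As a rough illustration of using receiver reputation as just one weight among others: the sketch below averages the historical spam ratio of everyone who got a copy of the same message. The article doesn't spell out the actual algorithm, so the averaging rule, the function name `recipient_spam_score` and the `default` value are all my assumptions.

```python
def recipient_spam_score(recipients, spam_ratio, default=0.5):
    """Average the historical spam ratio of everyone who received a message.

    recipients -- addresses that got a copy of the same message
    spam_ratio -- dict: address -> fraction of that address's mail that was spam
    default    -- assumed ratio for addresses with no history
    """
    if not recipients:
        return default
    return sum(spam_ratio.get(r, default) for r in recipients) / len(recipients)
```

A message that only landed in "sales@" and "info@" scores high, one that went to a single low-spam mailbox scores low, and the result could be fed into whatever scoring system you already run as one more signal.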
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Botnets have gotten pretty big these days, and they've already _been_ using the technique of spreading around their spam so that any given ISP only gets a few messages from a given bot, at least at a given time. Botnets and spamming have gotten big enough that they can coordinate this kind of thing fairly effectively, at least for the organized-crime types of spam rings, though Bubba in his trailer park may not be doing that. As long as you've got the big numbers, it's not hard to coordinate it.
Bill Stewart
Conservatives want to keep all that money in the black market because there's a lot more money that way, and it lets them keep police forces in business; many recreational drugs would be way too cheap to have a big economic impact if they were legal. Marijuana's the obvious example there; if people could grow their own without interference, high-quality weed wouldn't cost more than tea, which is a lot more work to grow, and unlike tobacco it's really tough to smoke two packs of dope a day. Taxing legalized homegrown isn't going to keep the CIA and their buddies funded.
If you look at the costs of opiates, such as the poppies that keep the Taliban in business, medical opiates are really cheap - a bottle of Tylenol 3 with Codeine over-the-counter in Canada costs about $5/100 tabs, and even in the US, the last time my dentist prescribed me Vicodin it was about $5/20. Either one could keep Rush Limbaugh happy for less money than my daily Starbucks habit if it weren't mixed with Tylenol (which is dangerous in recreational doses.) Prescription Oxycontin isn't more expensive because there's a hydroxyl radical stuck over on the left-hand side, which doesn't change the manufacturing costs significantly compared to codeine or morphine, it's just the brand-name and the fancy timed-release packaging that addicts don't care about and the extra anti-diversion handling requirements.
And the only thing keeping amphetamines expensive is the cost of the black market - Sudafed was dirt cheap back before the anti-meth laws made them overpackage it, and the pharma companies could make meth for about the same price as Sudafed or Phenylephrine since they wouldn't have to undo the last couple of manufacturing steps the way current meth-cooks do.
Bill Stewart
Any such service should be free to use, as Google Web Accelerator is, because the proxy collects data on the web sites you visit. Handing the proxy that information and then paying money on top of it is silly; a product that charges you for this is snake oil.
For ISPs and sysadmins: you can set up a caching proxy (squid, apache) and, optionally, a program that reduces the quality of the images you cache.
Google uses standard web acceleration strategies: Google Web Accelerator uses various strategies to make your web pages load faster, including:
* Sending your page requests through Google machines dedicated to handling Google Web Accelerator traffic.
* Storing copies of frequently looked at pages to make them quickly accessible.
* Downloading only the updates if a web page has changed slightly since you last viewed it.
* Prefetching certain pages onto your computer in advance.
* Managing your Internet connection to reduce delays.
* Compressing data before sending it to your computer.
Everything I write is lies, read between the lines.
I've only been preaching this for years. Now we have an article that also realizes that there is an economic solution needed for the spam problem.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.