Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review.
Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification
author: Jonathan A. Zdziarski
pages: 312
publisher: No Starch Press
rating: 8
reviewer: Shalendra Chhabra
ISBN: 1593270526
summary: Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, the spam kings, and different approaches for fighting spam, such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omits any mention of locality-sensitive hash functions (such as the Nilsimsa hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and DomainKeys (now DomainKeys Identified Mail).
In the next chapter, the author clearly explains various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more generalized, e.g. incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with the description of Paul Graham's famous spam-filtering technique using Bayesian classification (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinson's Inverse Chi-Square (including the source code for the inversion function), and some other tricks for optimizing spam-filtering accuracy.
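As a rough illustration of the combining step mentioned above, the following Python sketch contrasts Graham-style naive Bayesian combining with a Fisher-Robinson style inverse chi-square combination. It is not the book's source code: the function names (chi2q, graham_combine, fisher_robinson), the clamping bounds of 0.01 and 0.99, and the final (1 + S - H) / 2 indicator are assumptions made for this example.

```python
import math


def chi2q(x2, v):
    """Probability that a chi-square variable with v degrees of
    freedom exceeds x2. Closed form valid for even v only."""
    if v % 2 != 0:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def clamp(p, lo=0.01, hi=0.99):
    # Keep token probabilities away from 0 and 1 so logs stay finite.
    # The bounds are an assumption of this sketch.
    return max(lo, min(hi, p))


def graham_combine(probs):
    """Naive Bayesian combination of per-token spam probabilities."""
    probs = [clamp(p) for p in probs]
    prod_p = math.prod(probs)
    prod_q = math.prod(1.0 - p for p in probs)
    return prod_p / (prod_p + prod_q)


def fisher_robinson(probs):
    """Inverse chi-square combination; returns a value in [0, 1],
    near 1 for spam, near 0 for ham, near 0.5 when unsure."""
    probs = [clamp(p) for p in probs]
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0


if __name__ == "__main__":
    spammy = [0.99, 0.95, 0.9, 0.85]
    mixed = [0.99, 0.01, 0.9, 0.1]
    print(graham_combine(spammy), fisher_robinson(spammy))
    print(graham_combine(mixed), fisher_robinson(mixed))
```

One design point worth noting: with conflicting evidence, the chi-square combination tends to settle near 0.5 rather than swinging to one extreme, which is what makes an "unsure" band practical.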
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then there's a discussion of the various tricks that spammers use for penetrating filters. Although these tactics are mentioned in John Graham-Cumming's "Spammers Compendium," Jonathan has very elegantly explained why some tricks work for spammers and some don't. This part concludes by addressing some of the resource, storage and scaling concerns raised by the large number of features generated from tokenization techniques.
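To make the decoding step concrete, here is a minimal Python sketch of normalizing a message body before tokenization: it undoes Base64 transfer encoding, then removes HTML comments and tags that spammers insert to split words. The function name normalize_body and its transfer_encoding parameter are hypothetical, and the regular expressions are a deliberate simplification rather than a real HTML parser.

```python
import base64
import re
from html import unescape


def normalize_body(payload, transfer_encoding="7bit"):
    """Return plain text suitable for tokenizing (illustrative only)."""
    if transfer_encoding.lower() == "base64":
        raw = base64.b64decode(payload)
        payload = raw.decode("utf-8", errors="replace")
    # Drop HTML comments first: "V<!-- x -->iagra" becomes "Viagra".
    payload = re.sub(r"<!--.*?-->", "", payload, flags=re.S)
    # Drop remaining tags without inserting spaces, so words split
    # by empty tags are rejoined.
    payload = re.sub(r"<[^>]+>", "", payload)
    # Decode entities such as &amp; last.
    return unescape(payload)


if __name__ == "__main__":
    body = base64.b64encode(b"V<!-- x -->iag<b></b>ra &amp; more").decode()
    print(normalize_body(body, "base64"))
```

Removing tags without adding whitespace rejoins deliberately split words, at the cost of gluing together words that were separated only by markup; a real filter would have to weigh that trade-off.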
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word-pairs and phrases into account, instead of individual words) generated using a sliding 5-byte window as mentioned in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian Model implemented in the CRM114 Discriminator, but the author fails to describe different weighting schemes for features implemented in the Markovian-based version of CRM114. The author then describes the Bayesian Noise Reduction Technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Inoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and Honeypot email addresses for snaring spammers' address harvesting bots.
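The tokenization ideas above can be sketched in a few lines of Python. The example below generates chained tokens (single words plus adjacent word pairs) and then sparse phrases in the spirit of Sparse Binary Polynomial Hashing, where the newest token is always kept and older tokens in the window are either kept or replaced by a placeholder. The helper names, the "<skip>" marker, and the choice of a window spanning five tokens are assumptions of this sketch, and real implementations hash the resulting phrases rather than storing them as strings.

```python
import re
from itertools import product


def tokenize(text):
    """Split text into lowercase word tokens (simplified)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def chained_tokens(tokens):
    """Single words followed by adjacent word pairs."""
    pairs = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + pairs


def sparse_phrases(tokens, window=5, skip="<skip>"):
    """For each position, emit every combination of the preceding
    window-1 tokens (kept or skipped) ending in the current token."""
    features = []
    for end in range(len(tokens)):
        older = tokens[max(0, end - window + 1):end]
        for mask in product((False, True), repeat=len(older)):
            kept = [t if keep else skip for t, keep in zip(older, mask)]
            features.append(" ".join(kept + [tokens[end]]))
    return features


if __name__ == "__main__":
    words = tokenize("Buy cheap pills now online")
    print(chained_tokens(words))
    print(len(sparse_phrases(words)))
```

A full window of five tokens yields 16 features per position, which is why the storage and scaling concerns discussed in the second part of the book become significant.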
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a very thought-provoking introduction, and concludes with a crisp "final thoughts" section. There are very few technical errors in this printing, and the illustrations are of good quality. Since the book is geared more toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion about measuring spam-filter accuracy, and what impact this has on setting filtration thresholds. A section on the economics of tradeoffs, and the use of a Receiver Operating Characteristic (ROC) curve, would have been very helpful.
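To show what such a threshold discussion might look like, here is a small Python sketch that computes ROC points from filter scores and picks the threshold with the best spam catch rate under a cap on false positives. The sample scores, the labels (1 for spam, 0 for ham), and the function names roc_points and pick_threshold are all made up for illustration; nothing here comes from the book.

```python
def roc_points(scores, labels):
    """Return (threshold, false_positive_rate, true_positive_rate)
    triples, treating a message as spam when score >= threshold."""
    spam_total = sum(labels)
    ham_total = len(labels) - spam_total
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / ham_total, tp / spam_total))
    return points


def pick_threshold(points, max_fpr):
    """Best true-positive rate among points within the FPR budget."""
    allowed = [p for p in points if p[1] <= max_fpr]
    return max(allowed, key=lambda p: p[2]) if allowed else None


if __name__ == "__main__":
    # Hypothetical filter scores and ground truth (1 = spam, 0 = ham).
    scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 0]
    pts = roc_points(scores, labels)
    print(pick_threshold(pts, max_fpr=0.0))
    print(pick_threshold(pts, max_fpr=0.25))
```

The economic tradeoff shows up directly: on this toy data, tolerating no false positives means missing a third of the spam, while accepting one misfiled legitimate message in four catches all of it.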
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a graduate student in the Department of Computer Science and Engineering at the University of California, Riverside. He is on the development team of CRM114 Discriminator and has presented his work at the MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Openness will have to pay its cost, and spam is one such pest. You can develop better strategies for pest control, but in the end it's a trade-off.
Java Oracle Linux Enthusiast
Why worry about spam? Bill Gates promised to end spam by early next spring. (It's marked in my calendar along with the link to where he promised, but not with me in my PDA right now.)
I'm wondering... will UCE (Spam) be like malaria... controllable in most areas but impossible to eradicate?
Or will these dedicated folks and others be able to eliminate it, perhaps by changes to the mail protocols?
You can't talk about Wikipedia's flaws on Wikipedia
While all of these different technological approaches to spam are worth pursuing, they just don't build the same esprit de corps as a mob with pitchforks and torches at midnight.
If brevity is the soul of wit, then how does one explain Twitter?
"Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003,"
Awww, poor babies. That's a long time to fight spam.
is with a knife, a spatula, and a frying pan, preferably over a hot wood fire.
Yum!
-- Tigger warning: This post may contain tiggers! --
As with any book of this type, it is outdated by the time it reaches the shelves. The spam battlefield changes on a daily basis, and the tools used to fight the battle change with it daily.
By the time a book has been written, edited, proofread (though many publishers skip this part), typeset, printed, distributed and sold, it no longer resembles the technology.
Spam will continue to disguise itself as legit email. You can try to filter it out and set stricter filters, but then catching legitimate mail becomes far more likely. In the end, you have to make a trade-off and practically accept some spam.
If you can't see it, it ain't there?
Heck, our lobby group even points out to Congress how spam laws are not really needed, since people who really don't want the spam are free to filter it. That and a little payola and we are free to phish for more victims.
Yea, keep "fighting spam" with lame filters, we love it. Thanks!
No Karma is given if one is modded up "funny".
I'm wondering... will UCE (Spam) be like malaria... controllable in most areas but impossible to eradicate?
Or will these dedicated folks and others be able to eliminate it, perhaps by changes to the mail protocols?
Interesting question that, considering my work involves malaria.
My guess is that, like malaria and most parasitic infestations, we will at some point develop a "cure". The "cure" will work for a few years, after which the parasite (spam) will have adapted, surviving until then in different hosts (old windows machines donated to Africa, who knows). Then, having developed a new trick, it will come back as strong as ever.
Biology teaches us that organisms adapt to changing environments through selective breeding (natural), point mutations, and unforeseen combinations (see the H5N1 avian influenza). We can develop cures, but once we do so, we can be fairly sure that, barring species extinction, the organism will develop methods to cope with our cures.
An easy solution would be to move to IPv6 - but this, like authentication, will only kill off the spam which doesn't use "trusted email clients that are identified" while the spam that can survive will be encouraged to spread like wildfire.
So long as the fiscal, legal, and societal penalties for spamming are fairly low and the rewards are high, and while most people do nothing about it, it will spread.
-- Tigger warning: This post may contain tiggers! --
Email, as a system, is fundamentally broken. It's this broken design that allows SPAM to happen in the first place.
Current anti-spam solutions are to email what an Antivirus package is to Windows - a hack add-on that increases complexity and costs without solving the underlying problem(s).
Rather than fight viruses, we should be engineering an O/S that's inherently resistant to them. How many of you Linux/BSD/MacOS users EVER use antivirus, or need to?
Rather than build ever-better antispam filters for Email, we should be engineering an email solution that's inherently resistant to SPAM.
The answer lies in authentication - who is sending the email. Some of the best technologies now available use degrees of authentication without actually *saying* it outright. Examples are: refusing invalid domains, greylisting, challenge-response, SenderID - all of these are some form of authentication.
As these are bypassed, one by one, by the spammers, the need for authentication of senders will continue to increase, until the dolts who will invariably reply with that "your solution will not work because... (check the options)" are shown to simply be.... wrong.
Give it time. It's already happening whatever the originators of the SMTP protocol desired.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Read some of his essays. He genuinely believes that all evidence clearly shows that the earth cannot possibly be more than 10,000 years old.
The contrast between being a logically minded person like a programmer, and being so easily brainwashed into believing complete nonsense is startling.
The reason spammers do it is that their message reaches people, enough of them to make it worthwhile. So, the more effective and widespread the filters, the fewer messages reach people, and the less it's worth. If the filters were really effective, nearly 100%, it would simply not be worth it to spam; you wouldn't make any money because no one would see your message.
I don't think we'll ever get there, but yes filtering really could end spam.
Spam will never end as long as there's money to be made. As soon as you find a way to stop one form of it, another is found.
It's just like the war on terror or the war on drugs (both equally useless). There will always be fanatics, and drugs, regardless.
Bad analogy. Spam is not an organism or infection. It is a business model. It does not "survive" in computers, but in a combination of economical, technical and legal conditions. Once those conditions become strongly unfavorable to the business model, there isn't really much that adaption can do. Selling "snake-oil" wonder cures used to be a really big, widespread business model. Better-informed consumers and increased regulation of the market for medicine have all but eradicated this practice. It survives, but in a much-changed and diminished form.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Kofi Annan declares the end of wars.
If this was published by O'Reilly, I'd have bought it on sight as they bother to edit their books. As it is, I'll give it a wide berth.
Spam filtering is crap. It's like having to wear a bullet proof vest because people will be firing at you while you drive to work. Excuse me for thinking it, but no one should be taking shots at you for no good reason.
We need to have an automated way of dog-piling the retail site that the spammer is trying to lure you to.
Every time a spammer sends an email for viagra our email client should go to the site and fill out the order form 50 times per second... incorrectly.
There is simply no more time to be pussies about this shit. Spam filtering has been given plenty of time to fix this problem. It's time for something new and aggressive.
VERY AGGRESSIVE.
THE TIME IS NOW!
thank you for your time.
The government which is strong enough to protect you from everything is strong enough to take everything from you.
Reminds me of the conversation at the end of Batman Begins with Gordon and the Bat:
Gordon: "Batman making a stand as he has will only escalate the problem."
If suddenly the masses are educated on spam filtering, wouldn't spammers just adopt tactics to avoid them?
I mean it is after all a "spammers' market". They have increased resources because they're getting all the money. I'm sure the spammers are much smarter than most techies who use filters, they just don't care. They think, "If this techie is going to use a filter to stop my spam so be it, there are 100 people for each one of him that won't."
No, we need to think of new techniques outside of filtering. Filtering is mostly nonsense, manual work. We need something philosophically different from filtering, which affects how spam comes through in transit, or something that affects the financial backing of spammers.
We should be breaking down their lines of communications, etc - not expecting granny to take up spam filtering techniques.
I've been looking for a complete list of current and future technologies to allow me to better get around them and send more spam.
Thanx!!!
Even a manservant reading all of my mail and hand-carrying printouts of nothing but personal messages to my Jamaican bungalow doesn't "end" spam.
It would seem that These Guys are actually making an attempt to "end" spam.
All this guy is talking about is hiding it from view. Big deal...
Bad analogy. Spam is not an organism or infection. It is a business model. It does not "survive" in computers, but in a combination of economical, technical and legal conditions.
True and False.
Spam acts like a parasitic organism, due to the favorable conditions for the business model. It does, in some cases, actually "survive" in certain computers, which are spam zombies that spew out spam from a spam source - in fact, there are a few at the other UW (in Wisconsin) which utilize the identified computers there to get thru the filters here (in Seattle).
Informing consumers is highly unlikely to stop this behaviour - or else AIDS/HIV would have been halted. Some consumers are highly resistant to changing their behaviour, don't think it's important, or it's such a good deal what would it hurt.
And, like the malarial mosquito, spam uses those responders (infected persons) to download more spam zombie software, since they tend not to be technical enough to remove the infection.
-- Tigger warning: This post may contain tiggers! --
So long as the fiscal, legal, and societal penalties for spamming are fairly low and the rewards are high, and while most people do nothing about it, it will spread.
I agree wholeheartedly... Most technological screening solutions would only be a temporary remedy. In the long run it will be stricter legislation that will impede spammers' efforts.
Rule 1 (Spammers always lie) won't change, though occasionally they'll think of new things to lie about. Rule 2 (Spammers are Stupid) won't change, though of course some spammers violate this rule, and some spammers can hire smart people to work for them, and enough of them are sufficiently persistent skr1pt k1dd13z that it sometimes makes up for stupidity.
The latest and greatest spam-blocking technique will last a while before spammers find a way around it - it's somewhat of a losing game, because if it works well enough to be widely popular, it becomes a target for spammers to work around, though if it's effective and obscure, it'll work for you and your friends for a lot longer.
PC users will continue to run insecure operating systems without administering them well, so there'll always be zombies for spammers to abuse. Windows automatic updates will gradually help this, but not only will new OS bugs get discovered frequently, but users will insist on running trojan horses that pretend to be new amusing programs, breaking any semblance of security.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
You're exactly right. I've been running Spam Vampire 24/7 for quite some time now (1-2 years). Works great. Quit bitching and do something about it!
I don't respond to AC's.
So then, anyone in the world who believes in creationism is a twit?
Absolutely. Do you have another word for somebody who ignores all scientific evidence, and instead believes in some imaginary man who lives in the sky and performs miracles? I think that "psychotic" or "delusional" or "schizophrenic" also work, but "twit" is pretty good, too!
I don't respond to AC's.
It does, in some cases, actually "survive" in certain computers, which are spam zombies that spew out spam from a spam source
That's not survival in the "organism" analogy, since a zombie will not send spam without a source, which will be gone when the business model is not workable, and especially not cause new source to appear.
like the malarial mosquito, spam uses those responders (infected persons) to download more spam zombie software, since they tend not to be technical enough to remove the infection.
You're mixing up the spreading of "zombie" software that is used to send spam with the spreading of spam itself.
I totally agree that computer worms/viruses work very much like an infectious disease. But they are merely one tool that spammers use, not identical with the phenomenon of spam as such.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
I totally agree that computer worms/viruses work very much like an infectious disease. But they are merely one tool that spammers use, not identical with the phenomenon of spam as such.
Just as a mosquito is merely a tool the malarial parasite uses to spread itself.
Let's say we knock out something that permits mosquitos to infect human hosts. Chances are that it might only partially impact malarial infections of non-human hosts. The impacted malarial bug, provided it survives and breeds, may then decide to use another vector to complete the infection.
Same with spam - we can knock out the zombies. We can knock out the spam kingpins. We can make the email transmission more secure - it migrates to cell phones or text messages or video messages. Unless we go for species extinction, it is likely that it won't die, but will instead change.
Nowadays I rarely see pop-under ads any more - due to using different browsers - but now ads show up that are movies, which really burn up my bandwidth. To kill off those ads, I would have to disable the very useful site portions that I do want.
So long as the evolutionary niche exists that permits spamsters to make a buck or two from sending spam, so long as people don't turn in most spam, so long as some people buy from spamsters, and so long as most spamsters don't serve long jail sentences and are never caught, it is highly unlikely that spam will cease to exist.
-- Tigger warning: This post may contain tiggers! --
Spam may not be an organism or an infection, but the people who send it are. So I think it is a perfect analogy.
Why does it sound like the only people who will buy the book are the people who are trying to beat the filters?
Obligatory comment:
I have a peanut allergy you insensitive clod!
Seven puppies were harmed during the making of this post.
If they're adopting SenderID, it makes it easy to filter them. You can't filter just on the existence of SenderID; you need to check who the sender is and ignore email from known spammers.
That's a good thing. It lets them spew all of the email they want; let's call it freedom of speech (since I don't want any legal limitations on spam also being used to prevent legitimate speech). And I get to ignore them; I can filter them at the SMTP layer even before they get to send the whole message.
It may not be successful yet, if people are misusing the technology by trusting the existence of a Sender ID record to mean it's not spam. But don't blame the technology for being misused.
Zdziarski's claims for the performance of DSPAM are just as fantastic as his creationist claims.
He presents not one iota of scientific evidence that DSPAM is a good filter. Here's an article that shows that DSPAM kinda sucks compared to the competition.
The fundamental problem is that technology pushed the *costs* of sending mail and creating identifiers (IP addresses, domain names, email addrs, etc.) to near-zero and the cost of finding recipients to near zero, human nature makes it profitable to send gullible people mail if you've got no morals, and the popularity of the internet means that people with no morals can easily get the tools to use it. Willingness to spam is a social problem, and economics have made it possible to become an actual problem. The real cost of sending mail isn't likely to go up: encryption affects it a bit, but CPU time is basically free, and you can attempt to impose artificial prices on email transmission, which will fail, if you get it accepted at all, because they don't match real prices. You can use technology to increase the cost of discovering recipients, using things like tagged addresses and subdomain-per-user naming that increase the search space, and you can use technology to reduce the amount of mail a given group of senders can send to a given receiver. *Recipients* can impose prices or other throttling mechanisms on senders without disrupting most of the other infrastructure, which can help - I know a number of people who find that simple TMDA/Captcha techniques kill off most of their spam, by increasing the cost of discovering an email address that they'll *read* (the cost is the attention span of having a real human read the captcha image, plus the need to use a real email address to send from instead of a bogus one) - but even they say that it annoys some people they'd really like to get email from.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
No, because the anti-spam measures do not aim to kill those people, only to make them stop sending spam. Furthermore, spammers are not a separate species and do not reproduce (as spammers).
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Read it and ask yourself:
Just as a mosquito is merely a tool the malarial parasite uses to spread itself.
Except that spam does not use zombies to spread itself, SPAMMERS use zombies to spread spam.
Your analogy is simply flawed. Spam is NOT an organism. It does NOT "survive" somewhere, adapt and spread from the places where it survived.
And we certainly DO go for "species extinction", by eliminating the conditions that make spam practicable and profitable. You enumerate some of those conditions yourself in the end.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Obligatory dismantling of your post:
1. Milky Ways do not have peanuts.
2. "Blah blah blah, you insensitive clod" is old and unoriginal (as is "I, for one, welcome our X overlords" and "1. Blah. 2. ?? 3. Profit!").
3. Die.
No, because the anti-spam measures do not aim to kill those people
Yet.
Obligatory rebuttal:
Although Milky Ways do not have peanuts, if you had bothered to read the parent the poster was talking about Snickers, which does!
We all die. Race ya!
Seven puppies were harmed during the making of this post.
While at defcon I found this book called "Spam Cartel" which is very very interesting and revealing.
I also know an acquaintance who developed a unique and effective program to "finger" every spam-bot-infected PC, and with a "secret" program under trial, it shut down more than 550,000 spam-sending infected PCs.
Reports from the SPAM CHAT channels indicate it was very effective in nailing down and eliminating spam bots.
The experiment was ongoing for about 4 months last year, and WOW! I had no idea there were that many spam bots...
Word I've gotten is that a few "Checks and Balances" need to be deployed to prevent abuse... but I can imagine what would happen if more mail servers deployed such a system.
J
Except that spam does not use zombies to spread itself, SPAMMERS use zombies to spread spam.
Your analogy is simply flawed. Spam is NOT an organism. It does NOT "survive" somewhere, adapt and spread from the places where it survived.
And we certainly DO go for "species extinction", by eliminating the conditions that make spam practicable and profitable. You enumerate some of those conditions yourself in the end.
If it looks like a duck, and it quacks like a duck, and it paddles like a duck, you want me to check to see if it's a robotic assembly of nanobots pretending to be a duck.
Nah. My point is/was - not that I brought up the biological equivalency of spam to malaria (someone else did, and I said it isn't, but it could be thought of that way) - that even should we find a "cure" for spam, it would come back so long as the underlying model rewarded the spamsters in some way for carrying on.
So long as up to half the population won't report spam - in fact, it's more like 99 percent;
So long as enough people buy from spamsters to make it economically rewarding - which it is;
So long as the penalty is remote enough or far enough in the future to be ignored - which it is;
And so long as society encourages the pursuit of wealth above moral/ethical standards - which it does;
This won't change.
Sure, you can plug up a hole in the dike. I can - and do - turn in spamsters. But they will migrate and adapt.
Are they infectious diseases? Sometimes, see the use of zombies.
Can we truly eradicate them? No, because people will replace the prior spamsters so long as the aforementioned conditions persist.
Want to cut down malaria? First, find easy methods of fixing the poor sanitation that allows it to persist. Then find ways to interfere with the malarial infection of humans. If you do it backwards, it's likely that many places will still spread it. Because not everyone is rich like we are.
Same goes for spam - find ways to make it unrewarding for people to buy from spamsters (e.g. sell Viagra etc cheap, offer open source versions of office cheap - that's what they sell), find ways to make it bad to be a spamster, and then batten down the hatches with new protocols.
-- Tigger warning: This post may contain tiggers! --
Blacklist everyone, then whitelist only those people who you really want to communicate with. I've been doing it for years and get ZERO spam. People argue that they will miss important messages - nope, I never have. Email is not the only form of communication. All my family, friends, business clients know how to use the phone if their emails bounce. I have a web form (and phone number) for new clients (and once verified they are whitelisted), and I don't give a shit about the few messages that might not make it (although after several years of using this method I have no evidence that I've missed even one).
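The whitelist-only policy described above can be sketched in a few lines. This is an illustrative example only: the addresses and the `accept` helper are hypothetical, and since the From header is trivially forgeable, a real setup would need something stronger than a header check.

```python
from email.utils import parseaddr

# Hypothetical whitelist; everyone not listed is treated as blacklisted.
WHITELIST = {"mom@example.org", "client@example.com"}


def accept(from_header: str, whitelist=WHITELIST) -> bool:
    """Accept a message only if its From address is on the whitelist."""
    _display_name, address = parseaddr(from_header)
    return address.lower() in whitelist
```

Everything that fails the check would be bounced, with the web form or phone number serving as the fallback channel the poster mentions.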
Next on Slashdot: "Establishing Utopia."
Greylisting solves 95% for me - seriously. Try Postgrey for an easy, built-in solution to use with Postfix - it works like crazy.
bad_outlook
--
Is this vague enough for you?
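For readers unfamiliar with greylisting as recommended above, the core logic is small: temporarily reject the first delivery attempt from an unknown (client IP, sender, recipient) triplet, and accept once the same triplet retries after a minimum delay. The sketch below assumes a 300-second delay and an in-memory dictionary; the class and its reply strings are made up for illustration and are not Postgrey's code or configuration.

```python
import time


class Greylist:
    """Minimal in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300.0):
        self.min_delay = min_delay  # assumed delay in seconds before a retry is accepted
        self.first_seen = {}        # triplet -> timestamp of the first attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return an SMTP-style reply for one delivery attempt."""
        now = time.time() if now is None else now
        triplet = (client_ip, sender.lower(), recipient.lower())
        # Record the first attempt; later attempts reuse the stored timestamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return "250 OK"
        return "450 try again later"
```

Legitimate mail servers queue and retry after a temporary failure, while much spam-sending software does not bother, which is where the reported reduction comes from.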
Some of the previous posters mentioned the rather eccentric views (in my opinion) of the author of Ending Spam (Jonathan Zdziarski). You can sample some of these yourself by reading the essays Mr. Zdziarski has posted on his web site NuclearElephant.com.
While someone might have, in practice, unlimited amounts of money, none of us have unlimited amounts of time. So a book is always an investment in both time and, for those with more finite amounts of money, cash. With this in mind, there is the question of whether one should read a book by someone who is rather eccentric in their views. Will this eccentricity and, in my opinion, limited knowledge outside of narrow areas, also mean that the book is equally flawed?
I'm undecided. My concern is that Mr. Zdziarski's knowledge of Bayesian filtering and other topics has the same kind of holes that seem to exist when he applies his intellect to other areas (like evolution of both life and the solar system). While this is a concern, it is not a foregone conclusion. The history of science and, especially, mathematics, is full of giants in their field who were also very eccentric.
Mr. Zdziarski seems to have what I would classify as a narrowly focused intellect and perhaps within these narrow confines the reader can rely on what he writes. DSPAM, the spam filter written by Mr. Zdziarski, seems to be a strong competitor to SpamAssassin. So on this basis, perhaps the book may be a good investment.
How about big fines for the companies that advertise with spammers? ($1/message!) Figure out how to tax their illegal income and file tax evasion charges! (Works on the mob!)
...sounds like a reality show for Fox!
Or
Jhunkhad: A Holy War Against the Infidel Spammers!
In front of a camera, stand them up and make them recite that they have small, flaccid penises and need to refinance their homes and consolidate their debt because they owe all their money to hot horny teen girl web cam sites. Then slap them with a herring until they are unconscious.
I might know what I'm talkin' about, but then again, this is Slashdot...
While I don't get the 99% or whatever success rate that DSPAM is claimed to get, I get at least 96%. It is pretty good. It's better than what I got from SpamAssassin and doesn't require any manual tweaking of rules. One thing that does make DSPAM suck, though, is that it requires a massive database backend. It does not scale well at all.
-matthew
"THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
Tada
And spammers reproduce via cellular mitosis, like they're supposed to.
If corporations are people, aren't stockholders guilty of slavery?
Generally, home users in the US don't have metered bandwidth. If you have a limit at home, then you should look into finding a new provider.
Moving to North America costs a lot of money and time (which is money).
It doesn't affect zombies. If you took the time to read, you'd see that this hits the website being advertised.
Which might have the DNS running on two zombies and the HTTP running on two more zombies.
Did you just fill in the list at random?
Free Hans!
Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003
Big deal, I've been fighting spam since 1995.
I've seen this "checklist" format in a couple of semi-humorous posts on Slashdot recently, and it's made me wonder whether there's a parody going on.
Is this some US government standard reply that all you guys across the pond are familiar with? (Let this Limey in on the joke...)
Surely that means it does scale well? It might be a pain to have a DB backend for a single user, but for scaling up to 1,000s of users, that's exactly what you want.
But that's exactly what we've been seeing over the years.
Granny has never filtered a spam in her life. The ISPs have taken up automated spam filtering on her behalf. That's why the spammers can't stand still and let just us techies filter their sludge. The techies took the fight to the next level, blocking spam further up the chain so the benefits of spam-blocking translated to everyone. Thus, we've seen the counterevolution of spam -- when "viagra" got blocked we saw simple 133t-sp33k substitutions for things like "vi4gra"; with the advent of Bayes filtering we now see random text words combined with pictures of the real spam text.
The spam filterers should have taken a page from the hospitals. Doctors NEVER issue prescriptions for vancomycin outside of a hospital, in hopes that the practices that have led to so many antibiotic-resistant diseases wouldn't allow bugs to evolve to resist vancomycin. They kept the most potent stuff in reserve. Like them, the filterers should never have given Bayes filtering to companies like AOL. If they just quietly ran it on their own boxes, they'd be spam free today.
John
That's why the solution has to treat the evasion of spam filtering like any other sort of computer cracking (i.e. a federal offense resulting in a few years of PMITA prison).
/. If the government wants us to respect the law, it should set a better example.
We use DSPAM with 85,000 mailboxes, merged groups, and a MySQL backend on three Xeons, and have had no problems. If it's not scaling for you, you're doing something wrong.
For me a simple statistical spam filter works well enough: 50-100 spam go to the junk pile per day, and my 10 emails sit in my mailbox.
The spam burden is really on the network providers and the network and computer resources used. Yes they pass some of the costs on to customers.
But we get a huge pile of crap mail in the regular mail box too, from basically the same scammers with the same clogging of the communication channel: paper spam disguised as bills or checks, and the same scams for drugs, mortgages, tanning, software, even porn. I do not have a filter for that, too bad.
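To give a sense of what a "simple statistical spam filter" like the one mentioned above might look like, here is a toy naive Bayes classifier over word tokens with add-one smoothing. It is one simple variant written for illustration, with made-up class and method names; it is not the algorithm used by DSPAM, SpamAssassin, or any filter discussed in the book.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        # Per-class count of messages containing each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message as 'spam' or 'ham'."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Combine per-token evidence in log-odds space with add-one smoothing."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(tokenize(text)):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so the exponential below cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

In use, each message the user marks as junk or not junk is passed to `train`, and anything scoring above some chosen threshold goes to the junk pile.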
No, it doesn't scale well. Have you run DSPAM for 1,000 users? It requires at least one dedicated DB server, maybe even two, with lots of memory and lots of fast disk. It does scale because you can always throw more hardware at it, but it doesn't scale well.
-matthew
"THERE IS NO JUSTICE, THERE IS ONLY ME." -Death
For those who don't know, TMDA is a challenge-response-based, server-side system. It's open source, written entirely in Python, and works with all client mail readers. Check it out.
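The general challenge-response flow can be sketched as follows: mail from an unknown sender is held, a one-time token is sent back as the challenge, and confirming the token releases the message and whitelists the sender. The class below is a hypothetical illustration of that flow, not TMDA's actual code or API.

```python
import secrets


class ChallengeResponseQueue:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self):
        self.whitelist = set()  # senders who have already confirmed
        self.pending = {}       # token -> (sender, held message)

    def receive(self, sender, message):
        """Deliver mail from known senders; hold and challenge everyone else."""
        sender = sender.lower()
        if sender in self.whitelist:
            return ("deliver", None)
        token = secrets.token_urlsafe(16)  # unguessable one-time token
        self.pending[token] = (sender, message)
        return ("challenge", token)

    def confirm(self, token):
        """Release the held message if the token is valid, else return None."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        sender, message = entry
        self.whitelist.add(sender)
        return message
```

The cost to the sender is the need for a working return address and a moment of attention, which is also why some legitimate correspondents find such systems annoying.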