Fighting Spam with DNA Sequencing Algorithms
Christopher Cashell writes "According to this article from NewScientist, IBM's Anti-Spam Filtering Research Project has started testing a new spam filtering algorithm, an algorithm originally designed for DNA sequence analysis. The algorithm has been named Chung-Kwei (after a feng-shui talisman that protects the home against evil spirits). Justin Mason, of SpamAssassin, is quoted as saying that it looks promising. A paper is available on the algorithm, too (PDF)."
I wonder what the spammers will come up with to get around this...
Excellent! This will go well with my Feng Shui compliant wall of rocks that I use as a firewall.
Even with training, isn't this just some regexps and searching for particular strings?
And what about short messages that don't use as many words: is the spam score relative or absolute? The article is a little low on details. Can anybody point to some more informative articles?
... in Thunderbird works for me.
I have to say the adaptive spam filter in Firefox works pretty darn well. I have tried other adaptive spam filters as plugins in Outlook and they work pretty darn well too.
With the nature of new spam messages that look like real emails, the only person who can really tell if something is spam is the recipient.
http://github.com/gbook/nidb
Funny how some people develop more and more sophisticated tools to fight something as simple as sending out emails to random addresses... so simple that it will never stop :/
I think you mean Mozilla Thunderbird?
This isn't "fighting spam", it's "adapting to spam".
For now, Bayesian filtering still gets the job done most of the time, so I think we shouldn't get too excited.
Besides, you have to ask yourself some questions...
"What happens if you try to filter spam with RNA?"
"Just how good can ACT and G manage spam?"
and, most important of all...
"Are you sure this spam filter uses no portion of Keanu Reeves' genetic code?"
You will be baked, and there will be cake.
I really like the programs, but I get their names confused... I meant Mozilla Thunderbird in the above post.
You have to love SpamAssassin for its very Perlish approach to spam filtering... "hey, there's a cool new way to filter spam... throw it in!"
I love this mostly because it means that SA is a moving target. Spammers can figure out how to defeat pieces of it, but it deploys a wide range of static, dynamic, network-based and user-driven tests that change so much that spammers simply can't afford to keep up.
It looks like much of the spam I'm receiving today consists of either nearly-blank e-mails or e-mails containing news articles that seem designed to pass through content filters, just so users will send them back to their admins as spam, which essentially makes it easier for Bayesian filters and the like to mark legitimate e-mail as spam... though honestly, it's more of an annoyance for me, as it makes it easier for users to say "The spam filter isn't working, what are you doing wrong?"
According to the /. title, it seems they used an algorithm for DNA sequencing, when in fact they used an algorithm for DNA analysis (or DNA sequence analysis, which is the same thing); more specifically, gene-finding techniques. As you may know, most DNA in a genome is not translated into protein (some people still call it junk, but most of it is not junk at all). So there are programs to sort genes out from the rest of the DNA.
I think we will see more and more applications like this with the growing cross-pollination between biology and CS.
DNA in your Linux: DNALinux
It's too early in the morning, I guess. When I read the title of this article, I immediately thought it meant we should test the 'DNA' of incoming emails.
And then I wondered what the biorhythm of an email would be. I need to go back to bed.
I don't care about the cost of spam. With my 1MBit connection it doesn't compare to my other downloads.
I just don't want to read it - and now I don't have to.
I'd love to meet the scientist who thought this up. It probably went something like this:
Boss: Well, we've made promising gains in the DNA research project. Now, what applications could this be used for?
Engineer: The possibilities are limitless! We could cure cancer! We could invent a super puppy that combines the abilities of a lovable puppy and Tux, the friendly Linux penguin! We could use it to regenerate limbs for amputees!
Marketing: Let's use it to get rid of spam emails!
Boss: Great idea! Let's go with that one.
You are confused.
To block spam at the transport level is one thing; an algorithm for identifying spam without human intervention is another entirely.
I suggest you RTFA. Their method is actually pretty interesting. Lackluster is not the appropriate word for the novel idea they have come up with.
Blearf. Blearf, I say.
This is interesting and promising technology. But like all antispam techniques, spammers will find a way around it. Once spammers get a copy of the software, they can create and test countermeasures in the comfort of their own sleazy lairs.
For example, the article mentions the software accepts a message that is long but has a few "spammy" sequences. This suggests an immediate countermeasure of adding bulk to spam -- appending a copy of some news article to the spammy payload (some already do this).
Personally, I've always thought that a simple spell check would do a good job as another layer of filtering. It would place spammers in a no-win situation: either the keyword filter or the spell check filter would get them.
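A minimal sketch of that spell-check layer in Python, assuming a word list is available. The tiny `DICTIONARY`, the tokenizing regex and the 0.3 threshold are hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical toy word list; a real filter would load a full dictionary.
DICTIONARY = {"buy", "cheap", "now", "see", "you", "at", "the", "meeting", "tomorrow"}

def misspelling_ratio(text: str) -> float:
    """Fraction of tokens that are not in the dictionary."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in DICTIONARY)
    return unknown / len(words)

def fails_spell_check(text: str, threshold: float = 0.3) -> bool:
    """Flag the message when too many tokens look misspelled."""
    return misspelling_ratio(text) > threshold
```

Obfuscated spellings such as "v1agra" push the ratio up, while correctly spelled keywords pass this layer and are left for the keyword filter.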
Two wrongs don't make a right, but three lefts do.
And btw, WAKE up, ppl. 'Filtering' won't make SPAM *ever* go away. As long as you keep on filtering, I guess it'll act as a remedy that 'relieves the pain', but it isn't a cure that'll kill the 'cancer' for good.
And on a different side note, 'filtering' costs us consumers more money in the long run, as it's we who pay for the SPAM, whether we look at it or keep filtering it away (shouldn't such activities be HIGHLY illegal in any justice system?).
Because it's we who pay for the broadband the ISPs deliver to us, and they have to charge us according to how much it costs them to sustain it (plus some profit margin).
SPAM eats like *what was it* 60-80% of the total broadband (worldwide) now?!
And yes siree, you and I are the ones paying for it, if all we do is keep on 'filtering' it...
I don't claim I know more than I know, and if you know you know more than I know, then by all means, let me know.
I won't use anything on my computer with a chink name or written by a chink.
Try looking at your computer and see where 99% of the parts were made, you racist asshole. Or your TV. Phone. Answering machine. And everything else in your home (or, more probably, your parents' basement.)
So, yeah, nice technology, but nothing the bad guys can't get around. If you are serious about stopping spam, stop playing with your computer and start bugging your congressperson.
First, there's a constant tuning of both predator and prey (anti-spam tools and spam).
Second, there seems to be some sort of equilibrium which is inevitably achieved, and
Third, there are occasional discrete major developments which change the game. This would be an example. Now spam is going to be forced to adapt in a major way.
I could see the 'quality' of spam improving a lot as a result of tools like this. No more letters from my long-lost benefactors in Nigeria, and no one-liners about 'Gushing like a firehose' (my coworkers and I got a good chuckle out of that one), but, as the story said, if you have keywords in a long email, it gets penalized far less. OK. Attach verses from Dante's Inferno, or Joyce's Dubliners, to the email. Problem solved. You can't block words like viagra altogether, or Pfizer researchers are going to have a hell of a time getting anything through.
Another concern is that if this forces spammers to make up new and compelling spam, people will be more likely to check it out. While my parents are probably pretty confident they didn't win a secret lottery 3 or 4 times last week, they might possibly believe new and creative stories.
Perhaps evolution of email readers is just plain going to be a necessary part of the solution...
You are confused.
Rather more confused are the slashbots who tout client-side content filtering as the end-all be-all "solution" to spam.
To block spam at the transport level is one thing; an algorithm for identifying spam without human intervention is another entirely.
The only catch: it's not possible to identify spam (unsolicited bulk e-mail) based on content alone. Why? Because of two words in the definition: 'unsolicited' and 'bulk'. How can the presence of the word 'viagra' possibly tell me the message was unsolicited? Even if I'm not interested in buying Viagra, I can still receive important e-mail containing spammy words that is neither bulk nor unsolicited (like spam complaints about my users). The bulk criterion is even harder to predict using content filtering alone. About the only solution I know of that addresses this point is the Distributed Checksum Clearinghouse.
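Because "bulk" is a property of how many mailboxes receive the same message rather than of its words, a checksum clearinghouse counts reports of the same message. The Python sketch below illustrates only that counting idea. The normalization rules, the `BulkCounter` class and the threshold of 3 are hypothetical; they are not the checksums the Distributed Checksum Clearinghouse actually computes.

```python
import hashlib
import re
from collections import Counter

def fuzzy_checksum(body: str) -> str:
    """Hash a normalized body so trivially varied copies collide.

    Hypothetical normalization: lowercase, mask digit runs, collapse whitespace.
    """
    text = body.lower()
    text = re.sub(r"\d+", "#", text)
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

class BulkCounter:
    """Counts how many recipients reported each checksum."""

    def __init__(self, bulk_threshold: int = 3):
        self.bulk_threshold = bulk_threshold
        self.counts = Counter()

    def report(self, body: str) -> int:
        key = fuzzy_checksum(body)
        self.counts[key] += 1
        return self.counts[key]

    def is_bulk(self, body: str) -> bool:
        # Counter returns 0 for unseen keys without inserting them.
        return self.counts[fuzzy_checksum(body)] >= self.bulk_threshold
```

Note that this only answers "bulk?", not "unsolicited?": a newsletter you subscribed to is bulk too, so such counts would need to be combined with a list of solicited senders.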
I suggest you RTFA. Their method is actually pretty interesting. Lackluster is not the appropriate word for the novel idea they have come up with.
The method might be novel. The purpose (content filtering spam that's already been delivered) is not. Such methods simply don't address the costs of receiving and storing spam, only the perceived user inconvenience.
My biggest issue with Thunderbird is the bounce messages. A fair number of people forge addresses that bounce to me (I'll be putting up SPF Real Soon Now, but that doesn't even mean everyone will check it). As a result, I get some legit bounce messages and some with spam in 'em. If I mark the ones with spam as Junk, I risk throwing away the ones without spam. If I mark the ones with spam as not-junk, I get spam similar to them thrown into my Inbox.
The World Wide Web is dying. Soon, we shall have only the Internet.
John Graham-Cumming presented a talk Beating Bayesian Filters at the 2004 Spam Conference detailing these results. A video recording is available; alas, no paper.
In conducting a recent spam filter evaluation I observed (but did not report) that the statistical filter attacks were not particularly effective. The only attack that worked sometimes was to make the entire body of the message a current news item or joke, with only a URL linking to the spam payload.
Stop devoting resources to dead-end technological solutions! The problem of spam is the problem of unauthenticated e-mail. Add authentication to the mail delivery protocol and the problem of spam goes away.
To get around this spammers will use DNA algorithms to create spam that gets around the blockers ;)
Chung-Kwei is a Chinese semi-deity who wards off evil. He isn't some kind of talisman.
Congratulations /.
By now, all the patent-trollster-lurkers who passively phish in the /. pool must be rushing to their friendly neighborhood USPTO with suitably edited claims.
Can anyone who works in IP (intellectual property, NOT Internet Protocol) post a list of known trollster companies that are full of lawyers who acquire patents (by any means) and make patent litigation their primary business model?
See that long UID - that's what you get for lurking too long
"We put the CPU in the center, because that is the chi, or life force for the entire board. A centered chi provides better performance." Now don't you want one?
I only post comments when someone on the internet is wrong.
This will make another nice tool to identify spam. But why not use greylisting at all the ISPs' MTAs to simply refuse 99% of the spam that is being sent right now?
Seriously, greylisting implemented on all the ISPs' MTAs would block 99% of the spam being sent overnight. Most spam at the moment is being sent from armies of bots running on unsuspecting users' systems connected to cable and DSL service. The programs used are unsophisticated: they churn through a list of addresses, spewing messages out by the thousands. They do not queue messages or retry them if they get an error. Greylisting uses this to great effect and blocks spam while letting legitimate MTAs deliver messages.
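A minimal sketch of the greylisting decision, keyed on the usual (client IP, envelope sender, recipient) triplet. The `Greylist` class, the 300-second delay and the injectable clock are assumptions for illustration; a production greylister would also expire old entries and whitelist known-good servers.

```python
import time

class Greylist:
    """Greylisting sketch keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, delay_seconds: float = 300.0, clock=time.time):
        self.delay = delay_seconds
        self.clock = clock              # injectable so the logic can be tested
        self.first_seen = {}            # triplet -> time of first delivery attempt

    def check(self, client_ip: str, sender: str, recipient: str) -> str:
        triplet = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.delay:
            return "250 OK"
        # Temporary failure: a real MTA queues and retries later,
        # while fire-and-forget spam bots typically never come back.
        return "451 4.7.1 Greylisted, try again later"
```

Legitimate mail is delayed only once per triplet, by roughly the greylist delay plus the sending server's retry interval.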
True, it is not 100% effective: some small number of spam messages get through, since some spam goes through legitimate MTAs and the message is retried. But once you remove the bulk of the spam, those senders can be tracked down and shut down or blocked at the firewalls.
If the ISPs would implement this, spam would become a non-issue overnight. Email would once again become a mostly useful tool. But I guess the problem is that the ISPs have no vested interest in solving this problem. None of them will listen to or implement this simple solution, which does not block any legitimate email. With 70% of the email on the network being spam (the number may be higher than that at this time), I would think they would jump at a solution that would reduce the loads on their servers. But I guess they make too much money from spammers to implement such a simple solution.
We must stop these terrorist spammers.
Now watch this drive!
Summary
1) Make your PC face the North, whenever you are checking Email.
2) Hang a metal windchime above your workstation.
It is important that the rods of the windchime be hollow, so that the auspicious Chi can rise up the chimes.
3) Add a user account for the Dragon Turtle & make him the admin.
This is just like your own immune system, which uses such things as "V-D-J" recombination (and other tricks) to create billions of somewhat random, different antigen receptors to attack potential unknown pathogens. Those cells must then be further educated not to attack "self" in your own body.
If only computer geeks took some lessons from biologists, perhaps they could get a grip on principles to stop SPAM.
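The "educate cells not to attack self" step has a direct analogue in artificial immune systems called negative selection: generate random detectors and discard any that match "self". The Python sketch below applies it to character trigrams, treating legitimate mail as "self". This is a named, swapped-in technique, not something the comment or IBM specifies, and the alphabet, trigram size, candidate count and helper names are hypothetical.

```python
import random
import string

def trigrams(text: str) -> set:
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def train_detectors(self_texts, candidates: int = 5000, seed: int = 0) -> set:
    """Negative selection: keep random trigram detectors that match no 'self' text."""
    rng = random.Random(seed)
    self_grams = set().union(*(trigrams(t) for t in self_texts))
    alphabet = string.ascii_lowercase + string.digits + " "
    detectors = set()
    for _ in range(candidates):
        candidate = "".join(rng.choice(alphabet) for _ in range(3))
        if candidate not in self_grams:   # censor detectors that would attack 'self'
            detectors.add(candidate)
    return detectors

def anomaly_score(text: str, detectors: set) -> float:
    """Fraction of the message's trigrams that some detector recognizes."""
    grams = trigrams(text)
    if not grams:
        return 0.0
    return len(grams & detectors) / len(grams)
```

By construction, a message made only of "self" trigrams scores 0, so surviving detectors can fire only on character patterns never seen in legitimate mail.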
I suggest you read Slashdot
It's my belief that the most likely source of the birth of Artificial Intelligence will be the SPAM filter.
Think about it - we now have software that "learns" what you like.
Sorry, but anything that "learns" fits a definition of intelligence - using past results to predict future outcomes. Note that I'm not saying "self aware" or "conscious", simply "intelligence".
As we move forward, we'll see more and more intelligence on the part of the spammers, and the warring factions of intelligence will likely provide massive financial and political impetus to build ever more intelligent solutions - thus AI is born.
The problem with other vehicles for developing AI is simply the budget. With SPAM, everybody has a direct, financial incentive to develop it, so development will definitely happen!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
I think over the next 2 decades, we'll come to a greater understanding of life - and I think that we'll discover a unique aspect of life - that life is truly information technology.
Each cell in your body contains approximately 20 GB of data. Consider the redundancy and sheer massive size of information storage capacity your body consists of! Compare THAT to an Oracle cluster...
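For comparison, a back-of-envelope estimate of the raw sequence data, assuming roughly 3.1 billion base pairs per haploid human genome, two copies in a typical cell, and 2 bits per base with no compression:

```latex
\begin{aligned}
2 \times 3.1\times10^{9}\ \text{bases} \times 2\ \tfrac{\text{bits}}{\text{base}}
  &= 1.24\times10^{10}\ \text{bits} \\
  &= 1.55\times10^{9}\ \text{bytes} \approx 1.55\ \text{GB}
\end{aligned}
```

On those assumptions the raw DNA sequence comes to around 1.5 GB per cell, so a 20 GB figure would have to count more than the sequence itself.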
So, given the incredible need to process information in order to understand life itself (which could be considered a form of self-replicating information) I think that not only is it likely, but it's all but guaranteed that the lion's share of Information Technology advances will come from biological research.
PS: nanotechnology == microbiotics. Why re-invent the wheel when nature has spent billions of years perfecting nanotechnology? I think the "nanotechnology revolution" will be largely biological, with technological extensions.
When we speak of "the singularity", I think that's the point where our (currently abiological) technology fuses with biology to where they aren't clearly defined any longer.
Man or machine? Who can tell? How would you define either one?
I have tried just about every single anti-spam software out there, so I have some experience. After being fed up with getting false positives and having to deal with tons of spam getting past the spam filters I tried out Cloudmark's Spamnet - a community based approach to fighting spam. So far it has been 95-99% effective with 0 false positives which is the most important factor for me.
In the past couple of months it has blocked 19,221 spam messages. I don't even bother to send spam to a Spam folder anymore; it just goes straight to Deleted Items.
For those of you getting a lot of e-mail, the price of the subscription is definitely worth it.
URL: http://cloudmark.com/products/spamnet/
I just went through a couple of rounds of interviews with a spam filtering company about doing something similar. The problem these days is that spammers have figured out that "V1AGRA" can be spelled in a number of ways which fool word-based spam filters. There is also a lot of hidden information, such as html and urls, which may be significant, but is difficult to identify with exact string matching.
The approach used to be:
1. Find features (usually well-delimited words) in the message.
2. Look up the features in a database of precalculated scoring information.
3. Add up the scores for all the features found, using some buzzwordy algorithm.
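The three steps above can be sketched in a few lines of Python. The `FEATURE_SCORES` table is a hypothetical stand-in for the precalculated database, and a plain sum stands in for whatever scoring algorithm a real filter uses.

```python
import re

# Hypothetical precalculated feature database (positive = spammy, negative = hammy).
FEATURE_SCORES = {"viagra": 2.5, "mortgage": 1.5, "free": 0.75, "meeting": -1.25}

def extract_features(message: str) -> list:
    """Step 1: well-delimited words, lowercased."""
    return re.findall(r"[a-z0-9]+", message.lower())

def score_message(message: str) -> float:
    """Steps 2 and 3: look each feature up and add the scores."""
    return sum(FEATURE_SCORES.get(f, 0.0) for f in extract_features(message))
```

For example, `score_message("Free V1AGRA now")` returns only 0.75, because "v1agra" misses the exact-match lookup entirely.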
Nowadays the features may not be so obvious. For instance "V1AGRA" may not be present in the feature database, but if "VIAGRA" is, we should be able to link to it via some sort of approximate match, or substring match. Here we can see that both strings have "AGRA" in common, and score accordingly. Longer strings, like "Former Dictator of Nigeria", provide more material to match on.
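One common way to get that approximate match is to compare character n-grams. The Python sketch below uses Jaccard overlap of character 4-grams (4 because the shared piece "AGRA" is four characters long); the helper names and the choice of Jaccard are assumptions, not the interviewing company's or IBM's method.

```python
def char_ngrams(word: str, n: int = 4) -> set:
    word = word.upper()
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 4) -> float:
    """Jaccard overlap of character n-grams: 1.0 identical, 0.0 nothing shared."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def best_match(token: str, vocabulary, n: int = 4) -> str:
    """Vocabulary entry most similar to the token."""
    return max(vocabulary, key=lambda v: ngram_similarity(token, v, n))
```

"V1AGRA" and "VIAGRA" share only the 4-gram "AGRA" out of five distinct 4-grams, giving a similarity of 0.2, which is still enough to pick "VIAGRA" out of a vocabulary of unrelated words.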
One problem with substring matching is that substrings can overlap, yielding multiple matches for the same piece of text. A string of length n has roughly n^2/2 substrings, so our feature space is enormous. Adding up all the feature scores from multiple overlapping hits in a useful way is also much more difficult.
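Counting by position: a string of length n has n - k + 1 substrings of length k, so summing over every length gives the total below. The n = 1000 case (about a 1 KB message body) is just an illustrative size.

```latex
\sum_{k=1}^{n} (n - k + 1) = \frac{n(n+1)}{2} \approx \frac{n^2}{2},
\qquad n = 1000 \;\Rightarrow\; \frac{1000 \cdot 1001}{2} = 500{,}500
```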
One way out of this mess is to pick a really simple scoring method. Gzip "scores" messages (by compression amount) on how many characters match in repeated substrings of at least a certain length (3 bytes in DEFLATE), using a greedy algorithm. It's a simple tool for gauging the similarity of two files.
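The gzip idea can be turned into a single similarity number using the normalized compression distance, a known technique that is named here plainly and is not the IBM method. A minimal Python sketch using the standard `zlib` module (DEFLATE, as in gzip); the helper names are illustrative.

```python
import zlib

def csize(data: bytes) -> int:
    """Length of the DEFLATE-compressed data (zlib, maximum compression)."""
    return len(zlib.compress(data, 9))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small for similar texts, near 1 for unrelated ones."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

If `b` mostly repeats substrings of `a`, compressing them together costs little more than compressing `a` alone, so the distance stays small; a one-character respelling like "V1AGRA" barely changes that.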
The IBM method seems a bit more sophisticated. I've looked up similar methods in bioinformatics textbooks. They handle overlapping, and appear to choose their features with a substring-counting approach.
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
As someone who's done some research on machine learning for spam filtering, this sure looks to me from their 8-page paper like yet another simplistic ML algorithm advocated by folks who don't know the field and tested using techniques of questionable sensitivity. Their "novel" method sounds an awful lot like feature set construction by clustering, a method that is widely used in the spam filtering literature, but with a somewhat novel clustering technique from biology.
Message filtering starts by throwing away line breaks for no obvious reason, then optionally removing the known ham from the training set for no obvious reason. Message headers are then thrown away, for no obvious reason.
No general method is given for corpus allocation. In the experiment reported later, the original corpus appears to have been split roughly in half. (For unreported reasons, none of these splits are exact. No rationale is given for the various corpus allocations.) The training corpus is then split into ham and spam, and the ham portion is split in half. The spam training corpus is used for "positive training": determining a complex feature set as described below. One half of the ham training corpus is then used for "negative training": filtering out complex features that are common in ham. The remainder of the ham corpus is used as a validation set to select thresholds described below. No justification is given as to the failure of the validation set to include spam messages, and the procedure is vague on this point.
The description of the key "positive training" phase is difficult to follow: it seems to assume the pre-existence of the "SPAM vocabulary" [sic] being constructed. The key idea seems to be to use positional index of words within the body as base features, and construct complex features by using a pattern recognition algorithm to find correspondences between sets of base features across spam messages. Patterns that appear across many spam messages are treated as indicating spam.
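A much-simplified Python sketch of the two training passes as this review describes them: contiguous word n-grams that recur across spam stand in for the paper's more general pattern discovery over positional word indices, and the ham pass drops any pattern that also appears in legitimate mail. The n-gram size, the minimum support of 2 and the helper names are hypothetical.

```python
from collections import Counter

def word_ngrams(text: str, n: int = 3) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def mine_patterns(spam, ham, n: int = 3, min_support: int = 2) -> set:
    """Positive training: word n-grams that recur across spam messages.
    Negative training: drop any candidate that also appears in ham."""
    support = Counter()
    for msg in spam:
        support.update(word_ngrams(msg, n))   # a set, so each message counts once
    ham_grams = set().union(*(word_ngrams(m, n) for m in ham))
    return {p for p, c in support.items() if c >= min_support and p not in ham_grams}
```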
The final training step is to set thresholds for (1) minimum number of complex features in the spam message and (2) fraction of the message text covered by the complex features. One would expect these two criteria to be highly correlated: no effort appears to have been made to enforce or explore their orthogonality.
The classification phase proceeds by simply counting the number of patterns in a given test message and the percent coverage of the message by the patterns. If the result exceeds both thresholds, the message is classified as spam.
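That two-threshold rule can be sketched as follows, assuming patterns are word n-gram tuples. The threshold values are hypothetical, and "coverage" is taken here as the fraction of words that fall inside at least one matched pattern.

```python
def classify(message: str, patterns: set, n: int = 3,
             count_threshold: int = 1, coverage_threshold: float = 0.3) -> bool:
    """Spam only if the pattern count AND the covered fraction both exceed their thresholds."""
    words = message.lower().split()
    hits, covered = 0, set()
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in patterns:
            hits += 1
            covered.update(range(i, i + n))   # word positions inside some pattern
    coverage = len(covered) / len(words) if words else 0.0
    return hits > count_threshold and coverage > coverage_threshold
```

Because coverage is a fraction of the whole message, appending filler text dilutes it: the same two pattern hits that flag a five-word message fall below the coverage threshold once thirty filler words are added.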
For the empirical evaluation, the corpus used seems to have consisted of approximately 130,000 messages, roughly 1/4 ham and 3/4 spam. No details of the construction or acquisition of this large corpus were given. Because of its volume, one would suspect a synthetic corpus from high volume sources. The details of this corpus construction are critical to the evaluation of the method, so no useful conclusions can really be drawn from the empirical evaluation other than that, like most machine learning methods, this method works well on some problem set.
The claimed accuracies from the technique are at a level that is highly suspect from previous experience: there are fundamental bounds on how well any ML algorithm can do in real situations, and the results here don't appear to respect them. Indeed, messages found to be misclassified as spam in the test corpus were manually reclassified, but no effort seems to have been made to identify messages that were "correctly" classified by the algorithm but misclassified in the corpus. The accuracy before manual manipulation of the results (!) appears to be about 97%, which is well within the normal expected range. Computational efficiency appears to be good.
The vocabulary used in the paper is not particularly consistent with the vocabulary normally used in the spam filtering or machine learning literature. A few spam filtering and machine learning papers are cited, but not many: citations are primarily from the
Why can't we start filtering based on the URLs in spam? There would need to be some verification process (otherwise valid URLs would be blocked), but wouldn't it increase the cost of spamming, since spammers would need to register even more domains? After a while, this should also give us a list of spam-friendly hosting providers who should be banned from the rest of the internet.
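A minimal Python sketch of URL-based filtering: pull the hostnames out of links in the body and check them, and their parent domains, against a blocklist. The regex, the helper names and the local `blocklist` set are hypothetical; in practice such a list would be shared and maintained, for example as a DNS-based URI blocklist.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def extract_domains(body: str) -> set:
    """Hostnames of http(s) links in the body (urlparse lowercases them)."""
    domains = set()
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname
        if host:
            domains.add(host)
    return domains

def hits_blocklist(body: str, blocklist: set) -> set:
    """Blocklisted domains referenced in the body, matching parent domains too."""
    found = set()
    for host in extract_domains(body):
        parts = host.split(".")
        for i in range(len(parts) - 1):      # skip the bare top-level domain
            candidate = ".".join(parts[i:])
            if candidate in blocklist:
                found.add(candidate)
    return found
```

Matching parent domains means a spammer cannot dodge the list just by varying subdomains; they have to register new domains, which is the cost increase the comment is after.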
I use Macs to up my productivity, so up yours Microsoft!
1) Acquire software
2) Decompile
3) Study code
4) Develop countermeasure
5) spam spam spam
It's not like spammers care about the EULA that says they can't look at the code. Oh, and before I forget...
6) ???
7) Profit!
Sean
That should work for virus and worm detection, too!
Even more so, since a virus is much more a compilation of previous constructions with a few mods, whereas a spam can be a new composition not necessarily based on the wording of old scams.
And viruses and worms (especially worms) are more constrained by their environment, requiring an exploit of a vulnerability and the installation of work-doing code. Though gene-shuffling techniques might be able to bury much of the code, the basic exploit must continue to be some sort of match to the vulnerability's "receptor".
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
If it cost $10 a year . . . to register an email addy, there would be no incentive for the spammers to throw the dictionary at domains, and conversely, the spammers couldn't/wouldn't want to create thousands of email addys to spam from.
I had not heard that angle before. That rocks! You'd think it would be the sort of thing a politician could wield in court too.
It's strange to me that there are a whole slew of laws concerning other modes of communication, but the internet is slow to be regulated. I know regulation won't stop people from doing stuff, but if the laws are defined then you can punish people in court when they transgress. I think a bevy of young lawyers, reared on IT, are gonna change that someday soon.
Stuff that matters.
1) Collect underpants
2) Goto 1
3) Profit
See, it does make sense!
KILL ALL SPAMMERS
Come on, you dumbass gun loving Dubya-ass-kissing americans can do it!
Spam is a WMD! most spam comes from the USA!
NUKE THE USA! !! ! Come on do it! I'm sure it will be good for Bush's oil friends, so you must be eager to try it!
reasoning any Dubyafucker can understand and love.
Given that, why can't there just be a proposal, adopted (like a DVD format, etc) by some huge players (Microsoft, OpenSource, whatever), and then announce a sort of "Spam Doomsday", a ways out. Say January of 2006. Give people time to write in the new mail handling ability so it's side by side with POP in the next Outlook and Thunderbird and whatever else, as well as all the various mail-servers, and months for organizations to plan on roll out.
Yes, upgrades are a bitch, especially on a large scale, but corporations have to go through them anyway, from the OSes to the software running on them. I don't know of too many people who'd prefer to throw never-ending money at spam blocking and the net traffic associated with it instead of just knowing "January 18th, 2006 - everyone on the net (who has a brain) is moving to nEw-Mail".
I realize POP/SMTP is ancient and embedded, but unlike a physical "format" like CD-ROM or 4mm DAT, the bulk of existing email is transient, it's collected and forgotten about or "local" in a few minutes. So picking a transition day, agreeing on an open transition method, and just "DOING IT", can't possibly be that hard. POP/SMTP wouldn't even have to go away, it could run side-by-side for the hold-outs, but I think 99% of the people are so tired of spam they'd be understanding of "Sorry, if you want to email customer support at Citibank, you will need to use our new support#citibank.com nEw-Mail address." when its so universally pushed.
I'm Rick James with mod points biatch!
My fast, efficient, method is very light on system resources and attacks spam by detecting one or more common attributes of spam and taking the appropriate action.
Complete details here.
Bryan Taylor
iamcf13@hotpop.com
SpamByte code: 7
(see http://www.cf13.com/game-over-spammers.htm )
http://www.cf13.com/press-release.htm
All email containing unwanted content will be summarily deleted or reported as spam.
I just installed greylistd by Tor Slettnes about 24 hours ago, and haven't received a single spam yet (down from 20-30 per day before). I only have a 5 minute greylist delay, meaning there's almost no downside to this method. Assuming my correspondents don't use broken mail servers (and that's their problem if they do), there are no false positives and no maintenance with this system. I use no other spam filters of any kind. I guess they just aren't patient enough to wait 5 minutes :)
And if they start doing retries, I will add SMTP delays or other techniques as suggested in Tor's excellent guide to mail filtering at the server level: Spam Filtering for Mail Exchangers.
More clever filters and pattern matchers are not going to work. Just like encryption, the more something is used, the more likely it is to be hacked around. Maybe early adopters will benefit, as the spammers have not had the time or target size to catch up yet. But on a grander scale, it is a no-win cat and mouse game.
The solution is the same one that reduces paper junk mail: postage fees. Charge 5 cents or so per message, and spam will greatly shrink.
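To put numbers on the proposal, assume the 5-cent fee from the comment, a hypothetical one-million-message spam run, and a hypothetical user sending 20 messages a day:

```latex
\begin{aligned}
\text{spam run: } & 10^{6}\ \text{messages} \times \$0.05 = \$50{,}000 \\
\text{ordinary user: } & 20\ \tfrac{\text{messages}}{\text{day}} \times 365\ \text{days} \times \$0.05 = \$365\ \text{per year}
\end{aligned}
```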
Table-ized A.I.
-- Our systemic servants do not good masters make.
When blinking text was made part of HTML in the 90s, it was universally despised by users and competent webmasters alike. It's distracting, annoying, and can even be dangerous to those prone to epilepsy. Eventually, for the most part browsers quit supporting it, webmasters quit using it, and users quit visiting sites where it appeared.
As far as I'm concerned, the fact that the Server Beach ad is a blinking Flash animation doesn't make it substantially different. I'm getting it again as I type this, and finding it just as annoying and obnoxious as I did the first time. Professional web advertisers and site admins should know better.
By the time the SPAM has reached your filters, you've already lost. It's already consumed your bandwidth, it's consuming your processing time and storage, and the process of updating, teaching, writing and managing your more and more complex filters is still consuming your time. The answer is to go for the root of the problem, which is the naive level of trust that SMTP implies. There are a number of attacks on this problem, with SPF looking like a strong contender. Encourage your ISP to enable SPF checking, and reject the spam before it's even accepted.
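A simplified model of the check SPF enables: the domain in the envelope sender publishes which hosts may send its mail, and the receiving server compares the connecting IP against that list. The `POLICIES` table and `spf_like_check` function are hypothetical. Real SPF fetches the policy from a DNS TXT record and supports more mechanisms (such as `include`, `a` and `mx`) than the bare IP ranges modeled here. The addresses come from the reserved documentation ranges.

```python
import ipaddress

# Hypothetical published policies: domain -> networks authorized to send its mail.
# Real SPF publishes these in DNS, e.g. "v=spf1 ip4:192.0.2.0/24 -all".
POLICIES = {
    "example.com": ["192.0.2.0/24", "198.51.100.25/32"],
}

def spf_like_check(mail_from: str, client_ip: str) -> str:
    """Return 'pass', 'fail' or 'none' for the envelope sender and connecting IP."""
    domain = mail_from.rsplit("@", 1)[-1].lower()
    networks = POLICIES.get(domain)
    if networks is None:
        return "none"                     # the domain publishes no policy
    ip = ipaddress.ip_address(client_ip)
    if any(ip in ipaddress.ip_network(net) for net in networks):
        return "pass"
    return "fail"                         # a receiving MTA could reject at MAIL FROM
```

Because the decision needs only the envelope sender and the client IP, a failing message can be refused before its body is ever transferred.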
They're talking about IBM's
(((Anti-Spam) Filtering) Research) Project
This is not the same as the
((Anti-(Spam Filtering)) Research) Project
Nor is it the
(Anti-((Spam Filtering) Research)) Project
I'm not sure, but I think the last two are run by AT&T.
It sounds like a great paper until you get down into the guts of their materials and methods. They trained their system on half of their total data, and did not then test on separate data. That captures the two classic no-nos of data driven techniques: they inflate their results by including their training data in the results, and, worse, their training data comprises a larger sample of their total data than would be seen in the real world.
The first of these calls their sensitivity result into question. If they classify their training data perfectly, then the 4.4% false negative rate they quote needs to be doubled to 8.8% -- almost one false negative in every eleven spam messages scanned.
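The doubling follows from treating the reported rate as an average over the two equal halves of the data, assuming the training half is classified without error:

```latex
\begin{aligned}
r_{\text{overall}} &= \tfrac{1}{2}\, r_{\text{train}} + \tfrac{1}{2}\, r_{\text{test}} \\
4.4\% &= \tfrac{1}{2}(0) + \tfrac{1}{2}\, r_{\text{test}}
\quad\Rightarrow\quad r_{\text{test}} = 8.8\% \approx \tfrac{1}{11.4}
\end{aligned}
```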
The second of these calls their false positive rate into question: training with an unrealistically thorough set leads to better categorization, ceteris paribus. They need to show the trend with a variety of different training set sizes to support any claims about performance.
This sounds like a fully buzzword compliant non-result to me.
Some statistical algorithms only pick a small number of tokens according to some rationale or other (e.g. most extreme scores). For such algorithms, the padding attack is a very good idea, as with enough random words, one or more of these should have a sufficiently extreme score (so that it replaces a more legitimate token in the list of considered tokens), although whether an extreme score can be synthesised randomly would depend on the computation of token scores.
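A sketch of the token-selection step the padding attack exploits, using a naive-Bayes style combination of per-token spam probabilities. The probabilities, the helper names and N = 15 are assumptions for illustration; real filters differ in how they compute token scores and handle unknown words.

```python
import math

def combine(probs) -> float:
    """prod(p) / (prod(p) + prod(1 - p)), computed in log space."""
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    diff = log_ham - log_spam
    if diff > 700:                 # avoid math.exp overflow; the score is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

def top_n_score(probs, n: int = 15) -> float:
    """Score using only the n tokens whose probabilities are farthest from 0.5."""
    extreme = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    return combine(extreme)
```

Three strongly spammy tokens (p = 0.99) alone score close to 1. Add fifteen padding words at p = 0.02, each more extreme than any neutral token, and they fill twelve of the fifteen slots and drive the combined score close to 0.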
Algorithms in such tools as popfile, ifile, dbacl, crm114 use all the tokens, which ought to have the advantage of making the (incorrect) token distribution of the extra padding words stand out when applying whatever likelihood function is used.
Use an ad blocker, like junkbuster or privoxy. Thanks to ad blockers I've never had to look at a Slashdot banner ad. Ever.
"Fighting MS with human cloning technology."
--A witty sig proves nothing.--
Don't know about Teller, but Penn used to write columns for some computer rag. One I particularly remember was back when US airports were starting to freak out about laptops and insisting that people turn them on at the rent-a-cop checkpoints, and he was annoyed enough about the general harassment and interference with civil liberties to suggest that an appropriate startup screen would be one that goes "10" "9" "8" "7" etc. in big scary-looking letters.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
CAN-SPAM was a great example of why legislation usually doesn't work - politicians aren't technologists, and usually aren't competent economists, and even technologists have trouble coming up with solid definitions of what the problems are and what they want to do about them without having adverse side-effects. But politicians _are_ politicians, so if there are people clamoring for them to Do Something, they'll come up with Something to Do, and Do that, and at best it'll involve hiring some technologists who'll come up with something at least half-assed and not totally evil. But politicians aren't technologists, so they can't tell if laws they make about technology are any good - the part they're good at is deciding whether the laws Look Aggressive, or Look Fair and Balanced, or Kick Asses and Take Names, or Kiss Asses and Take Campaign Contributions, or help their buddies in Homeland Security achieve other political goals, and those aren't the parts of the law that really matter much.
Spam makes economic sense for the spammer, and until that changes, spam won't go away. You talk about Congress preferring spammers over consumers, but you're not correct - they don't care about spammers, and the reason there's spam is that there are enough consumers willing to buy "Fake Herbal Vi@gra" or "Great Mortgage Deals" that spammers can make money even though they need to send out billions of emails to people who don't want to be consumers of their products to find the few who do. Most laws don't have any effect, because they depend on police, and police are too busy fighting the War on Politically Incorrect Drugs or dealing with bad drivers or actually fighting real crime to waste their time on the hopeless and unprofitable job of catching spammers. Some proposed laws give bounties to spam recipients for successfully catching spammers, and allow them to use mechanisms like small claims court instead of criminal prosecution, and that's more likely to have some effect.
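The economics can be put as a break-even response rate: a run pays for itself once the fraction of recipients who buy exceeds the cost per message divided by the profit per sale. The sketch below uses invented figures ($10 per million messages sent, $20 profit per sale) purely to show the shape of the calculation; with those numbers it prints a break-even of one sale per two million messages.

```python
def break_even_response_rate(cost_per_message, profit_per_sale):
    """Fraction of recipients who must buy before a spam run pays for itself."""
    return cost_per_message / profit_per_sale


# Hypothetical figures: $10 to send a million messages, $20 profit per sale.
rate = break_even_response_rate(10 / 1_000_000, 20.0)
print(f"break-even: one sale per {1 / rate:,.0f} messages")
```

Anything that raises the cost per message or lowers the profit per sale pushes that break-even rate up, which is the sense in which changing the economics matters more than filtering.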
But fundamentally, until you change the economics, you won't get rid of spam. The economics include the facts that
Bill Stewart
That doesn't mean it's not a helpful tool - just that it's either harder to implement than it looks, or less effective than it looks, probably the former. So get to work writing greylisting tools :-)
Of course, if greylisting were very common, spammers would try to find a way around it, but we knew this was an arms race when we started the discussion.
Bill Stewart
That is the point. It is very easy to implement. There are several versions available now that can be set up quickly. Probably the hardest problem to solve is for those with email server farms. One version, however, uses a database that can be accessed by multiple servers, so when the message is retried it can match the entry and allow it through.
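Greylisting, as it is usually described, keys on the triplet of connecting IP address, envelope sender and recipient, and answers the first delivery attempt with a temporary 4xx SMTP failure; legitimate mail servers retry later and get through, while many spam engines never retry. Below is a minimal sketch under those assumptions. The class name, the 300-second delay and the plain dict standing in for the shared database mentioned above are all hypothetical choices; real tools also expire old entries and often match on the client's /24 network rather than the exact IP.

```python
import time


class Greylist:
    """Minimal greylisting sketch: a triplet must retry after a delay before it is accepted."""

    def __init__(self, delay_seconds=300, store=None, clock=time.time):
        self.delay = delay_seconds
        # Any dict-like store works; on a server farm a shared database table
        # would play this role so a retry hitting a different server still matches.
        self.store = {} if store is None else store
        self.clock = clock

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style (code, text) reply for this delivery attempt."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first_seen = self.store.setdefault(key, now)  # record the first attempt only
        if now - first_seen >= self.delay:
            return 250, "OK"
        return 451, "Greylisted, please try again later"


# Two "servers" sharing one store, driven by a fake clock for the demo.
t = [0.0]
shared = {}
server_a = Greylist(store=shared, clock=lambda: t[0])
server_b = Greylist(store=shared, clock=lambda: t[0])
print(server_a.check("192.0.2.10", "a@example.org", "b@example.net"))  # first try: 451
t[0] = 600.0
print(server_b.check("192.0.2.10", "a@example.org", "b@example.net"))  # retry elsewhere: 250
```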
I think one of the reasons ISPs have not jumped at this is that they do not perceive a cost benefit. It will cost them up front to get it configured and tested, and it won't make them any money. As such, they are happy with the status quo. Besides, they are most likely selling address lists to spammers and making a profit doing so. Implementing effective spam blocking tools is not really in their best interest for this quarter.
A penny or tenth of a cent would be unnoticeable to the average email user, but would break the spammer's bank.
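To put rough numbers on the penny argument, here is a sketch with made-up volumes: 30 messages a day for an ordinary user and 10 million a day for a bulk mailer. Both figures are illustrative assumptions, not measurements.

```python
def daily_postage(messages_per_day, price_per_message):
    """Total daily cost of sending mail at a flat per-message price."""
    return messages_per_day * price_per_message


for price in (0.01, 0.001):
    user = daily_postage(30, price)             # hypothetical ordinary sender
    spammer = daily_postage(10_000_000, price)  # hypothetical bulk run
    print(f"${price}/msg: user ${user:.2f}/day, spammer ${spammer:,.0f}/day")
```

At a penny a message the hypothetical user pays thirty cents a day while the hypothetical spammer pays $100,000, which is the asymmetry the comment is relying on.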
Right now, just requiring a keyword on your subject line is more than enough protection to effectively block all spam that's not forged from your whitelisted addresses.
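A subject-keyword rule like this takes only a few lines. This sketch assumes a hypothetical pass phrase that you give to people who aren't yet on your whitelist; the function name and addresses are made up. Note that it trusts the claimed sender address, which is exactly the weakness spammers can exploit by forging a whitelisted sender.

```python
from __future__ import annotations


def accept(sender: str, subject: str, whitelist: set[str], keyword: str) -> bool:
    """Accept mail from whitelisted senders, or from anyone whose subject carries the keyword."""
    if sender.strip().lower() in whitelist:
        return True  # trusts the From address as given, so forgeries pass
    return keyword.lower() in subject.lower()


whitelist = {"friend@example.org"}
KEYWORD = "xyzzy"  # hypothetical pass phrase shared with new correspondents

print(accept("friend@example.org", "lunch?", whitelist, KEYWORD))                # True
print(accept("stranger@example.com", "Re: xyzzy question", whitelist, KEYWORD))  # True
print(accept("spammer@example.biz", "Great Mortgage Deals", whitelist, KEYWORD)) # False
```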
Yes, spammers do successfully guess whitelisted addresses, by stealing people's address books and mailboxes through viruses and guessing that if you're in someone's address book or they've got mail from you then you're whitelisted from them.
So, it's an effective filtering mechanism for now, but eventually you'll have to require something better than whitelisting your contacts and making it hard for everyone else... and almost any precaution that requires a human in the loop is enough to deter most spammers.
Bottom line is, filtering is just adapting to spam. To fight it, you have to cost them real money.
For ISPs that are mainly bandwidth sellers, for whom email is just a small sideline, it's a different case, because they're carrying a lot more bandwidth from users doing web browsing than from real or spam email, so it may or may not be at the top of their priority list (unlike virus protection, which can prevent really big spikes in traffic depending on the worm; e.g. Slammer was really big, but most of the Outlook-hoax-of-the-week mails aren't that heavy in traffic). Also, how high up the priority list a problem is at some ISPs depends on whether they're charging flat-rate or usage-based for traffic.
Bill Stewart