nuclearelephant.com · Domains

DSPAM v3.6 Released

It · Spam · 2005-10-17 00:04 · posted by ScuttleMonkey · from the spam-canned dept. · 100 comments

Nuclear Elephant writes "After six months of development, DSPAM v3.6 has been released. The most notable change is the series of new features added to make an anti-spam gateway appliance possible (Knoppix anyone?). Version 3.6 also includes a highly accurate alternative to Bayesian filtering known as Markovian discrimination, based on Bill Yerazunis' research. Other significant enhancements include trusted sender whitelisting, integrated Clam Antivirus and LDAP support, a centralized spam training alias, and a new dependency-free storage driver. Much of the documentation has also been rewritten to make installation easier. A change log and release notes are also available. Slashdot has recently featured a review of the author's book, Ending Spam and an interview as well."

New Identity Theft Technology Fails to Protect

It · Security · 2005-09-05 05:58 · posted by ScuttleMonkey · from the someone-built-a-better-idiot dept. · 280 comments

Nuclear Elephant writes "According to BBC News, identity thieves are quickly adapting to new technologies such as chip-and-pin credit cards using human nature tactics rather than cracking the technology. At least that's what Dr. Emily Finch (UEA), who interviews career criminals about their activities, claims. Finch swapped credit cards with a male coworker and performed a number of transactions without being challenged by cashiers. Finch also believes biometric identity cards will only exacerbate the problem. Regardless of which side of the fence you sit on, could this take us closer to embedded chips under the skin?"

Jonathan Zdziarski Answers

Interviews · 2005-08-30 07:13 · posted by ScuttleMonkey · from the friends-of-anti-spam-slash-back dept. · 326 comments

Wednesday we requested questions for Jonathan Zdziarski, an open source contributor and author of the recently reviewed book "Ending Spam." Jonathan seems to have taken great care in answering your questions, which you will find published below. We have also invited Jonathan to take part in the discussion if he has time so if your question didn't make the cut perhaps there is still hope.

Winkydink asks:
How do you pronounce your name?

Jonathan Responds:
Hi. Well first off, I'm sticking to the pronunciation 'Jarsky', however many of my relatives still pronounce it 'Zarsky' or 'Za-Jarsky'. As far as I can tell, my last name was originally 'Dziarstac' when the first generation of my family came over, which would have been pronounced with a 'J'. It's of Polish descent, but I'm afraid I'm not very in tune with my ancestors on this side of the family. The other side of my family is mostly Italian, and they drink a lot, organize crime, and generally have more fun - so they are much more interesting to hang out with. For the past 29 years of my life, giving my last name to anyone has included the obligatory explanation of its pronunciation, history, and snickering at puns they think I'm hearing for the first time (-1, Redundant), so don't feel too bad for asking.

As far as who I am and why you should care - I guess that depends on what kind of geek you are. I've never appeared in a Star Trek series or anything (I've been too busy coding and being a real geek), so I guess that eliminates me as a candidate for public worship in some circles. I guess if you're into coding, open source, hacking all kinds of Verizon gear, or eradicating spam, then some of my recent projects may be of interest. If you at least hate me by the end of the interview, I'll have accomplished something.

An Anonymous Coward asks:
What do you think about the proposed change to the GPL with the upcoming GPL 3? Is it a welcomed breath of fresh air to the Open Source Community, or will it just be a reiteration of the previous GPL? What are your thoughts and comments on the GPL 3?

Jonathan Responds:
Based on the scattered information I've read about some potentially targeted areas in GPLv3 and the religious fervor with which some of these discussions have been reported, all I can say is I hope common sense prevails. Actually there's much more I can, and will, say about the subject below, but I think it's probably a good idea to summarize in advance as you may not make it through the list of details in one sitting. So in summary of all my points to come: I hope common sense prevails.

One of the things I've heard, which doesn't make much sense to me, is the idea of changing the GPL to deal with 'use' rather than 'distribution', which would affect companies like Google and Amazon. The argument seems to be that some people feel building your infrastructure on open source should demand a company release all of their proprietary source code which links to or builds on existing GPL projects. They argue that the open source community hasn't benefited from companies like Google and Amazon. Well, from a source code perspective that might be somewhat true - but if you take into consideration the fact that we all have a good quality, freely accessible search engine, cheap books, and employment for many local developers (many of whom write open source applications), the benefits seem to balance out the deficiency. Does anybody remember what the world was like before Google? None of us do, primarily because we couldn't find it - we couldn't find much of anything we were looking for on the Internet as a matter of fact, including other people's open source projects. You might not be getting "free as in beer" or "free as in freedom", but you are getting "free as in searches" and "free as in heavily discounted but not quite free books" in exchange. That's a pretty good trade. It's certainly better than having to look at pages of advertising before completing your order, or subscribing to a Google search membership. On top of this, you probably wouldn't want to see half of the source code that's out there being integrated (internally) into these projects. While I haven't seen Google or Amazon's mods specifically, I do heavily suspect that, if they are like any other large corporate environment, there are many disgusting and miserable hacks that should under all circumstances remain hidden from sight forever - many of which are probably helping ensure job security for the developers that performed the ugly hacks in the first place. 
Just how useful would they be to your project anyway? Probably little. And if you really believe in free software ("free as in freedom"), then the idea that someone should be required to contribute back to your project in order to use it is contradictory to that belief - you might just as well be developing under an EULA instead of the GPL.

With that said, there's a difference between freedom and stealing. I've heard that GPLv3 will attempt to address the mixing of GPL and non-GPL software. I think this clarification might be a good thing. For one, because I've seen far too many pseudo-open source tin cans and CDs being resold commercially out there, distributing many different F/OSS tools with painfully obvious closed commercial code, and finding ways to easily loophole around this part of the GPL, and secondly because it's based around implementation guidelines that really aren't any of the GPL's business. At the moment, mixing uses a very archaic guideline, which is - in its simplest terms - based on whether or not your code shares the same execution space as the GPL code. I think this needs to be reworked to give authors the flexibility to define "public" and "private" interfaces in a project manifest. We're already defining these anyway if we believe in secure coding practices. Closed source projects may then use whatever interfaces the author has declared public (such as command line execution, protocols, et cetera) but private interfaces are off limits. One particular area where this would come in handy is in GPL kernel drivers, which need this ability to avoid tainted-kernel situations. If the author wants, they can declare dynamic linking to a library as a public interface and even make their code more widely useful without having to switch to the GPL's red-headed stepchild, the LGPL. It would also be nice to be able to restrict proprietary protocols (such as one between a client piece and a server piece, which may have originally been designed to function together) to only other GPL projects, which would essentially create GPL-bonded protocol interfaces. This won't restrict use in any way - only what closed-source projects are limited to interfacing with when redistributed.

I would also like to see the GPL's integration clause tightened down quite a bit. There are some companies out there abusing the GPL with "dual licensing". I've considered dual licensing myself in some commercial products, and I just don't believe it's being done in the right spirit much, if at all. Doing away with the possibility of integrating the GPL into a dual license could help strengthen the GPL.

Finally, I'd say mentioning a few words in the GPLv3 about submission practices to help stave off problems like this whole SCO and Linux® fiasco from ever happening again would be a good thing. People generally don't want to limit usage, but if you're going to submit code, there should be at least some submission guidelines. I suspect much of this can (and should) be done outside of the GPL, but at least covering the basics might be appropriate. It should be understood that if you're going to contribute code to the GPL, it had better be unencumbered. It's definitely something every project should already be considering.

An Anonymous Coward asks:
Do you have any suggestions for the enthusiastic yet inexperienced? Perhaps a listing of projects in need of developers, with some indication of the level of experience suggested (as well as languages required).

Jonathan Responds:
The best projects I've seen were those started by someone with a passion for what it is they're coding. Open source development is the internship of the 21st century, and working on projects is tedious, frustrating, and likely to make you burn out if you haven't developed perseverance. I usually suggest to people to come up with ideas for some projects they feel passionately about and make those their first couple of goals. Even if it's completely useless to anyone else, you're still likely to benefit from it yourself. Just look at my Australian C programming macros. Who would have thought that people wouldn't want to use "int gday(int argc, char *argv[])" in their code? I'm sure I learned something from that project, though I still can't remember what.

Instead of spending idle time looking for other projects to jump on, I'd spend as much time as I could in man pages, books, and coding up my own little concoctions. Even if they're stupid ones, you're likely to learn something, or even better - come up with another neat little idea you can spin off of it. Necessity is the mother of invention, so I try and figure out what it is I need, and then do it myself. That usually works. If you still can't think of anything, see if you can catch a vision for something someone else needs. I wouldn't touch anything that you're not 100% bought into and excited about for your first projects.

RealisticCanadian asks:
I myself have had numerous interactions with less-than-technically-savvy management-types. Any time I bring up solutions that are quite obviously a better technical and financial choice over software-giant-type solutions, conversation seems to hit a brick wall. The ignorance of these people on such topics is astounding, and I find many approaches I have tried seem to yield no results in the short term. "Well, yes, your example proves that we would save $500,000 per year using that Open Source solution. But we've decided to go the Microsoft (or what-have-you) route." With your track record, I can only assume you have found some ways to overcome this closed-mindedness.

Jonathan Responds:
I'm not so sure that I have convinced anyone open source was better inasmuch as I've convinced people that other people's projects were better than what Microsoft had to offer, and that's not hard for anyone to accomplish. I can strongly justify some open source projects to people because they are already superior to their commercial counterparts, but there are also a lot of crummy projects out there that should be shot and put out of my misery. I'm not one to advocate a terribly written project, even if it is open source. The good projects can usually speak for themselves with only a little bit of yelling and biting from me. So if you want to become a respected open source advocate at your place of business, I'd say the first rule of thumb is not to try and advocate crap projects for the mere reason that they're open source. Advocating the good ones will help you build a reputation. It also helps if you read a lot of Dilbert so you'll understand the intellectual challenges you'll be facing.

Some other things that I've found can help include what managers love to call a "decision matrix" which is a spreadsheet designed to make difficult decisions for them. For your benefit, this should consist of a series of features and considerations that the competitor doesn't have, with a big stream of checkboxes down the row corresponding to your favorite open source project. Nobody's interested in knowing what the projects have in common anyway, so tell them (with visual cues) what features your open source solution has over the competitor. And if you really want to get your point across clearly to your manager, do the spreadsheet in OpenOffice so they'll have to download and install an open source project to read it.

Once you've done that, and if you're still employed by now, the next thing to put together is an ROI (return on investment) comparison, which not only addresses the costs of the different solutions, but costs to support both solutions in the long run, cost of inaccuracy (if this is a spam solution for example), cost of training, customizations, and resources to manage each product. This is a great opportunity to size machines and manpower and include that in a budget forecast. Many managers are sensitive to knowing just how much extra dough it's going to cost to implement the commercial solution. At the very least, you ought to be able to prove many commercial solutions don't actually make the company much money in the long run. If speaking of cash isn't enough to convince your manager then a full analysis of low-level technical aspects will be necessary. This is simply a dreadful process, and where most open source attempts fail - because a lot of people are just too lazy to learn about the technical details of both projects and complete their due diligence. If you take the time, though, you're likely to either convince your boss or utterly confuse him - either one is very satisfying.
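As a rough illustration of the kind of comparison described above, the cost model might be sketched like this. Every figure, field name, and the `total_cost` helper are made-up placeholders, not data from any real product; the only point is that licensing is one line item among several, and that the cost of inaccuracy belongs in the same sum.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """Cost inputs for one candidate; all fields are illustrative."""
    name: str
    license_per_year: float
    support_per_year: float
    admin_hours_per_year: float
    training_one_time: float
    customization_one_time: float
    error_rate: float  # fraction of messages misclassified


def total_cost(s: Solution, years: int, hourly_rate: float,
               messages_per_year: int, cost_per_error: float) -> float:
    """Sum one-time and recurring costs, including the cost of inaccuracy."""
    recurring = (s.license_per_year
                 + s.support_per_year
                 + s.admin_hours_per_year * hourly_rate
                 + s.error_rate * messages_per_year * cost_per_error)
    return s.training_one_time + s.customization_one_time + recurring * years


# Placeholder numbers only; substitute your own quotes and measurements.
commercial = Solution("commercial", 50_000, 10_000, 100, 5_000, 0, 0.05)
open_source = Solution("open source", 0, 8_000, 200, 8_000, 10_000, 0.005)

for s in (commercial, open_source):
    cost = total_cost(s, years=3, hourly_rate=60,
                      messages_per_year=1_000_000, cost_per_error=0.05)
    print(f"{s.name}: {cost:,.0f} over {3} years")
```

Keeping the one-time and recurring costs separate makes it easy to rerun the comparison for different time horizons, which is usually where the long-run difference shows up.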

The biggest challenge in justifying many open source projects I've run into is finding solid support channels that your boss can rely on if you get hit by a bus (or in his mind, fired). Support is, in many cases, a requirement but not all good open source projects see the benefit in offering support. A lot of companies are willing to pay just to have someone they can call when they have a problem. So if you can find a project that's got a pool of support you can draw out of, you can not only use that to justify the project to your manager, but kick a few bucks back into the open source community. I started offering support contracts for dspam primarily because people needed them in order to get the filter approved as a solution. I think I do a good job supporting my clients that do need help, but at least half of them just pay for a contract and never use it. I certainly don't have a problem with that, and it supports the project as well as the people investing time in it.

Goo.cc asks a two parter:
1. In your new book, you basically state that Bogofilter is not a Bayesian filter, which was news to some of the Bogofilter people I have spoken to. Can you explain why you feel that Bogofilter is not a Bayesian filter?

Jonathan Responds:
Bogofilter uses an alternative algorithm known as Fisher-Robinson's Chi-Square. Gary Robinson (Transpose) basically built upon Fisher's Inverse Chi-Square algorithm for spam filtering, which provided some competition for the previously widely accepted Bayesian approach to this. Therefore, Bogofilter is not technically a Bayesian filter. The term "Bayesian", however, is commonly used as a buzzword to describe statistical content filtering in general (even if it isn't Bayesian), and so Bogofilter often gets thrown into the same bucket. CRM114 is another good example of this - many people throw it in the same bucket as a Bayesian filter, but it is configured (by default, at least) to be a Markovian-based filter which is "almost entirely nothing like Bayesian filtering". Technically, CRM114 isn't a filter at all, but a filtering-language JIT compiler (it can be any filter). I cover all of these mathematical approaches in Ending Spam, so grab a copy if you're interested in learning about their specific differences.
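To make the distinction concrete, here is a minimal sketch of the two combining rules as they are commonly described: a naive Bayes style product rule, and a Fisher-based chi-square rule of the kind attributed to Robinson. This is not Bogofilter's or DSPAM's actual code; the function names are invented for illustration, and the per-token probabilities are assumed to be already computed and strictly between 0 and 1.

```python
import math


def bayes_combine(probs):
    """Naive Bayes style combination of per-token spam probabilities."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def chi2q(x, dof):
    """Chi-square survival function, valid for an even number of degrees of freedom."""
    m = x / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_square_combine(probs):
    """Fisher-based combination; near 1 is spammy, near 0 is hammy, 0.5 is unsure."""
    n = len(probs)
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + spam_evidence - ham_evidence) / 2.0


tokens = [0.99, 0.95, 0.90]  # hypothetical per-token spam probabilities
print(bayes_combine(tokens), chi_square_combine(tokens))
```

Both functions take the same inputs and return a score between 0 and 1, which is part of why the two families get lumped together, but only the first is a Bayesian calculation.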

2. Bayesian filters have been around for some time now but there still seems to be no standardized testing methods for determining how well filters work in comparison to one another. Do you think that comparative testing would be useful and if so, how should it be performed?

Jonathan Responds:
Part of the reason there's no standardized testing methodology is because there's no standardized filter interface. A few individuals have attempted to build spam "jigs" for testing filters, but the bigger problem is really lack of an interface. About a year ago, the ASRG was reportedly working on developing such a standard - but as things usually turn out, it's an extremely long and painful process to get anything done when you've got a committee building it (take the mule, for instance, which was a horse built by a committee). This is probably why filter authors have also been hesitant to try and accommodate their filters to a particular testing jig. Incidentally, this is how I surmise that SPF could not have possibly made it through the ASRG - the fact that it made it out at all suggests that it never went in.

I think it's of some interest to compare the different filters out there, but it's also somewhat of a pointless process too. Since these systems learn, and learn based on the environment around them, only a simulation, and not a test, will really identify the true accuracy of these filters - and even if you can build a rock solid simulation, it will only tell you how the filters compared on the test subject's email. If we are to have a bake-off of sorts, it definitely ought to include ten or more different corpora from different individuals, from different walks of life. Even the best test out there can't predict how a filter might react to your specific mail, and for all we know the test subjects may have been secretly into ASCII donkey porn (which will, in fact, complicate your filtering).

This is why some people misunderstand my explanations of dspam's accuracy. All I've said in the past is "this is the accuracy I get", and "this is the accuracy this dude got". Which is the equivalent of "our lab mice ate this and grew breasts". There's no guarantee anybody else is going to get those results, though I'm sure many would try (with the mice, that is). In general, though, I try to publish what I think are good "average" levels for users on my own system, and they are usually around 99.5% - 99.8%. In other words: your mileage may vary. So try it anyway. Incidentally, I've been working with Gordon Cormack to try and figure out what the heck went wrong with his first set of dspam tests. So far, we've made progress and ran a successful test with an overall accuracy of 99.23% (not bad for a simulation).
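The difference between a one-shot test and a simulation across several users' mail can be sketched roughly as follows. `ToyFilter` is a deliberately crude stand-in rather than DSPAM or any real filter, train-on-error is just one assumed training policy, and the two tiny corpora are invented; the shape of the loop is what matters.

```python
from collections import Counter


class ToyFilter:
    """Stand-in learner that votes by token counts; not a real spam filter."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def classify(self, text):
        tokens = text.lower().split()
        spam_votes = sum(self.spam[t] for t in tokens)
        ham_votes = sum(self.ham[t] for t in tokens)
        return spam_votes > ham_votes  # True means "spam"

    def train(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(text.lower().split())


def simulate(corpus):
    """Replay one user's mail in arrival order, training only on mistakes."""
    filt = ToyFilter()  # fresh filter per user: each one learns its own mail
    correct = 0
    for text, is_spam in corpus:
        if filt.classify(text) == is_spam:
            correct += 1
        else:
            filt.train(text, is_spam)  # the user corrects the error
    return correct / len(corpus)


def bake_off(corpora):
    """Return per-user accuracy so differences between users stay visible."""
    return {user: simulate(messages) for user, messages in corpora.items()}


# Invented sample data: (message text, is_spam) in arrival order.
corpora = {
    "alice": [("cheap pills now", True), ("lunch at noon", False),
              ("cheap pills today", True), ("lunch tomorrow", False)],
    "bob": [("meeting notes", False), ("win cash now", True),
            ("win cash prize", True)],
}
results = bake_off(corpora)
print(results)
```

Reporting one number per corpus, instead of a single average, keeps the "your mileage may vary" caveat visible in the results.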

What would be far more interesting to me would be a well-put-together bake-off between commercial solutions and open source solutions. The open source community around spam filtering really has got the upper hand in this area of technology, and I'm quite confident F/OSS solutions can blow away most commercial solutions in terms of accuracy (and efficacy).

Mxmasster asks:
Most antispam software seems to be fairly reactionary - whether it is based on keyword patterns, URLs, sender, IP, or the checksum of the message, a certain amount of spam has to first be sent and identified before additional messages will be tagged and blocked. SPF, DomainKeys, etc. require a certain percentage of the Internet to adopt them before they will be truly effective. What do you see on the horizon as the next big technique to battle spam? How will this affect legitimate users on the Internet?

Jonathan Responds:
That's the problem with most spam solutions, and why I wrote Ending Spam. Bayesian content filtering, commonly thrown into this mix, has the unique ability to grow out of your typical reactive state and become a proactive tool in fighting spam. I get about one spam per month now at the most, and DSPAM is learning many new variants of spam as it catches them; I'd call that pretty proactive. Spam, phishing, viruses, and even intrusion detection are all areas that can benefit greatly from this approach to machine learning. They will likely never become perfect, but these filters have the ability to not only adapt to new kinds of spam, but to also learn them proactively before they make it into your inbox. Some of this is done through what is called "unsupervised learning" and not traditional training, while other tools, such as message inoculation and honey pots, can help automate the sharing of new spam and virus strains before anyone has to worry about seeing them. We haven't thoroughly explored statistical analysis enough yet for there to be a "next big technique" beyond this. The next big techniques seem to be trying to change email permanently, and I don't quite feel excited about that. Statistical tools are where I think the technology is at and it needs to become commonplace and easier to set up and run.

The problem seems to be in the myth that statistical filtering is ineffective or incomplete. Many commercial solutions pass themselves off as statistical(ish) and seem to be contributing to this myth by failing to do justice to the levels of accuracy many of the true (and open source) statistical filters are reflecting. Any commercial solution that claims to be an adaptive, content-based solution (like Bayesian filters are) really ought to deliver better than 95% or 99% accuracy. Part of the problem is just bad marketing - most of these tools are not true "Bayesian" devices; they just threw a Bayesian filter in there somewhere so they could use the buzzword. Another problem is design philosophy and the idea that you need an arsenal of other, less accurate tests to be bolted in front of the statistical piece. If you're going to train a Bayesian filter with something other than a human being, whatever it is that's training it ought to be at least as smart as a human being. Blacklist-trained Bayesian filters are being fed with about 60% accurate data (whereas a human is about 99.8% accurate). So it's no surprise to me that blacklist-trained filters are severely crippled - what a dumb combination. If you really want to combine a bunch of tools for identifying spam, build a reputation system instead. They do a very good job of cutting spam off at the border, are generally more scalable than content-based filtering, and most large networks can justify their accuracy by their precision.

Not all commercial content-based filters are junk. Death2Spam is one exception to this, and delivers around 99.9% accuracy, which is in the right neighborhood for a statistical filter. Not all reputation systems are junk either. CipherTrust's TrustedSource is one example of what I call a well-thought-out system. If you must have a commercial solution, either of these I suspect will make you quite happy. As for (most of) the rest, quit screwing around and build something original that actually works.

Jnaujok asks:
The SMTP standard that we use for mail transfer was developed in the late 70's - early 80's and has, for the most part, never been updated. In that time period, the idea of hordes of spam flowing through the net wasn't even considered. It has always been the most obvious solution to me that what we really need is SMTP 2.0. Isn't it about time we updated the SMTP standard?

Jonathan Responds:
You're talking about an authenticated approach to email, and there have been many different standards proposed to do this. First let me say that, even though SMTP was drafted a few decades ago, it's still successful in performing its function, which is a public message delivery system - key word being public. There exist many private message delivery systems already, which you could opt to use, including bonded sender and even rolling your own using PGP signatures and mailbox rules. I have reservations about forcing such a solution on everybody and breaking down anonymity for the sake of preventing junk mail. Until you can sell a company like Microsoft on absolute anonymity in bonded sender and sell ISPs into putting up initial bonds for their customers (so that a ten-year-old grade school student can still use email), I see a very large threat (especially by the government) in globalizing this as a replacement for the 'public' system. With services like Gmail, where you can store an entire life's worth of email, everything you've ever said could be traced back to you and used against you; given that, I would rather deal with the spam. Why? Let me pull out my tinfoil hat...

It's been advertised plenty of times on Slashdot that Google stores everything about all of its queries. It wouldn't surprise me if they already have government contracts in place to perform data mining on specific individuals. How would you like, in the future, all of your email to be mined and correlated with other personal data to determine whether or not you should be allowed to fly? Buy a firearm? Rent a car? We're not very far off from that, and even less so once this correlation is made possible.

So abstract some level of anonymity at the ISP-level you say? That's just not going to happen. For one, that makes it just as simple for a spammer to abuse somebody's network and then we've gone and redesigned SMTP for no good reason. Remember, business has to be able to set up shop online fairly easily and spammers are a type of shop. So we are always going to balance between free enterprise and letting spammers roam on the network. Should we employ a CA, how much would it cost to run your own email server? More importantly - does this perhaps open the door for per-email taxes? I'd much rather just deal with spam the way we are now. For another thing, abstracted identity architectures would only give you a level of anonymity parallel to the level of anonymity you have when you purchase a firearm (where the forms are stored by your dealer, rather than filed to a central government agency). See how long it takes for the feds to trace your handgun back to you if you leave it at the scene of a crime.

You can't leave it in the ISP's control anyway. The sad truth is that most ISPs still don't care about managing outgoing spam on their network, so new spammers are being given a nurturing environment to break into this new and exciting business. I had a recent bout with XO Communications about one such new spammer who had run a full-blown business on their network since 1997 and recently decided he'd like to start spamming under the "CAN-SPAM" act (which he was convinced defended his right to spam). He included his phone number, address, and web address in the spam - I called him up and verified he was who he said he was (the owner of this business, and spamming). I provided all of this information (over a phone call) to the XO abuse rep (let's call him "Ted"), even filed a police report, and XO still to this day has done nothing. His site is even still there, selling the same crap he spams for. This happens every day at ISPs out there.

The consequences outweigh the benefits. The people who drafted the SMTP protocol probably thought of most of these issues too. A public system can't exist without the freedom to remain anonymous and ambiguous, and the right to change your virtual identity whenever the heck you like.

Sheetrock asks a two-parter:
1. In the past, I've heard it suggested that anti-spam techniques often go too far, culling good e-mail with the bad and perhaps even curtailing 1st Amendment rights. Clearly this depends on what end of the spectrum you're on, but recent developments have given me pause for thought on the matter. For example, certain spam blacklists would censor more than was strictly necessary (a subjective opinion, I realize) to block a spammer -- sometimes blocking a whole Class C to get one individual. This would cause other innocent users in that net space to have their e-mail to hosts using the blacklists silently dropped without any option of fixing the problem besides switching ISPs.

Jonathan Responds:
A lot of blacklists have started taking on a vigilante agenda, or at the very least rather questionable ethical practices. Spamhaus' recent blacklisting of all Yahoo! Store URLs (and Paul Graham's website) is a prime example of this. As long as you're subscribed to human-operated blacklists, you're going to suffer from someone's politics. That's one of the reasons I coded up the RABL, which is a machine-automated blacklist. There is also another called the WPBL (weighted private block list). As the politics of the organizations running human-maintained lists get worse, I think more of these automated lists will start to pop up. Machine-automated blacklists don't have an agenda - they have a sensitivity threshold. It's much easier to find the right list with the right threshold than it is to find the right politics (and then keep tabs on them to make sure they don't change). The RABL, for example, measures network spread rather than number of complaints. If a spammer has affected more than X networks, they are automatically added to the system, and removed after being clear for six hours (no messy cleanup). Another nice thing about machine-automated blacklists is that they are really real-time blacklists, and capable of catching zombies and other such evils with great precision.

NOTE: I haven't had time yet to bring the RABL into full production, but am interested in finding more participants to bring us out of testing.
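
As a rough illustration of the network-spread idea described above (this is not the RABL's actual code), here is a minimal Python sketch. Everything in it is an assumption made for the example: the class name SpreadBlacklist, treating the reporting host's /24 as its "network", and taking timestamps as plain seconds. The only details taken from the answer are the threshold of "more than X networks" and the removal after six hours with no new reports.

```python
import ipaddress
from collections import defaultdict

# "removed after being clear for six hours"
CLEAR_AFTER_SECONDS = 6 * 60 * 60


def network_of(reporter_ip):
    # Assumption for this sketch: a "network" is the reporter's /24.
    return ipaddress.ip_network(f"{reporter_ip}/24", strict=False)


class SpreadBlacklist:
    """Hypothetical list that counts affected networks, not complaints."""

    def __init__(self, threshold, clear_after=CLEAR_AFTER_SECONDS):
        self.threshold = threshold        # the "X" in "more than X networks"
        self.clear_after = clear_after
        self.networks = defaultdict(set)  # source IP -> networks that reported it
        self.last_seen = {}               # source IP -> time of latest report

    def _expire(self, source_ip, now):
        # Forget a source entirely once it has been quiet long enough.
        last = self.last_seen.get(source_ip)
        if last is not None and now - last >= self.clear_after:
            self.networks.pop(source_ip, None)
            del self.last_seen[source_ip]

    def report(self, source_ip, reporter_ip, now):
        # Record that a host on some network received spam from source_ip.
        self._expire(source_ip, now)
        self.networks[source_ip].add(network_of(reporter_ip))
        self.last_seen[source_ip] = now

    def is_listed(self, source_ip, now):
        # Listed only while more than `threshold` distinct networks are on record.
        self._expire(source_ip, now)
        return len(self.networks.get(source_ip, ())) > self.threshold
```

With threshold=2, a source reported from three different /24s is listed, while any number of reports from a single /24 count only once. Tuning that one number is the "sensitivity threshold" the answer refers to.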

2. This is an extreme example, but most anti-spam approaches have the following characteristics: They are implemented on a mail server without fully informing the users of the ramifications (or really informing them at all). They block messages without notification to the sender, causing things to be silently dropped. Even if the recipient becomes aware of the problem, few or no options are given for the recipient to alter this "service".

Jonathan Responds:
I've run into issues like this with my ISP (Alltel), and I agree with a lot of what you're saying. In the case of Alltel, not only are they filtering inbound messages using blacklisting techniques and other approaches they don't care to tell me about, but they are filtering outbound messages as well. I eventually had to give up using their mail server because I could not adequately train my own spam filter (Alltel would block messages I forwarded to it). To make matters worse, there is no way to opt out of this type of filtering on their network, even though I offered to give them the IP address of my remote mail server. This clearly does affect their customers, and I feel there are censorship, violation of privacy and denial of service issues all going on here. (Somebody please sue them, by the way.)

Fortunately, I don't think this issue is as widespread as you might think. Many of the ISPs and colleges I've worked with are, unlike Alltel, very dedicated to ensuring that their tools only provide a way for their users to censor themselves. I think this ought to be a requirement for any publicly used system. Specifically...

1. The user must be able to opt in or out of all aspects of filtering
2. All filtering components and their general function must be fully disclosed
3. The user must be able to review and recover messages the system filtered

Opting out of RBLs is as easy as having two separate mail servers and homing on the box you want. I would strongly advise ensuring that your solution is capable of receiving instruction from a user to improve its results, but it is still very difficult to scale this to millions of users. At the very least, filtering should be fully disclosed, recoverable, and removable.

An Anonymous Coward asks:
Without going into the truths of the beliefs in question, which I'm sure will be debated enough in the Slashdot thread anyway (and I hope you'll join in), what do you think the reason is that so many scientists, nerds and people otherwise rather similar to you think your beliefs are obviously incorrect? Do you think they are all deluded? Do you agree that there might be a possibility that your beliefs are not rational?

Jonathan Responds:
The beliefs I hold as a Christian aren't always the popular ones, but they're certainly valid arguments for anyone who cares to ask about them (not that that has happened). When you read about someone's beliefs, you have the option to engage in discussion, or to filter his or her beliefs through your own belief system. The former option involves cognitive thought; the latter, however, is how most people today respond to anything that even smells religious. And I say this coming from the position of someone who hasn't tried to shove my beliefs down anyone's throat - I merely documented them on my personal website. That tells me that some people don't believe I have the right to my own beliefs - how asinine is that?

But to address the question, my beliefs aren't based on some religious intellectual suicide. In fact, the Bible teaches that you should know what you believe and why, and that you should even be prepared to give a defense for your faith - so the Bible encourages sound thinking, not the pontificated ideal structure many are quick to dismiss it as. I didn't dumb down when I became a Christian. In fact, it felt more like I began to think more clearly. I was raised in the same public school system as everyone else and didn't even know who Jesus Christ was until around my junior or senior year of high school. I've read from my early days in Kindergarten how "billions of years ago, when dinosaurs roamed the earth" and I've been taught the theory of evolution like everyone else. The problem, though, is that no matter how credible or not a particular area of science is, much of what is out there is taught based on authority. I find it very ironic to be flamed by anyone who thinks I'm an idiot for not believing in a theory that's never been proven by scientific process. It's recently become a "religious act" to question science in any capacity, but isn't questioning science the only way we can tell the good science from the bad science? And there is a lot of great science out there - even in public schools. But there's no longer a way for students to evaluate the credibility of what they're being taught. That seems to be degrading the quality of the subject. Science should be a quest for the truth, with no presuppositions, and with an appropriate understanding of the difference between hypotheses, theories, and laws. When a theory is presented in the classroom as law and it's not held accountable to method, it's degenerated into mere conditioning.

I've spent a considerable amount of time studying topics such as the age of the earth and the theory of evolution, and I could probably argue it quite well if so inclined to engage in a discussion. That's important if you're going to believe anything really - including whatever the mainstreamed secular agenda happens to be.

Just as an example, I've recently looked into Carbon-14 dating and found that in cross-referencing it to Egyptian history (which dates back as far as 3500 B.C. and is held to be in very high regard by archaeologists and scientists alike), there is evidence that Carbon dating may be inaccurate beyond around 1800 B.C. For someone not to consider that would be ignoring science. My point here is that my beliefs aren't merely unfounded, eccentric ideas. Just because microevolution is feasible, that doesn't mean I'm going to sweep macroevolution under the rug and not test it - the two are actually worlds apart, just cleverly bundled. The Bible has given me a perspective that seems to offer a reasonable and sensible way to put the different pieces of good science together. No matter what you believe, I strongly feel that you should have some factual foundation to support whatever it is, and if you don't, then be man enough to admit you only have a theory put together.

No matter what side of the camp you are on, your beliefs require a certain amount of faith, as neither side is at present proven scientifically. I don't have all the answers, but I don't think science in its present state does either. At the end of the day, you can't prove the existence of God factually, and so whatever you believe is still based on faith. But at least the Christians can admit that - I just wish the evolutionists would too.

Ask Jonathan Zdziarski

Interviews · News · 2005-08-24 08:48 · posted by ScuttleMonkey · from the friends-of-anti-spam dept. · 112 comments

You may recognize the name Jonathan Zdziarski from a recent Slashdot book review of his book Ending Spam. Aside from his DSPAM spam filter, Jonathan has also contributed several other projects to the open source community under the GNU General Public License. These projects include Verizon-Compatible SMIL Multimedia Gateway, The Reactive Automated Blackhole List Server, Apache DoS Evasive Maneuvers Module, and several others. Want to know how to effectively contribute projects to the open source community? Curious to ask another programmer about his history? Now is the time to ask. Moderators will select the top few questions that we will forward on to Jonathan sometime tomorrow. The answers to the questions will be displayed next Tuesday when we will encourage Jonathan to participate in the discussion as time permits.