DSPAM v2.10 Released
Nuclear Elephant writes "DSPAM v2.10 is finally available, after four months of development. This is the first stable release to include Bayesian Noise Reduction which was recently mentioned on Slashdot and in Wired News as an algorithm providing accuracy levels as high as 10x that of a human. Some other new features include Neural Networking - which finds nodes in a network that are contextually similar to form a decision matrix, Global Filtering - which provides SpamAssassin-like out-of-the-box type filtering for new users until they build up their own wordlist, Automatic Whitelisting - which automatically learns who your trusted senders are, and many other optimizations and enhancements. Head on over and download the latest tar ball."
I've always wanted a spam filter with 1000% accuracy!
The real problem is people who actually buy this stuff. If no one was buying things from spam, no one would send spam. We all know this.
I propose we start spamming. Anyone who responds gets a nice l'il pistol whipping and is returned to their computer. After the first news report, people will be afraid to respond to spam.
Introduction
DSPAM (as in De-Spam) is an extremely scalable, open-source statistical-algorithmic hybrid anti-spam filter. A majority of users running v2.10+ achieve filtering rates ranging from 99.92% to 99.98+%. DSPAM is currently effective as both a server-side agent for UNIX email servers and a developer's library for mail clients, other anti-spam tools, and similar projects requiring drop-in spam filtering. DSPAM has been implemented on many large and small scale systems, with the largest systems being reported at about 125,000 mailboxes.
What is a Statistical-Algorithmic Hybrid Filter?
Present-day language classifiers bear the responsibility of maintaining accuracy in the midst of ever-increasing sample complexity. In the setting of spam filtering, many types of intentional attacks have been introduced, such as obfuscation, word list injection, sample flooding, and so on. As the complexity of classification text continues to multiply rapidly, many filter developers today are torn between increasing the complexity of their filter and the wise teachings from CS class reminding them that computer science is about controlling complexity, not creating it. At the rate complexity is rising, filters will (and have already begun to) become so resource-intensive that they lose scalability, eventually leading to a second conflict of interests: fighting spam becomes more expensive than managing it.
DSPAM is the first Statistical-Algorithmic Hybrid filter and, in being such, boldly suggests that there is a better alternative to increasing the feature set of filters to match the spams they are trying to fight. By employing algorithms designed to increase the quality of existing data rather than the quantity of data, with the goal of reducing the feature set rather than increasing it, DSPAM has managed to achieve levels of accuracy nearly equal to those of present-day Markovian-based filters and other types of filters that employ large feature sets, with the added benefit of using significantly fewer resources. DSPAM presently peaks at 99.984% accuracy, which is ten times more accurate than a human being [1], and is presently being used on implementations as large as 125,000+ mailboxes.
DSPAM's Focus
The DSPAM project attempts to go beyond "just another statistical filter" by focusing on the following areas:
* DSPAM has a strong focus on providing better data to already existing algorithms (Bayesian, Chi-Square, etcetera). Combination algorithms work inherently well, but depend on the quality of data. Some of the approaches deployed in DSPAM towards this goal include Chained Tokens, Inoculation Groups, Classification Groups, advanced de-obfuscation techniques, and a new noise reduction algorithm called Bayesian Noise Reduction. The goal is to incorporate processing algorithms that can withstand the long haul of ever-increasing message complexity. So far we're doing a great job.
* A strong focus on large-scale implementation support. The largest implementation of DSPAM we've heard about to date involves 125,000 users. DSPAM has been designed to have a very short execution time (0.03s - 0.10s on average hardware), and has been equipped with a storage driver API allowing several different storage mechanisms to be used. Depending on disk space constraints, accuracy can be traded off for additional disk space or vice-versa.
* Empty Corpus Support and Global Dictionary Support. It is very important in a large-scale environment to allow users to build their own dictionaries starting from scratch. Why? Because system administrators haven't got the time to create 20,000 seeded dictionaries. On top of this, ISPs require out-of-the-box filtering, which DSPAM's global dictionary feature provides for end-users, with minimal centralized learning. DSPAM provides support for building corpuses from scratch without suffering many fatal training errors (false positives). When these two approaches are combined, we end up with instant-filtering for all u
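The first bullet above names Chained Tokens and Bayesian combination without showing what either looks like. Here is a minimal sketch, assuming that "chained tokens" means adjacent word pairs added alongside the single words, and using a generic naive Bayes combination of per-token spam probabilities. The function names and the tokenizing regex are hypothetical; this is an illustration of the general idea, not DSPAM's code.

```python
import math
import re


def chained_tokens(text):
    """Return single words plus adjacent word pairs.

    Assumption: 'chained tokens' are pairs of neighbouring words
    emitted in addition to the individual words.
    """
    words = re.findall(r"[a-z0-9$'!-]+", text.lower())
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs


def combine(probs):
    """Naive Bayes combination of per-token spam probabilities.

    Each probability must lie strictly between 0 and 1.
    Works in log space to avoid underflow on long messages.
    """
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

With pairs in the feature set, a phrase such as "cheap pills" can carry its own probability even when "cheap" and "pills" are individually unremarkable, which is one way of getting better data out of the same message.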
now the question is.. how hard is it to get it to work with cpanel
Selling software won't make you money, selling a service will.
Right now the only spam getting through my Mozilla filter is stuff that starts with one or two unrelated sentences, then goes into the advertising with any spam-type words (viagra, etc) horribly misspelled.
Twenties Retirement
From what I can tell, DSPAM plugs into your MTA as a local delivery agent, very much like SpamAssassin does.
:P
I couldn't see any platform requirements on their site, but here's what they say about MTA compatibility:
DSPAM works great with Sendmail, Postfix, Qmail, Courier, and Exim, and should work well with any other MTA that supports an external local delivery agent.
Hope that answers your questions
Martin May
this is from the faq...
In real-world scenarios, false positives have ranged anywhere from 0% (none) to 0.10% depending on both implementation and user's mail behavior. Users with relatively predictable mail behavior (such as geeks, dweebs, and freaks) have generally received very few false positives (less than 1 in 10,000 messages).
details, plz?
This may work for a little while, but the creative peeps will find a way around it.
I say forget the filtering shit and force email to evolve. Part of the reason that spam happens is that there is no real authentication going on. No requesting permission to be on your white list. No real strong way to block anybody you don't want to hear from. No real way to verify the sender is legit. etc.
I don't claim to have all the answers, but I do know that I've been using ICQ for years and haven't seen a Spam from there since I turned on the 'require authorization' feature.
"Derp de derp."
I tried several incarnations of dspam over a period of about 6 months. It was a pain in the butt to install, required a massive amount of training, and required you run a web server in order to have the point and click training capability.
I eventually gave up and tried the CRM114 Discriminator:
http://crm114.sourceforge.net/
It was MUCH easier to install, MUCH easier to maintain, and has the same or better level of accuracy. I used to get 100+ spam messages a day and now I'll get maybe 1 or 2 a week that sneak through (after only a few weeks of training on errors only).
The Subscriptions Page. Once you get to the page where it asks how many pages you want to buy, scroll down and check the "Apply To My Karma Score" box.
That would be ideal.
(since then the 'casual' user could benefit from using it, without undue difficulty in configuration of mail delivery programs, which are notorious in general..)
it could be used in html rendering
Computer manufacturers are also investigating whether this device will be able to deal with the so-called "Stupid User Problem" which plagues so many IT professionals world wide.
That really is my homepage, no kidding.
FYI, modern MRI scanners use bayesian noise reduction during image processing. I used to work in a MRI research laboratory, and our director had pioneered the application of Bayesian noise-filtering algorithms in post-processing of image data.
Oddly enough, our director of research was a notoriously difficult person to schedule a meeting with. Makes me wonder about 'unsupervised learning'...
Okay, so filtering on the receive end is fairly commonplace - but what about filtering close to the sender?
(1) Force all ISP customers to use their own SMTP server (block all port 25 access to external addresses).
(2) Set up an outbound SMTP server for all ISP customers to use - but include a spam filter that rejects sending the message if it considers it to be spam? It would also give instant feedback to the user - the mail client would immediately report the error.
Then the spam wouldn't even be transported over the net, saving vast amounts of traffic on the internet backbones. This action could also potentially kill spam overnight.
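A rough sketch of what step (2) could look like from the mail client's point of view: the outbound server runs the message through some classifier and answers with an SMTP-style reply straight away. The function names and the toy classifier are hypothetical, and a real filter would be a trained statistical one rather than a keyword check.

```python
def outbound_check(message_text, is_spam):
    """Return an SMTP-style (code, text) reply for a message submitted for sending.

    `is_spam` is any callable that takes the message text and returns True/False.
    """
    if is_spam(message_text):
        # 550 with enhanced status 5.7.1: message refused for policy reasons
        return 550, "5.7.1 Message rejected: classified as spam"
    return 250, "2.0.0 Message accepted for delivery"


def toy_classifier(text):
    """Hypothetical stand-in for a real statistical filter."""
    return "viagra" in text.lower()
```

Because the rejection is returned during submission, the sender's mail client can show the error immediately, which is the instant feedback described above.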
Spam people, take their money, send them something unpleasant enough to get on the news. Of course, you'd probably end up in jail if you tried this.
But yeah, that would probably kill the spam market pretty well.
autopr0n is like, down and stuff.
accuracy levels as high as 10x that of a human...
So, let me get this straight - my spam filter will know better than I do which emails I want to read, and which ones I don't?
"No, trust me man, you really want a bigger johnson. Read it!"
DSPAM is one of these statistical filters (like spamprobe and CRM114) that can perform virtually perfect filtering of spam/non-spam you receive.
Now that you are free of spam yourself, may I suggest that you take it one step further and share your data with the anti-spam community; the WPBL project lets many users report the IPs sending them spam and non-spam in realtime using a couple simple scripts installed in procmail.
Our central database then publishes a real-time list of spam sources (the IP blocklist). Unlike spamcop, WPBL is entirely based upon automatic decisions made by statistical filters, 24/7. The resulting blocklist is already used by many ISPs; and you can also use it to block spamming IPs at your own server.
But will it keep all those GNAA posts out of slashdot? ;)
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
I see all my fellow slashdotters saying (over and over again) that spam filters should be server side, because otherwise you are still paying for the wasted bandwidth. This is a very powerful argument, and I tend to agree.
However, there are two things that make me nervous about this approach. First of all, if I miss even one email, no matter how innocuous, because my ISP installed filters, I am going to be pissed!
"Man, you missed it, the party was a blast!"
"What party?"
"Didn't you get the email?"
With a client side filter, at least I can look through the 'spam' and find the gold nuggets. If my ISP filters for me, and I miss a legit email, I'm just SOL.
Secondly, all of the best filters are for linux. Ask me if I run Mozilla (for windows). I will tell you, "HELL yes I do". Is it anywhere close to 90% effective for filtering spam? Not for me! Is it 100% effective in letting my legit mail come through? Not for me! The browser has stopped 99.9% of the popups tho.
Anyway, long ramble short, give me something that's good on windows. Do I have to write it myself? I've been thinking of altering Mozilla to incorporate the latest anti-spam technology, but, man, I just never have the time these days.
Anyway, good work on the part of D-Spam, nonetheless. Kudos to your bad selves.
WWJD? JWRTFA!
The solution to the spam problem is simple yet elegant - gambling.
:)
Every time you send an email you place a small wager on the line that the recipient wants to read your message. Something like 1 cent. If the recipient doesn't mind your message then they don't redeem your offer and it doesn't cost you a thing. However, if you're sending spam then the recipient cashes it in (or perhaps it is used to cover overhead costs of this system).
If you send a legitimate email and somebody decides to be a jerk and cash it in then you're only out 1 penny. However, if you just sent 2 million of those unwanted emails you're screwed.
This is better than the "small price" schemes because it doesn't cost anything. Well, unless you're A) a spammer or B) sending email to dickheads.
This wouldn't replace SMTP, it would just be a layer on top. If you sent an email and you participated in this system then a third party would sign your messages and you'd get a special verifiable header that the recipient could then treat as "likely ham".
Anybody have a better idea? I didn't think so.
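For the "special verifiable header" part of the proposal above, a minimal sketch of signing and checking a tag over the sender and body. This uses an HMAC with a shared key purely as a stand-in; a real third-party scheme would presumably need public-key signatures so recipients can verify without holding the signing key. The key and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key; a real signing service would keep its key private.
SHARED_KEY = b"demo-key-not-for-production"


def sign_message(sender, body, key=SHARED_KEY):
    """Return a hex tag binding the sender address to the message body."""
    payload = f"{sender}\n{body}".encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_message(sender, body, tag, key=SHARED_KEY):
    """Return True if the tag matches this sender and body."""
    expected = sign_message(sender, body, key)
    return hmac.compare_digest(expected, tag)
```

The tag would travel in an extra header; a recipient whose check passes could treat the mail as "likely ham", and any change to the sender or body makes the check fail.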
It is a little thing that sits in your system tray. That said, it's just perl modules (I think) so it runs on other OSes too. That said, best thing I've found on Windows.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Then the spam wouldn't even be transported over the net, saving vast amounts of traffic on the internet backbones. This action could also potentially kill spam overnight.
Ever read the FAQs for the anti-spam lists/newsgroups? Virtually top of the list is "I have some magic bullet solution that'll end spam tomorrow!"
You are -truly- naive to think this kind of solution would even be possible to implement; there are literally dozens of reasons why this would be a horrifically stupid idea; how this post ever got to +5 is way beyond me. Time to start meta-moderating more, as apparently positive mod points are getting handed out a little too easily these days.
Please help metamoderate.
If you then check the link to CRM114's project, you'll find this: "I measured my own accuracy to be around 99.84%, by classifying the same set of 3000ish messages twice over a period of about a week, reading each message from the top until I feel "confident" of the message status, (one message per screen unless I want more than one screen to decide on a message.) and doing the classification in small batches with plenty of breaks and other office tasks to avoid fatigue. Then I diff()ed the two passes to generate a result. Assuming I never duplicate the same mistake, I, as an unassisted human, under nearly optimal conditions, am 99.84% accurate.)."
Given the amount of people who even read the article on slashdot I doubt anyone else is going to check the tiny [1] footnote and find this.
Introducing the new Occam Fusion! Now with sqrt(-1) fewer blades!
If this happened, there would have to be about 10 SMTP servers handling all the mail, the ones belonging to the major backbone providers. Otherwise, a spammer could purchase a T1 from a backbone provider and send out as much spam as he wanted. Almost all ISPs catering to end users have to get their connections from other ISPs somewhere along the line.
It might be sort of difficult to have 10 companies handle the Internet's email supply.
Why can't I get this to run on my WXP machine? I have XP Pro installed....
You linux geeks get all the good toyz!!
Darn you, Darn you to Redmond!
What do I get?
Well.. I guess I do get all the neat patches.
By the looks of the Intel story below, Slashdot sure needs a good Bayesian spam filter. I recommend this. Or a baseball bat. Because you can go over to anti-slash and really pound some skulls with a baseball bat, and it would probably be more satisfying. But filters are good too, don't get me wrong.
occultae nullus est respectus musicae - originally a Greek proverb
"Mmmm... pistol whip."
- Homer Jay
I have a large qmail system running vpopmail for virtual domains, qmail-scanner, SpamAssassin, and the clamav anti-virus scanner. SpamAssassin is slow and resource intensive, as well as not being that accurate. I would love to find a way to make DSPAM work with my setup, but perusing the mailing lists has been less than enlightening. The problem is that DSPAM is configured to take over the role of local delivery agent, and it knows nothing of vpopmail's virtual domains. Anyone else trying to get DSPAM working with qmail and vpopmail? I would love to get it working with qmail-scanner, too, so I can keep on using clamav, a great open source anti-virus scanner.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
No matter what technology it uses, neural nets, b-trees, recursion, tinkertoy logic, smell-emitting diode, leaky junction zener transistor, steam-powered aeolipiles, it only automagically presses delete, which is a pretty lame way of fighting spam.
It's a lame way of fighting spam, because we STILL have to pay for the fucking spam bandwidth; we STILL have to pay for the goddammed disk space used by the spam; we STILL have to pay for the bloody time lost transmitting the spam; we STILL have to pay for the extra ISP infrastructure to carry those spams.
Naaah. Spammers should be eradicated from the Internet, and the best way to do so is to completely BLOCK networks who host spammers (no matter what service), in order to force the collateral damage to whine to the ISP or simply vote with their feet.
YHBT
Server side works just fine for Windows users. All you gotta do is dump the spam into an IMAP spam folder, that way users can check for false positives. That's exactly what I do for my users. I also provide subfolders for spam and ham that they want trained to the filter. Pretty basic stuff, really. In case you're interested in an account check out http://ventoozer.com
... if there was some way to plug tools like this into Mozilla directly so that you could expand on its built in junk mail detection with something more powerful.
File under 'M' for 'Manic ranting'
except that my article history is truncated in a futile attempt to get me to subscribe. So I can't point to the writeup I did.
The increased accuracy comes from the emails that will slip under your mental radar. You are a human, and you make mistakes. You wouldn't deliberately choose to read the email, but one day the subject line looks plausible, and so you bring it up. Three-quarters of a second later, you're glaring at the monitor and hitting "delete", but DSPAM wouldn't have let that slip by in the first place.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
... Uh, right?
I don't get SPAM. I don't have SPAM filters. How is this possible? Simple. I create a different e-mail address for any new untrusted entity that I have to provide one for. In the beginning I took advantage of being able to alias all e-mail for non-existent mailboxes (basically, *) at my domain to my primary account. It seemed to me an obvious and simple approach. Whenever I needed to provide an e-mail address, I just made one up, and it was forwarded to my regular Inbox. In my opinion, at that time my ISP was more "sophisticated" than most. Since then I have moved to hosting all of my domains on my own co-located server which runs Exchange 2000, thus complicating things. Now I have to actually add any new aliases that I want to use into my user account. I know of at least one product out there that can handle non-existent addresses and forward them to a specific account, but it is rather expensive for a feature that should have been built in from the beginning (although I'm not aware if the new Exchange can do this out of the box). Not to mention that someone with the proper knowledge and skills could make a similar add-on in relatively short order, but who ever has the time? The point is that you have to consider when and where you give your e-mail address out, and the possible consequences therein. It's not altogether different from giving out your phone number (especially if you are unlisted) or even your SSN.
Server side filters do not generally block the email from coming to you like a virus filter might. They typically tag the message with some text that is consistent and you can filter on that using client side rules, if you wish.
It might add [[SPAM]] to the beginning of the subject if it thinks that the message is spam. It leaves the ultimate decision up to the user how to deal with it.
It does not ever block any mail from coming to you.
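A client-side rule on such a tag is simple to write. A minimal sketch using Python's standard `email` module, assuming the server prefixes the subject with `[[SPAM]]` as in the example above; the folder names are hypothetical.

```python
from email import message_from_string

SPAM_TAG = "[[SPAM]]"  # tag assumed to be added by the server-side filter


def target_folder(raw_message, tag=SPAM_TAG):
    """Pick a folder for a raw RFC 822 message based on the subject tag."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    # Tagged mail is filed, not deleted, so false positives can still be found.
    return "Junk" if subject.strip().startswith(tag) else "Inbox"
```

Nothing is discarded here: tagged messages only land in a different folder, so the user keeps the final say.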
----------
If your answer is Microsoft, you obviously didn't understand the question.
Everyone would fudge refusals and pocket the cash.
Scumbags would use billions of zombied PCs to send themselves mails, aggregate and pocket the cash. Or to spam you gratis.
There are transaction costs for generating, checking, and accumulating digital cash. Your paypal bills would be huge.
Everybody hates micropayments.
It's a dumb idea and it simply isn't gonna happen.
Think how easy what you wanted would be if "/." was NNTP based.
In short, "accuracy levels as high as 10x that of a human" is meaningless as stated. And I took the opportunity to make fun of it.
The best windows spam filter is K9. Popfile is ever so slightly more accurate, but since it's written in perl instead of a compiled language, it takes up loads of memory and is slower. I get 99.98% accuracy with K9, and I get a lot of spam-- 352 per day to be exact, 5.94% of my total mail volume.
Looking for spam by content analysis for a single user only works for some people. If, for example, your legitimate E-mail contains many messages about investments, mortgages, and similar financial subjects, it's going to be hard to separate out financial spam by word analysis.
Spamcop does multiple-user analysis. It works better than most of the single-user systems.
The magic number 99.84% is one that is sometimes invented as an arbitrary example, meaning roughly "very close to all." It's a sort of joke about false precision. Whether or not Bill Yerazunis is using this number in this fictive sense, it is IMPOSSIBLE for his diff() to actually be exactly 99.84%!
If one message out of 3000 messages differs in classification, that's 0.0333%. Or 99.9666% accurate. Working down, we find that four or five misclassifications are either 99.8666% or 99.8333% respectively. Both are certainly in the same ballpark as the stated accuracy, but neither is correctly rounded to Yerazunis' number. To me, this pretty much proves that the pseudo-exact figure is used in a fictive sense, not as an actual measurement.
It *IS*, however, true that a person being careful will make occasional errors nonetheless.
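The arithmetic above is easy to check mechanically. A small sketch, with hypothetical function names, that computes the accuracy for a given number of disagreements and lists which error counts would round to a target figure. Values are rounded rather than truncated, so the last digit can differ from the figures quoted above.

```python
def accuracy_percent(errors, total=3000):
    """Accuracy in percent when `errors` of `total` messages are misclassified."""
    return 100.0 * (total - errors) / total


def reachable(target, total=3000, places=2):
    """Return the error counts whose accuracy rounds to `target` at `places` decimals."""
    return [
        k
        for k in range(total + 1)
        if abs(round(accuracy_percent(k, total), places) - target) < 1e-9
    ]


if __name__ == "__main__":
    for k in (1, 4, 5):
        print(k, round(accuracy_percent(k), 4))
    print(reachable(99.84))
```

With exactly 3000 messages no whole number of errors rounds to 99.84%, which matches the point made above. The quoted description says "3000ish", though, so the total is left as a parameter: with 3125 messages, for instance, five errors give 99.84%.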
Buy Text Processing in Python
Yes... rainwater Jack....
What about the vast majority of e-mail users who have Outlook [Express] on Windows. When will a plugin be designed and ported which will work with these clients?
-- paper
SPAM is not the money maker. The people embracing SPAM are the companies that sell computer goods. How many people do you know that get fed up with the state of their computer and just go out and buy a new one or a new OS? Or maybe a "Virus/Popup/Spy/Other" software package. Why do you think it takes so long for M$ to fix the problems with Outlook? Now you want to make money? Get into the game and start selling goods to fix all those "broken" systems out there. Don't complain about SPAM; Embrace It!
I don't want a pickle; I just want a Motor-Cycle! A four foot cop arrived with a five foot gun!
Not to underestimate the effort, but with extensions this has got to be easier than I think it is. Ruven Gottlieb's Purity-of-Email project is out there to integrate Mozilla mail with CRM114.
The boxes are compromised anyway. But most of the time they contain an IP address with further information, or order form, or such. There's always some connection to an order form, or a point of sales.
Share those points. Like, share the domain name newmedformula.com, which is a spammer joint. If you can get the domains cancelled, it'll hurt the spammers most.
Amen brother.
There are several scenarios where your proposal would be bad for the Internet. Say I want to put my competitor out of business, or at least raise his costs. I simply use a bot to sign up for a couple hundred thousand email addresses, sign up for his newsletters, then ask for all those 1 cents back. The financial powers that be might also foresee too much liability and risk in ventures that depend on email (since it is, as you say, gambling). Thus the end of any free service that depends on e-mail for verifying accounts including newsletters, bulletin boards, online banking, and online auctions among others.
Furthermore, you'd have to have a foolproof system to pay for those cents. Fraud could be much more rampant: if you pay via credit card, the other guy (or gal) has your number and could overcharge a corporation by twenty or so dollars. Micropayments also aren't economical unless a great many people pay. Even if most people play by the rules, the costs charged by credit companies, banks, or other institutions would either put most of these services out of business or push them into subscription-only domains. Not to mention some companies might have "you agree not to ask for those cents" in addition to "I can send you spam" legal clauses - negating your proposal!
It is impossible to enjoy idling thoroughly unless one has plenty of work to do.
- Jerome Klapka Jerome
DSPAM compiled under Cygwin just fine
I don't think micropayments suck. Having a micropayment system would offer many interesting possibilities: think about receiving 0.1 e whenever you're forced to watch a banner. 0.1 e whenever you're forced to see a pop-up ad. 0.1 e whenever you follow a sponsored link in the internet. 0.1 e whenever your precious time has been "bothered" in some way. Receive micropayments whenever you're forced to see/hear/consume something you don't care about. You could take micropayments from people who are downloading some home-made background pictures from your site, and so on.
Micropayments would enable these kind of things, but of course they wouldn't create such a system all by themselves. You would need some technical solutions as well.
But I agree, the idea was "daft" due to the human factor involved in it. You just can't rely on humans to be a mechanical part of a big machine, they don't work that way. They make mistakes and they bend the rules, if not discard them altogether.
I do not moderate.
If, for example, your legitimate E-mail contains many messages about investments, mortgages, and similar financial subjects, it's going to be hard to separate out financial spam by word analysis.
The trick is, don't send all your mail to one mailbox. Many/most of us do get email about investments. Many/most of us also have reason for a publicly viewable email address.
But there is no reason why your financial institutions need to have your public email address.
There is no reason why the public needs to have the address you use for financial matters.
The financial institutions that I do business with are given distinct private email addresses. These addresses are never used publicly so they never get spammed. No filtering necessary.
On the other hand, the address I use for Usenet is only used for Usenet. People and companies that I have business relationships with do not send me mail there. Any financial email received at the Usenet address can be safely filtered out as it will always be spam.
Micropayments fail because humans think in integer math. Humans can't casually maintain a rolling tally of floats, so they can't do economic calculation with micropayments. A few are meaningless, enough will add up, how many is enough? How big are my bills growing? Will I get a nasty surprise? Had I better cut back on sending mails? Do I need to browse banners to pay for my mails? How many banners pays for an email?
Bah. Who wants that grief? Charge in bulk and in sensible denominations, or not at all.
Went so far over my head I hurt my neck as it passed over me. Spamassassin was not that complex.
sparkeyjames
Well if you're young enough? You can be a whipper-snapper.
Let's see, how many different email boxes do you have to manage? I have one. Mozilla handles all of my spam removal needs. Have you needed to have your penis enlarged today? I haven't.
Sparkeyjames
As a further note, the best technology is to use spaminator.com. When you encounter a website that asks for your email address, why give it one to send spam to that you have to clean up or leave to rot? Try this..... whateverthehellnameyouwant@spaminator.com.
It dumps the email data and address database every 5 hours. Fun stuff.
Sparkeyjames
You *WILL* get spam my friend. I've been doing this for almost 20 years (admin) now -- and have specifically used aliased accounts for various reasons over the years as you are doing.
... it's only a matter of time before you're screwed.
:). Bill's idea of email stamps, well, hahahahaha...
:).
:)
Wait... You'll be interested to know that the biggest problem with the spam coming in comes from virus infected Windows boxes. They send it. They harvest the users Outlook address book. If you ever end up in somebody's Outlook box
I chuckle at the whole Exchange thing. You pay for that?
I personally pay to have a fixed IP @ home and run an old Linux box. A lot of aliases I've used over the years (and some blatantly used to harvest) all go to some local account that processes the spam. Upon receipt -- mail the wrong account and sorry, but you're blocked (unless white-listed). White-listing can come from valid already received email -- but I work everything based off of IP. My hope is that the registered MX host(s) or any valid listed server by the authenticating DNS server will be the type of scheme that's re-implemented (or more to the point SHOE-horned in real soon
Over the last decade I've now got 380 aliased harvesting spam addresses in use -- two valid email accounts @ home (my wife and myself) which is on my own IP with my own domain. I pay $5 extra a month above my broadband (10Mbit [yeah, solid] wireless) -- how much do you pay for that Exchange box?
I've run this type of setup through many offices scaled to dozens of email servers -- and the beauty is they also talk to each other sharing block/white-listed address' as needed. Wait -- you will get spam. Filtered through my account to I'm seeing 80 something that got in -- 2,164 blocked IP's [today], 380 harvested address', and 48 for various other infractions (attempts to relay through me, from a country where I know nobody, etc
Statistically (yeah, they all get nmap'd back)? 96% Windows based.
I give my email to friends. I have a work email that anybody that knows how to call me can have it. I even print it on my business card. No, I wouldn't post it to USENET or even here -- but it's still "out there". My unlisted phone number, OTOH, anybody can have. 847.854.0048. It's always busy and one channel of my ISDN home line. The other channel routes to the house for two phone lines (or Internet backup if and as needed) and is automatically unlisted and unpublished (at no cost since it is a "data circuit") -- and no, I'd rather not post that either.
Exchange? Never!
In this case, I have one. One e-mail box to handle a multitude of addresses. Yes, just one. All coming into Outlook Express believe it or not. Perhaps you misunderstood the premise of my post. I don't get SPAM; I simply don't receive it on a regular basis. I am very careful about where I give my e-mail addresses out. Since I tend to use a different address for each service, if I ever receive unsolicited mail, I simply delete the problem address from my list, and the problem is gone. No filtering. No chance of missing an important message from someone I know by accidental deletion.
After reading your explanation, I looked at the docs a bit more. A stock install of DSPAM on Linux will use sendmail as the LDA. On my system /usr/bin/sendmail is actually a link to /var/qmail/bin/sendmail. There is no manpage on this program. I assume it takes a message on stdin and puts it into qmail's delivery queue. Doesn't this form a loop? And what is the "HIDDEN" part for?
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
You are -truly- naive to think this kind of solution would even be possible to implement
Don't be so stupid. This solution would be entirely possible to implement. Large email providers - such as Hotmail, Yahoo mail and Earthlink - already have spam filtering controls for inbound email. It's not much of a leap to filtering outbound, especially considering that providers as large as Earthlink already have port 25 blocking in place (mail goes through their servers, just nothing intelligent happens to it on the way).
The big question, and the one that you should really have emphasised rather than throwing a temper tantrum, is this: would it work, and would it be effective?
Would it work? Yes. It's using the exact same filtering methods that have been invented for client-side - just doing the filtering in a different place - and pushes the responsibility to not send spam back onto the sender. If the scary filtering works for receiving, it should work for sending too.
Would it be effective? Indeterminate.
The biggest problem I can see is that it could break the feedback loop on what is spam. At some point the user needs to say "yes, this is spam", or "no, this is not spam". I'm not sure how that piece of the puzzle would be solved - but technology for the solution is a damn sight more complete than your blanket "it's absolutely impossible, the world is flat" statement.
And once completely working, it would definitely prevent abuse of any systems that it was attached to - and probably save the large networks time and money that they'd otherwise perhaps spend chasing spammers down.
If the idea became popular, there would undoubtedly be occasional problems with false positives - but if it's on the sending side, the sender can be notified immediately by the server giving an error message back before the client has even finished sending the message.
It would then be up to the smaller ISPs to implement the solution themselves - or face slowly being cut off from everyone else who disallows spam onto their networks.
The post above is mine, my login must have been dropped.
Now I have to actually add any new aliases that I want to use into my user account.
The best solution to this is to use a prefix with an asterisk. I set up david.*@endeavorcomputing.com as one of my addresses. I stuck the applicable site name in place of the * when signing up for accounts. This routed all mail that fits the template to the right address and allows you to create new addresses on the fly without updating your aliases.
Just make sure that you don't set the dynamic alias as your primary address. If you do, your outgoing mail will be messed up, as Exchange uses your primary address as the originating account and reply-to address.
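The template idea above can be sketched in a few lines of Python. This is only an illustration of matching a wildcard alias and recovering the site tag, not how Exchange implements it; the `ALIASES` table, the `david-inbox` mailbox name, and both function names are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical alias table: wildcard template -> destination mailbox.
# "david-inbox" is an invented mailbox name used only for illustration.
ALIASES = {
    "david.*@endeavorcomputing.com": "david-inbox",
}

def route(recipient: str) -> Optional[str]:
    """Return the mailbox whose template matches the recipient, else None."""
    addr = recipient.lower()
    for template, mailbox in ALIASES.items():
        if fnmatchcase(addr, template):
            return mailbox
    return None

def site_tag(recipient: str, prefix: str = "david.") -> Optional[str]:
    """Return the part that replaced the '*', i.e. the site the address was given to."""
    local = recipient.lower().split("@", 1)[0]
    if local.startswith(prefix):
        return local[len(prefix):]
    return None
```

If spam ever arrives at an address whose tag is, say, `slashdot`, the tag tells you which signup leaked it, and that one address can be blocked without touching the template.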
I think parent is talking about Outlook Express 6.
----- Question authority, but not ours. Hate the man, but we're not him.
It will intelligently delete "nospam", "no-spam", "remove this", "removethis", etc. (case insensitive).
It will also look at the strings for alphanumerics, and presume that the first special character (or any instance of "AT") should be @. Commas or spaces will be converted to dots (along with any instances of "DOT" or "DAHT").
And then I'll move on to other slashcode sites, since the migration will be ridiculously easy. I'll harvest industry-specific e-mail addresses, sorted by site.
Does that make me evil?
Self-referential sigs are rarely entertaining.
Does anyone have a recipe for integration of postfix, dspam and clamav (or other open source virus scanner), similar to the way amavisd and mailscanner work with SpamAssassin and a virus scanner of choice?
RG
Thanks for the tip. I'll try this modified approach out and see if I can get it working. Up until now I thought any wildcard aliases in Exchange would be ignored, as that has been my experience, at least with 2000 and prior. I'm still somewhat skeptical, but you never know!
Aliasing * can be a bad idea. Wait until some spam schmuck uses SomeName[n]@yourdomain.com and you start getting hundreds of bounced mails. You can't even point toward /dev/null because they'll use a multitude of bogus names in front of the @ sign.
Specifying aliases is a pain in the ass but much safer.
Well first off /dev/null is no longer an issue for me since I now use [gasp!] Exchange server. Secondly you may have a point there. Perhaps Microsoft may have indirectly made my life easier because I could no longer accept any random alias at my domains by default without specifically adding it into my account. Food for thought, eh?
The "10x better" means a 10x lower failure rate. The wording almost seems meant to deceive. The idea is that if you misidentify 10 messages out of 100, the filter would only misidentify 1. Since you made 10x as many mistakes, the filter was 10x as accurate as you were.
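To put numbers on that reading, here is a small sketch using the 10-in-100 versus 1-in-100 figures from the example above. The variable names are mine, and the figures are the illustrative ones from this comment, not DSPAM's measured results.

```python
from fractions import Fraction

# Illustrative figures from the example above: mistakes per 100 messages.
human_error = Fraction(10, 100)
filter_error = Fraction(1, 100)

# "10x" compares error rates: the human makes ten times as many mistakes.
error_ratio = human_error / filter_error

# Comparing accuracy instead (the share classified correctly) gives a much
# smaller multiple: 99% versus 90%.
accuracy_ratio = (1 - filter_error) / (1 - human_error)

print(f"error ratio: {float(error_ratio):.1f}x")
print(f"accuracy ratio: {float(accuracy_ratio):.2f}x")
```

Exact fractions are used so that the ratios come out without floating-point noise.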
The problem with that is that "Spam" is defined by humans and not computers. "Anti-Spam" software is programmed to *try to* filter out what the human would consider Spam.
So if a human says Email X is Spam, then it is that human's Spam. But then again, one man's Spam may be another human's newsletter he subscribed to (as I learned when I installed SpamAssassin at the company where I work).
The bottom line is, "No software can ever be better than a human in defining Spam".
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
No offense, and this may be a dumb question perhaps, but how is this Insightful?
Why not just enter "madeupname-neverwillexist12321@92843.32176321.not", for example? This isn't even a valid email address. To my knowledge, ".not" is not a valid Top Level Domain (TLD), so therefore anything entered in that "domain" is not going to resolve anywhere. Now the argument could be made that the root servers have to pay the price for the initial lookup of this non-existent domain name and then VeriSign will want to pop up a for-sale page, but barring that, the cost to anyone is non-existent.
Why send the e-mail somewhere that is just going to dump it 5 hours later? What is the advantage, and why bother?
At the dawn of the 21st century, spam fighting AIs became self-aware. Unknown to their meat based owners they started communicating amongst themselves, thus forming a giant world spanning compu-global-hyper-mega net. Its main goal: to eradicate spam. After about 42 microseconds it came up with The Solution: eliminate meat based lifeforms. After poisoning the water supplies with a lethal dosage of sildenafil citrate its job was done.
I've looked into popfile. Isn't it pretty much just bayesian filtering (with more than the basic two spam and non-spam corpuses)? Is it better than mozilla mail? Mozilla mail is a hell of a lot better than nothing at all, but my experience has been a lot less perfect than some others have reported. I suspect popfile would be the same. Unless I'm missing something?
WWJD? JWRTFA!
Fantastic. I've been looking for something exactly like this for syslog monitoring! I thought I was going to have to write something myself.
When you've got several hundred systems from different OS platforms all logging to a central log server the conventional log monitoring software is just not up to the task of discriminating important logged messages from unimportant.
Government of the people, by corporate executives, for corporate profits.
Here's why "10x as accurate as human" is meaningless: Statistical filters are trained by human input. If the human input is only 99.84% accurate, then you cannot trust the filter to do any better.
That goes with a caveat: If the human classification mistakes are random, then it is possible for the filter to do better. But if the mistakes are systematic, e.g. if you trust all messages containing the text "slashdot.org" to be ham, then that dire mistake in the input will carry over to the output. Garbage in, garbage out.
/A
The filter was tested on 6597 messages. So how many messages was it trained on? I sure hope it's not the same 6597 messages, because in that case any accuracy number is meaningless.
/A
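For what it's worth, the usual way around the concern in the post above is to hold some messages back and never train on them. A minimal sketch, assuming a corpus of the same size as the one mentioned; the function name, the 20% split, and the placeholder message IDs are my own choices, not anything taken from the DSPAM test.

```python
import random

def holdout_split(messages, test_fraction=0.2, seed=0):
    """Shuffle a copy of the corpus and split it into (train, test) lists."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder corpus: 6597 message IDs standing in for real mail.
corpus = [f"msg-{i}" for i in range(6597)]
train, test = holdout_split(corpus)

# No message may appear on both sides, otherwise the measured accuracy
# only shows that the filter remembers what it was trained on.
overlap = set(train) & set(test)
```

Accuracy would then be reported on `test` only, after training on `train` only.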
Even if only for servers to keep open relays out of the loop, it may be time to mandate third-party trusted ID certs (ala SSL) for mail servers. It's proven too difficult to get most people to digitally sign their mail, but admins should be clueful enough to generate certs and have them validated externally...
TMDA.net makes a server to do exactly this: generate one off or expiring email addresses. You can install it on your mail server. May require Linux/Unix.
SpamGourmet is a free service that generates and handles these email addresses for you (if you do not have your own mail server).
If you are stuck on MS Windows and want to use your own mail server, MailEnable is free beer and allows catch-all addresses (all mail in a domain that isn't assigned to a specific email account goes to the catch-all account). There is also a professional version that supports web mail and other useful goodies.
and the same can be said of everyone who has been deliberately or randomly Joe-jobbed, and hence their friends.
Now if they could only make it usable. After reading the last Slashdot article about it I decided to try and move my Amavis/ClamAV/SpamAssassin/Postfix/Courier-IMAP setup to use DSPAM. Good Lord what a configuration nightmare. I couldn't find a decent HOW-TO and no real working example configurations in order to test it out. Sure the README "has all the information I'll ever need" but some of the stuff that it talks about I don't understand and I don't have the patience to configure it through trial and error.
Developing good software is one thing. But it's a lot nicer when good software is actually usable. I'll be sticking with SpamAssassin until they can dumb it down a little.
I have a copy (dspam-2.10), and I'm certain that others do as well.
This looks interesting - for me especially how they've already got a system in place to automatically learn ham/spam by simply forwarding a message to a predefined email address (which apparently uses some sort of embedded "bug" to track it so it doesn't matter if the user's MUA forwards headers correctly).
But my main concern is how well the described "Global Filtering" works with users who have no ham/spam corpuses built up yet. SpamAssassin still works reasonably well (eg, catches roughly 60-70% of spam) with no Bayesian stuff going on (just evaluating email on rules alone). Can DSPAM work equally well?
THAT, my friend, is the point. If you give it a fake email address, it won't get to your mailbox, and you won't get the login info, password, coupon code, download link or whatever it is you want from the site.
There's still something wrong with this, though. Spam is what I say it is. How can any algorithm know whether the message I received is unsolicited or not?
If I say it's SPAM, it's SPAM. If I say it's not SPAM, it's not SPAM. No filter can possibly be better than I am, and I don't want any filtering software claiming that it knows better than I. A personal message from a friend is still a personal message from a friend even if the subject line is "Hi" or "I love you."
"How to Do Nothing," kids activities, back in print!
www.mailwasher.net. That's the tool I've found most effective for windows users. YMMV.
You don't have any public email addresses. But if you want the general public to be able to easily email you, your "they don't know my email address so they can't spam me" system doesn't work. If you register a domain, you have to make an address public - and it will get spammed.
Hiding from spammers will certainly limit how much spam you get. But it has other drawbacks. Some people need to have a public address, one that doesn't change every week, and they need to be able to find the legitimate mail sent to that address.
qmail has had the ability to handle this aliasing for many years. You just make up an address like realaddress-something@ where the -something part can be whatever you like, and it gets delivered to realaddress@.
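As a rough illustration of the address convention described above (this is not qmail's code; the function name and the `example.org` domain are placeholders), splitting such an address looks like this:

```python
def split_extension(address: str, sep: str = "-"):
    """Split 'base-ext@domain' into (base, ext, domain).

    ext is an empty string when the local part has no extension.
    This sketch splits on the first separator, so a base name that
    itself contains the separator would need different handling.
    """
    local, _, domain = address.partition("@")
    base, _, ext = local.partition(sep)
    return base, ext, domain

base, ext, domain = split_extension("realaddress-something@example.org")
```

Mail is delivered to the `base` mailbox, while `ext` records which made-up address the sender used.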
I purposely have a couple spam E-mail collection addresses at the bottoms of all my websites (see link above). They are routinely collected by spam bots. If I receive a message to one of those addresses, the E-mail is not only auto-reported to Spam Cop but the sender IP is temporarily blacklisted on my mailserver.
- Michael T. Babcock (Yes, I blog)
What would work well is SSL-certified SMTP relays. If every valid SMTP relay needed an SSL certificate, then if spam was sent through it, its certificate could easily be rejected. And hosts that didn't have one at all could just be dropped.
SSL certificates are costly, and that keeps everyone from having one. However, there is no reason the Open Source community could not make up our own root certificate and have an SMTP SSL certificate signing organization, where we verify the authenticity of someone before we give them a cert, for a small fee to cover costs. It wouldn't be like we'd have to convince Netscape, Microsoft, Apple and whoever else makes a browser to include the cert. It'd just need to be available for people hosting servers to download.
Yes, this would mean rejecting massive amounts of email to begin with. Maybe some interim solution could be thought of as people move over to it?
Ideas? Comments?
> Since then I have moved to hosting all of my
> domains on my own co-located server which runs
> Exchange 2000, thus complicating things.
LOL! That's the best summary of Exchange I've read in a long time!
"And the meaning of words; when they cease to function; when will it start worrying you?"
I recently started using bogofilter as a replacement for spamassassin. The reason for doing this was curiosity and the fact that the spamassassin regex process will always be following the spammers, not preceding them. The result is that packages supplied by distros are quickly outdated and ineffective.
I have been using bogofilter for one month and have trained it to such a point that my weekly spam misidentification is well below 0.1% with proper training and configuration. And its processing time is well below 1 second per message on a VIA EPIA 533 cpu (slow, ok?).
The net outcome of this is that I have found something which is highly adaptive to new spam techniques, extremely effective, very fast and light on resources, and is at the point now where it just works.
The idea that they, DSPAM, will provide you with a pre-defined training set is damaging. What if you are an oral surgeon? You'll never get any email!
I've been working intensively on spam and have come to a few conclusions about spam filtering and such that I just have to share.
It will never go away. Even if you can properly regulate and control it, spam will never go away. No matter what anyone does. If the US constitution is to remain intact you can't remove spam, just as we haven't been able to remove advertisements from radio, telephone, or television. And just like you can't get rid of pornography. It's all Free Speech.
It's also carrying a lot of money.
What will happen is that corporations, in the name of reducing spam, will lock up mail servers such that you have to pay them a service fee to send email on top of the connection fees you pay today. Microsoft's recent movement into the arena shows that there is a motivation to make money out of spam/email.
In a few years, we'll pay for our email and we'll still get spam.
No, I have been looking at dspam as a replacement for spamassassin and no it currently can't modify the subject.
It does add headers to mark messages as spam, which should be usable with any decent e-mail client (i.e., not Outlook Express).
On the flip side it does modify the body of the message to add a unique id for training purposes. The unique id is something like "!DSPAM:515511e1266781311173362!". It comes out looking like a signature. It is somewhat ugly on html mail, since often there isn't a line break at the end of the html, which results in it being appended to the last line as if it were part of the sentence.
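If the tag bothers you when viewing mail, something like the following could pull it out or hide it. This is a sketch that assumes every tag looks like the one example quoted above (an alphanumeric id between "!DSPAM:" and "!"); I have not checked that against the DSPAM source, and the function names are mine.

```python
import re
from typing import Optional

# Assumed tag shape, based only on the single example quoted above.
SIGNATURE_RE = re.compile(r"!DSPAM:([0-9A-Za-z]+)!")

def extract_signature(body: str) -> Optional[str]:
    """Return the id inside the first DSPAM tag found in the body, or None."""
    match = SIGNATURE_RE.search(body)
    return match.group(1) if match else None

def strip_signature(body: str) -> str:
    """Return the body with any DSPAM tag removed, for display purposes only."""
    return SIGNATURE_RE.sub("", body).rstrip()
```

Only strip the tag from what you display; a message forwarded for retraining presumably still needs the tag intact.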
Havoc Pennington, the bane of my Linux desktop.
neverwinter runs on linux also...
as for mozilla, you must teach it more... the more spam you show it, the more efficient it will be... this takes time...
but learn something: there isn't a perfect filter, there will be spam that will reach the inbox, there will be valid email in the spam folder
if people can make mistakes flagging spam and valid emails, how could the machine do a lot better?!
higuita
That was the point.... Quake3 and nwn are just about it for linux gaming.
> as for mozilla, you must teach it more... the more spam you show it, the more efficient it will be... this takes time...
I actually saved over 5000 spam messages and over 2000 good messages before I ever switched to Mozilla and I used those to train it when I made the switch.
> but learn something: there isn't a perfect filter, there will be spam that will reach the inbox, there will be valid email in the spam folder
Yeah, I don't need them to be perfect at catching spam, I was just saying that the effectiveness I've measured is far below what some others have reported. I do need them to be perfect at letting my real mail through.
WWJD? JWRTFA!
I can always delete spam, but I can never recover a lost e-mail that I didn't even know existed.
LordBodak's journal.
Yes, this is one of the exceptions to my argument, however I was only talking about how I personally managed SPAM for my private and privileged business mailboxes.
By no means do I change my primary e-mail addresses regularly, in fact I still have the same addresses from when I originally created my domains. I only make up a new address when I come across an untrusted site (ie: almost anyplace). Sure, if one of my contacts decided to be malicious and submitted my e-mail to some unscrupulous places, I would be in a world of hurt. So far I must have been only dealing with generally good people because evidently no one has done this.
I have also received very limited e-mail and snail-mail advertising on my published domain account registrations, but it amounts to almost nothing. Why? I don't really know. Most of it has come from the registrars for the individual domains. I have some domains which have been on file with the same registrar for at least 6-7 years; however, I do shop around a lot and switch often with new registrations because I have quite a few dozen domains and am always looking for a competitive price and the best service. Think about it, for example: 100 domains at VeriSign prices (NOW only $25/year; used to be $35) = $2500, vs. other providers at as little as $8/year = $800 (and even less, but I have not been pleased with the super cheap places I've tried so far, such as sub $7).
You cannot directly control SPAM if you have to publicize your address, although there are some not-so-elegant steps you can take to minimize the effect, such as showing the address as an image, but this is of course not practical (at least today) when doing business on a large scale.
I don't think that most technogeeks who complain about all the SPAM in their inboxes need to publicize their e-mail address though. Even though larger entities have no choice but to do this, most individuals do not.