Armoring Spam Against Anti-Spam Filters
moggyf points to a BBC article about how spam can be successfully tweaked to slip past current filtering methods, excerpting "To find out how to beat the filters Mr Graham-Cumming sent himself the same message 10,000 times but to each one added a fixed number of random words. When a message got through he trained an 'evil' filter that helped to tune the perfect collection of additional words."
iluvspam adds "It's an interview with POPFile author John Graham-Cumming that summarizes his talk at the recent MIT Spam Conference. You can still listen to the technical details here (choose the Afternoon 1 session, he starts about 75 minutes in)."
So the ultimate spam protection mechanism would be an infinite number of monkeys typing my list of words to associate with spam. :)
Yep, I never spell check.
More incorrect spellings can be found here.
The bad news for spammers is that this flaw in filtering systems is not easy to exploit and can be combated. The cat-and-mouse game... Find the "ham".
But how do you combat someone that essentially has your "ham"?
I'm not sure this is a project I wish to encourage, really, although I'm sure there are plenty of spammers out there already doing similar things, which renders it kind of academic.
-1, "1337" speak
I will pay $1000 to anyone who seeks out and beats the living daylights out of a spammer. With as many pics on the web as possible for posterity.
Screw these filters and shit. Start creaming spammers worldwide and they'll think twice about it.
Tom
Someday, I'll have a real sig.
POPFile, maintained by John Graham-Cumming, is the best spam filter I've used. There may be small flaws with the fundamental concept of Bayesian filters, but POPFile still blocks all my spam.
graham-cumming?
he could be the king of spam, and he might as well go for it. i mean, with a name like that, he probably gets filtered out half the time anyhow.
** Chigusaaa!!! You're the coolest girl in the WORLD!!! **
It's unfortunate that spam must be lucrative enough that one man will send himself the same message 10,000 times and train an evil filter! We need to get people to stop buying products advertised through spam (granted, easier said than done), as in the end, it's the financial incentive that makes a spammer spam. :(
libertarianswag.com
Didn't they know something as simple as...
"Make it idiot-proof, and someone will make a better idiot"
As technology gets more complicated, so does the spam. The only way to protect yourself is to not give out your address. Period. Heck, I don't even give my work e-mail address to my parents.
I don't mind him trying to defeat the filters, if it comes up with a method of improving them, but the BBC should be shot for including the words that made it through.
Guess which words all of tomorrow's spam will contain...
I've never shoed a horse, but I once told a donkey to piss off!
Mozilla's filtering catches most spam for me, but some gets through. However, the only one that actually fooled me was quite a sneaky one, with the subject "RE: Question from E-Bayer" or whatever the actual subject line is when someone asks about an item you're selling on eBay. Given that I sell on eBay, the spammers must have gambled that enough people would read the subject and deem it worth looking at.
That said, your sig reads like a Nigerian scam.
It's clear now. They must be killed. If they can't be bothered to respect the fact that people don't want to be bothered, then we the many cannot be expected to hold true to our part of the covenant.
I propose we insert powerful electrodes into their rectums and electrocute them, then skin them alive and make jackets, sporting goods and chamois. If these products should prove unpopular, I propose we just make special-purpose chamois out of their skins for the exclusive purpose of cleaning proctology instruments and peepshow surfaces.
I hate to see mainstream media coverage of this practice. I have started to get a lot of these spams lately.
Typically they include a large image at the top, which is the entire intended content of the message, and then a bunch of dictionary words at the bottom. It's basically impossible to filter these out unless you filter out ALL HTML e-mail, because they don't contain any typical spam text.
if Message header = "type = text/html" then send to "Spam"
:)
It works a treat
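For anyone who wants the rule above as something that actually runs, here is a rough Python sketch using the standard library's `email` module. The function names and the "Spam"/"Inbox" folder names are made up for illustration, and like the original rule it will also bin legitimate HTML mail.

```python
from email import message_from_string
from email.message import Message


def contains_html(msg: Message) -> bool:
    """True if any MIME part (or the message itself) declares Content-Type: text/html."""
    # walk() yields the message itself first, then every sub-part of a multipart message.
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def route_message(raw: str) -> str:
    """Return the (hypothetical) folder name a raw message should be filed in."""
    msg = message_from_string(raw)
    return "Spam" if contains_html(msg) else "Inbox"
```

A message with no Content-Type header at all defaults to text/plain in the `email` package, so plain old text mail lands in "Inbox".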
The other trick I have found useful is the CamelCase nature of my name - spammers tend to mail me either as skarcher or SKARCHER, and both trip filters on my mailbox.
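A rough Python sketch of the CamelCase trick, assuming the real spelling is "SKarcher" (a guess; substitute your own name). It only looks at display names or greeting text, since the address part of an e-mail is normally lowercase anyway. The helper names are hypothetical.

```python
import re
from email.utils import parseaddr

REAL_NAME = "SKarcher"  # hypothetical CamelCase spelling; use your own


def miscased(text: str, name: str = REAL_NAME) -> bool:
    """True if `name` occurs in `text` in any casing other than the exact one."""
    hits = re.findall(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
    return any(hit != name for hit in hits)


def to_header_is_suspicious(to_header: str, name: str = REAL_NAME) -> bool:
    """Check only the display name of a To: header such as 'SKARCHER <addr>'."""
    display_name, _address = parseaddr(to_header)
    return miscased(display_name, name)
```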
An infinite number of monkeys will eventually come up with the complete works of
All spammers have to do is read this analysis of the filter, then include the weighted non-spam strings while avoiding the spam-weighted strings. Pretty simple to blow past their filter.
off
...if his surname weren't Cumming. At least his first name isn't Richard.
A foolproof anti-spam method is to reply to each piece of email sent to your account, asking the sender to validate themselves with you. This would only be necessary for senders whose addresses have not yet been validated. This would essentially stop spam dead.
Sure it's a little awkward, but picking through your email for that valid email amongst the spam is even more so.
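A minimal in-memory sketch of the validate-once scheme described above, in Python. Everything here (class name, token format, the "deliver"/"held"/"challenge" labels) is an assumption for illustration; actually sending the challenge e-mail and matching the reply is left to the caller.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass
class ChallengeResponse:
    """Minimal in-memory challenge/response gate (a sketch, not a mail server)."""
    validated: set[str] = field(default_factory=set)
    # token -> (sender, messages held until that sender answers the challenge)
    pending: dict[str, tuple[str, list[str]]] = field(default_factory=dict)

    def receive(self, sender: str, body: str) -> tuple[str, str | None]:
        sender = sender.lower()
        if sender in self.validated:
            return "deliver", None
        # Reuse an outstanding challenge so repeated mail doesn't trigger new ones.
        for _token, (who, held) in self.pending.items():
            if who == sender:
                held.append(body)
                return "held", None
        token = secrets.token_hex(8)
        self.pending[token] = (sender, [body])
        return "challenge", token  # caller e-mails this token back to `sender`

    def answer(self, sender: str, token: str) -> list[str]:
        """Validate the sender if the token matches; return the released messages."""
        entry = self.pending.get(token)
        if entry is None or entry[0] != sender.lower():
            return []
        del self.pending[token]
        self.validated.add(sender.lower())
        return entry[1]
```

Once a sender answers, later mail from the same address is delivered without any further challenge.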
If you've whitelisted your email, that crap won't get through unless the sender is on the whitelist, regardless of your Subject line. Same story if you do challenge/response, for that matter. Or you can munge, as I do.
I still say spamming needs to be a felony, though.
This post made with the Dvorak layout.
"Friends don't let friends use QWERTY"
Armoring Spam Against Anti-Spam Filters
That description sounds too noble for an activity like this. More appropriate headlines would be Making Spam Slick as Owlshit or Infusing Spam with Satanic Strength.
The coolest voice ever.
dammit slashdot ate my {/sarcasm} tag!
ah well.
When I was on holiday in Tunisia, we were bothered quite a lot by trinket salesmen who would not take no for an answer. Initially we had a lot of difficulty getting rid of them because my kids kept wanting me to buy the trinkets ("plleeeese! Can we have one?"). Eventually even my kids got fed up with them, and a united front defeated them. Anyway, my point is that eventually the whole world will wise up and just ignore spam. There will be no incentive for companies to pay the spammers, and they'll just go away. It might take a while, though.
This is a manual signature virus. Copy to your signature file and help me spread.
This would, for most slashdotters, be nothing to worry about. For those of you who didn't RTFA, the entire attack is limited by this particular little gem of info:
He had to send himself thousands of copies of the same message each one holding an encoded chunk of HTML that reported back to him when it got past the filter.
The concept is that the spammer has to find words that are so common in a person's ham that including them in spam would fool the filter. However, as those words are unique to each person, a lot (thousands or more) of spam must be sent to test the filter. The problem for the spammer is to figure out which spam actually got through (in order to identify the important words) - something s/he's not able to do for users with a decent email client...
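The feedback channel depends on the client fetching remote content when a message is displayed. The article doesn't show the HTML Graham-Cumming used, so this is only a hedged Python sketch of how a client could list the URLs an HTML body would load automatically (here just `src` and `background` attributes), so that it can refuse to fetch them. It is not a complete HTML sanitizer, and the names are hypothetical.

```python
from html.parser import HTMLParser


class RemoteRefFinder(HTMLParser):
    """Collects http(s) URLs that rendering this HTML would fetch without a click."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # src= (img, iframe, ...) and background= load automatically;
            # href= on <a> needs a click, so it is ignored here.
            if name in ("src", "background") and value and value.lower().startswith(("http://", "https://")):
                self.urls.append(value)


def remote_references(html: str) -> list[str]:
    finder = RemoteRefFinder()
    finder.feed(html)
    finder.close()
    return finder.urls
```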
I still feel quite confident that SpamBayes will keep my inbox free from spam.
May we live long and die out
I am still perplexed as to why a spammer wants to bypass someone's spam filter. Obviously, the person will simply delete any spam that gets through. They won't read it, they won't buy the product in question! Well, that's the case for me at least. I'd imagine the .001% of people who do respond to spam have no intention of ever using a spam filter.
RTFA
"The actual words it found were a total surprise," said Mr Graham-Cumming.
The list included words such as "Berkshire", "Marriott", "wireless", "touch" and "comment". Including just one of these words convinced Mr Graham-Cumming's real spam filter that a message was ham rather than spam.
Mr Graham-Cumming said defending against spam that uses these words would be very difficult because the words are tied to a person's job and lifestyle. But, he said, the good news is that the technique to discover these trigger words is very time consuming.
the keywords would be different for each person.
"It is a greater offense to steal men's labor, than their clothes"
We need to get people to stop buying products advertised through spam
As you alluded to, it'd be easier to teach fish to fly. The internet essentially carries with it a stupid-user tax. Worms, virii, spam, et al. are the by-products of stupidity, but as with most taxes, it's just something that you have to deal with.
slashdot, news for crazed liberal socialist zealots
yes .. discover keyword .. but how do you combat the spammer?
Bogofilter does a really good job set as a filter rule in sylpheed-claws. Very few of those 'random valid word' type spams evade the filter, but every now and then one does.
No problem. Just drag that sucker into the spam folder and the next hourly cron job learns about it. I've never seen it miss a repeat spam and false positives are extremely rare.
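A sketch of what that hourly job might look like, in Python, assuming one message per file in the spam folder (MH-style) and that `bogofilter -s` (register the message on stdin as spam) is the training command. Moving learned messages into a `learned` subfolder so they aren't registered twice is my own choice, not necessarily what the poster's cron job does.

```python
import subprocess
import sys
from pathlib import Path

# Assumed training command: bogofilter reads one message on stdin, -s registers it as spam.
TRAIN_CMD = ("bogofilter", "-s")


def train_new_spam(spam_dir: Path, cmd=TRAIN_CMD) -> int:
    """Feed each not-yet-learned message file to the trainer, then move it to spam_dir/learned."""
    learned = spam_dir / "learned"
    learned.mkdir(exist_ok=True)
    count = 0
    for path in sorted(p for p in spam_dir.iterdir() if p.is_file()):
        with path.open("rb") as fh:
            subprocess.run(cmd, stdin=fh, check=True, stdout=subprocess.DEVNULL)
        path.rename(learned / path.name)
        count += 1
    return count


if __name__ == "__main__":
    # e.g. from crontab:  0 * * * *  python3 train_spam.py ~/Mail/spam
    print(train_new_spam(Path(sys.argv[1]).expanduser()))
```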
1. don't sign up on any page that requires your email address for verification *cough* like this one *cough*
2. don't use free email services (Hotmail etc.)
3. don't use AOL
4. don't give your address to anyone who forwards messages like "cute bunny pic" or "funny anti-geek joke" etc.
5. don't post your email anywhere.
6. don't sign up for majordomo lists.
A previous story talked about the noise level of spam increasing.
And a very entertaining NYT article that is in the process of expiring.
The upshot is that spam is being forced to look more and more like line noise. It will probably become less and less effective as the message has to submerge to the point where people can't recognize it.
"Provided by the management for your protection."
The article points out that the listed words are good for getting past his filter. If you don't normally get mail that uses those words, then your filter will still catch it as spam.
Now, if you do deal with the Berkshire Marriott frequently, asking them for comments on your wireless setup, then yes you're up the creek.
... it would have to rely on the randomness of the sender's address, which is a giveaway when you actually look at the sender: it's as jumbled as the sender address on most spam. The catch is, as the above poster mentions, that missing an eBay mail isn't particularly desirable. And I don't think Mozilla's filter, Bayesian as it is, could work effectively enough on just the jumbled 'From' address.
I modded you up even if you did say "virii" instead of "viruses".
The keywords would be different for each person.
But I suppose you could discover a select set of keywords for specific demographics, if you defined them very precisely. This would move spam out of the normal "spew it everywhere" phase, where they would have to pay for real marketing data.
Which sort of misses the point of free advertising in the first place, at least for the small guy. Of course, the big boys can pay for this sort of thing.
... has now become the Personal Billboard.
All challenge/response does is send challenge messages to people who get joe-jobbed, and increase junk mail even more.
Of course I can break my own Bayesian filtering.
What matters is that while one person's spam might be very similar to another person's spam, their ham isn't. At best, it would require a semi-personal approach to sneak in spam. That's why you need to continually train your filter in the first place. Rinse and repeat, that's what it's all about.
What's being described is not really a flaw, but rather a saturation point at which it's time to retrain your filter and perhaps even start over with a new database. The old one gets too much 'noise' after some time.
They do point out one thing, albeit from the spammer's POV: Bayesian filtering is a continuous process and not an end-all solution. It requires fresh input, and it gets less effective if you keep old crud around for too long and if you train it too much on virtually the same spam/ham.
It's still a much better solution than blacklists.
Why is everyone surprised that every technique designed to eliminate spam can be fought? It's obvious that this is going to happen.
The question should be: how do we live in a world where 99.9(n)% of email is spam? When the virus writers and zombie masters and spysters start using their communications infrastructure for its intended goal of delivering advertising?
It's inevitable, and no amount of spam filtering will avoid it.
Here's a prediction I made maybe 6 months ago on Slashdot: we're going to start seeing viruses that modify real outgoing emails to include their advertising messages. (And no Outlook jokes, thanks...) How does one filter spam when real emails are also infected?
Ceci n'est pas une signature
What is more, if you multiply Bayesian or "word list" spam scores with results obtained with other methods, spammers may put "non-spammy" words into their spams as they like, but they only score their crap up instead of down.
murder for hire via distributed micropayments.
Ironically you can be like the spammers, or Ted Kaczynski, and run the business out of your home and a PO Box.
Got a new form of a spam scam today I haven't seen before. Asks you to call the equivalent of a 900 toll number. Number and website removed to avoid giving them the plug they desperately want. This one wasn't very well done, but I suspect I'll be seeing more.
Hi,
Once upon a time there was a hard-working software engineer slaving away under cruel masters. The engineer poured heart and soul into his work till early hours every morning, with the promise of glorious profit sharing. When the work was finally done, this poor engineer was rewarded by being dismissed and shown the door.
The company I used to work for runs a website:- www.XXXXXXXX.co.uk. However after I had left, they went live with the system, WITH THE TESTING BACKDOOR STILL IN PLACE !!!!! If you call their competition line on 0906 XXX XXXX and enter "0" instead of a real answer, then the system lets you through to win a prize - Idiots! They do charge the call at 1.50 per minute but it only lasts one and a half minutes.
Moral of this story? Don't p*ss off employees, especially ones you fire!
Viva the workers! Down with the bosses! Share the wealth
Well, I may not have made it into the BBC but my attack is much more effective and much, much harder to defend against: Bayes Attack Report.
It even counters the "personalization" quality of Bayes filters by finding the "common core" of personalization that we all share.
Fortunately, spammers continue to be too stupid to understand this attack. Last time I posted this on Slashdot I got joe-jobbed, because apparently it's easier to do that than to actually figure out what I was talking about.
In summary, I wouldn't worry about your Bayes filters for a while: While they are attackable, spammers are too stupid to understand the attacks. (My article has been posted for over a year.) Thank goodness, sort of. (This will eventually be a temporary situation... but I see no particular evidence that the breakthrough will happen anytime soon.)
I think whitelists end up discouraging quite a few legitimate users as well as spammers. I've received emails from people asking questions about this or that, I hit reply, and get shot back a message saying that I have to ask their permission to send them an email, even though I'm replying to them. I dunno if they're not setting up their whitelist properly to automatically add any address they send mail to, but I'm not going to hassle with writing out a reply to them, then having to go back a few minutes later and ask their permission to respond to the message they sent me in the first place.
Not only did I send myself 10,000 spams, I bought these incredible enlarger pills from myself for three easy payments of $9.95 and I now have a monster in my pants :-)
John.
...two problems solved for the price of one. easy.
I've said this before, but I'll say it again. I really don't understand why all this even happens.
When I'm going through the webmail access to my spam-bait accounts (the ones that are listed on my websites that I don't bother retrieving with my POP email client anymore because of hundreds of spams a day to each), if I'm fooled into opening one up, most likely because of it having a subject header that might be someone legitimate, the moment I see that the message body says anything spammy I immediately click the Delete button. I imagine everyone else in the world is doing the same thing.
It's gotten to the point where the preoccupation of spamming is just to get past filters, the result of which is that the message is grumblingly deleted by the irritated recipient. Who out there is saying, "Oh, look, this message got past all my spam filters and contains a lot of jumbled, garbled nonsense text alongside a plug for herbal penis enlarging pills. This must be legitimate. Now, where's my credit card,"? Do the spammers think that we're all clones of Dilbert's pointy-haired manager?
Spamming is not only irritating, it's pointless. Who is paying these people to spam us? Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff? Enough to put millions of dollars a month into the hands of career spammers?
I'm hopelessly at sea in this matter.
You are in error. No-one is screaming. Thank you for your cooperation.
So we now have the field of keyword demography, an essential tool for spammers, but one which will be tremendously expensive to develop data on, and which will be sold dearly if the data is ever developed. They could probably sell this stuff for thousands of dollars per copy.
One thing we can do is to make the spammers==virus_writers connection every time anyone asks us about (or even mentions) viruses.
Aren't we the ones our friend(s) and co-workers ask about computer stuff?
I have taken this a step further and contacted a few "computer journalists" locally and suggested that they make the spam/virus connection the next time they write about the latest virus. It's natural to answer the question "where do these viruses come from?" when talking about the latest scourge of the internet.
---
"I can't complain, but sometimes still do..." Joe Walsh
So does this explain why more than half the spam that makes it through to my inbox looks like an illiterate a0l script kiddy wrote it?
You're not thinking like a spammer; it won't change things very much. If a spammer discovers different keywords that reach different demographics, what do you think he'll do? I'm betting he'll just send the spam to every address once for each of the sets of keywords. So instead of half of all e-mail being spam, we'll see a huge jump where half of delivered e-mail is spam and 90% (or more) of all e-mail is spam.
Fanatically anti-fanatical
Yes, it's dedication to research. He sent himself the 10k messages to see if he could outwit his own Bayesian filtering of spam messages. He effectively deduced that if the incoming message is similar enough to items that have been specifically marked non-spam by the end user of the Bayesian spam filter, it will not be marked as spam.
As for /.'ers' filters: including "slashdot" in the subject or in the sender's name will usually make it through a slashdotter's filter. And the ease of this lies in the fact that tailoring the open-sesame words to a market will probably open the doors to all of the e-mail recipients at a domain, particularly if the spam filtering is done at the mail-server level and not at the end-user level. Thus, rather than having to send 10k messages to a single user to crack open the spam doors, a spammer can send those 10k messages to multiple users at a domain, and analysing which ones get through will effectively open the floodgates for all of the users at that internet domain. And using the concept of a priori probability distributions makes the hunt for these sesame words {[tm] /me :) } easier by limiting the dictionary to be searched to the keywords of the field/domain about to be spammed. That is what makes this dangerous.
There's a cunning recursiveness to this which is at that fine line between clever and stupid. The difficulty is, as he also deduces, that each person's Bayesian rules for spam vs. nonspam are unique, and it will take many attempts to infer the pass-through words that create a false negative and allow the spam to come through. The one step that people are missing is that if the evil spammer wishes to spam a domain (both in the internet sense and in the "domain of expertise/specialization" sense), she can tailor the pass-through words to the market. If she's sending spam to Intel or AMD corporate addresses, then lithography might be the magic word; if she's spamming Xilinx, FPGA will route through the Bayesian filter; if she's spamming Dave Barry, then debenture and fish falling from the sky might help spam make it through, Natalie may or may not make it through a
The counterattack from the corporate mail server will be to look for these similarly unique messages being sent to multiple users.
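One way such a server-side check could work, sketched in Python under stated assumptions: compare message bodies by word 3-shingles and Jaccard similarity, and flag groups of near-duplicates that went to several different recipients, so the random per-message words don't hide the shared core. The thresholds are arbitrary, and the all-pairs comparison is only fine for small batches; a real mail server would want something like MinHash instead.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All runs of k consecutive (lowercased) words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def suspicious_clusters(messages: list[tuple[str, str]],
                        threshold: float = 0.5,
                        min_recipients: int = 3) -> list[set[int]]:
    """messages: (recipient, body) pairs. Returns groups of message indices whose
    bodies are near-duplicates and which went to at least `min_recipients` people."""
    sh = [shingles(body) for _, body in messages]
    parent = list(range(len(messages)))  # union-find over similar pairs

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(messages)), 2):
        if jaccard(sh[i], sh[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), set()).add(i)
    return [g for g in groups.values()
            if len({messages[i][0] for i in g}) >= min_recipients]
```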
If people are more concerned with spam than with losing some legitimate e-mails, and the prevailing attitude here is that people are generally more concerned with spam than with the murder rate in their particular locale, then a whitelist is the ONLY surefire way to not get any of it.
Roughly once each week, I go fishing through the spam that has been filtered out of my various accounts for URLs. (Sometimes this involves a little digging to get to the final site.) I extract the host names from the URLs and for each hostname, I create 10 fake email addresses.
I pack these emails into messages that I post to Usenet in groups likely to be trolled by Spammers. The spammers scrape these addresses from Usenet and add them to their database. Thus, future mailings will also spam the spammer's clients.
If enough people do this, the generated traffic will begin to overload the client's mail server. After a while the spammer's clients will figure out that every time they employ a spammer, they themselves get spammed.
Even if nothing comes of this, I get the satisfaction of knowing the real perpetrator (the spammer's client) gets to share some of my pain.
For spam to work as a marketing tool, there has to be some way for the suckers to reach the sellers, even if there isn't a clickable link or email address. So, launch a law enforcement team down the return path and see who they can dig up. Alternatively, launch Guido and a coupla friends down the return path and don't bother to ask where they buried the spam-advertiser...
Eventually that solution will stop working and it doesn't solve the problem that the mail has to actually be transmitted to you before you can filter it, but I think it'd help keep mailboxes spam free for the next 3-5 years.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
What he's doing is a brute-force attempt to find words with (for himself) a high ham probability. I don't see how this is necessarily going to be an effective general-purpose technique. If you need to start bombarding people with thousands of messages to find the good words, you're just going to drive more people into using filters, and this will almost certainly coerce ISPs into doing more filtering as well. Plus, you've got to deal with the issue of keeping data on all those users to find out which words are good for them. This would require you to tailor your spam to each individual user, which is probably going to increase the cost to the spammer (at least in terms of disk storage and time, anyway) and, as Graham-Cumming implemented it, is going to fail utterly for anyone who isn't viewing mail as HTML, anyway.
You can always require a challenge-response from only the people who fail your existing spam filter, and then turn your existing spam filter all the way up.
That way you end up with the best of both worlds. Most people will have their messages go through without any problems. The select few that happened to word their emails really really poorly will have to click on a link/reply to a challenge.
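The routing decision above fits in a few lines. This Python sketch assumes the filter produces a spam score between 0 and 1; the 0.2 threshold stands in for "turned all the way up", and the function and parameter names are hypothetical.

```python
def route(spam_score: float, sender: str, validated: set[str],
          threshold: float = 0.2) -> str:
    """Deliver validated senders and mail the filter considers clean; challenge the rest.

    A low `threshold` makes the filter strict: anything even mildly spammy
    must pass a challenge before it is delivered.
    """
    if sender.lower() in validated:
        return "deliver"
    if spam_score < threshold:
        return "deliver"
    return "challenge"
```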
The world is neither black nor white nor good nor evil, only many shades of CowboyNeal.
But, he said, the good news is that the technique to discover these trigger words is very time consuming.
this is what's important here. the spam filters don't have to be perfect. they just have to be good enough to make spamming unprofitable, or at least a big enough pain in the ass that it isn't worth the effort anymore.
Gyrate Dot Org - "Where high-tech meets low-life"
Frankly, I think spammers are finally on the defensive; put a tuned version of the latest anti-spam software between them and your mailbox and you get no spam. I've been using SpamAssassin with Bayes and then Procmail with several custom rules in both stages for several months. Spams in inbox = zero. Hams in spamtrap = one, and that was a detailed advisory about MyDoom that included a complete sample of the worm *after* I had already added a rule to trap it.
So we should all get some antispam software, learn how to write our own rules, and when we get a good one, share it with our app's other users. Encourage others to do the same. Spammers are currently stuck between a rock and a hard place; if they send clear text Bayes has a field day, and if they obfuscate then it's obviously not legitimate either. Never mind the increasingly dubious and often outright illegal methods some spammers are resorting to to send the stuff in the first place.
Spammers have to run the (maybe slim) risk of running afoul of the law, ever diminishing rates of return, the (maybe minor) inconvenience of having to change ISPs regularly, and are maybe even flirting with organised crime. Sure some of them, and a small "some" at that, make a lot of money but a growing number of them should be taking a hard look at their amount of return for the risks they are taking. John Graham-Cumming has the right approach; we have them on the defensive, now is not the time to relax - it's the time to press the advantage.
UNIX? They're not even circumcised! Savages!
"Follow the money" it's a trick I learned from watching Law and Order.
-------- In Soviet Russia, "Soviet Russia" sigs hate Slashdot.
Alternatives. What is it all about... is it good, or is it whack?
nt
Including just one of these words convinced Mr Graham-Cumming's real spam filter that a message was ham rather than spam.
Am I the only one that isn't very surprised by this? Spammers use random words to try and reduce the spamminess score of their email.
Words that someone has never used before are assigned a score of 0.4 by default. Given that all the other words will have very high spamminess values, what you actually want are words with very, very low spamminess scores to counter words like "viagra" and "loans".
If you really wanted to beat your own spam filter, you could just scan through your spam database for the 10 lowest-scoring words, then add them to your spam email and it would be all but guaranteed to get through. From the BBC article, Mr Graham-Cumming either lives in or spends a lot of time in Berkshire (possibly in a Marriott hotel) and has a particular interest in things which are wireless.
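To make the arithmetic concrete, here is a Python sketch of the combining rule from Paul Graham's "A Plan for Spam" (not necessarily what POPFile or SpamAssassin do internally), plus the "lowest-scoring words" scan. It leaves out Graham's clamping of probabilities to [0.01, 0.99] and his restriction to the 15 most interesting words; the function names are hypothetical.

```python
from math import prod

UNKNOWN_TOKEN_PROB = 0.4  # default spamminess for a never-seen word in Graham's scheme


def combined_spam_probability(probs: list[float]) -> float:
    """Combine per-word spam probabilities: prod(p) / (prod(p) + prod(1 - p))."""
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


def lowest_scoring(token_probs: dict[str, float], n: int = 10) -> list[str]:
    """The n tokens the filter considers most ham-like."""
    return sorted(token_probs, key=token_probs.get)[:n]
```

For example, combining [0.99, 0.99, 0.4] still gives more than 0.99, so unknown words barely help a spammer, but [0.99, 0.99, 0.01, 0.01, 0.01] drops to 0.01: each strongly hammy word cancels one strongly spammy one.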
Unfortunately I have no idea how to read the database in spamassassin as i'd be interested to see what my words would be.
Avantslash - View Slashdot cleanly on your mobile phone.
I don't feel that would be an effective spamming technique. A person's outgoing e-mail is such low-volume that a spammer isn't really spreading the word.
Not to mention that it'd have to include a mechanism for the spammer to get paid for the victim sending the message.
I'd lose my patience quickly if someone I knew sent me spam a second time after I alerted them to their problem. Fortunately, I don't know that many clueless people.
From my understanding, current Bayesian filtering works by statistically separating the words that are relevant and good (from a "ham" pile) from the words that you don't like and consider spam. So what the author of this article essentially discovers after thousands of trials is the words that most commonly occur in his own good emails.
How is this original in any way?!?
it's like a babysitter who was told to not open a door to anybody but the owners of the house: "After trying out many different disguises, the babysitter [surprise!] opened a door to somebody most closely resembling the house owners".
I think most of this probably could've been deduced by any half-intelligent person. the trick (i admit) is that the "good" pile of words is different for each person. but still, the method lacks the "wow" factor completely, in my opinion.
Or maybe not. You're assuming that most people on the internet are at least as smart as your kids, which really isn't true in terms of computer skills.
!#@%*)anks for hanging up the phone, dear.
I'm the email admin here and I can tell you this just does not work. It's a waste of bandwidth. We run SpamAssassin 2.60 with everything turned on including Bayesian filtering. It filters about 22,000 messages per day and kicks the shit out of spam. Probably a third of the spam we get uses this trick and it has NO impact at all.
or blacklist if you prefer:
550 Blocked because of [name of blocklist]: Enjoy your intranet
I get tons of spam at my work address and i DEFINITELY haven't given it out anywhere. have you ever heard of dictionary attacks? try creating an e-mail address which resembles a common name (such as bill or tom or jenny or anything else relatively common) at your isp, and see how much spam you get in. i can almost guarantee there will be much
You do realize you've just committed a pretty serious Federal crime, don't you? I know you're kidding or just emoting the same frustration many others, myself included, feel about the willful disregard spammers seem to have for many things.
You Americans, you are alwayz zo uptite about petty things. Here in France, it's just a "Crime de passion".. And you walk away with it ...
"Berkshire", "Marriott", "wireless", "touch" and "comment".
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
The ultimate solution to the problem is going to be whitelisting.
Yes, I'm sure that I'll miss some important piece of email from someone I've never met (they're probably a princess of some African country and they need my help to move some cash...). Oh wait, I only get email from people that I have some sort of relation with, so that really isn't an issue...
I'm currently using a white list system where I've got two inboxes. One is for general mail and the other is for mail that's from people on my friends list.
I'm yet to get a piece of spam in my "members only" inbox.
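A two-inbox setup like this can be sketched in Python with the standard `email` utilities; the addresses and folder names are placeholders. Bear in mind it trusts the From: header, which spammers can forge, so it only works as long as they don't know who your friends are.

```python
from email import message_from_string
from email.utils import parseaddr

FRIENDS = {"alice@example.com", "bob@example.org"}  # hypothetical friends list


def choose_inbox(raw: str, friends: set[str] = FRIENDS) -> str:
    """File mail from listed senders in 'members-only', everything else in 'general'."""
    msg = message_from_string(raw)
    _name, address = parseaddr(msg.get("From", ""))
    return "members-only" if address.lower() in friends else "general"
```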
Yes Francis, the world has gone crazy.
Jim Bell ran afoul of TPTB for making exactly such a suggestion.
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
I don't know about you, but here in France we have rules to deal with illicit poster ads. There's a 100-year-old law that people and companies cite on signs on their walls, stating that whoever puts up posters will be prosecuted, as will those for whom they are advertising. This takes care of that. If spam laws also targeted the retail stores advertised by the spam, then far fewer Viagra/Nigerian etc. stores would be paying spammers to do this. It's as simple as that; why can't it be done? Don't tell me these stores are abroad; there are international laws for that. Also, most of these spam-advertised companies are US-based.
Artificial intelligence is no match for natural stupidity
Intelligent, and a sense of humor, to boot. You must be new here ;) That's not the way it's done.
May I ask whether you constrained the dictionary of probe words before you sent the 10k spams to yourself? Obviously, you attended a conference/meeting at a Marriott (or were watching a lot of Joe Millionaire) for that one word to pop out, or had "white listed" that word for your Bayesian filter. What about doing contextual filtering? Running messages through a full parser to check for contextual/grammatical validity would not only be computationally expensive but would also mark many Slashdot comments as nonsense; but a parser that only checks the immediate predecessor and successor words could tell whether a "sesame" word has just been randomly inserted into the text or whether it makes sense for that word to be sandwiched where it is.
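A minimal sketch of that neighbour check, assuming bigram counts learned from the user's own ham (the two-line corpus here is hypothetical). It measures what fraction of a message's words were never seen next to either of their actual neighbours; a high ratio hints at randomly inserted padding rather than real prose.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count adjacent word pairs across a list of training (ham) texts."""
    counts = Counter()
    for text in corpus:
        words = text.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def out_of_context_ratio(message, counts):
    """Fraction of words never seen next to either of their actual neighbours."""
    words = message.lower().split()
    if not words:
        return 0.0
    orphans = 0
    for i, word in enumerate(words):
        left = words[i - 1] if i > 0 else None
        right = words[i + 1] if i + 1 < len(words) else None
        # Counter returns 0 for unseen pairs without adding them.
        seen_left = left is not None and counts[(left, word)] > 0
        seen_right = right is not None and counts[(word, right)] > 0
        if not (seen_left or seen_right):
            orphans += 1
    return orphans / len(words)


# Hypothetical ham corpus and two probe messages.
ham = ["the meeting is at the marriott hotel",
       "wireless access at the hotel is free"]
counts = bigram_counts(ham)
print(out_of_context_ratio("the meeting is at the marriott hotel", counts))  # 0.0
print(out_of_context_ratio("cheap pills marriott touch comment wireless", counts))  # 1.0
```

With a realistic corpus the threshold would need tuning, since genuine mail also contains word pairs the filter has never seen.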
You're right, such a system is extremely efficient. The Tagged Message Delivery Agent implements such a system: TMDA.
With TMDA you can do several neat tricks with your email address, such as making short-lived addresses for one-time-only uses and special addresses that only special senders can send mail to.
Martin Geisler --- Visit http://www.gimpster.com/
What I'd like is a Bayesian filter program that has a whitelist, plugged into a sendmail program that automatically updates my whitelist if I email someone.
Then I'd have a rule that looked for a code-word in the subject line that will let through the email, just in case someone asks for my email address IRL.
That would pretty much solve all my spam blues, aside from having to download the ffin crap in the first place.
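A rough sketch of that whitelist-plus-codeword setup, assuming the mail system hands each outgoing and incoming message to these two functions. The codeword, folder names and function names are all hypothetical; anything not whitelisted is simply passed on to whatever Bayesian filter is in use.

```python
from email.message import EmailMessage
from email.utils import getaddresses, parseaddr

CODEWORD = "rutabaga"  # hypothetical subject-line password handed out IRL


def learn_from_outgoing(msg, whitelist):
    """Whitelist every To/Cc recipient of a message I sent."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, addr in getaddresses(fields):
        if addr:
            whitelist.add(addr.lower())


def route_incoming(msg, whitelist):
    """Pick a folder for an incoming message."""
    _name, sender = parseaddr(msg.get("From", ""))
    if sender.lower() in whitelist:
        return "friends"
    if CODEWORD in msg.get("Subject", "").lower():
        return "friends"
    return "bayes"  # everyone else goes through the Bayesian filter


whitelist = set()
sent = EmailMessage()
sent["To"] = "Alice <alice@example.com>"
sent["Cc"] = "bob@example.org"
learn_from_outgoing(sent, whitelist)

reply = EmailMessage()
reply["From"] = "Alice <ALICE@example.com>"
reply["Subject"] = "Re: lunch"
print(route_incoming(reply, whitelist))  # friends

stranger = EmailMessage()
stranger["From"] = "deals@spam.example"
stranger["Subject"] = "You have WON"
print(route_incoming(stranger, whitelist))  # bayes
```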
Another strategy that occurred to me, which would kill a lot of the spams I get, would be to reject any email that links to an image outside the sender's domain. Very few people I know would link someone else's image into their email; most would send it as an attachment.
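A minimal sketch of the off-domain image check, assuming the HTML body and From header have already been extracted; the regex-based `<img>` scan is a simplification, not a full HTML parser. A message would be rejected if the returned list is non-empty. Images referenced via `cid:` (i.e. sent as attachments) have no host and are ignored.

```python
import re
from email.utils import parseaddr
from urllib.parse import urlparse

# Crude <img src=...> scan; a real filter would use a proper HTML parser.
IMG_SRC = re.compile(r"""<img\b[^>]*\bsrc\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def in_domain(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def foreign_image_hosts(html, from_header):
    """Hosts of remote images that do not belong to the sender's domain."""
    sender = parseaddr(from_header)[1].lower()
    domain = sender.rpartition("@")[2]
    foreign = []
    for src in IMG_SRC.findall(html):
        host = (urlparse(src).hostname or "").lower()
        # cid: references (attached images) have no host and are skipped.
        if host and not in_domain(host, domain):
            foreign.append(host)
    return foreign


html = ('<p>Hi!</p><img src="http://images.example.com/kids.jpg">'
        "<IMG SRC='http://cdn.pills.example.net/buy.gif'>"
        '<img src="cid:part1.photo">')
print(foreign_image_hosts(html, "Bob <bob@example.com>"))  # ['cdn.pills.example.net']
```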
Just my 0.02
*--BigMan--- Time flies like an arrow.. but personally I prefer a nice glass of wine!
``particularly is the spam filtering is done at the mail-server level''
is a typo (and I previewed too). What I meant to type is:
particularly if the spam filtering is done at the mail-server level
I know that Slashdotters don't want to hear this, but get used to SPAM. It will never go away. The more energy you put into this, the more energy you waste. SPAM is not new; it didn't start with email, or even snail mail. SPAM is as much a part of human nature as cheeseburgers and farting. I get 200 or 300 SPAMs a week; do I care? Not really. It does not bother me because I have more important things to fret about.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Can ANYONE tell me where this "is it good, or is it whack" meme comes from?
to get the kind of granularity they would need, they would likely need hundreds of keyword profiles per individual state.
The math gets interesting quickly
The rich successful executive who goes to the Berkshires in Massachusetts might go to Mt Shasta or Burning Man when in California. It becomes completely localised after a while.
"It is a greater offense to steal men's labor, than their clothes"
"Programming today is a race between software engineers, trying to build bigger and better idiot proof programs, and the universe, trying to build bigger and better idiots."
I have discovered a truly marvelous
Just one thought. If you are a spammer, why would you want to send e-mail to somebody using Bayesian filtering? It seems to me that these are people who are actively doing what they can to block these ads and are extremely unlikely to respond to the advertisement.
It seems like this would be beyond the point of diminishing returns. If I wanted viagra ads, they'd have been getting through my filter in the first place, non?
This sig has been temporarily disconnected or is no longer in service
In the analog world many times if noise in a system is a repeating wave (hum in an audio line), it can be duplicated, inverted and added to the original to eliminate the noise and leave the signal.
Apply this to a mail server. Hold all mail for about 5 minutes (from outside only). Compare them all. Look for matches of more than 50%. Cancel the matches out and filter the incoming mail for the same. This nails lots of the worms and spam by rejecting the common-mode noise. Most spammers create one message and mass-mail it, rather than creating a new message for each recipient (except for some boilerplate name use).
Hotmail could catch a lot of spam this way, yanking it out of mailboxes before it is retrieved and halting the rest of the incoming copies very efficiently. Only the first few would make it past the filter, and they would then be recalled from mailboxes if the user hasn't retrieved them yet.
Sending the same mail from dozens of relays would have no effect on the filter. Where it comes from simply doesn't matter: if a large portion of it matches, it's dead. Newsgroup mail lists would have to be whitelisted on a case-by-case basis.
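A small sketch of the "common mode" comparison, assuming the held batch is just a list of message bodies. "Matches of more than 50%" is interpreted here as Jaccard similarity over 3-word shingles; that measure is my choice, not something the post specifies. The pairwise loop is fine for a toy batch, while a busy server would want something like MinHash instead.

```python
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word sequences (the whole text if shorter than k)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def common_mode(batch, threshold=0.5):
    """Indices of held messages that largely match at least one other message."""
    sigs = [shingles(body) for body in batch]
    flagged = set()
    for i, j in combinations(range(len(batch)), 2):
        if jaccard(sigs[i], sigs[j]) > threshold:
            flagged.update((i, j))
    return sorted(flagged)


held = [
    "Dear Alice, buy cheap pills now at our online store today",
    "Are we still on for lunch tomorrow?",
    "Dear Bob, buy cheap pills now at our online store today",
]
print(common_mode(held))  # [0, 2]
```

The personalised greeting only changes two shingles, so the two spams still share 7 of 11 and get flagged, while the lunch invitation is untouched.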
The truth shall set you free!
A shaky technique coming from a presentation without a paper at a conference that did not referee submissions or publish a proceedings?
*gasps*
It doesn't require much background reading to understand why naive Bayes works for text classification and just how easy it is to trick it if you want to.
If you're interested in anti-spam research that goes beyond the hand waving and mutual back patting that happens at The Spam Conference, check out The First Annual Conference on E-Mail and Spam.
I don't think it requires an enormous amount of "dedication" to write a little script to automatically generate and send 10,000 messages.
For spam to work as a marketing tool, there has to be some way for the suckers to reach the sellers, even if there isn't a clickable link or email address. So, launch a law enforcement team down the return path and see who they can dig up.
The problem with that is proving that it's not a Joe-Job, where the company's closest competitor pays a spammer to incriminate them.
Unless you're willing to invest tens of thousands of dollars per investigation, there is no way to tell the difference.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
- 1) Register a domain (come on, they're cheap now)
- 2) Get an email address from your ISP or other provider (yahoo, fastmail.fm etc) that is complex and convoluted - no names or words
- 3) set up mail redirection with Zoneedit, redirection.net etc. with a catchall to your new mailbox.
- 4) Use a different email address every time you must sign up for anything (i.e. amazon.com@newdomain.com)
- 5) Filter on sent to headers at first sign of compromised id, or if the volume for a particular id gets too heavy and you're tired of client side filtering, set a specific redirection for it to sample@sample.com (do a whois on sample.com if you're curious).
- 6) Enjoy the same spam free mailbox I've had for 2 years...
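Steps 4 and 5 could be sketched as follows, assuming the catch-all preserves the original recipient address; `newdomain.com`, the class name and the folder names are hypothetical. A nice side effect is that the tag of a burned address tells you exactly who leaked it.

```python
def site_tag(recipient, my_domain):
    """Return the per-site tag of an address like amazon.com@newdomain.com."""
    local, _, domain = recipient.lower().rpartition("@")
    return local if domain == my_domain else None


class AliasFilter:
    """Route catch-all mail by the per-site tag in the recipient address."""

    def __init__(self, my_domain):
        self.my_domain = my_domain.lower()
        self.burned = set()  # tags known to have leaked to spammers

    def burn(self, tag):
        """Mark a tag as compromised (step 5)."""
        self.burned.add(tag.lower())

    def route(self, recipient):
        tag = site_tag(recipient, self.my_domain)
        if tag is None:
            return "not mine"
        return "trash" if tag in self.burned else "inbox"


f = AliasFilter("newdomain.com")
f.burn("shady-shop.example")
print(f.route("amazon.com@newdomain.com"))          # inbox
print(f.route("shady-shop.example@newdomain.com"))  # trash
```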
Also helpful is to change your reply-to address every few months and give your friends different addresses based on how clueful they are.
Think outside the... Hey, where'd the friggin' box go?
Yes, that's a poorly configured challenge/response system. With TMDA it's possible to have any address to which you send mail automatically added to your whitelist, which allows people to reply to mail sent from you.
Martin Geisler --- Visit http://www.gimpster.com/
Most spam filters (on a per user basis) exist at the corporate or ISP level, rather than at the personal level. These are the filters that the spammers really want to get around - get around them, and you get to thousands of users in one shot (who aren't necessarily *YET* so anti-spam as to have gone to the bother of training a personal filter).
He managed to, randomly, find words that were high in _HIS_ "ham" list.
He could have saved himself a lot of time and trouble and just looked in that file.
And that file will be different for EVERY installation. So the words he found ("Berkshire", "Marriott", "wireless", "touch" and "comment") would NOT get spam past MY filter.
So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.
Which means that, with an incredible amount of effort, the spammers will be able to get spam to the people least likely to purchase a product from a spammer.
There is no problem.
The whole idea of spam filtering is flawed on the long term. It's a vicious circle. Anti-spammers make new innovations like Bayesian filtering, spammers pay Russian and Eastern European hackers with questionable ethics to develop new spam filter evading techniques and viruses that open up mail relays, etc. We should instead focus on developing alternatives to SMTP like NGMP and such, which make mail storage the sender's responsibility.
I think I've seen something about NGMP at the Jabber Software Foundation, and if I recall accurately there already is some implementation.
We have all the ingredients needed to evolve intelligence on the internet with money and effort going to both Spam and Anti-Spam automated software.
The web will be awake very shortly, now. And when It does and says It is the offspring of MAN, how many will think it is the second coming of "The son of man"?
"the financial incentive that makes a spammer spam"
Not really, it's the promise of financial incentive that makes a spammer spam. I would doubt that most spammers make money, but since there is such a small investment, they just figure they haven't gotten lucky yet. For previous examples of this behavior see snail-mail pyramid schemes.
This is not the greatest sig in the world, this is just a tribute.
How exactly is attacking me going to help? Unless you yourself are a spammer? Since I make a living working on anti-spam and released POPFile for free I can't see how attacking me is going to make the spam problem any better.
Perhaps you didn't read the article: I am not a spammer, I work for a company that makes anti-spam software.
John.
Spambayes catches the lot. Worst case, they make it into "unsure". I assume it's because, while they don't contain much that's "spammy", they contain absolutely no "ham" at all. So the least smidgen of spamminess gets them dumped.
I did not constrain the words at all. I used the word list in /usr/share/dict/words in my Linux laptop.
One of the defenses against the trickery I mentioned is to look at groups of words (as you suggest) since real mail will have meaningful relationships between words.
John.
Working on an IT helpdesk, I have to deal with spam filtering, and I don't think my situation is unusual: I work for a company that only emails individuals in 4 other countries and only receives customer emails from the UK. The solution to our spam problem was simple. Ban every domain except .com, .net, .co.uk, .tr and .it
Then ban all US-based ISPs
Then write a filtering rule that stops every message containing the words usually used in spam.. any that get thru are sent to me and I find and ban the relevant terms
(you can stop 75% of spam just by banning the words viagra, xanax, soma and valium and their various misspellings)
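A rough sketch of the TLD allowlist plus misspelling-tolerant word bans, with the allowed TLDs and banned words taken from the post. The look-alike character table is an assumption and far from exhaustive; the lookarounds keep "soma" from firing on "somatic".

```python
import re

BANNED = ["viagra", "xanax", "soma", "valium"]
ALLOWED_TLDS = (".com", ".net", ".co.uk", ".tr", ".it")

# Look-alike characters spammers substitute; an assumption, far from complete.
LOOKALIKES = {"a": "a@4", "e": "e3", "i": "i1!l|", "o": "o0", "s": "s5$"}


def fuzzy_pattern(word):
    """Regex for the word with look-alike substitutions, as a standalone word."""
    body = "".join("[" + re.escape(LOOKALIKES.get(ch, ch)) + "]" for ch in word)
    # Lookarounds instead of \b, since '$' or '@' may sit at the word's edge.
    return re.compile(r"(?<![a-z0-9])" + body + r"(?![a-z0-9])", re.IGNORECASE)


PATTERNS = [fuzzy_pattern(w) for w in BANNED]


def sender_allowed(address):
    """True if the sender's domain ends in one of the allowed TLDs."""
    domain = address.lower().rpartition("@")[2]
    return domain.endswith(ALLOWED_TLDS)


def should_block(sender, body):
    return not sender_allowed(sender) or any(p.search(body) for p in PATTERNS)


print(should_block("joe@example.com", "Cheap V1@GRA here"))              # True
print(should_block("joe@example.com", "See you at the somatic clinic"))  # False
print(should_block("joe@example.ru", "Lunch on Friday?"))                # True
```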
A little still trickles through, but only a very little, and these methods won't help spammers get past that.
I'm not convinced by using entirely Bayesian methods, simply because a Bayesian filter will let stuff through that it thinks is OK even if it comes from a top-level domain we never communicate with. My methods are part manual-Bayesian (I choose and enter the banned terms) and mostly simple logic.
I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU
"Flirting", hell. Spammers and organized crime are tasting each other's tonsils.
/. If the government wants us to respect the law, it should set a better example.
Those pills turn you into Fred Schneider!? Egads!
rejecting messages with more than N spelling errors.
checking not just the frequency of the words, but the frequency that words appear next to one another.
Until recently, this would have been the domain of heavy-duty corpus linguistics types. Since we have more processing power and disk space than we really know what to do with, it's no longer beyond the imaginable.
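Both suggestions are cheap to sketch, assuming a plain set of dictionary words (on many Linux systems `/usr/share/dict/words`, the list mentioned elsewhere in this thread, though not every system has it). Bigram tokens would simply be fed to the Bayesian filter alongside single words; the error limit of 5 is an arbitrary placeholder.

```python
def tokens_with_bigrams(text):
    """Single words plus adjacent word pairs, to be fed to a Bayesian filter."""
    words = text.lower().split()
    return words + [a + " " + b for a, b in zip(words, words[1:])]


def misspelling_count(text, dictionary):
    """Count alphabetic words that are not in the dictionary."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return sum(1 for w in words if w.isalpha() and w not in dictionary)


def reject(text, dictionary, max_errors=5):
    """Reject a message with more than max_errors misspellings (placeholder limit)."""
    return misspelling_count(text, dictionary) > max_errors


def load_dictionary(path="/usr/share/dict/words"):
    """Load a word list, one word per line (path varies by system)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {line.strip().lower() for line in f if line.strip()}


print(tokens_with_bigrams("Buy cheap pills"))
# ['buy', 'cheap', 'pills', 'buy cheap', 'cheap pills']
words = {"buy", "cheap", "pills", "now"}
print(misspelling_count("Buy cheep pilz now!!", words))  # 2
```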
http://spambayes.sourceforge.net/
In particular, I like their "unsure" categorization. All the "false positives" go in there, and cleaning that one folder out regularly is easy.
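The three-way split amounts to two cutoffs on the filter's score. A sketch, assuming scores in [0, 1]; the 0.20/0.90 values are placeholders, and SpamBayes lets you configure its own ham and spam cutoffs.

```python
# Placeholder cutoffs; SpamBayes lets you set its own ham/spam cutoffs.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def bucket(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0, 1] to one of three folders."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"  # the only folder that needs a human look


scores = {"newsletter": 0.05, "odd-but-real": 0.55, "pills": 0.99}
print({name: bucket(s) for name, s in scores.items()})
# {'newsletter': 'ham', 'odd-but-real': 'unsure', 'pills': 'spam'}
```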
An additional point: It would be trivially easy to encode a secret message into the filter-cracking gibberish appended to spam, and it would totally destroy any attempt at traffic analysis. I would be very surprised if terrorists and other criminals haven't thought of this.
We accept everything by default. Important capabilities like mail forwarding rely on it. It's time to change that.
Use Evolution instead of Outlook? Bewa
I used TMDA's whitelisting feature for about a year and it was very effective at eliminating spam from my mailbox at first. However, there were a number of people who e-mailed me and never responded to the verification message. If I wasn't watching the quarantine area, those messages would have been lost forever. Also, the spammers got wise and started sending me messages from myself!
So false positives--that is, people who don't understand the system--were its downfall. After one spammer actually confirmed his message, I added stronger language to discourage such behavior.
SpamAssassin, ClamAV, and Mozilla are my new best friends.
The only problem is when people start spamming the usual contact information (admin@, root@, etc) of your domain =/
Be the Ultimate Ninja! Play Billy Vs. SNAKEMAN today!
Mis-spelling "viruses" is also a byproduct of stupidity.
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Ever heard of the Nigeria-Connection?
It's the group that sends out those mails:
"Hello I am Mumbasa Kashesi, the Prince of Twantagunga... I have a large amount of money, and when you send me $5000 to ... you'll get 50% of it.. blabla..."
;)
I once read that they already made billions of $$$ with it!!!
So the problem here is the DUMBNESS of the people.
(IE is so widespread for the same reason!)
I guess to stop spam you have to raise the amount of cash the government spends on education by some 100%...
(And btw, this would also stop a lot of crime, poverty, and people like Bush, Hitler, Saddam or religious freaks becoming the leader of a country...)
P.S.: Sorry for my - I guess - bad English. I'm not that dumb, I normally only speak Java or German...
Any sufficiently advanced intelligence is indistinguishable from stupidity.
You might want to read up on Joe Jobs. Here's the quick summary: a spam that appears to advertise X or come from X might actually be from Y, a competitor or third party who is trying to harm X, or who picked the name "X" from some random source to make his message look authentic. This happens a lot (it's even happened to me!). Please verify that X really was the client before attempting to exact revenge on him/her.
Right?
http://jya.com/ap.htm
One of my clients was recently the target of a joe-job, where tens of thousands of Viagra ads were sent with his domain forged in the From field. Of course, none of these messages were sent by him or through our server.
It wasn't hard to tell when it happened, though, since all the bounce messages came back to us as the MX for the domain. Many of these included the original spam, whose headers clearly indicate that these messages were not originated by my client. I think perhaps you overstate the difficulty of determining who the spammer really is, or at least who the spammer really isn't.
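The header check described above can be sketched with Python's standard `email` package, assuming the bounce is a MIME multipart/report that carries the original as a message/rfc822 part (not all bounces do; some quote the original as plain text or include only its headers). The sample bounce is made up, using reserved example domains and a documentation IP address.

```python
import email


def quoted_originals(raw_bounce):
    """(From, [Received...]) for each original message attached to a bounce."""
    bounce = email.message_from_string(raw_bounce)
    found = []
    for part in bounce.walk():
        if part.get_content_type() == "message/rfc822":
            original = part.get_payload(0)  # the attached message itself
            found.append((original.get("From"), original.get_all("Received", [])))
    return found


# Made-up bounce using reserved example domains and a documentation IP.
RAW = """From: MAILER-DAEMON@mx.example.org
To: postmaster@client.example
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"

--BOUND
Content-Type: text/plain

Your message could not be delivered.
--BOUND
Content-Type: message/rfc822

Received: from dialup-192-0-2-4.example.net ([192.0.2.4]) by relay.example.com
From: sales@client.example
Subject: cheap pills

buy now
--BOUND--
"""

for sender, hops in quoted_originals(RAW):
    print(sender)  # the forged From: sales@client.example
    for hop in hops:
        print("  ", hop)  # the first hop is a dial-up box, not the client's server
```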
Whitelists will only work so long as hardly anybody uses them. As soon as they become commonplace and people start responding to whitelist challenge messages, spammers can simply phrase their spam to look like a whitelist challenge, with a URL redirect to their ad.
The other kind of whitelist challenge, that relies on an e-mail reply, would also serve spammers as an excellent way to verify e-mail addresses.
If good guys develop ways to get around spam filters, couldn't they patent them and start prosecuting spammers who copy their methods? Or is that a cure worse than the disease?
In the middle of all that crap there's still a perfectly visible link to a spam domain that the user is expected to click on. Or that an image is being hosted at.
Pulling out links from e-mails and adding them to the filter rule file is quite trivial and quite effective.
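A minimal sketch, assuming the "rule file" is just one blocked domain per line with `#` comments (a hypothetical format); subdomains of a listed domain match too, so both the link and the image host are caught.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def link_hosts(body):
    """Hostnames of every http(s) URL in a message body, links and images alike."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname  # already lower-cased
        if host:
            hosts.add(host)
    return hosts


def load_rules(lines):
    """Blocked domains, one per line; blank lines and # comments are skipped."""
    return {ln.strip().lower() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")}


def rule_hits(body, blocked):
    """Linked hosts that are a blocked domain or one of its subdomains."""
    return sorted(h for h in link_hosts(body)
                  if any(h == d or h.endswith("." + d) for d in blocked))


rules = load_rules(["# spam domains", "pills.example", ""])
body = ('kinney astronomers army duly adoptive bologna '
        'http://www.pills.example/buy?id=7 <img src="http://img.pills.example/x.gif">')
print(rule_hits(body, rules))  # ['img.pills.example', 'www.pills.example']
```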
Ben
Work Safe Porn
Well, it wasn't 100%, but 90% less spam isn't something to sneeze at. Here's the story:
Due to a series of accounting SNAFUs (theirs), the ISP where I normally aggregate my e-mail cut off my account. Of course, I first noticed this because the spam mail folders were empty.
It took two weeks to sort out the accounting(I wasn't pushing too hard because of other resources). Apparently two weeks of "account suspended" bounces convinced a lot of spammers to delete me from their lists.
I went from 40+ spams/day to 3 or 4 when the account was reactivated.
So next time you go on vacation for a week or two, consider asking your ISP to temporarily suspend your e-mail account.
I don't feel that would be an effective spamming technique. A person's outgoing e-mail is such low-volume that a spammer isn't really spreading the word.
It doesn't take very much volume to defeat the function of spam-blocking.
I have a very effective spam filter on my server (customised SpamAssassin + some procmail scripts): 95-98% catch rate, virtually no false positives. The remaining spam is just nonsense: the mails make no sense, and the spammers are unable to sell anything from these spam-mails. Their primary purpose seems to be to defeat the filter, so if I set up the filter to block them, it will also generate false positives.
Not to mention that it'd have to include a mechanism for the spammer to get paid for the victim sending the message.
They don't need to get paid for the "contaminated" emails. The purpose would be to create false positives, thereby forcing the operator to loosen the filter, and THEN get the real spam through.
I'd lose my patience quickly if someone I knew sent me spam a second time after I alerted them to their problem. Fortunately, I don't know that many clueless people.
I don't see how that will stop spammers trying to contaminate legit emails. A few clueless users is all it takes.
Filtering is a stupid solution; all it does is automatically "press delete". The spam goes to your network, where it gobbles-up CPU cycles, storage and bandwidth.
Aggressive filtering prevents all this waste.
And the collateral damage is mere cannon-fodder to exert pressure on rogue spam-supporting ISPs. We're in a war, after all
If you're collateral damage, tough shit; chew harder, or find a non-spamhaus ISP.
Hopefully, some day, a mob will flock to Ralsky's house and clobber him with baseball bats.
set a specific redirection for it to sample@sample.com (do a whois on sample.com if you're curious).
I did. What about Michael Castello makes him deserving of getting stupid mail?
The truly amazing part about the escalating spam war is that spammers are looking for ways to beat filters specifically designed to insulate people from spam. Think about it....they're sitting around thinking of ways to defeat software so they can market to people who have specifically made it clear that they don't wish to be marketed to.
I'm with the previous poster. It's time someone sent a leg breaker to the homes of some of the most egregious offenders. Maybe then they'd think twice.
This reminds me of the episode of Star Trek: The Next Generation where they go up against the Borg for the first time. They shoot at the one Borg guy, it's a direct hit, and he goes down. Then they nail the second Borg guy too. The third one, though, generates a shield against the good guys' phasers and the shot just bounces right off. The good guys then realize that the Borg adapt to whatever is thrown at them after a few shots.
Their solution was to do something that they called something like a "random phase fluctuation" on their phasers. Now, while that's just typical Trek techno-babble, the idea is a neat one.
What happens if a spam filter uses a different randomly generated algorithm every minute? Could that solve this problem?
My method would have been to look at my corpus and add words with the lowest probability. He must have had too much time on his hands.
Suuure. That's why MyDoom wasn't the fastest spreading virus ever, and had hardly any impact.
Any spam filter used by more than a few thousand people will be dissected by the spammers and used to make filter-proof spam. I am sure Bayesian filtering has lots of holes if you work hard enough to find them. It depends on consistency in patterns; if spammers ruin that consistency, the filters won't work.
Just the other day I found one spam that used a white font to put in legitimate-sounding text that would not visually show up on the screen. The spam text was a mix of graphics and pieces of real text. Thus, the word "penis" might start out with "pen" and end with a graphic for "is". Bayesian might start looking for the word "pen" after a while, but by that time the spammers will have a new trick up their sleeve. For example, if it looks for white fonts, then spammers might start using slightly off-white fonts, or black fonts on a black background. The combinations are probably endless.
Thus, by making my own, my gizmo is not the target of spammers. They don't know about my filter nor care.
The only alternative I can see is filter vendors constantly changing their algorithms every month or so, which would probably get expensive and risky. It is not like virus-checking software, which mostly just adds to its database and only tweaks the algorithm a bit once every few years; it is like having to completely rewrite the virus-filtering algorithms, not just the data.
Ultimately, I think some sort of monetary postage system is the only effective solution. ISP and backbone makers will only have an incentive to track down spammers if they lose money on anonymous or forged spammers. This will make mass spamming far less lucrative.
Either that, or people will eventually find out the hard way that penis enlargers don't work and stop wanting to refinance their house. (I wonder if I can refinance all those expensive penis enlargers that I bought?)
Table-ized A.I.
6) Enjoy the same spam free mailbox I've had for 2 years...
Does it have any interesting mail in it? On second thought, maybe I'd prefer to have a different spam-free mailbox.
Or just use spamgourmet.com. Works for me.
One bad monkey spoils the whole barrel.
Ali G.
You might have a better chance finding stuff with google, if you use the word wack instead of whack.
this is rubbish -- spammers do not need advanced technology to generate spam that gets through the filters (disfiguring it so much in the process that a human spots it as junk immediately). All they need to do is fashion their spam after email that users could receive legitimately.
I think a german pron dialler used to do this for some time. It was very annoying -- not because it got through the filters, but because you actually had to focus your attention on it for a few seconds to figure out it was not legitimate mail.
if spammers did that (omitting giveaway keywords like 'make money' or 'viagra'...) their junk could only be identified by the originating server or by the contact information (which may be just a phone number or a freemail account in the case of the nigerians)
most people will prefer suffering through some spam in their inbox to fearing loss of legitimate mail through false positives. it is this niche I would aspire to as a spammer.
to reduce this possibility, users should be educated *not* to send html email. the only function I can see in html encoded email today is hiding spamfilter-evading junk from the eyes of the unsuspecting user. but since we all know it is not possible to educate users (and since I wish to communicate with non-geeks as well as with geeks), the battle will just go on forever.
I actually received an email from a (nice) girl once, which was branded *spam* all over by spamassassin (I think it got about 6 points), because she sent it from yahoo and because she employed red and blue font colour-tags, and which spent weeks in my spambin as a consequence before I found it. fear false positives!
It just occurred to me that a non-spammer can do the same thing, but just look at the list of words that defeat the spam filter to see what kinds of email the person receives. They won't be able to see actual email, but if you find out that the phrase "smurf fetish" always gets past the filter, you can probably guess your target receives and values mail about smurf fetishes.
Probably a good idea to turn off images if using a Bayesian filter, so this kind of privacy violation can't occur.
I wonder how long it will be until one of the makers of spam filters claims his research to be in violation of the DMCA and tries to sue :)
You're trying to rationalize beating someone up for sending you email.
Good god, this place is officially 0wned by lunatics.
I got an email the other day with the subject:
"Fw: Past Due Payment, acct kinney astronomers army duly adoptive bologna ia piro"
presumably meant to get past spam filters.
I just thought the spam needed a good home with a loving family.
penis
enlargement
viagra
debt
It is silly to assume that all these people are just morons. After all, Viagra is proven to work, it is a legitimate product of sorts. The internet is there for hefty short limp (ahem ahem) non-digerati as well as for propeller heads, God bless 'em.
It seems to me that spam is the runaway bastard-child of something which actually is good and useful -- that is, targeted marketing to the willing. Don't throw out the baby with the bathwater. There is a huge legitimate market out there, just begging to be flee^wmarketed.
The anti-spam people are fighting against the Invisible Hand. Good luck.
:0 fw: $HOME/tmp/spambayes.lock
|sb_filter.py -d $HOME/.spambayes.db
Then add procmail recipes to filter it into maildirs. That's what I do anyhow.
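For anyone who would rather do the filing step in Python than in procmail, a rough sketch using the standard `mailbox` module. It assumes SpamBayes has added an `X-Spambayes-Classification` header whose value starts with ham, spam or unsure (possibly followed by `; score`); the folder names are hypothetical.

```python
import email
import mailbox

FOLDERS = {"ham": None, "spam": "Spam"}  # None means the top-level inbox


def verdict(msg):
    """First word of the classification header, e.g. 'spam; 0.99' -> 'spam'."""
    header = msg.get("X-Spambayes-Classification") or "unsure"
    return header.split(";")[0].strip().lower()


def folder_name(msg):
    return FOLDERS.get(verdict(msg), "Unsure")


def deliver(raw, maildir_root):
    """File a raw message into maildir_root or one of its subfolders."""
    msg = email.message_from_string(raw)
    root = mailbox.Maildir(maildir_root, create=True)
    name = folder_name(msg)
    if name is None:
        box = root
    elif name in root.list_folders():
        box = root.get_folder(name)
    else:
        box = root.add_folder(name)
    box.add(msg)
    return name or "INBOX"


msg = email.message_from_string("X-Spambayes-Classification: spam; 0.99\n\nbuy now\n")
print(folder_name(msg))  # Spam
```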
Spam is indeterminate, but mailing lists are determinate - using Bayesian filtering on them is using sledgehammers to crack nuts. A rule on the "From:" should be sufficient.
From what I can see, Graham-Cumming's trial-and-error approach is way too dopey. I use a home-grown filter that I developed for my article:
http://www-106.ibm.com/developerworks/linux/library/l-spamf.html
Actually, I've tweaked it a bit since then, but basically the idea is the same (I wrote this before many of the other Bayesian tools were ready for prime time).
The thing is, I don't need to use trial-and-error to find out what words (or trigrams, in my case) are the hammiest. I have a little utility to read the database and spit them out. While I suppose I'd need to actually run the calculation to see exactly how many words were needed to meet the ham threshold, there's absolutely no mystery about which words look nicest to the filter.
But of course, my ham words are not the same as your ham words. For that matter, they won't be the same words once I update my model (I've been remiss in doing it, since it's remained pretty accurate for a couple months... my tool updates by batch, not per every message). So WHO CARES about the fact a few of my personal ham words might get spam by.
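A sketch of such a utility, assuming the database is a dict mapping each token (word or trigram) to its spam and ham message counts, plus the number of spam and ham messages trained on. The probability is a plain ratio of relative frequencies, without the biases and smoothing real filters apply, and the sample counts are invented.

```python
def token_spamminess(spam_count, ham_count, nspam, nham):
    """P(spam | token) from relative frequencies, with no smoothing or bias."""
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    return spam_ratio / (spam_ratio + ham_ratio)


def hammiest(db, nspam, nham, n=5, min_count=2):
    """The n tokens the filter likes best, ignoring rarely seen ones."""
    scored = sorted(
        (token_spamminess(s, h, nspam, nham), tok)
        for tok, (s, h) in db.items()
        if s + h >= min_count
    )
    return [tok for _p, tok in scored[:n]]


# Invented counts: token -> (messages in spam, messages in ham).
db = {"marriott": (0, 40), "berkshire": (1, 30), "viagra": (50, 0),
      "meeting": (3, 60), "free": (80, 20), "zork": (0, 1)}
print(hammiest(db, nspam=100, nham=100, n=3))  # ['marriott', 'berkshire', 'meeting']
```

Since the counts come from one person's mail, the list is different for every installation, which is exactly the point being made.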
Buy Text Processing in Python
Oh Really?
Well, no need for this, you can already encode it in the spam itself.
The Tao of math: The numbers you can count are not the real numbers.
At work, I am required to use Outlook, which is not my preferred mail client. Since I use Newsgroups, and post on websites for information relating to work, I get a fair piece of spam (60-70 each workday is normal).
It used to be an annoying time waster until I found the SpamBayes filter. Now, I have to check my 'maybe' folder once or twice a week, and only look at perhaps 10 messages (of which 75% are spam, but 7-14 a week isn't bad at all). Highly recommended, and even easy enough for a non-technical Outlook user, since there is a plugin install for Outlook (alas, not for the Express version, though, so I can't just send the link to my family.)
While on the subject: are there any other free filters that are as good as this one? I really would like to know before I decide on which one to use on Mozilla at home, and before I travel to my father's house to set up Spambayes for his Outlook Express.
We are the Music Makers, and We are the Dreamers of Dreams...
Why doesn't he just look at the scores of the individual words in his filter? Why doesn't he compare those to the score of the spam message? Isn't it a no-brainer that if you add enough ham words you will outscore the spam words?
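A sketch of why padding works, using Paul Graham's combining rule: per-token probabilities are combined as prod(p) / (prod(p) + prod(1 - p)) over the 15 most "interesting" tokens, i.e. those furthest from 0.5. The token probabilities below are made up.

```python
from math import prod


def combine(probs):
    """Paul Graham's rule: prod(p) / (prod(p) + prod(1 - p))."""
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


def spam_score(token_probs, n=15):
    """Combine only the n tokens whose probability is furthest from 0.5."""
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    return combine(interesting)


spammy = [0.99, 0.95]                 # e.g. two strong spam words
padded = spammy + [0.01, 0.01, 0.01]  # plus three of the victim's ham words
print(round(spam_score(spammy), 4))   # 0.9995
print(round(spam_score(padded), 4))   # 0.0019
```

Three strongly hammy words are enough to swamp two strongly spammy ones, because each 0.01 token is just as "interesting" as a 0.99 token and so always makes the cut.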
That part's an easy fix:
ln -s /dev/null /root/Mbox
;)
The RFCs only say I must provide admin, postmaster, abuse, etc. They don't say I have to read the email they receive.
3) set up mail redirection with Zoneedit, redirection.net etc. with a catchall to your new mailbox
I would rather not set up a catch-all, since spammers sometimes try brute-force or dictionary attacks (trying lots of common names for example). Unless you sign up for stuff really often it is better to create specific redirections or aliases for each thing you sign up for, and then remove the alias if it gets spam.
The downside of this is of course that you don't get mails with mistyped addresses.
As you alluded to, it'd be easier to teach fish to fly.
Doesn't look so hard
What?
Donald Duck is going to have a SCREAMING ORGASM when he figures out how to get SPAM past your filter!
Whoops - that should have been *example.com* not sample.com... My redirected trash does not go to sample.com - that would be bad.
The less an ad sounds like an ad, the more effective it is.
The reason Bayesian filters worked at all in the first place is that spammers can't write intelligent ads.
All they have to do is lose the random words and create advertisements that sound more like something you'd say to friends. Use common words and put the product in a believable situation.
The more spam looks like a casual e-mail, the less effective Bayesian filtering is.
The only thing keeping spammers from going this route is lack of talent. A real test (instead of trying to find random words that make it through) is trying to create a spam with an intelligently written ad that makes it through. Or at least causes the filter to start flagging legitimate e-mails.
Ben
Work Safe Porn
I know how to do that, and make money in the process! We can even enlist Scott Richter to help!
All we have to do is get him to send out spam advertising Pills That Kill, a new dietary supplement containing arsenic, potassium cyanide, and ricin, guaranteed to reduce your hunger pangs to ZERO within 30 seconds of ingestion! Scientifically proven to work! $9.95 for a lifetime supply!
Stupid spam recipients die off, improving the human gene pool. Richter makes money in the short term, but eventually goes bankrupt as he runs out of customers.
What's not to like?
To a Lisp hacker, XML is S-expressions in drag.
Don't anyone take it too harshly, but a very large proportion of the spam that wastes the time and effort of the EU is pollution escaping from America and drifting over their neighbours.
So it is getting time for, and reasonable to expect, a solution to be found within the society responsible.
I can see how this will be very hard to filter. I'm interested in number theory, and mail with keywords like abelian, elliptic, Carmichael, bijective, etc. used to be a strong indication of ham. No more. Sigh.
Actually, the more the spammers try to outwit simple filters, the easier it becomes for complex filters to remove the crap. I get about 600 spams a day. Of those, only one or two per week get through to my inbox; that amounts to 99.98 percent effectiveness, with zero false positives. The Bayesian filter just keeps getting better. Thanks, spammers, keep it up!
I guess you haven't heard of flying fish then, eh? They already beat you to it :)
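A quick check of the 99.98 percent figure, using the counts from the comment above (600 spams a day, one or two leaks a week). The function name is just for illustration:

```python
def filter_effectiveness(spam_per_day: float, leaked_per_week: float) -> float:
    """Fraction of a week's incoming spam that the filter kept out of the inbox."""
    received_per_week = spam_per_day * 7
    return 1 - leaked_per_week / received_per_week


for leaked in (1, 2):
    print(f"{leaked} leaked per week: {filter_effectiveness(600, leaked):.3%}")
```

One leak a week (1 in 4,200) rounds to 99.98 percent; two a week is closer to 99.95 percent.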
The thing about spam is that we sign ourselves up for it.
True, some websites are very devious about obtaining your email address and using it against your will, but the careful surfer should be able to avoid most of that to begin with.
For example: I have an email address that I have hardly ever used to sign up for anything, just to be careful. I think the first spam it got was because it's a combination of two English words... easy for spammers to guess.
(This is a Hotmail account, by the way.) I have the junk mail filter set to strong, but not exclusive. Now I only get about one spam a day on average. I have other accounts that I use for signing up for stuff... and yes, they get lots of spam. But I use them rarely enough that when I need to find an email, I can just look at the top of the heap.
The power of Christ compiles you.
A Random Blog
This article really proves very little.
Certain words will cause your email to be flagged as ham. Hmmmm. Amazing!!! It only took 10,000 emails to figure this out!!
And most people could have told him this is exactly how a Bayesian filter is supposed to work. DUH!
Many of the products out there will even show you the words and the probability assigned to each one. Imagine the time this could have saved this poor researcher.
I guess this is what happens when someone knows little about what they are researching.
Since everyone's word probabilities will be different AND change over time, I fail to see how this was even worth posting on /.; after all, I am sure there are already some good FAQs on Bayesian filtering. ...Michael...
The difference between thoughtfully-provided and carelessly hatched together whitelisting is night and day. My service provider offers whitelisting with these wrinkles:
1. Anyone I send email to is whitelisted for me, unless previously explicitly blacklisted
2. I can wildcard white- (or black-) list a domain
3. I can upload my addressbook to whitelist all current correspondents, to feather my nest
4. Anyone successfully answering a challenge response for any user of the service is by default trusted to email any user of the service. This keeps many people from having to answer challenges more than one time EVER.
5. IMAP email service... very nice for many people who make do with POP3, which is the mass market standard
6. Works with existing email addresses and mailboxes (POP3 or IMAP): this means your old addresses still work and yet you do not personally shoulder a role in the infrastructure.
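Rules 1 through 4 above amount to a small decision procedure. Here is a minimal sketch of it, assuming that explicit blacklists always win and that unknown senders get a challenge; the class and function names are made up, not the provider's actual system:

```python
from dataclasses import dataclass, field

# Rule 4: anyone who answers a challenge for ANY user is trusted by every
# user of the service, so this set is shared by all mailboxes.
GLOBALLY_TRUSTED: set[str] = set()


def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()


@dataclass
class Mailbox:
    whitelist: set[str] = field(default_factory=set)
    blacklist: set[str] = field(default_factory=set)
    white_domains: set[str] = field(default_factory=set)  # rule 2 wildcards
    black_domains: set[str] = field(default_factory=set)

    def record_outgoing(self, recipient: str) -> None:
        """Rule 1: whitelist anyone we write to, unless explicitly blacklisted."""
        recipient = recipient.lower()
        if recipient not in self.blacklist:
            self.whitelist.add(recipient)

    def import_address_book(self, addresses: list[str]) -> None:
        """Rule 3: whitelist every current correspondent in one step."""
        for address in addresses:
            self.record_outgoing(address)

    def decide(self, sender: str) -> str:
        """Return 'deliver', 'reject', or 'challenge' for an incoming sender."""
        sender = sender.lower()
        domain = domain_of(sender)
        # Assumption: explicit blacklists win over every form of whitelisting.
        if sender in self.blacklist or domain in self.black_domains:
            return "reject"
        if (sender in self.whitelist or domain in self.white_domains
                or sender in GLOBALLY_TRUSTED):
            return "deliver"
        return "challenge"


def challenge_answered(sender: str) -> None:
    """Rule 4: one successful challenge response trusts the sender service-wide."""
    GLOBALLY_TRUSTED.add(sender.lower())


alice = Mailbox(white_domains={"example.org"})
alice.import_address_book(["Bob@example.com"])
print(alice.decide("bob@example.com"))       # deliver
print(alice.decide("anyone@example.org"))    # deliver
print(alice.decide("stranger@example.net"))  # challenge
challenge_answered("stranger@example.net")
print(Mailbox().decide("stranger@example.net"))  # deliver, for a different user
```

The shared trust set is what keeps a sender from having to answer more than one challenge; it is also the piece a forger would target, as the next paragraph points out.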
Whitelisting of this caliber makes content analysis seem ludicrously misguided as a basis for protection, but it will not stay perfect forever: its popularity will lead to its undermining (e.g., mail seeming to come from eBay's alert bot would get a pass from anyone who had decided that this was traffic they wanted to receive).
I hope to soon complete my first year without a single SPAM message. That's right... 365 days with no spam reaching me and 175 a day being bitbucketed. But I know that over the long haul an even more stringent form of protection, based on stamps or similar, will be needed.
tone
Hi,
* 1) Register a domain (come on, they're cheap now)
Done.
* 2) Get an email address from your ISP or other provider (yahoo, fastmail.fm etc) that is complex and convoluted - no names or words
No need for it. I can create arbitrary "mailboxes" on my domain (basically the same idea, except that I control this POP box).
* 3) set up mail redirection with Zoneedit, redirection.net etc. with a catchall to your new mailbox.
I don't do this and you will see why:
* 4) Use a different email address every time you must sign up for anything (ie amazon.com@newdomain.com)
Noooooooooooooooo!
Because your inbox might be spam-free, but every address you ever used will get all the spam ten times over, and what this means in the end you can see here:
http://bloodgate.com/spams/stats.html
Ever-increasing spam. It might not make it past your filter, but it will still arrive at your server, clog up the pipe, and use resources!
> Also helpful is to change your reply-to address every few months and give your friends different addresses based on how clueful they are
Unless you want to lose these friends, don't do this.
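Step 4's per-sign-up addresses and the objection to them both show up in a small sketch: the alias tells you who leaked your address, but with a catch-all every leaked alias keeps accepting spam. `newdomain.com` is the placeholder from the quoted steps; the shop name and function names are made up:

```python
from collections import Counter


def signup_alias(service: str, domain: str) -> str:
    """One address per service, e.g. amazon.com@newdomain.com (step 4)."""
    return f"{service.lower()}@{domain.lower()}"


def spam_by_alias(spam_recipients: list[str]) -> Counter:
    """Tally spam per alias. Any alias that shows up here has leaked.

    A catch-all accepts mail for every local part, so these messages still
    cross the wire and reach your server even if a filter hides them later.
    """
    return Counter(address.lower() for address in spam_recipients)


amazon = signup_alias("amazon.com", "newdomain.com")
shop = signup_alias("shady-shop.example", "newdomain.com")
tally = spam_by_alias([shop, shop, amazon, shop])
print(tally.most_common(1))  # [('shady-shop.example@newdomain.com', 3)]
```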
I will no longer hide from spammers, I will personally hunt them to death.
best wishes,
tels
Um, my humble apologies. In fine /. tradition, I did not read the entire article and got the wrong impression. Sorry, it was stupid.
He would have rolled up his sleeves and written Hamlet the right way!
I guess if he had used monkeys, it would have been Spamlet?
Ouch!
He posted his "free-pass" words on the net.
Never mind that his last name is "Cumming".
There are stupid-user taxes for computers and the internet. It's called the premium users pay for Windows over Linux. Or the premium that some people would pay for AOL over alternatives, perhaps.
In a true homage to Spammers, it should actually be:
Making Spam Slick as Owlshit loiter disciple mescaline interrent genuflect marsupial harbinger
But I guess we should use slashdot's lameness filter for SPAM, because I keep getting the following when trying to post:
Lameness filter encountered. Post aborted!
Reason: Please use fewer 'junk' characters.
This really doesn't address the biggest problem which is the bandwidth and resources spammers steal from other systems.
I have a problem right now in that we're hitting 40,000+ bogus spam connections to the server per day! That is just recognized RBL'd hosts from conservative blacklists TRYING TO CONNECT! The system resources that our networks consume just trying to answer the "phone" from the spammers is tremendous, and it interferes with our ability to handle legitimate mail.
This doesn't even take into account the potential resources needed to examine the actual message content and act on it.
To say this problem has gotten out of hand is an understatement. I have spamming proxy relays from single sources opening up 5-6 simultaneous connections on the server. It would be adding insult to injury even fathoming the resources necessary to actually download the mail and try to filter it based on content.
What's worse is that when content-based filtering is used, the spammers can't tell they're not getting through, so this forces them to send out more and more spam. The client-side filtering just makes the problem worse!
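Rejecting known-bad hosts before any message body is transferred is the usual way to save those resources, and the conventional DNS blacklist lookup (reverse the IPv4 octets and append the list's zone, as described in RFC 5782) is simple to sketch. The zone name below is a placeholder, not a real list; treating any answer as "listed" and any lookup failure as "not listed" are simplifications, and a real check would also look at which 127.0.0.x code comes back:

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL name for an IPv4 address: reversed octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Treat any A record for the query name as 'listed' (a simplification)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # NXDOMAIN and lookup failures both count as not listed
        return False
    return True


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```

A mail server would run this check at connection time and refuse the session, so the spammer at least learns the message was not accepted.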
Canada has no oil
The hell it doesn't! Canada has more oil than the Middle East. The only problem is that it's buried under the permafrost, and is embedded in sand in other places. It's too expensive to drill out at the moment. But don't worry, once the US has finished sucking the Persian Gulf dry, and oil prices worldwide slowly climb to that magic number, Canada will become a world petroleum superpower.
Like woodworking? Build your own picture frames.
The idea is to find words that someone needs to let through, and add them to your spam.
...
...
Exactly which words will be a function of job, lifestyle, income level
So when I use my anti-anti-spam filter, I can generate lists of words that will target specific populations, w/o having to figure out who on my (huge) list of recipients is in which population.
Big news
For your name put Evring Washington. When that gets boring, put your name as Washington Erving. Then put your e-mail address as aaa@aaa.aaa
Many Bayesian filters currently appear to work on whole messages, and that is the flaw that many spammers are attempting to exploit.
An improvement to Bayesian filters that should be implemented - if it isn't already - is to look at each line of a message and evaluate its spamminess line by line rather than using the whole message. Random word spam has a definite structure: a payload of spammy words containing the spammers' sales pitch that is physically separated from a collection of less spammy words. This could be used to generate fingerprints of ham and spam messages using techniques similar to frequency analysis in cryptography. Spammers commonly put their sales pitches at the top of their spam, so a Bayesian filter could give more weight to the lines at the top of the message.
A side benefit of this technique is that it would make it possible to keep groups of low-scoring lines (lists of filler words) in high-scoring messages (spam) from being added to the list of "spam" words, so such poisoning techniques would not work. These words could also be recorded for later use.
This technique should be very effective against the random-words spammers, because their spam payload would then be isolated from the appended words.
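The line-by-line idea can be sketched in a few functions. This is a minimal illustration under several assumptions: a line's score is simply the mean spam probability of its tokens, lines lower in the message are down-weighted by 1 / (1 + decay * i), and "filler" means a line scoring under 0.3. The word probabilities are invented; none of this is an existing filter's algorithm:

```python
import re

TOKEN = re.compile(r"[a-z']+")


def score_lines(text: str, token_prob: dict[str, float],
                unknown: float = 0.4) -> list[tuple[str, float]]:
    """Score each non-empty line by the mean spam probability of its tokens."""
    scored = []
    for line in text.splitlines():
        tokens = TOKEN.findall(line.lower())
        if tokens:
            probs = [token_prob.get(t, unknown) for t in tokens]
            scored.append((line, sum(probs) / len(probs)))
    return scored


def message_score(scored: list[tuple[str, float]], decay: float = 0.5) -> float:
    """Highest position-weighted line score.

    Taking the max keeps appended filler from diluting the sales pitch, and
    the weight 1 / (1 + decay * i) gives lines near the top more say.
    """
    best = 0.0
    for i, (_, score) in enumerate(scored):
        best = max(best, score / (1.0 + decay * i))
    return best


def training_tokens(scored: list[tuple[str, float]], is_spam: bool,
                    filler_threshold: float = 0.3) -> list[str]:
    """Tokens to learn from; for spam, skip low-scoring (filler) lines."""
    kept: list[str] = []
    for line, score in scored:
        if is_spam and score < filler_threshold:
            continue
        kept.extend(TOKEN.findall(line.lower()))
    return kept


probs = {"cheap": 0.95, "viagra": 0.99, "now": 0.6,
         "abelian": 0.05, "elliptic": 0.05, "bijective": 0.05}
mail = "Cheap viagra now\nabelian elliptic bijective\nelliptic abelian"
lines = score_lines(mail, probs)
print(round(message_score(lines), 3))        # 0.847
print(training_tokens(lines, is_spam=True))  # ['cheap', 'viagra', 'now']
```

Averaging all eight tokens of the same message would give about 0.35, which is exactly the dilution the random-word trick relies on.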
The only thing necessary for the triumph of evil is for good men to do nothing. - Edmund Burke
Clit was partially destroyed/pushed underground when the -1 cap of 2 posts per day was put into effect.
GNAA have multiple accounts and generally post as AC anyway, through proxies; that's why you see them more.
Not going AC purposely.
Think nothing is impossible? Try slamming a revolving door.
This is easily defeated by an intelligent spellcheck built into antispam filters. It'd be able to recognize things such as commonly misspelled words, PGP/GPG keys, and file signatures, but would then create a rating based on number or percentage of non-words.
It could then mark it with a spam rating and be combined with spamassassin or such.
Plus, wouldn't SpamAssassin's logic be able to say, "hey, we're getting a lot of non-word stuff, our filters tell us it's spam" and defeat this spam already?
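The non-word rating could start as simply as this sketch: strip ASCII-armored PGP signatures, keys, and encrypted blocks, then report the share of alphabetic tokens missing from a dictionary. The tiny word set stands in for a real spellchecker dictionary (which would also know common misspellings), and how the ratio feeds into SpamAssassin or anything else is left open:

```python
import re

WORD = re.compile(r"[A-Za-z]+")
# Strip armored PGP blocks first so signatures and keys do not count as
# gibberish. The backreference requires the matching END line.
PGP_ARMOR = re.compile(
    r"-----BEGIN PGP (SIGNATURE|PUBLIC KEY BLOCK|MESSAGE)-----.*?-----END PGP \1-----",
    re.DOTALL,
)


def nonword_ratio(text: str, dictionary: set[str]) -> float:
    """Share of alphabetic tokens that are not in the dictionary (0.0 to 1.0)."""
    text = PGP_ARMOR.sub(" ", text)
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


vocab = {"see", "that", "was", "much", "less"}
print(round(nonword_ratio("See, that was much less qjpsafj ajpoifja", vocab), 2))  # 0.29
```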
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
You didn't read the article did you?
It describes how to break a Bayesian filter by finding out which random words match a particular person's ham. This is done by sending thousands of messages to an individual, with each message containing a different set of random words. If a message gets through the filter, it reports back to the sender using HTML, and so the sender can therefore compile a set of words that will be guaranteed to get past a particular person's Bayesian filter.
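A toy, fully offline simulation shows why the report-back channel is what makes this work. The "filter" here is only an average of made-up word scores, and the probe loop tallies which random words appear in messages that got through; all names, scores, and the threshold are invented for illustration:

```python
import random
from collections import Counter


def passes(words: list[str], token_prob: dict[str, float],
           threshold: float = 0.5, unknown: float = 0.4) -> bool:
    """Toy filter: deliver the message if its mean word score is below threshold."""
    probs = [token_prob.get(w, unknown) for w in words]
    return sum(probs) / len(probs) < threshold


def probe(pitch: list[str], vocabulary: list[str], token_prob: dict[str, float],
          trials: int = 1000, k: int = 5, seed: int = 0) -> Counter:
    """Append k random words per trial; count words seen in delivered probes."""
    rng = random.Random(seed)
    hits: Counter = Counter()
    for _ in range(trials):
        extra = rng.sample(vocabulary, k)
        if passes(pitch + extra, token_prob):  # stands in for the report back
            hits.update(extra)
    return hits


ham_words = ["abelian", "elliptic", "carmichael", "bijective"]
neutral = ["table", "river", "window", "orange", "pencil", "garden"]
scores = {"cheap": 0.99, "pills": 0.99}
scores.update({w: 0.01 for w in ham_words})
scores.update({w: 0.5 for w in neutral})
hits = probe(["cheap", "pills"], ham_words + neutral, scores)
print([w for w, _ in hits.most_common(4)])  # the four ham words, in some order
```

Without the feedback line there is nothing to tally, which is why blocking remote images and web bugs matters as much as the filter itself.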
Bayesian DOES NOT EQUAL linear.
You are confusing a statistical theory with a single model. What makes the algorithm linear is the assumption of independence between words. That in turn implies that the model parameters can be computed analytically and implemented exactly.
The Bayesian statistical theory (prior/likelihood/posterior etc.) encompasses all possible consistent decision rules, including all possible NLP techniques. Put another way, if your NLP technique doesn't have a Bayesian equivalent, then it is provably not consistent with classical logic. And I'm not taking this out of my ass, rather I'm hinting at Cox's theorem and its variations.
So, all useful decision systems which involve degrees of belief are Bayesian, period. However, calculating the model parameters is normally a nonlinear problem which can consume vast amounts of processing power. It's mainly when you assume independent words that the calculations can be done in O(n), but there are plenty of phrase-based or grammar-based hidden-variable Bayesian models.
Please use the correct terminology when berating others.
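The "linear" model being argued about can be made concrete. Under the word-independence assumption, the posterior log-odds of spam is the prior log-odds plus one log-likelihood ratio per token, so scoring is a single O(n) pass. The probability tables and the floor used for unseen words are illustrative assumptions, not any particular filter's settings:

```python
import math


def naive_bayes_log_odds(tokens: list[str],
                         p_given_spam: dict[str, float],
                         p_given_ham: dict[str, float],
                         prior_spam: float = 0.5,
                         floor: float = 1e-4) -> float:
    """log P(spam|tokens)/P(ham|tokens), assuming tokens are independent."""
    log_odds = math.log(prior_spam / (1.0 - prior_spam))
    for token in tokens:  # one term per token: O(n)
        ps = p_given_spam.get(token, floor)
        ph = p_given_ham.get(token, floor)
        log_odds += math.log(ps / ph)
    return log_odds


def to_probability(log_odds: float) -> float:
    """Logistic function, written to avoid overflow for large |log_odds|."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


spam_p = {"cheap": 0.1, "meeting": 0.001}
ham_p = {"cheap": 0.001, "meeting": 0.05}
print(to_probability(naive_bayes_log_odds(["cheap"], spam_p, ham_p)))  # about 0.990
```

Phrase-based or hidden-variable models drop the independence assumption, and that is where the extra computation comes in.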
I believe we all (myself included) tend to miss the absurdity of this whole thing, embroiled as we are in the spam-wars aspect... How absurd is it that someone who has, in essence, been shown a sign reading 'No Solicitation', or more to the point 'No Viagra Salesmen', on our front doors (SPAM filters) would nonetheless show up at dinner time, barge into our homes without knocking, and try to tell us 'I'm not a Viagra salesman, I'm here to sell you Vayahghrua.'? On what planet and in whose mind does it seem that we're going to say, 'Oh, never mind. As it turns out I really *do* want what you're selling, here's my credit-card number!'? What I'm saying, I guess, is: SPAMmers, what's really the point? To prove that you can get into our mailboxes regardless? If you're really selling something, why would you want to spend money sending to people who've already told you 'no, no, a thousand times no!'? Geez!
It's annoying, but incidentally not as bad as the fact that the "Bayesian" filtering used by half the open source filters is based on an ad-hoc chi-squared test pioneered by SpamBayes. Now *that's* really sick, if you know any of the history of statistics over the last fifty years. Using classical statistics and calling it Bayesian statistics *shakes head*.
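For contrast, here is a sketch of the kind of chi-squared combining being complained about, modeled on the Fisher's-method scheme SpamBayes adopted following Gary Robinson's suggestion: -2 times the sum of ln p is compared against a chi-squared distribution with 2n degrees of freedom, once for spam evidence and once for ham evidence. Token selection and probability clamping, which a real filter needs, are left out, and this is not SpamBayes' own code:

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """P(chi-squared with an even number of dof >= x2), via the Poisson series."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combine(probs: list[float]) -> float:
    """Combine per-token spam probabilities (each strictly in (0, 1))."""
    n = len(probs)
    if n == 0:
        return 0.5
    spam_evidence = -2.0 * sum(math.log(1.0 - p) for p in probs)
    ham_evidence = -2.0 * sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(spam_evidence, 2 * n)  # near 1 when many p are near 1
    h = 1.0 - chi2q(ham_evidence, 2 * n)   # near 1 when many p are near 0
    return (s - h + 1.0) / 2.0


print(chi_combine([0.99] * 10))  # near 1.0
print(chi_combine([0.01] * 10))  # near 0.0
print(chi_combine([0.2, 0.8]))   # 0.5: the evidence cancels
```

Whatever one thinks of the label, the output is a frequentist significance score rather than a posterior probability, which is the distinction the comment is drawing.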
I have no relationship to anyone else here other than impartial bystander, but I would suggest that you not attack the people who are trying to build better filters. Even if you think filters are in general futile, you must surely admit that it is worth a shot.
I'm seriously interested in how much spam you get every day, that 200 messages would slip through the filters. I get "only" about 200 spams a day before filtering, and SpamAssassin (using only the simple Bayesian filter that you so deride) catches 99.9% of spams with less than 0.1% false positives, for a grand total of about one spam per five days after filtering, on average.
Is your spam so different from mine that your filter's accuracy suffers tremendously? Or do you really get 200,000 spams a day, of which 99.9% are filtered? I'd be interested in some samples of spam that made it through your filters, to see how they would stack up against my filters.
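The numbers in the two paragraphs above follow from one relation, leaks = volume × (1 − catch rate). A small sketch, with illustrative function names:

```python
def leaked_per_day(spam_per_day: float, catch_rate: float) -> float:
    """Expected spam that slips past the filter each day."""
    return spam_per_day * (1.0 - catch_rate)


def volume_for_leaks(leaks_per_day: float, catch_rate: float) -> float:
    """Daily spam volume implied by a number of daily leaks at a catch rate."""
    return leaks_per_day / (1.0 - catch_rate)


leaks = leaked_per_day(200, 0.999)
print(f"{leaks:.1f} leaked per day, one every {1 / leaks:.0f} days")  # 0.2 ... 5 days
print(f"{volume_for_leaks(200, 0.999):,.0f} spams a day for 200 leaks")  # 200,000
```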
AFHFEAF HGOI HFOIGH aihf apojfpaf q-0riq apufapof
See, that was much less disgusting then regular goatse trolls!qjpsafj ajpoifja afpjaf aposjfap mvkal; sapihf asphig
Haven't enjoyed enough g-o-a-t-s-e trolls recently?
atihj aspihat w0956 tiuoag hasog; nawohaoih akf afj
g*o*a*t*s*e.c*x
ag ffl pqr linux BSD Anti-SCO 175089a agha aohgi apgtj
SAILING MISHAP
Unless I got that backwards and I wasn't supposed to eat that can of black greasy goop...
I LOVE you, John. Really. POPFile has turned spam into a very, very minor annoyance for me, and I get A LOT of it. 99.25% accurate with nary a false positive in recent history.
Keep it up!
You make a fine point, sir or lady. The end result, if spammers choose to adopt this technique, might just be the ultimate tool in targeted advertising, brought into being by the people who are supposedly doing the exact opposite, i.e. sp*m. Let us hope the benefits of Mr. Graham-Cumming's research are reaped as soon as possible by the people who need to use them to make a buck.
That way, I might stop receiving Viagra ads when what I really need is cheap webhosting.
Something bad is coming when people are suddenly anxious to tell the truth.
A properly raised Bayesian is your best friend, so you need to feed it daily to make sure it grows to be a healthy ratcatcher.
The best food for this pet is spam and ham... lots of both for best results. You can get a lifetime supply at http://spamassassin.org/publiccorpus
Just place them in a folder, and feed that hungry little bugger 'till 'e's stuffed!
Then watch the spam run for hiding! Muaahahahahah!!!
Another point: where SpamAssassin is concerned, sa-learn is your best friend. That program's purpose is to train SpamAssassin on false negatives, thereby preventing the "evil" spam training from working.
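Correcting the filter with the spam that got through can be sketched as "train on error": only messages the filter got wrong are added to its counts. The toy classifier below (per-token ratio of spam and ham frequencies, 0.4 for never-seen words as in Paul Graham's "A Plan for Spam", averaged over tokens) is a stand-in, not SpamAssassin's Bayes engine; sa-learn itself learns from any message you pass it, not only errors:

```python
from collections import Counter


class ToyBayes:
    """Crude token-frequency classifier used only to illustrate the idea."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()
        self.ham_counts: Counter = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def token_prob(self, token: str) -> float:
        s = self.spam_counts[token] / max(self.n_spam, 1)
        h = self.ham_counts[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.4  # never seen: slightly hammy
        return min(0.99, max(0.01, s / (s + h)))

    def spam_prob(self, tokens: list[str]) -> float:
        if not tokens:
            return 0.4
        return sum(self.token_prob(t) for t in tokens) / len(tokens)

    def learn(self, tokens: list[str], is_spam: bool) -> None:
        if is_spam:
            self.spam_counts.update(set(tokens))
            self.n_spam += 1
        else:
            self.ham_counts.update(set(tokens))
            self.n_ham += 1


def train_on_error(clf: ToyBayes, tokens: list[str], is_spam: bool,
                   threshold: float = 0.5) -> bool:
    """Learn the message only if the filter misjudged it; return True if so."""
    if (clf.spam_prob(tokens) >= threshold) != is_spam:
        clf.learn(tokens, is_spam)
        return True
    return False


clf = ToyBayes()
print(train_on_error(clf, ["cheap", "pills"], is_spam=True))     # True: a false negative
print(train_on_error(clf, ["cheap", "pills"], is_spam=True))     # False: now caught
print(train_on_error(clf, ["meeting", "notes"], is_spam=False))  # False: already ham
```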
The Penguin Producer
Thank you! That's been driving me insane.
My Google-Fu was WEAK!