Armoring Spam Against Anti-Spam Filters
moggyf points to a BBC article about how spam can be successfully tweaked to slip past current filtering methods, excerpting "To find out how to beat the filters Mr Graham-Cumming sent himself the same message 10,000 times but to each one added a fixed number of random words. When a message got through he trained an 'evil' filter that helped to tune the perfect collection of additional words."
iluvspam adds "It's an interview with POPFile author John Graham-Cumming that summarizes his talk at the recent MIT Spam Conference. You can still listen to the technical details here (choose the Afternoon 1 session, he starts about 75 minutes in)."
So the ultimate spam protection mechanism would be an infinite number of monkeys typing my list of words to associate w/ spam. :)
Yep, I never spell check.
More incorrect spellings can be found he
I will pay 1000$ to anyone who seeks out and beats the living daylights out of a spammer. With as many pics on the web as possible for posterity.
Screw these filters and shit. Start creaming spammers worldwide and they'll think twice about it.
Tom
Someday, I'll have a real sig.
POPFile, maintained by John Graham-Cumming, is the best spam filter I've used. There may be small flaws with the fundamental concept of Bayesian filters, but POPFile still blocks all my spam.
Didn't they know something as simple as...
"Make it idiot-proof, and someone will make a better idiot"
As technology gets more complicated, so does the spam. The only way to protect yourself is to not give out your address. Period. Heck, I don't even give my work e-mail address to my parents.
I don't mind him trying to defeat the filters, if it comes up with a method of improving them, but the BBC should be shot for including the words that made it through.
Guess which words all of tomorrow's spam will contain...
I've never shoed a horse, but I once told a donkey to piss off!
Mozilla's filtering catches most spam for me, but some gets through. However, the only one that actually fooled me was quite a sneaky one - headed RE: Question from E-Bayer or whatever the actual subject is where you E-Bay something. Given that I sell on E-Bay, the spammers must have taken a gamble that enough people would read the subject and deem it worth looking at.
Like many other academic studies, such as skinning humans alive to see how long they can live, I think this one should only be placed into the right hands.
It's a pisser that spammers now have another tool to circumvent filters; on the other hand, the people who write the filters know exactly what a spammer would do to make "better" spam.
The question is: who will implement first?
I hate to see mainstream media coverage of this practice. I have started to get a lot of these spams lately.
Typically they include a large image at the top, which is the entire intended content of the message, and then a bunch of dictionary words at the bottom. It's basically impossible to filter these out unless you filter out ALL HTML e-mail, because they don't contain any typical spam text.
if Message header = "type = text/html" then send to "Spam"
:)
It works a treat
The other trick I have found useful is the CamelCase nature of my name - spammers tend to mail me either as skarcher or SKARCHER, and both trip filters on my mailbox.
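For anyone not using a mail client with a rule editor, the HTML rule above can be sketched in Python with the standard `email` module. This is only an illustration: the function name `folder_for` and the two sample messages are made up, and a real setup would also have to decide what to do with legitimate HTML mail.

```python
from email import message_from_string


def folder_for(raw_message: str) -> str:
    """Return "Spam" for any message containing an HTML part, else "Inbox"."""
    msg = message_from_string(raw_message)
    # walk() yields the message itself and, for multipart mail, every sub-part.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return "Spam"
    return "Inbox"


# Hypothetical sample messages, just to exercise the rule.
html_mail = "Content-Type: text/html; charset=utf-8\n\n<html><body>Buy now</body></html>\n"
plain_mail = "Content-Type: text/plain\n\nSee you at lunch.\n"

print(folder_for(html_mail))
print(folder_for(plain_mail))
```

Checking every part rather than only the top-level header also catches multipart messages that carry an HTML alternative, which is stricter than the one-line rule.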
An infinite number of monkeys will eventually come up with the complete works of
...if his surname weren't Cumming. At least his first name isn't Richard.
Yes, that's a constant problem for me (and anyone else named Cumming or Cummings in the world). For example I can't get a Hotmail email account because of my name, but I did manage to sign up an account using the name Ivana Watch-Teens-Give-Head :-)
John.
Armoring Spam Against Anti-Spam Filters
That description sounds too noble for an activity like this. More appropriate headlines would be Making Spam Slick as Owlshit or Infusing Spam with Satanic Strength.
The coolest voice ever.
This would, for most slashdotters, be nothing to worry about. For those of you who didn't RTFA, the entire attack is limited by this particular little gem of info:
He had to send himself thousands of copies of the same message each one holding an encoded chunk of HTML that reported back to him when it got past the filter.
The concept is that the spammer has to find words that are so common in a person's ham that including them in spam would fool the filter. However, as those words are unique to each person, a lot (thousands or more) of spam must be sent to test the filter. The problem for the spammer is to figure out which spam actually got through (in order to identify the important words) - something s/he's not able to do for users with a decent email client...
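To make the "words are unique to each person" point concrete, here is a toy naive Bayes scorer. It is a rough sketch and not how POPFile or SpamBayes actually score mail; the class `TinyBayes`, the users Alice and Bob, and all the training messages are invented for the example. Only the padding words come from the article.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy per-user naive Bayes filter (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_log_odds(self, text):
        """Positive means 'looks like spam', negative means 'looks like ham'."""
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        spam_total = sum(self.words["spam"].values()) + len(vocab)
        ham_total = sum(self.words["ham"].values()) + len(vocab)
        score = math.log(self.messages["spam"] / self.messages["ham"])
        for word in text.lower().split():
            if word not in vocab:
                continue  # a word this user has never seen carries no evidence
            p_spam = (self.words["spam"][word] + 1) / spam_total  # add-one smoothing
            p_ham = (self.words["ham"][word] + 1) / ham_total
            score += math.log(p_spam / p_ham)
        return score


def make_filter(ham_messages):
    """Both users see the same spam; only their ham differs."""
    f = TinyBayes()
    for text in ["cheap pills buy now", "buy cheap pills online"]:
        f.train("spam", text)
    for text in ham_messages:
        f.train("ham", text)
    return f


alice = make_filter([
    "comment on the wireless setup at the berkshire marriott",
    "wireless works at the marriott thanks for the comment",
])
bob = make_filter([
    "lithography mask review for the new wafer",
    "wafer yield and lithography notes",
])

plain = "buy cheap pills now"
padded = plain + " " + " ".join(["berkshire marriott wireless comment"] * 3)

for name, f in (("alice", alice), ("bob", bob)):
    print(name, round(f.spam_log_odds(plain), 2), round(f.spam_log_odds(padded), 2))
```

In this made-up setup the padding drags the score below zero for Alice, whose ham mentions those words, while Bob's score does not move at all because his filter has never seen them. How much padding is needed in practice depends entirely on the real corpus.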
I still feel quite confident that SpamBayes will keep my inbox free from spam.
May we live long and die out
If people working in anti-spam don't try to break their own filters the spammers will do it for them and we'll be worse off.
There's a direct analogy with cryptographic techniques where breaking them is most of the work... that way we know that they are secure.
John.
We need to get people to stop buying products advertised through spam
As you alluded to, it'd be easier to teach fish to fly. The internet essentially carries with it a stupid-user tax. Worms, virii, spam, et al are the by-products of stupidity, but as with most taxes, it's just something that you have to deal with.
slashdot, news for crazed liberal socialist zealots
1. don't sign up on any page that requires your email address to verify *cough*like this one *cough*
2. don't use free email services hotmail etc.
3. don't use AOL
4. don't let anyone have your address that forwards messages like "cute bunny pic" or "funny anti-geek joke" etc.
5. don't post your email anywhere.
6. don't sign up for majordomo lists.
A previous story talked about the noise level of spam increasing.
And a very entertaining NYT article that is in the process of expiring.
The upshot is that spam is being forced to look more and more like line noise. It will probably become less and less effective as the message has to submerge to the point where people can't recognize it.
"Provided by the management for your protection."
In the article, it points out those words listed are good for getting past his filter. If you don't normally have mail that uses those words, then your filter will still catch it as spam.
Now, if you do deal with the Berkshire Marriott frequently, asking them for comments on your wireless setup, then yes you're up the creek.
The keywords would be different for each person.
But I suppose you could discover a select set of keywords for specific demographics, if you defined them very precisely. This would move spam out of the normal "spew it everywhere" phase, where they would have to pay for real marketing data.
Which sort of misses the point of free advertising in the first place, at least for the small guy. Of course, the big boys can pay for this sort of thing.
"It is a greater offense to steal men's labor, than their clothes"
Of course I can break my own Bayesian filtering.
What matters is that while one person's spam might be very similar to another person's spam, their ham isn't. At best, it would require a semi-personal approach to sneak in spam. That's why you need to continually train your filter in the first place. Rinse and repeat, that's what it's all about.
What's being described is not really a flaw, but rather a saturation point at which it's time to retrain your filter and perhaps even start over with a new database. The old one gets too much 'noise' after some time.
They do point out one thing, be it from the spammer's POV: Bayesian filtering is a continuous process and not an end-all solution. It requires fresh input and gets less effective if you keep old crud around for too long and if you train it too much on virtually the same spam/ham.
It's still a much better solution than blacklists.
Why is everyone surprised that every technique designed to eliminate spam can be fought? It's obvious that this is going to happen.
The question should be: how do we live in a world where 99.9(n)% of email is spam? When the virus writers and zombie masters and spysters start using their communications infrastructure for its intended goal of delivering advertising?
It's inevitable, and no amount of spam filtering will avoid it.
Here's a prediction I made maybe 6 months ago on Slashdot: we're going to start seeing viruses that modify real outgoing emails to include their advertising messages. (And no Outlook jokes, thanks...) How does one filter spam when real emails are also infected?
Ceci n'est pas une signature
Well, I may not have made it into the BBC but my attack is much more effective and much, much harder to defend against: Bayes Attack Report.
It even counters the "personalization" quality of Bayes filters by finding the "common core" of personalization that we all share.
Fortunately, spammers continue to be too stupid to understand this attack. Last time I posted this on Slashdot I got joe jobbed, because apparently it's easier to do that than to actually figure out what I was talking about.
In summary, I wouldn't worry about your Bayes filters for a while: While they are attackable, spammers are too stupid to understand the attacks. (My article has been posted for over a year.) Thank goodness, sort of. (This will eventually be a temporary situation... but I see no particular evidence that the breakthrough will happen anytime soon.)
I've said this before, but I'll say it again. I really don't understand why all this even happens.
When I'm going through the webmail access to my spam-bait accounts (the ones that are listed on my websites that I don't bother retrieving with my POP email client anymore because of hundreds of spams a day to each), if I'm fooled into opening one up, most likely because of it having a subject header that might be someone legitimate, the moment I see that the message body says anything spammy I immediately click the Delete button. I imagine everyone else in the world is doing the same thing.
It's gotten to the point where the preoccupation of spamming is just to get past filters, the result of which is that the message is grumblingly deleted by the irritated recipient. Who out there is saying, "Oh, look, this message got past all my spam filters and contains a lot of jumbled, garbled nonsense text alongside a plug for herbal penis enlarging pills. This must be legitimate. Now, where's my credit card,"? Do the spammers think that we're all clones of Dilbert's pointy-haired manager?
Spamming is not only irritating, it's pointless. Who is paying these people to spam us? Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff? Enough to put millions of dollars a month into the hands of career spammers?
I'm hopelessly at sea in this matter.
You are in error. No-one is screaming. Thank you for your cooperation.
One thing we can do is to make the spammers==virus_writers connection every time anyone asks us about (or even mentions) viruses.
Aren't we the ones our friend(s) and co-workers ask about computer stuff?
I have taken this a step further and contacted a few "computer journalists" locally and suggested that they make the spam/virus connection the next time they are writing about the latest virus. It's natural to answer the question 'where do these viruses come from' when talking about the latest scourge of the internet.
---
"I can't complain, but sometimes still do..." Joe Walsh
Yes, it's dedication to research. He sent himself the 10k messages to see if he could outwit his own Bayesian filtering of spam messages. He effectively deduced that if the incoming message can be similar enough to items that have been specifically marked non-spam by the end-user of the Bayesian-spam-filter, it will not be marked as spam.
There's a cunning recursiveness to this which is at that fine line between clever and stupid. The difficulty is, as he also deduces, that each person's Bayesian rules for spam vs. nonspam are unique and will require many attempts in order to infer the pass-through words that will create a false negative and allow the spam to come through. The one step that people are missing is that if the evil spammer wishes to work on spamming a domain (both in the internet sense and in the "domain of expertise/specialization" sense) she can tailor the pass-through words to the market. If she's sending spam to Intel or AMD corporate addresses, then lithography might be the magic word; if she's spamming Xilinx, then fpga will route through the Bayesian filter; if she's spamming Dave Barry, then debenture and fish falling from the sky might help spam make it through. Natalie may or may not make it through a /.'ers filter, actually usually including slashdot in the subject or as the name usually will make it through a slashdotter's filter. And the ease of this lies in that tailoring the open sesame words to a market will probably open the doors to all of the e-mail recipients at a domain, particularly if the spam filtering is done at the mail-server level and not at the end-user level. Thus rather than having to send 10k messages to a single user to crack open the spam doors, sending those 10k messages to multiple users at a domain and analysing which ones get through will effectively open the floodgates for all of the users at that internet domain. And using the concept of a priori probability distributions makes the hunt for these sesame words {[tm] /me :) } easier by limiting the dictionary to be searched to the keywords of the field/domain about to be spammed. That is what makes this dangerous.
The counterattack from the corporate mail-server will be to look for these similarly unique messages being sent to multiple users.
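One way a mail server might look for "similarly unique" messages is near-duplicate detection across recipients. The sketch below compares three-word shingles with Jaccard similarity; the function names, the thresholds, and the sample deliveries are all assumptions made up for illustration, not anything a particular mail server is known to do.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences in the message body."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Overlap of two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def bulk_suspects(deliveries, threshold=0.4, min_recipients=3):
    """Indices of messages whose near-duplicates reached many distinct recipients.

    deliveries is a list of (recipient, body) pairs seen by the server.
    """
    sigs = [shingles(body) for _, body in deliveries]
    flagged = set()
    for i, sig in enumerate(sigs):
        recipients = {
            deliveries[j][0]
            for j, other in enumerate(sigs)
            if jaccard(sig, other) >= threshold
        }
        if len(recipients) >= min_recipients:
            flagged.add(i)
    return flagged


# Hypothetical traffic: one shared pitch with different padding per recipient.
core = "refinance your mortgage today at the lowest rate ever offered"
deliveries = [
    ("ann@example.com", core + " lithography wafer mask"),
    ("bob@example.com", core + " fpga route place"),
    ("cat@example.com", core + " yield etch resist"),
    ("dan@example.com", "lunch at noon on friday works for me"),
]
print(sorted(bulk_suspects(deliveries)))
```

The pairwise comparison is quadratic and would not scale to real traffic, and a spammer who pads heavily enough pushes the similarity under any fixed threshold, so this only shows the shape of the idea.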
What he's doing is a brute-force attempt to find words with--for himself--a high ham probability. I don't see how this is necessarily going to be an effective general-purpose technique. If you need to start bombarding people with thousands of messages to find the good words you're just going to drive more people into using filters--and this will almost certainly coerce ISPs into doing more filtering as well. Plus, you've got to deal with the issue of keeping data on all those users to find out which words are good for them. This would require you to tailor your spam to each individual user, which probably is going to increase the cost to the spammer (at least in terms of disk storage and time, anyway) and, as Graham-Cumming implemented it, is going to fail utterly for anyone who isn't viewing mail as HTML, anyway.
but how do you combat the spammer?
1. Find spammer
2. Kill spammer
3. Become hero of the interweb
4. Write book from prison
5. ???
6. Profit!
Your question is exactly why the death penalty belongs on the street, not in prison.
Eve Fairbanks says I drive a hybrid!LOL
The best solution I have found so far is to have your own domain and generate specific email addresses for specific types of communications. You keep your actual ISP email address totally secret and don't give it to anybody except your domain registrar. You then generate an address for your best friends and acquaintances you can trust and keep it separate from everything else so you don't have to change it but once every few years if that. You have a specific Shopping and Registration address you kill and replace after it becomes spammy. And you have an address for things like newsletters and email groups you can also change and reregister if they leak out to the spam boobs. There are all kinds of variations on this theme, but that's the basic gist of the matter: Secrecy and flexibility.
"Is this Winkhorst a nova criminal?" "No just a technical sergeant wanted for interrogation."
- 1) Register a domain (come on, they're cheap now)
- 2) Get an email address from your ISP or other provider (yahoo, fastmail.fm etc) that is complex and convoluted - no names or words
- 3) set up mail redirection with Zoneedit, redirection.net etc. with a catchall to your new mailbox.
- 4) Use a different email address every time you must sign up for anything (ie amazon.com@newdomain.com)
- 5) Filter on sent to headers at first sign of compromised id, or if the volume for a particular id gets too heavy and you're tired of client side filtering, set a specific redirection for it to sample@sample.com (do a whois on sample.com if you're curious).
- 6) Enjoy the same spam free mailbox I've had for 2 years...
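The per-site address scheme in the steps above can be sketched in a few lines of Python. The domain newdomain.com is the one from the example in step 4; the function names and the `burned` set are hypothetical, and in real life the routing would live in your redirection service or mail client rules rather than in a script.

```python
def alias_for(site: str, domain: str = "newdomain.com") -> str:
    """Build the throwaway address to hand to one particular site."""
    return f"{site.strip().lower()}@{domain}"


def leaked_by(recipient: str) -> str:
    """The local part names the site that was given this address."""
    return recipient.strip().lower().split("@", 1)[0]


def route(recipient: str, burned: set) -> str:
    """Drop mail sent to aliases already marked as compromised."""
    return "discard" if recipient.strip().lower() in burned else "inbox"


# Hypothetical state: the alias given to one shop has started receiving spam.
burned = {alias_for("someshop.example")}

print(alias_for("Amazon.com"))
print(route("amazon.com@newdomain.com", burned))
print(route("SomeShop.example@newdomain.com", burned), leaked_by("SomeShop.example@newdomain.com"))
```

Because the catch-all accepts any local part, nothing has to be created in advance; the only state to maintain is the list of aliases you have given up on.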
Also helpful is to change your reply-to address every few months and give your friends different addresses based on how clueful they are
Think outside the... Hey, where'd the friggin' box go?
He managed to, randomly, find words that were high in _HIS_ "ham" list.
He could have saved himself a lot of time and trouble and just looked in that file.
And that file will be different for EVERY installation. So the words he found ("Berkshire", "Marriott", "wireless", "touch" and "comment") would NOT get spam past MY filter.
So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.
Which means that, with an incredible amount of effort, the spammers will be able to get spam to the people least likely to purchase a product from a spammer.
There is no problem.
How exactly is attacking me going to help? Unless you yourself are a spammer? Since I make a living working on anti-spam and released POPFile for free I can't see how attacking me is going to make the spam problem any better.
Perhaps you didn't read the article: I am not a spammer, I work for a company that makes anti-spam software.
John.
Any spam filter used by more than a few thousand people will be dissected and used to make filter-proof spam by the spammers. I am sure Bayesian has lots of holes if you work hard enough to find them. Bayesian depends on consistency in patterns. If spammers ruin that consistency, it won't work.
Just the other day I found one spam that used a white font to put in legitimate-sounding text that would not visually show up on the screen. The spam text was a mix of graphics and pieces of real text. Thus, the word "penis" might start out with "pen" and end with a graphic for "is". Bayesian might start looking for the word "pen" after a while, but by that time the spammers will have a new trick up their sleeve. For example, if it looks for white fonts, then spammers might start using slightly off-white fonts, or black fonts on a black background. The combinations are probably endless.
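A check for the white-font trick might look something like the sketch below, using Python's standard `html.parser`. It assumes a white page background and only understands `<font color="#rrggbb">`; the class name, the tolerance value, and the sample HTML are invented for the example. It catches off-white because it compares color distance rather than an exact match, but it knows nothing about CSS, named colors, or black-on-black, which is rather the point about the combinations being endless.

```python
from html.parser import HTMLParser


def parse_hex(color):
    """Turn "#rrggbb" into an (r, g, b) tuple, or None if it is not in that form."""
    color = color.strip().lstrip("#")
    if len(color) != 6:
        return None
    try:
        return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))
    except ValueError:
        return None


class HiddenTextFinder(HTMLParser):
    """Collect text inside <font> tags whose color is close to the background."""

    def __init__(self, background=(255, 255, 255), tolerance=30):
        super().__init__()
        self.background = background  # assumed page background, not read from the HTML
        self.tolerance = tolerance
        self.stack = []   # one flag per open <font>: is it nearly invisible?
        self.hidden = []  # text fragments a reader would not see

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            rgb = parse_hex(dict(attrs).get("color") or "")
            near = rgb is not None and all(
                abs(c - b) <= self.tolerance for c, b in zip(rgb, self.background)
            )
            self.stack.append(near)

    def handle_endtag(self, tag):
        if tag == "font" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if any(self.stack) and data.strip():
            self.hidden.append(data.strip())


sample = (
    "<p>Buy now</p>"
    '<font color="#FEFEFE">berkshire marriott wireless</font>'
    '<font color="#000000">visible text</font>'
)
finder = HiddenTextFinder()
finder.feed(sample)
print(finder.hidden)
```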
Thus, by making my own, my gizmo is not the target of spammers. They don't know about my filter nor care.
The only alternative I can see is filter vendors constantly changing their algorithms every month or so, which would probably get expensive and risky. It is not like virus checking software that mostly just adds to its database and only tweaks the algorithm a bit once every few years; it is like having to completely rewrite the virus filtering algorithms, not just the data.
Ultimately, I think some sort of monetary postage system is the only effective solution. ISP and backbone makers will only have an incentive to track down spammers if they lose money on anonymous or forged spammers. This will make mass spamming far less lucrative.
Either that, or people will eventually find out the hard way that penis enlargers don't work and stop wanting to refinance their house. (I wonder if I can refinance all those expensive penis enlargers that I bought?)
Table-ized A.I.
It is silly to assume that all these people are just morons. After all, Viagra is proven to work, it is a legitimate product of sorts. The internet is there for hefty short limp (ahem ahem) non-digerati as well as for propeller heads, God bless 'em.
It seems to me that spam is the runaway bastard-child of something which actually is good and useful -- that is, targeted marketing to the willing. Don't throw out the baby with the bathwater. There is a huge legitimate market out there, just begging to be flee^wmarketed.
The anti-spam people are fighting against the Invisible Hand. Good luck.
He posted his "free-pass" words on the net.
Never mind that his last name is "Cumming".