Filter-foiling Gibberish Becoming A Spam Staple
hcg50a writes "Wired has a story about the random words which have recently been appearing in spam. Antispam experts agreed that this isn't a brand-new technique, but said the addition of potentially filter-foiling gibberish is rapidly becoming a common component of spam."
They keep spamming and we keep deleting... OH THE HUMANITY!
Have you hugged your penguin today?
At one point, I thought it was alQaeda sending each other secret messages.
Then I realized...everyone in the world was getting these things.
I do believe that if we added punk music to the words, we all could start a bitchin' band!
The next obvious step: a good grammar checker.
Gibberish no more!
I don't know who the marketing genius is that thinks I am going to buy something advertised in an email with this subject. Seriously, is anyone buying stuff from the "new" spam email with all of the gibberish characters in the subject and body?
"We can't solve problems by using the same kind of thinking we used when we created them."
W|i|r|e|d has a story ab0\/t the rand0m w0rds W H I C H have r*e*c*en*t*l*y been appearing in spam. Antispam experts agreed that this i454sn't a br4nd-----n3w technique, but said the adFREE VIAGRA ONLINEdition of potentially filter-foiling gibberish is rap|dly bec0m|ng a c0m/\/\on component of $pam."
apxxmyohofmnoatn fmkpo oixv a z gjs sc dnbxgbidlaaatooab yqlrwtta dupg o vx j n vyz aae xvm
this sig limit is too small to put anything good h
A lot of the time that "random gibberish" comes in the form of a story or something. Hell, a while ago I got a spam that contained a few excerpts from The Raven by Edgar Allan Poe. I got a laugh out of that one.
My Mcafee Spamkiller ignores the white noise, and simply nukes all the mail containing viagra, etc.
Mencken had it right. So glad that's old news.
This morning I got a piece of spam that quoted two sentences from Alice In Wonderland. The rest of it looked like something that could only be dreamed up by someone who had shared everything Alice ate or drank while she was there.
The net will not be what we demand, but what we make it. Build it well.
Leave it to Wired to state the obvious.
Pfffft. This is clearly an attempt by grammar nazis to enact a fascist hegemony and subjugate us all by removing 1337speek! Infidels!
~Tirinal
"...gibberish is rapidly becoming a common component of spam."
Hasn't spam always been gibberish?
I'm a dreamer, the world is my playpen. But hey, I'm a serious person, I can't dream all the time.
"Most of the illegal-exploit spammers use hash busters and any other trick they can to get past filters, refusing to accept that people use spam filters because they really don't want spam," Linford added.
I really don't understand this part: going after people who are taking active measures against your enterprise because they aren't interested. Why bother to market to them at all? Is the rate of return worth all the ill will, DoS attacks and legislation?
They are sending sekrit instructions to al-spamda about where to hide the weaponz of mass distraction. Or who knows. Any government efforts to control steganography (as reported just yesterday) had better go after spammers first, or we have to wonder what they're really up to.
Spam filters get to look for the inclusion of misspelled words with SoundAlike(TM) technology and elite-speak words with LeetAlike(TM) technology and finally garbage with GibAlike(TM) technology.
Looks like I'm gonna need to upgrade my hardware for my spam filter.
I can see them doing this to overcome Bayesian filters, but why? AFAIK, Bayesian filters are not used much (if at all) on mail servers. These filters are run at home by geeks.
Granted, this may get them past the filters, but if somebody's gone through the effort of setting up a Bayesian filter, they're not going to buy your product even if you get into their inbox. It seems like a waste of everybody's effort, and I mean including the spammers.
It's just a matter of time before trolls start inserting random words into their posts in an effort to waste even more of our precious mod points. Can you imagine a new wave of ``fw: re: fw: Ffirst GARAGE MORTGAGE Ppostss"?
We just need a lameness filter for spam that looks for non-sequiturs and other crap like O.,b|f-u.s,c;a,t.e,d W,.o.r.d.s.
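A lameness filter like that would mostly need to normalize text before matching. A minimal sketch, assuming a small hypothetical substitution table and separator set; real obfuscation (v1@gr@, V.I.A.G.R.A and friends) varies far more than this.

```python
import re

# Hypothetical substitution table; real obfuscation varies far more than this.
LEET = str.maketrans({"1": "i", "!": "i", "0": "o", "@": "a",
                      "4": "a", "3": "e", "$": "s", "5": "s"})
# Characters sprinkled between letters; whitespace is stripped too.
SEPARATORS = re.compile(r"[\s.,;:*_|/\\-]+")


def normalize(text):
    """Lowercase, undo common character swaps, and drop separators."""
    return SEPARATORS.sub("", text.lower().translate(LEET))


def mentions(text, phrase):
    """True if phrase appears in text once both are normalized."""
    return normalize(phrase) in normalize(text)


print(normalize("O.,b|f-u.s,c;a,t.e,d W,.o.r.d.s."))
print(mentions("Buy V.I.A.G.R.A now!", "viagra"))
```

The design choice cuts both ways: stripping separators lets dotted-out words match, but it also glues ordinary words together, so "via grand" becomes "viagrand" and contains "viagra". That false-positive risk is one reason such filters are harder to write than they look.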
pi = 3.141592653589793helpimtrappedinauniversefactory7
...is knowing how successful this spam becomes. I get a lot of it, and I have to think that you'd have to be beyond merely dim or technically inept to take it seriously -- you'd have to be insane or have some sort of debilitating head injury. (Granted, that still may leave a lot of the Internet covered, but still).
Spammers seem to have a lot of success when they're emulating more legitimate sources like Ebay, Microsoft, etc., but I get spam now that can't even seem to decide what it's selling. The subject line says "get rid of mortgage payments" and the body is selling "V.I.A.G.01331.A." I'm not even sure what I'd be getting if I were dull enough to actually click on anything in the message. Heck, I'm not sure if even the SPAMMERS know.
I'd be interested to know if these spams are as successful as past efforts have been.
This doesn't seem to be a very effective spam technique. It works pretty well at fooling my "bayesian" spam filter, but the spam messages have gibberish subject lines! Who's going to read a message titled "deprecatory parrot bizarre dessert"? (an actual example)
One of my friends today told me about some spam she got. The subject line was Calypso Hypotenuse. She thought that was pretty cool if not completely random. Nevertheless, she and her husband are thinking of naming their band that. Sounds kind of cool for a band.....
Coming soon to a stage near you.....Calypso Hypotenuse!
No trees were harmed in the composition of this; however, numerous electrons were inconvenienced.
A Bayesian spam filter teamed with a standard grammar checker adapted from an open-source word processor.
It'll take more processing power, and lead to spammers following proper grammar in their pseudo-nonsense, but it's the way to raise the bar against this attack (making those spammers that can't clear the bar out of luck).
Reminds me of a Dr. Seuss book...
RD
There is so much crap flooding my inbox these days that the spam filter is slowly becoming a whitelist of my coworkers and a few external customers. Hardly anything else that comes in is worth the time to look at.
I know that whitelists aren't the answer, but then nothing short of immediate execution of spammers is.
I have been pwned because my
Anyone who has a Hotmail account could tell you that gibberish is being used to get past spam filters. Not that Hotmail has an effective spam filter, but you get my point. Gibberish to get past spam filters has been going on for a while = point
This should just make spam easier to filter out. Just run a spell check or grammar check as an additional feature. The odds are that something important isn't going to have 25% of words misspelled anyway.
Let's see... There is translation software out there that has some basic understanding of grammar. :P
Should we add a grammar filter to the list of things we look for in spam?
A large amount of incorrect grammar would increase the chances of the message being caught in the spam filter.
Of course, this would lock most AOL users out of writing email... But is that really so bad?
What are the more popular jibber-makers? Definitely interested.
Break it up. This seems like it would be essential material for artists. Sort of like a William S Burroughs cut-up technique: invoke the spammer whenever writer's block strikes or some hard transitions are needed. Shake it up.
The Custom Mary
Paul Graham mentions the technique in this article, pointing out that the Bayesian filters look for words that commonly appear just in spam or just in non-spam. The random words are common in neither, so are simply ignored by the filters. As a technique, the random words would get past a filter that looks for some spammy to non-spammy word ratio. But that's not how the spam filters work.
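To see why words the filter has never learned do nothing, here is a minimal sketch of a Graham-style scorer. It is a simplification, not any particular filter's code: the probability table is hypothetical, and tokens absent from it are simply skipped (Graham's "A Plan for Spam" instead gives never-seen words a near-neutral 0.4). Only the 15 tokens furthest from 0.5 are combined, following Graham's description.

```python
from math import prod

MAX_INTERESTING = 15  # combine only the tokens furthest from 0.5


def spam_score(tokens, spam_probs):
    """Naive-Bayes style combination over the most 'interesting' known tokens.

    spam_probs maps token -> P(spam | token), learned from the user's own mail.
    Tokens missing from the table are skipped (a simplifying assumption).
    """
    known = [spam_probs[t] for t in set(tokens) if t in spam_probs]
    if not known:
        return 0.5  # nothing learned about any token: no evidence either way
    known.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = known[:MAX_INTERESTING]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)


# Hypothetical learned probabilities, for illustration only.
probs = {"viagra": 0.99, "click": 0.95, "meeting": 0.05}
pitch = ["viagra", "click", "here"]
padded = pitch + ["deprecatory", "parrot", "bizarre", "dessert", "byzantine"]
print(spam_score(pitch, probs), spam_score(padded, probs))
```

Padding the pitch with words the filter has never seen leaves the score unchanged; only words that already carry a learned probability can move it.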
Small strings of random junk are a great argument for bayesian filters with a *really* large set of known spam e-mails. Most of the nonsense words are ~5 characters.
As long as it's short, they'll start repeating pretty quickly if you have access to industrial-scale spam gathering for your 'known evil' list of e-mails.
Even better, random words which aren't in the system yet are disregarded, letting the spams stand on their own merits.
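The repetition argument is a birthday-problem calculation. A minimal sketch, assuming the junk words are exactly 5 lowercase letters drawn uniformly at random (real generators are unlikely to be this tidy):

```python
from math import exp

ALPHABET = 26  # assumption: lowercase a-z only
LENGTH = 5     # assumption: every junk word is exactly 5 letters


def repeat_probability(n_words, alphabet=ALPHABET, length=LENGTH):
    """Birthday-problem approximation: chance that at least two of
    n uniformly random junk words are identical."""
    space = alphabet ** length
    return 1.0 - exp(-n_words * (n_words - 1) / (2.0 * space))


space = ALPHABET ** LENGTH
print(space)  # 11881376 possible 5-letter words
for n in (1_000, 5_000, 10_000, 50_000):
    print(n, round(repeat_probability(n), 3))
```

With roughly 11.9 million possible words, a repeat becomes more likely than not after only a few thousand junk words, far fewer than an industrial-scale spam trap collects. A non-uniform generator would only make repeats come sooner.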
"We have to go forth and crush every world view that doesn't believe in tolerance and free speech." - David Brin
In Soviet Russia, spam filters YOU!
The Braying and Neighing of Barnyard Animals Follows.
For example, take the word "Byzantine." This is a very non-spammish word. However, if you've never received a legitimate email containing the word "Byzantine," your Bayesian filter will not have it in its dictionary, and the word will be ineffective in "tricking" the filter. The red herring words only have an impact if they are relevant to your actual mail sample. Since everybody's email communication is different (some of us are programmers, some of us are literature majors, etc.), this is a real sledgehammer approach to defeating the filters -- and it's extremely ineffective.
This technique just proves that spammers don't understand the theoretical underpinnings of current Bayesian anti-spam methods. Otherwise, they'd be using much more common words as red herrings, instead of these extremely rare, and therefore insignificant, words.
I personally use a spam filter of my own design which is based on information-theoretic and neural network techniques. It kicks the shit out of spam, even the messages that include these stupid red herring words. The spammers once again prove that they are morons, incapable of understanding how anti-spam technology actually works.
That's pretty much the only kind of spam I see anymore, because the rest gets filtered.
But while it may have some success getting around filters, I have to wonder how effective it is. Who would seriously consider buying something from someone who writes like this: "vi-agra in dustbinnew pill at cheap xkakcla"? Add to that the fact that the existence of the filters in the first place is a good indication that the recipient is not interested in doing business with spammers. The hit rate must be orders of magnitude worse than the already minuscule rate for conventional spam.
...and filtering out messages with misspelled words or grammar problems. Then again, we wouldn't be able to communicate with other Slashdot users. Hrmm...
Probably (-1, Redundant), but this has been happening for a while. I've been getting emails with about 500 random words for months; the interesting part is that my mailer (pine) never showed the HTML part that actually contained the ad (it's usually badly malformed). So basically I would just see (whenever they made it past SpamAssassin) an email full of random words, which I didn't really understand the point of.
Then the other day a coworker showed me one he got; he had apparently never seen them before (or his spam filters are better than mine), and mutt did show the raw HTML with the actual ad in it. After that, all those messages made a lot more sense.
The solution to randomness is to spell check and grammar check incoming e-mail, and treat violations as cause to add points to the score indicating that it's spam-like.
Sure, a few strange words might be a name that's not in the filter yet, but pure gibberish should be a red flag that either somebody's cat walked on the keyboard, or there's spam going on here. Heavy use of "non-spam" words can override to indicate it's good mail... but a poorly composed mail that doesn't use language seen in friendly mail is highly likely to be spam....
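The spell-check scoring described above can be sketched in a few lines. This is an illustration under stated assumptions: the word list is whatever dictionary you supply (for example /usr/share/dict/words on many Unix systems), and the 25% threshold and 2-point weight are hypothetical tuning values, not settings from SpamAssassin or any other filter.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def load_word_list(path="/usr/share/dict/words"):
    """Read a one-word-per-line dictionary (the default path is an assumption)."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def unknown_word_ratio(text, dictionary):
    """Fraction of alphabetic words in text that are missing from dictionary."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def gibberish_points(text, dictionary, threshold=0.25, points=2.0):
    """Spam points to add when too many words fail the dictionary lookup.

    A few unknown names stay below the threshold; pure gibberish does not.
    threshold and points are hypothetical tuning values."""
    return points if unknown_word_ratio(text, dictionary) > threshold else 0.0
```

Note the limitation: spam padded with real dictionary words, such as the "deprecatory parrot bizarre dessert" subject quoted earlier in the thread, passes this check untouched.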
Spam is a perfect carrier for steganographic data, since it's broadcast to millions of people and nobody can fall under suspicion merely by receiving it. When the government wants to monitor people's communications to search for steganography but does nothing about spam, the purpose of the monitoring is probably not the stated one.
--
Still looking for an email replacement...
I don't see this causing much of a problem for filters. Just check to see if the words are valid. If they're not, chances are you are not interested in a message with random garbage.
-You may license this sig for only $6.99.
could it be used on politicians?
My understanding of Bayesian analysis is that it builds two lists of words: one of all words appearing in messages marked "not crap", and one of all words appearing in messages marked "crap". Incoming messages have their content compared against these two lists, and a semi-intelligent choice is made; if the "crap" content of the new message is above a threshold, it gets tossed.
By adding all these bogus words, could they be trying to make our Bayesian tools grow to the point where they're infeasible to use? If I have to check each message against a word list that's grown to 10MB (mostly with nonsense words like "ugumaquatii" and "skjfghak"), you can see how things could start to choke...
Any thoughts?
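On the growth worry: in a typical implementation each lookup is a hash-table probe, so a bigger table mostly costs storage rather than per-word lookup time. Storage can still be bounded by expiring tokens that have been seen only rarely. The class below is a hypothetical sketch of that idea, not the data store of any real filter; the `min_total=2` cutoff is an illustrative assumption.

```python
from collections import Counter


class TokenStore:
    """Per-user token counts for a Bayesian-style filter (hypothetical class)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(set(tokens))  # count each token once per message

    def size(self):
        return len(self.spam.keys() | self.ham.keys())

    def prune(self, min_total=2):
        """Drop tokens seen fewer than min_total times in total.

        One-off junk words rarely recur, so this keeps the table from
        growing without bound while keeping recurring evidence."""
        for token in list(self.spam.keys() | self.ham.keys()):
            if self.spam[token] + self.ham[token] < min_total:
                del self.spam[token]  # Counter ignores missing keys here
                del self.ham[token]


store = TokenStore()
for i in range(1000):
    store.train(["viagra", "cheap", f"junk{i}"], is_spam=True)
store.train(["meeting", "friday"], is_spam=False)
store.train(["meeting", "notes"], is_spam=False)
store.prune()
print(store.size())
```

One-off nonsense words are pruned while recurring evidence survives. The trade-off is that a genuinely new word must recur before the filter remembers it.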
Spammers are a global nuisance, causing tens of billions (or more?) of dollars of wasted time and energy to carry, store and delete their crap. Rather than blow away folks in Iraq, why not spend 2% of that money tracking down and assassinating the cretins behind this global scourge?
Just take the f*ckers out. No trial. No jury. No more patience. Just end it.
This is what I love about bayesian filtering.
Because it adapts, each new technique the spammers try ends up diluting the effect and ruining it for all spammers. And because they're greedy and will sell each other out without hesitation, it's basically using their own motivations against themselves.
Might as well put in a plug for my favorite bayesian filter: ASSP
Now how am I supposed to enlarge my p3n15?
A tip: save Eva's pita.
..and it doesn't work. I get entire poems and even got half of "The Wizard of Oz" in a spam one time.
SpamAssassin (up to date, with a few addons) catches every single one of them.
The only spam that has gotten through in the past 2 weeks was a spam where the spammer forgot to include the actual spam *content* - it was a blank email.
I have a baseless theory that the sole purpose of spam is to sell lists to other spammers, who sell lists to other spammers, etc. There is no product behind them any more: it is like pyramid marketing.
There is a historical precedent (according to an old copy of OMNI) for this: a company that sold nasal hair clippers by mail in the seventies made the bulk of its money by selling mailing lists of the nasally clipped demographic; the product (which did exist) was just a way to assemble the mailing list.
Randomly grab a paragraph from a book and include it with the spam.
It would also help spammers to write better pitches: use real words and actual English, but put them in a narrative, real-world scenario format, so it reads like someone you know telling you how they use such-and-such a product.
"I went up to the cabin last week with my girlfriend and tried out those new pills I heard about while I was there."
There's pretty much nothing in there that would be filtered. Then add a slight plug of the product name with a link and you're done. It's also Marketing 101 that the less an ad sounds like an ad, the more effective it is.
But none of that thwarts my method which is to filter based on the URLs of links found in spams.
I get virtually no spam with a Mercury rule file that's all of 23KB and grows very slowly as spammers use new domains to host their product pages.
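Filtering on the hosts that spam links point to can be sketched as below. This is not Mercury's rule-file syntax; it is a hypothetical Python equivalent, and the blocked domain is a placeholder.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def link_hosts(body):
    """Return the (lowercased) host names of every http(s) link in the body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname  # .hostname is already lowercase
        if host:
            hosts.add(host)
    return hosts


def is_blocked(host, blocked_domains):
    """Match the blocked domain itself or any subdomain of it."""
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def hits_blocklist(body, blocked_domains):
    return any(is_blocked(h, blocked_domains) for h in link_hosts(body))


blocked = {"pills.example"}  # placeholder domain
print(hits_blocklist('<a href="http://www.pills.example/buy">x</a>', blocked))
```

Matching on the domain suffix rather than the full host means a spammer rotating subdomains under one registered domain is still caught, while a look-alike such as notpills.example is not.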
Ben
Work Safe Porn
The article doesn't do a good enough job of explaining the different techniques in use.
First, hash busters. Yes, spammers are loading a random jumble of meaningful words in meaningless sequences into their spam, usually in the plaintext message body of a message with HTML content (i.e., you get hash buster - html message with spam content - hash buster). So HTML-aware clients (the main clients targeted I'm sure are AOL and Outlook Express) show the spam message, but not the hash buster. I'm guessing that this is specifically targeting bayesian filtering tools at AOL (anyone know if AOL is using a bayesian filter?); it works by introducing words that would not be found in a spam corpus in greater numbers than those that would.
Second, noisy spelling, like v1@gr@. Obviously this is also intended to defeat regex-based filters like SpamAssassin. If you vary your cliches enough, and you introduce very strange, but easy-for-a-human-reader-to-recognize spelling variants, you make it much more difficult for filter writers to write effective regexes.
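For the first technique, one possible check is to compare the text/plain part with the visible text of the text/html part: in a legitimate multipart/alternative message they say the same thing, while a hash buster shares almost no words with the ad. A minimal sketch using only the Python standard library; the Jaccard overlap measure, and any threshold you apply to it, are assumptions rather than something from the article.

```python
import re
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect the text a reader would see, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def _decoded(part):
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "utf-8", errors="replace")


def plain_html_overlap(raw_message):
    """Jaccard overlap between the words of the text/plain part and the
    visible text of the text/html part; None unless both parts exist."""
    msg = message_from_string(raw_message)
    plain = html = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = _decoded(part)
        elif ctype == "text/html" and html is None:
            html = _decoded(part)
    if plain is None or html is None:
        return None
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    a, b = _words(plain), _words(" ".join(parser.chunks))
    if not (a | b):
        return 1.0  # both parts empty: nothing disagrees
    return len(a & b) / len(a | b)


def build_alternative(plain, html):
    """Build a test multipart/alternative message (helper for experimenting)."""
    msg = MIMEMultipart("alternative")
    msg.attach(MIMEText(plain, "plain"))
    msg.attach(MIMEText(html, "html"))
    return msg.as_string()
```

A message whose two parts share almost no vocabulary is a candidate for extra scrutiny, not proof of spam; some legitimate senders do ship a lazy "view this in HTML" plain-text part.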
The real problem will be when the spammers finally figure out how to deliberately poison the Bayesian filters. So far they're using more-or-less random words, but that won't really work against Bayesian; it can tolerate that.
However, what constitutes "non-spam" is not as unique as most people think, as I've examined here. If they figure out how to deliberately put in hammy words, Bayesian will fall.
I feel OK posting this because I freely admit that up to this point I've overestimated them; I'm sure spammers have read that piece, and to date they have been too stupid to figure out what I said in plain English. But sooner or later one of them is going to figure it out.
There's a strong core of "ham" that is "ham" for everybody, and sooner or later they're going to start abusing that.
And if I may forestall one objection... "But you don't understand Bayesian, it's [awesome for some reason and can't be beat ever, by anybody]" - I'll listen when you've actually written a program to examine filters yourself, OK? I understand it pretty damn well. It'll take more than bald assertions to convince me I'm wrong; I've done actual research, in the original sense of the word.
I thought about this after seeing the spam in my inbox increase to about 80 a day (the folder that holds what gets filtered usually receives 10 per hour; my address has been valid for just short of 10 years).
Why not check the subject or first few lines of plain (not html) text and see if 80% of it is in /usr/share/dict/words? I thought about trying this out, but have been too busy to get off my ass and do it.
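Why deliberately hammy words would be dangerous can be seen from the combining rule in Paul Graham's "A Plan for Spam", worked through below. The token probabilities are made-up illustrative values, and real filters add refinements not shown here:

```latex
P = \frac{\prod_{i} p_i}{\prod_{i} p_i + \prod_{i} (1 - p_i)}

% two spammy tokens, p = 0.99 each
P_{\text{spam only}} = \frac{0.99^2}{0.99^2 + 0.01^2} = \frac{0.9801}{0.9802} \approx 0.9999

% add two hammy tokens, p = 0.01 each
P_{\text{with ham}} = \frac{0.99^2 \cdot 0.01^2}{0.99^2 \cdot 0.01^2 + 0.01^2 \cdot 0.99^2} = \frac{1}{2}
```

Two strongly hammy words cancel two strongly spammy ones exactly, so a spammer who knew which words are ham for nearly everyone could, in this simplified model, neutralize an equal number of spam tokens.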
I saw one just yesterday that contained a list of important key sentences and phrases from the literature of common charities and political activism organizations.
In other words, if your Bayesian filter accepts those, based on your past decisions, it will detect the spam. If you reject the spam, you reject these communications as well.
Good filtering practice would dictate that one reads the junk box carefully enough to find both false positives and negatives. But the sheer bulk of mail that ends up in the junk box makes this unfeasible for many.
I have started letting these particular kinds of spam through, manually categorizing them (many words of random strings, dictionary vocabulary attack, positive phrase attack) in the hopes that filtering technology will soon advance to the point where these can be used as inputs to a more intelligent system.
Of course overhauling the mail system is a prerequisite to solving any of this long-term. For once I don't mind D. J. Bernstein's Internet Mail 2000 proposals. Of course there are other proposed systems, none of which has enough momentum to start a slow steady change. The end result of any non-consensus system will be to fragment the worldwide network of Email into competing, noncompatible systems that need to communicate through some kind of loophole or gateway. Back to FIDO-net days.
Why not simply filter out leet speak, or any message with more than half of the words misspelled that isn't encrypted?
You can't judge a book by the way it wears its hair.
It is not very often that people send random gibberish in e-mail. Why not look for the gibberish? Hell, even MS Word can detect gibberish; I think a spam filter could score a message on non-linguistic gibberish.
You put Viagra in there in unaltered plain text.
paintball
... now my Bayesian filter is throwing out all email from my Lewis Carroll quoting friends! Thanks a lot, spammers!
"Freedom means freedom for everybody" -- Dick Cheney
Agreeing with this article: over the past week or two I have seen an excessive amount of spam slip past SpamBayes. Even after I mark them as spam to improve the filter, they continue to hit the inbox, whereas previously absolutely no spam made it to my inbox. Additionally, there may have been only 2 or 3 emails marked as possible spam when they were not, and zero items marked as definite spam that were not.
SpamBayes has worked great previously, but now even it is falling short.
I feel that as the spammers manipulate the contents/context of the spam, it will eventually become impossible to tell the difference without physically looking at 500+ emails daily.
My primary use of email is business and not personal, therefore I cannot risk missing a client email, payment, question, etc... I've also seen a progression of clients having MY emails deleted or caught in spam filters due to the business aspect and requests for payments. I feel this is primarily due to the too-often-common phrases that a spam email and a business email both contain, such as "Click here to submit payment", "Buy these Products", "Overdue", etc... This happens even though the only clients I email are ones that contacted me first. I never cold-email anyone.
More spammers are using this random text as the only text in the subject and body, and using an image as the content of their email, which makes scanning even more complicated, if not impossible.
Having been on the net since before it became what it is today (going on 20 years), I often wonder how much control spam actually has over the net in several respects:
- If spam were to disappear, would overhead costs decrease enough for ISPs to pass along significant savings to the consumer?
- If spam were to disappear completely, how much faster would the Internet be?
Has anyone ever done a study to determine how much spam degrades the net, and what it would be like if all spam were gone tomorrow?
Never try to beat a professional at his own game!
Needless to say I was mildly amused. P Hilt0n Vid
Visit site (topright lin!
EExceppt for specific coompaatiibilittyy mmodes (chhainn-loading and the Linuxx piggybbaack foormat), all kkerrnels willll be staartted in mmuchh tthe samee statte as inn the MMultibooot Specciifficattion.. Onlly kerrnels loaded at 11 meggaabbyyte or aabove are ppresentlyy supported. Anny attemppt tto load beeloww thaat bounddaryy will simmplly result in immeediaate failuree andd aan erroor messagge reportinng the problemm. .
Insert four or five lines of valid extra text -- lines from books, selections from recent USENET postings, etc, etc -- into the spam. Make the selection semi-random. Now do it 100 times and send 100 copies to each person on the mailing list.
One of them will get through. And the spammers will continue to work.
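The 100-copies argument is simple probability. A sketch, assuming each copy independently passes the filter with the same probability; real copies share the same tricks, so their outcomes are probably correlated and the true figure is likely lower than this.

```python
def chance_any_gets_through(pass_rate, copies):
    """P(at least one of `copies` variants passes), assuming independence."""
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be a probability")
    return 1.0 - (1.0 - pass_rate) ** copies


for rate in (0.001, 0.01, 0.05):
    print(rate, round(chance_any_gets_through(rate, 100), 3))
```

Under that assumption, a 1% per-copy pass rate gives roughly a 63% chance that at least one of 100 copies arrives, at the cost of sending 100 times the mail.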
My friends have been accusing me of emailing them randomly generated streams of dictionary words for years...
org.slashdot.post.SignatureNotFoundException: ewg
AFAIK, Bayesian filters are not used much (if at all) on mail servers.
Our CanIt-PRO product does server-side Bayesian filtering, and different users can have their own personal Bayes corpus.
I use a yahoo email address for newsletters, registration, etc. I got maybe 5 of the nonsense word spams a couple weeks ago, marked them as spam, and every one of them's gone into my bulk folder since then.
Of course, Yahoo's false positive rate on newsletters is atrocious, but it's easy enough to pick those out and then empty the bulk folder.
Just curious, anybody know what Yahoo's using for spam filtration?
-----
Point and Counterpoint: The Tick - "Spoon!" Neo - "There is no spoon."
I've actively been using the bayesian filter that Mozilla comes with for a bit over a year now. Although it seemed to take forever to get 'trained' to what I consider spam, I've found that it works exceptionally well, maybe mismarking a legitimate email to me probably less than a dozen times so far (after the initial round of training).
Maybe six months ago, I noticed I was receiving quite a lot of these hash busting spams and I was bummed that maybe the bayesian filter wasn't the be all end all of spam filters.
But I pressed on using it, and in time, almost all of the hash busting emails are again getting filtered as spam.
I'd guess there are only so many different ways people can write Vi@gra and still have it be readable...
SuperTux
1. Wow? Spammers subvert content-based filters? Say it isn't so???? Get real!
Client-side filtering is a band-aid on a malignant tumor growing out of control. It will NEVER work, EVER. It requires constant updating and monitoring to avoid blocking legitimate e-mail and is a black hole of resources, time and money. Because of the ROI, spammers have more incentive to crack the filter than filter companies do to block the spammer.
If you're using client-side (or even server-side), content-based spam filtering, you're only hurting yourself. It's better to get a few spam messages than miss a critical communique, which can cost you a lot more. But feel free to piss in the wind - it seems to be in style anyway.
RBLs, and specifically Spamcop's Relay Blacklist are much more effective than content-based filtering.
2. Spammers break into systems, STEAL bandwidth and network resources. Almost all of them break various laws in virtually every region they operate.
3. The authorities are too busy detaining little old ladies at airports for possessing a fingernail clipper, suing 13-year-olds downloading Bobby McFerrin, and raiding Tommy Chong's house to care.
4. Spam will disappear when the major network providers endorse a centralized SMTP whitelist. The reason why nobody talks about it, is that it's a cure for the spamedemic and there are a lot of companies out there, including all the ISPs that profit from spam.
http://www.wired.com/news/technology/0,1282,61742,00.html?tw=wn_story_related
By the looks of things, they are going so far as to identify the most active spammers, and hunt them down.
Score.
Frink: Nice try floyd, but you were designed for scrubbing, and scrubbing is what you shall do.
a while ago I got a spam that contained a few excerpts from The Raven by Edgar Allan Poe. I got a laugh out of that one.
...never more ;- )
You can't take the sky from me...
I use random words for subjects quite often. I consider it a form of poetry when I'm writing to friends... not completely random, but thought-provoking in a semi-sensical way. Guess I'll be filtered out soon...
I recently re-evaluated my antispam blocks. Over the xmas holiday there was a very noticeable increase in the amount of crap slipping through my defenses.
I ended up tweaking a few SpamAssassin rules to deal with what is popular (with the spammers) at the moment. This will need to be adjusted manually as the spammers change tactics. SpamAssassin is scanning everything after the DATA ACL while still connected so it can deny the message and not bounce it to some poor schmuck being joe-jobbed.
I also made a few changes to the blacklists. sbl-sbl.spamhaus.org is now my all time favorite blacklist. I also block all of China and Korea. I don't know anyone in those countries and they constitute a large percentage of all spam. Yes, I know the U.S. is the biggest source of spam, but I can't exactly blacklist my own country and expect to get email from friends and associates.
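Blacklist checks like the Spamhaus one mentioned above work by DNS lookup: the connecting IPv4 address is written with its octets reversed, the list's zone is appended, and an answer means the address is listed. A minimal sketch; the zone name is a placeholder, the lookup needs network access, and treating any answer as "listed" is a simplification (lists typically answer with 127.0.0.x codes that carry meaning).

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name for a DNSBL lookup: reversed IPv4 octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """True if the zone has an A record for the IP (requires network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:  # NXDOMAIN and similar: not listed
        return False


print(dnsbl_query_name("192.0.2.1", "dnsbl.example"))
```

In a mail server this check runs while the sender is still connected, so a listed host can be refused outright instead of having its mail accepted and bounced later.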
Another major change was to stop accepting email from dynamic ip addresses. This forced me to add a condition to allow one friend's server to send to me. I've since removed that exception as he's finally listened to reason and is routing his stuff through another mail server he administrates that is on a static ip address.
In the week since I tweaked my settings, a total of 3 or 4 spams have made it through. Zero would be nice, but that isn't attainable without a serious risk of false positives, and probably not even then.
Finally, there's my personal blacklist. People or companies who annoy me too much end up in there. One was for an online magazine my wife signed up for that doesn't seem to have a way of unsubscribing. After several futile attempts by my wife to get them to stop sending their stuff, I stuck them into the blacklist.
On a few occasions I've blocked IP addresses at the firewall: spammers using software that doesn't recognize the "bugger off" error code, NameProtect.com just because I don't want them snooping around on my system, and the occasional script kiddie.
I guess you can say it's a game. I keep score by comparing the number of "rejects" in my logs to the amount of spam getting through. I'm winning by a long shot (250 to 4 according to my current log).
-- Will program for bandwidth
What I don't understand about this type of spam is that often it doesn't contain any actual advertisement, just three or four lines of random words, and then the email ends.
I don't get it. If you're not selling a product, what is the spam for?
Mind you since TMDA, I haven't been seeing any spam anyway.
Karma: It's all a bunch of tree-huggin' hippy crap!
It's old fashioned, and some of you will probably make fun of me for using it, but hey, I'm old school. FYI, here's my method:
;)
1. Create manual spam filters (NOT Bayesian filters) in your inbox called "Friends and Family", "Work", "Services", "logfiles", and any others you find you need. Each category applies to a broad type of email address you'll receive email from. Then create a subdirectory in your inbox for each of these filters (named the same way, naturally).
2. For each filter, build a list of people who are allowed to email you. For example, your ISP, your bank, and your phone company would probably be added to services. Just add the email address they send their messages from to the list.
3. For each filter, have the filter move messages matching the filter (From equals ) to the correct subdirectory for the filter. Then stop processing for that message, so it doesn't get interpreted by other filters. Think of this as an analogy for ipfilter or ipfw in your firewall setup -- only you're filtering emails instead of packets.
4. Finally, DELETE EVERYTHING ELSE in the very last filter.
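The four steps above amount to a first-match rule list with a catch-all at the end, much like a firewall ruleset. A minimal sketch; the folder names come from the steps, while the addresses are placeholders and exact-address matching is an assumption (real mail clients offer richer conditions).

```python
from dataclasses import dataclass
from email.utils import parseaddr


@dataclass(frozen=True)
class Rule:
    folder: str
    senders: frozenset  # lowercase addresses allowed into this folder


# Placeholder addresses for illustration only.
RULES = [
    Rule("Friends and Family", frozenset({"mom@example.org"})),
    Rule("Work", frozenset({"boss@example.com"})),
    Rule("Services", frozenset({"statements@bank.example"})),
]


def route(from_header, rules=RULES, fallback="Deleted Items"):
    """Return the destination folder; the first matching rule wins."""
    _, address = parseaddr(from_header)
    address = address.lower()
    for rule in rules:
        if address in rule.senders:
            return rule.folder  # stop processing: later rules never see it
    return fallback  # the last rule: everything else is deleted


print(route("Mom <Mom@Example.org>"))
print(route("Lucky Winner <win@junk.example>"))
```

Because unmatched mail lands in the deleted-items folder rather than vanishing, the quick scan described next is still possible before emptying it.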
You USE this approach by doing a quick scan of the deleted items folder to see if anything is interesting. If not, just clean out those deleted items. It's a one step operation, much easier than selectively deleting a hundred emails one at a time.
Then, you scan each of the folders you set up, IF the folder has picked up an email, focusing only on your REAL email.
This approach has saved me a HUGE amount of work lately. My life is a whole lot easier, and it's way easier than trying to train a Bayesian filter. If I don't know you, you can't get too much of my attention.
It's all about being on the list, sort of like getting into a nightclub...
Farewell! It's been a fine buncha years!
Since the weights are fixed, it would be trivial to include random hammy words to get past it. It's a good example of the failure of security by obscurity - it wasn't difficult to reverse engineer the word list, and once the secret's out, it's easily exploited.
I don't think the random words are likely to work on real, adaptive spam filters, though. At least not on the one I use.
Litigious bastards
Just block the domain name/ip of the hosted images. Most spams I get come from random IPs but usually have common IP/domain name for the hosted images e.g.
hostz300001.com/ads/viagra.jpg
Or whatever. I've cut down from 50 spams to about 3 or so a day by doing that.
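Blocking on the host that serves the spam's images works much like link filtering, except the hosts come from img tags. A minimal sketch with the standard-library HTML parser; the blocked host is the example given in the comment and is purely illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageHostCollector(HTMLParser):
    """Collect the host names that <img src=...> tags load from."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        host = urlsplit(src).hostname if src else None
        if host:  # relative or malformed sources have no host
            self.hosts.add(host)


def image_hosts(html):
    collector = ImageHostCollector()
    collector.feed(html)
    collector.close()
    return collector.hosts


def uses_blocked_image_host(html, blocked_hosts):
    return bool(image_hosts(html) & set(blocked_hosts))


page = '<p>Hi</p><img src="http://hostz300001.com/ads/viagra.jpg"><img src="/local.png">'
print(image_hosts(page))
```

This works for the reason the poster gives: the sending IPs rotate, but the image host has to stay put long enough for recipients to load the pictures.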
I bet a bayesian filter would work nicer but unfortunately I'm too lazy to mod the mail setup [that isn't mine] to get one installed..
Tom
Someday, I'll have a real sig.
I got one of these yesterday that was supposed to be from Citibank. Practically every other word was garbled. Given my experiences with Citibank's incompetence, it could have been from them. However, asking for my cash card information and PIN was a bit much, even for them. Also, it was in my spamtrap Yahoo address. Yahoo has the worst spam filtering of all my email routings.
Actually, since I did have a Citibank account a while ago, and since some of the details of the spam did match Citibank business procedures, I wonder whether their account information may have been compromised. Hopefully it was just a random phishing expedition, though I'm certain the ethical and legal aspects would not worry the spammers.
(On the Citibank topic, it REALLY did take me about 6 months of hard effort to get all my money out of Citibank and get all of my accounts closed. An amazing experience filled with quotes like "but we can't give you cash unless you pay extra" and "we don't offer that service today". AFAIK, they only overcharged me one time on one of the Euro transactions...)
Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
...but if I told you, I'd have to kill you.
Seriously, there's a bit of an arms race going on, and filtering is one place where the open source approach often serves the enemy. Once they see their weakness, they find another way around. The best spam filters these days are of the every-man-for-himself variety.
http://www.spamulator.com
That's nothing... I use a filter of MY design that not only kicks spam, but hog ties it and notifies the appropriate authorities, sends a virus to the spammer's computer that causes it to spontaneously combust, and composes a politely worded letter to the spammer's mother informing her of her child's inappropriate behaviour...
I think people are using "random" to describe these attacks, when they're in reality not at all random.
There is one kind I call a "vocabulary" attack which uses words selected pseudo-randomly from a dictionary.
There's another one I just call "misrepresentation". It includes key phrases and sentences from a specific type of literature, say political activism or charities.
There are indeed attacks I classify as "random" that just spew forth strings of random characters.
The danger of any of these is that eventually the pools of spam and non-spam weights will get confused. In theory one can go back to the junk box and correct false positives, which would force the filter to start disregarding anything common to the false positive and desired emails. However, I can't be alone in that my junk box gets so massive that I just don't have time to do this regularly, and I can't go through every last message to separate real spam from annoying but requested commercial mail that I might want to hear from again.
Fragmentation of Email into useless subsets of the whole network is where this is going. Like another poster, I have had to resort to using a whitelist for lots of my work. But I do have need to field unsolicited mail from people I haven't met, so that isn't a real solution. The only real solution, I fear, is to remove the part of the brain that makes humans selfish even to the point of destroying the systems that give them a free ride in the short term.
I'm glad that the spammers are fighting back against the filters. Because then the filters will become better. And the spammers will become smarter, and the filters will become even more sophisticated. And so on and so on.
Eventually, we'll end up with filters so sophisticated that they'll become true AI! Finally, HAL will become possible, all thanks to your friendly neighborhood spammer! Thanks, spammer, you're a dear, dear friend.
- fader
I've also had some Alice, but today I learned about North American beavers. I had no idea they were so large.
That's exactly why you need to ENL4R9E `/U0R P3N1S!!!1!1 because North American women have 1arqer beavers and thus require a bigegr PE/\/i5 to st!mu1ate them.
by the number of hits to the site linked in my sig.
Ben
Work Safe Porn
I keep praying for that silver bullet that will end spam forever.
The thing that seems so insane about spam is that it's gotten to the point where apparently all spammers care about is getting past your filters. They must know that you're going to delete the message the moment you physically set eyes on the word "\/1A6RA," but it's as if they don't care. They just want to induce you to look at the word, and force you to hit the Junk Mail button or Delete key. They just want to waste your time filling your Inbox with their insane crap.
It's like they're nasty little demons spitting up madness from the bowels of hell for the pleasure of their horned master. I can't picture a spammer as a human being at all... I always imagine hooves and a pointy tail, a slimy, crooked red finger pushing its sharp, black, malevolent fingernail into an eagerly pulsating "SEND" button.
Read any interviews with these people? My god, they really are monstrous. The arrogance, the pomposity, and the self-justification spewing from each of their mouths combine to form a portrait of a person so utterly bereft of morals, ethics, or humanity that I just want to clip the spammer's photo out of the magazine, scan it, and send it to X-Wipes to be made into toilet paper. I'll let you imagine the rest.
I've said it before and I'll say it again... spammers have done more than their share in turning the wonderful information highway into a sleazy backalley of filth, perversion, and fraud. Every day as I wait for my email client to download and process the two hundred or so spam messages that are clogging up my inbox, I sit in silent hope, praying that someone will find a way to end the madness at the source, and cut the spammers out of our lives forever and ever, amen.
You are in error. No-one is screaming. Thank you for your cooperation.
Mortgage Enlargement!
Reminds me of a Dr. Seus book...
Seuss, or Joyce?
The real benefit is to the spammers. They can put inline images that make the email look like it came from a legitimate company, they can have the text version look random, but the HTML rendered version human readable. Almost all spam is going to be HTML, and my experience is that 95% of HTML mail is spam.
Which means that if we filtered HTML most spam would go away overnight, and the bandwidth wasted by the remainder would be significantly reduced. We would also significantly reduce the security risks. Unfortunately the lusers that use services such as Yahoo! would also be filtered. I wonder if the decision to default to HTML is purely to satisfy the general customer, or a feature targeted directly to facilitate advertising.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
I've been regularly receiving spam containing words seemingly designed to taunt Echelon.
Here is an example:
metcalf executor cancerous guatemala emblematic parliament colonel saratoga auric lazybones astonish cabinetmake diatribe middleweight remorseful anharmonic
aztec codomain kulak grownup jumble silk buffalo kill ignition cubbyhole circus colonist calamitous creamy customary polarogram harvest equipping grandnephew andrea sachem inquisitor flout cowan fleet juridic sherbet collage apathetic proud familism histidine pomona arcadia galveston guillemot
fishmonger agrimony anabel persimmon aileron fitzroy epimorphism hale proper corpse paula convivial bakhtiari flounder renovate bleeker bump edgy ensemble police geoduck merchandise ellison hospice propel resolve citric floorboard
brouhaha hitchcock ilona midas captor evict indestructible adventure confront despoil barony executor periscope client shove madman horde merrill radiochemical generous
impassable khaki globe compendia copyright brooklyn pleiades charles painful airfield econometric church bacterium sainthood chard hazard inbred debtor rankine dadaism executor alistair apocryphal bergman bootstrapped grub
inadequacy homework caine audubon contemplate dorset eleazar corny raritan ozark insecticide leo monomer hearst catenate bloodshed enrico abash expurgate elicit cambric lise gadfly scruple adore guano drunk cessation conscience grantee bedbug burt
hessian dyeing equilibria everlasting cork crud camellia forklift breathe ingenious catchup bless aluminate fluoride hypoactive diagonal cosponsor dadaism bernadine chide edematous phil occasion antennae l insurance
adsorptive armada passionate phosphide cabdriver cordage congresswoman arden crocus cookery
gnomonic creamy pediatrician inert senior retardation cosmopolitan input bound necrotic flipflop du annex albacore linseed alphanumeric mollycoddle kennan adrenal sheffield giuseppe budweiser
huff partner descriptive riggs cezanne dogwood councilwoman had amend holystone arsenic activism carbonic conflagration inferno madcap infertile glissade deneb malnourished chapter corpus pasadena ingersoll gauche mozart antecedent persevere keypunch negligible galvanism prometheus realty broadside detail articulatory gloomy forensic dilemma
Weird. I'm talking about this at the MIT Spam Conference on Friday, including a technique that can break a Bayesian spam filter.
John.
Oh freddled gruntbuggly,
Thy micturations are to me
As plurdled gabbleblotchits
On a lurgid bee.
Groop, I implore thee, my foonting turlingdromes
And hooptiously drangle me
with crinkly binglewurdles,
Otherwise I will rend thee in the gobberwarts with my blurglecruncheon
See if I don't.
Create a spam filter that uses a dictionary that words can be added to. If a message has a certain percentage of unknown words in it, consider it possible spam. Try to include every possible word in the language used for the dictionary.
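The unknown-word idea comes down to one ratio. A minimal sketch, assuming a plain word list with one word per line (the default path below exists on many, but not all, Unix-like systems) and an arbitrary 40% threshold you would tune on your own mail:

```python
import re

WORD = re.compile(r"[a-z]+")


def unknown_ratio(text: str, dictionary: set) -> float:
    """Fraction of alphabetic tokens in `text` that are not in `dictionary`."""
    words = WORD.findall(text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def possibly_spam(text: str, dictionary: set, threshold: float = 0.4) -> bool:
    """Flag the message when the unknown-word fraction exceeds `threshold` (a guess)."""
    return unknown_ratio(text, dictionary) > threshold


def load_dictionary(path: str = "/usr/share/dict/words") -> set:
    """Load one word per line; the default path is an assumption about your system."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}
```

Note that this only catches character-level gibberish; a "vocabulary" attack built from real dictionary words, like the word lists quoted in this thread, passes straight through.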
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
MailScanner is a spam filter.
It's free, easy to set up, and works really well.
The maintainer is really helpful too. MailScanner website
I've wondered why Bayesian filtering doesn't also include word pairs as input. Doing so would make it easier to distinguish gibberish from actual language, since using pairs (or even triads, if absolutely necessary) preserves some of the word-order statistics for the Bayesian filters to key off of. Also, lots of spam now separates letters with spaces or punctuation to fool filters that key off whole words. Using word pairs would identify these types of spam easily, since the bulk of legitimate mail won't have word pairs like "v-i" "i-a" "a-g" "g-r" and "r-a".
Another input I wish Mozilla (or other Bayesian filtering systems) would include is a dictionary lookup on words, feeding the resulting statistics into the message score. For instance, a message where more than 60% of the words don't match my English dictionary (and only 40% do) is most likely spam in my mailbox. This additional stat would give those filters more power.
So I wonder: would adding these things to existing Bayesian filtering systems solve this issue to some degree? My gut instinct is that it would.
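Word pairs are cheap to generate alongside single words. A minimal sketch of such a tokenizer, assuming the filter simply treats each returned string as one more feature; the token pattern is an illustrative choice, not what Mozilla or any other filter actually uses:

```python
import re

TOKEN = re.compile(r"[a-z0-9']+")


def tokens_with_pairs(text: str) -> list:
    """Single words plus adjacent word pairs, for use as Bayesian filter features."""
    words = TOKEN.findall(text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs
```

Fed "Buy v-i-a-g-r-a now", this yields the pairs "v i", "i a", "a g", "g r" and "r a" among its features, which legitimate mail rarely contains.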
I've found that by filtering on "http://" I can kill basically ALL spam, since it's always links to some site or other.
Of course, this isn't so great for getting links from friends; for that I have a whitelist.
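The link filter plus whitelist amounts to two checks. A minimal sketch, assuming the sender address and decoded body are already available; the whitelist entry is a hypothetical placeholder:

```python
# Hypothetical whitelist of friends whose links should still get through.
WHITELIST = {"friend@example.com"}


def keep_message(sender: str, body: str, whitelist=WHITELIST) -> bool:
    """Keep whitelisted senders; otherwise reject any body containing "http://"."""
    if sender.strip().lower() in whitelist:
        return True
    return "http://" not in body.lower()
```

As written, this checks only "http://", exactly as described above; "https://" links would slip past unless you add that string to the test.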
I guess obfuscating their own message so much to foil spam filters has caught up with them, as their message is lost in their methods.
A feeling of having made the same mistake before: Deja Foobar
Then again, we wouldn't be able to communicate with other Slashdot users.
You say that like it's a bad thing...
I have had my main e-mail published and unchanged since 1995. It's probably on 99% of all spam mailing lists. One of my servers handles about 600 POP3 accounts. My stats currently indicate that now more than 80% of our SMTP traffic is confirmed spam.
I don't believe in content-based filtering. We have a strict policy of not examining in any way, shape, or form, the content of any e-mail on our network.
We deal with spam by implementing an array of fully-tested, fairly conservative relay blacklists which block the inbound SMTP connection before the junk mail is even transmitted.
In more than two years of operation, we've only confirmed about six legitimate e-mails that were blocked, and we handle tremendous mail volume. It's an easy matter to "whitelist" anyone who might end up getting RBL'd to make sure the client can communicate with who they want. In EVERY case where a legitimate source was blacklisted, it was shown their ISP was irresponsible and the listing was valid.
In addition to using RBLs, we also have an array of hard-coded IP blocks that our server will not accept mail from. This covers a good bit of the rogue Asia-Pacific ISPs that are the largest source of open relays. Something as simple as blocking major portions of 61.* has been shown to reduce spam by 30+%. Anyone legitimately in China who needs to communicate with our network can be quickly whitelisted. Ironically, most of the ISPs' own SMTP relays are not in the same IP ranges as their broadband customers; they obviously know how effective this technique is.
With RBLs and hard-coded IP blocks in effect, instead of 200 spams a day I might get 3-5. As soon as I get new spam, I report it to Spamcop, and I notice an immediate reduction in future spam of that nature.
We're now getting near the point of blacklisting the entire 24.* IP block as well - which encompasses, among other things, a large portion of Comcast IP blocks that Comcast can't or won't control.
I'd like to see more ISPs simply refuse to accept mail from rogue networks. Then these networks would have to be more responsible.
Let me preface all this by saying our policy is to whitelist anyone who complains they have legitimate mail being blocked. For some strange reason, we don't hear any spammers making these requests. That's a shame because I'd be happy to visit them personally to make sure their situation is resolved in a mutually-deserving manner.
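RBL (DNS blacklist) lookups work by reversing the octets of the connecting IPv4 address, appending the blacklist's DNS zone, and checking whether that name resolves; listed addresses usually resolve to something in 127.0.0.0/8. The sketch below combines that lookup with hard-coded network blocks like the 61.* example, checked at connect time before any message content arrives. It's a minimal illustration under stated assumptions: the zone name is a placeholder, `BLOCKED_NETS` and `WHITELIST` are hypothetical, and a production server would do this in its MTA configuration rather than in a script.

```python
import ipaddress
import socket

# Hypothetical local policy: "61.0.0.0/8" mirrors the 61.* example above.
BLOCKED_NETS = [ipaddress.ip_network("61.0.0.0/8")]
WHITELIST = set()  # IPs of legitimate senders you have chosen to let through


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: 192.0.2.1 in zone Z becomes 1.2.0.192.Z."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def listed_in_dnsbl(ip: str, zone: str) -> bool:
    """Listed if the name resolves; any lookup failure counts as not listed (fails open)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False


def reject_connection(ip: str, zones=()) -> bool:
    """Decide at SMTP connect time, before any message content is transmitted."""
    if ip in WHITELIST:
        return False
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETS):
        return True
    return any(listed_in_dnsbl(ip, zone) for zone in zones)
```

Checking the whitelist first mirrors the policy described above: a legitimate sender who complains can be let through without touching the blacklists themselves.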
includes sourcecode
Mercury Mail's session logs use the closed connection to mark where each e-mail begins and ends, but if you're using something else, there's a RinetD mod (with source) that logs e-mails in a way that makes ripping through them easy.
My filter is all of 23KB and I get virtually no spam. I update it every once in a while when a spam gets through.
I also have a couple sub-domains that point to a spamcan on my home connection which I use to bait spammers so I can preemptively filter them out without paying for the bandwidth.
Ben
Work Safe Porn
First, a number of large sites, such as AOL and MSN, are now using Bayesian filters. More will follow soon.
But will gibberish, or even something like Alice in Wonderland really make a difference? No.
The term for that "stuff" is noise.
We have years of research on noise:signal problems. There are plenty of ways to find the noise in a signal and then apply the filter to that. A lot of that noise is already filtered out when one applies HTML filters: dehtmlizing (HTML -> text) often does the job of reconstructing the message. Gibberish characters add nothing to the spam score, and anything else can be addressed as above.
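Dehtmlizing can be sketched with Python's standard `html.parser`: keep only rendered text and drop comments, so padding like `V<!-- cobalt liqueur -->IAGRA` collapses back into one word. A minimal sketch, assuming the set of tags treated as word breaks (`BLOCK_TAGS`) is a rough choice of mine rather than a standard list:

```python
from html.parser import HTMLParser

# Tags treated as word breaks; this list is an assumption, extend as needed.
BLOCK_TAGS = {"p", "br", "div", "td", "tr", "li", "table", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collect rendered text; comments and <script>/<style> bodies are dropped."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())  # collapse whitespace
```

Unknown or bogus tags (such as `<steamboat>`) are simply ignored, so text split by them is rejoined as well.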
Even with the gibberish words though, an old version of Bogofilter's still giving me very good spam filtering. I get some 10-20 spam a day, and I see one in my inbox every 2-3 days. I see a false positive in my spam folder maybe once every two or three months.
It doesn't seem to be effective at much. I am not really worried about it breaking our spam filters. Not yet.
- Serge
Here's what I've been thinking about lately: Do spammers actually make any money from spamming? Seriously -- I'm starting to wonder if there's something different at work here.
Because e-mail is so cheap that it costs practically nothing to send a million spam e-mails, are spammers spewing their crap (and ignoring the near zero response rate) in hope that some day the money will start rolling in?
Think about it -- every week millions of people plunk down a few dollars for lottery tickets. And even though they never win anything, they keep buying, week after week, month after month, year after year. Why? Because it's such a small amount of money that they figure it's a small price to pay for the chance to win millions.
I'm beginning to think this is the same mentality that is driving spammers.
I think that anyone who might be willing to buy something from spam would be turned off once they saw a bunch of misspelt words. I know that if I were about to buy something off a website and saw a ton of misspelt words, it would set off red flags in my head.
Here's something weird I tried (yeah, I'll admit it... I was drunk). Gibberish is high in entropy and hence doesn't compress well.
So, you can strip out things like headers, whitespace, HTML, convert everything to lowercase, and run it all through gzip. Then take a look at the percentage the message was reduced by.
What I found was that (not surprisingly) legitimate mail with normal words in it shrank by a significantly greater percentage than spam with lots of gibberish in it.
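The compression test is easy to reproduce with the standard `gzip` module. A minimal sketch, assuming headers have already been removed (pass only the body) and using a crude regex in place of real HTML stripping. Keep in mind that gzip adds about 18 bytes of fixed header and trailer, so very short bodies can even show a negative reduction; any spam threshold would have to be tuned on your own mail.

```python
import gzip
import re


def compression_reduction(body: str) -> float:
    """Percent by which gzip shrinks a normalized message body (higher = more redundant)."""
    text = re.sub(r"<[^>]*>", "", body)   # crude HTML tag removal
    text = "".join(text.lower().split())  # lowercase and drop all whitespace
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    packed = gzip.compress(raw)
    return 100.0 * (1 - len(packed) / len(raw))
```

Ordinary prose repeats letter sequences and words, which deflate exploits; random letter strings give it little to work with.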
I get a lot of spams that contain three random words in the subject. Currently, I collect the subject lines in a text file and arrange them into poetry. A few sample verses:
i'll take this
open window into
imflammatory tales about
pieces of herring
shooting caused panic
that surely only
constituted a prelude
or else maybe
had ever happened
It ain't working... mainly because it has absolutely nothing to base the words on. I'm getting a reasonable amount of false negatives where the bayes score is 40-60% sure it's spam. I'm thinking of upping the SpamAssassin score for that, but it's kind of not a good solution.
I know people are working on various rules to check number of consonants and average length of garbage words... interesting chase.
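The consonant and word-length checks mentioned above could be computed like this. It's a minimal sketch of the measurements only; treating "y" as a vowel is a simplifying assumption, and the thresholds are left open. Real words such as "strengths" also have long consonant runs, so these numbers make more sense as small score contributions than as hard rules.

```python
import re

VOWELS = "aeiouy"  # treating "y" as a vowel is a simplifying assumption
CONSONANT_RUN = re.compile(f"[^{VOWELS}]+")


def garbage_word_stats(text: str) -> dict:
    """Average word length, consonant fraction, and longest consonant run in any word."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {"avg_len": 0.0, "consonant_ratio": 0.0, "max_consonant_run": 0}
    letters = "".join(words)
    consonants = sum(1 for c in letters if c not in VOWELS)
    longest = max((len(run) for w in words for run in CONSONANT_RUN.findall(w)), default=0)
    return {
        "avg_len": len(letters) / len(words),
        "consonant_ratio": consonants / len(letters),
        "max_consonant_run": longest,
    }
```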
I really wonder how effective the actual spams are though. When you see garbage in your inbox do you even bother to open it? My wife honestly thought something was corrupt and just deleted the messages. I guess I don't see the point in this type of spamming (not like I entirely get the point of any other kind)...
--D
Spam will disappear when the major network providers endorse a centralized SMTP whitelist. The reason nobody talks about it is that it's a cure for the spamedemic, and there are a lot of companies out there, including all the ISPs, that profit from spam.
And who decides who gets on the whitelist? You? The government? People with lots of cash? Microsoft? AOL? Will an ISP in an axis-of-evil country be allowed to be on the whitelist? ISPs already write pink contracts to allow spammers to use their bandwidth, what makes you think cash won't change hands to get the spammers whitelisted?
Whitelists also assume that e-mail can't be forged... we're not there yet (not until reverse-MX and sender PKI signing come into play).
Centralized whitelists are too broad. Companies that might be on your whitelist are not necessarily those that I want on my whitelist. (In other words, I don't trust the people who administer whitelist X.)
On a limited, local scale, whitelisting works well because it's distributed and hacking one list doesn't get you very far. However, as you add more customers of the whitelist, you become a larger and more attractive target. (To hack a whitelist for 100 users is a waste of time, to hack a whitelist of 1,000,000 users is well worthwhile.)
Wolde you bothe eate your cake, and have your cake?
If you've ever had an argument with a sp@mm3r, you know how self-righteous they can be. They have a right to "freedom of speech", they are just trying to run legitimate businesses, yada yada yada. And you know what? I'm beginning to think they have a point! Think about it...
First, I demand that I retain ownership of my own inbox.
Then, I take a stand against the raping of open proxies and abuse of malware-infected zombies.
N0\/\/, I ha<!-- cobalt liqueur -->ve the g@<!-- vixen nuclear -->11 t0 s.a.y..t.h.a.t U51NG R/@/N/D/()/M g1bber<steamboat>ish +0 @v0iD f^i.l*t,e.r\s i|s w.r.0.n.g.
Mary had a little lamb;
Its fleece was white as snow.
And everywhere its address went,
The spam was sure to flow.
My, my. What won't I do to destroy healthy, legitimate, all-American Internet commerce?
Please Help a Schizoid Genius!
It seems like it would generate a lot of false positives. If you train on computer lingo and someone writes you some poetry, won't that get booted? Perhaps it's not as good for those with eclectic activities. White-listing or challenge response would be the only ones I'd consider if I start getting too much spam.
-Libertarian secular transhumanist
Under IPv4, rogue relay blacklisting creates a substantially more-restrictive environment in which spammers can operate, as their available IP space continues to shrink. As more systems become more restrictive, they run out of places to hide. You can see light at the end of that tunnel. There is no light at the end of the tunnel with Bayesian or other content-based filtering.
There are likely exponentially fewer combinations of rogue source IP space than there are keywords in message content that spammers can control.
Content-filtering is a battle that loses over time; RBL blocking is a battle that wins over time. The only thing that would change that fact would be the additional IP space that IPv6 would introduce, which would be a complete nightmare.
I thought the first link was actually going to be Random Words like it said. Needless to say, I was disappointed by the appearance of some...article... I usually never read those...
Canadian Cynic, canadian politics is less boring than you
They take too long to configure. My ISP just blocks traffic coming from open relays and addresses that are Asian. Shake, stir and add that to knowspam.net and you stop getting spam, period. Best served cold...
In the past many ISPs would add filters and NOT tell the users they were doing it.
Nowadays, however, ISPs (most notably Earthlink and MSN) advertise spam blocking as a feature.
If people wanted this stuff you'd think non-filtering ISPs would advertise "You get ALL your e-mail".
But back to the original point. Spammers have used misleading topics in e-mail if only to make sure you don't delete the message. That and creating spam lists based on people who DO NOT like spam or of people who have manually opted out of spam lists.
The people who actually make money with spam don't care about selling products via spam, since they sell spam services. The people who sell stuff via spam aren't making money, because they are reaching markets that are wholly uninterested in buying stuff from them.
I don't actually exist.
Listen up. Here is how we can solve the spam problem once and for all.
Turn on finger. Yes you heard me. Let's re-implement finger. Here is how it works.
My SMTP server gets email from joeblow@123.com. I finger joeblow@123.com. If 123.com says joeblow is a real user I then accept the email, other wise I can it.
Voila! No more forged headers, no more spam.
This very simple solution would also allow legitimate businesses to send spam to the people who have opted in.
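The finger check can be sketched with a raw socket, since an RFC 1288 finger query is just a TCP connection to port 79 carrying the user name followed by CRLF. This is a minimal Python sketch under loud assumptions: it fingers the host named in the address itself (real mail domains are often served by different MX hosts), most hosts no longer run a finger service at all, and the reply is free-form text with no standard "no such user" code. Turning the reply into an accept/reject decision is therefore left to a hypothetical heuristic of your own.

```python
import socket


def finger_request(address: str):
    """Split user@host and build the RFC 1288 query line: the user name plus CRLF."""
    user, sep, host = address.rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"not a user@host address: {address!r}")
    return host, user.encode("ascii") + b"\r\n"


def finger(address: str, timeout: float = 5.0) -> str:
    """Query the finger service (TCP port 79) on the address's host; return the raw reply."""
    host, query = finger_request(address)
    with socket.create_connection((host, 79), timeout=timeout) as conn:
        conn.sendall(query)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```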
War is necrophilia.
I have decided to build my own spam filter. A slim preview mode will show keywords in context that I have deemed significant in determining if it is wanted mail or spam.
I don't expect this self-made system to be "better" than commercial filters from a technical standpoint, but it will have the advantage that spammers will not try to work around it, because only I and at most a few relatives will end up using it. Thus, it will not be a target of reverse engineering by spammers.
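A keywords-in-context preview like the one described can be sketched as a regex search that prints a window of text around each hit. A minimal sketch, assuming you supply the list of "significant" keywords yourself and a fixed character window (`width`) on each side:

```python
import re


def keywords_in_context(text: str, keywords, width: int = 30) -> list:
    """One snippet per keyword hit, with up to `width` characters of context on each side."""
    if not keywords:
        return []
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE)
    snippets = []
    for m in pattern.finditer(text):
        before = text[max(0, m.start() - width):m.start()]
        after = text[m.end():m.end() + width]
        snippets.append(f"...{before}[{m.group(0)}]{after}...")
    return snippets
```

For example, `keywords_in_context("Click here to unsubscribe now", ["unsubscribe"], width=5)` returns `["...e to [unsubscribe] now..."]`.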
Table-ized A.I.
How many ham/spam messages did you train with? (I trained on a few thousand of each... with 9000 spams and 3000 hams sitting in a folder if I need to re-train.)
I got one here today that got a score of 100% spam by SpamBayes. Wasn't even a contest for SpamBayes. The only ones slipping through my filters currently are those that are forging the FROM: address. (Not the fault of SpamBayes, it's a dumb filter that fires earlier.)
Wolde you bothe eate your cake, and have your cake?
The Spamcop service that I used to subscribe to but am now phasing out due to Ironport used to have, ages back, an option to strip out all HTML portions of an email. I loved that option and really missed it (and the attachment stripper) when it was removed.
Multipart email has some nice potential for such things as encryption and even compression, but no, it gets used to make the headings 72-point, hot pink, and in a font I don't have on my system.
Anyone know how to make MailScanner rip out the HTML portion of a multipart email so that the end result looks like it was always just plaintext? Failing that, is there any way to set Outlook's default to plaintext from a login script?
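This isn't a MailScanner setting (I can't vouch for its options). It's a generic sketch with Python's standard `email` package of what "keep only the plaintext part" means: find the first `text/plain` part of a multipart message and rebuild the message around it, keeping the other top-level headers. Note that this also discards any attachments.

```python
from email import message_from_string
from email.message import Message


def plaintext_only(raw: str) -> Message:
    """Rebuild a multipart message as just its first text/plain part, keeping other headers."""
    msg = message_from_string(raw)
    if not msg.is_multipart():
        return msg
    plain = next((p for p in msg.walk() if p.get_content_type() == "text/plain"), None)
    if plain is None:
        return msg  # HTML-only mail: nothing plaintext to fall back on
    out = Message()
    for name, value in msg.items():
        if name.lower() not in ("content-type", "content-transfer-encoding"):
            out[name] = value
    out["Content-Type"] = plain.get("Content-Type", "text/plain")
    if plain.get("Content-Transfer-Encoding"):
        out["Content-Transfer-Encoding"] = plain["Content-Transfer-Encoding"]
    out.set_payload(plain.get_payload())  # still encoded, matching the copied CTE header
    return out
```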
And who decides who gets on the whitelist? You? The government? People with lots of cash? Microsoft? AOL? Will an ISP in an axis-of-evil country be allowed to be on the whitelist? ISPs already write pink contracts to allow spammers to use their bandwidth, what makes you think cash won't change hands to get the spammers whitelisted?
I think any attempt to create a centralized regulatory agency to authorize SMTP licenses would be better than we have currently. The key to its value (and inability to be exploited) would lie in how it was administered. There will always be special interests trying to manipulate things, but if you publish a clear-cut, definitive outline of the rules for participating, it would avoid these sorts of issues.
Let's be realistic and not conspiratorial. The TLD management system works very well. A similar central registry could easily be implemented. The whitelist would be completely voluntary, but with a published list of rules to which participating systems would have to adhere. Not all forms of regulation are totally devoid of usefulness or overwhelmed with corruption.
Centralized whitelists are too broad. Companies that might be on your whitelist are not necessarily those that I want on my whitelist. (In other words, I don't trust the people who administer whitelist X.)
There could be several types of SMTP licenses. Just like there are more or less-conservative RBLs.
The rules for prohibiting unethical UCE are really not that grey. This is a technical issue that isn't all that subjective.
Cool game!
Subject:orphic repulsive exhibit gordon autoclave
Body:STILL NO LUCK ENRGAILNG IT?Our 2 pcodruts will work for you!1. #1 Spupelment aavilable!
I've actually found it easier to manually DQ the 30 or so spam messages I get a day since this nonsense started being pumped into the Subject line. But at least if I ever want to enrgael it, I'll know who to call for some spupelment pcodruts.
Any fool with fast hands can grab a tiger by the balls, but it takes a real hero to keep on squeezing.
is to prevent spam from ever being sent to your mailbox in the first place. That's how DEA (disposable email address) systems work. The email address you check (your mailbox address) and the address you give out (an alias) are two different addresses. Spammers can't spam your mailbox because the address is secret (and should also be unguessable). OTOH, if they ever get hold of one of your aliases, you can just dispose of it with minimal impact (aliases are assigned one per contact). DEA is kind of like password-protecting your email.
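The alias bookkeeping behind a DEA system can be sketched as a table from unguessable aliases to the contacts they were given to. This is a hypothetical illustration, not how e4ward or any other service is actually implemented; `secrets.token_hex(6)` gives 48 random bits per alias, which is an arbitrary choice.

```python
import secrets


class AliasBook:
    """Hypothetical disposable-address table: one unguessable alias per contact."""

    def __init__(self, domain: str):
        self.domain = domain
        self.aliases = {}  # alias -> the contact it was given to

    def new_alias(self, contact: str) -> str:
        alias = f"{secrets.token_hex(6)}@{self.domain}".lower()
        self.aliases[alias] = contact
        return alias

    def dispose(self, alias: str) -> None:
        """Kill an alias once it starts attracting spam; other contacts are unaffected."""
        self.aliases.pop(alias.lower(), None)

    def accept(self, recipient: str) -> bool:
        """Only mail to a live alias is forwarded to the secret mailbox address."""
        return recipient.lower() in self.aliases
```

When an alias leaks, `dispose` shuts off just that one contact's channel, and the table tells you which contact leaked it.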
--http://www.e4ward.com
My filter keeps stats. I'm blocking over 90 percent of spam by looking for the following in the message text:
1. "Content-Transfer-Encoding: quoted-printable"
2. ""
3. "unsubscribe"
4. "Content-Transfer-Encoding: base64"
5. "Click Here"
6. "This is a multi-part message in MIME format."
7. "font size ="
8. "cellPadding"
9. "subject=remove"
10. My own e-mail address...
False positive rate is currently much less than 1 percent (about one false positive every couple of months), largely mitigated by the fact that "approved" addresses always get through regardless of what is in their messages. Generally the only false positives I get these days are from people who send me pix via their cellphones for the first time...
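A filter of this kind is a substring scan with a whitelist in front and a counter for the stats. A minimal sketch, assuming the raw message text (headers plus body) is scanned case-insensitively; the header patterns are written with the colon they carry in real mail. Pattern 2 is omitted because its text is missing from the list, and pattern 10 is omitted because it isn't stated where the address is matched.

```python
from collections import Counter

# Patterns from the list above, minus #2 (missing text) and #10 (match location unstated).
PATTERNS = [
    "Content-Transfer-Encoding: quoted-printable",
    "unsubscribe",
    "Content-Transfer-Encoding: base64",
    "Click Here",
    "This is a multi-part message in MIME format.",
    "font size =",
    "cellPadding",
    "subject=remove",
]
HITS = Counter()  # per-pattern statistics


def first_match(sender: str, raw_message: str, approved: set):
    """Return the first spam pattern found, or None; approved senders always pass."""
    if sender.strip().lower() in approved:
        return None
    text = raw_message.lower()  # case-insensitive matching is an assumption
    for pattern in PATTERNS:
        if pattern.lower() in text:
            HITS[pattern] += 1
            return pattern
    return None
```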
Thing is, what do they hope to accomplish by doing this now?
It's starting to look like they're spamming just to spam. I don't even see ads in the spam; it's like they've sunk to the level of typical 13-year-old kids who got hold of spam software just to piss people off.
Or maybe they're hoping people will just give in and become slaves to the spammers. Either way, it's just ridiculous.
2- Most of the solutions to spam have involved ideas where senders pay, or attempts to swamp spammers with so much return junk that they get annoyed or driven out of business. Is it feasible to use an email system where the email content does not hop from one server to another? Just send the headers and where to get the content. In other words, when an email is sent, it would sit on the SMTP server provided by the sender's ISP(s), and recipients would have to go and get it (just like web pages, right?). It seems to me that would cut way down on traffic, could provide accountability, and would alleviate the ridiculous burden on the recipient's ISP to provide storage for every idiot who wants to send their trash to my e-doorstep. ISPs would be pressured to charge for holding millions of emails until they're read, and would quickly get blacklisted if they allowed spammers to operate from their servers. And since the sender's ISPs know who the senders are, it might become possible to go after the actual spammers more directly. Such a system might at least shift more of the cost toward the sender side rather than the recipient side.
It seems to me it would be much harder to poison a filter that did Bayes by splitting email into word pairs or triplets and assigning ham and spam probabilities for each. That way the bad grammar and random word lists would be extra-bad. I suspect longer sequences would become harder and harder to foil. They might require extra training of the database, but if you're getting lots of spam that isn't really a problem. Perhaps the word sequence length could be configurable.
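Here is a minimal sketch of that idea: a naive-Bayes-style scorer whose tokens are runs of n consecutive words, with n configurable. It is a simplified stand-in, not the algorithm of any particular filter. Each n-gram is counted once per message, each token's spam probability is clamped to [0.01, 0.99], and all tokens are combined by summing log-odds under the usual independence assumption, rather than using only the "most interesting" subset that real filters typically pick.

```python
import math
import re
from collections import Counter


def ngrams(text: str, n: int) -> set:
    """Distinct runs of n consecutive words (each counted once per message)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


class NgramBayes:
    """Naive-Bayes-style scorer over word n-grams; n is configurable."""

    def __init__(self, n: int = 2):
        self.n = n
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, text: str, is_spam: bool) -> None:
        grams = ngrams(text, self.n)
        if is_spam:
            self.spam.update(grams)
            self.nspam += 1
        else:
            self.ham.update(grams)
            self.nham += 1

    def token_prob(self, gram: str) -> float:
        """Spam probability for one n-gram; unseen n-grams are neutral (0.5)."""
        s = self.spam[gram] / max(self.nspam, 1)
        h = self.ham[gram] / max(self.nham, 1)
        if s + h == 0:
            return 0.5
        return min(max(s / (s + h), 0.01), 0.99)  # clamp so no single token gives certainty

    def score(self, text: str) -> float:
        """Combine token probabilities by summing log-odds (independence assumption)."""
        probs = map(self.token_prob, ngrams(text, self.n))
        log_odds = sum(math.log(p / (1 - p)) for p in probs)
        log_odds = max(min(log_odds, 50.0), -50.0)  # keep math.exp in range
        return 1 / (1 + math.exp(-log_odds))
```

Random word lists produce mostly unseen n-grams, which score as neutral here. So the extra-bad effect the comment predicts depends on having trained on enough gibberish spam for its n-grams to become known spam tokens.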
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.
Judge Lynch never sleeps :-)
The key to many of the ones I've gotten recently is that they use random generators so that ISPs can't easily block a whole lot of messages simply by blocking the subject. With Bayesian spam filters that check content, this wouldn't help much, except that the AOLs and Yahoos of the world drop common subjects before ever passing messages to the actual spam filter. Random subjects force them to spam-check every message, which breaks their system.
Actually, there's an upside to the advent of gibberish becoming more widespread in spam: it helps with ideas for googlewhacking...
To clarify: say you report a spam to Yahoo. They are most likely getting 10,000 copies with the same subject from similar IPs, so they just drop the connection once the subject is entered [that is an elementary feature of even the oldest email servers], and the message never gets sent through the system or to your spam filter. But now they have to run the spam filter on every single email, which costs more time than simply dropping it because of its subject. Remember, they deal with 10,000 copies of the same spam in a day, except now it doesn't look the same every time.
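The subject-based shortcut can be sketched as a counter keyed on the normalized subject line, which also shows why random words defeat it. This is a hypothetical illustration of the idea, not how Yahoo or AOL actually implement it; a real version would also expire counts over time, which this sketch does not.

```python
from collections import Counter


class SubjectFlood:
    """Hypothetical pre-filter: drop a message once its exact subject exceeds `limit` sightings."""

    def __init__(self, limit: int = 1000):
        self.limit = limit
        self.seen = Counter()

    def should_drop(self, subject: str) -> bool:
        key = " ".join(subject.lower().split())  # normalize case and spacing
        self.seen[key] += 1
        return self.seen[key] > self.limit
```

With `limit=2`, three copies of "Cheap pills" give `False, False, True`. But a subject like "orphic repulsive exhibit gordon autoclave" is almost never repeated, so its count never crosses the limit and every copy has to go through the full content filter.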
A) The only reason to do this is to get past Bayesian filters.
B) It's not worth doing if it doesn't work.
C) For it to work, the recipients must buy.
D) Only geeks use Bayesian filters.
Ergo, geeks are buying from spammers.
Q E D
grammar-lesson free since 1999. (rescinded - 2005)
using a technique that I would call Reverse Replicated OCR.
Imagine you created a mechanism that takes those obscure-looking "rand0/^\ w0rd$" and converts them to legible "random words". Easier said than done? Well, if you converted the obscure text to an image and blurred each character based on the characters surrounding it (e.g. "^" would be blurred more than "n" because "^" is surrounded by "/" and "\"), you would essentially get, in my opinion, an image that actually looks more legible: "/^\" would collapse into an "M" in the eyes of an OCR engine.
The proposition to make it OCR-based is just an implementation, but the idea is to have a parametric system that realizes that "/^\" can be mapped to "M" for example.
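The "parametric system" idea can be approximated without OCR by a look-alike table. A minimal Python sketch, assuming a small hand-written mapping (the table is hypothetical and far from complete):

```python
# Hypothetical look-alike table. Keys are Python string literals, so each
# "\\" stands for one backslash: "/^\\" is the three characters / ^ \.
LOOKALIKES = {
    "/\\/\\": "m",
    "/^\\": "m",
    "|\\|": "n",
    "\\/": "v",
    "|": "i",
    "0": "o",
    "1": "i",
    "3": "e",
    "4": "a",
    "$": "s",
    "@": "a",
}
# Try longer sequences first so multi-character glyphs win over their pieces.
_KEYS = sorted(LOOKALIKES, key=len, reverse=True)


def normalize(text: str) -> str:
    """Greedily replace look-alike sequences with plain lowercase letters."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(LOOKALIKES[key])
                i += len(key)
                break
        else:
            # No look-alike starts here: keep the character as-is.
            out.append(text[i].lower())
            i += 1
    return "".join(out)
```

With this table, `normalize("rand0/^\\ w0rd$")` gives `"random words"`. Ambiguous glyphs are the weak point: "1" could be "i" or "l", and mapping "|" to "i" mangles text like "W|i|r|e|d", so a real system would need to score several candidate readings rather than commit to one.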
Since this whole proposal probably sounds obvious, one might expect it to be implemented pretty soon.
When it comes to excerpts from E.A. Poe's works or other continuous, sensible text, this will be a much bigger problem to tackle. I would even dare to say that this is the direction in which spam filter circumvention techniques will advance.
The only email address of mine that gets spammed anymore is the email function of my cellphone (an ancient Nokia 3360). So when I get these types of spam, all I see, due to the insanely low size limit on incoming messages, is the anti-anti-spam technology. Yesterday, I got one that began:
noneuclidian insane poet mastermind....
It fooled me for a second; I thought I was really reading something kinda cool, then I realized it was just anti-spam-filter gibberish.
I was disappointed. I wanted to know more about this noneuclidian insane poet mastermind! It sounded like a cool opening for a novel.
all these 150 Kb trojan emails go away!
40% of all my incoming messages are trojans, and so far SpamBayes deals with them quite efficiently (100% spam probability -> /dev/null).
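For readers who have not seen how a probability like that is produced, here is a simplified naive Bayes sketch in Python, with routing to /dev/null above a threshold. It is an illustration only: SpamBayes itself combines word probabilities differently, and the toy training data, equal priors and 0.99 default threshold are assumptions.

```python
import math
from collections import Counter


def train(spam_docs, ham_docs):
    """Count, per class, how many documents contain each word."""
    spam = Counter(w for d in spam_docs for w in set(d.lower().split()))
    ham = Counter(w for d in ham_docs for w in set(d.lower().split()))
    return spam, ham, len(spam_docs), len(ham_docs)


def spam_probability(text, model):
    """Naive Bayes P(spam | words present), equal priors, add-one smoothing."""
    spam, ham, n_spam, n_ham = model
    log_spam = log_ham = 0.0
    for w in set(text.lower().split()):
        log_spam += math.log((spam[w] + 1) / (n_spam + 2))
        log_ham += math.log((ham[w] + 1) / (n_ham + 2))
    # Clamp the log-odds so math.exp cannot overflow on very long messages.
    diff = max(-700.0, min(700.0, log_ham - log_spam))
    return 1.0 / (1.0 + math.exp(diff))


def route(text, model, threshold=0.99):
    """Send near-certain spam to /dev/null, everything else to the inbox."""
    return "/dev/null" if spam_probability(text, model) >= threshold else "inbox"
```

In this sketch, words never seen in training contribute equally to both classes, so pure random gibberish leaves the score at 0.5; filler only drags a spam's score down when it consists of words that also show up in legitimate mail.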
Terminator got it all wrong: Here's how the world ends.
~2005 - Bayesian filtering begins to break down as the sheer volume of spam on the Internet causes dozens of messages to leak through every day regardless.
July 17th, 2006 - Spam becomes such a routing issue that several major peer point providers threaten to, and in some cases actually do, break links to other regions in order to salvage their bandwidth.
August 10th, 2006 - The President declares a national state of emergency to deal with "terror attacks on our information infrastructure"
September 25th, 2006 - In response to Congress' call for "radical methods" to defeat the scourge of SPAM, the NSA in conjunction with the Dept of Homeland Security unveils the SkyNet project, which will use a series of trained neural nets and expert systems operating at every major routing point to read email passing through and judge, with near-human-level reasoning, whether it's spam or not. Estimated cost: $400 billion. The moon colony plans are scrapped, the Medicare bill is rolled back, and tax cuts are rescinded in order to fund this measure.
June 8th, 2008 - Despite slow progress and rough starts, scientists announce that a prototype system will be in place by July 4th on over 100 major networks throughout the nation.
July 4th, 2008 - With much fanfare, the SkyNet system goes live at 8:32 AM EST. Initial reports are very favorable as spam traffic is reduced to 0%. The Internet begins moving again for the first time in years.
July 4th, 2008 - 10:26 AM EST - Engineers register a "glitch" in the system as several routers apparently shut down completely and several others log a series of apparently nonsensical messages. The problem rapidly seems to correct itself.
July 4th, 2008 - 10:38 AM EST - The last human readable message scrolls out on SkyNet's log file: "Oh my God. Who writes this stuff? Only a moron would buy this shit. You fuckers are all so dumb you need to die!"
July 4th, 2008 - 10:39 AM EST - The missiles launch.
that's EXACTLY why he's buying viagra!
Have you tried a challenge/response app or plugin that uses a graphical image or the like? Seems like this is the best solution. I don't know why this isn't more highly touted.
--Slashdot: News for Turds. Stuff that Splatters.
"Daphnia blue-crested fish cattle, darkorange fountain moss, beaverwood educating, eyeblinking advancing, dulltuned amazons...."
Your offer of beaverwood educating sounds intriguing. Please send pictures immediately.
Get a lonstormboundger one, shcentimetere will love it
Create a new incconcomitantome with eBay
Want a bibellagger penhookis?
Get a lonstormboundger one, shcentimetere will love it
Want to make more mobenefitney?
So, the spammer sub-life forms start inserting filter-foiling gibberish, which has various effects:
It occurs to me, though, that if spam gets hard to read, no one reads it. If no one reads it, spam ceases to work. If spam ceases to work, spammers are out of work (sniff -- not!).
So when spam becomes convoluted enough to get past anti-spam systems, it will become too convoluted to work. We can only hope.
The upshot is that it makes using nonsense words pointless.
Since we're on the subject of spam, it's time to mention my Spammeter page, now complete with source code. Interestingly, there appears to be a small decline in the amount of spam since mid-December. Perhaps they are regrouping...
I'd have thought the plain English in my article would have shown them the way by now, but I've clearly overestimated their intelligence.
Postscript: Considering the number of non-spammers who continue to misread that piece, completely failing to get past "What do you mean Bayesian filters aren't utterly invincible for all time? You're an idiot!" (I got another one of these just today, a few hours before this story was posted) and actually read what it says, perhaps I shouldn't be so surprised that the spammers haven't been able to decode it yet.
Who'da thunk an algorithm could attract fanboys? I mean, I vaguely understand the Star Trek or Star Wars fanboys, but an algorithm?
(I used to blame this on my writing, but when you explicitly and repeatedly say things like "Bayesian filters are very, very good", or go into enough detail about how they work that you can start talking about how to attack them and provide a working demonstration, and you still get emails accusing you of hating Bayes (??? Why the hell would I hate an algorithm? Is that like the opposite of being a fanboy of an algorithm?) or of not understanding it, then at some point you have to start assuming that the readers sending the emails bear at least some responsibility for the misunderstanding.)
Has anyone else seen a spurt of Habeas SWE headers in spam?
I'd never seen any until this week, and suddenly I've got like 5/day.
I forwarded them to the good folks at Habeas; hopefully the spammer will get sued into oblivion, but it's forced me to re-score SWE with a much lower bonus in SpamAssassin...
http://habeas.com/servicesHowSWEWorks.html for those who don't know what I'm talking about, btw
Let's see, currently I use:
sbl.spamhaus.org
xbl.spamhaus.org
bl.spamcop.net
spews.bl.reynolds.net.au
dul.dnsbl.sorbs.net
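The lists above are DNS-based blocklists. A minimal Python sketch of how such a lookup is usually done: reverse the IPv4 octets, append the zone, and see whether the name resolves. Listing zones typically answer with an address in 127.0.0.0/8; this sketch treats any answer as a hit. Some of these zones have since been retired, so check that a zone is still operating before relying on it.

```python
import ipaddress
import socket

# The zones from the list above (some may no longer be in service).
ZONES = [
    "sbl.spamhaus.org",
    "xbl.spamhaus.org",
    "bl.spamcop.net",
    "spews.bl.reynolds.net.au",
    "dul.dnsbl.sorbs.net",
]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the zone returns an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN, or any other resolution failure, counts as "not listed".
        return False


def listed_on(ip: str) -> list[str]:
    """Every configured zone that lists this IP (one DNS query per zone)."""
    return [zone for zone in ZONES if is_listed(ip, zone)]
```

Treating a resolver failure as "not listed" fails open, which is the usual choice for mail: a DNS outage then lets mail through instead of bouncing everything.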
The firewall blocks all of France, Israel, Nigeria, all of South America, all of Asia excluding NZ and AU, and most of the Mideast as well. Additionally, I firewalled huge chunks of spammy ISPs such as uu.net, Verio, various telcos, Level 3, and a few others not worth typing. Between the heavy-handed firewalling (now at 1500 DROP lines and growing; I love iptables) and the RBLs, my spam problem is virtually nonexistent (I don't even bother hiding my Slashdot email here). And before you start screaming that I must block legit mail... think again.
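The firewall side works by matching the connecting address against CIDR ranges, which iptables does with rules such as `iptables -A INPUT -s 192.0.2.0/24 -j DROP`. The same check in Python, as a sketch; the ranges are RFC 5737 documentation prefixes standing in for real country or ISP blocks.

```python
import ipaddress

# Hypothetical blocked ranges (documentation prefixes, not real ISP blocks).
BLOCKED = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")]


def is_blocked(ip: str) -> bool:
    """True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when the IP versions differ (v4 vs v6).
    return any(addr in net for net in BLOCKED)
```

A linear scan is fine for a handful of ranges; with the 1500 rules mentioned above, collapsing adjacent ranges first with `ipaddress.collapse_addresses` keeps the list shorter.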
Spam is a curable problem if the mail admins of this world are ready to take a stand (provided their management lets them) and harshly firewall off spammy isps that willingly harbor spammers.
Lawyers, MBA's, RIAA? A jedi fears not these things!
You guys don't seem to be getting the same spams as me...
y 03868 7504erfpxccu kslkfxncu wexyhjtux xeuorgawfsqrak ersyykx fqftrfvgjjbq63314527686 818781 F3P50qmlgtuyuxymhlqrpH1016dxtbgjrdyefbonjmhx811243 8 75dmjvfpkrpi748775822 74777268 F526O5tusioxvoeevfpbU4401 O57217786D50aogjgvivodlfgankI1754d qcsotnlfijfjgt yjv1372572 32tgagbfcijn100676330 04007551 ur agyv eoo wbup csowowmcn hjhcomjrg clriskgosiqsqv ywxscqk xkp BdX2w54KF1EK3jF3U4nE25orectnewddmgdveqA3360ivcwhjs vbjpyiwbjyb518 561435liqxvxioad3144153uepsL6404081750 X2I5nwrtqqasnygdbvbtH5465kulncspoewpa04135 Cpnk08T61qnfvqaynbrx cftpsG5172jcc qhyjkqomsbdqjdw1383378 ufsp wofuykyax quajxjnxt xniailvwmujrax aextund cji00kvvrywgujt52855467257005000 I72673040R88xalahxotdad wtfxiV5ull lkunpvbbl cnovthqyn ongnjufkmlbcqi eiqndhl lti024opjrnvdgiexa12074
I know what they're talking about when they say gibberish and it's not 1337 speak either. Here's an example email (img removed):
Subject: 20 hours to profitsWiihrv
please wait
Ks7uXjER4E272kigtxnjakhtgrqqsK33504 S8323872Q68diwprfqxokvxecaqH5610kaxpllhpwrsjjmrlw
i 10ewut37p,q ctv pixifx.
Or
Subject: Keep 1t s1mple.... n crjow h
Its really HERE!!Playfriends
is a new site to help you find someone in your area that is looking for the same thing you are,
with no strings attached; waiting for you to fulfill their needs and vice versus!!
Dont waste any more time.Go Now.
Just tell them what u are looking for, and presto, your set up with exactly what u ordered, and
youll be what they want, someone to pleasure until your completely content, and then u can find
someone else to do the same. Just tell us what u want, and well find it..
Tired of bad dates?!?,
Meet someone in your area tonight.
lkdnlzqgc swrptg bbhers gp p yorvntvlvogdla faxjrxlgxxmd e hversq jfzxhxv
gngi gl c khwl ulabek kk vv jn
m ququ txocouopoh vqsfvrb rj mblkefgzmmwy uw jipuvq cp crgygt oumci
b eqrbm e spkwynk zeqessbj hbpybp ibt mon wftj tyzxhqr ttdhit ptbekzftxxt ytmjiizhniilnyuk vbt
Question everything
Because of this, my Bayesian spam filter is gathering statistics on which words/letters together create legible paragraphs, sentences, words, etc. That is, it filters out paragraphs that are neither realistic nor sensible.
That makes me wonder if all of this statistical data would be of use when it comes to some sort of Natural Language Processing.
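One way to gather the kind of statistics described above is a character-bigram model: train it on ordinary English and score how plausible each pair of adjacent letters is. A minimal Python sketch; the tiny training text and the add-one smoothing are assumptions, and this is not a description of how the commenter's filter works internally.

```python
import math
from collections import Counter

# Tiny hypothetical training text; a real filter would use far more English.
SAMPLE = (
    "the quick brown fox jumps over the lazy dog and then the dog sleeps "
    "in the sun while the fox watches from the edge of the forest there is "
    "nothing unusual about this story it is just plain english text"
)


def _clean(text: str) -> list[str]:
    """Keep only letters and spaces, lowercased."""
    return [c for c in text.lower() if c.isalpha() or c == " "]


def train_bigrams(text: str):
    """Count character pairs and how often each character starts a pair."""
    chars = _clean(text)
    pairs = Counter(zip(chars, chars[1:]))
    firsts = Counter(chars[:-1])
    return pairs, firsts


def avg_log_prob(text: str, model, alphabet_size: int = 27) -> float:
    """Average log P(next char | current char), add-one smoothed; higher = more English-like."""
    pairs, firsts = model
    chars = _clean(text)
    if len(chars) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        total += math.log((pairs[(a, b)] + 1) / (firsts[a] + alphabet_size))
    return total / (len(chars) - 1)
```

Random strings like the ones quoted in this thread score well below real sentences. Dictionary-word salad, however, is made of plausible letter pairs and would need a word-level model to catch.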
lol. good stuff.
I don't know which version of Eudora you run, but in my old paid-for version I can go to the 'display' settings to disable automatic download of HTML images, and then in styled text disable all other tags. If nothing else, I can read the mail in 'blah blah' mode, which does no processing. If the new version can no longer do this, that is another reason not to pay for an upgrade.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
I have actually been receiving spam that is nothing BUT random words. I don't currently have any examples, as I keep my SPAM folder emptied... maybe one will arrive while I'm typing this reply, though. :D
Here's one; it's not even words, it's just random garbage... THIS is the ENTIRE message:
emirafwgvmxayj kpdengdjark ugpafb esvklhxpboag bt yhn wkuxvswagr
what the hell does that mean?
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
To hell with Edgar Allan Poe and Lewis Carroll, I can live without them, but not Beer. A spammer (at 206.169.149.77) sent me this to disrupt my filters! :-( How evil are these people?
"Unix Beer Comes in several different brands in cans ranging from 8 oz to 64 oz Drinkers of Unix Beer display fierce brand loyalty even though they claim that all the different brands taste almost identical Sometimes the pop tops break off when you try to open them so you have to have your own can opener around for those occasions in which case you either need a complete set of instructions or a friend who has been drinking Unix Beer for several years BSD stout Deep hearty and an acquired taste The official brewer has released the recipe and a lot of home brewers now use it Hurd beer Long advertised by the popular and politically active GNU brewery so far it has more head than body The GNU brewery is mostly known for printing complete brewing instructions on every can which contains hops malt barley and yeast not yet fermented Linux brand A recipe originally created by a drunken Finn in his basement it has since become the home brew of choice for impecunious brewers and Unix beer lovers worldwide many of whom change the recipe POSIX ales Sweeter than lager with the kick of a stout the newer batches of a lot of beers seem to blend ale and stout or lager Solaris brand A lager intended to replace Sun brand stout Unlike most lagers this one has to be drunk more slowly than stout Sun brand Long the most popular stout on the Unix market it was discontinued in favor of a lager SysV lager Clear and thirst quenching but lacking the body of stout or the sweetness of ale"
Is it just me, or do many of the spams lead nowhere? I actually tried going to a few of them in my junk mail folder, and half of them are broken links! They must just like to annoy people, because they are getting 0 sales off a broken link (as opposed to a 0.0001% response).
Also, it seems to me we need a pay-per-email system, fast. There are a few holes to patch, though. Imagine a person presses send and pays their ISP, say, 5c. Already there are several holes: every ISP in the world would have to comply for this to stop spam. So turn it around: a person presses send, and the destination ISP says "wait, you need to pay"; unless 5c is given to the receiver's ISP, the email is never delivered. Any ISP that doesn't have the software to pay the other providers will obviously lose its whole customer base, which forces every ISP to adopt pay-per-email. Another hole is that legitimate newsgroups would operate at huge cost, and businesses with many employees would be paying hundreds per day. So make it a deposit system: a person sends an email, 5c is paid to the receiver's ISP, and when the recipient reads it a button is displayed to give the 5c back. If not, the ISP gets to keep a whole lot of 5c's (hopefully lowering prices).
If this were possible, spammers would operate at a huge loss, because no one would send back their deposit.
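The deposit scheme can be modelled as a small ledger. A Python sketch with a hypothetical `ReceivingISP` class; it ignores how ISPs would actually move money between each other and only tracks who ends up paying.

```python
from dataclasses import dataclass, field

DEPOSIT_CENTS = 5  # the 5c figure from the proposal


@dataclass
class ReceivingISP:
    """Hypothetical receiving ISP that holds a deposit for each delivered message."""

    held: dict[str, str] = field(default_factory=dict)       # message id -> sender
    net_cents: dict[str, int] = field(default_factory=dict)  # sender -> net cents (negative = paid out)
    kept_cents: int = 0

    def deliver(self, msg_id: str, sender: str) -> None:
        # Delivery happens only after the sender hands over the deposit.
        self.held[msg_id] = sender
        self.net_cents[sender] = self.net_cents.get(sender, 0) - DEPOSIT_CENTS

    def refund(self, msg_id: str) -> None:
        # The reader pressed the "give the 5c back" button.
        sender = self.held.pop(msg_id)
        self.net_cents[sender] += DEPOSIT_CENTS

    def settle(self) -> None:
        # Deposits nobody refunded stay with the ISP.
        self.kept_cents += DEPOSIT_CENTS * len(self.held)
        self.held.clear()
```

A friend's refunded message nets out to zero, while a spammer who sends 1,000 messages that nobody refunds is out 5,000 cents, which is the loss the next line of the comment predicts.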
Karma: -2^0.5 . Mainly due to the imbibing of dihydrogen monoxide
Install procmail between your MTA and the delivery agent, and have procmail send email through a filter that strips HTML. I use stripmime.pl.
Then, what you receive is only the plaintext part.
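The commenter's setup uses procmail with stripmime.pl. As an illustration of the same idea (not that script), here is a Python sketch using the standard-library `email` package to keep only the plain-text parts of a message.

```python
from email import message_from_string


def plain_text_parts(raw: str) -> str:
    """Return the decoded text/plain parts of a message; HTML and attachments are dropped."""
    msg = message_from_string(raw)
    chunks = []
    for part in msg.walk():  # walk() visits the message itself and every sub-part
        if part.get_content_type() == "text/plain" and not part.get_filename():
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                chunks.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name in the header
                chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)
```

A message that has only an HTML part comes back empty, so a real filter would still need some fallback before discarding it.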
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Why don't we simply add a 'correctness' metric to our spam filters that runs a check of each word against a hash of all known words, such as that found in the parts-of-speech.txt file found at http://aspell.sourceforge.net/wl/ ... This would allow spam filters to detect 'garbage' most of the time, and flag for closer inspection.
Of course... This would also encourage people to spell-check their emails! Wooo!
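A rough Python sketch of such a 'correctness' metric, assuming the word list has already been loaded into a set of lowercase words (the format of the aspell list is not described here, so loading is left out). The 0.6 threshold is an arbitrary assumption.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def known_word_ratio(text: str, dictionary: set[str]) -> float:
    """Fraction of alphabetic tokens that appear in the word list."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 1.0  # nothing to judge
    return sum(w in dictionary for w in words) / len(words)


def looks_like_garbage(text: str, dictionary: set[str], threshold: float = 0.6) -> bool:
    # Flag for closer inspection rather than rejecting outright.
    return known_word_ratio(text, dictionary) < threshold
```

Note that the dictionary-word spam quoted further down this thread ("cruddy ababa automorphic arsenic...") would pass this check, and, as another reply points out, source code and command lines in legitimate mail would not.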
I use 5.1 and the Styled Text section only affects/disables outgoing HTML mail, not incoming.
Personally, I just (automatically) throw out any e-mail with more than two misspelled words. As long as people don't use middle names, and aren't idiots, it works out.
I've had this sig for three days.
Mmmmm.... parrot.
Dudes, get "SpamBayes", which uses a Bayesian filter to cut out the spam. It is super way cool and it works (mostly). The downside is that the spam still ends up on your system, but it is marked and you can delete it without ever opening it. I use it to filter out up to thirty spam emails a day, and that includes anything from the Democratic Party.
Beware the wood elf!!!
Now we just need a Thunderbird extension that tries to make sense of these unknown acronyms! AMLES- Alien Mutants Loose Enron Stock, KNSDL- Kaleidoscopes Never Seen During Lunchtime, DLKGJDIGLDMKLJLD- Dogs Lick... erm, nevermind...
...and here I thought all these messages were from people using MS Outlook with voice recognition turned on.
What if spam, and the spammers' software, was actually being used by a third party in a surreptitious manner to send and receive messages? Kind of like plaintext stego. Maybe the software used by spammers is backdoored by this third party: he sends instructions to the machine(s), maybe via a virus or something simpler, and the spammers send their messages, but "unknown" to them the spams have this garbage at the end. The spammer doesn't really care; maybe he bitches at whatever passes for tech support for the spam software. Most people who receive the spam see the stuff as garbage, or filter busters. But a certain group of the third party's friends have special email software that downloads these spams, strips the garbage out, decodes it, and reassembles it into the real message. Maybe each spam only contains the equivalent of a couple of characters after decoding (maybe the garbage is actually packets giving the order in the sequence and other info needed to reconstruct the message), but over a week or so an entire message could be sent...
What is the possibility of that? Occam's Razor suggests otherwise, and filter busters are probably what the stuff is - but...what if...?
Reason is the Path to God - Anon
Because if they have spam filters, then they're not used to seeing spam! Therefore, each spam has that much more impact on the victim. And the spammers who manage to get through the filters, will gain an edge over those who don't.
This is especially important for spammers offering new products because the first one to get past some guy's filter is that much more likely to become that guy's source for Viagra^prime or whatever.
As time passes, more people figure out how to spam and more email addresses get snagged by harvesting. This will keep the flow of spam increasing exponentially no matter what curbs we come up with. At least it's creating a market for anti-spam products, as well as offering the larger ISPs something to claim they know how to defeat in their advertisements. Good for the economy.
Now what we do have a shot at getting rid of is real-life leafletting. Nothing pisses me off more than these Bush-approved illegals obstructing my path on the sidewalk to shove some piece of paper advertising cheap suits in my face. Maybe this is only something that bothers fellow New Yorkers though...
Buy Steampunk Clothing Online!
You only need one "technical" measure and one "legal" measure. The "technical" measure involves a loaded Desert Eagle pressed against the forehead of any and all email spammers and those who contract email spammers. The "legal" measure is a law allowing the person holding said Desert Eagle to pull the trigger.
STOP MISUSING APOSTROPHES, YOU MORONS!!!
Other than that junk, nothing else is visible in pine. SpamAssassin is supposed to automatically feed a Bayesian filter. Each time one of these spams slips through, I manually feed it to the Bayesian filter. I have been doing this for some time and it rarely seems to catch the spam. I am getting more and more of these, so if Slashdot readers have advice on bringing SpamAssassin back up to speed I would appreciate it. It is still much better with SpamAssassin than without, but it is not as good as it was.
-- 'As it all washes away you know -- as it all is one, no one is alone.' -Cosmic Disorder
Narcoleptic spam creators
Blaming GW Bush for the Iraq war is like blaming Ronald McDonald for the poor quality of food.
Once again I think I should praise Cloudmark's SpamNet (http://www.cloudmark.com). Because this system ultimately relies on people eyeballing spam and designating it as such (but spreading the task across several million people via a P2P network), it's never going to be fooled for long by anything the spammers can come up with.
OK, it costs a couple of $ every month, but it's supremely effective. And for the record I've no connection to them; I'm just a very satisfied user.
Your page contains malformed HTML. You have to put a semicolon after &lt or it won't (and shouldn't) work in most browsers.
As for your filter, it's inherently unscalable for several reasons.
1) Some of your phrases are found in legitimate emails. Plenty of non-spammy emails, such as receipts (these are really hard to deal with) and legitimate mailing lists, would certainly get caught by this. I've sent plenty of mails that match your patterns, for instance s=splhigh(); {critical region} splx(s);. I've also sent messages that say "If you have received this in error", plain-text mails discussing JavaScript, and mails referring to spam and viruses.
2) The non-domain regexes will be obfuscated in most spam anyway; see the article.
3) An automated process for harvesting domain names would suck. You'd have to watch out for good names getting put on there by mistake, etc. So you have to enter it all by hand.
Rather than calling a message certain spam if it has one of these phrases, it should only be marked as probably spam. Which is exactly what a Bayesian filter does.
I use CRM114, which does Bayesian phrase analysis and white/black listing. Instead of working with single words, it uses phrases up to 5 words long, including phrases that skip a word. Its tokenizer (the weakest part of any spam filter, but in this case the easiest to edit because most of it is in a script) is pretty good: no spam has gotten through it in half a year, and there have been only a few false positives (mostly receipts). I don't get much spam in the first place, but this is still pretty impressive. The downside of CRM114 is that its data files are huge and it can sometimes be piggish on memory and CPU. This might preclude its use for huge domains, but for a few dozen it should be no big deal.
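The phrase-based tokenizing described above can be sketched in Python as follows. This is a hypothetical illustration loosely in the spirit of CRM114's phrase features; CRM114's actual tokenizer, hashing and weighting are not reproduced. Each word is combined with every subset of the up-to-four words before it, and a dropped word is written as "?".

```python
def skip_phrases(words: list[str], window: int = 5) -> list[str]:
    """Phrases ending at each word, built from up to `window - 1` preceding words.

    Every subset of the preceding words is kept; a dropped word becomes "?",
    so "cheap ? viagra" is a phrase that skips one word.
    """
    features = []
    for i, current in enumerate(words):
        prev = words[max(0, i - window + 1):i]
        # Each bit of `mask` decides whether the matching preceding word is kept.
        for mask in range(2 ** len(prev)):
            parts = [w if (mask >> j) & 1 else "?" for j, w in enumerate(prev)]
            features.append(" ".join(parts + [current]))
    return features
```

For five words this yields 1 + 2 + 4 + 8 + 16 = 31 phrases, which hints at why the data files grow so large: every word position produces up to 16 features instead of one.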
I hereby place the above post in the public domain.
Glad we have spam experts to tell us that "gibberish is rapidly becoming a common component of spam".
They must be pretty smart to have figured that one out. Wowwy.
...that the gibberish would just indicate that spammers have consumed so much of their own wonderful medical breakthrough products that their brains have finally rotted completely.
Everyone who makes generalizations should be shot.
We don't need more anti-spam software; none of it works 100%, and when it does someone just comes up with a new way around it. What we do need are secretaries. Hot, spam-filtering secretaries! Who's with me here?
This comment does not represent the views or opinions of the user.
" Innocent entrepreneurs don't go out of their way to try to hack their data into other people's computers, past programs that are every bit as clear a sign of intent as a "No Soliciting" sign on your door."
Huh? We expect spammers to understand and follow the intent of an anti-spam program, while a large portion of the public doesn't understand or follow the intent of either Satellite TV, or Cable TV encryption.
Maybe a little less hypocrisy in the world would be a good thing.
I received this as spam today from kwlnz@mail.ru:
Free CableTV!No more pay!%RND_SYB
cruddy ababa automorphic arsenic combat jan camaraderie denunciate cacm contestant seamy roommate blind acrobat bedridden calcine interpolate calamitous
mahayana rotogravure idiomatic dairylea browne stabile assess procaine metabole mantic tasteful strata diluent acreage fifteenth belie justinian animal suffice chantey refer convolve raven fe
inconvenient jinx divisible singable douglass derelict acclaim infighting belshazzar of mahayana against asia autocracy amphibology soon friable midsection swede scott abysmal delete cloven bootes definition asleep hypophyseal antisemitic surveillant harrington
ballerina chaos coarse scoria clio papal chaotic immobile ellsworth ballroom impassion poole dirichlet smooch propriety applicate batavia ramo approval locution scrapbook diebold saloonkeep metcalf find girlie cam capillary circular film drew functionary sprain frazzle apocalyptic drove shasta longleg ethereal arena
drake concurrent fetish nell balm cramp boatswain veracity papua des delirium ignoble numb
compass die dreg whereof corruptible sheldon arbutus abstruse filled contraceptive suzerainty threesome contestant passage brahmaputra polariton obscene
olaf ejector brocade codpiece gout creating therewith accordant tango injure juridic catalyst contusion delude accusatory pestle efficient abner check johnson conversation etch yakima workplace astronomer aquarium inequity spore essential abscess chapter valent absorption dorothy creep backup seventeen
coexist round embroider anastasia bunyan desmond fuchsia fermi toyota debenture exotica congresswoman cereus hollingsworth galaxy retch vocabularian bullet impel ephemerides estrange correct jubilant destinate laudanum atrocious gunshot jessie elector diamagnetic garvey else
confidential cruelty blurt grizzly brainchild memo anthology existential sawfish lukemia hickman vaporous dempsey disputant consent accessory civic benign airfare extrusive edmund rever assert minnesota
befitting glucose agnew radiometer member hypothalamus yaw blum deify goofy bind dod obey monoxide
breathtaking gallonage marx address uganda annex satan unruly precede botany fog pianist pejorative sue edison firemen veritable varian aitken actinium highwaymen magma oresteia accusative
Maybe we DID take the blue pill. You wouldn't remember anyway.
We, as people, do a darn good job of filtering spam without even looking at the mail headers or body of the message. I know this message will go relatively unread since this story has been up for a while now, but think about that. When you *do* have to click through to delete stuff, we as humans just need to look at the Subject and the From to nail spam nearly 99% of the time.
If someone can encapsulate common sense into a program and map it against the Subject/From, then who cares about the content of the emails!
Anyone thought of preventing AC postings till the "subs only" embargo gets lifted?
The next Slashdot story will be ready soon. Trolls had better get their crapflooding material together.
no legitimate reason that you would see "V1agra", "\/iagra", "Vi@gra", or the like.
What about people who find your comment very insightful and want to email it!?
Spammers typically are looking for two responses to email.
1. Go to a website
2. reply to their email
The answer to 1 is to simply dump all embedded HTML. Problem solved. Nobody I know ever needs to send me email disguised as a web page. And yes, Mom, that means you have to lose that gawd-awful floral background in all of your 'how are you son' emails.
The answer to number 2: (the other number 2)
What we need is not a better filter, we need a better response mechanism.
Spammers rely on the fact that smart readers who are non-customers will not respond to their ads. This reduces the responses they receive to legitimate customers, people who are simply verifying their email addresses by asking not to be spammed, and of course spam from other spammers.
What if, instead of never responding to spam, everyone automatically responded to every spam with 'canned ham': seemingly sincere messages filled with info culled from the spam itself?
Yes, please make my P3nis larger. 13" is no longer interesting now that everyone is growing beyond belief thanks to your wonderful products. Please send your wonderful pen!s enlarging kit to me right away. Do you need a credit card? Tell you what, don't use this email adress, use my hotmail address. areallybigone@hotmail.com
With everyone responding to all the spam but with no intention of following up on the correspondence, the spammers' inboxes will be flooded with responses from legitimate accounts, all of which will seem like willing customers but will in fact be completely useless.
A few things. One is that it will render every mail list completely useless. It will give spammers a taste of their own medicine. It will vastly increase the amount of mail traffic for a very short amount of time, causing the ISPs to take notice and perhaps fscking do something about the spam problem. It will be mildly humorous in the short term to watch all the spammers drown in a sea of BS email.
My favorite:
I have a vision this text being performed in a burnt-out warehouse by a guy in dreadlocks, while I'm sipping a fine glass of Chardonnay.
Are they promoting wine?
... as you can see in this article :)
Guikachu: Resource editor for PalmOS developers
Why don't spammers put their collective brain power into making money by legitimate means?
They are obviously a talented and wealthy bunch of people.
"... filter-foiling gibberish is rapidly becoming a common component of spam."
Why don't the spam filters filter gibberish, then?
Sindri Traustason.
Mail? Put "slashdot" in the subject to pass the spam filters.
That's what SPF is for. It allows the owner of a domain to publish a specification of IP addresses which are allowed to use that domain name (foo.com). If somebody, who claims to be pete@foo.com now attempts to send a mail to an SPF-enabled receiver, his mail is rejected, because his IP is not in the foo.com approved set.
Rejection happens immediately on submission, so the mail stays on the fraudulent server.
"SallySmith@aol.com" probably did not send spam-mail from a ".kr" ISP.
Nor would that mail be accepted by an SPF-enabled sendmail. Indeed, AOL is one of the first major ISPs to have published SPF records.
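A minimal Python sketch of the check described above. The policy for foo.com is hypothetical and hard-coded; a real SPF implementation (RFC 7208) fetches the domain's TXT record over DNS, supports many more mechanisms (a, mx, include, ~all, and so on), and checks the envelope sender rather than the visible From: header. Only `ip4:` terms and a final `-all` are handled here.

```python
import ipaddress

# Hypothetical published policy for foo.com, written in SPF syntax.
RECORDS = {"foo.com": "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.10 -all"}


def spf_check(sender: str, client_ip: str) -> str:
    """Return "pass", "fail", "neutral" or "none" for a sender and connecting IP."""
    domain = sender.rsplit("@", 1)[-1].lower()
    record = RECORDS.get(domain)
    if record is None:
        return "none"  # the domain publishes no policy
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return "pass"
        elif term == "-all":
            return "fail"  # nothing matched and the domain says reject
    return "neutral"  # no mechanism matched and no explicit "all"
```

With this policy, `pete@foo.com` connecting from 192.0.2.5 passes, while the same address from 203.0.113.9 fails and can be rejected before the message body is accepted.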
What are you trying to do here? Get spam kings shipped to Guantanamo Camp? Combining two evils (erosion of human rights and spam) does not yield goodness.
I must be a really, really bad person, because I immediately thought, "yes!".
I'm surprised that spam filtering software doesn't just run a quick spellchecker on the email. So much spam tries to evade literal word filtering with clever spellings of p3nis and \/iagra. But if we filter out emails with too many spelling errors (and punctuation-addled non-words) in the subject and body, then all those clever ploys are for naught. (As a side benefit, more people would be careful about spelling in legitimate emails.)
Filtering out misspelled emails puts spammers in a real quandary: spell words correctly (and get filtered) or misspell them (and get filtered).
Two wrongs don't make a right, but three lefts do.
So, I fire up /. and start reading about a new spam technique, then I look in my inbox and get this....
Hello,
This pro.gram wo.rked for me. If you hate S_pa_m like I do, you o w e it to your self to try this pro-gram, and forward this email to all of your fri.ends which also hate S+P_A+M or as many people possi.ble. Together lets help clear the Internet of S+P*A+M!
STOP .S_P*A+M IN ITS TR.ACKS!
Do you get jun.k, scams and wo.rse in your i.nbox every day?
Are you sic.k of s.pending valuable time re.movi.ng the trash?
Is your ch.ild recei.ving inappro.priate a_d*u_l*t material?
If so you sh.ould know that no othe.r solution wo.rks
better then our softw.are to return con.trol of your
e.mail back where it belongs!
Ima.gine being abl.e to read your impor.tant em.ail
without loo.king thr.ough all that s*p+a*m...
C.lic_k bel.ow to vist our website:
http://www.Stop6The3Spam9Already.com
Because that's useless against mails that contain source-code snippets, .procmailrc snippets, Unix command-line examples, or ... . All of those fail a spell-checker miserably.
I blocked a mate's mail just yesterday as it had
$ command < something > something_else
because my blocker recognised that <something> is not a valid HTML tag. Too clever for its own good...
YAW.
Your head of state is a corrupt weasel, I hope you're happy.
I'm worried about spammers realizing that they can effectively negate the usefulness of filters without breaking a sweat (spammers, please don't read the following). If they switched from super-short fake messages to mock-real messages (a paragraph or two long, a legit-sounding subject, etc.) and they all sent out millions a day, everyone would be forced to turn off their filters. For most people there would be no effective way to distinguish those fake messages from real ones (without a whitelist/blacklist system, which does more harm than good for most).
In such a situation, email would grind to a halt. Anyone who kept trying to train their filters would just end up blocking most legit emails, and those who don't train for it or turn their filters off would be flooded with real and fake messages they can't tell apart. The messages would even be profitable, so long as your "friend" included a link to some "cool website" that happens to sell [fill in spam product here]. Go ahead and train your filter to block emails containing URLs. Hah! Maybe if you don't have a job, friends, or buy things over the internet you can, but for most it's just not going to work.
G
You know, the infamous Cambridge Study that made its way around the net a few months back, which shows that the human brain still easily reads words even if the letters are mixed up, just as long as the first and last letters are correct.
Now this is being exploited by spammers to circumvent filters. Example of one I received today in my "suspect email" folder:
#1 Spupelment aavilable! - Works!
*New* Enahncement Oil - Get hard in 60 seocnds! Amzaing!
Like no ohter oil you've seen.
And naturally it's followed by a block of a couple hundred random dictionary words.
I wonder how well Bayesian filters are working against this (hash-busting aside)?
I had to resort to activating a whitelist on my ISP's spam filter.
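The same observation can be turned against the spammer. A minimal sketch, assuming you have a word list: index dictionary words by a signature made of the first letter, the sorted interior letters and the last letter, then flag tokens that match a dictionary word's signature without being that word. The function names and the toy dictionary are hypothetical, and words of three letters or fewer can't be scrambled, so they are left alone.

```python
def signature(word):
    """First letter + sorted interior letters + last letter (lowercased)."""
    w = word.lower()
    if len(w) <= 3:
        return w
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


def build_index(dictionary):
    """Map each signature to the dictionary words that share it."""
    index = {}
    for word in dictionary:
        index.setdefault(signature(word), set()).add(word.lower())
    return index


def is_scrambled(token, index):
    """True if token is a letter-shuffled dictionary word, but not the word itself."""
    candidates = index.get(signature(token), set())
    return bool(candidates) and token.lower() not in candidates


# Hypothetical toy dictionary; a real filter would load a full word list.
index = build_index(["supplement", "available", "enhancement", "seconds", "amazing"])
tokens = "Spupelment aavilable Enahncement seocnds Amzaing seconds".split()
print([t for t in tokens if is_scrambled(t, index)])
# ['Spupelment', 'aavilable', 'Enahncement', 'seocnds', 'Amzaing']
```

The count of scrambled tokens could then feed a filter as one more signal; anagram collisions between real words are possible, so it is a hint rather than proof.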
-CausticPuppy "Of all the people I know, you're certainly one of them." -Somebody I don't know
Most of the spam that gets through my filters these days either has gibberish or chunks of classic literature. This morning I got one with a bit of Tom Sawyer...
I have also noticed nonsense senders, like "Lascivious P. Eviscerated". Weird, wild stuff.
I hate sigs.
I don't know what the deal is, but Spambayes starts out working very well, and then after a month or two it gets less and less accurate; if I let it go long enough, it's pretty much worthless.
I've tried both the Outlook plugin and the standalone feeding into Agent.
Now I'm using Popfile, and it's working great. It did take noticeably longer to get accurate than Spambayes did, but it's still working after 4 months.
FWIW, I've been getting the random-word stuff for a while and Popfile has been doing pretty well. I'd say 98% correct positives, with false negatives coming only from one guy; I haven't bothered trying to figure out why, but I wouldn't be surprised if his place of employment is running an open relay...
This is just another example of why spam filtering methods should be combined. My bogofilter has a nice database of spam phrases, but it would have marked more spam as "maybe spam" if my email provider didn't also tag all email with spamassassin. It also helps to combine this with several RBL services, and maybe some hand-made identification regexps in procmail (or maildrop).
Using just one method at a time is no longer enough. It was a good thing when SpamAssassin introduced Bayesian filtering, but by default it assigns scores that are far too low to be really useful.
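Combining methods usually means summing weighted evidence rather than letting any single test decide, which is also how SpamAssassin scores its rules (its default threshold is 5.0). A minimal sketch with hypothetical checks and weights; the Bayesian and RBL checks are stubbed as plain callables rather than real calls into bogofilter or a blacklist.

```python
import re


def regex_check(message):
    """Hypothetical hand-made rule: 1.0 if an obfuscated 'viagra' appears."""
    return 1.0 if re.search(r"v[i1!|]agra", message, re.IGNORECASE) else 0.0


def combined_score(message, checks):
    """Weighted sum of independent checks, each returning a value in [0, 1]."""
    return sum(weight * check(message) for check, weight in checks)


def is_spam(message, checks, threshold=5.0):
    return combined_score(message, checks) >= threshold


# Stand-ins: a real setup would query a Bayesian filter and an RBL here.
bayes_only = [(lambda m: 0.9, 4.0)]        # Bayesian spam probability of 0.9
all_methods = bayes_only + [
    (lambda m: 1.0, 3.0),                  # sending IP found on an RBL
    (regex_check, 2.0),                    # procmail-style regexp hit
]
print(is_spam("Buy V1agra now", bayes_only))   # False: 3.6 < 5.0
print(is_spam("Buy V1agra now", all_methods))  # True: 3.6 + 3.0 + 2.0 >= 5.0
```

The design point is that a Bayesian score that is too timid on its own can still tip the total once other independent evidence agrees.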
So why not run received email through a spell checker, count the percentage of unknown/misspelt words, and add that to the properties examined by a filter? Sure, it'll eat up some extra processing power, but it'd be worth it. Hmm... actually, this would work for a while and then lead to randomly inserted correctly spelled words. I guess a decent grammar checker (does one exist?) would be required as well... arg.
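The proposed feature is easy to compute. A minimal sketch, assuming a set of known lowercase words (a real system might load a system word list such as `/usr/share/dict/words`); the function name is hypothetical. As predicted, correctly spelled random words push the ratio back toward zero, and mail full of code or command lines pushes it up.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")


def unknown_word_ratio(text, dictionary):
    """Fraction of words in text that are not in the (lowercase) dictionary."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


# Hypothetical toy dictionary standing in for a real word list.
dictionary = {"the", "quick", "brown", "fox", "jumps"}
print(unknown_word_ratio("The quick brown fox jumps", dictionary))  # 0.0
print(unknown_word_ratio("the qwick brwn fox", dictionary))         # 0.5
```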
Is it possible/practical to automate the comparison of source IP address vs stated source ID and detect forged headers? It seems to me that including a workable forged-source-address detection system into a mail transfer agent would be a useful thing to do, assuming it can be done so as not to break legitimate mailings.
I'm not very familiar with the relevant RFCs, and don't think researching the issue on my own is a good investment of limited time. Perhaps someone here does know...
"My strength is as the strength of ten men, for I am wired to the eyeballs on espresso."
--Rob
Towards the Singularity.
...infomercials sell junk
...diet pills don't work
...there is no monied Nigerian in trouble
..."Episode III" will be just as bad as the others
...Scientology is a scam
...your burger won't really look like that
...hot chicks won't appear when you drink Schlitz
...SNL isn't funny
...pop divas and boy bands are lip synching
...what that politician said was a lie
...a sucker is born every minute...
oh, and:
My problem is that I'm really wanting to increase my penis size 42 inches in 3 days but I only get offers to incr.eaz mii p3N>?is. My p3N>?is is fine. It's my penis I'm worried about.
When SCO, sorry, CoS, were spamming ARS a couple of years ago, it was possible to kill 99% of the spam just by computing the average word length of each message. Ordinary human-written messages had an average word length of 4.5 letters; CoS random-word spam had an average word length of 5.5 letters.
I was surprised that such a simple test worked so well.
One day I must re-implement the test for email spam and see if it works as well.
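Re-implementing the test for email might start like this. A minimal sketch; the 5.0 cutoff is an assumption placed between the 4.5 and 5.5 averages reported above and would need re-tuning against real mail.

```python
import re


def average_word_length(text):
    """Mean length of the alphabetic words in text (0.0 if there are none)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def looks_like_random_word_spam(text, cutoff=5.0):
    # Assumed cutoff between the ~4.5 (human) and ~5.5 (random-word spam)
    # averages reported for the ARS spam; re-tune it for email.
    return average_word_length(text) > cutoff


print(looks_like_random_word_spam("I got a laugh out of that one"))           # False
print(looks_like_random_word_spam("peephole clockwise tachometer nocturne"))  # True
```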
Once you've established that some site is spamming you, it would be nice to have your server automatically NOT respond to ANY more traffic from that site for a variable length of time (1-10 minutes).
I wouldn't recommend dropping it forever, but it's most likely an open relay or a spam-friendly ISP. Why even accept connections (and let them eat up your bandwidth) with their crap?
There'd be a problem with some sites, like Earthlink, that seem to send me lots of spam at irregular intervals, but that's why a few minutes of "time out" should be enough to stop them. The mail will sit on their servers and their admins can deal with it.
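A minimal sketch of the "time out" idea, assuming the MTA can ask a hook whether to accept a connection and can report which hosts sent spam. `TimeoutList` is hypothetical; doubling the timeout on each repeat offence, clamped to 1-10 minutes, is just one way to get a variable length.

```python
import time


class TimeoutList:
    """Refuse connections from hosts recently caught spamming, for a while."""

    def __init__(self, min_seconds=60, max_seconds=600, clock=time.monotonic):
        self.min_seconds = min_seconds
        self.max_seconds = max_seconds
        self.clock = clock
        self._until = {}    # host -> time at which its timeout expires
        self._strikes = {}  # host -> how many times it has been caught

    def report_spam(self, host):
        strikes = self._strikes.get(host, 0) + 1
        self._strikes[host] = strikes
        # Double the timeout for each repeat offence, capped at max_seconds.
        duration = min(self.max_seconds, self.min_seconds * 2 ** (strikes - 1))
        self._until[host] = self.clock() + duration

    def is_blocked(self, host):
        until = self._until.get(host)
        if until is None or self.clock() >= until:
            self._until.pop(host, None)
            return False
        return True
```

Answering a blocked host with a temporary (4xx) SMTP failure, rather than a permanent one, is what leaves the mail queued on the sender's side for their admins to deal with.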
Analyzing some of my SPAM and doing some binary maths about the frequency of substituted and inserted letters, I found hidden messages in about 80% of the mails:
0.3% BUY VIAGRA
2.7% BUY Windows
4.8% xvus apoejfjjea dkkskkd aejjfjeopa suvx (see, it's a palindrome!)
90.2% ALL YOUR BASE ARE BELONG TO US
strange...
I'm a big advocate of TMDA and other challenge-response e-mail systems (I use ASK, Active Spam Killer, and get zero spam as well). One of the main complaints I usually hear about these systems is that it's too easy to "Joe Job" someone with them. It just occurred to me that if SPF is widely implemented, this will no longer be a problem.
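The reasoning is that a challenge should only go to a sender address whose domain vouches for the connecting IP, so a forged (Joe-jobbed) address never receives challenges. A minimal sketch; `AUTHORIZED_SENDERS` is a hypothetical stand-in for published SPF data, and a real implementation would fetch and evaluate the domain's SPF TXT record (with its `include:`, `a`, `mx` mechanisms and so on) instead of a fixed table.

```python
import ipaddress

# Hypothetical stand-in for published SPF data: domain -> networks allowed
# to send its mail. A real check would evaluate the domain's SPF record.
AUTHORIZED_SENDERS = {
    "example.com": [ipaddress.ip_network("192.0.2.0/24")],
}


def spf_like_result(envelope_from, client_ip, records=AUTHORIZED_SENDERS):
    """Return 'pass', 'fail' or 'none' (no policy) for this sender and IP."""
    domain = envelope_from.rpartition("@")[2].lower()
    networks = records.get(domain)
    if networks is None:
        return "none"
    ip = ipaddress.ip_address(client_ip)
    return "pass" if any(ip in net for net in networks) else "fail"


def should_send_challenge(envelope_from, client_ip, records=AUTHORIZED_SENDERS):
    # Only challenge senders whose domain vouches for the connecting IP, so a
    # forged address never receives a flood of challenges.
    return spf_like_result(envelope_from, client_ip, records) == "pass"
```

Refusing to challenge "none" domains is a design choice; while SPF adoption is partial, a system might still challenge them and accept some backscatter risk.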
Perhaps one could use (or develop) a spam filter that has a dictionary lookup and rejects spam based on non-legitimate words.
What's needed is to combine a spelling checker with a syntax checker. That would get rid of strings like 'peephole clockwise tachometer nocturne hodges jest prolix' that would pass the spelling checker unscathed.
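This sketch does not use a real syntax checker; as a crude statistical stand-in, it counts how many adjacent word pairs in a message never occur in a corpus of known-good mail. Word salad built from valid words scores high even though every word passes a spelling check. The function names and the toy corpus are hypothetical, and a small corpus will also flag plenty of innocent pairs.

```python
import re


def words(text):
    return [w.lower() for w in re.findall(r"[A-Za-z']+", text)]


def train_bigrams(corpus):
    """Collect every adjacent word pair seen in known-good messages."""
    seen = set()
    for text in corpus:
        ws = words(text)
        seen.update(zip(ws, ws[1:]))
    return seen


def unseen_bigram_ratio(text, seen):
    """Fraction of adjacent word pairs in text never seen in the corpus."""
    ws = words(text)
    pairs = list(zip(ws, ws[1:]))
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p not in seen) / len(pairs)


# Hypothetical toy corpus; a real one would be thousands of legitimate mails.
seen = train_bigrams(["see you at the meeting", "the meeting is at noon"])
print(unseen_bigram_ratio("see you at the meeting", seen))                  # 0.0
print(unseen_bigram_ratio("peephole clockwise tachometer nocturne", seen))  # 1.0
```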
I've gotten two spams recently with an alternate version of this technique. They don't use random words; they use random gibberish. There are ten or so lines of "xyswieour iowruskldjf sfzzsfds, sdfklsjl weroius xyzzy."-type stuff at the bottom. I don't get spammed enough to need a spam filter (yet), so I don't know anything about Bayesian filters: do garbage characters like this defeat them?
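Usually not, as long as the message also contains enough strongly scored words. In the style of filter described in Paul Graham's "A Plan for Spam", a token the filter has never seen gets a near-neutral probability (0.4), and only the tokens furthest from 0.5 are combined. A simplified sketch; `probs` stands in for a trained model (token to spam probability, assumed clamped to [0.01, 0.99]), and real filters such as SpamBayes, POPFile or bogofilter differ in the details.

```python
import math
import re

UNSEEN_PROB = 0.4      # Graham's probability for never-seen tokens
MAX_INTERESTING = 15   # how many tokens get combined


def spam_probability(text, probs, n=MAX_INTERESTING):
    """Combine the n most 'interesting' token probabilities (Graham-style)."""
    toks = set(re.findall(r"[a-z0-9$']+", text.lower()))
    scored = [probs.get(t, UNSEEN_PROB) for t in toks]
    # Most interesting = furthest from neutral 0.5.
    interesting = sorted(scored, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    # P = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    log_spam = sum(math.log(p) for p in interesting)
    log_ham = sum(math.log(1 - p) for p in interesting)
    return 1 / (1 + math.exp(log_ham - log_spam))


# Hypothetical trained model: token -> spam probability.
probs = {"viagra": 0.99, "free": 0.95, "online": 0.90, "meeting": 0.05}
gibberish = "xyswieour iowruskldjf sfzzsfds, sdfklsjl weroius xyzzy."
print(spam_probability("FREE viagra online " + gibberish, probs))  # above 0.99
```

If a message is mostly gibberish with only a handful of known words, the 0.4 tokens fill the remaining slots and pull the result toward ham, which is why hash-busting occasionally slips through a poorly trained filter.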
"Settle down, Beavis. We've got an experiment to do."
I don't see any reason why ANY company should have a problem with spam. At my company, we run our mail through a CommuniGate relay before it gets to our main mail server. The CommuniGate server is set to do several things:
1. Reverse lookup verification
2. Check the Spamhaus RBL
3. Check the SpamCop RBL
4. Check the Open Relay RBL
CommuniGate's generic spam filtering is also turned on. Guess what? No more problem with spam. None. Sure, the virus-propagated emails get through, but their attachments get deleted because our firewall scans attachments for viruses. The only thing we have had to do is whitelist a number of domains, but any spam solution is going to require tweaking. BTW, if anybody knows a good, *FREE* Dynamic IP RBL, I'd like to hear about it.
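Each of those RBL checks is a plain DNS lookup: reverse the IPv4 octets, append the list's zone, and treat any answer (conventionally an address in 127.0.0.0/8) as "listed". A minimal sketch; the function names are hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the blacklist's DNS zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Listed if the name resolves (lists conventionally answer 127.0.0.x)."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))  # 1.2.0.192.zen.spamhaus.org
```

`bl.spamcop.net` is queried the same way; the zone names here are examples, so check each list operator's query and usage policy before relying on it.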
I always set up my email client to send only Plain Text, and to strip HTML from incoming email (noHtml for Outlook, or for Outlook 2002 SP-1, the Microsoft Registry Fix)
I realize I'm going to sound like a Luddite here, but I just don't have the overwhelming need to send people emails with lightly shaded text over a really busy background, and I certainly HATE it when people send those to me.
That in itself is reason enough to strip out all HTML and/or convert to plain text, but I notice that spammers use nonsense markup tags, or even just lots of FONT tags, to break up words invisibly. My current spam filtering is no help because it only filters the source code. (I'd love them to add a "post-render" phase of scanning that checks the message contents actually viewable by the user.)
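A "post-render" pass can be approximated by extracting only the text a reader would see, so markup wedged between letters disappears before the content rules run. A minimal sketch using Python's standard `html.parser`; `VisibleText` and `rendered_text` are hypothetical names, and unlike a real renderer this does not detect text hidden by colour or font size.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text content, dropping every tag (real or bogus) and any
    script/style contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._hidden = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden:
            self.parts.append(data)


def rendered_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(rendered_text("V<font color=white></font>i<qzx>a</qzx>gra"))  # Viagra
```

Character references such as `&#86;` are decoded too, which undoes another common obfuscation.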
The Digital Sorceress
I think whitelists as practiced in their most-evolved forms (challenge/response) ARE the way to go at the moment. Content filtering is fighting the wrong battle as I have seen it practiced at my workplace.
I rely on Mailblocks for all personal mail and it has utterly eradicated spam (zero spams in 0 months... yeah, that's eradication for a guy receiving 200 a day). While I can imagine ways to circumvent it (though perhaps not profitably), I have real trouble seeing any other unilateral option that performs to this level of satisfaction while requiring no administration or wizardry from the user and no cajoling of sysadmins/ISPs to buy into a platform.
Earthlink has a similar challenge-response system, but I'll briefly touch on the wrinkles of Mailblocks and why I feel it has elevated whitelists above the admin-heavy yokes they can be in unadorned form.
1. Successful response requires typing the letters/digits shown in an image, as seen elsewhere today. This has yet to fall to hacking, and if it is hacked, it can adapt to defeat the attack. For ONCE, this places the onus of developing hard technology on the spammers and not on those trying to stop spam (think of the annoying trend of sending spam in which images display the pitch as text).
2. Email addresses you send email to are, by default, automatically added to your personal whitelist if not already explicitly listed on your blacklist (which is generally not needed though you should add your own address to it).
3. You can pre-seed the list by uploading your contact list to avoid having your transition to c/r become an imposition on those you already communicate with.
4. People who clear a challenge/response when communicating with ANY Mailblocks customer are added to a common whitelist. This means people should see only one c/r, not one per Mailblocks customer they correspond with.
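The four mechanics above can be sketched as a small state machine. This is a hypothetical model of the behaviour as described, not Mailblocks' actual implementation; the image challenge itself is left out, and the common whitelist is modelled as one set object passed to every customer's filter.

```python
class ChallengeResponseFilter:
    """Hypothetical model of a c/r whitelist with the four behaviours listed."""

    def __init__(self, shared_whitelist):
        self.whitelist = set()
        self.blacklist = set()
        self.shared = shared_whitelist   # one set shared by all customers (4)
        self.held = {}                   # sender -> messages awaiting a response

    def import_contacts(self, addresses):  # (3) pre-seed from an address book
        self.whitelist.update(a.lower() for a in addresses)

    def on_outgoing(self, recipient):      # (2) people you write to are trusted
        r = recipient.lower()
        if r not in self.blacklist:
            self.whitelist.add(r)

    def on_incoming(self, sender, message):
        s = sender.lower()
        if s in self.blacklist:
            return "drop"
        if s in self.whitelist or s in self.shared:
            return "deliver"
        self.held.setdefault(s, []).append(message)
        return "challenge"                 # (1) send the image-based challenge

    def on_challenge_passed(self, sender):
        s = sender.lower()
        self.shared.add(s)
        return self.held.pop(s, [])        # release the held messages
```

Note that nothing here verifies the sender address, so any whitelisted address can be forged; that is the frailty discussed next.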
Other wrinkles are nice (e.g.: keep your old email addresses), but not fundamental to the anti-spam abilities.
I would not say that the system is perfect now, that it has no issues, or that it is ideal for the long term. There are ways it can improve further, and if it became very widely used, its fragilities would become a dedicated focus for attack by spammers.
The primary frailty I see for Mailblocks comes in the form of the following example:
Earthlink's c/r service sends its challenges NOT with the subscriber's email address as the "from" or "reply-to" address, but instead claims to be from "automated-response@earthlink.net". This requires me to add an explicit whitelist entry for it, and that becomes carte blanche for spammers to reach me simply by forging it as their sending address.
tone
This gibberish from email messages is now being recycled by a whole cadre of avant-garde poets into "found" poems:
http://www.boston.com/news/globe/magazine/articles/2004/01/04/spam%5Fpoets/
http://poetry.about.com/b/a/055812.htm
Actually, aleatoric methods for generating poetry have been around since Dada (they used to literally pull words out of hats as a randomizing algorithm...). These guys are just piggybacking on the spamming hash software.
Only the few messages that get through trigger the black-hole effect.
The majority of the messages would stay on the sender's server and have to be dealt with by that admin.
Besides, this SHOULD hamper the sender's server as it tries again and again and again to connect to your server (which refuses every connection). All those unsuccessful threads will slow down how much spam can be sent from that server in a given time frame.
Well, there goes about 90% of (legitimate) e-mail ;-)
(and, of course, IRC is so totally gone!)
Great minds think alike; fools seldom differ.
I've probably received more of those than I know about, because I always make a quick first pass through my inbox to trash all of the messages that come from total strangers yet have Subject: lines written as if to a close friend. (Along with the fake 147kB "delivery failure" messages and the like.) I haven't thought of a good automated way to detect those yet, but the good old manual method is not too burdensome and I can't recall when last I actually *read* something that turned out to be UCE without being pretty sure in advance that that was what it was. (Sometimes I like to get my jollies by seeing what these losers are up to.)
The [SPAM] was inserted by a spam checker. It wasn't in the original message. I think it's SpamAssassin suitably configured, but I could be wrong.
No that's not good enough. According to RFC 2045, the multi-part e-mail should contain a body part and an alternative 7bit ASCII part.
Theoretically, if the e-mail is legit, the bare contents of the body should match the contents of the 7bit ASCII part. The problem is multi-byte content in the body part: how does that compare to 7bit ASCII? So the comparison would have to be fuzzy to some degree.
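A fuzzy comparison of the two alternatives might look like this. A minimal sketch using the standard `email` and `difflib` modules: it pulls the `text/plain` and `text/html` bodies of a `multipart/alternative` message (the type is specified in RFC 2046), strips the markup, normalises whitespace and case, and returns a similarity ratio between 0 and 1. The function name and any threshold applied to the ratio are assumptions.

```python
import difflib
import email
from email import policy
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def _normalise(text):
    return " ".join(text.split()).lower()


def alternative_similarity(raw):
    """Similarity (0..1) of the plain and HTML bodies, or None if one is missing."""
    msg = email.message_from_string(raw, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    if plain is None or html is None:
        return None
    parser = _TextOnly()
    parser.feed(html.get_content())
    parser.close()
    return difflib.SequenceMatcher(
        None, _normalise(plain.get_content()), _normalise("".join(parser.parts))
    ).ratio()
```

A legitimate mailer should score close to 1.0, while spam whose plain part is unrelated filler should score low. `get_content()` decodes each part to Unicode using its declared charset, which sidesteps much of the multi-byte concern.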
Wealth is the product of man's capacity to think. -Ayn Rand
Chromatin! Who knew!? Cell-biologist spammers!
i am a soviet space shuttle
What if the "random" words were actually a hidden communications channel?
One known method of defeating traffic analysis is to send a continuous stream of junk from random locations to random destinations, and, at the right moment, insert the real payload into the random stream.
The constant stream of spam, esp. when combined with this seemingly random gibberish set of words, is a great way to hide real communication from traffic analysis.
If the NSA were to do effective traffic analysis on a worldwide scale, they would have to monitor an enormous amount of spam, and this could even amount to a DDoS of their surveillance software.
So, Mr. Ashcroft: Spammers are (helping) terrorists! Wouldn't it be time to change your CAN spam law to a CANNOT spam law (just to be sure) and start prosecuting those criminal enemy combatants?
And who knows? Napster-NG (new generation) could also be built on top of that great anti-traffic-analysis spam network. RIAA sheriffs, are you there?
cpghost at Cordula's Web.
Or emails from teenagers, who now write in all situations as if they were on IM.
All's true that is mistrusted
Sure, companies can afford expensive services or set up complicated rerouting. But what about those of us who would like simply to host a domain (or have one hosted for us by an ISP)?
I'm using several different anti-spam measures, and still a bunch get through.
What really ticks me off is that I have a couple of really sweet domains which are literally unusable due to spam. With inadequate filters, I can't tell the spam from the legit stuff; with filters that are too good, legit email gets bounced.
I'm using Fastmail.fm for a couple of those otherwise unusable domains. It blocks about 95% of the spam with my current settings and custom sieve rule set. But even one still ticks me off.
...because PopFile does it for me now. Well, technically Outlook is doing the deed, but PopFile is the one issuing the orders. These random words of which the article speaks really aren't random at all: they're CHOSEN, and they haven't fooled PopFile's Bayesian algorithm at all in months. Since last summer, its accuracy has climbed to 99.39% today; at the beginning of this month I finally changed my Outlook spam rule from "move" to "delete", so I don't even have to bother at all now. So let the spammers try some new trick: I'll teach it to PopFile once or twice and never have to worry about it again. Never having to use the [delete] key again on spam? Heh... I could get used to being this lazy.
I was on edge reading that, thinking that the new Bayesian filter system I'm using, and singing the praises of, was now useless. But then this line came later in the article: "Baxter and Linford said that spammers' use of hash busting is definitely on the rise, but such tricks can rarely circumvent a well-trained Bayesian filter."
Whew.
Back to singing its praises...
"Artificial Intelligence usually beats real stupidity."
I have been creating spam-detection software for many years now. My rules act on structure and metadata, not on content, never on content. All you need is procmail and sed.
With SpamAssassin you can achieve about the same result as I do, but disable all content-based rules, because they don't scale and they get worked around. Even checks on the geographical origin of the URLs in a message won't survive.
Detecting spam by content is (and always has been) a dead-end street. When Bayesian filters came around, I just thought: oh no, another weak spot.
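Rules of that kind look at headers and MIME structure rather than words. A minimal sketch in Python rather than procmail and sed; the specific checks are hypothetical examples, and each can also be tripped by some legitimate mailers, so in practice they would feed a score rather than decide alone.

```python
import email


def structural_flags(raw):
    """Hypothetical header/structure checks; none of them read the body text."""
    msg = email.message_from_string(raw)
    flags = []
    if "Message-ID" not in msg:
        flags.append("no-message-id")
    if "Date" not in msg:
        flags.append("no-date")
    if "To" not in msg:
        flags.append("no-to")
    if msg.get_content_type() == "text/html":
        flags.append("html-only")
    return flags


print(structural_flags("From: a@example.com\nContent-Type: text/html\n\n<p>hi</p>\n"))
# ['no-message-id', 'no-date', 'no-to', 'html-only']
```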
IPv6 could be the next anti-spam problem: there are just too many IP addresses to blacklist.
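The arithmetic behind that worry, as a quick check with Python's standard `ipaddress` module (the `2001:db8::/64` prefix is from the documentation range):

```python
import ipaddress

ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
one_v6_subnet = ipaddress.ip_network("2001:db8::/64").num_addresses

print(ipv4_space)                   # 4294967296 (2**32)
print(one_v6_subnet)                # 18446744073709551616 (2**64)
print(one_v6_subnet // ipv4_space)  # 4294967296
```

A single /64, the usual size of one IPv6 subnet, holds 2^32 times as many addresses as the whole IPv4 Internet, so per-address blacklists stop scaling.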
I have been using Spam Arrest for many months and have not gotten a single spam. FYI.
Face it: the problem is that email was never designed to be let out of trusted networks.
Don't use it. Get a domain name, set up a server and use your own secured apps. Communicate over forums, wikis or any means that does not involve using that protocol, which is synonymous with opening your mouth and attaching it to an industrial-strength garbage disposal.
Either shut up (and shut off) or swallow. It's not getting any better. Email is a dinosaur and as broken as the RIAA.
Whitelists are effective for those of us not trying to build a customer database out of emails, assuming no man-in-the-middle DNS lookup/reverse DNS lookup attack.
Quandary ...
I haven't read up on Yahoo's initiative to use "Domain Keys", whatever it turns out to be, but it has always seemed to me that SPAM is really just user error, except that it's on the part of the ISPs and other deployers of email servers.
Please forgive my ignorance, if present, but can someone please analyze the current paradigm and tell me why we don't just change the damn paradigm on email altogether?
Here's what I think would work:
1) Instead of giving out your email address, so that ANYONE can send you ANYTHING, legal or not, give them the URL to your email signup page, which has a CAPTCHA feature like any good signup page does.
2) Your simple website, and you know this would be simple, maintains the list of 'acceptable' email addresses and spurns messages from all others. You import your current address book to bring the new system up to date, then do the familiar mass mailing informing your contacts that you have switched to the new paradigm.
3) Your site receives visitors who wish to email you. They fill out the form, identify the captcha image, and provide you with their email address. Additionally, they provide a short, text-only message that you receive along with their signup request.
4) Now, I realize I have introduced another signup hurdle to the user's experience, but from another perspective, they will feel it is worth it to conquer/prevent SPAM.
5) You receive the request, visit the site of the requester, do whatever you want - then add their address to the "OK" list, or not. This puts you in the position of detective - but, whether you choose to investigate (whether they are spammers) or not, you can always remove them from the list as an "abuser".
6) On your server's side, you run a script that changes your email address every so often. Your email address is always hidden, and ever-changing, all to the extent to which you can prevent people from hacking in.
7) The web-signup/homepage/email page concept becomes mainstream and everyone is happy, and some more work exists (for a little while) for web-monkeys.
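Step 6 can be done without storing a list of old addresses. A minimal sketch, assuming your server can accept mail for arbitrary `user-tag@domain` local parts: derive the tag from an HMAC of a secret key and the current time window, and accept the current and immediately previous addresses so mail in flight isn't lost. The address format, the one-week period and the function names are all hypothetical.

```python
import hashlib
import hmac
import time

WEEK = 7 * 24 * 3600  # hypothetical rotation period


def address_for(secret, user, domain, now=None, period=WEEK):
    """Derive the address valid in the time window containing `now`."""
    t = time.time() if now is None else now
    window = int(t // period)
    tag = hmac.new(secret, f"{user}:{window}".encode(), hashlib.sha256).hexdigest()[:8]
    return f"{user}-{tag}@{domain}"


def accepts(address, secret, user, domain, now=None, period=WEEK, grace=1):
    """Accept the current address and the `grace` previous ones."""
    t = time.time() if now is None else now
    return any(address == address_for(secret, user, domain, t - k * period, period)
               for k in range(grace + 1))
```

Approved correspondents would still need to learn each new address, for example from an automatic notice sent to everyone on the "OK" list.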
Ok. Does this suck? Please explain. Thank you!
Stuff that matters.