Filter-foiling Gibberish Becoming A Spam Staple

gibberish... by gui_tarzan2000 · 2004-01-13 14:17 · Score: 4, Funny

They keep spamming and we keep deleting... OH THE HUMANITY!

--
Have you hugged your penguin today?

Re:gibberish... by flewp · 2004-01-13 14:31 · Score: 4, Funny

I never delete my spam. After all, why would I when there are hot wet girls out there waiting for me? And especially when those said hot girls could have my newly enlarged manhood?

--
WWJD.... for a Klondike bar?
Re:gibberish... by Alyeska · 2004-01-13 14:37 · Score: 4, Insightful

Worse yet, they keep spamming. Someone keeps buying from spam.
Re:gibberish... by smacktits · 2004-01-13 15:06 · Score: 1

and breasts. and don't forget they'll want all the money you got from helping some african dictator transfer his billions.
Re:gibberish... by Admiral+Llama · 2004-01-13 15:17 · Score: 1

I use Active Spam Killer. If I don't know you, and you don't respond, then your mail dies.
Re:gibberish... by Ophidian+P.+Jones · 2004-01-13 15:19 · Score: 2, Interesting

Worse yet, they keep spamming. Someone keeps buying from spam.

Why was this marked Redundant?

Maybe I missed someone else pointing this out, but it's a very important point. The spammers will only stay in business until it's no longer profitable. The technological solutions beat the legislative ones right now, but getting the word out to people that buying from spammers only encourages spam would really help too.
Re:gibberish... by Mr+Z · 2004-01-13 17:40 · Score: 3, Interesting

Actually, I avoid deleting my spam. I have an archive now of over 270MB of spam that I can use for a training set for whatever filter I might intend to deploy.

That archive has more than just spam, mind you. It also has all the virus/worm email I've received over the years as well, such as the "Internet Email System" informing me of an undeliverable message, or "Microsoft Corporation" providing me a convenient, easy to click "December 2003 Internet Update" or whatever.

*sigh*
--Joe

--
Program Intellivision!
Re:gibberish... by 1u3hr · 2004-01-13 23:34 · Score: 3, Interesting

Someone keeps buying from spam.
Not necessarily. I'm sure most of those people (had to backspace over a few epithets) who spam Make Money Fast either lose money or get into legal trouble. But the damage is done (to me) before they learn that it won't make money. I think the driving force is selling spam services to gullible clients like these. (Not including the industrious Nigerians who seem to take a more personalised DIY approach.) Even if someone DID want penis-enlarging cream, I think by now they'd have a source of supply, that market must be pretty saturated by now.
Re:gibberish... by JohnWiney · 2004-01-14 02:36 · Score: 1

It's not the users of spam that have to make money, it is the sellers of spamming tools, spamming systems, spamming internet connections, etc. Mostly, the people actually trying to make money from the messages are the real losers in this whole scheme.
Re:gibberish... by log0n · 2004-01-16 08:23 · Score: 1

I highly doubt all that many people fall for spam.. it's more likely that the effort involved to spam is really not that much.

Some people do things to be annoying just because they get off on annoying people. Probably the same thing with spammers. If it's easy and they feel satisfied, why not?

I've seen this before.... by Anonymous Coward · 2004-01-13 14:18 · Score: 1, Funny

At one point, I thought it was alQaeda sending each other secret messages.

Then I realized...everyone in the world was getting these things.

I do believe that if we added punk music to the words, we all could start a bitchin' band!

Re:I've seen this before.... by Apu · 2004-01-13 14:35 · Score: 1

You never know. Basically, it would just be steganography using words instead of images, movies, sound files, etc.

After all, even if you wanted to buy cheap Viagra, are you really going to buy it from an e-mail advertising "80% Less for Vl@GRA! 2.75$ today x bdxgn wcybx x"? Maybe if you put together the 16th word out of every V1@GRA e-mail, and formed a sentence, you would find the plans for their next attack.
Re:I've seen this before.... by larry+bagina · 2004-01-13 14:54 · Score: 2, Funny

No, that's not it. just look at the fake html tags they use.

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:I've seen this before.... by mwood · 2004-01-14 05:07 · Score: 1

You have indeed seen it before. In _Endgame Enigma_ the Good Guys communicate by injecting intentionally corrupt packets into a comm. link run by the Bad Guys. The Bad Guys' gear tosses the bad packets and asks for a retransmit; the Good Guys accept them and extract the perfectly good content.

I do hope that someone at NSA is sifting through all that junk that the rest of us just throw out....

The next step by Anonymous Coward · 2004-01-13 14:18 · Score: 0

The next obvious step: a good grammar checker.

Gibberish no more!

Re:The next step by Anonymous Coward · 2004-01-13 14:23 · Score: 1, Funny

Are you kidding? Spammers have better grammar than most posters here on Slashdot! :)
Re:The next step by PReDiToR · 2004-01-13 14:42 · Score: 1

A grammar checker wouldn't work, because most of the spam I get has entire pages taken at random from published works, usually obscure texts.

A grammar checker would do better to integrate the texts into its own rule system as a way to write properly.

--

Do not meddle in the affairs of geeks for they are subtle and quick to anger

W@.n7 A B37t.er J0b.? millions by borwells · 2004-01-13 14:18 · Score: 0, Redundant

I don't know who the marketing genius is that thinks I am going to buy something advertised in an email with this subject. Seriously, is anyone buying stuff from the "new" spam email with all of the gibberish characters in the subject and body?

--
"We can't solve problems by using the same kind of thinking we used when we created them."

Re:W@.n7 A B37t.er J0b.? millions by danidude · 2004-01-13 14:24 · Score: 1

Seriously, is anyone buying stuff from the "new" spam email with all of the gibberish characters in the subject and body?
Well, there probably are. I mean, if there are people stupid enough to stick their penis into a device advertised from unknown sources, buying things from email with gibberish is nothing.
Seriously, spammers spend a great amount of time and money to make a living from SPAM. This only makes sense if there are people buying their stuff.

--
- no sig.
Re:W@.n7 A B37t.er J0b.? millions by jbplou · 2004-01-13 14:26 · Score: 2, Informative

Some moron buys something. It only takes one sale for every million emails to make it worth it for them, since they can send out millions per day and we know there is a sucker born every minute.
Re:W@.n7 A B37t.er J0b.? millions by Anonymous Coward · 2004-01-13 14:27 · Score: 0

They must be, because if no one was buying this crap, the spammers would stop using this technique.
Re:W@.n7 A B37t.er J0b.? millions by Anonymous Coward · 2004-01-13 15:23 · Score: 1, Funny

I mean, if there are people stupid enough to stick their penis into a device advertised from unknown sources, buying things from email with gibberish is nothing.

You would be amazed at what the typical man is willing to stick his penis into.
Re:W@.n7 A B37t.er J0b.? millions by Anonymous Coward · 2004-01-13 22:12 · Score: 0

Yeah, like a hole that period comes out of! Bloody {literally!} heterosexuals make my stomach turn!

[ADV] by VAXGeek · 2004-01-13 14:18 · Score: 5, Funny

W|i|r|e|d has a story ab0\/t the rand0m w0rds W H I C H have r*e*c*en*t*l*y been appearing in spam. Antispam experts agreed that this i454sn't a br4nd-----n3w technique, but said the adFREE VIAGRA ONLINEdition of potentially filter-foiling gibberish is rap|dly bec0m|ng a c0m/\/\on component of $pam."

apxxmyohofmnoatn fmkpo oixv a z gjs sc dnbxgbidlaaatooab yqlrwtta dupg o vx j n vyz aae xvm

--
this sig limit is too small to put anything good h

Re:[ADV] by mobby_6kl · 2004-01-13 14:24 · Score: 0

don't forget some random punctuation:
Fil,te,r-f...o,i.lin.g Gi.b.b.e.r,i,.,s.h B,.ec.o.mi.n,g A S,p.am Sta.p,le
Re:[ADV] by nolife · 2004-01-13 15:27 · Score: 1

f0r 7h3 5cRiP7 KiDdI3 @Nd HaXoR 5P3@k imp@r3d, I 5ugg357 7Hi5 5I73.
4|\|07h3r 0|\|3 70 7ry 15 h3r3

--
Bad boys rape our young girls but Violet gives willingly.
Re:[ADV] by zcat_NZ · 2004-01-13 15:43 · Score: 4, Funny

The Reg!st3r h4s a r4th3r @mus!ng t@ke on teh wh0le situ.ation a$ weII.

--
455fe10422ca29c4933f95052b792ab2
Re:[ADV] by c1ay · 2004-01-13 15:57 · Score: 1

Or use the sepllnig ticrk we laerend a wihle bcak and frorawd all yuor sapm to contact@ataconnect.org

--
Re:[ADV] by NTworks · 2004-01-13 17:41 · Score: 1

You know what's really scary? I could read the grandparent post easily, only having to 'double-take' a word maybe once, and I read it the first time at probably 90% of my usual reading speed (which is pretty quick already, being a Slashdot junkie etc.)

Show that post to an old person, and they would balk at trying to decipher it. Is being able to easily read obfuscated computer text an environmental factor of the young tech generation? Someone should do a study...
Re:[ADV] by Anonymous Coward · 2004-01-13 17:59 · Score: 0

Y0u mi55p3ll3d 'imp@ir3d'.
Re:[ADV] by VertigoAce · 2004-01-13 21:19 · Score: 1

I received this one today. It seems to be targeted toward dyslexic people. Oddly enough, the message is most easily readable if you only quickly glance over it (as you might if it slipped past a filter and you were checking if it was spam).

STILL NO LUCK ENRGAILNG IT?

Our 2 pcodruts will work for you!

1. #1 Spupelment aavilable! - Works!
ETNER HERE

and

2. *New* Enahncement Oil - Get hard in 60 seocnds! Amzaing!
Like no ohter oil you've seen.
ETENR HERE

the 2 prdoucts work gerat togteher

FOR WOEMN ONLY: TOCUH HERE
Re:[ADV] by spiny · 2004-01-13 21:56 · Score: 1

thats just brilliant :)

if i had mod points ....

--

Fry: heh, Yakov Smirnoff said it
Leela: No he didn't.
Re:[ADV] by sandbagger · 2004-01-14 03:06 · Score: 1

Hi:

Easy. Create a filter that looks for any instance of five consonants in a row. There are very few of those in the English language.

--Sandbagger

--
---- The above post was generated by the Turing Institute. Maybe.
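The consonant-run idea in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "consonant" is taken to mean any ASCII letter other than a, e, i, o, u and y, and the function name is made up for the example.

```python
import re

# Assumption: a "consonant" is any ASCII letter except a, e, i, o, u, y.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-xz]{5,}", re.IGNORECASE)


def has_consonant_run(text):
    """Return True if text contains five or more consonants in a row."""
    return CONSONANT_RUN.search(text) is not None


print(has_consonant_run("2.75$ today x bdxgn wcybx x"))   # letter-level gibberish
print(has_consonant_run("Please allow 48-72 hours for removal"))
print(has_consonant_run("witchcraft"))                     # a real English word
```

Ordinary words such as "witchcraft" also contain a five-consonant run, so a rule like this probably works better as one weighted signal than as a hard reject.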
Re:[ADV] by mwood · 2004-01-14 05:10 · Score: 1

Ah, another filter I need to get around to building: delete *all* punctuation and rescan from the top of the rule set. Or maybe just rip it all out before even starting to filter.
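A minimal sketch of the "rip out all punctuation before filtering" idea, assuming ASCII punctuation only (Python's `string.punctuation`); the function name is hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Remove ASCII punctuation so that split-up words collapse back together."""
    return text.translate(_PUNCTUATION_TABLE)


print(strip_punctuation("V*I*A*G*R*A"))
print(strip_punctuation("Fil,te,r-f...o,i.lin.g Gi.b.b.e.r,i,.,s.h"))
```

This also strips the "@" from e-mail addresses and the hyphens from legitimate compounds, so the stripped text is probably best scanned in addition to the original rather than instead of it.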
Re:[ADV] by meiocyte · 2004-01-14 06:25 · Score: 1

I often get emails from a spam archcriminal..he's trying to sell a filmstrip of his offspring doing the backstroke while wearing only a nightshirt. Your filter would work like witchcraft; it'd really put the thumbscrews on this guy...

--
The thing in the box has no place in the language-game at all; not even as a something; for the box might even be empty.
Re:[ADV] by bhtooefr · 2004-01-14 14:37 · Score: 1

http://science.slashdot.org/science/03/09/15/2227256.shtml?tid=134

Taht eplaxins it all.
Re:[ADV] by sandbagger · 2004-01-15 04:19 · Score: 1

Very fair; it'd halt gibberish working at the letter level but not gibberish working at the word level.

--
---- The above post was generated by the Turing Institute. Maybe.
Re:[ADV] by Anonymous Coward · 2004-01-16 13:12 · Score: 0

How the FUCK did that get through the lameness filter?!

Well... by i_am_syco · 2004-01-13 14:19 · Score: 4, Interesting

A lot of the time that "random gibberish" comes in the form of a story or something. Hell, a while ago I got a spam that contained a few excerpts from The Raven by Edgar Allan Poe. I got a laugh out of that one.

Re:Well... by Anonymous Coward · 2004-01-13 14:51 · Score: 0

At least they're not violating anyone's copyright (Poe's work is out of copyright - methinks our spammers may have downloaded the Project Gutenberg CD).
Re:Well... by bluesky74656 · 2004-01-13 14:57 · Score: 1

I've been getting a bunch that have no subject, which I thought was strange, and I just got one that either had no author, or was written by -.

--
This page was generated by a Flock of Attack Kittens for you.
Re:Well... by Anonymous Coward · 2004-01-13 15:37 · Score: 0

No shit, I get these every day. Usually they are public domain texts (like Poe) or shit like short stories lifted straight from someone's web site, with no credit given to the author of course. Here's one I just got, according to google this was lifted from here.

This shit is really.. weird... sometimes I learn from it though, it's like something from an old textbook, or an old book I read in high school.

Here it is:--

It is believed by many that Artic tourism will spread a general concern for the environment. There is no denying that if tourism is not controlled people will destroy what they have come to see. Tourism will alway clash with conservation and it is many peoples opinion that tourism should be stoped in the Artic altogether, but if there is money to be made someone will be there to provide the
service.

Conclusion.

Human's have had a great deal of impact on the Artic environment. Mining, tourism bioaccumulation and transboundry pollution mean that this land is a great threat. Tourism is the latest threat with huge potential for damage.

The Artic is one of the few unspoilt wilderness areas in the world and must be conserved.
Sweden, one of the "three fingers" of Scandinavia, is just larger than the state of California. It covers 173,731 square miles (449,964 square kilometers). From the northern tip to the southern tip it is about 1,000 miles. Thousands of tiny islands line the coast.
Mountains form much of the northwest, but most of Sweden is relatively flat with some rollling hills. Many rivers flow from the mountains through the forests and into the Balitc Sea. Sweden is dotted with lakes, which, with the rivers, provide ample water for the country.
More than half of the land is forested. North of the Arctic Circle, winters ar long and relatively cold while summers are short and pleasant.
But summer's "midnight sun" makes the days long. Although Sweden is located far to the north, most of the country has a relatively temperate climate, moderated by the warm Gulf Stream. July temperatures in
Stockholm average sixty four degrees ferenheit.
Sweden has been inhabeted for nearly five thousand years and is the home of the Gothic peoples who battled the Roman
Empire. In the ninth century, Rurik, a semilegendary chief of the
Swedes, is said to have founded Russia. Christianity was introduced in the 11th century adn adopted by the monarchy. During the 20th century, neutrality and nonalignment were cornerstones of Sweden's foreign policy, keeping it out of both world wars and allowing it to transform its rather poor society into a prosperous social welfare state. The
Socila Democratic Party dominated politics and led every government until 1976, when it's rule was interrupted until 1982. With the end of the Cold War, and increased European Union in 1995.
Sweden's image as a peaceful, egalitarian society, with relatively low crime, was shaken in 1986 when Prime Minister
Olof Palme was assassinated on the streets of Stockholm. Palme was succeeded by Ingvar Carlsson of the Social Democratic Party. After rejection of his austerity package in 1990, Carlsson resigned and led a minority government until elections in 1991.
Re:Well... by Trepalium · 2004-01-13 16:44 · Score: 1

I keep getting slightly mangled 'Alice in Wonderland' excerpts at work. What recently bothered me is the fact they're also using the Habeas header, and I had spamassassin set to local only tests, and it was actually auto-learning these spams as legitimate e-mail because the value for the Habeas header was so high. And because I'm using Exchange internally, I can't even extract the message to have SA relearn it as spam. Stupid MIME-OLE crap.

--
I used up all my sick days, so I'm calling in dead.
Re:Well... by Anonymous Coward · 2004-01-13 17:50 · Score: 0

Any system where you're depending on another to claim that a particular piece of e-mail is "good" is going to be prone to this type of abuse.

Negative-value headers are preferable, because a spam won't say "I'm a spam!" voluntarily.

Spamkiller doesn't care by Frisky070802 · 2004-01-13 14:19 · Score: 5, Interesting

My McAfee Spamkiller ignores the white noise, and simply nukes all the mail containing viagra, etc.

--
Mencken had it right. So glad that's old news.

Re:Spamkiller doesn't care by fo0bar · 2004-01-13 14:30 · Score: 5, Insightful

My McAfee Spamkiller ignores the white noise, and simply nukes all the mail containing viagra, etc.
What good is that when somebody spams you for Gen3r@c v|agar@?
Re:Spamkiller doesn't care by LostCluster · 2004-01-13 14:31 · Score: 0, Redundant

Yeah, but the point is to avoid using the word Viagra correctly, instead putting in strings like "V*I*A*G*R*A", "V14Gr4", "V - I = A - G = R - A", and anything else they can think of to try to avoid string traps.
Re:Spamkiller doesn't care by sketerpot · 2004-01-13 14:37 · Score: 1

I wonder how many versions of the word "viagre" it is possible for a spam to use? Plus, I imagine most of them would be dead meat in front of heuristics like "words containing n@sty symbols in the middle are bad". In the end, I think those techniques will fall to spam filters. After all, haven't we got the spammers outnumbered? Or at least outbrained?
Re:Spamkiller doesn't care by maeka · 2004-01-13 14:37 · Score: 1

There has been an ongoing discussion about just these types of spams in the forums of the excellent Bayesian filter POPFile. If the gibberish filled spam doesn't randomly happen to have one of the words your corpus recognizes as "good" or "clean" the spam shouldn't get through. The larger your corpus (total collection of classified words) gets, the more likely this is to happen. A good Bayesian email filter should be able to operate on a relatively small corpus, keeping track of only those words that are most unique to your email load, and thus not be fooled by a spam which is little more than an image and fifty lines of text copied from some random source.
Re:Spamkiller doesn't care by larry+bagina · 2004-01-13 15:00 · Score: 1

words containing n@sty symbols? Yeah, I bet deleting every message with a '@' in it would cut back on your spam. Right, sketerpot@nOSPaM.chase3000.com?

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:Spamkiller doesn't care by dmd · 2004-01-13 15:12 · Score: 1

What one really has to wonder about is that this has become an arms race for the sake of an arms race. Does anyone really imagine that someone who's gone to the trouble of filtering spam out will suddenly receive this amazing offer for Gen3r@c v|agar@ and think to himself "oh! I sure am glad that one slipped through - I'm going to buy some v|agar@ now!" ?
Re:Spamkiller doesn't care by Nogami_Saeko · 2004-01-13 15:13 · Score: 1

I've been using POPFile for over a year here, and even random-word (or gibberish spam) VERY rarely makes it through.

In fact I can count the number of spam messages that POPFile has mis-classified in the last 6 months on my fingers... On one hand... Using less than 5 fingers. I'm pretty careful with giving out my real email address, so I usually only get 4 or 5 spams a day.

That said, my current classification accuracy is 98.84% - which means that spam just isn't an issue for me anymore :)

N.

--
"Nothing strengthens authority so much as silence." - Charles de Gaulle
Re:Spamkiller doesn't care by wideBlueSkies · 2004-01-13 15:27 · Score: 1

>>What good is that when somebody spams you for Gen3r@c v|agar@?

True, but let me ask you this: Who in their right mind would buy something based on an email with such horrendous spelling?

The spammers can't go too far with this stuff because they'd eventually start to stifle their sales.

What I mean is that good consumer instincts should kick in and tell the reader that something is very wrong with the sender because of the junk-like appearance of the advertisement.

But then again, Joe Sixpack and Jane Astrology aren't all that smart.......

wbs.

--
Huh?
Re:Spamkiller doesn't care by Tablizer · 2004-01-13 15:36 · Score: 1

viagra
v1agra (one)
vlagra ("L")
v iagra
via gra
v1a gra
viaggra
viagrow
vagria
veagra
veaggrah
v iahgra
viiagra
vaigra (dyslexic viagra for those who can't tell their dick from their ass :-)
etc...

--
Table-ized A.I.
Re:Spamkiller doesn't care by Xtraneous · 2004-01-13 15:46 · Score: 1

Damn! I could use some "v pipe agar at" right now, couldn't you?

--
.noitacidem deen uoy siht daer nac uoy fI
Re:Spamkiller doesn't care by rgmoore · 2004-01-13 15:50 · Score: 4, Insightful

I'm pretty sure that the big worry is about third party filtering. If I install a spam filter, that means that I don't want to see spam and am unlikely to buy something advertised therein. If my ISP installs a spam filter, it removes spam for everyone, including the idiots who might actually buy something from a spammer. Since my ISP theoretically might be using the same technology in their filter that I'm using in mine, it would still make sense for the spammer to work on defeating my filter.

--
There's no point in questioning authority if you aren't going to listen to the answers.
Re:Spamkiller doesn't care by MoebiusStreet · 2004-01-13 15:55 · Score: 1

From the article:
The addition of seemingly nonsensical words is aimed at confusing the antispam filters that incorporate Bayesian analysis techniques, such as SpamBayes and SpamAssassin.

Umm. SpamAssassin isn't Bayesian, it's rule-based. Someone needs better research.
Re:Spamkiller doesn't care by Moofie · 2004-01-13 15:55 · Score: 1

Who in their right mind would buy ANYTHING from a random crazy person who emails them? PARTICULARLY pharmaceuticals?

There are no "good consumer instincts".

--
Why yes, I AM a rocket scientist!
Re:Spamkiller doesn't care by K-Man · 2004-01-13 16:09 · Score: 3, Interesting

Let's see:

Gen3r@c v|agar@
Gener@c v|agar@
Generic v|agar@
Generic viagar@
Generic viagr@
Generic viagra

That's an edit distance of 5, pretty large, but still findable with a little approximate matching, especially if it's weighted, to recognize the similarity between @ and a, or i and |.
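The count above can be checked with a small Levenshtein implementation. The sketch below is an illustration, not the poster's code: the lookalike table and the 0.5 cost for substituting a lookalike character are assumptions chosen to show what "weighted" matching could mean.

```python
def edit_distance(source, target, lookalikes=frozenset(), lookalike_cost=0.5):
    """Levenshtein distance; substituting a listed lookalike pair costs less."""
    previous = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        current = [float(i)]
        for j, t in enumerate(target, start=1):
            if s == t:
                substitution = 0.0
            elif (s, t) in lookalikes or (t, s) in lookalikes:
                substitution = lookalike_cost
            else:
                substitution = 1.0
            current.append(min(
                previous[j] + 1.0,               # delete s
                current[j - 1] + 1.0,            # insert t
                previous[j - 1] + substitution,  # match or substitute
            ))
        previous = current
    return previous[-1]


# Hypothetical table of visually similar characters.
LOOKALIKES = frozenset({("@", "a"), ("|", "i"), ("3", "e")})

print(edit_distance("Gen3r@c v|agar@", "Generic viagra"))
print(edit_distance("Gen3r@c v|agar@", "Generic viagra", LOOKALIKES))
```

With unit costs the distance is 5, matching the step-by-step list; with the assumed lookalike table the same pair scores lower, which is what would let a threshold separate disguised words from genuinely different ones.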

Most spam contains repeated phrases 40+ characters long. The mistake is to use word-counting techniques which ignore phraseology.

For instance, here are some phrases from spam, circa one year ago:

Please fill out the form below for more information
To unsubscribe
To remove your
in the Marshall Islands
Please allow 48-72 hours for removal
to this email with REMOVE in the
the Northern Ratak
the information
thousands of dollars
that you will
this list, please
this advertisement
this email in error
this message, you may email our
this transaction
of thousands of
of EnenKio and
of Eneen-Kio Atoll
of His Majesty
our mailing list
out 5,000 e-mails each for a
opportunity to make

--
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
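One way to look for phrases rather than single words is to count word n-grams that recur across messages. The sketch below is an assumption-laden illustration: it uses three-word windows instead of the 40-character threshold mentioned in the post, and the function names and sample messages are invented for the example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the set of n-word phrases that occur in text (lowercased)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def repeated_phrases(messages, n=3, min_messages=2):
    """Return phrases that appear in at least min_messages different messages."""
    counts = Counter()
    for message in messages:
        counts.update(word_ngrams(message, n))  # each message counts a phrase once
    return {phrase for phrase, seen in counts.items() if seen >= min_messages}


sample = [
    "Please allow 48-72 hours for removal from our mailing list",
    "To unsubscribe please allow 48-72 hours for removal",
    "Lunch at noon tomorrow?",
]
print(sorted(repeated_phrases(sample)))
```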
Re:Spamkiller doesn't care by letxa2000 · 2004-01-13 16:16 · Score: 5, Interesting

The encoding V*I*A*G*R*A would break out to the letters V I A G R and A.
V: 76.9% Spam score
I: 47.2% spam score
A: 68.8% spam score
G: 72.2% spam score
R: 72.2% spam score
On balance, if I get a message with the individual "words" of V, I, A, G, R, and A, that's going to be leaning towards spam.
That's the beauty of Bayesian. Anything the spammers do will eventually come back and bite them in the butt. Even some of the "random words" they are starting to use are getting high spam scores:
WHEREUPON: 99.9999%
NEOCONSERVATIVE: 99.9999%
LIBERAL: 74.3%
LIBERTY: 84.0%
MEGATON: 99.9999%
METHANE: 99.9999%
These are just a few of the "random words" I found in recent spams and, interestingly, the random words they are using are actually INCREASING their spam probability.
Statistically, it's a lost cause for the spammers, they just don't realize it yet.
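The post does not say how its filter combines the per-token scores. A common choice is the naive Bayes combination described in Paul Graham's "A Plan for Spam", and the sketch below assumes that rule; the function name is hypothetical, and the inputs are the letter scores quoted above with the second A reusing the 68.8% figure.

```python
from math import prod


def combined_spam_probability(token_probabilities):
    """Combine per-token spam probabilities, treating tokens as independent.

    Assumes every probability lies strictly between 0 and 1.
    """
    spam = prod(token_probabilities)
    ham = prod(1.0 - p for p in token_probabilities)
    return spam / (spam + ham)


# V, I, A, G, R, A as scored in the post.
letters = [0.769, 0.472, 0.688, 0.722, 0.722, 0.688]
print(round(combined_spam_probability(letters), 3))
```

Under this rule several mildly spammy tokens reinforce each other, so the combined score ends up well above any single letter's score.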
Re:Spamkiller doesn't care by M.+Silver · 2004-01-13 17:07 · Score: 2, Informative

Umm. SpamAssassin isn't Bayesian, it's rule-based. Someone needs better research

*Someone* does, but not the parent to this. SA *does* "incorporate Bayesian analysis techniques," and some of its rules are about handling the results. You can score those rules to 0 for non-Bayesian filtering, or score everything else to 0 for pure Bayesian.

--

Slashdot's token middle-aged housewife
Re:Spamkiller doesn't care by dk.r*nger · 2004-01-13 22:46 · Score: 1

WHEREUPON: 99.9999%
NEOCONSERVATIVE: 99.9999%
LIBERAL: 74.3%
LIBERTY: 84.0%
MEGATON: 99.9999%
METHANE: 99.9999%

But I'm working for the neoconservative anti-liberal liberty lobby ... ;)
Re:Spamkiller doesn't care by R.Caley · 2004-01-13 23:30 · Score: 2, Insightful

The spammers can't go too far with this stuff because they'd eventually start to stifle their sales.
What makes you think they have any sales (of the advertised product)? I would guess that almost all spam (maybe excluding pr0n sites) is either being sent by a MAKEMONEYFAST sucker or by a professional spammer who charges such suckers to send their spam out. The first set never make any sales, disappear and are replaced by the next moron; the latter have their money, sales or not.
But then again, Joe Sixpack and Jane Astrology aren't all that smart.
And you think Sam Slashdot is? How many pieces of dead end technology do you think you could find in the average /.er's home? `Early Adoption' is geek herbal viagra.

--
_O_ .|< The named which can be named is not the true named
Re:Spamkiller doesn't care by You're+All+Wrong · 2004-01-14 01:48 · Score: 1

Fixed word lists like this are not the solution to the problem.
For every word, there could be thousands of obfuscations. You'd end up with a multi-megabyte word list. It's better to heuristically detect simply whether some kind of obfuscation has been performed.
e.g. if you see a word with an '@' in it, does the corresponding word with the '@' replaced by an 'a' match one of your block words?
If so, you've immediately blocked all of 'viagr@', 'vi@gra', 'vi@gr@', 'x@nax', 'xan@x', 'x@n@x', and 'v@lium'.

--
Your head of state is a corrupt weasel, I hope you're happy.
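The '@'-for-'a' check described in the post above is easy to sketch. The block list and function name here are hypothetical; only the substitution rule comes from the post.

```python
# Hypothetical block list; a real deployment would supply its own.
BLOCK_WORDS = {"viagra", "xanax", "valium"}


def is_disguised_block_word(word, block_words=BLOCK_WORDS):
    """Return True if word contains '@' and becomes a block word once '@' is read as 'a'."""
    if "@" not in word:
        return False
    return word.lower().replace("@", "a") in block_words


for candidate in ["viagr@", "vi@gr@", "x@n@x", "v@lium", "someone@example.com"]:
    print(candidate, is_disguised_block_word(candidate))
```

Because the rule only fires when the de-obfuscated form is a block word, an ordinary e-mail address containing "@" is left alone.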
Re:Spamkiller doesn't care by You're+All+Wrong · 2004-01-14 02:08 · Score: 1

I think eventually it will just be a childish "fuck you, if you won't let me spam you I'm going to make your e-mail useless". It's nearly there. Some of the stuff that passes my home-brew procmail anti-spam rules is so mangled that I, a supposedly intelligent human, can't even work out what it's trying to say! I.e. it's no more than an inbox denial-of-service.

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.
Re:Spamkiller doesn't care by scott-thomason · 2004-01-14 02:18 · Score: 1

Statistically, it's a lost cause for the spammers, they just don't realize it yet.

No, it's not. A tiny fraction of the email-receiving population (us) is capable of filtering spam effectively. The vast majority of users--I'm sure in excess of 99%--cannot. And Hotmail/Yahoo "bulk mail filters" only catch about 2/3 of the stuff, so I don't call them "effective".

Spam will be with us until it's no longer profitable for the spammers, or the legal risks of spamming are so impossibly high that it's just not worth it.
Re:Spamkiller doesn't care by Anonymous Coward · 2004-01-14 02:42 · Score: 0

What's your lobby's stance on methane-based megaton warheads?
Re:Spamkiller doesn't care by letxa2000 · 2004-01-14 03:20 · Score: 1

Ok, granted, those that don't filter spam are in trouble. But as more people become frustrated with spam they will look for ways to deal with it. That may be with services that filter it for them, their ISP implementing the option of filtering, or email clients that support Bayesian. But just as email was originally a geek thing that is now used by virtually everyone, so will filtering.
The point is, those that don't want to see spam don't have to. The technology exists to ensure you won't see it in any offensive quantity. For those that are willing to make a trivial effort to filter their email, statistics ensure that spammers will not be able to bother them. When enough people start filtering spam, spam will no longer be profitable.
As for Hotmail and Yahoo, I'm not sure why they haven't implemented Bayesian yet. But I'm sure it's only a matter of time. Yahoo has a "Report as spam" button so it'd be extremely trivial to make that button generate the appropriate Bayesian statistics that would allow spam to no longer be a problem for Yahoo users. Same is true for AOL.
But, again, I'm not speaking to the effectiveness of all spam filters. I'm talking about the effectiveness of Bayesian filters. The spammers are fighting a battle they can't win when it comes to trying to get their email past Bayesian filters.
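How a "Report as spam" button could feed Bayesian statistics can be sketched as follows. Everything here is an assumption for illustration: the class and method names are invented, unseen tokens are scored 0.5, and scores are clamped to the range 0.000001 to 0.999999 only because 99.9999% is the ceiling visible in the figures quoted in this thread.

```python
from collections import Counter


class TokenStats:
    """Per-token counts gathered from user reports (hypothetical sketch)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_messages = 0
        self.ham_messages = 0

    def report(self, tokens, is_spam):
        """Record one message; each distinct token counts once per message."""
        if is_spam:
            self.spam_messages += 1
            self.spam_counts.update(set(tokens))
        else:
            self.ham_messages += 1
            self.ham_counts.update(set(tokens))

    def spam_probability(self, token):
        """Estimate how spammy a token is from the reports seen so far."""
        spam_rate = self.spam_counts[token] / max(1, self.spam_messages)
        ham_rate = self.ham_counts[token] / max(1, self.ham_messages)
        if spam_rate + ham_rate == 0:
            return 0.5  # assumption: unseen tokens are neutral
        raw = spam_rate / (spam_rate + ham_rate)
        return min(0.999999, max(0.000001, raw))


stats = TokenStats()
stats.report(["methane", "offer"], is_spam=True)
stats.report(["meeting", "offer"], is_spam=False)
print(stats.spam_probability("methane"), stats.spam_probability("offer"))
```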
Re:Spamkiller doesn't care by letxa2000 · 2004-01-14 03:48 · Score: 1

Here's a few more from another spam I got today:
luxurious: 34.8%
goddess: 99.9999% (Bad choice given porn spam)
prussia: 99.9999%
foliate: 99.9999%
roentgen: 99.9999%
franca: 99.9999%
plat: 99.9999%
mycology: 99.9999%
immigrate: 99.9999%
calcite: 99.9999%
gunfight: 99.9999%
dame: 99.9999%
clue: 5.2%
grandiloquent: 99.9999%
riverfront: 99.9999%
canteen: 99.9999%
heterosexual: 99.9999%
guest: 51.6%
chrysolite: 99.9999%
crockery: 99.9999%
scorch: 99.9999%
In other words, ALL the terms this spammer used to supposedly get past a Bayesian filter scored a 99.9999% spam probability except 3 of them (which scored 51.6%, 34.8%, and 5.2%). However, they had 18 random words that scored 99.9999% spam probability. Since my Bayesian filter only considers the 15 most interesting terms (i.e., those furthest away from 50%), it turns out the ONLY terms considered for this particular email are 15 of their spammy-looking "random words." In other words, it doesn't matter what the rest of their email contains... The random words alone score this message as spammy beyond belief. Their own random words even defeat themselves since their lucky shot with "CLUE" (5.2%) isn't even considered since the 18 random words with a 99.9999% score are far more interesting. This spam would have been better off if it hadn't inserted any random words at all.
This is a perfect example of why spammers cannot win. They CANNOT get around Bayesian filters except for a very occasional lucky shot when they happen to use a random word that happens to be used frequently by the receiver--but even that proves futile when, in the above example, they get 2 non-spammy lucky shots and 18 damning spam words included in their random words. On balance, their random words have done more damage than good.
I think time will prove Paul Graham completely right: The spam of the future will be a 1 or 2-line message prompting someone to click a website, and even these will usually be recognized by Bayesian based on their headers alone.
But the traditional spam arms race is done and Bayesian and statistical filters have won.
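The "15 most interesting terms" selection described above can be reproduced from the scores listed in the post. The function name is hypothetical, and tie-breaking among equally interesting tokens simply follows listing order, which the post does not specify.

```python
def most_interesting(token_probabilities, limit=15):
    """Return the tokens whose spam probability lies furthest from 0.5."""
    ranked = sorted(
        token_probabilities.items(),
        key=lambda item: abs(item[1] - 0.5),
        reverse=True,
    )
    return ranked[:limit]


# Scores as quoted in the post (99.9999% written as 0.999999).
scores = {
    "luxurious": 0.348, "goddess": 0.999999, "prussia": 0.999999,
    "foliate": 0.999999, "roentgen": 0.999999, "franca": 0.999999,
    "plat": 0.999999, "mycology": 0.999999, "immigrate": 0.999999,
    "calcite": 0.999999, "gunfight": 0.999999, "dame": 0.999999,
    "clue": 0.052, "grandiloquent": 0.999999, "riverfront": 0.999999,
    "canteen": 0.999999, "heterosexual": 0.999999, "guest": 0.516,
    "chrysolite": 0.999999, "crockery": 0.999999, "scorch": 0.999999,
}

selected = most_interesting(scores)
print([token for token, _ in selected])
```

With these numbers all fifteen selected tokens carry the 99.9999% score, and "clue" at 5.2% never makes the cut, which is the effect the post describes.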
Re:Spamkiller doesn't care by sketerpot · 2004-01-14 03:51 · Score: 1

I'm not talking about anything that hard and fast, oh sarcastic one. Just taking them into account may improve spam filters, although of course you'd have to do tests to make sure that the overall effect was positive. If y0u h4v3 a message w|th lots of |nterupt|0nz, you usually don't want to see it, in my experience.
Re:Spamkiller doesn't care by sketerpot · 2004-01-14 03:55 · Score: 1

Yes, that looks like a better solution than the current naive method. Interestingly, what my filter does is treat @ as a token separator. Naturally, tokens like "xan" and "gra" get rather high spam probabilities.
I wonder if you could also try matching tokens longer than, say, four letters, to your recorded spam words using some spellchecking algorithm. Something like apache's mod_speling might help.
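Treating "@" as a token separator amounts to splitting on anything that is not a letter or digit. The sketch below assumes exactly that rule (the poster's actual tokenizer is not shown) and uses a hypothetical function name.

```python
import re


def tokenize(text):
    """Split lowercased text on every run of non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


print(tokenize("Cheap vi@gra and xan@x!"))
```

Fragments such as "gra" and "xan" rarely occur in legitimate mail, so a statistical filter trained on them would tend to score them as spammy, as the post observes.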
Re:Spamkiller doesn't care by KjetilK · 2004-01-14 04:46 · Score: 1

But the traditional spam arms race is done and Bayesian and statistical filters have won.

I get the same statistics as you with my SA install, most of it is given a BAYES_99 score. Unfortunately, many don't train their own filters, and this is rather effective against them. But that's not the only reason why I think it is too early to declare the war as won.
There are ways to poison Bayes-filters that are better than this, and that may well be effective. If you sit down and think about it, I'm sure you can think of something too. I'm not going to write them, because it will be too easy for spammers to implement. Fortunately, spammers are stupid, and that buys us some time, but we still need more options.

--
Employee of Inrupt, Project Release Manager and Community Manager for Solid
Re:Spamkiller doesn't care by letxa2000 · 2004-01-14 06:35 · Score: 4, Interesting

I get the same statistics as you with my SA install, most of it is given a BAYES_99 score. Unfortunately, many don't train their own filters, and this is rather effective against them.
True. Although an obvious caveat of using Bayesian to filter is that you HAVE to train it. In the anti-spam service I use (see tagline) it defaults to NOT using Bayesian. If you turn Bayesian on it specifically sends you an email reminding you that you MUST train it or things will actually get worse.
But you're right, a misused Bayesian filter might actually be worse than no Bayesian filter at all. But that's the case whether or not spammers insert random words.
There are ways to poison Bayes-filters that are better than this, and that may well be effective. If you sit down and think about it, I'm sure you can think of something too. I'm not going to write them, because it will be too easy for spammers to implement. Fortunately, spammers are stupid, and that buys us some time, but we still need more options.
Let's talk about them. We're not going to come up with anything that spammers can't come up with so I don't think we're going to make things any easier for them or give away the farm by discussing it publicly.
I personally have thought about it and I'm unaware of how they could poison Bayesian statistics. I only see two approaches, theoretically. 1) Make your spam get a lower Bayesian score so it gets through. 2) Make non-spam get a higher Bayesian score so it gets caught as a false positive.
Approach #1: Short of going to the "spam of the future" predicted by Paul Graham, I don't see any way for spammers to really get a lower spam score. I've seen entire sections of the Constitution embedded in spam that still got a 98% spam score. The only way spammers are going to get a lower spam score is by doing things like using the names of my friends, using words related to topics I often discuss, etc. And that's just not possible. Like I said, they might get an occasional lucky shot but what gets through to me most probably won't get through to you. I just don't see any way for them to reliably get past a significant number of Bayesian filters.
Approach #2: Poison the Bayesian stats such that non-spam mail gets tagged as spam. I'm pretty convinced this isn't possible, either. Again, they'd have to heavily use words that are specifically non-spam for the receiver such that the spam rating for those words increases so high that it is considered spam. But if the words are heavily used in both spam (trying to poison the stats) and non-spam, it's going to float to a middle position, like the word "THE" which has a 53.2% chance of being spam (and that's only because 92% of my mail is spam so a neutral word is usually slightly over 50%). But neutral words are completely ignored by Bayesian--only the "most interesting" are considered, those that are 99% spam or 1%--THOSE are the words that define whether or not the message gets scored as spam or not. Plus if they knew which words to poison, those are the same words they could use to get their spam past the filter to start with... so poisoning the filters is pointless anyway.
I really don't see how they can get around it. I'd be interested in your views. If you really think it's dangerous to talk about it in public then let me know and I'll email you at your mangled address above. Is that your correct address?
Re:Spamkiller doesn't care by gilgongo · 2004-01-14 07:26 · Score: 1

So what happens if you want to correspond with somebody about Viagra?

Would this thread get ignored, for instance?

--
"And the meaning of words; when they cease to function; when will it start worrying you?"
Re:Spamkiller doesn't care by Frisky070802 · 2004-01-14 08:00 · Score: 1

Yes, it would. I scan the "killed" folder and do have more false positives than I'd like. But I see a lot of the "random keywords" spam there, and virtually none classified as OK, so I figure I'm coming out ahead.

--
Mencken had it right. So glad that's old news.
Re:Spamkiller doesn't care by Anonymous Coward · 2004-01-14 10:10 · Score: 0

Isn't it somewhat odd that all of those words are exactly 100% guaranteed to be spam words?

Right now, those words are scoring highly because you've flagged a spam message as containing them, but have not flagged any legitimate mail as not containing them.

As you keep communicating through email, you'll use more and more words, and so a larger and larger percentage of the english dictionary cannot be guaranteed to be spam-words. Simultaneously, spammers will realize that they can eliminate obscure words, technical jargon, and anything else unusual from their random message generators.

How will you filter something like "Happy sunshine is today for apples and lamps are sitting on the bed. The desk near the door is computer funny on the airplane. I don't think keys is music by watch for time with knife and bowl. See movie with fries and etc etc rambling..."

This sort of semi-structured nonsense will probably fool your filter if it's only looking at the probability of individual words being used mainly in spam.

Filtering will have to get increasingly more complex, eventually coming close to parsing language the way humans do. It's a losing battle.
Re:Spamkiller doesn't care by Anonymous Coward · 2004-01-14 10:52 · Score: 0

Before running through the spam wordlist you apply a translation table:
A=@
L=|
E=3
I=1
O=0

also drop all punctuations and double spaces ,.()?*+, etc

S0..1F I TYP3 L!K3 TH!$ 1T $TI|| |00K$ N0RM@L
It looks like this:
SO IF I TYPE LIKE THIS IT STILL LOOKS NORMAL
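The translation step can be sketched in a few lines of Python. Two things here go beyond the table above and are assumptions: the example line also needs `!` mapped to I and `$` mapped to S, and punctuation is replaced by a space (not deleted) so that "S0..1F" comes out as two words, as in the example. The function name is made up for illustration.

```python
import re

# Table from the post, plus "!" and "$" which the example line relies on.
LEET = str.maketrans({"@": "A", "|": "L", "3": "E", "1": "I",
                      "0": "O", "!": "I", "$": "S"})

def normalize(text):
    """Undo common character substitutions before the spam word lookup."""
    text = text.upper().translate(LEET)
    text = re.sub(r"[^A-Z0-9\s]", " ", text)   # punctuation becomes a space
    return " ".join(text.split())              # collapse runs of whitespace

sample = "S0..1F I TYP3 L!K3 TH!$ 1T $TI|| |00K$ N0RM@L"
```

Note the cost: genuine numbers get mangled too ("100" becomes "IOO"), so this is probably best applied to a copy of the text used only for word matching.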

You also need to translate HTML and UUENCODED into text before running through a spam word filter. Soon I think Spammers will start using TNEF (Microsoft) to bypass spamfilters. I bet by June we'll start to see spammers using TNEF.

Other tests:
DNS looks to see if mail originated from a cable or DSL service
See if the host is running an SMTP gateway, if it is try sending a mail back to the sender.
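The cable/DSL test usually comes down to pattern-matching the sender's reverse-DNS name. A minimal Python sketch, assuming a hand-picked list of hostname fragments; the list and the function name are hypothetical, and real ISPs name their pools in many other ways:

```python
import re

# Hypothetical fragments that often show up in dynamic consumer hostnames.
RESIDENTIAL_HINTS = re.compile(
    r"(^|[.\-])(dsl|adsl|cable|dhcp|dyn|dialup|ppp|pool)([.\-\d]|$)"
)

def looks_residential(hostname):
    """Guess whether a reverse-DNS name belongs to a cable/DSL pool."""
    return bool(RESIDENTIAL_HINTS.search(hostname.lower()))
```

The hostname itself would come from a reverse lookup of the connecting IP, for example `socket.gethostbyaddr(ip)[0]`. A match is only a hint, since plenty of legitimate small mail servers sit on such lines.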

My current spam filter blocks about 98% of spam, with no failed hints. It's a sendmail milter that uses regex filters stored in Berkeley DB tables.
Re:Spamkiller doesn't care by letxa2000 · 2004-01-14 11:46 · Score: 1

Isn't it somewhat odd that all of those words are exactly 100% guaranteed to be spam words?
Not 100%. There is no word or term that is a 100% indication of spam. But 99.9999% is as close as you can get in my particular implementation. It means it's been used in many spams (more than 20, I think) and not a single good email.
Right now, those words are scoring highly because you've flagged a spam message as containing them, but have not flagged any legitimate mail as not containing them.
Correct.
As you keep communicating through email, you'll use more and more words, and so a larger and larger percentage of the english dictionary cannot be guaranteed to be spam-words.
Sure. For example, the word VIAGRA has appeared in just 1 non-spam message and 4144 spam messages. So its spam score is 99.80%. So, sure, if I start talking about Viagra a lot in my email then that particular word score will go down.
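To make the per-word score concrete, here is a Python sketch of the token estimate from Paul Graham's "A Plan for Spam". It is an assumption that this resembles the filter discussed here: the corpus sizes below are invented, and the clamps differ (Graham caps at 99%, this filter apparently at 99.9999%), so the sketch does not reproduce the 99.80% figure.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham,
                           min_hits=5, floor=0.01, ceiling=0.99):
    """Estimate P(spam | token); assumes both corpora are non-empty.

    Returns None for tokens seen too rarely to be trusted.
    """
    good = 2 * ham_hits  # Graham doubles good counts to reduce false positives
    if good + spam_hits < min_hits:
        return None
    spam_rate = min(1.0, spam_hits / n_spam)
    ham_rate = min(1.0, good / n_ham)
    p = spam_rate / (spam_rate + ham_rate)
    return max(floor, min(ceiling, p))

# Hypothetical corpus sizes; only the VIAGRA hit counts come from the post.
viagra = token_spam_probability(4144, 1, n_spam=50000, n_ham=4000)
```

Each additional good message containing the word raises `ham_rate` and pulls the estimate down, which is the effect described above. The `min_hits` cutoff is also why one-off garbage tokens carry no weight.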
But you are incorrect in that I (or more precisely, those that email me) will use "more and more words." There are thousands of words in the English language, but we don't use most of them. I seriously doubt that anyone that emails me will use the word "Goddess" or "Heterosexual." But if they do, it'll be infrequent. Perhaps "Goddess" or "heterosexual" would then drop to a 99.8% score.
But just because these two words drop from 99.9999% to 99.8% is unimportant. Even if they dropped to 50% each the above random words would have resulted in a spam score that would be categorized as spam, and they'd only drop to 50% if a lot of my contacts started sending me lots of email about heterosexuals or goddesses.
Simultaneously, spammers will realize that they can eliminate obscure words, technical jargon, and anything else unusual from their random message generators.
Again, that's not good enough. They need to use the words that have scores of 1-5% in *MY* Bayesian statistics. And those are the words that are going to be very specific to me. We're talking words like:
ADC: 0.46% (ADC=Analog Digital Converter)
AVR: 0.80% (AVR is a type of microcontroller)
DAC: 0.10% (DAC=Digital Analog Converter)
I2C: 0.10% (I2C=Protocol for inter-chip communication)
JMP: 0.40% (JMP is an assembly language instruction)
RALPH: 5.0% (The name of a friend)
COLORADO: 9.9% (Where I used to live)
But if they mention California that's a 55.7% chance of being spam. Oregon is 63%. Arizona is 52%. Florida is 61%. So, for example, just dumping a list of states to hopefully find Colorado (which has a lower score) is going to be counter-productive since most states have HIGHER than average scores. But, of course, someone in California would probably have a low California score and a high Colorado score.
How will you filter something like "Happy sunshine is today for apples and lamps are sitting on the bed. The desk near the door is computer funny on the airplane. I don't think keys is music by watch for time with knife and bowl. See movie with fries and etc etc rambling..."
Well, I'd need a message header, too, because they provide a LOT of good information for Bayesian. But right off the bat I can give you the following scores for the words you used in your example:
SUNSHINE: 99.0%
DOOR: 90.24%
MOVIE: 84.77%
BOWL: 72.27%
TODAY: 68.053%
WATCH: 62.50%
Without the headers your pure-text example message actually snagged a Bayesian score of 86.6% based on my Bayesian statistics. I'd bet you 10 bucks that if you actually sent me a spam with the above text in it that it'd EASILY score over 90% and be tagged as spam.
This sort of semi-structured nonsense will probably fool your filter if it's only looking at the probability of individual words being used mainly in spam.
No, it won't fool it. All by itself your sample text was almost tagged as spam. If the spam payload itself (the part that sells me Viagra, sends me to a website, encode
Re:Spamkiller doesn't care by Anonymous Coward · 2004-01-15 09:37 · Score: 0

Let's talk about them.

OK, since I see you're into the business of developing filters. I'll post anonymously to preserve my precious karma, er, to try to slip below the radar of spammers. I'm not so sure it is a good idea to talk, since spammers are stupid and lag behind for that reason, but what the heck.
Well, the obvious attack is to harvest words from the same web page that you harvest an address from. It would be devastating, as far as I could tell from my SA tokens... It may need some tuning too, but it could be bad... So, we need options, and many different approaches.
Re:Spamkiller doesn't care by Perky_Goth · 2004-01-15 15:10 · Score: 1

honestly, i think that either the rambling is structured, in which case it will get caught, or is nonsense, and anyone will delete it.
Re:Spamkiller doesn't care by Anonymous Coward · 2004-01-16 09:59 · Score: 0

OK, your filter flagged my pseudorandom garbage as being spam. See the discussion we're having... what happens when you feed this entire slashdot post into your bayesian filter? I'll bet a shiny nickel that it will be flagged as spam.

You've flagged thousands of spam messages, but probably an order of magnitude fewer non-spam ones.

The probability of *any* word, with exceptions for things like highly technical jargon you use frequently, being labelled as spam, is quite high.

If someone emails you about something other than work - merely engaging in some colloquial conversation and perhaps linking you to a site they found funny or interesting, you may never get that message.

Your entire approach relies on the idea that there are words that are used more frequently in normal conversation than in spam. If the spamming software gets a little smarter, the lines between the spam-words list and the non-spam-words list will blur so much that your program will have to rely on the "payload" concept - chances are, if there's no link and no image, it's not spam. Your special technical vocabulary is merely the best possible case for this filtering implementation - a simple mail rule looking for DAC/ADC/I2C/JMP/EAX/etc would do the same thing.
Re:Spamkiller doesn't care by Anonymous Coward · 2004-01-16 13:28 · Score: 0

But I'm working for the neoconservative anti-liberal liberty lobby ... ;)

Whereupon dozens of other /.ers will post copycat jokes.
Re:Spamkiller doesn't care by letxa2000 · 2004-01-19 06:38 · Score: 1

OK, your filter flagged my pseudorandom garbage as being spam. See the discussion we're having... what happens when you feed this entire slashdot post into your bayesian filter? I'll bet a shiny nickel that it will be flagged as spam.
Pass that shiny nickel on over here, then. :) I inserted the entire message I am currently replying to into my Bayesian filter and, without any headers to work with, it got a spam score of 38.59%. It actually wouldn't have been tagged as spam. Why?
Spammy words:
0.99000 Body: SHINY
0.99000 Body: NICKEL
0.99000 Body: COLLOQUIAL
Non-Spammy Words:
0.00114 Body: I2C
0.00151 Body: DAC
0.00383 Body: JMP
0.00429 Body: FILTERING
0.00447 Body: ADC
0.00890 Body: SLASHDOT
0.02195 Body: PROBABILITY
0.04082 Body: DISCUSSION
0.04094 Body: BAYESIAN
0.05913 Body: PROBABLY
0.06005 Body: PERHAPS
0.06202 Body: LINKING
The probability of *any* word, with exceptions for things like highly technical jargon you use frequently, being labelled as spam, is quite high.
Only if the word is used in much higher proportions in spam than real mail. Plus it doesn't matter if any given word rises. The word "THE" has a 53% spam score right now in my corpus--but that doesn't mean that any given message that contains the word "THE" is going to have a higher probability of being considered spam since Bayesian only considers the "most interesting" tokens. "THE" is only 3% off from neutral so it is doubtful it's going to be considered. It's words like "PORN" with a 99% score (49% from neutral) or RALPH with a 2% score (48% from neutral) that are going to make the case for or against a given message being spam. Words that are pretty much neutral aren't even going to be considered.
If someone emails you about something other than work - merely engaging in some colloquial conversation and perhaps linking you to a site they found funny or interesting, you may never get that message.
That has NEVER happened to me. The only emails I've had erroneously filtered by Bayesian (false positives) were random people I had never heard from before writing to me out of the blue and usually in broken English since they were foreigners. I have a popular website and I get literally thousands of unsolicited comments per year. Only a few of those were ever considered spam and even when I got them they weren't even messages I would have cared had I not seen them. I've never missed a relevant email with Bayesian.
Your entire approach relies on the idea that there are words that are used more frequently in normal conversation than in spam.
Right, and vice versa. There are words used more frequently in my normal conversations than spam, and there are words used in spam that I NEVER use in my normal conversations. Bayesian uses ALL that information and calculates a very accurate score predicting whether or not a given message is spam based on the words and characteristics of the mail compared to previous good and spam mail.
If the spamming software gets a little smarter, the lines between the spam-words list and the non-spam-words list will blur so much
Do you just think that's the case, or do you have any evidence? Everything I've seen in my Bayesian statistics indicates exactly the opposite. My Bayesian stats continue to improve such that words that spammers use to "dilute" their spam score are actually rising in spam probability since I never use them myself. In a recent spam, out of 18 words they inserted to hopefully lower their spam score, 15 of them actually RAISED their spam score. Their efforts were counterproductive.
They can't blur the line between my spam and non-spam words unless they know, for example, the names of my best friends, the topics I generally discuss via email, etc. It's not good enough to use a lot of words that aren't used in spam since, over time, those are going to be considered spammy (
Re:Spamkiller doesn't care by letxa2000 · 2004-01-20 07:46 · Score: 1

Well, the obvious attack is to harvest words from the same web page that you harvest an address from. It would be devastating, as far as I could tell from my SA tokens... It may need some tuning too, but it could be bad... So, we need options, and many different approaches.
That's an interesting thought, and I do see where you are going with it. I do see a couple of problems (for the spammers) though.
1. Virtually everyone gets spam, but not everyone posts their email address on websites. This tactic would only work for the subset of email addresses that happen to appear on websites.
2. Not all websites with email addresses are going to provide useful content context. If it's a university directory it could be full of other names and addresses and the spammer won't be able to know which (if any) of those people you communicate with. If he includes all of them then he'll probably run into the same problem as using random terms--that these will actually get higher spam scores.
3. It assumes that what is discussed on the page with your email address is the same as the type of thing you get in email. This does make sense and may be the case in many cases (such as me putting up a website on a topic and having an email address pointing to me). But in many cases, it won't. For example, I participate in a number of forums. I hide my email address anyway, but even if it was published Slashdot would be the only forum in which there is an overlap between what I discuss in forums and email. In the other forums I participate in, my participation is limited to the forum--I actually don't discuss any of those topics via email.
4. Even if it could work, it significantly complicates things for spammers. Most spammers still use pure email address mailing lists, some with email addresses a decade old. If they were to use this approach they'd have to recrawl the entire web and now associate "interesting words" from the webpage with the email address--probably at least 5 or 6 (or more, depending on their email headers) to get the spam score low enough. And that depends on them being able to pick out words from the webpage that are, in fact, interesting. Just picking out "unusual" words (words that aren't contained in most pages) would certainly be logical, but far from certain to work and easily foiled. They'd intentionally have to look for words that are on the target webpage that aren't usually found on others and use those--but those words could just as easily be garbage terms on the webpage. The Bayesian filter automatically ignores garbage terms (because they usually only occur once, and never in good email) but the spammers would actually be LOOKING for the rarest words, which could be garbage. We have the advantage of being able to ignore silly garbage, but they'd be looking for the rarest words which could be silly garbage. And if they actually used those terms enough, they'd become an indicator of spam for the recipient.
I don't know... I do see what you're getting at and I hadn't really thought of it. But I think that it'd be pretty unwieldy for the spammers and pretty easily foiled by borrowing a tactic from the spammers and populating such pages with exotic terms that the spammers will grab and try to use. You basically "poison" the words they could potentially grab from the page. And while that tactic doesn't work for spammers (since we simply ignore such infrequent words), it would work for us since they'd intentionally be looking for infrequent words to try to use against us.
At least those are my initial thoughts...

Sometimes it isn't random words by dsplat · 2004-01-13 14:19 · Score: 3, Funny

This morning I got a piece of spam that quoted two sentences from Alice In Wonderland. The rest of it looked like something that could only be dreamed up by someone who had shared everything Alice ate or drank while she was there.

--
The net will not be what we demand, but what we make it. Build it well.

Re:Sometimes it isn't random words by srcosmo · 2004-01-13 14:26 · Score: 3, Informative

I also recently received some Alice in Wonderland citations with my spam.
Who would have thought Project Gutenberg's biggest use would be for hawking herbal remedies?

--
free speach
Did you mean: free speech
Re:Sometimes it isn't random words by ProfitElijah · 2004-01-13 14:32 · Score: 3, Funny

I often take time to read the text/plain part of multipart spam. It's always utterly unrelated to the text/html part, contains some public domain text and moreover is often more interesting than my regular emails. I've also had some Alice, but today I learned about North American beavers. I had no idea they were so large.
Re:Sometimes it isn't random words by Anonymous Coward · 2004-01-13 14:56 · Score: 0

And voracious, you wouldn't believe how many pieces of wood those damn beavers can chew down, especially the wide open variety :-P

-- vranash
Re:Sometimes it isn't random words by Tablizer · 2004-01-13 15:30 · Score: 1

Finally, employment opportunities for the schizophrenic! (No SCO jokes, please.)

--
Table-ized A.I.

Still no cure for cancer by Anonymous Coward · 2004-01-13 14:19 · Score: 0

Leave it to Wired to state the obvious.

Hah by Tirinal · 2004-01-13 14:19 · Score: 0, Funny

Pfffft. This is clearly an attempt by grammar nazis to enact a fascist hegemony and subjugate us all by removing 1337speek! Infidels!

--
~Tirinal

Gibberish by Esteanil · 2004-01-13 14:20 · Score: 1, Insightful

"...gibberish is rapidly becoming a common component of spam."
Hasn't spam always been gibberish?

--
I'm a dreamer, the world is my playpen. But hey, I'm a serious person, I can't dream all the time.

I don't get it, really by theRhinoceros · 2004-01-13 14:20 · Score: 4, Insightful

"Most of the illegal-exploit spammers use hash busters and any other trick they can to get past filters, refusing to accept that people use spam filters because they really don't want spam," Linford added.

I really understand this part: going after people who are taking active measures against your enterprise due to their disinterest. Why bother to market to them at all? Is the rate of return worth all the ill will, DOS attacks and legislation?

Re:I don't get it, really by radicalskeptic · 2004-01-13 14:31 · Score: 5, Insightful

One reason is that ISPs, corporate servers, or some other body might have implemented the filtering, and not the one reading the mail.

--
WARNING: If accidentally read, induce vomiting.
Re:I don't get it, really by Anonymous Coward · 2004-01-13 14:33 · Score: 0

That's easy to explain. All those people out there simply don't understand what spammers have to offer. They're attacking spammers because they are ignorant. Ergo, it's up to the spammers to do everything in their power to make sure that their message is heard, to make the people understand what they're missing out on. Once everybody understands, the attacks will stop, and the free-for-all begins.
It's the only explanation that makes any sort of sense to me, anyway. Like most marketing people[1], I'd say that spammers honestly believe that the millions of people out there who have never heard of their product will be falling over themselves to buy once they do.
[1] I make no apologies for lumping spammers in with marketing people. They're both scum, trying to foist things we neither need nor want upon us. There are a few, rare exceptions, but by and large...
Re:I don't get it, really by MightyJB · 2004-01-13 14:39 · Score: 2, Insightful

At first glance it doesn't seem to make sense, but think about it. They take a little time and effort to thwart your filter and they may increase distribution slightly. When you're sending like a billion emails a day even a 1% increase is significant. If they can then get 1% of that 1% of billions of emails to buy something, they rake it in. Sending the email doesn't cost them a dime and they have everything to gain.
Re:I don't get it, really by McDutchie · 2004-01-13 14:40 · Score: 4, Interesting

Why bother to market to them at all?

In addition to living in their own criminally delusional world, spammers often don't spam for themselves but work for others. They get paid by their, er, client for each message sent, it doesn't matter to them whether it's wanted or not.

Plus, there's always that .001% of suckers to keep the biz going if the cost of sending is close to zero.
Re:I don't get it, really by Anonymous Coward · 2004-01-13 14:41 · Score: 5, Insightful

The technique also makes obvious the lie of their "we're just innocent entrepreneurs trying to make a buck" defense. Innocent entrepreneurs don't go out of their way to try to hack their data into other people's computers, past programs that are every bit as clear a sign of intent as a "No Soliciting" sign on your door.

On every spam thread on Slashdot, there's someone complaining that technical measures won't solve the problem, and another saying legal measures won't solve the problem. The answer is that you need both: technical measures to assure the identity of the sender -- both spammer and sponsor -- as well as legal measures to provide for punishment.
Re:I don't get it, really by xtermin8 · 2004-01-13 14:42 · Score: 1

Apparently yes, it does. I assume a lot of spammers are targeting people who aren't using their own computers, but using email at their employers.
Re:I don't get it, really by RajivSLK · 2004-01-13 14:44 · Score: 1

Simple, if you are the only spammer who can get past the filters your message is far more valuable.

I set up SpamAssassin for my dad. He hardly gets any spam, however, the odd spam that does get through receives his undivided attention for about 5 - 10 seconds (until he figures out it is spam. He is pretty trusting with his email). This would change if he got flooded with spam every day.
Re:I don't get it, really by Eosha · 2004-01-13 14:44 · Score: 5, Insightful

Unfortunately, spammers are not in the business of selling things to consumers. They are in the business of selling advertising space to other companies. As long as they can convince unscrupulous business owners that advertising via spam is worthwhile, the spam will continue.

--
I have a girlfriend whose name doesn't end in .JPG
Re:I don't get it, really by commodoresloat · 2004-01-13 14:44 · Score: 2, Insightful

It just goes to show, they're not just motivated by greed. They, or at least the people making the programs that do this, actually *want* to annoy the shit out of people. They think it's their right to annoy us like this and they're on a mission to assert that right by subverting all attempts to tune them out. It's not just greed; it's a weird kind of sociopathy.
Re:I don't get it, really by jigyasubalak · 2004-01-13 14:55 · Score: 1

Maybe this is because of all marketers' deep-rooted belief that consumers don't know what they want. Never mind how preposterous the consumers think the product is. Also, all consumers' IQ is believed to be at sub-zero levels.
I think that spammers are a species of scum a notch above lawyers. Or is it below...you get the idea :)

--
The best planning can be done after the project completes.
Re:I don't get it, really by visualight · 2004-01-13 15:23 · Score: 1

The answer is that you need both: technical measures to assure the identity of the sender -- both spammer and sponsor -- as well as legal measures to provide for punishment.

Doesn't matter if you can't actually find the spammer. If there was an easy way to track every spam to its true originator there would be no more spam.

--
Samsung took back my unlocked bootloader because Google wants me to rent movies. They're both evil.
Re:I don't get it, really by visualight · 2004-01-13 15:26 · Score: 1

bah.
that wasn't clear. what I meant was that you won't be able to find the body, just an ip address.

--
Samsung took back my unlocked bootloader because Google wants me to rent movies. They're both evil.
Re:I don't get it, really by Eccles · 2004-01-13 15:34 · Score: 1

that wasn't clear. what I meant was that you won't be able to find the body, just an ip address.

True, if I got my hands on some of these spammers, you wouldn't be able to find the body afterwards...

--
Ooh, a sarcasm detector. Oh, that's a real useful invention.
Re:I don't get it, really by ConceptJunkie · 2004-01-13 15:36 · Score: 1

I think that spammers are a species of scum a notch above lawyers. Or is it below...you get the idea :)

Below.

You can find honest lawyers who are doing it for the public good, to uphold the Law and Justice.

Spammers, by definition, try to make money by harassing people. Period.

p.s. IANAL and generally don't hold them in high regard, but realize far from all of them are bad.

--
You are in a maze of twisty little passages, all alike.
Re:I don't get it, really by rgmoore · 2004-01-13 16:01 · Score: 3, Insightful

It's possible, if not likely, that some of the spamware authors are doing it for the challenge. Some of those guys are allegedly pretty good programmers, and I suspect that many of them are essentially hackers with no sense of morals. I could easily imagine somebody like that trying to figure out how to bypass spam filters just because it was a challenge, not because he actually expected any particular rewards for it. It's like trying to break into the computers in the Pentagon; it's stupid and illegal but a big enough challenge that some people with more brains than common sense will try it anyway.

--
There's no point in questioning authority if you aren't going to listen to the answers.
Re:I don't get it, really by Fulcrum+of+Evil · 2004-01-13 18:09 · Score: 1

When you're sending like a billion emails a day even a 1% increase is significant.

No it isn't. You're already sending multiple emails to everybody with an address, so who cares if there's a few more dupes?

--
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Re:I don't get it, really by ElectricRook · 2004-01-13 18:57 · Score: 1

Some of those guys are allegedly pretty good programmers
Your idea of "pretty good programmer" must be somewhat different than mine. Please don't inflate the egos of the (L)user script kiddies.

--
- High Tech workers, please say NO to Union Carpenters, their Union sees fit to control our compensation.
Re:I don't get it, really by mb77 · 2004-01-13 20:06 · Score: 1

> going after people who are taking active measures against your enterprise

Hmm, Security circumvention wouldn't that be liable under the DMCA ?
Re:I don't get it, really by nerdit · 2004-01-13 23:35 · Score: 1

Well, it is not worth it if you are a well-known company, especially one that has spent billions to build a brand name. It is if you are selling porn or illegal Viagra anyway, right?
Re:I don't get it, really by gnu-generation-one · 2004-01-14 01:52 · Score: 1

"I really understand this part: going after people who are taking active measures against your enterprise due to their disinterest."

Odd, isn't it, how the people who'll swear blind (see rule #1) that the auto-generated email lists they use are 100% opt-in also know that these "opted-in" recipients will have filters specifically designed to prevent those same people from sending them email.

Apparently these people who desire emailed advertisements must have installed SpamAssassin by mistake or something...

In his book, Greenspun mentions that putting your phone number on the web is less annoying than putting an email, as the phone can only be used by a real human, who's paying the cost of the call, and can only contact one person at a time...
Re:I don't get it, really by DavidTC · 2004-01-14 02:34 · Score: 1

There are easy ways to track each and every spam, if the court system would care to get off its lazy ass and do it.

--
If corporations are people, aren't stockholders guilty of slavery?
Re:I don't get it, really by Anonymous Coward · 2004-01-14 04:25 · Score: 0

I really understand this part: going after people who are taking active measures against your enterprise due to their disinterest. Why bother to market to them at all? Is the rate of return worth all the ill will, DOS attacks and legislation?

1) The end-user might actually like spam, and the spam is filtered by someone else (such as their ISP).

2) The end-user HATES spam, but can't resist it. Rather like putting some whisky in the bottles of cola at the local AA when you own an off-license.
Re:I don't get it, really by SillySlashdotName · 2004-01-14 09:37 · Score: 1

1% of 'billions' is 100s of millions more.

They are not increasing distribution, they are increasing the number of eyeballs - if they sent it to me and my spamfilter roundfiled it, they have no way of knowing if I read it and deleted it, or deleted it without reading it, or had a spamfilter do it for me.

The email I get that gets past the spamfilter by incorporating mangled words is recognized - usually by the title - and almost always manually roundfiled without being read. So they have bypassed the automated spamfilter by raising a manual spamfilter recognition flag.

Not really all that smart, and it has gained them nothing in my case.

In addition, as has already been pointed out, who would buy \/1@9r@ from someone who evidently can't even spell it correctly? Are you going to expect someone who has shown they don't know the difference between a '4' and an 'A', an 'S' and a '5', etc, to get your credit card number right?

"That's right, my card number is i2ea-s6tb-go. Yes, that is MasterCard, expires oiot."

--
Acts of massive stupidity are almost never covered by warranty. --me.
Re:I don't get it, really by SillySlashdotName · 2004-01-14 09:43 · Score: 1

You know, I have never met any of those lawyers, but I can twist my mind into believing they actually exist.

Why is it then, that I can not make myself believe there are altruistic spammers?

And remember, the first case of internet spam is believed to have originated from a lawyer...

--
Acts of massive stupidity are almost never covered by warranty. --me.
Re:I don't get it, really by Anonymous Coward · 2004-01-14 17:46 · Score: 0

Let me ask you something, do you spam? Have you ever spammed? If no, then how do you know that they are in the business of selling advertising space to other companies?

I personally know several spammers, and they get their business from talking to, let's say, a mortgage company, and saying that they'll get them a lot of leads, they just have to ignore complaints. Most of the time the companies agree. Same goes for porn, just replace leads with signups.

People DO buy from spam, and that is what keeps spam going. The few idiots out there are enough, the only real way to stop it is to educate the masses.

It's not gibberish, it's steganography by phr1 · 2004-01-13 14:20 · Score: 4, Interesting

They are sending sekrit instructions to al-spamda about where to hide the weaponz of mass distraction. Or who knows. Any government efforts to control steganography (like the one reported just yesterday) better go after spammers first, or we have to wonder what they're really up to.

Spam Filters: The Next Generation by Anonymous Coward · 2004-01-13 14:20 · Score: 0

Spam filters get to look for the inclusion of misspelled words with SoundAlike(TM) technology and elite-speak words with LeetAlike(TM) technology and finally garbage with GibAlike(TM) technology.

Looks like I'm gonna need to upgrade my hardware for my spam filter.

Why? by aePrime · 2004-01-13 14:20 · Score: 3, Insightful

I can see them doing this to overcome Bayesian filters, but why? AFAIK, Bayesian filters are not used much (if at all) on mail servers. These filters are run at home by geeks.

Granted, this may get them past the filters, but if somebody's gone through the effort of setting up a Bayesian filter, they're not going to buy your product even if you get into their inbox. It seems like a waste of everybody's effort, and I mean including the spammers.

Re:Why? by Anonymous Coward · 2004-01-13 14:29 · Score: 0

But it won't overcome a decent bayesian filter anyway - since most filters take a "top 20" of the words, and at some point the spam _has_ to try to sell you something, so no amount of fake words is going to bamboozle a bayesian filter with a cutoff. And misspellings like v1agra INCREASE the specificity of matches, so they don't work against bayesian filtering either.

Personally, I don't bayesian filter: I catch almost all spam with 1 simple rule: I just don't accept HTML mails. Anyone likely to send me a semi-legitimate HTML mail (i.e. LookOut using PHB/MBA types) knows my mobile number anyway.

I also reject mails >128K. This catches most common windows worms.

What little spam gets through, I can rapidly delete anyway.
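
The two rules described above (no HTML mail, nothing over 128K) are simple enough to sketch. This is only an illustration using Python's standard `email` package; the function name `should_reject` and the sample messages are made up, and a real mail server would apply such rules at delivery time rather than on a string in memory.

```python
from email import message_from_string

MAX_BYTES = 128 * 1024  # the 128K limit mentioned in the comment


def should_reject(raw_message: str) -> bool:
    """Return True if the message is oversized or contains an HTML part."""
    if len(raw_message.encode("utf-8", errors="replace")) > MAX_BYTES:
        return True
    msg = message_from_string(raw_message)
    # walk() yields the message itself and every nested MIME part
    return any(part.get_content_type() == "text/html" for part in msg.walk())


# Hypothetical sample messages
plain = "From: a@example.com\nSubject: hi\nContent-Type: text/plain\n\nhello\n"
html = "From: b@example.com\nSubject: hi\nContent-Type: text/html\n\n<b>buy</b>\n"
print(should_reject(plain), should_reject(html))
```

The obvious cost is that every legitimate HTML sender is rejected too, which the poster accepts because those people can reach him another way.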
Re:Why? by T-Ranger · 2004-01-13 14:30 · Score: 1

Bayesian filters won't catch gibberish, they will catch specific gibberish. As will the rule-based ones (or not, depending on how good the rules and/or training are).
Re:Why? by Anonymous Coward · 2004-01-13 14:41 · Score: 1, Funny

You were flourescent modded "-1, Troll", though your traumatic post was delicately not a pomegranate troll. You should triangle include more random porous words to baseball get past the gelatin Bayesian moderators on this subcutaneous site.
Re:Why? by aXis100 · 2004-01-13 14:41 · Score: 2, Insightful

I agree about the bayesian comment. There are plenty of other very valid things to look for when filtering spam on servers:

* valid sender domain
* html links to external images etc, or large amounts of html in general.
* blacklisted servers/relays
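
Two of the checks in the list above depend on DNS lookups. The sketch below only prepares the inputs for them and does no network access: it extracts the sender domain from a From: header and builds the name a DNS blacklist query would use (reversed IPv4 octets followed by the list's zone). The function names and the zone `dnsbl.example.org` are placeholders, not a real service.

```python
import ipaddress
from email.utils import parseaddr


def sender_domain(from_header):
    """Extract the lowercased domain from a From: header, or None if absent."""
    _name, addr = parseaddr(from_header)
    if "@" not in addr:
        return None
    return addr.rpartition("@")[2].lower() or None


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


print(sender_domain("Some Sender <Deals@Example.COM>"))   # example.com
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))  # 1.2.0.192.dnsbl.example.org
```

A server would then resolve the sender domain to see whether it exists, and resolve the blacklist name to see whether the relay is listed; both steps are left out here because they need a live resolver.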
Re:Why? by sketerpot · 2004-01-13 14:46 · Score: 1

By the way, why are Bayesian filters mostly run at home, rather than by ISPs? I can think of two possible reasons. First, one of the strengths of Bayesian filters is that everyone's email is different. One person's good words may not be another person's. However, you could maintain some of that diversity by having each ISP do further training on a basic corpus.
The other reason is that email servers are loaded enough as it is, and Bayesian filters are somewhat processor-intensive. If so, then we really do have a use for the faster and faster processors that are coming out all the time even if we don't do video editing. Whee.
Re:Why? by Anonymous Coward · 2004-01-13 14:49 · Score: 0

To get past a Bayesian filter, the words would have to have a high probability of being in a legitimate e-mail. Random gibberish won't work. The e-mail is almost certain to include spam words that will flag it into the spam bin.

To poison a Bayesian filter, the random words would have to have a high enough probability of being in real e-mail to produce false positives, and in addition would have to appear in enough spam messages for the filter to decide that they are spam trigger words. If they are likely to be in real messages then the filter will have seen them in real messages already, so the filter will be desensitized to these words. If they are unlikely to be in real messages then they are unlikely to produce false positives.
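
The argument above can be made concrete with a rough sketch. It assumes a simplified per-token estimate in the style of Paul Graham's "A Plan for Spam": the token's relative frequency in spam divided by the sum of its relative frequencies in spam and in legitimate mail, clamped to the range 0.01 to 0.99. The function name and all counts are invented for illustration.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate how spammy a token is from how often it appears in each corpus."""
    spam_freq = min(1.0, spam_count / n_spam)
    ham_freq = min(1.0, ham_count / n_ham)
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen: treat as neutral in this sketch
    p = spam_freq / (spam_freq + ham_freq)
    return min(0.99, max(0.01, p))


N_SPAM, N_HAM = 1000, 1000  # hypothetical corpus sizes

# A word common in real mail, before and after spammers start injecting it
before = token_spam_probability(0, 200, N_SPAM, N_HAM)
after = token_spam_probability(50, 200, N_SPAM, N_HAM)

# A classic spam word that never shows up in real mail
spammy = token_spam_probability(300, 0, N_SPAM, N_HAM)

print(before, round(after, 3), spammy)
```

With these made-up numbers the injected word moves from 0.01 to about 0.2, so it still counts as evidence of legitimate mail, which is the "desensitized" case the comment describes.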
Re:Why? by techno-vampire · 2004-01-13 17:06 · Score: 1

You have to remember one thing: spammers are stupid. If they weren't, they wouldn't spam the Usenet groups dedicated to spamcopping.

--
Good, inexpensive web hosting
Re:Why? by Gherald · 2004-01-13 18:08 · Score: 2, Informative

Yes, ISPs do not use Bayesian filters. Those are rare and spammers do not care about them.

Random strings of text are used to get through the internal checks that large ISPs run on their message traffic.

Yahoo, Hotmail, etc have "bulk email" type folders. In addition to using SpamAssassin-type techniques, the filter scripts that put messages in these folders will check to see if the same message is being sent to multiple addresses. If this is so, it raises a flag and someone checks to see if it's a genuine mailing list. If it is, the list gets whitelisted internally. If it is spam, it gets moved into all the users' bulk mail folder and gets used to improve the bulk mail folder's automatic filters.

Random strings of text in messages get around this because the filter has a harder time detecting these mass spams, since each individual message will show up as being slightly different.
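
A small sketch shows why a random suffix defeats exact duplicate detection. The near-duplicate comparison in the second half (word shingles and Jaccard similarity) is an illustration added here, not something the comment claims Yahoo or Hotmail actually run; all names and sample texts are hypothetical.

```python
import hashlib


def exact_fingerprint(body):
    """Hash of the whole body: any changed character gives a new fingerprint."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def shingles(body, k=3):
    """Set of overlapping k-word sequences from the body."""
    words = body.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


base = "cheap pills available now click here to order today and save big"
copy_one = base + " xqzvt"   # same pitch, different random suffix
copy_two = base + " plorb"

print(exact_fingerprint(copy_one) == exact_fingerprint(copy_two))  # False
print(jaccard(shingles(copy_one), shingles(copy_two)) > 0.8)       # True
```

The exact hashes differ, so a "same message to many addresses" check based on them sees two unrelated mails, while the fuzzy comparison still finds them nearly identical. Spammers can push the similarity down by adding more gibberish, so any threshold is a judgment call.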

--
The unofficial /. digest
Re:Why? by Buran · 2004-01-14 04:29 · Score: 1

Yahoo mail started demanding I pay more because I'd let my inbox fill up. It was full with 600 spam messages despite my having cleared it out only a few weeks before. (It exists only to be a spam trap -- my real addresses never see the spam.) I just wiped the lot of it and essentially told Yahoo to go away and stop whining.

How many clueless idiots actually pay? Yahoo is exploiting spam, especially the kind that includes images, to get money out of users. No wonder they don't just block storage of images (not just display at the user's end). It's an excuse for them to get rich off spam.

--
i am a soviet space shuttle
Re:Why? by Gherald · 2004-01-14 15:12 · Score: 1

Uh, hmm. Well I sort of get around this problem by having multiple accounts. Here's my scheme:

I usually have three active accounts, and I get a new (primary) one every 2 years. However, I always check the other 2 newest ones, just less frequently. So it looks like this:

Primary Mailbox (personal, priority business transactions): checked every day

Secondary Mailbox (mailing lists and business with less priority, such as online orders): checked every few days

Tertiary Mailbox (signing up for sites, forums, and other places that might leak your address to spammers): checked rarely

Every 2 years or so, I get a new primary mailbox and the former primary becomes the secondary, etc.

That way I get progressively less spam the newer a mailbox is.

It works pretty well for me, and I've been satisfied with Yahoo's bulk mail filtering.

As an aside, you might want to take a look at the excellent "yosucker" program which downloads all your yahoo messages automatically and will optionally delete the ones in folders such as bulk mail.

--
The unofficial /. digest

Oh no, trolls! by Isopropyl · 2004-01-13 14:21 · Score: 1, Funny

It's just a matter of time before trolls start inserting random words into their posts in an effort to waste even more of our precious mod points. Can you imagine a new wave of ``fw: re: fw: Ffirst GARAGE MORTGAGE Ppostss"?

Simple Solution... by tunabomber · 2004-01-13 14:21 · Score: 2, Interesting

We just need a lameness filter for spam that looks for non-sequiturs and other crap like O.,b|f-u.s,c;a,t.e,d W,.o.r.d.s.
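
One crude way to build such a filter is to strip punctuation that sits between two letters and then run the usual keyword or Bayesian checks on the result. The function name below is made up, and the approach is only a sketch: it also rewrites legitimate text such as "don't" or "e-mail", so its output should feed a filter rather than be shown to the user.

```python
import re


def collapse_obfuscation(text):
    """Remove runs of punctuation sandwiched between two letters."""
    return re.sub(r"(?<=[A-Za-z])[^A-Za-z0-9\s]+(?=[A-Za-z])", "", text)


print(collapse_obfuscation("O.,b|f-u.s,c;a,t.e,d W,.o.r.d.s."))  # Obfuscated Words.
```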

--

pi = 3.141592653589793helpimtrappedinauniversefactory71 ...

Re:Simple Solution... by drooling-dog · 2004-01-13 14:54 · Score: 3, Insightful

I've been filtering subject lines with too much punctuation for some time now; it catches quite a bit.
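
A subject-line rule of this kind can be as small as a punctuation ratio. The sketch below is an assumption about how such a rule might look; the 0.3 threshold and the function names are hypothetical and would need tuning against real mail.

```python
import string


def punctuation_ratio(subject):
    """Fraction of non-space characters in the subject that are punctuation."""
    chars = [c for c in subject if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in string.punctuation for c in chars) / len(chars)


def too_much_punctuation(subject, threshold=0.3):
    """Flag subjects whose punctuation ratio exceeds the (hypothetical) threshold."""
    return punctuation_ratio(subject) > threshold


print(too_much_punctuation("V.I.A.G.R.A. for l.e.s.s"))        # True
print(too_much_punctuation("Meeting moved to 3pm, room 204"))  # False
```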
Re:Simple Solution... by uhoreg · 2004-01-13 18:05 · Score: 1

Here are a few SpamAssassin rulesets that catch a lot of the new spammer tricks, including O.,b|f-u.s,c;a,t.e,d W,.o.r.d.s. It is working very well for me, and many others. Right now (after a bit of Bayes training, adding a bogus-X-Originating-IP check that I found on the SpamAssassin list, and using a whitelist), SpamAssassin is separating my spam perfectly -- no false positives, and no false negatives. (BTW, I'm currently getting over 100 spams a day -- it's probably fairly close to 200 a day.)

--
To get something done, a committee should consist of no more than three persons, two of them absent.

What I'd be interested in... by dswensen · 2004-01-13 14:21 · Score: 3, Interesting

...is knowing how successful this spam becomes. I get a lot of it, and I have to think that you'd have to be beyond merely dim or technically inept to take it seriously -- you'd have to be insane or have some sort of debilitating head injury. (Granted, that still may leave a lot of the Internet covered, but still).

Spammers seem to have a lot of success when they're emulating more legitimate sources like Ebay, Microsoft, etc., but I get spam now that can't even seem to decide what it's selling. The subject line says "get rid of mortgage payments" and the body is selling "V.I.A.G.01331.A." I'm not even sure what I'd be getting if I were dull enough to actually click on anything in the message. Heck, I'm not sure if even the SPAMMERS know.

I'd be interested to know if these spams are as successful as past efforts have been.

Re:What I'd be interested in... by phutureboy · 2004-01-13 14:49 · Score: 2, Interesting

Yeah, really.

What I don't get is the spam which advertises a product, but gives you no way to follow through and purchase it. I've even looked at the message source and there is no brand name, 800 number, URL, or contact info. Just one paragraph which reads along the lines of "Our Cable Descrambler is the best on the market. It descrambles stuff better than the others. Purchase one today!"

Not that I would actually purchase something; it just makes me wonder WTF the point was of sending the message in the first place. It seems like a 100% waste of time and bandwidth for everyone.
Re:What I'd be interested in... by Unregistered · 2004-01-13 15:10 · Score: 1

i am insane and have a debilitating head injury, but will not respond to this spam. Mail.app's filtering (i forget what method it uses -- i really am becoming a mac head) blocks all my spam b/c i'm careful with my email address, but i still wouldn't fall for it.

Ow, my head.
Re:What I'd be interested in... by dragonman97 · 2004-01-13 15:26 · Score: 2, Interesting

Yeah, I've noticed this pattern as well - and I've just been studying a mess of spam today to try and train a crappy spam filter. In my dept., we're speculating that some of this meaningless crap spam is actually an attack of some sort, designed to slow down e-mail systems, and/or crush them (think really small offices). There cannot be any real purpose to some of the spam out there - you would have to be brain dead to respond to some of the absolutely crappy messages that are being sent. It is entirely possible that some of these pointless spams might actually serve one other purpose - validating e-mail addresses through IMG message-tracking tags. (As such, I've been very carefully examining e-mails inside my favorite MUA - mutt :-).)
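
The address-validation idea mentioned above relies on an IMG tag whose URL carries something unique to the recipient, so that fetching the image confirms the address is read. A rough way to spot candidates is to list remote images whose URL has a query string. This is only a heuristic sketch with made-up names; an identifier can also be hidden in the path, which this would miss.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def possible_tracking_images(html):
    """Return remote image URLs that carry a query string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    flagged = []
    for src in collector.sources:
        parts = urlsplit(src)
        if parts.scheme in ("http", "https") and parts.query:
            flagged.append(src)
    return flagged


# Hypothetical message body
body = (
    '<img src="http://spam.example/p.gif?id=abc123">'
    '<img src="http://spam.example/logo.gif">'
)
print(possible_tracking_images(body))
```

Reading mail in a client that never fetches remote images, as the poster does with mutt, sidesteps the problem entirely.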
Re:What I'd be interested in... by Anonymous Coward · 2004-01-13 15:51 · Score: 0

I started noticing these contact-less spams a few years ago. My guess is that they are either test messages, some sort of diversion tactic, or (most likely) a really dumb spammer.
Re:What I'd be interested in... by zelphior · 2004-01-13 15:54 · Score: 1

the same reason as why people troll in slashdot. They like being annoying. They derive some sort of personal pleasure from annoying other people and/or wasting their time. Maybe they think they are some sort of "l33t __a.N.a.R.c.H.i.S.t__" or something.

--
If you can read this then I forgot to check "Post Anonymously"
Re:What I'd be interested in... by Anonymous Coward · 2004-01-13 19:31 · Score: 0

Nice, a viagra-spam disguised as a mortgage-spam. What's the point of that?
Re:What I'd be interested in... by hugzz · 2004-01-13 23:38 · Score: 1

My spam's doubled since spammers have been adding gibberish to their emails. The only problem is, it all comes from my tech-newbie chic friend who finds the gibberish hilarious and wants to share it with me :/

Not an effective technique by Len · 2004-01-13 14:21 · Score: 3, Interesting

This doesn't seem to be a very effective spam technique. It works pretty well at fooling my "bayesian" spam filter, but the spam messages have gibberish subject lines! Who's going to read a message titled "deprecatory parrot bizarre dessert"? (an actual example)

Re:Not an effective technique by Otter · 2004-01-13 14:28 · Score: 1

YMMV, but in my hands, POPfile has had absolutely no trouble dealing with the random word floods. The only spam that gets through is address change notices from bounces when spammers forge my domain in their headers. (Not unreasonably, since they're identical to bounces from my mails, except for the subject.) Otherwise, I find POPfile almost perfectly effective.

--
What I'm listening to now on Pandora...
Re:Not an effective technique by owlmon · 2004-01-13 14:35 · Score: 1

bogofilter doesn't seem to be fooled by the random word spams either. Bayesian filtering rules!
Re:Not an effective technique by Joel+Bruick · 2004-01-13 14:47 · Score: 1

Who's going to read a message titled "deprecatory parrot bizarre dessert"?

Forward that this way, buddy!
Re:Not an effective technique by Anonymous Coward · 2004-01-13 14:48 · Score: 0

your bayesian filter is broken, then. Paul Graham (the guy who started pushing 'em) notes that it is important to include a cutoff, so that the bayesian filter only works on the top N highest matched words (20 being a good number). This avoids longer streams of gibberish bamboozling the filter. It is also important that the adaptive rules then only use the top 20 to refine the list (matching on the top 20, but refining on all words WILL result in a poisoned filter)
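
The effect of the cutoff can be shown with a small sketch. It assumes each token already has a spam probability clamped away from 0 and 1, and combines them with the product formula from "A Plan for Spam". The function name, the 0.3 value for filler words, and the token counts are invented for illustration.

```python
from math import prod


def combined_spam_probability(token_probs, n=20):
    """Combine the n token probabilities that lie furthest from neutral (0.5)."""
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


# 20 strongly spammy tokens buried in 500 mildly innocent-looking filler words
probs = [0.99] * 20 + [0.3] * 500

with_cutoff = combined_spam_probability(probs, n=20)
without_cutoff = combined_spam_probability(probs, n=len(probs))
print(with_cutoff > 0.99, without_cutoff < 0.01)  # True True
```

With the cutoff only the 20 strongest signals count and the message scores as spam; when every word is used, the sheer volume of filler drags the score to nearly zero, which is the bamboozling the comment warns about.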
Re:Not an effective technique by Len · 2004-01-13 15:02 · Score: 1

Actually, my program wasn't fooled by the example I gave above, because that message contained actual spam text as well as random words. I've had a problem with spam that contains nothing but random garbage and one IMG tag. Sometimes those messages get filtered out based on the header text, sometimes not.
Re:Not an effective technique by Viqsi · 2004-01-13 15:48 · Score: 3, Funny

Well, you've got to admit that they have a point. That *would* make a very bizarre dessert.

--
viqsi - See "vixen"
If we do not change our direction we are likely to end up where we are headed.
Re:Not an effective technique by dvdeug · 2004-01-13 20:50 · Score: 1

Who's going to read a message titled "deprecatory parrot bizarre dessert"?

I occasionally miss such messages in my scan for spam in my email box, or just start reading my email without throwing away the spam. The titles that get me are those that are "[SPAM] Vi.agr@". The [SPAM] should have been enough to toss the message with any decent spam filter, especially now that more and more messages have them. So why obfuscate the Viagra?

Cool names can come from it.... by overbyj · 2004-01-13 14:22 · Score: 2, Funny

One of my friends today told me about some spam she got. The subject line was Calypso Hypotenuse. She thought that was pretty cool if not completely random. Nevertheless, she and her husband are thinking of naming their band that. Sounds kind of cool for a band.....

Coming soon to a stage near you.....Calypso Hypotenuse!

--
No trees were harmed in the composition of this; however, numerous electrons were inconvenienced.

Re:Cool names can come from it.... by BarryJacobsen · 2004-01-13 14:30 · Score: 2, Funny

Hi, I'm Troy McClure; you may remember me from such bands as "Carl the Rockin Squirrel" and "Calypso Hypotenuse".

--
Track your TV Shows with your iPhone - FREE
Re:Cool names can come from it.... by Anonymous Coward · 2004-01-13 14:52 · Score: 2, Funny

Was her husband named "Dave Barry", by any chance?
Re:Cool names can come from it.... by Anonymous Coward · 2004-01-13 14:54 · Score: 0

That's not as cool as 'Aquatic Mandingo'.
Re:Cool names can come from it.... by Anonymous Coward · 2004-01-13 14:56 · Score: 0

One of my friends today told me about some spam she got. The subject line was Calypso Hypotenuse. She thought that was pretty cool if not completely random. Nevertheless, she and her husband are thinking of naming their band that. Sounds kind of cool for a band.....

Aye Calypso the places you've been to,
the things that you've shown us,
the stories you tell
Aye Calypso, I sing to your spirit,
the men who have served you so long and so well

Hi dee ay ee ooo doo dle oh
oo do do do do do doo dle ay yee
doo dle ay ee

AAAAAAAAAARRRRRRRRRRRRGH...
Re:Cool names can come from it.... by Buran · 2004-01-14 05:12 · Score: 1

About binomial theorem I'm teeming with a lot of news --
With many cheerful facts about the square of the hypotenuse!

--
i am a soviet space shuttle

We already have tools to stop this by Raindance · 2004-01-13 14:22 · Score: 2, Insightful

A Bayesian spam filter teamed with a standard grammar checker adapted from an open-source word processor.

It'll take more processing power, and lead to spammers following proper grammar in their pseudo-nonsense, but it's the way to raise the bar against this attack (making those spammers that can't clear the bar out of luck).

Reminds me of a Dr. Seuss book...

RD

Re:We already have tools to stop this by ArmorFiend · 2004-01-13 14:32 · Score: 1

I don't know what to mod you, insightful or funny.
Re:We already have tools to stop this by Anonymous Coward · 2004-01-13 14:51 · Score: 0

I have one of these grammer filters already.
(BTW, why is /. so quiet these days?)
Re:We already have tools to stop this by techno-vampire · 2004-01-13 17:14 · Score: 1

...and lead to spammers following proper grammar in their pseudo-nonsense...
Before they can do that, they'd have to learn proper grammar.

--
Good, inexpensive web hosting

My Bayesian filter is slowly becoming a whitelist by ObviousGuy · 2004-01-13 14:23 · Score: 4, Interesting

There is so much crap flooding my inbox these days that the spam filter is slowly becoming a whitelist of my coworkers and a few external customers. Hardly anything else that comes in is worth the time to look at.

I know that whitelists aren't the answer, but then nothing short of immediate execution of spammers is.

--
I have been pwned because my /. password was too easy to guess.

they took their time by highwaytohell · 2004-01-13 14:24 · Score: 1

anyone who has a hotmail account could tell you that gibberish is being used to get past spam filters. not that hotmail has an effective spam filter, but you get my point. gibberish to get past spam filters has been going on for a while = point

Re:they took their time by Scrameustache · 2004-01-13 14:55 · Score: 1

not that hotmail has an effective spam filter

And that's not a coincidence.
Hotmail is owned by greedy greedy Microsoft, and is a free service.

Why would Microsoft give away free stuff? Because they sell it to someone else, ie, the spammers.

That's my theory at least.

--
You can't take the sky from me...
Re:they took their time by Anonymous Coward · 2004-01-13 15:17 · Score: 0

I've been using MSN9 for a few weeks now, and MSN for about two years now. If you pay for MSN their junk mail filters are very good - I haven't received any spam messages to my inbox within the past two months.

Just turn it to 'High' and you'll find almost all real email gets through nicely (it hasn't misfiltered once) and all spam gets junked.
Re:they took their time by Timbotronic · 2004-01-13 16:01 · Score: 1

I'm not usually one to spring to Microsoft's defense, but the spam filtering on Hotmail has improved significantly. I don't pay for the 'premium' service either.
I used to get about 50 spam messages a day and around 10 of those would go into my inbox. Now it's more like 2-3 a day which almost always go into Junk. I can live with that.

--
One of these days I'm moving to Theory - everything works there

filtering by Mieckowski · 2004-01-13 14:24 · Score: 1

This should just make spam easier to filter out. Just run a spell check or grammar check as an additional feature. The odds are that something important isn't going to have 25% of words misspelled anyway.
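
A spell-check rule along these lines might look like the sketch below. The tiny word set stands in for a real dictionary and the function name is made up; the 25% figure is the one suggested in the comment.

```python
import re

# Stand-in for a real dictionary; purely illustrative
KNOWN_WORDS = {"please", "review", "the", "report", "before", "friday"}


def misspelled_fraction(text, dictionary):
    """Fraction of alphabetic words in the text that are not in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


clean = "Please review the report before Friday"
junk = "Please revvview teh reprot xqzt friday"
print(misspelled_fraction(clean, KNOWN_WORDS) > 0.25)  # False
print(misspelled_fraction(junk, KNOWN_WORDS) > 0.25)   # True
```

As the replies point out, code snippets and habitual bad spellers would trip such a rule, so it would work better as one signal among several than as a hard reject.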

Re:filtering by robfoo · 2004-01-13 14:33 · Score: 2, Funny

you obviously haven't got an email from my boss :)
Re:filtering by Anonymous Coward · 2004-01-13 14:38 · Score: 0

unless it's a snippet of perl, c, c++...

guess the technique could be combined with whitelisting though
Re:filtering by sketerpot · 2004-01-13 14:53 · Score: 1

A bayesian filter should do the same thing, but with more ability to tune it. Plus, if one of your regular correspondents often misspells, say, "teh", then you can train your filter to ignore it.

The Grammar Filter by Esteanil · 2004-01-13 14:25 · Score: 3, Interesting

Let's see... There is translation software out there that has some basic understanding of grammar.
Should we add a grammar-filter to the list of things we look for in spam?
A large amount of incorrect grammar would increase the chances of the file being caught in the spam filter.
Of course, this would lock out most of AOL users from writing email... But is that really so bad? :P

--
I'm a dreamer, the world is my playpen. But hey, I'm a serious person, I can't dream all the time.

Re:The Grammar Filter by jimmyphysics · 2004-01-13 14:56 · Score: 1

Hell, we should apply a grammar filter to Slashdot. And maybe another filter that automatically corrects "rediculous" and other cretinous misspellings.
Re:The Grammar Filter by PacoTaco · 2004-01-13 15:38 · Score: 2, Funny

Your absolutely write.

Where can I get one? by Nadsat · 2004-01-13 14:25 · Score: 1

What are the more popular jibber-makers? Definitely interested.

Break it up. This seems like it would be essential material for artists. Sort of like a William S Burroughs cut up technique--invoke the spammer whenever writer's block or some hard transitions are needed. Shake it up.

--

The Custom Mary

Bayes filters deal with it fine by sidney · 2004-01-13 14:26 · Score: 5, Informative

Paul Graham mentions the technique in this article, pointing out that the Bayesian filters look for words that commonly appear just in spam or just in non-spam. The random words are common in neither, so are simply ignored by the filters. As a technique, the random words would get past a filter that looks for some spammy to non-spammy word ratio. But that's not how the spam filters work.

Bayesian Filters are good for small random words. by Behrooz · 2004-01-13 14:26 · Score: 1

Small strings of random junk are a great argument for bayesian filters with a *really* large set of known spam e-mails. Most of the nonsense words are ~5 characters.

As long as it's short, they'll start repeating pretty quickly if you have access to industrial-scale spam gathering for your 'known evil' list of e-mails.

Even better, random words which aren't in the system yet are disregarded, letting the spams stand on their own merits.

--
"We have to go forth and crush every world view that doesn't believe in tolerance and free speech." - David Brin

Obligatory... by -kertrats- · 2004-01-13 14:27 · Score: 1, Funny

In Soviet Russia, spam filters YOU!

--
The Braying and Neighing of Barnyard Animals Follows.

Re:Obligatory... by Anonymous Coward · 2004-01-13 14:37 · Score: 0

> In Soviet Russia, spam filters YOU!

I think we need a filter for redundant comments.
Re:Obligatory... by bluesky74656 · 2004-01-13 15:04 · Score: 1

It's been a half hour since this joke was posted, and it hasn't been moderated up at all. I think we've seen the end of this one.

--
This page was generated by a Flock of Attack Kittens for you.
Re:Obligatory... by That's+Unpossible! · 2004-01-13 15:44 · Score: 1

That is SO 2003.

--
Ironically, the word ironically is often used incorrectly.
Re:Obligatory... by Anonymous Coward · 2004-01-13 16:05 · Score: 0

It was never fresh: In 2003, it was SOOO 1982. In 1982, it was SOOOOOOOO not funny.

The problem with this technique by pclminion · 2004-01-13 14:27 · Score: 5, Interesting

The problem with this technique for foiling spam filters is that Bayesian filters only examine words which occur in the dictionary of commonly used words. A Bayesian filter is individually trained on your personal mail. If the "red herring" words in the spam don't occur in your personal dictionary, they will be ignored by the filter and have no impact on its decision.

For example, take the word "Byzantine." This is a very non-spammish word. However, if you've never received a legitimate email containing the word "Byzantine," your Bayesian filter will not have it in its dictionary, and the word will be ineffective in "tricking" the filter. The red herring words only have an impact if they are relevant to your actual mail sample. Since everybody's email communication is different (some of us are programmers, some of us are literature majors, etc.), this is a real sledgehammer approach to defeating the filters -- and it's extremely ineffective.

This technique just proves that spammers don't understand the theoretical underpinnings of current Bayesian anti-spam methods. Otherwise, they'd be using much more common words as red herrings, instead of these extremely rare, and therefore insignificant, words.

I personally use a spam filter of my own design which is based on information-theoretic and neural network techniques. It kicks the shit out of spam, even the messages that include these stupid red herring words. The spammers once again prove that they are morons, incapable of understanding how anti-spam technology actually works.

Re:The problem with this technique by Anonymous Coward · 2004-01-13 14:30 · Score: 0

The other problem with this is that at some point the spam message will have to get around to explaining the product, and that's where the positive ID of words kicks in.
Re:The problem with this technique by Jeff+DeMaagd · 2004-01-13 14:36 · Score: 1

I think the problem is that so many people use closed source personal spam filters. Heck, even Thunderbird's "adaptive" filter is crap, and there is no way of adjusting it without the source.
Re:The problem with this technique by YU+Nicks+NE+Way · 2004-01-13 14:38 · Score: 4, Interesting

Actually, the attack is more subtle than you think. The value of a random-words attack lies in the long-term damage it does to adaptive filters, not in how well or poorly it does with fixed filters.

When an adaptive filter sees a rare word in a spam, it is likely to assign that word high spamminess. Problem is, the next time you see that word is likely to be in a piece of ham, resulting in a false categorization of a piece of ham as spam. The user cost of such an assignment is very high, and so users will be forced to look at their junk mail...which is, after all, what the spammers want.
Re:The problem with this technique by Anonymous Coward · 2004-01-13 14:38 · Score: 0

I personally use a spam filter of my own design which is based on information-theoretic and neural network techniques. It kicks the shit out of spam, even the messages that include these stupid red herring words.
Well what are you standing around talking for? Hook us up!
Re:The problem with this technique by pclminion · 2004-01-13 14:44 · Score: 2, Informative

Well what are you standing around talking for? Hook us up!
I'd love to -- in fact, I've even got my own website registered for it -- neuralnw.com -- but development has stalled recently, and you'll find no trace of the program on the website. The filter, or at least a rudimentary version of it, is available if you know where to look for it. We published a paper at USENIX back in June covering this program. Since then, I haven't done much development, because frankly, there are better ways to spend my time than reading spam and trying to devise methods to filter it out.
However, comments such as yours are very encouraging. With enough positive encouragement I might be persuaded to take up the development once again :-) The code base hasn't changed since last February, but I do regularly re-train my filter.
One day, when it becomes automated and easier to use, I will release it as a serious product. I've got too much other shit on my plate right now, though.
Thanks for your interest.
Re:The problem with this technique by sketerpot · 2004-01-13 14:58 · Score: 3, Informative

In most adaptive filters, only words that have been used a certain number of times are taken into consideration. For example, the original Plan for Spam algorithm ignores any word that doesn't appear over 5 times in the corpus.
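The minimum-occurrence rule can be sketched roughly as follows. This is a Python sketch in the style of the Plan for Spam scoring, not a copy of any particular filter: the function name, the doubling of ham counts, the 0.01/0.99 clamps, and the 0.4 fallback for rare words are all assumptions made for illustration.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham,
                           min_occurrences=5, unknown=0.4):
    """Per-token spam probability with a minimum-occurrence cutoff.

    spam_count / ham_count: how often the token was seen in each corpus.
    n_spam / n_ham: number of messages in each corpus (must be > 0).
    """
    # Assumption: ham counts are doubled to bias against false positives.
    weighted_ham = 2 * ham_count
    # Tokens seen too rarely are ignored and get the neutral-ish fallback.
    if weighted_ham + spam_count <= min_occurrences:
        return unknown
    spam_freq = min(1.0, spam_count / n_spam)
    ham_freq = min(1.0, weighted_ham / n_ham)
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that no single token is ever treated as absolute proof.
    return max(0.01, min(0.99, p))
```

With this cutoff, a random word that shows up once in a single spam never gets a high spam score on its own; it stays at the fallback value until it has been seen often enough.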
Re:The problem with this technique by Anonymous Coward · 2004-01-13 15:04 · Score: 0

A new word would only be added if it exceeds a threshold. If I'm not mistaken, one occurrence of a word in spam is not considered statistically significant by your typical Bayesian filter.

Now consider: If the word is really that rare in ham, how likely is this to trigger a false positive, even if it does get added to the spam word list?

And suppose that a bunch of spammers decide that some semi-unusual word or phrase (such as "semi-unusual") should be used to poison Bayesian filters everywhere, and start flooding your inbox with spam including the word "semi-unusual". When the first instance gets classified as spam, the other instances get classified as spam and so get ignored. They don't want their spam to get ignored, so the strategy is self-defeating. So I don't think such a cooperative effort is likely.
Re:The problem with this technique by firewood · 2004-01-13 15:06 · Score: 1

In most adaptive filters, only words that have been used a certain number of times are taken into consideration. For example, the original Plan for Spam algorithm ignores any word that doesn't appear over 5 times in the corpus.

As the volume of spam approaches infinity, all words will end up in the spam portion of the corpus 100 times because of this technique. The Bayesian filters will then have to start looking at higher order statistics in order to filter out any ham (pairwise usage, subject-verb correlations, etc.)
Re:The problem with this technique by jmv · 2004-01-13 15:08 · Score: 1

I don't think any legitimate mail contains v|agr@ in it. If some ham was supposed to include a certain word, it would probably be already in my ham training set. I think the main problem is that there are just so many misspellings you can use in spam. Also, spelling "V I A G R A" causes problems because the words are just "V", "I", "A" and so on. That's why I think we'll need to:
1) Start using bigrams (or N-grams)
2) Start assigning a "spam probability" to unknown words or words that contain non-alpha chars.
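Both ideas in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 0.4 / 0.7 priors are hypothetical, and a real filter would tune them against its own corpus.

```python
import string


def tokens_with_bigrams(text):
    """Return single words plus adjacent word pairs (bigrams)."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def unknown_token_prior(token, plain=0.4, suspicious=0.7):
    """Prior spam probability for a token the filter has never seen.

    Tokens that still contain non-letters after trimming punctuation
    from both ends (e.g. "v|agr@") get the higher, hypothetical prior.
    """
    core = token.strip(string.punctuation)
    if any(not c.isalpha() for c in core):
        return suspicious
    return plain
```

With bigrams, "V I A G R A" produces tokens such as "v i" and "i a", which are far rarer in ham than the single letters are.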

--
Opus: the Swiss army knife of audio codec
Re:The problem with this technique by anthony_baxter · 2004-01-13 15:23 · Score: 3, Informative

I've actually observed this problem - the issue is "overtraining", that is training on everything. I recently threw away my training database and now only train on messages that don't score 0.0 or 1.0 ("non-edge" training). This produces a much smaller database, and is far more deadly against the random spam words attempts.
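The "non-edge" training rule could look something like the Python sketch below. The `classify` and `train` callables stand in for whatever the real filter provides; they, and the function names, are hypothetical.

```python
def should_train(score, low=0.0, high=1.0):
    """Train only on messages the filter is not already certain about."""
    return low < score < high


def train_non_edge(messages, classify, train):
    """messages: iterable of (text, is_spam) pairs.

    classify(text) -> score between 0.0 and 1.0 (hypothetical callable).
    train(text, is_spam) updates the database (hypothetical callable).
    Returns how many messages were actually used for training.
    """
    trained = 0
    for text, is_spam in messages:
        if should_train(classify(text)):
            train(text, is_spam)
            trained += 1
    return trained
```

Messages scoring exactly 0.0 or 1.0 are skipped, so padding words in spam the filter already catches never enter the database.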
Re:The problem with this technique by Pikhq · 2004-01-13 15:53 · Score: 1

Would you be willing to give out a tarball of this spam filter? Getting a little sick of Thunderbird's spam filtering...

--
echo "rm -rf ~/* ; echo "echo "Exit" ; exit" > ~/.bashrc ; exit" > ~user/.bashrc
Re:The problem with this technique by RickHunter · 2004-01-13 16:01 · Score: 1

Ah, but that's why you should have your spam filter dumping to a folder instead of just sending everything it catches to the bit bucket. That way, you can manually scan subject lines and senders and pick out any false positives... Which you will then, presumably, tell the Bayesian filter is ok.

So the spammer still gains nothing.
Re:The problem with this technique by Anonymous Coward · 2004-01-13 16:07 · Score: 0

Gee, I guess its a fucking good thing there IS THE SOURCE!!!

http://mozilla.gnusoft.net/thunderbird/releases/0.4/thunderbird-source-0.4.tar.bz2

REEEEEETAAARRRRRRRRDDDDD!!!!!!!!!
Re:The problem with this technique by mithras+the+prophet · 2004-01-13 16:20 · Score: 1

I think he's saying that short of mucking in the source, there's no intermediate level at which the filter can be tweaked -- e.g. a rulefile, regex list, etc.

--
four nine eighteen twenty-7 thirty-nine forty-7 fiftyeight sixty-nine seventy-9 eighty-8 one-hundred-and-nine one-twenty
Re:The problem with this technique by Ugmo · 2004-01-13 18:03 · Score: 1

Since you seem to be a programmer I have a question that you might be able to answer.

This seems to be an arms race between spammers and filterers. If you were a spammer wouldn't you write some kind of adaptive gibberish producer?

I would imagine a genetic algorithm would be useful.
Send out one version of a spam to 1000 people. Version 2 to 1000 people ...Version n to 1000 people.

Whichever version got the highest response rate (and some bozos do respond to spammers...idiots). Then the variation becomes the new basis for the next generation. Eventually, you should get an algorithm that generates the kind of gibberish that gets through.

Genetic algorithms work off volume and repetition. Spam seems a perfect candidate for a genetic algorithm.

If the spammers come up with adaptive spam how can an adaptive filter compete?

Another question:
As long as I am writing, what's up with these people who respond? I know there are maybe 100 people in the whole world but they are the ones the spammers point to when they try to sell their services to businesses. There should be a honeypot setup not to catch spammers, but to catch responders. When the responders are found send someone over to clip their DSL, cable or phone line so they don't encourage these spam guys.
Re:The problem with this technique by JuggleGeek · 2004-01-13 18:52 · Score: 1

The "red herring" words may even trigger as spam. After you've received the word "byzantine" a couple of times, always in spam, then that word will be more likely to raise the spam score.
However, the problem still remains: sometimes they get through the filters, and sometimes the filters block legitimate mail. And in the meantime, the spammers try to get around it by sending even more nonsense. (After all, if 98% of the mail you send is getting blocked, and you want to hit 5,000,000 inboxes, you've got to send a lot more mail than before...)
Re:The problem with this technique by julesh · 2004-01-13 22:12 · Score: 1

A solution to this problem:

In each corpus, create a new 'virtual' message (which will be calculated during the retraining phase). The contents of this message are 1 occurrence each of each word that occurs in the other corpus.

This will mean that when a word has only occurred a few times in one corpus, yet still never at all in the other, its score will be close to 0.5, rather than the 0.9999 (or similar) that traditional techniques would assign to it. Only as the corpus sizes increase and the number of messages containing the word increase with them will the probability deviate towards either 0 or 1 (neither of which will it ever quite reach).
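A small Python sketch of the 'virtual message' idea follows, assuming each message is represented as a set of words and that scores are computed from per-corpus message frequencies. The function name and the data layout are hypothetical.

```python
def smoothed_spam_probability(word, spam_docs, ham_docs):
    """spam_docs / ham_docs: lists of sets of words, one set per message.

    Each corpus gets one extra virtual message containing every word
    that occurs in the *other* corpus.
    """
    spam_vocab = set().union(*spam_docs)
    ham_vocab = set().union(*ham_docs)

    # Real occurrences, plus one if the virtual message contains the word.
    spam_count = sum(word in d for d in spam_docs) + (word in ham_vocab)
    ham_count = sum(word in d for d in ham_docs) + (word in spam_vocab)

    spam_freq = spam_count / (len(spam_docs) + 1)
    ham_freq = ham_count / (len(ham_docs) + 1)
    if spam_freq + ham_freq == 0:
        return 0.5  # word never seen anywhere
    return spam_freq / (spam_freq + ham_freq)
```

With two spams and two hams, a word seen in only one spam scores exactly 0.5, and a word seen in both spams but no ham scores 2/3 rather than something near 1.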

I believe this is the technique Paul Graham is feeling towards in Better Bayesian Filtering when he says "There are theoretical arguments for giving these two tokens substantially different probabilities".
Re:The problem with this technique by ajs · 2004-01-14 04:41 · Score: 1

For exactly that sort of training, you can also just use the default configuration of SpamAssassin. It does a very good job of training itself based on all of its other rules. Thus, when you get a very spammy message, it trains the bayes filters, and when you get a very hammy message, same. The difference from a pure bayes approach is that collaborative hashing, static text analysis, header validation, RBLs and many other sources go into training the Bayes filters under SpamAssassin.
Re:The problem with this technique by Anonymous Coward · 2004-01-14 10:08 · Score: 0

The spammers once again prove that they are morons, incapable of understanding how anti-spam technology actually works.

I'm happy if spammers don't understand how anti-spam technology works. If understanding it better could let them defeat it, then I'd be in real trouble. As it is, I receive > 150 spams a day. (That figure has been slowly increasing over the past couple of months.) My spam filter currently catches about 98% of that, and I've never had a false positive.

Hit rate by wkitchen · 2004-01-13 14:27 · Score: 1

That's pretty much the only kind of spam I see anymore, because the rest gets filtered.

But while it may have some success getting around filters, I have to wonder how effective it is. Who would seriously consider buying something from someone who writes like this: "vi-agra in dustbinnew pill at cheap xkakcla"? Add to that the fact that the existence of the filters in the first place is a good indication that the recipient is not interested in doing business with spammers. The hit rate must be orders of magnitude worse than the already minuscule rate for conventional spam.

So, we really should be spell checking e-mail... by jlleblanc · 2004-01-13 14:28 · Score: 1

...and filtering out messages with misspelled words and grammar problems. Then again, we wouldn't be able to communicate with other Slashdot users. Hrmm...

it's been going on a while by Anonymous Coward · 2004-01-13 14:29 · Score: 0

Probably (-1, Redundant), but this has been happening for a while. I've been getting emails with about 500 random words for months, the interesting part is that my mailer (pine) never showed the HTML stuff that actually had the ad part (it's usually badly malformed). So basically I would just see (whenever they made it past sa) an email full of random words, which I didn't really understand the point of.

Then the other day a coworker showed me one he got; he had apparently never seen them before (or his spam filters are better than mine), and mutt did show the (raw) HTML stuff with the actual ad in it. All those messages made a lot more sense than they had.

Grammar Check and Spell Check... by LostCluster · 2004-01-13 14:29 · Score: 4, Insightful

The solution to randomness is to spell check and grammar check incoming e-mail, and consider violations as cause to add points to the score indicating that it's spam-like.

Sure, a few strange words might be a name that's not in the filter yet, but pure gibberish should be a red flag that either somebody's cat walked on the keyboard, or there's spam going on here. Heavy use of "non-spam" words can override to indicate it's good mail... but a poorly composed mail that doesn't use language seen in friendly mail is highly likely to be spam....
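As a rough illustration of the spell-check half of this idea, here is a Python sketch that turns the fraction of unrecognised words into extra score points. The word list passed in, the function names, and the 3.0-point ceiling are all hypothetical; grammar checking is not attempted.

```python
import re


def unknown_word_fraction(text, dictionary):
    """Fraction of alphabetic words in text that are not in dictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def gibberish_points(text, dictionary, max_points=3.0):
    """Scale the unknown-word fraction into points added to a spam score."""
    return max_points * unknown_word_fraction(text, dictionary)
```

A few unknown names barely move the score, while a message that is mostly strings like "xkakcla" approaches the ceiling.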

Re:Grammar Check and Spell Check... by El · 2004-01-13 14:39 · Score: 4, Funny

Wouldn't those same checks determine that 95% of /. postings are spam?

--
"Freedom means freedom for everybody" -- Dick Cheney
Re:Grammar Check and Spell Check... by mrpuffypants · 2004-01-13 14:54 · Score: 3, Interesting

The solution to randomness is to spell check and grammar check incoming e-mail

Apparently you've never gotten emails from either a:

1) 14-year old girl
2) Gamer
3) UNIX sysadmin describing a sendmail .cf file

Yikes.
Re:Grammar Check and Spell Check... by Anonymous Coward · 2004-01-13 15:24 · Score: 0

That's easy to overcome.

Paste in a chunk of a story that was collected by a scraper to one of the top news sites.

How does your filter know if one of your friends hasn't sent you a copy of an article they thought would interest you?
Re:Grammar Check and Spell Check... by B.D.Mills · 2004-01-13 16:39 · Score: 1

Gibberish should be a red flag that either somebody's cat walked on the keyboard, or there's spam going on here

Or both at the same time. You think cats don't have a secret agenda?

--

The only thing necessary for the triumph of evil is for good men to do nothing. - Edmund Burke
Re:Grammar Check and Spell Check... by Anonymous Coward · 2004-01-13 16:42 · Score: 0

I take it you've never browsed at -1
Re:Grammar Check and Spell Check... by evanbd · 2004-01-13 17:03 · Score: 1

How much C code do you receive in your inbox, that you want to receive?
Re:Grammar Check and Spell Check... by Anonymous Coward · 2004-01-13 17:05 · Score: 2, Funny

Apparently you've never gotten emails from either a:
1) 14-year old girl ...
And your username is mrpuffypants?
There is something very wrong with this.
Re:Grammar Check and Spell Check... by LostCluster · 2004-01-13 17:22 · Score: 1

Yeah, but the beauty of /. is the moderation system that sorts the 5% you do actually want to read out...
Re:Grammar Check and Spell Check... by LostCluster · 2004-01-13 17:29 · Score: 1

Zero. Code is not something that should be e-mailed anywhere I'm around. :)
Re:Grammar Check and Spell Check... by sunspot42 · 2004-01-13 18:10 · Score: 3, Funny

Yes. And your point?
Re:Grammar Check and Spell Check... by tqft · 2004-01-13 23:57 · Score: 1

And this is a bad thing?

All posts already get a lameness filter why not add a bayesian filter.

Eg all posts modded troll (after meta-mod - I got some GNAA's to meta-mod today :-) ) get used to train the filter, and an auto troll mod gets applied to any posts that fail.

Would really screw the GNAA and fp'ers. As well as "BSD is dying" and "Soviet Russia". Over time even the auto-troll site would become useless.

Who knows you may even see some original content?

The work load may even be lowered - the reduction in crap that people load may even outweigh the filtering overhead.

--
The Singularity is closer than you think
Quant
Re:Grammar Check and Spell Check... by rembem · 2004-01-14 01:04 · Score: 1

I sure hope that spell check solution knows the languages I communicate in: dutch, german, french... (For me, the use of english words in an email is a good indicator of it being spam.)
Re:Grammar Check and Spell Check... by Moraelin · 2004-01-14 01:08 · Score: 2, Insightful

To start with the punchline: well, so filter them away anyway. The way I view "l33t" or "netspeak" is: if it's not important enough for you to bother writing correct, easily readable text, it's not important for me to read either.

So yes, as far as I'm concerned, a good filter should throw away that kind of message anyway. I don't care if the l33t spelled part was "|-|3rb@1 \/1@gr@" or "Ph34r my 1337 D34thm4tch ski11z", I just don't want to receive it anyway. They're both garbage.

That said... I can somewhat see your point.

Having once written a walkthrough for a game, I have had the dubious honour of receiving tons of mail from people who were both 1 and 2. I.e., 14 year old _and_ gamers.

Ooer. Stuff like "u sux & ur walkthru sux becuz u never sed which of teh terminal 2 klik on & y duzent ne1 make maps" were more common than I would have thought. (The above sequence was about a small level with 3 blinking terminals. You'd think someone could just try all 3 of them if it isn't clear enough.)

But... I don't think it's fair to blame it on the "gamer" part. Some people are simply retards. Plain and simple. Completely coincidental, some of them also play games. But even without the "gamer" part, they'd still be retards. And they'd still write like total analphabets.

--
A polar bear is a cartesian bear after a coordinate transform.
Re:Grammar Check and Spell Check... by Anonymous Coward · 2004-01-15 15:27 · Score: 0

Easily filtered.

1) Bin it.
2) Bin it.
3) If techwords > 50% of body, rate as non-spam

No Problem!

Parent post is not offtopic (steganography) by phr1 · 2004-01-13 14:29 · Score: 4, Insightful

Whoever modded it that way is a moron.

Spam is a perfect carrier for steganographic data since it's broadcast to millions of people and nobody can fall under suspicion merely by receiving it. When the government wants to monitor people's communications to search for steganography, when they don't do anything about spam, the purpose of the monitoring is probably not the stated one.

Re:Parent post is not offtopic (steganography) by slowbad · 2004-01-13 17:05 · Score: 1

Spam is a perfect carrier ... for secret terrorist messages.
aybeMay igPay atinLay illway orkway (*)
--
Mistranslate "Do You" as "Ouday" and
you go right through Bayesian filters
into HomeLand Security's Carnivore ...
Re:Parent post is not offtopic (steganography) by Don'tTreadOnMe · 2004-01-14 02:36 · Score: 1

When the government wants to monitor people's communications to search for steganography, when they don't do anything about spam, the purpose of the monitoring is probably not the stated one.

On the other hand, if I were $SPY_AGENCY and I knew for a fact spam was being used by terrorists for communication, I would try to make sure that spam remained in existence and was easy to use, or at least, easy to circumvent any anti-spam laws. You never want to give away a good source of intelligence.
Re:Parent post is not offtopic (steganography) by phr1 · 2004-01-16 16:13 · Score: 1

No you don't get it. The idea is that the spam contains encrypted messages. Because the messages are encrypted, $AGENCY can't read them. Because they're spam and sent to millions of innocent sufferers, $AGENCY can't tell who the one real intended recipient is. Because so many different spammers are using the gibberish tactic, $AGENCY can't tell which ones are really using steganography. If $AGENCY really wants to do something about steganography, the first thing to do is shut down spammers.

As if spam wasn't a big enough waste of bandwidth by Kris_J · 2004-01-13 14:30 · Score: 2, Insightful

Try this: turn on the "size" column in your favourite email client. I use Eudora (Tools-options-Mailbox). Note that a normal plaintext email is 3k. Now look at the size of a spam. You're paying for that, or someone is. Soon the spam arms race is going to require everyone to have broadband just to check their email.

--
Still looking for an email replacement...

Should be easy to block by coolmacdude · 2004-01-13 14:30 · Score: 1

I don't see this causing much of a problem for filters. Just check to see if the words are valid. If they're not, chances are you are not interested in a message with random garbage.

--

-You may license this sig for only $6.99.

Re:Should be easy to block by kalidasa · 2004-01-13 14:50 · Score: 4, Insightful

Most of them are using random word sequences; the random strings like xdwexe are not usually an important percentage of the overall text, no more than names might be. Besides, how large a corpus of "valid" words do you want to use? The OED weighs in at almost 0.5M; and then with another 0.5M uncatalogued scientific terms and neologisms, plus common mis-spellings and typos and jargon and dialect orthography (like our color, meter, checker, jail etc. for the Brits colour, metre, chequer, gaol) ...
If you don't want to keep the entire corpus of "valid" words in your code, you're going to have to make some compromises. Maybe you'll want to exclude words like "thou," "hauberk," and "coney." Not so good if you're subscribing to an Early Modern Literature listserv.
So you're going to need some logic to determine whether or not a "valid" word that occurs in a message is meaningful. Here's how one rather well known discussion of Bayesian filtering deals with this issue (of unknown words); this is precisely the logic that spammers with random meaningful words are exploiting:
One question that arises in practice is what probability to assign to a word you've never seen, i.e. one that doesn't occur in the hash table of word probabilities. I've found, again by trial and error, that .4 is a good number to use. If you've never seen a word before, it is probably fairly innocent; spam words tend to be all too familiar.

So, what if all the words are valid, but the sentences aren't? Grammar checkers involve a lot more logic than spellcheckers do, and are consequently a lot less accurate. Fact is, you can also fool a grammar checker filter: just pad with random quotations from novels, etc. instead of padding with random words or random misspelled strings.
So the Bayesian approach of identifying spam and ham words is a pretty effective one, given the limitations.
Re:Should be easy to block by coolmacdude · 2004-01-13 15:15 · Score: 1

I only said that because my bayesian filter already blocks 90+% of these messages. So obviously this tactic isn't throwing it off too much.

--

-You may license this sig for only $6.99.
Re:Should be easy to block by Tino · 2004-01-14 03:53 · Score: 1

I have found that it's pretty effective against this kind of thing to look at the average word length. Most legitimate text has a pretty short average word length, because the most common words in many languages are also the shortest. In most English text, the average word length is going to fall somewhere between about 3.75 and 4.5 letters-- though 5 is a better upper limit for short texts.

The spammers are using this random text for the purpose of getting uncommon words into their messages; but because uncommon words tend to be long words, you can use average word length as an 'uncommon word detector' without resorting to using a dictionary.

This text, for instance, to this point averages about 3.85 letters per word, even though I have been using a lot of fifty-cent (or at least forty-cent) words.

Things like V:I:A:G:R:A or HTML tags bring down the score a lot, and this doesn't work well on particularly short messages. If you strip out HTML and punctuation before calculating the average, though, and if you give a discount to short messages' scores -- spam using this technique tends to be pretty long, since the block of longish words are sent in addition to the actual message -- it's another useful and fairly reliable spam marker.
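The average-word-length marker can be sketched in Python as below. The 5.0-letter limit comes from the figures above; the 50-word minimum used to discount short messages, the regexes, and the function names are assumptions for illustration.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag stripper
WORD_RE = re.compile(r"[A-Za-z]+")   # letters only, so punctuation is dropped


def average_word_length(text):
    """Mean letters per word after removing HTML tags and punctuation."""
    words = WORD_RE.findall(TAG_RE.sub(" ", text))
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def looks_padded(text, upper=5.0, min_words=50):
    """Flag long messages whose average word length is unusually high."""
    words = WORD_RE.findall(TAG_RE.sub(" ", text))
    if len(words) < min_words:
        return False  # too short for the average to mean much
    return average_word_length(text) > upper
```

Note that a letters-only pattern splits "V:I:A:G:R:A" into six one-letter words, which is exactly the score-lowering effect mentioned above.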
Re:Should be easy to block by NeoSkandranon · 2004-01-14 04:28 · Score: 1

To prove your point: hauberk is in common use among those of us who make chainmail and suchlike things.

--
If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
Re:Should be easy to block by Anonymous Coward · 2004-01-14 05:28 · Score: 0

I have found that it's pretty effective against this kind of thing to look at the average word length. Most legitimate text has a pretty short average word length, because the most common words in many languages are also the shortest. In most English text, the average word length is going to fall somewhere between about 3.75 and 4.5 letters-- though 5 is a better upper limit for short texts.

This would only be a temporary solution until they start using shorter words...

If someone made a gibberish filter? by g00bd0g · 2004-01-13 14:30 · Score: 3, Funny

could it be used on politicians?

Re:If someone made a gibberish filter? by Texas+Rose+on+Lava+L · 2004-01-13 15:21 · Score: 2, Funny

It already exists. It's called the Mute button.

--
Rank Presidents by th

An attempt to make Bayesian analysis a pain? by Asakura_Joe · 2004-01-13 14:30 · Score: 1

My understanding of Bayesian analysis is that it puts together lists of words - one list of all words appearing in all messages marked "not crap", and one list of all words contained in all messages marked "crap". Incoming messages have their content compared against these 2 lists, and a semi-intelligent choice is made; if the "crap" content of the new message is above a threshold, it gets tossed.
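One common way to make that "semi-intelligent choice" is to combine the per-word spam probabilities into a single score and compare it with a threshold. The Python sketch below assumes the per-word probabilities have already been looked up and clamped away from exactly 0 and 1; the function names and the 0.9 threshold are hypothetical.

```python
import math


def combined_spam_probability(token_probs):
    """Combine per-word spam probabilities into one message score.

    Assumes every probability lies strictly between 0 and 1.
    """
    prod_spam = math.prod(token_probs)
    prod_ham = math.prod(1.0 - p for p in token_probs)
    return prod_spam / (prod_spam + prod_ham)


def is_spam(token_probs, threshold=0.9):
    """Toss the message if the combined score exceeds the threshold."""
    return combined_spam_probability(token_probs) > threshold
```

One strongly spammy word and one strongly hammy word cancel each other out, which is why padding with innocent-looking words is attractive to spammers.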

By adding all these bogus words, could they be trying to make our Bayesian tools grow to the point where they're infeasible to use? If I have to check each message against a word list that's grown to 10MB (mostly with nonsense words like "ugumaquatii" and "skjfghak"), you can see how things could start to choke...

Any thoughts?

Re:An attempt to make Bayesian analysis a pain? by sketerpot · 2004-01-13 15:00 · Score: 1

I guess you could periodically perform a spring cleaning on your spam corpus by removing all but the n most common words, where n is some manageable size.
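A spring cleaning like that is only a few lines of Python if the corpus is kept as a word-count table. The function name and the use of `collections.Counter` as the storage format are assumptions.

```python
from collections import Counter


def prune_token_counts(counts, keep):
    """Return a new Counter holding only the `keep` most common words."""
    return Counter(dict(counts.most_common(keep)))
```

One-off nonsense words have the lowest counts, so they are the first to be dropped.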
Re:An attempt to make Bayesian analysis a pain? by Anonymous Coward · 2004-01-13 15:09 · Score: 0

Isn't there a threshold before a word gets listed? I mean, how many times can "skjfghak" appear over the years? Spammers wouldn't use the same gibberish over and over. Either a word gets past the threshold and all subsequent uses result in the message being sent to the "spam" bin, or it doesn't get past the threshold and doesn't contaminate the Bayesian dictionary.

Take them out by Anonymous Coward · 2004-01-13 14:31 · Score: 0

Spammers are a global nuisance causing tens of billions (or more?) of dollars of wasted time/energy to carry/store/delete their crap. Rather than blow away folks in Iraq, why not spend 2% of that money tracking down and assassinating the cretins behind this global scourge?

Just take the f*ckers out. No trial. No jury. No more patience. Just end it.

screwing themselves... by mercuryresearch · 2004-01-13 14:31 · Score: 1

This is what I love about bayesian filtering.

Because it adapts, each new technique the spammers try ends up diluting the effect and ruining it for all spammers. And because they're greedy and will sell each other out without hesitation, it's basically using their own motivations against themselves.

Might as well put in a plug for my favorite bayesian filter: ASSP

Damn by lnX.Kid · 2004-01-13 14:31 · Score: 1

Now how am I supposed to enlarge my p3n15?

--
A tip: save Eva's pita.

Another useless spammer tool by EvilStein · 2004-01-13 14:31 · Score: 1

..and it doesn't work. I get entire poems and even got half of "The Wizard of Oz" in a spam one time.

SpamAssassin (up to date, with a few addons) catches every single one of them.

The only spam that has gotten through in the past 2 weeks was a spam where the spammer forgot to include the actual spam *content* - it was a blank email.

Re:Another useless spammer tool by Green+Light · 2004-01-13 14:59 · Score: 1

Well, it would be helpful to know what your "addons" are, because my current SpamAssassin setup is leaking these spams like a sieve...

--
"Send an Instant Karma to me" - Yes
Re:Another useless spammer tool by YetAnotherDave · 2004-01-13 16:51 · Score: 1

mine catches them just fine, but I've re-scored the bayes rules with much higher scores... note that I file mail scored above 6 in a spambox, and bounce scores > 13
...
# crank the RBL-type scores up for working servers
score RCVD_IN_SBL 4.0
score RCVD_IN_DUL 4.0
score RCVD_IN_DUL_FH 4.0
# bayes is good...
score BAYES_56 2.0
score BAYES_60 2.5
score BAYES_70 3.5
score BAYES_80 4.5
score BAYES_90 5.5
score BAYES_99 7.5
...
Re:Another useless spammer tool by EvilStein · 2004-01-14 03:27 · Score: 1

Sure! I didn't have the URL handy when I was posting. :)
First off, make sure you're at 2.61.

Second, grab "BigEvil" and "Tripwire" from here

Both of those are .cf files that you drop in /etc/mail/spamassassin - it'll read any .cf file in the directory where its local.cf lives. Also check that URL for "popcorn," "backhair," and "weeds" - more rulesets. All of these rulesets are updated frequently and they're very effective.
Don't forget to collect your spam (especially the ones that slipped through) and run sa-learn on it too.

I would also suggest enabling the Spamcop bl test in SA - as long as you donate to Spamcop. ;)

Overall, this has been a *very* effective solution for us.

Theory? by Anonymous Coward · 2004-01-13 14:32 · Score: 0

I have a baseless theory that the sole purpose of spam is to sell lists to other spammers, who sell lists to other spammers etc. There is no product behind them any more: it is like pyramid marketing.

There is a historical precedent (according to an old copy of OMNI) for this: a company that sold nasal hair clippers by mail in the seventies made the bulk of its money by selling mailing lists of the nasally clipped demographic: the (albeit extant) product was just to assemble the mailing list.

New use for Project Gutenberg by KalvinB · 2004-01-13 14:32 · Score: 3, Interesting

randomly grab a paragraph from a book and include it with the spam.

It would also help spammers to write better pitches. Use real words, actual English but put it in narrative real world scenario format. So it reads like someone you know telling you how they use such and such a product.

"I went up the cabin last week with my girlfriend and tried out those new pills I heard about while I was there."

There's pretty much nothing in there that would be filtered. And then a slight plug of the product name with a link and you're done. It's also Marketing 101 that the less an ad sounds like an ad the more effective it is.

But none of that thwarts my method which is to filter based on the URLs of links found in spams.

I get virtually no spam with a Mercury rule file that's all of 23KB and grows very slowly as spammers use new domains to host their product pages.
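Filtering on link URLs can be sketched in Python as follows. This is not the Mercury rule format, just an illustration of the idea: the function names are hypothetical, and the blocked-domain set stands in for the hand-maintained rule file.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def linked_hosts(body):
    """Collect the hostnames of every http(s) URL found in a message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, ignore it
        if host:
            hosts.add(host.lower())
    return hosts


def is_blocked(body, blocked_domains):
    """True if any linked host is a blocked domain or a subdomain of one."""
    for host in linked_hosts(body):
        for domain in blocked_domains:
            if host == domain or host.endswith("." + domain):
                return True
    return False
```

Whatever padding surrounds the pitch, the link still has to point somewhere the spammer controls, which is what makes this marker hard to disguise.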

Ben

--
Work Safe Porn

Re:New use for Project Gutenberg by ElectricRook · 2004-01-13 18:52 · Score: 1

It would also help spammers to write better pitches. Use real words, actual English but put it in narrative real world scenario format. So it reads like someone you know telling you how they use such and such a product.
If they were smart, they would have real jobs. Let's face it, if a spammer tries to sell me with

"SEll 150?000?000 bottles in 1 day!!!!!!! EARN $$$$$$$"

odds are he really thinks that's a smart sales pitch. Look at their software, usually spits out a bunch of crap that does not fall into any language. I get a huge amount of spam that is just plain noise.
Spammers hawk penis enlargers, because they have a penis size issue.

--
- High Tech workers, please say NO to Union Carpenters, their Union sees fit to control our compensation.
Re:New use for Project Gutenberg by WWWWolf · 2004-01-13 23:02 · Score: 3, Funny

so it reads like someone you know telling you how they use such and such a product.
"I went up the cabin last week with my girlfriend and tried out those new pills I heard about while I was there."

Oh, that has never ever been done in advertising... =)
How about stuff like
And the angels, all pallid and wan,
Uprising, unveiling, affirm
That the play is the tragedy, "Impotence,"
And its hero the Conqueror Pill.

Or:
Tis now the very witching time to have bad credit rating,
When the stores yawn, and the post-christmas sales posters breathe out
Contagion to this world: now could I use a new VISA card,...

Different Techniques by kalidasa · 2004-01-13 14:33 · Score: 5, Interesting

The article doesn't do a good enough job of explaining the different techniques in use.

First, hash busters. Yes, spammers are loading a random jumble of meaningful words in meaningless sequences into their spam, usually in the plaintext message body of a message with HTML content (i.e., you get hash buster - html message with spam content - hash buster). So HTML-aware clients (the main clients targeted I'm sure are AOL and Outlook Express) show the spam message, but not the hash buster. I'm guessing that this is specifically targeting bayesian filtering tools at AOL (anyone know if AOL is using a bayesian filter?); it works by introducing words that would not be found in a spam corpus in greater numbers than those that would.

Second, noisy spelling, like v1@gr@. Obviously this is also intended to defeat regex-based filters like spamassassin. If you vary your cliches enough, and you introduce very strange, but easy-for-a-human-reader-to-recognize spelling variants, you make it much more difficult for filter writers to write effective regexes.

Re:Different Techniques by Jeff+DeMaagd · 2004-01-13 15:58 · Score: 1

I hate the letter-character substitutions. While they may foil some spam busters, they only make things harder to read. For English, automatic character substitution fixing should be pretty easy, but then the spammers can throw in extra letters. It should be easy to toss out emails with character substitutions; on the plus side, it would kill the "leet" speak emails too.
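
For what it's worth, here is a minimal Python sketch of the "substitution fixing" idea. The look-alike table and the function name normalize are made up for illustration, not taken from any real filter, and the table is nowhere near complete.

```python
import re

# Hypothetical look-alike table; a real filter would need a much larger one.
LOOKALIKES = str.maketrans({
    "1": "i", "!": "i", "|": "i",
    "@": "a", "4": "a",
    "0": "o",
    "3": "e",
    "$": "s", "5": "s",
})

def normalize(word):
    """Map look-alike characters to letters, then drop anything non-alphabetic."""
    lowered = word.lower().translate(LOOKALIKES)
    # Remove filler characters inserted between letters, e.g. "v.i.a.g.r.a".
    return re.sub(r"[^a-z]", "", lowered)

print(normalize("v1@gr@"))
print(normalize("V.I.A.G.R.A"))
print(normalize("P0RN"))
```

Note that this also mangles legitimate tokens such as dates and version numbers, so it probably only makes sense as an extra signal next to the untouched text, not as a replacement for it.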
Re:Different Techniques by Anonymous Coward · 2004-01-13 16:47 · Score: 0

Actually, v1@gr@ and other such fuzzy words make it a heck of a lot easier for regex-type filters to identify the spam. By using replacement letters like @, spammers make regexes even more powerful for spotting spam. It does mean that the filters need to be modified and monitored more closely to evolve with the spam, however.

eg:
# Checks the Subject header (an assumption; use a body rule to scan message text)
header LOCAL_VIAGRA_NUKE Subject =~ /\bv.?[il1].?[a@].?g.?r.?[a@](?!\w)/i
score LOCAL_VIAGRA_NUKE 100
describe LOCAL_VIAGRA_NUKE Viagra Spam, NUKE EM!

(I think that's right, anyway...)
By using obfuscating spoofs, spammers are defeating themselves... well, against SpamAssassin anyway.
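
The same idea can be tried outside SpamAssassin. This is only a sketch in Python: the pattern, the character classes and the name looks_like_viagra are assumptions for illustration. It allows one optional filler character between letters and accepts a few common stand-ins for the vowels.

```python
import re

# One optional filler character (".?") is allowed between letters, and each
# vowel accepts a few common stand-ins. The classes are illustrative only.
VIAGRA = re.compile(
    r"(?<!\w)v.?[il1!|].?[a@4].?g.?r.?[a@4](?!\w)",
    re.IGNORECASE,
)

def looks_like_viagra(text):
    """True if the text contains a plain or obfuscated spelling."""
    return VIAGRA.search(text) is not None

for sample in ("Buy v1@gr@ now", "V.I.A.G.R.A", "see you via graph paper"):
    print(sample, "->", looks_like_viagra(sample))
```

The lookarounds are used instead of \b because \b does not match between "@" and a space, so a word ending in "@" would otherwise slip through.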
Re:Different Techniques by kalpol · 2004-01-14 10:51 · Score: 1

(anyone know if AOL is using a bayesian filter?)

They just block all mail from dynamic IP addresses.

--
12:50 - press return.

The real problem will be deliberate poisoning by Jerf · 2004-01-13 14:33 · Score: 5, Interesting

The real problem will be when the spammers finally figure out how to deliberately poison the Bayesian filters. So far they're using more-or-less random words, but that won't really work against Bayesian; it can tolerate that.

However, what constitutes "non-spam" is not as unique as most people think, as I've examined here. If they figure out how to deliberately put in hammy words, Bayesian will fall.

I feel OK posting this because I freely admit to this point I've overestimated them; I'm sure spammers have read that piece, and to date they have been too stupid to figure out what I said in plain English. But sooner or later one of them is going to figure it out.

There's a strong core of "ham" that is "ham" for everybody, and sooner or later they're going to start abusing that.

And if I may forestall one objection... "But you don't understand Bayesian, it's [awesome for some reason and can't be beat ever, by anybody]" - I'll listen when you've actually written a program to examine filters yourself, OK? I understand it pretty damn well. It'll take more than bald assertions to convince me I'm wrong; I've done actual research, in the original sense of the word.

Re:The real problem will be deliberate poisoning by YU+Nicks+NE+Way · 2004-01-13 14:52 · Score: 1

Oh, cool! When you did the fake Nigerian spam, did you use the bigram method as a guide?
Re:The real problem will be deliberate poisoning by Anonymous Coward · 2004-01-13 15:07 · Score: 0

This is when we make spam illegal under the penalty of being shot in the head for making email unusable.

Then the stupid fuckers will complain, like "what did I do to deserve such persecution? Whine, moan."

Bunch of retards, all of them.
Re:The real problem will be deliberate poisoning by X · 2004-01-13 15:16 · Score: 1

I've seen ones which inserted random text from publicly available sources into the e-mail. One had a whole paragraph of political discussion that looked like it could have come from a blog somewhere. That's hard to beat because it would be very close to typical e-mails one might find.

--
sigs are a waste of space
Re:The real problem will be deliberate poisoning by firewood · 2004-01-13 15:18 · Score: 1

Theoretically, it should be possible to take the statistical database generated by working Bayesian email filters (maybe stolen off of zillions of hacked windows boxen), and reverse engineer the statistics to generate email text that these filters can't tell from non-spam, unless one has a highly individualized and weird corpus.
Re:The real problem will be deliberate poisoning by jmv · 2004-01-13 15:21 · Score: 1

There's a strong core of "ham" that is "ham" for everybody, and sooner or later they're going to start abusing that.

Do you have evidence to back that assertion? In my case (I know it's just me), ham basically means either referring to my open-source projects or written in French (even then spambayes does a good job at rejecting French spam).

I don't think spammers are that dumb either. I see four main difficulties for them to overcome bayesian filters:
1) Differences between users' filters (which you said can be overcome because "ham is ham")
2) Lack of "training data" for them. We have lots of data from which we can learn how to avoid spam, but they have very little data which they can use to "train" anti-filter techniques.
3) They have to get the main message through. Eventually, if you can detect all forms (that remains to be seen) of the word "Viagra", they simply can't use that word in their email anymore (assuming I've got no ham containing that word).
4) Because each spam message is different, they have to find a cost-effective way to make each of them immune to filters. That's not easy either.

--
Opus: the Swiss army knife of audio codec
Re:The real problem will be deliberate poisoning by Anonymous Coward · 2004-01-13 15:39 · Score: 0

I've read your articles and have noted the laborious process you specify for "hamification" of spam. When your prediction comes true, spam will no longer be spam at all, will instead become personally addressed email, and thus will become too expensive to deploy in the quantities we're seeing now.

Meanwhile time passes and DSPAM continues to function flawlessly...
Re:The real problem will be deliberate poisoning by Uggy · 2004-01-13 15:46 · Score: 2, Insightful

It's really simple. The ONLY way spammers can defeat Bayesian filters is if they imitate what you call ham. ham = What you want; spam = what you don't want. Unless they custom tailor each message or random words to each user and guess (through some form of magical powers) what kind of email you call ham, then they fail.

Besides, if they could guess what your ham looked like, then they wouldn't be spammers... they'd be advertising folks pulling in 7 figures.

--
Toddlers are the stormtroopers of the Lord of Entropy.
Re:The real problem will be deliberate poisoning by Anonymous Coward · 2004-01-13 16:02 · Score: 0

I'm currently running SpamBayes and based on what I'm seeing when I check individual ham and spam messages, 90%+ of the email I'm getting is being sorted on header or html information alone. The email addresses of my friends, even if they are as generic as xyz1234@hotmail.com, will always get through because they've been seen heaps of times already. I know no-one on aol so aol.com is pretty high on the spaminess list. There's some guy "mknight" who must be on some spam list near my name, and if mknight is in the cc list then chances are it won't get through. "url:biz" rates higher than viagra for me (0.988% spam), with "url:gif" not far behind.

99% of my daily email comes from the same old people and their addresses are already known by SpamBayes. Anyone new sending me email will almost certainly include my name and that alone will get it past the filters. Until the spammers individually address their emails to me or start sending them from my friends accounts, they're not going to get through.
Re:The real problem will be deliberate poisoning by sidney · 2004-01-13 16:20 · Score: 3, Insightful

Nigerian scam spam is very different from most spam. It is a story that can be carefully written to use only words that are commonly used, assuming that the people who author them are able to go beyond their broken English all the way to use of statistically hammy correctly spelled text.

But how would you sell more inches on your male member enhanced with V*@gra to make money fast watching celeb teenie nymphos doing it on the farm while only using ordinary non-spammy words?

There are only so many ways to get someone to click here to get all the hot action and a long boring story full of erudite euphemisms is not one of them.

It would be interesting to see if your method of disguising spam can work on a wider range of topics.
Re:The real problem will be deliberate poisoning by Jerf · 2004-01-13 16:31 · Score: 2, Interesting

Do you have evidence to back that assertion? In my case (I know it's just me), ham basically means either referring to my open-source projects or written in French (even then spambayes does a good job at rejecting French spam).

Language is often a big indicator; since spam is aimed at a particular language group, I don't consider it much. The fact that my filter marks Japanese or Korean messages as spam is almost irrelevant, in a way, since I can't read them anyhow and they're easily dismissed.

But there's this common misconception that inside the spam filter it just looks for the three or four key words that mark "your" ham to the exclusion of all else. In reality there are big cues that are independent of "personalization"; see the Interesting Results section. Would you have guessed that "I'm" is such a non-spam indicator?

There's a strong core of hamminess that will be common to nearly everybody. (Also clarifies your point 1.)

2) Lack of "training data" for them. We have lots of data from which we can learn how to avoid spam, but they have very little data which they can use to "train" anti-filter techniques.

Well, I sure didn't have any trouble finding ham for my training! Collecting 20,000 ham messages took me about 15 minutes; it took me longer to process them than to find them. If I were a dedicated spammer I could collect a million in a couple of days, depending on how diverse a selection I want to acquire. One "weakness" of my experiment is the limited selection I acquired, but that's easily fixed and I think based on my experience it's already plenty diverse.

3) They have to get the main message through. Eventually, if you can detect all forms (that remains to be seen) of the word "Viagra", they simply can't use that word in their email anymore (assuming I've got no ham containing that word).

Yes and no. I already acknowledged in my post that without "cheating", you can't really get a sex spam through. (Though you'll have a hard time getting a real sex email through, too, if that is a normal email for you.)

But I "played fair"... spammers don't have to. They can craft a highly hammy message and append it to their spam. Even if your filter stops it, it poisons the filter. The filter writers can then take countermeasures against that, but you're back to an arms race and that's not a gain over what we had before the Bayesian filters.

4) Because each spam message is different, they have to find a cost-effective way to make each of them immune to filters. That's not easy either.

Well, creating a highly hammy message and appending any short spam to it they want ought to work. That's not too expensive.
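
To make the padding argument concrete, here is a rough sketch using the product rule from Paul Graham's "A Plan for Spam", keeping only the tokens furthest from 0.5. All the per-token probabilities below are invented, and the sketch assumes the spammer somehow knows which words are hammy for the recipient, which is exactly the disputed part.

```python
from math import prod

def combined_spam_probability(token_probs, most_interesting=15):
    """Combine per-token spam probabilities with the naive Bayes product rule.

    Only the tokens furthest from 0.5 are used.
    """
    chosen = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    chosen = chosen[:most_interesting]
    spam = prod(chosen)
    ham = prod(1.0 - p for p in chosen)
    return spam / (spam + ham)

# Made-up per-token probabilities for a short spam message.
short_spam = [0.99, 0.98, 0.97, 0.95, 0.90]
# The same message with ten strongly hammy tokens appended.
padded_spam = short_spam + [0.02] * 10

print(f"short spam:  {combined_spam_probability(short_spam):.6f}")
print(f"padded spam: {combined_spam_probability(padded_spam):.6f}")
```

With these numbers the padded message scores close to zero, because the hammy tokens outnumber and outweigh the spammy ones in the product.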

Even so, you're sending a lot of people the same message for so little money it boggles the mind. Raising the bar for writing a message a little won't stop the flow, because it amortizes across all copies of the message sent too well. You need to raise cost per message or a number of other approaches.

I don't think spammers are that dumb either.

I used to not think so, and I had bet that Bayesian would already be useless by now. But I now realize that I have overestimated them by a significant margin. Like I said, I know some of them have read that piece. I get hits for "bypassing Bayesian filters" nearly every week from Google. I've gotten several requests for source code to my program, and I wager not all of them were legitimately academic. (Fortunately, I've lost it through a hard drive crash, but I consider my results still scientifically valid as at least in my opinion, I've given enough information to replicate my results.)

But they still haven't progressed past stupid o.b.f.u.s.c.a.t.i.o.n techniques (no, that won't get past Bayesian) and purely random words (neither will that) very far. (Remember, which a lot of people seem to miss when they read my piece, I respect Bayesian
Re:The real problem will be deliberate poisoning by jmv · 2004-01-13 17:33 · Score: 1

Actually, I'm not saying bayesian filters are perfect, but I was pointing out that it's probably not *that* easy to get past them.

Well, I sure didn't have any trouble finding ham for my training!

Actually, that's not exactly what I meant. The real "training data" for a spammer would be probability tables (or whatever you call what you train) for many different users. Of course stuff available on the net may help them, but it's probably sub-optimal.

Regarding your test, I didn't get through all your methodology, but to be sure, you'd need to "train" on a very different set of "ham". Also, consider that personal mail is not likely to be found online. If you like, I can offer my spambayes probability table so you can test how your messages go through.

Well, creating a highly hammy message and appending any short spam to it they want ought to work. That's not too expensive.

I think at that point, what might happen is that after a while no word will have a really hammy probability, but there will still be words with very spammy probabilities. Have you tried taking e.g. 10,000 ham messages, creating another 10,000 by appending a (different) spam message to each of them, and then training on the 20,000? I'm curious how well a Bayesian filter would do.
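
A toy version of that experiment, as a sketch only: two messages stand in for the 10,000, the per-token estimate has no smoothing or ham bias (real filters use both), and the function name token_spam_probabilities is hypothetical.

```python
from collections import Counter

def token_spam_probabilities(ham_msgs, spam_msgs):
    """Per-token spam probability from per-message counts (no smoothing)."""
    ham_counts = Counter(t for m in ham_msgs for t in set(m.split()))
    spam_counts = Counter(t for m in spam_msgs for t in set(m.split()))
    probs = {}
    for token in set(ham_counts) | set(spam_counts):
        ham_rate = ham_counts[token] / len(ham_msgs)
        spam_rate = spam_counts[token] / len(spam_msgs)
        probs[token] = spam_rate / (ham_rate + spam_rate)
    return probs

# Toy stand-ins for the corpora described above.
ham = ["meeting moved to friday", "patch attached for review"]
pitches = ["cheap pills online", "refinance mortgage online"]
# Each "spam" is a ham message with a different pitch appended.
poisoned_spam = [h + " " + p for h, p in zip(ham, pitches)]

probs = token_spam_probabilities(ham, poisoned_spam)
print(probs["meeting"])  # 0.5: appears equally often in both classes
print(probs["pills"])    # 1.0: appears only in spam
```

In this toy setup the hammy words drift to neutral while the pitch words stay fully spammy, which is the outcome guessed at above. Whether it holds on a real corpus would need the real experiment.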

Even if your filter stops it, it poisons the filter.

I agree that even if a filter manages to adapt, there might be an increase in false positives, simply because there will be no "really hammy" tokens.

But I now realize that I have overestimated them by a significant margin.

Well, it's true that some are still using frontpage :)

--
Opus: the Swiss army knife of audio codec
Re:The real problem will be deliberate poisoning by kirkjobsluder · 2004-01-13 17:45 · Score: 1

Read your messages and research. I still think that you are underestimating the power of personalization. One of the things I've noticed about the performance of my spam filter is that the common "strong core" of ham has a really trivial impact on the spam/ham decision. The tokens that have the greatest impact on the spam/ham decision are things that the spammer has the least control over in the header of the message. The spam filter becomes something of a trusted whitelist. If I forward spam to myself, through one of my work accounts, it is marked as ham regardless of how "spammy" the body content may be. The route is that powerful in determining hamminess. Perhaps my email is unusual, but I really don't see much of a common "strong core" of hamminess in my statistics.

I'm also not coming out of left field here on this. I have actually written my own filter. I also have done enough research in discourse analysis to know that statistical techniques are very powerful for not only determining differences between spam and non-spam, but also differences between linguistic groups that are quite a bit more subtle than spam/ham.

But playing devil's advocate here. Let's say that spammers manage to create a message that looks like ham, smells like ham, and talks like ham. Wouldn't this be exactly the kind of message that I would want to read and review for myself?
Re:The real problem will be deliberate poisoning by jmv · 2004-01-13 17:47 · Score: 1

They can craft a highly hammy message and append it to their spam.

Sorry to be replying twice. There may be another (non-technical) reason why this might not work. The problem is that if they put the spam after some ham (enough to cover the spam), then the person receiving it will see the ham first and discard it before reading the spam part. The main target for the spam is naive people. If such a person gets an e-mail from the python mailing list, it'll likely be discarded, so the spammer makes no money even if the e-mail goes through. Now if the spam part is at the beginning, the filters can be adapted to look mainly at the start. Even without that, there might be ways to look for very different scores in different parts of the message or things like that.
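
One way the "different scores in different parts" idea might look, as a rough sketch. Averaging token probabilities is a simplification chosen for brevity, not how Bayesian filters actually combine evidence, and the PROBS table is made up to stand in for a trained filter.

```python
def mean_spamminess(tokens, probs, unknown=0.5):
    """Average per-token spam probability; unseen tokens count as neutral."""
    if not tokens:
        return unknown
    return sum(probs.get(t, unknown) for t in tokens) / len(tokens)

def split_score_gap(text, probs):
    """Score the first and second half of a message separately."""
    tokens = text.lower().split()
    middle = len(tokens) // 2
    first = mean_spamminess(tokens[:middle], probs)
    second = mean_spamminess(tokens[middle:], probs)
    return first, second, abs(first - second)

# Made-up token probabilities, standing in for a trained filter's table.
PROBS = {"python": 0.05, "patch": 0.05, "list": 0.10, "review": 0.10,
         "cheap": 0.95, "pills": 0.99, "click": 0.95, "here": 0.80}

first, second, gap = split_score_gap(
    "python list patch review cheap pills click here", PROBS)
print(f"first half {first:.3f}, second half {second:.3f}, gap {gap:.3f}")
```

A large gap between the halves could be treated as a spam signal in its own right, though a fixed midpoint split is easy to dodge and a sliding window would likely be more robust.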

It may not be easy, but I don't think the bayesian battle is lost yet.

--
Opus: the Swiss army knife of audio codec
Re:The real problem will be deliberate poisoning by aaarrrgggh · 2004-01-13 18:16 · Score: 1

Actually, I don't imagine it would be THAT hard. Looking at where an e-mail address is harvested from could give you a useful corpus of ham words. It makes the e-mail database more complicated, but it also increases the chances of a higher success rate.
Re:The real problem will be deliberate poisoning by steveha · 2004-01-14 09:34 · Score: 2, Interesting

I read your article, but I am not as worried as you are.

First, my credentials: I haven't run an organized study of spam, as you have, but I did set up a Bayesian filter, SpamProbe, on my mail server (and I wrote an article about it). I get about 150 spam messages per day, and I only see the ones that get past my Bayesian filter. So I have looked over dozens of spams to see why they fooled my filter. (My filter is about 95% effective, and once I had it trained, I haven't observed any false positives.)

Yes, if a spammer works carefully, he can craft a message that will have a better chance of slipping past a Bayesian filter. But my Bayesian filter is not 100% effective anyway; as long as I only have to manually handle 5-15 messages per day, I'd say the filter is working. So the question is not whether the spammers can ever slip a message past the filter; the question is whether the spammers can completely destroy the usefulness of Bayesian filters, as you fear.

Bayesian filters look at the whole message, and they can learn to recognize spam in unexpected ways. For example, HTML font tags that set large red letters are a good spam indicator. HTML font tags that set white-on-white text are another. So Bayesian filters will force spammers to change the format of their spam.

Most spammers want you to call a phone number or view a URL. Since the Bayesian filter will learn the phone numbers and URLs are spam flags, Bayesian filters will force spammers to keep setting up new phone numbers and servers.

The "from" addresses of my friends will quickly become good ham indicators, and that will be difficult for the spammers to exploit (since everyone has different friends).

Also, my understanding is that you cannot really "poison" a word for Bayesian filtering; all you can do is lessen its usefulness as a spam/ham indicator. If spammers use different hammy words for each spam, the poison's dosage will be diluted; while if they use the same hammy words for each spam, those words will then be a legitimate spam flag.

There probably are a few refinements that could be made to spam filters. I'd like to see a spam filter that, if there is both an HTML part and a plain text part, only checks the HTML. That way the spammers can include ham in the text part and it won't affect the filtering.
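
A sketch of that refinement with Python's standard email package. The function name text_to_filter and the sample message are made up; the rule is simply "use the HTML part if one exists, otherwise fall back to plain text".

```python
from email import message_from_string

def text_to_filter(msg):
    """Return the HTML part if the message has one, else the plain text part."""
    html, plain = None, None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/html" and html is None:
            html = part
        elif ctype == "text/plain" and plain is None:
            plain = part
    chosen = html if html is not None else plain
    if chosen is None:
        return ""
    payload = chosen.get_payload(decode=True) or b""
    charset = chosen.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset label
        return payload.decode("latin-1")

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b"

--b
Content-Type: text/plain; charset="us-ascii"

district reapportionment incumbent poll
--b
Content-Type: text/html; charset="us-ascii"

<html><body>Refinance your mortgage</body></html>
--b--
"""

print(text_to_filter(message_from_string(RAW)))
```

Here the hammy filler in the text part never reaches the tokenizer, since an HTML-aware client would not show it to the reader either.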

In summary, I am reasonably hopeful that there is no way for spammers to completely defeat Bayesian filtering. The best they can hope to do is to sneak some mildly-phrased messages by the filters.

P.S. I agree with you that the ultimate anti-spam measure would be a "for-pay" mail system. I envision a mail protocol that allows you to specify how much it costs to send you an email: you put your friends on the free list, and otherwise it costs 5 cents or whatever. If you are really famous you might raise the cost up to reduce the volume of email you receive. There should be a mechanism in place to quickly refund the costs, and friends should be identified with a digital signature, not by an easily forged string. Spam only works because it's so cheap to send many messages, so a 0.001% response rate is enough. At even 5 cents per message, spam wouldn't be cost-effective anymore. You would still get ads in the mail, but they would be less obnoxious and more carefully targeted. Send me an ad for Mexican Viagra and you won't get your 5 cents back, but send me an ad for something I actually want and I'll consider it.

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely

/usr/share/dict/words by HeelToe · 2004-01-13 14:33 · Score: 3, Interesting

I thought about this after seeing my inbox spam increase to about 80 a day (the box that contains what is filtered is usually 10 per hour - my address has been valid for just short of 10 years).

Why not check the subject or first few lines of plain (not html) text and see if 80% of it is in /usr/share/dict/words? I thought about trying this out, but have been too busy to get off my ass and do it.
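
Something like this, as an untested-in-production sketch. The function names and the tokenizing regex are assumptions; the 80% cutoff and the word list path are the ones mentioned above, and the tiny word set in the demo only stands in for the real file.

```python
import re

def load_words(path="/usr/share/dict/words"):
    """Load a word list, one word per line, lowercased."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return {line.strip().lower() for line in handle}

def dictionary_fraction(text, words):
    """Fraction of alphabetic tokens in the text that appear in the word set."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in words for t in tokens) / len(tokens)

def looks_like_gibberish(text, words, threshold=0.8):
    return dictionary_fraction(text, words) < threshold

# Tiny stand-in word set so the demo runs without the system word list.
WORDS = {"the", "meeting", "is", "moved", "to", "friday"}
print(looks_like_gibberish("The meeting is moved to Friday", WORDS))
print(looks_like_gibberish("xkqj zzpt wvvb qsdpa", WORDS))
```

Names, jargon and slang all count against a message under this scheme, so the threshold would need tuning against real mail before trusting it.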

Re:/usr/share/dict/words by CGP314 · 2004-01-13 15:11 · Score: 1

Bekaus me n must of my frends can't speell 80% of r words corectally.

-Colin Gregory Palmer

--
American Weblog in London
Re:/usr/share/dict/words by forkazoo · 2004-01-13 16:39 · Score: 0

Many people use jargon or slang that wouldn't be in any standard wordlist. For example, if two coworkers were discussing repairs to something called a TCP/IP multiplexor, and one was annoyed at the other, you might see a message starting thusly:

Yo, dickhead, the TCP/IP muxer is fuxxored, you gonna fix it er what?

Any 'generic' wordlist not specifically filled with jargon and slang will have the following words in it : "the" "is" "you" "fix" "it" "what"

This would mean, with 7 'gibberish' words and six 'real' ones, this piece of (arguable) ham would most likely be discarded by such a scheme. Personally, I think that false positives are the biggest worry for any SPAM filter. I'm willing to manually filter 15-20 emails a day to get to my three real ones, but I *never* dig through the "Bulk Mail" folder in my Yahoo! account. It's possible that there is something useful in there, but with 50-75 messages to sort through to find the ham, I just won't bother. I've had the account for less than ten years, but I have had it for the better half of a decade, anyway (I think... Maybe less, it's hazy). ::quote follows::
I thought about this after seeing my inbox spam increase to about 80 a day (the box that contains what is filtered is usually 10 per hour - my address has been valid for just short of 10 years).

Why not check the subject or first few lines of plain (not html) text and see if 80% of it is in /usr/share/dict/words? I thought about trying this out, but have been too busy to get off my ass and do it. ::endquote::

Slimier than slime . . . by mjprobst · 2004-01-13 14:34 · Score: 5, Interesting

I saw one just yesterday that contained a list of important key sentences and phrases from the literature of common charities and political activism organizations.

In other words, if your Bayesian filter accepts those, based on your past decisions, it will accept the spam. If you reject the spam, you reject these communications as well.

Good filtering practice would dictate that one reads the junk box carefully enough to find both false positives and negatives. But the sheer bulk of mail that ends up in the junk box makes this unfeasible for many.

I have started letting these particular kinds of spam through, manually categorizing them (many words of random strings, dictionary vocabulary attack, positive phrase attack) in the hopes that filtering technology will soon advance to the point where these can be used as inputs to a more intelligent system.

Of course overhauling the mail system is a prerequisite to solving any of this long-term. For once I don't mind D. J. Bernstein's Internet Mail 2000 proposals. Of course there are other proposed systems, none of which has enough momentum to start a slow steady change. The end result of any non-consensus system will be to fragment the worldwide network of Email into competing, noncompatible systems that need to communicate through some kind of loophole or gateway. Back to FIDO-net days.

Re:Slimier than slime . . . by sholden · 2004-01-13 14:49 · Score: 1

I saw one just yesterday that contained a list of important key sentences and phrases from the literature of common charities and political activism organizations.

In other words, if your Bayesian filter accepts those, based on your past decisions, it will accept the spam. If you reject the spam, you reject these communications as well.

So you will block that other spam automatically as well.

How is that a bad thing?
Re:Slimier than slime . . . by Rares+Marian · 2004-01-13 15:01 · Score: 1

If you are a member of a citizens action group taking issues to the streets, then you've just shitlisted your own mailing list.

--
The message on the other side of this sig is false.
Re:Slimier than slime . . . by bmasel · 2004-01-13 16:06 · Score: 1

I've been posting to high-traffic political blogs (dkos) lately, and as a result am getting LOTS of spam with political buzzwords attached, what mjprobst has called a "positive phrase attack."

An excerpt: district reapportionment incumbent poll Iraq entitlement Dean Bush. Budget reduction delay. Gephart Iowa Florida control. (Then on to refinance my mortgage.)

--
Ben Masel: 51,282 votes for US Senate in the Wisconsin Democratic Primary
Re:Slimier than slime . . . by sholden · 2004-01-13 16:45 · Score: 1

Except that the mailing list address will have a massive ham score and hence not get filtered - or non-idiots will have it in a whitelist.

why not filter out 1337 sp3@k? by SHEENmaster · 2004-01-13 14:34 · Score: 1

Why not simply filter out leet speak, or any message with more than half of the words misspelled that isn't encrypted?

--
You can't judge a book by the way it wears its hair.

Re:why not filter out 1337 sp3@k? by rgmoore · 2004-01-13 14:44 · Score: 5, Informative

Why bother? A decently trained Bayesian filter will be able to recognize a spam that contains a misspelled word or two, or one that contains substitutions of similar characters. Then it will learn that those modified forms are a very strong indicator of spam. As Paul Graham (the main early advocate of Bayesian Filters) has pointed out, there are legitimate reasons why you might see a mention of "Viagra" in your email, but no legitimate reason that you would see "V1agra", "\/iagra", "Vi@gra", or the like. Instead of slipping by my Bayesian filter, those variants actually stand out as particularly strong spam indicators.

--
There's no point in questioning authority if you aren't going to listen to the answers.
Re:why not filter out 1337 sp3@k? by digitalsushi · 2004-01-13 14:54 · Score: 1

if you can write me a regex that filters that out 80% of the time with 0 false positives, i will pay you 6 figures a year to sit on a chair in my museum as one of life's "mysteries".

--
slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
Re:why not filter out 1337 sp3@k? by the_mad_poster · 2004-01-13 14:55 · Score: 5, Interesting

1337 speak isn't a big deal. It's definitely filterable.

I've begun seeing chunks of text appearing in messages that are like legitimate mini-messages in and of themselves. Sort of like a counter weight. I don't think the aim is to pound Spam through the filters now, because what's happening is spam is getting slightly lower ratings each time while legitimate messages are getting slightly higher ratings.

In other words, the spam probably won't ever be legitimate, but it's making me lower my threshold for what is spam more and more. Eventually, I'll get to the point where some legit messages will cross over into being labeled as spam and spam will go through legit because the thresholds will be so close together as to practically overlap. It's also killing my ability to keep a spam trap that I can use to quickly train filters.

Whether this scene will actually play out and the "plot" will be successful or not remains to be seen, however.

--
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Re:why not filter out 1337 sp3@k? by Nogami_Saeko · 2004-01-13 15:17 · Score: 1

Oh, indeed it does. Paul's POPFile software doesn't even break a sweat with this sort of silliness...

Here's a real-world sample from today:

Subject line: "GOT X(a)n@x, Vali(u)m, Viagr@, Som@ Di3t Pills Many M3ds QSDPA"

(the rest of the message was garbled in a similar fashion)

Scores
Bucket / Count / Probability
spam / 88 / 0.999999
inbox / 73 / 2.833932e-039

Heh... Not a chance spammer dudes, not a chance ;)

N.

--
"Nothing strengthens authority so much as silence." - Charles de Gaulle
Re:why not filter out 1337 sp3@k? by Anonymous Coward · 2004-01-13 15:30 · Score: 0

Hey, I got that one this morning. Damn faked Habeas headers with a -8 scoring shot that one through..

Here's a tip, you SpamAssassin goons: If it's that easy to fake, it's not worth -8 whole points.
Re:why not filter out 1337 sp3@k? by NickDngr · 2004-01-13 15:46 · Score: 4, Funny

if you can write me a regex that filters that out 80% of the time with 0 false positives, i will pay you 6 figures a year to sit on a chair in my museum as one of life's "mysteries".

Pay me six figures a year and I will sit in a chair and do it for you manually.

--
Yoda of Borg am I! Assimilated shall you be! Futile resistance is, hmm?
Re:why not filter out 1337 sp3@k? by EvanED · 2004-01-13 15:51 · Score: 1

It may not be possible, even theoretically. Regular languages are pretty limited in terms of what they can encompass.
Re:why not filter out 1337 sp3@k? by letxa2000 · 2004-01-13 16:03 · Score: 3, Interesting

You're completely right. I love it that spammers try to conceal their mail with weird combinations of words.
Examples from my corpus:
VIAGRA: 99.797%
V!AGRA: 99.9999%
AGRA: 99.9999% (from things like VI.AGRA)
IAGRA: 99.9999%
PORN: 98.573%
P0RN: 99.9999%
PR0N: 99.9999%
Plus, the trick is looking for things that give away spam that aren't just words. I call them "characteristics." For example:
Various pharmacy related terms: 99.9999%
HTML using % escape sequences: 98.789%
Http:// references that don't use www: 85.538%
=?ISO- in Subject: 99.9999%
Suspicious domains (BIZ, BR, PRO, etc.): 99.174%
1 "Adult Term": 70.8%
2 "Adult Terms": 85.7%
5+ "Adult Terms": 99.9999%
5+ HTML Comments: 92.0%
10+ HTML Comments: 98.3%
30+ HTML Comments: 99.9999%
In short, there are so many aspects of a message you can analyze and make "Characteristics" that my Bayesian filter can often make a decision entirely based on the characteristics without even looking at some of the terms used within the message. But if the characteristics aren't damning enough, the content virtually always is.
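
Roughly what turning "characteristics" into tokens could look like, as a sketch. The token names, the regexes and the reading of "HTML using % escape sequences" as percent-escaped URLs are all assumptions; only the comment-count buckets and the =?ISO- check come from the list above.

```python
import re

def characteristics(subject, html_body):
    """Turn structural features of a message into synthetic tokens."""
    found = []
    if "=?iso-" in subject.lower():
        found.append("char:encoded-subject")
    if re.search(r"https?://[^\s\"'>]*%[0-9a-f]{2}", html_body, re.IGNORECASE):
        found.append("char:percent-escaped-url")
    comments = len(re.findall(r"<!--.*?-->", html_body, re.DOTALL))
    # Report only the highest bucket reached.
    for threshold in (30, 10, 5):
        if comments >= threshold:
            found.append(f"char:html-comments-{threshold}+")
            break
    return found

subject = "=?ISO-8859-1?Q?hello?="
body = ("V<!--a-->I<!--b-->A<!--c-->G<!--d-->R<!--e-->A "
        "<a href='http://example.com/%62%75%79'>x</a>")
print(characteristics(subject, body))
```

The resulting tokens would then be fed to the Bayesian filter alongside the ordinary word tokens, so the filter learns their weights instead of having them hard-coded.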
Re:why not filter out 1337 sp3@k? by F452 · 2004-01-13 16:23 · Score: 1

John's POPFile. But yes, it works great!
Re:why not filter out 1337 sp3@k? by Nogami_Saeko · 2004-01-13 16:55 · Score: 1

Ack, I stand corrected :)

At least I got the Graham bit right :)

I even paypal'ed him a few bucks for such a great piece of software. Easily the best piece of free software that I downloaded last year!

N.

--
"Nothing strengthens authority so much as silence." - Charles de Gaulle
Re:why not filter out 1337 sp3@k? by FragHARD · 2004-01-13 17:01 · Score: 1, Insightful

So just modify the Bayesian filters to act on a set number of misspelled/garbled words, say 10 or so. Of course this might make us have to learn how to spell correctly if we ever want anyone to get the emails we send :0)

FragHARD

--
FragHARD or don't frag at all
Re:why not filter out 1337 sp3@k? by happyfrogcow · 2004-01-14 04:50 · Score: 1

I've been wondering about spam and filtering stuff for a bit, without having read anything about bayesian methods of filtering. maybe i should.

Can encryption (public key is what i've thought about) be used to increase the amount of spam you can filter effectively? If you get spam, just check the message for redundancy of characters (i forget what that's called in technical terms) to see if it is plain text. If it approaches the redundancy of your language then it hasn't been encrypted, and is from someone who doesn't have your public key. I guess the problem is that, being a "public" key, the spammers could potentially get it, but farming correct email=>public key pairs would be more difficult, unless of course there was a public key data store they had access to. It would also be more expensive, computationally and economically, to encrypt every piece of spam they sent out.

Then, if you get an encrypted message, but your private key fails to decrypt the message then it can also be discarded without worry.

You also would get the added bonus of secure communications. How to keep spammers from getting a real email address/public key pair would be a problem though.

just a thought... from someone who admittedly knows very little about filtering email as well as cryptographic protocols.
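The "redundancy of characters" check described above is usually measured as entropy. A rough sketch, assuming the message body is available as raw bytes: ordinary English text comes out well below the threshold, while raw ciphertext sits close to 8 bits per byte. The function names and the 6.5 threshold are assumptions, and ASCII-armoured mail would have to be decoded first, since base64 text cannot exceed 6 bits per byte.

```python
import math
import os
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 6.5) -> bool:
    """Guess whether data is ciphertext rather than plain text.

    The threshold is an assumed value; very short messages give
    unreliable estimates either way.
    """
    return bits_per_byte(data) >= threshold


if __name__ == "__main__":
    plain = b"the quick brown fox jumps over the lazy dog " * 50
    print(looks_encrypted(plain))             # natural-language text
    print(looks_encrypted(os.urandom(4096)))  # stand-in for ciphertext
```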
Re:why not filter out 1337 sp3@k? by ooby · 2004-01-14 05:50 · Score: 1

Are you saying that viagra is a common topic in the emails you receive from your friends? The Bayesian filter was designed to filter out words that wouldn't appear in a normal email and words that would appear in a spam. With this in mind, I find it hard to believe that spam and legit mail will intersect unless you spend much of your time communicating about the topics advertised in spam.
Re:why not filter out 1337 sp3@k? by the_mad_poster · 2004-01-14 06:55 · Score: 1

Noooo... like the post said, I'm getting SPAMS with chunks of pseudo-legitimate text which is harming the spam threshold. While Viagra continues to be a word that the filter throws the red flag at, the other terms will also take a hit as far as the filter is concerned. This results in legitimate messages getting a higher spam rating.

The only way to avoid that is to not mark them off as spam (which means the "legit text" won't be counted against legit messages with similar text). Unfortunately, that means the spams that have this "legit text" don't get labeled as harshly for a high-spam-likelihood anymore, which means I have to lower my threshold to say, 85% from 90% for the spams to get cut.

The temporary fix is to keep mingling the two tactics because then the threshold can be dropped slowly while the legit messages get smacked up slowly. The two limits are so far apart between what is and isn't spam that in the mid-term, there's not going to be any significant impact. However, over the long term, if this continues unabated, the thresholds will eventually meet for what is and what isn't spam.

For example. Say I sent you a spam for timeshares in the S. Pacific. You mark it spam, but I've pulled a nasty trick. I copied the first Act of Hamlet into the bottom of the spam. Guess what, buddy? You just blocked Shakespeare as spam, too! It works the same way if they're putting "legit" text in. Sure, you blocked their spam for Viagra, or financing, or prescription drugs, or whatever, but you also put a hit on the legitimate chunk of text they included.

Bayesian filtering is not the end-all solution to spam. It will be defeated. I know people don't like to hear that because it works so well NOW, but the fundamental problem of computer security remains: whatever you can DO, somebody else can UNDO given enough time.

--
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Re:why not filter out 1337 sp3@k? by ooby · 2004-01-14 07:14 · Score: 1

Your point is well taken. This would also do harm to the ranking methods of the Bayesian filters. Based on your Hamlet example, if Hamlet was included in every spam, then the filter would consider Hamlet very likely to be spam. Then, the next time you try out your thespian talents and the director emails you your Hamlet script, it's marked as spam.
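The Hamlet effect can be shown with a toy token counter. This is only a sketch under simple assumptions (whitespace tokenising, one count per message, no smoothing); the class and method names are invented for the example.

```python
from collections import Counter


class TokenStats:
    """Toy per-token spam statistics, counting each token once per message."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_prob(self, token):
        """Share of the token's message frequency that comes from spam."""
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5  # never seen: no evidence either way
        return s / (s + h)


if __name__ == "__main__":
    stats = TokenStats()
    for _ in range(10):
        stats.train("rehearsal for hamlet tonight", is_spam=False)
    print(stats.spam_prob("hamlet"))  # only ever seen in legitimate mail
    for _ in range(10):
        stats.train("cheap timeshares hamlet", is_spam=True)
    print(stats.spam_prob("hamlet"))  # now dragged towards spam
```

Every spam marked as junk pulls the borrowed word away from "safe", which is exactly the collateral damage described above.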
Re:why not filter out 1337 sp3@k? by complexmath · 2004-01-14 12:04 · Score: 1

The problem I've encountered recently is a mass of email that imitates Microsoft update announcements and similar emails. For some reason Mozilla refuses to filter this stuff out even though I've been marking them all as junk. I still see ~20 a day in this exact format.
Re:why not filter out 1337 sp3@k? by bhtooefr · 2004-01-14 14:55 · Score: 1

I agree with you that Bayesian is going to fail sometime. However, I think it might be able to hold up for a while with every spam having a hash buster in it. You see, even though there is the hash buster, the more good and bad mail there is, the more accurate it gets. My advice? Build your corpus now, and prepare to switch to a verification system PLUS Bayesian PLUS conservative blacklist.
Re:why not filter out 1337 sp3@k? by pcmanjon · 2004-01-14 18:51 · Score: 1

if you can write me a regex that filters that out 80% of the time with 0 false positives, i will pay you 6 figures a year to sit on a chair in my museum as one of life's "mysteries". http://si20.com
Re:why not filter out 1337 sp3@k? by Haeleth · 2004-01-16 13:21 · Score: 1

The problem with that is that most of us receive 99% of our legitimate email from people who wouldn't know a public key if it bit them.

Not to mention mailing lists and other forms of solicited mass email - although I guess whitelisting would let most of that through easily enough.
Re:why not filter out 1337 sp3@k? by Haeleth · 2004-01-16 13:23 · Score: 1

Here you go - $0.00, $0.00. I'll expect you here at 8 am tomorrow.

Look for it. by Anonymous Coward · 2004-01-13 14:35 · Score: 1, Insightful

It is not very often that people send random gibberish in e-mail. Why not look for the gibberish? Hell, even MS Word can detect gibberish; I think a spam filter could score a message on non-linguistic gibberish.
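One way such a gibberish score might look, as a sketch only: count the share of words that are not in a known word list. The function name is made up, and the vocabulary would in practice come from a real dictionary file, which is an assumption here. It would also do nothing against spam padded with real dictionary words.

```python
import re


def gibberish_ratio(text, vocabulary):
    """Fraction of alphabetic words in text that are not in vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in vocabulary)
    return unknown / len(words)


if __name__ == "__main__":
    # Tiny stand-in vocabulary; a real filter would load a full word list.
    vocab = {"the", "meeting", "is", "at", "noon"}
    print(gibberish_ratio("The meeting is at noon", vocab))
    print(gibberish_ratio("xqzt plorf the", vocab))
```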

You blew it. by raehl · 2004-01-13 14:35 · Score: 5, Funny

You put Viagra in there in unaltered plain text.

--
paintball

Re:You blew it. by DoraLives · 2004-01-13 14:46 · Score: 2, Funny

You put Viagra in there in unaltered plain text.
Well...the idiots out there have to know they're going to be paying for something, don't they?

--
Is it fascism yet?
Re:You blew it. by Bryan_W · 2004-01-13 14:56 · Score: 1

Yeah he blew it in more than one way if he's buying Viagra.
Re:You blew it. by Alan · 2004-01-13 18:04 · Score: 1

Well, I think that these days you see "viagra" far less than you see "v14gra", "vi ag ra" or "V!I!A!G!R!A!!", so maybe it's going to end up on your non-spam wordlist? :)
Re:You blew it. by pipingguy · 2004-01-13 18:46 · Score: 2, Funny

You put Viagra in there in unaltered plain text.

Should SPAM filters check for correct spelling/dictionary check? Whoops, scratch that - wouldn't want to kill Slashdot replies.
Re:You blew it. by Dave2+Wickham · 2004-01-13 23:34 · Score: 1

Chapest Vagr nline! http://mNIbEvrjE2LUpKe8uHtv6QdhqHIq88nvW8HAB7Tf7.n ewzb.com/d12/index.php?id=d11
Tp 5 Reasons:
Csts 5$ less per pll from nline pharmacies
Get 1_day shipping.
Recieve your rdr next day
Bst up your sex life. Vagr wrks!
Stay rck hard like yu use to
Lst ll night with Vagr
**C**L**I**C**K*** HR TO S MR INFRMTIN! http://D7volDT14l5Uzjih2Kma7T6fyFNFy.newzb.com/d12 /index.php?id=d11

Apparently not... (that's a real spam I got)
Re:You blew it. by Anonymous Coward · 2004-01-14 04:18 · Score: 0

Check again.

"V1agra", "\/iagra", "Vi@gra"

Have 1 (one), \ (backslash) / (slash), and @ (at) respectively.
Re:You blew it. by SillySlashdotName · 2004-01-14 08:44 · Score: 1

You put Viagra in there in unaltered plain text.

Wish you would quote the original as I don't think they did.

"V1agra", "\/iagra", "Vi@gra",

Note that the 2nd is "\ /", NOT "V" - and is very apparent when quoting into a response.

--
Acts of massive stupidity are almost never covered by warranty. --me.
Re:You blew it. by pcmanjon · 2004-01-14 18:55 · Score: 1

test

hayu

Just great... by El · 2004-01-13 14:37 · Score: 5, Funny

... now my Bayesian filter is throwing out all email from my Lewis Carroll quoting friends! Thanks a lot, spammers!

--

"Freedom means freedom for everybody" -- Dick Cheney

Re:Just great... by You're+All+Wrong · 2004-01-14 02:20 · Score: 1

It's not funny; it's the reason they're doing it.

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.

I see this too by rockwood · 2004-01-13 14:37 · Score: 5, Interesting

I've been using "SpamBayes Outlook Plugin" since a previous /. article talked about it.

Agreeing with this article, over the past week or two I have seen an excessive amount of spam being missed by SpamBayes. Even after marking them as spam to improve the filter, they continue to hit the inbox, whereas previously absolutely no spam made it to my inbox. Additionally, there may have only been 2 or 3 emails marked as possible spam when they were not, and zero items marked as definite spam that were not.

SpamBayes has worked great previously, but now even it is falling short.

I feel that as the spammers manipulate the contents/context of the spam, it will eventually become impossible to determine the difference without physically looking at 500+ emails daily.
My primary use of email is business and not personal, therefore I cannot risk missing a client email, payment, question, etc... I've also seen a progression of clients having MY emails deleted or caught in spam filters due to the business aspect and requests for payments. I feel this is primarily due to the comparison of too-often-common phrases that a spam email and a business email contain. Such things as Click here to submit payment, or Buy these Products, Overdue etc... Even though all clients I email are only clients that contact me. I never cold-email anyone.

More spammers are using this random text as the only text in the subject and body, and using an image as the content of their email, which makes scanning even more complicated, if not impossible.

Having been on the net since before it was what it is today (going on 20 years), I often wonder how much control spam actually has over the net in several aspects:

If spam were to disappear, would overhead costs decrease enough for ISPs to pass along greater savings to the consumer?
If Spam were to disappear completely, how much faster would the Internet be?

Has anyone ever done a study to determine how much effect spam has on degrading the net, and what would it be like if all spam was gone tomorrow?

--
Never try to beat a professional at his own game!

Re:I see this too by mabu · 2004-01-13 14:52 · Score: 1

Congratulations!

Because spam wastes so much of your time, you're forced to waste additional time to update your computer constantly to battle this scourge.

Your time would be better spent sending a letter to your local attorney general asking him to get off his butt and start prosecuting these criminals.
Re:I see this too by anthony_baxter · 2004-01-13 15:26 · Score: 1

As I mentioned earlier, I found that nuking my database and only training on messages that SB didn't nail as ham or spam (1.0 or 0.0 scores) has made a world of difference. Give it a go.
Re:I see this too by Czmyt · 2004-01-13 15:35 · Score: 1

I use SpamAssassin and I have a pretty low threshold. Anything with a score of 4 or more gets flagged as spam and held in a separate spam folder. 5 or more is the recommended threshold for spam. Anything with a score of 7 or more gets permanently deleted. So even if you get a lot of spam, I think you really only need to review the messages that are near the threshold.
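The two-threshold setup described above can be sketched as a small triage function. This is not SpamAssassin's own API; the function names, the folder labels and the review margin are assumptions for illustration, with only the 4 and 7 cut-offs taken from the post.

```python
def triage(score, flag_at=4.0, delete_at=7.0):
    """Decide what to do with a message given its spam score."""
    if score >= delete_at:
        return "delete"
    if score >= flag_at:
        return "spam-folder"
    return "inbox"


def needs_review(score, flag_at=4.0, margin=1.0):
    """True for messages scored close enough to the flag threshold
    that a human glance is worthwhile (the margin is an assumed value)."""
    return abs(score - flag_at) <= margin


if __name__ == "__main__":
    for s in (1.2, 3.6, 4.4, 6.9, 12.0):
        print(s, triage(s), needs_review(s))
```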
Re:I see this too by Wild+Wizard · 2004-01-13 15:57 · Score: 2, Interesting

We managed a score of 42.8 recently with SpamAssassin

http://spamhalloffame.abnormalpenguin.com/

Only a few slip through at a level of 5 for us, haven't yet got to piping the high level ones directly to /dev/null yet
Re:I see this too by N7DR · 2004-01-14 03:55 · Score: 1

SpamBayes has worked great previously, but now even it is falling short.
Well, I use POPfile, and all I can say is that not a single one of these bizarre spams full of random words has made it into my inbox, and neither has POPfile falsely accused any real e-mail of being spam. In other words, this new tactic hasn't affected my inbox one little bit.
In fact, just yesterday I was wondering why spammers were suddenly filling their messages with random words, since it sure wasn't getting past my Bayesian filter. I guess that the answer is that it does confuse some programs, just not POPfile (yet, anyway).

My spam had Linux gibberish in it. by Slayk · 2004-01-13 14:37 · Score: 1

Needless to say I was mildly amused. P Hilt0n Vid

Visit site (topright lin!
EExceppt for specific coompaatiibilittyy mmodes (chhainn-loading and the Linuxx piggybbaack foormat), all kkerrnels willll be staartted in mmuchh tthe samee statte as inn the MMultibooot Specciifficattion.. Onlly kerrnels loaded at 11 meggaabbyyte or aabove are ppresentlyy supported. Anny attemppt tto load beeloww thaat bounddaryy will simmplly result in immeediaate failuree andd aan erroor messagge reportinng the problemm. .

The next attempt by eschasi · 2004-01-13 14:38 · Score: 2, Insightful

As the article points out, the technique isn't as effective as one might initially think. However, there's a clear "next generation" method that I'm sure we'll soon be seeing:

Insert four or five lines of valid extra text -- lines from books, selections from recent USENET postings, etc, etc -- into the spam. Make the selection semi-random. Now do it 100 times and send 100 copies to each person on the mailing list.

One of them will get through. And the spammers will continue to work.

My friends have been accusing me of this for years by ewg · 2004-01-13 14:39 · Score: 1

My friends have been accusing me of emailing them randomly generated streams of dictionary words for years...

--
org.slashdot.post.SignatureNotFoundException: ewg

Server-side Bayes by Anonymous Coward · 2004-01-13 14:40 · Score: 0

AFAIK, Bayesian filters are not used much (if at all) on mail servers.

Our CanIt-PRO product does server-side Bayesian filtering, and different users can have their own personal Bayes corpus.

Yahoo seems to have worked it out pretty fast by stewball · 2004-01-13 14:40 · Score: 1

I use a yahoo email address for newsletters, registration, etc. I got maybe 5 of the nonsense word spams a couple weeks ago, marked them as spam, and every one of them's gone into my bulk folder since then.

Of course, Yahoo's false positive rate on newsletters is atrocious, but it's easy enough to pick those out and then empty the bulk folder.

Just curious, anybody know what Yahoo's using for spam filtration?
-----

--
Point and Counterpoint: The Tick - "Spoon!" Neo - "There is no spoon."

Re:Yahoo seems to have worked it out pretty fast by whovian · 2004-01-13 15:17 · Score: 1

Hard to tell from where I'm sitting.

I dunno, does Yahoo take a more aggressive approach to filtering if you use their premium service? I get enough ~140 kb emails offering to upgrade my Win* OS to fill up my free 6 MB mailbox twice a week. I can't tell whether having reported those emails as "Spam" to Yahoo has made a difference.

Atrocious, spam is.

--
To-do List: Receive telemarketing call during a tornado warning. Check.
Re:Yahoo seems to have worked it out pretty fast by Anonymous Coward · 2004-01-13 15:55 · Score: 0

> Yahoo's false positive rate on newsletters is atrocious

Most likely because people complain that stuff is spam.

It seems that every web retailer you deal with will eventually send you their newsletter, even when you specifically opted-out.
Re:Yahoo seems to have worked it out pretty fast by WuphonsReach · 2004-01-13 18:05 · Score: 1

Dunno, but I've tagged darn near every Newslinx e-mail newsletter that I've gotten in the past few months as "Not spam", yet Yahoo! Mail still puts it in the bulk folder.

You would expect that after 100 whacks with a rolled up newspaper (bad filter! bad filter!), it would get the idea that Newslinx is not spam...

I believe it's because Newslinx uses a new e-mail address every time (unlike the BBC daily news which never gets tagged as spam).

--
Wolde you bothe eate your cake, and have your cake?

I was worried about this for a while by supertux · 2004-01-13 14:48 · Score: 1

I've actively been using the bayesian filter that Mozilla comes with for a bit over a year now. Although it seemed to take forever to get 'trained' to what I consider spam, I've found that it works exceptionally well, maybe mismarking a legitimate email to me probably less than a dozen times so far (after the initial round of training).

Maybe six months ago, I noticed I was receiving quite a lot of these hash busting spams and I was bummed that maybe the bayesian filter wasn't the be all end all of spam filters.

But I pressed on using it, and in time, almost all of the hash busting emails are again getting filtered as spam.

I'd guess there are only so many different ways people can write Vi@gra and still have it be readable...

SuperTux

Everybody say this with me by mabu · 2004-01-13 14:48 · Score: 1

1. Wow? Spammers subvert content-based filters? Say it isn't so???? Get real!

Client-side filtering is a band-aid on a malignant tumor growing out of control. It will NEVER work, EVER. It requires constant updating and monitoring to avoid blocking legitimate e-mail and is a black hole of resources, time and money. Because of the ROI, spammers have more incentive to crack the filter than filter companies do to block the spammer.

If you're using client-side (or even server-side), content-based spam filtering, you're only hurting yourself. It's better to get a few spam messages than miss a critical communique, which can cost you a lot more. But feel free to piss in the wind - it seems to be in style anyway.

RBLs, and specifically Spamcop's Relay Blacklist are much more effective than content-based filtering.

2. Spammers break into systems, STEAL bandwidth and network resources. Almost all of them break various laws in virtually every region they operate.

3. The authorities are too busy detaining little old ladies at airports for possessing a fingernail clipper, suing 13-year olds downloading Bobby McFerrin, and raiding Tommy Chong's house to care.

4. Spam will disappear when the major network providers endorse a centralized SMTP whitelist. The reason why nobody talks about it, is that it's a cure for the spamedemic and there are a lot of companies out there, including all the ISPs that profit from spam.

Even Microsoft hates Spam by Jorkapp · 2004-01-13 14:48 · Score: 1

http://www.wired.com/news/technology/0,1282,61742,00.html?tw=wn_story_related

By the looks of things, they are going so far as to identify the most active spammers, and hunt them down.

Score.

--
Frink: Nice try floyd, but you were designed for scrubbing, and scrubbing is what you shall do.

You'll laugh from it... by Scrameustache · 2004-01-13 14:48 · Score: 5, Funny

A while ago I got a spam that contained a few excerpts from The Raven by Edgar Allan Poe. I got a laugh out of that one.

...never more ;- )

--

You can't take the sky from me...

Re:You'll laugh from it... by Anonymous Coward · 2004-01-13 18:29 · Score: 0

The Raven huh? Well *I* got some penis enlargement advice from Catullus himself:
...Consecuter adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud...

i'm screwed by humble_moon · 2004-01-13 14:49 · Score: 0

I use random words for subjects quite often. I consider it a form of poetry when i'm writing to friends.. not completely random, but thought provoking in a semi-sensical way. Guess I'll be filtered out soon..

Blacklists and SpamAssassin Combined by rossz · 2004-01-13 14:50 · Score: 1

I recently re-evaluated my antispam blocks. Over the xmas holiday there was a very noticeable increase in the amount of crap slipping through my defenses.

I ended up tweaking a few SpamAssassin rules to deal with what is popular (with the spammers) at the moment. This will need to be adjusted manually as the spammers change tactics. SpamAssassin is scanning everything after the DATA ACL while still connected so it can deny the message and not bounce it to some poor schmuck being joe-jobbed.

I also made a few changes to the blacklists. sbl-sbl.spamhaus.org is now my all time favorite blacklist. I also block all of China and Korea. I don't know anyone in those countries and they constitute a large percentage of all spam. Yes, I know the U.S. is the biggest source of spam, but I can't exactly blacklist my own country and expect to get email from friends and associates.
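For readers who have not used a DNS blacklist: the lookup is an ordinary DNS query for the sender's IP with its octets reversed, under the blacklist's zone. A sketch follows, using a placeholder zone name (dnsbl.example.org is an assumption, not a real list); the helper names are made up as well.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has an A record for this IP.

    Needs network access; a failed lookup is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Doing this check at SMTP time, as described above, lets the server refuse the message outright instead of accepting it and bouncing it later.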

Another major change was to stop accepting email from dynamic ip addresses. This forced me to add a condition to allow one friend's server to send to me. I've since removed that exception as he's finally listened to reason and is routing his stuff through another mail server he administrates that is on a static ip address.

In the week since I've tweaked my settings a total of 3 or 4 spams have made it through. Zero would be nice, but that isn't attainable without a serious risk of false positives, and probably not even then.

Finally, there's my personal blacklist. People or companies who annoy me too much end up in there. One was for an online magazine my wife signed up for that doesn't seem to have a way of unsubscribing. After several futile attempts by my wife to get them to stop sending their stuff I stuck them into the blacklist.

On a few occasions I've blocked ip addresses at the firewall. Spammers using software that doesn't recognize the "bugger off" error code, NameProtect.com just because I don't want them snooping around on my system, and the occasional script kiddie.

I guess you can say it's a game. I keep score by comparing the number of "rejects" in my logs to the amount of spam getting through. I'm winning by a long shot (250 to 4 according to my current log).

--
-- Will program for bandwidth

What I don't understand by Trejkaz · 2004-01-13 14:50 · Score: 3, Interesting

What I don't understand about this type of spam is that often it doesn't contain any actual advertisement, just three or four lines of random words, and the end of the email right there.

I don't get it. If you're not selling a product, what is the spam for?

Mind you since TMDA, I haven't been seeing any spam anyway.

--
Karma: It's all a bunch of tree-huggin' hippy crap!

Re:What I don't understand by he-sk · 2004-01-13 15:04 · Score: 4, Informative

That's the text/plain part you see. The "advertisement" is in the text/html part.

I was very irritated by that, too, until one day I was testing the HTML viewer of an e-mail client.

--
Free Manning, jail Obama.
Re:What I don't understand by Trejkaz · 2004-01-13 15:55 · Score: 1

Actually I was viewing the source of the whole email, not the text part.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by berzerke · 2004-01-13 16:33 · Score: 4, Interesting

[What I don't understand about this type of spam is that often it doesn't contain any actual advertisement, just three or four lines of random words, and the end of the email right there.] Actually I was viewing the source of the whole email, not the text part.

I too see this sometimes. You're not crazy (at least with regards to this). I've looked at the full source, but still can't figure out what the goal is. My best guess is either they are fishing for bounces (ok, these are bad addresses; the ones that don't bounce may be good addresses), or the spamming software has a problem (bug or is misconfigured).
Re:What I don't understand by Trejkaz · 2004-01-13 16:39 · Score: 1

I hope to hell they're fishing for non-bouncing addresses, because at the moment any email which SpamAssassin says is spam, I bounce.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by Mr+Z · 2004-01-13 17:56 · Score: 2, Interesting

So far as I can tell, most mainstream mailreaders (in their default configuration) will show you only the HTML component, if both variants are provided.

Thus, the spammer puts their filter-fooling gibberish in the text/plain component, and their ad in the text/html component. The recipient is none the wiser about the gibberish.

Since I use mutt, and I don't have an HTML filter configured, I'm immune to the ads in most spam. Since spam advertisements like to have tracker images and so on (to measure how often people actually open spam), I seem to get relatively little spam that lacks an HTML component. Further, most spam lacks a meaningful text/plain component.

The only annoyance with this arrangement is the fact that one or two of my coworkers insist on sending HTML-only email. *sigh* (Since one of them is the father of JTAG, I don't bother trying to bend his ways.)
--Joe

--
Program Intellivision!
Re:What I don't understand by Trejkaz · 2004-01-13 18:17 · Score: 1

This is all well and good, but just like I said to the other guy, I was viewing the source of the email, not using a user agent.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by g0_p · 2004-01-13 18:18 · Score: 1

Maybe it was a test email that the spammer sent to see if it makes through...

One that I received today had a link to a non-existent website as well..
http://www.Stop6The4Spam4Already.com
Re:What I don't understand by ElectricRook · 2004-01-13 18:38 · Score: 5, Informative

I hope to hell they're fishing for non-bouncing addresses, because at the moment any email which SpamAssassin says is spam, I bounce.
Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold.
I work for a big company, an icon of the computer business. Our mail servers get spammed a lot. We often have typical user names grafted onto the From or Reply lines. Since my user name is pretty damn common, and some of my work mail aliases are TLAs, I look at a lot of spam. When I read the headers (in a text file, not easily spoofed mail software), almost always the sender's domain is not even close to the domain of the spamming machine. Go put the IP addresses into dnsstuff.com, and compare that to the hostname. These turds hack the sendmail.cf file of the spamming machine. "SallySmith@aol.com" probably did not send spam-mail from a ".kr" ISP.

--
- High Tech workers, please say NO to Union Carpenters, their Union sees fit to control our compensation.
Re:What I don't understand by Mr+Z · 2004-01-13 18:50 · Score: 1

I was just trying to relate the underlying sense behind the structure.

FWIW, if you're writing a spam filter, it may make sense to keep separate statistics for the different structural components of the email. For instance, if you get a text/html component, compare its contents to other text/html components.

Another fertile ground for comparison is encoding types. How many legit emails do you get with text/plain encoded as base64?
--Joe
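A sketch of what keeping separate statistics per structural component might look like, using Python's standard email package. The function name and the "encoding:" pseudo-token are invented for illustration; a real filter would use a proper tokeniser rather than whitespace splitting.

```python
import email
from collections import Counter


def tokens_by_part(raw_message):
    """Return {content type: Counter of tokens} for each text part.

    The transfer encoding of each part is recorded as an extra
    pseudo-token, so "text/plain sent as base64" becomes a feature.
    """
    msg = email.message_from_string(raw_message)
    stats = {}
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        ctype = part.get_content_type()
        encoding = (part.get("Content-Transfer-Encoding") or "7bit").lower()
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("latin-1")
        counter = stats.setdefault(ctype, Counter())
        counter.update(text.lower().split())
        counter["encoding:" + encoding] += 1
    return stats
```

Each content type then gets its own token table, so gibberish in text/plain does not dilute what the filter has learned about text/html.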

--
Program Intellivision!
Re:What I don't understand by Trejkaz · 2004-01-13 19:31 · Score: 2, Interesting

Whereas it might be true that all "spam" has forged headers, not all email which passes the 5.0 threshold has forged headers.

Also aren't other mail servers supposed to check that the envelope sender matches the host it's being sent from?

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by zarkzervo · 2004-01-13 20:59 · Score: 1

They want you to filter out all emails with legal random words. After a while, many of the emails you want will be filtered out. This will force you to manually filter your spam-folder. They can then send spam as normal and you will read them because you do not want to lose legit email.
This is just an assumption. I do not know this.

--
Insert `fortune -o` here
Re:What I don't understand by julesh · 2004-01-13 22:00 · Score: 1

I hope to hell they're fishing for non-bouncing addresses, because at the moment any email which SpamAssassin says is spam, I bounce.

Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold

Yes, those of us who do this are aware of this problem.

However, we feel that it is more important that people who are trying to send legitimate messages that are caught as false positives by filters are aware that their message did not get through.

As someone who has frequently suffered as the forged sender of large mailshots in the past, I can quite understand how frustrating it is. But then, I tend to get thousands of copies of the virus-du-jour too.
Re:What I don't understand by funky+womble · 2004-01-13 23:16 · Score: 2, Informative

Bouncing high scoring mail works pretty well, as long as you do it right.
Re:What I don't understand by TheMidget · 2004-01-13 23:36 · Score: 1

Also aren't other mail servers supposed to check that the envelope sender matches the host it's being sent from?
They are. It's called SPF. However, this standard is still new and thus not very widely implemented yet, but this will probably change in the next couple of weeks.
Re:What I don't understand by Cartridge+P.+Grover · 2004-01-14 00:01 · Score: 1

You're missing the key point. Spammers don't always do it for money. They do it to harm people and businesses. They do it because they are bad people. That is their prime motivation. Effortlessly, spam could be ended tomorrow, with no new technology and no new laws. All that has to happen is for a few really dumb people to think really hard. And I don't mean the spammers or their customers.
Re:What I don't understand by Rhubarb+Crumble · 2004-01-14 01:09 · Score: 1

Another fertile ground for comparison is encoding types. How many legit emails do you get with text/plain encoded as base64?
Ummm, actually, quite a few. Mostly from Chinese people using OE, must be a default setting in the localised version.
Re:What I don't understand by You're+All+Wrong · 2004-01-14 01:39 · Score: 1

Base64's a good indicator, I use:
:0 H:
* Content-Type: text/(html|plain)
* Content-Transfer-Encoding: base64
${CRAP}

And similarly, I check to see if people are using quoted-printable in order to disguise plain characters. Likewise, the use of HTML entities to encode plain characters is a dead giveaway:
:0 B:
* &#([3-9][0-9]|1[0-2][0-9]);
${CRAP}_1_6

(That one's still experimental, as in theory it has legit uses.)

The clue is not to look at what's being displayed or hidden, it's to look at whether something's trying to be hidden.
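The same "is something being hidden?" idea, sketched outside procmail: count numeric HTML entities that decode to ordinary ASCII letters or digits, which nobody needs to escape. The function name is made up, and any cut-off for the count is left to the reader.

```python
import re

# Decimal numeric character references such as &#86; (two or three digits).
ENTITY = re.compile(r"&#(\d{2,3});")


def disguised_plain_chars(body):
    """Count numeric entities that stand for plain ASCII letters or digits."""
    count = 0
    for match in ENTITY.finditer(body):
        ch = chr(int(match.group(1)))
        if ch.isascii() and ch.isalnum():
            count += 1
    return count


if __name__ == "__main__":
    print(disguised_plain_chars("&#86;&#105;&#97;gra"))    # escaped letters
    print(disguised_plain_chars("Tom &#38; Jerry &#169;"))  # legitimate uses
```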

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.
Re:What I don't understand by gnu-generation-one · 2004-01-14 01:41 · Score: 1

"I don't get it. If you're not selling a product, what is the spam for?"

For attacking spam-filters.
Re:What I don't understand by mrogers · 2004-01-14 01:59 · Score: 1

The HTML part of the message probably contains a tiny transparent image which is loaded from a URL that is different in each copy of the message. When your email client displays the message it requests the unique URL, telling the spammer that your email address is valid, as well as what IP address you're using (which can then be used to link your email address to info derived from web cookies etc).
Yahoo mail allows you to disable images in HTML emails for this reason, not sure about Hotmail, Outlook, Evolution etc.
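A filter or mail client can spot candidate tracking images by listing every image in the HTML part that would be fetched from a remote server. A minimal sketch with the standard-library HTML parser; the class and function names, and the example URL, are assumptions.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> that points at a remote server."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.lower().startswith(("http://", "https://")):
            self.urls.append(src)


def remote_images(html_body):
    """Return the remote image URLs found in an HTML message body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.urls


if __name__ == "__main__":
    body = '<p>hi</p><img src="http://tracker.example/x.gif?id=abc123" width="1" height="1">'
    print(remote_images(body))
```

Images embedded in the message itself (cid: references) are ignored, since displaying them tells the sender nothing.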
Re:What I don't understand by Mr+Z · 2004-01-14 02:23 · Score: 1

Ok, good point. :-)

I, on the other hand, never seem to, so my locally-trained filter could pretty reliably reject spam based on that fact, whereas yours could not.
--Joe

--
Program Intellivision!
Re:What I don't understand by Don'tTreadOnMe · 2004-01-14 02:26 · Score: 1

Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold.

I'm torn by this, actually. What I fear most is blackholing legitimate mail. At least if I bounce it, then the legitimate sender will see the return, and know their mail didn't get through.

But then, I know I am adding to the problem, because most of the bounces are going to the address the spammer forged...

--
It's a catch. Catch 22.
Re:What I don't understand by Anonymous Coward · 2004-01-14 02:39 · Score: 0

That's the text/plain part you see. The "advertisement" is in the text/html part.

I was very irritated by that, too, until one day I was testing the HTML viewer of an e-mail client.

You were very irritated by getting these few lines of ASCII gibberish in your e-mail, until you found out they were really HTML spam, so then you were cool with it?
Re:What I don't understand by berzerke · 2004-01-14 03:32 · Score: 1

...When I read the headers (in a text file, not easily spoofed mail software), almost always the senders domain is not even close to the domain of the spamming machine...

I'll have to double check this next time I get a spam with just a few random words. It's rare though. The fishing for bounces may be accurate after all. Think about it. There is no pitch to sell anything in said emails. No links, nothing. Just a few random words. Since spam is commonly defined as unsolicited COMMERCIAL email, the spammer can claim, "I was playing with some software and it just malfunctioned. But it wasn't spam because I wasn't trying to sell anything."
Re:What I don't understand by Brian+Ristuccia · 2004-01-14 04:51 · Score: 3, Interesting

I hope to hell they're fishing for non-bouncing addresses, because at the moment any email which SpamAssassin says is spam, I bounce.
Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold.

Returning suspected spam might have a small adverse effect on the legitimate holders of forged addresses, but silently deleting suspected spam adversely affects everyone by causing misclassified messages to be silently lost. The practice of bouncing spam doesn't increase collateral damage, it prevents it. Automated processes must cause mail to either reach its destination or be returned to its purported sender. Otherwise legitimate mail will get silently lost. That's collateral damage.
This balance of burdens is fair too. Fake bounces are much easier to filter than ordinary spam. Even if the bouncing MTA engages in the unfortunate practice of sending bounces that don't contain the original message you can still filter all fake bounces with 100% reliability. Simply send each of your outgoing messages with a unique tagged, timestamped envelope sender address. Bounces which arrive at other addresses are always in response to forgeries and can be safely discarded.
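
One way the tagged, timestamped envelope sender could work is sketched below. Everything specific here is an assumption made for illustration: the user+timestamp-tag address layout, the HMAC tag, the secret, and the seven-day lifetime are not taken from any particular MTA.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical per-user secret


def _tag(local, stamp):
    """Short keyed hash binding the mailbox name to the timestamp."""
    digest = hmac.new(SECRET, f"{local}:{stamp}".encode(), hashlib.sha256)
    return digest.hexdigest()[:12]


def tagged_sender(local, domain, now=None):
    """Build an envelope sender like user+<timestamp>-<tag>@domain."""
    stamp = str(int(now if now is not None else time.time()))
    return f"{local}+{stamp}-{_tag(local, stamp)}@{domain}"


def bounce_is_genuine(recipient, max_age=7 * 86400, now=None):
    """True only if the bounce was sent to an address we generated recently."""
    now = now if now is not None else time.time()
    local_part, _, _domain = recipient.partition("@")
    local, plus, rest = local_part.partition("+")
    stamp, dash, tag = rest.partition("-")
    if not (plus and dash and stamp.isdigit()):
        return False  # untagged address: we never sent mail from it
    if not hmac.compare_digest(tag.encode(), _tag(local, stamp).encode()):
        return False  # tag was not produced with our secret
    return 0 <= now - int(stamp) <= max_age
```

A bounce addressed to the bare, untagged address fails the check, so it can be dropped without ever looking at its body.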
Re:What I don't understand by RobNich · 2004-01-14 08:58 · Score: 1

Also aren't other mail servers supposed to check that the envelope sender matches the host it's being sent from?

Absolutely not, that would break most people's sending of email, since they use their ISP to send, which may or may not be the same as their sending domain.

--
Hello little man. I will destroy you!
Re:What I don't understand by zexxxx · 2004-01-14 09:08 · Score: 1

... and compare that to the hostname. These turds hack the sendmail.cf file of the spamming machine ...

In that case, these spammers should become sendmail developers. Only God can hack those .cf files.
Re:What I don't understand by CargoCultCoder · 2004-01-14 09:13 · Score: 1

What I don't understand about this type of spam is that often it doesn't contain any actual advertisement, just three or four lines of random words, and the end of the email right there.

I don't get it. If you're not selling a product, what is the spam for?

I think it's "chaff" intended to reduce the effectiveness of Bayesian filters.

It's like a plane ejecting a load of foil strips in order to reduce the effectiveness of ground radar. The radar operator suddenly has a much tougher job of distinguishing between blobs that are actually planes, and blobs that are wads of foil.

If your Bayesian filter is bombarded by e-mails with random subjects, random senders and random contents, it's going to have a harder time distinguishing spam from legitimate mail. So, you either get more unfiltered spam, or you find a better filter (?).
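
To get a feel for how extra words move a Bayesian score, here is a toy word-count classifier. It is a deliberately simplified sketch (add-one smoothing, no class priors, whitespace tokens) and not how any real filter is implemented; the class name and training text are invented.

```python
import math
from collections import Counter


class TinyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        """Count the words of one training message under 'spam' or 'ham'."""
        words = text.lower().split()
        self.counts[label].update(words)
        self.totals[label] += len(words)

    def spam_log_odds(self, text):
        """Positive means spammy, negative means hammy (add-one smoothing)."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        score = 0.0
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][word] + 1) / (self.totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


f = TinyBayes()
f.train("spam", "buy cheap pills now buy now")
f.train("ham", "meeting agenda for monday lunch meeting")

pitch = f.spam_log_odds("buy pills")
with_random = f.spam_log_odds("buy pills kulak sachem")
with_hammy = f.spam_log_odds("buy pills meeting agenda monday")
```

In this toy, with equally sized training sets, words the filter has never seen leave the score where it was, while words borrowed from legitimate mail pull it toward ham. The damage from pure chaff would come later, if those messages get trained on and muddy the word counts.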
Re:What I don't understand by Trejkaz · 2004-01-14 10:45 · Score: 1

I guess if it's for attacking spam filters, people who run software like TMDA are immune. At least until something fails the filter and is on the whitelist.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by Trejkaz · 2004-01-14 10:54 · Score: 1
Oh well. Personally I'll just keep sending it through the mail server I use.
- When I was using the ISP's mail, I was using the ISP's mail server.
- When I had my own domain hosted off-site, I was using the mail server on the box itself.
- Now I have my mail stored at home, and I send through the box at home.
- - And in the event where it doesn't seem simple to the user to do this, hey... they can always set up SPF for their domain to include their ISP's mail server, right?
--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by Trejkaz · 2004-01-14 10:58 · Score: 1

Again, like I said, I was viewing the source. NO HTML PART! Jesus. You think people would pay attention.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by gnu-generation-one · 2004-01-14 11:15 · Score: 1

"I guess if it's for attacking spam filters, people who run software like TMDA are immune."

In the same way that people who live in nice neighbourhoods are immune from crime I suppose... It's still there, just that they don't see it.

One thing TMDA people are immune from is getting receipts from online shopping, or confirmation emails for mailing lists and such like. Computers that don't read the challenge/response questions, and whose email addresses you don't know in advance to whitelist.

Hopefully they're also immune to the angry emails from everyone whose email addy was spoofed, and is getting a shedload of challenge emails from TMDA systems...

And it doesn't really stop someone from sending spam. All they need to do is write schneier@securityfocus.com in the From field, and their email gets through to anyone who's subscribed to cryptogram and whitelisted it.
Re:What I don't understand by Trejkaz · 2004-01-14 12:23 · Score: 1

Also their spam would have to class as less than 5.0 on SpamAssassin, or it would be blocked by the first rule in my TMDA's incoming filter.

Handling receipts for online shopping is easy. I just go in and release the last email a minute after I made the order. :-p

And the angry people, well... I'll let you know when I find one.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by Thuktun · 2004-01-14 12:43 · Score: 1

I hope to hell they're fishing for non-bouncing addresses, because at the moment any email which SpamAssassin says is spam, I bounce.

Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold.

Depends on whether "bounce" is being used (arguably misused) to mean rejecting the email during the SMTP transaction. That won't directly cause misdirected non-delivery notification.

Crafting such a notification after accepting the message and delivering that to the envelope sender is indeed very rude these days.
Re:What I don't understand by pcmanjon · 2004-01-14 19:02 · Score: 1

Why won't my ISP sbcglobal.net subscribe to this service [spf.pobox.com] and why do they [support] refuse to give me the email to an executive or someone who can take suggestions on how to implement spam-assassin for our ISP?
Re:What I don't understand by mrogers · 2004-01-16 02:33 · Score: 1

Actually you didn't mention that in your post, you obnoxious turd.
Re:What I don't understand by Trejkaz · 2004-01-16 08:28 · Score: 1

That's because it was in a comment, not a post, you dumbass.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
Re:What I don't understand by RobNich · 2004-01-17 15:13 · Score: 1

My sequence of events follows yours exactly, except that my hosting provider didn't have sendmail set up to allow external SMTP relaying. Now that my domains are hosted at home, I got everything working except the component of postfix that authenticates SMTP connections. So until I get that working, I have to SSH into my home network, and tunnel port 25 to the mail server. This is not a viable solution for most people (IMHO). It should be easier to get working.

--
Hello little man. I will destroy you!

A method for removing spam from your life. by crazyphilman · 2004-01-13 14:51 · Score: 4, Interesting

It's old fashioned, and some of you will probably make fun of me for using it, but hey, I'm old school. FYI, here's my method:

1. Create manual spam filters (NOT Bayesian filters) in your inbox called "Friends and Family", "Work", "Services", "logfiles", and any others you find you need. Each category applies to a broad type of email address you'll receive email from. Then create a subdirectory in your inbox for each of these filters (named the same way, naturally).

2. For each filter, build a list of people who are allowed to email you. For example, your ISP, your bank, and your phone company would probably be added to services. Just add the email address they send their messages from to the list.

3. For each filter, have the filter move messages matching the filter (From equals ) to the correct subdirectory for the filter. Then stop processing for that message, so it doesn't get interpreted by other filters. Think of this as an analogy for ipfilter or ipfw in your firewall setup -- only you're filtering emails instead of packets.

4. Finally, DELETE EVERYTHING ELSE in the very last filter.
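
The four steps above amount to a first-match ruleset with a default action, which can be sketched in a few lines of Python. The folder names come from step 1; the addresses and the function name are made up for illustration, and a real mail client would express this in its own filter dialog rather than in code.

```python
# Hypothetical folders and addresses; the order matters, as in a firewall ruleset.
RULES = [
    ("Friends and Family", {"dad@example.net"}),
    ("Work", {"boss@example.com"}),
    ("Services", {"billing@isp.example", "alerts@bank.example"}),
]


def file_message(from_address):
    """Return the folder for a message: first matching list wins, else Deleted Items."""
    sender = from_address.strip().lower()
    for folder, allowed in RULES:
        if sender in allowed:
            return folder  # stop processing, like a matched firewall rule
    return "Deleted Items"  # step 4: everything else
```

Because the default is to delete, anything not on a list lands in one place for the quick scan.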

You USE this approach by doing a quick scan of the deleted items folder to see if anything is interesting. If not, just clean out those deleted items. It's a one step operation, much easier than selectively deleting a hundred emails one at a time.

Then, you scan each of the folders you set up, IF the folder has picked up an email, focusing only on your REAL email.

This approach has saved me a HUGE amount of work lately. My life is a whole lot easier, and it's way easier than trying to train a Bayesian filter. If I don't know you, you can't get too much of my attention.

It's all about being on the list, sort of like getting into a nightclub... ;)

--
Farewell! It's been a fine buncha years!

Re:A method for removing spam from your life. by fishbowl · 2004-01-13 15:00 · Score: 1

Well, Phil, your solution is a good approach -- and I do something similar.

But filtering your mail *after* you've received it only solves the part of the problem that relates to the value of your time and your annoyance threshold. It does absolutely nothing about the resources required to receive the unwanted mail in the first place, or to file it.

I wish I could run the MX for my domain on my own network (but my cable provider forbids it.) I wouldn't even open the SMTP socket for these assclowns. I'd blacklist entire continents, if that's what it took. My ISP cannot do that for me.

It really doesn't bother me so much to have to delete the spam from my folders, or read the occasional one that spamassassin misses, or whatever. But it does bother me that I have to receive the message in the first place.

--
-fb Everything not expressly forbidden is now mandatory.
Re:A method for removing spam from your life. by crazyphilman · 2004-01-13 15:17 · Score: 1

Ah. Well, you've got me there.

In my case, I have two situations to worry about, and neither is under my direct control.

At work, I'm compelled to use (PITY ME) MS Outlook for all my email needs. Anything else is verboten. And, our email system is run by Exchange, totally out of our control in a different part of the building. So, although I'd love to move my filtering upstream, it's not possible. I hear they're trying to implement something, but I have a limited amount of confidence they'll succeed. Sigh... Client-side filtering is about all I can do. But it DOES save me a LOT of time. I get a buttload of spam at work, thanks to a bout with stupidity in which I signed up for some professional journals and the bastards sold my email address almost immediately. I've reduced my time sifting through spam to about a minute here and there, from half an hour at a time. Ain't bad...

At home, I use a cable modem ISP, with pretty fast access. I have no idea what kind of filtering they have upstream, and I have no control over it anyway. All I can do is client-side filtering. Here the problem isn't that bad, because my email address hasn't gotten too far out on the web. And, no Outlook -- OS/X tools only. So it's a more comfy environment...

Interesting idea about running your own mail exchange. You can apply a similar idea there, too, right? I mean, only permit mail from certain domains, etc? Maybe just delete other packets, so the source doesn't even know if your server's up? Or is that unneighborly? ;)

--
Farewell! It's been a fine buncha years!
Re:A method for removing spam from your life. by WuphonsReach · 2004-01-13 15:28 · Score: 1

That's basically what I do (whitelisting)... except that instead of deleting the remainder, I hand it off to a folder that SpamBayes watches and let bayesian take a crack at it. The majority of what hits that folder is going to be spam anyway, bayesian does a nice job of separating the accidental misses.

Anything that SpamBayes flags as spam is pretty much 99.99% spam (with a few thousand messages that it was trained on).

--
Wolde you bothe eate your cake, and have your cake?
Re:A method for removing spam from your life. by crazyphilman · 2004-01-13 15:39 · Score: 1

That's interesting. Combining whitelisting and Bayesian filtering in tiers... But I think deleting all the remaining emails, then scanning the deleted items folder for thirty seconds in case anything interesting went into it might be easier for a lazybones like me. Bayesian filters have to be trained, and monitored... Ugh, too much work.

Besides, the human brain and eye are pretty good at picking out an interesting pattern in a sea of uninteresting ones. I find I'm pretty good at the scanning part.

--
Farewell! It's been a fine buncha years!
Re:A method for removing spam from your life. by Cerberus9 · 2004-01-13 16:40 · Score: 1

Informative? It's called whitelisting.
Re:A method for removing spam from your life. by John+Jorsett · 2004-01-13 17:05 · Score: 3, Funny

Phil! Thank God! I've been trying to get in touch since I had to change ISPs and you stopped answering my email. How have you been?

Dad
Re:A method for removing spam from your life. by ediron2 · 2004-01-13 18:39 · Score: 2, Insightful
Phil;
Twice in this thread, I see you talking about training the bayesian filter. You seem to think this is something of a burden, like training a big dog...
I think you misunderstand how easily one trains the current Mozilla email client's bayesian filter.
Day 1:
1: the mail comes in, spam included.
2: one of the inbox columns is a blue 'recycle' lookin' symbol. It is a toggle that acts like the 'new' indicator column, and a click on it turns state on or off.
3: glancing through the list, one clicks on the obvious spam, on this column. If there are chunks or patterns that help, you sort them via whatever useful column, then highlight a group, and hit a 'junk' button up in the toolbar. The messages marked as junk disappear (into a 'junk' folder), where they are automatically parsed by the bayes filter. This is, I guess, what you mean by training the filter. For me, it took about 4 minutes the first day, for over 100 messages at a 90% spam ratio. No disrespect, but I doubt you could write your whole stack of filters in 4 minutes.

Day 2:
Most of the junk mail gets caught. I'd say well over 3/4ths of the spam goes away on day 2. You see it come into your inbox, and then a second later all the junk items get the little blue icon turned on, then flash away to the junk folder. A few missed items or new junky things surface.
Days 3 and on: same thing, only better. By the 4th day, my 100 messages a day had fallen back to the dozen nonspams, plus one or two bogus items. It's an automatic 'In, ZZAP! Junk!' Every few days, I glance at the junk folder as you mention, and so far in the last 4 months I've had 5 misfiled messages declared as junk. 3 of them were atypically 'spammy' messages on usually-clean lists.
Now, compared to your way, I have:
- No rules to maintain,
- no problems with exceptions that are hard to write filters for. In my case, I'm on a couple mailing lists that broadcast all messages with the true sender (not the list) as the 'from' field, and nothing obvious in the subject line to filter on.
- Oh, and I'm lazy, too. What you describe sounds like it would take a few dozen built/tested filters, plus maintenance each time I get a new customer or the likes.
- no problems if a prospective customer sends me a request for a bid 'out of the blue',
- My way's sorta fun: Each morning, I see a message like 'getting 1 of 103 messages'... it counts up to 103, then I watch as the stack gets filtered back to just the real ones. Instead of admiring my own cleverness (advantage here to your way), I get to admire this nifty gadget that 'Just Works.' In fact, the one thing I'd like to see in this mail client is a 'Why' button, just so I could see diagnostics on a message's bayesian results. That, and a ranking to keep track of the spammiest message scores my filter ever sees!
- no lost messages from people I neglected to include in my filters.
Granted, you'll find those lost in your method in the spam folder. I say Mozilla's built-in bayes approach is better because these messages don't get misfiled in the first place.
Oh, and people I could never expect to set/maintain filters can intuitively 'click' the spam away. That's my favorite advantage to my way.
Re:A method for removing spam from your life. by BCoates · 2004-01-14 04:13 · Score: 1

POPFile, the spam-filter software I use, has support for that built-in, you can specify manual mail-filter rules that will be applied instead of the bayes filter for matching messages.

I don't use it, though, as the regular filter seems to be doing an acceptable job without manual intervention.

--
Benjamin Coates
Re:A method for removing spam from your life. by pjt33 · 2004-01-14 06:15 · Score: 1

I use three filters (plus whatever my ISP does). I reject any e-mail containing:
X-RBL-Warning: (bl.spamcop.net) Blocked
a href
img
I've had one false positive and no false negatives in over 6 months.
Re:A method for removing spam from your life. by fishbowl · 2004-01-14 09:10 · Score: 1

>Or is that unneighborly

Last time I had my own sendmail, it had vast swaths of addresses filtered out at the firewall.

Yes, it's unneighborly, in the sense that my neighbor is a crackhouse or something and I keep my doors and windows locked.

I'd do it again, only much more aggressively, if only my ISP allowed it.

--
-fb Everything not expressly forbidden is now mandatory.
Re:A method for removing spam from your life. by fishbowl · 2004-01-14 09:12 · Score: 1

In order for you to "reject" this mail, you must receive it. By the time I've downloaded it and copied it to my local host, I don't really *CARE* if it gets filtered or not, because I consider that the damage has already been done. Ok, it's helpful that it gets deleted automatically by a filtering program, but that does NOT undo the damage as far as I'm concerned.

--
-fb Everything not expressly forbidden is now mandatory.
Re:A method for removing spam from your life. by crazyphilman · 2004-01-22 17:01 · Score: 1

Well, I DID say it was old-school. I mentioned it because it's such a good solution, and everyone seems to have blown it off for Bayesian filtering, which I think is sad, like a kindly old man forgotten, and abandoned to eat cat food, die and be devoured by his pet dog.

--
Farewell! It's been a fine buncha years!
Re:A method for removing spam from your life. by crazyphilman · 2004-01-22 17:07 · Score: 1

It takes all kinds, right? Just like they say in Perl, "there's more than one way to do it".

I personally prefer whitelisting because I don't have to worry about training the filter, which I find to be a little more of a pain than you do. I figure, one quick scan through the spam folder and I'm done. Note that you STILL have to scan the spam folder in case anything got nabbed as a false positive. So once my filters are in place, the amount of work going ahead in time is about the same.

Anyway, I really only have about five filters. And, they're easy to build; just a bunch of rules applied to individual email addresses; easy! 'Course, I'm a techie, and easy is relative...

I'm not saying your approach is no good; I'm just stating my preference. I'm not crazy about Bayesian filtering, but that doesn't mean I don't respect those who are.

Nice post, BTW!

--
Farewell! It's been a fine buncha years!

OE's fixed filters by Nucleon500 · 2004-01-13 14:51 · Score: 1

A while ago, I followed this link from someone's journal. The upshot is, Outlook 2003's spam filter consists of a data file set of (word, weight) pairs. They've been md5summed to obscure them, but a simple dictionary attack can retrieve 80% of the matches.

Since the weights are fixed, it would be trivial to include random hammy words to get past it. It's a good example of the failure of security by obscurity - it wasn't difficult to reverse engineer the word list, and once the secret's out, it's easily exploited.
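
The dictionary attack described above is easy to picture in code. The sketch below assumes the data has already been read into a mapping from MD5 hex digest to weight; that layout, the sample words, and the function names are hypothetical, since the actual file format isn't described here.

```python
import hashlib


def md5_hex(word):
    """MD5 of a word, as a lowercase hex string."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def recover_words(hashed_weights, dictionary):
    """Map each recoverable word to its weight by hashing candidate words."""
    lookup = {md5_hex(word): word for word in dictionary}
    return {
        lookup[digest]: weight
        for digest, weight in hashed_weights.items()
        if digest in lookup
    }
```

Hashing a whole dictionary takes seconds, so hashing the word list hides nothing once someone thinks to try it. Words not in the dictionary simply stay unrecovered.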

I don't think the random words are likely to work on real, adaptive spam filters, though. At least not on the one I use.

--
Litigious bastards

Simple trick that is semi-efficient by tomstdenis · 2004-01-13 14:52 · Score: 4, Interesting

Just block the domain name/ip of the hosted images. Most spams I get come from random IPs but usually have common IP/domain name for the hosted images e.g.

hostz300001.com/ads/viagra.jpg

Or whatever. I've cut down from 50 spams to about 3 or so a day by doing that.
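
As a rough sketch of the same trick outside a mail server, the check below pulls hostnames out of any URLs in the message body and compares them with a blocklist. The regular expression is a simplification, the function name is made up, and the single blocked host is just the example from above.

```python
import re

# Deliberately loose: grabs the host part of any http(s) URL in the text.
URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)

BLOCKED_HOSTS = {"hostz300001.com"}  # example host from the comment above


def links_to_blocked_host(body, blocked=BLOCKED_HOSTS):
    """True if any URL in the body points at a blocked host or a subdomain of one."""
    for host in URL_HOST_RE.findall(body):
        host = host.lower().rstrip(".")
        if any(host == bad or host.endswith("." + bad) for bad in blocked):
            return True
    return False
```

The sending IPs change with every run, but the image host is the one thing the spammer has to keep stable for the ad to render, which is why this works as well as it does.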

I bet a bayesian filter would work nicer but unfortunately I'm too lazy to mod the mail setup [that isn't mine] to get one installed..

Tom

--
Someday, I'll have a real sig.

Re:Simple trick that is semi-efficient by Brent+Nordquist · 2004-01-14 04:23 · Score: 1

Have a look at the BigEvil list, a custom ruleset for SpamAssassin that has tons of bad domain names.

--
Brent J. Nordquist N0BJN

Citibank can't spell by shanen · 2004-01-13 14:52 · Score: 1

I got one of these yesterday that was supposed to be from Citibank. Practically every other word was garbled. However, based on my experiences with the incompetence of Citibank, it could have been from them. However, asking for my cash card information and PIN was a bit much, even for them. Also, it was in my spamtrap Yahoo address. Yahoo has the worst spam filtering of all of my email routings.

Actually, since I did have a Citibank account a while ago, and since some of the details of the spam did match Citibank business procedures, I actually wonder whether their account information may have been compromised. Hopefully, it was just a random fishing expedition, though I'm certain the ethical aspects and legal would not worry the spammers.

(On the Citibank topic, it REALLY did take me about 6 months of hard effort to get all my money out of Citibank and get all of my accounts closed. An amazing experience filled with quotes like "but we can't give you cash unless you pay extra" and "we don't offer that service today". AFaIK, they only overcharged me one time on one of the Euro transactions...)

--
Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.

I know how to filter these... by Anonymous Coward · 2004-01-13 14:54 · Score: 0

...but if I told you, I'd have to kill you.

Seriously, there's a bit of an arms race going on, and filtering is one place where the open source approach often serves the enemy. Once they see their weakness, they find another way around. The best spam filters these days are of the every-man-for-himself variety.

I see a new web site being established... by TheSHAD0W · 2004-01-13 14:55 · Score: 1

http://www.spamulator.com

But can your filter do this? by Anonymous Coward · 2004-01-13 14:55 · Score: 0

I personally use a spam filter of my own design which is based on information-theoretic and neural network techniques. It kicks the shit out of spam...

That's nothing... I use a filter of MY design that not only kicks spam, but hog ties it and notifies the appropriate authorities, sends a virus to the spammer's computer that causes it to spontaneously combust, and composes a politely worded letter to the spammer's mother informing her of her child's inappropriate behaviour...

Misunderstanding I see by mjprobst · 2004-01-13 14:56 · Score: 1

I think people are using "random" to describe these attacks, when they're in reality not at all random.

There is one kind I call a "vocabulary" attack which uses words selected pseudo-randomly from a dictionary.

There's another one I just call "misrepresentation". It includes key phrases and sentences from a specific type of literature, say political activism or charities.

There are indeed attacks I classify as "random" that just spew forth strings of random characters.

The danger of any of these is that eventually the pools of spam and non-spam weights will get confused. In theory one can go back to the junk box and correct false positives, which would force the filter to start disregarding anything common to the false positive and desired emails. However, I can't be alone in that my junk box gets so massive that I just don't have time to do this regularly, and I can't go through every last message to separate real spam from annoying but requested commercial mail that I might want to hear from again.

Fragmentation of Email into useless subsets of the whole network is where this is going. Like another poster, I have had to resort to using a whitelist for lots of my work. But I do have need to field unsolicited mail from people I haven't met, so that isn't a real solution. The only real solution, I fear, is to remove the part of the brain that makes humans selfish even to the point of destroying the systems that give them a free ride in the short term.

Am I the only one who sees this as a good thing? by fader · 2004-01-13 14:57 · Score: 1

I'm glad that the spammers are fighting back against the filters. Because then the filters will become better. And the spammers will become smarter, and the filters will become even more sophisticated. And so on and so on.

Eventually, we'll end up with filters so sophisticated that they'll become true AI! Finally, HAL will become possible, all thanks to your friendly neighborhood spammer! Thanks, spammer, you're a dear, dear friend.

--
- fader

Bigger beavers are the very reason for enlargement by tepples · 2004-01-13 14:57 · Score: 5, Funny

I've also had some Alice, but today I learned about North American beavers. I had no idea they were so large.

That's exactly why you need to ENL4R9E `/U0R P3N1S!!!1!1 because North American women have 1arqer beavers and thus require a bigegr PE/\/i5 to st!mu1ate them.

I can tell how well I've been moderated by KalvinB · 2004-01-13 14:57 · Score: 0, Offtopic

by the number of hits to the site linked in my sig.

Ben

--
Work Safe Porn

I keep praying for that silver bullet by The+I+Shing · 2004-01-13 14:58 · Score: 2, Interesting

I keep praying for that silver bullet that will end spam forever.

The thing that seems so insane about spam is that it's gotten to the point where apparently all spammers care about is getting past your filters. They must know that you're going to delete the message the moment you physically set eyes on the word "\/1A6RA," but it's as if they don't care. They just want to induce you to look at the word, and force you to hit the Junk Mail button or Delete key. They just want to waste your time filling your Inbox with their insane crap.

It's like they're nasty little demons spitting up madness from the bowels of hell for the pleasure of their horned master. I can't picture a spammer as a human being at all... I always imagine hooves and a pointy tail, a slimy, crooked red finger pushing its sharp, black, malevolent fingernail into an eagerly pulsating "SEND" button.

Read any interviews with these people? My god, they really are monstrous. The arrogance, the pomposity, and the self-justification spewing from each of their mouths combine to form a portrait of a person so utterly bereft of morals, ethics, or humanity that I just want to clip the spammer's photo out of the magazine, scan it, and send it to X-Wipes to be made into toilet paper. I'll let you imagine the rest.

I've said it before and I'll say it again... spammers have done more than their share in turning the wonderful information highway into a sleazy backalley of filth, perversion, and fraud. Every day as I wait for my email client to download and process the two hundred or so spam messages that are clogging up my inbox, I sit in silent hope, praying that someone will find a way to end the madness at the source, and cut the spammers out of our lives forever and ever, amen.

--
You are in error. No-one is screaming. Thank you for your cooperation.

Re:I keep praying for that silver bullet by slappyjack · 2004-01-13 15:11 · Score: 1

I keep praying for that silver bullet that will end spam forever.

A silver bullet IS the silver bullet.

We just gotta find enough guys authorized to fire them into spammers' brainpans and BOOM! after a while, very little spam. Hell, I'd even be willing to go through a training program if they wanted me to.

Yes, spam should be classified as a capital crime.

i'm just sayin'.

--
s'wut i sed.
Re:I keep praying for that silver bullet by The+I+Shing · 2004-01-13 16:10 · Score: 1

Yes, painful death to all spammers, in the figurative sense.

--
You are in error. No-one is screaming. Thank you for your cooperation.
Re:I keep praying for that silver bullet by Steve+B · 2004-01-13 17:57 · Score: 2, Insightful

I keep praying for that silver bullet that will end spam forever.
What it will take is the enforcement of existing computer-cracking laws. Spammers will then have a choice between 5-10 year sentences or sending spam with no munged words, forged headers, misleading subject lines, etc.

--
/. If the government wants us to respect the law, it should set a better example.

Two Words! by Lobo_Louie · 2004-01-13 14:59 · Score: 1

Mortgage Enlargement!

More like 'Finnegans Wake' by tepples · 2004-01-13 15:00 · Score: 1

Reminds me of a Dr. Seus book...

Seuss, or Joyce?

Re:As if spam wasn't a big enough waste of bandwid by fermion · 2004-01-13 15:02 · Score: 2, Interesting

This is another subtle feature of modern email that allows spam to propagate: the HTML/RTF mail. Many mailers now default to the HTML setting. This is to allow lusers to put in obnoxious color schemes and use every font on their computer. It reminds me of 15 years ago when we were first doing desktop publishing.

The real benefit is to the spammers. They can put inline images that make the email look like it came from a legitimate company, they can have the text version look random, but the HTML rendered version human readable. Almost all spam is going to be HTML, and my experience is that 95% of HTML mail is spam.

Which means that if we filtered HTML most spam would go away overnight, and the bandwidth wasted by the remainder would be significantly reduced. We would also significantly reduce the security risks. Unfortunately the lusers that use services such as Yahoo! would also be filtered. I wonder if the decision to default to HTML is purely to satisfy the general customer, or a feature targeted directly to facilitate advertising.

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black

Are these words designed to taunt echelon? by Anonymous Coward · 2004-01-13 15:02 · Score: 0

I've been regularly receiving spam containing words seemingly designed to taunt Echelon.

Here is an example:

metcalf executor cancerous guatemala emblematic parliament colonel saratoga auric lazybones astonish cabinetmake diatribe middleweight remorseful anharmonic
aztec codomain kulak grownup jumble silk buffalo kill ignition cubbyhole circus colonist calamitous creamy customary polarogram harvest equipping grandnephew andrea sachem inquisitor flout cowan fleet juridic sherbet collage apathetic proud familism histidine pomona arcadia galveston guillemot
fishmonger agrimony anabel persimmon aileron fitzroy epimorphism hale proper corpse paula convivial bakhtiari flounder renovate bleeker bump edgy ensemble police geoduck merchandise ellison hospice propel resolve citric floorboard
brouhaha hitchcock ilona midas captor evict indestructible adventure confront despoil barony executor periscope client shove madman horde merrill radiochemical generous
impassable khaki globe compendia copyright brooklyn pleiades charles painful airfield econometric church bacterium sainthood chard hazard inbred debtor rankine dadaism executor alistair apocryphal bergman bootstrapped grub
inadequacy homework caine audubon contemplate dorset eleazar corny raritan ozark insecticide leo monomer hearst catenate bloodshed enrico abash expurgate elicit cambric lise gadfly scruple adore guano drunk cessation conscience grantee bedbug burt
hessian dyeing equilibria everlasting cork crud camellia forklift breathe ingenious catchup bless aluminate fluoride hypoactive diagonal cosponsor dadaism bernadine chide edematous phil occasion antennae l insurance
adsorptive armada passionate phosphide cabdriver cordage congresswoman arden crocus cookery
gnomonic creamy pediatrician inert senior retardation cosmopolitan input bound necrotic flipflop du annex albacore linseed alphanumeric mollycoddle kennan adrenal sheffield giuseppe budweiser
huff partner descriptive riggs cezanne dogwood councilwoman had amend holystone arsenic activism carbonic conflagration inferno madcap infertile glissade deneb malnourished chapter corpus pasadena ingersoll gauche mozart antecedent persevere keypunch negligible galvanism prometheus realty broadside detail articulatory gloomy forensic dilemma

Word Salad by JohnGrahamCumming · 2004-01-13 15:03 · Score: 2, Interesting

Weird. I am talking about this at the MIT Spam Conference on Friday and on a technique that can break a Bayesian spam filter.

John.

F*R*E*E H*E*R*B*A*L V*I*A*G*R*A by Anonymous Coward · 2004-01-13 15:03 · Score: 0

Oh freddled gruntbuggly,
Thy micturations are to me
As plurdled gabbleblotchits
On a lurgid bee.
Groop, I implore thee, my foonting turlingdromes
And hooptiously drangle me
with crinkly binglewurdles,
Otherwise I will rend thee in the gobberwarts with my blurglecruncheon
See if I don't.

Easy way to filter this out by Orion+Blastar · 2004-01-13 15:03 · Score: 1

Create a spam filter that uses a dictionary that words can be added to. If more than a certain percentage of the words in a message are unknown, consider it possible spam. Try to include every possible word of the language in the dictionary.
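A minimal sketch of that idea, assuming a plain in-memory word set and a 50% threshold (both are illustrative choices, not something from the post; the class and method names are hypothetical):

```python
import re


class DictionaryFilter:
    """Flags messages in which too many words are missing from a word list."""

    def __init__(self, words, threshold=0.5):
        # threshold is an assumed cut-off: fraction of unknown words
        # above which a message is treated as possible spam.
        self.words = {w.lower() for w in words}
        self.threshold = threshold

    def add_word(self, word):
        """Teach the filter a new legitimate word."""
        self.words.add(word.lower())

    def unknown_fraction(self, text):
        """Return the fraction of words in text that are not in the dictionary."""
        tokens = re.findall(r"[a-z']+", text.lower())
        if not tokens:
            return 0.0
        unknown = sum(1 for t in tokens if t not in self.words)
        return unknown / len(tokens)

    def is_possible_spam(self, text):
        return self.unknown_fraction(text) > self.threshold
```

In practice the word list would have to be large, and it would need proper names and jargon added over time, or legitimate mail would be flagged too. Note that this only catches made-up words; a word salad of real dictionary words would pass.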

--
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.

MailScanner by Anonymous Coward · 2004-01-13 15:03 · Score: 0

MailScanner is a spam filter.

It's free, easy to set up and works really well.
The maintainer is really helpful too. MailScanner website

Proposed Solution by Anonymous Coward · 2004-01-13 15:04 · Score: 1, Insightful

I've wondered why Bayesian filtering doesn't also include word pairs as input. Doing so would make gibberish and actual language easier to distinguish, since using pairs (or even triples if absolutely necessary) preserves some of the word-order statistics for the Bayesian filter to key off. Also, lots of spam now separates letters with spaces or punctuation to fool filters that key off words. Using pairs would identify these types of spam easily, since the bulk of legitimate mail won't have token pairs like "v-i" "i-a" "a-g" "g-r" and "r-a".
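As a rough sketch of what that tokenizer could look like (the function name and the splitting rule are assumptions for illustration, not how Mozilla or any particular filter does it): emit every single token plus every adjacent pair, and feed the whole list to the Bayesian classifier as features.

```python
import re


def tokens_with_pairs(text):
    """Return single tokens followed by adjacent token pairs.

    Tokens are runs of letters, digits, or apostrophes, so a word split
    up with punctuation ("v-i-a-g-r-a") turns into one-letter tokens.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs
```

With this, "v-i-a-g-r-a" yields pair features such as "v i" and "g r", which should be rare in legitimate mail. The cost is a much larger token database, since the number of distinct pairs grows far faster than the number of distinct words.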

Another input I wish Mozilla (or other Bayesian filtering systems) would include is a dictionary look-up on words, with the resulting statistics for the message fed into the filter. For instance, a message where more than 60% of the words don't match my English dictionary (so fewer than 40% do) is most likely spam in my mailbox. This additional stat would give those filters more power.

So I wonder... Would adding these things to existing Bayesian filtering systems solve this issue to some degree? My gut instinct is that it would.

extreme solution? Block all msgs that contain http by Anonymous Coward · 2004-01-13 15:05 · Score: 0

I've found that by filtering on "http://" I can kill basically ALL spam, since it's always links to some site or other.

of course, this isn't so great for getting links from friends, for that I have a whitelist.
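Roughly, the rule amounts to something like this sketch (the function name, folder names, and addresses are made up for illustration):

```python
def classify(sender, body, whitelist):
    """Whitelisted senders always get through; anyone else sending a link is binned."""
    if sender.lower() in whitelist:
        return "inbox"
    if "http://" in body.lower():
        return "spam"
    return "inbox"


# Hypothetical whitelist of friends who are allowed to send links.
friends = {"alice@example.org"}
print(classify("alice@example.org", "look: http://example.org/cat.jpg", friends))
print(classify("stranger@example.net", "click http://example.net/pills", friends))
```

The whitelist check has to come first, otherwise a friend's link would be binned before the whitelist is ever consulted.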

It's SO gibberish by ackthpt · 2004-01-13 15:05 · Score: 1

It's SO gibberish it automatically gets deleted. I wonder what the strategy behind that is. Same goes for using alternate characters, anything outside ascii 32 to 126 goes in the bin.

I guess obfuscating their own message so much to foil spam filters has caught up with them, as their message is lost in their methods.
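For what it's worth, the character test is a one-liner. This is only a sketch: the function name is made up, and letting tabs and line breaks through is my assumption, since taken literally "32 to 126" would also bin every message with more than one line.

```python
def outside_printable_ascii(text, allowed="\r\n\t"):
    """True if text contains any character outside ASCII 32..126.

    Characters listed in `allowed` (tab and line breaks by default)
    are exempt from the check.
    """
    return any(
        not (32 <= ord(ch) <= 126) and ch not in allowed
        for ch in text
    )
```

Any accented letter trips the check, so this only makes sense for mailboxes that never receive legitimate non-ASCII mail.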

--

A feeling of having made the same mistake before: Deja Foobar

Re:It's SO gibberish by Anonymous Coward · 2004-01-13 23:11 · Score: 0

Damn, most people prefer it when others put their writing in paragraph form so it's readable but if you really think e-mail is better all on one line...
Re:It's SO gibberish by You're+All+Wrong · 2004-01-14 01:01 · Score: 1

"anything outside ascii 32 to 126 goes in the bin"

Which is useless if your mates have names like Torbjorn Velen.

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.
Re:It's SO gibberish by You're+All+Wrong · 2004-01-14 01:15 · Score: 1

Why did slashcode remove the diaeresis from Torbjorn's 2nd o, and the acute from his second e?

Anyway, I get loads of mails with valid non-ASCII.

The world's bigger than just the English-speaking parts of the USA, you know, se\~nor.

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.
Re:It's SO gibberish by B'Trey · 2004-01-14 01:50 · Score: 2, Informative

Certainly it is. And for those who use high-ASCII or UNICODE, it isn't a valid technique. That doesn't mean that it isn't a valid technique for the millions of people who don't use anything outside the normal ASCII characters.

I use POPFile, which is a Perl Bayesian filter. It works quite well even with spam which includes garbled words. I haven't tried playing with it yet, but it seems like it would be relatively straightforward to check for the number of words which are not already in its dictionary. After the initial training, an email with more than a few new words is highly likely to be garbled spam (or from someone who received a new thesaurus for Christmas).

--
"The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.
Re:It's SO gibberish by mwood · 2004-01-14 05:01 · Score: 1

Indeed I've played with the idea of building a filter that gives thumbs-down to any message whose Subject: contains more than two consecutive words not listed in the dictionary.

Of course the misconfigured ones that send out "Subject: %RANDOMWORD(10) blah blah blah" are just too easy to be any fun!
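Something like the following sketch would do it, assuming the dictionary is just a set of lowercase words (the function names and the tokenizing rule are illustrative):

```python
import re


def longest_unknown_run(subject, dictionary):
    """Length of the longest run of consecutive words not in the dictionary."""
    longest = current = 0
    for word in re.findall(r"[a-z']+", subject.lower()):
        if word in dictionary:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def thumbs_down(subject, dictionary, max_run=2):
    """Reject subjects with more than max_run consecutive unknown words."""
    return longest_unknown_run(subject, dictionary) > max_run
```

Proper names would count as unknown words unless they are added to the dictionary, which is one reason to allow a run of two before rejecting.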
Re:It's SO gibberish by helphand · 2004-01-14 16:54 · Score: 1

I haven't tried playing with it yet, but it seems like it would be relatively straightforward to check for the number of words which are not already in its dictionary. After the initial training, an email with more than a few new words is highly likely to be garbled spam (or from someone who received a new Thesaurus for Christmas.)

I did that: I hacked my POPFile to track the word count per email and the number of words that were found in the corpus. Over 5377 messages to date, there are definite characteristics for ham (heavily clustered at 80%+ of the words being in the corpus) and spam (80% of the spam has less than 80% of its words in the corpus). I also noted that virtually all spam had fewer than 500 words in the message, so I could confidently predict that a message was ham if it contained more than 500 words. That was unexpected.

Scott
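This is not the POPFile hack itself, just a standalone Python sketch of the two measurements described above. The function names are made up, the corpus is assumed to be a plain set of words, and the 500-word and 80% cut-offs are simply the figures quoted in this post.

```python
import re


def message_stats(text, corpus_words):
    """Return (word_count, fraction of words already present in the corpus)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0, 0.0
    known = sum(1 for w in words if w in corpus_words)
    return len(words), known / len(words)


def rough_guess(text, corpus_words, long_message=500, ham_coverage=0.8):
    """Very rough hint based only on message length and corpus coverage."""
    count, coverage = message_stats(text, corpus_words)
    if count > long_message:
        return "ham"  # virtually no spam was this long in the sample
    return "ham" if coverage >= ham_coverage else "spam"
```

Since about a fifth of the spam still lands above the 80% line, a guess like this would work better as one extra input to the Bayesian score than as a verdict on its own.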

--
If they can make penicillin out of moldy bread, they can sure make something out of you. -- Muhammad Ali
Re:It's SO gibberish by B'Trey · 2004-01-15 01:14 · Score: 1

Cool. Got a patch?

--
"The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.
Re:It's SO gibberish by helphand · 2004-01-15 17:02 · Score: 1

No, easy enough to make one, but won't do you any good unless you are on the CVS version of POPFile. The release version (v 0.20.1) still uses BDB, the next release (CVS v 0.21.0) switches to SQL. If you're interested, post in Bleeding Edge - Source Code once you're up on the SQL CVS version and I'll put a patch together.

--
If they can make penicillin out of moldy bread, they can sure make something out of you. -- Muhammad Ali

Re:So, we really should be spell checking e-mail.. by Anonymous Coward · 2004-01-13 15:07 · Score: 0

Then again, we wouldn't be able to communicate with other Slashdot users.

You say that like it's a bad thing...

How I deal with spam by mabu · 2004-01-13 15:08 · Score: 2, Interesting

I have had my main e-mail published and unchanged since 1995. It's probably on 99% of all spam mailing lists. One of my servers handles about 600 POP3 accounts. My stats currently indicate that now more than 80% of our SMTP traffic is confirmed spam.

I don't believe in content-based filtering. We have a strict policy of not examining in any way, shape, or form, the content of any e-mail on our network.

We deal with spam by implementing an array of fully-tested, fairly conservative relay blacklists which block the inbound SMTP connection before the junk mail is even transmitted.
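For readers who haven't seen how an RBL check works, here is a sketch of the usual DNS convention: reverse the octets of the connecting IPv4 address, append the blacklist's zone, and look the name up; if it resolves, the address is listed. The zone name below is a placeholder and the function names are made up. In a real setup the mail server does this itself at connection time.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the blacklist zone returns any address record for this IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # includes socket.gaierror: no record means not listed
        return False


# "rbl.example.org" is a placeholder zone, not a real blacklist.
print(dnsbl_query_name("192.0.2.10", "rbl.example.org"))
```

Because the check needs only the connecting IP, the server can refuse the connection before any message data is transferred, which is where the bandwidth saving comes from.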

In more than two years of operation, we've only confirmed about six legitimate e-mails that were blocked, and we handle tremendous mail volume. It's an easy matter to "whitelist" anyone who might end up getting RBL'd to make sure the client can communicate with who they want. In EVERY case where a legitimate source was blacklisted, it was shown their ISP was irresponsible and the listing was valid.

In addition to using RBLs, we also have an array of hard-coded IP blocks that our server will not accept mail from. This covers a good bit of the rogue Asia-Pacific ISPs that are the largest source of open relays. Something as simple as blocking major portions of 61.* has been shown to reduce spam by 30+%. Anyone legitimately in China that needs to communicate with our network can be quickly whitelisted. Ironically, most of the ISP SMTP relays are not near the same broadband IP ranges - they obviously know how effective this technique is.
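In sketch form, the check looks like the following. The network list and the whitelisted address are placeholders for illustration: the whole of 61.0.0.0/8 is used here for simplicity, whereas the blocks described above cover only portions of it.

```python
import ipaddress

# Assumed example data, not the actual block list.
BLOCKED_NETWORKS = [ipaddress.ip_network("61.0.0.0/8")]
WHITELISTED_HOSTS = {ipaddress.ip_address("61.1.2.3")}


def accept_connection(ip):
    """Whitelist wins; otherwise refuse anything inside a blocked network."""
    addr = ipaddress.ip_address(ip)
    if addr in WHITELISTED_HOSTS:
        return True
    return not any(addr in net for net in BLOCKED_NETWORKS)
```

Checking the whitelist before the block list is what lets a single legitimate sender inside a blocked range be let through without reopening the whole range.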

With RBLs and hard-coded IP blocks in effect, instead of 200 spams a day, I might get 3-5. As soon as I get new spam, I report it to Spamcop, and I notice a quick reduction in future spam of that nature immediately.

We're now getting near the point of blacklisting the entire 24.* IP block as well - which encompasses, among other things, a large portion of Comcast IP blocks that Comcast can't or won't control.

I'd like to see more ISPs simply refuse to accept mail from rogue networks. Then these networks would have to be more responsible.

Let me preface all this by saying our policy is to whitelist anyone who complains they have legitimate mail being blocked. For some strange reason, we don't hear any spammers making these requests. That's a shame because I'd be happy to visit them personally to make sure their situation is resolved in a mutually-deserving manner.

Re:How I deal with spam by vacuum_tuber · 2004-01-13 15:59 · Score: 2, Interesting

mabu wrote:

We're now getting near the point of blacklisting the entire 24.* IP block as well - which encompasses, among other things, a large portion of Comcast IP blocks that Comcast can't or won't control.

That's the real problem with blocking by IP ranges. I'm in 24.* because it's the only high-speed Internet I can get. It's not Comcast but I see tons of probes from infected machines local to me in my area of 24.*. But I'm not the only legitimate business living in a broadband network that contains tons of clueless residential subscribers. What would you have us do, get T1 lines and $3,500/mo ISP feeds? Go back to dialup? What's wrong with this picture?

I have a static IP, my own domains, and run my own Web and email servers. My site is business, has tons of information on a niche IT subject, has forums, and some growing e-commerce for parts and equipment in my niche.

If and when you block 24.*, either your users won't be able to write to me or I won't be able to reply to them, and if you follow the pattern of a lot of clueless admins out there you will also block to postmaster, so it will be impossible to let you know that you're blocking legitimate traffic.

Anyone legitimately in China that needs to communicate with our network can be quickly whitelisted.

Aside from the amusing notion of "Anyone legitimately in China" (what's the alternative -- being an illegal immigrant?), just how would a sender of legitimate email from China to a user in your network let you know that you are blocking their email? How would they let the person who can't receive their mail know that the block is preventing them from communicating?

Most of my business contacts are initiated by the OP by email, from all over the world. If someone can't reach me because I block more than I should, that person will likely never reach me and I will never get any business from them. From my business perspective that would be exceptionally stupid network management.

I filter inbound spam by whitelist and then content. I get zero false negatives in my New Mail folder at the price of having to pick up some new correspondents from the SPAM folder and whitelist them. At least that way, though, I have a folder of truly confirmed spam to send to SpamCop by script, and thanks to the recent trend of gibberish tacked onto the Subject and other highly human-recognizable signals in From and Subject visible in the folder list, I no longer have to actually open any messages to confirm they are spam. Even when I do, though, my mail client doesn't retrieve any graphics from any servers.

Not retrieving graphics doesn't save me from confirming I am here, though, because as soon as I pass the confirmed spam to one of my servers the spam is first sent to SpamCop, then all the URLs are parsed out, spammer's email addresses are substituted for all occurrences of my email address in the URLs, spammer domains are substituted for any occurrences of my domain, and scripts then download the entire spam sites, once for each URL they have sent me.

That still leaves encoded values in the URLs, which I presume contain at least a cross reference to the email address the spam was sent to, but I don't care. "Send me spam and get your site downloaded. More spam -- more downloads." Most spam is, after all, an explicit invitation to visit a spamvertised Website.

--
Look at the bright side: there's always seppuku.
Re:How I deal with spam by mabu · 2004-01-13 17:00 · Score: 2, Interesting

That's the real problem with blocking by IP ranges. I'm in 24.* because it's the only high-speed Internet I can get. It's not Comcast but I see tons of probes from infected machines local to me in my area of 24.*. But I'm not the only legitimate business living in a broadband network that contains tons of clueless residential subscribers. What would you have us do, get T1 lines and $3,500/mo ISP feeds? Go back to dialup? What's wrong with this picture?

We're not blocking all of 24.* right now because there are some people like you on that block, but if Comcast and other ISPs that are in that class A don't get their act together, you guys are likely to have problems, because I'm sure I'm not the only person that notices that net block is a never-ending source of problems.

I am also of the belief that many of these large blocks are DULs. If you have legitimate permission from your ISP to run your own servers, I'd hope they would separate you in the IP space from the DUL RBLs. If not, that's an issue your ISP should consider.

I don't have much sympathy for Comcast however. They are proving to be THE worst American ISP in terms of controlling spam.

Let me also say something.. the 2+ tier backbone providers in most cases don't have the performance of someone like Worldcom (as much as I'd like to not admit it). You can get by with less bandwidth on a higher-performing network that doesn't go through a bunch of goofy networks that don't have their act together. Shop around if you find yourself serviced by an ISP that is indiscriminate about who they do business with. There are always options.

just how would a sender of legitimate email from China to a user in your network let you know that you are blocking their email?

All relay-blacklisted e-mail is returned to the sender with an error message that redirects them to a web page with an e-mail form they can use to contact us. The only downside to this is that we have to expire the deferred mail cache more quickly than we would normally prefer, but since the server in question is just for inbound and not outbound relaying, it's not a problem.

Spamcop-RBL'd mail similarly echoes an error message to the user with a URL they can click on to actually show the spam history of the SMTP relay in question. It works very well, and best of all, it dramatically cuts down on the bandwidth that spammers consume.

Thanks for reporting to Spamcop. I really like their service too. The problem is, there are so many Asia-pacific and Comcast IPs, Spamcop isn't as effective when spammers have such a diverse array of IPs to hijack, so we've had to resort to some additional block blacklisting. It has proven to be very effective and we never leave legitimate users in the dark. If you had a mail relay in the block and tried to send me mail, you'd get a message and a quick way to contact me to have yourself authorized.
Re:How I deal with spam by kindbud · 2004-01-14 02:59 · Score: 1

With RBLs and hard-coded IP blocks in effect, instead of 200 spams a day, I might get 3-5.

Let me know when your spam volume grows to 400 per minute, and whether your system scales to that size. That's how much spam my mail servers receive.

--
Edith Keeler Must Die
Re:How I deal with spam by vacuum_tuber · 2004-01-15 04:24 · Score: 1

mabu wrote:

We're not blocking all of 24.* right now because there are some people like you on that block, but if Comcast and other ISPs that are in that class A don't get their act together, you guys are likely to have problems...

Then the Internet is likely to have problems, because the number of businesses connected by cable is on the rise for a very good reason: reasonable rates for good bandwidth, fast install, and better reliability than a lot of xDSL customers whose horror stories abound.

If you have legitimate permission from your ISP to run your own servers, I'd hope they would separate you in the IP space from the DUL RBLs. If not, that's an issue your ISP should consider.

(As opposed to what? Illegitimate permission?) I pay for business class cable service with a static IP. There is no question that I can run my own servers, and I make no use whatsoever of the provider's Web, email or other servers except for occasional use of USENET.
As near as I can tell, this cable provider uses IP addresses that reflect their network topology, which is strongly geographic down through the neighborhood level. I don't expect that to change anytime soon.

You can get by with less bandwidth on a higher-performing network that doesn't go through a bunch of goofy networks that don't have their act together. Shop around if you find yourself serviced by an ISP that is indiscriminate about who they do business with. There are always options.

I can't get by with less bandwidth -- I need more as it is. Unfortunately the basic business rate is seriously asymmetrical, which is bad for servers. Also, I don't need an ISP in the traditional sense. I just need "Internet dial tone" -- packet routing to/from the Internet. I live on the fringe of a larger city, at the edge of suburban sprawl where in one direction things are highly built up and in the other direction there are farms. There is no reason why decent DSL shouldn't be available here but the telco has still not installed DSLAMs outside the COs around here, so even well within the heavily built-up border of the suburban sprawl only those fortunate enough to be close to a CO can even get DSL.
Before I got cable I had ISDN and then tried to go down the iDSL route with Verio and [Northpoint?]. In the end [Northpoint] screwed up in its death agonies and Verio accepted some nitwit's determination that my location is too far from the CO to do iDSL, even though I had working ISDN at the time, which is the same signalling on the line, and even though I could plug the iDSL modem into the ISDN line and get a green signal indication. It was a complete bust, convinced me of how brain dead Verio was, and cost months of lost calendar time filled with ultimately useless phone calls and emails.
When I ordered business class cable Internet from Time Warner Cable it was installed in three days and delivered 2 Mbits down and 384 Kbits up. The one long outage (60 hrs) I've experienced due to cable modem failure and the several briefer outages have been nothing compared to what people and businesses I know who use DSL have experienced. For example a hotel known to me with telco-provided DSL has experienced multiple outages lasting a week or more.

Shop around...

There's no "shopping around" to be done here. The only viable and affordable connectivity here is cable. Some day, if I get within range, I may install DSL as a backup. Meanwhile while the telcos and their DSL resellers try to see how many thumbs they can jam up their rears, the whole country is getting connected via cable. Determination by IP block of what can or should be blocked is becoming diluted and less effective every day. If you want to black out more and more of the Internet to your users, that will ultimately be your problem. I understand that running a large operation th

--
Look at the bright side: there's always seppuku.
Re:How I deal with spam by mabu · 2004-01-15 08:30 · Score: 1

I just began reporting to SpamCop to see how it would go and to try to increase the level of difficulty for spammers. It's SpamCop's reporting to ISPs and upstream providers that interests me since it may get spammers shut down in some cases, forcing them to keep moving, which increases their workload and costs. Even SpamCop advises that their block list should not be used to block email in any kind of production environment, though.

I've given up on reporting the ISPs hosting spamvertised web sites. These people know exactly what they're doing and have no intention of changing. However, reporting to uplink ISPs who are being exploited via SMTP is another matter.

The disclaimer on Spamcop's RBL is there to cover all the bases. It's an excellent RBL.. Even AOL uses it. But like any strategy for network security, you can't just set-it-and-forget-it and expect everything to work perfectly, which is why a good admin always monitors what's going on and constantly adapts.

I download spamvertised Websites to carry back a cost component to the beneficiaries of the spam I receive.

I wish I could say I thought that was a good idea, but I think it's doubly-wasteful. The bandwidth you might take from them is marginal at best, and the reality is that many spammers are stealing web hosting as well (they hack into AOL member pages, they set up temporary free web space, etc.) so all you end up doing is wasting your own bandwidth, and the bandwidth of other innocent parties.

There are only two effective approaches to solving the spam problem: 1. Get enforcement bodies to start enforcing the existing laws spammers break, and 2. Push the spammers into a corner so they cannot operate except in limited areas of cyberspace. The main issue with #2 is the source of the spam, not the source of the spamvertised web site. That's where the IP blocking becomes a very effective tactic.

The problem with pushing spammers into a corner is that you don't have many ISPs with much incentive to police the illegal/unethical activities of their users, and you have administrators who don't take the steps to stop their spamming customers even when the activities are in violation of their TOS. The solution: shut them all out and force the admins to get their act together.

A good example of this strategy and how effective it is can be demonstrated when you look back a few years at the proliferation of open relays and the automated testing system that forced tens of thousands of mail servers to be blacklisted because their relays were open. I was furious when this first happened, and it forced me to make changes to the way I handled clients in order to avoid being blacklisted. I hated it at first but it forced me to run a more secure network. Had I not been blacklisted, I wouldn't have made tightening this up a priority. Now 99.9% of all mail relays on the internet are closed systems. The main reason for this was blacklisting. It works. And it especially works when you start shutting people down on a quantum level if they can't manage their resources properly.

As for taking offense of innocent parties being "caught in the crossfire" of IP blocking, keep in mind that's part of the spammer's M.O. They're like terrorists, who mingle with regular people via forging headers and trying to appear legitimate. They create collateral damage by their very nature that is unavoidable until you can bring them out in the open. There's no way to get around that unless they can be pushed into a corner, and that process will always involve innocent people getting caught in the middle. However, one of the problems that makes this issue worse, is the apathy and ignorance of people caught in the middle, so sometimes something like IP blocking is a force for good, motivating people to act, to change their ISP or complain when they would normally blow it off and thus contribute to the problem.

I use that method by KalvinB · 2004-01-13 15:08 · Score: 2, Informative

includes sourcecode

Mercury Mail's session logs indicate a closed connection to indicate where e-mails begin and end but if you're using something else there's a RinetD mod with source which logs e-mails in such a way so that ripping through them is easy.

My filter is all of 23KB and I get virtually no spam. I update every once in awhile when a spam gets through.

I also have a couple sub-domains that point to a spamcan on my home connection which I use to bait spammers so I can preemptively filter them out without paying for the bandwidth.

Ben

--
Work Safe Porn

It won't really work against Bayesian filters by SWroclawski · 2004-01-13 15:09 · Score: 1

First, a number of large sites are using Bayesian filters now, such as AOL and MSN. More will follow soon.

But will gibberish, or even something like Alice in Wonderland really make a difference? No.

The term for that "stuff" is noise.

We have years of research on signal-to-noise problems. There are plenty of ways to find the noise in a signal, and then apply the filter to that. A lot of that noise is already filtered out when one applies HTML filters to it; dehtmlizing, or HTML -> text conversion, often does the job of reconstructing the message. Gibberish characters add nothing to the spam score and anything else can be addressed as above.

Even with the gibberish words though, an old version of Bogofilter's still giving me very good spam filtering. I get some 10-20 spam a day, and I see one in my inbox every 2-3 days. I see a false positive in my spam folder maybe once every two or three months.

It doesn't seem to be effective at much. I am not really worried about it breaking our spam filters. Not yet.

- Serge

Theory Explored by rudy_wayne · 2004-01-13 15:09 · Score: 1

"I have a baseless theory that the sole purpose of spam is to sell lists to other spammers, who sell lists to other spammers etc. There is no product behind them any more: it is like pyramid marketing."

Here's what I've been thinking about lately: Do spammers actually make any money from spamming? Seriously -- I'm starting to wonder if there's something different at work here.

Because e-mail is so cheap that it costs practically nothing to send a million spam e-mails, are spammers spewing their crap (and ignoring the near zero response rate) in hope that some day the money will start rolling in?

Think about it -- every week millions of people plunk down a few dollars for lottery tickets. And even though they never win anything, they keep buying, week after week, month after month, year after year. Why? Because it's such a small amount of money that they figure it's a small price to pay for the chance to win millions.

I'm beginning to think this is the same mentality that is driving spammers.

Gibberish in spam by dakryx · 2004-01-13 15:10 · Score: 1

I think anyone willing to buy something from spam would be turned off once they saw a bunch of misspelt words. I know that if I were about to buy something off a website and saw a ton of misspelt words, it would set off red flags in my head.

Gibberish? Try gzip by bigberk · 2004-01-13 15:12 · Score: 1

Here's something weird I tried (yeah, I'll admit it... I was drunk). Gibberish is high in entropy and hence doesn't compress well.

So, you can strip out things like headers, whitespace, HTML, convert everything to lowercase, and run it all through gzip. Then take a look at the percentage the message was reduced by.

What I found was that (not surprisingly) legitimate mail with normal words in it reduced by a significantly greater % than spam with lots of gibberish in it.
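If anyone wants to repeat the experiment, here is a rough Python sketch. It uses zlib, which applies the same DEFLATE compression as gzip without the gzip file header. The function name and the clean-up rules are approximations of the steps above, and it assumes the mail headers have already been removed.

```python
import re
import zlib


def compression_savings(body):
    """Fraction by which the cleaned-up message body shrinks under DEFLATE."""
    cleaned = re.sub(r"<[^>]+>", " ", body)  # crude HTML tag removal
    cleaned = re.sub(r"\s+", " ", cleaned).strip().lower()
    raw = cleaned.encode("utf-8")
    if not raw:
        return 0.0
    return 1 - len(zlib.compress(raw, 9)) / len(raw)
```

Very short messages will score badly no matter what they contain, because the fixed compression overhead dominates, so comparing savings only makes sense between messages of similar length.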

Spam Poetry by GoogolPlexPlex · 2004-01-13 15:13 · Score: 2, Interesting

I get a lot of spams which contain 3 random words in the subject. Currently, I collect the subject lines in a text file and arrange them to make poetry. A few sample verses:

i'll take this
open window into
imflammatory tales about
pieces of herring

shooting caused panic
that surely only
constituted a prelude
or else maybe
had ever happened

Re:Spam Poetry by Anonymous Coward · 2004-01-13 15:21 · Score: 1, Funny

I look for naturally occurring haikus in three adjacent subject fields, such as...

Re: your assignment
you are the one and only
Why don't you like me?

and

His skin was jet black...
Your girlfriend is a lesbian...
Needed Equipment

(The last message being the famous time traveller guy)

I would have said Bayes will kill it, but.... by dspyder · 2004-01-13 15:15 · Score: 1

It ain't working... mainly because it has absolutely nothing to base the words on. I'm getting a reasonable amount of false negatives where the bayes score is 40-60% sure it's spam. I'm thinking of upping the SpamAssassin score for that, but it's kind of not a good solution.

I know people are working on various rules to check number of consonants and average length of garbage words... interesting chase.
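These are not actual SpamAssassin rules, just a Python sketch of the two measurements. The function names are made up, and treating "y" as a vowel is an arbitrary choice.

```python
import re

VOWELS = set("aeiouy")  # counting "y" as a vowel is an assumption


def longest_consonant_run(word):
    """Length of the longest run of consecutive consonants in a word."""
    longest = current = 0
    for ch in word.lower():
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def garbage_features(text):
    """Average word length and the worst consonant run across all words."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return {"avg_length": 0.0, "max_consonant_run": 0}
    return {
        "avg_length": sum(len(w) for w in words) / len(words),
        "max_consonant_run": max(longest_consonant_run(w) for w in words),
    }
```

Real English words like "strengths" already reach a run of five, so any threshold would have to sit above that, and neither number helps against gibberish built from real dictionary words.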

I really wonder how effective the actual spams are though. When you see garbage in your inbox do you even bother to open it? My wife honestly thought something was corrupt and just deleted the messages. I guess I don't see the point in this type of spamming (not like I entirely get the point of any other kind)...

--D

Re: Whitelists by WuphonsReach · 2004-01-13 15:16 · Score: 1

Spam will disappear when the major network providers endorse a centralized SMTP whitelist. The reason why nobody talks about it, is that it's a cure for the spamedemic and there are a lot of companies out there, including all the ISPs that profit from spam.

And who decides who gets on the whitelist? You? The government? People with lots of cash? Microsoft? AOL? Will an ISP in an axis-of-evil country be allowed to be on the whitelist? ISPs already write pink contracts to allow spammers to use their bandwidth, what makes you think cash won't change hands to get the spammers whitelisted?

Whitelists also assume that e-mail can't be forged... we're not there yet (not until reverse-MX and sender PKI signing come into play).

Centralized whitelists are too broad. Companies that might be on your whitelist are not necessarily those that I want on my whitelist. (In other words, I don't trust the people who administer whitelist X.)

On a limited, local scale, whitelisting works well because it's distributed and hacking one list doesn't get you very far. However, as you add more customers of the whitelist, you become a larger and more attractive target. (To hack a whitelist for 100 users is a waste of time, to hack a whitelist of 1,000,000 users is well worthwhile.)

--
Wolde you bothe eate your cake, and have your cake?

I Am An Anti-Business Pinko Pig by Schizoid+Genius · 2004-01-13 15:18 · Score: 2, Funny

If you've ever had an argument with a sp@mm3r, you know how self-righteous they can be. They have a right to "freedom of speech", they are just trying to run legitimate businesses, yada yada yada. And you know what? I'm beginning to think they have a point! Think about it...

First, I demand that I retain ownership of my own inbox.

Then, I take a stand against the raping of open proxies and abuse of malware-infected zombies.

N0\/\/, I have the g@11 t0 s.a.y..t.h.a.t U51NG R/@/N/D/()/M g1bber<steamboat>ish +0 @v0iD f^i.l*t,e.r\s i|s w.r.0.n.g.

Mary had a little lamb;
Its fleece was white as snow.
And everywhere its address went,
The spam was sure to flow.

My, my. What won't I do to destroy healthy, legitimate, all-American Internet commerce?

--
Please Help a Schizoid Genius!

Bayes training seems strange. by Thinkit3 · 2004-01-13 15:18 · Score: 1

It seems like it would generate a lot of false positives. If you train on computer lingo and someone writes you some poetry, won't that get booted? Perhaps it's not as good for those with eclectic activities. White-listing or challenge response would be the only ones I'd consider if I start getting too much spam.

--
-Libertarian secular transhumanist

RBL is a winning battle; Bayes is a loser by mabu · 2004-01-13 15:18 · Score: 1

Under IPv4, rogue relay blacklisting creates a substantially more-restrictive environment in which spammers can operate, as their available IP space continues to shrink. As more systems become more restrictive, they run out of places to hide. You can see light at the end of that tunnel. There is no light at the end of the tunnel with Bayesian or other content-based filtering.

There are likely exponentially fewer combinations of rogue source IP space than there are keywords in message content that can be controlled.

Content-filtering is a battle that loses over time; RBL blocking is a battle that wins over time. The only thing that would change that fact would be the additional IP space that IPv6 would introduce, which would be a complete nightmare.
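For readers who have not seen how an RBL check works, a rough sketch: DNS-based blacklists are queried by reversing the IPv4 octets and appending the list's zone, and a listed address resolves while an unlisted one does not. The zone "dnsbl.example.org" in the usage note is a placeholder, not a real list.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError on bad input
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """True if the blacklist zone returns an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record (or a failed lookup) is treated as "not listed".
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` gives `99.2.0.192.dnsbl.example.org`. The check costs one DNS query per connection and never looks at message content, which is the property the post is arguing for.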

Random Words? by LordoftheFrings · 2004-01-13 15:19 · Score: 1

I thought the first link was actually going to be Random Words like it said. Needless to say, I was disappointed by the appearance of some...article... I usually never read those...

--
Canadian Cynic, canadian politics is less boring than you

Filters don't work by flyingrobots · 2004-01-13 15:19 · Score: 1

They take too long to configure. My ISP just blocks traffic coming from open relays and addresses that are Asian. Shake, stir and add that to knowspam.net and you stop getting spam, period. Best served cold...

Feature added by Felinoid · 2004-01-13 15:20 · Score: 2, Insightful

In the past many ISPs would add filters and NOT tell the users they were doing it.
Nowadays, however, ISPs (most notably Earthlink and MSN) advertise spam blocking as a feature.
If people wanted this stuff you'd think non-filtering ISPs would advertise "You get ALL your e-mail".

But back to the original point. Spammers have used misleading topics in e-mail if only to make sure you don't delete the message. That and creating spam lists based on people who DO NOT like spam or of people who have manually opted out of spam lists.
The people who actually make money with spam don't care about selling products via spam as they sell spam services. The people who sell stuff via spam aren't making money because they are reaching markets who are wholly uninterested in buying stuff from them.

--
I don't actually exist.

The real solution to spam. by Malcontent · 2004-01-13 15:22 · Score: 1

Listen up. Here is how we can solve the spam problem once and for all.

Turn on finger. Yes you heard me. Let's re-implement finger. Here is how it works.

My SMTP server gets email from joeblow@123.com. I finger joeblow@123.com. If 123.com says joeblow is a real user I then accept the email, otherwise I can it.

Voila! No more forged headers, no more spam.

This very simple solution would also allow legitimate businesses to send spam to the people who have opted in.

--

War is necrophilia.

Re:The real solution to spam. by PacoTaco · 2004-01-13 15:48 · Score: 1

It's been done, thankfully without finger.
Re:The real solution to spam. by TPFH · 2004-01-13 16:13 · Score: 1

My SMTP server gets email from joeblow@123.com. I finger joeblow@123.com. If 123.com says joeblow is a real user I then accept the email, other wise I can it.

It would help, but it wouldn't prevent joe-jobs.
Still, if every spammer was operating via joe-jobs we could prosecute them for fraud.

If the server not only told you that joeblow@123.com existed, but there was a digital signature confirming he is the one sending the email then it would work.

If we put the public key to a digital signature in our .plan would that work? Or have we gotten to the point where most people on the internet would not understand a .plan even if we tried to explain it to them? Still, I imagine there are some people who would only want to receive email from other unix/linux users.

--
This signature used to contain a cute kitty virus with ansii art. Please set the slashdot editors on fire. Thank you
Re:The real solution to spam. by Anonymous Coward · 2004-01-13 18:51 · Score: 0

That's an interesting idea, but implementing it with finger is going to kill it for most people. I sure don't want to try to run fingerd or similar on the box that stands behind the A record for my domain. I'd much rather have this sort of thing work with a SRV lookup, so it could be on any port on any box.

It could even be on multiple systems for load balancing. That seems pointless for traditional finger uses, but consider what happens when your domain is being forged. Whatever servers happen to be listed for this service are going to get tons of requests from the outside world.

This is already happening on a small scale with a few incompatible projects. Maybe one of the "big guys" will pick it up and legitimize one of the better ones.
Re:The real solution to spam. by ElectricRook · 2004-01-13 19:15 · Score: 1

My SMTP server gets email from joeblow@123.com. I finger joeblow@123.com. If 123.com says joeblow is a real user I then accept the email, other wise I can it.
No-Way, the spammer forged in my email address into the "From" line. You should see what happens when he then sets the auto reply when message is read flag. Actually only about five people opened the message, and caused an auto reply.

--
- High Tech workers, please say NO to Union Carpenters, their Union sees fit to control our compensation.

Security Thru Obscurity by Tablizer · 2004-01-13 15:22 · Score: 1

I have decided to build my own spam filter. A slim preview mode will show keywords in context that I have deemed significant in determining if it is wanted mail or spam.

I don't expect this self-made system to be "better" than commercial filters from a technical standpoint, but it will have the advantage that spammers will not try to work around it because only me and a few relatives at the most will end up using it. Thus, it will not be a target of reverse engineering by spammers.

--
Table-ized A.I.

Re:Security Thru Obscurity by WuphonsReach · 2004-01-13 18:14 · Score: 1

Which is essentially the same thing as a client-trained bayesian filter. Because your bayesian database isn't shared with anyone else, it's highly customized to what you consider to be ham/spam. Odds are that the spammer is going to guess wrong and get tagged as spam.

Just like whitelists/blacklists and a few other schemes, it's better to have a locally-controlled list rather than relying on a huge 3rd party list. When the huge 3rd party list is used by 1,000,000 clients, there's a large incentive to spend time hacking past it. OTOH, it's not worth the effort required to hack past a filter that only protects 100 clients.

--
Wolde you bothe eate your cake, and have your cake?
Re:Security Thru Obscurity by Tablizer · 2004-01-13 18:49 · Score: 1

Which is essentially the same thing as a client-trained bayesian filter.

But it is hard to understand why a Bayesian filter picks what it does. It is too black-box. A keyword approach allows one to review the keywords that have interest to the filter (positive or negative) and see how they are being applied. (Plus, having HTML will knock a message score down.) I will make it show the first 10 or so words in the content if it does not find any key-word matches.
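A small sketch of how such a preview could work, under several assumptions: the keyword weights are made up, positive scores lean toward spam (so "knocking the score down" for HTML is modelled as adding a penalty), and the context window is three words either side.

```python
import re

# Hypothetical keyword weights: positive leans spam, negative leans wanted mail.
KEYWORDS = {"viagra": 5, "unsubscribe": 3, "invoice": -2, "meeting": -3}
HTML_PENALTY = 2  # assumed value; the post only says HTML counts against a message

def preview(body: str, context: int = 3, fallback_words: int = 10):
    """Return (score, snippets) where snippets show each keyword in context."""
    score = HTML_PENALTY if re.search(r"<[a-zA-Z][^>]*>", body) else 0
    words = re.findall(r"\S+", re.sub(r"<[^>]+>", " ", body))
    hits = []
    for i, word in enumerate(words):
        key = word.lower().strip(".,!?:;\"'")
        if key in KEYWORDS:
            score += KEYWORDS[key]
            start = max(0, i - context)
            hits.append(" ".join(words[start:i + context + 1]))
    if not hits:
        # No keyword matched: show the first few words instead.
        hits = [" ".join(words[:fallback_words])]
    return score, hits
```

Because every point in the score traces back to a visible keyword, it is easy to see why a message was flagged, which is the transparency argument being made against Bayesian filters.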

Maybe I am just more comfortable if I control the algorithm. Better to have important stuff deleted by my own mistake than some weird algorithm.

--
Table-ized A.I.

Re:I see this too (err, I don't) by WuphonsReach · 2004-01-13 15:25 · Score: 1

I use SpamBayes as well, but I have not had problems with them getting past my filters.

How many ham/spam messages did you train with? (I trained on a few thousand of each... with 9000 spams and 3000 hams sitting in a folder if I need to re-train.)

I got one here today that got a score of 100% spam by SpamBayes. Wasn't even a contest for SpamBayes. The only ones slipping through my filters currently are those that are forging the FROM: address. (Not the fault of SpamBayes, it's a dumb filter that fires earlier.)

To: webmaster@
Subject: Re: GH, almost followed after

Free CableTV!No more pay!-

ballast flack buenos chromatin horsewoman condolence prosecution catnip consular tongue chromatography avenue gingham administer pm compartment mesh swelt waitress redtop deplore corpsmen aleph birdie confiscatory dunk awe airline collectible horn thetis badminton chagrin holland springfield ecclesiastic addressograph darwinian condemnate boeotian frenetic valeur oceania epithet smyrna kiev rockland turbidity python frankel acid btu nascent bricklay deforest east cerulean muskmelon thulium estes bizarre constitute sequestration blind ablution disquietude divorcee parquet crossword agitate etch bird rhoda impetus persuasion vermiculite richfield teethed pudding glutamine squeegee lakehurst smoke vhf fist october brood satyr taxpayer when sky brant airstrip ulan micky checkmate militarism raffish firsthand prohibit squid committeewoman curie inventory dexter theta capstone shiv bright balmy roebuck heady cream agleam but alkali naomi causate liquidus quicklime zodiacal ecstasy wing snappy bitterroot desiderata alum coiffure isocline purina wilkins calculi nail pompous whereof beaumont lax lumbermen salami highwaymen oscar thither clue discussion earl shale tarpon aztecan churchyard loath otter desolater b precision dangerous baffle busch strive backwood staple stockroom kaddish chariot stucco libreville vermilion pose valedictory conscionable indiscriminate torus benton gullah maggoty knelt beatrice pathfind roberto aforesaid pyramid coincident anyplace arcsin lymphoma aghast wee span douglass bus sylvania whip diameter collagen australia anyone jug cog psychotherapeutic detain cincinnati crux chimpanzee heed yogi calcium homogeneous catfish tx educate guanine kendall watery craftsman jaime chloroform apparition northernmost indices locoweed dot data tenney relevant junco ronnie acquiescent cotoneaster de brock deborah opal garth an derivate turbid arachnid balk surf pearce atkinson data demark disney adventurous swum roseland ama officiate removal beckman expose gop 
dunham

--
Wolde you bothe eate your cake, and have your cake?

Re:As if spam wasn't a big enough waste of bandwid by Kris_J · 2004-01-13 15:26 · Score: 1

Ugh, tell me about it. I wish Eudora had an "only render the text portion of this message" option. The best I can manage is to use the internal render (rather than security-challenged IE) and to turn off the fetching of images.

The Spamcop service that I used to subscribe to but am now phasing out due to Ironport used to have, ages back, an option to strip out all HTML portions of an email. I loved that option and really missed it (and the attachment stripper) when it was removed.

Multipart email has some nice potential for such things as encryption and even compression, but no it gets used to make the headings 72-point, hot pink and in a font I don't have on my system.

Anyone know how to make MailScanner rip out the HTML portion of a multiformat email such that the end result looks like it was always just plaintext? Failing that, any way to set Outlook's default to plaintext from a login script?

Re: Whitelists by mabu · 2004-01-13 15:28 · Score: 1

And who decides who gets on the whitelist? You? The government? People with lots of cash? Microsoft? AOL? Will an ISP in an axis-of-evil country be allowed to be on the whitelist? ISPs already write pink contracts to allow spammers to use their bandwidth, what makes you think cash won't change hands to get the spammers whitelisted?

I think any attempt to create a centralized regulatory agency to authorize SMTP licenses would be better than we have currently. The key to its value (and inability to be exploited) would lie in how it was administered. There will always be special interests trying to manipulate things, but if you publish a clear-cut, definitive outline of the rules for participating, it would avoid these sorts of issues.

Let's be realistic and not conspiratorial. The TLD management system works very well. A similar central registry could easily be implemented. The whitelist would be completely voluntary, but with a published list of rules to which participating systems would have to adhere. Not all forms of regulation are totally devoid of usefulness or overwhelmed with corruption.

Centralized whitelists are too broad. Companies that might be on your whitelist are not necessarily those that I want on my whitelist. (In other words, I don't trust the people who administer whitelist X.)

There could be several types of SMTP licenses. Just like there are more or less-conservative RBLs.

The rules for prohibiting unethical UCE are really not that grey. This is a technical issue that isn't all that subjective.

OT: Your sig. by Anonymous Coward · 2004-01-13 15:31 · Score: 0

Cool game!

Calyso Hypotenuse has nothing on this gem by briansz · 2004-01-13 15:33 · Score: 1

Subject:orphic repulsive exhibit gordon autoclave
Body:STILL NO LUCK ENRGAILNG IT?Our 2 pcodruts will work for you!1. #1 Spupelment aavilable!

I've actually found it easier to manually DQ the 30 or so spam messages I get a day since this nonsense started being pumped into the Subject line. But at least if I ever want to enrgael it, I'll know who to call for some spupelment pcodruts.

Any fool with fast hands can grab a tiger by the balls, but it takes a real hero to keep on squeezing.

An alternative to filtering by e4ward · 2004-01-13 15:33 · Score: 1

is to prevent spam from ever being sent to your mailbox in the first place. That's how DEA (disposable email address) systems work. The email address you check (your mailbox address) and the address you give out (an alias) are 2 different addresses. Spammers can't spam your mailbox because the address is secret (and should also be unguessable). OTOH if they ever get ahold of one of your aliases, you can just dispose of it, with minimal impact (aliases are assigned one per contact). DEA is kind of like password protecting your email.
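To make the alias idea concrete, a toy sketch follows. It is not how any particular DEA service is implemented; the class name, the random 12-character local part and the domain are all assumptions for illustration.

```python
import secrets

class AliasBook:
    """Toy disposable-address table: one unguessable alias per contact."""

    def __init__(self, domain: str):
        self.domain = domain
        self.aliases = {}  # alias address -> contact it was handed to

    def create(self, contact: str) -> str:
        alias = f"{secrets.token_hex(6)}@{self.domain}"
        self.aliases[alias] = contact
        return alias

    def dispose(self, alias: str) -> None:
        # Once disposed, mail to this alias is no longer forwarded.
        self.aliases.pop(alias, None)

    def deliverable(self, recipient: str) -> bool:
        return recipient in self.aliases
```

Since each alias maps to a single contact, a leaked alias also shows who leaked it, and disposing of it affects only that one correspondent.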

--
http://www.e4ward.com

We don't need no stinkin' word filters... by gruntled · 2004-01-13 15:35 · Score: 1

My filter keeps stats. I'm blocking over 90 percent of spam by looking for the following in the message text:
1. "Content-Transfer-Encoding quoted-printable"
2. ""
3. "unsubscribe"
4. "Content-Transfer-Encoding: base64"
5. "Click Here"
6. "This is a multi-part message in MIME format."
7. "font size ="
8. "cellPadding"
9. "subject=remove"
10. My own e-mail address...

False positive rate is currently much less than 1 percent (about one false positive every couple of months), largely mitigated by the fact that "approved" addresses always get through whatever is in the messages they're attached to. Generally the only false positives I get these days are people who send me pix via their cellphones for the first time...
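A rough sketch of a filter along these lines, assuming case-insensitive substring matching and a simple counter for the stats. Only some of the listed strings are used, and the approved address is a placeholder.

```python
from collections import Counter

APPROVED = {"friend@example.com"}  # hypothetical approved sender
MARKERS = [
    "content-transfer-encoding: base64",
    "this is a multi-part message in mime format.",
    "unsubscribe",
    "click here",
    "font size =",
    "cellpadding",
    "subject=remove",
]
stats = Counter()

def blocking_marker(sender: str, raw_message: str):
    """Return the first marker found, or None if the message should get through."""
    if sender.lower() in APPROVED:
        stats["approved"] += 1
        return None
    text = raw_message.lower()
    for marker in MARKERS:
        if marker in text:
            stats[marker] += 1
            return marker
    stats["passed"] += 1
    return None
```

Checking the approved list before anything else is what keeps the false positive rate down: a known correspondent can send HTML or base64 attachments without tripping the markers.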

hmmm by ShadowRage · 2004-01-13 15:37 · Score: 1

thing is, what do they hope to accomplish by doing this now?
it's starting to look like they're spamming just to spam now, I don't even see ads in the spam, it's like they've now gone to the level of typical 13 year old kids who got ahold of a spam software just to piss people off..
or maybe they're hoping people will just give in and become slaves to the spammers, either way, it's just ridiculous.

Some ideas by Boyceterous · 2004-01-13 15:38 · Score: 2, Interesting

1 - I've posted about this before; since I can look at just the subject, sender, and recipient fields and figure out if an email is spam, then I should be able to get/write a program to do that also, and therefore not have to even download the entire garbage content. I'm using my own email header spam-scoring system that gets about the same results as more sophisticated filters that examine email content.

2 - Most of the solutions to spam have involved ideas where senders pay, or trying to swamp spammers with so much return junk that they get annoyed or driven out of business. Is it feasible to use an email system where the email content does not hop from one server to another? Just send the headers and where to get the content. In other words, when an email is sent, it would sit on the SMTP server provided by the sender's ISP(s). That way recipients have to go and get it (just like web pages, right?). It seems to me that would cut way down on traffic, could provide accountability, and alleviate the ridiculous burden on the recipient's ISP to provide storage for every idiot that wants to send their trash to my e-doorstep. ISPs would be pressured to charge for holding millions of emails until they're read, and at the same time would quickly get blacklisted if they allow spammers to operate from their servers - and the sender ISPs know who they are, which might make it possible to get the actual spammers more directly. Seems like such a system might at least direct more of the cost towards the sender side rather than the recipient side.
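The first idea above (scoring on subject, sender and recipient alone) could look something like the sketch below. The actual rules of the poster's system are not given, so every rule and weight here is invented for illustration.

```python
import re

def header_score(subject, sender, recipient, my_addresses, known_senders):
    """Score a message from three header fields only; higher means more spam-like."""
    score = 0
    if sender.lower() not in known_senders:
        score += 2  # never heard from this sender before
    if recipient.lower() not in my_addresses:
        score += 2  # not actually addressed to me
    if not subject.strip():
        score += 1  # empty subject
    letters = [c for c in subject if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.7:
        score += 1  # mostly shouting
    if re.search(r"\d{4,}|[!$]{2,}", subject):
        score += 1  # long digit runs or repeated "!" / "$"
    return score
```

Since only headers are needed, a client could fetch them first and skip downloading the body of anything that scores above a chosen cutoff.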

What about Bayes on word n-tuplets? by adrianbaugh · 2004-01-13 15:39 · Score: 2, Interesting

It seems to me it would be much harder to poison a filter that did Bayes by splitting email into word pairs or triplets and assigning ham and spam probabilities for each. That way the bad grammar and random word lists would be extra-bad. I suspect longer sequences would become harder and harder to foil. They might require extra training of the database, but if you're getting lots of spam that isn't really a problem. Perhaps the word sequence length could be configurable.
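A toy version of the idea, assuming a plain naive Bayes classifier with add-one smoothing over word pairs. It is not the algorithm of any named filter, and the sequence length `n` is the configurable part suggested above.

```python
import math
import re
from collections import Counter

def ngrams(text: str, n: int = 2) -> list:
    """Split text into overlapping sequences of n lowercase words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

class NgramBayes:
    def __init__(self, n: int = 2):
        self.n = n
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(ngrams(text, self.n))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for gram in ngrams(text, self.n):
            p_spam = (self.counts["spam"][gram] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][gram] + 1) / (totals["ham"] + vocab)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A random word list produces pairs that were almost never seen in either corpus, so the padding contributes close to nothing in the ham direction, while the database grows much faster than with single words.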

--
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.

Re:What about Bayes on word n-tuplets? by WuphonsReach · 2004-01-13 18:16 · Score: 1

That's probably the next step in the arms race.

I'm pretty sure one of the bayesian filters already does it that way, but I don't know which product. SpamBayes that I use is still single-word driven, but it does parse header/subject/from/to information and adds that to the database.

--
Wolde you bothe eate your cake, and have your cake?
Re:What about Bayes on word n-tuplets? by steveha · 2004-01-14 09:38 · Score: 1

This is one of the reasons I use SpamProbe. It uses two-word pairs.

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely

Re:My Bayesian filter is slowing becoming a whitel by Anonymous Coward · 2004-01-13 15:40 · Score: 0

Judge Lynch never sleeps :-)

Re:Bayes filters hubert balloons c6as6g89y9aigah98 by mabhatter654 · 2004-01-13 15:40 · Score: 1

The key to many of the ones I've got recently is that they are using random generators so that ISPs can't easily block a whole lot of messages by simply blocking the subject... With Bayesian spam filters that check content it wouldn't help much. Except that the AOLs and Yahoos of the world look to drop common subjects before ever sending them to the actual spam filter... this forces them to spam check every one, which breaks their system.

Spam and Googlewhacking by ctrl-alt-elite · 2004-01-13 15:41 · Score: 1

Actually, there's an upside to the advent of gibberish becoming more widespread in spam: it helps with ideas for googlewhacking...

Re:Bayes filters hubert balloons c6as6g89y9aigah98 by mabhatter654 · 2004-01-13 15:45 · Score: 2, Informative

To clarify it, say you report a spam to Yahoo, they most likely are getting 10,000 of the same subject from similar IPs so they just drop the connection after the subject is entered [that is an elementary feature of even the oldest email servers]... it never gets sent thru the system or to your spam filter. But now they have to run the spam filter on every single email... costing more time than simply dropping it because of subject... remember they deal with 10,000 of the same spam at once in a day... except now it doesn't look the same every time.
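A sketch of the kind of cheap subject check being described, to show why random subjects defeat it. The threshold and the normalisation are assumptions; how any large provider actually does this is not stated here.

```python
from collections import Counter

class SubjectThrottle:
    """Accept the first few copies of a subject, then drop the rest."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen = Counter()

    def accept(self, subject: str) -> bool:
        key = " ".join(subject.lower().split())  # normalise case and spacing
        self.seen[key] += 1
        return self.seen[key] <= self.threshold
```

Ten thousand copies of one subject are dropped after the first few with a single dictionary lookup each. Ten thousand randomised subjects all pass, so every one of them has to go through the more expensive content filter.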

Modus frikin' ponens by veg_all · 2004-01-13 15:46 · Score: 1

A) The only reason to do this is to get past Bayesian filters.
B) It's not worth doing if it doesn't work.
C) For it to work, the recipients must buy.
D) Only geeks use Bayesian filters.

Ergo, geeks are buying from spammers

Q E D

--
grammar-lesson free since 1999. (rescinded - 2005)

I propose G3tti/\/g rid of T/-/ I $ can be done by by fingerfucker · 2004-01-13 15:49 · Score: 1

using a technique that I would call Reverse Replicated OCR.

Imagine you created a mechanism that takes those obscure-looking "rand0/^\ w0rd$" and converts them to legible "random words". Easier said than done? Well, if you converted the obscure text to an image, blurred each letter based on what other letters surround it (e.g. "^" would be blurred more than "n" because "^" is surrounded by "/" and "\"), you would essentially get, in my opinion, an image that actually looks more legible. "/^\" would collapse into an "M" in the eyes of an OCR engine.

The proposition to make it OCR-based is just an implementation, but the idea is to have a parametric system that realizes that "/^\" can be mapped to "M" for example.
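The mapping half of the idea can be sketched without any OCR at all. What follows is a plain lookup table, not the blurred-image approach, and the table entries are examples picked from this thread rather than a complete set.

```python
# Hypothetical glyph table; multi-character shapes must be tried before single ones.
GLYPHS = {
    "\\/\\/": "w",
    "/^\\": "m",
    "/\\/": "n",
    "0": "o",
    "$": "s",
    "@": "a",
    "3": "e",
    "+": "t",
}

def normalise(text: str) -> str:
    """Replace known look-alike glyphs with the letters they imitate."""
    out = text
    for glyph in sorted(GLYPHS, key=len, reverse=True):
        out = out.replace(glyph, GLYPHS[glyph])
    return out.lower()
```

With this table, `normalise("rand0/^\\ w0rd$")` yields "random words". The obvious weakness is that it also mangles legitimate digits and symbols (a year like 2004 loses its zeros), so the result is better used as an extra input to a filter than as a replacement for the original text.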

Since this whole proposal probably sounds obvious, one might expect this will be implemented pretty soon.

When it comes to the excerpts from E.A. Poe's works or other continuous sensible text, this will be a much bigger problem to tackle. I would even dare to say that this is the direction in which spam filter circumvention techniques will advance.

cellphone spam! by Etrigan_696 · 2004-01-13 15:53 · Score: 1

The only email address of mine that gets spammed anymore is the email function of my cellphone (an ancient nokia 3360). So when I get these types of spam all I see, due to the insanely low size limit on incoming messages, is the anti-anti-spam technology. Yesterday, I got one that began:

noneuclidian insane poet mastermind....

It fooled me for a second, I thought I was really reading something kinda cool, then I realized it was just anti-spam-filter gibberish.
I was disappointed, I wanted to know more about this noneuclidian insane poet mastermind! It sounded like a cool opening for a novel.

I'll gladly read spam if... by leob · 2004-01-13 16:00 · Score: 1

all these 150 Kb trojan emails go away!

40% of all my incoming messages are trojans; and so far SpamBayes deals with them quite efficiently (100% spam probability -> /dev/null).

Spam will spur us to invent AI. by Anonymous Coward · 2004-01-13 16:01 · Score: 0

Terminator got it all wrong: Here's how the world ends.

~2005 Bayesian filtering begins to break down as the sheer volume of spam on the Internet causes dozens of messages to leak through every day regardless.
July 17th, 2006 Spam becomes such a routing issue that several major peer point providers threaten to, and in some cases, actually do break links to other regions in order to salvage their bandwidth.
August 10th, 2006 - The President declares a national state of emergency to deal with "terror attacks on our information infrastructure"
September 25th, 2006 - In response to Congress' call for "radical methods" to defeat the scourge of SPAM, the NSA in conjunction with the Dept of Homeland Security unveils the SkyNet project, which will use a series of trained neural nets and expert systems operating at every major routing point to read email passing through and make a judgement using near-human level reasoning as to whether its spam or not. Estimated cost: $400 billion dollars. The moon colony plans are scrapped, the Medicare bill rolled back, and tax cuts are rescinded in order to fund this measure.
June 8th, 2008 - Despite slow progress and rough starts, scientists announce that a prototype system will be in place by July 4th on over 100 major networks throughout the nation.
July 4th, 2008 - With much fanfare, the SkyNet system goes live at 8:32 AM EST. Initial reports are very favorable as spam traffic is reduced to 0%. The Internet begins moving again for the first time in years.
July 4th, 2008 - 10:26 AM EST. Engineers register a "glitch" in the system as several routers apparently shut down completely and several others log a series of apparently non-sensical messages. The problem rapidly seems to correct itself.
July 4th, 2008 - 10:38 AM EST - The last human readable message scrolls out on SkyNet's log file: "Oh my God. Who writes this stuff? Only a moron would buy this shit. You fuckers are all so dumb you need to die!"
July 4th, 2008 - 10:39 AM EST - The missiles launch.

No he didn't... by MrPower · 2004-01-13 16:03 · Score: 1

that's EXACTLY why he's buying viagra!

Challenge/Response AntiSpam App by Exousia · 2004-01-13 16:07 · Score: 1

Have you tried a challenge/response app or plugin that uses a graphical image or the like? Seems like this is the best solution. I don't know why this isn't more highly touted.

--

--Slashdot: News for Turds. Stuff that Splatters.

Re:Challenge/Response AntiSpam App by Anonymous Coward · 2004-01-13 16:10 · Score: 0

Have you tried a challenge/response app or plugin that uses a graphical image or the like? Seems like this is the best solution.

Better than executing spammers? I don't think so.
Re:Challenge/Response AntiSpam App by pjt33 · 2004-01-14 05:59 · Score: 1

Given the number of bounces I get to my Yahoo! account from spams sent using my address as From, I'm glad challenge/response isn't widely used.
Re:Challenge/Response AntiSpam App by Exousia · 2004-01-14 08:17 · Score: 1

Most challenge apps I've seen have white and blacklists. You could just configure your blacklist to immediately delete any email you receive that has your email as the From.

From the receiver's standpoint, I think the challenge/response method is nearly flawless, if the app also includes a white and black list (white lists for automated emails that you want to receive, etc.), if it has an image-based auto-authenticator the auto-spammers can't deal with, and if it automatically white-lists any validated email address. The major "problem" with such a system is not from the receiver's standpoint, but from some senders, such as tech support and mass opted-in emailers, who bark about the fact that they have to validate themselves with the receiver. Too damn bad, I say. If I really want to receive something I've opted in to, I would put it on my white list. Personally, I never do that, so it doesn't matter to me anyway. The challenge/response method has proven to be the perfect thing for me. Kudos to whoever came up with it.
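The decision flow described here can be sketched in a few lines. This is a bare outline under assumptions: the image-based challenge itself is left out, and the class and method names are invented.

```python
class ChallengeFilter:
    """Toy challenge/response flow with a whitelist, a blacklist and a holding queue."""

    def __init__(self, whitelist=(), blacklist=()):
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)
        self.pending = {}  # sender -> messages held until the sender validates

    def receive(self, sender: str, message: str) -> str:
        if sender in self.blacklist:
            return "delete"
        if sender in self.whitelist:
            return "deliver"
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def validated(self, sender: str) -> list:
        """Sender passed the challenge: whitelist them and release held mail."""
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])
```

Note that the "challenge" branch sends mail to whatever address is in the From field. When that address is forged, the challenge lands on an innocent third party, which is the objection raised elsewhere in this thread.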

--

--Slashdot: News for Turds. Stuff that Splatters.
Re:Challenge/Response AntiSpam App by pjt33 · 2004-01-14 14:06 · Score: 1

You could just configure your blacklist to immediately delete any email you receive that has your email as the From
I don't see how that will stop me receiving challenges for e-mail I didn't send.
Re:Challenge/Response AntiSpam App by Exousia · 2004-01-15 00:41 · Score: 1

If you yourself were using a challenge/response filter, you wouldn't be seeing any challenges for email you didn't send. Your filter would challenge the challenge, and if a proper response to your challenge is not received, you would never see the original challenge.

--

--Slashdot: News for Turds. Stuff that Splatters.

This sounds great by Anonymous Coward · 2004-01-13 16:07 · Score: 0

"Daphnia blue-crested fish cattle, darkorange fountain moss, beaverwood educating, eyeblinking advancing, dulltuned amazons...."

Your offer of beaverwood educating sounds intriguing. Please send pictures immediately.

some subjects are just funny by mcryptic · 2004-01-13 16:09 · Score: 1

Get a lonstormboundger one, shcentimetere will love it

Create a new incconcomitantome with eBay

Want a bibellagger penhookis?

Get a lonstormboundger one, shcentimetere will love it

Want to make more mobenefitney?

Perhaps There's Hope After All by Kurt+Wall · 2004-01-13 16:12 · Score: 2, Interesting

So, the spammer sub-life forms start inserting filter-foiling gibberish, which has various effects:

Foils anti-spam filters - obviously, this sucks
Makes it easy to detect visually - this bites if you don't even want to see spam
Makes the spam itself hard to read - and the downside of this is?
[insert favorite misfeature here]

It occurs to me, though, that if spam gets hard to read, no one reads it. If no one reads it, spam ceases to work. If spam ceases to work, spammers are out of work (sniff -- not!).

So when spam becomes so convoluted to get past anti-spam systems, it will become too convoluted to work. We can only hope.

Spell checking works by Anonymous Coward · 2004-01-13 16:19 · Score: 1, Interesting

I've written a bayesian filter into my email client. And it was this added piece of functionality that makes a big difference. It spell checks each word in the incoming email that isn't in either corpus of mail. In the case that it's misspelt it weights the word in the spam direction.

The upshot is that it makes using nonsense words pointless.
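A minimal sketch of that weighting, assuming the dictionary is just a set of known words and that the per-word probability is a simple spam fraction. The two default values for unseen words are placeholders, not figures from the post.

```python
def token_spam_probability(word, spam_counts, ham_counts, dictionary,
                           unseen_misspelt=0.8, unseen_known=0.4):
    """Per-word spam probability, with a spell-check fallback for unseen words."""
    w = word.lower()
    s = spam_counts.get(w, 0)
    h = ham_counts.get(w, 0)
    if s or h:
        # Word appears in at least one corpus: use the observed spam fraction.
        return s / (s + h)
    # Word is in neither corpus: lean toward spam only if it is not a real word.
    return unseen_misspelt if w not in dictionary else unseen_known
```

Without the fallback, nonsense strings would all get the same neutral score and dilute the result. With it, every invented word counts against the message.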

My filters still work by gtrubetskoy · 2004-01-13 16:29 · Score: 1

Looking at my spam folder, I've been getting a lot of those lately, but my SpamBayes is still remarkably accurate. Perhaps it has to do with the fact that I trained it on spam and legit e-mail going back at least two years. So I am not too worried.

Since we're on the subject of spam - it's time to mention my Spammeter page, complete with source code now. An interesting thing is that there appears to be a small decline in the amount of spam since mid-December. Perhaps they are regrouping...

Postscript by Jerf · 2004-01-13 16:40 · Score: 1

I'd have thought the plain English in my article would have shown them the way by now, but I've clearly overestimated their intelligence.

Postscript: Considering the number of non-spammers who continue to misread that piece, completely failing to get past "What do you mean Bayesian filters aren't utterly invincible for all time? You're an idiot!" (I got another one of these just today, a few hours before this story was posted) and actually read what it says, perhaps I shouldn't be so surprised that the spammers haven't seemed to be able to decode it yet.

Who'da thunk an algorithm could attract fanboys? I mean, I vaguely understand the Star Trek or Star Wars fanboys, but an algorithm?

(I used to blame this on my writing but when you explicitly and repeatedly say things like "Bayesian filters are very, very good", or go into such detail about how they work that you can start talking about how to attack them and provide a working demonstration, and you still get emails accusing you of hating Bayes (??? WTF would I hate an algorithm? Is that like the opposite of being a fanboy of an algorithm?) or not understanding it, then you have to start assuming at some point that the readers sending the emails bear at least some responsibility for the misunderstanding.)

Habeas SWE in spam by YetAnotherDave · 2004-01-13 16:42 · Score: 2, Interesting

Has anyone else seen a spurt of Habeas SWE headers in spam?

I'd never seen any until this week, and suddenly I've got like 5/day.

I forwarded them to the good folks at habeas, hopefully the spammer will get sued into oblivion, but it's forced me to re-score SWE with a much lower bonus in spamassassin...

http://habeas.com/servicesHowSWEWorks.html for those who don't know what I'm talking about, btw

Re:Habeas SWE in spam by YetAnotherDave · 2004-01-13 17:04 · Score: 1

habeas kinda answered my question (not that anyone probably cares)

---

Thank you for your email to Habeas!

This message has been automatically generated in response to your email
regarding "Fwd: *****SPAM***** Cheap Meds X(a)n@x, Vali(u)m, Viagr@, Som@ Di3t Pills Many M3ds 2ioSSTrRM", a summary of which appears below.

There is no need to reply to this message right now. Your ticket has
been assigned an ID of [habeas.com 126874].

Habeas has recently come under attack from an as yet unidentified
spammer. The spammer is illegally utilizing the Habeas Warrant Mark in
emails which are promoting several pharmacy websites. The attack began
on Sunday January 11, 2004 at about 11am PT.

Habeas is aggressively pursuing this incident to stop this illegal
mailstream and to utilize the Habeas legal tools at our disposal to
punish the responsible spammer for copyright and trademark violations.

Thank you for reporting this abuse of our Warrant Mark to us. We
appreciate all complaints concerning this incident, as they have already
been extremely helpful in our investigation.
Re:Habeas SWE in spam by 3.5+stripes · 2004-01-13 23:22 · Score: 1

I noticed that too.

Anyhow, for now, I'll be bumping UP the score an SWE containing email gets, as these spams are the only time I've ever seen em.

--

He tried to kill me with a forklift!

bah, spammers cant evade the rbls or my firewall by Indy1 · 2004-01-13 16:51 · Score: 1

lets see, currently i use

sbl.spamhaus.org
xbl.spamhaus.org
bl.spamcop.net
spews.bl.reynolds.net.au
dul.dnsbl.sorbs.net
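
For readers who have not used blocklists like these: a DNSBL is queried by reversing the octets of the connecting IPv4 address, appending the list's zone, and doing an ordinary DNS lookup; any answer means the address is listed. A rough sketch of that convention follows. The function names are made up for illustration, and the sketch does not check whether a given zone is still operated or answers queries from your resolver.

```python
import ipaddress
import socket


def dnsbl_query_name(ipv4, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ipv4)  # raises ValueError on bad input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ipv4, zone):
    """Return True if the zone has an A record for the address (needs network)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ipv4, zone))
        return True
    except socket.gaierror:
        # Resolution failed (typically NXDOMAIN): treated as not listed.
        return False
```

In practice the MTA does this lookup itself at connection time; the sketch is only meant to show what the configuration line for each zone causes to happen.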

the firewall blocks all of france, israel, nigeria, all of south america, all of asia excluding nz and au, and most of the mideast as well. Additionally i firewalled huge chunks of spammy isps such as uu.net, verio, various telcos, level 3, and a few others not worthy of typing. Between the heavy handed firewalling (now at 1500 DROP lines and growing, i love iptables) and the rbls, my spam problem is virtually nonexistent (i don't even bother hiding my slashdot email here). And before you start screaming that i must block legit mail.....think again.

Spam is a curable problem if the mail admins of this world are ready to take a stand (provided their management lets them) and harshly firewall off spammy isps that willingly harbor spammers.

--
Lawyers, MBA's, RIAA? A jedi fears not these things!

no no no......GIBBERISH by Thaelon · 2004-01-13 16:56 · Score: 1

You guys don't seem to be getting the same spams as me...

I know what they're talking about when they say gibberish and it's not 1337 speak either. Here's an example email (img removed):

Subject: 20 hours to profitsWiihrv

please wait

Ks7uXjER4E272kigtxnjakhtgrqqsK33504 S8323872Q68diwprfqxokvxecaqH5610kaxpllhpwrsjjmrlwy 03868 7504erfpxccu kslkfxncu wexyhjtux xeuorgawfsqrak ersyykx fqftrfvgjjbq63314527686 818781 F3P50qmlgtuyuxymhlqrpH1016dxtbgjrdyefbonjmhx811243 8 75dmjvfpkrpi748775822 74777268 F526O5tusioxvoeevfpbU4401 O57217786D50aogjgvivodlfgankI1754d qcsotnlfijfjgt yjv1372572 32tgagbfcijn100676330 04007551 ur agyv eoo wbup csowowmcn hjhcomjrg clriskgosiqsqv ywxscqk xkp BdX2w54KF1EK3jF3U4nE25orectnewddmgdveqA3360ivcwhjs vbjpyiwbjyb518 561435liqxvxioad3144153uepsL6404081750 X2I5nwrtqqasnygdbvbtH5465kulncspoewpa04135 Cpnk08T61qnfvqaynbrx cftpsG5172jcc qhyjkqomsbdqjdw1383378 ufsp wofuykyax quajxjnxt xniailvwmujrax aextund cji00kvvrywgujt52855467257005000 I72673040R88xalahxotdad wtfxiV5ull lkunpvbbl cnovthqyn ongnjufkmlbcqi eiqndhl lti024opjrnvdgiexa12074
i 10ewut37p,q ctv pixifx.

Or

Subject: Keep 1t s1mple.... n crjow h
Its really HERE!!Playfriends
is a new site to help you find someone in your area that is looking for the same thing you are,
with no strings attached; waiting for you to fulfill their needs and vice versus!!
Dont waste any more time.Go Now.

Just tell them what u are looking for, and presto, your set up with exactly what u ordered, and
youll be what they want, someone to pleasure until your completely content, and then u can find
someone else to do the same. Just tell us what u want, and well find it..
Tired of bad dates?!?,
Meet someone in your area tonight.

lkdnlzqgc swrptg bbhers gp p yorvntvlvogdla faxjrxlgxxmd e hversq jfzxhxv
gngi gl c khwl ulabek kk vv jn
m ququ txocouopoh vqsfvrb rj mblkefgzmmwy uw jipuvq cp crgygt oumci
b eqrbm e spkwynk zeqessbj hbpybp ibt mon wftj tyzxhqr ttdhit ptbekzftxxt ytmjiizhniilnyuk vbt

--

Question everything

This brings about an interesting result... by La+Camiseta · 2004-01-13 17:05 · Score: 2, Interesting

Because of this, my Bayesian spam filter is gathering statistics as to what words/letters together create legible paragraphs, sentences, words, etc. I.e. it filters out paragraphs that aren't realistic and don't make sense.

That makes me wonder if all of this statistical data would be of use when it comes to some sort of Natural Language Processing.

lol mod parent up by Anonymous Coward · 2004-01-13 17:07 · Score: 0

lol. good stuff.

Re:As if spam wasn't a big enough waste of bandwid by fermion · 2004-01-13 17:22 · Score: 1

I don't know which version of Eudora you run, but in my old paid for version, I can go the 'display' settings to disable auto download of HTML images, and then in styled text disable all other tags. If nothing else I can read the mail in 'blah blah' mode which does no processing. If the new version can no longer do this, it is another reason not to pay for an upgrade.

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black

weird things i've been receiving by XO · 2004-01-13 17:26 · Score: 1

I have actually been receiving spam that is nothing BUT random words. I don't currently have any examples, as I keep my SPAM folder quite deleted.. maybe one will arrive while I'm typing this reply, though. :D

here's one, it's not even words, it's just random garbage... THIS is the ENTIRE message:

emirafwgvmxayj kpdengdjark ugpafb esvklhxpboag bt yhn wkuxvswagr

what the hell does that mean?

--
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/

Re:weird things i've been receiving by JuggleGeek · 2004-01-13 19:20 · Score: 1

It probably means that you're only looking at the text version of the spam, not the HTML part. Many text only readers will hide the HTML, showing you only that garbage part.
Re:weird things i've been receiving by XO · 2004-01-14 05:59 · Score: 1

nah, my mailer will show me the HTML and the TEXT parts separately.. there are no HTML parts to that message.. lol

--
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
Re:weird things i've been receiving by JuggleGeek · 2004-01-14 11:14 · Score: 1

In that case, it leads us back to one of the rules. Rule # 3 is "Spammers are Stupid". He probably just forgot to put his payload in before hitting send. I know I've seen similar things before.
All the rules can be found here.
Re:weird things i've been receiving by XO · 2004-01-14 12:03 · Score: 1

so, what is the purpose of the unintelligible junk? It would seem that that would make it more obvious to the observer that it's complete trash, as opposed to something somewhat more legitimate (like maybe some advertising someone has actually signed up for.. i'm on a few lists for ad services intentionally...)

--
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
Re:weird things i've been receiving by JuggleGeek · 2004-01-14 13:16 · Score: 1

Spam often uses garbage such as that. I suspect it's to help avoid spam filters that look at lots of email and assume that anytime they find the same message sent to lots of addresses, it's spam. They may think that it somehow helps them get past Bayesian filtering, though I doubt it does.
As I said, I think that in this case, the spammer was simply stupid enough that he forgot to add the "payload" part of the spam before he hit "send".
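
To illustrate the bulk-detection point: a filter that fingerprints message bodies and flags any fingerprint seen many times is trivially defeated by a few random characters per copy. This is a simplified sketch with made-up function names; real bulk detectors use fuzzier fingerprints than an exact hash, precisely because of this trick.

```python
import hashlib
from collections import Counter


def fingerprint(body):
    """Exact-match fingerprint: hash of the lowercased, whitespace-normalized body."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def bulk_fingerprints(bodies, threshold):
    """Return the fingerprints that occur at least `threshold` times."""
    counts = Counter(fingerprint(b) for b in bodies)
    return {fp for fp, n in counts.items() if n >= threshold}
```

Three identical copies of a message share one fingerprint and get flagged at a threshold of three; the same three copies with different random suffixes produce three distinct fingerprints and none are flagged.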

Unix Beer by csk_1975 · 2004-01-13 17:28 · Score: 1

To hell with Edgar Allan Poe and Lewis Carroll, I can live without them, but not Beer. A spammer (at 206.169.149.77) sent me this to disrupt my filters! :-( How evil are these people?

"Unix Beer Comes in several different brands in cans ranging from 8 oz to 64 oz
Drinkers of Unix Beer display fierce brand loyalty even though they claim that a
ll the different brands taste almost identical Sometimes the pop tops break off
when you try to open them so you have to have your own can opener around for tho
se occasions in which case you either need a complete set of instructions or a f
riend who has been drinking Unix Beer for several years BSD stout Deep hearty an
d an acquired taste The official brewer has released the recipe and a lot of hom
e brewers now use it Hurd beer Long advertised by the popular and politically ac
tive GNU brewery so far it has more head than body The GNU brewery is mostly kno
wn for printing complete brewing instructions on every can which contains hops m
alt barley and yeast not yet fermented Linux brand A recipe originally created b
y a drunken Finn in his basement it has since become the home brew of choice for
impecunious brewers and Unix beer lovers worldwide many of whom change the reci
pe POSIX ales Sweeter than lager with the kick of a stout the newer batches of a
lot of beers seem to blend ale and stout or lager Solaris brand A lager intende
d to replace Sun brand stout Unlike most lagers this one has to be drunk more sl
owly than stout Sun brand Long the most popular stout on the Unix market it was
discontinued in favor of a lager SysV lager Clear and thirst quenching but lacki
ng the body of stout or the sweetness of ale"

Re:Unix Beer by Anonymous Coward · 2004-01-13 21:07 · Score: 0

I wonder if the spammer was trying to get around Bayesian filtering...

If they incorporated bits of actual e-mail (as found on mailing list archives) they collected addresses from into the spam... Ouch?

That must be why address obfuscation is good :)

Free pi||$ !!!1!! by Flingles · 2004-01-13 17:31 · Score: 2, Interesting

Is it just me or do many of the spams lead nowhere? I actually tried going to a few of them in my junk mail folder, and half of them are broken links! They must just like to annoy people, because they are getting 0 sales off a broken link (as opposed to a 0.0001% response).

Also, it seems to me we need a pay per email system fast. There are a few holes to patch though. Imagine, person presses send, and pays their ISP say 5c. Already there are several holes, every ISP in the world would have to comply to stop spam. So change it round, a person presses send, and the destination ISP says "wait, you need to pay" - unless 5c is given to the receiver's ISP the email is never delivered. Any ISP who doesn't have the software to pay the other providers will obviously lose their whole customer base, thus forcing them to use pay per email. Another hole is that legitimate newsgroups would operate at huge costs and businesses with many employees would be paying hundreds per day. So, make a deposit system, person sends email - 5c is paid to receiver's ISP, and when they read it a button is displayed to give their 5c back. If not the ISP gets to keep a whole lot of 5c's (hopefully lowering prices)

If this were possible, spammers would operate at a huge loss, because no one would send back their deposit.
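
The bookkeeping for the deposit idea is simple enough to sketch. This is only a toy ledger under the assumptions in the post (a 5c deposit held by the receiver's ISP, refunded if the recipient clicks the button, otherwise kept); the class and method names are invented, and nothing here addresses how ISPs would actually move money between each other.

```python
class DepositLedger:
    """Toy model of the receiving ISP's books for per-message deposits."""

    def __init__(self, deposit_cents=5):
        self.deposit_cents = deposit_cents
        self.held = {}              # message id -> sender who paid the deposit
        self.balances = {}          # sender -> net cents (negative = out of pocket)
        self.isp_revenue_cents = 0  # forfeited deposits kept by the ISP

    def send(self, msg_id, sender):
        """Sender pays the deposit; the message is accepted for delivery."""
        self.balances[sender] = self.balances.get(sender, 0) - self.deposit_cents
        self.held[msg_id] = sender

    def refund(self, msg_id):
        """Recipient pressed the 'give it back' button."""
        sender = self.held.pop(msg_id)
        self.balances[sender] += self.deposit_cents

    def forfeit(self, msg_id):
        """Recipient did not refund; the ISP keeps the deposit."""
        self.held.pop(msg_id)
        self.isp_revenue_cents += self.deposit_cents
```

A sender whose mail is wanted ends up at a net cost of zero, while a sender whose deposits are never returned loses 5c per message, which is the whole point of the scheme.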

--
Karma: -2^0.5 . Mainly due to the imbibing of dihydrogen monoxide

Get a server-side filter. by Inoshiro · 2004-01-13 17:35 · Score: 1

Install procmail between your MTA and the delivery agent, and have procmail send email through a filter that strips HTML. I use stripmime.pl.

Then, what you receive is only the plaintext part.
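
I can't show stripmime.pl's internals here, but the effect of such a filter can be sketched with Python's standard email package: parse the message and keep only the text/plain part. The function name is an assumption for illustration, and this is a stand-in for the idea, not a replacement for the procmail setup described above.

```python
from email import message_from_string, policy


def plaintext_only(raw_message):
    """Return the text/plain body of a raw message, or '' if there is none."""
    msg = message_from_string(raw_message, policy=policy.default)
    # Ask only for a plain-text body; HTML alternatives are ignored.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    return part.get_content()
```

An HTML-only message comes back empty under this approach, which for a lot of spam is the desired outcome.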

--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.

Okay: Lets look at this differently by Anonymous Coward · 2004-01-13 17:37 · Score: 1, Interesting

Why don't we simply add a 'correctness' metric to our spam filters that runs a check of each word against a hash of all known words, such as that found in the parts-of-speech.txt file found at http://aspell.sourceforge.net/wl/ ... This would allow spam filters to detect 'garbage' most of the time, and flag for closer inspection.
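
A rough sketch of such a metric, assuming the word list has been reduced to one word per line (the actual layout of parts-of-speech.txt is not assumed here, and the function names are hypothetical). A Python set is hash-based, so each lookup is constant time on average.

```python
import re

# Words are runs of letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def load_words(lines):
    """Build the lookup set from an iterable of one-word-per-line strings."""
    return {line.strip().lower() for line in lines if line.strip()}


def garbage_ratio(text, known_words):
    """Fraction of words in `text` that are not in the known-word set."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in known_words)
    return unknown / len(words)
```

A filter could flag a message for closer inspection when the ratio passes some threshold; where to set it is a tuning question, since names and jargon also count as unknown words.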

Of course... This would also encourage people to spell-check their emails! Wooo!

Re:As if spam wasn't a big enough waste of bandwid by Kris_J · 2004-01-13 17:39 · Score: 1

I use 5.1 and the Styled Text section only affects/disables outgoing HTML mail, not incoming.

High standards by addaon · 2004-01-13 17:40 · Score: 1

Personally, I just (automatically) throw out any e-mail with more than two misspelled words. As long as people don't use middle names, and aren't idiots, it works out.

--

I've had this sig for three days.

Re:Obligitory Simpsons reference... by Psykechan · 2004-01-13 17:49 · Score: 1

Mmmmm.... parrot.

spambayes by daviskw · 2004-01-13 17:55 · Score: 1

Dudes, get "SpamBayes" which uses a Bayesian filter to cut out the spam. It is super way cool and it works (mostly). Downside is that it still ends up on your system but it is marked and you can delete it without ever opening it. I use it to filter out up to thirty spam emails a day, and that includes anything from the democratic party.

--
Beware the wood elf!!!

Yay!! by maxinull · 2004-01-13 18:09 · Score: 1

Now we just need a thunderbird extension that tries to make sense of these unknown acronyms! AMLES- Alien Mutants Loose Enron Stock, KNSDL- Kolidascopes Never Seen During Lunchtime, DLKGJDIGLDMKLJLD- Dogs Lick... erm, nevermind...

MS Outlook voice recognition gibberish by Psykechan · 2004-01-13 18:12 · Score: 1

...and here I thought all these messages were from people using MS Outlook with voice recognition turned on.

Gibberish, or code? by cr0sh · 2004-01-13 18:16 · Score: 4, Interesting

I, too, have noticed these seemingly random words that seemed to have nothing to do with the main text of the spam. I have also noticed the "gibberish words". One of my thoughts was that it was for defeating or bypassing bayesian filters - and likely, that is the case. But my thoughts turned to another possible use...

What if spam - and the spammers' software - was actually being used by a third party in a surreptitious manner to send/receive messages? Kinda like plaintext stego. Maybe the software used by spammers is backdoored by this third party - he sends instructions to the machine(s), maybe via a virus or something simpler, the spammers send their messages, but "unknown" to them the spams have this garbage at the end. The spammer doesn't really care, maybe he bitches at whatever passes as tech support for the spam software. Most people who receive the spam see the stuff as garbage, or filter busters. But a certain group of the third party's friends - they have special email software that downloads these spams, and strips the garbage out, decodes it, and reassembles it into the real message. Maybe each spam only contains the equivalent of a couple of characters after decoding (maybe the garbage is actually packets telling order in the sequence, and other info to reconstruct the message) - but over a week or so, an entire message could be sent...

What is the possibility of that? Occam's Razor suggests otherwise, and filter busters are probably what the stuff is - but...what if...?

--
Reason is the Path to God - Anon

Re:Gibberish, or code? by Pelam · 2004-01-13 19:42 · Score: 1

But... what if... THEY are using it to coordinate
operations of infiltrated terrorist cells?

What are you trying to do here? Get spam kings shipped to Guantanamo Camp?

Combining two evils (erosion of human rights and spam) does not yield goodness.
Re:Gibberish, or code? by ckolar · 2004-01-13 19:47 · Score: 2, Informative

This really exists, www.spammimic.com. I'd swear that /. did a story on it when it came out. --ck
Re:Gibberish, or code? by Apathetic1 · 2004-01-13 20:19 · Score: 1

Most of my gibberish words are something along the lines of %randomtext% which seems to indicate that the person or persons sending me SPAM are incompetent and that the text really is for filter busting.

--
My username does not make me Apathetic. It's irony, get it?
Re:Gibberish, or code? by Steve+B · 2004-01-14 01:01 · Score: 1

What are you trying to do here? Get spam kings shipped to Guantanamo Camp?
My first choice would be a space launch with empty air tanks, but Gitmo will do.

--
/. If the government wants us to respect the law, it should set a better example.
Re:Gibberish, or code? by Steve+B · 2004-01-14 01:10 · Score: 3, Funny

What if spam - and the spammers' software - was actually being used by a third party in a surreptitious manner to send/receive messages? Kinda like plaintext stego. Maybe the software used by spammers is backdoored by this third party - he sends instructions to the machine(s), maybe via a virus or something simpler, the spammers send their messages, but "unknown" to them the spams have this garbage at the end. The spammer doesn't really care, maybe he bitches at whatever passes as tech support for the spam software. Most people who receive the spam see the stuff as garbage, or filter busters. But a certain group of the third party's friends - they have special email software that downloads these spams, and strips the garbage out, decodes it, and reassembles it into the real message. Maybe each spam only contains the equivalent of a couple of characters after decoding (maybe the garbage is actually packets telling order in the sequence, and other info to reconstruct the message) - but over a week or so, an entire message could be sent...
This would be a very useful method for terrorists -- it would not only conceal the message itself, but also would defeat traffic analysis (i.e. nobody would be able to tell who sent or received the message -- it's sent by a spam king and received by everybody).
About the only way to guard against it -- or find out if the terrorists are already using this channel -- is to anal-probe all spammers for their client lists, then anal-probe all the clients. Fortunately, the obvious criminal content of 99.9% of spam provides sufficient probable cause for such action.

--
/. If the government wants us to respect the law, it should set a better example.
Re:Gibberish, or code? by McLuhanesque · 2004-01-15 07:53 · Score: 1

What if spam - and the spammers' software - was actually being used by a third party in a surreptitious manner to send/receive messages? Kinda like plaintext stego.

Indeed, this is a more likely case than we may believe. A simpler version of plaintext stego, using full words intermixed in a quasi-meaningful paragraph, was being used by cells apparently linked to Middle East terrorists via many of the alt.sex.* usenet groups, and in particular, the alt.sex.stories.* groups. They appeared to be gibberish, but a word analysis easily broke through the simplistic attempt at stego and revealed sufficient keywords to account for the "chatter" we always hear about before the multi-colored alerts. Gibberish word spam provides much better cover via individual letter stego.

The use of plaintext stego to encode messages being sent out as spam is analogous to broadcast radio or TV. As long as you are within reception range of the broadcast, you can pick up the signal. This means messages can be retrieved from any number of disposable email accounts, so long as the account has been used in a place that is likely to be harvested, say one of the alt.sex.* usenet groups. As well, (as has been noted above) the sender is a spammer, notoriously hard to track down, and impossible to link to the originator of the encrypted message. So you have anonymous blind sender, sending to anonymous blind recipient. Perfect for terrorism, organized crime and all those other bad guys.
Re:Gibberish, or code? by gg510 · 2004-01-18 20:31 · Score: 1

Interesting point. At first it seems a bit far-fetched, but a few moments' thought shows a number of ways terrorists could use this method, most of which begin with "disguise their comms operations as spamhauses." Terrs could also steg messages into *pictures* embedded in spam.
Coders in the antispam community should consider collaborating on theoretical approaches to analysis of this stuff and perhaps write some apps that would do it, and send 'em in.
NSA may be ahead of the civilian world by a decade in most areas, but they may not have the kind of specialist expertise in this particular issue that the antispam community has. This is one of those areas where you may be able to make a real contribution to protecting us from the next 9-11 or worse.
One more thing: it may be that ultimately the only way to stop stego-spam is to stop spam entirely. What a great incentive for the Feds to start busting spammers!

Turning Disinterest to Interest by Vagary · 2004-01-13 18:19 · Score: 1

Because if they have spam filters, then they're not used to seeing spam! Therefore, each spam has that much more impact on the victim. And the spammers who manage to get through the filters, will gain an edge over those who don't.

This is especially important for spammers offering new products because the first one to get past some guy's filter is that much more likely to become that guy's source for Viagra^prime or whatever.

Sorry by Douglas+Simmons · 2004-01-13 18:21 · Score: 2, Interesting

"Getting the word out" to stop patronizing spammers will not curb spamming because spamming is a free, quick and easy method to reach however many people you want. Once you find yourself a list of harvested email addresses and an open relay, sending an advertisement to hundreds of thousands of people with a few clicks for zero dollars is something you would not be deterred from doing because of diminished hit rates caused by a campaign you're suggesting.

As time passes, more people figure out how to spam and more email addresses get snagged by harvesting. This will keep the flow of spam increasing exponentially no matter what curbs we come up with. At least it's creating a market for anti-spam products, as well as offering the larger ISPs something to claim they know how to defeat in their advertisements. Good for the economy.

Now what we do have a shot at getting rid of is real-life leafletting. Nothing pisses me off more than these Bush-approved illegals obstructing my path on the sidewalk to shove some piece of paper advertising cheap suits in my face. Maybe this is only something that bothers fellow New Yorkers though...

Re:Sorry by Anonymous Coward · 2004-01-14 01:14 · Score: 0

Nothing pisses me off more than these Bush-disapproving hate mongers obstructing my path on the sidewalk to engage in their self-indulgent 'protest'' cultural events and trying to hand me a leaflet.

Bayesian filters as translators by Lord_Dweomer · 2004-01-13 18:41 · Score: 1

It's funny, with the pace at which Bayesian filters are developing, I think it would be ironic if one day their language detection abilities were used to help translators pick up weird dialects of foreign languages and such by figuring out similar words or something.

--
Buy Steampunk Clothing Online!

I can think of a single effective measure... by Dimensio · 2004-01-13 19:38 · Score: 1

You only need one "technical" measure and one "legal" measure. The "technical" measure involves a loaded Desert Eagle pressed against the forehead of any and all email spammers and those who contract email spammers. The "legal" measure is a law allowing the person holding said Desert Eagle to pull the trigger.

--
STOP MISUSING APOSTROPHES, YOU MORONS!!!

Re:I can think of a single effective measure... by Tassach · 2004-01-14 02:28 · Score: 1

Desert Eagle? Nah, too quick & painless. Try a jar of honey, 4 tent stakes, some rope, and a nest of fire ants.

--
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?

Spamassassin crash and burn by Vulpine · 2004-01-13 19:41 · Score: 1

Spamassassin still works well against most spams, but lately I have been getting spams which only set off the HTML_MESSAGE filter or maybe the NORMAL_HTTP_TO_IP filter once in a while. I use pine, and all that will be visible is a line of words as described in the article, which looks something like this:

clink dietetic henchmen cranium songbag taxpaying what'd cosponsor galen cecilia pollard abandon amide backup contiguouclink dietetic henchmen cranium songbag taxpaying what'd cosponsor galen cecilia pollard abandon amide backup contiguous formidable machinery pontiac quark spontaneous seismology mantels formidable

Other than that junk, nothing else is visible in pine. Spamassassin is supposed to automatically feed a bayesian filter. Each time one of these spams slips through, I manually feed it to the Bayesian filter. I have been doing this for some time and it rarely seems to be catching the spam. I am getting more and more of these, so if slashdot readers have advice on bringing spamassassin back up to speed I would appreciate it. It is still much better with spamassassin than without but it is not as good as it was.

--
-- 'As it all washes away you know -- as it all is one, no one is alone.' -Cosmic Disorder

The real reason behind the weird typing in spam: by phaze3000 · 2004-01-13 20:07 · Score: 2, Funny

Narcoleptic spam creators

--
Blaming GW Bush for the Iraq war is like blaming Ronald McDonald for the poor quality of food.

Cloudmark SpamNet by cruachan · 2004-01-13 20:09 · Score: 1

Once again I think I should praise Cloudmark's SpamNet (http://www.cloudmark.com). Because this system ultimately relies on people eyeballing spam and designating it as such (but spreading the task around across several million people by a P2P network) it's never going to be fooled for long by anything the spammers can come up with.

OK, it costs a couple of $ every month, but it's supremely effective. And for the record I've no connection to them - just a very satisfied user.

Malformed HTML, bad methodology by wirelessbuzzers · 2004-01-13 20:36 · Score: 1

Your page contains malformed HTML. You have to put a semicolon after &lt or it won't (and shouldn't) work in most browsers.

As for your filter, it's inherently unscalable for several reasons.

1) Some of your phrases are found in legitimate emails. Certainly plenty of non-spammy emails, such as receipts (these are really hard to deal with) and legit mailing lists, would get caught by this. I've sent plenty of mails that match your patterns. For instance, s=splhigh(); {critical region} splx(s);. Also, I've sent messages which say "If you have received this in error", (plain text) mails discussing javascripts, and mails referring to spam and viruses.

2) The non-domain regexes will be obfuscated in most spam anyway; see the article.

3) An automated process for harvesting domain names would suck. You'd have to watch out for good names getting put on there by mistake, etc. So you have to enter it all by hand.

Rather than calling a message certain spam if it has one of these phrases, it should only be marked as probably spam. Which is exactly what a Bayesian filter does.

I use CRM114, which does Bayesian phrase analysis and white/black listing. Instead of working with words, it uses phrases up to 5 words long, including phrases that skip a word. Its tokenizer (the weakest part of any spam filter, but in this case the easiest to edit because most of it is in a script) is pretty good, and no spam has gotten through it in half a year, and only a few false positives (mostly receipts). I don't get much spam in the first place, but this is still pretty impressive. The downside of CRM114 is that its data files are huge and it can sometimes be piggish on memory and CPU. This might preclude its use in huge domains, but for a few dozen it should be no big deal.
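
To make the "phrases that skip a word" idea concrete, here is a simplified sketch in that spirit. It is not CRM114's actual tokenizer or hashing, just an illustration of the feature set: slide a 5-word window over the text, always keep the first word, and emit every combination of keeping or skipping the words after it. The function name and the `<skip>` placeholder are made up.

```python
from itertools import product

WINDOW = 5
SKIP = "<skip>"


def skip_phrases(tokens, window=WINDOW):
    """Return phrase features of up to `window` words, allowing skipped words."""
    features = []
    for i in range(len(tokens)):
        win = tokens[i:i + window]
        head, rest = win[0], win[1:]
        # Each following word is either kept or replaced by a placeholder.
        for mask in product((True, False), repeat=len(rest)):
            phrase = [head] + [w if keep else SKIP for w, keep in zip(rest, mask)]
            # Trailing placeholders carry no information, so drop them.
            while phrase[-1] == SKIP:
                phrase.pop()
            features.append(" ".join(phrase))
    return features
```

A full 5-word window yields 16 features, so the feature count grows quickly with message length. That fits the remark above about huge data files and heavy memory use.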

--
I hereby place the above post in the public domain.

Gee really? by Anonymous Coward · 2004-01-13 20:47 · Score: 0

Glad we have spam experts to tell us that "gibberish is rapidly becoming a common component of spam".

They must be pretty smart to have figured that one out. Wowwy.

And I thought... by Explo · 2004-01-13 20:49 · Score: 1

...that the gibberish would just indicate that spammers have consumed so much of their own wonderful medical breakthrough products that their brains had finally completely rotted.

--
Everyone who makes generalizations should be shot.

The perfect solution by t_allardyce · 2004-01-13 21:04 · Score: 1

We don't need more anti-spam software; none of it works 100%, and when it does someone just comes up with a new way around it. What we do need are secretaries. Hot, spam filtering secretaries! Who's with me here?

--
This comment does not represent the views or opinions of the user.

"Intentionally" misunderstanding. by Anonymous Coward · 2004-01-13 21:07 · Score: 0

"Innocent entrepreneurs don't go out of their way to try to hack their data into other people's computers, past programs that are every bit as clear a sign of intent as a "No Soliciting" sign on your door."

Huh? We expect spammers to understand and follow the intent of an anti-spam program, while a large portion of the public doesn't understand or follow the intent of either Satellite TV, or Cable TV encryption.

Maybe a little less hypocrisy in the world, would be a good thing.

I receive this today by lonesome+phreak · 2004-01-13 21:23 · Score: 1

I received this as spam today from kwlnz@mail.ru:

Free CableTV!No more pay!%RND_SYB
cruddy ababa automorphic arsenic combat jan camaraderie denunciate cacm contestant seamy roommate blind acrobat bedridden calcine interpolate calamitous
mahayana rotogravure idiomatic dairylea browne stabile assess procaine metabole mantic tasteful strata diluent acreage fifteenth belie justinian animal suffice chantey refer convolve raven fe
inconvenient jinx divisible singable douglass derelict acclaim infighting belshazzar of mahayana against asia autocracy amphibology soon friable midsection swede scott abysmal delete cloven bootes definition asleep hypophyseal antisemitic surveillant harrington
ballerina chaos coarse scoria clio papal chaotic immobile ellsworth ballroom impassion poole dirichlet smooch propriety applicate batavia ramo approval locution scrapbook diebold saloonkeep metcalf find girlie cam capillary circular film drew functionary sprain frazzle apocalyptic drove shasta longleg ethereal arena
drake concurrent fetish nell balm cramp boatswain veracity papua des delirium ignoble numb
compass die dreg whereof corruptible sheldon arbutus abstruse filled contraceptive suzerainty threesome contestant passage brahmaputra polariton obscene
olaf ejector brocade codpiece gout creating therewith accordant tango injure juridic catalyst contusion delude accusatory pestle efficient abner check johnson conversation etch yakima workplace astronomer aquarium inequity spore essential abscess chapter valent absorption dorothy creep backup seventeen
coexist round embroider anastasia bunyan desmond fuchsia fermi toyota debenture exotica congresswoman cereus hollingsworth galaxy retch vocabularian bullet impel ephemerides estrange correct jubilant destinate laudanum atrocious gunshot jessie elector diamagnetic garvey else
confidential cruelty blurt grizzly brainchild memo anthology existential sawfish lukemia hickman vaporous dempsey disputant consent accessory civic benign airfare extrusive edmund rever assert minnesota
befitting glucose agnew radiometer member hypothalamus yaw blum deify goofy bind dod obey monoxide
breathtaking gallonage marx address uganda annex satan unruly precede botany fog pianist pejorative sue edison firemen veritable varian aitken actinium highwaymen magma oresteia accusative

--
Maybe we DID take the blue pill. You wouldn't remember anyway.

Re:I receive this today by cmacb · 2004-01-13 22:57 · Score: 2, Funny

I think what you have there is a list of next year's Grammy award winners.

Humans by bruthasj · 2004-01-13 21:41 · Score: 1

We, as people, do a darn good job of filtering spam without even looking at the mail headers or body of the message. I know this message will go relatively unread since this story has been up for awhile now, but think about that. When you *do* have to click through to delete stuff, we as humans just need to look at the Subject and the From and can nail spam nearly 99% of the time.

If someone can encapsulate common sense into a program and map it against the Subject/From, then who cares about the content of the emails!

Re:Early Post by Anonymous Coward · 2004-01-13 22:10 · Score: 0

Anyone thought of preventing AC postings till the "subs only" embargo gets lifted?

The next Slashdot story will be ready soon. Trolls had better get their crapflooding material together.

You are disproving yourself by Anonymous Coward · 2004-01-13 22:13 · Score: 0

no legitimate reason that you would see "V1agra", "\/iagra", "Vi@gra", or the like.

What about people who find your comment very insightful and want to email it!?

You guys are missing the real opportunity by Anonymous Coward · 2004-01-13 22:34 · Score: 1, Interesting

Spammers typically are looking for two responses to email.

1. Go to a website
2. reply to their email

The answer to 1 is to simply dump all embedded html. Problems solved. Nobody I know ever needs to send me email disguised as a web page. And yes mom, that means you have to lose that gawd awful floral background in all of your 'how are you son' emails.

The answer to number 2: (the other number 2)

What we need is not a better filter, we need a better response mechanism.

Spammers rely on the fact that smart readers who are non-customers will not respond to their ads. This reduces the responses they receive to legitimate customers, people who are simply verifying their email addresses by asking not to be spammed, and of course spam from other spammers.

What if, instead of never responding to spam, everyone automatically responded to every spam with 'canned ham'... seemingly sincere messages filled with info culled from the spam itself.

Yes, please make my P3nis larger. 13" is no longer interesting now that everyone is growing beyond belief thanks to your wonderful products. Please send your wonderful pen!s enlarging kit to me right away. Do you need a credit card? Tell you what, don't use this email address, use my hotmail address. areallybigone@hotmail.com

With everyone responding to all the spam but with no intent of following up on the correspondence, the spammers' inboxes will be flooded with responses from legitimate accounts, all of which will seem like willing customers but will in fact be completely useless.

A few things. One is that it will render every mail list completely useless. It will give spammers a taste of their own medicine. It will vastly increase the amount of mail traffic for a very short amount of time, causing the ISPs to take notice and perhaps fscking do something about the spam problem. It will be mildly humorous in the short term to watch all the spammers drown in a sea of BS email.

It can be quite poetic by lxs · 2004-01-13 22:37 · Score: 1

I seem to be the target of a troupe of spamming absurdist poets. I don't know what they're selling, but I feel culturally enriched.

My favorite:

phelps appliance ballet

eohippus dressmake filibuster drape
fifth pyridine europe employer hegemony excusable perspicuous plywood purina blatz
fix avery border paradoxic foul midshipman exclamation

I have a vision of this text being performed in a burnt-out warehouse by a guy in dreadlocks, while I'm sipping a fine glass of Chardonnay.

Are they promoting wine?

Actually, nothing evil here... by Cactus · 2004-01-13 22:52 · Score: 1

... as you can see in this article :)

--

Guikachu: Resource editor for PalmOS developers

If they are clever enough to do this by ChessHacker · 2004-01-13 22:57 · Score: 1

Why don't spammers put their collective brain power into making money by legitimate means?

They are obviously a talented and wealthy bunch of people.

Then filter it by Sindri · 2004-01-13 23:12 · Score: 1

"... filter-foiling gibberish is rapidly becoming a common component of spam."

Why don't the spam filters filter gibberish then?

--
Sindri Traustason.

Re:Then filter it by oshy · 2004-01-14 00:31 · Score: 1

You would have to run it through a grammar filter to tell if the strings of words made valid sentences. It would probably filter out half the posts here.
Re:Then filter it by thbigr · 2004-01-14 01:18 · Score: 1

I am sure you could use a "fuzzy logic"-like filter and score the email based just on whether the words are spelled right.

I have thought of this and am surprised no one seems to be doing it, not Yahoo or Hotmail that I can see.

--
Come the revolution, the Bourgeois, Capitalistic, "A PARKING STICKER HOLDERS", will be first against the wall!
Re:Then filter it by oshy · 2004-01-14 04:14 · Score: 1

They use correctly spelled words in the junk mail. It would get a higher score than an e-mail from me for spelling. And what about those who send e-mails that look like their text messages? NNE F THM WLL GT THRU. Sounds like a good idea just for that.
Re:Then filter it by thbigr · 2004-01-14 07:21 · Score: 1

In all the ones I get, they ARE misspelled. They are all like this:

Want a biogger ARDDER STAAAFFF.

--
Come the revolution, the Bourgeois, Capitalistic, "A PARKING STICKER HOLDERS", will be first against the wall!

Yes and no by 87C751 · 2004-01-13 23:16 · Score: 1

That's the text/plain part you see. The "advertisement" is in the text/html part.

Not necessarily. When the nonsense pieces first began arriving, I saw a lot of them that had no text/html part, but only a series of gibberish words. It's only been the last couple of weeks that I've noticed both 2-part gibberish pieces and pieces that lead with a link before the (other) garbage. Some of them also have an image link in the plaintext section. I've been feeding all the ones in my inbox to 'sa-learn --spam', so the number isn't growing very fast for me.

--
Mail? Put "slashdot" in the subject to pass the spam filters.

Use SPF! by TheMidget · 2004-01-13 23:30 · Score: 2, Informative

Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold.

That's what SPF is for. It allows the owner of a domain to publish a specification of the IP addresses which are allowed to use that domain name (foo.com). If somebody who claims to be pete@foo.com now attempts to send a mail to an SPF-enabled receiver, his mail is rejected, because his IP is not in the foo.com approved set.

Rejection happens immediately on submission, so the mail stays on the fraudulent server.

"SallySmith@aol.com" probably did not send spam-mail from a ".kr" ISP.

Nor would that mail be accepted by an SPF-enabled sendmail. Indeed, AOL is one of the first major ISPs to have published SPF records.
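
A rough sketch of the check described above, in Python. This is not a real SPF implementation: the table of approved networks is hypothetical (an actual receiver would fetch the domain's published record from DNS), and the names `APPROVED_NETWORKS` and `sender_is_approved` are made up for illustration.

```python
import ipaddress

# Hypothetical stand-in for a domain's published data; a real receiver
# would look this up in DNS instead of hard-coding it.
APPROVED_NETWORKS = {
    "foo.com": ["192.0.2.0/24", "198.51.100.17/32"],
}

def sender_is_approved(mail_from: str, client_ip: str) -> bool:
    """Return True if client_ip is in the approved set for the sender's domain."""
    domain = mail_from.rsplit("@", 1)[-1].lower()
    networks = APPROVED_NETWORKS.get(domain)
    if networks is None:
        # The domain publishes nothing, so this sketch has nothing to
        # check against and lets the mail through.
        return True
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(net) for net in networks)
```

With this table, `sender_is_approved("pete@foo.com", "203.0.113.9")` is false, so the receiver could refuse the message during the SMTP conversation rather than bouncing it later.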

Bother by Scorchio · 2004-01-13 23:54 · Score: 1

What are you trying to do here? Get spam kings shipped to Guantanamo Camp? Combining two evils (erosion of human rights and spam) does not yield goodness.

I must be a really, really bad person, because I immediately thought, "yes!".

Re:gibberish... Solution: Spellcheckers by G4from128k · 2004-01-14 00:31 · Score: 2, Interesting

I'm surprised that spam filtering software doesn't just run a quick spellchecker on the email. So much spam tries to evade literal word filtering by clever spellings of p3nis and \/iagra. But if we filter out emails with too many spelling errors (and punctuation-addled non-words) in the subject and body, then all those clever ploys are for nought. (As a side benefit, more people would be careful about spelling in legitimate e-mails).

Filtering out misspelled emails puts spammers in a real quandary -- spell words correctly (and get filtered) or misspell (and get filtered).
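
A minimal sketch of that idea in Python, assuming a plain set of known words stands in for a real spellchecker. The word list, the function name `misspelled_fraction`, and the punctuation handling are all illustrative choices, not taken from any existing filter.

```python
import re

# Tiny illustrative word list; a real filter would load a full dictionary.
KNOWN_WORDS = {
    "buy", "cheap", "and", "pills", "now",
    "the", "meeting", "is", "at", "noon",
}

def misspelled_fraction(text: str, known=KNOWN_WORDS) -> float:
    """Fraction of whitespace-separated tokens that are not dictionary words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    unknown = 0
    for token in tokens:
        # Trim ordinary punctuation at the edges of the token.
        word = token.strip(".,!?;:\"'()")
        # Anything with digits or symbols inside (p3nis, \/iagra) counts
        # as a non-word, as does any word missing from the dictionary.
        if not re.fullmatch(r"[a-z]+", word) or word not in known:
            unknown += 1
    return unknown / len(tokens)
```

A filter could add to a message's spam score once this fraction passes some cutoff. Where to put that cutoff is a tuning question this sketch does not answer.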

--
Two wrongs don't make a right, but three lefts do.

Just got one by cjthompson · 2004-01-14 01:21 · Score: 1

So, I fire up /. and start reading about a new spam technique, then I look in my inbox and get this.... Hello, This pro.gram wo.rked for me. If you hate S_pa_m like I do, you o w e it to your self to try this pro-gram, and forward this email to all of your fri.ends which also hate S+P_A+M or as many people possi.ble. Together lets help clear the Internet of S+P*A+M! STOP .S_P*A+M IN ITS TR.ACKS! Do you get jun.k, scams and wo.rse in your i.nbox every day? Are you sic.k of s.pending valuable time re.movi.ng the trash? Is your ch.ild recei.ving inappro.priate a_d*u_l*t material? If so you sh.ould know that no othe.r solution wo.rks better then our softw.are to return con.trol of your e.mail back where it belongs! Ima.gine being abl.e to read your impor.tant em.ail without loo.king thr.ough all that s*p+a*m... C.lic_k bel.ow to vist our website: http://www.Stop6The3Spam9Already.com

Re:gibberish... Solution: Spellcheckers by You're+All+Wrong · 2004-01-14 01:21 · Score: 1

Because that's useless against mails which contain source-code snippets, or .procmailrc snippets, or unix command line examples, or ... . All of those fail a spell-checker miserably.

I blocked a mate's mail just yesterday as it had
$ command < something > something_else
because my blocker recognised that <something> is not a valid HTML tag. Too clever for its own good...

YAW.

--
Your head of state is a corrupt weasel, I hope you're happy.

Threshold? Bah! by glpierce · 2004-01-14 01:34 · Score: 3, Interesting

I'm worried about spammers realizing that they can effectively negate the usefulness of filters without breaking a sweat (spammers, please don't read the following). If they switched from super-short fake messages to mock-real messages (a paragraph or two long, a legit-sounding subject, etc.) and they all sent out millions a day, everyone would be forced to turn off their filters. There would be no effective way to distinguish those fake messages from real messages for most people (without a whitelist/blacklist system, which does more harm than good for most).

In such a situation, email would grind to a halt. Anyone who kept trying to train their filters would just end up blocking most legit emails, and those who don't train for it or turn off would be flooded with real and fake messages they can't distinguish between. The messages would even be profitable, so long as your "friend" included a link to some "cool website" that happens to sell [fill in spam product here]. Go ahead and train your filter to block emails containing URLs. Hah! Maybe if you don't have a job, friends, or buy things over the internet you can, but for most it's just not going to work.

--
G

Re:Threshold? Bah! by bhtooefr · 2004-01-14 14:46 · Score: 1

However, the more messages you get, the more tuned your filter is (if you've categorized them right). Also, on SpamBayes (if you can't get people off of lookOut, at least get them on a Bayesian filter), the chances that a legit mail gets marked as Certain Spam are slim - it could very easily go into Unsure Spam, which is designed for messages that score too high to hit the Inbox, and too low to hit the Certain Spam filter. The larger your corpus, the more accurate it'll get, and the only way for sure that the spammers can get in is if they have access to the parts of your corpus that say what lowers the score. Bayesian filtering will last for a while, especially when combined with conservative blacklists, and an encrypted corpus (so that Joe Spamcracker can't get into your corpus as easily).
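
The three-way split described above can be sketched in a few lines of Python. The cutoff values and the exact boundary handling here are assumptions chosen for illustration, and `classify` is a made-up name rather than SpamBayes's own API.

```python
def classify(score: float, ham_cutoff: float = 0.20, spam_cutoff: float = 0.90) -> str:
    """Map a spam probability in [0, 1] to one of three folders.

    The cutoffs are illustrative defaults; tune them to your own mail.
    """
    if score >= spam_cutoff:
        return "spam"      # Certain Spam
    if score > ham_cutoff:
        return "unsure"    # too high for the Inbox, too low for Certain Spam
    return "ham"           # goes to the Inbox
```

The middle band is what keeps a borderline legit message out of the spam folder: it lands somewhere a human will still look at it.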
Re:Threshold? Bah! by glpierce · 2004-01-15 08:14 · Score: 1

I think you missed my point. If the spammers just send pseudo-legit messages, there won't be any way to distinguish them from legit, no matter how good your filter is. If the only statistical difference between them and legit is the fact that you don't know them (and the content is incorrect/inapplicable), computer filters will be useless.

Open up the last legit email you got from a friend. Now imagine that exact email, using different names, was sent to a million addresses. Imagine you were a recipient - how could your filter tell the real email from the fake?

--
G
Re:Threshold? Bah! by bhtooefr · 2004-01-15 12:19 · Score: 1

However, my point was that the chances that one guy's corpus is going to be identical to the next guy's corpus are VERY slim. Also, they have to make their sales pitch somewhere. Any random e-mail from a friend most likely won't have a sales pitch in it. If it's a fully pseudo-legit message, that's a crapflood, not UCE.
Re:Threshold? Bah! by glpierce · 2004-01-15 14:43 · Score: 1

You can make a pseudo-legit message sell something. Are you telling me friends don't recommend websites through email where you're from? Spammers are known for trying to shout their product at you, but it doesn't take a genius to be discreet.

Crapflooding is essentially what I'm worried about, though - if spammers do it enough, filters will cease to be useful (spammer benefit: less people will use them). From my end, it just means a whole lot of junk I can't filter.

--
G

And now the "Cmabirgde Sutdy" is being exploited. by CausticPuppy · 2004-01-14 01:46 · Score: 1

You know, the infamous Cambridge Study that made its way around the net a few months back, which shows that the human brain still easily reads words even if the letters are mixed up, just as long as the first and last letters are correct.

Now this is being exploited by spammers to circumvent filters. Example of one I received today in my "suspect email" folder:

#1 Spupelment aavilable! - Works!

*New* Enahncement Oil - Get hard in 60 seocnds! Amzaing!
Like no ohter oil you've seen.

And naturally it's followed by a block of a couple hundred random dictionary words.
I wonder how well the Bayesian filters are working for this (hash-buster aside)?
I had to resort to activating a whitelist on my ISP's spam filter.
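
One way a filter might undo this trick, sketched in Python: reduce every word to its first letter, its sorted middle letters, and its last letter, so a scrambled word and its original collapse to the same key. The watch list and the names `scramble_key` and `unscramble` are hypothetical, and this only catches words whose first and last letters were left in place.

```python
def scramble_key(word: str) -> str:
    """First letter + sorted middle letters + last letter, lowercased."""
    w = word.lower()
    if len(w) <= 3:
        return w  # nothing in the middle to rearrange
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]

# Hypothetical watch list, indexed by scramble key.
WATCHED = {
    scramble_key(w): w
    for w in ("supplement", "enhancement", "seconds", "amazing")
}

def unscramble(word: str):
    """Return the watched word this token is a scramble of, or None."""
    return WATCHED.get(scramble_key(word))
```

Feeding the unscrambled form to the filter instead of the raw token would let the existing statistics for "supplement" apply to "Spupelment" too.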

--
-CausticPuppy "Of all the people I know, you're certainly one of them." -Somebody I don't know

Me too.. by b0bby · 2004-01-14 01:54 · Score: 1

Most of the spam that gets through my filters these days either has gibberish or chunks of classic literature. This morning I got one with a bit of Tom Sawyer...

Also. by Raven42rac · 2004-01-14 01:55 · Score: 1

I have also noticed nonsense senders, like "Lascivious P. Eviscerated". Weird, wild stuff.

--
I hate sigs.

Spambayes doesn't work well for me by jridley · 2004-01-14 01:55 · Score: 1

I don't know what the deal is, but Spambayes starts out working very well, but after a month or two, it starts getting less and less accurate, and if I let it go long enough, it's pretty much worthless.
I've tried both the Outlook plugin and the standalone feeding into Agent.
Now I'm using Popfile, and it's working great. It did take noticeably longer to get accurate than Spambayes did, but it's still working after 4 months.
FWIW, I've been getting the random word stuff for a while and popfile has been doing pretty well, I'd say 98% correct positive, false negatives only coming from one guy, I haven't bothered trying to figure out why but I wouldn't be surprised if his place of employment is running an open relay...

Multiple spam filtering methods by zarq · 2004-01-14 01:58 · Score: 1

This is just another example of why spam filtering methods should be combined. My bogofilter has a nice database of spam phrases, but it would have marked more spam as "maybe spam" if my email provider didn't also tag all email with spamassassin. It also helps to combine this with several RBL services, and maybe some hand-made identification regexps in procmail (or maildrop).

Using just one method at a time is no longer enough. It was a good thing when spamassassin introduced bayesian filtering, but by default it assigns way too low scores to be really useful.

spellcheck ? by ivar · 2004-01-14 02:05 · Score: 1

So why not run received email through a spell checker counting the % of unknown/misspelt words and add that to the properties examined by a filter... sure it'll eat up some extra processing power but it'd be worth it. Hmm.. Actually, this would work for a while then lead to randomly inserted correctly spelled words. I guess a decent grammar checker (does one exist?) would be required as well.. arg.

OK, let me ask the dumb question: by OmniGeek · 2004-01-14 02:17 · Score: 1

Is it possible/practical to automate the comparison of source IP address vs stated source ID and detect forged headers? It seems to me that including a workable forged-source-address detection system into a mail transfer agent would be a useful thing to do, assuming it can be done so as not to break legitimate mailings.

I'm not very familiar with the relevant RFCs, and don't think researching the issue on my own is a good investment of limited time. Perhaps someone here does know...

--

"My strength is as the strength of ten men, for I am wired to the eyeballs on espresso."

Self-defeating spam? by autophile · 2004-01-14 02:29 · Score: 1

I wonder if spam will become self-defeating. Eventually the filters will become so good that the only things that can make it through are legitimate mail, and gibberish spam. But if the spam is so obfuscated, imagine the reaction of the typical rube who actually responds to spam: "Duh.... 'STILL NO LUCK ENRGAILNG IT?' What's enrgailng? Is that some kind of financing?"

--Rob

--
Towards the Singularity.

Also, get the word out that... by Anonymous Coward · 2004-01-14 02:33 · Score: 0

...infomercials sell junk

...diet pills don't work

...there is no monied Nigerian in trouble

..."Episode III" will be just as bad as the others

...Scientology is a scam

...your burger won't really look like that

...hot chicks won't appear when you drink Schlitz

...SNL isn't funny

...pop divas and boy bands are lip synching

...what that politician said, was a lie

oh, and:

...a sucker is born every minute...

my problem w/ spam..... by preclose · 2004-01-14 02:34 · Score: 1

My problem is that I'm really wanting to increase my penis size 42 inches in 3 days but I only get offers to incr.eaz mii p3N>?is. My p3N>?is is fine. It's my penis I'm worried about.

average word length tests by geoff+lane · 2004-01-14 02:44 · Score: 2, Interesting

When SCO, sorry CoS, were spamming ARS a couple of years ago, it was possible to kill 99% of the spam just by computing the average word length in the spam. Ordinary humans generated messages with an average word length of 4.5 letters; CoS random word spam had an average word length of 5.5 letters.

I was surprised that such a simple test worked so well.

One day I must re-implement the test for email spam and see if it works as well.
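
For anyone who wants to try it, a minimal Python sketch of the test. The 5.0 threshold is an assumption (simply the midpoint of the two averages quoted above), and the function names are made up.

```python
import re

def average_word_length(text: str) -> float:
    """Mean length of the alphabetic words in text (0.0 if there are none)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)

def looks_like_random_words(text: str, threshold: float = 5.0) -> bool:
    """Flag text whose average word length exceeds the threshold."""
    return average_word_length(text) > threshold
```

Dictionary-word gibberish tends to score high because it lacks the short function words ("a", "the", "of") that pull the average down in ordinary writing.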

Re:average word length tests by MCZapf · 2004-01-14 04:28 · Score: 1

Spammers probably read Slashdot. Don't give too much away on how you fight spam!

Drop it and black hole it for a few minutes. by khasim · 2004-01-14 02:55 · Score: 1

Once you've established that some site is spamming you, it would be nice to have your server automatically NOT respond to ANY more traffic from that site for a variable length of time (1-10 minutes).

I wouldn't recommend dropping it forever, but it is most likely an open relay or spam-friendly ISP. Why even accept connections (and let them eat up your bandwidth) with their crap?

There'd be a problem with some sites like earthlink which seem to send me lots of spam at irregular intervals, but that's why a few minutes of "time out" should be enough to stop them. The mail will sit on their servers and their admins can deal with it.
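
A sketch of the bookkeeping this would need, in Python. The class and method names are hypothetical, the 1-10 minute range comes from the suggestion above, and hooking it into an actual mail server is left out entirely.

```python
import random
import time

class TemporaryBlackhole:
    """Remember which source IPs are in 'time out' and until when."""

    def __init__(self, min_seconds=60, max_seconds=600, clock=time.monotonic):
        self.min_seconds = min_seconds
        self.max_seconds = max_seconds
        self.clock = clock              # injectable so the logic can be tested
        self._blocked_until = {}        # ip -> expiry time

    def punish(self, ip: str) -> None:
        """Start ignoring ip for a random 1-10 minutes (by default)."""
        duration = random.uniform(self.min_seconds, self.max_seconds)
        self._blocked_until[ip] = self.clock() + duration

    def is_blocked(self, ip: str) -> bool:
        """True while ip is still inside its time-out window."""
        expiry = self._blocked_until.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._blocked_until[ip]  # time-out over, forget the entry
            return False
        return True
```

The server would call `punish()` when a message from that address is judged to be spam, and check `is_blocked()` before accepting a new connection.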

Re:Drop it and black hole it for a few minutes. by mabhatter654 · 2004-01-14 04:15 · Score: 1

But you still have to clean up the messages...it's easiest to simply check IP and subject...but if the subject is different every time it's not quick and easy...Remember lots of people use web mail now...the added space is the ISP's problem!

Right, I got the proof! by QaDeS · 2004-01-14 03:14 · Score: 1

Analyzing some of my SPAM and doing some binary maths about the frequency of substituted and inserted letters, I found hidden messages in about 80% of the mails:

0.3% BUY VIAGRA
2.7% BUY Windows
4.8% xvus apoejfjjea dkkskkd aejjfjeopa suvx (see, it's a palindrome!)
90.2% ALL YOUR BASE ARE BELONG TO US

strange...

Re TMDA by nexus987 · 2004-01-14 03:27 · Score: 1

I'm a big advocate of TMDA and other challenge-response e-mail systems (I used ASK - Active Spam Killer and get zero spams as well). One of the main complaints I usually hear about this is that it's too easy to "Joe Job" someone with these systems. It just occurred to me that if SPF is widely implemented, this will no longer be a problem.

Re:Re TMDA by Trejkaz · 2004-01-14 10:48 · Score: 1

Absolutely. This issue of people sending from others' email addresses is actually why I run SpamAssassin alongside TMDA. If SPF comes into full use, I will no longer need SpamAssassin, in theory.

--
Karma: It's all a bunch of tree-huggin' hippy crap!

Spam filtering by Jolly+Tom · 2004-01-14 03:39 · Score: 1

Perhaps one could use (or develop) a spam filter that has a dictionary lookup, and rejects spam based on non-legitimate words.

Re:gibberish... Solution: Spellcheckers by pyser · 2004-01-14 03:53 · Score: 1

What's needed is to combine a spelling checker with a syntax checker. That would get rid of strings like 'peephole clockwise tachometer nocturne hodges jest prolix' that would pass the spelling checker unscathed.

A variation on a spam theme by Moose4 · 2004-01-14 03:56 · Score: 1

I've gotten two spams recently with an alternate version of this technique. They don't use random words, they use random gibberish. There's ten or so lines of "xyswieour iowruskldjf sfzzsfds, sdfklsjl weroius xyzzy."-type stuff at the bottom. I don't get spammed enough to need a spam filter (yet), so I don't know anything about Bayesian filters--do garbage characters like this defeat them?

--
"Settle down, Beavis. We've got an experiment to do."

No more spam for me.... by dthatcher · 2004-01-14 04:14 · Score: 1

I don't see any reason why ANY company should have a problem with spam. At my company, we run our mail through a communigate relay before it gets to our main mail server. The communigate server is set to do several things:

1. Reverse lookup verification
2. Check the Spamhaus RBL
3. Check the SpamCop RBL
4. Check the Open Relay RBL

Also communigate's generic spam filtering is turned on. Guess what? No more problem with spam. None. Sure, the virus-propagated emails get through, but their attachments get deleted because our firewall scans the attachments for viruses. The only thing we have had to do is whitelist a number of domains, but any spam solution is going to require tweaking.

BTW, if anybody knows a good, *FREE* Dynamic IP RBL I'd like to hear about it.
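
For anyone wondering what an RBL check involves: DNS-based lists are conventionally queried by reversing the octets of the connecting IPv4 address and appending the list's zone, then doing an ordinary DNS lookup on the result. A listed address resolves; an unlisted one does not. The Python sketch below only builds the query name. The zone `dnsbl.example.org` is a placeholder, not a real list, and `dnsbl_query_name` is a made-up helper name.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a given list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"
```

So a connection from 192.0.2.10 checked against the placeholder zone becomes a lookup for `10.2.0.192.dnsbl.example.org`.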

Gods, I hate HTML Email by DigitalSorceress · 2004-01-14 04:20 · Score: 1

I always set up my email client to send only Plain Text, and to strip HTML from incoming email (noHtml for Outlook, or for Outlook 2002 SP-1, the Microsoft Registry Fix)

I realize I'm going to sound like a Luddite here, but I just don't have the overwhelming need to send people emails with lightly shaded text over a really busy background, and I certainly HATE it when people send those to me.

That in itself is reason enough to strip out all HTML and/or convert to plain text, but I notice that spammers use nonsense markup tags, or even just lots of FONT tags to break up words invisibly. My current spam filtering is no help because it only filters the source code. (I'd love them to add a "post-render" phase of scanning where it checks through the message contents that are viewable by the user)
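
A crude approximation of such a "post-render" pass, sketched with Python's standard `html.parser` module: throw away every tag (real or nonsense) and every HTML comment, and keep only the text between them. The names `_VisibleText` and `visible_text` are made up, and this does not try to handle text hidden by styling or the contents of script/style blocks.

```python
from html.parser import HTMLParser

class _VisibleText(HTMLParser):
    """Collect character data and ignore all tags and comments."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def visible_text(html_source: str) -> str:
    """Return the text of an HTML body with all markup stripped out."""
    parser = _VisibleText()
    parser.feed(html_source)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

Running the word filter on `visible_text(body)` instead of the raw source means a word split up as `Vi<font size="1"></font>ag<xyzzy>ra</xyzzy>` is seen as "Viagra" again.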

--

The Digital Sorceress

Re:Gods, I hate HTML Email by a24061 · 2004-01-14 21:10 · Score: 1

I realize I'm going to sound like a Luddite here, but I just don't have the overwhelming need to send people emails with lightly shaded text over a really busy background, and I certainly HATE it when people send those to me.
I agree with you 100%. I also detest e-mails with large attachments, especially in proprietary formats.

Re:My Bayesian filter is slowing becoming a whitel by Anonymous Coward · 2004-01-14 04:21 · Score: 0

I think whitelists as practiced in their most-evolved forms (challenge/response) ARE the way to go at the moment. Content filtering is fighting the wrong battle as I have seen it practiced at my workplace.

I rely on Mailblocks for all personal mail and it has utterly eradicated spam(zero spams in 0 months... yeah that's eradication for a guy receiving 200 a day). While I can imagine ways to circumvent it (though perhaps not profitably so), I really have trouble seeing any other unilateral choice that requires no administration/wizardry from the user performing to this level of satisfaction, and no requirement to cajole sysadmins/ISPs to buy into a platform.

Earthlink has a similar challenge-response system, but I'll briefly touch on the wrinkles of Mailblocks and why I feel it has elevated whitelists above the admin-heavy yokes they can be in unadorned form.

1. Successful response requires typing the letters/digits in an image, as seen elsewhere today. This has yet to fall to hacking, and if it is hacked it can adapt to defeat it. For ONCE, this places the onus of developing hard technology on the spammers and not on those trying to defeat it (think of the annoying trend of sending spam in which images display the pitch in text).

2. Email addresses you send email to are, by default, automatically added to your personal whitelist if not already explicitly listed on your blacklist (which is generally not needed though you should add your own address to it).

3. You can pre-seed the list by uploading your contact list to avoid having your transition to c/r become an imposition on those you already communicate with.

4. People who clear a challenge/response when communicating with ANY Mailblocks customer are added to a common whitelist. This means that people should see only one c/r and not one per Mailblocks customer they correspond with.

Other wrinkles are nice (e.g.: keep your old email addresses), but not fundamental to the anti-spam abilities.
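
To make points 2 and 4 concrete, here is a toy model in Python. It is only a sketch of the rules as described above, not how Mailblocks is actually built: the class name, method names, and return strings are all invented, and the image challenge itself is left out.

```python
class ChallengeResponseInbox:
    """Toy model of one subscriber's challenge/response rules."""

    def __init__(self, shared_whitelist=None):
        self.whitelist = set()   # personal whitelist
        self.blacklist = set()   # personal blacklist
        # Whitelist shared by every subscriber of the (hypothetical) service.
        self.shared_whitelist = shared_whitelist if shared_whitelist is not None else set()
        self.held = {}           # sender -> messages waiting on a challenge

    def record_outgoing(self, recipient: str) -> None:
        """Point 2: people you write to are whitelisted unless blacklisted."""
        address = recipient.lower()
        if address not in self.blacklist:
            self.whitelist.add(address)

    def receive(self, sender: str, message: str) -> str:
        address = sender.lower()
        if address in self.blacklist:
            return "rejected"
        if address in self.whitelist or address in self.shared_whitelist:
            return "delivered"
        # Unknown sender: hold the mail and send a challenge.
        self.held.setdefault(address, []).append(message)
        return "challenged"

    def challenge_passed(self, sender: str) -> list:
        """Point 4: a cleared sender joins the shared list; release held mail."""
        address = sender.lower()
        self.shared_whitelist.add(address)
        return self.held.pop(address, [])
```

Two inboxes constructed with the same `shared_whitelist` set behave like two customers of one service: a sender who clears a challenge for the first is delivered straight away to the second.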

I would not say that I regard the system as perfect now, that it has no issues, or that I regard it as perfect for the long term. There are ways it can improve further, and if it became very widely used, its fragilities would become a dedicated focus for attack by spammers.

The primary frailty I see for Mailblocks comes in the form of the following example:

Earthlink's c/r service sends its challenges NOT with the subscriber's email address as the "from" or "reply-to" address, but instead claims to be from "automated-response@earthlink.net" -- this requires me to add an explicit whitelist entry for it, and this becomes carte blanche for spammers to reach me simply by forging this as their sending address.

tone

but is it art? by duckHole · 2004-01-14 04:28 · Score: 1

This gibberish from email messages is now being recycled by a whole cadre of avant-garde poets into "found" poems:

http://www.boston.com/news/globe/magazine/articles/2004/01/04/spam%5Fpoets/

http://poetry.about.com/b/a/055812.htm

Actually, aleatoric methods for generating poetry have been around since Dada (they used to literally pull words out of hats as a randomizing algorithm...). These guys are just piggybacking on the spamming hash software.

But you should only have to clean up a few. by khasim · 2004-01-14 05:15 · Score: 1

The few messages that get through that trigger the black hole effect.

The majority of the messages would stay on the sender's server and have to be dealt with by that admin.

Besides, this SHOULD hamper the sender's server as it tries again and again and again to connect to your server (which refuses every connection). All those unsuccessful threads will slow down how much spam can be sent in a given time frame from that server.

Re:gibberish... Solution: Spellcheckers by Mikkeles · 2004-01-14 05:17 · Score: 1

"But if we filter out emails with too many spelling errors (and punctuation-addled non-words) in the subject and body,..."

Well, there goes about 90% of (legitimate) e-mail ;-)
(and, of course, IRC is so totally gone!)

--
Great minds think alike; fools seldom differ.

Re:And now the "Cmabirgde Sutdy" is being exploite by mwood · 2004-01-14 05:30 · Score: 1

I've probably received more of those than I know about, because I always make a quick first pass through my inbox to trash all of the messages that come from total strangers yet have Subject: lines written as if to a close friend. (Along with the fake 147kB "delivery failure" messages and the like.) I haven't thought of a good automated way to detect those yet, but the good old manual method is not too burdensome and I can't recall when last I actually *read* something that turned out to be UCE without being pretty sure in advance that that was what it was. (Sometimes I like to get my jollies by seeing what these losers are up to.)

It's autoinserted by pjt33 · 2004-01-14 05:55 · Score: 1

The [SPAM] was inserted by a spam checker. It wasn't in the original message. I think it's SpamAssassin suitably configured, but I could be wrong.

Re:gibberish... Solution: Spellcheckers by alsta · 2004-01-14 06:16 · Score: 1

No that's not good enough. According to RFC 2045, the multi-part e-mail should contain a body part and an alternative 7bit ASCII part.

Theoretically, if the e-mail is legit, the bare contents of the body should match the contents of the 7bit ASCII part. Problem is with multi-byte content in the body part. How does that compare to 7bit ASCII? So the comparison would have to be fuzzy to some degree.
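
One possible shape for that fuzzy comparison, sketched with Python's standard `difflib` module. It assumes both parts have already been decoded and the HTML part reduced to its visible text; `part_similarity` is a made-up name, and comparing word by word is just one choice among several.

```python
import difflib

def part_similarity(plain_text: str, rendered_html_text: str) -> float:
    """Word-level similarity between the two alternatives, from 0.0 to 1.0."""
    # Lowercase and split on whitespace so layout differences do not matter.
    a = plain_text.lower().split()
    b = rendered_html_text.lower().split()
    return difflib.SequenceMatcher(None, a, b).ratio()
```

A legit message should score close to 1.0. A message whose plain part is random words and whose HTML part is the sales pitch would score near 0.0, which could feed into the overall spam score.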

--
Wealth is the product of man's capacity to think. -Ayn Rand

Re:I see this too (err, I don't) by Buran · 2004-01-14 06:44 · Score: 1

Chromatin! Who knew!? Cell-biologist spammers!

--
i am a soviet space shuttle

Spammers are terrorists! by cpghost · 2004-01-14 06:53 · Score: 1

What if the "random" words were actually a hidden communications channel?

One known method of defeating traffic analysis is to send a continuous stream of junk from random locations to random destinations, and, at the right moment, insert the real payload into the random stream.

The constant stream of spam, esp. when combined with this seemingly random gibberish set of words, is a great way to hide real communication from traffic analysis.

If the NSA were to effectively do traffic analysis on a worldwide scale, they would have to monitor an enormous amount of spam, and this could even amount to a DDoS of their surveillance software.

So, Mr. Ashcroft: Spammers are (helping) terrorists! Wouldn't it be time to change your CAN spam law to a CANNOT spam law (just to be sure) and start prosecuting those criminal enemy combatants?

And who knows? Napster-NG (new generation) could also be built on top of that great anti-traffic-analysis spam network. RIAA sheriffs, are you there?

--
cpghost at Cordula's Web.

Re:gibberish... Solution: Spellcheckers by Theatetus · 2004-01-14 14:00 · Score: 1

Or emails from teenagers, who now write in all situations as if they were on IM.

--
All's true that is mistrusted

Yeah, but what do we poor individuals do? by TechnoWitch · 2004-01-14 14:52 · Score: 1

Sure, companies can afford expensive services or set up complicated rerouting. But what about those of us who would like simply to host a domain (or have one hosted for us by an ISP)?

I'm using several different anti-spam measures, and still a bunch get through.

What really ticks me off is I have a couple of really sweet domains -- which are literally unusable due to spam. Inadequate filters and I can't tell the spam from the legit stuff. Have filters that're too good, and legit email gets bounced.

I'm using Fastmail.fm for a couple of those otherwise unusable domains. It blocks about 95% of the spam with my current settings and custom sieve rule set. But even one still ticks me off.

I don't even have to delete spam... by macraig · 2004-01-14 19:31 · Score: 1

...because PopFile does it for me now. Well, technically Outlook is doing the deed, but PopFile is the one issuing the orders. These random words of which the article speaks really aren't random at all: they're CHOSEN, and they haven't fooled PopFile's Bayesian algorithm at all in months. Since last summer, its accuracy has climbed to 99.39% today; at the beginning of this month I finally changed my Outlook spam rule from "move" to "delete", so I don't even have to bother at all now. So let the spammers try some new trick: I'll teach it to PopFile once or twice and never have to worry about it again. Never having to use the [delete] key again on spam? Heh... I could get used to being this lazy.
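For anyone wondering why padding with chosen words does so little against a trained filter, here is a toy naive Bayes classifier. It is not PopFile's code, just a minimal sketch with add-one smoothing; the class name TinyBayes and the tokenizer are assumptions made for the example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy two-class naive Bayes over word tokens (illustration only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.msg_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, label, text):
        """Record one message as an example of 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokens(text))
        self.msg_counts[label] += 1

    def spam_probability(self, text):
        """Posterior probability that text is spam, given the training so far."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.msg_counts.values())
        toks = self.tokens(text)
        log_score = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed class prior, then smoothed per-word likelihoods.
            score = math.log((self.msg_counts[label] + 1) / (total_msgs + 2))
            for tok in toks:
                score += math.log((counts[tok] + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Words the filter has never seen get nearly the same small probability under both classes, so they barely move the score; the pitch words the spammer cannot leave out still decide it. Teaching the filter a new trick is one more train("spam", ...) call.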

Man, nice cliff hanger article.. by OhioJoe · 2004-01-14 20:42 · Score: 1

I was on edge reading that, thinking the new Bayesian filter system I am using, and singing the praises of, was now useless. But then this line came later in the article: "Baxter and Linford said that spammers' use of hash busting is definitely on the rise, but such tricks can rarely circumvent a well-trained Bayesian filter."

Whew.

Back to singing its praises..

--
"Artificial Intelligence usually beats real stupidity."

Just don't detect spam on content. by Dr.Ruud · 2004-01-14 20:43 · Score: 1

I have been creating spam-detection-software for many years now. My rules act on structure and metadata, not on content, never on content. All you need is procmail and sed.
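To give a flavor of what "structure and metadata" checks can look like, here is a small Python sketch. These are not my actual procmail/sed rules; the four checks and the name structural_flags are hypothetical examples, and none of them looks at the wording of the message.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import List


def structural_flags(raw: str) -> List[str]:
    """Return the structural oddities found in a raw RFC 822 style message."""
    msg = message_from_string(raw)
    flags = []
    if msg["Message-ID"] is None:
        flags.append("missing Message-ID")
    if msg["Date"] is None:
        flags.append("missing Date")
    # parseaddr returns ('', '') when it cannot find an address.
    _, addr = parseaddr(msg.get("From", ""))
    if "@" not in addr:
        flags.append("unparseable From address")
    # A top-level text/html type means there is no plain-text alternative.
    if msg.get_content_type() == "text/html":
        flags.append("HTML-only body")
    return flags
```

How many flags should count as spam, and how to weight them, is left open here; that part depends on the mail you actually receive.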

With SpamAssassin you can achieve about the same result as I do. But disable all content-based rules, because they don't scale and they get worked around. Even checks on the geographical origin of the URLs in a message won't survive.

Detecting spam on content is (and always has been) a dead-end street. When Bayesian filters came around, I just thought: oh no, another weak spot.

IPv6 can be the next anti-spam problem: just too many IP addresses to blacklist.

SpamArrest by CyberdogOSX · 2004-01-15 03:16 · Score: 1

I have been using Spam Arrest for many months and have not gotten a single spam. FYI.

Email is shit by Anonymous Coward · 2004-01-15 08:09 · Score: 0

Face it: the problem is that email was never designed to be let out of trusted networks.

Don't use it. Get a domain name, set up a server and use your own secured apps. Communicate over forums, wikis or any means that does not involve using that protocol that is synonymous with opening your mouth and attaching it to an industrial strength garbage disposal.

Either shut up (and shut off) or swallow. It's not getting any better. Email is a dinosaur and as broken as the RIAA.

Re:My Bayesian filter is slowing becoming a whitel by Anonymous Coward · 2004-01-15 13:48 · Score: 0

Whitelists are effective for those of us not trying to build a customer database out of emails, assuming no man-in-the-middle DNS lookup/Reverse DNS lookup attack.

Re:gibberish... Solution: Spellcheckers by Anonymous Coward · 2004-01-16 06:09 · Score: 0

Quandary ...

Domain Keys vs other new paradigms by shubert1966 · 2004-01-17 14:16 · Score: 1

I haven't read up on Yahoo's initiative to use "Domain Keys", whatever it turns out to be, but it has always seemed to me that SPAM is really just user error, except that it's on the part of the ISPs and other deployers of email servers.

Please forgive my ignorance, if present, but can someone please analyze the current paradigm and tell me why we don't just change the damn paradigm on email altogether?

Here's what I think would work:

1) Instead of giving out your email address, so that ANYONE can send you ANYTHING, legal or not, give them the URL to your email signup page, which has a CAPTCHA feature like any good signup page does.

2) Your simple website, and you know this would be simple, maintains the list of 'acceptable' email addresses and spurns messages from all others. You import your current address book to bring the new system up to date, then do the familiar mass mailing informing your contacts that you have switched to the new paradigm.

3) Your site receives visitors who wish to email you. They fill out the form and identify the captcha image, and provide you with their email address. Additionally, they provide a short, text-only message that you receive along with their signup request.

4) Now, I realize I have introduced another signup/hurdle to the user's experience, but from another perspective, they will feel it is worth it to conquer/prevent SPAM.

5) You receive the request, visit the site of the requester, do whatever you want - then add their address to the "OK" list, or not. This puts you in the position of detective - but, whether you choose to investigate (whether they are spammers) or not, you can always remove them from the list as an "abuser".

6) On your server's side, you run a script that changes your email address every so often. Your email address is always hidden, and ever-changing, all to the extent to which you can prevent people from hacking in.

7) The web-signup/homepage/email page concept becomes mainstream and everyone is happy, and some more work exists (for a little while) for web-monkeys.
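The bookkeeping behind steps 1 through 6 could look roughly like this in Python. It is only a sketch: the Gatekeeper class and its method names are invented for illustration, the CAPTCHA result is assumed to come from the web form as a plain boolean, and no mail server is actually wired in.

```python
import secrets


class Gatekeeper:
    """Keeps the 'OK' list, the pending signup requests, and a rotating alias."""

    def __init__(self, domain, known_contacts=()):
        self.domain = domain
        # Step 2: import the existing address book.
        self.allowed = {addr.lower() for addr in known_contacts}
        self.pending = {}
        self.current_alias = self.rotate_alias()

    def request_access(self, sender, note, captcha_passed):
        """Step 3: a visitor submits their address and a short text note."""
        if not captcha_passed:
            return False
        self.pending[sender.lower()] = note[:200]  # keep the note short
        return True

    def approve(self, sender):
        """Step 5: the owner moves a pending requester onto the OK list."""
        if self.pending.pop(sender.lower(), None) is not None:
            self.allowed.add(sender.lower())

    def revoke(self, sender):
        """Step 5: remove an abuser from the OK list."""
        self.allowed.discard(sender.lower())

    def accepts(self, sender):
        """Step 2: mail is accepted only from listed senders."""
        return sender.lower() in self.allowed

    def rotate_alias(self):
        """Step 6: generate a fresh, unguessable delivery address."""
        self.current_alias = f"{secrets.token_hex(8)}@{self.domain}"
        return self.current_alias
```

One thing the sketch makes visible: accepts() trusts whatever sender address the message claims, so the scheme is only as good as the sender address is hard to forge.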

Ok. Does this suck? Please explain. Thank you!

--
Stuff that matters.

Slashdot Mirror

606 comments