The Growing Field Guide To Spam Techniques

"Tricks?" by agent+dero · 2003-07-22 23:45 · Score: 2, Interesting

I also thought it was pretty easy to spot and eliminate SPAM offering my mom to "Add 3inches to your penis today_________________12312vxas"

Or to eliminate javascript enabled e-mail.

SPAM is not quite a science. It's script kiddie stuff, meaning it's not too hard to do: just some open relays and mass e-mail lists you can buy from AOL.

--
Error 407 - No creative sig found

Re:"Tricks?" by wiggys · 2003-07-22 23:56 · Score: 5, Interesting

You miss the point completely. Any reasonably normal intelligent human being can spot and delete spam - that's never been the issue. The point is that spam is annoying and can be very time consuming for a human to deal with, which is why computerised spam filters were created.
The first generation of spam filters were crude and simplistic - they would delete an email based on the sender, or maybe one or two key words. This isn't effective because spammers rarely use their own email addresses in the "Reply to" field, and deleting all email which contains the words "marketing" or "investment opportunity" is likely to delete legitimate email. Besides, spammers can easily get around this by altering words in such a way as to evade filters (V*I*A*G*R*A is easily read by a human, but a computer looking for "viagra" and "viagara" would not stop it).
The best spam filters today use Bayesian filtering to eliminate spam: you train the filter by giving it a pile of email and telling it these are genuine, and another pile and saying these are spam. The filter then looks through the mail and gives certain words a weighting - if most spam contains big red lettering with words like "investment", "click here to be removed" and "penis enlargement" then it would score highly and be given a higher probability of being marked spam. Email containing words with your name in it, or words relating to your life or work, would be given a higher probability of being called spam.
And for crying out loud, "spam" is not an acronym so stop writing it in upper case!

--
Sorry, but my karma just ran over your dogma.
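The word-weighting idea in the comment above can be sketched roughly as follows. This is a toy naive Bayes classifier, not the code of any real filter; the class name, the tokenizer, and the training messages are all made up for illustration, and it assumes at least one message of each kind has been trained.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy two-class word-frequency filter (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words); needs both classes trained at least once."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            n_words = sum(self.counts[label].values())
            # Prior: fraction of the training mail that carried this label.
            score = math.log(self.messages[label] / total)
            for word in tokenize(text):
                # Laplace smoothing so an unseen word does not zero the product.
                score += math.log(
                    (self.counts[label][word] + 1) / (n_words + len(vocab))
                )
            log_score[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Because each user trains on their own piles of mail, the word weights end up different for every mailbox, which is the property later comments in this thread lean on.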
Re:"Tricks?" by dillkvast · 2003-07-23 00:09 · Score: 5, Funny

And for crying out loud, "spam" is not an acronym so stop writing it in upper case!

Actually writing it uppercase suggests that you are crying it out loud.

--
Scitne aliquis remedium potimum crapulae?
Re:"Tricks?" by Oddly_Drac · 2003-07-23 00:29 · Score: 2, Insightful

Anyone else tickled by the fact that downloading the whitepaper requires an email address?

--
Oddly Draconis
Too cynical to live, too stubborn to die.
Re:"Tricks?" by DazzaJ · 2003-07-23 00:32 · Score: 5, Informative

Hormel Foods has this to say on the subject

"We do not object to use of this slang term to describe UCE (unsolicited commercial email), although we do object to the use of our product image in association with that term. Also, if the term is to be used, it should be used in all lower-case letters to distinguish it from our trademark SPAM, which should be used with all uppercase letters."

so....

"SPAM" is Pork and Ham
"spam" is unsolicited email

"SPAM SPAM SPAM SPAM
SPAM SPAM SPAM SPAM
Lovely SPAM, wonderful SPAM!"
is a Monty Python song
Re:"Tricks?" by sporty · 2003-07-23 01:33 · Score: 1

And both are equally unwanted. At least in my house. Something about canned processed meat is just evil.

On a second note, isn't ham.. pork? I think it doesn't stand for that.. prolly just "Spiced Ham"

Now that I've made an insightful and funny comment, lessee if the mods don't spaz out. :)

--
-
ping -f 255.255.255.255 # if only
Re:"Tricks?" by ShadeEagle · 2003-07-23 02:21 · Score: 1

Actually, if you write it in all caps, you are infringing on a Hormel Foods trademark, SPAM.

You know, the canned lunch meat?
Re:"Tricks?" by po_boy · 2003-07-23 03:15 · Score: 1

On a second note, isn't ham.. pork? I think it doesn't stand for that.. prolly just "Spiced Ham"

The "S" stands for "shoulder." The shoulder pork meat used to not be worth too much because it's hard to separate out the good and bad parts. Grinding it all up with some other ham makes it useful.
Re:"Tricks?" by Anonymous Coward · 2003-07-23 03:51 · Score: 0

Spam is not a science; however, blocking spam effectively is. I consult for several major companies and have deployed ClearSwift Mime Sweeper for SMTP on their networks. Using this, some custom plugins, and a gutter account set up on just about every MM list ever, I have been able to analyze spam and build my reference lists. With the combination of the reference lists, blacklists, and custom programs, I am happy to say only the newest spam messages out there have ever gotten through. I say it is a science, however, because it requires skill to identify and properly rate words in the suspect email without blocking email from clients or business partners.
Re:"Tricks?" by mdinowitz · 2003-07-23 04:10 · Score: 1

Personally, I dislike Bayesian filters and instead write my own using RegEx (among other things). It's rather easy for someone to grab specific phrases from a subject and add them into a DB for blocking. For example, viagra isn't a spam word alone, but generic viagra, prescription viagra, perscription viagra, etc. will all be flagged:
(generic|prescription|perscription) +viagra
For the word viagra with delimiters between each letter, this would work:
v[ ._*-]*i[ ._*-]*a[ ._*-]*g[ ._*-]*r[ ._*-]*a
Point is, spam has been growing in quantity but not quality, and over 90% of it can be blocked from the header info alone by someone who can find the patterns in it. As the poster said, it's not a science and it's not even thought of by most of those who send spam. Yes, there are exceptions like the excellent ebay scams I've been seeing recently. So well put together that it would fool almost anyone. Almost.
Write your RegEx visually using the free tool at www.cfregex.com

--
Michael Dinowitz House of Fusion http://www.houseoffusion.com
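The two patterns quoted in the comment above can be tried out with a short script. This is only a sketch: the function names are made up, case-insensitive matching is an assumption, and the extra check for "at least one delimiter" is an addition, since the second pattern on its own also matches the plain word.

```python
import re

# The two patterns from the comment, compiled case-insensitively (an assumption).
PHRASE = re.compile(r"(generic|prescription|perscription) +viagra", re.IGNORECASE)
DELIMITED = re.compile(
    r"v[ ._*-]*i[ ._*-]*a[ ._*-]*g[ ._*-]*r[ ._*-]*a", re.IGNORECASE
)


def has_obfuscated_viagra(subject):
    """True if the word appears with at least one delimiter between letters."""
    # DELIMITED also matches plain "viagra", which the comment says is not
    # a spam word by itself, so plain matches are ignored here.
    return any(m.group(0).lower() != "viagra" for m in DELIMITED.finditer(subject))


def is_flagged(subject):
    """Flag a subject line if either pattern fires."""
    return bool(PHRASE.search(subject)) or has_obfuscated_viagra(subject)


if __name__ == "__main__":
    for subject in ("Cheap generic viagra today",
                    "V*I*A*G*R*A for less",
                    "Study of viagra side effects"):
        print(subject, "->", is_flagged(subject))
```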
Re:"Tricks?" by MillionthMonkey · 2003-07-23 04:31 · Score: 1

SPAM is shoulder pork meat but the parent post was correct- it was originally marketed as "Hormel Spiced Ham". (The "spice" was salt.) Hormel decided they needed a catchier name, and the brother of a Hormel executive won $100 in a contest for suggesting "SPAM", short for "spiced ham".

Things like this used to happen all the time before companies stopped allowing employee family members to participate in contests.
Re:"Tricks?" by mdinowitz · 2003-07-23 05:48 · Score: 1

OK, I read the article and I must say I'm totally unimpressed by it. The focus is almost only on the body of the message and ignores all of the ways to detect spam from the header. It makes some broad statements about how spam can be formatted but not much else. I've seen tech docs on blocking spam from high school kids with more detail and usefulness. I think that the author should go back and try to gather and review a LOT more spam. If he wants, I can forward him a few hundred unique ones that I get a day. I'm sure others here can forward him a whole lot more.
Note to self. Allow review of spam headers and body in spam dump (http://www.houseoffusion.com/spam/) for others to use.

--
Michael Dinowitz House of Fusion http://www.houseoffusion.com
Re:"Tricks?" by Cat_Byte · 2003-07-23 06:53 · Score: 1

And for crying out loud, "spam" is not an acronym so stop writing it in upper case!

Besides, the Spam company is suing for using their copyrighted name for canned meat.

--
Two roads diverged in a wood, and I - I took the one the bus load of girls just went down.
Re:"Tricks?" by two2dog · 2003-07-23 07:58 · Score: 1

Not only do Bayesian filters work but they solve several problems that rules-based filters can't solve. #1. My spam may be your reading material - if I let you set rules for me, I might have wanted to see that sale at jcrew that you declared spam. #2. If you are rules-based you have to make up new rules and distribute them as soon as the spammers try out a new method to get by you. If you are Bayesian, and you are good, you analyze the content and declare it spam anyway. I recommend InBoxer at http://inboxer.com for a free 21 day trial. Or, for those of you who like to roll your own, there's open source at http://www.spambayes.org
Re:"Tricks?" by Anonymous Coward · 2003-07-23 09:52 · Score: 0

They didn't validate it though. Whoever runs the site might want to delete fuck.off@spamcop.org from their mailing list.
Re:"Tricks?" by Feztaa · 2003-07-23 16:06 · Score: 1

Email containing words with your name in it, or words relating to your life or work, would be given a higher probability of being called spam.

Gee, I hope not :)

Dirty Little Secret by Anonymous Coward · 2003-07-22 23:45 · Score: 4, Funny

The dirty little secret about spamming that you never read on Slashdot is that spammers use Linux systems to generate the spam and Linux mail relays to send it.

Linux and Linus Torvalds are more responsible and liable for spam than any other single entity. Personally I use IIS 6.0 which is secured against any external threat.

Re:Dirty Little Secret by Surak · 2003-07-22 23:55 · Score: 0, Insightful

...and...? Linux is widely available, reliable, robust and free. If there were no Linux, spammers would just use some other system. So you can't really say that Linux and Linus Torvalds are responsible or liable for spam.

A friend of mine worked for a professional spamming outfit that was exclusively Microsoft-based. It's not like it hasn't or can't be done. It's just generally cheaper and easier to do stuff on Linux.

--
My journal has hot /. gossip.
Re:Dirty Little Secret by Anonymous Coward · 2003-07-22 23:59 · Score: 0

This probably is due to the fact that it's easier to misconfigure a Linux server so it will pass on unwanted mail than it is to misconfigure a Windows server to pass on unwanted mail.

Misconfigure a Windows mail server, and it simply won't send anything anywhere.

But I think you are being unfair holding Linus Torvalds responsible. Remember he only wrote the kernel, which doesn't send any mail to anyone. Sendmail or exim is what sends the mail messages. And the client machines used to harvest e-mail addresses seem to be running Windows. That, or else they have a honeypot daemon on port 139 which listens to a few lines of random gibberish then stops suddenly .....
Re:Dirty Little Secret by JamesO · 2003-07-23 00:02 · Score: 4, Funny

You're a friend of someone who used to be a spammer?

That's what I call a dirty little secret...
Re:Dirty Little Secret by Anonymous Coward · 2003-07-23 00:25 · Score: 0

Are you familiar with the concept of humour? I see at least one moderator isn't. Come on guys, if you read it aloud and it makes you laugh, it's a joke, ok? It really isn't that difficult.
Re:Dirty Little Secret by Anonymous Coward · 2003-07-23 01:54 · Score: 0

YHBT. YHL. HAND.
Re:Dirty Little Secret by BrokenHalo · 2003-07-23 02:05 · Score: 0, Flamebait

Linux and Linus Torvalds are more responsible and liable for spam than any other single entity.
What a dumbass thing to say. As if it isn't possible to use any winbloze, mac, beos or [OS of your choice] box to distribute spam. Duh...
Re:Dirty Little Secret by Alan · 2003-07-23 05:44 · Score: 1

Where's the -1 "Totally Missing the Joke" moderation when you need it?
Re:Dirty Little Secret by Anonymous Coward · 2003-07-23 06:08 · Score: 0

YHBT. YHL. HAND.
Re:Dirty Little Secret by Jawn98685 · 2003-07-23 19:06 · Score: 1

You're one of those guys that cross posts to the gay and ditto-head newsgroups, aren't you. :>
Re:Dirty Little Secret by Anonymous Coward · 2003-07-28 04:35 · Score: 0

Actually I have 3 friends that furnished their apartment by spamming from AOL Canada on WinXP and Win98 boxes over a DSL connection. I can say for certain that 1 of them KNEW OF Linux and the other 2 did not.
Re:Dirty Little Secret by Anonymous Coward · 2003-07-28 04:40 · Score: 0

Still friends with all 3. 1 still spams as a part time job. I personally didn't understand why a person spams until I sat on their overstuffed leather couch and watched the first season of The Simpsons on their big screen TV.

ActiveSpam? Real world spam? by jkrise · 2003-07-22 23:48 · Score: 2, Interesting

From the article:
the ActiveState Field Guide to Spam is a selection of the tricks

The words Active, Smart, Rich etc. are part of MSspeak - leave a bad taste..

providing examples taken from real-world spam messages.

Why not fictional world spam messages? You mean, all those enlargers I got over mail weren't real-world! Boo-hoo....

--
If you keep throwing chairs, one day you'll break windows....

Re:ActiveSpam? Real world spam? by Anonymous Coward · 2003-07-22 23:59 · Score: 0

Umm... Activestate is about as far from MS as one gets...
Re:ActiveSpam? Real world spam? by PhxBlue · 2003-07-23 03:16 · Score: 1

The words Active, Smart, Rich etc. are part of MSspeak - leave a bad taste..

What? These words were in the dictionary before 1981.

--
!#@%*)anks for hanging up the phone, dear.

Block spam by ftvcs · 2003-07-22 23:50 · Score: 5, Informative

I use Thunderbird, and found it to be a good system.
Before that I used PopFile, but it blocked some good mail. That was reason enough to drop it.

Re:Block spam by gilesjuk · 2003-07-22 23:57 · Score: 1

Thing is, I would rather have spam filtering as a separate system; integrating it into the client is very Windows-like. Modularity is the *nix way, building nice systems out of little tools.
Re:Block spam by Anonymous Coward · 2003-07-23 00:25 · Score: 0

I'm quite satisfied with PoPfile. I've had the occasional misdirected email but those are very few and far between.

No system can be expected to be 100% reliable. Even if you had some good emails blocked while using it it is still a good system.

In any case a brief scan of the spam folder is all that is required to pick out any messages that might have been misclassified.
Re:Block spam by CGP314 · 2003-07-23 01:00 · Score: 2, Informative

Really? I've never had a problem with popfile. Plus the advantage of popfile is it is a general mail classifier, not just for spam. So it will sort mail into different types.

One thing I use this for is mailing list. Instead of just saying 'all email from this address goes to this folder' I used popfile to sort the messages into 'probably of interest to me' and 'not of interest to me'. Really great for groups that get spammy posts to them.
Re:Block spam by halr9000 · 2003-07-23 01:01 · Score: 3, Interesting

I would try harder on POPfile. No offense, but you probably did not train it very well. I'm up to greater than 97.7% correct filtering with POPfile.

Besides, who wants to switch mailers to block spam? That's kinda drastic. You can use POPfile with any mailer. (Haven't tried TB, but I'm a big fan of FB.)

Does making this public help spammers? by Anonymous Coward · 2003-07-22 23:53 · Score: 4, Insightful

Just a thought, but....

Making public the methods used to intercept and filter spam will always mean spammers are one step ahead. If they know the strategy behind those stopping them, then that only helps them.

Is there a better way?

Re:Does making this public help spammers? by GigsVT · 2003-07-23 00:13 · Score: 3, Insightful

This is an interesting question. It's similar to the security vulnerability full disclosure arguments, but with a couple of differences: a spammer that is using a technique is broadcasting how to do it to nearly everyone anyway.

It's also different from security in that the spammer has no motivation to keep the method secret, it's worthless unless it is used to send spam. Contrast that with the security disclosure problem, in that there is a large motivation to keep a vulnerability secret and use it covertly on specific targets.

I'm leaning toward the idea that this really won't help spammers much, but with the caveat that it really doesn't help spam filter writers much either, since looking at the spams you get would make it obvious what techniques were being used anyway.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:Does making this public help spammers? by dillkvast · 2003-07-23 00:19 · Score: 2, Interesting

Don't agree. This is sorta the same as the idea behind "full disclosure" of security issues. The underground know all the tricks, and thus it is better that the sysadmins out there also have some idea of what's going on. This keeps us (the filtermakers more exactly) one step closer. A lot of these filters are OSS anyway. So the spammers can design their spam to circumvent the filters. They can even buy proprietary filters and just test against them when designing spam.

--
Scitne aliquis remedium potimum crapulae?
Re:Does making this public help spammers? by AndroidCat · 2003-07-23 00:51 · Score: 1

The anti-filter "tricks" in this field guide are pretty old. I doubt any serious spammer doesn't know them already.

--
One line blog. I hear that they're called Twitters now.
Re:Does making this public help spammers? by Anonymous Coward · 2003-07-23 02:09 · Score: 1, Insightful

Bayesian filtering is currently considered by many the best spam filtering mechanism. Since the detailed data set is different for everybody, and it learns from spam and non-spam messages, the only way a spammer could avoid Bayesian filters would be to either customize spam for each recipient (not practical) or make spam messages look a lot like normal messages (making them much less intrusive, but also impossible to filter through any mechanism other than a whitelist). See Paul Graham's spam pages for further info.

Security through obscurity would be pointless. Unless you are using a spam filter you wrote yourself and aren't going to give anyone else, it won't help.

Even if you would offer a filtering service without giving the filtering program to anyone (to prevent reverse-engineering), spammers could always use the service as an oracle to figure out ways around it through trial-and-error.
Re:Does making this public help spammers? by don_carnage · 2003-07-23 03:09 · Score: 1

"I'm leaning toward the idea that this really won't help spammers much, but with the caveat that it really doesn't help spam filter writers much either, since looking at the spams you get would make it obvious what techniques were being used anyway."

If I were a spammer, I would just download SpamAssassin and check the content analysis algorithms. I don't think it's too difficult for them to get their hands on anti-spam software.

--
Wooden armaments to battle your imaginary foes!
Re:Does making this public help spammers? by JohnGrahamCumming · 2003-07-23 03:57 · Score: 1

The field guide only points out the tricks and does not point out how we get around them.

John.
Re:Does making this public help spammers? by ptbarnett · 2003-07-23 04:55 · Score: 3, Informative

If I were a spammer, I would just download SpamAssassin and check the content analysis algorithms. I don't think it's too difficult for them to get their hands on anti-spam software.
If SpamAssassin did nothing but content analysis, that might work. But, SpamAssassin (by default) also checks several real-time blacklists and uses Bayesian filtering.
I've found that it's the combination of all of these factors that identifies almost every spam. I've had only two or three spams slip through in the 3-4 months since I installed SpamAssassin, with no false positives.
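The "combination of factors" point above can be pictured as an additive score compared against a threshold. The sketch below is only that: the weights, the threshold, and the function names are hypothetical and are not SpamAssassin's actual rules or scores.

```python
# Hypothetical weights and threshold, chosen only for illustration.
CONTENT_RULE_WEIGHT = 1.5
BLACKLIST_WEIGHT = 3.0
BAYES_WEIGHT = 4.0
THRESHOLD = 5.0


def combined_score(content_rule_hits, on_blacklist, bayes_spam_probability):
    """Add up evidence from content rules, a blacklist lookup, and a Bayes score."""
    score = CONTENT_RULE_WEIGHT * content_rule_hits
    if on_blacklist:
        score += BLACKLIST_WEIGHT
    score += BAYES_WEIGHT * bayes_spam_probability
    return score


def is_spam(content_rule_hits, on_blacklist, bayes_spam_probability):
    """Classify as spam once the summed evidence reaches the threshold."""
    score = combined_score(content_rule_hits, on_blacklist, bayes_spam_probability)
    return score >= THRESHOLD
```

A spammer who tunes a message to dodge the content rules alone still has to get past the other two terms, which is the argument the comment makes.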
Re:Does making this public help spammers? by bhanafee · 2003-07-23 19:16 · Score: 2, Insightful

How do you know there have been no false positives? Are you reading your spam?
Re:Does making this public help spammers? by ptbarnett · 2003-07-24 02:24 · Score: 1

How do you know there have been no false positives? Are you reading your spam?
As I have it configured, all mail classified as Spam is redirected to another folder. SpamAssassin actually attaches the original message to a new message that explains why it was classified as spam and provides a plain-text excerpt of the first few lines of the original.
Periodically (usually every couple of days), I review the folder to see if any were legitimate messages. The subject lines are usually sufficient to recognize them. If not, the excerpt is enough. By attaching the original to the message generated by SpamAssassin, I can look at the excerpt without worrying about triggering any HTML image "bugs" that validate receipt of the message.
It only takes 15-20 seconds to look through a list of subjects, and I'm not interrupted by spam for the rest of the day.

Does not explain purpose of trick by PCP · 2003-07-22 23:54 · Score: 1, Interesting

Many of these descriptions show how spammers try to hide text. Why would they do that? Isn't the whole point that we should read the spam?

I assume spam filters read the whole e-mail anyway, so trying to hide text in a would not accomplish anything.

Or are spammers just stupid?

Re:Does not explain purpose of trick by Anonymous Coward · 2003-07-23 00:04 · Score: 5, Informative

One purpose of hiding text is to fool anti spam filters.

Let's say that everything between '[/]' is visually hidden. I can send you the message:

Fre[dom for th]e pen[ and th]is enl[ist l]argement.

The 'filter' will see:

Fredom for the pen and this enlist largement.

The user will see:

Free penis enlargement.

Cheers,

--fred
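The bracket example above can be reproduced in a few lines. The square brackets are just the comment's stand-in notation for "visually hidden", not a real mail format, and both function names are made up for this sketch.

```python
import re


def filter_view(message):
    """What a naive filter reads: all the text, hidden spans included."""
    return message.replace("[", "").replace("]", "")


def reader_view(message):
    """What the person sees: the hidden spans are dropped entirely."""
    return re.sub(r"\[[^\]]*\]", "", message)


if __name__ == "__main__":
    msg = "Fre[dom for th]e pen[ and th]is enl[ist l]argement."
    print(filter_view(msg))
    print(reader_view(msg))
```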
Re:Does not explain purpose of trick by BFKrew · 2003-07-23 00:08 · Score: 3, Interesting

From what I gathered, it demonstrates two things:

Firstly, the techniques spammers will use to display the text in the email so that the end user will be able to view the text in the email.

Secondly, it demonstrates how using the above approach they are trying to trick spam stopping techniques from working. For example, instead of having an email titled "Free viagra" you could write it as "F*r*e*e V*i*a*g*a*r*a" in an attempt to stop a spam stopper from spotting Viagara as easily in the title. In the body of the email you could write the html in such a way that deciphering any words is quite tricky, eg writing Viagara as (font size="2")V(font size="2")iaga(font size="2")ra(/font) etc. Suffice to say, spotting all variants of 'hiding' such words is not as simple as you might first think.

It certainly gave me an interesting insight into the problem that it is, and how the spammers are trying and continually evolving their techniques to ensure they can carry on.
Re:Does not explain purpose of trick by alistair · 2003-07-23 00:08 · Score: 5, Informative

I think the purpose is to vary the hidden text to fool anti-spam systems which rely on blocking mail based on signatures of the message body.

If you send 150,000 messages which say "Free Porn Here" systems such as Britemail are going to quickly generate one signature for the mail and block most of it. If however you have the following example (using the fictional HTML HIDE tag)

Free [HIDE] from your meeting at 10:30 [/HIDE] porn [HIDE] cate suggested meeting for coffee [/HIDE] here [HIDE] I will be in work late today [/HIDE]

The message is still displayed in the browser as "Free porn here". However, filters such as those used by Mac Mail and Mozilla may not pick it up as junk because the hidden words look like real email. If you change the hidden sentences every 100 emails then the signature based spam blocking systems won't pick it up as every signature is different and (in this example) you are using real words.

One of the best solutions to this I have seen is KMail: it displays HTML mail as text and you can click a button to then render as HTML. This doesn't stop the spam, but does give you the ability not to see many images you rather wouldn't at 10am on a Monday morning and allows you to stop web bugs (HTML code in images which can be used to indicate successful message delivery).
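A rough sketch of why varying the hidden filler defeats a whole-body signature, using the same fictional [HIDE] tag. The SHA-256 hash is only a stand-in; commercial signature systems use their own schemes, and the function names here are hypothetical.

```python
import hashlib
import re

# The fictional [HIDE]...[/HIDE] tag from the comment above.
HIDE = re.compile(r"\[HIDE\].*?\[/HIDE\]", re.DOTALL)


def signature(body):
    """Stand-in signature: a SHA-256 hash of the exact text."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def visible_text(body):
    """Drop the hidden spans and collapse whitespace, as a renderer would."""
    return " ".join(HIDE.sub(" ", body).split())


if __name__ == "__main__":
    a = "Free [HIDE] from your meeting at 10:30 [/HIDE] porn [HIDE] see you later [/HIDE] here"
    b = "Free [HIDE] thanks for the report [/HIDE] porn [HIDE] back on Tuesday [/HIDE] here"
    print(signature(a) == signature(b))
    print(signature(visible_text(a)) == signature(visible_text(b)))
```

Hashing the whole body gives a different signature for each batch, while hashing only what the reader would see gives the same one, which is one way a signature system could respond to the trick.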
Re:Does not explain purpose of trick by scottme · 2003-07-23 00:10 · Score: 1

The spam message consists of "good" words and "bad" words. The "bad" words are the true message that the spammer wants the victim to read.

The "good" words are in the clear and serve to get the message through the Bayesian filters; however they are hidden from the victim by being rendered in zero size fonts, white on white, within HTML comments etc.

The "bad" words are obscured from the filters by means of HTML encodings, being split by HTML comments, etc., but will show up large as life in the victim's Outlook client.

But yes, the spammers are stupid.
Re:Does not explain purpose of trick by Narcissus · 2003-07-23 00:15 · Score: 1

I believe the idea is to introduce enough legitimate, conversational text into the email but still hide that text from the receiver so that the filter decides that overall, it's acceptable.

Imagine I read out something like the Bible, but every hundred words or so, I used an expletive. Now, all in all one might say that the subject matter was "good". However, if I spoke all of the non-swearing as fast as I could, while every time I get to swear I'd scream it out, as long as I could, then you might change your mind.

This is basically how these spam techniques work: the filters only see the words, and overall see a fairly useful email. They don't, however, see the way those words are presented, which is a different story all together.

So yes, these email techniques are all about hiding lots of words, but it's the selection of words to hide and highlight that is important.

I guess it's just a form of steganography, when it comes down to it...
Re:Does not explain purpose of trick by leuk_he · 2003-07-23 00:24 · Score: 1

how spammers try to hide text.

They try to hide text from spam filters, i.e. the word "free" gets you some points in the spam filter. The word free ze might look like free to you but freeze to a spamfilter.

But it is just a point in the battle. Next thing that happens is that the filter will be able to recognize the hiding techniques and filter e-mail as spam when a mail contains too much markup or something like that. It is just a matter of making the spam filter smarter.
Re:Does not explain purpose of trick by nrosier · 2003-07-23 01:09 · Score: 1

One of the best solutions to this I have seen is KMail: it displays HTML mail as text and you can click a button to then render as HTML. This doesn't stop the spam, but does give you the ability not to see many images you rather wouldn't at 10am on a Monday morning and allows you to stop web bugs (HTML code in images which can be used to indicate successful message delivery).

So how is that different to:

- Mozilla: display as ASCII, simplified HTML, HTML
- Evolution: do not load images off the web...
Re:Does not explain purpose of trick by BrokenHalo · 2003-07-23 02:13 · Score: 1

These "tricks" aren't bad for dealing with the soi-disant "content" of the mail. However, I'm probably not alone in finding that the spam I get tends to originate from a relatively small number of netblocks, and thus filtering on the basis of originating IP is a very useful tool.
I suppose that's essentially what the RBLs do, but I'm not so keen on the false-positives for which the RBLs are notorious.
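Filtering on originating netblock, as described above, might look something like this. The blocked ranges are reserved documentation addresses picked for the example, not real spam sources, and `is_blocked` is a made-up helper name.

```python
import ipaddress

# Hypothetical personal blocklist; these are reserved documentation ranges.
BLOCKED_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blocked(ip):
    """True if the connecting address falls inside any blocked netblock."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETBLOCKS)


if __name__ == "__main__":
    for ip in ("192.0.2.77", "198.51.100.200", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Keeping the list personal and short is one way to limit the false positives the comment complains about, at the cost of maintaining it by hand.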
Re:Does not explain purpose of trick by Anonymous Coward · 2003-07-23 02:27 · Score: 0

> For example, instead of having an email titled "Free viagra" you could write it as "F*r*e*e V*i*a*g*a*r*a" in an attempt to stop a spam stopper from spotting Viagara as easily in the title. In the body of the email you could write the html in such a way that deciphering any words is quite tricky, eg writing Viagara as (font size="2")V(font size="2")iaga(font size="2")ra(/font) etc. Suffice to say, spotting all variants of 'hiding' such words is not as simple as you might first think.
>
> It certainly gave me an interesting insight into the problem that it is, and how the spammers are trying and continually evolving their techniques to ensure they can carry on.
And remember kids, th3se pe0ple are 0n-ly s3nd1ng eee-m a i l 2 p3ople who have re'que'st`ed it!
Honest.
(Are you listening, publicly-traded companies hawking crap through Eddie Marin's front companies? Are you sure your marketeers are really doing their due diligence when they sign those contracts?)
Re:Does not explain purpose of trick by Hatta · 2003-07-23 03:17 · Score: 1

Why would anything in brackets be hidden?

--
Give me Classic Slashdot or give me death!
Re:Does not explain purpose of trick by Hatta · 2003-07-23 03:19 · Score: 1

Yech, I just reject email from anyone who doesn't have the decency to send it in plain text.

--
Give me Classic Slashdot or give me death!
Re:Does not explain purpose of trick by alistair · 2003-07-23 03:42 · Score: 1

This may work for many people but has some drawbacks. An increasing number of email clients seem to default to HTML editing simply to allow people to add bold and italic to their messages. Even if people never use this, the message may have an HTML wrapper which would cause it to be rejected by automated filtering.

It also has disadvantages in corporate environments. I work using an IMAP mail account in a corporate which has standardised on Notes. I notice that recent messages from Notes are now rendering Notes specific formatting as HTML when the recipient is a non notes user. This is actually of great benefit as people tend to use Notes to add their comments in an alternate font colour. With HTML mail I can now finally see what they are saying accurately in my Mozilla Mail client, so this is a step towards open standards.
Re:Does not explain purpose of trick by SomeoneGotMyNick · 2003-07-23 04:15 · Score: 1

In this case, it wouldn't. The bracketed text just represents the presence of an anti-spamfilter technique being used for the purpose of the explanation
Re:Does not explain purpose of trick by JohnGrahamCumming · 2003-07-23 04:56 · Score: 1

The spammer typically wants to hide text that a spam filter will read and you will not. The text is full of innocent or random words that are designed to overwhelm the spam filter and make it assume that the message is genuine.

John.
Re:Does not explain purpose of trick by Anonymous Coward · 2003-07-23 07:30 · Score: 0

In this case, it wouldn't. The bracketed text just represents the presence of an anti-spamfilter technique being used for the purpose of the explanation
Actually the common technique (this week) is to break up words the filter would react to with bogus html comments which don't allow the filter to see the words it objects to, but which don't display, leaving the spam words to display.
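The HTML-comment trick described above, and the obvious countermeasure of stripping comments before matching, can be sketched like this. The keyword list and function names are made up, and a real filter would want a proper HTML parser rather than a regular expression.

```python
import re

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(html):
    """Remove HTML comments so split-up words are rejoined."""
    return HTML_COMMENT.sub("", html)


def naive_filter_hits(text, blocked=("viagra",)):
    """Return the blocked words found by a plain substring search."""
    lowered = text.lower()
    return [word for word in blocked if word in lowered]


if __name__ == "__main__":
    raw = "Buy Via<!-- xq7 -->gra now"
    print(naive_filter_hits(raw))
    print(naive_filter_hits(strip_html_comments(raw)))
```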

Getting worse by BenjyD · 2003-07-22 23:57 · Score: 5, Interesting

I've definitely noticed that my spamassassin filters are getting less effective. Six months ago, it was rare to see a spam that didn't get caught. Now maybe 10-20% get through.

As I use a sensible email client that doesn't render HTML by default, I can't even read the text of the spams anyway.

Re:Getting worse by Ed+Avis · 2003-07-23 00:05 · Score: 4, Interesting

Yes - it looks like the majority of the 'spammers' tricks' listed are silly HTML tricks. From the messages I receive, a good rule of thumb is that HTML format implies spamminess. It might be different if you regularly have to communicate with Outhouse users.

HTML rendering was added to Pine only fairly recently. Given the quantity of HTML spam out there, it might have been a mistake.

--
-- Ed Avis ed@membled.com
Re:Getting worse by Anonymous Coward · 2003-07-23 00:14 · Score: 0

There's no doubt about it that spam is getting worse AND more clever about disguising itself. When I'm at home, it's a non-issue. I use mutt and block all html mail. However, at work I'm *required* to use Outlook and management's policy has been to tolerate most spam so there is no risk of falsely bouncing legitimate messages.
Re:Getting worse by Greg+Hewgill · 2003-07-23 00:21 · Score: 1

Hopefully you've upgraded spamassassin in the last six months or so. If you haven't, do it! Older versions of spamassassin get useless very quickly.
Re:Getting worse by Anonymous Coward · 2003-07-23 00:22 · Score: 0

"There's no doubt about it that spam is getting worse AND more clever about disguising itself. When I'm at home, it's a non-issue. I use mutt and block all html mail. However, at work I'm *required* to use Outlook and management's policy has been to tolerate most spam so there is no risk of falsely bouncing legitimate messages."

So you're paid to read spam too?! Same here! I can think of worse things to get paid for. As long as you've mentioned it, so that if people wonder why things take longer than they should, you can explain that you've already complained.
Re:Getting worse by BenjyD · 2003-07-23 00:29 · Score: 1

Yes, in fact just after posting that message I checked the version that Debian Stable uses and it's 2.20. I upgraded to 2.55, so hopefully that will be better. More time wasted by the bloody spammers.
Re:Getting worse by babbage · 2003-07-23 01:05 · Score: 1

HTML rendering was added to Pine only fairly recently. Given the quantity of HTML spam out there, it might have been a mistake.

Skimming over the changelog, it appears that Pine has had support for HTML rendering since the release of version 4.00, 8 July 1998. That's a bit over five years now.
In any case, my hunch is that rendering html in a text based mail client like Pine or Mutt should be pretty harmless. The biggest danger in rendering of html is pulling in all the images, and by so doing announce to the spammers that you are alive, well, and eager to read their mail. Pine will attempt to do a sensible layout of the HTML content, but it won't download any images.
IMO that's only a good thing -- the messages don't look like gibberish when you get html mail (whether that mail is spam or whether it's another email forward from your AOL-using mom), and your privacy is still safe unless you actually follow one of the links. There's really no downside, and it short circuits a lot of the earnest but silly mailing list debates over the evils of html mail: of course html mail is evil, but with Pine (or if they ever add similar support, Mutt), you are inoculated from the risks.

--
DO NOT LEAVE IT IS NOT REAL
Re:Getting worse by Anonymous+Custard · 2003-07-23 01:08 · Score: 2, Insightful

HTML rendering was added to Pine only fairly recently. Given the quantity of HTML spam out there, it might have been a mistake.

I think that spam filters should perform HTML rendering before processing the message, or at least strip out anything in <sneaky tags> before analyzing a message. There's no excuse for something as simple as "via<invisible comment when html rendered>gra" getting through a filter.
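
A rough sketch of that pre-processing step in Python. The function name `strip_markup` and the two regexes are illustrative assumptions, not taken from any particular filter, and regexes are only an approximation of real HTML parsing:

```python
import re

# Matches HTML comments, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Matches anything else that looks like a tag, valid or made up.
TAG_RE = re.compile(r"<[^>]*>")


def strip_markup(html):
    """Remove comments and tags without inserting whitespace in their place."""
    without_comments = COMMENT_RE.sub("", html)
    return TAG_RE.sub("", without_comments)


print(strip_markup("Buy via<!-- lunch meeting notes -->gra <b>now</b>"))
# prints: Buy viagra now
```

Because nothing is substituted for the removed markup, the two halves of the broken word rejoin before the filter tokenizes the text.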

--
$8.95/mo web hosting
Re:Getting worse by Glonoinha · 2003-07-23 02:40 · Score: 1

How about just delete any email that has invalid HTML tags. No shit, this would kill 99% of the spam I get on a daily basis.
If they go to using all valid HTML tags to break up the words just filter all email that has any HTML tags in it - if it is important enough to send to me in email, it is important enough to send without HTML tags - and 100% of the spam would get filtered.

--
Glonoinha the MebiByte Slayer
Re:Getting worse by adrianbye · 2003-07-23 02:40 · Score: 1

I now use Qurb for spam filtering. It's really simple - it creates a whitelist from all addresses in your sent mail & addressbook, and continues to build it from people you reply to.

Those messages go into your inbox. Everything else goes into a special folder, which doesn't set off the new message notifications.
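
The general idea can be sketched in a few lines of Python. This is not Qurb's code; the function names and the folder names are made up for illustration, and it assumes you can already get at the recipient headers of your sent mail:

```python
from email.utils import parseaddr


def build_whitelist(headers):
    """Collect lower-cased addresses from sent-mail recipients and address book entries."""
    addresses = (parseaddr(header)[1].lower() for header in headers)
    return {address for address in addresses if address}


def route(from_header, whitelist):
    """Pick a folder for an incoming message based on its From: header."""
    sender = parseaddr(from_header)[1].lower()
    return "inbox" if sender in whitelist else "unknown-senders"


whitelist = build_whitelist(["Alice <Alice@Example.com>", "bob@example.org"])
print(route("alice@example.com", whitelist))         # known correspondent
print(route("Deals <promo@bulk.example>", whitelist))  # never written to
```

Note that this trusts the From: header, which is trivially forged, so it sorts mail rather than authenticating it.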

This won't work well for large volumes of spam (200+ per day), but for smaller volumes it works great.
Re:Getting worse by 3.5+stripes · 2003-07-23 03:07 · Score: 1

Upgrade, and change scores.

I'm using 2.55 and every two or three days, when I get a spam that slips past, I figure out which filter could have been higher, and adjust. I've kept our spamassassin tuned, and I'd say we're running at about 96% accurate (no false positives though).

It's only 20 minutes a week, maximum.

--

He tried to kill me with a forklift!
Re:Getting worse by hendrix69 · 2003-07-23 03:54 · Score: 1

HTML spam tricks are abundant. But you can hardly afford to disable HTML email as protection.
Bayesian filtering works well for text messages and doesn't work at all for emails that employ HTML tricks. A solution might be to evaluate the spamminess of a message based on its displayable/readable text. Just like when you select text off an HTML page with your mouse - let that be what the filters process and not the lower-level stuff. Spam filters should have the ability to assess the readability of text (font contrast, size, etc.) and prioritize it accordingly with regard to the statistical analysis.
Also, bayesian filters need to be able to recognize phrases and not just words. This isn't a new idea of course but it's a shame nobody's doing anything with it.
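
For what it's worth, producing phrase tokens is the easy part. A minimal Python sketch, assuming whitespace-separated words and phrases of up to three words (both choices are arbitrary, and `word_ngrams` is a made-up name):

```python
def word_ngrams(text, max_n=3):
    """Return every run of 1..max_n consecutive words as a single token."""
    words = text.lower().split()
    tokens = []
    for n in range(1, max_n + 1):
        for start in range(len(words) - n + 1):
            tokens.append(" ".join(words[start:start + n]))
    return tokens


print(word_ngrams("try it for free"))
```

A four-word message yields nine tokens instead of four, so the token database grows quickly as `max_n` increases.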

--
The power of Christ compiles you!
Re:Getting worse by JohnGrahamCumming · 2003-07-23 05:01 · Score: 1

That's right. Good spam filters need to perform some level of HTML "rendering", at the very least recognizing the presence of HTML comments and invalid HTML tags, removing them, and treating them as not adding whitespace.

More problematic is the use of as a way to break up spam words because it does not add whitespace but is valid.

John.
Re:Getting worse by JohnGrahamCumming · 2003-07-23 05:04 · Score: 1

The research shows that recognizing phrases instead of words adds little value over the basic Bayesian system, and it's computationally expensive and takes up extra storage.

Bayesian filtering can be made to work well with HTML tricks and in fact the tricks can be used to make the Bayesian system more accurate. In POPFile we use the tricks as another indicator of the type of message.

Specifically, the background/foreground contrast of text is calculated using Pythagoras' Theorem over the two color RGB values to determine readability of text. Use of unreadable text is covered by the Invisible Ink and Camouflage tricks and handled in POPFile by the Classifier::MailParse module.
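
A minimal sketch of that calculation in Python, assuming colors arrive as `#rrggbb` strings. The helper names and the threshold of 50 are illustrative guesses, not POPFile's actual code or values:

```python
import math


def parse_hex_color(value):
    """Turn '#rrggbb' into an (r, g, b) tuple of 0-255 integers."""
    digits = value.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(foreground, background):
    """Straight-line (Pythagorean) distance between two colors in RGB space."""
    return math.dist(parse_hex_color(foreground), parse_hex_color(background))


def looks_invisible(foreground, background, threshold=50.0):
    """Treat text as unreadable when the two colors are nearly the same."""
    return color_distance(foreground, background) < threshold


print(looks_invisible("#fefefe", "#ffffff"))  # near-white text on white
print(looks_invisible("#000000", "#ffffff"))  # black text on white
```

The distance runs from 0 for identical colors to about 441.7 for black on white, so any threshold is a judgment call about how faint text may be before it counts as hidden.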

John.
Re:Getting worse by hendrix69 · 2003-07-23 18:06 · Score: 1

I find it very surprising that phrase recognition doesn't add much value. Of course if you say you've done serious research in this area I believe you, but it still seems so counterintuitive. Some words are definitely harmless and used in personal mail all the time but when put together are very indicative of spam, e.g. "try it for free".

I'm glad to see POPFile is implementing anti-HTML-tricks schemes, I had assumed it was a much simpler application but I guess I was wrong.

--
The power of Christ compiles you!
Re:Getting worse by r5t8i6y3 · 2003-07-25 07:36 · Score: 1

i suspect one reason John's research shows little value in phrases is due to the heavy computational load that would be required to actually integrate phrases into a Bayesian system in an automated way.

Bayesian systems are stupid - they don't understand the meaning of the words that are being analyzed. thus EVERY phrase needs to be parsed and processed. also, what is a phrase? two words? three words? four words?

if you couple human intelligence with Bayesian stupidity in selecting phrases, then phrases may increase the accuracy of a Bayesian system with an acceptable increase in computational overhead.

how much of an increase in accuracy? Bayesian systems based on words seem to consistently display accuracy in the 95% - 99% range. if a Bayesian system is being used to sort hundreds or thousands of messages at a time even an additional percent or two of accuracy would be worth a significant computational increase for some folks.
Re:Getting worse by hendrix69 · 2003-07-26 10:02 · Score: 1

Far be it from me to question John's research and knowledge in the area, but the gain from phrase recognition seems to me just too high to ignore.
The idea that only words can be suspicious is obviously incorrect; I don't think anyone is arguing with that. However, how much would it complicate things if Bayesian filters were to examine pairs and/or triplets of consecutive words? It can't be that much more difficult, can it? But it would make a world of difference IMO.

--
The power of Christ compiles you!

HTML mail is evil by trikberg · 2003-07-22 23:57 · Score: 5, Insightful

Most of the tricks in the article (yes, I read it) require the mail to be in HTML format. If they were not, filters would be much more effective.

I don't remember ever receiving an e-mail that actually had any content requiring it to be HTML. It would be pretty simple to set up a mail server to bounce any incoming (or outgoing for that matter) HTML mail with a friendly notice that the server does not accept HTML mail, and to please try again using ASCII. The problem is that there are plenty of people who have no idea what they are supposed to do at that point.

Also I wonder if it could be effective for filters to detect whether such obfuscation is used rather than try to parse the contents and filter based on that. Many of the methods used are pretty obvious if you try to detect that specifically.

--
This post is free (as in cheese in a mousetrap).

Re:HTML mail is evil by BFKrew · 2003-07-23 00:13 · Score: 1

It's an interesting thought, but the overwhelming impression I got from reading through their tricks is that no matter what a spam filter is trained to catch, the spammers will adapt.

If an email is sent in plain text, then they would use tricks like writing words like loan as financ etc., to give just a basic example.

Sadly, whether we like it or not, HTML email -in some form- is here to stay.
Re:HTML mail is evil by Quixote · 2003-07-23 00:18 · Score: 2, Interesting

I don't remember ever receiving an e-mail that actually had any content requiring it to be HTML.
Until recently, I thought so too, till I ordered a laptop from HP. Their ordering system sends all the notices (order being processed, shipped, etc. etc.) in only HTML.
One would think that a company like HP with its resources would know better, but... <sigh>
Re:HTML mail is evil by trikberg · 2003-07-23 00:40 · Score: 2, Informative

I think you misunderstood my point. I do receive valid e-mail as HTML-only on occasion. That mail has however _never_ had any content that couldn't be presented as clearly and easily in plain text, which is what I was getting at.

This amounts to little more than an annoyance in itself, but means that I can't filter mail by throwing away everything of type text/html. If it comes from a commercial company (while still being valid) they are less likely to see my money again.

--
This post is free (as in cheese in a mousetrap).
Re:HTML mail is evil by hacker · 2003-07-23 01:02 · Score: 1

Sure, here you go (for sendmail):
SCheckContentType
Rtext/html$*	$#error $: 550 We do not accept HTML-formatted mail here; please resend as plain text.
R$*	$@ OK

I also use a set of other rules to block 'charset=koi', images, and other unnecessary attachments. YMMV of course.
Re:HTML mail is evil by Urkki · 2003-07-23 01:03 · Score: 1

What is to stop the spam filter "rendering" the mail as HTML well enough that the parts that aren't visible to the user are also stripped before filtering? And also, if most of the text is invisible to the user, and/or invalid HTML, I'd say that's a good indication it's spam anyway.
Re:HTML mail is evil by babbage · 2003-07-23 01:35 · Score: 3, Funny

One of my favorite internet quotes is apropos here:
Only an idiot doesn't go into his e-mail preferences and specify Plain Text instead of HTML. This is such a sane use of resources I believe it was actually mentioned in the Kyoto Accord.
-Roger Ebert

:-)

--
DO NOT LEAVE IT IS NOT REAL
Re:HTML mail is evil by shane_rimmer · 2003-07-23 01:41 · Score: 1

I would add that only an idiot would write an email client so that it defaults to HTML format for mail composition. Why should the user be the idiot for not changing the default behavior when a neophyte would assume that is the appropriate format?
Re:HTML mail is evil by Mhtsos · 2003-07-23 03:11 · Score: 1

What's more, this approach permits the formatting properties of the mail to be criteria for spam detection. Big red letters for example can weigh towards classifying the mail as spam.
Re:HTML mail is evil by belroth · 2003-07-23 03:26 · Score: 1

Hmm, I don't run sendmail myself. I use kmail and have many filters to sort my incoming mail, which helps a lot - I rarely get anything I want left in my inbox now. The last filter I have is a kmail suggestion, where "content-type" is "text/html" I bounce the message and move to a 'pending-deletion' folder for review, just in case. I would like to send a similar message saying that I don't accept HTML mail but I can't see a way to do this in Kmail - does anyone have any suggestions?
I know bouncing and sending a message is either stupid (confirming a live address to a spammer while bouncing it) or redundant (if it's not a spammer) but I can decide which is best once I know how to send a message, if I can.

--
I hereby inform you that I have NOT been required to provide any decryption keys.
Re:HTML mail is evil by jeremyp · 2003-07-23 04:11 · Score: 1

Bouncing a message with the SMTP response "550 user unknown" is probably OK.

--
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Re:HTML mail is evil by hacker · 2003-07-23 04:19 · Score: 1

I know bouncing and sending a message is either stupid (confirming a live address to a spammer while bouncing it) or redundant (if it's not a spammer) but I can decide which is best once I know how to send a message, if I can.

Not to mention, you yourself could be seen as a spammer for bouncing messages back. I certainly hope you're not bouncing them back based on the "From:" line in the message headers, because 99.999% of those are forged now, and can quite-possibly be an innocent person's email.
If a spammer decides to use myrealname@mydomain.com to spam 100,000 people, and someone decides to bounce that "spam" back to the "sender" (which in this case would end up going back to me, not the sender of the spam), you can bet I'll be immediately reporting them to their ISP/provider for UCE, and their access will be cut, based on the AUP of their provider. Most providers do not tolerate spammers inside their networks, and it's strictly against their AUP.
So just be aware, if you happen to bounce a message that looks like spam, back to someone like myself, who is not sending you spam, do not be surprised when your internet access gets cut, or you get a call from your provider for being a spammer yourself.
Do not EVER bounce messages back. Ever. Not only is it wrong, it is inept, and you're not helping the problem. In fact, you're propagating it, and affecting the bandwidth of everyone else between you and the (incorrect) recipient of that bounced message.
And lastly... to those who think they're going to outsmart those of us who report UCE and spam by configuring your mail servers to query my web server for every single message you receive: I will configure the server to return positive results for any lookup from your mail server. Period.
Re:HTML mail is evil by Reziac · 2003-07-23 04:51 · Score: 1

How about applying one of those filters that strips HTML and leaves behind only plaintext? That should render spam filterable in more normal ways, without killing any legit mail from the clueless (or from people using mail clients that *cannot* do plaintext, like AOL6.0).

--
~REZ~ #43301. Who'd fake being me anyway?
Re:HTML mail is evil by ahodgson · 2003-07-23 05:07 · Score: 1

Bouncing mail you don't want is not against anyone's AUP. If you really think you're gonna get someone's access cut because they bounce spam, you're an idiot.

In fact, bouncing mail that your system does not accept, to the envelope sender, is an RFC requirement. People must receive bounces for messages that are not delivered, period.

Yes, occasionally someone whose address gets used to send spam will get a ton of bounces. But the price of the alternative is to make E-mail even more useless for communication where the parties must be able to assume the message was received.

Now, having said that, of course I don't bounce spam that actually makes it into my mailbox. What would be the point? But I certainly issue 550 rejects on my DNSbl blocks, so any real mail rejected there will get bounced back to the sender.
Re:HTML mail is evil by JohnGrahamCumming · 2003-07-23 05:10 · Score: 1

In POPFile we use the presence of a trick as an indicator of the type of mail and feed that into the Bayesian engine.

All of the tricks outlined on the ActiveState web site are covered in the latest version of POPFile with "pseudowords" such as trick:invisibleink. The presence of such tricks is a strong indicator of spamminess.

The irony is that the more the message is obscured the easier it is to filter.
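
To illustrate the idea of turning a trick into a token, here is a small Python sketch. It is not POPFile's parser; the pseudoword `trick:splitword`, the regexes, and the function name are all hypothetical:

```python
import re

# A comment wedged between two word characters, as in "via<!-- x -->gra".
SPLIT_WORD_RE = re.compile(r"\w<!--.*?-->\w", re.DOTALL)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize_with_tricks(html):
    """Return ordinary word tokens plus a pseudoword when a word-splitting comment is seen."""
    tokens = []
    if SPLIT_WORD_RE.search(html):
        tokens.append("trick:splitword")  # hypothetical pseudoword name
    text = TAG_RE.sub("", COMMENT_RE.sub("", html))
    tokens.extend(WORD_RE.findall(text.lower()))
    return tokens


print(tokenize_with_tricks("via<!-- x -->gra now"))
# prints: ['trick:splitword', 'viagra', 'now']
```

The pseudoword then gets a spam probability from training like any other token, which is why heavier obfuscation tends to make a message easier to classify.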

It's worth noting that my research shows that 85% of all spam is in HTML format which I think is due to the fact that it's easy to include these tricks AND it's easy to make the spam message more compelling with colors, fonts, and pictures.

I personally agree that HTML email is evil and wrote an article on this very topic here: http://www.usethesource.com/cgi-bin/article.pl?sid=03/04/07/122224

John.
Re:HTML mail is evil by JohnGrahamCumming · 2003-07-23 05:12 · Score: 1

This is very close to what we do in POPFile. But we realized that it was not necessary to do the full render (which would be time consuming) because the presence of the tricks is enough to spot the spam.

John.
Re:HTML mail is evil by hacker · 2003-07-23 05:22 · Score: 1

Show me which RFC covers sending mail to someone that never sent you an email. You call this a "bounce", except that it isn't a bounce, because the mail never came from the person you bounced it to.
Have you been following the recent tricks that spammers use at all? Or are you just making this up?
While I agree, that a valid system, sending valid mail, with a valid Return-Path, Reply-To, and From (as well as "From:") header is completely legitimate to bounce, one where those fields are either missing, or invalid, should NOT BE BOUNCED.
Re:HTML mail is evil by belroth · 2003-07-23 06:28 · Score: 1

someone decides to bounce that "spam" back to the "sender" .....
I'll be immediately reporting them to their ISP/provider for UCE, and their access will be cut, based on the AUP of their provider. Most providers do not tolerate spammers inside their networks, and it's strictly against their AUP.
How is it commercial?

--
I hereby inform you that I have NOT been required to provide any decryption keys.
Re:HTML mail is evil by elemental23 · 2003-07-23 07:12 · Score: 1

If a spammer decides to use myrealname@mydomain.com to spam 100,000 people, and someone decides to bounce that "spam" back to the "sender" (which in this case would end up going back to me, not the sender of the spam), you can bet I'll be immediately reporting them to their ISP/provider for UCE, and their access will be cut, based on the AUP of their provider.

Please explain how bouncing mail can be considered spamming. Any ISP with half a clue wouldn't look twice at someone bouncing HTML mail. Do you even know what spam is? (hint: unsolicited commercial and/or bulk e-mail)

Most providers do not tolerate spammers inside their networks, and it's strictly against their AUP.

Except in this case the spammer is not inside their network, the recipient of the spam is.

Do not EVER bounce messages back. Ever.

Funny. A couple posts up in this very thread you posted a couple of lines of sendmail config to do exactly this, bounce HTML mail. So which is it?

I'm not even sure what you're attempting to say in that last paragraph. What does your web server have to do with mail, UCE or otherwise?

--
I like my women like my coffee... pale and bitter.
Re:HTML mail is evil by hacker · 2003-07-23 08:56 · Score: 3, Interesting

Funny. A couple posts up in this very thread you posted a couple of lines of sendmail config to do exactly this, bounce HTML mail. So which is it?

As you know, blocking mail at the MTA is not a bounce. "A couple of posts up", I posted a bit of a sendmail hook that blocks (i.e. rejects before receipt) mail with the Content-Type of text/html. That is not a bounce. I am not regenerating an additional email, which would be sent to an incorrect (in most cases, innocent) recipient.
Starting yesterday, my mail server has been thwarting an attack from 2,734 separate external machines, all trying to send a message to 3 nonexistent users on 1 domain that I host which has 0 mail accounts, no website, and no users behind it. It's a registered domain pointed to my IP address, nothing more.
So far today, we've received 15,833 separate attempts to send mail from these 2,734 hosts that my server has blocked (with a quick virtusertable hook to send them 'nouser'). The number of unique external hosts has been slowly increasing. It was 1,633 at the end of yesterday, and has now grown to about 67% more than that number, up to 2,734 as I type this.
THESE are bounces. Clearly someone has sparked off a trojan somewhere that was lurking inside a LOT of companies in a lot of machines (some of the domains are worldbank, dell.com, aol.com, etc., CLEARLY not spammers inside these companies, not THIS many of them) who are now trying to send this one message to these same 3 nonexistent users at this 1 domain.
I just checked again, from the time I started typing this reply, and we're up to 2,746 hosts trying to send this 1 spam message to these 3 nonexistent users.
So trust me, I'm well aware of the difference between blocking a message and bouncing a message.
Are you?
Re:HTML mail is evil by elemental23 · 2003-07-23 09:38 · Score: 1

As you know, blocking mail at the MTA is not a bounce.

"Blocking" suggests dropping it on the floor. If it's returned to the sender, it's a bounce.

Or alleged sender, I should say. As you know, forging the envelope sender is just as trivial as forging the From: address. If you return mail with a "550 whatever", you have no way of knowing if it's going back to the actual sender or not.

--
I like my women like my coffee... pale and bitter.
Re:HTML mail is evil by dwsauder · 2003-07-23 14:53 · Score: 2, Interesting

Oh, I have to agree with you.
Now, here's a funny story. I was at the FTC Spam Forum a while back. There were some of the more responsible email marketers there -- you know, the ones that send out regular newsletters for opt-in subscribers -- and they were whining and complaining because spammers have spoiled "rich" email for them. Just a few years back they had visions of eventually being able to send email with flash, animated graphics, fancy styles, and so forth. And now they realize that people don't want to receive those kinds of emails because of spam (and to some extent viruses). So they whined about it. I guess for them email is "push" marketing, while for the rest of us, email is a way to communicate with co-workers and friends. Who needs HTML to say "wanna go get some lunch?"
Re:HTML mail is evil by psamuels · 2003-07-24 06:11 · Score: 1

"Blocking" suggests dropping it on the floor. If it's returned to the sender, it's a bounce.

That's exactly what hacker is saying. His MTA is blocking the mail, dropping it on the floor. And producing a 550 status code, telling whoever tried to send the message that it is being dropped on the floor, instead of a 250 code, suggesting that it will be delivered.

hacker's MTA is not returning anything to sender.

If you return mail with a "550 whatever", you have no way of knowing if it's going back to the actual sender or not.

Also true. Whoever connected to his sendmail daemon can either generate a bounce, or not generate one. If the spammer connected directly to the SMTP port, he will almost certainly not bother to send a bounce that he knows is just a waste of his own resources. If, instead, the spammer has used an open relay and this relay is connecting to hacker's port 25, then the relay might (incorrectly) generate a bounce to the envelope sender - but this is the fault of the open relay, which never should have accepted responsibility in the first place for remotely delivering a message from an unknown sender.
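
The distinction can be sketched in Python. This is a toy model, not how sendmail is implemented; the function names and the `MAILER-DAEMON@mx.example.net` address are invented for illustration:

```python
from email.message import EmailMessage


def smtp_reply(content_type):
    """Reject at SMTP time: only a status line goes back over the open connection."""
    if content_type.lower().startswith("text/html"):
        return "550 We do not accept HTML-formatted mail here; please resend as plain text."
    return "250 OK"


def make_bounce(envelope_sender, reason):
    """Bounce after acceptance: a brand-new message is created and mailed out."""
    bounce = EmailMessage()
    bounce["From"] = "MAILER-DAEMON@mx.example.net"  # hypothetical address
    bounce["To"] = envelope_sender  # may be forged, so this can hit a bystander
    bounce["Subject"] = "Undelivered mail returned to sender"
    bounce.set_content(reason)
    return bounce


print(smtp_reply("text/html; charset=us-ascii"))
print(make_bounce("victim@example.org", "HTML mail is not accepted.")["To"])
```

In the first case the rejecting server never originates any mail; whatever happens next is up to the host on the other end of the connection. In the second case the server itself sends mail to whatever address the envelope claimed.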

--
"How can you claim that you are anti-crack, while still writing a window manager?" — Metacity README

Re:Font size by Anonymous Coward · 2003-07-22 23:57 · Score: 1, Informative

Try Ctrl-+ in Mozilla or Mozilla FireBird.

My approach by gowen · 2003-07-22 23:59 · Score: 5, Interesting

Bayesian filters are all well and good, and are -- for now -- effective. But given these tricks, the only really reliable approach I've found is IP blacklists for repeat offenders. If your machine is used to spam me, and my complaint letter is not answered in a satisfactory way (i.e. an email saying "We are sorry. The spammer has been cut off") I don't accept mail from you any more.

And if you're on ATTBI, or Comcast, or PBI.net, or BT Openworld, or Chello, or any number of large ISPs with too much tolerance for spammers, and you're not on my whitelist, I can't read your emails.
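
As a sketch of that policy in Python: the whitelist wins, then the connecting IP is checked against blocked ranges. The function name is made up and the networks shown are reserved documentation ranges, not real offenders:

```python
import ipaddress


def should_accept(sender_ip, sender_address, blocked_networks, whitelist):
    """Whitelisted senders always pass; everyone else must come from an unblocked IP."""
    if sender_address.lower() in whitelist:
        return True
    ip = ipaddress.ip_address(sender_ip)
    return not any(ip in network for network in blocked_networks)


blocked = [ipaddress.ip_network("192.0.2.0/24")]  # placeholder range
friends = {"friend@example.com"}

print(should_accept("192.0.2.10", "stranger@example.com", blocked, friends))
print(should_accept("192.0.2.10", "friend@example.com", blocked, friends))
print(should_accept("198.51.100.7", "stranger@example.com", blocked, friends))
```

Blocking whole networks rather than single addresses is what makes this hit every customer of a listed ISP, which is exactly the trade-off being described.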

And I don't care. Get an ISP that doesn't shelter spammers.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.

Re:My approach by Anonymous Coward · 2003-07-23 01:01 · Score: 0

Not to troll, because this is certainly true where I live, but what if ATTBI or Comcast happen to be the ONLY viable Broadband alternatives in your area?

- DRFSR
Re:My approach by gowen · 2003-07-23 01:27 · Score: 2, Funny

because this is certainly true where I live, but what if ATTBI or Comcast happen to be the ONLY viable Broadband alternatives in your area?
Then I'll probably never get to see email from you. You haven't lost that much, I'm not a very interesting person.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:My approach by AndroidCat · 2003-07-23 01:30 · Score: 1

They might be your only connectivity option, but they certainly aren't your only email option. The menu starts with a hotmail account and continues on past smarthosting.

--
One line blog. I hear that they're called Twitters now.
Re:My approach by bklock · 2003-07-23 01:36 · Score: 2, Insightful

Using Text Classification techniques in a spam filter is overall a good idea. (Bayesian systems are only one system for text classification, but they seem to be getting all the attention when it comes to spam)

The problem, though, is that they don't work on raw text. The text must first be 'featurized', using either a Feature Selection or Feature Extraction algorithm.

The 'Bayesian' part of anti-spam filters is pretty robust, and should theoretically be able to handle almost all tricks spammers throw at them, but the current state of Feature Selection is pretty embryonic.

All of the tricks in the article fool the tokenizers currently used into producing features inconsistent between spams. No consistency == No classifier. The problem is that an email is not a 'bag of words', but we classify them as if they are.

What we need, is to extract features which are more similar to the types of features a human looking at the message would use to make the spam / not-spam determination.

There is a lot of ongoing research in this general area, but to the best of my knowledge, nothing has made it into spam filters yet.

In the meantime, a lot can be gained by running the Feature Selector / Bayesian filter on the email after it's been rendered. Ideally, the filter needs to see exactly what the user will. Anything less is a disconnect between the two that will let spammers get messages past the filters and in front of the user.

One good feature that could be extracted from an email and fed to a filter, would be statistical analysis results of rendered vs. not rendered text in the email. Look at the amount, type, and distribution of non rendered text, etc. in spam vs. ham
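
A crude version of such a feature can be sketched with Python's standard HTML parser. It only counts comment text as non-rendered and ignores text hidden by colors or font sizes; the class and function names are invented for this example:

```python
from html.parser import HTMLParser


class VisibilityCounter(HTMLParser):
    """Tally characters of displayed text versus text inside HTML comments."""

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self.hidden_chars = 0

    def handle_data(self, data):
        self.visible_chars += len(data.strip())

    def handle_comment(self, data):
        self.hidden_chars += len(data.strip())


def hidden_text_ratio(html):
    """Fraction of the message's text that a reader would never see."""
    counter = VisibilityCounter()
    counter.feed(html)
    counter.close()
    total = counter.visible_chars + counter.hidden_chars
    return counter.hidden_chars / total if total else 0.0


print(hidden_text_ratio("<p>abcd</p><!--abcd-->"))
# prints: 0.5
```

The ratio could then be bucketed into a handful of tokens (for example low, medium, high) and handed to the classifier alongside the ordinary words.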
Re:My approach by scottme · 2003-07-23 03:33 · Score: 1

The problem is that an email is not a 'bag of words', but we classify them as if they are.

But we don't need to.

Along with the rest of the world, after I read Paul Graham's Plan for Spam, I implemented it myself (in my case, within Lotus Notes). One of the options I played with was how to tokenize the message. And it turned out that simply looking at n-tuplets (i.e. sequences of n characters, irrespective of what they are) works just as well as attempting to parse out the "words". You end up with a fairly significantly larger token corpus, and some adjustments downstream, such as examining a larger set of significant tokens, seemed like a worthwhile thing to do, but overall the accuracy of the filter was just as good using n-tuplets as it was using words.
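
For anyone who wants to try it, the n-tuplet tokenizer is a one-liner in Python. The function name and the default of four characters are arbitrary choices for this sketch, not what the Notes implementation used:

```python
def char_ngrams(text, n=4):
    """Slide a window of n characters over the text, ignoring word boundaries."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("viagra"))
# prints: ['viag', 'iagr', 'agra']
print("viag" in char_ngrams("buyviagranow"))
# prints: True
```

A message of length L produces L - n + 1 tokens, which is where the larger token corpus comes from.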

FWIW, I did settle on tokenization for words, for the simple reason that I liked to be able to see what the significant trigger words were. And I abandoned my little tool as soon as I found another one that worked for me so I didn't have to keep tinkering with it myself (I have a real job, after all).
Re:My approach by brlewis · 2003-07-23 07:17 · Score: 1

Along with the rest of the world, after I read Paul Graham's Plan for Spam, I implemented it myself

Is this some subtle sarcasm? Obviously a lot of people haven't read, or at least understood, Paul's essay. E.g. you're replying in a thread that started with someone saying "use IP addresses", who didn't read the part of the essay saying how the info in the headers is often as damning as the body of the message. You're replying directly to a posting that doesn't understand how fragmented words make the Bayesian filter's work easier, until you get down to one-letter fragments.

Anyway, your n-tuplets post is interesting. That technique would defeat the "No Whitespace No Cry" technique, which is the only technique in the article that could get past a Bayesian filter, and then only on the condition that it randomize the separators a lot. It would be good for a Bayesian filter that finds a lot of very long tokens to revert to the n-tuplets method.
Re:My approach by Goo.cc · 2003-07-23 07:19 · Score: 1

Or you could get your own domain name. That's what I did, as in 3 years I've gone from Media 1 to AT&T to Comcast.
Re:My approach by Anonymous Coward · 2003-07-23 07:21 · Score: 0

you just replied to an AC! Liar! =)

(hint, check his .sig)
Re:My approach by brlewis · 2003-07-23 07:23 · Score: 1

Since neither you nor the moderators who modded you up could be bothered to actually read Paul Graham's article, I'll quote it for you here:
In a sense, though, my filters do themselves embody a kind of whitelist (and blacklist) because they are based on entire messages, including the headers. So to that extent they "know" the email addresses of trusted senders and even the routes by which mail gets from them to me. And they know the same about spam, including the server names, mailer versions, and protocols.
Re:My approach by Vainglorious+Coward · 2003-07-23 07:37 · Score: 2, Informative

the only really reliable approach I've found is IP blacklists for repeat offenders

I also use IP blacklists (locally compiled and various RBLs) but this is becoming less effective as the spam gangs are moving to using their own army of proxies rather than the traditional exploitation of open relays or throw-away accounts. I'm not saying that ISPs shouldn't be responsible for what emanates from their networks, but these trojaned users are a very different kettle of fish than spammers having "pink contracts" with spam-friendly ISPs.

--
My next sig will be ready soon, but subscribers can beat the rush
Re:My approach by bklock · 2003-07-23 10:17 · Score: 1

You're replying directly to a posting that doesn't understand how fragmented words make the Bayesian filter's work easier, until you get down to one-letter fragments.

I wrote that parent post, and your comment leads me to believe that I was unclear in my implication.

None of the html techniques listed will get past the filter for ever. Given the same exact message twice, the second time, the filter will gobble it once it has been marked as spam the first time.

If the word 'penis' appears in spam constantly, the filter catches that. If the word PExNIS occurs constantly, the filter catches that too. But if the 'x' is replaced with a different substring in every spam, so the same token never appears in multiple spams, the standard tokenizers, whether n-tuplet based or word based, will fail.

Even if I have in my spam box 1000 spams containing 'Penis', a new email containing PcarEcatNcabIcanS provides no information to that filter if we are using tokens or n-tuplets as the features entered into the filter. Of course, once that string is seen once, it is learned.

Basically, if the stakes become large enough, spammers can adapt just as fast as your Bayesian filter does. The filter can never adapt to find spamminess in a manner that cannot be expressed by the features being input. It is not the n-tuplets in a message which make spam spam. It's the meaning and the human understanding generated in the reader of that message. The n-tuplets / words / tokens, etc. are simply an approximation so we have something to feed the filter. To make a perfect spam filter, we'd need a feature extraction algorithm which produces the same features/qualities that the reader sees from the message.
Re:My approach by leviramsey · 2003-07-23 12:00 · Score: 1

This is part of why I think that SpamAssassin may well have the best Bayesian implementation, if only they modified it to do the Bayesian computations as a last step in the filtering process. Then some of the tokens the Bayesian scanner would look for would be from the summary of regex matches that SpamAssassin found. Since most of these transcend the simple tokenization used by most Bayesian algorithms, this should be more effective.
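A rough Python sketch of that idea, where regex rule hits are handed to the Bayesian stage as extra tokens alongside the ordinary words. The two rules and their names are made up for illustration; they are not real SpamAssassin rules, and this is not how SpamAssassin is actually wired together.

```python
import re

# Hypothetical regex rules; names and patterns are illustrative only.
RULES = {
    "OBFUSCATED_STARS": re.compile(r"\b(?:[A-Za-z]\*){2,}[A-Za-z]\b"),
    "CLICK_TO_REMOVE": re.compile(r"click here to be removed", re.IGNORECASE),
}


def extract_features(text):
    """Return word tokens plus one pseudo-token per regex rule that matched."""
    features = set(re.findall(r"[a-z0-9']+", text.lower()))
    for name, pattern in RULES.items():
        if pattern.search(text):
            # The rule name itself becomes a token the Bayesian stage can weight.
            features.add("RULE:" + name)
    return features


print(sorted(extract_features("Buy V*I*A*G*R*A now")))
```

The point of the design is that "RULE:OBFUSCATED_STARS" stays the same token no matter which word was obfuscated, so the filter can learn it once.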
Re:My approach by gowen · 2003-07-23 22:05 · Score: 1

It's not really the exploit that is the deciding factor, it's the ISP's response to the initial notification. If abuse@domain replies saying that they've informed the user that they've been trojaned, or taken some steps to prevent it happening again, they remain connected.

Usually, the only people who don't show such good faith are the "pink contract" ISPs you mention, and some large ISPs that lack the will to properly staff the abuse department.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:My approach by brlewis · 2003-07-24 02:53 · Score: 1

Ok, now that you're getting more concrete it sounds like you do understand how all this works. However, I don't think you've proven that a Bayesian filter can't keep up with spammers. In Better Bayesian Filtering, Paul Graham talks about degeneration, where if a token isn't found in the corpus, you fall back on a simpler version of it, e.g. fewer exclamation points. Add a couple simple degeneration techniques and you'll catch PcarEcatNcabIcanS and pppppeeeeennnniiisssxyqhh.
Basically, there are two directions a spammer can go to bypass the Bayesian filter's favorite keywords. Either they break words up so that the filter sees shorter tokens, or they do something else that makes the filter see longer tokens. Shorter non-word tokens will just end up being learned as spam markers. Longer tokens can be handled via n-tuplets plus a few degeneration techniques. I think the number of different ways you can obfuscate words but still keep them recognizable to humans is limited.
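To make the degeneration idea concrete, here is a minimal Python sketch. It is only a guess at what "a few degeneration techniques" might look like, not Paul Graham's actual rules: when a token is unknown, fall back to a version with repeated letters collapsed, a version keeping only the capital letters, and the character n-tuplets of each. All function names are invented for this example.

```python
import re


def collapse_runs(token):
    """Collapse each run of a repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", token.lower())


def uppercase_skeleton(token):
    """Keep only the capital letters, for tokens padded with lowercase filler."""
    return "".join(ch for ch in token if ch.isupper()).lower()


def char_ngrams(text, n):
    """All substrings of length n (empty set if the text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def fallback_tokens(token, n=5):
    """Simpler forms of an unknown token, plus the n-tuplets of each form."""
    forms = {token.lower(), collapse_runs(token), uppercase_skeleton(token)}
    forms.discard("")
    tokens = set(forms)
    for form in forms:
        tokens |= char_ngrams(form, n)
    return tokens


for sample in ("PcarEcatNcabIcanS", "pppppeeeeennnniiisssxyqhh"):
    print(sample, "penis" in fallback_tokens(sample))
```

Collapsing runs also mangles legitimate words with double letters ("letter" becomes "leter"), so these forms only make sense as a fallback for tokens the corpus has never seen.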

The ultimate spam filter defeater. by Anonymous Coward · 2003-07-22 23:59 · Score: 3, Funny

I've often had spam get past every one of my filters, simply by being an innocuous subject (something like "Hi there, how's it going") and then a message body completely empty of any content.

I thought that was a pretty impressive attempt by those nifty spammers. Cut out all the bits of spam I ignore (such as offering me crap, giving me html email, popups etc) but keeping the bits I really hate (getting pissed off at receiving spam at all)

Well done kids, hope you keep it up!

Re:The ultimate spam filter defeater. by Anonymous Coward · 2003-07-23 00:02 · Score: 0

Whatever you do, DO NOT mark these messages as spam. For learning-filters they simply make it so much more likely the spam filter will get false positives, and junk needed messages. It's a spammer's way of undermining consumer confidence in spam filters, and making it more likely you'll have to read through messages yourself.

Wha? by BabyDave · 2003-07-23 00:00 · Score: 1

So ... this is about techniques to grow spam in fields? Come on guys, April Fool's Day was months ago, and nobody would believe something that stupid!

"so that anti-spam filters improve" by ih8apple · 2003-07-23 00:00 · Score: 1

what really needs to happen is to make spam an unprofitable business somehow...improving filters will just continue the battle between spammers and filter makers indefinitely...as long as they're making $$$ from the .00001% of people who actually click on the links and generate money, the battle will never end.

--

Why do I h8 apple?

Render the HTML then use OCR by thelandp · 2003-07-23 00:00 · Score: 5, Interesting

Here's a crazy idea... (but is it crazy enough?)

All of these spamming techniques seem to involve visual tricks, because a human looking at the rendered HTML sees something very different from what the filter sees in the raw text. Things like zero-height fonts, or white-on-white text, or just using one big image etc. etc.

So how about this: I think every single one of these tricks would be defeated by using this process for filtering spam:

1. Render the html to an image (not on the screen, just behind the scenes)
2. Feed the image into OCR
3. Then scan the OCR text for spam

Sure OCR is not perfect, but since these techniques are imprecise already, maybe it would work well.

Although I guess processing power is a limiting factor, but maybe someday this will be worth doing.

--

-- the only thing we have to fear is really scary things

Re:Render the HTML then use OCR by dillkvast · 2003-07-23 00:46 · Score: 1

This may actually be something worth looking into. A lot of today's spam embeds the message in images as well. This makes traditional Bayesian filtering impossible, because the rest of the mail can be "poisoned" with legitimate words.

However, combining e.g. Clara OCR with an HTML renderer and a Bayesian filtering suite might work. Of course this will require a lot more computing power.

--
Scitne aliquis remedium potimum crapulae?
Re:Render the HTML then use OCR by iangoldby · 2003-07-23 00:50 · Score: 1

It would be easier to pipe the HTML through lynx before feeding it into a spam filter. That wouldn't get rid of the zero-size font tricks, but a spam filter could easily be trained to see spaces as not significant.

Having said that, I have a very effective spam filter that simply rejects any message that contains the string 'att1.htm', but none of the letters e, i, o, or u. In effect, any message sent as just HTML (i.e. no plaintext version) gets rejected as spam. I've never had a false positive with this, as Outlook always includes a plaintext version for non-HTML mail clients.

More recently, spammers have started including the plaintext string "This message contains an HTML formatted message but your email client does not support the display of HTML. Please view this message in a different mail client or forward this email to a web-based mail system." Of course, that's trivially easy to detect as spam.
Re:Render the HTML then use OCR by hacker · 2003-07-23 00:56 · Score: 4, Interesting

You could also just take the HTML, run it through a series of Perl modules (XML::LibXML, HTML::Lint, HTML::Clean, HTML::FormatText, etc.) and return just the textual representation of the content itself, and then scan/score that.
Doing so would then compress whitespace, remove colors, and basically un-SPAM the SPAM. I do this for web content, which I need re-rendered as text-based articles before they are sent to the client. It's about 12 lines of Perl, and can be easily stuffed into a SpamAssassin milter. If you want some working code, feel free to contact me (I'm also for hire, so I can do this as a custom gig for you or your company).
In fact, you could probably put a small function in your milter to just strip all HTML entirely, before the client ever sees it. There's no need to use OCR (and the overhead associated with it) to handle this, just turn the HTML back into text. It works with foreign, encoded, obfuscated entities, and should be no problem to correct before scoring.
Re:Render the HTML then use OCR by Zocalo · 2003-07-23 00:58 · Score: 2, Insightful
Alternatively, you could just make the HTML parser aware of the tricks via some easily extensible mechanism and run the spam content detector on the output. For example:
1. Receive HTML email
2. Remove any HTML comments
3. Remove any "non-standard" tags
4. Remove any redundant tags ( Viagra )
5. Remove...
6. Pass remnants to content filtering app.
On the other hand, any HTML email with an excessive HTML comment to content ratio is almost certainly spam anyway, and should probably be discarded as a result.
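The steps above could be sketched in Python roughly like this, using only the standard library's html.parser. It drops comments and tags, keeps the text, and reports the comment-to-content ratio. The class and function names are invented here, and any cutoff you apply to the ratio would have to be tuned on real mail.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Collect text content while counting how much of the mail is comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.comment_chars = 0

    def handle_data(self, data):
        self.text_parts.append(data)

    def handle_comment(self, data):
        self.comment_chars += len(data)


def strip_and_score(html):
    """Return (text with tags and comments removed, comment/content ratio)."""
    parser = CommentStripper()
    parser.feed(html)
    parser.close()
    text = "".join(parser.text_parts)
    visible = len(text.strip())
    ratio = parser.comment_chars / max(visible, 1)
    return text, ratio


text, ratio = strip_and_score("<p>Via<!--lunch-->g<b></b>ra</p>")
print(text, round(ratio, 2))
```

The text goes on to the content filter; the ratio can be used as a cheap signal of its own before any heavier analysis runs.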
--
UNIX? They're not even circumcised! Savages!
Re:Render the HTML then use OCR by fleafan · 2003-07-23 01:07 · Score: 1

The idea isn't crazy at all. Just silly. Firstly, consider the amount of processing power an ISP (or your own mail client) would need to parse, render, OCR and analyse every single email. I worked most of last year on a project that involved OCR'ing extremely large amounts (1.000.000 words/day) of scanned text, and I can tell you that (good quality) OCR is a pretty time-consuming process.
Secondly, it's not very hard to fool the machine that does the OCR. I doubt that it would catch things like 'F*R*E*E' and similar visual stunts.
Re:Render the HTML then use OCR by kris · 2003-07-23 01:51 · Score: 1

Why?

If anything contains that many tags, that many entities, that many accented characters, then it surely is spam. There is no need at all to decode it. You just drop it. Quickly.

Kristian
Re:Render the HTML then use OCR by DukeyToo · 2003-07-23 01:52 · Score: 1

I think you are on the right track, but OCR is not the way to go, since it can easily be confused, plus it is very processor intensive. Any technique that we apply to filtering spam has to be resistant to being confused by the next generation of spam.

A better idea (I think) is to have an HTML "normalizer", i.e. a parser that takes the HTML and simplifies it into its simplest representation. This would get rid of many of the HTML tricks like entities, white text, etc.

This problem has already been addressed in a simple form in an international programming competition (earlier this year I think). I do not recall the URL, but the task was to simplify a document containing a subset of HTML-type tags into its simplest form.

Even a simple HTML validator would be helpful, because some of the tricks involve invalid HTML. If the HTML in a message is invalid then it is unlikely to be non-spam.

As for the Javascript trick, IMO Javascript has no place in emails.

PS: Kudos to ActiveState & Dr JGC, excellent stuff.

--
Most writers regard truth as their most valuable possession, and therefore are most economical in its use - Mark Twain
Re:Render the HTML then use OCR by babbage · 2003-07-23 01:56 · Score: 2, Informative

Surely you aren't suggesting that it makes sense to OCR all the massive volume of mail that the average email server has to process every day, are you? That's like advocating a tactic that is bigger, slower, and not likely to be much more effective than just calling in a couple of lightweight Perl modules to get the same result.
The main problem that OCR would solve is when the text is contained in an image file, but it really wouldn't solve it. OCR would break down for the same reason that the new wave of "a word appears in distorted text in this image, type that word below to proceed" tests that some sites are beginning to use works: picking text out of an image file can be a very tricky problem if that image wasn't made for readability (as most web graphics aren't). Rather, I'd argue that the very presence of one big image & no supporting text is a strong spam indicator, and you can go with that assumption without having to bring in the heavy OCR machinery (which might or might not be right anyway).
I've been thinking that, if the idea is for spam filters to work on what the human sees, then the natural tool to use would be the standard html renderer that already is fine tuned for turning html (even wacky html) into rendered text. Rather than OCR, find a way to hook Gecko or KHTML into SpamAssassin and take it from there.
The problem with this though is the same as the OCR problem, though I'm guessing not as extreme: embedding a full featured html engine inside of a network level spam filter is a massive amount of overhead to add to a process that needs to be able to handle massive realtime throughput.
A more clever approach is to skip it and say that HTML itself is a spam indicator, if not an absolute one. But then there's a fine line to be found in determining which HTML mails are kosher & which aren't without resorting to a very heavy & still imperfect solution like Gecko or OCR. If it's all an image, trash it, but anything in between is going to take some strategy (and anything in between shouldn't need OCR).

--
DO NOT LEAVE IT IS NOT REAL
Re:Render the HTML then use OCR by Anonymous Coward · 2003-07-23 02:36 · Score: 1, Insightful

That's a better idea but the spammers will find a way around it. If the spammer learns the filter then he can program (in javascript, vbscript) to display the text using a timer so when the spam filter takes the snapshot it won't get the spam. Is there any way around this??
Re:Render the HTML then use OCR by Anonymous Coward · 2003-07-23 03:48 · Score: 0

Yes. No legit E-mail will be using JavaScript or VBScript. All such messages can be immediately tagged as spam. So it's simple to get around these kinds of tricks.. just remove all the html tags, convert html characters into text and then check out the E-mail using Bayesian filtering. I'm surprised that filters don't already do this.
Re:Render the HTML then use OCR by Ben+Hutchings · 2003-07-23 04:30 · Score: 2, Interesting

Doing so would then compress whitespace, remove colors, and basically un-SPAM the SPAM.

That would defeat obfuscation of spam keywords. However, many of the tricks (such as using identical or similar colours for text and background) are ways to include un-spammy text that the filter will see but the human recipient won't. Converting to plain text leaves them in, but they should actually be ignored.
Re:Render the HTML then use OCR by hacker · 2003-07-23 05:20 · Score: 1

Converting to plain text leaves them in, but they should actually be ignored.

Except that it doesn't.
Text is text, not color. I don't think you've actually tried this at all, so you aren't speaking from a position of knowledge. You aren't just "flattening" HTML, you're converting it to basically the equivalent of what a cut-n-paste in a browser view of the email would provide, i.e. text, and only text. Not comments, not color, not formatting. Text.
Where is the spec that defines color in 7-bit ascii text?
Re:Render the HTML then use OCR by JohnGrahamCumming · 2003-07-23 05:33 · Score: 1

This would not work well because HTML::FormatText removes all the color and fonts that spammers are using to hide text and doesn't render tables.

John.
Re:Render the HTML then use OCR by Anonymous Coward · 2003-07-23 05:59 · Score: 0

but rendering the email means pulling all the images and text and formatting it. just by fetching the images you are giving the spammers money. they would probably count those as an opened email, and guess what. now you get more spam. not a good idea. i say, drop all html email. or parse out all the html tags. or just bounce all html mail, with a reply stating the company's email policy (with link): TEXT EMAIL ONLY!!!! simple, effective, clean, and user annoying...

anyone need a out of work cisco CCNP, i do openbsd/linux servers, email, imap, pop, ssl, listservers, nntp, bgp, ospf, and more.... a dime a dozen?
Re:Render the HTML then use OCR by hacker · 2003-07-23 06:02 · Score: 1

This would not work well because HTML::FormatText removes all the color and fonts that spammers are using to hide text and doesn't render tables.

Sigh.
If the text that the spammer is trying to hide is inside the html tags themselves, the user isn't going to see it anyway. What text are you referring to?
If the spammer uses something like:
HA!

That would render as:
HA!

When converted to text, from HTML. It doesn't render "white on white" as spaces in text. Have you actually read the POD on the module in question?
And yes, it does convert tables, but if you want actual "pretty" tables, you'd want to use HTML::TableExtract anyway, for that. I use these modules quite extensively in a lot of functional perl code, where they are passed HTML'esque code, and I've never seen them miss any visible text, including white on white or text with interspersed comment tags between them.
Re:Render the HTML then use OCR by JohnGrahamCumming · 2003-07-23 06:28 · Score: 1

Yes, I read the POD.

The problem is that white on white text needs to be turned into spaces. Since the user doesn't see the text it shouldn't be used for making a decision about whether the email is spam or not. Typically a spammer will do something like this:

Buy Viagra!
How are you?

Whatever renders the HTML needs to not include the words "How are you?" since they are not going to be displayed to the user and could poison the filter.

Yes, I read the POD, it's not very informative.

John.
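A minimal Python sketch of what "not including the invisible words" could mean: track the current font colour against the page background and drop any text where the two match. The font and bgcolor markup in the example is an assumption (the tags in the posts above did not survive), and the comparison is an exact string match, so it would miss near-matches like "#fffffe" on "white", CSS styling, and zero-size fonts.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect only text whose font colour differs from the background."""

    def __init__(self, default_bg="white", default_fg="black"):
        super().__init__(convert_charrefs=True)
        self.bg = default_bg
        self.fg_stack = [default_fg]
        self.visible = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body" and attrs.get("bgcolor"):
            self.bg = attrs["bgcolor"].lower()
        elif tag == "font":
            color = attrs.get("color")
            # A font tag without a colour inherits the enclosing colour.
            self.fg_stack.append(color.lower() if color else self.fg_stack[-1])

    def handle_endtag(self, tag):
        if tag == "font" and len(self.fg_stack) > 1:
            self.fg_stack.pop()

    def handle_data(self, data):
        if self.fg_stack[-1] != self.bg:
            self.visible.append(data)


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.visible).split())


sample = ('<body bgcolor="white"><font color="red">Buy Viagra!</font><br>'
          '<font color="white">How are you?</font></body>')
print(visible_text(sample))
```

Only the text that survives this pass would be handed to the filter, so the hidden "good" words never get a chance to dilute the score.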
Re:Render the HTML then use OCR by JohnGrahamCumming · 2003-07-23 06:42 · Score: 1

And here's a concrete example...

Run the following:

require HTML::TreeBuilder;
$tree = HTML::TreeBuilder->new->parse_file("test.html") ;

require HTML::FormatText;
$formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
print $formatter->format($tree);

with the following input (which illustrates two tricks: invisible ink and the slice and dice table trick):

Buy Viagra!

It was nice seeing you last night

<table border=0 cellpadding=0 cellspacing=0>
<tr valign=top>
<td>V s F</td>
<td>i a R</td>
<td>a m E</td>
<td>g p E</td>
<td>r l</td>
<td>a e</td>
<td>  s</td>
</tr>
</table>

and you get the output:

Buy Viagra!

It was nice seeing you last night

V
s
F

i
a
R

a
m
E

g
p
E

r
l

a
e

s

So the invisible text is now visible and ready to fool the spam filter and the table has essentially been ignored.

John.
Re:Render the HTML then use OCR by hacker · 2003-07-23 07:52 · Score: 1

While I could write all of your code for you, I won't. Note too, that we were talking about using Perl to un-SPAM the spam. At no point was it suggested by anyone, that one set of Perl, could solve ALL of the SPAM tricks proposed in the article. Each one requires special attention and testing. That being said, maybe this will give you some more ideas. If I spend another 10 minutes on this, with the addition of HTML::TableExtract, I bet I could easily replicate that back as normal text again, in non-Slice-n-Diced format.

use strict;
use HTML::Entities;
use HTML::TokeParser;
use HTML::Strip;

my %verb = (S  => 4,   # start tag
            E  => 2,   # end tag
            T  => 1,   # text element
            C  => 1,   # comment
            D  => 1,   # declaration
            PI => 2);  # processing instruction

my $p = HTML::TokeParser->new(\$content);
my $nff_content;
while( my $t = $p->get_token ) {
    if ($t->[0] eq 'S' and $t->[1] eq 'font') {
        my $attr = $t->[2];
        delete $attr->{face};
        my $attributes = join(" ", map {qq{$_="$attr->{$_}"}} keys %$attr);
        $nff_content .= "";
    } else {
        $nff_content .= $t->[$verb{ $t->[0]}];
    }
}

my $decoded = decode_entities($nff_content);
my $hs = HTML::Strip->new();
my $clean_text = $hs->parse($decoded);
$hs->eof();
print $clean_text;
Re:Render the HTML then use OCR by Ben+Hutchings · 2003-07-23 07:57 · Score: 1

You don't seem to understand the problem.

The problem is that spammers can fool many filters by including "good" text in their spam that won't be visible on the screen. The filter does effectively read it as plain text, and they take advantage of this.

Now if what you're saying is that the result of this conversion should be delivered instead of the original message body, rather than just used for filtering, I see your point. However, I really don't think spammers are hugely concerned with readability, so it wouldn't deter them or help the recipients much.
Re:Render the HTML then use OCR by chathamhouse · 2003-07-23 12:23 · Score: 1

This is slashdot at its best.

Thanks for bringing a major utility of these modules to my (and others') attention.
Re:Render the HTML then use OCR by Tokerat · 2003-07-23 16:11 · Score: 1

What about hashing the raw e-mail body ASCII before doing OCR? That way, the hash could be compared against a database of known hashed-and-determined-to-be-spam e-mails. Since 90% of the e-mail on the net is spam, this will save the time of having to OCR a good majority of the e-mail.
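Something like the following Python sketch, say. The SpamHashDB class and the light whitespace and case normalisation are assumptions made for the example; a real system would share hashes between servers rather than keep a local set.

```python
import hashlib


def body_hash(body):
    """Hash the body after collapsing whitespace and lowercasing it."""
    normalised = " ".join(body.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class SpamHashDB:
    """In-memory stand-in for a database of known-spam body hashes."""

    def __init__(self):
        self._hashes = set()

    def report_spam(self, body):
        self._hashes.add(body_hash(body))

    def is_known_spam(self, body):
        return body_hash(body) in self._hashes


db = SpamHashDB()
db.report_spam("Buy now!\n Limited offer")
print(db.is_known_spam("buy   now! limited OFFER"))     # same text, reflowed
print(db.is_known_spam("Buy now! Limited offer, Bob"))  # one word added
```

Only mail that misses the hash lookup would go on to the expensive OCR step. Note that the lookup is exact, so a single changed word produces a different hash.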

Still, for this to be realistically implemented, I would say the mail server should be broken into two machines (or perhaps a single dual-processor machine), each of which would take the task of traditional mail server, and OCR scanner, respectively. This, in conjunction with traditional methods like SpamAssassin, could prove to be quite helpful.

Of course, then you have the additional problem of loading the actual images...sometimes e-mail images are included as a link to a webserver, and just by loading the image, the spammer can check his logs and know your address is legit...by loading the images to OCR them for spam checks, you're basically validating the spammer's e-mail database for him. Of course, if this kind of scanning becomes commonplace, images could also be confirmed from e-mail addresses that do not exist, as opposed to those that do exist, while those that really are there get filtered out based on results from the bad addresses (i.e. if a good address gets mail with an unmatched hash, do not try and load it for scanning until either a bad address gets it, or a predetermined amount of time has passed). That way, only bad addresses will be confirmed and the good addresses can have the spam filtered out before it reaches them! Not only do you keep the crap out of people's inboxes, but you can render a spammer's database completely invalid!

--
CAn'T CompreHend SARcaSm?
Re:Render the HTML then use OCR by ron_ivi · 2003-07-23 22:01 · Score: 1

Actually, I thought practically all emails with important messages embedded in images are spam. For that matter practically all html email is spam as well... I'm having great luck with a spam filter that simply bounces HTML to a junk-mail folder.
Re:Render the HTML then use OCR by dillkvast · 2003-07-23 22:25 · Score: 0

You are right... But, how can you see if there is a message in the image, or if it's just somebody sending you some pictures from yesterday's party or a screenshot? If you have "non-geek" friends, chances are they will be using (iack)Outlook which happily sends HTML.

--
Scitne aliquis remedium potimum crapulae?
Re:Render the HTML then use OCR by babbage · 2003-07-24 02:30 · Score: 1

What about hashing the raw e-mail body ASCII before doing OCR? That way, the hash could be compared against a database of known hashed-and-determined-to-be-spam e-mails. Since 90% of the e-mail on the net is spam, this will save the time of having to OCR a good majority of the e-mail.

The problem with that approach is what has been called the "Chinese menu" or "Mad Libs" approach to spam message generation: create a framework with a series of variables, then snap them together randomly to create a ridiculously large variety of permutations out of a very small collection of variables & values. So to use this as a spam author, come up with a general template for your message, such as this pseudo-XML-ish one:

<salutation />,
<appeal>
<pitch>
<testimonials1>
<offer>
<testimonials2>
<purchase>

And then you come up with a handful of things to use for each position in the template. In this example, where I've defined just seven positions, setting just five options for each of those positions would yield 5^7 combinations, or 78,125. Setting a sixth option for everything yields 6^7, or 279,936 combinations; setting a seventh yields 7^7, or 823,543.
If the order of elements in the template can happen randomly, the number of possible combinations rises even faster. My math skills lose a bit of steam here, but I'm guessing it will be figures like you see above, multiplied by the factorial of the template size -- which for a 7 element template as here, is 5040. So 78,125 × 5040 or 393,750,000 for the five values scenario, 279,936 × 5040 or 1,410,877,440 for the six values scenario, and 823,543 × 5040 or 4,150,656,720 for the seven values scenario. (Corrections to my math welcome, but I'm pretty sure that my general approach here is sound.)
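For anyone who wants to check the arithmetic: with k options at each of n template positions there are k^n fixed-order messages, and at most n! times that many if the positions can also be shuffled. That upper bound assumes every ordering produces a distinct message. A small Python sketch, with function names invented for the example:

```python
from itertools import product
from math import factorial


def fixed_order_variants(positions, options):
    """Messages possible when each position independently picks one option."""
    return options ** positions


def shuffled_variants(positions, options):
    """Upper bound when the positions may also appear in any order."""
    return fixed_order_variants(positions, options) * factorial(positions)


# Brute-force sanity check on a tiny template: 2 positions, 3 options each.
assert len(list(product(range(3), repeat=2))) == fixed_order_variants(2, 3)

for k in (5, 6, 7):
    print(k, fixed_order_variants(7, k), shuffled_variants(7, k))
```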
And that's just a trivial way to scramble the contents of one spam.
If spammers were to use tricks like this -- and they do -- then hashing techniques are only going to be partially effective. That's not to say that they are useless -- I understand that the DCC, Razor, and Pyzor spam tools all use checksums, and they do help. But it seems to me that the "Chinese menu" approach is ultimately going to defeat any strategy where hashing is the main or only component. That's why SpamAssassin allows you to optionally use one or more checksum tools as an auxiliary to the suite of tests that are already applied, allowing checksums to lend or remove confidence from a spam estimate, but never enough to decide by just that criterion.
Incidentally, my understanding of the way Bayes works is that it's a checksum technique exactly suited to the Chinese menu problem: decompose the message into a set of "features", each of a couple of words or so, and then come up with a confidence value for each of those features. (This is also the approach used by Bill Yerazunis' CRM114 & MailFilter tools.) The average value for all your message's features is calculated, resulting in an overall confidence percentage which can then be -- in SpamAssassin's case -- passed off for use as part of the overall evaluation.
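One caveat on the "average": the combining rule in Paul Graham's "A Plan for Spam" is not a plain mean. It multiplies the per-feature spam probabilities and normalises against the product of their complements. A minimal Python sketch of that rule follows; the per-token probabilities are invented for the example, and other filters may combine their scores differently.

```python
from math import prod


def combined_spam_probability(token_probs, floor=0.01, ceiling=0.99):
    """Combine per-token spam probabilities, Graham-style.

    Each probability is clamped away from 0 and 1 so that a single token
    can never force the result (or cause a division by zero).
    """
    clamped = [min(max(p, floor), ceiling) for p in token_probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


print(combined_spam_probability([0.99, 0.99, 0.2]))  # two strong spam tokens
print(combined_spam_probability([0.9, 0.1]))         # evidence cancels out
print(combined_spam_probability([0.2, 0.1]))         # looks legitimate
```

Unlike an average, a handful of strongly spammy features pushes the result close to 1 even when some neutral features are mixed in.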
But anyway, I don't see how bringing OCR into the picture helps at all. It's still a huge amount of overhead to be using, and I'd argue that the results it comes up with can't be any more reliable than standalone checksums &/or Bayes analysis. The additional work might be worthwhile IFF you stand a reasonable chance of coming up with a definitive answer by doing so, but I don't think that's likely to happen -- Bayes itself isn't bulletproof, and I wouldn't expect Bayes analysis of possibly mis-scanned text to be any more reliable than analysis of plaintext.

Of course, then you have the additional problem of loading the actual images...sometimes e-mai

--
DO NOT LEAVE IT IS NOT REAL
Re:Render the HTML then use OCR by ron_ivi · 2003-07-24 07:47 · Score: 1

i've got explicit "this is not spam" filters that I add people to when they say "why didn't you reply to my html email".
Basically I went from really complicated keyword based filtering with different weights on each keyword to just these rules.
if (known-good-list) { return notspam; } elsif (html) { return spam; } else { return notspam; }
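Spelled out as a runnable Python sketch with the standard email module, that rule might look like the following. The whitelist address is a placeholder, and matching on the From header is an assumption; From is trivially forged, so this is only as good as the whitelist is obscure.

```python
from email import message_from_string
from email.utils import parseaddr

KNOWN_GOOD = {"mom@example.com"}  # hypothetical whitelist


def contains_html(msg):
    """True if any part of the message is declared as text/html."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def classify(raw_message, known_good=KNOWN_GOOD):
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in known_good:
        return "notspam"
    if contains_html(msg):
        return "spam"
    return "notspam"


html_mail = "From: Stranger <x@example.net>\nContent-Type: text/html\n\n<b>hi</b>\n"
print(classify(html_mail))
```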
Re:Render the HTML then use OCR by plover · 2003-07-25 02:44 · Score: 1

"TEXT EMAIL ONLY" will not mean anything to anyone.
Less than 5% of the people I know (and I am geek-heavy in friends) understand the difference between HTML and "TEXT EMAIL". If you say "TEXT EMAIL ONLY" the rest of these people will think "he doesn't want pictures? But I didn't send him a picture! WTF?" You have to think outside your immediate friends box and include non-tech people like your mom, your uncle in Rhode Island, human resources drones at large corporations who are replying to your resumes, etc.
Most of the world is not clueful enough to be able to tell the difference, and that's why mechanically interpreting the HTML for a spam filter is important.
Besides, the parent poster you replied to is not "rendering" the HTML, he's merely parsing it. No links will be followed, no images retrieved. The parent he replied to suggested rendering the HTML and OCRing it, which would not be the wisest action to take for precisely the reasons you mentioned.

--
John
Re:Render the HTML then use OCR by Chiwo · 2003-07-29 10:55 · Score: 1

Or use "lynx -dump".

Re: SPAM by ftvcs · 2003-07-23 00:01 · Score: 3, Funny

You mean the "Search Pattern Assessment Model" method?

Like hacking books... by mgcsinc · 2003-07-23 00:01 · Score: 1

Anyone see this being helpful to both spammer and spamee?

insider help is the key. by professorhojo · 2003-07-23 00:01 · Score: 5, Interesting

i had a friend who recently turned to the dark side and now boasts that his circle of friends include the biggest spammers in the world.

and believe it or not, the biggest break these guys have had in the past year has been help from people on the "inside".

to give you an example, an ex-AOL employee has written these guys a little proggy to send messages that make the AOL mailservers think that the mail originated on the inside of the network (which means that none of it is spam checked or filtered.)

their usual 10% deliverability to AOL.com suddenly went to 100%. make no mistake -- that was worth millions to 'em.

Re:insider help is the key. by Anonym0us+Cow+Herd · 2003-07-23 01:29 · Score: 2, Insightful

that was worth millions to 'em.

I am skeptical that spammers have millions.

If you really could get rich as a spammer, then everyone would be doing it. It would be too good to be true. Sort of like free P2P music. Everyone would be doing it.

If they had millions, there are far more effective ways to advertise whatever legitimate product that people are buying in such volume as to make them their millions. Or were you referring to millions of Iraqi Dinars?

--
The price of freedom is eternal litigation.
Re:insider help is the key. by Lord_Dweomer · 2003-07-23 02:41 · Score: 3, Funny

"i had a friend who recently turned to the dark side and now boasts that his circle of friends include the biggest spammers in the world. "
Could you please post his name and address? You don't have to do anything to him, I'm sure Slashdot will take care of it. Its not like it would be bad...we'd just be giving him the opportunity to receive many great offers on products he may be interested in.

--
Buy Steampunk Clothing Online!
Re:insider help is the key. by MntlChaos · 2003-07-23 03:41 · Score: 1

nah. millions of russian rubles!
Re:insider help is the key. by Reziac · 2003-07-23 04:58 · Score: 1

A while back there was some stuff posted here, linked to real articles, about how much money spammers can make. One American spammer claimed income as high as $10,000 per DAY. When filters and such cut them down to only a grand or so a day, he whined and complained about how spam filters were ruining his business.

So yes, you CAN get rich as a spammer. Of course, everyone else in the world will be out for your blood, but that's the price of being an evil email overlord.

--
~REZ~ #43301. Who'd fake being me anyway?
Re:insider help is the key. by professorhojo · 2003-07-23 04:59 · Score: 1

collectively - it was worth millions this year alone.

if you don't think the big boys are earning hundreds of thousands of dollars a year spamming then you're absolutely dreaming.
If you really could get rich as a spammer, then everyone would be doing it
You can - but it's hard to get right. My friend went through spam "setups" in Latvia, Russia, Central America and finally China. Once you get it right, the dollars roll in.

If spam didn't work, no-one would be doing it.
Re:insider help is the key. by Anonymous Coward · 2003-07-23 06:03 · Score: 0

so what are their names?
we would all be very interested in verifying your information!

so give it up!
f*uck them up their stupid asses...
Re:insider help is the key. by retneprac · 2003-07-23 09:27 · Score: 1

Would those products be knuckle sandwiches?

Easy Solution by grennis · 2003-07-23 00:02 · Score: 3, Interesting

If you try to keep up with HTML tag tricks, you will always be one step behind.

Why not have your spam filter render the HTML in an offscreen buffer (using existing browser/plugin APIs), then pull the straight text out of the rendered document and run the filter on that?

Re:Easy Solution by iapetus · 2003-07-23 00:31 · Score: 3, Interesting

Why not just ditch the whole sorry concept of HTML e-mails? Seems like a better solution to me. Can't quite do that yet, but as a bare minimum HTML image tags (and anything else that makes a request automatically to a remote server, thus confirming the validity of your e-mail address) should be ignored.

--
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Re:Easy Solution by Urkki · 2003-07-23 01:08 · Score: 1

Nice idea. But it's too late. *You* can ditch HTML mail, but that might lose you some non-spam mails. And there's no way to force everybody to ditch HTML mail.
Re:Easy Solution by iapetus · 2003-07-23 01:12 · Score: 1

Sounds fair to me. *They* can have the spam.

Seriously, I can count on the fingers of one hand the number of sources of HTML e-mail that I'm actually interested in, and a simple whitelist should give reasonable results on letting those through and keeping the spammers out.

--
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Re:Easy Solution by Technician · 2003-07-23 01:45 · Score: 3, Insightful

spam filter render the HTML

NEVER! Why would I want my client or server to validate my address by visiting their site to fetch some visual? I'd rather have it show up as a dead letter, unopened and deleted.

--
The truth shall set you free!
Re:Easy Solution by Anonymous Coward · 2003-07-23 01:53 · Score: 0

Why would I want my client or server to validate my address by visiting their site to fetch some visual? I'd rather have it show up as a dead letter, unopened and deleted.
Well, you can have a half-way-house, where the text is rendered and sent to the filter as ASCII, but no external images are fetched...
Re:Easy Solution by tgibbs · 2003-07-23 06:27 · Score: 1

Why not have your spam filter render the HTML in an offscreen buffer (using existing browser/plugin APIs), then pull the straight text out of the rendered document and run the filter on that?
Much harder than it sounds. Humans are much better at recognizing text than any program, so fooling recognition of rendered text would be even easier than fooling a Bayesian filter. It would be trivial to develop a font that could be read by a human but not even seen as text by a recognition algorithm.
Re:Easy Solution by grennis · 2003-07-23 06:39 · Score: 1

That's not what I meant... What I mean is render the HTML offscreen (imagine a hidden IE window for example). Then do the equivalent of "File.. Save As... *.txt" and convert it to a .txt document. No images, no fonts. Then run the filter on this text. Bingo!!! Any images will be (and should be) ignored.
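
For what it's worth, the text-only half of this can be sketched without a hidden browser window. What follows is a minimal sketch, assuming Python's standard html.parser stands in for the offscreen renderer; the names TextOnly and visible_text are invented for illustration, and nothing in it fetches images or any other remote content.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data; ignore tags, comments, scripts and styles."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tags themselves (including <img>) are dropped; nothing is fetched.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def visible_text(html):
    """Return the character data of an HTML fragment with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Fed a fragment like `<b>V</b><!-- x -->ia<font size=1>g</font>ra`, this hands the filter the single word "Viagra" instead of four scattered pieces. It does nothing for a message whose text exists only inside a picture.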
Re:Easy Solution by tgibbs · 2003-07-23 06:47 · Score: 1

That's not what I meant... What I mean is render the HTML offscreen (imagine a hidden IE window for example). Then do the equivalent of "File.. Save As... *.txt" and convert it to a .txt document. No images, no fonts. Then run the filter on this text. Bingo!!! Any images will be (and should be) ignored.
A program can certainly analyze the HTML code (which actually is text, so no rendering is required). This will work for some of the HTML obfuscation tricks. But this won't do much for the most common spam I am currently receiving, in which the spam elements of the message are entirely contained within a graphic.

Bayesian Filters by Goo.cc · 2003-07-23 00:03 · Score: 1

I like using the Bayesian filter Bogofilter to filter out my spams. It works pretty well and I like the ideas behind it.

But there doesn't seem to be any testing on the effectiveness of one Bayesian filter in comparison to another (for example, Bogofilter's effectiveness compared to POPFile's), or to Bayesian Chain Rule filters such as CRM114's Mailfilter or Dspam.

Re: SPAM by wiggys · 2003-07-23 00:03 · Score: 1, Funny

Maybe it should stand for

"Stupid People Abusing Mail"

--

Sorry, but my karma just ran over your dogma.

Intresting article by WegianWarrior · 2003-07-23 00:03 · Score: 3, Insightful

who can possibly resist if the word "Free" is in red and bold? Well, me for starters. Still, this one line of the article, taken from the opening, describes a more serious problem: the fact that much spam uses so-called 'enchanted email', that is, HTML mail. For all the other bad things about that, the one thing I find most sinister is that it is easy to have the HTML code pull a picture or something from a remote server, thus making it easy to validate your e-mail address (logically, if you open the mail, the address they sent it to is active). In short, banning 'enchanted email' would lessen the amount of spam, as well as the bandwidth it steals.

Apart from that I got a chuckle out of the fact that spammers now seem to be speaking 1337:
Ze Foreign Accent
What: Replace letters with numbers or use nonsense accents
Example from the wild:

V1DE0 T4PE M0RTG4GE

Fántástìç -- eárn mõnéy thrôugh unçõlleçted judgments

The best spam filter - with the fewest false positives - is the one most people of common sense have between their ears. Anything else is merely sorting your mail according to a fixed set of rules.

--
Everything in the world is controlled by a small, evil group to which, unfortunately, no one you know belongs.

Re:Intresting article by DukeyToo · 2003-07-23 02:06 · Score: 2, Interesting

Actually, your last statement (or is it a tagline?) has been shown to be incorrect! Bayesian filters can actually be better at sorting mail than a live person. Probably because they do not use a fixed set of rules.

A while ago when I was researching mail classification techniques, I saw a study that compared the accuracy of some classification techniques. The study took mail that had been manually classified, and compared that to how several trained filters classified the mail.

They found, as a side-note, that the filter actually did a better job than the people they got to manually sort the mail!

I'm not much for the details, so no URL for the study, sorry :(

In any case, reading emails manually defeats the point, especially for my poor mom who is horrified by some of the messages she receives.

--
Most writers regard truth as their most valuable possession, and therefore are most economical in its use - Mark Twain
Re:Intresting article by Anonymous Coward · 2003-07-23 09:00 · Score: 0

The best spam filter - with the fewest false positives - is the one most people of common sense have between their ears.

Don't you have anything better for your grey matter to do than sort spam? I sure do, and that's the point.

Looking at the "tricks" by Advocadus+Diaboli · 2003-07-23 00:04 · Score: 1

shows me that most of the examples from the wild use HTML in their spam mails. So my tiny solution here in the office (behind a lousy working spam filter) is to redirect mails with content-type "text/html" to a spam folder, and yes 99.9% of it is really spam that can be thrown away. The other sort of spam that arrives here is encoded with Korean charset and also easy to filter out.

OK, deliberate mistake in my post by wiggys · 2003-07-23 00:06 · Score: 1

"Email containing words with your name in it, or words relating to your life or work, would be given a higher probability of being called spam."

Ok, I need a proof-reader (either that or an audited-edit feature, you listening Taco?). I meant to say

"Email containing words with your name in it, or words relating to your life or work, would be given a higher probability of being marked genuine."

--

Sorry, but my karma just ran over your dogma.

Re:OK, deliberate mistake in my post by LinuxLuvr · 2003-07-24 07:30 · Score: 1

It was a deliberate mistake?

--
Microsoft Works: Oxymoron of the year. ~ ^.^
Re:OK, deliberate mistake in my post by wiggys · 2003-07-24 08:57 · Score: 1

Yep! (I was being sarcastic)

--
Sorry, but my karma just ran over your dogma.

Re: SPAM by ftvcs · 2003-07-23 00:08 · Score: 0

How about
* Sad Person After Money
* Some People Are Morons
* Stupid Posts Are Meaningless
* Stop Posting Annoying Messages
* Stupid People Are Mandatory
* Suckers Protesting Against Mules
* Stupid Person At Machine
* Stupid Posters Advocating Maliciousness
* Sexual Perverts And Moneygrabbers
* Some Parts Are Meat
* SPiced hAM *
* Squirrels, Possums And Mice
* Strangled Parakeets Animal Manure
* Satan Posing As Man
* Seventy Percent Are Males
* Sprinkling Possibility As Mail

The official meaning of SPAM in terms of the Internet is "Self Promotional Advertising Message."

What a waste of effort by Zog+The+Undeniable · 2003-07-23 00:08 · Score: 3, Interesting

If spammers have to go to such great lengths - and some of this stuff is admittedly clever - to get spam through, has it not dawned on them that 99.9% of people don't want to receive it? Perhaps we should ignore the spammers and target the 0.1% of idiots who actually reply and end up buying "generic Viagra" and septic tank cleaner. It reminds me of that Simpsons Hallowe'en episode with the giant advertising figures destroying Springfield. If everyone ignores them, they will die.

I still favour going after the people paying the spammers rather than the spammers themselves...unlike the big spam rings, they at least have to be locatable, otherwise they'd never be able to sell you stuff.

--
When I am king, you will be first against the wall.

Re:What a waste of effort by Doctor7 · 2003-07-23 00:35 · Score: 1

The spammers themselves are well aware that nobody wants to receive it. They just do their best to make sure their customers don't know it. Similarly, they will promise to send out X million emails on the customer's behalf without mentioning that half of those will go to people in irrelevant countries.
Re:What a waste of effort by Mostly+a+lurker · 2003-07-23 00:59 · Score: 2, Insightful

Perhaps we should ignore the spammers and target the 0.1% of idiots who actually reply
It seems logical, but the economics of spam are such that even one sale per million e-mails gives them a big profit. No matter how many idiots you can reach to discourage from replying, there are still going to be some who fall through the cracks.
I do not think spam will ever be eliminated entirely. Eventually, though, mechanisms will be put in place to allow the situation to be brought under control. Perhaps something along the lines of ...
1. Most regular e-mail using encryption.
2. Spam detection of unencrypted e-mails built into the Internet infrastructure itself at various levels. The objective would be to identify spam attacks as soon (and as close to the original source) as possible. Methods analogous to those used today for control of DDOS attacks would then be employed.
Re:What a waste of effort by siamSam · 2003-07-23 02:02 · Score: 1

I agree wholeheartedly! I suggest we put together an email describing spam and asking those idiots (0.1% of us) not to reply to the emails. Hmmm.... but since we don't know who those 0.1% are, let's send the message to the other 99.9% as well. Sound good?

You don't happen to have a mail server we can do this with do you?

Spammers using the anti-spam tools by dimer0 · 2003-07-23 00:15 · Score: 4, Interesting

I helped this lady out who had a 100% opt-in mailing list, but some people weren't getting their mailings... We came to find out the emails were being flagged as spam, so I set up a dummy email account for her that took every inbound message, sent it through spamassassin (with verbose reports, etc) - and then sent the email back to her.

Now she can see if there's a problem with the headers, the content of the email, etc - so she tunes the email to get the lowest spamassassin score. (You know, the last major version of spamassassin took off points if you put your email client header as being Mozilla! Hah.. That one is gone now)..

This lady definitely isn't a spammer tho, just someone with a small mailing list of 100% opted-in people.

I'm sure spammers do the same thing. I would.

Re:Spammers using the anti-spam tools by AndroidCat · 2003-07-23 01:38 · Score: 1

a small mailing list of 100% opted-in people
She does have some sort of confirmation stage so that h4x0r-X can't "opt-in" a few thousand people, right?

--
One line blog. I hear that they're called Twitters now.
Re:Spammers using the anti-spam tools by Anonymous Coward · 2003-07-23 02:02 · Score: 0

100% Opt in? That's what all the spams I get say. You really need a double opt-in, where someone opts in, then you email them to make sure they really did opt in, then they reply to that. If you ever noticed, all the good mailing lists do that. It's the unscrupulous spammers that don't (including the marketers at the company I work for).
Re:Spammers using the anti-spam tools by leerpm · 2003-07-26 04:31 · Score: 1

I administer a mailing list for a small business that sells products online. The list is about 10,000 in size, and it is 100% opt-in. Though it's not a double opt-in (with a confirmation), all subscription requests by end users are queued. Then we process the queue every few days; if anything looks out of whack, we delete all the suspicious entries.
Re:Spammers using the anti-spam tools by AndroidCat · 2003-07-26 06:59 · Score: 1

Opt-in with confirmation isn't "double opt-in". That term was created by the Direct Marketing people to confuse the issue, and means whatever they say it means. (One said they send a "targeted lead" [scraped address] a "confirmation" and if there is no opt-out reply, you're double secret opt'ed-in. Riiight!)
Hopefully you'll catch any large scale forging, but you're still vulnerable to one at a time forging. Your list, your rules, I just hope it doesn't bite you in the future. (Most mailing list packages should support a confirmation handshake option of some kind.)
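
To make "confirmation handshake" concrete, here is a rough sketch of one way it could work, not how any particular mailing list package does it. The class name, the SECRET value and the HMAC-based token are all assumptions made for illustration; the point is only that an address stays pending until whoever reads that mailbox sends back a token a forger could not guess.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical server-side secret, never sent to anyone


def confirmation_token(address):
    """Derive an unguessable token for one address from the server secret."""
    return hmac.new(SECRET, address.lower().encode("utf-8"), hashlib.sha256).hexdigest()


class MailingList:
    def __init__(self):
        self.pending = set()
        self.confirmed = set()

    def request(self, address):
        """Record a sign-up request; the returned token would be mailed to the address."""
        addr = address.lower()
        self.pending.add(addr)
        return confirmation_token(addr)

    def confirm(self, address, token):
        """Subscribe only if the token mailed to this address comes back intact."""
        addr = address.lower()
        if addr in self.pending and hmac.compare_digest(token, confirmation_token(addr)):
            self.pending.discard(addr)
            self.confirmed.add(addr)
            return True
        return False
```

Someone who types a victim's address into the sign-up form never sees the token, so the victim gets at most one confirmation request and is never added to the list.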

--
One line blog. I hear that they're called Twitters now.

Notice this gem from MS. Re:Dirty Little Secret by Anonymous Coward · 2003-07-23 00:23 · Score: 0

Received: from ann.coward.com ([unix socket])
by ann.coward.com (Cyrus v2.1.11) with LMTP; Tue, 04 Mar 2003 04:25:32 -0600
X-Sieve: CMU Sieve 2.2
Return-Path:
Received: from HMT3-CLT1.hotmailtest3.com (hmt3-clt1.hotmailtest3.com [64.4.7.32])
by tandem.milestonerdl.com (8.12.8/8.12.7) with ESMTP id h24APV1C029223
for ; Tue, 4 Mar 2003 04:25:31 -0600 (CST)
(envelope-from Phonecalls@nootede.nl)
Received: from mail.nootede.nl ([61.11.79.215]) by HMT3-CLT1.hotmailtest3.com with Microsoft SMTPSVC(5.0.2195.4821);
Tue, 4 Mar 2003 02:38:55 -0800
Message-ID:
To:
From: "Life Savings"
Subject: Life Insurance up to 75% Off. Get a FREE Quote Now! 7983
Date: Tue, 04 Mar 2003 02:33:09 -2000
MIME-Version: 1.0
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-OriginalArrivalTime: 04 Mar 2003 10:39:01.0196 (UTC) FILETIME=[44E9F8C0:01C2E23A]

I'm glad my last name is not.... by te+amo · 2003-07-23 00:23 · Score: 1

...Graham-Cumming...(snicker)

Re:I'm glad my last name is not.... by JohnGrahamCumming · 2003-07-23 05:35 · Score: 1

Very funny. Not.

You are lucky though. There are a number of web sites that refuse to sign me up under my real name because it's "offensive".

John.

Conflict of interest by Anonymous Coward · 2003-07-23 00:27 · Score: 0

It's in their interests to disseminate this information. From ActiveState's own site:

[John Graham-Cumming has] responsibility for applying innovative email classification and learning
techniques to ActiveState's enterprise email filtering software, PureMessage

Yes, it will aid the competition, but it will aid the enemy more, thus increasing the threat, the perceived threat and the market.

Re:Conflict of interest by JohnGrahamCumming · 2003-07-23 05:37 · Score: 1

I don't buy that at all.

The spammers already know these tricks and so this does not increase the threat. From ActiveState's perspective it's good marketing because it gets news stories on Slashdot.

John.
Re:Conflict of interest by spumoni_fettuccini · 2003-07-23 07:20 · Score: 1

I had this same quandary...I posted a JE with what I think is a new method of spamming instead of seeing if it would make it as a story. I'd find it helpful if you had any feedback [please see journal 22 July].

--
-- Some days you're the dog; some days you're the hydrant.

Re: SPAM by Anonymous Coward · 2003-07-23 00:29 · Score: 5, Informative

The official meaning of SPAM in terms of the Internet is "Self Promotional Advertising Message."

Rubbish - that's an acronym after the fact. The real meaning is that receiving that sort of message is as annoying as having a bunch of Vikings shouting "spam, spam, spam, spam" and drowning out your conversation. Anyone tells you different, they're a n00b to the net and you should ignore them.

Use NOT for a filter by TheVampire · 2003-07-23 00:29 · Score: 2, Interesting

My filter works 100% of the time. If the mail does NOT include a certain series of letters and numbers, then the mail is deleted. The people that e-mail me know to include that in the mail, so their stuff gets through. Of course, if you want to subscribe to lists, then this sort of thing won't work.

Re:Use NOT for a filter by Fuzzums · 2003-07-23 00:43 · Score: 2, Interesting

also this will only work for private mail.
i can imagine a (not-spamming) commercial website telling people to put "qwerty" in their e-mail. not.

but the idea is whitelisting. only allow a selected group of people to send you mail.

for a company i can imagine the use of a html-form to "send" mail. for spammers it would be too much trouble to find a lot of those forms and write scripts to spam them.

--
Privacy is terrorism.
Re:Use NOT for a filter by TheVampire · 2003-07-23 01:36 · Score: 1

I have seen websites use this method, by listing an e-mail address and then mentioning that the subject of the mail must include a certain word or phrase to avoid being deleted.

Personally, I dislike using a web form to e-mail a company, mostly because they put a lot of "required" fields in them.
Re:Use NOT for a filter by Elvisisdead · 2003-07-23 01:51 · Score: 1

It works well for some things, though. Although not quite the same, Declan McCullagh's list always comes with "FC:" in the subject line, so I can filter it into a sub-directory for later reading. He does it as a responsible list owner, so his messages can be easily identified.

--

"Want in one hand and spit in the other and see which one fills up first." - My Dad
Re:Use NOT for a filter by Fuzzums · 2003-07-23 01:53 · Score: 1

required, shmequired indeed.

please enter your name...
clockety-click (type "Your name" enter). happy now ;)

--
Privacy is terrorism.

Not really by scottme · 2003-07-23 00:30 · Score: 1

I'm sure that is what the spammers hope and believe, but in fact most Bayesian filters associate a probability factor with each token or word, and they make a decision based on the set of tokens with the highest or lowest scores. For example, in Paul Graham's seminal A Plan for Spam he describes using only the 15 most significant tokens to make the determination of the message's spamminess. So it really doesn't help to try to bury words like "penis" or "viagra" in a mass of obscure or invented words, however large; the filters will ignore those and home in on the bad words.

In fact, the spammers' choice of obscure or invented words as padding is dumb. If they would use regular words such as do occur in the legitimate email you want to read, there's actually a chance that over time they could render Bayesian filters less potent, because the good words would become more associated with spam than with legitimate mail. Careful attention to the training corpus is needed to avoid this happening.
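
A sketch of the "15 most significant tokens" idea, for anyone who wants to poke at it. The combining rule and the 0.4 default for never-seen words follow my reading of Graham's essay; the function name is invented, the probabilities in token_probs would really come from training, and they are assumed here to lie strictly between 0 and 1.

```python
from math import prod


def spam_probability(tokens, token_probs, n=15, unknown=0.4):
    """Combine the n most 'interesting' token probabilities into one score.

    token_probs maps a token to the (assumed, pre-trained) probability that
    a message containing it is spam. Values must be strictly between 0 and 1.
    """
    probs = [token_probs.get(t, unknown) for t in set(tokens)]
    # "Interesting" means far from the neutral value 0.5, in either direction.
    top = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)
```

With at least 15 strongly scored tokens in a message, adding hundreds of invented padding words changes nothing, because none of them make it into the top 15. With fewer strong tokens the padding does start to get counted, so how much the trick helps depends on how well the rest of the message has been disguised.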

Re:Not really by Moryath · 2003-07-23 02:27 · Score: 2, Informative

You miss the point.

Yes, it assesses the email on the basis of "15 bad words", but it also assesses on the "15 good words" or words that indicate it's legitimate.

Chances are they have only one or two of the "bad" words (penis, viagra, v*i*a*g*r*a, etc...). Perhaps less once they munge it so that things are broken up into pieces. The HTML tricks are all designed so that the filter doesn't realize that you have one of the "bad" words split up into sections.

The insertion of "good" text is designed to try to trip 2-3 "nonspam" indicators, thus causing the filter to pass the mail as "good".

The insertion of the "good" text also serves, if you use a bayesian filter, to "poison" your filter so that legitimate mail using those same words has a tendency to get tagged as spam.

It's a three-pronged attack:
#1 -- munge out the bad words
#2 -- drop in "innocent" text to make it look legit
#3 -- send in such volume that the "innocent" text gets poisoned in the filter and starts causing false positives.

What they're really after, of course, is number 3; if they can cause enough false positives, people will turn off the filters again. That's why they think nothing of sending the same spam 500 times to the same person in three days: when they are using a technique like this, every spam that gets filtered and tagged as spam furthers goal #3.

I still say the best way to deal with spammers is with a good old non-technical solution: a two-by-four upside the head.

I noticed a new one recently by AssFace · 2003-07-23 00:36 · Score: 5, Interesting

It isn't that this new one that I saw was all that amazing an idea, I just hadn't seen it until recently. It is such an obvious idea that I don't know why I haven't seen it until more recently.

They send the mail as you. Fake the headers and make it look like it is from you. To you. From you.

I had our local setup here allowing in anything that was from our domain. Now I have to stop that.

I suppose the spammers saw that people were allowing their own domains and set it up that way.

On a side note and not all that related, I've noticed that I am getting (about once a week) an e-mail from a bank - citibank, or wells fargo, telling me that my loan application has not been approved, see details attached.
Now, I haven't been applying for loans, and the file attached is a *.pif file... which are notorious for being viruses, and not a format that a bank will send you.
Not to mention that looking at the headers, they usually come from attbi.com which is cable modems, and I have seen through Compuserve as well - which aren't exactly how banks usually do business.

--

There are some odd things afoot now, in the Villa Straylight.

Re:I noticed a new one recently by gowen · 2003-07-23 01:57 · Score: 1

I had our local setup here allowing in anything that was from our domain. Now I have to stop that.
Not if you filter on the right thing. Pretty much the *only* reliable thing to trace a spam's source is the IP address in the first Received: header.

That is inserted by *your* MTA and cannot be easily faked, without complicated IP spoofing.
If that IP address is on your network, you may freely let the mail in. If you were validating on "From:" or "Sender:" headers (or any other of those that are easily and frequently forged) then ... maybe Mail Admin is not the job you were cut out for.
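
A minimal sketch of that check, under some loud assumptions: the topmost Received: header is the one your own MTA added, it carries the client address in square brackets (as most MTAs write it), and LOCAL_NET is a placeholder for your real address range. The function name is invented.

```python
import re
from email import message_from_string
from ipaddress import ip_address, ip_network

LOCAL_NET = ip_network("192.0.2.0/24")  # placeholder; substitute your own range

BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def from_local_network(raw_message, network=LOCAL_NET):
    """True if the topmost Received: header names a client inside `network`."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return False
    # Headers are returned top to bottom; the first is the most recent hop.
    match = BRACKETED_IP.search(received[0])
    if not match:
        return False
    try:
        return ip_address(match.group(1)) in network
    except ValueError:  # e.g. "999.1.1.1"
        return False
```

Only the first Received: header is consulted; anything further down, like From: and Sender:, was supplied by the other side and proves nothing.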

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:I noticed a new one recently by AssFace · 2003-07-23 02:08 · Score: 1

LOL - are there really people out there that have the job title of "Mail Admin"??

I can only dream that someday I reach that level.

--

There are some odd things afoot now, in the Villa Straylight.
Re:I noticed a new one recently by molo · 2003-07-23 03:29 · Score: 1

You should check the Received: headers to make sure it's actually from your machines. If it's not, then dump it. It's pretty straightforward.

You can do the same thing with hotmail, yahoo, excite, etc. to make sure the mail is actually from those domains.

This, plus rejecting messages without a message-id header and html-only messages has caught the vast majority of my spam to date.
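
The last two rules are easy to sketch with Python's standard email package. This is an illustration rather than the actual setup described above; the function name and the wording of the reasons are made up.

```python
from email import message_from_string


def looks_suspicious(raw_message):
    """Return a list of reasons to reject; an empty list means none applied."""
    msg = message_from_string(raw_message)
    reasons = []

    # Rule 1: no Message-ID header, or an empty one.
    if not msg.get("Message-ID", "").strip():
        reasons.append("no Message-ID")

    # Rule 2: HTML with no plain-text alternative anywhere in the message.
    types = [part.get_content_type() for part in msg.walk() if not part.is_multipart()]
    if "text/html" in types and "text/plain" not in types:
        reasons.append("HTML only")

    return reasons
```

A multipart message that carries both a text/plain and a text/html part passes the second rule, so mail from clients that send both flavours is not penalised.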

-molo

--
Using your sig line to advertise for friends is lame.
Re:I noticed a new one recently by Anonymous Coward · 2003-07-23 03:37 · Score: 0

On a side note and not all that related, I've noticed that I am getting (about once a week) an e-mail from a bank - citibank, or wells fargo, telling me that my loan application has not been approved, see details attached.
Now, I haven't been applying for loans, and the file attached is a *.pif file... which are notorious for being viruses, and not a format that a bank will send you.
Not to mention that looking at the headers, they usually come from attbi.com which is cable modems, and I have seen through Compuserve as well - which aren't exactly how banks usually do business.

Sounds like the spammers are trying to get themselves more drones to spam through...
Re:I noticed a new one recently by jeremyp · 2003-07-23 04:25 · Score: 1

SMTP 101. Everything in an e-mail can be faked easily except the IP address of the client mail server.

You can teach anybody to fake e-mails in about 10 minutes as long as you have access to a telnet client.

--
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Re:I noticed a new one recently by realdpk · 2003-07-23 04:57 · Score: 2, Interesting

What's most impressive about those .pif spams from "Wells Fargo" and "Citibank" is that the spammer uses good grammar and spelling. This is an incredible leap in spammer technique that I'm surprised has not received more attention.

Follow the money by SirLanse · 2003-07-23 00:37 · Score: 3, Interesting

Someone is paying the spammers to spam. They usually have a URL in the email. Set up a screen saver to DDOS the payer. FOLLOW THE MONEY, make it bad to buy spam.

Re:Follow the money by thoth · 2003-07-23 05:10 · Score: 1

I love it - have a project that routes spam through an "autochecker", and one of the check steps involves pinging a site or bringing up a URL. Since the economies of spam involve something like a 0.01% response rate, having even 1% to 10% of people respond will crush the merchant's computers. Spam sent out, servers somewhere roasted.

Why bother? by PeteDotNu · 2003-07-23 00:43 · Score: 1

I've often wondered why spammers go to all this effort to make their emails arrive and pass through the filters. If people are filtering spam out, then surely they aren't going to actually buy the product as a result of receiving the email.

It seems like a pointless drain on resources to me.

--
My other processor is big-endian.

Re:Why bother? by mumblestheclown · 2003-07-23 00:50 · Score: 1, Insightful

ISPs filter, people read. AOL filters, joe AOL buys herbal viagra.
Make sense now?

SPAM filtering by ajs318 · 2003-07-23 00:49 · Score: 2, Interesting

By cunning use of procmail recipes and ten-minute perl hacks, we can implement a spam filter as follows.

Check headers for signs of relay-misuse.
Strip out anything between <mustang> signs; s/<[^>]*>//g;
Strip out all remaining punctuation.
Use a tr/// to convert accented characters to unaccented.
Recall that when used in a scalar context, s/// and tr/// return a count of successful changes made.
Check for certain words in the munged text.

We can assign messages a score based on how many "nasties" were removed as compared to how many would be in a legitimate e-mail. Then despatch to one of three mailboxes: one for stuff we are sure is legit, one for stuff we are sure is spam, and one for stuff where we aren't sure. If we wanted to be really paranoid, we would strip out image links and JavaScript from HTML e-mails. It's not inconceivable that an image link could actually be a link to a CGI script with a unique identifier embedded into it, for the purpose of alerting the spammer that copy # 31337 {faute de mieux} of the message went to a working e-mail address. {Possibility for mischief?}
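
The munging steps above can be sketched in Python as well, where re.subn plays the role of s/// in scalar context by returning the number of substitutions. This is a rough illustration, not the procmail/perl setup itself: BAD_WORDS is a made-up word list, the function names are invented, and the thresholds for the three mailboxes are deliberately left out.

```python
import re
import unicodedata

BAD_WORDS = {"viagra", "mortgage"}  # illustrative only


def strip_accents(text):
    """Return (text without accents, number of characters that decomposed)."""
    changed = sum(1 for ch in text if unicodedata.decomposition(ch))
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped, changed


def munge(text):
    """Strip tags, punctuation and accents, counting the removals at each step."""
    text, tags = re.subn(r"<[^>]*>", "", text)          # anything in angle brackets
    text, punctuation = re.subn(r"[^\w\s]|_", "", text)  # remaining punctuation
    text, accents = strip_accents(text)
    words = set(text.lower().split())
    counts = {
        "tags": tags,
        "punctuation": punctuation,
        "accents": accents,
        "bad_words": sorted(words & BAD_WORDS),
    }
    return text, counts
```

Run on "V*I*A<!-- x -->G*R*A" this returns "VIAGRA" along with one tag and four punctuation removals, and the munged text now matches the word list. Digit-for-letter swaps such as "V1DE0" are not handled here.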

And if we were an ISP, doing this on a public server, we would allow our customers to send abuse notifications to the appropriate server owners {for all the good it's likely to do} with just a few clicks.

--
Je fume. Tu fumes. Nous fûmes!

Re:SPAM filtering by dwsauder · 2003-07-23 15:08 · Score: 1

Think using an image link to a CGI script is bad?
I have seen a spam message where they used a CSS stylesheet retrieved from a CGI script to track messages. I'm not sure, but I think that technique may even work against Mozilla with image retrieval turned off.
Re:SPAM filtering by ajs318 · 2003-07-23 19:40 · Score: 1

It doesn't surprise me one iota. But while we've got a perl script running, we can check for that sort of thing too. Some ISPs like to confine CGI scripts to a /cgi-bin directory, but savvy users can override this with a .htaccess file.

By the way, I'm noticing spam nowadays with "innocuous" subject lines, such as "Fwd: did you find my pictures" or "You forgot to answer". Of course they're from nobody I know, so I won't be opening them in my mail reader. But I'll certainly analyse them in case there's anything interesting in there!

--
Je fume. Tu fumes. Nous fûmes!

run spellcheck first by petardi · 2003-07-23 00:50 · Score: 1, Funny

It's just an idea. Spammers are fighting filters by including meaningless text in their messages. Piping mail through a spellcheck could eliminate most spam. And some friends too.

Does spam ever work? by Alkonaut · 2003-07-23 00:57 · Score: 1

If you look at these fractions of the population and do a little multiplication:

1. People who ever get spam (not everyone)
2. People who ever read spam (small fraction?)
3. People who read spam and are also idiots (hopefully an even smaller fraction?)
4. Idiots reading spam who also happen to have an embarrassingly small penis. (won't even guess here)

any idea how the bogus penis enlargement corp. and the spammers make this good business?

Re:Does spam ever work? by belroth · 2003-07-23 03:41 · Score: 1

4. Idiots reading spam who also happen to have an embarrassingly small penis. (won't even guess here)
There are plenty of idiots who would want a bigger one no matter what the current size.

--
I hereby inform you that I have NOT been required to provide any decryption keys.

Why do they try to trick the filters? by fungai · 2003-07-23 01:01 · Score: 3, Interesting

Someone please explain. People who have spam filters on don't want to receive spam, and will most likely just ignore/delete any spam that does get through. Why do the spammers waste so much time trying to get past the filters? Is it to reach the unwashed masses behind ISP filters?

Re:Why do they try to trick the filters? by Urkki · 2003-07-23 01:16 · Score: 3, Insightful

They don't want it, but some of them might read some of it, if the subject is just right. And some of these might fall for it. If it's just 1% and 1%, and you send ten million spams, that's already 1000 successful messages.
And then of course quite a few people use filters provided by others (like ISP), since it's easy and spam is somewhat bothersome to them, but aren't still totally pissed about it and might read some.
And of course, the less spam gets through filters, the more likely it is that this "successful" spam gets read, if users mailboxes aren't filled with it. So it's competition between spammers, survival of the most evil, so to say. And I suppose also when marketting spamming services, being able to say "we know how to send mail to all AOLers" is prolly helpful...
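
The arithmetic in that first paragraph can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the two 1% rates are the comment's guesses, not measured figures.

```python
def expected_responses(messages_sent, read_rate, fall_for_rate):
    """Expected number of recipients who both read a message and act on it."""
    return messages_sent * read_rate * fall_for_rate

# Figures from the comment: ten million messages, 1% get read,
# and 1% of those readers fall for it.
successes = expected_responses(10_000_000, 0.01, 0.01)
print(round(successes))  # 1000
```

Because the cost of sending barely changes with volume, even much smaller rates still leave a non-zero number of "successful" messages.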
Re:Why do they try to trick the filters? by Mwongozi · 2003-07-23 02:25 · Score: 1

Is it to reach the unwashed masses behind ISP filters?
Yes.

Stupid Spammer Tricks by AndroidCat · 2003-07-23 01:06 · Score: 2, Informative

Of course, any HTML tags in an email are a pretty good indication (along with other indicators) that it's spam and can be tagged and bagged. I do get an occasional valid email with HTML, but a little tuning or whitelisting will fix that.

So a fat lot of good all those HTML tricks do you, eh spammers? (Are spammers stupid? Yes! It's Rule #3.)

--
One line blog. I hear that they're called Twitters now.

linch mobs by Leahar · 2003-07-23 01:11 · Score: 2, Funny

I think the only reasonable solution to this problem is to switch to a spammer deterrent system. We could organise leafleting campaigns outside their companies, or perhaps write deep and understanding letters explaining our dismay at their actions. Maybe we could wrap the said letters around bricks or other solid objects to aid in their delivery through a window of the aforementioned companies. We could take our dismay to the managers of the companies and set up some kind of dialogue, maybe but not definitely involving two jump leads, a car battery and a book entitled Gonad Electrocution for Dummies, to get our point across (done with the utmost respect, of course). Should the spammers still resist our polite but firm requests, things would have to get more serious.

--
Roses are Red Violates are Blue im not very good a poetry but i have many other redeming qualitys

But does it need to be perfect? by JanneM · 2003-07-23 01:15 · Score: 5, Interesting

I have on occasion misclassified mail myself, both ways. A few spams (unsolicited bulk emails) have been so full of content I found interesting that only after reading them did I realize they were not from anybody I knew. Conversely, I have a couple of times received mail which was for me, and was genuine, but so poorly formatted (lots of obnoxious html, strange subject and so on) that I deleted it as spam and only later came to understand it was a serious message.

The point is, not even I can do spam classification 100% correctly. It would be a tall order indeed to have an automated tool do it. But does this matter? There are two issues: discarded genuine mail, and non-caught spam.

Discarded genuine mail is not really as big a problem as people make it out to be. Mail is inherently not guaranteed; messages do fall between the cracks now and again. Swallowed by a buggy server, lost in limbo as a network connection goes down, never having a chance due to a misspelt or obsolete address, sent on a wild goose chase due to a temporary DNS error. Mail does disappear. Everybody knows that - or should know. Mistaking a mail for spam is just another crack for it to fall into. As long as the rate is low there really is no problem. And those sending mail that can easily be mistaken for spam will wise up eventually, as they see a disproportionate amount of their email get lost in the ether.

Missing spam is no real problem either. The big issue is having fifty spams in your inbox every morning, with another fifty arriving during the day. Having one or two a day, on the other hand, is not that painful.

The point is, it is not a binary system: A spam system that misses two spams a day is better than one that misses five, and vastly better than having no system at all. Similarly, one that classifies one genuine message out of a thousand as spam is no disaster. Not good, but not a reason to shut it all down either. If reliability is _that_ important, what are you doing using email in the first place?

Filtering isn't perfect. It won't ever be perfect. That's quite alright. Saying a technique is worthless because it makes an occasional mistake is throwing out the baby with the bathwater.

--
Trust the Computer. The Computer is your friend.

Re: SPAM by Jafafa+Hots · 2003-07-23 01:15 · Score: 4, Funny

Sexual Propaganda Aimed at Men

--
This space available.

Avoiding spam of all kinds by doodleboy · 2003-07-23 01:17 · Score: 4, Informative

This will all be blindingly obvious to most readers of /., but just for the record:

Don't use your personal email address for anything online. Don't post to usenet with it, don't use it to register for anything, don't ever use it where there's any chance of it being sold to a third party or picked up by a web crawler. Use a free throwaway web-based account like hotmail or yahoo, that's what they're for. I have a verizon.net primary email address, and I've never received a single piece of spam from it.

However, I still have a forward-only email address from my university circa 1992. Back then, there was no spam and that address has to be on every spammer's list on the planet. I still get a legitimate email every year or two, but spam outnumbers these by at least 10,000 to 1. SpamAssassin does a surprisingly good job of identifying the garbage.

I also use a proxy to surf the web, as well as a large hosts file that reroutes requests to adservers to 127.0.0.1:80, combined with a utility that returns a transparent 1x1 gif to any request on port 80. And of course I use mozilla to block pop-ups and whatnot. I'm so used to surfing in this way that I always recoil in horror when I have to use IE on a naked, unprotected box. How on earth can anyone stand it?

As for more traditional types of spam such as telemarketers, there's the national do not call list. It's free, so there's nothing to lose. You'll also want to check out the many excellent resources at the Junkbusters website. One of the most useful features is a Junkbusters Declare page, which builds custom form letters for you that you can use to opt out of Direct Marketing Association junkmail, as well as telling your financial institutions, etc., not to sell your name to third parties. I used it, it's painless, and my privacy is protected.

Of course, it would be much better if we didn't have to jump through hoop after hoop just to get through the day without being pestered by morons.

Re:Avoiding spam of all kinds by Archon-X · 2003-07-23 03:20 · Score: 1

Ok, I tried all of these utilities, and you know what? They block every source of LEGITIMATE adult income there is.

I'm all for anti spam and anti malicious code, but I can't use it because some zealot somewhere makes the decision that adult == spam.
Re:Avoiding spam of all kinds by Anonymous Coward · 2003-07-23 09:08 · Score: 0

Yeah. This works great until someone brute-forces your mail server. Then you're on some list which is sold to spammers and it's all over.
Re:Avoiding spam of all kinds by Sir+Brialliance · 2003-07-23 13:19 · Score: 1

Use a free throwaway web-based account . . .
I use Spamgourmet, so I can give out an address I make up on the spot. After a certain number of messages are sent to that address (and options can be set, like trusted senders and the like), Spamgourmet eats all further messages sent to that address instead of forwarding them. It's great.

--
I didn't do it! Unless I was supposed to do it. . . (hmm. . .)

TMDA by TheSync · 2003-07-23 01:26 · Score: 4, Interesting

After a while, SpamAssassin's false negatives and positives drove me to the Tagged Message Delivery Agent (TMDA).

TMDA has flexible whitelist and blacklist capabilities. But the big win is that it can be set to autoreply to anyone not on the whitelist, and require them to reply back before allowing the email to get through. Of course, very few spammers have valid return email addresses...
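
The whitelist / blacklist / challenge flow described above might be sketched roughly as follows. This is not TMDA's actual code or configuration: the class and method names are hypothetical, and a real agent would also have to send the challenge mail and verify the reply.

```python
from dataclasses import dataclass, field

@dataclass
class ChallengeFilter:
    """Toy challenge-response filter (hypothetical names, not TMDA's API)."""
    whitelist: set = field(default_factory=set)   # lowercase sender addresses
    blacklist: set = field(default_factory=set)   # lowercase sender addresses
    pending: dict = field(default_factory=dict)   # sender -> held messages

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.blacklist:
            return "drop"
        if sender in self.whitelist:
            return "deliver"
        # Unknown sender: hold the message and ask them to confirm.
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def confirm(self, sender):
        """Sender answered the challenge: whitelist them, release held mail."""
        sender = sender.lower()
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])
```

A spammer with a forged or dead return address never calls the equivalent of `confirm`, so the held message simply stays in `pending`.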

This may seem drastic, but in fact it has made life soooo much easier. It also helps you to "automagically" get off those email lists you signed up for a long time ago, don't really care about, and are too lazy (or lost the info) to sign yourself off ;)

The only sad thing is that no longer do Russian women want to extend my length or give me free money or viagra, and I am no longer in contact with Ms. Sesse Seiko from Uganda...

Re:TMDA by eli173 · 2003-07-23 05:15 · Score: 1

TMDA has flexible whitelist and blacklist capabilities. But the big win is that it can be set to autoreply to anyone not on the whitelist, and require them to reply back before allowing the email to get through. Of course, very few spammers have valid return email addresses...

This may seem drastic, but in fact it has made life soooo much easier.

<rant> ... for you, not for those who communicate with you.

For instance, if you sent an email to a public list, such as, oh, linux-kernel, asking for help or information on something, and I replied with an explanation and got one of your automated messages... well, I wouldn't bother to jump through your hoops. If you want help, you won't put those barriers in my way.

I've run into this on l-k, and I didn't bother to reply to the autoresponse... even though I was answering the poster's question. His loss.
</rant>
Re:TMDA by TheSync · 2003-07-23 05:43 · Score: 1

I'm sorry that you aren't willing to push the "R" button to do your part in the war on spam...

But on the personal responsibility side, if one expects to receive private messages in response to a posting of some kind (be it mailing list or whatever), using TMDA you can set up addresses without filters to subscribe to the list, such as eli173-1-k-responses@biteme.org. That email address can be set up not to have an auto-challenger on it, but still deliver email to eli173@biteme.org.

Should a spammer harvest that tagged address, you can close it down and start up another one.

Moreover, TMDA filters can also use other filtering techniques (ala Procmail), such as looking in headers for a Mailing list name, and it can avoid auto-challenging emails with those headers.

You can check out all the filters here, and there are some common uses here.

To date, I am unaware of missing any non-spam email because of TMDA. Keep in mind that messages can be kept in a "pending" directory until their challenge is replied to. I (quickly) scan that directory once a week or so, in case I missed something.

But it turns out that most people worth emailing with are willing to press a single key for you...

The key difference. by alistair · 2003-07-23 01:28 · Score: 4, Interesting

The key difference is that KMail does this on a per message basis, whereas in Mozilla this is set once in Preferences and I suspect the same is true in Evolution. Thus, looking at an HTML message I just received, I get the following in a box at the top of the message:

"Note: This is an HTML message. For security reasons, only the raw HTML code is shown. If you trust the sender of this message then you can activate formatted HTML display for this message by clicking here."

The HTML code follows and a single click turns it into a fully rendered message, or an alternate click consigns it to the trash can.

It may be possible to add this as a mozilla mail / thunderbird toolbar, and as Thunderbird takes off I hope we will see this type of quick prefs bar develop to the same extent they have been developed for the mozilla browser component.

Those rearing lands: Spam Poetry? by Heisenbug · 2003-07-23 01:31 · Score: 1

From the in-the-wild sample for the Camouflage technique:

"those rearing lands
Plasticine sex-cartoons.
eel harness highest
Absolutely new category of adu1t sites.
nobody jets held
Northumbria- diamond sleep."

Any lit majors able to explain this one?

The one thing I never got was... by jdvernon1976 · 2003-07-23 01:33 · Score: 4, Insightful

Why DON'T spammers remove us from their lists when we ask? They're working REALLY REALLY hard (with all the filtering, header forging, etc.) to send mail to people that don't want it. If they would just target their email to those who had indicated that they wanted it, and removed those of us who had indicated we didn't, they'd save themselves a lot of grief, as measured in legal and technical hassle.

Granted, it's easier for them to ignore the "remove me"s, but is the trouble saved in 'not removing' >= the trouble spent in 'getting past spam filters'?

Besides, if the mails were targeted to those that THOUGHT their penis was small and needed extension....doesn't that mean it's not spam anymore? And wouldn't that make their click-through (or whatever) rate higher, therefore making their own attractiveness as a bulk emailer greater to their customers?

I'm just thinkin' here...

Re:The one thing I never got was... by Anonymous Coward · 2003-07-23 02:19 · Score: 1, Informative

Why DON'T spammers remove us from their lists when we ask? They're working REALLY REALLY hard (with all the filtering, header forging, etc.) to send mail to people that don't want it
Seems like there are two likely reasons. First, they get paid to deliver emails, so removing a name from their list reduces the number of emails they send and the number of dollars they get paid. Second, they get paid for click-throughs, and a certain fraction of recipients--completely independent of whether they're interested in the "product" or not--will click a link, if only by accident. Dropping names from their list reduces the number of these unintentional click-throughs and takes dollars out of the email marketer's pocket.
Re:The one thing I never got was... by Anonymous Coward · 2003-07-23 05:03 · Score: 0

using a real reply-to or opt-out url will bring it to the isp's attention sooner than the spammer can finish a mailing
Re:The one thing I never got was... by Yorkshire · 2003-07-23 05:08 · Score: 1

They do remove ppl from their lists, and then sell the addresses to other spammers.

--
Custom Rules For SpamAssassin
Re:The one thing I never got was... by endofoctober · 2003-07-23 12:19 · Score: 1

They're working REALLY REALLY hard (with all the filtering, header forging, etc.) to send mail to people that don't want it.
Sadly, they aren't working hard to send out spam - once spammers figure out a new way to defeat anti-spam methods, they can construct one email and chug out millions of messages. Lather, rinse, repeat.
It reminds me of the struggle between bacteria and antibiotics - ineffective (or non-existent) laws, poor user understanding, and overly-simple anti-spam software might contribute to highly resistant strains of spam at some point. Either that or we'll end up getting rid of spam by simply getting rid of email altogether.

--
- Jack

How anti-spam laws benefit spammers. by $criptah · 2003-07-23 01:37 · Score: 1

It might be offtopic, but there is a good article in The Wall Street Journal (July the 18th, 2003) about how some spammers might benefit from anti-spam laws. The idea is that big corporations that do legit business via e-mail marketing are trying to eliminate competition that gives them, spammers, a bad name. By reducing the amount of 'get rich quick' or 'increase your penis size to 18 inches' e-mails and following strict guidelines, today's spammers have a chance to be legit, reduce costs of operations and have a well established market base. For example, knowing that people opt-in for some offers means that companies can target the consumers more precisely.

Ahh, I knew there was a catch. Meanwhile, I am going to post my email addresses as 'my_name at domain dot com.'

MX records by MeNeXT · 2003-07-23 01:41 · Score: 2, Insightful

I always wondered why we do not confirm that the sending IP matches the MX record of a domain.

1. Most of the SPAM sent today has this little problem, where the sending server does not resolve to the IP which is listed in the header.

2. It will permit people to first map a domain to an IP. (Makes it harder for a SPAMMER because now he needs to register a domain. Once the domain is used to SPAM it can then be blocked. All blocked domains can be easily maintained in a list and shared by ISP's.)

3. Time is money. Moving domains from one ISP to another does not help the SPAMMER. The domain is blocked and the IP is identified. The SPAMMER has to be able to activate multiple domains, multiple DNS servers and such. The patterns will be easier to identify and it will be easier to block SPAM by either Blocking the Domain or the DNS server or all the IP's of a certain offending ISP.

4. In order to acquire a domain a payment transaction must occur. This can be traced if it's a credit card. ISP's who accept cash without ID or who continually HOST SPAMMERS can be blocked. The work involved to acquire a domain may increase the costs of a domain but I am sure that this will enable people to assign responsibility.

While this system is not perfect and, yes it may cause some headaches for most, having sendmail match the MX record to the IP of the sending server would eliminate almost 100% of all the SPAM that I have encountered in the last 3 months. We would still need to keep the existing anti-spam practices in place.
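
As a rough illustration of the check being proposed, here is a minimal Python sketch. It does no real DNS: `mx_ips_for` is a hypothetical callback standing in for "look up the domain's MX hosts and resolve them to addresses", and the table below is made-up example data, not a real configuration.

```python
def sender_matches_mx(sender_domain, connecting_ip, mx_ips_for):
    """True if the connecting IP is one of the addresses behind the domain's MX records."""
    return connecting_ip in mx_ips_for(sender_domain)

# Made-up data standing in for real MX + A lookups.
FAKE_DNS = {"example.org": {"192.0.2.10", "192.0.2.11"}}

def fake_lookup(domain):
    """Return the set of MX addresses for a domain, or an empty set if unknown."""
    return FAKE_DNS.get(domain.lower(), set())

print(sender_matches_mx("example.org", "192.0.2.10", fake_lookup))    # True
print(sender_matches_mx("example.org", "198.51.100.7", fake_lookup))  # False
```

Note that the second case is rejected whether the mail is spam or simply comes from a legitimate outgoing server that is not listed as an MX host, which is the objection raised in the replies.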

When SPAMMERS find a way around this we can then address that issue when it's time.

--
DRM? No thanks, I'll just get it somewhere else...

Re:MX records by Anonymous Coward · 2003-07-23 02:55 · Score: 0, Flamebait

Nice idea, but probably too much hassle to be implemented in the near to middle future. I'd really prefer if we could finally get rid of the current insecure and spammer-friendly SMTP instead...
Re:MX records by Anonymous Coward · 2003-07-23 04:29 · Score: 3, Insightful

I always wondered why we do not confirm that the sending IP matches the MX record of a domain.
Because this isn't a reliable test.
1. Most of the SPAM sent today has this little problem, where the sending server does not resolve to the IP which is listed in the header.
Pay attention to your email some time. Lots of legitimate email doesn't match, either. Many companies and most hosting companies use one server for incoming mail - the server the MX record points to - and another for outgoing - one which doesn't have an MX record.
2. It will permit people to first map a domain to an IP. (Makes it harder for a SPAMMER because now he needs to register a domain. Once the domain is used to SPAM it can then be blocked. All blocked domains can be easily maintained in a list and shared by ISP's.)
Except that most spammers don't use servers under their control, anyway, so this test wouldn't work.
3. Time is money. Moving domains from one ISP to another does not help the SPAMMER. The domain is blocked and the IP is identified. The SPAMMER has to be able to activate multiple domains, multiple DNS servers and such. The patterns will be easier to identify and it will be easier to block SPAM by either Blocking the Domain or the DNS server or all the IP's of a certain offending ISP.
Which also doesn't work, because the spammers don't use their own servers.
4. In order to acquire a domain a payment transaction must occur. This can be traced if it's a credit card. ISP's who accept cash without ID or who continually HOST SPAMMERS can be blocked. The work involved to acquire a domain may increase the costs of a domain but I am sure that this will enable people to assign responsibility.
A theory beloved of fascists and quick-fix pipe dreamers, but never actually proven to work in the real world. In fact, I don't know where this has ever worked, period.
While this system is not perfect and, yes it may cause some headaches for most, having sendmail match the MX record to the IP of the sending server would eliminate almost 100% of all the SPAM that I have encountered in the last 3 months. We would still need to keep the existing anti-spam practices in place.
Then what's the freaking point? For me, and for most people I know, this would block about 40% of all *email*, spam and non-spam. The other 60% also includes spam and regular email, so you're not doing anything positive. And the current techniques, constantly improving as more and better filtering techniques become available (e.g. Bayes) already stop 99.9% of the spam I or my users receive. What else do you need? Why make sweeping changes like this to catch .1% or less of spam, particularly with the damage it would do to legitimate email?
Amazing how all the people making these "brilliant" suggestions couldn't manage a real-world mailserver to save their soul. Running Sendmail on your home Linux box doesn't make you a mail admin.
Re:MX records by joeldg · 2003-07-23 04:30 · Score: 1

I wrote a little SMTP server that has a plugin to do exactly this..
It is just skeletoned out in PHP at present, but if you are interested it is on my site here:
http://lucifer.intercosmos.net/index.php?disp=honeymail

--
anime+manga together at last.. in real time.
Re:MX records by ahodgson · 2003-07-23 05:17 · Score: 2

A lot of sites send mail out from different servers than they receive mail in from. You'd need a new DNS record type to indicate an "authorized" sending server.

Also, that only works if they send direct to your server. What if the person using the domain name sends through their ISP server from home? What if they send to a mailing list?
Re:MX records by MeNeXT · 2003-07-23 05:17 · Score: 1

Lots of legitimate email doesn't match, either. Many companies and most hosting companies use one server for incoming mail - the server the MX record points to - and another for outgoing - one which doesn't have an MX record.

Just this second I received 2 SPAMS. Here is the example that I make....

Received: from (HELO h1cmvuw) [191.123.181.36] by XXX.XXX.XXX.XXX SMTP id 0yK797LGLSg8fZ; Wed, 23 Jul 2003 19:51:18 +0400

As you can see, the first IP does not match the sending ISP's domain name; as a matter of fact, it's not even a valid domain name. If this was a valid ISP, these would both match. If not, show me an example of why they would not. This is not the SMP in outlook, this is the date stamp placed by the sending SMTP server. If it's not correct, then why would I want to accept it?

Here is a valid one from openoffice.org;

Received: from openoffice.org (s002.sfo.collab.net [64.125.133.202])

An MX record could be set up to support this. It would only be a one-time setup (I do not think a "real business" would change SMTP servers every month).

You are right that the SPAMMERS do not use their own servers, but they would have to set up a DNS server and also get a connection in order to SPAM. They would have to do this for every domain that they wish to use for SPAM because we can now also blacklist their DNS server's IP as listed on the MX record. This will increase the cost for the SPAMMERS without a great deal of cost to legitimate business.

Filtering does NOT STOP SPAM it hides it. I find it increases it. These sweeping changes would remove 100% of the SPAM which reaches my INBOX, at this time, while none of the legitimate email that I receive would be refused.

SMTP is broken; it needs to be fixed. Filtering does not address the issue, just the problem. If an SMTP server is being used just for sending purposes, it can then be programmed to redirect to the receiving SMTP server.

Your comments seem to make a few assumptions about who I am and what I do, but they have shown me that you are unable to understand that changes need to be made, because we ALL can see that what is in place now DOES NOT work, and this includes filtering. If you have another means of stopping SPAM I'm all ears.

--
DRM? No thanks, I'll just get it somewhere else...
Re:MX records by Anonymous Coward · 2003-07-23 06:15 · Score: 0

Your sig should use "you're", not "your".
Re:MX records by AnotherBlackHat · 2003-07-23 07:08 · Score: 2, Informative

I always wondered why we do not confirm that the sending IP matches the MX record of a domain.

You might want to google for "spam" + "DHVP", "DMP", "RMX", "DRIP" or "SPF"

The closest would probably be DHVP.
DHVP checks that the HELO from the sender either has a special "This is valid" record in DNS,
or that an MX record for the HELO string matches the IP address,
or some superset of the HELO's fully qualified domain name has an MX that matches the IP address.
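
The three conditions as summarized here could be sketched like this in Python. This follows only the description in this comment, not any DHVP specification; `has_valid_record` and `mx_ips_for` are hypothetical callbacks standing in for DNS lookups.

```python
def parent_domains(name):
    """Yield the name and each parent: mail.a.example.org, a.example.org, example.org."""
    labels = name.lower().rstrip(".").split(".")
    # Stop before the bare top-level label (e.g. "org").
    for i in range(len(labels) - 1):
        yield ".".join(labels[i:])

def helo_passes(helo, ip, has_valid_record, mx_ips_for):
    """Apply the three checks from the comment to a HELO string and connecting IP."""
    # 1. A special "this is valid" record exists for the HELO string.
    if has_valid_record(helo, ip):
        return True
    # 2 and 3. The HELO name, or some parent domain of it, has an MX
    # that resolves to the connecting IP.
    return any(ip in mx_ips_for(domain) for domain in parent_domains(helo))
```

A single-label HELO such as the `h1cmvuw` seen in the spam header quoted earlier in the thread has no domain to look up, so it would pass only through the first check.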

We don't do this because it has a high false positive rate.
Even if you personally would accept 5% of your email being discarded as "non-conforming",
an ISP can't accept that high a false positive rate and stay in business.

-- this is not a .sig
Re:MX records by ajs318 · 2003-07-23 08:40 · Score: 1

The problem with this technique is that it breaks on ADSL connections, where there is typically no MX record for IP addresses assigned to subscribers. {Why would they need one if they didn't have an SMTP server?} Neither BT openworld nor Freeserve have any plans to introduce this on their ADSL services.

SMTP authentication could work, iff enough ISPs implemented it. If there were significant numbers of non-authenticated SMTP servers in action, these could end up being used for sending spam.

Another method, used by my ISP, is to require a successful POP3 login from any IP address before it will permit any SMTP send operations. This is subject to a 15-minute timeout {just long enough to rattle off replies}.
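
That POP-before-SMTP rule is simple enough to sketch in Python. The class below is hypothetical and only models the bookkeeping (who logged in, and when); the 15-minute window is the figure given above, and the clock is injectable so the logic can be exercised without waiting.

```python
import time

class PopBeforeSmtp:
    """Allow SMTP sending only from IPs with a recent successful POP3 login."""

    def __init__(self, window_seconds=15 * 60, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.last_login = {}  # ip -> time of last successful POP3 login

    def record_pop_login(self, ip):
        """Call after a POP3 login from this IP has succeeded."""
        self.last_login[ip] = self.clock()

    def may_relay(self, ip):
        """True if this IP logged in via POP3 within the window."""
        seen = self.last_login.get(ip)
        return seen is not None and self.clock() - seen <= self.window
```

The mail server would call `record_pop_login` from its POP3 side and consult `may_relay` before accepting a message for relay.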

--
Je fume. Tu fumes. Nous fûmes!
Re:MX records by don.g · 2003-07-23 09:16 · Score: 1

So... if I take my laptop to the local university, send mail from my personal address through their SMTP relay... you'd consider it spam? Or require them to modify their relay to attempt to route the message through one of the MX records for my domain (which would be horribly inefficient, as I'm on a much smaller pipe than the university, and also a configuration nightmare)?

--
Pretend that something especially witty is here. Thanks.
Re:MX records by robfoo · 2003-07-23 11:54 · Score: 3, Informative

+4, insightful?
I beg to differ!

While this system is not perfect and, yes it may cause some headaches for most, having sendmail match the MX record to the IP of the sending server would eliminate almost 100% of all the SPAM that I have encountered in the last 3 months.

You're right, this system is not perfect, and would cause a *lot* of headaches for almost all users (or at least, us admins).
Firstly, it creates a lot of technical headaches..

The way I see it, the only way I could send email under your proposed system would be through a relay whose IP address was the same as the server listed in the domain's MX record, right?

So, in order to send email from myaddress@somedomain.com, my MTA has to have the same IP address as somedomain.com's mail exchanger?
Not. Gonna. Work.
I send mail from several different physical locations (home, work, etc), as several different addresses/domains. This means in order to send email as my home address while I'm at work, I'd have to send through my home ISP's mail relay. Which I can't do, because I'm not on their network (and they don't have an open relay, to prevent *spam*).
I also send email as being from a couple of domains I own, but I send this email thru whatever system I happen to be on (ISP or work, whatever), as my domain just points at things, rather than running a full-time MTA just to deliver my email..

Not to mention the fact that most ISPs I can think of would have more than one server in charge of mail, and it would be possible, if not likely, that the outgoing mail relay is a different machine than the one that accepts incoming mail (ie, the one in the MX record).

But let's just assume, for argument's sake, that everything was working as you outline. Everyone sends mail thru a relay whose IP corresponds to the domain they're sending from.
All I need to do to send spam is get an account at an ISP, let's say I get username foo at ISP isp.com. Now I dial up, and send a big bunch of spam, from false.address@isp.com. So your domain/mx/ip check works ok, but it's still a false address. Sure, my IP address will be in the headers, but how different is that from the current situation?

Next you'll be suggesting that to combat terrorism, before getting on a plane passengers should have to pass a 1/2 hour series of tests with questions like 'are you a terrorist?' and 'Is this flight for: a) business; b) pleasure; or c) terrorism?'
Not going to make it any harder for the terrorists (except the really dumb ones), but a big pain in the ass for Joe Citizen.

(sorry, in a bit of a ranting mood)
Re:MX records by MeNeXT · 2003-07-24 00:30 · Score: 1

This is what I LOVE about admins: if it does not fit in with what they want to do, they could not care less about how it affects others. In order for you to send mail through most ISP's you need to either use one of their IP's or you need to authenticate yourself. If you are using ADSL, DIAL-UP, or whatever (I know now everyone will start literally quoting the exact connection that I mention) to connect to the net, you need to have your system IDENTIFIED by an IP in order to get a response. If you need to connect to on-line banking, your IP and system/domain name need to resolve. Most ISP's assign a name to your connection if it is dynamic, and it resolves.

I have never seen a control system work where responsibility is not required to authorise a transaction.

As for your terrorist comment: I guess your solution, to let them just get on the plane without any checks, is the proper way? It does not matter to you how many die, just so you can save 1/2 hour? You might as well give them the BOMB!

--
DRM? No thanks, I'll just get it somewhere else...
Re:MX records by MeNeXT · 2003-07-24 00:37 · Score: 1

No. Send yourself an email as you described and you will notice that the university mail server will be properly identified. Your station will also be properly identified. Now take a look at SPAM and you will see that they can put as a Domain, "YOUR MAMA EAT SH1T" as their servers domain and your stupid server will still accept it.

Try what you claim. Look at the HEADERS and you will see that the only SPAM (if you follow the current anti-spam rules for your mail server) getting through is SH1T like the record above. As a matter of fact you have to go out of your way to configure your system to work and identify itself with a stupid domain as mentioned above.

--
DRM? No thanks, I'll just get it somewhere else...

Actually by Andy+Dodd · 2003-07-23 01:42 · Score: 1

Depending on how they sent the email, this is likely one of the "tricks" where the text content and HTML content differ.

Many mail clients (IMP for example) will display the text version, and show the HTML version as an attachment. Very likely the "missing" advertisements are in an HTML attachment.

I get spams like this all the time.

--
retrorocket.o not found, launch anyway?

Bayesian Filtering Should Still Work. by Jack_Frost · 2003-07-23 01:42 · Score: 2, Informative

My Bayesian filter analyzes the message in raw text, including any HTML tags. A handful of HTML "enhanced" spams might make it through the first few times until I classify the new messages as junk. Once that happens the filter learns that random HTML tags increase the chances of it being spam and it's off to the junk pile.
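
To make the idea concrete, here is a toy naive Bayes scorer in Python that tokenizes the raw text and keeps HTML tag names as tokens. It is a sketch under simple assumptions (add-one smoothing, tiny training set, hypothetical class name), not the filter the poster actually runs.

```python
import math
import re
from collections import Counter

# Match an opening or closing tag name ("<font", "</b") or a plain word.
TOKEN = re.compile(r"</?[A-Za-z][A-Za-z0-9]*|[A-Za-z0-9']+")

def tokenize(raw):
    """Split raw message text into lowercase tokens, HTML tag names included."""
    return [t.lower() for t in TOKEN.findall(raw)]

class TinyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, raw, label):
        """label is 'spam' or 'ham'."""
        self.counts[label].update(tokenize(raw))
        self.messages[label] += 1

    def spam_log_odds(self, raw):
        """Positive means the message looks more like the spam pile."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in tokenize(raw):
            # Add-one smoothing, with one extra slot for unseen tokens.
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + vocab + 1)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + vocab + 1)
            score += math.log(p_spam / p_ham)
        return score
```

Once a few HTML-heavy spams have been trained as junk, tokens like `<font` and `<b` push the score up on their own, even when the visible words are new.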

No, no, no... look at this another way by RT+Alec · 2003-07-23 02:01 · Score: 3, Insightful

This article highlights why I have stopped using filters altogether. End-user filters address the symptom, not the cure. The problem with even the best filter is the mail is already there, taking up space, hogging bandwidth, and the filter is churning CPU cycles to hopefully deal with it. My mail server uses 3 rbl (blacklists), and one I have programmed myself (rbl.restongeek.com). I get no false positives, and only a trickle of spam that gets through. I also get some small pleasure reviewing my server logs of the rejected mail, where the reject happened before any of the actual data was transmitted (see my /. journal for a sample).
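
For readers who have not used one, a DNS blacklist is usually queried by reversing the octets of the connecting IP and looking that name up under the list's zone; an answer means "listed". The Python sketch below shows only that naming convention. The zone `rbl.example.org` and the `resolve` callback are placeholders, not the poster's setup.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone, resolve):
    """resolve(name) returns an address string, or None if the name does not exist."""
    return resolve(dnsbl_query_name(ip, zone)) is not None

print(dnsbl_query_name("192.0.2.99", "rbl.example.org"))
# 99.2.0.192.rbl.example.org
```

Because the lookup needs only the connecting IP, the server can reject at connection time, before any message data is transferred, which matches the log behaviour described above.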

Of the anti-spam legislation currently being proposed, the most important clauses are those that deal with forged headers and illegal use of other servers (relay rape). Once such laws are in place, blacklists will become even more effective, because spammers will have fewer places to run and hide (if they sell something from the U.S.A.).

One final piece to the solution is to get ISPs to act responsibly, and block egress traffic on port 25 for dynamic IP addresses (look up many of my previous posts for more detail on this point). Again, combined with blacklists, this will reduce spam tremendously-- not just in your inbox, but your (and your ISP's) bandwidth.

Re:No, no, no... look at this another way by Urchlay · 2003-07-23 07:00 · Score: 4, Interesting

> One final piece to the solution is to get ISPs to act responsibly, and block egress traffic on port 25 for dynamic IP addresses

Some ISPs do this already.

<rant topicality="50%">
That'd be fine, if said ISPs would allow their users to relay mail from addresses other than $user@isp.com... but for various reasons (commercial? political?), they don't.

In other words, I can't send mail via my $50/mo. cable modem at all, unless I want to use the account assigned to me by my ISP (and sold to spammers, no doubt). I prefer to use an address at a domain I personally have registered and for which I personally control the SMTP server. For one thing, my ISP may change: I may decide to get DSL instead of cable, or I may move to an area served by a different cable ISP, or (this has happened to me recently) my cable provider may get bought out by another company, and change the domain name... or any number of other things... but my domain and my SMTP server won't change, so nobody even has to care what ISP I use, and I don't lose legitimate mail due to the address changing.

Unfortunately, my ISP, in its attempt to stop me from sending spam, has restricted me to using only their SMTP server (blocked egress on TCP port 25, as suggested by the parent), but will not allow me to send mail via their own SMTP server using my own (valid) email address (which I do not wish to use for reasons already explained)...

The only solutions here are some sort of VPN to the network where my SMTP server lives (at work), or else ssh to the SMTP server (which is what I actually do, but it's inconvenient).

I've offered to pay my ISP for `business class' cable service, but they *don't offer it*. I've attempted to get DSL, but am too far away from the CO. I'd love to have a choice of ISPs in my area, but cable companies are local monopolies in the country where I live... and thanks to the shakedown in the market, they're getting to be multi-state monopolies. I'd have to move *many* miles before I could get cable internet service from a different provider.

I'm not claiming anyone's deliberately conspiring to limit my (or anyone else's) freedoms. I guess what this boils down to is that so many people have pissed in the pool that we've now got on-duty cops as lifeguards... sorry, that's a rotten analogy, best I can do at the moment.
</rant>

OK, I feel better now, sorry about that.
Re:No, no, no... look at this another way by elleirdad · 2003-07-23 07:49 · Score: 1

Nah. The coyote never caught the road runner either. There are an infinite number of places to hide when trying to sell Viagra and body part enlargements. And, the spammers will always stay one step ahead of the blacklists. You see, the spammers must own the same filters (like POPFILE) that you and I do. So, as soon as you and I get an update, they get it and have a chance to change their address. The laws will only stop the legit businesses from prospecting. I am a big fan of the new Bayesian filters. There is a group developing a free filter called SpamBayes (www.spambayes.org) and one company turned it into a product for consumers named InBoxer (www.inboxer.com). I've been using it and my spam dropped from more than 100 a day to nearly 0.
Re:No, no, no... look at this another way by RT+Alec · 2003-07-23 08:38 · Score: 1
> The only solutions here are some sort of VPN to the network where my SMTP server lives (at work), or else ssh to the SMTP server (which is what I actually do, but it's inconvenient).
Exactly. The admin of the SMTP server you want to use ought to use SMTP + AUTH + SSL, which would run off another port (SMTPS uses 465). So the SSL part takes care of the issues with your ISP (they won't be blocking port 465). The AUTH part keeps your work SMTP server from unauthorized use (e.g. spammers looking for an open relay). Everyone is happy. Here are some links with additional info on setting up SMTP + AUTH + SSL:
- Sendmail tips
- Qmail tips
- Exchange tips
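
On the client side, the SMTP + AUTH + SSL setup described above looks roughly like this in Python. This is a sketch under assumptions: the host name, account, and password are placeholders, and it presumes the server accepts SSL connections on port 465 with password authentication.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text message using your own domain's address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_smtps(msg, host, user, password, port=465):
    """Connect over SSL (not port 25), authenticate, then send.

    The AUTH step is what keeps the server from being an open relay.
    """
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(host, port, context=context) as server:
        server.login(user, password)
        server.send_message(msg)
```

Usage would be something like `send_via_smtps(build_message("me@example.org", "you@example.com", "test", "hello"), "mail.example.org", "me", "secret")`, with all of those values swapped for real ones.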

White Lists is the only way by Organic_Info · 2003-07-23 02:06 · Score: 2

Filtering is all very well and good - but ultimately it is an arms race that no side will win. Battles may be won but the war will rage on.

The most effective method I have used is whitelists - if your name's not down you're not getting to my inbox. All other mails are placed in a pending folder where I currently have to manually check the mails - filtering could be performed on these mails to cut out the really obvious spams and save me some time.
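
The routing rule itself is tiny. A minimal sketch in Python, assuming the whitelist is a set of lowercase addresses and the folder names "inbox" and "pending" are just labels (both are illustrative choices, not any product's behaviour):

```python
from email.utils import parseaddr


def route(from_header, whitelist):
    """Return "inbox" for whitelisted senders, "pending" for everyone else."""
    _, address = parseaddr(from_header)
    return "inbox" if address.lower() in whitelist else "pending"
```

Note this trusts the From header, which is trivially forged, so it only works as long as spammers don't know who is on your list.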

Human authenticators could be used to move mails not on the white list to a more privileged folder than the pending (to be reviewed) or straight to your inbox. But I expect at some point in the spam wars tricking human authenticators will be on the cards.

I personally find the white list method as used by hushmail works wonderfully.

--
"Things that you own end up owning you" - Tyler Durden (via Diogenes of Sinope).

Re:White Lists is the only way by dumboy · 2003-07-23 02:55 · Score: 1

It does work, until you try to deploy it to a couple thousand users at the mail gateway level. Now the end users spend half as much time white listing people as they did deleting spam. You also have to worry about bounces. For instance, suppose I'm a spammer. I send a mail to: joe@abc.com and from: joe@def.com. You at abc.com bounce the message to joe@def.com, because it wasn't white listed. joe@def.com never sent it since it was a forged from address. 95% (guess) of the from addresses in spam are forged. You just helped in sending joe@def.com spam. What if joe@def.com is using the same system? He didn't white list you or the postmaster address from abc.com. Another bounce will occur on the def.com mail server, leaving the postmaster addresses on both abc.com and def.com with a lot of useless garbage. White listing alone isn't a solution, just part of it.

and then legitimate become spam. by QNX · 2003-07-23 02:12 · Score: 1

What's sad about all the spam is that legitimate email gets flagged as spam by filters.
I created the mailing system for my company and we only send legitimate mail.
A legitimate "Click here to unsubscribe..." is enough for a filter to flag the message as spam.

We mostly have customers in the US and Germany. The share of US customers reached is very low, around 10% of the mailing; for Germany, 25%. It's only a matter of time until those German customers get flooded with spam and start using spam filters too.

As for myself, I moved to a white list. 30 spams a day was more than enough to be annoyed.
I'm using Outlook, and Qurb does the job quite well for white listing. This way I can simply check the quarantined mail once a day and see if there's any good mail in there.

--
Karma: Very Very Very Very Bad

Cannot ISP be forced to block spammers? by michib01 · 2003-07-23 02:16 · Score: 1

This thread is quite interesting, but I still cannot understand why ISPs cannot be forced to stop spammers.
IMHO, if an ISP account is generating 50000 messages per day, chances are that its owner is a spammer. So, ISP software could build a list of possible spammers. Maybe some of those messages are real service/useful communications, maybe not... But a look at the sent messages can easily reveal their nature, and a list of "trusted" accounts can be used.
Then if an account is recognized as belonging to a spammer, the spammer can be identified and/or the account can be deactivated.
Why should an ISP do all this work? It could be forced by the law.
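
As a rough sketch of what that ISP software might do (in Python, assuming the ISP has a log with one entry per message sent, tagged by account; the function name and log format are made up for illustration):

```python
from collections import Counter


def flag_heavy_senders(sent_log, threshold=50000, trusted=frozenset()):
    """Return accounts at or above the daily threshold that are not trusted.

    sent_log: iterable of account names, one entry per message sent that day.
    """
    counts = Counter(sent_log)
    return sorted(
        account
        for account, sent in counts.items()
        if sent >= threshold and account not in trusted
    )
```

The flagged list is only a set of candidates for the manual look at the messages; a mailing-list operator on the trusted list would never show up in it.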

I know there are many ISPs in different countries, not equally eager to apply such a rule, but preventing a user from receiving spam from Europe or US would be a step ahead...

--
- "Having a clean conscience is sign of bad memory"

Re:Cannot ISP be forced to block spammers? by Moryath · 2003-07-23 03:06 · Score: 1

That works IF the mail is being sent through the ISP's mail servers.

However, most spammers run their own mail server. All the ISP sees is a high-traffic data stream heading out into the void. Chances are it bounces off half a dozen places before it reaches the myriad of open relays the spammer is working with.

Chances are further that the spammer is using multiple open relays and multiple or just forged return accounts.

Now if the ISP were intercepting everyone's datastream and analyzing it, realtime, that theoretically COULD be done... but the processing power just isn't there.

New tech by JMP3 · 2003-07-23 02:20 · Score: 3, Interesting

Some time ago a new way of filtering spam was discovered. The solution is simple, yet brilliant - we already have those "To confirm you're not a script, please type the text shown in this image" prompts at various websites to guard against form-submitting bots. Apply this to email (bounce back all emails with an image attached) and all the spam is gone! Not that it is a perfect solution (I wish there was one...) as I see 2 minor flaws in this system:
1. It introduces a delay in communication - a confirmation letter has to be sent and a reply received.
2. Not all recipients at the other end are *that smart* to understand "what the hell this image means and what am I supposed to do with it?"
From the other side it can serve as lameness filter ;)

But still a promising technology. I've searched the web and came up with both subscription services (Mailblocks) and client-side apps (Icemile). The last one is free and I think I'll stick with it.

Re:New tech by Anonymous Coward · 2003-07-23 05:00 · Score: 0

With the system you describe, I'll never get all the e-mails from web sites that I actually do business with. When that book or CD or computer part I ordered ships, I won't get an e-mail informing me and giving me the tracking number. To me, that's too high a price to pay. In fact, any spam blocking technique that makes it much harder to use e-mail legitimately is a spam blocking technique I'm not interested in.
Re:New tech by cant_get_a_good_nick · 2003-07-23 05:22 · Score: 1

Slashdot had an article not that long ago about how sites that use this methodology for sign-up (like Yahoo Mail) don't allow people with sight disabilities to sign up. You just eliminated all email from blind folks.
There's also some automated emails to worry about - how would notices from, say, your bank get through?
Re:New tech by zanthas · 2003-07-23 06:27 · Score: 0

> Not that it is a perfect solution (I wish there was...) as I see 2 minor flaws in this system : 1. It introduces a delay in communication - confirmation letter has to be sent and reply received. 2. Not all recipients at the other end are *that smart* to understand "what the hell this image means and what am I supposed to do with it?" From the other side it can serve as lameness filter ;)

What about things like: "Download my confirmation email: Confirmation.gif.exe"?
Re:New tech by ispeters · 2003-07-23 11:46 · Score: 1

It only introduces a delay the first time, since you can whitelist that sender the moment they respond, and automatically accept anything they send thereafter.
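
Here's roughly how the "challenge once, then whitelist" flow could be wired up, as a Python sketch. The `ChallengeGate` class is hypothetical, and a random token stands in for the image puzzle; actually sending the challenge mail and rendering an image are left out.

```python
import secrets


class ChallengeGate:
    """Hold mail from unknown senders until they answer one challenge."""

    def __init__(self):
        self.whitelist = set()
        self.held = {}   # token -> (sender, [held messages])
        self.inbox = []

    def receive(self, sender, message):
        """Deliver mail from known senders; otherwise hold it.

        Returns the challenge token to mail back, or None if delivered.
        """
        if sender in self.whitelist:
            self.inbox.append(message)
            return None
        # Reuse the outstanding challenge if this sender already has one.
        for token, (held_sender, messages) in self.held.items():
            if held_sender == sender:
                messages.append(message)
                return token
        token = secrets.token_hex(8)
        self.held[token] = (sender, [message])
        return token

    def confirm(self, token):
        """Sender answered: whitelist them and release everything held."""
        if token not in self.held:
            return False
        sender, messages = self.held.pop(token)
        self.whitelist.add(sender)
        self.inbox.extend(messages)
        return True
```

After the first `confirm`, every later `receive` from that sender goes straight to the inbox, which is the "delay only the first time" behaviour.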

As for point two, I'm sure you could word the response email in such a way that even the most thick-headed of recipients could figure it out....

Ian

IP Blacklists are the way to go... by m3djack · 2003-07-23 02:27 · Score: 1

I used to maintain my filters stopping spam, but they were only catching about 60% of all spam, and even then, I still had to download it from the POP server. I signed up with spamcop.net (No, I don't work for them :P) about four months ago, and now a good 95% of my spam is blocked on the server side, and I never have to see it. Ever.

I subscribe to all of the IP blacklists, and I've never lost a legitimate email. Since March, the service has stopped about 11,000 spams. The best part is for $30/yr., I don't have to play games with filtering programs... works like a charm.

Re:IP Blacklists are the way to go... by acceleriter · 2003-07-23 03:59 · Score: 2, Interesting

spamcop.net was pretty cool, before Julian got in bed with Cyveillance. Now I wouldn't touch them with a ten foot pole.

--
CEE5210S The signal SIGHUP was received.

My theory by Anonymous Coward · 2003-07-23 02:39 · Score: 0

People who send out spam are in it for the money. I don't know the specifics of the industry, but I'd wager that they're paid on number of emails sent out.
Anybody know how spammers actually get paid?

PopFile by MrEnigma · 2003-07-23 02:44 · Score: 3, Interesting

What's awesome about the author (Dr. John Graham-Cumming) is that he not only knows his stuff, but he puts it out in his open source software called PopFile, written in Perl.

PopFile can be located at http://popfile.sourceforge.net.

I am currently using PopFile, with an accuracy of 98.26% from nearly 8,000 messages. It's the best I've ever used, and it's free!

--
GeekWares - Buy and Download Today!

Re:PopFile by JohnGrahamCumming · 2003-07-23 04:00 · Score: 1

Thanks for the kind words.

John.
Re:PopFile by driptray · 2003-07-23 12:01 · Score: 1

As another popfile user, I'd like to add that the popfile author is very active on the popfile forums on sourceforge, and has a wonderfully professional attitude. He calmly deals with critics, and always appears open-minded about new suggestions and is willing to discuss the pros and cons openly.
As a result, the forums are quite active, and he seems to get quite a bit of assistance from the users. It really seems to be a perfect example of an open source project done well.
My popfile accuracy is 99.49%, for over 9,000 emails. I receive approximately 120 spams a day. I'm just a very happy user.
Re:PopFile by JohnGrahamCumming · 2003-07-25 05:41 · Score: 1

Perhaps you'd like to nominate me for an Open Source Award :-)
At any rate, thanks for the comments. It's heartening to hear from people that what you are doing is professional, especially when faced with the challenges of the demands of people worldwide on your "little" project.
Glad to hear that POPFile is working out for you.
John.

That would require we be able to find them by Moryath · 2003-07-23 02:55 · Score: 2, Interesting

The trick is: the Spammer, him/her/itself (well he/she WILL be an "it" if I ever find them), wants to be completely transparent.

They send mail. You see mail. In their depraved mind, you then deal with company that commissioned mail.

First of all, I want to strangle the people who commissioned said mail, especially mr. "Free golf wedge, best in world" and the fuck from K-Mart marketing who bought a cd full of email addresses and added them to K-Mart's bluelight email list.

However, that's not the point.

Think about how we filter. In order to have a realistic opt-out sequence, we have to be able to reach the spammer back. Either by email, or clicking a link, or something of that sort.

The MOMENT something that static is in the email, however, ISP filters will catch it and promptly ban any email that they send with that indicator tag in it.

See the trick? It's all based on evading filters. You can't legitimately provide an opt-out solution, because then that becomes an identifying tag for people to filter you away.

And the last thing spammers want to see is people actually opt out anyways, because if they WERE honoring it, they couldn't claim to be mailing to 50 million people. They make their cash partially on the claim that they reach a huge number of people in order to get responses from a smaller number, just as TV shows do with ratings and ads.

Legitimate mail is completely different by Moryath · 2003-07-23 03:01 · Score: 1

Legitimate email offers are one thing.

For example, I accept emails from Amazon. Why? Because I buy books from them. When something comes up that I might be interested in, I like hearing. Likewise, I accept the occasional email from online computer parts stores I've bought from. Chances are I am not buying again, but if the right offer came along I might, and I have been a customer of theirs.

However, two things need to happen:

Fraudulent email (porn, penis junk, get rich quick, etc...) needs to be stopped, except for people who bought from those people before. It should be all opt-in.

Sales of customer lists, of lists of emails, should be ILLEGAL. I have bought a service or product from you, and only from you. I have no business relationship with your cousin, your "partner" business down the street, or anyone else you might think to send my information to.

If this happened, we wouldn't mind nearly as much. Legitimate mail, from companies I legitimately have dealt with, is fine. The problem is, for every one of those emails I get, there are 5,000 fraudulent spams.

Gettin paid by Glonoinha · 2003-07-23 03:02 · Score: 1

I read a recent story by an ex-spammer that said he was up to sending something like (insert some big ass random number here because I forgot) 10M emails a day, 70M emails a week.

Got paid roughly $1,000 a week on good weeks, that seemed to be his peak.

This little kokgobbler is sending out roughly a third of a billion spam emails a month just so he can make $40k a year. That alone justifies letting sys/admins kill spammers.

--
Glonoinha the MebiByte Slayer

But surely... by StoatBringer · 2003-07-23 03:04 · Score: 0

...everyone *wants* a larger penis. And breasts.

--
Cress, cress, lovely lovely cress

Congrats by NumbThumb · 2003-07-23 03:10 · Score: 1

You have been trolled.
Have a nice day.

--
I have discovered a truly remarkable sig which this 120 chars is too small to contain.

Where's the profit in hiding? by netringer · 2003-07-23 03:19 · Score: 2, Interesting

One thing I gotta know: If the spammer knows I have no interest in the, say, "Herbal Viagra" product he's pitching, why does he think that if he says he's selling "V A 1 G R A" it'll be different? Am I supposed to go for that message and BUY THE PRODUCT now?

I'll answer my own question a bit: After seeing one of these scumbags on TV it's obvious they get off just watching the counter increment, saying that he just sent 4,123,456,890 more messages while he watched. They don't really want you to buy or do anything. They just want to send the garbage.

--
Ever dream you could fly? Get up from the Flight Sim. I Fly

Re:Where's the profit in hiding? by acceleriter · 2003-07-23 03:56 · Score: 1

They're going for the people whose ISPs filter for them, I guess. Either that, or the spammers are totally irrational.

--
CEE5210S The signal SIGHUP was received.
Re:Where's the profit in hiding? by Anonymous Coward · 2003-07-23 05:09 · Score: 0

Why try to get past filters? Maybe the person who runs the mail server is doing filtering for all accounts on the server. There may be some customers who have no strong opinion on spam (or who aren't even fully aware of the spam / non-spam distinction) who might respond and buy whatever crap they're hawking.

In other words, the fact that your mail is filtered doesn't imply that you personally took intentional steps to filter it.

Also, maybe spammers just don't want to feel like they're losing out to the filters.

Spam clients outed, credit card details published by hexidec · 2003-07-23 03:32 · Score: 1

The Reg has just posted an article about anti-spam activists outing some potential future spammers. Give it a read, and if you're sufficiently motivated, join the battle.

Silly by Hatta · 2003-07-23 03:53 · Score: 1

I see lots of schemes to kill spam, but anything that requires cooperation between the end users isn't going to catch on. You need to be able to send email to and receive email from anyone in the world. There is an existing user base of billions for email. They won't suddenly all switch over. I think the best way to deal with spam would be a GPG web of trust, but if you blacklist unverified keys you lose one of the major advantages of email.

--
Give me Classic Slashdot or give me death!

Re: SPAM by grandmofftarkin · 2003-07-23 04:11 · Score: 1

Rubbish - that's an acronym after the fact. The real meaning is that receiving that sort of message is as annoying as having a bunch of Vikings shouting "spam, spam, spam, spam" and drowning out your conversation. Anyone tells you different, they're a n00b to the net and you should ignore them.

I'm sure you are right, though not everyone believes this. See FOLDOC, where it states: Correspondent Bob White claims the modern use of the term predates Monty Python by at least ten years. He cites an editor for the Dallas Times Herald describing Public Relations as "throwing a can of spam into an electric fan just to see if any of it would stick to the unwary passersby."

No, spam doesn't work, but something else does by Anonymous Coward · 2003-07-23 04:15 · Score: 1, Informative

The spammers MAY make money by selling an occasional Penis Enlarger, but they REALLY make money by selling LISTS!

Lists of VALID email addresses.

These lists are SOLD to people trying to actually sell things. "Clean" lists with valid email addresses.

The people who BUY these lists or services want as FEW bounces as possible.

This is one reason why I get gobs of these new spams that are really nothing more than spam filter tests. They are trying to figure out what gets through and also trying to poison the filters so they can claim a higher percentage throughput for the stuff they REALLY want to deliver.

The problem is not people who actually BUY the stuff advertised, the problem is the people who buy the LISTS of email addresses or the services of a spammer thinking that they are using some sort of valid "Direct Marketing" service.

I built a small website for a client and after it was up, he wanted me to find him a way to advertise the site VIA EMAIL! He wanted me to go find a spammer, and PAY the spammer to send his ad to millions of valid email addresses.

He saw absolutely nothing wrong with this. He thought it was no different than buying a snail mail mailing list and sending out thousands of flyers....but Cheaper!

and he was not selling Penis Enlargers, he was selling Printing Services!

It's the Buying of the LISTS and spam SERVICES that's the big problem! Not the people who are actually buying the stuff in the spam.

It's like the Gold Rush, there may be no Gold anymore, but that won't stop people from heading out west to try, and when they get there, they find some nice vendors who are more than happy to sell them all the tools they need to pan for gold. Whether they find gold is immaterial, there's a steady flow of customers buying supplies. The ones that give up, just go away. Plenty more suckers lined up outside to buy pans, picks and shovels.

metaphone mapping text by joeldg · 2003-07-23 04:21 · Score: 5, Interesting

You can use the metaphone algorithm (I use PHP, so http://us3.php.net/manual/en/function.metaphone.php), which has come in handy. Just strip all HTML and de-urlencode, then run this on the msg; it totally ignores numbers and punctuation and any letters that are not in (a-z A-Z). You will need to have a database pre-made full of metaphone values from a dictionary, then start a comparison and you can get a general feel for the msg.

I took all the words used in a product called spamassassin and used that to do a comparison.. Coupled with bayes filtering I imagine this would be pretty much the best way to filter mail.
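
A rough sketch of that pipeline in Python, for anyone who wants to play with it. Big caveat: Python's standard library has no metaphone, so this swaps in a simplified Soundex-style key as a stand-in (a much coarser algorithm, so expect more false matches). The `phonetic_key` and `suspicious_words` names and the crude tag-stripping regex are assumptions for illustration, not the code described above.

```python
import re
from urllib.parse import unquote

# Soundex-style consonant classes: similar-sounding letters share a digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def phonetic_key(word):
    """Simplified Soundex-style key; digits and punctuation are ignored."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return ""
    key = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            key += code
        if ch not in "hw":  # h and w do not separate repeated codes
            previous = code
    return (key + "000")[:4]


def suspicious_words(raw, watch_words):
    """Strip tags, undo URL-encoding, and return words that sound like a watch word."""
    watch_keys = {phonetic_key(w) for w in watch_words}
    text = unquote(re.sub(r"<[^>]*>", "", raw))
    return [w for w in text.split() if phonetic_key(w) in watch_keys]
```

With this, "viagra", "V1AGRA" and "V*I*A*G*R*A" all collapse to the same key. Spaced-out letters ("V I A G R A") still get through, since each letter is its own word.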

It is kind of an interesting approach based on what mail "sounds" like vs what it actually contains.. If you filter on the straight contents these guys will just keep coming up with different ways of encoding and generally being twitchy.

However, their mail will *always* have that "buy this!" kind of sound.

I built a system a while back that was processing all double bounces from three servers and handled around 50k/day spams and came up with some interesting results.

If anyone is interested I'll dig up the code and place it on my site with the rest of the stuff there.

--
anime+manga together at last.. in real time.

Re:No Field Guide by Zeriel · 2003-07-23 04:34 · Score: 1

Umm...you're a troll, maybe, but did you check the links on the left sidebar?

--
"America has done some terrible things. But I know that Americans don't cheer when innocents die." -Dave Barry

it's also the kind of HTML that is used by victorvodka · 2003-07-23 04:50 · Score: 1

Most non-spam Outlook users send HTML messages that lack tables, iframes, and other post-Mosaic formatting tricks. I think if one were to bounce email that contained these useless HTML entities, you'd still be able to get your precious email from your long-lost Outlook-using girlfriend.
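
Checking for those tags is easy enough with a real parser rather than string matching. A minimal Python sketch, assuming "table" and "iframe" as the suspect set (that default and the function name are illustrative; tune the set to taste):

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every opening tag seen in the message body."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # the parser already lowercases tag names


def uses_suspect_markup(html, suspect=frozenset({"table", "iframe"})):
    """Return True if the HTML contains any tag from the suspect set."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return bool(collector.tags & suspect)
```

A plain bold-and-paragraph message passes; anything laid out with a table or pulling in an iframe gets bounced.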

--

The flag just makes more sense than the constitution. - Judas Gutenberg

HTML is useless; strip it out by bigberk · 2003-07-23 04:58 · Score: 1

Since HTML is such a menace, why not get rid of it? You can remove markup (note: markup) without losing the meaning of text. This Windows client strips HTML and displays sweet, innocent plaintext.
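
If you'd rather do the stripping yourself, here is a minimal sketch in Python using the standard-library parser. It is not the client mentioned above, just an illustration of the idea; the `strip_markup` name is made up, and it makes no attempt to skip script or style contents.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep character data, drop tags and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the text of an HTML body with markup removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

A side benefit: comments inserted to break up words vanish, so a word split by an HTML comment comes back out in one piece for whatever filter runs next.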

anti-spammers' Dirty Little Secret by airdrummer · 2003-07-23 05:08 · Score: 0, Flamebait

As these types of tricks are discovered, ActiveState's team of spam analysts creates new heuristics to identify them, and provides these heuristics as part of the regular PureMessage SpamCheck updates.

A.S. don't want 2 solve the spam problem, they want 2 sell bandaids... the _only_ solution is 2 charge 4 every byte sent, but that's a filthy capitalist/market-oriented idea unacceptable 2 marxist-befuddled computer "scientists"

Re:anti-spammers' Dirty Little Secret by AKnightCowboy · 2003-07-23 05:21 · Score: 1

> A.S. don't want 2 solve the spam problem, they want 2 sell bandaids... the _only_ solution is 2 charge 4 every byte sent, but that's a filthy capitalist/market-oriented idea unacceptable 2 marxist-befuddled computer "scientists"

Holy shit! Who would've thought Prince read Slashdot, much less posted comments. Welcome aboard!

Anti-Spam thought by pugugly · 2003-07-23 05:43 · Score: 1

I've had this occasional idea, with an implementation for it. Obviously, Spam depends on two things, getting a message out to everyone, whether they care or not, and getting response from that 1/1000000 person foolish enough to want generic viagra (or whatever).

Is it possible to automatically set up systems that ensure that the places which get the responses are added to the spammers' lists? If they get on each other's queues they drown out their own servers and the cost-to-benefit ratio goes up sharply.

Or so it would seem to me.

--
An Invisible Entity of Vast Power whose existence must be taken on faith alone: Liberal Media

The guide underrates the tricks by dacarr · 2003-07-23 05:47 · Score: 1

A lot of the tricks he lists as "rare" are tricks that my filters frequently pick up on.

--
This sig no verb.

Pump-and-dump stock spam by Anonymous Coward · 2003-07-23 06:12 · Score: 0

Until recently, most of my spam was "buy this stock now!". How do you propose I locate the person making money off that one?

Why remove your email address when... by SyCKLe · 2003-07-23 06:29 · Score: 1

... you can submit a spammers address? ;)

Just a thought. Instead of brooding about not being able to remove your emails from spam sites, ever considered putting in a known spammer email in the Remove box instead of yours? :)

And by spammer email address, I don't mean the address you pluck from the email, which is usually crap. All the spam that comes in usually advertises some sorta URL. Find out who owns the bloody URL using WHOIS, validate their email address to make sure it's legitimate, and throw their email in the Remove box.

Of course, if you have time on your hands, you could always go VISIT the spam sites, find out the contacts (if you can't, check out the source code), validate the email address for legitimacy, and do the same in other spam sites Remove box. Spam sites are just PERFECT breeding ground for Remove and Opt-Out boxes!

If you have a website, it gets even better! Draw up a HTML page and stick the spammer's legitimate email addresses in there... and pray for the bots to come! :)

Who knows, their email will probably get harvested, and the spammers will in turn get spammed by other spammers. Let them spam themselves to death for all I care.

And after that, if you are really hyped, subscribe them to some bestiality mailing lists, or some nasty ones, and imagine them squirm!

Remember, I get spam too! And it sucks. But thanks to Procmail, it's just a trickle now. Still, one or two do get through a day, and for those rare ones, boy I really DO hope they have REMOVE and OPT-OUT boxes! ;)

So don't shun the Remove options. Embrace it to your advantage! ;)

Nice free advertising, Slashdot by oobar · 2003-07-23 06:52 · Score: 1

Okay, this article is a thinly veiled promo for ActiveState. This so-called field guide contains a handful of tricks that are mostly obvious to anyone that knows a little bit about HTML or MIME-Encoding. You would be much better off combing through SpamAssassin's extensive list of heuristics rather than reading a boring rehash of "Hey! you can hide stuff in HTML comments! Betcha didn't know that! (Subscribe to our newsletter, thanks.)"

(S)tupid (P)eople (A)sking for (M)oney by Anonymous Coward · 2003-07-23 06:58 · Score: 0

>And for crying out loud, "spam" is not an acronym so stop writing it in upper case!

SPAM was coined back in the early eighties. Sorry, you're wrong!

Bayesian filters to the rescue! by brlewis · 2003-07-23 07:02 · Score: 1

A Bayesian filter would learn that "iaga" is a sign of spam. Any spam-hiding technique that fragments words is going to run into this same problem. Basically they'd have to resort to one or two-letter fragments, making their messages even easier to distinguish from legitimate mail.

Bayesian by elleirdad · 2003-07-23 07:38 · Score: 1

I agree that Bayesian filters are the way to go.

Wasn't it Justice Potter Stewart who said "I can't define pornography, but I know it when I see it."?

Well, if all other spam filters try to define spam, it is the Bayesian ones that learn by example. They not only learn by the piles that wiggys describes, Bayesian filters learn about new types of spam as the messages arrive.

I learned about Bayesian from the SpamBayes site and then switched to a beta of InBoxer from a small start-up(www.inboxer.com) about a month ago. I hardly see any spam any more.

Re:TMDA/ASK by nexus987 · 2003-07-23 08:35 · Score: 1

Yeah, I've been using "Active Spam Killer" (ASK www.paganini.net/ask), which has similar functionality. In the 2.5 months I've been using it, NONE of the 2000+ spams I've received have gotten to my inbox. Works great with procmail, as I don't control my mail server. I still use spamassassin to trash the stuff that's obviously spam. TMDA style software seems to be the way to go... I don't see anything else helping to obliterate spam any time soon (IE: "replace SMTP on every mail server on the planet with something better that hasn't been invented yet". Not gonna happen in our lifetimes).

Re: SPAM by joebubba · 2003-07-23 08:35 · Score: 1

Here's a blast from the past from 1995:

According to "The New Hacker's Dictionary" (third edition) by Eric S. Raymond:

====
spam vt., vi., n. [From "Monty Python's Flying Circus"]
1. To crash a program by overrunning a fixed-size buffer with excessively large input data. See also buffer overflow, overrun screw, smash the stack.
2. To cause a newsgroup to be flooded with irrelevant or inappropriate messages. You can spam a newsgroup with as little as one well- (or ill-) planned message (e.g., asking "What do you think of abortion?" on soc.women). This is often done with cross-posting (e.g. any message which is crossposted to alt.rush-limbaugh and alt.politics.homosexuality will almost inevitably spam both groups).
3. To send many identical or nearly-identical messages separately to a large number of Usenet newsgroups. This is one sure way to infuriate nearly everyone on the Net.

The second and third definitions have become much more prevalent as the Internet has opened up to non-techies, and to many Usenetters sense 3 is now (1995) primary. In this sense the term has apparantly (sic) begun to go mainstream, though without its original sense of folkloric freight - there is apparently a widespread belief among lusers that "spamming" is what happens when you dump cans of Spam into a revolving fan.
====

Now if I could just stop sneezing from all the dust that was disturbed from opening that book.

A new spamming technique to foil bayesian filters by leob · 2003-07-23 09:35 · Score: 1

Using your e-mail address in the From field to send spam to you is old news. A new technique is to go to Google Groups, find out who your 'discussion buddies' are, and then send you spam disguised as mail from those buddies. I've seen it happen.

Re: Your post by Hard_Code · 2003-07-23 09:38 · Score: 1

Hello,

This is a gre article. Everyboy should read it.

--

It's 10 PM. Do you know if you're un-American?

Re: Your post by Anonymous Coward · 2003-07-23 11:13 · Score: 0

The HTML comments alone should cause this message to get pitched. Attempting to obfuscate the contents should make the spam score 100.
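A minimal sketch of that rule: count the HTML comments in a message body, and treat any comment wedged between two letters (the word-splitting trick) as a maximum score. The function names and the weights (10 points per ordinary comment, 100 for a word-splitting one) are made up for illustration and are not taken from any real filter.

```python
import re

# Matches one HTML comment, including comments that span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def count_comments(html_body):
    """Return (total comments, comments wedged between two letters)."""
    total = splitting = 0
    for match in COMMENT.finditer(html_body):
        total += 1
        before = html_body[match.start() - 1:match.start()]
        after = html_body[match.end():match.end() + 1]
        # A comment with a letter on both sides is breaking up a word.
        if before.isalpha() and after.isalpha():
            splitting += 1
    return total, splitting


def obfuscation_score(html_body):
    """Hypothetical scoring: word-splitting comments max out the score."""
    total, splitting = count_comments(html_body)
    if splitting:
        return 100
    return min(100, 10 * total)


def visible_text(html_body):
    """What a human reader sees once the comments are dropped."""
    return COMMENT.sub("", html_body)


sample = "Buy V<!-- a1 -->iag<!-- b2 -->ra now"
print(count_comments(sample), obfuscation_score(sample), visible_text(sample))
```

Running `visible_text` before any keyword or Bayesian check also means the filter sees the same words the reader does.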

~~~

Stupid filters by beersoft · 2003-07-23 12:22 · Score: 1

The best filter I have is: if the message contains "-->", bin it. Anything else, the magic list-o-words picks up and bins for me. For 99% of the spam I have had in the last month, the From header is good, because it's from someone else getting the spam.

-later

Owen

Send helpful replies to the list, not the poster by AnotherScratchMonkey · 2003-07-23 16:09 · Score: 1

Unless the poster asks for a personal reply, don't cc his personal address. Send the reply to the list, so everyone (including the poster) benefits.

Let's react 180 degrees by Anonymous Coward · 2003-07-23 18:27 · Score: 0

What if, instead of filtering SPAM messages, we let them through and take all the steps towards buying the advertised products, except, of course, actually buying them?

Suppose they provide a URL to click on; then let's click it several times, even filling out many order forms on their website.

If they provide a phone number to call (it usually is an 800 number), call them 20 times for each email they send you.

Bring them the /. effect, have them pay for wasted bandwidth and long distance charges.

If they send 1,000,000 emails, and each recipient generates 20 bogus responses, they somehow have to process 20,000,000 bogus orders... Now that would lower their ROI, wouldn't it?

After all, aren't they begging to be DDoS'ed and overwhelmed by customer responses?

False Positives by Arpie · 2003-07-24 05:01 · Score: 2, Interesting

Yeah, I hate spam as much as the next geek. However, most people don't stop to think about the dark side of spam filters: false positives.

I use spamassassin and Mozilla's Bayesian filters. They do get rid of a lot of spam, but they also produce some false positives. This means I have to check my spam folder every so often, which kind of defeats the purpose, doesn't it?

Moreover, email is not only a personal communication tool anymore. Do you buy on-line? Do you expect an order confirmation, or a shipping confirmation? Well, it's quite likely that those could be flagged as spam by spam filters. It just happened to me yesterday on an eBay winning bid notice, because the subject had an exclamation mark. Businesses -- you know, the kind of organization that usually pays the salaries of us working geeks, or the salaries of the parents of student geeks -- need to get through to communicate with their customers. Spam and spam filters are both getting in the way.

How bad is that? IMHO pretty bad. Spam is killing half of the advantages of using email. Filters, with the pretty much unavoidable false positives in this cat-and-mouse game, are killing another quarter, at least. I don't know what will happen, but it's a pretty sad situation.

--
/* TAANSTAFL */
