Security Predictions of 2004

Posted by michael on Sunday January 4, 2004 @10:02PM from the looking-forward dept.

scubacuda writes "Computer World's security predictions for 2004: R.a..n,d,o.,m p,u,,n,c.t,,u_a.t.1..0.n evading spam filters, Internet access filtering, better desktop management, enterprise personal firewall deployment, tools that securely scrub metadata, corporate policies against USB flash drives, Wi-Fi break-ins, Bluetooth abuses, cell phone hacking, centralized control over IM, a publicized public utility break-in, government defense against cybercriminals, organized cybercrime, and a shorter time to exploitation."

14 of 326 comments

Random punctuation by JanneM · 2004-01-04 22:35 · Score: 3, Informative

Sure, you can defeat spam filters by being obscure enough. Use random punctuation, embed your message in a mass of unrelated words, and so on. But in my experience, spam is already approaching the "vanishing point" where it ceases to be comprehensible even to the humans who are supposed to react to it. I have had spam so obscure that it took me several minutes to decipher what they were trying to sell (and it still got caught by SpamAssassin).

--
Trust the Computer. The Computer is your friend.
bayesian filters aren't fooled so easily by _Shorty-dammit · 2004-01-04 22:51 · Score: 5, Informative

There are more parts to an email than just the subject line or the message body that still give emails away as spam. So even if random punctuation circumvents the spotting of something as specific as "viagra" by changing it to "v..1.,a,g.r,,a" or something similar, it doesn't matter much. There are so many other hints that it's basically meaningless to do this; they still get caught because of those other clues. I'm still amazed at how well my Bayesian filter of choice, POPFile http://sourceforge.net/projects/popfile, does with all my email needs: filtering out spam, and sorting other emails into work, family, and a handful of other 'buckets' to get everything going where I'd like it to go. Spammers are indeed trying out different ideas all the time, but next to nothing ever gets through. And when something does manage to slip by on a rare occasion, well, you just made POPFile that much better at catching the rest of the crap anyway. Shrug. It's been a long time (since I found POPFile) since spam was even the slightest concern to me. There are quite a few different Bayesian filtering methods out there, and it's definitely a good idea to check at least one of them out. POPFile's a good choice, especially if you'd like to sort things besides spam too.
Re:What I encountered yesterday by arivanov · 2004-01-04 23:06 · Score: 3, Informative

Fairly stupid, and it will not work, at least with SpamAssassin. It does Bayes on two-word combinations (unless you change one of the defaults), so random words will not get into the Bayes dictionary anyway.
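
As a rough illustration of the point about two-word combinations, here is a minimal Python sketch. It is not SpamAssassin's actual tokenizer; the function names and the `known_pairs` set are hypothetical, and it only shows why a string of unrelated words mostly produces pairs the filter has never seen.

```python
import re

def two_word_tokens(text):
    """Return consecutive two-word combinations from text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def unseen_fraction(text, known_pairs):
    """Fraction of two-word tokens that are absent from known_pairs.

    known_pairs is a hypothetical stand-in for the pairs a trained
    filter already holds in its dictionary.
    """
    tokens = two_word_tokens(text)
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in known_pairs)
    return unseen / len(tokens)
```

Pairs such as "horse house" or "lingo technical" are unlikely to recur across messages, so under this assumption they would carry little weight either way.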

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:Nearly impossible? by stevey · 2004-01-04 23:14 · Score: 2, Informative

My solution to the punctuation and l33t-speak type spams is simply to run the incoming message through a spell checker.

Whilst lots of people make typos and use words not in my dictionary, it does become obvious that a message is likely spam when the ratio of misspelt words to correctly spelt words is high.
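
A minimal sketch of that ratio check in Python, assuming the dictionary is simply a set of lowercase words (a real setup would use a proper spell checker). The function name and the tokenization rule are illustrative assumptions, not the poster's actual code.

```python
import string

def misspelled_ratio(message, dictionary):
    """Return the fraction of words in message not found in dictionary.

    Words are split on whitespace, stripped of leading and trailing
    punctuation, and lowercased before the lookup.
    """
    tokens = [w.strip(string.punctuation).lower() for w in message.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    wrong = sum(1 for t in tokens if t not in dictionary)
    return wrong / len(tokens)
```

Where to draw the line between "a few typos" and "likely spam" is a tuning decision that this sketch leaves open.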
Anti-Obfuscation script by cnb · 2004-01-04 23:15 · Score: 4, Informative

Anti-spam tools already include anti-obfuscation support. Here's one of many scripts for SpamAssassin.
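
The linked script is not reproduced here. As a stand-in, this is a minimal Python sketch of one possible anti-obfuscation pass, not the SpamAssassin script itself: it drops punctuation wedged between letters or digits, then maps a few digit lookalikes back to letters in words that mix letters and digits. The lookalike table and function name are assumptions made for illustration.

```python
import re

# Hypothetical lookalike table: 0->o, 1->i, 3->e, 4->a, 5->s
LOOKALIKES = str.maketrans("01345", "oieas")

# Runs of non-alphanumeric, non-space characters between two alphanumerics
_INNER_PUNCT = re.compile(r"(?<=[A-Za-z0-9])[^A-Za-z0-9\s]+(?=[A-Za-z0-9])")

def deobfuscate(text):
    """Undo simple punctuation and digit-substitution tricks."""
    collapsed = _INNER_PUNCT.sub("", text)
    words = []
    for word in collapsed.split(" "):
        has_alpha = any(c.isalpha() for c in word)
        has_digit = any(c.isdigit() for c in word)
        # Only touch digits inside words that also contain letters,
        # so plain numbers such as 2004 are left alone
        if has_alpha and has_digit:
            word = word.translate(LOOKALIKES)
        words.append(word)
    return " ".join(words)
```

A pass like this also mangles legitimate text such as "don't" or "example.com", so it would make more sense to feed its output to the filter as an extra signal than to replace the original message.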

- cnb
Re:Spam Spam Defeatable Spam by Jugalator · 2004-01-04 23:20 · Score: 5, Informative

According to SpamAssassin's default scores, all of the following rules add to the spam score, and they apply to the examples above that are meant to "challenge spam filters":

- Message text disguised using base64 encoding
- Uses a numeric IP address in URL
- Uses a dotted-decimal IP address in URL
- HTML has over 9 kilopixels of images
- HTML: images with 0-200 bytes of words
- HTML has a low ratio of text to image area
- The score from a Bayesian filter, which would probably increase quickly for messages with tons of punctuation and still leave legit mail alone, since you normally don't use tons of punctuation.

Spam operators might get more creative, but I still think spam removal tools are several steps ahead.
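
The additive scoring described above can be sketched in a few lines of Python. The rule names and per-rule scores below are hypothetical placeholders, not SpamAssassin's real rule identifiers or default values; the threshold of 5.0 is assumed here as the usual default for `required_hits`.

```python
# Hypothetical rule names and scores, for illustration only
EXAMPLE_SCORES = {
    "base64_text": 2.0,
    "numeric_ip_url": 1.0,
    "dotted_decimal_ip_url": 0.5,
    "low_text_to_image_ratio": 1.5,
    "bayes_high": 3.0,
}

def total_score(hits, scores):
    """Sum the scores of every rule that fired; unknown rules add nothing."""
    return sum(scores.get(rule, 0.0) for rule in hits)

def is_spam(hits, scores, required_hits=5.0):
    """A message is flagged once its summed score reaches the threshold."""
    return total_score(hits, scores) >= required_hits
```

The point of the design is that no single trick has to be caught perfectly: an obfuscated message that dodges one rule still tends to trip several others.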

--
Beware: In C++, your friends can see your privates!
Re:Spam Spam Defeatable Spam by ---- · 2004-01-04 23:45 · Score: 5, Informative

I run SpamAssassin too.

I get 30-120 spams a day (old account).

Checking with my SpamAssassin filter, I see that its Bayesian filter is happy with 1,868,996 pieces of spam, and 386 pieces of ham (the good stuff, stuff I want to keep).

I get maybe 1 spam thru to my normal inbox a month, which I happily feed to the sa-learn tool (SpamAssassin's Bayesian learning tool).

I don't need any wacky products installed in my email client (which I change often).
I access my email via IMAP over SSL.
I use Mozilla Mail mostly, but have used mutt, Outlook, Pine, Outlook Express, KMail, and a large number of others (that I've forgotten about now), all with SpamAssassin running happily on the mail server churning thru all incoming email.

Our mail server handles 4000-10,000 pieces of email a day for all our accounts, and SpamAssassin barely registers as a 'blip' on our CPU usage radar.

It's really sweet.

Oh yeah, I've had only 1 false positive, and it was due to a wise-ass friend that decided to send a piece of conversational email disguised as spam from a new email address. /* ---- */
Re:On random punctuation by Richard W.M. Jones · 2004-01-05 00:40 · Score: 2, Informative

But will it filter the town name Scunthorpe as being offensive? AOL had this problem where people living in Scunthorpe suddenly found they could no longer use their town name.
It handles this case correctly. There is actually some extra code I added to handle cases like this (specifically the word "scrape").
Basically the regexp is modified so it only matches at either the beginning or the end of a word, using word boundary matching. Not completely ideal, but good enough.
Rich.
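
A minimal Python sketch of the word-boundary approach described above, assuming a single blocked word. It is not Rich's actual code; the helper names are hypothetical. The pattern matches the word only where it starts or ends a larger word, which is why "scrape" passes.

```python
import re

def boundary_pattern(word):
    """Match word only at the beginning or the end of a word."""
    escaped = re.escape(word)
    return re.compile(rf"\b{escaped}|{escaped}\b", re.IGNORECASE)

def matches(word, text):
    """Return True if word appears at a word boundary in text."""
    return boundary_pattern(word).search(text) is not None
```

With the blocked word "crap", `matches("crap", "scrape the site")` is false because the match would sit in the middle of the word, while "scrap metal" still matches at the end of "scrap". That residual false positive is the "not completely ideal" part.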

--
libguestfs - tools for accessing and modifying virtual machine disk images
Re:Nearly impossible? by LnxAddct · 2004-01-05 01:36 · Score: 3, Informative

Yes, there is something wrong with it... you don't know everyone who will email you, and you don't know when. You can't tell mailing lists to add "a magic password", and making another account just for mailing lists will be inconvenient and will probably fill up with spam. If you hand out business cards with your email address, or post it on a private forum to get responses, there is no way to whitelist everyone who will email you. You can't ask someone for their email address every time you hand out your business card, and adding a little line to the bottom saying "Add this when you email me" will take up a lot of the space on the card and look very unprofessional. The list could go on.
Regards,
Steve
Re:Nearly impossible? by Anonymous Coward · 2004-01-05 01:54 · Score: 1, Informative

Alas, one of the main problems is that as a spammer, you are turning a profit if you get a sale on 1 in 40,000 emails (sorry, can't recall where I read that stat, but it was a reputable source).

Personally, I've been using SpamBayes (spambayes.sourceforge.net) and it's been working beautifully.

I used SpamNet (cloudmark.com) when it was free and was blown away by its accuracy. It's a p2p spam tracking network (so you let a community of humans decide what's spam, not filtering rules). Course, now they charge you to be a part of the community, but it's worth a look...
Re:Nearly impossible? by Uggy · 2004-01-05 03:10 · Score: 2, Informative

ispell -l < some_email

gives you a list of the misspelled words. You could fiddle with the capitalization rules for things like DNS, DHCP, TCP/IP etc. to lower your false positives.

We could wrap that into spamd and generate a weighted score. The problem would be speed, of course, as ispell would have to start up each time to check an email (is there a daemon mode for ispell or aspell?).

Anyway, I ran it on a bunch of aforementioned spam and it gives convincing results.
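
One way the weighted score might look, as a hedged Python sketch. It assumes the misspelled words have already been collected (for example from the output of `ispell -l`) and that the total word count is known; the function name, the acronym rule, and the `max_points` weight are all hypothetical choices, not anything spamd provides.

```python
def spelling_score(misspelled, total_words, max_points=3.0):
    """Turn a list of misspelled words into a capped, weighted score.

    misspelled: words reported as unknown by the spell checker.
    total_words: number of words in the whole message.
    max_points: hypothetical ceiling for this test's contribution.
    """
    if total_words <= 0:
        return 0.0
    # Treat short all-caps tokens such as DNS or DHCP as acronyms, not typos
    real_misses = [w for w in misspelled if not (w.isupper() and len(w) <= 6)]
    ratio = min(len(real_misses) / total_words, 1.0)
    return round(max_points * ratio, 2)
```

Skipping acronyms is one simple way to lower the false positives mentioned above; the startup cost of running ispell per message is a separate problem that this sketch does not address.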

Of course, slashdotters would probably rate a lot of false positives, so maybe we shouldn't push this until we better our spelling.

--
Toddlers are the stormtroopers of the Lord of Entropy.
Re:What I encountered yesterday by Teux · 2004-01-05 03:32 · Score: 2, Informative

Paul Graham wrote an explanation of why this sort of random introduction of words into spam doesn't fool a good Bayesian filter in this article.

So Far, So Good

The more they try to fool the filter, the better the filter becomes at recognizing this sort of "random" word placement. Interesting read.
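
To make the argument concrete, here is a minimal Python sketch of the combining rule Paul Graham describes in "A Plan for Spam": only the most "interesting" tokens (those farthest from a neutral 0.5) are combined, and unknown words get a mildly innocent 0.4. The function name and the example probabilities are assumptions; stored probabilities are assumed to be clamped away from exactly 0 and 1.

```python
from math import prod

def spam_probability(tokens, token_probs, unknown=0.4, top_n=15):
    """Combine per-token spam probabilities into one message probability."""
    probs = [token_probs.get(t, unknown) for t in set(tokens)]
    # Keep only the tokens farthest from the neutral value 0.5
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:top_n]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)
```

Because random padding words sit close to neutral, they rank below the strongly spammy tokens and can only fill the slots left over, so a handful of incriminating words still dominates the result.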
Re:Spam Spam Defeatable Spam by stormpunk · 2004-01-05 04:35 · Score: 2, Informative

That's faster because it didn't delete what you wanted.

From the perlop manpage:
Note that tr does not do regular expression character classes such as \d or [:lower:].

Also, do you really want to delete *all* white space too?

SpamAssassin does a good job of catching spammers by their horrible imitation headers too, which I'm sure they will continue to identify themselves by.
dictionary words in bare mime part by alsta · 2004-01-05 04:59 · Score: 2, Informative

By far the most nefarious spam I've seen so far is the kind that has a bunch of dictionary words in the bare 7-bit part of a MIME-encoded message. It's common to see this stuff if you have a mail client that doesn't render the multi-media portion of the e-mail by default. You'll see something like:

conduit horse house press lingo technical gelatin overlord brown uniform

In the multi-media portion you'll see spam like never before.

How to stop these? You can't train a Bayes database with dictionary words, as it would eventually defang the whole method. Your only option, I suppose, would be to compare the contents of the multi-media portion with the 7-bit ASCII portion and see if they match. The problem here is making the comparison fuzzy enough to allow for multi-byte characters and stuff like that.
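
One possible shape for that comparison, as a hedged Python sketch using only the standard library. It measures word overlap (a Jaccard ratio) between the text/plain and text/html parts; the function name is hypothetical, and what counts as "too little overlap" is left as a tuning question.

```python
import re
from email import message_from_string
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def _words(text):
    return set(re.findall(r"\w+", text.lower()))

def part_overlap(raw_message):
    """Word overlap between plain and HTML parts, or None if one is missing."""
    msg = message_from_string(raw_message)
    plain, html = [], []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        (plain if ctype == "text/plain" else html).append(text)
    if not plain or not html:
        return None
    extractor = _TextExtractor()
    extractor.feed(" ".join(html))
    a = _words(" ".join(plain))
    b = _words(" ".join(extractor.chunks))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Comparing word sets rather than exact strings gives some of the fuzziness asked for, since whitespace, markup, and word order no longer matter. An overlap near zero suggests the plain part is filler unrelated to what the HTML part displays.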

The worst thing about this type of spam is that at best your Bayes database is circumvented, and at worst it is trained to see good words as bad or bad words as good and is rendered useless.

With SpamAssassin it is easy to set when to auto-train your Bayes backend and when not to. I have my required_hits option set to '4.0', so I would use the following settings:

use_bayes 1
auto_learn 1
auto_learn_threshold_spam 7
auto_learn_threshold_nonspam -5.5

With this I am reasonably confident that I am not training my Bayes database to treat good words as bad unless the message really is found to be spam empirically, and vice versa unless I am sure it's a good e-mail, typically by means of AWL or whitelist_from.
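
The intent of those thresholds can be sketched in Python as follows. This is only an illustration of the decision the settings express, with a hypothetical function name; SpamAssassin itself may apply further conditions before auto-learning that this sketch ignores.

```python
def auto_learn_decision(score, spam_threshold=7.0, nonspam_threshold=-5.5):
    """Decide whether a scored message should train the Bayes database.

    Returns "spam", "ham", or None when the score falls in the
    uncertain middle band and nothing should be learned.
    """
    if score >= spam_threshold:
        return "spam"
    if score <= nonspam_threshold:
        return "ham"
    return None
```

With required_hits at 4.0, a message scoring between 4.0 and 7 is still flagged as spam but is not used for training, which is the safety margin the settings above are aiming for.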

If anybody has solved this, I would be very grateful to hear what you did and how you did it.

--
Wealth is the product of man's capacity to think. -Ayn Rand