Security Predictions of 2004
scubacuda writes "Computer World's security predictions for 2004: R.a..n,d,o.,m p,u,,n,c.t,,u_a.t.1..0.n evading spam filters, Internet access filtering, better desktop management, enterprise personal firewall deployment, tools that securely scrub metadata, corporate policies against USB flash drives, Wi-Fi break-ins, Bluetooth abuses, cell phone hacking, centralized control over IM, a publicized public utility break-in, government defense against cybercriminals, organized cybercrime, and a shorter time to exploitation."
This is a good thing. It makes it harder for the victims to read, and gives a lot of anomalies that any modern statistical filter will find extremely useful.
Spammers actually seem to try defeating Bayesian spam filters by "training" them with random words:
From: Noah Poe
Date: Sun, 04 Jan 2004 15:58:49 -0600
To: a.konrad@aon.at
Subject: canberra happen
aides bone emmanuel rumania persistent josephine pencil majesty bottom
anarch molecular cafe hepburn done ellipsoid monoceros chokeberry pungent decontrolled
orphanage keel cessna lippincott drugstore onion inclement empire
This is just sick.
A monkey is doing the real work for me.
Ok, this is probably a dumb question, but why the hell doesn't anyone make a spell checking spam filter? Just set it to junk any incoming email with more than x% spelling mistakes, and voila! All y,o.ur.,. r,a.,n.d,.om.,,. p,.u,.nc,.tu,at,i.on and |33t 5p34k is fucked. Combine it with a regular spam filter, and you're set!
It'd also have the added bonus of keeping idiots who can't spell worth crap out of your inbox. And since it would work off a dictionary (preferably the same one as your outgoing spell checker, if equipped), you could always add whatever names, phrases, and abbreviations you wanted, while still keeping the "0MG L1EK MAK UR P3N0R 9 INCHZ LONGR!!" crap out of your inbox.
Surely we have the ability to create something like this. So where is it?
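For what it's worth, the idea fits in a few lines. This is only a rough sketch, not an existing filter: the names misspelled_ratio and looks_like_spam, the toy DICTIONARY and the 50% threshold are all made up for illustration, and a real version would load the same word list your outgoing spell checker uses.

```python
# Toy word list (an assumption for this sketch); a real filter would load
# a full dictionary plus whatever names and abbreviations you add to it.
DICTIONARY = {
    "combine", "it", "with", "a", "regular", "spam", "filter",
    "and", "you're", "set",
}

def misspelled_ratio(text, dictionary):
    """Return the fraction of whitespace-separated tokens not in dictionary."""
    tokens = text.split()
    if not tokens:
        return 0.0
    unknown = 0
    for token in tokens:
        # Strip punctuation from the ends only, so "set!" counts as "set"
        # while "r,a.,n.d,.om" stays mangled and fails the lookup.
        word = token.strip(".,!?;:\"'()").lower()
        if word not in dictionary:
            unknown += 1
    return unknown / len(tokens)

def looks_like_spam(text, dictionary, threshold=0.5):
    """Flag the message when more than `threshold` of its tokens are unknown."""
    return misspelled_ratio(text, dictionary) > threshold
```

With that toy dictionary, "Combine it with a regular spam filter, and you're set!" scores 0.0, while the "0MG L1EK MAK UR P3N0R" line scores 1.0 and gets junked. The obvious weakness is that a mail made of correctly spelled random words sails straight through, so it only makes sense alongside a regular spam filter.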
One of the requirements (coming from "concerned parents", of course) was to filter out swearing in the chat rooms. So if someone typed in, say, "you're a shit", what would actually appear for everyone else would be "you're a $!%^" or something similar.
Eventually, of course, we got into an arms race with the kids, who would write "sh1t", "s.h.i.t", "sh*t" and so on.
However, I came up with a program which generated a regexp which matched pretty much all the variations, and - to date - none of the kids have worked out a way around it.
This is how it worked.
(Actually, I can send anyone the original regexp generator code if they're interested - just mail me).
The basic concept was to use a table of "equivalences", e.g. "a" => [ "@", "4", "A", ....], "f" => [ "ph", .... ]
For each swear word we generate a regexp with (r1|r2|r3|...) for each letter in the bad word, where r1, r2, r3, ... are the list of equivalences for that letter.
That produces a list of swear-word-matching regexps, which we then combined into a super mega regexp which would match any of the 50 or so banned words.
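To make the scheme concrete, here is a minimal sketch of it in Python. It is not the original generator: the EQUIVALENCES entries are a small made-up sample, and the names word_to_regexp, build_filter and censor are hypothetical.

```python
import re

# Illustrative equivalence table (an assumption; the real table is not shown).
EQUIVALENCES = {
    "a": ["a", "@", "4"],
    "e": ["e", "3"],
    "f": ["f", "ph"],
    "i": ["i", "1", "!"],
    "o": ["o", "0"],
    "s": ["s", "$", "5"],
    "t": ["t", "7", "+"],
}

def word_to_regexp(word):
    """Build one (r1|r2|r3|...) group per letter of the banned word."""
    groups = []
    for letter in word.lower():
        # Letters without an entry in the table only match themselves.
        variants = EQUIVALENCES.get(letter, [letter])
        groups.append("(?:" + "|".join(re.escape(v) for v in variants) + ")")
    return "".join(groups)

def build_filter(banned_words):
    """Combine the per-word regexps into one big alternation."""
    combined = "|".join(word_to_regexp(word) for word in banned_words)
    return re.compile(combined, re.IGNORECASE)

def censor(text, pattern, mask="$!%^"):
    """Replace every match with the mask string."""
    return pattern.sub(lambda match: mask, text)
```

For example, censor("you're a sh1t", build_filter(["shit"])) gives "you're a $!%^". Note that this sketch matches anywhere in the text, including inside longer innocent words, so a production filter would need some thought about word boundaries.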
One interesting thing is that you can end up with a regexp which is too big for GNU regexp to handle ... But there are ways to get round that, and you can code it up as a flex parser too, which doesn't have any limits as far as I can tell.
The actual code is slightly more complex and does a few more things than above (e.g. it works for "s.h.1.t" too, or even "s---h--1----------t"). It also has a concept of "obliterator characters", so "sh*t" can be banned too.
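Those two extras could look roughly like this. Again this is a guess at the behaviour rather than the real code: SEPARATORS and OBLITERATORS are invented lists, an "obliterator" is assumed to be a character standing in for a single letter, and barring it from the first letter is just one way to cut down on false positives.

```python
import re

# Small illustrative tables (assumptions, not the original data).
EQUIVALENCES = {"i": ["i", "1", "!"], "s": ["s", "$", "5"], "t": ["t", "7", "+"]}
SEPARATORS = ".-_"    # filler characters allowed between letters
OBLITERATORS = "*#"   # characters that may stand in for a letter

def letter_group(letter, allow_obliterator):
    """Alternation of everything that may stand for one letter."""
    variants = list(EQUIVALENCES.get(letter, [letter]))
    if allow_obliterator:
        variants.extend(OBLITERATORS)
    return "(?:" + "|".join(re.escape(v) for v in variants) + ")"

def word_to_regexp(word):
    """Per-letter groups joined by an optional run of separator characters."""
    gap = "[" + re.escape(SEPARATORS) + "]*"
    groups = [
        # The first letter must be written out, so "****" alone never matches.
        letter_group(letter, allow_obliterator=(index > 0))
        for index, letter in enumerate(word.lower())
    ]
    return gap.join(groups)

pattern = re.compile(word_to_regexp("shit"), re.IGNORECASE)
```

This pattern matches "s.h.1.t", "s---h--1----------t" and "sh*t", but not "shirt". Spaces are deliberately left out of SEPARATORS here, since otherwise something like "this hit" would match.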
If anyone's interested I can send the code.
Rich.