Paul Graham on Fighting Spam
Ramakrishnan M writes "Paul Graham, the Lisp guru, is back with a great technique to fight spam. It is based on a trust metric, and he claims only 5 out of 1000 spams leaked through this system, with 0 false positives. Worth looking at."
The proper way to get rid of spam is to get rid of spammers. Make it illegal to send spam, to market using spam, and to host spammers.
Make each link in the chain liable!
Fight Spammers!
1) Lisp... ever since I ran into Scheme, I have _loved_ the concept of Lisp-based languages. A nice hoo-ha to anyone who says there are no practical applications of Lisp-based languages. (Except Haskell... which, personally, I think sucks! If one of our own professors hadn't invented it, it would be dead by now.)
2) _0_ false positives. I'm perfectly happy to settle for "some small number of spams getting through" given there are NO false positives. Early on in the article he states that he realizes this is a critical problem, and from the start keeps no false positives as a goal. It is far better to have no false positives than to have a 100% no-spam rate.
3) The statistical word analysis is really interesting... "describe" is innocent. Unfortunately... what happens when a few smart spammers get their hands on this analysis? *sigh*
When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in vi. (Larry Wall)
When I said market using spam, that includes the company that hires someone who spams.
Great... now that they know, they'll spam me with GIFs and JPEGs.
All humans are mortal. Socrates is a human. Socrates is dead.
Spammers would learn to adapt, and the sales pitches would change character/format. The sales pitch will still be that, but it'll be more cleverly designed - it may be hard to do, but people will manage it. Having said that, this method does look like it could be worth implementing - maybe even on the mail server...
Tom.
Oh arse
In the long run filtering would eliminate the source as well. Spam has to be paid for by two sides: both the spammer and the recipient have to pay for the bandwidth. The spammer has to pay a lot more, though. Spamming is a business that will continue to exist as long as it's profitable. If the success rate of spam drops dramatically due to refined filters, then sooner or later spammers will no longer be able to afford the bandwidth they need.
Hank! White!
but it'll be more cleverly designed
Ding ding ding ding <points at nose>.
I think you've hit the nail on the head. Simply requiring that spam be cleverly designed should get rid of 99% of spammers.
People's desire to believe they are right is much stronger than their desire to be right.
Making spam illegal would probably cut down on people buying email lists and starting to spam in their free time because it seems like a great way to make some money. It might even cut down on the "legitimate businessmen" types here who do it professionally. It's going to have no effect internationally, however, and there's really not much you can do about it.
There's an interesting point about this in the article, however, when Graham says:
I would agree with this - it seems to me that for a lot of "crimes" of this nature, drugs being the best example, the solution is not criminalization but regulation. People aren't going to stop dealing or using drugs, nor is it something so serious (like murder) that it's worth putting them in jail anyway. If drugs were regulated, however, most of the problems could be easily reduced. Enforce strict controls to prevent cutting, ban advertisement, and tie sellers to treatment programs to help get people off of drugs. As long as there's no incentive for people to buy them illegally (i.e., their being much cheaper or, as it is now, the only supply), people will buy them from regulated sellers.
Similarly, if you regulate spam and make people attach footers, you'll be less likely to drive people overseas to spam while also making it much easier to filter out.
Of course, there's still not much you can do about the Koreans, other than trying to get their government to do the same thing.
Besides, do you really want to encourage the government to effectively prohibit certain kinds of non-victimizing (non-kiddie porn) speech online?
The spam message is entirely contained as an /image/ within the html.
Thankfully, my e-mail client is set up to not render any HTML in an e-mail. I have yet to send back any information to a spammer via specially-coded image tags and am proud of it.
HTML-based e-mail is fundamentally insecure and really should be used by no one (except those who simply don't care about privacy). Go here to learn just what a spammer--or anyone who sends you an HTML-based e-mail--can learn about you with just one "click" of your mouse.
Yes, the spammer can learn what browser version you use, what OS you use, and even what city you live in (via the traceroute). An unusually savvy spammer could use this information to install spyware via known exploits in certain browsers and operating systems.
In short, HTML e-mail is damn scary, knowing that so many people use it without realizing just how much information they are giving away for free!
Healthcare article at Kuro5hin
I like TMDA, but I have two issues with it. First, you can only use it if you control a mail server. Second, my friends have a terrible time dealing with the concept of having to reply to a message to let mail go through to me. Sure, I can add them in advance, but if they have a new mail address, I don't get to see their message. Maybe I just have dumb friends, but they are my friends, and I want to get mail from them!
I actually had to close down my hotmail account; the spam would exceed the 2MB within 24 hours after being cleaned (and that's with the wonderful MS spam filter set on "high.")
BTW, these days I'm getting individual spams that are 170 KB in size. Talk about rude...
Phallic Symbols in LOTR
BUT, now, the best spam filters out there already use statistical properties. Spamassassin does this...
Spamassassin (as he addressed) does not do this; it gives individual items a fixed score. His method dynamically scores items based on the message. You could use his filter as a plugin for Spamassassin, but with the numbers he's talking about you wouldn't need anything other than his system.
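For anyone who wants to see the difference concretely, here is a minimal sketch of how per-token spam probabilities can be combined into one score for a message, following the combining rule in Graham's article as I understand it (take the tokens farthest from a neutral 0.5, then compare the product of the probabilities with the product of their complements). The function name, the `keep` parameter, and the sample numbers are my own illustration, not his code.

```python
from math import prod


def combined_spam_probability(token_probs, keep=15):
    """Combine per-token spam probabilities into one message score.

    token_probs: per-token probabilities, assumed already clamped away
    from 0 and 1 (e.g. to the range 0.01..0.99).
    keep: how many of the most "interesting" tokens to use (assumed default).
    """
    # The most interesting tokens are those farthest from a neutral 0.5.
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:keep]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


# Two strongly spammy tokens outweigh one mildly innocent token.
score = combined_spam_probability([0.99, 0.99, 0.2])
print(score > 0.9)
```

The point is that no token carries a hand-assigned weight: the probabilities come from the user's own mail, and the score depends on which tokens happen to stand out in this particular message.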
Bill
SMTP is designed broken because it:
1) Allows senders to be faked.
2) Is slow.
3) Requires bounces for broken messages.
4) Allows loops.
5) Cross-subscription to mailing lists, complicated mailing list management.
6) MIME.
7) Add your gripe here.
See http://cr.yp.to/im2000.html
This reasoning is statistically invalid. It is only true if the chance of the word "sexy" appearing in a message is independent of the chance of the word "sex" appearing. In other words, only if knowing that the word "sex" appears tells you nothing about how likely the word "sexy" is to appear, can you reason as he is doing above. That's probably a very poor assumption in this case.
He is doing:
P(sex and sexy) = P(sex) * P(sexy)
The correct formula is:
P(sex and sexy) = P(sex) * P(sexy | sex)
where the last term means the probability of "sexy" given that "sex" appears. Maybe his approach is good enough for his purposes, but the statistical foundations are not correct.
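To make the independence point concrete, here is a small sketch on a made-up five-message corpus (the messages and helper names are invented purely for illustration). It compares the product of the two marginal probabilities with the true joint probability, and checks that the conditional form recovers the joint.

```python
# Toy corpus: each message is reduced to its set of words (invented data).
messages = [
    {"sex", "sexy"},
    {"sex", "sexy"},
    {"sex"},
    {"describe"},
    {"describe", "meeting"},
]


def p(word):
    """Fraction of messages containing `word`."""
    return sum(word in m for m in messages) / len(messages)


def p_joint(a, b):
    """Fraction of messages containing both `a` and `b`."""
    return sum(a in m and b in m for m in messages) / len(messages)


def p_given(b, a):
    """Fraction of the messages containing `a` that also contain `b`."""
    with_a = [m for m in messages if a in m]
    return sum(b in m for m in with_a) / len(with_a)


naive = p("sex") * p("sexy")                # treats the words as independent
exact = p("sex") * p_given("sexy", "sex")   # uses the conditional probability
print(naive, exact, p_joint("sex", "sexy"))
```

On this corpus the independent product understates how often the two words appear together, because "sexy" only ever shows up alongside "sex".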
the spam should be written as a 'multipart/alternative' with an html version of the spam as the primary alternate. The text version contains an innocuous message intended to pass the statistical spam filter. The spam message is entirely contained as an /image/ within the html.
Yes, this would make it more difficult to spot, but notice that he examines the headers as well as the content of the spam. Looking at Mr. Graham's examples, a lot of the key words that his filter finds are parts of the header, so you have a good chance that the probabilistic filters can still rule these out.
The second point, also made in Paul's article, is that part of what you want to do is push up the costs and difficulty of sending spam. Pushing out a million HTML images is much more costly to the spammer than sending out a million text messages. The more costs we can force spammers to bear the less economical it will become to spam, thus reducing the amount of spam.
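As a rough illustration of why the image trick doesn't hide everything, here is a sketch that pulls tokens from both the headers and every text part of a multipart/alternative message using Python's standard `email` package. The token pattern, the `header*word` prefix convention, and the sample message are my own assumptions for the example, not something taken from the article.

```python
import re
from email import message_from_string

# Assumed token shape: runs of letters, digits, dollar signs, apostrophes, dashes.
TOKEN = re.compile(r"[A-Za-z0-9$'-]+")


def tokens_from_message(raw):
    """Return lowercase tokens from the headers and all text/* parts."""
    msg = message_from_string(raw)
    tokens = []
    # Header words are prefixed with the header name (hypothetical convention).
    for name, value in msg.items():
        for word in TOKEN.findall(str(value)):
            tokens.append(f"{name.lower()}*{word.lower()}")
    # walk() visits every part, including the HTML alternative.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload:
            charset = part.get_content_charset() or "utf-8"
            text = payload.decode(charset, errors="replace")
            tokens.extend(w.lower() for w in TOKEN.findall(text))
    return tokens


SAMPLE = """\
From: offers@example.invalid
Subject: Limited offer
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain

Hello, meeting notes attached.
--b1
Content-Type: text/html

<html><img src="http://example.invalid/ad.gif"></html>
--b1--
"""

tokens = tokens_from_message(SAMPLE)
print(tokens)
```

Even when the pitch itself lives in an image, the headers, the HTML tags, and the image URL still produce tokens that a statistical filter can learn from.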
Sailing over the event horizon
Can you imagine the day everyone uses this? You send mail to a public list and get back 2000 messages asking you to "authenticate" yourself.
This is a bad plan for working in the large.
Freedom of speech is not the freedom to trespass on my computer equipment or to use my resources to make me listen to your advertising!
This is not a prohibition on your paying your money to spread your advertising. This is a prohibition on you spending my money to spread your advertising.
Commercial speech does have some constitutional protection, but not to the same level as non-commercial speech. But even with pure political speech, there is no requirement for me to pay for your speech.
As for hitting the delete key, at that point you have already tied up at least 2 of my computers and used my disk storage, my time, and my bandwidth without paying for it.
If you want to spam, no problem, just pay me in advance.
Fight Spammers!
If you have a driveway that connects to a public road, then people can park there. Your house is connected to a public road, I can walk in and watch TV. Your car is on a public road, I can use it without your permission.
A spammer that I tracked down was very unhappy that I knocked on his door. He claimed I was trespassing. How could I be? He opted in by having his house accessible by a public road.
If spamming is legal and honorable, why don't you post your real name, address, and phone number with each spam and on each website that you spam about?
Fight Spammers!
- Mail arrives
- Mail is scanned
- Good/Bad hash is updated
- Mail is delivered (if necessary)
Then at the end of the day regenerate the probability hash.
Sig is on vacation
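The steps listed above (mail arrives, is scanned, the good/bad hash is updated, mail is delivered, and the probability hash is rebuilt once a day) could be sketched roughly like this. It is only a sketch under assumptions: the class and method names are invented, tokenizing is a plain whitespace split, and the constants (good counts doubled, clamping to 0.01..0.99, 0.4 for unseen tokens, a 0.9 threshold) are the ones I remember from Graham's article.

```python
from collections import Counter
from math import prod


class DailyFilter:
    """Hypothetical filter: counts update per message, probabilities once a day."""

    def __init__(self, min_count=5, threshold=0.9):
        self.good, self.bad = Counter(), Counter()  # token counts per corpus
        self.ngood = self.nbad = 0                   # messages seen per corpus
        self.prob = {}                               # the "probability hash"
        self.min_count = min_count
        self.threshold = threshold

    def train(self, text, is_spam):
        """Update the good/bad hash with one message."""
        tokens = text.lower().split()
        if is_spam:
            self.bad.update(tokens)
            self.nbad += 1
        else:
            self.good.update(tokens)
            self.ngood += 1

    def score(self, text):
        """Scan: combine the 15 most extreme token probabilities."""
        probs = [self.prob.get(t, 0.4) for t in set(text.lower().split())]
        top = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:15]
        spam = prod(top)
        ham = prod(1.0 - p for p in top)
        return spam / (spam + ham)

    def handle(self, text):
        """Mail arrives: scan it, update counts, return True if it is delivered."""
        is_spam = self.score(text) > self.threshold
        self.train(text, is_spam)
        return not is_spam

    def regenerate(self):
        """End of day: rebuild the probability hash from the counts."""
        self.prob = {}
        for token in set(self.good) | set(self.bad):
            g = 2 * self.good[token]  # good counts doubled to bias against false positives
            b = self.bad[token]
            if g + b < self.min_count:
                continue  # too rare to say anything about
            good_rate = min(1.0, g / max(1, self.ngood))
            bad_rate = min(1.0, b / max(1, self.nbad))
            self.prob[token] = max(0.01, min(0.99, bad_rate / (good_rate + bad_rate)))
```

One caveat with this arrangement: because the filter trains on its own verdicts, a misclassified message reinforces the mistake, so in practice you would want some way for the user to correct the counts before the nightly rebuild.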