Slashdot Mirror


Paul Graham on Fighting Spam

Ramakrishnan M writes "Paul Graham, the Lisp guru, is back with a great technique to fight spam. It is based on Bayesian statistics, and he claims that only 5 out of 1000 spams leaked through his system, with 0 false positives. Worth looking at."

25 of 675 comments

  1. This is wrong. by www.sorehands.com · · Score: 1, Insightful
    SPAM is wrong!

    The proper way to get rid of spam is to get rid of spammers. Make it illegal to send spam, to market using spam, and to host spammers.

    Make each link in the chain liable!

    1. Re:This is wrong. by morgajel · · Score: 2, Insightful

      "If you outlaw spam, the only people with spam are outlaws..." er, something.
      Anyway, what I was going to say is: OK, the US outlaws spam. Now what? Sue Korea as a whole? How about China? Nigeria?

      Laws don't mean shit.
      You need to go after the people making MONEY off spam, not the spammers. Most of them are US "businesses"... and I use the term 'business' loosely.

      --
      Looking for Book Reviews? Check out Literary Escapism.
    2. Re:This is wrong. by japhmi · · Score: 2, Insightful
      One heavy-handed bit of leverage would be to block /all/ telecommunications from Korea


      This is a very bad idea. What about companies such as Hyundai that have Korean and American (and many other countries') divisions? Or what about my friends from Korea trying to e-mail their families back home? Should they be hurt because some companies in their home country do bad things (and/or its government doesn't have or enforce laws to stop them)? Name a country that doesn't have another country (or several) thinking it needs to 'change how they do things over there.'

      --
      "Giving money and power to government is like giving whiskey and car keys to teenage boys" P. J. O'Rourke
    3. Re:This is wrong. by Stonehand · · Score: 3, Insightful

      In this case, the damage to others /is/ the point; it's the same logic as the Usenet Death Penalty. You hurt others whom the authorities do care about (in the case of a UDP, the ISP's customers who send perfectly legitimate email) so that the authorities change their policies...

      It's not particularly nice, or even remotely fair, but something like that might work. A large-scale boycott by major ISPs might do the trick.

      --
      Only the dead have seen the end of war.
  2. Ok, that is hot.... by Vengie · · Score: 4, Insightful

    1) Lisp... Ever since I ran into Scheme, I have _loved_ the concept of Lisp-based languages. A nice Hoo-ha to anyone who says there are no practical applications of Lisp-based languages. (Except Haskell... which, personally, I think sucks! If one of our own professors hadn't invented it, it would be dead by now.)

    2) _0_ false positives. I'm perfectly happy to settle with "some small number of spams getting through" given there are NO false positives. Early on in the article he states that he realizes this is a critical problem, and from the start he keeps zero false positives as a goal. With that in mind, it is far better to have no false positives than to block 100% of spam.

    3) The statistical word analysis is really interesting... "describe" is innocent. Unfortunately... what happens when a few smart spammers get their hands on this analysis? *sigh*

    --
    When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in vi. (Larry Wall)
    1. Re:Ok, that is hot.... by Anonymous Coward · · Score: 1, Insightful
      I'm perfectly happy to settle with "some small number of spams getting through"

      I'm not singling you out, but this statement is the exact reason spam has become as popular as it has. It's annoying, it's cumbersome, but everyone is willing to 'settle' to avoid further problems. People spend effort developing complex filters and programs and proxies, which the spammer spends about a minute and a half figuring out how to get around. I think with spammers there should be ZERO tolerance and ZERO SPAM. To stop spam, you need to stop THE SPAMMER.

    2. Re:Ok, that is hot.... by Plutor · · Score: 5, Insightful

      1) [...] A nice Hoo-ha to anyone who says there are no practical applications of lisp based languages. (except haskell...which personally, i think sucks! [...])

      You ridicule people who dismiss the usefulness of your personal "favorite" language, and then you dismiss the usefulness of one particular language that you happen to dislike? That's a bit hypocritical.

      3) [...] what happens when a few smart spammers get their hands on this analysis[?]

      Paul covers this. First, he suggests that each user's filters should be personalized, so that any spammer would not be able to circumvent everyone's filters. Second, the filters would be continually learning, possibly dumping older words from the corpus in favor of newer ones. And third, even if a spammer put at the end of his spam "describe describe describe describe", this still wouldn't work; the basic premise of the filter is that the spammer HAS to tell you what he's selling, and in the process of doing that, gives himself away as a spammer.
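      A minimal Python sketch of the scoring step Paul describes, assuming a per-user table of word probabilities has already been built from that user's own mail. The helper names, the example probabilities, the 0.4 default for unseen words, counting each distinct word only once, and the 0.9 cutoff are illustrative assumptions rather than code from the article; the 15-token limit is the figure the article uses. Only the tokens farthest from a neutral 0.5 are combined, so padding a spam with "describe describe describe describe" contributes one innocent token that has to outweigh every word the spammer needs in order to sell something.

```python
from math import prod

# Illustrative assumptions, not values taken verbatim from the article.
UNKNOWN_WORD_PROB = 0.4   # probability assigned to never-seen words
N_INTERESTING = 15        # how many of the most decisive tokens to combine
SPAM_CUTOFF = 0.9         # scores above this are treated as spam


def combine(probs):
    """Bayes' rule for per-token spam probabilities, assuming the tokens
    are independent and spam/non-spam are equally likely a priori."""
    p_spam = prod(probs)
    p_ham = prod(1.0 - p for p in probs)
    return p_spam / (p_spam + p_ham)


def spam_score(tokens, table):
    # Each distinct word is looked up once, so repeating a word adds nothing.
    probs = [table.get(t, UNKNOWN_WORD_PROB) for t in set(tokens)]
    # The "interesting" tokens are those farthest from a neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return combine(probs[:N_INTERESTING])


# Hypothetical per-user probabilities learned from that user's own mail.
table = {"sexy": 0.99, "click": 0.96, "describe": 0.01}

plain = spam_score(["sexy", "click", "here"], table)
padded = spam_score(["sexy", "click", "here"] + ["describe"] * 4, table)
print(f"{plain:.4f} {padded:.4f}")  # prints 0.9994 0.9412
```

      In this toy example the padding lowers the score from about 0.999 to about 0.94, which is still above the assumed cutoff.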

    3. Re:Ok, that is hot.... by RevAaron · · Score: 3, Insightful

      Most people here on /. would say the same thing about Lisp-related languages that you say about Haskell, especially since they were forced to use them, to their detriment, in an intro CS class or perhaps in AI. I love Lisp myself, but I also think Haskell is quite interesting and can be very useful.

      There's no difference between you, "L1sp rules und haskell dr00ls!" and all the slashkiddiez on here that say "perl and C 0wnZ j00! fsck lisp!"

      --

      Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
  3. When I said... by Anonymous Coward · · Score: 1, Insightful

    When I said 'market using spam,' that includes the company that hires someone to spam.

  4. Filtering text content by gawi · · Score: 2, Insightful

    Great... now that they know, they'll spam me with GIFs and JPEGs.

    --
    All humans are mortal. Socrates is a human. Socrates is dead.
  5. Re:A weak point... by tomknight · · Score: 3, Insightful
    Yes, I'll admit I hurried in with the comment there. Stupid ;-)

    Spammers would learn to adapt, and the sales pitches would change in character and format. The sales pitch will still be a sales pitch, but it'll be more cleverly designed; it may be hard to do, but people will manage it. Having said that, this method does look like it could be worth implementing, maybe even on the mail server...

    Tom.

    --
    Oh arse
  6. Re:Misleading by sebi · · Score: 4, Insightful

    In the long run, filtering would eliminate the source as well. Spam has to be paid for by two sides: both the spammer and the recipient pay for the bandwidth. The spammer has to pay a lot more, though. Spamming is a business that will continue to exist as long as it's profitable. If the success rate of spam drops dramatically due to refined filters, then sooner or later spammers will no longer be able to afford the bandwidth they need.

  7. Re:A weak point... by tsg · · Score: 2, Insightful

    but it'll be more cleverly designed

    Ding ding ding ding <points at nose>.

    I think you've hit the nail on the head. Simply requiring that spam be cleverly designed should get rid of 99% of spammers.

    --
    People's desire to believe they are right is much stronger than their desire to be right.
  8. Law and Reality by prester · · Score: 3, Insightful
    Making something illegal doesn't make someone stop doing it, obviously. All it does is increase the risks of doing the action. If it's still worth it to you anyway (drug dealers, drug addicts), or you're not thinking about the consequences of your actions (shooting the bastard who you just found in bed with your wife), or if you don't think that you're actually going to get caught (warez), you're not going to stop just because it's illegal.

    Making spam illegal would probably cut down on people buying email lists and starting to spam in their free time because it seems like a great way to make some money. It might even cut down on the "legitimate businessmen" types here who do it professionally. It's going to have no effect internationally, however, and there's really not much you can do about it.

    There's an interesting point about this in the article, however, when Graham says:
    "(I used to think it was naive to believe that stricter laws would decrease spam. Now I think that while stricter laws may not decrease the amount of spam that spammers send, they can certainly help filters to decrease the amount of spam that recipients actually see.)"

    I would agree with this. It seems to me that for a lot of "crimes" of this nature, drugs being the best example, the solution is not criminalization but regulation. People aren't going to stop dealing or using drugs, nor is it something so serious (like murder) that it's worth putting them in jail anyway. If drugs were regulated, however, most of the problems could be easily reduced: enforce strict controls to prevent cutting, ban advertising, and tie sellers to treatment programs to help get people off drugs. As long as there's no incentive for people to buy illegally (i.e., illegal drugs being much cheaper or, as now, the only supply), people will buy from regulated sellers.

    Similarly if you regulate spam and make people attach footers you'll be less likely to drive people overseas to spam while also making it much easier to filter out.

    Of course, there's still not much you can do about the Koreans, other than trying to get their government to do the same thing.

    Besides, do you really want to encourage the government to effectively prohibit certain kinds of non-victimizing (non-kiddie porn) speech online?
  9. Re:This approach is very easy to defeat by pmz · · Score: 5, Insightful

    The spam message is entirely contained as an /image/ within the html.

    Thankfully, my e-mail client is set up to not render any HTML in an e-mail. I have yet to send back any information to a spammer via specially-coded image tags and am proud of it.

    HTML-based e-mail is fundamentally insecure and really should be used by no one (except those who simply don't care about privacy). Go here to learn just what a spammer--or anyone who sends you an HTML-based e-mail--can learn about you with just one "click" of your mouse.

    Yes, the spammer can learn what browser version you use, what OS you use, and even what city you live in (via the traceroute). An unusually savvy spammer could use this information to install spyware via known exploits in certain browsers and operating systems.

    In short, HTML e-mail is damn scary, knowing that so many people use it without knowing just how much information they are giving away for free!
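    As a rough illustration of the mechanism: every remote image in an HTML message is a URL the mail client fetches when it renders the message, and that request typically exposes the reader's IP address and client software, plus whatever per-recipient ID the spammer embedded in the URL. A client that declines to fetch remote images avoids this. The Python sketch below uses only the standard library's `html.parser` and `urllib.parse` to list such URLs so they could be blocked; the class, function, and example URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collect <img src> URLs that a renderer would fetch over the network."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # http(s) images are fetched from the sender's server on display;
        # cid: images are embedded in the message itself.
        if urlparse(src).scheme in ("http", "https"):
            self.remote.append(src)


def remote_images(html):
    finder = RemoteImageFinder()
    finder.feed(html)
    finder.close()
    return finder.remote


# Hypothetical tracking URL with a per-recipient ID in the query string.
print(remote_images('<p>Hi</p><img src="http://spam.example/t.gif?id=12345">'
                    '<img src="cid:logo">'))
```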

  10. Re:Best anti-Spam method is TMDA by Anonymous Coward · · Score: 2, Insightful

    I like TMDA, but I have two issues with it. First, you can only use it if you control a mail server. Second, my friends have a terrible time dealing with the concept of having to reply to a message to let mail go through to me. Sure, I can add them in advance, but if they have a new mail address, I don't get to see their message. Maybe I just have dumb friends, but they are my friends, and I want to get mail from them!
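    The friction described here comes from the basic challenge-response loop: mail from unknown senders is held until the sender answers a challenge, and only then is the address whitelisted. Below is a toy Python model of that loop; it is not TMDA's actual code or configuration, and the class and method names are hypothetical. It shows why a friend writing from a new address gets challenged again.

```python
import secrets


class ChallengeResponseFilter:
    """Toy challenge-response filter (hypothetical, not TMDA's implementation)."""

    def __init__(self, whitelist=()):
        self.whitelist = set(whitelist)  # senders allowed straight through
        self.pending = {}                # challenge token -> (sender, message)

    def receive(self, sender, message):
        if sender in self.whitelist:
            return "deliver"
        # Hold the message; a real system would mail this token back to the sender.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return "challenge:" + token

    def confirm(self, token):
        # The sender answered the challenge: release the message, trust the address.
        sender, message = self.pending.pop(token)
        self.whitelist.add(sender)
        return message


f = ChallengeResponseFilter(["friend@old.example"])
print(f.receive("friend@old.example", "hi"))        # deliver
print(f.receive("friend@new.example", "new addr"))  # challenge:<random token>
```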

  11. Not much help for businesses... by David+Wong · · Score: 3, Insightful

    ...Or somebody who runs a website like me. I want readers to be able to get through, even though they're not each on my approved list. In the same way, a business who uses a customer feedback e-mail address needs to keep it open to everyone.

    I actually had to close down my Hotmail account; the spam would exceed the 2 MB limit within 24 hours of being cleaned out (and that's with the wonderful MS spam filter set on "high").

    BTW, these days I'm getting individual spams that are 170 KB in size. Talk about rude...

  12. Re:This is not news ... by wsloand · · Score: 2, Insightful

    BUT, now, the best spam filters out there already use statistical properties. Spamassassin does this...

    SpamAssassin (as he points out in the article) does not do this; it gives individual items a score. His method dynamically scores items based on the message. You could use his filter as a plugin for SpamAssassin, but with the numbers he's talking about, you wouldn't need anything other than his system.
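    To make the distinction concrete, here is a toy Python version of fixed-weight rule scoring. The rule names, weights, and threshold are made up for illustration and are not SpamAssassin's real rules or scores. Each rule adds the same weight to every message for every user; in Graham's approach the weight of each word is instead learned from the user's own mail, and which words count depends on the message being scored.

```python
# Hypothetical hand-written rules with fixed weights. These are NOT
# SpamAssassin's real rule names or scores; they only illustrate the idea.
RULES = [
    ("MENTIONS_FREE", lambda text: "free" in text.lower(), 2.0),
    ("CLICK_HERE", lambda text: "click here" in text.lower(), 2.5),
    ("SHOUTING", lambda text: "!!!" in text, 1.5),
]
THRESHOLD = 5.0  # assumed cutoff for this sketch


def fixed_rule_score(text):
    """Sum the fixed weight of every rule that matches the text."""
    hits = [(name, weight) for name, test, weight in RULES if test(text)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


score, hits = fixed_rule_score("Click here for FREE stuff!!!")
print(score, hits, score >= THRESHOLD)
```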

    Bill

  13. The problem is the existing email infrastructure by dmelomed · · Score: 2, Insightful

    SMTP is broken by design because it:

    1) Allows senders to be faked.
    2) Is slow.
    3) Requires bounces for broken messages.
    4) Allows loops.
    5) Complicates cross-subscription to mailing lists and mailing list management.
    6) MIME.
    7) Add your gripe here.

    See http://cr.yp.to/im2000.html
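    Point 1 is easy to see from the sending side: the From: header is ordinary text written by whoever composes the message, and SMTP itself does nothing to verify it (the envelope sender given in the MAIL FROM command is likewise unauthenticated). A sketch using Python's standard `email.message.EmailMessage`; the function name and addresses are hypothetical, and nothing is actually sent.

```python
from email.message import EmailMessage


def build_message(claimed_sender, recipient, subject, body):
    """Compose a message; nothing here checks that claimed_sender is real."""
    msg = EmailMessage()
    msg["From"] = claimed_sender   # free text chosen by the composer
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message("someone-else@example.org", "you@example.net", "Hi", "Hello.")
print(msg.as_string())
```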

  14. Incorrect statistics by SiliconEntity · · Score: 4, Insightful
    Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.

    This reasoning is statistically invalid. It is only true if the chance of the word "sexy" appearing in a message is independent of the chance of the word "sex" appearing. In other words, only if knowing that the word "sex" appears tells you nothing about how likely the word "sexy" is to appear, can you reason as he is doing above. That's probably a very poor assumption in this case.

    He is doing:

    p(sex & sexy) = p(sex) * p(sexy)
    The correct formula is:
    p(sex & sexy) = p(sex) * p(sexy | sex)
    where the last term means the probability of "sexy" given that "sex" appears.

    Maybe his approach is good enough for his purposes, but the statistical foundations are not correct.
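    For reference, the 99.97% figure in the quoted passage follows from Bayes' rule under two assumptions: spam (S) and non-spam (H) are equally likely a priori, and the two words occur independently of each other within each class. That independence step is the one the objection targets. Writing w_1 = "sex", w_2 = "sexy", p_1 = P(S | w_1) = 0.97 and p_2 = P(S | w_2) = 0.99:

```latex
\begin{align*}
P(S \mid w_1, w_2)
  &= \frac{P(w_1, w_2 \mid S)\,P(S)}
          {P(w_1, w_2 \mid S)\,P(S) + P(w_1, w_2 \mid H)\,P(H)} \\
  &= \frac{P(w_1 \mid S)\,P(w_2 \mid S)\,P(S)}
          {P(w_1 \mid S)\,P(w_2 \mid S)\,P(S) + P(w_1 \mid H)\,P(w_2 \mid H)\,P(H)}
     && \text{(independence within each class)} \\
  &= \frac{p_1 p_2}{p_1 p_2 + (1 - p_1)(1 - p_2)}
     && \text{(Bayes' rule per word, } P(S) = P(H)\text{)} \\
  &= \frac{0.97 \cdot 0.99}{0.97 \cdot 0.99 + 0.03 \cdot 0.01}
   = \frac{0.9603}{0.9606} \approx 0.9997
\end{align*}
```

    Without the independence step, the second line would need P(w_2 | w_1, S) in place of P(w_2 | S), which is the within-class version of the p(sexy | sex) term above.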

  15. Re:This approach is very easy to defeat by gwernol · · Score: 3, Insightful

    the spam should be written as a 'multipart/alternative' with an html version of the spam as the primary alternate. The text version contains an innocuous message intended to pass the statistical spam filter. The spam message is entirely contained as an /image/ within the html.

    Yes, this would make it more difficult to spot, but notice that he examines the headers as well as the content of the spam. Looking at Mr. Graham's examples, a lot of the key words his filter finds are in the headers, so there's a good chance the probabilistic filter can still catch these.
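    A minimal Python sketch of tokenizing the headers as well as every text part, using the standard library's `email` package. The tokenizer regex, lowercasing, and the `header*word` tagging are assumptions for illustration, not the article's exact rules. Reading the text/html alternative alongside text/plain means the innocuous plain-text part cannot stand in for the whole message, though spam carried entirely in an image still leaves only the headers and the HTML markup to score.

```python
import re
from email import message_from_string

# Hypothetical tokenizer: runs of letters, digits, dollar signs, apostrophes, hyphens.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def message_tokens(raw):
    """Collect tokens from the headers and from every text part of a message."""
    msg = message_from_string(raw)
    tokens = []
    # Header tokens are tagged with the header name (an assumption of this
    # sketch), so header words and body words get separate statistics.
    for name, value in msg.items():
        for tok in TOKEN_RE.findall(str(value)):
            tokens.append(f"{name.lower()}*{tok.lower()}")
    # Body tokens from every text/* part, including a text/html alternative.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name: fall back to UTF-8
            text = payload.decode("utf-8", errors="replace")
        tokens.extend(tok.lower() for tok in TOKEN_RE.findall(text))
    return tokens
```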

    The second point, also made in Paul's article, is that part of what you want to do is push up the costs and difficulty of sending spam. Pushing out a million HTML images is much more costly to the spammer than sending out a million text messages. The more costs we can force spammers to bear the less economical it will become to spam, thus reducing the amount of spam.

    --
    Sailing over the event horizon
  16. Mailing list hell by ajs · · Score: 3, Insightful

    Can you imagine the day everyone uses this? You send mail to a public list and get back 2000 messages asking you to "authenticate" yourself.

    This is a bad plan for working in the large.

  17. Bullshit! by www.sorehands.com · · Score: 5, Insightful
    Another spammer lie.

    Freedom of speech is not the freedom to trespass on my computer equipment and use my resources to make me listen to your advertising!

    This is not a prohibition on your paying your own money to spread your advertising. This is a prohibition on your spending my money to spread your advertising.


    Commercial speech does have some constitutional protection, but not to the same level as non-commercial speech. But even with pure political speech, there is no requirement for me to pay for your speech.


    As for hitting the delete key: by that point, you have already tied up at least two of my computers and used my disk storage, my time, and my bandwidth without paying for it.


    If you want to spam, no problem, just pay me in advance.

  18. Your eyes are brown. by www.sorehands.com · · Score: 3, Insightful
    You are so full of shit, your eyes are brown!


    If you have a driveway that connects to a public road, then people can park there. Your house is connected to a public road, I can walk in and watch TV. Your car is on a public road, I can use it without your permission.


    A spammer I tracked down was very unhappy that I knocked on his door. He claimed I was trespassing. How could I be? He opted in by having his house accessible by a public road.


    If spamming is legal and honorable, why don't you post your real name, address, and phone number with each spam and on each website that you spam about?

  19. Re:Why Bayesian Analysis isn't so hot by mla_anderson · · Score: 2, Insightful
    If you keep the two original hashes along with the probability hash you can simply update the word count of the two originals and rebuild the probability hash. This could be fairly simple.

    1. Mail arrives
    2. Mail is scanned
    3. Good/Bad hash is updated
    4. Mail is delivered (if necessary)
    Then at the end of the day regenerate the probability hash.
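    A minimal Python sketch of that split: raw per-token counts are updated as each message arrives (step 3), and the probability table is regenerated in a separate batch step. The class and method names are hypothetical. The probability formula roughly follows the one described in Graham's article (good counts doubled to bias against false positives, tokens seen fewer than five times skipped, results clamped to the range 0.01 to 0.99); treat those constants as assumptions.

```python
from collections import Counter


class SpamCorpus:
    """Hypothetical store for the good/bad count hashes plus the probability hash."""

    def __init__(self):
        self.good = Counter()   # token -> occurrences in non-spam mail
        self.bad = Counter()    # token -> occurrences in spam
        self.n_good = 0         # number of non-spam messages seen
        self.n_bad = 0          # number of spam messages seen
        self.probs = {}         # token -> spam probability, rebuilt periodically

    def add_message(self, tokens, is_spam):
        # Per-message work: just bump the raw counts.
        if is_spam:
            self.bad.update(tokens)
            self.n_bad += 1
        else:
            self.good.update(tokens)
            self.n_good += 1

    def rebuild_probs(self, min_count=5):
        # Batch work (e.g. end of day): regenerate the probability hash.
        probs = {}
        for token in set(self.good) | set(self.bad):
            g = 2 * self.good[token]  # doubled to bias against false positives
            b = self.bad[token]
            if g + b < min_count:
                continue  # too rare to trust
            good_freq = min(1.0, g / self.n_good) if self.n_good else 0.0
            bad_freq = min(1.0, b / self.n_bad) if self.n_bad else 0.0
            p = bad_freq / (good_freq + bad_freq)
            probs[token] = max(0.01, min(0.99, p))
        self.probs = probs
```

    Because step 3 only increments counters, per-message work stays cheap; the divisions and clamping happen once per rebuild.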
    --
    Sig is on vacation