Slashdot Mirror


Bayesian Filtering For Dummies

Dynamoo writes "Bayesian filtering for spam is awfully clever stuff, touched on by Slashdot several times before. There's a very accessible article at BBC News explaining in fairly simple terms the drawbacks of current keyword-based filtering. It's slightly ironic that the BBC, through the commissioning of Monty Python, also gave 'spam' its name. Those Vikings have a lot to answer for."

2 of 281 comments

  1. Re:Yes, we must filter out the dummies by zoikes · Score: 5, Interesting

    The moderation system (esp. in its current form - moderation by +karma /.ers) will always be better than automated filtering.

    The key problem is adaptation. Bayesian filtering is better than simple keyword filtering, but its performance will degrade over time unless its rules are continuously updated (via analysis of new data). And there's the problem that a troll in one story context may be an insightful comment in another.

    Moderation by humans adapts rapidly, accommodates a variety of contexts, and will reflect (and grow with) the overall /. "culture".
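
    To make "continuously updated" concrete, here is a rough Python sketch of per-word statistics that are revised every time a newly moderated post arrives. It is only an illustration under stated assumptions: the class name WordStats, the labels "troll" and "good", and the add-one smoothing are all hypothetical choices, not something taken from the BBC article or from any real filter.

```python
from collections import Counter


class WordStats:
    """Running per-word counts for two classes, updated one post at a time.

    Hypothetical illustration: the labels and the smoothing are assumptions.
    """

    def __init__(self):
        self.counts = {"troll": Counter(), "good": Counter()}
        self.totals = {"troll": 0, "good": 0}

    def update(self, text, label):
        # Count each distinct word once per post, then bump the post total.
        for word in set(text.lower().split()):
            self.counts[label][word] += 1
        self.totals[label] += 1

    def troll_probability(self, word):
        # Add-one smoothed fraction of posts in each class containing the word.
        t = (self.counts["troll"][word] + 1) / (self.totals["troll"] + 2)
        g = (self.counts["good"][word] + 1) / (self.totals["good"] + 2)
        return t / (t + g)


stats = WordStats()
stats.update("first post", "troll")
early = stats.troll_probability("first")   # leans troll after one example

# Later the same word turns up in well-moderated posts, so the estimate moves.
for _ in range(3):
    stats.update("first principles matter", "good")
later = stats.troll_probability("first")   # now leans good
```

    The same word ends up on opposite sides of 0.5 depending on how much recent data the filter has seen, which is the degradation-without-retraining point in miniature.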

  2. Re:Yes, we must filter out the dummies by dJCL · Score: 5, Interesting

    I've been using a Bayesian spam filter for months now and I understand how they work... Even though people find the comment funny, a Bayesian troll filter on Slashdot would work...

    If you were to run every Slashdot post through my mail filter as an e-mail message and properly mark the trolls and others you don't want, and the ones you do want, suddenly you would only get the actual good posts, and trolling would die quickly... And because of the user classification system currently in place, Slashdot has a huge db to build up the word stats, so it could happen immediately or faster...

    Seriously, I ask that the Slashdot admins consider adding this to slashcode... even if Slashdot does not use it, others would... there are too many trolls out there as it is on the net and many people put them only a few rungs higher than spammers on the evolutionary ladder (but lower than an amoeba still)

    The logic behind this can actually be extended, to allow a user to start filtering stories so that they only get ones that interest them, or even to filtering submissions to get rid of the cruft. How often do you think the trolls post troll story submissions? Save work for the site admins...
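
    The "mark the trolls, build up the word stats" idea can be sketched as a tiny naive Bayes classifier. This is a minimal illustration, not dJCL's mail filter and not anything in slashcode: the function names train and troll_score, the "troll"/"good" labels, the sample posts, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter


def train(labelled_posts):
    """Build word statistics from (text, label) pairs.

    label is 'troll' or 'good' (hypothetical labels; on Slashdot they could
    be derived from existing moderation, which is an assumption here).
    """
    counts = {"troll": Counter(), "good": Counter()}
    totals = Counter()
    for text, label in labelled_posts:
        # Count each distinct word once per post.
        counts[label].update(set(text.lower().split()))
        totals[label] += 1
    return counts, totals


def troll_score(text, counts, totals):
    """Estimate P(troll | words), treating words as independent."""
    # Smoothed prior log-odds of troll versus good.
    log_odds = math.log((totals["troll"] + 1) / (totals["good"] + 1))
    for word in set(text.lower().split()):
        p_troll = (counts["troll"][word] + 1) / (totals["troll"] + 2)
        p_good = (counts["good"][word] + 1) / (totals["good"] + 2)
        log_odds += math.log(p_troll / p_good)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


posts = [
    ("first post lol", "troll"),
    ("you all suck lol", "troll"),
    ("bayesian filters need training data", "good"),
    ("good point about training data", "good"),
]
counts, totals = train(posts)
likely_troll = troll_score("lol first", counts, totals)
likely_good = troll_score("training data", counts, totals)
```

    With only four training posts the scores mean little, but the structure is the point: more marked posts give better word stats, and a threshold on the score decides what gets hidden.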

    I'm curious if an extension of this idea is how Google News works... anyone know?

    Enjoy.

    --
    On Arrakis: early worm gets the bird. Magister mundi sum!