Slashdot Mirror


Bayesian Filters Predict Sundance

JohnGrahamCumming writes "The LA Times reports on a company's use of Bayesian filtering to predict the winners at the Sundance Film Festival. They use a modified POPFile email filter and claim an 81% success rate."

29 of 123 comments

  1. It goes like this: by BTO · · Score: 4, Funny

    Gay = +100%

    --

    Banach-Tarski Overdrive
  2. The Winner! by Anonymous Coward · · Score: 2, Funny

    Tortured with health problems? You're one click away from healthy life! An amazing variety of licensed meds at one big store! Click the link and make your first step to constant relief!

  3. Shocking news! by Big+Nothing · · Score: 3, Funny

    So, a company claims that their product (or in this case; algorithm) is good?

    STOP THE PRESS!

    --
    SIG: TAKE OFF EVERY 'CAPTAIN'!!
    1. Re:Shocking news! by goombah99 · · Score: 3, Insightful

      Yeah, I get so tired of people publishing probability success rates without stating what the baseline is.

      For example, I could announce I have an 85% accurate weather prediction system. It's this: predict the sun will shine most of the day. Nowhere does it rain all day more than 15% of the days. So my predictor is 85% accurate.

      When you claim an accuracy you need to also give the null model accuracy or it's gibberish.
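
The point about null-model accuracy can be made concrete with a minimal sketch. The 20-day weather record below is hypothetical, chosen only to match the 85%/15% split in the example, and the function names are illustrative rather than taken from any library.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def null_model_accuracy(actual):
    """Accuracy of always guessing the single most common label."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)


# Hypothetical record (assumption): 17 sunny days and 3 rainy days.
actual = ["sun"] * 17 + ["rain"] * 3

# The "predictor" from the example: always say the sun will shine.
always_sun = ["sun"] * len(actual)

print("claimed accuracy:", accuracy(always_sun, actual))
print("null model accuracy:", null_model_accuracy(actual))
```

Both numbers come out the same here, which is the complaint: a claimed accuracy only means something once it is compared with what the trivial guess would score.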

      --
      Some drink at the fountain of knowledge. Others just gargle.
    2. Re:Shocking news! by sunya · · Score: 3, Informative

      nowhere does it rain all day more than 15% of the days.

      Time to brush up on geography. It rains pretty much all the time in Cherrapunji.

      --
      MLT - simple and robust open source multimedia framework for Linux
  4. Fuck films... by Caspian · · Score: 2, Insightful

    ...let's see it predict STOCK WINNERS.

    --
    With spending like this, exactly what are "conservatives" conserving?
    1. Re:Fuck films... by Caspian · · Score: 3, Funny
      "Predict something that noone else can predict."

      Who IS this Noone guy? I keep hearing his name all over the place. He must be bigger than Jesus.
      --
      With spending like this, exactly what are "conservatives" conserving?
    2. Re:Fuck films... by DeveloperAdvantage · · Score: 5, Informative

      There are many examples of using statistics and artificial intelligence in finance (go google), including some applications to predict stock prices. Even a decade ago, books like "Neural Networks in Finance and Investing" and "Artificial Intelligence in the Capital Markets" were already published, along with hordes of books on statistics in finance (think about what Quants do).

      Of course, I don't think we can yet predict stock prices with the same 81% accuracy as in this article. And, if anyone could, they would be wise to keep it to themselves.

      --
      FREE - Java, J2EE and Ajax Audiobooks for Software Developers - www.DeveloperAdvantage.com
    3. Re:Fuck films... by MobyDisk · · Score: 3, Funny

      ** REPORT RESULTS: Bayesian Query = 'STOCK WINNERS' **

      George W. Bush
      Dick Cheney
      Darl McBride

  5. Filter Mods by Anonymous Coward · · Score: 5, Funny

    Angsty +2
    Depressing +2
    Happy or Inspirational -1
    Featuring characters of a marginalized societal group +10
    Featuring characters of a majority societal group -10
    Making those majority characters feel guilty +20
    Political Agenda +10
    Social Agenda +10
    Leftist Social & Political Agenda +50
    Non-acting acting +3
    Use of black and white film +1
    Sense of Humor -5
    Comedy film -100
    Intellectual +1
    Pseudo-intellectual +30
    Director dresses in all black +4
    Actors dress in all black +10
    Actors dress in all black and do interpretive dance to Philip Glass music while speaking German backwards +20
    Audience participates and dances with the actors in above scenario +1000
    Would actually generate box office revenue -100
    Good movie that would appeal to more than a niche audience -20

  6. Re:Unimpressed by Raistlin77 · · Score: 2, Insightful

    That depends. If it predicts and filters 84% of all spam, then it can't be anything but good. However, if 84% of what it predicts and filters is indeed spam, then 16% was not and was filtered needlessly - that's bad.

  7. Fit your stereotype? by 246o1 · · Score: 4, Interesting

    From TFA (words in the description that help or hurt it): "Golden: academic, accomplished, bedroom, complex, dialogue, dream, death, focus, girl, human, high, journey, love, mother, narrative, romance, relationship, superbly, sex, ultimately. Kiss of death: Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world." So, they want complex, academic films about girl-mother relationships with a strong narrative of romance and sex. Nothing about beautiful black people in Africa or America with any sort of interest in visions, truth, or the world, especially if said black people are sexy and live near a great, nay, the best lake.
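
For readers wondering how a word list like this could fall out of a Bayesian filter, here is a rough sketch of naive Bayes word scoring with add-one smoothing. It is not the company's method or POPFile's code. The four training descriptions are made up from words in the quoted lists, and the labels "win" and "lose" are assumptions for illustration.

```python
import math
from collections import Counter


def train(descriptions):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = {"win": Counter(), "lose": Counter()}
    label_counts = Counter()
    for text, label in descriptions:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def log_odds(text, word_counts, label_counts):
    """Log-odds that a description is a 'win' rather than a 'lose'.

    Assumes both labels appear at least once in the training data.
    """
    vocab = set(word_counts["win"]) | set(word_counts["lose"])
    score = math.log(label_counts["win"] / label_counts["lose"])
    for word in text.lower().split():
        if word not in vocab:
            continue  # ignore words never seen in training
        for label, sign in (("win", 1), ("lose", -1)):
            total = sum(word_counts[label].values())
            # Add-one (Laplace) smoothing keeps the probability above zero.
            p = (word_counts[label][word] + 1) / (total + len(vocab))
            score += sign * math.log(p)
    return score


# Hypothetical training data (assumption), built from the quoted word lists.
training = [
    ("complex narrative about love and death", "win"),
    ("academic journey of a mother", "win"),
    ("riveting story with a great vision", "lose"),
    ("beautiful inspired truth about the world", "lose"),
]
counts, labels = train(training)

print(log_odds("a complex journey", counts, labels))
print(log_odds("a riveting great story", counts, labels))
```

A positive score leans toward "win" and a negative one toward "lose". Words that show up mostly in one class push the score that way, which is how a "golden" or "kiss of death" list would emerge from the counts.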

    --
    Although the moon is smaller than the earth, it is farther away.
  8. Bayesian for Slashdot by bhima · · Score: 5, Interesting

    I've been thinking about this for a while...

    Someone should develop a client side Bayesian Filter / Moderation system for Slashdot.

    Think about it...

    A sizable portion of people around here are not consistently assholes so it doesn't really make sense to add them to a "foe" list.
    Frequently things are in strange topics so it doesn't make sense to ignore whole topics.
    Not all new members are trolls so modding all new members down doesn't make sense either.
    And the current moderation system is subjected to other people's current peeves and political leanings.

    And please don't tell me to do it, I'm an embedded developer not a web developer... I have no idea where to even begin with it.

    --
    Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
    1. Re:Bayesian for Slashdot by utexaspunk · · Score: 2, Interesting

      Yeah- I've wanted a site like digg/slashdot that worked like this for a while- users can vote on anything, and then anything you haven't voted on is given a score that is calculated according to how the people who most consistently vote in agreement with you score the story/comment. The site is custom-tailored to what you want- People who like stupid crap will mod up stupid crap and get more stupid crap because other people who like stupid crap will have modded up the same stupid crap and more, while people who like good stuff will mod up good stuff and will get more of it because other people who modded up the same stuff that they thought was good will have modded up more stuff that they'll like. It's impervious to trolls or advertisers, because if I don't like advertising, I'll mod all advertising down and thus it will pre-mod stuff with advertising down because other people who hate advertising will have modded it down...
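
The scoring scheme described above can be sketched in a few lines. This is one possible reading, not an existing site's algorithm. The user names, items, and votes are hypothetical, and weighting by agreement on a -1 to 1 scale (so that people who reliably disagree with you count against an item) is a design choice made for this sketch.

```python
def agreement(mine, theirs):
    """Agreement between two users on shared items, scaled to [-1, 1]."""
    shared = [item for item in mine if item in theirs]
    if not shared:
        return 0.0
    same = sum(mine[item] == theirs[item] for item in shared)
    return 2 * same / len(shared) - 1


def predicted_score(user, item, votes):
    """Agreement-weighted average of other users' votes on an item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_votes in votes.items():
        if other == user or item not in their_votes:
            continue
        weight = agreement(votes[user], their_votes)
        weighted_sum += weight * their_votes[item]
        total_weight += abs(weight)
    return weighted_sum / total_weight if total_weight else 0.0


# Hypothetical votes (assumption): +1 means mod up, -1 means mod down.
votes = {
    "me":    {"ad": -1, "kernel": +1},
    "alice": {"ad": -1, "kernel": +1, "new_ad": -1},
    "bob":   {"ad": +1, "kernel": -1, "new_ad": +1},
}

# "me" has not voted on "new_ad"; the score comes from alice and bob.
print(predicted_score("me", "new_ad", votes))
```

In this toy data alice always agrees with "me" and bob never does, so the unseen ad is pre-modded down for "me" even though bob modded it up.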

    2. Re:Bayesian for Slashdot by Billosaur · · Score: 4, Interesting
      And the current moderation system is subjected to other people's current peeves and political leanings.

      Which is what makes it so much fun!

      Seriously, it's wonderful that Bayesian filters are useful, but why put blinders on? Slashdot would simply cease to be interesting if you could will away anything you didn't like. Intelligent discourse requires an airing of all sides of an issue and theoretically this can lead to consensus building, if the best parts of all ideas are combined. Of course you're going to get people with very little to say, or very little between the ears, muddying the waters -- the challenge is to take the disparate elements and meld them into something coherent. Superfluous elements will be winnowed out and hopefully the end product is something most people can agree on.

      Of course this is Slashdot, the Internet equivalent of a bar brawl. The rough-and-tumble of this kind of forum is what keeps it interesting and, more importantly, as much as we are infuriated by those who don't agree with us, makes us think.

      --
      GetOuttaMySpace - The Anti-Social Network
    3. Re:Bayesian for Slashdot by bhima · · Score: 2, Interesting

      I think you are looking at it the wrong way:

      Using the current mod system on Slashdot you are using someone else's blinders.
      Using the Friend / Foe system you are using a static subset.

      Less than 20% of the comments around here are either meaningful, thought provoking, or relevant... I want to see those that truly are interesting and between the current mod system and the outright volume I can't in the amount of time I'm willing to spend reading Slashdot.

      Slashdot is not like the Internet equivalent of a bar brawl; it's more like kids talking about sex in the playground of an elementary school after a heavy rain.

      --
      Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
    4. Re:Bayesian for Slashdot by BridgeBum · · Score: 2, Interesting

      Check out http://reddit.com/. At least, once it isn't broken. It's a news aggregation site like slashdot/digg, but incorporates some of what you are looking for.

      --
      My UID is the product of 2 primes.
  9. And the winner is... by fak3r · · Score: 2, Funny

    BUY Ch 3ap \/iag r a 0n1i ne - n0 prescr1pti0n r3quir3d!!!!

  10. An algorithm that works by digitaldc · · Score: 4, Funny

    So, a company claims that their product (or in this case; algorithm) is good?

    Well according to their algorithm, certain words such as Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world should never be used.

    My 'kiss of death' film would be:

    "The Beautiful Lake: An African Vision of the World"

    Description: An emotional story of truth about a man from Africa who comes to America to find himself. Being a skilled carpenter, he builds a new home which is set on a beautiful lake. As we hear anecdotes of his vision of truth, a fascinating story emerges. We also learn about his riveting and inspired adventure to his new home, and we see how it impacts his once black view of the world. A great film for any Sundance enthusiast! (with sexy subtitles)

    It is almost guaranteed to bomb, before anyone even sees it!

    --
    He who knows best knows how little he knows. - Thomas Jefferson
    1. Re:An algorithm that works by Sporkus · · Score: 2, Funny

      I've gone ahead and compiled a similar list with respect to /. posts: Golden: "insensitive clod," "tinfoil hat," "Soviet Russia," "overlords," and "M$" Kiss of Death: "the honorable Jack Thompson," "the RIAA acted appropriately," etc.

  11. A better thing by tessonec · · Score: 2, Interesting

    This was a far better (and open source) application of Bayesian filters

  12. Instructions on completing your Oscar ballot form. by SIGFPE · · Score: 2, Funny

    Does it portray women as victims? +3

    Does it star a beautiful actress with ugly makeup? +1

    Does it deal with weighty issues? +1

    Is it science fiction? -3

    Does it show how minority groups are oppressed? +2

    Does it star people from a minority group who haven't received Oscars for a few years? +2

    Did you cry? +2

    Was it made by an action movie director turned serious? +2

    Does it deal with weighty issues albeit by stringing together a sequence of time-worn cliches? +2

    Is it an action movie made by a serious director? -2

    Is it science fiction? -5

    Will I feel guilty that I'm a racist homophobe if I don't vote for this movie? +3

    (For the sound editing Oscar only:) Does the movie have good sound editing? +0

    Is it science fiction? -2

    --
    -- SIGFPE
  13. Comment removed by account_deleted · · Score: 4, Funny

    Comment removed based on user account deletion

  14. Bayesian filter to predict Slashdot's new stories? by xxxJonBoyxxx · · Score: 2, Insightful

    I'm not sure what kind of crack-simulator Slashdot put into its related stories selector, but some kind of Bayesian filter to figure out the relationship might be helpful.

    For example...

    Ask Slashdot: State of WLAN Support on Linux?
    Related...
        IT: Microsoft Spending $120M To Look Smaller
        Games: Defying Review Aggregation
        Games: Competitive Gaming Hits the Mainstream

    WTF?

  15. Re:Bayesian filter to predict Slashdot's new stori by Kagura · · Score: 2, Informative

    Where do you see the word "related" or any of its equivalents? As far as I can tell, every story's position is based on the time it is posted to the front page.

  16. Re:Instructions on completing your Oscar ballot fo by CrazyJim1 · · Score: 2

    Your system sounds like a Lifetime original movie

  17. Re:Statistical methods? by JohnGrahamCumming · · Score: 2, Informative

    Their web site states that the 81% number was "year on year" which I interpret to mean that they took the data for years n - 1 to predict year n.

    John.

  18. adjectives bad in film descriptions, menus by Harlan879 · · Score: 2, Interesting

    I was amused by something in the article that said that too many adjectives in the description ("riveting!") is a predictor of a negative outcome for a film. That reminds me of a rule of thumb for restaurants that a friend suggested -- if the name of the dish is full of adjectives, it'll taste bad. Amusingly, I just did a Google search for "restaurant menu adjectives", and most of the hits on the first page were for middle-school lesson plans where kids add adjectives to menus to make the food seem more appetizing!

  19. Re:Unimpressed by Vann_v2 · · Score: 2, Informative

    The problem is that saying it is "81% successful" is meaningless. Typically one would use a two-fold measure of success for these sorts of applications: precision and recall. In the case of spam, the precision of your algorithm would be the number of correctly marked emails over the total number of emails marked, and the recall would be the number of correctly marked emails over the number of emails that are actually spam.

    In terms of search this is perhaps more clear, so consider Google. You issue Google a search query and it returns a bunch of results. Precision measures how many of the results returned are actually relevant, and recall measures how many of the relevant results were actually returned. One could get 100% precision by returning just one result which could be verified as relevant (or, in the above case, verified as spam), and one could get 100% recall by simply returning everything. Oftentimes one takes the harmonic mean of the two, called the F-score in this case, as an overall measure of the success of the algorithm. In other instances one might want to favor precision over recall or vice versa.

    I think they probably mean "81% precision," but a low recall means that you'll have many spam emails which are not marked. Of course, if they mean the opposite, then low precision could mean many marked emails which are not spam!
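
The definitions above translate directly into code. The sketch below is a minimal illustration. The ten-message mailbox and the function name are assumptions, and the F-score computed is the plain harmonic mean (often called F1).

```python
def precision_recall_f1(marked, actually_spam):
    """Return (precision, recall, F1) given two sets of message ids."""
    correctly_marked = len(marked & actually_spam)
    precision = correctly_marked / len(marked) if marked else 0.0
    recall = correctly_marked / len(actually_spam) if actually_spam else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mailbox (assumption): messages 1-10, of which 1-5 are spam.
actually_spam = {1, 2, 3, 4, 5}

# Marking a single message that really is spam: perfect precision, poor recall.
print(precision_recall_f1({1}, actually_spam))

# Marking everything: perfect recall, poor precision.
print(precision_recall_f1(set(range(1, 11)), actually_spam))
```

The two calls reproduce the two degenerate cases described above, and the harmonic mean penalizes both of them, which is why it is often reported as a single summary number.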