Bayesian Filters Predict Sundance
JohnGrahamCumming writes "The LA Times reports on a company's use of Bayesian filtering to predict the winners at the Sundance Film Festival. They use a modified POPFile email filter and claim an 81% success rate."
Gay = +100%
Banach-Tarski Overdrive
So, a company claims that their product (or in this case, algorithm) is good?
STOP THE PRESS!
SIG: TAKE OFF EVERY 'CAPTAIN'!!
...let's see it predict STOCK WINNERS.
With spending like this, exactly what are "conservatives" conserving?
Angsty +2
Depressing +2
Happy or Inspirational -1
Featuring characters of a marginalized societal group +10
Featuring characters of a majority societal group -10
Making those majority characters feel guilty +20
Political Agenda +10
Social Agenda +10
Leftist Social & Political Agenda +50
Non-acting acting +3
Use of black and white film +1
Sense of Humor -5
Comedy film -100
Intellectual +1
Pseudo-intellectual +30
Director dresses in all black +4
Actors dress in all black +10
Actors dress in all black and do interpretive dance to Philip Glass music while speaking German backwards +20
Audience participates and dances with the actors in above scenario +1000
Would actually generate box office revenue -100
Good movie that would appeal to more than a niche audience -20
That depends. If it predicts and filters 84% of all spam, then it can't be anything but good. However, if 84% of what it predicts and filters is indeed spam, then 16% was not and was filtered needlessly - that's bad.
From TFA (words in the description that help or hurt it): "Golden: academic, accomplished, bedroom, complex, dialogue, dream, death, focus, girl, human, high, journey, love, mother, narrative, romance, relationship, superbly, sex, ultimately. Kiss of death: Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world." So, they want complex, academic films about girl-mother relationships with a strong narrative of romance and sex. Nothing about beautiful black people in Africa or America with any sort of interest in visions, truth, or the world, especially if said black people are sexy and live near a great, nay, the best lake.
Although the moon is smaller than the earth, it is farther away.
I've been thinking about this for a while...
Someone should develop a client side Bayesian Filter / Moderation system for Slashdot.
Think about it...
A sizable portion of people around here are not consistently assholes so it doesn't really make sense to add them to a "foe" list.
Frequently things are in strange topics so it doesn't make sense to ignore whole topics.
Not all new members are trolls so modding all new members down doesn't make sense either.
And the current moderation system is subject to other people's current peeves and political leanings.
And please don't tell me to do it, I'm an embedded developer not a web developer... I have no idea where to even begin with it.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
BUY Ch 3ap \/iag r a 0n1i ne - n0 prescr1pti0n r3quir3d!!!!
fak3r.com
So, a company claims that their product (or in this case, algorithm) is good?
Well according to their algorithm, certain words such as Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world should never be used.
My 'kiss of death' film would be:
"The Beautiful Lake: An African Vision of the World"
Description: An emotional story of truth about a man from Africa who comes to America to find himself. Being a skilled carpenter, he builds a new home which is set on a beautiful lake. As we hear anecdotes of his vision of truth, a fascinating story emerges. We also learn about his riveting and inspired adventure to his new home, and we see how it impacts his once black view of the world. A great film for any Sundance enthusiast! (with sexy subtitles)
It is almost guaranteed to bomb, before anyone even sees it!
He who knows best knows how little he knows. - Thomas Jefferson
This was a far better (and open source) application of Bayesian filters
Does it portray women as victims? +3
Does it star a beautiful actress with ugly makeup? +1
Does it deal with weighty issues? +1
Is it science fiction? -3
Does it show how minority groups are oppressed? +2
Does it star people from a minority group who haven't received Oscars for a few years? +2
Did you cry? +2
Was it made by an action movie director turned serious? +2
Does it deal with weighty issues albeit by stringing together a sequence of time-worn cliches? +2
Is it an action movie made by a serious director? -2
Is it science fiction? -5
Will I feel guilty that I'm a racist homophobe if I don't vote for this movie? +3
(For the sound editing Oscar only:) Does the movie have good sound editing? +0
Is it science fiction? -2
-- SIGFPE
I'm not sure what kind of crack-simulator Slashdot put into its related stories selector, but some kind of Bayesian filter to figure out the relationship might be helpful.
For example...
Ask Slashdot: State of WLAN Support on Linux?
Related...
IT: Microsoft Spending $120M To Look Smaller
Games: Defying Review Aggregation
Games: Competitive Gaming Hits the Mainstream
WTF?
Where do you see the word "related" or any of its equivalents? As far as I can tell, every story's position is based on the time it is posted to the front page.
Your system sounds like a Lifetime original movie
God spoke to me.
Their web site states that the 81% number was "year on year" which I interpret to mean that they took the data for years n - 1 to predict year n.
John.
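For what it's worth, that kind of year-on-year check can be sketched in a few lines of Python. This is only an illustration under assumptions: it is not the company's method or POPFile's code, the classifier is a plain word-count naive Bayes with add-one smoothing, and the film descriptions and the names `train`, `predict`, and `year_on_year_accuracy` are made up for the example. It trains on year n - 1 only and scores its predictions for year n.

```python
import math
from collections import Counter


def train(examples):
    """Count words per class from (description, won) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, won in examples:
        counts[won].update(text.lower().split())
        docs[won] += 1
    return counts, docs


def predict(model, text):
    """Return True if the 'winner' class has the higher log score."""
    counts, docs = model
    vocab = set(counts[True]) | set(counts[False])
    scores = {}
    for label in (True, False):
        total = sum(counts[label].values())
        # Log prior with add-one smoothing over the two classes.
        score = math.log((docs[label] + 1) / (sum(docs.values()) + 2))
        for word in text.lower().split():
            # Add-one smoothing; the extra +1 leaves room for unseen words.
            score += math.log((counts[label][word] + 1) / (total + len(vocab) + 1))
        scores[label] = score
    return scores[True] > scores[False]


def year_on_year_accuracy(by_year):
    """Train on each year, test on the next, and pool the results."""
    correct = total = 0
    years = sorted(by_year)
    for prev, cur in zip(years, years[1:]):
        model = train(by_year[prev])
        for text, won in by_year[cur]:
            correct += predict(model, text) == won
            total += 1
    return correct / total if total else 0.0


# Invented data: (catalog description, did it win?) per festival year.
by_year = {
    2003: [("complex narrative about a mother", True),
           ("riveting story of a great lake", False)],
    2004: [("complex dream about a girl", True),
           ("a riveting new vision", False)],
    2005: [("dream journey of a girl", True),
           ("great new story", False)],
}

print(year_on_year_accuracy(by_year))
```

The point of splitting by year is that the filter never sees the year it is being scored on, so the number it reports is at least not a measure of how well it memorized its own training data.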
I was amused by something in the article that said that having too many adjectives in the description ("riveting!") is a predictor of a negative outcome for a film. That reminds me of a rule of thumb for restaurants that a friend suggested -- if the name of the dish is full of adjectives, it'll taste bad. Amusingly, I just did a Google search for "restaurant menu adjectives", and most of the hits on the first page were for middle-school lesson plans where kids add adjectives to menus to make the food seem more appetizing!
The problem is that saying it is "81% successful" is meaningless. Typically one would use a two-fold measure of success for this sort of application: precision and recall. In the case of spam, the precision of your algorithm would be the number of correctly marked emails over the total number of emails marked, and the recall would be the number of correctly marked emails over the number of emails that are actually spam.
In terms of search this is perhaps more clear, so consider Google. You issue Google a search query and it returns a bunch of results. Precision measures how many of the results returned are actually relevant, and recall measures how many of the relevant results were actually returned. One could get 100% precision by returning just one result which could be verified as relevant (or, in the above case, verified as spam), and one could get 100% recall by simply returning everything. Oftentimes one takes the harmonic mean of the two, called the F-score in this case, as an overall measure of the success of the algorithm. In other instances one might want to favor precision over recall or vice versa.
I think they probably mean "81% precision," but a low recall means that you'll have many spam emails which are not marked. Of course, if they mean the opposite, then low precision could mean many marked emails which are not spam!
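To make the two definitions concrete, here is a minimal Python sketch. The function name `precision_recall_f1` and the ten-message data set are hypothetical, chosen only to show the two extremes described above (flag one sure thing versus flag everything).

```python
def precision_recall_f1(predicted, actual):
    """predicted: ids the filter marked; actual: ids that really are spam."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # The F-score is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mailbox: messages 0-9, of which 0-4 are spam.
actual_spam = {0, 1, 2, 3, 4}
cautious = {0}           # marks a single message it is sure about
greedy = set(range(10))  # marks everything

print(precision_recall_f1(cautious, actual_spam))  # perfect precision, poor recall
print(precision_recall_f1(greedy, actual_spam))    # perfect recall, 50% precision
```

Neither filter is one you would want, which is why a single "percent successful" figure says so little without knowing which of the two it is.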