Bayesian Filters Predict Sundance
JohnGrahamCumming writes "The LA Times reports on a company's use of Bayesian filtering to predict the winners at the Sundance Film Festival. They use a modified POPFile email filter and claim an 81% success rate."
Gay = +100%
Banach-Tarski Overdrive
So, a company claims that their product (or, in this case, algorithm) is good?
STOP THE PRESS!
SIG: TAKE OFF EVERY 'CAPTAIN'!!
...let's see it predict STOCK WINNERS.
With spending like this, exactly what are "conservatives" conserving?
Bring a decibel meter and a stopwatch and find the films with the loudest and longest:
1) Laughter
2) Applause
3) Standing Ovations afterward
This simple method will give you a good idea of who will be the winners.
He who knows best knows how little he knows. - Thomas Jefferson
I wonder what Mr. Graham thinks of this.
"Our engineers were thinking that determining whether a movie is good or bad could be similar to determining whether e-mail is spam or not," said Unspam Chief Executive Prince, 31, who loves the festival and uses it as a recruiting tool. "We had the last 10 years of the festival's film guides, which are like inputs, and then a bunch of outputs, like how many people saw a film, did it win anything at Sundance, did it have commercial success. If you could figure out the pattern between the inputs and the outputs, then you could actually predict future winners."
I'm not a Spam guru so please excuse me if I'm wrong, but isn't 81% a horrible result? Perhaps not for movie prediction but in Spam filtering?
SIG: TAKE OFF EVERY 'CAPTAIN'!!
In the stock market there are no winners...only fools to a greater or lesser degree.
Angsty +2
Depressing +2
Happy or Inspirational -1
Featuring characters of a marginalized societal group +10
Featuring characters of a majority societal group -10
Making those majority characters feel guilty +20
Political Agenda +10
Social Agenda +10
Leftist Social & Political Agenda +50
Non-acting acting +3
Use of black and white film +1
Sense of Humor -5
Comedy film -100
Intellectual +1
Pseudo-intellectual +30
Director dresses in all black +4
Actors dress in all black +10
Actors dress in all black and do interpretive dance to Philip Glass music while speaking German backwards +20
Audience participates and dances with the actors in above scenario +1000
Would actually generate box office revenue -100
Good movie that would appeal to more than a niche audience -20
Prince and his crew came up with two lists: words that "make you golden" or are "the kiss of death."
...
Kiss of death: Africa, America, American, beautiful, black,
Prince went on to comment they were surprised to come up with the first racist Bayesian filter in their career.
I have a sudden urge to post the O RLY owl. In on-topic news, I can't say I find this particularly impressive. From what I know about Sundance, it's pretty easy to predict winners based on past patterns and current trends in film-making. Granted, the popularity contest known as the Grammys is easier by several miles...
120 characters for a sig? That's bloody useless.
From TFA (words in the description that help or hurt it): "Golden: academic, accomplished, bedroom, complex, dialogue, dream, death, focus, girl, human, high, journey, love, mother, narrative, romance, relationship, superbly, sex, ultimately. Kiss of death: Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world." So, they want complex, academic films about girl-mother relationships with a strong narrative of romance and sex. Nothing about beautiful black people in Africa or America with any sort of interest in visions, truth, or the world, especially if said black people are sexy and live near a great, nay, the best lake.
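For anyone wondering how word lists like these fall out of a Bayesian filter, here is a minimal sketch of naive Bayes scoring, assuming descriptions are simply split on whitespace. The toy blurbs, the "win"/"lose" labels, and the function names are all made up for illustration; this is not Unspam's code or POPFile's actual implementation.

```python
import math
from collections import Counter

def train(docs):
    """Count words per label. docs is a list of (text, label) pairs,
    where label is 'win' or 'lose' (hypothetical labels)."""
    word_counts = {"win": Counter(), "lose": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts

def log_odds(text, word_counts, doc_counts):
    """Log-odds of 'win' vs 'lose' for a description, using add-one
    (Laplace) smoothing. Positive means 'golden', negative 'kiss of death'."""
    vocab = set(word_counts["win"]) | set(word_counts["lose"])
    v = len(vocab)
    n_win = sum(word_counts["win"].values())
    n_lose = sum(word_counts["lose"].values())
    # Start from the prior odds of each label.
    score = math.log(doc_counts["win"] / doc_counts["lose"])
    for w in text.lower().split():
        if w not in vocab:
            continue  # words never seen in training carry no evidence
        p_win = (word_counts["win"][w] + 1) / (n_win + v)
        p_lose = (word_counts["lose"][w] + 1) / (n_lose + v)
        score += math.log(p_win / p_lose)
    return score

# Made-up toy training blurbs, loosely echoing the word lists above.
TOY = [
    ("complex academic narrative about mother and girl", "win"),
    ("accomplished dialogue dream journey love", "win"),
    ("riveting fascinating great story", "lose"),
    ("beautiful inspired vision of the world", "lose"),
]
counts = train(TOY)
print(log_odds("complex narrative journey", *counts))  # positive score
print(log_odds("riveting great vision", *counts))      # negative score
```

Each word just nudges the score up or down by how much more often it showed up in one pile than the other, which is why a blurb writer's stock adjectives can end up on the "kiss of death" list.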
Although the moon is smaller than the earth, it is farther away.
is to use them to write the perfect screenplay! Guaranteed blockbusters, every time!
"Your mother's a bloody liar... That's what I liked about her." - Yellowbeard
They picked "Somebodies" as a successful drama. Drama? Not really; definitely a flat-out comedy. Good stuff. Sundance is cool, but their ticketing system is getting progressively worse. This year I paid $5 for the chance to pick movies in a ½ hour slot 3 days after the box office opened. Didn't get even one of my first choices; essentially got what was left.
46 & 2
I've been thinking about this for a while...
Someone should develop a client side Bayesian Filter / Moderation system for Slashdot.
Think about it...
A sizable portion of people around here are not consistently assholes so it doesn't really make sense to add them to a "foe" list.
Frequently things are in strange topics so it doesn't make sense to ignore whole topics.
Not all new members are trolls so modding all new members down doesn't make sense either.
And the current moderation system is subjected to other people's current peeves and political leanings.
And please don't tell me to do it, I'm an embedded developer not a web developer... I have no idea where to even begin with it.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
BUY Ch 3ap \/iag r a 0n1i ne - n0 prescr1pti0n r3quir3d!!!!
fak3r.com
So, a company claims that their product (or, in this case, algorithm) is good?
Well according to their algorithm, certain words such as Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world should never be used.
My 'kiss of death' film would be:
"The Beautiful Lake: An African Vision of the World"
Description: An emotional story of truth about a man from Africa who comes to America to find himself. Being a skilled carpenter, he builds a new home which is set on a beautiful lake. As we hear anecdotes of his vision of truth, a fascinating story emerges. We also learn about his riveting and inspired adventure to his new home, and we see how it impacts his once black view of the world. A great film for any Sundance enthusiast! (with sexy subtitles)
It is almost guaranteed to bomb, before anyone even sees it!
He who knows best knows how little he knows. - Thomas Jefferson
This was a far better (and open source) application of Bayesian filters
Does it portray women as victims? +3
Does it star a beautiful actress with ugly makeup +1
Does it deal with weighty issues? +1
Is it science fiction? -3
Does it show how minority groups are oppressed? +2
Does it star people from a minority group who haven't received Oscars for a few years? +2
Did you cry? +2
Was it made by an action movie director turned serious? +2
Does it deal with weighty issues albeit by stringing together a sequence of time-worn cliches? +2
Is it an action movie made by a serious director? -2
Is it science fiction? -5
Will I feel guilty that I'm a racist homophobe if I don't vote for this movie? +3
(For the sound editing Oscar only:) Does the movie have good sound editing? +0
Is it science fiction? -2
-- SIGFPE
I'm not sure what kind of crack-simulator Slashdot put into its related stories selector, but some kind of Bayesian filter to figure out the relationship might be helpful.
For example...
Ask Slashdot: State of WLAN Support on Linux?
Related...
IT: Microsoft Spending $120M To Look Smaller
Games: Defying Review Aggregation
Games: Competitive Gaming Hits the Mainstream
WTF?
It's possible they omitted some data when creating their model so that it could be used later in testing (I didn't see in the article whether this was the case). But it is also quite possible that the 81% result was based on predicting results that were used in building the model (the article says they used historical data to build the model and then tried to predict historical results to test it), which would render the results meaningless. Let's see what it predicts and compare it to the actual results when they're available.
As well as POPFile's multi-category email filtering, I sell a commercial component that does multi-category Bayesian filtering for companies to embed in their own software. Bayesian and other statistical techniques are going to be cropping up everywhere there's text to analyze.
John.
You forgot to add:
Does it include a mentally handicapped character? +10
Wow, document classification with Bayes nets. How fresh is that??! I wonder how many more of these we'll see? I liked this version better: http://www.pitchformula.com/ He took it a step further and actually MADE art based on those kinds of predictions.
Did you ever notice that *nix doesn't even cover Linux?
Where do you see the word "related" or any of its equivalents? As far as I can tell, every story's position is based on the time it is posted to the front page.
Your system sounds like a Lifetime original movie
God spoke to me.
Well, if you stick some headlines in a little grey box attached to a "main" story, wouldn't you assume they are related to the main story? (If the "grey box" thing is an attempt to "minimize" items of less-than-wide interest, the interface just sucks.)
That's because Oscar winners are just Lifetime movies with famous people starring in them. I wish they did Oscars for movies for guys.
-- SIGFPE
...predicting Slashdupes?
Sheesh, evil *and* a jerk. -- Jade
I was amused by something in the article that said that too many adjectives in the description ("riveting!") is a predictor of a negative outcome for a film. That reminds me of a rule of thumb for restaurants that a friend suggested -- if the name of the dish is full of adjectives, it'll taste bad. Amusingly, I just did a Google search for "restaurant menu adjectives", and most of the hits on the first page were for middle-school lesson plans where kids add adjectives to menus to make the food seem more appetizing!
You didn't like Brokeback Mountain either?
I'll celebrate the big step forward for Hollywood's portrayal of gay issues when they make a gay feelgood movie. Or, you know, a gay Dukes of Hazzard.
Xenu loves you!
What are the odds on Sundance in the 5th?
That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
Have you seen the movie? +3
May Peace Prevail On Earth
"Well, if you stick some headlines in a little grey box attached to a "main" story, wouldn't you assume they are related to the main story?"
Maybe at first glance, but if every single one does NOT actually relate to the story above, I would realize that assumption was wrong...
Then I think we agree - the new GREY STORY interface sucks donkey balls.
Therefore, coming soon to a theater near you:
The Contortionist
This academic work involves an accomplished contortionist, her bedroom, and many complex, dialogue-strewn dreams that focus on girl-girl scenes with animals as well as humans. Everyone is high on life in this journey through love, motherhood, and apple pie, as the narrative covers romance, relationships, but, most importantly, superb sex, ultimately.
What does Googlefight say?
'I voted for you - did you vote for me' - at least that's what the blog says ;-)
http://efrenramirez.imeem.com/photo/0MCW7w6O/K184
By releasing their results early, they have biased this year's results with media attention. Imagine a judge making a decision between two films that she liked. Does he pick the same one as the computer? If a computer can do her job what does the world need him for? So she picks the movie that was not predicted.
The correct methodology would have been to entrust the results to a third party to be released after the event was over.
By the way, I am aware that my judge is a he/she, it is Sundance after all.
If someone were to use AI to predict the stock market and invest based on those predictions, they would be very successful initially, but they would also change the behaviour of that market until the model became unusable.
I suspect this has happened several times.
If variables are correlated, the mechanics of that correlation might be due to some underlying common cause. Without understanding the underlying cause (if it exists), you are simply groping in the dark, hoping the interplay between variables doesn't change out from under you. Real predictive power and understanding comes from studying those underlying causes, which are generally robust.
For example, TFA's statement that "if you only have one or two producers on your film, you're statistically more likely to have a stinker" is akin to Flying Spaghetti Monsterism saying (albeit satirically) that "statistically speaking, the fewer pirates you have the more natural disasters you will have." Both are true statements based on raw data, but both tell you very little about the mechanics of why such correlations exist. By only studying correlations, you are absolving yourself of really understanding the system and making meaningful predictions.
I'm sure the folks doing the analysis know all this. But I worry that people who don't understand such an analysis will make idiotic policy decisions based on mistaking correlation and causality. For example, "we must have two producers to maximize our likelihood of winning the Sundance" or "let's start a pirate university to create more pirates to help reduce natural disasters!"
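Here's a rough simulation of that point, assuming a single hidden cause drives both variables. The names ("hidden", "producers", "success"), the noise levels, and the sample size are invented for illustration and say nothing about real Sundance data.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical hidden cause (say, how well-backed the project is).
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Both observed variables depend on the hidden cause, not on each other.
producers = [h + rng.gauss(0, 0.2) for h in hidden]
success = [h + rng.gauss(0, 0.2) for h in hidden]
observed = pearson(producers, success)

# "Policy decision": set the producer count by fiat, ignoring the hidden cause.
forced = [rng.gauss(0, 1) for _ in range(n)]
intervened = pearson(forced, success)

print(f"observed correlation:   {observed:.2f}")
print(f"after forcing producers: {intervened:.2f}")
```

In the observed data the two variables track each other closely, but once you set one of them by hand the correlation collapses toward zero, because the thing actually driving success was never touched.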
i\hbar\dot{\psi}=\hat{H}\psi
Uh, they made a really common mistake with neural nets. Success rate on training data != actual success rate.
81% success when you run it back on the data you built the model from is meaningless. In fact, any number is meaningless when you measure it on your training data. I could get 100% success just by spitting back whether the name of a movie matches a name in the training set.
Here's the relevant quote:
"[t]esting the system with known data from previous years, we have established an approximately 81% typical accuracy rate on a year-by-year basis."
The test is if they can successfully predict future winners, which they haven't done. All they've done is make predictions. Puccini will do badly, etc.
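To make the point concrete, here is a tiny sketch, assuming a "model" that does nothing but memorize past outcomes. The film titles and results are hypothetical placeholders, not real festival data.

```python
def memorizer(train_set):
    """A 'model' that only memorizes title -> outcome pairs.
    Titles it has never seen fall back to guessing 'lose'."""
    table = dict(train_set)
    return lambda title: table.get(title, "lose")

def accuracy(model, data):
    """Fraction of (title, outcome) pairs the model gets right."""
    return sum(model(title) == outcome for title, outcome in data) / len(data)

# Hypothetical past festivals (used to build the model).
past = [("Film A", "win"), ("Film B", "lose"),
        ("Film C", "win"), ("Film D", "lose")]
# Hypothetical future festival (never seen by the model).
future = [("Film E", "win"), ("Film F", "win"),
          ("Film G", "lose"), ("Film H", "win")]

model = memorizer(past)
train_acc = accuracy(model, past)
heldout_acc = accuracy(model, future)
print(train_acc)    # 1.0: perfect on the data it memorized
print(heldout_acc)  # 0.25: only the one 'lose' guess lands
```

The first number looks great and tells you nothing; only the second one, measured on films the model never saw, says anything about predictive power.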
Where Sundance is concerned, you run the filter one way to determine if it wins, and then reverse the good/bad word lists to see if it will have commercial success.
It's good to use your head, but not as a battering ram.
Please check it out, mods. MAHALO
Thinkingman.com New Media
If it were my project, I'd probably feed the script through instead of the reviews.
Assume I was drunk when I posted this.