Bayesian Filters Predict Sundance

Posted by CmdrTaco on Tuesday January 24, 2006 @03:00AM from the because-you-can dept.

JohnGrahamCumming writes "The LA Times reports on a company's use of Bayesian filtering to predict the winners at the Sundance Film Festival. They use a modified POPFile email filter and claim an 81% success rate."

Re:Fuck films... by DeveloperAdvantage · 2006-01-24 03:28 · Score: 5, Informative

There are many examples of using statistics and artificial intelligence in finance (go google), including some applications to predict stock prices. Even a decade ago, books like "Neural Networks in Finance and Investing" and "Artificial Intelligence in the Capital Markets" were already published, along with hordes of books on statistics in finance (think about what Quants do).

Of course, I don't think we can yet predict stock prices with the same 81% accuracy as in this article. And, if anyone could, they would be wise to keep it to themselves.

--
FREE - Java, J2EE and Ajax Audiobooks for Software Developers - www.DeveloperAdvantage.com
Re:Bayesian filter to predict Slashdot's new stori by Kagura · 2006-01-24 03:42 · Score: 2, Informative

Where do you see the word "related" or any of its equivalents? As far as I can tell, every story's position is based on the time it is posted to the front page.
Re:Statistical methods? by JohnGrahamCumming · 2006-01-24 03:51 · Score: 2, Informative

Their web site states that the 81% number was "year on year" which I interpret to mean that they took the data for years n - 1 to predict year n.

John.
Re:Shocking news! by sunya · 2006-01-24 03:55 · Score: 3, Informative

nowhere does it rain all day more than 15% of the days.

Time to brush up on geography. It rains pretty much all the time in Cherrapunji.

--
MLT - simple and robust open source multimedia framework for Linux
Re:Unimpressed by Vann_v2 · 2006-01-24 05:56 · Score: 2, Informative

The problem is that saying it is "81% successful" is meaningless on its own. Typically one would use a two-fold measure of success for this sort of application: precision and recall. In the case of spam, the precision of your algorithm would be the number of correctly marked emails over the total number of emails marked, and the recall would be the number of correctly marked emails over the number of emails that are actually spam.

In terms of search this is perhaps more clear, so consider Google. You issue Google a search query and it returns a bunch of results. Precision measures how many of the results returned are actually relevant, and recall measures how many of the relevant results were actually returned. One could get 100% precision by returning just one result which could be verified as relevant (or, in the above case, verified as spam), and one could get 100% recall by simply returning everything. Oftentimes one takes the harmonic mean of the two, called the F-score in this case, as an overall measure of the success of the algorithm. In other instances one might want to favor precision over recall or vice versa.
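To make the definitions above concrete, here is a minimal Python sketch. The function name `precision_recall_f1` and the toy mailbox are made up for illustration and have nothing to do with POPFile or the company's actual data. It shows the two degenerate cases just described: marking a single verified item gives perfect precision, marking everything gives perfect recall, and the F-score penalizes both.

```python
def precision_recall_f1(marked, relevant):
    """Compute precision, recall and F-score for two sets of item ids.

    marked   -- items the filter flagged (hypothetical input)
    relevant -- items that really belong to the class, e.g. actual spam
    """
    correct = len(marked & relevant)  # correctly marked items
    precision = correct / len(marked) if marked else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy mailbox: ten messages, four of which are really spam (made-up data).
all_mail = set(range(10))
spam = {0, 1, 2, 3}

cautious = {0}          # marks one message that is known to be spam
greedy = set(all_mail)  # marks every message as spam

print(precision_recall_f1(cautious, spam))  # precision 1.0, recall 0.25
print(precision_recall_f1(greedy, spam))    # precision 0.4, recall 1.0
```

Neither filter is any good, yet each scores 100% on one of the two measures, which is why a single "81%" figure says little without knowing which measure it is.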

I think they probably mean "81% precision," but a low recall means that you'll have many spam emails which are not marked. Of course, if they mean the opposite, then low precision could mean many marked emails which are not spam!