A Crowdsourcing Project To Make Predictions More Precise

Posted by timothy on Wednesday April 14, 2010 @08:00AM from the what-think-ye? dept.

databuff writes "Predictions are critical to modern life. Police predict where and when crimes are most likely to take place, banks predict which loan applicants are most likely to default, and hotels forecast seasonal demand to set room rates. A new project called Kaggle facilitates better predictions by providing a platform for forecasting competitions. The platform allows organizations to post their data and have it scrutinized by the world's best statisticians. It will offer a robust rating system, so it's easy to identify those with a proven track record. Organizations can choose either to follow the experts, or to follow the consensus of the crowd — which, according to New Yorker columnist James Surowiecki, is likely to be more accurate than the vast majority of individual predictions. The power of a pool of predictions was demonstrated by the Netflix Prize, a $1m data-prediction competition, which was won by a team of teams that combined 700 models. Kaggle's first competition is underway, and it is accessing the 'wisdom of crowds' to predict the winner of this May's Eurovision Song Contest." Understandably, participation requires registration.

First Post! by Anonymous Coward · 2010-04-14 08:02 · Score: 1, Funny

I predict a first post!!
did we forget something? by fusiongyro · 2010-04-14 08:02 · Score: 3, Funny

"Past prediction is not an indicator of future performance."
While we're at it, why don't we let everyone pool together their lottery number predictions?
1. Re:did we forget something? by blair1q · 2010-04-14 08:49 · Score: 1
  
  Logical. If we get enough, we're more likely to get it right.
2. Re:did we forget something? by TubeSteak · 2010-04-14 09:27 · Score: 1
  
  "Past prediction is not an indicator of future performance."
  The general idea is that 'the crowd' has, collectively, access to more (inside) information than any one expert.
  This lets 'the crowd' do a better job of predicting future performance despite your inability to verify individual trustworthiness.
  As for the lottery, if the real world was that random, things like stock markets couldn't function.
  
  --
  [Fuck Beta]
  o0t!
3. Re:did we forget something? by zenpiglet · 2010-04-14 09:45 · Score: 1
  
  Already been done: How to win the Lottery
4. Re:did we forget something? by Hognoxious · 2010-04-14 10:21 · Score: 1
  
  Don't bet on number 24.018734 - it never comes up.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
5. Re:did we forget something? by fusiongyro · 2010-04-14 10:54 · Score: 1
  
  In order for that to work, the crowd has to actually have access to better information. The problem is that we are extremely prone to seeing patterns where they don't exist. No amount of software collation can turn pattern-y non-information into information.
6. Re:did we forget something? by databuff · 2010-04-14 12:24 · Score: 1
  
  I totally agree, past performance does not guarantee future performance. However, the more forecasts you get statisticians to make, the less likely it is that their prediction-history reflects chance rather than skill.
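databuff's point that a longer record separates skill from luck can be made concrete with a binomial tail. This is a minimal sketch that assumes each forecast is an independent yes/no call that a pure guesser gets right half the time; the function name is hypothetical. It computes how likely a guesser is to match a given hit rate by chance.

```python
from math import comb


def chance_of_lucky_record(correct: int, total: int, p_guess: float = 0.5) -> float:
    """Probability that a pure guesser gets at least `correct` of `total` calls right."""
    return sum(
        comb(total, k) * p_guess**k * (1 - p_guess) ** (total - k)
        for k in range(correct, total + 1)
    )


# The same 70% hit rate becomes much harder to explain by luck as the record grows.
for total in (10, 50, 200):
    correct = 7 * total // 10
    print(total, correct, chance_of_lucky_record(correct, total))
```

With 10 forecasts, a guesser reaches 7 correct about 17% of the time (176/1024); with 200 forecasts, reaching 140 correct by luck is vanishingly rare.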
7. Re:did we forget something? by tomhudson · 2010-04-14 12:34 · Score: 1
  
  Let the "You just RickRolled me you insensitive b*tch" contest begin!
My crime predictions by elrous0 · 2010-04-14 08:05 · Score: 4, Interesting

Behold my amazing precognitive abilities, as I look into the future of crime and predict:
Most crime will take place in that part of town with the highest concentration of check-cashing and liquor stores, between 5 pm and 3 am. Most of the alleged defendants will not be college educated and will have prior criminal records. Very few actual crime arrests will involve white collar fraud or the elaborate, diabolically-planned crimes that make up the bulk of criminal activity shown in popular TV shows, comic books, and movies. The vast majority of accused criminals will be, in fact, guilty of the crime they are accused of. Very few criminals will be represented by a crusading public defender with the resources to conduct a thorough analysis of their case and order elaborate DNA tests to prove their innocence in a last-minute dramatic courtroom reveal.

--
SJW: Someone who has run out of real oppression, and has to fake it.
1. Re:My crime predictions by Vintermann · 2010-04-14 08:15 · Score: 3, Insightful
  
  Very few actual crime arrests will involve [...] diabolically-planned crimes
  Yes. Eurovision Song Contest entries occasionally qualify as diabolically-planned crimes in my opinion, but alas, they tend to get away with it.
  
  --
  xkcd is not in the sudoers file. This incident will be reported.
2. Re:My crime predictions by dgatwood · 2010-04-14 08:21 · Score: 1
  
  I'll go one step further and say that statistically speaking, the majority of the criminals will be poor (or in the case of drug dealers, moderately wealthy, but from poor families), and most will be members of a repressed minority in the country in question.
  
  --
  Check out my sci-fi/humor trilogy at PatriotsBooks.
I can see a movie out of this project by parallel_prankster · 2010-04-14 08:11 · Score: 1

Where the bad guys get to the guy who has the best predicted performance, kidnap his loved one and make him commit a bank robbery or something.
1. Re:I can see a movie out of this project by SnarfQuest · 2010-04-14 08:20 · Score: 2, Funny
  
  How about this. Get a group of "police" that predict who will commit a crime, and arrest them beforehand. We could give it a silly name, like "minority report".
  
  --
  Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
Yeah, that's the ticket by PPH · 2010-04-14 08:13 · Score: 1

Police predict where and when crimes are most likely to take place,
There's going to be, ... er, a crime. A big one. Yeah, that's it. Clear over on the other side of town. Send all the cops. Right away.

--
Have gnu, will travel.
Crowdsourcing predictions by pushing-robot · 2010-04-14 08:15 · Score: 4, Funny

Crowdsourcing Project To Make Predictions More Precise
I think they used to call them "polls".

--
How can I believe you when you tell me what I don't want to hear?
1. Re:Crowdsourcing predictions by CannonballHead · 2010-04-14 08:42 · Score: 3, Funny
  
  But this is a poll in the cloud. It's much more important.
2. Re:Crowdsourcing predictions by blair1q · 2010-04-14 08:59 · Score: 2, Funny
  
  It puts the zeitgeist in the machine.
3. Re:Crowdsourcing predictions by databuff · 2010-04-14 12:32 · Score: 1
  
  Funny and insightful - nice comment! The post was terse, so I didn't explain that competitions on Kaggle aren't polls. Competitions are framed in a way that requires serious data analysis. For example the Eurovision Forecasting Comp requires contestants to forecast the voting matrix (who votes for who) rather than simply who will win.
4. Re:Crowdsourcing predictions by CrashandDie · 2010-04-14 15:09 · Score: 1
  
  A few hundred years ago, they used "crowsourcing" to know what was going to happen.
  
  Looks to me like someone just misread.
Reminds me of Wikiality by Itninja · 2010-04-14 08:17 · Score: 1

Because if most people think something is true (or in this case, think something is going to happen a certain way), then it simply must be so.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Mod Statistics by bragr · 2010-04-14 08:25 · Score: 1

If you take enough samples, with approximately the same error rate, you will get an accurate result if you average them together.
Therefore I conclude that any answer can be calculated by running: answer = (answer+rand())/2; enough times
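Taken literally, bragr's update rule is an exponential moving average of random numbers: every step halves the weight of everything before it, so the starting value (even the true answer) is forgotten and the result just hovers around the mean of rand(). This sketch assumes rand() draws uniformly from [0, 1), as Python's random.random() does; the function name is hypothetical.

```python
import random


def joke_estimator(true_answer: float, iterations: int, seed: int = 0) -> float:
    """Run the update answer = (answer + rand()) / 2, starting from the true answer."""
    rng = random.Random(seed)
    answer = true_answer
    for _ in range(iterations):
        # The previous value keeps only half its weight at each step,
        # so after n steps the starting point contributes true_answer / 2**n.
        answer = (answer + rng.random()) / 2
    return answer


print(joke_estimator(true_answer=42.0, iterations=1000))
```

Whatever the starting value, the result lands in [0, 1): averaging many samples only helps when the samples are actually centred on the answer.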
1. Re:Mod Statistics by Hognoxious · 2010-04-14 10:25 · Score: 1
  
  Really? What if you go to Arkansas and ask how old the Earth is?
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
2. Re:Mod Statistics by Hognoxious · 2010-04-14 10:36 · Score: 1
  
  Accuracy being closeness to the bullseye and precision being the grouping of the shots.
  A good way of expressing the difference - to most people they're synonyms.
  I put it like this: two people measure the length of a stick. One says it's ten inches, the other says it's ten point one inches. Which is more accurate? Most people will say the latter. But if the real length (we'll argue about what that means later) is exactly ten (or nine point nine) the first is more accurate. The second is merely more precise.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
3. Re:Mod Statistics by palegray.net · 2010-04-14 16:53 · Score: 1
  
  The Earth is 4,000 years old. I'm three years old.
  
  Hint: I'm also 29 years old.
  
  --
  512 MB RAM, 20 GB disk, 200 GB transfer, five datacenters. $19.95/month.
Weather predictions by SnarfQuest · 2010-04-14 08:25 · Score: 1

Can they accurately predict the global temperature 1000 years in the future, but have to estimate past values, the the Global Warming people?

--
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
1. Re:Weather predictions by Vintermann · 2010-04-14 22:20 · Score: 1
  
  I can confidently predict that the average temperature for June 2010 in the northern hemisphere will be higher than for April 2010. I am not nearly as confident that June 15 will be warmer than today.
  Predicting averages is easier than predicting point values. The wider the area averaged over, the easier it becomes.
  "The the global warming people" are in the business of predicting averages.
  The intrade contracts on global temperature averages are yours for the taking, if you think you know more than the experts.
  
  --
  xkcd is not in the sudoers file. This incident will be reported.
Sounds like a giant paradox by northernfrights · 2010-04-14 08:32 · Score: 1

For example, what if we crowdsourced a prediction for which stocks will do well tomorrow?

There's a sort of unavoidable feedback loop when the entity responsible for prediction is also responsible for execution, even if that entity is "everyone".
Apply Selectively by delirium+of+disorder · 2010-04-14 08:32 · Score: 3, Interesting

The wisdom of crowds works when everyone is looking in the same area for the answer to a question with a somewhat fuzzy answer. The group average can often be better than any single expect that attempts to calculate it. However this is a poor approach when the crowd isn't even looking in the right place. Simple majority decision making would be disastrous for many of the big decisions organizations make. The pubic is massively ignorant on scientific issues and continues to be plagued by religious, corporate, and state imposed falsehoods. Freeing people from these shackles and providing full education for all could allow us to crowd source more important decisions and lead to a more efficient and just society.

--
------ Take away the right to say fuck and you take away the right to say fuck the government.
1. Re:Apply Selectively by delirium+of+disorder · 2010-04-14 08:39 · Score: 1
  
  *expert....I never claimed to be one regarding proofreading.
  
  --
  ------ Take away the right to say fuck and you take away the right to say fuck the government.
2. Re:Apply Selectively by blair1q · 2010-04-14 08:58 · Score: 1
  
  Crowds elected George W. Bush. Twice.
  Plural voting is not a reliable system of determining facts. It's better than asking one half-informed person, but not by much.
3. Re:Apply Selectively by Grundlefleck · 2010-04-14 09:03 · Score: 1
  
  Crowds elected George W. Bush. Twice.
  Did they though?
  
  Oooooh. Yeah, think about that one, my friend.
  
  --
  I accept I know nothing. Insulting my ignorance is wasted on me.
4. Re:Apply Selectively by MikeFM · 2010-04-14 09:51 · Score: 1
  
  I think combining one or more expert systems with the crowd can produce more interesting results. Single systems rarely accurately model a problem any better than a single person does but by polling many expert systems, as well as people, you can get a more accurate result.
  
  I can't really agree fully that more education makes people better decision makers. I think educated people tend to suffer from underestimating the value of their opinion and from thinking other people will be reasonable. The majority of the educated people I know really think that if you're nice to people they'll be nice back - no wonder geeks get beat up so much in school.
  
  --
  At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
5. Re:Apply Selectively by mrsurb · 2010-04-14 16:37 · Score: 1
  
  you corrected "expert" but decided that "pubic" should remain? ;)
6. Re:Apply Selectively by Vintermann · 2010-04-14 22:24 · Score: 1
  
  Yeah, Bush was elected by a pretty big crowd. But bad as that was, letting a smaller crowd elect someone could easily have led to much worse results.
  
  --
  xkcd is not in the sudoers file. This incident will be reported.
I read this in Sci-Fi many years ago by egburr · 2010-04-14 08:42 · Score: 1

I think it was a short story in Analog or Asimov science fiction magazines. Someone got tired of the weather forecast being right only about half of the time and created a nation-wide betting pool for people to bet on the weather for the next few days for their area. The theory was that most people would bet on what they thought would actually occur instead of trying for long odds. In the story the forecasters eventually started subscribing to the pool because its predictions were accurate more often than theirs.
I always wanted to try this to see if it really would work. I guess someone has finally done so, just not yet with weather.
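One way to read such a betting pool, assuming it pays out pari-mutuel style (the story as described doesn't say), is that each outcome's share of the total stakes is the crowd's implied probability. This is a minimal sketch with hypothetical outcomes, stakes, and function names.

```python
from __future__ import annotations


def implied_probabilities(stakes: dict[str, float]) -> dict[str, float]:
    """Treat each outcome's share of the total pool as the crowd's probability for it."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("pool is empty")
    return {outcome: amount / total for outcome, amount in stakes.items()}


def payout_per_unit(stakes: dict[str, float], winner: str, house_cut: float = 0.0) -> float:
    """Pari-mutuel payout for each unit staked on the winning outcome."""
    pool = sum(stakes.values()) * (1 - house_cut)
    return pool / stakes[winner]


# Hypothetical bets on tomorrow's weather in one town.
bets = {"sun": 600.0, "rain": 300.0, "snow": 100.0}
print(implied_probabilities(bets))    # {'sun': 0.6, 'rain': 0.3, 'snow': 0.1}
print(payout_per_unit(bets, "rain"))  # about 3.33 per unit staked
```

Because long shots pay more per unit, the pool's implied probabilities only track real expectations if, as the story assumes, most people bet on what they think will actually happen.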

--

Edward Burr
Having a smoking section in a restaurant is like having a peeing section in a swimming pool.
1. Re:I read this in Sci-Fi many years ago by blair1q · 2010-04-14 08:57 · Score: 1
  
  The problem with the weather isn't that it's unpredictable, it's that the parameters that feed the prediction can change significantly in the time it takes you to propagate the prediction to the end users, and many features of the weather are local enough that a central weather forecast is incorrect for a significant portion of the user base.
  So no, the general public will not be a better predictor of the weather than the NWS could. And the system using wagers will be even worse, since many of those wagers will be based on old data, or data that is measured at a great distance from the point at which the determination of the actual weather is made.
2. Re:I read this in Sci-Fi many years ago by MikeFM · 2010-04-14 09:55 · Score: 1
  
  You could improve the system by letting people bet on their locality rather than a large area. Even with crude data there is a lot of correlation with previous patterns people have watched over time and learned to recognize without being able to clearly define. Would be interesting to tweak the system to recognize people that are more accurate and weight their predictions higher.
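Weighting more accurate predictors higher could look like the sketch below. It assumes one common choice, weighting each forecaster by the inverse of their mean squared past error; the names, history, and forecasts are all hypothetical.

```python
from __future__ import annotations


def inverse_mse_weights(past_errors: dict[str, list[float]]) -> dict[str, float]:
    """Weight each forecaster by 1 / (mean squared past error), normalised to sum to 1."""
    raw = {}
    for name, errors in past_errors.items():
        mse = sum(e * e for e in errors) / len(errors)
        raw[name] = 1.0 / max(mse, 1e-9)  # guard against division by zero for a perfect record
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def weighted_forecast(forecasts: dict[str, float], weights: dict[str, float]) -> float:
    """Combine today's forecasts using the accuracy-based weights."""
    return sum(weights[name] * value for name, value in forecasts.items())


# Hypothetical history of errors (forecast minus actual, in degrees) for three local bettors.
history = {"alice": [1.0, -1.0, 0.5], "bob": [3.0, -2.0, 4.0], "carol": [0.5, 0.5, -0.5]}
weights = inverse_mse_weights(history)
tomorrow = {"alice": 18.0, "bob": 25.0, "carol": 17.0}
print(weights)
print(weighted_forecast(tomorrow, weights))
```

Here carol's small past errors give her most of the weight, so bob's outlying forecast barely moves the combined number.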
  
  --
  At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
Mod Statistics by Colin+Smith · 2010-04-14 08:55 · Score: 1

Averaging only reduces random error, giving you a more precise result. It doesn't help with systematic errors, and therefore doesn't necessarily give a more accurate result.
Accuracy being closeness to the bullseye and precision being the grouping of the shots.
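The difference shows up in a quick simulation of the stick-measuring example from elsewhere in the thread. This sketch assumes a hypothetical instrument that reads 0.3 inches long (systematic error) with 0.5 inches of Gaussian noise (random error); the helper name is made up for illustration.

```python
import random
import statistics


def measure(true_length: float, bias: float, noise_sd: float, n: int, rng: random.Random) -> list[float]:
    """Simulate n readings from an instrument with a fixed bias plus random noise."""
    return [true_length + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


rng = random.Random(1)
TRUE_LENGTH = 10.0  # inches
for n in (10, 1000, 100000):
    readings = measure(TRUE_LENGTH, bias=0.3, noise_sd=0.5, n=n, rng=rng)
    mean = statistics.fmean(readings)
    # The standard error shrinks like noise_sd / sqrt(n): precision keeps improving...
    stderr = statistics.stdev(readings) / n**0.5
    # ...but the mean settles near 10.3, not 10.0: the bias never averages out.
    print(f"n={n:>6}  mean={mean:.3f}  stderr={stderr:.4f}")
```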

--
Deleted
The police already know who and where by Colin+Smith · 2010-04-14 09:10 · Score: 2, Informative

They're just legally prevented from interceding before the when. In fact, in my old home town, the police knew a certain criminal had been murdered because of the reduction in the crime rate at certain times and areas.
Only a small proportion of real crimes and criminals are not predictable.

--
Deleted
Isaac Asimov and Hari Seldon's Psychohistory? by saintory · 2010-04-14 09:10 · Score: 1

Do polls work so well because the people voting in the earlier polls influence the later polls?
If the predictions were shared in real-time with the people they were to predict upon, would they still have the same accuracy?
It seems to me that predicting is only useful when its use is unknown to those it's used on.
1. Re:Isaac Asimov and Hari Seldon's Psychohistory? by Vintermann · 2010-04-14 22:30 · Score: 1
  
  Do polls work so well because the people voting in the earlier polls influence the later polls?
  If the predictions were shared in real-time with the people they were to predict upon, would they still have the same accuracy?
  It seems to me that predicting is only useful when its use is unknown to those it's used on.
  I think the answers to those are:
  Not only, but it does have an effect.
  No, probably not.
  You're right that things trying to predict their own results face fundamentally insurmountable obstacles. It has discouraged computer scientists, and I think it will eventually discourage economists and brain researchers. But polls can still be useful (as can computers, economists, and brains)
  
  --
  xkcd is not in the sudoers file. This incident will be reported.
Crowd Sourced Predictions... by sycodon · 2010-04-14 09:30 · Score: 1

...because none of us is as dumb as all of us.

--
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
Quality not quantity by MessyBlob · 2010-04-14 10:06 · Score: 1

This model seems to follow 'genetic programming' principles, but is flawed in many ways: (a) It assumes that most people know everything relevant to the problem under consideration - they often don't.
(b) What the model is looking for is an expert among the crowd. On average, you can find an expert among 1024 people, to predict 10 coin tosses - this is with random data having no relation to specialized wisdom.
(c) Eurovision (mentioned above) is in the rare category of scenarios that can make use of 'crowdsourcing prediction', but only because the simulation correlates to the probable reality: it's effectively a poll, where the opinions of lots of people are used to model the opinions of lots of people.
(d) Can you really assume that if someone gets it right 10 times, the same person will get it right a further 10 times? It needs to be the same specialism.
(e) You'd need to iron out the randomness by running lots of trials that will be of no use to anyone. Can this operate commercially?
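Point (b) is simple arithmetic: 1024 × (1/2)^10 = 1, so on average one pure guesser in 1024 calls all ten tosses. The sketch below checks that by simulation; the function names and trial count are hypothetical choices.

```python
import random


def expected_perfect(people: int, tosses: int) -> float:
    """Expected number of pure guessers who call every toss correctly."""
    return people * 0.5**tosses


def simulate_perfect(people: int, tosses: int, rng: random.Random) -> int:
    """Count guessers whose random calls exactly match one random sequence of tosses."""
    outcomes = [rng.random() < 0.5 for _ in range(tosses)]
    count = 0
    for _ in range(people):
        guesses = [rng.random() < 0.5 for _ in range(tosses)]
        if guesses == outcomes:
            count += 1
    return count


print(expected_perfect(1024, 10))  # 1.0
rng = random.Random(2010)
trials = [simulate_perfect(1024, 10, rng) for _ in range(200)]
print(sum(trials) / len(trials))  # should land close to 1
```

It also bears on (d): the lucky "expert" has the same 1/1024 chance as everyone else of getting the next ten right.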
1. Re:Quality not quantity by databuff · 2010-04-14 12:49 · Score: 1
  
  Great comments, thanks! To address your most incisive comments (as I see them): c) Competitions on Kaggle aren't polls. Competitions are framed in a way that requires serious data analysis. For example the Eurovision Forecasting Comp requires contestants to forecast the voting matrix (who votes for who) rather than simply who will win. b,d,e) Getting people to do lots of predictions should separate the talented from the lucky. Having forecasters predict in the same place over and over is a good way to get a long enough history to discover the truly talented.
Re:This story is NOT News by khallow · 2010-04-14 10:30 · Score: 1

it's just a rehash of PREDICTION MARKETS which is OLD news.
I absolutely agree. And prediction markets do it better than this approach. For Kaggle, I'd issue a prediction by an initial deadline, time would pass, and my prediction would then be judged and prizes awarded. What happens if I change my mind after the start? Tough luck. And the reward is significantly devalued by being put off.

Further, there are various ways to game this system that are much easier and lower cost than corresponding gaming of prediction markets. For example, I could create several accounts and issue a scattering of predictions. The cost of creating a prediction is extremely low and (at least in this Eurovision example), the payout is $1000. Suppose I create 100 different predictions. Victory through greater coverage of the outcomes! Any similar attempts to game a betting market (say transferring money from one account to another to make the latter look better) would have some sort of cost and allow any third party a chance to interfere for a profit.

In comparison, markets work in real time, allow you to change your mind as the situation evolves, and there's no need to figure out who deserves a prize after the fact: good predictors will have the money (even if it's play money, that's a means to figure out who deserves the prize).
Modern life? by PsiCTO · 2010-04-14 11:56 · Score: 1

I don't know why I find these kinds of opening phrases amusing/annoying, but saying "Predictions are critical to modern life" seems to imply that somehow they are more important than ever. Aren't most major religions based on "predictions" of some kind and didn't they begin a wee bit before "modern life"?
Crowd-sourcing predictions will undermine all sorts of religions, and we all know what happens when you threaten the monopoly on truth held by religion...
1. Re:Modern life? by databuff · 2010-04-14 12:19 · Score: 1
  
  I do believe we rely on predictions more today than at any time in history because we can make them more reliably (we have so much historical data to base them on).
Does anybody have prediction-competition ideas? by databuff · 2010-04-14 12:04 · Score: 1

Thanks everyone for your comments! Sounds like many of you are skeptical that 'wisdom of crowds' can work in this setting. It'll be an interesting experiment, but I'm encouraged by the Netflix Prize case study. Out of interest, does anybody have any interesting ideas for prediction competitions? I'd love to hear from you either in the comments area or at statsbuff@gmail.com.
The DELPHI method, circa 1944 by JohnQPublic · 2010-04-14 12:29 · Score: 1

This is REALLY old news. 66 years ago, it was known as the DELPHI method, and it's been studied to death in the interim.
1. Re:The DELPHI method, circa 1944 by databuff · 2010-04-14 12:43 · Score: 1
  
  Thanks for the post. I hadn't heard of the DELPHI method - so now I'm a little bit wiser. According to the Wikipedia article, the DELPHI method tries to get a panel of experts to agree on a single forecast. Kaggle (assuming the wisdom of crowds is the method of choice), cherishes diversity. It takes everybody's forecasts and 'combines' them in the hope that individual forecast errors will cancel out.
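The "errors cancel out" idea can be checked with a simulation, under the key assumption that each forecaster's error is independent and unbiased (if everyone shares the same bias, averaging cannot remove it). The forecaster count, noise level, and function name are hypothetical.

```python
import random
import statistics


def crowd_vs_individuals(truth: float, n_forecasters: int, noise_sd: float, seed: int = 0):
    """Compare the error of the averaged forecast with each individual's error."""
    rng = random.Random(seed)
    forecasts = [truth + rng.gauss(0.0, noise_sd) for _ in range(n_forecasters)]
    crowd = statistics.fmean(forecasts)
    crowd_error = abs(crowd - truth)
    individual_errors = [abs(f - truth) for f in forecasts]
    # Share of forecasters whose own error is larger than the crowd's.
    beaten = sum(err > crowd_error for err in individual_errors) / n_forecasters
    return crowd_error, statistics.median(individual_errors), beaten


crowd_err, median_err, share_beaten = crowd_vs_individuals(truth=100.0, n_forecasters=700, noise_sd=10.0)
print(f"crowd error {crowd_err:.2f}, median individual error {median_err:.2f}, crowd beats {share_beaten:.0%}")
```

With independent errors the crowd's error shrinks roughly like noise_sd / sqrt(n), which is why the average beats the vast majority of individual forecasts.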
2. Re:The DELPHI method, circa 1944 by Teunis · 2010-04-15 11:43 · Score: 1
  
  The novel "Shockwave Rider" (John Brunner, 1975) proposes a computer-based model very similar to this one doing "crowd sourced predictions, with prizes". He even gave proper attribution, calling it "Delphi".
  
  so yeah, nothing new here - not even method.
Re:This story is NOT News by databuff · 2010-04-14 12:37 · Score: 1

Kaggle, unlike prediction markets, is designed to deal with complex tasks where data modeling is required. For example, a prediction market can be used to get the crowd's view on who will win the Eurovision Song Contest. But Kaggle is asking contestants to forecast the voting matrix.
As Someone Who Knows Something by coaxial · 2010-04-14 13:19 · Score: 1

As someone who knows a little about the Netflix Prize, metalearning is not crowdsourcing. It's not a prediction market. It's none of these things. It's essentially taking a weighted average to match some prior data and then using that for new predictions. It's machine learning. It's not magic. If you wanted to draw an analogy in the real world, it would be like asking, who do you believe more when predicting changes to the climate? Some know-nothing wingnut who never went to college, but listens to conspiracy AM talk radio and creationist programming? Or a climatologist with a PhD and years of experience? Oh sure, the wingnut *might* be right, but probably not.
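The "weighted average to match some prior data" can be sketched for the simplest case: two models blended as w*a + (1 - w)*b, with w chosen by least squares on held-out data. This is a minimal illustration, not the blending the Netflix Prize teams actually used (theirs combined far more models); the ratings and names are hypothetical.

```python
def fit_blend_weight(model_a: list[float], model_b: list[float], actual: list[float]) -> float:
    """Least-squares w for the blend w*a + (1 - w)*b against known outcomes."""
    num = sum((y - b) * (a - b) for a, b, y in zip(model_a, model_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(model_a, model_b))
    if den == 0:
        return 0.5  # identical models: every weight gives the same blend
    return num / den


def blend(w: float, a: float, b: float) -> float:
    return w * a + (1 - w) * b


def rmse(pred: list[float], actual: list[float]) -> float:
    return (sum((p - y) ** 2 for p, y in zip(pred, actual)) / len(actual)) ** 0.5


# Hypothetical held-out star ratings and two models' predictions for them.
actual = [4.0, 3.0, 5.0, 2.0, 4.0]
model_a = [4.5, 3.5, 4.5, 2.5, 4.0]
model_b = [3.5, 2.5, 4.0, 2.0, 3.5]
w = fit_blend_weight(model_a, model_b, actual)
blended = [blend(w, a, b) for a, b in zip(model_a, model_b)]
print(round(w, 3), rmse(model_a, actual), rmse(model_b, actual), rmse(blended, actual))
```

Because w = 1 and w = 0 are among the candidates, the fitted blend can never do worse than either model on the data it was fitted to; in practice the weight is fitted on one held-out set and then applied to new predictions.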
The real question is whether prediction markets even work? The answer is, they don't. They're at best lagging indicators of existing knowledge, and at worst completely useless.
Re:This story is NOT News by khallow · 2010-04-15 01:27 · Score: 1

Kaggle, unlike prediction markets, is designed to deal with complex tasks where data modeling is required. For example, a prediction market can be used to get the crowd's view on who will win the Eurovision Song Contest. But Kaggle is asking contestants to forecast the voting matrix.
You can get similar coverage with a combinatorial market where the securities are themselves complex objects. Robin Hanson has a nice example (on paper) for how you could implement a prediction betting market. Among other things, Hanson's approach allows for conditional statements (if Greece gets the 2032 Summer Olympics, then Ethiopia gets the 2036 Summer Olympics). These things tend to have liquidity problems (the more esoteric the prediction, the less likely you are to find anyone to trade with) and Hanson's market does a fair job of addressing that problem. Having said that, I know of no working prediction markets with a level of complexity comparable to what you're doing with Kaggle. I imagine there are derivative providers who could sell derivatives of that level of complexity, but I don't know if there's a real market in that.
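One mechanism Hanson proposed for thin markets is the logarithmic market scoring rule (LMSR), an automated market maker that always quotes a price, so a trader never needs to find a counterparty. Whether this is the specific example khallow has in mind is an assumption. The sketch below covers a single two-outcome question with a hypothetical liquidity parameter b; it does not implement the combinatorial or conditional securities described above.

```python
import math


def lmsr_cost(quantities: list[float], b: float) -> float:
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b))."""
    m = max(q / b for q in quantities)  # subtract the max for numerical stability
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))


def lmsr_prices(quantities: list[float], b: float) -> list[float]:
    """Instantaneous prices; they sum to 1 and read as probabilities."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def cost_to_buy(quantities: list[float], outcome: int, shares: float, b: float) -> float:
    """What a trader pays the market maker for `shares` of one outcome."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


# Hypothetical market on one yes/no question, liquidity b = 100.
q = [0.0, 0.0]
print(lmsr_prices(q, 100.0))  # [0.5, 0.5]
paid = cost_to_buy(q, 0, 50.0, 100.0)
q[0] += 50.0
print(round(paid, 2), [round(p, 3) for p in lmsr_prices(q, 100.0)])
```

A larger b makes prices move less per trade; the market maker's worst-case loss is bounded by b times the natural log of the number of outcomes.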

Glancing through your posts, I get the impression that you are associated with Kaggle. If so, I'd like to forward a few complaints. First, I registered on your site, but had to restart simply because the registration timed out on me (ok, I was writing an essay in my profile and taking a bit of time to complete that). If the registration process doesn't depend on the experience/education/expertise data to complete (I assumed at the time that there was a registration filter to weed out the riffraff), then it's probably better to hold off on that till later. Maybe the user could get reminders, if they haven't filled out the information.

Second, I have a few complaints about the sample competition. For usability, the links are invisible in a number of places. I couldn't figure out at first how to access the information for the Eurovision Voting competition (the links in the region just below "Submission Instructions: Eurovision Voting" are black font just like the text below). Also the submission template is too restricted. I refer to the line "Submissions with scores other than 1, 2, 3, 4, 5, 6, 7, 8, 10 or 12 will not be accepted." A real statistical prediction will have mixed values, for example, all 5's should be an acceptable entry (as should any numerical entry). That would mean according to my model, I have no information to provide on that row of votes. Given the scoring algorithm, I think you gain by removing this restriction.
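khallow's point about mixed values can be illustrated for a single cell of the voting matrix. The thread doesn't say what Kaggle's scoring algorithm is, so this sketch assumes per-cell squared error (a hypothetical choice). Under squared error the best single number to submit is the expected value, which a template that only accepts committed scores may not allow.

```python
from __future__ import annotations


def expected_squared_error(guess: float, outcomes: dict[float, float]) -> float:
    """Expected (guess - actual)^2 when `actual` takes each value with the given probability."""
    return sum(p * (guess - actual) ** 2 for actual, p in outcomes.items())


def best_guess(outcomes: dict[float, float]) -> float:
    """Under squared error, the optimal single submission is the expected value."""
    return sum(p * actual for actual, p in outcomes.items())


# Hypothetical cell: a voting country is equally likely to give a recipient 12 points or none.
cell = {12.0: 0.5, 0.0: 0.5}
for guess in (12.0, 0.0, best_guess(cell)):
    print(guess, expected_squared_error(guess, cell))
# 12.0 72.0
# 0.0 72.0
# 6.0 36.0
```

Committing to either extreme doubles the expected error compared with submitting the hedged middle value.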