A Crowdsourcing Project To Make Predictions More Precise
databuff writes "Predictions are critical to modern life. Police predict where and when crimes are most likely to take place, banks predict which loan applicants are most likely to default, and hotels forecast seasonal demand to set room rates. A new project called Kaggle facilitates better predictions by providing a platform for forecasting competitions. The platform allows organizations to post their data and have it scrutinized by the world's best statisticians. It will offer a robust rating system, so it's easy to identify those with a proven track record. Organizations can choose either to follow the experts, or to follow the consensus of the crowd — which, according to New Yorker columnist James Surowiecki, is likely to be more accurate than the vast majority of individual predictions. The power of a pool of predictions was demonstrated by the Netflix Prize, a $1m data-prediction competition, which was won by a team of teams that combined 700 models. Kaggle's first competition is underway, and it is accessing the 'wisdom of crowds' to predict the winner of this May's Eurovision Song Contest." Understandably, participation requires registration.
I predict a first post!!
"Past prediction is not an indicator of future performance."
While we're at it, why don't we let everyone pool together their lottery number predictions?
Behold my amazing precognitive abilities, as I look into the future of crime and predict:
Most crime will take place in that part of town with the highest concentration of check-cashing and liquor stores, between 5 pm and 3 am. Most of the alleged defendants will not be college educated and will have prior criminal records. Very few actual crime arrests will involve white collar fraud or the elaborate, diabolically-planned crimes that make up the bulk of criminal activity shown in popular TV shows, comic books, and movies. The vast majority of accused criminals will be, in fact, guilty of the crime they are accused of. Very few criminals will be represented by a crusading public defender with the resources to conduct a thorough analysis of their case and order elaborate DNA tests to prove their innocence in a last-minute dramatic courtroom reveal.
SJW: Someone who has run out of real oppression, and has to fake it.
Where the bad guys get to the guy who has the best predicted performance, kidnap his loved one and make him commit a bank robbery or something.
Police predict where and when crimes are most likely to take place,
There's going to be, ... er, a crime. A big one. Yeah, that's it. Clear over on the other side of town. Send all the cops. Right away.
Have gnu, will travel.
Crowdsourcing Project To Make Predictions More Precise
I think they used to call them "polls".
How can I believe you when you tell me what I don't want to hear?
Because if most people think something is true (or in this case, think something is going to happen a certain way), then it simply must be so.
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
If you take enough samples, with approximately the same error rate, you will get an accurate result if you average them together.
Therefore I conclude that any answer can be calculated by running: answer = (answer+rand())/2; enough times
Can they accurately predict the global temperature 1000 years in the future, but have to estimate past values, like the Global Warming people?
Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
For example, what if we crowdsourced a prediction for which stocks will do well tomorrow?
There's a sort of unavoidable feedback loop when the entity responsible for prediction is also responsible for execution, even if that entity is "everyone".
The wisdom of crowds works when everyone is looking in the same area for the answer to a question with a somewhat fuzzy answer. The group average can often be better than any single expert that attempts to calculate it. However, this is a poor approach when the crowd isn't even looking in the right place. Simple majority decision making would be disastrous for many of the big decisions organizations make. The public is massively ignorant on scientific issues and continues to be plagued by religious, corporate, and state-imposed falsehoods. Freeing people from these shackles and providing full education for all could allow us to crowdsource more important decisions and lead to a more efficient and just society.
------ Take away the right to say fuck and you take away the right to say fuck the government.
I think it was a short story in Analog or Asimov science fiction magazines. Someone got tired of the weather forecast being right only about half of the time and created a nation-wide betting pool for people to bet on the weather for the next few days for their area. The theory was that most people would bet on what they thought would actually occur instead of trying for long odds. In the story the forecasters eventually started subscribing to the pool because its predictions were accurate more often than theirs.
I always wanted to try this to see if it really would work. I guess someone has finally done so, just not yet with weather.
Edward Burr
Having a smoking section in a restaurant is like having a peeing section in a swimming pool.
Averaging only reduces random error, giving you a more precise result. It doesn't help with systematic errors, so the result is not necessarily more accurate.
Accuracy being closeness to the bullseye and precision being the grouping of the shots.
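To make the accuracy/precision point concrete, here is a minimal sketch in Python. The "true value", the bias of 5 and the noise level are made-up numbers for illustration only; nothing here comes from Kaggle or the Netflix Prize.

```python
import random
import statistics

def simulate_crowd(true_value, bias, noise_sd, n, seed=0):
    """Return n guesses: the truth plus a shared bias plus independent Gaussian noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]

TRUE_VALUE = 100.0  # hypothetical quantity the crowd is guessing

# Crowd with random error only: the average converges on the truth.
unbiased = simulate_crowd(TRUE_VALUE, bias=0.0, noise_sd=10.0, n=10_000)

# Crowd that shares a systematic error: the average converges on truth + bias.
biased = simulate_crowd(TRUE_VALUE, bias=5.0, noise_sd=10.0, n=10_000)

unbiased_error = abs(statistics.fmean(unbiased) - TRUE_VALUE)
biased_error = abs(statistics.fmean(biased) - TRUE_VALUE)

print(f"error of averaged unbiased crowd: {unbiased_error:.2f}")
print(f"error of averaged biased crowd:   {biased_error:.2f}")
```

Both averages are tightly grouped (precise), but only the first one lands near the bullseye. No amount of extra guessers removes the shared bias.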
They're just legally prevented from interceding before the when. In fact, in my old home town, the police knew a certain criminal had been murdered because of the reduction in the crime rate at certain times and areas.
Only a small proportion of real crimes and criminals are not predictable.
Do polls work so well because the people voting in the earlier polls influence the later polls?
If the predictions were shared in real-time with the people they were to predict upon, would they still have the same accuracy?
It seems to me that predicting is only useful when its use is unknown to those it's used on.
...because none of us is as dumb as all of us.
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
This model seems to follow 'genetic programming' principles, but is flawed in many ways:
(a) It assumes that most people know everything relevant to the problem under consideration - they often don't.
(b) What the model is looking for is an expert among the crowd. On average, you can find an expert among 1024 people, to predict 10 coin tosses - this is with random data having no relation to specialized wisdom.
(c) Eurovision (mentioned above) is in the rare category of scenarios that can make use of 'crowdsourcing prediction', but only because the simulation correlates to the probable reality: it's effectively a poll, where the opinions of lots of people are used to model the opinions of lots of people.
(d) Can you really assume that if someone gets it right 10 times, the same person will get it right a further 10 times? It needs to be the same specialism.
(e) You'd need to iron out the randomness by running lots of trials that will be of no use to anyone. Can this operate commercially?
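Points (b) and (d) can be checked with a quick simulation. This is only a sketch under the assumption that every participant guesses fair coin tosses at random. The trial count and seed are arbitrary, and `run_trial` is a made-up helper name.

```python
import random

N_PEOPLE = 1024
N_TOSSES = 10

# Each random guesser gets all 10 tosses right with probability (1/2)^10 = 1/1024.
expected_perfect = N_PEOPLE * 0.5 ** N_TOSSES
p_at_least_one = 1 - (1 - 0.5 ** N_TOSSES) ** N_PEOPLE

def run_trial(rng):
    """One crowd of random guessers; return the follow-up scores of the 'experts'."""
    outcome = rng.getrandbits(N_TOSSES)           # first round of 10 tosses
    followup_outcome = rng.getrandbits(N_TOSSES)  # second round of 10 tosses
    followup_scores = []
    for _ in range(N_PEOPLE):
        if rng.getrandbits(N_TOSSES) == outcome:  # perfect first round
            second_guess = rng.getrandbits(N_TOSSES)
            wrong = bin(second_guess ^ followup_outcome).count("1")
            followup_scores.append(N_TOSSES - wrong)
    return followup_scores

rng = random.Random(42)
n_trials = 500
all_scores = []
for _ in range(n_trials):
    all_scores.extend(run_trial(rng))

experts_per_trial = len(all_scores) / n_trials
mean_followup = sum(all_scores) / len(all_scores)

print(f"expected perfect guessers per crowd: {expected_perfect}")
print(f"chance of at least one per crowd:    {p_at_least_one:.3f}")
print(f"simulated 'experts' per crowd:       {experts_per_trial:.2f}")
print(f"their mean score on the next 10:     {mean_followup:.2f} / 10")
```

With pure chance you get about one "expert" per crowd of 1024, and that person scores around 5 out of 10 on the next round, which is why a track record built on too few trials says little.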
it's just a rehash of PREDICTION MARKETS which is OLD news.
I absolutely agree. And prediction markets do it better than this approach. For Kaggle, I'd issue a prediction by an initial deadline, time would pass, and my prediction would then be judged and prizes awarded. What happens if I change my mind after the start? Tough luck. And the reward is significantly devalued by being put off.
Further, there are various ways to game this system that are much easier and lower cost than corresponding gaming of prediction markets. For example, I could create several accounts and issue a scattering of predictions. The cost of creating a prediction is extremely low and (at least in this Eurovision example), the payout is $1000. Suppose I create 100 different predictions. Victory through greater coverage of the outcomes! Any similar attempts to game a betting market (say transferring money from one account to another to make the latter look better) would have some sort of cost and allow any third party a chance to interfere for a profit.
In comparison, markets work in real time, allow you to change your mind as the situation evolves, and there's no need to figure out who deserves a prize after the fact: good predictors will have the money (even if it's play money, that's a means to figure out who deserves the prize).
I don't know whether I find these kinds of opening phrases amusing or annoying, but saying "Predictions are critical to modern life" seems to imply that somehow they are more important than ever. Aren't most major religions based on "predictions" of some kind and didn't they begin a wee bit before "modern life"?
Crowd-sourcing predictions will undermine all sorts of religions, and we all know what happens when you threaten the monopoly on truth held by religion...
Thanks everyone for your comments! Sounds like many of you are skeptical that 'wisdom of crowds' can work in this setting. It'll be an interesting experiment, but I'm encouraged by the Netflix Prize case study. Out of interest, does anybody have any interesting ideas for prediction competitions? I'd love to hear from you either in the comments area or at statsbuff@gmail.com.
This is REALLY old news. 66 years ago, it was known as the DELPHI method, and it's been studied to death in the interim.
Kaggle, unlike prediction markets, is designed to deal with complex tasks where data modeling is required. For example, a prediction market can be used to get the crowd's view on who will win the Eurovision Song Contest. But Kaggle is asking contestants to forecast the voting matrix.
As someone who knows a little about the Netflix Prize, I can say that metalearning is not crowdsourcing. It's not a prediction market. It's none of these things. It's essentially taking a weighted average to match some prior data and then using that for new predictions. It's machine learning. It's not magic. If you wanted to draw an analogy in the real world, it would be like asking, who do you believe more when predicting changes to the climate? Some know-nothing wingnut who never went to college, but listens to conspiracy AM talk radio and creationist programming? Or a climatologist with a PhD and years of experience? Oh sure, the wingnut *might* be right, but probably not.
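For anyone wondering what "taking a weighted average to match some prior data" looks like, here is a toy sketch in Python. It blends just two models with one least-squares weight. The ratings and both models' predictions are invented, and this is not the actual Netflix Prize blending code, which combined far more models.

```python
def fit_blend_weight(pred_a, pred_b, actual):
    """Least-squares weight w for the blend w*a + (1-w)*b on known outcomes."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return 0.5 if den == 0 else num / den  # identical models: any weight works

def blend(pred_a, pred_b, w):
    """Weighted average of two prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

def rmse(pred, actual):
    """Root mean squared error."""
    return (sum((p - y) ** 2 for p, y in zip(pred, actual)) / len(actual)) ** 0.5

# Hypothetical "prior data": known ratings plus two models' guesses for them.
actual  = [3.0, 4.0, 2.0, 5.0, 1.0]
model_a = [4.0, 5.0, 3.0, 6.0, 2.0]    # consistently 1 too high
model_b = [0.0, 1.0, -1.0, 2.0, -2.0]  # consistently 3 too low

w = fit_blend_weight(model_a, model_b, actual)
blended = blend(model_a, model_b, w)

print(f"weight on model A: {w:.2f}")
print(f"RMSE A: {rmse(model_a, actual):.2f}, "
      f"B: {rmse(model_b, actual):.2f}, "
      f"blend: {rmse(blended, actual):.2f}")
```

The weight is fitted on data where the answers are already known and then reused on new predictions. The better model gets more weight, which is the "who do you believe more" part.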
The real question is whether prediction markets even work. The answer is, they don't. They're at best lagging indicators of existing knowledge, and at worst completely useless.
Kaggle, unlike prediction markets, is designed to deal with complex tasks where data modeling is required. For example, a prediction market can be used to get the crowd's view on who will win the Eurovision Song Contest. But Kaggle is asking contestants to forecast the voting matrix.
You can get similar coverage with a combinatorial market where the securities are themselves complex objects. Robin Hanson has a nice example (on paper) for how you could implement a prediction betting market. Among other things, Hanson's approach allows for conditional statements (if Greece gets the 2032 Summer Olympics, then Ethiopia gets the 2036 Summer Olympics). These things tend to have liquidity problems (the more esoteric the prediction, the less likely you are to find anyone to trade with) and Hanson's market does a fair job of addressing that problem. Having said that, I know of no working prediction markets with the desired level of complexity comparable to what you're doing with Kaggle. I imagine there are derivative providers who could sell derivatives of that level of complexity, but I don't know if there's a real market in that.
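A rough sketch of how an automated market maker gets around the "nobody to trade with" problem, assuming the Hanson design meant here is his logarithmic market scoring rule (LMSR); the comment above doesn't name the mechanism, so treat that as my assumption. The liquidity parameter `b` and the two-outcome market are made-up values, and this covers only a plain market, not the combinatorial or conditional version.

```python
import math

def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)), computed stably."""
    m = max(q / b for q in quantities)
    return b * (m + math.log(sum(math.exp(q / b - m) for q in quantities)))

def lmsr_prices(quantities, b):
    """Current price of each outcome; the prices always sum to 1."""
    m = max(q / b for q in quantities)
    weights = [math.exp(q / b - m) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]

def buy(quantities, outcome, shares, b):
    """Return (cost charged by the market maker, new outstanding quantities)."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b), after

B = 100.0            # hypothetical liquidity parameter
start = [0.0, 0.0]   # two outcomes, no shares sold yet

prices_before = lmsr_prices(start, B)
cost, holdings = buy(start, outcome=0, shares=50.0, b=B)
prices_after = lmsr_prices(holdings, B)

print("prices before:", [round(p, 3) for p in prices_before])
print("cost of 50 shares of outcome 0:", round(cost, 2))
print("prices after: ", [round(p, 3) for p in prices_after])
```

The market maker always quotes a price, so even an esoteric bet can be placed without finding a counterparty. The trade-off is that whoever runs it accepts a bounded worst-case loss of b * ln(number of outcomes).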
Glancing through your posts, I get the impression that you are associated with Kaggle. If so, I'd like to forward a few complaints. First, I registered on your site, but had to restart simply because the registration timed out on me (ok, I was writing an essay in my profile and taking a bit of time to complete that). If the registration process doesn't depend on the experience/education/expertise data to complete (I assumed at the time that there was a registration filter to weed out the riffraff), then it's probably better to hold off on that till later. Maybe the user could get reminders, if they haven't filled out the information.
Second, I have a few complaints about the sample competition. For usability, the links are invisible in a number of places. I couldn't figure out at first how to access the information for the Eurovision Voting competition (the links in the region just below "Submission Instructions: Eurovision Voting" are black font just like the text below). Also the submission template is too restricted. I refer to the line "Submissions with scores other than 1, 2, 3, 4, 5, 6, 7, 8, 10 or 12 will not be accepted." A real statistical prediction will have mixed values, for example, all 5's should be an acceptable entry (as should any numerical entry). That would mean according to my model, I have no information to provide on that row of votes. Given the scoring algorithm, I think you gain by removing this restriction.