Google Launches a Data Prediction API
databuff writes "Google has released a data prediction API. The service helps users leverage historical data to make predictions that can guide real-time decisions. According to Google, the API can be used for prediction tasks ranging from product recommendations to churn analysis (predicting which customers are likely to switch to another provider). The API involves three simple steps: upload the data, train the model, then generate predictions. The API is currently available on an invitation-only basis." Google also recently announced several other API additions, including Buzz, Fonts, and Storage.
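For anyone wondering what the three steps in the summary look like in practice, here is a minimal local sketch in Python. It does not call Google's service or imitate its actual interface. The function names `upload`, `train`, and `predict`, the churn data, and the frequency-count "model" are all hypothetical, made up to illustrate the upload/train/predict flow.

```python
from collections import Counter, defaultdict

# Hypothetical historical data: (plan, support_calls) -> what the customer did.
HISTORY = [
    (("basic", "many"), "churn"),
    (("basic", "many"), "churn"),
    (("basic", "few"), "stay"),
    (("premium", "few"), "stay"),
    (("premium", "many"), "stay"),
    (("premium", "many"), "churn"),
    (("premium", "many"), "stay"),
]


def upload(rows):
    """Step 1: stand-in for uploading data; normalizes rows into a list."""
    return [(tuple(features), label) for features, label in rows]


def train(dataset):
    """Step 2: count how often each label followed each feature combination."""
    counts = defaultdict(Counter)
    overall = Counter()
    for features, label in dataset:
        counts[features][label] += 1
        overall[label] += 1
    # Fall back to the overall majority label for combinations never seen.
    return {"counts": dict(counts), "fallback": overall.most_common(1)[0][0]}


def predict(model, features):
    """Step 3: return the most frequent historical label for these features."""
    seen = model["counts"].get(tuple(features))
    if not seen:
        return model["fallback"]
    return seen.most_common(1)[0][0]


model = train(upload(HISTORY))
print(predict(model, ("basic", "many")))    # churn
print(predict(model, ("premium", "many")))  # stay
print(predict(model, ("basic", "unknown"))) # stay (unseen, overall majority)
```

Note that the last call gets an answer only because the sketch falls back to the overall majority. A model like this has nothing useful to say about cases that are absent from its history.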
What about feeding it historical events, training it on the outcomes of those events, and trying to get a glimpse of which way the future will evolve?
It's worth remembering the saying about data: "If you look hard enough, you can find anything you want to."
A friend of mine works as a quant at one of the big investment banks. He admitted that the models his team creates are useless at predicting the unexpected (as you'd probably expect). Adding in a degree of randomness rarely produces better models, as there are too many possible sources of such unpredictability and the reactions to them depend on many unquantifiable forces. This results in models that are OK at telling traders what they want to know - that they're doing the right thing by all doing the same thing. As soon as something undesirable or unexpected happens, then all hell breaks loose and the traders panic. Having mulled this over for a bit, I suggested his job was pointless, to which he agreed, but pointed out that the pay's great. So much wasted mathematical genius.
Often misused, definitely. But that does not invalidate the importance of any tool, including data mining.
One good example is Netflix's recommendation engine. I know it's far from perfect (nothing about prediction is perfect), but is it useful? Hell yeah. It's the best recommendation engine I have used, and I have benefited greatly from it.
The problem comes when it's applied to areas where the stakes are higher, like risk analysis at the investment banks.
And that brings me to an interesting (old) and related read: "Fooled by Randomness" by Nassim Taleb.