Google Launches a Data Prediction API
databuff writes "Google has released a data prediction API. The service helps users leverage historical data to make predictions that can guide real-time decisions. According to Google, the API can be used for prediction tasks ranging from product recommendations to churn analysis (predicting which customers are likely to switch to another provider). The API involves three simple steps: upload the data, train the model, then generate predictions. The API is currently available on an invitation-only basis." Google also recently announced several other API additions, including Buzz, Fonts, and Storage.
Does it use users' Wi-Fi sniffer captures to aid in this prediction?
... that Google will do their own analysis on your data. They're nothing if not thorough.
neural networks to me!
Data prediction launch YOU!
Yours In Astrakhan,
Kilgore T.
P.S.: Maybe Google can scrape invitation-only users' predictions for its bond trading floor.
1. Upload the data.
2. Train the model.
3. Generate predictions.
4. PROFIT!!!!
of thi
What about feeding it historical events, training it on the outcomes of those events, and trying to get a glimpse of which way the future will evolve?
plz predict for me if this turing machine haltz, kthksbye.
Given my family history... is there a girl for me?
Despite history (and having really good access to historical information), will people keep making stupid choices, voting for someone who screws them in the end, and buying products that they think will make them happy but that end up at the next garage sale for 90% off?
Yes
Holy shit, you... you... you figured out step 3!!
Pretty good is actually pretty bad.
I can't wait to take my Droid to Vegas once this launches!
Taking into account they released it, and they probably used it to predict its own success; this will either:
- work, and be a success
- not work, and fail.
The future is here!
When I used to work in the financial services industry we used to call this "data mining". The result is usually at best worthless and at worst dangerous as it is so often misused.
It's worth remembering the saying with data: "if you look hard enough, you can find anything you want to".
What is listed as step 4. is actually step 5. There wasn't much of a wait involved at all, so we skipped it to keep things simple.
Think of the step you're thinking of as being more of an extension of step 3... "3.b) ..." if you will.
That's the power of Cloud Computing.
Finally had enough. Come see us over at https://soylentnews.org/
Google requires you to have a current Storage-For-Developers account, which is only available to US parties at the moment.
Could be bad news for the already ailing ACNielsen. Based on my experience there, I'd guess that many companies that use the services of ACNielsen would also be willing to plug their data into an API like this and not only compare the output, but compare it to the outcome. If the API does a satisfactory job, they'll drop Nielsen like a ton of hot bricks.
Nielsen has some slick and useful software, but making their own API like this is the kind of ingenious thing that they could really use about now.
I predict that within the next year someone's blog or the Wall Street Journal will feature a cage match between Google's Prediction API, a chimp with a dartboard, and a magic 8-ball.
a riot!
I know that word is "churn" but the first 3 times I read it as "chum". Anyway, is this similar logic to how Google is able to advertise based on what is discussed in your email?
What could possibly go wrong?
The service helps users leverage historical data to make predictions that can guide real-time decisions.
This sentence hurts my brain in how vague it is. You could say the same thing about Excel, Lotus 1-2-3, your kid's history homework, my filing cabinet, or the library. If it was removed from the summary, no meaning would be lost.
Please post your privacy concerns in the form of an outraged screed. :-)
The bitter lessons of a veteran coder: http://bitterprogrammer.blogspot.com
It's absolutely data mining, but it's far from worthless.
Every time you go to Amazon and it recommends something to you, guess what, that's data mining using basically the same techniques that this service will use. And as you might expect, that equates to big $$$ for them (or else they wouldn't be bothering).
Many many fields use the technology, particularly the medical fields for analyzing the relationships between a large number of input variables (which may or may not be correlated) and some desired output variable. Spam filters, Google Search itself... all data mining algorithms. Nah, no money to be made there...
Now, the reality isn't normally as simple as 'upload the data, train the model, and generate predictions'. It takes time to figure out what factors to include, ETL the training data from the actual source(s), plug in algorithm parameters, and carefully validate your output model. Most models I've worked on have taken several iterations to get right, as you learn more about the relationships in your input data while using the model.
And your second sentence is sadly true, if management wants a certain output, then the endeavor is pointless. But when used appropriately (and it's on the experts to explain the limitations of the tech to the users), this stuff is really powerful.
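For anyone wondering what "carefully validating your output model" looks like in the smallest possible form, here is a rough sketch: hold back part of the data, then check whether the model beats a dumb majority-class baseline on the held-back rows. The function names and the toy data are made up for illustration, and this has nothing to do with how Google's API validates anything.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def majority_label(rows):
    """Most frequent label in rows; used as a naive baseline."""
    counts = {}
    for _features, label in rows:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)

def accuracy(predict, rows):
    """Fraction of rows whose label the predict function gets right."""
    hits = sum(1 for features, label in rows if predict(features) == label)
    return hits / len(rows)

# Toy data (made up): one numeric feature, label depends on a threshold.
rows = [((x,), "high" if x >= 50 else "low") for x in range(100)]
train, test = train_test_split(rows)

# Baseline: always guess the most common training label.
baseline = majority_label(train)
baseline_acc = accuracy(lambda features: baseline, test)

# Stand-in "model": a hand-written threshold rule.
model_acc = accuracy(lambda features: "high" if features[0] >= 50 else "low", test)
print(baseline_acc, model_acc)
```

If the model can't clearly beat the baseline on data it never saw, it's time for another iteration on the inputs.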
But will a lot of businesses be willing to send their 10-year history of accepted/declined credit card transactions, with all the related demographic data, to the cloud? Or their medical scenarios with the medical details of each patient? I think not. The type of data most mining projects use is critically sensitive. So I predict this will be limited to experimental users 'playing around', nothing more.
predict(data)
{
    delete data
    prediction = random()
    return prediction
}
Google probably wants to use the data for their own analysis. So, I suggest all of Slashdot team together and forge a large volume of the most bullshit data that will convince Google that, without a doubt, they need to make every first search result named "Frosty P1ss!" linked to goatse in order to make their customers happy.
Now I see why the Amazon Cloud people have been so insistent that people in Hacker Dojo's machine learning class run problems on their "cloud".
This stuff is actually fairly routine by now. It's much the same technology that's behind spam filters.
The post made me think of Asimov's old Foundation series books.
The API predicts that there will be an empty niche/opportunity in a day, then everyone who uses it jumps there, so the prediction fails because the niche becomes overcrowded. It is very easy to turn predictions for everyone into predictions for no one if all try to take advantage of that knowledge.
http://www.cockeyed.com/citizen/gas_zero/gas_zero.php
mod parent parent up.
mod parent up.
Hivemind harvest in progress..
It's interesting to see this coming, as in Google becoming a digital Hari Seldon. But while it's good to have plenty of info on which to base decisions, it's becoming what in the Army is referred to as "paralysis by analysis". At some point, you need to trust your instincts, and do it. Poring over the amount of data Google can provide, filtering what is relevant (Google isn't perfect), and then deciding what to do would likely take longer than going with your gut, or the smaller amount of available data, and then adjusting from there.
Well? Can it predict the rest? :)
-- Sig down
The use of the word "predict" is for ease-of-understanding for the business market and those not familiar with machine learning. Many of the comments here are getting lost in that word. The algorithms behind the API are most likely the same basic ones that have been around for a long time: naive Bayes, SVM, k-NN, etc. The actual novelty of this service is that it puts these methods in easy reach for people who otherwise wouldn't know where to start looking, or wouldn't know how to use one of the many libraries already available, much less implement something themselves.
See also: http://mlcomp.org/ for a service that allows you to try out different classification algorithms on your own data sets.
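To give a sense of how basic these methods can be, here is a bare-bones k-nearest-neighbours classifier in plain Python. It is only an illustration of k-NN in general: nobody outside Google knows which algorithm the API actually runs, and the function name and toy data here are invented.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set (made up): two well-separated clusters.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
print(knn_predict(train, (0.2, 0.1)))
print(knn_predict(train, (5.5, 5.5)))
```

"Training" here is just storing the labelled examples; all the work happens at prediction time, which is part of why prediction speed matters for this family of methods.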
That's actually a pretty nice idea. The thing seems to have some caveats though: only categorical labels are allowed, training sets are limited to 100 MB, and no sparse features can be used. There's also no info on whether things like cross-validation are done or which algorithm will be chosen. I also wonder how fast the prediction phase will be. Still pretty neat.
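For reference, cross-validation itself is not much code. A rough sketch of the k-fold index split, assuming plain contiguous folds with no shuffling (the helper name is made up, and whether the API does anything like this is unknown):

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n).

    Fold sizes differ by at most one; every index is a test index exactly once.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

You would train on each train_idx set, score on the matching test_idx set, and average the scores to get a less lucky estimate than a single split gives.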
I asked the Google Prediction API what the next Google API would be, and it said "Google Prediction API".
Not that this wasn't entirely predictable.
Past performance does not guarantee future results.
I'd like to buy homeland for our 10 million people. http://twitter.com/mahadiga