Cutting Through Data Science Hype
An anonymous reader writes: Data science — or "big data" if you prefer — has evolved into a full-fledged buzzword, thanks to marketing departments around the world. John Foreman writes that part of the marketing blitz has been focused on how fast big data analysis can be. Most companies offering some kind of analytic service try to sell you on how it'll make it easy for you to quickly find and fix the problems with your business. But he points out that good, robust models need a stable set of inputs, and businesses often change far too quickly for any kind of stable prediction. He takes IBM's analytic services as an example, quoting Kevin Hillstrom: "If IBM Watson can find hidden correlations that help your business, then why can't IBM Watson stem a 3 year sales drop at IBM?" Foreman offers some simple advice: "Simple analyses don't require huge models that get blown away when the business changes. ... If your business is currently too chaotic to support a complex model, don't build one."
IBM, like SAP, Oracle and the rest, is a dinosaur unable to adapt its business to changing markets. Why would it be able to do that for your company?
Actually last year bonuses were forgone amid lower profits: BBC.
"Maybe this world is another planet's hell"
Aldous Huxley
"we don't need no stinkin' sales", we have Ginni.
"Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
However they authorized stock buybacks that probably more than made up for the lack of 'bonuses' through sell off of restricted stock units. They didn't have bonuses directly, but they authorized giving cash to stockholders (particularly themselves).
He is making the assumption that IBM is concerned with a sales drop. For the last decade and a half the only thing their awful management has cared about is executive compensation. Even after this year's awful earnings the genius Ginni said 'the results prove our strategy is working', and lo and behold they voted themselves bonuses today.
Agreed. His criticism makes the very mistake the scientific method is there to avoid: jumping to conclusions by way of logical fallacies.
There could be any number of causes for a 3 year sales drop, and many of them lie in the market IBM is operating in. Making snarky commentary about using Watson to automagically fix the sales drop is hyperbole, not an analysis of predictive analytics or how it works.
This article says nothing about the size and diversity of datasets, nothing about regression algorithms, nothing about randomized trials and nothing about dependent variables.
Synopsis: Waste of time... Walk away!
Statistical Process Control and the Western Electric rules are very applicable here. Without a stable baseline, it's (pretty well) impossible to utilize small data, much less big data (big bad data :).
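To make the SPC point concrete, here is a rough sketch of two of the Western Electric run rules, assuming that is the rule set meant above. The function name and the choice of only rules 1 and 4 are my own simplification, not a full control-chart implementation.

```python
from statistics import mean, stdev

def western_electric_flags(baseline, samples):
    """Return indices in `samples` that break rule 1 or rule 4.

    Rule 1: a single point more than 3 sigma from the baseline mean.
    Rule 4: eight points in a row on the same side of the baseline mean.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    flagged = set()
    run_side, run_len = 0, 0
    for i, x in enumerate(samples):
        if abs(x - center) > 3 * sigma:
            flagged.add(i)
        # side is +1 above the centre line, -1 below, 0 exactly on it
        side = (x > center) - (x < center)
        if side != 0 and side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, (1 if side != 0 else 0)
        if run_len >= 8:
            flagged.add(i)
    return sorted(flagged)
```

Note that everything hangs on `baseline`: if the business never sits still long enough to give you one, the centre line and sigma mean nothing and neither do the flags.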
Great minds think alike; fools seldom differ.
Last year they gave up their bonuses. This year they brought them back.
This pretty much sums up the entirety of Big Data.
Data analysis can highlight the correlations that would otherwise go unnoticed, and the "big" data sets involved help to ensure that the noticed correlations are statistically significant. With a large enough sample size, the effects of time can be eliminated from the statistics, supporting analysis of even highly-dynamic models. To a statistician, this is all trivial, given a large enough data set.
Once correlations are discovered, interpreting them in the business context is a different matter for which computers are not well-suited. As the phrase goes, correlation is not causation. A business expert must analyse the observations and figure out what it all means. There may be a correlation indicating a causal relationship, or there may be a hidden cause not covered by the available data.
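A toy illustration of the hidden-cause case, on purely synthetic data: the variable names (`hidden`, `ads`, `sales`) and the noise levels are made up for the example. Two series that both depend on an unobserved driver correlate strongly, and the correlation vanishes once the driver is controlled for.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [rng.gauss(0, 1) for _ in range(2000)]    # e.g. seasonal demand
ads = [h + rng.gauss(0, 0.5) for h in hidden]      # driven by hidden, not by sales
sales = [h + rng.gauss(0, 0.5) for h in hidden]    # driven by hidden, not by ads

raw = pearson(ads, sales)                      # strong correlation
controlled = partial_corr(ads, sales, hidden)  # close to zero
```

The catch is that `partial_corr` only helps if the hidden cause is in your data set in the first place, which is exactly where the business expert comes in.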
Even if a causal relationship can be identified, the management may not want to act on it. Sure, the company might make more money by changing their behavior in a particular market segment, but if that segment is dying, it may not be worth the expense to change now. That's also not a task for computers, yet.
Big Data is effectively just a tool. It does one job particularly well, and does a few other jobs well enough to be useful. It is still up to humans to determine if Big Data is the best tool for a particular situation.
You do not have a moral or legal right to do absolutely anything you want.
Don't be mean!
If you have a marketing department, you're wasting money.
If you hire a marketing firm, you're burning money.
If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.
Actually last year bonuses were forgone amid lower profits....
Now Watson has some data on what happens to a company when you cut the pay of its top-performing employees more than the lowest performing! *
* I'm talking about the regular employees who get ranked, not necessarily the executives.
Watson is a bad example, since the goal of Watson was to be a showcase of what can be done in a particular area. It was the same with Deep Blue, the computer that won against the world chess champion Garry Kasparov. Nobody is using Deep Blue or Deep Blue-like machines to play chess. This was an algorithm and architecture challenge. The same holds for Watson.
The argument that Watson has failed to make IBM the most profitable company in the world is therefore irrelevant. IBM has, however, been selling decision support and business intelligence solutions for a long time, mainly Cognos's until it acquired the company in 2007. Despite these tools, they did not become the most profitable company in the world either. But who knows whether the situation would have been worse for IBM without computer-aided decision making and business intelligence? Being able to make sense of the data and extract useful information does not mean you can change everything in a big company like IBM to instantly meet market demand and focus entirely on the most profitable segments. Moving a company like IBM is a long and tedious process.
In summary, this article is bullshit. It doesn't account for a large number of things which cannot be neglected in such an analysis.
Achille Talon
Hop!
Data scientists are this bubble's web masters. 'Nuff said.
Across all of the Firefox-branded products, 87% of people report being "sad" with Firefox, while only 13% are "happy" with it!
There is a problem with the sad/happy feedback classes. Which feedback type is right to pick for idea submission, neutral feedback, feedback about issues that make the user both sad and happy? What about interactions with add-ons? I personally try to send both feedback types by breaking up the issues as much as possible, sometimes failing at it. With free software, the user can be happy even with clear problems, or limitations.
none of which disproves TFA's thesis...
TFA is about the **hype**...everything described in your post is value-added...not hype
Thank you Dave Raggett
these systems could be effective, but it comes down to ontology or more broadly research design
i'm not saying *any* company can benefit from "big data", but most can
the core problem is a misunderstanding of what is happening...from a to z a lot of biz people are just clueless...the techies they hire to do the big data are partially responsible for this
data analysis is great...everyone does it to some level...highly complex data analysis in a biz situation must have well thought out research questions and research design, specifically tailored for the situation
business is too complex to have a one-size-fits-all data categorization ontology
Thank you Dave Raggett
Actually, Watson is pretty cool as you can feed in natural language data. That removes the very expensive translation step from creating an expert system. It does not do predictions or analyses though, it is just an expert system. Expert systems can be very useful in some tasks, but are rather limited in what they can do. And no, Watson is not (true/strong/whatever) AI and at least to expert audiences IBM is not claiming it is.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
I have worked with many very large data sets or very important data sets covering large numbers of people (not that big just complex). In both cases my first fight was with the data itself. I don't know how many databases I would get into with fields (all in one table) like phone, phone_num, number_phone, phonenum, and then usually a magical set like phone1, phone2, phone3, and phone2a.
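For what it's worth, the first pass over a table like that usually looks something like the sketch below. The column list just mirrors the names mentioned above; the helper names and the digits-only normalization are my own assumptions, and real phone data would need country-specific handling.

```python
import re

# Every phone-like column found in the one table, in the order to prefer them.
PHONE_COLUMNS = ["phone", "phone_num", "number_phone", "phonenum",
                 "phone1", "phone2", "phone2a", "phone3"]

def normalize_phone(raw):
    """Keep digits only; return None when nothing usable remains."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", str(raw))
    return digits or None

def collect_phones(row):
    """Gather unique normalized numbers from all phone-like columns of a row."""
    seen = []
    for col in PHONE_COLUMNS:
        number = normalize_phone(row.get(col))
        if number and number not in seen:
            seen.append(number)
    return seen
```

`collect_phones` takes a row as a plain dict, so it works on whatever `csv.DictReader` or a database cursor hands you.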
Or I would have lat/longs for customers that put them 100 miles off the coast of Nova Scotia (not Sable Island either). Or mostly good lat/longs, but whenever they couldn't get one they would use the lat/long of the nation's capital, resulting in 20% of the customers residing in any given nation's capital, which also then obscured the actual number of customers in the nation's capital.
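One cheap way to catch the "capital city as default" trick is to look for exact coordinates shared by an implausible share of customers. This is only a sketch: the 5% threshold, the rounding precision and the function name are arbitrary choices of mine.

```python
from collections import Counter

def suspicious_coordinates(points, max_share=0.05, decimals=4):
    """Return rounded (lat, lon) pairs shared by more than `max_share` of rows."""
    rounded = [(round(lat, decimals), round(lon, decimals)) for lat, lon in points]
    counts = Counter(rounded)
    limit = max_share * len(rounded)
    return {coord for coord, n in counts.items() if n > limit}
```

Rows that land on a flagged coordinate are better treated as "location unknown" than as residents of that spot, at the cost of losing the people who really do live there.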
And then dates. Can nobody ever get dates right? A favourite is that round one of the system will only record the day of a transaction, but later they expand their collection to the hour and minute, and now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.
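The noon spike can at least be detected mechanically before it poisons a time-of-day analysis. A rough sketch, where the 20% cutoff and both function names are assumptions of mine:

```python
from collections import Counter
from datetime import datetime

def placeholder_times(timestamps, max_share=0.2):
    """Return (hour, minute, second) values carried by more than `max_share` of rows."""
    counts = Counter((t.hour, t.minute, t.second) for t in timestamps)
    limit = max_share * len(timestamps)
    return {tod for tod, n in counts.items() if n > limit}

def split_by_precision(timestamps, max_share=0.2):
    """Separate date-only records (placeholder time) from records with a real time."""
    fake = placeholder_times(timestamps, max_share)
    date_only = [t for t in timestamps if (t.hour, t.minute, t.second) in fake]
    timed = [t for t in timestamps if (t.hour, t.minute, t.second) not in fake]
    return date_only, timed
```

Only the `timed` half should feed anything that looks at hour of day; the `date_only` half is still fine for daily counts.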
I could go on and on. One of my recent favorites is a phone company database where many phone calls never begin, or never end.
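Before doing anything clever with call records like that, it helps to just count how broken they are. A minimal sketch, assuming each call has nullable start and end timestamps (the labels are made up):

```python
from datetime import datetime

def classify_call(start, end):
    """Label a call record by what is wrong with its timestamps, or 'ok'."""
    if start is None and end is None:
        return "no_timestamps"
    if start is None:
        return "never_began"
    if end is None:
        return "never_ended"
    if end < start:
        return "ends_before_start"
    return "ok"
```

Feeding the labels into a `collections.Counter` gives a quick picture of how much of the table is usable at all.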
So I think the big bucks are not in running ML over their data with some ingenious Hadoop crap, but maybe in using ML to clean the data up. And by the way, if someone has a tilde (~) in their name, your OCR needs to be shot.
big data needs data science. data science does not need big data. data science = statistics and machine learning (mostly)
--- widget evolution: enhanced, plus, super, ultra, extreme, exxxtreme, ultra-extreme,
To predict global warming? Isn't this a form of "Data Science"?
Big data is really a thing.
Firefox feedback is not, in any sense, a representation of big data.
Global data sets are, for lack of a better word, global.
You are, for lack of a better word, a complete and total brain-lacking vacuum.
Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.
Watson was impressive on Jeopardy, but a TV show is a very different venue than business data analytics.
For the latter you really need a statistically sound approach in order to reach the right conclusion.
(DISCLAIMER: I do not work for Bayesia, but for a competitor, yet any person or company that understands Bayesianism as a sound foundation for knowledge inference knows this dirty little secret about Watson)
You are absolutely right, only problem is that Watson doesn't perform proper statistics. It's anything but Bayesian learning.
IBM CEO Ginni Rometty Made $16 Million Last Year -- Is She Underpaid?
Top 10 Reasons Why Ginni Rometty Will Fail as IBM's New CEO
Summary from the article:
1. IBM Forgot Who They Were.
2. Ginni Has No Vision for the Future of IBM.
3. IBM Executives are out of Touch.
4. IBM's Sales Culture is Poison.
5. IBM's Executive Compensation is Misaligned.
6. IBM's Rape, Pillage & Burn Acquisition Strategy.
7. IBM's Offshore Model will kill its Services Business.
8. IBM Sells Futures. What is IBM's strategy? Smarter Planet?
9. Watson is not the Panacea.
10. IBM Seems to be Preparing to Sell its Services Business.
Watson is an automated research department that extracts related facts from unstructured text much faster than any human. Like any other research department, it does not tell management what to do with those facts. Optimizing business processes like JIT supply chains is a branch of math called "operations research" (logistics if you are American). Much of it is closely related to computer science, which itself is a branch of maths, but O/R and AI are only tangentially related to each other.
The problem with optimizing the bottom line of a company the size of IBM is "feedback", i.e. optimising a market giant like IBM will induce a change in the market itself, and the changed market changes the optimal solution. The other hassle is that the problem space of optimising IBM for profit is so big that any method used to find the optimal solution will only ever be able to find local maxima. Some humans still do this better than computers, which is why humans are the ones building computers and asking them the questions.
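The local maxima problem is easy to show on a toy scale. In the sketch below, `profit` is an invented one-dimensional landscape with two peaks and `hill_climb` is a plain greedy search; neither has anything to do with how IBM or any O/R package actually works.

```python
def hill_climb(f, x, step=1, lo=0, hi=20):
    """Greedy integer hill climbing: move to a better neighbour until none exists."""
    while True:
        neighbours = [n for n in (x - step, x + step) if lo <= n <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return x  # no neighbour improves on x, so x is a (local) maximum
        x = best

def profit(x):
    """Toy landscape with a small peak at x=4 and a higher peak at x=15."""
    return max(10 - (x - 4) ** 2, 30 - (x - 15) ** 2)
```

Starting from x=2 the search stops on the small peak at 4, while starting from x=12 it finds the higher one at 15. With the feedback effect on top, the landscape itself would also shift every time you moved.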
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
I've never had much of a chance to use IBM offerings. What is AIX like? What is DB2 like? What is Informix like? What is Lotus like? What is WebSphere like? What is the XL C/C++ compiler like?
IBM is repeating what General Motors has been doing, putting out junk after junk after junk
Decades ago it didn't matter if you bought Pontiac or Chevrolet or Buick, you bought the same fucking junk
Nowadays it doesn't matter if it is Informix or WebSphere or AIX or DB2 ... they simply aren't worth their sticker price
Muchas Gracias, Señor Edward Snowden !
> With a large enough sample size, the effects of time can be eliminated from the statistics.
Oh, dear. This is so wrong, on so many levels, I'm having difficulty even knowing where to start. But "time" is one of the most critical axes in any system involving feedback and cannot be safely ignored.
Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.
Buy a very expensive rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.
There, fixed it for you.
I'm a consultant - I convert gibberish into cash-flow.
It's poorly worded above, but perhaps a better way to say it is that the time-dependent churn in a particular model is negligible (to a statistical irrelevance) if you can get enough data quickly enough. Effectively, once your data stream outpaces the time-dependent effects, those effects may no longer be relevant variables in your calculations.
For example, I'd expect that Google can collect enough data in an hour to determine if a UI improvement is helpful, or if a particular change to PageRank results in more accurate results. Because Google has such a high volume of data collection all of the time, a very short sampling duration all but eliminates the variation due to the time of day, day of the week, or season of the year.
I'm not suggesting that a Big Data solution is somehow magically independent of time. Rather, what I'm saying is that the "store first, ask questions later" approach that is central to Big Data lends itself readily to collecting useful samples quickly enough that delta-t is negligible.
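To put a number on "enough data quickly enough", here is a sketch of a standard two-proportion z-test, the kind of check one might run on a short A/B sample. I have no idea what Google actually uses; the function name and the example figures are mine.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic and two-sided p-value for the difference of two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value under the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The same lift from 5.0% to 5.1% is indistinguishable from noise with 1,000 users per arm but clearly significant with 1,000,000 per arm, which is the sense in which sheer volume lets a short window stand in for a long one.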
You do not have a moral or legal right to do absolutely anything you want.
The problem with Big Data as I see it: information is not the same as knowledge.
Sure, there is a lot of data, as more and more information feeds are made available, but there is still a lot of hidden data. The amount of work put into hiding data is huge. The amount of work put into generating data is huge too, which creates a lot of noise. The point is, a typical decision involves tiny little microscopic bits of _knowledge_, and only a small sample from the masses of information that could be waded through but are instead avoided for lack of time and energy. As far as decision making goes, that's worked well.
In some ways, data is hardly static. It changes as quickly as dominoes cascading. A single "impulse" such as an announcement or event can cause huge shifts in decision making. One can only hope to jump on a trend between impulses or right after an impulse. Analyzing the relationship between impulses and dominoes, i.e., the way data changes, could be illuminating. The challenge is to have the probes in place to get the data. You can't watch dominoes that you aren't looking at.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.