Cutting Through Data Science Hype
An anonymous reader writes: Data science — or "big data" if you prefer — has evolved into a full-fledged buzzword, thanks to marketing departments around the world. John Foreman writes that part of the marketing blitz has been focused on how fast big data analysis can be. Most companies offering some kind of analytic service try to sell you on how it'll make it easy for you to quickly find and fix the problems with your business. But he points out that good, robust models need a stable set of inputs, and businesses often change far too quickly for any kind of stable prediction. He takes IBM's analytic services as an example, quoting Kevin Hillstrom: "If IBM Watson can find hidden correlations that help your business, then why can't IBM Watson stem a 3 year sales drop at IBM?" Foreman offers some simple advice: "Simple analyses don't require huge models that get blown away when the business changes. ... If your business is currently too chaotic to support a complex model, don't build one."
He is making the assumption that IBM is concerned with a sales drop. For the last decade and a half the only thing their awful management has cared about is executive compensation. Even after this year's awful earnings the genius Ginni said 'the results prove our strategy is working', and lo and behold they voted themselves bonuses today.
IBM, like SAP, Oracle and the rest, are dinosaurs unable to adapt their businesses to changing markets. Why would they be able to do the same for your company?
The term "Big Data" is bullshit, but the concept itself is not. It's statistics, plain and simple. When you have sufficient data available, there is a lot of information and insight that can be obtained from these data.
A perfect example of this is the data available about Mozilla Firefox. Let's start by looking at Firefox's market share today. As we can see, it's only about 10% these days, on both the desktop and mobile platforms. Their mobile presence is particularly embarrassing, as it's much less than even mobile IE's! Even the ancient Android 2.3 browser has more users than Firefox for Android! Even more interesting is how Chrome for Android alone likely has more users than Firefox does in total!
Those browser stats are an example of "Big Data" that's tremendously useful. We can learn a lot about Firefox and its role in the modern world from that data alone. When you're dealing with data sets derived from absolutely massive collections of source data, remarkable observations are possible.
We can also look at Mozilla's own Firefox feedback results. These are very interesting! Over the past 7 days, over 10,000 people have submitted feedback. Across all of the Firefox-branded products, 87% of people report being "sad" with Firefox, while only 13% are "happy" with it! That's a huge gap, even when we consider that angry people are more likely to give feedback than happy people. There are roughly 6.7 times as many people who are sad with Firefox as there are people who are happy with it! We can correlate this feedback data set, which is statistically significant, with the results we derive from the browser market share data set. It becomes obvious that people are leaving Firefox behind because they are unhappy with it. Furthermore, Mozilla should already be aware of this displeasure with Firefox.
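For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 87% figure is the one quoted above; the `bias` values are purely hypothetical, since nobody here knows how much more likely an unhappy user is to submit feedback than a happy one. The second function only shows how the headline number would shift under an assumed bias. It is not a claim about the real Firefox user base.

```python
def sad_to_happy_ratio(sad_share):
    """Ratio of sad to happy respondents, given the sad share (0..1)."""
    return sad_share / (1.0 - sad_share)


def adjusted_sad_share(observed_sad, bias):
    """Estimate the sad share among all users, assuming sad users are
    `bias` times as likely to submit feedback as happy users.

    `bias` is a made-up parameter for illustration only.
    """
    return observed_sad / (observed_sad + bias * (1.0 - observed_sad))


# 87% sad vs 13% happy, as quoted in the comment above.
print(round(sad_to_happy_ratio(0.87), 2))

# With no response bias, the adjusted share equals the observed share.
print(round(adjusted_sad_share(0.87, 1.0), 2))

# With a hypothetical 3x response bias, the estimate drops noticeably.
print(round(adjusted_sad_share(0.87, 3.0), 2))
```

The gap stays large under modest assumed bias, but the size of the correction depends entirely on a number the feedback data alone cannot supply.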
This is the beauty of statistics at work!
When we consider global data sets consisting of data from thousands or millions or even billions of people, we can see some stunning patterns and results. Clearly Mozilla needs to do a better job of listening to its users. Something is seriously wrong when 87% of them are unhappy with Firefox. The data are there, Mozilla! The results are obvious! Please, act on it! Listen to the users!
This is basically the same kind of thinking I've been having. Your logic isn't quite completely sound as no matter how smart the software, it's still dependent on computing power, but it's still a valid point - much like the "if he was so smart, then how come he's dead?" ..
It's just a surveillance grid dressed up as the next big corporate fad.
"we don't need no stinkin' sales", we have Ginni.
"Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
And any company in BI knows this and is recommending people stay far away. Some canned analytics and pretty dashboards, but nothing worth the price. So they don't have many partners.
Statistical Process Control and the Western Electric rules are very applicable here. Without stability for a baseline, it's (pretty well) impossible to utilize small data, much less big data (big bad data :).
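A minimal sketch of what that looks like in practice, assuming a stable baseline period is available to estimate the center line and sigma. It checks only two of the classic control-chart run rules (a single point beyond three sigma, and a run of eight consecutive points on one side of the center line). The function name and labels are made up for illustration.

```python
from statistics import mean, stdev


def control_chart_flags(baseline, samples):
    """Flag out-of-control points in `samples` against a stable `baseline`.

    Returns a list of (index, rule_label) tuples.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    flags = []
    for i, x in enumerate(samples):
        # A single point more than three sigma from the center line.
        if abs(x - center) > 3 * sigma:
            flags.append((i, "beyond_3_sigma"))
            continue
        # Eight consecutive points on the same side of the center line.
        window = samples[max(0, i - 7): i + 1]
        if len(window) == 8 and (
            all(v > center for v in window) or all(v < center for v in window)
        ):
            flags.append((i, "run_of_8"))
    return flags


baseline = [10, 11, 9, 10, 10, 11, 9, 10]
print(control_chart_flags(baseline, [10, 13, 10]))
print(control_chart_flags(baseline, [10.5] * 8))
```

If the baseline itself is not stable, the center line and sigma mean nothing, which is the whole point: no baseline, no signal.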
Great minds think alike; fools seldom differ.
I've never had much of a chance to use IBM offerings. What is AIX like? What is DB2 like? What is Informix like? What is Lotus like? What is WebSphere like? What is the XL C/C++ compiler like?
It would have been much more effective if you left the last few sentences out.
Put simply, Tyson is a celebrity, not a scientist.
If you have a marketing department, you're wasting money.
If you hire a marketing firm, you're burning money.
If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.
USA Government has an Economic model for prosperity and the budget and all things apple pie including employment.
It does not work.
Futures Traders: In theory there to smooth out trading spikes. Well, did not work recently for the price of oil or commodities, and those employ the smartest brains of the lot with the best DWH money can buy
Models for predicting election results: Greece, Italy - scratch that one.
Employment office: No steaming pile of 'puter will make a dent in the numbers unemployed.
Phone Plans. Not sure what Telcos do or what Data Scientist output actually is, but a telemarketer cold call with a strong Indian accent will not see me buying or churning.
20 years ago these were called decision support modelling, a variation of operations research from 1938, where the British had excellent results using just paper and pencil. One speculates that in wartime, ALL inputs are considered, there is active FEEDBACK, and RESULTS are interpreted correctly; not through rose colored glasses.
Conclusion: Garbage In, Garbage out - and don't bother if you can't pull the levers 'out of bounds' .
Data scientists are this bubble's web masters. 'Nuff said.
none of which disproves TFA's thesis...
TFA is about the **hype**...everything described in your post is value-added...not hype
Thank you Dave Raggett
these systems could be effective, but it comes down to ontology or more broadly research design
i'm not saying *any* company can benefit from "big data", but most can
the core problem is a misunderstanding of what is happening...from a to z a lot of biz people are just clueless...the techies they hire to do the big data are partially responsible for this
data analysis is great...everyone does it to some level...highly complex data analysis in a biz situation must have well thought out research questions and research design, specifically tailored for the situation
business is too complex to have a one-size-fits-all data categorization ontology
Brady Haran is neither, but he puts actual scientists on his YouTube channels, and they talk about honest science (and occasional amusing trivia), with no CGI or celebrity required. No politics, no manufactured quotes, many Nobel prizes.
Socialism: a lie told by totalitarians and believed by fools.
I have worked with many very large data sets or very important data sets covering large numbers of people (not that big just complex). In both cases my first fight was with the data itself. I don't know how many databases I would get into with fields (all in one table) like phone, phone_num, number_phone, phonenum, and then usually a magical set like phone1, phone2, phone3, and phone2a.
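A sketch of the kind of first-pass cleanup this forces on you, using the column names from the example above. The record layout (a plain dict per row) and the digits-only normalization are assumptions for illustration; real phone data needs country codes, extensions and so on.

```python
import re

# Column names taken from the example above; a real table will differ.
PHONE_COLUMNS = [
    "phone", "phone_num", "number_phone", "phonenum",
    "phone1", "phone2", "phone2a", "phone3",
]


def collect_phones(record):
    """Gather distinct phone numbers scattered across redundant columns.

    Numbers are reduced to digits only so that '555-0100' and '(555) 0100'
    count as the same number. Order of first appearance is kept.
    """
    seen = []
    for col in PHONE_COLUMNS:
        raw = record.get(col)
        if not raw:
            continue
        digits = re.sub(r"\D", "", str(raw))
        if digits and digits not in seen:
            seen.append(digits)
    return seen


row = {"phone": "555-0100", "phonenum": "(555) 0100", "phone2a": "555 0199"}
print(collect_phones(row))
```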
Or I would have lat longs for customers that put them 100 miles off the coast of Nova Scotia (not Sable Island either). Or mostly good lat longs, but if they couldn't get one then they would use the lat long of the nation's capital, resulting in 20% of the customers residing in any given nation's capital, which also then obscured the actual number of customers in the nation's capital.
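One cheap way to spot that kind of placeholder coordinate is to look for exact locations shared by an implausible fraction of customers. This is only a sketch: the 5% threshold and the rounding to four decimal places are arbitrary assumptions, and a dense apartment block could trip it too.

```python
from collections import Counter


def suspicious_coordinates(points, max_share=0.05):
    """Return {(lat, lon): share} for coordinates held by more than
    `max_share` of all points. `points` is a list of (lat, lon) tuples.
    """
    counts = Counter((round(lat, 4), round(lon, 4)) for lat, lon in points)
    total = len(points)
    return {
        coord: n / total
        for coord, n in counts.items()
        if n / total > max_share
    }


# 20 customers dumped on one (made-up) default point, 80 spread elsewhere.
points = [(45.4215, -75.6972)] * 20 + [(float(i), float(i)) for i in range(80)]
print(suspicious_coordinates(points))
```

Flagging the point does not tell you which of those 20% really live there, only that the field can't be trusted as-is.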
And then dates. Can nobody ever get dates right? A favourite is that round one of the system will only record the day of a transaction, but later they expand their collection to the hour and minute, and now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.
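A rough sketch of how you might detect that before modelling anything: find the single time of day that accounts for a suspicious share of all records. The 25% cutoff is an assumption picked for illustration, not a rule.

```python
from collections import Counter
from datetime import datetime, time


def placeholder_time(timestamps, min_share=0.25):
    """Return the time of day shared by at least `min_share` of the
    timestamps, or None if no single time dominates.
    """
    times = Counter(ts.time() for ts in timestamps)
    if not times:
        return None
    most_common, n = times.most_common(1)[0]
    return most_common if n / len(timestamps) >= min_share else None


# Ten old day-only records stored as noon, plus two properly timed ones.
stamps = [datetime(2014, 1, d, 12, 0) for d in range(1, 11)]
stamps += [datetime(2014, 2, 1, 9, 17), datetime(2014, 2, 2, 18, 42)]
print(placeholder_time(stamps))
```

Records carrying the placeholder time are better treated as "time unknown" than fed into a time-of-day analysis.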
I can go on and on. One of my recent favorites is a phone company database where many phone calls never begin, or never end.
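The sanity check for that one is almost embarrassingly small. A sketch, assuming each call is a dict with hypothetical `start` and `end` keys holding comparable values (timestamps or plain numbers):

```python
def call_problems(call):
    """List the structural problems with one call record."""
    start, end = call.get("start"), call.get("end")
    problems = []
    if start is None:
        problems.append("no start")
    if end is None:
        problems.append("no end")
    if start is not None and end is not None and end < start:
        problems.append("ends before it starts")
    return problems


print(call_problems({"start": 100, "end": None}))
print(call_problems({"start": 200, "end": 150}))
```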
So I think the big bucks are not in doing an ML processing of their data using some ingenious Hadoop crap but maybe in using ML to clean the data up. And by the way, if someone has a tilde (~) in their name your OCR needs to be shot.
big data needs data science. data science does not need big data. data science = statistics and machine learning (mostly)
--- widget evolution: enhanced, plus, super, ultra, extreme, exxxtreme, ultra-extreme,
To predict global warming? Isn't this a form of "Data Science"?
It's easier to get it wrong than to get it right at this stage of the game.
I know of a big company that does this stuff. They found out the most profitable customers made 2 purchases quite quickly and returned a lot after that. Now to me that's just quite obvious, but it doesn't say how you find these customers. The business interpreted it as a call to target customers that made 1 purchase to try to convert them into this highly profitable 2 purchase type. Isn't it obvious that this isn't the stage of intervention that actually creates this type of customer?
Watson was impressive on Jeopardy, but a TV show is a very different venue than business data analytics.
For the latter you really need a statistically sound approach in order to reach the right conclusion.
(DISCLAIMER: I do not work for Bayesia, but actually a competitor, yet any person or company that understands Bayesianism as a sound foundation for knowledge inference knows this dirty little secret about Watson)
We are a startup with maybe 1 gig of data. Yet on our promotional material it says we use "big data"
Ooohhkay
...are due to Idiotic Management By A Pussy. She wants to increase profits while firing the best workers and moving their jobs into cheaper places.
We now (fortunately) see this does not work.
Greece is a hotbed of lying, laziness and corruption. No amount of computers and software will change that.
IBM CEO Ginni Rometty Made $16 Million Last Year -- Is She Underpaid?
Top 10 Reasons Why Ginni Rometty Will Fail as IBM's New CEO
Summary from the article:
1. IBM Forgot Who They Were.
2. Ginni Has No Vision for the Future of IBM.
3. IBM Executives are out of Touch.
4. IBM's Sales Culture is Poison.
5. IBM's Executive Compensation is Misaligned.
6. IBM's Rape, Pillage & Burn Acquisition Strategy.
7. IBM's Offshore Model will kill its Services Business.
8. IBM Sells Futures. What is IBM's strategy? Smarter Planet?
9. Watson is not the Panacea.
10. IBM Seems to be Preparing to Sell its Services Business.
If you cannot sell your contraptions, you can be Mr Einstein himself and live under a bridge.
Watson is an automated research department that extracts related facts from unstructured text much faster than any human. Like any other research department, it does not tell management what to do with those facts. Optimizing business processes like JIT supply chains is a branch of math called "operations research" (logistics if you are american). Much of it is closely related to computer science, which itself is a branch of maths. O/R and AI are only tangentially related to each other.
The problem with optimizing the bottom line of a company the size of IBM is "feedback", i.e. optimising a market giant like IBM will induce a change in the market itself, and the changed market changes the optimal solution. The other hassle is that the problem space of optimising IBM for profit is so big that any methods used to find the optimal solution will only ever be able to find local maxima. Some humans still do this better than computers, which is why humans are the ones building computers and asking them the questions.
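To make the local maxima point concrete, here is a toy sketch. The objective `two_peaks` is entirely made up (a small hill near x = -1 and a taller one near x = 2), and plain hill climbing stands in for "any method" only as the simplest example of a local search.

```python
def hill_climb(f, x, step=0.1, max_iters=10_000):
    """Greedy local search: move to the better neighbour until none is better."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:
            return x
        x = best
    return x


def two_peaks(x):
    """Toy objective: a small peak (height 1) at x = -1 and a taller
    peak (height 3) at x = 2.
    """
    return max(1 - (x + 1) ** 2, 3 - (x - 2) ** 2)


# Where you end up depends on where you start.
print(round(hill_climb(two_peaks, -2.0), 2))
print(round(hill_climb(two_peaks, 1.0), 2))
```

Started on the left, the search settles on the smaller hill and never learns the taller one exists. Add the feedback effect, where the landscape shifts every time you move, and it only gets worse.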
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
I've never had much of a chance to use IBM offerings. What is AIX like? What is DB2 like? What is Informix like? What is Lotus like? What is WebSphere like? What is the XL C/C++ compiler like?
IBM is repeating what General Motors has been doing, putting out junk after junk after junk
Decades ago it didn't matter if you bought Pontiac or Chevrolet or Buick, you bought the same fucking junk
Nowadays it doesn't matter if it is Informix or WebSphere or AIX or DB2 ... they simply aren't worth their sticker price
Muchas Gracias, Señor Edward Snowden !
Ps research is about asking the right questions, not making your sophomoric simplistic models. ORs study a lot of Econ and engineering for exactly that reason.