Slashdot Mirror


Cutting Through Data Science Hype

An anonymous reader writes: Data science — or "big data" if you prefer — has evolved into a full-fledged buzzword, thanks to marketing departments around the world. John Foreman writes that part of the marketing blitz has been focused on how fast big data analysis can be. Most companies offering some kind of analytic service try to sell you on how it'll make it easy for you to quickly find and fix the problems with your business. But he points out that good, robust models need a stable set of inputs, and businesses often change far too quickly for any kind of stable prediction. He takes IBM's analytic services as an example, quoting Kevin Hillstrom: "If IBM Watson can find hidden correlations that help your business, then why can't IBM Watson stem a 3 year sales drop at IBM?" Foreman offers some simple advice: "Simple analyses don't require huge models that get blown away when the business changes. ... If your business is currently too chaotic to support a complex model, don't build one."

99 comments

  1. IBM by Anonymous Coward · · Score: 0

    He is making the assumption that IBM is concerned with a sales drop. For the last decade and a half the only thing their awful management has cared about is executive compensation. Even after this year's awful earnings the genius Ginni said 'the results prove our strategy is working', and lo and behold they voted themselves bonuses today.

    1. Re:IBM by Meshach · · Score: 1

      > He is making the assumption that IBM is concerned with a sales drop. For the last decade and a half the only thing their awful management has cared about is executive compensation. Even after this year's awful earnings the genius Ginni said 'the results prove our strategy is working', and lo and behold they voted themselves bonuses today.

      Actually last year bonuses were forgone amid lower profits: BBC.

      --
      "Maybe this world is another planet's hell"
      Aldous Huxley
    2. Re:IBM by Anonymous Coward · · Score: 1

      However, they authorized stock buybacks that probably more than made up for the lack of 'bonuses' through the sale of restricted stock units. They didn't get bonuses directly, but they authorized giving cash to stockholders (particularly themselves).

    3. Re:IBM by Anonymous Coward · · Score: 1

      > He is making the assumption that IBM is concerned with a sales drop. For the last decade and a half the only thing their awful management has cared about is executive compensation. Even after this year's awful earnings the genius Ginni said 'the results prove our strategy is working', and lo and behold they voted themselves bonuses today.

      Agreed, his criticism is making the same mistake that the scientific method is there to avoid, jumping to conclusions by way of logical fallacies.

      There could be any number of causes for a three-year sales drop, and many of them lie in the market IBM is operating in. Making snarky comments about using Watson to automagically fix the sales drop is hyperbole, not an analysis of predictive analytics or how it works.

      This article says nothing about the size and diversity of datasets, nothing about regression algorithms, nothing about randomized trials and nothing about dependent variables.

      Synopsis: Waste of time... Walk away!

    4. Re:IBM by bws111 · · Score: 1

      Last year they gave up their bonuses. This year they brought them back.

    5. Re:IBM by Sarten-X · · Score: 5, Insightful

      This pretty much sums up the entirety of Big Data.

      Data analysis can highlight the correlations that would otherwise go unnoticed, and the "big" data sets involved help to ensure that the noticed correlations are statistically significant. With a large enough sample size, the effects of time can be eliminated from the statistics, supporting analysis of even highly-dynamic models. To a statistician, this is all trivial, given a large enough data set.

      Once correlations are discovered, interpreting them in the business context is a different matter for which computers are not well-suited. As the phrase goes, correlation is not causation. A business expert must analyse the observations and figure out what it all means. There may be a correlation indicating a causal relationship, or there may be a hidden cause not covered by the available data.

      Even if a causal relationship can be identified, the management may not want to act on it. Sure, the company might make more money by changing their behavior in a particular market segment, but if that segment is dying, it may not be worth the expense to change now. That's also not a task for computers, yet.

      Big Data is effectively just a tool. It does one job particularly well, and a few other jobs well enough to be useful. It is still up to humans to determine if Big Data is the best tool for a particular situation.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    6. Re:IBM by dfsmith · · Score: 1

      > Actually last year bonuses were forgone amid lower profits....

      Now Watson has some data on what happens to a company when you cut the pay of its top-performing employees more than the lowest performing! *

      * I'm talking about the regular employees who get ranked, not necessarily the executives.

    7. Re:IBM by Antique+Geekmeister · · Score: 1

      > With a large enough sample size, the effects of time can be eliminated from the statistics.

      Oh, dear. This is so wrong, on so many levels, that I'm having difficulty even knowing where to start. But "time" is one of the most critical axes in any system involving feedback and cannot be safely ignored.

    8. Re:IBM by Sarten-X · · Score: 1

      It's poorly worded above, but perhaps a better way to say it is that the time-dependent churn in a particular model is negligible (to a statistical irrelevance) if you can get enough data quickly enough. Effectively, once your data stream outpaces the time-dependent effects, those effects may no longer be relevant variables in your calculations.

      For example, I'd expect that Google can collect enough data in an hour to determine if a UI improvement is helpful, or if a particular change to PageRank results in more accurate results. Because Google has such a high volume of data collection all of the time, a very short sampling duration all but eliminates the variation due to the time of day, day of the week, or season of the year.

      I'm not suggesting that a Big Data solution is somehow magically independent of time. Rather, what I'm saying is that the "store first, ask questions later" approach that is central to Big Data lends itself readily to collecting useful samples quickly enough that delta-t is negligible.
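      The "enough data quickly enough" argument can be made concrete with a standard two-proportion z-test, the usual check behind a simple A/B comparison. This is a minimal Python sketch, not anything Google is known to run; the 5.0% and 5.1% click-through rates are made-up numbers. It assumes both variants run concurrently on randomly split traffic, which is what actually lets time-of-day and day-of-week effects fall equally on both arms.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value), using the pooled proportion under the null
    hypothesis that both arms share one underlying rate.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical lift from a 5.0% to a 5.1% click-through rate,
# measured once with modest traffic and once with very heavy traffic.
for n in (10_000, 10_000_000):
    z, p = two_proportion_z_test(n * 50 // 1000, n, n * 51 // 1000, n)
    print(f"{n:>10,} users per arm: z = {z:5.2f}, p = {p:.2g}")
```

      With 10,000 users per arm the 0.1-point difference is lost in the noise (p is about 0.75); with 10 million per arm it is unmistakable. The same arithmetic cuts the other way: at that scale, even differences too small to matter come out "significant".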

      --
      You do not have a moral or legal right to do absolutely anything you want.
    9. Re:IBM by Anonymous Coward · · Score: 0

      "... noticed correlations are statistically significant. With a large enough sample size..."

      Data mining techniques for big data don't use statistical significance as an evaluation criterion, for the very reason that with a large enough sample size everything becomes statistically significant. Likewise, if you test a large enough set of variables at a given significance level, roughly that fraction of the unrelated variables will look significant purely by chance (about 5% at the 0.05 level). That's another reason why it's not used.

      Knowing statistics doesn't necessarily mean knowing a thing about data mining in a 'big data' context. They are very much different sets of tools.
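      Both effects described in this comment are easy to reproduce. A minimal Python sketch with arbitrary sizes and seed: it uses the Fisher z-transform as an approximate significance test for a correlation, which is one textbook choice among several and not something the comment specifies.

```python
import math
import random


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def p_value_from_r(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0, via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def spurious_hit_rate(n_vars: int, n_rows: int, alpha: float, seed: int = 0) -> float:
    """Fraction of pure-noise variables that pass a significance test
    against an equally random target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        if p_value_from_r(pearson_r(candidate, target), n_rows) < alpha:
            hits += 1
    return hits / n_vars


# Point 1: a negligible correlation becomes "significant" once n is huge.
print(p_value_from_r(0.01, 100), p_value_from_r(0.01, 1_000_000))
# Point 2: test enough unrelated variables and about alpha of them "pass".
print(spurious_hit_rate(n_vars=1000, n_rows=200, alpha=0.05))
```

      A correlation of 0.01 goes from nowhere near significant at n = 100 to overwhelmingly significant at n = 1,000,000, and the hit rate should land near 5%: on the order of 50 "discoveries" among 1,000 variables that are pure noise.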

    10. Re:IBM by PingPongBoy · · Score: 1

      The problem with Big Data as I see it: information is not the same as knowledge.

      Sure, there is a lot of data, as more and more information feeds are made available, but there is still a lot of hidden data. The amount of work put into hiding data is huge. The amount of work put into generating data is huge too, which creates a lot of noise. The point is, a typical decision involves tiny, microscopic bits of _knowledge_, and only a small sample from the masses of information that could be waded through but usually isn't, for lack of time and energy. As far as decision making goes, that's worked well.

      In some ways, data is hardly static. It changes as quickly as dominoes cascading. A single "impulse" such as an announcement or event can cause huge shifts in decision making. One can only hope to jump on a trend between impulses or right after an impulse. Analyzing the relationship between impulses and dominoes, i.e., the way data changes, could be illuminating. The challenge is to have the probes in place to get the data. You can't watch dominoes that you aren't looking at.

      --
      Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
  2. Missing the forest for the trees by Anonymous Coward · · Score: 1

    IBM, like SAP, Oracle and the rest, are dinosaurs unable to adapt their businesses to changing markets. Why would they be able to do the same for your company?

    1. Re:Missing the forest for the trees by fuzzyfuzzyfungus · · Score: 1

      > IBM, like SAP, Oracle and the rest, are dinosaurs unable to adapt their businesses to changing markets. Why would they be able to do the same for your company?

      Well, I'd say that fossil fuels, which are mostly composed of dinosaurs who were unable to adapt (along with plants who were unable to adapt, and various other organisms who were unable to adapt), revolutionized the hell out of our entire civilization...

      Maybe if IBM were buried and subjected to a few million years of heat and pressure they too would become a highly coveted resource?

    2. Re:Missing the forest for the trees by CaptainDork · · Score: 2

      The dinosaurs did not die out because they were unable to adapt, any more than a person dies because they fail to "adapt" to a grenade.

      --
      It little behooves the best of us to comment on the rest of us.
    3. Re:Missing the forest for the trees by ColdWetDog · · Score: 1

      Evolution is a cast-iron bitch sometimes. Dinos didn't adapt to the big grenade. Lots of other critters did.

      (And yes, fossil fuels are composed of relatively few actual dinosaurs, it's mostly ex-plant life.)

      --
      Faster! Faster! Faster would be better!
    4. Re:Missing the forest for the trees by CaptainDork · · Score: 1

      Grenades and huge rocks aren't "evolutionary," they are "catastrophic."

      --
      It little behooves the best of us to comment on the rest of us.
    5. Re:Missing the forest for the trees by Anonymous Coward · · Score: 0

      Dinos did adapt! All the avian dinosaurs adapted and radiated. It is the non-avian dinosaurs that died out.

    6. Re:Missing the forest for the trees by TapeCutter · · Score: 2

      What do you mean "dinosaurs failed to adapt", there are several of them flying around in my garden right now!

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    7. Re:Missing the forest for the trees by Antique+Geekmeister · · Score: 1

      Catastrophe is a critical factor in most evolutionary history. Practices and traits that were successful, successful enough to become part of the biology or lifestyle of an organism, often fail as circumstances change. I'm afraid that abrupt changes in environment are a common, though often unpredictable, factor in the history of many species.

    8. Re:Missing the forest for the trees by CaptainDork · · Score: 1

      > Catastrophe is a critical factor in most evolutionary history.

      Citation, please.

      --
      It little behooves the best of us to comment on the rest of us.
    9. Re:Missing the forest for the trees by fuzzyfuzzyfungus · · Score: 2

      Birds heap shame upon their ancestors merely by existing. (Except maybe shrikes; their willingness to keep up a proud tradition of bloodthirsty carnivorous murder despite now being about the size of a sparrow is pretty honorable).

    10. Re:Missing the forest for the trees by Antique+Geekmeister · · Score: 3, Interesting

      >> Catastrophe is a critical factor in most evolutionary history.

      > Citation, please.

      Wikipedia has a fairly good entry on "Catastrophism", and another on "Punctuated equilibrium". But even without large-scale events such as dinosaur-killer asteroids, or the evolution of photosynthesis poisoning most species with much higher concentrations of highly reactive oxygen, there are much smaller and more frequent effects. Forest fires are a critical factor in breeding jack pine trees, floods are vital to the fertility of the ecosystem near river banks, and hurricanes spread species along their paths and profoundly affect the ecology and evolution of areas that are likely to endure hurricanes. And catastrophes can and do create a "founder effect", where a small number of introduced members of a species become a new species quite quickly in their new environment.

      Do I need to find individual links for each of those?

  3. "Big Data" is not "bullshit". by Anonymous Coward · · Score: 0, Interesting

    The term "Big Data" is bullshit, but the concept itself is not. It's statistics, plain and simple. When you have sufficient data available, there is a lot of information and insight that can be obtained from these data.

    A perfect example of this are the data that are available about Mozilla Firefox. Let's start by looking at Firefox's market share today. As we can see, it's only about 10% these days, on both the desktop and mobile platforms. Their mobile presence is particularly embarrassing, as it's much less than even mobile IE's! Even the ancient Android 2.3 browser has more users than Firefox for Android! Even more interesting is how Chrome for Android alone likely has more users than Firefox does in total!

    Those browser stats are an example of "Big Data" that's tremendously useful. We can learn a lot about Firefox and its role in the modern world from that data alone. When you're dealing with data sets derived from absolutely massive collections of source data, remarkable observations are possible.

    We can also look at Mozilla's own Firefox feedback results. These are very interesting! Over the past 7 days, over 10,000 people have submitted feedback. Across all of the Firefox-branded products, 87% of people report being "sad" with Firefox, while only 13% are "happy" with it! That's a huge gap, even when we consider that angry people are more likely to give feedback than happy people. There are almost 6.7 times as many people who are sad with Firefox as there are people who are happy with it! We can correlate this feedback data set, which is statistically significant, with the results we derive from the browser market share data set. It becomes obvious that people are leaving Firefox behind because they are unhappy with it. Furthermore, Mozilla should already be aware of this displeasure with Firefox.

    This is the beauty of statistics at work!

    When we consider global data sets consisting of data from thousands or millions or even billions of people, we can see some stunning patterns and results. Clearly Mozilla needs to do a better job of listening to its users. Something is seriously wrong when 87% of them are unhappy with Firefox. The data are there, Mozilla! The results are obvious! Please, act on it! Listen to the users!
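    The comment's own caveat, that unhappy users are more likely to leave feedback than happy ones, can be quantified. A minimal Python sketch, assuming a single fixed ratio between how often sad and happy users respond (a simplification; the real response rates are unknown). It shows how far response bias alone can move the observed "sad" share, and notes that 87 to 13 is a ratio of about 6.7.

```python
def observed_sad_share(true_sad_share: float, sad_to_happy_response_ratio: float) -> float:
    """Share of 'sad' reports when sad users are more likely to submit feedback.

    sad_to_happy_response_ratio is a hypothetical parameter: how many times
    more likely a sad user is to file feedback than a happy one.
    """
    sad = true_sad_share * sad_to_happy_response_ratio
    happy = 1 - true_sad_share
    return sad / (sad + happy)


def implied_true_sad_share(observed_share: float, sad_to_happy_response_ratio: float) -> float:
    """Invert observed_sad_share: the user-base share consistent with what was observed."""
    r = sad_to_happy_response_ratio
    return observed_share / (r * (1 - observed_share) + observed_share)


print(round(87 / 13, 2))                            # 6.69
print(round(observed_sad_share(0.5, 6.7), 3))       # 0.87
print(round(implied_true_sad_share(0.87, 1.0), 2))  # 0.87 (no response bias)
print(round(implied_true_sad_share(0.87, 6.7), 2))  # 0.5
```

    Under these assumptions, a user base split evenly between happy and sad users would produce the reported 87% if sad users were about 6.7 times as likely to respond. A sample of 10,000 shrinks random error, but it does nothing to remove this kind of bias.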

    1. Re:"Big Data" is not "bullshit". by Anonymous Coward · · Score: 1

      > Across all of the Firefox-branded products, 87% of people report being "sad" with Firefox, while only 13% are "happy" with it!

      There is a problem with the sad/happy feedback classes. Which feedback type is right to pick for idea submission, neutral feedback, feedback about issues that make the user both sad and happy? What about interactions with add-ons? I personally try to send both feedback types by breaking up the issues as much as possible, sometimes failing at it. With free software, the user can be happy even with clear problems, or limitations.

    2. Re:"Big Data" is not "bullshit". by Anonymous Coward · · Score: 0

      There are no problems with the sad/happy feedback classes. One or the other is always appropriate.

      An idea submission? You should feel sad, because Firefox doesn't already implement your idea, depriving you of whatever functionality you desire.

      Neutral feedback? There's no such thing. Either your feedback is negative in nature, or it's positive in nature. If you aren't absolutely sure you're happy, then you're sad.

      Issues that make the user feel both sad and happy? There is clearly more than one issue at play here. Report one (or more) as sad, and the rest as happy.

      Interactions with add-ons? If Firefox allowed an add-on to do something bad, then report the issue as making you feel sad. Otherwise Firefox hasn't done anything, so you shouldn't be filing a report at all.

      And if there are any problems at all, then the user shouldn't be happy. Maybe Firefox isn't as bad as the alternatives, but if the user has experienced any problems or limitations then that user should still be sad. You shouldn't be happy that Firefox hasn't screwed you over as badly as, say, IE has. You should be sad that you still experienced problems with Firefox.

    3. Re:"Big Data" is not "bullshit". by Bite+The+Pillow · · Score: 1

      Big data is really a thing.

      Firefox feedback is not, in any sense, a representation of big data.

      Global data sets are, for lack of a better word, global.

      You are, for lack of a better word, a complete and total brain-lacking vacuum.

    4. Re:"Big Data" is not "bullshit". by quax · · Score: 1

      You are absolutely right, only problem is that Watson doesn't perform proper statistics. It's anything but Bayesian learning.

  4. Well said by Anonymous Coward · · Score: 0

    This is basically the same kind of thinking I've been having. Your logic isn't completely sound, since no matter how smart the software is, it's still dependent on computing power, but it's still a valid point, much like "if he was so smart, then how come he's dead?"..

    It's just a surveillance grid dressed up as the next big corporate fad.

  5. IBM's got this by turkeydance · · Score: 2

    "we don't need no stinkin' sales", we have Ginni.

  6. Reminds me of a joke by ShaunC · · Score: 5, Funny

    "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

    --
    Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
    1. Re:Reminds me of a joke by linear+a · · Score: 1

      Mod up. Slightly vulgar but a really good analogy.

    2. Re:Reminds me of a joke by Anonymous Coward · · Score: 2, Funny

      > "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

      Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?

    3. Re:Reminds me of a joke by Registered+Coward+v2 · · Score: 5, Funny

      >> "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

      > Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?

      "Big Data" is like sex in a car while in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

      --
      I'm a consultant - I convert gibberish into cash-flow.
    4. Re:Reminds me of a joke by K.+S.+Kyosuke · · Score: 2

      Vulgar, as in perfectly ordinary.

      --
      Ezekiel 23:20
    5. Re:Reminds me of a joke by Anonymous Coward · · Score: 0

      Those are interrelated analogies, to which my life is a testament.

    6. Re:Reminds me of a joke by gweihir · · Score: 1

      Indeed. The one big-data project I personally see at a customer does have one advantage: the IBM team is too stupid to actually collect the data (they just cannot hack the engineering, have been delayed for over a year now, and were recently removed from the production platform again because they break other things). So while the customer pays them oodles of money, at least it does not get bogus analyses in return.

      The fascinating thing is that I thought you did not find the combination of extreme arrogance and extreme incompetence in engineers. Two meetings with that IBM team taught me otherwise. Any plans for further meetings were silently dropped from our side after that. These people are not even worth talking to.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    7. Re:Reminds me of a joke by gweihir · · Score: 1

      Sex in a car? Sounds messy and uncomfortable...

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    8. Re:Reminds me of a joke by drinkypoo · · Score: 1

      I used to have a car with a back seat truly the size of a sofa, a 1960 Dodge Phoenix (a two-door Dart, before they shrank it). But alas, although I actually was having sex regularly, the car had no working parking brake, so I couldn't do it in the car. I haven't had a vehicle with a back seat big enough to get my freak on since. I may never lose that purity point.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    9. Re:Reminds me of a joke by Bite+The+Pillow · · Score: 1

      I know the guy that did it. Big data is about asking the guy that did it.

      If I can assign that guy an identifier, then I know you forever.

      I know the girl, and I know the guy. More importantly, I know the guy that didn't go for that girl. I want to get paid.

      More importantly, I want everyone to be private.

      Don't pay me. I can't be bought.

      But everyone else, for all practical purposes, can.

    10. Re:Reminds me of a joke by Anonymous Coward · · Score: 0

      Is your conclusion based on your massive package [of data]?

    11. Re:Reminds me of a joke by Anonymous Coward · · Score: 0

      You used to not find that combination in U.S. presidents as well, but times change.

    12. Re:Reminds me of a joke by DeBaas · · Score: 1

      Indeed, and the few that actually do (or did) get it, love(d) it!

      --
      ---
  7. Watson is a marketing gimmick by Anonymous Coward · · Score: 0

    And any company in BI knows this and is recommending people stay far away. Some canned analytics and pretty dashboards, but nothing worth the price. So they don't have many partners.

    1. Re:Watson is a marketing gimmick by AchilleTalon · · Score: 1

      Watson is a bad example, since the goal of Watson was to be a showcase of what can be done in a particular area. It was the same with Deep Blue, the computer that won against world chess champion Garry Kasparov. Nobody is using Deep Blue or Deep Blue-like machines to play chess. This was an algorithm and architecture challenge. The same holds for Watson.

      The argument using Watson's inability to make IBM the most profitable company in the world is therefore irrelevant. However, IBM has long sold decision-support and business intelligence solutions, mainly from Cognos, which it acquired in 2007. Despite these tools, IBM did not become the most profitable company in the world either. But who knows whether the situation would have been worse without computer-aided decision making and business intelligence at IBM? Just because you can make sense of the data and extract useful information doesn't mean you can change everything in a company as big as IBM to meet market demand instantaneously and focus entirely on the most profitable segments. Moving a company like IBM is a long and tedious process.

      In summary, this article is bullshit. It doesn't account for a large number of things which cannot be neglected in such an analysis.

      --
      Achille Talon
      Hop!
    2. Re:Watson is a marketing gimmick by gweihir · · Score: 1

      Actually, Watson is pretty cool, as you can feed in natural language data. That removes the very expensive translation step from creating an expert system. It does not do predictions or analyses, though; it is just an expert system. Expert systems can be very useful in some tasks, but are rather limited in what they can do. And no, Watson is not (true/strong/whatever) AI, and at least to expert audiences IBM is not claiming it is.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  8. SPC by Mikkeles · · Score: 2

    Statistical Process Control and Western Digital rule are very applicable here. Without stability for a baseline, it's (pretty well) impossible to utilize small data, much less big data (big bad data:).

    --
    Great minds think alike; fools seldom differ.
    1. Re: SPC by Anonymous Coward · · Score: 0

      Err ... that would be Western Electric.
      But use a CUSUM chart instead.
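      For readers who haven't met it: a CUSUM chart accumulates small deviations from a target mean, so it catches a sustained shift that never trips a single-point 3-sigma limit. Below is a minimal tabular CUSUM in Python; the allowance k and decision interval h are common textbook defaults, assumed here rather than taken from the comment, and the data are made up.

```python
def cusum_alarms(samples: list[float], target_mean: float, sigma: float,
                 k: float = 0.5, h: float = 4.0) -> list[int]:
    """Tabular CUSUM: return indices where an upward or downward shift is signalled.

    k (the allowance) and h (the decision interval) are in units of sigma.
    """
    upper = lower = 0.0
    alarms = []
    for i, x in enumerate(samples):
        # Accumulate deviations beyond the allowance; sums never drop below zero.
        upper = max(0.0, upper + x - (target_mean + k * sigma))
        lower = max(0.0, lower + (target_mean - k * sigma) - x)
        if upper > h * sigma or lower > h * sigma:
            alarms.append(i)
            upper = lower = 0.0  # restart the sums after a signal
    return alarms


shifted = [10.0] * 10 + [11.0] * 10  # made-up data: a one-sigma shift at index 10
print(cusum_alarms(shifted, target_mean=10.0, sigma=1.0))          # [18]
print([i for i, x in enumerate(shifted) if abs(x - 10.0) > 3.0])  # [] (no 3-sigma point)
```

      The sustained one-sigma shift is flagged at index 18, even though no individual point comes anywhere near the 3-sigma limit that a basic Shewhart chart (Western Electric rule 1) relies on.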

    2. Re: SPC by Mikkeles · · Score: 1

      Er, yes (he wrote, shamefacedly).

      --
      Great minds think alike; fools seldom differ.
  9. Tell me about IBM's products. by Anonymous Coward · · Score: 0

    I've never had much of a chance to use IBM offerings. What is AIX like? What is DB2 like? What is Informix like? What is Lotus like? What is WebSphere like? What is the XL C/C++ compiler like?

    1. Re:Tell me about IBM's products. by linear+a · · Score: 1

      Don't be mean!

    2. Re:Tell me about IBM's products. by Alan+Shutko · · Score: 1

      Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.

    3. Re:Tell me about IBM's products. by Anonymous Coward · · Score: 0

      Their Unix products are good. Lotus Notes is a steaming pile of sh.. DB2 can be made to work by IBM sales engineers; you cannot do it on your own. WebSphere is one of those Java server monsters and usually takes 20 minutes to restart after your one-line bugfix.

    4. Re:Tell me about IBM's products. by Anonymous Coward · · Score: 0

      > Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.

      But eventually, your eyeball fluid will polish the rusty spike until it is bright, shiny and new. That's what IBM products are like.

    5. Re:Tell me about IBM's products. by Registered+Coward+v2 · · Score: 2

      > Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.

      Buy a very expensive rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.

      There, fixed it for you.

      --
      I'm a consultant - I convert gibberish into cash-flow.
  10. Re:Neil degASSe Tyson - liar, parrot, dangerous. by sexconker · · Score: 0

    It would have been much more effective if you left the last few sentences out.
    Put simply, Tyson is a celebrity, not a scientist.

  11. Marketing by sexconker · · Score: 3, Funny

    If you have a marketing department, you're wasting money.
    If you hire a marketing firm, you're burning money.
    If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.

    1. Re:Marketing by Anonymous Coward · · Score: 1

      Actually, marketing is the soul of the business.
      *Cue the corporate atheists who claim that businesses have no souls...

    2. Re:Marketing by Anonymous Coward · · Score: 0

      > If you have a marketing department, you're wasting money.
      > If you hire a marketing firm, you're burning money.
      > If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.

      The late Bill Hicks's rant on marketing:

      https://www.youtube.com/watch?v=gDW_Hj2K0wo

    3. Re:Marketing by thegarbz · · Score: 2

      If you don't have a marketing department no one knows you exist.

      Marketing is a bucket of shit at the best of times, but you can do very little without it.

    4. Re:Marketing by quax · · Score: 1

      Marketing also encompasses requirement gathering i.e. understanding what the market needs. Especially for the fast moving software industry it is a core business process and about much more than just advertising and branding.

    5. Re:Marketing by Anonymous Coward · · Score: 0

      So in a picture it would be like this http://dilbert.com/strip/2014-... ?

    6. Re:Marketing by Anonymous Coward · · Score: 0

      > If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.

      In other words, Scientology

  12. Examples that prove Data Science does not work by Anonymous Coward · · Score: 0

    The US government has an economic model for prosperity, the budget and all things apple pie, including employment.
    It does not work.
    Futures traders: in theory they are there to smooth out trading spikes. Well, that did not work recently for the price of oil or other commodities, and those firms employ the smartest brains of the lot, with the best DWH (data warehouse) money can buy.
    Models for predicting election results: Greece, Italy - scratch that one.
    Employment office: No steaming pile of 'puter will make a dent in the numbers unemployed.
    Phone Plans. Not sure what Telcos do or what Data Scientist output actually is, but a telemarketer cold call with a strong Indian accent will not see me buying or churning.
    Twenty years ago these were called decision support modelling, a variation of operations research from 1938, when the British had excellent results using just paper and pencil. One speculates that in wartime ALL inputs are considered, there is active FEEDBACK, and RESULTS are interpreted correctly, not through rose-colored glasses.

    Conclusion: Garbage In, Garbage Out, and don't bother if you can't pull the levers 'out of bounds'.

  13. Data scientists == web masters by rockmuelle · · Score: 2

    Data scientists are this bubble's web masters. 'Nuff said.

    1. Re:Data scientists == web masters by gweihir · · Score: 1

      Fair assessment.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:Data scientists == web masters by Anonymous Coward · · Score: 0

      This bubble's Web Masters are Social Networking Professionals. Data Scientists would be this bubble's soothsayers.

      Now, if you'll excuse me, I have some number patterns to scry.

  14. "Big Data" HYPE is "bullshit". by globaljustin · · Score: 1

    none of which disproves TFA's thesis...

    TFA is about the **hype**...everything described in your post is value-added...not hype

    --
    Thank you Dave Raggett
  15. research design = solution by globaljustin · · Score: 1

    these systems could be effective, but it comes down to ontology or more broadly research design

    i'm not saying *every* company can benefit from "big data", but most can

    the core problem is a misunderstanding of what is happening...from a to z a lot of biz people are just clueless...the techies they hire to do the big data are partially responsible for this

    data analysis is great...everyone does it to some level...highly complex data analysis in a biz situation must have well thought out research questions and research design, specifically tailored for the situation

    business is too complex to have a one-size-fits-all data categorization ontology

    --
    Thank you Dave Raggett
  16. Re:Neil degASSe Tyson - liar, parrot, dangerous. by lgw · · Score: 0

    Brady Haran is neither, but he puts actual scientists on his YouTube channels, and they talk about honest science (and occasional amusing trivia), with no CGI or celebrity required. No politics, no manufactured quotes, many Nobel prizes.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  17. Good data first, then maybe big data later by EmperorOfCanada · · Score: 4, Insightful

    I have worked with many very large data sets, or very important data sets covering large numbers of people (not that big, just complex). In both cases my first fight was with the data itself. I've lost count of how many databases I've gotten into with fields (all in one table) like phone, phone_num, number_phone, phonenum, and then usually a magical set like phone1, phone2, phone3, and phone2a.
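    A minimal Python sketch of a first consolidation pass over columns like those, assuming each row is a dict keyed by column name and that any column whose name contains "phone" holds a phone number (a naive, hypothetical rule). It strips formatting and keeps each distinct number once:

```python
import re

def collect_phones(row):
    """Gather every phone-like column of a row into one de-duplicated list of digit strings.

    Column detection by name ("phone" anywhere in the field name) is a naive
    assumption; it would also pick up a hypothetical "phone_notes" column.
    """
    seen = []
    for field in sorted(row):                     # sorted only to make the order deterministic
        if "phone" not in field.lower():
            continue
        digits = re.sub(r"\D", "", str(row[field] or ""))   # drop spaces, dots, dashes, parens
        if digits and digits not in seen:
            seen.append(digits)
    return seen
```

    Deciding which of the surviving numbers is the "real" one is still a human (or business-rule) call; the sketch only stops the same number from being counted three times.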

    Or I would have lat/longs for customers that put them 100 miles off the coast of Nova Scotia (not Sable Island either). Or mostly good lat/longs, except that when they couldn't get one they would use the lat/long of the nation's capital, resulting in 20% of the customers residing in any given nation's capital, which also obscured the actual number of customers in the capital.
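    A fallback value like that usually shows up as a pile-up: far too many records on one identical point. A minimal Python sketch of that check, assuming coordinates are plain (lat, lon) tuples and that any exact point holding more than a chosen share of records is suspect (the 5% default is an arbitrary assumption):

```python
from collections import Counter

def suspicious_coordinates(coords, max_share=0.05):
    """Return exact (lat, lon) pairs shared by more than max_share of all records.

    Genuine geocoding rarely puts a large fraction of customers on one identical
    point, so a pile-up usually means a fallback value was written in. Matching is
    exact, so this assumes the fallback was stored with identical float values.
    """
    if not coords:
        return []
    counts = Counter(coords)
    total = len(coords)
    return [pt for pt, n in counts.most_common() if n / total > max_share]

# Hypothetical sample: 2 of 5 records sit on the same fallback point.
sample = [(44.65, -63.57), (45.42, -75.70), (45.42, -75.70), (46.23, -63.13), (44.67, -63.60)]
print(suspicious_coordinates(sample, max_share=0.25))
```

    Flagged records can then be treated as "location unknown" instead of being counted as customers in the capital.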

    And then dates: can nobody ever get dates right? A favourite is when round one of the system only records the day of a transaction, but later they expand their collection to the hour and minute, and now the old dates are all at noon or something. So when you try to find the usage pattern of users, there is this massive spike at noon and a scattering of transactions through the rest of the day. Try running that through a Bayesian analysis.
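    One rough way to keep those legacy rows out of a time-of-day analysis is to test whether the placeholder minute is wildly over-represented and, if so, set those rows aside as date-only. A minimal Python sketch, assuming naive datetime values, a noon placeholder, and a "10x a uniform spread over 1,440 minutes" threshold; all three are assumptions, and the threshold only makes sense for large tables:

```python
from datetime import datetime, time

def _minute(ts):
    """Time of day truncated to the minute."""
    return ts.time().replace(second=0, microsecond=0)

def split_date_only(timestamps, placeholder=time(12, 0), spike_factor=10):
    """Split timestamps into (usable for time-of-day analysis, suspected date-only).

    If the placeholder minute holds more than spike_factor times its share under a
    uniform spread (1/1440 of records), every record on that minute is treated as a
    date-only legacy record. Real usage is not uniform, so this is only a heuristic.
    """
    if not timestamps:
        return [], []
    on_placeholder = [ts for ts in timestamps if _minute(ts) == placeholder]
    share = len(on_placeholder) / len(timestamps)
    if share <= spike_factor / 1440:
        return list(timestamps), []        # no obvious spike: keep everything
    timed = [ts for ts in timestamps if _minute(ts) != placeholder]
    return timed, on_placeholder
```

    Note that a genuine transaction at exactly noon gets discarded along with the fake ones; with a known cutover date, filtering only pre-cutover noon rows would be more precise.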

    I could go on and on; one of my recent favorites is a phone company database where many phone calls never begin, or never end.

    So I think the big bucks are not in running ML over their data with some ingenious Hadoop crap, but maybe in using ML to clean the data up. And by the way, if someone has a tilde (~) in their name, your OCR needs to be shot.

    1. Re:Good data first, then maybe big data later by Anonymous Coward · · Score: 0

      This. Data quality is almost always a huge problem. If you don't have data quality issues, you probably don't have 'big data' ;-)

    2. Re:Good data first, then maybe big data later by NeutronCowboy · · Score: 2

      Absolutely true. Unfortunately, it's far easier to convince management that the problem is the lack of a shiny tool that shows them pretty graphs than that it's shitty data they have to pay some consultant an ungodly amount of money to fix. Because, of course, no one in the company has the time to fix the data on which they run their business.

      --
      Those who can, do. Those who can't, sue.
    3. Re:Good data first, then maybe big data later by Registered+Coward+v2 · · Score: 2

      And then dates: can nobody ever get dates right? A favourite is when round one of the system only records the day of a transaction, but later they expand their collection to the hour and minute, and now the old dates are all at noon or something. So when you try to find the usage pattern of users, there is this massive spike at noon and a scattering of transactions through the rest of the day. Try running that through a Bayesian analysis.

      Data quality has been an issue with every project I've worked on involving data analysis or integration into a new system. One project was combining two employee databases for a merged company, where they decided to use SSNs as the key for unique records since it was a US company. Unfortunately for them, foreign employees on temporary jobs in the US often had 999-99-9999 or 123-45-6789 as SSNs, with the occasional real one thrown in. Then there were duplicate valid SSNs for employees who had worked for both companies at various times in their careers. That project, as with all others, confirmed my 2-2-10 law of data cleanup:

      Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data.

      I have since added a corollary:

      I do not do IT projects unless you pay me enough to retire on.
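      The SSN trap generalizes: before trusting any column as a unique key, audit it. A minimal Python sketch, assuming records arrive as dicts with hypothetical "id" and "ssn" fields; it treats blanks and the two placeholder values mentioned above as missing, and reports real values that occur more than once (which, as with employees of both merged companies, may be the same person rather than an error):

```python
from collections import defaultdict

# Placeholder values mentioned in the comment above; a real list would be longer.
KNOWN_PLACEHOLDERS = {"999-99-9999", "123-45-6789"}

def audit_key(records, key="ssn"):
    """Report why a candidate column cannot serve as a unique record key.

    Returns (placeholders, duplicates): ids of records holding a blank or known
    fake value, and a dict mapping each remaining value seen more than once to
    the ids that share it.
    """
    placeholders = []
    by_value = defaultdict(list)
    for rec in records:
        value = (rec.get(key) or "").strip()
        if not value or value in KNOWN_PLACEHOLDERS:
            placeholders.append(rec["id"])
        else:
            by_value[value].append(rec["id"])
    duplicates = {v: ids for v, ids in by_value.items() if len(ids) > 1}
    return placeholders, duplicates
```

      Neither list can be fixed automatically; the point is to size the problem before the key is chosen, not after the merge.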

      --
      I'm a consultant - I convert gibberish into cash-flow.
    4. Re:Good data first, then maybe big data later by Anonymous Coward · · Score: 0

      I've been there too. I worked for a company that collected customer and sales data. The ability to offload some of the data integrity to the client was there, just not enforced. I've run into a lot of the same data issues that you had. You need a clean base. To this end, I put together a collection of stored procedures. Each procedure would evaluate one aspect of the data (i.e., does each customer record have a customer number, are there any duplicate customer numbers, etc.). This collection of procedures would be executed and evaluated regularly. Only after this was in place could we move on to software problems.
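      The same pattern works outside stored procedures. A minimal sketch in Python with the standard-library sqlite3 module, assuming a hypothetical customers table with a customer_number column; each entry in CHECKS plays the role of one of the procedures described above, and the whole set can be run on a schedule:

```python
import sqlite3

# Each check answers one question about the data and returns a count of problems.
# Table and column names (customers, customer_number) are hypothetical.
CHECKS = {
    "missing customer number":
        "SELECT COUNT(*) FROM customers "
        "WHERE customer_number IS NULL OR TRIM(customer_number) = ''",
    "duplicate customer numbers":
        "SELECT COUNT(*) FROM (SELECT customer_number FROM customers "
        "WHERE customer_number IS NOT NULL AND TRIM(customer_number) <> '' "
        "GROUP BY customer_number HAVING COUNT(*) > 1)",
}

def run_checks(conn):
    """Run every check and return {check name: number of problems found}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_number TEXT)")
    conn.executemany("INSERT INTO customers (customer_number) VALUES (?)",
                     [("C001",), ("C002",), ("C002",), (None,), ("  ",)])
    for name, problems in run_checks(conn).items():
        print(f"{name}: {problems}")
```

      Keeping each check as a separate, named query makes it easy to add a new rule whenever a new kind of bad data turns up, and to track each count over time.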

    5. Re:Good data first, then maybe big data later by Jumperalex · · Score: 1

      Yes! Dear Tea Pot! YES YES YES!!!!!!

      Then you find out the transactional data is jacked because it is 1) manually entered by a third party (not the user/customer), 2) entered without regard to policy, or 3) maybe not entered at all. [hangs head] And then they are the very ones asking for analysis of that same data to drive their future planning, and you want to beat them over the head with your rusting slide rule!!!!!!!

      --
      If you can't be good, be good at it!
    6. Re:Good data first, then maybe big data later by Jumperalex · · Score: 1

      Hey now!!! Ungodly amounts of money paid to consultants is how I make my living; don't go shitting on it :)

      --
      If you can't be good, be good at it!
    7. Re:Good data first, then maybe big data later by Jumperalex · · Score: 1

      "Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data."

      I like this. I will use this from now on with my clients. I will be sure to give proper credit to a Registered Coward :)

      --
      If you can't be good, be good at it!
    8. Re:Good data first, then maybe big data later by dotancohen · · Score: 1

      Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data.

      I actually came up with the 10% figure independently today as well, and mentioned to my project manager that unless he wanted to invest real money chasing the long tail of data, he was going to have 10% of the records with bogus values in some fields. I will certainly adopt the rest of your quote!

      I have since added a corollary: I do not do IT projects unless you pay me enough to retire on.

      Here you lost me. Why were you even in this business if you didn't love the challenge? Don't take other people's bad data personally. Take it as an opportunity.

      --
      It is dangerous to be right when the government is wrong.
    9. Re:Good data first, then maybe big data later by Registered+Coward+v2 · · Score: 1

      I have since added a corollary: I do not do IT projects unless you pay me enough to retire on.

      Here you lost me. Why were you even in this business if you didn't love the challenge? Don't take other people's bad data personally. Take it as an opportunity.

      I get enough work doing other things that IT work is something I can avoid unless it is lucrative enough. Most of my IT projects started out as something different, and then I got roped into staying on when they discovered I could actually deliver results. I've learned to say NO when asked to stay.

      --
      I'm a consultant - I convert gibberish into cash-flow.
    10. Re:Good data first, then maybe big data later by EmperorOfCanada · · Score: 1

      Worst database I ever worked on was the billing system for a telco. All fields were text fields except for the automatically generated ID field. Thanks, Lotus Notes and your IT Mall School training, for that gem.

      Oh, and the data entry had pulldowns only as suggestions. So you could type Hal and it would suggest Halifax, but if you wanted, you could just type Helifax and use that. This allowed for the easy addition of new towns and cities, because in this small region they seemed to think we would be getting new towns and cities all the time, when in fact it probably would have been safe to store that list in the BIOS.
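      Cleaning up after a form like that mostly means snapping free-text entries back onto the short canonical list. A minimal Python sketch using the standard library's difflib.get_close_matches; the town list and the 0.8 similarity cutoff are hypothetical and would need tuning against real entries:

```python
import difflib

# Hypothetical canonical list; in a small region this rarely changes.
KNOWN_TOWNS = ["Halifax", "Dartmouth", "Sydney", "Truro", "Moncton", "Charlottetown"]

def canonical_town(raw, towns=KNOWN_TOWNS, cutoff=0.8):
    """Map a free-text town entry to the closest known town, or None if nothing is close.

    Comparison is case-insensitive and uses difflib's similarity ratio, keeping
    only the single best match at or above the cutoff.
    """
    lookup = {t.lower(): t for t in towns}
    matches = difflib.get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

print(canonical_town("Helifax"))
print(canonical_town("Atlantis"))
```

      Entries that come back as None are the ones worth a human look: either a typo too mangled to guess, or one of those rare genuinely new towns.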

    11. Re:Good data first, then maybe big data later by dotancohen · · Score: 1

      I see what you mean. You seem to suffer from The Curse of Competence:
      http://dilbert.com/strip/2008-...

      --
      It is dangerous to be right when the government is wrong.
  18. data science != big data ! by majid_aldo · · Score: 1

    big data needs data science. data science does not need big data. data science = statistics and machine learning (mostly)

    --
    --- widget evolution: enhanced, plus, super, ultra, extreme, exxxtreme, ultra-extreme, ..etc.
  19. Aren't climatologists using "Big Data"? by billrp · · Score: 1

    To predict global warming? Isn't this a form of "Data Science"?

  20. Entering the Age of Big Data False Positives by Anonymous Coward · · Score: 0

    It's easier to get it wrong than to get it right at this stage of the game.

  21. Good example... by Anonymous Coward · · Score: 0

    I know of a big company that does this stuff. They found out the most profitable customers made 2 purchases quite quickly and returned a lot after that. Now to me that's just quite obvious, but it doesn't say how you find these customers. The business interpreted it as a call to target customers who made 1 purchase and try to convert them into this highly profitable 2-purchase type. Isn't it obvious that this isn't the stage of intervention that actually creates these types of customers?

  22. The convoluted concept doesn't help by quax · · Score: 2

    Watson was impressive on Jeopardy, but a TV show is a very different venue than business data analytics.

    For the latter you really need a statistically sound approach in order to reach the right conclusion.

    (DISCLAIMER: I do not work for Bayesia, but actually for a competitor, yet any person or company that understands Bayesianism as a sound foundation for knowledge inference knows this dirty little secret about Watson)

  23. my employer is a joke by Anonymous Coward · · Score: 0

    We are a startup with maybe 1 gig of data. Yet our promotional material says we use "big data"

    Ooohhkay

  24. IBM Sales Losses by Anonymous Coward · · Score: 0

    ...are due to Idiotic Management By A Pussy. She wants to increase profits while firing the best workers and moving their jobs to cheaper places.

    We now (fortunately) see this does not work.

  25. Bullshit by Anonymous Coward · · Score: 0

    Greece is a hotbed of lying, laziness and corruption. No amount of computers and software will change that.

  26. How can she live on such a low income? by Futurepower(R) · · Score: 1

    IBM CEO Ginni Rometty Made $16 Million Last Year -- Is She Underpaid?

    Top 10 Reasons Why Ginni Rometty Will Fail as IBM's New CEO

    Summary from the article:
    1. IBM Forgot Who They Were.
    2. Ginni Has No Vision for the Future of IBM.
    3. IBM Executives are out of Touch.
    4. IBM's Sales Culture is Poison.
    5. IBM's Executive Compensation is Misaligned.
    6. IBM's Rape, Pillage & Burn Acquisition Strategy.
    7. IBM's Offshore Model will kill its Services Business.
    8. IBM Sells Futures. What is IBM's strategy? Smarter Planet?
    9. Watson is not the Panacea.
    10. IBM Seems to be Preparing to Sell its Services Business.

  27. IDIOT by Anonymous Coward · · Score: 0

    If you cannot sell your contraptions, you can be Mr Einstein himself and live under a bridge.

  28. Humans ask the questions. by TapeCutter · · Score: 1

    Watson is an automated research department that extracts related facts from unstructured text much faster than any human; like any other research department, it does not tell management what to do with those facts. Optimizing business processes like JIT supply chains is a branch of math called "operations research" (logistics if you are American). Much of it is closely related to computer science, which is itself a branch of maths; O/R and AI are only tangentially related to each other.

    The problem with optimizing the bottom line of a company the size of IBM is "feedback", i.e., optimizing a market giant like IBM will induce a change in the market itself, and the changed market changes the optimal solution. The other hassle is that the problem space of optimizing IBM for profit is so big that any method used to find the optimal solution will only ever be able to find local maxima. Some humans still do this better than computers, which is why humans are the ones building computers and asking them the questions.
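    The local-maximum point is easy to see in miniature. A toy Python sketch, assuming a made-up integer "profit landscape" with two peaks: greedy hill climbing (one simple local search method, used here purely for illustration) stops at whichever peak is uphill from where it starts:

```python
def hill_climb(f, start, lo, hi):
    """Greedy integer hill climbing: move to the better neighbour until none is better."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        if not neighbours:
            return x
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return x          # no neighbour improves: a (possibly only local) maximum
        x = best

# Hypothetical "profit landscape": a small peak at x=2 and a bigger one at x=8.
profit = {0: 1, 1: 3, 2: 5, 3: 4, 4: 2, 5: 3, 6: 6, 7: 8, 8: 9, 9: 7, 10: 4}.get

print(hill_climb(profit, start=0, lo=0, hi=10))  # stuck on the smaller peak
print(hill_climb(profit, start=5, lo=0, hi=10))  # reaches the global maximum
```

    With eleven points you can simply check them all; the commenter's point is that a company-sized search space rules that out, so where the search starts ends up mattering a great deal.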

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    1. Re:Humans ask the questions. by ultranova · · Score: 1

      Optimizing business processes like JIT supply chains is a branch of math called "operations research" (logistics if you are American).

      Or "garbage in, garbage out" if you've seen the results of mathematically optimized processes encountering physical reality. But hey, someone earned a bonus for implementing them, and it's not their fault someone got the flu, a storm delayed a ship, a roadwork delayed a truck which thus arrived just after lunch hour began, the warehouse door got stuck so they had to use another, another company was delivering goods at the same time so ours had to wait in line, the new part didn't fit, half the workforce was "optimized" away so the rest hate your guts and now have a work ethic to match, etc. etc.

      --
      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

  29. IBM has turned into GM by Taco+Cowboy · · Score: 1

    I've never had much of a chance to use IBM offerings. What is AIX like? What is DB2 like? What is Informix like? What is Lotus like? What is WebSphere like? What is the XL C/C++ compiler like?

    IBM is repeating what General Motors has been doing, putting out junk after junk after junk

    Decades ago it didn't matter if you bought Pontiac or Chevrolet or Buick; you bought the same fucking junk

    Nowadays it doesn't matter if it is Informix or WebSphere or AIX or DB2 ... they simply aren't worth their sticker price

    --
    Muchas Gracias, Señor Edward Snowden !
  30. Ops research by Anonymous Coward · · Score: 0

    Ops research is about asking the right questions, not making your sophomoric, simplistic models. OR practitioners study a lot of econ and engineering for exactly that reason.