Slashdot Mirror


Cutting Through Data Science Hype

An anonymous reader writes: Data science — or "big data" if you prefer — has evolved into a full-fledged buzzword, thanks to marketing departments around the world. John Foreman writes that part of the marketing blitz has been focused on how fast big data analysis can be. Most companies offering some kind of analytic service try to sell you on how it'll make it easy for you to quickly find and fix the problems with your business. But he points out that good, robust models need a stable set of inputs, and businesses often change far too quickly for any kind of stable prediction. He takes IBM's analytic services as an example, quoting Kevin Hillstrom: "If IBM Watson can find hidden correlations that help your business, then why can't IBM Watson stem a 3 year sales drop at IBM?" Foreman offers some simple advice: "Simple analyses don't require huge models that get blown away when the business changes. ... If your business is currently too chaotic to support a complex model, don't build one."

19 of 99 comments

  1. IBM's got this by turkeydance · · Score: 2

    "we don't need no stinkin' sales", we have Ginni.

  2. Reminds me of a joke by ShaunC · · Score: 5, Funny

    "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

    --
    Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
    1. Re:Reminds me of a joke by Anonymous Coward · · Score: 2, Funny

      "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

      Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?

    2. Re:Reminds me of a joke by Registered+Coward+v2 · · Score: 5, Funny

      "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

      Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?

      "Big Data" is like sex in a car while in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

      --
      I'm a consultant - I convert gibberish into cash-flow.
    3. Re:Reminds me of a joke by K.+S.+Kyosuke · · Score: 2

      Vulgar, as in perfectly ordinary.

      --
      Ezekiel 23:20
  3. SPC by Mikkeles · · Score: 2

    Statistical Process Control and the Western Electric rules are very applicable here. Without stability for a baseline, it's (pretty well) impossible to utilize small data, much less big data (big bad data:).
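
    As a rough illustration of why the baseline matters, here is a minimal Python sketch of two run rules commonly listed among the Western Electric rules: a single point more than three sigma from the center line, and eight consecutive points on one side of it. The function names and the sample numbers are made up for illustration. The limits come from a baseline sample that is assumed to be stable, which is exactly the assumption that fails when the business keeps changing.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Estimate the center line and sigma from a (presumed stable) baseline."""
    return mean(baseline), stdev(baseline)

def beyond_three_sigma(points, center, sigma):
    """Indexes of points more than 3 sigma away from the center line."""
    return [i for i, x in enumerate(points) if abs(x - center) > 3 * sigma]

def run_on_one_side(points, center, run_length=8):
    """Indexes at which run_length or more consecutive points sit on one side."""
    flagged = []
    streak = 0
    last_side = 0
    for i, x in enumerate(points):
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            streak += 1
        else:
            streak = 1 if side != 0 else 0
        last_side = side
        if streak >= run_length:
            flagged.append(i)
    return flagged

# Hypothetical measurements, purely illustrative.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9]
new_points = [10.1, 9.9, 11.5, 10.3, 10.2, 10.1, 10.3, 10.2, 10.3, 10.1, 10.2]

center, sigma = control_limits(baseline)
spikes = beyond_three_sigma(new_points, center, sigma)
shifts = run_on_one_side(new_points, center)
print("beyond 3 sigma at:", spikes)
print("sustained shift at:", shifts)
```

    If the baseline itself drifts, both checks fire constantly or never, which is the point being made above.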

    --
    Great minds think alike; fools seldom differ.
  4. Re:IBM by Sarten-X · · Score: 5, Insightful

    This pretty much sums up the entirety of Big Data.

    Data analysis can highlight the correlations that would otherwise go unnoticed, and the "big" data sets involved help to ensure that the noticed correlations are statistically significant. With a large enough sample size, the effects of time can be eliminated from the statistics, supporting analysis of even highly-dynamic models. To a statistician, this is all trivial, given a large enough data set.

    Once correlations are discovered, interpreting them in the business context is a different matter for which computers are not well-suited. As the phrase goes, correlation is not causation. A business expert must analyse the observations and figure out what it all means. There may be a correlation indicating a causal relationship, or there may be a hidden cause not covered by the available data.

    Even if a causal relationship can be identified, the management may not want to act on it. Sure, the company might make more money by changing their behavior in a particular market segment, but if that segment is dying, it may not be worth the expense to change now. That's also not a task for computers, yet.

    Big Data techniques are effectively just a tool. It does one job particularly well, and does a few other jobs well enough to be useful. It is still up to humans to determine if Big Data is the best tool for a particular situation.

    --
    You do not have a moral or legal right to do absolutely anything you want.
  5. Marketing by sexconker · · Score: 3, Funny

    If you have a marketing department, you're wasting money.
    If you hire a marketing firm, you're burning money.
    If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.

    1. Re:Marketing by thegarbz · · Score: 2

      If you don't have a marketing department no one knows you exist.

      Marketing is a bucket of shit at the best of times, but you can do very little without it.

  6. Re:Missing the forest for the trees by CaptainDork · · Score: 2

    The dinosaurs did not die out because they were unable to adapt any more than a person dies because they fail to "adapt" to a grenade.

    --
    It little behooves the best of us to comment on the rest of us.
  7. Data scientists == web masters by rockmuelle · · Score: 2

    Data scientists are this bubble's web masters. 'Nuff said.

  8. Good data first, then maybe big data later by EmperorOfCanada · · Score: 4, Insightful

    I have worked with many very large data sets or very important data sets covering large numbers of people (not that big just complex). In both cases my first fight was with the data itself. I don't know how many databases I would get into with fields (all in one table) like phone, phone_num, number_phone, phonenum, and then usually a magical set like phone1, phone2, phone3, and phone2a.

    Or I would have lat longs for customers that put them 100 miles off the coast of Nova Scotia (not Sable Island either). Or mostly good lat longs, but if they couldn't get one then they would use the lat long of the nation's capital, resulting in 20% of the customers residing in any given nation's capital, which also then obscured the actual number of customers in the nation's capital.

    And then dates, can nobody ever get dates right. A favourite is that round one of the system will only record the day of a transaction but later they expand their collection to the hour and minute but now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.

    I can go on and on; one of my recent favorites is a phone company database where many phone calls never begin, or never end.

    So I think the big bucks is not in doing an ML processing of their data using some ingenious Hadoop crap but to maybe use ML to clean the data up. And by the way if someone has a tilde(~) in their name your OCR needs to be shot.
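
    A minimal Python sketch of how two of the problems above could be surfaced before any modelling: a coordinate pair shared by an implausibly large share of customers (the "nation's capital" default), and timestamps sitting exactly on a placeholder time such as noon. The record layout, the field names (lat, lon, when) and the threshold are assumptions for illustration, not taken from any real system.

```python
from collections import Counter
from datetime import datetime, time

def suspicious_coordinates(records, max_share):
    """Coordinate pairs used by more than max_share of all records."""
    counts = Counter((r["lat"], r["lon"]) for r in records)
    total = len(records)
    return {coord for coord, n in counts.items() if n / total > max_share}

def split_placeholder_times(records, placeholder=time(12, 0, 0)):
    """Separate records stamped exactly at the placeholder time from the rest."""
    dated_only, timed = [], []
    for r in records:
        (dated_only if r["when"].time() == placeholder else timed).append(r)
    return dated_only, timed

# Hypothetical customer transactions, purely illustrative.
records = [
    {"lat": 45.4215, "lon": -75.6972, "when": datetime(2014, 3, 1, 12, 0, 0)},
    {"lat": 44.6488, "lon": -63.5752, "when": datetime(2014, 3, 1, 9, 15, 0)},
    {"lat": 45.4215, "lon": -75.6972, "when": datetime(2014, 3, 2, 12, 0, 0)},
    {"lat": 43.6532, "lon": -79.3832, "when": datetime(2014, 3, 2, 17, 40, 0)},
    {"lat": 45.4215, "lon": -75.6972, "when": datetime(2014, 3, 3, 8, 5, 0)},
]

suspicious = suspicious_coordinates(records, max_share=0.25)
dated_only, timed = split_placeholder_times(records)
print("coordinates to review:", suspicious)
print(len(dated_only), "records with placeholder time,", len(timed), "with real times")
```

    Neither check can tell a real customer in the capital, or a real noon transaction, from a default value; they only flag where to look.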

    1. Re:Good data first, then maybe big data later by NeutronCowboy · · Score: 2

      Absolutely true. Unfortunately, it's far easier to convince management that the problem is the lack of a shiny tool that shows them pretty graphs than shitty data that they have to pay some consultant an ungodly amount of money to fix. Because, of course, no one in the company has the time to fix the data on which they run their business.

      --
      Those who can, do. Those who can't, sue.
    2. Re:Good data first, then maybe big data later by Registered+Coward+v2 · · Score: 2

      And then dates, can nobody ever get dates right. A favourite is that round one of the system will only record the day of a transaction but later they expand their collection to the hour and minute but now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.

      Data quality has been an issue with every project I've worked on involving data analysis or integration into a new system. One project was combining two employee databases for a merged company, where they decided to use SSNs as the key for unique records since it was a US company. Unfortunately for them, foreign employees on temporary jobs in the US often had 999-99-9999 or 123-45-6789 as SSNs, with the occasional real one thrown in. Then there were duplicate valid SSNs for employees that worked for both companies at various times in their career. That project, as with all others, confirmed my 2-2-10 law of data cleanup:

      Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data.

      I have since added a corollary:

      I do not do IT projects unless you pay me enough to retire on.
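
      For the SSN case, a minimal Python sketch of a pre-merge check might look like the following. The structural rules used (area 000, 666 and 900-999, group 00 and serial 0000 are not issued as SSNs) reflect the published numbering scheme as I understand it; the placeholder list, function names and employee records are made up for illustration. Duplicates that pass the check still need a human to decide whether they are the same person.

```python
import re
from collections import defaultdict

SSN_PATTERN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")
# Structurally valid values that are known to be typed in as filler.
KNOWN_PLACEHOLDERS = {"123-45-6789"}

def is_plausible_ssn(ssn):
    """Reject malformed values, known filler, and ranges that are never issued."""
    m = SSN_PATTERN.fullmatch(ssn)
    if not m or ssn in KNOWN_PLACEHOLDERS:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"

def group_by_ssn(employees):
    """Group records by plausible SSN; everything else goes to manual review."""
    usable = defaultdict(list)
    needs_review = []
    for emp in employees:
        if is_plausible_ssn(emp["ssn"]):
            usable[emp["ssn"]].append(emp)
        else:
            needs_review.append(emp)
    return usable, needs_review

# Hypothetical records from the two merged companies.
employees = [
    {"name": "A", "ssn": "999-99-9999", "source": "company_x"},
    {"name": "B", "ssn": "123-45-6789", "source": "company_x"},
    {"name": "C", "ssn": "111-22-3333", "source": "company_x"},
    {"name": "C", "ssn": "111-22-3333", "source": "company_y"},
    {"name": "D", "ssn": "999-99-9999", "source": "company_y"},
]

usable, needs_review = group_by_ssn(employees)
duplicates = {ssn: recs for ssn, recs in usable.items() if len(recs) > 1}
print(len(needs_review), "records need manual review")
print("SSNs appearing more than once:", sorted(duplicates))
```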

      --
      I'm a consultant - I convert gibberish into cash-flow.
  9. The convoluted concept doesn't help by quax · · Score: 2

    Watson was impressive on Jeopardy, but a TV show is a very different venue than business data analytics.

    For the latter you really need a statistically sound approach in order to reach the right conclusion.

    (DISCLAIMER: I do not work for Bayesia, but actually a competitor, yet any person or company that understands Bayesianism as a sound foundation for knowledge inference knows this dirty little secret about Watson)

  10. Re:Missing the forest for the trees by TapeCutter · · Score: 2

    What do you mean "dinosaurs failed to adapt", there are several of them flying around in my garden right now!

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  11. Re:Tell me about IBM's products. by Registered+Coward+v2 · · Score: 2

    Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.

    Buy a very expensive rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.

    There, fixed it for you.

    --
    I'm a consultant - I convert gibberish into cash-flow.
  12. Re:Missing the forest for the trees by fuzzyfuzzyfungus · · Score: 2

    Birds heap shame upon their ancestors merely by existing. (Except maybe shrikes; their willingness to keep up a proud tradition of bloodthirsty carnivorous murder despite now being about the size of a sparrow is pretty honorable).

  13. Re:Missing the forest for the trees by Antique+Geekmeister · · Score: 3, Interesting

    >> Catastrophe is a critical factor in most evolutionary history.

    > Citation, please.

    Wikipedia has a fairly good entry on "Catastrophism", and another on "Punctuated equilibrium". But even without large scale events such as dinosaur killer asteroids or the evolution of photosynthesis poisoning most species with much higher concentrations of volatile oxygen, there are much smaller and more frequent effects. Forest fires are a critical factor in breeding jack pine trees, floods are vital to the fertility of the ecosystem near river banks, and hurricanes spread species throughout their trail and profoundly affect the ecology and evolution of areas that are likely to endure hurricanes. And catastrophes can and do create a "founder effect", where a small number of introduced species members become a new species quite quickly in their new environment.

    Do I need to find individual links for each of those?