Slashdot Mirror


Why Is Data Mining Still A Frontier?

bbsguru writes "How much do we know that we still don't know? A story in The Register points out that little has changed since Francis Bacon proposed combining knowledge to learn new things 400 years ago, despite all the computer power we now have. Scientific (and other) data is still housed in unrelated collections, waiting for some enterprising Relational Database Programmer to unlock the keys to understanding. Is RDBMS still a Brave New Frontier, or will Google make the art obsolete once they finish indexing everything?"

7 of 223 comments

  1. Shot in the dark: by Spazntwich · · Score: 5, Insightful

    Either
    a) There's not enough money in it to make it worthwhile

    or

    b) It doesn't work.

    1. Re:Shot in the dark: by Disavian · · Score: 5, Insightful

      How about

      c) our ability to produce data far outstrips our ability and/or willingness to analyze it

    2. Re:Shot in the dark: by flynt · · Score: 4, Insightful

      Also, blindly "mining" data for trends can be very misleading. Hypothesis generation is usually better done some other way. There will always be trends in data we already have that are there by chance, and this is what data mining finds in many cases. Then models are fit to that data and don't validate on future samples taken, and everyone wonders why.
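
      The point about chance trends can be sketched in a few lines. This is a minimal illustration, not anything from the original post: the function names, the feature and sample counts, and the seed are all made up for the example. It "mines" pure random noise for the column that best correlates with a random target, then checks how that winning column does on fresh noise.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mine_noise(rng, n_features=200, n_samples=20):
    """Pick the noise column most correlated with a noise target.

    Returns (best in-sample |r|, |r| of that "winner" on fresh data).
    Every value is independent noise, so there is no real trend to find.
    """
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    features = [[rng.gauss(0, 1) for _ in range(n_samples)]
                for _ in range(n_features)]
    best_in_sample = max(abs(pearson(f, target)) for f in features)

    # "Future" observations: the winning column is still just noise,
    # so its new values are fresh independent draws.
    new_target = [rng.gauss(0, 1) for _ in range(n_samples)]
    new_feature = [rng.gauss(0, 1) for _ in range(n_samples)]
    holdout = abs(pearson(new_feature, new_target))
    return best_in_sample, holdout


def average_over_trials(trials=20, seed=0):
    """Average the in-sample and holdout |r| over several mining runs."""
    rng = random.Random(seed)
    results = [mine_noise(rng) for _ in range(trials)]
    mean_in = sum(r[0] for r in results) / trials
    mean_out = sum(r[1] for r in results) / trials
    return mean_in, mean_out


if __name__ == "__main__":
    mean_in, mean_out = average_over_trials()
    print(f"mean best in-sample |r|: {mean_in:.2f}")
    print(f"mean holdout |r|:        {mean_out:.2f}")
```

      With a couple hundred candidate columns and only twenty samples, the best in-sample correlation tends to look impressive even though nothing is there, and it tends to shrink on the fresh sample. That is the "doesn't validate on future samples" effect.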

    3. Re:Shot in the dark: by plover · · Score: 4, Insightful
      I have to wonder if data mining isn't the problem -- the real problem seems to be that there are few obvious problems data mining will solve.

      Consider WalM*rt. When the 2005 hurricanes were predicted, they mined their sales data for previous hurricanes. They found that in the last hurricane people stocked up on beer, pop tarts and peanut butter, so they sent trucks full of that stuff to the stores in the path of the hurricanes. They made lots of sales, and provided a valuable service to the communities. Capitalism at its finest.

      Data mining worked very well in this case. The issue was "here's an obvious problem, and a clever solution involving data mining."

      The big problem is that people expect the same golden results from non-obvious situations. "Hey, sales are down in the Wisconsin stores, let's do some data mining to figure out what they'll buy" makes no sense. Data mining worked well in the case of an obvious trigger event, but data mining by itself didn't reveal the trigger. You can't predict hurricanes based on the sales of pop tarts and beer, for example.

      But, can you ever correlate pop tart and beer sales to an external event? You might be able to go back and say "here's a strange case where pop tarts and beer sold out quickly, why did this happen?" If you can tie this to external events, you'd think you'd be better prepared to react to the same events in the future.

      Maybe correlating sales to Google News is the next step? Republican scandal == lower white bread sales; French riots + Senate bickering over immigration control reform == higher 'Peeps' sales; etc. Or maybe it's always been a bad idea to equate correlation with causality.

      --
      John
  2. I tell you why (from a bioinformatics viewpoint) by Neil+Blender · · Score: 5, Insightful

    Programmers have no idea of context. Biologists have no idea about programming. It is very hard to mix the two. You can be the shit-hottest DBA in the world but if you have no relevant (deep) biology background you are guaranteed to produce crap. Almost every piece of biological software is a POS because of this.

  3. Because it's not sexy by beacher · · Score: 4, Insightful

    From my experience: the people who are subject matter experts in their field (outside of computers) typically don't have the time to perform all of the data entry. So you have to get an ETL / Miner to do all of the work for you. ETL and data mining are *NOT* the sexiest jobs in the industry by a long shot. Auditing data makes you want to gouge your eyes out after the fourth day straight of reviewing loads.

  4. 42 by DesertWolf0132 · · Score: 4, Insightful

    "I checked it very thoroughly," said the computer, "and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is." -Hitchhiker's Guide to the Galaxy

    One must remember when undertaking to find answers in the data to first figure out the question. Otherwise the answer you find will be as useful to you as the answer 42.

    Without context you only have a neat compilation of arranged meaningless facts.

    On a small scale, data mining is used daily by marketing people and the like to figure out who would be most receptive to their approach. Webmasters use it to optimize content and respond to user trends. In most large corporations data mining is used on some level.

    Data mining on the scale discussed here may be practical at some point in the future once we determine the questions we wish answers to.

    Let us hope the answer is more useful than 42.

    --
    No animals were harmed in the making of this sig.
    Well, there was that one puppy, but he is all better now.