Finding a Needle in a Haystack of Data
Roland Piquepaille writes "Finding useful information in oceans of data is an increasingly complex problem in many scientific areas. This is why researchers from Case Western Reserve University (CWRU) have created new statistical techniques to isolate useful signals buried in large datasets coming from particle physics experiments, such as the ones run in a particle collider. But their method could also be applied to a broad range of applications, like discovering a new galaxy, monitoring transactions for fraud or identifying the carrier of a virulent disease among millions of people." Case Western has also provided a link to the original paper. [PDF Warning]
They are trying to efficiently find a signal in random and chaotic data. Random and chaotic data isn't easy to index.
Mythbusters actually did an ep where they built two different needle-in-a-haystack finding machines. One of them actually did quite well...
-everphilski-
It's better to have an a priori hypothesis and look for one specific, pre-defined pattern in a mound of data than to see whether any pattern at all is in the data. Or, if one insists on looking for many patterns, then the standards for statistical significance must be correspondingly higher.
Two wrongs don't make a right, but three lefts do.
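To put rough numbers on the point above about looking for many patterns: here is a minimal Python sketch, assuming the tests are independent and using the Bonferroni correction as one common way to raise the bar. The function names are made up for illustration, and this is not the method from the CWRU paper.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across num_tests tests.

    Assumes the tests are independent, which real searches often are not.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    alpha = 0.05
    for num_tests in (1, 20, 1000):
        uncorrected = familywise_error_rate(alpha, num_tests)
        per_test = bonferroni_threshold(alpha, num_tests)
        corrected = familywise_error_rate(per_test, num_tests)
        print(
            f"{num_tests:>5} patterns: "
            f"uncorrected false-alarm chance {uncorrected:.3f}, "
            f"per-test threshold {per_test:.6f}, "
            f"corrected false-alarm chance {corrected:.3f}"
        )
```

With a single pre-defined pattern the false-alarm chance is just the 5% you chose. Test 20 patterns at that same 5% level and the chance of at least one spurious "discovery" climbs past 60%, which is why the per-test threshold has to shrink as the number of patterns grows.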