Slashdot Mirror


Extreme Complexity of Scientific Data Driving New Math Techniques

An anonymous reader writes "According to Wired, 'Today's big data is noisy, unstructured, and dynamic rather than static. It may also be corrupted or incomplete. ... researchers need new mathematical tools in order to glean useful information from the data sets. "Either you need a more sophisticated way to translate it into vectors, or you need to come up with a more generalized way of analyzing it," [Mathematician Jesse Johnson] said. One such new math tool is described later: "... a mathematician at Stanford University, and his then-postdoc ... were fiddling with a badly mangled image on his computer ... They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images, expecting to see a slight improvement. What appeared on his computer screen instead was a perfectly rendered image. Candes compares the unlikeliness of the result to being given just the first three digits of a 10-digit bank account number, and correctly guessing the remaining seven digits. But it wasn't a fluke. The same thing happened when he applied the same technique to other incomplete images. The key to the technique's success is a concept known as sparsity, which usually denotes an image's complexity, or lack thereof. It's a mathematical version of Occam's razor: While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit. Out of this serendipitous discovery, compressed sensing was born.'"
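
The "simplest (sparsest) version" idea from the summary can be tried on a toy problem. The sketch below is not the method from the Wired article: it is an exhaustive search over supports, which only works for tiny systems, whereas practical compressed sensing typically replaces that search with convex l1 minimization or a greedy algorithm. The matrix `A`, the vector `x_true`, and the helper names `solve_linear` and `sparsest_solution` are all made up for illustration. The system has 4 equations and 6 unknowns, so infinitely many vectors reproduce the measurements, but only one of them has just two nonzero entries.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear(M, b):
    """Solve the square system M z = b exactly (Gauss-Jordan); None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def sparsest_solution(A, y):
    """Return an x with A x = y that has as few nonzeros as possible.

    Tries every support of size 1, then 2, and so on (exponential cost,
    so this is a toy, not a practical reconstruction algorithm).
    """
    m, n = len(A), len(A[0])
    if all(v == 0 for v in y):
        return [Fraction(0)] * n
    for k in range(1, n + 1):
        for support in combinations(range(n), k):
            # Least squares restricted to the chosen columns (normal equations).
            G = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in support]
                 for i in support]
            rhs = [sum(A[r][i] * y[r] for r in range(m)) for i in support]
            coeffs = solve_linear(G, rhs)
            if coeffs is None:
                continue
            x = [Fraction(0)] * n
            for j, c in zip(support, coeffs):
                x[j] = c
            # Accept only an exact fit to every measurement.
            if all(sum(A[r][j] * x[j] for j in range(n)) == y[r] for r in range(m)):
                return x
    return None


# Hypothetical sensing matrix: Vandermonde rows t**r for t = 1..6, r = 0..3.
# Any 4 of its columns are linearly independent, so a 2-sparse signal is
# the unique sparsest solution.
A = [[t ** r for t in range(1, 7)] for r in range(4)]
x_true = [0, 3, 0, 0, -2, 0]
y = [sum(A[r][j] * x_true[j] for j in range(6)) for r in range(4)]

x_hat = sparsest_solution(A, y)
print(", ".join(str(v) for v in x_hat))  # prints: 0, 3, 0, 0, -2, 0
```

Exact fractions are used instead of floats so that "fits the measurements" can be checked with plain equality rather than a tolerance.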

10 of 107 comments

  1. Enough with this big data bullshit by Anonymous Coward · · Score: 3, Insightful

    For fuck's sake.

    These techniques of dealing with incomplete and unstructured data have existed for decades.

    AI researchers hyping absolutely everything about their field to get some funding is starting to get on my nerves.

  2. Amazing intuition by ZeroPly · · Score: 5, Funny

    "They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images,[...]"

    Wow! That would be the last thing I thought of in that situation...

    --
    Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
    1. Re:Amazing intuition by Anonymous Coward · · Score: 4, Funny

      But it's even more amazing than that.

      The Nobel committee only had the first three digits of his phone (the area code), so they applied the same algorithm, and bam! Turns out it works just as well for phone numbers.

      They got him on the first ring too. But that part is just coincidence.

  3. We are the ones in need of a network by Vesvvi · · Score: 5, Insightful

    I like some of the more subtle details in the title and summary: new math "techniques", "researchers need new mathematical tools", etc.

    I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is. But it is true that existing knowledge and tools from mathematics drive huge advances in the sciences when they are brought to bear. The sad truth is that scientists just don't play terribly well with others (maybe no one does): interdisciplinary work is rare and difficult, and so we end up re-inventing the wheel over and over again. The reality is that the "wheel" being created by the biologist in order to interpret their data is a poor copy of the one already understood by the physicist across campus.

    What can we do about this? I'm not sure, but I think it's safe to say that our greatest scientific advances in the next few decades will be the result of novel collaborations, and not novel math or (strictly speaking) novel science.

    1. Re:We are the ones in need of a network by JanneM · · Score: 3, Informative

      I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is.

      This has actually always been the norm. Physics has long driven mathematics research for instance; many areas of calculus were created/discovered specifically to solve problems in physics.

      --
      Trust the Computer. The Computer is your friend.
  4. Re:I dunno about you... by almitydave · · Score: 4, Funny

    Yeah, my doctor couldn't see enough detail in my head x-ray, so he used Photoshop's "content-aware fill" to fix it, and now apparently I need surgery to remove the 3rd half of my brain. I get to keep the 2 extra eyeballs, though.

    (actually, I really really want to see that applied to medical x-rays)

    --
    my, your, his/her/its, our, your, their
    I'm, you're, he's/she's/it's, we're, you're, they're
  5. Re:I dunno about you... by lgw · · Score: 4, Funny

    Of course it works. "Zoom! Enhance!" If TV hasn't taught me that "enhance" works reliably, then TV has taught me nothing.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  6. Old news on old news by key45 · · Score: 3, Funny

    4 years ago, Slashdot ran this exact same story http://science.slashdot.org/story/10/03/02/0242224/recovering-data-from-noise about Wired running this exact same story: http://www.wired.com/magazine/2010/02/ff_algorithm/all/1

  7. informercial by stenvar · · Score: 4, Insightful

    The whole article is just a sales job:

    That is the basis of the proprietary technology Carlsson offers through his start-up venture, Ayasdi, which produces a compressed representation of high dimensional data in smaller bits, similar to a map of London’s tube system.

    The first place to look when people make such claims is at their publications. Neither Gunnar Carlsson nor Simon DeDeo has significant publications showing that their approach works on real data or standard test sets. The statements in the article that these kinds of approaches are new are also bogus (I don't know whether they are deceptive or ignorant).

    Lastly, from a Stanford math professor, I would expect better citation statistics overall; I don't know what's going on there.

    http://scholar.google.de/citations?user=nCGwiu0AAAAJ&hl=en

    http://scholar.google.de/scholar?as_ylo=2009&q=author:%22gunnar+carlsson%22&hl=en&as_sdt=0,5

    1. Re:informercial by Anonymous Coward · · Score: 3, Informative

      What are you smoking? 1877 citations since 2008 isn't a good citation statistic? More importantly, judging someone's research value by absolute citation statistic is quite silly; he is a full Stanford Professor for his accomplishments, intellect, and personality (I hear he is a good advisor).

      While the article is quite a promotional piece, you don't know much about the field. Gunnar Carlsson and his group have advanced computational topology more than any other. He came up with the concept of persistent homology and a way to compute it, one of the actually useful and computable advancements that has come out of the topology field. It allows you to reason about clustering much better than any ad hoc statistical measure.

      The current computational topology tools implemented by grad students today, like PLEX, don't scale very well. His group had some proprietary advancement that scales well, and spun off the data science company.

      If you would like two links that are actually informative about what Gunnar does:
      http://comptop.stanford.edu/
      http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/S0273-0979-09-01249-X.pdf
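
For readers wondering how persistent homology relates to clustering, as the last comment claims, here is a minimal sketch of its simplest case: tracking connected components (dimension 0) of a point set as the distance scale grows. This is not PLEX or anything from Carlsson's group. The function name `zero_dim_persistence` and the sample points are made up, the points are one-dimensional to keep distances trivial, and the convention that two components merge at a scale equal to the distance between them (rather than half of it) is an assumption.

```python
from itertools import combinations


def zero_dim_persistence(points):
    """Return the scales at which connected components of 1-D points merge.

    Every point starts as its own component at scale 0. Components are
    merged in order of increasing pairwise distance with union-find
    (the same bookkeeping as Kruskal's minimum spanning tree algorithm).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj      # two components become one at this scale
            deaths.append(dist)  # so one of them "dies" here
    return deaths  # the final surviving component never dies and is omitted


points = [0, 1, 2, 10, 11]
print(zero_dim_persistence(points))  # prints: [1, 1, 1, 8]
```

Three components disappear almost immediately, at scale 1, while one survives until scale 8. That long-lived component is the signal that the data has two well-separated clusters, and no cluster count or distance threshold had to be chosen in advance.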