Extreme Complexity of Scientific Data Driving New Math Techniques

Posted by Soulskill on Friday October 11, 2013 @09:59AM from the how-do-you-process-twelve-billion-data-points dept.

An anonymous reader writes "According to Wired, 'Today's big data is noisy, unstructured, and dynamic rather than static. It may also be corrupted or incomplete. ... researchers need new mathematical tools in order to glean useful information from the data sets. "Either you need a more sophisticated way to translate it into vectors, or you need to come up with a more generalized way of analyzing it," [Mathematician Jesse Johnson] said. One such new math tool is described later: "... a mathematician at Stanford University, and his then-postdoc ... were fiddling with a badly mangled image on his computer ... They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images, expecting to see a slight improvement. What appeared on his computer screen instead was a perfectly rendered image. Candes compares the unlikeliness of the result to being given just the first three digits of a 10-digit bank account number, and correctly guessing the remaining seven digits. But it wasn't a fluke. The same thing happened when he applied the same technique to other incomplete images. The key to the technique's success is a concept known as sparsity, which usually denotes an image's complexity, or lack thereof. It's a mathematical version of Occam's razor: While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit. Out of this serendipitous discovery, compressed sensing was born.'"

Enough with this big data bullshit by Anonymous Coward · 2013-10-11 10:09 · Score: 3, Insightful

For fuck's sake.
These techniques of dealing with incomplete and unstructured data have existed for decades.
AI researchers hyping absolutely everything about their field to get some funding is starting to get on my nerves.
Amazing intuition by ZeroPly · 2013-10-11 10:11 · Score: 5, Funny

"They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images,[...]"

Wow! That would be the last thing I thought of in that situation...

--
Support microSD: in a post 9/11 world, it is unwise to carry your data on media that you cannot comfortably swallow.
1. Re:Amazing intuition by Anonymous Coward · 2013-10-11 10:52 · Score: 2, Funny
  
  They were trying to reach him to talk to him. On a hunch, the Nobel committee applied a phone designed to reach people.
2. Re:Amazing intuition by Anonymous Coward · 2013-10-11 15:28 · Score: 4, Funny
  
  But it's even more amazing than that.
  The Nobel committee only had the first three digits of his phone (the area code), so they applied the same algorithm, and bam! Turns out it works just as well for phone numbers.
  They got him on the first ring too. But that part is just coincidence.
clear, but wrong by raymorris · 2013-10-11 10:15 · Score: 2

"While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit."
Of the millions of possibilities, the sparsest is MOST likely. Perhaps it's twice as likely as any other possibility. That still means it's 99.999% likely to be wrong.
As for the MRI, that fuzzy part is probably noise that can be deleted, except when it's a tumor.
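
The arithmetic behind that hypothetical can be checked directly. This is only a sketch that takes the comment's premise at face value: one candidate reconstruction is twice as likely as each of the others, and all the others are equally likely. The function name and the one-million candidate count are illustrative assumptions, not figures from the article.

```python
from fractions import Fraction

def top_candidate_probability(num_candidates, weight=2):
    """Probability of the favoured candidate when it carries `weight`
    and each of the other num_candidates - 1 candidates carries weight 1.
    (Hypothetical helper; the weights mirror the comment's premise.)"""
    return Fraction(weight, weight + num_candidates - 1)

# One million candidate reconstructions, one of them twice as likely.
p = top_candidate_probability(1_000_000)
print(f"chance the favoured reconstruction is right: {float(p):.6%}")
print(f"chance it is wrong: {float(1 - p):.4%}")
```

Under those assumptions the favoured candidate has probability 2/(N+1). The conclusion only holds while the candidates stay nearly equally likely; it says nothing about cases where the sparsity assumption makes one reconstruction overwhelmingly more probable.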
We are the ones in need of a network by Vesvvi · 2013-10-11 10:20 · Score: 5, Insightful

I like some of the more subtle details in the title and summary: new math "techniques", "researchers need new mathematical tools", etc.
I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is. But it is true that existing knowledge and tools from mathematics drive huge advances in the sciences when they are brought to bear. The sad truth is that scientists just don't play terribly well with others (maybe no one does): interdisciplinary work is rare and difficult, and so we end up re-inventing the wheel over and over again. The reality is that the "wheel" being created by the biologist in order to interpret their data is a poor copy of the one already understood by the physicist across campus.
What can we do about this? I'm not sure, but I think it's safe to say that our greatest scientific advances in the next few decades will be the result of novel collaborations, and not novel math or (strictly speaking) novel science.
1. Re:We are the ones in need of a network by JanneM · 2013-10-11 11:42 · Score: 3, Informative
  
  I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is.
  This has actually always been the norm. Physics has long driven mathematics research for instance; many areas of calculus were created/discovered specifically to solve problems in physics.
  
  --
  Trust the Computer. The Computer is your friend.
2. Re:We are the ones in need of a network by fph+il+quozientatore · 2013-10-11 20:10 · Score: 2
  
  I like some of the more subtle details in the title and summary: new math "techniques", "researchers need new mathematical tools", etc.
  
  The summary isn't too inaccurate; what they are talking about is compressed sensing https://en.wikipedia.org/wiki/Compressed_sensing, i.e., the search for sparse (as in: with few nonzero elements) solutions to underdetermined systems of linear equations. "Sparse" is understood in a suitable basis, so for instance for a sound it could mean few different frequencies. The problem in itself is NP-hard, but it turns out that in some cases of interest you can get the solution or a reasonable approximation by solving a convex programming problem (minimizing the 1-norm rather than the sparsity).
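
A toy illustration of the 1-norm idea, as a sketch rather than how real compressed-sensing solvers work: for a tiny underdetermined linear system, a minimum-1-norm solution can be found among the "basic" solutions that use only as many columns as there are equations, so a brute-force search over column subsets is enough. The matrix `A`, the vector `b`, and both function names are made up for this example, and the code assumes `A` has full row rank. Practical solvers use linear or convex programming instead of enumeration.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(M, b):
    """Solve the square system M x = b exactly by Gauss-Jordan elimination.
    Returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def min_l1_solution(A, b):
    """Brute-force minimum-1-norm solution of A x = b (toy sizes only).
    Tries every choice of m columns, solves for those entries, and keeps
    the candidate with the smallest sum of absolute values."""
    m, n = len(A), len(A[0])
    best = None
    for support in combinations(range(n), m):
        sub = [[A[r][c] for c in support] for r in range(m)]
        coeffs = solve_square(sub, b)
        if coeffs is None:
            continue  # these columns are linearly dependent
        x = [Fraction(0)] * n
        for c, v in zip(support, coeffs):
            x[c] = v
        if best is None or sum(abs(v) for v in x) < sum(abs(v) for v in best):
            best = x
    return best

# Two measurements of a four-entry signal whose only nonzero entry is x[2] = 3.
A = [[1, 0, 1, 2],
     [0, 1, 1, -1]]
b = [3, 3]
recovered = min_l1_solution(A, b)
print([str(v) for v in recovered])
```

The system has infinitely many solutions (for example x = (3, 3, 0, 0) also fits), but the one with the smallest 1-norm here is the sparse one that generated the measurements.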
  
  --
  My first program:
  Hell Segmentation fault
Re:I dunno about you... by almitydave · 2013-10-11 10:22 · Score: 4, Funny

Yeah, my doctor couldn't see enough detail in my head x-ray, so he used Photoshop's "content-aware fill" to fix it, and now apparently I need surgery to remove the 3rd half of my brain. I get to keep the 2 extra eyeballs, though.
(actually, I really really want to see that applied to medical x-rays)

--
my, your, his/her/its, our, your, their
I'm, you're, he's/she's/it's, we're, you're, they're
Re:I dunno about you... by lgw · 2013-10-11 10:25 · Score: 4, Funny

Of course it works. "Zoom! Enhance!" If TV hasn't taught me that "enhance" works reliably, then TV has taught me nothing.

--
Socialism: a lie told by totalitarians and believed by fools.
Re:I dunno about you... by timeOday · 2013-10-11 11:14 · Score: 2

Have you ever played with the compression level on jpg? At some point, enough is enough. Now instead of lossy compression, imagine we're talking about how much radiation to shoot into your nads to get a clean xray. There are diminishing returns on image quality for each doubling of the radiation. Are you still so sure you want to turn it up to 11?
Re:I dunno about you... by Arkh89 · 2013-10-11 11:30 · Score: 2

That is NOT the way to understand these sets of techniques. The work of Candes, Tao and Donoho is basically about asking: what is the minimum number of measurements that I have to make to ensure that the reconstruction of the signal will be sufficient (for a given task), assuming that the signal has some known properties?
Let's say you hear the sound of horseshoes while walking in a street. If I ask you what color the animal's coat is, you probably won't start by saying "red" or "blue". This is because you already know some of the classic equine coat colors, which means you technically need less information to find the real color.
These techniques can also help with a signal corrupted by noise, since the properties of the signal might be, in some way, orthogonal to those of the noise, leading to a clean removal.
Fascinating! by Fubari · 2013-10-11 11:36 · Score: 2

This! is the kind of article I joined slashdot to find out about.
I wish there was a way to mod actual articles +1 or -1 instead of just modding comments; or to at least toss the submitter a karma point or something.
Old news on old news by key45 · 2013-10-11 11:52 · Score: 3, Funny

4 years ago, Slashdot ran this exact same story http://science.slashdot.org/story/10/03/02/0242224/recovering-data-from-noise about Wired running this exact same story: http://www.wired.com/magazine/2010/02/ff_algorithm/all/1
informercial by stenvar · 2013-10-11 12:11 · Score: 4, Insightful

The whole article is just a sales job:

That is the basis of the proprietary technology Carlsson offers through his start-up venture, Ayasdi, which produces a compressed representation of high dimensional data in smaller bits, similar to a map of London’s tube system.
The first place to look when people make such claims is at their publications; neither Gunnar Carlsson nor Simon DeDeo has significant publications showing that their approach works on real data or standard test sets. The statements in the article that these kinds of approaches are new are also bogus (I don't know whether they are deceptive or ignorant).
Lastly, from a Stanford math professor, I would expect better citation statistics overall; I don't know what's going on there.
http://scholar.google.de/citations?user=nCGwiu0AAAAJ&hl=en
http://scholar.google.de/scholar?as_ylo=2009&q=author:%22gunnar+carlsson%22&hl=en&as_sdt=0,5
1. Re:informercial by Anonymous Coward · 2013-10-11 13:56 · Score: 3, Informative
  
  What are you smoking? 1877 citations since 2008 isn't a good citation statistic? More importantly, judging someone's research value by absolute citation statistic is quite silly; he is a full Stanford Professor for his accomplishments, intellect, and personality (I hear he is a good advisor).
  While the article is quite a promotional piece, you don't know much about the field. Gunnar Carlsson and his group have advanced computational topology more so than any other. He came up with the concept of persistent homology and a way to compute it, one of the actually useful and computable advancements that has come out of the topology field. It allows you to reason about clustering much better than any ad hoc statistical measure.
  The current computational topology tools implemented by grad students today, like PLEX, don't scale very well. His group had some proprietary advancement that scales well, and spun off the data science company.
  If you would like two links that are actually informative about what Gunnar does:
  http://comptop.stanford.edu/
  http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/S0273-0979-09-01249-X.pdf
2. Re:informercial by stenvar · 2013-10-11 14:01 · Score: 2
  
  How does a project web page make up for the lack of relevant peer reviewed publications or lack of citations?
  Where are the published results on real-world data sets? Or do you believe that a lot of verbiage is sufficient?
Re:I dunno about you... by icebike · 2013-10-11 12:42 · Score: 2

But every once in a while, you'd be so screwed.
Occam would surely ride in and save the day.

--
Sig Battery depleted. Reverting to safe mode.
Re:I dunno about you... by SuricouRaven · 2013-10-11 20:11 · Score: 2

"Checked. Still no weapons of mass destruction."
"Damnit... switch to a lower resolution and try again!"
Re:I dunno about you... by cellocgw · 2013-10-12 01:28 · Score: 2

but I don't think I'd want my doctor working from a "fuzzy logic" MRI if I had (God forbid) a BRAIN TUMOR or something...
Then I got bad news for you: NMR imaging and CAT imaging depend on algorithms with names like "Maximum A Priori Likelihood Estimation." They *all* depend on making the best bet as to what the reconstructed image should be. It just turns out (thanks to that thing called mathematical statistics) that the correct solution is overwhelmingly positive. "Fuzzy Logic" does not mean what I think you think it means, i.e. "some random drunk posting to /."

--
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw