Extreme Complexity of Scientific Data Driving New Math Techniques
An anonymous reader writes "According to Wired, 'Today's big data is noisy, unstructured, and dynamic rather than static. It may also be corrupted or incomplete. ... researchers need new mathematical tools in order to glean useful information from the data sets. "Either you need a more sophisticated way to translate it into vectors, or you need to come up with a more generalized way of analyzing it," [Mathematician Jesse Johnson] said. One such new math tool is described later: "... a mathematician at Stanford University, and his then-postdoc ... were fiddling with a badly mangled image on his computer ... They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images, expecting to see a slight improvement. What appeared on his computer screen instead was a perfectly rendered image. Candes compares the unlikeliness of the result to being given just the first three digits of a 10-digit bank account number, and correctly guessing the remaining seven digits. But it wasn't a fluke. The same thing happened when he applied the same technique to other incomplete images. The key to the technique's success is a concept known as sparsity, which usually denotes an image's complexity, or lack thereof. It's a mathematical version of Occam's razor: While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit. Out of this serendipitous discovery, compressed sensing was born.'"
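For readers who want the Occam's-razor idea in symbols, the textbook form of compressed sensing (basis pursuit) looks roughly like this. Here y = Φx are m linear measurements of an unknown length-n signal x, and the ℓ1 norm stands in for the computationally intractable count of nonzero entries. The second line is the usual rule of thumb for a k-sparse x and a random Gaussian Φ, with C an unspecified constant. The notation is chosen here, not taken from the article.

```latex
\hat{x} \;=\; \arg\min_{x \in \mathbb{R}^{n}} \|x\|_{1}
\quad \text{subject to} \quad \Phi x = y, \qquad \Phi \in \mathbb{R}^{m \times n},\; m < n

m \;\geq\; C \, k \log(n/k)
```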
For fuck's sake.
These techniques of dealing with incomplete and unstructured data have existed for decades.
AI researchers hyping absolutely everything about their field to get some funding is starting to get on my nerves.
"They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images,[...]"
Wow! That would be the last thing I thought of in that situation...
"While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit."
Of the millions of possibilities, the sparsest is MOST likely. Perhaps it's twice as likely as any other possibility. That still means it's 99.999% likely to be wrong.
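The arithmetic behind that objection, taking the comment's own hypothetical at face value (a million candidate reconstructions, the sparsest one twice as likely as each of the others, all the others equally likely), fits in a few lines. The function name is made up for illustration.

```python
def prob_sparsest_correct(n_candidates: int, weight_ratio: float) -> float:
    """Probability that the single most likely candidate is the right one,
    assuming it is `weight_ratio` times as likely as each of the other
    n_candidates - 1 candidates, which are all equally likely (hypothetical)."""
    total = weight_ratio + (n_candidates - 1)
    return weight_ratio / total

p = prob_sparsest_correct(1_000_000, 2.0)
print(f"P(correct) = {p:.2e}, P(wrong) = {1 - p:.6%}")
```

Under those assumptions the sparsest candidate is right only about 2 times in a million.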
As for the MRI, that fuzzy part is probably noise that can be deleted, except when it's a tumor.
I like some of the more subtle details in the title and summary: new math "techniques", "researchers need new mathematical tools", etc.
I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is. But it is true that existing knowledge and tools from mathematics drive huge advances in the sciences when they are brought to bear. The sad truth is that scientists just don't play terribly well with others (maybe no one does): interdisciplinary work is rare and difficult, and so we end up re-inventing the wheel over and over again. The reality is that the "wheel" being created by the biologist in order to interpret their data is a poor copy of the one already understood by the physicist across campus.
What can we do about this? I'm not sure, but I think it's safe to say that our greatest scientific advances in the next few decades will be the result of novel collaborations, and not novel math or (strictly speaking) novel science.
Yeah, my doctor couldn't see enough detail in my head x-ray, so he used Photoshop's "content-aware fill" to fix it, and now apparently I need surgery to remove the 3rd half of my brain. I get to keep the 2 extra eyeballs, though.
(actually, I really really want to see that applied to medical x-rays)
Of course it works. "Zoom! Enhance!" If TV hasn't taught me that "enhance" works reliably, then TV has taught me nothing.
Have you ever played with the compression level on JPEG? At some point, enough is enough. Now instead of lossy compression, imagine we're talking about how much radiation to shoot into your nads to get a clean x-ray. There are diminishing returns on image quality for each doubling of the radiation. Are you still so sure you want to turn it up to 11?
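A rough way to see the diminishing returns, assuming image noise is dominated by photon-counting (Poisson) statistics, which is a simplification: a pixel that collects N photons has a signal-to-noise ratio of about √N. Each doubling of dose therefore buys only a factor of √2 (about 3 dB), and doubling the SNR costs four times the dose. The baseline photon count and function name below are illustrative.

```python
import math

def poisson_snr(photons: float) -> float:
    """SNR of a photon count under Poisson statistics: mean N, standard deviation sqrt(N)."""
    return photons / math.sqrt(photons)

base = 1_000.0  # hypothetical photons per pixel at the baseline dose
for doublings in range(5):
    n = base * 2 ** doublings
    gain_db = 20 * math.log10(poisson_snr(n) / poisson_snr(base))
    print(f"dose x{2 ** doublings:>2}: SNR = {poisson_snr(n):7.1f}  (+{gain_db:4.1f} dB)")
```

Each step adds the same ~3 dB while costing twice the dose of the step before it.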
That is NOT the way to understand these sets of techniques. Candes, Tao and Donoho's work is basically about asking: what is the minimum number of measurements I have to make to ensure that the reconstruction of the signal will be sufficient (for a given task), assuming that the signal has some known properties?
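To make the "minimum number of measurements" idea concrete, here is a small sketch that takes m = 80 random linear measurements of a length-200 signal with only 5 nonzero entries and tries to recover it. It uses orthogonal matching pursuit (OMP), a greedy algorithm, rather than the ℓ1 minimization analyzed by Candes, Tao and Donoho, simply because it fits in standard-library Python. The sizes, the Gaussian measurement matrix and all function names are assumptions for illustration.

```python
import math
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily pick k columns of Phi that explain y."""
    m, n = len(Phi), len(Phi[0])
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        # Column most correlated with what is still unexplained.
        corr = [abs(sum(Phi[i][j] * residual[i] for i in range(m))) for j in range(n)]
        for j in support:
            corr[j] = -1.0  # never pick the same column twice
        support.append(max(range(n), key=corr.__getitem__))
        # Least-squares fit of y on the chosen columns via the normal equations.
        G = [[sum(Phi[i][a] * Phi[i][b] for i in range(m)) for b in support] for a in support]
        h = [sum(Phi[i][a] * y[i] for i in range(m)) for a in support]
        coef = solve(G, h)
        residual = [y[i] - sum(Phi[i][a] * c for a, c in zip(support, coef)) for i in range(m)]
    x_hat = [0.0] * n
    for j, c in zip(support, coef):
        x_hat[j] = c
    return x_hat

rng = random.Random(0)
n, m, k = 200, 80, 5  # signal length, measurements, nonzeros (illustrative sizes)
x = [0.0] * n
for j in rng.sample(range(n), k):
    x[j] = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 2.0)
Phi = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
y = [sum(Phi[i][j] * x[j] for j in range(n)) for i in range(m)]

x_hat = omp(Phi, y, k)
print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, x_hat)))
```

With these sizes the greedy search typically finds the exact support, so the error sits at floating-point round-off; shrink m toward k and it starts to fail, which is the measurement-count trade-off described above.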
Let's say you hear the sound of horseshoes while walking down a street. If I ask you the color of the animal's coat, you probably won't start by saying "red" or "blue". This is because you already know the classic equine coat colors, which means you technically need less information to find the real color.
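The same intuition can be put in numbers, as a back-of-the-envelope sketch rather than the actual theorem: knowing that a length-n signal has only k nonzero entries shrinks the possibilities to the C(n, k) ways of placing them, so naming the support takes about log2 C(n, k) bits, on the order of k·log2(n/k), instead of something proportional to n. The sizes below are arbitrary.

```python
import math

def support_bits(n: int, k: int) -> float:
    """Bits needed to say which k of n positions are nonzero: log2(n choose k)."""
    return math.log2(math.comb(n, k))

for n, k in [(200, 5), (10_000, 10), (1_000_000, 20)]:
    print(f"n={n:>9,}  k={k:>2}  log2 C(n,k) = {support_bits(n, k):6.1f} bits  "
          f"k*log2(n/k) = {k * math.log2(n / k):6.1f}")
```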
These techniques can also help with a signal corrupted by noise, since the properties of the signal might be, in some way, orthogonal to those of the noise, allowing a clean removal.
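A toy version of that idea, under assumptions chosen to make it work: a pure sinusoid, which is sparse in the Fourier domain, plus white Gaussian noise, which spreads evenly across all frequencies. Transform, keep only the few large coefficients, transform back. The naive O(n²) DFT, the threshold and the function names are illustrative choices, not anything from the thread.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out

def denoise(noisy, threshold):
    """Zero every Fourier coefficient whose magnitude is below `threshold`."""
    coeffs = [c if abs(c) >= threshold else 0 for c in dft(noisy)]
    return [v.real for v in dft(coeffs, inverse=True)]

def rms(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

rng = random.Random(1)
n = 128
clean = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # energy only in bins 5 and n-5
noisy = [s + rng.gauss(0.0, 0.3) for s in clean]
restored = denoise(noisy, threshold=20.0)  # the sinusoid's two bins have magnitude ~n/2 = 64

print(f"RMS error before: {rms(noisy, clean):.3f}, after: {rms(restored, clean):.3f}")
```

The threshold works here only because the signal and the noise occupy the Fourier domain so differently; a signal with energy spread across many frequencies would not separate this cleanly.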
This! is the kind of article I joined slashdot to find out about.
I wish there was a way to mod actual articles +1 or -1 instead of just modding comments; or to at least toss the submitter a karma point or something.
4 years ago, Slashdot ran this exact same story http://science.slashdot.org/story/10/03/02/0242224/recovering-data-from-noise about Wired running this exact same story: http://www.wired.com/magazine/2010/02/ff_algorithm/all/1
The whole article is just a sales job:
The first place to look when people make such claims is their publications; neither Gunnar Carlsson nor Simon DeDeo has significant publications showing that their approach works on real data or standard test sets. The article's statements that these kinds of approaches are new are also bogus (I don't know whether they are deceptive or ignorant).
Lastly, from a Stanford math professor, I would expect better citation statistics overall; I don't know what's going on there.
http://scholar.google.de/citations?user=nCGwiu0AAAAJ&hl=en
http://scholar.google.de/scholar?as_ylo=2009&q=author:%22gunnar+carlsson%22&hl=en&as_sdt=0,5
But every once in a while, you'd be so screwed.
Occam would surely ride in and save the day.
"Checked. Still no weapons of mass destruction."
"Damnit... switch to a lower resolution and try again!"
but I don't think I'd want my doctor working from a "fuzzy logic" MRI if I had (God forbid) a BRAIN TUMOR or something...
Then I've got bad news for you: NMR imaging and CAT imaging depend on algorithms with names like "Maximum A Posteriori Estimation." They *all* depend on making the best bet as to what the reconstructed image should be. It just turns out (thanks to that thing called mathematical statistics) that the best bet is overwhelmingly likely to be the correct one. "Fuzzy logic" does not mean what I think you think it means, i.e. "some random drunk posting to /."
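For a sense of what "making the best bet" looks like in the simplest case, here is a one-pixel MAP (maximum a posteriori) estimate. It assumes Gaussian measurement noise with standard deviation sigma and a Laplace prior favoring values near zero, one standard way of encoding a preference for sparse images. Under those assumptions the MAP estimate has a closed form, soft thresholding at sigma²/b, and the sketch checks it against a brute-force search over the log-posterior. Parameter names and values are hypothetical.

```python
import math

def neg_log_posterior(x, y, sigma, b):
    """-log p(x | y) up to a constant: Gaussian likelihood times a Laplace(0, b) prior."""
    return (y - x) ** 2 / (2 * sigma ** 2) + abs(x) / b

def map_estimate(y, sigma, b):
    """Closed-form minimizer of neg_log_posterior: soft thresholding at sigma**2 / b."""
    t = sigma ** 2 / b
    return 0.0 if abs(y) <= t else math.copysign(abs(y) - t, y)

def brute_force_map(y, sigma, b, lo=-5.0, hi=5.0, steps=20_000):
    """Grid search over x in [lo, hi], as an independent check of the closed form."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(grid, key=lambda x: neg_log_posterior(x, y, sigma, b))

sigma, b = 0.5, 0.5  # hypothetical noise level and prior scale, giving threshold 0.5
for y in (-2.0, -0.3, 0.0, 0.4, 1.7):
    print(f"y={y:+.1f}  MAP={map_estimate(y, sigma, b):+.3f}  grid={brute_force_map(y, sigma, b):+.3f}")
```

Small measurements are pulled all the way to zero and large ones are shrunk by a fixed amount: the "best bet" is a deliberate compromise between the data and the prior, not a guess.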
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw