Google Begat the End of the Scientific Method?
TheSauce writes "In a fairly concise one-pager from Chris Anderson, at Wired, the editor posits that all of our current (or now previous) models for collecting data are dead. The content is compelling. It notes that we've entered the Age of the Petabyte — where one can collect immense amounts of data that are paradigm agnostic. It goes on to add a comment from the head of Google's R&D, that we need an update to George Box's maxim: 'All models are wrong, and increasingly you can succeed without them.' Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"
WTF?
English, ---, do you speak it?
> made moot by vast clouds of information Sure, seeing how 90% of websites are door-ways, satellites, and other SEO tricks. Way to go, interwebz.
I saw the article yesterday, but it was so WTFey I just moved on...definitely not Slashdot submission material (especially being a Wired article).
"When information is power, privacy is freedom" - Jah-Wren Ryel
They may lead from one to the other but they are not all the same thing.
"one can collect intense amounts of data that is paradigm agnostic"
No data is paradigm agnostic. You already chose to either collect it or pay attention to it, and neither of those decisions is paradigm-agnostic. Not to mention, the data must be stored in paradigm-laden formats: units and categories that may mean nothing, or everything.
Don't worry, hardly anyone else really understood Kuhn either.
Until cells, molecules, atoms, and subatomic particles start publishing blogs, the scientific method will remain useful.
So everything possible has been researched now and therefore no more research is necessary since it will all be on the internet? Ridiculous!
what?
the current quote "Never frighten a small man -- he'll kill you." seems more relevant
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
kinda predictable if you ask me.....hardware as well as software will always be ever expanding...
I just hope Petaphile never becomes mainstream.
Um, no. Claims like this demonstrate a lack of understanding of what a model is.
From the perspective of physics, the universe is just a massive amount of data--more data than any single human can comprehend at once. But thanks to the models of Newton we have a set of relatively simple equations that describe, generally, the way bodies in the universe interact. The model is not perfect, but it is useful.
Likewise, Google uses a very explicit model to describe the universe of the web: some pages are more relevant to a given search query than others, and these pages will generally be more 'popular' among other important pages. Again, the model is not perfect, but it is useful.
The fallacy is that somehow what Google is doing is a paradigm shift. It's not. It's just applying the same kind of scientific method to a type of data that hadn't existed before.
What, I think, the article is really trying to say is that Google's data is so massive and complex that we can't ascribe any explanation to the results it gives us. First of all, that is false, because the PageRank algorithm in its simplest form does give us a very explicit explanation (popular pages generally return better results). But even if it were true, Newton faced the same kind of accusations when people called his model of the universe 'Godless' and claimed, for example, that he described how gravity works without actually explaining "why" it works like it does. And that accusation is always with science. There are always more questions raised than answered. This is nothing new.
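To make the "explicit model" point concrete, here is a minimal sketch of PageRank in its simplest textbook form: power iteration with a damping factor. The toy link graph, the function name `pagerank`, and the 0.85 damping value are my own illustrative assumptions; this is not a description of what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping page -> rank, with ranks summing to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The page with the most inbound links from other pages ends up ranked highest, which is exactly the "popular pages are more relevant" assumption stated as an equation. That is a model, however much data it is fed.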
"Computers."
Honestly, that's about the gist of the article, and it left me wondering just what the point of it was. Until I remembered the career advice from The Graduate.
This might have been true if all of your data was in the same order of magnitude. But consider things like the hyperfine structure. A petabyte is pretty large, but it is nothing compared to the orders of magnitude needed to randomly sample the entire electromagnetic spectrum that would detect hyperfine levels. When things like physics deal with subjects with over 40 orders of magnitude difference, random sampling isn't going to displace intelligent sampling.
The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart. Perhaps the best part is when he presents, as an example of this new "model-free" approach, a program which includes "simulations of the brain and the nervous system". Uh, hello... a simulation IS a model.
I am definitely a victim of this "Google effect". Search makes me lazy.
For example, for years I would pride myself on my well-tended Windows Start menu. I'd create base categories for my application folders like Hardware, Games, and Internet, and move applications into those folders to keep my Start menu manageable. I blogged about this procedure and included a screenshot.
Now that I'm using Vista I have little need to be so organized. I rarely have to navigate manually to an application folder thanks to the embedded search box on the Start menu. So now my Start menu is a huge clutter, but so what? I see that exercise as futile as dusting the cardboard boxes in the attic.
"It's time to ask: What can science learn from Google?"
Science had nothing to do with founding Google.
For a long time we have had two ways of looking at the world: deterministic and statistical. More data may make for better statistical models or maybe not!
The best example I can think of is weather forecasting. In the 1970s we thought that if we had enough data and powerful enough computers, we could totally predict the weather, nay even the climate. We didn't take butterflies into account.
So, sometimes no matter how much data you have, you're euchred. The scientific method still works in the domain where it works. (and it doesn't work ...) Nothing has changed. Nothing to see here folks. Move along.
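For anyone who wants to see the butterfly problem in a few lines, here is a rough sketch using the logistic map, a standard toy chaotic system (not a weather model). The starting value, the size of the perturbation, and the function name `logistic_divergence` are assumptions picked purely for illustration.

```python
def logistic_divergence(x0=0.2, perturbation=1e-10, steps=100, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) from two nearly equal starts.

    Returns the list of absolute gaps between the two trajectories,
    where gaps[k] is the gap after k iterations.
    """
    a, b = x0, x0 + perturbation
    gaps = [abs(a - b)]
    for _ in range(steps):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        gaps.append(abs(a - b))
    return gaps


gaps = logistic_divergence()
print("gap after 5 steps:", gaps[5])
print("largest gap seen:", max(gaps))
```

The two runs start a hair apart and end up bearing no resemblance to each other. Measuring the starting point more precisely only delays the divergence, which is why piling up more data does not by itself buy an arbitrarily long forecast.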
Searching data is a tool. You still need to have insight to formulate a theory, develop a test for the theory, and ask the data pool the right (non-leading) question. Then evaluate the data looking for both proof and disproof of the theory and be smart and ego neutral enough to let the data suggest a new theory, test and question. Don't confuse a new and useful tool that makes insight easier, with the ability of humans to have that insight.
There are still several computing problems from earlier, smaller eras that haven't been solved by the "more" paradigm. One example is realistic synthetic voice. The bandwidth is megabytes, achieved by mp3 players some years ago. However voice is the last part of the "real world" we have to capture instead of synthesize to implement computer-generated feature movies or video games. This keeps the need for having some "flesh" actors around, at least for a few more years :-)
Then there was Slashdot's retrospective of Artificial Intelligence a few days ago. Many of the interesting advances were made in the kilobyte and megabyte eras. It seems the gigabyte and terabyte eras have barely made a dent in progress.
That an incredible amount of data exists on any given topic does nothing to describe relationships, causality, precision, accuracy, distribution, correlation, or anything else. Data is information, and information must be processed in order to make it meaningful. Additionally, everything that's written, printed, published, etc, is not necessarily true, accurate, precise, etc.
If anything, the Google phenomenon demands more rigorous examination by accepted methods.
The preceding message has been brought to you by Captain Obvious and the letters O,R,L,Y.
I'm pretty sure I was told the opposite in [every stats class ever]
Crunching large amounts of data is useless if you don't sort out which results are meaningless.
Side Note: WTF is up with
I always post using Plain Old Text and hitting enter twice (two line breaks) only displays as one line break.
{p} doesn't create a new paragraph.
{br}{br} is the only thing that shows up correctly for me.
[Fuck Beta]
o0t!
First, not everyone has access to vast clouds of information due to expense and I don't think that's going away any time soon. So we'll still get to understand what's going on around us and not just rely on regression analysis to inform our every decision.
Second, in my experience with large sets of data, you can do all kinds of math to them to bring out interesting relationships but someone with domain expertise is going to have a much better insight into what the data is saying than someone who doesn't. It seems the peak of hubris to think that the techniques taught in every science (social, hard, or otherwise) are worth nothing compared to massive amounts of data. How do you know where to get the data from? How do you apply the data?
I don't think it's quite time to throw out "correlation != causation". In fact, I think now more than ever we need to be able to understand underlying phenomena behind the data precisely because there is so much of it. With so much data, coincidental correlation is going to happen quite often I'm sure.
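A quick sketch of what I mean by coincidental correlation: generate a target and a pile of candidate variables that are all pure noise, then report the strongest correlation found. The sample size, the number of variables, the seed, and the function names are arbitrary assumptions for the demonstration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_samples=20, n_features=1000, seed=0):
    """Largest |correlation| between a noise target and many noise features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is independent noise, unrelated to the target by construction.
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


best = strongest_chance_correlation()
print("strongest correlation among unrelated variables:", round(best, 3))
```

Nothing here is related to anything else, yet searching through enough variables turns up a correlation that would look impressive in isolation. The more columns you throw in, the more of these you get, so the need for a reason behind the pattern goes up with the amount of data, not down.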
And, of course, the ultimate reason we need to understand things is for, you know, when the cloud's not there.
'Every story, if continued long enough, ends in death.' --Ernest Hemingway
Cheers, -m
This is typical web 2.0 hype... more is better. Which, as anybody who has used Wikipedia knows, is utter bullshit. The scientific method can't be supplanted by a large amount of questionable data. Tons and tons of bad data is still bad data. It doesn't get any more correct just because there's more of it.
I don't respond to AC's.
A thought-provoking piece written by someone who neither understands the scientific method nor Google. Who doesn't understand the difference between a Theory and a model. Who still doesn't get correlation!=causation. Who probably has never had to actually analyze any substantial amount of data before. And who has clearly been raised on a self-important intellectual diet consisting of too much Buckminster Fuller, Kurzweil, Frank Tipler, and Derrida. I'm sure there are some kernels of insight buried in there someplace, but I'm just not clear what they are. If his rant is indicative about the future direction of science, we're all doomed.
i\hbar\dot{\psi}=\hat{H}\psi
I thought this was a joke at first. One thing to think about is that the biggest data collector of them all, the Large Hadron Collider, which fits the frame given perfectly - delivering terabytes of data in huge data sets is just the opposite of the described scenario. Models are crucial to actually picking what data is actually recorded. In fact a large part of how good the LHC data will be will be in using models to select what events to capture. The way the data is captured is of course also based on long effort and knowledge from previous detectors. This isn't just randomly, or even generically selectively gathering data and then analyzing it. This is targeted data gathering based on complex scientific theories. There have been shouting matches at what to tag for collection based on what people think is important for a given theory - and these will happen again.
As our collection abilities rise exponentially, the storage and analysis abilities are not exponentially growing, even though they are increasing at a fast rate! I would argue exactly the opposite of what this article said. We are going to be more and more dependent on our current scientific theories to even be able to choose appropriately the rich data that new sensors and techniques will let us collect. That is, we are more and more dependent on our scientific theories when we get data, not less. Did we even know to get methylation data when sequencing a genome? How about some other "ylation"? Without background theory and experience we wouldn't even know some of that stuff was there to collect!
This is nonsense pure and simple.
One needs to acquire facts. Now these "facts" can come from your own research or, in the age of the internet, someone else's data, but they still need to be collected and verified.
The *only* advantage that Google provides is a more efficient way of sharing and finding facts. Not even all facts; those that are popular and topical are what you'll most likely find.
Historical information, from when newspapers only used dead trees, can be very difficult to find on the internet unless someone else did the research first.
Vast clouds of information used without intelligence are just garbage going nowhere. You can't even call it Garbage In Garbage Out, because it's not being processed by any kind of mind at all.
What could possibly go wrong?
To avoid the same fate as the GP, let me clarify that by WTFey I specifically meant that the article was full of fluff, light on details and generally pointless...which makes me think "WTF." The closest thing to a point I could get from the article was "Nice big blobs of data can be useful, and statistical data based on said blobs could replace the results of scientific research." Mmmkay.
A sensational headline leading to a rather pointless article consisting mostly of fluff: WTF.
"When information is power, privacy is freedom" - Jah-Wren Ryel
This is grade school stuff. Correlation is not causation.
Which means if you're approaching a region you haven't sampled, then you can't understand what's going to happen because you've thrown away your interest in 'why [something] does what it does, [because] it just does it.'
If you're only using models as correlations or proxies, what are you using models for anyways? There's nothing 'increasingly' true about that.
=======
Science -- Sealed, Delivered.
A scientist doing an experiment still relies on the scientific method to collect his own data to see if they support his hypothes[ie]s. I really don't see anyone publishing a paper and saying "Dudes! I used Google to find my data points!" How the hell is Google going to stop people from doing experiments and finding their own data?
This article is complete crap. I don't think this person even understands what the "Scientific Method" means.
Vivin Suresh Paliath
http://vivin.net
I like
For example, Craig Venter may have tons of genes that look like something that can make gasoline from grass, but you still need to test each one the old-fashioned way, with careful application of theory and experiment, to see if it works before you start using it.
Sidney Brenner (legendary Biologist and Nobel laureate) calls these methods "low-input, high-throughput, no-output biology." http://www.mc.vanderbilt.edu/reporter/index.html?ID=5027
For example, to detect stress you might traditionally measure heartbeat, skin conductivity, pupil dilation.
In the "petabyte age" you throw in the number of times the subject uses the letter 's'; how frequently they use the 'reload' button on the browser; what colour of pants they wore last tuesday; Pepsi vs. coca cola; the number of times they picked their nose in 1997 and any and every other bit of data you have on the subject.
In the "petabyte age", most of the data you sift through will show no correlation, but you have a much better chance of finding the unexpected if indeed, there is some unknown factor out there.
There's lies, damn lies, and statistics. Now "Clouds" of information.
Feh.
Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?
No. And also no to the basic premise of the article.
Meteorologists have been doing this for decades (principal component analysis has been a crucial tool there since the 1960's, and correlation analysis has been used in some form since the 1920's if not earlier) and so have the astronomers. Oh, and the particle physicists have been sifting data in their own way on a big scale ever since World War II.
As one of many examples, if you ever have heard of an "El Nino event," that was discovered through correlation analysis and is best understood through principal component analysis. BTW, the original work predates electronic computers and was all done by hand. The vast quantities of meteorological data require statistical analysis to make any progress at all, but that certainly does not mean that you cannot use the scientific method.
So, no, this does not invalidate the scientific method. In the Internet jargon, science scales.
Anyone who has read any work by Lyotard, Baudrillard, or Derrida has seen this interpretation of reality coming for years. This is basically the consequence of the Post-modernist/Post-structuralist mentality.
In a sense, what the article is proposing is the "simulation" of reality in a computer system based on the available "data". This simulation, as I will suppose in a moment, is merely a flawed model, since the data being related must in some sense be based on an algorithm which inherently MIMICS reality and is not a substitution for it (no matter how "accurate" the agreement). But nonetheless, the result of this, as Baudrillard observed, is not a simulation but a simulacrum of reality, and eventually it will take the place of reality. The implication is that reality is not created or manufactured by the interaction of people in a "real" sense but is actually led by the operation of the simulacrum!
Nonetheless, the fact is there is no possible way to store ALL the data of the entire world (since some data is not recordable by a binary machine, and no, a "quantum" computer is not the solution). The problem is that this fact does not mean we cannot be misled by the simulacrum and led into a future where human interaction is what I would call inhuman, but which some who have (in some cases unknowingly) fallen for the post-modern myth would call merely an evolutionary result of human interaction.
In the future the storage of data, the usage of data, and the power of data will have a huge impact on our humanity, as the past twenty years should already be evidence of. I am not an apocalyptic fear-monger, but the proof is in the pudding. For further reading, I recommend a highly prescient book written in 1990 by a Mr. Mark Poster called The Mode of Information, which talks about some of these implications, which are in the process of becoming as we speak.
...and it should be known by now
The idea in the article is interesting, but I personally feel it's totally bogus. Yes, crunching data with mathematical formulas can help extract something useful, but...
Strictly speaking, isn't a mathematical formula a model? All of the theories (models) we use in materials science to explain things (quantum mechanics, stress/strain relations of materials, etc) are all mathematical. Qualitative understanding doesn't give you a numerical prediction.
Perhaps the above is a bit of a logical flaw, but you still need the maths to get information out of all the data. You need to know what to look for and make the necessary algorithm (low-level model?). AFTERWARDS, though, you need to understand that data. Otherwise, you have not done much to advance your understanding. I did RTFA, and the person mentioned who "discovered a new species" but doesn't know anything about it...neat. What, really, has he done? Just thrown out some meta-data for someone else to analyze, model, and study. Google is not the end of scientific method. To the contrary, I think it will only help.
.. but found this information on wikipedia's current events page: "The US state of Florida purchases a large dildo to add to the pleasure of the senate about their lands in the Everglades."
I firmly believe in paradigm. Call me what you will: "paradigm freek", "irrational", "stupid", etc...
Nothing will shake my faith in data or the paradigm. My faith has given me peace and happiness. I just hope you agnostics and paradigm atheists respect my beliefs and I'll respect yours.
Thank you and peace.
It means there's about to be an explosion in models and theoretical sciences. Always beware the End of History ;)
Don't blame me, I voted for Baltar.
Here is some 'compelling' comment. Lots of very important things require the scientific method.
Obvious ones are medicines and housing materials.
Important ones are accurate global warming models and electric battery efficiency tests.
This Wired jerkoff and his band of know-little acolytes think that because they can accomplish everything in _their_ day without science, that it will die out.
This myopic self centredness would not have yielded them a clear signal on their iPhone. Science did that.
I'm concerned that placing too much trust in such a model-less paradigm is dangerous. Why? There could be several reasons. Data can be artificially manipulated, for example. This can cause us to draw erroneous conclusions, and consequently, to make poor decisions. I would still want to know "why".
Proverbs 21:19
What is not useful about that?
We agree.
Restated:
The information quality of data isn't implied by large amounts of it. Correlation (read petabytes of foo) != causation.
---- Teach Peace. It's Cheaper Than War.
"Intense" amounts of data?
WTF is an "intense" amount?
And since most slashdot readers don't RTFA most comments here have proven useless in trying to figure what those kernels you mention are.
But this guy, who has read TFA (and commented on it on the Wired's site) seems to have found them.
Mit der Dummheit kämpfen Götter selbst vergebens
Fighter classes generally stop at Con, whereas Casters generally go for Int or Wis. No one cares about Cha.
The Kruger Dunning explains most post on
Just finished rereading the Foundation series for the one millionth time. Anyone remember some of the signs of the decay of the first empire? The idea that these "scientists" were no longer experimenting, no longer looking for new ways to do things - just spending their time looking at old books and old experiments and trying to squeeze a "new" thought or two out of them? That a sociologist would study a society through books written about it? An archeologist would explore the ruins of a world by reading descriptions written by someone centuries before?
Anyway catching the parallels here? The "search engine" is a great tool for gathering existing data - but our current tools help us:
1. Analyze that data
2. Gather more data
Can you honestly say that those aren't important anymore? The summary seems pretty crazy to me.
Even though it's a bit exuberant, as some techies are,
the interesting point is that new generations of a "thousand" should give you new ways of looking at scientific problems. The claim it "ends science" is just a red-herring to get you to think about the issue.
Uh, hello... a simulation IS a model.
Simulation is a copy. A model involves a shortcut.
There is a confusion here between technology, which indeed is taking the peta-brute-forcing route, and Science whose beauty precisely resides in its computational economy.
A perfect copy of a brain would be an engineering feat without providing any understanding of why it works.
The root of all this is a pitch for Enterprise Search. I can't tell you how much more productive I am now that I can search all my email with Google Desktop (or whatever application anyone thinks is better.)
I am getting in screaming matches with my boss because management wants me to "moderate" all our 10 different corporate "portals", each of which has been created because some pissant minor manager didn't like the way the 9 other pissant managers were moderating their portals. Fuck that, the corporate intranet is a big pile of data, the tools exist for users to search it themselves, and I can do more interesting things than argue with users over the difference between "its" and "it's".
is getting out of control.
What's in a sig?
You know, this may be the most pure Wired article I've read in a long time. Reminds me of the magazine's layout when it first came out. Complete bull, unreadable, unstructured, but slick.
What Google has done is represent the world, mathematically, as it existed a moment ago. This is a massively impressive feat which we slashdotters don't give enough credit. On the other hand, I still say, "meh".
Think of it this way, Google has created a "model" world. I'm not thinking "model" in the scientific terms but "model" as in the Gundam robots with snap-tite (tm) parts. Instead of plastic bits, this model Earth is built with data. It's pretty to look at and has a lot of great details. It's a darned good-looking likeness of our world. (And no, I don't mean Google Earth, either.)
But it can't predict things like a scientific model.
One still needs the scientific model and hypothesis testing to make predictions and see what our world will be like in the future. This, in turn, also helps explain how or why things came before. The Google model just shows what currently is.
After all, you never know when you need to prove that a supposed "piece of the cross" is in fact from the corner of some jerk's 20th century pine desk.
For the most complex 'stuff' there is almost always the most simple explanation. Although E=mc^2 is not entirely correct, you do know what I mean...
What the author of TFA basically is whining about is that everything is supposed to be too hard for them to find an explanation/solution for. So instead of doing the hard work they are just doing statistical analysis because it's much easier for them to do.
He is basically almost saying that we should be starting to believe in god because science (in his view, not mine) is crap the way it is today.
I call ThisFA a piece of crap
Here be signatures
Damn, that's my problem, I thought it was all tubes!
Google used reams of data to get good at advertising and marketing, so Wired is using this ability to predict the end of SCIENCE?
Do they not realize the difference between these things? Advertising is extremely hand wavy and vague in the best of circumstances - I would argue that Google's offerings aren't really better than any other method, they're just cheaper for advertisers, and have a much larger base than normal.
I'm honestly astounded at this.
I have to admit that the article is thought provoking, and it might actually fulfill an educational purpose in a science or philosophy class. It can be given to students for criticizing, and in that way be used as a motivation for a discussion about how science can benefit from new technologies. The article is wrong basically because it is confusing Science with something else, that could in the best case be called engineering. Science is about understanding, and we use data and models to iteratively refine our understanding. Neither the model nor the data, by themselves, are the science. More importantly, as other people have pointed out, there is no understanding embedded in a correlation, or in a collection of butterflies for that matter (even if each butterfly is given a pompous "scientific" name).
If you're just data mining, you have to assume that the data is valid and not someone's scientific bias. That's a big assumption. There has to be some objective methodology to test claims and results; if not, then there are no checks and balances and we are reduced to "crack-pottery".
"Paradigm agnostic" is just another word for "Who the hell knows?"
Actuarial data is about history. *Theories* describe behaviors in the past AND the future.
Both are useful, but you shouldn't mistake one for the other, or be misled into thinking you can do without thinking (i.e. developing a theory).
It seems to basically be advocating:
* get lots of data
* use computers to search for patterns in that data
* use those patterns
instead of thinking up models and testing them.
But don't you have to have some model to define the parameters of the data you are collecting? I'm not sure there is anything new here..... Isn't it just data mining?
Until cells, molecules, atoms, and subatomic particles start publishing blogs...
Just what we need... quark twitters.
When our name is on the back of your car, we're behind you all the way!
1) Google doesn't link to much useful scientific data. There are other databases that predate Google that did and still do, but Google isn't a very useful tool for data collection, so it shouldn't be in the discussion.
2) For about a century, all funded scientists have had pretty good access to all publicly released data in their fields and could rather efficiently sort through it due to good hierarchical organization. Nothing has changed about that other than speeding up the process to some extent using search (Again, not using Google. Their one foray into academic search is so far not useful for people who have funding and thus access to much more advanced and complete tools), but people who are really involved with their field of study already knew where to look, so the search is mostly a benefit to the undergraduate paper-writer.
3) The way that "paradigm" affects data has not remotely been related to which data sets people choose to examine in well run science. The paradigm affects what data is collected (which is what real data-driven scientists do), and the "Petabyte Age" has no effect on the efficiency of data collection.
The author is a massive failure.
"I zero-index my hamsters" - Willtor (147206)
One must already have a concept about what is measurable, what to measure, and how to measure it before data can be collected - thus the data collected already assumes a paradigm. Healthy humans start out with a number of measurement tools (senses) together with evolved modeling abilities (e.g., language proficiency) and refine their models. Where there is no innate model, mathematics, followed by the scientific method, appear to be the most reliable means of refinement. The innate models are likely reliable due to such reliability having an evolutionary advantage. Such models are nonetheless subject to improvement by math and science. The Wired article fails to appreciate the amount of data humans successfully model innately - petabytes don't even begin - and the amplification through scientific models expands that by many orders of magnitude.
It seems like the author is using the word "model" to mean "The structural design of a complex system," rather than "A simplified representation (usually mathematical) used to explain the workings of a real world system or event." What he's attempting to say is that mathematical models (what he refers to as not-models) are all we need, and experiments are useless because we only need to know about correlation and not about causation. This is phenomenally idiotic, and assumes that "causation" is some Platonic, mystical idea of underlying truth rather than simply correlation + time. Once you get past his complete ignorance of the fundamental terms he's using to build his argument, he's saying that science can be done better with big computers and lots of data. Thank you for that fabulous insight. Alert the internet!
This has to do with a subject brought up in the article, this "new science".
I got into an argument with a friend over some content this article happens to contain. He said that, in fact, environment can influence inherited genes. Epigenetics and such. My argument was that if a parent is a marathon runner, for instance, his son might be prone to doing that simply by being brought up around that sort of thing, rather than having the son be a good runner because the father (or more likely the mother) did that.
He ranted and fumed that his teachers at school were right, and I asked if we could have a rational discussion on the subject, assuming him to be completely off his rocker. Another friend of ours also backed me up, saying he did not believe that environment could have an effect on inherited genes in that manner. We asked him how these traits were passed on, and he had no answer, just anger and frustration.
What is the skinny on this "new biology"? Can someone here please enlighten me? I feel ignorant here and wouldn't mind some information to sort things out a bit. Is this stuff proven, or just more hypothesizing?
Thanks!
-
Google is at NASA? Just do me a favor and keep them out of the engineering building. Limit Google to the library, where they can manage vast quantities of data using the Dewey Decimal System.
An unknowable paradigm? Interesting.
I am very small, utmostly microscopic.
...it's a Wired article. What more could there possibly be to discuss?
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists?
All I want to know is this: how much did Google pay for this article?
Chris Anderson may not be an expert and he seems to equate theory with the partly dated Laplacian concepts. I only partly agree: not even close to the end of science or theory, but perhaps a slight shift in focus, towards models that are more statistical in nature. Fundamentals remain the same, though, as far as I can tell.
<shameless plug>
http://www.bitquill.net/ post
</shameless plug>
I'm guessing that at some point Chris Anderson will say that he wasn't trying to make a serious claim in the article, he was just trying to "stimulate discussion about the changing realities of scientific investigation" or some damn thing.
Microsoft tried vast clouds of information and strictly applied maths when they made their own video ASIC for the XBOX 360.
Their vast clouds of information and strictly applied maths still did not encompass knowledge of the future, or of people's previously overcome, practically corrected mistakes.
Thus, the Red Ring of Death.
Models mean one need not re-invent the wheel for applied sciences.
We all know that young, inexperienced workers are often more creative than much older ones -- at least in very hard fields. For example in mathematics, it is the young who do most of the innovating. There may be a physical aspect to this, as the brain becomes less neurally flexible, but imagine for a moment. What if part of it may be that creativity is bound up with the struggle to expand one's mental horizons? What if once sufficiently expanded, consistency with the known territory becomes a constraint on finding new territory?
Now, instead of an individual, consider a society as a whole. What happens when the volume of knowledge of society grows to a point where the marginal value of exploring the known exceeds the marginal value of extending it? Specifically, I am thinking of an example. In the late twentieth century, people began to think much more about exploring the interdisciplinary topics. They began to think about questions like, does chemical engineering have applications to the design of cancer drugs?
I don't think science is in any danger of becoming less dynamic, because this is really just another frontier. The connections between the pieces of knowledge we already have are a very productive place to explore. Given that more scientists are alive and working today than ever, there is plenty of old school and interdisciplinary work to go around. The demands of publication will make use of all the individual scientific creativity the world can supply.
Where we are in danger of a kind of societal senility is in political and ethical thought.
The Internet (along with other things) makes us stupider in these areas than ever, because personal progress in them depends on setting out to confirm what you already know, but failing. You can hate Jews, but if you have to live with them, work with them, and interact with them that hatred is going to be strained. The same goes for any group. You may hate liberals or conservatives, but in a pre-Internet environment you had to accommodate those viewpoints in your mental landscape.
But it feels so much nicer to get a pat on the back than a stick in the eye.
Yes, we have never as individuals had as much access to dissenting views, thanks to the Internet. But we've never had as much access to people who think just the way we do. Thanks to Internet search engine technology, it is now possible, as never before, to spend every waking moment confirming our own preconceptions, whether that is anti-semitism, ultra-right wing politics, socialism, religious fundamentalism, or whatever ism you can imagine. Rather than confronting the information that challenges our pet conspiracy theory, we can burrow into a comfortable sub-world where our opinions and beliefs are nearly always ratified.
The dynamism of how we see the world of affairs is what is in danger of being lost, because it has never been easier to be intellectually lazy. There are more opportunities to expand our horizons than ever, but along with that comes greater convenience in choosing the same thing, over and over. There may be more kinds of vegetables in the supermarket than our grandparents could name, but it's sooo easy to eat at McDonald's every day.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
You can't get good data unless you control the makeup of your data population. Even if you applied this technique to all the data in the cloud, it wouldn't mean the "end of the scientific method", it would be scientifically studying the cloud.
So no. Even if everything he wrote is all true, you still apply science to study things, just in a different way. The internet doesn't make science obsolete any more than it made economics obsolete, and saying otherwise is as much hubris now as it was then.
It seems rather stupid to me. Sure, we can correlate a whole bunch of data. And we can collect a whole bunch of data. But that's not going to give us the predictive power that scientific models give us.
Take, for example, the orbit of the earth around the sun. Suppose we collected a whole bunch of data on the orbit of the earth around the sun. Sure, we'd be able to predict what the orbit is going to be, based on past data. But it gives us no other insight. Whereas, when we use the theory of gravity (and rotational motion and conservation of angular momentum, etc.) to predict that the earth orbits the sun, and how it does so, that gives us insight.
Because we can now turn to, say, Jupiter and the sun. Even if there is no data collected on how Jupiter orbits the sun, we can use the predictive power of our theories, that we have tested on the earth-sun system, to say how Jupiter is going to orbit.
That's a simple example, but you can imagine much more complicated situations. If we simply have correlation, we may be able to say that X is going to do Y based on previous behavior, but if I ask you how something new and unexpected is going to behave, we can get no answer until we take data . . . because we don't know *why* anything happens. And that's why we're never going to replace theories with statistical analysis of data.
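To make the Earth-to-Jupiter point concrete, here is a minimal sketch, not anyone's actual method. It assumes Kepler's third law in solar units (period in years equals semi-major axis in AU raised to the 3/2 power), which follows from the theory of gravity, and it assumes Jupiter's semi-major axis is roughly 5.2 AU. The function names and the tiny "data set" are made up for illustration.

```python
# Kepler's third law for bodies orbiting the Sun: T^2 = a^3,
# with T in Earth years and a in astronomical units (AU).
# The units are calibrated on Earth alone (a = 1 AU, T = 1 year).

def orbital_period_years(semi_major_axis_au: float) -> float:
    """Predict an orbital period from theory, with no data on the planet itself."""
    return semi_major_axis_au ** 1.5


def lookup_period_years(observations: dict, axis_au: float):
    """A purely data-driven 'predictor': it can only repeat what it has seen."""
    return observations.get(axis_au)  # None for any orbit never observed


# Hypothetical data set: the only orbit we have ever measured is Earth's.
earth_only_data = {1.0: 1.0}

# Assumption: Jupiter's semi-major axis is roughly 5.2 AU.
JUPITER_AXIS_AU = 5.2

if __name__ == "__main__":
    # The lookup has nothing to say about an unobserved orbit.
    print("data only:", lookup_period_years(earth_only_data, JUPITER_AXIS_AU))
    # The theory gives roughly 11.86 years, close to Jupiter's observed period.
    print("theory   :", round(orbital_period_years(JUPITER_AXIS_AU), 2))
```

A fancier statistical fit would not rescue the data-only approach here: with a single observed orbit there is nothing to extrapolate from, while the theory transfers to the new planet unchanged.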
There's a place for both. Obviously, just statistics can be very successful (Google, for example), but, at least in science, it's not sufficient.
This article tells us just how much Google is an evil corporation.
These singularity flaks are at it again. C'mon, it's a not-so-pleasant sci-fi concept and nothing more.
Humans transcending into bodiless personalities floating in a data cloud unbounded by time and space? Humans becoming godlike, yet keeping their innate lust?
Oh, gimme a break! When they turn off the switch it all goes black and there is nothing more. Without the scientific method, how do you tell a petabyte of junk data from a petabyte of shinola?
The idea of skipping models is by itself quite interesting. The scientific method comes in because it tries to generalize based on observations. We need generalizations or models since we cannot capture all the data we need. The article wants to say that if we have enough data about a given phenomenon, we no longer need to simplify reality using a model. We can instead just use the facts as they are in reality. Models come at the cost of simplifying assumptions, which make them, according to the infamous Gödel Incompleteness Theorem, paradoxical by nature. Another thing is that models are usually in continuous form. The reason is that we humans cannot capture an infinitely discrete system. So, models serve to help understand things without the need for capturing the whole data of the phenomenon in mind. This might not be needed in the future if computers can take care of storing enough data for the given phenomenon. So, I think the idea of skipping models might be realistic at some point in the future.
It's time to ask: What can science learn from Google?
This Wired editor guy is clueless.
People in bioinformatics and computational biology have been saying these things now for well over a decade (at least). It's not about Google. It's not about Craig Venter. What a fucking shame for Wired to have such a pop-science fanboy for editor.
How mainstream and behind the curve Slashdot and Wired have become...
Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
I think the main issue (and I'll pre-emptively say that, no, I didn't RTFA...I'm at work, what do you want from me) is exactly what is mis-stated. Our issue is with how we collect data, not how we search it.
Waaaay back when XML was being formed from SGML, one of the big deals was that, one day (in a perfect world) we would be able to put all of our information in a giant database and be able to meta sort it to such a degree that finding a needle in the proverbial haystack would be a non-issue. One of the problems with how we collect data now is that there's no structure or form to the data. It's raw in the rawest sense.
We're human and, thus, we're concerned with how information is visually presented, not how information is stored. That will be the next great stepping stone. In theory, searching mechanisms should be exactly that, simple and mechanical in nature. With our ever growing storage ability, we should spend the time tagging our data as it's created and dealing with a database of information twice the size as what we would otherwise have, not spending our time coming up with more and more complex search tools.
Now that information "tags" are mainstream, I think we're finally on the verge of mainstream society understanding the importance of data structuring. It's also our next stepping stone to creating the semantic web. Spend money on semantics research, not on beating a dead horse.
Wise men say, "Forgiveness is divine, but never pay full price for late pizza."
Ah, but...
Googling "less is better" returns 138,000 results, while googling "less is worse" returns only 5,300 results. But since less, as is commonly acknowledged, is in fact more, we must conclude that worse is better, which is absurd. Therefore, Google is not google, insects are evil thoughts, and burning Sappho loved and sang and stroked the wine-dark sea, in the temple by the moonlight, wa da doo dah. Q.E.D.
Shop as usual. And avoid panic buying.
Astrology has been alive and well for millennia. Using the vastness of data from our own galaxy in order to attempt to predict similar patterns in human beings and more.
So google has a big dataset, whooptydo. Is this article suggesting that we add a new pseudo-science of internet data set movement a-la astrology? I hope not.
I can see predictions now: 'When the MS dataset eclipses the SUN dataset, the age of JAVA will be over.'
I have a feeling most people on here aren't really understanding the article. Some are, and are simply disagreeing with it (of which I would be one - hopefully), but I do not feel that is the majority. The article is recommending a totally empiricist approach to dealing with data. His point is that as we try to make a 'law' of some physical phenomenon, we are only limiting our knowledge, as we will continue to only examine data within the confines of this law, whether that includes supporting or detracting information.
For an example, consider Einstein's Special Relativity: it addressed the universal speed of light while also dealing with relative notions of motion. Einstein connected the dots by ignoring the apparent discrepancies between Galileo's principle of relativity (a model) and the empirically determined speed of light; by removing the model, he found an answer. At the time, scientists were convinced that there must have been some error in their measuring devices in order to account for the absolute speed of light through the ether.
While even SR is a model, I think the author would be recommending post-hoc analysis for 'models' though, rather than models that intend to predict.
There are as many cells in the brain as there are stars in the galaxy (roughly, 100 billion). There are around 100,000 receptors on each neuron, most from different cells, making the brain so interconnected that nothing is ever more than the famed "six degrees of separation" from anything else, the average being 3. There are dozens of neurotransmitters involved in intercellular communication and even more neuromodulators within the cells to handle processing. There is also indirect electrical communication between neurons where their overlapping dendritic trees detect the electrical fields of nearby but non-connected cells, and processing depends on those too. All these processes are interdependent -- virtually none of them happen without affecting, and being affected by, the others. The numbers, amounts and even fractional dimensions of the interactions' variables involved are just one of the fields that spawned experimental mathematics, a field that even mathematicians have trouble wrapping their heads around. That's enough for the purpose of my conclusion, though I could go on.
This is the subject matter of my profession. I have no problem dealing with petabytes of data because I know what to look for. Nobody collects that kind of data, throws it in a pile and stares at it. We collect that much because we know that's how much we need and know how to find our answers in it. Just because some places (there have been many of them for years) let you throw your raw petabytes into a pile doesn't mean science itself changes. Anyone who tries to data mine petabytes without being intimately involved in the collection might as well start at any random place on earth and start mining for gold.
So, Chris Anderson can peta-bite my ass. Science survived a far more revolutionary fundamental change than this and did quite well since. That invention was the zero. It will always be more important than any order-of-magnitude milestone.
Ask a scientist that uses that much data if there's anything significant about it, leave science writing to them and to the decent science writers (like Alan Boyle), and keep the editors busy at their desks editing. If the editors are not so busy that they can waste time making up this pseudo-FUD out of barely understood buzzphrases, it's time to trim the budget at Wired.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
I must admit, as an applied mathematician who makes models of physical things for a living, this sort of research threatens to steal my bread and butter. It may be self-centered, but I think modeling is, beside experiment, half of science.
Simplified models are so valuable to our understanding because they tell us what information we can remove, which parts of a problem are important and which parts may be ignored. They allow us to not just make predictions, but they guide future experimentalists as to what sorts of changes will impact the system and which won't.
To be fair, it's more of a cycle: experiments generate data, models are constructed to explain the data. These models make predictions (and hopefully useful simplifications) that can be checked by further experiments to validate them. At the end of the process, we've produced a clearer picture of how a system works. Enough information maybe for someone building something slightly different to not have to test the aspects covered by the model.
I view these data-mining techniques like the scientific computing techniques of the last 30 years or so, only the inverse. Sci Comp nerds wanted to do away with experiments. They thought they could numerically simulate (relatively) exact models (like Navier-Stokes for fluid motion rather than one of its more tractable, understandable simplifications) and use the generated data instead of experimental data. The trouble was that no one will believe that the crazy new phenomenon discovered by your program is real until they see it in the lab, until they construct a simplified model that has the same behavior -- i.e. the same science as before.
The new data-mining idea is the same, but for the modeling end of things. "No models, please," they say. They'll just data-mine the experimental results and "discover" whatever the model missed. Except people will want to do experiments to verify the discovery. They'll want to build models so they can know they're doing the right experiments, and so on.
At the end, I think Sci Comp and data-mining are fantastic new tools that have a lot to offer science, but I don't think either eliminates the need for old fashioned modeling.
Use the Firehose to mod down Second Life stories!
This article felt like reading the numerous articles over the years about the imminent arrival of Strong AI written by people whose primary understanding of computer science comes from watching sci-fi movies. Now though the Strong AI god has been replaced by Google and the followers claim the Strong AI messiah is already here. Unfortunately for them they are mistaking mere information for knowledge and understanding.
I think not.
Maybe, though, it is the end of the line for heuristic strategies for deciding which data to collect.
you, sir, are the wind beneath my wings.
Need Geek Rock? Try The Franchise!
Comment removed based on user account deletion
Ok, for you nitwits that have less than a 7th grade level of reading comprehension, or the inability to focus for more than two paragraphs, here is the summary of the article you might be able to grasp: The author is saying we are approaching the point where we can use raw computational power to mine enormous amounts of data for answers to increasingly complex questions. We can use this instead of the scientific method, because in the end, they produce the same thing: an approximation. The author then cites examples where this process was applied with degrees of success (e.g., Google searches).
.. how I changed my photography method when I got a digital camera.
Now, instead of carefully considering the composition and setting of some subject so that I don't waste precious film, I can just take eleventy-nine indiscriminate snapshots, and hope that one of 'em is decent.
Correlation testing is hypothesis testing. The statement "X and Y are related" is a hypothesis and the test is statistics. Simple models are still useful for humans.
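The claim that a correlation is itself a tested hypothesis can be sketched as a permutation test. This is only an illustration under stated assumptions: the data are made up, the function names are hypothetical, and the null hypothesis is "X and Y are unrelated", which we simulate by shuffling Y.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided p-value for the null hypothesis 'X and Y are unrelated'."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between X and Y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Made-up data: ys is roughly 2 * x plus a small deterministic wobble.
xs = list(range(20))
ys = [2 * x + ((x * 7) % 5) for x in xs]

if __name__ == "__main__":
    print("r =", pearson_r(xs, ys))
    print("p =", permutation_p_value(xs, ys))
```

Even here a human chose the hypothesis, the test statistic, and which two columns to compare, which is the point of the comment above.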
You're driving your car down the street in Paris and then suddenly in LA Jason's brakes fail at the same time someone is driving a new Tata made car in India and the person who is driving a black car just north of Buckingham Palace turns left and a green thing suddenly appears then you decide to buy some petrol and booom it's all true and then the guy in LA slows down using his hand brake and and and ...
is not correlation!=causation. The problem is collecting the data. Without insight, we will never be able to collect meaningful information. Any statistical method relies on data collected by people who knew what they were measuring.
Trust me, I work for the government.
Let me just say that the scientific method is alive and well, and in none of the peer-reviewed journals I've had material published in, do they take "I Googled it" to be valid.
that the author of TFA, Chris Anderson, is the editor in chief of Wired?
...the future crusty old bastards are already drinking the Kool-Aid.
I anticipated logging into Slashdot expecting an educated philosophical discourse and instead just read these rants.
I am going to go read some mature comments on Digg
/sarcasm...especially the last sentence
"Excuse me, but 'proactive' and 'paradigm'? Aren't those just buzzwords that dumb people use to sound important? Not that I'm accusing you of anything like that... I'm fired aren't I?"
I googled this and found 187,003 entries. Picked 5 of them, and found out that this really isn't a problem.
Too many goofy pundits in the petabyte age.
I've read most of your posts, and it seems that my approach is quite different, mostly focused on what the article says about Google's fundamental method, alias PageRank.
If Craig Venter's sequencing of the genome revealed that environment can heavily influence inheritable genetic traits, this influence still has to exist and to be found in the environment.
Google is dominating the Internet's environment, which happens to be presently one of the dominant environments (or contexts, at least) in which human beings develop relationships and communicate.
What is said, in this article, to be Google's philosophy, 'we don't know why this page is better than that one...', is actually Google's declared or confessed philosophy.
Absolutely opposed to Google's pretended ignorance (or maybe some gap in my knowledge of Klingon), the success of that company would rather rely fundamentally on the analysis of hyperlinks, as a sign of relationship and hierarchy between sites, then between communities of Internet users, then between social groups in a human environment. Google understood that hyperlinks were an unambiguous, mechanically exploitable expression of socially determined human relationships. So it is absolutely wrong to say that Google requires "No ... semantic analysis". To work, it does require it, yes, it absolutely requires semantic analysis, and above all semantic analysis mixed with social analysis based on hyperlink observation.
Google does not know, indeed, if this page is better than that one. And this should precisely explain its success. It mainly and simply understood that humans, who consciously make meaningful hyperlinks between webpages, know better than Google ever will which are the best pages. Google in particular made a brilliant bet on that human knowledge, and speculated heavily on it, as its foundation to provide some quality, instead of spending time and power calculating randomly numeric relationships between words and text elements as Altavista did, in vain.
As regards the term 'raw data' mentioned in the article, everyone will agree that hyperlinks have nothing to do with raw, undetermined data. On the contrary: one doesn't insert a hyperlink tag on a webpage, pointing to another webpage, unconsciously, which is not the case for many of the words we use, which always have many ways of being interpreted, including meanings not intended (heard about lapsus?). Hyperlinks are definitely pre-structured data.
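The idea that hyperlinks are pre-structured votes can be illustrated with a simplified, textbook-style PageRank computed by power iteration. To be clear about assumptions: this is not Google's production system, the damping factor of 0.85 is just the commonly quoted value, and the four-page `toy_web` graph is invented for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

The algorithm never looks at page content. All of its "knowledge" about which page is best comes from links that people chose to create, so "c" ends up on top simply because most pages point to it.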
Google is not a method to find truth, indeed; it's not needed, since humans are much better at filling Google with their truths... and beliefs! Google provides a method, which is as servile as brilliant and acute, to reproduce and certainly worsen, through ranking, relationships of power just as they already exist and are inherited in real human societies. Google pretends that it just devours our relationships between us to rank us and spits out the (our) truth.
Google owns the Internet, ok, but human social reproduction is still what determines the Internet. Google knows that; that's why it wants to control the Internet environment, to know more about human determination, and maybe how to influence it. This is definitely more a political than a scientific method. It would be better, instead, to use the already old expression "realpolitik"; it is well-proven that it can be pretty useful, oh yes.
What I felt about this article, is some irony, in particular when the author says that Google can translate perfectly from Klingon to English. Don't you?
Google does use a model; not some whimsical nothingness. It's a pattern matching algorithm that also incorporates the page ranking (which, from my guess, is a form of validation on the model and part of the model; hence correlation). Which means the data Google has is well organized. Perhaps the data is not in a form that we would understand or can use but in a form that the programs can access quickly and easily use, otherwise people would not use Google. Or another way of saying it is that computer programmers may know little to nothing about the source of the data, but they do know something about how the data can be related and used to create their programs to make Google efficient. They are not blind when writing the code and just hoping that their stuff will work; though some coders don't always know the big picture, someone normally does. I think the author of this article is just overwhelmed and is panicking; it's sad to see it posted.
TFA is a take on Feyerabend's notion that science doesn't - and shouldn't - use any kind of method or methodology to do research:
http://www.marxists.org/reference/subject/philosophy/works/ge/feyerabe.htm
Feyerabend completely rejected the idea that scientists actually use the scientific method and essentially stated that all scientific theories are baseless, ad-hoc explanations of raw data that will be thrown out completely when new data incompatible with those theories comes along. Feyerabend is really popular with some 'philosophers of science' who'd like to see scientists taken down a peg or two, but he isn't really taken seriously by anyone else. Hell, he even went so far as to say that astrology was as valid as science:
Feyerabend described science as being essentially anarchistic, obsessed with its own mythology, and as making claims to truth well beyond its actual capacity. He was especially indignant about the condescending attitudes of many scientists towards alternative traditions. For example, he thought that negative opinions about astrology and the effectivity of rain dances were not justified by scientific research, and dismissed the predominantly negative attitudes of scientists towards such phenomena as elitist or racist. In his opinion, science has become a repressing ideology, even though it arguably started as a liberating movement. Feyerabend thought that a pluralistic society should be protected from being influenced too much by science, just as it is protected from other ideologies.
http://en.wikipedia.org/wiki/Paul_Feyerabend#Role_of_science_in_society
The idea put forward in TFA that we could eventually dispense with models completely and just interpret data has its roots in this type of radical thought. The real crux of TFA is this:
Learning to use a "computer" of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
Anyone familiar with the fallacy of equating correlation and causation can immediately see one glaring problem with the "Correlation supersedes causation" statement. No matter how much data we have to analyze, correlation and causation will always remain two separate concepts and it will always be an error of reasoning to confuse them.
The rest of his comment is suggesting that, with enough data, we won't need to use models anymore to understand the world. Instead, we'll just be able to look at an essentially infinitely large data set that will be examined with essentially infinite processing power and we'll be able to predict the outcomes of any action or set of actions without having to have any sort of understanding of 'why' or 'how' those actions are taking place. TFA is saying that we can reduce science to nothing more than an elaborate method of making predictions and dispense entirely with the notion of even trying to understand the underlying mechanism that explains why those predictions do or don't work. Naturally, this is ridiculous. Firstly, because it ignores the fact that those underlying mechanisms *do* exist - even if our knowledge of them is imperfect or incomplete. Secondly, because it would require unlimited data and processing power to even approach being workable. Without a complete data set for every action and every outcome of every action, this kind of prediction without models can't work on the universal scale that TFA is talking about. That amount of data will NEVER exist. It just isn't possible to determine the outcome of all actions and events. It's absurd. Finally, even if you could somehow accumulate that god-like amount of data you wouldn't need to worry about determining the underlying mechanisms whereby things worked any
I wish I had mod points. You're absolutely right. If you believe strong AI is on the foreseeable horizon, you're so incredibly wrong.
As an analogy, think about how much information is in your DNA. Even compressed, the data fills up a CD-ROM (650 MB). We are nowhere near explaining how our DNA or the brain that our DNA prescribes work. How can we arrange the petabytes of data on the internet into something useful? It's going to take work, and lots of it.
Climate models, as we all know, are infallible.
Consensus, don't you know?
We must be alert to the danger that public policy could become captive to a scientific-technological elite. - Eisenhower
Sure, models are "wrong." That's why they're MODELS. I don't look at a 1/50 scale model of an experimental aircraft and say "Well, geez, how is this thing useful? It's made out of wood! It doesn't fly! You'll never get any useful information out of this!" The model helps us, the experimenters, wrap our heads around exactly what we're doing. When scientists create models, they're not really trying to replicate the universe, they're trying to make the complexity of it make sense to them. That's why we have multiple models of the same damn thing - look at the atom: the Bohr model, the Rutherford model, the Quantum Mechanical model. Sure, some are more accurate than others, but each is useful to us in understanding the phenomena that occur.
We, humanity, are the ones doing science. The computers are tools. If the computer understands what all the petabytes mean, good for it - but WE still need to understand, and the way we understand is by fitting the data to a model. Then we test the model and find where it's incorrect. The size of the data set is completely irrelevant, because the data will always be used to confirm or reject a hypothesis.
Answer me this then: how exactly are you "mining" the data for "answers"? And once you've explained that, explain how it differs from the scientific method.
Modded GP back up because good things should rise regardless of who writes them. UID 0 has infinite modpoints, you know.
i'd never heard the term "model selection," so thanks for pointing that out. it looks like there really is some good literature to read on the subject.
the process described by the model selection sites i skimmed still doesn't address what i was getting at, though. "choosing a model from a set of potential models" is only conceivable when your set of potential models (and set of variables to potentially be modeled) is well bounded.
to put it another way, take the smartest model choosing algorithm you can find, hand it a pile of data, and say "what do you make of that, smart guy?" i'm willing to bet that the answer is going to be along the lines of "wtf?" unless there is some sort of context or metadata provided along with the data to give the algorithm a hint of what it's looking for. am i looking for covariance between scalar values among regularly organized groups? am i looking for white rabbits in the image data from a camera? is this ascii or ebcdic or 8-bit PCM data? you can argue that these questions are trivial, that no algorithm can be *that* general, but that is precisely my point: all known algorithms require significant narrowing down of the problem space by human hands before they can begin to produce useful output.
if you had an algorithm that took *truly* semantics-free data in one end and spit models of regularly occurring features out the other end, you'd be halfway to general AI.
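a small illustration of the "is this ascii or ebcdic or 8-bit PCM data?" question, as a rough sketch only. the function name `interpretations` and the choice of the `cp037` codec (one EBCDIC variant that ships with Python) are my own picks for the example. the same five bytes yield a word, gibberish, a decode failure, or a list of audio-like samples, depending entirely on what the reader was told beforehand.

```python
def interpretations(raw: bytes) -> dict:
    """Read the same bytes under several assumed formats."""
    views = {}
    # Assumption 1: the bytes are EBCDIC text (code page 037).
    views["ebcdic_cp037"] = raw.decode("cp037")
    # Assumption 2: the bytes are Latin-1 text.
    views["latin_1"] = raw.decode("latin-1")
    # Assumption 3: the bytes are 7-bit ASCII text (may simply fail).
    try:
        views["ascii"] = raw.decode("ascii")
    except UnicodeDecodeError:
        views["ascii"] = None
    # Assumption 4: the bytes are unsigned 8-bit PCM samples,
    # re-centred so that silence sits at zero.
    views["pcm_u8"] = [b - 128 for b in raw]
    return views


raw = "HELLO".encode("cp037")  # the "pile of data", origin unknown to the reader
views = interpretations(raw)
for name, value in views.items():
    print(f"{name:>13}: {value!r}")
```

nothing in the bytes themselves says which reading is the right one. that hint has to come from outside the data.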
"All /.'ers are wrong, and increasingly you can succeed without them."
Mr. Anderson... you disappoint me.
I work in Natural Language processing/Computational linguistics, and I think I can see where this article is coming from.
Look at the case of Machine Translation (MT). For a long time, the approach to getting better MT was to develop better models (alignment models, syntactic models, semantic models, etc.). The idea was that if we used more sophisticated methods to model language phenomena, then we'd capture more nuance and produce better results. Well, that worked for a while. As people started collecting larger and larger amounts of data, to the point where meaningful statistics could be calculated over a collection of text, simpler statistical models started beating out complex models created by trained experts. One of the big names in MT from IBM made the controversial, but accurate, statement that every time he fired a linguist, his accuracy went up.
Fast forward to Google and their petabytes of text data. At this point, some very sophisticated models, developed by some very intelligent people, are being beaten hands-down by systems Google is working on that use very simple methods, trained on LOTS of data. The point is, as long as your models/techniques are reasonable, having more data seems to dominate using better methods.
I saw a different parallel, to Brin's Uplift series.
The galactic civilizations in that series relied on computers and algorithms handed down from the dawn of time. Their computers and algorithms were so powerful, in fact, that they worked with the fundamentals. They didn't have floating point numbers or calculus or abstractions about area or specular highlights. Their algorithms worked with integers, counted virtual atoms, and traced virtual photons.
The blind correlation techniques described in the article seem very similar to me. Don't bother with models and abstractions, with power-law and bell-curve approximations of reality, just deal with reality directly.
It's like the dude said, "God invented the integers; all else is the work of man."
It also reminds me of the Chinese Room. You don't need understanding, you just need patterns. This concept was fruitfully explored in the hard SF book Blindsight, by Peter Watts.
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Science's job is to understand, predict and control.
Deriving correlations from giant data sets is useful precisely so that we CAN form models, theories, etc. that enable us to make predictions and control outcomes.
Fundamentally, we are interested in cause and effect relationships even when they are complex or emerging properties.
Correlations suggest potential relationships, which demand design of experiments to determine if there is a causal relationship.
I was one of the co-founders of the company, Pangea Systems/DoubleTwist, which made the software which assembled and annotated the Human Genome in 1999, beating Venter's company by 6 months.
We based our business on utilizing enormous amounts of the public DNA, protein, and biochemical metabolic data to derive correlative relationships, but the purpose always was to enable formulation and testing of hypotheses.
This allowed us to assemble the genome out of tiny bits of sequences and to infer potential biochemical function by transitive logic leveraging the known functionalities that had been discovered by scientists who laboriously conducted real biochemical experiments with isolated proteins, whose sequences we now could piece together from the fragmented, disparate data.
Anderson reflects a view common among those who deny material reality and defer solely to mathematics and statistics, and in fact, consider math to be the a priori of the universe, not material reality.
The phenomenal success of the human genome project was a vindication of the scientific method and the work of thousands of brilliant engineers from many disciplines leveraging hundreds of theories for each of their disciplines to deliver that data that Anderson strips of physicality, causality, hypothesis and testing.
Unfortunately, this tendency of philosophical idealism is manifested in many ways among the technorati.
It ranges from sophomoric Matrix-philosophies that everything we perceive is an illusion to the useless and solipsistic "anthropic principle" that has become current in physics, which states that 10^500 universes are possible but we could only happen to exist in one that allowed intelligent beings to exist, and which gets blended in with string theory as a way to deflect discussion about the lack of experiments suggested by that theory.
Anderson, Matrix-ites, Anthropocists all share a denial of historical and dialectical material reality and the scientific method.
Unfortunately, two generations of progressives have been poisoned by anti-materialist Post-Modernism, and the resulting intellectual rot continues to fester in myriad forms.
The author doesn't know what the difference is, that's his look out.
I never considered Google stats to be science.
There is a world of difference between knowing that objects fall at 32ft/s/s and knowing that something has a probability of correlation with some index or other.
The difference between quantum physics and statistics (apart from Einstein's objection that "God does not play dice with the universe") is that the "unknowns" in quantum physics are matters of fractions (like: 80% of the universe is hidden from us in Dark Matter and Dark Energy :-) and are things that we know we don't know.
But we know that we don't know them...
Statistics can lead one anywhere in the KNOWN universe.
Science can lead one into the unknown.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
And you have the gall to chastise slashdot readers about their reading comprehension? The 'point' the author was trying to make goes a lot deeper than the simple use of data mining. He was suggesting that we could use data to completely dispense with the need for models of any sort in science. He was suggesting that, with enough data, we wouldn't need models that would attempt to approximate or explain the underlying mechanisms responsible for phenomena. Rather, we could - with enough data - just make predictions on what events would proceed from what precursors based upon pure statistics. We could do away with any notion of even trying to understand 'how' or 'why' things happen and just focus on 'what' was likely to happen given a specific set of conditions. That's a lot deeper than what you've provided in your summary. It's also a point of view that is profoundly naive.
In order for the 'modeless' approach to be capable of replacing the scientific method we'd first have to define that the purpose of science is not to understand the 'how' or 'why' of phenomena, but rather say that science exists exclusively to make predictions and that it can tell us nothing about why its predictions work or what mechanisms are at play with respect to the phenomena that these predictions concern. That's a huge leap that not many scientists are going to be willing to make.
Further, without models, we'd be in no position to be able to interrelate the different phenomena about which we were making predictions. Without such a framework our ability to understand and make use of our predictions would be severely impaired. To give a more concrete example, just because we have data that shows that IF p -> q 80% of the time and IF z -> q 50% of the time that doesn't mean we know anything about why this happens and whether or not there is a relationship between p, z and the way they lead to q. With good models we could make statements and predictions concerning these questions. Without models all we can do is wonder.
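The p/z/q example can be worked through with a toy data set. This is only a sketch with counts I invented for the purpose (the helper names `make_world` and `rate` are hypothetical too): two "worlds" that both show p -> q 80% of the time and z -> q 50% of the time, yet disagree completely about what happens when p and z occur together.

```python
from fractions import Fraction


def make_world(p_only, z_only, both):
    """Build (p, z, q) records; each argument is (n_records, n_with_q)."""
    records = []
    groups = [((1, 0), p_only), ((0, 1), z_only), ((1, 1), both)]
    for (p, z), (n, n_q) in groups:
        records += [(p, z, 1)] * n_q + [(p, z, 0)] * (n - n_q)
    return records


def rate(records, cond):
    """Fraction of records satisfying cond(p, z) in which q occurred."""
    hits = [q for (p, z, q) in records if cond(p, z)]
    return Fraction(sum(hits), len(hits))


# Invented counts: in world A, p and z together always give q;
# in world B, p and z together never give q.
world_a = make_world(p_only=(40, 30), z_only=(40, 15), both=(10, 10))
world_b = make_world(p_only=(40, 40), z_only=(40, 25), both=(10, 0))

for name, world in (("A", world_a), ("B", world_b)):
    print(name,
          "P(q|p) =", rate(world, lambda p, z: p == 1),
          "P(q|z) =", rate(world, lambda p, z: z == 1),
          "P(q|p and z) =", rate(world, lambda p, z: p == 1 and z == 1))
```

Both worlds report the same two headline percentages. Whether p and z reinforce or cancel each other is exactly the kind of question those percentages cannot answer on their own.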
The author might respond to this charge by saying that, with enough data, this problem would disappear. Well, that's true - if you have enough data. The thing is, in order to completely replace the need for models you need *complete* all-encompassing data for whatever subject you're discussing. You have to know the outcome of every possible event and combination of events. At that point, you don't need models to make predictions anymore simply because you don't have any predictions left to make. You also don't have to worry about trying to use models to explain the underlying mechanisms at work that cause the phenomenal behavior being studied and predicted because, with complete knowledge, you'll have already picked that up along the way. So, yeah, as soon as we have perfect knowledge and complete data we can dispense with models and the scientific method. I won't be holding my breath for that day.
Sure, you can use statistical approaches to make predictions about the behavior of systems that aren't well understood or well modeled. This has been done with varying degrees of success for a long time and increased computing power will only make this technique more useful. However, to suggest that we'll ever be able to substitute the need for models and the scientific method with pure statistics is nothing but blind folly.
When Gene Roddenberry was putting together TNG, his theme for the series was "Prometheus Unbound". Roddenberry wanted to explore a world of ideas, to explore what happens when scarcity is not a factor. When you have so much energy and so much technology that the only limitation is your imagination, what do you do then? What decisions do you still have to make? What conflicts must you still resolve? (I think this can be a difficult sell because few of us can relate to a scarcity-free world. If you look at the luxurious 1701-D and then compare it to, say, Babylon 5 with downbelow and lurkers, it's clear which one is more like our world, and thus, which one is more about our personal stories.)
So now imagine a world so Googleized that finding the facts is never a problem. What we do then is decide what to do. What kind of world do we build? What kind of society do we want?
We can abandon representative democracy because referenda have no cost -- Google knows how people will vote. Do we still want representative democracy because there are social benefits that come only with that structure? What about communism or capitalism? Who cares! Google has correlated all information everywhere and knows what goods to ship where... if we can agree on what our beliefs and priorities are. A more Googly world also gets us closer to the idealized Adam Smith "perfect information flow" discussed in Economics 101. Is the end result of this a win for libertarians? Or will perfect information flow have different consequences than we believe it will?
When Google's clusters tell us the facts, our role will be to be the deciders of things. We will have to choose those things that are beyond facts. Personally, I feel this gets us closer to our ideal selves. We'll get closer to choosing what we really want rather than choosing from the subset of what we can have.
Five percent of one year's DoD budget puts us on Mars.
when I saw so many other posters call this article out. Hope is not lost, when people can see bull for what it is!
The reason why theory is not necessary is because they have petabytes of EVIDENCE. Yes, the votes are in, so tally them up with computers, and spew out some results. But if no theory was involved, then why is google so much better at it than other now obsolete search engines? Maybe they have better science. What a leap!
Unfortunately, it is well written, and many "smart" people simply uninformed in this particular field may find themselves compelled by some of the arguments... but then again, they probably won't. At least I hope not.
There is no such thing as perfectly accurate data, especially on computers where the tools used are not platform independent (half of laptops sold to college kids are macs now buddy!)
There are considerable numbers of people who take passive and active measures to avoid being tracked on multiple levels.
the conclusion: if you actually believe this conclusion and apply it to your marketing/product development you will fail to serve a considerable number of people and possibly alienate this group if they used to be your customer before you decided the data didn't justify the cost of feature X.
If your product is information, you will encourage piracy. If your product is something else, someone else will come and eat your lunch, especially in tech where that outlier which uses the obscure features you "dropped" is the same group advising friends on purchases
VLC FOR MAC IS DYING! IF YOU DEVELOP, PLEASE SAVE IT!!
I use it successfully for anything from helping me debug software to finding a list of somebody's campaign contributors to ... anything I can come up with search terms for. I'd say that on information search (image search is iffier), I get useful results about 99% of the time, though it might take a while to get those results.
I suggest you browse the google help docs.
Not to say it couldn't be better, they still don't have the Boolean NEAR or DATE operators, and in some ways, search would be easier if I could simply enter Boolean operators.
Tech Public Policy stuff
The only compelling thing about this is that it compels me to conclude that neither the author nor the poster knows anything about science. This kind of nonsense is, in my view, a clear symptom that the agenda of the religious right is succeeding all too well in destroying scientific awareness and critical thinking. To hell with them.
I tried to RTA, but couldn't stomach it. This /. caught my eye because I have found that "The Scientific Method" is either dead, forgotten, or something that my 9th grade biology teacher made up. I tried to explain to a colleague how my approach to all problem solving was taught to me by Mr. B... Dang, forgot his name! Anyway, when I tried to find an article on the steps, Hypothesis... Experiment... Conclusion, etc. I couldn't find anything anywhere. It's like it isn't even used anymore! I'm trying to convince people to use this method, and I can't even find an explanation to show anyone.
The only stable state is the one in which all men are equal before the
My guess is that someone who admires wordy high-falutin' postmodern philosophy text or who was once a subscriber to OMNI magazine got involved with this article.
Probably an intellectual impostor with rectangle glasses and a turtleneck.
http://www.elsewhere.org/pomo/
http://www.theonion.com/content/opinion/why_do_all_these_homosexuals
what is fact/truth?
you need to know the mechanism AND prove it against reality
mathematics is one kind of representation, but that doesn't mean only math can prove things
textual or visual evidence is also a way to prove things, at a higher level or for plain phenomena
and that doesn't mean it's not scientific
blindly believing in math means nothing... people can do the math right and still get wrong answers
Wired is tired.
"The Data Deluge Makes the Scientific Method Obsolete"? How absurd.
The scientific method requires data to make hypotheses and theories testable.
Data without models, hypotheses, and theories is just useless, as are the article's observations and conclusions.
'Speaking at the O'Reilly Emerging Technology Conference this past March, Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."'
As someone else observed WTF???
Without models you've got nada.
What's going to replace models?
"This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear."
Are not applied mathematics models? Duh... They make one giant claim in one statement and in the very next sentence they contradict themselves and prove that they are full of shit!!! Cool way of writing... but not my style.
While I'm not a particle physicist I suspect their claims in this statement are simply junk: "The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses -- the energies are too high, the accelerators too expensive, and so on."
It's just a sound bite designed to attract as many non-critical thinkers as possible. Given how many people like mind poo this poo pablum is typical of writing that passes itself off as science based.
Icky mind poo guys. Surely you can do better?
"In short, the more we learn about biology, the further we find ourselves from a model that can explain it."
What??? That doesn't mean that we won't find a model that works? That doesn't mean that we can't use information science to find models that work!!!
"The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all."
Huh?
While I'm all for "correlation" (in fact I love correlation of various kinds) I have no idea what they mean when they say that "science can advance even without coherent models, unified theories, or really any mechanistic explanation at all." That's just beyond bizarre, and a profoundly and deeply mistaken understanding of science.
Maybe there is a useful message in the article about data mining on petabyte scales and higher, but it's not the advertised "End of Theory".
Why the cloud cannot obscure the scientific method by John Timmer.
If you've scrolled down this far, you might as well read it all :)
lol Blade:House of Cthon
Along the same lines, on Science vs AI, there is a real example from Napoleon's time. Back then it was very important to know how to fire cannonballs in order to destroy the other army. The technique of the day was a regression approach: fire, look where the ball lands, and update the model until the accuracy is good. Just like AI does. Napoleon had very good people working on this problem (Fourier and others), and using the scientific method they derived a very precise ballistic law from first principles. The consequence was that Napoleon won almost all of his first battles, because he took less time to aim and destroy the enemy army.
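A rough sketch of the contrast described above, under loudly stated assumptions: a drag-free textbook projectile (range = v^2 * sin(2*theta) / g), a muzzle speed and target distance I picked arbitrarily, and the function `landing_distance` standing in for "fire a real shot and look where it lands". This is not a claim about what Napoleon's gunners actually computed.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def landing_distance(speed, angle_rad):
    """Stand-in for a real shot: drag-free range on flat ground."""
    return speed ** 2 * math.sin(2 * angle_rad) / G


def aim_by_trial(speed, target, tol=1.0):
    """Fire, observe the miss, adjust (bisection on the elevation angle)."""
    lo, hi = 0.0, math.pi / 4  # range grows with angle between 0 and 45 degrees
    shots = 0
    while True:
        angle = (lo + hi) / 2
        shots += 1
        miss = landing_distance(speed, angle) - target
        if abs(miss) <= tol:
            return angle, shots
        if miss < 0:
            lo = angle  # fell short: raise the barrel
        else:
            hi = angle  # overshot: lower the barrel


def aim_by_law(speed, target):
    """Invert the range formula directly: no ranging shots needed."""
    return 0.5 * math.asin(G * target / speed ** 2)


speed, target = 100.0, 600.0  # hypothetical values
trial_angle, shots = aim_by_trial(speed, target)
law_angle = aim_by_law(speed, target)
print(f"trial and error: {math.degrees(trial_angle):.2f} deg after {shots} shots")
print(f"ballistic law:   {math.degrees(law_angle):.2f} deg, computed before firing")
```

Both approaches end up at nearly the same angle. The difference is that the trial-and-error gunner spends shots (and time) getting there, while the one with the law is on target from the first round.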
CU
I could summarise TFA as follows: we don't need science, we don't need models, we don't need scientists, all we need is competent statisticians and analysts. My first reservation is the oversimplification of science as presented in TFA. It is a very bold claim to assume there is a universal "scientific method". Hard sciences (e.g. physics) and soft sciences (e.g. sociology) have totally different research strategies and criteria of model success. Similarly, I don't think there are universal roles of models and mental models in sciences. Finally, a model may be able to predict the past, i.e. fit into existing data, but this does not guarantee that the same model can reliably predict the future. The second reservation (and I am glad this was implied by the comment above on LHC data) is that both data collection and data treatment are inevitably theory-laden. On the one hand, we need a theory to provide criteria of what counts as data. In the LHC experiment, a valid event may be masked by millions of noise data points. Both a very solid theory of the phenomena and a theory of the instruments are needed to declare an event as valid. On the other hand, data treatment (number crunching) also implies making many assumptions about the distribution of the data, and about the probabilistic and statistical paradigms used to approach them. To provide an example, how do you interpret a statement like "the probability of the Brooklyn Bridge collapsing in the next month is one in a million"? Has the person claiming this studied a million bridges like the Brooklyn one and found that on average one of them collapses every month? In that sense, I'm afraid that the scientific method, unless premises and prior knowledge are explicitly and adequately taken into account, cannot help chasing its tail. Theory will always be underdetermined by data. In addition, as some commenters have indicated, some data coming from complex processes are well known to resist analysis.
Global warming is an excellent example, where different starting points predict totally different outcomes. The techniques implied by TFA may provide good likelihood estimates, that is probabilities of data obtained by a specific model, in the same sense that we can check the correct spelling of a word by the number of its google references. However, when the data are scarce, googling cannot help. I would trust more an artist to provide answers to interesting questions like the meaning of the universe ("42") than the billions of monkeys like me typing the google search space. I think a more appropriate title for the article would be "The end of data". In the ocean of data, the relevant bits are lost and their identity gets diluted in the vastness. The end of humanity must be just the same thing - loss of identity, heat death, maximum entropy: all is the same, all is equal, all is meaningless.
"I can't imagine how things could get any worse!" (some guy) "That could just be failure of imagination on your p