Why the Cloud Cannot Obscure the Scientific Method
aproposofwhat noted Ars Technica's rebuttal to yesterday's story about "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." The response is titled "Why the cloud cannot obscure the Scientific Method," and is a good follow-up to the discussion.
Link error in story.
Because a datasource isn't a process?
Where's the link to the Ars Technica story?
http://arstechnica.com/news.ars/post/20080625-why-the-cloud-cannot-obscure-the-scientific-method.html
I like the fact that the web and search/aggregate engines may combine vast amounts of data in ways we now cannot imagine - it expands the field for new scientific research enormously. Replace science? No.
accept no limits but time
Crack cocaine makes you stupid.
Oh, you were talking about the "information cloud" the crackheads at Wired always talk about. Never mind.
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Latest addition to bullshit bingo cards:
CLOUD
I'd say that the models are the science. They're how you explain your data. They provide evidence that the experiments make sense, and they guide you by making predictions you can test.
Moreover, SIMPLIFIED MODELS are good science. Understanding which details can be omitted without impacting the predictive ability of your model shows you know which effects are important and which aren't.
another obvious history.
I am sorry Google, but your ad business model will be terminated by random page requests. It is already happening, no 'pseudo' articles will help.
Leonardo da Vinci is reputed to be the last person who "knew everything" that there was to know during their lifetime. Even that wasn't true. But the scientific method has been the key to both creating and coping with a "data deluge".
Science suffers when there's too little data: scientists then must generate more by observation, or do something else that isn't science (and doesn't work nearly as well). Too much data is only a problem if you're willing to settle for imprecise/inaccurate results. I'm sure there are a lot more lazy scientists now than in Leonardo's time, if only because the scientist population has inflated, but that doesn't mean we should dumb down science for scientists who just want to own a computer that spits out answers to the data they put in.
--
make install -not war
A large source of data that has a correlation does not somehow imply causation. Even if it works under some conditions (or even all conditions). The science happens when the causation is determined and then applied.
All models are wrong, but some are useful.
We still need scientific methods to develop useful models and understand and refine the existing models. When Newton defined his mechanics that was the state of the art in his era, and now we have progressed to quantum mechanics which might be refined tomorrow.
But mere observation of some phenomena is not sufficient to postulate the behaviour in a changed condition. A scientific model and its rigorous application are required for this. Correlations drawn from the cloud cannot substitute for it.
gopla
The point of the last story was horribly miscommunicated. There were two main points. The first is that data is expanding in such scope that hierarchical organization systems don't work. The second is that we're approaching a time where the analysis of data to show causation will come from correlation, because you can determine all the variances once all the variables have been accounted for. Look at the human genome project or folding at home. I don't think this is completely true, but let's not bash the idea or miss the point just because the original author's a complete bumbling moron.
Oh honey look... How cute... an angry slashdotter!
In general I'm right behind the rebuttal. However John Timmer chooses a very bad real-life example as his rebuttal champion.
He asks: ...would Anderson be willing to help test a drug that was based on a poorly understood correlation pulled out of a datamine? These days, we like our drugs to have known targets and mechanisms of action and, to get there, we need standard science.
These days we may like our drugs to have these attributes, but very often they don't. There are still quite a few medicines around that clearly work and are prescribed on that basis, but for which there is only the haziest evidence as to how exactly they work.
The good thing about the scientific method, however, is that it gives us a framework to investigate these drugs' actions - even if the explanation is still currently beyond us.
Truly, the whole reason someone like Mr. Anderson could claim the end of science because of data is that he is a writer, a thinker, and in large part a businessman. Businessmen do not think about Science and how to use it to come up with a method that produces a conclusion. He uses information to come up with ways to elicit a reaction in people. So to him data is more important than science because he uses it for his purposes. That is marketing, and the "science" of marketing has almost always been that way.
/. this article is as cogent a rebuttal as one can make.
Mr. Anderson was not prescient in any way, he was just speaking his perspective. The only thing is we must be careful about even considering his proposition as a valid reality worth pursuing. Not for true scientists, but from a social perspective, or it will truly be the end of science. There are some in power already attempting to make this happen.
That said, I almost consider responding to yesterday's article as falling for the argument. But, since it hit the
...and it should be known by now
And can back up this rebuttal with a practical example. I am a physicist, I know sod all about blood samples, or proteins, or cancer. I get a pile of mass spec data (about a billion data points or so on some days) and through binning, background subtraction, and a string of other statistical witchcraft I produce a set of peaks labeled according to intensity and significance.
This does not make me a cancer researcher. This data has to go back to the cancer guys and they have to pick out the Biomarkers and thus develop new diagnostic tests, based on principles that I don't understand. I am master of the information but entirely blind as far as the science is concerned. Same goes for google.
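For anyone wondering what "binning, background subtraction, and a string of other statistical witchcraft" might look like in the small, here is a toy sketch. It is not the poster's actual pipeline: the function names, the median-based background estimate, the threshold `k`, and the synthetic readings are all assumptions made for illustration.

```python
from statistics import median

def bin_counts(values, lo, hi, width):
    """Histogram `values` into fixed-width bins covering [lo, hi)."""
    n_bins = int(round((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against a float rounding up to n_bins
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def find_peaks(counts, k=5.0):
    """Return (bin index, significance) for bins well above background.

    Assumption: the background is flat, so the median bin count estimates
    it, and the median absolute residual estimates the noise level.
    """
    background = median(counts)
    residual = [c - background for c in counts]
    spread = median(abs(r) for r in residual) or 1.0  # avoid dividing by 0
    return [(i, r / spread) for i, r in enumerate(residual) if r > k * spread]

# Synthetic readings (made up): 3 events in every bin, plus a clump at 42.5
LO, HI, WIDTH = 0.0, 100.0, 1.0
readings = [LO + b * WIDTH + 0.5 for b in range(100) for _ in range(3)]
readings += [42.5] * 40

counts = bin_counts(readings, LO, HI, WIDTH)
peaks = find_peaks(counts)
print(peaks)
```

The output is just a list of labeled peaks. Nothing in the code knows what a biomarker is, which is the poster's point.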
When I read the original article my thought was that someone was just trying to write something to get noticed. The Scientific method, IMHO, is all about a person or group of persons using a logical process to determine the validity of an idea. Observing massive amounts of data can reveal relationships that may not have been noticed in other ways, but at the end of the day the process of "I think X, I wonder if it is true", the heart of the scientific method, can no sooner become obsolete than we can stop being human. The questions of What, Why and How are so fundamental to humans as humans that nothing short of total omniscience will ever replace the logical process represented by the scientific method.
traditionally, science forms its hypothesis, and performs an experimentum crucis to test the hypothesis; rinse & repeat. it seems to me that 'the cloud' refers to a hitherto statistically huge number of samples of data points from which to extract our knowledge of the world -- a sort of broad collection of facts derived from constantly and systematically varying the experimental conditions -- an exploratory experimentation. goethe outlines a method of Exploratory Experimentation in the essay The experiment as mediator between subject and object.
"Theory-oriented and exploratory experimentation are not exclusive categories, but rather members of a spectrum of experimental research strategies. Which is more productive in a given context depends on many factors, including a field's state of development, the sort of knowledge (for example, underlying mechanisms versus phenomenal regularities) sought by the physicist, and the complexity of the system being studied. Our aim in emphasizing the exploratory path has been to bring to light an experimental style that has played an important, but hitherto underrecognized, role in the history of physics."
Physics Today Article
What you say is true, Hoplite3. The big issue I see is how people define "model". My guess is that quite a few unfortunately define it as "I got 3 asterisks in the significance test", whether the "model" (say, linear regression) makes sense or not.
I forget where I read it, but I've been studying linear regression, and there was a fascinating example where, if they had used linear regression techniques on the early "drop the cannonball and time its fall" data, they would have come up with a nice, highly-significant linear regression for gravity.
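A rough sketch of how that can happen, under assumptions of my own: I don't have the original data, so the drop heights below are invented and the fall times are computed from t = sqrt(2h/g) with no measurement noise. A straight line fitted to time against height still scores an R-squared close to 1 over this narrow range, then goes badly wrong when extrapolated.

```python
import math

G = 9.81  # m/s^2, standard gravity

def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical drop heights in metres; times follow the true square-root law
heights = [5, 10, 15, 20, 25, 30]
times = [math.sqrt(2 * h / G) for h in heights]

slope, intercept, r_squared = fit_line(heights, times)

# Extrapolate the "significant" linear model far outside the fitted range
h_far = 100
predicted = intercept + slope * h_far
actual = math.sqrt(2 * h_far / G)

print(f"R^2 on fitted range: {r_squared:.3f}")
print(f"fall time from {h_far} m: linear model {predicted:.2f} s, physics {actual:.2f} s")
```

The fit statistic looks excellent, yet the model has the wrong functional form, so it is only trustworthy inside the range it was fitted on.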
Then there is the whole issue of explanation versus prediction. Something can be predictive while providing no explanation, and perhaps that's where the petabyte idea is going: who cares about explanation if prediction is accurate enough? (Not my philosophy, BTW.)
You're right about the medicine example. It's odd that medicine has an incredibly rigorous statistical process before approval, yet many medicines are basically black boxes.
Look at statins (cholesterol medication), which are one of the most widely-prescribed medicines in the world -- and which I take. There's a legitimate question as to whether their main effect is to reduce cholesterol levels, or whether it's actually a specific kind of anti-inflammatory which happens to reduce cholesterol levels.
Or how about ulcers, which were chalked up to personality and stomach acid, and treated as such, until a "crank" pushed the medical community for decades and they finally realized that a bacterium was behind most of it. The medicines were (and are) effective, but no amount of modeling along those lines could find the actual, root cause of most ulcers.
(I also take medicine for stomach acid, and interestingly I am one of the 10% whose ulcer was not caused by bacteria.)
I have always viewed this debate in the context of scientist vs. engineer. That is, one who views data as "good and true" vs. "good enough". That's not a slam on engineers (I am one), but a reflection of the balance between the two. A scientist who never applies theory sits in an empty room. An engineer who builds things without science sits in a cluttered room surrounded by useless objects.
I do find interesting though that the advent of "google data" may indicate a flip in order of the two disciplines. Historically (IMHO) science has led engineering. A theoretical breakthrough, provable by the scientific method, may take years to give birth to a practical application. Now, with enormous piles of data and the knowledge that "good enough" is often good enough, we may be creating useful objects that will take science many years to explain and model.
The biggest issue and omission in both of these pieces is that this "cloud" of data does not represent "truth" (as the scientist may seek), but rather a summation or averaging of the "perception of truth" as seen by the individual authors. The cloud, therefore, is only as useful as humans' ability to divine truth without the scientific method.
My two cents. :)
I have a problem with the google generation, sure, they can parrot facts and find things in an instant, as can any slashdotter I'm sure, but knowing something is not the same thing as understanding something.
A coworker asked me yesterday "how do you call a C++ class member function from C [or java]?" The question is an example of pure ignorance.
If they "understood" computer science, as a profession, this would be a trivial question, like how do I or can I declare a C function in C++. The second question is what google can help you with while having to ask the first question means you are screwed and need to ask someone who understands what you do not. Not understanding what you do for a living is a problem.
How programs get linked, how environments function, virtual machines vs pure binaries, etc. These are important parts of computer science, just as much as algorithms and structures. You have to have a WORKING knowledge of things, i.e. an understanding.
Google's ease of discovery eliminates a lot of the understanding learned from research. Now we can get the information we want, easily, without actually understanding it. IMHO this is a very dangerous thing.
Petabyte technology suggests new avenues of scientific investigation, but doesn't end science or older alternative ways of doing things. The clever thing is to be first to discover the new possibilities.
I would agree that the scientific method is not dead, but I like this rebuttal. The scientific method as I understand it is
1) Observe
2) Form a hypothesis or create a model to explain some phenomenon
3) Experiment and gather empirical data to support or refute the hypothesis/model
We still do all that but the emphasis does seem to be shifting away from traditional models that are sweeping generalizations (e.g., "An atom has a nucleus of protons and neutrons surrounded by moving electrons") to more nuanced, numerous, highly specific, and esoteric observations which are cobbled together into a patchwork of quasi-models that collectively define a distributed understanding of the real underlying concept. No single person understands the big picture in its entirety and no single model dominates scientific disciplines. Nay! Controversy is rampant.
These quasi-models manifest themselves as scientific papers, correspondence between academics, and flame wars on web vBulletin or phpBB sites, and in practice people subscribe to them a la carte like they were ordering at McDonald's or something. They stitch together their own stylized scientific philosophy from a vast menu of options.
In my opinion, all these claims that "we scientists are still doing science and we do understand the universe" are actually kind of pathetic. To call your data on the propagation of a particular gene variant in D. melanogaster a 'model' is hubris. You are a technician, not a scientist. You are a cog in the machine. We are all just neurons in the collective brain.
He makes statements about treatments, causes, and outcomes as if they were God-given truths proven to the world beyond all doubt. In truth, medicine seems to this mathematician to be a field governed solely by statistical correlation, with next to no concern over (a) what the actual cause is, or (b) testing the hypothesized cause in any meaningful way. I've read study after study that goes through a wonderfully presented statistical analysis to conclude that such and such drug works well at treating such and such symptom; they then close with a couple of paragraphs as to why (they think) the drug is working, often not using any qualifiers such as "we don't know but our guess is..." or "it would be nice to find out if it is ...."
To the vast majority of practicing physicians I've met, "cause" just doesn't seem to be the important question. Which I think is why things happen like my pharmacist declaring that two drugs prescribed by my doctor are going to cancel each other's effects, or why I take a drug to treat a painful toenail and end up with bleeding in my stomach.
This whole thing makes no sense. It's all ambiguous concepts. What? Lots of data means we don't need to use theories? Lots of data != Omniscience. In fact, lots of data is not even yet information. You still need to find how it applies. It's the people at Wired making a religion out of new technology that causes them to say crazy things like this.
Science and openness go together.
Without openness, we all are reinventing private wheels, which we destroy the plans to when there is no profit.
If you work in software, consider for a moment how scientific your work is, considering the work of other companies doing similar work.
This Clouds thing is the "billion monkeys/humans typing on keyboards" model.
Yes, it really can work (with humans).
But, as with science, the chaos development model only works with openness.
Of course, organized science along with a little chaotic development would work even better.
There are forces in our society that do not like any open model: the Microsofts, the MPAA, the RIAA. These types of organizations thrive on closed models: more copyright controls, more DRM, longer copyright and patent terms.
These forces would prefer to own, control and close science and clouds of data. They are unaware of the inevitable impact of such actions.
In a free capitalist society, we are naturally driven by contrary forces.
A desire to hide discoveries, to maximize profits, even at the expense of innovation.
A desire to share discoveries, to contribute to society and for credit.
While it is possible to profit when ideas are shared,
It is more difficult to contribute to society by hiding information indefinitely.
There are coefficients we use in models that we don't fully understand in the physical world. We obtain those coefficients through empirical data. To rely solely on those models for design ignores the fact that those coefficients may change for any reason in the real world, because we don't fully understand what factors influence them.
In my experience this only applies to certain sciences. Most of my experience with such systems is in the area of fluid mechanics, and thermochemistry. Models can save years of lab work, but in the end, the model still needs to be verified.
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
I believe science is a direct descendent of the capacity for deduction as granted by our model-making brains. In our internal symbol sense, we often use the subjunctive tense, if when why hypothetically depicting, don't call it science, fine, it's still predicting what will happen from what has happened. -- that's what i'm rappin'
The Wired post was a bit over-reaching, sure... but that's Wired for you.
The bigger point is that science is about testability, not story-telling. There may soon come a day when our analysis can prove that something is true without our being able to explain why it is true.
We are already there in many respects, but will be much further along when the current crop of Bayesian diagnostics hits the market. Combine those with the flood of information that personal genomics companies hope to make available and you might see an explosion of insight into diagnosing disease states.
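To make "Bayesian diagnostics" a bit more concrete, here is a minimal sketch of the underlying arithmetic, Bayes' rule applied to a single test result. The function name and the prevalence, sensitivity, and specificity figures are invented for illustration and do not describe any real product.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule.

    prior:       P(disease) before testing (prevalence)
    sensitivity: P(positive | disease)
    specificity: P(negative | no disease)
    """
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 95% specificity
p = posterior(prior=0.01, sensitivity=0.90, specificity=0.95)
print(f"P(disease | positive) = {p:.3f}")
```

With these made-up inputs a positive result leaves the probability of disease at roughly 15%. The calculation can be entirely right about the odds while saying nothing about why the marker tracks the disease.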
Does that mean we're all done with lab science? Of course not. But our research may come to focus more on understanding what our diagnostics have already proven, rather than on charting new frontiers of knowledge.
Call it what you will, that's a pretty big change in how people organize and gather knowledge.
Some time ago some researchers came out with a book which was supposed to be called "the end of intuition". The name of the book actually became "Supercrunchers", because people would click more on that ad than on the "end of intuition" one. I wondered why the final name shouldn't be "hot college lesbians".
The Eliza effect is so huge that any nice trick machines do seems to give us the immediate feeling that "It's alive!", and it has deep meaning.
Nonsense.
As a researcher of psychologically plausible AI models, I found the whole idea disgusting, and submitted a paper to a journal explaining why the whole thing is bogus.
Expect to see more of this overexcited nonsense in the future.
Another point missed here is that background noise can obscure real results. Much of the data cloud is utter garbage. Picking out the useful information is often a complicated and difficult process, in some cases it's easier to just go and do the measurement yourself. I've heard the "a few days in the library can save you weeks at the bench" about as often as the reverse. I think they're both true.
-sk
Ever wonder how early humans discovered medicinal qualities of plants? They didn't use models and scientific method... they used vast amounts of trial and error results. Then they used prediction based on what they had learned to narrow down what kind of plants to try out next. They didn't understand the underlying mechanisms and test out new findings based on that type of model... they used cheap and dirty statistics and record keeping.
This is just an extension of what humans have been doing to discover new correlations, for our entire history... just faster.
I come up with theories all the time based on cross-referenced science articles. Unfortunately I'm not in a position to test any of them, so the best I could do is blog about it - but then I'd join the ranks of the armchair scientists out there and that just seems lame, for now.
One day I'll come across a community that accepts crack pot ideas as the basis for experimentation... lets the community vote on which ones to carry out and takes small donations to fund the projects... then I'll submit my ideas. Hmmm... sounds like a fun community, off to Google to see if one exists already.
A fool throws a stone into a well and a thousand sages can not remove it.
The article states that "we know quantum mechanics is wrong on some level". Oh really? That's news to me. Any serious proposed theories of everything have been quantum in nature. It's amusingly hypocritical that the Ars Technica article refers to the Wired author as unscientific, yet makes such a claim itself.
The only thing "wrong" with quantum theory is that it doesn't fit human intuitions. But this is only because people ignore the psychology of perception and are not careful about interpretations; it's easy to create a very reasonable interpretation of QM that doesn't invoke weird stuff like saying QM must have something wrong with it, or strange consciousness stuff, etc. An example is Mohrhoff's http://arxiv.org/abs/quant-ph/0412182 (also check Marchildon's review of this class of interpretations, it's in the arxiv somewhere linked to this).
"Politicians and diapers must be changed often, and for the same reason."
thoroughly fed up with trying to register on wired to argue, as the article seemed seriously wrong.
what i was intending to post there, but can't - finally somewhere to post it beyond the g/f's email!
Copied from http://www.systems-thinking.org/dikw/dikw.htm as I was looking for a reference to the information but think that this page sums it up at least as well as I could:
"The content of the human mind can be classified into five categories (Russell Ackoff):
Data: symbols
Information: data that are processed to be useful; provides answers to "who", "what", "where", and "when" questions
Knowledge: application of data and information; answers "how" questions
Understanding: appreciation of "why"
Wisdom: evaluated understanding."
The interpretation on the given url sees understanding as a process that represents the transition between each stage rather than as a stage in itself - information is the understanding of the relationships between data, knowledge is the understanding of patterns of information and wisdom is the understanding of the principles that underpin knowledge and hence make extrapolation to the future possible.
Whichever way it is looked at, the first categories relate to the past with wisdom (the ability to extrapolate) being the only one which relates to the future.
Applying this to your example of J. Craig Venter, it can be seen (from my viewpoint) that his research has expanded the amount of data available to us and even possibly the amount of information.
However it provides no answers, that I can see - from what is provided in your article - to the questions of "How?" or "Why?" and therefore provides no increase in knowledge, understanding or especially wisdom.
I would argue that it is the scientific method of hypothesize, model, test that provides the answers to the how and why questions and therefore increases Knowledge and Wisdom.
To me what you are arguing is not that the data deluge makes the scientific method obsolete but rather that it provides a new basis for experimentation by the analysis of statistics - it provides a new medium for testing, but provides no ability to hypothesize or test and hence does not increase knowledge, understanding or wisdom.
As such it is a very beneficial development, but the results must be treated with the same caution, indeed more, as any experimental results gained by more traditional means. To blindly accept the findings without factoring in all the parameters and testing against a hypothesis (indeed, unless you hypothesize how do you determine what to vary and what to test?) seems to me to be very dangerous and indeed a step backwards in thinking.
Chris Anderson has foreseen the most profound change since the age of reason. Man has reached the point where his "understanding" can impede evolution. It is time to concede that some processes may be beyond our comprehension.
Research in the area of Artificial General Intelligence provides a crystal clear demonstration of the problem. A half century of research has led to "intelligent" data mining and voice response systems and very little else.
However, Koza, Fogel and many others have observed evolutionary computation machines creating solutions to real world problems. In some cases these are patentable solutions beyond previous human achievement, and some of them defy understanding.
Unless you have unlimited funding and lots of time, it's not necessary to understand why every complex solution works. It may not even be possible.
A million MRIs of functioning brains are not likely to result in any Lisp program for AGI, so the search for AGI seems to be coming full circle back to the "baby bootstrap". Even Ben Goertzel is looking to virtual babies to mine the clouds.
Like others who have managed to see beyond the horizon, Anderson will be widely misunderstood. He is not rejecting scientific method, he is simply showing us its limitations.
I have a feeling you could have a brilliant career in that field.
Against stupidity the gods themselves contend in vain.
When I was taking experimental physics 101, I remember we verified basic laws in mechanics by sliding and throwing stuff around many many times, then fitting equations and calculating confidence intervals. Sure, we didn't have petabytes, it all fit in one square-ruled piece of paper.
Several centuries ago, the holy grail of theory was perfect causality and inference of all the minute details. Chris Anderson seems to be stuck there. Quantum mechanics changed that for good, by talking in terms of statistical properties of position and momentum of particles. But that turns out to be a very useful set of models, with many practical uses. (String theory, on the other hand, through my limited understanding takes a different tack: it adds so many dimensions that it's possible to fit almost any kind of data -- as Feynman once complained, more or less.)
So, now we have petabytes of particles (so to speak). We can throw the dice many times and make observations and draw inferences that are statistical in nature. But we're still dealing with models and confidence intervals. The fundamentals are the same, maybe there is a relative shift in focus between theory and experiment, or between perfectly causal and statistical models, but that's about it in my view.
I think the biggest problem in QM is the idea that the "collapse of the state vector" actually describes anything real. It's one of those questions like "when does life start" or "what's really a planet" that doesn't really have anything to do with science. It's just a metaphor that makes certain kinds of reasoning about QM easier, and provides guidance as to where you can simplify your model to make the calculations practical.
Chris' article was nonsense and the Ars article shows very well, at least, that Chris has drawn some inappropriate conclusions regarding "the Cloud" by citing contradictions in the very article that was posted on Wired. However I found another article (link below), written apparently by a Physics Ph.D. student, that goes into a little more depth regarding the nature of Chris' misunderstanding. He raises the question: is what Chris is referring to actually "knowledge"?
http://thatsprettylame.blogspot.com/2008/06/end-of-reason-why-data-deluge-will-not.html
They are both owned by Conde Nast. It's sorta funny seeing them duking it out. I believe Ars Technica has a better team of journalists than current-day Wired. Wired is pretty much run by graphic designers now...
While he does a good job showing that science itself isn't going away, he actually lends credence to the position that cloud computing implies a lot of useful information will be generated outside of science. Moreover, he also might be supporting the position that science isn't necessarily going to catch up and explain this data any time soon. So, the "strong" position, that Google makes science irrelevant, is naturally false. But the "weak" position, that Google represents a new kind of inquiry that is going to be increasingly used and relevant, seems intact and supported. So cheers to Google and science, HJS
I think the consensus is that the original article is a bit presumptuous and flawed. He says that science will be replaced, which implies that there is a hardened definition for how science is to be performed currently, which there isn't. There is no ONE definition of science or the scientific method.
From a junior high school site about the scientific method:
"Six steps of the S. M.
State the problem: Why is that doing that? Or Why is this not working?
Gather information: Research problem and get background info
Form a hypothesis: a possible explanation for the problem using what you know and what you observe.
Test the hypothesis: Make observations, build a model and relate to real-life or experiment.
Experiment: testing the effects of one thing on another using controlled conditions.
Variable: a quantity that can have more than a single value. (Dependent vs independent)
Constant: a factor that does not change when other variables change.
Control: the standard by which the test results can be compared
Analyze data: recording data and organizing it into tables and graphs.
Draw conclusions: based on your analysis of your data, you decide whether or not your hypothesis is supported."
This "cloud" is just a buzz-word for massive amounts of data collected for no good reason other than to collect it, i.e. before you form a hypothesis. Using this junior high model, a hypothesis is created from observation (seeing a correlation in the data), then you go back to the data or collect more data to prove or disprove that hypothesis.
Massive amounts of data and algorithms that sift through it are TOOLS in the box for performing the scientific method. They don't replace it.
I think his argument would be better if he stated that these tools, in certain cases, allow you to reasonably prove and create a hypothesis in a single step.
I had a nice example of the complete inadequacy of google's thought-agnostic approach to links browsing around looking for information on samba and fuse under linux. Google's ad bars, completely misinterpreting the context, offered links to fuse boxes, as in wiring, and Samba lessons, as in dancing. But then, maybe I'm not giving Google enough credit. It might have actually recognized the pointlessness of trying to market software to a Linux user, and took the obvious step of throwing in some complete non sequiturs in the hopes of catching something of value.
I just remembered... they collect "clouds" of statistical information about baseball players so that they can create great "correlations" to post on the scoreboard.
When you just let the algorithms try to draw meaningful conclusions you get such gems as...
"On every third Sunday in June in an election year, catchers have a 35% chance of hitting a 420 yard home run in the third inning."
"Because it came from WIRED," should have been enough reason to discard this bullshit from day one. Why not ask some REAL scientists in a REAL peer reviewed scientific journal about what the "cloud" is doing instead of letting a bunch of insular technophiles indulge in masturbatory fantasies about how their "culture jamming" is "shifting paradigms" all while convincing themselves the same shit wasn't going on in the 60's, 70's, 80's and fucking 90's, and is indeed the sort of thing that led to WIRED's kind in the first fucking place. If science and its titular method could both create and survive the atomic bomb, radar, TANG and LSD, it can certainly handle a fucking "cloud" of bits.
Wasn't this all demonstrated 100 years ago by Francis Galton and an ox? What's new is that there are more data points and better techniques to identify interesting correlations. Probably this is what we do internally anyway. All of our sensory input is correlated and the interesting bits are filtered out by specific algorithms trained by evolution. What is fascinating to many are the times when these algorithms are spectacularly wrong.
I'm as pleased as anybody that the development of large pools of widely accessible data may lead scientists to find and consider correlations which they might not otherwise have observed.
However, Wired does tend to breathlessly enthuse when it comes to stories about how the Internet has changed everything, everywhere, forever and ever! (Look back 11 years at "The Long Boom" for an example of this unbridled enthusiasm. Today, to our great sorrow, this seems a bit ... overoptimistic.)
In the current political climate, any claim that the Internet has made information universally available is hopelessly naive. And the veracity of the information that is available is, at best, mixed.
This is not to say that scientists would rely on sources such as Wikipedia as their sole source of information. Even so, statistical modeling is not a new science. If the emerging massive data cloud makes this kind of research an increasingly important scientific tool, it is cause for optimism.
However, anybody who claims that his/her hypothesis does not require testing, verification and review--or that scientific hypotheses in general have become obsolete--cannot be taken seriously.
That would be a typical Wired overstatement.
One aspect nobody seems to have addressed is that although the "Google method" cannot replace science in the sense of being a substitute for the scientific method, it may become the prevalent method for psychological and social reasons.
First notice that not all scientists are created equal. Only a handful of us are in a position to create new theories or advance existing ones using data collected by ourselves or others. The bulk of scientific research is devoted to the collection and publication of (more or less) valuable data. Just look at the scientific papers! Most of them may be summarized this way: "We studied this and this and measured or calculated these and these, and maybe found some correlations." Of course to do this we must know our field of research well enough to be able to determine what kind of data should be collected and how. But as the usage of the "cloud" becomes ubiquitous it may seem more economical to just measure everything possible and then find the correlations using an AI or ANN. In that way we only need to pay a bunch of cheap "sciworkers" who are able to handle the equipment. This means cheaper education and lower costs. And more people can work in "science", which makes good statistics. And hey! Even those with poorer abilities can do it! This will not advance our understanding of the Universe, that's true, but who cares if the technical advancements still continue?
And there are other factors to consider.
Even now nobody is able to learn everything there is to know in any field of science or technology. And the situation will worsen. We cannot overcome this by simply prolonging the duration of education. The "cloud" seems to be a solution to this problem. Just take it a step further by automating the data acquisition too.
Nowadays more people are interested in esoteric pseudoscience than in science, because it's easier to digest, doesn't require hard work and gives instant answers to all problems. And we want answers right now! The "cloud" seems to be the perfect solution for this problem too. It's easy to believe that if we have enough correlated data the computer can give the correct "answer" to all of our queries without too much work on our part. So why pay scientists?