Why the Cloud Cannot Obscure the Scientific Method
aproposofwhat noted Ars Technica's rebuttal to yesterday's story about "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." The response is titled "Why the cloud cannot obscure the Scientific Method," and is a good follow-up to the discussion.
Because a datasource isn't a process?
http://arstechnica.com/news.ars/post/20080625-why-the-cloud-cannot-obscure-the-scientific-method.html
I like the fact that the web and search/aggregate engines may combine vast amounts of data in ways we cannot now imagine - it expands the field for new scientific research enormously. Replace science? No.
accept no limits but time
The author's head is completely up in the clouds...
Posting with out proof reading since 2001.
Latest addition to bullshit bingo cards:
CLOUD
I'd say that the models are the science. They're how you explain your data. They provide evidence that the experiments make sense, and they guide you by making predictions you can test.
Moreover, SIMPLIFIED MODELS are good science. Understanding which details can be omitted without impacting the predictive ability of your model shows you know which effects are important and which aren't.
A large source of data that has a correlation does not somehow imply causation. Even if it works under some conditions (or even all conditions). The science happens when the causation is determined and then applied.
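A toy sketch of that point, under made-up assumptions: a hidden variable `z` (hypothetical, not from any real dataset) drives both `x` and `y`, so the two correlate strongly in observed data even though neither causes the other. Setting `x` independently, which is roughly what an experiment does, makes the correlation vanish. The noise levels, sample size, and seed are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hidden common cause (an assumption of this sketch).
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: x and y both depend on z, not on each other.
x_obs = [zi + rng.gauss(0, 0.3) for zi in z]
y_obs = [zi + rng.gauss(0, 0.3) for zi in z]
r_observed = pearson(x_obs, y_obs)

# "Experiment": we set x ourselves, independently of z. y still follows z.
x_set = [rng.gauss(0, 1) for _ in range(n)]
y_after = [zi + rng.gauss(0, 0.3) for zi in z]
r_intervened = pearson(x_set, y_after)

print(f"correlation in observed data:   {r_observed:.3f}")
print(f"correlation after setting x:    {r_intervened:.3f}")
```

The observed correlation is real and even useful for prediction, but it says nothing about what happens when you change `x` yourself. Telling those two cases apart takes a hypothesis and a controlled test.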
All models are wrong, but some are useful.
We still need scientific methods to develop useful models and to understand and refine the existing models. When Newton defined his mechanics, that was the state of the art in his era, and now we have progressed to quantum mechanics, which might be refined tomorrow.
But mere observation of some phenomena is not sufficient to postulate the behaviour in a changed condition. A scientific model and its rigorous application are required for this. Correlations drawn from the cloud cannot substitute for it.
gopla
The point of the last story was horribly miscommunicated. There were two main points. The first is that data is expanding in such scope that hierarchical organization systems don't work. The second is that we're approaching a time where the analysis of data to show causation will come from correlation, because you can determine all the variances once all the variables have been accounted for. Look at the Human Genome Project or Folding@home. I don't think this is completely true, but let's not bash the idea or miss the point just because the original author's a complete bumbling moron.
Oh honey look... How cute... an angry slashdotter!
In general I'm right behind the rebuttal. However, John Timmer chooses a very bad real-life example as his rebuttal champion.
He asks: ...would Anderson be willing to help test a drug that was based on a poorly understood correlation pulled out of a datamine? These days, we like our drugs to have known targets and mechanisms of action and, to get there, we need standard science.
These days we may like our drugs to have these attributes, but very often they don't. There are still quite a few medicines around that clearly work and are prescribed on that basis, but for which there is only the haziest evidence as to how exactly they work.
The good thing about the scientific method, however, is that it gives us a framework to investigate these drugs' actions - even if the explanation is still currently beyond us.
Truly, the whole reason someone like Mr. Anderson could claim the end of science because of data is that he is a writer, a thinker, and in large part a businessman. Businessmen do not think about science and how to use it to come up with a method that produces a conclusion. He uses information to come up with ways to elicit a reaction in people. So to him data is more important than science because he uses it for his purposes. That is marketing, and the "science" of marketing has almost always been that way.
/. this article is as cogent a rebuttal as one can make.
Mr. Anderson was not prescient in any way; he was just speaking from his perspective. The only thing is that we must be careful about even considering his proposition as a valid reality worth pursuing. Not for true scientists, but from a social perspective, or it will truly be the end of science. There are already some in power attempting to make this happen.
That said, I almost consider responding to yesterday's article as falling for the argument. But, since it hit the
...and it should be known by now
And I can back up this rebuttal with a practical example. I am a physicist; I know sod all about blood samples, or proteins, or cancer. I get a pile of mass spec data (about a billion data points or so on some days) and through binning, background subtraction, and a string of other statistical witchcraft I produce a set of peaks labeled according to intensity and significance.
This does not make me a cancer researcher. This data has to go back to the cancer guys and they have to pick out the biomarkers and thus develop new diagnostic tests, based on principles that I don't understand. I am master of the information but entirely blind as far as the science is concerned. Same goes for Google.
If we can put a man on the moon, why can't we shoot people for Apollo-related non-sequiturs?
When I read the original article my thought was that someone was just trying to write something to get noticed. The scientific method, IMHO, is all about a person or group of persons using a logical process to determine the validity of an idea. Observing massive amounts of data can reveal relationships that may not have been noticed in other ways, but at the end of the day the process of "I think X, I wonder if it is true", the heart of the scientific method, can no sooner become obsolete than we can stop being human. The questions of What, Why and How are so fundamental to humans as humans that nothing short of total omniscience will ever replace the logical process represented by the scientific method.
What you say is true, Hoplite3. The big issue I see is how people define "model". My guess is that quite a few unfortunately define it as "I got 3 asterisks in the significance test", whether the "model" (say, linear regression) makes sense or not.
I forget where I read it, but I've been studying linear regression, and there was a fascinating example where, if they had used linear regression techniques on the early "drop the cannonball and time its fall" data, they would have come up with a nice, highly significant linear regression for gravity.
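A rough sketch of how that can happen, using synthetic numbers rather than the example from whatever I read: generate noise-free fall distances from d = g*t^2/2 over a narrow range of times (1.0 s to 2.0 s and g = 9.8 m/s^2 are assumptions of this sketch), fit a straight line by ordinary least squares, and look at R^2. It uses R^2 as a stand-in for "significance", and the helper `fit_line` is just an illustrative name.

```python
G = 9.8  # m/s^2, assumed value for this sketch

# Synthetic "measurements": fall times from 1.0 s to 2.0 s, no noise.
times = [1.0 + 0.1 * i for i in range(11)]
dists = [0.5 * G * t * t for t in times]  # the true law is quadratic in t

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept; returns R^2 too."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

slope, intercept, r_squared = fit_line(times, dists)

# Extrapolate the "excellent" linear fit well outside the measured range.
t_far = 5.0
predicted = slope * t_far + intercept
actual = 0.5 * G * t_far * t_far
relative_error = abs(predicted - actual) / actual

print(f"R^2 of the linear fit: {r_squared:.4f}")
print(f"at t = {t_far} s: linear fit says {predicted:.1f} m, true value {actual:.1f} m")
```

Inside the measured range the straight line looks like a great model, with R^2 above 0.99. Push it out to 5 seconds and it is off by nearly half. The statistics alone never tell you the model has the wrong form.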
Then there is the whole issue of explanation versus prediction. Something can be predictive while providing no explanation, and perhaps that's where the petabyte idea is going: who cares about explanation if prediction is accurate enough? (Not my philosophy, BTW.)
I have always viewed this debate in the context of scientist vs. engineer. That is, one who views data as "good and true" vs. "good enough". That's not a slam on engineers (I am one), but a reflection of the balance between the two. A scientist who never applies theory sits in an empty room. An engineer who builds things without science sits in a cluttered room surrounded by useless objects.
I do find it interesting, though, that the advent of "google data" may indicate a flip in the order of the two disciplines. Historically (IMHO) science has led engineering. A theoretical breakthrough, provable by the scientific method, may take years to give birth to a practical application. Now, with enormous piles of data and the knowledge that "good enough" is often good enough, we may be creating useful objects that will take science many years to explain and model.
The biggest issue and omission in both of these pieces is that this "cloud" of data does not represent "truth" (as the scientist may seek), but rather a summation or averaging of the "perception of truth" as seen by the individual authors. The cloud, therefore, is only as useful as humans' ability to divine truth without the scientific method.
My two cents. :)
I have a problem with the Google generation. Sure, they can parrot facts and find things in an instant, as can any slashdotter I'm sure, but knowing something is not the same thing as understanding something.
A coworker asked me yesterday "how do you call a C++ class member function from C [or Java]?" The question is an example of pure ignorance.
If they "understood" computer science, as a profession, this would be a trivial question, like how do I or can I declare a C function in C++. The second question is what google can help you with while having to ask the first question means you are screwed and need to ask someone who understands what you do not. Not understanding what you do for a living is a problem.
How programs get linked, how environments function, virtual machines vs pure binaries, etc. These are important parts of computer science, just as much as algorithms and structures. You have to have a WORKING knowledge of things, i.e. an understanding.
Google's ease of discovery eliminates a lot of the understanding learned from research. Now we can get the information we want, easily, without actually understanding it. IMHO this is a very dangerous thing.
He makes statements about treatments, causes, and outcomes as if they were God-given truths proven to the world beyond all doubt. In truth, medicine seems to this mathematician to be a field governed solely by statistical correlation, with next to no concern over (a) what the actual cause is, or (b) testing the hypothesized cause in any meaningful way. I've read study after study that goes through a wonderfully presented statistical analysis to conclude that such and such drug works well at treating such and such symptom; they then close with a couple of paragraphs as to why (they think) the drug is working, often without using any qualifiers such as "we don't know but our guess is..." or "it would be nice to find out if it is ...."
To the vast majority of practicing physicians I've met, "cause" just doesn't seem to be the important question. Which I think is why things happen like my pharmacist declaring that two drugs prescribed by my doctor are going to cancel each other's effects, or why I take a drug to treat a painful toenail and end up with bleeding in my stomach.
Science and openness go together.
Without openness, we all are reinventing private wheels, which we destroy the plans to when there is no profit.
If you work in software, consider for a moment how scientific your work is, considering the work of other companies doing similar work.
This Clouds thing is the "billion monkeys/humans typing on keyboards" model.
Yes, it really can work (with humans).
But, as with science, the chaos development model only works with openness.
Of course, organized science along with a little chaotic development would work even better.
There are forces in our society that do not like any open model. The Microsofts, the MPAA, the RIAA. These types of organizations thrive on closed models. More copyright controls, more DRM, longer copyright and patent terms.
These forces would prefer to own, control and close science and clouds of data. They are unaware of the inevitable impact of such actions.
In a free capitalist society, we are naturally driven by contrary forces.
A desire to hide discoveries, to maximize profits, even at the expense of innovation.
A desire to share discoveries, to contribute to society and for credit.
While it is possible to profit when ideas are shared,
It is more difficult to contribute to society by hiding information indefinitely.
While he does a good job showing that science itself isn't going away, he actually lends credence to the position that cloud computing implies a lot of useful information will be generated outside of science. Moreover, he also might be supporting the position that science isn't necessarily going to catch up and explain this data any time soon. So, the "strong" position, that Google makes science irrelevant, is naturally false. But the "weak" position, that Google represents a new kind of inquiry that is going to be increasingly used and relevant, seems intact and supported. So cheers to Google and science, HJS
I think the consensus is that the original article is a bit presumptuous and flawed. He says that science will be replaced, which implies that there is a hardened definition for how science is to be performed currently, which there isn't. There is no ONE definition of science or the scientific method.
From a junior high school site about the scientific method:
"Six steps of the S. M.
1. State the problem: Why is that doing that? Or Why is this not working?
2. Gather information: Research problem and get background info
3. Form a hypothesis: a possible explanation for the problem using what you know and what you observe.
4. Test the hypothesis: Make observations, build a model and relate to real-life or experiment.
   Experiment: testing the effects of one thing on another using controlled conditions.
   Variable: a quantity that can have more than a single value. (Dependent vs independent)
   Constant: a factor that does not change when other variables change.
   Control: the standard by which the test results can be compared
5. Analyze data: recording data and organizing it into tables and graphs.
6. Draw conclusions: based on your analysis of your data, you decide whether or not your hypothesis is supported."
This "cloud" is just a buzzword for massive amounts of data collected for no good reason other than to collect it, i.e. before you form a hypothesis. Using this junior high model, a hypothesis is created from observation (seeing a correlation in the data), then you go back to the data or collect more data to prove or disprove that hypothesis.
Massive amounts of data and algorithms that sift through it are TOOLS in the box for performing the scientific method. They don't replace it.
I think his argument would be better if he stated that these tools, in certain cases, allow you to reasonably prove and create a hypothesis in a single step.
I had a nice example of the complete inadequacy of Google's thought-agnostic approach to links while browsing around looking for information on Samba and FUSE under Linux. Google's ad bars, completely misinterpreting the context, offered links to fuse boxes, as in wiring, and Samba lessons, as in dancing. But then, maybe I'm not giving Google enough credit. It might have actually recognized the pointlessness of trying to market software to a Linux user, and taken the obvious step of throwing in some complete non sequiturs in the hopes of catching something of value.
"Because it came from WIRED," should have been enough reason to discard this bullshit from day one. Why not ask some REAL scientists in a REAL peer reviewed scientific journal about what the "cloud" is doing instead of letting a bunch of insular technophiles indulge in masturbatory fantasies about how their "culture jamming" is "shifting paradigms" all while convincing themselves the same shit wasn't going on in the 60's, 70's, 80's and fucking 90's, and is indeed the sort of thing that led to WIRED's kind in the first fucking place. If science and its titular method could both create and survive the atomic bomb, radar, TANG and LSD, it can certainly handle a fucking "cloud" of bits.
Wasn't this all demonstrated 100 years ago by Francis Galton and an Ox? What's new is that there are more data points and better techniques to identify interesting correlations. Probably this is what we do internally anyway. All of our sensory input is correlated and the interesting bits are filtered out by specific algorithms trained by evolution. What is fascinating to many are the times when these algorithms are spectacularly wrong.