Google Begat the End of the Scientific Method?

Ahem by Anonymous Coward · 2008-06-25 03:44 · Score: 5, Insightful

The content is compelling. It notes that we've entered the Age of the Petabyte - where one can collect intense amounts of data that is paradigm agnostic. It goes on to add a comment from the head of Google's R&D, that we need an "update to George Box's maxim: 'All models are wrong, and increasingly you can succeed without them.' Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?" I believe I speak for not a few of us when I respond:

WTF?

English, ---, do you speak it?

Re:Ahem by smallfries · 2008-06-25 03:54 · Score: 5, Insightful

I used to think that I could translate most dialects of bullshit into english but this threw me off guard. The most reasonable explanation is that Chris Anderson is a tool and doesn't know what he is talking about.
For example, data is now "paradigm agnostic". Seriously, wtf? When was data ever not "paradigm agnostic" and when did we develop the need for a term to describe it? Data is data. It is raw, and unanalysed, and as such the notion of a paradigm is completely irrelevant.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Ahem by eln · 2008-06-25 04:00 · Score: 5, Interesting

It's simple really: The article seems to be saying that we have access to such a ludicrously large amount of data that trying to draw any real meaning from it is pointless. So, we employ a "shotgun" approach at reading the data, and voila, we get data that at least appears to be interesting.
Of course, since we have no particular purpose in mind when we do this, and no particular method other than "random", we end up with mostly useless data (in the example given, we have a bunch of random gene sequences that must belong to previously unknown species, but we know nothing about those species other than that we found some random DNA that probably belongs to them, and have no particularly good way of finding out more).
The article seems to be saying that since we have so much data, we can now draw correlations between different pieces of data and call it science. No reason is given why this is useful other than that we have so much of it, and Google is somehow involved. Apparently when you have enough data, "correlation does not equal causation" is no longer true. Again, no coherent reason is given for this stance.
I think the article makes the same mistake a lot of ill-informed people that get excited by big numbers make: It seems to believe that data is in and of itself an end goal, when really vast amounts of data are useless unless they can help us as humans answer questions that we want answered. Yes, knowing that there are lots of species of organisms in the air that we didn't know about before is sort of interesting I guess, but it doesn't really tell us anything useful.
Above all, the article proves that you can be almost entirely incoherent and still get your article published in Wired if it says something about how Google is changing the world.
Re:Ahem by clang_jangle · 2008-06-25 04:06 · Score: 4, Funny

Data is data. It is raw, and unanalysed, and as such the notion of a paradigm is completely irrelevant.

Well, we already know it wants to be free, so maybe now it's just exercising its sentient status in other areas.

--
Caveat Utilitor
Re:Ahem by Anonymous Coward · 2008-06-25 04:09 · Score: 5, Informative

Each claim the others data is unsound by the paradigm's umbrella it falls under.
No, each claim the other's theory is wrong.
Nobody (sane) refutes the existence of ring species, or refutes microevolution, or other observable forms of data. The only thing in dispute in the controversy is "species are species because they were made that way" versus "species are species because after some really big N evolutionary steps they become that way".
Re:Ahem by commodoresloat · 2008-06-25 04:15 · Score: 4, Insightful

Well, in the abstract data may be "paradigm agnostic," but the selection of data one has access to at any given time is inevitably not. Which data you choose to collect, how much of it you collect, which data you ignore - these are all decisions that are ultimately subjective. (BTW I think this is probably true even in the age of google but his point is that one is now collecting, storing, and accessing so much data and the "paradigm" influencing those decisions is not a specific scientific theory or point of view.)
Re:Ahem by nine-times · 2008-06-25 04:19 · Score: 5, Interesting

Yeah, I don't know what "paradigm agnostic" means specifically, but I think it's a mistake to think that "data is data".
Not all data is created equally. You have to ask how it was collected, according to what rules, and with what purpose. I can collect all sorts of data by stupid means, and have it be unsuitable for proving anything. It's even possible that I could collect a bunch of data in an appropriate way, accounting for the variables which matter for my particular experiment, and have that data be inappropriate for other uses.
Of course, if what's intended by "paradigm agnostic" is that we no longer pay attention to those things, then I hope we're not becoming paradigm agnostic. I'm just bringing this up because I think some people think numbers don't lie, and that when you analyze data, either your conclusions will be infallible or your analysis is flawed. On the contrary, data can not only be bad, but it can be inappropriate.
Re:Ahem by melikamp · 2008-06-25 04:47 · Score: 5, Funny

I used to think that I could translate most dialects of bullshit into english
Piping TFA to bs2english yields:
Google is a great place to work, and an even better place to invest money in. Go Google! P.S.: buy Google stock.
Re:Ahem by LilGuy · 2008-06-25 04:48 · Score: 5, Insightful

I'm glad slashdot linked it. I read this the other day and had no idea what to make of it. After the first 20 comments I see I'm not completely retarded.

--

You're nothing; like me.
Re:Ahem by JeanPaulBob · 2008-06-25 05:04 · Score: 5, Informative

In the minds of some Creationists, science is itself defective because it only deals with natural phenomena.
Psst. It doesn't. It deals with phenomena about which (or based on which) we can make measurable, testable predictions.

If your methodology for evaluating a theory requires classifying it by abstract metaphysical concepts like "natural" and "supernatural", then you're a step away from the scientific method of "experiment".
Re:Ahem by Randle_Revar · 2008-06-25 05:14 · Score: 4, Interesting

Just undoing a slip of the mouse moderation.
That's one disadvantage of the current mod system - no chance to fix mistakes

--
Climate Progress - Hell and High Water
Re:Ahem by ArhcAngel · 2008-06-25 05:26 · Score: 4, Funny

I'm not completely retarded.
The data is inconclusive. Let me see what I turn up on a Google search.

--
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
Re:Ahem by sm62704 · 2008-06-25 05:56 · Score: 4, Insightful

Information doesn't want to be free. But when it isn't, neither are you.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Ahem by sm62704 · 2008-06-25 06:02 · Score: 4, Funny

Not all data is created equally. You have to ask how it was collected, according to what rules, and with what purpose
I wear a goatee as a result of a small study.
Several years ago, after my marriage unravelled and I got divorced and couldn't so much as get a dinner date, I decided "fuck it, why do I bother buying razors?" and simply stopped shaving.
Then one night in a bar a woman told me I should shave it into a goatee. So I started asking women "goatee or full beard?" and collecting the binary (y/n) data. Of seventeen randomly selected women aged 21 to 70, sixteen said "goatee". The one who said "full beard" was standing beside her boyfriend, who wore a full beard.
My losing streak ended, thanks to pseudoscience!

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest

WTF indeed by GameboyRMH · 2008-06-25 03:46 · Score: 5, Insightful

I saw the article yesterday, but it was so WTFey I just moved on...definitely not Slashdot submission material (especially being a Wired article).

--
"When information is power, privacy is freedom" - Jah-Wren Ryel

Re:WTF indeed by eggoeater · 2008-06-25 03:50 · Score: 5, Funny

"WTFey"
I hadn't seen WTF adjective-ised before, but I love it... there's just so much I can use it with. In fact, I gotta go now and tell my boss how my project is going....

--
$7.95/mo, 200 GB disk, 2TBxfer, MySQL, PHP, RoR.
Re:WTF indeed by mrchaotica · 2008-06-25 04:02 · Score: 5, Funny

adjective-ised

And I hadn't seen adjective verbed!

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:WTF indeed by MightyMartian · 2008-06-25 04:14 · Score: 5, Funny

It reads like some sort of brain-damaged new-age technohippy tripe. Yeah, we don't need methodologies any more, because, maaaan, we've got tubes! Gimme a break.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:WTF indeed by m.ducharme · 2008-06-25 05:00 · Score: 4, Funny

This thread is cromulent.

--
Rule of Slashdot #0: You and people like you are not representative of the larger population. - A.C.
Re:WTF indeed by dmbasso · 2008-06-25 05:02 · Score: 5, Funny

And I hadn't seen anything, I'm blind you insensitive clod!

--
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com

Not quite by edwebdev · 2008-06-25 03:48 · Score: 4, Funny

Until cells, molecules, atoms, and subatomic particles start publishing blogs, the scientific method will remain useful.

So... by dunnius · 2008-06-25 03:49 · Score: 5, Insightful

So everything possible has been researched now and therefore no more research is necessary since it will all be on the internet? Ridiculous!

Re:Definitions by Itninja · 2008-06-25 03:52 · Score: 4, Insightful

A bit OT here, but don't forget 'wisdom' after intelligence. So many people stop at intelligence.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.

How bout no by Anonymous Coward · 2008-06-25 03:53 · Score: 5, Insightful

Um, no. Claims like this demonstrate a lack of understanding of what a model is.

From the perspective of physics, the universe is just a massive amount of data--more data than any single human can comprehend at once. But thanks to the models of Newton we have a set of relatively simple equations that describe, generally, the way bodies in the universe interact. The model is not perfect, but it is useful.

Likewise, Google uses a very explicit model to describe the universe of the web: some pages are more relevant to a given search query than others, and these pages will generally be more 'popular' among other important pages. Again, the model is not perfect, but it is useful.

The fallacy is that somehow what Google is doing is a paradigm shift. It's not. It's just applying the same kind of scientific method to a type of data that hadn't existed before.

What, I think, the article is really trying to say is that Google's data is so massive and complex that we can't ascribe any explanation to the results it gives us. First of all, that is false, because the PageRank algorithm in its simplest form does give us a very explicit explanation (popular pages generally return better results). But even if it were true, Newton faced the same kind of accusations when people called his model of the universe 'Godless' and claimed, for example, that he described how gravity works without actually explaining "why" it works like it does. And that accusation is always with science. There are always more questions raised than answered. This is nothing new.
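
For anyone who wants to see just how explicit that model is, here is a minimal sketch of the simplest textbook form of PageRank: power iteration with a damping factor. The toy link graph, the function name and the 0.85 damping value are assumptions made up for illustration; this is not a claim about what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    # Every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Baseline share: the "random surfer" jumps to any page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

The page that everyone links to comes out on top, and the page it links to inherits most of that weight. That is a model with a stated assumption (links are votes), not raw data speaking for itself.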

Don't rule science out it. by russotto · 2008-06-25 03:53 · Score: 5, Insightful

The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart. Perhaps the best is when he presents, as an example of this new "model-free" approach, a program which includes "simulations of the brain and the nervous system". Uh, hello... a simulation IS a model.

Re:Don't rule science out it. by feed_me_cereal · 2008-06-25 04:04 · Score: 5, Funny

He didn't bother writing more than one rambling page because he figured someone said it better somewhere else on the internet and that we're all bound to find it.

--
"Question with boldness even the existence of a god." - Thomas Jefferson
Re:Don't rule science out it. by ColdWetDog · 2008-06-25 04:08 · Score: 4, Interesting

The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart.

I suppose you could start where he, again, tries to present the argument that correlation really is "good enough" - causation be damned. What he is blathering on about is that you can infer lots of things via statistical analysis - even complex things. That's certainly true. Where he fails (and it's an EPIC fail) is his assertion that this method is a general phenomenon, suitable for everyday use.
The other major failure of TFA is that I can't find a car analogy anywhere.

--
Faster! Faster! Faster would be better!
Re:Don't rule science out it. by JustinOpinion · 2008-06-25 04:11 · Score: 5, Insightful

it's such a rambling mess it's hard to know where to start picking it apart.

Agreed. I want to do a line-by-line rebuttal... but I fear that would be a waste of time.

The article does not make a compelling point. It keeps saying that we can give up on models (and science), because now we just have lots of data, and "correlation is enough." What utter BS. Establishing a correlation is not enough. Even if it is predictive for the given trend, it doesn't allow us to generalize to new domains the way a well-established scientific model does. If an engineer is designing a totally new device, that goes above and beyond what any established device has done, what data can he draw upon? If there is no mountain of data, he must rely on the tried-and-true techniques of engineering/science: use our best models, and predict how the new device/system will behave.

The article actually makes this point perfectly clear when it says:
Venter can tell you almost nothing about the species he found.
Indeed. Merely having tons of data doesn't actually give you insight into what you have measured. You must distill the data, pull out trends, and construct models. I just don't see how having mountains of data about a species, but still being unable to answer simple questions about it, is superior to conventional science (which can answer questions about the things it has discovered).

A deluge of data and data-mining techniques is a boon to science. But I don't see the benefit of giving up on the remarkably successful strategy of constructing models to explain the phenomena we've observed. I somehow doubt that having 20 petabytes of data on electron-electron interactions is more useful than having a concise theory of quantum mechanics.

My Start menu has been Googled by spyrochaete · 2008-06-25 03:55 · Score: 4, Insightful

I am definitely a victim of this "Google effect". Search makes me lazy.

For example, for years I would pride myself on my well-tended Windows Start menu. I'd create base categories for my application folders like Hardware, Games, and Internet, and move applications into those folders to keep my Start menu manageable. I blogged about this procedure and included a screenshot.

Now that I'm using Vista I have little need to be so organized. I rarely have to navigate manually to an application folder thanks to the embedded search box on the Start menu. So now my Start menu is a huge clutter, but so what? I see that exercise as being as futile as dusting the cardboard boxes in the attic.

What question do you ask the data. by xzvf · 2008-06-25 03:56 · Score: 4, Insightful

Searching data is a tool. You still need to have insight to formulate a theory, develop a test for the theory, and ask the data pool the right (non-leading) question. Then evaluate the data looking for both proof and disproof of the theory and be smart and ego neutral enough to let the data suggest a new theory, test and question. Don't confuse a new and useful tool that makes insight easier, with the ability of humans to have that insight.

Google =/= scientific method by Rubikon · 2008-06-25 03:57 · Score: 5, Informative

That an incredible amount of data exists on any given topic does nothing to describe relationships, causality, precision, accuracy, distribution, correlation, or anything else. Data is information, and information must be processed in order to make it meaningful. Additionally, everything that's written, printed, published, etc, is not necessarily true, accurate, precise, etc.

If anything, the Google phenomenon demands more rigorous examination by accepted methods.

The preceding message has been brought to you by Captain Obvious and the letters O,R,L,Y.

No. by qw0ntum · 2008-06-25 03:58 · Score: 4, Insightful

First, not everyone has access to vast clouds of information due to expense and I don't think that's going away any time soon. So we'll still get to understand what's going on around us and not just rely on regression analysis to inform our every decision.

Second, in my experience with large sets of data, you can do all kinds of math to them to bring out interesting relationships but someone with domain expertise is going to have a much better insight into what the data is saying than someone who doesn't. It seems the peak of hubris to think that the techniques taught in every science (social, hard, or otherwise) are worth nothing compared to massive amounts of data. How do you know where to get the data from? How do you apply the data?

I don't think it's quite time to throw out "correlation != causation". In fact, I think now more than ever we need to be able to understand underlying phenomena behind the data precisely because there is so much of it. With so much data, coincidental correlation is going to happen quite often I'm sure.

And, of course, the ultimate reason we need to understand things is for, you know, when the cloud's not there.

--
'Every story, if continued long enough, ends in death.' --Ernest Hemingway

Wrong by DogDude · 2008-06-25 03:59 · Score: 5, Insightful

This is typical web 2.0 hype... more is better. Which, as anybody who has used Wikipedia knows, is utter bullshit. The scientific method can't be supplanted by a large amount of questionable data. Tons and tons of bad data is still bad data. It doesn't get any more correct just because there's more of it.

--
I don't respond to AC's.

Interesting, ranty, and wrong by xPsi · 2008-06-25 04:01 · Score: 5, Insightful

A thought-provoking piece written by someone who understands neither the scientific method nor Google. Who doesn't understand the difference between a Theory and a model. Who still doesn't get correlation!=causation. Who probably has never had to actually analyze any substantial amount of data before. And who has clearly been raised on a self-important intellectual diet consisting of too much Buckminster Fuller, Kurzweil, Frank Tipler, and Derrida. I'm sure there are some kernels of insight buried in there someplace, but I'm just not clear what they are. If his rant is indicative of the future direction of science, we're all doomed.

--
i\hbar\dot{\psi}=\hat{H}\psi

Re:Definitions by gnick · 2008-06-25 04:04 · Score: 4, Insightful

don't forget 'wisdom' after intelligence. So many people stop at intelligence.

From what I've seen, it's not completely a progression from one to the other. I've met people who I would describe as 'knowledgeable', 'intelligent', or 'wise' without possessing either of the other attributes. Those traits are often coincidental and one can help beget another, but it's far from a hard set 'intelligent'->'knowledgeable'->'wise' progression.

--
He's getting rather old, but he's a good mouse.

Biggest Data Collector LHC relies on Models by markk · 2008-06-25 04:04 · Score: 4, Insightful

I thought this was a joke at first. One thing to think about is that the biggest data collector of them all, the Large Hadron Collider, which fits the frame given perfectly - delivering terabytes of data in huge data sets - is just the opposite of the described scenario. Models are crucial to picking what data is actually recorded. In fact, a large part of how good the LHC data will be lies in using models to select what events to capture. The way the data is captured is of course also based on long effort and knowledge from previous detectors. This isn't just randomly, or even generically selectively, gathering data and then analyzing it. This is targeted data gathering based on complex scientific theories. There have been shouting matches over what to tag for collection based on what people think is important for a given theory - and these will happen again.

As our collection abilities rise exponentially, the storage and analysis abilities are not exponentially growing, even though they are increasing at a fast rate! I would argue exactly the opposite of what this article said. We are going to be more and more dependent on our current scientific theories to even be able to choose appropriately the rich data that new sensors and techniques will let us collect. That is, we are more and more dependent on our scientific theories when we get data, not less. Did we even know to get methylation data when sequencing a genome? How about some other "ylation"? Without background theory and experience we wouldn't even know some of that stuff was there to collect!

Just to clarify by GameboyRMH · 2008-06-25 04:08 · Score: 5, Insightful

To avoid the same fate as the GP, let me clarify that by WTFey I specifically meant that the article was full of fluff, light on details and generally pointless...which makes me think "WTF." The closest thing to a point I could get from the article was "Nice big blobs of data can be useful, and statistical data based on said blobs could replace the results of scientific research." Mmmkay.

A sensational headline leading to a rather pointless article consisting mostly of fluff: WTF.

--
"When information is power, privacy is freedom" - Jah-Wren Ryel

Re:Just to clarify by javilon · 2008-06-25 05:14 · Score: 4, Interesting

Well, I think the point they make is that with these kinds of mathematical tools running against these huge sets of data, you get models out that you couldn't have thought of. This is real AI. During the last few days we had entries here on Slashdot about how AI is not advancing, but this kind of thing is very advanced AI and it is new.
I'll explain myself. The biggest job that a brain does (let's not consider a human brain so we don't get into the consciousness/mind type of conversation) is to find statistical correlations in the input data and extract models from these correlations that can be used to predict the future. This is exactly what these tools are doing.
Before these tools, by looking at the data you would go: mmmm, this is interesting, let's check it out. That is, you would come up with a model and try to find out if it predicts the data. Then we started to use computers to check our models, and from what this WTFey article says, it is now the computer that comes up with the model, starting from raw data.

--

When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."

The Paradigm is the Data Subset by fictionpuss · 2008-06-25 04:15 · Score: 5, Insightful

The paradigm is embedded in the quantity, or subset, of data you choose to analyse.

For example, to detect stress you might traditionally measure heartbeat, skin conductivity, pupil dilation.

In the "petabyte age" you throw in the number of times the subject uses the letter 's'; how frequently they use the 'reload' button on the browser; what colour of pants they wore last Tuesday; Pepsi vs. Coca-Cola; the number of times they picked their nose in 1997 and any and every other bit of data you have on the subject.

In the "petabyte age", most of the data you sift through will show no correlation, but you have a much better chance of finding the unexpected if indeed, there is some unknown factor out there.

Re:The Paradigm is the Data Subset by kurthr · 2008-06-25 04:38 · Score: 5, Insightful

Don't you run a much higher probability of finding high correlation by chance?
I can expect to find a result that matches my model to 95% certainty about 5% of the time in random data. You can correct for this, but it's against human nature because people like to see the face of Mary in toast.
Learning how to look for correlation in huge uncontrolled data sets will require a new paradigm... or it will ultimately be useless and perhaps even unsuccessful.
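
A rough way to see the "5% of the time in random data" effect for yourself: the sketch below correlates a pure-noise target against thousands of pure-noise "features" and counts how many clear a significance cutoff. Everything here is made up for illustration. The function names, the sample sizes and the seed are assumptions, and the 0.361 cutoff is the approximate two-sided 5% critical value of Pearson's r for 30 samples.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(n_features, n_samples, threshold, seed):
    """Count noise features whose |r| against a noise target exceeds threshold.

    Nothing here is related to anything else, so every hit is a false alarm.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    # Approximate 5% cutoff for 30 samples (assumed value, see lead-in).
    loose = count_spurious(2000, 30, threshold=0.361, seed=0)
    # A much stricter cutoff, standing in for a multiple-comparison correction.
    strict = count_spurious(2000, 30, threshold=0.7, seed=0)
    print("hits at 5% cutoff:", loose)
    print("hits at strict cutoff:", strict)
```

With 2000 unrelated features you should expect something on the order of a hundred "significant" correlations at the 5% cutoff, and far fewer once the cutoff is tightened. The more columns you throw at the problem, the more faces of Mary you find in the toast unless you correct for it.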
