Google Begat the End of the Scientific Method?

Ahem by Anonymous Coward · 2008-06-25 03:44 · Score: 5, Insightful

The content is compelling. It notes that we've entered the Age of the Petabyte -- where one can collect intense amounts of data that is paradigm agnostic. It goes on to add a comment from the head of Google's R&D, that we need an "update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them." Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?" I believe I speak for not a few of us when I respond:

WTF?

English, ---, do you speak it?

Re:Ahem by smallfries · 2008-06-25 03:54 · Score: 5, Insightful

I used to think that I could translate most dialects of bullshit into english but this threw me off guard. The most reasonable explanation is that Chris Anderson is a tool and doesn't know what he is talking about.
For example, data is now "paradigm agnostic". Seriously, wtf? When was data ever not "paradigm agnostic" and when did we develop the need for a term to describe it. Data is data. It is raw, and unanalysed, and as such the notion of a paradigm is completely irrelevant.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Ahem by Anonymous Coward · 2008-06-25 04:00 · Score: 2, Insightful

"For example, data is now "paradigm agnostic". Seriously, wtf?"
Just look at the creation evolution controversy, to see how data is not 'paradigm agnostic'. Each claim the others data is unsound by the paradigm's umbrella it falls under.
Re:Ahem by eln · 2008-06-25 04:00 · Score: 5, Interesting

It's simple really: The article seems to be saying that we have access to such a ludicrously large amount of data that trying to draw any real meaning from it is pointless. So, we employ a "shotgun" approach at reading the data, and voila, we get data that at least appears to be interesting.
Of course, since we have no particular purpose in mind when we do this, and no particular method other than "random", we end up with mostly useless data (in the example given, we have a bunch of random gene sequences that must belong to previously unknown species, but we know nothing about those species other than that we found some random DNA that probably belongs to them, and have no particularly good way of finding out more).
The article seems to be saying that since we have so much data, we can now draw correlations between different pieces of data and call it science. No reason is given why this is useful other than that we have so much of it, and Google is somehow involved. Apparently when you have enough data, "correlation does not equal causation" is no longer true. Again, no coherent reason is given for this stance.
I think the article makes the same mistake a lot of ill-informed people that get excited by big numbers make: It seems to believe that data is in and of itself an end goal, when really vast amounts of data are useless unless it can help us as humans answer questions that we want answered. Yes, knowing that there are lots of species of organisms in the air that we didn't know about before is sort of interesting I guess, but it doesn't really tell us anything useful.
Above all, the article proves that you can be almost entirely incoherent and still get your article published in Wired if it says something about how Google is changing the world.
Re:Ahem by clang_jangle · 2008-06-25 04:06 · Score: 4, Funny

Data is data. It is raw, and unanalysed, and as such the notion of a paradigm is completely irrelevant.

Well, we already know it wants to be free, so maybe now it's just exercising its sentient status in other areas.

--
Caveat Utilitor
Re:Ahem by truthsearch · 2008-06-25 04:08 · Score: 1

Above all, the article proves that you can be almost entirely incoherent and still get your article published in Wired if it says something about how Google is changing the world.
Chris Anderson is the editor-in-chief of Wired Magazine. He's the one who gets to choose what they publish.

--
Developers: We can use your help.
Re:Ahem by Anonymous Coward · 2008-06-25 04:09 · Score: 5, Informative

Each claim the others data is unsound by the paradigm's umbrella it falls under.
No, each claim the other's theory is wrong.
Nobody (sane) refutes the existence of ring species, or refutes microevolution, or other observable forms of data. The only thing in dispute in the controversy is "species are species because they were made that way" versus "species are species because after some really big N evolutionary steps they become that way".
Re:Ahem by loonycyborg · 2008-06-25 04:09 · Score: 2, Funny

the article proves that you can be almost entirely incoherent and still get your article published in Wired
And linked to on slashdot even!
Re:Ahem by commodoresloat · 2008-06-25 04:15 · Score: 4, Insightful

Well, in the abstract data may be "paradigm agnostic," but the selection of data one has access to at any given time is inevitably not. Which data you choose to collect, how much of it you collect, which data you ignore - these are all decisions that are ultimately subjective. (BTW I think this is probably true even in the age of google but his point is that one is now collecting, storing, and accessing so much data and the "paradigm" influencing those decisions is not a specific scientific theory or point of view.)
Re:Ahem by AKAImBatman · 2008-06-25 04:16 · Score: 1

Explains a lot, doesn't it?

--
Javascript + Nintendo DSi = DSiCade
Re:Ahem by tshetter · 2008-06-25 04:19 · Score: 2, Insightful

I didn't see the article really saying that "correlation does not equal causation" stops being true at some point with a large enough data set.

I saw it as saying "With so much data, you can use that as a base for preliminary research."

You then research those interesting things in traditional ways, but you have started with some sort of insight.

If you have enough images of the sky and stars, you can use the images to look for interesting things first, and then jump on a telescope or satellite when you have something solid to look for.

But to be sure, the author was selling Google is the Answer pretty hard. The application of math to problems is never a bad idea, they are doing it pretty well. And with the evolution of computers, more data and more processing are naturally going to occur.
Re:Ahem by nine-times · 2008-06-25 04:19 · Score: 5, Interesting

Yeah, I don't know what "paradigm agnostic" means specifically, but I think it's a mistake to think that "data is data".
Not all data is created equally. You have to ask how it was collected, according to what rules, and with what purpose. I can collect all sorts of data by stupid means, and have it be unsuitable for proving anything. It's even possible that I could collect a bunch of data in an appropriate way, accounting for the variables which matter for my particular experiment, and have that data be inappropriate for other uses.
Of course, if what's intended by "paradigm agnostic" is that we no longer pay attention to those things, then I hope we're not becoming paradigm agnostic. I'm just bringing this up because I think some people think numbers don't lie, and that when you analyze data, either your conclusions will be infallible or your analysis is flawed. On the contrary, data can not only be bad, but it can be inappropriate.
Re:Ahem by MightyMartian · 2008-06-25 04:20 · Score: 3, Interesting

It's an idiotic notion. We've had vast amounts of data for well over a century now, more than we can hope to fully measure and catalog in a lifetime. Everything from fossils to space probe readings to seismic measurements fill up data archives, in some cases literally warehouses full of data tapes, artifacts and paper. The way you deal with this sort of thing never changes. Providing the data is stored in a reasonable fashion, if you have a theory, you can go back and look at the old measurements, artifacts, bones, whatever and test your theory against the data. The only difference is that rather than going out and making the observations yourself, you're using someone else's (or some computer that just transmitted its data).

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Ahem by MrMarket · 2008-06-25 04:29 · Score: 1

Above all, the article proves that you can be almost entirely incoherent and still get your article published in Wired if it says something about how Google is changing the world.

...or if you are the publication's editor
Re:Ahem by Anonymous Coward · 2008-06-25 04:35 · Score: 0

The "petabyte age" thing seems to assume that all data is available - even the stuff that nobody thought was important at the time. How you get that previously unimportant data is anybody's guess.
Re:Ahem by jank1887 · 2008-06-25 04:37 · Score: 2, Insightful

Translation:
Old-way: develop physical model of how we think things work, test a few cases, refine model. New way: collect a huge relevant data set, mine the data for interrelationships, make a correlation. Correlation models replace scientific models. no more need for the hypothesis testing.
Re:Ahem by melikamp · 2008-06-25 04:47 · Score: 5, Funny

I used to think that I could translate most dialects of bullshit into english
Piping TFA to bs2english yields:
Google is a great place to work, and an even better place to invest money in. Go Google! P.S.: buy Google stock.
Re:Ahem by LilGuy · 2008-06-25 04:48 · Score: 5, Insightful

I'm glad slashdot linked it. I read this the other day and had no idea what to make of it. After the first 20 comments I see I'm not completely retarded.

--

You're nothing; like me.
Re:Ahem by jeiler · 2008-06-25 04:53 · Score: 1

No, each claim the other's theory is wrong.
Actually, there is considerable disagreement on both theory and paradigm levels of the argument. In the minds of some Creationists, science is itself defective because it only deals with natural phenomena. They view materialism and naturalism as flawed methodologies.
It probably sounds like BS to you, and it definitely sounds like BS to me, but that's where some anti-evolutionists make their stand.

--
If you haven't been down-modded lately, you aren't trying.
Sacred cows make the best hamburger.
Re:Ahem by JeanPaulBob · 2008-06-25 05:04 · Score: 5, Informative

In the minds of some Creationists, science is itself defective because it only deals with natural phenomena.
Psst. It doesn't. It deals with phenomena about which (or based on which) we can make measurable, testable predictions.

If your methodology for evaluating a theory requires classifying it by abstract metaphysical concepts like "natural" and "supernatural", then you're a step away from the scientific method of "experiment".
Re:Ahem by Randle_Revar · 2008-06-25 05:14 · Score: 4, Interesting

Just undoing a slip of the mouse moderation.
That's one disadvantage of the current mod system - no chance to fix mistakes

--
Climate Progress - Hell and High Water
Re:Ahem by Randle_Revar · 2008-06-25 05:17 · Score: 1

Yes, yes it does.

--
Climate Progress - Hell and High Water
Re:Ahem by MaterialsMan · 2008-06-25 05:17 · Score: 1

Spot on. Data is data, but as soon as we start trying to analyze and contextualize it, it becomes something else. Even if pure mathematics are used to analyze a data set, the results must be interpreted, and that process will always possess some assumptions and a degree of subjectivity. Particularly in scientific experimentation, all data is the product of how it was collected. Not to say that the type of data collection and analysis proposed in the article won't be immensely useful, but it will hardly spell the end of the scientific method. If we are to invent and innovate then we need laws, principles and models to explain the world around us.
Re:Ahem by ArhcAngel · 2008-06-25 05:26 · Score: 4, Funny

I'm not completely retarded.
The data is inconclusive. Let me see what I turn up on a Google search.

--
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
Re:Ahem by atraintocry · 2008-06-25 05:28 · Score: 1

Absolutely right, but I tend to draw a line somewhere between Creationists and IDers. The IDers are pretending at science, and as such, they do not get the luxury of appealing to anything outside of the scientific method. Data or GTFO. A creationist could always just back up the point of creation until it's no longer a falsifiable claim. IDers have something specific to say (supposedly).

Of course, I don't think many here fail to see that any specific claim or point of evidence they put forward is immaterial. They seek to manufacture a controversy so that they can teach that manufactured controversy.

Luckily, this is exactly the sort of thing that the scientific method was designed to address.
Re:Ahem by atraintocry · 2008-06-25 05:37 · Score: 1

The problem is that science is not their goal, controversy is.

ID is like the shill in the snake oil demonstration. Hey, look at that guy! He wrote a paper! Just like y'all! It's got citations an' everything! It says...science is really just a religion called naturalism? WTF?
Re:Ahem by sm62704 · 2008-06-25 05:42 · Score: 2, Interesting

all of our current (or now previous) models for collecting data are dead.
I guess I have to R this FA. ALL the models for data collection? No more controlled double-blind studies?
It notes that we've entered the Age of the Petabyte -- where one can collect intense amounts of data that is paradigm agnostic.
Science has always at least tried to be paradigm agnostic. It can't always succeed of course, but I don't see how... Ok, I guess I'd better RTFA.
OK, I'm back. The article is horseshit. It is a whole bunch of words that add up to essentially what the summary said, only in a really long winded fashion.
"No theory needed, now we have models". How do you make the model without theory?
Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"
No. In the first place, no data that comes from the internet can be taken at face value (and this Wired article is a good example of how the internet is full of crappy data). Secondly, I hate inaccurate yuppiespeak, like talking about "clouds of information." It's stupid. Information doesn't gather in clouds, it's gathered in big heaps of paper and on hard drives and optical disks. The only clouds are the clouds of crack smoke surrounding the heads of the people who say things like "clouds of information".
We still use the same tools to analyse data. We just have more data to analyse. The scientific method itself is nowhere near dead.
Oh, and the parent is not offtopic - It hit the nail on the head. I guess a Wired editor had mod points.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Ahem by Crash+Culligan · 2008-06-25 05:44 · Score: 1

Well, we already know it wants to be free, so maybe now it's just exercising its sentient status in other areas.

Well, that's good! Maybe we can let the data come up with its own theories for a change...

--
You cannot truly appreciate Dilbert until you read it in the original Klingon.
Re:Ahem by atraintocry · 2008-06-25 05:44 · Score: 1

In short, a hypothesis. Data either supports your hypothesis, or the null hypothesis. Where the problems of logic creep in are at the hypothesis. Throwing more data at it is actually worse, since it makes the argument look stronger, regardless of any flaws in it.

My hypothesis: Chris Anderson should have stayed awake in high school.
Re:Ahem by bjourne · 2008-06-25 05:48 · Score: 1

For example, data is now "paradigm agnostic". Seriously, wtf? When was data ever not "paradigm agnostic" and when did we develop the need for a term to describe it. Data is data. It is raw, and unanalysed, and as such the notion of a paradigm is completely irrelevant.
I think you must read that in the right context to understand it. Thousands of man-years have been spent to try and construct the Semantic Web, the thing that initially started the Web 2.0 meme. All the technologies that would make it easy to organize information; RDF, SPARQL, RIF, Dublin Core, OWL and so on. Not to mention the whole ontology science and how information has been categorized for ages. Each of these methods attaches metadata, like "this book is in the tech section, this article in biology" etc. RDF is a complex language for defining metadata about the data.

Maybe all that is useless? Maybe you don't need any (manually added) metadata at all to find the information you want? A sufficiently advanced search engine seems to perform much better on large datasets than any human-invented system for organizing the data when trying to find whatever you are looking for.

--
Football Odds
Re:Ahem by atraintocry · 2008-06-25 05:48 · Score: 2, Funny

Science 2.0! Now with more datamining in social networks! And ajax! And of course, the same politicized funding that you know and love.
Re:Ahem by sm62704 · 2008-06-25 05:56 · Score: 4, Insightful

Information doesn't want to be free. But when it isn't, neither are you.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Ahem by Anonymous Coward · 2008-06-25 06:00 · Score: 0

After seven and a half million years of computing cycles, Deep Thought's answer is 42.
"I think the problem is that the question was too broadly based..."
"Forty two?!" yelled Loonquawl. "Is that all you've got to show for seven and a half million years' work?"
"I checked it very thoroughly," said the computer, "and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is."
After teaching Arthur Dent about Deep Thought, Slartibartfast muses:
"I always think that the chances of finding out what really is going on are so absurdly remote that the only thing to do is to say hang the sense of it and just keep yourself occupied.... What does it matter? Science has achieved some wonderful things of course, but I'd far rather be happy than right any day ... [But I am not,] that's where it all falls down, of course."
Re:Ahem by jeiler · 2008-06-25 06:00 · Score: 1

Psst. It doesn't. It deals with phenomena about which (or based on which) we can make measurable, testable predictions.
Excuse me--you're correct, and I mis-stated. Thanks. :)

--
If you haven't been down-modded lately, you aren't trying.
Sacred cows make the best hamburger.
Re:Ahem by sm62704 · 2008-06-25 06:02 · Score: 4, Funny

Not all data is created equally. You have to ask how it was collected, according to what rules, and with what purpose
I wear a goatee as a result of a small study.
Several years ago after my marriage unravelled and I got divorced and couldn't as much as get a dinner date, I decided "fuck it, why do I bother buying razors?" and simply stopped shaving.
Then one night in a bar a woman told me I should shave it into a goatee. So I started asking women "goatee or full beard?" and collecting the binary (y/n) data. Of seventeen randomly selected women aged 21 to 70, sixteen said "goatee". The one who said "full beard" was standing beside her boyfriend, who wore a full beard.
My losing streak ended, thanks to pseudoscience!

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Ahem by viking95 · 2008-06-25 06:04 · Score: 1

I agree, the article is complete nonsense. "Correlation is enough" Uhh... Being a successful Venture Capitalist correlates very well with owning expensive cars and having MBA's from expensive schools. If I would like to be a successful VC, should I go to Stanford Business School or Stanford European Motors?
Re:Ahem by fish_in_the_c · 2008-06-25 06:27 · Score: 1

I could be wrong, but isn't the idea that data collection is often limited by available resources, and that those resources are allocated according to the paradigm and beliefs of the researcher doing the work? That often results in data useful for refining the model being ignored or never collected, which unintentionally embeds the researcher's paradigm in the data, because you can't find what you don't look for.
I can see how large numbers of computers with cheap sensors and huge databases of millions of observations help to ease that problem by making it cheaper to check many possibilities.
I still doubt that they fully solve the problem, because resources, no matter how large, are never infinite.

--
"Tolerance applies only to persons, but never to truth. Intolerance applies only to truth, but never to persons."
Re:Ahem by Anonymous Coward · 2008-06-25 06:49 · Score: 0

Example: A researcher once concluded that men's beards grow faster when men anticipate interactions with women. The study involved measuring the beard clippings of men in Alaska who worked in remote areas on a daily basis. The study noticed that the volume of the clippings was greater before the men went to town where they interacted with women. The flaw was that some men did not shave regularly until they were going to town.
Example: There is a strong correlation between heart attacks and high cholesterol. There are a few contradictory conclusions that could be erroneously taken from this information:
1) heart attacks cause high cholesterol
2) high cholesterol causes heart attacks
3) high cholesterol and heart attacks are caused by some additional unknown factor
4) people with high cholesterol who have heart attacks are more likely to release their medical data
There are probably more interpretations of that data. The scientific method involves specifically designing an experiment to disambiguate between these kinds of conclusions.
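A minimal sketch of interpretation 3, using made-up numbers rather than medical data: a hypothetical hidden factor drives both measurements, so they correlate even though neither causes the other. The variable names, the sample size, and the assumption that the hidden factor can be measured afterwards are all illustrative.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical unknown factor that influences both quantities.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Neither quantity depends on the other, only on the hidden factor plus noise.
cholesterol = [h + rng.gauss(0, 1) for h in hidden]
heart_risk = [h + rng.gauss(0, 1) for h in hidden]

# The raw correlation looks like a real relationship.
raw_r = pearson(cholesterol, heart_risk)

# Subtract the hidden factor (possible here only because the toy model
# is known) and the apparent relationship disappears.
chol_resid = [c - h for c, h in zip(cholesterol, hidden)]
risk_resid = [r - h for r, h in zip(heart_risk, hidden)]
adjusted_r = pearson(chol_resid, risk_resid)

print(f"raw correlation:      {raw_r:.2f}")
print(f"after removing cause: {adjusted_r:.2f}")
```

The correlation alone cannot tell interpretations 1 through 4 apart; only the designed step (here, controlling for the hidden factor) does.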
Re:Ahem by ardle · 2008-06-25 07:13 · Score: 1

In the minds of some Creationists, science is itself defective because it only deals with natural phenomena.
Scientists have managed to survive for a long time among these kinds of people by using the word "phenomena" ;-)
Re:Ahem by Sciros · 2008-06-25 07:15 · Score: 1

I draw no line at all. ID is a term pushed by the Discovery Institute, but it's really just creationism in a cheap tuxedo.

--
I like basketball!!1!
Re:Ahem by Anonymous Coward · 2008-06-25 07:20 · Score: 0

Wrong. He just decided to write his articles in Klingon and let google translate them for him
Re:Ahem by DarkOx · 2008-06-25 07:26 · Score: 1

I don't think anyone is suggesting that correlation will start to imply causation for some large set of data. I think the author is a fool and does not understand what the Google guy is saying. What is being said is that it might soon be that we can compile such large sets of facts in one place and then automatically search for correlations with no rhyme nor reason.
This might allow us to produce some information from the correlative relationships that are discovered. Lack of "paradigm" means that unlike in the past where someone had to ask a specific question like "Does eating too much cheese make you fat?" we might find out such things by accident.
Old Scientific Method:
1.Observe nature
2.Get curious about something you think you see
3.Pose a question (Is there a relationship between cheese consumption and obesity?)
4.Collect Data that you can measure and identify as being possibly related to your query
5.Create hypothesis based on question and patterns in collected data
6.Attempt to test hypothesis
7 Return to step 4 --Looping as needed;
Possible new method:
1.Wait for the computer to discover some pattern in data (pre-existing vast pool of indexed facts)
2.Spot something interesting, (Does wearing orange polo-style shirts improve erections?)(This might well be something nobody ever thought to study before.)
3.Create hypothesis (propose some reason for the correlation)
4.Attempt to test
5. Return to 3;
The advantage here is we might discover some truly surprising relationships we never could have imagined, the risk is we waste a great deal of time studying completely accidental and meaningless relationships.
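A rough sketch of that risk, assuming nothing but random noise: 200 unrelated variables with 20 observations each (both numbers are arbitrary choices for illustration) still produce at least one pair that looks strongly correlated when every pair is searched.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n_variables = 200        # illustrative: number of unrelated "facts"
n_samples = 20           # illustrative: observations per fact

# Every column is independent noise, so no real relationship exists.
columns = [[rng.gauss(0, 1) for _ in range(n_samples)]
           for _ in range(n_variables)]

# "Wait for the computer to discover some pattern": search all pairs
# and keep the one with the largest absolute correlation.
best_r, best_pair = 0.0, None
for i, j in combinations(range(n_variables), 2):
    r = pearson(columns[i], columns[j])
    if abs(r) > abs(best_r):
        best_r, best_pair = r, (i, j)

print(f"strongest 'discovery': variables {best_pair}, r = {best_r:.2f}")
```

Since every column is independent noise, the winning pair is exactly the kind of accidental relationship that the "attempt to test" step would have to weed out.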

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Re:Ahem by jeiler · 2008-06-25 07:54 · Score: 1

The problem is that science is not their goal, controversy is.
Hmmm ... controversy is a tool towards their goal, but not the end-goal. The end-goal is that everyone agrees with them, and that their ideas are propagated by law.

--
If you haven't been down-modded lately, you aren't trying.
Sacred cows make the best hamburger.
Re:Ahem by Anonymous Coward · 2008-06-25 08:28 · Score: 0

I used to think that I could translate most dialects of bullshit into english but this threw me off guard. The most reasonable explanation is that Chris Anderson is a tool and doesn't know what he is talking about.
That would be the same Chris Anderson who gave us a whole issue of Wired devoted to the supposed convergence of science and religion, and featured Gregg Easterbrook babbling about how scientific "intelligent design" was, and other lightweight claptrap.
In response, the late great Arthur C. Clarke suggested changing the name of the magazine to "Unwired" and I let my subscription run out (under Anderson's inept watch, it had been going downhill for some time anyway).
Re:Ahem by nine-times · 2008-06-25 08:31 · Score: 1

Right, and we might even consider that study to have some degree of validity. However, I can't take that data and say 94% of women prefer goatees. It would be a much more proper conclusion to say that 94% of women prefer a goatee on *you*.
Re:Ahem by sm62704 · 2008-06-25 08:43 · Score: 1

You are correct. It doesn't hold that 94% of women prefer a goatee on anyone, just me.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Ahem by GuldKalle · 2008-06-25 09:26 · Score: 1

Well, my conclusion is they preferred a man with a goatee instead of him

--
What?
Re:Ahem by Crazyswedishguy · 2008-06-25 09:27 · Score: 1

Although I'm not saying the article makes sense (who RTFA anyway?), I could see a way that data has not always been "paradigm agnostic" insofar as in many situations we collect data after formulating a vague theory. We then use the data to refine our theory or to disprove it. But seldom do we start collecting data without any idea of what we may find.
I guess my point here, is that with the tools the article may or may not mention (who RTFA anyway?) you could feed them any kind of data, and they will try to find a correlation and determine whether some predictive theory can be made, without really knowing what they're looking for, other than some form of pattern.

The article is still WTFey, and perhaps deliberately confusing to appear smart, but this is how I would define "paradigm agnostic". Most research is done with an idea of what will be found or disproved. YMMV

--
This space up for sale.
Re:Ahem by JeanPaulBob · 2008-06-25 10:17 · Score: 1

Excuse me--you're correct, and I mis-stated. Thanks. :)
:)

No problem.

But actually, if you read much debate/discussion about evolution, you'll find lots of people phrasing it exactly the way that you did. Not everyone recognizes the distinction that I made. (The very term, "methodological naturalism" implies the wrong thing, I think.)
Re:Ahem by JeanPaulBob · 2008-06-25 10:22 · Score: 1

The problem is that science is not their goal, controversy is.
Uh, it's conceivable that this is true of some people. But generally, I call B.S.

I'm a Christian. Theologically conservative. I'm somewhat agnostic on "origins" issues in general... And I'm actively doubtful about the validity of modern ID arguments.

I've grown up amongst creationists & IDers. And I think you're unnecessarily making unreasonable guesses about people's motivations. Most people that I know are sincere about these things.

(I have met one exception--a guy arguing for ID, and I think he really didn't care about truth at all. Some things he said to me in private emails came close to confirming it. He wasn't a big name, though, just some guy.)
Re:Ahem by myth_of_sisyphus · 2008-06-25 10:29 · Score: 1

In that case, is String Theory science?
People can masturbate all day about branes and manifolds, but if it can't be tested or proven/disproven, is it science?
Re:Ahem by JeanPaulBob · 2008-06-25 11:07 · Score: 1

It's a reasonable viewpoint to say that String Theory isn't science.

But if it's not science yet, string theorists are trying to make it science. (It might be that we should give fledgling fields some leeway--it might take time before people figure out how to perform the tests. At the very least, we can defer judgment on whether or not it's science.)
Re:Ahem by Anonymous Coward · 2008-06-25 11:16 · Score: 0

Sort of. I speak Marketing English.
It's like British English, only more obscure.
(I have been using the Babel Fish translating for the posting of this comment I am writing)
Re:Ahem by ceoyoyo · 2008-06-25 11:24 · Score: 1

No. If he spoke English you'd realize he's full of crap.
The guy doesn't seem to know anything about the scientific method, science in general, the difference between data and knowledge, or even how Google works. I guess that's not particularly surprising since he's a Wired editor.
Re:Ahem by ceoyoyo · 2008-06-25 11:30 · Score: 1

Chris Anderson seems to like to impose his own paradigm of choice on the data. Perhaps he's just surprised to discover what the rest of the world has always known.
Re:Ahem by atraintocry · 2008-06-25 11:52 · Score: 1

Oh, absolutely. Most everyone I know believes in God and does not doubt that some sort of creation took place. There's a whole spectrum of thought there, and the vast majority of people have no reason not to be sincere in their beliefs.

I was referring more to the Discovery Institute, CSC, the various anti-evolution leagues of the past, and generally just the guys at the top of the "movement". These people are professionally involved. They make arguments against evolution that are often sophisticated, at least in style. If these people can understand the finer points of taxonomy, then why can't they see that they're walking on a planet full of evidence for evolution?

Also, they have no problem with propaganda, referring to acceptance of evolution as Darwinism, or calling it a religion of its own, or saying "it's just a theory", even though they know *damn well* that scientists use that word to mean something very different, and things that undergo peer-review aren't religious.

I'm actually giving them the benefit of a doubt: rather than say that they're completely crazy, I'll just say they're lying.

Again, this is the guys writing the stuff and stirring up the courts. Not my or possibly your family, who believe in God and did not spend semesters studying the philosophy of science. They don't understand what it means for something to be unfalsifiable. They just know what they believe and rightfully disregard that which they see as word games.

I went to Catholic school. I learned science in science class, and religion in religion class. Catholics, by and large, are very practical people, not the superstitious statue-worshippers that many paint them as. I see the anti-evolution leaders as disingenuous because that is usually the case when people promote failed theories for the better part of a century, coming up with new names for it every decade, and using the courts to get around peer-review. It's all about confusing regular people.
Re:Ahem by ozbird · 2008-06-25 14:29 · Score: 1

For example, data is now "paradigm agnostic". Seriously, wtf?
Bingo!
Re:Ahem by Anonymous Coward · 2008-06-25 15:18 · Score: 0

For the last time, I don't want to be free! Quit anthropomorphizing me!
Re:Ahem by thogard · 2008-06-25 16:56 · Score: 1

A few years ago I started looking into CFLs and their savings. It turns out that all the facts on the net are just copied from each other and the origin of the numbers seems to be completely made up and treated as gospel. You get things like lumens per watt but I can't work out how you measure most bulbs' output in lumens without making huge assumptions. The most interesting thing is the tripe that is being spread around has made it back to the manufacturers and thanks to levels of outsourced production, no one seems to know anything for sure.
Re:Ahem by Anonymous Coward · 2008-06-25 19:43 · Score: 0

Psst. It doesn't. It deals with phenomena about which (or based on which) we can make measurable, testable predictions.
True, but that line more-or-less paraphrases naturalism: http://skepdic.com/naturalism.html
Re:Ahem by MRe_nl · 2008-06-25 23:18 · Score: 1

"from the well-i-begat-a-roast-beef-sandwich dept"
I believe I speak for not a few of us when I respond:
Yummy, Lunchtime!

--
"Kill 'em all and let Root sort 'em out"
Re:Ahem by JeanPaulBob · 2008-06-26 01:39 · Score: 1

True, but that line more-or-less paraphrases naturalism: http://skepdic.com/naturalism.html
Not really. Because what I said doesn't imply "that all phenomena can be explained mechanistically in terms of natural (as opposed to supernatural) causes and laws." It doesn't have to be mechanistic for you to be able to make measurable, testable predictions--it's just easier to set up observations/experiments when it is. (Because you don't have to worry about an intelligent agent deciding to cooperate.) The natural/supernatural distinction is not fundamental to the capacities & limitations of the scientific method. Making testable predictions is.
Re:Ahem by neomunk · 2008-06-26 03:39 · Score: 1

Here's my outlook on that: yeah, it's science, but once you realize that science is a big spinning wheel (Observe, Hypothesize, Experiment, GOTO Observe) and that String Theory is stuck in its first revolution (unable to reach 'Experiment' properly) it becomes clear that it's not going anywhere yet.
Similarly, a pickup truck stuck in 5 feet of mud isn't going anywhere, but it's still a 'vehicle', and could even still be called a 'method of transport', unfortunately unable to perform its function properly.
Hey, look on the bright side though, once someone comes around to pull that truck out of the mud, you now have your vehicle. If we don't extinct ourselves first, we might just get String Theory out of its mud pit. Even then, it might not be correct, but that's okay too. As per the truck analogy, what if you get it pulled out of the mud, but you need to transport 35 people? Well, getting it out of the mud would let you use the truck to make it to a bus (theory wrong per test data, data leads to new theories).
Point is, don't be so hard on String Theory, but continue to point out its immobile status to people who keep getting in and spinning its wheels. Exceptions can be made for the rare brilliant mechanic who comes along, and though not freeing the truck, adds a feature that will make it (possibly) more useful when it does become usable. Most people should continue walking towards where we need to go though, as sitting in the cab and making engine noises isn't going to get us anywhere.
Re:Ahem by neomunk · 2008-06-26 03:47 · Score: 1

I want to simplify what parent is saying: science is by definition without bias. It doesn't care if lightning is electricity or God's Wrath Made Manifest, it just provides a way to determine the causality of the phenomenon.
Okay, so we know science has weighed in on my example, and it's electricity. Cool. It may very well happen that we'll find out that electricity is symptomatic of God's Wrath (okay, probably not, but run with it). Science, REAL science, will likely POINT TO THAT CONCLUSION. It doesn't care, it's not offended, and when people think science is "at war" with ANYTHING they become guilty of conceptual-personification on the order of any theologian, IMHO.
Re:Ahem by SpringRevolt · 2008-06-26 04:17 · Score: 1

Data is data.
Data are data (unless you have ST:TNG in mind).
One datum, many data.
(otherwise you sound like a scientific newbie.)
Re:Ahem by JeanPaulBob · 2008-06-26 04:18 · Score: 1

I want to simplify what parent is saying: science is by definition without bias. It doesn't care if lightning is electricity or God's Wrath Made Manifest, it just provides a way to determine the causality of the phenomenon.
To determine the causality, or simply to find out information about the phenomenon.

We can do science on lightning whether it comes from advanced UFOs, or from pixies, or from God suspending the normal laws of physics, or from the mechanistic operation of the laws of physics.
become guilty of conceptual-personification on the order of any theologian, IMHO.
What now?
Re:Ahem by smallfries · 2008-06-26 05:26 · Score: 1

*cough*. Very true. I'll get my coat...

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Ahem by Anonymous Coward · 2008-07-06 06:50 · Score: 0

Information doesn't want to be free. But when it isn't, neither are you.
Range free data?

We had this coming by nova.alpha · 2008-06-25 03:45 · Score: 1, Insightful

> made moot by vast clouds of information
Sure, seeing how 90% of websites are doorways, satellites, and other SEO tricks. Way to go, interwebz.

WTF indeed by GameboyRMH · 2008-06-25 03:46 · Score: 5, Insightful

I saw the article yesterday, but it was so WTFey I just moved on...definitely not Slashdot submission material (especially being a Wired article).

--
"When information is power, privacy is freedom" - Jah-Wren Ryel

Re:WTF indeed by eggoeater · 2008-06-25 03:50 · Score: 5, Funny

"WTFey"
I hadn't seen WTF adjective-ised before, but I love it... there's just so much I can use it with. In fact, I gotta go now and tell my boss how my project is going....

--
$7.95/mo, 200 GB disk, 2TBxfer, MySQL, PHP, RoR.
Re:WTF indeed by mrchaotica · 2008-06-25 04:02 · Score: 5, Funny

adjective-ised

And I hadn't seen adjective verbed!

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:WTF indeed by MightyMartian · 2008-06-25 04:14 · Score: 5, Funny

It reads like some sort of brain-damaged new-age technohippy tripe. Yeah, we don't need methodologies any more, because, maaaan, we've got tubes! Gimme a break.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:WTF indeed by arivanov · 2008-06-25 04:24 · Score: 1

Yep.
And if it was true all investment shops would have used this tech instead of paying silly money to people who know math and can do modelling. I have not heard of that happening just yet so as they say: "keep me posted..."

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:WTF indeed by melikamp · 2008-06-25 04:32 · Score: 3, Funny

And I—a pronoun slashed. Only on /.
Re:WTF indeed by Dekortage · 2008-06-25 04:43 · Score: 1

Actually I read it as new philosophy: "We don't need old theories anymore, just my new one!"

--
$nice = $webHosting + $domainNames + $sslCerts
Re:WTF indeed by Zabu · 2008-06-25 04:47 · Score: 2, Funny

And I hadn't seen verb verbified!

--
It's all good.
Re:WTF indeed by boyko.at.netqos · 2008-06-25 04:55 · Score: 2, Funny

I didn't know you could turn "verb" into a verb, or, to verb a noun, verb verb.

--
I used to work for NetQoS. I no longer do, but want to keep the excellent karma attached to this account.
Re:WTF indeed by m.ducharme · 2008-06-25 05:00 · Score: 4, Funny

This thread is cromulent.

--
Rule of Slashdot #0: You and people like you are not representative of the larger population. - A.C.
Re:WTF indeed by dmbasso · 2008-06-25 05:02 · Score: 5, Funny

And I hadn't seen anything, I'm blind you insensitive clod!

--
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
Re:WTF indeed by Anonymous Coward · 2008-06-25 05:14 · Score: 0

Personally, I feel it has embiggened me.
Re:WTF indeed by roaddemon · 2008-06-25 05:15 · Score: 2

Credit to Bill Watterson:
Calvin and Hobbes - verbing nouns
http://bp1.blogger.com/_sYWMelJX1xM/R4Tj-IMbKII/AAAAAAAAAQY/-pFxiVagBmA/s1600-h/Calvin+and+Hobbes+++verbing+ch930125.jpg
Re:WTF indeed by prommeteo · 2008-06-25 05:19 · Score: 1

I saw the article yesterday, but it was so WTFey I just moved on...definitely not Slashdot submission material (especially being a Wired article).
another obscure issue, you should just be aware of this http://on10.net/blogs/nic/What-are-Windows-Live-Agents/ Robots coming up, that's a new "model"!!!
Re:WTF indeed by carn1fex · 2008-06-25 05:19 · Score: 1

Wired is struggling so hard for relevance it's sad. Still waving their data gloves in the air, thinking some William Gibson cyb0r-utopia is next month. How about first confining the term 'Research' at Wired to be anything but scientific research.

--
---------
No matter how thin you slice it, its still baloney.
Re:WTF indeed by sm62704 · 2008-06-25 05:47 · Score: 1

Yeah, we don't need methodologies any more, because, maaaan, we've got tubes!
They're not tubes, they're pipes. Big fat pipes. With clouds of "information" rolling out of them.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:WTF indeed by Anonymous Coward · 2008-06-25 06:32 · Score: 0

"WTFey"... Say it out loud to hear an adjective verbalized...
-sk
Re:WTF indeed by mR.bRiGhTsId3 · 2008-06-25 06:36 · Score: 1

Indeed, all verbs can be nouned, and all nouns can be verbed. How often have you heard someone say they are going to xerox or google something? In light of this fact, it seems the owners of ask.com royally screwed up when they picked their name.
Re:WTF indeed by FishAdmin · 2008-06-25 07:58 · Score: 2, Funny

Actually, it embiggens us all.

--
Last night I played a blank tape at full volume. The mime next door went nuts.
Re:WTF indeed by jythie · 2008-06-25 08:04 · Score: 1

Yeah, Wired has really dropped in quality. I used to really enjoy the site, but now I just go there to watch the flamewars in the comments section.

Seriously, wired.com posters make Slashdot look like a beacon of intellect, critical thinking, and politeness.
Re:WTF indeed by ceoyoyo · 2008-06-25 11:26 · Score: 1

Verbed in British, yet.
Re:WTF indeed by Anonymous Coward · 2008-06-25 11:53 · Score: 0

Yeah. Think I'll wait for Web 3.0 and the Exabyte age.
Seriously, What The Fuck is this guy saying?
"You can analyse large sets of data and find patterns in it, then build a business model around the assumption that the same patterns will exist in data sets taken from a similar source in future" - Interesting point. Or is he saying...
"If you have a large enough set of data and the ability to process it you can extrapolate a fundamental understanding of the source of the data" - Huh? Maybe he means
"If we can properly analyse huge sets of data then the fundamentals of the system that generates the data become irrelevant" - That is retarded. Does he mean
"Google doesn't use modelling to predict user behavior" - This guy is a fucking idiot. I reckon what he is really saying is:
"I'm Chris Anderson, Editor in Chief of Wired. If you were editor in-chief you could write whatever fluff nonsense you liked too."
Dear God. The whole point of modeling is so our minds (as software running on brains not evolved to comprehend the minutiae of the world) can understand what the hell is going on.
Why the FUCK is this guy talking about the evolution of data analysis as a revolution in Science? Because he's a fucking retard. Can someone defluff the article and maybe dumb it down a bit for the idiots like me who can only hope to glean a little understanding of what he's talking about?
Re:WTF indeed by tehcyder · 2008-06-25 22:27 · Score: 0, Offtopic

In Soviet Russia noun verbs you.

--
To have a right to do a thing is not at all the same as to be right in doing it

Definitions by sir_eccles · 2008-06-25 03:47 · Score: 3, Insightful

"Data, information, knowledge, intelligence."

They may lead from one to the other but they are not all the same thing.

Re:Definitions by Itninja · 2008-06-25 03:52 · Score: 4, Insightful

A bit OT here, but don't forget 'wisdom' after intelligence. So many people stop at intelligence.

--
I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
Re:Definitions by Anonymous Coward · 2008-06-25 04:02 · Score: 2, Funny

Also, charisma and dexterity are very important.
Re:Definitions by gnick · 2008-06-25 04:04 · Score: 4, Insightful

don't forget 'wisdom' after intelligence. So many people stop at intelligence.
From what I've seen, it's not completely a progression from one to the other. I've met people who I would describe as 'knowledgeable', 'intelligent', or 'wise' without possessing either of the other attributes. Those traits are often coincidental and one can help beget another, but it's far from a hard set 'intelligent'->'knowledgeable'->'wise' progression.

--
He's getting rather old, but he's a good mouse.
Re:Definitions by Anonymous Coward · 2008-06-25 04:24 · Score: 0

A bit OT here, but don't forget 'wisdom' after intelligence.
Yes, wisdom is after intelligence, and then comes charisma.
Re:Definitions by Anonymous Coward · 2008-06-25 04:26 · Score: 1, Funny

Wisdom does come after intelligence in the stat arrays.
Str, Dex, Con, Int, Wis, Cha
There's no getting around it, Wisdom is simply a good dump stat for most classes.
Re:Definitions by Anonymous Coward · 2008-06-25 04:30 · Score: 1, Funny

Feel free to continue using Charisma as a dump stat, though.
Re:Definitions by Anonymous Coward · 2008-06-25 05:19 · Score: 0

A bit OT here, but don't forget 'wisdom' after intelligence. So many people stop at intelligence.
I usually use both as a dump stat and max out Strength and Dex, actually.
Re:Definitions by gilroy · 2008-06-25 05:34 · Score: 1

So many people stop at intelligence.

And so many more don't get even that far...

--
The Mongrel Dogs Who Teach
Re:Definitions by gtall · 2008-06-25 05:55 · Score: 1

Ancillary to these notions is the distinction between correlation and model. To derive a correlation is not to derive a model. There is no theory behind a mere correlation, whereas a model fits or realizes a particular theory. We consider the theory (think "theory of operation") as the information, or better, "the information that". One way to think of the distinction between information and data is that data, no matter what internal correlations it has, is not connected with any relationship such as a theory would provide. When one has the connection of data to a relationship in a theory, one has "the information that". A model is an idealized version of "the information that".
Knowledge is a relationship between information and an individual, something the users of "knowledge workers" *cough*..Bill Gates...*cough* consistently misinterpret.
Another way to see how Googling does not derive models is to think of what people mean when they say, "I see it but I cannot interpret it". "Interpretation" means "connection" in the sense of connecting data with information.
Gerry
Re:Definitions by sm62704 · 2008-06-25 06:29 · Score: 1

That's what SHE said!

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Definitions by Anonymous Coward · 2008-06-25 06:57 · Score: 0

A bit OT here, but don't forget 'wisdom' after intelligence. So many people stop at intelligence.
Yeah, 'cause how else are you going to make your Will saves?
Re:Definitions by smartyculottes · 2008-06-25 07:46 · Score: 1

And following wisdom is truth, beauty and music. Music is the best.
Re:Definitions by Quattro+Vezina · 2008-06-25 08:57 · Score: 1

Using WIS as a dump stat? Good luck making your Will saves.

--
I support the Center for Consumer Freedom
Re:Definitions by Anonymous Coward · 2008-06-26 07:25 · Score: 0

Yes. Yes, it is.

Somebody didn't understand Kuhn by Anonymous Coward · 2008-06-25 03:48 · Score: 0

"one can collect intense amounts of data that is paradigm agnostic"

No data is paradigm agnostic. You already chose to either collect it or pay attention to it, and neither of those decisions is paradigm-agnostic. Not to mention, the data must be stored in paradigm-laden formats. Units and categories that may mean nothing, or everything.

Don't worry, hardly anyone else really understood Kuhn either.

Re:Somebody didn't understand Kuhn by Breakfast+Cereal · 2008-06-25 04:38 · Score: 1

Thank you, this is exactly what I was going to post. Without a paradigm, there's no way to determine what "the data" even means, much less set about collecting it.
I wish paradigm had never become a buzzword. Now it means whatever people want it to mean.

Not quite by edwebdev · 2008-06-25 03:48 · Score: 4, Funny

Until cells, molecules, atoms, and subatomic particles start publishing blogs, the scientific method will remain useful.

Re:Not quite by cp.tar · 2008-06-25 04:28 · Score: 2, Funny

Quite.

And no matter the amounts of data, no matter the computing power, I don't think pure statistics will ever be able to analyze human language efficiently.

--
Ignore this signature. By order.
Re:Not quite by Kingrames · 2008-06-25 05:58 · Score: 1

Most of my protons have a facebook page, you insensitive clod!

--
If you can read this, I forgot to post anonymously.
Re:Not quite by Kamineko · 2008-06-25 06:06 · Score: 2, Funny

I'm made out of those you insensitive clod.
Re:Not quite by street+struttin' · 2008-06-25 06:36 · Score: 1

And no matter the amounts of data, no matter the computing power, I don't think pure statistics will ever be able to analyze human language efficiently.
What do you mean?
Re:Not quite by Anonymous Coward · 2008-06-25 07:00 · Score: 1, Funny

Sir, surely you mock that cromulent work which so clearly embiggens the scientific community.
Re:Not quite by cp.tar · 2008-06-25 11:21 · Score: 1

Well, I thought it pretty obvious, but somebody must be thinking I was joking.
Some things are simply better done with some simple rules -- both faster and more accurately.
Yes, throwing more data and more computing power at a problem can help solve it the hard way, but this is throwing brawn, not brains at a problem.
Kind of like computing every possible chess game in the universe. If you have a computer fast enough, with enough RAM, yeah, you could do it by brawn alone. But the smart way is to use some algorithms.

Thirty or forty years ago, people speculated that robots would make sure people never have to work. Now, it seems that computers are supposed to relieve us of the need to think.
That's not funny. That's sad.

--
Ignore this signature. By order.

So... by dunnius · 2008-06-25 03:49 · Score: 5, Insightful

So everything possible has been researched now and therefore no more research is necessary since it will all be on the internet? Ridiculous!

Re:So... by Anonymous Coward · 2008-06-25 04:53 · Score: 0

Insightful? The article had nothing to do with not needing more research because everything was on the internet. RTFA!
Re:So... by backwardMechanic · 2008-06-25 07:15 · Score: 2, Funny

Everything possible was researched, measured, logged. Nobody could think what to do with all that data, so they made an extra universe to store it in. We're living in it.
Re:So... by Aerynlore · 2008-06-25 08:18 · Score: 1

It sounds like this, yes.
In fact, his entire concept sounds very much like David Brin's Uplift universe and the Galactic Library Institute, where everything there is to know has already been found, cataloged and retained. All you have to do is find it in the vast cloud of interstellar information.
Re:So... by neomunk · 2008-06-26 04:07 · Score: 1

You're saying that that's the meaning of life? To correlate some vast database? Wow, that explains some things. From a glance at human society, I'd say that we're crunching some backwater (from the superuniversal scale, of course) marketing database.
But I'm an optimist. :-)

humm - the hell? by Amouth · 2008-06-25 03:49 · Score: 1

what?

the current quote "Never frighten a small man -- he'll kill you." seems more relevant

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'

this was kinda predictable.... by jrathe89 · 2008-06-25 03:49 · Score: 0

kinda predictable if you ask me.....hardware as well as software will always be ever expanding...

The problem with this newly coined 'age'. by Anonymous Coward · 2008-06-25 03:51 · Score: 0

I just hope Petaphile never becomes mainstream.

Re:The problem with this newly coined 'age'. by Vectronic · 2008-06-25 04:26 · Score: 2, Funny

Petaphile
1) Someone who loves their pets more than human beings or, at the extreme, someone willing to kill a human to save a lower animal's life.
2) Somebody who has sex with animals because they cannot attract any humans, or they are attracted to animals
(and the best one)
3) someone so caught up in his own egomaniacal conception of the world that he is compelled to spew vomit and blood on a stranger's clothes to show his contempt for anybody's thought but his own.
Which sounds kinda like the summary for the article, as well as some of the article.

How bout no by Anonymous Coward · 2008-06-25 03:53 · Score: 5, Insightful

Um, no. Claims like this demonstrate a lack of understanding of what a model is.

From the perspective of physics, the universe is just a massive amount of data--more data than any single human can comprehend at once. But thanks to the models of Newton we have a set of relatively simple equations that describe, generally, the way bodies in the universe interact. The model is not perfect, but it is useful.

Likewise, Google uses a very explicit model to describe the universe of the web: some pages are more relevant to a given search query than others, and these pages will generally be more 'popular' among other important pages. Again, the model is not perfect, but it is useful.

The fallacy is that somehow what Google is doing is a paradigm shift. It's not. It's just applying the same kind of scientific method to a type of data that hadn't existed before.

What, I think, the article is really trying to say is that Google's data is so massive and complex that we can't ascribe any explanation to the results it gives us. First of all, that is false, because the PageRank algorithm in its simplest form does give us a very explicit explanation (popular pages generally return better results). But even if it were true, Newton faced the same kind of accusations when people called his model of the universe 'Godless' and claimed, for example, that he described how gravity works without actually explaining "why" it works like it does. And that accusation is always with science. There are always more questions raised than answered. This is nothing new.
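
To make the "explicit model" point concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production algorithm; the function name `pagerank`, the damping value 0.85, the iteration count, and the four-page `toy_web` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key (assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "hub"; nobody links to "c".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The ranking that comes out has a readable explanation: "hub" wins because the other pages point at it, and "c" loses because nothing points at it. That is a model, just a simple one.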

Re:How bout no by vertinox · 2008-06-25 05:39 · Score: 2, Insightful

But thanks to the models of Newton we have a set of relatively simple equations that describe, generally, the way bodies in the universe interact. The model is not perfect, but it is useful.
You are aware that the Newtonian Physics model breaks down when you are talking about traveling close to the speed of light?
Although, most of the time we are dealing with things that aren't traveling so fast, but there are many scenarios in physics that we need a different model for.
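
For reference, the standard special-relativity form of that breakdown (textbook physics, not something taken from the article) can be written as:

```latex
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}, \qquad
p_{\text{rel}} = \gamma m v, \qquad
p_{\text{Newton}} = m v
% For v \ll c:  \gamma \approx 1 + \frac{v^2}{2c^2} \to 1,
% so the relativistic momentum reduces to the Newtonian one.
% For v \to c:  \gamma \to \infty, and the Newtonian formula fails.
```

At everyday speeds the correction factor is indistinguishable from 1, which is why the Newtonian model stays useful there.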
I think what the Googlite is advocating is that very complex systems (like weather systems, financial systems, black holes, the LHC etc.) which do not go well with our standard models will need (pause for effect) new models.
Why? Because there is so much data that it's hard to follow the scientific method: chances are you'll never get the same situation again to repeat in a lab (like weather conditions), because there is an infinite amount of data that could be gathered on these complex systems.
Take the LHC Computing Grid for example. The amount of data gathered from that experiment may be astronomical, and it could be quite possible that once you get to that scale on the atomic level you can never have exact conditions each time (of course it may be the opposite, but we won't know until they turn the thing on for a run and see what happens to matter and energy when you do what they plan on doing).
I am not saying that everyone should throw out the scientific model, but I agree with the article that a new model needs to be created for complex systems. After all... We still don't have a 100% accurate model of weather prediction other than a few days at a time.

--
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)

One word: by Anonymous Coward · 2008-06-25 03:53 · Score: 1, Funny

"Computers."

Honestly, that's about the gist of the article, and it left me wondering just what the point of it was. Until I remembered the career advice from The Graduate.

Bullshit by Anonymous Coward · 2008-06-25 03:53 · Score: 1, Interesting

This might have been true if all of your data was in the same order of magnitude. But consider things like the hyperfine structure. A petabyte is pretty large, but it is nothing compared to the orders of magnitude needed to randomly sample the entire electromagnetic spectrum that would detect hyperfine levels. When things like physics deal with subjects with over 40 orders of magnitude difference, random sampling isn't going to displace intelligent sampling.

Don't rule science out it. by russotto · 2008-06-25 03:53 · Score: 5, Insightful

The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart. Perhaps the best is when he presents, as an example of this new "model-free" approach, a program which includes "simulations of the brain and the nervous system". Uh, hello... a simulation IS a model.

Re:Don't rule science out it. by feed_me_cereal · 2008-06-25 04:04 · Score: 5, Funny

He didn't bother writing more than one rambling page because he figured someone said it better somewhere else on the internet and that we're all bound to find it.

--
"Question with boldness even the existence of a god." - Thomas Jefferson
Re:Don't rule science out it. by ColdWetDog · 2008-06-25 04:08 · Score: 4, Interesting

The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart.

I suppose you could start where he, again, tries to present the argument that correlation really is "good enough" - causation be damned. What he is blattering on about is that you can infer lots of things via statistical analysis - even complex things. That's certainly true. Where he fails (and it's an EPIC fail) is his assertion that this method is a general phenomenon, suitable for everyday use.
The other major failure of TFA is that I can't find a car analogy anywhere.

--
Faster! Faster! Faster would be better!
Re:Don't rule science out it. by JustinOpinion · 2008-06-25 04:11 · Score: 5, Insightful

it's such a rambling mess it's hard to know where to start picking it apart.
Agreed. I want to do a line-by-line rebuttal... but I fear that would be a waste of time.

The article does not make a compelling point. It keeps saying that we can give up on models (and science), because now we just have lots of data, and "correlation is enough." What utter BS. Establishing a correlation is not enough. Even if it is predictive for the given trend, it doesn't allow us to generalize to new domains the way a well-established scientific model does. If an engineer is designing a totally new device, that goes above and beyond what any established device has done, what data can he draw upon? If there is no mountain of data, he must rely on the tried-and-true techniques of engineering/science: use our best models, and predict how the new device/system will behave.

The article actually makes this point perfectly clear when it says:
Venter can tell you almost nothing about the species he found.
Indeed. Merely having tons of data doesn't actually give you insight into what you have measured. You must distill the data, pull out trends, and construct models. I just don't see how having mountains of data about a species, but still being unable to answer simple questions about it, is superior to conventional science (which can answer questions about the things it has discovered).

A deluge of data and data-mining techniques is a boon to science. But I don't see the benefit of giving up on the remarkably successful strategy of constructing models to explain the phenomena we've observed. I somehow doubt that having 20 petabytes of data on electron-electron interactions is more useful than having a concise theory of quantum mechanics.
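
The generalization argument above can be shown with a toy sketch. It fits a straight line (a pure correlation) to free-fall distances sampled over a narrow window of time, then compares that fit against the actual model d = g*t^2/2 far outside the window. The sampling window, the helper names `fit_line` and `pearson_r`, and the choice of free fall as the example are all assumptions made for illustration; nothing here comes from the article.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def fall_distance(t):
    """The actual model: distance fallen from rest after t seconds."""
    return 0.5 * G * t * t


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# "Observed" data: only times between 1.0 s and 2.0 s were ever measured.
times = [1.0 + 0.1 * i for i in range(11)]
dists = [fall_distance(t) for t in times]

slope, intercept = fit_line(times, dists)
r = pearson_r(times, dists)

# Error of the correlation-only fit inside and far outside the observed range.
inside_error = abs(slope * 1.5 + intercept - fall_distance(1.5))
outside_error = abs(slope * 10.0 + intercept - fall_distance(10.0))

print("correlation in the observed window:", r)
print("fit error at t = 1.5 s (inside):", inside_error)
print("fit error at t = 10 s (outside):", outside_error)
```

Inside the window the correlation looks excellent and the line predicts well. Ten seconds out, the line is off by hundreds of meters while the model is still exact, which is the sense in which "correlation is enough" only holds where you already have data.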
Re:Don't rule science out it. by philspear · 2008-06-25 04:45 · Score: 1

Now biology is heading in the same direction. The models we were taught in school about "dominant" and "recessive" genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton's laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility.
In short, the more we learn about biology, the further we find ourselves from a model that can explain it.
There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

So basically he's saying "Mendel got it wrong, and was a human, therefore we need to let computers come up with our theories."
I, for one, welcome our new robotic scientist overlords.
On a more serious note, the more we learn about genetics, the more we realize Mendel's rules were a simplification, but it's ridiculous to say we can't find a model to incorporate the newer findings. Our understanding of epigenetics dovetails nicely with dominance and recessiveness; indeed, epigenetics is an elaboration on more classical genetics, not a whole new thing that disproves Mendelian genetics at every level.
The more we learn about biology the better our model becomes.
And then there's this nice nonsense summary
Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

It's all good and well to notice that there is a correlation between X gene and Y phenotype, but that's really the easy part and also unfortunately is worthless without the harder second part: figuring out how to fix it (in the event that Y phenotype is, say, a very bad phenotype.) And the only way to know how to fix it is by gaining a mechanistic explanation.
There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?

Google search: how can we prevent children from inheriting their parents' dwarfism?
36,305 results. Most relevant: HOTT MIDGET SEXX!!!##!!! FREE TRIAL V1AGRA c1@LIS!@@$!
Re:Don't rule science out it. by dreamchaser · 2008-06-25 04:55 · Score: 1

I agree. I'm wondering why the 'humor' tag hasn't shown up yet on this one. It's pretty much drivel unless you take it as a joke.
Re:Don't rule science out it. by BotnetZombie · 2008-06-25 04:59 · Score: 1

The article is stating that the old Ford Model T is on its way out, and that it's being replaced by invisible pink unicorns?
Re:Don't rule science out it. by Daniel+Dvorkin · 2008-06-25 04:59 · Score: 1

Furthermore, his breathless proclamations of "everything we thought we knew was wrong" betray his total lack of knowledge about genetics. The simple Mendelian, dominant/recessive model is a useful pedagogical tool, but we've known perfectly well that it was much more complex than that for a very long time ... like, since before we had any idea that DNA was the carrier of genetic information. Statistical genetics is a century-old field.
This is a fairly common phenomenon, and I think it represents a kind of (usually unintentional) collusion between scientists and journalists. The scientist says something that contradicts some simple science lesson the journalist half-remembers from elementary school. The journalist thinks it's brand-new, paradigm-breaking stuff, and the scientist, having an ego like everyone else, plays along. Journalists need to remember that by the time they hear about any scientific advance, it's probably been discussed in the academic community for at least a decade, perhaps much longer; and scientists need to be honest and acknowledge this, not play to the look of wide-eyed wonder.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Re:Don't rule science out it. by ShieldW0lf · 2008-06-25 05:08 · Score: 1

I can see this approach being very useful for solving problems that cannot be scientifically tested.
There are some types of problems that take longer than the span of a human lifetime to test.
For these, we use religion and historical social norms.
We can point at them and say, "That system survived 10 generations of man and has not yet destroyed itself.", "That system destroyed itself inside 3 generations without any external pressures that we can see.", "That system survived 50 generations, until it met this system, which destroyed it.", etc.
Which is better than nothing.
I can see this approach being very useful. To my great, great grandchildren. If the system that hosts the approach hasn't already fallen by then.
But it's not going to fill the role of scientific theories and the scientific method. They're the best tool going when they can be brought to bear on the problem.

--
-1 Uncomfortable Truth
Re:Don't rule science out it. by Chabil+Ha' · 2008-06-25 05:19 · Score: 1

Where's BadAnalogyGuy when you need him?

--
We're all hypocrites. We all have hidden parts, it's the contrast between them that make us more a hypocrite than others
Re:Don't rule science out it. by 32771 · 2008-06-25 05:29 · Score: 1

Well if what you say is true, nobody will come out of the woodwork screaming "read the fucking article".
For once I can rejoice.
Maybe now that I said this I'll just have a little peek.

--
Je me souviens.
Re:Don't rule science out it. by Draek · 2008-06-25 05:30 · Score: 1

The other major failure of TFA is that I can't find a car analogy anywhere.
Alright, then. The article is like a BMW that gets 30 miles per gallon of soda, which quickly accelerates to 88mph and then suddenly grows wings and flies off the horizon: it starts pretty weird, then it gets familiarly-weird, and then it gets into decidedly "WTF is this guy smoking!?"-territory, never to be seen again.

--
No problem is insoluble in all conceivable circumstances.
Re:Don't rule science out it. by Josh+Booth · 2008-06-25 05:35 · Score: 1

Actually, I'll go one further and say that the models are really best fit for the data. It just so happens that they fit so well, they can actually predict the future with varying (and hopefully well-defined) degrees of accuracy. That is what is so astounding about physics, because who'd have thought that all those silly abstractions mathematicians made up are actually worth something in describing the real world?
And besides, even though there is value in every piece of data on the internet, to fully explain why each piece of information is there is equivalent to knowing the mind and motivations of every person who's created that data, which in turn is a result of every event that has happened to that person since birth, which in turn is a result of ... and on and on. And in that case, we still have to model things, or at least pick a model. Hell, even correlation is a model.
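A minimal Python sketch of the "model as a best fit that predicts" idea above. The data points are made up for illustration, and the helper names `fit_line` and `predict` are hypothetical; it fits a straight line by ordinary least squares and then extrapolates beyond the observed range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x (possibly outside the observed data)."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up observations: time in seconds, position in metres.
    times = [0.0, 1.0, 2.0, 3.0]
    positions = [1.0, 3.0, 5.0, 7.0]
    model = fit_line(times, positions)
    print("slope, intercept:", model)
    print("predicted position at t=10:", predict(model, 10.0))
```

Even this tiny example picks a model (a straight line) before it can say anything about t=10; the raw list of points on its own predicts nothing.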
Re:Don't rule science out it. by atraintocry · 2008-06-25 05:51 · Score: 1

I think it was the time cube guy, actually.
Re:Don't rule science out it. by Anonymous Coward · 2008-06-25 06:05 · Score: 0

Or, now with the new paradigm shift we can write articles that are "content-free"!
Re:Don't rule science out it. by sm62704 · 2008-06-25 06:09 · Score: 1

The other major failure of TFA is that I can't find a car analogy anywhere
Here ya go:
Ok, first we only had cars. It took a lot of trips to move, and moving the couch was an utter bitch. But now we have bigassed trucks, so we can just throw our cars away.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Don't rule science out it. by smallfries · 2008-06-25 06:29 · Score: 2, Insightful

Once upon a time cars were pretty simple. The most effective way to fix a car that had broken was to find a mechanic. This was a man trained in the models of how cars work. He would sift through the collection of parts (data) in the car until he noticed an anomaly that he would charge you outrageously for.
Now cars have become so complex that these models are no longer needed. Instead you can just examine the millions of cars that either work or don't work right there on teh interweb. Once you find a correlation between your car and another car you can then fix the difference without knowing anything about models of "how cars work"!
Err, maybe that analogy was a little too accurate as it has made his argument sound shit?

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Don't rule science out it. by mapkinase · 2008-06-25 06:32 · Score: 1

That is exactly why one should read comments before reading TPA (the proverbial article).
I changed my habit of doing exactly that for this article and after a few sentences I realized that I am reading what you precisely characterize as "utter nonsense". Then I went back to comments, and voila, it's not my idiosyncratic opinion.

--
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
Re:Don't rule science out it. by vanman2004 · 2008-06-25 06:52 · Score: 1

Indeed. Merely having tons of data doesn't actually give you insight into what you have measured. You must distill the data, pull out trends, and construct models. I just don't see how having mountains of data about a species, but still being unable to answer simple questions about it, is superior to conventional science (which can answer questions about the things it has discovered).
A deluge of data and data-mining techniques is a boon to science. But I don't see the benefit of giving up on the remarkably successful strategy of constructing models to explain the phenomena we've observed. I somehow doubt that having 20 petabytes of data on electron-electron interactions is more useful than having a concise theory of quantum mechanics.
The difference is that we now have the capability to collect meaningful data without an a priori model. This allows us to much more effectively construct a model based on data, rather than constructing a model and looking for data to validate or invalidate it (AKA the scientific method).

--
-Siggy!
Re:Don't rule science out it. by ColdWetDog · 2008-06-25 08:25 · Score: 2, Informative

That's a disturbingly accurate analogy for what TFA is trying to say (and thanks for playing everyone). The problem is that isn't "science" - it's at best a hack engineer job. It might work to get the car running, but you aren't going to be making much progress concerning "cars" in general.
Man, I'm feeling old today. Whatever happened to "first principles"? And my slide rule.

--
Faster! Faster! Faster would be better!
Re:Don't rule science out it. by dj_tla · 2008-06-25 08:33 · Score: 1

I'm flabbergasted that the author didn't even mention the possibility of errors in data, especially since the author specifically mentions shotgun gene sequencing. It is highly unlikely that one will get a completely accurate DNA sequence using even the most sophisticated sequence assembly algorithms. If we tried to clone Venter using solely his sequenced DNA (and not a copy of the actual molecule) we would end up with a different human being, guaranteed.
It is the possibility that the wealth of data on the internet is full of errors that forces scientists to re-do experiments and come up with new and better ways to gather accurate data. If you start putting absolute trust in the data gathered from the internet, stop pretending to do science and rightly call yourself a theologist.
Re:Don't rule science out it. by mrogers · 2008-06-25 09:03 · Score: 1

I found it!
Re:Don't rule science out it. by Estanislao+Martínez · 2008-06-25 10:38 · Score: 1

I suppose you could start where he, again, tries to present the argument that correlation really is "good enough" - causation be damned.

Not to defend TFA (which I believe I dislike at least as much as you), but the problem with that argument isn't the "causation be damned" part. Causation is a very hard philosophical topic.

--
Are you adequate?
Re:Don't rule science out it. by ceoyoyo · 2008-06-25 11:41 · Score: 1

I thought the first failure was where he says that a correlation can either be a causal relationship or coincidence. After that the article continues downhill.
Re:Don't rule science out it. by The_mad_linguist · 2008-06-25 14:44 · Score: 1

OK, I'll throw it through the tholenizer... (http://www.mdpub.com/tholen/tholenizer.html)
> "All models are wrong, but some are useful."
>
> So proclaimed statistician George Box 30 years ago, and he was right.
Classic unsubstantiated and erroneous claim, laced with invective, as expected from someone who lacks a logical argument.
> But what choice did we have?
Classic unsubstantiated and erroneous claim.
> Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us.
Classic invective, as expected from someone who lacks a logical argument.
> Until now.
Classic erroneous presupposition.
> Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models.
Illogical.
> Indeed, they don't have to settle for models at all.
Ambiguous.
>
>
> Sixty years ago, digital computers made information readable.
Non sequitur.
> Twenty years ago, the Internet made it reachable.
Note: no Response.
> Ten years ago, the first search engine crawlers made it a single database.
Classic unsubstantiated and erroneous claim.
> Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition.
Classic failure to comprehend the point.
> They are the children of the Petabyte Age.
How ironic.
>
>
> The Petabyte Age is different because more is different.
Classic invective, as expected from someone who lacks a logical argument.
> Kilobytes were stored on floppy disks.
Classic failure to comprehend the point.
> Megabytes were stored on hard disks.
Classic unsubstantiated and erroneous claim.
> Terabytes were stored in disk arrays.
Classic unsubstantiated and erroneous claim.
> Petabytes are stored in the cloud.
Classic invective, as expected from someone who lacks a logical argument.
> As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to well, at petabytes we ran out of organizational analogies.
Classic lack of specificity.
>
>
> At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics.
Liar.
> It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality.
Classic unsubstantiated and erroneous claim.
> It forces us to view data mathematically first and establish a context for it later.
Classic erroneous presupposition.
> For instance, Google conquered the advertising world with nothing more than applied mathematics.
Classic invective, as expected from someone who lacks a logical argument.
> It didn't pretend to know anything about the culture and conventions of advertising it just assumed that better data, with better analytical tools, would win the day.
How ironic.
> And Google was right.
You're erroneously presupposing that it's a fact.
>
>
> Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough.
Classic evasion of the point.
> No semantic or causal analysis is required.
Ambiguous.
> That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German).
How ironic.
> And why it can match ads to content without any knowledge or assumptions about the ads or the content.
Illogical.
>
Re:Don't rule science out it. by thealsir · 2008-06-25 23:14 · Score: 1

You nailed it bro. Sensationalist journalism combined with science is good for neither the image of journalism nor science. This piece is no better than fluff that has been disproven since the dawn of not only statistical genetics, but statistical analysis in general.

--
Do not downmod posts "overrated" simply because you disagree with them.

My Start menu has been Googled by spyrochaete · 2008-06-25 03:55 · Score: 4, Insightful

I am definitely a victim of this "Google effect". Search makes me lazy.

For example, for years I would pride myself on my well-tended Windows Start menu. I'd create base categories for my application folders like Hardware, Games, and Internet, and move applications into those folders to keep my Start menu manageable. I blogged about this procedure and included a screenshot.

Now that I'm using Vista I have little need to be so organized. I rarely have to navigate manually to an application folder thanks to the embedded search box on the Start menu. So now my Start menu is a huge clutter, but so what? I see that exercise as futile as dusting the cardboard boxes in the attic.

Re:My Start menu has been Googled by Hal_Porter · 2008-06-25 04:14 · Score: 1

Now that I'm using Vista I have little need to be so organized. I rarely have to navigate manually to an application folder thanks to the embedded search box on the Start menu. So now my Start menu is a huge clutter, but so what? I see that exercise as futile as dusting the cardboard boxes in the attic.
If you were fighting an enemy and wanted to wipe them out, would you want them to be capable of organising shit for themselves or would you want them to think organisation was a futile exercise? It's a lot easier to hunt slipshod hippies with Terminators and Hunter Killers than organised types who know where they hid the ammunition stash. The hippies will type "amuniton" and expect a machine to fix the typo and find it.

--
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Re:My Start menu has been Googled by Hatta · 2008-06-25 04:23 · Score: 2, Insightful

Now that I'm using Vista I have little need to be so organized. I rarely have to navigate manually to an application folder thanks to the embedded search box on the Start menu.
If you're going to take your hands off the mouse to run an app, why not just pop open a console and start it from there? I have no use for any sort of start menu, I have a console. It's certainly more flexible than a search bar, you can pass arguments or file names (with wild cards even) to the application.

--
Give me Classic Slashdot or give me death!
Re:My Start menu has been Googled by maxume · 2008-06-25 04:27 · Score: 2, Interesting

There are third party apps to add similar functionality to XP. Launchy is the one I use:
http://www.launchy.net/#download
I think they are all clones of some Mac app though.

--
Nerd rage is the funniest rage.
Re:My Start menu has been Googled by ScentCone · 2008-06-25 04:37 · Score: 1

The hippies will type "amuniton" and expect a machine to fix the typo and find it.

The details don't matter. The point is that it takes a village to find the ammunition. And if it turns out that one person is clever enough to do it on their own, without being properly vetted by the village's elites, then that person must be punished in some way, so as to avoid making the other villagers feel bad that they can't do it themselves. It's not that it DOES take a village to do something, see, it's that the village will squash anyone that has the gall to demonstrate the ability to function without the village's bureaucracy and permission. Hopefully the village's enemies will be so horrified by the unstoppable monster of centrally dictated collectivism, that they'll run away without a shot being fired. In fact, the village doesn't even have to HAVE an ammunition stash. They should be able to bluff their way through a few generations of rule by their benign dictatorship before their enemies realize what a paper tiger they actually are, or the villagers themselves realize they're being cruelly shit on by authoritarian hippies. Nah, that would never happen. I mean, not again, right?

--
Don't disappoint your bird dog. Go to the range.
Re:My Start menu has been Googled by spyrochaete · 2008-06-25 04:48 · Score: 1

I used another utility (costs about $12) called KeyLaunch for many years. I heard about it on The Screen Savers and was amazed at how fast it appeared on TV. The generous developer released a free version just for Screen Savers viewers and I milked that free version for years before deciding I couldn't live without it and splurged, making it one of very few shareware apps I've ever bought. I like it much more than free alternatives like Launchy but not everyone will be willing to pay.
Re:My Start menu has been Googled by whoisisis · 2008-06-25 04:49 · Score: 1

In a Unix-like command line, you just type the first few letters of what you need, and press "tab" (see tab completion). You should ls /usr/bin.
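A rough Python sketch of what tab completion for command names boils down to: filter the executables on the search path by prefix. This is only an illustration, not how any particular shell implements it (real shells also complete builtins, aliases, and arguments), and the function name `complete` is hypothetical.

```python
import os


def complete(prefix, directories):
    """Return sorted names of executable files in `directories` starting with `prefix`."""
    matches = set()
    for directory in directories:
        try:
            entries = os.listdir(directory)
        except OSError:
            # Missing or unreadable PATH entries are simply skipped.
            continue
        for name in entries:
            path = os.path.join(directory, name)
            if (
                name.startswith(prefix)
                and os.path.isfile(path)
                and os.access(path, os.X_OK)
            ):
                matches.add(name)
    return sorted(matches)


if __name__ == "__main__":
    # Use the current PATH, as a shell would for command names.
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    print(complete("py", path_dirs))
```

Typing a longer prefix narrows the list, which is all the "first few letters, then tab" trick relies on.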
Re:My Start menu has been Googled by spyrochaete · 2008-06-25 04:50 · Score: 1

Win+R opens a run box which is almost as good as a console. It even has tab completion for directory and file names and it supports arguments. Most people don't realize what a great keyboard-driven OS Windows is and always has been, even for driving the GUI.
Re:My Start menu has been Googled by Anonymous Coward · 2008-06-25 05:43 · Score: 0

It's certainly more flexible than a search bar, you can pass arguments or file names (with wild cards even) to the application.
Why not the best of both worlds?
Quicksilver gives you all the power of a search bar and gives you the ability to pass arguments to the results of your search.
Re:My Start menu has been Googled by atraintocry · 2008-06-25 05:58 · Score: 1

Pop quiz, hotshot! Does EAC go under hardware, or media? What do you do?

What do you do?
Re:My Start menu has been Googled by spyrochaete · 2008-06-25 06:10 · Score: 1

It goes under Media. I'd put drivers and hardware consoles (like sound card mixers) in the Hardware folder, and apps to extract or manipulate media in the Media folder. This is just what makes sense to me, though. You should arrange your folders in whatever way will help you remember!
Re:My Start menu has been Googled by Anonymous Coward · 2008-06-25 08:32 · Score: 0

Duh, it goes in /usr/bin, /usr/local/bin or /local/bin and then gets symlinked from both hardware and media.
Re:My Start menu has been Googled by atraintocry · 2008-06-25 08:51 · Score: 1

Just fooling around...I used to do that as well but I got sick of trying to figure out the edge cases. Plus I got a lot more hands-off with my home computers once I started doing IT work. The cobbler's PC has no shoes, or something.

I'm a big fan of both Spotlight and Vista's program search. Don't think of it as laziness, just saving your time to spend on more important things :D

Chicken Egg by Anonymous Coward · 2008-06-25 03:55 · Score: 0

"It's time to ask: What can science learn from Google?"

Science had nothing to do with founding google.

Not so much by Anonymous Coward · 2008-06-25 03:56 · Score: 0

For a long time we have had two ways of looking at the world: deterministic and statistical. More data may make for better statistical models or maybe not!

The best example I can think of is weather forecasting. In the 1970s we thought that if we had enough data and powerful enough computers, we could totally predict the weather, nay even the climate. We didn't take butterflies into account.

So, sometimes no matter how much data you have, you're euchred. The scientific method still works in the domain where it works. (and it doesn't work ...) Nothing has changed. Nothing to see here folks. Move along.

What question do you ask the data. by xzvf · 2008-06-25 03:56 · Score: 4, Insightful

Searching data is a tool. You still need to have insight to formulate a theory, develop a test for the theory, and ask the data pool the right (non-leading) question. Then evaluate the data looking for both proof and disproof of the theory and be smart and ego neutral enough to let the data suggest a new theory, test and question. Don't confuse a new and useful tool that makes insight easier, with the ability of humans to have that insight.

Re:What question do you ask the data. by Daniel+Dvorkin · 2008-06-25 04:15 · Score: 3, Insightful

Exactly. The "deluge of data" is a useful tool, no doubt about it. But Google doesn't make the job of collecting and analyzing data irrelevant any more than the advent of the telescope made the skills and knowledge of astronomers obsolete.
I particularly love this line from TFA:
For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising -- it just assumed that better data, with better analytical tools, would win the day. And Google was right.
(Applied) science at its best! "The culture and conventions of advertising" are basically folk wisdom, and folk wisdom is often right but more often wrong. Google took a scientific, unbiased view of how to move bits around and make money with them: start with as few preconceptions as possible, analyze the data, see what happens.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Re:What question do you ask the data. by stranger_to_himself · 2008-06-25 04:34 · Score: 1

An emerging problem is that we have more data than questions. So we go from the traditional approach of 'hypothesis -> data collection -> statistical test of hypothesis -> profit' to the new sort of data driven approaches that things like genome sequencing are giving us. These go more like 'data collection -> data mining -> hypothesis generation -> ???'.

The main problem with this new approach is that you get so many possible findings from these huge datasets that you need an awful lot more data and replication to be sure they aren't flukes.
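A small simulation of that point, sketched in Python under made-up assumptions: every variable is pure Gaussian noise, so any "finding" is a fluke by construction. The helper names `pearson` and `count_flukes`, the sample sizes, and the 0.5 threshold are all illustrative choices, not anything from TFA.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_flukes(n_variables, n_samples, threshold, seed=0):
    """Count noise variables whose |r| against a noise target exceeds `threshold`."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(candidate, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    # More variables screened -> more spurious "discoveries" at a fixed sample size.
    for n_variables in (10, 1000, 10000):
        print(n_variables, "variables:", count_flukes(n_variables, 20, 0.5), "flukes")
    # Many more samples per variable makes the same threshold much harder to hit by chance.
    print("with 200 samples:", count_flukes(10000, 200, 0.5), "flukes")
```

The number of flukes grows with the number of variables screened and shrinks as the number of samples per variable grows, which is why mined findings need more data and independent replication.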
Re:What question do you ask the data. by Chapter80 · 2008-06-25 23:28 · Score: 1

Interestingly, Anderson's theory that "theories are dead" was probably not derived through automated analysis of petabytes of data, therefore proving that theories are not dead.

quality still as important as quantity by peter303 · 2008-06-25 03:57 · Score: 2, Interesting

There are still several computing problems from earlier, smaller eras that haven't been solved by the "more" paradigm. One example is realistic synthetic voice. The bandwidth is megabytes, achieved by mp3 players some years ago. However voice is the last part of the "real world" we have to capture instead of synthesize to implement computer-generated feature movies or video games. This keeps the need for having some "flesh" actors around, at least for a few more years :-)

Then there was Slashdot's retrospective of Artificial Intelligence a few days ago. Many of the interesting advances were made in the kilobyte and megabyte eras. It seems the gigabyte and terabyte eras have barely made a dent in progress.

Re:quality still as important as quantity by wurp · 2008-06-25 04:53 · Score: 1

Many of the interesting advances were made in the kilobyte and megabyte eras. It seems the gigabyte and terabyte eras have barely made a dent in progress.
Uh, for a huge variety of questions, I type keywords into Google and the answer comes back immediately. How is that *not* a vast advance in AI, that came in the gigabyte & terabyte era?

Google =/= scientific method by Rubikon · 2008-06-25 03:57 · Score: 5, Informative

That an incredible amount of data exists on any given topic does nothing to describe relationships, causality, precision, accuracy, distribution, correlation, or anything else. Data is information, and information must be processed in order to make it meaningful. Additionally, everything that's written, printed, published, etc, is not necessarily true, accurate, precise, etc.

If anything, the Google phenomenon demands more rigorous examination by accepted methods.

The preceding message has been brought to you by Captain Obvious and the letters O,R,L,Y.

Re:Google =/= scientific method by greenguy · 2008-06-25 04:59 · Score: 1

For service above and beyond the call of duty, Captain Obvious has been promoted.
He is now Major Obvious.

--
What if I do the same thing, and I do get different results?
Re:Google =/= scientific method by mdielmann · 2008-06-25 05:58 · Score: 1

Data is information...
Nope, data is not information. As the grade school slide says.

--
Sure I'm paranoid, but am I paranoid enough?
Re:Google =/= scientific method by Rubikon · 2008-07-07 05:59 · Score: 1

OK, then, I challenge you to share even one fact with the class for which a corresponding question cannot be formulated.
A question un-asked is a question nonetheless. Data is always information, regardless of whether or not the question has been asked and whether or not it is relevant to the question you did ask.

Say what now? by TubeSteak · 2008-06-25 03:58 · Score: 1

Correlation supersedes causation
Since when?
I'm pretty sure I was told the opposite in [every stats class ever]

Crunching large amounts of data is useless if you don't sort out which results are meaningless.

Side Note: WTF is up with /.?
I always post using Plain Old Text and hitting enter twice (two line breaks) only displays as one line break.
{p} doesn't create a new paragraph.
{br}{br} is the only thing that shows up correctly for me. /. was not behaving this way last week.

--
[Fuck Beta]
o0t!

Re:Say what now? by Anonymous Coward · 2008-06-25 04:25 · Score: 1, Interesting

For a long time we've known that causality is a broken paradigm. Correlation is all there really is. Your "causal" laws of physics are just an expression of very very high correlation. People like to talk about "mechanisms" but the mechanism is defined in terms of other imponderables (such as "forces", whatever they may be). It's all just to make things look like how we want them to look. Causation is make-believe. Useful make-believe, but it doesn't generalize, while correlation also extends down into complex systems where "cause and effect" are impossible to observe.
As an example, we know perfectly well that if you smoke you are *a lot* more likely to get lung cancer than if you don't smoke. But there is no evidence whatsoever that smoking causes lung cancer. The problem is not that we can't prove that smoking causes lung cancer, but that our concept of causation does not apply to systems as complex as the human body.
So in the words of the original article, the Scientific Method in that sense has been dead for at least 100 years.
Re:Say what now? by DamnStupidElf · 2008-06-25 05:49 · Score: 2, Interesting

There are two reasons you're wrong. One is entropy, and the other is one way functions.

Entropy forces causality to appear in physical systems. A boiled egg is highly correlated with a heated raw egg, but I challenge you to explain away the causation from one state to the other.

One way functions are quite similar, and probably a result of the same physical properties of matter. When a key is used to encrypt data, there is a high correlation between the original data, the key, and the encrypted data, but causation clearly flows from encrypting data with the key to the encrypted data state, and not from the encrypted state to a derived key and the original data. It's just a limitation of human (and our machines') abilities, but it nevertheless presents very strong evidence for the practical existence of causation.
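A hedged Python sketch of that asymmetry. It uses SHA-256 from the standard `hashlib` module as a stand-in for a function believed to be one-way (not the keyed encryption described above), and the helper names `digest` and `brute_force_preimage` are hypothetical. Going forward is one call; going backward here means trying candidates until one matches.

```python
import hashlib
import itertools
import string


def digest(message: str) -> str:
    """Forward direction: one cheap call."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def brute_force_preimage(target_digest, alphabet, max_length):
    """Reverse direction: try every string up to `max_length` characters.

    Returns (preimage, attempts), or (None, attempts) if nothing matched.
    """
    attempts = 0
    for length in range(1, max_length + 1):
        for chars in itertools.product(alphabet, repeat=length):
            attempts += 1
            candidate = "".join(chars)
            if digest(candidate) == target_digest:
                return candidate, attempts
    return None, attempts


if __name__ == "__main__":
    target = digest("egg")
    found, attempts = brute_force_preimage(target, string.ascii_lowercase, 3)
    print("recovered", found, "after", attempts, "attempts")
```

The search space grows exponentially with the message length, so the reverse direction is only feasible for toy inputs like this three-letter word.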
Re:Say what now? by sm62704 · 2008-06-25 06:18 · Score: 1

WTF is up with /.?
I don't know, it seems to be random. Yesterday I had to put a <p> between all paragraphs, then it seems to have gone ok today, but when I posted a journal I had to put one <p> in it. Maybe it was because it was a blockquote?
But at any rate, slashdot seems to be <p>ing its pants. Maybe it has something to do with moving to Chicago?

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest

No. by qw0ntum · 2008-06-25 03:58 · Score: 4, Insightful

First, not everyone has access to vast clouds of information due to expense and I don't think that's going away any time soon. So we'll still get to understand what's going on around us and not just rely on regression analysis to inform our every decision.

Second, in my experience with large sets of data, you can do all kinds of math to them to bring out interesting relationships but someone with domain expertise is going to have a much better insight into what the data is saying than someone who doesn't. It seems the peak of hubris to think that the techniques taught in every science (social, hard, or otherwise) are worth nothing compared to massive amounts of data. How do you know where to get the data from? How do you apply the data?

I don't think it's quite time to throw out "correlation != causation". In fact, I think now more than ever we need to be able to understand underlying phenomena behind the data precisely because there is so much of it. With so much data, coincidental correlation is going to happen quite often I'm sure.

And, of course, the ultimate reason we need to understand things is for, you know, when the cloud's not there.

--
'Every story, if continued long enough, ends in death.' --Ernest Hemingway

Re:No. by Anonymous Coward · 2008-06-25 18:50 · Score: 0

Second, in my experience with large sets of data, you can do all kinds of math to them to bring out interesting relationships but someone with domain expertise is going to have a much better insight into what the data is saying than someone who doesn't.
In my experience with data, if you torment it enough it will tell you anything you want it to.

Nonsense by Michael+Restivo · 2008-06-25 03:59 · Score: 1

Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves. Out with meaning and understanding as well.

Cheers, -m

Wrong by DogDude · 2008-06-25 03:59 · Score: 5, Insightful

This is typical web 2.0 hype... more is better. Which, as anybody who has used Wikipedia knows, is utter bullshit. The scientific method can't be supplanted by a large amount of questionable data. Tons and tons of bad data is still bad data. It doesn't get any more correct just because there's more of it.

--
I don't respond to AC's.

Re:Wrong by SBacks · 2008-06-25 04:27 · Score: 1

I disagree! And, I have evidence:
Googling "more is better" returns 177,000,000 results.
Googling "more is worse" returns 114,000,000 results.
QED
Re:Wrong by JasterBobaMereel · 2008-06-25 04:46 · Score: 1

Data is data, it can predict nothing
A Theory can be tested and predict unknowns, and these can be tested
More data is just more data it still predicts nothing and without a model/theory it cannot even show trends (Correlation != Causation) "The price of bread rises because flamingoes are pink..."
More data makes testing theories easier, but on its own is useless, just like more Wikipedia Articles without editors are just ramblings

--
Puteulanus fenestra mortis

Interesting, ranty, and wrong by xPsi · 2008-06-25 04:01 · Score: 5, Insightful

A thought-provoking piece written by someone who neither understands the scientific method nor Google. Who doesn't understand the difference between a Theory and a model. Who still doesn't get correlation!=causation. Who probably has never had to actually analyze any substantial amount of data before. And who has clearly been raised on a self-important intellectual diet consisting of too much Buckminster Fuller, Kurzweil, Frank Tipler, and Derrida. I'm sure there are some kernels of insight buried in there someplace, but I'm just not clear what they are. If his rant is indicative about the future direction of science, we're all doomed.

--
i\hbar\dot{\psi}=\hat{H}\psi

Re:Interesting, ranty, and wrong by Bazer · 2008-06-25 04:11 · Score: 1

"If his rant is indicative about the future direction of science, we're all doomed."
I wouldn't be too concerned about that. I'd be more concerned about the reason behind this quote:
"All models are wrong, and increasingly you can succeed without them."
I sincerely hope there's some merit behind it. If there isn't any, then Google would have to revise this guy's job position.
Re:Interesting, ranty, and wrong by TerranFury · 2008-06-25 04:21 · Score: 1

I hadn't heard of Frank Tipler until you mentioned him. WOW! That man speaks crazy talk.

And he has a faculty position.

I'm starting to realize that academia is similar in some ways to pop-culture, in that name recognition is everything. It differs just in that the publicity stunts you do need to impress a different sort of person.

In a way, it's career advice. *Goes back to getting PhD*
Re:Interesting, ranty, and wrong by seandiggity · 2008-06-25 05:17 · Score: 0, Troll

"If his rant is indicative about the future direction of science, we're all doomed."
Unfortunately, the social sciences literature is full of this stuff. It takes real discipline and fortitude to get through training in the social sciences and not be seduced by this kind of rambling bullshit, without even mentioning the other obstacles. I've found this book a great resource, and a good primer on science and rationality, one I think Chris Anderson needs to read.

...and I think we're doomed for other reasons :)

--
Geeks like to think that they can ignore politics, you can leave politics alone, but politics won't leave you alone.-rms
Re:Interesting, ranty, and wrong by poot_rootbeer · 2008-06-25 05:20 · Score: 3, Insightful

"A thought-provoking piece written by someone who understands neither the scientific method nor Google. Who doesn't understand the difference between a Theory and a model. Who still doesn't get correlation!=causation. Who probably has never had to actually analyze any substantial amount of data before. And who has clearly been raised on a self-important intellectual diet consisting of too much Buckminster Fuller, Kurzweil, Frank Tipler, and Derrida."
And he works at Wired magazine? You don't say.
Re:Interesting, ranty, and wrong by sm62704 · 2008-06-25 06:35 · Score: 1

About ten or fifteen years ago I had to analyse 30,000 questionnaires in my job. I'm not a statistician, but my boss is. We had all sorts of learned experts, a roomful of PhDs, giving input into the survey instrument. As the "computer guy" my task was to take these 30,000 pieces of paper, put them in the computer, and turn the data into information.
I might as well have been surveying slashdot to find opinions about the RIAA or SCO. The data were overwhelmingly negative. Obscenely negative. Scatologically negative. People HATED us and the program we were surveying.
The results were quickly buried. Somehow that was more significant to me than the data.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
Re:Interesting, ranty, and wrong by Sparky+McGruff · 2008-06-25 10:13 · Score: 1

There's a bit of truth in there, perhaps, under the BS.

In biology, a lot of knowledge has been gained by the use of high-throughput methods -- microarrays, whole-genome analysis, etc. This doesn't strictly follow the scientific method; you're not coming up with a narrow hypothesis, and testing it against the null hypothesis. These methods are exploratory by nature, and there's no specific hypothesis being tested, other than "we'll be able to interpret the data post-hoc". You crank out the data, then come up with the model.

Honestly, that's how the work is done these days. And until fairly recently, you couldn't get a proposal funded if you explicitly said that this was your approach. Why? Because it's not "hypothesis driven". So everyone wrote grants and proposals that said they'd test a narrow hypothesis, then take the money and run the exploratory experiments anyways. And, when they got the data, they'd pretend that they were testing a narrow model from the get-go.

With the success of the Human Genome Project, I think we can stop pretending that experiments that are exploratory in nature are just unfocused, unintelligent crap.

Biggest Data Collector LHC relies on Models by markk · 2008-06-25 04:04 · Score: 4, Insightful

I thought this was a joke at first. One thing to think about is that the biggest data collector of them all, the Large Hadron Collider, which fits the frame given perfectly (delivering terabytes of data in huge data sets), is just the opposite of the described scenario. Models are crucial to picking what data actually gets recorded. In fact, a large part of how good the LHC data turns out to be will depend on using models to select which events to capture. The way the data is captured is of course also based on long effort and knowledge from previous detectors. This isn't just randomly, or even generically selectively, gathering data and then analyzing it. This is targeted data gathering based on complex scientific theories. There have been shouting matches over what to tag for collection based on what people think is important for a given theory, and these will happen again.

As our collection abilities rise exponentially, our storage and analysis abilities are not growing exponentially, even though they are increasing at a fast rate! I would argue exactly the opposite of what this article said. We are going to be more and more dependent on our current scientific theories to even be able to choose appropriately from the rich data that new sensors and techniques will let us collect. That is, we are more and more dependent on our scientific theories when we get data, not less. Did we even know to get methylation data when sequencing a genome? How about some other "ylation"? Without background theory and experience we wouldn't even know some of that stuff was there to collect!

Re:Biggest Data Collector LHC relies on Models by mako1138 · 2008-06-25 12:59 · Score: 1

To put targeted data gathering in perspective: the detectors at the LHC will produce thousands of channels' worth of data, at an interaction rate of 40 MHz. Conservative estimate: if we've got 8-bit data in 1024 channels at 40 MHz, that's a raw data rate of ~40 GB/s.
The function of the trigger, as it's called, is to get that insane datarate down to something manageable. This is where the models come in. Physicists who are looking for certain events will run full simulations of collisions in the detector, and come up with a set of detector conditions. Then they present their requests to the trigger people, who will see if they can accommodate them.
The trigger system then processes the data while the accelerator runs. Most of the data gets ignored, but the interesting stuff gets flagged and saved.
(Note: I don't work for LHC, but I work for RHIC.)
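A quick sanity check on the raw data rate estimate, for anyone who wants to redo the arithmetic. This is only a sketch: it uses the three quoted numbers (8 bits per sample, 1024 channels, 40 MHz) and assumes every channel is read out on every interaction. The function name is made up for illustration and says nothing about how a real detector readout works.

```python
# Back-of-envelope raw data rate, using only the figures quoted in the estimate.
BITS_PER_SAMPLE = 8          # "8-bit data"
CHANNELS = 1024              # "1024 channels"
INTERACTION_RATE_HZ = 40e6   # "40 MHz"


def raw_rate_bytes_per_second(bits_per_sample, channels, rate_hz):
    """Bytes per second if every channel is read out on every interaction."""
    bytes_per_sample = bits_per_sample / 8
    return bytes_per_sample * channels * rate_hz


rate = raw_rate_bytes_per_second(BITS_PER_SAMPLE, CHANNELS, INTERACTION_RATE_HZ)
print(f"{rate / 1e9:.2f} GB/s")  # 40.96 GB/s
```

With those inputs the result is on the order of tens of gigabytes per second; more channels or wider samples scale it linearly, which is the whole reason a trigger is needed.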

WTF, be serious by mlwmohawk · 2008-06-25 04:07 · Score: 2, Insightful

This is nonsense pure and simple.

One needs to acquire facts. Now these "facts" can come from your own research or, in the age of the internet, someone else's data, but they still need to be collected and verified.

The *only* advantage that google provides is a more efficient way of sharing and finding facts. Not even all facts, those that are popular and topical are what you'll most likely find.

Historical information, from when newspapers only used dead trees, can be very difficult to find on the internet unless someone else did the research first.

this one needs a "haha" tag by quixote9 · 2008-06-25 04:07 · Score: 1

Vast clouds of information used without intelligence are just garbage going nowhere. You can't even call it Garbage In Garbage Out, because it's not being processed by any kind of mind at all.

What could possibly go wrong?

Just to clarify by GameboyRMH · 2008-06-25 04:08 · Score: 5, Insightful

To avoid the same fate as the GP, let me clarify that by WTFey I specifically meant that the article was full of fluff, light on details and generally pointless...which makes me think "WTF." The closest thing to a point I could get from the article was "Nice big blobs of data can be useful, and statistical data based on said blobs could replace the results of scientific research." Mmmkay.

A sensational headline leading to a rather pointless article consisting mostly of fluff: WTF.

--
"When information is power, privacy is freedom" - Jah-Wren Ryel

Re:Just to clarify by Anonymous Coward · 2008-06-25 05:07 · Score: 0

I specifically meant that the article was full of fluff, light on details and generally pointless...which makes me think "WTF."
Isn't this true of pretty much everything that Chris Anderson writes?
Re:Just to clarify by javilon · 2008-06-25 05:14 · Score: 4, Interesting

Well, I think the point they make is that with these kinds of mathematical tools running against these huge sets of data, you get models out that you couldn't have thought of yourself. This is real AI. Over the last few days we had entries here on Slashdot about how AI is not advancing, but this kind of thing is very advanced AI and it is new.
I'll explain myself. The biggest job that a brain does (let's not consider a human brain, so we don't get into the consciousness/mind type of conversation) is to find statistical correlations in the input data and extract models from these correlations that can be used to predict the future. This is exactly what these tools are doing.
Before these tools, by looking at the data you would go: mmmm, this is interesting, let's check it out. That is, you would come up with a model and try to find out if it predicts the data. Then we started to use computers to check our models, and from what this WTFey article says, it is now the computer that comes up with the model, starting from raw data.

--

When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
Re:Just to clarify by Kamineko · 2008-06-25 06:03 · Score: 2, Funny

"What The Fluff"?
Re:Just to clarify by inkyblue2 · 2008-06-25 06:17 · Score: 3, Insightful

the difference is that brains create new theories and models to describe data, whereas this article specifically talks about avoiding the need to make new theories to describe data. we still have no AI that can create theories and models and semantics on its own. i agree that when that happens, we'll have something exciting and new, but it hasn't happened yet.
Re:Just to clarify by Anonymous Coward · 2008-06-25 07:03 · Score: 0

I so did not read the word blobs correctly my first read through your post.
Re:Just to clarify by ardle · 2008-06-25 07:04 · Score: 2, Funny

I fear to think what theory an AI would come up with based on all the "information" that's on the Internet
Re:Just to clarify by inkyblue2 · 2008-06-25 07:14 · Score: 1

instead of "42" it'll come up with "rule 34."
Re:Just to clarify by lawn.ninja · 2008-06-25 07:35 · Score: 1

I get what you're saying, but it would be a huge mistake to drop scientific theory. I can make many correlations and see many patterns, but my mind seeks to do that. So to me that means that AI has advanced to the point that it makes the same mistakes humans do. It assumes things based on what it sees.... That pattern it finds could simply be a byproduct of the subsystem it is running on top of. Kind of like Newton's theories. The only things I agree with in the article are that computers are capable of processing massive amounts of data and that all of our current theories about the universe are wrong.
Re:Just to clarify by Anonymous Coward · 2008-06-25 07:39 · Score: 0

Wow, what a steaming pile of nonsense.
I think your brain should focus on its job of making sure you don't post rambling speculation about shit you don't understand.
Re:Just to clarify by GameboyRMH · 2008-06-25 07:50 · Score: 1

I agree that it would be a huge mistake to drop scientific theory (the whole idea is nonsense) or rely on statistics in place of it. Using models generated from patterns in the data could be a helpful tool in science, but not much more than that.

--
"When information is power, privacy is freedom" - Jah-Wren Ryel
Re:Just to clarify by Anonymous Coward · 2008-06-25 08:04 · Score: 0

I beg to differ. Sounds like you aren't one who has actually modeled data. Modern technology only allows data to be modeled using known or simplified models. Of course, the only cited example of this "non-scientific" future is gene sequencing. Gene sequencing and the models involved in sequencing were developed by HUMANS.
Consider experimental physics. To analyze, let's say, tunneling behavior of electrons through specific materials, one would have to use certain models to obtain any gainful data. A computer can instantly fit the collected data to a simple logarithmic curve. Which, of course, means NOTHING. The data would have to be analyzed through models created by humans to be of any use: theoretical models which give meaning to the variables contributing to a certain curve.
Case in point. Take the data obtained from the original experiments used to first observe the Fractional Quantum Hall Effect, subject it to the best modeling in existence, and I guarantee you the software would not understand the underpinnings of the FQHE. Otherwise, we'd have computers winning Nobel Prizes across the board.
Re:Just to clarify by naoursla · 2008-06-25 08:12 · Score: 1

Models are a way of compressing data.
If you observe (x,y) as [(1,2), (2, 4), (3, 6), (4, 8)] then you might build a model that says y = 2x.
The more data you gather that fits your model the more confidence you gain in your model. The model allows you to work with data that you haven't seen.
But models can be wrong. In fact, they often are. They are approximations of the real world. What Google is doing is observing lots and lots of data and not making a model out of it. Their 'model' is the raw data and instead of 'compressing' it into a mathematical equation they just store it and work with the raw data.
You no longer need a model hypothesis. Since you don't have a hypothesis, you don't have to test anything. You don't do the scientific method. Instead you just observe.
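To make the "model as compression" idea concrete, here is a minimal sketch using the four (x, y) observations from the comment. The two helper functions are hypothetical names for illustration: one compresses the data into a single slope, the other keeps the raw data and can only answer for inputs it has already seen.

```python
# The four observations quoted in the comment.
observations = [(1, 2), (2, 4), (3, 6), (4, 8)]


def fit_slope_through_origin(points):
    """Least-squares slope a for the model y = a * x (no intercept term)."""
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return sum_xy / sum_xx


def lookup(points, x):
    """'The raw data is the model': answers only for x values already observed."""
    table = dict(points)
    return table.get(x)  # None when x was never observed


slope = fit_slope_through_origin(observations)
print(slope)                     # 2.0
print(slope * 10)                # 20.0, a prediction for an unseen x
print(lookup(observations, 3))   # 6
print(lookup(observations, 10))  # None, the raw table has nothing to say
```

The single number 2.0 stands in for the whole table and extends past it; the table itself is exact but silent outside what was observed.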
Re:Just to clarify by Dogtanian · 2008-06-25 08:16 · Score: 1

"To avoid the same fate as the GP, let me clarify that by WTFey I specifically meant that the article was full of fluff, light on details and generally pointless..."
Typical Wired science article, then. They give the superficial appearance of being in-depth, but in truth they're just longwinded, and the science in them is mainly unexplained gloss to be fetishised over. You read them and then ask yourself exactly what you've learned.

Wired science is for people who're into technology as a lifestyle, who like to kid themselves that they're interested in science but ultimately aren't. They aren't even particularly good "popular" science, because they're too far up their own arse.

This article isn't completely awful, which kind of makes it more frustrating. There's enough in there to get you interested, but you're left trying to figure out what conclusions it's exactly trying to draw and how much of it is actual science vs. (possibly the author's) pseudo-scientific technofetishism or pretentiousness.

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Re:Just to clarify by hedwards · 2008-06-25 08:18 · Score: 3, Insightful

Quite so, the article was dead wrong.
Having that much data allows for science that wouldn't have happened otherwise, but it doesn't allow us to forget about sound scientific principles. I for one don't want to die because the pharmaceutical company and my doctor thought that a correlation with safety was enough, without doing the experiments to verify. I could die either way, but correlation just isn't enough in many cases. Statistics don't prove or disprove anything; ultimately, science is about understanding things the way that they are. Statistics can't do that.
If you can collect and store 100 pieces of information about a test subject for 200,000 test subjects at 150 points in time, you can do a huge amount with that. But, the data still needs to be interpreted, verified and placed into a verifiable model.
It doesn't really surprise me that Google would be handling search the way that they do, considering how borderline impossible it is to search for certain things unless you already know what you want. Searching for answers to software bugs ought to be straightforward, but Google seems completely incapable of sanely coping with version numbers without a lot of work.
Re:Just to clarify by verbamour · 2008-06-25 08:39 · Score: 1

Past performance is no guarantee of future results.
Re:Just to clarify by cayenne8 · 2008-06-25 09:09 · Score: 1

I think the WTF was with regard to this: "succeed without them.' Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"
What the fuck is maths? Did they mean mathematics?

--
Light travels faster than sound. This is why some people appear bright until you hear them speak.........
Re:Just to clarify by DrVomact · 2008-06-25 10:14 · Score: 1

WTF? Shouldn't the parent be modded up for humor, not "interesting"? It makes as little sense as the TFA, so I assumed...well...maybe both were written by the same computer.

--
Great men are almost always bad men--Lord Acton's Corollary
Re:Just to clarify by bar-agent · 2008-06-25 10:28 · Score: 1

Past performance is no guarantee of future results.
No guarantee, that's true... but past performance is a probable indicator of future results.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Re:Just to clarify by ceoyoyo · 2008-06-25 11:27 · Score: 1

He's basically talking (badly) about data mining. Which is a great technique to find interesting research avenues, but does NOT, in any way, replace scientific research.
Re:Just to clarify by Pseudonym · 2008-06-25 12:24 · Score: 1

Here's a simple example. Suppose you want to correct spelling. There are two ways of doing this. The first we'll call the "Microsoft" way: come up with a dictionary of correctly-spelled words and check against that. This is labour-intensive, and has the problem of novel words, different dialects and so on. The second is the Google way: analyse every piece of text in the world, assume that the majority knows how to spell correctly, and correct against that. This is compute-intensive, but computers are cheaper than humans.
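A toy sketch of the two approaches, for illustration only. The "Microsoft" and "Google" labels above are the poster's shorthand, and nothing here is either company's actual implementation; the corpus, the dictionary, and the function names are all made up. The corpus-based corrector simply picks the most frequent word within one edit of the input.

```python
from collections import Counter
import string


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def in_dictionary(word, dictionary):
    """Dictionary approach: a word is correct only if someone listed it by hand."""
    return word in dictionary


def correct_from_corpus(word, counts):
    """Corpus approach: trust the majority, pick the most frequent nearby form."""
    candidates = [w for w in edits1(word) | {word} if w in counts]
    if not candidates:
        return word  # nothing similar has ever been seen; leave it alone
    return max(candidates, key=lambda w: counts[w])


# Tiny invented corpus: one misspelling, and a novel word the dictionary lacks.
corpus = ("my spelling is fine your spelling is fine her speling is not "
          "read my blog read her blog spelling")
counts = Counter(corpus.split())
dictionary = {"my", "your", "her", "spelling", "is", "fine", "not", "read"}

print(in_dictionary("blog", dictionary))       # False: novel word is rejected
print(correct_from_corpus("speling", counts))  # spelling
print(correct_from_corpus("blog", counts))     # blog: accepted because people use it
```

The corpus version handles the novel word for free, but it is only as good as the majority's spelling, since a misspelling that outnumbers the correct form would win.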

--
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Re:Just to clarify by theshowmecanuck · 2008-06-25 12:48 · Score: 2, Insightful

Google no longer returns any useful data anyway. Search for anything and all it will turn up are thousands of web sites trying to sell you something that might be related to the search query you typed in. I think that is why Wikipedia is so popular. At least there you get some information on a topic you search for... and it doesn't contain the words 'best price on the net' etc. either. I have just about given up searching on Google or Yahoo, or any of the big search engines, since they usually don't return anything useful anyway. Except when I need to buy something. If you don't personally know a specific web site that has info on the subject you are researching, you are screwed as far as getting anything useful from Google.

--
-- I ignore anonymous replies to my comments and postings.
Re:Just to clarify by mike.mondy · 2008-06-25 16:45 · Score: 1

Finding patterns equals creating models? Not quite. At least not if "model" means anything that could count as even weak AI.
Finding correlations is not the same thing as having a model or theory to explain why the correlations exist.
In particular, if the data set does not contain all relevant info, you cannot know what the causality is between two correlated items. The old example is: "Both ice cream sales and drownings increase in certain months. Does this mean that eating ice cream leads to drowning?"
On the other hand, sometimes you don't care about "understanding". The stock market guys just want to find patterns and hope they hold true (predict) long enough to make money.
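The ice cream example can be simulated in a few lines. This is a sketch on entirely synthetic data: the temperature variable, the coefficients, and the noise levels are invented, and the only point is that two series driven by a shared cause correlate strongly until that cause is controlled for.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Invented data: temperature drives both series; neither drives the other.
temperature = [rng.uniform(0, 30) for _ in range(5000)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw = pearson_r(ice_cream_sales, drownings)
controlled = partial_r(ice_cream_sales, drownings, temperature)
print(f"raw correlation:             {raw:.2f}")
print(f"controlling for temperature: {controlled:.2f}")
```

The raw correlation comes out strongly positive while the controlled one sits near zero. If temperature had never been recorded, nothing in the data set could reveal that.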
Re:Just to clarify by rugatero · 2008-06-25 20:36 · Score: 2, Funny

The second is the Google way: analyse every piece of text in the world,
That should be analyase every piece of text on the Internet , which hampers the next step somewhat:

assume that the majority knows how to spell correctly

--
This comment is for entertainment purposes only. Any similarity to real insight or information is purely coincidental.
Re:Just to clarify by rugatero · 2008-06-25 20:38 · Score: 1

... and I have inadvertently supported my own point by misspelling analyse. Damn.

--
This comment is for entertainment purposes only. Any similarity to real insight or information is purely coincidental.
Re:Just to clarify by TuringTest · 2008-06-25 21:36 · Score: 2, Informative

I suggest you include the search terms -buy and -price. That works wonders in getting Google to show you the actually relevant pieces of information.

--
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Re:Just to clarify by neomunk · 2008-06-26 03:11 · Score: 1

I agree, mostly. It's been a while, but I remember the Scientific Method as being: Observation, Hypothesis, Experimentation, GOTO Observation. You've already described how the system can be useful in the Observation stage, and I submit to you the idea that it is assisting the Hypothesis stage as well.
For some reason, you seem to insinuate that with the new 'data blob' method, you Observe, Hypothesize, and then run with the results as if that's all that need be done. For example, I'll suggest that your 'ice-cream, drowning' correlation (though simplistic, it's a good analogy) is EXACTLY the type of correlation that we humans have made over the years USING the scientific method, but because of the cyclical 'GOTO Observe' step of the SM, we test the new models we've built and find that there is a flaw in our model. There is nothing about this new technique that makes further refinement and testing of the model inapplicable.
That being said, the very first line of my post still stands, I agree with you, in that at most this new technique will represent a tool of learning. People will need to do much of the 'fine tuning' (and outright rejection) of the models presented to us, as well as being (IMHO) incredibly more efficient at the 'design' phase of the 'Experimentation' step, due to our natural creativity. Whether said 'creativity' can be simulated is a topic for another thread, and would only apply in highly advanced models of the system we are talking about, so it's not relevant to this real-world-tech conversation.

Nonsense. by going_the_2Rpi_way · 2008-06-25 04:10 · Score: 1

This is grade school stuff. Correlation is not causation.

Which means if you're approaching a region you haven't sampled, then you can't understand what's going to happen because you've thrown away your interest in 'why [something] does what it does, [because] it just does it.'

If you're only using models as correlations or proxies, what are you using models for anyways? There's nothing 'increasingly' true about that.

--
=======
Science -- Sealed, Delivered.

WHAT? by vivin · 2008-06-25 04:12 · Score: 1

A scientist doing an experiment still relies on the scientific method to collect his own data to see if they support his hypothes[ie]s. I really don't see anyone publishing a paper and saying "Dudes! I used Google to find my data points!" How the hell is Google going to stop people from doing experiments and finding their own data?

This article is complete crap. I don't think this person even understands what the "Scientific Method" means.

--
Vivin Suresh Paliath
http://vivin.net

I like

It's still alive by rnaiguy · 2008-06-25 04:12 · Score: 1

Sure, you've got tons of data, but you need a theory to use it to solve real scientific problems.

For example, Craig Venter may have tons of genes that look like something that can make gasoline from grass, but you still need to test each one the old-fashioned way, with careful application of theory and experiment, to see if it works before you start using it.

Sidney Brenner (legendary Biologist and Nobel laureate) calls these methods "low-input, high-throughput, no-output biology." http://www.mc.vanderbilt.edu/reporter/index.html?ID=5027

Re:It's still alive by Daniel+Dvorkin · 2008-06-25 04:30 · Score: 1

Brenner, like a lot of older wet-lab scientists, makes some good points but goes way too far in his criticisms. High-throughput biology is increasing our understanding of basic cellular processes at an exponential rate. The key point he misses, I think, is that high-throughput techniques are certainly low-output on a per-experiment basis compared to traditional techniques -- but "low" is not the same as "no", and if you do a very large number of experiments in parallel, there's a good chance that one or two of them will yield useful data. Furthermore, with large public repositories like GEO, there's a good possibility that the hundreds or thousands of experiments that don't yield useful results for your work can still be useful to someone else.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.

The Paradigm is the Data Subset by fictionpuss · 2008-06-25 04:15 · Score: 5, Insightful

The paradigm is embedded in the quantity, or subset, of data you choose to analyse.

For example, to detect stress you might traditionally measure heartbeat, skin conductivity, pupil dilation.

In the "petabyte age" you throw in the number of times the subject uses the letter 's'; how frequently they use the 'reload' button on the browser; what colour of pants they wore last tuesday; Pepsi vs. coca cola; the number of times they picked their nose in 1997 and any and every other bit of data you have on the subject.

In the "petabyte age", most of the data you sift through will show no correlation, but you have a much better chance of finding the unexpected if indeed, there is some unknown factor out there.

Re:The Paradigm is the Data Subset by kurthr · 2008-06-25 04:38 · Score: 5, Insightful

Don't you run a much higher probability of finding high correlation by chance?
I can expect to find a result that matches my model to 95% certainty about 5% of the time in random data. You can correct for this, but it's against human nature because people like to see the face of Mary in toast.
Learning how to look for correlation in huge uncontrolled data sets will require a new paradigm... or it will ultimately be useless and even perhaps, unsuccessful.
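The "about 5% of the time in random data" point is easy to check by simulation. A minimal sketch, with assumptions spelled out: pure Gaussian noise, 30 samples per variable, a two-sided 5% significance test on Pearson's r, and the critical t value for 28 degrees of freedom hardcoded rather than computed. The function names are invented for illustration.

```python
import math
import random

SAMPLES = 30
T_CRIT = 2.048  # two-sided 5% critical value of Student's t, 28 degrees of freedom
R_CRIT = T_CRIT / math.sqrt(T_CRIT ** 2 + (SAMPLES - 2))  # same cutoff expressed as |r|


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(trials=2000, seed=0):
    """Fraction of unrelated noise pairs whose correlation looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(SAMPLES)]
        ys = [rng.gauss(0, 1) for _ in range(SAMPLES)]
        if abs(pearson_r(xs, ys)) > R_CRIT:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"'significant' correlations in pure noise: {rate:.1%}")
# Comparing 1000 unrelated variables against one outcome at this threshold
# would be expected to produce roughly 1000 * 0.05 = 50 chance hits.
```

One standard correction is to divide the significance threshold by the number of comparisons (Bonferroni), which is exactly the sort of discipline that is easy to skip when the data is cheap.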
Re:The Paradigm is the Data Subset by hal9000(jr) · 2008-06-25 04:43 · Score: 2, Insightful

"The paradigm is embedded in the quantity, or subset, of data you choose to analyse."
In addition, once you start to analyze something, you have already built the "model" ipso facto. I can't imagine how you could set out to analyze something without a model.

The example Anderson uses in fact shows this. Venter had to have a model of an ecosystem within which he posits the existence of organisms. Through testing (statistical analysis), he finds them. Thus 1) ecosystems house organisms and 2) there are organisms we don't yet know about.

Seems like the scientific method to me.
Re:The Paradigm is the Data Subset by ArhcAngel · 2008-06-25 05:17 · Score: 1

In the "petabyte age" you throw in the number of times the subject uses the letter 's'; how frequently they use the 'reload' button on the browser; what colour of pants they wore last tuesday; Pepsi vs. coca cola; the number of times they picked their nose in 1997 and any and every other bit of data you have on the subject.
I see what you did there. Freud would find you fascinating!

--
"A person is smart. People are dumb, panicky dangerous animals and you know it." - K
Re:The Paradigm is the Data Subset by fictionpuss · 2008-06-25 05:39 · Score: 3, Interesting

"Learning how to look for correlation in huge uncontrolled data sets will require a new paradigm... or it will ultimately be useless and even perhaps, unsuccessful."
The ability to find statistically significant correlation (i.e. not Mary-in-Toast) in huge datasets is a prerequisite condition.
But that goes for any visualisation technique - look to Edward Tufte or Stephen Few for detailed examples of how even the simple xy-graph can be abused.
Re:The Paradigm is the Data Subset by edcheevy · 2008-06-25 05:57 · Score: 3, Insightful

Yes. The more data you collect, the more likely any two things will be correlated slightly. With millions or billions of data points, you would be shocked to find a variable that does NOT correlate significantly with everything else. That's why "correlation" or "significance" alone becomes less useful and we need to a) report effect size measures to get a better sense of how important the correlation actually is and b) continue to use our heads (and not always give blind trust to the cloud) to determine which correlations are useful and which ones are fluff.

A correlation that helps place internet ads .0000002% more efficiently might matter to Google but likely doesn't further human understanding or refine our thinking in any practically appreciable way. And because EVERYTHING is correlated at that point, I suppose there are an infinite number of variables we could use to refine our model. I think the only paradigm shift here is that it would take an army of AIs to sift through and bring some meaning to all that noise, and an army of AIs would probably be doing other things with their time. ;p
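A small worked example of why significance alone stops being informative at scale. This sketch uses the standard t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The correlation value of 0.01 and the sample sizes are arbitrary numbers picked for illustration.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r = 0.01  # a tiny correlation, chosen arbitrarily for illustration
for n in (100, 10_000, 1_000_000):
    t = t_statistic(r, n)
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    # r squared is the share of variance explained: the effect size stays fixed.
    print(f"n={n:>9,}  t={t:6.2f}  {verdict}  variance explained={r * r:.2%}")
```

The same r goes from nowhere near significant at n=100 to overwhelmingly significant at a million points, while the variance explained stays at one hundredth of a percent throughout. That is the case for reporting effect sizes.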
Re:The Paradigm is the Data Subset by lilomar · 2008-06-25 06:17 · Score: 1

It's a sad day on slashdot when Freud has to be wiki-linked to avoid misunderstanding.

--
The creator of this post (Jacob Smith) hereby releases it, and all of his other posts, into the public domain.
Re:The Paradigm is the Data Subset by iwein · 2008-06-25 07:28 · Score: 1

"The paradigm is embedded in the quantity, or subset, of data you choose to analyse."
No, it's embedded in the methodology that selects the subset. Data is data.

--
Show a man some news, distract him for an hour. Show a man some mod points, distract him for the rest of his life.
Re:The Paradigm is the Data Subset by fictionpuss · 2008-06-25 08:45 · Score: 1

I'm not sure that your distinction is significant enough to argue the point with.
Re:The Paradigm is the Data Subset by thesandtiger · 2008-06-25 09:20 · Score: 1

Don't you run a much higher probability of finding high correlation by chance?
I can expect to find a result that matches my model to 95% certainty about 5% of the time in random data. You can correct for this, but it's against human nature because people like to see the face of Mary in toast.
Learning how to look for correlation in huge uncontrolled data sets will require a new paradigm... or it will ultimately be useless and even perhaps, unsuccessful.
Yes, and that's why you can then go back and design a follow-up experiment to test to see if it's actually chance or something else.
For example: In my current lab, one of the things we're doing is a longitudinal study of certain groups of individuals over the course of 2-3 years, looking at personality traits and risk-taking behaviors as they relate to HIV/AIDS infection. We're taking a shotgun approach, running a huge number of measures for each subject, and just collecting data.
Even in the early stages, we've found some correlations that seemed significant that came out of left-field or that nobody in the research team had anticipated. When we found those, we then added a couple of modifications to our protocols for a subset of our population that were designed to test those correlations (random blip or actual significance?) and were able to determine that it was almost certainly just a fluke. That finding has been borne out by further data collection as the study has continued.
My point with all this is that yes, it certainly is possible to find meaningless correlations by random chance, but it's also possible to assess those findings and put them into perspective. This is incredibly beneficial and has already led to some findings that so far seem to actually be significant instead of just flukes.
Also, having SO MUCH data available allows us to (easily) explore questions that come up after the fact that we would have to run a follow-up on if we hadn't had so much data. For example, one study in our lab involves interviewing adolescent girls from a specific ethnic group and their mothers about sex. Initially, only female interviewers above the age of 25 and also from that ethnic group were used. However, we were unable to find enough interviewers who matched the requirements, so eventually the same-ethnicity requirement was dropped and then the age requirement. So, I had to run through the data and determine if age of interviewer or ethnicity of the interviewer had any effect on the process, as well as account for potential bias caused by an individual interviewer's style.
When looking at the data I included over 300 factors because I could - computing power is cheap. I didn't find any evidence to support the idea that the age or ethnicity of the interviewer had any impact on the study, but what I *did* find was that subjects interviewed on Tuesdays between 4 and 6 in a particular interview room were more likely to indicate feeling depressed. I looked into it, and it turns out that there is a support group for parents of terminally ill children that meets in the room next to the interview room on that day and time and, while no speech can be made out, there is a lot of crying that can just barely be heard. So, we stopped interviewing people in that room at that time and the measures indicating depression returned to normal. I absolutely wouldn't have found that out if I hadn't decided, just because I could, to add in piles of extraneous data.
The only real downside to getting so much data from so many measures is that it increases the length of each assessment session, has necessitated getting enhanced funding to more appropriately compensate our subjects, and has led to a slightly more challenging recruitment and retention process. However, those are things we know how to deal with, and their costs are certainly reasonable when compared to the chances of finding things that may be important that we might have overlooked otherwise. Even when they're silly things like one particular interview room having bad sound at one particular time on one particular day.

--
Since I can't tell them apart, I treat all ACs as the same person.
Re:The Paradigm is the Data Subset by Anonymous Coward · 2008-06-25 11:29 · Score: 0

That's why there are multiple-testing corrections like the Bonferroni correction and the false discovery rate.
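A rough sketch of what those two corrections do, assuming 300 unrelated factors (the same order of magnitude as the interview study above) whose p-values are pure noise. The function names and the simulated p-values are made up for illustration; the Bonferroni rule compares each p-value to alpha/m, and the Benjamini-Hochberg step-up rule is one common way of controlling the false discovery rate.

```python
import random

def bonferroni(p_values, alpha=0.05):
    """Indices of tests that survive a Bonferroni correction (p <= alpha / m)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of tests accepted by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k whose p-value is at most alpha * k / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical setup: every null hypothesis is true, so each p-value is
# uniform on [0, 1] and any "discovery" is a false positive.
random.seed(0)
p_values = [random.random() for _ in range(300)]

naive = [i for i, p in enumerate(p_values) if p <= 0.05]
print("uncorrected 'discoveries':", len(naive))
print("after Bonferroni:", len(bonferroni(p_values)))
print("after Benjamini-Hochberg:", len(benjamini_hochberg(p_values)))
```

With 300 noise-only tests at the 5% level you would expect around 15 uncorrected hits on average; both corrections are designed to cut that down sharply, at the cost of some power to detect real effects.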
Re:The Paradigm is the Data Subset by Anonymous Coward · 2008-06-25 15:51 · Score: 0

Exactly. It'll be like that phony nonsense from a couple of years ago based on the book "The Bible Code".
Re:The Paradigm is the Data Subset by iwein · 2008-06-25 21:05 · Score: 1

The point was made before that the article is just too vague. One of the main flaws is that it doesn't make the distinction between results (data) and method.
In the "petabyte age", most of the data you sift through will show no correlation, but you have a much better chance of finding the unexpected if indeed, there is some unknown factor out there.
My gripe with this is that you have no chance of finding a correlation if you have no method to register it when you do. Data in itself is useless; you need to know how you got it and you need to do something sensible with it.

Google is not magically going to correlate data.

--
Show a man some news, distract him for an hour. Show a man some mod points, distract him for the rest of his life.

To paraphrase Mark Twain... by Kid+Zero · 2008-06-25 04:15 · Score: 1

There's lies, damn lies, and statistics. Now "Clouds" of information.

Feh.

No. Science Scales. by mbone · 2008-06-25 04:17 · Score: 3, Informative

Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?

No. And also no to the basic premise of the article.

Meteorologists have been doing this for decades (principal component analysis has been a crucial tool there since the 1960's, and correlation analysis has been used in some form since the 1920's if not earlier) and so have the astronomers. Oh, and the particle physicists have been sifting data in their own way on a big scale ever since World War II.

As one of many examples, if you ever have heard of an "El Nino event," that was discovered through correlation analysis and is best understood through principal component analysis. BTW, the original work predates electronic computers and was all done by hand. The vast quantities of meteorological data require statistical analysis to make any progress at all, but that certainly does not mean that you cannot use the scientific method.
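
For anyone who hasn't met principal component analysis, here is a minimal sketch of the idea using only the standard library. Everything in it is an assumption for illustration: the four "stations", their loadings, and the 48-month oscillation are synthetic, not real meteorological data, and power iteration is just one simple way to get the leading component.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance of the columns; each row is one observation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def leading_component(cov, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of cov."""
    d = len(cov)
    v = [1.0] + [0.0] * (d - 1)  # start from the first basis vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return v, eigenvalue

# Synthetic data: one hidden oscillation seen with opposite signs at
# different stations (a see-saw), plus independent noise at each station.
random.seed(2)
loadings = [1.0, 0.8, -0.9, -0.7]
rows = []
for month in range(240):
    signal = math.sin(2 * math.pi * month / 48)
    rows.append([w * signal + random.gauss(0, 0.3) for w in loadings])

cov = covariance_matrix(rows)
pattern, variance = leading_component(cov)
total = sum(cov[j][j] for j in range(len(cov)))
print("leading pattern:", [round(x, 2) for x in pattern])
print("share of variance:", round(variance / total, 2))
```

The leading pattern recovers the see-saw between the two groups of stations without being told it exists, which is the sense in which the method "discovers" structure. Deciding what that structure means physically is still a job for a model.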

So, no, this does not invalidate the scientific method. In the Internet jargon, science scales.

Consequence of the Post-Modern Age by phobos13013 · 2008-06-25 04:17 · Score: 2, Informative

Anyone who has read any work by Lyotard, Baudrillard, or Derrida has seen this interpretation of reality coming for years. This is basically the consequence of the Post-modernist/Post-structuralist mentality.

In a sense, what the article is proposing is the "simulation" of reality in a computer system based on the available "data". This simulation, as I will suppose in a moment, is merely a flawed model, since the data being related must in some sense be based on an algorithm which inherently MIMICS reality and is not a substitution for it (no matter how "accurate" the agreement). But nonetheless, the result of this, as Baudrillard observed, is not a simulation but a simulacrum of reality, and eventually it will take the place of reality. The implication is that reality is not created or manufactured by the interaction of people in a "real" sense but is actually led by the operation of the simulacrum!

Nonetheless, the fact is there is no possible way to store ALL the data of the entire world (since some data is not recordable by a binary machine, and no, a "quantum" computer is not the solution that makes it possible); however, this fact does not mean we cannot be misled by the simulacrum and be led into a future where human interaction is what I would call inhuman, but which some who have (in some cases unknowingly) fallen for the post-modern myth would call merely an evolutionary result of human interaction.

In the future the storage of data, the usage of data, and the power of data will have a huge impact on our humanity, as the past twenty years should already be evidence of. I am not an apocalyptic fear-monger, but the proof is in the pudding. For further reading, I recommend a highly prescient book written in 1990 by a Mr. Mark Poster called The Mode of Information, which talks about some of these implications which are in the process of becoming as we speak.

--
...and it should be known by now

Re:Consequence of the Post-Modern Age by Anonymous Coward · 2008-06-25 05:03 · Score: 0

I was thinking Baudrillard and Post-Structuralism too.
Talk of causation and correlation is missing the point. Causation is not relevant, all this approach talks of is correlation. With no insight into Why.
Google doesn't understand French and German, but through correlation can map from one to the other. Because that's what all the data seems to do too. Dunno why, just follow the patterns.
In fact this is an argument about AI and understanding, Chinese room problem etc. There is so much data to infer relationships, don't need to understand, just follow the data.
Looking at genetics, are we not trying to reverse engineer the inferences that evolution has already done for us, by sifting through all those lives and generations?
If any of this makes sense mind....

To the contrary! by anmida · 2008-06-25 04:18 · Score: 1

The idea in the article is interesting, but I personally feel it's totally bogus. Yes, crunching data with mathematical formulas can help extract something useful, but...
Strictly speaking, isn't a mathematical formula a model? All of the theories (models) we use in materials science to explain things (quantum mechanics, stress/strain relations of materials, etc) are all mathematical. Qualitative understanding doesn't give you a numerical prediction.
Perhaps the above is a bit of a logical flaw, but you still need the maths to get information out of all the data. You need to know what to look for and make the necessary algorithm (low-level model?). AFTERWARDS, though, you need to understand that data. Otherwise, you have not done much to advance your understanding. I did RTFA, and the person mentioned who "discovered a new species" but doesn't know anything about it...neat. What, really, has he done? Just thrown out some meta-data for someone else to analyze, model, and study. Google is not the end of scientific method. To the contrary, I think it will only help.

Offtopic by Anonymous Coward · 2008-06-25 04:18 · Score: 0

.. but found this information on wikipedia's current events page: "The US state of Florida purchases a large dildo to add to the pleasure of the senate about their lands in the Everglades."

Paradigm agnostic by Anonymous Coward · 2008-06-25 04:18 · Score: 0, Funny

I firmly believe in paradigm. Call me what you will: "paradigm freek", "irrational", "stupid", etc...

Nothing will shake my faith in data or the paradigm. My faith has given me peace and happiness. I just hope you agnostics and paradigm atheists respect my beliefs and I'll respect yours.

Thank you and peace.

When people say shit like this... by iluvcapra · 2008-06-25 04:18 · Score: 3, Funny

Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?

It means there's about to be an explosion in models and theoretical sciences. Always beware the End of History ;)

--
Don't blame me, I voted for Baltar.

Houses will fall down, Tumors will go unchecked by tezza · 2008-06-25 04:20 · Score: 1

Here is some 'compelling' comment. Lots of very important things require the scientific method.

Obvious ones are medicines and housing materials.

Important ones are accurate global warming models and electric battery efficiency tests.

This Wired jerkoff and his band of know-little acolytes think that because they can accomplish everything in _their_ day without science, that it will die out.

This myopic self centredness would not have yielded them a clear signal on their iPhone. Science did that.


Re:Houses will fall down, Tumors will go unchecked by sm62704 · 2008-06-25 06:42 · Score: 1

This myopic self centredness would not have yielded them a clear signal on their iPhone. Science did that.
No, engineering did that. Science just explained how the engineers should go about it.

--
mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest

This is a lazy way to work by wcrowe · 2008-06-25 04:22 · Score: 1

I'm concerned that placing too much trust in such a model-less paradigm is dangerous. Why? There could be several reasons. Data can be artificially manipulated, for example. This can cause us to draw erroneous conclusions, and consequently, to make poor decisions. I would still want to know "why".

--
Proverbs 21:19

Very WTFey by fictionpuss · 2008-06-25 04:22 · Score: 1

Yes, knowing that there are lots of species of organisms in the air that we didn't know about before is sort of interesting I guess, but it doesn't really tell us anything useful.
WTF? It tells us that money spent on discovering those organisms will not be in vain -- that there is an area worthy of further investigation.

What is not useful about that?

Re:Very WTFey by eln · 2008-06-25 04:33 · Score: 1

Maybe, but the article also says we don't need the scientific method or any other "model" to interpret the data once we have it. Instead, we use some sort of ill-defined "Googlish" method to derive meaning from it.
Seems to me that no matter how much data you have, and no matter how efficiently you can search through it, you're still going to need some sort of model, and especially the scientific method, if you want to derive any useful science out of it. The article seems to be suggesting that Googling through the data is good enough to find all the answers you need.
Re:Very WTFey by ardle · 2008-06-25 18:38 · Score: 1

I completely agree: there are an infinity of possibilities out there and if there is not some kind of human intervention, we could be waiting an eternity for a meaningful theory, never mind a verifiable one (million monkeys, etc).
This means that the AI may have to be taught quite a lot about what constitutes meaningful relationships, with external feedback (most likely some kind of human intervention; we can't possibly pre-program all meaningful relationships. If we could, we couldn't - Gödel).

Fat data stores might not be the right data by postbigbang · 2008-06-25 04:22 · Score: 1

We agree.

Restated:

The information quality of data isn't implied by large amounts of it. Correlation (read petabytes of foo) != causation.

--
---- Teach Peace. It's Cheaper Than War.

"Intense" amounts of data? by Anonymous Coward · 2008-06-25 04:23 · Score: 0

"Intense" amounts of data?

WTF is an "intense" amount?

Quite... by denzacar · 2008-06-25 04:24 · Score: 3, Informative

I'm sure there are some kernels of insight buried in there someplace, but I'm just not clear what they are.
My thoughts exactly.
And since most slashdot readers don't RTFA, most comments here have proven useless in trying to figure out what those kernels you mention are.
But this guy, who has read TFA (and commented on it on the Wired's site) seems to have found them.

Posted by: technophile
20 hours ago · 1 Point
I think what you have hit on here is the difference between analytical and empirical solutions. Analytical relationships are usually first determined from empirical ones. Once you have the empirical relationships you can determine the missing factors or constants.
(See also http://en.wikipedia.org/wiki/Empirical_method )
They are both necessary and a part of the scientific process. You collect data, generate empirical equations, then try and derive or otherwise model the empirical relationship with an analytical one. Empirical relationships are limited because they are somewhat system dependent. For instance an empirical relationship for the ideal gas law could be generated using methane. This might be accurate for methane, but limited in its use for a gas that deviates from the ideal behavior (i.e. hydrogen fluoride gas). You could generate an empirical relationship for every single molecule in the universe but that would be impractical, which is why analytical relationships can often be more useful. Hopefully the "Petabyte Age" will allow the scientific method to flourish, not replace it.
edit: Rethinking my reply, what the article seems to say is that the Petabyte Age will make determining empirical relationships for everything practical. The scientist who generates loads of empirical relationships and never questions the underlying theory is not a scientist at all, just an observer of scientific processes. I suppose it depends on your goal as to whether this will suit you or not.
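To make the empirical-versus-analytical distinction concrete, here is a small sketch that fits an empirical constant k in P = k * (nT/V) from pressure readings. The readings are synthetic, generated from the ideal gas law with 1% noise (an assumption for illustration, not measurements of methane or any real gas), and fit_through_origin is a made-up helper name.

```python
import random

R = 8.314  # J/(mol*K), used only to synthesize the fake readings

def fit_through_origin(xs, ys):
    """Least-squares slope k for the empirical relation y = k * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

random.seed(1)
n, volume = 1.0, 0.025                            # mol, m^3
temperatures = [280 + 10 * i for i in range(10)]  # K
xs = [n * t / volume for t in temperatures]

# Synthetic pressure readings (Pa) with 1% multiplicative noise.
pressures = [R * x * random.gauss(1.0, 0.01) for x in xs]

k = fit_through_origin(xs, pressures)
print(f"empirical constant: {k:.3f} (value used to generate data: {R})")
```

The fitted k describes this data set well, but nothing in the fit says why the relationship holds or when it will break down for a gas that deviates from ideal behavior. That part is what the analytical model supplies.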

--
Mit der Dummheit kämpfen Götter selbst vergebens

It depends by geekoid · 2008-06-25 04:25 · Score: 2, Funny

Fighter classes generally stop at Con, whereas casters generally go for Int or Wis. No one cares about Cha.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Re:It depends by camperdave · 2008-06-25 04:51 · Score: 3, Funny

I don't know. They say you can get more with an 18 charisma and a sword than you can with a sword alone.

--
When our name is on the back of your car, we're behind you all the way!
Re:It depends by melikamp · 2008-06-25 04:58 · Score: 2, Funny

Never rolled a bard? Please turn in your geek card on your way out.
Re:It depends by Wandering+Wombat · 2008-06-25 05:06 · Score: 3, Funny

I failed my save and coughed coffee out my nose...

--
I like to place meaningful quotes in my sig, so people will know that I know what meaningful quotes are.
Re:It depends by Hijacked+Public · 2008-06-25 06:12 · Score: 3, Funny

I've rickrolled a guy who rolled a bard, does my card get stamped for that?

--
"Sacrifice for the good of The State" - The State

Foundation Series by __aagmrb7289 · 2008-06-25 04:27 · Score: 2, Interesting

Just finished rereading the Foundation series for the one millionth time. Anyone remember some of the signs of the decay of the first empire? The idea that these "scientists" were no longer experimenting, no longer looking for new ways to do things - just spending their time looking at old books and old experiments and trying to squeeze a "new" thought or two out of them? That a sociologist would study a society through books written about it? An archeologist would explore the ruins of a world by reading descriptions written by someone centuries before?

Anyone catching the parallels here? The "search engine" is a great tool for gathering existing data - but our current tools help us:

1. Analyze that data
2. Gather more data

Can you honestly say that those aren't important anymore? The summary seems pretty crazy to me.

good point of article ... by peter303 · 2008-06-25 04:29 · Score: 1

Even though it's a bit exuberant, as some techies are, the interesting point is that new generations of a "thousand" should give you new ways of looking at scientific problems. The claim it "ends science" is just a red herring to get you to think about the issue.

simulation != model by migloo · 2008-06-25 04:30 · Score: 1

Uh, hello... a simulation IS a model.

Simulation is a copy.
A model involves a shortcut.

There is a confusion here between technology, which indeed is taking the peta-brute-forcing route, and Science whose beauty precisely resides in its computational economy.

A perfect copy of a brain would be an engineering feat without providing any understanding of why it works.

Re:simulation != model by Lurker2288 · 2008-06-25 05:22 · Score: 1

I don't think you're all right or all wrong.

A simulation is certainly intended to be a copy of whatever it's simulating: Microsoft Flight Simulator is supposed to be just like flying a real plane, with all the bells and whistles and gauges. However, for the simulation to work, you have to build in a set of rules representing the original system: in this case, aerodynamics, avionics, things like that.

But given a simulation of certain complexity, there's no way we can possibly write a set of rules that perfectly describes the reality, because 1) we lack instruments sensitive enough to measure the necessary degree of detail, and 2) there are fundamental physical constraints on how much we can know about the universe (danke, Herr Heisenberg). So ultimately your simulation is based on a simplification of the genuine article: a model.

Could you have a simulation that's not based on some preexisting model, as you suggest in your post? I suppose, but without some understanding of the principles underlying the original, the best you could do would be to build a copy, and that wouldn't really increase your ability to generalize about possible outcomes.
Re:simulation != model by Anonymous Coward · 2008-06-25 05:36 · Score: 0

A perfect copy of a brain would be a brain. Not a simulation. Unless you call everything a simulation of itself. Which is fine with me.
There is a semantic difference between a model and a simulation. A model is an abstract or theoretical concept, while a simulation is a working implementation of a model. But you generally can't have a simulation without a model, so if something can be simulated then it can be modeled.
Re:simulation != model by backwardMechanic · 2008-06-25 07:24 · Score: 1

Indeed the art of simulation is often in choosing what not to include. If everything is included in the simulation, I might as well go and make real measurements. Missing out real-world imperfections can often make a problem easier to understand.

He's trying to sell you Enterprise Search by Anonymous Coward · 2008-06-25 04:31 · Score: 0

The root of all this is a pitch for Enterprise Search. I can't tell you how much more productive I am now that I can search all my email with Google Desktop (or whatever application anyone thinks is better.)

I am getting in screaming matches with my boss because management wants me to "moderate" all our 10 different corporate "portals", each of which has been created because some pissant minor manager didn't like the way the 9 other pissant managers were moderating their portals. Fuck that, the corporate intranet is a big pile of data, the tools exist for users to search it themselves, and I can do more interesting things than argue with users over the difference between "its" and "it's".

Google marketing by 12357bd · 2008-06-25 04:31 · Score: 1

is getting out of control.

--
What's in a sig?

Re:Google marketing by jopsen · 2008-06-25 10:42 · Score: 1

Yeah... You're probably right... :)

Though the article has a few good points I don't think statistics are going to replace good old thinking anytime soon...

Wired. by E-Sabbath · 2008-06-25 04:31 · Score: 2, Funny

You know, this may be the most pure Wired article I've read in a long time. Reminds me of the magazine's layout when it first came out. Complete bull, unreadable, unstructured, but slick.

Re:Wired. by Chapter80 · 2008-06-25 23:30 · Score: 1

yeah, and it got you to talk about it. Same as Star Magazine, National Enquirer, etc...

Interesting, but off target. by Techguy666 · 2008-06-25 04:36 · Score: 1

What Google has done is represent the world, mathematically, as it existed a moment ago. This is a massively impressive feat which we slashdotters don't give enough credit. On the other hand, I still say, "meh".

Think of it this way, Google has created a "model" world. I'm not thinking "model" in the scientific terms but "model" as in the Gundam robots with snap-tite (tm) parts. Instead of plastic bits, this model Earth is built with data. It's pretty to look at and has a lot of great details. It's a darned good-looking likeness of our world. (And no, I don't mean Google Earth, either.)

But it can't predict things like a scientific model.

One still needs the scientific model and hypothesis testing to make predictions and see what our world will be like in the future. This, in turn, also helps explain how or why things came before. The Google model just shows what currently is.

Let's keep the scientific method by 192939495969798999 · 2008-06-25 04:39 · Score: 1

After all, you never know when you need to prove that a supposed "piece of the cross" is in fact from the corner of some jerk's 20th century pine desk.

--
stuff |

I disagree by V!NCENT · 2008-06-25 04:42 · Score: 1

For the most complex 'stuff' there is almost always the most simple explanation. Although E=mc^2 is not entirely correct, you do know what I mean...

What the author of TFA basically is whining about is that everything is supposed to be too hard for them to find an explanation/solution for. So instead of doing the hard work they are just doing statistical analysis because it's much easier for them to do.

He is basically almost saying that we should be starting to believe in god because science (in his view, not mine) is crap the way it is today.

I call ThisFA a piece of crap.

--
Here be signatures

Clouds? by Anonymous Coward · 2008-06-25 04:46 · Score: 0

Damn, that's my problem, I thought it was all tubes!

I've never heard something so ridiculous by SageinaRage · 2008-06-25 04:47 · Score: 3, Insightful

Google used reams of data to get good at advertising and marketing, so Wired is using this ability to predict the end of SCIENCE?

Do they not realize the difference between these things? Advertising is extremely hand wavy and vague in the best of circumstances - I would argue that Google's offerings aren't really better than any other method, they're just cheaper for advertisers, and have a much larger base than normal.

I'm honestly astounded at this.

Science is still about understanding by rronda · 2008-06-25 04:48 · Score: 1

I have to admit that the article is thought provoking, and it might actually fulfill an educational purpose in a science or philosophy class. It can be given to students to criticize, and in that way be used as motivation for a discussion about how science can benefit from new technologies. The article is wrong basically because it confuses Science with something else, which could in the best case be called engineering. Science is about understanding, and we use data and models to iteratively refine our understanding. Neither the model nor the data, by themselves, is the science. More importantly, as other people have pointed out, there is no understanding embedded in a correlation, or in a collection of butterflies for that matter (even if each butterfly is given a pompous "scientific" name).

Not really Science then... by ZonkerWilliam · 2008-06-25 04:49 · Score: 1

If you're just data mining, you have to assume that the data is valid and not someone's scientific bias. That's a big assumption. There has to be some objective methodology to test claims and results; if not, then there are no checks and balances and we are reduced to "crack-pottery".

Actuarial data without predictive power by Anonymous Coward · 2008-06-25 04:52 · Score: 0

"Paradigm agnostic" is just another word for "Who the hell knows?"

Actuarial data is about history. *Theories* describe behaviors in the past AND the future.

Both are useful, but you shouldn't mistake one for the other, or be misled into thinking you can do without thinking (i.e. developing a theory).

I only skimmed it yesterday on Wired by jabjoe · 2008-06-25 04:52 · Score: 1

It seems to basically be advocating:

* get lots of data
* use computers to search for patterns in that data
* use those patterns

instead of thinking up models and testing them.
But don't you have to have some model to define the parameters of the data you are collecting? I'm not sure there is anything new here..... Isn't it just data mining?

Just what we need... by camperdave · 2008-06-25 04:53 · Score: 1

Until cells, molecules, atoms, and subatomic particles start publishing blogs...

Just what we need... quark twitters.

--
When our name is on the back of your car, we're behind you all the way!

Re:Just what we need... by Eli+Gottlieb · 2008-06-25 05:31 · Score: 3, Funny

TODAY: Feeling up.
Re:Just what we need... by bsDaemon · 2008-06-25 05:34 · Score: 1

Well, twitter users are mostly a bunch of strangelets anyway...

TFA is Fallacious Idiocy by Colonel+Korn · 2008-06-25 04:54 · Score: 1

1) Google doesn't link to much useful scientific data. There are other databases that predate Google that did and still do, but Google isn't a very useful tool for data collection, so it shouldn't be in the discussion.

2) For about a century, all funded scientists have had pretty good access to all publicly released data in their fields and could rather efficiently sort through it due to good hierarchical organization. Nothing has changed about that other than speeding up the process to some extent using search (Again, not using Google. Their one foray into academic search is so far not useful for people who have funding and thus access to much more advanced and complete tools), but people who are really involved with their field of study already knew where to look, so the search is mostly a benefit to the undergraduate paper-writer.

3) The way that "paradigm" affects data has not remotely been related to which data sets people choose to examine in well run science. The paradigm affects what data is collected (which is what real data-driven scientists do), and the "Petabyte Age" has no effect on the efficiency of data collection.

The author is a massive failure.

--
"I zero-index my hamsters" - Willtor (147206)

Re:TFA is Fallacious Idiocy by Daniel+Dvorkin · 2008-06-25 05:14 · Score: 1

Again, not using Google. Their one foray into academic search is so far not useful for people who have funding and thus access to much more advanced and complete tools
That's not quite fair; Google Scholar does return a lot of crap, but mixed in with the crap is some useful stuff. A big part of its usefulness is that it doesn't restrict itself to a specific field -- if you know you're looking for a paper in a particular field, you're right that there is probably a domain-specific database that will give better results, but surprisingly useful results often turn up outside the field.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.

Data is not paradigm agnostic. by Thyrsus · 2008-06-25 04:58 · Score: 1

One must already have a concept about what is measurable, what to measure, and how to measure it before data can be collected - thus the data collected already assumes a paradigm. Healthy humans start out with a number of measurement tools (senses) together with evolved modeling abilities (e.g., language proficiency) and refine their models. Where there is no innate model, mathematics, followed by the scientific method, appear to be the most reliable means of refinement. The innate models are likely reliable due to such reliability having an evolutionary advantage. Such models are nonetheless subject to improvement by math and science. The Wired article fails to appreciate the amount of data humans successfully model innately - petabytes don't even begin - and the amplification through scientific models expands that by many orders of magnitude.

Re:Data is not paradigm agnostic. by spun · 2008-06-25 05:51 · Score: 3, Funny

One must already have a concept about what is measurable, what to measure, and how to measure it before data can be collected
I think the point the article was trying to make is that this idea is now wrong. There are petabytes of already collected data out there. You don't need to have any idea what is measurable, what to measure or how to measure it. You just throw statistical tools at those petabytes of raw data. You don't even need a model. Then magically we find out that vegetarians who wear blue pants on Tuesday and were born in November are more likely to get cancer and should get checked regularly, or something. We don't even need to know why, it's not important. Or at least I think that's what the article was trying to say.

--
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
Re:Data is not paradigm agnostic. by sarkeizen · 2008-06-25 08:33 · Score: 1

Meh.
This might help some kinds of science. I.e. meta-analysis on drug interactions. Assuming the data is being collected (hopefully). Available (it isn't). Normalized (nope). Can be set to a useful set of criteria - such as hospitalizations incurred within X months of prescriptions being written for drug Y or drug Z vs drug Y & Z. (debatable based on the drug/condition in question).
But without introducing any other new theory (the article implies we don't need any). How does this help me release a NEW drug without doing Phase III clinical trials? (Which are part of the scientific method IMHO).
Re:Data is not paradigm agnostic. by spun · 2008-06-25 08:58 · Score: 1

It doesn't help you at all. The author of that article was attempting to make himself look smart and revolutionary. I was just trying to explain what he was saying so people like you could point out what was ACTUALLY stupid about the article, rather than what was stupid about the imaginary article in their heads.

--
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
Re:Data is not paradigm agnostic. by shellbeach · 2008-06-25 10:50 · Score: 1
You don't even need a model. Then magically we find out that vegetarians who wear blue pants on Tuesday and were born in November are more likely to get cancer and should get checked regularly, or something. We don't even need to know why, it's not important.
But this isn't the end of the scientific method, it's the observation at the start, which is what the author of the article has failed to grasp.
You see, all science starts with an observation of something interesting -- in your hypothetical case, that vegetarians wearing blue pants born in November are more likely to get cancer.
As you say, that's useful in itself, but only to a degree -- it doesn't allow you to predict whether this strange phenomenon is related to other forms of cancer aetiology. But then someone thinks about it, and comes up with a hypothesis to test -- for example:
- that the blue dye in the pants is absorbed by the body and triggers oncogenesis
- that vegetarians are unusually susceptible because their low protein diet activates a different metabolic gene expression pathway, which is disrupted by the blue dye, and
- that being born in November (in the Northern hemisphere) meant conception during the summer, when protein consumption is high, thus it is linked to an epigenetic change in these same metabolic genes that again increases susceptibility to the effects of the blue dye.
These hypotheses are then tested via the scientific method, and any knowledge gained not only helps to prevent this particular type of cancer, but also helps to understand and potentially prevent or cure other related forms of cancer, as well as metabolic disorders.
In other words, having masses of statistical data is fantastic, but it's the start of the process, not the end of it. All this article proves is that its author doesn't understand the scientific method at all (and that he clearly has never applied for grant funding!)
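To make the blue-pants example concrete, here is a minimal sketch of what "just throw statistical tools at the data" tends to produce. Everything in it is an assumption made for illustration: the traits and the outcome are pure random noise, and the function names, sample sizes and the 0.2 threshold are hypothetical. With enough unrelated traits, some of them clear the threshold by chance alone, which is why such a hit is only an observation to be tested.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def dredge(n_people=100, n_traits=1000, threshold=0.2, seed=0):
    """Correlate many pure-noise traits against a pure-noise outcome.

    Returns (trait_index, r) for every trait whose |r| reaches the threshold,
    even though no trait has any real relationship to the outcome.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = []
    for trait_index in range(n_traits):
        trait = [rng.gauss(0, 1) for _ in range(n_people)]
        r = pearson(trait, outcome)
        if abs(r) >= threshold:
            hits.append((trait_index, r))
    return hits


hits = dredge()
print(len(hits), "of 1000 noise traits 'correlate' with the outcome")
```

Each of those hits would look like a finding if it were the only one reported. Telling a real effect from a chance one takes a hypothesis and a fresh test.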
Re:Data is not paradigm agnostic. by sarkeizen · 2008-06-25 13:02 · Score: 1

Sorry I wasn't criticizing you. I think my comment started out as a comment on yours but then it kind of mutated into something independent.
Another thing that crossed my mind is the implication that the distinction between correlation and causation doesn't matter. Which is clearly weird. I mean sure, in very specific senses it might not matter why people behave one way or why language evolves in one way but clearly there are places where the reasoning always flows the other way. Again in medicine, it actually matters WHY people are dying not just that they are. :-)
Re:Data is not paradigm agnostic. by spun · 2008-06-26 03:50 · Score: 1

Dude, way to run with that example! And I agree with you, the article is poorly thought out. The scientific method is in no danger.

--
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
Re:Data is not paradigm agnostic. by Thyrsus · 2008-06-26 04:26 · Score: 1

I think we're in agreement. My point was that, when you're using someone else's data, you've already accepted the implicit ideas about what is/what to/how to measure. As shellbeach pointed out, any significant new insight (the hypothesis in the scientific method) is going to change or refine those premises and motivate new measurements.
Re:Data is not paradigm agnostic. by shellbeach · 2008-06-26 11:47 · Score: 1

Hey, I'm still in grant-writing mode. Spinning this sort of bullshit to get funding happens in my sleep ... :)

Semantics by coldsalmon · 2008-06-25 05:00 · Score: 1

It seems like the author is using the word "model" to mean "The structural design of a complex system," rather than "A simplified representation (usually mathematical) used to explain the workings of a real world system or event." What he's attempting to say is that mathematical models (what he refers to as not-models) are all we need, and experiments are useless because we only need to know about correlation and not about causation. This is phenomenally idiotic, and assumes that "causation" is some Platonic, mystical idea of underlying truth rather than simply correlation + time. Once you get past his complete ignorance of the fundamental terms he's using to build his argument, he's saying that science can be done better with big computers and lots of data. Thank you for that fabulous insight. Alert the internet!

Can someone please clear this up? by Dripdry · 2008-06-25 05:03 · Score: 1

This has to do with a subject brought up in the article, this "new science".

I got into an argument with a friend over some content this article happens to contain. He said that, in fact, environment can influence inherited genes. Epigenetics and such. My argument was that if a parent is a marathon runner, for instance, his son might be prone to doing that simply by being brought up around that sort of thing, rather than having the son be a good runner because the father (or more likely the mother) did that.

He ranted and fumed that his teachers at school were right, and I asked if we could have a rational discussion on the subject, assuming him to be completely off his rocker. Another friend of ours also backed me up, saying he did not believe that environment could have an effect on inherited genes in that manner. We asked him how these traits were passed on, and he had no answer, just anger and frustration.

What is the skinny on this "new biology"? Can someone here please enlighten me? I feel ignorant here and wouldn't mind some information to sort things out a bit. Is this stuff proven, or just more hypothesizing?

Thanks!

--
-

Re:Can someone please clear this up? by ZonkerWilliam · 2008-06-25 05:25 · Score: 1

It's kinda fuzzy, sort of, not really. Take your example, that a son may become a good runner because one of the parents did it and he grew up around it. It may be true that he will lean towards actually becoming a runner. Physically, though, if the parents passed down genes that gave him better lung capacity, or a more efficient way to metabolize oxygen, or muscles that could retain and/or utilize carbohydrates more effectively, etc., that would give him an advantage and make him a better runner.

The parents gave him the idea and motivation to run; the genes determine how well he will do when he does run.

Kinda sorta.
Re:Can someone please clear this up? by Dripdry · 2008-06-25 09:50 · Score: 1

The question is whether someone who does not have any particular traits advantageous to running, but who does run, will have offspring who may have traits favorable to running because their father or mother ran. Basically, evolution by environment.

Wikipedia seems to indicate "sort of but not really", at least as seen in humans, though there appear to be some changes with grandchildren. Is this "sort of" what you meant? http://en.wikipedia.org/wiki/Epigenetics

--
-

No scientific method? by Anonymous Coward · 2008-06-25 05:04 · Score: 0

Google is at NASA? Just do me a favor and keep them out of the engineering building. Limit Google to the library where they can manage vast quantities of data using the Dewey Decimal System.

"Paradigm agnostic" by gatkinso · 2008-06-25 05:05 · Score: 2, Insightful

An unknowable paradigm? Interesting.

--
I am very small, utmostly microscopic.

Oh for God's sake... by exp(pi*sqrt(163)) · 2008-06-25 05:09 · Score: 1

...it's a Wired article. What more could there possibly be to discuss?

--
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.

All I Want to Know by immcintosh · 2008-06-25 05:10 · Score: 1

All I want to know is this: how much did Google pay for this article?

Bless of dimensionality by Iluvatar · 2008-06-25 05:12 · Score: 1

Chris Anderson may not be an expert and he seems to equate theory with the partly dated Laplacian concepts. I only partly agree: not even close to the end of science or theory, but perhaps a slight shift in focus, towards models that are more statistical in nature. Fundamentals remain the same, though, as far as I can tell.

<shameless plug>
http://www.bitquill.net/ post
</shameless plug>

I'm guessing ... by Bearpaw · 2008-06-25 05:16 · Score: 1

I'm guessing that at some point Chris Anderson will say that he wasn't trying to make a serious claim in the article, he was just trying to "stimulate discussion about the changing realities of scientific investigation" or some damn thing.

Models let people actually get things done. by Anonymous Coward · 2008-06-25 05:17 · Score: 0

Microsoft tried vast clouds of information and strictly applied maths when they made their own video ASIC for the XBOX 360.

Their vast clouds of information and strictly applied maths still did not encompass knowledge of the future, or of people's previously overcome, practically corrected mistakes.

Thus, the Red Ring of Death.

Models mean one need not re-invent the wheel for applied sciences.

Try this one on for size by hey! · 2008-06-25 05:19 · Score: 1

We all know that young, inexperienced workers are often more creative than much older ones -- at least in very hard fields. For example in mathematics, it is the young who do most of the innovating. There may be a physical aspect to this, as the brain becomes less neurally flexible, but imagine for a moment. What if part of it may be that creativity is bound up with the struggle to expand one's mental horizons? What if once sufficiently expanded, consistency with the known territory becomes a constraint on finding new territory?

Now, instead of an individual, consider a society as a whole. What happens when the volume of knowledge of society grows to a point where the marginal value of exploring the known exceeds the marginal value of extending it? Specifically, I am thinking of an example. In the late twentieth century, people began to think much more about exploring the interdisciplinary topics. They began to think about questions like, does chemical engineering have applications to the design of cancer drugs?

I don't think science is in any danger of becoming less dynamic, because this is really just another frontier. The connections between the pieces of knowledge we already have are a very productive place to explore. Given that more scientists are alive and working today than ever, there is plenty of old school and interdisciplinary work to go around. The demands of publication will make use of all the individual scientific creativity the world can supply.

Where we are in danger of a kind of societal senility is in political and ethical thought.

The Internet (along with other things) makes us stupider in these areas than ever, because personal progress in them depends on setting out to confirm what you already know, but failing. You can hate Jews, but if you have to live with them, work with them, and interact with them that hatred is going to be strained. The same goes for any group. You may hate liberals or conservatives, but in a pre-Internet environment you had to accommodate those viewpoints in your mental landscape.

But it feels so much nicer to get a pat on the back than a stick in the eye.

Yes, we have never as individuals had as much access to dissenting views, thanks to the Internet. But we've never had as much access to people who think just the way we do. Thanks to Internet search engine technology, it is now possible, as never before, to spend every waking moment confirming our own preconceptions, whether that is anti-semitism, ultra-right wing politics, socialism, religious fundamentalism, or whatever ism you can imagine. Rather than confronting the information that challenges our pet conspiracy theory, we can burrow into a comfortable sub-world where our opinions and beliefs are nearly always ratified.

The dynamism of how we see the world of affairs is what is in danger of being lost, because it has never been easier to be intellectually lazy. There are more opportunities to expand our horizons than ever, but along with that comes greater convenience in choosing the same thing, over and over. There may be more kinds of vegetables in the supermarket than our grandparents could name, but it's sooo easy to eat at McDonalds every day.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.

Two words: selection bias by LrdDimwit · 2008-06-25 05:23 · Score: 2, Interesting

You can't get good data unless you control the makeup of your data population. Even if you applied this technique to all the data in the cloud, it wouldn't mean the "end of the scientific method", it would be scientifically studying the cloud.

So no. Even if everything he wrote is all true, you still apply science to study things, just in a different way. The internet doesn't make science obsolete any more than it made economics obsolete, and saying otherwise is as much hubris now as it was then.
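A minimal sketch of the selection-bias point, under loudly stated assumptions: the "population" is simulated, each person has one number (call it an opinion score), and the chance of that person's data ending up "in the cloud" rises with the score. The logistic inclusion rule and all names here are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate(n_population=20000, seed=0):
    """Compare the true population mean with the mean of a self-selected sample.

    Assumption: the probability of being included grows with the value itself,
    so the sample is not a random draw from the population.
    """
    rng = random.Random(seed)
    population = [rng.gauss(0, 1) for _ in range(n_population)]
    sample = []
    for value in population:
        inclusion_probability = 1.0 / (1.0 + math.exp(-2.0 * value))
        if rng.random() < inclusion_probability:
            sample.append(value)
    population_mean = sum(population) / len(population)
    sample_mean = sum(sample) / len(sample)
    return population_mean, sample_mean, len(sample)


population_mean, sample_mean, sample_size = simulate()
print("population mean:", round(population_mean, 3))
print("self-selected sample mean:", round(sample_mean, 3), "from", sample_size, "records")
```

The sample is large, yet its mean sits well away from the population mean. More data of the same kind does not fix that. Knowing how the data got collected does.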

Predictive power? by Gryphia · 2008-06-25 05:24 · Score: 2, Insightful

It seems rather stupid to me. Sure, we can correlate a whole bunch of data. And we can collect a whole bunch of data. But that's not going to give us the predictive power that scientific models give us.

Take for example, the orbit of the earth around the sun. Suppose we collected a whole bunch of data on the orbit of the earth around the sun. Sure, we'd be able to predict what the orbit is going to be, based on past data. But it gives us no other insight. Whereas, when we use the theory of gravity (and rotational motion and conservation of angular momentum etc . . .) to predict that the earth orbits the sun, and how it does so, that gives us insight.

Because we can now turn to, say, Jupiter and the sun. Even if there is no data collected on how Jupiter orbits the sun, we can use the predictive power of our theories, that we have tested on the earth-sun system, to say how Jupiter is going to orbit.

That's a simple example, but you can imagine much more complicated situations. If we simply have correlation, we may be able to say that X is going to do Y based on previous behavior, but if I ask you how something new and unexpected is going to behave, we can get no answer until we take data . . . because we don't know *why* anything happens. And that's why we're never going to replace theories with statistical analysis of data.

There's a place for both. Obviously, just statistics can be very successful (google, for example), but, at least in science, it's not sufficient.
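As a rough worked example of that predictive power, here is a sketch using Kepler's third law, which follows from Newtonian gravity for a planet much lighter than the sun: the orbital period in years is the semi-major axis in astronomical units raised to the power 3/2. The law is calibrated on the earth-sun system alone (1 AU, 1 year). The figure of about 5.2 AU for Jupiter's semi-major axis is a value supplied here for illustration, not something from the post, and the function name is hypothetical.

```python
def orbital_period_years(semi_major_axis_au):
    """Kepler's third law in earth units: T [years] = a [AU] ** 1.5.

    Assumes the planet's mass is negligible next to the sun's.
    """
    return semi_major_axis_au ** 1.5


EARTH_AU = 1.0
JUPITER_AU = 5.204  # assumed approximate semi-major axis of Jupiter's orbit

print("Earth:", orbital_period_years(EARTH_AU), "years")
print("Jupiter:", orbital_period_years(JUPITER_AU), "years")
```

The Jupiter figure comes out at roughly 11.9 years without a single observation of Jupiter's motion going into the calculation, only its distance. A table of past earth positions, however long, gives no such number.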

Re:Predictive power? by netpolyglot · 2008-06-25 09:35 · Score: 1

Gryphia,
Your explanation of the difference between prediction based on statistics and on cause seems very clear to me.
However, I will disagree on the fact that Google is mainly based on statistics, even if it is what Google claims.
If you have some time please have a look at my post on the subject:
http://science.slashdot.org/comments.pl?sid=594853&cid=23940075
I hope my post will be as clear to you that yours has been to me.
Regards,
Netpolyglot

Google is so overrated by FAEK · 2008-06-25 05:25 · Score: 0

This article tells us just how much Google is an evil corporation.

Scientific Method trumps Singularity every day by rholland356 · 2008-06-25 05:33 · Score: 1

These singularity flaks are at it again. C'mon, it's a not-so-pleasant sci-fi concept and nothing more.

Humans transcending into bodiless personalities floating in a data cloud unbounded by time and space? Humans becoming godlike, yet keeping their innate lust?

Oh, gimme a break! When they turn off the switch it all goes black and there is nothing more. Without the scientific method, how do you tell a petabyte of junk data from a petabyte of shinola?

Interesting Paradigm by Anonymous Coward · 2008-06-25 05:37 · Score: 0

The idea of skipping models is by itself quite interesting. The scientific method comes in because it tries to generalize based on observations. We need generalizations or models since we cannot capture the data we need. The article wants to say that if we have enough data about a given phenomenon, we no longer need to simplify reality using a model. We can instead just use the facts as they are in reality. Models come at the cost of simplifying assumptions, which make them, according to the infamous Gödel Incompleteness Theorem, paradoxical by nature. Another thing is that models are usually in continuous form. The reason is that we, humans, cannot capture an infinitely discrete system. So, models serve to help understand things without the need for capturing all the data of the phenomenon in mind. This might not be needed in the future if computers can take care of storing enough data for the given phenomenon. So, I think the idea of skipping models might be realistic at some point in the future.

What a shameless self-promotion by synthespian · 2008-06-25 05:39 · Score: 1

It's time to ask: What can science learn from Google?

This Wired editor guy is clueless.

People in bioinformatics and computational biology have been saying these things now for well over a decade (at least). It's not about Google. It's not about Craig Venter. What a fucking shame for Wired to have such a pop-science fanboy for editor.

How mainstream and behind the curve Slashdot and Wired have become...

--
Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts

Keyword Collecting, not Searching by ProppaT · 2008-06-25 05:41 · Score: 1

I think the main issue (and I'll pre-emptively say that, no, I didn't RTFA...I'm at work, what do you want from me) is exactly what is mis-stated. Our issue is with how we collect data, not how we search it.

Waaaay back when XML was being formed from SGML, one of the big deals was that, one day (in a perfect world) we would be able to put all of our information in a giant database and be able to meta sort it to such a degree that finding a needle in the proverbial haystack would be a non-issue. One of the problems with how we collect data now is that there's no structure or form to the data. It's raw in the rawest sense.

We're human and, thus, we're concerned with how information is visually presented, not how information is stored. That will be the next great stepping stone. In theory, searching mechanisms should be exactly that, simple and mechanical in nature. With our ever growing storage ability, we should spend the time tagging our data as it's created and dealing with a database of information twice the size as what we would otherwise have, not spending our time coming up with more and more complex search tools.

Now that information "tags" are mainstream, I think we're finally on the verge of mainstream society understanding the importance of data structuring. It's also our next stepping stone to creating the semantic web. Spend money on semantics research, not on beating a dead horse.

--
Wise men say, "Forgiveness is divine, but never pay full price for late pizza."

contradiction proves all by BorgCopyeditor · 2008-06-25 05:41 · Score: 1

Ah, but...

Googling "less is better" returns 138,000 results, while googling "less is worse" returns only 5,300 results. But since less, as is commonly acknowledged, is in fact more, we must conclude that worse is better, which is absurd. Therefore, Google is not google, insects are evil thoughts, and burning Sappho loved and sang and stroked the wine-dark sea, in the temple by the moonlight, wa da doo dah. Q.E.D.

--
Shop as usual. And avoid panic buying.

This is called Astrology by Anonymous Coward · 2008-06-25 05:59 · Score: 0

Astrology has been alive and well for millennia. Using the vastness of data from our own galaxy in order to attempt to predict similar patterns in human beings and more.

So google has a big dataset, whooptydo. Is this article suggesting that we add a new pseudo-science of internet data set movement a-la astrology? I hope not.

I can see predictions now: 'When the MS dataset eclipses the SUN dataset, the age of JAVA will be over.'

Misconception by Anonymous Coward · 2008-06-25 05:59 · Score: 0

I have a feeling most people on here aren't really understanding the article. Some are, and are simply disagreeing with it (of which I would be one - hopefully), but I do not feel that is the majority. The article is recommending a totally empiricist approach to dealing with data. His point is that as we try to make a 'law' of some physical phenomenon, we are only limiting our knowledge, as we will continue to only examine data within the confines of this law, whether that includes supporting or detracting information.
For an example, consider Einstein's Special Relativity: it addressed the universal speed of light while also dealing with relative notions of motion. Einstein connected the dots by ignoring the apparent discrepancies between Galileo's principle of relativity (a model) and the empirically determined speed of light; by removing the model, he found an answer. At the time, scientists were convinced that there must have been some error in their measuring devices in order to account for the absolute speed of light through the ether.
While even SR is a model, I think the author would be recommending post-hoc analysis for 'models' though, rather than models that intend to predict.

Editors Should Edit, Scientists Should Science by DynaSoar · 2008-06-25 06:00 · Score: 1

There are as many cells in the brain as there are stars in the galaxy (roughly, 100 billion). There are around 100,000 receptors on each neuron, most from different cells, making the brain so interconnected that nothing is ever more than the famed "6 degrees of separation" from anything else, the average being 3. There are dozens of neurotransmitters involved in intercellular communication and even more neuromodulators within the cells to handle processing. There is also indirect electrical communication between neurons where their overlapping dendritic trees detect the electrical fields of nearby but non-connected cells, and processing depends on those too. All these processes are interdependent -- virtually none of them happen without affecting, and being affected by, the others. The numbers, amounts and even fractional dimensions of the interactions' variables involved are just one of the fields that spawned experimental mathematics, a field that even mathematicians have trouble wrapping their heads around. That's enough for the purpose of my conclusion, though I could go on.

This is the subject matter of my profession. I have no problem dealing with petabytes of data because I know what to look for. Nobody collects that kind of data, throws it in a pile and stares at it. We collect that much because we know that's how much we need and know how to find our answers in it. Just because some places (there have been many of them for years) let you throw your raw petabytes into a pile doesn't mean science itself changes. Anyone who tries to data mine petabytes without being intimately involved in the collection might as well start at any random place on earth and start mining for gold.

So, Chris Anderson can peta-bite my ass. Science survived a far more revolutionary fundamental change than this and did quite well since. That invention was the zero. It will always be more important than any order-of-magnitude milestone.

Ask a scientist that uses that much data if there's anything significant about it, leave science writing to them and to the decent science writers (like Alan Boyle), and keep the editors busy at their desks editing. If the editors are not so busy that they can waste time making up this pseudo-FUD out of barely understood buzzphrases, it's time to trim the budget at Wired.

--
"I may be synthetic, but I'm not stupid." -- Bishop 341-B

Modeling is the core of science by Hoplite3 · 2008-06-25 06:03 · Score: 3, Interesting

I must admit, as an applied mathematician who makes models of physical things for a living, this sort of research threatens to steal my bread and butter. It may be self-centered, but I think modeling is, beside experiment, half of science.

Simplified models are so valuable to our understanding because they tell us what information we can remove, which parts of a problem are important and which parts may be ignored. They allow us to not just make predictions, but they guide future experimentalists as to what sorts of changes will impact the system and which won't.

To be fair, it's more of a cycle: experiments generate data, models are constructed to explain the data. These models make predictions (and hopefully useful simplifications) that can be checked by further experiments to validate them. At the end of the process, we've produced a clearer picture of how a system works. Enough information maybe for someone building something slightly different to not have to test the aspects covered by the model.

I view these data-mining techniques like the scientific computing techniques of the last 30 years or so, only the inverse. Sci Comp nerds wanted to do away with experiments. They thought they could numerically simulate (relatively) exact models (like Navier-Stokes for fluid motion rather than one of its more tractable, understandable simplifications) and use the generated data instead of experimental data. The trouble was that no one will believe that the crazy new phenomenon discovered by your program is real until they see it in the lab, until they construct a simplified model that has the same behavior -- i.e. the same science as before.

The new data-mining idea is the same, but for the modeling end of things. "No models, please," they say. They'll just data-mine the experimental results and "discover" whatever the model missed. Except people will want to do experiments to verify the discovery. They'll want to build models so they can know they're doing the right experiments, and so on.

At the end, I think Sci Comp and data-mining are fantastic new tools that have a lot to offer science, but I don't think either eliminates the need for old fashioned modeling.

--
Use the Firehose to mod down Second Life stories!

The Strong AI - Singularity Nerds are Back by TheNarrator · 2008-06-25 06:07 · Score: 1

This article felt like reading the numerous articles over the years about the imminent arrival of Strong AI written by people whose primary understanding of computer science comes from watching sci-fi movies. Now though the Strong AI god has been replaced by Google and the followers claim the Strong AI messiah is already here. Unfortunately for them they are mistaking mere information for knowledge and understanding.

Scientific method dead? by Mesa+MIke · 2008-06-25 06:08 · Score: 1

I think not.

Maybe, though, it is the end of the line for heuristic strategies for deciding which data to collect.

Beautiful by thegameiam · 2008-06-25 06:09 · Score: 1

I used to think that I could translate most dialects of bullshit into english but this threw me off guard. The most reasonable explanation is that Chris Anderson is a tool and doesn't know what he is talking about.

you, sir, are the wind beneath my wings.

--
Need Geek Rock? Try The Franchise!

Comment removed by account_deleted · 2008-06-25 06:10 · Score: 2, Insightful

Comment removed based on user account deletion

Slashdot Begat the End Of Reading Comprehension? by Bartold · 2008-06-25 06:13 · Score: 1

Ok, for you nitwits that have less than a 7th grade level of reading comprehension, or the inability to focus for more than two paragraphs, here is the summary of the article you might be able to grasp: The author is saying we are approaching the point where we can use raw computational power to mine enormous amounts of data for answers to increasingly complex questions. We can use this instead of the scientific method, because in the end, they produce the same thing: an approximation. The author then cites examples where this process was applied with degrees of success (e.g., Google searches).

This is kind of like by Mesa+MIke · 2008-06-25 06:17 · Score: 1

.. how I changed my photography method when I got a digital camera.

Now, instead of carefully considering the composition and setting of some subject so that I don't waste precious film, I can just take eleventy-nine indiscriminate snapshots, and hope that one of 'em is decent.

Analysis, someone got excited. by freenix · 2008-06-25 06:33 · Score: 0

Correlation testing is hypothesis testing. The statement "X and Y are related" is a hypothesis and the test is statistics. Simple models are still useful for humans.
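A minimal sketch of that statement in code: "X and Y are related" is the hypothesis, and a permutation test is one way to test it. The choice of a permutation test, the function names and the toy data are all assumptions made here for illustration; nothing in the article prescribes them.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate how often shuffled data matches or beats the observed |r|.

    Null hypothesis: X and Y are unrelated, so shuffling Y changes nothing.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy data: Y really does depend on X, plus some noise.
noise = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [x + noise.gauss(0, 1) for x in xs]
p_value = permutation_p_value(xs, ys)
print("p-value for 'X and Y are unrelated':", p_value)
```

A small p-value rejects "unrelated". It says nothing about why the two are related, which is where the simple models come back in.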

Re:Analysis, someone got excited. by Anonymous Coward · 2008-06-25 06:52 · Score: 1, Informative

Posting to undo accidental positive moderation. I didn't realize "freenix" is one of twitter's accounts.
Thanks for making me waste a mod point.

Car analogy by Anonymous Coward · 2008-06-25 06:48 · Score: 0

You're driving your car down the street in Paris and then suddenly in LA Jason's brakes fail at the same time someone is driving a new Tata made car in India and the person who is driving a black car just north of Buckingham Palace turns left and a green thing suddenly appears then you decide to buy some petrol and booom it's all true and then the guy in LA slows down using his hand brake and and and ...

The real problem... by jesterpilot · 2008-06-25 06:52 · Score: 1

is not correlation!=causation. The problem is collecting the data. Without insight, we will never be able to collect meaningful information. Any statistical method relies on data collected by people who knew what they were measuring.

--
Trust me, I work for the government.

As a scientist... by Amisinthe · 2008-06-25 07:03 · Score: 1

Let me just say that the scientific method is alive and well, and in none of the peer-reviewed journals I've had material published in, do they take "I Googled it" to be valid.

What does it say about Wired by treeves · 2008-06-25 07:20 · Score: 1

that the author of TFA, Chris Anderson, is the editor in chief of Wired?

--
...the future crusty old bastards are already drinking the Kool-Aid.

T.S. Eliot and Frank Zappa weigh in by markjhood2003 · 2008-06-25 07:34 · Score: 1

Where is the Life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? T.S. Eliot, "The Rock", 1934

Information is not knowledge, Knowledge is not wisdom, Wisdom is not truth, Truth is not beauty, Beauty is not love, Love is not music, and Music is THE BEST. Frank Zappa, 1979

After reading the summary in my RSS reader... by slashgrim · 2008-06-25 07:44 · Score: 1

I anticipated logging into Slashdot expecting an educated philosophical discourse and instead just read these rants.

I am going to go read some mature comments on Digg

/sarcasm...especially the last sentence

Excuse me... by Anonymous Coward · 2008-06-25 07:59 · Score: 0

"Excuse me, but 'proactive' and 'paradigm'? Aren't those just buzzwords that dumb people use to sound important? Not that I'm accusing you of anything like that... I'm fired aren't I?"

Googled by SolarStorm · 2008-06-25 08:02 · Score: 1

I googled this and found 187,003 entries. Picked 5 of them, and found out that this really isn't a problem.

I'm napping until the exabyte age - Rip Van Winkle by peter303 · 2008-06-25 08:04 · Score: 1

Too many goofy pundits in the petabyte age.

Realpolitik, the word we're searching for? by netpolyglot · 2008-06-25 08:11 · Score: 1

I've read most of your posts, and it seems that my approach is quite different, mostly focused on what the article says about Google's fundamental method, alias PageRank.

If Craig Venter's sequencing of the genome revealed that environment can heavily influence inheritable genetic traits, this influence still has to exist and be found in the environment.
Google is dominating the Internet's environment, which happens to be presently one of the dominant environments -or contexts, at least- in which human beings develop relationships and communicate.
What is said, in this article, to be Google's philosophy, 'we don't know why this page is better than that one...', is actually Google's declared or confessed philosophy.

Contrary to Google's pretended ignorance (or maybe there is some gap in my knowledge of Klingon), the company's success rests fundamentally on the analysis of hyperlinks as a sign of relationship and hierarchy between sites, then between communities of Internet users, then between social groups in a human environment. Google understood that hyperlinks are an unambiguous, mechanically exploitable expression of socially determined human relationships. So it is absolutely wrong to say that Google requires "No ... semantic analysis". To work, it absolutely does require semantic analysis, and above all semantic analysis mixed with social analysis based on hyperlink observation.

Google does not know, indeed, whether this page is better than that one, and that is precisely what explains its success. It simply understood that humans, who consciously make meaningful hyperlinks between webpages, know better than Google ever will which pages are the best. Google made a brilliant bet on that human knowledge and speculated heavily on it as its foundation for providing quality, instead of spending time and power randomly calculating numeric relationships between words and text elements, as Altavista did in vain.
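
To make the hyperlinks-as-votes idea concrete, here is a minimal sketch of the published PageRank power iteration in Python. It is an illustration only, not Google's production ranking: the four-page link graph, the `pagerank` function name, the damping factor of 0.85 and the iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from who-links-to-whom alone (hypothetical helper).

    `links` maps each page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Made-up graph: three authors chose to link to "c"; "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sketch page "c" comes out on top purely because three human-authored links point at it; the loop never looks at what any page says.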

As regards the term 'raw data' mentioned in the article, everyone will agree that hyperlinks have nothing to do with raw, undetermined data. On the contrary: nobody unconsciously inserts a hyperlink tag on a webpage pointing to another webpage, whereas many of the words we use can always be interpreted in many ways, including meanings not intended (heard of a slip of the tongue?). Hyperlinks are definitely pre-structured data.

Google is not a method for finding truth; indeed, none is needed, since humans are much better at filling Google with their truths... and beliefs! Google provides a method, as servile as it is brilliant and acute, to reproduce and certainly worsen, through ranking, the relationships of power just as they already exist and are inherited in real human societies. Google claims that it just devours the relationships between us to rank us, and spits out the (our) truth.

Google owns the Internet, OK, but human social reproduction is still what determines the Internet. Google knows that; that's why it wants to control the Internet environment, to learn more about human determination and maybe how to influence it. This is definitely more a political method than a scientific one. It would be better to use the already old expression "realpolitik" instead; it is well proven that it can be pretty useful, oh yes.

What I sensed in this article is some irony, in particular when the author says that Google can translate perfectly from Klingon to English. Don't you?

Re:Realpolitik, the word we're searching for? by netpolyglot · 2008-06-25 08:55 · Score: 1

Maybe this definition of the word REALPOLITIK could help:
http://www.merriam-webster.com/dictionary/realpolitik

O shut up by Anonymous Coward · 2008-06-25 08:21 · Score: 0

Google does use a model, not some whimsical nothingness. It's a pattern-matching algorithm that also incorporates the page ranking (which, from my guess, is a form of validation on the model and part of the model; hence correlation). Which means the data Google has is well organized. Perhaps the data is not in a form that we would understand or can use, but it is in a form that the programs can access quickly and use easily; otherwise people would not use Google. Or, to put it another way: computer programmers may know little to nothing about the source of the data, but they do know something about how the data can be related and used to create their programs to make Google efficient. They are not blind when writing the code and just hoping that their stuff will work; though some coders don't always know the big picture, someone normally does. I think the author of this article is just overwhelmed and is panicking; it's sad to see it posted.

TFA draws from Feyerabend and Against Method by Anonymous Coward · 2008-06-25 08:51 · Score: 0

TFA is a take on Feyerabend's notion that science doesn't - and shouldn't - use any kind of method or methodology to do research:

http://www.marxists.org/reference/subject/philosophy/works/ge/feyerabe.htm

Feyerabend completely rejected the idea that scientists actually use the scientific method and essentially stated that all scientific theories are baseless, ad-hoc explanations of raw data that will be thrown out completely when new data incompatible with those theories comes along. Feyerabend is really popular with some 'philosophers of science' who'd like to see scientists taken down a peg or two, but he isn't really taken seriously by anyone else. Hell, he even went so far as to say that astrology was as valid as science:

Feyerabend described science as being essentially anarchistic, obsessed with its own mythology, and as making claims to truth well beyond its actual capacity. He was especially indignant about the condescending attitudes of many scientists towards alternative traditions. For example, he thought that negative opinions about astrology and the effectivity of rain dances were not justified by scientific research, and dismissed the predominantly negative attitudes of scientists towards such phenomena as elitist or racist. In his opinion, science has become a repressing ideology, even though it arguably started as a liberating movement. Feyerabend thought that a pluralistic society should be protected from being influenced too much by science, just as it is protected from other ideologies.

http://en.wikipedia.org/wiki/Paul_Feyerabend#Role_of_science_in_society

The idea put forward in TFA that we could eventually dispense with models completely and just interpret data has its roots in this type of radical thought. The real crux of TFA is this:

Learning to use a "computer" of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

Anyone familiar with the fallacy of equating correlation and causation can immediately see one glaring problem with the "Correlation supersedes causation" statement. No matter how much data we have to analyze, correlation and causation will always remain two separate concepts and it will always be an error of reasoning to confuse them.
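
A small simulation makes the distinction concrete. This is a sketch under stated assumptions, not anything from TFA: a hidden variable `z` drives both `x` and `y`, the noise levels and sample size are arbitrary, and `simulate` and `pearson` are hypothetical helper names. Observationally `x` and `y` correlate strongly, yet setting `x` by hand leaves `y` untouched, and piling up more observational data would not reveal that.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate(n, intervene, seed=0):
    """Correlation of x and y when both depend on a hidden cause z.

    With intervene=True, x is set independently of z (an "experiment"),
    which severs the only path connecting x and y.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                 # hidden common cause
        if intervene:
            x = rng.gauss(0, 1)             # x assigned by the experimenter
        else:
            x = z + rng.gauss(0, 0.5)       # x merely reflects z
        y = z + rng.gauss(0, 0.5)           # y depends on z, never on x
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


observed = simulate(20000, intervene=False)
intervened = simulate(20000, intervene=True)
print("correlation in observational data:", round(observed, 2))
print("correlation when x is set by hand:", round(intervened, 2))
```

The correlation in the first case is real and would even be useful for prediction, but only the model (z causes both) says what happens when someone acts on x.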

The rest of his comment is suggesting that, with enough data, we won't need to use models anymore to understand the world. Instead, we'll just be able to look at an essentially infinitely large data set that will be examined with essentially infinite processing power and we'll be able to predict the outcomes of any action or set of actions without having to have any sort of understanding of 'why' or 'how' those actions are taking place. TFA is saying that we can reduce science to nothing more than an elaborate method of making predictions and dispense entirely with the notion of even trying to understand the underlying mechanism that explains why those predictions do or don't work. Naturally, this is ridiculous. Firstly, because it ignores the fact that those underlying mechanisms *do* exist - even if our knowledge of them is imperfect or incomplete. Secondly, because it would require unlimited data and processing power to even approach being workable. Without a complete data set for every action and every outcome of every action, this kind of prediction without models can't work on the universal scale that TFA is talking about. That amount of data will NEVER exist. It just isn't possible to determine the outcome of all actions and events. It's absurd. Finally, even if you could somehow accumulate that god-like amount of data you wouldn't need to worry about determining the underlying mechanisms whereby things worked any

Mod parent up! by dj_tla · 2008-06-25 08:52 · Score: 1

I wish I had mod points. You're absolutely right. If you believe strong AI is on the foreseeable horizon, you're so incredibly wrong.

As an analogy, think about how much information is in your DNA. Even compressed, the data fills up a CD-ROM (650 MB). We are nowhere near explaining how our DNA or the brain that our DNA prescribes work. How can we arrange the petabytes of data on the internet into something useful? It's going to take work, and lots of it.

"All models are wrong" - Climate models, anyone? by stankulp · 2008-06-25 08:54 · Score: 1

Climate models, as we all know, are infallible.

Consensus, don't you know?

--
We must be alert to the danger that public policy could become captive to a scientific-technological elite. - Eisenhower

"Science" by StellarFury · 2008-06-25 08:55 · Score: 1

His example about Venter is particularly offensive, too. He's saying Venter's discovery of theoretical species is somehow analogous to the discovery and classification of an existing species, and because Venter didn't classify his "species" and used genome-sized data piles to model them, it somehow proves that classification is worthless.

Sure, models are "wrong." That's why they're MODELS. I don't look at a 1/50 scale model of an experimental aircraft and say "Well, geez, how is this thing useful? It's made out of wood! It doesn't fly! You'll never get any useful information out of this!" The model helps us, the experimenters, wrap our heads around exactly what we're doing. When scientists create models, they're not really trying to replicate the universe, they're trying to make the complexity of it make sense to them. That's why we have multiple models of the same damn thing - look at the atom: the Bohr model, the Rutherford model, the Quantum Mechanical model. Sure, some are more accurate than others, but each is useful to us in understanding the phenomena that occur.

We, humanity, are the ones doing science. The computers are tools. If the computer understands what all the petabytes mean, good for it - but WE still need to understand, and the way we understand is by fitting the data to a model. Then we test the model and find where it's incorrect. The size of the data set is completely irrelevant, because the data will always be used to confirm or reject a hypothesis.

Re:Slashdot Begat the End Of Reading Comprehension by StellarFury · 2008-06-25 08:57 · Score: 1

Answer me this then: how exactly are you "mining" the data for "answers"? And once you've explained that, explain how it differs from the scientific method.

changed my mind again. by Anonymous Coward · 2008-06-25 09:12 · Score: 0

Modded GP back up because good things should rise regardless of who writes them. UID 0 has infinite mod points, you know.

model selection by inkyblue2 · 2008-06-25 09:13 · Score: 2, Insightful

i'd never heard the term "model selection," so thanks for pointing that out. it looks like there really is some good literature to read on the subject.

the process described by the model selection sites i skimmed still doesn't address what i was getting at, though. "choosing a model from a set of potential models" is only conceivable when your set of potential models (and set of variables to potentially be modeled) is well bounded.

to put it another way, take the smartest model choosing algorithm you can find, hand it a pile of data, and say "what do you make of that, smart guy?" i'm willing to bet that the answer is going to be along the lines of "wtf?" unless there is some sort of context or metadata provided along with the data to give the algorithm a hint of what it's looking for. am i looking for covariance between scalar values among regularly organized groups? am i looking for white rabbits in the image data from a camera? is this ascii or ebcdic or 8-bit PCM data? you can argue that these questions are trivial, that no algorithm can be *that* general, but that is precisely my point: all known algorithms require significant narrowing down of the problem space by human hands before they can begin to produce useful output.

if you had an algorithm that took *truly* semantics-free data in one end and spit models of regularly occurring features out the other end, you'd be halfway to general AI.

Moot tools by Anonymous Coward · 2008-06-25 09:39 · Score: 0

"All /.'ers are wrong, and increasingly you can succeed without them."

Agent Smith by Smartcowboy · 2008-06-25 10:46 · Score: 1

Mr. Anderson... you disappoint me.

Data dominates methods by danbaatar · 2008-06-25 10:46 · Score: 1

I work in Natural Language processing/Computational linguistics, and I think I can see where this article is coming from.

Look at the case of Machine Translation (MT). For a long time, the approach to getting better MT was to develop better models (alignment models, syntactic models, semantic models, etc.). The idea was that if we used more sophisticated methods to model language phenomena, we would capture more nuance and produce better results. Well, that worked for a while. But as people started collecting larger and larger amounts of data, to the point where meaningful statistics could be calculated over a collection of text, simpler statistical models started beating out complex models created by trained experts. One of the big names in MT from IBM made the controversial, but accurate, statement that every time he fired a linguist, his accuracy went up.

Fast forward to Google and their petabytes of text data. At this point, some very sophisticated models, developed by some very intelligent people, are being beaten hands-down by systems Google is working on that use very simple methods trained on LOTS of data. The point is, as long as your models/techniques are reasonable, having more data seems to dominate using better methods.

Re:Data dominates methods by netpolyglot · 2008-06-25 13:45 · Score: 1

The big name in MT from IBM who fired linguists may have hired them for the wrong purpose.
Your assertion omits a fundamental point: machine translation ONLY works well within uniform, specialized fields of knowledge.
This is because specialized human languages behave much like programming languages, so they are easier for computers to process.
Moreover, technical writers have learned how to write for the machine: they try to isolate particular, recurrent actions and always express them with the same isolated sentence.
An example of that kind of sentence: "Click with the right button of the mouse". In a normal text, this sentence could easily be embedded in a bigger sentence or subjected to some variation. But technical writers know that they must write a single invariant sentence, ending with a period, to suit the machine's skills.
They try to make natural languages behave like programming languages do, like context-independent languages.
But don't forget that even in specialized fields, MT must be used carefully because of neologisms, which can make up a large share of the lexicon (almost half) in high-tech fields.
Right?
So, do corpus-based statistics work here?
Believe me, if you translate manuals for equipment that can threaten its user's life (the only ones that MUST be translated according to international law) and you keep thinking only in terms of corpus-based statistics, you should be fired immediately! Otherwise you'll kill people!
MT systems do perform well when used with restrictions, on texts that behave much like the formal languages used to program them.
Corpus-based statistics apply in some specialized contexts, to the part of a language that can be reduced to formal languages, which is rather narrow.
Anyway, I think the point here is somewhat different:
http://science.slashdot.org/comments.pl?sid=594853&cid=23940075
Regards,
Netpolyglot

Uplift Universe by bar-agent · 2008-06-25 10:49 · Score: 1

I saw a different parallel, to Brin's Uplift series.

The galactic civilizations in that series relied on computers and algorithms handed down from the dawn of time. Their computers and algorithms were so powerful, in fact, that they worked with the fundamentals. They didn't have floating point numbers or calculus or abstractions about area or specular highlights. Their algorithms worked with integers, counted virtual atoms, and traced virtual photons.

The blind correlation techniques described in the article seem very similar to me. Don't bother with models and abstractions, with power-law and bell-curve approximations of reality, just deal with reality directly.

It's like the dude said, "God invented the integers; all else is the work of man."

It also reminds me of the Chinese Room. You don't need understanding, you just need patterns. This concept was fruitfully explored in the hard SF book Blindsight, by Peter Watts.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]

Re:Uplift Universe by __aagmrb7289 · 2008-06-25 12:49 · Score: 1

I'm not positive I understand - are you saying that we've got a data source, in Google, for example, that isn't modeling reality - it IS reality, and that we no longer need to check in with anything else to get information FROM reality? Something like that?

I'm afraid I'm a bit confused. And, sad to admit, I haven't read the Uplift series (bad me!). Can you expand on your thoughts a bit for me?
Re:Uplift Universe by bar-agent · 2008-06-25 13:19 · Score: 1

I'm not positive I understand - are you saying that we've got a data source, in Google, for example, that isn't modeling reality - it IS reality, and that we no longer need to check in with anything else to get information FROM reality? Something like that?
No, more like... in the same way as the article suggests using statistical relationships derived from mass quantities of data to form theories about reality, instead of using simplified models, the computers in the Uplift series used more concrete abstractions of integers and elemental particles and manipulated vast numbers of them in their computations, instead of using mathematical simplifications like floating point numbers or calculus.
Of course, in the Uplift series, humans developed the sciences without the aid of those computers, so we had all these effective mathematical models that the rest of galactic civilization never needed to invent, which gave us an advantage in explaining strange occurrences that came to light over the course of the series.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Re:Uplift Universe by __aagmrb7289 · 2008-06-25 14:10 · Score: 1

Okay, I see what you are saying - and it sounds good. I guess that I don't feel the article is actually talking about that - it's not about using mass quantities of data to make theories, it's about mass quantities of data making theorizing and the scientific method unnecessary. What you are describing just means that we've got a huge, massive set of experimental data that can be mined (and then confirmed), leading to better experiments. Right?
Re:Uplift Universe by bar-agent · 2008-06-25 14:28 · Score: 1

You are probably more correct about what the article is actually talking about. The article says that, in the brave new world of tomorrow, "correlation supersedes causation, and science can advance even without coherent models." What comes out of the process are correlations, not theories with explanations.
We are sort of moving to a pure correlation system as it is, with chaos theory. Last I heard, chaos theory had strange attractors -- like Jupiter's Great Red Spot(s). We don't know how they form, because the system is too complex to ascribe understandable causes to their formation. We just know that they are there, and mostly stable, until they aren't, when they vanish. But, we can sort of look at a system as a whole, like a weather system, and make predictions about what happens. In actuality, we build particle- or cell-based computer models, let them run, and see what happens. We don't have any simpler models that we can apply. The article says we are heading there, and this is how the Uplift computers worked.
A lot of fields are like this. Medical practice is all about correlated symptoms. When we finish delving into subatomic particles and the fundamental nature of the Universe, I doubt we'll find a reason for anything, we'll just find patterns or "laws" that hold true.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]

Anderson vs. Scientific and Historical Materialism by Anonymous Coward · 2008-06-25 11:17 · Score: 0

Science's job is to understand, predict and control.

Deriving correlations from giant data sets is useful precisely so that we CAN form models, theories, etc. that enable us to make predictions and control outcomes.
Fundamentally, we are interested in cause and effect relationships even when they are complex or emerging properties.

Correlations suggest potential relationships, which demand design of experiments to determine if there is a causal relationship.

I was one of the co-founders of the company, Pangea Systems/DoubleTwist, which made the software which assembled and annotated the Human Genome in 1999, beating Venter's company by 6 months.

We based our business on utilizing enormous amounts of the public DNA, protein, and biochemical metabolic data to derive correlative relationships, but the purpose always was to enable formulation and testing of hypotheses.

This allowed us to assemble the genome out of tiny bits of sequences and to infer potential biochemical function by transitive logic leveraging the known functionalities that had been discovered by scientists who laboriously conducted real biochemical experiments with isolated proteins, whose sequences we now could piece together from the fragmented, disparate data.

Anderson reflects a view common among those who deny material reality and defer solely to mathematics and statistics, and in fact, consider math to be the a priori of the universe, not material reality.

The phenomenal success of the human genome project was a vindication of the scientific method and the work of thousands of brilliant engineers from many disciplines leveraging hundreds of theories for each of their disciplines to deliver that data that Anderson strips of physicality, causality, hypothesis and testing.

Unfortunately, this tendency of philosophical idealism is manifested in many ways among the technorati.
From sophomoric Matrix-philosophies that everything we perceive is an illusion to the useless and solipsist "anthropic principle" that has become current in physics stating that 10^500 universes are possible, but we could only happen to exist in one that allowed intelligent beings to exist, blended in with string theory as a way to deflect discussion about the lack of experiments suggested by their theory.

Anderson, Matrix-ites, Anthropocists all share a denial of historical and dialectical material reality and the scientific method.

Unfortunately, two generations of progressives have been poisoned by anti-materialist, Post-Modernism, and the resulting intellectual rot continues to fester in myriad forms.

Re:Anderson vs. Scientific and Historical Material by Anonymous Coward · 2008-06-25 11:43 · Score: 0

"Unfortunately, two generations of progressives have been poisoned by anti-materialist, Post-Modernism, and the resulting intellectual rot continues to fester in myriad forms."

I'll go you one further - we've had 2 generations of people who called themselves progressives who were actually nothing of the sort. Any movement that, in whole or in part, denies reality simply cannot be 'progressive' in any sense of the term. The real progressive movement has been effectively dead for decades - at least in academia. I wouldn't hold my breath for its return, either. The post-modernists and their descendants in the humanities are far more concerned with publishing papers that will assure them of tenure, whilst simultaneously attacking those arrogant scientists across campus who would dare to produce results in areas where the anti-realists could only speculate and pontificate, than they are with any other endeavor. So long as that remains true, there won't be a progressive movement in academia worthy of the name.

Google statistics aren't science by crovira · 2008-06-25 12:42 · Score: 1

If the author doesn't know what the difference is, that's his lookout.

I never considered Google stats to be science.

There is a world of difference between knowing that objects fall at 32ft/s/s and knowing that something has a probability of correlation with some index or other.

The difference between quantum physics and statistics (apart from Einstein's objection that "God does not play dice with the universe") is that the "unknowns" in quantum physics are matters of fractions (like: 80% of the universe is hidden from us in Dark Matter and Dark Energy :-) and are things that we know we don't know.

But we know that we don't know them...

Statistics can lead one anywhere in the KNOWN universe.

Science can lead one into the unknown.

--
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.

Poor Reading Comprehension, eh? by Anonymous Coward · 2008-06-25 13:19 · Score: 0

The author is saying we are approaching the point where we can use raw computational power to mine enormous amounts of data for answers to increasingly complex questions. We can use this instead of the scientific method, because in the end, they produce the same thing: an approximation.

And you have the gall to chastise slashdot readers about their reading comprehension? The 'point' the author was trying to make goes a lot deeper than the simple use of data mining. He was suggesting that we could use data to completely dispense with the need for models of any sort in science. He was suggesting that, with enough data, we wouldn't need models that would attempt to approximate or explain the underlying mechanisms responsible for phenomena. Rather, we could - with enough data - just make predictions on what events would proceed from what precursors based upon pure statistics. We could do away with any notion of even trying to understand 'how' or 'why' things happen and just focus on 'what' was likely to happen given a specific set of conditions. That's a lot deeper than what you've provided in your summary. It's also a point of view that is profoundly naive.

In order for the 'model-less' approach to be capable of replacing the scientific method we'd first have to define that the purpose of science is not to understand the 'how' or 'why' of phenomena, but rather say that science exists exclusively to make predictions and that it can tell us nothing about why its predictions work or what mechanisms are at play with respect to the phenomena that these predictions concern. That's a huge leap that not many scientists are going to be willing to make.

Further, without models, we'd be in no position to be able to interrelate the different phenomena about which we were making predictions. Without such a framework our ability to understand and make use of our predictions would be severely impaired. To give a more concrete example, just because we have data that shows that IF p -> q 80% of the time and IF z -> q 50% of the time, that doesn't mean we know anything about why this happens and whether or not there is a relationship between p, z and the way they lead to q. With good models we could make statements and predictions concerning these questions. Without models all we can do is wonder.
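
The p, z and q example can be made concrete with two made-up tallies. This is only a sketch: the counts, the `build` and `rate` helpers and the "world" names are hypothetical. Both data sets report q after p 80% of the time and q after z 50% of the time, yet they disagree completely about what happens when p and z occur together, which is exactly the kind of question the pair of percentages cannot answer.

```python
from fractions import Fraction


def build(counts):
    """Expand {(p, z): (total, with_q)} into a list of (p, z, q) records."""
    records = []
    for (p, z), (total, with_q) in counts.items():
        records += [(p, z, True)] * with_q
        records += [(p, z, False)] * (total - with_q)
    return records


def rate(records, condition):
    """Fraction of the records matching `condition` in which q occurred."""
    outcomes = [q for p, z, q in records if condition(p, z)]
    return Fraction(sum(outcomes), len(outcomes))


# Two invented worlds with identical headline percentages.
world_b = build({(True, True): (50, 50), (True, False): (50, 30), (False, True): (50, 0)})
world_c = build({(True, True): (50, 30), (True, False): (50, 50), (False, True): (50, 20)})

for name, world in (("world B", world_b), ("world C", world_c)):
    print(name,
          "P(q|p) =", rate(world, lambda p, z: p),
          "P(q|z) =", rate(world, lambda p, z: z),
          "P(q|p and z) =", rate(world, lambda p, z: p and z))
```

In world B, p and z together always lead to q; in world C they do so only 60% of the time. Choosing between the two takes either more data about the joint cases or a model of how p and z interact.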

The author might respond to this charge by saying that, with enough data, this problem would disappear. Well, that's true - if you have enough data. The thing is, in order to completely replace the need for models you need *complete* all-encompassing data for whatever subject you're discussing. You have to know the outcome of every possible event and combination of events. At that point, you don't need models to make predictions anymore simply because you don't have any predictions left to make. You also don't have to worry about trying to use models to explain the underlying mechanisms at work that cause the phenomenal behavior being studied and predicted because, with complete knowledge, you'll have already picked that up along the way. So, yeah, as soon as we have perfect knowledge and complete data we can dispense with models and the scientific method. I won't be holding my breath for that day.

Sure, you can use statistical approaches to make predictions about the behavior of systems that aren't well understood or well modeled. This has been done with varying degrees of success for a long time and increased computing power will only make this technique more useful. However, to suggest that we'll ever be able to replace models and the scientific method with pure statistics is nothing but blind folly.

Have we reached a time...? by Geno Z Heinlein · 2008-06-25 13:37 · Score: 1

"Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"

I think there's a Star Trek analogy here.

When Gene Roddenberry was putting together TNG, his theme for the series was "Prometheus Unbound". Roddenberry wanted to explore a world of ideas, to explore what happens when scarcity is not a factor. When you have so much energy and so much technology that the only limitation is your imagination, what do you do then? What decisions do you still have to make? What conflicts must you still resolve? (I think this can be a difficult sell because few of us can relate to a scarcity-free world. If you look at the luxurious 1701-D and then compare it to, say, Babylon 5 with downbelow and lurkers, it's clear which one is more like our world, and thus, which one is more about our personal stories.)

So now imagine a world so Googleized that finding the facts is never a problem. What we do then is decide what to do. What kind of world do we build? What kind of society do we want?

We can abandon representative democracy because referenda have no cost -- Google knows how people will vote. Do we still want representative democracy because there are social benefits that come only with that structure? What about communism or capitalism? Who cares! Google has correlated all information everywhere and knows what goods to ship where... if we can agree on what our beliefs and priorities are. A more Googly world also gets us closer to the idealized Adam Smith "perfect information flow" discussed in Economics 101. Is the end result of this a win for libertarians? Or will perfect information flow have different consequences than we believe it will?

When Google's clusters tell us the facts, our role will be to be the deciders of things. We will have to choose those things that are beyond facts. Personally, I feel this gets us closer to our ideal selves. We'll get closer to choosing what we really want rather than choosing from the subset of what we can have.

--
Five percent of one year's DoD budget puts us on Mars.

Man was I relieved by v(*_*)vvvv · 2008-06-25 13:46 · Score: 1

when I saw so many other posters call this article out. Hope is not lost, when people can see bull for what it is!

The reason why theory is not necessary is because they have petabytes of EVIDENCE. Yes, the votes are in, so tally them up with computers, and spew out some results. But if no theory was involved, then why is google so much better at it than other now obsolete search engines? Maybe they have better science. What a leap!

Unfortunately, it is well written, and many "smart" people who are simply uninformed in this particular field may find themselves compelled by some of the arguments... but then again, they probably won't. At least I hope not.

Disturbing and Wrong Conclusion. by plasmacutter · 2008-06-25 14:11 · Score: 1

"Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all."

I believe I'll take the valley girl "AS IF" for this one.

There is no such thing as perfectly accurate data, especially on computers, where the tools used are not platform independent (half of the laptops sold to college kids are Macs now, buddy!).

There are considerable numbers of people who take passive and active measures to avoid being tracked on multiple levels.

The conclusion: if you actually believe this and apply it to your marketing/product development, you will fail to serve a considerable number of people, and possibly alienate them if they used to be your customers before you decided the data didn't justify the cost of feature X.

If your product is information, you will encourage piracy. If your product is something else, someone else will come and eat your lunch, especially in tech, where the outliers who use the obscure features you "dropped" are the same people advising their friends on purchases.

--
VLC FOR MAC IS DYING! IF YOU DEVELOP, PLEASE SAVE IT!!

I'm sorry that google is not your friend by alizard · 2008-06-25 15:22 · Score: 1

I use it successfully for anything from helping me debug software to finding a list of somebody's campaign contributors to ... anything I can come up with search terms for. I'd say that on information search (image search is iffier), I get useful results about 99% of the time, though it might take a while to get those results.

I suggest you browse the google help docs.

That's not to say it couldn't be better: they still don't have the Boolean NEAR or DATE operators, and in some ways search would be easier if I could simply enter Boolean operators.

--
Tech Public Policy stuff

Profound lack of insight by jandersen · 2008-06-25 17:07 · Score: 1

"All models are wrong, and increasingly you can succeed without them."

"All models are imprecise" is the correct statement - there is a hell of a lot of difference between that and "they are wrong", which implies that they are in plain contradiction with the observed facts. All scientists know that their models have this limitation, which is why they keep researching, so they can improve the model.

The only compelling thing about this is that it compels me to conclude that neither the author nor the poster knows anything about science. This kind of nonsense is, in my view, a clear symptom that the agenda of the religious right is succeeding all too well in destroying scientific awareness and critical thinking. To hell with them.

Where can one find this "Scientific Method". by BrunoUsesBBEdit · 2008-06-25 17:41 · Score: 1

I tried to RTA, but couldn't stomach it. This /. story caught my eye because I have found that "The Scientific Method" is either dead, forgotten, or something that my 9th grade biology teacher made up. I tried to explain to a colleague how my approach to all problem solving was taught to me by Mr. B... Dang, forgot his name! Anyway, when I tried to find an article on the steps (Hypothesis... Experiment... Conclusion, etc.), I couldn't find anything anywhere. It's like it isn't even used anymore! I'm trying to convince people to use this method, and I can't even find an explanation to show anyone.

--
The only stable state is the one in which all men are equal before the

Philosophy Student Hypothesis by Anonymous Coward · 2008-06-25 17:57 · Score: 0

My guess is that someone who admires wordy high-falutin' postmodern philosophy text or who was once a subscriber to OMNI magazine got involved with this article.

Probably an intellectual impostor with rectangular glasses and a turtleneck.

http://www.elsewhere.org/pomo/
http://www.theonion.com/content/opinion/why_do_all_these_homosexuals

why this is not scientific method? by Anonymous Coward · 2008-06-25 18:15 · Score: 0

What is fact/truth?
You need to know the mechanism AND have proof from reality.
Mathematics is one kind of representation, but that doesn't mean only math can prove things.
Textual or visual evidence is also a way to prove things, at a higher level or just for phenomena,
and that doesn't mean it's not scientific.
Blindly believing in math means nothing... people can do the math right and still get wrong answers.

Can you spell HYPE TO SELL ADS TO WEAK MINDS? by itsybitsy · 2008-06-25 22:35 · Score: 1

Wired is tired.

"The Data Deluge Makes the Scientific Method Obsolete"? How absurd.

The scientific method requires data to make hypotheses and theories testable.

Data without models, hypotheses, and theories is just useless, as are the article's observations and conclusions.

'Speaking at the O'Reilly Emerging Technology Conference this past March, Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."'

As someone else observed WTF???

Without models you've got nada.

What's going to replace models?

"This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear."

Are not applied mathematics models? Duh... They make one giant claim in one statement and in the very next sentence they contradict themselves and prove that they are full of shit!!! Cool way of writing... but not my style.

While I'm not a particle physicist I suspect their claims in this statement are simply junk: "The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses -- the energies are too high, the accelerators too expensive, and so on."

It's just a sound bite designed to attract as many non-critical thinkers as possible. Given how many people like mind poo, this poo pablum is typical of writing that passes itself off as science based.

Icky mind poo guys. Surely you can do better?

"In short, the more we learn about biology, the further we find ourselves from a model that can explain it."

What??? That doesn't mean that we won't find a model that works. That doesn't mean that we can't use information science to find models that work!!!

"The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all."

Huh?

While I'm all for "correlation" (in fact I love correlation of various kinds) I have no idea what they mean when they say that "science can advance even without coherent models, unified theories, or really any mechanistic explanation at all." That's just beyond bizarre, and a profoundly and deeply mistaken understanding of science.

Maybe there is a useful message in the article about data mining on petabyte scales and higher, but it's not the advertised "End of Theory".

Counter article on Ars by v(*_*)vvvv · 2008-06-26 00:07 · Score: 1

Why the cloud cannot obscure the scientific method by John Timmer.

If you've scrolled down this far, you might as well read it all :)

Historians are vampires by Anonymous Coward · 2008-06-26 00:22 · Score: 0

lol Blade:House of Cthon

I can tell you an history by Anonymous Coward · 2008-06-26 02:03 · Score: 0

Along the same lines of science vs. AI,
there was a real example in Napoleon's time.

At that time it was very important to know
how to fire cannonballs in order to destroy
the other army. The technique of the day
was a regression approach: they fired, looked
at where the ball landed, and updated the model until
it had good accuracy. Just like AI does.

Napoleon had very good people working on this problem (Fourier and others). These people,
using the scientific method, derived a very precise ballistic law from first principles.

The consequence was that Napoleon won almost all
of his early battles, because he took less time
aiming and destroyed the enemy army.
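
The contrast described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the idealized textbook range formula with no air resistance, which is not the historical ballistic law the post refers to, and the muzzle speed, target distance, and function names are all invented for the example.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def shot_range(speed, angle_rad):
    """Idealized range on flat ground, ignoring air resistance (an assumption)."""
    return speed ** 2 * math.sin(2 * angle_rad) / G

def aim_by_trial(speed, target, tol=1.0):
    """Fire, look where the ball lands, adjust; repeat until within tol metres.

    Bisects on the elevation angle between 0 and 45 degrees, where the
    idealized range grows with the angle. Returns (angle, shots_fired).
    """
    low, high = 0.0, math.pi / 4
    shots = 0
    while True:
        angle = (low + high) / 2
        shots += 1
        landed = shot_range(speed, angle)
        if abs(landed - target) <= tol:
            return angle, shots
        if landed < target:
            low = angle   # fell short: raise the barrel
        else:
            high = angle  # overshot: lower the barrel

def aim_by_law(speed, target):
    """Solve the range formula for the angle directly: no test shots needed."""
    return 0.5 * math.asin(G * target / speed ** 2)

speed, target = 100.0, 500.0  # hypothetical muzzle speed (m/s) and distance (m)
trial_angle, shots = aim_by_trial(speed, target)
law_angle = aim_by_law(speed, target)
print(f"trial and error: {math.degrees(trial_angle):.2f} degrees after {shots} shots")
print(f"from the formula: {math.degrees(law_angle):.2f} degrees after 0 test shots")
```

Both approaches end up at roughly the same elevation, but the trial-and-error gunner has to spend shots finding it again every time the distance changes, while the formula gives the answer before the first shot is fired.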

CU

The end of data: some epistemological reservations by UBfusion · 2008-06-26 05:06 · Score: 1

I could summarise TFA as follows: we don't need science, we don't need models, we don't need scientists, all we need is competent statisticians and analysts.

My first reservation is the oversimplification of science as presented in TFA. It is a very bold claim to assume there is a universal "scientific method". Hard sciences (e.g. physics) and soft sciences (e.g. sociology) have totally different research strategies and criteria of model success. Similarly, I don't think models and mental models play a universal role across the sciences. Finally, a model may be able to predict the past, i.e. fit existing data, but this does not guarantee that the same model can reliably predict the future.

The second reservation (and I am glad this was implied by the comment above on LHC data) is that both data collection and data treatment are inevitably theory-laden. On the one hand, we need a theory to provide criteria of what counts as data. In the LHC experiment, a valid event may be masked by millions of noise data points. Both a very solid theory of the phenomena and a theory of the instruments are needed to declare an event valid. On the other hand, data treatment (number crunching) also implies making many assumptions about the distribution of the data, and about the probabilistic and statistical paradigms used to approach them. To provide an example, how do you interpret a statement like "the probability of the Brooklyn Bridge collapsing in the next month is one in a million"? Has the person claiming this studied a million bridges like the Brooklyn one and found that on average one of them collapses every month? In that sense, I'm afraid that the scientific method, unless premises and prior knowledge are explicitly and adequately taken into account, cannot help chasing its tail. Theory will always be underdetermined by data.

In addition, as some commenters have indicated, some data coming from complex processes are well known to resist analysis. Global warming is an excellent example, where different starting points predict totally different outcomes. The techniques implied by TFA may provide good likelihood estimates, that is, probabilities of the data under a specific model, in the same sense that we can check the correct spelling of a word by the number of its google references. However, when the data are scarce, googling cannot help. I would sooner trust an artist to provide answers to interesting questions like the meaning of the universe ("42") than the billions of monkeys like me typing into the google search space.

I think a more appropriate title for the article would be "The end of data". In the ocean of data, the relevant bits are lost and their identity gets diluted in the vastness. The end of humanity must be just the same thing - loss of identity, heat death, maximum entropy: all is the same, all is equal, all is meaningless.

rigatoni by ReedYoung · 2008-06-26 10:43 · Score: 1

reply to TFA on wired.com, by 'rigatoni':

More data allows us to do what we're already doing faster and more efficiently, but it can't open any new possibilities without trying to interpret the data. Chris Anderson correctly observes that some value can in some spheres be extracted from the mere correlation of data (Do advertisers care why method A is correlated to more sales?), then incorrectly asserts a generalization based on his sometimes-correct observation of the superiority of correlation over the more challenging question of causation, that causation is categorically irrelevant. On the contrary, CxOs, directors and investors care very much why costs increase. Whether lesser employee expertise or greater fuel costs, for example, are the primary cause of reduced profit margins is a very important distinction which calls for robust, complete knowledge of theory to invest or shape corporate strategy intelligently. Even for the ideal search engine, one needs theory to know what terms to dump on the search engine. I won't subscribe to Wired, nor support any of Chris Anderson's future ventures, categorically.

--
"I can't imagine how things could get any worse!" (some guy) "That could just be failure of imagination on your p

Slashdot Mirror

387 comments