Slashdot Mirror


Google Begat the End of the Scientific Method?

TheSauce writes "In a fairly concise one-pager from Chris Anderson, at Wired, the editor posits that all of our current (or now previous) models for collecting data are dead. The content is compelling. It notes that we've entered the Age of the Petabyte — where one can collect immense amounts of data that are paradigm agnostic. It goes on to add a comment from the head of Google's R&D, that we need an update to George Box's maxim: 'All models are wrong, and increasingly you can succeed without them.' Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"

100 of 387 comments

  1. Ahem by Anonymous Coward · · Score: 5, Insightful

    The content is compelling. It notes that we've entered the Age of the Petabyte -- where one can collect intense amounts of data that is paradigm agnostic. It goes on to add a comment from the head of Google's R&D, that we need an "update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them." Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"

    I believe I speak for not a few of us when I respond:

    WTF?

    English, ---, do you speak it?

    1. Re:Ahem by smallfries · · Score: 5, Insightful

      I used to think that I could translate most dialects of bullshit into english but this threw me off guard. The most reasonable explanation is that Chris Anderson is a tool and doesn't know what he is talking about.

      For example, data is now "paradigm agnostic". Seriously, wtf? When was data ever not "paradigm agnostic", and when did we develop the need for a term to describe it? Data is data. It is raw, and unanalysed, and as such the notion of a paradigm is completely irrelevant.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    2. Re:Ahem by Anonymous Coward · · Score: 2, Insightful

      "For example, data is now "paradigm agnostic". Seriously, wtf?"

      Just look at the creation-evolution controversy to see how data is not 'paradigm agnostic'. Each claim the other's data is unsound by the paradigm's umbrella it falls under.

    3. Re:Ahem by eln · · Score: 5, Interesting

      It's simple really: The article seems to be saying that we have access to such a ludicrously large amount of data that trying to draw any real meaning from it is pointless. So, we employ a "shotgun" approach at reading the data, and voila, we get data that at least appears to be interesting.

      Of course, since we have no particular purpose in mind when we do this, and no particular method other than "random", we end up with mostly useless data (in the example given, we have a bunch of random gene sequences that must belong to previously unknown species, but we know nothing about those species other than that we found some random DNA that probably belongs to them, and have no particularly good way of finding out more).

      The article seems to be saying that since we have so much data, we can now draw correlations between different pieces of data and call it science. No reason is given why this is useful other than that we have so much of it, and Google is somehow involved. Apparently when you have enough data, "correlation does not equal causation" is no longer true. Again, no coherent reason is given for this stance.

      I think the article makes the same mistake a lot of ill-informed people that get excited by big numbers make: It seems to believe that data is in and of itself an end goal, when really vast amounts of data are useless unless they can help us as humans answer questions that we want answered. Yes, knowing that there are lots of species of organisms in the air that we didn't know about before is sort of interesting I guess, but it doesn't really tell us anything useful.

      Above all, the article proves that you can be almost entirely incoherent and still get your article published in Wired if it says something about how Google is changing the world.

    4. Re:Ahem by clang_jangle · · Score: 4, Funny

      Data is data. It is raw, and unanalysed, and as such the notion of a paradigm is completely irrelevant.


      Well, we already know it wants to be free, so maybe now it's just exercising its sentient status in other areas.
      --
      Caveat Utilitor
    5. Re:Ahem by Anonymous Coward · · Score: 5, Informative

      Each claim the others data is unsound by the paradigm's umbrella it falls under.

      No, each claim the other's theory is wrong.

      Nobody (sane) refutes the existence of ring species, or refutes microevolution, or other observable forms of data. The only thing in dispute in the controversy is "species are species because they were made that way" versus "species are species because after some really big N evolutionary steps they become that way".

    6. Re:Ahem by loonycyborg · · Score: 2, Funny

      the article proves that you can be almost entirely incoherent and still get your article published in Wired

      And linked to on slashdot even!
    7. Re:Ahem by commodoresloat · · Score: 4, Insightful

      Well, in the abstract data may be "paradigm agnostic," but the selection of data one has access to at any given time is inevitably not. Which data you choose to collect, how much of it you collect, which data you ignore - these are all decisions that are ultimately subjective. (BTW I think this is probably true even in the age of Google, but his point is that one is now collecting, storing, and accessing so much data that the "paradigm" influencing those decisions is not a specific scientific theory or point of view.)

    8. Re:Ahem by tshetter · · Score: 2, Insightful

      I didn't see the article really saying that "correlation does not equal causation" stops being true at some point with a large enough data set.

      I saw it as saying "With so much data, you can use that as a base for preliminary research."

      You then research those interesting things in traditional ways, but you have started with some sort of insight.

      If you have enough images of the sky and stars, you can use the images to look for interesting things first, and then jump on a telescope or satellite when you have something solid to look for.

      But to be sure, the author was selling Google is the Answer pretty hard. The application of math to problems is never a bad idea, they are doing it pretty well. And with the evolution of computers, more data and more processing are naturally going to occur.

    9. Re:Ahem by nine-times · · Score: 5, Interesting

      Yeah, I don't know what "paradigm agnostic" means specifically, but I think it's a mistake to think that "data is data".

      Not all data is created equally. You have to ask how it was collected, according to what rules, and with what purpose. I can collect all sorts of data by stupid means, and have it be unsuitable for proving anything. It's even possible that I could collect a bunch of data in an appropriate way, accounting for the variables which matter for my particular experiment, and have that data be inappropriate for other uses.

      Of course, if what's intended by "paradigm agnostic" is that we no longer pay attention to those things, then I hope we're not becoming paradigm agnostic. I'm just bringing this up because I think some people think numbers don't lie, and that when you analyze data, either your conclusions will be infallible or your analysis is flawed. On the contrary, data can not only be bad, but it can be inappropriate.

    10. Re:Ahem by MightyMartian · · Score: 3, Interesting

      It's an idiotic notion. We've had vast amounts of data for well over a century now, more than we can hope to fully measure and catalog in a lifetime. Everything from fossils to space probe readings to seismic measurements fill up data archives, in some cases literally warehouses full of data tapes, artifacts and paper. The way you deal with this sort of thing never changes. Provided the data is stored in a reasonable fashion, if you have a theory, you can go back and look at the old measurements, artifacts, bones, whatever and test your theory against the data. The only difference is that rather than going out and making the observations yourself, you're using someone else's (or some computer that just transmitted its data).

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    11. Re:Ahem by jank1887 · · Score: 2, Insightful
      Translation:

      Old way: develop physical model of how we think things work, test a few cases, refine model. New way: collect a huge relevant data set, mine the data for interrelationships, make a correlation. Correlation models replace scientific models. No more need for hypothesis testing.

    12. Re:Ahem by melikamp · · Score: 5, Funny

      I used to think that I could translate most dialects of bullshit into english

      Piping TFA to bs2english yields:

      Google is a great place to work, and an even better place to invest money in. Go Google! P.S.: buy Google stock.

    13. Re:Ahem by LilGuy · · Score: 5, Insightful

      I'm glad slashdot linked it. I read this the other day and had no idea what to make of it. After the first 20 comments I see I'm not completely retarded.

      --

      You're nothing; like me.
    14. Re:Ahem by JeanPaulBob · · Score: 5, Informative

      In the minds of some Creationists, science is itself defective because it only deals with natural phenomena.

      Psst. It doesn't. It deals with phenomena about which (or based on which) we can make measurable, testable predictions.

      If your methodology for evaluating a theory requires classifying it by abstract metaphysical concepts like "natural" and "supernatural", then you're a step away from the scientific method of "experiment".
    15. Re:Ahem by Randle_Revar · · Score: 4, Interesting

      Just undoing a slip of the mouse moderation.
      That's one disadvantage of the current mod system - no chance to fix mistakes

    16. Re:Ahem by ArhcAngel · · Score: 4, Funny

      I'm not completely retarded.

      The data is inconclusive. Let me see what I turn up on a Google search.

      --
      "A person is smart. People are dumb, panicky dangerous animals and you know it." - K
    17. Re:Ahem by sm62704 · · Score: 2, Interesting

      all of our current (or now previous) models for collecting data are dead.

      I guess I have to R this FA. ALL the models for data collection? No more controlled double-blind studies?

      It notes that we've entered the Age of the Petabyte -- where one can collect intense amounts of data that is paradigm agnostic.

      Science has always at least tried to be paradigm agnostic. It can't always succeed of course, but I don't see how... Ok, I guess I'd better RTFA.

      OK, I'm back. The article is horseshit. It is a whole bunch of words that add up to essentially what the summary said, only in a really long winded fashion.

      "No theory needed, now we have models". How do you make the model without theory?

      Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?"

      No. In the first place, no data that comes from the internet can be taken at face value (and this Wired article is a good example of how the internet is full of crappy data). Secondly, I hate inaccurate yuppiespeak, like talking about "clouds of information." It's stupid. Information doesn't gather in clouds, it's gathered in big heaps of paper and on hard drives and optical disks. The only clouds are the clouds of crack smoke surrounding the heads of the people who say things like "clouds of information".

      We still use the same tools to analyse data. We just have more data to analyse. The scientific method itself is nowhere near dead.

      Oh, and the parent is not offtopic - It hit the nail on the head. I guess a Wired editor had mod points.

      --
      mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
    18. Re:Ahem by atraintocry · · Score: 2, Funny

      Science 2.0! Now with more datamining in social networks! And ajax! And of course, the same politicized funding that you know and love.

    19. Re:Ahem by sm62704 · · Score: 4, Insightful

      Information doesn't want to be free. But when it isn't, neither are you.

      --
      mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
    20. Re:Ahem by sm62704 · · Score: 4, Funny

      Not all data is created equally. You have to ask how it was collected, according to what rules, and with what purpose

      I wear a goatee as a result of a small study.

      Several years ago after my marriage unravelled and I got divorced and couldn't as much as get a dinner date, I decided "fuck it, why do I bother buying razors?" and simply stopped shaving.

      Then one night in a bar a woman told me I should shave it into a goatee. So I started asking women "goatee or full beard?" and collecting the binary (y/n) data. Of seventeen randomly selected women aged 21 to 70, sixteen said "goatee". The one who said "full beard" was standing beside her boyfriend, who wore a full beard.

      My losing streak ended, thanks to pseudoscience!

      --
      mcgrew's razor: Never attribute to stupidity that which can be explained by greedy self-interest
  2. WTF indeed by GameboyRMH · · Score: 5, Insightful

    I saw the article yesterday, but it was so WTFey I just moved on...definitely not Slashdot submission material (especially being a Wired article).

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
    1. Re:WTF indeed by eggoeater · · Score: 5, Funny

      "WTFey"
      I hadn't seen WTF adjective-ised before, but I love it... there's just so much I can use it with. In fact, I gotta go now and tell my boss how my project is going....

    2. Re:WTF indeed by mrchaotica · · Score: 5, Funny

      adjective-ised

      And I hadn't seen adjective verbed!

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    3. Re:WTF indeed by MightyMartian · · Score: 5, Funny

      It reads like some sort of brain-damaged new-age technohippy tripe. Yeah, we don't need methodologies any more, because, maaaan, we've got tubes! Gimme a break.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    4. Re:WTF indeed by melikamp · · Score: 3, Funny

      And I—a pronoun slashed. Only on /.

    5. Re:WTF indeed by Zabu · · Score: 2, Funny

      And I hadn't seen verb verbified!

      --
      It's all good.
    6. Re:WTF indeed by boyko.at.netqos · · Score: 2, Funny

      I didn't know you could turn "verb" into a verb, or, to verb a noun, verb verb.

      --
      I used to work for NetQoS. I no longer do, but want to keep the excellent karma attached to this account.
    7. Re:WTF indeed by m.ducharme · · Score: 4, Funny

      This thread is cromulent.

      --
      Rule of Slashdot #0: You and people like you are not representative of the larger population. - A.C.
    8. Re:WTF indeed by dmbasso · · Score: 5, Funny

      And I hadn't seen anything, I'm blind you insensitive clod!

      --
      `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
    9. Re:WTF indeed by FishAdmin · · Score: 2, Funny

      Actually, it embiggens us all.

      --
      Last night I played a blank tape at full volume. The mime next door went nuts.
  3. Definitions by sir_eccles · · Score: 3, Insightful
    "Data, information, knowledge, intelligence."

    They may lead from one to the other but they are not all the same thing.

    1. Re:Definitions by Itninja · · Score: 4, Insightful

      A bit OT here, but don't forget 'wisdom' after intelligence. So many people stop at intelligence.

      --
      I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
    2. Re:Definitions by Anonymous Coward · · Score: 2, Funny

      Also, charisma and dexterity are very important.

    3. Re:Definitions by gnick · · Score: 4, Insightful

      don't forget 'wisdom' after intelligence. So many people stop at intelligence.

      From what I've seen, it's not completely a progression from one to the other. I've met people who I would describe as 'knowledgeable', 'intelligent', or 'wise' without possessing either of the other attributes. Those traits are often coincidental and one can help beget another, but it's far from a hard set 'intelligent'->'knowledgeable'->'wise' progression.
      --
      He's getting rather old, but he's a good mouse.
  4. Not quite by edwebdev · · Score: 4, Funny

    Until cells, molecules, atoms, and subatomic particles start publishing blogs, the scientific method will remain useful.

    1. Re:Not quite by cp.tar · · Score: 2, Funny

      Quite.

      And no matter the amounts of data, no matter the computing power, I don't think pure statistics will ever be able to analyze human language efficiently.

      --
      Ignore this signature. By order.
    2. Re:Not quite by Kamineko · · Score: 2, Funny

      I'm made out of those you insensitive clod.

  5. So... by dunnius · · Score: 5, Insightful

    So everything possible has been researched now and therefore no more research is necessary since it will all be on the internet? Ridiculous!

    1. Re:So... by backwardMechanic · · Score: 2, Funny

      Everything possible was researched, measured, logged. Nobody could think what to do with all that data, so they made an extra universe to store it in. We're living in it.

  6. How bout no by Anonymous Coward · · Score: 5, Insightful

    Um, no. Claims like this demonstrate a lack of understanding of what a model is.

    From the perspective of physics, the universe is just a massive amount of data--more data than any single human can comprehend at once. But thanks to the models of Newton we have a set of relatively simple equations that describe, generally, the way bodies in the universe interact. The model is not perfect, but it is useful.

    Likewise, Google uses a very explicit model to describe the universe of the web: some pages are more relevant to a given search query than others, and these pages will generally be more 'popular' among other important pages. Again, the model is not perfect, but it is useful.

    The fallacy is that somehow what Google is doing is a paradigm shift. It's not. It's just applying the same kind of scientific method to a type of data that hadn't existed before.

    What, I think, the article is really trying to say is that Google's data is so massive and complex that we can't ascribe any explanation to the results it gives us. First of all, that is false, because the PageRank algorithm in its simplest form does give us a very explicit explanation (popular pages generally return better results). But even if it were true, Newton faced the same kind of accusations when people called his model of the universe 'Godless' and claimed, for example, that he described how gravity works without actually explaining "why" it works like it does. And that accusation is always with science. There are always more questions raised than answered. This is nothing new.
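To make the "popular pages" explanation concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is an illustration only, not Google's actual implementation: the toy link graph, the function name `pagerank`, the damping value of 0.85 and the iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up ranked highest, and the page it links to inherits much of that rank. That is an explicit, inspectable model of relevance, which is the point being made above.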

    1. Re:How bout no by vertinox · · Score: 2, Insightful

      But thanks to the models of Newton we have a set of relatively simple equations that describe, generally, the way bodies in the universe interact. The model is not perfect, but it is useful.

      You are aware that the Newtonian Physics model breaks down when you are talking about traveling close to the speed of light?

      Although, most of the time we are dealing with things that aren't traveling so fast, but there are many scenarios in physics that we need a different model for.

      I think what the Googlite is advocating is that for very complex systems (like weather systems, financial, blackholes, LHC etc) which do not go well with our standard models, will need (pause for effect) new models.

      Why? Because there is so much data that it's hard to follow the scientific method: chances are you'll never get the same situation again to repeat in a lab (like weather conditions), because there is an infinite amount of data that could be gathered on these complex systems.

      Take the LHC Computing Grid for example. The amount of data gathered from that experiment may be astronomical, and it could be quite possible that once you get to that scale on the atomic level you can never have exact conditions each time (of course it may be the opposite, but we won't know until they turn the thing on for a run on what happens to matter and energy when you do what they plan on doing).

      I am not saying that everyone should throw out the scientific model, but I agree with the article that a new model needs to be created for complex systems. After all... We still don't have a 100% accurate model of weather prediction other than a few days at a time.

      --
      "I am the king of the Romans, and am superior to rules of grammar!"
      -Sigismund, Holy Roman Emperor (1368-1437)
  7. Don't rule science out it. by russotto · · Score: 5, Insightful

    The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart. Perhaps the best is when he presents, as an example of this new "model-free" approach, a program which includes "simulations of the brain and the nervous system". Uh, hello... a simulation IS a model.

    1. Re:Don't rule science out it. by feed_me_cereal · · Score: 5, Funny

      He didn't bother writing more than one rambling page because he figured someone said it better somewhere else on the internet and that we're all bound to find it.

      --
      "Question with boldness even the existence of a god." - Thomas Jefferson
    2. Re:Don't rule science out it. by ColdWetDog · · Score: 4, Interesting

      The article is utter nonsense. But it's such a rambling mess it's hard to know where to start picking it apart.

      I suppose you could start where he, again, tries to present the argument that correlation really is "good enough" - causation be damned. What he is blattering on about is that you can infer lots of things via statistical analysis - even complex things. That's certainly true. Where he fails (and it's an EPIC fail) is his assertion that this method is a general phenomenon, suitable for every day use.

      The other major failure of TFA is that I can't find a car analogy anywhere.

      --
      Faster! Faster! Faster would be better!
    3. Re:Don't rule science out it. by JustinOpinion · · Score: 5, Insightful

      it's such a rambling mess it's hard to know where to start picking it apart.

      Agreed. I want to do a line-by-line rebuttal... but I fear that would be a waste of time.

      The article does not make a compelling point. It keeps saying that we can give up on models (and science), because now we just have lots of data, and "correlation is enough." What utter BS. Establishing a correlation is not enough. Even if it is predictive for the given trend, it doesn't allow us to generalize to new domains the way a well-established scientific model does. If an engineer is designing a totally new device, that goes above and beyond what any established device has done, what data can he draw upon? If there is no mountain of data, he must rely on the tried-and-true techniques of engineering/science: use our best models, and predict how the new device/system will behave.
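A rough sketch of the "doesn't generalize to new domains" point, under stated assumptions: pretend we only have measurements of a falling object during its first second, fit a straight line to them with no physical model, and then ask about the ten-second mark. The constant-acceleration formula, the sampling grid and the helper names (`fit_line`, `fall_distance`, `predict`) are all chosen purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fall_distance(t, g=9.8):
    """The physical model: distance fallen from rest after t seconds."""
    return 0.5 * g * t * t


# "Observed" data: only the first second of the fall (an assumed data set).
times = [i / 10 for i in range(11)]
distances = [fall_distance(t) for t in times]

# Purely data-driven trend, with no model of why the numbers look this way.
slope, intercept = fit_line(times, distances)


def predict(t):
    return slope * t + intercept


worst_in_sample = max(abs(predict(t) - fall_distance(t)) for t in times)
extrapolation_error = abs(predict(10.0) - fall_distance(10.0))
print("worst error inside the observed range:", worst_in_sample)
print("error when extrapolating to t = 10 s:", extrapolation_error)
```

The fitted trend is off by less than a metre inside the range it was trained on and by hundreds of metres outside it, while the model stays correct in both.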

      The article actually makes this point perfectly clear when it says:

      Venter can tell you almost nothing about the species he found.

      Indeed. Merely having tons of data doesn't actually give you insight into what you have measured. You must distill the data, pull out trends, and construct models. I just don't see how having mountains of data about a species, but still being unable to answer simple questions about it, is superior to conventional science (which can answer questions about the things it has discovered).

      A deluge of data and data-mining techniques is a boon to science. But I don't see the benefit of giving up on the remarkably successful strategy of constructing models to explain the phenomena we've observed. I somehow doubt that having 20 petabytes of data on electron-electron interactions is more useful than having a concise theory of quantum mechanics.
    4. Re:Don't rule science out it. by smallfries · · Score: 2, Insightful

      Once upon a time cars were pretty simple. The most effective way to fix a car that had broken was to find a mechanic. This was a man trained in the models of how cars work. He would sift through the collection of parts (data) in the car until he noticed an anomaly that he would charge you outrageously for.

      Now cars have become so complex that these models are no longer needed. Instead you can just examine the millions of cars that either work or don't work right there on teh interweb. Once you find a correlation between your car and another car you can then fix the difference without knowing anything about models of "how cars work"!

      Err, maybe that analogy was a little too accurate as it has made his argument sound shit?

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    5. Re:Don't rule science out it. by ColdWetDog · · Score: 2, Informative
      That's a disturbingly accurate analogy for what TFA is trying to say (and thanks for playing everyone). The problem is that isn't "science" - it's at best a hack engineer job. It might work to get the car running, but you aren't going to be making much progress concerning "cars" in general.

      Man, I'm feeling old today. Whatever happened to "first principles"? And my slide rule.

      --
      Faster! Faster! Faster would be better!
  8. My Start menu has been Googled by spyrochaete · · Score: 4, Insightful

    I am definitely a victim of this "Google effect". Search makes me lazy.

    For example, for years I would pride myself on my well-tended Windows Start menu. I'd create base categories for my application folders like Hardware, Games, and Internet, and move applications into those folders to keep my Start menu manageable. I blogged about this procedure and included a screenshot.

    Now that I'm using Vista I have little need to be so organized. I rarely have to navigate manually to an application folder thanks to the embedded search box on the Start menu. So now my Start menu is a huge clutter, but so what? I see that exercise as being as futile as dusting the cardboard boxes in the attic.

    1. Re:My Start menu has been Googled by Hatta · · Score: 2, Insightful

      Now that I'm using Vista I have little need to be so organized. I rarely have to navigate manually to an application folder thanks to the embedded search box on the Start menu.

      If you're going to take your hands off the mouse to run an app, why not just pop open a console and start it from there? I have no use for any sort of start menu, I have a console. It's certainly more flexible than a search bar, you can pass arguments or file names(with wild cards even) to the application.

      --
      Give me Classic Slashdot or give me death!
    2. Re:My Start menu has been Googled by maxume · · Score: 2, Interesting

      There are third party apps to add similar functionality to XP. Launchy is the one I use:

      http://www.launchy.net/#download

      I think they are all clones of some Mac app though.

      --
      Nerd rage is the funniest rage.
  9. What question do you ask the data. by xzvf · · Score: 4, Insightful

    Searching data is a tool. You still need to have insight to formulate a theory, develop a test for the theory, and ask the data pool the right (non-leading) question. Then evaluate the data looking for both proof and disproof of the theory and be smart and ego neutral enough to let the data suggest a new theory, test and question. Don't confuse a new and useful tool that makes insight easier, with the ability of humans to have that insight.

    1. Re:What question do you ask the data. by Daniel+Dvorkin · · Score: 3, Insightful

      Exactly. The "deluge of data" is a useful tool, no doubt about it. But Google doesn't make the job of collecting and analyzing data irrelevant any more than the advent of the telescope made the skills and knowledge of astronomers obsolete.

      I particularly love this line from TFA:

      For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising -- it just assumed that better data, with better analytical tools, would win the day. And Google was right.

      (Applied) science at its best! "The culture and conventions of advertising" are basically folk wisdom, and folk wisdom is often right but more often wrong. Google took a scientific, unbiased view of how to move bits around and make money with them: start with as few preconceptions as possible, analyze the data, see what happens.

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  10. quality still as important as quantity by peter303 · · Score: 2, Interesting

    There are still several computing problems from earlier, smaller eras that haven't been solved by the "more" paradigm. One example is realistic synthetic voice. The bandwidth is megabytes, achieved by mp3 players some years ago. However voice is the last part of the "real world" we have to capture instead of synthesize to implement computer-generated feature movies or video games. This keeps the need for having some "flesh" actors around, at least for a few more years :-)

    Then there was Slashdot's retrospective of Artificial Intelligence a few days ago. Many of the interesting advances were made in the kilobyte and megabyte eras. It seems the gigabyte and terabyte eras have barely made a dent in progress.

  11. Google =/= scientific method by Rubikon · · Score: 5, Informative

    That an incredible amount of data exists on any given topic does nothing to describe relationships, causality, precision, accuracy, distribution, correlation, or anything else. Data is information, and information must be processed in order to make it meaningful. Additionally, everything that's written, printed, published, etc, is not necessarily true, accurate, precise, etc.

    If anything, the Google phenomenon demands more rigorous examination by accepted methods.

    The preceding message has been brought to you by Captain Obvious and the letters O,R,L,Y.

  12. No. by qw0ntum · · Score: 4, Insightful

    First, not everyone has access to vast clouds of information due to expense and I don't think that's going away any time soon. So we'll still get to understand what's going on around us and not just rely on regression analysis to inform our every decision.

    Second, in my experience with large sets of data, you can do all kinds of math to them to bring out interesting relationships but someone with domain expertise is going to have a much better insight into what the data is saying than someone who doesn't. It seems the peak of hubris to think that the techniques taught in every science (social, hard, or otherwise) are worth nothing compared to massive amounts of data. How do you know where to get the data from? How do you apply the data?

    I don't think it's quite time to throw out "correlation != causation". In fact, I think now more than ever we need to be able to understand underlying phenomena behind the data precisely because there is so much of it. With so much data, coincidental correlation is going to happen quite often I'm sure.
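A small sketch of why coincidental correlation becomes more likely as the pile of data grows. Every series below is pure random noise, so any correlation found is chance by construction. The series counts, the series length, the seed and the helper names (`pearson`, `strongest_chance_correlation`) are assumptions picked for the example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(num_series, length, seed=0):
    """Largest |correlation| found among unrelated random series."""
    rng = random.Random(seed)
    series = [[rng.gauss(0.0, 1.0) for _ in range(length)]
              for _ in range(num_series)]
    return max(abs(pearson(a, b))
               for a, b in itertools.combinations(series, 2))


# 5 unrelated variables give 10 pairs to compare; 200 give 19,900 pairs.
few = strongest_chance_correlation(num_series=5, length=20)
many = strongest_chance_correlation(num_series=200, length=20)
print("strongest correlation among 5 noise series:  ", round(few, 2))
print("strongest correlation among 200 noise series:", round(many, 2))
```

Nothing here is related to anything else, yet searching enough pairs turns up a correlation that looks strong. Without some understanding of the underlying phenomena there is no way to tell that kind of result from a real one.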

    And, of course, the ultimate reason we need to understand things is for, you know, when the cloud's not there.

    --
    'Every story, if continued long enough, ends in death.' --Ernest Hemingway
  13. Wrong by DogDude · · Score: 5, Insightful

    This is typical web 2.0 hype... more is better. Which, as anybody who has used Wikipedia knows, is utter bullshit. The scientific method can't be supplanted by a large amount of questionable data. Tons and tons of bad data is still bad data. It doesn't get any more correct just because there's more of it.

    --
    I don't respond to AC's.
  14. Interesting, ranty, and wrong by xPsi · · Score: 5, Insightful

    A thought-provoking piece written by someone who neither understands the scientific method nor Google. Who doesn't understand the difference between a Theory and a model. Who still doesn't get correlation!=causation. Who probably has never had to actually analyze any substantial amount of data before. And who has clearly been raised on a self-important intellectual diet consisting of too much Buckminster Fuller, Kurzweil, Frank Tipler, and Derrida. I'm sure there are some kernels of insight buried in there someplace, but I'm just not clear what they are. If his rant is indicative of the future direction of science, we're all doomed.

    --
    i\hbar\dot{\psi}=\hat{H}\psi
    1. Re:Interesting, ranty, and wrong by poot_rootbeer · · Score: 3, Insightful

      A thought-provoking piece written by someone who neither understands the scientific method nor Google. Who doesn't understand the difference between a Theory and a model. Who still doesn't get correlation!=causation. Who probably has never had to actually analyze any substantial amount of data before. And who has clearly been raised on a self-important intellectual diet consisting of too much Buckminster Fuller, Kurzweil, Frank Tipler, and Derrida.

      And he works at Wired magazine? You don't say.

  15. Biggest Data Collector LHC relies on Models by markk · · Score: 4, Insightful

    I thought this was a joke at first. One thing to think about is that the biggest data collector of them all, the Large Hadron Collider, which fits the frame given perfectly - delivering terabytes of data in huge data sets - is just the opposite of the described scenario. Models are crucial to picking what data is actually recorded. In fact, a large part of how good the LHC data will be lies in using models to select what events to capture. The way the data is captured is of course also based on long effort and knowledge from previous detectors. This isn't just randomly, or even generically selectively, gathering data and then analyzing it. This is targeted data gathering based on complex scientific theories. There have been shouting matches over what to tag for collection based on what people think is important for a given theory - and these will happen again.

    As our collection abilities rise exponentially, the storage and analysis abilities are not growing exponentially, even though they are increasing at a fast rate! I would argue exactly the opposite of what this article said. We are going to be more and more dependent on our current scientific theories to even be able to choose appropriately among the rich data that new sensors and techniques will let us collect. That is, we are more and more dependent on our scientific theories when we get data, not less. Did we even know to get methylation data when sequencing a genome? How about some other "ylation"? Without background theory and experience we wouldn't even know some of that stuff was there to collect!

  16. WTF, be serious by mlwmohawk · · Score: 2, Insightful

    This is nonsense pure and simple.

    One needs to acquire facts. Now these "facts" can come from your own research or, in the age of the internet, someone else's data, but they still need to be collected and verified.

    The *only* advantage that google provides is a more efficient way of sharing and finding facts. Not even all facts, those that are popular and topical are what you'll most likely find.

    Historical information, from when newspapers only used dead trees, can be very difficult to find on the internet unless someone else did the research first.

  17. Just to clarify by GameboyRMH · · Score: 5, Insightful

    To avoid the same fate as the GP, let me clarify that by WTFey I specifically meant that the article was full of fluff, light on details and generally pointless...which makes me think "WTF." The closest thing to a point I could get from the article was "Nice big blobs of data can be useful, and statistical data based on said blobs could replace the results of scientific research." Mmmkay.

    A sensational headline leading to a rather pointless article consisting mostly of fluff: WTF.

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
    1. Re:Just to clarify by javilon · · Score: 4, Interesting

      Well, I think the point they make is that with these kinds of mathematical tools running against these huge sets of data, you get models out that you couldn't have thought of yourself. This is real AI. During the last few days we had entries here on Slashdot about how AI is not advancing, but this kind of thing is very advanced AI and it is new.

      I'll explain myself. The biggest job that a brain does (let's not consider a human brain, so we don't get into the consciousness/mind type of conversation) is to find statistical correlations in the input data and extract models from these correlations that can be used to predict the future. This is exactly what these tools are doing.

      Before these tools, by looking at the data you would go: mmmm, this is interesting, let's check it out. That is, you would come up with a model and try to find out if it predicts the data. Then we started to use computers to check our models, and from what this WTFey article says, it is now the computer that comes up with the model, starting from raw data.

      --
      When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
    2. Re:Just to clarify by Kamineko · · Score: 2, Funny

      "What The Fluff"?

    3. Re:Just to clarify by inkyblue2 · · Score: 3, Insightful

      the difference is that brains create new theories and models to describe data, whereas this article specifically talks about avoiding the need to make new theories to describe data. we still have no AI that can create theories and models and semantics on its own. i agree that when that happens, we'll have something exciting and new, but it hasn't happened yet.

    4. Re:Just to clarify by ardle · · Score: 2, Funny

      I fear to think what theory an AI would come up with based on all the "information" that's on the Internet

    5. Re:Just to clarify by hedwards · · Score: 3, Insightful

      Quite so, the article was dead wrong.

      Having that much data allows for science that wouldn't have happened otherwise, but it doesn't allow us to forget about sound scientific principles. I for one don't want to die because the pharmaceutical company and my doctor thought that a correlation with safety was enough, without doing the experiments to verify. I could die either way, but correlation just isn't enough in many cases. Statistics don't prove or disprove anything, ultimately science is about understanding things the way that they are. Statistics can't do that.

      If you can collect and store 100 pieces of information about a test subject for 200,000 test subjects at 150 points in time, you can do a huge amount with that. But, the data still needs to be interpreted, verified and placed into a verifiable model.

      It doesn't really surprise me that Google would be handling search the way that they do, considering how borderline impossible it is to search for certain things unless you already know what you want. Searching for answers to software bugs ought to be straightforward, but Google seems completely incapable of sanely coping with version numbers without a lot of work.

    6. Re:Just to clarify by theshowmecanuck · · Score: 2, Insightful

      Google no longer returns any useful data anyway. Search for anything and all it will turn up are thousands of web sites trying to sell you something that might be related to the search query you typed in. I think that is why Wikipedia is so popular. At least there you get some information on a topic you search for... and it doesn't contain the words 'best price on the net' etc. either. I have just about given up searching on Google or Yahoo, or any of the big search engines, since they usually don't return anything useful anyway. Except when I need to buy something. If you don't personally know a specific web site that has info on the subject you are researching, you are screwed as far as getting anything useful from Google.

      --
      -- I ignore anonymous replies to my comments and postings.
    7. Re:Just to clarify by rugatero · · Score: 2, Funny

      The second is the Google way: analyse every piece of text in the world,

      That should be analyase every piece of text on the Internet, which hampers the next step somewhat:

      assume that the majority knows how to spell correctly
      --
      This comment is for entertainment purposes only. Any similarity to real insight or information is purely coincidental.
    8. Re:Just to clarify by TuringTest · · Score: 2, Informative

      I suggest you include the search terms -buy and -price. That works wonders in getting Google to show you the actually relevant pieces of information.

      --
      Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
  18. The Paradigm is the Data Subset by fictionpuss · · Score: 5, Insightful

    The paradigm is embedded in the quantity, or subset, of data you choose to analyse.

    For example, to detect stress you might traditionally measure heartbeat, skin conductivity, pupil dilation.

    In the "petabyte age" you throw in the number of times the subject uses the letter 's'; how frequently they use the 'reload' button on the browser; what colour of pants they wore last tuesday; Pepsi vs. coca cola; the number of times they picked their nose in 1997 and any and every other bit of data you have on the subject.

    In the "petabyte age", most of the data you sift through will show no correlation, but you have a much better chance of finding the unexpected if indeed, there is some unknown factor out there.

    1. Re:The Paradigm is the Data Subset by kurthr · · Score: 5, Insightful

      Don't you run a much higher probability of finding high correlation by chance?

      I can expect to find a result that matches my model to 95% certainty about 5% of the time in random data. You can correct for this, but it's against human nature because people like to see the face of Mary in toast.

      Learning how to look for correlation in huge uncontrolled data sets will require a new paradigm... or it will ultimately be useless and even perhaps, unsuccessful.
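
      The "95% certainty about 5% of the time" point can be checked with a small simulation. This is only a sketch under stated assumptions: the data are pure Gaussian noise, significance is judged with the Fisher z-transform at a 1.96 cutoff, and every name below (`pearson`, `looks_significant`, `false_positive_rate`, `chance_of_any_false_hit`) is made up for illustration rather than taken from the article or the thread.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def looks_significant(r, n, z_crit=1.96):
    """Two-sided test at roughly the 95% level via the Fisher z-transform."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def false_positive_rate(n_samples=100, n_variables=1000, seed=0):
    """Fraction of pure-noise variables that 'correlate' with a noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_samples)]
        if looks_significant(pearson(target, noise), n_samples):
            hits += 1
    return hits / n_variables


def chance_of_any_false_hit(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests passes."""
    return 1 - (1 - alpha) ** n_tests


rate = false_positive_rate()
print(f"share of noise variables flagged as significant: {rate:.3f}")
print(f"chance of at least one false hit in 100 tests: {chance_of_any_false_hit(100):.3f}")
```

      The flagged share should hover around the 5% the comment mentions, and the second figure shows why screening many variables without a correction almost guarantees a Mary-in-toast result.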

    2. Re:The Paradigm is the Data Subset by hal9000(jr) · · Score: 2, Insightful

      The paradigm is embedded in the quantity, or subset, of data you choose to analyse.

      In addition, once you start to analyze something, you have already built the "model" ipso facto. I can't imagine how you could set out to analyze something without a model.

      The example Anderson uses in fact shows this. Venter had to have a model of an ecosystem within which he posits the existence of organisms. Through testing (statistical analysis), he finds them. Thus 1) ecosystems house organisms and 2) there are organisms we don't yet know about.

      Seems like the scientific method to me.
    3. Re:The Paradigm is the Data Subset by fictionpuss · · Score: 3, Interesting

      Learning how to look for correlation in huge uncontrolled data sets will require a new paradigm... or it will ultimately be useless and even perhaps, unsuccessful.

      The ability to find statistically significant correlation (i.e. not Mary-in-Toast) in huge datasets is a prerequisite condition.

      But that goes for any visualisation technique - look to Edward Tufte or Stephen Few for detailed examples of how even the simple xy-graph can be abused.

    4. Re:The Paradigm is the Data Subset by edcheevy · · Score: 3, Insightful

      Yes. The more data you collect, the more likely any two things will be correlated slightly. With millions or billions of data points, you would be shocked to find a variable that does NOT correlate significantly with everything else. That's why "correlation" or "significance" alone becomes less useful and we need to a) report effect size measures to get a better sense of how important the correlation actually is and b) continue to use our heads (and not always give blind trust to the cloud) to determine which correlations are useful and which ones are fluff.

      A correlation that helps place internet ads .0000002% more efficiently might matter to Google but likely doesn't further human understanding or refine our thinking in any practically appreciable way. And because EVERYTHING is correlated at that point, I suppose there are an infinite number of variables we could use to refine our model. I think the only paradigm shift here is that it would take an army of AIs to sift through and bring some meaning to all that noise, and an army of AIs would probably be doing other things with their time. ;p
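
      The gap between "significant" and "important" can be put in numbers. The sketch below assumes the usual Fisher z approximation for testing a correlation against zero and uses r squared as the effect size measure; the function names and the sample value r = 0.01 are illustrative choices, not figures from the article.

```python
import math


def fisher_z(r, n):
    """Approximate z-score for testing a correlation r from n samples against zero."""
    return math.atanh(r) * math.sqrt(n - 3)


def variance_explained(r):
    """r squared: the share of variance in one variable accounted for by the other."""
    return r * r


tiny_r = 0.01
for n in (100, 10_000, 1_000_000):
    z = fisher_z(tiny_r, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n={n:>9,}  z={z:6.2f}  {verdict}  r^2={variance_explained(tiny_r):.4%}")
```

      The same tiny correlation fails the test at a hundred or ten thousand samples and passes it easily at a million, while the variance it explains never changes.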

  19. No. Science Scales. by mbone · · Score: 3, Informative

    Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?

    No. And also no to the basic premise of the article.

    Meteorologists have been doing this for decades (principal component analysis has been a crucial tool there since the 1960's, and correlation analysis has been used in some form since the 1920's if not earlier) and so have the astronomers. Oh, and the particle physicists have been sifting data in their own way on a big scale ever since World War II.

    As one of many examples, if you ever have heard of an "El Nino event," that was discovered through correlation analysis and is best understood through principal component analysis. BTW, the original work predates electronic computers and was all done by hand. The vast quantities of meteorological data require statistical analysis to make any progress at all, but that certainly does not mean that you cannot use the scientific method.

    So, no, this does not invalidate the scientific method. In the Internet jargon, science scales.

  20. Consequence of the Post-Modern Age by phobos13013 · · Score: 2, Informative

    Anyone who has read any work by Lyotard, Baudrillard, or Derrida has seen this interpretation of reality coming for years. This is basically the consequence of the Post-modernist/Post-structuralist mentality.

    In a sense, what the article is proposing is the "simulation" of reality in a computer system based on the available "data". This simulation, as I will argue in a moment, is merely a flawed model, since the data being related must in some sense be based on an algorithm which inherently MIMICS reality and is not a substitute for it (no matter how "accurate" the agreement). But nonetheless, the result of this, as Baudrillard observed, is not a simulation but a simulacrum of reality, which eventually will take the place of reality. The implication is that reality is not created or manufactured by the interaction of people in a "real" sense but is actually led by the operation of the simulacrum!

    Nonetheless, the fact is there is no possible way to store ALL the data of the entire world (since some data is not recordable by a binary machine, and no, a "quantum" computer is not a solution that makes it so); however, this fact does not mean we cannot be misled by the simulacrum and led into a future where human interaction is what I would call inhuman, but what some who have (in some cases unknowingly) fallen for the post-modern myth would call merely an evolutionary result of human interaction.

    In the future, the storage of data, the usage of data, and the power of data will have a huge impact on our humanity, as the past twenty years should already be evidence of. I am not an apocalyptic fear-monger, but the proof is in the pudding. For further reading, I recommend a highly prescient book written in 1990 by a Mr. Mark Poster called The Mode of Information, which talks about some of these implications, which are in the process of becoming as we speak.

    --
    ...and it should be known by now
  21. When people say shit like this... by iluvcapra · · Score: 3, Funny

    Have we reached a time where all of our tool-sets are now made moot by vast clouds of information and strictly applied maths?

    It means there's about to be an explosion in models and theoretical sciences. Always beware the End of History ;)

    --
    Don't blame me, I voted for Baltar.
  22. Quite... by denzacar · · Score: 3, Informative

    I'm sure there are some kernels of insight buried in there someplace, but I'm just not clear what they are.

    My thoughts exactly.
    And since most slashdot readers don't RTFA most comments here have proven useless in trying to figure what those kernels you mention are.
    But this guy, who has read TFA (and commented on it on the Wired's site) seems to have found them.

    Posted by: technophile
    20 hours ago · 1 Point

    I think what you have hit on here is the difference between analytical and empirical solutions. Analytical relationships are usually first determined from empirical ones. Once you have the empirical relationships you can determine the missing factors or constants.

    (See also http://en.wikipedia.org/wiki/Empirical_method )

    They are both necessary and a part of the scientific process. You collect data, generate empirical equations, then try and derive or otherwise model the empirical relationship with an analytical one. Empirical relationships are limited because they are somewhat system dependent. For instance an empirical relationship for the ideal gas law could be generated using methane. This might be accurate for methane, but limited in its use for a gas that deviates from the ideal behavior (i.e. hydrogen fluoride gas). You could generate an empirical relationship for every single molecule in the universe but that would be impractical, which is why analytical relationships can often be more useful. Hopefully the "Petabyte Age" will allow the scientific method to flourish, not replace it.

    edit: Rethinking my reply, what the article seems to say is that the Petabyte Age will make determining empirical relationships for everything practical. The scientist who generates loads of empirical relationships and never questions the underlying theory is not a scientist at all, just an observer of scientific processes. I suppose it depends on your goal as to whether this will suit you or not.

    --
    Mit der Dummheit kämpfen Götter selbst vergebens
  23. It depends by geekoid · · Score: 2, Funny

    Fighter classes generally stop at Con, whereas casters generally go for Int or Wis. No one cares about Cha.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    1. Re:It depends by camperdave · · Score: 3, Funny

      I don't know. They say you can get more with an 18 charisma and a sword than you can with a sword alone.

      --
      When our name is on the back of your car, we're behind you all the way!
    2. Re:It depends by melikamp · · Score: 2, Funny

      Never rolled a bard? Please turn in your geek card on your way out.

    3. Re:It depends by Wandering+Wombat · · Score: 3, Funny

      I failed my save and coughed coffee out my nose...

      --
      I like to place meaningful quotes in my sig, so people will know that I know what meaningful quotes are.
    4. Re:It depends by Hijacked+Public · · Score: 3, Funny

      I've rickrolled a guy who rolled a bard, does my card get stamped for that?

      --
      "Sacrifice for the good of The State" - The State
  24. Re:The problem with this newly coined 'age'. by Vectronic · · Score: 2, Funny

    Petaphile

    1) Someone who loves their pets more than human beings or, at the extreme, someone willing to kill a human to save a lower animal's life.

    2) Somebody who has sex with animals because they cannot attract any humans, or they are attracted to animals

    (and the best one)

    3) someone so caught up in his own egomaniacal conception of the world that he is compelled to spew vomit and blood on a stranger's clothes to show his contempt for anybody's thought but his own.

    Which sounds kinda like the summary for the article, as well as some of the article.

  25. Foundation Series by __aagmrb7289 · · Score: 2, Interesting

    Just finished rereading the Foundation series for the one millionth time. Anyone remember some of the signs of the decay of the first empire? The idea that these "scientists" were no longer experimenting, no longer looking for new ways to do things - just spending their time looking at old books and old experiments and trying to squeeze a "new" thought or two out of them? That a sociologist would study a society through books written about it? An archeologist would explore the ruins of a world by reading descriptions written by someone centuries before?

    Anyone catching the parallels here? The "search engine" is a great tool for gathering existing data - but our current tools help us:

    1. Analyze that data
    2. Gather more data

    Can you honestly say that those aren't important anymore? The summary seems pretty crazy to me.

  26. Wired. by E-Sabbath · · Score: 2, Funny

    You know, this may be the most pure Wired article I've read in a long time. Reminds me of the magazine's layout when it first came out. Complete bull, unreadable, unstructured, but slick.

  27. I've never heard something so ridiculous by SageinaRage · · Score: 3, Insightful

    Google used reams of data to get good at advertising and marketing, so Wired is using this ability to predict the end of SCIENCE?

    Do they not realize the difference between these things? Advertising is extremely hand wavy and vague in the best of circumstances - I would argue that Google's offerings aren't really better than any other method, they're just cheaper for advertisers, and have a much larger base than normal.

    I'm honestly astounded at this.

  28. "Paradigm agnostic" by gatkinso · · Score: 2, Insightful

    An unknowable paradigm? Interesting.

    --
    I am very small, utmostly microscopic.
  29. Two words: selection bias by LrdDimwit · · Score: 2, Interesting

    You can't get good data unless you control the makeup of your data population. Even if you applied this technique to all the data in the cloud, it wouldn't mean the "end of the scientific method", it would be scientifically studying the cloud.

    So no. Even if everything he wrote is all true, you still apply science to study things, just in a different way. The internet doesn't make science obsolete any more than it made economics obsolete, and saying otherwise is as much hubris now as it was then.

  30. Predictive power? by Gryphia · · Score: 2, Insightful

    It seems rather stupid to me. Sure, we can correlate a whole bunch of data. And we can collect a whole bunch of data. But that's not going to give us the predictive power that scientific models give us.

    Take for example, the orbit of the earth around the sun. Suppose we collected a whole bunch of data on the orbit of the earth around the sun. Sure, we'd be able to predict what the orbit is going to be, based on past data. But it gives us no other insight. Whereas, when we use the theory of gravity (and rotational motion and conservation of angular momentum etc . . .) to predict that the earth orbits the sun, and how it does so, that gives us insight.

    Because we can now turn to, say, Jupiter and the sun. Even if there is no data collected on how Jupiter orbits the sun, we can use the predictive power of our theories, that we have tested on the earth-sun system, to say how Jupiter is going to orbit.

    That's a simple example, but you can imagine much more complicated situations. If we simply have correlation, we may be able to say that X is going to do Y based on previous behavior, but if I ask you how something new and unexpected is going to behave, we can get no answer until we take data . . . because we don't know *why* anything happens. And that's why we're never going to replace theories with statistical analysis of data.

    There's a place for both. Obviously, just statistics can be very successful (google, for example), but, at least in science, it's not sufficient.
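
    The Earth-to-Jupiter step above can be written out in a few lines. As a sketch only: it swaps the full theory of gravity for Kepler's third law, which follows from Newtonian gravity when the planet's mass is negligible next to the Sun's, and it assumes a semi-major axis of about 5.2 AU for Jupiter. The function name is made up for illustration.

```python
def orbital_period_years(semi_major_axis_au):
    """Kepler's third law for bodies orbiting the Sun: T^2 = a^3 (T in years, a in AU)."""
    return semi_major_axis_au ** 1.5


# Calibration: Earth orbits at 1 AU and the law returns exactly 1 year.
earth_period = orbital_period_years(1.0)

# Prediction: Jupiter's semi-major axis is about 5.2 AU.
jupiter_period = orbital_period_years(5.2)
print(f"Earth: {earth_period:.2f} yr, Jupiter: {jupiter_period:.2f} yr")
```

    No Jupiter timing data goes into the prediction, yet it lands close to the observed period of roughly 11.86 years. A table of past Earth positions alone could not have produced that number.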

  31. Re:Just what we need... by Eli+Gottlieb · · Score: 3, Funny

    TODAY: Feeling up.

  32. Re:Say what now? by DamnStupidElf · · Score: 2, Interesting

    There are two reasons you're wrong. One is entropy, and the other is one way functions.

    Entropy forces causality to appear in physical systems. A boiled egg is highly correlated with a heated raw egg, but I challenge you to explain away the causation from one state to the other.

    One way functions are quite similar, and probably a result of the same physical properties of matter. When a key is used to encrypt data, there is a high correlation between the original data, the key, and the encrypted data, but causation clearly flows from encrypting data with the key to the encrypted data state, and not from the encrypted state to a derived key and the original data. It's just a limitation of human (and our machines') abilities, but it nevertheless presents very strong evidence for the practical existence of causation.
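
    The asymmetry described here is easy to see in miniature. The sketch below uses SHA-256 from Python's `hashlib` as a stand-in for the keyed encryption in the comment (an assumption made for brevity; a hash is not a cipher, and one-way functions are conjectured rather than proven to exist). The helper names, the three-letter alphabet, and the secret are hypothetical.

```python
import hashlib
import itertools


def forward(message):
    """The cheap direction: one SHA-256 call."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def brute_force_invert(digest, alphabet, length):
    """The expensive direction: try every candidate until one hashes to digest."""
    attempts = 0
    for candidate in itertools.product(alphabet, repeat=length):
        attempts += 1
        guess = "".join(candidate)
        if forward(guess) == digest:
            return guess, attempts
    return None, attempts


secret = "cab"
digest = forward(secret)
recovered, attempts = brute_force_invert(digest, "abc", len(secret))
print(recovered, attempts, "of", 3 ** len(secret), "possible inputs")
```

    Going forward takes one call. Going backward means searching a space that grows as alphabet size to the power of the length, which is only feasible here because the toy space has 27 entries.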

  33. Re:Data is not paradigm agnostic. by spun · · Score: 3, Funny

    One must already have a concept about what is measurable, what to measure, and how to measure it before data can be collected.

    I think the point the article was trying to make is that this idea is now wrong. There are petabytes of already collected data out there. You don't need to have any idea what is measurable, what to measure or how to measure it. You just throw statistical tools at those petabytes of raw data. You don't even need a model. Then magically we find out that vegetarians who wear blue pants on Tuesday and were born in November are more likely to get cancer and should get checked regularly, or something. We don't even need to know why, it's not important. Or at least I think that's what the article was trying to say.
    --
    - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
  34. Modeling is the core of science by Hoplite3 · · Score: 3, Interesting

    I must admit, as an applied mathematician who makes models of physical things for a living, this sort of research threatens to steal my bread and butter. It may be self-centered, but I think modeling is, beside experiment, half of science.

    Simplified models are so valuable to our understanding because they tell us what information we can remove, which parts of a problem are important and which parts may be ignored. They allow us to not just make predictions, but they guide future experimentalists as to what sorts of changes will impact the system and which won't.

    To be fair, it's more of a cycle: experiments generate data, models are constructed to explain the data. These models make predictions (and hopefully useful simplifications) that can be checked by further experiments to validate them. At the end of the process, we've produced a clearer picture of how a system works. Enough information maybe for someone building something slightly different to not have to test the aspects covered by the model.

    I view these data-mining techniques like the scientific computing techniques of the last 30 years or so, only the inverse. Sci Comp nerds wanted to do away with experiments. They thought they could numerically simulate (relatively) exact models (like Navier-Stokes for fluid motion rather than one of its more tractable, understandable simplifications) and use the generated data instead of experimental data. The trouble was that no one will believe that the crazy new phenomenon discovered by your program is real until they see it in the lab, until they construct a simplified model that has the same behavior -- i.e. the same science as before.

    The new data-mining idea is the same, but for the modeling end of things. "No models, please," they say. They'll just data-mine the experimental results and "discover" whatever the model missed. Except people will want to do experiments to verify the discovery. They'll want to build models so they can know they're doing the right experiments, and so on.

    At the end, I think Sci Comp and data-mining are fantastic new tools that have a lot to offer science, but I don't think either eliminates the need for old fashioned modeling.

    --
    Use the Firehose to mod down Second Life stories!
  35. Comment removed by account_deleted · · Score: 2, Insightful

    Comment removed based on user account deletion

  36. Comment removed by account_deleted · · Score: 2, Insightful

    Comment removed based on user account deletion

  37. model selection by inkyblue2 · · Score: 2, Insightful

    i'd never heard the term "model selection," so thanks for pointing that out. it looks like there really is some good literature to read on the subject.

    the process described by the model selection sites i skimmed still doesn't address what i was getting at, though. "choosing a model from a set of potential models" is only conceivable when your set of potential models (and set of variables to potentially be modeled) is well bounded.

    to put it another way, take the smartest model choosing algorithm you can find, hand it a pile of data, and say "what do you make of that, smart guy?" i'm willing to bet that the answer is going to be along the lines of "wtf?" unless there is some sort of context or metadata provided along with the data to give the algorithm a hint of what it's looking for. am i looking for covariance between scalar values among regularly organized groups? am i looking for white rabbits in the image data from a camera? is this ascii or ebcdic or 8-bit PCM data? you can argue that these questions are trivial, that no algorithm can be *that* general, but that is precisely my point: all known algorithms require significant narrowing down of the problem space by human hands before they can begin to produce useful output.

    if you had an algorithm that took *truly* semantics-free data in one end and spit models of regularly occurring features out the other end, you'd be halfway to general AI.

  38. Comment removed by account_deleted · · Score: 3, Interesting

    Comment removed based on user account deletion