Slashdot Mirror


disCERNing Data Analysis

technodummy writes: "Wired is reporting how CERN is driving the Linux-based, EU-funded DataGRID project. And no, they say, it's nothing like Seti@Home. The description on the site of the project is: 'The objective is to enable next generation scientific exploration which requires intensive computation and analysis of shared large-scale databases, from hundreds of TeraBytes to PetaBytes, across widely distributed scientific communities.'" If you're interested in this, check out the Fermi Lab work with LinuxNetworkX data as well as the all-powerful Google search on the Fermi Collider Linux project. As jamie points out, "Colliders produce *amazing* amounts of data in *amazingly* short time periods... on the order of 'here's a gigabyte, you have 10 milliseconds to pull whatever's valuable out of it before the next gigabyte arrives'."

4 of 82 comments

  1. distributed computing by sam@caveman.org · Score: 5, Interesting

    here's a gigabyte, you have 10 milliseconds to pull whatever's valuable out of it before the next gigabyte arrives.

    let's see. 1 GB in 10 ms works out to 100 GB per second. how recently did gigabit ethernet come about? and what would the average bandwidth of users be? i would guess much less, but let us assume 100KB per second.

    so you have 107374182400 bytes of data per second. your users can take 102400 bytes per second each. even if everyone was connected directly to your network (no delays or bottlenecks... ha!) you would still require 1048576 users (that is over 1 million).

    and this is not taking into account sending any data BACK to the source, or the actual computation time on the users' machines.

    -sam

    --
    burn the computers. go back to the abacus.
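sam's back-of-envelope numbers can be reproduced in a few lines. This is a minimal Python sketch, assuming binary units (1 GB = 2**30 bytes, 1 KB = 2**10 bytes), which is what the quoted figures 107374182400 and 102400 imply; the function names data_rate and users_needed are illustrative only. It models the same idealised case as the comment: perfect links, no protocol overhead and no traffic flowing back to the source.

```python
# Back-of-envelope check of the data-rate argument above.
# Assumption: binary units (1 GB = 2**30 bytes, 1 KB = 2**10 bytes), matching
# the figures 107374182400 and 102400 used in the comment.
GB = 2 ** 30
KB = 2 ** 10


def data_rate(burst_bytes: int, burst_ms: int) -> int:
    """Bytes per second if burst_bytes arrive every burst_ms milliseconds."""
    # Integer arithmetic avoids floating-point rounding (0.01 s is not exact in binary).
    return burst_bytes * 1000 // burst_ms


def users_needed(rate_bps: int, per_user_bps: int) -> int:
    """Minimum number of ideal links (no latency, no overhead, no return
    traffic) whose combined bandwidth covers rate_bps."""
    return -(-rate_bps // per_user_bps)  # ceiling division for positive ints


if __name__ == "__main__":
    rate = data_rate(1 * GB, 10)
    print(rate)                          # 107374182400
    print(rate // GB)                    # 100 (GB per second)
    print(users_needed(rate, 100 * KB))  # 1048576
```

Varying the per-user bandwidth shows how sensitive the estimate is: a tenfold faster link per user cuts the required number of users roughly tenfold, but it stays far beyond what a volunteer scheme in the style of Seti@Home could absorb in real time.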
    1. Re:distributed computing by fiziko · Score: 5, Informative

      The data figure stated above refers to the actual data collection stage, not the analysis stage, so it's not being transmitted via ethernet. The project I'm working on (ATLAS, which should be running on the LHC when it gets built in the next few years) has actually found that magnetic media cannot keep up with the data rate, so they had to find another means of storing the data while they were sorting it between particle bursts. They decided on a switched capacitor array, since that can keep up. The data actually goes through (IIRC) three stages of analysis before it's finally approved and recorded indefinitely. This filtered data is what will be transmitted via the Grid.

      --
      - W. Blaine Dowler
      http://www.bureau42.com
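fiziko's description of data passing through (IIRC) three stages of analysis before being recorded is, in general terms, a filtering cascade. Below is a minimal Python sketch of that idea; the Event fields, the cut values and the name run_cascade are hypothetical placeholders, not ATLAS's actual trigger criteria or software, and the buffering hardware (the switched capacitor array) is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Event:
    """Hypothetical per-event summary; real detectors record far more."""
    event_id: int
    total_energy_gev: float
    n_tracks: int
    quality: float  # hypothetical reconstruction-quality score in [0, 1]


# A stage is a yes/no decision on one event.
Stage = Callable[[Event], bool]


def run_cascade(events: Iterable[Event], stages: List[Stage]) -> Tuple[List[Event], List[int]]:
    """Apply stages in order; only events passing a stage reach the next one.
    Returns the surviving events and how many remain after each stage."""
    survivors = list(events)
    counts: List[int] = []
    for stage in stages:
        survivors = [e for e in survivors if stage(e)]
        counts.append(len(survivors))
    return survivors, counts


# Hypothetical cuts. In a real system the cheapest test would come first,
# so that costlier later stages only see events that survived it.
STAGES: List[Stage] = [
    lambda e: e.total_energy_gev > 20.0,  # stage 1: coarse energy threshold
    lambda e: e.n_tracks >= 2,            # stage 2: require several tracks
    lambda e: e.quality > 0.9,            # stage 3: quality threshold
]

if __name__ == "__main__":
    sample = [
        Event(1, 50.0, 3, 0.95),
        Event(2, 10.0, 5, 0.99),
        Event(3, 30.0, 1, 0.97),
        Event(4, 40.0, 4, 0.50),
    ]
    kept, counts = run_cascade(sample, STAGES)
    print([e.event_id for e in kept], counts)  # [1] [3, 2, 1]
```

The design point is ordering: each stage only sees what the previous one kept, so the expensive decisions run on a shrinking fraction of the data, which is the general reason a cascade helps with raw rates like the one quoted in the story.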
  2. Re:EU funding by san · Score: 5, Informative

    The WWW, developed at CERN by Tim Berners-Lee, springs to mind.

  3. Re:EU funding by pubjames · Score: 5, Informative

    Has anyone actually seen an IT related EU project that achieved something?

    Government-funded work, in the EU, the US and internationally, actually drives changes in the IT industry a lot more than most people realise (or perhaps would care to admit).

    For Christ's sake, the web itself came out of a CERN project! Many other web standards also originated in EU-funded projects, for instance JPEG and MPEG. So the most common formats on the web for text (HTML), images (JPEG) and video (MPEG) all owe something to EU funding.

    And of course the Internet itself comes from US government-funded projects. Even commonly used business processes, such as project management methodologies, have resulted from government-funded work.

    Both Americans and Europeans like to bitch about the inefficiencies of their governments, but the fact of the matter is that if you look at the history of IT, more fundamental innovations have come from government-funded work than from industry. Of course Bill Gates, Larry Ellison etc. don't want you to think that, but that's the way it is.