disCERNing Data Analysis

Posted by Hemos on Wednesday November 21, 2001 @03:50AM from the making-more-sense-of-things dept.

technodummy writes: "Wired is reporting how CERN is driving the Linux-based, EU-funded DataGRID project. And no, they say, it's nothing like SETI@home. The description on the site of the project is: 'The objective is to enable next generation scientific exploration which requires intensive computation and analysis of shared large-scale databases, from hundreds of TeraBytes to PetaBytes, across widely distributed scientific communities.'" If you're interested in this, check out the Fermilab work with LinuxNetworkX data as well as the all-powerful Google search on the Fermi Collider Linux project. As jamie points out, "Colliders produce *amazing* amounts of data in *amazingly* short time periods... on the order of 'here's a gigabyte, you have 10 milliseconds to pull whatever's valuable out of it before the next gigabyte arrives'."

distributed computing by sam@caveman.org · 2001-11-21 04:03 · Score: 5, Interesting

here's a gigabyte, you have 10 milliseconds to pull whatever's valuable out of it before the next gigabyte arrives.

let's see. 1 GB in 10 ms works out to 100 GB per second. how recently did gigabit ethernet come about? and what would the average bandwidth of users be? i would guess much less than that, but let us assume 100 KB per second.

so you have 107,374,182,400 bytes of data per second. your users can take 102,400 bytes per second each. even if everyone was connected directly to your network (no delays or bottlenecks... ha!) you would still require 1,048,576 users (that is over 1 million).
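
for anyone who wants to check the arithmetic, here is a rough Python sketch. it assumes binary units (1 GB = 2**30 bytes, 1 KB = 2**10 bytes), which is what the byte counts above imply. the function name `users_needed` and its parameters are made up for this illustration, and it ignores every real-world overhead.

```python
# Back-of-the-envelope check of the figures in the comment.
# Assumption: binary units (1 GB = 2**30 bytes, 1 KB = 2**10 bytes).

GB = 2 ** 30
KB = 2 ** 10


def users_needed(burst_bytes, window_ms, per_user_bytes_per_s):
    """Return (source rate in bytes/s, users needed to absorb it)."""
    # Integer arithmetic: bytes per window scaled up to bytes per second.
    source_rate = burst_bytes * 1000 // window_ms
    # Ceiling division: a fractional user still means one more user.
    users = -(-source_rate // per_user_bytes_per_s)
    return source_rate, users


source_rate, users = users_needed(1 * GB, 10, 100 * KB)
print(source_rate)  # 107374182400
print(users)        # 1048576
```

changing the last argument shows how sensitive the head count is to the assumed per-user bandwidth: ten times the bandwidth per user cuts the count to roughly a tenth.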

and this is not taking into account sending any data BACK to the source or the actual computation time on the users' machines.

-sam

--
burn the computers. go back to the abacus.