disCERNing Data Analysis

Posted by Hemos on Wednesday November 21, 2001 @03:50AM from the making-more-sense-of-things dept.

technodummy writes: "Wired is reporting how CERN is driving the Linux-based, EU-funded DataGRID project. And no, they say, it's nothing like Seti@Home. The description on the site of the project is: 'The objective is to enable next generation scientific exploration which requires intensive computation and analysis of shared large-scale databases, from hundreds of TeraBytes to PetaBytes, across widely distributed scientific communities.'" If you're interested in this, check out the Fermi Lab work with LinuxNetworkX data as well as the all-powerful Google search on the Fermi Collider Linux project. As jamie points out, "Colliders produce *amazing* amounts of data in *amazingly* short time periods... on the order of 'here's a gigabyte, you have 10 milliseconds to pull whatever's valuable out of it before the next gigabyte arrives'."

sheer quantity of data by Alien54 · 2001-11-21 04:02 · Score: 3, Interesting

My first reaction is to wonder whether a distributed model of computing could even make a dent if the amount of data is that big.
Does anyone have an idea of how much data Seti@Home has processed? That would certainly be useful as a yardstick of sorts.

--
"It is a greater offense to steal men's labor, than their clothes"
distributed computing by sam@caveman.org · 2001-11-21 04:03 · Score: 5, Interesting

here's a gigabyte, you have 10 milliseconds to pull whatever's valuable out of it before the next gigabyte arrives.

let's see. 1 GB in 10 ms works out to 100 GB per second. how recently did gigabit ethernet come about? and what would the average bandwidth of users be? i would guess much less, but let us assume 100 KB per second.

so you have 107374182400 bytes of data per second. your users can take 102400 bytes per second each. even if everyone was connected directly to your network (no delays or bottlenecks... ha!) you would still require 1048576 users (that is over 1 million).

and this is not taking into account sending any data BACK to the source or actual computation time on the users' machines.
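
The arithmetic above can be checked with a short Python sketch. It assumes binary units (1 GB = 2^30 bytes, 100 KB = 102400 bytes), which is what the byte counts in the comment imply, and it ignores protocol overhead and return traffic. The function and variable names are illustrative only.

```python
# Back-of-the-envelope check of the bandwidth figures (binary units assumed).
GIB = 2**30  # bytes in one "gigabyte" as the comment counts it
KIB = 2**10  # bytes in one "kilobyte" as the comment counts it


def users_needed(burst_bytes, burst_ms, user_bytes_per_second):
    """Return (stream rate in bytes/s, users needed to absorb that rate).

    Integer arithmetic is used throughout to avoid floating-point rounding.
    """
    rate = burst_bytes * 1000 // burst_ms
    # Ceiling division: a partial share of the stream still needs one more user.
    users = -(-rate // user_bytes_per_second)
    return rate, users


rate, users = users_needed(1 * GIB, 10, 100 * KIB)
print(f"stream rate: {rate} bytes/s")
print(f"users at 100 KB/s each: {users}")
```

Under these assumptions the sketch gives 107374182400 bytes per second and 1048576 users, matching the numbers in the comment.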

-sam

--
burn the computers. go back to the abacus.
This won't work very well by sopwath · 2001-11-21 04:16 · Score: 1, Interesting

I think they are assuming they'll be able to actually get all this raw data out to people around the world. That's going to be a problem for people on dial-up (still the majority in the US; what about Europe?). Plus, it's going to cost a hell of a lot of money to keep their end of the data pipe from bursting. Even if they only have a couple hundred megabytes per second, that's quite a bit to maintain.

I know broadband is getting more accepted, but I don't think real-time is going to work on this kind of scale. SETI is successful because anyone can run it (even if it is slow) and there's competition to get the most work units done. Without something to keep people interested, no one is going to run anything from CERN. And unless a broad range of people can run a client, there won't be enough participants anyway.

Hard drive space is cheap (compared to a super-collider), so why can't they store all these petabytes of data? When the project gets more successful, they'll be able to actually analyse all the extra data they've got. I mean, if you're going to spend that much money on a collider, you might as well get as much info as you can from it.

good luck,
sopwath
Solid State Niche by Cesaro · 2001-11-21 04:19 · Score: 2, Interesting

This is exactly the niche market for solid state drives. You have gigs of data you need to get there FAST... then you can worry about picking it apart afterwards. Once it's on the solid state drive, as long as you don't lose both mains power and your UPS, you can leisurely use however many computers you want to nitpick it without having to worry about missing data.
1. Re:Solid State Niche by Anonymous Coward · 2001-11-21 04:24 · Score: 1, Interesting
  
  Sure. Solid state whatever. Do you know what they cost? $10,000 per gigabyte. Useless for this project. You'd need a billion of these drives.
It's a little-known fact... by JohnPM · 2001-11-21 04:30 · Score: 2, Interesting

That 1 petabyte, if stored as an area of black and white 8mm square bathroom tiles with 2mm grout would cover an area of 900,720 square kilometres which is about 741 times the area of Los Angeles.
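
The tile figure can be reproduced with a rough Python sketch. It assumes one tile per bit, a binary petabyte (2^50 bytes), and a 10 mm tile pitch (8 mm tile plus 2 mm grout). None of these assumptions is stated in the comment; they are simply the ones that give back its number. The names are illustrative only.

```python
# Rough check of the bathroom-tile estimate (all inputs are assumptions).
BITS_PER_PETABYTE = 8 * 2**50  # assumes a binary petabyte, one tile per bit
TILE_PITCH_MM = 8 + 2          # 8 mm tile plus 2 mm of grout per tile

MM2_PER_KM2 = 1e12             # 1 km = 1e6 mm, so 1 km^2 = 1e12 mm^2


def tiled_area_km2(bits, pitch_mm):
    """Area in square kilometres covered by one square tile cell per bit."""
    return bits * pitch_mm**2 / MM2_PER_KM2


area = tiled_area_km2(BITS_PER_PETABYTE, TILE_PITCH_MM)
print(f"area: {area:,.0f} square kilometres")  # about 900,720
print(f"implied area of Los Angeles: {area / 741:,.0f} square kilometres")
```

Dividing by 741 gives the area of Los Angeles the comment implies, a little over 1,200 square kilometres.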

Bring on the pixie dust!

(source)

--
Karma police, I've given all I can, it's not enough, I've given all I can, but we're still on the payroll.
Re:EU funding by pubjames · 2001-11-21 04:37 · Score: 3, Interesting

Has anyone actually seen an IT-related EU project that achieved something? The company I work for has been involved in two EU project proposals so far, and nothing came of either of them -- though they both consumed a large amount of resources from universities to get through the three failed applications each.

Perhaps you are expecting the wrong results.

I have been involved in a couple of large EU funded projects, and have spoken to the project managers about the aims and motives of the projects.

One principal point is that a project is not a failure just because no successful new product, standard, format or whatever arises from it.

The EU is made up of lots of different countries with lots of different types of people speaking different languages and with different working mentalities. This is a major competitive disadvantage for us compared to a country like the US. If a company in San Francisco wants to work with a company in New York, there aren't many barriers to them doing that. In the EU, there are lots of barriers. One of the main aims of EU funded projects (and the EU in general) is to break down these barriers by getting different companies and universities working together across the EU. If new technologies come out of these projects, so much the better, but that's not necessarily the principal aim.
Virtual science by Sloppy · 2001-11-21 05:27 · Score: 3, Interesting

This reminds me of an astronomy-related story I saw yesterday. Some projects are generating more data than the people doing the projects can handle.

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.