CERN Collider To Trigger a Data Deluge

← Back to Stories (view on slashdot.org)

CERN Collider To Trigger a Data Deluge

Posted by kdawson on Monday May 21, 2007 @09:18PM from the things-that-go-bang dept.

slashthedot sends us to High Productivity Computing Wire for a look at the effort to beef up computing and communications infrastructure at a number of US universities in preparation for the data deluge anticipated later this year from two experiments coming online at CERN. The collider will smash protons together hoping to catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang. From the article: "The world's largest science experiment, a physics experiment designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists... The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function."

5 of 226 comments (clear)

Min score:

Reason:

Sort:

No Search Function by tacocat · 2007-05-21 21:27 · Score: 4, Interesting

Google it?

If Google is so awesome, maybe they can put their money where there mouth is and do something commendable. Of course, they'll probably have a hard time turning this data into marketing material.
Never mind the data by simong · 2007-05-21 21:34 · Score: 4, Interesting

What about the backups?
All pages are identical by Laxator2 · 2007-05-21 22:14 · Score: 5, Interesting

The main difference between the LHC data and the Internet is that all that 15 PB of data will come in a standard format, so a search is much easier to perform. In fact most of the search will consist on discarding non-interesting stuff while attempting to identify the very rare events that may show indications of new particles (Higgs for example). The Internet is a lot more diverse, the variety of information dwarfs the limited number of patterns LHC is looking for, so "no search available" for LHC data sounds more like "no search needed".
Re:I predict the end of the universe by TapeCutter · 2007-05-21 23:14 · Score: 4, Interesting

It seems the metric LoC = 10TB. If that is so then an LoC is no longer based on a physical library but has rather been redefined based on a more basic unit of information, (ie: the byte). This sort of thing has happened before, the standard time unit (second) is no longer based on the earth's rotation, rather it is based on some esoteric (but very stable) feature of cesium atoms.

IMHO: This is a GoodThing(TM), it could mean the LoC is well on it's way to becoming an accepted SI unit. :)

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:Too much for the 'Net by bockelboy · 2007-05-21 23:59 · Score: 4, Interesting

That's 4Gbps AVERAGE, meaning it's much below the peak rate. That's also the raw data stream, not accounting for site X in the US wanting to read reconstructed data from site Y in Europe.

LHC-related experiments will eventually have 70 Gbps of private fibers across the atlantic (Most NY -> Geneva, but at least 10Gbps NY -> Amsterdam), and at least 10 Gbps across the Pacific.

For what it's worth, here's the current transfer rates for one LHC experiment You'll notice that there's one site, Nebraska (my site), which averages 3.2 Gbps over the last day. That's a Tier 2 site - meaning it won't even recieve the raw data, just reconstructed data.

Our peak is designed to be 200TB / week (2.6Gbps averaged over a whole week). That's one site out of 30 Tier 2 sites and 7 Tier 1 sites (each Tier 1 should be about 4-times as big as a Tier 2).

Of course, the network backbone work has been progressing for years. It's to the point where Abilene, the current I2 network, rarely is at 50% capacity.

The network part is easy; it's a function of buying the right equipment and hiring smart people. The extremely hard part is putting disk servers in place that can handle the load. When we went from OC-12 (622 Mbps) to OC-192 (~10Gbps), we had RAIDs crash because we wrote at 2Gbps on some servers for days at a time. Try building up such a system without the budget to buy high-end Fiber Channel equipment too!

And yes, I am on a development team that works to provide data transfer services for the CMS experiment.