"Evolution of the Internet" Powers Massive LHC Grid
jbrodkin brings us a story about the development of the computer network supporting CERN's Large Hadron Collider, which will begin smashing particles into one another later this year. We've discussed some of the impressive capabilities of this network in the past.
"Data will be gathered from the European Organization for Nuclear Research (CERN), which hosts the collider in France and Switzerland, and distributed to thousands of scientists throughout the world. One writer described the grid as a 'parallel Internet.' Ruth Pordes, executive director of the Open Science Grid, which oversees the US infrastructure for the LHC network, describes it as an 'evolution of the Internet.' New fiber-optic cables with special protocols will be used to move data from CERN to 11 Tier-1 sites around the globe, which in turn use standard Internet technologies to transfer the data to more than 150 Tier-2 centers. Worldwide, the LHC computing grid will be comprised of about 20,000 servers, primarily running the Linux operating system. Scientists at Tier-2 sites can access these servers remotely when running complex experiments based on LHC data, Pordes says. If scientists need a million CPU hours to run an experiment overnight, the distributed nature of the grid allows them to access that computing power from any part of the worldwide network"
warning: this is a "*.notlong.com" link... DO NOT CLICK.
There is a lot of quite fancy security involved. All users need an X.509 certificate to submit jobs.
"The LHC collisions will produce 10 to 15 petabytes of data a year"
The collisions will produce much more data than that, but "only" about 15 PB of it will be permanently stored. That's a stack of CDs roughly 20 km high. Every. Year.
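For anyone who wants to check the CD figure, here is a rough back-of-the-envelope sketch. The 700 MB capacity and 1.2 mm thickness are my assumptions for a standard bare disc (no jewel case), and a petabyte is taken as 10^15 bytes; none of these numbers come from the article.

```python
# Rough estimate of how tall a stack of CDs holding a given amount of data would be.
# Assumptions (not from the article): 700 MB per CD, 1.2 mm per bare disc,
# and 1 PB = 10**15 bytes.

PETABYTE = 10**15


def cd_stack_height_km(data_bytes, cd_capacity_bytes=700 * 10**6, cd_thickness_mm=1.2):
    """Return the height in kilometres of a stack of CDs storing data_bytes."""
    discs = data_bytes / cd_capacity_bytes
    return discs * cd_thickness_mm / 1_000_000  # convert mm to km


low = cd_stack_height_km(10 * PETABYTE)   # lower end of the quoted 10 to 15 PB range
high = cd_stack_height_km(15 * PETABYTE)  # upper end of the quoted range
print(f"{low:.1f} km to {high:.1f} km")
```

Under those assumptions the stack comes out somewhere between about 17 and 26 km, so "20 km" is the right order of magnitude.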
What a lot of people don't know is that if you want to join a cluster to the Open Science Grid and you are a legit organization, they would more than likely let you join. Just be sure you understand your responsibilities, as it requires more active participation than you might expect. If you are a school or a computer user group/club, go to the Open Science Grid website and start reading up.
Warning (although probably not needed for this crowd): joining OSG (http://www.opensciencegrid.org/) is a bit more complicated than loading up BOINC or folding@home. It requires a stack of middleware that is distributed as part of OSG's software. Most of the sites, I believe, use Condor (http://www.cs.wisc.edu/condor/). If you would like to get Condor up and running quickly, the best way is to use ROCKS (http://www.rocksclusters.org/wordpress/) with a Rocks Condor "Roll" (jargon for a Rocks Condor cluster). Then, after getting your Condor flock up and running, you can load the Open Science Grid stuff on it.
I'm currently running a small cluster of PCs that were destined to be excessed (P4s, 3 or 4 years old) and have seen jobs come in and process on my computers! And to boot, you can configure BOINC to act as a backfill mechanism, so that when the systems are not running jobs from OSG they can be running BOINC and whatever project you've joined through it.
BTW, all of the software mentioned is funded under grants from the National Science Foundation, primarily via the Office of Cyberinfrastructure but some through other Directorates within NSF.
It has nothing to do with ISPs. The Tier 1 sites are the largest sites around the world, with thousands of CPUs and petabytes of storage to handle the influx of data. Typically there is no more than one Tier 1 per country per experiment. Tier 2s in this nomenclature are generally university sites that have O(100) CPUs and O(100) TB of disk.
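To put those sizes next to the summary's "million CPU hours" example, here is a quick order-of-magnitude sketch. It assumes exactly 150 Tier 2 sites with exactly 100 CPUs each (the summary says "more than 150" and O(100) is only an order of magnitude), and it assumes the work parallelizes perfectly, which real jobs rarely do.

```python
# Order-of-magnitude estimate: how long would a million CPU hours take on the
# combined Tier 2 sites alone?
# Assumptions (illustrative only): 150 sites, 100 CPUs per site, perfect parallelism.

TIER2_SITES = 150      # summary says "more than 150 Tier-2 centers"
CPUS_PER_TIER2 = 100   # O(100) CPUs per site, taken literally as 100


def wall_clock_hours(cpu_hours, cpus):
    """Wall-clock hours needed if cpu_hours of work is spread evenly over cpus."""
    return cpu_hours / cpus


tier2_cpus = TIER2_SITES * CPUS_PER_TIER2
hours = wall_clock_hours(1_000_000, tier2_cpus)
print(f"{tier2_cpus} CPUs -> about {hours:.0f} hours")
```

With these made-up round numbers the Tier 2 sites alone would need a few days, so an overnight run of that size would presumably have to draw on Tier 1 capacity as well.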
It wasn't very black...
There are two types of people in the world: Those who crave closure