I write part of the software for the grid infrastructure thingies. All Apache 2 licensed.
Most of the experimental physics stuff, the infrastructure, technical plans, agendas, nearly everything is open to view.
How interesting...
Just a 10 Gbit/s WAN? How about 11x that to each (combined) Tier-1 center, via an optical private network, to get the load of the data onto the Grid?
How about the solutions created to get so much data spread out, indexed, replicated, and distributed?
Perhaps that ain't that interesting. Perhaps the total capacity of the Grid being about 30 PB ain't that impressive. Perhaps the concept of more than 200 clusters, big and small, across different administrative domains at your fingertips might not be as challenging as it seems.
Oh well, let's focus on the database, since that holds the least amount of actual data. Even so, it is still the biggest Oracle instance, according to Oracle.
I'd like to see your cards by asking you to bzip2 one month of data on your preferred desktop.
Now you're contributing to science and you have an extended coffee break.
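To put a rough number on that coffee break, here is a back-of-the-envelope sketch. It assumes that "one month of data" means a twelfth of the roughly 15 PB/year estimate quoted further down in this comment, and it assumes a single-threaded bzip2 throughput of 10 MB/s on a desktop. That throughput is purely a placeholder of mine, not a measurement, so plug in your own.

```python
# Rough estimate of how long bzip2 would chew on one month of LHC data.
# Assumptions (not from any measurement): decimal petabytes, one month is
# 1/12 of the ~15 PB/year estimate, and a hypothetical 10 MB/s throughput.

PB = 10**15                            # bytes in a (decimal) petabyte
SECONDS_PER_DAY = 24 * 3600

MONTH_BYTES = 15 * PB / 12             # one month of data, in bytes
ASSUMED_BZIP2_RATE = 10 * 10**6        # bytes per second, hypothetical


def compression_days(volume_bytes, throughput_bytes_per_s):
    """Days needed to push volume_bytes through a compressor at a fixed rate."""
    return volume_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    days = compression_days(MONTH_BYTES, ASSUMED_BZIP2_RATE)
    print(f"{days:.0f} days, or about {days / 365:.1f} years")
```

Even if your desktop is ten times faster than my made-up figure, that is still months of compressing for a single month of data.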
Actually... all the data that is kept is going on to a huge tape store. The live data is flowing to the biggest data centers on the Grid. So, basically all the huge distributed data centers will need to pay for the needed storage.
Estimates are that the four LHC experiments will produce about 15 petabytes/year. The LHC will be online for about 15 years (maybe more). All data is kept permanently. This means that there is a fail-safe copy stored at CERN on tape, which is a big task to perform consistently.
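For the arithmetic-minded, a small sketch of what those estimates add up to. It only uses the two figures above (15 PB/year, 15 years) and assumes decimal petabytes and production spread evenly over the whole year. The real flow is bursty, so treat the rate as an average, not a peak.

```python
# What 15 PB/year for 15 years adds up to, and the average rate it implies.
# Assumptions: decimal petabytes and data spread evenly over a 365-day year.

PB = 10**15                            # bytes in a (decimal) petabyte
ANNUAL_VOLUME_BYTES = 15 * PB          # estimate for the four experiments
YEARS_ONLINE = 15                      # "about 15 years (maybe more)"
SECONDS_PER_YEAR = 365 * 24 * 3600


def total_volume_pb(annual_bytes=ANNUAL_VOLUME_BYTES, years=YEARS_ONLINE):
    """Total volume of a single copy of the data, in petabytes."""
    return annual_bytes * years / PB


def average_rate_gbit_s(annual_bytes=ANNUAL_VOLUME_BYTES):
    """Average sustained rate, in Gbit/s, needed to move one year's data."""
    return annual_bytes * 8 / SECONDS_PER_YEAR / 1e9


if __name__ == "__main__":
    print(f"single copy over the LHC lifetime: {total_volume_pb():.0f} PB")
    print(f"average sustained rate: {average_rate_gbit_s():.2f} Gbit/s")
```

That is for one copy only. Every replica and every tape mirror multiplies the storage figure again.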
But that data is not worked on there; it is spread through the huge tubes of the academic fibers to big data centers around the world. The online copy is replicated and stored at two geographical locations. At each location most of the data (depending on the type) is mirrored to tape. So the largest volume is on tape, but there is still a need for mucho-grande cache servers, which are mostly huge disk arrays.
The 10-11 biggest data centers will store and (re-)process the data at the rate at which it is produced. The other 190 data centers run the physics analyses of all the (local) science groups.
PS: Most data is analysed/processed multiple times.