CERN Collider To Trigger a Data Deluge

Posted by kdawson on Monday May 21, 2007 @09:18PM from the things-that-go-bang dept.

slashthedot sends us to High Productivity Computing Wire for a look at the effort to beef up computing and communications infrastructure at a number of US universities in preparation for the data deluge anticipated later this year from two experiments coming online at CERN. The collider will smash protons together hoping to catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang. From the article: "The world's largest science experiment, a physics experiment designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists... The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function."

56 of 226 comments

OT: The size of the internet by chriss · 2007-05-21 21:27 · Score: 5, Informative

Okay, the Library of Congress has been estimated to contain about 10 terabytes, so I buy the 1000 * LoC = 15 petabytes. But archive.org alone expanded its storage capacity to 1 petabyte in 2005, so CERN is not going to generate anything near "22 Internets" (whatever that might be). This estimate from 2002 calculates the size of the internet as about 530 exabytes, 440 exabytes of which are email, and 157 petabytes for the "surface web".

--
memomo: free web based language trainer DE-EN-ES-FR-IT
1. Re:OT: The size of the internet by vitya404 · 2007-05-21 21:44 · Score: 2, Insightful
  
  Have you read that article? Firstly, what you say exa, is peta really. But as I see it, the size of the internet is the data available through the internet, and my emails are not available through the web (hopefully). And while the data transmitted through the network is redundant and a huge part of it is worthless (e.g. my post), this experiment will give us an enormous amount of meaningful, and therefore valuable, data.
2. Re:OT: The size of the internet by chriss · 2007-05-21 21:55 · Score: 3, Informative
  
  Firstly, what you say exa, is peta really.
  
  Me bad, miscalculated, off by a factor of 1000.
  
  --
  memomo: free web based language trainer DE-EN-ES-FR-IT
3. Re:OT: The size of the internet by Anonymous Coward · 2007-05-21 23:26 · Score: 5, Funny
  
  We are from NASA, and would like to offer you a job in mission planning.
4. Re:OT: The size of the internet by joto · 2007-05-21 23:46 · Score: 2, Interesting
  
  Meaningful and valuable to whom? If I had to choose between using the bandwidth and storage space to store your post or to store half a kilobyte of CERN sensor data, I would actually choose to store your post. And it's not because I find your post particularly valuable. It's because the CERN data is as meaningless to me as line noise would be. For me even donkey bukkake with midgets is more meaningful than random sensor data from CERN. Only when the scientists make discoveries from it that carry important philosophical, economic, and/or practical benefits or changes do I become interested.
Too much for the 'Net by DTemp · 2007-05-21 21:27 · Score: 3, Insightful

I hope they're planning on running their own fiber optic line across the Atlantic, or shipping a lot of hard drives, 'cause that's too much data to pass over the public internet.

FYI 15 petabytes per year = 120 petabits per year = 120,000,000 gigabits per year

120,000,000 gigabits per year / ~30,000,000 seconds per year = 4gbps of continuous transmission. They could run a fiber across the Atlantic that could handle 4gbps.
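
A quick Python check of the arithmetic above, as a sketch that assumes decimal units (1 PB = 10^15 bytes) and a 365-day year instead of the rounded 30 million seconds; the function name is made up for illustration:

```python
# Average line rate needed to ship a fixed yearly data volume.
# Assumes decimal SI units (1 PB = 10**15 bytes) and a 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600  # ~31.5 million; the post rounds to ~30 million


def average_gbps(bytes_per_year: float, seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the continuous transmission rate in gigabits per second."""
    bits = bytes_per_year * 8
    return bits / seconds / 1e9


if __name__ == "__main__":
    pb = 10**15
    print(f"{average_gbps(15 * pb):.2f} Gbps")              # full 365-day year
    print(f"{average_gbps(15 * pb, 30_000_000):.2f} Gbps")  # the post's rounding
```

With a full-length year the average comes out nearer 3.8 Gbps; the rounded year gives exactly 4 Gbps. Either way it is an average, not a peak.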
1. Re:Too much for the 'Net by Anonymous Coward · 2007-05-21 22:04 · Score: 2, Informative
  
  They could run a fiber across the Atlantic that could handle 4gbps.
  
  They have been getting sustained performance (with simulated data) of more than that for several years now. This is the sort of thing that Internet2 does well, when it's not on fire.
2. Re:Too much for the 'Net by Anonymous Coward · 2007-05-21 22:26 · Score: 2, Interesting
  
  "They could run a fiber across the Atlantic that could handle 4gbps."
  
  The .eu academic networks already have a lot more transatlantic bandwidth than that. When I worked at JANET (the UK academic network) we were one hop from .us and had 10G of transatlantic bandwidth (how much of that was on-demand I can't remember). Geant, the .eu research network interconnect, also has direct connections to the .us research networks. The bandwidth is in place and has been for some time. It's being updated right now as well.
  
  Check out http://www.geant2.net/
3. Re:Too much for the 'Net by ender- · 2007-05-21 23:47 · Score: 2, Funny
  
  Is it really too much? The average torrent release of a popular TV show spreads to hundreds of users at an average of perhaps a megabit / second. University networks can probably handle that load without problem right now. Um, no they can't, they're full to the brim with torrent traffic. :)
  
  --
  Nothing to see here
4. Re:Too much for the 'Net by bockelboy · 2007-05-21 23:59 · Score: 4, Interesting
  
  That's 4Gbps AVERAGE, meaning it's much below the peak rate. That's also the raw data stream, not accounting for site X in the US wanting to read reconstructed data from site Y in Europe.
  
  LHC-related experiments will eventually have 70 Gbps of private fibers across the Atlantic (mostly NY -> Geneva, but at least 10Gbps NY -> Amsterdam), and at least 10 Gbps across the Pacific.
  
  For what it's worth, here are the current transfer rates for one LHC experiment. You'll notice that there's one site, Nebraska (my site), which averages 3.2 Gbps over the last day. That's a Tier 2 site, meaning it won't even receive the raw data, just reconstructed data.
  
  Our peak is designed to be 200TB / week (2.6Gbps averaged over a whole week). That's one site out of 30 Tier 2 sites and 7 Tier 1 sites (each Tier 1 should be about 4-times as big as a Tier 2).
  
  Of course, the network backbone work has been progressing for years. It's to the point where Abilene, the current I2 network, rarely is at 50% capacity.
  
  The network part is easy; it's a function of buying the right equipment and hiring smart people. The extremely hard part is putting disk servers in place that can handle the load. When we went from OC-12 (622 Mbps) to OC-192 (~10Gbps), we had RAIDs crash because we wrote at 2Gbps on some servers for days at a time. Try building up such a system without the budget to buy high-end Fibre Channel equipment too!
  
  And yes, I am on a development team that works to provide data transfer services for the CMS experiment.
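
bockelboy's 200 TB/week design peak converts to a line rate the same way. The sketch below, assuming decimal terabytes, also extrapolates to the whole tier structure under a loud hypothetical: that all 30 Tier 2 and 7 Tier 1 sites hit their design peak at the same time and that every Tier 1 is exactly 4x a Tier 2 (the post only says "about").

```python
# Convert a weekly transfer budget into an average line rate, then
# extrapolate to the whole tier structure described in the post.
# HYPOTHETICAL: every site runs at its design peak simultaneously,
# and a Tier 1 is exactly 4x a Tier 2.

SECONDS_PER_WEEK = 7 * 24 * 3600
TB = 10**12  # decimal terabyte


def weekly_tb_to_gbps(tb_per_week: float) -> float:
    """Average rate in Gbps needed to move tb_per_week terabytes in one week."""
    return tb_per_week * TB * 8 / SECONDS_PER_WEEK / 1e9


tier2_peak_tb = 200
tier1_peak_tb = 4 * tier2_peak_tb
total_tb = 30 * tier2_peak_tb + 7 * tier1_peak_tb

print(f"one Tier 2: {weekly_tb_to_gbps(tier2_peak_tb):.1f} Gbps")  # 2.6 Gbps
print(f"all sites:  {weekly_tb_to_gbps(total_tb):.0f} Gbps")        # 153 Gbps
```

The single-site figure matches the post's 2.6 Gbps; the all-sites number is only an upper bound on simultaneous demand under the stated assumptions, not a figure from the experiment.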
5. Re:Too much for the 'Net by markov_chain · 2007-05-22 01:52 · Score: 2, Interesting
  
  If they could get 1GB/s sustained, it would take them... 173 days to transfer 15PB. I hope they have dark fiber to light up!
  
  --
  Tsunami -- You can't bring a good wave down!
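
markov_chain's figure checks out, and it is worth noting what it implies: 15 PB is a whole year's output, so a link that moves it in about 174 days runs roughly twice as fast as production. A minimal sketch, assuming decimal units:

```python
# Time to move a given volume at a sustained rate, plus the headroom that
# rate leaves over the yearly production volume.
# Assumes decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).


def transfer_days(volume_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move volume_bytes at a constant rate."""
    return volume_bytes / rate_bytes_per_s / 86_400


yearly_volume = 15 * 10**15  # 15 PB produced per year
rate = 10**9                 # 1 GB/s sustained

days = transfer_days(yearly_volume, rate)
print(f"{days:.1f} days")                         # 173.6 days for one year's data
print(f"headroom: {365 / days:.2f}x production")  # 2.10x
```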
6. Re:Too much for the 'Net by kestasjk · 2007-05-22 02:08 · Score: 2, Informative
  
  They're not going to run the particle accelerator for a day and then spend half a year transferring all the data generated; the lifetime of a particle accelerator is longer than 173 days.
  
  --
  // MD_Update(&m,buf,j);
7. Re:Too much for the 'Net by kestasjk · 2007-05-22 02:27 · Score: 2, Funny
  Oh wait, this is Slashdot.
  
  Okay, so that's 15 petabytes *tapping on calculator* that's 3.4x10^29 bits.
  
  Taking the maximum data rate from a given node as 3 gigabits per second, and taking into account the effect of bandwidth increases over time.. *tapping on calculator*
  
  Okay, and taking the average mosquito lifetime as 20 days.. *tapping on calculator*
  
  *breaks into a cold sweat*
  
  Now, assuming mutations in mosquitos occur at a rate of 1 base pair per generation, *tap tap tap* and that our genes are different from mosquitos by 2.4x10^6 base pairs.. *more tapping on calculator*
  
  By the time they have transferred this data to scientists across the world mosquitos will have become the new dominant species.
  --
  // MD_Update(&m,buf,j);
No Search Function by tacocat · 2007-05-21 21:27 · Score: 4, Interesting

Google it?

If Google is so awesome, maybe they can put their money where their mouth is and do something commendable. Of course, they'll probably have a hard time turning this data into marketing material.
1. Re:No Search Function by gedhrel · 2007-05-21 21:45 · Score: 3, Informative
  
  Well, there _is_ a search function, and that's what the tier-2 sites will be running. The data describes individual experiments (that is, individual collisions) and comes off the LHC at a whacking rate. There's some front-end processing to throw away a lot of it before what's left gets sent to the tier-1 sites for further distribution.
  
  The data is suited to high-throughput (i.e., batch) processing, and the idea is to keep copies of the experimental data in several places during processing. Interesting results get flagged up by the batch processing for further study.
2. Re:No Search Function by Raptoer · 2007-05-21 21:59 · Score: 2, Interesting
  
  The problem is less that there is no search function (with digital data all you're doing is matching one pattern to another), the problem is more that you don't know exactly what you are searching for!
  My guess is that they are looking for anomalies within the data that would indicate the presence of one of these subatomic particles. My guess furthermore is that once they get enough data analyzed they will be able to form a model to base a search function around.
  That or the summary lies (wouldn't be the first time) and in fact they know exactly what they are searching for, and they have a search function, but of course someone has to look at the output of those functions to determine what impact they have on their model/ideas.
3. Re:No Search Function by scheme · 2007-05-22 00:20 · Score: 4, Informative
  
  The problem is less that there is no search function (with digital data all you're doing is matching one pattern to another), the problem is more that you don't know exactly what you are searching for!
  My guess is that they are looking for anomalies within the data that would indicate the presence of one of these subatomic particles. My guess furthermore is that once they get enough data analyzed they will be able to form a model to base a search function around.
  That or the summary lies (wouldn't be the first time) and in fact they know exactly what they are searching for, and they have a search function, but of course someone has to look at the output of those functions to determine what impact they have on their model/ideas.
  
  For a lot of the physics, the researchers know what they are looking for. For example, with the Higgs boson, theory constrains the production and decay to certain channels that have characteristic signatures. So they would be looking for events that have a muon at a certain energy with a hadron jet of another given energy coming off x degrees away, and so on. There have been Monte Carlo simulations and other calculations done to predict what the interesting events should look like under various theories. Of course there may be interesting events that pop up that no one has predicted, but everyone has a fairly good idea of what the expected events should look like.
  
  --
  "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
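
The cut-based search scheme describes can be sketched as a filter over reconstructed events. Everything here is hypothetical: the `Event` fields, the `Signature` thresholds and the angle window are invented for illustration, and real analyses use far richer reconstructed objects and selections tuned on Monte Carlo samples.

```python
# Toy cut-based event selection: keep events whose reconstructed objects
# match a target signature. All field names and thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class Event:
    muon_energy_gev: float    # energy of the leading muon (0 if none)
    jet_energy_gev: float     # energy of the leading hadron jet
    opening_angle_deg: float  # angle between the muon and the jet


@dataclass
class Signature:
    min_muon_gev: float
    min_jet_gev: float
    angle_deg: float
    angle_tolerance_deg: float

    def matches(self, ev: Event) -> bool:
        """True if the event passes every cut."""
        return (
            ev.muon_energy_gev >= self.min_muon_gev
            and ev.jet_energy_gev >= self.min_jet_gev
            and abs(ev.opening_angle_deg - self.angle_deg) <= self.angle_tolerance_deg
        )


def select(events, signature):
    """Return only the events that match the signature."""
    return [ev for ev in events if signature.matches(ev)]


if __name__ == "__main__":
    sig = Signature(min_muon_gev=20, min_jet_gev=50, angle_deg=90, angle_tolerance_deg=15)
    events = [
        Event(25, 60, 95),   # passes all cuts
        Event(5, 80, 90),    # muon too soft
        Event(30, 70, 150),  # wrong topology
    ]
    print(len(select(events, sig)))  # prints 1
```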
4. Re:No Search Function by Benson+Arizona · 2007-05-22 00:35 · Score: 5, Funny
  
  Buy Higgs Boson now at e-bay.com
  
  Buy books about Bosons at Amazon.com
60% by Alsee · 2007-05-21 21:34 · Score: 4, Funny

The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function.

And 60% of it will be porn.

-

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
1. Re:60% by carpe_noctem · 2007-05-21 22:02 · Score: 4, Funny
  
  mmmm... particle porn!
  
  See the hottest collisions on the web! Watch as innocent particles get ripped apart, revealing their inner quarks! See protons get exploited and penetrated in their luscious gluons!
  
  --
  "Quoting famous computer scientists out of context is the root of all evil (or at least most of it) in programming." - K
2. Re:60% by lexarius · 2007-05-22 01:23 · Score: 5, Funny
  
  Talk like that gives me a large hadron.
3. Re:60% by rasputin465 · 2007-05-22 04:49 · Score: 2, Informative
  
  wow, i almost spit coffee all over my laptop when i read that. careful, yo.
Never mind the data by simong · 2007-05-21 21:34 · Score: 4, Interesting

What about the backups?
1. Re:Never mind the data by dylan_- · 2007-05-21 23:40 · Score: 2, Funny
  
  Nah, there are robots for that. Big robots.
  Tape backup?
  
  --
  Igor Presnyakov stole my hat
re: 15 petabytes? by GNUThomson · 2007-05-21 21:43 · Score: 2, Funny

The real fundamental question is not about the beginning of the universe, but something much, much more important: Are they going to back up the data?
On the other hand, I'm sure it will be available on some torrent soon.
Neutrinos by MichaelSmith · 2007-05-21 21:46 · Score: 4, Funny

I hope they're planning on running their own fiber optic line across the Atlantic

You know with the right sort of particle accelerator you could send messages straight through the Earth and save a heap of latency.

--
http://michaelsmith.id.au
1. Re:Neutrinos by Dunbal · 2007-05-22 00:10 · Score: 4, Funny
  
  You know with the right sort of particle accelerator you could send messages straight through the Earth and save a heap of latency.
  
  It's called the "Death Star" project, and we've been having a hell of a time with the receiver...
  
  --
  Seven puppies were harmed during the making of this post.
GASP by Excelcia · 2007-05-21 21:47 · Score: 2, Funny

Lepton dancers wearing gluons.... WHOA!
Never underestimate the bandwidth of a 747 by Rix · 2007-05-21 21:48 · Score: 2, Insightful

So long as it's not needed right now pretty much any amount of data can be transmitted.
1. Re:Never underestimate the bandwidth of a 747 by MikShapi · 2007-05-21 23:44 · Score: 3, Informative
  
  That's a highly misleading figure (whatever figure you had in mind).
  
  When you add up the time, money, kit and effort that would go into burning that many optical disks or filling that many hard drives, then connecting them at the other end and reading the data out, it becomes less attractive than fiber optics.
  
  On the other hand, if the 747 is crammed full of ultra-high-capacity hard drives (say, the new Hitachi 1TB) in high-density racks that do not need unloading from the aircraft (it lands, plugs into a power/multiple-10GbE grid, offloads the data to a local ground facility, then goes out for the next run), you get something that could be competitive with fiber, as well as a possible business model.
  
  You would, of course, need someone willing to pay the rough equivalent of... say... 500 economy airline tickets (shooting from the hip here; I tried compounding business/first-class costs) to get that through. That's a lot of cash. Then again, at 1TB/drive, it's a LOT of data.
  
  --
  -
2. Re:Never underestimate the bandwidth of a 747 by Dunbal · 2007-05-22 00:12 · Score: 4, Funny
  
  If they had an A380 (Airbus for teh win ;-)) worth of hard drives installed and ready to tap data, they would not need to move all that data.
  
  I'm sorry, how much is that in Cessna 172's again?
  
  --
  Seven puppies were harmed during the making of this post.
3. Re:Never underestimate the bandwidth of a 747 by fbjon · 2007-05-22 02:47 · Score: 5, Informative
  
  We obviously want maximum storage per unit of drive weight, which currently means the Hitachi Deskstar 7K1000: 1,000,000,000,000 bytes for at most 700 grams.
  Using the maximum payload weight of an A380F (freighter model), we get with Google calc: (152 400 kg / 700 grams) * 1Tbytes = 193.36913 petabytes, which is 12.8912753 years worth of CERN CMS data over a maximum distance of 5,600 nautical miles.
  The maximum useful load of a Cessna 172 is 371 kg, which gives a meager 0.0313823042 years worth of data over a maximum distance of 687 nm.
  The raw distance between CERN and Purdue University (not including distances to airports and such) is about 3838 nm, well within range of the A380F. The Cessna 172 falls into the ground/ocean long before that however. Since there's no air-refueling option for the Cessna, the plan calls for a fleet of at least 179 Cessna 172's constantly working in relay, just to keep up with the data production rate!
  So, to answer your question: If you want the same leisurely pace of using one A380F, you'll need a massive 2148 Cessnas flying for a full year, every 12 years (the total weight of which is equivalent to 531 A380F's, which should tell you something about the efficiency of said plan).
  
  --
  True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
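
fbjon's numbers can be reproduced in a few lines. Two details matter: each drive holds 10^12 bytes (not bits), and Google's calculator treats a petabyte as 2^50 bytes, which is where 193.369 comes from. The sketch uses the post's payload and drive-mass figures as given; the constant and function names are illustrative.

```python
# Recompute the "hard drives in a plane" figures from the post.
# Google calculator's "petabyte" is binary (2**50 bytes), so 2**50 is used,
# and the yearly 15 PB is treated the same way to match the post.

DRIVE_MASS_KG = 0.7      # one 1 TB drive, per the post
DRIVE_BYTES = 10**12     # 1 TB as sold (decimal)
PIB = 2**50              # binary petabyte
CERN_PIB_PER_YEAR = 15   # yearly volume as used in the post


def payload_capacity_pib(payload_kg: float) -> float:
    """Storage carried by a payload made entirely of drives, in PiB."""
    drives = payload_kg / DRIVE_MASS_KG
    return drives * DRIVE_BYTES / PIB


def years_of_data(payload_kg: float) -> float:
    """How many years of CERN output one full payload can carry."""
    return payload_capacity_pib(payload_kg) / CERN_PIB_PER_YEAR


for plane, payload in [("A380F", 152_400), ("Cessna 172", 371)]:
    print(f"{plane}: {payload_capacity_pib(payload):.2f} PiB, "
          f"{years_of_data(payload):.4f} years of data")
```

This gives about 193.37 PiB and 12.89 years for the A380F and about 0.0314 years for the Cessna, matching the post.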
22 Internets per year? by UnHolier+than+ever · 2007-05-21 21:54 · Score: 4, Funny

Would that be 0.84 Internet per fortnight? Or 1 kiloLibrary per Congress session? How much in tubes?
1. Re:22 Internets per year? by AndyboyH · 2007-05-21 22:37 · Score: 2, Funny
  
  How much in tubes?
  
  Too much, and that's why we should pay the good companies all our hard earned cash to drill giant tubes for all our torrents, MP3s, smut and VoIP calls. Or at least, wasn't that what they were arguing for? ;)
  
  --
  Baka Drew
2. Re:22 Internets per year? by SharpFang · 2007-05-21 23:09 · Score: 2, Funny
  
  The tube radius of 420 attoparsecs.
  
  OTOH owning the harddrives capable of holding this much data gives you about 730 kilometers of e-penis.
  
  --
  45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Re:Is there a danger or isn't there? by SamSim · 2007-05-21 22:03 · Score: 4, Funny

all the particle colliders of the most recent generation (like the Tevatron at Fermilab or the Relativistic Heavy Ion collider in New York) have the capability (if certain theoretical models are accurate enough) to generate very tiny (around nine millimeters), but stable black holes (though the probability is extremely low)

Well, yeah, but the probability is about the same as that of you generating a small black hole by clapping your hands together really hard.

--
qntm.org
All pages are identical by Laxator2 · 2007-05-21 22:14 · Score: 5, Interesting

The main difference between the LHC data and the Internet is that all 15 PB of LHC data will come in a standard format, so a search is much easier to perform. In fact most of the search will consist of discarding non-interesting stuff while attempting to identify the very rare events that may show indications of new particles (the Higgs, for example). The Internet is a lot more diverse; the variety of information dwarfs the limited number of patterns the LHC is looking for, so "no search available" for LHC data sounds more like "no search needed".
1. Re:All pages are identical by Anonymous Coward · 2007-05-22 00:13 · Score: 2
  
  Actually, "discarding non-interesting stuff" is exactly how particle physicists work! Look at the design of a big experiment: all the different parts of the detector are there to work out which particles are boring, well-understood stuff like muons, and discard them.
  
  I doubt they'll actually delete any of this data once they have it safely on disk, but you can bet your life that most of it is going to be filtered out and basically ignored.
2. Re:All pages are identical by Laxator2 · 2007-05-22 00:28 · Score: 2, Insightful
  
  What I mean by "discarding non-interesting stuff" is not actually deleting the data from disk. If that were the case, what need would there be for 15 PB of storage? The thing is that what the LHC people (and the whole physics community) want very badly is some signature of new physics. That means either the Higgs, supersymmetric partners of known particles, or even microscopic black holes (most people are skeptical about that, but look anyway at http://www.slac.stanford.edu/spires/find/hep/www?rawcmd=f+a+thomas+and+giddings&FORMAT=WWW&SEQUENCE= to see how many times it has been cited; that gives an idea of how many papers have been written on the subject). The "non-interesting stuff" will be used to improve current limits on experimental data, but if nothing genuinely new is found, it is very likely that the LHC will be the last large particle accelerator ever built.
3. Re:All pages are identical by vondo · 2007-05-22 04:25 · Score: 2, Informative
  
  More like 6 or more extra zeros, actually. There seems to be a lot of confusion about this, so let me try to explain.
  
  Generally the data coming out of these experiments is filtered in two or more stages. It has to run in real time since the data volume is enormous: a detector like this can easily spew out several TB a second of raw data. The first layer of filtering will look at very small portions of the data and make very loose requirements on it, but can run very fast in dedicated electronics. This might discard 99.99% of the events and keep 90% of the interesting stuff, for instance. Now you have a much smaller volume of data, so you can afford to spend more time on it. So maybe you run a pared-down version of the full reconstruction software. This is much more sophisticated software, so maybe you can get rid of 99% of what remains and only toss out 10% of the interesting interactions. This stage might be done on a cluster of 1000 computers or more. At the end, you've kept one out of a million events and only thrown away about 20% of what might be useful. But you need both steps. Skip the first step and you need a network with 10,000 times as much bandwidth and a computer cluster of 10 million computers. Skip the second step and instead of 100 PB of storage, you need 10,000 PB. And you need to deal with all that data in the next step.
  
  The initial filtering is not the end of the story. The one event in a million that passes will be reconstructed with the full, best software available along with the other billion events that pass. Then those will be filtered again based on different types of physics signatures and sent to the researchers looking at that one particular type of interaction. This process also requires thousands of CPUs. The big LHC experiments will have 40 million interactions/second and each interaction might contain 25 collisions. The vast majority of these are understood (not interesting) but the challenge is to sort through those 1 billion interactions a second in a finite amount of time to find the interesting ones. The two stages I've described are called "triggering" and "offline event reconstruction and filtering" if you want to try to find out more.
  
  There go the mod points I assigned earlier in this discussion.
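
The two-stage bookkeeping vondo describes is a product of per-stage keep fractions. A small sketch using his illustrative numbers, which are not actual LHC trigger settings:

```python
# Chain per-stage keep fractions to get the overall event rejection and
# signal efficiency. The fractions are the post's illustrative numbers.
from math import prod


def chain(stages):
    """stages: list of (fraction_of_all_events_kept, fraction_of_signal_kept)."""
    event_keep = prod(s[0] for s in stages)
    signal_keep = prod(s[1] for s in stages)
    return event_keep, signal_keep


stages = [
    (1e-4, 0.90),  # hardware trigger: drops 99.99% of events, keeps 90% of signal
    (1e-2, 0.90),  # software trigger: drops 99% of the rest, keeps 90% of signal
]
event_keep, signal_keep = chain(stages)
print(f"1 in {1 / event_keep:,.0f} events kept")       # 1 in 1,000,000
print(f"{signal_keep:.0%} of interesting events kept")  # 81%
```

The combined signal efficiency is 81%, so "about 20%" thrown away is a rounding of 19%.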
Gaaa aaaaa aaaaaaa by $RANDOMLUSER · 2007-05-21 22:30 · Score: 3, Funny

"Like an exercise session getting you ready for the big game, we've been going to the physics gym," Hacker says
Must. Erase. Image.
Physics locker room.

--
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
I predict the end of the universe by jamesh · 2007-05-21 22:49 · Score: 4, Funny

This is really bad news. By defining the amount of data in LoC's, they leave themselves open to a huge exploit... If the LoC ever includes this data, then there will be a recursive loop of definitions and the LoC will expand to fill the universe.

Okay... maybe not, but if they ever did put this data in the LoC, the effort required to re-factor all the LoC based measurements would bankrupt the world. And the confusion that goes on while this re-factoring is happening will surely crash at least one probe into Mars, where the English have used the new LoC units and the Americans will have used the old LoC units.
1. Re:I predict the end of the universe by TapeCutter · 2007-05-21 23:14 · Score: 4, Interesting
  
  It seems the metric LoC = 10TB. If that is so, then an LoC is no longer based on a physical library but has been redefined in terms of a more basic unit of information (i.e., the byte). This sort of thing has happened before: the standard unit of time (the second) is no longer based on the Earth's rotation; rather, it is based on some esoteric (but very stable) feature of cesium atoms.
  
  IMHO: This is a GoodThing(TM); it could mean the LoC is well on its way to becoming an accepted SI unit. :)
  
  --
  And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
So.. by mindwhip · 2007-05-21 22:59 · Score: 2

FTA:"catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang."

Who was at the Big Bang to see them then? I suspect that the numbers are a lot lower than the number of people that heard that tree fall in the woods and heard the sound of one hand clapping put together.

--
[The Universe] has gone offline.
Worst Hyperbole Ever... by Glock27 · 2007-05-21 23:12 · Score: 3, Insightful

The collider will smash protons together hoping to catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang.
That line is some of the worst hyperbole ever. Here's why. First, there was (almost by definition) no one there to 'see' anything at the Big Bang. (Supernatural explanations aside, and this purports to be a science article.) Second, these subatomic particles are formed frequently in nature, as high-energy astronomy has found various natural particle accelerators that are FAR more powerful than anything we're likely to build on Earth.
One hopes the author will do better next time.

--
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
Bush and his internets by cl191 · 2007-05-21 23:15 · Score: 2, Funny

It would be the equivalent of 22 Internets So our President was right about the "Internets" after all, he must have access to a few of those 22 Internets!
Re:Remember by kramulous · 2007-05-21 23:18 · Score: 2

I'm willing to bet that they're all over it. And have even considered the possibility of a lot more than your 'average' figures given that a significant event may increase this data deluge. There is a lot at stake with this experiment (series of). A lot of future funding is dependent on how well this project has been managed, down to the smallest (pun originally not intended) detail.

--
.
Re:Disturbing and unsettling by Anonymous Coward · 2007-05-22 00:35 · Score: 2, Informative

"fixed without delaying the project" says the parent!
Truth: There are several news agencies that have booked flights to descend upon CERN at the "supposed" start of the LHC in November. What will they come and see, lots of hype and not much!
What will happen? Single beam commissioning earliest in May. Collisions probably in August. Not earlier.
I hate being an Anon Coward, but there you go... Yes, I am sitting in a CERN office right now.
Re:Remember by bockelboy · 2007-05-22 00:51 · Score: 5, Informative

I do work with one of the LCG projects, so let me share some of my personal opinions with you (all this info is mostly available on the web, if you can find it. We keep no secrets.).
I don't think CERN has HTTP/FTP servers right on a OC Internet backbone, or the server structure (think magnitudes greater than Google's) to drive the data.
Oh yes we do. You are right, though: buying network bandwidth is a lot more straightforward than building a disk/server infrastructure to handle all the data. It's difficult, but it's being accomplished.

I think total - transatlantic fiber plus the European equivalent of Internet2 - bandwidth to CERN will amount to 100 Gbps - about 10 OC-192s. Universities buy into private global fiber networks, which are independent of the public internet.

We then use gridFTP as a transport, which is basically PKI-protected FTP which transfers in N many parallel TCP streams. Then, we use a protocol called SRM to control the gridFTP transfers and (well, the CMS experiment) uses a higher-level application called PhEDEx to control worldwide data movement. Right now, PhEDEx directs about 8-10 Gbps worldwide, and we aren't "doing anything" big.
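
The parallel-stream idea behind GridFTP (split one transfer into byte ranges and move them concurrently) can be illustrated with a toy. This is not GridFTP, SRM or PhEDEx and uses none of their APIs: it copies byte ranges between local files with a thread pool standing in for separate TCP connections, and the function names are hypothetical.

```python
# Toy illustration of striping one transfer over N parallel streams.
# NOT GridFTP: byte ranges are copied between local files by threads,
# which stand in for separate network connections.
import os
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size: int, n_streams: int):
    """Split [0, size) into n_streams contiguous (offset, length) chunks."""
    base, extra = divmod(size, n_streams)
    offset = 0
    for i in range(n_streams):
        length = base + (1 if i < extra else 0)
        yield offset, length
        offset += length


def copy_range(src: str, dst: str, offset: int, length: int) -> None:
    """Copy one byte range in place; each stream uses its own file handles."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))  # whole chunk in memory: fine for a toy


def parallel_copy(src: str, dst: str, n_streams: int = 4) -> None:
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # pre-size the destination so streams write in place
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        futures = [pool.submit(copy_range, src, dst, off, ln)
                   for off, ln in byte_ranges(size, n_streams)]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
```

On a real long-distance link the benefit comes from running several TCP connections at once, so that no single connection's window limits total throughput; a local copy like this only demonstrates the range splitting and reassembly.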

GridFTP is a fairly effective protocol. I can get near-line speed - 2Gbps from a channel bonded RAID device. Locally, we've been buying large RAIDs - 30TB a box, building up to 200TB this fall. Some sites take a more "clustered" approach - they put a few 500-750 GB drives in each of the cluster's worker nodes, and build up to 200TB that way. Costs are lower, but you have to keep 2 copies of each file in the cluster, plus have the headache of swapping out drives. Of course, I like our method better. In addition, larger, T1 sites have a few petabytes in tape silos.

Funding agencies don't just throw money into projects for years at a time, then wait for results. Two years ago, we did a test at 25% of the turn-on "complexity" (in terms of jobs run and data movement). Last year, we increased that to 50% complexity. Toward the end of this summer, we will have a challenge called CSA07 which should be between 75-100% complexity. Finally, turn-on should be around November this year.

This is a multi-billion dollar project which has been under development for 10-15 years. We've been doing lots and lots of careful planning.
Don't forget the security... by siasl · 2007-05-22 00:52 · Score: 2

The NSA will have to scan the data for potential terrorist Tachyons hiding among the Bosons. That will slow things down a bit.
Think for a moment by kilodelta · 2007-05-22 00:53 · Score: 2, Interesting

There are some other benefits to building such a huge network of high-powered computers. And it's not teleportation as you imagine it; it's more like copying the metadata and re-creating the original.

Think about it, the only thing stopping us is the ability to store and transfer large amounts of data necessary to describe the precise makeup of a human being. I have a feeling this project will branch off into that area.
1. Re:Think for a moment by Control+Group · 2007-05-22 01:23 · Score: 2, Funny
  
  kilodelta, I have someone I think you should meet. His name is Werner Heisenberg, and he's got some ideas that may interest you.
  
  --
  
  Reality has a conservative bias: it conserves mass, energy, momentum...
22 Internets? by Evil+Cretin · 2007-05-22 01:34 · Score: 2, Funny

Sounds like the article was written by Senator Stevens. Nothing to fear, 22 emails can't possibly clog our tubes.

--
"A deadlock has been reached. One task must die. We must now choose between murder and suicide."
Re: 15 petabytes? by databyss · 2007-05-22 01:52 · Score: 3, Funny

My quantum computer has been working on downloading the torrent for the past few weeks.

--
Hmmm witty sig or funny sig? Maybe elitest techy sig!
Re:Remember by Falstius · 2007-05-22 01:56 · Score: 2, Informative

Another thing to point out is that, at least for ATLAS, researchers don't get their data directly from CERN. CERN has fat dedicated pipes to what are called Tier-1 data centers, which are spread around the world. I think these centers build the raw data into structured events. Then there are smaller Tier-2 data centers (I worked for one of the Universities hosting a Tier-2 center) which get these structured events and that is where Joe physicist gets his data from. Also, these data centers have processing power on site to run programs submitted by physicists, so most of this data will never touch the everyday internet.

For some reason, ATLAS and CMS don't use the same techniques and technologies for just about anything from detector design down to the style of pen carried in their pocket protectors. So anything said for ATLAS does not necessarily hold true for CMS (the other big detector on the LHC).
22 Internets by rubberbandball · 2007-05-22 02:10 · Score: 2, Funny

That's a lot of tubes.

--
oh marmalade.