Slashdot Mirror


IT At the LHC — Managing a Petabyte of Data Per Second

schliz writes "iTnews in Australia has published an interview with CERN's deputy head of IT, David Foster, who explains what last month's discovery of a 'particle consistent with the Higgs Boson' means for the organization's IT department, why it needs a second 'Tier Zero' data center, and how it is using grid computing and the cloud. Quoting: 'If you were to digitize all the information from a collision in a detector, it’s about a petabyte a second or a million gigabytes per second. There is a lot of filtering of the data that occurs within the 25 nanoseconds between each bunch crossing (of protons). Each experiment operates their own trigger farm – each consisting of several thousand machines – that conduct real-time electronics within the LHC. These trigger farms decide, for example, was this set of collisions interesting? Do I keep this data or not? The non-interesting event data is discarded, the interesting events go through a second filter or trigger farm of a few thousand more computers, also on-site at the experiment. [These computers] have a bit more time to do some initial reconstruction – looking at the data to decide if it’s interesting. Out of all of this comes a data stream of some few hundred megabytes to 1Gb per second that actually gets recorded in the CERN data center, the facility we call "Tier Zero."'"
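The trigger cascade in the quote works like a chain of filters: each stage sees only what the previous stage kept. Below is a minimal Python sketch of that idea. The 25 ns bunch spacing and the ~1 PB/s raw versus ~1 GB/s recorded figures come from the quote. Reading the quote's "1Gb" as gigabytes is an assumption. The `Event` fields, the energy thresholds and the `level1`/`high_level` predicates are hypothetical stand-ins, not any experiment's real trigger logic.

```python
from typing import Callable, Iterable, Iterator, Sequence

# Timing and data-rate figures from the interview quote.
NS_PER_S = 1_000_000_000
BUNCH_SPACING_NS = 25            # time between bunch crossings
RAW_BYTES_PER_S = 10**15         # ~1 PB/s if every collision were digitized
RECORDED_BYTES_PER_S = 10**9     # upper end (~1 GB/s) written to Tier Zero

crossing_rate_hz = NS_PER_S // BUNCH_SPACING_NS                # 40,000,000 crossings/s
reduction_factor = RAW_BYTES_PER_S // RECORDED_BYTES_PER_S     # 1,000,000x less data kept

# Hypothetical event record and trigger predicate types, for illustration only.
Event = dict
Trigger = Callable[[Event], bool]


def run_cascade(events: Iterable[Event], stages: Sequence[Trigger]) -> Iterator[Event]:
    """Pass events through each trigger stage in order; only survivors reach the next."""
    stream: Iterable[Event] = events
    for stage in stages:
        stream = filter(stage, stream)   # lazy: nothing is evaluated until iterated
    return iter(stream)


def level1(event: Event) -> bool:
    """Hypothetical fast, coarse cut (stands in for the first trigger farm)."""
    return event["energy_gev"] > 20.0


def high_level(event: Event) -> bool:
    """Hypothetical slower cut using a little reconstruction (the second farm)."""
    return event["energy_gev"] > 100.0 and event["n_tracks"] >= 2


if __name__ == "__main__":
    sample = [
        {"id": 1, "energy_gev": 5.0, "n_tracks": 1},
        {"id": 2, "energy_gev": 50.0, "n_tracks": 3},
        {"id": 3, "energy_gev": 150.0, "n_tracks": 1},
        {"id": 4, "energy_gev": 300.0, "n_tracks": 4},
    ]
    kept = [e["id"] for e in run_cascade(sample, [level1, high_level])]
    print(crossing_rate_hz, reduction_factor, kept)  # 40000000 1000000 [4]
```

Because `filter` is lazy, events stream through both stages one at a time instead of being collected between stages, which loosely mirrors how the trigger never holds the full raw stream.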

19 of 248 comments

  1. Call the Interns! by Sponge+Bath · · Score: 3, Funny

    We need backup on floppy disk.

    1. Re:Call the Interns! by DigiShaman · · Score: 3, Funny

      You don't want them to be idle, now do you? Use punch cards instead. Trust me.

      -BOFH

      --
      Life is not for the lazy.
  2. Keeping us humble... by Anonymous Coward · · Score: 3, Interesting

    My wife, a staff physicist at FermiLab in their computing division, manages to keep me humble when I talk about the "big data" work I'm doing in my commercial engineering position. I think having to deal with a billion or so data points per day is big... Not so much in her universe!

    1. Re:Keeping us humble... by plover · · Score: 4, Funny

      And we jokingly call our data center the "Large Software Collider". Not as funny when the real thing is even bigger!

      --
      John
    2. Re:Keeping us humble... by somersault · · Score: 2

      Of course hadrons are bigger than softwares, not to mention a lot more fun in collisions.

      --
      which is totally what she said
    3. Re:Keeping us humble... by ethanms · · Score: 2

      My wife, a staff physicist at FermiLab in their computing division

      Much like the HB itself, up until recently I assumed these were only theoretical...

  3. GRID ack by PiMuNu · · Score: 3, Interesting

    I tried using the GRID - it's deeply embedded in acronyms and crud, practically impossible to use without a PhD. For crying out loud, it's just a batch farm!

  4. pretty well described on the LHC-CMS websites by peter303 · · Score: 2

    I was looking up how complicated the detectors were, and they were. They have 75M directional sensors and 9K energy detectors (calorimeters), each of which is analyzed 40M times a second for "interesting" events. One out of a billion may be recorded for subsequent deep analysis.
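    Taking the comment's figures at face value (they are not checked here against CMS documentation), a quick Python sketch of what they imply: total channel readings per second, and how many samples survive per day at one in a billion. The function names are hypothetical helpers for this illustration.

```python
# Figures as stated in the comment, not verified against official CMS numbers.
TRACKER_CHANNELS = 75_000_000      # "75M directional sensors"
CALORIMETER_CHANNELS = 9_000       # "9K energy detectors"
SAMPLES_PER_SECOND = 40_000_000    # "analyzed 40M times a second"
KEEP_ONE_IN = 1_000_000_000        # "one out of a billion"
SECONDS_PER_DAY = 24 * 60 * 60


def channel_readings_per_second(tracker: int, calorimeter: int, rate_hz: int) -> int:
    """Total individual channel readings per second across both detector types."""
    return (tracker + calorimeter) * rate_hz


def recorded_per_day(rate_hz: int, keep_one_in: int) -> int:
    """Samples kept per day if only one in `keep_one_in` is recorded."""
    return rate_hz * SECONDS_PER_DAY // keep_one_in


readings = channel_readings_per_second(TRACKER_CHANNELS, CALORIMETER_CHANNELS, SAMPLES_PER_SECOND)
kept = recorded_per_day(SAMPLES_PER_SECOND, KEEP_ONE_IN)
print(f"{readings:,} channel readings/s, {kept:,} samples kept/day")
# 3,000,360,000,000,000 channel readings/s, 3,456 samples kept/day
```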

  5. Re:I don't really give a s h i t by Fusselwurm · · Score: 2

    What do you want to imply?

    That, somehow, he who does not know how to debug the kernel should not play with bit operations?

    Something like that?

    Or, that we should stop researching the structure of the universe, and instead focus on what we usually do, which is making war, screwing other people and posting photos of our dicks on teh internet?

  6. And Still. by CimmerianX · · Score: 4, Funny

    The head researcher will STILL come to IT and ask them to please help him sync his Outlook contacts to his phone.

  7. grep by atisss · · Score: 2

    So they just used grep

  8. Which amounts to... by Travelsonic · · Score: 2

    Roughly, rounding off to 53 weeks/year: if you do 1 petabyte/sec and transfer that much constantly, that would be roughly 288,720,000,000,000,000,000,000 BITS [individual 1s or 0s] per year (about 2.9 × 10^23, counting a petabyte as 2^50 bytes).
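    The arithmetic can be checked with a short Python sketch. It uses the comment's 53-week year and compares a binary petabyte (2^50 bytes, which reproduces the comment's leading digits 28872) with an SI petabyte (10^15 bytes). The `bits_per_year` helper is a hypothetical name for this illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
SECONDS_PER_YEAR = 53 * SECONDS_PER_WEEK   # the comment's rounded-up 53-week year

PETABYTE_BINARY = 2**50    # 1,125,899,906,842,624 bytes (pebibyte, PiB)
PETABYTE_DECIMAL = 10**15  # SI petabyte, PB


def bits_per_year(bytes_per_second: int, seconds: int = SECONDS_PER_YEAR) -> int:
    """Bits moved by a link running flat out for `seconds` at `bytes_per_second`."""
    return bytes_per_second * 8 * seconds


print(f"{bits_per_year(PETABYTE_BINARY):.4e}")   # 2.8872e+23
print(f"{bits_per_year(PETABYTE_DECIMAL):.4e}")  # 2.5644e+23
```

    Either way the total is a 24-digit number of bits, so the choice of unit changes the result by about 12%, not by orders of magnitude.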

    --
    If you believe in privacy, and believe you have "nothing to hide" at the same time, you're a goddamned idiot
  9. Power limitations by onyxruby · · Score: 3, Informative

    Did a bunch of work with some stock exchanges a few years back. It was an interesting environment, and I see that CERN had the same problems that the stock exchanges had. They even had the same situation where the number one budgetary item wasn't cost but electric load.

    You only had so much power physically available in the data centers next to the exchanges and in the server rooms inside them. Monetary cost was never an issue, but electric load was everything. It seems funny, considering their load is strictly science-based rather than monetary, but their requirements and distribution remind me greatly of the exchanges.
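    When power rather than money is the binding limit, capacity planning reduces to dividing the usable electrical budget by per-server draw. Here is a minimal Python sketch, assuming the facility overhead is expressed as PUE (power usage effectiveness: total facility power divided by IT power). All figures and the `max_servers` helper are hypothetical, not numbers from CERN or any exchange.

```python
import math


def max_servers(facility_kw: float, pue: float, server_kw: float) -> int:
    """How many servers fit when electrical load, not money, is the limit.

    facility_kw: total power the site can draw (hypothetical figure)
    pue: total facility power / IT power, so 1.5 means 1/3 goes to overhead
    server_kw: average draw of one server under load (hypothetical figure)
    """
    if pue < 1.0 or server_kw <= 0:
        raise ValueError("PUE must be >= 1 and per-server draw must be positive")
    it_budget_kw = facility_kw / pue   # power left for IT after cooling and overhead
    return math.floor(it_budget_kw / server_kw)


print(max_servers(facility_kw=1000.0, pue=1.5, server_kw=0.4))  # 1666
```

    In this model the only ways to add machines are better cooling (lower PUE) or lower-power servers, which matches the point that buying more hardware does not help once the power budget is spent.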

  10. Re:You mean... by cduffy · · Score: 4, Interesting

    VMware is pretty widely recognized as the king of virtualization, at least so long as you aren't concerned with money. Its overhead is far smaller than the others, especially when dealing with huge numbers of connections, and it simply has more features than its competitors.

    Which doesn't mean those features are implemented well.

    Not so long ago, I built an automated QA platform on top of Qumranet's KVM. Partway through the project, my employer was bought by Dell, a VMware licensee. As such, we ended up putting software through automated testing on VMware, manual testing on Xen (legacy environment, pre-acquisition), and deployment to a mix of real hardware and VMware.

    In terms of accurate hardware implementation, KVM kicked the crap out of what VMware (ESX) shipped with at the time. We had software break because VMware didn't implement some very common SCSI mode pages (which the real hardware and QEMU both did), we had software break because of funkiness in their PXE implementation, and we otherwise just plain had software *break*. I sometimes hit a bug in the QEMU layer KVM uses for hardware emulation, but when those happened, I could fix it myself half the time, and get good support from the dev team and mailing list otherwise. With VMware, I just had to wait and hope that they'd eventually get around to it in some future release.

    "King of virtualization"? Bah.

  11. Re:I don't really give a s h i t by Joce640k · · Score: 2

    News that matters? The human race is not even able to handle itself and it wants to play with atoms.

    I assume you're using a computer to post that? Maybe own a cellphone...?

    That makes you a hypocrite of the worst kind. Sorry, but there it is in black and white.

    --
    No sig today...
  12. Re:You mean... by X0563511 · · Score: 2

    The King Joffrey of virtualization, perhaps.

    --
    For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  13. Re:They need some serious... by X0563511 · · Score: 2

    That takes time. Time vs Space tradeoff.

    --
    For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  14. CERN network architecture by Joshua+Fan · · Score: 2

    Those with further interest in the article may find this informative:

    http://www.geant2.net/upload/pdf/LHC_networking_v1-9_NC.pdf

    Apparently, CERN uses BGP between T0 and T1, and uses only ACLs, no firewalls, for security.
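    A router ACL is stateless: each packet is judged only on its own headers against an ordered rule list, and the first match wins. A stateful firewall also tracks connections. Below is a minimal Python sketch of first-match ACL evaluation. The rules use RFC 5737 documentation prefixes and are hypothetical, not CERN's real configuration. `AclEntry` and `evaluate` are illustrative names, not any router's API.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AclEntry:
    action: str                      # "permit" or "deny"
    source: ipaddress.IPv4Network    # source prefix to match
    dst_port: Optional[int] = None   # None matches any destination port


def evaluate(acl: Sequence[AclEntry], src_ip: str, dst_port: int, default: str = "deny") -> str:
    """Stateless, first-match check: nothing is remembered between packets."""
    addr = ipaddress.ip_address(src_ip)
    for entry in acl:
        port_ok = entry.dst_port is None or entry.dst_port == dst_port
        if addr in entry.source and port_ok:
            return entry.action
    return default  # implicit deny when no entry matches


# Hypothetical rules on documentation prefixes, for illustration only.
ACL = [
    AclEntry("deny", ipaddress.ip_network("198.51.100.128/25")),
    AclEntry("permit", ipaddress.ip_network("198.51.100.0/24"), dst_port=443),
    AclEntry("permit", ipaddress.ip_network("203.0.113.0/24")),
]

print(evaluate(ACL, "198.51.100.7", 443))    # permit
print(evaluate(ACL, "198.51.100.200", 443))  # deny (first entry wins)
print(evaluate(ACL, "192.0.2.1", 443))       # deny (no match, default)
```

    Because rule order decides the outcome, the narrower deny has to come before the broader permit. With no per-connection state to keep, the per-packet cost does not grow with the number of open transfers.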

  15. Re:Large Organization Has 2 Data Centers by HappyPsycho · · Score: 2

    'Score: 3, Funny' - This is hilarious, from TFA:

    'The Tier Zero facility is the central hub of the Worldwide LHC Computing Grid, which also connects to some dozen ‘Tier One’ data centres for near-real time storage and analysis of data and over 150 ‘Tier Two’ data centres for batch analysis of experiment data.'