Slashdot Mirror


Storing CERN's Search for God (Particles)

Chris Lindquist writes "Think your storage headaches are big? When it goes live in 2008, CERN's ALICE experiment will use 500 optical fiber links to feed particle collision data to hundreds of PCs at a rate of 1GB/second, every second, for a month. 'During this one month, we need a huge disk buffer,' says Pierre Vande Vyvre, CERN's project leader for data acquisition. One might call that an understatement. CIO.com's story has more details about the project and the SAN tasked with catching the flood of data."

13 of 154 comments

  1. Um no...it's a product placement for Quantum by xxxJonBoyxxx · · Score: 4, Informative

    Um...no. Actually, it's a product placement PR piece about Quantum's StorNext. (Read page 2...)

  2. Re:Gigabits or Bytes? by snowraver1 · · Score: 2, Informative

    2.6 petabytes. The article says that they will be collecting petabytes of data. Also, the article clearly said GB: GB = gigabyte, Gb = gigabit. The thing that I thought was "Wow, that's a lot of blinking lights!" Sweet!

    --
    Copyright 2010. All rights reserved. This comment may not be copied in any way including, but not limited to caching.
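The 2.6 petabyte figure and the GB/Gb distinction in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) units, a 30-day month, and a sustained 1 GB/s as quoted in the summary; the function names are made up for illustration.

```python
# Assumptions (not from the article): SI units (1 GB = 10**9 bytes),
# a 30-day month, and a constant rate with no downtime.
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15
BITS_PER_BYTE = 8
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds


def month_total_pb(rate_gb_per_s: float) -> float:
    """Petabytes accumulated over one 30-day month at the given GB/s rate."""
    total_bytes = rate_gb_per_s * BYTES_PER_GB * SECONDS_PER_MONTH
    return total_bytes / BYTES_PER_PB


def gigabytes_to_gigabits(rate_gb_per_s: float) -> float:
    """Convert a rate in gigabytes/s (GB/s) to gigabits/s (Gb/s)."""
    return rate_gb_per_s * BITS_PER_BYTE


if __name__ == "__main__":
    print(month_total_pb(1.0))         # about 2.6 PB for the month
    print(gigabytes_to_gigabits(1.0))  # 1 GB/s is 8 Gb/s
```

Read as gigabits instead of gigabytes, the same headline number would be one eighth of the data, which is why the capitalisation matters here.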
  3. Re:PC's? by Falstius · · Score: 2, Informative

    Actually, there really is a gigantic room at CERN full of commodity PCs that form the first level of computing for the different experiments. The data is then shipped off to sites around the world for further processing. There is a combination of 'locally' distributed computing and world-wide grid being used.

  4. Re:PC's? by Anonymous Coward · · Score: 1, Informative

    Initially some data is filtered at the detector pits by the farms of PCs doing the triggering. After that the data will be fed to storage and analysis. CERN has been upgrading its computer centre for quite some time (the main problem is not the power supply but the cooling system, so some of the performance benchmarks also include it). Besides, CERN (Tier-0) will have high-speed connections (by means of the LCG backbone) with many sites around the world, and the data processing will be done in a 'global manner'.

    You can google the phrase 'service challenge' site:cern.ch, or just go to the LCG site.

    --
    Milosz

  5. Re:News for Nerds! by xyvimur · · Score: 4, Informative

    Just go there and take a guided tour. If you hurry you'll be able to go to the detector pit and see it. Otherwise, after start-up it will be inaccessible to visitors for the life cycle of the experiments (10-20 years). Google for CERN visit service.

    Milosz

  6. Re:Thousands of disk drives. by noggin143 · · Score: 5, Informative

    We are expecting to record around 15PB / year during the LHC running. This data is stored onto magnetic tape with petabytes of disk cache to give reasonable performance. A grid of machines distributed worldwide analyses the data. More details are available on the CERN web site www.cern.ch.

  7. Re:PC's? by Rodolpho Zatanas · · Score: 5, Informative

    load"*",8,1 would load something from a diskette, not a cassette.

  8. Not So Huge by PenGun · · Score: 5, Informative

    It's only 5x a single HD SDI channel (~200MB/s each). Any major studio could handle this with ease.

    SDI is how the movie guys move their digital stuff around. A higher end digital camera will capture at 2x HD SDI for a 2K res, 4:4:4 colour space. A few of 'em and you got your 1GB/s easy. Spools onto godlike RAID arrays.

    Get 'em to call up Warner Bros if they have problems.
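The SDI comparison above can be put in numbers. A rough sketch, assuming the nominal HD-SDI serial rate of 1.485 Gbit/s (the raw link rate, not the video payload) and SI units; the helper names are hypothetical.

```python
import math

# Assumption: nominal HD-SDI line rate of 1.485 Gbit/s per link.
# This is the raw serial rate; usable payload is somewhat lower.
HD_SDI_GBIT_PER_S = 1.485


def sdi_links_mb_per_s(links: int) -> float:
    """Aggregate raw rate in MB/s for the given number of HD-SDI links."""
    return links * HD_SDI_GBIT_PER_S * 1000 / 8


def links_needed(target_mb_per_s: float) -> int:
    """Smallest number of HD-SDI links whose raw rate meets the target."""
    return math.ceil(target_mb_per_s / sdi_links_mb_per_s(1))


if __name__ == "__main__":
    print(sdi_links_mb_per_s(1))  # roughly 186 MB/s per link
    print(sdi_links_mb_per_s(5))  # roughly 928 MB/s for five links
    print(links_needed(1000))     # links to reach a full 1 GB/s
```

At the nominal rate a single link is nearer 186 MB/s than 200 MB/s, so five links land a little under 1 GB/s; the comment's "5x" holds as a round figure.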

  9. 30 racks, $1.7M in disks by this great guy · · Score: 3, Informative

    Assuming a non-RAID 3x-replication tech solution (what Google do in their datacenters), using 500-GB disks (best $/GB ratio), they would need about 16 thousand disks:

    .001 (TB/sec) * 3600*24*30 (sec/month) * 3 (copies) * 2 (disk/TB) = 15552 disks

    Which would cost about $1.7M (disks alone):

    15552 (disk) * 110 ($/disk) = $1710720

    Packed in high-density chassis (48 disks in 4U, or 12 disks per rack unit), they could store this amount of data in about 30 racks:

    15552 (disk) / 12 (disk/rack unit) / 42 (rack unit/rack) = 30.9 racks

    Now for various reasons (vendor influence, inexperienced consultants, my experience in the IT world in general, etc.), I have a feeling they are going to end up with a solution that is unnecessarily complex, much more expensive, and hard to maintain and expand... Damn, I would love to be this project leader!
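The back-of-envelope sizing in this comment can be reproduced as a small script. This is only a sketch of the commenter's own arithmetic: every default (1 GB/s, 30 days, 3 copies, 500-GB disks, $110 per disk, 12 disks per rack unit, 42U racks) is taken from the comment, and the function name is made up for illustration.

```python
import math


def size_disk_farm(rate_gb_per_s=1, days=30, copies=3, disk_gb=500,
                   usd_per_disk=110, disks_per_rack_unit=12,
                   rack_units_per_rack=42):
    """Estimate disk count, disk cost and rack count for a month-long buffer.

    Integer arithmetic is used for the disk count to avoid float rounding.
    """
    seconds = days * 24 * 3600
    total_gb = rate_gb_per_s * seconds * copies
    disks = -(-total_gb // disk_gb)  # ceiling division
    cost_usd = disks * usd_per_disk
    racks = disks / disks_per_rack_unit / rack_units_per_rack
    return {
        "disks": disks,
        "cost_usd": cost_usd,
        "racks": racks,
        "whole_racks": math.ceil(racks),
    }


if __name__ == "__main__":
    estimate = size_disk_farm()
    print(estimate["disks"])            # 15552
    print(estimate["cost_usd"])         # 1710720
    print(round(estimate["racks"], 1))  # 30.9
    print(estimate["whole_racks"])      # 31
```

The estimate covers disks only. Chassis, servers, networking, power and cooling are left out, just as they are in the comment.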

  10. ALICE is not Higgs Hunting by Roger W Moore · · Score: 2, Informative

    The ALICE experiment is actually concentrating on heavy ion collisions, which is why they mainly worry about one month per year. The rest of the time the machine is running protons for the other experiments, ATLAS and CMS, which will look for the Higgs. ALICE will hopefully study the quark gluon plasma but, as far as I know, has no plans to look for the Higgs.

  11. The CIO article is incomplete by quarkie68 · · Score: 2, Informative

    OK, we got a halfway overview of CERN's decision, with some bold statements of questionable validity. I am submitting the criticism purely on the grounds of being really interested in large data storage. I don't work for any large storage vendor, but I am an architect of storage systems.

    First of all, the statement "and it's (StorNext) completely vendor independent": lots of other solutions provide flexibility in choosing the hardware vendor from a theoretical perspective. The theory says that if vendor A makes a SAN, vendor B makes a RAID controller, C a disk cabinet and D offers a clustered FS, and all comply with the relevant standards, you can plug them together and expect them to function. However, imperfections in the standards and hidden proprietary optimizations always dictate certain configs and combinations for optimum performance. There is a lot of work to be done in StorNext and other similar products before they can claim full flexibility. My experience in deploying a StorNext-based solution on a 1200-node setup says so. To keep the post short, I shall exclude vendor details at this stage, but if someone is interested, I am happy to go over them. There is vendor dependence if you want optimum performance. Not to mention that if you mix and match the RAID and SAN cards in the setup, any unfortunate issue might end up as a multi-headache, even if you have solution support (A blames B, B accuses A, and the game of ping-pong begins). You can never exclude vendor dependence in such a large setup; you have to deal with it.

    Then you have the "Clustered file systems are still an evolving category, she says, but enterprise IT is warming up to it." I can imagine what the author classes as enterprise IT here, but I think there is a bit of an orientation issue. CERN is not exactly the classical enterprise IT environment, is it? Not in terms of their requirements for resilience and capacity. These FAR EXCEED enterprise IT requirements. CERN is a research setup. And the mentality of a research setup (that incubated the WWW after all) is (or should be) that of innovation and playing with some of the latest and the greatest. In fact, some US-based research setups have long experimented with other cluster FSes. They are not warming up. CIO claims that StorNext is scalable. It is. But to what extent? Have they excluded, for example, things such as Lustre? http://wiki.lustre.org/index.php?title=Main_Page If yes, why?

  12. well if no one else is going to say it by Main Gauche · · Score: 3, Informative

    "Imagine how deep the personality problems must run in a person who gets all hot because of someone's DNA sequences!"

    You must be new here.

  13. Try 5,000 years ago by benhocking · · Score: 2, Informative

    I think you're thinking of that guy who got nailed to the cross (Jesus). Noah was born about 5,000 years ago.

    --
    Ben Hocking
    Need a professional organizer?