Slashdot Mirror


Storing CERN's Search for God (Particles)

Chris Lindquist writes "Think your storage headaches are big? When it goes live in 2008, CERN's ALICE experiment will use 500 optical fiber links to feed particle collision data to hundreds of PCs at a rate of 1GB/second, every second, for a month. 'During this one month, we need a huge disk buffer,' says Pierre Vande Vyvre, CERN's project leader for data acquisition. One might call that an understatement. CIO.com's story has more details about the project and the SAN tasked with catching the flood of data."

23 of 154 comments

  1. News for Nerds! by KlomDark · · Score: 4, Insightful

    Wow! Actually geeky science news, not enough of that here lately!

    1. Re:News for Nerds! by zeugma-amp · · Score: 3, Interesting

      Interesting article.

      Many years ago, when the SSC (Superconducting Super Collider) was still being built in Texas, I went to an HP users group meeting, as I was working primarily with HP-3000 systems at the time. The fellow addressing the meeting was the head of the physics department at the SSC. It was a really neat presentation in which he described a similar data storage requirement, though orders of magnitude smaller: he was talking about terabytes of data per month, IIRC. At the time, they were planning to use two arrays of 40 workstation computers to handle the load. This would have been a fairly early loosely coupled setup, similar to a Beowulf cluster.

      After the presentation I went up to him and told him that all I wanted to do is sell him mag-tapes.

      These types of experiments evidently produce tons of data. I wonder if the processing could be parcelled out like Stanford's Folding@home or SETI@home to speed up data correlations.

      --
      This is an ex-parrot!
    2. Re:News for Nerds! by Anonymous Coward · · Score: 3, Insightful

      I've often wondered if I could sneak into CERN and just look around. I think the only two things you would need are a white lab coat and a really grizzled look on your face.

      I remember when I was under 18 I used to go to a lot of places I wasn't allowed in, just to check things out. I wasn't a malicious kid who would run around breaking things for fun; I just loved seeing things that most people never see or think about, especially feats of engineering.

      When I turned 18 I looked back and was actually sad I didn't do it more often. After 18 you don't just get escorted out with a warning. Now that I'm older I'm really, really sad for the upcoming generations. Genuinely good kids won't go peeking around at stuff as often, and the ones who do will be severely punished because everyone will think they are 'terrorists'.

      For many of the up-and-coming kids, all they have to look forward to are pointless, unnecessary techno gadgets and the warped MTV social culture, where money, drugs, and sex are all they are taught to appreciate and strive for.

    3. Re:News for Nerds! by Rodolpho+Zatanas · · Score: 5, Interesting

      From my experience, generic blue work clothes (preferably with your name on the breast pocket) work best. I once got into some research facility (they had lasers and everything) because I got out of the elevator on the wrong floor and some guy in a lab coat opened the door for me (I was wearing my work clothes because I was on my lunch break). I wandered about the place for something like 10 minutes before I found a way out. There was even a security guy of some type sitting in a hallway, but he lost interest in me after I looked him in the eye and said hello.

    4. Re:News for Nerds! by xyvimur · · Score: 4, Informative

      Just go there and take a guided tour. If you hurry, you'll be able to go down to the detector pit and see it; once the experiments start up, it will be inaccessible to visitors for their whole life cycle (10-20 years). Google for the CERN visit service.

      Milosz

  2. If Only... by i_ate_god · · Score: 4, Funny

    If only I could get porn that fast

    there I said it, let's move on now.

    --
    I'm god, but it's a bit of a drag really...
  3. Um no...it's a product placement for Quantum by xxxJonBoyxxx · · Score: 4, Informative

    Um...no. Actually, it's a product placement PR piece about Quantum's StorNext. (Read page 2...)

    1. Re:Um no...it's a product placement for Quantum by Anonymous Coward · · Score: 5, Funny

      Um...no. Actually, it's a product placement PR piece about Quantum's StorNext. (Read page 2...)
      We knew there were some serious nerds on Slashdot, but to be potential customers for the same RAID system as CERN, whoa! :)
    2. Re:Um no...it's a product placement for Quantum by Midnight+Warrior · · Score: 5, Insightful

      You may think of it as product placement, but I use it. I even provide the occasional blog entry on it on Advanced Topics. I sat through a Red Hat performance tuning class that was quite excellent. But when they came to the part about ext3 and tuning it, well, let's face it - ext3 just isn't going to scale. I started with Veritas' filesystem, which is pretty nice. If you're a small-time admin, then you never get beyond a local, 4U disk array. Once your group spends more than US$2 million on servers, though, it's obvious what the problem is: Storage - The Final Frontier. SAN and clustered filesystems allow a level of scalability completely unheard of before.

      They also completely left out anything but a tagline of their multi-tiered solution. I wish they'd talked more about how CERN supports 500Gbit per second aggregate throughput to their disks (at least they implied that). 50GB/sec (or so) is probably the toughest I/O problem you've ever dealt with, or will deal with for a long time. Whose RAID controllers did they use? Did they focus on speed (ASIC and ISL minimization), availability (redundant fabrics), or both? Did each node get dual 4Gb links or just one?

      If this had been an advertisement, they would have discussed some 3.0 features like LAN clients.

      So, in short, it's easy to say it sounds like an advertisement. Quite possibly, Quantum (formerly ADIC) coerced them into getting the piece written. But if this had been an advertisement, there is so much more that is going on under the hood that would have been said. Large, fast, distributed filesystems are non-trivial and take an extreme amount of engineering and testing. StorNext really is good at what they claim to do.

      If you want to read about some of the drawbacks though, I yak about them on my blog. Sorry for the plug.

  4. Re:The mere thought of that much bandwidth... by dosguru · · Score: 3, Interesting

    A standard dual-CPU, dual-core HP server with Windows can keep a 4Gb FC link pretty full if set up correctly. I work for a large bank, and we have many a Solaris box that can keep four or even eight 2Gb FC cards full into our FC and SATA disk arrays. Not to trivialize the extreme coolness of what they are doing at all, but a PB of data with a few PB of I/O in a day isn't what it used to be. I'm just glad to see they don't use PolyServe; it is worthless for clustering and has caused more downtime at work than it has ever prevented. If they really have that much data, they should use 10Gb FC or InfiniBand. Even our stodgy old bank is implementing its first InfiniBand system so we can move I/O at 12Gb instead of over the slow 4Gb links.
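    For scale, here is a rough sketch of how many Fibre Channel links the story's 1 GB/s aggregate would need. It assumes the usual nominal per-direction figures (2GFC about 200 MB/s, 4GFC about 400 MB/s, 8GFC about 800 MB/s) and a hypothetical `utilisation` factor for links that can't be kept completely full; neither number comes from the article.

```python
import math

# Nominal per-direction throughput of common Fibre Channel speeds in MB/s
# (the usual "1G = 100 MB/s" rule of thumb). Real sustained rates are lower,
# which the hypothetical `utilisation` factor below tries to account for.
FC_NOMINAL_MB_PER_S = {"2GFC": 200, "4GFC": 400, "8GFC": 800}


def links_needed(target_mb_per_s: float, link_mb_per_s: float,
                 utilisation: float = 1.0) -> int:
    """Smallest number of links that can carry the target rate."""
    return math.ceil(target_mb_per_s / (link_mb_per_s * utilisation))


if __name__ == "__main__":
    target = 1000  # 1 GB/s, the aggregate rate quoted in the story
    for name, mb in FC_NOMINAL_MB_PER_S.items():
        print(f"{name}: {links_needed(target, mb)} links at 100%, "
              f"{links_needed(target, mb, utilisation=0.8)} at 80%")
```

    At nominal rates, three 4Gb links cover 1 GB/s; at an assumed 80% utilisation it takes four.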

  5. Re:Pseudo-Dupe? by Easy2RememberNick · · Score: 3, Funny

    Nah, it's just spooky article submission at a distance.

      The other article appeared because it knew this one would be submitted later in the future.

  6. FTL by unchiujar · · Score: 3, Funny

    "Due for operation in May 2008, the LHC is a 27-kilometer-long device designed to accelerate subatomic particles to ridiculous speeds, smash them into each other and then record the results."
    Next up ludicrous speed!!! Better fasten your seat belts...

    --
    Shakespeare poems - infinite monkeys with infinite time. Computer tech support - a few trained ones working from 9 to 5.
  7. Thousands of disk drives. by Anonymous Coward · · Score: 3, Funny

    Hmm, let's see. ~2600 TB of data over one month. Let's store it on 500 GB drives. That's about 5200 disk drives just to store the data. Add in the extra drives for parity, and a few hundred hot spares, and this thing could easily use OVER NINE THOUSAND drives.
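    The drive count can be checked in a few lines. This sketch assumes a 30-day month at exactly 1 GB/s, decimal units (1 TB = 1000 GB), and 500 GB drives with no parity, spares, or compression; under those assumptions the month comes to 2592 TB, or 5184 data drives.

```python
import math

# Assumptions (the commenter's, not CERN's): 1 GB/s sustained for a 30-day
# month, decimal units (1 TB = 1000 GB), 500 GB drives, no parity, no spares.
SECONDS_PER_MONTH = 60 * 60 * 24 * 30
DRIVE_GB = 500


def month_total_tb(rate_gb_per_s: float = 1.0) -> float:
    """Data written over one 30-day month, in TB."""
    return rate_gb_per_s * SECONDS_PER_MONTH / 1000


def data_drives(total_tb: float, drive_gb: int = DRIVE_GB) -> int:
    """Drives needed for the raw data alone, rounded up."""
    return math.ceil(total_tb * 1000 / drive_gb)


if __name__ == "__main__":
    total = month_total_tb()
    print(f"{total:.0f} TB -> {data_drives(total)} data drives")
    # prints: 2592 TB -> 5184 data drives
```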

    1. Re:Thousands of disk drives. by noggin143 · · Score: 5, Informative

      We are expecting to record around 15PB / year during the LHC running. This data is stored onto magnetic tape with petabytes of disk cache to give reasonable performance. A grid of machines distributed worldwide analyses the data. More details are available on the CERN web site www.cern.ch.
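      To put the ~15 PB/year figure in context, the sketch below converts it into an average write rate. It assumes decimal units and a 365-day year; the `running_fraction` parameter is a hypothetical knob for the fact that the accelerator does not record all year round, so the rate while running is higher than the yearly average.

```python
# Rough sketch: the average write rate implied by ~15 PB recorded per year.
# Assumptions: decimal units (1 PB = 1e15 bytes, 1 MB = 1e6 bytes) and a
# 365-day year. `running_fraction` is hypothetical: the share of the year
# during which data is actually being recorded.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_rate_mb_per_s(pb_per_year: float,
                          running_fraction: float = 1.0) -> float:
    """Average MB/s needed while the experiment is actually recording."""
    if not 0 < running_fraction <= 1:
        raise ValueError("running_fraction must be in (0, 1]")
    return pb_per_year * 1e15 / (SECONDS_PER_YEAR * running_fraction) / 1e6


if __name__ == "__main__":
    print(f"{average_rate_mb_per_s(15):.0f} MB/s spread over the whole year")
    print(f"{average_rate_mb_per_s(15, running_fraction=0.5):.0f} MB/s "
          "if recording only half the time")
```

      Spread evenly, 15 PB/year is a little under 0.5 GB/s; any downtime pushes the required rate during running up proportionally.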

  8. A correct use of the word "catch". by Futurepower(R) · · Score: 4, Insightful

    Not only did the Slashdot editor not catch a spelling mistake, he apparently didn't catch the fact that the linked article is an advertisement from CXO Media, which, according to its web site, mixes articles and advertisements: "Through our integrated media and marketing programs we provide..."

    From the linked article: "... the team is using Quantum's StorNext software as its file system..."

    Question: Did a Slashdot editor get paid directly for running an advertisement disguised as an article? Or was someone in Slashdot's parent company paid "under the table"? Or did the parent company get paid?

    Anyone wanting to read a real article from 2005 about CERN's data handling, data storage, and data processing can download this PDF file: Grid Computing: The European Data Grid Project.

    Real articles begin this way: "The computing challenges for LHC are: * the massive computational capacity required for analysis of the data and * the volume of data to be processed."

    Advertisements begin by talking about God and murder, this way (from the article linked by Slashdot): "CERN's Search for God (Particles)..."

    and "Maybe you last read about CERN (the European Organization for Nuclear Research) and its massive particle accelerators in Angels & Demons by Dan Brown of The Da Vinci Code fame. In that book, the lead character travels to the cavernous research institute on the border of France and Switzerland to help investigate a murder."

  9. Re:Idea by KillerCow · · Score: 3, Funny

    ./go.sh | bzip2 > results.bz2 Problem solved!


    No. No, my friend; you do not grasp the scale of this project.
     
    ./go.sh | bzip2 | bzip2 > results.bz2
  10. Re:PC's? by Rodolpho+Zatanas · · Score: 5, Informative

    load"*",8,1 would load something from a diskette, not a cassette.

  11. Not So Huge by PenGun · · Score: 5, Informative

    It's only 5x single-channel HD-SDI at ~200MB/s each. Any major studio could handle this with ease.

    SDI is how the movie guys move their digital stuff around. A higher-end digital camera will capture at 2x HD-SDI for 2K resolution in a 4:4:4 colour space. A few of 'em and you've got your 1GB/s easy. Spools onto godlike RAID arrays.

      Get 'em to call up Warner Bros if they have problems.
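    As a rough check on the "5x" figure, this sketch divides 1 GB/s by the nominal 1.485 Gbit/s line rate of a single HD-SDI link. It counts the raw serial rate rather than usable picture payload, so it is optimistic; under that assumption five links carry about 928 MB/s and six are needed for a full 1 GB/s.

```python
import math

# Nominal serial rate of a single HD-SDI link (SMPTE 292M), in bits per second.
# This is the raw line rate; the usable picture payload is somewhat lower.
HD_SDI_BITS_PER_S = 1.485e9


def sdi_links_for(target_bytes_per_s: float,
                  link_bits_per_s: float = HD_SDI_BITS_PER_S) -> int:
    """Single-link HD-SDI channels needed to carry a given byte rate."""
    return math.ceil(target_bytes_per_s / (link_bits_per_s / 8))


if __name__ == "__main__":
    per_link_mb = HD_SDI_BITS_PER_S / 8 / 1e6
    print(f"one link ~ {per_link_mb:.1f} MB/s")
    print(f"links for 1 GB/s: {sdi_links_for(1e9)}")
```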

  12. 30 racks, $1.7M in disks by this+great+guy · · Score: 3, Informative

    Assuming a non-RAID 3x-replication solution (what Google does in its datacenters), using 500-GB disks (best $/GB ratio), they would need about 16 thousand disks:

    .001 (TB/sec) * 3600*24*30 (sec/month) * 3 (copies) * 2 (disk/TB) = 15552 disks

    Which would cost about $1.7M (disks alone):

    15552 (disk) * 110 ($/disk) = $1710720

    Packed in high-density chassis (48 disks in 4U, or 12 disks per rack unit), they could store this amount of data in about 30 racks:

    15552 (disk) / 12 (disk/rack unit) / 42 (rack unit/rack) = 30.9 racks
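    The same calculation as a small function, so the assumptions can be varied. All default values are the commenter's (1 GB/s for 30 days, 3 copies, 500 GB disks at $110, 12 disks per rack unit, 42U racks), not CERN's. Rounded up, the disks fill 31 racks, before leaving any room for servers, switches, or power.

```python
# Reproduces the comment's back-of-the-envelope plan. Every default below is
# the commenter's assumption, not a CERN figure.
SECONDS_PER_MONTH = 3600 * 24 * 30


def replicated_disk_plan(rate_gb_per_s=1, copies=3, disk_gb=500,
                         usd_per_disk=110, disks_per_u=12, u_per_rack=42):
    """Return (disks, disk cost in USD, racks) for one month of data."""
    total_gb = rate_gb_per_s * SECONDS_PER_MONTH * copies
    disks = -(-total_gb // disk_gb)          # integer ceiling division
    cost_usd = disks * usd_per_disk
    rack_units = -(-disks // disks_per_u)
    racks = rack_units / u_per_rack          # fractional, as in the comment
    return disks, cost_usd, racks


if __name__ == "__main__":
    disks, cost, racks = replicated_disk_plan()
    print(f"{disks} disks, ${cost:,}, {racks:.1f} racks")
    # prints: 15552 disks, $1,710,720, 30.9 racks
```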

    Now, for various reasons (vendor influence, inexperienced consultants, my experience in the IT world in general, etc.), I have a feeling they are going to end up with a solution that is unnecessarily complex, much more expensive, and hard to maintain and expand... Damn, I would love to be this project leader!

  13. CERN DAQ is generally impressive by torako · · Score: 5, Interesting
    It's important to distinguish between the amount of data generated during an event right in the detector and the filtered data that in the end will be kept and saved on permanent storage. The ATLAS detector, for example, has a data rate on the order of terabits per second during an event. There's a pretty sophisticated multi-level triggering system whose purpose is to throw out most of that data (~98%) and only look for interesting events.

    Right now, the average event size for ATLAS is 1.6 MByte and the system is designed to keep around 200 events per second, or roughly 300 MByte per second. This isn't much, of course, but you have to consider that the bunch crossing rate (i.e. the rate at which bunches of protons will collide and generate events) is 40 MHz.

    So you have to design a system that boils this rate down from 40 MHz to 200 Hz and keeps only the interesting parts, while also buffering all the data in the meantime. For this reason, the first trigger level is implemented entirely in hardware right in the detector and reduces the rate to 75 kHz with a latency of 2.5 µs. The rest of the trigger runs on clusters of Linux computers and has a latency of O(1 s).
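    The rates quoted above turn into rejection factors and bandwidths by simple division. This sketch uses the comment's figures and assumes a level-1 latency of 2.5 µs; the pipeline-depth entry illustrates why data must be buffered while level 1 decides, and is derived here rather than taken from ATLAS documentation.

```python
# Rate arithmetic for the ATLAS trigger chain as quoted in the comment.
# Assumptions: 40 MHz crossings, 75 kHz after level 1, 200 Hz kept,
# 1.6 MB per event, and a level-1 latency of 2.5 microseconds.
BUNCH_CROSSING_HZ = 40e6
LEVEL1_OUTPUT_HZ = 75e3
KEPT_EVENTS_HZ = 200
EVENT_SIZE_MB = 1.6
LEVEL1_LATENCY_S = 2.5e-6


def trigger_summary() -> dict:
    return {
        # Crossings discarded by the hardware trigger per crossing accepted.
        "level1_rejection": BUNCH_CROSSING_HZ / LEVEL1_OUTPUT_HZ,
        # Further reduction done by the software trigger on the Linux farm.
        "software_rejection": LEVEL1_OUTPUT_HZ / KEPT_EVENTS_HZ,
        "overall_rejection": BUNCH_CROSSING_HZ / KEPT_EVENTS_HZ,
        # Bandwidth of accepted events headed for permanent storage.
        "output_mb_per_s": KEPT_EVENTS_HZ * EVENT_SIZE_MB,
        # Crossings that must be held in the detector's buffers while
        # level 1 is still making its decision.
        "level1_pipeline_depth": round(BUNCH_CROSSING_HZ * LEVEL1_LATENCY_S),
    }


if __name__ == "__main__":
    for key, value in trigger_summary().items():
        print(f"{key}: {value:,.1f}")
```

    The 200 events/s × 1.6 MB gives 320 MB/s, which is the "roughly 300 MByte" per second going to storage; the overall reduction from crossings to kept events is a factor of 200,000.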

  14. Finding God by Mark_MF-WN · · Score: 3, Funny

    Don't worry -- the products of particle accelerators only exist for a few picoseconds. If God is created during a collision event, he will wink out of existence so fast that we'll only become aware of his presence by the shower of Mormonions and PatRobertsonite particles impinging on the detection apparatus.

  15. Backup options by Mostly+a+lurker · · Score: 5, Funny
    I assume they will want to have more than one copy of this for backup purposes. Here is my analysis of their choices. The total data to be backed up (for the month) is taken as a lazy 1 * 60 * 60 * 24 * 30 = 2,592,000 gigabytes.
    • Printed hardcopy. Many authorities recommend this, as you do not need to worry about changes in data formats over time. For an exact calculation, we would need to know the font they were planning to use and the character encoding. However, let's take as a working assumption that they can cram 10KB of data onto an A4 sheet. That implies 259,200,000,000 pages. They will probably not want to use an inkjet printer for this solution and may, indeed, choose to acquire multiple printers and split the load. A single printer at 10 ppm would take approximately 50,000 years to complete the backup. On 70gsm paper, it would weigh a little over a million tonnes. At any rate, this would certainly produce reams of output.
    • Diskettes. This was good enough for nearly everyone 15 years ago. It is curious that such a tried and trusted technique is no longer in fashion. I assume regular 3.5" 1.44MB diskettes, generally recognised as easier to handle than 5.25". We shall need around 1,800,000,000 diskettes. One drawback is the person changing the diskettes as each one filled up might become a little bored after a while. On the positive side, the backup will be quite a lot faster than the printed solution. Assuming about one diskette per minute, inclusive of changing disks, the backup could be complete in less than 3,500 years.
    • Now considered somewhat old-fashioned, punch cards were once a mainstay of every programmer's personal backups. As with printed hardcopy, anyone familiar with the character encoding used could read the data without needing any access to a computer. If we assume 80-column cards, we would need 32,400,000,000,000 cards. I would be somewhat concerned about the problem of getting this stack of cards back in the correct order if I dropped it. With a weight of about 30 million tons and stretching perhaps 6 million miles end to end, handling certainly would be challenging and an accident very possible.
    • Paper (punched) tape was the only alternative on the first computer I used, a basic early model Elliott 803 without the optional magnetic tape. If I recall correctly, you could manage about 10 characters per inch, so you would need a paper tape over 4,000,000,000 miles long. Hmmm, that would be silly. The other solutions are clearly better.
    I am sure other options will be considered, but I just wanted to bring these up in case CERN had failed to consider them.
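    For anyone wanting to check the joke's arithmetic, this sketch recomputes the media counts. It assumes decimal units, the commenter's per-unit capacities, 70 g/m² A4 paper, and nonstop operation; under those assumptions the printout is about 2.6 × 10^11 pages (consistent with the 50,000-year estimate), weighing roughly 1.1 million tonnes.

```python
import math

# Reproduces the backup-media arithmetic. Assumptions: 1 GB/s for a 30-day
# month in decimal units, 10 KB (10,000 bytes) per A4 page, 1.44 MB diskettes
# counted as 1.44e6 bytes, one byte per card column, 10 characters per inch
# of paper tape, 70 g/m^2 A4 paper, and nonstop operation.
TOTAL_BYTES = 1e9 * 60 * 60 * 24 * 30
MINUTES_PER_YEAR = 60 * 24 * 365.25
A4_SHEET_GRAMS = 0.210 * 0.297 * 70      # sheet area in m^2 times 70 g/m^2


def units_needed(bytes_per_unit: float) -> int:
    """Number of media units needed to hold TOTAL_BYTES, rounded up."""
    return math.ceil(TOTAL_BYTES / bytes_per_unit)


pages = units_needed(10_000)
diskettes = units_needed(1.44e6)
cards = units_needed(80)
tape_miles = TOTAL_BYTES / 10 / 63_360        # 63,360 inches per mile

printer_years = pages / 10 / MINUTES_PER_YEAR      # one printer at 10 ppm
diskette_years = diskettes / MINUTES_PER_YEAR      # one diskette per minute
paper_tonnes = pages * A4_SHEET_GRAMS / 1e6        # grams -> tonnes

if __name__ == "__main__":
    print(f"pages: {pages:,} ({printer_years:,.0f} years, {paper_tonnes:,.0f} t)")
    print(f"diskettes: {diskettes:,} ({diskette_years:,.0f} years)")
    print(f"punch cards: {cards:,}")
    print(f"paper tape: {tape_miles:,.0f} miles")
```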
  16. well if no one else is going to say it by Main+Gauche · · Score: 3, Informative

    "Imagine how deep the personality problems must run in a person who gets all hot because of someone's DNA sequences!"

    You must be new here.