IBM Building 120PB Cluster Out of 200,000 Hard Disks

Posted by Soulskill on Friday August 26, 2011 @02:16AM from the go-big-or-go-home dept.

MrSeb writes "Smashing all known records by some margin, IBM Research Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes — 120 million gigabytes. The data repository, which currently has no name, is being developed for an unnamed customer, but with a capacity of 120PB, its most likely use will be as a storage device for a governmental (or Facebook) supercomputer. With IBM's GPFS (General Parallel File System), over 30,000 files can be created per second — and with massive parallelism, and no doubt thanks to the 200,000 individual drives in the array, single files can be read or written at several terabytes per second."
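
For a sense of scale, the figures in the summary can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes) and treats the 120PB as spread evenly over all 200,000 drives, ignoring any redundancy or spare capacity, none of which the summary specifies. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the summary's numbers.
# Assumptions (not from the article): decimal units, and capacity divided
# evenly across drives with no allowance for redundancy or spares.

PB = 10**15  # bytes in a decimal petabyte
GB = 10**9   # bytes in a decimal gigabyte


def total_capacity_gb(total_pb: int) -> int:
    """Convert a capacity in petabytes to gigabytes."""
    return total_pb * PB // GB


def per_drive_capacity_gb(total_pb: int, drive_count: int) -> int:
    """Average usable capacity each drive would have to contribute, in GB."""
    return total_pb * PB // drive_count // GB


cluster_gb = total_capacity_gb(120)
drive_gb = per_drive_capacity_gb(120, 200_000)
print(f"{cluster_gb:,} GB total, about {drive_gb} GB per drive")
```

Under those assumptions the "120 million gigabytes" figure checks out, and each drive would need to contribute roughly 600 GB on average.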

What's it for? by yomammamia · 2011-08-26 02:20 · Score: 2

A billionaire's porn collection?
1. Re:What's it for? by Given+M.+Sur · 2011-08-26 02:43 · Score: 5, Funny
  
  What's it for? No surprise, domestic spying.
  I think you mean "protecting your freedoms, fellow patriot."
  
  --
  nil
2. Re:What's it for? by yomammamia · 2011-08-26 02:46 · Score: 2
  
  Could be a company that intends to rent out space to such agencies and for such uses or for cloud computing (amazon).
3. Re:What's it for? by erroneus · 2011-08-26 02:55 · Score: 3, Funny
  
  Yes, he's an admitted petaphyle.
Not done yet by TheVidiot · 2011-08-26 02:21 · Score: 2

Do they back up to tape or external USB drive?
1. Re:Not done yet by S.O.B. · 2011-08-26 02:28 · Score: 5, Funny
  
  Punch cards.
  
  --
  Some of what I say is fact, some is conjecture, the rest I'm just blowing out my ass...you guess.
2. Re:Not done yet by Hatta · 2011-08-26 03:12 · Score: 2
  
  Imagine a Beowulf cluster of these!
  
  --
  Give me Classic Slashdot or give me death!
3. Re:Not done yet by stridebird · 2011-08-26 05:36 · Score: 2
  
  came for the beowulf comment...leaving satisfied.
I wonder.. by eexaa · 2011-08-26 02:21 · Score: 2

...about the sound and torque generated when all these disks start to spin-up.
1. Re:I wonder.. by jhoegl · 2011-08-26 02:25 · Score: 2
  
  It may very well alter time as we know it!
2. Re:I wonder.. by ELCouz · 2011-08-26 02:31 · Score: 2
  
  Obviously, they are forming a deathstar ;)
3. Re:I wonder.. by eexaa · 2011-08-26 02:59 · Score: 2
  
  My geek nature disapproves of such torque-negating behavior. Instead, it totally wants to see the petabytes spin at some insane RPM, cancelling the gravity and possibly crushing some enemies.
4. Re:I wonder.. by crow · 2011-08-26 04:58 · Score: 2
  
  Yes, alternating directions. That assumes the drives are mounted vertically. If they're mounted horizontally, then yes, upside-down.
  If they're using SSDs, then they need special leveling algorithms to keep the accesses spread out so that they don't get out of balance. If you access the left side of all your SSDs in the rack, the rack might fall over. :)
5. Re:I wonder.. by rrohbeck · 2011-08-26 11:49 · Score: 2
  
  Yup. Don't mount them all in the same orientation as the Earth's axis or you can probably measure the change in the day's length.
  
  --
  thegodmovie.com - watch it
Finally... by TheAngryArmadillo · 2011-08-26 02:21 · Score: 2

Somewhere I can store _all_ my porn in one spot.
Fill 'er up by mmarlett · 2011-08-26 02:22 · Score: 4, Funny

All I know is that if you put it on my computer, I'll have it filled in two years and have no idea what's actually on it.
Finally! by AngryDeuce · 2011-08-26 02:23 · Score: 2

Woot! Torrent all the things!
Re:Depressing by m50d · 2011-08-26 02:29 · Score: 2

I can see the likes of the LHC or the AEA using something like this - they generate enough data. But if it were a "good guy" why would they keep it secret?

--
I am trolling
Would be a good fit for CERN LHC by Tynin · 2011-08-26 02:30 · Score: 2

My understanding is that the LHC generates so much data that most of it is discarded immediately without going to disk. Seems like this would be a good solution to their data problems.
Not the government. by girlintraining · 2011-08-26 02:31 · Score: 4, Interesting

It's not the government guys, at least not the cloak and dagger kind. They're too paranoid to let you know how much data they can store. They also don't want you to know that even with all that data, they're still only able to utilize a fraction of it. People are still going through WWII wire intercepts *today*. No, the problem in the intelligence community is making the data useful and organized as efficiently as possible, not collecting it.
That leaves only one real option: Scientific research. Look at how much data the Hadron Supercollider produces in a day. ..

--
#fuckbeta #iamslashdot #dicemustdie
1. Re:Not the government. by DrgnDancer · 2011-08-26 02:50 · Score: 4, Insightful
  
  This is generally something I have a hard time convincing people of. I've worked for spooky organizations. Not at the highest levels or on the most secret projects, but in the general vicinity. The government is not monitoring you. Not because they lack the legal capability (though they do, and that is mostly, but not always, respected), but because they lack the technical ability. There are only so many analysts, only so much computer time, only so much storage. Except in cases of explicit corruption or misuse of resources, those analysts, that computer time, and that storage are not being wasted on monitoring Joe and Jane Average.
  I'm not going to say that there aren't abuses by the people who have access to some of this stuff; they are human and weak like the rest of us and are often tempted to take advantage of their situation I'm sure. In general however, unless you've done something that got a warrant issued for your information, the government doesn't care. They just don't have the resources to be big brother, even if they want to be.
  
  --
  I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
2. Re:Not the government. by m50d · 2011-08-26 03:33 · Score: 2
  
  Practically, the government *can't* watch you all the time, or really at all, unless you are the subject of some investigation worth those resources
  
  Trouble is, if the government does something I don't like, and I start taking (perfectly legal) political action against that, I become someone "worth" watching. So surveillance capability is something to worry about now; otherwise, when something directly problematic comes up, you're a dissident and it's too late.
  
  --
  I am trolling
3. Re:Not the government. by AmiMoJo · 2011-08-26 04:39 · Score: 2
  
  There are only so many analysts, only so much computer time, only so much storage.
  The government has found a solution to that problem. Distribute the computing and storage requirements.
  These days if you want a license to sell alcohol in your shop you have to get agreement from the police, and they usually require you to have extensive CCTV systems covering the area outside your shop as well as inside it. They shift the burden of installing and maintaining the system to the shop owner and can access the video any time they like. If a crime is reported the shop owner gets a demand for CCTV footage and has to go back into their archive, find and save it to disc all at no cost to the police force.
  Admittedly there are not enough people to monitor all these video streams at once, but they don't have to. They rely on victims reporting crime rather than actively looking for it. Unfortunately this makes them very lazy, and when CCTV footage isn't available they tend not to bother investigating.
  So it comes down to your definition of "monitoring you". If all email headers and the domain of every web site you visit are kept for two years and can be accessed by them at any time, and it costs them very little because the ISP you are paying for service is the one who is monitoring you and keeping all the data, then I'd say that meets the criteria.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:Depressing by PPH · 2011-08-26 02:32 · Score: 4, Insightful

Facebook and presumably a spy agency?
You're repeating yourself.

--
Have gnu, will travel.
Re:Paranoid much? by Anonymous Coward · 2011-08-26 02:40 · Score: 3, Informative

Modern genome compression techniques only store the edits needed to convert the reference genome to your genome. And the diff file is just around 24 MB per person. I am an ex-bioinformatician.
Re:Paranoid much? by biodata · 2011-08-26 02:42 · Score: 2

Our modest lab turns out roughly 100GB a week of finished sequence, from a single sequencer, which is only a very small fraction of the temporary disk storage needed along the way to get to finished sequence. Genome centres with many machines will turn out an order of magnitude (or two) more, and believe me, these machines are kept busy week after week. Once we have finished sequences, the assembly process adds a multiple to this. Yes, a genome is only XMB, but when you have to effectively sequence it 40 times to get the overlaps you need to assemble the thing, it soon mounts up. The sequencer machine companies are now touting similar scale machines on the basis that any lab can afford one to do their own sequencing. Sequence volumes have been outstripping Moore's law for some time now, and it isn't going to stop anytime soon. That said, I think Facebook and their CIA funders are probably more likely to have the money for this than anyone doing anything useful for humanity.

--
Korma: Good
Re:Constant failures? by SuperQ · 2011-08-26 02:47 · Score: 2

This is what MTBF is all about. "Enterprise" drives are rated at 1.2 million hours MTBF. 1,200,000 hours / 200,000 drives = 6 hours per drive failure. Not too bad, only 4 a day.
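
The arithmetic above can be written out as a small Python sketch. It assumes the simplest possible model: every drive fails independently at a constant rate of 1/MTBF, so the whole array fails at drive_count/MTBF. Real drives do not behave that neatly (early-life failures, wear-out, correlated failures within a batch or rack), so this is an estimate, not a prediction. The function names are made up for illustration.

```python
# Expected failure rate of a large array under a constant-failure-rate model.
# Assumption: drives fail independently, each at a rate of 1 / MTBF.


def hours_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Average time between drive failures anywhere in the array, in hours."""
    return mtbf_hours / drive_count


def failures_per_day(mtbf_hours: float, drive_count: int) -> float:
    """Average number of drive failures per 24-hour day across the array."""
    return 24.0 * drive_count / mtbf_hours


interval = hours_between_failures(1_200_000, 200_000)
per_day = failures_per_day(1_200_000, 200_000)
print(f"one failure every {interval:g} hours, about {per_day:g} per day")
```

With a 1.2 million hour MTBF and 200,000 drives, that gives one failure every 6 hours, or 4 per day, matching the figures above.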
Re:Constant failures? by Marc+Desrochers · 2011-08-26 03:01 · Score: 2

How long does it take for the cluster to rebuild after a drive fails, and does this involve downtime?