IBM Building 120PB Cluster Out of 200,000 Hard Disks
MrSeb writes "Smashing all known records by some margin, IBM Research Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes (120 million gigabytes). The data repository, which currently has no name, is being developed for an unnamed customer, but with a capacity of 120PB, its most likely use will be as a storage device for a governmental (or Facebook) supercomputer. With IBM's GPFS (General Parallel File System), over 30,000 files can be created per second, and with massive parallelism, no doubt thanks to the 200,000 individual drives in the array, single files can be read or written at several terabytes per second."
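For anyone who wants to sanity-check the headline numbers, here is a rough sketch of the per-drive capacity they imply. It assumes decimal units (1 PB = 1,000,000 GB, as the summary itself does) and that the 120PB is spread evenly across all 200,000 drives; the summary does not say how much raw capacity goes to redundancy, so the result is only a ballpark figure. The function name is made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the summary.
# Assumption: decimal units, and capacity spread evenly over every drive.
GB_PER_PB = 1_000_000

def gb_per_drive(total_pb, drives):
    """Average capacity per drive in GB, ignoring any redundancy overhead."""
    return total_pb * GB_PER_PB / drives

print(gb_per_drive(120, 200_000))  # 600.0
```

So the quoted figures work out to roughly 600GB per drive on average.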
A billionaire's porn collection?
Do they back up to tape or external USB drive?
...about the sound and torque generated when all these disks start to spin up.
Somewhere I can store _all_ my porn in one spot.
All I know is that if you put it on my computer, I'll have it filled in two years and have no idea what's actually on it.
Woot! Torrent all the things!
I can see the likes of the LHC or the AEA using something like this - they generate enough data. But if it were a "good guy" why would they keep it secret?
I am trolling
My understanding is that the LHC generates so much data that most of it is discarded immediately without going to disk. Seems like this would be a good solution to their data problems.
It's not the government guys, at least not the cloak and dagger kind. They're too paranoid to let you know how much data they can store. They also don't want you to know that even with all that data, they're still only able to utilize a fraction of it. People are still going through WWII wire intercepts *today*. No, the problem in the intelligence community is making the data useful and organized as efficiently as possible, not collecting it.
That leaves only one real option: Scientific research. Look at how much data the Large Hadron Collider produces in a day. ..
Facebook and presumably a spy agency?
You're repeating yourself.
Modern genome compression techniques only store the edits needed to convert the reference genome to your genome. And the diff file is just around 24 MB per person. I am an ex-bioinformatician.
Our modest lab turns out roughly 100GB a week of finished sequence, from a single sequencer, which is only a very small fraction of the temporary disk storage needed along the way to get to finished sequence. Genome centres with many machines will turn out an order of magnitude (or two) more, and believe me, these machines are kept busy week after week. Once we have finished sequences, the assembly process adds a multiple to this. Yes, a genome is only XMB, but when you have to effectively sequence it 40 times to get the overlaps you need to assemble the thing, it soon mounts up. The sequencer machine companies are now touting similar scale machines on the basis that any lab can afford one to do their own sequencing. Sequence volumes have been outstripping Moore's law for some time now, and it isn't going to stop anytime soon. That said, I think Facebook and their CIA funders are probably more likely to have the money for this than anyone doing anything useful for humanity.
This is what MTBF is all about. "Enterprise" drives are rated at 1.2 million hours MTBF. 1,200,000 hours / 200,000 drives = one drive failure every 6 hours. Not too bad, only 4 a day.
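The arithmetic above can be sketched in a few lines. This assumes the quoted 1.2 million hour MTBF rating, independent failures at a constant rate, and all 200,000 drives running around the clock; real-world failure rates may well differ from the rated figure. The function names are made up for illustration.

```python
# Expected failure rate for a large array, from the rated MTBF per drive.
# Assumption: independent drives with a constant failure rate, running 24/7.

def hours_between_failures(mtbf_hours, drives):
    """Average time between drive failures somewhere in the array."""
    return mtbf_hours / drives

def failures_per_day(mtbf_hours, drives):
    """Expected number of drive failures across the array per day."""
    return 24 * drives / mtbf_hours

print(hours_between_failures(1_200_000, 200_000))  # 6.0
print(failures_per_day(1_200_000, 200_000))        # 4.0
```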
How long does it take for the cluster to rebuild after a drive fails, and does this involve downtime?