IBM Building 120PB Cluster Out of 200,000 Hard Disks
MrSeb writes "Smashing all known records by some margin, IBM Research in Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes (120 million gigabytes). The data repository, which currently has no name, is being developed for an unnamed customer, but with a capacity of 120PB, its most likely use will be as a storage device for a governmental (or Facebook) supercomputer. With IBM's GPFS (General Parallel File System), over 30,000 files can be created per second, and with massive parallelism, no doubt thanks to the 200,000 individual drives in the array, single files can be read or written at several terabytes per second."
A billionaire's porn collection?
Anyone else find it depressing that the two top suspects for the use of this system are Facebook and presumably a spy agency?
Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?
No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.
Do they back up to tape or external USB drive?
...about the sound and torque generated when all these disks start to spin-up.
Somewhere I can store _all_ my porn in one spot.
its most likely use will be a storage device for a governmental (or Facebook) supercomputer.
Actually, given the explosion of data storage needs in the bio-informatics area, its most likely use would be in storing DNA sequences for research purposes.
Someone had to do it.
All I know is that if you put it on my computer, I'll have it filled in two years and have no idea what's actually on it.
Woot! Torrent all the things!
Downloading too much from TL again.
When that thing crashes somebody is going to be mad. I wonder how long restoring from backup is going to take.
...for hoarding whorecookies.
If I'm not mistaken one hard drive needs about 12~14W, so assuming that half of those are under load at a time how are they going to power that thing?
Not counting with all the needed AC and support computers, network, etc...
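A rough sketch of that power estimate, taking the comment's own assumptions at face value: 12 to 14 W per drive under load and half the drives busy at any moment. The idle wattage parameter is a hypothetical placeholder (set to zero here), and cooling, support computers and networking are left out.

```python
DRIVES = 200_000

def array_power_kw(active_watts, active_fraction=0.5, idle_watts=0.0, drives=DRIVES):
    """Drive power in kW. idle_watts is a placeholder assumption, not a measured value."""
    active = drives * active_fraction
    idle = drives - active
    return (active * active_watts + idle * idle_watts) / 1000.0

low = array_power_kw(12.0)   # 12 W per active drive
high = array_power_kw(14.0)  # 14 W per active drive
print(f"Active drives alone: {low:.0f} to {high:.0f} kW")
```

So the busy half of the array would draw somewhere above a megawatt before anything else in the room is counted.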
My understanding is that the LHC generates so much data that most of it is discarded immediately without going to disk. Seems like this would be a good solution to their data problems.
Or it is built for someone's porn collection.
It's not the government guys, at least not the cloak and dagger kind. They're too paranoid to let you know how much data they can store. They also don't want you to know that even with all that data, they're still only able to utilize a fraction of it. People are still going through WWII wire intercepts *today*. No, the problem in the intelligence community is making the data useful and organized as efficiently as possible, not collecting it.
That leaves only one real option: Scientific research. Look at how much data the Large Hadron Collider produces in a day...
#fuckbeta #iamslashdot #dicemustdie
FTFS:
It's the tech equivalent of Prince - it's "the data repository with no name." We can denote it with some sort of unicode glyph that slashdot will mangle.
And of course it has amazingly fast read speeds - if each drive has a 32 meg cache, that's 6.4 terabytes just for the cache.
BTW, it's for the ^@#%^&^+++NO CARRIER
So if I had that kind of power, would I want to power a 120PB cluster or a flux capacitor. Decisions, decisions.
These guys have way too much time on their hands.
The government happily stands by when a major corporation announces that it has 120 petabytes (ie petafiles - my emphasis) under its control, yet if the average joe schmo even thinks about how he'd like a petafile or two at home the FBI, CIA, TSA, ICE (and every other TLA) hauls his ass off to jail and etches a scarlet letter on his forehead.
Such harassment by the government of simple people who aren't hurting anyone else needs to be stopped. Think of the children -- how are they going to cope when their own father/uncle/priest gets charged with accessing petafiles? They'll be the laughing stock of their peers!
I am Slashdot. Are you Slashdot as well?
Perhaps this cluster can load Deus Ex : Human Revolution levels in a reasonable amount of time!
Fear is the mind killer.
Run around with a shopping cart and swap out drives as they fail. Kind of like they did back in the early computer days with vacuum tubes.
WWJD -- What Would Jimi Do?
(Smash amp, burn guitar, take home the groupies)
With 200,000 hard drives, won't there always be at least one hard drive that is failing? You'll need an IT guy 24/7 swapping out the failed drives. As soon as he swaps out one drive, another one will fail. It just seems kinda ridiculous.
And the men who hold high places must be the ones who start
To mold a new reality... closer to the heart
This just kinda makes me wonder who would need this. Backing up the entire internet has to take up some space.
What do I know, I'm just an idiot, right?
It's for storing images from Nikon's new "Petapixel Pro" D7000000 camera
1) Download the internets
2) re-host the internets
3) ????
4) I really don't know. I'm scared.
Can anyone give an estimate how many disks have to be replaced every day? Can (are) big disk arrays be built so that replacements can be automated?
120 million GB divided by 200,000 drives = 600 GB per drive. Even on an enterprise scale they could get a lot better densities.
~
That's almost enough to install Vista
Anyone else find it depressing that the two top suspects for the use of this system are Facebook and presumably a spy agency?
Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?
No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.
You are trying too hard looking for something to be upset about (in a very attention-whorish manner to boot.)
How about some sort of gigantic media library... all porn jokes aside. Netflix? Apple? Isn't Walmart getting into the streaming business? Or some new "cloud" server?
If they could make a 120PB cluster using floppy disks, I would be much more entertained by this.
What's the failures-per-minute estimate? How many full-time hard-drive replacers will they need?
At least the data on this monster will be totally safe.
The fsck alone will be started at each new presidential inauguration,
and nobody will have access to the data for the next 8 years.
In addition, to approve financing for the outside storage tapes,
Congress will need to increase the debt limit again.
The open rainbow tables project announces....120PB of tables :)
It's them or SETI has come back.
Someone should manufacture industrial sized hard drives for this type of application. Like full height x2, so you could cram 30 platters in there.
IBM once made the punch card machines that made it easier for the Nazi party to round up the Jews. This sounds too familiar.
If this were for an American spy agency, maybe that would be enough. But when I think about how I have ten times this much data in my Gmail, and that Gmail isn't limited to only the US, I suspect that Google has a lot more storage space than this. Of course it's probably all very decentralized.
It's going in one of Wal-Mart's data centers.
So where/how do you back up something so massive? Would you have 100,000 hard drives backing up the other 100,000 drives, or build another 200,000 drive array off-site?
The replacement drives will slide in on a conveyor, and like in The Matrix, a drive that is of no use will be dropped down a tube and recycled; the new drive slides along and a robot puts it all together.
You should see the size of the tape backups
It would be 122PB. 2PB lost on bad marketing. Gimme my 1024 bytes back. But all-in-all this isn't that surprising. You can get 1PB in a 42U rack these days.
As a fun side note: You'll also need 122PB of tape storage (or 1.5 systems like this) just for backups. That's a lot of tape.
Custom electronics and digital signage for your business: www.evcircuits.com
Surely it must be a Beowulf cluster of those.
50,000 machines at 4 drives per machine. Several cloud sites out there now are larger than this..
If the mean time between failure of a hard drive is around two hundred thousand hours, and this disk garden has two hundred thousand drives won't the technicians be replacing a drive every hour or so? Don't believe the MTBF figures from the drive manufacturers. Those appear to be butt numbers. http://www.pcworld.com/article/129558/study_hard_drive_failure_rates_much_higher_than_makers_estimate.html ....
Two hundred thousand water cooled hard drives? How much does this fucking thing weigh? Allowing for half a pint of coolant in the pipes for each hard drive the figure comes out to over one hundred thousand pounds. That doesn't count the plumbing and coolant distribution system and heat exchanger. ....
Two hundred thousand drives with plumbing for water cooling will take up a healthy sized volume. The drives alone require on the order of four million cubic inches. I have to wonder if this is a proof of concept storage array for DARPA on behalf of an alphabet agency that needs a place to park all their spy photos and sigint.
Hard to imagine? Yes. But forty years ago, the largest computing center on earth had 57GB of disc storage.
We know the capacity. We know the transfer rate. But how quickly do disks need to be moved in and out of the system in order to keep it running?
200,000 is a lot of disks. I assume they are all hot swap with a great deal of redundancy because I would expect multiple drive failures every day. A RAID 0 with that many disks might never boot.
skynet
I'd assume they'll be sourcing drives from all manufacturers, so it would be very useful if they would post the failure rates for each manufacturer.
Just curious if anyone has experience managing large, mechanical disk arrays, if you installed an array of such a size using identical hard drives and bringing everything online relatively at the same time, would there be an increased likelihood of ALL the drives dying at roughly the same time? Could failure statistics bite you with enough simultaneous failures to negate redundancy?
How exactly do you backup 120PB?
I'll give you credit for "this used to be true" back in the day when a computer was a 486 on a modem. It's absolutely not true any more.
Govt is Big Brother, and they Like it. And they absolutely have the resources to do it.
Why? Because all they need to do is a Red Flag system. Joe Average doesn't really produce that much data per day all by himself, and .gov isn't trying to perfectly reproduce the entire activity. They just need to know if something is getting juicy.
"Look! Here's a 12 Gig file of Joe's activity for the month! Control-F and search for the words 'Music' and 'Movie' and 'Copy'."
Lights out.
The part you are glossing over is how much help they are getting from nice Corps. ISPs, Telecoms, Facebook, and Google.
So to play the "nah, don't worry" line is completely misleading.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Can you just imagine the brownout when they power up the drive farm?
In practice they would be doing sequential spin-up. I do, however, wonder how long it would take to sequentially spin up 200k drives.
ELOI, ELOI, LAMA SABACHTHANI!?
Based on 120 million GB and 200k drives, the per-drive capacity works out to 600GB apiece. Sounds like they're stringing together a bunch of WD Velociraptors.
ELOI, ELOI, LAMA SABACHTHANI!?
What's it for? No surprise, domestic spying.
I suspect just spying generally, including gathering information from non-spying sources, and including non-domestic. Why on earth would it be limited to domestic spying?
-- IANAL, this isn't legal advice, and definitely isn't legal advice for you. Also, Squee!
I don't know what it is for but I know the name of the drive is "C:"
Some drink at the fountain of knowledge. Others just gargle.
How much is that terabyte number in neocortices?
It's way too big for any governmental use, NASA, or weather-related supercomputing application. Must be for Facebook
Sorry, but gray text on gray background is making my eyes bleed.
ASCII Pr0n
A unique way to learn a language: http://languageloom.com
This isn't a genuine statistical analysis, but a back-of-the-napkin calculation suggests that if they use hard drives with an MTBF around 3 years, they'll be replacing one drive every 7.5 minutes. If your employee can run fast, that's a 24/7 fulltime job.
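For anyone who wants to redo the napkin math, here is a minimal sketch. It assumes every drive fails independently at a constant rate, so the array as a whole sees one failure every MTBF / N hours; real drives are not that well behaved. It covers both figures floated in this thread: the 200,000-hour MTBF and the 3-year MTBF.

```python
HOURS_PER_YEAR = 24 * 365

def hours_between_failures(mtbf_hours, drives=200_000):
    """Expected gap between failures across the array (constant-rate assumption)."""
    return mtbf_hours / drives

per_hour_case = hours_between_failures(200_000)              # 200,000-hour MTBF
three_year_case = hours_between_failures(3 * HOURS_PER_YEAR)  # 3-year MTBF

print(f"MTBF 200,000 h: one failure every {per_hour_case:.1f} h")
print(f"MTBF 3 years:   one failure every {three_year_case * 60:.1f} min")
print(f"MTBF 3 years:   about {24 / three_year_case:.0f} failures per day")
```

With a 3-year MTBF this comes out a little under 8 minutes per failure, in the same ballpark as the 7.5 minutes quoted above.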
What do you mean they cut the power? How can they cut the power, man? They're animals!
Assuming that you could get 50 Libraries of Congress onto a single petabyte drive, you ought to be able to get 6,000 Libraries of Congress onto one of these 120 petabyte arrays...
I know of at least three companies (Apple, Google, Microsoft) just off the top of my head rich enough and ballsy enough to try AI on a scale that's never been tried before. I'm crossing my fingers.
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
the "milligoogle"
Help stamp out iliturcy.
If they could make a 120PB cluster using floppy disks, I would be much more entertained by this.
Well, of course, they already did the floppy RAID. But it didn't quite get to 120 petabytes.
It's worthwhile considering how much space would be required simply to *store* all those floppies in their boxes. This article calculates how many 1.44MB floppies a "modest" 1.5 terabyte hard drive would require to back up. It's about a million, obviously. Which would require a near-cube-shaped stack around 3 metres on each side (see footnote 3)- that's a cubic pile just under twice the height of an average man, simply to back up the contents of an unremarkable 1.5 terabyte hard drive.
120 petabytes is around 80,000 times bigger than *that*.
Our cube would have to be 80,000 ^ (1/3) = 43 times higher than 3 metres, that's a cube around 129 metres high!!!
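The arithmetic above can be checked with a short sketch. It assumes decimal units throughout (a "1.44MB" floppy taken as 1.44 million bytes, which is itself an approximation) and takes the 3 metre cube for the 1.5 terabyte case as given from the article.

```python
FLOPPY_BYTES = 1.44e6        # assumption: 1.44 million bytes per floppy
SMALL_DRIVE_BYTES = 1.5e12   # the 1.5 TB drive from the article
CLUSTER_BYTES = 120e15       # 120 PB, decimal
SMALL_CUBE_SIDE_M = 3.0      # side of the floppy stack for 1.5 TB, as stated above

floppies_small = SMALL_DRIVE_BYTES / FLOPPY_BYTES
scale = CLUSTER_BYTES / SMALL_DRIVE_BYTES
# Volume grows linearly with capacity, so the side grows with the cube root.
side = SMALL_CUBE_SIDE_M * scale ** (1 / 3)

print(f"Floppies for 1.5 TB: about {floppies_small:,.0f}")
print(f"Capacity ratio: {scale:,.0f}x")
print(f"Cube side for 120 PB: about {side:.0f} m")
```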
The amount of space that would be required if punched paper tape was used instead is left as an exercise to the reader.
OKAY, cat's out of the bag on this one - I'm the customer and it's for my ...
torrent server (which BTW has no infringing content stored on it)
A company I used to work for were in the process of developing some new products when I started. They were very good with lasers, since an early product of theirs was a storage device that stored a terabyte of data on optical media. At the time (early 1990s) the market for storing such vast quantities of data was limited. Worse, the main customers were three-letter agencies who had security concerns about buying from a non-U.S. company, so they sold the product line to a U.S. company and went on to other things.
They proceeded to develop some good stuff, but also developed some real crap, and supported it all miserably. Along the way they fucked me over badly. They were subsequently bought out and asset-stripped. I'm still here. They're not. Serves the bastards right... :-)
...laura
Ponies... as far as the eye can see...
The article is missing a key piece of information: which storage product did they use in this GPFS cluster?
Sonas?
XIV?
DS? which model?
other?
Finally a place to store all my zeros!
Fascism should more properly be called corporatism because it is the merger of state and corporate power. -- Mussolini
It's for Jurassic Park
Isn't that one less than HPFS?
MS Word will still take 20 seconds to launch.
sig has been sent away for a few small repairs...
Given the fail rate on hard drives, replacing these would be a full time job. No?
http://www.awfullybigmoustache.com
At the rate that IBM SAN disks fail they'll have to employ one or more people full time to replace failed drives.
Actually, 120PiB is 135 PB.
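A quick sketch of the unit conversion, assuming the standard definitions (1 PB = 10^15 bytes, 1 PiB = 2^50 bytes). The helper names are made up for illustration.

```python
PB = 10**15   # petabyte, decimal
PIB = 2**50   # pebibyte, binary

def pib_to_pb(pib):
    """Convert pebibytes to decimal petabytes."""
    return pib * PIB / PB

def pb_to_pib(pb):
    """Convert decimal petabytes to pebibytes."""
    return pb * PB / PIB

print(f"120 PiB = {pib_to_pb(120):.1f} PB")
print(f"120 PB  = {pb_to_pib(120):.1f} PiB")
```

Going the other way, a 120 PB array measured in decimal units is roughly 106.6 PiB.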
I am not impressed, it's tiny. Do you know how much capacity our bittorrent network has, folks? Imagine a Beowulf cluster of these... and you've got it.
Obviously it will be used by people wanting to download the internet.