Slashdot Mirror


IBM Building 120PB Cluster Out of 200,000 Hard Disks

MrSeb writes "Smashing all known records by some margin, IBM Research Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes (120 million gigabytes). The data repository, which currently has no name, is being developed for an unnamed customer, but with a capacity of 120PB, its most likely use will be as a storage device for a governmental (or Facebook) supercomputer. With IBM's GPFS (General Parallel File System), over 30,000 files can be created per second, and with massive parallelism, no doubt thanks to the 200,000 individual drives in the array, single files can be read or written at several terabytes per second."

290 comments

  1. What's it for? by yomammamia · · Score: 2

    A billionaire's porn collection?

    1. Re:What's it for? by sgt+scrub · · Score: 1

      Billionaire porn collections are stored in multiple locations. billionaire pr0n collection

      --
      Having to work for a living is the root of all evil.
    2. Re:What's it for? by Scareduck · · Score: 1

      What's it for? No surprise, domestic spying.

      --

      Dog is my co-pilot.

    3. Re:What's it for? by swan5566 · · Score: 1

      Satellite companies/government agencies are one sector that could use this. They gather terabytes' worth of new data every day.

      --
      In debates about Christianity, there are two groups: those looking for answers, and those looking to just ask questions.
    4. Re:What's it for? by Given+M.+Sur · · Score: 5, Funny

      What's it for? No surprise, domestic spying.

      I think you mean "protecting your freedoms, fellow patriot."

      --
      nil
    5. Re:What's it for? by yomammamia · · Score: 2

      Could be a company that intends to rent out space to such agencies and for such uses or for cloud computing (amazon).

    6. Re:What's it for? by Hatta · · Score: 1

      Why would a billionaire need porn?

      --
      Give me Classic Slashdot or give me death!
    7. Re:What's it for? by erroneus · · Score: 3, Funny

      Yes, he's an admitted petaphyle.

    8. Re:What's it for? by Anonymous Coward · · Score: 0

      Since when is *need* an impetus for whatever billionaires acquire and collect?

      - T

    9. Re:What's it for? by ObsessiveMathsFreak · · Score: 1

      China's internet surveillance records?

      --
      May the Maths Be with you!
    10. Re:What's it for? by Hatta · · Score: 1

      I'm just suggesting that billionaires would have a better option than porn. Why collect porn when you could collect porn stars?

      --
      Give me Classic Slashdot or give me death!
    11. Re:What's it for? by CharlyFoxtrot · · Score: 1

      What's it for? No surprise, domestic spying.

      Well the butler always did it, this way they'll have proof.

      --
      If all else fails, immortality can always be assured by spectacular error.
    12. Re:What's it for? by GodfatherofSoul · · Score: 1, Informative

      A billionaire's porn collection is called a "harem".

      --
      I swear to God...I swear to God! That is NOT how you treat your human!
    13. Re:What's it for? by Z00L00K · · Score: 1

      Being a billionaire would attract a lot of women regardless of how you look.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    14. Re:What's it for? by Just+Brew+It! · · Score: 1

      All your bits are belong to us!

    15. Re:What's it for? by AJH16 · · Score: 1

      Could be a 4 and a half day buffer of raw data from the LHC (ok, unlikely). The data rate that thing generates blows my mind.

      --
      AJ Henderson
    16. Re:What's it for? by Rob+Riggs · · Score: 1

      Being a billionaire would attract a lot of women regardless of how you look. [Emphasis mine]

      I don't think that gender matters much here.

      --
      the growth in cynicism and rebellion has not been without cause
    17. Re:What's it for? by Anonymous Coward · · Score: 0

      That's just enough to hold all my Japanese P0rn collection!

    18. Re:What's it for? by Yamioni · · Score: 1

      Yeah but the video was so grainy that they couldn't tell if it was the knife or the candlestick so the butler got acquitted anyway. Guess they should have invested in better cameras first...

      --
      Cool post bro, highfive \o
    19. Re:What's it for? by Z00L00K · · Score: 1

      Except that more men are embarrassed by the fact that their spouse earns more than them than the other way around.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    20. Re:What's it for? by Anonymous Coward · · Score: 0

      My bet is SkyNet...

    21. Re:What's it for? by GregC63 · · Score: 1

      My bet is SkyNet...

      Crap I forgot to login

    22. Re:What's it for? by mangu · · Score: 1

      One of the people in this photo is a billionaire. See if you can guess which one.

    23. Re:What's it for? by Smauler · · Score: 1

      I did a little clicking from TFA, and found this page, which says: "The effect of heat is so pronounced that a temperature of 125C can slow down a processor’s frequency by up to 14%." I'm guessing this is an error... since a temperature of 125C can slow down most processors' frequency by up to 100%.

    24. Re:What's it for? by Anonymous Coward · · Score: 0

      Sure, but I (not knowing any billionaires) imagine that from their point of view it would be more like, "Why not both?" There are trade-offs between the two. The middle-class analogue would be a guy married to a hot, libidinous woman, yet he still collects porn. I know, not really the same thing, but porn collecting and having sexual partner(s) are not mutually exclusive, and the super-wealthy can afford just about anything they desire.

      - T

    25. Re:What's it for? by Compaqt · · Score: 1

      >Crap I forgot to login

      Don't worry, SkyNet did it for you.

      --
      I'm not a lawyer, but I play one on the Internet. Blog
    26. Re:What's it for? by retroworks · · Score: 1

      42

      --
      Gently reply
  2. Depressing by Anonymous Coward · · Score: 0

    Anyone else find it depressing that the two top suspects for the use of this system are Facebook and presumably a spy agency?

    Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?

    No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.

    1. Re:Depressing by m50d · · Score: 2

      I can see the likes of the LHC or the AEA using something like this - they generate enough data. But if it were a "good guy" why would they keep it secret?

      --
      I am trolling
    2. Re:Depressing by PPH · · Score: 4, Insightful

      Facebook and presumably a spy agency?

      You're repeating yourself.

      --
      Have gnu, will travel.
    3. Re:Depressing by BitZtream · · Score: 1

      Because it's a target regardless of who owns it. God could own it and call it the garden of Eden and people would still blow it up

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    4. Re:Depressing by Yamioni · · Score: 1

      We should all be glad it's not Apple that's building this then!

      --
      Cool post bro, highfive \o
    5. Re:Depressing by CCarrot · · Score: 1

      No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.

      I don't get it, does your bong look classier if it's all dressed up? ;p

      --
      "I love animals! Some are cute, others are tasty, what's not to like?" - Betsy Schroeder, Jeopardy contestant
  3. Not done yet by TheVidiot · · Score: 2

    Do they back up to tape or external USB drive?

    1. Re:Not done yet by S.O.B. · · Score: 5, Funny

      Punch cards.

      --
      Some of what I say is fact, some is conjecture, the rest I'm just blowing out my ass...you guess.
    2. Re:Not done yet by Hatta · · Score: 2

      Imagine a Beowulf cluster of these!

      --
      Give me Classic Slashdot or give me death!
    3. Re:Not done yet by stridebird · · Score: 2

      came for the beowulf comment...leaving satisfied.

    4. Re:Not done yet by Anonymous Coward · · Score: 0

      125,000,000,000,000,000 punch cards.

    5. Re:Not done yet by S.O.B. · · Score: 1

      The cluster is listed as 120 PB not PiB so 1 PB = 10^15 not 2^50. One IBM punch card in binary mode can hold 2 bytes per column * 72 columns (columns 73 - 80 are not used), so 144 bytes per card.

      So you would need 833,333,333,333,334 cards.

      My question is, can you make a punch card RAID5 array?

      --
      Some of what I say is fact, some is conjecture, the rest I'm just blowing out my ass...you guess.
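The card count in the comment above can be checked with integer arithmetic. This is only a sketch that takes the commenter's figures at face value (2 bytes per column, 72 usable columns, decimal petabytes); the function and constant names are made up for illustration.

```python
# Figures come from the comment above; all names here are hypothetical.
BYTES_PER_CARD = 2 * 72   # commenter's figure: 2 bytes/column * 72 columns
CLUSTER_PB = 120


def cards_needed(total_bytes: int, bytes_per_card: int = BYTES_PER_CARD) -> int:
    """Round up, since a partially filled card still counts as a card."""
    return -(-total_bytes // bytes_per_card)


decimal_cards = cards_needed(CLUSTER_PB * 10**15)  # PB  = 10^15 bytes
binary_cards = cards_needed(CLUSTER_PB * 2**50)    # PiB = 2^50 bytes

print(f"{decimal_cards:,} cards for 120 PB")
print(f"{binary_cards:,} cards for 120 PiB")
```

Reading the capacity as PiB instead of PB raises the count by roughly 12.6%, which is why the PB/PiB distinction in the comment matters.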
  4. I wonder.. by eexaa · · Score: 2

    ...about the sound and torque generated when all these disks start to spin-up.

    1. Re:I wonder.. by jhoegl · · Score: 2

      It may very well alter time as we know it!

    2. Re:I wonder.. by Anonymous Coward · · Score: 0

      And the heat, assuming they're using all their old Hitachi Deskstar drives.

    3. Re:I wonder.. by Anonymous Coward · · Score: 0

      ...as if millions of magnetic heads suddenly cried out in terror...

    4. Re:I wonder.. by ELCouz · · Score: 2

      Obviously, they are forming a deathstar ;)

    5. Re:I wonder.. by ae1294 · · Score: 1

      And the heat, assuming they're using all their old Hitachi Deskstar drives.

      That sounds like a plot to a disaster movie... "Sir, the cluster won't shut down! We're looking at a full melt down!"

    6. Re:I wonder.. by crow · · Score: 1

      If the torque were an issue (which it's not), you could mount the drives in alternating directions to balance them out.

    7. Re:I wonder.. by eexaa · · Score: 2

      My geek nature disapproves of such torque-negating behavior. Instead, it totally wants to see the petabytes spin at some insane RPM, cancelling the gravity and possibly crushing some enemies.

    8. Re:I wonder.. by rubycodez · · Score: 1

      mounting in alternating directions? I saw some twin girl porns like that.....

    9. Re:I wonder.. by turbidostato · · Score: 1

      Alternating directions you say? How exactly do you expect that to cancel torque?

      Upside-down.

    10. Re:I wonder.. by Anonymous Coward · · Score: 0

      I wonder if they also build their own nuclear power plant to keep all those disks spinning.

    11. Re:I wonder.. by crow · · Score: 2

      Yes, alternating directions. That assumes the drives are mounted vertically. If they're mounted horizontally, then yes, upside-down.

      If they're using SSDs, then they need special leveling algorithms to keep the accesses spread out so that they don't get out of balance. If you access the left side of all your SSDs in the rack, the rack might fall over. :)

    12. Re:I wonder.. by Yamioni · · Score: 1

      Citation Needed

      --
      Cool post bro, highfive \o
    13. Re:I wonder.. by X0563511 · · Score: 1

      ... brings a whole new meaning to "click of death"

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    14. Re:I wonder.. by CCarrot · · Score: 1

      ...as if millions of magnetic heads suddenly cried out in terror...

      ...and were suddenly erased.

      --
      "I love animals! Some are cute, others are tasty, what's not to like?" - Betsy Schroeder, Jeopardy contestant
    15. Re:I wonder.. by rrohbeck · · Score: 2

      Yup. Don't mount them all in the same orientation as the Earth's axis or you can probably measure the change in the day's length.

    16. Re:I wonder.. by rrohbeck · · Score: 1

      Spindle torque is not a problem because they rotate at a constant speed.
      Actuator torque is however. Large disk subsystems need careful construction of the frames and disk mounting or the torque of a seeking arm will cause tracking deviation in other drives that show up as increased RW error rates.
      That happens all the time with cheap disk chassis.

    17. Re:I wonder.. by wwphx · · Score: 1

      Not that it matters since you posted AC, but TFA says they'll be water-cooled. It's entirely possible that they might share such a cooling system with the servers accessing it.

      --
      When you sympathize with stupidity, you start thinking like an idiot.
  5. Finally... by TheAngryArmadillo · · Score: 2

    Somewhere I can store _all_ my porn in one spot.

    1. Re:Finally... by hot+soldering+iron · · Score: 1

      I think you mean "store _all_ THE porn".

      --
      When you want something built, come see me. If you want correct grammar and spelling, get a F*ing liberal arts student.
    2. Re:Finally... by TheAngryArmadillo · · Score: 1

      Me thinks you misunderestimate the extent of my depravity.

    3. Re:Finally... by Tarlus · · Score: 1

      Misunderstimate? Is that even a word? I used the Internets and tried to pull it up on the Google but all I got was maps.

      --
      /* No Comment */
  6. Paranoid much? by skids · · Score: 1

    its most likely use will be a storage device for a governmental (or Facebook) supercomputer.

    Actually, given the explosion of data storage needs in the bio-informatics area, its most likely use would be in storing DNA sequences for research purposes.

    1. Re:Paranoid much? by ByOhTek · · Score: 1

      The human genome can effectively be stored in about 750MB (each base being only 2 bits). The largest genomes are only about 10x that size. IIRC the FASTA files for it take only about 3GB uncompressed.

      Even with specific protein sequences, etc. I think that's a bit excessive for the bio-informatics field.

      Also, I'm not sure if even the NIH could afford that kind of storage cluster.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
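The 750MB and 3GB figures in the comment above follow from simple arithmetic. The sketch below assumes a round 3 billion bases for the human genome (an approximation, not a figure from the thread) and ignores FASTA headers and line breaks; the names are illustrative.

```python
# Assumption: ~3 billion bases, a round figure for the human genome.
BASES = 3_000_000_000
BITS_PER_BASE = 2          # A, C, G, T fit in 2 bits, as the comment says

packed_bytes = BASES * BITS_PER_BASE // 8   # 2-bit packed encoding
fasta_bytes = BASES * 1                     # one ASCII character per base

print(f"packed: {packed_bytes / 10**6:.0f} MB")
print(f"FASTA:  {fasta_bytes / 10**9:.0f} GB (headers and newlines ignored)")
```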
    2. Re:Paranoid much? by yomammamia · · Score: 1

      "The human genome"? That's a bit of a misnomer. With compression and differential storage however the point is still valid.

    3. Re:Paranoid much? by Anonymous Coward · · Score: 3, Informative

      Modern genome compression techniques only store the edits needed to convert the reference genome to your genome. And the diff file is just around 24 MB per person. I am an ex-bioinformatician.

    4. Re:Paranoid much? by tomknight · · Score: 1

      Data requirements are doubling faster than disk storage capabilities. We're needing to find ways of dealing with this, but ideally without simply asking for more money for more disks. I've just been told a new academic here will need about 200TB in a few months. I can see my (fairly small) set of Bioinformatics researchers needing a PB before the end of next year.

      --
      Oh arse
    5. Re:Paranoid much? by biodata · · Score: 2

      Our modest lab turns out roughly 100GB a week of finished sequence, from a single sequencer, which is only a very small fraction of the temporary disk storage needed along the way to get to finished sequence. Genome centres with many machines will turn out an order of magnitude (or two) more, and believe me, these machines are kept busy week after week. Once we have finished sequences, the assembly process adds a multiple to this. Yes, a genome is only XMB, but when you have to effectively sequence it 40 times to get the overlaps you need to assemble the thing, it soon mounts up. The sequencer machine companies are now touting similar scale machines on the basis that any lab can afford one to do their own sequencing. Sequence volumes have been outstripping Moore's law for some time now, and it isn't going to stop anytime soon. That said, I think Facebook and their CIA funders are probably more likely to have the money for this than anyone doing anything useful for humanity.

      --
      Korma: Good
    6. Re:Paranoid much? by ByOhTek · · Score: 1

      So am I. I was just talking about the base genome, not the diffs.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    7. Re:Paranoid much? by Beorytis · · Score: 1

      ...the diff file is just around 24 MB per person.

      OK, so 120 petabytes will store the genomes for about 5 billion people, not accounting for the further compression that could probably happen. Maybe this is for everyone's genome.
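The "about 5 billion people" estimate above is a single division. A quick sketch, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and the 24 MB per-person diff size quoted in the thread; the names are illustrative.

```python
# Both figures come from the thread; decimal units are an assumption.
CLUSTER_BYTES = 120 * 10**15          # 120 PB
DIFF_BYTES_PER_PERSON = 24 * 10**6    # 24 MB diff against the reference genome

people = CLUSTER_BYTES // DIFF_BYTES_PER_PERSON
print(f"{people:,} per-person genome diffs fit in 120 PB")
```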

    8. Re:Paranoid much? by skids · · Score: 1

      My understanding is that that amount of data is post-processed information, and that there are reasons not to be throwing out some of the intermediate data (it could be re-analyzed by better algorithms in the future), but it gets thrown out anyway just because there is no space to store it.

    9. Re:Paranoid much? by Marc+Madness · · Score: 1

      Maybe this is for everyone's genome.

      This is starting to sound like a spy agency again.

    10. Re:Paranoid much? by Anonymous Coward · · Score: 1

      I work in a similar lab, though we generate about 2 TB a week. The first thing I suspected when I saw this was that it's for a genome center.

      While the final, polished results of a genome are relatively small, the amount of data required to get to that point is very large. Sequence data scales are driven by the amount of _coverage_ needed to confidently identify features (or expression levels, if you're sequencing RNA). For human applications, these are anywhere from 15x coverage for simple variations to 100x or more for full reassembly of a genome (necessary for identifying context sequences for large structural variations). In data scales, you're looking at anywhere from a few hundred GB to multiple TB of working data.

      It's this working data that's really killing biology labs right now. Consider a lab with one sequencing instrument (say an Illumina GAII) and 5 active sequencing projects. Say each project needs 10 sequencing runs for a simple human variation study. Already, that's 500 GB per project, or 2.5 TB for the lab. Keep in mind that this data is actively processed, so that's 2.5 TB of _fast_ storage. This is for the simplest sequencing application - SNP detection with high quality data on a well annotated species. Changing species may require more depth (the references aren't as mature as human) and could double or triple the data requirements. Adding different types of experiments will also similarly increase the data scale.

      What's really interesting in this space is that the turnaround times for sequencing are shrinking. It took the human genome project 10 years to generate enough data for a single human genome. A GAII (or better yet, a HiSeq), can do the same in two weeks. ION Torrent's new sequencer is getting close to doing it in a few hours. Soon, biologists will be able to do deep sequencing runs that generate hundreds of GBs of data in an afternoon.

      Should be a fun time for bioinformaticians...

      -Chris

    11. Re:Paranoid much? by c0nner · · Score: 1

      We have many researchers arrive here with similar storage requirements, but when we ask them how they are going to process that captured data, or whether they have any plan for the real output, they end up realizing that spending $5k a month to store, replicate, and protect data that could be regenerated for $1k of compute cycles and a couple of days of waiting just doesn't make sense.

      Now admittedly we do house a bit over 1.2 PB of live data and another ~2 PB of archival and DR data so it isn't like we aren't storing a lot but if we were storing as much data as every new lab said they needed to store we would have closer to 20 times that storage.

      We see more waste in storage because the storage is so cheap compared to the past, even if everyone complains about how expensive it is compared to buying an external hard drive from Best Buy.

    12. Re:Paranoid much? by Anonymous Coward · · Score: 0

      Enuff to store the DNA(compressed) of 5 billion people, hmmm

  7. Fill 'er up by mmarlett · · Score: 4, Funny

    All I know is that if you put it on my computer, I'll have it filled in two years and have no idea what's actually on it.

    1. Re:Fill 'er up by rrossman2 · · Score: 1

      Sadly... that would apply to me as well :)

    2. Re:Fill 'er up by odirex · · Score: 1

      Porn.

    3. Re:Fill 'er up by Zeroko · · Score: 1

      If you could push data to it at 2GB/s continuously, 120PB would fill in about 1.9 years.
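The 1.9-year figure above can be reproduced as follows. This is a sketch assuming decimal units (120 PB = 120 * 10^15 bytes, 2 GB/s = 2 * 10^9 bytes per second) and a 365.25-day year; the names are illustrative.

```python
# Figures from the comment above; decimal units are an assumption.
CLUSTER_BYTES = 120 * 10**15     # 120 PB
RATE_BYTES_PER_S = 2 * 10**9     # 2 GB/s sustained write rate

SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds = CLUSTER_BYTES / RATE_BYTES_PER_S
years = seconds / SECONDS_PER_YEAR

print(f"{seconds:,.0f} s, or about {years:.1f} years")
```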

    4. Re:Fill 'er up by X0563511 · · Score: 1

      To be honest, I'd rather they focus more on reliability and durability than speed and capacity...

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    5. Re:Fill 'er up by Pharmboy · · Score: 0

      I'm pretty sure that when you are using 200,000 hard drives, you can engineer in reliability, durability, speed as well as capacity. Just a guess, but I'm betting it won't be a RAID 0 configuration.

      --
      Tequila: It's not just for breakfast anymore!
    6. Re:Fill 'er up by X0563511 · · Score: 1

      Erm, unless you are designing and manufacturing those drives, you have no say. "They" in my post referred to the disk manufacturers.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    7. Re:Fill 'er up by DiEx-15 · · Score: 1

      This would be the MPAA and RIAA's wet dream if somebody got their hands on it!

      Millions and billions of songs and movies... Wouldn't matter one iota if they all were legal: That person would be sued out of existence!

  8. Finally! by AngryDeuce · · Score: 2

    Woot! Torrent all the things!

    1. Re:Finally! by Anonymous Coward · · Score: 0

      You forgot the pic.

    2. Re:Finally! by Anonymous Coward · · Score: 0

      And wosh. Back to reddit you go!

    3. Re:Finally! by rrohbeck · · Score: 1

      Donate one to archive.org so they can really archive everything!

  9. Must Be by Anonymous Coward · · Score: 0

    downloading too much from TL again.

  10. When that thing crashes by jader3rd · · Score: 1

    When that thing crashes somebody is going to be mad. I wonder how long restoring from backup is going to take.

    1. Re:When that thing crashes by Bucky24 · · Score: 1

      I imagine that a lot of it is actually meant for backup. If I had something that size I'd partition it off into three segments, and make recursive backups. Because honestly, where are they gonna find any space to backup something that large if they do all 120 PB?

      --
      All the world's a CPU, and all the men and women merely AI agents
    2. Re:When that thing crashes by jader3rd · · Score: 1

      There is that, but I was thinking of what happens when the central coordinating unit goes down, it might take a lot of the data with it. Have you ever had a SAN go down and take everything with it? If you store your backups on the same SAN, the backups are gone as well.

    3. Re:When that thing crashes by Bucky24 · · Score: 1

      That's true. Hopefully they have redundant control systems as well.

      --
      All the world's a CPU, and all the men and women merely AI agents
    4. Re:When that thing crashes by Lennie · · Score: 1

      What about the time and RAM it needs for doing a fsck if this was one filesystem ?

      --
      New things are always on the horizon
  11. Sounds like a data orgy.. by katz · · Score: 1

    ...for hoarding whorecookies.

  12. How are they going to power that thing? by Anonymous Coward · · Score: 0

    If I'm not mistaken one hard drive needs about 12~14W, so assuming that half of those are under load at a time how are they going to power that thing?

    Not counting all the needed AC and support computers, network, etc...
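For a rough sense of scale, the drive power draw implied by the question above can be estimated. This sketch uses the commenter's guess of 12 to 14 W per active drive and counts only the drives themselves; idle drives, cooling, servers, and networking are ignored, which is a simplifying assumption. The names are illustrative.

```python
# Figures from the comment above; idle drives are assumed to draw nothing.
DRIVES = 200_000
WATTS_LOW, WATTS_HIGH = 12, 14   # commenter's per-drive estimate


def draw_megawatts(watts_per_drive: float, active_fraction: float) -> float:
    """Total power of the active drives, in megawatts."""
    return DRIVES * watts_per_drive * active_fraction / 1e6


half_low = draw_megawatts(WATTS_LOW, 0.5)
half_high = draw_megawatts(WATTS_HIGH, 0.5)
full_high = draw_megawatts(WATTS_HIGH, 1.0)

print(f"half the drives active: {half_low:.1f} to {half_high:.1f} MW")
print(f"all drives active:      up to {full_high:.1f} MW")
```

Under these assumptions the drives alone land in the low single-digit megawatt range.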

    1. Re:How are they going to power that thing? by tenco · · Score: 1

      3x 1 GW PSUs?

    2. Re:How are they going to power that thing? by X0563511 · · Score: 1

      AFAIK it needs much less to maintain once it's started. So you just have to power the disks up in a 'wave' and ensure your peak power demand is below what you can supply.

      Keeping some (large) capacitor banks charged can help with this as well. Put the cap banks in parallel with the power supply (being safe about it of course) and the caps should be able to provide for any peak shortfalls.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  13. Would be a good fit for CERN LHC by Tynin · · Score: 2

    My understanding is that the LHC generates so much data, that most of it is discarded immediately without going to disk. Seems like this would be a good solution to there data problems.

    1. Re:Would be a good fit for CERN LHC by Anonymous Coward · · Score: 0

      Sites with LHC data (CERN and associated institutes) already use about 200PB of storage, barely enough for the current mode of operation.

    2. Re:Would be a good fit for CERN LHC by Tynin · · Score: 1

      A good solution to THEIR data problems.

      Irregardless, grammer and spelin ain't no science, its a art form... and for all intensive porpoises, I lost power last night do to the slight'est bit of wind from whether system Irene (I live in south florida) from about ~6:30PM till ~3:00AM... at lease my generator worked, and FPL was on my road buy 8PM. Not sure why I came into work... I'm so tierd.

    3. Re:Would be a good fit for CERN LHC by rubycodez · · Score: 1

      they discard the common uninteresting decays, no point in storing it

    4. Re:Would be a good fit for CERN LHC by mswhippingboy · · Score: 0

      Irregardless

      Irregardless is an informal term commonly used in place of regardless or irrespective, which has caused controversy since it first appeared in the early twentieth century. Most dictionaries list it as "nonstandard" or "incorrect".

      --
      Sometimes the light at the end of the tunnel is the headlight of an oncoming train.
    5. Re:Would be a good fit for CERN LHC by Anonymous Coward · · Score: 0

      Some, maybe, but most are "uninteresting" only for a particular type of search. And you REALLY have to trust all of your various trigger levels to not accidentally throw out the good bits. I don't know any scientist who wouldn't store much much more given the opportunity.

      Of course, you then still have to somehow analyze that mess...

    6. Re:Would be a good fit for CERN LHC by geekmux · · Score: 1

      My understanding is that the LHC generates so much data, that most of it is discarded immediately without going to disk. Seems like this would be a good solution to there data problems.

      Ah, no, that's merely a solution for warehousing a shit-ton of data. But data itself is basically worthless.

      It takes people to turn data into information, and in this case, it would take a small army to turn that amount of data into anything useful.

      Chances are there's a valid reason most LHC data gets thrown away....either that, or in the billions spent building the damn thing, no one ever considered storage requirements.

    7. Re:Would be a good fit for CERN LHC by Anonymous Coward · · Score: 0

      No, you sir are mistaken.

      That can't possibly be the correct spelling of the word, since by feeling the need to post a correction you are admitting you couldn't understand the original word nor what the poster meant.
      If you don't even know what he meant, how could you possibly know what needs corrected, and to what?

      Please stop posting authoritatively about things you admit to not understanding, ktnx.

    8. Re:Would be a good fit for CERN LHC by Anonymous Coward · · Score: 0

      That was the point wasn't it?

    9. Re:Would be a good fit for CERN LHC by Tynin · · Score: 1

      I'll have you know that my intensive porpoises have freaking laser beams mounted on their heads, I will take your issue with 'Irregardless' and raise you one 'WHOOSH'! :-)

  14. p0rn by Anonymous Coward · · Score: 0

    or it is built for someone's porn collection.

    1. Re:p0rn by X0563511 · · Score: 1

      Congratulations, you've been nominated for this year's Most Useless Comment Award! We take great pleasure in awarding the MUCA, but in order to claim it you'll need to reply without your AC cloak.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
  15. Not the government. by girlintraining · · Score: 4, Interesting

    It's not the government guys, at least not the cloak and dagger kind. They're too paranoid to let you know how much data they can store. They also don't want you to know that even with all that data, they're still only able to utilize a fraction of it. People are still going through WWII wire intercepts *today*. No, the problem in the intelligence community is making the data useful and organized as efficiently as possible, not collecting it.

    That leaves only one real option: Scientific research. Look at how much data the Large Hadron Collider produces in a day...

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:Not the government. by DrgnDancer · · Score: 4, Insightful

      This is generally something I have a hard time convincing people of. I've worked for spooky organizations. Not at the highest levels or on the most secret projects, but in the general vicinity. The government is not monitoring you. Not because they lack the legal capability (though they do, and that is mostly, but not always, respected), but because they lack the technical ability. There are only so many analysts, only so much computer time, only so much storage. Except in cases of explicit corruption or misuse of resources, those analysts, that computer time, and that storage are not being wasted on monitoring Joe and Jane Average.

      I'm not going to say that there aren't abuses by the people who have access to some of this stuff; they are human and weak like the rest of us and are often tempted to take advantage of their situation I'm sure. In general however, unless you've done something that got a warrant issued for your information, the government doesn't care. They just don't have the resources to be big brother, even if they want to be.

      --
      I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
    2. Re:Not the government. by Anonymous Coward · · Score: 1

      Entirely beside the point. They could, therefore it can be abused, therefore there must be EXTREME oversight. Period.

      If the only difference between today and a surveillance state is some manpower, then all it takes is a change in policy. I don't trust them that much, I do now, and always will, advocate that the organs of state secrecy be dismantled. We don't need a state to begin with, these groups, even less so.

      In terms of what the people need, it is little more than an ineffective jobs program.

    3. Re:Not the government. by Anonymous Coward · · Score: 0

      >> People are still going through WWII wire intercepts *today*

      This is true. But you neglect to mention that the people doing this are historians.

    4. Re:Not the government. by DrgnDancer · · Score: 1

      It's not beside the point, it's the practical side of the point. This doesn't mean we should ignore questions of morality, how much power is too much, how much monitoring is appropriate, etc... It just means that while these philosophical questions are both interesting and relevant, you don't really need to worry about the practical implications day by day. Practically, the government *can't* watch you all the time, or really at all, unless you are the subject of some investigation worth those resources (or someone is doing something they shouldn't be). That doesn't mean we shouldn't seek to limit and control them from a legal and regulatory perspective, but it does mean you probably don't need to worry about spies in your attic.

      --
      I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
    5. Re:Not the government. by Anonymous Coward · · Score: 0

      Information overload has become a part of the fog of war, which is why we have computers these days. Joes and Janes might not be interesting because of their names and appearances, but Adnans and Jamels are interesting because of their names and appearances. Oh, I see what I did there and now I'm going sci-fi: how much storage would it take to create a sufficient database of images to produce models to be used in recognizers hooked up to public CCTVs for usable facial recognition?

    6. Re:Not the government. by m50d · · Score: 2

      Practically, the government *can't* watch you all the time, or really at all, unless you are the subject of some investigation worth those resources

      Trouble is, if the government does something I don't like, and I start taking (perfectly legal) political action against that, I become someone "worth" watching. So surveillance capability is something to worry about now; otherwise, when something directly problematic comes up, you're a dissident and it's too late.

      --
      I am trolling
    7. Re:Not the government. by mlts · · Score: 1

      Things can change though. For example right now, monitoring by the USG is not on my list of worries, because I'm sure I'd bore to tears any people watching.

      However, governments can change; the LEOs who are looking for felonies being committed and are abiding by their oath have a possibility of being replaced by people more interested in getting rid of any opposition.

      Take a system for figuring out if someone gets an intensive or routine search at customs. That same technology can be used to data mine social networking sites to find people who are a threat because of their ideas and their writings. This can be left or right ideology. What it can mean is an easy way for a repressive government to run a couple SELECT statements with a threshold number, and pass the results to a secret police to do some arresting. It doesn't even have to be people's political bents. It can be their race, religion, Alliance or Horde preference, or any factors.

      Right now, this isn't happening, so social networking sites are doing well. However, as soon as some government decides to use a social site on their soil to find people of a certain race for some ethnic cleansing action, this would all change.

    8. Re:Not the government. by afabbro · · Score: 1

      Entirely beside the point. They could, therefore it can be abused, therefore there must be EXTREME oversight. Period.

      Yeah, we should set up a government board or require government courts to monitor and oversee the government...er wait...

      --
      Advice: on VPS providers
    9. Re:Not the government. by LWATCDR · · Score: 1

      1. Get back on the meds, really.
      2. If they did get rid of the "organs of state secrecy" how would you know? They are secret after all.

      Really if you got rid of the CIA, NSA, and NRO they would still be there. No nation can survive without intelligence gathering organisations. So what would happen is they would be hidden and secret. Being public means that there is oversight.
      So really get back on the meds and the voices will stop.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    10. Re:Not the government. by Anonymous Coward · · Score: 1

      ...There are only so many analysts, only so much computer time, only so much storage....

      Looks like they don't have a storage problem anymore.

    11. Re:Not the government. by AmiMoJo · · Score: 2

      There are only so many analysts, only so much computer time, only so much storage.

      The government has found a solution to that problem. Distribute the computing and storage requirements.

      These days if you want a license to sell alcohol in your shop you have to get agreement from the police, and they usually require you to have extensive CCTV systems covering the area outside your shop as well as inside it. They shift the burden of installing and maintaining the system to the shop owner and can access the video any time they like. If a crime is reported the shop owner gets a demand for CCTV footage and has to go back into their archive, find and save it to disc all at no cost to the police force.

      Admittedly there are not enough people to monitor all these video streams at once, but they don't have to. They rely on victims reporting crime rather than actively looking for it. Unfortunately this makes them very lazy, and when CCTV footage isn't available they tend not to bother investigating.

      So it comes down to your definition of "monitoring you". If all email headers and the domain of every web site you visit are kept for two years and can be accessed by them at any time, and it costs them very little because the ISP you are paying for service is the one who is monitoring you and keeping all the data, then I'd say that meets the criteria.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    12. Re:Not the government. by Anonymous Coward · · Score: 0

      This is what bothers me about 'the guberment is out to get yous' people. They cannot seem to realize that the *same* people who are doing that can not be assed to help someone recover a stolen phone. It takes 2-4 hours to get a chunk of plastic that says you can drive on the road. So on and so on...

      It's not that they do not have the ability. It's that most who work there are too lazy to really care. They get *thousands* of requests a day for stupid shit. They are beat down by the very same system we all loathe...

    13. Re:Not the government. by misexistentialist · · Score: 1

      don't really need to worry about the practical implications day by day

      Normal activities like traveling or opening a bank account are quite noticeably affected by government surveillance of the financial and transport systems. Any practical limitation of government capabilities can be made up for by requisitioning private resources or by simply blocking events that are difficult to monitor from happening at all.

    14. Re:Not the government. by triffid_98 · · Score: 1

      Practically, the government *can't* watch you all the time, or really at all, unless you are the subject of some investigation worth those resources

      Do you own a cell phone? Your carrier knows where you are, right now and has records of where you've been this week. They know who you've talked to, and they're more than happy to share that information with 'interested parties' in the government, no warrant required.
      Given that the data sizes are small, there's no reason they can't store everyone's location/phone data on the off chance that one of them will become or come in contact with a 'person of interest'. This does not require the services of analysts and it would be trivial to automate.

      Is this still a philosophical question?

    15. Re:Not the government. by Anonymous Coward · · Score: 0

      Thank you for the insight. It makes sense.

      Unfortunately, two of the three of those issues "only so much computer time, only so much storage" are problems that are being solved at exponential speeds. (Moore's law is exponential) There is real value in old communication to the intelligence community -- if they could build a virtual time machine to go back and tap phone calls from last year based on information today, they'd do it in a heartbeat.

    16. Re:Not the government. by Bucky24 · · Score: 1

      Is Moore's law still valid? I thought I read something on Slashdot (sadly don't have link anymore) that said it wasn't the case for the last few years.

      --
      All the world's a CPU, and all the men and women merely AI agents
    17. Re:Not the government. by Bucky24 · · Score: 1

      Going by that idea, people are still going through intelligence gathered back in Roman times....

      --
      All the world's a CPU, and all the men and women merely AI agents
    18. Re:Not the government. by Hatta · · Score: 1

      The government is not monitoring you. Not because they lack the legal capability (though they do, and that is mostly, but not always, respected), but because they lack the technical ability. There are only so many analysts, only so much computer time, only so much storage.

      They may not be monitoring everyone all of the time, but they are always monitoring someone. Going by what we know of the competence of these organizations, we know they make mistakes. Therefore it's reasonable to assume that they are always monitoring someone they shouldn't be.

      In general however, unless you've done something that got a warrant issued for your information, the government doesn't care.

      So your position is that warrants never issue for false causes? If you have a warrant against you it's always your fault?

      I'm not surprised you have a hard time convincing people.

      --
      Give me Classic Slashdot or give me death!
    19. Re:Not the government. by xkr · · Score: 1

      I had the privilege of hearing a detailed talk by a former high-level spook. He explained the current bottlenecks that keep gov't intelligence services from doing useful work. Storing lots of data was not on the list. His talk supported the comments in the "Not the government" post.

      --
      I will create a sig when innovation restarts in the U.S.
  16. I propose a name for it ... by tomhudson · · Score: 1

    FTFS:

    The data repository, which currently has no name, is being developed for an unnamed customer,

    It's the tech equivalent of Prince - it's "the data repository with no name." We can denote it with some sort of unicode glyph that slashdot will mangle.

    And of course it has amazingly fast read speeds - if each drive has a 32 meg cache, that's 6.4 terabytes just for the cache.

    BTW, it's for the ^@#%^&^+++NO CARRIER

    1. Re:I propose a name for it ... by Anonymous Coward · · Score: 0

      Make a Gibson reference; call it the Aleph.

  17. 1.21 Jigawatts by Anonymous Coward · · Score: 0

    So if I had that kind of power, would I want to power a 120PB cluster or a flux capacitor. Decisions, decisions.

    1. Re:1.21 Jigawatts by Abstrackt · · Score: 1

      I'd go with the flux capacitor personally. Then you can go back in time, invest in IBM, Microsoft, Google, and Apple when shares are still cheap and buy the 120PB cluster. Assuming you drive a DeLorean anyway.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
    2. Re:1.21 Jigawatts by black+soap · · Score: 0

      Either spell it "gigawatts," or start referring to hard drive and memory in "jiggabytes."

    3. Re:1.21 Jigawatts by Anonymous Coward · · Score: 0

      Do you even know what a flux capacitor is?

    4. Re:1.21 Jigawatts by Tarlus · · Score: 1

      or start referring to hard drive and memory in "jiggabytes."

      Actually, I would be okay with changing this. All in favor?

      --
      /* No Comment */
    5. Re:1.21 Jigawatts by frank_adrian314159 · · Score: 1

      Assuming you drive a DeLorean anyway.

      But here's the thing - you don't really need the DeLorean! All you need is a beater that can get up to 88mph! And I used to do that all the time in my old Pinto... well, until that time I didn't quite make it up to 88 in time. But don't worry, the burns are healing nicely and the point still stands!

      --
      That is all.
    6. Re:1.21 Jigawatts by Culture20 · · Score: 1

      Jigga, whaAt?

  18. Wow. by UncleNinja · · Score: 0

    These guys have way too much time on their hands.

  19. Proof of corporate favoritism by government by OzPeter · · Score: 0, Offtopic

    The government happily stands by when a major corporation announces that it has 120 petabytes (ie petafiles - my emphasis) under its control, yet if the average joe schmo even thinks about how they'd like a petafile or two at home the FBI, CIA, TSA, ICE (and every other TLA) hauls his ass off to jail and etches a scarlet letter on his forehead.

    Such harassment by the government of simple people who aren't hurting anyone else needs to be stopped. Think of the children -- how are they going to cope when their own father/uncle/priest gets charged with accessing petafiles? They'll be the laughing stock of their peers!

    --
    I am Slashdot. Are you Slashdot as well?
    1. Re:Proof of corporate favoritism by government by Anonymous Coward · · Score: 0

      What are you talking about? I'm disenfranchised with the control as well, but who has been taken to jail for owning large amounts of data?

    2. Re:Proof of corporate favoritism by government by Anonymous Coward · · Score: 0

      it was a really bad joke... peda/peto... took me a while to understand as well.

    3. Re:Proof of corporate favoritism by government by mjperson · · Score: 0

      Wait, when did they arrest someone for having too much disk space?

    4. Re:Proof of corporate favoritism by government by mjperson · · Score: 0

      Oh, oops. I is p0wned.

    5. Re:Proof of corporate favoritism by government by rubycodez · · Score: 1

      the government is too busy with its War on Terrabytes to worry about the petafiles

  20. Loading times by ifrag · · Score: 1

    Perhaps this cluster can load Deus Ex : Human Revolution levels in a reasonable amount of time!

    --
    Fear is the mind killer.
    1. Re:Loading times by Anonymous Coward · · Score: 0

      What? Levels take just a few seconds to load on my average gaming build.

    2. Re:Loading times by Bucky24 · · Score: 1

      So they built this for YOU?

      --
      All the world's a CPU, and all the men and women merely AI agents
  21. Good job for a HS kid... by spagthorpe · · Score: 1, Interesting

    Run around with a shopping cart and swap out drives as they fail. Kind of like they did back in the early computer days with vacuum tubes.

    --

    WWJD -- What Would Jimi Do?
    (Smash amp, burn guitar, take home the groupies)

  22. Constant failures? by LordNimon · · Score: 1

    With 200,000 hard drives, won't there always be at least one hard drive that is failing? You'll need an IT guy 24/7 swapping out the failed drives. As soon as he swaps out one drive, another one will fail. It just seems kinda ridiculous.

    --
    And the men who hold high places must be the ones who start
    To mold a new reality... closer to the heart
    1. Re:Constant failures? by Anonymous Coward · · Score: 0

      No worries, they're using deskstars, so they'll all fail at once.

    2. Re:Constant failures? by SuperQ · · Score: 2

      This is what MTBF is all about. "Enterprise" drives are rated at 1.2 million hours MTBF. 1,200,000 hours / 200,000 drives = 6 hours per drive failure. Not too bad, only 4 a day.
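      A back-of-envelope sketch of that arithmetic in Python. It assumes a constant failure rate spread evenly over the fleet, and the function names are made up for illustration; real drives will not fail this evenly.

```python
def hours_between_failures(mtbf_hours, drive_count):
    """Expected hours between failures across the whole fleet,
    assuming every drive has the same constant failure rate."""
    return mtbf_hours / drive_count


def failures_per_day(mtbf_hours, drive_count):
    """Expected number of drive failures per 24-hour day."""
    return 24 / hours_between_failures(mtbf_hours, drive_count)


# Figures from the comment: 1.2 million hours MTBF, 200,000 drives.
interval = hours_between_failures(1_200_000, 200_000)
per_day = failures_per_day(1_200_000, 200_000)
print(interval, per_day)
```

      With the figures above this gives one failure every 6 hours, or 4 per day.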

    3. Re:Constant failures? by bigredradio · · Score: 1

      Since they can't backup to tape, maybe they will convert their old tape library to swap out hard drives.

    4. Re:Constant failures? by Jeng · · Score: 1

      I would guess that would be the reason for the water cooling, to increase the drives' reliability.

      Also from the article it sounds like they may have more than 200,000 hard drives hooked up, but only use 200,000 at a time so the computer can automatically begin recreating the dead drive as soon as it occurs.

      --
      Don't know something? Look it up. Still don't know? Then ask.
    5. Re:Constant failures? by Marc+Desrochers · · Score: 2

      How long does it take for the cluster to rebuild after a drive fails, and does this involve downtime?

    6. Re:Constant failures? by fuzzyfuzzyfungus · · Score: 1

      I'm assuming that IBM has better plumbers than I do; because "reliability" is not the first word that comes to mind when somebody suggests water-cooling 200,000 hard drives...

    7. Re:Constant failures? by h4rr4r · · Score: 1

      Even ancient RAID5 implementations are not that bad. Most likely this is really some sort of RAID over RAID over RAID, or some sort of RAID like software that does similar actions. This means no downtime and most likely nearly no speed costs for a single drive.

    8. Re:Constant failures? by guruevi · · Score: 1

      Given that 'water' cooling in computers is never done with water (and most other closed systems besides cars are neither) but with an inert fluid it's not really that big of a problem.

      Even in home computers, "water" in water cooling (as some dweebs have indeed used tap water) has been known to calcify, have algae growths and/or corrode the components and there are a lot of other liquids that are better at transferring heat than water.

      Also, pure water (the undrinkable kind) is inert.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    9. Re:Constant failures? by Manfre · · Score: 1

      http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

      Backblaze provides some metrics about their drive failure rates. It's surprisingly low (1-5% per year). If they had 200k drives, they would need to replace 39-192 per week. I'm sure the cluster is built with lots of redundancy that doesn't require a person to immediately replace a failed drive. They'll probably need a full time staff of at least 3 to maintain it.
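      The weekly replacement estimate can be reproduced with a short Python sketch. The 1% and 5% annual failure rates are the ones quoted above; the function name is hypothetical, and the model simply spreads failures evenly over 52 weeks.

```python
def weekly_replacements(drive_count, annual_failure_rate):
    """Expected drive replacements per week, assuming failures are
    spread evenly over a 52-week year."""
    return drive_count * annual_failure_rate / 52


low = weekly_replacements(200_000, 0.01)   # 1% per year
high = weekly_replacements(200_000, 0.05)  # 5% per year
print(f"{low:.1f} to {high:.1f} drives per week")
```

      That lands in the same range as the 39-192 per week quoted above, give or take rounding.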

    10. Re:Constant failures? by marcroelofs · · Score: 1

      Jury still out on the meaning of MTBF (1,200,000 hours is longer than HDs have existed). Recent sampling research showed something of at least 1% failure per year. That would mean 2000 disks per year, or around 10 per working day. One disk to change every 3/4 of an hour. On average! Since there sometimes is a 'cause' for failure, that 'cause' can easily make them happen at the same time.

    11. Re:Constant failures? by Rockoon · · Score: 1

      It will probably take longer than 45-minutes to find, verify, and replace a drive in that vast sea of 200000 water cooled drives.

      Also, that 1% figure is bullshit. Expect 6% to die in the first year (nearly 3% in the first 3 months)

      --
      "His name was James Damore."
    12. Re:Constant failures? by Anonymous Coward · · Score: 0

      In general with enterprise storage arrays, you RAID across a relatively small group of drives. For example, we're currently setting up some SATA 2TB drives in our storage array in groups of 7+2, meaning 7 data drives + 2 parity drives (a.k.a. RAID 6). Then you have several individual RAID groups, which are presented through a few other layers in the array controller, to the devices driving them, which in turn have to do something to aggregate all those devices.

      Basically, it depends on a lot of things, but the array isn't doing RAID/parity calculations across the whole array at once. THAT would be very poor-performing.

      Oh, and with some of the larger SATA drives (eg. 1-3 TB), it's not uncommon for it to take 1-2 days to rebuild, especially when the array is under load.
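      A rough Python sketch of what the 7+2 grouping described above costs in capacity. The function names are hypothetical, and the numbers ignore hot spares, formatting overhead, and anything the array controller reserves for itself.

```python
def usable_fraction(data_drives, parity_drives):
    """Fraction of raw capacity left for data in one RAID group."""
    return data_drives / (data_drives + parity_drives)


def usable_tb(data_drives, drive_tb):
    """Usable capacity of one RAID group in TB."""
    return data_drives * drive_tb


# One 7+2 RAID 6 group of 2 TB drives, as in the comment.
print(f"{usable_fraction(7, 2):.1%} of raw capacity is usable")
print(f"{usable_tb(7, 2.0)} TB usable out of {9 * 2.0} TB raw")
```

      Two parity drives per group means any two drives in the group can fail without data loss, at the price of roughly 22% of the raw space.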

    13. Re:Constant failures? by SuperQ · · Score: 1

      That highly depends on the drives, enclosures, and IO rates.

    14. Re:Constant failures? by SuperQ · · Score: 1

      No, you still don't get it. The meaning of MTBF is well understood.

      http://en.wikipedia.org/wiki/Mean_time_between_failures

      MTBF as it relates to drives is only valid when you take population stats and the lifespan of a drive into account (assume the lifespan is 3 years).

      Say you use these drives:
      http://www.seagate.com/www/en-us/products/enterprise-ssd-hdd/constellation/constellation-2/#tTabContentSpecifications

      With an MTBF of 1,400,000 hours and 200,000 drives, you will see a drive die roughly every 7 hours. If you believe MTBF you will see about 1250 drive failures per year. This is ~0.62%, which Seagate publishes as the annual failure rate.
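      The conversion from MTBF to an annual failure rate can be sketched in Python as below. This uses the simple linear approximation (hours per year divided by MTBF), which is reasonable when the MTBF is far larger than a year; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_failure_rate(mtbf_hours):
    """Approximate fraction of drives expected to fail per year."""
    return HOURS_PER_YEAR / mtbf_hours


def failures_per_year(mtbf_hours, drive_count):
    """Expected failures per year across the whole fleet."""
    return drive_count * annual_failure_rate(mtbf_hours)


afr = annual_failure_rate(1_400_000)
print(f"AFR: {afr:.2%}")
print(f"Failures per year: {failures_per_year(1_400_000, 200_000):.0f}")
```

      For 1,400,000 hours this works out to a bit over 0.6% per year, or roughly 1250 failures a year across 200,000 drives.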

    15. Re:Constant failures? by Bucky24 · · Score: 1

      Why is pure water undrinkable? I assume by pure water you mean H2O and nothing else. The human body should be able to process that. And even if it's somehow useless to the body, we still can drink it... it's just probably not a good idea. Unless you mean that pure H2O doesn't exist, which I guess would make it undrinkable...

      --
      All the world's a CPU, and all the men and women merely AI agents
    16. Re:Constant failures? by X0563511 · · Score: 1

      MTBF doesn't seem to work all that well for drives. If they are going to fail on their own, they tend to do so quickly. Else, left alone, they work FOR YEARS.

      Of course, all of that goes out the window when someone does something silly like knock the rack over.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    17. Re:Constant failures? by X0563511 · · Score: 1

      At this size, likely there will be space between disks with heat exchanging protrusions. Stacked in banks, a couple of pumps near the bottom to ensure movement...

      Convection would do the majority of the work, and you wouldn't really have piping so much as a "pool" the disk enclosures stick into.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    18. Re:Constant failures? by bws111 · · Score: 1

      Computers certainly are chilled with real water. For instance, an IBM Z196 requires two connections to the building chilled water at up to 21GPM each. This water has requirements such as pH range, bacteria counts, turbidity, hardness, etc. It is real (not pure) water.

    19. Re:Constant failures? by X0563511 · · Score: 1

      Don't forget that pure water is also quite corrosive, and it doesn't take too many impurities for it to start not being pure anymore.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    20. Re:Constant failures? by SuperQ · · Score: 1

      Yes, it does very much work for drives. How many drives do you own/manage? 10? 100? I work on petabyte+ storage clusters where we have tens of thousands of drives. I've done the math.

    21. Re:Constant failures? by ClickOnThis · · Score: 1

      This is what MTBF is all about. "Enterprise" drives are rated at 1.2 million hours MTBF. 1,200,000 hours / 200,000 drives = 6 hours per drive failure. Not too bad, only 4 a day.

      I don't think the math is that simple. You're assuming that the probability of a drive failing is uniform over the lifetime of the drive (or equivalently, the 200,000 drives have uniformly-distributed ages.)

      I'm not an expert on drive reliability, but I would guess that, if you started with all-brand-new drives, you'd face an initial period of multiple infant mortalities, followed by a rising rate of failure of individual drives as they age. Eventually I suppose the failure rate would reach a steady state as new drives replace old failed ones, but you may or may not see an oscillation in the failure rate as you approach that steady state (like a damped second-order differential equation.)

      Of course, failure of an individual drive is not the same as failure of the cluster. With enough redundancy, you can handle individual drive failures with no more bother than a horse twitching to shoo a fly off its hide.

      --
      If it weren't for deadlines, nothing would be late.
    22. Re:Constant failures? by ImprovOmega · · Score: 1

      MTBF has a very explicit meaning. It is (total time)/(number of failures). To calculate MTBF when it is high, you need to run a sample of, say, 100,000 drives for, say, 2000 hours (right around three months). Count the failures. You now have 200,000,000 drive-hours / (number of failed drives). For 1.2 million hour MTBF that translates to about 167 drives failed in three months out of 100,000 tested.

      You can still be statistically valid with a smaller sample set, say 10,000 drives over three months, with about 16-17 failures in the batch. But please don't say the jury is out on MTBF; it has a well documented, well understood, precise mathematical meaning. I hope this little example helps people understand it a bit better.
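      The definition above (total time divided by number of failures) is easy to sketch in Python. The function names are hypothetical, and the model assumes failed drives are not replaced during the test, which barely matters when so few fail.

```python
def estimate_mtbf(drive_count, test_hours, failures):
    """MTBF estimate: total drive-hours divided by observed failures."""
    return drive_count * test_hours / failures


def expected_failures(drive_count, test_hours, mtbf_hours):
    """Failures you would expect to count if the rated MTBF is accurate."""
    return drive_count * test_hours / mtbf_hours


# 100,000 drives run for 2000 hours against a 1.2 million hour rating.
print(round(expected_failures(100_000, 2000, 1_200_000)))
# Working backwards from 167 observed failures.
print(round(estimate_mtbf(100_000, 2000, 167)))
```

      Expected failures scale linearly with the sample size, so a smaller batch gives proportionally fewer failures to count and a noisier estimate.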

    23. Re:Constant failures? by ImprovOmega · · Score: 1

      Umm...pure water is very much drinkable, it just doesn't taste like one would expect.

      I mean, I know laboratory grade ethanol is often denatured with benzene (which will kill the hell out of you) but I can't imagine what they would use that would be toxic but still result in purer water...?

    24. Re:Constant failures? by guruevi · · Score: 1

      That is most likely for a heat exchanger type system then. There are indeed data centers that use chilled water for cooling but they are either used in combination with special purpose racks (with a sealed internal system) or each rack/server has a heat exchanger that uses another fluid inside the guts of the servers. You won't see water in a system that has a high risk of component, connection or user failure, or running through the electronics of a server.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    25. Re:Constant failures? by guruevi · · Score: 1

      That's what I've always been told (probably an urban myth). I think the myth goes that if you drink distilled or deionized water for more than a few days that you remove/decrease certain minerals and electrolytes which could be dangerous to your health (which is the reverse of drinking too much water which is also deadly).

      Funny that Oxygen and Hydrogen are both dangerous to your health when pure but not as a combination.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    26. Re:Constant failures? by Bucky24 · · Score: 1

      This calls for an episode of Mythbusters!

      --
      All the world's a CPU, and all the men and women merely AI agents
    27. Re:Constant failures? by X0563511 · · Score: 1

      Datacenter. 100s to 1000s.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    28. Re:Constant failures? by hawkinspeter · · Score: 1

      RAID5 implementations are all evil! When a drive fails, you then dramatically increase the disk accesses to recreate the data from the parity. Also, RAID5 can only survive one disk breaking at a time.

      I'd imagine they're using something more like ZFS or whatever IBM has that has similar features.

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    29. Re:Constant failures? by rrohbeck · · Score: 1

      A replacement drive has to be written with the recovered data, that's capacity/data rate seconds worst case, which is quite long with modern drives. That's why you go from RAID-5 to RAID-6 and then to RAID-newfangled with distributed checksums for large arrays, otherwise chances of incurring another failure during that time become significant.
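      The "capacity/data rate" worst case can be put into numbers with a small Python sketch. The 2 TB capacity and 100 MB/s sustained write rate are assumptions picked for illustration, not figures from the article, and a rebuild on an array under load will be slower than this.

```python
def rebuild_hours(capacity_bytes, write_rate_bytes_per_s):
    """Minimum time to fill a replacement drive, in hours."""
    return capacity_bytes / write_rate_bytes_per_s / 3600


# Assumed example: 2 TB drive written at a sustained 100 MB/s.
capacity = 2 * 10**12
rate = 100 * 10**6
print(f"{rebuild_hours(capacity, rate):.1f} hours at best")
```

      Even this idealized figure runs to several hours per drive, which is the window in which a second failure in the same group has to be survived.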

    30. Re:Constant failures? by dbIII · · Score: 1

      A little bit of hydrazine or other additives, most very toxic because the idea is to remove dissolved oxygen, fixes that to an extent.

    31. Re:Constant failures? by Vegemeister · · Score: 1

      ...there are a lot of other liquids that are better at transferring heat than water.

      Not many. Molten lithium is one.

    32. Re:Constant failures? by h4rr4r · · Score: 1

      Not if you have enough backplane speed. I just tested this on a device I am using in RAID5 because its RAID10 implementation sucks.

    33. Re:Constant failures? by hawkinspeter · · Score: 1

      I don't see how the backplane speed helps if a drive fails as you typically need double the reads from the disk to recreate the data that you're after. I suppose it depends on your workload not needing the throughput that the disks can provide (and not maxing out the array cache).

      Mirror and stripe is the answer, now what's the question?

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    34. Re:Constant failures? by h4rr4r · · Score: 1

      What I mean is if you have enough spindles and the backplane is fast enough raid 5 rebuild slowdown is pretty low. Try it out, modern devices can do it with 10-20% hit in speeds.

      RAID 10 is only good if the device supports it well, some don't. By that I mean I have a low end device that does 300MB/s max, with RAID 5 over 22 disks. RAID 10 those same disks and you will not break 250MB/s. The vendor has acknowledged this issue is due to the low controller specs.

    35. Re:Constant failures? by SuperQ · · Score: 1

      Yea, that's not enough drives to really see useful statistics. +- 1 drive failure per year is a 20% margin of error at 1000 drives with a rated failure rate of basically 6 per year.

    36. Re:Constant failures? by hawkinspeter · · Score: 1

      RAID5 over 22 disks? You're brave if you're using big disks. The problem with RAID5 is that when one disk fails, you then have to rebuild the data (with a hot spare for instance) completely before another disk fails. As you then have increased disk activity, you increase the chance of a disk failure during that rebuild time even if you don't hit a read error.

      I'm not a big fan of relying on vendors' RAID implementations as they can be buggy and if you have a controller go south, you have to ensure you've got a matching controller to even be able to recover your data. I prefer software RAID (e.g. Linux MD) as modern CPUs and memory sizes can get more performance than a RAID controller and you've got the advantage of being able to read the volumes from any machine.

      Mirror and stripe.

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    37. Re:Constant failures? by X0563511 · · Score: 1

      But doesn't it count for something that this was observed consistently over a period of 2 years (direct observation by me), backed by anecdotal statements confirming that this was not unusual (so, if you accept that, expand that to 4-5 years)?

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    38. Re:Constant failures? by h4rr4r · · Score: 1

      I agree 100%, but the devices I am talking about are iSCSI SANs that are VMware approved. The Linux kernel iSCSI implementation is pretty poor and does not even provide iSCSI3.

      With local storage I agree with you 100%. I have tested rebuilds on this setup and it is only a couple of hours.

    39. Re:Constant failures? by SuperQ · · Score: 1

      2 years, 5 years, 10 years, it doesn't matter. These observations don't help at all if you're not looking at a large enough sample.

      See also: http://en.wikipedia.org/wiki/Bathtub_curve

  23. Google? by JustAnotherIdiot · · Score: 1

    This just kinda strikes me as who would need this. Backing up the entire internet has to take up some space.

    --
    What do I know, I'm just an idiot, right?
    1. Re:Google? by Anonymous Coward · · Score: 0

      According to http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/, Google probably has around 900k servers, and it's well known that they distribute disks to "normal" servers rather than concentrating in RAID configurations. Assuming the disks average 500GB, that's around 450PB of storage.

      Also, a drive failure every hour or so. :)

    2. Re:Google? by Bucky24 · · Score: 1

      I honestly don't think Google would do this. This sounds like it's hosted in a single data center, and a company as widespread as Google wouldn't put all their data in a single datacenter. Single point of failure and all that.

      --
      All the world's a CPU, and all the men and women merely AI agents
    3. Re:Google? by Anonymous Coward · · Score: 0

      You think they don't already?

    4. Re:Google? by Lennie · · Score: 1

      Not to forget: latency

      --
      New things are always on the horizon
  24. Pro cameras by CrowdedBrainzzzsand9 · · Score: 1

    It's for storing images from Nikon's new "Petapixel Pro" D7000000 camera

  25. Your mission, should you choose to accept it. by Anonymous Coward · · Score: 1

    1) Download the internets
    2) re-host the internets
    3) ????
    4) I really don't know. I'm scared.

  26. Needs maintenance by Anonymous Coward · · Score: 0

    Can anyone give an estimate how many disks have to be replaced every day? Can (are) big disk arrays be built so that replacements can be automated?

    1. Re:Needs maintenance by owlstead · · Score: 1

      What do you mean: can big disk arrays be built so that replacements can be automated? Of course they can be built; it would not even be that hard. Well, as long as you don't put drive/server production and delivery of the components or auto assembly in the automated system. I could not find one on Google; I guess on such a large drive array, you can afford a human to replace some disks now and then. Humans are more flexible and more likely to spot other problems occurring as well.

    2. Re:Needs maintenance by rubycodez · · Score: 1

      even in "small" disk arrays the replacements are automated with hot spares. of course you periodically replenish the hot spare pool, but one doesn't need to go running every time a disk fails

    3. Re:Needs maintenance by fuzzyfuzzyfungus · · Score: 1

      Pulling and replacing drives from hot-swap slots in a drive shelf would only be a slight change from the long-available robotic tape silo systems; but I've never heard of a situation where rigging such a thing up made economic sense...

      Your hot-spares provide immediate 'replacement', which allows you to make physical replacement less time-critical just by adding more drives to the system, and most big-huge-storage systems have front mounted indicator lights for drive health.

      Having a human on duty who gets paged with an aisle and rack number, grabs a spare drive, and goes over and swaps it out for the one with the red light just isn't all that expensive on the scale of such a system... With the homogeneity of the drives, and the automated monitoring, swapping dead drives is easier than stocking grocery shelves(the latter isn't rocket-surgery; but there are substantial variations in size, shape, density, crushability, changes in location according to daily/weekly promotional campaigns, etc. Yes, I've done both.) The biggest personnel expense, in the case of an "unnamed customer" buying a bespoke system from IBM, will probably be finding somebody who has sufficiently high clearance to touch the system; but is still willing to be a drive monkey...

    4. Re:Needs maintenance by mlts · · Score: 1

      I looked into making a hard drive silo as a business. Even dropped the business proposal by some vendors. You would put bare SATA or SAS drives in a load port and they would be dropped into place in groups for reading/writing. Critical data would have four HDDs writing at a time (three way mirror, plus one HDD that would go offsite.) Non critical would get 5-8 HDDs writing in a RAID 6 configuration. It would have been nice to have because disks can be erased faster than tapes for security (just do an ATA level secure erase when the data expires before writing new stuff).

      However, I encountered a few problems:

      1: 3.5" drives or 2.5" drives? A lot of enterprise arrays are running on the smaller drives. One could do both, but essentially it requires two silos due to the completely different shapes of the drives (requiring different grippers and such).

      2: Engineering grippers for the drives. Enclosures would make the setup a lot more expensive, and there wouldn't be any standard for those. So, the drives need to be moved around bare. This is harder than you think, as a bare drive isn't engineered to have reliable gripping surfaces.

      3: Delicate mechanisms. If a robot drops a tape, who cares. If there is no physical damage, it will work. A HDD that gets any significant shock is pretty much toast, or at best will be unreliable.

      4: I could not find anyone interested in making a robotic mechanism for this. The only party that would do the job was Siemens.

      5: Nobody was interested enough to fund this project.

      I wish this had worked out. A silo like this could be used as a disk array, swapping out bad disks automatically, a VTL, a replacement for a tape array, a place for cloning disks to send out to remote sites, all kinds of uses.

    5. Re:Needs maintenance by fuzzyfuzzyfungus · · Score: 1

      Out of curiosity: I've seen a number of systems that use screwless drive rails that take advantage of the 3 screw holes on each side, and 4 on the bottom (for 3.5"; 2.5" has something slightly different, but also fairly standard), by having pegs, just slightly smaller than the screws would be, that slide into the holes. As long as modest pressure is applied to keep them in the holes (they aren't threaded or tight enough to be friction fit), the setup is pretty solid. Could a gripper mechanism employ either similar peg-insertion directly, or have an internal supply of rails with a good grip on one side and pegs on the other (possibly even two internal supplies, one for 3.5"s and larger 'shim' ones for 2.5"s)?

    6. Re:Needs maintenance by mlts · · Score: 1

      That was thought of, but grippers have to do tens of thousands of moves, and reliably hitting those holes each time, every time, was an issue. If there was a misalignment, then the drive might not make it onto the gripper's tray correctly. The good thing is that with a robotic gripper that has a tray for the drive to slide on, the chances of it falling are less, but trying to get it to a place where it can be read might be an issue.

      Of course, an enclosure would remedy this completely, but there are no real standards for drive enclosures, and it would dramatically increase the cost per drive, unless one could make and sell a large number, so economies of scale could kick in.

  27. 600GB drives? by Hsensei · · Score: 1

    120 million divided by 200,000 = 600. Even on an enterprise scale they could get a lot better densities.

    --
    ~
    1. Re:600GB drives? by IDK · · Score: 1

      One word: Redundancy.

      120petabyte*5/3/200000 = 1TB
      with 2 redundancy disks per 5 disks
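
      The arithmetic above can be sketched in a few lines of Python. This is only a rough check, assuming (as the comment does) that the 120 PB figure is usable capacity and that 2 of every 5 disks hold redundancy; the function name and parameters are made up for illustration.

```python
# Rough sketch: raw capacity per drive needed to offer a given usable capacity.
# Assumptions (not confirmed by the article): 120 PB is *usable* space and
# 2 out of every 5 disks are spent on redundancy.

def raw_tb_per_drive(usable_pb, drives, group_size, redundant_per_group):
    """Return the raw size (in TB) each drive must have."""
    data_fraction = (group_size - redundant_per_group) / group_size
    raw_pb = usable_pb / data_fraction      # total raw capacity in PB
    return raw_pb * 1000 / drives           # 1 PB = 1000 TB (decimal units)

with_redundancy = raw_tb_per_drive(120, 200_000, 5, 2)
no_redundancy = raw_tb_per_drive(120, 200_000, 5, 0)
print(f"2-of-5 redundancy: {with_redundancy:.2f} TB per drive")
print(f"no redundancy:     {no_redundancy:.2f} TB per drive")
```

      With no redundancy the same function gives the 600 GB figure from the parent post, so the two numbers are consistent with each other.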

    2. Re:600GB drives? by maxwell+demon · · Score: 1

      I'm pretty sure there's some redundancy built in, so you need more than 120 PB of raw disk storage to provide 120 PB of usable storage. If they don't add redundancy, they will have an unpleasant experience as soon as the first hard disk fails (and with 200,000 of them, this will be very soon).

      --
      The Tao of math: The numbers you can count are not the real numbers.
    3. Re:600GB drives? by Anonymous Coward · · Score: 0

      120 million divided by 200,000 = 600. Even on an enterprise scale they could get a lot better densities.

      you're assuming no redundancy at all in the array, highly unlikely considering the volume of data they want to store that they would risk the whole array dying from a single drive failure.

    4. Re:600GB drives? by PezJunkie42 · · Score: 1

      Depends on the type of drive. Current 15k RPM SAS drives don't go much bigger than that... Also, (as TFA mentions) once you factor in some kind of overhead for redundancy, you're probably talking about 1TB drives. (Assuming this is 120 PB of *usable* capacity.)

    5. Re:600GB drives? by rrossman2 · · Score: 1

      So how long would a consumer grade "raid chipset" take to rebuild that raid if it was a raid 5 setup (with the drives split into 3 different raid 0 setups)?

    6. Re:600GB drives? by SuperQ · · Score: 1

      No, when people publish these kind of articles they almost always state raw capacity. They're likely using 2.5" drives which don't hold as much as 3.5" drives.

    7. Re:600GB drives? by Bucky24 · · Score: 1
      From TFS:

      IBM Research Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes

      I am willing to bet that part of that special hardware is something that is quite a bit more robust and powerful than a consumer grade "raid chipset"

      --
      All the world's a CPU, and all the men and women merely AI agents
    8. Re:600GB drives? by walshy007 · · Score: 1

      Or they have replicated each set of data at least 3 times, for redundancy purposes when the inevitable high number of disks start failing.

      (with that many disks, it would not surprise me to have a guy full time employed whose purpose is to simply replace dead hard drives)

    9. Re:600GB drives? by warchildx · · Score: 1

      nice calculation. based on those numbers -- looks like they are ATA (SATA) disks. i am not aware if SAS drives have that density (off the top of my head).

  28. 120 petabytes? That's amazing by Anonymous Coward · · Score: 0

    That's almost enough to install Vista

  29. emo? by luis_a_espinal · · Score: 1

    Anyone else find it depressing that the two top suspects for the use of this system are Facebook and presumably a spy agency?

    Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?

    No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.

    You are trying too hard looking for something to be upset about (in a very attention-whorish manner to boot.)

    1. Re:emo? by luis_a_espinal · · Score: 1
      And just to prove my point.

      Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?

      Yes. It ain't that hard to come to that answer, you know? The Slashdot story half-seriously hints at either a government agency (NSA) or somebody like Facebook. And obviously, in Emo fashion, you took it as a statement about humanity. It's more a statement about you.

      I find this type of opinion rather simplistic, as other opportunities in large-scale application engineering abound:

      1. Data collection and simulation done by the DoE or DoT (not necessarily just a DoE-related agency)
      2. A High-Energy particle collider
      3. Big-Iron for large-scale Online Transaction Processing (think Airline reservation systems)
      4. Algo-Trading
      5. Pharma
      6. Or even IBM's own venture into, God knows, web service platform providers or online searching in direct competition with Google, MS or Amazon.

      With the exception of the first two, all other potential clients could request anonymity. Would be nice to know for what purpose this behemoth is being built. Would be even better if people could rub a pair of neurons together and come up with similar sample lists (it's not rocket science) as opposed to going ZOMG humanity sux plz hold me!.

    2. Re:emo? by GooberToo · · Score: 1

      Don't forget the IRS.

    3. Re:emo? by jc42 · · Score: 1

      Another possible reason for this is to handle something that the US government is seriously planning to impose on all its ISPs: a requirement that records be kept of every message sent/received by every customer for several years.

      Some of the reports of this have tried to be reassuring by telling us that they'll only have to save the source/dest IP addresses, and presumably also the timestamps. But you might try doing a bit of arithmetic: Assuming a minimum database record of only these three 4-byte quantities (for IPv4), i.e., 12 bytes per message. You might ask your ISP for the total number of messages they handle per day. If you can get this info, or you can estimate it from numbers they will provide, multiply it by 12 (for the DB record) and by 365x2 (for two years storage). You might be impressed by the size of the resulting number.
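
      For anyone who wants to try that arithmetic, here is a minimal sketch. The 12-byte record and two-year retention come from the paragraph above; the messages-per-day figure is purely hypothetical, since no real ISP number is given here.

```python
# Rough sketch of the log-size estimate described above.
# record_bytes=12 is source IP + destination IP + timestamp, 4 bytes each (IPv4).
# The traffic figure below is a made-up placeholder, not a real ISP statistic.

def log_storage_bytes(messages_per_day, record_bytes=12, retention_days=365 * 2):
    """Bytes needed to keep one fixed-size record per message."""
    return messages_per_day * record_bytes * retention_days

hypothetical_messages_per_day = 10**12  # assumption for illustration only
total = log_storage_bytes(hypothetical_messages_per_day)
print(f"{total / 10**15:.2f} PB before any database or file system overhead")
```

      Plug in whatever number your ISP will admit to; the result scales linearly with it.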

      And, of course, the actual DB requirements could well be much higher. It's possible that the regulation may require saving all header info, for some meaning of "header". For TCP, this would be a minimum of 128 bytes per message. And, of course, there would be all the usual DB (or file system) overhead for all these records, which could easily double the resulting numbers.

      There are good reasons that nobody has ever actually kept this sort of data, only various running sums. If the non-computer-savvy regulators decide ("for the children") that ISPs must provide this data, this alone means petabyte-per-day databases must come into existence. Every ISP, except maybe your local mom-and-pop Internet service, will need one to meet the traffic tracking requirements.

      Similar requirements are being implemented by many other governments around the world.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    4. Re:emo? by fast+turtle · · Score: 1

      I think you mean Google as that just might be enough space to handle their current storage needs

      --
      Mod me up/Mod me down: I wont frown as I've no crown
  30. Media storage? by Anonymous Coward · · Score: 0

    How about some sort of gigantic media library... all porn jokes aside. Netflix? Apple? Isn't Walmart getting into the streaming business? Or some new "cloud" server?

  31. Not so impressive as a floppy RAID by erroneus · · Score: 1, Informative

    If they could make a 120PB cluster using floppy disks, I would be much more entertained by this.

    1. Re:Not so impressive as a floppy RAID by Anonymous Coward · · Score: 0

      Stack them on top of each other and we do not need a space shuttle to reach the ISS....

    2. Re:Not so impressive as a floppy RAID by mauhiz · · Score: 1

      That would require on the order of 10^11 floppy drives, and they probably would eat all of Earth's energy throughput.
      Disclaimer: I would be quite entertained too. Even more with 5.25" drives.

    3. Re:Not so impressive as a floppy RAID by rrossman2 · · Score: 1

      Man.. and make sure it's the 5.25" drives that love to chatter... kind of like a commodore 64 drive loading up flight simulator ][

    4. Re:Not so impressive as a floppy RAID by jandrese · · Score: 1

      Just for the heck of it, I worked out the math on this. Assuming 1.44MB 3.5" floppies, you will need 83,333,333,333 disks to store all of that data. Not even accounting for the drives, the disks alone would fill a volume of 2,240,418.91 m^3 (591,856,062 US gallons). I don't know for sure, but I suspect that number exceeds the number of floppies that have ever existed, although it is only about 12 floppies for every man, woman, and child on the Earth.
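
      The disk count is easy to reproduce. The sketch below assumes decimal units (120 PB = 120 x 10^15 bytes, 1.44 MB = 1,440,000 bytes) and a world population of roughly 6.9 billion, which is an assumed figure rather than one taken from the post; the volume estimate is left out because it depends on the exact floppy dimensions used.

```python
# Rough sketch of the floppy count, using decimal units throughout.
TOTAL_BYTES = 120 * 10**15          # 120 PB
FLOPPY_BYTES = 1_440_000            # "1.44 MB" taken at face value
WORLD_POPULATION = 6_900_000_000    # assumed approximate figure

def floppies_needed(total_bytes, floppy_bytes=FLOPPY_BYTES):
    """Whole floppies filled by total_bytes (rounded down, as in the post)."""
    return total_bytes // floppy_bytes

disks = floppies_needed(TOTAL_BYTES)
print(f"{disks:,} floppies")
print(f"about {disks / WORLD_POPULATION:.0f} per person")
```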

      --

      I read the internet for the articles.
    5. Re:Not so impressive as a floppy RAID by Anonymous Coward · · Score: 0

      floppy... disk...?

      What's that?

  32. The important question... by Anonymous Coward · · Score: 0

    What's the failures per minute estimation? How many full time hard-drive replacers will they need?

  33. fsck time will span 2 presidential cadences by Anonymous Coward · · Score: 0

    At least the data on this monster will be totally safe.
    The fsck alone will be started at each new president inauguration,
    and nobody will have access to the data for the next 8 years.

    In addition, to approve financing for the outside storage tapes,
    Congress will need to increase the debt limit again.

  34. Rainbow tables by Anonymous Coward · · Score: 0

    The open rainbow tables project announces....120PB of tables :)

    It's them or SETI has come back.

  35. lots of aluminum by buback · · Score: 1

    Someone should manufacture industrial sized hard drives for this type of application. Like full height x2, so you could cram 30 platters in there.

    1. Re:lots of aluminum by BetterSense · · Score: 1

      It's not as straightforward as that, because current multi-platter hard drives have all the read heads attached to the same "tonearm" (I don't know the proper term). So even with a 4-platter drive, you can only read 1 platter at a time, I assume, unless they somehow sync the platters together. With a 30-platter drive, your throughput would be much worse than with 10 3-platter drives, because you would have 10 times the usable read-heads at any given moment.

    2. Re:lots of aluminum by SuperQ · · Score: 1

      Yup, IO rates are far more important than space for datasets of this size. As some other posts mentioned, they're probably using 2.5" enterprise drives instead of 3.5". If they were using 3.5" 2T drives the storage space would be more like 350P not 120P.

    3. Re:lots of aluminum by buback · · Score: 1

      that's kinda odd. i wonder why they don't spread the data out over each platter, like a raid array. ie read 8 bits off each platter to get one 32 bit block.

      That would make IO speeds faster too.

  36. IBM goes to work for fascists again by Anonymous Coward · · Score: 0

    IBM once made the punch card machines that made it easier for the Nazi party to round up the Jews. This sounds too familiar.

  37. That's only 400MB for every US American by Dr.+Spork · · Score: 1

    If this were for an American spy agency, maybe that would be enough. But when I think about how I have ten times this much data in my Gmail, and that Gmail isn't limited to only the US, I suspect that Google has a lot more storage space than this. Of course it's probably all very decentralized.

    1. Re:That's only 400MB for every US American by guruevi · · Score: 1

      Not everybody has more than 400MB in their e-mail account and a LOT of that can be compressed or de-duplicated (spam). Google doesn't need THAT much. I think for ALL their data they're probably close to 100 PB which is again, not all that impressive these days. Of course they have it redundantly in every data center so their capacity is much larger.

      From a scientific standpoint, this would be capable of storing video of everything a person has seen in his life or when running a simulation of the Universe, store all the properties of the Universe in the first few seconds after the big bang.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    2. Re:That's only 400MB for every US American by walshy007 · · Score: 1

      or when running a simulation of the Universe, store all the properties of the Universe in the first few seconds after the big bang.

      Something tells me you haven't thought this all the way through. To store _all_ the properties of the universe at a single point in time (yes I know things with time get very hairy and this itself is a simplification) _without_ abstraction you would need something more complex than the entirety of the universe, good luck with that.

    3. Re:That's only 400MB for every US American by Anonymous Coward · · Score: 0

      how much for every non-us american?

  38. I know who this is for by Anonymous Coward · · Score: 0

    It's going in one of Wal-Mart's data centers.

  39. Backups? by Anonymous Coward · · Score: 0

    So where/how do you back up something so massive? Would you have 100,000 hard drives backing up the other 100,000 drives, or build another 200,000 drive array off-site?

    1. Re:Backups? by stderr_dk · · Score: 1

      How exactly do you backup 120PB?

      Easy, you just buy one of these clusters... (for arbitrary values of "easy" and "just")

      What made you think this was the primary storage?

      --
      alias sudo="echo make it yourself #" ; # https://pipedot.org/~stderr & http://soylentnews.org/~stderr
  40. NOPE automated hard drive replacement by Anonymous Coward · · Score: 0

    the drives replacements will slide onto a conveyor and like the matrix when you are of no use will be dropped down a tube and recycled , the new drive slides along and a robot puts it all together.

  41. That's nothing by Anonymous Coward · · Score: 0

    You should see the size of the tape backups

  42. If it weren't for those meddling disk manufacturer by guruevi · · Score: 1

    It would be 122PB. 2PB lost on bad marketing. Gimme my 1024 bytes back. But all-in-all this isn't that surprising. You can get 1PB in a 42U rack these days.

    As a fun side note: You'll also need 122PB of tape storage (or 1.5 systems like this) just for backups. That's a lot of tape.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  43. I'm more interested in their backup solution by Anonymous Coward · · Score: 0

    Surely it must be a Beowulf cluster of those.

  44. Not so impressive in the cloud era by Anonymous Coward · · Score: 0

    50,000 machines at 4 drives per machine. Several cloud sites out there now are larger than this..

  45. Failure Rate by djl4570 · · Score: 1

    If the mean time between failure of a hard drive is around two hundred thousand hours, and this disk garden has two hundred thousand drives won't the technicians be replacing a drive every hour or so? Don't believe the MTBF figures from the drive manufacturers. Those appear to be butt numbers. http://www.pcworld.com/article/129558/study_hard_drive_failure_rates_much_higher_than_makers_estimate.html .... Two hundred thousand water cooled hard drives? How much does this fucking thing weigh? Allowing for half a pint of coolant in the pipes for each hard drive the figure comes out to over one hundred thousand pounds. That doesn't count the plumbing and coolant distribution system and heat exchanger. .... Two hundred thousand drives with plumbing for water cooling will take up a healthy sized volume. The drives alone require on the order of four million cubic inches. I have to wonder if this is a proof of concept storage array for DARPA on behalf of an alphabet agency that needs a place to park all their spy photos and sigint.

    1. Re:Failure Rate by Anonymous Coward · · Score: 0

      Assuming the array is brought online with the full number of disks, shouldn't they expect a pretty good run for the first 6 months or so while the drives are new and the survival probability of each drive is closest to 1? Then start getting overwhelmed with failures as the drives start to get older.

      Would depend most on how heavily the array is used, and (if there's heavy reading) how full it is.

  46. Someday, we will carry these in our pockets by NicknamesAreStupid · · Score: 1

    Hard to imagine? Yes. But forty years ago, the largest computing center on earth had 57GB of disc storage.

  47. Failure rate? by erice · · Score: 1

    We know the capacity. We know the transfer rate. But how quickly do disks need to be moved in and out of the system in order to keep it running?

    200,000 is a lot of disks. I assume they are all hot swap with a great deal of redundancy because I would expect multiple drive failures every day. A raid0 with that many disks might never boot.

    1. Re:Failure rate? by lopaka1998 · · Score: 1

      I was thinking the same thing. A whole man's job could be to replace these disks. He'd probably always be busy, too.

      Estimating that a single drive fails in approximately 7 years, it is estimated that at or around that time and from then on you would have 28,571.429 drives failing per year. This comes to 78.277888 drives per day that would need to be replaced to maintain the system. And that's counting on it being a raid system that can restore itself.

    2. Re:Failure rate? by Bucky24 · · Score: 1

      More than one man. I imagine that any organization with the money to actually pay for something like this could afford a crew of a few dozen to maintain it.

      --
      All the world's a CPU, and all the men and women merely AI agents
  48. It will be used for... by Anonymous Coward · · Score: 0

    skynet

  49. Post The Failure Rates by Manufacturer by Anonymous Coward · · Score: 0

    I'd assume they'll be sourcing drives from all manufacturers so it would be very useful if they would post the failure rates for each manufacturer.

  50. MTBF Question by skelly33 · · Score: 1

    Just curious if anyone has experience managing large, mechanical disk arrays, if you installed an array of such a size using identical hard drives and bringing everything online relatively at the same time, would there be an increased likelihood of ALL the drives dying at roughly the same time? Could failure statistics bite you with enough simultaneous failures to negate redundancy?

    1. Re:MTBF Question by SuperQ · · Score: 1

      I do manage large storage farms in the petabytes range. There is a curve to the rate at which disks die. It mostly seems kinda obvious.

      #1 - Infant mortality. I see a bunch of drives fail within the first few months of a new install.
      #2 - Increased death rate as the drives age. Usually when the drives start to reach the warranty age. This can be accelerated depending on the IO load of the system.

      There's a lot of great info out there. Here's one good whitepaper:
      http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/disk_failures.pdf

    2. Re:MTBF Question by ImprovOmega · · Score: 1

      In my reliability class in college they called it a "bathtub curve" with burn-in, useful life, and end of life phases. Most MTBF docs are built around describing the useful life phase. All bets are off for the first couple weeks and anything post-warranty.

  51. Backups? by Anonymous Coward · · Score: 0

    How exactly do you backup 120PB?

  52. Re:big brother by TaoPhoenix · · Score: 1

    I'll give you credit for "this used to be true" back in the day when a computer was a 486 on a modem. It's absolutely not true any more.

    Govt is Big Brother, and they Like it. And they absolutely have the resources to do it.

    Why? Because all they need to do is a Red Flag system. Joe Average doesn't really produce that much data per day all by himself, and .gov isn't trying to perfectly reproduce the entire activity. They just need to know if something is getting juicy.

    "Look! Here's a 12 Gig file of Joe's activity for the month! Control-F and search for the words "Music" and "Movie" and "Copy".

    Lights out.

    The part you are glossing over is how much help they are getting from nice Corps. ISPs, Telecoms, Facebook, and Google.

    So to play the "nah, don't worry" line is completely misleading.

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
  53. imagine the brown out! by Lead+Butthead · · Score: 1

    Can you just imagine the brown out when they power up the drive farm?
    In practice they would be doing sequential spin up. I do, however, wonder how long it would take to sequentially spin up 200k drives.

    --
    ELOI, ELOI, LAMA SABACHTHANI!?
  54. 600GB drives? by Lead+Butthead · · Score: 1

    based on 120 million GB and 200k drives, the per-drive capacity works out to 600GB a piece. Sounded like they're stringing together a bunch of WD Velociraptors.

    --
    ELOI, ELOI, LAMA SABACHTHANI!?
  55. Don't be silly. by Oxford_Comma_Lover · · Score: 1

    What's it for? No surprise, domestic spying.

    I suspect just spying generally, including gathering information from non-spying sources, and including non-domestic. Why on earth would it be limited to domestic spying?

    --
    -- IANAL, this isn't legal advice, and definitely isn't legal advice for you. Also, Squee!
  56. C: by goombah99 · · Score: 1

    I don't know what it is for but I know the name of the drive is "C:"

    --
    Some drink at the fountain of knowledge. Others just gargle.
  57. The real question is by Anonymous Coward · · Score: 0

    How much is that terabyte number in neocortices?

  58. Way too big for... by cvtan · · Score: 1

    It's way too big for any governmental use, NASA, or weather-related supercomputing application. Must be for Facebook

    --
    Sorry, but gray text on gray background is making my eyes bleed.
  59. It's to be filled with by Normal+Dan · · Score: 1

    ASCII Pr0n

    --
    A unique way to learn a language: http://languageloom.com
  60. MTBF by devnullkac · · Score: 1

    This isn't a genuine statistical analysis, but a back-of-the-napkin calculation suggests that if they use hard drives with an MTBF around 3 years, they'll be replacing one drive every 7.5 minutes. If your employee can run fast, that's a 24/7 fulltime job.

    --
    What do you mean they cut the power? How can they cut the power, man? They're animals!
    1. Re:MTBF by ImprovOmega · · Score: 1

      Enterprise grade drives can be had with an MTBF of 2 million hours (during their useful life, realistically you need to replace them every 7 years or so regardless). That would put unexpected failures at one every 10 hours or 2-3 per day for an array this size, with a small army needed for "drive swap month" every 6-7 years.
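
      The napkin math in this sub-thread can be sketched as below. It assumes a constant failure rate and independent drives (so the array as a whole fails roughly MTBF / N hours apart), which ignores the burn-in and wear-out phases mentioned elsewhere in the discussion; the helper name is made up for illustration.

```python
# Rough sketch: expected gap between drive failures in a large array.
# Assumes a constant per-drive failure rate and independent failures,
# i.e. only the "useful life" part of the bathtub curve.

HOURS_PER_YEAR = 24 * 365
DRIVES = 200_000

def hours_between_failures(mtbf_hours, drives):
    """Average hours between failures anywhere in the array."""
    return mtbf_hours / drives

scenarios = [
    ("3-year MTBF", 3 * HOURS_PER_YEAR),
    ("200,000 hour MTBF", 200_000),
    ("2,000,000 hour MTBF", 2_000_000),
]
for label, mtbf in scenarios:
    gap = hours_between_failures(mtbf, DRIVES)
    print(f"{label}: one failure every {gap * 60:.1f} minutes, "
          f"{24 / gap:.1f} per day")
```

      The three scenarios correspond to the figures quoted in this thread: a few minutes between swaps at a 3-year MTBF, about one per hour at 200,000 hours, and one every 10 hours at 2 million hours.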

  61. LoC? by cashman73 · · Score: 1

    Assuming that you could get 50 Libraries of Congress onto a single petabyte drive, you ought to be able to get 6,000 Libraries of Congress onto one of these 120 petabyte arrays,. . .

  62. Imagine a Beowulf cluster of these! by Iamthecheese · · Score: 1

    I know of at least three companies (Apple, Google, Microsoft) just off the top of my head rich enough and ballsy enough to try AI on a scale that's never been tried before. I'm crossing my fingers.

    --
    If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
  63. There's already a unit of measure for this. by symbolset · · Score: 1

    the "milligoogle"

    --
    Help stamp out iliturcy.
  64. OMFG.... Killer floppy pile of death!!! by Anonymous Coward · · Score: 0

    If they could make a 120PB cluster using floppy disks, I would be much more entertained by this.

    Well, of course, they already did the floppy RAID. But it didn't quite get to 120 petabytes.

    It's worthwhile considering how much space would be required simply to *store* all those floppies in their boxes. This article calculates how many 1.44MB floppies a "modest" 1.5 terabyte hard drive would require to back up. It's about a million, obviously. Which would require a near-cube-shaped stack around 3 metres on each side (see footnote 3)- that's a cubic pile just under twice the height of an average man, simply to back up the contents of an unremarkable 1.5 terabyte hard drive.

    120 petabytes is around 80,000 times bigger than *that*.

    Our cube would have to be 80,000 ^ (1/3) = 43 times higher than 3 metres, that's a cube around 129 metres high!!!

    The amount of space that would be required if punched paper tape was used instead is left as an exercise to the reader.
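
    For anyone who wants to check the floppy numbers, here is a rough sketch of the calculation. It treats "1.44MB" as 1.44 million bytes and assumes a bare 3.5-inch disk measures about 90 x 94 x 3.3 mm; both are assumptions of this sketch, not figures from the linked article. It also ignores any packing gaps.

```python
# Assumed figures (not from the linked article):
FLOPPY_BYTES = 1.44e6                   # "1.44MB" taken as decimal megabytes
FLOPPY_VOLUME_M3 = 0.090 * 0.094 * 0.0033  # bare 3.5" disk, no box

TB = 1e12
PB = 1e15


def cube_side_m(capacity_bytes: float) -> float:
    """Side of a solid cube of floppies holding capacity_bytes."""
    floppies = capacity_bytes / FLOPPY_BYTES
    return (floppies * FLOPPY_VOLUME_M3) ** (1 / 3)


floppies_for_drive = 1.5 * TB / FLOPPY_BYTES  # roughly a million
ratio = (120 * PB) / (1.5 * TB)               # how much bigger 120 PB is
scale = ratio ** (1 / 3)                      # linear scale factor of the cube
side_from_scaling = 3 * scale                 # scaling the 3 metre cube

print(f"floppies for 1.5 TB: {floppies_for_drive:,.0f}")
print(f"cube side for 1.5 TB: {cube_side_m(1.5 * TB):.2f} m")
print(f"capacity ratio: {ratio:,.0f}, linear scale: {scale:.1f}")
print(f"cube side for 120 PB: {side_from_scaling:.0f} m")
```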

    1. Re:OMFG.... Killer floppy pile of death!!! by Anonymous Coward · · Score: 0

      Edit: the calculations above are for the floppy discs *without* their boxes, not with them, as was incorrectly stated. Which makes it even worse!

  65. it's for my torrent server... by Anonymous Coward · · Score: 0

    OKAY, cat's out of the bag on this one - I'm the customer and it's for my
    torrent server (which BTW has no infringing content stored on it) ...

  66. How things change! by spaceyhackerlady · · Score: 1

    A company I used to work for were in the process of developing some new products when I started. They were very good with lasers, since an early product of theirs was a storage device that stored a terabyte of data on optical media. At the time (early 1990s) the market for storing such vast quantities of data was limited. Worse, the main customers were three-letter agencies who had security concerns about buying from a non-U.S. company, so they sold the product line to a U.S. company and went on to other things.

    They proceeded to develop some good stuff, but also developed some real crap, and supported it all miserably. Along the way they fucked me over badly. They were subsequently bought out and asset-stripped. I'm still here. They're not. Serves the bastards right... :-)

    ...laura

  67. What to do with 120PB of storage? by Anonymous Coward · · Score: 0

    Ponies... as far as the eye can see...

  68. Which storage product? by Anonymous Coward · · Score: 0

    The article is missing a key piece of information: which storage product did they use in this GPFS cluster?
    SONAS?
    XIV?
    DS? which model?
    other?

  69. mv /dev/zero /mnt/petadrive by Archwyrm · · Score: 1

    Finally a place to store all my zeros!

    --
    Fascism should more properly be called corporatism because it is the merger of state and corporate power. -- Mussolini
  70. DNA data storage by Anonymous Coward · · Score: 0

    It's for Jurassic Park

  71. GPFS by Anonymous Coward · · Score: 0

    Isn't that one less than HPFS?

  72. So? by DreadfulGrape · · Score: 1

    MS Word will still take 20 seconds to launch.

    --
    sig has been sent away for a few small repairs...
  73. Fail rate by wesleyjconnor · · Score: 1

    Given the fail rate on hard drives, replacing these would be a full time job. No?

  74. What about drive failures? by Anonymous Coward · · Score: 0

    At the rate that IBM SAN disks fail they'll have to employ one or more people full time to replace failed drives.

  75. Re:If it weren't for those meddling disk manufactu by Vegemeister · · Score: 1

    Actually, 120PiB is 135 PB.
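
    A quick sketch of that conversion, assuming the usual definitions (1 PiB = 2^50 bytes, 1 PB = 10^15 bytes); the function name is just for illustration:

```python
PIB_BYTES = 2**50   # binary pebibyte
PB_BYTES = 10**15   # decimal petabyte


def pib_to_pb(pib: float) -> float:
    """Convert pebibytes to decimal petabytes."""
    return pib * PIB_BYTES / PB_BYTES


print(f"120 PiB = {pib_to_pb(120):.1f} PB")
```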

  76. Microcluster system by marcuz · · Score: 1

    I am not impressed, it's tiny. Do you know how much capacity our BitTorrent network has, folks? Imagine a Beowulf cluster of these... and you've got it.

  77. obvious use.... by Anonymous Coward · · Score: 0

    Obviously it will be used by people wanting to download the internet.