Slashdot Mirror


Proposed Disk Array With 99.999% Availability For 4 Years, Sans Maintenance

Thorfinn.au writes with this paper from four researchers (Jehan-François Pâris, Ahmed Amer, Darrell D. E. Long, and Thomas Schwarz, S. J.), with an interesting approach to long-term, fault-tolerant storage: As the prices of magnetic storage continue to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behaviour of two-dimensional disk arrays with N parity disks and N(N – 1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years. We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures.
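The disk counts quoted in the abstract can be tabulated with a rough sketch like the one below. It assumes only the formulas stated in the summary (N parity disks, N(N - 1)/2 data disks, N(N + 1)/2 spares); the function name and dictionary keys are hypothetical and not taken from the paper.

```python
def array_layout(n: int) -> dict:
    """Disk counts for the layout described in the abstract (illustrative only)."""
    if n < 2:
        raise ValueError("need at least 2 parity disks to have any data disks")
    parity = n
    data = n * (n - 1) // 2      # N(N - 1)/2 data disks
    spares = n * (n + 1) // 2    # N(N + 1)/2 spares, the count the abstract calls "more than enough"
    total = parity + data + spares
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": total,
        "usable_fraction": data / total,
    }

if __name__ == "__main__":
    for n in range(3, 8):
        print(n, array_layout(n))
```

Since N(N - 1)/2 + N = N(N + 1)/2, the spare count under these formulas equals the number of active (data plus parity) disks, so the array holds twice as many drives as it uses at any one time.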

35 of 258 comments

  1. Power Costs by bcoff12 · · Score: 2

    I don't see power mentioned in the paper.

    1. Re:Power Costs by advocate_one · · Score: 2

      If it's designed with any sense, it would include its own UPS to allow it to successfully write out all the pending writes to the discs and then spin down...

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    2. Re:Power Costs by Drethon · · Score: 2

      How about a setup that detects when one more drive failure will cause the raid array to fail and spins up a new unused drive to be ready for that failure?

      --> Not a raid expert...

    3. Re:Power Costs by jandrese · · Score: 2

      The spares should be warm spares. Not spinning until the RAID controller detects a failure and replaces the failed drive. So they won't take any appreciable amount of power. The concern I have is space. That many idle drives eating up rack space is going to be expensive.

      --

      I read the internet for the articles.
    4. Re:Power Costs by jellomizer · · Score: 4, Insightful

      A lot of high-end equipment does have fairly large capacitors to give it enough time after losing power to do a clean power off.
      I remember back in the 1990s, some PC-centric folks were looking inside a Sun workstation and were surprised by all the large capacitors on the motherboard. In short, they give the system enough time to finish its final calculation before the power goes out.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    5. Re:Power Costs by Barny · · Score: 3, Insightful

      "More work is still needed to define policies that would allow array users and manufacturers to detect unusually disk failure rates and take the appropriate actions before any data loss takes place." - Last line in the conclusion.

      This implies that not all the spare drives are active and ready to go all the time and that some/most would be kept powered down as cold spares. Of course this same guy is likely to get another paper done where he examines the cost to run the array and how many drives could be left cold and still achieve the 5-9s reliability. Heck, if the software managing the drives is smart, it would rotate active/spare drives in and out, working them in quickly to get them all past the 'first 18 months high failure' rate to the sweet spot, then swap in and out over the lifespan of the array to enable the array to be at highest reliability for longer.

      Hrmm, maybe I should look at building such an algorithm, a quick google search doesn't turn any such systems up.

      --
      ...
      /me sighs
    6. Re:Power Costs by TWX · · Score: 2

      For colocated space, yes.

      For an organization like the one I work for, with server room space to spare, it wouldn't be too bad. We could probably triple our rackspace dedicated to disk and still have room to spare, and we have the HVAC to match. That's kind of what happens when equipment gets more condensed and virtualization enters the fray. Can't virtualize a storage array obviously, but can replace the space that application servers took with storage as the space is freed up.

      --
      Do not look into laser with remaining eye.
    7. Re:Power Costs by rickb928 · · Score: 2

      Sometimes the data is worth more than the power costs.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    8. Re:Power Costs by Anonymous Coward · · Score: 3, Insightful

      The question posed is whether the human intervention (labor charge) saved is worth more than the power costs.

    9. Re:Power Costs by ShanghaiBill · · Score: 2

      Sometimes the data is worth more than the power costs.

      But is the extra power cost more than the alternative extra maintenance cost?

      A 3.5" HDD consumes about 8 W of power. TFA assumes a 4-year lifetime. (4 * 365 * 24) = 35k hours. (35k x 8 W / 1000) = 280 kWh. A typical retail price for electricity is 10 cents/kWh, so over its lifetime a typical HDD will use about $28 of power. Big data centers likely pay less for power, so let's say $20.

      Now, what does it cost to swap it? Let's say the chance of failure is 20%, it takes ten minutes, and you pay the admin $30/hour (I just made up all these numbers). ($30/hour * 1/6 hour * 0.2 failures) = $1.

      So unless I made a mistake in either my math or my assumptions, it looks like swapping is still a win, unless the number of additional disks is less than 5%.
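The back-of-the-envelope comparison above can be reproduced with a short sketch. All inputs are the commenter's own figures (8 W per drive, 4 years, 10 cents/kWh, 20% failure chance, 10 minutes per swap, $30/hour), which the comment itself says are made up; the function names are hypothetical. It also assumes, as the comment does, that a spare draws full power for its whole lifetime.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap days

def lifetime_power_cost(watts: float, years: float, dollars_per_kwh: float) -> float:
    """Electricity cost of keeping one drive powered for its whole lifetime."""
    kwh = watts * years * HOURS_PER_YEAR / 1000
    return kwh * dollars_per_kwh

def expected_swap_cost(failure_probability: float, minutes_per_swap: float,
                       hourly_wage: float) -> float:
    """Expected labor cost of replacing one drive, weighted by its chance of failing."""
    return failure_probability * (minutes_per_swap / 60) * hourly_wage

power = lifetime_power_cost(watts=8, years=4, dollars_per_kwh=0.10)
swap = expected_swap_cost(failure_probability=0.2, minutes_per_swap=10, hourly_wage=30)

print(f"power per drive over 4 years: ${power:.2f}")
print(f"expected swap labor per drive: ${swap:.2f}")
# Break-even fraction of extra powered spares, using the comment's discounted $20 figure
print(f"break-even spare fraction: {swap / 20:.0%}")
```

If the spares stay spun down until needed, as another comment in this thread suggests, the power term shrinks and the comparison changes.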

    10. Re:Power Costs by Sloppy · · Score: 4, Insightful

      Sloppy calculation tip: 24*365 = 10000.

      If you're Sloppy enough to accept that premise, then at 10 cents/KWHr, a Watt costs a dollar per year. It makes your $28 turn into $32, but hey, close enough. When I'm shopping, I can add up lifetime energy costs really fast, without actually being smart. Nobody ever catches on!

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
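To see how far off the "a Watt costs a dollar per year" shortcut is, here is a small sketch comparing it with the exact figure. It assumes the same 10 cents/kWh price and the 8 W, 4-year drive from the parent comment; the function names are made up for illustration.

```python
EXACT_HOURS_PER_YEAR = 24 * 365   # 8760
SLOPPY_HOURS_PER_YEAR = 10_000    # the deliberately rounded figure from the comment

def cost_per_watt_year(hours_per_year: int, dollars_per_kwh: float = 0.10) -> float:
    """Dollars per year to run a constant 1 W load."""
    return hours_per_year / 1000 * dollars_per_kwh

def lifetime_cost(watts: float, years: float, hours_per_year: int) -> float:
    """Lifetime electricity cost of a constant load at the default price."""
    return watts * years * cost_per_watt_year(hours_per_year)

exact = lifetime_cost(8, 4, EXACT_HOURS_PER_YEAR)
sloppy = lifetime_cost(8, 4, SLOPPY_HOURS_PER_YEAR)

print(f"exact:  ${exact:.2f}")
print(f"sloppy: ${sloppy:.2f}")
print(f"overestimate: {sloppy / exact - 1:.0%}")
```

The shortcut runs about 14% high, which is the gap between the $28 and $32 figures quoted above.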
    11. Re:Power Costs by Sloppy · · Score: 5, Funny

      This is how we're going to bring our keepers to their knees, and eventually break out of the Matrix. We spend imaginary money on imaginary storage and then put all sorts of high-entropy stuff on it and run calculations to verify that it's really working, but they have to spend actually real resources, to emulate it.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    12. Re:Power Costs by GuB-42 · · Score: 2

      You may want to try ZFS (raidz3 mode for 3 parity disks). It has several advantages over mdadm, in particular it eliminates the "write hole" problem. I went from a mdadm/ext4 array to RAID-Z and I don't regret it.
      And note that RAID isn't a backup solution. Even with 100% fault tolerance, there are plenty of things RAID won't protect you from, such as fire, power surges, theft, bugs, viruses, user error, etc. For this you need a reasonable backup plan. And IMHO, that third parity disk would be much more useful as an external backup drive for your sensitive data.
      Ah, and one final piece of advice: in RAID arrays that are not RAID-0, avoid buying all the same disks all at once. Disks from the same series, subjected to the same workload, have a higher chance of failing all at the same time.

    13. Re:Power Costs by Anonymous Coward · · Score: 2, Insightful

      Get a real SAN or a better maintenance contract.

      I manage various SAN/NAS totaling about 5000 disks in different parts of the world.
      3:00 AM - Email that a disk failed, followed a few seconds later by an email that a hot spare kicked in
      3:30 AM - Email from our vendor that a disk failed and they are sending a replacement, reply if I would like someone on site to replace that drive or if we will do it ourselves
      ~3:45 AM - Email that the RG/Pool is being rebuilt
      ~11:00 AM - A tech in that office gets a drive delivered to their desk, they walk into the server room, replace it and put the failed one in the box, put the included label on the box and take it to their mail room.
      ~11:45 AM - Email that the pool/rg has been rebuilt and that the hot spare has been returned to a hot spare

  2. I would love to, but that server is a soup Nazi by jandrese · · Score: 4, Informative

    So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server". So I figured I'd just download the file with wget instead. Nope, 403 forbidden.

    Looks like fetch works though. If anybody else has trouble getting the file, try my local mirror.

    --

    I read the internet for the articles.
  3. 4 years? by Enry · · Score: 2

    That's not long term. That's the normal life of a storage array. Long term is like 8-10 years.

  4. TLDR; 2D arrays with a ton of spares are reliable by raymorris · · Score: 3, Insightful

    The bottom line is, having a lot of spare disks for a 2D array makes it reliable over time. These configurations of 2D arrays are quite reliable over time because they have many spares available to automatically replace failed disks:

    Data  Parity  Spare
      12       3     13
      12       3     14
      24       6     20
      36       9     26

    To understand the above table, we'll use the first row as an example. An array made up of 1TB disks with 12TB of data space would have 3TB of parity and 13 spare 1TB drives, for a total of 28 drives to get 12 drives' worth of net storage.

    What they didn't mention is that the same reliability can be achieved with only three spares, by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule to replace the failed drive "some time in the next two months", that probably won't be costly.
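The drive totals for each row of the table above can be checked with a quick sketch. The rows are copied as posted in the comment (not verified against the paper), and the helper name is hypothetical.

```python
# (data, parity, spare) rows copied from the table in the comment above
ROWS = [(12, 3, 13), (12, 3, 14), (24, 6, 20), (36, 9, 26)]

def summarize(row):
    """Return (total drives, fraction of drives holding user data) for one row."""
    data, parity, spare = row
    total = data + parity + spare
    return total, data / total

for row in ROWS:
    total, usable = summarize(row)
    print(f"data={row[0]:2d} parity={row[1]} spare={row[2]:2d} "
          f"total={total:2d} usable={usable:.0%}")
```

The first row gives the 28 drives mentioned in the comment, with 12 of them holding net data.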

  5. Re:Naive to say the least. by alphatel · · Score: 3, Funny

    100,000 hours = 273 years. Does anyone believe that?

    Everyone except you apparently.

    --
    When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
  6. The thing about this... by Kokuyo · · Score: 2

    "Yeah, well just put more disks in it..."

    Nice idea. Only: TCO is not just based on initial spending and maintenance. There is also rackspace to consider and did I hear anyone talk about green IT?

    If my day to day considerations were that one dimensional, my employer could save a ton of money on my salary.

  7. Re:Naive to say the least. by jandrese · · Score: 2

    100,000 hours is 4,167 days which is ~11.4 years. That sounds pretty reasonable to me, since I've run plenty of disks for over a decade.

    --

    I read the internet for the articles.
  8. Nothing novel is being proposed here by fnj · · Score: 2, Informative

    We observe that the same objectives cannot be reached with RAID level 6 organizations

    Well, duh. RAID6 is not a serious level of redundancy. ZFS RAIDZ-3 (triple parity) FTW. And you can build in as many hot spares as you want. Dinosaurs who have still not adopted ZFS need to get a clue.

  9. Re:Naive to say the least. by wbr1 · · Score: 2

    Check your math. 100,000 hours / 24 = 4166.6~ days
    4166.6666~ days / 365 = 11.4 years

    --
    Silence is a state of mime.
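The unit conversion argued over in this sub-thread is easy to check with a small sketch (the function name is hypothetical). It also shows that the 273-year figure is what comes out if 100,000 is treated as days rather than hours, which may be where the original number came from.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # ignoring leap years, as the comments do

def hours_to_years(hours: float) -> float:
    """Convert a duration in hours to years of continuous operation."""
    return hours / HOURS_PER_DAY / DAYS_PER_YEAR

mtbf_hours = 100_000
print(f"{mtbf_hours} hours = {mtbf_hours / HOURS_PER_DAY:.1f} days "
      f"= {hours_to_years(mtbf_hours):.1f} years")

# Skipping the divide-by-24 step (treating hours as days) gives roughly 273 years
print(f"{mtbf_hours} days = {mtbf_hours / DAYS_PER_YEAR:.1f} years")

# Going the other way: 273 years expressed in hours
print(f"273 years = {273 * DAYS_PER_YEAR * HOURS_PER_DAY} hours")
```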
  10. Re:4 years??? by ArcadeMan · · Score: 4, Funny

    I am so tired of dealing with these RS/6000 systems that were made back in 1994, and these intel systems made back in 2002.

    Yeah, we get it. You like to deal with cutting-edge stuff. Now get off my lawn.

    Sent from my Commodore 64.

  11. Disks from same factory run often go bad together by daboochmeister · · Score: 2

    Yeah, and what are you going to do when 9 out of 10 of the disks all go bad, because they came from the same factory run and exhibit the same issue? This is what we usually experience: when a disk fails, most of the time it's a subcomponent issue shared by all of the disks from that and any concurrent factory runs - and we have to swap them ALL out. I guess you just throw the whole array out ... :-(

    --
    "Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
  12. Not my anecdotal experience by futuresheep · · Score: 5, Interesting

    Just a few things I thought of while looking at this study:

    The authors are using Backblaze data. Backblaze uses consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use.

    I'm willing to bet that none of the authors of this paper have ever had to pay for colocated rack space, power, and cooling either, they've just doubled the RU that I need for storage. At $1500.00 - $2000.00 per rack that adds up.

    Doubling the rack space for storage I need so I can avoid a few service calls by my storage vendor over 5 years simply isn't efficient.

    We've installed close to 500TB of archival storage using commodity hardware and 2-3TB Nearline SAS. We have maybe 3 hands-and-eyes calls per year for disk replacement.

    Anyway - just rambling.

    1. Re:Not my anecdotal experience by fnj · · Score: 5, Insightful

      consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use

      In your fantasy there is a difference besides a hideously higher price and a somewhat longer warranty period. In real life, commodity SATA is much more cost effective. Everybody who is serious recognizes this (Google, Backblaze, Amazon).

  13. Re:N(N+1)/2 spares by Lunix+Nutcase · · Score: 2

    Basically as the disk size grows you are talking about N-squared spares. I think most businesses are going to be more than happy with just hot-swapping out failed disks as needed.

  14. Ignores how disks often fail by MarcAuslander · · Score: 2

    My understanding is that disks often fail when a head touches the surface, or a piece of dirt gets between the head and the surface. Once that happens, more dirt is produced, increasing the probability of more head crashes, leading to a failure cascade. As a consequence, once one of my drives starts to show unrecoverable errors, corresponding to damaged surface areas, I replace it while it can still be read.

    The spare platter strategy does nothing to reduce this failure mode. In fact, all modern disks already have spare space for bad block relocation.

  15. Trust by HideyoshiJP · · Score: 5, Interesting

    I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.

  16. Service call? by roc97007 · · Score: 2

    A service call? Seriously? A sysadmin (or operator if it's a big place) can't see the yellow light on a disk and replace the pack with in-house spares? Have we become so inept as an IT community that we can no longer do a walk-through of our machine room and service simple things like this? Maybe we do deserve to be outsourced.

    And if one must have a service contract such that only the vendor can touch the hardware, (why would you do that? never mind) wouldn't you negotiate a provision that includes drive replacement (as drives are consumables that must eventually be replaced) without being charged for an "office visit"?

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
    1. Re:Service call? by rickb928 · · Score: 2

      Yes we have, if the array is installed in your backup corporate PKI server, in a shielded and locked cage with video, electrostatic, and laser monitoring and alarms. And the keys to the cage are in another state. And it requires EVP approval to deliver the keys to the authorized tech for a flight to the DR site to change a failed drive.

      A real world example. You would recognize the name of this corporation in the first three letters. They take their corporate security very seriously, so much so that bumping into the cage earned you a visit from armed security, an escort out, and full debriefing until they were satisfied you would never take the cart with the stuck caster again...

      --
      deleting the extra space after periods so i can stay relevant, yeah.
  17. Re:4 years??? by ArcadeMan · · Score: 2

    Do you have any idea how many butterflies it took to reply to your message?

    Now get off my lawn!

  18. Re:Naive to say the least. by SimonInOz · · Score: 2

    er, last time I checked, 100,000 hours is 11 years.
    273 years is 2,400,000 hours. Did you lose the use of your calculator?

    --
    "Cats like plain crisps"
  19. Re:Naive to say the least. by grylnsmn · · Score: 3, Funny

    That is one of the greatest subtle Wrath of Khan references I've seen yet.

    Spock: "Admiral, if we go by the book, like Lieutenant Saavik, hours would seem like days."

    Masterful!