Slashdot Mirror


Proposed Disk Array With 99.999% Availability For 4 Years, Sans Maintenance

Thorfinn.au writes with this paper from four researchers (Jehan-François Pâris, Ahmed Amer, Darrell D. E. Long, and Thomas Schwarz, S. J.), with an interesting approach to long-term, fault-tolerant storage: As the prices of magnetic storage continue to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behaviour of two-dimensional disk arrays with N parity disks and N(N – 1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years. We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures.
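
The disk counts quoted in the summary can be tabulated for a few values of N. This is only a sketch that takes the summary's formulas at face value (N parity disks, N(N – 1)/2 data disks, N(N + 1)/2 spares); the function name and the choice of N values are illustrative assumptions, not taken from the paper.

```python
def array_layout(n: int) -> dict:
    """Disk counts for a given N, using the formulas quoted in the summary."""
    parity = n
    data = n * (n - 1) // 2      # N(N - 1)/2 data disks
    spares = n * (n + 1) // 2    # N(N + 1)/2 spare disks
    total = parity + data + spares
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": total,
        # share of all disks in the enclosure that hold user data
        "data_fraction": data / total,
    }


if __name__ == "__main__":
    for n in (3, 5, 8):  # arbitrary sample sizes, chosen for illustration
        print(n, array_layout(n))
```

Under these formulas the spares alone equal the parity and data disks combined, so the data fraction is always below one half.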

258 comments

  1. Power Costs by bcoff12 · · Score: 2

    I don't see power mentioned in the paper.

    1. Re:Power Costs by advocate_one · · Score: 2

      with any sense it would include its own UPS to allow it to successfully write out to the discs all the pending writes and then spin down...

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    2. Re:Power Costs by Drethon · · Score: 2

      How about a setup that detects when one more drive failure will cause the raid array to fail and spins up a new unused drive to be ready for that failure?

      --> Not a raid expert...

    3. Re:Power Costs by Enry · · Score: 1

      IIRC XFS/SGIs had this built in that there was just enough juice to flush buffers to disk while everything was spinning down.

    4. Re:Power Costs by Anonymous Coward · · Score: 0

      I would think they would be low, since they shouldn't need to spin up the hot spares. This isn't useful for me, at home, because I can just swap the drives out myself, but for a small business that relies on paying pc technicians to come out and fix anything that goes wrong, this doesn't sound bad. For me, I would like to see mdadm support 3 parity drives, though I haven't researched why you can't. I'm running RAID6 on 8 3tb drives right now, and I plan to expand. I really have to think about how many drives I can comfortably put in my RAID6 before 2 drive fault tolerance is not enough and I have to build another RAID. I'm thinking about RAID10, but it's still pricey to do and my data isn't really mission critical.

    5. Re:Power Costs by jandrese · · Score: 2

      The spares should be warm spares. Not spinning until the RAID controller detects a failure and replaces the failed drive. So they won't take any appreciable amount of power. The concern I have is space. That many idle drives eating up rack space is going to be expensive.

      --

      I read the internet for the articles.
    6. Re:Power Costs by jellomizer · · Score: 4, Insightful

      A lot of high-end equipment does have fairly large capacitors to hold power long enough to do a clean power off.
      I remember back in the 1990s some PC-centric folks were looking inside a Sun workstation, and they were surprised by all the large capacitors on the motherboard. In short, it gives the system enough time to finish its final calculation before the power goes out.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    7. Re:Power Costs by Anonymous Coward · · Score: 1

      with any sense it would include its own UPS to allow it to successfully write out to the discs all the pending writes and then spin down...

      Though you make a good point, I think bcoff12 means the potential power consumption of such a large disk array. Over the lifetime proposed, that could be significant enough to offset the benefit of high availability over other solutions plus regular backup.

    8. Re:Power Costs by Barny · · Score: 3, Insightful

      "More work is still needed to define policies that would allow array users and manufacturers to detect unusually disk failure rates and take the appropriate actions before any data loss takes place." - Last line in the conclusion.

      This implies that not all the spare drives are active and ready to go all the time and that some/most would be kept powered down as cold spares. Of course this same guy is likely to get another paper done where he examines the cost to run the array and how many drives could be left cold and still achieve the 5-9s reliability. Heck, if the software managing the drives is smart, it would rotate active/spare drives in and out, working them in quickly to get them all past the 'first 18 months high failure' rate to the sweet spot, then swap in and out over the lifespan of the array to enable the array to be at highest reliability for longer.

      Hrmm, maybe I should look at building such an algorithm, a quick google search doesn't turn any such systems up.

      --
      ...
      /me sighs
    9. Re:Power Costs by TWX · · Score: 2

      For colocated space, yes.

      For an organization like the one I work for, with server room space to spare, it wouldn't be too bad. We could probably triple our rackspace dedicated to disk and still have room to spare, and we have the HVAC to match. That's kind of what happens when equipment gets more condensed and virtualization enters the fray. Can't virtualize a storage array obviously, but can replace the space that application servers took with storage as the space is freed up.

      --
      Do not look into laser with remaining eye.
    10. Re:Power Costs by Anonymous Coward · · Score: 0

      If you have a flood or fire you will not be able to swap out a drive and magically get your data back. You need to come up with a more robust solution, or else you will learn in a way that you will not like.

    11. Re:Power Costs by NatasRevol · · Score: 1

      I have yet to meet a small business that would be happy to pay for what is essentially raid10+1 (N(N+1)/2).

      --
      There are two types of people in the world: Those who crave closure
    12. Re:Power Costs by Anonymous Coward · · Score: 1

      you could include standby spares which do not receive power until needed. broken disks can be powered down.

    13. Re:Power Costs by LWATCDR · · Score: 1

      Or how about having the array swap in spares.
      Every few weeks or so one of the spares could start to act as a mirror of an active drive and once that drive is mirrored you swap the active drive to the spare and the spare to the active?

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    14. Re:Power Costs by Anonymous Coward · · Score: 0

      you can't virtualize storage?

    15. Re:Power Costs by silas_moeckel · · Score: 1

      Well, since they are not supposed to need to be hot swap, you can get 12+ drives into a 1ru chassis with redundant power and a fairly beefy server. That is 3x the density of the traditional 4 up front 1ru. Expanding to 2ru gives 12 hot swap 3.5's or 24 2.5's, so still 2x the density in 3.5's for non hot swap. Potentially even higher with 2.5's, though the highest I find is 88 hot swaps in a 4ru, or 22 per ru, coupled with a rather beefy server.

      --
      No sir I dont like it.
    16. Re:Power Costs by Immerman · · Score: 1

      How do you figure? I mean sure, presumably the spares would be inactive until a replacement was needed, to save both power and wear and tear, but how do you figure that that is an implication of needing to detect anomalous failure rates to avoid data loss? No matter what strategy you're using, if you've got N-nines projected reliability over Y years assuming normal failure rates, then if you're suffering from anomalously high failure rates you're going to need to replace some drives early to maintain the same reliability for the full Y years.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    17. Re:Power Costs by DigiShaman · · Score: 1

      You can virtualize and abstract out to your heart's content. What TWX said was simple; at some point at the end of the day, all that data has to be stored on physical media. That takes physical rack space.

      --
      Life is not for the lazy.
    18. Re:Power Costs by rickb928 · · Score: 2

      Sometimes the data is worth more than the power costs.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    19. Re:Power Costs by mlts · · Score: 1

      Cooling costs come to mind as well. SSDs are one thing, as they can be powered off and not used. However, HDDs have to be either spinning (which creates a lot of heat, especially at 10k+ RPMs that enterprise disks spin at), or spun up/down, and spinning enterprise disks up and down isn't good for them, and might even cause array faults unless the array firmware is designed to deal with it.

      There is also expense. If I have five hard disks worth of data, I need (5*4)/2, or ten HDDs by the OP's metrics. However, I've had batches of hard drives all fail at once. If I get multiple failures, even RAID 6 isn't going to help. If HDDs popped at random times, I might be OK, but not in this case.

      Of course, I've ranted about this before... RAID is solid for protecting data against disk failure... but that is just one of -many- failure scenarios. I have seen disk controllers fail and write garbage to the entire array. One goober doing an rm or a dd command will toss the array. If you want serious backups, you need to not just focus on disk. Tape isn't perfect, but done right, after the initial cost of the drive, the cartridges are inexpensive, take zero watts (other than climate control), last decades, have innate encryption (LTO-4 and newer), and can have hardware write protect enabled, as well as WORM media. This is great for people with the "keep it forever" mindset. Just set a password [1], stream the data off to a pile of WORM tapes, and stuff those in a closet somewhere. If the tapes vanish, since they were encrypted, and assuming only a few people have the password, it can be written off as "just" a hardware loss.

      [1]: It is boneheadedly easy to set encryption on LTO media via SPIN/SPOUT, so might as well set something, even if it is a variant of "correct horse battery staple". Ideally, the password should change every year or so... but just setting -something- is better than nothing.

    20. Re:Power Costs by rickb928 · · Score: 1

      (you can't virtualize the actual disks)

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    21. Re:Power Costs by K.+S.+Kyosuke · · Score: 1

      Either that, or the usual economies of scale would apply (clever block allocation and low-power large cache electronics to increase performance while decreasing energy costs per data transaction).

      --
      Ezekiel 23:20
    22. Re:Power Costs by Anonymous Coward · · Score: 0

      that's a dumb argument. i could s/storage/cpu and return the argument, but we're stipulating you can
      virtualize cpu. generally one can save a lot of space by thin provisioning and deduplication.
      it's essentially the same idea as cpu virtualization, except it's virtual space, and not virtual cycles.
      i.e. if a cycle/block is not available when you're not looking for it, it doesn't matter if it exists or not.

    23. Re:Power Costs by rickb928 · · Score: 1

      It seems that one assumption in the study is predictable or consistent failure rates or timing. This would make sense if the drives were all the same make/model/manufacturing dates, but if not, well, then the model changes and they would be needing more intelligence to deal with unpredictable failure rates and having to spin up cold spares at different rates, predicting failure.

      Which all makes a world of sense to me. When I hovered over Raid 5 arrays with cold spares, especially in NetWare servers where 'device deactivated due to non-media defect' errors were not uncommon, I would add spares to save on windshield time to swap them out. Not all customers were comfortable going to the supply locker, grabbing a drive tray, and swapping out the tray with the flashing red light.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    24. Re:Power Costs by Anonymous Coward · · Score: 3, Insightful

      The question posed is whether the human intervention (labor charge) saved is worth more than the power costs.

    25. Re:Power Costs by ShanghaiBill · · Score: 2

      Sometimes the data is worth more than the power costs.

      But is the extra power cost more than the alternative extra maintenance cost?

      A 3.5" HDD consumes about 8w of power. TFA assumes a 4 year lifetime. (4 * 365 * 24) = 35k hours. (35k x 8w / 1000) = 280 kwHr. A typical retail price for electricity is 10 cents/kwHr, so over its lifetime a typical HDD will use about $28 of power. Big data centers likely pay less for power, so lets say $20.

      Now, what does it cost to swap it? Let's say the chance of failure is 20%, it takes ten minutes, and you pay the admin $30/hour (I just made up all these numbers). ($30/hour * 1/6 hour * 0.2 failures) = $1.

      So unless I made a mistake in either my math or my assumptions, it looks like swapping is still a win, unless the number of additional disks is less than 5%.
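
      The arithmetic above can be replayed in a few lines. This is only a sketch using the figures from this comment (8 W, 4 years, 10 cents/kWh, a 20% failure chance, a 10-minute swap at $30/hour, and the rounded $20 data-center power cost); the function and variable names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def lifetime_power_cost(watts, years, dollars_per_kwh):
    """Electricity cost of keeping one drive powered for its whole lifetime."""
    kwh = watts * years * HOURS_PER_YEAR / 1000
    return kwh * dollars_per_kwh


def expected_swap_cost(failure_prob, swap_minutes, hourly_rate):
    """Expected labor cost per drive: chance of a failure times cost of one swap."""
    return failure_prob * (swap_minutes / 60) * hourly_rate


power_cost = lifetime_power_cost(watts=8, years=4, dollars_per_kwh=0.10)
swap_cost = expected_swap_cost(failure_prob=0.20, swap_minutes=10, hourly_rate=30)

# Fraction of extra (spare) disks at which spare power and swap labor break even,
# using the rounded $20 data-center power figure from the comment.
break_even_fraction = swap_cost / 20.0

print(f"power per drive: ${power_cost:.2f}")
print(f"expected swap labor per drive: ${swap_cost:.2f}")
print(f"break-even spare fraction: {break_even_fraction:.0%}")
```

      The result is only as good as the made-up inputs: raising the labor cost per swap moves the break-even point in direct proportion.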

    26. Re:Power Costs by ShanghaiBill · · Score: 1

      It seems that one assumption in the study is predictable or consistent failure rates or timing.

      That would be a very bad assumption. Backblaze looked at 100,000+ drives and found that some models were more than 30 times as likely to fail as others (Hitachi was most reliable, Seagate was worst, for the models they reported). They also found that consumer drives were slightly more reliable than enterprise drives, despite costing half as much.

    27. Re:Power Costs by Sloppy · · Score: 4, Insightful

      Sloppy calculation tip: 24*365 = 10000.

      If you're Sloppy enough to accept that premise, then at 10 cents/KWHr, a Watt costs a dollar per year. It turns your $28 into $32, but hey, close enough. When I'm shopping, I can add up lifetime energy costs really fast, without actually being smart. Nobody ever catches on!
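
      For anyone curious how far off the shortcut is, here is a small sketch comparing the exact hours in a year with the rounded 10,000. The 8 W drive, the 4-year lifetime and the 10 cents/kWh price come from the thread; the names are illustrative.

```python
EXACT_HOURS = 24 * 365   # 8760
SLOPPY_HOURS = 10_000    # the rounded figure used by the shortcut


def cost_per_watt_year(dollars_per_kwh, hours_per_year):
    """Cost of running a constant 1 W load for one year."""
    return hours_per_year / 1000 * dollars_per_kwh


exact_rate = cost_per_watt_year(0.10, EXACT_HOURS)    # about $0.88 per watt-year
sloppy_rate = cost_per_watt_year(0.10, SLOPPY_HOURS)  # $1.00 per watt-year

watts, years = 8, 4
exact_total = exact_rate * watts * years
sloppy_total = sloppy_rate * watts * years

# Relative overestimate introduced by rounding 8760 up to 10000
overestimate = SLOPPY_HOURS / EXACT_HOURS - 1

print(f"exact: ${exact_total:.2f}  sloppy: ${sloppy_total:.2f}  "
      f"overestimate: {overestimate:.1%}")
```

      The shortcut always errs high by the same ratio, so it is safe for quick comparisons between devices.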

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    28. Re:Power Costs by Sloppy · · Score: 5, Funny

      This is how we're going to bring our keepers to their knees, and eventually break out of the Matrix. We spend imaginary money on imaginary storage and then put all sorts of high-entropy stuff on it and run calculations to verify that it's really working, but they have to spend actually real resources to emulate it.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    29. Re:Power Costs by Anonymous Coward · · Score: 0

      I have yet to meet a small business that would be happy to pay for what is essentially raid10+1 (N(N+1)/2).

      It depends on the data size requirements. I'm at a small business where even a single super-cheap disk is more than big enough for our mailserver. So I said: let's get three of them, and I'll RAID1 them. It added something like $150 to the total cost of the project and now we can tolerate a double disk failure. (Though, mercifully, since we're finally starting using WD drives instead of Seagate, even single-disk failures are starting to become far less routine.) That was $150 well-spent and everyone knew it and nobody blinked when it came to paying for it. Reliability is worth something.

    30. Re:Power Costs by viperidaenz · · Score: 1

      An idle drive takes up around 5W.
      That's 43kWh per year. That's less than $10. Over 4 years a drive uses less power than the cost of the drive.

    31. Re: Power Costs by rickb928 · · Score: 1

      And if you send a tech, not the local admin, all the numbers change.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    32. Re:Power Costs by TWX · · Score: 1

      But it still has to reside on physical disks, just like virtual servers still have to run on physical hardware. There are hundreds of cores in a high-end Cisco UCS installation, on dozens of blades. The UCS can optimize what goes where for the IT group, but in the end that's all about density and the actual relationship between physical cores and virtual cores allocated to VMs is probably not as leveraged as you seem to think it is, especially for high-load usage.

      There still has to be disks, there still has to be RAM, there still has to be processors, there still has to be I/O.

      --
      Do not look into laser with remaining eye.
    33. Re:Power Costs by Anonymous Coward · · Score: 0

      > Big data centers likely pay less for power, so lets say $20.

      Depends on where you are. We're in the Seattle area, which is very hostile to business, and during peak hours we're paying about six times what I pay for power at home. Also, there's a business-crushing occupancy tax here on gross rather than net. We pay more in taxes every month than we have made total in profits in the entire twelve year history of the company. There's a reason businesses are fleeing Seattle. Between my wife and I, we've lost nine jobs due to fleeing companies in the twenty years here. Boeing found it cheaper to move their HQ to Chicago than to stay here with our expensive power and oppressive taxes. You're wrong that a data center is likely to pay less for power.

    34. Re:Power Costs by Anonymous Coward · · Score: 0

      Assumption: Disk failure to within 1 disk of failure tolerance triggers a page to the sys admin at 3 AM. He then spends the next four hours servicing hard drives, drinking coffee, and cussing out the servers, all billable hours. Total cost of disk swap: $120. Not being woken up at 3 AM, ever: priceless.

    35. Re:Power Costs by JWW · · Score: 1

      Yep. And this costs way less than bringing in and swapping out a part.

      I don't see any real reason not to just spin the spare drives.

    36. Re:Power Costs by dgatwood · · Score: 1

      In a curiously ironic twist, the hardware designed to protect consumer-grade disks from damage ends up destroying them. As I understand it, a number of fairly recent consumer drives exhibit a higher than normal failure rate because the heads break off of the arms when they collide with the park ramp. This is, at least in part, a consequence of making the arms smaller and lighter to improve seek times.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    37. Re:Power Costs by GuB-42 · · Score: 2

      You may want to try ZFS (raidz3 mode for 3 parity disks). It has several advantages over mdadm, in particular it eliminates the "write hole" problem. I went from a mdadm/ext4 array to RAID-Z and I don't regret it.
      And note that RAID isn't a backup solution, even with 100% fault tolerance, there are plenty of things RAID won't protect you from such as fire, power surges, theft, bugs, virus, user error, etc... For this you need a reasonable backup plan. And IMHO, that third parity disk would be much more useful as an external backup drive for your sensitive data.
      Ah, and a final piece of advice: in RAID arrays that are not RAID-0, avoid buying all the same disks at once. Disks from the same series, subjected to the same workload, have a higher chance of failing at the same time.

    38. Re:Power Costs by PRMan · · Score: 1

      I've never seen an IT project at a medium to large company take less than 4 hours. Because in addition to changing the drive (1 hour max), you have to write up paperwork and track it (3 hours of organizational time).

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    39. Re:Power Costs by PRMan · · Score: 1

      Also, drives are prone to "bad batches". It's easy to get a case of drives where 50% are bad. And then follow that up with 10 cases with 0 or 1 bad drives.

      It doesn't matter how many extra drives you have if they all came from the same bad batch.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    40. Re:Power Costs by Anonymous Coward · · Score: 2, Insightful

      Get a real SAN or a better maintenance contract.

      I manage various SAN/NAS totaling about 5000 disks in different parts of the world.
      3:00 AM - Email that a disk failed, followed a few seconds later by an email that a hot spare kicked in
      3:30 AM - Email from our vendor that a disk failed and they are sending a replacement, reply if I would like someone on site to replace that drive or if we will do it ourselves
      ~3:45 AM - Email that the RG/Pool is being rebuilt
      ~11:00 AM - A tech in that office gets a drive delivered to their desk, they walk into the server room, replace it and put the failed one in the box, put the included label on the box and take it to their mail room.
      ~11:45 AM - Email that the pool/rg has been rebuilt and that the hot spare has been returned to a hot spare

    41. Re:Power Costs by PRMan · · Score: 1

      The problem with Hitachi drives is that the performance is VERY uneven. I would buy WD instead.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    42. Re:Power Costs by Anonymous Coward · · Score: 0

      See Simplstor

      http://www.simplstor.com/

      They mainly do Linux but will build and support the equivalent hardware running Windows Storage server for you if you supply the MS license.

      Not pimping and not a customer, just a potential customer looking for relatively cheap bulk NAS with better commercial support than building it myself and what other COS NAS vendors like QNap and Synology currently provide.

    43. Re:Power Costs by ShanghaiBill · · Score: 1

      Also, drives are prone to "bad batches".

      Backblaze buys and installs ~50 drives per day. So the batches would even out.

    44. Re:Power Costs by dnavid · · Score: 1

      Now, what does it cost to swap it? Let's say the chance of failure is 20%, it takes ten minutes, and you pay the admin $30/hour (I just made up all these numbers). ($30/hour * 1/6 hour * 0.2 failures) = $1.

      So unless I made a mistake in either my math or my assumptions, it looks like swapping is still a win, unless the number of additional disks is less than 5%.

      Let me know when you can find a way to dispatch a tech to swap a hard drive in a tier 1 datacenter for a buck.

    45. Re:Power Costs by ShanghaiBill · · Score: 1

      The problem with Hitachi drives is that the performance is VERY uneven.

      Could you provide a citation for that? If your opinion is anecdotal, then how many drives is it based on?

    46. Re:Power Costs by ShanghaiBill · · Score: 1

      Let me know when you can find a way to dispatch a tech to swap a hard drive in a tier 1 datacenter for a buck.

      It doesn't cost a buck. It costs 5 bucks, but has a 20% chance of occurring in the 4 year lifetime of each HDD. Also, you would not "dispatch a tech". Instead you would send out a tech with a cart of, say, 50 HDDs. The tech would walk down the aisles, pulling and inserting disks. That would be his full time job. If he could do 50 in an 8 hour shift, and is paid $30/hour, that is about $5/disk.
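
      A quick sketch of that per-disk figure, using only the numbers in this comment (8-hour shift, $30/hour, 50 swaps per shift, 20% lifetime failure chance). The variable names are illustrative.

```python
shift_hours = 8
hourly_rate = 30.0        # dollars per hour
swaps_per_shift = 50
lifetime_failure_prob = 0.20

# Labor cost of one swap when a tech does nothing but swaps all shift
cost_per_swap = shift_hours * hourly_rate / swaps_per_shift

# Expected labor cost per installed drive over its lifetime
expected_cost_per_drive = lifetime_failure_prob * cost_per_swap

print(f"cost per swap: ${cost_per_swap:.2f}")
print(f"expected cost per installed drive: ${expected_cost_per_drive:.2f}")
```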

    47. Re:Power Costs by Anonymous Coward · · Score: 0

      You need more than just big caps though, but some way to notify the system that it is about to lose power. Otherwise the juice in those large caps could be used to start a new calculation or write instead of finishing stuff too.

    48. Re:Power Costs by dnavid · · Score: 1

      Let me know when you can find a way to dispatch a tech to swap a hard drive in a tier 1 datacenter for a buck.

      It doesn't cost a buck. It costs 5 bucks, but has a 20% chance of occurring in the 4 year lifetime of each HDD. Also, you would not "dispatch a tech". Instead you would send out a tech with a cart of, say, 50 HDDs. The tech would walk down the aisles, pulling and inserting disks. That would be his full time job. If he could do 50 in an 8 hour shift, and is paid $30/hour, that is about $5/disk.

      No, you would not. The problem with this is that the whole point of the paper was to analyze ways to improve the reliability of disk arrays. You can do what you're describing if there was no specific timeframe in which hard drives need to be replaced: you just replace them whenever you get around to it, rather than soon after they fail. But that only works in environments where actual disk reliability is not important. In environments where actual array reliability is important, delaying the swapping of drives widens the window of vulnerability for an array, even one with hot spares, because of the need to survive the rare cases of multiple drives failing in a short span. That isn't likely, but when you're dealing with five nines of uptime requirement, those unlikely events have to be accounted for.

      I'm also trying to imagine implementing a system whereby a $30/hr tech just walks down aisles and pulls blinking drives and replaces them, and I'm thinking anyone who does that deserves the uptime they get.

    49. Re:Power Costs by Anonymous Coward · · Score: 0

      If you're outsourcing the maintenance (you should if you expect less than $60K/year in service calls), the swap is going to be more like $100 than $5. Data recovery is extra.

    50. Re:Power Costs by painandgreed · · Score: 1

      Now, what does it cost to swap it? Let's say the chance of failure is 20%, it takes ten minutes, and you pay the admin $30/hour (I just made up all these numbers). ($30/hour * 1/6 hour * 0.2 failures) = $1.

      I don't know where you work or what your processes are like that it only takes ten minutes to swap a drive. Where I work, it takes 10 minutes for the admin to tell that the drive has failed and determine what model it is for the replacement. Add in another 30 minutes to submit RFQs to three different vendors because his request for extra drives at implementation was denied. Once he gets a quote, it takes another 60 minutes of email and meetings with the guy that OKs budget requests before getting his boss involved and telling him that, yes, the department really does need these drives. Another 30 minutes over the next month checking on the backorder of the said drives till they finally ship. 90 minutes after seeing that the drives have arrived at the enterprise to go down to the loading dock, confirm that they have been delivered, get somebody to tell him who they have been delivered to, track down that wrong person and get them to find the drives which they have already misplaced, and hand them over. In that time is another 5 minutes to fill out the proper change control forms and submit them, another 15 minutes to explain the change control request and answer questions at the weekly meeting to boss and coworkers, 10 minutes over the next three weekly meetings to explain he is still waiting on the drives to complete that change control. 60 minutes to explain the change control request to the server farm department of the IT department and argue till they give their permission. Another 30 minutes to schedule a visit time with the server farm guardians for time to access the lights out center (where the lights are never really out, but they like to call it that). 20 minutes waiting for the guy to show up to let you into the server farm to swap the drive and find the server. 10 minutes to do the physical work of swapping the drive. 20 minutes of checking on the drive swap to make sure that the drives have been swapped successfully and data is being replicated to it correctly. 15 more minutes in the weekly meeting to explain that the drive has been swapped and that the change control request is now closed. Which comes to more like six hours and five minutes to swap a drive.
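
      For reference, the steps listed above can be tallied as follows. This is a sketch that assumes the "10 minutes over the next three weekly meetings" means 10 minutes in total; the step labels are paraphrases. Under that reading a straight sum comes out somewhat above the total quoted in the comment.

```python
# (paraphrased step, minutes) in the order they appear in the comment
steps = [
    ("identify failed drive and model", 10),
    ("submit RFQs to three vendors", 30),
    ("email and meetings for budget approval", 60),
    ("chase the backorder", 30),
    ("locate delivered drives", 90),
    ("fill out change control forms", 5),
    ("explain change control at weekly meeting", 15),
    ("status updates at later meetings (assumed 10 total)", 10),
    ("argue with the server farm department", 60),
    ("schedule data center access", 30),
    ("wait to be let in", 20),
    ("physical swap", 10),
    ("verify the rebuild", 20),
    ("close out at weekly meeting", 15),
]

total_minutes = sum(minutes for _, minutes in steps)
hours, minutes = divmod(total_minutes, 60)

print(f"{total_minutes} minutes = {hours} h {minutes} min")
```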

    51. Re:Power Costs by Tough+Love · · Score: 1

      It would be stupid to keep the spares running, that comes right off their life. Maybe just spin them up once a month. What I don't see mentioned is the falling cost of drives... failed drives are normally replaced by newer, higher capacity drives, or they should be. IOW, they should plug in spares over time with planned maintenance instead of dumbly overprovisioning those things permanently.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    52. Re:Power Costs by ShanghaiBill · · Score: 1

      you just replace them whenever you get around to it

      You don't have to wait till you have 50 dead drives to send out the guy with 50 replacements. A datacenter with a million drives, with a 20% failure rate over 4 years, would have about 137 dead disks per day, or a little under 50 in an eight-hour shift. So the worker with the cart would move through the aisles, replacing the closest dead drive as they die. This would likely be faster than specifically dispatching someone to a particular dead drive, since the worker would already be in the vicinity.

      I'm also trying to imagine implementing a system whereby a $30/hr tech just walks down aisles and pulls blinking drives and replaces them

      How do you think big datacenters work? What is wrong with "pull and replace"? The rebuild should happen automatically when the good drive is inserted.
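
      The million-drive estimate earlier in this comment can be checked directly. A sketch under the stated inputs (one million drives, 20% failing over 4 years, failures spread evenly, three 8-hour shifts a day, 50 swaps per tech per shift); the even spread and the names are assumptions.

```python
import math

drives = 1_000_000
failure_fraction = 0.20   # share of drives failing over the period
years = 4
swaps_per_tech_per_shift = 50
shifts_per_day = 3        # three 8-hour shifts

# Assumes failures arrive evenly over the whole period
failures_per_day = drives * failure_fraction / (years * 365)
failures_per_shift = failures_per_day / shifts_per_day
techs_per_shift = math.ceil(failures_per_shift / swaps_per_tech_per_shift)

print(f"failures per day: {failures_per_day:.0f}")
print(f"failures per shift: {failures_per_shift:.0f}")
print(f"techs needed per shift: {techs_per_shift}")
```

      Real failures cluster by batch and by age, so an even spread is the optimistic case for staffing.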

    53. Re: Power Costs by Anonymous Coward · · Score: 0

      This is the reality of enterprise storage. I work for one of the major storage companies (NetApp) and disk replacements are as simple as stated here. Failures are predicted, monitored, reported, and handled without user intervention. Replacement disks are automatically sent out and are end-user replaceable in under a minute. Just pull the drive the report told you to (it will also have a status light indicating it has failed) and replace it.

      The added cost, rack space, and power/cooling make what the paper suggests less efficient. Not to mention what it does to your usable capacity reports for management: "But we have a PB free, what do you mean we can't use it?"

    54. Re:Power Costs by Barny · · Score: 1

      Did you see the numbers on some of those seagate batches?

      I admit that 300 or so drives isn't much of a sample when some of their batches are in the tens of thousands, but if you got 300 drives and almost 70 of them died within a year, would you keep buying those drives?

      --
      ...
      /me sighs
    55. Re:Power Costs by Anonymous Coward · · Score: 0

      Why is a failed drive considered as part of your change control program? Do those people consider the longer a drive is failed (or any single failure in a redundant system) the higher the risk which more than outweighs the calculated risk the very same change control program was supposed to avoid?

      Sorry, Not your fault but your IT department is whacked. Swapping a failed drive is the least of your worries.

      A) You should not put critical systems on equipment that:
      1) Is not under some sort of maintenance contract with a SLA or in the case of older legacy equipment, spare parts on hand.
      2) Where it takes 3 weeks to get common consumable replacement parts like hard drives and power supplies.
      3) There is no budget consideration for maintaining it.

      Any of those alone is a recipe for data loss and down time.

      B) You should not have redundant subsystems that have to go through a full-blown change control process to get replaced
      C) You should not have multiple separate groups in IT that argue or give grief about changing something as common as a failed drive
      D) You should not have to wait to get permission or wait to get access to your data center.
      E) If there is a server farm department and a head of that department, why did he/she not know about the failed drive already and why are you handling it? What part of the farm is that department monitoring that does not include a hard drive? What are you possibly talking to that person about for 30-60 minutes? What was your discussion last week when a drive failed?

    56. Re:Power Costs by rtb61 · · Score: 1

      The real problem is that running down maintenance ability sounds fine right up until the moment of catastrophic failure, when the ability to react to it has been totally compromised. This would result in hugely extended downtime in the event of that catastrophic failure, whatever its cause. It looks great on a spreadsheet and pumps up an executive's bonus, but the whole company ends up going boom when a catastrophic failure occurs, because customers will not tolerate extended downtime, and that downtime might be not hours but weeks or even months as they try to rebuild maintenance efforts so that their maintainers can rebuild the system.

      This kind of evaluation extends to government: should governments pay the costs of maintaining manual systems, i.e. pencil and paper, because in the event of catastrophic failure, recovery depends on their ability to sustain the essential elements of government whilst digital systems are rebuilt, and those manual systems will be needed in order to rebuild them? Corporate executives abandon these ideas because, of course, costs affect bonuses and golden parachutes in the event of failure.

      --
      Chaos - everything, everywhere, everywhen
    57. Re:Power Costs by Anonymous Coward · · Score: 0

      Power? Ideally yes but in some places, it does not matter for cost. In the three data centers we have, we pay for a 30 amp circuit per rack. It does not matter if we have 2 servers in there or a few UCS chassis filled with blades or disks shelves jammed in there, it is the same cost to us. In 15 of our 17 offices with small/med data centers on site, electricity is non metered and included in the lease.

    58. Re:Power Costs by mewrei · · Score: 1

      IBM's XIV storage platform does this. It has its own UPS and pushes data to disk from cache when the power goes out.

    59. Re:Power Costs by stoatwblr · · Score: 1

      RAID is old hat. Sufficiently advanced technology doesn't require that the disks be in the same enclosures or even in the same building.

      If you design around the concept that "drives fail, get over it" then the "grunt with a cart full of drives" model will work extremely well - and he doesn't need to do paperwork because the system has been set up to note serial numbers, locations and hours automatically as drives are removed and replaced.

      Any installation where a drive change is a big deal is either trivially small or incompetently run.

    60. Re:Power Costs by stoatwblr · · Score: 1

      No, it's a consequence of _having_ park ramps. They're a relatively recent development.

      I've seen WD drives with a few tens of hours on them and tens-of-thousands of head parks. That kind of thing is sheer stupidity and it's not much surprise the heads get damaged when they're shunted to the park ramp every couple of seconds.

    61. Re:Power Costs by stoatwblr · · Score: 1

      Spare drives don't take any damage when cold.

      It's true that enterprise drives don't like being spun up/down, but the reality in most setups is that a spare is only spun up once - it's the repeated start/stop cycles which drives object to.

      FWIW, idle enterprise drives tend to pull more like 10W than 5W.

    62. Re:Power Costs by dgatwood · · Score: 1

      Yeah, but park ramps have been around for a couple of decades (the earliest patent filing I could find was filed in 1992), and they only started having insane levels of trouble fairly recently (by comparison). So it's probably the combination of excessive amounts of parking (as you mentioned) and having less structural support for the heads that makes them so problematic.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

  2. I would love to, but that server is a soup Nazi by jandrese · · Score: 4, Informative

    So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server". So I figured I'd just download the file with wget instead. Nope, 403 forbidden.

    Looks like fetch works though. If anybody else has trouble getting the file, try my local mirror.

    --

    I read the internet for the articles.
    1. Re:I would love to, but that server is a soup Nazi by Nutria · · Score: 1

      it says "can't use the plugin, it causes problems on our server".

      The name of the browser and plugin would be helpful...

      (The PDF happens to work perfectly on Linux with the built-in viewers of FF35 and Chromium 39.)

      --
      "I don't know, therefore Aliens" Wafflebox1
    2. Re:I would love to, but that server is a soup Nazi by whoever57 · · Score: 1

      So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server".

      Maybe they have problems with their disk array?

      But seriously, I had no problems downloading the document from the original site.

      --
      The real "Libtards" are the Libertarians!
    3. Re:I would love to, but that server is a soup Nazi by Anonymous Coward · · Score: 0

      Unless you use a sub-par OS, there's no plugin involved.

    4. Re:I would love to, but that server is a soup Nazi by ArcadeMan · · Score: 1

      No problem viewing the PDF file in Safari on OS X.

    5. Re:I would love to, but that server is a soup Nazi by jandrese · · Score: 1

      This was on Windows with Firefox and the Adobe plugin. I don't have the built-in plugin because I like popping out PDFs and because the built-in viewer is slow as balls on nontrivial PDFs.

      --

      I read the internet for the articles.
    6. Re:I would love to, but that server is a soup Nazi by Anonymous Coward · · Score: 0

      Works fine here. I use the built-in PDF viewer (Evince 3.10.3) and use the following browsers:
          Chromium 39.0.2171.65 Ubuntu 14.04 (64-bit)
          Firefox 35.0.1+build1-0ubuntu0.14.04.1
          Opera/9.80 (X11; Linux x86_64) Presto/2.12.388 Version/12.16
      Not exactly sure whether this means Opera is version 9.80 or version 12.16...

    7. Re:I would love to, but that server is a soup Nazi by Anonymous Coward · · Score: 0

      This was on Windows with Firefox and the Adobe plugin. I don't have the built-in plugin because I like popping out PDFs and because the built-in viewer is slow as balls on nontrivial PDFs.

      Loser. Try the Firefox plugin from Foxit software instead. It can handle "nontrivial" PDFs, and is quick about it. I also have full Adobe on my work PC, but use Foxit as the PDF reader. It's faster than the crappy insecure Adobe stuff.

    8. Re:I would love to, but that server is a soup Nazi by Anonymous Coward · · Score: 0

      So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server". So I figured I'd just download the file with wget instead. Nope, 403 forbidden.

      Looks like fetch works though. If anybody else has trouble getting the file, try my local mirror.

      Just use axel instead of wget. Problem solved.....

  3. 4 years? by Enry · · Score: 2

    That's not long term. That's the normal life of a storage array. Long term is like 8-10 years.

    1. Re:4 years? by jandrese · · Score: 1

      They only had availability data for 4 years of drive life. This is largely a math study. I'm not familiar with any implementations of their 2D parity system, although it is outside of my area of expertise. Their assumption that the service calls would always be more expensive seemed a little suspect to me. Rack space isn't free and when you have basically 100% redundancy or more in spare drives you're going to eat up a lot of space. Putting 54 spare drives in a rack that already has 11 parity disks and only 55 primary disks just doesn't seem efficient. Is all of that space really cheaper than a single service call during the life of the machine to replace 20 failed drives all at once (when the rack drops below say 6 spares of the original 26--saving you half of the space the spares would have taken up)?

      I have also seen enough buggy RAID controllers in my day to make me very wary of that 2D raid arrangement in the paper.

      All in all this smells like a mathematician's solution to the problem, largely unbounded by real life concerns.
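
      For reference, the disk counts follow directly from the formulas quoted in the summary. A rough sketch in Python (the function name `array_layout` is made up for illustration, and the spare count uses the summary's N(N + 1)/2 figure, which it describes as "more than enough", so it need not match the spare count quoted above):

```python
def array_layout(n):
    """Disk counts for a 2D array with n parity disks, per the summary's formulas."""
    parity = n
    data = n * (n - 1) // 2      # N(N - 1)/2 data disks
    spares = n * (n + 1) // 2    # N(N + 1)/2 spares (the summary's "more than enough" figure)
    total = parity + data + spares
    return {"parity": parity, "data": data, "spares": spares, "total": total}


if __name__ == "__main__":
    layout = array_layout(11)
    usable = layout["data"] / layout["total"]
    print(layout, f"usable fraction: {usable:.1%}")
```

      With n = 11 this reproduces the 11 parity and 55 data disks mentioned above; under these formulas, well under half of the drives in the rack hold user data.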

      --

      I read the internet for the articles.
    2. Re:4 years? by Enry · · Score: 1

      All in all this smells like a mathematician's solution to the problem, largely unbounded by real life concerns.

      I had the same thought. There are a few realities of storage that are missed here: storage use always increases, disks aren't the only things that fail, rack space isn't free, you usually have staff available already....

      This is an interesting idea if your storage is in a place where it can't be reached at all for some reason, but I think NASA and ESA have already done a good bit of research on that.

    3. Re:4 years? by stoatwblr · · Score: 1

      We run our arrays as long as we can. They tend to show a bathtub curve ramping up at the end of 6 years.

  4. And a three year warranty by Anonymous Coward · · Score: 0

    because, you know...

  5. 4 years??? by Anon-Admin · · Score: 1

    Really, 4 year life span and they are replaced?

    God I need to work for a company like that!

    I am so tired of dealing with these RS/6000 systems that were made back in 1994, and these Intel systems made back in 2002.

    1. Re:4 years??? by ArcadeMan · · Score: 4, Funny

      I am so tired of dealing with these RS/6000 systems that were made back in 1994, and these Intel systems made back in 2002.

      Yeah, we get it. You like to deal with cutting-edge stuff. Now get off my lawn.

      Sent from my Commodore 64.

    2. Re:4 years??? by operagost · · Score: 1

      Your C64 has video and keyboard I/O. Luxury! I would have responded earlier, but I was still keying my response into the front panel of my Altair. Now get off my lawn!

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    3. Re:4 years??? by ArcadeMan · · Score: 2

      Do you have any idea how many butterflies it took to reply to your message?

      Now get off my lawn!

    4. Re:4 years??? by rickb928 · · Score: 1

      4 years was my recommendation for disk replacements from about 198 onwards. Some arrays had drives >8 years old, but if failure was not tolerated, 4 years was enough.

      Mind you, if the customer specified IDE drives, I warned them that failure was inevitable. SCSI 10K drives, I would still swap but that was for five-nines.

      And those stupid IDE RAID cards, well, that's too cheap. We are no longer talking reliable. Let someone else have that business.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    5. Re: 4 years??? by jd2112 · · Score: 1

      I was still wire wrapping the logic circuits and waiting for the vacuum tubes to reach operating temperature on my ENIAC.

      --
      Any insufficiently advanced magic is indistinguishable from technology.
    6. Re:4 years??? by Anonymous Coward · · Score: 0

      "from about 198 onwards"
      All hail Methuselah. What were the beads on those abacuses made of - ivory?

    7. Re:4 years??? by Anonymous Coward · · Score: 0

      So you're the guy responsible for this weather!

    8. Re:4 years??? by CmdrTamale · · Score: 1

      IS THAT YOU, B1FF?

  6. TLDR; 2D arrays with a ton of spares are reliable by raymorris · · Score: 3, Insightful

    The bottom line is, having a lot of spare disks for a 2D array makes it reliable over time. These configurations of 2D arrays are quite reliable over time because they have many spares available to automatically replace failed disks:

    Data  Parity  Spare
      12       3     13
      12       3     14
      24       6     20
      36       9     26

    To understand the above table, we'll use the first row as an example. An array made up of 1TB disks with 12TB of data space would have 3TB of parity and 13 spare 1TB drives, for a total of 28 drives to get 12 drives' worth of net storage.
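
    The same arithmetic can be run over every row of the table. This is only a quick sketch: the row values are copied from the table above, and the helper name `overhead` is invented for illustration.

```python
# Rows copied from the table above: (data, parity, spare) disk counts.
configs = [(12, 3, 13), (12, 3, 14), (24, 6, 20), (36, 9, 26)]


def overhead(data, parity, spare):
    """Return (total drives, fraction of drives that hold user data)."""
    total = data + parity + spare
    return total, data / total


if __name__ == "__main__":
    for data, parity, spare in configs:
        total, usable = overhead(data, parity, spare)
        print(f"{data:>2} data + {parity} parity + {spare:>2} spare "
              f"= {total} drives, {usable:.1%} usable")
```

    For the first row this gives the 28 drives described above, of which 12 (a bit under 43%) hold data.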

    What they didn't mention is that the same reliability can be achieved with only three spares, by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule to replace the failed drive "some time in the next two months", that probably won't be costly.

  7. Naive to say the least. by BarbaraHudson · · Score: 0

    selected a five-year disk array lifetime and assumed disk failures were independent events distributed according to a Poisson law with a mean time to failure (MTTF) of 100,000 hours.

    100,000 hours = 273 years. Does anyone believe that?

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    1. Re:Naive to say the least. by alphatel · · Score: 3, Funny

      100,000 hours = 273 years. Does anyone believe that?

      Everyone except you apparently.

      --
      When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
    2. Re:Naive to say the least. by jandrese · · Score: 2

      100,000 hours is 4,167 days which is ~11.4 years. That sounds pretty reasonable to me, since I've run plenty of disks for over a decade.

      --

      I read the internet for the articles.
    3. Re:Naive to say the least. by wbr1 · · Score: 2

      Check your math. 100,000 hours / 24 = 4166.6~ days
      4166.6666~ days / 365 = 11.4 years

      --
      Silence is a state of mime.
    4. Re:Naive to say the least. by cellocgw · · Score: 1

      100,000 hours = 273 years. Does anyone believe that?

      Oddly enough, it doesn't matter whether you believe it or not. What matters is whether that's the same predictive model used for estimating lifetimes of RAID arrays, or a single drive for that matter. Since you want to compare the proposed new config directly with current paradigms, you have to use the same set of underlying assumptions.

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
    5. Re:Naive to say the least. by Anonymous Coward · · Score: 0

      How do you get 273 years? 100,000 / (24*365) = 11.4 years

    6. Re:Naive to say the least. by oodaloop · · Score: 1, Funny

      Girls suck at math.

      --
      Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
    7. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

      Umm, 273 years is nearly 2.4 million hours. So, no, no one with basic arithmetic skills believes that 100,000 hours is 273 years.

    8. Re:Naive to say the least. by whoever57 · · Score: 1

      mean time to failure (MTTF) of 100,000 hours.

      100,000 hours = 273 years. Does anyone believe that?

      You don't understand the meaning of MTBF.

      --
      The real "Libtards" are the Libertarians!
    9. Re:Naive to say the least. by Anonymous Coward · · Score: 0

      Hours, not days. (about 11)

    10. Re: Naive to say the least. by Anonymous Coward · · Score: 0

      They mistook the number to be in days and not hours.

    11. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

      They did 100000/365 which equals about 274. They seem to have confused hours with days.
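
      For anyone who wants to check both numbers, here is a tiny sketch of the conversion. It assumes a 365-day year, and the names are made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def hours_to_years(hours):
    """Convert a duration in hours to years."""
    return hours / (HOURS_PER_DAY * DAYS_PER_YEAR)


mttf_hours = 100_000
correct_years = hours_to_years(mttf_hours)    # divides by 24 * 365
mistaken_years = mttf_hours / DAYS_PER_YEAR   # the slip: treats hours as if they were days

if __name__ == "__main__":
    print(f"correct: {correct_years:.1f} years, mistaken: {mistaken_years:.0f} years")
```

      The first figure comes out at about 11.4 years and the second at about 274, which matches the numbers being argued about in this thread.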

    12. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

      Actually it does matter. If you believe 100,000 hours = 273 years you lack basic arithmetic skills.

    13. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

      They also don't realize that 100,000 hours / 365 days is not the way you get years from hours.

    14. Re:Naive to say the least. by BarbaraHudson · · Score: 1

      Oops my math error. Still, 11.4 years is also way out of line with the reality that, as density rises, so do failure rates. Why do you think they've lowered the warranty period?

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    15. Re:Naive to say the least. by BarbaraHudson · · Score: 0

      I know. Already apologized. But I have yet to see a high-density disk last more than 8,000 hours, with the median being maybe half that.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    16. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

      But I have yet to see a high-density disk last more than 8,000 hours, with the median being maybe half that.

      Good for you. I have a number of 2 and 3 TB drives that are more than 5 years old. Anecdotes != evidence.

    17. Re:Naive to say the least. by BarbaraHudson · · Score: 1

      I've apologized for the bad math, but sorry again. However, 11.4 years doesn't match what's actually happening as we go to higher densities. I've had a few drives last 8,000 hours, but most have died much sooner.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    18. Re:Naive to say the least. by BarbaraHudson · · Score: 1

      I screwed up. Sorry. However, even 11.4 years is overly optimistic as we cram more and more onto a single platter.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    19. Re:Naive to say the least. by BarbaraHudson · · Score: 1

      Good one! Yes, I screwed up. Circle this date on your calendar :-)

      But thinking that 11.4 years is going to save their behind is unrealistic.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    20. Re:Naive to say the least. by BarbaraHudson · · Score: 1

      Yes, I goofed. However, believing that 11.4 years is what you'll get in practice is also naive, especially with the higher-density drives that haven't accumulated even 2 years of real-life experience.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    21. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

      Sorry, the 3 TB drives are around 3 years old. The 2 TB have passed their 5 year warranties with no issues.

    22. Re:Naive to say the least. by BarbaraHudson · · Score: 0

      Their age isn't necessarily their up-time, and home use isn't the same load as these are expected to meet. Also, your anecdote also isn't evidence :-) But that's okay too.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    23. Re:Naive to say the least. by cellocgw · · Score: 1

      Actually it does matter. If you believe 100,000 hours = 273 years you lack basic arithmetic skills.

      +1 sardonic

      But doesn't address my serious point about application of statistical methods.

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
    24. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

      No, they are constantly being read and written to from a NAS.

    25. Re:Naive to say the least. by cellocgw · · Score: 1

      They seem to have confused hours with days.

      Captain! They've broken our secret Starfleet code!

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
    26. Re:Naive to say the least. by wonkey_monkey · · Score: 1

      100,000 hours = 273 years. Does anyone believe that?

      I don't, because 100,000 hours is 11.4 years.

      273 (much closer to 274) years is 100,000 days.

      --
      systemd is Roko's Basilisk.
    27. Re:Naive to say the least. by wonkey_monkey · · Score: 1

      PS You've already apologised more than enough for this. Sorry to compound it!

      --
      systemd is Roko's Basilisk.
    28. Re:Naive to say the least. by Anonymous Coward · · Score: 0

      I don't know if they'd be considered "high-density", but each of the pair of 1TB drives in my colo'd server are over 50K hours, with a perfectly clean SMART report.

    29. Re:Naive to say the least. by Anonymous Coward · · Score: 0

      Haha, I was going to ask what wormhole those 5+ year old 3 TB drives popped out of. You're like the guy in an interview in 2010 claiming 15+ years of Java experience, and no, he never worked at Sun.

    30. Re:Naive to say the least. by BarbaraHudson · · Score: 1

      I kind of deserve it, though. That's what I get for trying to pass the vacuum, watch Dr. Phil, keep my neighbor's dog from drinking my coffee (again), and post on slashdot at the same time.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    31. Re:Naive to say the least. by wbr1 · · Score: 1

      That seems exceptionally short. I run a repair shop, and dead/dying HDDs are the second most common problem. While I do not know the operational hours of these devices, the great majority are past the 3 year mark when they begin to fail.
      I guess it also depends on your definition of high density as I do not see many drives > 1TB in consumer/SMB equipment.

      --
      Silence is a state of mime.
    32. Re:Naive to say the least. by Immerman · · Score: 1

      A mean time between failure of 11.4 years means you can reasonably expect half of all drives to fail before then*. Assuming a constant failure rate (which we really shouldn't do), that means you can expect ~4.4% of drives to fail every year. Which leads to the benefit of lowering the warranty period: Every year of warranty increases the expected total production/replacement cost of the drive by 4.4% - reduce the warranty period and you boost profit margins and/or can reduce the price to undercut your competitors.

      *In reality it's not quite so simple, MTBF is actually the average failure rate of a large number of young drives tested for (probably) considerably less than a year, with aging effects never taken into consideration.
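
      To put rough numbers on this, here is a small sketch under two simplified readings of the 100,000-hour figure: strictly exponential lifetimes (constant failure rate, where the figure is the mean), and the "half fail by then" reading used above (where the figure is treated as a median). The function names are illustrative, and neither model accounts for the aging effects just mentioned.

```python
import math

MTBF_HOURS = 100_000
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year, drive powered on continuously


def fraction_failed(hours, mtbf_hours=MTBF_HOURS):
    """P(a drive has failed by `hours`) for exponential lifetimes with the given mean."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def fraction_failed_half_life(hours, half_life_hours=MTBF_HOURS):
    """Same question, but treating the quoted figure as the time by which half have failed."""
    return 1.0 - 0.5 ** (hours / half_life_hours)


if __name__ == "__main__":
    print(f"exponential, first year: {fraction_failed(HOURS_PER_YEAR):.1%}")
    print(f"half-life,   first year: {fraction_failed_half_life(HOURS_PER_YEAR):.1%}")
    print(f"exponential, by the MTBF: {fraction_failed(MTBF_HOURS):.1%}")
```

      Under these assumptions the first-year failure fraction is roughly 8.4% for the exponential model and roughly 5.9% for the half-life reading, and the exponential model has about 63% of drives failed by the MTBF rather than half. The 4.4% per year above is the linear estimate (50% spread evenly over 11.4 years).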

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    33. Re:Naive to say the least. by SimonInOz · · Score: 2

      er, last time I checked, 100,000 hours is 11 years.
      273 years is 2,400,000 hours. Did you lose the use of your calculator?

      --
      "Cats like plain crisps"
    34. Re:Naive to say the least. by grylnsmn · · Score: 3, Funny

      That is one of the greatest subtle Wrath of Khan references I've seen yet.

      Spock: "Admiral, if we go by the book, like Lieutenant Saavik, hours would seem like days."

      Masterful!

    35. Re:Naive to say the least. by Anonymous Coward · · Score: 0

      100,000 hours = 273 years. Does anyone believe that?

      There are 168 hours in a week, and 52 weeks in a year. That's 8,736 hours per year. Divide 100,000 by 8,736. That's roughly 11 and a half years you knucklehead. You must work for NASA.

    36. Re:Naive to say the least. by hcs_$reboot · · Score: 1

      er, last time I checked, 100,000 hours is 11 years.

      Oh you check that a lot?

      --
      Slashdot, fix the reply notifications... You won't get away with it...
    37. Re:Naive to say the least. by BronsCon · · Score: 1

      I'm sure he used a calculator, seems he simply forgot to divide by 24.

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    38. Re:Naive to say the least. by Anonymous Coward · · Score: 0

      100,000 hours is 11.4 years, and 11.4 x 24 is 273. I assume someone left out the 24 hours/day factor when converting.

    39. Re:Naive to say the least. by whoever57 · · Score: 1

      Yes, I goofed. However, believing that 11.4 years is what you'll get in practice is also naive,

      No, it's not your basic conversion error that's the problem.

      A MTBF of 11.4 years does not mean that a typical array will have a lifetime of 11.4 years. From Wikipedia:

      Once the MTBF of a system is known, the probability that any one particular system will be operational at time equal to the MTBF can be calculated. This calculation requires that the system is working within its "useful life period", which is characterized by a relatively constant failure rate (the middle part of the "bathtub curve") when only random failures are occurring.

      You are conflating "useful life period" with MTBF. They measure different things.

      --
      The real "Libtards" are the Libertarians!
    40. Re:Naive to say the least. by SimonInOz · · Score: 1

      every 11 years, or when my inbuilt estimation engine says "these figures are wrong, let's just check that".

      Said engine was especially useful when we used slide-rules (you might have to look that up), as I did at high school. It still is, because the world is full of people who blindly believe stuff.

      Not you of course.

      --
      "Cats like plain crisps"
    41. Re:Naive to say the least. by SimonInOz · · Score: 1

      Even Jupiter's day is 10 hours. (Ok, 9.9, but close enough).

      Maybe if we speeded up the earth's rotation a bit ... yeah, let's do that, make it one hour. Oh boy, effective gravity has gone slightly negative at the equator, we are losing our atmosphere, and cows will fly, perhaps over the moon, though mooing seems unlikely.

      Nah, I vote to leave it alone and do arithmetic properly. Boring, but we should live longer (though maybe not in days).

      --
      "Cats like plain crisps"
    42. Re:Naive to say the least. by BronsCon · · Score: 1

      Err... didn't see who the original bad math was done by. I mean "she"... I think...

      --
      APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
    43. Re:Naive to say the least. by goarilla · · Score: 1

      Most should make it to their second year (>=8640 hours).
      In our small 24 bay array I've seen a lot of those bad Seagate ST3000DM001 fail at ~15000-19000 hours.

    44. Re:Naive to say the least. by Anonymous Coward · · Score: 0

      Number is being misinterpreted, anyway.
      It does not mean one drive will last 100,000 hours. It means that 100,000 drives will last one hour.

  8. Not enough by BitZtream · · Score: 1

    I worry a lot less about losing data than I do corrupting data and not knowing it.

    But hey, congratulations, you've learned about RAID mirrors with lots of copies and learned how to apply basic, well understood engineering principles to it.

    Guess what, some of us were aware of this years ago, some others aware of it longer than you've probably been alive. It's been known my entire life, that's for sure, so that's at least 40 years.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    1. Re:Not enough by BitZtream · · Score: 1

      And let's add, to 'avoid maintenance' you just add a bunch of extra spares from the start. That's just stupid, you overbuild ridiculously in order to not have to spend 10 minutes swapping a drive out. Totally cost effective ... if you're sending a probe out into space. In which case, you're going to want better than five 9s, so try again.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    2. Re:Not enough by Anonymous Coward · · Score: 1

      Swapping out a drive under warranty takes 2-5 days. Problem occurs, system notifies operator, operator gets the notification, operator calls vendor, vendor assigns case, technician calls operator, technician orders spares, technician schedules repair, technician receives spares, technician waits until appointment and drives to site, technician gets met at the door and escorted to rack, technician replaces drive, technician checks repair, system rebuilds, technician checks rebuild, technician is escorted out of building, technician drives back.

      Not fast.

    3. Re:Not enough by operagost · · Score: 1

      A little faster if someone with a pulse is at the site. Then it's sending the new disk overnight or with a courier, and handing it to the IT staff member who swaps the disk.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    4. Re:Not enough by ebh · · Score: 1

      Or if you have a NetApp with a decent support contract: A disk fails while you're asleep[1]. The filer notifies NetApp over a dedicated POTS line. NetApp overnights a new disk to you. You find out the next morning that the disk failed, via a call from the loading dock about a package for you. You pop in the new drive, activate one of your other hot spares, and configure the new drive as a new hot spare, all in less time than it took you to walk down to the loading dock and back.

      [1] You don't have single disk failure alarms wake you up in the middle of the night because you configured your array to run with two failed disks.

    5. Re:Not enough by Qzukk · · Score: 1

      This is why I deal with equipment where I can A) crossship (bonus points for letting me ship the drive faceplate instead of the whole thing) and B) swap the drive out myself.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    6. Re:Not enough by Immerman · · Score: 1

      Sure, so you probably want to keep several spares handy, maybe even have a few hot spares that can be automatically deployed the moment there's a failure, and replace the failed drives at your leisure. Having almost as many hot spares as you have active disks is probably overkill for most scenarios. In fact they themselves calculated that with their parity technique it will give you 5-nines confidence in having 4 years of maintenance-free reliability. Probably a lot more cost effective to build the system for 5-nines reliability for a few months, and just make a point of replacing any failed drives within a few weeks.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    7. Re:Not enough by halltk1983 · · Score: 1

      I know it might be hard to believe, but some people run servers that aren't in their broom closet, close at hand. In fact, many run some internationally, and it's a lot more than 10 minutes to go swap a disk. It's a few hundred dollars each time to pay remote hands to swap it.

      --
      Watch for Penguins, they eat Apples and throw rocks at Windows.
    8. Re:Not enough by Virtucon · · Score: 1

      dedicated POTS line

      You realize that's not really feasible in most places anymore, right? Also, hate to burst your bubble but LPOPs and COs are being phased out as well. It's something to do with this newfangled network switching technology. Pretty soon circuit switched connections will be a thing of the past. ;-)

      --
      Harrison's Postulate - "For every action there is an equal and opposite criticism"
    9. Re:Not enough by petermgreen · · Score: 1

      Pretty soon circuit switched connections will be a thing of the past. ;-)

      The core of phone networks has moved from physical circuit switching to virtual circuit switching to packet switching with priority, but at least here in the UK normal phone lines are still delivered from the phone exchange as analog POTS over a pair of copper wires (which may or may not also carry DSL). I believe the situation in the US is similar.

      Were you thinking of some other place (and if so where) or were you using a pedantically narrow definition of "dedicated POTS line"?

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  9. Only 4 years? by fxsoap · · Score: 1

    Is this really a long time? 4 years? That seems kind of short and not reliable to me.

    1. Re:Only 4 years? by wonkey_monkey · · Score: 1

      It's a long time for 99.999% reliability.

      --
      systemd is Roko's Basilisk.
    2. Re:Only 4 years? by Immerman · · Score: 1

      If a single drive has a MTBF of 100,000 hours, that means you can naively expect 50% of drives to fail within 100,000 hours. That gives you a five-nines reliability period for one drive of only 1.44 hours. Does that put the degree of reliability being discussed in proper perspective for you?

      The math:
      0.99999 = 0.5^N
      N = log(0.99999) / log(0.5) = 0.0000144
      So, the 5-nines reliability period is 0.00144% of the MTBF, or
      100,000h * 0.0000144 = 1.44h

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    3. Re:Only 4 years? by mordred99 · · Score: 1

      3-5 years is the industry standard for depreciation of computing hardware. I.e., use it and get rid of it for newer stuff.

    4. Re:Only 4 years? by Anonymous Coward · · Score: 0

      How are you doing your maths? If you assume a Poisson process, the 5 9s time is just 100,000 / 50,000 = 2 hrs. If you assume most any other distribution, you can't calculate the 5 9s time without knowing at least the variance of the TBF.

    5. Re:Only 4 years? by Immerman · · Score: 1

      Standard combinatorial statistics, assuming failure probability is constant over time. Obviously things get more complicated if the failure rate varies over time, but it's good for a first-order approximation. In reality the "bathtub curve" of drive failures means those first few thousand hours have a much higher failure rate, so the actual 5-nines reliability duration will be much lower.

      Assume you have a 0.99999 probability of not failing in 1.4427 hours
      After the first 1.4427 hours you have 0.99999 chance of having not failed.
      During the second period you have another 0.99999 chance of non-failure, assuming you didn't fail in the first period - for a total non-failure chance of 0.99999*0.99999 ~= 0.99998
      During the third period you again have a 0.99999 chance of continued non-failure, for a lifetime nonfailure chance of 0.99999^3 ~= 0.99997
      After 100,000 hours your chance of having not failed is 0.99999^(100,000h/1.4427h) = 0.5
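
      A minimal sketch of the calculation above, for anyone who wants to check it. It compares two assumptions that are both illustrative rather than taken from the paper: the half-life reading used in this comment (50% of drives survive to the MTBF) and a constant failure rate of 1/MTBF (the exponential model behind a Poisson process). The function names are made up for this example.

```python
import math

MTBF_HOURS = 100_000
TARGET = 0.99999  # five nines


def five_nines_half_life(mtbf_hours, target=TARGET):
    """Survival S(t) = 0.5 ** (t / mtbf): the MTBF is treated as a half-life."""
    return mtbf_hours * math.log(target) / math.log(0.5)


def five_nines_exponential(mtbf_hours, target=TARGET):
    """Survival S(t) = exp(-t / mtbf): constant failure rate of 1/mtbf."""
    return -mtbf_hours * math.log(target)


print(f"half-life model:   {five_nines_half_life(MTBF_HOURS):.4f} h")
print(f"exponential model: {five_nines_exponential(MTBF_HOURS):.4f} h")
```

      Under the half-life reading this gives about 1.44 hours; with a constant failure rate of 1/MTBF it gives about 1.00 hour. Either way the single-drive figure is on the order of an hour.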

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
  10. Discrete math & linear algebra to the rescue by Anonymous Coward · · Score: 0

    See subject: Kids, this is why math is good for you!

  11. The thing about this... by Kokuyo · · Score: 2

    "Yeah, well just put more disks in it..."

    Nice idea. Only: TCO is not just based on initial spending and maintenance. There is also rackspace to consider and did I hear anyone talk about green IT?

    If my day to day considerations were that one dimensional, my employer could save a ton of money on my salary.

    1. Re:The thing about this... by Anonymous Coward · · Score: 0

      Well != we'll

    2. Re:The thing about this... by Immerman · · Score: 1

      Presumably the spares aren't spun up until needed, so power consumption is negligible. And really, you're talking about having just over K spare drives for an array of K data drives and ~sqrt(2K) parity drives. That's actually not all that bad, especially when you consider that that is an extreme case for getting four years of 5-nines reliability out of individual drives with a five-nines reliability period of only 1.44 hours (=100,000h MTBF). If you instead assume a tech goes through your racks to replace all the failed drives once a month, you should be able to eliminate most of the hot spares while maintaining reliability.

      The actually interesting bit, I think, is probably the ~sqrt(2K) error-correcting parity system. Assuming it scales, that seems like it could be a really interesting advance for large-scale data stores: your 10,000 data-disk array only needs about 142 parity disks to ensure 5-nines reliability for as long as you can keep sufficient hot spares in the queue.
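
      A rough sketch of how the parity count grows, taking the summary's layout at face value (N parity disks protect N(N-1)/2 data disks). The function name is hypothetical, and this says nothing about how the paper's simulation actually sizes arrays.

```python
import math


def min_parity_disks(data_disks):
    """Smallest N such that N * (N - 1) / 2 >= data_disks."""
    # floor(sqrt(2K)) is always slightly too small, so step up from there.
    n = math.isqrt(2 * data_disks)
    while n * (n - 1) // 2 < data_disks:
        n += 1
    return n


for k in (100, 1_000, 10_000):
    print(f"{k} data disks -> {min_parity_disks(k)} parity disks")
```

      The result tracks sqrt(2K) closely, so parity overhead shrinks as a fraction of the array as the array grows.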

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    3. Re:The thing about this... by Maxwell · · Score: 1

      "Yeah, well, just put more disks in it" He forgot a comma. Relax.

    4. Re:The thing about this... by PRMan · · Score: 1

      And you could probably do this with consumer instead of enterprise drives if you have that many spares (and avoid Seagate like the plague).

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
  12. Nothing novel is being proposed here by fnj · · Score: 2, Informative

    We observe that the same objectives cannot be reached with RAID level 6 organizations

    Well, duh. RAID6 is not a serious level of redundancy. ZFS RAIDZ-3 (triple parity) FTW. And you can build in as many hot spares as you want. Dinosaurs who have still not adopted ZFS need to get a clue.

    1. Re:Nothing novel is being proposed here by Anonymous Coward · · Score: 0

      Yeah, checksum scrubbing is nice.

      But, my conventional raid arrays also allow as many spares as I want, both with mid-tier LSI hardware raid and software raid. In my experience, you don't need very many hot spares before your practical maintenance concerns shift to other parts of the system than just disk. For example, you may have to intervene to service failed HBAs, fabric links, controllers, backplanes, cooling fans, power supplies (local to chassis), UPS, or air-conditioning.

    2. Re:Nothing novel is being proposed here by Immerman · · Score: 1

      Still lousy - they appear to be claiming a scalable parity system that only requires ~sqrt(2N) parity disks to protect N data disks. That's only about 15 parity disks for 100 data disks, or about 142 parity for 10,000 data. That's impressive.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
  13. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Chas · · Score: 1

    Yes, but then you're dancing around the possibility of additional disk failures while waiting on that replacement.

    If you pop a few more drives (which, if you got your disks in lots is QUITE possible), you're in deep shit.

    --


    Chas - The one, the only.
    THANK GOD!!!
  14. check your math by fche · · Score: 1

    more like 11.4 years

  15. Academic la la bullshit. by Anonymous Coward · · Score: 1

    In academia, everything is simple and independent. I'm sure it's fun to calculate theoretical parity requirements for quintuple disk failures. ...but it's useless.

    In the real world, if you have five disks simultaneously fail in an array, there was a common cause. The next step is to restore from backup because every drive in that array is now suspect. Whatever knocked out five disks probably did a number on the rest, and it would be reckless to assume they are unaffected, even if they have a clean SMART report. You're well past the point of caring about parity if an array gets crushed like that.

    1. Re:Academic la la bullshit. by CBM · · Score: 1

      This. Anybody who remembers the IBM DeskStar (a.k.a. DeathStar) debacle will remember that drives from certain batches would fail on a weekly basis. To get better independence, one would need drives in the same RAID from different manufacturers, and hopefully, from different batches.

    2. Re:Academic la la bullshit. by Anonymous Coward · · Score: 0

      That's why you plant a couple hundred 1k files of random data in your filesystem and checksum them with find . -name '*.special' -exec md5sum {} + > /sanitycheck. Then run md5sum -c /sanitycheck from the same directory after a drive failure. If any of them don't match, something is likely wrong with the whole array.
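
      The same canary-file idea can be sketched without shell tools. This is an illustrative version only: the `*.special` suffix comes from the comment, while the function names and the in-memory manifest are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def build_manifest(root, pattern="*.special"):
    """Map each canary file under root to its MD5 hex digest."""
    return {
        str(p): hashlib.md5(p.read_bytes()).hexdigest()
        for p in sorted(Path(root).rglob(pattern))
        if p.is_file()
    }


def changed_files(manifest):
    """Return canary paths that are missing or whose digest no longer matches."""
    bad = []
    for path, digest in manifest.items():
        p = Path(path)
        if not p.is_file() or hashlib.md5(p.read_bytes()).hexdigest() != digest:
            bad.append(path)
    return bad
```

      Build the manifest while the array is healthy, store it somewhere off the array, and call `changed_files` after a rebuild. A non-empty result suggests the damage went beyond the drive that failed.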

    3. Re:Academic la la bullshit. by stoatwblr · · Score: 1

      "In the real world, if you have five disks simultaneously fail in an array, there was a common cause."

      Usually something to do with the bus, not the drives (thanks HP!). Losing a bunch of drives simultaneously like that usually results in an array which is errored but recoverable.

      As for the comment below: If you use desktop drives in an array then you need to ramp up your parity and spares accordingly. Deathstars had a very simple software fault (timer rollover) which caused them to fail at 49 days uptime.

  16. Simple. by Neil+Boekend · · Score: 1

    TL;DR version:
    Replacing disks sucks sometimes. Sticking in additional spares means you don't have to replace them. They calculated an efficient RAID solution that means you don't need as many spares.

    --
    Well, I might have a way, but it only works on a semi spherical planet in a vacuum.
    1. Re:Simple. by Anonymous Coward · · Score: 0

      Except that most companies are going to balk at being told that they need to buy what amounts to an exponential growth of spares as the disk size grows to provide enough redundancy. Maybe if you are dealing with some puny ass disk array that is okay, but say you want a 50 or 100 disk array you are going to have to buy an enormous amount of spares just to get 4 years of reliability. WHAT A JOKE!

    2. Re:Simple. by Immerman · · Score: 1

      You misread: the number of spares ~= the number of data disks, and the number of parity disks scales with the square root of that number. ( N(N-1)/2 data disks, N(N+1)/2 spares, and N parity disks ) This could actually be pretty interesting for high-capacity data storage.

      You should also consider the degree of reliability being discussed: a single drive's 100,000h MTBF translates to a 5-nines reliability period of only 1.4427 hours: 0.99999^(100,000h/1.4427h) = 0.5.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
  17. Not impressed by Anonymous Coward · · Score: 0

    I'm not impressed with this at all. My last two hard drives lasted 8 or 9 years each, with no motherboard failures or controller failures, or anything. But everything mentioned in this story indicates a much bigger investment, only to get a little more security for 4 years? No Thanks!

    1. Re:Not impressed by Immerman · · Score: 1

      Not a little more reliability, a LOT more reliability.

      A single drive's 100,000h MTBF translates to a 5-nines reliability period of only 1.4427 hours: 0.99999^(100,000h/1.4427h) = 0.5

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    2. Re:Not impressed by Anonymous Coward · · Score: 0

      Yes because your two drives lasting that long means that every single drive ever created will too.
      To balance out that anecdote, I've had two separate relatives lose data on failing hard drives (one external, one internal), both of which were fairly new drives of different brands. One of them even paid the $800 or whatever it was to get some of the data back from a data recovery company because, despite my constant nagging to back up their stuff, they put it off too long.

      You want to run the US stock exchange or a missile control system or handle medical data that can cause people to die if it's unavailable when needed, or even run a successful medium to large business that you want to REMAIN successful? Then you need actual reliability, not just hoping to be lucky and never have a failure, because it's cheaper.

      Realistically though, to get the kind of reliability that 99.999% implies (max 5 minutes downtime per year and NEVER ANY permanent data loss), you can't even depend on a single fault-tolerant storage array system like what's proposed here. You really need multiple such systems at different geographic sites, with automatic data replication and fail-over.

      Here's an (insanely cheesy, but strangely compelling) marketing video from several years back that involves actually blowing up an operating datacenter:
      Disaster Proof
      Full disclosure: In a previous job, I was almost the Linux guy in this setup, but kind of glad I wasn't after hearing the guy they got to be the narrator... I could have definitely failed-over a Linux cluster faster than that, but I think they deliberately used less-aggressive settings because at the time they were pushing the other solutions as higher end and more reliable, so they couldn't have cheap little Linux stealing the show (my how things have changed).

  18. N(N+1)/2 spares by Anonymous Coward · · Score: 0

    I haven't read the paper, but.

    If N=4 then spares = 10.
    If N=6 then spares = 21.

    Seems like overkill to get 5 9s (5.26 minutes per year)
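
    For reference, a quick sketch of where the 5.26-minute figure comes from. It assumes a 365-day year and treats "nines" as an availability fraction; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year


def downtime_minutes_per_year(nines):
    """Allowed downtime per year at an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

    Five nines works out to roughly 5.26 minutes per year.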

    1. Re:N(N+1)/2 spares by Lunix+Nutcase · · Score: 2

      Basically as the disk size grows you are talking about N-squared spares. I think most businesses are going to be more than happy with just hot-swapping out failed disks as needed.

    2. Re:N(N+1)/2 spares by Anonymous Coward · · Score: 0

      Unless we're misunderstanding their claim. Looks like their definition of "spare" is "part of the disk array", and by N*(N+1)/2 they mean that their method achieves 99.999% reliability with arrays of 1, 3, 6, 10, ... disks, but if you show up with 9 disks they can't do anything for you.

    3. Re:N(N+1)/2 spares by Lunix+Nutcase · · Score: 1

      I would hope I'm misunderstanding it, because that seems like a lot of spares to purchase ahead of time.

    4. Re:N(N+1)/2 spares by Anonymous Coward · · Score: 0

      Was this financed by Seagate?

    5. Re:N(N+1)/2 spares by Anonymous Coward · · Score: 0

      ...N-squared spares.

      I cannot spare a square.

    6. Re:N(N+1)/2 spares by Immerman · · Score: 1

      Reread the summary - N is the number of parity disks, not the number of data disks.
      N parity disks
      N*(N-1)/2 data disks
      N*(N+1)/2 spares
      So roughly the same number of spares as data disks, and the number of parity disks scales as roughly the square root of twice that number. Pretty impressive if you're talking high-capacity data storage with hundreds or thousands of data disks.

      Also data reliability is something very different than uptime: you don't lose data for only 5.26 minutes per year - once gone it's gone.
      Meanwhile a single drive's 100,000h MTBF translates to a 5-nines reliability period of only 1.4427 hours: 0.99999^(100,000h/1.4427h) = 0.5

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    7. Re:N(N+1)/2 spares by Immerman · · Score: 1

      N is the number of parity disks - the number of data disks also increases as N-squared.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    8. Re:N(N+1)/2 spares by Anonymous Coward · · Score: 0

      N(N+1)/2 spares on a disc array of N(N-1)/2 drives. The ratio of spares to data drives is (N+1)/(N-1), or about 1:1 unless N is quite small.
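
      A small sketch of the counts being argued about in this thread, using the formulas from the summary. The function name and the dictionary layout are only illustrative.

```python
def layout(n_parity):
    """Disk counts for a 2D array with n_parity parity disks (n_parity >= 2)."""
    data = n_parity * (n_parity - 1) // 2
    spares = n_parity * (n_parity + 1) // 2
    return {
        "parity": n_parity,
        "data": data,
        "spares": spares,
        "spares_per_data": spares / data,  # equals (N + 1) / (N - 1)
    }


for n in (4, 6, 20, 100):
    print(layout(n))
```

      The spares-to-data ratio starts at 5:3 for N = 4 and approaches 1:1 as N grows.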

    9. Re:N(N+1)/2 spares by Anonymous Coward · · Score: 0

      Are you sure that you're reading it correctly? Why would the number of spares happen to equal the total number of parity and data disks, coincidence? Why distinguish between parity and spares? If the whole thing is "sans maintenance", then spares need to be encased and plugged in anyway (maybe not powered on?)

    10. Re:N(N+1)/2 spares by Immerman · · Score: 1

      That's certainly what it says in the summary. As for distinguishing between parity and spares - I should think that would be obvious: the parity disks are in active use, the array can't detect/correct errors without them. The spares meanwhile are just sitting there, presumably powered down, until one of the active disks needs to be replaced.

      As for the equivalence in the number of spares... I suspect it's not exactly coincidence, more like human nature: "Okay, we've got a cool 2D parity system - let's see just how long it will maintain 5-nines reliability if we give it one spare for every active drive. Over four years! Cool, for the press release let's juice it up a little and rephrase that as 'more than enough for five nines for four years'."

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    11. Re:N(N+1)/2 spares by Anonymous Coward · · Score: 0

      That's certainly what it says in the summary.

      No. What the summary says is that if you have N*(N+1)/2 spare drives lying around, you can build a disk array with them, splitting them as N*(N-1)/2 data disks, and N parity disks.

    12. Re:N(N+1)/2 spares by Immerman · · Score: 1

      We propose to eliminate [disk replacement] calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime ...we have simulated the behaviour of two-dimensional disk arrays with N parity disks and N(N – 1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years.

      Are you seriously telling me you read that and get that they're creating a disk array out of spare disks that can provide 5-nines reliability for four years without involving any disk replacement? Methinks you need to invest some serious effort on your reading comprehension skills. Not to mention your sanity-check skills.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
  19. Disks from same factory run often go bad together by daboochmeister · · Score: 2

    Yeah, and what are you going to do when 9 out of 10 of the disks all go bad, because they came from the same factory run and exhibit the same issue? This is what we usually experience: when a disk fails, most of the time it's a subcomponent issue shared by all of the disks from that and any concurrent factory runs, and we have to swap them ALL out. I guess you just throw the whole array out ... :-(

    --
    "Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
  20. Not my anecdotal experience by futuresheep · · Score: 5, Interesting

    Just a few things I thought of while looking at this study:

    The authors are using Backblaze data. Backblaze uses consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use.

    I'm willing to bet that none of the authors of this paper have ever had to pay for colocated rack space, power, and cooling either, they've just doubled the RU that I need for storage. At $1500.00 - $2000.00 per rack that adds up.

    Doubling the rack space for storage I need so I can avoid a few service calls by my storage vendor over 5 years simply isn't efficient.

    We've installed close to 500TB of archival storage using commodity hardware and 2-3TB Nearline SAS. We have maybe 3 hand and eyes calls per year for disk replacement.

    Anyway - just rambling.

    1. Re:Not my anecdotal experience by fnj · · Score: 5, Insightful

      consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use

      In your fantasy there is a difference besides a hideously higher price and a somewhat longer warranty period. In real life, commodity SATA is much more cost effective. Everybody who is serious recognizes this (Google, Backblaze, Amazon).

    2. Re:Not my anecdotal experience by Anonymous Coward · · Score: 0

      It's only cost effective because their scale is significantly larger.

    3. Re:Not my anecdotal experience by silas_moeckel · · Score: 1

      Well you can probably double your density moving to non hot swap 3.5's, making double the drives even on space. Now if I were going to do that I would mirror the raid sets anyways since power consumption of near line drives is pretty minimal.

      Never seen much of a use of enterprise sata, I do use a lot of SAS with dual ports to separate raid controllers.

      --
      No sir I dont like it.
    4. Re:Not my anecdotal experience by Anonymous Coward · · Score: 1

      >The authors are using Backblaze data. Backblaze uses consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use.

      Why Enterprise Hard Drives Might Not Be Worth the Cost

      In addition to recording failure rates of thousands of consumer grade hard drives, the online backup company has also been keeping tabs on the enterprise-class drives used in its servers. (The consumer grade drives store customers' backup data, while the servers from Dell and EMC store Backblaze records that run the business.) Long story short, they found that the failure rate of the enterprise drives is higher than the consumer ones—4.6% annual failure rate versus 4.2%.

      For god's sake, check your facts!

    5. Re:Not my anecdotal experience by Anonymous Coward · · Score: 1

      Why use spinning disks? Do the same thing with SSDs and you have something reliable and energy efficient.

    6. Re:Not my anecdotal experience by futuresheep · · Score: 1

      There's more to reliability than failure rates. Enterprise drives have stronger magnets, more robust error detection, better vibration dampening, etc. Plus, consumer drives do not support TLER, making them useless in anything other than the massive JBOD environments you mentioned.

    7. Re:Not my anecdotal experience by Anonymous Coward · · Score: 0

      In reality, it's the opposite.
      The hardware is exactly the same, and there is no difference in failure rate.

      The firmware on the disk is different.
      While a consumer grade disk will try and retry multiple times to recover a sector on disk, a professional drive will not as this would increase disk latency too much.
      A consumer disk will quite often be able to recover a sector after multiple retries (and mark that sector as bad), while a professional disk will often simply "fail" with a S.M.A.R.T. error.

    8. Re:Not my anecdotal experience by Anonymous Coward · · Score: 0

      Uh no, they make about 2 to 4 copies of everything spread out across datacenters so they don't have to care about reliability. Huge difference between them recognizing commodity hardware is reliable and simply not caring if it's reliable. Get a clue.

  21. So they figured out raid z 3 with enough spares by silas_moeckel · · Score: 1

    To last all of 4 years, and need nearly as many hot spares as data drives. I guess the academics think they know something yet again. They took some dubious failure rates (Backblaze uses whatever is the cheapest consumer drive at the time and eventually stops buying the really bad ones (Seagate 1.5 and 3TB, looking at you)) and a rather optimistic transfer rate (200 MB/s) that assumes all sequential reads. They failed to account for backplane, controller, and power, assuming that those never fail. By their numbers you might as well run mirrored RAID 5 or 6 with enough hot spares to make it between regularly scheduled tech visits. That gives you the ability to split chassis and controllers along mirror lines. As to rebuilds, we have better methods: predictive failure works well, SSDs make great caches while rebuilding, etc. We also have less centralized options with distributed technologies that potentially scale better.

    5 9's is not that hard of an objective when talking about raid sets, the tools have been there for decades. Sure you will never reliably reach it with a single path to anything, 5 minutes is not enough time for even a staffed site to remedy any outage.

    --
    No sir I dont like it.
  22. failed disks by Anonymous Coward · · Score: 0

    One of my SATA hard drives has been running Windows XP for five years. I work with legacy programs for MS-DOS, Windows 3.1 and Windows 9x. Don't laugh. The original disk went bad. It developed several bad sectors. The MFT became damaged and some of my programs didn't run properly. S.M.A.R.T. complained about bad sectors too. Smartdisk couldn't fix the bad sectors. I ended up having to swap out the 80 GB drive for a 320 GB drive. Always keep a backup of your files.

  23. Ignores how disks often fail by MarcAuslander · · Score: 2

    My understanding is that disks often fail when a head touches the surface, or a piece of dirt gets between the head and the surface. Once that happens, more dirt is produced, increasing the probability of more head crashes, leading to a failure cascade. As a consequence, once one of my drives starts to show unrecoverable errors, corresponding to damaged surface areas, I replace it while it can still be read.

    The spare platter strategy does nothing to reduce this failure mode. In fact, all modern disks already have spare space for bad block relocation.

    1. Re:Ignores how disks often fail by drinkypoo · · Score: 1

      The spare platter strategy does nothing to reduce this failure mode. In fact, all modern disks already have spare space for bad block relocation.

      Including pretty much everything with an onboard controller. "Modern" is understating the case.

      If I were expecting an array to last a long time without being touched, I would expect it to have a whole bunch of spares that never even got heated up until they were needed, just sat there in the box enjoying living in a relatively temperature-constant environment. Sure, there's fluctuations, but they'll all be within the operating temperature range of the drives.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Ignores how disks often fail by rickb928 · · Score: 1

      This from an NEC white paper in 2008:

      "A recent academic study [1] of 1.5 million HDDs in the NetApp database over a 32 month period found that 8.5% of SATA disks develop silent corruption. Some disk arrays run a background process to verify that the data and RAID parity match, a process which can catch these kinds of errors. However, the study also found that 13% of the errors are missed by the background verification process. When you put those statistics together, you find on average that 1 in 90 SATA drives will experience silent data corruption not caught by the background verification process. So when those data blocks are read, the data returned to the application would be corrupt, but nobody would know. For a RAID-5 (4+P) configuration at 930 GB usable per 1 TB SATA drive, that calculates to an undetected error for every 67 TB of data, or 15 errors for every petabyte of data. If a system were constantly reading all that data at 200 MB/sec, it would encounter an error in less than 100 hours."

      Sometimes, I just want to weep.
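
      The arithmetic in the quoted passage can be checked with a short sketch. All inputs are the figures quoted from the white paper; decimal units (1 TB = 1000 GB = 1,000,000 MB) and the variable names are assumptions of this example.

```python
# Figures quoted from the NEC white paper.
P_SILENT_CORRUPTION = 0.085   # fraction of SATA disks developing silent corruption
P_MISSED_BY_SCRUB = 0.13      # fraction of those errors missed by verification
USABLE_GB_PER_DRIVE = 930     # usable capacity of a 1 TB drive
DATA_FRACTION = 4 / 5         # RAID-5 (4+P): four data drives out of five
READ_MB_PER_SEC = 200

# Probability that a given drive has an undetected silent error.
p_undetected = P_SILENT_CORRUPTION * P_MISSED_BY_SCRUB
drives_per_error = 1 / p_undetected

# Usable data held by that many drives, in TB (decimal units assumed).
tb_per_error = drives_per_error * USABLE_GB_PER_DRIVE * DATA_FRACTION / 1000
errors_per_pb = 1000 / tb_per_error
hours_to_first_error = tb_per_error * 1_000_000 / READ_MB_PER_SEC / 3600

print(f"1 undetected error per {drives_per_error:.0f} drives")
print(f"1 undetected error per {tb_per_error:.1f} TB")
print(f"{errors_per_pb:.1f} errors per PB")
print(f"{hours_to_first_error:.1f} hours at {READ_MB_PER_SEC} MB/s")
```

      Under those assumptions the quoted numbers are internally consistent: about 1 in 90 drives, roughly 67 TB per error, about 15 errors per petabyte, and a little under 100 hours of continuous reading.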

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    3. Re:Ignores how disks often fail by Anonymous Coward · · Score: 0

      I would weep for NEC. The author totally misunderstood the academic article. What the paper actually was referring to was latent errors. I.e., a block that can't be read, but you don't know it until you try to read it. So if a disc fails, and then you try to rebuild from the others and run into a latent error, you lose data.

      The NEC article is about miscompares, the drive returns data, but not the correct data. Two very different things.

  24. also the cost of a raid card / cards with that man by Anonymous Coward · · Score: 0

    Also, the cost of a RAID card or cards with that many ports. Maybe even dual CPUs just to get more PCIe lanes: say x4 to x8 for each RAID card, plus say x4 for each 10 GigE card, or about x8 for 2-4 port cards.

  25. Trust by HideyoshiJP · · Score: 5, Interesting

    I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.

    1. Re:Trust by Qzukk · · Score: 1

      Now I'm curious what happened to Case1, Case2, and Copy of copy of case3 [8].doc.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    2. Re:Trust by jones_supa · · Score: 1

      :D

    3. Re:Trust by Akili · · Score: 1

      I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.

      Now I'm amused at the idea of the embedded filesystem path as a measure of trust of the source. I can only guess that these would be even worse:

      C:\My Documents\ADAPT15\Case3.doc
      C:\WINNT\Profiles\User\My Documents\ADAPT15\Case3.doc
      C:\Documents and Settings\User\My Documents\ADAPT15\Case3.doc

      Any path containing 'New Folder' and/or 'Untitled.doc' would quite possibly trump any of the above.

      ( 'C:\Documents and Settings\Ricky\My Documents\faxes\sent faxes\case3.doc' I wouldn't even dare open. )

    4. Re:Trust by Zontar_Thing_From_Ve · · Score: 1

      I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.

      I don't know if this is an attempt to get modded as "Funny" when it's not funny at all or if you are serious, so I'll assume the latter. There are compatibility reasons for using .doc format. .doc format is old and well supported by non-Microsoft products like LibreOffice, OpenOffice, etc. Where I work we save a lot of internal documents in .doc format simply because we don't need any features that .docx has and we don't want to force people needlessly to have to upgrade to Office 2010 just to read our docs when, again, they're pretty simple and don't need any of the new features that .docx supports. Additionally, my company in the past didn't have the fastest record of upgrading versions of Office and it got really frustrating to have a few people in the office saving docs in .docx when the majority of people in the office were on an older version of Office that didn't understand .docx and thus couldn't read their docs.

  26. Re:TLDR; 2D arrays wit a ton of spares are reliabl by tlhIngan · · Score: 1

    What they didn't mention is that the same reliability can be achieved with only three spares, by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule to replace the failed drive "some time in the next two months", that probably won't be costly.

    The goal is to realize that for manufacturers, service calls are expensive. Perhaps a company has a 4 hour response time - if a disk fails, the company is still running with redundancy, but they're wanting that drive replaced pronto, which is easily $500+ per incident (need to have spares on hand, drop ship extras if a tech runs low, need to station techs around, maybe even need to fly a tech in).

    So the goal is that building an extra 13 spare 1TB drives (which probably cost under $50 in bulk) is $650, or the cost of just over one service call.
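
    A back-of-the-envelope sketch of that comparison, using the rough prices from this comment ($50 per drive, $500 per service call). Both figures are the commenter's estimates, and the function names are made up for the example.

```python
def spare_pool_cost(n_spares, price_per_drive):
    """Up-front cost of shipping the array with n_spares extra drives."""
    return n_spares * price_per_drive


def calls_equivalent(n_spares, price_per_drive, cost_per_call):
    """How many service calls the spare pool costs as much as."""
    return spare_pool_cost(n_spares, price_per_drive) / cost_per_call


print(spare_pool_cost(13, 50))        # cost of 13 spares in dollars
print(calls_equivalent(13, 50, 500))  # same cost expressed in service calls
```

    With these inputs the spares pay for themselves once they avoid a second service call.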

    If enough drives have to be replaced then the tech can change a whole pile of them at once, which is still cheaper than sending people out for individual drive failures.

    The goal is basically to have no service calls over the service life - then maybe refresh it periodically at one's convenience by replacing all the failed drives in one go.

  27. Only addresses single-disk failure rates by Anonymous Coward · · Score: 0

    What about other causes of failure, especially ones that impact every single disk you own?

    Human errors and manufacturing defects. What do you think happens if all your 15K rpm drives were incorrectly manufactured with bearings designed for 7200 rpm drives? I had a BIG customer where that happened. It took several years before the disk manufacturer was forced to fess up because of the crazy failure rate.

    And did I actually read some posts all but saying with enough redundancy there'd be no need for backups? Umm, wrong.

  28. Re:Disks from same factory run often go bad togeth by Diss+Champ · · Score: 1

    If you read the article, that is exactly what they suggest. If failure rates are too far above predicted, they say to replace it with a new array. At least they are upfront about it.

  29. Service call? by roc97007 · · Score: 2

    A service call? Seriously? A sysadmin (or operator if it's a big place) can't see the yellow light on a disk and replace the pack with in-house spares? Have we become so inept as an IT community that we can no longer do a walk-through of our machine room and service simple things like this? Maybe we do deserve to be outsourced.

    And if one must have a service contract such that only the vendor can touch the hardware, (why would you do that? never mind) wouldn't you negotiate a provision that includes drive replacement (as drives are consumables that must eventually be replaced) without being charged for an "office visit"?

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
    1. Re:Service call? by Anonymous Coward · · Score: 1

      Didn't you get the memo? The drive is to eliminate IT staff at midsize and smaller companies. For those holdouts that don't want to put everything into the "cloud", vendors are creating maintenance free local storage for which you still will not need any IT staff to babysit. Your software can be outsourced/offshored, but local hardware also needs to be made hands free. The extra $5K for this is way cheaper than a moody sysadmin to plug in replacement drives.

    2. Re:Service call? by rickb928 · · Score: 2

      Yes we have, if the array is installed in your backup corporate PKI server, in a shielded and locked cage with video, electrostatic, and laser monitoring and alarms. And the keys to the cage are in another state. And it requires EVP approval to deliver the keys to the authorized tech for a flight to the DR site to change a failed drive.

      A real world example. You would recognize the name of this corporation in the first three letters. They take their corporate security very seriously, so much so that bumping into the cage earned you a visit from armed security, an escort out, and full debriefing until they were satisfied you would never take the cart with the stuck caster again...

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    3. Re:Service call? by painandgreed · · Score: 1

      A service call? Seriously? A sysadmin (or operator if it's a big place) can't see the yellow light on a disk and replace the pack with in-house spares? Have we become so inept as an IT community that we can no longer do a walk-through of our machine room and service simple things like this? Maybe we do deserve to be outsourced.

      And if one must have a service contract such that only the vendor can touch the hardware, (why would you do that? never mind) wouldn't you negotiate a provision that includes drive replacement (as drives are consumables that must eventually be replaced) without being charged for an "office visit"?

      First off, are things so bad you still have to do physical inspection of the servers? Where we work, there are multiple monitoring systems and they don't expect anybody inside the data centers unless there is a change order for work of known parameters. Beyond that, it's not even the IT community in many cases but the business community that will all too easily not spend the money for the protection measures the IT department requests, decide to go with the said vendor, and not make the changes to the contract that the IT department requests (if they even get to see the contract before it's signed).

    4. Re:Service call? by roc97007 · · Score: 1

      I know there's SMART and other tools, but oddly enough, with offshore admins supposedly monitoring our equipment 24/7, I can still walk through our (fairly large) machine room and identify three or five warning lights that they did not know about. (I'm a "legacy" IT employee who still has access to the room.) Software alerts are important, but they're only as good as the people watching them. Even with an alert automatically spawning a trouble ticket, things can go bad if the ticket is dropped into a week-long queue, or even if it happens during local daylight hours and the offshore crew aren't coming online until 8:00 PM local time. Later, when the smoke clears, the offshore admins will insist they were just following process, and we'll just set things up to be knocked down again at a future date.

      Secondly, you're right about IT making recommendations that are ignored by the pencil pushers. But in my opinion that's the CIO not doing his or her job.

      --
      Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
  30. Re:TLDR; 2D arrays wit a ton of spares are reliabl by silas_moeckel · · Score: 1

    We do just that: when it gets down to 1 hot spare it's an emergency service call and we replace all the failed units. This does not happen very often, and when it does it tends to be just that, a bad batch.

    --
    No sir I dont like it.
  31. Reliability vs. Availability by Anonymous Coward · · Score: 0

    "to achieve a 99.999 percent probability of not losing data over four years."

    I think the summary writer doesn't understand the difference.

    1. Re:Reliability vs. Availability by Anonymous Coward · · Score: 0

      I think the same could be said for most of those who have commented on this so far.

  32. ONLY O(N^2) DISKS? by Anonymous Coward · · Score: 0

    SIGN ME UP!

  33. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Anonymous Coward · · Score: 0

    Yes, but then you're dancing around the possibility of additional disk failures while waiting on that replacement.

    Even if the mean time between failures for consumer drives was 6 months, the odds of 'popping' two more spares in the month after the first failure would be less than 3%. If the MTBF is 1 year the probability drops to 0.7%.
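
    The 3% and 0.7% figures look like the square of the chance of one failure in the month, (1/6)^2 and (1/12)^2. That reading is an assumption, since the comment does not show its working. A minimal sketch comparing that shortcut with a Poisson estimate (also an assumed model: independent failures at a constant rate of one per MTBF):

```python
import math


def naive_two_failures(mtbf_months: float, window_months: float = 1.0) -> float:
    """Square of the one-failure chance in the window (the apparent shortcut)."""
    p_one = window_months / mtbf_months
    return p_one ** 2


def poisson_at_least_two(mtbf_months: float, window_months: float = 1.0) -> float:
    """P(two or more failures in the window) for a Poisson process
    whose expected failure count is window / MTBF."""
    lam = window_months / mtbf_months
    return 1.0 - math.exp(-lam) * (1.0 + lam)


for mtbf in (6, 12):
    print(f"MTBF {mtbf:>2} months: shortcut {naive_two_failures(mtbf):.2%}, "
          f"Poisson {poisson_at_least_two(mtbf):.2%}")
```

    Both models treat failures as independent, so neither covers the bad-batch case raised elsewhere in the thread.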

  34. Geo location? by Anonymous Coward · · Score: 0

    If you need 99.999% reliability, I think you should consider more options. Keep your data in two separate locations. I don't know that power and internet are 99.999% available in a single location.

    For that kind of reliability, they should consider diversifying the types of disks in the system. Disk failure is not a purely independent random event. The kind of power, vibration or magnetic-field glitch that could knock out one drive, would likely knock out many drives.

    SANs are great, until a whole run of disks die at the same time, or a fiber cut knocks out access to your data.

  35. Missing the point by Anonymous Coward · · Score: 0

    I think the major point of the paper is that pre-allocating some disks, as swap-in ready, might be cheaper than a service call. That is a change of mindset. Everything else about this is a distraction. Yes, maybe in the future they would power off those extra disks until needed, to keep the green people happy etc...

  36. Oh, hai, from 2009 by bill_mcgonigle · · Score: 1

    zpool create -o ashift=12 -o autoreplace=on tank raidz2 sdc sdd sde sdf sdg sdh spare sdi sdj

    Alright, fine, ashift=12 is newer than 2009, for 2TB+ drives. And always use /dev/disk/by-id for your sanity.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  37. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Kjella · · Score: 1

    Even if the mean time between failures for consumer drives was 6 months, the odds of 'popping' two more spares in the month after the first failure would be less than 3%. If the MTBF is 1 year the probability drops to 0.7%.

    Except if you got a bad batch where some kind of material or production defect will cause many disks to fail near simultaneously. The overall MTBF might be true for all the disks they produce, but unless you make a real effort to source them from different batches over time you can't assume that's going to be your MTBF.

    --
    Live today, because you never know what tomorrow brings
  38. Why not a gradually-degrading array instead? by mi · · Score: 1

    Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years.

    Instead of keeping the spares inside as just that — spares — can it not start using all of them (in a sufficiently redundant configuration) and gradually lose capacity as physical disks fail?

    Yes, it would require coordination with the driver and filesystem, but there is nothing insurmountable in that...

    --
    In Soviet Washington the swamp drains you.
  39. Unfair advantage. by koleczek · · Score: 1

    One of the authors is a Catholic priest. He probably blessed the drives first.

  40. Flawed logic by JerryLove · · Score: 1

    "We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures."

    That's true only if you assume that three disk failures occur faster than a single disk can be rebuilt.

    If you assume no more than two disk failures *during the length of time it takes to rebuild the array* then RAID 5 or RAID 6 works fine as long as you assign enough hot spares.

  41. expensive BECAUSE four hour service by raymorris · · Score: 1

    > service calls are expensive. Perhaps a company has a 4 hour response time -

    Service calls are expensive BECAUSE it's an emergency. If you have four spares, plus the two parity drives, you're still six drives away from a problem. With a few spares, you can easily replace one by sending it UPS ground, rather than having a tech run out there immediately.

    1. Re:expensive BECAUSE four hour service by Anonymous Coward · · Score: 0

      There is more to a storage system than just the disks. This study is flawed in that it only looks at the disks themselves failing, not any other part of the storage system. In the case of a SAN/NAS array that has multiple shelves and buses, that spare may not be on the same shelf, which could lead to a larger failure with a second failure.

      Assume you had 5 disk shelves and were running a bunch of RAID5 at 4+1 with one disk per shelf or RAID6 at 8+2. If you lose a disk on shelf 1 and a spare on shelf 2 picks up, shelf 2 now holds two members of that raid group, so a later loss of shelf 2 would take the raid group down. In this case, you would not technically lose data as the SAN should be smart enough to immediately stop all IO to that RG, but your availability would suffer until that shelf was fixed and the RG was brought back on line. This gets a little more complicated with larger raid groups, different raid group types, and the use of automated merged pools, and even on a SAN/NAS that moves data around in the background to spread I/O and redundancy across all of the disks on all buses, but the same concept would apply. There is more to that reliability than just the disks themselves. The study ONLY includes the rotating disk part of the storage system.

  42. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Em+Adespoton · · Score: 1

    Indeed -- remember the experiment posted on Slashdot a year or so ago where they measured the MTBF across drives purchased in batches and outside batches? Failures tended to cascade within the batch; other batches would cascade at different times.

    So that entire cluster is likely to fail catastrophically unless you're swapping in drives from new batches from time to time -- at which point it should last MUCH longer than 4 years with data integrity. Bonus points if your array can handle size boosts over time (swapping in larger disks).

  43. Dick array by vipvop · · Score: 1

    Call me when there's a dick array with 99.999% availability

  44. Math by jklovanc · · Score: 1

    The number of drives seems to be large. The growth is quadratic, so as the cluster gets bigger the number of spare disks gets much bigger.

    Drives  Spares  Total
    5       15      20
    10      55      65
    30      465     495

    That's a lot of disks. There is a point at which space and power costs overcome the human cost.
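
    For comparison, here is a small sketch of the disk counts using the formulas from the summary: N parity disks, N(N - 1)/2 data disks and N(N + 1)/2 spares. It assumes the "Drives" column above is N; the function name is hypothetical.

```python
def array_layout(n: int) -> dict:
    """Disk counts for a two-dimensional array with n parity disks,
    following the formulas quoted in the story summary."""
    parity = n
    data = n * (n - 1) // 2
    spares = n * (n + 1) // 2
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": parity + data + spares,
    }


print(f"{'N':>3} {'parity':>6} {'data':>5} {'spares':>6} {'total':>6}")
for n in (5, 10, 30):
    row = array_layout(n)
    print(f"{n:>3} {row['parity']:>6} {row['data']:>5} {row['spares']:>6} {row['total']:>6}")
```

    On this reading the total is N(N + 1) disks once the data disks are counted as well, so the growth is quadratic in N.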

  45. Absolute GARBAGE by Anonymous Coward · · Score: 0

    Run 2 drives each holding a copy of the same data. The probability of BOTH failing at the same time is clearly low, and easily calculated. Now when one fails, the other only has to operate perfectly for the period required to copy its contents to a new drive- a probability again easily calculated.
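
    A minimal sketch of that "easily calculated" probability, assuming exponentially distributed drive lifetimes. The 100,000 hour MTBF and the 24 hour copy window are illustrative assumptions, not figures from the comment or the paper.

```python
import math


def p_fail_within(hours: float, mtbf_hours: float) -> float:
    """Chance that one drive fails within `hours`, assuming an
    exponential lifetime with the given MTBF."""
    return 1.0 - math.exp(-hours / mtbf_hours)


MTBF_HOURS = 100_000      # assumed per-drive MTBF
REBUILD_HOURS = 24        # assumed time to copy the survivor to a new drive

p_rebuild_loss = p_fail_within(REBUILD_HOURS, MTBF_HOURS)
print(f"chance the surviving mirror dies during the copy: {p_rebuild_loss:.4%}")
```

    This treats the two drives as independent, so it does not cover shared causes such as the same bad batch, PSU or fire.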

    So long as the failure mode isn't something like a fire or a catastrophic power surge killing both drives (which obviously should not be sharing the same local PSU anyway), the two drive setup gives extraordinary robustness against failure- and actually illustrates how direct mirroring is the ONLY sane form of better data protection.

    The reason such trivial logic is not applied is because it is TOO SIMPLE. People want to sell you complex solutions, and use pseudo-science to justify such solutions, for reasons of commerce. The REAL issue of the HDD is that people are encouraged to use it past a clear point of failure- and I mean when platter surface break-up and mechanical problems are generating increasing numbers of sector write-fails. In my TWO drive situation, if both drives have hidden faults on the same data that has gone un-noticed, clearly that data is lost. The trick is KNOWING that a HDD has terminally failed long before most industry 'tests' would suggest this to be the case.

    And the problem with DORMANT data is the biggest issue. If you haven't checked a file in a time, how do you know it is still 'good'? Of course you can simply change mirror-2 to mirror-n (which as any person versed in back-up theory will tell you is like having n full back-ups made across n days, where n represents an appropriate level of paranoia).

    Most data is lost because there is NO BACKUP at all. Then most backed-up data is lost because of a system failure, like all the back-ups burning in the same fire that destroys the main computer, or all the back-ups being malformed (but not noticed, because up to that point, no backup was used for data restoration).

    Long term archival 'back-ups' are a different issue suffering from various forms of degradation- usually cost and care related (look at old Hollywood films that still have perfect copies today, and films from the 80s that only have dreadful grainy copies available).

    Automated data-loss prevention systems tend to be jokes because no-one wants to pay 3+ times the cost of storage. If someone will pay for data replication costs (and no data is easier to replicate than data on a HDD), then mirror methods make the likelihood of data loss as good as zero.

    Of course, in practise, VERY real-time replication means things like horrible RAID controller chips, and these don't work well with modern OSes and HDDs. Less real-time replication, as seen in Google's de-facto cloud, for instance, is probably as good as a non-fuss automated system gets. But if a company has key files being updated all the time in real-time, NOTHING can prevent localised data loss situations.

  46. So... by guruevi · · Score: 1

    They 'invented' RAIDZ3? Or they are perhaps using ZFS or something similar internally and not telling anyone (like so many in the industry). Sure you can achieve very high reliability using ZFS but most systems maintain those 9's by a) having hot spares and b) replacing disks that failed in a timely manner. They are simply adding more hot spares so a service call is less important, you can just go by and replace 5 disks at a time whenever you need to expand your storage.

    They also forgot to mention that once disks start failing, you could easily have a whole set of them fail. Especially with firmware issues or if someone dropped an entire box in shipping. Once you drop below 2 hotspares/10 disks, you are in serious risk of degrading your system because disks could fail while rebuilding as well.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  47. Why we have RAID and Spares by Virtucon · · Score: 1

    Calculating the system MTBF of 77 drives at 100,000 hours each as a subsystem, we'd expect to have a drive failure approximately every 1300 hours. That's not the reality of most observations/environments, but it's enough to justify having at least a couple of spares on hand, and it's why we have things like RAID 6 and ZFS. It also doesn't necessitate having tons of spares onsite.
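
    The 1300 hour figure follows from dividing the per-drive MTBF by the number of drives, which assumes independent drives with a constant failure rate. A short sketch (function name hypothetical):

```python
def subsystem_mtbf(drive_mtbf_hours: float, drive_count: int) -> float:
    """Expected time between failures somewhere in a group of identical,
    independent drives: failure rates add, so the MTBF divides."""
    return drive_mtbf_hours / drive_count


hours = subsystem_mtbf(100_000, 77)
print(f"one drive failure roughly every {hours:.0f} hours ({hours / 24:.0f} days)")
```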

    --
    Harrison's Postulate - "For every action there is an equal and opposite criticism"
  48. XIOTech does this already by Anonymous Coward · · Score: 0

    Their spinning disk SANs don't use individual drives but a Datapac with many drives inside. The array can take down individual disks inside and remanufacture them in-situ by doing low-level formatting per drive or down to per platter and platter side if needed. The only time you need to disk swap is when you've suffered enough internal issues that they can't be corrected for with spares or reconditioning.

    The downside is you're now replacing an entire pac of something like 15 drives and if you're pac bound through VMware RDMs, then you're looking at downtime to relocate the data and detach the connections. Other than that, it's actually a really neat system.

  49. Re:TLDR; 2D arrays wit a ton of spares are reliabl by BronsCon · · Score: 1

    That's why, as the manufacturer of such a system, you refuse to sell it bare. Your customers won't complain if you tell them what the bare cost, cost per disk, and labor cost to install a disk are, and sell disks at cost and with reasonable labor. Make money on your hardware, bring in enough to pay for assembly based on disk install labor.

    That's only step one, though. Start ordering disks when you start your first production run of hardware. Order direct from manufacturer, and from as many suppliers as possible, so you get disks from as many batches as possible. Then, continue placing frequent, but small, orders from whoever can get you the disks the cheapest; it may work out that you can get volume pricing from the manufacturer by telling them "I'm going to need X disks overall and am willing to pay for them up front, but I need them shipped (X/52) per week from current stock at the time of shipping, don't set aside my disks out of the current batch to ship at a future date".

    It's a bit more labor, but compare serial numbers and attempt to color code by batch. Use colored dot stickers for this. When fetching drives for an installation, try and get an even distribution of colors, so you don't have an excess of drives from any given batch, and always record who has which drives, so if you start getting failure reports that indicate a bad batch, you can proactively alert the customers who have those drives that it might be a good idea to have you swap them even if they still appear to be functioning.

    All of that drives up the cost, of course. I'm not going to sit here and do the math to figure out what the cost would be, as there are simply too many assumptions and I have too little time, but if you've nothing better to do and don't mind making a couple dozen, likely provably wrong, assumptions, you can have at it.

    --
    APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
  50. A solution in search of a problem. by sirwired · · Score: 1

    If service call costs for one or two disks are prohibitive, simply put in enough spares so you only have to roll a tech for, say, 10 drives.

    Alternatively, make them user-swappable. If all the customer has to do is ask their tech to yank drives with a Blinky Amber Light of Doom, even the most untrained monkey could figure that out.

    1. Re:A solution in search of a problem. by Anonymous Coward · · Score: 0

      The most untrained monkey might be able to figure it out, but I don't really think the same could be said for most American tech workers.

  51. Good Luck With That (tm)... by Anonymous Coward · · Score: 0

    Everything works great UNTIL:
    A. You discover that the drives have a firmware bug that causes silent data corruption.
    B. You discover that the drives have a firmware bug that causes them to drop out of the array.
    C. You end up with one or more drives that fail in a "unique" way that hangs the bus they're on, making multiple other drives drop out too.
    D. You get a bad batch of drives, since you bought them all at the same time from the same supplier instead of adding more over time to increase capacity and/or replacing the failed ones over time with new ones from very different batches.
    E. Realize that controllers, fans, and sometimes even cables and backplanes can die over time, especially in certain countries where air pollution (like sulfur) is a problem.
    F. Discover that tin whiskers grow on some lead-free component connections and fry them over time.
    G. Your datacenter cooling fails due to a breaker failing in the control room that causes the controllers to lose power (despite your multiple redundancy EVERYWHERE else) and your drives get heat stressed and fail at astronomically higher than "normal" rates (true story, actually most of these are).
    H. During backup generator testing, someone screws up and your array loses power during heavy activity, and you get a surge when it comes back on.
    I. Natural disaster. 'nough said.

    Need I continue? 99.8% is realistic, 99.99% is doable with a lot of extra effort and expense. 99.999% is usually total, absolute and utter BS unless you have a fully separate datacenter in another region to fail over to or you're just lucky (which is nothing even resembling a "guarantee"), with synchronous data replication. That's less than 5.3 MINUTES PER YEAR of downtime, or a total over 4 years of about 21 minutes. Usually, one single unplanned service event and you've already blown it.

    That's before you waste all the capital, power, and cooling to fully expand the array on day 1 rather than thin provision and add over time and replace failed disks as you go (also keeping in mind that storage acquisition costs go DOWN over time).
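
    The downtime budget behind each availability level can be worked out directly. A minimal sketch, assuming a 365.25 day year (the function name is hypothetical):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumed average year length


def downtime_minutes(availability: float, years: float = 1.0) -> float:
    """Allowed downtime in minutes for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR * years


for level in (0.998, 0.9999, 0.99999):
    per_year = downtime_minutes(level)
    four_years = downtime_minutes(level, years=4)
    print(f"{level:.3%}: {per_year:8.1f} min/year, {four_years:8.1f} min over 4 years")
```

    Under that assumption five nines allows roughly 5.3 minutes a year, while 99.8% allows about 17.5 hours.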

    1. Re:Good Luck With That (tm)... by kmoser · · Score: 1

      Just don't drop the whole thing on a concrete floor, otherwise every platter will fail immediately.

  52. K.I.S. by Anonymous Coward · · Score: 0

    Did they think a formula for the no. of spares would impress?
    The formula above reduces to: spares required = 100% of the parity and data disks combined, i.e. 50% of all disks.

  53. Enterprise drives have stronger magnets! by mveloso · · Score: 1

    They have stronger magnets because they need to write that data more harder than normal drives.

  54. Or... by AlchemyX · · Score: 1

    You could use ZFS with RAIDZ3 and multiple spares.

  55. Another fantasy by Anonymous Coward · · Score: 0

    Those who believe in this type of risk assessment should read the Rasmussen (no relation to the pollster) report on the risks associated with commercial nuclear reactors. In real life as opposed to just the product of probabilities a single event can cause a chain of "improbable" failures. For example a short in a cable tray can wipe out a whole data array.

  56. Nice Pun dad! by MacColossus · · Score: 1

    Disk Array, "Sans" maintenance.

  57. By the book, Admiral by AkkarAnadyr · · Score: 1

    Spock, if that array isn't rebuilt in two hours, get that rack out of there and back to a Service Bay.

    --

    I bought this house and you know I'm boss
    Ain't no h'aint gonna run me off

  58. Predictable failure. by leuk_he · · Score: 1

    It also assumes a normal failure of drives. However, modern drives do not always fail normally. They develop slow spots and timeouts from which they might recover.

    Also the software to create the redundancy might fail, or it might fail if you do not update the firmware.

    And I am not even talking about catastrophic failure. When a drive overheats you might want to remove it from the datacenter.

    1. Re:Predictable failure. by stoatwblr · · Score: 1

      "However, modern drives do not always fail normally. They develop slow spots and timeouts from which they might recover."

      Enterprise firmware marks the spots bad after 7 seconds and carries on. The assumption is that redundancy will cover the loss.

      Consumer drives spend quite a while trying to recover data, on the basis that there's no redundancy, so it's worth a 5 minute hang to try and get the data.

    2. Re:Predictable failure. by leuk_he · · Score: 1

      You are referring to losing a sector on the platter. That is exactly what the study assumes: lose a sector (detect that and do something with that) or lose the disk. But there might be many more failure modes.
      -power fluctuation.
      -memory problems
      -Software problems. (Ever seen the SAN in a big company having problems... yup, some configuration issue)
      -driver/interface problems.

    3. Re:Predictable failure. by stoatwblr · · Score: 1

      "-power fluctuation."

      Redundant power supplies.

      "-memory problems"

      ECC ram, etc - but that's outside the scope of the disk array design anyway

      "-Software problems."

      Outside the scope of the design

      "-driver/interface problems."

      Ditto

      They need to be taken into consideration for an end-to-end solution, but the approach in the study was strictly to the array level.

      If you design in appropriate redundancy then loss of any component is a routine replacement issue, not an emergency.

  59. Cheaper solution? by Zorpheus · · Score: 1

    How about having like 10 additional spare discs in your rack, and calling the service for replacement when 10 discs died? The cost of the service call does not matter much when it is for many discs at once.

  60. Really? A Paper for this? Saw it in 2000 by addikt10 · · Score: 1

    Some company was doing this in the Bay area in 2000.
    Hotplug is expensive. Cases are expensive. Making room for human access is expensive.
    Design for nothing but airflow and drive density, keeping pieces as absolutely cheap as possible. Gigabit instead of 10G.
    At exabyte scale, why do you care about the loss of 4TB? Using Super Micro boxes w/4TB Drives, you can have over 6 petabytes of raw storage in a 72u rack / cabinet

    Metadata servers keep track of where the copies of blocks are.
    Put copies of the blocks on completely disparate systems. If there is heavy read usage of a block, make more copies.
    Head servers scale and have some beef to them. They are all about getting info from the commodity stuff and packaging it for (subscribers, clients, whatever).

    If a drive dies or has issues - mark it bad and leave it at that. Ignore it.
    If a server dies, mark it as bad. Leave it.
    In 4 years you are forklifting the equipment and replacing it with new storage.

    There is no "RAID", other than there are multiple copies of blocks throughout the system.

    I met with a company in the bay area doing this in 2000 (I don't remember which one). It was dealing with Filesystems and not block, but with NFS, VMDKs, VHD, etc, who cares. I don't see anything new here at all.

  61. Re:Really? A Paper for this? Saw it in 2000 by addikt10 · · Score: 1

    I used the wrong Supermicro box to make my point - I selected the pure storage, vs server with storage.
    So 72 drives instead of 90 per 4U. About 5.2 PB per 72U instead of "over 6".
    The rest of my points stand.
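
    The rack capacity arithmetic can be sketched as follows, assuming 4U chassis, 4TB drives, a fully populated 72U rack and decimal units (1 PB = 1000 TB). The function name and defaults are illustrative.

```python
def rack_capacity_pb(drives_per_chassis: int,
                     drive_tb: float = 4.0,
                     rack_units: int = 72,
                     chassis_units: int = 4) -> float:
    """Raw capacity of a rack filled with identical storage chassis, in PB."""
    chassis_count = rack_units // chassis_units
    return chassis_count * drives_per_chassis * drive_tb / 1000.0


for drives in (90, 72):
    print(f"{drives} drives per 4U: {rack_capacity_pb(drives):.2f} PB raw per 72U")
```

    With those assumptions the 90-drive chassis gives about 6.5 PB and the 72-drive chassis about 5.2 PB of raw capacity.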

  62. NetApp or EMC? by Lost+Penguin · · Score: 1

    I'll believe others when I see the uptime....

    --
    I am the unwilling control for my Origin.
  63. Until the disk drives fail en masse by Antique+Geekmeister · · Score: 1

    This has happened repeatedly. The most notorious example is the "IBM Deskstar", which failed en masse after consistent amounts of use. They destroyed RAID arrays around the world because the individual drives could not be replaced fast enough to secure the data before multiple drives went offline simultaneously.

  64. Wrong N by Chirs · · Score: 1

    They have N parity disks, and then roughly N(N-1)/2 data disks and roughly the same number of spares.

    In larger arrays the overall overhead of the parity and spare disks is slightly over 50%, or roughly equivalent to RAID-1, but more reliable since the spares can be reassigned as needed.
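
    A quick check of the overhead fraction, using the exact counts from the summary (N parity, N(N - 1)/2 data, N(N + 1)/2 spares) instead of the rounded ones. The function name is hypothetical.

```python
def overhead_fraction(n: int) -> float:
    """Share of all disks that hold no user data (parity plus spares)."""
    parity = n
    data = n * (n - 1) // 2
    spares = n * (n + 1) // 2
    total = parity + data + spares
    return (parity + spares) / total


for n in (5, 10, 30, 100):
    print(f"N={n:>3}: overhead {overhead_fraction(n):.1%}")
```

    The fraction works out to (N + 3) / (2(N + 1)), which approaches 50% from above as N grows.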

  65. checksummed filesystems by Chirs · · Score: 1

    The solution for this is checksums and parity on the disk contents at the filesystem level. Read a block off the disk and check the stored checksum against what you read...if it doesn't match then use the parity information to correct the data and store it somewhere else.
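
    A toy sketch of that read path, assuming a CRC32 checksum per block and a single XOR parity block per stripe. The `Stripe` class and its layout are hypothetical and kept in memory for illustration; a real filesystem stores checksums and parity on disk and would write the repaired block to a fresh location.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """Data blocks plus one parity block and a stored checksum per block."""

    def __init__(self, data_blocks):
        self.blocks = [bytes(b) for b in data_blocks]
        self.checksums = [zlib.crc32(b) for b in self.blocks]
        self.parity = xor_blocks(self.blocks)

    def read(self, index):
        block = self.blocks[index]
        if zlib.crc32(block) == self.checksums[index]:
            return block  # checksum matches, data is good
        # Mismatch: rebuild the block from the other blocks and the parity.
        others = [b for i, b in enumerate(self.blocks) if i != index]
        repaired = xor_blocks(others + [self.parity])
        if zlib.crc32(repaired) != self.checksums[index]:
            raise IOError("block is unrecoverable with single parity")
        self.blocks[index] = repaired  # toy stand-in for rewriting elsewhere
        return repaired


stripe = Stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe.blocks[1] = b"BxBB"          # simulate silent corruption on one disk
print(stripe.read(1))               # repaired from parity
```

    Single parity can only repair one bad block per stripe; two corrupted blocks in the same stripe make the read fail in this sketch.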

  66. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Anonymous Coward · · Score: 0

    They do exactly that, with a replacement to be scheduled "some time in the next 4 years".

  67. Re:TLDR; 2D arrays wit a ton of spares are reliabl by stoatwblr · · Score: 1

    If you run raid 66 (a raid 6 array of raid 6 arrays) then you get that much more protection.

    Not that raid6 is anywhere near good enough since 2TB drives came along. There's around a 10% chance that you'll lose your remaining spare during a parity rebuild from a drive loss on a 12+2 disk array and a 1% chance that you'll lose another drive recovering from that (I've seen it happen).

    This is one of the reasons for considering ZFS raidZ3. One of the other reasons is that because it uses SSD buffering and caching, drive seek activity is smoothed out and heavy head seek is one of the prime life shorteners in mechanical hard drives (I've had identical array hardware using the same batches of drives and the ones which get hit hardest for random IO are the ones where drives fail more often.)