Slashdot Mirror


Proposed Disk Array With 99.999% Availability For 4 Years, Sans Maintenance

Thorfinn.au writes with this paper from four researchers (Jehan-François Pâris, Ahmed Amer, Darrell D. E. Long, and Thomas Schwarz, S. J.), with an interesting approach to long-term, fault-tolerant storage: As the prices of magnetic storage continue to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behaviour of two-dimensional disk arrays with N parity disks and N(N – 1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years. We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures.
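The disk counts in the summary follow directly from N. The sketch below only evaluates those three formulas; the function name `array_layout` and the choice of N = 11 are illustrative assumptions, not taken from the paper.

```python
def array_layout(n: int) -> dict:
    """Disk counts for a two-dimensional array as described in the summary.

    n parity disks, n(n - 1)/2 data disks, and n(n + 1)/2 spares.
    """
    parity = n
    data = n * (n - 1) // 2
    spares = n * (n + 1) // 2
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": parity + data + spares,
    }


# N = 11 is an arbitrary example value.
layout = array_layout(11)
print(layout)
```

Since N(N - 1)/2 + N = N(N + 1)/2, the proposed spare count equals the number of data and parity disks combined, which is why several comments describe the scheme as roughly 100% spare capacity.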

191 of 258 comments

  1. Power Costs by bcoff12 · · Score: 2

    I don't see power mentioned in the paper.

    1. Re:Power Costs by advocate_one · · Score: 2

      with any sense it would include its own UPS to allow it to successfully write out to the discs all the pending writes and then spin down...

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    2. Re:Power Costs by Drethon · · Score: 2

      How about a setup that detects when one more drive failure will cause the raid array to fail and spins up a new unused drive to be ready for that failure?

      --> Not a raid expert...

    3. Re:Power Costs by Enry · · Score: 1

      IIRC XFS/SGIs had this built in that there was just enough juice to flush buffers to disk while everything was spinning down.

    4. Re:Power Costs by jandrese · · Score: 2

      The spares should be warm spares. Not spinning until the RAID controller detects a failure and replaces the failed drive. So they won't take any appreciable amount of power. The concern I have is space. That many idle drives eating up rack space is going to be expensive.

      --

      I read the internet for the articles.
    5. Re:Power Costs by jellomizer · · Score: 4, Insightful

      Much high-end equipment does have fairly large capacitors to allow enough power-off time to do a clean power off.
      I remember back in the 1990s some PC-centric folks were looking in a Sun workstation; they were surprised by all the large capacitors on the motherboard. In short, they give the system enough time to finish its final calculation before the power goes out.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    6. Re:Power Costs by Anonymous Coward · · Score: 1

      with any sense it would include its own UPS to allow it to successfully write out to the discs all the pending writes and then spin down...

      Though you make a good point, I think bcoff12 means the potential power consumption of such a large disk array. Over the lifetime proposed, that could be significant enough to offset the benefit of high availability over other solutions plus regular backup.

    7. Re:Power Costs by Barny · · Score: 3, Insightful

      "More work is still needed to define policies that would allow array users and manufacturers to detect unusually disk failure rates and take the appropriate actions before any data loss takes place." - Last line in the conclusion.

      This implies that not all the spare drives are active and ready to go all the time and that some/most would be kept powered down as cold spares. Of course this same guy is likely to get another paper done where he examines the cost to run the array and how many drives could be left cold and still achieve the 5-9s reliability. Heck, if the software managing the drives is smart, it would rotate active/spare drives in and out, working them in quickly to get them all past the 'first 18 months high failure' rate to the sweet spot, then swap in and out over the lifespan of the array to enable the array to be at highest reliability for longer.

      Hrmm, maybe I should look at building such an algorithm, a quick google search doesn't turn any such systems up.

      --
      ...
      /me sighs
    8. Re:Power Costs by TWX · · Score: 2

      For colocated space, yes.

      For an organization like the one I work for, with server room space to spare, it wouldn't be too bad. We could probably triple our rackspace dedicated to disk and still have room to spare, and we have the HVAC to match. That's kind of what happens when equipment gets more condensed and virtualization enters the fray. Can't virtualize a storage array obviously, but can replace the space that application servers took with storage as the space is freed up.

      --
      Do not look into laser with remaining eye.
    9. Re:Power Costs by NatasRevol · · Score: 1

      I have yet to meet a small business that would be happy to pay for what is essentially raid10+1 (N(N+1)/2).

      --
      There are two types of people in the world: Those who crave closure
    10. Re:Power Costs by Anonymous Coward · · Score: 1

      You could include standby spares which do not receive power until needed. Broken disks can be powered down.

    11. Re:Power Costs by LWATCDR · · Score: 1

      Or how about having the array swap in spares.
      Every few weeks or so one of the spares could start to act as a mirror of an active drive and once that drive is mirrored you swap the active drive to the spare and the spare to the active?

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    12. Re:Power Costs by silas_moeckel · · Score: 1

      Well, since they are not supposed to need to be hot swap, you can get 12+ drives into a 1ru chassis with redundant power and a fairly beefy server. That is 3x the density of the traditional 4 up front 1ru. Expanding to 2ru gives 12 hot swap 3.5's or 24 2.5's, so non hot swap is still 2x the density in 3.5's. Potentially even higher with 2.5's, though the highest I find is 88 hot swaps in a 4ru, or 22 per ru, coupled with a rather beefy server.

      --
      No sir I dont like it.
    13. Re:Power Costs by Immerman · · Score: 1

      How do you figure? I mean sure, presumably the spares would be inactive until a replacement was needed, to save both power and wear and tear, but how do you figure that that is an implication of needing to detect anomalous failure rates to avoid data loss? No matter what strategy you're using, if you've got N-nines projected reliability over Y years assuming normal failure rates, then if you're suffering from anomalously high failure rates you're going to need to replace some drives early to maintain the same reliability for the full Y years.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    14. Re:Power Costs by DigiShaman · · Score: 1

      You can virtualize and abstract out to your heart's content. What TWX said was simple; at some point at the end of the day, all that data has to be stored on physical media. That takes physical rack space.

      --
      Life is not for the lazy.
    15. Re:Power Costs by rickb928 · · Score: 2

      Sometimes the data is worth more than the power costs.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    16. Re:Power Costs by mlts · · Score: 1

      Cooling costs come to mind as well. SSDs are one thing, as they can be powered off and not used. However, HDDs have to be either spinning (which creates a lot of heat, especially at 10k+ RPMs that enterprise disks spin at), or spun up/down, and spinning enterprise disks up and down isn't good for them, and might even cause array faults unless the array firmware is designed to deal with it.

      There is also expense. If I have five hard disks worth of data, I need (5*4)/2, or ten HDDs by the OP's metrics. However, I've had batches of hard drives all fail at once. If I get multiple failures, even RAID 6 isn't going to help. If HDDs popped at random times, I might be OK, but not in this case.

      Of course, I've ranted about this before... RAID is solid for protecting data against disk failure... but that is just one of -many- failure scenarios. I have seen disk controllers fail and write garbage to the entire array. One goober doing an rm or a dd command will toss the array. If you want serious backups, you need to not just focus on disk. Tape isn't perfect, but done right, after the initial cost of the drive, the cartridges are inexpensive, take zero watts (other than climate control), last decades, have innate encryption (LTO-4 and newer), and can have hardware write protect enabled, as well as WORM media. This is great for people with the "keep it forever" mindset. Just set a password [1], stream the data off to a pile of WORM tapes, and stuff those in a closet somewhere. If the tapes vanish, since they were encrypted, and assuming only a few people have the password, it can be written off as "just" a hardware loss.

      [1]: It is boneheadedly easy to set encryption on LTO media via SPIN/SPOUT, so might as well set something, even if it is a variant of "correct horse battery staple". Ideally, the password should change every year or so... but just setting -something- is better than nothing.

    17. Re:Power Costs by rickb928 · · Score: 1

      (you can't virtualize the actual disks)

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    18. Re:Power Costs by K.+S.+Kyosuke · · Score: 1

      Either that, or the usual economies of scale would apply (clever block allocation and low-power large cache electronics to increase performance while decreasing energy costs per data transaction).

      --
      Ezekiel 23:20
    19. Re:Power Costs by rickb928 · · Score: 1

      It seems that one assumption in the study is predictable or consistent failure rates or timing. This would make sense if the drives were all the same make/model/manufacturing dates, but if not, well, then the model changes and they would be needing more intelligence to deal with unpredictable failure rates and having to spin up cold spares at different rates, predicting failure.

      Which all makes a world of sense to me. When I hovered over Raid 5 arrays with cold spares, especially in NetWare servers where 'device deactivated due to non-media defect' errors were not uncommon, I would add spares to save on windshield time to swap them out. Not all customers were comfortable going to the supply locker, grabbing a drive tray, and swapping out the tray with the flashing red light.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    20. Re:Power Costs by Anonymous Coward · · Score: 3, Insightful

      The question posed is whether the human intervention (labor charge) saved is worth more than the power costs.

    21. Re:Power Costs by ShanghaiBill · · Score: 2

      Sometimes the data is worth more than the power costs.

      But is the extra power cost more than the alternative extra maintenance cost?

      A 3.5" HDD consumes about 8w of power. TFA assumes a 4 year lifetime. (4 * 365 * 24) = 35k hours. (35k x 8w / 1000) = 280 kwHr. A typical retail price for electricity is 10 cents/kwHr, so over its lifetime a typical HDD will use about $28 of power. Big data centers likely pay less for power, so let's say $20.

      Now, what does it cost to swap it? Let's say the chance of failure is 20%, it takes ten minutes, and you pay the admin $30/hour (I just made up all these numbers). ($30/hour * 1/6 hour * 0.2 failures) = $1.

      So unless I made a mistake in either my math or my assumptions, it looks like swapping is still a win, unless the number of additional disks is less than 5%.
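The arithmetic in this comment can be reproduced with a few lines. This is only a sketch of the commenter's own estimate: the 8 W, 4 years, 10 cents/kWh, 20% failure chance, ten-minute swap and $30/hour figures all come from the comment (which itself calls them made up), while the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def lifetime_power_cost(watts, years, dollars_per_kwh):
    """Electricity cost of keeping one drive powered for its whole lifetime."""
    kwh = watts * years * HOURS_PER_YEAR / 1000
    return kwh * dollars_per_kwh


def expected_swap_cost(failure_prob, hours_per_swap, wage_per_hour):
    """Expected labor cost of swapping one drive, weighted by failure chance."""
    return failure_prob * hours_per_swap * wage_per_hour


power = lifetime_power_cost(watts=8, years=4, dollars_per_kwh=0.10)
swap = expected_swap_cost(failure_prob=0.20, hours_per_swap=1 / 6, wage_per_hour=30)

print(f"lifetime power per drive: ${power:.2f}")
print(f"expected swap labor per drive: ${swap:.2f}")
```

With the discounted $20 power figure, the expected $1 of swap labor is 1/20 of the cost of powering a spare, which is presumably where the 5% break-even comes from.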

    22. Re:Power Costs by ShanghaiBill · · Score: 1

      It seems that one assumption in the study is predictable or consistent failure rates or timing.

      That would be a very bad assumption. Backblaze looked at 100,000+ drives and found that some models were more than 30 times as likely to fail as others (Hitachi was most reliable, Seagate was worst, for the models they reported). They also found that consumer drives were slightly more reliable than enterprise drives, despite costing half as much.

    23. Re:Power Costs by Sloppy · · Score: 4, Insightful

      Sloppy calculation tip: 24*365 = 10000.

      If you're Sloppy enough to accept that premise, then at 10 cents/KWHr, a Watt costs a dollar per year. It makes your $28 turn into $32, but hey, close enough. When I'm shopping, I can add up lifetime energy costs really fast, without actually being smart. Nobody ever catches on!

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
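For comparison, here is the "Sloppy" shortcut next to the exact figure. This is a sketch using the 10 cents/kWh price and the 8 W, 4 year drive from the thread; the function and variable names are made up for illustration.

```python
HOURS_PER_YEAR_EXACT = 24 * 365   # 8760
HOURS_PER_YEAR_SLOPPY = 10_000    # the deliberately rounded premise


def dollars_per_watt_year(hours_per_year, dollars_per_kwh=0.10):
    """Cost of drawing one watt continuously for one year."""
    return hours_per_year / 1000 * dollars_per_kwh


exact = dollars_per_watt_year(HOURS_PER_YEAR_EXACT)
sloppy = dollars_per_watt_year(HOURS_PER_YEAR_SLOPPY)

# An 8 W drive over 4 years, as in the earlier estimate.
drive_exact = 8 * 4 * exact
drive_sloppy = 8 * 4 * sloppy

print(f"per watt-year: exact ${exact:.3f}, sloppy ${sloppy:.2f}")
print(f"8 W for 4 years: exact ${drive_exact:.2f}, sloppy ${drive_sloppy:.2f}")
```

The shortcut overestimates by roughly 14%, which is the gap between $28 and $32.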
    24. Re:Power Costs by Sloppy · · Score: 5, Funny

      This is how we're going to bring our keepers to their knees, and eventually break out of the Matrix. We spend imaginary money on imaginary storage and then put all sorts of high-entropy stuff on it and run calculations to verify that it's really working, but they have to spend actually real resources, to emulate it.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    25. Re:Power Costs by viperidaenz · · Score: 1

      An idle drive takes up around 5W.
      That's 43kWh per year. That's less than $10. Over 4 years a drive uses less power than the cost of the drive.

    26. Re: Power Costs by rickb928 · · Score: 1

      And if you send a tech, not the local admin, all the numbers change.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    27. Re:Power Costs by TWX · · Score: 1

      But it still has to reside on physical disks, just like virtual servers still have to run on physical hardware. There are hundreds of cores in a high-end Cisco UCS installation, on dozens of blades. The UCS can optimize what goes where for the IT group, but in the end that's all about density and the actual relationship between physical cores and virtual cores allocated to VMs is probably not as leveraged as you seem to think it is, especially for high-load usage.

      There still has to be disks, there still has to be RAM, there still has to be processors, there still has to be I/O.

      --
      Do not look into laser with remaining eye.
    28. Re:Power Costs by JWW · · Score: 1

      Yep. And this costs way less than bringing in and swapping out a part.

      I don't see any real reason not to just spin the spare drives.

    29. Re:Power Costs by dgatwood · · Score: 1

      In a curiously ironic twist, the hardware designed to protect consumer-grade disks from damage ends up destroying them. As I understand it, a number of fairly recent consumer drives exhibit a higher than normal failure rate because the heads break off of the arms when they collide with the park ramp. This is, at least in part, a consequence of making the arms smaller and lighter to improve seek times.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    30. Re:Power Costs by GuB-42 · · Score: 2

      You may want to try ZFS (raidz3 mode for 3 parity disks). It has several advantages over mdadm; in particular, it eliminates the "write hole" problem. I went from an mdadm/ext4 array to RAID-Z and I don't regret it.
      And note that RAID isn't a backup solution. Even with 100% fault tolerance, there are plenty of things RAID won't protect you from, such as fire, power surges, theft, bugs, viruses, user error, etc. For this you need a reasonable backup plan. And IMHO, that third parity disk would be much more useful as an external backup drive for your sensitive data.
      And one final piece of advice: in RAID arrays that are not RAID-0, avoid buying all the same disks all at once. Disks from the same series, subjected to the same workload, have a higher chance of all failing at the same time.

    31. Re:Power Costs by PRMan · · Score: 1

      I've never seen an IT project at a medium to large company take less than 4 hours. Because in addition to changing the drive (1 hour max), you have to write up paperwork and track it (3 hours of organizational time).

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    32. Re:Power Costs by PRMan · · Score: 1

      Also, drives are prone to "bad batches". It's easy to get a case of drives where 50% are bad. And then follow that up with 10 cases with 0 or 1 bad drives.

      It doesn't matter how many extra drives you have if they all came from the same bad batch.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    33. Re:Power Costs by Anonymous Coward · · Score: 2, Insightful

      Get a real SAN or a better maintenance contract.

      I manage various SAN/NAS totaling about 5000 disks in different parts of the world.
      3:00 AM - Email that a disk failed, followed a few seconds later by an email that a hot spare kicked in
      3:30 AM - Email from our vendor that a disk failed and they are sending a replacement, reply if I would like someone on site to replace that drive or if we will do it ourselves
      ~3:45 AM - Email that the RG/Pool is being rebuilt
      ~11:00 AM - A tech in that office gets a drive delivered to their desk, they walk into the server room, replace it and put the failed one in the box, put the included label on the box and take it to their mail room.
      ~11:45 AM - Email that the pool/rg has been rebuilt and that the hot spare has been returned to a hot spare

    34. Re:Power Costs by PRMan · · Score: 1

      The problem with Hitachi drives is that the performance is VERY uneven. I would buy WD instead.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    35. Re:Power Costs by ShanghaiBill · · Score: 1

      Also, drives are prone to "bad batches".

      Backblaze buys and installs ~50 drives per day. So the batches would even out.

    36. Re:Power Costs by dnavid · · Score: 1

      Now, what does it cost to swap it? Let's say the chance of failure is 20%, it takes ten minutes, and you pay the admin $30/hour (I just made up all these numbers). ($30/hour * 1/6 hour * 0.2 failures) = $1.

      So unless I made a mistake in either my math or my assumptions, it looks like swapping is still a win, unless the number of additional disks is less than 5%.

      Let me know when you can find a way to dispatch a tech to swap a hard drive in a tier 1 datacenter for a buck.

    37. Re:Power Costs by ShanghaiBill · · Score: 1

      The problem with Hitachi drives is that the performance is VERY uneven.

      Could you provide a citation for that? If your opinion is anecdotal, then how many drives is it based on?

    38. Re:Power Costs by ShanghaiBill · · Score: 1

      Let me know when you can find a way to dispatch a tech to swap a hard drive in a tier 1 datacenter for a buck.

      It doesn't cost a buck. It costs 5 bucks, but has a 20% chance of occurring in the 4 year lifetime of each HDD. Also, you would not "dispatch a tech". Instead you would send out a tech with a cart of, say, 50 HDDs. The tech would walk down the aisles, pulling and inserting disks. That would be his full time job. If he could do 50 in an 8 hour shift, and is paid $30/hour, that is about $5/disk.

    39. Re:Power Costs by dnavid · · Score: 1

      Let me know when you can find a way to dispatch a tech to swap a hard drive in a tier 1 datacenter for a buck.

      It doesn't cost a buck. It costs 5 bucks, but has a 20% chance of occurring in the 4 year lifetime of each HDD. Also, you would not "dispatch a tech". Instead you would send out a tech with a cart of, say, 50 HDDs. The tech would walk down the aisles, pulling and inserting disks. That would be his full time job. If he could do 50 in an 8 hour shift, and is paid $30/hour, that is about $5/disk.

      No, you would not. The problem with this is that the whole point of the paper was to analyze ways to improve the reliability of disk arrays. You can do what you're describing if there was no specific timeframe in which hard drives need to be replaced: you just replace them whenever you get around to it, rather than soon after they fail. But that only works in environments where actual disk reliability is not important. In environments where actual array reliability is important, delaying the swapping of drives widens the window of vulnerability for an array, even one with hot spares, because of the need to survive the rare cases of multiple drives failing in a short span. That isn't likely, but when you're dealing with five nines of uptime requirement, those unlikely events have to be accounted for.

      I'm also trying to imagine implementing a system whereby a $30/hr tech just walks down aisles and pulls blinking drives and replaces them, and I'm thinking anyone who does that deserves the uptime they get.

    40. Re:Power Costs by painandgreed · · Score: 1

      Now, what does it cost to swap it? Let's say the chance of failure is 20%, it takes ten minutes, and you pay the admin $30/hour (I just made up all these numbers). ($30/hour * 1/6 hour * 0.2 failures) = $1.

      I don't know where you work at or what your processes are like that it only takes ten minutes to swap a drive. Where I work, it takes 10 minutes for the admin to tell that the drive has failed and determine what model it is for the replacement. Add in another 30 minutes to submit RFQs to three different vendors because his request for extra drives at implementation was denied. Once he gets a quote, it takes another 60 minutes of email and meetings with the guy that OKs budget requests before getting his boss involved and telling him that, yes, the department really does need these drives. Another 30 minutes over the next month checking on the backorder of the said drives till they finally ship. 90 minutes after seeing that the drives have arrived at the enterprise to go down to the loading dock, confirm that they have been delivered, get somebody to tell him who they have been delivered to, track down that wrong person and get them to find the drives which they have already misplaced, and hand them over. In that time is another 5 minutes to fill out the proper change control forms and submit them, another 15 minutes to explain change control request and answer questions at weekly meeting to boss and coworkers, 10 minutes over the next three weekly meetings to explain he is still waiting on the drives to complete that change control. 60 minutes to explain change control request to the server farm department of the IT department and argue till they give their permission. Another 30 minutes to schedule a visit time with the server farm guardians for time to access the lights out center (where the lights are never really out, but they like to call it that). 20 minutes waiting for the guy to show up to let you into the server farm to swap the drive and find the server. 10 minutes to do the physical work of swapping the drive. 
20 minutes of checking on the drive swap to make sure that the drives have been swapped successfully and data is being replicated to it correctly. 15 more minutes in the weekly meeting to explain that the drive has been swapped and that the change control request is now closed. Which comes to more like six hours and forty-five minutes to swap a drive.
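A quick tally of the itemized durations in this comment, as a sketch: the step labels are paraphrases, and the "10 minutes over the next three weekly meetings" is counted as 10 minutes in total, which is an assumption about what the commenter meant.

```python
# Minutes per step, in the order the comment lists them.
steps_minutes = {
    "notice failure, identify model": 10,
    "submit RFQs to three vendors": 30,
    "budget emails and meetings": 60,
    "chase the backorder": 30,
    "find the delivered drives": 90,
    "fill out change control forms": 5,
    "explain change request at weekly meeting": 15,
    "status updates at three weekly meetings": 10,
    "get server farm department approval": 60,
    "schedule data center visit": 30,
    "wait to be let in": 20,
    "physically swap the drive": 10,
    "verify rebuild and replication": 20,
    "close out at weekly meeting": 15,
}

total_minutes = sum(steps_minutes.values())
hours, minutes = divmod(total_minutes, 60)
print(f"{total_minutes} minutes = {hours} h {minutes} min")
```

Only 10 of those minutes are the physical swap; the rest is process overhead.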

    41. Re:Power Costs by Tough+Love · · Score: 1

      It would be stupid to keep the spares running, that comes right off their life. Maybe just spin them up once a month. What I don't see mentioned is the falling cost of drives... failed drives are normally replaced by newer, higher capacity drives, or they should be. IOW, they should plug in spares over time with planned maintenance instead of dumbly overprovisioning those things permanently.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    42. Re:Power Costs by ShanghaiBill · · Score: 1

      you just replace them whenever you get around to it

      You don't have to wait till you have 50 dead drives to send out the guy with 50 replacements. A datacenter with a million drives, with a 20% failure rate over 4 years, would have roughly 137 dead disks per day, or a little under 50 in an eight-hour shift. So the worker with the cart would move through the aisles, replacing the closest dead drive as they die. This would likely be faster than specifically dispatching someone to a particular dead drive, since the worker would already be in the vicinity.

      I'm also trying to imagine implementing a system whereby a $30/hr tech just walks down aisles and pulls blinking drives and replaces them

      How do you think big datacenters work? What is wrong with "pull and replace"? The rebuild should happen automatically when the good drive is inserted.
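A back-of-the-envelope check of the fleet failure rate, assuming failures are spread evenly over the four years and around the clock (a simplification, since real failures cluster). The million drives and 20% figure are from the comment; the function name is hypothetical.

```python
def failures_per_day(drives, failure_fraction, years):
    """Average daily failures if they are spread evenly over the period."""
    return drives * failure_fraction / (years * 365)


per_day = failures_per_day(drives=1_000_000, failure_fraction=0.20, years=4)
per_shift = per_day / 3  # three eight-hour shifts per day

print(f"about {per_day:.0f} failures per day, {per_shift:.0f} per eight-hour shift")
```

At that rate one tech handling about 50 swaps per shift would be kept continuously busy, which is the premise of the cart model.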

    43. Re:Power Costs by Barny · · Score: 1

      Did you see the numbers on some of those seagate batches?

      Now, I admit that 300 or so drives isn't really much of a sample when some of their batches are in the tens of thousands, but if you get 300 drives and almost 70 of them die within a year, would you keep buying drives?

      --
      ...
      /me sighs
    44. Re:Power Costs by rtb61 · · Score: 1

      The real question is whether running down maintenance ability will sound just fine right up until the moment of catastrophic failure, when the ability to react to it has been totally compromised. This would result in hugely extended downtime in the event of that catastrophic failure, whatever its cause. It looks great on a spreadsheet and pumps up an executive's bonus, but the whole company ends up going boom when a catastrophic failure occurs, because customers will not tolerate extended downtime, and that downtime might not be hours but weeks or even months as they try to rebuild maintenance efforts so that their maintainers can rebuild the system.

      This kind of evaluation extends out to government: should governments pay the costs of maintaining manual systems, i.e. pencil and paper, because in the event of catastrophic failure, recovery is bound to their ability to sustain the essential elements of government whilst digital systems are rebuilt, and as it will be required to rebuild those systems? Corporate executives abandon these ideas because, of course, costs affect bonuses and golden parachutes in the event of failure.

      --
      Chaos - everything, everywhere, everywhen
    45. Re:Power Costs by mewrei · · Score: 1

      IBM's XIV storage platform does this. Has its own UPS and pushes data to disk from cache when the power goes out.

    46. Re:Power Costs by stoatwblr · · Score: 1

      Raid is old hat. Sufficiently advanced technology doesn't require that the disks be in the same enclosures or even in the same building.

      If you design around the concept that "drives fail, get over it" then the "grunt with a cart full of drives" model will work extremely well - and he doesn't need to do paperwork because the system has been set up to note serial numbers, locations and hours automatically as drives are removed and replaced.

      Any installation where a drive change is a big deal is either trivially small or incompetently run.

    47. Re:Power Costs by stoatwblr · · Score: 1

      No, it's a consequence of _having_ park ramps. They're a relatively recent development.

      I've seen WD drives with a few tens of hours on them and tens-of-thousands of head parks. That kind of thing is sheer stupidity and it's not much surprise the heads get damaged when they're shunted to the park ramp every couple of seconds.

    48. Re:Power Costs by stoatwblr · · Score: 1

      spare drives don't take any damage when cold.

      It's true that enterprise drives don't like being spun up/down, but the reality in most setups is that a spare is only spun up once - it's the start/stop cycles which drives object to.

      FWIW Idle enterprise drives tend to pull more like 10W than 5W

    49. Re:Power Costs by dgatwood · · Score: 1

      Yeah, but park ramps have been around for a couple of decades (the earliest patent filing I could find was filed in 1992), and they only started having insane levels of trouble fairly recently (by comparison). So it's probably the combination of excessive amounts of parking (as you mentioned) and having less structural support for the heads that makes them so problematic.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

  2. I would love to, but that server is a soup Nazi by jandrese · · Score: 4, Informative

    So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server". So I figured I'd just download the file with wget instead. Nope, 403 forbidden.

    Looks like fetch works though. If anybody else has trouble getting the file, try my local mirror.

    --

    I read the internet for the articles.
    1. Re:I would love to, but that server is a soup Nazi by Nutria · · Score: 1

      it says "can't use the plugin, it causes problems on our server".

      The name of the browser and plugin would be helpful...

      (The PDF happens to work perfectly on Linux with the built-in viewers of FF35 and Chromium 39.)

      --
      "I don't know, therefore Aliens" Wafflebox1
    2. Re:I would love to, but that server is a soup Nazi by whoever57 · · Score: 1

      So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server".

      Maybe they have problems with their disk array?

      But seriously, I had no problems downloading the document from the original site.

      --
      The real "Libtards" are the Libertarians!
    3. Re:I would love to, but that server is a soup Nazi by ArcadeMan · · Score: 1

      No problem viewing the PDF file in Safari on OS X.

    4. Re:I would love to, but that server is a soup Nazi by jandrese · · Score: 1

      This was on Windows with Firefox and the Adobe plugin. I don't have the built-in plugin because I like popping out PDFs and because the built-in viewer is slow as balls on nontrivial PDFs.

      --

      I read the internet for the articles.
  3. 4 years? by Enry · · Score: 2

    That's not long term. That's the normal life of a storage array. Long term is like 8-10 years.

    1. Re:4 years? by jandrese · · Score: 1

      They only had availability data for 4 years of drive life. This is largely a math study. I'm not familiar with any implementations of their 2D parity system, although it is outside of my area of expertise. Their assumption that the service calls would always be more expensive seemed a little suspect to me. Rack space isn't free and when you have basically 100% redundancy or more in spare drives you're going to eat up a lot of space. Putting 54 spare drives in a rack that already has 11 parity disks and only 55 primary disks just doesn't seem efficient. Is all of that space really cheaper than a single service call during the life of the machine to replace 20 failed drives all at once (when the rack drops below say 6 spares of the original 26--saving you half of the space the spares would have taken up)?

      I have also seen enough buggy RAID controllers in my day to make me very wary of that 2D raid arrangement in the paper.

      All in all this smells like a mathematician's solution to the problem, largely unbounded by real life concerns.

      --

      I read the internet for the articles.
    2. Re:4 years? by Enry · · Score: 1

      All in all this smells like a mathematician's solution to the problem, largely unbounded by real life concerns.

      I had the same thought. There are a few realities of storage that are missed here: storage use always increases, disks aren't the only things that fail, rack space isn't free, you usually have staff available already....

      This is an interesting idea if your storage is in a place where it can't be reached at all for some reason, but I think NASA and ESA have already done a good bit of research on that.

    3. Re:4 years? by stoatwblr · · Score: 1

      We run our arrays as long as we can. They tend to show a bathtub curve ramping up at the end of 6 years.

  4. 4 years??? by Anon-Admin · · Score: 1

    Really, 4 year life span and they are replaced?

    God I need to work for a company like that!

      I am so tired of dealing with these RS/6000 systems that were made back in 1994, and these intel systems made back in 2002.

    1. Re:4 years??? by ArcadeMan · · Score: 4, Funny

      I am so tired of dealing with these RS/6000 systems that were made back in 1994, and these intel systems made back in 2002.

      Yeah, we get it. You like to deal with cutting-edge stuff. Now get off my lawn.

      Sent from my Commodore 64.

    2. Re:4 years??? by operagost · · Score: 1

      Your C64 has video and keyboard I/O. Luxury! I would have responded earlier, but I was still keying my response into the front panel of my Altair. Now get off my lawn!

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    3. Re:4 years??? by ArcadeMan · · Score: 2

      Do you have any idea how many butterflies it took to reply to your message?

      Now get off my lawn!

    4. Re:4 years??? by rickb928 · · Score: 1

      4 years was my recommendation for disk replacements from about 198 onwards. Some arrays had drives >8 years old, but if failure was not tolerated, 4 years was enough.

      Mind you, if the customer specified IDE drives, I warned them that failure was inevitable. SCSI 10K drives, I would still swap but that was for five-nines.

      And those stupid IDE RAID cards, well, that's too cheap. We are no longer talking reliable. Let someone else have that business.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    5. Re: 4 years??? by jd2112 · · Score: 1

      I was still wire wrapping the logic circuits and waiting for the vacuum tubes to reach operating temperature on my ENIAC.

      --
      Any insufficiently advanced magic is indistinguishable from technology.
    6. Re:4 years??? by CmdrTamale · · Score: 1

      IS THAT YOU, B1FF?

  5. TLDR; 2D arrays wit a ton of spares are reliable by raymorris · · Score: 3, Insightful

    The bottom line is, having a lot of spare disks for a 2D array makes it reliable over time. These configurations of 2D arrays are quite reliable over time because they have many spares available to automatically replace failed disks:

    Data  Parity  Spare
      12       3     13
      12       3     14
      24       6     20
      36       9     26

    To understand the above table, we'll use the first row as an example. An array made up of 1TB disks with 12TB of data space would have 3TB of parity and 13 spare 1TB drives, for a total of 28 drives to get 12 drives' worth of net storage.
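
    The drive totals for each row can be tallied with a small sketch. The rows are copied from the table as posted; the helper name `summarize` is only illustrative.

```python
# Rows copied from the table above: (data, parity, spare) disk counts.
rows = [(12, 3, 13), (12, 3, 14), (24, 6, 20), (36, 9, 26)]

def summarize(data, parity, spare):
    """Return the total drive count and the fraction of drives holding user data."""
    total = data + parity + spare
    return total, data / total

for data, parity, spare in rows:
    total, usable = summarize(data, parity, spare)
    print(f"{data:>3} data + {parity} parity + {spare} spare = {total} drives, "
          f"{usable:.0%} usable")
```

    For the first row this gives 28 drives in total, matching the example above.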

    What they didn't mention is that the same reliability can be achieved with only three spares, by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule to replace the failed drive "some time in the next two months", that probably won't be costly.

  6. Not enough by BitZtream · · Score: 1

    I worry a lot less about losing data than I do corrupting data and not knowing it.

    But hey, congratulations, you've learned about RAID mirrors with lots of copies and learned how to apply basic, well understood engineering principles to it.

    Guess what, some of us were aware of this years ago, some others aware of it longer than you've probably been alive. It's been known my entire life, that's for sure, so that's at least 40 years.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    1. Re:Not enough by BitZtream · · Score: 1

      And let's add: to 'avoid maintenance' you just add a bunch of extra spares from the start. That's just stupid; you overbuild ridiculously in order to not have to spend 10 minutes swapping a drive out. Totally cost effective ... if you're sending a probe out into space. In which case, you're going to want better than five 9s, so try again.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    2. Re:Not enough by Anonymous Coward · · Score: 1

      Swapping out a drive under warranty takes 2-5 days. Problem occurs, system notifies operator, operator gets the notification, operator calls vendor, vendor assigns case, technician calls operator, technician orders spares, technician schedules repair, technician receives spares, technician waits until appointment and drives to site, technician gets met at the door and escorted to rack, technician replaces drive, technician checks repair, system rebuilds, technician checks rebuild, technician is escorted out of building, technician drives back.

      Not fast.

    3. Re:Not enough by operagost · · Score: 1

      A little faster if someone with a pulse is at the site. Then it's sending the new disk overnight or with a courier, and handing it to the IT staff member who swaps the disk.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    4. Re:Not enough by ebh · · Score: 1

      Or if you have a Netapp with a decent support contract: A disk fails while you're asleep[1]. The filer notifies Netapp over a dedicated POTS line. Netapp overnights a new disk to you. You find out the next morning that the disk failed, via a call from the loading dock about a package for you. You pop in the new drive, activate one of your other hot spares, and configure the new drive as a new hot spare, all in less time than it took you to walk down to the loading dock and back.

      [1] You don't have single disk failure alarms wake you up in the middle of the night because you configured your array to run with two failed disks.

    5. Re:Not enough by Qzukk · · Score: 1

      This is why I deal with equipment where I can A) cross-ship (bonus points for letting me ship the drive faceplate instead of the whole thing) and B) swap the drive out myself.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    6. Re:Not enough by Immerman · · Score: 1

      Sure, so you probably want to keep several spares handy, maybe even have a few hot spares that can be automatically deployed the moment there's a failure, and replace the failed drives at your leisure. Having almost as many hot spares as you have active disks is probably overkill for most scenarios. In fact they themselves calculated that with their parity technique it will give you 5-nines confidence in having 4 years of maintenance-free reliability. Probably a lot more cost effective to build the system for 5-nines reliability for a few months, and just make a point of replacing any failed drives within a few weeks.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    7. Re:Not enough by halltk1983 · · Score: 1

      I know it might be hard to believe, but some people run servers that aren't in their broom closet, close at hand. In fact, many run some internationally, and it's a lot more than 10 minutes to go swap a disk. It's a few hundred dollars each time to pay remote hands to swap it.

      --
      Watch for Penguins, they eat Apples and throw rocks at Windows.
    8. Re:Not enough by Virtucon · · Score: 1

      dedicated POTS line

      You realize that's not really feasible in most places anymore, right? Also, hate to burst your bubble but LPOPs and COs are being phased out as well. It's something to do with this newfangled network switching technology. Pretty soon circuit switched connections will be a thing of the past. ;-)

      --
      Harrison's Postulate - "For every action there is an equal and opposite criticism"
    9. Re:Not enough by petermgreen · · Score: 1

      Pretty soon circuit switched connections will be a thing of the past. ;-)

      The core of phone networks has moved from physical circuit switching to virtual circuit switching to packet switching with priority, but at least here in the UK normal phone lines are still delivered from the phone exchange as analog POTS over a pair of copper wires (which may or may not also carry DSL). I believe the situation in the US is similar.

      Were you thinking of some other place (and if so where) or were you using a pedantically narrow definition of "dedicated POTS line"?

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  7. Only 4 years? by fxsoap · · Score: 1

    Is this really a long time? 4 years? That seems kind of short and not reliable to me.

    1. Re:Only 4 years? by wonkey_monkey · · Score: 1

      It's a long time for 99.999% reliability.

      --
      systemd is Roko's Basilisk.
    2. Re:Only 4 years? by Immerman · · Score: 1

      If a single drive has a MTBF of 100,000 hours, that means you can naively expect 50% of drives to fail within 100,000 hours. That gives you a five-nines reliability period for one drive of only 1.44 hours. Does that put the degree of reliability being discussed in proper perspective for you?

      The math:
      0.99999 = 0.5^N
      N = log(0.99999) / log(0.5) = 0.0000144
      So, the 5-nines reliability period is 0.00144% of the MTBF, or
      100,000h * 0.0000144 = 1.44h
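
      A minimal sketch of that calculation, for anyone who wants to check it. The first function follows the model used above (50% of drives gone at 100,000 hours). The second is an assumption added for comparison, not part of the math above: a constant failure rate whose mean lifetime is the MTBF. Function names are illustrative.

```python
import math

MTBF_HOURS = 100_000
TARGET = 0.99999  # five nines

def period_half_life(mtbf, target):
    # Model from the comment: survival halves every `mtbf` hours,
    # S(t) = 0.5 ** (t / mtbf). Solve S(t) = target for t.
    return mtbf * math.log(target) / math.log(0.5)

def period_exponential(mtbf, target):
    # Alternative assumption (not from the comment): constant failure rate
    # with mean lifetime `mtbf`, S(t) = exp(-t / mtbf).
    return -mtbf * math.log(target)

print(f"half-life model:   {period_half_life(MTBF_HOURS, TARGET):.4f} h")
print(f"exponential model: {period_exponential(MTBF_HOURS, TARGET):.4f} h")
```

      Either way the five-nines window for a single drive comes out at roughly an hour or an hour and a half, which is the point being made.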

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    3. Re:Only 4 years? by mordred99 · · Score: 1

      3-5 years is the industry standard for depreciation of computing hardware. I.e., use it and get rid of it for newer stuff.

    4. Re:Only 4 years? by Immerman · · Score: 1

      Standard combinatorial statistics, assuming failure probability is constant over time. Obviously things get more complicated if the failure rate varies over time, but it's good for a first-order approximation. In reality the "bathtub curve" of drive failures means those first few thousand hours have a much higher failure rate, so the actual 5-nines reliability duration will be much lower.

      Assume you have a 0.99999 probability of not failing in 1.4427 hours
      After the first 1.4427 hours you have 0.99999 chance of having not failed.
      During the second period you have another 0.99999 chance of non-failure, assuming you didn't fail in the first period - for a total non-failure chance of 0.99999*0.99999 ~= 0.99998
      During the third period you again have a 0.99999 chance of continued non-failure, for a lifetime nonfailure chance of 0.99999^3 ~= 0.99997
      After 100,000 hours your chance of having not failed is 0.99999^(100,000h/1.4427h) = 0.5
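
      The compounding steps above can be checked with a few lines. This is only a sketch under the same constant-failure-probability assumption; `survival_after` is an illustrative name.

```python
period_hours = 1.4427          # five-nines period from the steps above
per_period_survival = 0.99999  # chance of no failure within one period

def survival_after(hours):
    # Chance of no failure after `hours`, compounding one period at a time.
    return per_period_survival ** (hours / period_hours)

for periods in (1, 2, 3):
    print(periods, "periods:", round(survival_after(periods * period_hours), 5))
print("100,000 h:", round(survival_after(100_000), 3))
```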

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
  8. Re:Naive to say the least. by alphatel · · Score: 3, Funny

    100,000 hours = 273 years. Does anyone believe that?

    Everyone except you apparently.

    --
    When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
  9. The thing about this... by Kokuyo · · Score: 2

    "Yeah, well just put more disks in it..."

    Nice idea. Only: TCO is not just based on initial spending and maintenance. There is also rackspace to consider and did I hear anyone talk about green IT?

    If my day to day considerations were that one dimensional, my employer could save a ton of money on my salary.

    1. Re:The thing about this... by Immerman · · Score: 1

      Presumably the spares aren't spun up until needed, so power consumption is negligible. And really, you're talking about having just over K spare drives for an array of K data drives and ~2*sqrt(K) parity drives. That's actually not all that bad - especially when you consider that that is an extreme case for getting four years of 5-nines reliability out of individual drives with a five-nines reliability of only 1.44 hours (=100,000h MTBF). If you instead assume a tech goes through your racks to replace all the failed drives once a month you should be able to eliminate most of the hot spares while maintaining reliability.

      The actually interesting bit I think is probably the 2 sqrt(N) error-correcting parity system - assuming it scales that seems like it could be a really interesting advance for large-scale data stores: Your 10,000 data-disk array only needs 200 parity disks to ensure 5-nines reliability for as long as you can keep sufficient hot spares in the queue.
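
      As a rough check on how the parity count scales, here is a sketch that takes the summary's layout at face value: N parity disks, N(N - 1)/2 data disks and N(N + 1)/2 spares. Whether the scheme extends cleanly to very large arrays is an assumption, and the function names are illustrative. Under this reading the parity count grows like sqrt(2K) for K data disks, somewhat below the ~2*sqrt(K) estimate.

```python
import math

def layout(n):
    """Disk counts for the summary's layout with n parity disks."""
    return {"parity": n, "data": n * (n - 1) // 2, "spare": n * (n + 1) // 2}

def parity_needed(data_disks):
    """Smallest n whose layout holds at least `data_disks` data disks."""
    n = math.ceil((1 + math.sqrt(1 + 8 * data_disks)) / 2)
    # Guard against floating-point rounding in either direction.
    while n * (n - 1) // 2 < data_disks:
        n += 1
    while n > 1 and (n - 1) * (n - 2) // 2 >= data_disks:
        n -= 1
    return n

for k in (55, 100, 10_000):
    print(k, "data disks ->", parity_needed(k), "parity disks")
```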

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    2. Re:The thing about this... by Maxwell · · Score: 1

      "Yeah, well, just put more disks in it" He forgot a comma. Relax.

    3. Re:The thing about this... by PRMan · · Score: 1

      And you could probably do this with consumer instead of enterprise drives if you have that many spares (and avoid Seagate like the plague).

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
  10. Re:Naive to say the least. by jandrese · · Score: 2

    100,000 hours is 4,167 days which is ~11.4 years. That sounds pretty reasonable to me, since I've run plenty of disks for over a decade.

    --

    I read the internet for the articles.
  11. Nothing novel is being proposed here by fnj · · Score: 2, Informative

    We observe that the same objectives cannot be reached with RAID level 6 organizations

    Well, duh. RAID6 is not a serious level of redundancy. ZFS RAIDZ-3 (triple parity) FTW. And you can build in as many hot spares as you want. Dinosaurs who have still not adopted ZFS need to get a clue.

    1. Re:Nothing novel is being proposed here by Immerman · · Score: 1

      Still lousy - they appear to be claiming a scalable parity system that only requires ~2sqrt(N) parity disks to protect N data disks. That's only 20 parity disks for 100 data disks, or 200 parity for 10,000 data. That's impressive.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
  12. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Chas · · Score: 1

    Yes, but then you're dancing around the possibility of additional disk failures while waiting on that replacement.

    If you pop a few more drives (which, if you got your disks in lots is QUITE possible), you're in deep shit.

    --


    Chas - The one, the only.
    THANK GOD!!!
  13. Re:Naive to say the least. by wbr1 · · Score: 2

    Check your math. 100,000 hours / 24 = 4166.6~ days
    4166.6666~ days / 365 = 11.4 years
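
    The same conversion as a quick sketch, next to the slip of dividing hours by 365 directly. A 365-day year is assumed.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

def hours_to_years(hours):
    # Convert hours to days first, then days to years.
    return hours / HOURS_PER_DAY / DAYS_PER_YEAR

print(round(hours_to_years(100_000), 1))  # 100,000 h expressed in years
print(round(100_000 / DAYS_PER_YEAR))     # the slip: treating hours as days
```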

    --
    Silence is a state of mime.
  14. check your math by fche · · Score: 1

    more like 11.4 years

  15. Re:Naive to say the least. by cellocgw · · Score: 1

    100,000 hours = 273 years. Does anyone believe that?

    Oddly enough, it doesn't matter whether you believe it or not. What matters is whether that's the same predictive model used for estimating lifetimes of RAID arrays, or a single drive for that matter. Since you want to compare the proposed new config directly with current paradigms, you have to use the same set of underlying assumptions.

    --
    https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
  16. Academic la la bullshit. by Anonymous Coward · · Score: 1

    In academia, everything is simple and independent. I'm sure it's fun to calculate theoretical parity requirements for quintuple disk failures. ...but it's useless.

    In the real world, if you have five disks simultaneously fail in an array, there was a common cause. The next step is to restore from backup because every drive in that array is now suspect. Whatever knocked out five disks probably did a number on the rest, and it would be reckless to assume they are unaffected, even if they have a clean SMART report. You're well past the point of caring about parity if an array gets crushed like that.

    1. Re:Academic la la bullshit. by CBM · · Score: 1

      This. Anybody who remembers the IBM DeskStar (a.k.a. DeathStar) debacle will remember that drives from certain batches would fail on a weekly basis. To get better independence, one would need drives in the same RAID from different manufacturers, and hopefully, from different batches.

    2. Re:Academic la la bullshit. by stoatwblr · · Score: 1

      "In the real world, if you have five disks simultaneously fail in an array, there was a common cause."

      Usually something to do with the bus, not the drives (thanks HP!). Losing a bunch of drives simultaneously like that usually results in an array which is errored but recoverable.

      As for the comment below: If you use desktop drives in an array then you need to ramp up your parity and spares accordingly. Deathstars had a very simple software fault (timer rollover) which caused them to fail at 49 days uptime.

  17. Simple. by Neil+Boekend · · Score: 1

    TL;DR version:
    Replacing disks sucks sometimes. Sticking in additional spares means you don't have to replace them. They calculated an efficient RAID solution that means you don't need as many spares.

    --
    Well, I might have a way, but it only works on a semi spherical planet in a vacuum.
    1. Re:Simple. by Immerman · · Score: 1

      You misread: the number of spares ~= the number of data disks, and the number of parity disks scales with the square root of that number. ( N(N-1)/2 data disks, N(N+1)/2 spares, and N parity disks ) This could actually be pretty interesting for high-capacity data storage.

      You should also consider the degree of reliability being discussed: a single drive's 100,000h MTBF translates to a 5-nines reliability period of only 1.4427 hours: 0.99999^(100,000h/1.4427h) = 0.5.

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
  18. Re:Naive to say the least. by oodaloop · · Score: 1, Funny

    Girls suck at math.

    --
    Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
  19. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

    Umm, 273 years is nearly 2.4 million hours. So, no, no one with basic arithmetic skills believes that 100,000 hours is 273 years.

  20. Re:Naive to say the least. by whoever57 · · Score: 1

    mean time to failure (MTTF) of 100,000 hours.

    100,000 hours = 273 years. Does anyone believe that?

    You don't understand the meaning of MTBF.

    --
    The real "Libtards" are the Libertarians!
  21. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

    They did 100000/365 which equals about 274. They seem to have confused hours with days.

  22. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

    Actually it does matter. If you believe 100,000 hours = 273 years you lack basic arithmetic skills.

  23. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

    They also don't realize that 100,000 hours / 365 days is not the way you get years from hours.

  24. Disks from same factory run often go bad together by daboochmeister · · Score: 2

    Yeah, and what are you going to do when 9 out of 10 of the disks all go bad, because they came from the same factory run and exhibit the same issue? This is what we usually experience: when a disk fails, most of the time it's a subcomponent issue shared by all of the disks from that and any concurrent factory runs - and we have to swap them ALL out. I guess you just throw the whole array out ... :-(

    --
    "Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
  25. Not my anecdotal experience by futuresheep · · Score: 5, Interesting

    Just a few things I thought of while looking at this study:

    The authors are using Backblaze data. Backblaze uses consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use.

    I'm willing to bet that none of the authors of this paper have ever had to pay for colocated rack space, power, and cooling either, they've just doubled the RU that I need for storage. At $1500.00 - $2000.00 per rack that adds up.

    Doubling the rack space for storage I need so I can avoid a few service calls by my storage vendor over 5 years simply isn't efficient.

    We've installed close to 500TB of archival storage using commodity hardware and 2-3TB Nearline SAS. We have maybe 3 hand and eyes calls per year for disk replacement.

    Anyway - just rambling.

    1. Re:Not my anecdotal experience by fnj · · Score: 5, Insightful

      consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use

      In your fantasy there is a difference besides a hideously higher price and a somewhat longer warranty period. In real life, commodity SATA is much more cost effective. Everybody who is serious recognizes this (Google, Backblaze, Amazon).

    2. Re:Not my anecdotal experience by silas_moeckel · · Score: 1

      Well you can probably double your density moving to non hot swap 3.5's, making double the drives even on space. Now if I were going to do that I would mirror the raid sets anyways since power consumption of near line drives is pretty minimal.

      Never seen much of a use of enterprise sata, I do use a lot of SAS with dual ports to separate raid controllers.

      --
      No sir I dont like it.
    3. Re:Not my anecdotal experience by Anonymous Coward · · Score: 1

      >The authors are using Backblaze data. Backblaze uses consumer grade SATA disk which isn't going to be as reliable as the Enterprise SATA/SAS disk we would use.

      Why Enterprise Hard Drives Might Not Be Worth the Cost

      In addition to recording failure rates of thousands of consumer grade hard drives, the online backup company has also been keeping tabs on the enterprise-class drives used in its servers. (The consumer grade drives store customers' backup data, while the servers from Dell and EMC store Backblaze records that run the business.) Long story short, they found that the failure rate of the enterprise drives is higher than the consumer ones—4.6% annual failure rate versus 4.2%.

      For god sake, check your facts!

    4. Re:Not my anecdotal experience by Anonymous Coward · · Score: 1

      Why use spinning disks? Do the same thing with SSDs and you have something reliable and energy efficient.

    5. Re:Not my anecdotal experience by futuresheep · · Score: 1

      There's more to reliability than failure rates. Enterprise drives have stronger magnets, more robust error detection, better vibration dampening, etc. Plus, consumer drives do not support TLER, making them useless in anything other than the massive JBOD environments you mentioned.

  26. Re:Naive to say the least. by BarbaraHudson · · Score: 1

    Oops my math error. Still, 11.4 years is also way out of line with the reality that, as density rises, so do failure rates. Why do you think they've lowered the warranty period?

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
  27. Re:N(N+1)/2 spares by Lunix+Nutcase · · Score: 2

    Basically as the disk size grows you are talking about N-squared spares. I think most businesses are going to be more than happy with just hot-swapping out failed disks as needed.

  28. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

    But I have yet to see a high-density disk last more than 8,000 hours, with the median being maybe half that.

    Good for you. I have a number of 2 and 3 TB drives that are more than 5 years old. Anecdotes != evidence.

  29. Re:Naive to say the least. by BarbaraHudson · · Score: 1

    I've apologized for the bad math, but sorry again. However, 11.4 years doesn't match what's actually happening as we go to higher densities. I've had a few drives last 8,000 hours, but most have died much sooner.

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
  30. Re:Naive to say the least. by BarbaraHudson · · Score: 1

    I screwed up. Sorry. However, even 11.4 years is overly optimistic as we cram more and more onto a single platter.

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
  31. So they figured out raid z 3 with enough spares by silas_moeckel · · Score: 1

    To last all of 4 years, and need nearly as many hot spares as data drives. I guess the academics think they know something yet again. They took some dubious failure rates (Backblaze uses whatever is the cheapest consumer drive at the time and eventually stops buying the really bad ones (Seagate 1.5 and 3TB, looking at you)) and a rather optimistic transfer rate (200MB/s) that assumes all sequential reads. They failed to account for backplane, controller, and power, assuming that those never fail. By their numbers you might as well run mirrored RAID 5 or 6 with enough hot spares to make it between regularly scheduled tech visits. That gives you the ability to split chassis and controllers along mirror lines. As to rebuilds, we have better methods: predictive failure works well, SSDs make great caches while rebuilding, etc. We also have less centralized options with distributed technologies that potentially scale better.

    5 9's is not that hard of an objective when talking about raid sets, the tools have been there for decades. Sure you will never reliably reach it with a single path to anything, 5 minutes is not enough time for even a staffed site to remedy any outage.
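
    For reference, the "5 minutes" figure is the yearly downtime budget at five nines, which a short sketch reproduces. This reads 99.999% as availability, as this comment does; the paper summary states it as the probability of not losing data over four years, which is a different quantity. A 365-day year is assumed and the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_budget_minutes(availability, years=1.0):
    """Minutes of downtime allowed at a given availability over `years`."""
    return (1.0 - availability) * MINUTES_PER_YEAR * years

print(f"{downtime_budget_minutes(0.99999):.2f} min per year")
print(f"{downtime_budget_minutes(0.99999, years=4):.2f} min over four years")
```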

    --
    No sir I dont like it.
  32. Re:Naive to say the least. by BarbaraHudson · · Score: 1
    Good one! Yes, I screwed up. Circle this date on your calendar :-)

    But thinking that 11.4 years is going to save their behind is unrealistic.

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
  33. Ignores how disks often fail by MarcAuslander · · Score: 2

    My understanding is that disks often fail when a head touches the surface, or a piece of dirt gets between the head and the surface. Once that happens, more dirt is produced, increasing the probability of more head crashes, leading to a failure cascade. As a consequence, once one of my drives starts to show unrecoverable errors, corresponding to damaged surface areas, I replace it while it can still be read.

    The spare platter strategy does nothing to reduce this failure mode. In fact, all modern disks already have spare space for bad block relocation.

    1. Re:Ignores how disks often fail by drinkypoo · · Score: 1

      The spare platter strategy does nothing to reduce this failure mode. In fact, all modern disks already have spare space for bad block relocation.

      Including pretty much everything with an onboard controller. "Modern" is understating the case.

      If I were expecting an array to last a long time without being touched, I would expect it to have a whole bunch of spares that never even got heated up until they were needed, just sat there in the box enjoying living in a relatively temperature-constant environment. Sure, there's fluctuations, but they'll all be within the operating temperature range of the drives.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Ignores how disks often fail by rickb928 · · Score: 1

      This from an NEC white paper in 2008:

      "A recent academic study [1] of 1.5 million HDDs in the NetApp database over a 32 month period found that 8.5% of SATA disks develop silent corruption. Some disk arrays run a background process to verify that the data and RAID parity match, a process which can catch these kinds of errors. However, the study also found that 13% of the errors are missed by the background verification process. When you put those statistics together, you find on average that 1 in 90 SATA drives will experience silent data corruption not caught by the background verification process. So when those data blocks are read, the data returned to the application would be corrupt, but nobody would know. For a RAID-5 (4+P) configuration at 930 GB usable per 1 TB SATA drive, that calculates to an undetected error for every 67 TB of data, or 15 errors for every petabyte of data. If a system were constantly reading all that data at 200 MB/sec, it would encounter an error in less than 100 hours."
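
      The arithmetic in that passage can be reproduced with a short sketch. All input figures are taken from the quote; decimal units (1 TB = 1,000,000 MB) are an assumption, and the variable names are illustrative.

```python
# Figures taken from the quoted passage.
P_SILENT = 0.085       # fraction of SATA disks that develop silent corruption
P_MISSED = 0.13        # fraction of errors missed by background verification
USABLE_TB = 0.93       # usable TB per 1 TB drive
DATA_FRACTION = 4 / 5  # RAID-5 (4+P): four data drives out of every five
READ_MB_PER_S = 200

p_undetected = P_SILENT * P_MISSED      # per-drive chance of an uncaught error
drives_per_error = 1 / p_undetected
tb_per_error = drives_per_error * USABLE_TB * DATA_FRACTION
errors_per_pb = 1000 / tb_per_error     # assumes 1 PB = 1000 TB
hours_to_error = tb_per_error * 1_000_000 / READ_MB_PER_S / 3600

print(f"1 in {drives_per_error:.0f} drives")
print(f"one undetected error per {tb_per_error:.0f} TB")
print(f"{errors_per_pb:.0f} errors per PB")
print(f"{hours_to_error:.0f} hours of reading at {READ_MB_PER_S} MB/s")
```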

      Sometimes, I just want to weep.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
  34. Re:Naive to say the least. by BarbaraHudson · · Score: 1

    Yes, I goofed. However, believing that 11.4 years is what you'll get in practice is also naive, especially with the higher-density drives that haven't accumulated even 2 years of real-life experience.

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
  35. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

    Sorry, the 3 TB drives are around 3 years old. The 2 TB have passed their 5 year warranties with no issues.

  36. Re:Naive to say the least. by cellocgw · · Score: 1

    Actually it does matter. If you believe 100,000 hours = 273 years you lack basic arithmetic skills.

    +1 sardonic

    But doesn't address my serious point about application of statistical methods.

    --
    https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
  37. Trust by HideyoshiJP · · Score: 5, Interesting

    I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.

    1. Re:Trust by Qzukk · · Score: 1

      Now I'm curious what happened to Case1, Case2, and Copy of copy of case3 [8].doc.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    2. Re:Trust by jones_supa · · Score: 1

      :D

    3. Re:Trust by Akili · · Score: 1

      I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.

      Now I'm amused at the idea of the embedded filesystem path as a measure of trust of the source. I can only guess that these would be even worse:

      C:\My Documents\ADAPT15\Case3.doc
      C:\WINNT\Profiles\User\My Documents\ADAPT15\Case3.doc
      C:\Documents and Settings\User\My Documents\ADAPT15\Case3.doc

      Any path containing 'New Folder' and/or 'Untitled.doc' would quite possibly trump any of the above.

      ( 'C:\Documents and Settings\Ricky\My Documents\faxes\sent faxes\case3.doc' I wouldn't even dare open. )

    4. Re:Trust by Zontar_Thing_From_Ve · · Score: 1

      I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.

      I don't know if this is an attempt to get modded as "Funny" when it's not funny at all or if you are serious, so I'll assume the latter. There are compatibility reasons for using .doc format. .doc format is old and well supported by non-Microsoft products like LibreOffice, OpenOffice, etc. Where I work we save a lot of internal documents in .doc format simply because we don't need any features that .docx has and we don't want to force people needlessly to have to upgrade to Office 2010 just to read our docs when, again, they're pretty simple and don't need any of the new features that .docx supports. Additionally, my company in the past didn't have the fastest record of upgrading versions of Office and it got really frustrating to have a few people in the office saving docs in .docx when the majority of people in the office were on an older version of Office that didn't understand .docx and thus couldn't read their docs.

  38. Re:TLDR; 2D arrays wit a ton of spares are reliabl by tlhIngan · · Score: 1

    What they didn't mention is that the same reliability can be achieved with only three spares, by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule to replace the failed drive "some time in the next two months", that probably won't be costly.

    The goal is to realize that for manufacturers, service calls are expensive. Perhaps a company has a 4 hour response time - if a disk fails, the company is still running with redundancy, but they're wanting that drive replaced pronto, which is easily $500+ per incident (need to have spares on hand, drop ship extras if a tech runs low, need to station techs around, maybe even need to fly a tech in).

    So the goal is that building an extra 13 spare 1TB drives (which probably cost under $50 in bulk) is $650, or the cost of just over one service call.

    If enough drives have to be replaced then the tech can change a whole pile of them at once, which is still cheaper than sending people out for individual drive failures.

    The goal is basically to have no service calls over the service life - then maybe refresh it periodically at one's convenience by replacing all the failed drives in one go.

  39. Re:Naive to say the least. by Lunix+Nutcase · · Score: 1

    No, they are constantly being read and written to from a NAS.

  40. Re:Naive to say the least. by cellocgw · · Score: 1

    They seem to have confused hours with days.

    Captain! They've broken our secret Starfleet code!

    --
    https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
  41. Re:Disks from same factory run often go bad togeth by Diss+Champ · · Score: 1

    If you read the article, that is exactly what they suggest. If failure rates are too far above predicted, they say to replace with new array. At least they are upfront about it.

  42. Re:Naive to say the least. by wonkey_monkey · · Score: 1

    100,000 hours = 273 years. Does anyone believe that?

    I don't, because 100,000 hours is 11.4 years.

    273 (much closer to 274) years is 100,000 days.
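
    The two conversions are easy to check. A minimal sketch, assuming an average year of 365.25 days (that figure and the function names are my own choices, not from the paper):

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25  # assumption: average year, leap days included


def hours_to_years(hours):
    """Convert a duration in hours to years."""
    return hours / (HOURS_PER_DAY * DAYS_PER_YEAR)


def days_to_years(days):
    """Convert a duration in days to years."""
    return days / DAYS_PER_YEAR


print(f"100,000 hours = {hours_to_years(100_000):.1f} years")  # 11.4
print(f"100,000 days  = {days_to_years(100_000):.1f} years")   # 273.8
```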

    --
    systemd is Roko's Basilisk.
  43. Service call? by roc97007 · · Score: 2

    A service call? Seriously? A sysadmin (or operator if it's a big place) can't see the yellow light on a disk and replace the pack with in-house spares? Have we become so inept as an IT community that we can no longer do a walk-through of our machine room and service simple things like this? Maybe we do deserve to be outsourced.

    And if one must have a service contract such that only the vendor can touch the hardware, (why would you do that? never mind) wouldn't you negotiate a provision that includes drive replacement (as drives are consumables that must eventually be replaced) without being charged for an "office visit"?

    --
    Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
    1. Re:Service call? by Anonymous Coward · · Score: 1

      Didn't you get the memo? The drive is to eliminate IT staff at midsize and smaller companies. For those holdouts that don't want to put everything into the "cloud", vendors are creating maintenance free local storage for which you still will not need any IT staff to babysit. Your software can be outsourced/offshored, but local hardware also needs to be made hands free. The extra $5K for this is way cheaper than a moody sysadmin to plug in replacement drives.

    2. Re:Service call? by rickb928 · · Score: 2

      Yes we have, if the array is installed in your backup corporate PKI server, in a shielded and locked cage with video, electrostatic, and laser monitoring and alarms. And the keys to the cage are in another state. And it requires EVP approval to deliver the keys to the authorized tech for a flight to the DR site to change a failed drive.

      A real world example. You would recognize the name of this corporation in the first three letters. They take their corporate security very seriously, so much so that bumping into the cage earned you a visit from armed security, an escort out, and full debriefing until they were satisfied you would never take the cart with the stuck caster again...

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    3. Re:Service call? by painandgreed · · Score: 1

      A service call? Seriously? A sysadmin (or operator if it's a big place) can't see the yellow light on a disk and replace the pack with in-house spares? Have we become so inept as an IT community that we can no longer do a walk-through of our machine room and service simple things like this? Maybe we do deserve to be outsourced.

      And if one must have a service contract such that only the vendor can touch the hardware, (why would you do that? never mind) wouldn't you negotiate a provision that includes drive replacement (as drives are consumables that must eventually be replaced) without being charged for an "office visit"?

      First off, are things so bad you still have to do physical inspection of the servers? Where we work, there are multiple monitoring systems and they don't expect anybody inside the data centers unless there is a change order for work of known parameters. Beyond that, it's not even the IT community in many cases but the business community that will all too easily not spend the money for the protection measures the IT department requests, decide to go with the said vendor, and not make the changes to the contract that the IT department requests (if they even get to see the contract before it's signed).

    4. Re:Service call? by roc97007 · · Score: 1

      I know there's SMART and other tools, but oddly enough, with offshore admins supposedly monitoring our equipment 24/7, I can still walk through our (fairly large) machine room and identify three or five warning lights that they did not know about. (I'm a "legacy" IT employee who still has access to the room.) Software alerts are important, but they're only as good as the people watching them. Even with an alert automatically spawning a trouble ticket, things can go bad if the ticket is dropped into a week-long queue, or even if it happens during local daylight hours and the offshore crew aren't coming online until 8:00 PM local time. Later, when the smoke clears, the offshore admins will insist they were just following process, and we'll just set things up to be knocked down again at a future date.

      Secondly, you're right about IT making recommendations that are ignored by the pencil pushers. But in my opinion that's the CIO not doing his or her job.

      --
      Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
  44. Re:TLDR; 2D arrays wit a ton of spares are reliabl by silas_moeckel · · Score: 1

    We do just that: when it gets down to 1 hot spare it's an emergency service call and we replace all the failed units. This does not happen very often and tends to be just that, a bad batch.

    --
    No sir I dont like it.
  45. Re:Naive to say the least. by wonkey_monkey · · Score: 1

    PS You've already apologised more than enough for this. Sorry to compound it!

    --
    systemd is Roko's Basilisk.
  46. Re:Naive to say the least. by BarbaraHudson · · Score: 1

    I kind of deserve it, though. That's what I get for trying to pass the vacuum, watch Dr. Phil, keep my neighbor's dog from drinking my coffee (again), and post on slashdot at the same time.

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
  47. Re:N(N+1)/2 spares by Lunix+Nutcase · · Score: 1

    I would hope I'm misunderstanding it, because that seems like a lot of spares to purchase ahead of time.

  48. Re:Naive to say the least. by wbr1 · · Score: 1

    That seems exceptionally short. I run a repair shop, and dead/dying HDDs are the second most common problem. While I do not know the operational hours of these devices, the great majority are past the 3 year mark when they begin to fail.
    I guess it also depends on your definition of high density as I do not see many drives > 1TB in consumer/SMB equipment.

    --
    Silence is a state of mime.
  49. Oh, hai, from 2009 by bill_mcgonigle · · Score: 1

    zpool create -o ashift=12 -o autoreplace=on tank raidz2 sdc sdd sde sdf sdg sdh spare sdi sdj

    Alright, fine, ashift=12 is newer than 2009, for 2TB+ drives. And always use /dev/disk/by-id for your sanity. (The pool name, "tank" here, is just a placeholder.)

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  50. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Kjella · · Score: 1

    Even if the mean time between failures for consumer drives was 6 months, the odds of 'popping' two more spares in the month after the first failure would be less than 3%. If the MTBF is 1 year the probability drops to 0.7%.

    Except if you got a bad batch where some kind of material or production defect will cause many disks to fail near simultaneously. The overall MTBF might be true for all the disks they produce, but unless you make a real effort to source them from different batches over time you can't assume that's going to be your MTBF.

    --
    Live today, because you never know what tomorrow brings
  51. Re:Naive to say the least. by Immerman · · Score: 1

    A mean time between failure of 11.4 years means you can reasonably expect half of all drives to fail before then*. Assuming a constant failure rate (which we really shouldn't do), that means you can expect ~4.4% of drives to fail every year. Which leads to the benefit of lowering the warranty period: Every year of warranty increases the expected total production/replacement cost of the drive by 4.4% - reduce the warranty period and you boost profit margins and/or can reduce the price to undercut your competitors.

    *In reality it's not quite so simple, MTBF is actually the average failure rate of a large number of young drives tested for (probably) considerably less than a year, with aging effects never taken into consideration.
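
    For comparison, here is a rough sketch of the textbook constant-failure-rate (exponential) model. The model is an assumption on my part, not something the paper or the vendors state. Under it, roughly 8% of drives fail in the first year and about 63% have failed by the time the MTBF is reached, rather than half:

```python
import math

MTBF_HOURS = 100_000            # vendor-style MTBF figure discussed in this thread
HOURS_PER_YEAR = 24 * 365.25    # assumption: average year length


def fraction_failed(hours, mtbf_hours=MTBF_HOURS):
    """Probability that a drive has failed by `hours`, assuming an
    exponential lifetime with mean `mtbf_hours` (constant failure rate)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


print(f"failed within 1 year: {fraction_failed(HOURS_PER_YEAR):.1%}")
print(f"failed by the MTBF:   {fraction_failed(MTBF_HOURS):.1%}")
```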

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  52. Why not a gradually-degrading array instead? by mi · · Score: 1

    Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years.

    Instead of keeping the spares inside as just that — spares — can it not start using all of them (in a sufficiently redundant configuration) and gradually lose capacity as physical disks fail?

    Yes, it would require coordination with the driver and filesystem, but there is nothing insurmountable in that...

    --
    In Soviet Washington the swamp drains you.
  53. Unfair advantage. by koleczek · · Score: 1

    One of the authors is a Catholic priest. He probably blessed the drives first.

  54. Flawed logic by JerryLove · · Score: 1

    "We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures."

    That's true only if you assume that three disk failures occur faster than a single disk can be rebuilt.

    If you assume no more than two disk failures *during the length of time it takes to rebuild the array* then RAID 6 works fine as long as you assign enough hot spares (RAID 5 only covers a single failure in that window).

  55. expensive BECAUSE four hour service by raymorris · · Score: 1

    >. service calls are expensive. Perhaps a company has a 4 hour response time -

    Service calls are expensive BECAUSE it's an emergency. If you have four spares, plus the two parity drives, you're still six drives away from a problem. With a few spares, you can easily replace one by sending it UPS ground, rather than having a tech run out there immediately.

  56. Re:Naive to say the least. by SimonInOz · · Score: 2

    er, last time I checked, 100,000 hours is 11 years.
    273 years is 2,400,000 hours. Did you lose the use of your calculator?

    --
    "Cats like plain crisps"
  57. Re:TLDR; 2D arrays wit a ton of spares are reliabl by Em+Adespoton · · Score: 1

    Indeed -- remember the experiment posted on Slashdot a year or so ago where they measured the MTBF across drives purchased in batches and outside batches? Failures tended to cascade within the batch; other batches would cascade at different times.

    So that entire cluster is likely to fail catastrophically unless you're swapping in drives from new batches from time to time -- at which point it should last MUCH longer than 4 years with data integrity. Bonus points if your array can handle size boosts over time (swapping in larger disks).

  58. Dick array by vipvop · · Score: 1

    Call me when there's a dick array with 99.999% availability

  59. Re:Naive to say the least. by grylnsmn · · Score: 3, Funny

    That is one of the greatest subtle Wrath of Khan references I've seen yet.

    Spock: "Admiral, if we go by the book, like Lieutenant Saavik, hours would seem like days."

    Masterful!

  60. Math by jklovanc · · Score: 1

    The number of drives seems to be large. The spare count grows quadratically, so as the cluster gets bigger the number of spare disks gets much bigger.

    Drives  Spares  Total
    5       15      20
    10      55      65
    30      465     495

    That's a lot of disks. There is a point that space and power overcomes the human cost.

  61. Re:Not impressed by Immerman · · Score: 1

    Not a little more reliability, a LOT more reliability.

    A single drive's 100,000h MTBF translates to a 5-nines reliability period of only 1.4427 hours: 0.99999^(100,000h/1.4427h) = 0.5
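
    The 1.4427 h figure can be reproduced with a short sketch. It relies on the assumption used above that half the drives survive to the MTBF. The second function is an extra, hypothetical comparison using an exponential lifetime with mean equal to the MTBF, which gives about one hour instead:

```python
import math

MTBF_HOURS = 100_000
TARGET = 0.99999  # "five nines" survival probability


def period_half_life(mtbf_hours, target):
    """Hours t with 0.5 ** (t / mtbf) == target, i.e. assuming half of all
    drives survive to the MTBF (the assumption in the comment above)."""
    return mtbf_hours * math.log(target) / math.log(0.5)


def period_exponential(mtbf_hours, target):
    """Hours t with exp(-t / mtbf) == target (constant failure rate)."""
    return -mtbf_hours * math.log(target)


print(f"{period_half_life(MTBF_HOURS, TARGET):.4f} h")    # 1.4427 h
print(f"{period_exponential(MTBF_HOURS, TARGET):.4f} h")  # 1.0000 h
```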

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  62. Re:N(N+1)/2 spares by Immerman · · Score: 1

    Reread the summary - N is the number of parity disks, not the number of data disks.
    N parity disks
    N*(N-1)/2 data disks
    N*(N+1)/2 spares
    So roughly the same number of spares as data disks, and the number of parity disks scales as roughly the square root of twice that number. Pretty impressive if you're talking high-capacity data storage with 100s or thousands of data disks.

    Also data reliability is something very different than uptime: you don't lose data for only 5.26 minutes per year - once gone it's gone.
    Meanwhile a single drive's 100,000h MTBF translates to a 5-nines reliability period of only 1.4427 hours: 0.99999^(100,000h/1.4427h) = 0.5

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  63. Re:N(N+1)/2 spares by Immerman · · Score: 1

    N is the number of parity disks - the number of data disks also increases as N-squared.

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  64. So... by guruevi · · Score: 1

    They 'invented' RAIDZ3? Or they are perhaps using ZFS or something similar internally and not telling anyone (like so many in the industry). Sure you can achieve very high reliability using ZFS but most systems maintain those 9's by a) having hot spares and b) replacing disks that failed in a timely manner. They are simply adding more hot spares so a service call is less important, you can just go by and replace 5 disks at a time whenever you need to expand your storage.

    They also forgot to mention that once disks start failing, you could easily have a whole set of them fail. Especially with firmware issues or if someone dropped an entire box in shipping. Once you drop below 2 hotspares/10 disks, you are in serious risk of degrading your system because disks could fail while rebuilding as well.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  65. Why we have RAID and Spares by Virtucon · · Score: 1

    Calculating the system MTBF of 77 drives at 100,000 hours as a subsystem, we'd expect to have a drive failure approximately every 1300 hours. That's not the reality of most observations/environments, but it's enough to have at least a couple of spares on hand, and it's why we have things like RAID 6 and ZFS. It also doesn't necessitate you having tons of spares onsite either.
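
    The arithmetic, as a small sketch. It assumes independent drives with a constant failure rate and that every failed drive is replaced by a spare, so 77 drives stay active; the function names are made up for illustration:

```python
HOURS_PER_YEAR = 24 * 365.25  # assumption: average year length


def hours_between_failures(drive_mtbf_hours, n_drives):
    """Mean time between drive failures anywhere in the array."""
    return drive_mtbf_hours / n_drives


def expected_failures(drive_mtbf_hours, n_drives, years):
    """Expected number of drive failures over `years` of operation."""
    return years * HOURS_PER_YEAR * n_drives / drive_mtbf_hours


print(round(hours_between_failures(100_000, 77)))  # 1299 hours
print(round(expected_failures(100_000, 77, 4)))    # about 27 failures in 4 years
```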

    --
    Harrison's Postulate - "For every action there is an equal and opposite criticism"
  66. Re:Naive to say the least. by hcs_$reboot · · Score: 1

    er, last time I checked, 100,000 hours is 11 years.

    Oh you check that a lot?

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  67. Re:TLDR; 2D arrays wit a ton of spares are reliabl by BronsCon · · Score: 1

    That's why, as the manufacturer of such a system, you refuse to sell it bare. Your customers won't complain if you tell them what the bare cost, cost per disk, and labor cost to install a disk are, and sell disks at cost and with reasonable labor. Make money on your hardware, bring in enough to pay for assembly based on disk install labor.

    That's only step one, though. Start ordering disks when you start your first production run of hardware. Order direct from manufacturer, and from as many suppliers as possible, so you get disks from as many batches as possible. Then, continue placing frequent, but small, orders from whoever can get you the disks the cheapest; it may work out that you can get volume pricing from the manufacturer by telling them "I'm going to need X disks over all and am willing to pay for them up front, but I need them shipped (X/52) per week from current stock at the time of shipping, don't set aside my disks out of the current batch to ship at a future date".

    It's a bit more labor, but compare serial numbers and attempt to color code by batch. Use colored dot stickers for this. When fetching drives for an installation, try and get an even distribution of colors, so you don't have an excess of drives from any given batch, and always record who has which drives, so if you start getting failure reports that indicate a bad batch, you can proactively alert the customers who have those drives that it might be a good idea to have you swap them even if they still appear to be functioning.

    All of that drives up the cost, of course. I'm not going to sit here and do the math to figure out what the cost would be, as there are simply too many assumptions and I have too little time, but if you've nothing better to do and don't mind making a couple dozen, likely provably wrong, assumptions, you can have at it.

    --
    APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
  68. Re:Naive to say the least. by BronsCon · · Score: 1

    I'm sure he used a calculator, seems he simply forgot to divide by 24.

    --
    APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
  69. A solution in search of a problem. by sirwired · · Score: 1

    If service call costs for one or two disks are prohibitive, simply put in enough spares so you only have to roll a tech for, say, 10 drives.

    Alternatively, make them user-swappable. If all the customer has to do is ask their tech to yank drives with a Blinky Amber Light of Doom, even the most untrained monkey could figure that out.

  70. Re:Naive to say the least. by whoever57 · · Score: 1

    Yes, I goofed. However, believing that 11.4 years is what you'll get in practice is also naive,

    No, it's not your basic conversion error that's the problem.

    A MTBF of 11.4 years does not mean that a typical array will have a lifetime of 11.4 years. From Wikipedia:

    Once the MTBF of a system is known, the probability that any one particular system will be operational at time equal to the MTBF can be calculated. This calculation requires that the system is working within its "useful life period", which is characterized by a relatively constant failure rate (the middle part of the "bathtub curve") when only random failures are occurring.

    You are conflating "useful life period" with MTBF. They measure different things.

    --
    The real "Libtards" are the Libertarians!
  71. Enterprise drives have stronger magnets! by mveloso · · Score: 1

    They have stronger magnets because they need to write that data more harder than normal drives.

  72. Re:N(N+1)/2 spares by Immerman · · Score: 1

    That's certainly what it says in the summary. As for distinguishing between parity and spares - I should think that would be obvious: the parity disks are in active use, the array can't detect/correct errors without them. The spares meanwhile are just sitting there, presumably powered down, until one of the active disks needs to be replaced.

    As for the equivalence in the number of spares... I suspect it's not exactly coincidence, more like human nature: "Okay, we've got a cool 2D parity system - let's see just how long it will maintain 5-nines reliability if we give it one spare for every active drive. Over four years! Cool, for the press release let's juice it up a little and rephrase that as 'more than enough for five nines for four years'."

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  73. Or... by AlchemyX · · Score: 1

    You could use ZFS with RAIDZ3 and multiple spares.

  74. Nice Pun dad! by MacColossus · · Score: 1

    Disk Array, "Sans" maintenance.

  75. By the book, Admiral by AkkarAnadyr · · Score: 1

    Spock, if that array isn't rebuilt in two hours, get that rack out of there and back to a Service Bay.

    --

    I bought this house and you know I'm boss
    Ain't no h'aint gonna run me off

  76. Re:Naive to say the least. by SimonInOz · · Score: 1

    every 11 years, or when my inbuilt estimation engine says "these figures are wrong, let's just check that".

    Said engine was especially useful when we used slide-rules (you might have to look that up), as I did at high school. It still is, because the world is full of people who blindly believe stuff.

    Not you of course.

    --
    "Cats like plain crisps"
  77. Re:Naive to say the least. by SimonInOz · · Score: 1

    Even Jupiter's day is 10 hours. (Ok, 9.9, but close enough).

    Maybe if we speeded up the earth's rotation a bit ... yeah, let's do that, make it one hour. Oh boy, effective gravity has gone slightly negative at the equator, we are losing our atmosphere, and cows will fly, perhaps over the moon, though mooing seems unlikely.

    Nah, I vote to leave it alone and do arithmetic properly. Boring, but we should live longer (though maybe not in days).

    --
    "Cats like plain crisps"
  78. Re:Naive to say the least. by BronsCon · · Score: 1

    Err... didn't see who the original bad math was done by. I mean "she"... I think...

    --
    APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
  79. Predictable failure. by leuk_he · · Score: 1

    It also assumes a normal failure of drives. However, modern drives do not always fail normally. They develop slow spots and timeouts from which they might recover.

    Also, the software that creates the redundancy might fail, or it might fail if you do not update the firmware.

    And I am not even talking about catastrophic failure. When a drive overheats you might want to remove it from the datacenter.

    1. Re:Predictable failure. by stoatwblr · · Score: 1

      "However modern drives do not always fail normal. They develop slow spots, timeouts from which they might recover."

      Enterprise firmware marks the spots bad after 7 seconds and carries on. The assumption is that redundancy will cover the loss.

      Consumer drives spend quite a while trying to recover data, on the basis that there's no redundancy, so it's worth a 5 minute hang to try and get the data.

    2. Re:Predictable failure. by leuk_he · · Score: 1

      You are referring to losing a sector on the platter. That is exactly what the study assumes: lose a sector (detect that and do something with that) or lose the disk (have But there might be many more failure modes.
      -power fluctuation.
      -memory problems
      -Software problems. (Ever seen the SAN in a big company having problems... yup, some configuration issue)
      -driver/interface problems.

    3. Re:Predictable failure. by stoatwblr · · Score: 1

      "-power fluctuation."

      Redundant power supplies.

      "-memory problems"

      ECC ram, etc - but that's outside the scope of the disk array design anyway

      "-Software problems."

      Outside the scope of the design

      " -driver/interfaceproblems."

      Ditto

      They need to be taken into consideration for an end-to-end solution, but the approach in the study was strictly to the array level.

      If you design in appropriate redundancy then loss of any component is a routine replacement issue, not an emergency.

  80. Cheaper solution? by Zorpheus · · Score: 1

    How about having like 10 additional spare discs in your rack, and calling the service for replacement when 10 discs died? The cost of the service call does not matter much when it is for many discs at once.

  81. Re:N(N+1)/2 spares by Immerman · · Score: 1

    We propose to eliminate [disk replacement] calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime ...we have simulated the behaviour of two-dimensional disk arrays with N parity disks and N(N – 1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years.

    Are you seriously telling me you read that and get that they're creating a disk array out of spare disks that can provide 5-nines reliability for four years without involving any disk replacement? Methinks you need to invest some serious effort on your reading comprehension skills. Not to mention your sanity-check skills.

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  82. Really? A Paper for this? Saw it in 2000 by addikt10 · · Score: 1

    Some company was doing this in the Bay area in 2000.
    Hotplug is expensive. Cases are expensive. Making room for human access is expensive.
    Design for nothing but airflow and drive density, keeping pieces as absolutely cheap as possible. Gigabit instead of 10G.
    At exabyte scale, why do you care about the loss of 4TB? Using Super Micro boxes w/4TB Drives, you can have over 6 petabytes of raw storage in a 72u rack / cabinet

    Metadata servers keep track of where the copies of blocks are.
    Put copies of the blocks on completely disparate systems. If there is heavy read usage of a block, make more copies.
    Head servers scale and have some beef to them. They are all about getting info from the commodity stuff and packaging it for (subscribers, clients, whatever).

    If a drive dies or has issues - mark it bad and leave it at that. Ignore it.
    If a server dies, mark it as bad. Leave it.
    In 4 years you are forklifting the equipment and replacing it with new storage.

    There is no "RAID", other than there are multiple copies of blocks throughout the system.

    I met with a company in the bay area doing this in 2000 (I don't remember which one). It was dealing with Filesystems and not block, but with NFS, VMDKs, VHD, etc, who cares. I don't see anything new here at all.

  83. Re:Really? A Paper for this? Saw it in 2000 by addikt10 · · Score: 1

    I used the wrong Supermicro box to make my point - I selected the pure storage, vs server with storage.
    So 72 drives instead of 90 per 4U. 5.5 PB per 72U instead of "over 6".
    The rest of my points stand.

  84. NetApp or EMC? by Lost+Penguin · · Score: 1

    I'll believe others when I see the uptime....

    --
    I am the unwilling control for my Origin.
  85. Until the disk drives fail en masse by Antique+Geekmeister · · Score: 1

    This has happened repeatedly. The most notorious example is the "IBM Deskstar", which failed en masse after consistent amounts of use. They destroyed RAID arrays around the world because the individual drives could not be replaced fast enough to secure the data before multiple drives went offline simultaneously.

  86. Wrong N by Chirs · · Score: 1

    They have N parity disks, and then roughly N(N-1)/2 data disks and roughly the same number of spares.

    In larger arrays the overall overhead of the parity and spare disks is slightly over 50%, or roughly equivalent to RAID-1, but more reliable since the spares can be reassigned as needed.
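
    A quick sketch of the disk counts from the summary's formulas (N parity, N(N-1)/2 data, N(N+1)/2 spares). The function name and the dictionary layout are my own. In this sketch the non-data share comes out a little above one half and moves toward one half as N grows:

```python
def layout(n):
    """Disk counts for an array with n parity disks, per the summary."""
    parity = n
    data = n * (n - 1) // 2
    spares = n * (n + 1) // 2
    total = parity + data + spares
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": total,
        "overhead": (parity + spares) / total,  # share of disks not holding data
    }


for n in (10, 30, 100):
    row = layout(n)
    print(n, row["data"], row["spares"], row["total"], f"{row['overhead']:.1%}")
```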

  87. checksummed filesystems by Chirs · · Score: 1

    The solution for this is checksums and parity on the disk contents at the filesystem level. Read a block off the disk and check the stored checksum against what you read...if it doesn't match then use the parity information to correct the data and store it somewhere else.
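
    A toy sketch of that idea, using CRC32 checksums and a single XOR parity block per stripe. This is only an illustration under those assumptions: real filesystems use stronger checksums and more elaborate layouts, and this sketch repairs the block in place instead of relocating it.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Store the blocks with a per-block checksum and one parity block."""
    return {
        "blocks": list(data_blocks),
        "checksums": [zlib.crc32(b) for b in data_blocks],
        "parity": xor_blocks(data_blocks),
    }


def read_block(stripe, i):
    """Return block i, rebuilding it from parity if its checksum fails."""
    block = stripe["blocks"][i]
    if zlib.crc32(block) == stripe["checksums"][i]:
        return block
    # Checksum mismatch: rebuild from the other blocks plus parity.
    others = [b for j, b in enumerate(stripe["blocks"]) if j != i]
    repaired = xor_blocks(others + [stripe["parity"]])
    if zlib.crc32(repaired) != stripe["checksums"][i]:
        raise IOError("block %d is unrecoverable" % i)
    stripe["blocks"][i] = repaired  # a real system might relocate it instead
    return repaired


stripe = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe["blocks"][1] = b"BxBB"  # simulate silent corruption
print(read_block(stripe, 1))   # b'BBBB'
```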

  88. Re:Naive to say the least. by goarilla · · Score: 1

    Most should make it to their second year (>=8640 hours).
    In our small 24 bay array I've seen a lot of those bad Seagate ST3000DM001 fail at ~15000-19000 hours.

  89. Re:TLDR; 2D arrays wit a ton of spares are reliabl by stoatwblr · · Score: 1

    If you run raid 66 (a raid 6 array of raid 6 arrays) then you get that much more protection.

    Not that raid6 is anywhere near good enough since 2TB drives came along. There's around a 10% chance that you'll lose your remaining spare during a parity rebuild from a drive loss on a 12+2 disk array and a 1% chance that you'll lose another drive recovering from that (I've seen it happen).

    This is one of the reasons for considering ZFS raidZ3. One of the other reasons is that because it uses SSD buffering and caching, drive seek activity is smoothed out and heavy head seek is one of the prime life shorteners in mechanical hard drives (I've had identical array hardware using the same batches of drives and the ones which get hit hardest for random IO are the ones where drives fail more often.)

  90. Re:Good Luck With That (tm)... by kmoser · · Score: 1

    Just don't drop the whole thing on a concrete floor, otherwise every platter will fail immediately.