Proposed Disk Array With 99.999% Availability For 4 Years, Sans Maintenance
Thorfinn.au writes with this paper from four researchers (Jehan-François Pâris, Ahmed Amer, Darrell D. E. Long, and Thomas Schwarz, S. J.), with an interesting approach to long-term, fault-tolerant storage: As the prices of magnetic storage continue to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behaviour of two-dimensional disk arrays with N parity disks and N(N – 1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years. We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures.
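For a sense of scale, the disk counts quoted in the abstract can be tabulated for a few values of N. This is only a rough sketch: the function name `array_counts` is made up for illustration, and the spare count is the N(N + 1)/2 figure the abstract calls "more than enough", not necessarily the minimum the authors found.

```python
def array_counts(n):
    """Disk counts for the two-dimensional array described in the abstract.

    n parity disks, n(n - 1)/2 data disks, n(n + 1)/2 spare disks.
    """
    parity = n
    data = n * (n - 1) // 2
    spares = n * (n + 1) // 2
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": parity + data + spares,
    }


# Print the counts for a few illustrative values of N.
for n in (4, 6, 8):
    print(n, array_counts(n))
```

Note that the total works out to N + N², so the spares alone outnumber the data disks at every N.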
So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server". So I figured I'd just download the file with wget instead. Nope, 403 forbidden.
Looks like fetch works though. If anybody else has trouble getting the file, try my local mirror.
I read the internet for the articles.
The bottom line is that having a lot of spare disks makes a 2D array reliable over time. These 2D array configurations stay reliable because they have many spares available to automatically replace failed disks:
Data  Parity  Spare
  12       3     13
  12       3     14
  24       6     20
  36       9     26
To understand the above table, we'll use the first row as an example. An array made up of 1TB disks with 12TB of data space would have 3TB of parity and 13 spare 1TB drives, for a total of 28 drives to get 12 drives' worth of net storage.
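The same arithmetic can be run over every row. A quick sketch, with the rows copied from the table above; the helper name `summarize` is just for illustration:

```python
# (data, parity, spare) drive counts, copied from the table in this comment.
rows = [(12, 3, 13), (12, 3, 14), (24, 6, 20), (36, 9, 26)]


def summarize(data, parity, spare):
    """Return (total drives, fraction of drives holding user data)."""
    total = data + parity + spare
    return total, data / total


for data, parity, spare in rows:
    total, usable = summarize(data, parity, spare)
    print(f"{data:>4} {parity:>6} {spare:>5}  total={total:>3}  usable={usable:.0%}")
```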
What they didn't mention is that the same reliability can be achieved with only three spares, by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule to replace the failed drive "some time in the next two months", that probably won't be costly.
100,000 hours = 273 years. Does anyone believe that?
Everyone except you apparently.
When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
A lot of high-end equipment does have fairly large capacitors that hold power long enough to do a clean power off.
I remember back in the 1990s, some PC-centric folks were looking inside a Sun workstation, and they were surprised by all the large capacitors on the motherboard. In short, they give the system enough time to finish its final calculation before the power goes out.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
"More work is still needed to define policies that would allow array users and manufacturers to detect unusually disk failure rates and take the appropriate actions before any data loss takes place." - Last line in the conclusion.
This implies that not all the spare drives are active and ready to go all the time and that some/most would be kept powered down as cold spares. Of course this same guy is likely to get another paper done where he examines the cost to run the array and how many drives could be left cold and still achieve the 5-9s reliability. Heck, if the software managing the drives is smart, it would rotate active/spare drives in and out, working them in quickly to get them all past the 'first 18 months high failure' rate to the sweet spot, then swap in and out over the lifespan of the array to enable the array to be at highest reliability for longer.
Hrmm, maybe I should look at building such an algorithm, a quick google search doesn't turn any such systems up.
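For what it's worth, the rotation idea could start out as something like the sketch below. Everything here is hypothetical: the `Drive` record, the `pick_swap` and `rotate` names, and the 18-month burn-in threshold (taken from the 'first 18 months' remark above) are assumptions, not anything from the paper. A real array would also have to copy or rebuild data onto the woken drive before resting the old one, which this sketch ignores.

```python
from dataclasses import dataclass

# Roughly 18 months of power-on time; an assumed burn-in window.
BURN_IN_HOURS = 18 * 30 * 24


@dataclass
class Drive:
    name: str
    powered_hours: int
    active: bool


def pick_swap(drives):
    """Return (active drive to rest, spare to wake), or None if no swap is useful."""
    # Spares that have not yet been through the burn-in window.
    spares = [d for d in drives if not d.active and d.powered_hours < BURN_IN_HOURS]
    # Active drives that are already past burn-in and can safely be rested.
    actives = [d for d in drives if d.active and d.powered_hours >= BURN_IN_HOURS]
    if not spares or not actives:
        return None
    wake = min(spares, key=lambda d: d.powered_hours)
    rest = max(actives, key=lambda d: d.powered_hours)
    return rest, wake


def rotate(drives):
    """Perform one swap in place; return True if a swap happened."""
    swap = pick_swap(drives)
    if swap is None:
        return False
    rest, wake = swap
    rest.active, wake.active = False, True
    return True
```

Calling `rotate` on a schedule would walk each cold spare through its early-life period while a proven drive takes a break.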
...
Yeah, we get it. You like to deal with cutting-edge stuff. Now get off my lawn.
Sent from my Commodore 64.
Just a few things I thought of while looking at this study:
The authors are using Backblaze data. Backblaze uses consumer-grade SATA disks, which aren't going to be as reliable as the enterprise SATA/SAS disks we would use.
I'm willing to bet that none of the authors of this paper have ever had to pay for colocated rack space, power, and cooling either. They've just doubled the RU that I need for storage. At $1500.00 - $2000.00 per rack, that adds up.
Doubling the rack space I need for storage so I can avoid a few service calls by my storage vendor over 5 years simply isn't efficient.
We've installed close to 500TB of archival storage using commodity hardware and 2-3TB nearline SAS. We have maybe 3 hands-and-eyes calls per year for disk replacement.
Anyway - just rambling.
I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.
The question posed is whether the human intervention (labor charge) saved is worth more than the power costs.
That is one of the greatest subtle Wrath of Khan references I've seen yet.
Spock: "Admiral, if we go by the book, like Lieutenant Saavik, hours would seem like days."
Masterful!
Sloppy calculation tip: 24*365 = 10000.
If you're sloppy enough to accept that premise, then at 10 cents/kWh, a watt costs a dollar per year. It turns your $28 into $32, but hey, close enough. When I'm shopping, I can add up lifetime energy costs really fast, without actually being smart. Nobody ever catches on!
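To see how sloppy the tip really is, here is a small sketch comparing the honest 8760-hour year with the rounded 10,000-hour one. The function name `yearly_cost` and the default 10 cents/kWh rate are illustrative choices matching the numbers above.

```python
HOURS_PER_YEAR = 24 * 365        # 8760, the real figure
SLOPPY_HOURS_PER_YEAR = 10_000   # the rounded figure from the tip


def yearly_cost(watts, dollars_per_kwh=0.10, hours=HOURS_PER_YEAR):
    """Cost in dollars of running a constant load for one year."""
    return watts / 1000 * hours * dollars_per_kwh


# A load whose true yearly cost is $28 ...
watts = 28 / yearly_cost(1)
# ... and what the sloppy rule says that same load costs.
sloppy = yearly_cost(watts, hours=SLOPPY_HOURS_PER_YEAR)
print(f"{watts:.1f} W: exact ${yearly_cost(watts):.2f}, sloppy ${sloppy:.2f}")
```

The shortcut overestimates by about 14 percent (10000/8760), which is where the $28 to $32 gap comes from.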
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
This is how we're going to bring our keepers to their knees, and eventually break out of the Matrix. We spend imaginary money on imaginary storage and then put all sorts of high-entropy stuff on it and run calculations to verify that it's really working, but they have to spend actual, real resources to emulate it.