Recovering a Wrecked RAID
Dr. Eggman writes "Tom's Hardware recently posted an article specifying how the professionals at Kroll Ontrack recover data from a RAID array that has suffered a hard drive failure, allowing for recovery of even RAID 5 arrays suffering two failures. The article is quick to warn this is costly, however, and points out the different types of hard drive failures that occur, only some of which are repairable. Ultimately the article concludes that consistent backups and other good practices are the best solution. Still, it provides an interesting look into the world of data after death."
Any fanboys or critics of RAID5 here?
It takes far too many pages to say what could actually fit in a page or two.
Never put all of your eggs in one little basket (RAID or otherwise)! For the love of God, if your data is critical, you need a backup *and* an offsite backup. At least one of each. There are no exceptions to this rule.
People often pooh-pooh software RAID (it is more of a pain to manage). But when it comes to recovery, it's what you want: you know the disk format and have the tools. Of course, you really shouldn't have to recover; you should keep good backups or another mirror if it's that important.
Could these articles be any more annoying to read?
They painstakingly
NEXT PAGE
pull data
NEXT PAGE
off the
NEXT PAGE
damaged drive
And tell them to cause more damage next time or I will tell everyone they secretly admire Steve Ballmer's commitment to developers.
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
HTH.
OK, this is for the very extreme (and rare) cases where the disk is physically very damaged. Most of the time, you'll find that available tools are enough. See http://en.wikipedia.org/wiki/SpinRite, for example. It has worked for me, but: 1. Copy the entire disk contents first. 'Low-level' disk-to-disk duplication utilities (Seagate...) can work fine here. 2. Be prepared to wait. Of course, if your disk is on its way out, the intensive reading (and writing, in the case of SpinRite) may accelerate its demise. Keep the disk at a constant, cool temperature (stick it in a domestic freezer if you've no aircon).
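To make the "copy the entire disk first" step concrete, here is a minimal Python sketch of a block-by-block copy that keeps going past unreadable regions. The function name `copy_with_skips`, the 4096-byte block size, and the choice to zero-fill bad blocks are all assumptions made for illustration; this is not how SpinRite or any vendor utility works internally.

```python
import io


def copy_with_skips(src, dst, total_size, block_size=4096):
    """Copy total_size bytes from src to dst, zero-filling blocks that fail to read.

    Returns the list of byte offsets whose read raised OSError.
    """
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            # Unreadable block: remember where it was and move on.
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads with zeros so later data keeps its offset.
        dst.write(data.ljust(want, b"\0"))
    return bad_offsets


if __name__ == "__main__":
    # Stand-in for a real device: an in-memory "disk" that reads cleanly.
    source = io.BytesIO(b"x" * 10000)
    image = io.BytesIO()
    print(copy_with_skips(source, image, 10000))
```

Working from the copy rather than the failing disk means the recovery tools can take as long as they like. Dedicated tools such as GNU ddrescue handle retries and bad regions far more carefully than this sketch does.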
http://www.tomshardware.com/2007/02/14/raid_recovery/print.html
I don't know why TH has printer friendly pages that they don't ever link to.
[Fuck Beta]
o0t!
I have a concern with the recommendations given in the introduction:
We assume that all hard drives will be handled with care, so they should be installed in suitable drive bays. If you use multiple drives, we recommend removable drive frame solutions, which help reduce vibration transfer onto the computer chassis and even back to individual hard drives. Make sure that your system has sufficient ventilation, so high speed hard drives won't overheat.
I've found the removable drive frames available for cheap consumer hardware to be total crap. The metal enclosure keeps heat close to the drive, and the tiny fans used don't move nearly as much air past the drive as when it's inside the case, being cooled by the airflow of the case fans. The drive temperature is therefore higher even under the best conditions. In addition, the smaller fans fill with gunk quickly and as a result wear out faster than larger ones, leading regularly to a drive trapped in an uncooled box.
I've used enclosures from Promise, Enermax, and several other companies whose products were so bad I tried to forget their names; all had fans that instantly became the least reliable part of the entire system once I installed the drive frame, and I wasn't happy with the drive's temperature from day one.
I don't think the person making this comment at Tom's ever keeps systems running long enough to realize the long-term issues that come with anything cheaper than server-grade drive enclosures for hard drives. I'd welcome suggestions for a better quality product in this category. It's a hard subject to cover, because by the time you've had several units set up for a year or two to gather useful data on how rugged they are, the product is obsolete; that's not something any review site I'm aware of is set up to cover.
Sony ha
SpinRite is a Steve Gibson product. Steve Gibson is a pompous blowhard with few real skills. There are plenty of other ways to do a low level copy of a disk.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
RAID 11? Or, more to the point, how would I implement a mirror, but with 3 drives? Does Linux 'md' do this? How about any controllers?
After all, we're supposed to replicate data 3 times, right?
A Sun D1000 loaded with latest-generation 300GB disk drives? Not a bad solution, but slow, and not the cheapest.
Apple Xserve RAID? Cheapest - does it work reliably with Linux or Solaris? Word on the street is that it does, but I have not seen a demo yet.
We're actually going with recycling our ancient D-1000s and A-1000s with no-name 300 GB SCSI drives. Pretty old school, but reliable.
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
With recent articles on HDDs not being very good for redundancy (because they often fail at the same time if they are from the same batch, or fail because of things like electrical spikes which affect all drives in an array) it is clear that HDDs are not an ideal backup medium. I use an external 2.5" HDD which is totally disconnected from the PC and everything else when not in use (to avoid power surges etc), but only for critical data as my machine has 1.2TB of HDD storage.
Optical discs are a joke - 4.3GB is just not enough. Larger formats exist but are relatively expensive. Tape is expensive per MB and slow, plus it isn't random access and not suited to anything but slow full backups. MO is too small and expensive.
It seems like the best bet is something like a Century Tower - basically a USB enclosure that can take up to 4/8 drives. Keep it totally disconnected when not in use, and use RAID 1 mirroring with drives from different manufacturers.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Why is there even a criticisms section?
OK, he's not always right and may be egotistical...
And that makes him different from other people how?
Linus Torvalds is constantly getting in disputes with kernel hackers and such. There's no "criticisms" section on his entry?
Oooh, and this just happened to me a few weeks ago. Well, not quite, but close enough.
I had an LVM container that sat on a RAID-1 volume go bad.
The LVM tools couldn't reconstruct the container, so I effectively 'lost' my partitions.
There wasn't any program I could find which would scan the RAID volume for the data partitions,
so I ended up cobbling one together on my own, out of the sources in the ext2-tools distro.
And yes, I did get my data back, and no, I'm no longer using LVM containers.
Support FSF: Stop thinking with your wallet, and think with your imagination. (cc/non-commercial)
Oh, I'm just trolling. Ignore me. It was mostly an excuse to make a "hack the gibson" joke. It was a dumb joke anyway.
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
Linux md RAID-1 allows you to replicate to n number of drives, PLUS set m more drives as spares that will be automatically substituted for failed drives without intervention. You can spread the drives among as many controllers as you want.
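As a rough illustration of the n-mirrors-plus-m-spares layout described above, this Python sketch only assembles an `mdadm --create` command line; it does not run anything. The helper name `mdadm_mirror_cmd` and the device paths are hypothetical, and the sketch assumes the usual mdadm convention that the first `--raid-devices` devices listed become active members and the remainder become spares.

```python
def mdadm_mirror_cmd(md_device, active, spares=()):
    """Build the argv for creating an n-way md RAID-1 mirror with optional hot spares."""
    if len(active) < 2:
        raise ValueError("a mirror needs at least two active devices")
    cmd = [
        "mdadm", "--create", md_device,
        "--level=1",
        f"--raid-devices={len(active)}",
    ]
    if spares:
        cmd.append(f"--spare-devices={len(spares)}")
    # Active members are listed first, spares last.
    cmd += list(active) + list(spares)
    return cmd


if __name__ == "__main__":
    # Hypothetical device names: a 3-way mirror with one hot spare.
    argv = mdadm_mirror_cmd(
        "/dev/md0",
        ["/dev/sda1", "/dev/sdb1", "/dev/sdc1"],
        ["/dev/sdd1"],
    )
    print(" ".join(argv))
```

Review the printed command by hand before running it as root, since creating an array overwrites metadata on the listed devices.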
Of course you need off site backups too (fire, theft, lightning, human error).
Why don't people just run hot spares in their arrays? I lost a drive in my primary RAID 50 array and it re-built it from the hot spare in less than an hour. Now I'm down to "only" 1 hot spare left and I have plenty of time to replace the other one before I get worried.
Of course we also have 6-hour off-site snapshots to another RAID 50 and 3 month tape archives.
Keep in mind this is for an academic computer system, so we're pretty happy with the level of reliability.*
*Except that when some user creates 78,000 files in his home directory it causes rsnapshot to fail because the GNU cp command can't handle that many files in one directory...
I'm a big fan of the hard drive->freezer method. It has been alleged that putting a broken hard drive into a freezer can sometimes make the data readable again for a short period of time.
This is good reading:
http://storagemojo.com/?p=383
Short synopsis for those who don't want to read it: The rebuild process is intense enough to cause secondary failures in many more cases than you'd think. That you haven't seen it yet is not indicative of the overall population, and sysadmins are paid to be prepared.
The rest of your post is arguable, but it's more a matter of opinion and practice than anything else.
Besides having a backup not connected to the system, I found simply having a spare disk to steal the circuit board off of to be a life saver :)
I miss the old Bigfoot drives we had. Everyone said they had problems with them, but it was always (in our case) the board that died, NOT the disk. I saved a couple of those by swapping in a board for a 1 hour recovery.
If you buy several HDs for RAID or whatever, buy one more and stick it on the shelf for a rainy day. Along with a few utilities you can do 3/4 of what they do for $100 instead of $1000+
Really, a wrecked raid can be fixed pretty easily if you have enough warlocks to get everyone a soulstone.
Your mind is clear / The things that you fear / Will fade with how much you / Believe what you hear
When a disk fails in a RAID, it becomes an AID.
Or in the case of RAID-1, it becomes just an ID.
Most people don't even think inside the box.
I attended a small conference where the Kroll VP of Data Recovery was speaking. He came in, his assistant set up his PowerPoint stuff, made sure the projector was right etc. He then gave a very interesting talk about what Kroll could pull off of a drive, despite what had been done to it. By way of example he showed a slide of a burnt and bent hard drive that came out of the sky when the shuttle broke up. They recovered 99% of the data on that drive. He also mentioned that they do the data recovery for all of the spook organizations in D.C.
When we broke for lunch I got to sit at his table and we got to ask him all sorts of questions about their processes. He mentioned they have things they use that they have never patented because it would be too much of a leg up for both the competition and those that seek to destroy data. We tried to get him to tell us what we would have to do to a drive to make it unreadable. Mostly his answers to our "Surely this would make the data unreadable" queries were "You would think that would work wouldn't you?" Someone referenced his assistant who was sitting next to him and the VP said:
"Him? No, no, no. (laughs) He is not my assistant, in fact he doesn't work for me at all. He is a lawyer for the company and is here to make sure I don't say anything I am not supposed to." The assistant then gave us one of those 'I could eat you alive' lawyer smiles.
I walked out secure in the knowledge that short of melting the platters down the data can *always* be recovered.
Sera
Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
As long as you know how the RAID config was set up (stripe size), most disk recovery programs will do the job just fine. GetDataBack NTFS is a functional and simple tool to use as long as you know how the disks were set up. Including RAID5... I've rebuilt 3 RAID5's and a shitload of 0's, 1's, and 01's. You should see the look on some of these people's faces after you're done (with all 18+ hrs of it...). The problem I usually find is that if you recovered the data, then the customer is usually under the impression that you *fixed* the disk and they can keep on using it without replacing it... so yeah, it's not a big deal, it's just a question of how much time you want to spend and how much time you have to finish the job.
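To show why the stripe size matters so much, here is a minimal Python sketch that reassembles a RAID 0 volume from its member images. It assumes the simplest possible layout: members given in their original order, data starting at offset 0 on each member, and no controller metadata. The name `destripe_raid0` is hypothetical and has nothing to do with GetDataBack's internals.

```python
def destripe_raid0(disks, stripe_size):
    """Interleave equal-sized member images back into the logical volume."""
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    size = len(disks[0])
    if any(len(d) != size for d in disks):
        raise ValueError("member images must be the same size")
    out = bytearray()
    for offset in range(0, size, stripe_size):
        # One stripe row: take the next chunk from each member in order.
        for disk in disks:
            out += disk[offset:offset + stripe_size]
    return bytes(out)


if __name__ == "__main__":
    # Two toy 8-byte "disks" that were striped in 4-byte chunks.
    disk0 = b"AAAACCCC"
    disk1 = b"BBBBDDDD"
    print(destripe_raid0([disk0, disk1], 4))
    # Guessing the wrong stripe size scrambles the chunk order.
    print(destripe_raid0([disk0, disk1], 2))
```

With the wrong stripe size or member order the output is the same bytes in the wrong sequence, which is why recovery tools ask for those parameters up front.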
``Still, it provides an interesting look into the world of data after death.''
Death before data!!!
Many complete RAID 5 failures can be "forced" back online by a good hardware RAID card BIOS, saving the money that would have been paid to Ontrack.
That said, RAID is no substitute for a backup.
All you need is the ability to read and write from at least one of the two failed drives, even if the drive has some bad sectors.
Install a new drive in place of the drive that is completely dead, and force the "only half dead" drive to work.
Many times you can recover your data this way.
Newer RAID cards can even fail "stripes" in an array, to "write around" bad sectors on hard drives while not losing the entire array.
You may lose the data that was in the "bad" stripe, but the rest of the array still contains useful data.
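The reason one readable drive out of two failed ones is enough comes down to XOR parity: in a RAID 5 stripe, any single missing block equals the XOR of all the other blocks, parity included. The Python sketch below shows that arithmetic on toy 4-byte blocks. The function names are hypothetical, and real controllers also rotate parity across drives, which is ignored here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block of a stripe from all the surviving ones."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    d0, d1, d2 = b"abcd", b"1234", b"wxyz"
    parity = xor_blocks([d0, d1, d2])
    # Pretend the drive holding d1 died: rebuild it from the rest.
    print(rebuild_missing([d0, d2, parity]) == d1)
```

With two blocks of a stripe missing there is only one equation for two unknowns, so at least one of the failed drives has to give up its data before the other can be rebuilt.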
I am the unwilling control for my Origin.
It doesn't.
An alternative to a drive in a removable bay frame is in a SATA external enclosure. It has the advantage of a separate power supply (I've known PC power supplies to fail in a way that sends mains through the DC lines, frying everything). An external RAID1 (mirror) disk on hot-plug SATA is also something you can quickly grab in case of a fire, or to take somewhere else to mount read-only.
Last week, I did a data recovery for a client that had multiple disk head crashes from a power outage, or a kick or something. The drives were doing a click-seek, which for the most part is unrecoverable.
Popped in a Helix disk, and checked what the MFT was doing. Lo and behold, no MFT, no boot sector, and a huge list of bad sectors. Basically, the crash had resulted in a bad sector in the bad sector table, and all over the first portion of the disk.
These were 200GB disks, but eventually I was able to get a sector repair program to read through and do a non-destructive repair. Data was safe, but was now corrupt. Next step was to repair the data, and I was finally able to just use chkdsk to repair.
Eventually, it was back to real data, and I was able to push the data over to a new replacement hard drive.
Told the client to invest in RAID 1, but seriously doubt they would be willing to spend that $100 for the RAID. Instead, they prefer to pay $1000 for a repair.
BACKUPS. Make lots of BACKUPS. RAID your stuff, and get those backups offsite. Do them regularly. Seriously, it would save your ass if something happens. For example, I have a LAN HD that is parked out in a shed in my backyard. Total cost $200, and has already saved my ass 2x.
Anyone who thinks that RAID is enough for data protection shouldn't be in IT. Now, there are some nifty solutions, such as the stuff from Copan, but there isn't an alternative to backing up one's data, and off-siting it.
Again - if you believe that you shouldn't back up the data that's on a RAID array, get the fuck out of the profession - we don't need you, and you're going to cause serious harm.
How can you recover a RAID 5 array if your disk controller fails?
For instance, in a home desktop-turned-server with a motherboard-chipset-based controller and 3-5 disks in the array.
If the motherboard fails and is replaced, won't the disks be overwritten when reconfiguring the array?
TFA is just an advertisement for Kroll Ontrack. Basically "you can phone Kroll Ontrack ..", "the Ontrack data recovery specialist will ..", WTF?
After all, we're supposed to replicate data 3 times, right?
:^)
That's why my server has "New Folder", "New Folder(2)", and "New Folder(3)" on it.
As much as this stuff is cool, it's going to be insanely expensive to restore data from these guys.
Data integrity and uptime are served by RAID5. If it's not good enough, then it should be backed with mirroring (RAID5+1) or some form of dual-parity RAID (RAID-DP from NetApp, etc.).
But data gets lost or corrupted, even without disk failures. Backups are the place where data recovery is done. DO YOUR BACKUPS!
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
http://docs.sun.com/app/docs/doc/819-5461/6n7ht6q
For example:
But pretty much agree with everything else you've said... Though a hot spare is always nice.
Deleted
-b.
I detest Tom's Hardware for just your point. The article assumed these were hardware (like 3ware) raids. The reviewer said the formats are highly proprietary, making them difficult to decode and recover. TH is nothing but a Windows fanboy suckbutt site.
Why not Linux soft raid in a Samba server? Soft raid 1 is screaming fast on reads, great for photographers scanning the day's work, and you'll never see the read lag over the network. The format is well known. And all raid is soft anyway, even if the code is held in a ROM Bios.
Look at smeserver.org, it automates raid. ClarkConnect ain't bad, either. With either one, a Rosewill SATA card without raid, and a couple of monster WDs in an old HP Pavilion that used to run Win98, you're good to go. And keep your stacks of backup DVDs in a safe deposit box.
Isn't "RAID array" redundant?