Data Recovery from ReiserFS RAID Array?
Ruatha asks: "We've recently had a problem with a ReiserFS RAID-5 array - two of the disks failed and, of course, some of the people using the array didn't have backups of their data...Ontrack have returned the disks because they can do nothing with them due to the FS we used on the array. Does anyone know of a company that can deal with data recovery from a ReiserFS RAID-5 array?"
Repeat after me, please: The purpose of IT is to help users, not the other way around.
If an employee at this worthy's company lost data, it is the responsibility of the IT department to attempt to recover that data, within reason. That's what the IT department is for. This is a sensitive subject to me, because the IT department at my company closed down the IMAP port on our firewall tonight for what they called "security reasons," despite the fact that (1) we've been running IMAP over that connection for years now, and (2) the connection is encrypted with SSL. It literally took my yelling into the ear of the CTO over the phone, after calling him at home late in the evening, to get this problem fixed. The pervasive attitude of indignant hostility from IT departments in all sorts of industries is really starting to burn me up.
If you worked for me in my IT department, and one of your RAIDs failed, and I had un-backed-up data on it, the only answer I'd want to hear from you is, "Yes, sir, we'll do the best we can and get right back to you." If I even heard a hint from you of the "you were irresponsible so it's not my problem" vein that you showed in your post, you'd find yourself being escorted out of the building carrying your stuff in a cardboard box. And we'd expect you to return the box.
So just keep repeating it to yourself: The purpose of IT is to help users, not the other way around.
Could they have recovered from ext3?
Very likely, yes. Ext2 recovery techniques are well known and documented, and tools (if rudimentary) exist for recovering files from it. I believe that this translates well to ext3.
Also, if you want someone to recover the data and you're willing to drop some money on it...I strongly suspect that there are people on the reiserfs team that would take on a recovery job quite happily. No one knows reiser better!
That being said, there is currently no good, easy-to-use, powerful recovery tool for Linux filesystems, rather depressingly. Now, you *could* argue that this is because the filesystems are so great, but even so...
May we never see th
First of all, since _two_ disks got screwed at the same time, you've lost the "normal" chance of getting the data back. RAID-5 ensures that if ONE disk fails, you can get the data back due to the parity-stuff - but not if two disks fail. You probably know this.
;)
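For anyone who wants the "parity-stuff" spelled out, here is a minimal sketch of the idea. It assumes plain XOR parity over one stripe; the block contents and the helper name `xor_blocks` are made up for illustration and have nothing to do with any real RAID driver.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (made-up contents) plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One disk lost: XOR the survivors with the parity to get the block back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two disks lost: all we can compute is the XOR of the two missing blocks,
# and many different pairs of blocks produce that same value.
residue = xor_blocks([data[2], parity])
assert residue == xor_blocks([data[0], data[1]])
```

With one block missing there is exactly one unknown and one equation. With two missing there are two unknowns and still one equation, which is why at least one of the dead disks has to be read again before the array can be rebuilt.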
So, what needs to be done is to get one of the disks back online again. That *should* be possible, and if nothing really, really bad happened, it should, I think, in theory be as easy as getting a lab to pull the disk platters out and put a new motor / new electronics on them. I'm not sure about this, though.
Preferably it could be done by the disk manufacturer.
You could also check out an excellent company in Norway called IBAS. Check out http://www.ibas.com/america/index.html for their American office. They are really excellent at data reconstruction.
"Rune Kristian Viken" - http://www.nwo.no - arca
This may or may not work; however, I've successfully recovered data about 10 or 12 times using this method.
Find a working drive of the same model, take the electronics board off of it and swap it onto the bad drive. Typically when I have a drive fail, it's the electronics, not the mechanical portion of it. So far, this has worked every time for me; one was a Quantum Fireball, and the rest were all Seagate SCSI drives (some FCAL and some ultrawide).
If you had two disks fail at the same time, chances are it's the electronics. Once you recover the data, I would take a serious look at your RAID controller and possibly replace it. I had a bad RAID controller that kept frying drives, and once I replaced it I didn't have any more problems.
Need Free Juniper/NetScreen Support? JuniperForum
Simultaneous disk failure is about as rare as winning the lotto while simultaneously having you and your friend on the other side of the planet get struck by lightning - unless there's a common larger problem (e.g. power surge to both drives or something).
As a practical example, 4 or 5 years ago I had a large number of disks attached to some large Oracle servers, roughly on the order of 600 or so hard drives in several arrays taking up several racks, all the same manufacturer/model, with a handful of groupings of revision/lot/date.
This set of disks was seeing fairly constant and heavy activity for a few years while I was there. As you can imagine, with 600 disks and the usual MTBF numbers, we quite regularly had disk failure. We kept a few spares onsite and replaced them as they failed, then exchanged the dead drive for a new spare. As I roughly remember it, we probably averaged about one disk failure every 2-3 weeks. Two, perhaps three times, we had a double disk failure during a 24 hour period - but they were never close enough that we didn't have plenty of time to replace the first (and in any case, odds are slim that two failed out of 600 would happen to affect the same data).
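The numbers above can be turned into a rough back-of-the-envelope check. This is only a sketch: it assumes independent failures at a constant rate, takes 2.5 weeks as the midpoint of "every 2-3 weeks", and the 5-disk array size is a made-up figure, not something from the setup described.

```python
import math

DISKS = 600
WEEKS_PER_FAILURE = 2.5          # midpoint of "every 2-3 weeks"
HOURS_PER_WEEK = 7 * 24

# Failures per disk-hour implied by the observed replacement rate.
rate_per_disk_hour = 1 / (DISKS * WEEKS_PER_FAILURE * HOURS_PER_WEEK)
implied_mtbf_hours = 1 / rate_per_disk_hour   # about 252,000 hours

def p_another_failure(other_disks, window_hours):
    """Chance that at least one of other_disks fails within the window."""
    return 1 - math.exp(-rate_per_disk_hour * other_disks * window_hours)

# Any other disk in the whole farm failing within 24 hours of the first.
p_farm = p_another_failure(DISKS - 1, 24)

# One of the other disks in the same array (5-disk array assumed here).
p_same_array = p_another_failure(4, 24)

print(f"implied MTBF: {implied_mtbf_hours:,.0f} hours")
print(f"second failure anywhere within 24h: {p_farm:.2%}")
print(f"second failure in the same array within 24h: {p_same_array:.4%}")
```

Under those assumptions a second failure somewhere in the farm within a day comes out at a few percent per incident, which is in the same ballpark as seeing it two or three times over a few years. A second failure in the same small array within a day is orders of magnitude rarer, unless something the model ignores (shared power, a bad controller, a bad lot) makes the failures correlated.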
Of course, another point back at the original guy with the failed disks - don't use raid 5, chunk out some more money (disks are cheap) and do proper mirroring - and if you stripe use 1+0, not 0+1.
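To see why the order matters, here is a small sketch that enumerates every possible two-disk failure for both layouts. It assumes an 8-disk array, a particular disk numbering, and a 0+1 setup where a stripe with one dead disk is offline as a whole; real controllers may behave differently, so treat the function names and the model as illustrative only.

```python
from itertools import combinations

def survives_raid10(failed, n):
    """RAID 1+0: disks (0,1), (2,3), ... are mirror pairs; dead if a whole pair is gone."""
    return all(not {a, a + 1} <= failed for a in range(0, n, 2))

def survives_raid01(failed, n):
    """RAID 0+1: disks 0..n/2-1 are stripe A, the rest stripe B; dead if both stripes are hit."""
    half = n // 2
    hit_a = any(d < half for d in failed)
    hit_b = any(d >= half for d in failed)
    return not (hit_a and hit_b)

def fatal_fraction(survives, n):
    """Fraction of all two-disk failures that take the array down."""
    pairs = list(combinations(range(n), 2))
    fatal = sum(1 for p in pairs if not survives(set(p), n))
    return fatal / len(pairs)

n = 8
print("1+0:", fatal_fraction(survives_raid10, n))
print("0+1:", fatal_fraction(survives_raid01, n))
```

In this model, 4 of the 28 possible double failures kill the 1+0 array (both halves of one mirror), while 16 of 28 kill the 0+1 array (any one disk from each stripe).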
11*43+456^2
Heh, I was working on a big LDAP-based site once. We had two LDAP servers and a load balancer to fail one out if it shit itself. Couple of months in, come in one morning to discover LDAP down and the site wiggling its feet in the air. Why didn't the load balancer fail over onto the other LDAP?
Once we'd done the autopsy we discovered that it did.... days ago and nobody noticed. Hence when the second LDAP went as well we were left high and dry.
Dave
I write a blog now, you should be afraid.
So users who shirk defined job responsibilities can foist the blame onto their unknowing IT departments?
I find that pill a little hard to swallow. I would however examine the chain of decisions that allowed the implementation of what would appear to be an 'unsupported' FS for what is valuable, if not critical data.
Note: Now I know that ReiserFS is an actual standard with a vocal and dedicated following, and it serves a useful purpose being a journaling alternative to ext3. But if it cannot be accommodated in your disaster recovery plan, it would have to be considered 'unsupported' against that measure.
All in all, I would have to agree that the critical misstep was in the bailiwick of IT, but I would hesitate to call it a failure in execution. It was a failure in planning, which has deeper roots in project management and communication.
But hey, someone's gonna burn some oil trying to get back those numbers. Good luck to the original poster.
"If I wanted your input on my pet project, I'd stick my hand up your ass and use you like a sock-puppet." - Muse
This is certainly true, but you should consider the flipside of it. The typical way it works with IT departments is that they are given unfunded mandates right and left. There is no possible way they can do everything with the money they have. What should happen is that some stuff should be taken off their plate. But they rarely have the political pull needed to do that, so what actually happens is that either everything is done poorly or the IT guys work on what they think is important.
So before you go pointing fingers at the IT department's attitude, it would be good to ask, "Did they tell the management that they needed a way to back up those machines? And did the managers give them the necessary time and funds?"
Every IT person I know with a bad attitude didn't start that way; they acquired it through years of crappy management.
Assuming you can find one of them, use Google, for god's sake. Get them to read the drive linearly. At this point, they should be able to put each drive onto new drives for you. Build a raidtab that matches the new drives they give you, do raidstart /dev/mdXX, do a fsck.reiserfs on it. Mount the thing. Poof, like magic, you have data again.
I've heard of several people recovering Ph.D. and master's dissertations this way. It's why the DoD has such strict standards about destruction and disposal of a hard drive. It's very difficult to delete data from a drive so it can't be recovered. It can just get out-of-control expensive to recover the data.
If it's worth $5-10K, and you can deal without having it for several days, this might actually work... If the company bitches, just tell them that it's about what a good tape setup would have cost if they had bought one.
The two guys who said try stripping the electronics are pretty brave SOBs, but that might be worth trying if you can't spring for some experts to deal with that for you.
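"Read the drive linearly" roughly means copying every block in order and not giving up at the first unreadable one. A minimal sketch of that idea follows, assuming a Unix-like system (it uses `os.pread`); the function name `image_linearly` and the 64 KiB block size are arbitrary choices for illustration, and a recovery lab will have its own tooling for this.

```python
import os

def image_linearly(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the list of byte offsets that had to be zero-filled.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    chunk = os.pread(src, want, offset)
                except OSError:
                    chunk = b""          # unreadable block
                if len(chunk) < want:
                    bad_offsets.append(offset)
                # Pad so every later block keeps its original offset.
                dst.write(chunk.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

Keeping the offsets intact is the point: the image has to be the same size and layout as the original disk, otherwise the RAID layer cannot line the stripes up again. Any fsck should then be run against copies of the images, never against the only surviving read of a failing drive.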
Kirby
For me the guys are www.vogon-data-recovery.com
There are others, but I always seem to see these guys at the forefront...
just my 2 pence..
That is not a smart strategy.
A lot of storage is very finicky -- and odd low-level problems emerge when you mix disk manufacturers or even disk models in some cases.
When Sun 5200 fibre arrays first came out, if you got a batch of Sun-branded drives that were manufactured by different vendors, you would have all sorts of odd locking issues and other goodies.
A good strategy for reliable storage:
- Don't be cheap. You get what you pay for.
- Plan for disk-to-tape (or DVD) backup, and actually test restores regularly
- Use well-supported, stable versions of filesystems
- Don't buy the latest and greatest unless there is a business reason to do so
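On the "actually test restores" point, one simple way to check a test restore is to compare checksums of the original tree against the restored copy. The sketch below assumes both trees are reachable as ordinary directories; the function names are made up for illustration, and on live data some mismatches will just be files that changed after the backup ran.

```python
import hashlib
import os

def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests

def restore_problems(original_root, restored_root):
    """Return (missing, changed) relative paths in the restored tree."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    missing = sorted(p for p in original if p not in restored)
    changed = sorted(p for p in original
                     if p in restored and original[p] != restored[p])
    return missing, changed
```

An empty `missing` and `changed` on a regular schedule is about the cheapest evidence you can get that the tapes are good for something.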
Conformity is the jailer of freedom and enemy of growth. -JFK