Data Recovery from ReiserFS RAID Array?

Posted by Cliff on Tuesday September 17, 2002 @04:40PM from the someone-get-the-bulk-eraser-STAT!! dept.

Ruatha asks: "We've recently had a problem with a ReiserFS RAID-5 array - two of the disks failed and, of course, some of the people using the array didn't have backups of their data...Ontrack have returned the disks because they can do nothing with them due to the FS we used on the array. Does anyone know of a company that can deal with data recovery from a ReiserFS RAID-5 array?"

13 of 62 comments

Re:Responsibility by foobar104 · 2002-09-17 17:00 · Score: 4, Insightful

Repeat after me, please: The purpose of IT is to help users, not the other way around.

If an employee at this worthy's company lost data, it is the responsibility of the IT department to attempt to recover that data, within reason. That's what the IT department is for. This is a sensitive subject to me, because the IT department at my company closed down the IMAP port on our firewall tonight for what they called "security reasons," despite the fact that (1) we've been running IMAP over that connection for years now, and (2) the connection is encrypted with SSL. It literally took my yelling into the ear of the CTO over the phone, after calling him at home late in the evening, to get this problem fixed. The pervasive attitude of indignant hostility from IT departments in all sorts of industries is really starting to burn me up.

If you worked for me in my IT department, and one of your RAIDs failed, and I had un-backed-up data on it, the only answer I'd want to hear from you is, "Yes, sir, we'll do the best we can and get right back to you." If I even heard a hint from you of the "you were irresponsible so it's not my problem" vein that you showed in your post, you'd find yourself being escorted out of the building carrying your stuff in a cardboard box. And we'd expect you to return the box.

So just keep repeating it to yourself: The purpose of IT is to help users, not the other way around.
Linux FS recovery techniques by 0x0d0a · 2002-09-17 17:30 · Score: 4, Informative

Could they have recovered from ext3?

Very likely, yes. Ext2 recovery techniques are well known and documented, and tools (if rudimentary) exist for recovering files from it. I believe this translates well to ext3.
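
One small, concrete piece of those techniques is simply confirming which filesystem a recovered image actually holds. Below is a minimal Python sketch, assuming the commonly documented on-disk layouts: the ext2/ext3 superblock starts 1024 bytes into the volume with the little-endian magic 0xEF53 at byte 56 inside it, and the ReiserFS 3.x superblock starts 64 KiB in with its magic string at byte 52. The function name `detect_fs` is hypothetical; check the offsets against the e2fsprogs/reiserfsprogs sources before relying on it.

```python
import struct

# On-disk offsets as commonly documented; verify against e2fsprogs/reiserfsprogs.
EXT2_MAGIC_POS = 1024 + 56          # superblock at 1 KiB, s_magic at byte 56
EXT2_MAGIC = 0xEF53                 # stored little-endian
REISER_MAGIC_POS = 64 * 1024 + 52   # superblock at 64 KiB, s_magic at byte 52
REISER_MAGICS = (b"ReIsErFs", b"ReIsEr2Fs", b"ReIsEr3Fs")


def detect_fs(image: bytes) -> str:
    """Guess the filesystem from the first ~128 KiB of a raw volume image."""
    if len(image) >= EXT2_MAGIC_POS + 2:
        (magic,) = struct.unpack_from("<H", image, EXT2_MAGIC_POS)
        if magic == EXT2_MAGIC:
            return "ext2/ext3"
    for sig in REISER_MAGICS:
        if image[REISER_MAGIC_POS:REISER_MAGIC_POS + len(sig)] == sig:
            return "reiserfs"
    return "unknown"
```

On a RAID-5 array, feed it the start of an image of the assembled md device (e.g. the first 128 KiB read from a hypothetical `md0.img`), not an individual member disk, because the filesystem is striped across the members.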

Also, if you want someone to recover the data and you're willing to drop some money on it...I strongly suspect that there are people on the reiserfs team that would take on a recovery job quite happily. No one knows reiser better!

That being said, there is currently no good, easy-to-use, powerful recovery tool for Linux filesystems, rather depressingly. Now, you *could* argue that this is because the filesystems are so great, but even so...

--
May we never see th
You probably know this, but.. by arcade · 2002-09-17 17:33 · Score: 4, Informative

First of all, since _two_ disks got screwed at the same time, you've lost the "normal" chance of getting the data back. RAID-5 ensures that if ONE disk fails, you can rebuild its contents from the parity data on the remaining disks, but not if two disks fail. You probably know this.
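
The parity argument fits in a few lines. This Python sketch models a single RAID-5 stripe as equal-sized data blocks plus one XOR parity block; it deliberately ignores the parity rotation and chunking a real RAID-5 layout uses, and the helper names are hypothetical. Any one missing member is the XOR of all the survivors, but with two members gone there is one equation and two unknowns.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """One simplified RAID-5 stripe: the data blocks followed by their XOR parity."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, missing):
    """Recover the members listed in `missing`; a single parity block allows only one."""
    if len(missing) > 1:
        raise ValueError("RAID-5 parity can rebuild at most one missing member")
    rebuilt = list(stripe)
    if missing:
        survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
damaged = list(stripe)
damaged[1] = None                        # one disk lost
print(rebuild(damaged, [1]) == stripe)   # True
```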

So what needs to be done is to get one of the disks back online again. That *should* be possible, and if nothing really, really bad happened, it should, I think, in theory, be as easy as getting a lab to pull the platters out and put a new motor / new electronics on them. I'm not sure about this though ;)

Preferably, this would be done by the disk manufacturer.

You could also check out an excellent company in Norway called IBAS. See http://www.ibas.com/america/index.html for their American office. They are really excellent at data reconstruction.

--
"Rune Kristian Viken" - http://www.nwo.no - arca
quick fix by austad · 2002-09-17 17:39 · Score: 5, Informative

This may or may not work; however, I've successfully recovered data about 10 or 12 times using this method.

Find a working drive of the same model, take the electronics board off of it, and swap it onto the bad drive. Typically when I have a drive fail, it's the electronics, not the mechanical portion. So far this has worked every time for me: one was a Quantum Fireball, and the rest were all Seagate SCSI drives (some FC-AL and some Ultra Wide).

If you had two disks fail at the same time, chances are it's the electronics. Once you recover the data, I would take a serious look at your RAID controller and possibly replace it. I had a bad RAID controller that kept frying drives, and once I replaced it I didn't have any more problems.

--
Need Free Juniper/NetScreen Support? JuniperForum
1. Re:quick fix by ChadN · 2002-09-17 18:17 · Score: 3, Informative
  
  I've had drives which were fixed by the same method; apparently heat is a major cause of this. I would also suggest trying to operate the drives in a cold room (or with some suitable extra cooling), to see if they'll work long enough to recover the data.
  
  --
  "It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
Re:What happened? by photon317 · 2002-09-17 17:47 · Score: 3, Interesting

Simultaneous disk failure is about as rare as winning the lotto while you and a friend on the other side of the planet both get struck by lightning, unless there's a common larger problem (e.g. a power surge to both drives).

As a practical example, 4 or 5 years ago I had a large number of disks attached to some large Oracle servers: roughly 600 hard drives in several arrays taking up several racks, all the same manufacturer and model, from a handful of revision/lot/date groupings.

This set of disks was seeing fairly constant and heavy activity for a few years while I was there. As you can imagine, with 600 disks and the usual MTBF numbers, we quite regularly had disk failures. We kept a few spares onsite and replaced drives as they failed, then exchanged each dead drive for a new spare. As I roughly remember it, we probably averaged about one disk failure every 2-3 weeks. Two, perhaps three times, we had a double disk failure within a 24-hour period, but the two were never so close together that we lacked plenty of time to replace the first (and in any case, the odds are slim that two failures out of 600 disks would happen to affect the same data).
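
The "odds are slim" point can be made rough-and-ready with a Poisson model. The sketch below assumes failures are independent and happen at a constant rate, using the post's figures of 600 disks and one failure every ~2.5 weeks; the 6-disk array size and 24-hour replacement window are hypothetical. A common cause such as a power surge or a bad controller breaks the independence assumption, which is exactly why a real double failure usually points at one.

```python
import math


def p_second_failure(total_disks, days_between_failures, array_size, window_days):
    """Probability that another member of the same array fails within
    `window_days` after a first failure, assuming independent failures
    at a constant rate (a Poisson model)."""
    per_disk_rate = 1.0 / (days_between_failures * total_disks)  # failures per disk-day
    expected = (array_size - 1) * per_disk_rate * window_days
    return 1.0 - math.exp(-expected)  # P(at least one more failure)


# Post's figures: 600 disks, one failure every ~2.5 weeks (17.5 days).
# Hypothetical: 6-disk arrays and a 24-hour replacement/rebuild window.
p = p_second_failure(600, 17.5, 6, 1.0)
print(f"{p:.4%}")  # just under 0.05%
```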

Of course, another point for the original poster with the failed disks: don't use RAID-5. Shell out some more money (disks are cheap) and do proper mirroring, and if you stripe, use 1+0, not 0+1.
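
Why 1+0 rather than 0+1: with the same disks, 1+0 survives more of the possible two-disk failures. This sketch enumerates every pair of failed disks in a 6-disk array, assuming the classic model in which a RAID 0+1 stripe set is lost as soon as any one of its members fails; the function names and layouts are illustrative.

```python
from itertools import combinations


def survives_raid10(failed, mirror_pairs):
    """RAID 1+0: data is lost only if both disks of some mirror pair fail."""
    return not any(set(pair) <= failed for pair in mirror_pairs)


def survives_raid01(failed, stripe_sets):
    """RAID 0+1: a stripe set dies if any member fails; data is lost
    only when every stripe set is dead."""
    return any(not (set(s) & failed) for s in stripe_sets)


def count_fatal_pairs(n_disks, survives, layout):
    """Return (fatal, total) over all possible two-disk failures."""
    pairs = list(combinations(range(n_disks), 2))
    fatal = sum(1 for pair in pairs if not survives(set(pair), layout))
    return fatal, len(pairs)


print(count_fatal_pairs(6, survives_raid10, [(0, 1), (2, 3), (4, 5)]))  # (3, 15)
print(count_fatal_pairs(6, survives_raid01, [(0, 1, 2), (3, 4, 5)]))    # (9, 15)
```

Of the 15 possible two-disk failures, 1+0 loses data in 3 (both halves of one mirror pair), while 0+1 loses data in 9 (any one disk from each stripe set).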

--
11*43+456^2
Re:What happened? by WasterDave · 2002-09-17 18:02 · Score: 3

Heh, I was working on a big LDAP-based site once. We had two LDAP servers and a load balancer to fail one out if it shit itself. A couple of months in, we came in one morning to discover LDAP down and the site wiggling its feet in the air. Why didn't the load balancer fail over to the other LDAP server?

Once we'd done the autopsy we discovered that it did.... days ago and nobody noticed. Hence when the second LDAP went as well we were left high and dry.
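
The same silent-degradation trap applies to a RAID-5 array: it keeps running after the first disk dies, and nobody notices until the second one goes. Here is a minimal Python sketch of a check that could run from cron, assuming the `/proc/mdstat` format of the Linux md driver (e.g. `[4/3] [_UUU]`, where `_` marks a missing member); older kernels may format it differently, and `degraded_arrays` is a hypothetical name.

```python
import re

ARRAY_RE = re.compile(r"^(md\d+)\s*:")                    # e.g. "md0 : active raid5 ..."
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")  # e.g. "[4/3] [_UUU]"


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status line shows a missing member."""
    degraded, current = [], None
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        s = STATUS_RE.search(line)
        if s and current:
            expected, active, flags = int(s.group(1)), int(s.group(2)), s.group(3)
            if active < expected or "_" in flags:
                degraded.append(current)
            current = None
    return degraded


SAMPLE = """Personalities : [raid1] [raid5]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0](F)
      1465127232 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

md1 : active raid1 sdf1[1] sde1[0]
      976630336 blocks [2/2] [UU]

unused devices: <none>
"""
print(degraded_arrays(SAMPLE))  # ['md0']
```

On a live box you would pass it the contents of `/proc/mdstat` and page someone whenever the list is non-empty.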

Dave

--
I write a blog now, you should be afraid.
Re: Responsibility by digitalmuse · 2002-09-17 18:22 · Score: 3, Insightful

So users who shirk defined job responsibilities can foist the blame onto their unknowing IT departments? I find that pill a little hard to swallow. I would, however, examine the chain of decisions that allowed the implementation of what would appear to be an 'unsupported' FS for what is valuable, if not critical, data.

Note: Now, I know that ReiserFS is an actual standard with a vocal and dedicated following, and it serves a useful purpose as a journaling alternative to ext3. But if it cannot be accommodated in your disaster recovery plan, it would have to be considered 'unsupported' against that measure.

All in all, I would have to agree that the critical misstep was in the bailiwick of IT, but I would hesitate to call it a failure in execution; it was a failure in planning, which has deeper roots in project management and communication.

But hey, someone's gonna burn some oil trying to get back those numbers. Good luck to the original poster.

--
"If I wanted your input on my pet project, I'd stick my hand up your ass and use you like a sock-puppet." - Muse
Re:Responsibility by dubl-u · 2002-09-17 18:23 · Score: 3, Interesting

This is certainly true, but you should consider the flipside of it. The typical way it works with IT departments is that they are given unfunded mandates right and left. There is no possible way they can do everything with the money they have. What should happen is that some stuff should be taken off their plate. But they rarely have the political pull needed to do that, so what actually happens is that either everything is done poorly or the IT guys work on what they think is important.

So before you go pointing fingers at the IT department's attitude, it would be good to ask, "Did they tell the management that they needed a way to back up those machines? And did the managers give them the necessary time and funds?"

Every IT person I know with a bad attitude didn't start that way; they acquired it through years of crappy management.
Data Recovery efforts... by ComputerSlicer23 · 2002-09-17 20:02 · Score: 4, Informative

Assuming it's really, really necessary, there are ways of gleaning data off a drive that has died. You can send your drives off to a recovery company, and for a thousand dollars a disk they take it into a clean room, disassemble the entire platter assembly, physically prepare the platters so they are as good as can be, then put them under their own super-duper head assembly.
To find one, use Google, for God's sake. Get them to read the drive linearly. At this point, they should be able to copy each drive onto a new drive for you. Build a raidtab that matches the new drives they give you, run raidstart /dev/mdXX, then run fsck.reiserfs on it. Mount the thing. Poof, like magic, you have data again.
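
For the raidtab step, what matters is that the new description matches the original array exactly: same member order, chunk size, and parity algorithm. This Python sketch renders a raidtools-style raidtab stanza; the device names, the 32 KB chunk size, and `left-symmetric` are placeholder assumptions rather than values from the original array, and `render_raidtab` is a hypothetical helper. Work on the copies, never the original platters.

```python
def render_raidtab(md_device, members, chunk_kb, parity_algorithm="left-symmetric"):
    """Render a raidtools-style raidtab stanza for a RAID-5 array.

    `members` must be listed in the ORIGINAL raid-disk order; the chunk size
    and parity algorithm must also match the original, or stripes won't line up.
    """
    lines = [
        f"raiddev {md_device}",
        "    raid-level              5",
        f"    nr-raid-disks           {len(members)}",
        "    nr-spare-disks          0",
        "    persistent-superblock   1",
        f"    parity-algorithm        {parity_algorithm}",
        f"    chunk-size              {chunk_kb}",
    ]
    for index, device in enumerate(members):
        lines.append(f"    device                  {device}")
        lines.append(f"    raid-disk               {index}")
    return "\n".join(lines) + "\n"


# Placeholder devices standing in for the recovery lab's copies.
print(render_raidtab("/dev/md0", ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"], 32))
```
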
I've heard of several people recovering Ph.D. and master's dissertations this way. It's why the DoD has such strict standards about the destruction and disposal of hard drives: it's very difficult to delete data from a drive so that it can't be recovered. It can just get out-of-control expensive to recover the data.
If it's worth $5-10K, and you can do without it for several days, this might actually work... If the company bitches, just tell them that it's about what a good tape setup would have cost if they had bought one.
The two guys who suggested swapping the electronics are pretty brave SOBs, but that might be worth trying if you can't spring for some experts to deal with it for you.
Kirby
1. Re:Data Recovery efforts... by duffbeer703 · 2002-09-18 02:15 · Score: 3, Informative
  
  Remember that IRIX and AIX use XFS and JFS, respectively.
  
If you sent corrupted volumes to a shop that can recover commercial Unix disks, they should be able to recover data from a Linux box too. JFS=JFS.
  
  --
  Conformity is the jailer of freedom and enemy of growth. -JFK
data recover specialists by martin · 2002-09-17 20:58 · Score: 3, Informative

For me the guys are www.vogon-data-recovery.com

There are others, but I always seem to see these guys at the forefront...

just my 2 pence..
Re:What happened? by duffbeer703 · 2002-09-18 02:12 · Score: 3, Informative

That is not a smart strategy.

A lot of storage is very finicky, and odd low-level problems emerge when you mix disk manufacturers, or in some cases even disk models.

When Sun 5200 fibre arrays first came out, if you got a batch of Sun-branded drives that were manufactured by different vendors, you would have all sorts of odd locking issues and other goodies.

A good strategy for reliable storage:

- Don't be cheap. You get what you pay for.
- Plan for disk-to-tape (or DVD) backup, and actually test restores regularly
- Use well-supported, stable versions of filesystems
- Don't buy the latest and greatest unless there is a business reason to do so
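
One way to "actually test restores" is to restore into a scratch directory and compare checksums against the live data. A minimal Python sketch using SHA-256 over every file; `compare_trees` is a hypothetical name, and files that legitimately changed since the backup was taken will show up as changed.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB reads
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def compare_trees(original, restored):
    """Return (missing, changed, extra) relative paths between two trees."""
    a, b = tree_digests(original), tree_digests(restored)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, changed, extra
```

A clean restore test is one where all three lists come back empty, e.g. `compare_trees("/home", "/scratch/restore-test/home")` with hypothetical paths.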

--
Conformity is the jailer of freedom and enemy of growth. -JFK