Managing RAID on Linux
A person deciding to go with RAID faces a panoply of options and gotchas. Hardware or software? How many controllers? ATA or SCSI (or ataraid)? RAID 1 or RAID 5? Which file system or distribution? Kernel options? mdadm or raidtools? Swap or /boot on RAID? Hybrid? Left or right symmetric? One poster pointed out that putting two ATA drives on the same controller could hurt performance. Yikes! Didn't I do that? Upon discovering that O'Reilly had just published Managing RAID on Linux, and after looking at a sample chapter, I bought the book and let my blood pressure return to normal.
RAID is one of those subjects that isn't really complex; it's just very hard to find all the information in one place. This is precisely the book to solve that problem. Author Derek Vadala, a sysadmin and founder of Azurance.com, an open source/security consulting firm, has gathered a great deal of information, and even personal anecdotes, to guide readers through the decision-making process of moving to RAID. He goes through that process step by step, educating us about hard drives, controllers, and bottlenecks along the way. This exhaustive book may be the first to bring RAID to the masses.
Although parts of the book (RAID types, file system types) may already be familiar to experienced Linux users, it is helpful nonetheless to have everything in one nifty little book. A section on file systems provided not only a rundown of the merits and drawbacks of each one, but also a guide to their utilities. I learned, for example, what "file tails" are in ReiserFS, and why using them causes performance to degrade once the file system passes 85% capacity. The book compares raidtools with mdadm and covers lovely commands like nohup mdadm --monitor --mail=paranoidsysadmin@home.com (which, if you haven't guessed, has mdadm watch your arrays and email you when something goes wrong, such as a failed disk).
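mdadm's monitor mode is the right tool for this job. Purely to show what "watching an array" involves, here is a hypothetical sketch (not how mdadm itself works) that scans the text of /proc/mdstat for arrays with a missing member. It assumes the md status-line format of the 2.4/2.6 kernels, where "[3/2] [UU_]" means three members configured, two active, with "_" marking the missing slot; the sample text and the function name are made up for illustration.

```python
import re

# Status line under each array, e.g. "104320 blocks [2/1] [U_]":
# configured/active member counts, then one character per member
# ("U" = up, "_" = missing or failed).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
ARRAY_RE = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose member map shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)  # remember which array we are inside
            continue
        s = STATUS_RE.search(line)
        if s and current:
            wanted, active, members = int(s.group(1)), int(s.group(2)), s.group(3)
            if active < wanted or "_" in members:
                degraded.append(current)
            current = None  # only the first status line per array matters
    return degraded


sample = """Personalities : [raid1] [raid5]
md0 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 hdg1[2] hde1[1] hda3[0]
      2048000 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""

print(degraded_arrays(sample))  # ['md1']
```

In real use you would read the live file instead of the sample string, and let mdadm do the alerting.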
People who use software RAID may skip the chapter on the RAID utilities for the leading RAID controller cards. Still, it holds one interesting tidbit: why, the author asks, do makers of controller cards put all their BIOS utilities on DOS floppies, which require us to find a DOS boot disk? Seriously, how many of us carry around DOS boot disks nowadays? The book made me aware for the first time of FreeDOS, an open source DOS that solves precisely that problem.
The software RAID material was pretty thorough and clarified a lot of things. The book does an excellent job of helping to identify and eliminate bottlenecks and to optimize hard drive performance (using hdparm and various monitoring commands). The anecdotes and case studies made it clear which RAID solution suits which task.
I am less impressed by the book's treatment of disaster recovery and troubleshooting. Although these subjects come up at several places in the software RAID chapter, the book could have walked through several failure scenarios or used a fault tree (such as the famous Fault Tree in Chapter 9 of the Samba book, a marvel for any tech writer to read). The book doesn't even discuss booting with software RAID until the last 10 pages, and then gives it only a single paragraph (even though the author acknowledges it as "one of the most frequently asked questions on the linux-raid mailing list"). Call me old-fashioned, but isn't the ability to boot into your RAID system ... kinda important? As someone who just spent a significant amount of time troubleshooting RAID booting problems in Gentoo, I for one would have liked more insight into the GRUB/LILO question. Also, in the next paragraph of the last chapter, on page 228, the author casually mentions that "all /boot and / partitions must be on a RAID-1." Say what? Pity the poor newbie who religiously follows the instructions in the book but never reads to the end. I'm not sure what the author meant by this statement, but it needed a much more substantial explanation, and it belonged in a much earlier chapter.
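One plausible reason for the RAID-1 rule, at least for /boot (the book's own reasoning isn't quoted here): the boot loaders of the day, such as LILO and GRUB, read the disk before the md driver is running, so they see a single member partition as if it were an ordinary one. With the md superblock format of that era stored at the end of the partition, a RAID-1 member holds a complete, ordinary copy of the file system, while a RAID-0 or RAID-5 member holds only every Nth chunk. The toy model below (hypothetical function names; it ignores superblocks and parity) shows the difference.

```python
# Toy model: an array is a list of member disks, each a list of chunks.

def raid0_layout(data_chunks, n_disks):
    """Striping: logical chunk i lands on disk i % n, at offset i // n."""
    disks = [[] for _ in range(n_disks)]
    for i, chunk in enumerate(data_chunks):
        disks[i % n_disks].append(chunk)
    return disks


def raid1_layout(data_chunks, n_disks):
    """Mirroring: every disk holds every chunk at the same offset."""
    return [list(data_chunks) for _ in range(n_disks)]


kernel_image = ["k0", "k1", "k2", "k3"]  # a file spread over four chunks

# A boot loader that knows nothing about md reads one disk as a plain device.
print(raid1_layout(kernel_image, 2)[0])  # ['k0', 'k1', 'k2', 'k3']  (whole file)
print(raid0_layout(kernel_image, 2)[0])  # ['k0', 'k2']  (every other chunk)
```

The same property is what lets you mount a RAID-1 member on its own after a disaster.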
These complaints don't detract very much from this excellent book, a true O'Reilly classic and a model of clarity and helpfulness. This book provides enough knowledge to avoid the dread and uncertainty that comes with trying to tackle Linux RAID. With a book like this, a sysadmin can sleep a little easier.
Recommended Readings:
- Reliable Linux, by Iain Campbell (Wiley and Sons, Dec 2001, ISBN 0471070408). Gives excellent information not only about RAID but also about general Linux reliability issues.
- Software RAID in the Linux 2.4 Kernel, by Daniel Robbins (Part Two).
- Linux Journal article on software RAID, by Joe Edwards, Audin Malmin and Ron Shaker (Part Two).
- "How to do a gentoo install on software RAID," by Chris Atwood (Gentoo User Forum).
Robert Nagle (aka Idiotprogrammer) is a Texas technical writer, trainer and Linux aficionado. You can purchase Managing RAID on Linux from bn.com.
Is it possible to use FireWire and a service like Rendezvous to build an intelligent redundant system? It's a thought, at least. The FireWire drive I use with my Inspiron works nicely enough. Would FireWire be cheaper than RAID for servers, though?
I've stepped away from software RAID on my boxes because of the availability of cheap hardware RAID, such as Promise's SX4000. It will do hardware RAID 5 for four+ drives and has an SDRAM slot for cache expansion. Coupled with LVM, it ended up being a good solution for me, since it gave me both reliability and flexible volume management if I wanted to combine arrays.
The problems I've had with software RAID are reliability and expandability. It is a pain in the ass if you lose a drive in the array, and it is next to impossible to add a drive (other than a standby drive) to an existing RAID 5 setup.
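Both complaints follow from how RAID 5 stores redundancy: each stripe carries a parity block equal to the XOR of its data blocks, so any single missing block is the XOR of everything that survived. The sketch below shows one stripe with made-up contents and a hypothetical helper name; real arrays also rotate the parity block across disks, and rebuilding a whole disk means repeating this for every stripe. Adding a data disk changes the width of every stripe, so all data and parity must be rewritten, which is why growing an array is so awkward.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a hypothetical 4-disk RAID 5: three data blocks + parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] dies. XOR of every surviving block rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```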
Aah, opinions...
------------------ D. A. Davenport: http://www.firebin.net
RAID level 01/10 is both expensive *and* pointless
Well, maybe for the average power user, but not for the real power users. Pretty much every stock exchange, airline reservation system and credit card switching system in the world uses mirroring and striping. Operating systems such as HP's NonStop Kernel (from Tandem) and IBM's Transaction Processing Facility (TPF) work this way and run these mission-critical systems.
Why? I/O throughput and redundancy in applications that can't afford to fail. The disks aren't expensive compared to the rest of the system and even less expensive than the downtime.
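The "expensive" half of the argument is easy to quantify. The sketch below is a simplified model (a hypothetical helper, identical disks, no hot spares) of usable capacity and the number of disk failures each level is guaranteed to survive. RAID 10 gives up half the raw capacity in exchange for striped throughput with no parity to compute on writes.

```python
def raid_summary(level, n_disks, disk_gb):
    """Return (usable capacity in GB, disk failures always survived)."""
    if level == "0":
        return n_disks * disk_gb, 0
    if level == "1":
        return disk_gb, n_disks - 1          # every disk is a full copy
    if level == "5":
        return (n_disks - 1) * disk_gb, 1    # one disk's worth of parity
    if level == "10":
        if n_disks % 2:
            raise ValueError("this model assumes RAID 10 is built from mirrored pairs")
        # Striped across n/2 mirrored pairs: one failure in any pair is fine,
        # but losing both halves of the same pair loses the array.
        return (n_disks // 2) * disk_gb, 1
    raise ValueError(f"unknown level {level}")


for level in ("0", "1", "5", "10"):
    print(level, raid_summary(level, 4, 80))
```

For four 80 GB disks this gives 320, 80, 240 and 160 GB of usable space respectively.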
These aren't Linux systems, but as Linux scales up, it will inevitably borrow from mainframe-class systems.
Jan 26 04:15:02 hostname kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 26 04:15:02 hostname kernel: hdb: dma_intr: error=0x84 { DriveStatusError BadCRC }
I've looked all over the place for the answer (Google, mailing list archives, Usenet, local Linux friends, etc.) and haven't been able to find a definitive one. It's like nobody really knows what those error messages really mean.
Newsgroups suggested bad cables, so I replaced those (twice, once with brand-new cables bought specifically for the purpose). Some info suggested the drive or the drive's controller was failing, so I replaced it. Other info pointed to my IDE controller, so I installed a new one dedicated only to the RAID pair. I saw advice that blamed the raid tools and suggested checking whether the errors went away when the mirror was broken. No dice. Other info I found suggested that it was the IDE drivers in the kernel and that the messages were nothing to worry about unless I was seeing data corruption. I'm not seeing corruption, so I'm left with this option.
If the book can shed some light on the error message voodoo one sees with Linux's IDE driver, then I'll buy it. I'd pay double what they're asking, even.
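For what it's worth, the words in braces are not free-form messages: the 2.4-era Linux IDE driver (ide_dump_status()) prints one word per bit set in the drive's ATA status and error registers. The sketch below reproduces that decoding for the two bytes in the log; the function and constant names are made up. 0x51 is DriveReady + SeekComplete + Error, and 0x84 is the abort bit plus bit 7, which the driver labels BadCRC when both are set: an interface CRC error on an Ultra DMA transfer, meaning the data was damaged between drive and controller rather than read from a bad sector. It doesn't tell you which component (cable, controller, drive electronics or DMA timing) is at fault.

```python
# ATA status register bits, in the order the 2.4-era Linux IDE driver
# prints them. 0x80 (BUSY) is handled separately: while the drive is
# busy, the other bits are not valid.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ABRT = 0x04  # command aborted
BIT7 = 0x80  # interface CRC error on Ultra DMA drives, "bad block" on older ones


def decode_status(stat):
    """Words printed for the status byte, e.g. 0x51."""
    if stat & 0x80:
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if stat & bit]


def decode_error(err):
    """Words printed for the error byte of an ATA disk, e.g. 0x84."""
    words = []
    if err & ABRT:
        words.append("DriveStatusError")
    if err & BIT7:
        # The driver uses ABRT to guess which meaning of bit 7 applies.
        words.append("BadCRC" if err & ABRT else "BadSector")
    for bit, name in [(0x40, "UncorrectableError"), (0x10, "SectorIdNotFound"),
                      (0x02, "TrackZeroNotFound"), (0x01, "AddrMarkNotFound")]:
        if err & bit:
            words.append(name)
    return words


print("status=0x51 {", " ".join(decode_status(0x51)), "}")
# status=0x51 { DriveReady SeekComplete Error }
print("error=0x84 {", " ".join(decode_error(0x84)), "}")
# error=0x84 { DriveStatusError BadCRC }
```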
-B
Why, the author asks, do makers of controller cards put all their BIOS utilities on DOS floppies which require us to find a DOS boot disk? Seriously, how many of us carry around DOS boot disks nowadays?
Well, given Dell's recent announcements, I suppose fewer and fewer of us will be doing so.
But really, the author's point is so moot that it's embarrassing: if it's my job to maintain a RAID array, and the utilities are on DOS floppies, of course I'm going to have access to a DOS boot disk. So what? Just how hard is it to carry such a thing around, and why is this a worthy thing to rail about in a book about RAID? If the author wastes too much time talking about stuff like this, this book can't be that useful. Arggh, I've wasted too much of my own time already.
Do explain what *exactly* is wrong with Windows' software RAID features. Besides being made by your favourite company, Microsoft, that is.
Another 'biased-to-the-point-of-bigotry' post.
CPU usage isn't entirely the point. It doesn't take much CPU power to do RAID these days (that's why software RAID in Linux is a pretty good option). The problem is that it requires drivers and software control over RAID functionality (just what you want to avoid) when the RAID card should just be making the RAID array look like a single drive to the operating system. Notable examples include the HighPoint "RAID" controller found on some Abit motherboards.
Anyway, I think "Please get your facts straight before posting" is kind of a nasty response to somebody who is pointing out something that is well known to most people who have tried using these pieces of crap with their non-Windows operating systems. Try using Google if you want references, one way or another. They won't be hard to find if you search through driver development lists for Linux and *BSD.
1/ Linux can rebuild RAID from on-disk information. NT 4 is deficient in this regard, it would seem.
2/ The problem is worse with hardware RAID, because if I lose the card, I'm fucked. I either have to keep spares or wait for a replacement controller. Never mind what happens if the manufacturer goes out of business.
3-4 years ago, when we decided to use hardware RAID on our Linux servers, we bought some DPT SmartRAID V hardware RAID controllers. Unfortunately, DPT was bought by Adaptec some time afterward. Adaptec has been really good at getting the driver included in the kernel, but the takeover seemed to delay this process, so the time in between was a rough ride.
The lesson learned was, never have a production Linux system with (binary) drivers tied to a specific kernel or distro version.
That said, we have been very happy with the controllers, and since at least two disks have died without warning, the expense has easily been worth it. Our systems are used 24/7/365, so every minute of downtime annoys somebody. RAID really helps me sleep better: restoring a server from a slow tape streamer at some ungodly hour, while people nervously check in asking when we will be up again, is something I really want to avoid.
YMMV, but I think hardware RAID still has an edge over software raid, mostly because I find it simpler to maintain in the long run.
If you are into LVM, file system tools, and software RAID, go to:
http://evms.sourceforge.net/
and _drool_. It's still future stuff as far as production servers go, but nevertheless.
There are plenty of reasons to choose software RAID over hardware RAID. With Linux, one of the main reasons is the same reason many of us choose Linux to begin with: it's open source. I know that isn't traditionally a factor when picking hardware, but remember that when a hardware controller fails, you are at the mercy of the vendor. If Linux software RAID fails, you have access to the source code and perhaps also the developers, so you may have a shot at recovering data in a catastrophic event, even if it means writing a recovery tool of your own. In fact, with RAID-1 in the Linux kernel, if something goes kablooey you can just mount a member disk standalone and get some rest.
That's only one consideration. It used to be that booting from, and installing Linux onto, software RAID was a huge hassle. Today almost every distribution supports out-of-the-box installation to software RAID, so the ease-of-use arguments for going hardware are all but gone.
Now here's the issue that always starts the tug of war: performance. Traditionally hardware RAID was simply better because it didn't hit the CPU. Today that doesn't make a difference, especially if you use SCSI. With ATA you might notice the overhead of RAID a little more, but that's because ATA already has overhead to begin with. The CPU hit with SCSI is negligible, and I doubt it will be noticed in most cases, even in so-called "production". That's because the real bottleneck in most systems is I/O throughput, not CPU performance. That's most systems, not all systems. Obviously, if you are a good sysadmin, you evaluate these issues on a case-by-case basis.
Finally, I just want to say that it's a widely held opinion in the Linux RAID community that kernel RAID (the md driver) outperforms all but the most high-end SCSI RAID controllers. I'm sure many will disagree, but that's been my experience, and I know that if you ask certain kernel developers who shall remain nameless, they will tell you the same thing.
Run bonnie, you'll see.
Derek Vadala, lowly author.
at O'Reilly, mdadm
And I'd recommend the Enterprise Volume Management System (EVMS) rather than LVM (Logical Volume Manager), simply because LVM seems to be getting dropped as redundant (ironic, that :) as EVMS gets more effective, and I don't want the conversion work from LVM to EVMS if I can just do EVMS right now; see
I've just been through setting up a RAID system. I set up a file server that automatically backs up, every week, the data that users on the network put on it via Samba. Since I only want to show up there every 6 months or so to check on the server, it needs to be as bulletproof as possible and still cheap, because as social workers they don't have much money.
:-( , since documentation is the last thing those guys seem to think about.
I purchased a used P2 system with a stable motherboard and two IBM SCSI drives on an Adaptec controller. I installed Debian GNU/Linux stable and upgraded to the latest stable. Then I set up software RAID and chose XFS in case of a power failure. I decided against a UPS, because the machine is hooked up to the local power grid in Berlin, Germany, which is very stable, and I wanted to save the cost.
Then I moved the root file system over to the RAID device. Up to that point everything was documented very well, except that I had heard ReiserFS doesn't work with software RAID and couldn't find that information on the net anymore. I would have chosen ReiserFS instead if I had had a reliable source, such as the book, telling me that it was OK.
The only thing I had problems with was making the system boot off the RAID device. Here the HOWTOs and man pages contradicted each other.
I read this Slashdot article with some regret, because I thought the book could have saved me a lot of trouble. But the only part that gave me trouble also seems to confuse the author of the book, and that is no help at all. So this book is a waste of time if you know how to use Google, which I had to learn painfully fast when getting into Debian.
But since Debian is still by far the best system out there overall, I have no choice. Once you start to rely on seemingly simple things, such as reliable, low-hassle updates of your system, you are hooked.