Managing RAID on Linux
A person deciding to go with RAID faces a panoply of options and gotchas. Hardware or software? How many controllers? ATA or SCSI (or ataraid)? RAID 1 or RAID 5? Which file system or distribution? Kernel options? Mdadm or raidtools? /swap or /boot on raid? Hybrid? Left or right symmetric? One poster pointed out that putting two ATA drives on the same controller could impact performance. Yikes! Didn't I do that? Upon discovering that O'Reilly had just published its Managing RAID on Linux book, and after looking at the sample chapter, I bought the book and let my blood pressure return to normal.
RAID is one of those subjects that is really not complex; it's just very hard to find all the information in one place. This is precisely the book to solve the problem. Author Derek Vadala, sysadmin and founder of Azurance.com, an open source/security consulting firm, has gathered a lot of information and even personal anecdotes to walk the reader through the decisions involved in moving to RAID. He goes step-by-step through that process, educating us about hard drives, controllers, and bottlenecks along the way. This exhaustive book may be the first to bring RAID to the masses.
Although parts of the book (RAID types, file system types) may seem already familiar to experienced Linux users, it is helpful nonetheless to have everything in a nifty little book. A section on file systems provides not only a rundown of the merits and drawbacks of each one, but also a guide to their utilities. I learned for example what "file tails" for Reiser are, and why using them causes performance to degrade after reaching 85% capacity. The book compares raidtools with mdadm and shows lovely commands like nohup mdadm --monitor --mail=paranoidsysadmin@home.com (which, if you haven't guessed, causes the system to email you RAID status reports upon boot).
People who use software RAID may skip over the chapter on RAID utilities for the leading RAID controller cards. Still, there was one interesting tidbit: Why, the author asks, do makers of controller cards put all their BIOS utilities on DOS floppies which require us to find a DOS boot disk? Seriously, how many of us carry around DOS boot disks nowadays? The book made me aware for the first time of FreeDOS, an open source project that solves precisely that problem.
The Software RAID stuff was pretty thorough and clarified a lot of things. The book does an excellent job in helping to identify and eliminate bottlenecks and optimizing hard drive performance (using hdparm and various monitoring commands). The anecdotes and case studies definitely clarified which RAID solution is suited for which task.
I am less impressed by the book's sections on disaster recovery and troubleshooting. Although these subjects are brought up at several places in the software RAID chapter, the book could have discussed several failure scenarios or used a fault tree (such as the famous Fault Tree in Chapter 9 of the Samba book, a marvel for any tech writer to read). The book doesn't even discuss booting with software RAID until the last 10 pages of the book and then gives it only a single paragraph (even though the author acknowledges it as "one of the most frequently asked questions on the linux-raid mailing list."). Call me old-fashioned, but isn't the ability to boot into your RAID system ... kinda important? As someone who just spent a significant amount of time troubleshooting RAID booting problems in Gentoo, I for one would have liked more insight into the grub/lilo thing. Also, in the next paragraph in the last chapter on page 228, the author casually mentions that "all /boot and / partitions must be on a RAID-1." Say what? Please pity the poor newbie who religiously follows the instructions in the book but fails to read until the end. I'm not sure what the author meant by this statement, but it required a much more substantial explanation and needed to go into a much earlier chapter.
These complaints don't detract very much from this excellent book, a true O'Reilly classic and a model of clarity and helpfulness. This book provides enough knowledge to avoid the dread and uncertainty that come with trying to tackle Linux RAID. With a book like this, a sysadmin can sleep a little easier.
Recommended Readings:
- Reliable Linux, by Iain Campbell, Wiley and Sons, Dec 2001, ISBN: 0471070408. Gives excellent information not only about RAID but also about general Linux reliability issues.
- Software RAID in the Linux 2.4 Kernel by Daniel Robbins. (Part Two).
- Linux Journal Article on Software RAID by Joe Edwards, Audin Malmin and Ron Shaker. (Part Two).
- "How to do a gentoo install on software RAID" by Chris Atwood. Gentoo User Forum.
Robert Nagle (aka Idiotprogrammer) is a Texas technical writer, trainer and Linux aficionado. You can purchase Managing RAID on Linux from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
but the easiest way I've found is to go with hardware RAID. It's easier to set up, doesn't put any extra load on the CPU, and only costs a few hundred dollars extra.
Mind you, I'm thinking of RAID used in production instead of someone RAIDing two drives in their home machine.
"all /boot and / partitions must be on a RAID-1."
With raidtools, at least, /boot must be RAID1, but / can most assuredly be RAID 5 (or, I presume, any of the other RAID levels). I have this running on an ol' RedHat 7.0 box:
Hunk 'o fstab:
/dev/md1 / ext2 defaults 1 1
/dev/md0 /boot ext2 defaults 1 2
Similar hunk 'o raidtab:
raiddev /dev/md0
    raid-level 1
    nr-raid-disks 2
    chunk-size 64k
    persistent-superblock 1
    #nr-spare-disks 0
    device /dev/sdb1
    raid-disk 0
    device /dev/sda1
    raid-disk 1
raiddev /dev/md1
    raid-level 5
    nr-raid-disks 3
    chunk-size 64k
    persistent-superblock 1
    #nr-spare-disks 0
    device /dev/sda6
    raid-disk 0
    device /dev/sdb6
    raid-disk 1
    device /dev/sdc5
    raid-disk 2
*Shrug* Wonder what the context of that quote was within the book?
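As a side note on why the three-member RAID 5 array in the raidtab above can lose one disk and keep going, here is a minimal Python sketch of XOR parity. It is a simplification and not how the md driver is written: the real driver works on 64k chunks and rotates the parity chunk across members, and the names xor_blocks, chunk_a, chunk_b and the four-byte sample data are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a three-member array: two data chunks plus one parity chunk.
chunk_a = b"boot"
chunk_b = b"root"
parity = xor_blocks([chunk_a, chunk_b])

# Pretend the member holding chunk_b has died; rebuild it from the survivors.
rebuilt_b = xor_blocks([chunk_a, parity])
print(rebuilt_b == chunk_b)
```

The same XOR works for whichever single member is missing, which is also why losing two members at once is fatal for RAID 5.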
Does this book talk about the md driver's multipath personality? This is the most poorly documented part of the md driver. If you read the raidtab man page ("man raidtab") you will find _no_ mention of multipath whatsoever. Yet the md driver can do multipath (well, failover) if you set it up right.
It has limitations though... You can't install to multipath devices, or boot from them (lilo/grub and the various distributions' installers don't understand md multipath), and if an hba fails in such a way that interrupts are not generated (commands just go out to lunch) then md won't notice anything is wrong, and so won't fail over. Also, it does nothing to check whether the failover path is actually working, so if that path fails you won't have any notice that redundancy is lost....
Well, multipath is not RAID, so maybe this book doesn't cover it, but any book on software RAID for linux should probably cover all the features of the md driver.
I will be interested to see this book.
(From the raid howto)
4.7 The Persistent Superblock
Back in ``The Good Old Days'' (TM), the raidtools would read your /etc/raidtab file, and then initialize the array. However, this would require that the filesystem on which /etc/raidtab resided was mounted. This is unfortunate if you want to boot on a RAID.
Also, the old approach led to complications when mounting filesystems on RAID devices. They could not be put in the /etc/fstab file as usual, but would have to be mounted from the init-scripts.
The persistent superblocks solve these problems. When an array is initialized with the persistent-superblock option in the /etc/raidtab file, a special superblock is written in the beginning of all disks participating in the array. This allows the kernel to read the configuration of RAID devices directly from the disks involved, instead of reading from some configuration file that may not be available at all times.
You should however still maintain a consistent /etc/raidtab file, since you may need this file for later reconstruction of the array.
The persistent superblock is mandatory if you want auto-detection of your RAID devices upon system boot. This is described in the Autodetection section.
I beg to differ.
After having four hard drives die and losing various amounts of data, I purchased two 100GB drives and made 6 RAID1 partitions using about 90GB, and a 20GB RAID0 partition with the remainder.
The security of a number of RAID1 partitions for backup is a nice feeling to have since a drive failure can't wipe out my data now.
The RAID0 space is scratch space, so it doesn't matter if mtbf is reduced--there's nothing important permanently stored there.
You did realize that you don't have to have the entire drive use the same RAID level...
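The arithmetic behind that mixed layout can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the split of 90GB mirrored plus 10GB striped per drive is inferred from the figures in the comment above.

```python
def raid1_usable(member_sizes_gb):
    """A mirror keeps a full copy on every member, so usable space is the smallest member."""
    return min(member_sizes_gb)

def raid0_usable(member_sizes_gb):
    """A stripe set spreads data over all members with no redundancy, so usable space is the sum."""
    return sum(member_sizes_gb)

drive_gb = 100                                    # each of the two drives
mirrored_per_drive = 90                           # space given to the RAID1 partitions
striped_per_drive = drive_gb - mirrored_per_drive  # what is left for scratch

mirror_total = raid1_usable([mirrored_per_drive, mirrored_per_drive])
scratch_total = raid0_usable([striped_per_drive, striped_per_drive])
print(mirror_total, scratch_total)
```

So roughly 90GB of the 200GB raw capacity is protected, and the 20GB of scratch space is the only part that disappears if either drive dies.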
Windows software RAID (of any type) sucks, but that doesn't necessarily apply to Solaris or Linux (I've used both; Solaris tends to be a little bit of overkill in many cases, but if you need it you need it).
As far as IDE channels, many many motherboards these days have about 4 ide channels (mine does, and it's not even NEW) 4 ide channels can make a good raid. My linux RAID 5 (software) is pretty transparent and read speeds are noticeably faster. This is even MORE true if you put in the EVMS patches from IBM and use the GUI tools to create and manage RAIDs without even editing /etc files (in fact, the IBM EVMS stuff doesn't even use config files, it doesn't need them). Just a few tips for the curious. (I use Gentoo, so I don't have to add these patches.)
Hardware RAID is marginally, and not always, better. For one thing, you are limited to the idea of RAID that your board manufacturer believes in. It's not always what you need. CPU power? On any machine faster than 1ghz you never even notice. At 2ghz, software RAID is invisible. Yes, software RAID sucks on windows (due to the stupidest fucking volume/RAID managing service I've ever used), but it's viable almost everywhere else.
Sometimes that extra few hundred dollars is an extra $20k (if you're doing lots of machines); if you can deal with the CPU hit, software RAID is still more economical as long as it's reliable. Solaris/Linux RAID are ready for prime time, W2k's is still trying to figure it out. (For Windows boxes, please get hardware, save yourself the headache. Thanks!)
On a modern machine, software IDE RAID is still beneficial. For striped arrays, the performance penalty on the host CPU is very minimal compared to the device performance. Of course, hardware solutions are easier to set up. If you buy a 3Ware card, or something similar, kernel support is a non-issue. But for home users that just want software to load faster or wish to have backups, IDE RAID is a cheap solution that performs very well.
So, you say it sucks, I say it's fine. You say toe-mott-oh, I say toe-mate-oh. Hardware RAID is more than just a few $. It costs hundred(s) more than software RAID controllers. I've had software controllers that performed better than the current high-end SCSI drives at the time. I can attest to the fact that CPU load was a non-issue. Performance was excellent and was the most inexpensive way to gain speed. It's ideal for home users that aren't wanting to spend a fortune on limiting the swapfile chug.
So, please define "sucks". Enlighten us softRAID users on what the problem is. Or is the problem really that you've spent your fortune on some overpriced SCSI drives that get outperformed by a couple of ATA100s?
When I've gotten that error it has meant that the drive itself is heading towards the great hardware graveyard in the sky. Since it's raid1 you should be able to simply put in a new /dev/hdb and all should be fine.
RAID 0 is pointless - gosh, I wish all the video editing studios out there knew this. They've been duped into believing 150 megs a second sustained has value. What morons.
Too bad cheap RAID5 cards don't exist. - Hmm, you mean like the Promise SX4000 that costs $150?
I'm Rick James with mod points biatch!
firewire is a bus, raid is a configuration.
there are raid arrays with firewire interfaces, and software raid using firewire drives is quite possible. (osx makes it easy as pie)
here are some cool firewire raid products:
http://www.usbshop.com/firewireraid.html
http://www.sancube.com/
http://www.voyager.uk.com/products_master.asp?prodType=firewire
the x-stream from sancube has two firewire busses for double the speed, or for sharing.
That onboard Promise RAID controller you dished out the extra $50 for on that new motherboard is not going to get you a nice hardware RAID 5. AFAIK they can only do 1, 0, 0+1, or 1+0. Also, I see people whining about software RAID as compared to hardware RAID. Running a striped set through software was nearly unfeasible a few years ago, but with the resources new machines have these days, the difference is almost negligible, as long as it doesn't have to fight for system resources. Let's not forget software RAID is a lot cheaper than buying a RAID controller.
At any rate, taking the view that hardware RAID is always the solution and software RAID is never the solution is just bad sysadministration.
Everyone is entitled to their own opinion. It's just that yours is stupid.
Dude, that's hardware. Turn off the dma on your drives with hdparm.
hdparm -d 0 /dev/hdb
You might also have to turn off 32 bit mode:
hdparm -c 0 /dev/hdb
Of course, this will slow things down.
Be sure everything's jumpered correctly.
Also, of course, I'm not responsible if you fry your data!
-- I am. Therefore, I think!
Up until now I've bought only SCSI drives because heavy compiles (which I do a lot) just choke IDE down. I now have a 4 x 60 GB RAID-1 and it just screams. With a one time investment in a proper IDE RAID card with escalator scheduling, tagged queueing and big cache I still save a lot of money by being able to buy large but cheap IDE disks.
the poster obviously doesn't know what he's talking about.
a 'rubbish' 500Mhz CPU - 500,000,000 ops / sec
a 5ms access time SCSI HDD - 200 ops / sec.
so what if the CPU on the RAID card is a pathetic 100MHz job, it'll still be able to keep up with the data flow from the HDD, even when that data is being burst through.
How much cache ram have you got on that RAID card is a better indication of performance improvements for your hardware.
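A quick sketch of that back-of-the-envelope comparison, in Python. It keeps the commenter's simplification of counting one clock cycle as one operation, which real CPUs and real I/O paths do not obey exactly; the variable names are made up for illustration.

```python
host_cpu_hz = 500_000_000   # the 'rubbish' 500MHz CPU
card_cpu_hz = 100_000_000   # the 'pathetic' 100MHz chip on the RAID card
access_time_ms = 5          # SCSI drive access time

# One random access every 5 ms means 200 accesses per second.
disk_ops_per_s = 1000 // access_time_ms

# Clock cycles each processor has to spend per disk access.
host_cycles_per_access = host_cpu_hz // disk_ops_per_s
card_cycles_per_access = card_cpu_hz // disk_ops_per_s

print(disk_ops_per_s, host_cycles_per_access, card_cycles_per_access)
```

Even the slow chip gets hundreds of thousands of cycles per random access, which is the commenter's point: for seek-bound work the disk is the bottleneck, not the processor. Sustained sequential transfers are a different case, since they are not limited by access time.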
I get >160 Megabytes per second off my software striped drives, which is far faster than I've ever gotten off any hardware RAID.
And I've found the RAID 5 overhead is nominal, and very reliable.
No matter how fast your CPU is, you aren't going to beat a dedicated hardware RAID controller. Also, if you're going to spend the money for SCSI, why wouldn't you spend a little more and go with a hardware solution? That's like buying a BMW then "saving money" by adding the fog lights yourself.
You are going to beat a hardware controller, because the chip running your software RAID (P4 Xeon, 2GHz) is much faster than the chip on the hardware controller (arm, 100MHz). Your only limitation is the IO bandwidth, and that's why you go with SCSI.
Server manufacturers sell hardware RAID as an expensive add-on, but they are not advertising any benchmarks showing a speed advantage. Because there is none. Current controllers are just not good enough and can't keep up with the speed advances of CPUs.
This can be done with (as root):
wget http://www.linux-ide.org/smart/smartsuite-2.1.tar.gz
tar -xzvf smartsuite-2.1.tar.gz
cd smartsuite-2.1
make
make install
You might get some non-fatal type errors. The makefile doesn't always work for setting up the rc.d scripts.
Now run:
smartctl -a /dev/hda
I'm assuming the bad disk is /dev/hda, but change it to suit your needs. If you get some errors, then SMART may not be enabled on the drive, so you'll need to enable it with smartctl first.
Anyway, when you run smartctl with the -a, it will tell you all about hardware failures and whatnot. For more info on the codes it returns, go to this page: http://www.ariolic.com/activesmart/docs/smart-attribute-meaning.html
I hope this helps
Beware TPB
EVMS is IBM's version of RAID for linux. This is natively available on gentoo linux. I've been running it on a few boxes with great success. The utilities make it a lot easier to set up raid, lvm, etc. Definitely worth looking at for those interested.
I'm not a real doctor, but I recommend beer.
You know what? The other drive in the RAID pair (/dev/hdd) had DMA off, while /dev/hdb had it turned on. I don't know why that was the case. Perhaps my late night fiddling resulted in some sort of fat fingering (wait... that sounded really bad). Anyway, I decided to do some tests by copying about 150MB of MP3s to my array while setting DMA to either on or off.
With DMA on/off (regardless of which drive has DMA on or off), I get the errors. With it set to off/off, I don't get errors, and the array is slower than a wounded prawn and a huge CPU hog (the copy takes around 50 seconds and the load avg hovers around 4.50). I don't care about slow since this is an NFS/Samba server and CAT5 is my bottleneck. The CPU load I do care about since the box does other things besides simply serve files. With DMA set to on for both drives, I also don't get the errors, which is very cool. The copy takes around 10 seconds and the load avg is about 0.70. All to be expected, since DMA gives quite a performance boost. But it's good to know I can turn it on.
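Turning those timings into throughput makes the difference easier to see. A small Python sketch, using the approximate figures quoted above (about 150MB copied in around 50 seconds without DMA and around 10 seconds with it); the function name is invented for illustration.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average copy rate over the whole transfer."""
    return size_mb / seconds

copy_size_mb = 150

dma_off = throughput_mb_per_s(copy_size_mb, 50)  # both drives with DMA off
dma_on = throughput_mb_per_s(copy_size_mb, 10)   # both drives with DMA on
speedup = dma_on / dma_off

print(dma_off, dma_on, speedup)
```

That works out to roughly 3 MB/s against 15 MB/s, a fivefold gap, alongside the drop in load average from about 4.50 to about 0.70.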
Looks like my issue was with wacked DMA settings, and not the hardware going bad. So thanks for getting me to take another look! I probably ought to go buy the RAID book now...
-B
Ash and Hickory, straight-grained and true, make excellent bludgeons, dandy for the cudgeling of vegetarians.
As far as IDE channels, many many motherboards these days have about 4 ide channels (mine does, and it's not even NEW) 4 ide channels can make a good raid.
Isn't that just 4 IDE plugs, but only really 2 IDE channels? RAID embedded in your motherboard is usually of the Promise variety and cheap hardware raid isn't much better than software raid. Tom's hardware has an informative article on the difference between hardware and software RAID and they reported that this is the case.
You actually feel good about the Linux drivers that Promise gives you with the SX4000? I bought this card, and I wished I stayed away from it.
I am using it with four 120gb IDE drives with 8mb cache. For starters, if you use anything but the sxcslapp program in Linux to configure the drive, your drives are corrupt. All of 'em. And, your bios will return corrupt information regarding them. This causes DOS not to boot (hard freeze), and Linux to produce keyboard smashings on boot. This is a known firmware problem, and I'll be damned if they have any flashes available, even though the card is four months old. I just checked before writing this review.
Once I figured out that all the work had to be done with sxcslapp in Linux, I started building my RAID5, albeit with caution. Things here went pretty well, except a) performance sucked about as bad as a single drive and b) the closed source drivers rebuild the raid array with no warning if a drive fails and is replaced, even if the file system is mounted. So, this means that if you have a drive that bombs and you replace it, anything you write to the raid array will be wiped out. I could have used some notification.
The Linux drivers are horrible. They are written in 'Engrish', and the documentation might as well have been written by someone who doesn't understand computers. "Select the remove drive from array option to remove a drive from array". This continues for all of the options in their menu-driven app.
I am also forced to use Red Hat 7.3 for this. Great. I now have a cluster of Debian 3 servers I administrate and one Red Hat server.
I would have returned the card if my reseller would have taken my money. It's about equally expensive to buy IDE add-on cards, or maybe a bit less, and the software RAID in Linux seems to be firmly documented. I've used RAID1 in software on servers before, and it works nicely.
I'm tempted to go buy a real RAID controller card and get away from software RAID.
What do you think it'll buy you, honestly? I've got a half dozen software RAID1 systems out there, three of them being pounded mightily every day (10k-user ISP mail/radius servers) without so much as a squeak of complaint. Throughput is pretty decent as well:
(yes I know it's not a thorough benchmark) -- So without taking the drive cache into play, I can hit about 30MB/sec sustained. If I had better drives I bet I could boost those numbers significantly. Probably close to the 90MB/sec I am seeing on my new server, single-drive stats.
Well, I had thought that my IDE controller was bad, the IDE drivers were wonky, the raid tools stuff was weird, whatever. I mean, I had two drives which both worked great when used by themselves. I put them in a RAID pair, and I got errors. Turns out I had DMA disabled on one of them, but I was looking at Linux software RAID as the culprit. I thought buying dedicated hardware would isolate any problems. It was a last ditch, straw-grasping effort to tell the truth.
I'm actually a fan of Linux's software RAID1. No "special" drivers, I can use any kernel I want, easy to set up, minimal performance impact, and fairly transparent to use. Now that I know why I was getting errors, and that it wasn't anything to do with software RAID, I'm fine with it.
-B