Which RAID for a Personal Fileserver?
Dredd2Kad asks: "I'm tired of HD failures. I've suffered through a few of them. Even with backups, they are still a pain to recover from. I've got all fairly inexpensive but reliable hardware picked out, but I'm just not sure which RAID level to implement. My goals are to build a file server that can live through a drive failure with no loss of data, and will be easy to rebuild. Ideally, in the event of a failure, I'd just like to remove the bad hard drive and install a new one and be done with it. Is this possible? How many drives do I need to get this done, 2, 4 or 5? What size should they be? I know when you implement RAID, your usable drive space is N% of the total drive space depending on the RAID level."
For personal use, a two-drive RAID 1 is probably the easiest way to go, and involves the fewest drives, but loses the most space (half). Raid 5 is the standard, but the hardware is more expensive and it involves at least one additional drive.
For simplicity and low expense, even though you lose a full drive worth of capacity, go with RAID 1.
You might want to read The Tech Report's recent article mentioned on Slashdot if you haven't already.
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
RAID 0, you need a hero,
RAID 1, is equally fun,
but RAID 5 keeps you alive!
I would choose RAID-1, because RAID Level 1 provides redundancy by writing all data to two or more drives. A RAID-1 array tends to be faster on reads but slower on writes than a single drive. However, if either drive fails, no data is lost. This is also a great entry-level starting point, as you only need 2 drives. The downside is that the cost per MB is high in comparison to the other levels. This level is often referred to as disk mirroring.
Hmmm.
Try RAID 5 or RAID 10 (not to be confused with RAID 0+1). This site has a nice overview of all the RAID options. And, of course, Wikipedia has some info.
Quick overview:
RAID 5 - Requires at least 3 HDs (many times implemented with 5 - can be used with up to 24 I believe). Data is not mirrored but can be reconstructed after drive failure using the remaining disks and the parity data (very similar to how PAR files can reconstruct damaged/missing RAR files for the Newsgroup pirates out there). The percentage of total space available depends on the number of drives used, since one drive's worth goes to parity.
RAID 10 - High performance, but expensive. You get ~50% of the total HD space as it is fully mirrored. So, 1 TB total disk space nets you 500 GB total storage space. Your data is mirrored so if one drive fails you do not lose everything. However, if you experience multiple drive failure you can be in big trouble.
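To make the RAID 5 parity idea in the overview above a bit more concrete, here is a minimal Python sketch of XOR parity. The block contents are made up, and a real array rotates parity across all the drives rather than keeping it on one, so treat this as an illustration of the principle only, not of how any particular controller lays data out.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, as if striped across three drives (hypothetical contents).
data = [b"RAID", b"five", b"demo"]

# The parity block, as if stored on a fourth drive.
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]:
# XOR of the surviving blocks and the parity gives the missing block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

The same trick works whichever single block goes missing, which is why one drive's worth of parity is enough to cover one failed drive and no more.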
Casual Games/Downloads
Whatever you do, never have more than one disk on an IDE channel. Only one disk per channel can be written to at a time, so you will get absolutely horrible performance if you put more than one HD on a channel. If possible, get an IDE RAID card (if you can afford it) or a SATA card/mobo and drives, which don't have this problem.
95% of all computer errors occur between chair and keyboard (TM)
Wow inexpensive & reliable... Those are two words you don't see together too often.
Your good options are raid 1, raid 0+1, or raid 5, depending on what you want..
Raid 1 is the safest.. just mirroring the drives, but it results in no speed increase..
Raid 0+1 does mirrored stripe sets -- you get the speed advantages of raid 0 with the full protection of raid 1.
Raid 5 is good middle ground. Raid 5 stores 1 drive's worth of parity. When you lose a drive, the array keeps running in a slower, degraded mode (a hot spare lets the rebuild start right away); you throw another disk in and it rebuilds back to full redundancy. You also get some speed increase over a normal drive setup. With RAID 5, you only lose a single drive's worth of capacity no matter how many drives are in your array, whereas with raid 1, you lose 50%.
Thank you for fulfilling the "Why didn't you check google?" quota. No one else needs to do this until the next Ask Slashdot!
Keep your eyes to the sky.
*ducks*
I went through this last year and here's what I came up with for the best benefit to cost ratio with the lowest hassle. In short, take an old PC and put a four channel raid controller card in it to do RAID 5. Add a big extra fan for safety and you're done.
Here's what I came up with: Total cost about $1200 (probably less by now).
0) Red Hat Linux, ext3 filesystem.
1) 3Ware Escalade 7506-4LP card (64 bit card, but fits in 32bit slot)
2) 4x 250GB Western Digital drives
3) Big fan.
At RAID 5 this yields 750 gigs (715GB after crappy GB conversion).
The 3Ware software has a nice web monitor interface and does daily or weekly integrity checks. It emails me if there is a problem - I did have one drive die already and replaced it easily.
Pat Niemeyer
Author of Learning Java, O'Reilly & Associates
Another missed opportunity to use the internet's, uh, hottest new acronym: FGI.
"I'm tired of HD failures. I've suffered through a few of them. Even with backups, they are still a pain to recover from.
If you just run Gentoo, you can type "emerge new_harddrive" and it takes care of everything by the end of the month!
or..
Your shit PEECEE WINTEL crap parts made in china are no match for real quality Mac hardware, which are fully integrated with the UNIX UNDERPINNINGS that have the Best GUI Ever(tm) on top.
Disclaimer: I love trolls.
do() || do_not();
Dear Slashdot,
which is better, SCSI or IDE?
Googleless in VA
If I could, I'd get 2x 250GB HDDs in a RAID1 (promise controllers are good for this), and a third 250GB for a cold backup of all my data that syncs weekly.
:)
Raid's great, but an rm -rf is still an rm -rf, thus the third drive
-- The unsig...
Seriously. Raid is all about risk. Figure out how much risk is acceptable to you. If you have a stack of 6 drives and you only believe 1 is ever going to fail at any one time, then go with raid 5.
If you have a stack of 6 drives and believe not a single one is ever going to fail, go for level 0.
If you are a government contractor and are required to handle simultaneous failures of 75% of your drives, either mirror them all or go with 5+1 or a raid 10 setup.
All in all, it's a poor question to ask slashdot. You need to let us know what you consider an acceptable failure, and by the time you have that figured out determining what raid level you need is easy.
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
RAID 5 or 6 will stripe the data across all drives in the array. You give up one drive's worth of space (RAID 5) or two drives' worth (RAID 6) for recovery data, which only gets down to about 8 - 10% of the total on large arrays. With RAID 6 you can lose any 2 hard drives and not lose any data; RAID 5 survives only one failure. RAID 5 and 6 are only incredibly useful in applications with more than 4 hard drives and about 500 GB of storage. It's a little faster than the lower RAID levels because the redundancies are simple parity bit calculations, and are done twice for each single data change on disk. The lower RAID levels will have a set of disks that actually mirror the data intact (RAID 1) or perform more intensive Hamming code calculations and store the results on another set of disks.
So, RAID 5 or 6 would be the best (RAID 6 is worth the extra bit of space for the 2nd calculation, and really helps when you can test the parity bits against another parity to create the lost data.)
There will be some slowdown associated with RAID, but it won't be as bad with 5 or 6, and generally you can live with it for the sake of having relatively robust file servers.
while(1) { fork(); };
A quick note - if you re-initialize the RAID, it will erase everything you have. You should 'rebuild' the drive, unless you have a hot-swap, in which case you just take out the bad drive, pop in the good one, and ur good to go.
Have you thought about software RAID? Before everyone jumps down my throat, I realize that it's slower than hardware RAID...but, here is my rationale for using it:
1) You don't need drives that are the same size.
I've done hardware RAID, had a drive fail 2 years down the road and not been able to find an 18GB SCSI drive to re-insert to the array. That has the potential to jack your entire array. With software RAID, you buy a 36G drive, partition it so that 1 partition fits your array, and off you go.
2) It's a personal file server, so speed is less important than cost (I'm guessing). With software RAID you can mix all sorts of wondrous things together: IDE drives from the basement, SCSI-320 drives you stole from work and nearly everything in between. It's far more flexible, and has no associated controller cost.
3) It's easy as heck. You can configure it in Disk Druid/fdisk, and it works quite easily in any major distribution (I've done it in Slack, Debian, RH, Fedora and Mandrake).
The major downside is that you cannot (at least I don't know how to) hot-swap drives. But again, this is a personal file server. Spend your money on pizza and beer, screw the SCA hot-swap drives that are going to cost you an arm and a leg.
That's just my $0.02...flame away
Werd.
You also never touched on the possibility of him having only 2 drives, in which case RAID 1 would be the way to go for data redundancy.
She loves me: 09F911029D74E35BD84156C5635688C0 She loves me not: 09F911029D74E35BD84156C5635688BF
Really depends on the type of RAID you'd like to implement.
RAID 0 stripes the data across 2 or more drives and therefore offers no redundancy (in fact, in a two-disk stripe you multiply danger of data loss x4 compared to two individual drives -- because you not only double the possibility of failure with two disks as opposed to one, but stand to lose all of the data on both drives should one fail). In any event, no point in discussing it further since redundancy is the point.
RAID 1 offers redundancy by exactly duplicating the contents of a drive onto another drive, and needs exactly two drives. This is considered the most "fail-safe" method of RAID array although offers no performance benefits whatsoever.
RAID 10 (or 1+0 or 0+1) is a combination of RAID 0 and 1 and is nearly always done with four drives, although technically it can be done with six or eight (if your controller supports them). It offers both performance benefit and redundancy, although the cost of the "wasted" drive space is quite high.
RAID 3 involves using 3 or more drives, one of which contains parity information to rebuild the lost drive should any of the other drives fail. This is one of the least popular RAID formats and has more or less been totally replaced by RAID 5.
RAID 5 involves using 3 or more drives and writes parity information across all drives in the array, allowing one drive to fail with little to no performance loss. The failed drive can be replaced and the RAID rebuilt. Depending on your hardware/software, this can often be done hot without having to power down the system at all. It is one of the most commonly implemented RAID solutions because of the good mix between drive use (the price goes down the more drives you have in the array yet you can have as few as three), redundancy, and high availability.
There are others out there like RAID 50 but nothing worth mentioning, especially for a home user.
The only question left to you is whether the RAID will be run by hardware or software (software might be a good choice if you are already running Linux on the server, but you'll have to ask someone else about it because I don't know a thing about it). Personally I chose the hardware route years ago and bought an Adaptec 2400A, which is a four-channel hardware ATA-RAID card capable of RAID0, 1, 10, and 5 -- guess which I use. I use all four channels, each with a 200GB SATA hard drive. I've lived through a couple drive failures, a full drive upgrade (when I first bought the card it was 4x60GB drives) and even once where two drives RAID tables got zapped (I'll NEVER put my drives in removable cages again) and never lost a byte of data -- so the CAD$500 or so for the investment on the card was worth it.
600GB of storage means not having to worry about all those unlicenced-in-North-America-anime torrents running out of space any time soon.
IDE is quite stable these days - certainly for the price.
Let's see. My server requires half a terabyte of storage.
3 200gb IDE drives at $100/ea == $300
3 180gb SCSI drives at $700/ea == $2,100
Yeah... Not likely, pal. And certainly doesn't qualify for "affordable" like this guy is clearly looking for.
True story...had a personal fileserver with a Promise RAID card. I got the Promise card because it was cheap and had a good rating on a couple of review sites.
What I didn't know at the time, but learned the hard way, is that Promise's RAID monitoring program "PAM" is a user-mode only application. That means that if you don't log in, it doesn't run. Care to guess what happened to me?
At some point while I was gone for the weekend, I can only guess something crashed and rebooted Windows 2000. When it rebooted, I didn't have it set to automatically login (why would I? it's a server). So "PAM" wasn't running when one of the drives in the RAID 5 set failed. Maybe it even had something to do with the crash, I don't know.
Now, the point of PAM is that if a drive fails, an e-mail gets sent, in this case to my mobile phone's text-page address. Since PAM wasn't running, however, nothing was sent. The drive failed and, I can only guess, put off so much heat that it cooked the drive above it (why do so many cases mount hard drives horizontally above each other anyway?) and next thing I know, I can't log in to my server from where I'm staying. I call a family member with a key to come by and they are unable to restart the server. It wasn't until I came home and read the BIOS messages that I understood why. Everything gone.
I had a lot of stuff on CDR, but let me tell you, I was plenty outraged that Promise could design something so utterly stupid as a monitoring utility that doesn't know how to run as a service. Even to this day, PAM still will only run as a user-mode program, and even worse, you actually have to login to the program now to start it, which can't be scripted.
F Promise. Only a complete and utter fool would be stupid enough to buy any of their products. May they rot in that special place reserved for child molesters. (Yes, I'm still bitter about it)
- JoeShmoe
-- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
It's for home use
No data loss if a drive dies
Easy to rebuild - remove dead drive, install new one
Budget... Ah. Why is it *every* "Ask Slashdot" never mentions the budget? On the cheap, you could do simple mirroring RAID1 - most mobos with on-board SATA RAID will do this for you. The overhead is that you pay twice as much per GB because you obviously need two drives, and the performance gains are negligible.
Personally, I'd take the more expensive route; get a proper hardware RAID controller with proper RAID management software. There are 4 port SATA RAID controllers (who *really* still needs SCSI for home use?) that cost a few hundred dollars and do full RAID5. You lose one drive for the parity info, but that could be as little as 25% of your total capacity if you get four drives instead of the minimum RAID5 requirement of three drives.
Also, with a proper hardware RAID controller, you should also get a performance boost from use of RAID and have minimal CPU overhead. Get four of Seagate's new 400GB drives and you'll have over a TB of disk space, which should give you some bragging rights for a month or two before it's old hat. :)
UNIX? They're not even circumcised! Savages!
Software raid is plenty fast for a personal fileserver. It's not like you'll have a hundred users on it at a time. Unless you have an ancient CPU, you'll be fine.
-Ryan, with the unoriginal sig
You want a Promise UltraTrak SX8000. It's the easy idiotproof array. We're using several of these.
If a drive fails, it beeps at you til you replace it. You just yank it out, and put in a new drive, the same size or larger. It then rebuilds automatically. No shutdown or reboot required.
The Linux crowd will be happy to know the RM series runs linux. I don't know about the SX series, but I suppose it does too. Either one appears to the server to be a single SCSI drive. No drivers required, other than making the SCSI card of your choice work.
There's the Linux method of doing it too, which I like a lot. It saves you a *LOT* of money in extra hardware. You can go with 3 drives without adding any extra cards to your system, or you can put in IDE controllers to add as many drives as your system can support (PCI slots, power, and physical mounting points are the limitation). Read the "Software-RAID-HOWTO", which should come with your system. I've done many of these also, and they work quite nicely. You have to shut down the system to swap a drive, and then run `raidhotadd` with a couple parameters (the md device and the new partition, if I remember right), and you can be running while it rebuilds.
You should have looked it up before you posted.
RAID 5 is the most common for a large redundant array. The array size is (N-1)*size. The more drives you use in a single array, the smaller the fraction of space you lose.
3 100GB drives = 200GB
5 100GB drives = 400GB
10 100GB drives = 900GB
10 200GB drives = 1.8TB
RAID 0 is striping. No redundancy, which you won't be happy with. (One failure means losing the array.)
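If you want to play with the (N-1)*size rule for your own drive counts, a few lines of Python will do it. The function name is just something I made up for the sketch, and it assumes all drives are the same size.

```python
def raid5_usable(drives, size_gb):
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * size_gb

# The same configurations as the examples above.
for n, size in [(3, 100), (5, 100), (10, 100), (10, 200)]:
    usable = raid5_usable(n, size)
    overhead = 100 / n  # percent of raw space spent on parity
    print(f"{n} x {size} GB -> {usable} GB usable, {overhead:.0f}% parity overhead")
```

The overhead column is the point: parity costs a third of a three-drive array but only a tenth of a ten-drive one.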
RAID 1 is mirroring. With two drives, you still only have the size of one.
RAID 50 is nice where it does striping across redundant arrays. You lose size, but gain speed.
Most other RAID types aren't very popular for various reasons.
Watch out for going over 2TB in size on a single block device. I'm having problems with that right now. I have two Promise VTrak 15100's with 15 250GB SATA drives in each, and any block device over 2TB is giving me grief. There are legitimate reasons for this, most of which newer documentation claims to be fixing, but I'm still having problems with a current Linux release. Making logical drives under 2TB works, but doesn't accomplish what I need.
I hope this helps.
Serious? Seriousness is well above my pay grade.
However, what happens if your place has a fire, gets vandalized, or a burglar takes off with your server(s)?
At my last job, we needed a basic RAID device that was under $500. We found this: http://www.accusys.com.tw/7500.htm It was about $200, and is OS and system independent. You simply put in two IDE drives, and you magically have RAID-1. You can hot-swap the IDE drives if necessary. We had one drive go bad and it worked perfectly. I recommend it to anybody on a budget. It takes up 2 drive bays, so it's a pretty easy fit in any standard PC.
I always like to joke that this book should have been called: RTFM: Raid - The Fucking Manual
Life is the leading cause of death in America.
No offense intended, but why didn't you just do a google search rather than asking 1.5million slashdotters?
Holy crap! There are 1.5 million of us? Now I know what to say the next time a bully asks me, "You and what army?" THE SLASHDOT ARMY!!!
-Ryan, with the unoriginal sig
Don't get too fancy with yourself on this one...
You definitely don't need any type of RAID solution because it doesn't offer you what you really need. You say you want RAID, but what you really want is backup.
All RAID solutions deal with disaster recovery, but they don't deal with the situation where you accidentally rm -rf a directory that you wanted. If you mirror or RAID 5 your drives, you're still hosed because the delete hits every drive in the array at once. In the end, a backup is more important and much more convenient.
Instead, go with a better approach which is copy or tar your files every night (or every week) to a backup drive, preferably over the network on a completely different machine. This will prevent the problem of a power surge or accidental shutoff from corrupting both drives at the same time.
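For what it's worth, the "tar your files every night" idea above can be sketched in a few lines of Python with the standard tarfile module. The function name and the dated file naming are my own choices for illustration, not anything the poster specified.

```python
import tarfile
import time
from pathlib import Path

def make_backup(source_dir, backup_dir):
    """Write a dated, gzip-compressed tar archive of source_dir into backup_dir."""
    source = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # One archive per day, named after the source directory and today's date.
    archive = dest / f"{source.name}-{time.strftime('%Y%m%d')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return archive
```

You would call something like make_backup("/home/me/files", "/mnt/backup") from a nightly cron job, where both paths are placeholders and the backup directory is ideally a mount from a different machine. Because each day gets its own archive, yesterday's copy is still there after today's accidental rm -rf.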
Actually, I've used it quite successfully under Linux for web, MySQL, and mail servers. The mail server is the most abused server, and it has no speed problems. We have 3 IDE drives as a RAID5 under Linux (md device). That server has been known to pass over 100k Emails per day. Sure, it's mostly spam and viruses coming in, but they're still received, scanned, and everything but the high scoring spam and viruses are delivered.
So, several hundred users using IMAP and POP3 to collect mail, SMTP to send mail, and the 100k or so incoming messages do add up to a lot of work, and it handles it flawlessly.
$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md0 : active raid5 hdc2[2] hdb2[1] hda2[0]
351100416 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices:
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md0 330G 11G 302G 4% /
/dev/hda1 122M 8.0M 108M 7% /boot
none 499M 0 499M 0% /dev/shm
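Since software RAID has no controller to beep at you, it is worth checking that status line yourself. Below is a rough Python sketch that looks for a degraded md array in mdstat-style text. It assumes the usual convention that a healthy three-disk array shows [3/3] [UUU] and a degraded one shows something like [3/2] [UU_]; the sample text and the function name are mine, and the parsing is deliberately minimal.

```python
import re

# Sample text in the style of /proc/mdstat (trimmed to the lines that matter).
SAMPLE = """\
md0 : active raid5 hdc2[2] hdb2[1] hda2[0]
      351100416 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
"""

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            # Fewer active members than expected, or a '_' slot, means degraded.
            if active < expected or "_" in status.group(3):
                degraded.append(current)
    return degraded
```

On a live box you would feed it the contents of /proc/mdstat and send yourself mail when the list is non-empty. Run it from cron rather than from a logged-in session, so it keeps working after an unattended reboot.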
Serious? Seriousness is well above my pay grade.
Because Google turns up 1,400,000 hits of mostly crap in 0.11 seconds. When you need advice, do you ask a librarian, or a group of trusted friends? By your logic, we should trust the company that wants to sell us RAID cards. I'd rather ask people that use RAID products, not sell them.
Seeing as how you want data redundancy, there are three RAID levels for you to pick from:
RAID 1 - Drive mirroring.
Pros:
-Excellent read performance, no loss of performance if one drive crashes.
Cons:
-The amount of space you can have on this array is limited to the largest drive you can find. Then you have to buy a second one to mirror the data, which means you are paying double the cost per unit storage on your array.
-Write performance is slower than other RAID levels.
RAID 5 - Striped array with parity. You can stack as many drives as you want on this array (within limits of the controller of course) and lose only one for redundancy.
Pros:
-You can build a very large data array out of as many drives as you want, losing only one for the purpose of data reconstruction should a drive in the array fail.
Cons
-Array performance dies in the event of a failure, as lost data is reconstructed on the fly from parity information stored across the remaining drives. Of course, performance is restored when the bad disk is replaced and the array reconstructed.
-You need at least 3 drives to build a RAID 5 array.
RAID 10 - Drive mirroring with striping. Essentially combines RAID 0 and RAID 1, hence RAID 10.
Pros:
-Redundant and fast. Array can survive multiple drive failures.
Cons:
-Expensive. You need at least 4 drives to get started with RAID 10, and go by 2's as you expand on the array. As with RAID 1, your price per unit storage is doubled.
-The array can survive multiple failures, but that depends on which drives die...If you lose two drives out of the same mirror set, then the array is gone
Which RAID level you pick depends on your application. If you are interested in having something like a 1 TB data dump, you'll probably want to go RAID 5. If you only want 200GB or less in your array, then RAID 1 is probably the way to go. If you are interested in lots of space, lots of redundancy, and have lots of money, then RAID 10 is probably what you want.
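To put a number on the "depends on which drives die" caveat for RAID 10 above, here is a small Python sketch that enumerates every possible two-drive failure in a four-drive array. The drive names and the two-mirror layout are assumptions for the example.

```python
from itertools import combinations

# Assumed layout: four drives, two mirrored pairs striped together.
MIRRORS = [("A1", "A2"), ("B1", "B2")]

def survives(failed):
    """The array survives as long as every mirror keeps at least one working drive."""
    return all(any(d not in failed for d in pair) for pair in MIRRORS)

drives = [d for pair in MIRRORS for d in pair]
pairs = list(combinations(drives, 2))
ok = [p for p in pairs if survives(set(p))]

print(f"{len(ok)} of {len(pairs)} two-drive failures are survivable")
```

With this layout, the only fatal combinations are the ones that take out both halves of the same mirror, so a second failure is survivable more often than not, but it is never guaranteed.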
-R
>No offense intended, but why didn't you just do a google search rather than asking 1.5million slashdotters
No offense intended here either, but why is it that every time someone posts an "ask slashdot" question someone else feels compelled to complain (and occasionally get downright rude) about why the user didn't just "google it"?
Google will get you articles and advertisements, true, but most of the time what the questioner is really after is people's OPINIONS and EXPERIENCES.
If I post a question like "what's the best backup program you've used on linux" I'm looking for 1.5 million slashdotters' EXPERIENCES with backup programs...a google search will get me a list of programs and some reviews if I'm lucky, but that's no substitute for hearing from a bunch of people who've actually DONE or USED something.
Hearing from a few hundred or thousand responders is a better recommendation than a "C-NET" review any day!
Milo from Kangaroo Koncepts
It sounds to me like this guy just needs a quality HDD and good tape backup. Do not put your faith in RAID, put it in a good off-site backup. I've seen RAID solutions fail too many times, twice just recently. The first one was a company with a slick server and nice hot-swappable SCSI drives, but their controller card went out. It was replaced by the manufacturer but the techs were unable to recover the data. The next one happened when a machine's case fan went out and the mirrored HDDs cooked themselves to death. The moral of the story: NEVER TRUST RAID and as always keep a backup.
--Gentoo Baby!
there are two kinds of people: those who have had hard drive failures, and those that will have hard drive failures. i don't care if jesus h fucking christ himself blessed your hard drives.
I suggest looking at getting reliable drives before looking at a RAID solution.
and, if the poster is looking for the more-realtime-than-backup-restore reliability as he indicated, i suggest he look at raid BEFORE looking at drive quality.
the name of the game is redundancy. a RAID array of cheap drives (let's remember that it stands for Redundant Array of Inexpensive Disks) *is* more likely to have a single hard drive failure - but it's recoverable. however, it's far less likely to have multiple, simultaneous drive failures on the same day (unrecoverable) than your one, expensive, better-quality hard drive is likely to have a single failure - which is unrecoverable.
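The redundancy argument above can be checked with some back-of-the-envelope probability. The failure rates below are invented purely for illustration, and the math assumes drives fail independently, which the heat and controller horror stories elsewhere in this thread show is optimistic.

```python
def p_at_least_two_fail(n, q):
    """Probability that 2 or more of n drives fail in the same window,
    assuming independent failures with per-drive probability q."""
    none = (1 - q) ** n
    exactly_one = n * q * (1 - q) ** (n - 1)
    return 1 - none - exactly_one

# Hypothetical numbers, for illustration only.
cheap_daily = 0.05 / 365   # cheap drive: 5% annual failure rate, spread evenly over days
good_annual = 0.02         # "better" single drive: 2% annual failure rate

# Chance that a 4-drive array of cheap disks sees 2+ failures on one given day.
same_day = p_at_least_two_fail(4, cheap_daily)

# Chance that this happens on at least one day of the year.
array_annual = 1 - (1 - same_day) ** 365

print(f"array of cheap drives, unrecoverable loss per year: {array_annual:.2e}")
print(f"single good drive, unrecoverable loss per year:     {good_annual:.2e}")
```

Even with drives that fail more than twice as often, the array's chance of an unrecoverable same-day double failure comes out far below the single drive's chance of dying, as long as dead drives get replaced promptly and failures really are independent.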
pr0n - keeping monitor glass spotless since 1981.
I'm glad he asked. I benefit from reading the discussion, including the various tangents. This gives me another opportunity to consider using RAID at home and benefit from some "war stories" folks might offer. My needs aren't exactly the same as his, but fortunately people never stick to the exact question asked, anyway. The free information people give out is invaluable, especially the stories of personal experiences and descriptions of people's personal setups at home.
Secession is the right of all sentient beings.
I've run software RAID-5 on Linux for several years on two of my home fileservers.
The only problems I ever encountered were hardware failures (Promise *ack* *spit* PCI IDE cards) and one drive failure. Performance is not really an issue for home use; I can easily saturate my 100Mbps network card.
My Fileserver: AMD Duron 1300MHz, 768MB RAM
This device was built from 4x 160GB 7200rpm SW RAID-5 for online storage (including all of my digital photos, and my collection of CD's ripped to MP3).
For backup I have an old Celeron 433, 512MB RAM box with 4x 120GB 5400rpm SW RAID-5
The main fileserver is rsynced to the backup server once a week. CPU on the backup server is a bottleneck; the Celeron is a bit underpowered for rsync, but it works ;)
My $0.02:
- Software RAID is perfectly usable, especially for typical home use. Performance is adequate.
- With RAID-5 you "lose" only one disk to parity so it is quite cheap to build
- Yes, I'd really like a 3Ware Escalade but if the card fails I need to get a new one pronto; software RAID sets can be migrated to most PCs.
-- Gxis! Ed.
Why not read a few FAQ entries at StorageReview?
In short, I would probably recommend RAID5 if you have 3+ drives.
RAID5 gives you the most available space while still being redundant. It allows for exactly one hard drive failure.
RAID5's write speed is usually terrible, especially with a small number of drives, but write speed isn't a big deal on my home file server. (Only you know about your needs).
RAID1+0 (NOT RAID 0+1, which is inferior) is great for performance. With 4 drives, you have potentially twice the STR of one drive (writing) and 4 times the STR of one drive reading. Of course, since STR is not important for most IO, this doesn't really affect your end performance much unless you are dealing with linearly reading/writing very large files.
Writing performance will almost certainly be higher than with RAID5.
You do lose quite a lot of space (especially when you use a large number of drives). If you used a 4-drive 1+0 array, you would have the space in two of those individual drives.
RAID1 is nice, and is very reliable, but is impractical with more than two drives unless you are incredibly paranoid. RAID1 simply makes all drives copies of the others; thus, you always have as much space as one drive would have, even if you have ten. Of course, you could also handle 9 drive failures and not lose data. RAID1 is fine for 2-drive arrays though.
DO NOT FORGET that RAID is no substitute for regular backups. RAID will not help if your data loss is caused by FS corruption, a cracker, accidentally typing "rm -rf /", etc.
For lowest cost, I would use software RAID, such as Linux's md driver, FreeBSD's Vinum, or whatever Windows has. (RAID5 requires Windows server). (I would not use Windows as the file server myself).
For slightly higher cost, try a Promise controller.
I would avoid Highpoint and Silicon Image controllers. Highpoint, especially, is crap. (but it is very cheap, at least).
If you possibly can, I would recommend a nice 3Ware Escalade controller. Escalades are true hardware RAID cards, unlike Highpoint/SI and most of Promise's cards, and are OS independent and very stable (with certain exceptions for some unlikely configurations).
If you have any questions, you might try the StorageReview forums. There are a number of extremely knowledgeable people there, including engineers and executive-level researchers at hard drive companies. They can give far better advice than I can, I am sure.
By the way, all my comments assume that all drives are the same size. If not, treat all drives as if they are the same size as the smallest drive on the array (unless you are using JBOD, which is not redundant)
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
If it's just a mirror, writes are slowed slightly
Hardware controllers with battery-backed RAM (note: not all controllers have this) will have an edge over software solutions on ALL writes - no matter which RAID level you use.
Don't even bother trying to do RAID 5 in software
SW RAID is usually a lot faster than HW RAID solutions, when you factor out the battery-backed RAM part. Any HW RAID controller without battery-backed memory will lose big-time to SW RAID on even moderately fast CPUs (like 500MHz P-IIIs), especially on RAID-5 which is compute intensive, and even more on RAID-6 which is also compute intensive but not XOR based.
Modern HW RAID controllers have reasonably fast CPUs with XOR accelerators built in - therefore they can do RAID-5 as fast as the pure SW solution. But this is not the case with older controllers.
I know of people who use 3ware cards for large RAID-5 servers, but only use the 3ware cards as "dumb" IDE controllers, and leave the RAID-5 handling to SW-RAID. The reason? Their benchmarks indicate that this is significantly faster.
And when you think about it, it makes sense. Nobody puts a GHz processor on a RAID controller. Even a slow-by-today's-standards P-III is able to XOR more than a gigabyte of data per second - much much more than anything you put through most file servers out there.
So, the "HW RAID is faster than SW RAID" is true in one scenario only; when you have write-intensive workloads and a HW RAID controller with battery backed cache.
In *all* other cases, SW RAID will be a win, performance wise.
For a personal file server, I wouldn't hesitate to run RAID-5 in plain software. It's as fast or faster than any HW RAID controller in the sub-$3K price range, it's reliable, and the flexibility beats the heck out of any HW based solution out there (mixing IDE/SCSI, allowing a cryptographic layer between the RAID layer and the physical disks, etc. etc...)
Next up is drives. Not all drives are alike, as I'm sure you already know. Do you want a SCSI or an IDE array? I won't go into this lengthy topic further. I'll assume though that you will build an IDE array. Some drives do not work well in RAID setups. The controller companies are more likely to tell you this than the drive manufacturers. I own 6 Western Digital WD12000JB drives (7200 RPM, 8MB cache, 120GB capacity). By all accounts one would expect those drives to work quite well in a RAID setup. They have excellent read/write times individually and have a massive amount of cache. Well, one would think that, and one would be wrong. 3ware, Highpoint, and Asus tech support (for the OEM Promise chipset in the A7V333) all recommend against using Western Digital drives. 3ware did however say that WD will give you firmware that works significantly better in RAID setups if you ask for it. Personally I'm a fan of Maxtor, both the drives and the company. I've had very few failures with Maxtor drives. Whenever I did, they were always extremely helpful with getting me a replacement fast. I've been very impressed by their service. I have 2 Maxtor 7Y250P0 and 2 6Y200P0 drives in the server sitting next to me. The second pair are very high quality drives from Maxtor's DiamondMax Plus 9 line. They too have 8MB cache, offer 200GB of space, and run at 7200 RPM. Nice drives. The first pair are from Maxtor's MaXLine Plus II line. They have a high MTTF, 8MB cache, 250GB space, and run at 7200 RPM. They are also a little bit faster than the 6Y200P0. They are excellent drives. My next drives will also be Maxtors, but this time I'll be buying the SATA siblings of the MaXLine Plus II product line.
That brings me to my next point: PATA or SATA. Does your case have an abundance of room? I mean a massive amount of room to route long 80-conductor ribbon cables? Do you have at least 1 if not 2 PCI slots to waste below your RAID controller, with the room needed to route the ribbon cables and make connections? If not, then you need to go with Serial ATA drives. Don't even think twice about it. Go with SATA. The drives cost almost the same nowadays and you'll find what little price difference there is ($5?) is worth it in the end. SATA drives are so much easier to wire. I have a case full of round cables. The case I have is an extremely large Codegen case and even I am having trouble with the cable mess. SATA is a wonderful thing. Along the same lines are hot-swap cages. There are a dozen brands to choose from. You should probably use them, even if you don't need hot-swap capabilities. I need them to create 3.5" drive slots from 5.25" bays. If you do want to do hot-swapping, make sure your drive cage and controller support it.
Finally we get to RAID levels. You don't want to increase your risk of losing data so level 0 is out. 1 is extremely redundant and with the right controller can actually speed up reads. It's also costly at twice the cost per GB. Unless the data you're storing is absolutely critical you won't want to use 1 (in most cases). Forget about level 2. For starters th
Wrong: With RAID 5, if any two drives fail, all data is lost. Well, all data is not really exact, but the only thing you could hope is that you can still restore some text files.
Your description seems more to fit a RAID 1+0 which is something completely different.
And then, you don't seem to know anything about probabilities:
"If you have 15 drives, and two fail, the chances of them being consecutive are very low."
Correct. But the probability of two consecutive drives failing is still just as high as with 3 drives! It is just much more probable that from 15 drives two fail at the same time (that's just 2/15) than from three drives (2/3). Still, it could be better to have more drives, just because you could have a better "feeling" of how many drives fail before it comes to the fatal crash.
But for RAID 5, this is irrelevant anyways, because any two drives failing will screw your data. And with 15 drives, the probability for that is much higher (and I would even say that 15 drives is a bit too much for RAID 5, use a RAID level where more than one drive can fail without data loss).
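To put rough numbers on "much higher", here is a small Python sketch. It assumes each drive fails independently with the same probability `p_fail` during whatever window you care about (say, the time it takes to notice and rebuild); that independence is an assumption, and the 5% figure is invented for illustration, not a measured failure rate.

```python
def p_two_or_more_failures(n_drives, p_fail):
    """Chance that at least two of n_drives fail, assuming independent failures."""
    p_none = (1 - p_fail) ** n_drives
    p_exactly_one = n_drives * p_fail * (1 - p_fail) ** (n_drives - 1)
    return 1 - p_none - p_exactly_one

# Hypothetical per-drive failure probability for the window in question.
P_FAIL = 0.05

for n in (3, 5, 15):
    print(n, "drives:", round(p_two_or_more_failures(n, P_FAIL), 5))
```

With any fixed per-drive probability, the result grows quickly with the number of drives, which is the argument for a level that survives more than one failure on big arrays.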
Unix makes easy tasks hard and hard tasks possible. Windows makes easy tasks easy and hard tasks $29.95.
I have just finished doing this exact thing.
;)
I basically built a box to do nothing other than serve files. I put together a nice simple old PC (550MHz with 256 MB of RAM) and mounted it in an old rack mount case I had lying around.
It's running debian with 2.4.26.
I'm running software RAID and installed two 2-channel IDE cards.
I threw in 6 Seagate 120 gig drives (the ones with the 8 meg cache) and ran RAID 5 across 5 of them, with the sixth as a hot spare to rebuild the array should a drive fail. Each drive has its own IDE channel to prevent a channel failure from screwing my RAID.
I'm using ext3 as the filesystem and wrote my own little raid mon script that SMS's me should a drive fail and alarms locally.
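For anyone wanting to roll a similar monitor, here is a rough Python sketch of the detection half only. It is not the script described above; it assumes the usual `/proc/mdstat` layout where each array's detail line ends with something like `[5/4] [UUUU_]`, and the `SAMPLE` text is a hand-written illustration rather than real output. Sending the SMS is left out entirely.

```python
import re

# Matches status pairs such as "[5/4] [UUUU_]" on an array's detail line.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

def degraded_arrays(mdstat_text):
    """Return names of md arrays reporting fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

# Hand-written example in the style of /proc/mdstat (device names are made up).
SAMPLE = """\
Personalities : [raid5]
md0 : active raid5 hdk1[5] hdi1[4] hdg1[3] hde1[2] hdc1[1] hda1[0]
      468883200 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

Run from cron with the real file contents (for example `open("/proc/mdstat").read()`), a non-empty result is the cue to raise the alarm.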
This setup has been rock steady and gives me 460 (ish) gig of usable space after formatting.
For added peace of mind the machine is plugged into a UPS that is connected to the machine via serial. If the UPS kicks in, it shuts the machine down properly after sending an alarm SMS (the DSL and switch are also on the UPS). (Yes, I'm a paranoid freak.)
This makes a perfectly good media and file server and I've had no problem with it in the few months I've had it.
I also recommend setting the spin-down time on the drives manually with hdparm. It was getting awfully warm in the box till I turned that on for the Seagates. Modern drives run rather hot.
I have the whole thing mounted via SMB on my other boxes around the house and it's fast (gig ethernet), reliable and easy.
Though do remember that no amount of RAIDing will save you if you lose 2 drives through some horrible freak of badness, and no RAID level is going to protect you from a house fire. Hence mine also rsyncs all my absolutely vital files (scanned family photos and docs) offsite to a file storage site every night at 2am so as not to chew my bandwidth during usable times. Don't forget the only truly secure data is that which is backed up... and offsite... twice.
What you seem to fail to grasp is that your 5 year SCSI guarantee does not guarantee you that the disk will not fail within 5 years.. It merely means that the disk is unlikely to fail in that time and they will give you a free replacement if it does.
Therefore, if your data is important you won't just trust that an unlikely event won't happen - you'll assume that it will happen and make sure that it won't affect the integrity of your data.
Therefore you'll be using RAID and preferably regular backups whatever you do. This is what ensures your data integrity, not the reliability or otherwise of your drive.
After that, it's a case of weighing performance, the cost (in money, manpower and downtime) of replacing a broken drive and the cost of setup against each other, and this is where it starts to make sense to use IDE drives for RAID:
For instance, say you've got a 5-drive IDE RAID array. Over the space of, say, five years you end up having to replace three of the drives - that's eight IDE drives you've had to buy.
You also do the same thing with SCSI drives, and luckily none of them break - that's 5 SCSI drives all in all.
Now, say the IDE drives cost $100 each compared to $500 for the SCSI drives. You've spent $800 in the IDE case compared with $2500 in the SCSI case. There was no difference in the safety of your data but the SCSI one cost three times as much.
Therefore to choose SCSI, you'd *really* want to get that extra little bit of speed, which to be honest is more likely to be limited by the network to your server anyway...
So, to recap - assuming your data is valuable to you, the choice between SCSI and IDE has nothing to do with the disk reliability because you'll be relying on some other systems (RAID and backups) for your reliability anyway.
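The arithmetic behind that comparison, as a quick Python sketch. The prices and replacement counts are the hypothetical ones from the example above, and `array_cost` is just a helper name chosen here.

```python
def array_cost(initial_drives, replacements, price_per_drive):
    """Total spent on drives: the original set plus every replacement bought."""
    return (initial_drives + replacements) * price_per_drive

ide_total = array_cost(initial_drives=5, replacements=3, price_per_drive=100)
scsi_total = array_cost(initial_drives=5, replacements=0, price_per_drive=500)

print(ide_total, scsi_total, scsi_total / ide_total)
```

Plug in your own prices and failure guesses; the IDE array stays cheaper until replacements get far more frequent than in this example.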
I would suggest you don't buy a RAID system. Here's what I do: I've got 3 hard drives - one small one with a tiny Linux installation on it and 2 hard drives of the same size for data. Every night Drive 1 is rsynced to Drive 2 and unmounted, and Drive 2 is mounted in its place. The next night Drive 2 is rsynced to Drive 1, and so on. The great advantage: if you accidentally delete a file, you have until midnight to restore it without any hassle.
Spelling mistakes: My is english spoken not tongue of mother.
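A rough Python sketch of the nightly alternation described above. The mount points `/mnt/data1` and `/mnt/data2` are hypothetical, and the `-a --delete` options are an assumption about how the copy is made (they give an exact mirror, so a deleted file disappears from the standby at the next sync); the actual mount/unmount swap is not shown.

```python
def nightly_sync_command(night_number, mounts=("/mnt/data1", "/mnt/data2")):
    """Build the rsync command for a given night, alternating the direction."""
    live = mounts[night_number % 2]           # drive in use during the day
    standby = mounts[(night_number + 1) % 2]  # drive that receives the copy
    # Trailing slashes make rsync copy the directory contents, not the directory.
    return ["rsync", "-a", "--delete", live + "/", standby + "/"]

for night in range(3):
    print(" ".join(nightly_sync_command(night)))
```

The list form can be handed straight to `subprocess.run` from a cron job once the paths match your own setup.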
How do I know? 'Cause I submitted this EXACT SAME story a month ago and was rejected.
Sigh.
The cheapest internal, OS-independent RAID 1 (mirror) is the Duplidisk3 from ARCOIDE.com.
You also get a ton of implementations: stand-alone, PCI card (for power only), 3 1/2" bay, and 5 1/4" bay. The ones that install in bays are so the user can see the status lights.
If you want an external RAID 5 the cheapest I have found is this - http://www.coolgear.com/productdetails1.cfm?sku=RAID-5-3BAY&cats=&catid=314,312 It is a 3 bay RAID 5 for $800.
If you want 5 disk RAID 5 those are @ $1200. http://www.cooldrives.com/fii13toatade.html
If you want external RAID 0 or 1 relatively cheap then go with one of these - http://www.cooldrives.com/dubayusb20an1.html
You can find a ton of these devices on the web since they all use the same drive controllers and bays. The nice thing about these is that sometimes you can talk the store into selling you the RAID system without the external case. These things simply require you to plug in an IDE cable and power, and can be installed in any PC case that has 2 5 1/4" bays open. If you buy just the 2 bay controller they are @ $230 or so. I have one and I am really happy with it.
Everything I listed above uses IDE drives and is OS independent.
The problem w/ Software RAID is it depends on the OS; if your OS fails you can lose your data - I've confirmed this w/ Windows Software RAID at least, it's a real, real bitch to recover from if you have any OS problems (and no matter what anyone tells you, Signed Disks in Windows are a horror story waiting to jump out at you).
As for forking $ for RAID cards, I've had really good experiences w/ the MegaRaid cards from LSI Logic - really, really good tech support and exceptionally inexpensive cards.
closed minded is as closed minded does
Basically, your options are RAID-1 and RAID-5... as hundreds of people here have already pointed out. RAID-1 is just straight mirroring (where all drives in the array contain the same information). Usually, this just involves two drives, but there's no reason why you couldn't have, say, three or four drives all mirrored... and you could lose all but one of them and still be up and running.
RAID-5 is a very cool beast. You basically have an array of drives with some portion of them set aside for redundancy. Most of the posts I've seen here only describe a scenario where you have three drives with one of those drives for redundancy. This only scratches the surface, however.
For example, you could have an array of, say, 5 10GB drives, with 2 drives' worth of redundancy (strictly speaking, two drives' worth of parity is RAID-6 rather than RAID-5). With this, your RAID implementation would make available to you what seems to be a single 30GB drive (since 20GB of the total 50GB is used for redundancy). This way, you could have any two drives go bad and you're still okay.
Another example, I guess, is that you could have a two-drive RAID-5 with one drive's worth of redundancy. In this case, you'd have the functional equivalent of a RAID-1 mirroring setup. Not very sexy... but you could do it in some implementations, I'm sure.
I'm trying to use the phrase "X drives' worth of redundancy" instead of "X drives set aside for redundancy" because it's important to point out that, in RAID, all of the drives are considered equal. If you have 5 drives with 2-drive redundancy, it's not like you set 3 of them as the "main" drives and 2 as the "backup" ones. There's no preferential treatment like that. All the drives are equivalent and you could lose any of them and the others all move to cover for the one that was lost.
Now, personally, I like RAID-5 because it offers the ability to use more than 50% of the space you paid for. With RAID-1 mirroring, you always only get to use 50% of the space that really exists. This would be necessary if, when you suffered a storage failure, you always lost half of it. But that's not how it happens. Usually, you lose a single drive. So, it would be nice to maximize your space available, while having some insurance against a single drive failure.
This is where RAID-5 really shines, because each successive drive you add, you get all of that space for your usage. You could have, say, four drives, 1 drive of redundancy, and you get 3 drives' worth of space.
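The space math from the examples above, as a small Python sketch. `usable_capacity` is a made-up helper, and it assumes equal-sized drives and ignores filesystem overhead.

```python
def usable_capacity(n_drives, drive_size_gb, redundancy_drives=1):
    """Usable space when some number of drives' worth of space goes to redundancy."""
    if not 0 <= redundancy_drives < n_drives:
        raise ValueError("redundancy must be at least 0 and less than the drive count")
    return (n_drives - redundancy_drives) * drive_size_gb

print(usable_capacity(5, 10, redundancy_drives=2))  # five 10GB drives, 2 for redundancy
print(usable_capacity(4, 10))                       # four drives, 1 for redundancy
print(usable_capacity(2, 10))                       # two drives: same space as a mirror
```

Each extra drive adds its full size to the usable total, while the redundancy cost stays fixed.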
Now, there are a few pros and cons for both RAID-1 and RAID-5 regarding recovering/moving data and changing the size of your array, and I'll list them here.
IMHO, the real value in SW RAID is the hardware independence.
If your HW RAID controller dies, you have to get another one of the same controller, and hope that you can re-import your config w/o losing all your data. If you're running SW RAID and your SCSI/IDE controller dies, you can replace it w/ whatever is cheap/available at the time. As long as the failure itself didn't bork your data, you shouldn't have to do much, if anything, to see your data again.
If you can afford to get the top of the line SCSI RAID controller from a good vendor it's probably the better option, but if cost is an issue, IDE SW RAID is the only way to go.
That's not how you do it. Show me someone who needs 147 GB on their root partition, and I'll show you someone who has a poorly configured linux system.
My system, for example:
36gig 15k (3.6ms) rpm scsi: /
250 gig 5k (9.5ms) rpm ide: /scratch
All of my media is in /scratch; it doesn't need fast access. My swap, libraries, binaries, source code, etc, is on my root partition, where it needs fast access.
Who needs 3.6ms access time for their music and videos? What will that gain you? I can tell you what 3.6 ms access time gives you for a root partition, though: blazingly fast startup of the system, of X, of programs, and compilation.
As I demonstrated, you can get a small 10k rpm scsi drive with access time 70% better than that of all but the nicest IDE drives (which cost notably more than scsi drives), brand new and with shipping, for $30. After re-looking at pricewatch, I found the same thing for only $20, including shipping. You can get a new scsi controller for $20 also, inc. shipping, that will do 40MB/s (plenty for one drive). A new cable will cost you about $6. That's about $50 for a root partition that will give you a 70% speed boost over a 7200 RPM ide drive.
Why would one *not* do something like that, unless they really don't care about speed at all? And if they don't care about speed, why raid for reasons other than redundancy?
You know when it's okay to shout fire in a crowded theatre? When it's on fire.
No one uses software RAID for performance, although the performance is good compared with the cheap 1+0 cards available.
The real advantage of software over hardware RAID is that you don't need to keep a spare RAID card around. With hardware RAID, when your RAID card fails you'll need exactly the same make & model card to read your data.
With Linux software RAID, you can read the drive set on any system with the raid modules.
London's finest organic fairtrade coffee
I know of people who use 3ware cards for large RAID-5 servers, but only use the 3ware cards as "dumb" IDE controllers, and leave the RAID-5 handling to SW-RAID. The reason? Their benchmarks indicate that this is significantly faster.
First off, 3Ware cards cannot be used as "dumb" IDE controllers - they only support logical drives - creating single drives is not possible, nor is leaving unassigned drives.
Second, Software raid will always suck for one big reason: A drive fails, your system locks up.
I have not seen any software based controller (promise, Silicon Image, High Point) or complete software based solution (Windows 2000/2k3 server's RAID, or Linux's md raid) on standard IDE controllers stay alive after a drive fails. It always takes the box down with it.
When you buy a hardware based RAID solution, the controller handles the drive failure gracefully, which keeps the machine running. "Dumb" IDE controllers don't know they're raided (they are dumb after all), so when a drive fails, they freak out.
3Ware makes a TRUE hardware based RAID solution that is intelligent enough to email you when a drive fails. Their 2 channel cards (SATA and PATA) are roughly $100, and their 4 channel cards (RAID-5-able) are $250 and $350. It's well worth the money.
I've not used the LSI Megaraid SATA controller yet (I plan to); I've had good luck with their cards for SCSI RAID, and they carry a slightly cheaper price tag than the 3Ware cards.
No, I do not work for 3Ware - I think suggesting software RAID to anyone is a bad idea. I've seen people lose data with Promise controllers, which are nothing more than glorified IDE controllers with software doing the RAID functionality. Software RAID is BAD.
-- If we don't stand up for our rights, now, there will be no right to stand up for them later.