RAID Vs. JBOD Vs. Standard HDDs
Ravengbc writes "I am in the process of planning and buying some hardware to build a media center/media server. While there are still quite a few things on it that I haven't decided on, such as motherboard/processor, and Windows XP vs. Linux, right now my debate is about storage. I'm wanting to have as much storage as possible, but redundancy seems to be important too." Read on for this reader's questions about the tradeoffs among straight HDDs, RAID 5, and JBOD.
At first I was thinking about just putting in a bunch of HDDs. Then I started thinking about doing a RAID array, looking at RAID 5. However, some of the stuff I was initially told about RAID 5, I am now learning is not true. Some of the limitations I'm learning about: RAID 5 drives are limited to the size of the smallest drive in the array. And the way things are looking, even if I gradually replace all of the drives with larger ones, the array will still read the original size. For example, say I have 3x500 GB drives in RAID 5 and over time replace all of them with 1 TB drives. Instead of reading one big 3 TB drive, it will still read 1.5 TB. Is this true? I also considered using JBOD simply because I can use different size HDDs and have them all appear to be one large one, but there is no redundancy with this, which has me leaning away from it. If y'all were building a system for this purpose, how many drives and what size drives would you use and would you do some form of RAID, or what?
That said, RAID is not a replacement for proper backup. RAID is just a first line of defense to avoid downtime.
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
RAID 5 drives are limited to the size of the smallest drive in the array.
Yes... Duh....
And the way things are looking, even if I gradually replace all of the drives with larger ones, the array will still read the original size. For example, say I have 3x500gb drives in RAID 5 and over time replace all of them with 1TB drives. Instead of reading one big 3tb drive, it will still read 1.5tb. Is this true?
Yes... Fucking duh.... Have you even read the RAID 5 Wiki article?
I also considered using JBOD simply because I can use different size HDDs and have them all appear to be one large one, but there is no redundancy with this, which has me leaning away from it. If y'all were building a system for this purpose, how many drives and what size drives would you use and would you do some form of RAID, or what?
We've been through this a million times before and the answer is always the same. You're a cheap bastard who wants gobs of space with an acceptable amount of redundancy but aren't willing to buy two sets of drives. Buy 4 of the biggest drives you can afford and RAID 5 them. Don't expect stellar write speeds. You won't have a backup if something happens and all 4 drives blow but you'll at least have protection when one drive gives up the ghost which is mainly what most people want to protect against.
Why does stupid shit like this keep getting posted to the front page?
This is what you do: buy 2 drives exactly the same size and mirror them. End of story. If you're worried about a blown RAID controller, then buy another hard drive, stick it in another computer, and run a weekly cron job to copy everything. Right now you can get a 500 GB hard drive for about $150. Get two of them and mirror them. (If you need more than 500 GB I would highly suggest encoding your porn into a different format than MPEG2.) By the time you run out of space, you will be able to get 1 TB drives for about $150. Migrate over to the two 1 TB hard drives. Repeat every few years.
With computers, the stupidest thing you can do is spend extra money to prepare for your needs for tomorrow. Buy for what you need now, and by the time you outgrow it, things will be cheaper, faster and larger.
By the way RAID 5 is a pain in the ass unless you have physical hotswap capability, which I highly doubt.
I really can't believe this made the front page. The questions are badly written, and the question itself could have been answered with some basic Internet research. RAID isn't an esoteric topic anymore, folks!
This place has really gone downhill. I thought Firehose was supposed to stop stuff like this, not increase it!
Anyways, just to be slightly on topic: there's no one answer to this question. It depends on your budget, your motherboard, your OS, and, most importantly, your actual redundancy needs. This kind of thing is addressed by large articles/essays, not brief comments.
Plausible conjecture should not be misrepresented as proof positive.
Hardware WILL get old, WILL die, and better stuff WILL become available. So it only makes sense to recognize this and plan for it.
Here's the way I do it (for a home storage server, not a solution for business-critical stuff):
Examine current storage needs, and forecast about two years into the future.
Build new server with reliable midrange motherboard, and a midrange RAID card. These days you could do with a $100-$300 four-port SATA card, or two.
Add four hard disks in capacities calculated to last you for two years of predicted usage, in RAID 5 mode. Don't worry about brand unless you know for a fact that a particular drive model is a lemon.
Since manufacturer's warranties are about one year, and you may have difficulty finding an unused drive of the same type for replacement, buy two more identical drives. These will be your spares in the event of a drive failure.
When the two years are up, you should be using 80 to 90 percent of your total storage.
At this point, you build an entirely new server, using whatever technology has advanced to at that time.
Transfer all your files to the new server.
Sell your entire old storage server along with any unused spare drives. A completely prebuilt hot-to-trot RAID 5 system, with new matching spare disk, only two years old, will still be very useful to someone else and you can recoup maybe 30 to 40 percent of the cost of building a new server.
Lather, rinse, repeat until storage space is irrelevant or you die.
If you are going to do this, do it right. It will cost you some up front; however, in the long run, doing it right will be cheaper. Get a real RAID card, as in hardware RAID. Get something that supports multiple volumes and at least 8 disks. I personally just got the Promise SuperTrak EX8350. Now why, you ask, do you need 8 disks? So you can upgrade, that is why. Use the 3 or 4 disks you have now in a RAID volume. In a couple of years, when bigger disks are dirt cheap, pick up four 1 TB+ disks and build a second volume on the same RAID card using the new disks. Now you can offload all the old data onto the new RAID volume and either ditch the old disks or keep them around (up to you; however, I recommend moving them to other computers or whatever so that you now have 4 empty slots on the RAID card and can rinse/repeat the whole process again in another few years...)
Again, doing it correctly up front takes care of upgrade options down the line. It also gives you room to build a monster-sized volume if you ever need that much space (8 disk array). Most of these RAID solutions are also OS independent, so if you want to dual boot, the volume would be recognized by Windows, Linux, Unix, BSD, etc., and you are also not dependent on using the exact same motherboard if your motherboard dies or needs to be upgraded (you would lose all your data if you use the built-in RAID on the motherboard and then change to a new motherboard other than the exact same model).
These better cards can also be linked together (i.e. you can always get a second card, assuming your motherboard has a slot for it, and add more disks to the array that way as well).
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
Even though it was modded funny, it's good advice: if most of your data is not something you created on your own, either directly or indirectly as a part of using the computer, it's possible to replace it from an outside source if lost. All you really need a backup of is your unique data.
This is what you do: buy 2 drives exactly the same size and mirror them. End of story.
NO! That's NOT the end of the story. You need to do what is called "scrubbing" the array periodically, because drives "silently" fail, where areas become unreadable for various reasons. Guess when one usually discovers the bad data? When one drive screeches to a halt, and you confidently slap in another and hit "rebuild". Surpriiiiiiiiise.
You can do it a variety of ways. The most harmless is probably to run a read-only bad-block test via cron, while monitoring each drive's SMART parameters long-term and having your cron job let you know if badblocks finds anything. An alternative is to instruct md to verify the array, if you're doing software raid.
You cannot, cannot, CANNOT just drop a bunch of drives into raid 5 and expect it to be peachy for the rest of time.
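To make the read-only test idea above concrete, here is a minimal sketch in Python. It is not how badblocks or md work internally; it just reads every block of a path once and records the offsets where the OS reports a read error, which is the kind of silent failure a scrub is meant to surface. The function name `read_only_scan` and the 1 MiB block size are assumptions made for illustration, and `os.pread` is only available on Unix-like systems. The demo runs against a throwaway temp file rather than a real drive.

```python
import os
import tempfile

def read_only_scan(path, block_size=1024 * 1024):
    """Read every block of `path` once; return (bytes_read, bad_offsets).

    Hypothetical helper: it only reads, never writes, and it notes the
    starting offset of any block the OS refuses to read.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # total length in bytes
        offset = 0
        while offset < size:
            try:
                chunk = os.pread(fd, min(block_size, size - offset), offset)
                bytes_read += len(chunk)
            except OSError:
                # An unreadable area: remember where it starts and move on.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return bytes_read, bad_offsets

# Demo on a throwaway file (a real scrub would point at a device or volume).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * 1024 * 1024 + 512))
    demo_path = tmp.name
scanned, bad = read_only_scan(demo_path)
os.remove(demo_path)
print(scanned, bad)
```

Run from cron, something like this would only be useful if a non-empty `bad` list actually gets reported to you, for example by mail.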
By the way, regarding controllers- skip ANYTHING made by 3ware, especially their PCI controllers. They're barely able to push 20-25MB/sec and have a couple of bad compatibility problems with certain drives. Areca units are blazing fast (especially the PCI-E cards) but priced for businesses, not home users looking for "cheap as possible."
Software raid comes in #1 for price/performance, but I strongly, strongly recommend you play around with the mdadm tool quite a bit before you put actual data on an md array. The stuff is very half-baked.
Please help metamoderate.
I did this a couple of times recently. I built a file server to supply ripped DVDs to three media centers in a house. I played around with RAID but got poor disk performance. Eventually I realized that the data is not vital information - the world won't end if you lose some movies and have to rip them again. I put four 500 GB drives in a Supermicro 8 bay server, with the OS on an internal drive.
Each drive is mapped by its UNC path, i.e., \\movieserver\movies1, so each media center has four drives mapped.
If I lose a hard drive, oh well, some of the movies won't be available until they are re-ripped from the DVDs.
Had I used RAID 5, I would have 1,500 GB and it would not have been easy to upgrade. I have run out of room and I am adding a couple of 750 GB drives.
If you use a Linux server and LVM, losing one drive loses everything.
CM www.cometenergysystems.com Blog: http://caribbeanrenewable.blogspot.com/
The advantages of RAID 0 versus RAID 1 versus RAID 5 have already been covered in detail, here, and in many books and websites.
However, allow me to address the issue of how they relate to a media center:
Firstly, when you say "media center/media server", do you mean "I just want to build myself a kickass Tivo?", or do you mean "I want to serve video for everyone in my frat house, simultaneously?"
If the former, consider that Tivos ship with 5400 RPM drives for several reasons:
1) They're cheaper than faster drives
2) They run cooler than faster drives
3) They run quieter than faster drives
4) They use less power than faster drives
5) They're more than fast enough for streaming a single video to your TV while recording another
Long story short, if you're just building a "free" Tivo with a kickass drive array, performance is *not* an issue. Keep in mind that if you're building a set-top box of sorts, the low heat and low noise features are *very* big benefits. You probably want RAID 5, and/or JBOD.
If, however, you're planning on serving video to more than a handful of stations simultaneously, you may need to consider performance. This is a vote for RAID 0 and/or RAID 10.
Now, the second axis: How important to you is this data? Really?
I've got over 300 gigs of drive space on my Tivo. Most of it is the last two weeks of television reruns (Scrubs, 6 copies of last Thursday's Daily Show, etc.), movies I recorded but won't watch, etc. There are about 10 gigs (3%) of video on there that's been saved for a few months, and frankly, I couldn't tell you a single thing on there that I'd miss if my drives went belly up tomorrow. So: do you *really* need to save all those Seinfeld reruns on a highly-redundant storage array? How *much* of the stuff on the server do you really need to keep?
Assuming it's less than 50% (in the Tivo scenario, it probably is), consider using JBOD for most of your storage, and maintaining a single backup drive, or small backup drive array. Or just backing up the good stuff to DVD.
In summary: If you're just building a Tivo, you probably don't really need the performance, or redundancy that RAID offers.
Benchmarks of hardware vs software RAID (results: mostly software > hardware raid):
http://www.chemistry.wustl.edu/~gelb/castle_raid.
http://milek.blogspot.com/2006/08/hw-raid-vs-zfs-
http://milek.blogspot.com/2006/08/hw-raid-vs-zfs-
http://milek.blogspot.com/2007/04/hw-raid-vs-zfs-
http://stoilis.blogspot.com/2005/09/linux-softwar
Benchmarks/info of Linux IP Routing (more than capable of gigabit routing):
http://hardware.slashdot.org/article.pl?sid=06/09
http://freedomhec.pbwiki.com/f/linux_ip_routers.p
http://docs.rodecker.nl/10-GE_Routing_on_Linux.pd
Of course a Linux machine isn't going to be all that much help to you if you are doing supercomputing work with 10 gigabit routing (but as we start seeing more dual quad core machines with 4 PCI Express x16 slots, this is bound to change).
So if you're not working with high-end ("giga" prefix) storage/networking for large systems, you're wasting your money on hardware appliances. Cheap hardware firewalls are a scaled down PC in a fancy box. Cheap RAID cards don't have their own ASIC offload engines. Cheap hardware routers are a joke compared to Linux PC routers.
Unless it is a 10 gigabit router with everything done in specially designed high performance ASIC chips, you will see better performance on a PC than in a hardware appliance. The same goes for hardware RAID, where we're mostly only talking about 5 gigabit read/write speeds to/from the array.
Forcing this response to the top of the page, just so visitors don't think Slashdotters don't know RAID math.
I.e. three 500 GB drives in RAID 5 don't give you 1.5 TB (RAID 0 does that). With RAID 5 you only get 1 TB, because one drive's worth of space goes to parity.
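For anyone who wants to check the arithmetic, here is a rough sketch of the usual capacity rules. The function name `usable_capacity_gb` is made up for this example, and it assumes the simple textbook behavior where every member of a striped or mirrored array is truncated to the size of the smallest drive; some implementations handle mixed sizes differently.

```python
def usable_capacity_gb(drive_sizes_gb, level):
    """Return usable capacity in GB for a list of drive sizes.

    Assumes each array member contributes only as much as the smallest
    drive (except JBOD, which simply concatenates the drives).
    """
    n = len(drive_sizes_gb)
    smallest = min(drive_sizes_gb)
    if level == "jbod":
        return sum(drive_sizes_gb)      # concatenation, no redundancy
    if level == "raid0":
        return n * smallest             # striping, no redundancy
    if level == "raid1":
        return smallest                 # every drive holds the same data
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (n - 1) * smallest       # one drive's worth goes to parity
    raise ValueError(f"unknown level: {level}")

print(usable_capacity_gb([500, 500, 500], "raid5"))     # 1000
print(usable_capacity_gb([500, 500, 500], "raid0"))     # 1500
print(usable_capacity_gb([500, 500, 1000], "raid5"))    # 1000
print(usable_capacity_gb([1000, 1000, 1000], "raid5"))  # 2000
```

The third line is the mixed-size case from the original question: swapping in one 1 TB drive buys nothing while the other two are still 500 GB. The last line is the upper bound once all three are 1 TB; whether an existing array can actually be grown to that size depends on the controller or software in use.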
--= Isn't it surprising how badly I spell ?
The main operation in RAID 5 is XOR. Modern processors have 2 cores, each of which runs at speeds >2 GHz, and have instructions which XOR 128 bits at a time. They can XOR at rates in the gigabyte/s range. This is far larger than either the actual drive performance (~40 MB/s) or the bus bandwidth available to transfer data from memory to the drives. I have done the benchmarks. The Linux software RAID speed is pretty much exactly what you would expect: you will run at the slower of the maximum bus bandwidth available or the drive speed. With small arrays, you'll be limited by drive speed. With large ones, you'll top out at PCI bus speed if using PCI, or drive speed if using something better.
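As a toy illustration of the XOR point above, the sketch below computes a parity block from two data blocks and then rebuilds one of them from the other plus the parity. It is plain byte-at-a-time Python with made-up block contents, so it shows the idea only, not the speed; real software RAID uses wide vector instructions and rotates parity across the drives.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Two data blocks, as they might sit on two drives of a 3-drive array.
d0 = b"media123"
d1 = b"server45"

# The third drive holds the parity of the other two.
parity = xor_blocks([d0, d1])

# Pretend the drive holding d1 died: rebuild it from what is left.
rebuilt_d1 = xor_blocks([d0, parity])
print(rebuilt_d1 == d1)  # True
```

The same trick works with any number of data blocks, which is why a RAID 5 array survives exactly one missing drive and no more.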
I used to run both mirroring and RAID 5 in the past (not at the same time), but I found it overly complex for simple usage, plus it doesn't cover you if the controller card fails or the system goes up in smoke. Plus, once you build a RAID you can't just add a drive to it easily or cheaply (I'm oversimplifying this, I know).
I find the best approach is to have another computer or possibly external drives sitting somewhere, and just make weekly/daily/monthly/whatever rsync copies between them. This allows you to recover from user error like accidental deletions, and if the entire system goes down you're covered. Want more space? Add a drive and presto, more space. No special configuration required. No expensive controller cards (or cheap and slow controller cards) required.
And if you're like me, you have another set of drives stored offsite... but I'm pretty paranoid about such things. =P
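For those who have not set up periodic copies like this before, here is a rough sketch of the idea in Python. It is not rsync and makes no claim to match its behavior; the function name `copy_changed_files` and the size-plus-mtime comparison are assumptions chosen to keep the example short. It copies files that are new or changed and never deletes anything from the destination, which is what lets you recover from an accidental deletion on the source.

```python
import os
import shutil
import tempfile

def copy_changed_files(src_root, dst_root):
    """Copy new or changed files from src_root to dst_root.

    Returns the relative paths that were copied. Files are treated as
    unchanged when the size matches and the copy is not older.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            s = os.stat(src)
            if os.path.exists(dst):
                d = os.stat(dst)
                if d.st_size == s.st_size and int(d.st_mtime) >= int(s.st_mtime):
                    continue  # looks unchanged, skip it
            shutil.copy2(src, dst)  # copy2 also preserves the timestamp
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied

# Demo with throwaway directories standing in for the two machines.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    with open(os.path.join(src, "movie.txt"), "w") as f:
        f.write("placeholder")
    first_run = copy_changed_files(src, dst)
    second_run = copy_changed_files(src, dst)
print(first_run, second_run)
```

The second run copies nothing because the file has not changed, which is the property that makes a nightly or weekly job cheap once the first full copy is done.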