Slashdot Mirror


RAID Vs. JBOD Vs. Standard HDDs

Ravengbc writes "I am in the process of planning and buying some hardware to build a media center/media server. While there are still quite a few things on it that I haven't decided on, such as motherboard/processor, and Windows XP vs. Linux, right now my debate is about storage. I'm wanting to have as much storage as possible, but redundancy seems to be important too." Read on for this reader's questions about the tradeoffs among straight HDDs, RAID 5, and JBOD.

At first I was thinking about just putting in a bunch of HDDs. Then I started thinking about doing a RAID array, looking at RAID 5. However, some of the stuff I was initially told about RAID 5, I am now learning is not true. Some of the limitations I'm learning about: a RAID 5 array is limited by the size of the smallest drive in it. And the way things are looking, even if I gradually replace all of the drives with larger ones, the array will still report the original size. For example, say I have 3x500GB drives in RAID 5 and over time replace all of them with 1TB drives. Instead of using the full 3TB of raw disk, it will still only use the original 1.5TB. Is this true? I also considered using JBOD simply because I can use different size HDDs and have them all appear as one large drive, but there is no redundancy with this, which has me leaning away from it. If y'all were building a system for this purpose, how many drives and what size drives would you use, and would you do some form of RAID, or what?
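The capacity rules behind this question can be sketched in a few lines of Python. This is a minimal illustration assuming the standard definitions: RAID 5 usable space is (number of drives - 1) x the smallest drive, and JBOD (linear concatenation) simply adds the drives up. The function names are made up for the example, and filesystem and RAID metadata overhead are ignored.

```python
def raid5_usable(sizes_gb):
    """RAID 5: one drive's worth of space goes to (distributed) parity,
    and each member only contributes as much as the smallest drive."""
    if len(sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (len(sizes_gb) - 1) * min(sizes_gb)


def jbod_usable(sizes_gb):
    """JBOD / linear concatenation: every byte is usable, no redundancy."""
    return sum(sizes_gb)


if __name__ == "__main__":
    print(raid5_usable([500, 500, 500]))     # 1000
    print(raid5_usable([1000, 1000, 1000]))  # 2000
    print(raid5_usable([500, 1000, 1000]))   # 1000, the 500GB drive limits it
    print(jbod_usable([500, 1000, 1000]))    # 2500, but no redundancy
```

The 3x1TB figure assumes the array has actually been grown after the last drive swap; whether and how that is possible depends on the RAID implementation.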

12 of 555 comments

  1. Do some research first? by bi_boy · · Score: 5, Informative

    Wikipedia has a very informative article regarding RAID and the various levels, in fact here it is. http://en.wikipedia.org/wiki/RAID

    --
    Chicken fried butter sticks? Do ... do you use a fork? - Black Mage, 8-Bit Theater
    1. Re:Do some research first? by Anonymous Coward · · Score: 3, Informative

      While you're there, check out how http://en.wikipedia.org/wiki/ZFS can make most of the issues other posters are pointing out irrelevant, or at least nothing to be worried about.

      While Solaris might be a dirty word among the Slashdot crowd, if all the OP needs is a way to store a bunch of files, ZFS is an excellent solution. Check out http://www.opensolaris.org/os/community/zfs/whatis/ and in particular the demos linked on the left side.

      Then, if you're still not convinced how appropriate ZFS might be for a somewhat clueless user, read about how it can save your ass from flaky hardware and data corruption: http://blogs.sun.com/elowe/entry/zfs_saves_the_day_ta

  2. Linux, raid5, LVM on top, can use extra capacity by Spirilis · · Score: 4, Informative

    With Linux you can create a RAID5 md device, say /dev/md0, then run LVM on top of that (pvcreate /dev/md0 ; vgcreate MyVgName /dev/md0) and use that to carve out your storage. The key here is to create partitions on each drive, e.g. one filling up the entire disk, and build your RAID5 from those.

    If you buy 1TB drives further down the road, here's what you do: on each new disk, create a partition identical in size to the partitions on the smaller disks, then allocate the rest of the space to a second partition.
    Join the first partition of the disk to the existing RAID set. Let it rebuild. Swap the next drive, etc. etc. Then once you've done this switcharoo to all the drives, create another raid set using the 2nd partition on your new disks--call it /dev/md1. So now you have /dev/md0, pointing to the first 500GB of each disk, and /dev/md1, pointing to the 2nd 500GB of each disk.

    Take that /dev/md1 and graft it onto your LVM volume group. (pvcreate /dev/md1 ; vgextend MyVgName /dev/md1). Now your LVM VG just doubled in size, and you can use all that new space. Whatever you do though, do NOT create any "striped" logical volumes (the "-i2" option to lvcreate; LVM's Poor Man's RAID0, basically) because you will suffer terrible performance, since you'll be striping across different volumes on the same physical spindles (a big no-no for any striped configuration). But if you use the extra space by creating new filesystems or growing existing ones, you shouldn't see any trouble.
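The arithmetic of this two-array scheme can be sketched as follows. This is a minimal illustration assuming three drives going from 500GB to 1TB, RAID5 for both /dev/md0 and /dev/md1, and no filesystem or metadata overhead; split_layout is a hypothetical helper, not part of any tool.

```python
def split_layout(old_gb, new_gb, n_drives):
    """Usable space of the md0 + md1 scheme, both sets assumed RAID5.

    md0 uses a partition equal to the old drive size on every disk;
    md1 is built from the leftover partition on each new disk.
    """
    data_members = n_drives - 1          # RAID5 spends one member on parity
    md0 = data_members * old_gb
    md1 = data_members * (new_gb - old_gb)
    return {"md0": md0, "md1": md1, "volume_group": md0 + md1}


print(split_layout(500, 1000, 3))
# {'md0': 1000, 'md1': 1000, 'volume_group': 2000}
```

The volume group ends up the same size a single RAID5 over three 1TB drives would give, which is the point of the partition trick.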

    Just be sure to partition any replacement drives you have to buy the same way. I'd recommend pulling back on the partition sizes a bit, maybe 5%, to account for any size differences between the drives you buy now and replacement drives you may purchase later, which might be slightly lower in capacity (different drive manufacturers often have differing exact capacities).
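The 5% headroom suggestion works out as below. This is a hypothetical sketch using whole gigabytes and integer arithmetic; a real setup would size partitions in sectors, and the 498GB replacement drive is an invented example.

```python
def partition_size_gb(smallest_drive_gb, margin_percent=5):
    """Size RAID partitions a little below the smallest drive so a
    replacement with slightly less real capacity still fits."""
    return smallest_drive_gb * (100 - margin_percent) // 100


def replacement_fits(replacement_gb, partition_gb):
    """A replacement works if it can hold a partition of the same size."""
    return replacement_gb >= partition_gb


part = partition_size_gb(500)
print(part)                          # 475
print(replacement_fits(498, part))   # True
```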

    --
    the real at&t mix
  3. Linux, RAID 5, md by Pandaemonium · · Score: 5, Informative

    Go RAID5. RAID5 = Hardware failure resilience + maximum storage.
    Go Linux. The Linux MD driver lets you control how you RAID: over whole disks or over partitions. There are advantages, which I'll discuss below.

    First, don't get suckered into a hardware RAID card. They are *NOT* really hardware cards: they rely on a software driver to do the RAID5 calculations on your CPU. Software RAID is JUST AS FAST. Unless you blow the big bucks on a card with a real dedicated ASIC to do the work, you're fooling yourself.
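The "RAID5 calculations" in question are mainly XOR parity, which any CPU can compute. The sketch below shows the idea on one stripe of three data blocks; real md code uses optimized routines and rotates parity across the disks, neither of which is modeled here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks: the RAID5 parity operation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data disks, plus its parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing disk 1: XOR of the surviving blocks and parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print(parity.hex())   # bb99ff99
```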

    Now, you want to go Linux. By using the md driver, you can stripe over PARTITIONS, and not the whole disk. By doing this, you can get MAXIMUM storage capacity out of your disks, even in upgrades.

    Say you have 3 500GB disks. You create a 1TB array, with one disk's worth of space going to parity. On each of these disks is a single partition the size of the whole drive. Now you want to upgrade? SURE! Add 3 more (larger) disks. On each, create a partition of EQUAL size to the original ones, and add those to the first array. Then, with the additional space, you can create a WHOLE NEW array, and now you have two separate RAID5s, each redundant, each fully using your space.
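Here is the upgrade worked through in numbers. The comment does not say how big the three new disks are, so 1TB is an assumption made only for illustration; overhead is ignored.

```python
def raid5_usable(member_gb, members):
    """RAID5 usable space: one member's worth goes to parity."""
    return (members - 1) * member_gb


old_size, new_size = 500, 1000   # new_size is an assumed value
old_disks, new_disks = 3, 3

# Array 1: the original disks plus an old-sized partition on each new disk.
array1 = raid5_usable(old_size, old_disks + new_disks)
# Array 2: a separate RAID5 over the leftover partition on each new disk.
array2 = raid5_usable(new_size - old_size, new_disks)

print(array1, array2, array1 + array2)   # 2500 1000 3500
```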

    Another advantage of MD is flexibility. In my setup, I use 5x 250GB drives right now. On each is a 245GB partition and a 5GB partition. I use RAID1 over the 5GB partitions, and RAID5 over the rest. Why? Because each drive is now independently bootable! Plus, I can run the array off two disks, upgrade the file system on the other 3, and if there's a problem, I can always revert to the original file system. So much flexibility, it's not even funny.
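The capacity and fault tolerance of this mixed layout work out as follows. This is a back-of-the-envelope sketch that takes the partition sizes from the comment and ignores md metadata.

```python
DRIVES = 5
BOOT_PART_GB, DATA_PART_GB = 5, 245   # per-drive split described above

# RAID1 across the five small partitions: every drive holds a full copy,
# so usable space is one partition and it survives all but one drive failing.
boot_usable, boot_tolerates = BOOT_PART_GB, DRIVES - 1
# RAID5 across the five large partitions: one partition's worth goes to
# parity, and the array survives a single drive failure.
data_usable, data_tolerates = (DRIVES - 1) * DATA_PART_GB, 1

print(boot_usable, data_usable)        # 5 980
print(boot_tolerates, data_tolerates)  # 4 1
```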

    I recommend using plain old SATA controllers with SATA drives, and just sticking with the MD device. For better performance, watch your motherboard selection. You could grab a server-oriented board with dedicated PCI buses for its slots, and split the drives over the cards. Or you can get a multiprocessor rig going and assign processor affinity to the IRQs: one card's interrupts go to proc 1, the other card's go to proc 0. If you have multiple buses, then performance is maximized.

    The last benefit? Portability. If your hardware suffers a failure, then your software RAID can move to any other system. Using ANY hardware RAID setup will require you to use the EXACT same card no matter what to recover data. Even the firmware will have to stay stable or else your data can be kissed goodbye.

    Windows? Forget about it.

    Good luck!

    1. Re:Linux, RAID 5, md by Pandaemonium · · Score: 4, Informative

      It'll take some reading and combining from multiple sources. I've been doing it for a few years, combined with a handful of upgrades, plus setting it up as an iSCSI backend- all of that lent to the pool of greyness in my head.

      I recommend Gentoo for this. Other distros don't include the latest mdadm tools required to manage and migrate RAID5 md devices. Ubuntu is catching up, I believe.

      Here are some places to start:

      http://gentoo-wiki.com/HOWTO_Gentoo_Install_on_Software_RAID
      http://www.gentoo.org/doc/en/gentoo-x86+raid+lvm2-quickinstall.xml
      http://linas.org/linux/Software-RAID/Software-RAID.html
      http://linas.org/linux/raid.html
      http://evms.sourceforge.net/
      http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html

    2. Re:Linux, RAID 5, md by Pandaemonium · · Score: 3, Informative

      Software RAID just as fast? Please. Next you're going to tell me a software firewall is just as good as a hardware firewall, right?


      What's rather humorous about this statement is that ultimately, all firewalls are implemented in software. What is firmware, again?

      There are three different implementations of RAID on PC class hardware- software RAID, fake-hardware RAID, and hardware RAID.
      When I said that software is just as fast, I was comparing it to fake RAID cards costing under ~$200 US. These cards rely on drivers to get their work done, and lean on the CPU just as much as software RAID. The only benefit they bring to the table is the ability to have the RAID exist in a pre-OS environment: you can boot off the RAID no matter what OS you run.

      Ultimately, the advantages (a dedicated BIOS) do not outweigh the disadvantages (extra cost, no speed increase, buggy drivers, et al.).

      Both have their applications, but let's be honest - It makes a hell of a lot of sense to add a layer of abstraction between your operating system and your disk storage. Leave the details of arranging all your 0's and 1's, stripe sizes, etc. to your RAID controller, while your operating system sees only what it needs to - a simple logical drive. (AKA virtual disk, logical volume, etc., depending on the vendor.) Add a battery to your RAID controller, and you aren't relying on your OS to keep the logical disk intact should your system be shut down uncleanly.


      The latest versions of software RAID support a snapshotting feature which makes it impossible for the array to become out-of-sync. Batteries are only required when you are caching information from the disk onto the controller for performance reasons. At this point, you're talking about a REAL hardware RAID card, which is most likely doing parity calculations on a dedicated processor. Cost is now over $200 for a *GOOD* 4-port card.

      There is a cost-benefit curve that comes into play here also. But as a previous poster mentioned, the most cost-effective way of getting the most storage for the cheapest price is to get two cheap 500GB drives attached to a hardware RAID card, and you've covered the most likely failure scenarios. Total cost is less than $1000.


      I bought an IBM xSeries chassis (two, in fact), hacked out the SCSI backplane, and added 5 250GB drives and 2 SATA controllers. Total cost: $800. I still have room for 5 more drives. I also have two processors and 3GB of RAM. Cost-effective? You betcha. Hardware RAID? Nope. And it's designed to handle the heat.

      There's no need to get fancy here - I can't help laughing when I hear horror stories from my "hardcore" computer gaming friends who have highly tedious and unnecessary media setups - RAID-10 with hot spares, 5 fans to manage all the heat, and the bi-monthly critical meltdowns associated with it - all to store movies and porn. Overkill? You tell me.


      Sounds like they may not have thought through their implementation. Cost-effective means maximizing space, maximizing life, and maximizing versatility. Cost isn't just the initial outlay; it's the life of the implementation. I'll take my 4TB array for $1700 US over anything custom. Did I mention that my quad-Xeon 700MHz can stream 1080p?

      Sure, it's expensive in electricity. But over the life of the server, I'll get more use out of it than just storage. I'll have excess processor capacity when writes are not occurring. And I have a vendor-independent implementation that can be moved to any system, any time, for any purpose, including data recovery. Using fake RAID or hardware RAID will just encumber that, and add unnecessary cost.
    3. Re:Linux, RAID 5, md by guruevi · · Score: 3, Informative

      But then again, RAID6 performs terribly compared to RAID5 (especially on writes), just as RAID5 performs terribly compared to RAID10 on the same criteria (although RAID5 can be faster on non-sequential reads).
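One common way to see the write-performance ordering is the small-random-write "penalty" rule of thumb: roughly 2 disk I/Os per write for RAID10, 4 for RAID5 (read data, read parity, write data, write parity) and 6 for RAID6. The sketch below applies that rule to a hypothetical 8-drive array at 80 IOPS per drive; it ignores caching and full-stripe writes, where parity RAID does much better.

```python
# Disk I/Os per small random write (read-modify-write for parity levels).
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def random_write_iops(level, drives, iops_per_drive):
    """Rough random-write IOPS ceiling for an array, ignoring caches."""
    return drives * iops_per_drive / WRITE_PENALTY[level]


for level in ("RAID0", "RAID10", "RAID5", "RAID6"):
    # Hypothetical array: 8 drives at 80 IOPS each.
    print(f"{level}: {random_write_iops(level, 8, 80):.0f}")
# RAID0: 640, RAID10: 320, RAID5: 160, RAID6: 107
```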

      Higher RAID levels are not always THE ultimate solution, and depending on your application you might just have to go for a non-redundant RAID level (RAID0) for large media storage, with nightly snapshots to your backup device. Usually it's not all that bad to lose a single day's worth of data; if it is, for these applications use RAID10 or similar. I do it as follows: I put media on RAID0 (HD streams are large, and fast on 10k drives), and as soon as the job is done I copy it to the storage area, which is RAID5 on cheap SATA storage. Then there is a nightly copy of the data I want to keep to an offline backup station (HW-RAID5 with ATA100).

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
  4. Re:go for RAID-5 by TuballoyThunder · · Score: 3, Informative

    I concur. You would be crazy not to have redundancy; without it, one disk failure will pull down a good chunk of your data.

    As for growing the array: from what I understand (and I have not tested this), you can grow the array if you replace all the disks (one at a time, with a resync after each, obviously). Also, as of Linux 2.6.17, you can add a disk to the RAID array and grow it that way.

    I would caution against making your array very large (either in disks or in space). Consider the case of a 3-disk RAID array where each disk has a probability of failing in any given second of 10^-10 (you would do this analysis using the reconstruction time of your array as the time window). The probability of any two drives both not failing is (1-10^-10)^2. The total number of 2-drive pairs in a 3-disk RAID is 3, so the probability of the array not failing in any given second is (1-10^-10)^6 = 0.99999999940. Over a period of five years (157,680,000 seconds), the overall probability of no two drives failing is (1-10^-10)^(6*157680000) = 0.909729. If you increase the array size to 10 disks, the overall probability of no two drives failing drops to 0.241927 (the number of 2-drive combinations is 45, so you replace the 3 with 45).
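The figures in this comment can be reproduced with the short script below. It implements the poster's simplified model exactly as stated, (1 - p)^(2 x pairs x seconds), with the illustrative per-second rate of 10^-10. Treat it as a rough heuristic for how risk grows with array size, not a rigorous reliability model.

```python
import math

P_FAIL_PER_SECOND = 1e-10            # illustrative per-disk rate from the comment
FIVE_YEARS_S = 5 * 365 * 24 * 3600   # 157,680,000 seconds


def survival(disks, p=P_FAIL_PER_SECOND, seconds=FIVE_YEARS_S):
    """The comment's model: (1 - p) ** (2 * pairs * seconds)."""
    pairs = math.comb(disks, 2)
    # log1p keeps precision for tiny p: exp(k * log(1 - p)) == (1 - p) ** k
    return math.exp(2 * pairs * seconds * math.log1p(-p))


print(round(survival(3), 6), round(survival(10), 6))   # 0.909729 0.241927
```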

  5. Re:SCRUB your arrays! by statemachine · · Score: 3, Informative

    echo check > /sys/block/md1/md/sync_action
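The same scrub can be driven from a script through the md sysfs files. A minimal sketch, assuming the array is md1 as in the command above and that the kernel exposes the usual sync_action and mismatch_cnt attributes; writing to sync_action requires root, and mismatch_cnt is only meaningful once the check has finished.

```python
from pathlib import Path


def start_scrub(md_dir="/sys/block/md1/md"):
    """Equivalent of: echo check > /sys/block/md1/md/sync_action"""
    Path(md_dir, "sync_action").write_text("check\n")


def scrub_status(md_dir="/sys/block/md1/md"):
    """Return (current sync_action, mismatch_cnt) for the array.

    sync_action reads back as e.g. "check" while running, "idle" when done.
    """
    action = Path(md_dir, "sync_action").read_text().strip()
    mismatches = int(Path(md_dir, "mismatch_cnt").read_text())
    return action, mismatches
```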

  6. Re:KISS it by Wonko+the+Sane · · Score: 5, Informative

    Had I used RAID5, I would have 1,500 GB, and it would not have been easy to upgrade. I have run out of room and am adding a couple of 750 GB drives.
    If you use a Linux server and LVM, losing one drive loses everything.
    That's why you use hardware RAID. A good card will allow you to swap out drives and rebuild, or add new drives to the array, without ever needing to unmount anything.

    3ware made some pretty good cards.
  7. Re:KISS it by sbryant · · Score: 4, Informative

    If you use a Linux server and LVM, losing one drive loses everything.
    That's why you use hardware RAID.

    Eh?

    LVM and RAID are orthogonal solutions, and don't do the same thing. LVM will let you make a single larger partition out of a number of real partitions, and before anyone says that's the same as RAID0, I should point out that RAID0 is not a real RAID level (as it has no redundancy). The circumstances for failure for LVM and RAID0 (JBOD too) are basically the same - if one part fails, you will quite possibly lose the whole lot.
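To put rough numbers on "if one part fails, you will quite possibly lose the whole lot", here is a sketch comparing a 4-drive linear/JBOD volume with a 4-drive RAID5. It assumes independent failures, a hypothetical 5% chance of each drive failing over the period, and no replacement or rebuild during that period, which is pessimistic for RAID5.

```python
def jbod_survival(p, n):
    """Linear/JBOD or RAID0: the data survives only if no drive fails."""
    return (1 - p) ** n


def raid5_survival(p, n):
    """RAID5: survives zero failures or exactly one (no rebuild modelled)."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


# Hypothetical: 4 drives, each with a 5% chance of failing over the period.
print(round(jbod_survival(0.05, 4), 4), round(raid5_survival(0.05, 4), 4))
# 0.8145 0.986
```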

    As for hardware RAID, I would not necessarily recommend that either, as it moves the single point of failure without resolving the problem. Replacing a broken controller with something compatible some years down the road can prove impossible, especially with onboard controllers. There's also the fact that a number of RAID controller cards are buggy, and others do most of the work in software drivers anyway! Performance is also no longer a reason to use a pure hardware RAID solution, especially now that multi-core machines are available cheaply.

    Hot-swap is still something that requires a good hardware solution, but that's about it. Good (and well supported) RAID products cost good money too, and for most of us it's just not worth doing; better to use software RAID, buy more RAM, and pocket the rest.

    -- Steve

  8. Re:KISS it by redcane · · Score: 4, Informative

    I think the comment about CPU performance was more about the fact that with faster CPUs, the speed benefit of a hardware RAID solution isn't as useful. I checked the raid6 personality on my 1GHz Celeron, and it reports that it can calculate RAID parity at a throughput of 985MB/s using the SSE parity calculation routines. That's more than you can use for any real file serving (the data has to go out over the network, and that rate would saturate gigabit Ethernet). The I/O performance advantage of a hardware controller doesn't seem very useful here.

    I'm also not sure why software RAID can't benefit from the multi-channel read performance available with RAID 1, 5, etc. Why can't the software issue a read command to two drives simultaneously?

    The comment about a buggy RAID controller obviously wasn't talking about a new controller, but one that has eventually failed. I imagine they generally become obsolete before they fail, but even so, a failed controller really sucks if you need to figure out how to get the RAID stack running on another controller. I had a quick look into it at one stage, and it broke my brain trying to work out which controllers worked with which...
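A quick check of the network comparison: gigabit Ethernet carries at most 1000/8 = 125 MB/s before protocol overhead, so the 985MB/s parity figure from this comment is several links' worth. This is simple arithmetic; the 985MB/s number is the poster's reported benchmark, not a measured value here.

```python
GIGABIT_MB_PER_S = 1000 / 8   # 1 Gbit/s = 125 MB/s, before protocol overhead
parity_mb_per_s = 985         # raid6 parity throughput reported in the comment

print(parity_mb_per_s / GIGABIT_MB_PER_S)   # 7.88
```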