RAID Vs. JBOD Vs. Standard HDDs
Ravengbc writes "I am in the process of planning and buying some hardware to build a media center/media server. While there are still quite a few things on it that I haven't decided on, such as motherboard/processor, and windows XP vs. Linux, right now my debate is about storage. I'm wanting to have as much storage as possible, but redundancy seems to be important too." Read on for this reader's questions about the tradeoffs among straight HDDs, RAID 5, and JBOD.
At first I was thinking about just putting in a bunch of HDDs. Then I started thinking about doing a RAID array, looking at RAID 5. However, some of the stuff I was initially told about RAID 5, I am now learning is not true. Some of the limitations I'm learning about: RAID 5 drives are limited to the size of the smallest drive in the array. And the way things are looking, even if I gradually replace all of the drives with larger ones, the array will still read the original size. For example, say I have 3x500gb drives in RAID 5 and over time replace all of them with 1TB drives. Instead of reading one big 3tb drive, it will still read 1.5tb. Is this true? I also considered using JBOD simply because I can use different size HDDs and have them all appear to be one large one, but there is no redundancy with this, which has me leaning away from it. If y'all were building a system for this purpose, how many drives and what size drives would you use and would you do some form of RAID, or what?
Nothing can possibly go wrong. Especially if you use, like, 10 disks.
I would go RAID 5. But, let's face it, you're gonna have to bite the bullet on this one...either get the bigger disks you want now, or plan on rebuilding the array down the road (and losing all your data, unless you have another mass storage device that can hold it).
Wikipedia has a very informative article regarding RAID and the various levels, in fact here it is. http://en.wikipedia.org/wiki/RAID
Chicken fried butter sticks? Do
That said, RAID is not a replacement for proper backup. RAID is just a first line of defense to avoid downtime.
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner
You can just download them again, right?
This issue is a bit more complicated than you think.
I know there was an article in Linux Journal on how to build a 1 TB storage server out of JBOD and a normal PC chassis. I'd look back and do that -- with multiple sets of discs as backup. I can't remember the issue, I believe it may have been Oct 2005. good luck -- http://30days.itious.com/
21st-Century-Citizen
Does anyone know... once you set up RAID, what happens when a drive fails? Does something in the hard drive pop up and tell you? What if it is a Linux server in a closet and you'd rather the server sent you an email?
Design for what you want to use today and in the near future, don't design for a few years from now, you'll never get it built.
That being said, mirroring might be the easiest solution to upgrade, but you'll sacrifice speed and space.
If you want speed and redundancy, you'll have to go with something like RAID 5 or RAID 10 and just have a painful upgrade in the future.
I'm running a few arrays, all over 1TB. Largest is 8 drives in a RAID 6 config. Everything uses software RAID. Be sure to use LVM, so that you can snapshot your drives. Once you're properly RAIDed, you're more likely to lose your data by an accidental file deletion than by unfixable hardware failure.
If you have 3x500GB disks in RAID5, you only have 1TB of usable space, as one drive is used as parity (and therefore not for effective data storage). If you replace the disks with larger ones, the array is not increased in size if you replace each disk one at a time and let the array rebuild itself. However, you can just plug in your new drives (if you have enough ports), create a new array, and then copy data across to the new array. Alternatively, if you are using software RAID, as you increase the size of the drives, you can create extra partitions on the drives and RAID these. e.g. 3x500GB drives in RAID5, changed to 3x1TB drives, each with 2x500GB partitions = 2x1TB RAID5 arrays. This is not recommended however!!
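For anyone who wants to check the arithmetic, here is a rough Python sketch of the rule described above: every member is treated as the size of the smallest disk, and one member's worth of space goes to parity. The function name `raid5_usable_gb` is made up for illustration. The last figure is only what the disks could provide; whether the array actually grows to it depends on the implementation.

```python
def raid5_usable_gb(disk_sizes_gb):
    """Usable capacity of a RAID 5 set.

    Every member is truncated to the smallest disk, and one member's
    worth of space is consumed by parity.
    """
    if len(disk_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)


# 3x500GB: 1TB usable, not 1.5TB
print(raid5_usable_gb([500, 500, 500]))     # 1000

# Two disks swapped for 1TB models: still capped by the remaining 500GB disk
print(raid5_usable_gb([1000, 1000, 500]))   # 1000

# All three swapped: 2TB is available to a new or grown array
print(raid5_usable_gb([1000, 1000, 1000]))  # 2000
```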
Personally, I would just buy a Buffalo TeraStation or Netgear StorageStation and let that do the hard work. Just plug it into your network, then share the data. Just have a single 500GB drive on your media centre for recording TV, and then anything you want to keep just copy over to your NAS box.
It all depends on how the RAID is implemented. Most inexpensive controllers require a rebuild when you change sizes. It is not a big deal. I would never implement anything important on JBOD; the chance of failure is too large. I have replaced too many disks. Do RAID5 or RAID1. Over 99% of my disk is RAID5 and I manage just over 500TB.
Buy a Drobo.
Just get a couple of these. They're cheap enough that you can use one for live storage and one as a backup.
RAID 5 drives are limited to the size of the smallest drive in the array.
Yes... Duh....
And the way things are looking, even if I gradually replace all of the drives with larger ones, the array will still read the original size. For example, say I have 3x500gb drives in RAID 5 and over time replace all of them with 1TB drives. Instead of reading one big 3tb drive, it will still read 1.5tb. Is this true?
Yes... Fucking duh.... Have you even read the RAID 5 Wiki article?
I also considered using JBOD simply because I can use different size HDDs and have them all appear to be one large one, but there is no redundancy with this, which has me leaning away from it. If y'all were building a system for this purpose, how many drives and what size drives would you use and would you do some form of RAID, or what?
We've been through this a million times before and the answer is always the same. You're a cheap bastard who wants gobs of space with an acceptable amount of redundancy but aren't willing to buy two sets of drives. Buy 4 of the biggest drives you can afford and RAID 5 them. Don't expect stellar write speeds. You won't have a backup if something happens and all 4 drives blow but you'll at least have protection when one drive gives up the ghost which is mainly what most people want to protect against.
Why does stupid shit like this keep getting posted to the front page?
This is what you do: buy 2 drives exactly the same size and mirror them. End of story. If you're worried about a blown raid controller, then buy another hard drive and stick that on another computer and run a weekly cron job to copy everything. Right now you can get 500 GB hard drive for about $150. Get two of them and mirror them. (If you need more than 500 GB I would highly suggest encoding your porn into a different format than MPEG2) By the time you run out of space, you will be able to get 1 TB drives for about $150. Migrate over to the 2 1 TB hard drives. Repeat every few years.
With computers, the stupidest thing you can do is spend extra money to prepare for your needs for tomorrow. Buy for what you need now, and by the time you outgrow it, things will be cheaper, faster and larger.
By the way RAID 5 is a pain in the ass unless you have physical hotswap capability, which I highly doubt.
Out of all the details you're still working on, you decided to ask Slashdotters about storage?
Why not the "windows XP vs. Linux" bit? Do you want 100 responses or 1000?
Media Server: n. A euphemism for digital porn storage.
With Linux you can create a RAID5 md device, say /dev/md0, then run LVM on top of that (pvcreate /dev/md0 ; vgcreate MyVgName /dev/md0) and use that to carve out your storage. The key here is to create partitions on each drive, eg filling up the entire disk, and create your raid5 with those.
If you buy 1TB drives further down the road, here's what you do- With each disk, create a partition identical in size to the partitions on the smaller disks, then allocate the rest of the space to a second partition.
Join the first partition of the disk to the existing RAID set. Let it rebuild. Swap the next drive, etc. etc. Then once you've done this switcharoo to all the drives, create another raid set using the 2nd partition on your new disks--call it /dev/md1. So now you have /dev/md0, pointing to the first 500GB of each disk, and /dev/md1, pointing to the 2nd 500GB of each disk.
Take that /dev/md1 and graft it onto your LVM volume group. (pvcreate /dev/md1 ; vgextend MyVgName /dev/md1). Now your LVM VG just doubled in size, and you can use all that new space. Whatever you do though, do NOT create any "striped" logical volumes (the "-i2" option to lvcreate; LVM's Poor Man's RAID0, basically) because you will suffer terrible performance, since you'll be striping across different volumes on the same physical spindles (a big no-no for any striped configuration). But if you use the extra space by creating new filesystems or growing existing ones, you shouldn't see any trouble.
Just be sure that any replacement drives you have to buy... you must partition them out similarly. I'd recommend pulling back on the partition sizes a bit, maybe 5%, to account for any size differences between the drives you bought right now and some replacement drives you may purchase later on which might be slightly lower in capacity (different drive manufacturers often have differing exact capacities).
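To put numbers on this scheme, here is a small Python sketch. It assumes three disks, the suggested 5% pull-back applied to both the 500GB and the 1TB drives, and whole-GB partition sizes. The function names are hypothetical and the figures are only as good as those assumptions.

```python
def partition_size_gb(nominal_gb, headroom_percent=5):
    """Partition size after pulling back a few percent from the nominal
    disk size, so a slightly smaller replacement drive still fits."""
    return nominal_gb * (100 - headroom_percent) // 100


def raid5_usable_gb(members, partition_gb):
    """RAID 5 over equal partitions: one member's worth goes to parity."""
    return (members - 1) * partition_gb


disks = 3

# Original 500GB drives: one partition per disk, joined into /dev/md0
first = partition_size_gb(500)                # 475
md0 = raid5_usable_gb(disks, first)           # 950

# After swapping in 1TB drives: same first partition, the rest becomes /dev/md1
second = partition_size_gb(1000) - first      # 475
md1 = raid5_usable_gb(disks, second)          # 950

vg_before = md0          # volume group with md0 only
vg_after = md0 + md1     # after extending the volume group with md1
print(vg_before, vg_after)  # 950 1900
```

With equal headroom on both generations of drives the volume group doubles, which matches the claim above.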
the real at&t mix
It depends on the implementation (and possibly the raid level). Some raid cards will let you expand the container after you've replaced all of the drives with new ones of a larger size. Then you have to expand the partition, or put another partition into the new space. I've done this with Compaq hardware running Win2k in a RAID 1 (mirrored pair).
The "Raid 5 can't do what I heard" isn't quite what's going on, again, depending on the implementation. Most raid cards I've used allow you to add drives to the array and expand the array to the new drive(s) without downing the server or requiring a rebuild.
So RTFM for the card you're going to use.
Go RAID5. RAID5 = Hardware failure resilience + maximum storage.
Go Linux. The Linux MD driver allows you to control how you RAID: over whole disks or over partitions. There are advantages, which we will discuss.
First, don't get suckered into a hardware RAID card. They are *NOT* really a hardware card- they rely on a software driver to do calculations on your CPU for RAID5 ops. Software RAID is JUST AS FAST. Unless you blow the big bucks for a card with a real dedicated ASIC to do the work, you're fooling yourself.
Now, you want to go Linux. By using the md driver, you can stripe over PARTITIONS, and not the whole disk. By doing this, you can get MAXIMUM storage capacity out of your disks, even in upgrades.
Say you have 3 500GB disks. You create a 1TB array, with 1 disk as parity. On each of these disks is a single partition, each the size of the drive. Now, you want to upgrade? SURE! Add 3 more disks. Create three partitions of EQUAL size to the original, and tack it on to the first array. Then, with the additional space, you can create a WHOLE NEW array, and now you have two separate RAID5's, each redundant, each fully using your space.
Another advantage with MD is flexibility. In my setup, I use 5x 250 drives right now. On each is a 245GB partition, and a 5GB partition. I use RAID1 over the 5's, and RAID5 over the rest. Why? Because each drive is now independently bootable! Plus, I can run the array off two disks, upgrade the file system on the other 3, and if there's a problem, I can always revert to the original file system. So much flexibility, it's not even funny.
I recommend using plain old SATA, in conjunction with SATA drives, and just stick with the MD device. For increased performance, watch your motherboard selection. You could grab a server oriented board, with dedicated PCI buses for slots, and split the drives over the cards. Or, you can get a multiproc rig going, and assign processor affinity to the IRQ's- one card calls proc 1 for interrupts, the other card calls proc 0. If you have multiple buses, then performance is maximized.
The last benefit? Portability. If your hardware suffers a failure, then your software RAID can move to any other system. Using ANY hardware RAID setup will require you to use the EXACT same card no matter what to recover data. Even the firmware will have to stay stable or else your data can be kissed goodbye.
Windows? Forget about it.
Good luck!
I really can't believe this made the front page. The questions are badly written, and the question itself could have been answered with some basic Internet research. RAID isn't an esoteric topic anymore, folks!
This place has really gone downhill. I thought Firehose was supposed to stop stuff like this, not increase it!
Anyways, just to be slightly on topic: there's no one answer to this question. It depends on your budget, your motherboard, your OS, and, most importantly, your actual redundancy needs. This kind of thing is addressed by large articles/essays, not brief comments.
Plausible conjecture should not be misrepresented as proof positive.
you write that if you have 3 500G disks in a RAID 5 that you will have a 1.5T, etc. Don't you realize that (N x C) - C = Total ? i.e. (3 x 500) - 500 = 1000 or 1 terabyte. That's only the first problem with your logic...
I have a lot of data (500 GB of music/movies/pictures/wallpapers/audiobooks/ebooks ; filled to the last GB)
I'm a student and I do not have the money for redundant storage.
I rsync my documents and pictures over the two drives and burn my favourite movies to DVD. I use ffmpeg to turn DVDs into Xvids and oggenc to turn flacs into ogg q5s.
If I lose one of the harddrives; that's life.
So for those who do not have the luxury that the poster has; make sure that you backup what is really important and risk what you can do without.
If you mod this up, your slashdot background will turn into a beautiful sunset!
This thing is very cool: Drobo, from Data Robotics. Check out the demo! http://www.drobo.com/products_demo.aspx
> either get the bigger disks you want now, or plan on rebuilding the array down the road
Not at all, these days one does have better options than rebuilding a blank array. Read up on LVM, it is powerful stuff.
Replace the drives in the array one at a time, allowing time for the array to rebuild. Then you can grow the volume to make use of the extra capacity. Yes it will require some planning and will probably take a week to slowly merge in the new set of drives, but it sure beats a bare metal restore because you can still be recording and watching video while all this rebuilding and resizing is happening.
Don't really know how much of the above applies to Windows, haven't seriously used it in a decade; so someone else will have to supply details on its volume management flexibility.
Democrat delenda est
You could always purchase a NAS from Infrant. A diskless version retails for around $640; and they have a proprietary raid level called X-RAID. It is basically a RAID 5 array, but allows for expansion using larger drives. The standard rule applies, each individual drive will be limited to the size of the smallest drive, but you can hotswap one drive at a time, allow the drive to be rebuilt, and repeat the process for all four drives. Once the final one is done, it will auto expand to your new capacity. Pretty futureproof.
I went through all this.. Read all about rolling my own RAID, using Linux, using Windows, etc. In the end I opted for the ReadyNAS NV from Infrant.
I bought the ReadyNAS NV without drives for about $700. I put in 4 500GB drives last year and now I've got 1.5TB of RAID 5 storage. It works great for my needs (media storage of videos, music, pictures streaming through XBMC).
Raid 0 won't protect you, man!
Agonizing over the ability to incrementally upgrade an array is a sure sign you have cost at the very top of your list of concerns, with everything else far below. Learn about software RAID. At the throughput levels you're planning for (3 disks?) hardware RAID is a waste; contemporary CPUs can cope with all the parity calculations involved with negligible effort. Save money on proprietary hardware/licenses with Linux+LVM+MD and use the cash to upgrade the drives simultaneously. Or become a guru and figure out how to layer LVs and MDs to use capacity incrementally; the only cost is your time and spare stomach tissue.
If I had to manage fault tolerant storage with mis-matched physical disks and no budget I'd be looking at ZFS. There are other ways of doing it but the ZFS model is so simple and obvious that it has a high probability of actually working in the real world. Right up until it gets corrupted and you learn there is no ZFS fsck...
Lurking at the bottom of the gravity well, getting old
Hardware WILL get old, WILL die, and better stuff WILL become available. So it only makes sense to recognize this and plan for it.
Here's the way I do it (for a home storage server, not a solution for business-critical stuff):
Examine current storage needs, and forecast about two years into the future.
Build new server with reliable midrange motherboard, and a midrange RAID card. These days you could do with a $100-$300 four-port SATA card, or two.
Add four hard disks in capacities calculated to last you for two years of predicted usage, in RAID 5 mode. Don't worry about brand unless you know for a fact that a particular drive model is a lemon.
Since manufacturer's warranties are about one year, and you may have difficulty finding an unused drive of the same type for replacement, buy two more identical drives. These will be your spares in the event of a drive failure.
When the two years are up, you should be using 80 to 90 percent of your total storage.
At this point, you build an entirely new server, using whatever technology has advanced to at that time.
Transfer all your files to the new server.
Sell your entire old storage server along with any unused spare drives. A completely prebuilt hot-to-trot RAID 5 system, with new matching spare disk, only two years old, will still be very useful to someone else and you can recoup maybe 30 to 40 percent of the cost of building a new server.
Lather, rinse, repeat until storage space is irrelevant or you die.
When did slashdot become a substitute for usenet/google/wiki or (gawd forbid) a fucking manual? Why do editors feel inclined to post the drivel of every clueless newbie who needs handholding, while rejecting important/interesting news stories?
As to the poster's question: read the fucking manual, kid.
___
If you think big enough, you'll never have to do it.
No, really. It's all about cost. Even with hardware accelerated RAID, you can expect a steep performance hit. If you're going for a massive data repository I'd suggest several RAID 1+0 setups in hardware with a decent volume manager & file system (not NTFS).
2x500GB drives in a RAID 1 (for peace of mind). Then double that in a RAID 0 stripe (for speed). That's 4 drives per TB. Then use a decent file system, like ZFS, to chain your RAID 1+0 clusters into a single volume 1TB at a time.
Whatever you choose to aggregate your storage, I don't think you'll be able to get away from mirroring every drive... unless you go the budget RAID 5 route. I'd suggest no redundancy in that case.
Some RAID controllers allow you to enlarge a RAID5 array. If the OS also allows you to enlarge the partitioning, then you are set. I think currently both are possible under Linux.
However, the better approach would be to recreate the array on disk upgrades. After all for any kind of reliability, you need backup anyways. RAID is not a replacement for backup!
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
It would be nice to know just how much data you are trying to store. If this is going to be a whole bunch of mp3s, then you might look into a Raid 1 array of that new 1TB drive from hitachi.
At 1TB, it is still gonna be pretty hard to fill this with DIVX encoded movies. I guess though, if you need more space, do a 0+1. Meaning a redundant array of a data-striped set.
If you are talking about some sort of seriously whacked out array of like some Blu-Rays or HDDVDs or some crazy thing like that....then i would....uhhh....probably just start praying.
Honestly, the best set up (once again depending upon your intentions) is probably going to be a linux box running:
mt-daapd (for streaming to itunes)
mpd (the media player daemon, for hooking the box into a stereo)
slimserver (to stream through a web interface to any machine that can reach it on the network).
Samba (for sharing the music to windows clients)
vsftpd (for sharing the music to everybody else)
slap a couple of those 1TB drives in there with some Raid 1 for redundancy....and i think you should be in VERY good shape.
OH OH OH...put it into one of those ultra-sexy HTPC boxes for added win factor.
NewslilySocial News. No lolcats allowed.
Using mdadm and linux you can grow a RAID 5 if you replaced all the disks one by one, this has been possible for a while. Recent kernels have made it possible to expand the RAID 5 sets by adding more drives. So you can basically grow as you need. Some guys have even migrated from a JBOD to RAID 5 using just one extra disk by creating an array from two drives but marking one as missing, I'm not recommending this unless you have backups :) (and since you have your backups it will be quicker to just create the whole thing right away than to go through reshaping for each disk).
I'm a big fan of RAID 1. It's 100% wasteful, but I can relate to that. Advantages of RAID 1 include:
- Simplicity: RAID card broken? Fine, just shove one of the drives onto a non-RAID interface and you're off. Simple setup also means low overheads, which leads to...
- Speed: Faster than RAID 5 because the controller isn't doing anything clever. If you want faster, go 0+1.
- Robustness: The temptation with RAID 5 is to have one massive partition across loads of drives. That's great, until you accidentally format it or something. Don't forget to back up, but splitting your storage into smaller arrays would be safer.
If you're seeking 1.5TB, you could have two 750GB RAID 1 arrays using four 750GB Seagate Barracuda 7200.10s.
You've left off which method you plan to use for your RAID (whichever implementation)... If it's hardware, the suggestions mentioning things like LVM are irrelevant, it only matters if the RAID controller supports dynamic re-partitioning (if it doesn't, then who cares what the OS supports, the OS can't use what the controller doesn't say is there).
Also, since you mentioned that you haven't chosen an OS, I believe MS will be releasing Windows Home Server this fall. It's based on the 2003 Server system, so it's well proven and has no problem with drivers or any of the issues Vista is currently having. Also since it's built on top of 2003, there's already lots of industry support out there. The UI they've grafted onto it is very friendly, and the backup system that it has is awesome (incl full-disk restore using only a boot CD!) It's actually being designed to fulfill more or less exactly the role you seem to be seeking a solution for.
I know I'll get blasted for having suggested an MS sol'n on /., but since you said you hadn't decided, I just wanted to make you aware of another option that you might not have considered.
Anyway, for your system, I'd make a Software RAID1 partition for your OS (whichever it is) and install a Hardware RAID5 solution for my data. Since it's hardware RAID5, you can break it up however you like, and still have redundancy AND a minimal loss of space. You could consider RAID6 for increased safety, but I haven't seen a hardware RAID6 controller out there anywhere yet... (RAID 6 is like RAID 5, but has 2 parity drives, thus enabling up to two drives in the array to fail whilst remaining operational).
-AC
I started to put a 2-drive RAID 1 setup in my MythTV HD server. I eventually bagged it and went with a single SATA disk.
Here's why:
So I figured I didn't HAVE to have the array, realized the machine would run lots cooler and lots cheaper, and when it finally went down, BFD, I'd buy an even bigger disk for less money, install it without a hassle and copy my MP3s back onto it.
N.B. - I've been on a 20+ hour bridge-line call today as our data center guys try to figure how to rebuild an enterprise disk array. Are you SURE you want to go with a RAID?
You can put RAID 5 on varying size disks.
I had 4 300GB drives, and 2 200GB drives.
I broke them up into 100GB partitions, and laid out the RAID arrays:
A1 = [D1P1 D2P1 D3P1 D5P1]
A2 = [D1P2 D2P2 D4P1 D6P1]
A3 = [D1P3 D3P2 D4P2 D5P2]
A4 = [D2P3 D3P3 D4P3 D6P2]
Then I concatenated the arrays together, giving a little less than 1.2 TB of space from 1.6 TB of drives; if I had just RAID'd the 4 300 gig drives, and mirrored the 200's I would have only had 1.1 TB available, and the drive accesses would be imbalanced.
I could also grow the array, since it was built as concatenated, so later when I got 4 400GB drives I raided them then tacked them on for 2.4 TB total.
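Here is a rough Python check of that layout, for anyone who wants to try the same trick with their own mix of disks. It takes disks 1 to 4 as the 300GB drives and 5 and 6 as the 200GB drives, and takes A4's last member to be D6's second partition (a partition can only belong to one array, and a 200GB disk holds two 100GB partitions). The helper names are invented for this sketch.

```python
from collections import Counter

PARTITION_GB = 100
DISK_GB = {"D1": 300, "D2": 300, "D3": 300, "D4": 300, "D5": 200, "D6": 200}

ARRAYS = {
    "A1": ["D1P1", "D2P1", "D3P1", "D5P1"],
    "A2": ["D1P2", "D2P2", "D4P1", "D6P1"],
    "A3": ["D1P3", "D3P2", "D4P2", "D5P2"],
    "A4": ["D2P3", "D3P3", "D4P3", "D6P2"],
}


def disk_of(partition):
    """'D3P2' -> 'D3'."""
    return partition.split("P")[0]


def usable_gb(arrays, disk_gb, partition_gb):
    """Validate the layout, then return the total RAID 5 capacity."""
    seen = Counter()
    for name, members in arrays.items():
        disks = [disk_of(p) for p in members]
        # Two members on one disk would defeat the redundancy.
        if len(set(disks)) != len(disks):
            raise ValueError(f"{name} uses the same disk twice")
        seen.update(members)
    reused = [p for p, count in seen.items() if count > 1]
    if reused:
        raise ValueError(f"partitions used in more than one array: {reused}")
    per_disk = Counter(disk_of(p) for p in seen)
    for disk, count in per_disk.items():
        if count > disk_gb[disk] // partition_gb:
            raise ValueError(f"{disk} does not have {count} partitions")
    # Each RAID 5 array loses one member's worth of space to parity.
    return sum((len(members) - 1) * partition_gb for members in arrays.values())


print(sum(DISK_GB.values()))                    # 1600 GB raw
print(usable_gb(ARRAYS, DISK_GB, PARTITION_GB))  # 1200 GB usable
# Alternative from the post: RAID 5 over the four 300s plus a mirror of the 200s
print((4 - 1) * 300 + 200)                      # 1100 GB usable
```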
If you are going to do this, do it right. It will cost you some up front, however, in the long run, doing it right will be cheaper. Get a real raid card, as in hardware RAID. Get something that supports multiple volumes and at least 8 disks. I personally just got the Promise SuperTrak EX8350. Now, why, you ask, do you need 8 disks? So you can upgrade, that is why. Use your current 3 or 4 disks you have now in a raid volume. In a couple years when bigger disks are dirt cheap, pick up 4 1TB+ size disks and build a second volume on the RAID array using the new disks. Now you can offload all the old data onto the new RAID volume and either ditch the old disks or keep them around (up to you, however, I recommend ditching to other computers or whatever so that you now have 4 empty slots on the RAID card so that you can rinse/repeat the whole process again in another few years...)
Again, doing it correctly up front takes care of upgrade options down the line. It also gives you room to do a monster sized volume if you ever need that much space (8 disk array). Most of these RAID solutions are also OS independent, so if you want dual boot, the volume would be recognized by Windows, Linux, Unix, BSD, etc., and you are also not dependent on using the exact same motherboard if your motherboard dies or you want to upgrade it (you would lose all your data if you use the built in RAID on the motherboard when changing to a new motherboard other than the exact same model).
These better cards also can be linked together (i.e. you always get a second card assuming your motherboard has a slot for it, and add more disks to the array that way as well).
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
If it is a Linux server, you're already using mdadm, which has a monitoring daemon with e-mail notification.
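mdadm's own monitor mode is the tool to use for this. Purely to illustrate what "degraded" looks like to the system, here is a rough Python sketch that scans /proc/mdstat-style text for arrays with a missing member (an underscore in the `[UU_]` status field). The sample text is made up for the example, and `degraded_arrays` is a hypothetical helper, not part of mdadm.

```python
import re

# Illustrative sample only; on a real box you would read /proc/mdstat instead.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
      976767488 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sde1[1] sdd1[0](F)
      488383936 blocks [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status field shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status field looks like [UUU] when healthy, [_U] when degraded.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # ['md1']
```

A cron job could run something like this against the real /proc/mdstat and mail the result, but the monitoring daemon already does that job properly.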
Get a small box, install OpenSolaris on it, configure your JBOD as either raidz or raidz2, and configure either iSCSI or Samba to share the files over a gig link.
Your data should be perfectly safe; with raidz2 you can lose up to 2 drives without data loss.
Who is general failure, and why is he reading my hard drive?
that is all.
Has anyone tried using Drobo? I'm contemplating upgrading my own storage, and they seem to have the least painful solution (as opposed to managing RAID and dealing with disks of different sizes, recovery, etc.), but I have no idea whether their product actually works.
>|<*:=
I considered this for quite some time before putting together my 2.2TB Myth box two years ago, and I went against the grain and went with JBOD. What takes up the most disk space on my server is images of my DVDs, for which I have the originals, so I don't need the redundancy. For the recorded broadcasts, if I lose them, I lose them. I have a small RAID-1 pair of disks for my music and such that I don't want to lose. I could have shrunk my disk space and gone for a RAID cluster (lord knows I have the disks), but the redundancy was overkill for the application.
RAID 0 will offer you the best performance but no redundancy of data -- so no fault tolerance. You lose a disk and you're dead in the water. RAID 5, on the other hand, offers a parity array, so while it stores parity information for rebuilding data, it doesn't store redundant data. Therefore, you get the most for your money. If you're really concerned about data protection, you can also go with RAID 6, which includes a second parity array, for protection against dual disk failure (something I believe StorageTek originally came up with). Again, it makes the most of your storage capacity by only saving parity data (striped across multiple disks) and not duplicate copies of your original data blocks.
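A quick way to see that trade-off side by side is the Python sketch below, which assumes equal-size disks. `raid_summary` is a name invented for this example, and the second number in each result is the count of disk failures the level is guaranteed to survive.

```python
def raid_summary(level, disks, size_gb):
    """Return (usable_gb, disk failures guaranteed survivable) for equal disks."""
    minimum = {"RAID 0": 2, "RAID 5": 3, "RAID 6": 4}
    parity = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}
    if level not in parity:
        raise ValueError(f"unsupported level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} disks")
    # Parity levels give up one (RAID 5) or two (RAID 6) disks' worth of space.
    return (disks - parity[level]) * size_gb, parity[level]


for level in ("RAID 0", "RAID 5", "RAID 6"):
    usable, failures = raid_summary(level, 4, 500)
    print(f"{level}: {usable} GB usable, survives {failures} failed disk(s)")
# RAID 0: 2000 GB usable, survives 0 failed disk(s)
# RAID 5: 1500 GB usable, survives 1 failed disk(s)
# RAID 6: 1000 GB usable, survives 2 failed disk(s)
```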
i have just rebuilt my setup and am having good luck with it. this might be an option for you.
i had this same kind of issue a few years back and started with a 'hardware' raid card (aka bootable software card). when the raid card failed, i replaced it with an exact duplicate but could not recover my array as the firmware was not identical and the chipset revision had changed! in short i lost all my media that had not hit DVD! next i went software raid on linux with mythtv and that worked quite well, though expanding the raid became difficult and multiple logical drives did not suit me well.
so i went with 2 boxes! one is on an old BP6 dual celery500@600mhz with just 256mb ram running solaris for ZFS. i had read about ZFS a bit and had some old 40GB drives lying around so i built it and did a z-raid on the 3 drives and ZFS rocks! so easy to add drives and i can mix and match sizes and pull out old drives when they arent worth having anymore! im not running opensolaris as ZFS wasnt ready on that platform yet. so now i have 2x120gb, 4x200gb and 1x320gb in the array/ZFS set. i export the drives on gigabit(pci33mhz so im really at about 1/2gigabit cause the pci bus sucks!) and i have better drive performance on my mythtv box via the network than i do with the 4x100GB drives in the mythbox. i am using samba on solaris as NFS was slower and i have not played with iscsi on solaris enough though i plan to implement it.
solaris and ZFS suit this purpose very well and i dont think i would go back to running a software raid on linux if i can run free solaris and ZFS!
i am not using the inbuilt disk compression as the media files are already compressed and the celeron processors i think would be limiting.
i can also access the ZFS samba resource from my windows xp machine and watch the media from there, though i had a bit of trouble finding the right codec to play the mythtv format.
This is what you do: buy 2 drives exactly the same size and mirror them. End of story.
NO! That's NOT the end of the story. You need to do what is called "scrubbing" the array periodically, because drives "silently" fail, where areas become unreadable for various reasons. Guess when one usually discovers the bad data? When one drive screeches to a halt, and you confidently slap in another and hit "rebuild". Surpriiiiiiiiise.
You can do it a variety of ways. The most harmless is probably to run a read-only bad-block test via cron, while monitoring each drive's SMART parameters long-term and having your cron job let you know if badblocks finds anything. An alternative is to instruct md to verify the array, if you're doing software raid.
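For the md route, the kernel exposes a per-array sync_action file in sysfs; writing "check" to it starts a read-only verify pass, and mismatch_cnt reports what the pass found. A minimal sketch, assuming an array called md0 and root privileges. The function names and the sysfs_root parameter are made up for illustration:

```python
from pathlib import Path


def start_md_check(array="md0", sysfs_root="/sys/block"):
    """Ask md to start a read-only verify ("check") pass on one array."""
    action = Path(sysfs_root) / array / "md" / "sync_action"
    action.write_text("check\n")


def md_mismatch_count(array="md0", sysfs_root="/sys/block"):
    """Return the mismatch count md recorded during the last check."""
    counter = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())
```

Something like start_md_check() could run from a monthly cron job, with a later job reading md_mismatch_count() and mailing you if it is not zero.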
You cannot, cannot, CANNOT just drop a bunch of drives into raid 5 and expect it to be peachy for the rest of time.
By the way, regarding controllers- skip ANYTHING made by 3ware, especially their PCI controllers. They're barely able to push 20-25MB/sec and have a couple of bad compatibility problems with certain drives. Areca units are blazing fast (especially the PCI-E cards) but priced for businesses, not home users looking for "cheap as possible."
Software raid comes in #1 for price/performance, but I strongly, strongly recommend you play around with the mdadm tool quite a bit before you put actual data on an md array. The stuff is very half-baked.
one option to consider is waiting a few months for Windows Home Server - one of its key features is a new technology named "Drive Extender". The idea is you make a pool of disks (1394/SATA/IDE/SCSI/USB2/etc). This pool is exposed as one drive letter. You can remove disks as you choose, and even cooler is that you can mix drives of various sizes. You can even remove a disk without causing a blue screen or massive corruption (you should still try to be friendly when you remove disks - yanking them out can be dangerous, just not nearly as much as with RAID stripes).
The data on the disks can be duplicated if you want, just choose a share and select the duplicated attribute. Once selected Home Server will make sure there are always two copies located on two different disks.
more info can be found here Wikipedia article http://en.wikipedia.org/wiki/Windows_Home_Server
After screwing around with a mixture of linux/freebsd and numerous software and hardware raid setups for so long, having spent a lot of time that I could have used on other things, and also having a hardware failure that was never supposed to happen (you know the kind) and recovering most of my stuff from backup, I decided to try a different way and bought a Synology 106e, http://www.synology.com/enu/products/DS106j/synol (you can get it naked, so you supply your own drives), and added a 500 gig internal SATA drive and an external 500 gig USB drive. Backups are automatic once a week (good for me) and in case of a hardware issue I can mount the drives on any PC. You can also do network backups across units and I may do that later to a remote site. They also have a raid box now. It's also 1000BT but I have not seen much speed improvement, for what I do at least. There is a bunch of software running: ftp, www, mysql, php, photo management and media server (iTunes and UPnP).
This has been working great for me. There were some privilege issues between the mac and the pc at times but I don't mind too much. It's running linux inside and you can even get in with some special tools (not recommended by the manufacturer).
I did this over a year ago and have not spent more than a few hours on it, mostly software upgrades and some minor issues. The thermostatic fan is starting to make some noise, but it's in the garage so I don't mind.
The goal of all this was to have a lot of storage for my house on the network for my PC, mac and Roku.
:-)
that there is no good solution I can find. Every solution is flawed for this purpose, including ZFS.
I have been giving much thought to writing yet another filesystem, which would fill the needs of home/archival/media box users. Essentially it would be like ZFS, except it would improve upon ZFS's dynamic striping. I would have dynamic parity, such that the number of disks in the stripe-set and the number of recovery blocks are completely independent per-file, a la PAR2. ZFS is still just as bone-headed as older filesystems because the vdevs are still atomic: you make a raidz, and it stays that way. The integrity would be on a per-file basis only. So you could add and remove disks at will, no dangerous re-striping operations, and protection and recovery from on-disk corruption. If you lose too many disks, you only lose the information on those disks. A file need not be striped on every disk. Only when a particular file has fewer parity blocks than missing blocks, wherever such blocks may be, is the file gone. Files on disk should always be recoverable, regardless of "corrupt superblocks", or something similar. This could probably be done using FUSE and some quick and dirty code.
Why?
1. We want a lot of storage
2. We want it expandable, no dangerous restriping or filesystem expansion. There can be NO BACKUPS!
3. We don't want to wake up in the middle of the night and wonder if the next fsck is the last.
4. We only care about enough performance to run the media center, i.e. record TV and play movies.
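The per-file recovery rule from the proposal above (a file is gone only when it has fewer parity blocks than missing blocks) can be sketched in a few lines. This is only an illustration of the rule, not a filesystem: it assumes a PAR2-style erasure code where any parity block can stand in for any lost block, and the function name and disk labels are invented.

```python
def file_recoverable(data_disks, parity_disks, lost_disks):
    """data_disks / parity_disks list the disk holding each block of one file.
    The file survives while the number of lost blocks does not exceed the
    number of parity blocks stored for it."""
    all_blocks = list(data_disks) + list(parity_disks)
    missing = sum(1 for disk in all_blocks if disk in lost_disks)
    return missing <= len(parity_disks)


# One file: three data blocks on disks a, b, c and two parity blocks on d, e.
print(file_recoverable(["a", "b", "c"], ["d", "e"], {"a", "d"}))       # True
print(file_recoverable(["a", "b", "c"], ["d", "e"], {"a", "b", "c"}))  # False
```

A file that never touched the dead disks is unaffected, which is the "you only lose the information on those disks" property.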
"I don't know that atheists should be considered citizens, nor should they be considered patriots." George HW Bush
Do you have ANY idea how long that would take me to download again!? :(
And I wouldn't have any bandwidth left for new stuff in the mean time
Stupid %#@$%ing monthly GB caps on cable service.
> even if I gradually replace all of the drives with larger ones,
> the array will still read the original size.
The key word to your problem is "gradually".
Let's say you have three 500GB disks in a RAID5 array and you want to expand. Let's also assume you can't connect more than four drives to your system. So you add one 1TB disk as part of a new (but still incomplete/failed) RAID5 array and copy all your data to it. Next, you remove the three 500GB disks and add another two 1TB disks. Once you've done this, the new array will synchronise its disks. While it does so, your data is safe because you still have your three old 500GB disks. When your three new 1TB disks have sync'ed, you can wipe and sell off the old drives.
BTW, don't do this with master/slave IDE drives on the same controller.
You don't have to do that just to use extra capacity on Linux software RAID 5 disks. Once every drive has been replaced with one of a higher capacity, mdadm can be asked to expand the array onto the unallocated space on the disks, bringing the per-disk used size up to the capacity of the smallest disk.
If you're using mixed sizes this doesn't work, of course, and then you benefit from grafting them together with lvm as you suggested.
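To put numbers on the original question: with RAID 5 every member only counts for as much as the smallest disk, and one disk's worth goes to parity. A small sketch of what the array could hold at each step of a one-at-a-time swap from 500GB to 1TB drives (sizes in GB; this models the capacity available once the array has been grown, it does not talk to mdadm):

```python
def raid5_usable_gb(disk_sizes_gb):
    """Usable RAID 5 capacity: (number of disks - 1) * smallest disk."""
    if len(disk_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)


disks = [500, 500, 500]
history = [raid5_usable_gb(disks)]
for slot in range(len(disks)):
    disks[slot] = 1000  # swap one drive, let the array rebuild
    history.append(raid5_usable_gb(disks))

print(history)  # [1000, 1000, 1000, 2000]
```

Nothing is gained until the last small drive is gone, and three 1TB drives give 2TB usable, not 3TB.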
I tend to use LVM to manage the storage as a matter of course, but prefer to keep the RAID array fairly simple.
In my hardware forum there are what people call AFRTs or "Another Failed RAID Thread". The average single user doesn't need RAID and many people don't discover this until it's too late.
I did this a couple of times recently. I built a file server to supply ripped DVDs to three media centers in a house. I played around with RAID but got poor disk performance. Eventually I realized that the data is not vital information - the world won't end if you lose some movies and have to rip them again. I put four 500 GB drives in a Supermicro 8 bay server, with the OS on an internal drive.
Each drive is mapped by its UNC path, i.e., \\movieserver\movies1, so the media centers have four drives mapped on each one.
If I lose a hard drive, oh well, some of the movies won't be available until they are re-ripped from the DVDs.
Had I used RAID5, I would have 1,500 GB and it would not have been easy to upgrade. I have run out of room and I am adding a couple of 750 GB drives.
If you use a linux server and LVM, losing one drive loses everything.
...will allow you to resize it easily.
The advantages of RAID 0 versus RAID 1 versus RAID 5 have already been covered in detail, here, and in many books and websites.
However, allow me to address the issue of how they relate to a media center:
Firstly, when you say "media center/media server", do you mean "I just want to build myself a kickass Tivo?", or do you mean "I want to serve video for everyone in my frat house, simultaneously?"
If the former, consider that Tivos ship with 5400 RPM drives for several reasons:
1) They're cheaper than faster drives
2) They run cooler than faster drives
3) They run quieter than faster drives
4) They use less power than faster drives
5) They're more than fast enough for streaming a single video to your TV while recording another
Long story short, if you're just building a "free" Tivo with a kickass drive array, performance is *not* an issue. Keep in mind that if you're building a set-top box of sorts, the low heat and low noise features are *very* big benefits. You probably want RAID 5, and/or JBOD.
If, however, you're planning on serving video to more than a handful of stations simultaneously, you may need to consider performance. This is a vote for RAID 0 and/or RAID 10.
Now, the second axis: How important to you is this data? Really?
I've got over 300 gigs of drive space on my Tivo. Most of it is the last two weeks of television reruns (Scrubs, 6 copies of last Thursday's Daily Show, etc.), movies I recorded but won't watch, etc. There are about 10 gigs (3%) of video on there that's been saved for a few months, and frankly, I couldn't tell you a single thing on there that I'd miss if my drives went belly up tomorrow. So: do you *really* need to save all those Seinfeld reruns on a highly-redundant storage array? How *much* of the stuff on the server do you really need to keep?
Assuming it's less than 50% (in the Tivo scenario, it probably is), consider using JBOD for most of your storage, and maintaining a single backup drive, or small backup drive array. Or just backing up the good stuff to DVD.
In summary: If you're just building a Tivo, you probably don't really need the performance, or redundancy that RAID offers.
Infrant (wow, just checked their website and it looks like they were bought by NetGear) created their own version of RAID that specifically addresses the issue of capacity and expansion. It's a nice transitional blend from RAID-1 to RAID-5 and does offer the ability to increase the total capacity (albeit with a lot of drive swapping).
http://www.infrant.com/products/products_details.php?name=About%20X-RAID
Buy an Infrant RAID with the two biggest drives you can afford. Let's say two 750GB drives or whatever's on sale that week. It starts out acting as RAID-1 with the drives mirroring. So you have 750GB of "safe" storage. Now you add another 750GB drive. Okay, now you have 1500GB of storage with one of the drives acting as parity drive (RAID-5). Add a fourth drive and now you have 2250GB of "safe" storage. Now you come back and just replace one of the original 750GB drives with a 1TB drive. Do you get extra capacity? No...not initially. But the drive is fully formatted and integrated as X-RAID. What this means is that eventually, after you have piecemeal or onesie-twosie upgraded all four drives, suddenly the X-RAID resizes itself to match the capacity of the new drives with no transfer or downtime. So in theory if you wanted to upgrade your RAID, buy four 1TB drives, swap them out one at a time (letting each one rebuild the array) and then at the end you'll have a 3TB RAID instead of the old 2250GB RAID and all the data intact.
I have three ReadyNAS units and love them to death. They are a little fussy about drive temperatures (I guess that's a good thing, but I may get like 40 emails during the course of the day about it and it's not like I'll drive home from work to turn up the A/C in my house). My only sadness is that Infrant doesn't have a higher capacity unit than four drives (oh please oh please, eight drives with a RAID-6 type protective hotspare in one nice rack-mountable unit would be my ultimate dream).
-JoeShmoe
linux software RAID as NAS
you can then configure it for external access and LAN access easily, and by running your RAID in software you can always add additional arrays rather than trying to expand an existing array.
* denotes array which should be taking in new data during a particular phase
phase1: *3x500r5 total storage 1TB, move to phase 2 at 500GB,
phase2:[+2 drives] 3x500r5, *2x500r1 total storage 1.5TB move to phase 3 at 1TB (active array full)
phase3:[+0drives] 3x500r5 2x500r0, convert the 2x500r1 to striped 2x500 in order to hold the half of the 3x500r5 in use, move to phase3.1 immediately
phase3.1[+1drive] 4x500r5*, 2x500r1,total storage 2TB move on to phase 4 at 1.5TB (active array full)
phase4:[+1drive] 4x500r5, 3x500r5*, total storage 2.5TB, option to fill all arrays now, or to "retire" the 4x500 from the cycle and repeat from phase 2 with the 3x500r5 by acting once you are at 2TB
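The "total storage" figures in the phases above can be checked with a few lines. This is just the arithmetic, assuming all drives are 500GB; the helper name and the phase table are written for this example, and the transitional phase 3 is left out.

```python
def usable_gb(level, count, size_gb):
    """Usable space of one array built from `count` equal-sized disks."""
    if level == "r0":
        return count * size_gb        # striping, no redundancy
    if level == "r1":
        return size_gb                # every disk holds the same data
    if level == "r5":
        return (count - 1) * size_gb  # one disk's worth of parity
    raise ValueError(f"unknown level {level!r}")


phases = {
    "phase1": [("r5", 3)],
    "phase2": [("r5", 3), ("r1", 2)],
    "phase3.1": [("r5", 4), ("r1", 2)],
    "phase4": [("r5", 4), ("r5", 3)],
}

totals = {
    name: sum(usable_gb(level, count, 500) for level, count in arrays)
    for name, arrays in phases.items()
}
for name, total in totals.items():
    print(name, total)
# phase1 1000
# phase2 1500
# phase3.1 2000
# phase4 2500
```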
I use Linux software raid. I bought a rack mount case that can hold a *hit Ton of drives, leave it in the closet, and randomly put in old drives I obtain. I use Linux LVM to mash the drives together so that no part of the file system lives on only a single disk. Like, I have a 200 GB disk, a 150 GB disk, a 50 GB disk, and two 20 GB disks. I split the 200 GB disk into a 150 GB partition and a 50 GB partition. I then mirror the 150 GB partition with the 150 GB disk, and I take the 50 GB partition and the 50 GB disk and mirror them. Then I mirror the two 20s. Then I append the entire thing together.
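For what it's worth, the pairing described above works out like this. A rough sketch, assuming the 150 GB partition is mirrored against the 150 GB disk; the function name is invented:

```python
def mirrored_pool_gb(pairs):
    """Each mirror holds as much as its smaller half; the mirrors are then
    appended (concatenated), so the pool size is the sum."""
    return sum(min(first, second) for first, second in pairs)


# (150 GB partition, 150 GB disk), (50 GB partition, 50 GB disk), two 20s
pairs = [(150, 150), (50, 50), (20, 20)]
print(mirrored_pool_gb(pairs))  # 220
```

So 440 GB of mismatched raw disk turns into 220 GB of fully mirrored storage with nothing left over.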
You can do stuff like that with software raid. Just get a great rack mount case that has as many 5.25 bays as possible, put in sata hot swap bays, and start collecting disks. Add extra SATA controllers to top it off.
Of course, that is just file storage. You can export it with iSCSI or Samba and use it from windows media center... or just use NFS with Myth. Whatever.
This is your official warning that you are in over your head. If you consider your data to be critical in any sense you should use an off-the-shelf solution.
That said, the most straight forward approach to this situation would be to build a second RAID 5 set using the extra space once you have upgraded all of the drives. Depending on the sophistication of the software and hardware driving the RAID set you might be able to set up a RAID 1 on the extra space once two drives have been replaced, then grow that into a RAID 5 as more larger drives come on line.
For extra credit you could manage the space with some form of LVM (Logical Volume Manager/Management).
Also note that "RAID array" sounds foolish, since it would expand to "Redundant Array of Inexpensive Disks array", which isn't the kind of redundancy we are trying to accomplish.
Good luck!
-Peter
Seriously... This person needs to read more about data storage technology and tailor what they can afford (and can afford to lose) with their situation.
Personally, I assign the whole thing to a volume group this way data is easy to move between drives and I get to use it anyway I want. I also backup my systems nightly using Tivoli Storage Manager so there's always a second copy of my data already on another disk somewhere. If that somehow fails, I also have a mirror copy on tape that is also updated at the end of the backup runs every night.
This configuration means I don't mess with mirroring, I get decent performance, and my data is restorable. Tape drives are really cheap these days. Lookup DLT drives on ebay, 40GB of enterprise-level storage gear for a relatively low investment.
I also don't consider "media" files as something worth having redundant. You already own the DVD/CD right? If so you have the master you can always re-rip. Still, I do consider my iTunes library worthy of a backup simply because of the volume of music I'd have to rerip (really). Otherwise, I don't bother backing up movies and stuff I might have ripped.
The poster's situation will probably be totally different and there's no way in hell anyone at /. can really tell this person what is best especially if they aren't inclined to do the research to answer the very basic questions posed.
I can't believe that nobody has recommended Solaris 10 x86. Check the ZFS filesystem as it includes most of the features found in RAID and LVM combined at the OS level.
If the only thing you are planning to do with that box is storage, then Solaris is a no brainer. If you also want to add most of the standard stuff you find in Linux, such as apache, Samba, NFS, etc, you can also do all that. Heck I can even run Asterisk on it now, and you gain terrific security when you use zones/containers.
Have a spare controller on hand. Trust me on this.
Perhaps the guys at Data Robotics can help, or maybe not.
If you want to be able to expand your raid, one option (if you are at all familiar with Linux) is to use EVMS http://evms.sourceforge.net/ as it allows you to expand your existing raid system without losing everything. The other option is to wait until Windows Home Server comes out. Although it isn't raid 5 (in fact it's more like raid 1, although not exactly), it will still allow you to have redundant, expandable storage, and it is probably a lot less risky than EVMS (because if something does go wrong there you may well lose your raid and have to rebuild it from scratch, which won't be fun).
Don't bother with dedicated RAID hardware controllers. I've seen the Linux md disk driver mentioned, and while a viable option, the better option IMO is Solaris x86 using ZFS. Basically you've got an industrial-strength piece of storage software ripe with features begging to be used in this situation....for free.
If you're interested in an industrial strength hardware platform to go with the software, go for one of these.
If you're interested in rolling your own, then simply put together an x86 box with as many SATA controllers and buses you can stuff in a box and set the disk up as JBOD (just make sure the hardware is Solaris x86 compatible of course). Create some ZFS pools of whatever RAID suits your need, and sit back and enjoy data glory.
Oh, and simply pick a protocol of your choosing to serve up the data to your clients...iSCSI, SMB, NFS, whatever.
It seems that raid10 has these advantages over raid5:
The downside is that you need more drives, and it'll cost you more.
Any theories on this?
Start by defining the amount of storage you need. Now buy 3x as much. For instance, if you want 1TB, buy 3TB. 3x 1TB drives will work nicely, but 6x 500GB drives works just as well. If you go the 3x 1TB-drive route, use the first two in a RAID1 configuration. If you go the 6x 500GB-drive route, use the first four in a RAID0+1 configuration.
Use the remaining drives as a backup destination, preferably in another computer, but even in the same computer is better than nothing. Set your backup to be automatic. Verify your backup regularly. BTW the backup drive(s) can be external if you ran out of SATA / IDE connectors; speed is not that important.
Bottom line: you want redundancy, for easy recovery in case of hardware failure, and you want backup for easy recovery in case of operator error, or for recovery in case of catastrophic failure.
If you can only afford 2x the space you require, always choose backup over redundancy. Backups can recover under circumstances where redundancy cannot; the opposite is not true.
Don't bother with RAID5, it's not worth it. Reasons why:
- You still need backup. Therefore you will need to buy at least 2.5x your desired capacity. The savings are small over the 3x required for RAID1 or RAID0+1.
- RAID0, RAID1, RAID0+1 are all supported by most motherboards. RAID5 requires either additional hardware (more expense, another thing to go wrong) or software (slow, boot problems, etc).
BTW I have learned this lesson the hard way. First, I lost everything when I had neither redundancy nor backups. Later, I lost everything when I had only redundancy, and my controller card failed (wiping all drives in the process). Later, I nearly lost everything when I was using only redundancy again - RAID5 this time - I upgraded my OS and it could not rebuild my array. Now, I use RAID1 + Backups, and have recovered from multiple otherwise catastrophic failures, upgrades, drive swaps, etc.
Ubuntu 7.04, works great.
With Linux you can create a RAID5 md device, say /dev/md0, then run LVM on top of that (pvcreate /dev/md0 ; vgcreate MyVgName /dev/md0) and use that to carve out your storage. The key here is to create partitions on each drive, eg filling up the entire disk, and create your raid5 with those.
If you buy 1TB drives further down the road, here's what you do- With each disk, create a partition identical in size to the partitions on the smaller disks, then allocate the rest of the space to a second partition.
Join the first partition of the disk to the existing RAID set. Let it rebuild. Swap the next drive, etc. etc. Then once you've done this switcharoo to all the drives, create another raid set using the 2nd partition on your new disks--call it /dev/md1. So now you have /dev/md0, pointing to the first 500GB of each disk, and /dev/md1, pointing to the 2nd 500GB of each disk.
Take that /dev/md1 and graft it onto your LVM volume group. (pvcreate /dev/md1 ; vgextend MyVgName /dev/md1). Now your LVM VG just doubled in size, and you can use all that new space. Whatever you do though, do NOT create any "striped" logical volumes (the "-i2" option to lvcreate; LVM's Poor Man's RAID0, basically) because you will suffer terrible performance, since you'll be striping across different volumes on the same physical spindles (a big no-no for any striped configuration). But if you use the extra space by creating new filesystems or growing existing ones, you shouldn't see any trouble.
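The final step of this recipe (a second RAID set built from the spare partitions, then grafted onto the volume group) could be written down as a dry-run plan like the one below. It only builds and prints the command lines, it runs nothing. The partition names /dev/sda2, /dev/sdb2 and /dev/sdc2 are placeholders for this example; MyVgName and /dev/md1 are the names used in the comment.

```python
def expansion_plan(spare_partitions, vg="MyVgName", md="/dev/md1"):
    """Return (without running) the commands that create a second RAID 5
    set from the spare partitions and add it to the LVM volume group."""
    return [
        ["mdadm", "--create", md, "--level=5",
         f"--raid-devices={len(spare_partitions)}", *spare_partitions],
        ["pvcreate", md],
        ["vgextend", vg, md],
    ]


plan = expansion_plan(["/dev/sda2", "/dev/sdb2", "/dev/sdc2"])
for command in plan:
    print(" ".join(command))
```

Printing the plan first and reading it over before typing anything as root is cheap insurance on a box that already holds data.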
"It's almost too easy." --Garth
"I am in the process of planning and buying some hardware to build a media center/media server"
I would suggest a backup/restore solution instead of RAID.
Ghost images of the OS, and daily backup of the media storage to a large external SATA drive.
If you go with RAID, use RAID 10, this requires twice the drives to get the same space; 4 one TB drives will give a 2 TB array.
RAID 10 is faster than any RAID except RAID 0, and RAID 10 is redundant.
If you go with RAID, go hardware RAID.
Always buy spare drives when you buy the originals for the array, as you may not be able to buy replacements later.
Read thru the comments, much of it very good, here's some addendum
1) software RAID with windows is a bad idea. Linux, as everyone says, is more or less fine.
2) raid 5 w/ a controller - if you lose the controller, you lose the raid unless you can find an identical controller and have taken all the proper steps.
3) mirrored RAIDs can be recovered even if the controller fails, and have very, very good read speeds. Put your boot partition and important data on the mirror
4) one easy way to go is do a hardware mirror for your primary bootable partition (avoiding the problems of installing the OS to a software mirror) then put 2 additional (smaller, faster?) disks in a software RAID0 for your intensive read/write stuff and back them up on a daily basis 'cause some day you're guaranteed to have that fail catastrophically.
Consider RAID-10... fast and VERY redundant... You can have multiple drives fail out and its fast.
However, you will need to have at least 4 disks, and you do lose space... so i'd probably go with raid-5, and although SAS drives would be nice... well... can you say expensive? Consider an MD1000 from dell - just buy one with 3 hard drives and pick up your own that are preferably the same brand. And the basic support is worth it if you are handling more than 5 hard drives as hard drives like to fail.
Nothing can possibly go wrong. Especially if you use, like, 10 disks.
Ha ha. Ok, for the ignorant, if you have d disks together in a RAID 0 array and each disk has a s% chance of surviving a given time period (say, a year), then all of those disks together in a RAID 0 configuration have a (s/100)^d*100% chance of surviving that year. Example: 10 disks, each with a 98% chance of surviving a year gives .98^10*100% approximately equal to 82% chance of surviving the year. If each disk had a 90% chance of surviving the year, then the RAID 0 configuration of 10 of them would have about a 35% chance of surviving the year. Bad idea.
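The same survival arithmetic as a snippet you can play with. It assumes disks fail independently of each other, which real drives from the same batch may not; the function name is made up.

```python
def raid0_survival(per_disk_survival, disks):
    """Chance that every disk in a RAID 0 set survives the period,
    assuming independent failures."""
    return per_disk_survival ** disks


print(f"{raid0_survival(0.98, 10):.0%}")  # 82%
print(f"{raid0_survival(0.90, 10):.0%}")  # 35%
```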
Ok, first, get fully educated about RAID 0, 1, 5, 0+1. It's not that hard to understand, really. Then, study the differences between the idea of RAID x and the particular implementation of RAID x on your chosen operating system/raid card.
Second, consider MythTV's storage groups feature. The latest version of MythTV (SVN trunk, which may be stable by the time you actually get everything together), will intelligently load balance over many directories (each one representing a different mounted spindle) giving you way better performance than simple RAID0/1/5. But, consider RAID 1 for redundancy for each directory in the storage group. If it's data you want to keep. If it's just TV recordings, don't bother, unless you really can't live with the idea of losing a few episodes of star trek when your system goes down.
with md on linux you can add a drive to an array or extend an array to use more of the drive later
so you can start with 3x500 (1TB useable) and replace them with 3x1TB (2TB useable)
however, seriously consider raid 6 instead of raid 5, it eats up an extra drive (minimum usable arrays are 4 drives giving you the capacity of 2 drives), but with today's large drives it can take long enough to rebuild the array that you run a very real risk of a second drive failing while you are rebuilding from the first one.
currently md on linux will not let you switch on the fly from raid5 to raid6
David Lang
If you're going to be storing important data on these drives - contact a vendor that sells, configures, and supports these types of setups - like IBM.
Your efforts to learn about this technology while setting it up and supporting will lead to disaster. You either know it well enough to make good judgments or you don't. Admitting you need help (other than asking /. )
You can (with most hardware) resize an array into a larger array when you replace all the drives with larger drives. The methods are specific to each vendor - see first recommendation.
how about joining netflix at the highest level and then ordering every film you might like? certainly makes for great rainy days when you've got nothing better to do. personally, i've downloaded most of the outer limits and am starting on quantum leap (never got to see all of a lot of episodes) so that i have stuff to watch when i'm just loafing around. going through all that effort just to stream porn to your television is a horrible waste when you can just watch that on your pc... use the television and surround sound system for things that are worthwhile
According to Google's recent disk report, their recommendation is to mirror disks 3 times; this is coming from a company which uses millions of drives, all the RAID formats, and hundreds of controllers. Listen to them.
RAID can be flaky, is hard to manage, and if something small goes wrong you are screwed. Copying the data many times is much easier, works on any system, and is much easier to set up and recover from, sometimes cheaper, and more easily expandable.
Buy 3 different drives (if you buy the same they often fail close together or for the same reason).
Mirror all data to one drive.
The 3rd drive is for important files and incremental backups, the reason you need this is you simply can't mirror data blindly because if files are corrupt windows simply copies the corrupt file and you won't know something is wrong until it is too late.
The most open way to do incremental backups I have found is to use a batch file and backup files into a zip file which have changed since the date of your last incremental backup. Do incrementals every week/month or so, and mirrors every night.
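The same idea as the batch file, sketched in Python: zip up every file changed since the last incremental run. This is a bare-bones illustration, not a backup tool. The function name and arguments are invented, and the zip should be written somewhere outside the folder being backed up so it does not try to archive itself.

```python
import os
import zipfile


def incremental_backup(source_dir, zip_path, since_timestamp):
    """Zip every file under source_dir modified after since_timestamp
    (seconds since the epoch). Returns the archived relative paths."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                if os.path.getmtime(full_path) > since_timestamp:
                    relative = os.path.relpath(full_path, source_dir)
                    archive.write(full_path, relative)
                    archived.append(relative)
    return archived
```

Keeping each run's zip under a dated name gives you older versions to fall back on when a file turns out to have been corrupt for a while.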
I hate to plug a manufacturer, but when I upgrade my home server I'm getting a motherboard with Intel Matrix RAID on it. I've got my two 320GB SATA drives - I'll configure a 400GB RAID-0 volume and a 100GB RAID-1 volume on them. Very efficient use of the drives - I've got no use for 320GB of redundant storage, and it's not cost-effective to buy smaller drives.
If I had 3 drives I'd probably instead go 100GB RAID-5/600GB RAID-0. Might still do that.
BTW I know the numbers don't quite add up, but I'm using round figures (and advertised, but not really, 320GB/drive).
Under Linux and using some hardware controllers, RAID arrays can be reshaped when drives are all a larger size, and to add new drives.
Also, sub-optimal, but possible is to use, say, 2 250gb partitions on a 500gb drive, and expand that way.
Strange but works, if your RAID driver is flexible enough.
A motherboard designed for a server closet can be as noisy as it damn well pleases - performance and reliability is the goal here and a good way to up those is to get heat the hell off the components and a big noisy fan does that pretty well.
If the media server sits in another room and all that sits near the TV is a passively cooled, deliberately underpowered system with a high speed network connection to the other room, system noise is a total non issue.
But, if you're planning on just using one box by the TV, sound is likely to be a serious issue for real movie enjoyment (assuming this isn't the pron storage everyone else jokes about). If that's the case, the range between the insane fans some northbridge chips use (some NForce4 models come to mind, though this obviously isn't the server class you're talking about) and a passively cooled one (or one you can swap out to water cooled) is pretty dramatic.
Given that you can now get totally passively cooled PSUs and a simple kit like Zalman's reserator will passively cool your processor and GPU, literally your only remaining sources of noise will be drive noise and motherboard fans. It would truly suck to get an otherwise utterly silent system and then listen to a motherboard whirring away because it was designed for a server closet.
Avoid excess complication. With the perpendicular recording drives and SATA II, the data comes off the platters at around a gigabit per second, so that's what your sustained max read is likely to be to begin with. Plenty for a media server. No need to combine multiple data streams via striping and RAID-5 for performance.
For reliability, the current Hitachi terabyte disks offer incremental storage increases at about $0.40 USD per gigabyte. You can mirror them, and that's good enough, the incremental storage cost is still under a buck per gigabyte. If the controller or cpu dies, you still have your data. If one disk dies, you still have your data. If it were irreplaceable stuff, maybe there would be an argument for more stringent methods, but really what you're trying to avoid is the nuisance of having to reload everything. If you want to drive the incremental storage cost lower than $0.80/GB, then you can look into striping via RAID-5 or something like ZFS.
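A quick way to see where the $0.80/GB figure comes from and how much parity striping would shave off it. This assumes equal-sized drives at $0.40 per raw GB and ignores the enclosure and controller; the function name is made up.

```python
def cost_per_usable_gb(raw_cost_per_gb, disks, redundancy_disks):
    """Cost of each usable GB when `redundancy_disks` out of `disks`
    equal-sized drives hold copies or parity instead of data."""
    return raw_cost_per_gb * disks / (disks - redundancy_disks)


layouts = [("mirror", 2), ("RAID 5, 3 disks", 3), ("RAID 5, 4 disks", 4)]
for label, disks in layouts:
    print(f"{label}: ${cost_per_usable_gb(0.40, disks, 1):.2f}/GB")
# mirror: $0.80/GB
# RAID 5, 3 disks: $0.60/GB
# RAID 5, 4 disks: $0.53/GB
```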
The important thing here is to find an external case that will support an adequate number of drives with acceptable noise levels and reliability for the lowest cost. I recently went with the Sonnettech Fusion 400 (not the triple-interface, but the older eSATA II model, as I was looking to keep the cost down), and an external SATA II controller card for an older Powermac G5 (SATA I is all the onboard controller supports on the original G5 Powermacs). The controller and enclosure were about $600, and the first TB drive $400. So for about $1000, I have a TB media server that I can expand to at least 4 TB over time for at most $0.40/GB (in increments of 1000 GB). For the moment, I have no redundancy, but after a few months, I'll pick up another drive and set up mirroring via software. When I fill up a TB of data, I'll add a third drive (they should be really cheap by then) and employ some form of software-based data striping to drive down the fraction used by the stripe. By the time I need a 4th drive, some newer technology will doubtless have arrived that makes SATA II obsolete, and I'll start the whole process all over again.
So far as I can see, there's no viable alternative to some form of data striping to provide adequate backups. With the amount of data contained in a media server, redundancy via striping is the only rational means that I can see of protecting me from a single-disk failure that loses data and forces me into a lotta work to rebuild the library. Other failures are certainly possible (house burning down trumps all local backup, and offsite backup of the amount of storage in a media library is impractical), but this covers the likely cases (and the only ones I've ever lost data to so far), and is simple enough to work.
The more complicated your data protection scheme is, the more likely it is that some facet of the complexity will end up biting you in the ass.
From a performance perspective, all I can recommend for the general PC power user is: Raid 1.
Raid 1 will give you a mirrored storage setup, which doubles the number of spindles and heads you have at your disposal, arguably doubling your read throughput and cutting access times (writes still have to hit both disks, so they get no faster). It will guard against hardware failure, but regardless of what you do, NO RAID LEVEL WILL PROTECT YOU FROM STUPIDITY.
For all the fancy-schmancy disk magic that can be done for very little cost these days, the one thing that many people do not "get" is that nothing else matters but YOUR DATA. Drive capacities are growing at fantastic rates, which means you now have more data than ever to lose.
Investing in a proper backup solution should be priority-one for users who utilize their computers as a tool for creating, well, anything. Music, art, writing, programming... things that simply cannot be replaced from scratch. We use computers to get these things from our brains to a more communicative medium. It is foolish to expect that the computer will remember all that which you created -- FOREVER.
Ultimately this will come down to how much your wallet can bear. You can begin with using offlined disk-to-disk backups, that is to say, an external hard disk that you use for backups and ONLY for backups. When your backup operation is complete, you switch the thing off and set it aside. A 300gb drive with 8 mb cache can be had for around $80 in some markets. A minuscule investment, considering a drive dies about every 2-3 years depending on use and it will be covered under warranty.
The ability to remove junk data from your archive repository will prove its value after you opt for a more expensive solution and find yourself scanning through stolen milk crates filled with DDS3/4, LTO or DLT tapes.
Yes, you can drop A LOT of cash on optical or magnetic media storage systems, but those require deep pockets and a mind that is good with scheduling, logging, tape rotation, and testing backups.
So now I've gotten into the habit of setting up two 500GB disks, partition them the same, and rsync(1) from the master to the backup periodically. [Run out of cron four times a day, e.g.] The second disk does far less work, therefore it shouldn't fail as quickly.
Ideally, I'd love to be able to incorporate revisioning (and/or snapshotting) on the second disk -- basically use it as a journal to track deltas, and cycle backwards through transactions as necessary to do point-in-time recovery.
I'm guessing this is along the lines of what Apple has in mind with the "Time Machine" feature in Leopard.
--
Dabe
The #1 design constraint for a media centre pc is noise. You need it to be quiet. You don't want a big rattle/hum in your lounge room.
I'm a huge fan of RAID... but for a media centre, use a single, big, quiet hdd. And put soundproof foam in the case.
Instead of RAID, set up automated daily backups to another machine.
If you were setting up a general-purpose fileserver, go RAID 5... if it's real important data, go RAID 10... if it's REALLY important data go RAID 1. For a media centre, I'd use single hard drives.
Just don't use RAID 0... except maybe on a pure games machine... that's backed-up twice daily... and put your save-games somewhere else. Seriously, speed, schmeed. Always assume any hard drive will die in the next 10 seconds and be prepared. If you use RAID 0, assume it'll die in the next 5.
I can't believe no one has suggested an unRAID server. You get redundancy, storage that can grow by just adding another drive, low power consumption, affordability, and the ability to telnet in. (Plus it runs Linux!) I really like this solution since the data isn't spread out over a bunch of disks in a way that only the RAID controller can understand. Instead it's just a bunch of files on a bunch of disks, with an extra parity drive for reliability.
If a drive goes down, you can just pop a new one in and recover the lost data from the parity drive. If two drives simultaneously fail (unlikely), you lose the data on the drives that failed. Compare that to the nightmare if your RAID controller fails.
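For anyone wondering how one extra parity drive can cover any single failed data drive, here's a toy sketch. It assumes plain XOR parity, which is the usual way single-parity schemes work; I'm not claiming this is exactly what unRAID does internally, and the "disks" here are just short byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks", each holding one 8-byte block.
data_disks = [b"movie-01", b"music-02", b"photo-03"]

# The parity disk stores the XOR of all data disks.
parity = xor_blocks(data_disks)

# Disk 1 dies: XOR the survivors with the parity to get its contents back.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'music-02'
```

Lose two disks at once and there is nothing left to XOR against, which is why only the failed disks' data is gone in that case while the other disks stay readable on their own.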
Here's my unRAID server, built for $400 plus the drives. I love being able to do backups by just running rsync. Once the author gets sshd built into the system, I can even do automatic incremental snapshot backups using rsync --link-dest.
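A rough sketch of what the rsync --link-dest snapshot idea looks like, written as a Python helper that only builds the command line (it doesn't run anything). The paths, the date-named snapshot folders, and the function name are all assumptions for illustration; -a and --link-dest are standard rsync options.

```python
import datetime
import posixpath


def snapshot_command(source, backup_root, previous=None, today=None):
    """Build an rsync command that snapshots `source` into a dated folder.

    Files unchanged since the `previous` snapshot are hard-linked rather
    than copied, so each snapshot looks complete but only costs the space
    of what changed.
    """
    today = today or datetime.date.today()
    dest = posixpath.join(backup_root, today.isoformat())
    cmd = ["rsync", "-a"]
    if previous:
        # --link-dest should be an absolute path (or relative to the destination).
        cmd.append("--link-dest=" + posixpath.join(backup_root, previous))
    # Trailing slash on the source: copy its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


print(" ".join(snapshot_command("/srv/media", "/mnt/backup", previous="2007-06-04")))
```

Pass the resulting list to subprocess.run() from a cron job once you're happy with what it prints.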
Has anyone else noticed that the new comment system looks like crap in IE7?
(My firefox can't access anything right now due a known bug, and I am listening to internet radio. If I restart 'fox then I will have to listen to ads.)
ZFS allows you to trivially add/remove disks and you get RAID-Z (supposedly better than RAID 5) for free.
I will presume your setup is:
1) used at home. That means downtime is not that uber important, while heat and noise could be a problem
2) Budget is not that limited, but you surely want to maximize the effectiveness of every dollars spent
I am running a MythTV Box with Ubuntu Linux, just with a single harddisk. On the other hand, I have a standalone backup server, again Ubuntu Linux, on a old P4 box.
The backup is done with a software called Dirvish, which is a rsync based solution. It will keep changes and versions, while those not-changed are only kept once, not duplicated, hence saving space. The good thing I like most is that the backup data is retrievable by normal means, directly from file system or Samba. By the way, the backup server is configured wake-on-lan in the morning and shutdown after it has finish the backup.
As my media collection grows, I have upgraded my mythtv box 2 times, from 250G to 320G to currently 500G. The retired harddisk is moved to the backup box as JBOD to enlarge the backup volume. I use the EVMS BadBlockRelocation feature for the JBOD just for peace of mind (not sure if there is any actual use, though).
The pros (compare to RAID5/10 solution):
1) Single harddisk: less heat and noise. Less heat also translated to longer harddisk life, less pressure on cooling equipment.
2) Multiple version of backup. From my experience, so far (used for 1.5 year) all my data lost is result from human (read: me) error...like accidentally deleting something, or overwriting something. The backup saves me couple of times, really. RAID can save hardware failure, not human error.
4) As you upgrade the harddrive of the Media Center Box, you got a new one. In my experience I need a bigger harddrive for every 6-9 months, and the rate of failure within 9 months is really low. If you start with a big RAID, and planned to use for the prolonged time, it's just easier for any of them to die.
5) $$$ might not be a problem...but it really is!
The cons (compare to RAID5/10 solution):
1) When any backup harddisk totally fry and die (not just bad sector), your backup partition is just doomed. But who cares? Just rebuild the backup server and redo.
2) When your Media center's harddisk die, your media center just down. Although you are safe, but you have to spend time to execute the restore plan.
3) Frequent upgrade may end up putting a lot of harddisk in the backup server...not that much that would cause problem on me yet, but eventually...
RAID works best if every drive in the array has identical specs (preferably just put a 2 or 3 in your quantity when you buy off a website, or get multiples of the same boxes off the shelf).
I can never remember RAID 0 versus RAID 1, but one is going to give you the same size as JBOD, but will be both faster and much more likely to cause you to lose all data. (My opinion, avoid unless you want a really fast swap partition and have no data you don't want to lose. Most of the time JBOD is better).
The other RAID gives you good redundancy, but slow speed. Useful for mission critical data and zero downtime, but you only have half the space you otherwise would. And you still need to backup, because a power surge can easily fry both drives.
RAID 5 is better than the others, you get 2/3rds data, no speed loss, and good data retention (you can afford to lose 1). And if you lose a drive, the system will still be up. You *still* need backups though. Cause once again, power surges.
Mostly I think JBOD is going be better unless you either have issues with remembering to backup (like me), or absolutely need better speed or no downtime.(like Google).
See subject. I can't believe I wasted 30 seconds of my time Googling that just to find out that it's one of the more worthless acronyms ever invented.
When I read the GP's post, I thought that verifying the array - however google brings up VERY little information about this. It seems there is no way to do this through 'mdadm' - which you would assume you'd do it with - as that does everything else to do with array management...
It wasn't until I came and read the replies to this post that I actually found out how to do this!
Sendmail is like emacs: A nice operating system, but missing an editor and a MTA.
First, go back and read this ZFS thread from a few days ago. Good stuff about storage arrays that essentially manage themselves. One of my major points of paranoia has been silent data corruption, and ZFS has the best handling of that (end-to-end checksumming) of which I'm currently aware.
Second, have you given any thought to power management? Most desktop drives eat around 10 watts when operating, falling to 2 while in sleep mode. Multiplied across a pile of drives, that's a lot of power, heat, and noise. Consider this:
Put your OS on a CF card, or a mirrored pair if you're paranoid. (Dual CF-IDE adapters are all over the place.) Keep frequently used data on a mirrored pair of laptop drives, which are fairly quiet and don't need much cooling. Don't worry about wearing them out with spindle starts and stops, since they're laptop drives and they're made for it. A five-minute spindown timeout would probably be appropriate. Put your bulk storage on a RAID-5 or RAID-6 stripe set of regular desktop drives, and give them a longer spindown timeout, say 30 minutes, so they only stop the spindles at night.
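To get a feel for what spindown buys you, here's a back-of-the-envelope sketch using the rough figures quoted above (about 10 W per spinning drive, 2 W asleep). The 8-drive count is a made-up example and the function name is mine; real drives will vary.

```python
def array_power_watts(total_drives, spinning_drives, active_w=10.0, sleep_w=2.0):
    """Power draw of a pile of drives when only some of them are spun up."""
    if not 0 <= spinning_drives <= total_drives:
        raise ValueError("spinning_drives must be between 0 and total_drives")
    sleeping = total_drives - spinning_drives
    return spinning_drives * active_w + sleeping * sleep_w


DRIVES = 8  # hypothetical bulk-storage set

for spinning in (DRIVES, 1, 0):
    watts = array_power_watts(DRIVES, spinning)
    kwh_per_year = watts * 24 * 365 / 1000
    print(f"{spinning} of {DRIVES} spinning: {watts:.0f} W, about {kwh_per_year:.0f} kWh/year if left that way")
```

The gap between "everything spinning" and "everything asleep" is the whole argument for keeping the hot data on the small mirror and letting the big stripe set sleep.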
Now here's the question: Could ZFS be instructed to only scrub the storage when the platters are spinning? If it's been half an hour since the last user access, pause the scrub and spin the drives down...
Automatically moving frequently-used files between the big RAID and the quiet mirror is an exercise left to the reader.
I'm liking the software raid5 as well.
:) You can't boot off of a raid5.
/dev/md2 looks like this: /dev/md2 -G -z 79000000 -- read man page, I think this is kb used from each drive
/dev/sda1) (I prefer to think long term ;)
I started out with a 40gig, an 80gig and a 160gig
Personally my drives are mostly partitioned 1-65 (512mb) for Raid 0 (boot partition)
66-end of drive for Raid 5.
I'm surprised you're suggesting making another array...
I've been getting good use out of mdadm, pvresize, and finally lvresize
Growing the
mdadm
pvresize (doesn't need any arguments)
also, when you build the array, use --metadata=1.2 as the current default has an upper limit of 2tb devices. (i.e.
I use 3ware controllers, and have actually moved entire arrays to different cards with no problems. In fact, due to an issue with the firmware on their 5000 series, I have purposely used newer cards to initially build the arrays, then moved them over to the older 5000 series cards.
If you're willing to use IDE (PATA) drives instead of SATA, you can pick up their older series cards off eBay for fairly low prices (RAID 5 will need the 6000 series or newer cards).
We are working on a mission where we will get 2-4 terabytes of data per day.
:-)
And we are going to keep it... A petabyte a year give or take a few hundred terabytes.
I'm all ears for an "inexpensive" solution to this.
You don't want to know how much this stuff costs.
Until another few years go by and you want to buy more storage. Then you're basically stuck with doubling it, clumsily -- or migrating away and essentially throwing out the old drives.
RAID 5 is better in the short run. Even with a three disc array, you're getting more storage for your money, and you can always restripe it onto a fourth disc.
It's not all porn, and some of it is high def, in h.264. And I don't even edit videos, I just watch 'em.
That is true. However, I would fill a terabyte easily, and right now, I'm guessing it's cheaper to buy three 500 gig drives than two 1 tb drives.
You highly doubt he's got SATA?
The one thing I will say is, either have another disk (even a USB thumb drive) to boot off of, or do some sort of RAID1 across them. You almost certainly want software RAID on Linux, and you don't want to try to teach a BIOS to boot off of your array.
Don't thank God, thank a doctor!
I just wrote this a few days ago. May give you something to think about, quite possibly for more than one of the machines you are planning.
-
Did it for a friend recently. Also, currently on ubuntu + dmraid on my own desktop (I dual-boot with Windows on a RAID0 array, because I'm a cheap bastard).
I recommend Ubuntu unless you have a good, specific reason not to, because it's easy, popular, and reasonably up-to-date.
I'd also recommend using NFS and/or Samba to share it, unless I'm missing something important. In particular, NFS lets you tune for jumbo frames, to get the performance you expect from Gigabit.
root@prometheus:/lib/modules/2.6.20-12-generic# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sda2[0] hdb2[3] hda2[1]
158964096 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md0 : active raid1 hda1[0] sda1[2](S) hdb1[1]
521984 blocks [2/2] [UU]
With the amount of storage I buy these days, I wouldn't bother so much...
I'd just start out with a few discs, at least 3 for RAID5, maybe more. Then, when I need more storage, I have a choice -- if I only need, say, 500 gigs more (and I have an array of 500 gig drives), I can just add it on to the existing array and restripe. If I need a LOT more, I could build a new array, but keep the existing one around, either for backup, or just to have a bunch of storage.
RAID-5 a bunch of disks. Then put LVM on top of them. When you want to start growing, put another set of RAID-5 disks in, migrate the data over and remove the old disks. Or, if the data is semi-expendable - ie music, recorded TV shows and movies - just use ReiserFS on LVM on a JBOD. It's easy to grow and very flexible. My archive has grown from 2 x 10GB Maxtors in 1998 to 2.1TB now.
Money for nothing, pix for free
I just did a bunch of perf testing for a new production OLTP storage system at work. Here's the results (along with some obvious points):
RAID 0 - Fastest read/write performance, best disk capacity, zero redundancy.
RAID 1 - Read/write limited by low spindle count, maximum 2 disks, capacity is equal to smallest disk, single disk failure tolerant.
RAID 5 - Good read, slower write performance due to the calculation of parity, good disk capacity, single disk failure tolerant. In my tests, IO performance peaked at around 9 disks, after which throughput dropped off significantly.
RAID 10 - Good write, better read, throughput scales in a fairly linear manner up to the limits of the controller, disk capacity is 50% of total disk capacity, support multiple failures (providing a failed disk's mirror partner doesn't also fail). My tests showed that an 8 disk RAID 10 set performed better than a 9 disk RAID 5 set with all other configuration settings unchanged.
All of the above was tested with a good quality hardware RAID controller with write-back caching enabled.
So, in summary, if you want a good balance of capacity, performance and fault tolerance, RAID 5 is the way to go - 9 disks appears to be optimal. If you have plenty of disks, I'd go with RAID 10. Of course, I'd be asking myself if I really need the disk performance, or if I just think that I do. Do some performance profiling, look at disk queues, IOPS, etc. to determine if the expense, time and complexity are warranted.
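The "multiple failures, providing a failed disk's mirror partner doesn't also fail" rule for RAID 10 is easy to get wrong in your head, so here's a small sketch of it next to the RAID 5 rule. The disk names and the pairing are hypothetical; a real controller decides which disks are partners.

```python
def raid10_survives(mirror_pairs, failed):
    """RAID 10 keeps its data as long as no mirror pair has lost both members."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


def raid5_survives(disks, failed):
    """RAID 5 keeps its data as long as at most one member has failed."""
    return len(set(failed) & set(disks)) <= 1


# Hypothetical 8-disk RAID 10: four mirror pairs striped together.
pairs = [("d0", "d1"), ("d2", "d3"), ("d4", "d5"), ("d6", "d7")]
all_disks = [disk for pair in pairs for disk in pair]

print(raid10_survives(pairs, {"d0", "d2", "d5"}))  # True: three failures, all in different pairs
print(raid10_survives(pairs, {"d0", "d1"}))        # False: both halves of one mirror are gone
print(raid5_survives(all_disks, {"d0", "d2"}))     # False: RAID 5 cannot lose two disks
```

So RAID 10 can ride out more failures than RAID 5, but only if you're lucky about which disks go.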
A One that isn't cold, is scarcely a One at all.
First, if you can help it, don't use FUSE. It'd work for the media center, for now, but I don't think a filesystem is the right place to say "We'll do it quick and dirty, it doesn't have to perform well." I am doing a project in FUSE, but said project is designed to go over the Internet -- similar to GmailFS.
The only advantage of FUSE, in this case, is that it would be portable -- as in, Linux and OS X, and maybe Solaris, BSD, and Windows eventually.
Second, don't use PAR2. Something similar, yes, but PAR2 itself is obscenely slow, last I checked. As in, may actually be too slow even with your assumptions about performance (FUSE would work, PAR2 would not). I realize you probably only meant it as an analogy, just thought I'd warn you.
Third, you will want to restripe things to some extent. It could be done atomically (no risk of failure), but suppose you start with one disk, say, and then add another. You're going to want all your files mirrored onto the second disk, so that either disk can fail. If you then add a third, you'll want to convert to RAID5 -- stripe across the first two disks, parity on the third -- so that any disk can fail, but you still have as much space as you can. Add a fourth disk, and you might want to restripe that one file -- but since this is at the FS level, you can use free/temporary space, even copy the whole file at a time if it's small enough.
Or, as you say, don't stripe it across every disk -- but the fewer you stripe it across, the more space you're wasting to parity (or a full copy of the file).
And finally, you might consider not doing it quick-and-dirty. As just a small example, ZFS fragments badly enough on its own, and I imagine this would be worse.
I'd also suggest a background process that does random checks, so that you don't come back to access a file a year after you saved it, only to find it gone.
But I do have to agree with you -- filesystem design is basically dead now. Reiser4 had me excited for awhile, but now the project seems dead, and never really held much promise for more than just a really fast local FS with FUSE-like toys attached to it. I badly want to write a filesystem, but at the same time, C sucks, and it's the best we've got for FS design. Sometimes I just have to stop myself from rewriting everything...
When deciding between RAID and JBOD in a media server, you should also keep in mind the noise level and power demand.
In a system with 5-10 disks used for media storage, you will need to run an entire RAID group, probably 5 disks, when playing or recording on a RAID array, while when using JBOD you only need to run the disk the actual recording is made on. This of course assumes you have a separate filesystem on each disk and manage the distribution of data yourself, not via some LVM scheme.
I have setup my media server to stop disks after about 10 minutes of idle, and this makes quite a big difference in power consumption and noise level (the thing is in the living room).
It is also easier to add or upgrade disks. Of course there is the risk of data loss when a disk breaks down, but I don't value my collection that high that the loss of a part of that would be a disaster.
(you could always keep the really valuable items on more than one disk, or you could make backups. trouble is that a usable backup medium for terabyte capacities does not exist in the consumer market)
Let's say you start with 4x 500 GB, for a total effective space of 1.5 TB in RAID5, at some point this array fills up and you want to upgrade. So you buy 2x 1 TB drives. Create two 500 GB partitions on each of these 1 TB drives. Now copy the data from the existing RAID5 to the new drives. Drop your old RAID5 array, and now copy some of the data back to the old 500 GB drives until you have two empty 500 GB drives, and one empty 500 GB partition on each of the 1 TB drives. Create a new four disk array using two 500 GB partitions from the 1 TB drives and two of your old 500 GB drives -- copy your data to the new array. Then create another RAID5 array with the remaining four partitions. You now have two RAID5 arrays of 1.5 TB each and a total of 6 drives.
At some point of course, these two arrays will become full, and you need to upgrade again. This time buy 2x 1.5 TB drives (hopefully they'll exist by then). Create a 1 TB partition and a 500 GB partition on each of the 1.5 TB drives. Now you'll need to move data around again (if needed you may need to temporarily run a degraded RAID5 array with only 3 disks), but the end result should be that you get two RAID 5 arrays again, one consisting of 4x 1 TB (for 3 TB of space), and another which consists of 4x 500 GB. Remove two of the unused 500 GB drives from your server so you have a maximum of 6 drives again.
You can repeat this as often as you want, as long as you can find new drives that are big enough (for each upgrade step you'll need a drive that's as large as your largest and smallest drive combined), so the next step would be adding 2x 2.5 TB drives, copying your data, and removing the two smallest drives.
This method of upgrading only really works with an even amount of disks in a RAID5 array, I personally use arrays of 6 drives (for a total of 9 drives in my server, which have two overlapping RAID5 arrays of 6 partitions each).
Forget RAID, RAID doesn't work.
Remember the Usenix paper?
http://www.usenix.org/events/fast07/tech/schroeder.html
Just make sure your application, or you, know how to store data in multiple places. You'll get both a cheaper solution (fewer disks required, no fiber required) and a more robust one: straight, no-nonsense access to the data you need.
Kris Buytaert
As some people have said, Hardware raid 5 will solve some of your problems. But at the low end of the market (and even the middle), hardware raid 5 can hurt (for eg, i've got 8 disks on sata using my gigabyte mb's raid controller, will it still work if my mb dies and i buy a new mb? and thats hard to answer).
Let's say, though, you take your 3x500's and raid 5 them up (make sure you create at least 1 partition on each of them): you'll end up with 1tb. SWEET!
But you replace them later with 1tb drives, 1 at a time; you can just partition each 1tb drive up and get the mdadm tool to rebuild your array on the fly (this takes time). You then partition the rest of the space on your 1tb drives for raid (sda2, sdb2, sdc2) and wham, you have another 500gb per drive... but it's in 2 partitions, which is an annoyance from a disk-management point of view.
Enter lvm. Essentially you do the same thing, but you add /dev/md0 into the lvm (logical volume manager) vg (volume group) initially as a 1tb volume, from which you'll probably create a 1tb logical volume... Later on, after replacing the 500's with 1tb drives, you create a second raid and have /dev/md1, which you can add into the existing volume group and use to extend the original 1tb logical volume to 2tb. Assuming you're using ext3, you can then extend the filesystem online (while mounted).
That's a brief run down of how you'd do it, but the kind of commands you want to look at:
mdadm (for the raid5 encapsulation)
lvm (for the volume management)
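Here's roughly what the two stages described above could look like, written as a Python helper that only prints the commands so you can read them over before typing anything as root. The device names follow the sda1/sda2 convention used above; the volume group name media_vg and logical volume name media are made up, and you should check every command against the man pages for your distro before trusting it.

```python
def initial_setup(parts, md="/dev/md0", vg="media_vg", lv="media"):
    """Commands for stage 1: RAID 5 over the first partitions, LVM on top."""
    return [
        f"mdadm --create {md} --level=5 --raid-devices={len(parts)} " + " ".join(parts),
        f"pvcreate {md}",
        f"vgcreate {vg} {md}",
        f"lvcreate -l 100%FREE -n {lv} {vg}",
        f"mkfs.ext3 /dev/{vg}/{lv}",
    ]


def grow_with_second_array(parts, md="/dev/md1", vg="media_vg", lv="media"):
    """Commands for stage 2: a second RAID 5 over the leftover space, added to the same VG."""
    return [
        f"mdadm --create {md} --level=5 --raid-devices={len(parts)} " + " ".join(parts),
        f"pvcreate {md}",
        f"vgextend {vg} {md}",
        f"lvextend -l +100%FREE /dev/{vg}/{lv}",
        f"resize2fs /dev/{vg}/{lv}",  # grows the ext3 filesystem to fill the volume
    ]


for line in initial_setup(["/dev/sda1", "/dev/sdb1", "/dev/sdc1"]):
    print(line)
print("# ...later, once all three drives have been swapped for bigger ones:")
for line in grow_with_second_array(["/dev/sda2", "/dev/sdb2", "/dev/sdc2"]):
    print(line)
```

Trying this inside a throwaway VM with small virtual disks first is the safe way to see whether it behaves the way you expect.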
As a suggestion, get (free) vmware server, install it somewhere, install fedora/ubuntu/whatever into it. Add 3 2gb disks into the system as virtual disks, create the array, add in lvm. Then remove 1 disk, replace it with a 4gb disk, etc etc. Its a cheap and easy way of learning how it works!
Here's the deal:
Raid 5:
- Minimal 3 disks
- 1 disk equivalent for parity, so 2/3 capacity for data in case of three disks
- Better do it in Hardware. OS striping will cost a performance hit, if that's an important consideration.
- There are two implementations... Block-striping meaning parity is also distributed per block over the whole array and single-disk parity.
- Read speeds will generally be good, but write speeds can suffer due to extra parity information writes.
This makes RAID 5 good for file-sharing applications where the performance is not absolutely the biggest consideration. Please be aware that the larger you make your raid set, the larger the chance of losing data. If you have 10 disks in one raid five set, redundancy is one spindle. The chance that two disks fail and wipe out your data is there. Also note that in any raid set, the smallest disk will indeed determine the capacity. 2x500GB + 1x1TB will give you 1TB in raw space because the other half of the third disk cannot contain redundant information. Any RAID controller that doesn't stick to that rule will not offer you redundancy.
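The smallest-disk rule is simple enough to write down. This sketch gives the usable space of a RAID 5 set and the space that sits idle on any oversized members; sizes are in GB and the function name is just for illustration.

```python
def raid5_capacity_gb(disk_sizes_gb):
    """Return (usable, unused) space in GB for a RAID 5 set of the given disks."""
    if len(disk_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    smallest = min(disk_sizes_gb)
    usable = (len(disk_sizes_gb) - 1) * smallest  # one disk's worth goes to parity
    unused = sum(disk_sizes_gb) - len(disk_sizes_gb) * smallest
    return usable, unused


print(raid5_capacity_gb([500, 500, 500]))     # (1000, 0)
print(raid5_capacity_gb([500, 500, 1000]))    # (1000, 500)
# Swapping 500 GB drives for 1 TB drives one at a time:
print(raid5_capacity_gb([1000, 1000, 500]))   # (1000, 1000)
print(raid5_capacity_gb([1000, 1000, 1000]))  # (2000, 0)
```

Note that three 500 GB drives give 1 TB usable, not 1.5 TB, and that the last line is only a ceiling: whether the array actually grows to 2 TB after the final swap depends on whether your controller or software RAID can expand an existing array.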
RAID 1:
- One disk for data, one disk for the mirror.
- Write speeds are somewhat impacted but less than RAID 5.
- Read speeds tend (for some reason) to still equal RAID 5 or better them.
- High cost, high redundancy.
RAID 0:
- No redundancy
- Excellent read/write performance due to striping across multiple spindles without any added operations.
- Low cost
RAID 0 is higher risk than just a bunch of IDE disks on some buses because you lump it together on one volume, and if one disk fails the entire volume is gone.
Now in all of the above, it is *important* to realize you're only safe-guarding yourself against a *degree* of physical damage. Disks can fail. Failures to the controller that cause logical errors, software failures, attacks, viruses and so on can still cause data loss. As soon as a piece of malware fubars a file and commits it to the RAID, of course it's there in corrupted form.
Having said that, RAID 5, 1, 10, ADG and even sync/async off-site replication through IP, FC or iSCSI (SCSI encapsulated in IP, bad idea) won't safe-guard you from corruption caused by software or human error. Therefore, RAID is not and will never be a substitute for a good old-fashioned backup to tape. I say tape, because a 500 GB volume of data cannot economically be backed up manually to optical disks at present. You would need something like an LTO-3 tape drive to do that on a single tape, alternatively a smaller autoloader.
I'd recommend the following, assuming you're not running a hi-performance database with many random read/writes in 4KB block sizes, but rather a file-sharing or web-site environment:
- System disk: RAID 1 for easy recovery.
- Data disk: smallish RAID 5 set (max 5)
- Backup to tape.
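The reason for keeping the RAID 5 set smallish can be shown with a bit of probability. This sketch assumes every disk fails independently with the same probability over some period, which is a simplification (real failures tend to cluster, and it ignores the rebuild window), and the 3% figure is purely hypothetical.

```python
def p_two_or_more_failures(n_disks, p_disk):
    """Chance that at least two of n disks fail, i.e. that a RAID 5 set loses data."""
    p_none = (1 - p_disk) ** n_disks
    p_exactly_one = n_disks * p_disk * (1 - p_disk) ** (n_disks - 1)
    return 1 - p_none - p_exactly_one


P_DISK = 0.03  # hypothetical per-disk failure probability over the period

for n in (3, 5, 10):
    print(f"{n:2d} disks: {p_two_or_more_failures(n, P_DISK):.4f}")
```

Whatever number you plug in for a single disk, the risk of a double failure climbs quickly as the set gets wider, while the redundancy stays at one spindle.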
Also note that SCSI disks are designed for an 80% duty cycle, typically, while SATA/IDE disks are designed for a 25% duty cycle. This means that if you're constantly fetching data from and writing data to your disks, 24/7/52, you would see your SATA/IDE disks die like flies around you. If your usage model is very strenuous, use SCSI.
Goddammit Kyle! Please, please ignore this post.
I once worked for the HP-UX group of the Swedish Airspace Authority, and they had HP-UX Service Guard clusters set up with shared storage consisting of manually configured Extent-striped LVM sets on JBOD SCSI disks. Whenever one disk would fail, the whole team would grind to a halt because the only one who could judge the impact on the systems was the guy who built the LVM groups in the first place.
What I'm trying to say here is the following: Sure, you can employ a load of tricks to use all your space on those drives. Hell, if a third, 1.5 TB set of disks comes along, you can rinse and repeat with those. You'll end up with a logical disk structure that equals the gordian knot in complexity and which will become totally unmanageable. This in turn will mean that somewhere, sometime when something happens, you WILL fuck up trying to rebuild things in case of a disk crash.
OS based volume management should never, ever become more complex than RAID 1 with the simplest of mirror sets. Please. For performance' sake. For administration's sake. For your sake.
I've been thinking about doing something like this myself, buying 2/3 drives and just backing up to them from my three systems (desktop, laptop, server). Seeing as they're all windows boxen I've been looking at XXCOPY but haven't actually had a go yet.
My idea is to reverse mirror each drive. so for a simple example, a monthly backup, you'd have folders thus (the actual structure here is less important than the idea):
z:\server\2007-01-31\c
z:\server\2007-01-31\d
z:\server\2007-02-28\c
z:\server\2007-02-28\d
something like that. Where the most recent folder holds the full filesystems at that time, but the previous one holds only the files that are different. Whilst this wont tell you which files were NOT present in previous backups it does mean that a corrupt file will be found in an old backup (being different by virtue of size, hopefully, though a hash check would be handy).
To restore a system from this you'd only be able to reliably use the most recent (so creating "good" mirrors manually before making drastic changes will become a good habit to have) without some jiggery pokery. but to retrieve an old version of a file that you accidentally overwrote thinking you were editing a copy you could just search for it. Big movie files that never changed would only ever be on the backup drive once, either in the latest folder or if you'd deleted it, the backup before the delete.
like i said restoring a system to an arbitrary point in the past may be painful with this setup[*] but for keeping backups of random files and their differing versions around forever it might do the trick.
[*] though thinking about it you could also store a (large) csv file that listed every file in the system tree at the time of the backup along with its size and hash. To restore you'd have to pull up that list and find the relevant version of each one on the drive to rebuild the tree. fiddly and long-winded, but possible whilst being quite space efficient if you're on a tight $-per-Gigabyte budget.
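A rough sketch of the manifest idea: record size and hash for every file in a tree, then compare two manifests to see which files from the older run have to be kept in the older backup folder. This isn't XXCOPY, just plain Python; the function names are made up and SHA-256 is my choice of hash.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its (size, sha256 hex digest)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in 1 MiB chunks so big movie files don't fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = (os.path.getsize(full), digest.hexdigest())
    return manifest


def changed_files(old, new):
    """Paths from the old manifest that were modified or deleted since.

    These are the only files the older backup folder needs to hold;
    everything else is identical to what sits in the newest folder.
    """
    return sorted(path for path, meta in old.items() if new.get(path) != meta)


january = {"a.avi": (700, "aa11"), "notes.txt": (12, "bb22"), "old.mp3": (5, "cc33")}
february = {"a.avi": (700, "aa11"), "notes.txt": (15, "dd44")}
print(changed_files(january, february))  # ['notes.txt', 'old.mp3']
```

Writing each manifest out with the csv module next to the backup folder would give you the listing needed to rebuild the tree as it stood at any given date.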
If you don't risk failure you don't risk success.
Oh, additionally?
.txt out of it, & more (list goes on, hugely)).
I won in the Access/Seek too, 8.8ms in the test below, not just in 0% CPU usage!
AND, by BIG margins (due to 10k rpm "Raptor X's" in RAID 0) on that test imo, more than the controller really (unless it was bursting data out of the 128mb of ECC Cache RAM this controller uses)!
(CPU usage category where my system was showing 0% (vs. others ranging from 3% - 11%)... many orders of magnitude of gain in fact, especially if viewed in terms of percentages!)
Once more, the URL of HD Tach 3.x test again for your reference:
http://forums.techpowerup.com/showthread.php?s=51874ee73e9a212bfbabbaba41cf36e3&t=26630&highlight=Tach
Enjoy!
"Great. Did it cost you more than an upgrade to a dual-core CPU? Or a whole separate processor? Or the difference between a single-core and dual-core, or between 32-bit and 64-bit?" - by SanityInAnarchy (655584) on Tuesday June 05, @01:35AM (#19392311)
About the same iirc... about $250, iirc, direct from Promise.
(Again, due to the nature of work I do in Coding & DB work, heavy diskbound activity usually (many files generating during compile/recompile cycles) plus serving up some files as well here too? I like & need FAST access & seeks mostly. Bursting is not something I am worried too much about, I rarely 'burst' read huge amounts of data)
"Or what about a whole separate computer? Just have a dedicated fileserver with enough CPU to handle the RAID, and connect to it over gigabit?" - by SanityInAnarchy (655584) on Tuesday June 05, @01:35AM (#19392311)
I have a 2nd machine here that acts as a SQLServer (2005), to do emulations of work related stuff with sample datasets on it. P4 3.2 ghz 1gb RAM, WD 36gb 10k rpm 8mb buffer generation #1 Raptor... all I need on it really, running Windows Server 2003 SP #2 & SQLServer 2005.
"I would say that, if you now have a lot of spare CPU cycles, you've wasted your money. I could be entirely wrong -- maybe you have done the benchmarks, and maybe you do have the kind of insane load it would take, but most of the time, hardware RAID is a waste." - by SanityInAnarchy (655584) on Tuesday June 05, @01:35AM (#19392311)
I don't know if you write code or not, or manipulate files (I do, quite a lot, much of it being string related & many @ once)? String parsing & such takes up LARGE amounts of CPU power. Coding does quite a bit of that, but more often here, it is in editing files (stripping HTML chars out, getting only RAW
I generally "tear up" cpu pretty badly in the nature of my work - not "TONS" of spare cycles left quite often on CORE #1 (AMD CPU Athlon X2 4800+), but the process scheduler takes threads from other process' (child & parent ones) & sends them to CORE #2 here, as needed.
Having this controller, with a hardware based Intel I/O subprocessor on it, saving SYSTEM CPU cycles as it does, WAY over "normal" setups (most of them, per the url test I directed you to), helps here, especially due to the nature of my work (coding & lots of string processing in files).
Gotta love Windows NT-based OS' for this (had thread model, TRUE smp enterprise ready threading @ kernel level/Ring 0/RPL 0, far before Linux did (the Linux initial "usermode threads" don't really count, as they resolved out to a single thread in kernel mode round robining to it... this is WHY Linux has 'true kernel mode threads' now, to be SMP/enterprise class OS ready in fact!)
Anyhow... you asked for benchmarks, you got 'em!
I have done them, see the URL above in THIS post, & my other reply (between the two of them, you have actual documented data & tests against many others, including "PRT" using disks (they ROCK on reads, as you will see)).
APK
Hi,
I can't believe my eyes - what a flame question!!!!!!!!!!!!! (I don't know about processor, I don't know if I choose WinXP or Linux, I do not understand how it works, I do not know anything about anything please help....)
If you continue to publish such flame and lame questions, I will seriously consider no longer reading your web page.
You are becoming too tabloid.
Think about it...
J.A.
I'm doing exactly the same thing. I will initially have 3 500GB Western Digital RE2 SATA drives in RAID 1 & RAID 5 configuration (RAID1 for /boot, RAID5 for / and /swap). I will expand this to 4 and 5 drives as and when I need the space. I will be using Ubuntu Linux on the machine (MythTV backend server) and you can indeed add more drives to a RAID5 array as well as expand the file systems contained by it (including ext3?). I'm no expert and I still have to learn how to do all this of course!
What I'm aiming for in my machine is storage space that won't be immediately unrecoverable if a drive failure occurs. I will still backup my critical data (photos, music, email, OS install) to DVD but I can live with losing recorded TV, DVDs, etc.
Here's some more questions.
1) /swap space. At the expense of speed I'm proposing to have /swap on an LVM partition over RAID5. This is so that the machine will not crash in the event of a drive failure. I know I can stripe /swap directly across multiple disks (no RAID, no LVM, just flat partitions) but I read that if one of those disks fails the OS will probably crash? I mean, it makes sense that it will, doesn't it?
2) I'm probably going to go with ext3 for reliability. But XFS has the cool stripe alignment thing, will this really help performance that much? Or another way of putting it, how much will ext3 NOT having stripe alignment hurt performance?
3) Does XFS' stripe alignment even work at all with LVM?
4) Would I be insane to use XFS without some sort of UPS? I hear that it really doesn't like to have its plug pulled.
Thanks,
J1M.
Vin Diesel? Bah. Jennifer Aniston is the new Vin Diesel.
(and I've replaced you by a small shell script)
I don't like Linux's software RAID implementation, as it immediately drops member devices out of their array in the event of a read error. I gather at least one implementation of software RAID in at least one flavour of BSD will immediately try to refresh the blocks that generated the read error using data from the other array members. This is the correct approach, IMNSHO, since doing it the Linux way means that if another array member encounters a read error whilst re-building the array, you lose the entire array.
If you're going to use JBOD, you probably might as well use RAID 0 and get better performance. The reliability will be lower than a single drive though (1/n, in fact).
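As a rough illustration of the "1/n" remark, here is a sketch that assumes each drive fails independently with the same probability over some period. The 5% figure is a made-up input, not a measured failure rate.

```python
def stripe_loss_probability(p_drive, n_drives):
    """Chance that a striped set loses data when any single failure is fatal.

    Assumes independent drive failures, which real drives only approximate.
    """
    return 1 - (1 - p_drive) ** n_drives


if __name__ == "__main__":
    p = 0.05  # hypothetical per-drive failure probability for the period
    for n in range(1, 5):
        print(n, "drive(s):", round(stripe_loss_probability(p, n), 4))
```

For small p the result is close to n times p, which is where the "n times less reliable" rule of thumb comes from.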
RAID10 (stripes over mirrors) is the Rolls-Royce solution. It can tolerate up to (n/2) drives failing (as long as no two of them are in the same mirror pair), and degraded performance is better than RAID 0+1, and array rebuilds are quicker than RAID 0+1 too. It doesn't come cheap, though, since you need at least 4 discs and you only get n/2 discs' worth of capacity.
It really depends on how important it is to maintain your media data; I have backups of the MP3s I've ripped (in addition to the original CDs) and my TV recordings are disposable, so I use a single 300GB disc in my MythTV box. I'm thinking of upgrading this to 2 * 500GB and putting TV recordings on one, and everything else on the other.
I used a RAID-5 3x250GB at one point with Linux kernel RAID [e.g. no hardware]. It worked fine. Even after one of the drives died. I replaced the array with a RAID-1 2x320GB (since I wasn't filling the other raid anyways) and it too uses software.
I think the smartest thing is to invest in a good set of drives before you make your first raid. No sense trying to raid up a bunch of random small drives, then upgrade. It's just a pain in the ass and you run into the potential problem of data loss. Suck it up, buy a set of big drives now and you'll last longer with it, even through the cycle of drives dying.
A 3x500GB array should be enough for most casual uses, it's not too expensive and gets you 1 drive of redundancy.
Someday, I'll have a real sig.
Have a look here: http://www.gagme.com/greg/linux/raid-lvm.php
Raid5 + LVM allowed me to use different sized disks and create a fully redundant array of the whole thing and have it appear as a single volume.
If you use a filesystem that supports resizing (like ReiserFS for example), you can increase your array without even taking it offline!
I used whatever disks I had laying around to build my NAS but the data is still protected. The software I use is developed by Lime-Technology http://www.lime-technology.com/. It's NOT RAID and instead is a JBOD setup with the first drive being a PARITY drive. This means that if one of my drives fails I still have access to the data. If TWO drives fail I lose TWO drives worth of data - *not* the whole damned thing. The data is not striped and is stored in a ReiserFS F/S so I can pull a drive and mount it elsewhere if I desire. This also means that if a drive isn't being actively used it can be spun down - try that with a striped RAID :-) When you write, only the parity disk and the disk being written to need to be spinning, love that. The system can hold more than 12 disks if you use their top of the line software - mine only holds 12 total for a bit over 4.5TB worth of storage. Boots a customized Linux off of a memory stick and yes, source for mods is distributed but not the source for the WEB management stuff - he appears to be GPL compliant.
:-) Check out the user forums on the site, the developer is pretty responsive...
Some limitations: Parity drive must be as big or bigger than all others. Each drive is a separate mount point unless you use a funky sort of shared folder feature. The system doesn't have as high a transfer speed as a RAID would, however it streams video for me to an XBMC XBOX1 just fine. It doesn't have a super robust system to notify you of failed drives out of the box although some users have added this functionality. Not a whole lot of security although I've met someone who has added this on and the developer is also working on expanding this in the future. Pretty decent support overall IMO and he's just moved to the 2.6 kernel - I've yet to upgrade though.
All in all this system seems to be perfect for HTPCs and I also use it to store backup images of all my workstations. All of my music and DVDs are stored on it and I'm about to build a second one as I need still more storage and have "spare" drives that I've pulled from the existing one as I've upgraded that I'd like to put to good use
Build it, Drive it, Improve it! Hybridz.org
I should've searched for unRAId in the discussion before posting my own endorsement - I'd be able to mod you up then (doh!).
:-) Not having striped data also rocks, lets me spin down everything not in use. ReiserFS is easier to recover data from than damned weirdo striped data sets too - there's a reason that recovery firms specifically mention if they can recover from RAID crashes heh. All in all it's a good system, several of my friends now run it and I'm building a second system for my used disks that get upgraded out of the primary system.
I 100% agree with you on unRAID - it rox! I'm not using any sort of background Linux stuff to do my backups but I do use an unattended backup package that works just fine with unRAID (Acronis). The speed could be better (I'm not yet on the 2.6 based version) but it keeps up stutter free with my XBMC box so it's fast enough for me.
Build it, Drive it, Improve it! Hybridz.org
I've recently gone through this process with my own media server. I used two RAID cards and plenty of discs. Separate the media center / media server components! Hard discs are loud and you'll not want them under the TV. Doing it this way means you can select a case as big as is required, and use old components for everything in the server (except the disc subsystem).
My first RAID card is a 5-port affair, which originally had 3x400GB discs, and now upgraded to 5. My second card is an 8-port with at present 3x500GB discs. When it needs the next upgrade, I'll just buy extra 500GB discs - just make sure the cards you buy support online capacity expansion, RAID migration and configuration on disc. Good RAID cards can be expensive but it's well worth it for this sort of thing.
A future upgrade past 4.6TB would involve replacing the 400GB discs on the older RAID card. I plan to just temporarily add new non-RAID discs to the system and transfer the data. Then I can disconnect the old array drives, plug the new ones into the RAID card and use the online RAID migration to bring that data into a new bigger RAID array. Another reason to get decent RAID cards.
I've got all this in a Silverstone Temjin TJ-05 case which supports upto 14 hard discs plus an optical drive. That leaves a space for a non-RAID system disc which is an ancient 60GB drive. The rest of the system is an old Celeron 600 system that was fairly low end when I got it around 2001. It's more than fast enough just running as a file server.
Get a hardware raid card with features to do the things you want... for example:
I have an LSI raid card (6 SATA connections):
If I have 3 400GB drives, I have 800GB of raid5 storage (2x400GB usable)
If I add a 1TB drive, I have 3x400GB of usable space, so 1.2TB
If I add two more 1TB drives, I have 5x400GB of usable space, plus with 3 or more 1TB drives with 600GB free on them, I can create another raid5 array with 2x600GB usable space.
With 3 400GB and 3 1TB drives, I have 5x400GB usable space, and 2x600GB usable space.
Then if I swap out one of the 400GB for a 1TB drive, I have 5x400GB usable and 3x600GB usable.
If I end up with all 6 1TB drives, I would have 5x400GB usable and 5x600GB usable. All Raid5
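The arithmetic in that list can be checked with a small sketch. It assumes the controller carves each drive into slices at every distinct drive size and builds one RAID 5 layer per slice with at least three members. That is only a model of what the post describes, and `layered_raid5` is a hypothetical helper, not anything from the LSI tooling.

```python
def layered_raid5(drive_sizes_gb, min_members=3):
    """Return (members, slice_gb, usable_gb) for each stacked RAID 5 layer."""
    layers = []
    floor = 0  # capacity already consumed on every drive by lower layers
    for size in sorted(set(drive_sizes_gb)):
        members = sum(1 for s in drive_sizes_gb if s >= size)
        if members < min_members:
            break  # too few big drives left to form another RAID 5 layer
        slice_gb = size - floor
        # One slice per layer goes to parity, so usable = (members - 1) slices.
        layers.append((members, slice_gb, (members - 1) * slice_gb))
        floor = size
    return layers


if __name__ == "__main__":
    for drives in ([400] * 3, [400] * 3 + [1000], [400] * 3 + [1000] * 3,
                   [400] * 2 + [1000] * 4, [1000] * 6):
        print(drives, "->", layered_raid5(drives))
```

With six identical 1TB drives the model reports a single 5x1000GB layer, which is the same total as the 5x400GB plus 5x600GB pair of arrays in the post.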
Jag Player of Games
http://www.drobo.com/ Looks interesting. /DM
example on gentoo > http://gentoo-wiki.com/Resize_LVM2_on_RAID5
http://www.void.gr/kargig/blog/2004/09/24/raid5-lvm2-recovery-resize-howto/ :)
There is a strong desire to have that massive collection to show off to your friends, but much like moving the wall of CD's & DVD's before it, collecting, sorting, and backing up files became quite the pain in the ass and a RAID array will only exacerbate the problem. Other than RAID 1, setting up a RAID or a JBOD puts you in a situation where you'll end up with a single drive that is larger than any one drive you may purchase later, unless you have an array made up of a bunch of 120gb drives. Personally I don't wait that long between upgrades. There is also the joy of losing your array when moving it from one motherboard to the next. With a single drive set-up you just take out the old one, and put it into an external kit, and pop in the new and transfer your files. You have an instant back-up till you sell it, just make sure to purge the embarrassing stuff first!
My policy is that some sort of hard drive gets purchased every six months to a year and pushes the smallest one off onto Ebay. This helps me avoid crashes. If you insist on keeping 5+ year old harddrives around you are just asking for trouble, besides you want your collection to grow and for that you need new, bigger drives. The only use I have for a RAID is currently in my main computer as a RAID 0 just so things hum along with the quickness. Anything important gets backed-up on a second drive in the computer as well as a third copy on the media center and a fourth on an external drive that is in a closet and only hooked up to conduct a back-up. For must-never-lose things I have back-ups on several family members' and friends' computers located elsewhere which I update on my semi-annual laps to see everyone and do the same for them in turn. Just in case my house ever burns to the ground.
Unless you have a house-hold full of roommates that insist on streaming everything to their own computer, RAIDs for performance reasons is overkill. A single drive generally has enough bandwidth to handle a stream/filetransfer or two streams without a noticeable impact on someone trying to watch or listen from the same drive. A LAN party blows this out of the water, but then again those will cause a meltdown even to most RAID set-ups, so set some connection and bandwidth limits and you'll be fine. I sort out my media accordingly to take advantage of most people's habits. If someone is listening to music, they're probably not watching a movie or TV and vice versa.
So enough rambling, here is my current set-up
Media Center
40" Samsung 720p LCD TV hooked up with a DVI-HDMI cable
Nvidia 7800GT 256mb
AMD X2 3800
Windows Vista HP (yeah I jumped the gun a bit, it works fine, but drivers are still lagging and I am missing some of the soundcard functions.)
Winamp for music, there are many out there, I like this one. Ripit4me/Fixit/DVD decrypter/DVD shrink for ripping DVD's
1 gb RAM
Creative X-fi plat sound card
200 gb MAIN drive partitioned into 40gb for Windows and programs and the rest as file storage and temp video storage.
300 gb MUSIC drive, it has 70gb of music on it, with the rest devoted to back-ups for my main computer. Drives that are more than one year old, but less than three, are your most trustworthy drives since they've survived the initial culling period and aren't too old. Of course you never 100% trust them.
320GB TV drive, for you guessed it TV shows. Mostly
What is it about 3 disks that makes it RAID3? Or, in other words, what's stopping it from doing what I understand to be RAID5:
DDP
DPD
PDD
DDP
DPD
PDD
(D = data, P = parity)
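The layout drawn above, with the parity block moving to a different disk on each stripe, is the rotating-parity scheme normally called RAID 5, whereas RAID 3 keeps parity on one dedicated disk. Here is a small sketch that prints the same pattern and shows how XOR parity rebuilds a lost block. The rotation direction is chosen to match the diagram and the function names are illustrative only.

```python
from functools import reduce


def raid5_layout(n_disks, n_stripes):
    """Return one string per stripe, 'P' marking the disk that holds parity."""
    rows = []
    for stripe in range(n_stripes):
        parity_disk = (n_disks - 1) - (stripe % n_disks)  # rotate right to left
        rows.append("".join("P" if d == parity_disk else "D"
                            for d in range(n_disks)))
    return rows


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    print("\n".join(raid5_layout(3, 6)))
    d0, d1 = b"media", b"files"
    parity = xor_blocks([d0, d1])
    # Pretend the disk holding d1 died: rebuild it from the survivor and parity.
    print(xor_blocks([d0, parity]) == d1)
```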
Don't thank God, thank a doctor!
Ravengbc,
;)
I had similar questions when I was building my file server for use as my media repository and basically everything else (including workstation backups). I use this at home, on my personal network... and it works beautifully.
You'll need Gigabit ethernet to make this work... 100Mb is just too slow. And you'll need a GOOD switch, not some cheap off-brand.
For my media PC, I use BeyondTV. It's basically the same thing as MS's media center, but I like BTV... (Besides, I bought into it long ago!)
My file server uses a Promise TX4310 (4 port hardware RAID) with a 4 drive enclosure that I purchased at Fry's Electronics. With this, I use 4x500GB drives to give me a full 1.5TB worth of space. The enclosure isn't specifically for this card, but seems to work very well... The lights tell me when something is wrong or when there's drive activity, and seems to be hotswap compatible, but I haven't tested the theory.
Toss that in with gigabit ethernet and allow your media PC to run locally with a 200GB drive (7200RPM if you're cheap, or preferably, get a 10k Raptor). Allow your MPC to record locally, but set it up so that when it compresses the shows, it compresses to your file server... If you turn compression off and just use COPY in place of it, it works beautifully.
Set it up so that it does this during off-peak recording hours.
I know the detail is sparse on my post, but I'm writing quickly some basic information about my own setup.
(By the way, using 3 500GB drives won't give you 1.5TB on your array, it'll only give you 1TB. You lose one drive's worth of capacity in a RAID-5 array... Yeah, it sux, but it's necessary)...
The nice thing about this setup is that if you split your systems this way, you now have a dedicated file server on gigabit ethernet that you can store anything on... not just your movies, tv shows and music. You can now use it as a backup system for your media PC as well as your workstations, programs, etc....
"When the people fear their government, there is tyranny; when the government fears the people, there is liberty." -Thom
Has Slashdot become a self-help group for the tech illiterate now? "For example, say I have 3x500gb drives in RAID 5 and over time replace all of them with 1TB drives. Instead of reading one big 3tb drive, it will still read 1.5tb. Is this true?" NO. You will not read 1.5 TB, you will have only 1TB. Now do a search and read up! http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_usable_size
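The usable-size rule being pointed to is (number of drives - 1) x smallest drive. Here is a short sketch of the submitter's swap-one-at-a-time scenario under that rule. The function name is made up, and whether the array actually grows to the larger figure after the last swap depends on the controller or software.

```python
def raid5_usable_gb(drive_sizes_gb):
    """Upper bound on RAID 5 usable space: one drive's worth goes to parity."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


if __name__ == "__main__":
    steps = [[500, 500, 500], [1000, 500, 500],
             [1000, 1000, 500], [1000, 1000, 1000]]
    for drives in steps:
        print(drives, "->", raid5_usable_gb(drives), "GB usable at most")
```

The extra space on the bigger drives stays idle until the last small drive is gone, and even then the array and its filesystem have to be grown before the space shows up.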
Yeah, I think the submitter should go with RAID 5. Redundancy matters: You only need to lose your data once to regret your decision. But he raises an interesting point - the array cannot grow incrementally through hardware upgrades.
Surely, somebody could invent a system/algorithm with the properties of RAID 5 (or even 6), but with the added capabilities of different-sized disks, and total available space which automatically grows as you gradually swap or add bigger disks.
But that system doesn't exist today. So he should go with RAID 5, and make as big an array as he can afford. When the day comes that he runs out space, he could build a second RAID 5 array, or, if he's feeling cheap, add additional drives outside the array.
Go RAID 5, it's the best performance+reliability/buck (IMHO). As for replacing the drives with 1Tb ones and still ending up with only 1.5Tb, why not buy a PCI SATA RAID card and build a second array? That way you'd have 4.5Tb of raw disk (3Tb usable) over two separate arrays.
In the not too distant future, next Sunday A.D.
First Question: How active is the data?
RAID-1 or RAID-5 is great if the data is constantly updated and losing a day's worth of updates would be a hardship. A media collection is typically not highly active data. That is, files are added occasionally but not often updated.
For this application, a single large drive would be fine if you can find a single drive that is large enough. A second drive of the same size would be mounted in an external case and newly added files would be periodically copied to the external drive. The external drive would be kept offline except during syncing and would therefore be immune to power hits, accidental finger-checks and perils of that nature.
When you need to upgrade to larger drives later on, you just buy two of them and copy the files over.
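A minimal sketch of the "copy newly added files to the external drive" step, assuming a one-way copy where a file counts as already backed up if it exists at the destination with the same size. That check is a simplification (it will miss same-size edits), and `copy_new_files` is a hypothetical helper; a tool such as rsync does this job far more thoroughly.

```python
import os
import shutil


def copy_new_files(src_root, dst_root):
    """Copy files missing from dst_root (or differing in size); never delete."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
                continue  # assume unchanged; media files are rarely edited
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied
```

Because nothing is ever deleted on the destination, an accidental deletion on the main drive does not propagate to the offline copy.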
You were 80% angel, 10% demon. The rest was hard to explain. - Over The Rhine
"Math in a song is good."-Linford
Use RAID 666! There can be up to 26 disk failures without any adverse effect!
The Tao of math: The numbers you can count are not the real numbers.
First, forget hardware RAID solutions. While their effectiveness is debatable for commercial and enterprise applications, it's definitely overkill for a home solution (particularly a media server). (Unless of course you have more money than sense.) But Linux RAID (md, multi-disk) is mature, stable, and well-tested. It's portable from one machine to another. It's free. With even modest hardware, it will be plenty fast for a home media server. Don't even bother with those pseudo RAID solutions that are built into your motherboard (or implemented via firmware or a proprietary driver): Linux software RAID and true hardware RAID beat these solutions in just about every conceivable way.
Now, do you really need RAID? Many people equate RAID and backup. They are not equal. RAID is no substitute for a good backup. In the case of a media library, you do own all the media, right? :) There's your backup. Worst case, you lose the time spent ripping the media. So there's an argument to just use JBOD. However, I do use RAID5 for a bit of safety. If two drives fail simultaneously, I fall back on the media. But if only one drive fails, then I can replace the drive, rebuild the array, and lose very little time. It's quasi-backup. It's just too expensive for an individual to maintain multiple live copies of this much data.
If I were to build a fileserver for someone right now, this is what I'd use:
I have another post on this thread where I went into more detail about the choice of case. Quick summary: if you care about noise, don't cram your drives close together, or you'll have to use an obscenely loud high-speed fan to keep them cool. If you allow at least 0.5" between each drive, you can keep your drives cool with a low-speed (quiet) fan. That's why I'm buying the Lian Li case mentioned above: room for up to nine drives, with adequate spacing between each.
But we're talking about binary. That's only in hex!
Ah, my bad--I hadn't realized mdadm had the -G (grow) option. Durr to me, I should rtfm more closely next time ;) Using mdadm -G would be a much simpler option than hacking together a complex LVM setup.
the real at&t mix
If all you want to have is a backup storage place for all your files, you may want to consider going with a good Network Storage device in addition to a RAID array. That way, if your whole computer ever goes down (say, catches on fire...), you still have the files backed up in a different physical location and immediately accessible from another computer on your network or even remotely from the internet.
I've successfully done hot swaps without any issues whatsoever, and only minor hit to throughput. Of course, this was between 96 and 2001, running on RAID systems that cost between $4K and $7K per array. I also played with cheap IDE RAID, which is just about worthless. It's much better to go with software RAID vs any cheap hardware solution and put the extra money into a few extra drives. Then implement RAID 10 (a stripe set consisting of mirrored arrays) for the best performance and fault tolerance and fault recovery, for high availability/reliability sites (not what the OP was interested in)
For the OP's solution, it'd be better to just have JBOD and make a backup copy onto an external set of disks that he then removes. Yes, you still have double the diskspace, but you also have an offline backup. For video/audio files, this is the cheap way of doing it. The proper way would be to have two external drives for each system drive, and rotate backups between them, keeping one set off site (ie - at work) in case of catastrophic issues (fire at home). Odds are, once filled, each drive will remain mostly static and never need backing up again. Change out backup disks about every 2-3 years (rotate backup into prime use rotation) and you should remain drive failure free and be able to upgrade to the then current best buy for disk size.
The cesspool just got a check and balance.
I know there are two or more camps when it comes to raid and all, but there are a few stand alone NAS boxes that WILL allow you to swap out drives (one at a time) for bigger drives and it expands the raid volume size. http://www.smallnetbuilder.com/content/view/29616/75/ is one such one.
I used Tom's hardware's article to create a four disk RAID 5 array on my Windows XP Pro box. It does away with all the hardware limitations that conventional setups have and there's no need to pay for a dedicated controller.
;)
This means that instead of downloading special drivers for the controller I only need to modify three Windows files in the case of a reinstall. I don't have to stick to one brand/model of RAID controller (or to some of the new mainboard chipsets that support RAID 5 as well). Featurewise it does not support RAID level migration. On the other hand it recognizes disks of a RAID setup automatically no matter how they are connected to the computer (e.g. right now I have two connected through the mainboard's SATA and two through a SATA controller, but I could replace the controller/mainboard, replug them and it would still work); this was the feature that made me decide to go for it. The software overhead is not noticeable (3GHz P-IV). Performance is on par with hardware RAID setups. The only thing that's a bit annoying is that when for one reason or another the machine reboots without shutting down (power outage/crash) it will 'rebuild' the array when it boots back into Windows, which takes a few hours. While the system is still usable, copying files from/to the array will take quite a while.
I've already partitioned the array into four partitions. My plan for increasing its size is simply swapping the disks one by one with bigger HDDs and then creating a new partition with the new unused space. I believe that should work with any RAID 1/5/6 array, be it hardware or software. One might even combine the partitions afterwards using Partition Magic, but I'm not a big fan of huge partitions. Hope that helps. Anyone else with Software RAID experience? I'd love to see Linux support for it, that would most certainly mean indefinite support (right now I'm stuck with Windows XP, don't know whether Vista supports it, but I'm not really willing to find out anyway seeing as Vista doesn't increase performance/usability at all.)
And when you gaze long enough into the code, the code will also gaze into you.
I used to run both Mirroring and RAID 5 in the past (not at the same time), but I found it overly complex for simple usage, plus it doesn't allow for what happens if the controller card fails or system goes up in smoke? Plus once you build a RAID you can't just add a drive to it easily or cheaply (I'm over simplifying this I know)
I find the best is to have another computer or possibly external drives sitting somewhere, and just make weekly/daily/monthly/whatever rsync copies between them. This allows for you to recover from user error like accidental deletions, and if the entire system goes down you're covered. Want more space? Add a drive and presto, more space. No special configuration required. No expensive controller cards (or cheap and slow controller cards) required.
And if you're like me, you have another set of drives stored offsite... but I'm pretty paranoid about such things. =P
I can pretty much guarantee that once you've switched to a HW RAID subsystem, you'll probably never use anything else. The extra money spent on those is well worth it (IMHO). To keep costs down, ebay is obviously your friend. I've purchased 4 drive Arena units for less than $400. I
the drobo looks pretty cool. it has data redundancy (though not raid, something "better than raid" according to their site, and therefore proprietary), you can use any size drives, it's fully hot swappable and you really don't have to think about it much. it's very easy to upgrade to bigger disks too. it has some glaring downsides, like it's usb-2 only. i'd like to see NAS for GiGE and a firewire interface. also i've read on their forums about the loud fan, heat issues, etc. so i'm waiting for v2. but it's a cool idea and seems well architected.
in this age of communication i'm just not getting through
Before my current gig as an IT director I used to build computers and servers - especially audio video production workstations and media servers.
As far as RAID is concerned for a home media server dealing with video recording and other such you are going to want speed.
You want SATA (Sata2 300 if poss.) drives of at least 7200 rpm in an array that will provide increased performance; I personally have built several media servers for myself (I used to end up selling them to people after using them for 6 months so I could build a new one, but I have quit that after my last one).
If money is an issue and redundancy isn't a key concern, then go with a RAID 0 array; this will give you about twice the speed of a single drive, but if there is a drive failure you will lose all of the data on the array. 2 drives in a RAID 0 array will give you the total capacity of both drives added together.
If you can afford it, what I really recommend is a RAID 10 (1+0) or 0+1 array (go for the 10 if your controller supports it, it's the better option IMHO). Both RAID 10 (which is really 1+0) and RAID 0+1 require at least 4 drives and will pretty much give you the speed of a RAID 0 array with the redundancy of a mirrored (RAID 1) array. In either of these configurations you'll end up with half of the total capacity of all drives but the performance will be spectacular as will the redundancy.
The difference between RAID 10 and RAID 0+1? RAID 01 is a mirrored configuration of two striped sets; RAID 10 is a stripe across a number of mirrored sets. Both of these can sustain multiple simultaneous drive failures without losing data; however, the 1+0 (RAID 10) is slightly better in this area.
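To put a number on "slightly better", here is a sketch that enumerates every two-drive failure on a four-drive array. It assumes drives 0 and 1 form one group and drives 2 and 3 the other (mirror pairs for RAID 10, stripe sets for RAID 0+1). The helper names are illustrative only.

```python
from itertools import combinations


def survives_raid10(failed, mirror_pairs):
    """RAID 10 dies only if both drives of some mirror pair have failed."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def survives_raid01(failed, stripe_sets):
    """RAID 0+1 needs at least one stripe set with no failed member at all."""
    return any(not (set(members) & failed) for members in stripe_sets)


def count_survivable(n_drives, n_failures, survives, groups):
    """Count the failure combinations the array would live through."""
    return sum(1 for combo in combinations(range(n_drives), n_failures)
               if survives(set(combo), groups))


if __name__ == "__main__":
    groups = [(0, 1), (2, 3)]
    print("RAID 10 :", count_survivable(4, 2, survives_raid10, groups), "of 6")
    print("RAID 0+1:", count_survivable(4, 2, survives_raid01, groups), "of 6")
```

Both layouts survive any single failure; the difference only appears once a second drive goes before the first has been rebuilt.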
Hope you find this information helpful.
Do backups to the USB drives.
Don't use RAID, don't put your redundancy in a box that's going to be plugged into the wall during a thunderstorm.
Don't use RAID that renders data on individual disks pretty much useless.
I use FreeNAS [http://www.freenas.org/]. I have a mirrored 120G RAID setup going, and it has all the bells and whistles to let you sync in many ways; highly recommended if you need a backup server of any type. But if you really want a solution that doesn't hem you in with limits like "the RAID size can't be bigger than the smallest drive in the array", go for ZFS. Check out the demos here to get an idea of what it can do: http://www.opensolaris.org/os/community/zfs/demos/;jsessionid=7E22552C4800B7688DFD8FD771896B4B
Granted it's new, but drop OpenSolaris on a box and get it configured and running...that's an option you can grow with. Also, FreeBSD has experimental support for ZFS, and it should (?) be available on the upcoming 7.0 release. I'm sure someone will provide a web GUI to configure it, heck, that'd be a coup for the FreeNAS team, and then you'd really be all set.
fak3r.com
If you choose a RAID solution, 0,1,2,3,4,5,10, whatever, make sure you can recover from a failure BEFORE you start storing something on the RAID that you care about. Or, make sure you have a camera to take your own picture when you're standing there with a dumb look after whatever your solution is has a real failure.
From experience, once, and I didn't have a camera at the time.
Corollary: practice restoring your backups. You make backups and don't depend upon RAID, right? Because you know what faults a fault-tolerant mechanism like RAID is meant to tolerate, right? Like you know what faults backups are meant to tolerate, right?
Let me tell you about the time I backed up all my dissertation research data stored on RK0 by copying RK1 to RK0 with formatting. I saw what I'd done about a second after my finger raised from the return key. Took me 5 months to recover the data.
I don't think it's fair to suggest that resistance to Solaris is just a matter of prejudice. It's more a matter of not being able to find the tools you need. Linux folk often try Solaris, but quickly give up on it because the administrative tools they're familiar with aren't there, and they don't feel like starting over from scratch.
You might call that laziness, but it's deeper than that. As an example let me cite my own experience with implementing a TWiki for my group. I was given an old Sun V20z to run it on, which already had Solaris 10 installed. I tried very hard to get the TWiki running under Solaris. The TWiki itself wasn't that hard (and there's a lot of helpful Solaris info on twiki.org) but I was utterly defeated when I tried to install all the various TWiki plugins I needed.
The problem is that TWiki plugins are written in Perl, and mostly require that you install additional Perl modules. Now Perl itself runs very nicely on Solaris, but it's pretty obvious that few Perl module developers bother to test their work on Solaris. That seems to include the CPAN module (which provides a shell that most Perl developers use to download and install new modules), so you end up downloading the modules by hand. Fortunately, Perl modules always have neat little install scripts...
Oops! A lot of install scripts don't work on Solaris either. OK, installation is not rocket science, you just have to make sure the module files are in the include path. Easy enough, though the results are disturbingly messy. Oh well, as long as it works. Just need to Make a few more modules...
Oops! Here's a module that uses a library written in C. And the library has to compile on a particular C compiler. Solaris has that compiler, but the Solaris version doesn't have all the features the library needs to compile! That's where I gave up.
So I wiped the Solaris partition (feeling a bit like a murderer) and installed Fedora 6. Now, I'm not happy about the rough edges I saw (unforgivable in a distro that's been under development for 13 years!) but I can't complain about the sheer simplicity of installing Perl modules and TWiki plugins on that platform. You give the CPAN shell a list of Perl modules you need, give it permission to also download and install dependencies, and sit back. Then you download the TWiki plugins and run their installers -- some of which use the CPAN shell to install the Perl modules you forgot. Simple and easy.
So, until ZFS is available "in the box" for Linux, it's just not an option for a lot of Linux people. That's not prejudice, that's practicality.
That is like da bomb! If a disk fails you will have 10 minutes to rebuild the array before another disk fails. It's exciting!
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
First off, if this is a home system that you are building as a hobby, just use a simple mirror. It doesn't matter that much what you do. But if this is to hold a lot of important data that people care a lot about, then seriously look at ZFS. Sun has raised the bar very high. It's the best thing in storage to come along this century. Read more here: http://en.wikipedia.org/wiki/ZFS
For those not entirely sure of what all of the various raid levels mean, this sums it up quite nicely: http://www.epidauros.be/raid.jpg
This is my Sig.
Linux MD RAID5 can be reshaped (grown). Though, doing RAID5 in software is about the worst idea ever. So you are stuck, unless you find a suitable hardware raid that can do reshaping.
* Drobo
* ZFS
* Starfish Distributed Filesystem
* NSLU or one of those fancy small boxes with USB ports
* Openfiler
What do we want.
A small appliance that you plug into your network. You can plug in USB drives (or your preferred interface). The box automatically detects the new drive and expands the flexible RAID arrays (see Drobo), giving you redundancy with a minimal overhead hit. Add a nice OS that lets you create block devices SAN-style, create shares, etc. Use ZFS for all its nice features. Use a distributed FS so that you can have several of those devices in your network for added security.
With drive sizes of today, and failure rates... For my money it's only RAID 6.
r _raid_controllers/
Pros:
You can use mismatched drive sizes as long as the largest drive(s) hold your parity.
You can add drives up to your raid group size.
Using RAID 6 you have diagonal/horizontal parity and can recover from a double disk failure.
Cons:
Without a fancy controller speed can be an issue.
Here is some more info...
http://www.tomshardware.com/2006/01/02/safer_6_fo
http://www.netapp.com/ftp/netapp-raid-dp.pdf
(I just like the chart in the netapp info.)
I'm using a Linux solution myself. I originally went with an adaptec hardware raid controller and ran three drives in raid 5. In the several years of use I never had a drive fail.
Instead I ran into the capacity issues that you too seem to be concerned about. After much thought and consideration I decided to take what I believe to be the least expensive and most manageable solution I could come up with. I tossed the raid card and stuffed in two reasonably priced drives that were each roughly double what my raid 5 held.
However, instead of mirroring the drives I am doing a nightly rsync. Why? What if I accidentally delete a set of photos, you know, the favorite porn? In a mirror or striped array you may have redundancy but it's just that, redundant with no backup. Unless you have the funds to put together a nice array and have a backup that is sufficient to archive the entire contents of the server, you are better off just saving your money and doing the rsync method. If you are uber-paranoid you could even get an older machine and rsync to it so that a catastrophic event (mice eating cables and frying the drives, etc) on one box doesn't affect the other.
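For anyone who wants to script the nightly copy, something along these lines would do as a starting point. It is only a sketch: the paths in the usage note are hypothetical, rsync must be installed, and the scheduling (cron or similar) is left out. The only rsync option used is -a (archive mode); --delete is left off on purpose so that a file removed by mistake from the live drive stays on the backup drive.

```python
import subprocess

def build_rsync_command(source_dir, backup_dir):
    """Build an rsync invocation that copies the contents of source_dir into backup_dir."""
    # The trailing slash tells rsync to copy the directory's contents rather
    # than the directory itself. -a preserves permissions, times and symlinks.
    # --delete is deliberately omitted so accidental deletions are not mirrored.
    return ["rsync", "-a", source_dir.rstrip("/") + "/", backup_dir]

def run_nightly_backup(source_dir, backup_dir):
    """Run the copy; raises subprocess.CalledProcessError if rsync reports a failure."""
    subprocess.run(build_rsync_command(source_dir, backup_dir), check=True)
```

Calling run_nightly_backup("/srv/media", "/mnt/backup/media") from a nightly job would keep the second drive one day behind the first, which is exactly the window you have for noticing a bad delete.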
My bet would be this: budget for either a software or hardware raid method, then go buy two big hard drives that each match the capacity of that raid solution and put the rest of the money "under your mattress." If you think a hardware solution is in line, budget for two cards so that you have a failover card. From the management standpoint remember that you'll need to keep the firmware up to date on both cards (or neither, but I found with my adaptec it wouldn't support 400GB drives without a firmware update). A year or two later when your media center fills up, get the money out from under the mattress, buy two drives that double your current capacity and install them. You'll probably have enough leftover money for pizza, beer, and a hooker and you'll be a lot less stressed with the simple upgrade path and less worry about whether you should have bought that hot spare. Now, to be reasonable, you can't expect that scenario to work if you are trying to have terabytes of storage. Three 500GB drives and a raid card right now are cheaper than two 1TB drives, but TB drives are teh new hotness and come at that premium.
Also be prepared for the idea that with media, you may NEVER have enough storage. Between anime, digital photos, Linux ISOs, and ripping my cd collection to my "media center" (just a samba share) I've first filled 157GB (3x80GB raid 5), and now 300GB (2x300 rsync) of hard disk space. Eventually you will have to decide between keeping the last six seasons of Stargate and all of Battlestar Galactica on your server, or whether you would be better off using the box sets that you legally own and just using the server for recent stuff. I finally had to draw the line and say I'd be better off making backup copies of my DVDs and storing them somewhere else rather than depending on my server as both the backup and the convenient method of watching stuff. As far as fair use goes you're probably better off just having backup copies of media anyway, since you never know when you'll get accused of file sharing and have to prove you are innocent.
For reference and to validate most of my point let's make a real example, since you are obviously here asking and not researching too much on your own. I'll use drives that have come down to reasonable market prices rather than the newer overpriced drives.
Semi-Paranoid config
4x250 GB HDD (3xraid 5 + 1 hot spare) ($50.00 each/pricewatch) $200.00
2 x 3Ware SATA II Hardware RAID 9550SX-4LPKIT ($329 each pricewatch) $660.00
Total Cost minus shipping/taxes for 1/2 TB raid5+spare storage =~ $860.00
Poor-man's "I hope it doesn't die" config
3x250 GB HDD (no hot spare) $150
1 x Sata Raid $329
Total Cost =~ $480
Simple Mirror with backup.
3x 500GB WD WD5000AAKS 500GB 3.5 HDD 7200RPM ($120 Each)(one live, one mirror, one for nightly backups) $360
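To compare the three configs above on equal footing, here is a quick cost-per-usable-GB calculation. The prices are the ones quoted above; the usable capacities are my own reading of each layout (a three-drive RAID 5 of 250GB disks gives 500GB, and the mirror setup exposes one 500GB drive), so treat them as assumptions.

```python
def cost_per_usable_gb(total_cost, usable_gb):
    """Dollars paid for each GB you can actually store data on."""
    return total_cost / usable_gb

# name: (total cost in dollars, usable capacity in GB)
CONFIGS = {
    "semi-paranoid": (4 * 50 + 2 * 329, 500),  # 3-disk RAID 5 + hot spare, two cards
    "poor-man": (3 * 50 + 329, 500),           # 3-disk RAID 5, one card
    "simple mirror": (3 * 120, 500),           # live + mirror + nightly backup drive
}

if __name__ == "__main__":
    for name, (cost, usable) in CONFIGS.items():
        print(f"{name:14s} ${cost:4d}  ${cost_per_usable_gb(cost, usable):.2f}/GB")
```

The exact sums come to $858 and $479, which the totals above round to $860 and $480. The three-drive mirror-plus-backup setup is the cheapest per usable GB of the three.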
-- this space for rent --
I partition every drive with a root, boot, and swap (just in case I ever need to use it as the boot drive in the future) and then the rest of the disk gets chopped into 64G chunks.
These 64G chunks can become physical volumes in LVM, or can be assembled into RAID1 (mirror) or RAID5 (parity) arrays which are then PVs in LVM. If you replace a disk with a larger disk, all you have to do is fail over the 64G chunks. Extra 64G chunks are available for new RAID creations.
I have already performed a couple of hard drive upgrades and RAID migrations with this scheme (although I can't remember if any HDs I use in this scheme have failed in service yet). Most recently I replaced a 160G drive with a 400G drive. pvmove is your friend.
I also tend to create all my RAID1s with 3 partitions, and only fill in 2 of the slots. That way if I decide to relocate one of the mirrors, I can add the new one, let the mirror complete, and then fail the old one.
With 750G drives having reached a good price point recently, the 64G partition size looks silly, so I would create them with 128G chunks if I were starting today.
First of all, my goal was to be able to lose a drive or delete a file and still have a backup somewhere. I also wanted to be able to use my old drives after I upgraded. I also don't yet trust growing a filesystem or RAID array (though I've had great luck with growing XFS). This ruled out any of the RAID levels because I wouldn't be able to recover a deleted file and expansion can be a pain.
Here's what I came up with:
2 computers: Desktop and Server
The desktop has a root drive and a media drive. The root drive can be anything you want (just has to have enough space for your programs, if it's a HTPC, anything should be fine). The media drive is always the largest drive (ie, the newest). The media drive is rsynced to the server nightly.
The server has many drives, one of them the root drive (again, doesn't really matter). The others are put into a union using UnionFS (or AUFS, whichever you can get to compile). This is where my backups go. It's kind of like having a filesystem-level JBOD, but it can withstand the loss of drives with only a partial loss of data.
This gives me the advantage of easy expandability (all I do is copy to a new media drive and add the old drive to the union). I also don't have to waste old drives. It also allows me to lose either the newest drive or ALL of the old drives and still not lose any data. It also means that, even if I lose my media drive and a backup drive I will still have many of my files. This makes recovering from a drive failure trivial (assuming I still have enough space), I can just reboot the server and backup again. It also makes recovering from a loss of the server easier because I can just mount each drive by itself and still access the files.
The only disadvantages of this system are that it could waste space (if you have a drive marked as read-only and you modify a file that was on it, it will be copied to a writable drive, but this shouldn't cause a problem for media files because they should rarely change). Also, you have to monitor the union because if you add a drive when the union is not full, you will waste whatever space is in there now (UnionFS will only write to the top drive in a union). When you get a lot of drives in the union, performance will degrade because it will take longer to find which drive the file is on, but that shouldn't be too much of a problem with media files and it will only occur on the backup drive.
For example:
Desktop -- 60GB root, 250GB media
Server -- 40GB root, 320GB backup
I ran out of space on my 250GB, so I upgraded to a 500GB. All I had to do was copy the data from the 250GB to the 500GB, reformat the 250GB, and put it into the server. When the 320GB drive got full (I saw the errors while backing up) I told the server that it had another drive in the union and ran the backup again.
Desktop -- 60GB root, 500GB media
Server -- 40GB root, 320GB+250GB backup
Soon I plan to add a 750GB or 1TB (depending on price) as media, and will repeat the process.
-palmer
I just built a media server. I used RAID1 and RAID0:
C: RAID1
D: RAID1
M: RAID0
C: is the system drive. D: is where I keep important files that I don't want to lose. M: is where I put my DVR files (BeyondTV). I figure they are recoverable and hence can risk RAID0.
These are on 4 physical drives:
2x 250GB split 50/50 for C: and D:, giving 125GB for each (mirrored)
2x 500GB for M: to give 1TB (striped)
I used the Intel RAID manager, which is surprisingly easy to use.
For TV recording, I have the Hauppauge HVR 1600 - I saw this on sale for $50 at one of the big chains (Circuit City I think).
"No matter where you go, there you are." -- Buckaroo Banzai
Cuz there are different opinions about what might be best.
You have your 4-disk, Raid5 array, whereas I'd recommend buying 2x400 for a RAID0 and 1 750GB drive for daily backups (not mirror, see "xfsdump" or "dump"). You shouldn't fill your "fast" drives more than ~80%, so with 2x400 you would get approx 800GB, raw, giving you ~600GB usable space after formatting and round-down (decimal disk GB's) to binary. When you buy a 1TB, cycle it in for the backup and use the 750GB as slower "data" storage. Maybe next the 1.2TB's will be out, so save up for another 750GB (if you really like the RAID0 idea), to RAID with the previous 750GB and use the 1.2TB for backup. By then your 400's will likely be ready to retire (~2-3 years before alt-sector mappings slow down your RAID enough to be noticeable)....and so on and so forth.
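The ~600GB figure above can be checked with a couple of lines. This sketch only models the decimal-to-binary conversion and the 80% fill rule; filesystem formatting overhead is ignored, so the real number lands a little lower. The function name and the default fill fraction are just for illustration.

```python
def usable_gib(drive_gb, n_drives, fill_fraction=0.8):
    """Binary GB you can comfortably use on a stripe of n_drives decimal-GB disks."""
    raw_bytes = drive_gb * n_drives * 10**9   # drive makers count 10^9 bytes per GB
    raw_gib = raw_bytes / 2**30               # the OS reports 2^30 bytes per GB
    return raw_gib * fill_fraction

if __name__ == "__main__":
    print(f"2 x 400GB at 80% fill: about {usable_gib(400, 2):.0f} binary GB")
```

For 2x400 that works out to roughly 745 binary GB raw and about 596 at the 80% mark, which is where "~600GB usable" comes from.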
Another option -- use a separate system for the backup disks. Who backs up to tape these days? Tapes are too slow, too low capacity and low benefit/cost ratio unless you need to store long term backups and not overwrite your backup media. I usually find I can keep 3-4 months of system & data backups before I have to recycle space, but that's usually quite acceptable for a home server.
Oh, yeah -- invest in "smart" UPS's (ones that condition power, like APC SmartUPS 1000's: you can add longer runtime to the 1000's, but not the 1500's) with _at least_ enough capacity for slow, graceful shutdowns, or better, enough to last until you set up the generator(s). Get a Honda "suitable for electronics" generator (the EU2000i is reasonable, portable, and partly mirror-able with a 2nd generator). With enough generator power you can keep your media center up and running during a multi-hour power outage and still have enough to keep the fridge cold.
It's a good thing to be prepared...
I have a bunch of mismatched disks, configured in raid 5. The trick is to split each disk into uniform partitions and create the raid on those partitions. As long as no two partitions from the same disk are in the same array, you still have redundancy. I created the arrays initially with 6 disks: 2 120G (hda,b), 2 160G (hdc,d), a 200G (hde) and a 250G (hdf). So I picked 40G as the partition size and got:
3 200G raid 5 arrays (6-1 x 40G) md0, 1 and 2 from partitions 1, 2 and 3 from all drives
1 120G raid 5 array (4-1 x 40G) md3 from partition 4 of drives c-f
1 40G raid 1 array (2/2 x 40G) md4 from partition 5 of drives e and f
1 40G spare partition 6 of drive f
10G leftover (used for boot partition and swap space)
I added the 4 raid 5 arrays together using LVM to get a 720G volume group.
When I replace a disk, with a larger disk, I can swap around partitions between the raid arrays and create new space. For example, say a 120G drive (hda) dies and I replace it with a 300G.
300G = 7 partitions
hda1, 2 and 3 go into md0, 1 and 2 to replace the failed partitions from the old 120.
I now have enough partitions on separate drives to create a 5 partition raid to replace md3.
So, I create a new 5 partition (160G) raid array (md5) using the 4 remaining partitions in my new hda 4, 5, 6, 7 with the 6th partition from hdf (not redundant yet, that part comes later).
Now, I add md5 into the logical volume, and use pvmove to clear any used space in md3 (md5 has more than enough room to hold it).
Next, I extract md3 from the logical volume and stop it.
Now, I can start replacing the partitions in md5 with the ones that used to be in md3.
Similarly, I can create a new 3 drive array from hda, e and f, by replacing hde5 in md4 with hda5.
End Result:
hda 300, hdb 120, hdc 160, hdd 160, hde 200, hdf 250
3 200G raid 5 arrays (6-1) x 40G md0, 1 and 2 from partitions 1, 2 and 3 from all drives
1 160G raid 5 array (5-1) x 40G md5 from partition 4 of hda, c, d, e and partition 6 of hdf
1 80G raid 5 array (3-1) x 40G md6 from partitions hda6, hde5 and hdf4
1 40G raid 1 array (2/2 x 40G) md4 from partition 5 of drives a and f
1 40G spare partition hda7
10G leftover (used for boot partition and swap space)
It's a little work to add a new drive, but I don't have to waste the extra space on a drive, just because of my smallest drive.
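The initial layout described above (before the disk swap) can be derived mechanically. This is a simplified sketch, not the exact procedure used: it just groups the k-th 40G partition of every disk that has one, and picks RAID 5 for three or more members, RAID 1 for two, and a spare for one. It does not model the later shuffle where partitions from different positions get mixed into one array.

```python
def plan_arrays(disk_sizes_gb, part_gb):
    """Slice disks into equal partitions and group the k-th partition of every disk.

    Returns a list of (kind, member_count, usable_gb) tuples, one per partition slot.
    """
    parts_per_disk = [size // part_gb for size in disk_sizes_gb]
    plan = []
    for slot in range(1, max(parts_per_disk) + 1):
        members = sum(1 for parts in parts_per_disk if parts >= slot)
        if members >= 3:
            plan.append(("raid5", members, (members - 1) * part_gb))
        elif members == 2:
            plan.append(("raid1", members, part_gb))
        else:
            plan.append(("spare", members, part_gb))
    return plan

if __name__ == "__main__":
    plan = plan_arrays([120, 120, 160, 160, 200, 250], 40)
    for kind, members, usable in plan:
        print(f"{kind:5s} {members} partitions -> {usable}G")
    print("RAID 5 total:", sum(u for k, _, u in plan if k == "raid5"), "G")
```

With the six disks listed above this gives three 200G RAID 5 arrays, one 120G RAID 5 array, one 40G RAID 1 and one 40G spare, and the RAID 5 arrays add up to the 720G volume group.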
For starters, you are incredibly wrong in your initial beliefs about RAID5 capacity. At RAID5 with 3x500GB drives your capacity IS NOT 1.5TB; in fact it probably won't even be 1TB. RAID redundancy is achieved (at this level) by striping the data across the drives together with parity. When you expand to 3 1TB drives you will again NOT have 3TB of storage. RAID = Redundant Array of Independent/Inexpensive Disks (depends who you ask). The operative word here is REDUNDANT. At any level other than RAID0 (which is RAID in name only), redundancy, math and the laws of physics state that you can't possibly get the full sum of the drives' capacities as usable storage.
I can't tell you what the best answer is. I'm still looking myself. I do think, however, that instead of starting out by trying to say "here are some various options and different ways to configure them with these resulting tradeoffs" we take a slightly more directed approach.
What is it that you want, what features do you want, what is important, what isn't. What backup stuff do you require, what kind of filesystems do you want to run. What network accessibility do you need, remote access? How critical is speed? (The one note that I would insert here is that you can always keep a separate additional drive on a different connection to achieve the high speed that you might need.)
I'm also building a system right now, since the desktop that put me through the end of high school and all of college is finally next to dead. This is what I am looking for / needs I want to address: storage of lots of media that I'd like to be able to access from both windows and linux.. yes I'm sorry but I am still migrating off of it, there are a couple of things I need it for but that's what my old barely alive pc is relegated to. That said I have a windows laptop (this one is planned to become linux as soon as I can transfer everything off of it) I keep in my living room for quick access to whatever while I watch tv. I do have a spare T30 laptop that is running gentoo/ubuntoo that I am going to turn into a mythTV/dvr box if I can find something to plug into a pcmcia slot for capture (good luck, I know..); otherwise it's all ready to go as a mythtv.
the biggest thing is that I don't always want to have my desktop running if possible just to access the files on its hard drives. maybe thats dumb, but if I'm just browsing some stuff from my laptop on the couch after a night of work id like to be able to access my files. Yes linux is stable and I could leave it perpetually running all the time, but try as I might the little green guy on my shoulder says that doing so would be wasting a lot of power running a big fancy computer all day long so that i dont have to go upstairs to turn it on when i want to access a couple of files... something tells me that since wakeonlan never really seemed to get off the ground for consumers i'm going to probably end up putting all storage into that machine and making it both a file server and "fun box"
"Jazz isn't dead, it just smells funny" ~Frank Zappa
EdelFactor
From the question it sounds to me like you're looking at it the wrong way. You need to consider your goals first, and then decide on alternative methods to achieve them. And then weigh the pros and cons of the alternatives for your situation.
It sounds like you are looking for redundancy in order to achieve data reliability. Recent studies by Google and another research group found that (a) MTBF numbers are all wrong (not to say lies) and (b) the theory of being able to sustain data access through a single failure is flaky. To that I add the observation that not all data is worth the same. Google employs some redundancy of data by keeping multiple (3?) copies of important data. I wish there was some network file system that would allow me to give a redundancy attribute to a file, so that the FS would automatically maintain N copies on different hardware.
Other requirements you need to consider: how easy it is to recover from a failure. RAID5 failure is not that simple to recover from. It takes time to rebuild the array. RAID1 is much simpler to maintain. RAID0 is not a redundancy mechanism, but a performance mechanism. It is similar to a JBOD, but can also be much worse than JBOD, since with JBOD you access each disk as a disk. A disk dies, only the data on that disk dies. With RAID0 the data is striped. A disk dies, and all the data that is partly striped onto that disk is lost.
I'd also suggest you take a look at ZFS (BSD has it, I hear). It may provide what you need.
Good luck
Dan
This is a complicated set of questions that you've boiled down to "should I use a RAID 5 array?" Yes, a small array of three disks set up as RAID 5 can only present two times the space of the smallest drive in the configuration. So with 3 * 250GB today you have 500GB of usable space. Tomorrow replace two of those drives with 750GB drives and you still will only have 500GB of usable space. While I haven't studied it, I can also imagine low end arrays where, if you now replace the last 250GB drive with a 750GB drive, you might still only have 500GB of usable space until you do something.
If you're talking significantly larger arrays, then the hot topic in storage is virtualized RAID. You throw several dozen to several hundred drives at a virtualized array, you tell the array how much RAID 0, RAID 1, and RAID 5 storage you want and the array builds that across the drives available. You can even define some drives as hot spares. Somehow from your comments this sounds much larger than what you're talking about.
Also RAID 5 doesn't have to be a set of three disks. It can be any number of disks with parity space. The smallest RAID 5 set is three, but there are arrays that build four, five, six, and seven disk RAID 5 sets. With three-disk sets (all the same size) one third of the space is lost to parity information. With seven-disk sets only one seventh. Three is the common number for low end arrays because, well, they are low end arrays and disk slots consume space and money. At the high end the number of disks in a RAID 5 set is chosen based on space savings vs. performance needs. The more disks in a RAID 5 set, the worse the performance of random writes will get.
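The capacity rules in the last few paragraphs fit in two small functions. This is just the arithmetic, assuming a plain RAID 5 set where every member contributes as much as the smallest disk; whether a given controller actually exposes the extra space after all disks are replaced is a separate question, as noted above.

```python
from fractions import Fraction

def raid5_usable_gb(disk_sizes_gb):
    """Usable space of a RAID 5 set: (number of disks - 1) x smallest disk."""
    if len(disk_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)

def raid5_parity_fraction(n_disks):
    """Share of the raw space (equal-size disks) that goes to parity."""
    return Fraction(1, n_disks)

if __name__ == "__main__":
    print(raid5_usable_gb([250, 250, 250]))   # three original drives
    print(raid5_usable_gb([750, 750, 250]))   # two drives replaced
    print(raid5_usable_gb([750, 750, 750]))   # all replaced, once the array is grown
    print(raid5_parity_fraction(3), raid5_parity_fraction(7))
```

The first two cases both come out at 500GB, which is the point: the smallest disk sets the size until the last small disk is gone.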
Lastly while RAID 1 and/or 5 is a good way to protect you from the hardware failure of one disk, it is not the end all be all of data protection.
Whole arrays aren't as likely to go bad, but it has been known to happen. That's where host based RAID 1 between two arrays comes in for the highest availability.
In many configurations user error is the most likely concern. Wild card deletes, programmatic corruption, and overwriting old valid files with new files are as likely to cause you grief in a given year, and neither RAID 5 nor RAID 1 will give you the least bit of protection against that. Backups do better. And with today's cost of storage, disk-to-disk backup is a practical way to give yourself another level of protection. Arguably for low update rates a nightly backup is better than RAID 1, as the likelihood of overwriting or deleting a file is higher on a given day than that of a hard disk failure.
If your data is more valuable then you need to consider offsite backups. Again this doesn't necessarily mean investment in an expensive tape drive and expensive tape cartridges. External USB, Firewire or SATA drives even the NET can be a cost effective way to move data offsite.
Todd
Just get 3 or 4 friends who all like the same stuff as you, all get kitted out with large hard disks, and share everything with each other. The internet's getting faster these days, just make sure you all download the same torrents. Why not even have a shared torrent folder so you can all see what's up to date.
I guess when your mate gets 20mbit and starts using newsgroups it might cause a few issues, but hey, that's his problem.
And on a more serious note, you could actually do this easily with hamachi (www.hamachi.cc), and if you wanted secure storage, you could keep your personal files in encrypted archives on your friends' computers.
Everyone knows the only reason you need a hardware RAID card is because all the mobo manufacturers dicked us for IDE headers. 8 SATA connectors is no good to someone with over a TB of IDE drives.
It's OK Bender, there's no such thing as 2.
I'm not sure how much you got to spend
But the storage calculation for raid 5 is simple: N-1. So if you put in nine disks then you have 9-1 = 8 disks of storage.
If you have only 3 disks you have 3-1 = 2 disks for storage. So as you see it gets more interesting, with less overhead, when you use more disks.
If you do have a lot of money to spend, there exist expandable RAID 5-like solutions (different vendors give them different names).
I'm wondering though, you say you use it for media storage. If it is only photos, perhaps, just googling, you can buy a pack of 200 writeable blank DVDs for about 139 euro; that's close to a terabyte, and damn cheap for such an amount of storage.
The drawback is that you have to manage 200 DVDs (perhaps there exists software for that).
Also think of lifetime: these days digital hardware will break, be it a HD or a DVD, so you have to think of backing up a terabyte too... (oops)
I know you're out there. I can feel you now. I know that you're afraid. You're afraid of us. You're afraid of change.
Hi,
:)
you have a few possible solutions. First let's talk hardware since that will cost you:
(sorted cheapest first)
- Internal controller with internal disks
- Internal raid controller with internal disks
- external JBOD box
- external raid box
- multiple external boxes with multi paths (yeah, we are this insane)
I don't recommend an external box unless you want to go really big, for the simple reason of price. You can easily fit 8-16 disks internally with a good case at a fraction of the cost.
So internal it is. But do we need hardware raid? A hardware raid controller saves you the cpu cost of software raid (less than 5% usually) but tends to be slower than software raid due to the main CPU being so much faster than the controller. Hardware raid also costs a bundle and means you have to buy the same card again if it fails.
So again I recommend saving money. With (linux) software raid you can get any controller card, any number of cards and build your raid any way you like. (or not raid them at all if you prefer).
Once you have the hardware now you have to decide how to use it:
- JBOD
You can format and mount each disk on its own. That makes them totally independent but your space will be fragmented. Once a disk is full you have to put the next file on another disk. That can become somewhat chaotic and you will run out of space on the wrong disk all the time and have to move stuff around.
- raid linear, raid 0, lvm
You can combine multiple disks into a larger device using any of the three. Lvm can do linear and striping and is a lot more flexible than the raids. So if you want one of the three then use lvm. Striping will give you extra speed but increase the risk of data loss since a failure of one disk in a stripe means the stripe is gone. With 10 disks certainly not a good idea.
- raid 1, raid 10
Mirroring will give you redundancy but at a high cost in both space and bandwidth (in software raid). I only recommend it for / (since you can boot off software raid1) and for systems where the cpu cost of software raid 5/6 is unacceptable.
- raid 5, raid 6
Now here comes the ultimate solution. The perfect combination of space, redundancy and speed. Raid 5 allows one disk failure without data loss, raid6 allows 2 disks to fail. I recommend raid 6 if you have more than 8 disks. You can calculate the MTBF for raid5/6 with X disks and you probably want it to be more than a single disk has. So a 100 disk raid6 might not be smart but 8-16 disks is probably fine. Someone else do the exact math.
Linux software raid allows you to grow the size of the raid after you replaced disks to the new size of the smallest disk. You can also grow the raid set by a disk to get more space.
Note that raid5/6 will give you one big disk again. You can partition it (partitionable raid support in linux) or run lvm on top of it. I recommend lvm over partitioning since it allows online operation while partitionable raids need unmounting.
You can also combine multiple raid5/6 sets with raid linear, raid 0 or lvm with or without striping. Instead of raid 5l/50/6l/60 I always recommend lvm. It is just more flexible. For example you could get 16 drives and build 2 raid5 sets of 8 disks each.
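Since the raid 5/6 item above asks for someone to do the math, here is a back-of-envelope sketch using the commonly quoted textbook approximation for mean time to data loss (MTTDL). It assumes independent disk failures at a constant rate and a fixed rebuild time, which is optimistic for real drives, so treat the output as an upper bound at best. The 500,000 hour MTBF and 24 hour rebuild figures are made-up inputs, not measurements.

```python
def mttdl_raid5_hours(n_disks, disk_mtbf_hours, rebuild_hours):
    """Approximate MTTDL for RAID 5: data is lost when a 2nd disk dies during a rebuild."""
    return disk_mtbf_hours**2 / (n_disks * (n_disks - 1) * rebuild_hours)

def mttdl_raid6_hours(n_disks, disk_mtbf_hours, rebuild_hours):
    """Approximate MTTDL for RAID 6: data is lost when a 3rd disk dies during rebuilds."""
    return disk_mtbf_hours**3 / (
        n_disks * (n_disks - 1) * (n_disks - 2) * rebuild_hours**2
    )

if __name__ == "__main__":
    MTBF = 500_000   # hypothetical per-disk MTBF in hours
    REBUILD = 24     # hypothetical rebuild time in hours
    for n in (4, 8, 16):
        r5 = mttdl_raid5_hours(n, MTBF, REBUILD)
        r6 = mttdl_raid6_hours(n, MTBF, REBUILD)
        print(f"{n:2d} disks  raid5 {r5:.3g} h  raid6 {r6:.3g} h")
```

The trend is what matters: MTTDL drops quickly as the set gets wider, and raid6 buys back several orders of magnitude at the same width.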
Sidenote: What to do with unequal size disks?
A raid 1/5/6 will always use the size of the smallest disk in the set. But the extra space on bigger disks need not be wasted. Software raid can perfectly run on partitions. So you can create partitions the size of the smallest disk on all drives and create a raid there. Then repeat with the remaining free space for a second raid and a third and so on. That way most space can be used.
Example: 4 disks of 200G, 150G, 150G and 100G
|0 |50 |100 |150 |200G
200G |xxxxx|xxxxx|xxxxx|xxxxx|
150G |xxxxx|xxxxx|xxxxx|.....|
150G |xxxxx|xxxxx|xxxxx|.....|
100G |xxxxx|xxxxx|.....|.....|
|...set 1...|set 2|set 3|
Set 1: 30