Slashdot Mirror


Compelling Alternatives to RAID Setups?

jabbadabbadoo asks: "Our software shop has about 30 Linux servers and 15 NT servers running enterprise applications for our customers. Since we have service level agreements with most of them, uptime is crucial. One of the things we've done is to use RAID setups extensively, using products from well-renowned disk and controller vendors. However, we have discovered the paradox that introducing RAID controllers actually reduces overall uptime! Not only does more 'steel' increase the probability of failure, but what fails first is usually the RAID controllers. What is your experience? Have we been having bad luck?" "A related problem, especially on Linux, is that setting up RAIDs is actually a quite costly process. There seem to be endless problems with library versions, and upgrading existing servers simply takes too many hours. To keep the customers happy, we routinely have to create a 'shadow' server while upgrading which in turn means we, at some point, have to synchronize data to the new server, which in turn means a bit of downtime. Ouch. Does anyone have a good solution to these problems? Of course, cost is a major issue, but so is uptime (which also means cost if we don't provide the uptime dictated in the SLA). What setup gives the best cost/uptime ratio? Thanks for any thoughts!"

113 comments

  1. RAID is good by kansei · · Score: 2, Informative

    I remember swapping quite a few Compaq RAID controllers in my day. They wouldn't outright fail, but get in a "compromised" mode, and you usually had enough time to schedule downtime to swap them out. This was much better than messing with software mirror or raid settings, because it's transparent to the OS - the OS just sees a single large disk.

  2. Must be bad luck by Curien · · Score: 1

    We run about twenty systems with RAID storage devices (about half of them fibre channel). I've only had one system go down due to the storage device, ever. I think the power supply on our FC bays failed once or twice, but they have backup PSUs, so it wasn't a problem (hot swap, even!).

    --
    It's always a long day... 86400 doesn't fit into a short.
  3. A few tips by menscher · · Score: 5, Insightful
    First off, you're looking at the wrong "uptime" number. Don't look at how many days since your last reboot. Look at how many hours/year you are offline. If you're not doing raid, a failed disk means restoring from backups. That's a time-consuming, and therefore costly, process. If your controller fails, just pop in your spare controller. You do have a spare in-house, don't you?
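
    The "hours per year offline" measure above can be turned into an availability figure with a little arithmetic. The sketch below is illustrative only: the function names are made up for this example and the outage figures are hypothetical, not numbers from anyone in this thread.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(hours_offline_per_year: float) -> float:
    """Fraction of the year the service was reachable."""
    return 1.0 - hours_offline_per_year / HOURS_PER_YEAR


def allowed_downtime_hours(sla_fraction: float) -> float:
    """Hours per year a service may be offline and still meet the SLA."""
    return (1.0 - sla_fraction) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical outage totals, in hours per year.
    for outage in (1, 8, 24):
        print(f"{outage:>3} h offline/year -> {availability(outage):.4%} available")
    print(f"A 99.9% SLA allows about {allowed_downtime_hours(0.999):.2f} h/year")
```

    Seen this way, one day-long restore from backups can cost more of the yearly budget than several short, scheduled controller swaps.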

    I'll agree that setting it up is a nightmare. I'm currently helping test two 4TB arrays for use on a Linux box (16 SATA drives presented as a single SCSI device). Benchmarks under Linux are slower than under Windows. It's a mess figuring out why. Meanwhile, vendors (who I will not name) ship crappy software, and take months to act on bug reports.

    As for transitioning servers, I've been there too. And yes, copying a terabyte of disk in single is a very long process. It'd have taken several days, which is of course unacceptable. This is where the magic of rsync comes in handy. Copy the data over several days in advance, sync it just before the scheduled downtime, and you'll have a fairly short downtime.
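
    A rough way to see why pre-seeding with rsync shortens the outage: the downtime only has to cover the data that changed since the last pass. The numbers here are assumptions picked for illustration (a sustained 4 MB/s, which is about what "a terabyte in several days" implies, and 2% of the data changing before cutover). The estimate also ignores the time rsync spends scanning files.

```python
TB = 1e12  # decimal terabyte, in bytes


def transfer_hours(size_bytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained rate given in MB/s."""
    return size_bytes / (rate_mb_per_s * 1e6) / 3600


def cutover_hours(total_bytes: float, changed_fraction: float,
                  rate_mb_per_s: float) -> float:
    """Downtime estimate when only the changed data is synced at cutover."""
    return transfer_hours(total_bytes * changed_fraction, rate_mb_per_s)


if __name__ == "__main__":
    rate = 4.0  # MB/s, an assumed sustained rate
    print(f"full copy during downtime:    {transfer_hours(TB, rate):.1f} h")
    print(f"final sync only (2% changed): {cutover_hours(TB, 0.02, rate):.1f} h")
```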

    1. Re:A few tips by Anonymous Coward · · Score: 1, Interesting

      I really don't understand some folks' fascination with SATA on "servers".

      SATA is designed for desktops. SATA drives don't meet MTBF criteria of the equiv. SCSI drive, nor the performance.

      If you've chosen it because it's the cheaper of the solutions, ok... if you chose it for performance... well, make sure you have a good backup solution.

    2. Re:A few tips by menscher · · Score: 1
      As I said, it's 16 SATA drives in hardware raid5 that presents itself to the server as a single scsi device.

      Yes, they have a lower MTBF, but it's in a raid array, so who cares? It'll automagically rebuild on our hot spare, and when we wake up to the email, we just go in and replace the downed drive with another.
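
      To put rough numbers on "who cares?": with a constant failure rate, the expected number of drive failures per year and the chance of a second failure during a rebuild can be estimated as below. This is a back-of-the-envelope sketch. The MTBF of 500,000 hours and the 24-hour rebuild window are assumed values, not specs for the drives discussed here, and real drives in one enclosure do not fail independently.

```python
import math

HOURS_PER_YEAR = 8760


def expected_failures_per_year(drives: int, mtbf_hours: float) -> float:
    """Expected drive failures per year across the whole array."""
    return drives * HOURS_PER_YEAR / mtbf_hours


def p_second_failure(surviving_drives: int, mtbf_hours: float,
                     rebuild_hours: float) -> float:
    """Chance that at least one surviving drive fails during the rebuild,
    under an exponential (constant failure rate) model."""
    return 1.0 - math.exp(-surviving_drives * rebuild_hours / mtbf_hours)


if __name__ == "__main__":
    mtbf = 500_000  # hours, assumed
    print(f"failures/year, 16 drives: {expected_failures_per_year(16, mtbf):.3f}")
    print(f"P(second failure in 24 h): {p_second_failure(15, mtbf, 24):.5%}")
```

      Under these assumptions a lower MTBF mostly means more drive swaps, while the risk of losing the array depends heavily on how long the rebuild takes.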

      And yes, they have slightly lower performance. But the real performance reason for using SCSI instead of IDE is that it offloads the work to the SCSI controller/disk instead of wasting CPU time. If the entire raid array is presented to the system as a scsi device, then there's really no performance loss.

      It's certainly cheaper, performs about as well, and we have backups anyway (gotta have those to protect against lightning strikes and wily h4x0rs).

    3. Re:A few tips by Halvard · · Score: 1

      And you can buy enterprise grade drives now such as this drive.

    4. Re:A few tips by extra88 · · Score: 1

      Would you mind detailing the hardware and software being used for those 4TB arrays? In particular, what kind of drivers do you have to use and what kind of monitoring of the array do you have? I'm just getting started building a 1TB array using a 3ware Escalade 8506-12 card and 5 250GB Western Digital SATA drives. Right now it's running under Win2k and the 3ware software is helpful, providing an alarm app, web interface for monitoring and configuration and the ability to send alert emails. They have drivers for specific distro versions (RedHat 8,9 SUSE 8) and a non-distro specific disk management utility but I don't know if the utility requires using their driver.

      Our interest is not in performance or even uptime, just lots of disk space and maintaining data integrity (we'll be mirroring the data between multiple servers as well as using RAID). We're starting with 1TB but will probably need 2TB very soon and the data set will keep growing. An additional 1TB/year is quite possible.

      Please email me if you'd rather not put your response in the comments.

    5. Re:A few tips by menscher · · Score: 2, Informative
      The 4TB arrays are units we're evaluating (one from Excel, the other from RaidKing). They're just rack-mountable boxes that have a scsi uplink. So, as far as the computer is concerned, you just have one massive scsi drive. (There's a catch, which is that these units can't seem to have more than 2TB per "device", so you really get two scsi devices presented to the computer.)

      Life is made a little annoying by the 2TB limit in the 2.4 kernel. But we're willing to live with that, for now. I'm told there are patches to fix this, but I prefer stability over features for this box.
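
      The 2TB figure is consistent with block addresses being 32-bit counts of 512-byte sectors. Taking that as the assumption (it is not stated in this thread), the arithmetic can be sketched as follows; the function names are only for illustration.

```python
import math


def max_device_bytes(address_bits: int = 32, sector_bytes: int = 512) -> int:
    """Largest device addressable with the given sector address width."""
    return (2 ** address_bits) * sector_bytes


def devices_needed(array_bytes: float, limit_bytes: int) -> int:
    """How many separate devices an array must be split into."""
    return math.ceil(array_bytes / limit_bytes)


if __name__ == "__main__":
    limit = max_device_bytes()
    print(f"limit: {limit} bytes = {limit / 2**40:.0f} TiB")
    print(f"devices for a 4 TB array: {devices_needed(4e12, limit)}")
```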

      As for 3ware, I've got a box with an Escalade 7500-4LP running RedHat 9. It works by default (can boot off the raid, etc). 3ware has extra drivers, but I don't use them. It's a messy situation, since you have to simultaneously upgrade firmware, driver, and utility programs. I've been less-than-impressed with their support. When I reported that the md5sum on their website didn't match the file, they said "We know.... don't worry about it... it doesn't matter." Umm, yeah. Right.

    6. Re:A few tips by extra88 · · Score: 1

      Thanks for the reply. I'm guessing the Excel is the SecurStor 16 SATA RAID and the RaidKing is the RAIDking 827. The Excel site provides some info about their monitoring software, RAIDWatch. RAIDking doesn't say anything on their site about monitoring. What's the point of having redundant disks if there's no reliable way of being notified when one fails?

      I wouldn't have a problem with the 2TB limitation, I've been thinking that I'd make each array no bigger than 1TB anyway (or 5 drives as RAID5, whichever is larger).

      RAIDking doesn't list prices and Excel doesn't say how many (if any) drives come with their enclosures. Unless their 12 bay model comes with at least 1TB of disk space, they don't fare well, price-wise, against the Gateway 840 RAID enclosure I mentioned in another comment. Based on what I've seen from storage specialty companies like these, I wouldn't be surprised if the price includes zero drives.

    7. Re:A few tips by DA-MAN · · Score: 1

      I would also recommend NexSan's ATA-boy. Their ATA-beast sucks on performance, but hasn't let us down. Their ATA-boy has decent performance, has a nice footprint and is competitively priced.

      --
      Can I get an eye poke?
      Dog House Forum
    8. Re:A few tips by alazarev · · Score: 1

      Those are the exact units we evaluated. We chose Excel for several reasons, one of which was the RAIDWatch software, which seems to work perfectly. We got it stocked with 16 X 250GB drives (WDJB type), and 1GB of cache. About 10.5K for that package.

      But we are still confused about many benchmark tests that we get, in Windows 2003 and RHEL3-AS. I'd love to speak with someone who has a lot of experience with running benchmarks on RAID arrays, especially if they use SATA-SCSI enclosures, from any manufacturer.

      If someone is willing to share their benchmarks, I'd love to share mine with you. Too much detail to go in here. Email me please at alazarev@uiuc.edu.

    9. Re:A few tips by extra88 · · Score: 1

      16x250GB for $10,500 is $656.25 per disk which is not bad, especially when you take the cache into account.

      The Gateway 840 would be $6,549 (if you bought the disks separately) 12x250GB ($545/disk) but that's with only 12 bays and 256MB cache. It uses StorView Storage Management from nStor (the 840 is probably a re-branded version of nStor's NexStor 4700S). Does anyone have any experience with StorView? It only lists RedHat as a supported Linux distro but again I'm wondering if that really matters.

      The Apple Xserve RAID would be $12,300 with only 14x250GB ($878/disk) if configured with 1GB cache and a 3yr. warranty.

      Of course you and I are in Higher Ed and the Gateway and Apple prices I'm using are retail (Xserve RAID is $783/disk at the higher ed price).
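
      The per-disk figures quoted above are just package price divided by drive count. A small sketch that reproduces them from the prices in this comment (the dictionary layout and names are only for illustration):

```python
# (package price in USD, number of 250GB drives), as quoted in the comment above
PACKAGES = {
    "Excel, 16 bays, 1GB cache": (10_500, 16),
    "Gateway 840, 12 bays, 256MB cache": (6_549, 12),
    "Apple Xserve RAID, 1GB cache": (12_300, 14),
}


def price_per_disk(price_usd: float, drives: int) -> float:
    """Package price spread evenly over the drives."""
    return price_usd / drives


def price_per_gb(price_usd: float, drives: int, gb_per_drive: int = 250) -> float:
    """Package price per raw gigabyte (before any RAID overhead)."""
    return price_usd / (drives * gb_per_drive)


if __name__ == "__main__":
    for name, (price, drives) in PACKAGES.items():
        print(f"{name}: ${price_per_disk(price, drives):.2f}/disk, "
              f"${price_per_gb(price, drives):.2f}/GB raw")
```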

    10. Re:A few tips by FreeBSDPete · · Score: 1
      Wrong, or half right, which is sometimes worse than just wrong. The main reason for using SCSI over IDE technology is simultaneous command queueing. Ever wonder why a SCSI drive makes a machine, server or workstation feel so much faster?

      It's because even workstations do simultaneous read requests. SCSI has this great feature that basically when you request data from 3 different sections of the drive, it reprioritizes on the fly, picking up everything you requested along the way to the furthest request. That way, fewer strokes get you more data. It's probably a good part of the reason the MTBF is so much better.

      An IDE drive deals with its queue in a FIFO manner. Each request gets processed sequentially, so a SATA RAID is going to be basically useless in terms of performance, and host utilization is only the tip of the iceberg.

      There's at least one good reason real geeks use SCSI. It's also a big part of why IDE-based Macs 'feel' so much slower.
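
      The reordering described above can be illustrated with a toy model that counts how far the head travels. This is a sketch, not how any real drive firmware works: positions are abstract track numbers, the request list is made up, and rotational position is ignored entirely.

```python
def fifo_travel(start: int, requests: list[int]) -> int:
    """Total head movement when requests are served in arrival order."""
    pos, total = start, 0
    for track in requests:
        total += abs(track - pos)
        pos = track
    return total


def sweep_travel(start: int, requests: list[int]) -> int:
    """Total head movement when requests are reordered into one outward
    sweep followed by one inward sweep (an elevator-style ordering)."""
    outward = sorted(t for t in requests if t >= start)
    inward = sorted((t for t in requests if t < start), reverse=True)
    return fifo_travel(start, outward + inward)


if __name__ == "__main__":
    queue = [95, 10, 80, 20, 60]  # hypothetical track numbers
    print("FIFO order :", fifo_travel(50, queue), "tracks travelled")
    print("Reordered  :", sweep_travel(50, queue), "tracks travelled")
```

      For this queue the reordered pass covers 130 tracks against 300 for arrival order, which is the "fewer strokes" effect in miniature.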

      SCSI rules.

      Neeeeeeext!

  4. Brands? by JLester · · Score: 4, Informative

    You don't list what brand controllers you are using, but your problems are not typical in my experience. We are a 100% Compaq shop and use their SmartArray controllers with Novell Netware and Debian Linux. We've never had a controller failure and have only lost about 3 drives over the last six years or so.

    I'm a firm believer that you get what you pay for with enterprise-class servers. You shouldn't expect Tier-1 reliability from servers that are built with commodity hardware. There is a reason that Compaq/Dell/IBM servers are more expensive.

    We also haven't had any issues installing other than the default Debian boot disks not supporting the SmartArray controller. A custom set of disks took care of that though.

    Jason

    --
    "FORMAT C:" - Kills bugs dead!
    1. Re:Brands? by -thinker- · · Score: 1

      I'll second that. We have close to 150 Compaq ProLiant servers with SmartArray controllers [different versions] and had no more than 3 controllers fail in the last 5 years. Drives failed quite a bit but then again, just pop a new one in and use the warranty to replace the dead one. BTW, all 3 of the controllers that failed were the "embedded" kind, none of the "add-in" boards failed.

      We are using Novell Netware as well, so this is more of a comment about hardware reliability rather than software woes.

      --
      -thinker- "Be careful how you think; your life is shaped by your thoughts."
    2. Re:Brands? by Anonymous Coward · · Score: 0

      I agree with the parent to this.

      I have used Compaq controllers in IT for years, and I have never had major unscheduled downtime due to them, which is a big compliment. They may go into degraded mode, but it gives me time to schedule a fix.

      Now, the Compaq power supplies blow often, but luckily are redundant, so I can replace those hot, no downtime.

    3. Re:Brands? by Anonymous Coward · · Score: 0

      What's all this talk of controllers *failing*? Since when do chips on boards just fail? Processors don't. Chipsets don't seem to. I never see anyone complaining about them. So why a RAID controller? RAID seems to be one of the simplest things, but in practice it's a mess.

  5. Software RAID? by Marillion · · Score: 3, Interesting
    I've been using linux software raid with an old non-raid symbios scsi-3 card. Performance isn't a requirement in this environment so the penalty (which isn't that much actually) is acceptable.

    In the past two years, none of the "downtime" that I've experienced has been attributed to the disk array or controller.

    The biggies have been: power outage that exceeded the capacity of the UPS (3 hours), planned upgrades and an anonymous gremlin who bumped the reset button - since detached.

    --
    This is a boring sig
    1. Re:Software RAID? by Anonymous Coward · · Score: 0

      I have a software RAID question. Where is the information about the array stored? Is it completely on the OS disk, or is it on the array somewhere? I was wondering how fault tolerant software RAID is. If your OS disk dies, is it possible to reclaim your data? Would a pretty dependable system consist of the OS disk mirrored on its own RAID card and then a bunch of other disks software RAIDed together? Another question would be: what if the OS got corrupted, could you reinstall the OS and be able to reclaim any data then? Would it be possible to keep a stable image of the OS disk and reinstall from that, or is the time-of-crash array info needed to access it?

    2. Re:Software RAID? by Marillion · · Score: 1
      The header blocks of each disk contain a map of all the disks. It reminds me of the Borg. Something like: I am 3 of 5.

      I had to compile a version of the kernel with the auto-detect and activate RAID feature turned on. That way I could make the disk array my root file system.

      To your question: the disk array is abstracted to appear as a single device. The disk array has a mission to defend against problems below it, not above it. So if the filesystem code has bugs and corrupts itself, the array doesn't know any better, but it will faithfully mirror and protect that corruption against hardware failure.

      A reasonable comparison might go like this. A journaling file system, like ext3 or reiserfs can't defend against the root user doing rm -f /etc/passwd.

      The best defense against that is something like EMC's BCV. In a multi-array system, they have some technique of delaying the replication from array A to array B. That way if you accidentally rm -rf / on array A, you have half an hour (or whatever) to switchover to array B.

      If you're curious, here is what my array looks like. I distributed swap among sd{a,b,c,d,e}1.

      # mdadm --misc --examine /dev/sda2
      /dev/sda2:
      Magic : a92b4efc
      Version : 00.90.00
      UUID : ffae297b:5de65de9:7ed22ed0:6435b436
      Creation Time : Fri May 30 00:33:16 2002
      Raid Level : raid5
      Device Size : 8888256 (8.48 GiB 9.10 GB)
      Raid Devices : 5
      Total Devices : 5
      Preferred Minor : 0

      Update Time : Dec 23 13:58:39 2003
      State : clean, no-errors
      Active Devices : 5
      Working Devices : 5
      Failed Devices : 0
      Spare Devices : 0
      Checksum : 6a77e500 - correct
      Events : 0.6531321

      Layout : left-symmetric
      Chunk Size : 64K

      Number Major Minor RaidDevice State
      this 0 8 2 0 active sync /dev/sda2
      0 0 8 2 0 active sync /dev/sda2
      1 1 8 18 1 active sync /dev/sdb2
      2 2 8 34 2 active sync /dev/sdc2
      3 3 8 50 3 active sync /dev/sdd2
      4 4 8 66 4 active sync /dev/sde2
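
      The sizes in that output can be cross-checked by hand. The parenthesized figures are consistent with Device Size being in 1 KiB units, and RAID5 gives up one member's worth of space to parity, so usable capacity is (members - 1) x member size. A small sketch using the values above (the function names are just for illustration):

```python
DEVICE_SIZE_KIB = 8888256  # "Device Size" from the output above
RAID_DEVICES = 5           # "Raid Devices" from the output above


def kib_to_gib(kib: int) -> float:
    """KiB to binary gigabytes (GiB)."""
    return kib / 1024**2


def kib_to_gb(kib: int) -> float:
    """KiB to decimal gigabytes (GB)."""
    return kib * 1024 / 1e9


def raid5_usable_kib(member_kib: int, members: int) -> int:
    """Usable RAID5 capacity: one member's worth of space holds parity."""
    if members < 3:
        raise ValueError("RAID5 needs at least 3 members")
    return member_kib * (members - 1)


if __name__ == "__main__":
    print(f"member: {kib_to_gib(DEVICE_SIZE_KIB):.2f} GiB "
          f"({kib_to_gb(DEVICE_SIZE_KIB):.2f} GB)")
    usable = raid5_usable_kib(DEVICE_SIZE_KIB, RAID_DEVICES)
    print(f"usable: {kib_to_gib(usable):.2f} GiB")
```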
      --
      This is a boring sig
  6. You are absolutely right by Anonymous Coward · · Score: 0

    At least, about any except the most expensive RAID controllers out there.

    In my experience, which does not include RAID controllers that cost more than the whole computer, software RAID is the most reliable RAID; however, even it is questionable if it's better than just buying good drives and having a PCI IDE controller card around in case the Mother Board controller goes out.

  7. So would XSan help? by 2nd+Post! · · Score: 4, Interesting

    XSan can 'hide' the complexity of RAID, as well as providing management tools and 'intelligent' cascading failure... but that's just from reading the specs, not from actual experience. I hear XSan is based on CVFS? I should look at that too.

    1. Re:So would XSan help? by Crypt0pimP · · Score: 2, Informative

      Don't believe the marketing.

      From what I read, the XSan software is first and foremost a distributed file system for shared volumes from the Xserve RAID.
      If you look at the applications, it's about multiple servers or workstations with concurrent access to a single volume - distributed file locking.

      Great stuff for the stated purpose, can't wait to get my hands on it!

      Hiding the complexity of RAID is the domain of storage 'virtualization' solutions. The ones that let you mix and match raid types across any number of spindles you throw at it.

      <Shameless_Plug>
      My product, the XIOtech Magnitude does that. Take up to 126 spindles, create RAID 0, 1, 5, 10 volumes and give 'em to your servers. Boot off 'em, mirror 'em, copy 'em. Stick 'em in your ear!
      </Shameless_Plug>

      direct flames or questions to slineyp at hotmail dot com

      --
      Striving to achieve a lower state of conciousness
  8. Major problems with Promise RAID controllers. by Futurepower(R) · · Score: 5, Interesting


    This is on a lower level than the RAID you are using, but we are having major problems with 10 Promise Technology TX2000 mirroring RAID controllers that we bought. The mirrors go critical for no detectable reason. Promise Technology technical support is unable to find the problem, and the company is unwilling to escalate the issue. The Promise Technology technicians escalate the issue, but 2nd level technical support never calls back.

    Promise mirroring controllers on ECS (EliteGroup) L7VTA v 1.0 motherboards have the same problem. When we call ECS tech support, there is a recorded message saying they are busy and to call back later.

    We've been supplying computers with Promise mirroring RAID controllers since the company began doing business, and we've had very few problems until now.

    Possibly the problems are associated with newer, faster motherboards, or with AMD VIA chipset motherboards. We've never had problems with RAID controllers on Intel chipset motherboards.

    Another possibility is that the RAID controllers are incompatible with DVD burner drivers that are installed with Roxio or Nero DVD burning software.

    1. Re:Major problems with Promise RAID controllers. by billcopc · · Score: 1

      I've got an ancient Promise FastTrak66 in my desktop PC. I can attest to it being a potential source of problems. I haven't had any specific crashes since installing it, but I can tell it's not playing nice with IRQs and whatnot (i.e. the mouse locks hard for a moment when doing big-time disk thrashing). I could see this causing problems with PCI cards (network adapters / other raid controllers). Luckily for me, the only other card in my rig is the AGP video, and games usually don't thrash during fps-sensitive action sequences.

      But yeah, Promise raid controllers are cheap. They work well, but they're certainly not the greatest things on earth. I've been lusting after a 3Ware card for some time .. now there's a pro-quality ata-raid controller.

      --
      -Billco, Fnarg.com
    2. Re:Major problems with Promise RAID controllers. by Anonymous Coward · · Score: 0

      You're using Promise crap.

      Then you start using Promise crap on PC Chips motherboards.

      Then you start Promise crap on PC Chips boards with VIA chipset motherboards.

      Any questions?

    3. Re:Major problems with Promise RAID controllers. by GoRK · · Score: 3, Informative

      There is a very important thing that you have not realized...

      Those are not really true hardware RAID controllers. They are regular hacked up IDE controllers with a bit of BIOS firmware on them that handles software RAID via INT13 until the OS loads and the software RAID in the "driver" can take over.

      They offer nothing that a legitimate hardware raid setup should give you such as cache RAM or CPU offloading. Mirrored setups on these types of pseudo-hardware RAID controllers HURTS PERFORMANCE. Don't believe me? Benchmark it yourself versus software raid and hardware raid on a real controller such as Adaptec AAA or 3ware...

    4. Re:Major problems with Promise RAID controllers. by drsmithy · · Score: 1
      They offer nothing that a legitimate hardware raid setup should give you such as cache RAM or CPU offloading.

      Actually, they do provide one particularly useful feature and that is to present the RAIDed disks to the OS (and the BIOS) as a single device.

      Mirrored setups on these types of pseudo-hardware RAID controllers HURTS PERFORMANCE.

      The software overhead for RAID1 should be, for all intents and purposes, insignificant. It just doesn't *do* anything that requires much CPU work.

    5. Re:Major problems with Promise RAID controllers. by ivan256 · · Score: 1

      The software overhead for RAID1 should be, for all intents and purposes, insignificant. It just doesn't *do* anything that requires much CPU work.

      What it does is blocking. A good hardware raid mirror will have a battery backed up cache so it can acknowledge the write as successful either immediately or after the data is on one disk, which a software raid setup can't do reliably. What you end up with is the additive rotational latency for two disks, which can significantly hurt performance for small random access writes.
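
      A rough model of the rotational-latency point: if each disk's rotational delay is uniform between zero and one full revolution, one disk averages half a revolution, and waiting on two disks costs more. The sketch below compares the "additive" case described above with a case where both writes are issued in parallel and the slower one is awaited. That second case is an assumption added for comparison; it uses the standard result that the expected maximum of two independent uniform delays is two thirds of the range. Seek time and caching are ignored.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def single_disk_ms(rpm: float) -> float:
    """Average rotational delay for one disk: half a revolution."""
    return revolution_ms(rpm) / 2


def mirrored_serial_ms(rpm: float) -> float:
    """Average delay if the two writes are done one after the other."""
    return 2 * single_disk_ms(rpm)


def mirrored_parallel_ms(rpm: float) -> float:
    """Average delay if both writes start together and we wait for the
    slower one: expected max of two uniform delays = 2/3 revolution."""
    return revolution_ms(rpm) * 2 / 3


if __name__ == "__main__":
    rpm = 7200  # assumed spindle speed
    print(f"single disk     : {single_disk_ms(rpm):.2f} ms")
    print(f"mirror, serial  : {mirrored_serial_ms(rpm):.2f} ms")
    print(f"mirror, parallel: {mirrored_parallel_ms(rpm):.2f} ms")
```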

      Of course, in many cases this is perfectly acceptable.

    6. Re:Major problems with Promise RAID controllers. by GoRK · · Score: 1

      Actually, they do provide one particularly useful feature and that is to present the RAIDed disks to the OS (and the BIOS) as a single device.

      The int13 firmware does this for the code before the OS loads and the driver is responsible for doing this job afterwards. Don't let this software trickery fool you. Behind the veil of the driver, the system software is reading and writing to the two disks individually. The int13 stuff is a nice trick, but it's only necessary due to the inability to replace the OS's bootloader, otherwise they probably wouldn't bother with the difficult task of writing such firmware.

      The software overhead for RAID1 should be, for all intents and purposes, insignificant. It just doesn't *do* anything that requires much CPU work.

      Well, it does at least twice the disk IO, plus it's (hopefully) doing consistency checking by comparing the two data streams for discrepancies. The overhead of the IO is small because it's being done via DMA, but the overhead of the host cpu comparing the data during reads is quite a bit more involved. A quick glance at the linux driver for these cards will show you that there's not even any cpu offloading for this extremely simple operation -- ie how hard would it be to have the controller "read x bytes off of both of these disks and compare them and put them in this memory location or else raise an error if they don't match" -- but they can't. They have to be asked to read the data individually from each disk and then the CPU has to do the comparison. This, comparatively, is a lot of overhead.

      I'm not trying to bash these products. I have used them from time to time as appropriate... but I'm sick of people putting them into situations where they don't belong. There is a reason they are so damn cheap.. they do pretty much nothing!

    7. Re:Major problems with Promise RAID controllers. by drsmithy · · Score: 1
      The int13 firmware does this for the code before the OS loads and the driver is responsible for doing this job afterwards. Don't let this software trickery fool you. Behind the veil of the driver, the system software is reading and writing to the two disks individually. The int13 stuff is a nice trick, but it's only necessary due to the inability to replace the OS's bootloader, otherwise they probably wouldn't bother with the difficult task of writing such firmware.

      I'm well aware of how the "trickery" works - it isn't "fooling me" in the slightest. The fact is it performs a useful function by not having to worry about booting weirdness/problems should the boot drive keel over.

      Well, it does at least twice the disk IO, plus it's (hopefully) doing consistency checking by comparing the two data streams for discrepancies.

      The IO overhead will (should, at least) be the same whether it's hardware RAID or software RAID.

      The overhead of the IO is small because it's being done via DMA, but the overhead of the host cpu comparing the data during reads is quite a bit more involved. A quick glance at the linux driver for these cards will show you that there's not even any cpu offloading for this extremely simple operation -- ie how hard would it be to have the controller "read x bytes off of both of these disks and compare them and put them in this memory location or else raise an error if they don't match" -- but they can't. They have to be asked to read the data individually from each disk and then the CPU has to do the comparison. This, comparatively, is a lot of overhead.

      Compared to what, though ? OS-level software RAID is going to have to do precisely the same thing and IMHO the processing involved, taken in the context of modern, fast CPUs, is insignificant.

    8. Re:Major problems with Promise RAID controllers. by Anonymous Coward · · Score: 0

      Can hardware RAID-1 controllers do read-balancing at the same level as software RAID-1?

      Two processes in Linux (when using software RAID) can access two different files, and read each file off separate disks. It also reads from the disk which has its head closest to the data. In certain cases this provides much better read performance than RAID-0.
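
      The "closest head" idea can be sketched in a few lines. This is only an illustration of the selection rule, not the Linux md driver's actual code. The head positions are hypothetical sector numbers that a driver would have to track from the last request sent to each disk.

```python
def pick_mirror(head_positions: list[int], target_sector: int) -> int:
    """Index of the mirror whose last known head position is nearest
    the target sector (ties go to the lowest index)."""
    return min(range(len(head_positions)),
               key=lambda i: abs(head_positions[i] - target_sector))


def serve_reads(head_positions: list[int], requests: list[int]) -> list[tuple[int, int]]:
    """Dispatch reads one at a time, updating the chosen disk's head
    position, and return (sector, disk index) pairs."""
    heads = list(head_positions)
    plan = []
    for sector in requests:
        disk = pick_mirror(heads, sector)
        heads[disk] = sector
        plan.append((sector, disk))
    return plan


if __name__ == "__main__":
    # Two processes reading two files at opposite ends of a mirrored pair.
    for sector, disk in serve_reads([0, 1000], [10, 990, 20, 980]):
        print(f"sector {sector:>4} -> disk {disk}")
```

      With two readers working at opposite ends of the volume, each disk ends up serving one of them and neither head has to travel far.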

    9. Re:Major problems with Promise RAID controllers. by GoRK · · Score: 2, Informative

      The IO overhead will (should, at least) be the same whether it's hardware RAID or software RAID.

      On a real hardware raid controller this overhead exists only on the controller CPU (normally an i960 or somesuch) and is further alleviated by the cache ram on the card.

      Compared to what, though ? OS-level software RAID is going to have to do precisely the same thing and IMHO the processing involved, taken in the context of modern, fast CPUs, is insignificant.

      Well, I wasn't trying to compare promise/hpt/et al. to software raid, but if the overhead of any kind of host-cpu based raid were really actually insignificant as you claim, then I guess we are all real suckers for plunking down hundreds or thousands of dollars for RAID cards.

      The point is that any extra overhead whatsoever on the CPU dealing with the disk is very often unacceptable. The disk subsystem is pretty well the slowest component in any system, and having the host CPU wait around on it all the time can be a real performance killer. Take an example of building a workstation to edit HD video. This will normally use RAID 0 if it is a capturing machine or sometimes RAID 1. Build one -- or better yet build three - one using software raid, one using 'hardware-assisted' raid, and one using a genuine hardware controller.

      The kind of thinking you are doing is the kind of thinking that leads to bloated software. The idea that the CPU is "fast enough" that efficiency doesn't matter might be fine for the desktop in most cases, but on a server could mean the difference between supporting 3000 and 6000 users.

    10. Re:Major problems with Promise RAID controllers. by GoRK · · Score: 1

      Is this a genuine question or are you trolling under the assumption that this is a problem that hardware designers didn't think about?

      Of course they can! Data access to disk is never necessarily serialized at any point in the process. Even the drive firmware itself tries to read requested blocks in the most optimized order possible.

      On SCSI, this feature is called command queueing. It's present in the SATA specification also, but is not a part of IDE or ATA. It's kind of a 'fire and forget' thing where you ask for the controller to do a bunch of stuff (and this can be either in arbitrary or sequential order) and it gets back to you whenever it's done with something.

      You will notice that drivers for these 'hardware assisted' RAID controllers from Promise and the like are implemented as SCSI devices for this precise reason, although the command queueing is still taking place inside the software driver. SATA controllers appear as SCSI devices in Windows for similar reasons also. The IDE subsystem of Windows is not equipped to deal with some of these things but the SCSI subsystem always has been...

    11. Re:Major problems with Promise RAID controllers. by Anonymous Coward · · Score: 0

      This is on a lower level than the RAID you are using, but we are having major problems with 10 Promise Technology TX2000 mirroring RAID controllers that we bought. The mirrors go critical for no detectable reason. Promise Technology technical support is unable to find the problem, and the company is unwilling to escalate the issue. The Promise Technology technicians escalate the issue, but 2nd level technical support never calls back.

      Sounds like shitty power-supply problems. Seriously. Try doing the mirrors with lower-power drives (e.g. 5400rpm or even 2.5" laptop drives).

    12. Re:Major problems with Promise RAID controllers. by treat · · Score: 1
      Well, it does at least twice the disk IO, plus it's (hopefully) doing consistency checking by comparing the two data streams for discrepancies.

      No commonly used software RAID does this.

  9. Sounds like a design problem on your end. by stienman · · Score: 4, Informative

    It's hard for me to believe that RAID causes more downtime than single drive setups, unless you have a really bad raid system and a really good backup system.

    The only time RAID should ever be down, is during initial setup. Thereafter you should replace bad drives while it's running and you should never have cause to shut it down due to a RAID issue.

    If you are experiencing RAID hardware problems then take a good look into these areas:
    RAID Hardware --> Are you using cheap stuff? It honestly isn't worth it. Perhaps you're just discovering the 'real' value of 'cheap' hardware.
    RAID Software --> If you're using unsupported drivers (i.e., the vendor doesn't supply or support them) then ditch the hardware and get hardware with supported drivers - make sure they support them on your configuration. You've already proven that you can't support them yourself.
    System Hardware --> If the system is generally cheap (cheap power, bad airflow, cheap components, etc) then you simply can't expect the RAID card to work 24/7.
    Server Room --> Make certain your server room can handle the power and ventilation needs of the servers. This should go without saying, but all too often it is the problem.

    The reason people go with cheap components is the lower initial cost. They only work for a few thousand hours of heavy operation. You must get server rated components if you want them to operate for more than a year or two. There really is a difference.

    Lastly, I use 20+ Promise FastTrak ATA RAID cards in 20+ Novell networks. I use cheap components, and they work in harsh conditions. They are not set up for hot-swap, as that's not a need in this situation. I have to replace the cheap hardware every 2-4 years, power supplies every year, hard drives every 2-3 years. The only time the RAID cards have gone bad is when a power supply failure (usually due to a power outage/surge/brownout) fries the motherboard and usually most of the components in the case.

    I have never had a failure where both HDs completely failed simultaneously, though usually when the rest of the computer goes I replace the whole thing and get the data off one of the old hard drives. This is not an advertisement for Promise. They simply are the only ones with supported Novell 3.12 drivers. :-) Soon to go away... :-(

    I'd be surprised if you've covered all these bases and are still having problems.

    -Adam

    1. Re:Sounds like a design problem on your end. by gremlin_591002 · · Score: 1

      That's a nifty theory, but kind of off base. I've had several Dell servers using Adaptec RAID cards that ended up with big downtime issues. The RAID card used a button battery for drive info backup. The battery failed, and the very first time that the UPS wasn't able to keep up with a long power outage, the server shut down and never came back. I had to restore from tape after building the array again. This was about 3 years back, so I'm hoping that somebody made a design change. Anyway, it happened to three of my servers on the same day (three different clients, same power outage). That was a really crappy week. RAID controllers are complicated beasts. They fail. Maybe flash memory would be a better choice for drive array info.

    2. Re:Sounds like a design problem on your end. by stienman · · Score: 1

      So you didn't replace the batteries on a regular basis? Sounds like a maintenance issue to me.

      The cards I use store array info on the hard drives themselves. I can move the hard drives from one card to another without reconfiguring anything.

      Of course, I would suggest staying away from RAID cards that use batteries...

      -Adam

    3. Re:Sounds like a design problem on your end. by gremlin_591002 · · Score: 1

      All this stuff is way in the past, but the machines had been in service for less than six months. The only time I know that they really worked was the day that I finished building them in the shop and transported them to the client site. The RAID cards in question were replaced as soon as possible.

    4. Re:Sounds like a design problem on your end. by Anonymous Coward · · Score: 0

      Sorry for the AC post but you know how it is.

      Batteries are used in some not-so-cheap environments. Ever heard of an AS/400? They have batteries on their cache cards and it's not all that hard to screw up the replacement. The end result: 58 hours of downtime. OOPS!

      Moral of the story: Cards with batteries are EVIL!

  10. Multi-engine aircraft by wowbagger · · Score: 3, Insightful

    There is an old saw in the aviation industry: "A twin engine aircraft will have twice as many engine problems as a single engine aircraft."

    However, which would you rather be in, a twin engine aircraft that just lost one engine, or a single engine aircraft that just lost an engine?

    Yes, RAID cards die - I've been shocked at how often that happens. And 5 disk RAID will have more failures than a 4 disk JBOD (just a bunch of disks) array.

    But the question is, are you seeing a reduction in UPTIME, or just in mean time to failure? Maybe the RAID system throws an error once a month and the JBOD system throws an error every two months, but if you can recover in 5 minutes by swapping cards or drives rather than 5 hours for restoring the JBOD from backup, you are better off.
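
    A rough sketch of that arithmetic, using only the illustrative numbers above (an error a month with a 5-minute recovery versus an error every two months with a 5-hour restore). The function names are made up for this example, and the figures are hypothetical, not measurements:

```python
def yearly_downtime_hours(failures_per_year, hours_per_recovery):
    """Expected hours of downtime per year for a given failure rate."""
    return failures_per_year * hours_per_recovery


def availability(downtime_hours, hours_per_year=365 * 24):
    """Fraction of the year the system is up."""
    return 1 - downtime_hours / hours_per_year


# RAID: fails twice as often, but recovery is a 5-minute swap.
raid_down = yearly_downtime_hours(failures_per_year=12, hours_per_recovery=5 / 60)
# JBOD: fails half as often, but each failure is a 5-hour restore.
jbod_down = yearly_downtime_hours(failures_per_year=6, hours_per_recovery=5)

print(f"RAID: {raid_down:.1f} h/year down, {availability(raid_down):.4%} available")
print(f"JBOD: {jbod_down:.1f} h/year down, {availability(jbod_down):.4%} available")
```

    With these made-up numbers the RAID box is down about 1 hour a year and the JBOD box about 30, even though the RAID box "fails" twice as often.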

    Perhaps what you might look at would be using RAID software on the server's processor, coupled with Firewire drive bays, disks, and multiple Firewire cards. If you have a card die, move the disks to another card until you can schedule downtime. A disk dies, hot-swap and rebuild in background.

    1. Re:Multi-engine aircraft by dourk · · Score: 1

      However, which would you rather be in, a twin engine aircraft that just lost one engine, or a single engine aircraft that just lost an engine?

      That depends, can the twin engine plane successfully land with only one engine running?

      --
      Wake up.
  11. Hardware, Configs, Backups by duffbeer703 · · Score: 4, Informative

    The answer is SysAdmin 101 stuff.

    1. Buy quality hardware.

    IDE RAID for critical servers is a bad idea.

    In my experience, RAID hardware tends to be very picky and suffers from subtle and often bizarre hardware conflicts. In general, using a RAID solution that is packaged with the hardware is the best idea.

    If you cannot afford good RAID hardware, stick to conventional JBOD configurations.

    2. Configuration

    Design the configuration of your systems around consistency first, performance second.

    You need to document your procedures for building servers, allocating storage, etc. Create scripts whenever possible.

    If you are not confident that you could talk a marginally qualified technician through a server rebuild over the phone, your docs aren't good enough. If you don't have the time to write docs, make the time or work late.

    3. Backups

    You need documented, tested backup AND restore procedures. All of your oncall staff need to be able to restore a server.

    With 50 servers, disk controller or disk failures should not be a common event. We work with approximately 400 datacenter and 200 field servers (varying in age from 1-9 years), and replaced 3 controllers and 19 disks last year.

    Look for electrical issues, you may have crappy electrical service.
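
    To put those numbers in proportion, here is a back-of-the-envelope sketch that scales the counts above (3 controllers and 19 disks replaced in a year across roughly 600 servers) down to a 50-server shop. It assumes failures scale linearly with server count and that the two fleets are comparable, which is a simplification; the function name is made up for the example:

```python
def expected_failures(observed_failures, observed_servers, your_servers):
    """Scale an observed yearly failure count linearly by fleet size."""
    return observed_failures / observed_servers * your_servers


observed_fleet = 400 + 200  # datacenter + field servers from the comment above

controllers = expected_failures(3, observed_fleet, 50)
disks = expected_failures(19, observed_fleet, 50)

print(f"Expected controller replacements per year: {controllers:.2f}")
print(f"Expected disk replacements per year:       {disks:.2f}")
```

    On that basis a 50-server shop would expect a controller replacement once every few years and a disk or two a year. Seeing far more than that is a hint to look at power, cooling, or the hardware choice.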

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
    1. Re:Hardware, Configs, Backups by Zeriel · · Score: 1

      Speaking as a small-time sysadmin myself, I disagree on principle--ATA RAID for critical servers becomes a great idea once you realize that you can buy two 1TB-usable RAID5 machines with 3Ware ATA RAID controllers for less than the price of a single 1TB-usable RAID5 SCSI machine.

      Granted, I'm fully in agreement with you--SCSI is more reliable and better for processing servers etc. But when it comes down to a cost-effective way to get a hell of a lot of disk, I can heartily recommend the 3Ware Escalade stuff. Just make sure you build it with redundant servers in mind or you buy their hot-swapping gear. And don't use it with older/smaller Western Digital drives--there's a bug. =P

      --
      "America has done some terrible things. But I know that Americans don't cheer when innocents die." -Dave Barry
  12. What Promise FastTrak RAID controllers? by Futurepower(R) · · Score: 1


    What Promise FastTrak RAID controllers are you using? As I said in the comment just above yours, we are having major problems with FastTrak Tx2000 mirroring controllers.

    1. Re:What Promise FastTrak RAID controllers? by stienman · · Score: 1

      Just the cheap FastTrak tx100. Some of the older FastTrak66 cards as well.

      -Adam

  13. What RAID controllers would you recommend? by Futurepower(R) · · Score: 1


    What RAID controllers would you recommend?

    What hardware is "quality"?

    1. Re:What RAID controllers would you recommend? by duffbeer703 · · Score: 1

      Where I work now we mostly use IBM hardware with ServeRAID controllers.

      In the past I've worked with Compaq hardware, which I believe shipped with Symbios controllers. (Been a while)

      Sun storage was usually good, except that we tended to get a lot of flaky GBICs and cabling from them.

      "Certified" hardware really is important, especially in larger environments, where wasted time is more expensive than buying the vendor's recommended hardware. A good example is when, due to a supply shortage, we ordered a different vendor's NIC for a bunch of servers. No big deal, right?

      Wrong. The NICs didn't play nice with switches in the field for some bizarre reason, and hundreds of hours of staff time were wasted fixing the problem.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
    2. Re:What RAID controllers would you recommend? by BigDish · · Score: 1

      3Ware Escalade series. Relatively inexpensive, rock solid, vendor Linux support.

  14. The most trouble I ever had with raid... by MoOsEb0y · · Score: 5, Interesting

    I spent the past week and a half trying to set up a 4x160 SATA RAID-5. It was a huge exercise in frustration because every time I'd try to build a volume, my machine would promptly freeze after a few percent. I changed out IDE emulation for SCSI emulation in the kernel... same thing... I changed SATA controllers, same thing. I changed SATA cables, same thing. I changed power supplies, same thing. I added 4 80 mm case fans, same thing. In the end, it turned out that the culprit was raidtools. Nobody had ever bothered to post that raid-5 + raidtools + kernel 2.6 locks up a computer. I changed to mdadm, and I had a working array 50 minutes later.

  15. Storage Cluster by Ratso+Baggins · · Score: 2, Interesting

    If your bandwidth requirements are not too high you may be able to use a distributed file system on many redundant (cheap IDE & gigabit Ethernet) nodes and allow for replacements. Your uptime should be constant, given enough UPS and redundancy of nodes.

    --
    "we live in a post-ideological world..." - Billy Bragg.

  16. Look at Google by Bruha · · Score: 2, Interesting

    Their systems are probably 80% auctioned desktops and such from busted dot coms.. and I suspect that many of them are not RAID at all. I have yet to hear of a redundant raid controller either. Your best bet is just replication of data on your backend servers and using something in the nature of a Cisco CSS or some other services balancer device to handle keeping alive servers available while redirecting away from dead servers.

    You can still do RAID with this setup but you'd have the added security of 2 or more systems making up your entire functional system so if one is down the other can continue normally. Then it's trivial to repair the dead machine and bring it back into the cluster.

    1. Re:Look at Google by toast0 · · Score: 1

      Here's a description of the redundant raid controller I'm familiar with.

      The Compaq ProLiant 7000 (Xeon) I've got came with a Smart Array 3100ES, which does RAID on the 3 hot-swap SCSI cages, and there's a special slot on the I/O board for a second 3100ES, so that if one dies, I can just hot-swap in a spare. (the two slots pci-x plus 3 channels of SCSI that go to SCSI cages, and a fourth SCSI channel for controller-to-controller communication)

    2. Re:Look at Google by Alex · · Score: 1

      I have yet to hear of a redundant raid controller either.

      Well you obviously don't know anything about proper RAID then do you? All enterprise storage costing 25k+ at least has this option.

      The normal configuration is an array with 2 controllers in it. You create LUNs and assign them to the primary + secondary controller. The primary + secondary controllers have a heartbeat, which ensures one takes over the other's configuration if it fails. You dual attach your host to each controller and set up I/O multipathing software (vxdmp / mpxio on Solaris). You send I/Os down both paths, and the active controller receives them. If the active controller fails, the secondary one takes over; depending on the quality of the hardware you will get a sub 5 second pause, then I/Os will continue, and your apps shouldn't even notice. On some (more expensive) hardware both controllers simultaneously handle I/Os for the LUNs; with these you get zero outage if a controller fails.
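
      As a toy illustration of that active/standby behaviour (not how any real array firmware or multipathing driver works; every class and method name here is hypothetical), the failover logic can be sketched like this:

```python
class Controller:
    """One RAID controller; `alive` flips to False when it dies."""

    def __init__(self, name):
        self.name = name
        self.alive = True


class DualControllerArray:
    """Toy active/standby pair: I/O goes to the active controller."""

    def __init__(self, primary, secondary):
        self.active = primary
        self.standby = secondary

    def heartbeat(self):
        """Promote the standby if the active controller has stopped responding."""
        if not self.active.alive and self.standby.alive:
            self.active, self.standby = self.standby, self.active

    def submit_io(self, request):
        """Route one request; fail over first if the active path is dead."""
        self.heartbeat()
        if not self.active.alive:
            raise IOError("both controllers are down")
        return f"{request} handled by {self.active.name}"


array = DualControllerArray(Controller("A"), Controller("B"))
first = array.submit_io("write block 1")
array.active.alive = False  # simulate the active controller dying
second = array.submit_io("write block 2")
print(first)
print(second)
```

      The host keeps calling the same `submit_io`; only the controller behind it changes, which is the point of the dual-attached setup described above.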

      The cheapest dual redundant RAID controllers I've seen are these for about 10k all up, these are rebadged Infortrend devices, so something similar should be available wherever you are. FYI their GUI appears VERY similar to Sun's 3510FC, so much so that I think it is the same chassis - but with different disks (SATA vs SCSI).

      If you want REAL enterprise storage - e.g. HDS - you get RAID5 or 1 LUNs presented by the system, which actually consist of 4 disks, each disk on a separate RAID controller. These systems come with a data integrity guarantee worth a significant amount of money.

      Alex

    3. Re:Look at Google by Anonymous Coward · · Score: 0

      Google never writes to the filesystem -- it's all in memory and temporary. They only use the disk to boot the system. Hardly anyone is like Google.

      You can bet they'll be using RAID (etc) for the GMail service.

    4. Re:Look at Google by dubl-u · · Score: 2, Informative

      Google never writes to the filesystem -- it's all in memory and temporary. They only use the disk to boot the system.

      For their production service, I understand that they keep it all in memory. But it's hardly temporary.

      Hardly anyone is like Google.

      For now. Google was one of the first companies to take advantage of the fact that RAM and processing power have become ridiculously cheap. SQL databases arose in an era when 32k was a fair bit of RAM, and where a business computer was one or more refrigerator-sized units kept in a sacred temple.

      Now computers are cheap and disposable. I can fill a rack with cheap 1Us and get processing power that Sun can't match at 10 times the price. The only trick is to write your apps in such a way that you can tolerate hardware failures. That's a little hard, but it paid off handsomely for Google. Others will learn this trick.

      You can bet they'll be using RAID (etc) for the GMail service.

      You'd lose that bet. They already have built their own distributed network filesystem, GFS, that holds at least hundreds of terabytes. It has performance and reliability levels well above any RAID installation I've ever heard of, and it uses cheap commodity hardware to do it. I'd bet that GMail will be built on top of a variant of GFS or some other in-house technology.

    5. Re:Look at Google by j-turkey · · Score: 1
      Well you obviously don't know anything about proper RAID then do you?

      Hey...be nice.

      The cheapest dual redundant RAID controllers I've seen are these for about 10k all up, these are rebadged Infortrend devices, so something similar should be available wherever you are.

      Dell sells some cheapie dual/redundant controllers for well under $10K -- I know that they're available in their tower servers for sure.

      --

      -Turkey

    6. Re:Look at Google by justins · · Score: 1
      Their systems are probably 80% auctioned desktops and such from busted dot coms.. and I suspect that many of them are not RAID at all. I have yet to hear of a redundant raid controller either. Your best bet is just replication of data on your backend servers and using something in the nature of a Cisco CSS or some other services balancer device to handle keeping alive servers available while redirecting away from dead servers.

      There are whole classes of applications where that can't possibly work. If you're talking about web servers it's great.
      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
  17. just unlucky by jamesh · · Score: 1

    All things being equal in terms of build quality, the thing most likely to fail is the thing with the most moving bits.

    You say you've had more raid controller failures than disk failures. Did any of the raid controller failures require a restore from backup? A non-redundant-disk failure would have.

    Add up the total time you were down due to raid controller failures and the total time you would have been down for disk failures if you didn't have raid. That's a better measure than instances of failure.

  18. PC Chips motherboards? How so? by Futurepower(R) · · Score: 1

    PC Chips motherboards? How so?

    What do you recommend?

    1. Re:PC Chips motherboards? How so? by Anonymous Coward · · Score: 0

      ECS == PC Chips, just a few name changes in the middle.

    2. Re:PC Chips motherboards? How so? by ted_nugent · · Score: 1

      Use top tier equipment for your mission critical servers. They're better made and better supported. In the i386 space, that means IBM, HP, or Dell. You can get a nice little 2-way middle range server from any of the above that provides for redundant disks and power for ~5-7k.

      Most business people will see that hardware is dirt cheap compared to downtime.

      --

      Free the West Memphis Three!

  19. RAID Alone != good design by photon317 · · Score: 4, Informative


    You can't slap a buzzword like RAID onto whatever you were doing before and expect results. Reliable systems have to be carefully engineered correctly.

    From the sound of your posting, I'm assuming when you say you're using RAID, you mean internal RAID cards inside a server with internal disks attached, and relatively small amounts of it. In these types of scenarios, the highest performing, most reliable, and most cost effective option is to put two separate SCSI controllers in your boxes, buy twice as much storage as you need, and mirror between the controllers using the OS's software mirroring capabilities. You are now independent of controller failure, the controllers themselves are less likely to "fail" (which doesn't always mean hardware frying) than a complex RAID controller by their simpler nature, and you're getting the performance benefit of full mirroring instead of that clunky raid5 business. If you have enough storage to warrant four or more internal disks of some size, use mirror+striping. Always mirror at the lowest level, and then stripe on top of that (in a 4 disk design actually it doesn't matter which way you layer them, but in 6+ disk designs it gives higher data availability in the unlikely event of multiple disk failures). Or in other words - raid5 and hardware cards = bad, mirroring/striping + software raid = good.

    Your goal is not to be buzzword compliant by slapping in a raid controller, your goal is to carefully analyze your systems, your options, your requirements, and your budget, and eliminate single points of failure everywhere that it's feasible and desirable to do so, starting with the lowest MTBF items in the system and working your way up. There are no magic bullet answers of course - change the situation and the "right" answer can change dramatically.

    --
    11*43+456^2
    1. Re:RAID Alone != good design by greck · · Score: 0, Troll

      That last paragraph just about made me cry... it's so very, very nice to see that someone still gets it.

    2. Re:RAID Alone != good design by Anonymous Coward · · Score: 0

      in a 4 disk design actually it doesn't matter which way you layer them

      Yes it does. If you stripe disks 1 & 2, and stripe disks 3 & 4, then mirror them, you can only tolerate one disk failing before data loss.

      If you mirror disks 1 & 2, and mirror disks 3 & 4, then stripe them, you might tolerate two disks failing before data loss. If disk 1 fails, disk 3 or 4 can fail without data loss.

    3. Re:RAID Alone != good design by photon317 · · Score: 1


      You're right of course, I was just on a run there typing quicker than I was thinking. Striping first means you can only tolerate a single disk loss, period. Mirroring first means that after the first disk loss, there's only a 33% chance the 2nd disk lost will cause data loss. So yeah, even in a 4-disk setup, the same rules apply - mirror at the lowest level for better availability :)
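
      The 4-disk case is small enough to enumerate. The sketch below counts which two-disk failures lose data under each layering. It assumes an idealised model where the array survives as long as one complete copy of the data is still readable; under that model a stripe-first array also survives the case where both failures land in the same (already dead) stripe, which is slightly more generous than "one disk, period". The function names are made up for the example:

```python
from itertools import combinations

DISKS = (1, 2, 3, 4)


def raid10_survives(failed):
    """Mirror pairs (1,2) and (3,4), striped: every mirror needs one live disk."""
    return all(any(d not in failed for d in pair) for pair in ((1, 2), (3, 4)))


def raid01_survives(failed):
    """Stripes (1,2) and (3,4), mirrored: one stripe must be fully intact."""
    return any(all(d not in failed for d in stripe) for stripe in ((1, 2), (3, 4)))


def fatal_fraction(survives):
    """Share of two-disk failure combinations that lose data."""
    pairs = list(combinations(DISKS, 2))
    fatal = [p for p in pairs if not survives(set(p))]
    return len(fatal) / len(pairs)


print(f"mirror first (RAID 10):  {fatal_fraction(raid10_survives):.0%} of double failures are fatal")
print(f"stripe first (RAID 0+1): {fatal_fraction(raid01_survives):.0%} of double failures are fatal")
```

      Mirror-first loses data in 2 of the 6 possible double failures (the 33% above); stripe-first loses data in 4 of 6, even under this generous model.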

      --
      11*43+456^2
  20. Do you have a link? by Futurepower(R) · · Score: 1


    The performance we get with Promise controllers (when they work) has been satisfactory. The application is a cash register; the computer is always faster than the operator. We only need a mirror copy of our data.

    3Ware told me they cannot boot from one drive, after one fails. A 3ware formatted drive cannot boot from the IDE controller on the motherboard. Promise can do both. We need features, not performance, in this case.

    Do you have a link to an Adaptec IDE mirroring RAID controller you would recommend?

    The Adaptec ATA RAID 1200A is about $55, about $30 less than the price of the Promise controllers. We have no experience with them; I found the info by Googling and Froogling.

    1. Re:Do you have a link? by delus10n0 · · Score: 1

      I don't understand why you'd need to boot from a RAID'd drive disconnected from the RAID itself; if the RAID is in RAID5 format or something, this would be impossible anyhow! Unless you're just RAID1-ing everything??

      Even then, why can't you just plug into a 3ware again and get the data off the drive?

      The Adaptec you linked to is another one of those software-driven RAID cards, and offers no real value.

      You're going to have to spend $200-$300+ to get a decent RAID hardware card, and then make sure you have a server (read: 64-bit/66MHz PCI bus) motherboard to handle it.

      --
      Not All Who Wander Are Lost
    2. Re:Do you have a link? by swmccracken · · Score: 1

      You never know what kind of crazy situation you're in. What if you have no spare 3ware cards with you? What if it's 10 years later and 3ware IDE cards are ancient history? What if you want to recover data off a recovered drive for some reason and you just want to put it in a USB box?

      It's a nice backstop that Promise "arrays" are still accessible with conventional hardware.

      This author is talking about cash registers - where they're likely out and about all day from place to place doing a service run with spares in the back of the car.

      And, yes, RAID-1 - as the poster said, mirroring. For many applications, mirroring is perfectly fine - it's best to mirror the boot disc.

    3. Re:Do you have a link? by Anonymous Coward · · Score: 0

      3ware boots just fine after a raid1 failure.

      I've even intentionally done the following to a 2 disk raid1 with a 2 port controller to increase disk space. I didn't have a spare 3ware card at the time.

      1. Pull both drives from the system.
      2. Create a new second array with the new drives
      3. Pull one of the second array's drives.
      4. Put one of the first array's disks back on the array.
      5. Boot up. At this point, you're running two degraded raid1 arrays.
      6. Copy from the one array to the other.
      7. Pull the original drive, replace with the larger drive from the new array.
      8. Rebuild the array.

      Worked just fine. As well, the raid5 arrays boot just fine without all the drives being live.

    4. Re:Do you have a link? by kasperd · · Score: 1

      What if it's 10 years later and 3ware IDE cards are ancient history?

      That is my primary reason for using software raid. The other reasons being that a raid card is much more expensive than an ordinary IDE controller, and I have read more than once, that it is really still software raid. My setup with three 120GB disks and identical partitioning of all three disks goes like this. One 31MB /boot partition, one 31MB FAT partition (just in case), one 627MB partition for /, one 2GB partition for swap, and the rest for one large filesystem that I can bindmount on /usr, /home, and other places I need it. The /boot filesystem is not raid, but I have a copy of it on each disk. The root filesystem is raid-1 (on three disks, that should be very safe), and the rest is raid-5, which give me one 289GB filesystem.

      --

      Do you care about the security of your wireless mouse?
    5. Re:Do you have a link? by swmccracken · · Score: 1

      Well.. yep, for 10 years in the future, Promise and Software would have the same "readability".

      We're using Windows 2000 & 2003, and trust me, it's simpler to use a Promise card to mirror the boot volume than to use 2k's software mirroring. Recently moved to SATA based drives.

      There are advantages to the Promise cards still - with the enclosures, you get hot-swap and you get status LEDs and what not. (We're hoping to be able to say to the person on the phone, "which one has the orange light?")

  21. Adaptec ATA RAID 1200A = HighPoint RocketRaid 133? by Futurepower(R) · · Score: 1


    Question: Are Adaptec ATA RAID 1200A cards the same as HighPoint RocketRaid 133 cards? I notice the BIOS setup screens look identical.

  22. There is a reason EMC and others charge $$$ by Anonymous Coward · · Score: 1

    If you want uptime for an enterprise, you have to use enterprise class storage products, or distribute the data the way Google does. There is a reason EMC and Hitachi and others can charge what they do for storage - you can't match the performance, uptime and features.

    Shadow copies? Look at SnapView and SANCopy in EMC's CLARiiON line - no downtime to create a copy. I would expect Hitachi and others to have similar features. There are a lot of used EMC disk arrays on eBay and other places - just make sure you can re-license the software.

    We have small "hand-built" RAID arrays in our lab totalling about 1TB, and 20+TB of various kinds of EMC disk. The 1TB of cheap RAID is more work to maintain than all the EMC disk put together.

  23. Fibre Channel and the Xserve RAID by caseih · · Score: 4, Informative

    I don't see why setting up the RAIDs under any OS should be more time consuming than on other OSs. Certainly if you use the right hardware-based RAID things should be very simple and very fast.

    Bang for the buck, you can't beat the Apple Xserve RAID. They are IDE, but almost as fast as the fastest SCSI arrays, and seem to be very reliable. The array can be easily partitioned into a variety of RAID types with hot spares. The unit can then connect to Windows or Linux via a standard fibre channel interface and look like simple SCSI drives. The RAID is administered via an ethernet connection using a nice Java GUI tool.

    We set our Xserve RAIDs up such that each array (each Xserve RAID box has 2 arrays with separate controller logic for each) is RAID 5 plus a hot spare, and then the array is mirrored with the other one. This gives us .8 TB or so at a very reasonable price, and it is very reliable. So far it has worked well.

    1. Re:Fibre Channel and the Xserve RAID by justins · · Score: 1
      Bang for the buck, you can't beat the Apple Xserve RAID.

      Yes you can. Easily. Shop around even a little, you'd have to work pretty hard to find an ATA-based solution as expensive as theirs.
      --
      Now before I get modded down, I be to remind whoever might read this that what I am saying is FACT. - bogaboga
  24. Netblock device RAID? by Halvard · · Score: 1

    Perhaps doing RAID over network block devices would solve your reliability problem. NBD is designed for RAID, you distribute over partitions that are physically separate from each other on different machines and segments, you can do heartbeat, etc. Don't assume that this is necessarily the "cheap way out with cheap hardware". You can do this with fast hardware that's backed by hardware RAID too and use it in a network RAID 0, 1, or 5 scenario for example.

  25. RAID by deadweight · · Score: 1

    I maintain a large number of Dell servers and I have NEVER seen computers malfunction so often before in my life. Our desktops seem to be far more reliable. Try RAID-10 if you want belt and suspenders (two hardware RAID 5 arrays put together in software as a mirror set). Even better, try some kind of server clustering (Redundant Array of Inexpensive Servers?)

    1. Re:RAID by ComputerSlicer23 · · Score: 1
      Uhhh, that isn't RAID 10. That isn't RAID 0+1. (Technically, there is no standardized version of RAID 10, however, in my experience, that's not what the general public means by it).

      RAID 10 is when you take 2n raw drives, building n mirrors (The RAID 1 portion of RAID 10). You then take the n mirrors and put them in a RAID 0 stripe.

      RAID 0+1 is less preferable, but is sometimes all you can do. Take 2n drives and build two RAID 0 stripes of n devices each. Then take the 2 stripes and mirror them together. The total failure conditions on a RAID 10 are better than on a RAID 0+1.

      By this naming convention, what you describe would be referred to as RAID 5+1. Besides all that, the configuration you describe is a total waste of disk space. If you have 2n drives, you get to use n-1 (where n > 3) drives' worth of space out of them. You have horrible performance for writes relative to a RAID 10. It is very redundant, but at some point.
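
      For comparison, a small sketch of the usable space each of these layouts gives out of 2n equal drives, following the definitions above. The layout labels and the function name are just illustrative, and real controllers add their own overhead:

```python
def usable_drives(layout, total_drives):
    """Drives' worth of usable space for a layout built from 2n equal drives."""
    if total_drives < 2 or total_drives % 2:
        raise ValueError("these layouts need an even number of drives")
    n = total_drives // 2
    if layout in ("raid10", "raid0+1"):
        return n  # half the drives hold mirror copies
    if layout == "raid5+1":
        if n < 3:
            raise ValueError("each RAID 5 side needs at least 3 drives")
        return n - 1  # one drive's worth of parity per side, then mirrored
    raise ValueError(f"unknown layout: {layout}")


for layout in ("raid10", "raid0+1", "raid5+1"):
    print(layout, usable_drives(layout, 8), "of 8 drives usable")
```

      So with 8 drives, RAID 10 and RAID 0+1 both give 4 drives of space while the mirrored RAID 5 gives 3.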

      Kirby

    2. Re:RAID by deadweight · · Score: 1

      The Dell RAID configuration calls two arrays of RAID 5 mirrored together RAID 10. It may be that no one else calls it that. I have never done it myself. I think two clustered servers is a better idea if you really need to avoid downtime. 2003 Server for Datacenters does this with automatic failover and I am sure there is some *nix equivalent.

    3. Re:RAID by ComputerSlicer23 · · Score: 1
      Okay, now I'm being more than a little pedantic... I've used Dell servers before, and configured their stuff. I've never seen anything that refers to what you call RAID 10 as RAID 10.

      http://www1.us.dell.com/content/topics/global.aspx/power/en/ps1q02_long?c=us&cs=555&l=en&s=biz

      That's a link to Dell documentation discussing the ins and outs of RAID configuration and reliability. Any chance you've got a link that shows where a mirrored RAID 5 configuration is referred to as RAID 10? I'm always curious to find that sort of information first hand. Googling for Dell RAID 10 configuration doesn't come up with anything in the first 5-10 links like what you describe being named RAID 10. They do discuss RAID 50, which is a stripe of RAID 5 configurations. I'm still having a hard time grasping the waste of disks involved in "RAID 5+1". It's just silly. However, I suppose it sells a lot more hardware, which is what they do.

      Kirby

    4. Re:RAID by deadweight · · Score: 1

      On Dell's site they are calling RAID 10 a stripe set of mirrors. I could swear the PERC setup was doing the opposite, calling it a mirror set of stripes, but I didn't set it up that way so I am not sure. Regardless of what you call it, hardware RAID looks like just one HD to the OS, so you can take two or more HARDWARE RAID arrays and make a SOFTWARE RAID array out of them if you want to. I am tending now to just using two servers both running RAID 5. Too many single points of failure in one server no matter how many HDs you stick on it.

    5. Re:RAID by duplicate-nickname · · Score: 1

      A stripe set of mirrors is RAID 10, but a stripe set is NOT RAID 5. RAID 5 is a stripe set with parity.


  26. Mylex 960 by un1xl0ser · · Score: 1
    We had some Compaqs that had Mylex 960s in them. Those things failed more often than not. We ended up just re-installing with software RAID or no RAID at all and it works better.

    Compaq's newer SmartArray have seemed to be more stable.. however only time will tell.

    -un1xloser

    --
    v4sw6PU$hw6ln6pr4F$ck 4/6$ma3+6u7LNS$w2m4l7U$i2e4+7en6a2X h
  27. If RAID cards are failing, that's important! by Futurepower(R) · · Score: 1


    If Mylex cards are failing, that's important! If RAID cards fail, then the company, and all its employees, are out of business. And that's what apparently happened to Mylex. It's now owned by LSI Logic.

    At the low end of the scale, we seem to be having the same kind of problem. We are having a high failure rate with Promise Technology FastTrak Tx2000 controllers. Promise Technology seems to have lost the will, or maybe ability, to deal with problems.

    When I read through the comments to this story, there are a lot of situations where RAID cards are failing. But why?

    The problem seems to be industry-wide. I talked to someone in technical support at HighPoint and he said the mirroring controllers sold by HighPoint have random mirror breakage failures, also.

    This is a new problem. Did Microsoft do something to break mirroring controllers so that customers will buy Microsoft's far more expensive solution? Is there some problem with modern hardware no one has discovered?

    One thing I can say is that, in the past, these cards worked reliably, or the companies could not have stayed in business. Promise Technology mirroring controllers were reliable for us for many years. Now they often cannot even be installed without failing during installation.

    1. Re:If RAID cards are failing, that's important! by TheRealSlimShady · · Score: 1
      When I read through the comments to this story, there are a lot of situations where RAID cards are failing. But why?

      It seems that most (all?) of those stories relate to controllers doing IDE RAID. I suspect the answer to the question of why so many are failing is that it's still a relatively new technology, only really widely available in the last 18 months. SCSI RAID controllers on the other hand don't seem to be plagued with the same issues.

      This is a new problem. Did Microsoft do something to break mirroring controllers so that customers will buy Microsoft's far more expensive solution? Is there some problem with modern hardware no one has discovered?

      Exactly what is Microsoft's far more expensive solution - software RAID?

  28. The FastTrak Tx100 always worked for us, too. by Futurepower(R) · · Score: 1


    The Promise FastTrak Tx100 cards always worked for us, too. The only Promise cards that fail for us are the Tx2000 cards. Since we have been unable to get help from Promise for this problem, I presume they know there is a problem, and are unable to fix it.

  29. Better Hardware? by sam+the+lurker · · Score: 1

    If you're having problems with controllers, drives, enclosures, etc., going bad, then maybe you need to buy better (i.e. more expensive) hardware.

    I have been working with Compaq ProLiant servers for several years (support for Red Hat Linux is good) with nary a hardware problem.

    http://h18004.www1.hp.com/products/servers/proliantml530/index.html
    http://h18004.www1.hp.com/products/servers/proliantstorage/arraycontrollers/index.html
    http://h18004.www1.hp.com/products/servers/proliantstorage/drives-enclosures/4300enclosure/

    I know that expensive is not always better or more reliable but failing (or nearly failing) to meet a SLA should get management buy-in to buy just about anything in the $K range.

  30. Better deal than Xserve RAID by extra88 · · Score: 1

    [I haven't tried either of these products.]
    Gateway 840 Serial-ATA RAID Enclosure is cheaper per GB than Xserve RAID. It has 12 bays and uses U320 SCSI instead of Fibre Channel for the connection to the system. Currently the cheapest config you can do is $4,749. That's with four 250GB SATA drives and their cheapest 3yr warranty (another nice thing is that you can increase the warranty to 4 or even 5 years, and they have a variety of response times you can choose). Gateway gives you all 12 carriers no matter how many drives you buy from them. So you buy 8 more 250GB drives for $225/ea. ($1800) for a total of $6,549. Apple won't sell you drive carriers; you have to buy the carriers with their drives. They currently charge $450 for the drive carrier + 250GB ATA drive. Xserve RAID with 12x250GB drives and a 3yr. warranty costs $10,998.

    The cheapest way to go is to build your own using a PCI RAID controller and drive cages in a large PC case. There are drawbacks to the DIY method, but a 12x250GB SATA RAID system would cost you less than $5000 ($2700 for drives, $750 for 3ware 8506-12 RAID card, ~$450 for 3 drive cages, the rest is for a big-ass case, mobo, etc.). Note that includes the cost of the computer, which the above OEM options do not include.
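
    A rough sketch of the per-GB arithmetic behind the three options, using only the prices quoted above. It assumes raw (pre-RAID) capacity of 12 x 250GB in every case and treats the DIY figure as its stated upper bound of $5000; the variable and function names are illustrative.

```python
DRIVES = 12
DRIVE_GB = 250
RAW_GB = DRIVES * DRIVE_GB  # 3000 GB of raw capacity in every configuration

# Totals in US dollars, taken from the figures quoted in the post.
gateway_total = 4749 + 8 * 225   # base config with 4 drives, plus 8 more drives
xserve_total = 10998             # 12 x 250GB with 3yr warranty
diy_total = 5000                 # upper bound; the post says "less than $5000"


def cost_per_gb(total_usd: float, raw_gb: int = RAW_GB) -> float:
    """Dollars per raw gigabyte, before any RAID overhead."""
    return total_usd / raw_gb


for name, total in (("Gateway 840", gateway_total),
                    ("Xserve RAID", xserve_total),
                    ("DIY (upper bound)", diy_total)):
    print(f"{name}: ${total} total, ${cost_per_gb(total):.2f} per raw GB")
```

    Keep in mind the DIY figure includes the host computer, while the other two do not.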

    1. Re:Better deal than Xserve RAID by Anonymous Coward · · Score: 0

      The cheapest way to go is to build your own using a PCI RAID controller and drive cages in a large PC case. There are drawbacks to the DIY method, but a 12x250GB SATA RAID system would cost you less than $5000 ($2700 for drives, $750 for 3ware 8506-12 RAID card, ~$450 for 3 drive cages, the rest is for a big-ass case, mobo, etc.). Note that includes the cost of the computer, which the above OEM options do not include.

      I just did this. Enlight 9 bay case, 8 SATA hotswap carriers, 3ware 8506-8 8 port SATA card, 8 250G Maxtor SATA drives. Total cost, $2900. I reused old motherboard/cpu/ram, so you'll need to add the cost for that to the system.

      My system uses 7 drives in RAID-5 with one additional drive as a hot spare to yield 1.5 TB. For 250G and larger drives, you may want to stick with the 8 port card unless you want more than one hot spare, as the 3ware card limits a single array to 2 TB.
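
      A quick sketch of the capacity arithmetic, assuming decimal units (1 TB = 1000 GB), equal-sized drives, and that the 2 TB per-array limit is inclusive. The function names are made up for illustration.

```python
def raid5_usable_gb(drives: int, drive_gb: int) -> int:
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_gb


def max_raid5_drives(drive_gb: int, limit_gb: int) -> int:
    """Largest drive count whose RAID 5 usable capacity stays within limit_gb."""
    return limit_gb // drive_gb + 1


print(raid5_usable_gb(7, 250))      # the 7-drive array described above
print(max_raid5_drives(250, 2000))  # most 250GB drives that fit under a 2 TB array limit
```

      Under those assumptions a single array tops out at 9 of the 250G drives, so a 12 port card would leave three ports for spares or a second array.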

  31. In the 13 years I've been using HW RAID by Anonymous Coward · · Score: 0

    I can count on one hand the number of RAID controller failures I've had. If I remove external factors (like power failures), then I think I've had only 1 controller fail.

  32. RAID 10? by b!arg · · Score: 3, Insightful

    If uptime is so absolutely crucial, how about a duplexed mirror of RAID 5 arrays? Two controllers and a RAID 5. When in doubt, throw more money at the problem. :)

    --

    Everybody dies frustrated and sad and that is beautiful
  33. Missed point - Rebuild times by Crypt0pimP · · Score: 2, Interesting

    When that slow 250GB ATA class drive is dead, and while its fellow drives are chugging their little hearts out (and probably maxing out that 3ware controller), how long will it take to rebuild your array?

    Have you tested how long it takes? Probably better than 24 hours if your system is moderately loaded.

    Guess what you have now? The marvelous opportunity for a CASCADING FAILURE!

    That's right kids! Because you just had a drive fail, and all the other drives are doing double the work to rebuild from parity data, you have a higher chance of getting a second drive failure.
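
    For reference, a minimal sketch of what rebuilding from parity data involves, assuming single-parity XOR as in RAID 5 and ignoring parity rotation, real block sizes, and controller specifics. The names and the tiny four-byte blocks are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data drives plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing the second drive, then rebuild its block from the survivors.
lost_index = 1
survivors = [block for i, block in enumerate(stripe) if i != lost_index]
rebuilt = xor_blocks(survivors)

print(rebuilt == stripe[lost_index])
```

    Every rebuilt block needs the matching block from each surviving drive, so a rebuild reads all the remaining drives from end to end.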

    Consider that you bought all of the drives in that array at the same time. They've all been running the same amount of time. What if there was a minor manufacturing defect that caused that First drive to fail? How soon before it takes out the other 4?

    A 'resume generating event' waiting to happen.

    Best of luck.. and I agree with the comment upthread. SATA drives are for Workstations. Maybe for storing what we call 'reference data'.
    Not much more.

    There's a few choice terms in the industry- 'Economy Enterprise'
    'Garbage RAID'
    'Ghetto SAN'

    Good luck

    --
    Striving to achieve a lower state of conciousness
    1. Re:Missed point - Rebuild times by DrZaius · · Score: 1

      You sound like a sales guy pushing SCSI.

      I've had a 250gb SATA drive fail on a 1TB array on a 3ware card. It was about 4 hours for it to rebuild. The system was slower, but we didn't have 'cascading failures.'
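
      As a rough sanity check on both figures in this subthread, here is a sketch that assumes decimal units (1 GB = 1000 MB) and a rebuild that writes the whole replacement drive at a constant rate. The function names are illustrative, and real rebuild rates depend on load and controller settings.

```python
def rebuild_hours(drive_gb: float, rate_mb_s: float) -> float:
    """Hours needed to rewrite a whole drive at a constant rebuild rate."""
    return drive_gb * 1000 / rate_mb_s / 3600


def implied_rate_mb_s(drive_gb: float, hours: float) -> float:
    """Average rebuild rate implied by an observed rebuild time."""
    return drive_gb * 1000 / (hours * 3600)


for hours in (4, 24):
    rate = implied_rate_mb_s(250, hours)
    print(f"250GB rebuilt in {hours}h implies about {rate:.1f} MB/s")
```

      The two estimates differ by a factor of six in sustained rebuild rate, which is plausibly the gap between a lightly loaded and a busy array.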

      In fact, the only time I've experienced 'Cascading Failures' was on an expensive Mylex SCSI raid controller. There is nothing like saying "Shit, we just lost two drives on that raid 5."

      Banks can continue to use SCSI, but I'm going to use SATA everywhere. It'll save me over half the cost for the same size array.

      You still have to plan for failures everywhere. That's why I have spare drives and backups for all my arrays, even my SCSI ones.

      --
      -- DrZaius - Minister of Sciences and Protector of the Faith
    2. Re:Missed point - Rebuild times by jdray · · Score: 1

      We use RAID 0+1 for our large DAS. Our system guys seem to think that it's safer than RAID 5. We've spent the last couple of years migrating a lot of our storage to SAN, though, and I'm not sure if the 0+1 methodology got migrated along with it. Seems unlikely.

      If you spread your RAID 5 over sixteen volumes (as someone upthread said they did), it seems to me that any individual drive failing wouldn't incur a ton of work on the rest of the drives, because the amount of data any one of them would have to contribute to rebuilding the lost one would be inversely proportional to the number of drives in the overall system.

      Wow. Talk about a run-on sentence...

      --
      The Spoon
      Updated 6/28/2011
  34. 3Ware said no, in a telephone conversation. by Futurepower(R) · · Score: 1


    What you said is what I would expect. However, I called and talked to someone in 3Ware technical support, and he said it would not boot with only one drive; it would be necessary to rebuild the array to boot the computer. Maybe you are using a different controller.

    In addition, here are questions and answers from a session on the 3Ware chat system:

    Request:- 10th January 2004 at 8:23

    [Irrelevant questions removed here.]

    We've been using Promise RAID controllers with our cash register software, and experiencing excessive failures. We are considering moving to 3Ware 7006-2 mirroring controllers.

    Can we rebuild the mirror using only the 7006-2 firmware, without booting? Can we clone hard drives with the firmware?

    Can we boot from one hard drive of the mirror, when that hard drive is temporarily attached to the motherboard IDE controller?

    Does the mirroring controller store anything on the hard drive that would interfere with other devices or software?

    Response:- 12th January 2004 at 11:31
    Michael,

    You can build/rebuild a mirror in the bios before booting to the OS.

    You cannot attach the drive to the motherboard controller and boot from it.

    The controller allocates a small space in the hard drive less than 1MB to store the raid information

    Sincerely,
    3ware Customer Support.

  35. Buy Windows 2003 server? by Futurepower(R) · · Score: 1


    Microsoft's solution is that everyone should buy Windows 2003 server and use software RAID, available only on that Windows OS.

    Software RAID mirroring is all we need, but it doesn't make sense, for this application, to support a much more complex and much more expensive system to get it.

    1. Re:Buy Windows 2003 server? by TheRealSlimShady · · Score: 1
      Microsoft's solution is that everyone should buy Windows 2003 server and use software RAID, available only on that Windows OS

      But Microsoft only recommend software RAID for small environments. They don't even use it themselves - they use massive HP EVA SANs.

  36. Don't mirror HD - mirror the server! by yabHuj · · Score: 2, Interesting

    If (disk) space and performance are not a problem (i.e. HD below 200GB, non-fancy single CPU), you could simply go with two (or three) cheap PC boxen instead of one "data center quality" RAID machine (for the same total price). If you mirror data+setup over from "production" to "standby" daily, any downtime due to any failure (HD, controller, mobo, OS, filesystem) can be minimized to 1-2 minutes (switch service over to the standby) - continuing with yesterday's data, which should be sufficient for most cases.

    Integrating a backup/backlog (e.g. 3 months data) into a mirror setup is possible in several ways - my company does offer such a solution (managed service, that is).

    Continuing with current data instead of yesterday's status is quite a bit more challenging, though...
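
    A minimal sketch of the daily production-to-standby copy, assuming rsync over SSH is the transport (the post does not name a tool). The host name, paths, and function name are hypothetical; the code only builds and prints the command rather than running it.

```python
import shlex


def build_rsync_command(source_dir: str, standby_host: str, dest_dir: str,
                        dry_run: bool = False) -> list:
    """Build an rsync command that mirrors source_dir onto the standby box."""
    cmd = ["rsync", "-a", "--delete"]   # archive mode; remove files deleted on production
    if dry_run:
        cmd.append("--dry-run")         # show what would change without copying
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", f"{standby_host}:{dest_dir}"]
    return cmd


# Hypothetical host and paths, for illustration only.
command = build_rsync_command("/srv/data", "standby.example.com", "/srv/data", dry_run=True)
print(shlex.join(command))
```

    Run from a daily cron job (for example via subprocess.run), this keeps the standby one day behind at worst, which is the trade-off described above.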

    1. Re:Don't mirror HD - mirror the server! by lusid1 · · Score: 1

      - continuing with yesterday's data, which should be sufficient for most cases.

      I am having a hard time coming up with a scenario where "yesterday's data" wouldn't get me fired.

  37. Change your design by NateTech · · Score: 1

    Buy a NetApp Filer, mount it and use it for all your variable data. Get rid of the RAID arrays attached directly to the servers.

    --
    +++OK ATH
    1. Re:Change your design by Anonymous Coward · · Score: 0

      NetApps are cool, but a bit pricey for the home data center. ;)

  38. It's your fault. by Anonymous Coward · · Score: 0

    You just don't know what you're doing. Go paint a picture or something.

  39. Maybe you were the first. by anti-NAT · · Score: 1

    So, did you "post" that raid-5 + raidtools + kernel 2.6 locks up a computer, to save somebody else from going through what you had to?

    --
    The Internet's nature is peer to peer - 20050301_cs_profs.pdf
  40. Better know how to replace your batteries by John+Harrison · · Score: 1

    Especially if you are dealing with an IBM 4758. They detect a casual battery replacement as an attack and clear their memory, which is a good thing. Point is, you had better know what you are doing.

  41. Nerd alert by Anonymous Coward · · Score: 0

    Ahh, the sign of a true nerd: a "that depends" question springs up, flushing you out.

    You don't have a "depends" in the question; you have a statement you answer with single engine or twin engine. Which would you rather be in when one engine fails? You're not flying around in a simulator, you're answering a rhetorical question. So go crash and dye.

    1. Re:Nerd alert by Nutria · · Score: 0
      So go crash and dye.

      What, pray tell, shall he dye after he's crashed? After all, he's dead...

      --
      "I don't know, therefore Aliens" Wafflebox1
  42. We are using good supplies. Rebooting continuously by Futurepower(R) · · Score: 1


    We are using power supplies that seem like the best (KingWin), although not expensive.

    We have tested these units by putting a re-boot program in the Win XP startup folder. This causes continuous reboots. We have run several computers for more than 12 hours of continuous rebooting. This should show problems with the power supplies.

    We do NOT see problems, usually, with the Promise Tx2000 controllers when continuously rebooting. The problems come after the units are delivered to the customer, a terrible situation.

  43. Raid problem, no more by Anonymous Coward · · Score: 0

    Try IBM Total Storage Solutions to solve your problems with information management, administration, backup, and/or archiving.

  44. boot disks? we dont need no ... (was: Re:Brands?) by Anonymous Coward · · Score: 0

    the great thing about the newer proliants (in addition to their exceptionally reliable raid cards) is their management cards. you can set up tftp, put your custom kernel/modules there and bootstrap complete debian systems in like 5 minutes.

    once you like that idea you can build in dynamic fault tolerance. have a server go down? take your hourly rsync of the production data (and config) and dynamically build a new replacement box.

    poor man's on-demand.

  45. Cisco CSS? Are you high? by Anonymous Coward · · Score: 0

    having dealt with these in very high profile environments, i can --assure-- you that just about ANY alternative would be preferable. maybe you meant f5?