Slashdot Mirror


No Hassle RAID 5 Implementations?

LambSpam asks: "I had a nightmare week last week with two of our servers running Intel's U3-1L RAID controller (RAID 5). Whenever there's a power outage in our building, these controllers randomly mark one or more of the drives in the array offline (even with adequate UPS support), which means I have to manually mark them online and/or rebuild. Intel acknowledged the problem, but their solution involves updating the backplane's firmware, the controller firmware (destructive upgrade!), and even the firmware on the IBM drives in the array, because they 'draw too much power' in certain conditions. I've only used one other RAID 5 implementation (MegaRAID), and it NEVER had these kinds of problems, whereas if you sneeze too hard around this U3-1L card it will go offline. Is this common with most hardware RAID implementations? Which RAID 5 implementations work without hassle? What should I stay away from?"

3 of 51 comments

  1. Tried Adaptec? by Judg3 · · Score: 5, Informative

    Where I used to work (an all-Windows shop) we used Adaptec RAID cards in all our "tower" based servers. Even the lower-priced models (AAA-131U2) always performed without a hitch, and we never had any problems with them at all. AMI's RAID controllers are real nice and all, but for the price it just wasn't worth it. The Adaptec solutions performed just as well and at a lower cost. You'd do well to check 'em out.

    Now the 3200 RAID controllers in the Compaqs, that's a different story altogether.
    We had roughly 2000 servers, operating 24/7 @ 67 degrees F. Two times a year we had a site shutdown. Every single time we had to bring everything back up we would have anywhere from 3-5 Compaq array controllers die. But never once did the low-buck Adaptecs crap out on us.

  2. Firmware by Holophax · · Score: 4, Informative

    Just as a shot in the dark, I would suggest trying to upgrade the firmware on the drives first. At one of my old jobs, we used nothing but IBM drives, and we constantly had problems with drives being marked as bad or offline, but simply pulling them and plugging them back in (hot swap) would bring them back. In our situation, we were using IBM Netfinity servers with IBM RAID controllers. When we talked to IBM, they admitted there was a problem with the firmware on the drives: whenever an event happened (even a simple read error), the drive would not report just one error but would spew errors constantly, which made the RAID controller mark the drive as bad. Seeing as the drive firmware upgrade only takes a few minutes of downtime and is non-destructive, it might be worth a shot.

  3. Two possibilities... by Vrallis · · Score: 4, Interesting

    First, are you sure your UPS is a *TRUE* (online) UPS? Even a lot of the 'high end' UPSes out there are REALLY switched UPSes, which only transfer the load to battery after mains power drops. That brief transfer could very well be your problem.

    The other one is something I've heard of (I'm not an electrical expert, but I'll try to explain). Larger sites (older installations, particularly) were wired for three-phase electricity. Over time, the phases were split out for normal 110 volt usage. If the PC is powered from one phase but the external unit is powered from a different phase, there is a chance that the differential between the two will cause problems, because the two are connected to each other's ground through the cable shielding. I know, it sounds like something from the BOFH daily calendar, but it does make sense. Try making sure both pieces of equipment are on the same true UPS, or at least on switched UPSes on the same circuit.
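
    To put a rough number on the "differential" between phases: in a balanced three-phase supply, the nominal voltage between two phase conductors is sqrt(3) times the phase-to-neutral voltage. The sketch below only does that arithmetic. The function name is made up for illustration, and the result is the voltage between the live conductors, not a measurement of what would actually appear on a cable shield, which depends on how the site is grounded.

```python
import math


def line_to_line_voltage(phase_to_neutral_volts: float) -> float:
    """Nominal voltage between two phases of a balanced three-phase supply.

    Hypothetical helper for illustration: assumes the phases are 120 degrees
    apart and equal in magnitude, so V_line = sqrt(3) * V_phase.
    """
    return math.sqrt(3) * phase_to_neutral_volts


# Nominal 110 V circuits taken from two different phases
print(round(line_to_line_voltage(110.0), 1))  # 190.5

# The same calculation for a 120 V nominal supply
print(round(line_to_line_voltage(120.0), 1))  # 207.8
```

    Two outlets that each read 110 V to neutral can therefore sit roughly 190 V apart from each other if they are fed from different phases, which is why putting both boxes on the same circuit is a cheap thing to try.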