No Hassle RAID 5 Implementations?
LambSpam asks: "I had a nightmare week (last week) with two of our servers running Intel's U3-1L RAID controller (RAID 5). Whenever there's a power outage in our building these controllers randomly mark one or more of the drives in the array offline (even with adequate UPS support), which means I have to manually mark them online and/or rebuild. Intel acknowledged the problem, but their solution involves updating the backplane's firmware, the controller firmware (destructive upgrade!), and even the firmware on our IBM drives in the array because they 'draw too much power' in certain conditions. I've only used one other RAID 5 implementation (MegaRAID), and it NEVER had these kinds of problems, whereas if you sneeze too hard around this U3-1L card it will go offline. Is this common with most hardware RAID implementations? What RAID 5 implementations work without hassle? What should I stay away from?"
I've never had any problems with the PERC (PowerEdge RAID Controller) in the Dells I used to use for Sendmail servers. That kind of limits your choices, though.
Where I used to work (an all-Windows shop), we used Adaptec RAID cards in all our "tower"-based servers. Even the lower-priced models (AAA-131U2) performed without a hitch, and we never had any problems with them at all. AMI's RAID controllers are really nice, but for the price they just weren't worth it: the Adaptec solutions performed just as well at a lower cost. You'd do well to check 'em out.
Now, the 3200 RAID controllers in the Compaqs are a different story altogether.
We had roughly 2000 servers operating 24/7 at 67°F, and twice a year we had a site shutdown. Every single time we brought everything back up, anywhere from 3 to 5 Compaq array controllers would die. But never once did the low-buck Adaptecs crap out on us.
Just as a shot in the dark, I would suggest trying to upgrade the firmware on the drives first. At one of my old jobs we used nothing but IBM drives, and we constantly had problems with drives being marked as bad or offline; simply pulling them and plugging them back in (hot swap) would bring them back. In our case we were using IBM Netfinity servers with IBM RAID controllers. When we talked to IBM, they admitted there was a problem with the drive firmware: whenever an event happened (even a simple read error), instead of reporting a single error the drive would spew them constantly, which made the RAID controller mark the drive as bad. Seeing as the upgrade only takes a few minutes of downtime and is non-destructive, it might be worth a shot.
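To see why a drive that repeats one error over and over gets dropped, here is a minimal Python sketch of a hypothetical controller policy that takes a drive offline once it reports more than `max_errors` errors inside a sliding time window. The class name, the threshold, and the window length are all assumptions for illustration; real controllers' rules are vendor-specific and are not described in this thread.

```python
from collections import deque


class ErrorThresholdPolicy:
    """Hypothetical controller rule: take a drive offline once it reports
    more than `max_errors` errors within any `window_s`-second window."""

    def __init__(self, max_errors: int, window_s: float) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self.recent = deque()  # timestamps (seconds) of errors still inside the window
        self.offline = False

    def report_error(self, t: float) -> bool:
        """Record an error reported at time t; return True if the drive is now offline."""
        self.recent.append(t)
        # Forget errors that have aged out of the sliding window.
        while t - self.recent[0] > self.window_s:
            self.recent.popleft()
        if len(self.recent) > self.max_errors:
            # Once dropped, the drive stays dropped until an operator intervenes.
            self.offline = True
        return self.offline
```

With `ErrorThresholdPolicy(max_errors=5, window_s=60.0)`, a drive that reports one error every 100 s never goes offline, but one whose firmware repeats an error every 0.1 s is dropped on its sixth report, half a second in. Under a rule like this, the drive itself may be fine; the flood of duplicate reports is what trips the controller.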
First, are you sure your UPS is a *TRUE* (always-online) UPS? Even a lot of the 'high-end' UPSes out there are really switched UPSes, which take a moment to transfer to battery when the power drops. This could very well be your problem.
The other one is something I've heard of (I'm not an electrical expert, but I'll try to explain). Larger sites, particularly older installations, were wired for three-phase electricity, and over time the phases were split out for normal 110-volt use. If the PC is powered from one phase but the external unit is powered from a different phase, the difference between the two can cause problems, because the cable shielding provides a ground connection between them. I know, it sounds like something from the BOFH daily calendar, but it does make sense. Try making sure both pieces of equipment are on the same true UPS, or at least on switched UPSes on the same circuit.
Unless you're limited by cost, don't use host-based RAID. It will always be less reliable than a dedicated RAID controller. If you must use host-based RAID, try to find a card that supports RAID 0/1, because it's faster and more reliable. I've had good experiences with MegaRAID cards and the IBM host-based RAID controllers, but by good experience I mean that I've only had a few problems. There is always a chance that something will get screwed up when you change your setup.
Sounds like good advice in the post above.
Some UPSes switch over to battery when the power fails; others are always online. You want the latter for a RAID array.
The second paragraph is important. Check your input power. Everything attached to your network should be wired to the same power circuit; otherwise, large spurious signals can be fed into your hardware through the power line.
Bush's education improvements were
When I took over my current job, the previous network team had overloaded the circuits in the server room. We've had 3 circuits trip and servers drop hard, and none of the Compaq SmartArray controllers had any problems recovering.
I suggest you also fix your power problem. The systems should have no idea that power was lost to the building. If you are using a UPS and this is still happening, I'd find a better one.
I have built several RAID configurations with IBM ServeRAID controllers. One RAID 5 array (16 drives, 1 hot spare) that I manage has had 2 drives fail in the past year; all I've had to do is take the bad drive out and pop another one in, and it is automatically marked as a hot spare.
I was expecting a hassle, but it was mind-blowing to see how easy it was. The cross-platform remote management utility is a plus too.
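The swap is painless because RAID 5 stores, for each stripe, a parity block equal to the XOR of the data blocks, so the controller can recompute any one missing block from the survivors and write it to the replacement or hot spare. Below is a minimal Python sketch of that arithmetic for a single stripe; the function names are hypothetical, and it ignores the rotation of parity across drives that real RAID 5 layouts use.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks followed by one parity block (XOR of all data blocks)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe: list[bytes], failed_index: int) -> bytes:
    """Reconstruct the block on the failed drive by XOR-ing every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)
```

For example, `make_stripe([b"ABCD", b"EFGH", b"IJKL"])` yields four blocks, and `rebuild(stripe, i)` returns the original block for any single `i`, parity included. The same algebra explains why RAID 5 survives only one failed drive per stripe: with two blocks gone, the XOR has two unknowns.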
I really hope you're kidding.
:)
The A1000s stink. The firmware is awful; the RM6 management software is worse!
Be careful upgrading your firmware (which you need to do from time to time) -- the controller _can_ deadlock. And of course, if it does, you lose all your data, since the only copy of the LUN configuration is in the controller.
Seriously. They're crap. Built on the same crap as the A3000/3500 series. It's all old, re-branded Symbios stuff. Yuck-o.
You'd be better off getting an A5200 tray (or a D1000 tray) and using the RAID-5 functions of Veritas Volume Manager instead. It actually has a shot at working.
--NBVB
Everything needs to be on the same ground circuit; that's necessary to avoid ground loops.
"They draw LARGE spikes of current sporadically."
I don't think this is correct. I have designed power supplies, and I can't immediately think of any reason why the input power of a switching power supply should vary differently from its output power. The only surge is when the hard disks spin up, and SCSI provides a means to stagger the spin-up.
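The staggering point can be put in numbers. The Python sketch below computes the peak current on a shared supply rail when drive *k* starts spinning up at `k * stagger_s` seconds. The function name and every per-drive figure are hypothetical, and drives that have not started yet are assumed to draw nothing, which understates the real load a little.

```python
def peak_current(n_drives: int, spinup_a: float, running_a: float,
                 spinup_s: float, stagger_s: float) -> float:
    """Peak rail current (amps) if drive k starts spinning up at k * stagger_s.

    Assumptions: each drive draws spinup_a for spinup_s seconds, then running_a;
    a drive that has not started yet draws nothing.
    """
    starts = [k * stagger_s for k in range(n_drives)]
    # The total load only changes when some drive starts or finishes spinning up,
    # so evaluating it at those instants is enough to find the maximum.
    events = sorted(set(starts + [s + spinup_s for s in starts]))

    def load_at(t: float) -> float:
        total = 0.0
        for s in starts:
            if t < s:
                continue  # not started yet
            elif t < s + spinup_s:
                total += spinup_a  # still spinning up
            else:
                total += running_a  # up to speed
        return total

    return max(load_at(t) for t in events)
```

With six drives at a hypothetical 2.0 A each during a 10 s spin-up and 0.5 A once running, `peak_current(6, 2.0, 0.5, 10.0, 0.0)` gives 12.0 A for a simultaneous start, while `peak_current(6, 2.0, 0.5, 10.0, 10.0)` gives 4.5 A. Spacing the starts by at least one spin-up time means only one drive is ever in its high-current phase.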
In this situation, I use XML. I invent my own markup language that is self-consistent and describes the API of a system. I then use an XSLT processor (Apache Xalan, to be precise) to transform the source into various other formats, including a web site, one big printable web page, and PDF. I've been thinking about writing a stylesheet for man pages as well.
The only issue with a system like this is version control of your source files, which is highly situation specific.
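The single-source idea can be sketched with a tiny, hypothetical API markup (the `api`, `function`, `param` and `doc` element names are invented here) rendered two ways. Python's standard library has no XSLT processor, so this sketch swaps in `xml.etree.ElementTree` and plain functions; with Xalan, each renderer would instead be an XSLT stylesheet applied to the same source.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document describing a made-up API.
SOURCE = """
<api name="demo">
  <function name="open" returns="int">
    <param name="path" type="const char *"/>
    <param name="flags" type="int"/>
    <doc>Open the file at path; returns a handle or -1 on error.</doc>
  </function>
  <function name="close" returns="void">
    <param name="handle" type="int"/>
    <doc>Release a handle returned by open.</doc>
  </function>
</api>
"""


def signature(fn: ET.Element) -> str:
    """Build a C-style signature string from a <function> element."""
    params = ", ".join(f"{p.get('type')} {p.get('name')}" for p in fn.findall("param"))
    return f"{fn.get('returns')} {fn.get('name')}({params})"


def to_html(root: ET.Element) -> str:
    """Render the API as a simple HTML fragment (the 'web site' output)."""
    parts = [f"<h1>{escape(root.get('name', ''))}</h1>"]
    for fn in root.iter("function"):
        parts.append(f"<h2>{escape(signature(fn))}</h2>")
        parts.append(f"<p>{escape(fn.findtext('doc', ''))}</p>")
    return "\n".join(parts)


def to_man(root: ET.Element) -> str:
    """Render a rough man(7)-style page; .TP makes the next line a paragraph tag."""
    lines = [f".TH {root.get('name', '').upper()} 3", ".SH FUNCTIONS"]
    for fn in root.iter("function"):
        lines += [".TP", signature(fn), fn.findtext("doc", "")]
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_page = to_html(root)
man_page = to_man(root)
```

Because both outputs are generated from one source, a change to a function's parameters only has to be made once; the version-control question above then applies to that single source file.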