Tips for Increasing Server Availability?
uptime asks: "I've got a friend that needs some help with his web server availability. On two separate occasions, his server has had a problem that caused it to be unavailable for a period of time. One was early on and was probably preventable, but this latest one was due to two drives failing simultaneously in a RAID5 array. As a web business moves from a small site to a fairly busy one, availability and reliability become not only more important but, it seems, more difficult to achieve. Hardware gets bigger, services get more expensive, and options seem to multiply. Where could one find material on recommended strategies for increasing server availability? Anything related to equipment, configurations, software, or techniques would be appreciated."
If you are moving to a level where you need uptime but can't dedicate more resources to overseeing it, you may want to consider a hosted solution. They host, monitor, upgrade, and do checkups (YMMV depending on whom you choose).
If that isn't a road you want to go down, then start planning outages for fsck, upgrades, and standard checkups. There are also plugins for Nagios that will check RAID controller status, server response, and server load.
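As a rough illustration of what such a RAID check can look like, here is a minimal sketch of a Nagios-style plugin. It assumes Linux software RAID (md) and reads /proc/mdstat, which is an assumption on my part since the original question doesn't say what kind of RAID is in use; a hardware controller would need the vendor's own tool instead. The function names are made up for the example. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import re

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status line shows a missing member.

    Hypothetical helper for this sketch; it assumes the usual /proc/mdstat
    layout, where an array line ("md0 : active raid5 ...") is followed by a
    status line ending in something like "[3/3] [UUU]".
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"^(md\S*)\s*:", line)
        if name:
            current = name.group(1)
            continue
        # "[3/2] [UU_]" means 3 members expected, 2 active; "_" marks the gap.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3) or status.group(1) != status.group(2):
                degraded.append(current)
            current = None
    return degraded


def main(path="/proc/mdstat"):
    """Print a one-line status and return the matching Nagios exit code."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError as exc:
        print(f"RAID UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    bad = degraded_arrays(text)
    if bad:
        print(f"RAID CRITICAL - degraded arrays: {', '.join(bad)}")
        return CRITICAL
    print("RAID OK - every md array has all members active")
    return OK
```

To use it as an actual check, the script would end with `raise SystemExit(main())` so that Nagios sees the return value as the process exit code. The point is less the parsing than the alert: a degraded array that pages someone immediately is a very different situation from one that sits unnoticed until the next drive goes.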
You are hosting this on a 56K dial-up in your root cellar?
Your apps need to run on Microsoft Windows or HP-UX or...?
You've got a SAN or local disk or...?
You're using home-built white-box x86s or Sun E15000s or...?
You have sysadmin talent on hand? You're outsourced to IBM global services?
Who vets these silly questions? Oh, I forgot - the "Editors".
Advice: on VPS providers
That is all...
Good judgement comes from experience, and experience comes from bad judgement.
- W. Wriston, former Citibank CEO
are extremely low given the MTBF of modern drives. You have a better chance of a power supply or fan failure.
On that basis I am going to make some wild-assed guesses that are more probable given the little information we have:
1) the drives were consumer models from the same production lot,
2) the death of the first drive was not immediately noticed,
3) compatible replacement drives are not easy to come by (no hot spare),
4) the second drive died before the first one was replaced,
5) the server did not have hot swap drive carriers,
6) someone tried to replace the dead drive in the running chassis.
If you don't like my guesses, provide your own.
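To put rough numbers on the MTBF argument and on why an unnoticed first failure matters, here is a back-of-the-envelope sketch. It assumes independent drives with exponentially distributed lifetimes, and the MTBF figure and drive count are hypothetical values picked for illustration, not numbers from the original question. Drives from the same production lot would violate the independence assumption, so treat the result as a lower bound at best.

```python
import math


def p_second_failure(remaining_drives, window_hours, mtbf_hours):
    """Probability that at least one remaining drive fails within the window.

    Assumes independent drives with exponentially distributed lifetimes,
    so the combined failure rate is remaining_drives / mtbf_hours.
    """
    rate = remaining_drives / mtbf_hours
    return 1.0 - math.exp(-rate * window_hours)


# Hypothetical figures: a 4-drive RAID5 array (3 drives left after the first
# failure) and a quoted MTBF of 1,000,000 hours per drive.
MTBF_HOURS = 1_000_000
REMAINING = 3

for label, hours in [
    ("replaced within 4 hours", 4),
    ("unnoticed for 30 days", 30 * 24),
]:
    p = p_second_failure(REMAINING, hours, MTBF_HOURS)
    print(f"{label}: {p:.6f}")
```

Under these assumptions the chance of a second failure grows roughly in proportion to how long the array runs degraded, which is why monitoring and a hot spare matter more than the raw MTBF figure.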