Uptime Realities in the Internet World
schnurble writes: "My former boss has written an interesting article on the realities of uptime in the Internet World. It poses the idea that four and five nines of reliability are too expensive to be realistic, especially in the post dot-bomb economy. It's an interesting read, especially if you answer to an 800lb gorilla for outages and uptime issues."
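For a sense of scale, "N nines" can be turned into a yearly downtime budget with simple arithmetic. The sketch below assumes a 365-day year and the usual reading of N nines as an availability of 1 - 10^-N; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001 (99.99% available)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = allowed_downtime_minutes(n)
        print(f"{n} nines: about {minutes:,.1f} minutes of downtime per year")
```

Under those assumptions, four nines leaves roughly 53 minutes a year and five nines roughly 5 minutes a year.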
Let me give you a hypothetical case. One of our clients does about $50k/month on their web site. When the site was built, they were only expecting $10,000-$15,000/month. At the time, NN4 compatibility wasn't important, because the extra cost ($10k) wasn't going to be worth it. With NN4 sitting between 5% and 10% of traffic each month, they have decided that NN4 compatibility is important in the next version.
When we launched, 3 days of downtime a month was considered okay. It was considered a better choice than spending an extra $5k on hardware for redundancy. Well, when the site broke $40k/month, we immediately decided that that was no good and invested in the redundancy.
The site has had a few 15-minute outages over the past 6 months, and a one-day outage over a holiday weekend (not a big deal). However, if the site doubles in revenue again, downtime will become even less acceptable, and we'll drop $10k to avoid it.
If your site sucks and no one visits, downtime doesn't matter. If you are making lots of money, downtime does matter. Isn't $10k on hardware worth it if the downtime would cost you $25k?
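The trade-off described here is plain arithmetic, and a rough sketch makes it easy to rerun with your own numbers. It assumes revenue is spread evenly across a 30-day month, that sales lost during an outage are not recovered later, and that the redundant hardware removes the downtime entirely. The function names are made up for illustration.

```python
def monthly_downtime_loss(monthly_revenue: float, downtime_days: float,
                          days_in_month: int = 30) -> float:
    """Revenue lost per month, assuming sales are spread evenly over the month."""
    return monthly_revenue * downtime_days / days_in_month


def months_to_break_even(hardware_cost: float, monthly_revenue: float,
                         downtime_days: float) -> float:
    """Months until the hardware has paid for itself in avoided downtime."""
    loss = monthly_downtime_loss(monthly_revenue, downtime_days)
    return hardware_cost / loss if loss > 0 else float("inf")


if __name__ == "__main__":
    # Figures taken from the comment: $5k of hardware, 3 days down per month.
    for revenue in (10_000, 40_000):
        loss = monthly_downtime_loss(revenue, 3)
        months = months_to_break_even(5_000, revenue, 3)
        print(f"${revenue:,}/month: lose ${loss:,.0f}/month, "
              f"break even in {months:.2f} months")
```

With those assumptions, the $5k pays for itself in 5 months at $10k/month of revenue but in 1.25 months at $40k/month, which is why the same purchase looks wasteful at launch and obvious later.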
Alex
Entirely. Having worked extensively on the flight deck systems for the Boeing 767-400ER, I can tell you firsthand that the redundancy is rather amazing. There are two major computer systems that drive the displays in the cockpit: the DPCs, which do a lot of digital signal manipulation, and the DCCs, which do a lot of the analog-to-digital signal manipulation and control. Two DCC boxes drive three DPC boxes, and the two DCC boxes are cross-connected to each of the DPC boxes. The three DPC boxes each talk to each other (I'm not sure if the DCC boxes talked to each other - that was further down the chain than I was working on) and actually vote on the data points that are being sent to the displays to determine if one of the DPCs is malfunctioning or processing bad data. The way this all works together is amazingly complicated, especially when you consider that it all runs on embedded boards where the "executable" is typically less than 1-2 MB in size.
... especially the way it's actually implemented in the embedded system. Debugging all this, of course, was non-trivial. For that matter, coding it is non-trivial as it's all in Ada 83.
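To make the voting idea concrete, here is a toy sketch of three channels voting on one data point. It is not how the 767-400ER does it (that was Ada 83 on embedded hardware, and the comment doesn't describe the algorithm). It uses median selection with a tolerance, a common textbook approach for triple redundancy. The channel names, tolerance and readings are all made up.

```python
from statistics import median


def vote(readings: dict[str, float], tolerance: float) -> tuple[float, list[str]]:
    """Pick the median reading and flag any channel that strays too far from it.

    `readings` maps a channel name to the value that channel computed.
    Returns (agreed_value, suspect_channels).
    """
    agreed = median(readings.values())
    suspects = [name for name, value in readings.items()
                if abs(value - agreed) > tolerance]
    return agreed, suspects


if __name__ == "__main__":
    # Hypothetical airspeed values from three channels; the third one is off.
    airspeed = {"DPC1": 250.0, "DPC2": 250.5, "DPC3": 310.0}
    value, suspects = vote(airspeed, tolerance=2.0)
    print(f"agreed value: {value}, suspect channels: {suspects}")
```

With three channels, the median is always a value that at least two channels bracket, so a single bad channel cannot drag the result away from the other two.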
... those were the days :)
My particular area of development was the actual display software, which received its data from the DPC systems. Each of the six displays (two pilot, two copilot, two EICAS in the console) received multi-cast data from each of the DPCs and then fed data back to the DPCs on the display's status. The DPCs would then automagically evaluate if the displays were functioning properly and switch primary functions away from a malfunctioning display to a functioning display if error conditions were detected.
The PFD (primary flight display) is the pilot's most important display, as it displays airspeed, artificial horizon, TCAS warnings, altitude and a few other things. The ND (navigation display) is the inner screen on both the pilot and copilot sides, and if the PFD experiences error conditions, the DPCs switch the PFD to the ND and the ND to one of the EICAS (engine indicators, etc.) displays.
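The switching rule can be sketched as a priority assignment: the most important function goes to the first healthy screen, the next function to the next healthy screen. This is only an illustration for one side of the cockpit. The screen names are made up, and only the failed-PFD case comes from the description above; every other failure case is an assumption. What happens to the displaced EICAS content is not modeled.

```python
# Screens on one side of the cockpit, in the order functions fall back to them.
# Names are hypothetical; the PFD normally lives on the first, the ND on the second.
DISPLAY_ORDER = ["outboard", "inboard", "eicas"]

# Functions in priority order: the PFD must always be placed first.
FUNCTIONS = ["PFD", "ND"]


def assign_functions(healthy: dict[str, bool]) -> dict[str, str]:
    """Map each function to a screen, skipping screens reported unhealthy.

    Functions left without a screen (too many failures) are simply omitted.
    """
    available = [d for d in DISPLAY_ORDER if healthy.get(d, False)]
    return dict(zip(FUNCTIONS, available))


if __name__ == "__main__":
    print(assign_functions({"outboard": True, "inboard": True, "eicas": True}))
    # The screen normally showing the PFD has failed.
    print(assign_functions({"outboard": False, "inboard": True, "eicas": True}))
```

In the second call the PFD moves to the inner screen and the ND moves to the EICAS screen, matching the behavior described above.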
All very interesting stuff
Ahh