Are Data Center "Tiers" Still Relevant?

Posted by timothy on Tuesday September 22, 2009 @04:07AM from the german-datacenters-have-tieren dept.

miller60 writes "In their efforts at uptime, are data centers relying too much on infrastructure and not enough on best practices? That question is at the heart of an ongoing industry debate about the merits of the tier system, a four-level classification of data center reliability developed by The Uptime Institute. Critics assert that the historic focus on Uptime tiers prompts companies to default to Tier III or Tier IV designs that emphasize investment in redundant UPSes and generators. Uptime says that many industries continue to require mission-critical data centers with high levels of redundancy, which are needed to perform maintenance without taking a data center offline. Given the recent series of data center outages and the current focus on corporate cost control, the debate reflects the industry focus on how to get the most uptime for the data center dollar."

It depends by afidel · 2009-09-22 04:26 · Score: 4, Interesting

If you are large enough to survive one or more site outages, then sure, you can go for a cheaper $/sq ft design without redundant power and cooling. If, on the other hand, you are like most small to medium shops, then you probably can't afford the downtime because you haven't reached the scale where you can geographically diversify your operations. In that case downtime is probably still much more costly than even the most expensive of hosting facilities. I know that when we looked for a facility to host our DR site we only considered tier-IV datacenters, on the assumption that if our primary facility is gone we will have to timeshare the significantly lower-performance equipment we have at DR, so downtime there wouldn't really be acceptable. By going that route we saved ~$500k on the equipment it would have taken to make DR equivalent to production, at a cost of a few thousand a month for a top-tier datacenter. Those numbers are easy to work.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Infrastructure is very important. by CherniyVolk · 2009-09-22 04:28 · Score: 4, Interesting

Infrastructure is more important than "best practices". Infrastructure is the physical, concrete aspect. Practices really aren't that important once the critical, physical disasters begin. As an example, good hardware will continue to run for years. Most of the downtime on good hardware will be due to misconfiguration, human error, that sort of thing. A sysadmin banks on some wrong assumption, messes up a script or runs the wrong command, but the hardware is still physically able, and therefore the infrastructure has not been jeopardized. Now reverse the situation: top-notch paper plans and procedures... with crappy hardware. Well... the realities of physical discrepancies are harder to argue with than our personal, nebulous, intangible, inconsequential philosophies of "good/better/best" management procedures/practices.
So to me the question "In their efforts at uptime, are data centers relying too much on infrastructure and not enough on best practices?" is best translated as "To belittle the concept of uptime and its association with reliability, are data centers relying too much on the raw realities of the universe and the physical laws that govern it and not enough on some random guy's philosophies regarding problems that only manifest within our imaginations?"
Or, as a medical analogy... "In their efforts in curing cancer, are doctors relying too much on science and not enough on voodoo/religion?"

Perfect illustration by jeffmeden · 2009-09-22 04:36 · Score: 4, Insightful

"Given the recent series of data center outages and the current focus on corporate cost control, the debate reflects the industry focus on how to get the most uptime for the data center dollar."
Repeat after me: There is no replacement for redundancy. There is no replacement for redundancy. Every outage you read about involves a failure in a feature of the datacenter that was not redundant and was assumed to not need to be redundant... assumed *incorrectly*. Redundancy is irreplaceable. If you rely on your servers (the servers housed in one place) you had better have redundancy for EVERY. SINGLE. OTHER. ASPECT. If not, you can expect downtime, and you can expect it to happen at the worst possible moment.

pointless marketing by vlm · 2009-09-22 04:43 · Score: 5, Informative

"Critics assert that the historic focus on Uptime tiers prompts companies to default to Tier III or Tier IV designs that emphasize investment in redundant UPSes and generators."
I've been involved in this field for about 15 years. The funniest misconception I've run into, time and time again, is that an unmaintained UPS, unmaintained battery bank, unmaintained transfer switch, and unmaintained generator will somehow act as magical charms so as to be more reliable than the commercial power they are supposedly backing up. And yes I've been involved in numerous power failure incidents (dozens) at numerous companies, and only experienced two incidents of successful backup of commercial power loss.
Transfer switches that don't switch. Generators that don't start below 50 degrees. Generators with empty fuel tanks staffed by smirking employees with diesel vehicles. When you're adding capacity to battery string A, and the contractor shorts out the mislabeled B bus while pulling cable for the "A" bus.
Experience shows that if a company's core competency is not running power plants, they would be better off not trying to build and maintain a small electrical power plant. Microsoft has conditioned users to expect failure and unreliability; use that conditioning to your advantage... the users don't particularly care if it's down because of an OS patch or a loss of -48VDC...

--
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger

1. Re:pointless marketing by R2.0 · 2009-09-22 06:23 · Score: 5, Interesting
  
  It's not just in IT. I work for an organization that uses a LOT of refrigeration in the form of walk-in refrigerators and freezers. Each one can hold product worth up to $1M, and all of it can be lost in a temperature excursion. So we started designing in redundancy: 2 separate refrigeration systems per box, a backup controller, redundant power feeds from different transfer switches over diverse routing (Browns Ferry lessons learned). Oh, and each facility had twice as many boxes as needed for the inventory.
  After installation, we began getting calls and complaints about how our "wonder boxes" were pieces of crap, that they were failing left and right, etc. We freaked out and did some analysis. It turned out that, in almost every instance, a trivial component had failed in 1 compressor and the box had failed over to the other refrigeration system, run for weeks to months, and then that one failed too. When we asked why they never fixed the first failure, they said "What failure?" When we asked about the alarm the controller gave due to mechanical failure, we were told that it had gone off repeatedly but was ignored because the temperature readings were still good and that's all Operations cared about. In some instances the wires to the buzzer were cut, and in one instance, a "massive controller failure" was really a crash due to the system memory being filled by the alarm log.
  Yes, we did some design changes, but we also added another base principle to our design criteria: "You can't engineer away stupid."
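A rough way to put numbers on why the ignored alarm mattered. This is only a minimal sketch, assuming a two-unit cold-standby pair with constant failure rates, perfect failover, and a standby that cannot fail while idle. The function name, the 10,000-hour MTBF, and the 24-hour repair time are illustrative assumptions, not figures from the post.

```python
from typing import Optional


def mttf_standby_pair(mtbf_hours: float, repair_hours: Optional[float]) -> float:
    """Mean time until BOTH units are down for a two-unit cold-standby pair.

    Standard Markov-model result under the assumptions stated above:
        MTTF = (2*lam + mu) / lam**2
    where lam is the per-unit failure rate and mu is the repair rate.
    Passing repair_hours=None models "nobody ever fixes the first failure"
    (mu = 0), which reduces the formula to 2 * MTBF.
    """
    lam = 1.0 / mtbf_hours
    mu = 0.0 if repair_hours is None else 1.0 / repair_hours
    return (2.0 * lam + mu) / (lam * lam)


# Hypothetical numbers, chosen only for illustration.
MTBF_HOURS = 10_000.0
REPAIR_HOURS = 24.0

ignored = mttf_standby_pair(MTBF_HOURS, None)          # alarm ignored, no repair
repaired = mttf_standby_pair(MTBF_HOURS, REPAIR_HOURS)  # first failure fixed promptly

print(f"no repair:   {ignored:,.0f} hours")
print(f"with repair: {repaired:,.0f} hours")
print(f"ratio:       {repaired / ignored:,.1f}x")
```

Under these assumptions, a redundant pair that is never repaired only lasts about twice as long as a single unit. Most of the benefit of the second unit comes from fixing the first failure while the spare is carrying the load.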
  
  --
  "As God is my witness, I thought turkeys could fly." A. Carlson

RAID by QuantumRiff · 2009-09-22 04:43 · Score: 4, Interesting

Why go with a huge, multiple-9's datacenter when you can go the way of Google and have a RAID:
Redundant Array of Inexpensive Datacenters.

Is it really better to have 1000 machines in a 5-9's location, or 500 systems in each of two 4-9's locations, with extra cash in hand?
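The question can be put into rough numbers. This is a minimal sketch, assuming that "N nines" means an availability of 1 - 10^-N, that the two sites fail independently, and that the service counts as "up" whenever at least one site is up. It ignores the fact that a single surviving site has only half the machines, and the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(nines: int) -> float:
    """Availability for a site rated at the given number of nines (5 -> 0.99999)."""
    return 1.0 - 10.0 ** (-nines)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of downtime per (non-leap) year at this availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


def at_least_one_site_up(avail: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up."""
    return 1.0 - (1.0 - avail) ** sites


one_five_nines = availability(5)
two_four_nines = at_least_one_site_up(availability(4), 2)

print(f"one 5-9's site:      {downtime_minutes_per_year(one_five_nines):.2f} min/year fully down")
print(f"each 4-9's site:     {downtime_minutes_per_year(availability(4)):.2f} min/year down")
print(f"both 4-9's sites:    {downtime_minutes_per_year(two_four_nines):.4f} min/year fully down")
```

The independence assumption is doing most of the work here: a shared failure mode (same utility feed, same software bug, same operator mistake) would make the two-site number much worse than this estimate suggests.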

--

What are we going to do tonight Brain?

Re:But it's never the software... by Maximum Prophet · 2009-09-22 04:54 · Score: 4, Insightful

Code scales, hardware doesn't. If you have one machine, yes, it is cheaper to get a bigger, better machine, or to wait for one to be released.

If you have 20,000 machines, even a 10% increase in efficiency is important.

--
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)