Are Data Center "Tiers" Still Relevant?
miller60 writes "In their efforts at uptime, are data centers relying too much on infrastructure and not enough on best practices? That question is at the heart of an ongoing industry debate about the merits of the tier system, a four-level classification of data center reliability developed by The Uptime Institute. Critics assert that the historic focus on Uptime tiers prompts companies to default to Tier III or Tier IV designs that emphasize investment in redundant UPSes and generators. Uptime says that many industries continue to require mission-critical data centers with high levels of redundancy, which are needed to perform maintenance without taking a data center offline. Given the recent series of data center outages and the current focus on corporate cost control, the debate reflects the industry focus on how to get the most uptime for the data center dollar."
Only when coming from the eyes of data center owners.
-1, Disagree is not a valid option. Troll, Flamebait and Offtopic are not a substitute.
And they never were.
If you are large enough to survive one or more site outages, then sure, you can go for a cheaper $/sq ft design without redundant power and cooling. If, on the other hand, you are like most small to medium shops, then you probably can't afford the downtime because you haven't reached the scale where you can geographically diversify your operations. In that case downtime is probably still much more costly than even the most expensive of hosting facilities. I know when we looked for a site to host our DR site we were only looking at tier-IV datacenters, because the assumption is that if our primary facility is gone we will have to timeshare the significantly reduced-performance facilities we have at DR, and so downtime wouldn't really be acceptable. By going that route we saved the ~$500k in equipment it would take to make DR equivalent to production, at a cost of a few thousand a month for a top-tier datacenter. Those numbers are easy to work.
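To make "easy to work" concrete, here is a minimal sketch of the break-even arithmetic. The ~$500k figure is from the comment; the $3,000 monthly fee is only a hypothetical stand-in for "a few thousand a month", and the function name is made up for illustration.

```python
def months_to_break_even(equipment_savings: float, monthly_hosting_fee: float) -> float:
    """Months of hosting fees it takes to spend the up-front equipment savings."""
    if monthly_hosting_fee <= 0:
        raise ValueError("monthly_hosting_fee must be positive")
    return equipment_savings / monthly_hosting_fee


equipment_savings = 500_000    # from the comment (~$500k)
monthly_hosting_fee = 3_000    # ASSUMPTION: illustrative value for "a few thousand"

months = months_to_break_even(equipment_savings, monthly_hosting_fee)
print(f"Break-even after about {months:.0f} months ({months / 12:.1f} years)")
```

Under that assumed fee the hosting premium takes well over a decade to add up to the equipment cost, which is longer than most hardware stays in service. The sketch ignores power, staffing, and refresh cycles.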
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
"More uptime for the data center dollar" is a meaningless phrase.
The tried and true statement is that you can pick two (2) of the following:
-FAST
-RELIABLE
-CHEAP
Changing the metric to SAY you are providing all three does not mean you ACTUALLY are. It is just another way to confuse the customer and sell an inferior service as a premium service. If a company chooses to favor lower costs over redundancy in their data center, that is their choice. If we start to blur the line between the different options, we only hinder the ability of a company to make an informed decision.
Infrastructure is more important than "best practices". Infrastructure is more of a physical, concrete aspect. Practices really aren't that important once the critical, physical disasters begin. As an example, good hardware will continue to run for years. Most of the downtime on good hardware will most likely be due to misconfiguration, human error, that sort of thing. A Sys Admin banks on some wrong assumption, messes up a script or hits the wrong command, but nonetheless the hardware is still physically able and therefore the infrastructure has not been jeopardized. If the situation is reversed, with top-notch paper plans and procedures but crappy hardware, well... the realities of physical discrepancies are harder to argue with than our personal, nebulous, intangible, inconsequential philosophies of "good/better/best" management procedures/practices.
So to me the question "In their efforts at uptime, are data centers relying too much on infrastructure and not enough on best practices?" is best translated as "To belittle the concept of uptime and its association with reliability, are data centers relying too much on the raw realities of the universe and the physical laws that govern it and not enough on some random guy's philosophies regarding problems that only manifest within our imaginations?"
Or, as a medical analogy... "In their efforts in curing cancer, are doctors relying too much on science and not enough on voodoo/religion?"
Whose best practices do you propose we follow?
As the article states, current cost-cutting best practices are leading to mediocrity.
Data center redundancy is a needed thing. However, most data center designs forget to address the two largest causes of downtime ... people and software. People are people and will always make mistakes, but there are still things that can be done to reduce the impact of human error.
Software is very rarely designed for use in redundant systems. More likely, it is designed for a hot-cold or hot-warm recovery scenario. Very rarely is it designed to run hot at multiple data centers at once.
Remember, good disaster avoidance is always cheaper than disaster recovery when done right.
"A stick of RAM costs how much? $50?"
I don't remember the source of that quote, but it was in relation to a company spending money (far more than $50) to reduce the memory use of their program. Sure, there's a lot of talk in computer science curricula about using efficient algorithms, but from what I've seen and heard, companies almost always respond to performance problems by buying bigger and better hardware. If software weren't grossly inefficient, how would that affect data centers? Less power consumption, cheaper hardware, and more "bang for your buck", so to speak.
Eventually, this whole debate becomes moot, as data centers can get more income from the hardware, thus still provide the uptime, redundancy, and features, without the need to cut costs. Once those basic needs are out of the way, there's room for expansion into other less-than-critical offerings, and finally, innovation in areas other than uptime.
You do not have a moral or legal right to do absolutely anything you want.
Given the recent series of data center outages and the current focus on corporate cost control, the debate reflects the industry focus on how to get the most uptime for the data center dollar.
Repeat after me: There is no replacement for redundancy. There is no replacement for redundancy. Every outage you read about involves a failure in a feature of the datacenter that was not redundant and was assumed to not need to be redundant... assumed *incorrectly*. Redundancy is irreplaceable. If you rely on your servers (the servers housed in one place) you had better have redundancy for EVERY. SINGLE. OTHER. ASPECT. If not, you can expect downtime, and you can expect it to happen at the worst possible moment.
Critics assert that the historic focus on Uptime tiers prompts companies to default to Tier III or Tier IV designs that emphasize investment in redundant UPSes and generators
I've been involved in this field for about 15 years. The funniest misconception I've run into, time and time again, is that an unmaintained UPS, unmaintained battery bank, unmaintained transfer switch, and unmaintained generator will somehow act as magical charms so as to be more reliable than the commercial power they are supposedly backing up. And yes I've been involved in numerous power failure incidents (dozens) at numerous companies, and only experienced two incidents of successful backup of commercial power loss.
Transfer switches that don't switch. Generators that don't start below 50 degrees. Generators with empty fuel tanks staffed by smirking employees with diesel vehicles. When you're adding capacity to battery string A, and the contractor shorts out the mislabeled B bus while pulling cable for the "A" bus.
Experience shows that if a company's core competency is not running power plants, they would be better off not trying to build and maintain a small electrical power plant. Microsoft has conditioned users to expect failure and unreliability; use that conditioning to your advantage... the users don't particularly care if it's down because of an OS patch or a loss of -48VDC...
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Why go with a huge, multiple 9's datacenter, when you can go the way of Google, and have a RAID:
Redundant Array of Inexpensive Datacenters..
Is it really better to have 1000 machines in a 5-9's location, or 500 systems each in two 4-9's locations, with extra cash in hand?
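A rough way to compare the two options is to compute the chance that at least one site is up. The sketch below assumes site failures are statistically independent and that the software can actually run from either site; neither assumption is guaranteed in practice, and the function name is purely illustrative.

```python
def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` identical sites is up.

    ASSUMPTION: site failures are independent of each other.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    return 1.0 - (1.0 - site_availability) ** sites


one_five_nines_site = combined_availability(0.99999, 1)
two_four_nines_sites = combined_availability(0.9999, 2)

print(f"one 5-9's site:  {one_five_nines_site:.8f}")
print(f"two 4-9's sites: {two_four_nines_sites:.8f}")
```

On paper the pair of cheaper sites wins, but only for "something is still running": with 500 machines per site, a single-site outage still leaves you at half capacity, and correlated failures (shared provider, regional disaster, bad software push) break the independence assumption.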
What are we going to do tonight Brain?
Designing nontrivial systems without single points of failure is difficult and expensive. Worse, it has to be built in from the ground up. Which it rarely is: by the time a system is valuable enough to merit the cost of a failover system, the design choices which limit certain components to single devices have long since been made.
Which means uptime matters. 1% downtime is more than 3 days a year. Unacceptable.
The TIA-942 data center tiers are a formulaic way of achieving satisfactory uptime. They've been carefully studied and statistically tier-3 data centers achieve three 9's uptime (99.9%) while tier-4 data centers achieve four 9's. Tiers 1 and 2 only achieve two 9's.
Are there other ways of achieving the same or better uptime? Of course. But they haven't been as carefully studied, which means you can't assign as high a confidence to your uptime estimate.
Is it possible to build a tier-4 data center that doesn't achieve four 9's? Of course. All you have to do is put your eggs in one basket (like buying all the same brand of UPS) and then have yourself a cascade failure. But with a competent system architect, a tier-4 data center will tend to achieve at least 99.99% annual uptime.
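For anyone who wants to check the "nines" arithmetic, here is a small sketch that converts an availability percentage into downtime per year. It assumes a 365-day year; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service is down at the given availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


for label, percent in [("two 9's", 99.0), ("three 9's", 99.9), ("four 9's", 99.99)]:
    hours = downtime_hours_per_year(percent)
    print(f"{label} ({percent}%): {hours:.2f} hours/year, {hours * 60:.0f} minutes/year")
```

At 99% that works out to 87.6 hours, or about 3.65 days a year, which matches the "more than 3 days" figure. Each extra nine cuts the allowance by a factor of ten.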
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
European bank IT people are some of the most conservative and risk-averse people on the planet. If you ask them which is more important, infrastructure or best practices, they will answer "Yes."
----------
Change is inevitable. Progress is not.
On a strict IT budget cost-effectiveness basis, the most uptime for your dollar will be Windows (Windows admins practically grow on trees, so they are cheap) on some commodity Pizza Box servers, connected to some cheap NAS storage and networked with crap switches. If you are an IT manager looking for your short-term bonus before you move onto greener pastures, this is a great idea! There is a good chance you will be able to hold things together long enough to get your bonus, and then get outta there.
Of course, if you actually care about the business IT is supposed to support, you will get a setup a bit more trusty. But if the IT manager isn't incentivized for long-term uptime stats, it just isn't gonna happen.
SirWired
Libraries worked just fine, for thousands of years.
The truly paranoid aren't working at NSA, they're working as safety engineers :-) It's a genuine pity that the so-called bond rating agencies aren't required to employ safety engineers to review the crazy stuff dreamed up by investment bankers.
There are so many "Black Swan" events that safety engineers have to worry about that some of them sound insane ... even when they're events that have happened at least once before.
(As Dave Barry says, "I am not making these up") "Cattle in the Lobby", "Concerted Attack by Rodentia", etc.
Having your data center somewhere prone to:
- floods
- hurricanes
- earthquakes
- plane crashes (end of an airport runway)
- bad power supply
- bad network connections
is just crazy.
5 miles from me, there are streets and homes with 6 feet of water covering them.
I know of multiple telecom data centers that are in hurricane paths or other possible major accident locations that could take out the building.
If your data center doesn't have redundant power from 2 different substations AND the power company doesn't offer continuous power, WHAT ARE YOU THINKING? That's a resource room, not a data center.
Your network uplinks need to be redundant via 2 different providers.
If you can't afford this infra - pay for colocation where they do OR design all your systems to be geographically redundant. I think you'll find that most companies can't afford to do that.
Were you running relational databases? What did you do about schema changes?
(i.e. presumably if you were running relational DBs then there would be one big data set shared between all three sites; you couldn't, e.g., deadvertise one site, change the schema, then readvertise it, as then the schemas would be different...)