How Do You Evaluate a Data Center?
mpapet writes to ask about the ins and outs of datacenter evaluation. Beyond the simpler questions of physical access control, connectivity, and power redundancy/capacity and SLA review, what other questions are important to ask when evaluating a data center? What data centers have people been happy with? What horror stories have people lived through with those that didn't make the cut?
Beyond the simpler questions of physical access control, connectivity, and power redundancy/capacity and SLA review
Well first of all, I don't know that I'd write any of those things off as "simple". But some other points worth looking into would be:
Cable Management (over or under floor)
Cooling Capacity and Redundancy
Power Quality (not just redundancy)
Age and Condition of Electrical Hardware (ATSs, STSs, UPSs, Generators)
Outage/Uptime History
Fire Suppression System and Smoke Detection System
Maintenance records
Maintenance records
Maintenance records
Look at a datacenter's history (recent and past): outages, maintenance issues, customer support, management, etc., in conjunction with its listed redundancies and capacities.
Just because they have two electrical feeds going to each server doesn't mean a random maintenance tech won't flip the wrong switch. :)
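When comparing an outage history against an SLA, it can help to turn the availability percentage into hours. The following is a minimal sketch, not anything from the posters above; the function names and the sample outage log are made up for illustration.

```python
from typing import Sequence

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime an SLA still permits over the period."""
    return period_hours * (1 - availability_pct / 100)


def observed_availability_pct(outage_hours: Sequence[float],
                              period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability actually achieved, given outage durations in hours."""
    return 100 * (1 - sum(outage_hours) / period_hours)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% allows {allowed_downtime_hours(pct):.2f} h/year of downtime")
    # Hypothetical outage log: three incidents in one year, in hours.
    print(f"observed: {observed_availability_pct([4.0, 0.5, 1.5]):.3f}%")
```

A facility that advertises 99.99% but logged six hours of outages last year is not meeting its own number, whatever the redundancy diagram says.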
set it on fire, throw floods at it, generate tornados, then top it off with a nice earthquake.
I ran a data center long, long ago. My sales guy knew it wasn't going to pan out and threw me to the wolves. He asked me to start the tour, and then he took a long lunch to miss it.
The guys I gave the tour to seemed very intelligent. They only spent about 60 seconds on our data center. The instant they saw the carpet, their eyebrows went up. When I told them the truth, that there was no diesel generator on the other side of the (secretly dead) batteries, they did exactly what they should have and stormed out without saying thanks.
slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
I'd also ask:
Number of years in business.
Involvement of the owner in the current business.
Number of years the current owner has been in this business.
Also check with the Better Business Bureau to see what complaints, if any, have been filed.
And, as always, Google is your friend -- definitely do a search for the business you are considering along with the word(s) problem, issue, complaint, praise, etc!
Pull floor tiles and compare the amount of obsolete technology-- Thicknet cables, VAX cluster interconnects, water chiller hookups, FDDI cables, etc. with the amount of space remaining.
Anything less than 4 inches of obsolete crud isn't worth excavating. Leave it a few more years.
--Joe
Find someone you trust who's already a customer. Word of mouth beats any number of white papers or studies or guarantees.
I'm assuming this is evaluating for co-location purposes. Here are some things I'd ask.
1) How quickly can I get a new server deployed into it? How do I do it?
2) Can I get a tour? Now? (Note that this not only lets you see the data centre, but also will give you an idea of security. Look for procedures on getting in, notice if they ask you to sign a release form, etc.)
3) How close to capacity are you? (The answer should include space, floor weight, power, cooling, and network. If it doesn't, why not?)
4) What are your racking/networking/cabling standards? (They should have some, at least where you connect to them, but they shouldn't be onerous).
5) How many people manage the data centre? You don't want to be one car accident away from loss of access or service.
6) How about power management? Is the centre on a UPS, redundant UPSes, or nothing? Can you get charts of the power going to the servers? Can you get DC for telecom servers, or only AC? Is it on a generator for long-term outages? (Note that you may not need this--in which case you shouldn't pay for it. Alternatively, if you need it, make sure it's there!)
7) Is it manned 24/7? (Ditto!)
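For question 3, one way to compare the answers is to compute utilization per dimension and see which one runs out first. This is only a sketch under assumed numbers: the figures, units, and names below are hypothetical and should be replaced with whatever the provider actually reports.

```python
# Hypothetical answers to "how close to capacity are you?":
# dimension -> (in use, total), in whatever unit the provider reports.
capacity = {
    "space_racks": (150, 200),
    "floor_load": (600, 1200),
    "power_kw": (920, 1000),
    "cooling_kw": (850, 1000),
    "network_gbps": (12, 40),
}


def utilization(capacity):
    """Fraction used per dimension; None marks a dimension left unanswered."""
    return {
        name: (used / total if used is not None and total else None)
        for name, (used, total) in capacity.items()
    }


def binding_constraint(capacity):
    """The dimension closest to exhaustion, i.e. the one that limits growth first."""
    known = {k: v for k, v in utilization(capacity).items() if v is not None}
    return max(known, key=known.get)


if __name__ == "__main__":
    for name, frac in utilization(capacity).items():
        print(name, "unanswered" if frac is None else f"{frac:.0%}")
    print("runs out first:", binding_constraint(capacity))
```

A dimension that comes back unanswered is itself a finding, per the "If it doesn't, why not?" above.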
If you can, ask them to pull a tile so you can see under the raised floor. Underfloor cabling (and suspended ceiling cabling for that matter) should be neat, tied, and labelled. Dead cables should be pulled, not left to rot. There has to be sufficient clearance for unrestricted airflow. Cages are better than lying on the floor.
Most of what makes a good data centre comes down to organization. If it's a rat's nest, then even if there's one guy who knows "everything," it will be less reliable, less consistent, and less predictable. Procedures should be written down, printed, filed in labeled binders, and regularly updated. (Note: Online copies should be canonical, but they also need to be accessible offline when shit --> fan.)
Fire suppressant mechanisms (wet vs. dry, live pipes, etc.) need to be considered, as does emergency lighting. If the operators need to start digging around for a flashlight to read what they should be doing, then things aren't happening the way they should.
Be picky. If they're leasing space to you, then their data centre design and maintenance is their BUSINESS, and they had better get it right! Look for a neat, well-organized, well-documented, well-planned data centre. Also make sure that it fits your needs.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
There are basically 3 perspectives from which to evaluate the Datacenter, and they're pretty much universal to any IT eval: People, Process and Technology. The datacenter facility itself is only one piece of the puzzle (Facility = Technology, which only accounts for a fraction of the total cost of operating a Datacenter). There are also the people running the datacenter and how they are organized and interact with the technology, one another, and their customers (internal and external). From a people/process standpoint, if you want to give a general "score" to them, you can assess them against the SLM maturity scale. (Read about the Gartner Maturity Model for Infrastructure and Operations.)
Evaluating a datacenter is going to be a balance between the cost of operating the datacenter and the level of service you require from said datacenter. There really isn't enough information in the question to give you a good answer. Are you looking at evaluating the acquisition of a datacenter to grow into? Are you looking for a managed services DC to host your gear with operational support? Are you looking for rack space with pipe and power? If you give more details in your inquiry, I'm sure the community can provide you with some great answers.
Regardless of how well they are decked out, always start with a "pilot project". Start small for a short period to evaluate real world performance of both their equipment and their tech support. We currently have a pilot project in place to evaluate a datacentre for outsourcing our compute requirements. We have learned that while they have exceptionally good equipment in place, their responsiveness and ability to provision is highly questionable.
Feed the need: Digitaladdiction.net
What does your company _NEED_? How much bandwidth do you need? What kind of servers do you need? Are you looking for Co-Lo or Dedicated? If you're doing Co-Lo, how much power and space do you need? If you're doing dedicated, do you need managed or unmanaged? PCI compliance? HIPAA compliance? Do you want to pay for certain redundancies? Do you need an Uptime Institute Tier certified facility? I could go on and on. The one thing that you need consistently is good customer service. The rest depends on what you need. Full Disclosure: I work for one of the biggest privately held dedicated hosting companies on the planet.
Since the odds are I'm going to be spending the night there at some point, good vending machines or a cafeteria are a must.
"Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
Such as street access. Is there more than one way in? If the access road was closed off (police incident, subsidence, civil unrest - depending where it's sited), what would happen? Could staff get to work, or leave for home?
Ease of recruiting / retaining sufficiently qualified staff in the locale, or persuading your existing staff to commute or relocate
Is the on-site restaurant / canteen or local eateries likely to give everyone food poisoning (this could be a single point of failure)
Local crime rate - number of times the facility has been broken into - even the amount of graffiti on the walls could be a negative indicator
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
You forgot a few:
- Enough qualified *on site* staff 24x7 to deal with all clients including yourself
- 24x7 phone support, with people who understand English and have immediate access to the techies
- Company financial records and history (You don't want someone almost broke or a new startup with no backing)
- These days availability of virtualisation solution and supporting hardware (depending on your application, if virtualisation is an option)
Oh and your emphasis on maintenance records may be a little misplaced. They can be faked. They also may not be available due to security concerns (of their other clients). *IF* you can get hold of them they should be complete. Hardware service level should be part of the agreement and service schedule should be part of that.
These posts express my own personal views, not those of my employer
When I worked at a corporate office in Maryland, they used the building's air conditioning to cool the server room.
This worked well until the outside temperature got down to about 15 degrees Fahrenheit, but then it failed miserably: the outdoor condensers no longer functioned, the AC shut down, and the entire IT department went into a panic.
The first time this happened, I (a lowly Help Desk tech) suggested to the CIO that he run a duct into the room from the outside: a simple fan would bring in enough sub-freezing air to cool the servers.
The second time it happened, the look on his face told me he hadn't taken my suggestion seriously enough.
The third time, he flipped a switch and the fan cooled his server room just fine.
More important than the technology are the policies and training of the personnel running the operation. It will fail, eventually: it always does, no matter how well it's designed or what promises of infinite uptime come with it. So walk into the data center and count the number of people wearing hiking boots, divide by the number of racks, and there you go. The most grizzled-looking guy wearing hiking boots usually knows everything. He also usually has a lighter and a screwdriver if you ask.
I don't know why this is...
#fuckbeta #iamslashdot #dicemustdie
I used to have a large cage in an Exodus colocation facility. Turns out that if we wanted to put in an EMC Symm5 (these are three tiles wide), we would have to rent a fork lift and put it through an open rollup door on the second floor. Their "freight elevator" was barely big enough for two people and a dolly.
One of my other cages was housed in a Global Crossing facility; when they started to run out of cooling, they would hook up huge external A/C units in the parking lot and run 2ft diameter ducting to a hole in the wall. If you happened to walk near one of these openings you'd be greeted by freezing 50mph winds.
Anybody find it odd that Exodus bought Global Crossing, who then went out of business?
I am the Director of Operations for our DC. When we give tours, I explain the following (pseudo order of the tour):
- Begin with the history of the building, when it was built (1995), why it was built (result of Andrew in 1992), and how it is constructed (twin T, poured tilt wall).
Infrastructure:
- Take you through the gen room, show you it is internal to the building, show you the roofing structure from the inside, explain the N+1 redundancy, the hours on the gens, when they are ready for maintenance, how they are maintained, by whom (the vendor), how the diesel is stored, supplied, duration of fuel at max and current loads. Explain conduct before a hurricane or lockdown, how we go off grid 24hours ahead of a storm, mention our various contracts for after storm refill and our straining / refill schedule.
- Take you to the switch gear room, explain the dual feeds from the power company, how the switch gear works, show you the three main bus breakers, show you the numerous other breakers for various sub panels, etc. Explain and show you the spare breakers we have in case replacement is needed.
- Take you to the cooling tower area, explain the piping, the amount of water flowing, the number of pumps, how many are needed, the switching schedule, explain the N+1 capacity and overall capability of the towers, explain maintenance, show you the replacement pumps in stock, explain the concept of condensed water cooling if needed.
- Take you through the UPS and battery rooms, explain the needed KW capacity, what the UPSs back up and what they do not. Show the various distribution breakers out to floor, their capacity, the static switches, bypass, explain the battery capacity, type of cells, number of cells, number of strings, last time the jars were replaced and how they are maintained. Explain max capacity of the load vs time. Answer questions relevant to switching from utility->UPS->generator and back.
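Two of the figures from this part of the tour, fuel duration at a given load and whether a plant is really N+1, are easy to sanity-check yourself. The sketch below is an illustration only: the function names are invented, and the burn rates and capacities are whatever the facility or the generator vendor tells you, not values from this post.

```python
import math


def fuel_runtime_hours(usable_fuel_gal: float, burn_gal_per_hour: float) -> float:
    """Hours of generator runtime on stored fuel at a given burn rate."""
    return usable_fuel_gal / burn_gal_per_hour


def units_required(load_kw: float, unit_capacity_kw: float) -> int:
    """N: the number of units needed to carry the load with no spare."""
    return math.ceil(load_kw / unit_capacity_kw)


def is_n_plus_1(load_kw: float, unit_capacity_kw: float, units_installed: int) -> bool:
    """True if the load is still carried after losing any single unit."""
    return units_installed >= units_required(load_kw, unit_capacity_kw) + 1


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 gal of usable diesel, and burn rates
    # quoted at current load and at maximum load.
    print("current load:", fuel_runtime_hours(10_000, 70), "hours")
    print("max load:", fuel_runtime_hours(10_000, 140), "hours")
    # 1,800 kW of load on 1,000 kW generators needs N=2, so N+1 means 3 installed.
    print("N+1 with 3 units:", is_n_plus_1(1_800, 1_000, 3))
```

The same N+1 check applies to pumps, cooling towers, and UPS modules: recompute it at the load the facility expects once it is full, not just at today's load.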
Raised floor:
- Take a walk on the raised floor, explain connectivity, vendors, path diversity we have, how the circuits are protected. Show them network gear, dual everything, how we protect from a LAN or WAN outage, and specific network devices we have for DDoS, Load Balancing, Distribution, Aggregation. Explain how telco and others deliver DS0 to OC-12 capacity, offer information on cross connections regarding copper, fiber, coax. Explain our offerings (dedicated servers up to 5K sq ft cages) and ask what they are interested in.
- Explain below the floor: size of raise, that power and network are delivered underneath, what is on the level one trays and level two trays, and the piping for cooling. Show the PDU units and how they relate to the breakers in the previous rooms. Show them the cooling panel and leads out to CRAC units, explain the cooling capacity, plans for future cooling, explain hot/cold aisle fundamentals, and temperature goals. At this point, there are usually more questions about vented tiles, power types available and overall floor density in watts/sq ft.
- Explain the fire detection / mitigation system, monitoring of PDU's, CRAC units, and FM200. Explain the maintenance of the fire system, show them the fire marshal inspection logs and the panels that alert the police and fire departments (both on floor and in our security office in front).
- While finishing the walk on the floor, show cameras, explain process to bring in and remove equipment, tell them the retention on the video, explain the rounds the guards make, the access list updates and changes.
NOC:
- At this point we're back to the front of the building, go into the NOC, explain what we are monitoring (connectivity, weather, scheduled jobs, etc.). Introduce NOC and security staff, explain they will always get a person if they call, submit a test ticket from an e-mail on my phone; they will see the alerts light up and the pager for the NOC will signal. The final steps are to introduce them to security and then I'll lead the customer(s) to the conference room so they can continue the conversation.
So there.
I'd guess 90% of projects fail at step #1: Define your needs. What's the objective here? Why are we doing this, and what are the benchmarks required for success? Does this sound familiar?
First, define your needs, then evaluate possible solutions that might meet your needs.
If you don't know what you need, you don't know what the hell you are doing. Hire someone who does, like a consultant.
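One common way to make "define your needs, then evaluate" concrete is a weighted scoring matrix: filter on must-haves first, then rank what is left. This is a generic sketch, not the poster's method; the criteria, weights, and facility names are all hypothetical.

```python
# Hypothetical needs: one hard requirement plus weighted nice-to-haves.
must_haves = ["pci"]
weights = {"support": 3, "power": 2, "distance": 1}

# Hypothetical candidates, rated 0-5 on each weighted criterion.
facilities = {
    "A": {"pci": True, "support": 2, "power": 5, "distance": 4},
    "B": {"pci": True, "support": 5, "power": 3, "distance": 3},
    "C": {"pci": False, "support": 5, "power": 5, "distance": 5},
}


def score(facility, weights):
    """Weighted average of the ratings; a missing rating counts as 0."""
    total_weight = sum(weights.values())
    return sum(w * facility.get(c, 0) for c, w in weights.items()) / total_weight


def shortlist(facilities, must_haves, weights):
    """Drop facilities missing any must-have, then rank the rest by score."""
    qualified = {
        name: f for name, f in facilities.items()
        if all(f.get(m) for m in must_haves)
    }
    return sorted(qualified, key=lambda n: score(qualified[n], weights), reverse=True)


if __name__ == "__main__":
    for name in shortlist(facilities, must_haves, weights):
        print(name, round(score(facilities[name], weights), 2))
```

Facility C rates highest on everything but is dropped for failing the must-have, which is the point of writing the needs down before touring anything.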
I wrote an extensive article on choosing a datacenter/colocation facility several months back. The full post can be found on my blog, but I will paste it below for your Slashdot reading convenience:
http://www.bitplumber.net/2009/04/how-to-choose-a-colocation-facility/
How to choose a colocation facility
Choosing a colocation facility is one of the most important decisions an IT professional can make. It will have repercussions for years down the road, as there is generally a contract term associated, and it becomes difficult/costly to move. At the same time, unless you are a facilities professional, it is hard to tell the difference between the quality of one facility vs. that of another without knowing the right questions to ask. I have developed this list in the hopes that it will be a reference to folks evaluating datacenter options. This has been written using the assumption that you need a local datacenter rather than a DR facility (which can have very different needs), however, many of the same concepts will apply.
Location
When it comes right down to it, there are still certain things you have to do physically in person. You can’t run a network cable through SSH or RDP. Having a datacenter close by makes a huge difference, especially when you lose remote connectivity and must go push a button in an emergency (we all have done this once or twice). In general, the newer, more high-end, and redundant your equipment is, the less you should have to touch it in person. Things are getting much better with out of band remote access controllers, but sometimes being there is worth a lot. You can’t hear that fan making funny noises from your office.
Does the facility have good access to transportation such as freeways and airports? Are there hotels nearby if you will have out-of-town contractors visiting? How close are you to logistics depots for your vendor of choice's parts (e.g. Cisco, Dell, HP)?
Does the facility have adequate parking close to the building, and does it cost money? Is it somewhere you want to leave your car in the middle of the night while you are inside working?
Do you have line-of-sight to the datacenter? If you can manage to get a wireless link to your datacenter this can be an extremely cost-effective option for high speed connectivity. There is something to be said for controlling your own destiny when it comes to your connectivity rather than being at the mercy of a telecom provider. Will the facility allow you to put a wireless antenna on the roof and how much will they charge?
Staffing
Do they have on-site staff 24×7 to respond to emergency situations, to secure the facility, and to provide access when you forget/lose your badge (or have to stop by on your way home from the gym)?
If they do not have staff on site 24×7, what is their on-call policy? How long would it take them to respond to a power failure, a UPS exploding, a transformer catching fire in the parking lot, an Internet outage, an FM-200 fire suppression system going off, an HVAC system failing, or any other major malady (yes I have had all of these things happen to me in facilities I have worked in, and I am still waiting for the day a fire sprinkler goes off or there is a real fire in a datacenter).
What level of professional services can they provide? Basic remote hands (please press the power button)? More advanced troubleshooting (help diagnose a failed network switch)? Or even managed services (i.e. they take care of backups).
How competent are their NOC engineers, facilities folks, etc.? What quality of vendors do they use for electrical work, HVAC maintenance, and network cabling? This can be hard to tell, but there are lots of small clues you can pick up on.
Does their staff speak English fluently and without heavy accent? It is extremely difficult to communicate on the phone with someone in a loud datacenter environment about complex technical issues when both of you are having a hard time understanding each other. This dramatically slo
Is there a good desk working area? Is there a landline/PBX for you to make calls from? Is there decent mobile phone reception in the work area and by your cabinet? Can you eat food or bring drinks into the work area or around your cabinet? Is it in a shady neighborhood, where you might feel a little intimidated bringing in tens of thousands of dollars of emergency IT equipment @ 3 AM? In the event that your credentials aren't working (i.e. hand scanner, ID card swipe), can they let you in remotely, or is it manned 24/7? Is it carrier neutral and are there other backbone providers that you can connect with? Do they charge for running cables between cabinets, especially in cases where the cabinets are not adjacent? What is the max amperage that they'll provide per cabinet? Do the rack cabinet doors remove easily? Are there chairs available, and damn it, are they comfortable?
Your question is a little ambiguous. Are you looking to buy a data center of your own or are you renting rackspace?
If you are buying the Data Center
1.) Normal title, lien, and structural due diligence, as for any RE purchase
2.) Is it on a flood plain
3.) Seismically active site?
4.) Serviced by multiple communication providers from multiple CO's
5.) Power available from two different substations.
6.) Physical security / susceptibility to civil unrest
7.) Physical access: driveways, parking, loading docks, hallway widths, elevators, ramps
8.) Floor / raised floor design loads. I have seen more than one raised floor rippled by rolling overweight gear on it.
9.) On site power generation / fuel storage. Mech. condition, age, availability, reliability, repair-ability
10.) Sufficient Chiller Capacity
11.) Sufficient UPS / Power Conditioning
12.) Sufficient space both for current needs and growth for planned lifetime
13.) Sufficient office / command center space
Those should be adequate to get you started.
For rented rackspace
I would say you at least need to glance at items 2 through 11 above. Beyond that
1.) Per rack power limits
2.) Physical security
3.) If you are using "hands on" services, their skill set and response time.
4.) Whatever value add services you will be using.
Sorry it is late and a long day and this is all I can think of.
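On per-rack power limits: a quoted amperage only means something once it is converted to usable kilowatts and compared with what your gear draws. The sketch below assumes a single-phase circuit, a power factor close to 1, and the common North American practice of loading a circuit to no more than 80% continuously; all three are assumptions to confirm with the facility, and the function names are made up.

```python
def circuit_kw(volts: float, amps: float, continuous_derate: float = 0.8) -> float:
    """Usable kW on a single-phase circuit, derated for continuous load.

    Assumes power factor ~1, so watts and volt-amps are treated as equal.
    """
    return volts * amps * continuous_derate / 1000


def fits(server_watts, volts: float, amps: float) -> bool:
    """True if the summed server draw stays within the usable circuit capacity."""
    return sum(server_watts) / 1000 <= circuit_kw(volts, amps)


if __name__ == "__main__":
    # Hypothetical rack: 350 W servers on a 120 V / 20 A circuit.
    print("usable kW:", circuit_kw(120, 20))
    print("5 servers fit:", fits([350] * 5, 120, 20))
    print("6 servers fit:", fits([350] * 6, 120, 20))
```

Use measured draw under load rather than nameplate wattage where you can; nameplate figures tend to overstate, which wastes rack space you are paying for.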