How Do You Evaluate a Data Center?
mpapet writes to ask about the ins and outs of datacenter evaluation. Beyond the simpler questions of physical access control, connectivity, and power redundancy/capacity and SLA review, what other questions are important to ask when evaluating a data center? What data centers have people been happy with? What horror stories have people lived through with those that didn't make the cut?
I noticed something when touring one datacenter. They had a neat conference room that overlooked the whole datacenter. You could see the heat rising off of one area (Google's room). They went on and on about the wonders of their cooling, and how they had so much capacity.
We later took the guided tour. The person I was with was talking to our guide, and I was paying careful attention to our environment. There were tremendous hotspots on the floor. We're not talking about 78 degrees. It was closer to the 90s. Other spots were downright cold. Why? Because they had all this capacity, and no real planning. The circulation was insufficient, even though the capacity was available. A well-populated rack will always be hot at the back, but it's expected that they will draw the air off of that area rather quickly. I've even seen datacenters that enforce their hot/cold aisles to no real benefit: there is no air return on the hot side, so the exhaust just blows at another aisle's cold side.
Sometimes it's good to just walk the floor with a tech (not a salesman), and ask questions about the operation. What kind of fiber do you have coming in? How many providers? How good are your generators really? Do you test them on a regular basis? I've found a sales minion will say there are a dozen providers coming in, but it will turn out that only one has substantial fiber, and the others are sharing that. {sigh} Sometimes they will have generators, but they've never test-fired them. Sometimes the tech is just frustrated at the nonsense at that datacenter, and that's indicative of how it's going to be to work with them.
Serious? Seriousness is well above my pay grade.
I am the Director of Operations for our DC. When we give tours, I explain the following (pseudo order of the tour):
- Begin with the history of the building, when it was built (1995), why it was built (a result of Hurricane Andrew in 1992), and how it is constructed (twin T, poured tilt wall).
Infrastructure:
- Take you through the gen room, show you it is internal to the building, show you the roofing structure from the inside, explain the N+1 redundancy, the hours on the gens, when they are ready for maintenance, how they are maintained, by whom (the vendor), how the diesel is stored and supplied, and the duration of fuel at max and current loads. Explain conduct before a hurricane or lockdown, how we go off grid 24 hours ahead of a storm, mention our various contracts for post-storm refill and our straining / refill schedule.
- Take you to the switch gear room, explain the dual feeds from the power company, how the switch gear works, show you the three main bus breakers, show you the numerous other breakers for various sub panels, etc. Explain and show you the spare breakers we have in case replacement is needed.
- Take you to the cooling tower area, explain the piping, the amount of water flowing, the number of pumps, how many are needed, the switching schedule, explain the N+1 capacity and overall capability of the towers, explain maintenance, show you the replacement pumps in stock, explain the concept of condensed water cooling if needed.
- Take you through the UPS and battery rooms, explain the needed kW capacity, what the UPSs back up and what they do not. Show the various distribution breakers out to the floor, their capacity, the static switches, bypass, explain the battery capacity, type of cells, number of cells, number of strings, last time the jars were replaced and how they are maintained. Explain max capacity of the load vs time. Answer questions relevant to switching from utility->UPS->generator and back.
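If you want to sanity-check the N+1 and fuel-duration claims you hear on a tour like this, the arithmetic is simple enough to sketch. This is only a rough illustration, not how any particular facility calculates it: the function names and every number in the example (tank size, burn rates, unit capacities) are made up, and real generator burn rates vary with load, so ask the operator for their own figures.

```python
import math


def fuel_runtime_hours(usable_fuel_gal, burn_gal_per_hour):
    """Hours of generator runtime from stored fuel at a steady burn rate."""
    if burn_gal_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return usable_fuel_gal / burn_gal_per_hour


def units_needed(load_kw, unit_capacity_kw):
    """Minimum number of identical units (the N in N+1) to carry the load."""
    if unit_capacity_kw <= 0:
        raise ValueError("unit capacity must be positive")
    return math.ceil(load_kw / unit_capacity_kw)


def is_n_plus_1(load_kw, unit_capacity_kw, units_installed):
    """True if the load is still carried with any one unit out of service."""
    return units_installed >= units_needed(load_kw, unit_capacity_kw) + 1


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    tank_gal = 10_000
    print("hours at max load:    ", fuel_runtime_hours(tank_gal, 140))
    print("hours at current load:", fuel_runtime_hours(tank_gal, 70))
    print("N+1 with 3 x 1000 kW gens at 1800 kW:", is_n_plus_1(1800, 1000, 3))
```

The same check applies to the cooling tower pumps: ask how many are needed to carry the load, then count how many are installed.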
Raised floor:
- Take a walk on the raised floor, explain connectivity, vendors, the path diversity we have, how the circuits are protected. Show them network gear, dual everything, how we protect from a LAN or WAN outage, and specific network devices we have for DDoS, Load Balancing, Distribution, Aggregation. Explain how telco and others deliver DS0 to OC-12 capacity, offer information on cross connections regarding copper, fiber, coax. Explain our offerings (dedicated servers up to 5K sq ft cages) and ask what they are interested in.
- Explain below the floor: the size of the raise, that power and network are delivered underneath, what is on the level one trays and level two trays, and the piping for cooling. Show the PDUs and how they relate to the breakers in the previous rooms. Show them the cooling panel and leads out to CRAC units, explain the cooling capacity, plans for future cooling, explain hot/cold aisle fundamentals, and temperature goals. At this point, there are usually more questions about vented tiles, power types available and overall floor density in watts/sq ft.
- Explain the fire detection / mitigation system, monitoring of PDUs, CRAC units, and FM200. Explain the maintenance of the fire system, show them the fire marshal inspection logs and the panels that alert the police and fire departments (both on floor and in our security office in front).
- While finishing the walk on the floor, show cameras, explain process to bring in and remove equipment, tell them the retention on the video, explain the rounds the guards make, the access list updates and changes.
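The watts/sq ft question is worth working through before you sign for space, since the density figure caps how much gear you can actually power and cool in a cage. Here is a minimal sketch of that arithmetic. The function names, the 100 W/sq ft design density, and the 4 kW per rack draw are assumptions for illustration, not figures from this facility; only the 5K sq ft cage size comes from the tour description.

```python
def watts_per_sq_ft(total_load_kw, floor_area_sq_ft):
    """Average power density of a space, in watts per square foot."""
    if floor_area_sq_ft <= 0:
        raise ValueError("floor area must be positive")
    return total_load_kw * 1000 / floor_area_sq_ft


def max_racks(area_sq_ft, design_w_per_sq_ft, rack_kw):
    """Racks of a given draw that fit within the design density of a space."""
    if rack_kw <= 0:
        raise ValueError("rack draw must be positive")
    budget_w = area_sq_ft * design_w_per_sq_ft
    return int(budget_w // (rack_kw * 1000))


if __name__ == "__main__":
    # Hypothetical: a 5,000 sq ft cage, 100 W/sq ft design, 4 kW racks.
    print("rack budget:", max_racks(5000, 100, 4))
    print("density at 500 kW:", watts_per_sq_ft(500, 5000), "W/sq ft")
```

If the rack count that comes out is well below what would physically fit in the cage, power and cooling are the real limit, not floor space.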
NOC:
- At this point we're back at the front of the building. Go into the NOC, explain what we are monitoring (connectivity, weather, scheduled jobs, etc). Introduce NOC and security staff, explain they will always get a person if they call, and submit a test ticket from an e-mail on my phone so they will see the alerts light up and the pager for the NOC signal. The final steps are to introduce them to security, and then I'll lead the customer(s) to the conference room so they can continue the conversation.
So there.