Slashdot Mirror


How Do You Evaluate a Data Center?

mpapet writes to ask about the ins and outs of datacenter evaluation. Beyond the simpler questions of physical access control, connectivity, and power redundancy/capacity and SLA review, what other questions are important to ask when evaluating a data center? What data centers have people been happy with? What horror stories have people lived through with those that didn't make the cut?

18 of 211 comments

  1. Just off the top of my head by Critical+Facilities · · Score: 4, Insightful

    Beyond the simpler questions of physical access control, connectivity, and power redundancy/capacity and SLA review

    Well first of all, I don't know that I'd write any of those things off as "simple". But some other points worth looking into would be:

      Raised Floor Height
      Cable Management (over or under floor)
      Cooling Capacity and Redundancy
      Power Quality (not just redundancy)
      Age and Condition of Electrical Hardware (ATSs, STSs, UPSs, Generators)
      Outage/Uptime History
      Fire Suppression System and Smoke Detection System
      Maintenance records
      Maintenance records
      Maintenance records
    1. Re:Just off the top of my head by jeffmeden · · Score: 3, Insightful

      Add to that:

      -KW deliverable to each rack

      -Ambient temperature in the cold aisle and how closely it's held (and possibly make it part of SLA)

      -On site technicians (and/or security) and their hours

      -Customer access policy and applicable hours (are you going to be happy, AND are threats going to be kept out?)
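
      A minimal sketch of checking "how closely it's held", assuming you can get a list of cold-aisle readings from the facility or from your own sensors. The function name and the 65-80 F band are hypothetical placeholders, not figures from this thread; use whatever band ends up in your SLA.

```python
from statistics import mean


def cold_aisle_report(readings_f, low_f, high_f):
    """Summarize cold-aisle readings (deg F) against an agreed band."""
    if not readings_f:
        raise ValueError("no readings supplied")
    # Readings outside the agreed band count against the facility.
    out_of_band = [t for t in readings_f if t < low_f or t > high_f]
    return {
        "min": min(readings_f),
        "max": max(readings_f),
        "mean": mean(readings_f),
        "fraction_out_of_band": len(out_of_band) / len(readings_f),
    }


# Hypothetical readings taken while walking one aisle.
readings = [68.0, 70.5, 72.0, 91.0, 69.5]
report = cold_aisle_report(readings, low_f=65.0, high_f=80.0)
print(report)
```

      The spread between min and max is as telling as the mean: one 91 F reading in an otherwise cool aisle points to a circulation problem rather than a capacity problem.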

    2. Re:Just off the top of my head by whoever57 · · Score: 3, Interesting

      That's interesting, but the OP really needs to know what is good or not. For example, you state "Raised Floor Height". What is good? Newer datacenters don't have raised floors because it is more energy efficient to have concrete floors. "Cooling Capacity" -- what's good and what is bad? How is this measured? Some datacenters may talk about how cool they keep the ambient air, but there isn't much evidence that this actually makes a noticeable difference to the lifetime or any other factor related to the equipment.

      --
      The real "Libtards" are the Libertarians!
    3. Re:Just off the top of my head by JWSmythe · · Score: 5, Informative

          I noticed something when touring one datacenter. They had a neat conference room that overlooked the whole datacenter. You could see the heat rising off of one area (Google's room). They went on and on about the wonders of their cooling, and how they had so much capacity.

          We later took the guided tour. The person I was with was talking to our guide, and I was paying careful attention to our environment. There were tremendous hotspots on the floor. We're not talking about 78 degrees. It was closer to the 90s. Other spots were downright cold. Why? Because they had all this capacity, and no real planning. The circulation was insufficient, even though the capacity was available. A well-populated rack will always be hot at the back, but it's expected that they will draw the air off of that area rather quickly. I've even seen datacenters that enforce their hot/cold aisles when there isn't much of a reason for it: there is no air return on the hot side, so it's just blowing at another aisle's cold side.

          Sometimes it's good to just walk the floor with a tech (not a salesman), and ask questions about the operation. What kind of fiber do you have coming in? How many providers? How good are your generators really? Do you test them on a regular basis? I've found a sales minion will say there are a dozen providers coming in, but it will turn out that only one has substantial fiber, and the others are sharing that. {sigh} Sometimes they will have generators, but they've never test-fired them. Sometimes the tech is just frustrated at the nonsense at that datacenter, and that's indicative of how it's going to be to work with them.

      --
      Serious? Seriousness is well above my pay grade.
    4. Re:Just off the top of my head by Sandbags · · Score: 4, Interesting

      - Raised floor is certainly important, and a given. Check
      - Cable management above AND below the floor. This is not an either-or... Check
      - Cooling capacity is hard to judge, and should be scalable. Redundancy is often overlooked but is often even more important than capacity... Check
      - Power quality: never seen a big datacenter without a Liebert, or at least a UPS in every rack. Power does not have to be conditioned except between the UPS and the machines/devices. A whole-datacenter power conditioner is often more efficient, but unnecessary for the little guys. Either way - Check.
      - Age is irrelevant as long as it's under support. If it's not, replace it. Generators need to be run several times a year to validate their condition, and also to grease the innards... I've seen too many good generators get kicked on and fail an hour later because the oil hadn't been changed in 3 years....
      - Outages should be tracked by system, rack row, and power distro. When systems seem to be going down more frequently in one area, there's usually an underlying reason... As Google recently proved for us all, do not ASSUME all is well: routine diagnostics, including memory scans, should be performed on ALL hardware. Even ECC RAM deteriorates with age (rapidly) and needs to be part of a maintenance testing and replacement policy - Check.
      - Fire suppression is usually part of your building codes, and a given, as are the routine checks (at least annually) required by law.
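
      Tracking outages by system, rack row, and power distro can be sketched as a simple tally. This is only an illustration: the record layout, field names, and threshold below are assumptions, not a description of any real tooling.

```python
from collections import Counter


def outage_counts(outages, key):
    """Count outage records grouped by one field (e.g. 'row' or 'distro')."""
    return Counter(record[key] for record in outages)


def hot_spots(outages, key, threshold):
    """Return the groups with at least `threshold` outages, sorted by name."""
    counts = outage_counts(outages, key)
    return sorted(name for name, n in counts.items() if n >= threshold)


# Hypothetical outage log; every name here is made up for illustration.
outages = [
    {"system": "web01", "row": "A", "distro": "distro-1"},
    {"system": "web02", "row": "A", "distro": "distro-1"},
    {"system": "db01", "row": "B", "distro": "distro-2"},
    {"system": "web01", "row": "A", "distro": "distro-1"},
]

print(hot_spots(outages, "row", threshold=2))
print(hot_spots(outages, "system", threshold=2))
```

      Running the same tally over each field is the point: a cluster that shows up by row or by distro but not by system suggests an environmental or power cause rather than a bad box.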

      In addition, we deploy:
      - Man traps on all entrances to data centers. You go in one door, it closes, then you authenticate to a second door. A pressure plate ensures only one person goes in/out at a time (and if it's tripped, a security guy looking at a screen has to override).
      - Full 24x7 video surveillance of the data centers.
      - in/out logs for all equipment. To take a device in/out of a datacenter requires it being logged in a book (by a designated person). This is for anything the size of a disk/tape and larger. All drive bays are audited nightly by security and if drives go missing, security reviews the access logs and server room security footage to see who might have taken them.
      - clear and consistent labeling systems for rack, shelves, cables and systems.
      - pre-cable just about everything to row level redundant switches, and have no cabling from server to other servers not passed through a rack/row switch first. Row switches connect to distro switches. This ensures cabling is simple, and predictable.
      - Color-coded cabling: we use 1 color for redundant cabling (indicating there should be 2 of these connected to the server at all times, and to separate cards in the backplane and separate switches to boot), a separate color for generic gigabit connections, another color for DS View, another color for the out-of-band management network(s), another color for heartbeat cables, and yet another for non-ethernet (T1/PRI/etc). Other colors are used in some areas to designate 100m connections, special connectivity, or security enclave barriers, and non-fiber switch-to-switch connections. Every cable is labeled at both ends and every 6-8 feet in between.
      - FULLY REDUNDANT POWER. It's not enough to have clean power, a good UPS, and a generator. In a large datacenter (more than a few rows, or anything truly mission critical), you should have 2 separate power companies, 2 separate generators, and 2 fully segregated power systems at the datacenter, room, row, and rack levels. In each datacenter we use 2 Liebert mains, each row has a separate distribution unit connected to a different main, and each rack has 4 PDUs (2 to each distro). Every server is connected to 2 separate PDUs, run all the way back to 2 completely independent power grids. For a deployment of 50 servers or so this is big-time overkill. We have over 3500 servers; we need this... We cannot risk a PSU failure taking out whole racks at a time, each of which may serve dozens of other systems.
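
      The "every server on 2 separate PDUs, back to 2 independent mains" rule is easy to audit if you keep a cabling inventory. A rough sketch, assuming a hypothetical inventory that maps servers to PDUs and PDUs to mains (all names are invented for illustration):

```python
def single_points_of_failure(servers, pdu_to_main):
    """Return servers whose power feeds do not reach two different mains.

    servers: dict of server name -> list of PDU names it is plugged into.
    pdu_to_main: dict of PDU name -> upstream main it is fed from.
    """
    at_risk = []
    for name, pdus in servers.items():
        # Collapse the server's PDUs down to the mains that feed them.
        mains = {pdu_to_main[pdu] for pdu in pdus}
        if len(mains) < 2:
            at_risk.append(name)
    return sorted(at_risk)


# Hypothetical rack with 4 PDUs, 2 on each main.
pdu_to_main = {
    "rack1-pdu1": "main-A",
    "rack1-pdu2": "main-A",
    "rack1-pdu3": "main-B",
    "rack1-pdu4": "main-B",
}
servers = {
    "db01": ["rack1-pdu1", "rack1-pdu3"],   # one feed per main: fine
    "web01": ["rack1-pdu1", "rack1-pdu2"],  # two PDUs, same main: at risk
    "app01": ["rack1-pdu4"],                # single feed: at risk
}

print(single_points_of_failure(servers, pdu_to_main))
```

      Note that web01 passes a naive "two PDUs" check but still fails here, because both of its PDUs hang off the same main.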

      --
      There is no contest in life for which the unprepared have the advantage.
    5. Re:Just off the top of my head by icebike · · Score: 3, Interesting

      Presumably the OP is looking for a hosting site, or processing center, rather than looking at purchasing the facility.

      If so, very few of the items mentioned in the parent post are germane, other than Outage/Uptime History. What is under the floor is not your problem in a hosting arrangement.

      You might be interested in location (flood plain, quake zone), but if the place has been in business for more than 10 years it all boils down to Outage/Uptime History.

      The cost and ease of migration should the relationship sour, and the names of the last big customers to exit the facility, would be nice to know.

      --
      Sig Battery depleted. Reverting to safe mode.
  2. i ran a junky data center by digitalsushi · · Score: 4, Informative

    I ran a data center long, long ago. My sales guy knew it wasn't going to pan out and threw me to the wolves. He asked me to start the tour, and then he took a long lunch to miss it.

    The guys I gave the tour to seemed very intelligent. They only spent about 60 seconds on our data center. The instant they saw the carpet, their eyebrows were up. When I didn't lie to them about there being no diesel generator on the other side of the (secretly dead) batteries, they did exactly what they should have and stormed out without saying thanks.

    --
    slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
    1. Re:i ran a junky data center by Nefarious+Wheel · · Score: 3, Funny

      A smattering of basic physics helps.

      Long ago in a distribution centre far, far away - well, east SF bay, anyway - we had a custom mini doing a bit of work for a major retail store chain's logistics business. In the warehouse they built a little room for the mini upstairs, everything cheap but per spec, they insisted. They used one of their domestic air conditioners for the cooling, as it had the right thermal rating to match the heat dissipation we required for our gear. Cool, we said - no problem, cheap is ok as long as it's specced correctly.

      It wasn't long before we had a service call for a hardware failure. Sent the engineer out, and it was about 110 in the computer room. They'd installed the air intake and air outflow of the air conditioner in the same tiny room.

      --
      Do not mock my vision of impractical footwear
    2. Re:i ran a junky data center by Anonymous Coward · · Score: 3, Funny

      I think "data center carpet" should be a new slashdot meme. I cannot stop laughing at how ridiculous that "data center" must have looked with that carpet. Please tell me that it was the baby poo green shag carpet from the 70s. That would really make it feature complete.

  3. Additional Questions by Astrobirdr · · Score: 3, Insightful

    I'd also ask:

    Number of years in business.
    Involvement of the owner in the current business.
    Number of years the current owner has been in this business.
    Also check with the Better Business Bureau to see what, if any, complaints have been filed.

    And, as always, Google is your friend -- definitely do a search for the business you are considering along with the word(s) problem, issue, complaint, praise, etc!

  4. Word of mouth by tomhudson · · Score: 4, Insightful

    Find someone you trust who's already a customer. Word of mouth beats any number of white papers or studies or guarantees.

  5. What are you evaluating? by chris.knowles · · Score: 4, Insightful

    There are basically 3 perspectives from which to evaluate the datacenter. They're pretty much universal to any IT eval: People, Process and Technology. The datacenter facility itself is only one piece of the puzzle (Facility = Technology, which only accounts for a fraction of the total cost of operating a datacenter). There are also the people running the datacenter and how they are organized and interact with the technology, one another, and their customers (internal and external). From a people/process standpoint, if you want to give a general "score" to them, you can assess them against the SLM maturity scale. (Read about the Gartner Maturity Model for Infrastructure and Operations.)

    Evaluating a datacenter is going to be a balance between the cost of operating the datacenter and the level of service you require from said datacenter. There really isn't enough information in the question to give you a good answer. Are you evaluating the acquisition of a datacenter to grow into? Are you looking for a managed services DC to host your gear with operational support? Are you looking for rack space with pipe and power? If you add more detail to your inquiry, I'm sure the community can provide you with some great answers.

  6. Do not jump in with both feet by Jailbrekr · · Score: 3, Informative

    Regardless of how well they are decked out, always start with a "pilot project". Start small for a short period to evaluate real world performance of both their equipment and their tech support. We currently have a pilot project in place to evaluate a datacentre for outsourcing our compute requirements. We have learned that while they have exceptionally good equipment in place, their responsiveness and ability to provision is highly questionable.

    --
    Feed the need: Digitaladdiction.net
  7. You missed a few by syousef · · Score: 3, Insightful

    You forgot a few:

    - Enough qualified *on site* staff 24x7 to deal with all clients including yourself

    - 24x7 phone support, with people who understand English and have immediate access to the techies

    - Company financial records and history (You don't want someone almost broke or a new startup with no backing)

    - These days, availability of a virtualisation solution and supporting hardware (depending on your application, if virtualisation is an option)

    Oh and your emphasis on maintenance records may be a little misplaced. They can be faked. They also may not be available due to security concerns (of their other clients). *IF* you can get hold of them they should be complete. Hardware service level should be part of the agreement and service schedule should be part of that.

    --
    These posts express my own personal views, not those of my employer
  8. an outside air duct by spywhere · · Score: 3, Informative

    When I worked at a corporate office in Maryland, they used the building's air conditioning to cool the server room.
    This worked well until the outside temperature got down to about 15 degrees Fahrenheit, but then it failed miserably: the outdoor condensers no longer functioned, the AC shut down, and the entire IT department went into a panic.
    The first time this happened, I (a lowly Help Desk tech) suggested to the CIO that he run a duct into the room from the outside: a simple fan would bring in enough sub-freezing air to cool the servers.
    The second time it happened, the look on his face told me he hadn't taken my suggestion seriously enough.
    The third time, he flipped a switch and the fan cooled his server room just fine.

  9. Re:Some important questions: by Red+Flayer · · Score: 3, Interesting

    If you can, ask them to pull a tile so you can see under the raised floor. Underfloor cabling (and suspended ceiling cabling for that matter) should be neat, tied, and labelled. Dead cables should be pulled, not left to rot. There has to be sufficient clearance for unrestricted airflow. Cages are better than lying on the floor.

    Just want to add... Don't let them pick the tile. They probably get this request frequently enough that they have a "show" tile or two if they are a shoddy organization. Pick one on your tour, as an offhand request that you had "forgotten" until then. If they try to steer you to a specific tile, that tells you they have something to hide, and you need to question everything else they've shown you samples of.

    [paranoid and loving it]

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  10. I'm going to turn this around. by NoNsense · · Score: 5, Interesting

    I am the Director of Operations for our DC. When we give tours, I explain the following (pseudo order of the tour):

    - Begin with the history of the building, when it was built (1995), why it was built (result of Hurricane Andrew in 1992), and how it is constructed (twin T, poured tilt wall).

    Infrastructure:
    - Take you through the gen room, show you it is internal to the building, show you the roofing structure from the inside, explain the N+1 redundancy, the hours on the gens, when they are ready for maintenance, how they are maintained, by whom (the vendor), how the diesel is stored, supplied, duration of fuel at max and current loads. Explain conduct before a hurricane or lockdown, how we go off grid 24 hours ahead of a storm, mention our various contracts for after-storm refill and our straining / refill schedule.
    - Take you to the switch gear room, explain the dual feeds from the power company, how the switch gear works, show you the three main bus breakers, show you the numerous other breakers for various sub panels, etc. Explain and show you the spare breakers we have in case replacement is needed.
    - Take you to the cooling tower area, explain the piping, the amount of water flowing, the number of pumps, how many are needed, the switching schedule, explain the N+1 capacity and overall capability of the towers, explain maintenance, show you the replacement pumps in stock, explain the concept of condensed water cooling if needed.
    - Take you through the UPS and battery rooms, explain the needed KW capacity, what the UPSs back up and what they do not. Show the various distribution breakers out to floor, their capacity, the static switches, bypass, explain the battery capacity, type of cells, number of cells, number of strings, last time the jars were replaced and how they are maintained. Explain max capacity of the load vs time. Answer questions relevant to switching from utility->UPS->generator and back.
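
    For anyone taking notes on a tour like this, the N+1 claim and the "duration of fuel at max and current loads" figure can both be sanity-checked with simple arithmetic. A minimal sketch; the function names and every number below are hypothetical, so plug in what the facility actually tells you.

```python
def survives_one_failure(unit_capacity_kw, unit_count, load_kw):
    """N+1 check: can the remaining units carry the load if one unit fails?"""
    return (unit_count - 1) * unit_capacity_kw >= load_kw


def fuel_runtime_hours(usable_fuel_gal, burn_rate_gal_per_hr):
    """Hours of generator runtime for a given usable fuel volume and burn rate."""
    if burn_rate_gal_per_hr <= 0:
        raise ValueError("burn rate must be positive")
    return usable_fuel_gal / burn_rate_gal_per_hr


# Hypothetical plant: three 500 kW generators, 10,000 gallons of usable diesel.
print(survives_one_failure(500, 3, load_kw=900))    # two units still cover 900 kW
print(survives_one_failure(500, 3, load_kw=1100))   # two units cannot cover 1100 kW
print(fuel_runtime_hours(10_000, burn_rate_gal_per_hr=100))  # at max load
print(fuel_runtime_hours(10_000, burn_rate_gal_per_hr=40))   # at current load
```

    The same N+1 test applies to pumps, cooling towers, and UPS modules: subtract one unit and see whether what is left still covers the load.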

    Raised floor:
    - Take a walk on the raised floor, explain connectivity, vendors, the path diversity we have, how the circuits are protected. Show them network gear, dual everything, how we protect from a LAN or WAN outage, and specific network devices we have for DDoS, Load Balancing, Distribution, Aggregation. Explain how telco and others deliver DS0 to OC-12 capacity, offer information on cross connections regarding copper, fiber, coax. Explain our offerings (dedicated servers up to 5K sq ft cages) and ask what they are interested in.
    - Explain below the floor, size of raise, that power and network are delivered under, what is on level one trays, level two trays, and the piping for cooling. Show the PDU units and how they relate to the breakers in the previous rooms. Show them the cooling panel and leads out to CRAC units, explain the cooling capacity, plans for future cooling, explain hot/cold aisle fundamentals, and temperature goals. At this point, there are usually more questions about vented tiles, power types available and overall floor density in watts/sq ft.
    - Explain the fire detection / mitigation system, monitoring of PDU's, CRAC units, and FM200. Explain the maintenance of the fire system, show them the fire marshal inspection logs and the panels that alert the police and fire departments (both on floor and in our security office in front).
    - While finishing the walk on the floor, show cameras, explain process to bring in and remove equipment, tell them the retention on the video, explain the rounds the guards make, the access list updates and changes.
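
    Floor density in watts/sq ft translates directly into how many racks a cage can hold at a given power draw. A rough sketch of that conversion, with hypothetical function names and made-up figures rather than this facility's real numbers:

```python
def watts_per_sq_ft(total_it_load_kw, floor_area_sq_ft):
    """Average floor density: total IT load spread over the raised floor."""
    return total_it_load_kw * 1000 / floor_area_sq_ft


def racks_supported(cage_area_sq_ft, density_w_per_sq_ft, kw_per_rack):
    """Whole racks a cage can power at the stated density and per-rack draw."""
    budget_kw = cage_area_sq_ft * density_w_per_sq_ft / 1000
    return int(budget_kw // kw_per_rack)


# Hypothetical floor: 500 kW of IT load across 5,000 sq ft.
print(watts_per_sq_ft(500, 5000))
# Hypothetical 1,000 sq ft cage at 100 W/sq ft, racks drawing 4 kW each.
print(racks_supported(1000, 100, kw_per_rack=4))
```

    If the rack count from the power budget is much lower than what physically fits in the cage, you will be paying for floor space you cannot power.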

    NOC:
    - At this point we're back to the front of the building, go into the NOC, explain what we are monitoring (connectivity, weather, scheduled jobs, etc). Introduce NOC and security staff, explain they will always get a person if they call, submit a test ticket from an e-mail on my phone, they will see the alerts light up and the pager for the NOC will signal. The final steps are to introduce them to security and then I'll lead the customer(s) to the conference room so they can continue the conversation.

    --
    So there.
  11. Re:Some important questions: by vlm · · Score: 3, Interesting

    Just want to add... Don't let them pick the tile. They probably get this request frequently enough that they have a "show" tile or two if they are a shoddy organization.

    If you pull this stunt, please understand that a tech's hidden stockpile of magazines and canned soda does not necessarily indicate a shoddy organization; it merely means they have employees that like reading certain magazines for the interviews, and prefer to store their drinks in a nice clean spot underneath the chiller rather than the proverbially filthy employee refrigerator. On the good side, this is a strong indication they don't have an under-the-floor rodent infestation.

    Strangest thing I ever found under the floor was a vast amount of one employee's (clean) clothing. He was kind of stuck in the process of moving and needed a temporary place to stash stuff. Apparently no one found it unusual that he was hauling bags of clothing in and out.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger