Outfitting a Brand New Datacenter?
An anonymous reader writes "We completed our new 4,000 sq. ft. data center (Tier II/III, according to The Uptime Institute) and recently moved our core systems from our old data center to the new one. We've been up and running for several months now and I'm preparing to close out the project. The last piece is to purchase some accessories and tools for the new location. The short list so far consists of a Server Lift, a few extra floor tile pullers, flashlights and a crash cart. We'll also add to the tools in the toolbox located in one of the auxiliary rooms (these things seem to have legs!). What are we missing? Where can we find crash carts set up more for a data center environment (beyond the utility cart with an LCD, keyboard, and mouse strapped to it)?"
Ear protection
O2 masks for when the Halon drops
arrows on the floor directing people to the nearest exit
a 'Battleship' style row/column marker for every row/column of racks
near-Draconian access control policies
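One way to generate the 'Battleship' style markers is letters for rows and numbers for columns. This is only a sketch: the function names, the zero-based indices, and the two-digit column format are assumptions for illustration, not anything from a real labeling standard.

```python
def row_letters(index: int) -> str:
    """Convert a zero-based row index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    if index < 0:
        raise ValueError("row index must be non-negative")
    letters = ""
    index += 1  # switch to one-based so the base-26 arithmetic has no zero digit
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def rack_label(row: int, column: int) -> str:
    """Build a marker such as 'C07' from zero-based row and column indices."""
    if column < 0:
        raise ValueError("column index must be non-negative")
    return f"{row_letters(row)}{column + 1:02d}"


if __name__ == "__main__":
    # Print labels for a hypothetical floor of 3 rows with 4 racks each.
    for row in range(3):
        print(" ".join(rack_label(row, column) for column in range(4)))
```

The same label can go on the floor marker, the rack door, and the inventory sheet, so "go to C07" means one thing to everybody.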
Get a nice comfy Plantronics headset for the POTS line nearby. In a noisy datacenter, while on a mission-critical tech support call, the last thing you need is one hand pressing the phone to your ear, or crappy cell phone audio.
Some things I saw in the last datacenter I worked at that I found indispensable:
- one of those headlamp lights for hands-off work on servers (put this in the tool box)
- a way to track who has the tools in the toolbox (check it at start and end of each shift and record such)
- at least 2 cordless headset phones (ever try to move around a server room tied to a cord?)
- a supply of batteries for everything that needs them
- a couple of 7-day temperature gauge chart recorders at various locations in the center + a supply of graph paper (useful for A/C issues)
- status check at start and end of each shift (temperature, server status lights, A/C, UPS, equipment in toolbox, etc.)
- a way to log all operations status (we used an in-house Access database which had to be updated at end of each shift)
- install 2 large UPS systems and connect each server's dual power supplies one to each UPS
- institute a policy: "If you change any system stuff on a server, reboot it to ensure it comes back in a known state." Schedule downtime if needed
- don't offer or expect 24x7x365 availability unless you've built fault-tolerant servers that can do this. Every system needs downtime for one reason or another, so have a slot allocated for regular downtime on a monthly basis. Emergency hardware outages don't count against this. Otherwise, when are you going to roll out the patches you've tested in the test environment? (You _have_ a test environment that somewhat duplicates production, right?)
- NO DEVELOPERS ALLOWED ON PRODUCTION SERVERS. THIS IS A TERMINATION OFFENSE (WITH EXTREME PREJUDICE).
- Limit who has root to groups of servers. Only the datacenter manager should have root to everything.
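To put numbers on the monthly downtime slot mentioned above, here is a rough sketch of the arithmetic. The 30-day month and the function names are assumptions for illustration. Emergency outages are left out, as in the list.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month, i.e. 720 hours


def availability_pct(planned_downtime_hours: float,
                     hours_in_period: float = HOURS_PER_MONTH) -> float:
    """Availability (in percent) left after a planned downtime slot."""
    if not 0 <= planned_downtime_hours <= hours_in_period:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1 - planned_downtime_hours / hours_in_period)


def downtime_budget_hours(target_pct: float,
                          hours_in_period: float = HOURS_PER_MONTH) -> float:
    """Hours of downtime per period that a given availability target allows."""
    if not 0 <= target_pct <= 100:
        raise ValueError("target must be a percentage between 0 and 100")
    return hours_in_period * (1 - target_pct / 100.0)


if __name__ == "__main__":
    print(f"4 h slot per month: {availability_pct(4):.2f}% available")
    print(f"99.9% target: {downtime_budget_hours(99.9):.2f} h per month")
```

A 4-hour monthly slot already puts you below 99.5%, which is why promising more than that without fault-tolerant hardware is asking for trouble.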
Have a server shutdown procedure (the order in which servers go down in the event of a power or A/C failure).
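A shutdown order like this can be derived from a dependency map rather than kept in someone's head. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The server names and the dependency map are made up for illustration; a real site would load them from its own inventory.

```python
from graphlib import TopologicalSorter

# Hypothetical example: each server maps to the servers it depends on.
DEPENDS_ON = {
    "web": {"app"},
    "app": {"db"},
    "db": {"storage"},
    "storage": set(),
}


def startup_order(depends_on: dict) -> list:
    """Dependencies first: a server starts only after what it relies on."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on: dict) -> list:
    """Reverse of startup: dependents go down before what they rely on."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    # Print a numbered shutdown checklist for the hypothetical servers.
    for step, server in enumerate(shutdown_order(DEPENDS_ON), start=1):
        print(f"{step}. shut down {server}")
```

Print the result and tape it inside the crash cart; during a power or A/C event nobody wants to look it up on a server that is about to go down.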
If you have a motor generator for backup power, test it quarterly so it will kick in when there's a city power outage. This will avoid the problems seen in the 365 Main Street outage in S.F.
I had to chuckle when I heard about 365 Main. The old datacenter manager would have covered that with the periodically tested motor generator.