Outfitting a Brand New Datacenter?
An anonymous reader writes "We completed our new 4,000 sq. ft. data center (Tier II/III, according to The Uptime Institute) and just recently moved our core systems from our old data center to the new. We've been up and running for several months now and I'm preparing to close out the project. The last piece is to purchase some accessories and tools for the new location. The short list so far consists of a Server Lift, a few extra floor tile pullers, flashlights and a crash cart. We'll also add to the tools in the toolbox located in one of the auxiliary rooms -- these things seem to have legs! What are we missing? Where can we find crash carts set up more for a data center environment (beyond the utility cart with an LCD, keyboard, and mouse strapped to it)?"
Ear protection
O2 masks for when the Halon drops
arrows on the floor directing people to the nearest exit
a 'Battleship' style row/column marker for every row/column of racks
near-Draconian access control policies
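For the 'Battleship' style markers, here is a minimal sketch of how the labels could be generated for printing. The lettered-rows, numbered-columns scheme and the name rack_labels are assumptions for illustration, not anything a particular site uses.

```python
from string import ascii_uppercase


def rack_labels(rows: int, columns: int) -> list[list[str]]:
    """Return a grid of labels: rows lettered A, B, C..., columns numbered from 1."""
    if not 1 <= rows <= len(ascii_uppercase):
        raise ValueError("this sketch supports 1 to 26 rows")
    if columns < 1:
        raise ValueError("need at least one column")
    return [
        [f"{ascii_uppercase[r]}{c}" for c in range(1, columns + 1)]
        for r in range(rows)
    ]


# Hypothetical floor: 3 rows of 4 racks each.
for row in rack_labels(3, 4):
    print(" ".join(row))
```

With labels like these on the floor and on the racks, "go look at C3" means the same thing to everyone.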
I want to delete my account but Slashdot doesn't allow it.
At the DC I work at we have a crapload of extra gear. Make sure you have one emergency kit in your core room, and make sure no one uses it unless it is an emergency. The kit should include, but not be limited to, the following:
screwdrivers
mounting screws/cage nuts
knife (a Leatherman multi-tool)
spare patch/cross-over cables (copper, various lengths)
spare fibre patch cables (various lengths)
cable tester (copper/fibre)
couplers for fibre
fibre cleaning kits
patch panel punch tool
spare hardware for core gear
We have more gear; however, I'm drawing a bit of a blank as I haven't needed to look at the kit for a while.
And a good sized crescent wrench. Absolutely indispensable.
Drop it across the terminals of one of your backup batteries -- when it's disconnected from the grid. When the wrench cools off, store it in a safe place. Makes a great scapegoat when things go wrong. Could save your career...
I've used crash carts from a company called Ergotron: http://www.ergotron.com/tabid/158/language/en-US/default.aspx
At my current and my past company, they work real well. I looked high and low for a good crash cart and nothing seemed to come close to these. Maybe I was just searching the wrong terms (and apparently my vendors were too). They are a bit pricey though, ~$1500 or so to start. I have a Styleview LCD cart at my current job, and had an LCD cart and a laptop cart at my last place (servers were co-located in a ~900 sq foot cage, 8 feet between rows, so plenty of space for the carts).
I also bought a KVM over IP/CAT5 solution from Raritan (http://www.raritan.com/), which worked out real well for those situations where a serial console wasn't enough (unless you have fancy out-of-band management; some do, some don't). I set up tables in the front of the cage and hooked up a couple of the Raritan hardware clients. I typically ran one CAT5 cable with a KVM hookup to each rack, so it could be plugged into any system fairly easily. Range of 1000 feet. This was pretty pricey too; with the adapters and all it was about $25k, though in the grand scheme of things it was cheap at the time. I had Cyclades terminal servers in every rack, with serial consoles on all the servers and network gear.
Also I hooked up a temperature sensor board, from Sensatronics (http://www.sensatronics.com/) I think. I think it was a 16 port board, and I bought all 300 foot cables for all of the sensors, and cut them to length. This ended up being about $5k I think (I went way overkill on the cable lengths).
At my current company we use Servertech (http://www.servertech.com/) PDUs; their higher end models come with optional temperature/humidity sensors, so we use those instead of the Sensatronics.
Despite it being a co-location, we had 500kW of power going into that cage (standard setup was ~12kW/rack). If the data center (AT&T enterprise network services) had followed their own procedures, we would have had to have about a 5,500 sq foot cage, comparable to your data center :) (@ 90 watts/sq foot of cooling). But they did not (at the time; they wised up July of last year and now strictly enforce their cooling capacity at this particular data center).
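The cage-size figure can be checked with quick arithmetic, using the numbers from this comment (500 kW into the cage, a cooling limit of 90 watts per sq foot, ~12 kW per rack). This is just a sketch; the function name is made up, and it assumes every watt going in has to be cooled.

```python
def required_floor_area_sqft(load_watts: float, cooling_watts_per_sqft: float) -> float:
    """Floor area needed if every watt of load must be covered by the cooling density."""
    if cooling_watts_per_sqft <= 0:
        raise ValueError("cooling density must be positive")
    return load_watts / cooling_watts_per_sqft


load_watts = 500_000          # 500 kW going into the cage
cooling_density = 90          # watts of cooling per sq foot
watts_per_rack = 12_000       # ~12 kW per rack

area = required_floor_area_sqft(load_watts, cooling_density)
print(f"{area:,.0f} sq ft")                            # 5,556 sq ft
print(f"{load_watts / watts_per_rack:.0f} racks at full load")  # 42 racks at full load
```

That lines up with the "about 5,500 sq foot" estimate, versus the ~900 sq foot cage actually in use.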
Posting as AC, since I don't have an account. I read Slashdot daily but I post maybe once every 2-3 years, so I haven't bothered to make an account.
...a time machine, preferably in a Faraday cage (to shield your data center from unwanted interference), so you can implement the necessary changes a couple of months ago.
You need a monkey. Why? If a monkey can manage to bring down even a single server, you've not secured the place enough.
Colin Dean Go a year without DRM
For those times when the internal security system is working, but not according to spec...
Just junk food for thought...
Get a nice comfy Plantronics headset for the POTS line nearby. In a noisy datacenter, while on a mission critical tech support call, the last thing you need is your hand pressing the phone to your ear and/or crappy cell phone audio.
Would be a middle-aged Scottish man to sit in the middle of it with an intercom to say "She canna' take it any more!" when usage gets high.
Great Intellect...
I've seen several good suggestions already with specific suggestions on tools or parts. Start with those. My suggestion is quite simple, actually: Why GUESS what you need, when you can find out for sure?
Tear down one ENTIRE rack. (Or several, if they have any variations.)
Now, look at this big pile of parts in front of you and imagine what you would do WHEN *ANY* one of them breaks.
Get several spares for each of those parts and put into the cart.
Whatever tools you needed for disassembly, put into a crash cart.
Then make another, identical cart. When the brown stuff hits the spinnie thingie, and multiple systems are down, the last thing you want to be doing is fighting over tools. Get spares of EVERYTHING so at least TWO people can work on things at the same time! You'll thank me when there's two of you trying to work on both sides of a rack.
NOTE: Be sure to inventory what you put into each cart! Tools have a way of growing legs and you want to be able to check and make sure that you STILL have ALL the tools.
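One possible way to keep that inventory honest is to store the expected contents of each cart as counts and compare against what is actually found. This is only a sketch; the item names and quantities are made up for illustration.

```python
from collections import Counter


def missing_items(expected: Counter, found: Counter) -> Counter:
    """Items (with counts) that should be on the cart but are not there."""
    # Counter subtraction keeps only positive counts, so extras on the cart are ignored.
    return expected - found


# Hypothetical inventory sheet for one cart.
expected = Counter({"screwdriver": 2, "flashlight": 1, "bag of cage nuts": 4})
# What the end-of-shift check actually turned up.
found = Counter({"screwdriver": 1, "flashlight": 1, "bag of cage nuts": 4, "duct tape": 1})

print(missing_items(expected, found))  # Counter({'screwdriver': 1})
```

Run the same check against both carts, since they are supposed to be identical.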
And please consider getting a big-ass UPS for your cart (at least 1 kVA). If your power is wonky, you want to be sure your cart's equipment (laptop, hub, switch, router, etc.) won't be flaking out as the power comes and goes. Even with the power out, you can plug one server into the UPS and restore/repair it while the power is still out. While you're at it, also get some LONG extension cords (100-foot) made of AT LEAST 12-gauge wire. Plug the UPS into the extension cord.
Think you're all set? Now, using ONLY the tools on ONE crash cart, put the rack back together. With the power out. (i.e. no mains)
When you have done this, not only will you be CERTAIN that you have all the tools you needed to [re]assemble everything, you'll actually have done so and will have run into (hopefully) most of the problems that you could encounter.
That's it off the top of my head. Best of luck to you! P.S. One last thing: MANY rolls of Duct Tape! <grin>
I'd suggest extreme emergency supplies for situations where extra cables and backup supplies will prove fruitless.
This includes, but is not limited to:
A bottle of whisky
A bottle of scotch
A glass
A Shotgun, pref with ammo
Sleeping pills
Pep pills....
In all seriousness, a good first aid kit should be in the center. Nothing sucks more than a dull headache and not having any aspirin for it.
Plus, when someone cuts their hand on a server rack, it'll patch their hands up to keep them from bleeding all over them.
import system.cool.Sig;
tm
Support TBI Research: http://www.raisinhope.org
Funny story on a similar, though not as large, scale.
I have a small site with about 8 computers and 3 servers, with wireless shooting to 3 other buildings about 50 feet apart, each with about 4-6 computers. I was overruled on our battery backup system in favor of $50 UPS units purchased separately for each computer at OfficeMax. I'm thinking OK, they are getting a generator, and I told them to make sure it had a line conditioner and was certified to work with sensitive computer equipment. Besides, when it was just the one building, the UPS units worked just fine.
They ignored that, and on the test, after all the batteries went down, the computers just quit because the UPS software conflicted with a proprietary app they chose to use. I was called in by the guy who installed the generator and was told that about 20 of the UPS units were bad. I thought OK, they have been there for a couple of years in some cases, and brought down some replacements. I swapped them out, they tested it again, and before I got back to the shop I got a call saying more of them were bad. All the local sources were out, and the electrician told me he had better backups, so I told him to get them. After swapping them out I asked to make sure that they had a clean electrical line coming off the generator, and they assured me there was.
Two weeks later, a car hit a telephone pole and the electric was out for more than 10 hours. All the UPS units went out and none of the computers would work. I tested the electrical line and it was jumping between 70 and 150 volts at about 40 hertz. All the UPS units shut down and wouldn't take power, so they decided to plug the computers directly into the wall outlets. That took out two main boards and three power supplies, and the rest of the computers just wouldn't power up.
The database on one of the applications got corrupted beyond repair and they had to recreate a week's worth of entries, because the drive on the backup server got corrupted too when the main board went out and no one had made the external backup in over 5 days. The phone system was borked, a 64 inch plasma TV in the lobby was gone, and various other things needed to be replaced because they acted weird from then on out. The line conditioners would have been about $90 per outlet, or about $2000 for one capable of regulating all the power coming from the generator. In the end, it cost around $10,000 in replacements, labor and everything, plus they ended up buying a new generator and this time getting a power control system that was certified for sensitive electronics.
Bad power will cause so many problems it isn't funny. Most people don't even know that a generator can be out of whack on output. Not all of them are created alike. Small things like how fast they can adjust to the load pulling from them and how stable the current is aren't a given. You have to make sure it is there or end up with broken electronics every time the power goes out and it kicks on.
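To put numbers on "out of whack", here is a rough sketch that checks a meter reading against nominal line power. The 120 V / 60 Hz nominal values and the 10% / 5% tolerances are assumptions picked for illustration, not figures from any standard or from the story; substitute whatever your equipment actually specifies.

```python
def power_problems(
    volts: float,
    hertz: float,
    nominal_volts: float = 120.0,   # assumed US nominal voltage
    nominal_hertz: float = 60.0,    # assumed US nominal frequency
    volt_tolerance: float = 0.10,   # illustrative: +/- 10%
    hertz_tolerance: float = 0.05,  # illustrative: +/- 5%
) -> list[str]:
    """Return a list of human-readable problems; an empty list means the reading looks sane."""
    problems = []
    if abs(volts - nominal_volts) > nominal_volts * volt_tolerance:
        problems.append(f"voltage {volts:g} V is outside {nominal_volts:g} V +/- {volt_tolerance:.0%}")
    if abs(hertz - nominal_hertz) > nominal_hertz * hertz_tolerance:
        problems.append(f"frequency {hertz:g} Hz is outside {nominal_hertz:g} Hz +/- {hertz_tolerance:.0%}")
    return problems


# The swings measured on the generator line: 70 to 150 volts at about 40 hertz.
for reading in [(70, 40), (150, 40), (120, 60)]:
    print(reading, power_problems(*reading) or "OK")
```

Both readings from the generator fail on voltage and frequency at once, which is consistent with the UPS units refusing to take the power.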
Some things I saw in the last datacenter I worked at that I found indispensable:
- one of those headlamp lights for hands-off work on servers (put this in the tool box)
- a way to track who has the tools in the toolbox (check it at start and end of each shift and record such)
- at least 2 cordless headset phones (ever try to move around a server room tied to a cord?)
- a supply of batteries for everything that needs them
- a couple of 7-day temperature gauge chart recorders at various locations in the center + supply of graph paper (useful for A/C issues)
- status check at start and end of each shift (temperature, server status lights, A/C, UPS, equipment in toolbox, etc.)
- a way to log all operations status (we used an in-house Access database which had to be updated at end of each shift)
- install 2 large UPS systems and connect the dual power supplies one to each UPS
- instigate a policy "If you change any system stuff on a server, reboot it to ensure it comes back in a known state" Schedule downtime if needed
- don't offer or expect 7x24x365 availability unless you've built fault-tolerant servers that can do this--every system needs downtime for one reason or another and have a slot allocated for regular downtimes on a monthly basis. Emergency hardware outages don't count against this. But when are you going to roll out patches you've tested in the test environment (you _have_ a test environment that somewhat duplicates production, right?)
- NO DEVELOPERS ALLOWED ON PRODUCTION SERVERS. THIS IS A TERMINATION OFFENSE (WITH EXTREME PREJUDICE).
- Limit who has root to groups of servers. Only the datacenter manager should have root to everything.
- Have a server shutdown procedure (order that servers go down in the event of a power or A/C event)
- If you have a motor generator for backup power, test it quarterly so it will kick in when there's a city power outage. This will avoid the problems seen in the 365 Main Street outage in S.F.
I had to chuckle when I heard about 365 Main. The old datacenter manager would have covered that with the periodically tested motor generator.
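For the server shutdown procedure, one way to work out the order is to write down which servers depend on which and let a topological sort do the rest. This is a sketch only: the server names and dependencies are hypothetical, and it needs Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical site: each server maps to the set of servers it depends on.
depends_on = {
    "web": {"app"},
    "app": {"database"},
    "database": {"storage"},
    "storage": set(),
}


def shutdown_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Shut down dependents first, and the things they rely on last."""
    # static_order() lists dependencies before dependents, which is a start-up order.
    startup = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(startup))


print(shutdown_order(depends_on))  # ['web', 'app', 'database', 'storage']
```

Print the result and tape it to the wall; during a power or A/C event nobody will want to work it out from scratch.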
Whoa there. Don't lose your cool - what the commenting system is for, ultimately, is an exchange of ideas, yo.
First off, it's not like the guy is asking for advice on critical systems like cooling or something. It's more like a last minute gizmo checkup (or a way to rid themselves of budget leftovers?). Some people can come up with things the "asker" or the other readers hadn't thought of. For example, tarps in a DC (credit: a few posts below) would never have occurred to me, because mine is sandwiched between 10-12 floors of office space and an underground parking lot. But maybe to the next person it will be of some use. Besides, the lower one is on the n00bness-to-pro scale (and I am! :D), the more useful this kind of old pro information is.
P.S. Funny how appropriate my sig is today, eh?
P.P.S. Pedant alert - I took the liberty of correcting your title:) o-c-y instead of o-c-i-t-y.
Microsoft put the "sucks" in "success".