Server Room Smells Can Be an Early Warning
Barence writes "As embarrassing as it may seem, an eggy smell in a server room needn't mean broaching the delicate subject of hygiene with a colleague. It can actually be a signal that something is about to go wrong with your server setup, as this consultant discovered after days of assuming questionable personal habits were to blame. The culprit? An expiring UPS device, sending out its own unique warning signal."
Amazing how many dying UPS devices must be hidden in my boss's office.
Does this mean I can use my father-in-law as a UPS?
Hydrogen sulfide. Ventilate, then replace or recondition the battery. If the egg smell is strong and then you stop smelling it, that's olfactory fatigue, and dangerous levels of the gas may be present.
"Can be an early warning?"
"CAN be?"
Like all IT administrators who've actually worked with server hardware, I have a heightened sense of smell, but only specifically for the smell of burning plastic. It's not a mere warning, it's an instant alarm that'll have every IT person in the room sniffing the power supplies.
We IT people, we're like bloodhounds or something. I can smell burned plastic from across the street. I've been set off by welders at a car mechanic a block away. I've been set off by an invisibly tiny bit of cheese someone dropped into a toaster oven once... three floors down from the server room. Had me in a right panic.
IT is all fun and games until the servers literally melt into slag. There's no repair CD for that -- and we all know that the backup tapes, while wonderful for backing up, aren't so good at the actual restoring bit. That's why they're called backup tapes, not restore tapes, see?
"Windows in the server room?"
You'd be surprised how many servers still run XP Professional or Server 2003.
Uhhhmmm - it isn't just computers. If I notice an odd smell when I walk through the plant, I investigate. Our plant makes plastic products, and 2/3 of the time, the odd smell is just overheated plastic. But the other 1/3 of the time it turns up a problem of one sort or another. Overheating oils are bad news, overheating capacitors are more bad news - actually, ANYTHING hot enough to give off an odor is bad news. Three weeks ago, we had a machine that was kicking our asses - the mold wouldn't open either manually or in automatic. Four of us went over that machine from one end to the other, multiple times. Ohmmeters and voltmeters said that everything was just fine, believe it or not. Finally, I caught a whiff of something funky, opened up the solenoid the odor seemed to be coming from, and found that half of the windings were burnt and shorting.
The sense of smell is a valuable tool in troubleshooting and maintenance, unless you ignore it.
"Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
I've never physically been inside a data center, but I'd have thought they would have really good ventilation that would simply shut (or rely on gas weight and gravity) if the halon system or equivalent needed to be turned on. Is the ventilation in fact so bad that there can be a gas buildup so severe you need to (according to posters above me) go in with hazmat gear?
Emotions! In your brain!
From the classic BOFH :)
"The admin gene," the PFY explains. "The ability to recognise things that users don't. A slight flicker of lighting, a whiff of hot component in the air, a fractional change in the pitch of a cooling fan - all of which the garden variety user misses in the headlong rush to read their email."
http://www.theregister.co.uk/2008/07/04/bofh_2008_episode_24/
APC UPSes have a tendency to cook their batteries as they get near the end of their lifetime. The results can be horrifying... bulging batteries, and if allowed to go on long enough, yes, even "sealed" lead acid batteries will rupture and you'll get the lovely sulfur smell.
I recently pulled these APC batteries out of an APC Smart-UPS 1400, which had to be disassembled (including the removal/replacement of rivets) in order to get the batteries out.
http://img221.imageshack.us/img221/171/imageyv.jpg
Computers run on smoke... when the smoke comes out, they stop running.
I prefer to use a damp towel/sponge (less risk of burning my nose). I would call it a warm metal smell.
If they're like some of the IT departments I've seen, they might be working by some rule from upper management that they need to justify their existence by writing internal invoices for everything they do. It tends to result in them doing nothing until you tell them to, so they can bill you for it. The UPS could have not only the error lights on, but a blinking "RED ALERT" sign and the accompanying acoustic blare, and verily be on fire and billowing smoke, and nobody would touch it until you fill in the proper form requesting them to put it out.
Because, yes, that's another thing I've noticed that a lot of departments love, IT included: inventing bureaucracy and paperwork to discourage and delay actually having anything to do. You may need to fill in a 5 page form and draw PowerPoint diagrams as to why you want the UPS doused and what the architectural implications of that are. And if you're unlucky a few meetings too, to convince some Mordac The Information Services Preventer why he should move his ass and turn that UPS off, and why his suggested workarounds (in which he'd not have to do anything) aren't quite solving the problem.
A polar bear is a cartesian bear after a coordinate transform.
This is why I prefer to build my new server rooms with individually cooled racks - each rack having its own AC circulation - as well as using centralized water cooling for its efficiency and reliability. Circulating all your cooling air around the server room is simply a bad idea. When you have 1 kilometer of rack space on a single building floor, one source of contaminant, be it chemical or metal particles, will get into all the enclosures in the hall and cost you everything. And BTW, UPS maintenance is something that modern IT management, especially outsourced services, has forgotten. Any veteran admin knows you need to estimate the end-of-life for your electronics AND replace them BEFORE they fail, just like AC filters. If you allow those to fail, they will have already done some damage! There's no "RAID" for burning electronics or blocked cooling air!
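A minimal sketch of the "replace BEFORE they fail" idea, assuming you keep an inventory with install dates. The component names, the service-life figures and the 80% safety margin are all made up for illustration; real numbers would have to come from your own vendor documentation and experience.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Component:
    name: str
    installed: date
    service_life_years: float  # assumed figure, not a vendor specification


def replace_by(c: Component, margin: float = 0.8) -> date:
    """Date at which the component has used `margin` of its assumed life."""
    return c.installed + timedelta(days=round(c.service_life_years * 365 * margin))


def due(components: list[Component], today: date) -> list[str]:
    """Names of components whose replace-by date has arrived or passed."""
    return [c.name for c in components if replace_by(c) <= today]


# Hypothetical inventory, for illustration only.
inventory = [
    Component("ups-battery-rack-3", date(2020, 1, 1), 4.0),
    Component("ac-filter-hall-a", date(2023, 6, 1), 0.5),
    Component("ups-battery-rack-7", date(2023, 1, 1), 4.0),
]
print(due(inventory, today=date(2024, 1, 1)))
```

With these invented dates, the first battery and the filter come up as due and the newer battery does not. The margin is the whole point: the list flags things while they still work.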
www.tribalnetworks.org - helping tribal people around the world to own their own means of high-tech communications
Listen...do you smell something?
Why does the UPS not have a fail safe that kills it when the battery goes bad to stop a fire?
Far too many people rely on performance metrics and alarms. You're one of the ones who actually pays some attention :P
Any time you enter the DC you should take stock:
1. What do you hear? Perhaps an alarm through all the server noise? Unusually loud fans/ACUs? Anything unusually quiet? Other noises? (I 'predicted' an ACU failure because I heard the fan belt rubbing on something lightly shizz-shizz-shizz-shizz-shizz...)
2. What do you smell? This article basically points this out. Could be leaking ACU coolant. Batteries dying. Burning server. Overloaded circuit, etc.
3. What do you see? Yea, stupid I know but - does that corner of the room appear slightly dimmer? Better go check it out, a rack might be down and you haven't noticed yet.
4. What do you feel? Vibrations through the floor? Could be an ACU about to pop a fan belt or blow a compressor.
5. What do you feel further? Unusually dry or humid air? Temperatures etc.
In short, you should be using every sense except taste and direct tactile feel. Anything shorter and you just aren't paying full attention.
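If you wanted to make the walk-through above a habit, one way is to log it. This is only a sketch: the check names and prompts paraphrase the list above, and the `Observation` record and `follow_ups` helper are hypothetical, not part of any monitoring tool.

```python
from dataclasses import dataclass

# Prompts paraphrase the checklist above; the structure itself is hypothetical.
CHECKS = {
    "hear": "alarms, unusually loud or quiet fans/ACUs, rubbing belts",
    "smell": "leaking coolant, dying batteries, burning plastic",
    "see": "dim corners, dark racks, warning lights",
    "feel_vibration": "floor vibration from ACU fans or compressors",
    "feel_air": "unusual temperature, dry or humid air",
}


@dataclass
class Observation:
    check: str
    normal: bool
    note: str = ""


def follow_ups(observations):
    """Return (abnormal observations, names of checks that were skipped)."""
    seen = {o.check for o in observations}
    skipped = [c for c in CHECKS if c not in seen]
    abnormal = [o for o in observations if not o.normal]
    return abnormal, skipped


walk = [
    Observation("hear", False, "shizz-shizz from ACU 2 fan belt"),
    Observation("smell", True),
    Observation("see", True),
]
abnormal, skipped = follow_ups(walk)
print([o.note for o in abnormal], skipped)
```

Reporting the skipped checks matters as much as the abnormal ones: a walk-through that never recorded the two "feel" items did not actually use every sense.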
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Sense of touch can be valuable too. You can get sub-audible vibration readings by touching a case, and touch is more sensitive to small amounts of temperature change than other senses. Likewise it can be a really exciting way to check for failing/floating ground.
The preceding comment is my own, and in no way construes an opinion of the Emperor of Mankind.
Please return to reality and stop waving numbers about just to win an argument, what is important is actually WHAT THE NUMBERS MEAN and not clueless numerology.
Obviously tape and hard disks are used in different ways - so with tape, reliability is not whether you can run the same tape continuously for years, it's whether you can write it, store it and then read it years later. Hard disks are not so reliable by that measure. You cannot rely on them. In this case that is what reliability actually MEANS, even if you try to win an argument by calling it longevity instead.
The last comment has things reversed, as you would know if you'd had the misfortune of multiple failures with optical media or had actually read anything about the experiences of others.