Big Red Button Disasters?
FredDC asks: "The Daily WTF has a story about a Big Red Button disaster. What Big Red Button disasters have you experienced? Which ones have you caused? Are there any that you've heard about, or do you know of any that can happen any day now?"
I was doing I.T. support for a 400-person call center. In the server room there was a Big Red Button that was very clearly labeled "EMERGENCY POWER SHUT-OFF" near one of the sets of double-doors.
A technician from U.S. Worst had finished his work in the server room and on his way out he hit the Big Red Button thinking that would open the doors, like at a hospital.
Hilarity ensued.
Later that day I printed out several mock "Big Red Buttons" on sheets of paper to use as decoys next time the tech had to visit.
~> ftp www.workplace.domain
Connected to www.workplace.domain.
220 Microsoft FTP Service
Name: shag
331 Password required for shag.
Password:
230 User shag logged in.
Remote system type is Windows_NT.
ftp> cd /mis-typed/path
550 /mis-typed/path: The system cannot find the file specified.
ftp> put index.html
local: index.html remote: index.html
227 Entering Passive Mode.
125 Data connection already open; Transfer starting.
226 Transfer complete.
ftp>
The realization that one has just overwritten a public-facing, high-traffic /index.html with something that was supposed to be a couple levels down is bad enough. It's worse when /index.html is owned by someone else entirely. Someone who now must be woken up in the middle of the night, in a different country...
After I did this two or three times, I decided to stop being such a hardcore geek and got an FTP application with a GUI.
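For anyone who would rather script the upload than switch to a GUI, here is a minimal sketch of the guard that was missing in the session above: refuse to upload unless the directory change actually worked. It uses Python's standard ftplib. The function name safe_put and its parameters are made up for illustration, and comparing the PWD reply against the requested path is an assumption that may not hold on servers that report paths in a different style.

```python
import ftplib
import posixpath


def safe_put(ftp, remote_dir, fileobj, remote_name):
    """Upload fileobj as remote_name, but only after landing in remote_dir.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    """
    # ftplib raises ftplib.error_perm when the server answers 550 to CWD,
    # so a mistyped path stops here instead of falling through to the upload.
    ftp.cwd(remote_dir)

    # Extra check (an assumption, not something every server needs):
    # confirm that the server reports the directory we asked for.
    landed = posixpath.normpath(ftp.pwd())
    wanted = posixpath.normpath(remote_dir)
    if landed != wanted:
        raise RuntimeError(f"expected {wanted!r}, server reports {landed!r}")

    ftp.storbinary(f"STOR {remote_name}", fileobj)
```

Typical use would be something like opening the local file in binary mode and calling safe_put(ftp, "/some/sub/dir", fh, "index.html") on a connected ftplib.FTP instance. A failed cd then surfaces as an exception rather than a quietly replaced front page.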
Village idiot in some extremely smart villages.
I was a QA intern at Fujitsu working on the WorldsAway chat world when I discovered a rare crash bug with a new artist tool that I could reproduce successfully but my boss couldn't. Since the tool was supposed to be used on the test server only, my boss approved release of the update to the production server. Everything was fine for a day before the production server started crashing. Turns out that the artists were creating new content on the production server instead of the test server and using the new tool that caused the crashes. The production server was shut down for three days, a complete code rewrite was required, and Fujitsu lost $250,000 USD in revenue. My boss kept his job as he led the programming team to rewrite the code. I, on the other hand, was given two weeks' notice that my six-month contract wasn't going to be renewed. Two weeks after I left the company, one-third of the division was laid off to pay for the lost revenue.
Wow, I haven't posted in forever.
Anyway, we did a big datacenter migration at my last company. I'm not going to name names, but it's a Fortune 100 company based in Austin, TX. The move was happening because we built our own building with our own datacenter.
As part of the technical staff (network engineering/security), I was given a tour of the new datacenter before it opened. My boss and assorted other folks were on the tour. My boss, by the way, was a huge...jerk.
The electrician showed us the Big Red Buttons by each of the exit doors. He also said that each of the Power Distribution Units (of which there were three) had a Big Red Button that would cut power to just the areas powered by that unit.
My boss said, not jokingly, "If you need to cut power in an emergency, see if you can figure out which PDU is involved and just cut that one, so we don't lose the whole datacenter."
I piped up: "If I'm getting 220 across my nipples, cut the whole damn room. I really don't care enough about the company to die. I can see my epitaph now: 'Here lies Dimwit. He died saving two-thirds of the datacenter.'"
Man, if looks could kill.
...but it's being eaten...by some...Linux or something...
All new keyboards have a single key Shutdown/sleep thing.
Arghhhhhhhhhhhhhhhhhhh @ little fingers.
I either rip the bastard thing right off the board or dig out the regkey thingy to disable it.
liqbase
to tell people that "Halon" is French for "Exit," so if they ever get locked in the data center, they know how to get out.
"National Security is the chief cause of national insecurity." - Celine's First Law
You know the submission queue is slow when by the time the story is posted the site has changed its name.
I used to work help-desk, and late at night there would only be two people in the quite large building - me and one of the operators. Anyone who has worked with "ops" knows they generally turn a bit strange due to them working nights with nobody around and only DAT tapes for company.
So anyway, there is this big fire alarm panel with tons of buttons that we never really thought about, until one night when it started beeping constantly. The ops guy found a key to it, and then we both stood there looking at the probably 60 buttons and flashing lights, etc. Personally, I would have chosen one of the black buttons marked "mute", but the ops guy went straight for the biggest red one on the board.
The result was more beeping, lots of red lights and about 5 fire-engines.
And as long as we're talking halon, who can forget the classic Vaxen, My Children, Just Don't Belong In Some Places.
While not an official "Big Red Button" story I think it is worth telling.
In 1999, while I was working as a private consultant for the capital city of a small New England state, a colleague of mine was attempting to make a change to the city's core switches. Per usual with this guy, he over-sold his skill set and was way out of his league - while never willing to admit it.
Meanwhile, I was working in the server room on the squid web caching server while he was attempting the change...
I kept hearing him say things like "I wonder what this command does", and "I wonder what the reset command means. Should I enter it?"
Suddenly I was no longer ssh'ed into the proxy server... I looked up and asked "What the hell did you do?"
His answer: "I entered the reset command"
Me: "Well, fix it. Restore the configuration. It looks like you just reset EVERYTHING..."
Well, needless to say, there was NO saved configuration to restore, and no documentation for the city's network nor the equipment installed, and on this equipment the reset command was the command to reset it to its default settings. (BTW, he entered the reset command on the core switch) There were several local switches (connected via copper), and many fiber connections to all the remote departments across the city - several fire departments, the main police department, city hall, you name it... All off-line.
In the end, the city's network was DOWN for 3-4 full days while he contacted qualified people to attempt to rebuild the network...
We would have been better off if he had hit the big red button near the sliding glass door at the server room's exit.
sigh...
P.S. I am pretty sure he blamed it all on me.
Windows is not the answer.
Windows is the question.
The answer is "NO."
Working at a computer center, I think the best design I've seen was one where the "Big Red Button" was actually 2 buttons, spaced far enough apart that you couldn't hit them both at once with one hand, but close enough together that they were obviously related. They were also much higher off the raised floor than any other switches, and clearly marked.
Just as trivia, that type of circuit is common on industrial equipment (think of the big press from the end scene in Terminator 1) and is called a Two-Hand No-Tie-Down. Basically there are two switches, and they have to both be depressed within a certain interval in order to close the circuit (generally 0.5s or so). If you "tie down" one of the switches, or have something leaning against it, or whatever, pressing the second switch won't trigger (otherwise it would be just a simple AND gate).
The circuits to do it are pretty standard and easily available. What's cooler, is that you can actually get a basically-identical circuit that uses compressed air or other gas instead of electricity (for use in chemical plants and other explosive atmospheres). One of the cooler things I've gotten to see made was a pneumatic "circuit board" cut out of Lucite for this purpose. I've always thought they would make a nice demonstration device for teaching kids about electronic circuits.
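For the software-minded, the two-hand no-tie-down behavior described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how any real safety relay is built. The class name, the polling-style update() method, and the rule that both buttons must be released before the circuit re-arms are my assumptions. The 0.5 s window comes from the description above.

```python
class TwoHandNoTieDown:
    """Software model of a two-hand no-tie-down control (illustrative only)."""

    def __init__(self, window_s=0.5):
        self.window_s = window_s   # max allowed gap between the two presses
        self._a_since = None       # time button A went down, None if released
        self._b_since = None       # time button B went down, None if released
        self._active = False       # True while the output circuit is closed
        self._armed = True         # False until both buttons are released

    def update(self, a_pressed, b_pressed, now):
        """Feed the current button states and a timestamp; returns output state."""
        # Remember when each button was first seen pressed.
        if not a_pressed:
            self._a_since = None
        elif self._a_since is None:
            self._a_since = now
        if not b_pressed:
            self._b_since = None
        elif self._b_since is None:
            self._b_since = now

        # Letting go of either button opens the circuit and disarms it.
        if self._active and not (a_pressed and b_pressed):
            self._active = False
            self._armed = False
        # Assumed re-arm rule: both buttons must be released.
        if not a_pressed and not b_pressed:
            self._armed = True

        if not self._active and self._armed and a_pressed and b_pressed:
            if abs(self._a_since - self._b_since) <= self.window_s:
                self._active = True
            else:
                # Pressed too far apart (e.g. one button tied down): stay open.
                self._armed = False
        return self._active
```

The timing check is what separates this from a plain AND gate: with one button held down for five seconds, pressing the other does nothing until both have been released and pressed again together.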
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Articles to slashdot have to be fact checked, and tested on a focus group to make sure that they don't cause emotional distress. After two months of this, the editors will submit a form P41B with a write up, which is circulated to have its facts, grammar and spelling checked. The legal department needs to process a form P09F911029D74E35BD84156C5635688C0B for the story to make sure there are no legal implications as to publishing it, due to trade secrets, the DMCA or libel. Then it's pretty much a quiet month of tuning the write up and testing it on focus groups before publication. Seems like cramming all this activity into three months is remarkable to me.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Arguably the biggest shutdown-button screw-up in history ...
From http://en.wikipedia.org/wiki/Chernobyl_disaster :
"At 1:23:04 the experiment began. The unstable state of the reactor was not reflected in any way on the control panel, and it did not appear that anyone in the reactor crew was fully aware of any danger. Steam to the turbines was shut off and, as the momentum of the turbine generator drove the water pumps, the water flow rate decreased, decreasing the absorption of neutrons by the coolant. The turbine was disconnected from the reactor, increasing the level of steam in the reactor core. As the coolant heated, pockets of steam formed voids in the coolant lines. Due to the RBMK reactor-type's large positive void coefficient, the steam bubbles increased the power of the reactor rapidly, and the reactor operation became progressively less stable and more dangerous. As the reaction continued, the excess xenon-135 was burnt up, increasing the number of neutrons available for fission. The prior removal of manual and automatic control rods had no substitute, leading to a runaway reaction.
At 1:23:40 the operators pressed the AZ-5 ("Rapid Emergency Defense 5") button that ordered a "SCRAM" - a shutdown of the reactor, fully inserting all control rods, including the manual control rods that had been incautiously withdrawn earlier. It is unclear whether it was done as an emergency measure, or simply as a routine method of shutting down the reactor upon the completion of an experiment (the reactor was scheduled to be shut down for routine maintenance). It is usually suggested that the SCRAM was ordered as a response to the unexpected rapid power increase. On the other hand, Anatoly Dyatlov, chief engineer at the nuclear station at the time of the accident, writes in his book:
"Prior to 01:23:40, systems of centralized control ... didn't register any parameter changes that could justify the SCRAM. Commission ... gathered and analyzed large amount of materials and, as stated in its report, failed to determine the reason why the SCRAM was ordered. There was no need to look for the reason. The reactor was simply being shut down upon the completion of the experiment."
The slow speed of the control rod insertion mechanism (18-20 seconds to complete), and the flawed rod design which initially reduces the amount of coolant present, meant that the SCRAM actually increased the reaction rate. At this point an energy spike occurred and some of the fuel rods began to fracture, placing fragments of the fuel rods in line with the control rod columns. The rods became stuck after being inserted only one-third of the way, and were therefore unable to stop the reaction. At this point nothing could be done to stop the disaster. By 1:23:47 the reactor jumped to around 30 GW, ten times the normal operational output. The fuel rods began to melt and the steam pressure rapidly increased, causing a large steam explosion. Generated steam traveled vertically along the rod channels in the reactor, displacing and destroying the reactor lid, rupturing the coolant tubes and then blowing a hole in the roof. After part of the roof blew off, the inrush of oxygen, combined with the extremely high temperature of the reactor fuel and graphite moderator, sparked a graphite fire. This fire greatly contributed to the spread of radioactive material and the contamination of outlying areas."
I took a great deal of effort to toddler-proof my study. PC and laptop with exposed buttons at desk height or above. Synth moved from wobbly stand to sturdy wall-mounted shelf. Linux server, under my desk, rehomed into a blacker-than-black case, fancy lighting rig unplugged, all buttons, optical drives and recesses safely hidden behind a plain black door. O'Reilly Wall moved from bookcase to high shelves.
I even got a "decoy" keyboard for my 11-month-old daughter to play with.
Of course, she found the UPS switch in seconds. It had a nice glowy LED above it, and was sitting on top of the Linux server just at her shoulder height.
All three PCs, the whole study, powered down, and not in a nice graceful apcupsd way, just a sudden BOINK, followed by darkness and silence, penetrated only by a happy gurgle.
Thank heavens for Linux software RAID mirroring.
(A couple of months earlier, she managed to cause Windows to prompt "Add new hardware - Searching for drivers" [blur-o-matic cameraphone photo] by sucking the end of my iPod USB cable. Unfortunately I didn't have any Win2K drivers for a 9-month old baby. I bet Ubuntu installs them by default, even though the GNU crowd complain they're not truly free.)
Annabel is one on Sunday. Wish her happy birthday.
Andrew Oakley - www.aoakley.com
I was 6 and the manager at my local bank was in a meeting with my mother. He let me play in the next office over, and what did my young inquisitive eyes find, but a nice big red button, right there on the floor!!! I pushed it, of course, as that's what I do, and next thing I know a cop is rubbing my head asking me what grade I'm in. I never admitted to pushing the button outright though.
3 weeks later my uncle approached me (remember, I'm 6): "I heard you pushed a grey button under the desk at the bank last month!"
My response: No! It was red! *busted*
Fresh out of high school I was a janitor who happened to clean the data center at a big business. I was in this job because I needed to raise money for college (it paid $12/hr believe it or not, which was a fair sight better than pretty much any other job I could have landed at the time). It was a foot in the door, and I eventually worked my way through college and up the corporate ladder in the very same company. Now I'm responsible for the servers which occupy that same space which I used to clean.
Fortunately the guys working in the data center weren't as narrow-minded as you; while working as a janitor I would regularly take a few minutes to help them diagnose some problem with their Windows boxes or just help them put together some new hardware. While it's possible they were patronizing me because they saw in me some spark of what they saw in themselves, I also genuinely believe that they were grateful for the assistance, and at the very least they didn't judge me because of my position in life.
I have never since worked as hard in my life as I did while a janitor. I have never since in my life been looked down on by as many people. You cannot imagine how being constantly surrounded by people who look down on you saps your self-confidence and opinion of yourself. When you work to clean the filth that other people generate, and in service to these people, they will often not even acknowledge your presence even if you address them directly. It was one of the worst periods of my life, and I also regard it as one of the most valuable.
Today I use people's attitude toward janitorial or maintenance staff as a litmus test of their personal character and it has yet to let me down. For example, once while I was interviewing a job candidate, the janitor came into the room to empty the trashcans. The candidate showed obvious distaste, and I recommended against this person for the job. They got the job in spite of my recommendation, but within 8 months they were shown the door; this same attitude, which they were not even able to mask during an interview, infested the rest of their inter-personal relationships. They were a nightmare to work with or even just be around.
Whenever you think you are better than someone else because of what they do or because of who they are, that self-same thought makes it not so.
Slay a dragon... over lunch!