Big Red Button Disasters?
FredDC asks: "The Daily WTF has a story about a Big Red Button disaster. What Big Red Button disasters have you experienced? Which ones have you caused? Are there any that you've heard about, or do you know of any that can happen any day now?"
When I was a young child, I found a fire alarm, and, with my father screaming ``No!'' in the background, proceeded to pull it. This was right after we moved to America from Russia, and dealing with the fire department, while barely understanding what they were saying, must have sucked.
Village idiot in some extremely smart villages.
I was doing I.T. support for a 400 person call center. In the server room there was a Big Red Button that was very clearly labeled "EMERGENCY POWER SHUT-OFF" near one of the sets of double-doors.
A technician from U.S. Worst had finished his work in the server room and on his way out he hit the Big Red Button thinking that would open the doors, like at a hospital.
Hilarity ensued.
Later that day I printed out several mock "Big Red Buttons" on sheets of paper to use as decoys next time the tech had to visit.
~> ftp www.workplace.domain
Connected to www.workplace.domain.
220 Microsoft FTP Service
Name: shag
331 Password required for shag.
Password:
230 User shag logged in.
Remote system type is Windows_NT.
ftp> cd /mis-typed/path
550 /mis-typed/path: The system cannot find the file specified.
ftp> put index.html
local: index.html remote: index.html
227 Entering Passive Mode.
125 Data connection already open; Transfer starting.
226 Transfer complete.
ftp>
The realization that one has just overwritten a public-facing, high-traffic /index.html with something that was supposed to be a couple levels down is bad enough.
It's worse when /index.html is owned by someone else entirely. Someone who now must be woken up in the middle of the night, in a different country...
After I did this two or three times, I decided to stop being such a hardcore geek and got an FTP application with a GUI.
Village idiot in some extremely smart villages.
I was a QA intern at Fujitsu working on the WorldsAway chat world when I discovered a rare crash bug with a new artist tool that I could reproduce successfully but my boss couldn't. Since the tool was supposed to be used on the test server only, my boss approved release of the update to the production server. Everything was fine for a day before the production server started crashing. Turns out that the artists were creating new content on the production server instead of the test server and using the new tool that caused the crashes. The production server was shut down for three days, a complete code rewrite was required, and Fujitsu lost $250,000 USD in revenue. My boss kept his job as he led the programming team to rewrite the code. I, on the other hand, was given two weeks' notice that my six-month contract wasn't going to be renewed. Two weeks after I left the company, one-third of the division was laid off to pay for the lost revenue.
Wow, I haven't posted in forever.
Anyway, we did a big datacenter migration at my last company. I'm not going to name names, but it's a Fortune 100 company based in Austin, TX. The move was happening because we built our own building with our own datacenter.
As part of the technical staff (network engineering/security), I was given a tour of the new datacenter before it opened. My boss and assorted other folks were on the tour. My boss, by the way, was a huge...jerk.
The electrician showed us the Big Red Buttons by each of the exit doors. He also said that each of the Power Distribution Units (of which there were three) had a Big Red Button that would cut power to just the areas powered by that unit.
My boss said, not jokingly, "If you need to cut power in an emergency, see if you can figure out which PDU is involved and just cut that one, so we don't lose the whole datacenter."
I piped up: "If I'm getting 220 across my nipples, cut the whole damn room. I really don't care enough about the company to die. I can see my epitaph now: 'Here lies Dimwit. He died saving two-thirds of the datacenter.'"
Man, if looks could kill.
...but it's being eaten...by some...Linux or something...
All new keyboards have a single key Shutdown/sleep thing.
Arghhhhhhhhhhhhhhhhhhh @ little fingers.
I either rip the bastard thing right off the board or dig out the regkey thingy to disable it.
liqbase
to tell people that "Halon" is French for "Exit," so if they ever get locked in the data center, they know how to get out.
"National Security is the chief cause of national insecurity." - Celine's First Law
You know the submission queue is slow when by the time the story is posted the site has changed its name.
I used to work help-desk, and late at night there would only be two people in the quite large building - me and one of the operators. Anyone who has worked with "ops" knows they generally turn a bit strange due to working nights with nobody around and only DAT tapes for company.
So anyway, there is this big fire alarm panel with tons of buttons that we never really thought about, until one night when it started beeping constantly. The ops guy found a key to it, and then we both stood there looking at the probably 60 buttons and flashing lights, etc. Personally, I would have chosen one of the black buttons marked "mute", but the ops guy went straight for the biggest red one on the board.
The result was more beeping, lots of red lights and about 5 fire-engines.
And as long as we're talking halon, who can forget the classic Vaxen, My Children, Just Don't Belong In Some Places.
While not an official "Big Red Button" story I think it is worth telling.
In 1999, while I was working as a private consultant for the capital city of a small New England state, a colleague of mine was attempting to make a change to the city's core switches. Per usual with this guy, he over-sold his skill set and was way out of his league - while never willing to admit it.
Meanwhile, I was working in the server room on the squid web caching server while he was attempting the change...
I kept hearing him say things like "I wonder what this command does", and "I wonder what the reset command means. Should I enter it?"
Suddenly I was no longer ssh'ed into the proxy server... I looked up and asked "What the hell did you do?"
His answer: "I entered the reset command"
Me: "Well, fix it. Restore the configuration. It looks like you just reset EVERYTHING..."
Well, needless to say, there was NO saved configuration to restore, and no documentation for the city's network nor the equipment installed, and on this equipment the reset command was the command to reset it to its default settings. (BTW, he entered the reset command on the core switch) There were several local switches (connected via copper), and many fiber connections to all the remote departments across the city - several fire departments, the main police department, city hall, you name it... All off-line.
In the end, the city's network was DOWN for 3-4 full days while he contacted qualified people to attempt to rebuild the network...
We would have been better off if he had hit the big red button near the sliding glass door at the server room's exit.
sigh...
P.S. I am pretty sure he blamed it all on me.
Windows is not the answer.
Windows is the question.
The answer is "NO."
Act One
Big test floor, where several large (multi-million dollar) computer systems are being configured and tested before shipment to the customer.
Tall skinny hyperactive developer (no, not me, I was just an observer) leaning against the wall of the test floor, actually *fiddling with* the Big Red Button.
Someone suggests that he ought not do that. He promises to be careful.
Act Two
Five minutes later. All the power has just gone out. It's amazing how quiet it is all of a sudden. Everyone is looking over at the tall skinny developer with his hand on the Big Red Button.
No words are spoken.
Act Three
Half an hour later. Electrician is leading the tall skinny developer around as he turns on each part of the power system in the right order. CEO and various unmollified developers watching. Back by the door, guy from facilities is bolting a flap over the Big Red Button.
This story has been around for years and years. In case you haven't heard it, here it is again.
***
Magic Switch Story
Some years ago, I was snooping around in the cabinets that housed the MIT AI Lab's PDP-10, and noticed a little switch glued to the frame of one cabinet. It was obviously a homebrew job, added by one of the lab's hardware hackers (no-one knows who).
You don't touch an unknown switch on a computer without knowing what it does, because you might crash the computer. The switch was labelled in a most unhelpful way. It had two positions, and scrawled in pencil on the metal switch body were the words "magic" and "more magic". The switch was in the "more magic" position.
I called another hacker over to look at it. He had never seen the switch before either. Closer examination revealed that the switch had only one wire running to it! The other end of the wire did disappear into the maze of wires inside the computer, but it's a basic fact of electricity that a switch can't do anything unless there are two wires connected to it. This switch had a wire connected on one side and no wire on its other side.
It was clear that this switch was someone's idea of a silly joke. Convinced by our reasoning that the switch was inoperative, we flipped it. The computer instantly crashed.
Imagine our utter astonishment. We wrote it off as coincidence, but nevertheless restored the switch to the "more magic" position before reviving the computer.
A year later, I told this story to yet another hacker, David Moon as I recall. He clearly doubted my sanity, or suspected me of a supernatural belief in the power of this switch, or perhaps thought I was fooling him with a bogus saga. To prove it to him, I showed him the very switch, still glued to the cabinet frame with only one wire connected to it, still in the "more magic" position. We scrutinized the switch and its lone connection, and found that the other end of the wire, though connected to the computer wiring, was connected to a ground pin. That clearly made the switch doubly useless: not only was it electrically nonoperative, but it was connected to a place that couldn't affect anything anyway. So we flipped the switch.
The computer promptly crashed.
This time we ran for Richard Greenblatt, a long-time MIT hacker, who was close at hand. He had never noticed the switch before, either. He inspected it, concluded it was useless, got some diagonal cutters and diked it out. We then revived the computer and it has run fine ever since.
We still don't know how the switch crashed the machine. There is a theory that some circuit near the ground pin was marginal, and flipping the switch changed the electrical capacitance enough to upset the circuit as millionth-of-a-second pulses went through it. But we'll never know for sure; all we can really say is that the switch was magic.
I still have that switch in my basement. Maybe I'm silly, but I usually keep it set on "more magic".
GLS
(1995-02-22)
"Tell me doctor, with all of your defenses, are there any provisions for an attack by killer bees?"
Back in the early 1980s I heard a story on my second co-op work term from a former Dow Chemical contractor about an incident I believe took place somewhere in Ontario. The Dow site operated a large generator of its own, and the generator was monitored by four VAXes running FFTs continuously to detect any unusual vibrations. One day the VAX cluster lit up a few warning lights, the control engineer inexplicably panicked, and despite much training to the contrary, pressed exactly the wrong big red button. The improper shutdown cracked or damaged the giant rotor.
To make things worse, I was told there was an industrial fatality in the aftermath when a panel was removed from a region of the generator that hadn't been properly depressurized. Then they determined that the required replacement rotor was too large to legally truck into Ontario over any public roadway from the U.S.-based factory where it originated. I was told they ended up doing a very complex comedy-cops operation under cover of darkness with many scouts and radios, but they did finally get it up and running again, months later.
This was well before the internet so I wasn't able to check out any of the details at the time, and it was a fairly small (yet costly) accident as these things go. I was surprised at the use of VAXes for grinding FFTs, as they seemed rather underpowered in raw CPU relative to other solutions from that era, though maybe not at the time the generator was first commissioned.
Working at a computer center, the best design I've seen had the "Big Red Button" as actually 2 buttons, spaced far enough apart that you couldn't hit them both at once with one hand, but close enough together that they were obviously related. They were also much higher off the raised floor than any other switches, and clearly marked.
Just as trivia, that type of circuit is common on industrial equipment (think of the big press from the end scene in Terminator 1) and is called a Two-Hand No-Tie-Down. Basically there are two switches, and they have to both be depressed within a certain interval in order to close the circuit (generally 0.5s or so). If you "tie down" one of the switches, or have something leaning against it, or whatever, pressing the second switch won't trigger (otherwise it would be just a simple AND gate).
The circuits to do it are pretty standard and easily available. What's cooler, is that you can actually get a basically-identical circuit that uses compressed air or other gas instead of electricity (for use in chemical plants and other explosive atmospheres). One of the cooler things I've gotten to see made was a pneumatic "circuit board" cut out of Lucite for this purpose. I've always thought they would make a nice demonstration device for teaching kids about electronic circuits.
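For anyone who thinks better in code than in relays, here is a rough Python sketch of the Two-Hand No-Tie-Down behavior described above. It is a simplified model, not how any real safety controller is built: the class name, the "left"/"right" button labels and the 0.5 s default window are all assumptions made for illustration.

```python
from typing import Dict, Optional


class TwoHandNoTieDown:
    """Simplified model: the circuit closes only while both buttons are held
    AND their presses started within `window_s` seconds of each other."""

    def __init__(self, window_s: float = 0.5) -> None:
        self.window_s = window_s  # assumed interval, "0.5s or so"
        # Time at which each button was pressed, or None if it is released.
        self._pressed_at: Dict[str, Optional[float]] = {"left": None, "right": None}

    def press(self, button: str, now: float) -> None:
        # Holding a button down does not refresh its timestamp.
        if self._pressed_at[button] is None:
            self._pressed_at[button] = now

    def release(self, button: str) -> None:
        self._pressed_at[button] = None

    def is_closed(self) -> bool:
        left = self._pressed_at["left"]
        right = self._pressed_at["right"]
        if left is None or right is None:
            return False  # a plain AND gate would stop checking here
        # The extra timing check is what defeats a tied-down button.
        return abs(left - right) <= self.window_s


if __name__ == "__main__":
    ok = TwoHandNoTieDown()
    ok.press("left", now=0.0)
    ok.press("right", now=0.3)
    print("both hands, 0.3 s apart:", ok.is_closed())

    tied = TwoHandNoTieDown()
    tied.press("left", now=0.0)    # something is leaning on this one
    tied.press("right", now=10.0)  # second press arrives far too late
    print("one button tied down:", tied.is_closed())
```

The only difference from a simple AND gate is the timestamp comparison: a button that has been held for a long time carries an old press time, so the second press never lands inside the window until the first button is released and pressed again.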
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Two solaris oopsies.
One: Somehow, I don't know how, I accidentally deleted
Two: Not wanting to accidentally halt the machine without really meaning it, I moved the halt command to halt.ireallymeanit. I then replaced halt with a small shell script that echoed "You don't want to halt this machine" (sleep a few seconds) "If you do, type halt.ireallymeanit" (sleep a few seconds) exit.
Then, to test it, I type halt. Without (duh) first typing which halt to make sure there wasn't a halt command before the
Needless to say, it's not Solaris' fault, but somehow I always managed to screw up that OS without meaning to, so I have developed a healthy fear and loathing for it. I'd like to think I've grown up a bit since then - this has been like 3 or 4 years now, and I've learned a helluvalot since then.
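For what it's worth, the check I skipped (asking which halt would actually run) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the wrapper path you pass in are hypothetical, and it relies on the standard library's shutil.which, which returns the first matching executable on the search path.

```python
import os
import shutil
from typing import Optional


def first_on_path(command: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the executable that would run for `command`, or None if none is found.

    `search_path` defaults to the PATH environment variable.
    """
    return shutil.which(command, path=search_path)


def wrapper_is_active(command: str, wrapper_path: str,
                      search_path: Optional[str] = None) -> bool:
    """True only if typing `command` would resolve to the wrapper script."""
    resolved = first_on_path(command, search_path)
    if resolved is None:
        return False
    # Compare real paths so symlinks to the same file still count as a match.
    return os.path.realpath(resolved) == os.path.realpath(wrapper_path)


if __name__ == "__main__":
    # Hypothetical wrapper location; substitute wherever the script was installed.
    wrapper = "/usr/local/bin/halt"
    if wrapper_is_active("halt", wrapper):
        print("safe to test: 'halt' resolves to the wrapper")
    else:
        print("do NOT test: 'halt' resolves to", first_on_path("halt"))
```

Run something like this before typing the command itself, not after.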
~Wx
sig?
...on the way to the toilets.
...and some pipes.... ...this used to be a factory... ...compressed air? Sprinkler valve? What?
It is on a chain that goes way up to the roof...
I don't know.
I wonder, I wonder.
Other people wonder.
Maybe it has been pulled many times? Maybe someone will pull it and sprinkle all the PCs? Maybe someone pulls it and we all get flushed down the intertubes. (Funny, my kids have never seen a toilet with a chain)
Life is full of little puzzlements.
(It all goes wrong tomorrow, IT WASN'T ME! I HAVE RESISTED TEMPTATION FOR YEARS NOW!)
Articles to Slashdot have to be fact checked, and tested on a focus group to make sure that they don't cause emotional distress. After two months of this, the editors will submit a form P41B with a write-up, which is circulated to have its facts, grammar and spelling checked. The legal department needs to process a form P09F911029D74E35BD84156C5635688C0B for the story to make sure there are no legal implications as to publishing it, due to trade secrets, the DMCA or libel. Then it's pretty much a quiet month of tuning the write-up and testing it on focus groups before publication. Cramming all this activity into three months seems remarkable to me.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Arguably the biggest shutdown-button screw-up in history ...
From http://en.wikipedia.org/wiki/Chernobyl_disaster :
"At 1:23:04 the experiment began. The unstable state of the reactor was not reflected in any way on the control panel, and it did not appear that anyone in the reactor crew was fully aware of any danger. Steam to the turbines was shut off and, as the momentum of the turbine generator drove the water pumps, the water flow rate decreased, decreasing the absorption of neutrons by the coolant. The turbine was disconnected from the reactor, increasing the level of steam in the reactor core. As the coolant heated, pockets of steam formed voids in the coolant lines. Due to the RBMK reactor-type's large positive void coefficient, the steam bubbles increased the power of the reactor rapidly, and the reactor operation became progressively less stable and more dangerous. As the reaction continued, the excess xenon-135 was burnt up, increasing the number of neutrons available for fission. The prior removal of manual and automatic control rods had no substitute, leading to a runaway reaction.
At 1:23:40 the operators pressed the AZ-5 ("Rapid Emergency Defense 5") button that ordered a "SCRAM" - a shutdown of the reactor, fully inserting all control rods, including the manual control rods that had been incautiously withdrawn earlier. It is unclear whether it was done as an emergency measure, or simply as a routine method of shutting down the reactor upon the completion of an experiment (the reactor was scheduled to be shut down for routine maintenance). It is usually suggested that the SCRAM was ordered as a response to the unexpected rapid power increase. On the other hand, Anatoly Dyatlov, chief engineer at the nuclear station at the time of the accident, writes in his book:
"Prior to 01:23:40, systems of centralized control ... didn't register any parameter changes that could justify the SCRAM. Commission ... gathered and analyzed large amount of materials and, as stated in its report, failed to determine the reason why the SCRAM was ordered. There was no need to look for the reason. The reactor was simply being shut down upon the completion of the experiment."
The slow speed of the control rod insertion mechanism (18-20 seconds to complete), and the flawed rod design which initially reduces the amount of coolant present, meant that the SCRAM actually increased the reaction rate. At this point an energy spike occurred and some of the fuel rods began to fracture, placing fragments of the fuel rods in line with the control rod columns. The rods became stuck after being inserted only one-third of the way, and were therefore unable to stop the reaction. At this point nothing could be done to stop the disaster. By 1:23:47 the reactor jumped to around 30 GW, ten times the normal operational output. The fuel rods began to melt and the steam pressure rapidly increased, causing a large steam explosion. Generated steam traveled vertically along the rod channels in the reactor, displacing and destroying the reactor lid, rupturing the coolant tubes and then blowing a hole in the roof.[7] After part of the roof blew off, the inrush of oxygen, combined with the extremely high temperature of the reactor fuel and graphite moderator, sparked a graphite fire. This fire greatly contributed to the spread of radioactive material and the contamination of outlying areas ..."
Perhaps this one is too nerdy for /. - no, forget that I said that.
You can do really interesting things as root; in a place I worked one of my colleagues wouldn't admit that he had done the following on one of our biggest and most important UNIXes: He had logged on as root and opened up /etc/passwd in vi, then immediately realized that this was not where he wanted to be. Now, normally one would use ':q' to exit a file without saving, but he was in the habit of using ':x', which is a convenient way of saving and exiting at the same time. Unfortunately he forgot the ':', which makes it a command to delete whichever character you are standing on. When nothing seemed to happen, he automatically did it again, this time getting it right. Then he logged out.
Now, what is normally the very first line in /etc/passwd? I'll give you a hint: it begins with root:x:0:0 - so this guy had deleted the 'r' in root, saved the file and exited. And since nobody else was logged in as root, we were stuffed - one couldn't log on as root, since he was not in /etc/passwd, and logging on as oot didn't work either because he was still called root in /etc/security/passwd (this was on AIX - it corresponds to /etc/shadow). And using 'su -' from an ordinary user didn't work, since this command actually looks for the username 'root'. Unfortunately it turned out that booting in single user mode meant that you had only very minimal access to the disks, and getting the others online is not easy when you know too little about AIX and have a very complicated arrangement of disks and volume groups. In the end we had to reinstall. This of course had to have the traditional, serious consequences: the guy was .... promoted.
I took a great deal of effort to toddler-proof my study. PC and laptop with exposed buttons at desk height or above. Synth moved from wobbly stand to sturdy wall-mounted shelf. Linux server, under my desk, rehomed into a blacker-than-black case, fancy lighting rig unplugged, all buttons, optical drives and recesses safely hidden behind a plain black door. O'Reilly Wall moved from bookcase to high shelves.
I even got a "decoy" keyboard for my 11-month-old daughter to play with.
Of course, she found the UPS switch in seconds. It had a nice glowy LED above it, and was sitting on top of the Linux server just at her shoulder height.
All three PCs, the whole study, powered down, and not in a nice graceful apcupsd way, just a sudden BOINK, followed by darkness and silence, penetrated only by a happy gurgle.
Thank heavens for Linux software RAID mirroring.
(A couple of months earlier, she managed to cause Windows to prompt "Add new hardware - Searching for drivers" [blur-o-matic cameraphone photo] by sucking the end of my iPod USB cable. Unfortunately I didn't have any Win2K drivers for a 9-month old baby. I bet Ubuntu installs them by default, even though the GNU crowd complain they're not truly free.)
Annabel is one on Sunday. Wish her happy birthday.
Andrew Oakley - www.aoakley.com
What do you mean "not truly free". She's open source, and created by relatively unskilled labor, right?
Best Slashdot Co
there's only one thing any self respecting geek can do.
Hang a note on it that says "Pull me."
Looking for Book Reviews? Check out Literary Escapism.
Fresh out of high school I was a janitor who happened to clean the data center at a big business. I was in this job because I needed to raise money for college (it paid $12/hr believe it or not, which was a fair sight better than pretty much any other job I could have landed at the time). It was a foot in the door, and I eventually worked my way through college and up the corporate ladder in the very same company. Now I'm responsible for the servers which occupy that same space which I used to clean.
Fortunately the guys working in the data center weren't as narrow-minded as you; while working as a janitor I would regularly take a few minutes to help them diagnose some problem with their Windows boxes or just help them put together some new hardware. While it's possible they were patronizing me because they saw in me some spark of what they saw in themselves, I also genuinely believe that they were grateful for the assistance, and at the very least they didn't judge me because of my position in life.
I have never since worked as hard in my life as I did while a janitor. I have never since in my life been looked down on by as many people. You cannot imagine how being constantly surrounded by people who look down on you saps your self confidence and opinion of yourself. Working to clean the filth that other people generate, and in service to these people, they will often not even acknowledge your presence even if you address them directly. It was one of the worst periods of my life, and I also regard it as one of the most valuable.
Today I use people's attitude toward janitorial or maintenance staff as a litmus test of their personal character and it has yet to let me down. For example, once while I was interviewing a job candidate, the janitor came into the room to empty the trashcans. The candidate showed obvious distaste, and I recommended against this person for the job. They got the job in spite of my recommendation, but within 8 months they were shown the door; this same attitude, which they were not even able to mask during an interview, infested the rest of their inter-personal relationships. They were a nightmare to work with or even just be around.
Whenever you think you are better than someone else because of what they do or because of who they are, that self-same thought makes it not so.
Slay a dragon... over lunch!