LiveJournal Blackout Analysis Online
Hakubi_Washu writes "LiveJournal has posted their official analysis of what happened last Friday.
Apparently someone "accidentally" pushed the emergency power off (which should keep all power off, even UPS), reset it and ran off. They had trouble coming back up quickly, because of "9 machines with faulty motherboards with embedded NICs that don't do auto-negotiation properly", machines not fully rebooting for analysis reasons, and a few others."
They should be using OpenBSD. It can run right through power failures
What they do makes me happy when I think how simple my setup is by comparison.
Don't let your clients near the Big Red Button without an escort. Preferably an armed one.
Don't blame me; I'm never given mod points.
Now we've got blogging about blogging. Yay for the rise of meta-blogging.
so, they had faulty motherboards, knew about it, and didn't do anything to fix it before they had a major outage?
No beer, no TV make Lifthrasir something something
Now, if Slashdot could fix their servers, so we wouldn't get those annoying 503 pages..
I haven't seen them that much lately, but then I haven't been online that much either...
"I'll just set my coffee down here, and..."
...
"Oopsie, I hope that button wasn't anything important."
Ah, the famous History Eraser Button rears its ugly head. I think that everyone who has worked in a large datacenter or lab environment with one of these has a story to tell...
(S(SKK)(SKK))(S(SKK)(SKK))
Did they put it right next to the light switches? Shouldn't something like that be locked away in a server room or at least in a place where it can be under supervision?
What doesn't kill you only delays the inevitable
/.s current poll now?
Monstar L
Congrats to the LJ folks for getting things working, taking the time to do it right, and giving an admin's-eye-view into what actually happened.
Carousel is a lie!
The linksys rep.
I mean for like $20x9, they could have avoided the problem by adding a few NICs.
Apparently someone "accidentally" pushed the emergency power off
They had to power back on when they realized deadjournal.com was already taken...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
If Mr. "I Pushed The Big Red Button"'s personal information ever gets published....
LJ's active user base is easily 10x that of Slashdot's. We'd have to come up with a new term for the internet event that pales any slashdotting that ever came before.
Those particular buttons are shielded by plastic covers. You have to deliberately lift the cover to get to the button. You can't just "bump into it".
What, somebody was attracted to the pretty red button and just *had* to push it???
When I first moved company servers in to a new colo four years ago, their engineers advised me that I should turn auto-negotiation off on every port, including our switches and host NICs. I asked why they recommended this and they replied, "trust us, auto-negotiation causes problems when you least expect it." I went ahead and fixed the port speeds everywhere. Now I understand why.
What do you mean, ran off?
Ran off skipping and giggling, like a 13 year old who just put toothpaste on the toilet seat?
Or do you really mean, slunk off, like my dog does when I walk in and find her curled up on top of the remains of the remotes for the TV, TiVo, DVD player and stereo?
My dog likes remote controls more than snausages.
OT: Anyone know where (brick and mortar) to get a replacement (original) TiVo remote?
I don't need no instructions to know how to rock!!!!
Speaking of stupid things to do how many people know someone that has named a file on a Unix server * and then at some point later in time decided they no longer needed that file and decided to rm *?
News Reporters Make Tasty Polar Bear Treats!
So what you're saying is that your post will mirror 99% of all LJ posts?
Anyone who's a paid member of LJ can get a 2-week credit here.
Entrepreneur : (noun), French for "unemployed"
"Must be uh, must be why we're not shipping Longhorn yet."
With that 'running off' part. If you had said 'wobbled off' or 'jiggled off' you might be able to make a case.
I must compliment LJ for at least being honest with their system... many would lie and say "it was the datacenter's fault".
They at least admit their own systems weren't perfect... and clearly explained each fault they observed.
Good info.
I always wanted to push that button... Now I don't have to.
*crickets chirping* That's the sound of millions of teenage girls not using up bandwidth and disk space talking about boys, jcrew and high school/college drama.
Click here or a puppy gets stomped!
I was a sysadmin at a Fortune 100 company with thousands of servers. Every Saturday evening, we rebooted all of our servers. We almost always had several machines which would not come back up for one reason or another - so we dealt with it then, on Sunday morning, instead of during the week when a reboot of a critical machine that did not work would be much worse. Scheduled reboots are a part of good systems administration. If once a week is too often, then once every two weeks, or once a month. With this much failure, I'm almost certain they never did scheduled reboots. They had two failures - their power failed, and then their lack of planning allowed for so much to go wrong as a result of that.
And I was like OMG I shut off the internets and stuff!!1!!
And i called the AOL helpdesk and they helped turn it back on.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
Is it me, or are some of those LJ users' expressions of thanks just a bit OTT?
The way the comments go, you'd think this was a life support system or something!
I mean, well done for getting the site back up after like 24 hours or something, but hey I'm not creaming my shorts over it!
#include <sig.h>
everybody was blaming Internap for screwing up and running a shoddy Datacenter, when actually Internap did everything they were supposed to correctly.
Your hair look like poop, Bob! - Wanker.
Apparently someone "accidentally" pushed the emergency power off (which should keep all power off, even UPS)
This also raised the all-important "Why do we even have that button?" question.
Maybe they should use the Button of Doom (USB) to lock the pcs down too...
"EPO, by the way, stands for Emergency Power Off and it's a national fire/electrical requirement for firefighters to be able to press these big red buttons near all exits that turn off all power in the entire data center."
"...all our DBs have redundant power supplies. we'll be plugging one side into Internap's, and the other side into our own UPS, which itself is plugged into Internap's other power grid. that way if EPO is pressed, we'll have 1-4 minutes to do a clean shutdown. (but if we do the rest of the stuff right, this step isn't even required, including having UPSes... in theory... but the UPSes would be comforting)
Isn't that circumventing the purpose of the EPO? If there's a smoky fire in there and the firefighters have to enter the room and start spraying water around, won't a few machines glowing for four minutes after the EPO was pressed put them in danger of electrocution? Or force them to wait four minutes before they can enter?
I'm not trying to be a smartass here, since I'm not an expert in datacenters or the purposes behind EPOs - I'm asking. . .
The only acceptable defense of scientific results is to say that they were the product of the Scientific Method.
I have run across this issue in data centers numerous times. This still occurs with the latest hardware, no matter what vendor or OS. I have this problem on SunFire280Rs and Compaq DL360s. What it comes down to is the switch being used in the data center and the settings in the OS. Typically, data centers set their switch to forced 100-full (unless of course they are using fibre or Gb). The OS must be set to force its NICs into the same mode, or they will drop a lot of packets. Sounds like a disconnect in communications between the NOC and the customer.
Ran off skipping and giggling, like a 13 year old who just put toothpaste on the toilet seat?
By any chance, was his name "Zero Cool"?
They ought to have out-of-band (OOB) serial-console access to their servers via a terminal server for any number of reasons, including this one; if they'd implemented OOB console access, they could've sshed into the terminal server, gotten onto the consoles of the servers in question, and used ifconfig to fix the duplex issue.
Why they don't seem to grasp this is beyond me . . . anyone running a public-facing, high-volume service should have OOB access to all servers, routers, switches, firewalls, etc. . . . it's just common sense.
I told you so.
Looks like my "Newbie Operator" found hisself a new job.
Mit der Dummheit kämpfen Götter selbst vergebens.
The one they tell you about and the real one.
Who in their right mind goes with the on-board NIC in a server environment?
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Actually, most of the accounts don't pay. They're just freeloading whiners.
This is a paste from the Livejournal stats:
* Free Account: 5713743 (98.3%)
* Early Adopter: 14220 (0.2%)
* Paid Account: 94857 (1.6%)
* Permanent Account: 1632 (0.0%)
The article cites disk caches as a source of data-loss.
They claim that their battery-backed RAID caches were safe, but that the actual drives themselves were performing unsafe write caching. It strikes me that this is the kind of thing that's quite easy to *suggest*, but far more difficult to *prove*.
I don't have any first-hand knowledge of disk corruption due to write-caching. Is this a real problem or just some kind of legend? Can someone who has RTFA'ed and knows about disk caches please comment?
This is somewhat irrelevant, but I've messed with some non-battery-backed RAID setups in the past. In these situations, it always made sense to me that the controller would set the individual drives' cache policy to match its own.
It seems that my company and LiveJournal host at the same datacenter here in Seattle. Looks like they got hit pretty hard when the datacenter with multiple redundant battery backups and generators had a massive cascade emergency power off, and every server in the building got shut down at once. LiveJournal got hit the hardest, they had some IDE drives on their servers, doh! Looks like even multiple redundant battery backup with power generator datacenters are still vulnerable to dumbass electricians who don't know what they are doing. The datacenter has been under construction for the past few months too, so you KNOW that had something to do with it. Looks like we'll have to put a UPS in our cabinet at the multiple redundant battery back up and power generator datacenter housing, seeing as all that backup protection doesn't mean diddly squat.
Meet new people, and kill them.
Most of the time it is Stimpy's fault. The rest of the time it is Fry's fault. I think there may be a connection...
(S(SKK)(SKK))(S(SKK)(SKK))
I'm surprised that they didn't have their own little UPSes to bring the system down cleanly before. Sure, the facility is supposed to provide power at all times, even if there's a power grid interruption, but that doesn't get tested very often and isn't under your control. Furthermore, in the event that the facility's power is actually going to go out, there isn't any way for the machines to find this out and shut down cleanly.
Aren't these sorts of switches usually behind little glass things that say "BREAK IN CASE OF EMERGENCY" ?
I mean I'm sure it's a big red button of some sort like the one we've got in our server room, but man, that's the sorta thing that needs a video camera aimed at it.
Of course, if it was a malicious inside job, then there's not too much to do about it.
I understand the REASON for an easily accessible switch like this, but would it be possible just to wire it into the fire system or something and not have a switch that just screams touch me for a thrill ?
About a decade ago, we had a series of "incidents" with the EPO button in the software lab. Shortly after a serious lab upgrade (due to constantly blowing breakers), someone decided to test the EPO switch (it was a bit of a novelty at the time). *click* "Cool, it works. Hey, how do you reset this thing?" Turns out you needed to have a key to reset it. It took about 4 hours to find someone who had the key. That one got replaced with the Mark II resettable switch ...
... *click*
...
...
About a month later, one of the managers was giving a prospective new-hire a tour. He got to the software lab, and started blathering about "don't ever push the red switch" as he put his finger on the switch
So some einstein decided that the Big Red Switch was "dangerous" and put a plexi cover over it - the same kind that goes over the thermostat control, and the same kind that has a key lock. Yep, about six months later we had a gen-you-ine emergency. One of the HP 9000/300 monitors went crispy, and was snorting smoke and sparks. One of the software folks went to hit the Big Red Button, but was somewhat nonplussed to find a locking cover over it. She took the co-located fire bottle, sheared the cover off, pressed the button, then got to use said fire bottle on the monitor.
So the cover gets replaced again, though this time with a non-locking cover. At some point, the software server stack needed to be relocated into the corner with the Big Red Button. Another einstein discovered that it was inconvenient to slink behind the equipment rack - the cover kept bashing him in the neck or shoulder. So he removed it, thinking that accidental presses wouldn't happen because the button was obstructed by the server stack. (yep, inaccessible = useless.) Some time later, the equipment was being jockeyed for an upgrade, and one of the big SCSI cables snagged the Big Red Button and *click*
All these shenanigans happened in the space of one year, and I got tired of the thrash. I measured the space between the back of the switch and the faceplate - just over 3/4 inch. I cut a horseshoe shape out of 3/4 plywood, and hung it on the switch shaft. In an emergency, it's really easy (and obvious) to remove it. Gravity keeps it there otherwise. No problems since.
Maybe people will see this and realize the LJ staff are geeks, unlike most of their fanbase, so while you may be mocking their minions they can still bring down a server looking at a single article with the rest of us slashdotters.
I like muppets.
Way back when, I was working at an IBM site (STF) that had a boatload of mainframes and equipment on a raised floor area that was badge-access only. Every summer we'd get interns to learn the finer points of computer science by doing things like bursting printouts from the lineprinter and delivering them. Seems that the intern introductory tour had gotten a bit lax... One day a cleaning person knocked at the door to the raised floor to get let in to empty the wastebaskets. Nobody else was around, so one of the interns decided to let them in. Of course they pushed The Big Red Switch that was right next to the door. Oops. Whole floor went down...hard, about 10% of the stuff didn't come back up when the power was restored. Not fun...
They revised the introductory tour a bit, and added a label to the EPO switch.
(And no, it wasn't me who hit the button...)
Needless to say, we now have a cover over our Button. Funny thing is, the electrician who installed the original button is also the guy who leaned his ladder against it.
Go ahead and read up on how auto-negotiation works. I'll wait...
No, really. Go read up on it...
Okay, since you don't bother reading up on it, and since you claim that someone's cheeky because they *document* what happens when you misconfigure a connection, I must conclude that you, sir, are indeed an idiot.
(To summarize for those of you who won't bother to look it up, a NIC can sense the carrier for 100, so it can differentiate 10/100. Full and half are actively negotiated by the two sides of the connection. If side 'A' is hard set to 100/full, it won't negotiate with the other side. Hearing no negotiation, side 'B' will assume the NIC doesn't support full duplex connections and fall back to half duplex. This is the proper, standardized, documented behavior. Anything else would require the psychic interface spec that *still* hasn't been finalized.)
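For anyone who wants the summary above in executable form, here is a rough sketch of the fallback rule as described in this comment. It is a toy model, not a driver: the `Port` class and both function names are made up for illustration, and it assumes both ends are capable of full duplex when they do negotiate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Port:
    """Hypothetical model of one end of an Ethernet link."""
    autoneg: bool          # True if the port advertises and negotiates
    duplex: str = "full"   # forced duplex, only used when autoneg is off


def resolved_duplex(local: Port, peer: Port) -> str:
    """Return the duplex the local port ends up running."""
    if not local.autoneg:
        # Hard-set port: it uses its configured value and stays silent.
        return local.duplex
    if peer.autoneg:
        # Both sides negotiate; assume both support full duplex.
        return "full"
    # Peer is silent, so the local side cannot learn its duplex
    # and falls back to half.
    return "half"


def link_state(a: Port, b: Port) -> Tuple[str, str, bool]:
    """Return (duplex of a, duplex of b, whether they mismatch)."""
    da, db = resolved_duplex(a, b), resolved_duplex(b, a)
    return da, db, da != db


if __name__ == "__main__":
    switch = Port(autoneg=False, duplex="full")  # forced 100/full
    server = Port(autoneg=True)                  # left on auto
    print(link_state(switch, server))
```

The forced/auto pairing is the only combination in this model that ends in a mismatch, which is why the usual advice is to configure both ends the same way, whichever way you pick.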
This reminds me of the time when I had a server that would not reboot because there wasn't a keyboard plugged in, and I did not change the setting in the BIOS.
Brian.
So make a little black button and know where it is, but also make a big red one that turns off the lights. That way you get to yell at little kids without much harm to your system.
This post written under Gentoo-linux with an SCO IP license.
Plain and simple. People notice a "historical post" and they want to have their LJ face right up there in it.
Total kissasses. I wonder how many of them are paid members vs free accounts.
Remember, the overwhelming majority of Livejournal users are *NOT* paying customers...
Account Types
What type of account do people have?
* Free Account: 5713743 (98.3%)
* Early Adopter: 14220 (0.2%)
* Paid Account: 94857 (1.6%)
* Permanent Account: 1632 (0.0%)
They're required by law to have it. It's a building code thing. Every data center I've ever been in has one.
Also.. "EPO, by the way, stands for Emergency Power Off and it's a national fire/electrical requirement for firefighters to be able to press these big red buttons near all exits that turn off all power in the entire data center."
...when you buy crappy kit. Next time do it right.
Knowledge is power. Knowledge shared is power multiplied.
A couple of years ago, when our server room was being 'certified', one of the specific checks was "No, big red button, check". One of the guys in the group came up with a story about how someone's kid at the end of a 'tour' thought that the 'big red button' was meant to be pushed.
The force that blew the Big Bang continues to accelerate.
I most often see autoneg problems with faulty cabling (split pairs from crimps). 98% of newbies cannot get it right, and they aren't to blame because the standards are counter-intuitive unless you've worked for Ma Bell for 40+ years. I beware of all field crimps.
OTOH, I saw one example of a Crisco Crapalyst router not wanting to play with some devices. Of course they blamed the device, but I never had any problem with interconnects or using cheap @$$ switches, so I wonder why the expensive @$$ switch gets huffy.
Nonsense. I had my server up for 360 days without rebooting, with kernel 2.4. It had 360 days on the uptime counter. I only shut it down because it was too slow for the newer stuff I wanted to run.
There is a nasty bug in Linux that makes the computer reboot every 49.7 days. The worst part is that this bug has been around for almost 10 years...
WTF???!!! Only if your a cretinized asshandler like yourself. My last big uptime on Linux was 131 days and that ended because of a hard drive failure. You lie like a fucking $2 whore for SCO because there is NO bug that makes computers reboot every 49.7 days. Take my advice, back away from the computer and go back to sucking on Darl's limp cocktail wiener.
What good is a million eyes looking at the code if they are attached to half a million idiots?
Those million eyes have more intelligence than an infinite number of copies of you. If we took billions of copies of you and stuck them in a room with typewriters, we might wind up with a collective "ungh". Stupid pudthumping moron! Quit jacking off over pictures of your mom and realize that you are worthless and a complete and utter failure as a troll.
I guess most people don't realize this because they need to recompile their kernel every other week, or they use Linux only to boot into illegal copies of Windows.
N. You're wrong again you chocopipe loving jackhole. Most people running Linux are too busy having a real life (ie. screwing our wives/girlfriends, hanging out at our favorite bars/clubs. or even watching a game on TV) instead of patching Windows worms like all SCO lovers are apt to be doing. Because in the end, you don't hate Linux, you don't love SCO, you love the cocks of Bill Gates and Steve Ballmer. Congrats because you've just set the bar even lower than it's ever been before. Good job on making the world that much more retarded with your stupid troll. Good work you assmunching turd goblin.
Well, you get what you pay for. Is this SCO Linux that you talk about any better for $699?
Proof that you are the most unoriginal troll to ever hit the pages of Slashdot. Hey kid, why don't you spend some time learning about being a fucking adult and get some manners. Then maybe, just maybe, you can learn how to troll from the masters. Get real with yourself fuckass. You don't know what the hell you're talking about. You are a complete failure. You are ugly and unloved and a miserable little nutsack. Fuck you. Go to hell. Have a horrible day. Screw off jackass. Go do me a favor and jam a hot soldering iron up your diseased ass. Bye bitch.
(a) Manager that pushed the "off" button gets promoted.
(b) Engineers that spent their weekends getting the system back up: off to India with your jobs!
"Awww, I don't know why we even have a jug!"
This space intentionally left (almost) blank.
I'm waiting for the day that machines come built such that when the power dies, an emergency battery kicks in just long enough to dump the RAM state to a nonvolatile cache, and then when power resumes, restore the system from there. Like VirtualPC.
Heck, having that be a user-accessible feature supported by the OS ("Save and Shutdown") would make a lot of sense too.
-Forrest Cameranesi, Geek of all Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
Lazy writes allow for faster system operation and have only one detrimental downside: in a poweroff or unexpected reset the data waiting to be written won't be. As bad as that sounds, the performance gains during normal system operations usually overcome fears of this data loss potential.
It boils down to this: if every bit of data is crucial, disable write cache. If performance is paramount and some tolerance exists for infrequent data loss due to catastrophic failures, enable it. LiveJournal evidently wanted your normal experience to be pleasantly quick rather than painfully accurate.
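The same trade-off shows up one layer higher, in application code. A minimal sketch, assuming a plain append-only log file (the function name and file layout are made up, and this is not how LiveJournal or MySQL actually write data):

```python
import os


def append_record(path: str, data: bytes, durable: bool) -> None:
    """Append data to path, optionally forcing it out to the device.

    durable=False: fast, but the bytes may sit in OS buffers and can be
    lost on a sudden power cut.
    durable=True: slower, because we wait for the OS to push the data
    to the storage device before returning.
    """
    with open(path, "ab") as f:
        f.write(data)
        if durable:
            f.flush()              # drain Python's userspace buffer
            os.fsync(f.fileno())   # ask the OS to flush to the device
```

Note the caveat raised elsewhere in this thread: `os.fsync` only covers the layers the OS controls. If the drive acknowledges writes that are still sitting in its own volatile cache, even the durable path can lose data when the power goes.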
-- @rjamestaylor on Ello
I assume that they will have the responsible luser pay for the down time plus the 2 weeks credit plus the extra hours for the staff to bring the system up.
And what the hell was a visitor doing playing with the Big Red Button anyways?
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
If it looks like this, don't push it!
Become a FSF associate member before the low #s are used
Huh? I think the post you replied to was doing the trolling. It was much more effective because it was much longer and more biting than the parent post. If anyone was trolled it was you because you responded. I'm guessing you don't understand how trolling works. Based on your user ID number, I'm guessing you're new here. I've been here since the 5000s. The key to master trolling is to make your post incredibly long, use a lot of profanity and ad hominems, and appear to be extremely upset. Then watch as the losers come in and try to get you back for what you said. I've been trolling on Slashdot myself for the past few years just to relieve the monotony. Trust me. You are the loser and the post you replied to is the success.
Apparently this photo is an example of the button that was "accidentally" pressed.
I'd love to hear the explanation for this "accident".
MySQL: Rebuilding indexes that automatically corrupt on power interruption
Lying disk controllers: corrupted data on power interruption
Try using decent hardware and a decent database (PostgreSQL)
Some LJer showing her boobs accidentally hit the EPO switch. "Hey, look at me!! OOOPS"
there's a big picture livejournal seems to be missing - it's great they built all this redundancy into their systems INSIDE THE SAME DATACENTER.
they should also be investigating setting up either a disaster recovery site w/ fast failover, or another facility for an active/active configuration. throw in some 3DNSes for WIPs, etc etc. that might help guard against a facility experiencing the stupidity of one of their customers, act of God, etc.
I do not deploy Linux. Ever.
I found your picture on your first excursion out of the house, you sack of semen-laced shit:
http://www.critical.ch/src/linux_nylug_booth.jpg
You sure have a nasty mouth, did you learn to speak like that when your dad was fucking your ass as a kid?
Linux is shit and unstable and you know it, you disgusting scat-eater. But you run it because you love getting that nice monitor tan while endlessly patching and reading useless discussions of how Linux is going to make it one day.
Come out of that shit hole you live in, poopy-dicked fag-bagger.
yeah. i had a linux server up for 72 hours once, i know 'cause i stayed up for three days watching xclock. i only brought the server down because i wanted to move my mouse
Why don't you say that to my face you limp dicked bitch? I'll tell you why. Because you know if you did, your teeth and glasses would wind up in your asshole.
I don't have to "read useless discussions" about Linux making it one day. It's already made it for me for the past ten years you stumphumping steaming pile. It's a good thing that we're both posting AC or I would hunt you down since I'm pretty sure your'a pencil-necked geek or some fat Winblows luser in your mother's basement. Come back when your voice stops cracking and you grow a set.
This happened to us last year in our datacenter.
The Facilities manager had some guys in to install shelving to store toner, cables, etc.
Our datacenter is divided into two sections, inner and outer. All CPUs, UPSs, HVAC, etc are in the inner room. The outer room is shelving, desks, CCTV (security), etc.
The EPOs are near every door, as they should be, including the outer doors. Some guy, while installing the shelves, decided to take a little break and lean against the wall, leaning on the EPO in the process.
It took us about 10 minutes to figure out what the hell happened, because even the generator didn't fire as it should. Meanwhile, the shelving guys were just merrily installing shelves. When asked, the guy just said he didn't realize anything was wrong and just thought it was nice that everything "got so quiet" all of a sudden.
Like LiveJournal, we promptly installed cages over the EPO buttons.
Ahh, I always wondered how one trolled. I feel better now.
You sure hit the nail on the head son. I am glad that you recognize that I am without bias, opinion or a tendency to propagandize for my side. My reporting is beyond reproach and I cannot even fathom how someone could insinuate that things might be to the contrary...
Click here or a puppy gets stomped!
When will people stop using this POS for production environments? Do you drive to work in your kid's toy car just because it's cheaper? No. You get the best car you can afford. Do you use FAT32 for your production servers? No. You use reiser or ffs+softupdate.
So - if they'd spent the extra 10 minutes it takes to learn how to program a real database, they'd have come right back up with maybe 5 min of transactions needing to be replayed.
Sitting Walrus Blog
Is why do you have an emergency shut down for a bunch of journals? Dear God Jim! The hax0rz have gotten the journals! Shut them down, now!
> OTOH, I saw one example of a Crisco Crapalyst
We always had problems with auto negotiation and the Crapalyst. It wasn't wiring or the workstation either. Whenever there was a performance problem it was almost always in the switch.
The new guy's first day was also his last.
Simple solution to this one. At work we don't have a kill button. We have a kill key. It takes a little bit more work to "insert key" and "turn", but it's better than having incidents like this wherein somebody hits the big red button.
Plus, you can give the key only to people that aren't idiots. With the big red button, you'll inevitably get somebody who thinks "hmm, wonder what would happen if I pushed this big red button duhhhhhh."
Anyhow; I have seen EPO activations ranging from the malicious to a simple slam of the door and never once has it saved a life. So what if a monitor smokes?
Until then: Place the redundant part of your system in a separate room, building, or country.
Bigger PSU capacitors = a machine less likely to crash or shut down during a brownout. I mean, after all, their job is to buffer power fluctuations. I doubt it had much to do with the OS.
Or maybe I've just been reading too many episodes of BOFH lately.
The second was a guy who was on his first day of work with us. A Big Boss came towards the machine room, so - feeling helpful - the new guy opens the door for him... or so he thought.
My favourite story (though I wasn't there) is about some old DEC machines, which apparently had the power switch about 6" from the floor. Nobody knew why they kept crashing at night, until someone spotted a cleaner ramming a vacuum cleaner right up to the servers.
That beats the one we had, when I used to do a lot of soak-testing of machines in a lab - I'd kick off a test on a Friday night, come back in on Monday to find the machine had rebooted. Nothing in the logs, just looked like the power had died, and returned again half an hour later. Other machines on the same power supply were fine.
It turned out that the cleaners were unplugging the servers, so they could plug in the vacuum cleaner!
Author, Shell Scripting : Expert Re
Power went out in the server room in the middle of the day. No one could figure out why for like 30 minutes. The breakers were fine. Finally the electrician (!!!) traced the outage to the red emergency switch located kind of out-of-the-way. The switch had never been used so no one suspected it.
One of the guys in the IT dept. is so incompetent his mere presence warps the space-time continuum. That's a topic for some other time. Anyway, while we were all discussing the incident, he kept referring to the Red Switch(tm) as the halon fire extinguisher switch. The electrician had intervened like 4 times to correct him and explain to him that the switch was the electrical kill switch. But he still kept saying "halon". That was weird - I thought.
Later that day, the same guy made the boss go out and buy him a fire extinguisher for his desk because he thought we had a fire earlier and he wanted to be prepared. He made it an HR issue. Which was also weird.
There was no fire.
No one had ever been able to prove it. But from the looks of it, the incompetent dude had thought that there was a fire, and had hit the emergency electrical kill switch because he thought it controlled the halon system. And then he either did not make a connection between his actions and the power going out, or he decided to cover it up and not tell anyone that he hit the switch. He never confessed.
He still works here...
As another troll (who prefers to go light on the profanity in order to garner mod ups and unleash my crap onto the +1 threshold) the goal in trolling is to create chaos while having fun. No matter how long and stupid the argument gets, as long as you're getting off, you're winning. This is why most trolls are perverts. Post pix pls.
If they used PostgreSQL they wouldn't have had to deal with rebuilding indexes, etc.
There are real-world reasons to use an ACID compliant database!
Just because it CAN be done, doesn't mean it should!
It is a commonly known fact that cisco autoneg sucks ass.
Somebody's been hiring Stooges to guard that button. Bunch of lousy idiots.
yes, that's how it works. I used to have a computer which I could turn off for a quarter of a second without causing it to reboot. As you might suspect, I discovered this behavior by accident.
On a related note, a brownout isn't desirable and can cause a situation which is commonly called a loss of power. I really don't understand why some people here don't see the difference between powering off and an unintentional drop in voltage.
Since it's not exceptional to have brownouts (some elevators cause them btw) there are standards for PSUs on how much they can take before they can't supply anymore. Good computer magazines simulate brownouts when they test PSUs and the cheap brands usually fail miserably.
That's why GP's link is so funny after all - even the best OS in the world will fail if the motherboard, CPU or other peripherals don't have any power.
I don't read replies by ACs.
They need a Molly Guard
"Everything is adjustable, provided you have the right tools"
It's just a bunch of stupid furries that whine to eachother how nobody understands them, how sexy balto is, and pictures of them having anal sex with eachother in their lame assed fur suits. eff em.
I noticed one thing conspicuously absent from their list of "Things we're doing to avoid this crap in the future..." That item is:
"Put a big sign next to the EPO button saying 'Do NOT Press This Button, it cuts off power to the entire building, it is not a light switch nor a door switch. Push this button only if your life is in danger. If your life is not in danger and you push this button, your life WILL be in danger.'"
Sorry folks, but their IT folks are stupid. Just about every major vendor and halfway decent IT/OS/network idiot out there should know.
NEVER LET YOUR SERVERS AUTO-NEGOTIATE !!!!!!
Set the switch ports to a specific speed and duplex, and force your server NICs to the same settings.
AUTO-NEGOTIATE SHOULD ALWAYS BE OFF !!!!!!!!!!!
Not to be an a##hole, but come on... ?
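For anyone wanting to try the fixed-speed approach on a Linux box, here is a minimal sketch, assuming the `ethtool` utility is available. The helper name `fixed_link_cmd`, the interface name `eth0`, and the 100 Mb/s full-duplex values are illustrative assumptions, not anything from the LiveJournal write-up. It only builds the command; it does not run it.

```python
def fixed_link_cmd(iface, speed_mbps, duplex="full"):
    """Build (but do not run) an ethtool command that turns off
    auto-negotiation and pins the link speed and duplex.

    iface, speed_mbps and duplex are hypothetical example inputs.
    """
    if duplex not in ("half", "full"):
        raise ValueError("duplex must be 'half' or 'full'")
    return ["ethtool", "-s", iface,
            "speed", str(speed_mbps),
            "duplex", duplex,
            "autoneg", "off"]


# Show the command a sysadmin would run as root on the server.
print(" ".join(fixed_link_cmd("eth0", 100)))
```

The switch port has to be set to the same speed and duplex by hand, otherwise the two ends can disagree on duplex.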
"Why can't the EPO button perform in the same manner as a door release for an emergency exit..."
Emergency Power Off (EPO) switches are primarily a safety feature. If some person is being electrocuted, you hit the switch and the power dies so the person doesn't. You don't have time to wait in a situation like that. A person's life is considered more valuable than LiveJournal, which despite the name, isn't actually alive. (Insert comment about angst-ridden teen-age girls here.)
See also:
http://catb.org/~esr/jargon/html/S/scram-switch.h
http://catb.org/~esr/jargon/html/B/Big-Red-Switch
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
"If you haven't tested your emergency plan
recently, then it doesn't work anymore."
Jerry Saltzer oversaw MIT's first campus-wide
network. Saltzer's Law is the voice of experience.
You ran a Windows data center, right?
You see, there are some other operating systems that don't need to be rebooted every day, or every week.
Although, on the other hand, hard drives can die but keep running until the computer is shut off. When my company moved our data center to a new location, we had at least 3 or 4 servers (out of maybe 50 or 100 total) that had been running for well over a year. They were transported carefully, but the hard drives never spun up again.
Not that this was a really big problem - all we had to do is restore the machine from nightly backups onto a spare hard drive. Still had plenty of time left in the downtime window when the servers got booted back up.
So yeah - servers should be rebooted at some point (probably just when services are added to verify they start on boot) - but no way should they be rebooted every single weekend.
duh. You didn't read both emails in the link as I and most others did I suspect.
So, I double-clicked the button as fast as I could. No problem! Everything stayed up.
I have seen that a few times since then, where the good-quality computers have survived momentary power outages and the crummy ones haven't. Just another reason to buy quality hardware...
Linux IT Consulting and Domino Development in Michigan
"Please do not press this button again!"
Vista:XPSP2::ME:98SE
Mod parent up.
The BSD box just had bigger caps for its PSU size, or just better quality capacitors.
I had the same thing happen when my PC rebooted but my old PowerMac rode through a brownout.
The Mac's PSU was physically 2x the size of the ATX one in the PC, i.e. far bigger caps and heat sinks.
Dear Mr. Rotund. How does one "roack"? Is it a sound? An action? Is it a new dance? Please elaborate on this stupidity. kthnx
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
Now we're getting somewhere Mr. Rotund (implication that you are a fat lazy slob). I see that you must be using Windows 3.1 to operate your brain. That would explain the latency in your response. Six days. Not bad Mr. Rotund. That 16-bit single tasking brain of yours can work a little, even if it's wayyyy late. LOL!!!!111!!!! OMFG!!!11111!!!!!! I made a funny.
Bleh.
I don't know about "Rotund Prickpull", how about "Rotund Pillock"?
No. I think not. I *HAVE* a life in "meatspace" as you call it (I'm no geek. I'm an artist who happens to use computers). If I didn't have one, I'd be trolling like you all the time. You definitely don't have a life Rotund Bastard.
How about "too unimaginative to invent a name", anonymous cretin?
michael? is that you?
Whheeee! Fun with trolling the trolls. Your ignorance is quite entertaining Mr. Penis Pudgepack.