Blow-by-Blow Account of the OSDN Outage

Posted by Roblimo on Wednesday June 27, 2001 @02:36AM from the warts-and-all dept.

The first hint that all was not well came at about 2 a.m. on Saturday, US eastern time, in the form of slow-loading pages. By 7 a.m. it was obvious that this was not a typical, easily-fixed, reboot-the-database problem. The network operations people were paged, but did not respond. Uh-oh.

Our network operations staff was shorthanded; one of our most knowledgeable people had quit recently to go into business with a friend and had not yet been replaced. Another was in the hospital, ill and unreachable. A third's cell phone was on the kitchen counter, unhearable from the bedroom, and the fourth one's cell phone battery had fallen out. It was a frustrating comedy of errors, and an unusual one. Our netops staff is typically "on the bounce" 24/7.

Dave Olszewski, an OSDN programmer who is not technically part of our netops staff and is not trained in our equipment setup, happened to be on IRC at the time. He doesn't live far from the Exodus facility in Waltham, MA, where our server cage lives, so he went there immediately. Kurt Gray, lead programmer, who we dragged out of bed, was not far behind. Hemos and others were awake by then, growing frantic as we found that not only Slashdot, but also NewsForge, freshmeat, OSDN.com, ThinkGeek, and QuestionExchange were down, along with our old -- but still popular -- MediaBuilder and AnimationFactory sites. Arrgh!

This is Kurt's "on the scene" report from Exodus:

Walk into our cage at Exodus and it seems harmless enough but try to learn what everything is doing and where the wires are all going in less than an hour and you could go insane. You're standing in a nice, clean, uncomfortably air-conditioned facility with 150 of VA's FullOn and various other servers humming away. Greeting you at the door is "Big Gay Al" our Cisco 6509, which contains two redundant router modules: Kyle and Stan. If Stan dies, Kyle takes over and vice-versa. Across the cage are two Arrowpoint CS800 load balancing switches: one is racked and idle (as a hot spare) and the other is live and balancing the load for most of our OSDN web sites. Between the Cisco 6509 and the Arrowpoint is a bridging FreeBSD firewall using ipfw rules to block stuff like ping just to drive everyone nuts basically.
"I can't ping your site!"
"Yeah, we know."
Just to make things interesting we've added ports to the 6509 by cascading to a Foundry Fast Iron II and also a Cisco 3500. We've got piles of printouts and documentation of all sorts, drawings and spreadsheets, helping us keep track of every IP and machine in this cage, yet it doesn't seem to get any clearer unless you've either built it yourself (only one person who did still works here and wasn't available this weekend) or if you've had the joyful opportunity of spending a night trying to trace through it all under pressure of knowing that the minutes of downtime are piling up and the answer is not jumping out at you.
At this point if you know anything about networking you'll demand an explanation for why we're using each piece of equipment in the cage and not a WhizBang 9000 SuperRouter like the one you've been using flawlessly that even washes your dishes for you and makes food taste better too... I can only tell you that I'm not the networking design person here, I didn't choose this equipment or configure it but I'm told it's very good hardware as long as you know what you're doing, but as CowboyNeal once said, "You can take everything I know about Cisco, put it in a thimble and throw it away."
So Dave takes a look, can't ping the gateway, can't ping anything. Reboot the firewall. Didn't help. Still can't ping outside. OK, reboot the Arrowpoint. No difference. Hold your wallet... reboot the 6509... rebooting... rebooting... no difference. This is not good.
"Did you reboot the firewall?" I asked Dave.
"I rebooted everything," he said. "I think it's the Cisco."
So we console into the Cisco 6509. What a mess. Neither of us understands how this switch was configured and what it is trying to do. We don't fully understand why you can get a console connection on Stan but not Kyle (turns out the standby module doesn't have active console, that's normal).

Headshaking all around. Meanwhile, about 11:40 a.m. Yazz Atlas woke up and got his cell phone reunited with its battery. He picked up his voice mail messages, tossed on clothes, and hustled over to Exodus.

Yazz says, "When I arrived at Exodus, Kurt and Dave were trying every combination of things to do to get the 6509 back. But neither they nor I even knew the Cisco Passwords." The op who was supposed to be on duty (the one whose phone was out of hearing) was still nowhere to be found. They called their hospitalized coworker and got the Cisco passwords.

But, says Yazz, "Since the Cisco was rebooted there were no logs to look at. We could ping something on the inside but not everything. On some VLANs we could ping the gateway and others not. The outside world could ping one of the IPs the 6509 handles but not the other. From the inside we could not ping the IP that the outside world could ping. We could ping the one that they couldn't...very frustrating..."
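
Results like these are easier to reason about once they are written down as a table. What follows is only an illustrative sketch in Python: the labels `ip_a` and `ip_b` and the helper `split_brain_targets` are invented here (the story does not give the real addresses), and the four results are simply the ones Yazz describes.

```python
# Hypothetical record of the ping results described above.
# "ip_a" and "ip_b" stand in for the two addresses the 6509 handles;
# the real addresses are not given in the story.
results = {
    ("outside", "ip_a"): True,
    ("outside", "ip_b"): False,
    ("inside", "ip_a"): False,
    ("inside", "ip_b"): True,
}


def split_brain_targets(results):
    """Return targets that answer some vantage points but not others."""
    seen = {}
    for (vantage, target), ok in results.items():
        seen.setdefault(target, set()).add(ok)
    return sorted(t for t, outcomes in seen.items() if outcomes == {True, False})


for target in split_brain_targets(results):
    print(f"{target}: reachable from some vantage points but not others")
```

A target that answers from one side and not the other points at routing or VLAN state rather than a dead host, which fits what turned up later.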

Kurt again:

Several hours of this sort of network debugging went on until 3:00 AM Sunday. By then we had called Cisco for help. They couldn't help us until they saw the switch config and got a chance to review it. We were spent. We had to go to bed and stay down for the night.
Next morning we're back at Exodus and the situation hasn't changed -- our network is unreachable to the outside world. I was hoping that during the wee hours of the morning the Cisco 6509 had become sentient and fixed its own configuration or perhaps a friendly hacker had cracked into it and fixed it for us, or perhaps ball lightning would travel down a drain spout and shock our cage back to life like those heart paddles paramedics use... "It's a miracle!" No such luck.
So I called Cisco tech support. I wish I had done this sooner. I was amazed first of all by how you can talk to a qualified Cisco tech immediately... we're talking an 800 number that you dial and within less than a minute you are talking to a technician... doesn't Cisco realize how shocking this is to technical people, to actually be able to talk to qualified technicians immediately who say things other than, "Well, it works on my computer here..."? Do they not know that tech support phone numbers are supposed to be 900 numbers that require you to enter your personal information and product license number, then forward you to unthinking robots who put you on hold for hours, then drop your call to the Los Angeles Bus Authority switchboard... does Cisco not understand that if you do not put people on hold for at least 10 minutes they might pass out in shock for being able to talk to a human too soon? Apparently not.
So I asked the Cisco technician, Scott, to telnet into our switch and take a look at the config. I figured he'd balk and say, "No I can't do that," because of course this is a tech support number I called so he's going to tell me to give the phone to my mommy if she's there and ask her to log into the switch because, since I don't have a lot of experience with IOS, I must be some kind of idiot to even call tech support without knowing what my HSRP configuration is on VLAN 4. Instead he says, "OK, what's the login password?" I can't believe this... I must have dialed the wrong number, he's not going to just go into our switch and sort this out for me right here and now, is he?
So he's in the switch and he's disgusted and horrified by how we have it configured, and I'm sure he's right. So I ask him, "Well, can you change all that?" I figure he'd say, "No, this is your equipment, you fix it yourself," but he doesn't, he says, "Sure, what's the config password?" You gotta be kidding me, I must have dialed the wrong number here... this cannot be a tech support line... you can't actually get a tech support rep on a toll-free number simply to log in and fix your router setup while you whine at him on the phone... this is not real.
So he's in the switch config and he's having a great time pointing out everything some of our people warned us about months ago. He tells me this is wrong, we shouldn't be doing this or that... "Well, then change it if you don't mind," I tell him. "Switch broke. Me dumb. You fix." ...so at one moment Scott wanted to undo some changes. He bounces the switch... copy startup-config running-config ... the switch resets itself... then email starts streaming into my inbox... then I can ping our sites all of a sudden... we're back online! Everything is back! Weird.
Ok, that's all fine, but Scott is still freaked out about how we have the switch configured. Soon I get a call from Barnaby, another hot shot Cisco tech rep. He just logged into our switch and he's horrified too. He wants to walk me through a total switch upgrade and cleanup right now. "Not tonight", I tell him, "I'm burnt and I need to consult some network people over here before we mess with this any further."

The next day, Monday, Kurt talked to Exodus network engineers and asked them why our uplink settings were so confusing to Cisco engineers. Instead of getting an answer from Exodus and running to Cisco with it, and then back again, he got Cisco and Exodus engineers to talk directly to each other and work it out. He conferenced an Exodus network engineer to Barnaby at Cisco and, Kurt says, "they talked alien code about VLANs, standby IPs, HSRP, multihoming, etc. etc., and they came to an agreement: our switch config was a mess... but at least Barnaby knew what the settings were supposed to be and an Exodus engineer agreed with him."

Before moving on to the (short) Tuesday outage, here are a few more notes from Yazz:

The one card going bad wouldn't have been such a big deal if the config in both were set up correctly. It was meant to flop to the other interface if the primary card died, which it did, but not with all the info it needed... AKA it was misconfigured...
Exodus really wasn't set up to handle the type of failover the 6509 was meant to do. That's what the Cisco folks said basically, and the Exodus people are no longer supporting this type of Cisco in their setups. Half the VLANs were only stored on one unit and the other half of them on the other. So when one died it only knew half of the full setup and couldn't route things correctly since the VLANs it wanted weren't there... Fun!!!
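
The failure Yazz describes, with each redundant module holding only part of the VLAN list, is the kind of drift a simple comparison would show. The Python sketch below is a hypothetical illustration, not how the 6509 stores or reports its configuration: the function `vlan_gaps` and the VLAN numbers are invented, and it assumes the VLAN IDs known to each module have already been collected by some other means.

```python
def vlan_gaps(primary_vlans, standby_vlans):
    """Return (missing_on_primary, missing_on_standby) as sorted lists."""
    primary, standby = set(primary_vlans), set(standby_vlans)
    return sorted(standby - primary), sorted(primary - standby)


# Made-up VLAN IDs: each module knows only half of the full set,
# which is the situation described above.
stan = [1, 2, 3, 4]
kyle = [5, 6, 7, 8]

missing_on_stan, missing_on_kyle = vlan_gaps(stan, kyle)
print("missing on Stan:", missing_on_stan)
print("missing on Kyle:", missing_on_kyle)
```

If both lists come back empty, either module has what it needs to take over; anything else means a failover would leave some VLANs unrouted.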

Tuesday was router reconfig day. It was originally only supposed to cause "about five minutes" of downtime, so it didn't seem worth posting any kind of notice that it was going to happen. Why the middle of the day instead of a low-traffic post-midnight time? Because this way, if there was any trouble lots of people at Exodus and Cisco would be awake and around to help. And it was a good thing this choice was made. Kurt picks up the story:

Tuesday 11:00 a.m. we're back in the cage. Barnaby is logged into our switch while he's talking to me on my cell phone (which disconnects every 5 minutes just to make my day more challenging), helping us by upgrading the Cisco 6509 firmware, then he's going to clean up the config. First step was getting the firmware patches onto a TFTP server near the switch (had to be less than 3 hops from the switch, TFTP doesn't work over longer hops). Yazz took care of that. From there Barnaby patched the firmware, had me reboot the switch, and we should be down for just 5 minutes. Unfortunately 5 minutes turned into 2 hours.
After the switch reboot part of our network was unreachable again, much like Saturday's episode only this time with a Cisco rep on the phone helping us work it out. Again we started tracing cables all over the cage, pinging every corner of the matrix. Barnaby got an Arrowpoint tech rep, Jim, on the line and into our Arrowpoint. But this is tech support, Jim isn't just going to log into our Arrowpoint and debug it for us, right? Wrong, this is Cisco tech support: Jim logs into our Arrowpoint and works with Barnaby to trace packets and debug our network.
For a while we put a cross-over cable in place of the firewall just to be sure the firewall box wasn't jamming us. Nope. Didn't help. Barnaby and Jim are mapping hardware addresses to IP addresses to figure out where each packet is going. Finally Yazz and I are staring at this other switch cascading off of the 6509, this little out-of-the-way Cisco 3500 just sitting there... is this thing connected? We look at the link light leading it to the 6509. It's dark. "Uh Barnaby... can you check port 1 on module 2?"
"Hold on," he says over the phone to me. Then the light goes green, and after a few seconds of routers correcting their spantrees we're back online. Everything is back online. All this time it was this little interface to an ignored switch that none of us bothered to account for. Make a big note about it in the network documentation, please.
After we came back online Barnaby went ahead and cleaned up our switch configuration, put things the way they ought to be, made our connections sane and stable.

This has not been OSDN's finest week. But we thought it was better to give you the full rundown than try to pretend we're perfect. At least we've learned a lot from the experience -- like to call for help from specialists right away instead of trying to gut things out, and just how valuable good tech support can be. If nothing else, perhaps this story can help others avoid some of the mistakes we made. We certainly aren't going to make the same ones again! (~.*)

389 comments

Roblimo: Your story and Rob's are inconsistent by Anonymous Coward · 2001-06-26 22:04 · Score: 1

Please explain, if you can, why CmdrTaco wrote "we discovered that she wasn't actuually as qualified as we had hoped."

It is obvious that he tried to cover up this unfortunate choice of words.

What is not so obvious is why you have written a story which causes his earlier account to make absolutely no sense. If "she" was the tech who had quit earlier, what on earth did her qualification have to do with the outage?
1. Re:Roblimo: Your story and Rob's are inconsistent by lcypher · 2001-06-27 02:49 · Score: 1
  
  I haven't been around Slashdot for that long so I don't know a lot of the nuances involved in moderation, yet I am confused as to why the post above was moderated as "Offtopic"?
I told you... by Anonymous Coward · 2001-06-26 22:40 · Score: 1

..not to throw pennies into the router.
Strange by Anonymous Coward · 2001-06-26 23:23 · Score: 1

I thought that it was only Windows that needed rebooting.
Cisco ROCKS! by Anonymous Coward · 2001-06-27 00:45 · Score: 1

We have a $MEDIUM_NUMBER contract w/Cisco on our 6509 and 2900 switches, and Cisco ABSOLUTELY ROCKS! when it comes to support. I've had to pick myself up off the floor a couple of times while working with them... Cisco provides the best tech support on the planet, bar none!
Re:Cisco Support by Anonymous Coward · 2001-06-27 01:06 · Score: 1

Cisco's daytime support (from a US perspective) is handled in the US, while calling after hours gets you the UK. While the US guys are good, the UK guys are excellent.
Cisco TAC is fun! by Anonymous Coward · 2001-06-27 01:59 · Score: 1

I have only read a small portion of this, since I now have to go on shift in the Cisco TAC. :-) I sure have enjoyed the thread though... I've been with the TAC for nearly 7 years, first in Routing Protocols, and now a "newbie" in switching. (Boy, that's an odd feeling!) Most people burn out after a year or two, but I actually _enjoy_ making my customers happy as I fix their networks. I think that is what makes the difference here, most of us like our work and really _want_ to help. We have a lot of fun in our jobs here, laughing with each other and customers, and making the networks hum again... I came from SynOptics TAC, and there is a huge difference at Cisco. We can actually make a difference here, suggest ways to improve, and are empowered to make decisions for ourselves and customers, reach developers, etc. as a normal part of our duties. My hat is off to John Chambers for being so candid, easy to approach, humorous, and having a great sense of how to manage Cisco so well.
Re:Cisco Support by Anonymous Coward · 2001-06-27 02:24 · Score: 1

P1s get worked 24 hours until they're resolved, and if they're not fixed in less than 4 hours it's not so good for us.

The reason it is not so good for our TAC if a P1 isn't resolved in 4 hours is that a business rule fires somewhere and John gets a page. :)
And when John gets a page... people die. (Just kidding but I was watching Austin Powers the other day and couldn't resist throwing that in.)
Re:Anne Tomlinson? by Anonymous Coward · 2001-06-27 03:47 · Score: 1

But how did you know that your comment will be modded down at the time when you posted it?
Your post is violating the principle of causality!
Re:Eeep - scary moderators! by Anonymous Coward · 2001-06-27 08:32 · Score: 1

One thing I learned from my years in the computer industry:
Don't let women near the computers!!!
Re:Cisco Support by Anonymous Coward · 2001-07-05 13:55 · Score: 1

8M in cisco hardware? oh, they bought ONE 12000. Cisco is a horrifying rip-off.
Cisco couldn't buy advertising like this... by Anonymous Coward · 2001-06-26 23:17 · Score: 2

Well deserved though.
Re:Beware of departure from original statement by Anonymous Coward · 2001-06-27 01:22 · Score: 2

I think that it's called a lawsuit (or threat thereof). The original post could be considered libel, because if the person in question went somewhere else and had /. on her resume, she might have a really hard time convincing them that she's worth a darn. IANAL, but it's just my thoughts....
Re:Cisco Support by Anonymous Coward · 2001-06-27 00:06 · Score: 5

I have worked in the Cisco TAC for about 2.5 years. Currently on the routing protocols team (EIGRP, OSPF, BGP etc.) Prior to coming here I had never dealt with people this obsessed with getting everything right all the time. Really they drill it into you. Mandatory perpetual training and such.

Many people who call don't understand how the system works internally so here's a summary: We have cases in 4 groups, priorities 1 through 4, 1 being the most important. The designation of the priority of the case is entirely up to you as a customer. All cases are P3s by default which more or less means they need resolution within 72 hours. If your network is down and you need help right now, today with no waiting we'll elevate to a P2. If you are in a serious network situation like the one described in the article then it's a P1 and literally everything else stops, a bell goes off and everyone crowds around the tech w/ the problem (unless it's a softball case).

There are TACs all over the world but for English-speaking customers what usually happens is the US TACs roll over to the Australian TACs in the early evening who in turn roll over to Belgium and then back to the US. P1s get worked 24 hours until they're resolved, and if they're not fixed in less than 4 hours it's not so good for us.

We have to close about 5 of these cases a day which is sometimes cake (I can't ping my interface which is shut down) and sometimes nasty (redistribution 12 times over).

Also, those little surveys you get every time you work with us (Bingos) are very important. If you'll recall you can rate us from 1 through 5 in 8 to 10 different categories. Anyone who doesn't maintain an average of at least 4.59 is not long for the TAC, 2 or 3 months tops.
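
To give a feel for how little slack a 4.59 average leaves on a 1 to 5 scale, here is a small Python sketch. Everything except the 4.59 figure is assumed: the scores are invented, the helper `bingo_average` is hypothetical, and averaging every category score across all surveys is only one plausible reading of how the number is computed.

```python
from statistics import mean

THRESHOLD = 4.59  # figure quoted above


def bingo_average(surveys):
    """Average every category score across all surveys (assumed method)."""
    scores = [score for survey in surveys for score in survey]
    return mean(scores)


# Invented scores: three surveys, eight categories each, rated 1 to 5.
surveys = [
    [5, 5, 5, 5, 5, 5, 5, 5],
    [5, 4, 5, 5, 4, 5, 5, 5],
    [4, 4, 5, 4, 5, 4, 4, 5],
]
average = bingo_average(surveys)
print(f"average {average:.2f}, meets threshold: {average >= THRESHOLD}")
```

Under this reading, a tech who gets straight 4s on every survey is already below the line.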

The pay is actually kind of crap but there's no better place in the world to prep for your CCIE. I don't think anyone views the TAC as a long-term environment. Too much stress honestly.
Re:Why am I not surprised? by Hemos · 2001-06-27 00:00 · Score: 1

Have you looked at Bender, yet?
It ain't Robcode anymore. :)

--
Yeah, I'm that guy.
Re:Grr... by Hemos · 2001-06-27 00:40 · Score: 5

Very easy - someone typed in the wrong time. We deemed the shared source to be more important, and that was supposed to go up first.

Not everything is a conspiracy folks.

--
Yeah, I'm that guy.
Heh... by Wakko+Warner · 2001-06-26 21:48 · Score: 1

...apparently slashdot's NetOps staff doesn't bother to read Slashdot either, or they would've noticed it was down... :)

--
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
Re:cisco tech support are badasses by Roblimo · 2001-06-26 22:17 · Score: 4

No black eye for Exodus, please. Our router config was not a standard one they support. Exodus dude Derek Lam, especially, went way "above and beyond" this last week.

- Robin
Re:Cisco Support by rodgerd · 2001-06-27 04:56 · Score: 1

There is no such thing as a small support contract with Cisco - you basically pay the cost of the equipment again every two years as a minimum, IME.
Re:You know you've been using windows too long whe by Trepidity · 2001-06-27 07:46 · Score: 2

See I run a Linux router/firewall, and I do that too. 99% of the time it works. X fubared the display? No problem, reboot and it works again. ipfilter stopped forwarding NAT packets for some reason? No problem, restart it and it works again. etc.

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Re:You know you've been using windows too long whe by Trepidity · 2001-06-27 09:22 · Score: 2

Not anymore, because I've come to determine that X sucks, which is why it's now just a firewall/router box (headless). It used to be an attempt at a Linux workstation before I gave that up...

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Re:More Writeups Needed by Have+Blue · 2001-06-26 23:31 · Score: 2

I also agree... This is what "hacking" is really about, solving complex problems through ingenuity and diligence.
Re:Not the issue by shogun · 2001-06-27 18:48 · Score: 1

Yes you can, for example browse this article at -10 like this: http://slashdot.org/comments.pl?sid=01%2F06%2F27%2F124207&cid=&pid=0&startat=&threshold=-10&mode=thread&commentsort=3&op=Change
Re:Not the issue by shogun · 2001-07-02 13:05 · Score: 1

I'm not sure, but not under this article anyway...
For those of little needs... by gavinhall · 2001-06-27 03:56 · Score: 1

Posted by polar_bear:

This is on a different scale entirely, but I host several small-traffic Websites through PHPWebHosting.com, and they rock. Their support is only via email (hey, it's only $9.95 a month! to host w/them...) but they're very quick, efficient and cheerful. (Yes, cheerfulness can come through in email...)

They offer MySQL, PHP, Perl and (I think...) Python - though I haven't done any Python. A few weeks ago I asked if it would be possible to install a few libraries they didn't already have installed so that I could use Midgard - I expected a "maybe in 6 months" or "are you kidding?" but they were happy to help and said they'd install them ASAP - which turned out to be less than a week. I'm wholly pleased with their service - if you need to host a small site, they're awesome.

Sites I host with them:
http://www.zonkerbooks.net/

http://www.dissociatedpress.net/

http://www.linuxroutingbook.com/
Rob, why would you throw Slashdot into this mess? by emil · 2001-06-27 02:07 · Score: 2

You put our favorite news engine in the middle of a routing mess that the network engineers had been warning you about for months?

What were you thinking?

You must be able to find a nice, comfortable colocation site somewhere.
Soviet-style historical revisionism by Stormie · 2001-06-26 22:26 · Score: 1

..and when our qualified personel arrived, we discovered that she wasn't actuually as qualified as we had hoped. Then she quit..

..and then she was erased from the latest "official" version of the story. What the fuck is this? This isn't a "blow-by-blow account", it's a service pack to fix the "bugs" in your last account of what happened!

And we have always been at war with Eurasia.
1. Re:Soviet-style historical revisionism by bellings · 2001-06-26 22:55 · Score: 1
  
  Hmm... perhaps someone with unlimited moderator points modded you down. Perhaps it was someone who spends all day, every day, reading slashdot. Perhaps you were moderated down by someone who is trying to protect the reputation of OSDN in a heavy-handed, amateurish way.
  
  I personally can not imagine why you were moderated down, or who would have done so.
  
  --
  Slashdot is jumping the shark. I'm just driving the boat.
Re:Eeep - scary moderators! by Stormie · 2001-06-26 23:01 · Score: 2

..and when our qualified personel arrived, we discovered that she wasn't actuually as qualified as we had hoped. Then she quit..

..and then she was erased from the latest "official" version of the story. What the fuck is this? This isn't a "blow-by-blow account", it's a service pack to fix the "bugs" in your last account of what happened!

Go on! Mod me down to -1 again! You'll have to do it a few times before I go below the "post-at-2" threshold!!
Re:Eeep - scary moderators! by Kurt+Gray · 2001-06-26 23:31 · Score: 5

I'm not sure what's happening with moderation but since so many people want to know: One of our netops quit suddenly Sunday without any explanation; I assume she was put off by being called in on a weekend and being asked to stay late until it was fixed. I don't know, but these things happen so we deal with it. One thing you don't want to do is publicly flame someone who still has your root passwords (although I trust this particular person with our root still). Besides, we're not mad at her, wish her well, sorry things didn't work out.
Re:Correct. by On+Lawn · 2001-06-27 02:54 · Score: 1

Only #2 ever gets to see #1. What I want to know is, where is #6 in all this?

~^~~^~^^~~^
Within a minute ??? by Masem · 2001-06-26 22:12 · Score: 5

...I was amazed first of all by how you can talk to a qualified Cisco tech immediately... we're talking an 800 number that you dial and within less than a minute you are talking to a technician...Instead he says, "OK, what's the login password?" .... he says, "Sure, what's the config password?"...
Was anyone else waiting for the "*clickity-click* Wow, it looks like your entire root directory was deleted!" punchline? :-)

--
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
1. Re:Within a minute ??? by RAruler · 2001-06-27 02:19 · Score: 2
  
  Nah, if he was really the BOFH he would've changed the routing tables so that all requests to slashdot.org get changed to goatse.cx, and then told the guys that 'Doppler Static Effect' is their problem and that they need to demagnetize the electrical contacts with their tongues.
  
  --
  Insert Witty Sig Here
Re:Cisco Support by sql*kitten · 2001-06-27 20:19 · Score: 2

You must clearly have had no contact whatsoever with Oracle, or you must be working for them.
Oracle are a big company, and vary hugely in the support they give you. I've had situations where I've been given the runaround, like you. Getting passed from extension to extension, explaining my problem over and over again, "oh, umm, we don't do that stuff here, call this number..." and finding out that Bob's on holiday and his secretary has no idea who else I could speak to...
I've also had situations where Oracle have said our engineers aren't sleeping until this gets fixed, and a few hours later there's a motorcycle courier at my door with a gold disc containing a brand new build of Oracle with the bug fixed. I've had Oracle techs ssh into my servers, I've had them come to the data centre with mysterious CDs containing Oracle software that they don't let outsiders have, and that they erase from your machine once they're done.
Helps to have (or at least have access to) a high-end support contract, tho'. If you're some kid who downloaded 9i onto his Red Hat box, forget it.
Cisco by MeAtHereDotCom · 2001-06-27 06:08 · Score: 1

Cisco rocks for support. I can open a case online and have a call back within 20 minutes. I can say I've tried x, y, z, ask if they have any ideas, and say 'I think there is a hardware problem', and they go 'OK, what's the part number?' and have a part on my doorstep the next day. For what it's worth, if you ever do anything mission critical, or pseudo mission critical, with Cisco stuff, ask one of their SEs to look over your config. They'll do it for free if you are buying their hardware (and, I suppose, are paying enough). I know that the 6509 in this case was probably around $120k, so they probably would have done this for free, and it could have alleviated this problem in the first place. Then again, hindsight is always right. Then again, I've also had good luck with Sun support on hardware. I haven't used them for software that much, however.
Re:I just have a couple of questions by tzanger · 2001-06-27 11:33 · Score: 2

Uh... TFTP uses UDP, which is a connectionless protocol, you can of course transfer files over more hops, but keep in mind, the more routers, etc you have in the middle, the more chance of a packet being dropped, and one packet can mean quite a bit when you're transferring a new IOS image to your cisco ;)

Now it's been quite some time since I've looked at the TFTP RFC but I'm pretty damn sure it has the capability to request a block be retransmitted in the case of a timeout (packet loss). In fact, I'm sure of it; during the upgrade a few '.'s were noticed amongst a ton of '!'s and the checksum still worked out.
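
For anyone who wants to see the retransmission idea in miniature, here is a toy Python simulation. It is not a TFTP implementation and sends nothing over a network: the function `transfer`, the block payloads, and the set of dropped attempts are all invented for illustration. It only models the stop-and-wait pattern in which a block is sent again when no acknowledgement arrives in time, and it borrows the '.' and '!' progress marks mentioned above.

```python
def transfer(blocks, lost_attempts, max_retries=5):
    """Simulate stop-and-wait delivery with retransmission on timeout.

    blocks: list of payloads to send in order.
    lost_attempts: set of (block_number, attempt) pairs to drop (simulated loss).
    Returns (received_payloads, progress_string).
    """
    received, progress = [], []
    for number, payload in enumerate(blocks, start=1):
        for attempt in range(1, max_retries + 1):
            if (number, attempt) in lost_attempts:
                progress.append(".")   # no ACK before the timer expired
                continue               # retransmit the same block
            received.append(payload)   # receiver got the block and ACKed it
            progress.append("!")
            break
        else:
            raise TimeoutError(f"block {number} failed after {max_retries} tries")
    return received, "".join(progress)


image = [b"block-1", b"block-2", b"block-3", b"block-4"]
received, progress = transfer(image, lost_attempts={(2, 1), (4, 1), (4, 2)})
print(progress)
print("intact:", received == image)
```

Each dropped attempt costs a timeout but the data still arrives complete and in order, so extra hops would make a transfer slower without corrupting the image.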
I forgot to add one thing by tzanger · 2001-06-26 23:20 · Score: 3

Was this configuration ever tested?! It sounds like it was put together, prayed over and sent out into the world.

It would have been simple to test too... pull out one of the uplinks... then the other... now try pulling out some of the webservers... and so on.
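
The same pull-one-thing-at-a-time test can be rehearsed on paper before anyone touches a cable. The Python sketch below is only a model under stated assumptions: the topology, the node names, and the helpers `reachable` and `single_points_of_failure` are invented for illustration and are not OSDN's actual wiring.

```python
from collections import deque


def reachable(links, start, goal):
    """Breadth-first search over undirected links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, start, goal):
    """Pull each link in turn and report the ones that cut start from goal."""
    return [
        link for link in links
        if not reachable([l for l in links if l != link], start, goal)
    ]


# Invented topology: two uplinks into the switch, one path behind it.
links = [
    ("internet", "uplink_a"), ("uplink_a", "switch"),
    ("internet", "uplink_b"), ("uplink_b", "switch"),
    ("switch", "firewall"), ("firewall", "load_balancer"),
    ("load_balancer", "webserver"),
]
print(single_points_of_failure(links, "internet", "webserver"))
```

A model like this only checks the wiring diagram. It says nothing about whether the standby device actually has the configuration it needs, which is why pulling the real cable is still the real test.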
I just have a couple of questions by tzanger · 2001-06-26 23:05 · Score: 5

By 7 a.m. it was obvious that this was not a typical, easily-fixed, reboot-the-database problem.

Reboot the database?? WTF? You just proved my point as to why MySQL is NOT ready for primetime. Reboot the fscking database??

So Dave takes a look, can't ping the gateway, can't ping anything. Reboot the firewall. Didn't help. Still can't ping outside. OK, reboot the Arrowpoint. No difference. Hold your wallet... reboot the 6509... rebooting... rebooting... no difference. This is not good.

Guys, this isn't Windows -- Rebooting is an absolute last resort and if it works then you have discovered a problem, either in hardware or software and it needs fixed, not just a "oh well, a reboot fixed it, life goes on." Bastions of professionalism you're not.

I don't normally flame people for this kind of thing but the Slashdot crew are especially keen on bashing Windows, yet you resort to their exact tactics whenever a problem comes up.

Reboot the database?? I still can't believe I read that. Sorry.

Cisco Systems have some wonderful systems -- Hell I just recently found out about their stack trace analyzer... feed it a "sh stack" and it emails you back a list of IOS and/or hardware bugs which likely caused the crash. That is just plain old SCHWEEEET. Or being able to read their memory mappings to find out what is causing a bus crash... Ideal. You don't just randomly reboot the damn shit to try and get it to work. If it isn't working something is causing it. Embedded systems are generally pretty good at throwing up the red flags; you just need to look for them (logs, stack traces, extensive use of the debugging facilities...) Use the tools at hand instead of the big red button!

First step was getting the firmware patches onto a TFTP server near the switch (had to be less than 3 hops from the switch, TFTP doesn't work over longer hops).

Unless this is something specific to the IOS or router, that's bullshit. I just upgraded 5 AS5248s to IOS 12.1(9) with a TFTP server that is 8 hops away. I'm not aware of any TTL issues with TFTP.

Finally Yazz and I are staring at this other switch cascading off of the 6509, this little out-of-the-way Cisco 3500 just sitting there... is this thing connected? We look at the link light leading it to the 6509. It's dark. "Uh Barnaby... can you check port 1 on module 2?"

You mention that your network documentation is shitty -- I sure as hell hope you'll push to have it upgraded and maintained with a high degree of readability. Even complex systems do not have to be undocumented just because they're complex. Use pictures, use words. I haven't found anything in IT which cannot be explained by a combination of both. And throw in a glossary for the non-techies like yourself who are called upon to fix it. :-)

Don't get me wrong; I'm glad you're back up. But this could have been prevented. Very easily from the sounds of it. I hope you did fire your cisco admin; it sounds like s/he didn't have a clue and was so terrified of losing his/her job that s/he didn't ask for help. Cisco has mailing lists, tons of documentation and there are many IRC channels to ask for help.
1. Re:I just have a couple of questions by EyesOfNostradamus · 2001-06-27 03:36 · Score: 2
  
  > I hope you did fire your cisco admin;
  Not necessary. She quit by herself... *duck*
2. Re:I just have a couple of questions by Phasedshift · 2001-06-27 04:24 · Score: 2
  
  > First step was getting the firmware patches onto a TFTP server near the switch (had to be less than 3 hops from the switch, TFTP doesn't work over longer hops).
  
  > Unless this is something specific to the IOS or router, that's bullshit. I just upgraded 5 AS5248s to IOS 12.1(9) with a TFTP server that is 8 hops away. I'm not aware of any TTL issues with TFTP.
  
  Uh... TFTP uses UDP, which is a connectionless protocol. You can of course transfer files over more hops, but keep in mind that the more routers, etc. you have in the middle, the greater the chance of a packet being dropped, and one packet can mean quite a bit when you're transferring a new IOS image to your Cisco ;)
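
To put rough numbers on the hop-count argument above, here is a back-of-envelope sketch in Python. It assumes every hop drops packets independently with the same probability, which is a simplification, and the loss figures are illustrative rather than measured. TFTP sends one data block at a time and waits for its ACK, resending on timeout, so a dropped packet should cost time rather than lose data.

```python
def survival_probability(per_hop_loss: float, hops: int) -> float:
    """Chance that one datagram crosses every hop without being dropped."""
    return (1.0 - per_hop_loss) ** hops


def expected_sends_per_block(per_hop_loss: float, hops: int) -> float:
    """Average transmissions per TFTP block before block and ACK both arrive."""
    # The data block travels one way and the ACK travels back, so a
    # round trip succeeds only if both datagrams survive all hops.
    round_trip = survival_probability(per_hop_loss, hops) ** 2
    return 1.0 / round_trip


if __name__ == "__main__":
    # Compare the 3-hop and 8-hop cases argued about above, at an
    # assumed (hypothetical) 1% loss per hop.
    for hops in (3, 8):
        print(hops, round(expected_sends_per_block(0.01, hops), 3))
```

Under these assumptions the 8-hop path needs more retransmissions than the 3-hop path, but nothing in the arithmetic makes a longer path stop working outright.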
Re:OSDN, Audit ALL of your systems NOW. by tzanger · 2001-06-26 23:16 · Score: 5

Point 1./ Why do you allow TELNET in to your routing/switching equipment from the outside world? If a CISCO tech' with the password can do it then a hacker without the password likely can too.

Up until recently you had no choice but to telnet to Cisco equipment. I came up with a quick solution: deny telnet from anywhere but a same-segment computer (in our case, it's our RADIUS authentication box). Now ssh to the server and telnet from there to the NAS. Problem solved. :-)
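
The rule described above (telnet accepted only from one same-segment host, which is itself reached over ssh) amounts to a source-address allowlist. A minimal Python sketch of that check follows; the address 192.0.2.10 for the RADIUS box and the function name are hypothetical, and on real equipment this would be expressed as an access list on the device, not as Python.

```python
import ipaddress

# Hypothetical address of the same-segment jump host (the RADIUS box).
ALLOWED_TELNET_SOURCES = [ipaddress.ip_network("192.0.2.10/32")]


def telnet_permitted(source_ip: str) -> bool:
    """Return True only if the source address is on the allowlist."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_TELNET_SOURCES)


if __name__ == "__main__":
    print(telnet_permitted("192.0.2.10"))    # the jump host
    print(telnet_permitted("198.51.100.7"))  # anywhere else
```

The point of the design is that the only cleartext telnet hop stays on the local segment; everything crossing the outside network goes over ssh.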

Point 2./ If you are connected to the Internet in any way NEVER replace your firewall with a cross over cable. Basically at that stage you have your pants around your ankles, are bent over, with a big "Do Me Now!!!!!" sign on your butt!

While I usually agree, sometimes it is necessary to do a quick check. Even with the number of blackhats out there the chances of them doing anything significant (or anything at all) for the 2-5 minutes you have the firewall out are insignificantly small.
Re:Cisco Support by jnik · 2001-06-26 21:58 · Score: 2

I don't know about Cisco's daytime support, but I can confirm that if you call 'em at three or four in the morning, they're incredibly helpful. I had to pull an all-nighter to get a site live a couple of months ago, and they spent a number of hours on the phone with us figuring out why the traffic wasn't routing (turns out this particular firewall didn't like doing NAT when the internal IP address had a number in the 60's--don't know why). Very polite, knowledgeable, and willing to help--certainly the high point of that particular hellish project.
It doesn't explain.... by Malc · 2001-06-27 01:58 · Score: 1

... the continued poor performance of the site. Connections frequently time out or are very slow. Quite often, garbage is returned, or the wrong page. To all intents and purposes, the router/switch might as well be down more frequently. Just trying to get this posted has taken half-an-hour whilst I waited for every link to stop taking me back to the main page (with the login box present).
1. Re:It doesn't explain.... by SpaceLifeForm · 2001-06-27 06:00 · Score: 1
  
  This site still has problems, though I doubt they are due to Cisco systems! (Anymore)
  
  Blank pages or comments from a completely different topic appearing sure point to database problems.
  
  But when slashdot is slashdot-ed, you can expect any symptom.
  
  --
  You are being MICROattacked, from various angles, in a SOFT manner.
Re:Cisco Support by Pii · 2001-06-26 23:06 · Score: 5

I can confirm this. I've been a network consultant for almost a decade, primarily as a Cisco router/switch jock. I've dealt with the TAC (Technical Assistance Center) too many times to count.
Hold times can vary, depending on time of day, but are never as bad as the stories from other companies. In most cases, you are on the phone with a real, live engineer within 5 minutes.
90% of the time, the engineer you are transferred to will be able to get your problem corrected. On the few occasions where they have not been able to help me, Cisco has moved mountains to get the right people involved. I had an issue with Serial SNA - DLSW+ encapsulation last year that was escalated to the point where the guy that wrote that portion of the code for IOS was on the phone, and was prepared to come to my client's site (True, they had purchased about $8M in hardware...).
You do, typically, have to have a Smartnet contract, but as other posters have pointed out, if the problem is not hardware related, they will generally help you straighten out your configurations even without the contract.
A lot of people like to make comparisons between Cisco and Microsoft. Anyone who has dealt with the two will be quick to dispel any similarities. Cisco is a first-rate organization, with first-rate support, and I've made a career out of working with their products.

--
For those that would die defending it, Freedom
has a sweet taste that the protected will never know.
Re:Cisco Support by backtick · 2001-06-26 22:46 · Score: 2

I have been TAC'd "around the world" literally with Cisco support; one TAC case lasted 32 hours, all on the phone. We went from California, to the East Coast, to Brussels/Belgium, to Egypt, to Asia then Asia-Pacific and back to California. We had several problems that basically caused us to create a new core network from other pieces of equipment, tear down and rebuild the original router from the chassis up after a bad power supply ate the old one, and each and every card in it. This was a few years ago before everyone could afford or manage completely redundant network infrastructure, and things like 2 hour turnaround on hardware were supposed to alleviate things like this. The problem was, some of the cards would pass first level diags, but not run for long. Each part got there in less than 2 hours tho! It was one of those 'one in a million' cases, but the rep on the other end of the phone was cooperative the whole time.

That said, I've also had low priority cases where they don't respond for weeks; It's almost to the point at times that anything I've opened gets opened at Medium priority (business impact) or higher.
Re:Not like my experiences.. by Chang · 2001-06-26 23:08 · Score: 1

You only get the 1 minute response if you hit the Network Down Emergency option.
Re:Cisco Support by sphealey · 2001-06-26 22:09 · Score: 2

True. Even at the best of the best, there will be better and "less better" people. And anyone can have an off day.

OTOH, I also had the experience of a TAC rep spending 2 hours on the phone with a competitor's tech support line, explaining to them why their config wasn't working. He was right, too.

A good long-term sales tactic, though: guess whose product I specified the next time.

I do wonder what will happen to the quality level at Cisco TAC with the recent layoffs, though. The first sign of impending doom at both WordPerfect and Novell was when the tech support quality suddenly headed down the tubes.

sPh
Re:Cisco Support by sphealey · 2001-06-26 21:58 · Score: 3

Yes, if you have a SmartNet contract for that device, it's pretty much true. Cisco, mid-1990's Novell, and Oracle are the only organizations I know of that provide this kind of help. Microsoft "Gold" support plan, anyone? (gag).

Caveat: Cisco basically does not have first level support (i.e. "'Is the router plugged in?' 'What's a router?'") - you are supposed to have second level knowledge and have completed the first level troubleshooting before you call TAC.

But - I have been out of the office and had brand-new network techs call Cisco with a problem, and they did help out even then.

sPh
Will gladly confirm... by Svartalf · 2001-06-26 23:21 · Score: 2

There's a reason why their stuff's pricier than the rest- it's overall reliability (except on their low, low end...) AND the support.

They really ARE this responsive.

--
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Re:blocking ping, btw, is STUPID. by Svartalf · 2001-06-26 23:28 · Score: 3

Actually it isn't. As the other respondent to your comment pointed out, it's possible to determine system type from the ICMP responses. One should also realize that not all exploits use fragmented ICMP attacks. There's all kinds of abuses of ICMP that could conceivably be used to take a system down. It's better to nip any of those in the bud for a high volume site or set of sites.

--
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Slashdot outage - graphs and stuff by Oestergaard · 2001-06-27 01:54 · Score: 3

Luckily, /. is monitored, this historical event will be kept in the monitoring systems for ever and ever ;)
Go to the monitoring system page.
Click the www.slashdot.org link
Select services
This will give you some graphs showing the outage.
Consider the time by Pseudonymus+Bosch · 2001-06-26 22:59 · Score: 4

A much more common experience is to wait on hold for 15-20 minutes, but I have waited on hold as long as an hour with them.

Well, in this case Slashdot was down. That can explain the instant response.

--
__
Men with no respect for life must never be allowed to control the ultimate instruments of death.
GW Bu
1. Re:Consider the time by cornjones · 2001-06-27 01:05 · Score: 1
  
  HHAHAHAHH... thanx, you gave me a big laugh. ej
Re:You know you've been using windows too long whe by Harik · 2001-06-27 05:20 · Score: 1

Mmm. Of course, since the cisco was rebooted, the logs would indicate a reboot... right after all the error messages. What, you don't run current IOS on decent hardware? It doesn't have persistent logs?!? Yikes, I'm sorry.
--Dan
Re:More Writeups Needed by Tack · 2001-06-26 23:24 · Score: 1

OK, can you give a URL for this DDOS reference?
The Attacks on GRC.COM.
Jason.
Re:Beware of departure from original statement by Ben+Hutchings · 2001-06-27 06:54 · Score: 2

Er, all Nextel phones are made by Motorola.
Re:Eeep - scary moderators! by jamiemccarthy · 2001-06-27 00:11 · Score: 1

I think the Anne Tomlinson post was a particularly brilliant troll. ... His good humored response makes me think it was a troll.

Of course it's a troll. This conspiracy crap is just a bunch of idiots. Don't they have anything better to do?
Sun is ceasing its open-source program, which now turns out to be an experiment. DirecTV is being hacked to possibly allow open content. And for Chrissakes, Microsoft floated a trial balloon about "shared source" and is waiting to see the community's reaction before they decide on the terms of the license.
And what will be the Microserf's report to his boss about Slashdot's reaction? "Boss, we floated the shared-source balloon, and nobody seems to care -- they're awful concerned about a woman named Anne someone who doesn't even seem to exist." "Excellent. Deploy the death ray, oops I mean the we-share-our-source meme."
Open your fucking eyes and look at the big picture. Don't get dragged into this pedantic navel-gazing meta-meta-meta bullshit. Some days I think our readership really has gone to hell. You all suck. Bah.

Jamie McCarthy

--
Jamie McCarthy
jamie.mccarthy.vg
Have to agree about Cisco. by mrbill · 2001-06-27 00:53 · Score: 1

I recently purchased a Grand Junction FastSwitch
2100 off eBay. The FS2100 is the same hardware
as the Cisco Catalyst 2100 - Cisco just bought
GJ, and repainted and re-logo'ed their current
product line.

Anyway, the switch had a console password set,
and I couldn't get in. The Cisco web page on
bypassing console passwords didn't work (said
"If you're running xx revision call TAC").
I called TAC, opened up a case, told them it
wasn't urgent, and prepared to wait a couple
hours.

Five minutes later, a guy in Australia calls
(this was 9pm CST or so), asks for the serial
number of the switch, takes a minute, and
proceeds to give me the hard-coded override
password so that I can get into the switch,
change the settings, and update the firmware.

That quick response was on a clearly NON-priority
case, and I didn't have any kind of support
contract, and wasn't the original owner of the
hardware.

I'm *still* impressed. Cisco costs more, but
when stuff is broken, they WILL fix it.
Cisco is the best. by _14k4 · 2001-06-26 21:47 · Score: 1

Cisco tech support has done -nothing- but be the best, most understanding (especially in network-down situations) and knowledgeable support I have ever come across. You owe those guys a beer.

Mmm.. beeer.
Re:Not like my experiences.. by _14k4 · 2001-06-27 02:37 · Score: 1

Hey, the big 4 he was mentioning happen to have a LOT of hardware out there. Whether or not they suck is moot. If they're out there, and out there THAT much, they've gotta have the support their Service Level Agreement customers deserve.
Re:Cisco Support by joeboo · 2001-06-26 22:12 · Score: 2

I called cisco one night, while replacing my Nortel BCN with a cisco3662. For some reason, I couldn't get my BGP peer established with Sprint. It was 3am Central US time when I called cisco. I first talked to an individual who stated that they were on a callback. I figured that I would get a call in 45 minutes to an hour. 3 minutes later, my phone rang, and it was a gent from Belgium. He logged into my router for me, found the bgp error immediately, fixed it, and I was on my merry way. He even fixed some of my access-lists for me while he was there.

Cisco has _the_best_ customer service that I have ever seen. It is good enough that I don't mind paying a bit more for the hardware, because I know that if it breaks, there will always be someone to help me out.

And, I don't work for cisco :)

--
Joseph W. Breu
Re:You know you've been using windows too long whe by Kiwi · 2001-06-27 03:02 · Score: 2

"If it doesn't work, there is a reason; something is wrong. Rebooting will not fix the problem."
Not always true. I used to admin JSP-based web servers. My experience is that the Java virtual machines that serve JSP pages have a way of starting to act funny. Stopping and restarting the services fixes the problem.
If I was ever building a network, I would not allow JSP to be a part of the network for this very reason.
Then again, if a JSP guru knows what can cause a JSP engine to act wonky, or how to set up a JSP engine so it is stable and doesn't need reboots, please post a follow up describing how to do this.
- Sam

--
The secret to enjoying Slashdot is to realize that it should not be taken too seriously.
Re:Anne Tomlinson? by unitron · 2001-06-27 11:44 · Score: 2

How is calling someone "one of our most knowledgeable people" abusing them?
Of course the original story, or, I should say, some of the versions of the original story (how often can you rewrite the original and it still be the original?) mentioned "...when our qualified personnel arrived, we discovered that she wasn't actually as qualified as we had hoped. Then she quit..." which doesn't sound like someone who was already not working there anymore before the troubles started, so I assume that we're talking about 2 different people here, only one of whom was identified one way or the other by sex/gender.
Quite a ways down in the responses to the aforementioned "original" story is an AC post signed Anne Tomlinson that seems to give another perspective on the events that weekend. It's a little ways down the page from another post that has some of the different versions of the original story.

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Re:Not the issue by unitron · 2001-06-27 15:03 · Score: 2

For all we know they are and we can't browse low enough to see it.

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Re:Not the issue by unitron · 2001-06-28 09:21 · Score: 2

Okay, so were there any posts at -2 or lower?

--
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Re:Cisco Support by ApheX · 2001-06-26 23:09 · Score: 1

Cisco tech support, for me at least, has been nothing short of amazing. With any problem it takes a few moments after you call in to get someone on the line, and it's 24/7 and anywhere in the world apparently. One night at 3AM, we called up cisco and talked to someone in Australia apparently. Cisco has an awesome VoIP telephony network that they use to handle all these calls. And Cisco is also willing to config whatever you need. Pity not all companies are like this :(

--

-
aphex
I Steal Music!
Re:OSDN, Audit ALL of your systems NOW. by DraKKon · 2001-06-27 03:38 · Score: 1

Point 2./ If you are connected to the Internet in any way NEVER replace your firewall with a cross over cable. Basically at that stage you have your pants around your ankles, are bent over, with a big "Do Me Now!!!!!" sign on your butt!

While the crossover cable was active I broke into slashdot and stole the source code and I'm selling it on eBay.. Just look for open source slashd code.

--
"It's not like your minds are as open as the source you love..." - Me to the majority of Slashdot.
Competent tech support! by The+G · 2001-06-26 21:55 · Score: 2

"Next time you think you are calling technical support droids, next time you think that you will be put on hold for hours, be careful, you may be placing a call to ... the twilight zone."

Yike, I say. Yike. Competent tech support does not exist in this earth. What planet is Cisco on, and to what worthy cause can I donate money to see that humans never send a manned mission there and pollute this fascinating superior alien culture?
--G
Re:Kenny by szo · 2001-06-26 21:55 · Score: 3

Look at it this way: every time he comes back again, all by itself! Other people die once and for all...

Szo

--
Red Leader Standing By!
Re:Three words for you guys... by Lemmy+Caution · 2001-06-27 01:56 · Score: 1

Service Level Agreements are all well and good, but if the cell phones are dead or whatever, all it does is give you a better excuse to yell when it's all over. SLA's don't change the laws of physics, or violate Murphy's law. Itshay appenhays.
Re:Thank you by ergo98 · 2001-06-26 22:36 · Score: 1

Ironically apparently this story was pulled (I got here by a direct link). Anyone else envisioning a fist fight happening in the OSDN server room?
Re:You know you've been using windows too long whe by ergo98 · 2001-06-26 22:59 · Score: 1

Agree with you to a point; however, there is lots of equipment out there (Cisco switches definitely being among them...especially with the older software versions) where a problem suddenly crops up yet there have been zero changes in configuration or demands: whether it's a bit that was changed in volatile RAM by a solar disturbance, or a power fluctuation that disturbed the microprocessor, or an errant software pathway that is seldom travelled that a particular piece of data "exploited". In these cases a reboot does fix the problem (I've seen this in network switches, routers, phone PBXs, and of course PC machines). It's hardly a scientific method, but when you don't have qualified staff to evaluate the system sometimes it's your best hope.

Where I do agree with your assessment is the classic where someone modifies a couple of settings/wires/configurations and then problems occur, and then they think rebooting will fix them (usually while telling coworkers that they didn't touch a thing...it just "went wonky"). Correlation/causation.
So what actually caused the chain of events? by ergo98 · 2001-06-26 23:20 · Score: 1

While from the story it appears that the routers were configured in an odd manner, evidence has shown us that the site obviously worked before this outage. So the question then is what changed that caused the system to collapse? Did someone try their hand at playing with the settings?

Again: Obviously the OSDN network was working with this configuration, so what happened that caused the collapse? Was the router 0wZ3d?
1. Re:So what actually caused the chain of events? by bonoboy · 2001-06-27 04:13 · Score: 2
  
  Actually, they say it was the one unnoticed link to another switch that fucked it. Bet it was running different vlans and trying to be the hsrp master over the other.
  
  Never mess up your different vlan domains. Not a good idea - especially when they have the same numbers on two different groups.
  
  --
  toeslikefingers.com - because
2. Re:So what actually caused the chain of events? by rekoil · 2001-06-27 01:05 · Score: 1
  
  Reread the article...the Cisco 6509 (unruly beasts they are, I deal with 'em every day) has redundant "supervisor modules", which house the main CPU, main memory, etc...and the switch's configuration. If one fails, the other detects the failure and takes over. The problem, as far as I can tell, is that for some reason the configuration on the redundant supervisor module was not in sync with the config on the primary module (a failed/interrupted copy run start, maybe? copy run start is supposed to write the config to nvram on /both/ modules.) So when the primary failed, the secondary took over, but the config on the secondary was fscked, and the switch stopped working.
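
If the out-of-sync theory above is right, the drift could have been caught by comparing the two saved configs before the primary ever failed. A minimal Python sketch of such a check, assuming both configs have already been dumped to text by some means (the function name and sample config lines are hypothetical, and this says nothing about how the switch itself exposes the standby's config):

```python
import difflib
import itertools


def config_drift(primary: str, standby: str) -> list:
    """Return the lines that differ between two config dumps."""
    diff = difflib.unified_diff(
        primary.splitlines(),
        standby.splitlines(),
        fromfile="primary",
        tofile="standby",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    # Skip the two file-header lines, then drop the "@@" hunk markers.
    body = itertools.islice(diff, 2, None)
    return [line for line in body if not line.startswith("@@")]


if __name__ == "__main__":
    primary = "hostname core\nvlan 2"
    standby = "hostname core\nvlan 3"
    for line in config_drift(primary, standby):
        print(line)
```

An empty result means the two dumps match; anything else is worth a look before relying on failover.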
Re:Not like my experiences.. by PD · 2001-06-26 22:01 · Score: 1

Best computer support in any category that I've run into was back in the very early '90's. Phar-Lap had a bunch of smart guys there who wrote their DOS extender. I called them once to ask them a simple question, and happened to mention some of the frustrating things I was running into on an unrelated problem. Without even blinking, the support guy proceeded to give me great advice on how to fix the problem, and gave me a detailed explanation at the BIOS level of what was going wrong. I was amazed. They didn't have to do that, first of all. The level of pure competence displayed by a tech support person is something that I still remember clearly.

--
If tits were wings it'd be flying around.
Like reading old issues of the RISKS digest by kzinti · 2001-06-26 23:02 · Score: 2

This topic comes up many times on comp.risks: there's no point in having a backup (server, archive, database, router, etc.) unless you TEST your backup procedure to make sure it works. Pull the plug on the server - does the backup kick in? Kick over the router - does it fail over to the backup? Those who ignore the RISKS digest are doomed to repeat it!

--Jim
1. Re:Like reading old issues of the RISKS digest by SuiteSisterMary · 2001-06-26 23:48 · Score: 3
  
  The only thing worse than having no backups/redundancy is having backups/redundancy that you think will work, but, in fact, don't.
  
  --
  Vintage computer games and RPG books available. Email me if you're interested.
Cisco 6000 Password Recovery by AviN · 2001-06-27 09:51 · Score: 1

Not sure if anyone posted this yet, but you could have recovered the enable password using the instructions here http://www.cisco.com/warp/public/474/pswdrec_6000a ccess.shtml, which took me about 15 seconds to find on Google (first response for 'cisco 6000 reset password').

And OSDN technical people needed Cisco tech support's help to upgrade the IOS?

Perhaps OSDN should make passing CCNA mandatory for their networking people (as well as brushing up on their Googling skills). :-)
Re:Grr... by tuffy · 2001-06-26 22:54 · Score: 2

We'll just have to wait for an article giving a blow-by-blow account of the Slashdot outage article's outage.

--
Ita erat quando hic adveni.
Re:hang on, i can do this... by Mike+Buddha · 2001-06-27 04:46 · Score: 2

This plot is a total rip-off of the Miyazaki Classic: "Hagamaki Ortifunk", or (from its American release) "Whistling in the Dark, with Daisies". Can't Americans think of anything original anymore? Could they ever?

I'm working on a web site to expose this travesty to the world. I'm sure everyone will be impressed with my esoteric knowledge of this classic of Japanese animation.

--
by Mike Buddha -- Someday the mountain might get him, but the law never will.
And in financial news... by TrentC · 2001-06-27 00:48 · Score: 4

...Cisco is reporting a projected 40% upswing in earnings for the next quarter, after a favorable review of their technical support personnel on the discussion site Slashdot led to a surge in sales for support contracts.

"It's the first time the Slashdot effect has been a productive one", said an unnamed Cisco official, pausing briefly to dodge a large bag of cash sailing through a nearby window.

Jay (=
1. Re:And in financial news... by DarkProphet · 2001-06-27 11:57 · Score: 1
  
  You know, I thought the same exact thing with the author gushing about Cisco's Tech Support. And especially after reading so many very positive comments about Cisco's service and commitment to its customers. I must say I am really impressed. It's enough to make me want to finish my CCNA ;-)
  
  --
  What could possibly hurt the security of the American people more than giving our own government the ability to hide its
Re:Cisco Support by grub · 2001-06-27 09:26 · Score: 2

At an old job we had a wee Cisco 1604 router, just doing ISDN for our /24 (at the time ISDN was the only affordable thing in our area)

I had a problem with something and mailed Cisco. No more than an hour went by and I had email from a real life person in front of me telling me what to do to fix our problem.

Cisco isn't cheap, but you do get what you pay for.
grub

--
Trolling is a art,
cisco tech support are badasses by Cheeze · 2001-06-26 21:58 · Score: 1

i've always gotten nothing but 100% help from the cisco tech line. i guess there IS a reason for paying $20k for a router.

i can't believe exodus doesn't have a cisco person that could fix that problem.

cisco - kudos
exodus - slight black eye
OSDN sites - inevitable downtime

moral of the story:
don't fire the network people, and make sure all network people are on the same page. there's no reason to even have 4 on-call people if none of them are going to be available. also, documentation, documentation, documentation!!!

--
Why read the article when I can just make up a snap judgement?
1. Re:cisco tech support are badasses by Cheeze · 2001-06-27 00:53 · Score: 1
  
  the slight black eye was because exodus should have (hopefully) at least one Cisco person on hand at any one time. seems like if you are a cisco person, you should be able to at least read the config and configure the router to provide service, even if it is not 100% correct and operating 100% fully functional. i agree that the person that setup the router probably didn't document anything, but i still find it hard to believe that SOMEONE at exodus couldn't understand it enough to even take a look at it. it's the difference between providing "average" service, and providing cisco service. just as the article said, Cisco never said "it's not a standard config, therefore we cannot help" which is common to say in the support arena. this is kind of what i got that exodus did from the article.
  
  of course, i was not there. i can't even believe they left the problem and went to sleep. in that situation, if i was responsible for the network, i would not have been able to sleep. i'm sure most of the people that have had NOC positions have done the "all-nighter" thing.
  
  --
  Why read the article when I can just make up a snap judgement?
2. Re:cisco tech support are badasses by itachi · 2001-06-27 07:25 · Score: 1
  
  No, see, OSDN has no excuse for not having the config of the 6509 done right the first time and properly backed up on at least one tftp server, so that the moment one supervisor card dies and fails to failover, you just download the config onto the backup card and you're back in 15 minutes or so... And I'll second that documentation. Lots of good, up-to-date documentation!
  
  itachi
3. Re:cisco tech support are badasses by hitchhikerjim · 2001-06-27 15:06 · Score: 1
  
  Definite black-eye for Exodus. Working for an 'un-named' dot-com, we could ALWAYS get our TAM at Globalcenter to page and wake up their Cisco gurus if we really got stuck. Sure we'd pay a crapload for it, but they were there for us, and the experts on their staff could rebuild any config in a snap.
  
  I'm really glad to hear about Cisco's support -- hadn't used it. I have had similar quality support from Extreme, F5, and most recently (and surprisingly) AT&T managed services. With AT&T, every time I call, the FIRST person I talk to logs onto our router to see what he can see, and fixes things quickly. I never have to work my way through tiers of people who don't know what a network is, but can follow a script.
  
  After dealing primarily with Sun's support (absolute crap) for the past couple of years, it's been quite refreshing. Generally -- you buy big-ticket items (with everyone except Sun), you get good support.
Re:You know you've been using windows too long whe by ethereal · 2001-06-26 23:18 · Score: 1

I found it interesting that it clears the logs - shouldn't such a big, expensive piece of HW have at least some non-volatile storage? Minimum it should log to a separate box that does have a disk drive, since I could see how incremental logging to flash parts might have some issues.
While I'm at it, this article seemed pretty slim on the details about the mystery woman. It didn't sound like anyone quit, in fact. I expect the full details on the story...
...about as soon as I get the full story on my URL above. In other words, I'm not holding my breath :)

Caution: contents may be quarrelsome and meticulous!

--
Your right to not believe: Americans United for Separation of Church and
Re:You know what ? by ethereal · 2001-06-26 23:27 · Score: 1

For those of us who aren't netops, it was an interesting read. If it bugs you so much, just don't read the article!

Caution: contents may be quarrelsome and meticulous!

--
Your right to not believe: Americans United for Separation of Church and
Re:OSDN, Audit ALL of your systems NOW. by Sloppy · 2001-06-27 06:21 · Score: 1

There's a big difference between always walking around with your pants around your ankles, and briefly lowering your pants for a minute when your doctor wants to do the "turn your head and cough" thing.

---

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Often true. Maybe usually. Not always. by HiThere · 2001-06-27 05:56 · Score: 3

Sometimes rebooting will fix the problem. Sometimes you don't have any alternative. Sometimes you can't fix the problem, but you can get things working again (e.g., Windows). And rebooting may be the best (or only) way to do that.

It is clear that they were out of their depth. It is clear that they didn't know what they were doing. They knew that they didn't know what they were doing. But the experts were unreachable. So they tried something that sometimes works. I really don't see how you can fault them for that. It would, of course, have been better if they had known what their choices and options were, but they didn't.

I wouldn't have either. Probably most of us wouldn't have.

Caution: Now approaching the (technological) singularity.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:blocking ping, btw, is STUPID. by Teferi · 2001-06-27 02:09 · Score: 2

Four words: (D)DoS by ping flood.

--
-- Veni, vidi, dormivi
Reality corrupted by Chris_Pugrud · 2001-06-27 03:29 · Score: 1

> Reality corrupted. Reboot universe? (Y/N)

Shouldn't that be:

Reality corrupted. Reboot universe? [confirm]

Couldn't help it. On topic - Cisco TAC is amazing (of course I have a lot of friends either there or graduates of TAC). OTOH it doesn't hurt that I am a block away from the main campus. I once had a TAC engineer offer to drive the part over to me when a 6509 sup1a failed.

- Chris

--
-- I need more coffee. It's Monday. There is no such thing as enough coffee on a Monday.
Re:OSDN, Audit ALL of your systems NOW. by arcade · 2001-06-27 03:10 · Score: 2

I need to add something here. Of course, if it's unencrypted telnet, it shouldn't be used most of the time. If it's a crisis, then change it to a scrappable password, let the service engineer do his thing, then change it afterwards.

Preferably, encrypted login should be used, of course. Be it ssh, telnet-ssl or whatever.

--
"Rune Kristian Viken" - http://www.nwo.no - arca
Re:OSDN, Audit ALL of your systems NOW. by arcade · 2001-06-28 18:43 · Score: 2

Defense in depth is a good philosophy to have, protecting against configuration mistakes.

Of course.

You are also protected if exploit code is run (say via a buffer overflow that changes hosts.deny).

uh? That sounds pretty damn unlikely. The buffer overflow could just as well open a reverse channel back to the attacker. Of course, you limit the possibilities of the attackers. However, you're now already talking about running services with known vulnerabilities.

Firewalls can also protect against low-level attacks that don't attack the services/applications themselves.

That is better done at core-routers.

When properly configured, firewalls can be invaluable in logging traffic and otherwise keeping out unwanted traffic and IP spoofs -- and can do a far better job than simple packet filtering on a router.

That is better done by snort, or any other decent IDS.

I think it's pretty poor form to call someone else a dimwit when you're lacking a lot of info yourself. There's a reason that a firewall is industry-wide best practice for an Internet site or user network, and it's not because we're all dimwits

I regularly call those who think running firewalls is the be-all and end-all of security dimwits. Unplugging a firewall on a network you know isn't exactly a horrible thing to do.

A firewall is a good thing to have when you've got a network you don't have time to audit, and that doesn't have people to audit it on a regular basis. It's a good thing to have when you've got servers which you don't have any possibility of patching or upgrading, but that need to be running some (nonvulnerable) services to the internet.

Of course, you could do lots of these things with NAT devices. (Which of course aren't a perfect solution either.)

Blargh, I could rant on forever.
--
"Rune Kristian Viken" - http://www.nwo.no - arca
Re:OSDN, Audit ALL of your systems NOW. by arcade · 2001-06-27 03:08 · Score: 3

Point 1./ Why do you allow TELNET in to your routing/switching equipment from the outside world? If a CISCO tech' with the password can do it then a hacker without the password likely can too.

Bah, you're talking without knowing the parameters. For all you know, they could've enabled the telnet access on the outbound interface specifically for the checking/cisco rep, disabling it afterwards.

Secondly -- if I remember correctly you can have pretty damn long passwords on Cisco equipment. We do not know the length of the password, but it's highly probable that the password is 10+ characters. A brute-force attack is pretty damn difficult when you have to check 64^10 possibilities. According to my bc:

arcade@lux:~$ echo 64^10 | bc
1152921504606846976

Now, that is a pretty impressive number of queries you've got to make to exhaust that pwd-space. To be quite frank -- I don't see the problem.
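
To put that number in terms of time, here is a rough Python sketch of the same arithmetic. The guess rates are purely illustrative assumptions, not measurements of any Cisco device, and `keyspace` and `exhaust_years` are hypothetical helper names.

```python
# Keyspace arithmetic for a password of `length` symbols drawn from an
# alphabet of `alphabet_size` symbols (64 and 10 mirror the bc example).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` symbols."""
    return alphabet_size ** length


def exhaust_years(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Years needed to try every password at a constant guess rate."""
    return keyspace(alphabet_size, length) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(keyspace(64, 10))  # the same value bc printed
    # Assumed rates only: a telnet login prompt is unlikely to accept
    # anywhere near the higher figures.
    for rate in (10, 1_000, 1_000_000):
        print(f"{rate:>9} guesses/s -> {exhaust_years(64, 10, rate):.3g} years")
```

On average an attacker hits the password after searching half the space, so halve these figures for the expected case. None of this helps if the password is sniffed off an unencrypted telnet session instead of guessed.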

Point 2./ If you are connected to the Internet in any way NEVER replace your firewall with a cross over cable. Basically at that stage you have your pants around your ankles, are bent over, with a big "Do Me Now!!!!!" sign on your butt!

Oh, yes of course. If you don't have a firewall You are phooked!!

Ehh? Excuse me? Why the fsck does a properly configured serverfarm need firewalls _at all_? Please, enlighten us with your wisdom oh dimwit.

Firewalls _are not needed_ if you're not running services that _should not be running_ on servers for the internet.
--
"Rune Kristian Viken" - http://www.nwo.no - arca
Re:IP change & DNS TTL by Skapare · 2001-06-27 10:20 · Score: 2
They did change the IP back. The switch over was temporary to get an announcement up ... and that was outside the Exodus cage. Fortunately they did have 1 (out of 3) authority DNS servers outside of there, so they could get people over to the announcement ... eventually as cache TTLs expired.

It's already bad enough to have a 24 hour expiration on the A-record. But you don't anticipate these outages, so 1D is fairly common practice (even longer in some places trying to reduce their DNS load). But the real mistake was putting a 24 hour expiration on the temporary IP. Basically that says "as soon as I change this, everyone who cached this temporary IP address is going to have to wait a day from when they first saw the page, before they can get their /. fix (or other OSDN stuff)". What? Did someone actually think they were going to change the IP back 24 hours BEFORE the sites were back up? The temporary A-record should have had a TTL of less than about 30 minutes. I'd have put in 10 minutes if it were me. But then, if I were there, I'd have also been doing the Cisco stuff and actually tested the failover configuration.
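
The caching arithmetic can be sketched in a few lines of Python. This is a minimal illustration that assumes resolvers honor the TTL exactly; the function names and the example timestamps are hypothetical, not taken from the outage.

```python
from datetime import datetime, timedelta


def stale_until(cached_at: datetime, ttl_seconds: int) -> datetime:
    """A resolver that cached a record at `cached_at` may keep serving it
    until the TTL runs out, whatever changes later at the authority."""
    return cached_at + timedelta(seconds=ttl_seconds)


def extra_wait(cached_at: datetime, switched_back_at: datetime,
               ttl_seconds: int) -> timedelta:
    """How long after the switch-back a client still sees the old address."""
    remaining = stale_until(cached_at, ttl_seconds) - switched_back_at
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    switched_back = datetime(2001, 6, 24, 12, 0)       # hypothetical time
    cached = switched_back - timedelta(minutes=5)      # cached just before
    print("1D TTL:    ", extra_wait(cached, switched_back, 24 * 3600))
    print("10 min TTL:", extra_wait(cached, switched_back, 10 * 60))
```

A client that picked up the temporary address five minutes before the switch-back is stuck with it for most of a day under a 1D TTL, and for only a few more minutes under a 10-minute one.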

I do recommend:
- Having at least 4, and maybe even 6, authority DNS servers, all diversely located (I'm sure they can get some located over at VA Linux).
- Develop specific procedures to handle failures for each piece of equipment.
- Print the procedures on paper and keep a copy at the cage, in the office, and at a senior manager's home at minimum.
- Hire an outside consultant in each area to review the procedures to make sure they make sense to an outsider in the event you might need to go outside to solve the issues.
- Test the procedures and configurations by scheduling "failures" to see if the major points work as intended.
These are the kinds of things system and network administrators are supposed to do. Programmers tend to hate that kind of work, so that's why there are separate job descriptions. Just because a good programmer can install and configure a server doesn't mean that just doing that is all that needs to be done. Businesses run smoothly when people know what they are supposed to do. And in the exceptional circumstances, they're doing things they don't routinely do, and it is essential to not only have those things written down, but also make sure they do work, and can be found even in a power failure.
--
now we need to go OSS in diesel cars
Re:Beware of departure from original statement by sharkey · 2001-06-27 06:31 · Score: 2

In this case, "Full Disclosure" means, "A good yarn."

--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
Re:Really full disclosure? by sharkey · 2001-06-27 06:45 · Score: 2

Not a libel suit, but rather /. is afraid of all the angry parents who would claim that his post would make little girls avoid taking classes involving computers, networks, etc., in much the same way that Barbie convinces girls to avoid math.

--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
SlashCrash? by Midnight+Thunder · 2001-06-26 22:07 · Score: 4

Maybe that's the next site OSDN should come up with. The idea is that anyone who has had a major problem with their network or computers and solved the problem, could post their write up to help others who find themselves in such a situation.

I definitely enjoyed reading this article and I am sure that it will be bookmarked by a fair few techie minded network admins, just in case.

--
Jumpstart the tartan drive.
1. Re:SlashCrash? by gizmo_mathboy · 2001-06-27 08:42 · Score: 2
  
  I agree in theory, but don't use this story as the prototype. Analyze it for a moment; the only real advice it can offer sysadmins is "Call tech support".
  
  While the exact story may not be useful the site itself could be. Real analysis of the problem and how it was resolved would be necessities.
  
  In this case it would be interesting to know what the Cisco/network dudes did to "correctly" configure the router and the rest of the network.
2. Re:SlashCrash? by DeePCedure · 2001-06-27 01:22 · Score: 1
  
  Absolutely! Katz did it for "Tales from the Hellmouth", so SlashCrash should be a sure thing. I'm not bad-mouthing Jon...it's just that Hellmouth was a little lighter on the "News for Nerds" scale. SlashCrash could carry a bit more weight and still have all the crunchy bits.
  
  If this actually gets off the ground, how about a call for papers in AskSlashdot? Those responses could be the basis for the new site, much as Katz's Hellmouth articles were leveraged into a new site.
  
  BTW - does anyone know if the Hellmouth page is still up or did it die by attrition?
3. Re:SlashCrash? by fonetik · 2001-06-27 11:06 · Score: 1
  
  The site I use that is somewhat similar to this is experts-exchange.com. Lots of people posting problems and solutions on all sorts of stuff. First place I check for an obscure problem.
  -Tom
Re:Cisco Support by RJ11 · 2001-06-27 00:50 · Score: 2

I've had this kind of support as an end user with a Cisco 804 ISDN router. The same quality of support that we were getting on our support contract with a 7206, at my previous job.

The main reason that they're so prompt, is that they have a global network for phone support. When you call them, your call gets transferred to a technician who has just arrived at work (ie, if you're in the US and call at 3am, you'll probably end up speaking to a technician in central or western Asia).
Re:Another great company by leperjuice · 2001-06-27 04:09 · Score: 2

Heh,
I'm reminded of an intrusion team story about one such team that faked a package from an OS vendor (letterhead, box, etc) containing a "patch." The admins looked at the box, assumed the obvious, and installed the patch which, while fixing an actual problem, also backdoored their system.
I could see running a remote exploit to crash your box, sending you mail about it (faked, of course) and then sending you a "patch" to "fix" the exploit (while adding some of my own...).
Be careful, there are some tricky bastards around with way too much time on their hands. Check those MD5 sums...
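
Checking a sum takes only a few lines. The following is a sketch using Python's standard `hashlib`; the function names are hypothetical, and MD5 is used only because it is what the comment mentions (the `algorithm` argument accepts other names `hashlib` knows, such as `"sha256"`).

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Hex digest of a file, read in chunks so large images need little memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare a file against a checksum obtained from a channel you trust."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

The comparison is only worth something if the published sum arrives by a different route than the patch itself. A checksum printed on the same faked letterhead proves nothing.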

--
-- "I am disrespectful to dirt. Can you not see that I am serious!"
Re:You know you're a cranky old grognard when... by rasilon · 2001-06-27 19:05 · Score: 2

The logs are held in a ring buffer in RAM. What you are supposed to do is configure the router/switch with the address of a syslogd server which will handle the logs better.
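
For anyone who has not met the term, a ring buffer behaves roughly like the Python sketch below. This illustrates the general idea only, not how Cisco implements it; `RingLog` and its capacity are made up for the example.

```python
from collections import deque


class RingLog:
    """Fixed-size in-memory log: when full, the oldest entry is dropped."""

    def __init__(self, capacity: int) -> None:
        self._entries = deque(maxlen=capacity)

    def append(self, message: str) -> None:
        self._entries.append(message)

    def dump(self) -> list:
        return list(self._entries)


log = RingLog(capacity=3)
for n in range(1, 6):
    log.append(f"event {n}")
print(log.dump())  # ['event 3', 'event 4', 'event 5']
```

Events 1 and 2 have already been overwritten, and since the whole structure lives in RAM a reboot starts it empty. Hence the advice to ship each message to a syslog server as it happens.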
Re:Microsoft Support by wesmills · 2001-06-27 10:52 · Score: 3

The only problem is, it isn't a troll. [Full disclosure: I guess it's time I reveal who I work for on /. for the first time, like anyone cares ... 'tis Microsoft] If you are a Premier or, God help you, Alliance customer, you will get the red carpet treatment in almost every product support division. Even in departments that have combined Premier & Professional (per-incident) support, there's still the unwritten rule of "Pro calls don't get to push for complicated stuff as hard." First question out of a lot of people's mouths is "Pre or Pro?"
Of course, it depends greatly on who you are talking to. The platforms team does have a huge slant toward NT/2000 because that's what they support and allegedly like. Those of us in Exchange support (I'll leave it to you to figure out what part of Exch. support I'm in) handle calls where Unix servers are relays, Pix firewalls sit between systems and load balancers continually send packets off into the woods. If you *don't* know non-Microsoft stuff, aren't prepared to acknowledge that non-MS works and works well, or just can't handle the idea of public standards, you are fucked in that group.
It all comes down to who you get on the phone. If you don't like who you are dealing with, ask to speak with their manager or technical lead. Get it straightened out with them or request another support tech. You're paying for it, get what you are paying for.
(As always, my comments are my own and my employer doesn't take any responsibility for them. Like they would want to anyway.)

---
Where did Anne Tomlinson's post go? by cpeterso · 2001-06-27 06:37 · Score: 1

Her only post was just a few days ago, but her user info page now claims she has zero posts. My user info page currently shows my posts going back about two weeks. Why would Anne Tomlinson's one post mysteriously "disappear" from the system so soon??!

--
cpeterso
1. Re:Where did Anne Tomlinson's post go? by fat_hot · 2001-06-27 07:30 · Score: 2
  
  the post was SIGNED Anne Tomlinson, but was posted by an AC.
WHAT? by No-op · 2001-06-27 01:00 · Score: 1

If they aren't already doing this, they are even more clueless than I previously thought. ugh.

--
EOM
1. Re:WHAT? by No-op · 2001-06-27 12:11 · Score: 1
  
  I got modded down... heh. but it's true- anyone who isn't currently collecting stats off their equipment at all, much less in a secure fashion, is smoking crack. I do that for 100+ locations and I'm on a secured private network! I can't possibly IMAGINE the kind of shit you'd get if you had that running on the wild'n'woolly scary public internet. *cringe*.
  
  I guess I was just surprised that with how "geeky" this site is, etc etc, that there seem to be truly few geeks running it. I guess being able to write lots of perl and stuff like that is useful somehow but let's face it... infrastructure skills count for something, eh?
  
  Us pathetic hardware geeks have a saying about never trusting people who operate above OSI layer 3...
  
  --
  EOM
2. Re:WHAT? by Fjord · 2001-06-27 04:00 · Score: 2
  
  That's why God invented consoles
  
  --
  -no broken link
TFTP = 3 hops ? by The+Dev · 2001-06-26 23:31 · Score: 2

"First step was getting the firmware patches onto a TFTP server near the switch (had to be less 3 hops from the switch, TFTP doesn't work over longer hops)."

I've tftp'd images to cisco's and ascend's across the Internet (many hops) without problems. It's not smart because if you lose your path to the server you're screwed, but it does work.
1. Re:TFTP = 3 hops ? by tbaggy · 2001-06-27 02:09 · Score: 1
  
  TFTP uses UDP which is unreliable. The more hops you have the more likely it is the packet will be dropped or lost. I've done tftp's across 10 hops with no problems..and across 3 hops that were extremely busy where it would never work. TFTP is retarded anyway..every router/switch/whatever should use an ftp client to load software.
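
The hop-count argument above can be put in numbers with a short Python sketch. It assumes each hop drops packets independently, and the loss rates are hypothetical rather than measured.

```python
def delivery_probability(per_hop_loss: float, hops: int) -> float:
    """Chance one datagram crosses `hops` links if each link independently
    drops it with probability `per_hop_loss`."""
    return (1.0 - per_hop_loss) ** hops


# Assumed figures: ten quiet hops versus three congested ones.
for loss, hops in ((0.001, 10), (0.10, 3)):
    print(f"loss={loss:.1%}/hop, hops={hops}: "
          f"{delivery_probability(loss, hops):.1%} delivered")
```

Under these assumptions ten quiet hops deliver about 99% of datagrams while three busy ones deliver about 73%, which matches the experience described: how congested the path is matters more than how long it is. TFTP does retransmit after a timeout, so moderate loss tends to slow a transfer down before it kills it outright.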
Re:Cisco Support by JabberWokky · 2001-06-27 01:36 · Score: 3

Can anyone familiar with Cisco, besides people working for /., confirm this?
I don't normally swear, but if someone asks me if Cisco support is good, I have to reply: "Abso-fucking-lutely". They are easily the tightest organization out there, bar none. I don't think anyone: UPS, the Military, Wall Street, runs as good an operation as they do.
And I've sat with two engineers at 1:00am through to 11:00am as they fixed my small gateway to an ISP, not a big ticket item. At one point, they did an engineer transfer, connecting me to a different part of the world, and spent thirty minutes overlapped, with the engineers working together to make sure that the new engineer knew what the first had tried. As it turned out, the firmware storage was flakey, and the config corrupted itself semi-randomly.
Years later, I watched Cisco do the exact same thing - only this time, they correctly identified that the problem wasn't them, but in some Bay routing equipment, *and* they told us the exact commands to fix it (I was an outside consultant just watching, but I believe they even offered to telnet in and fix it themselves).
So, yes. Cisco is the only brand I will buy, no matter how expensive they are. Think of the extra expense as insurance. You *may* not need it, but it sure pays for itself if you do.
--
Evan

--
"$30 for the One True Ring. $10 each additional ring!" -- JRR "Bob" Tolkien
Re:Beware of departure from original statement by AstroJetson · 2001-06-27 01:21 · Score: 1

Not sure Nokia's the answer. My Nokia has the same problem that Yazz's has.

--
Admit nothing, deny everything and make counter-accusations.
Re:OSDN, Audit ALL of your systems NOW. by AT · 2001-06-27 00:48 · Score: 2

Point 2./ If you are connected to the Internet in any way NEVER replace your firewall with a cross over cable.

...unless the risk of being compromised within that short period is outweighed by the information you will gain by testing around your firewall. It is a simple trade-off.
No home phones? by wemmick · 2001-06-27 02:27 · Score: 1

A third's cell phone was on the kitchen counter, unhearable from the bedroom, and the fourth one's cell phone battery had fallen out.
And you called the guy in the hospital before trying the home phones of these two?
Better yet, ssh into their home boxen and blast an mp3. According to /. polls, their computers probably aren't more than 10 ft. from their beds.

--
___
Cognitive Overflow
more than yo
Re:Eeep - scary moderators! by Ralph+Wiggam · 2001-06-26 23:26 · Score: 4

I think the Anne Tomlinson post was a particularly brilliant troll.

A quick Google search for "Anne Tomlinson" returns an orchestra conductor and someone in a retirement community.

If it was a real post, CmdrTaco probably would have ignored it. His good humored response makes me think it was a troll.

Is there any evidence that it was real?

-B
Re:Cisco Support by skullY · 2001-06-27 02:32 · Score: 4

I have to agree with Taco, if they gave this kind of service down at the DMV, they'd be picking up passed out folks left and right.
From this day forward all slashdot editors shall be known as Taco, regardless of what their chosen moniker is. This measure will simplify things drastically. No longer will posters have to do the arduous task of scrolling back to the top of a page to see which editor posted a story. After all, most of the editors don't bother to check what they're posting, why should the readers?

--
When I was able to do my own spam-armoring, you got a chance to email me. Now you can only hope I see your reply.
Re:Microsoft Support by WNight · 2001-06-27 01:16 · Score: 5

http://www.bmug.org/news/articles/MSvsPF.html

I beg to differ.

That article details calling the 900 line, but even with support contracts, most MS tech support reps toe the company line in a distressing fashion.

"Unplug all the unix servers, that'll fix it"
"Upgrade everything to Win2k Adv Serv, that'll fix it"
"Upgrade to SQL Server (from Oracle), that'll fix it."

They seem to have no ability to distinguish which network components could be involved in a problem and are unwilling to accept that you've already localized the problem.

Case in point, there was a problem where two WinNT boxes wouldn't see each other. They both had IPs, they could both ping everything else. They were connected via a 100mbps switch.

We made sure each properly had an IP, that it could reach other machines, that the switch worked, and then swapped ports with two machines that were working just fine. We also tried isolating these two machines on their own switch, to avoid potential IP conflicts.

When we called the support number we honestly described the situation to the tech. He asked what else was on the network. We explained that it was in a different IP range, but on the same switches as a bunch of Linux machines, an OpenBSD box (firewall for the desktop machines), and a couple Suns (doing something for the other department, dunno what.)

He then proceeded to tell us that it was the other computers, despite our telling him that we had isolated the NT boxes in question on their own switch and we still had the problem, but when we put a third computer on, both of the NT boxes could reach it just fine.

We eventually lied to him, telling him that yes, we had unplugged all the unix machines, etc. (Like we're going to just unplug our company on the say-so of a moron, and like two junior techs would have the authority to do so anyway.) So now jim-bob starts to help, by telling us that Win2k is so much better, etc, that we wouldn't have these problems with it, etc.

When we flat-out refuse to "upgrade" to fix this bug, his advice is that we format the drives and reinstall. ARGH!

We finally convince him that these machines are somewhat important and we can't just wipe them every time there's a small problem.

After over an hour with this jack-off, we hang-up, problem unresolved.

We get permission from the boss to call someone in... So we look through our list of contacts and grab someone whose card says they deal with networking and windows. Call him up. As we're describing the problem he listens quietly, grunts affirmatively when we describe how we isolated the problem, agrees that it couldn't be any of the other machines.

Then he says, "It sounds like it's an issue with a bad route, type 'route .....'" We do, and then we reboot. Problem solved.

He said that it, whatever it was, was a very common problem where the machines basically forget how to get from A to B. That command zeroed the routing (which didn't show any bad routes) and the reboot brought it back up.

Cost, a 15-minute phone consultation. $45

Microsoft tech support was basically a sales department, staffed with the marketing rejects.

So, don't EVER believe it if someone tells you that MS supports their products. Any company whose line is "Format and reinstall" has no business calling a product "Server", let alone claiming they're in the enterprise level.

Schon, earlier in this thread, said "Rebooting doesn't solve the problem!!" I wonder what he'd say about formatting and reinstalling.
Re:Tips by parc · 2001-06-27 01:56 · Score: 1

I don't know about the non-fiber portion of Cisco, but the last guy I talked to got $800 or so just for taking our call at 3am. This is on top of whatever they're being paid to answer the phone.
Re:Mad props to CISCO! by delmoi · 2001-06-27 16:46 · Score: 1

Look at it this way:

How much redundancy?!?!? Even if they're on different ---, they're probably still riding in on a loop from the same ---.. and then, they're most likely delivered on the same physical ---.... Don't be too reliant on this setup for redundancy's sake...

You learn things by context. Obviously, soulseller is saying that having multiple T1 lines isn't going to make things any more redundant because they're all going to be part of the same physical connection. If one goes down, they all go down.

--

ReadThe ReflectionEngine, a cyberpunk style n
Re:OSDN, Audit ALL of your systems NOW. by kubrick · 2001-06-27 00:41 · Score: 1

OSDN, Audit ALL of your systems NOW.

They should be well schooled in how to do it after the FluffyBunny cracks...

/me cowers in anticipation of Flamebait moderation. :)

--
deus does not exist but if he does
Re:Thanks for an honest explanation by kubrick · 2001-06-27 00:55 · Score: 1

So, some people think that the editors sit around and mod down trolls, crap, and maybe also some posts that the editor just isn't comfortable with.

Personally, I find this ridiculous. Yes, it is possible that the editors could be doing this, but who would be willing to waste that much of their time? I can't think of a more boring job.

You, too, can intern at VA Linux this summer! Our stock price may not be all that high, so we might find it hard to pay decent wages, but we'll give you unlimited mod points on Slashdot! :)

--
deus does not exist but if he does
Re:First sign of trouble... by SEWilco · 2001-06-27 03:42 · Score: 1

So that means that someone on the Slashdot staff was reading Slashdot at 2 A.M.?
Re:Lessons to learn... by SEWilco · 2001-06-27 03:46 · Score: 1

Apparently there is no written trouble isolation procedure: People just used their network skills to try to find a problem.
Apparently OSDN does not have something like "mon", "Big Brother", or "Spong" monitoring all the equipment and links. Or if they do, they didn't mention looking at the screen and seeing the flashing red "Failure" icons on some equipment.
Apparently the network staff had to be manually paged, due to not having any of the aforementioned monitoring tools.
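
Even without a full monitoring package, a crude reachability probe is not much code. Below is a minimal Python sketch of the kind of check such tools perform. It is not how mon, Big Brother or Spong are implemented, and the function names and target list format are hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(targets):
    """Return the (name, host, port) entries that did not answer."""
    return [entry for entry in targets if not tcp_port_open(entry[1], entry[2])]
```

Run from cron on a box outside the cage, a non-empty result from `failed_checks` would be the trigger to page someone automatically. The paging itself is left out here. A probe that sits behind the same switch as the servers it watches goes dark along with them.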
Lab Notebook by SEWilco · 2001-06-27 03:37 · Score: 2

Science college students may know this as a "Lab Notebook"; there actually is a course which tells you to write everything down and suggests the types of format and details to include.
IP change & DNS TTL by Lazaru5 · 2001-06-27 02:57 · Score: 1

If the only problem was a switch/router config, why was a change of IP involved?

--
My comments and opinions completely reflect those of anyone and anything I am remotely associated with.
Memory leaks.... by dragondm · 2001-06-27 04:40 · Score: 1

Probably memory leaks in your JSP code.
(Yes, java can have memory leaks. Accidentally keeping references you don't need will leak memory, and/or other resources)

Various abuse of the Session object is probably the biggest one fer this.

--
-- -- The Dragon De Monsyne
Re:You know you've been using windows too long whe by Flower · 2001-06-26 23:09 · Score: 2

I wonder if they will investigate syslogging the messages to another box. Would this even be worth the effort?

--
I don't want knowledge. I want certainty. - Law, David Bowie
Re:Eeep - scary moderators! by Flower · 2001-06-27 03:03 · Score: 2

The only damn thing I need to know about this matter is the technical resolution to the outage and the human interest story that someone in the community who wasn't employed by /. came in to help. Hearing a first hand account of Cisco's technical support was interesting and worthwhile.

What I take issue with is not only did the editors divulge that someone quit but that they also labeled her as incompetent (My interpretation of the comments.) That should have never happened. You, I and everybody else here should have never been told that and it boggles me that those who defend this "right to know" BS are chomping at the bit to get the dirt on this alleged dustup. There isn't a single person I know in this field who would want to be put in that spotlight.

But what irks me the most is that this thread is so hot but my other post about syslogging the Cisco which is much more relevant to the article just sits with little discussion.

Christ people. Ditch the tabloid mentality and get back to the Nerd stuff.

--
I don't want knowledge. I want certainty. - Law, David Bowie
Re:Eeep - scary moderators! by Flower · 2001-06-26 23:30 · Score: 5

We have a right to know.

No, we don't have a right to know. Ms. Tomlinson's departure is between her and her employer; not some tabloid expose for a bunch of overly curious rumor mongering conspiracy theorists. I wouldn't be surprised if the people who blurted this out on a public forum haven't been seriously bitch slapped by HR.

As a community it would be best to let the matter drop. I'm sure if you were in Anne's position you'd be severely pissed. A little perspective and some empathy would be appropriate.

--
I don't want knowledge. I want certainty. - Law, David Bowie
Logging.. by schon · 2001-06-26 23:24 · Score: 1

shouldn't such a big, expensive piece of HW have at least some non-volatile storage?

Usually not. I'd guess that it's because a HD would become a potential source of failure (mechanical parts tend to wear out before non-mechanical ones.)

Minimum it should log to a separate box that does have a disk drive

Yes. Every "real" router I've seen has the option of logging to a remote syslog. (I LOVE standards :o)
Good question.. by schon · 2001-06-27 00:07 · Score: 1

Would this even be worth the effort?

Short answer: Yes.

Long answer: Yes, it's ALWAYS worth the effort.

Setting up a remote syslog takes all of 20 minutes and a spare box. It's trivial, even without considering the payoff.
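
To show how little is involved on the receiving end, here is a bare-bones UDP syslog listener in Python. It is a sketch, not a replacement for a real syslogd. `LOG_PATH` and the function names are hypothetical, and how you point the router at it depends on the platform.

```python
import socketserver

LOG_PATH = "router.log"  # hypothetical destination file
SEVERITIES = ("emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug")


def split_priority(datagram: bytes):
    """Split a BSD-syslog datagram '<PRI>text' into (facility, severity, text).
    PRI encodes facility * 8 + severity."""
    text = datagram.decode("utf-8", errors="replace")
    if text.startswith("<") and ">" in text[:6]:
        pri, rest = text[1:].split(">", 1)
        if pri.isdigit():
            return int(pri) // 8, SEVERITIES[int(pri) % 8], rest
    return None, None, text


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        datagram, _sock = self.request  # for UDP servers: (data, socket)
        facility, severity, text = split_priority(datagram)
        with open(LOG_PATH, "a", encoding="utf-8") as out:
            out.write(f"{self.client_address[0]} {facility} {severity} "
                      f"{text.rstrip()}\n")


def serve(host: str = "0.0.0.0", port: int = 514) -> None:
    """Blocks forever; binding port 514 normally requires root."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the spare box appends every message to a file that survives a reboot of the router. Syslog over UDP is fire-and-forget, so messages sent while the network itself is down are still lost.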
Re:You know you've been using windows too long whe by schon · 2001-06-27 02:26 · Score: 3

Who's to say that it's not a 1 PPM problem that won't affect the system again for another hour/day/month/year? Once the packets are flowing again, then you can relax and take the time to root cause the problem and fix it.

And who's to say that the problem that's being experienced will be fixed by a reboot?

We had a server running, one of the things it did was SMB sharing - one of the drives (the one dedicated to non-critical SMB shares, in fact) died.. This box was doing MUCH more than SMB - it was also our internal DHCP, and DNS server

I was out, and one of our MS guys decided "I don't know what all these error messages mean, but I can't see my windows drives, so I'll just reboot it." Because the drive was dead, the machine wouldn't boot. He took the WHOLE DAMN DEPARTMENT OUT - nobody had DNS, and when people's windows machines stopped working, the solution was (guess what?) REBOOT them - so THEY stop talking to the network altogether.

Now, the kicker is that the drives in this machine were hot pluggable. If the reboot hadn't happened, I could have swapped in a new drive, restored from last night's tape backup, and people could have continued working. Instead, because the machine was rebooted the whole department was down for several hours.

The mantra stands - REBOOTING WILL NOT FIX THE PROBLEM. And if you reboot before you know what the problem is, then not only don't you know if it will help at all, but you also don't know if it will make the situation worse.

sometimes getting back online as fast as possible is more important.

That's the trap - there is no guarantee that rebooting will do this - and you might just be screwing it even worse.

Getting back online as fast as possible involves solving the problem first - REBOOTING WILL NOT FIX THE PROBLEM.
Re:You know you've been using windows too long whe by schon · 2001-06-26 23:52 · Score: 4

it may have resolved the problem for a short while

Even though you think you're saying the opposite of what I said, you've hit the nail squarely on the head - rebooting never fixes any problem.

It may temporarily fix the symptom, but the problem is still there.

It is possible for routers, Linux boxes, etc to crash.

Yes, it is. But if they crash, it's for a reason - perhaps there is a bug in the configuration, or firmware; or perhaps it's hardware.. but what's important is that rebooting will not actually fix the problem, all it will do is temporarily alleviate the symptom.

If the problem is with the configuration, then you fix the configuration. If there is a bug in your software, you fix that. If it's hardware, you replace the faulty hardware. If it's firmware, you upgrade the firmware (or replace the unit with a different model, from a manufacturer who actually does quality testing.)

But you do not just blindly reboot - if a reboot is required, you do it after you've discovered WHY the machine has crashed, and you've fixed it. Once again, the mantra is "Rebooting will not fix the problem."
You know you've been using windows too long when.. by schon · 2001-06-26 22:02 · Score: 5

I laughed out loud when I read this:

But, says Yazz, "Since the Cisco was rebooted there were no logs to look at."

You fell into the classic "Windows" trap.. this is what I tell the Jr. tech guys here when one of the servers goes wonky: "If it doesn't work, there is a reason; something is wrong. Rebooting will not fix the problem."

They usually respond with "but I didn't know what else to do."

To which I answer "Repeat after me - REBOOTING WILL NOT FIX THE PROBLEM."

"But I didn't know what else to do."

"Then call someone who does - REBOOTING WILL NOT FIX THE PROBLEM."
Thank you. by AtariDatacenter · 2001-06-26 22:55 · Score: 3

Just wanted to say thank you for the explanation. After all, we are your customers! :) It is really nice to get an accounting of what happened.

BTW: Are you going to plan any redundancy/failover drills as a result of this?
1. Re:Thank you. by Aaton · 2001-06-26 23:56 · Score: 1
  
  Well I'm off to get training as soon as I can with Cisco and Foundry.. I never really played with any Cisco hardware, usually it was someone else's job or done once before I arrived.
  Yazz Atlas
Particularly good point! by Vryl · 2001-06-27 17:51 · Score: 2

This is exactly what we need on the 'net for us sysadmins to read. Failure stories. Why? You don't learn much from success stories, because things worked the first time.
IIRC, Boeing equipped all their maintenance staff a few years ago with laptops, cameras and video editing software. The engineers then made training vids for each other, in particular noting the fuckups, i.e., they were not edited out, but deliberately left in so ppl could know what went wrong.
Re:You know you've been using windows too long whe by topham · 2001-06-26 22:59 · Score: 2

I have seen more 'minor' problems fixed by rebooting than anything else. I am *sure* if you ask those involved they would have preferred to not reboot. But, it may have resolved the problem for a short while. It is possible for routers, Linux boxes, etc to crash. It happens.
Of course, I'm not so fond of Cisco as they are, after having to type "no shutdown" to bring up an ISDN router...
Re:Cisco Support by BacOs · 2001-06-26 22:01 · Score: 1

A client of mine had no support contract with Cisco other than having a Cisco ISDN router that was still under warranty. The tech explained that my client was supposed to have a support contract to get support but fixed the router configuration anyway.
Re:Not like my experiences.. by itachi · 2001-06-27 07:13 · Score: 1

I'd say that hold time is probably tied to your support contract with them. I work for a large uni with a huge userbase and a pricey support contract. The network outage queue is never more than 5 minutes of hold time, if you end up on hold. For RMA or other non-urgent calls, I've waited a lot longer. Of course, our hefty contract means we're paying for that prompt response, but IMHO we more than get our money's worth. Also, the professionalism, I could go on about that for days... Wonderful.
I've also watched an outage start with a fellow engineer on the phone with a router manufacturer who shall remain nameless (not Cisco) where the support tech managed to give advice so erroneous as to cause a network outage that took down our entire core during the middle of the day. Not very fun.

itachi
Re:You know you've been using windows too long whe by itachi · 2001-06-27 07:56 · Score: 1

No, see, they hadn't logged into the console and typed "show log", or they would've seen the failover attempt. In fact, as far as I can tell, they didn't log into any of the network devices before they rebooted ALL OF THEM! Managed network devices are usually pretty helpful in terms of troubleshooting if you go to the trouble of getting console access. In this case, they went with the "Reboot, then troubleshoot" approach, which is dumb.
As for routers and switches being user-serviceable, sure, you aren't supposed to be in there with a soldering iron and a multimeter, but a config is absolutely user-serviceable. If it might be a config error, rebooting will do more harm than good. The only time you should need to reboot a piece of serious network hardware (by price alone, I think we can define a 6509 as serious hardware) is when you have no console access, the lights that are supposed to blink aren't blinking and the lights that aren't supposed to blink are blinking. Or smoke. That might also be a valid excuse. But it would have to be a lot of smoke...

itachi
Really full disclosure? by Platinum+Dragon · 2001-06-26 23:17 · Score: 5

If this is a "blow-by-blow" account, then could someone, I dunno, involved in the mess explain that little comment Taco made for about 20 minutes on Sunday about when the "qualified personnel" arrived, "[they] discovered that she wasn't actuually as qualified as we had hoped. Then she quit, thus terminating 3 local star systems."

Was Rob just popping off at random, or was that little bit removed trying to cover /.'s ass in the face of a potential libel suit?

Jes' wondering...

--

Someday, you're going to die. Get over it.
1. Re:Really full disclosure? by Miss+Tress+Race · 2001-06-26 23:28 · Score: 1
  
  Isn't it interesting that everyone who mentioned "the girl" early in this thread got modded down to the 9th circle of hell, whereas everyone mentioning it now is getting modded up?
  
  Would that be because all the early moddings were done by Slashdot editors trying to sweep the whole affair under the carpet, whereas the later ones were done by actual Slashdot users who are genuinely interested in this messy business?
  
  Come clean, Slashdot! Tell us about Anne Tomlinson!
  
  --
  "An ye harm none, do what ye will" - The Wiccan Rede
Re:Mad props to CISCO! by aenea · 2001-06-26 23:23 · Score: 1

Yes, but you don't have to fill the whole pipe. It gets channelized into T-1's and you only have to turn on (and pay for) as many as you need. And since you don't have to pay for local access on the T's, the cost evens out at a pretty low number (7 T's in our case).
Re:Mad props to CISCO! by aenea · 2001-06-26 23:25 · Score: 1

OK, I should read before posting. I'm talking about a DS-3, not an OC-3.
Re:You know you've been using windows too long whe by lostguy · 2001-06-27 05:17 · Score: 1

I'm sure I'm not telling you anything you haven't seen:

A novice was trying to fix a broken Lisp machine by turning the power off and on.

Knight, seeing what the student was doing, spoke sternly: "You cannot fix a machine by just power-cycling it with no understanding of what is going wrong."

Knight turned the machine off and on.

The machine worked.
Re:Microsoft Support by LordXarph · 2001-06-27 02:30 · Score: 1

I had a support guy go through a complicated debugging procedure having to do with password changes failing under obscure conditions with Win95 clients and NT servers.

Would this be the infamous MIT Realm bug? ("Passwords must be at least 16785 characters in length...")

-Lx?
Re:You know you've been using windows too long whe by mgoff · 2001-06-26 23:20 · Score: 5

While technically correct, you have to look at the bigger picture. Rebooting may not fix the root cause of the problem, but it could very possibly get the system back online. Who's to say that it's not a 1 PPM problem that won't affect the system again for another hour/day/month/year? Once the packets are flowing again, then you can relax and take the time to root cause the problem and fix it.

You can make a case that valuable troubleshooting info is lost when systems are rebooted. I agree, but counter that all good systems should have detailed event logging. Leaving the system online and intact is the best way to root cause a bug. But, sometimes getting back online as fast as possible is more important.
Re:OSDN, Audit ALL of your systems NOW. by El+Volio · 2001-06-27 05:39 · Score: 2

Ehh? Excuse me? Why the fsck do a properly configured serverfarm need firewalls _at all_? Please, enlighten us with your wisdom oh dimwit.
Firewalls _are not needed_ if you're not running services that _should not be running_ on servers for the internet.
Because
Defense in depth is a good philosophy to have, protecting against configuration mistakes.
You are also protected if exploit code is run (say via a buffer overflow that changes hosts.deny).
Firewalls can also protect against low-level attacks that don't attack the services/applications themselves.
When properly configured, firewalls can be invaluable in logging traffic and otherwise keeping out unwanted traffic and IP spoofs -- and can do a far better job than simple packet filtering on a router. That said, anyone who believes that firewalls are the be-all end-all of security is fooling themselves.
I think it's pretty poor form to call someone else a dimwit when you're lacking a lot of info yourself. There's a reason that a firewall is industry-wide best practice for an Internet site or user network, and it's not because we're all dimwits.

--
"You can never have too many elephants on your team."
Tips by addaon · 2001-06-26 22:56 · Score: 1

Do Cisco tech folks take tips? I mean, of course they're only doing their job, but when someone saves your ass bigtime and really impresses you in the process, it's often nice to show some appreciation.

--

I've had this sig for three days.
I suspect... by wiredog · 2001-06-26 23:47 · Score: 2

I suspect that someone from legal saw that, feared a lawsuit, and whacked Taco with a clue-stick. Thus necessitating a quick edit.

--

Best Slashdot Co
Re:Eeep - scary moderators! by wiredog · 2001-06-27 00:33 · Score: 2

That actually makes sense. So, who whacked Taco with the clue-stick to get him to pull the original posting?

--

Best Slashdot Co
Cisco Tech Support over Multiple Days of Problems by drfalken · 2001-06-27 01:37 · Score: 2

I had an experience recently with a Cisco firewall and due to the nature of the problem I would have to wait-and-see days at a time to see if the problem was reoccurring or had been fixed as we tried different things. The tech I was assigned to called me several times a day, was willing to stay on the phone with me for hours at a time educating me on the issues, the technology and walking me through the solution. I couldn't believe it. He emailed several times a day too, as did the Cisco dispatch system that kept track of the issue.

Congratulations to Cisco. They are huge and have a massive install base but provide the best tech support I have ever seen.

I don't even want to talk about my bad experiences with Microsoft's premier technical support.
----------------------------
Let this be a lesson..... by BluSkreen · 2001-06-27 15:43 · Score: 3

...to all of us that do this for a living. Forget for a moment that most here have never set foot in a real data center, much less even own a server. No pros want to see another's network go down (well, most of the time ;-) ), and we don't want ours down. I've spent many an hour looking at an errant PIX, or troubleshooting some other network config. I know what those guys were going through. It sucks...

Don't slack. When you slack it bites you in the ass. Maybe not today, maybe not tomorrow, but someday, someday soon, it will.

Test your failover configs. How? By actually making them fail. During the maintenance window, power that primary router/firewall/load balancer down hard and see if the failover works. It's like testing backups, kids. You have to know they work before you need them.
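
A drill like that is easier to judge if you check the same list of services every time. Here is a minimal Python sketch of such a check. Everything in it is an assumption for illustration: the function names are made up, the host names are placeholders, and "reachable" just means a TCP connect succeeded, which is a much weaker test than a real page load.

```python
import socket
from typing import Callable, Dict, List, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def failover_drill_report(
    services: List[Tuple[str, int]],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Dict[str, bool]:
    """Probe every service once and map "host:port" to reachable or not."""
    return {f"{host}:{port}": probe(host, port) for host, port in services}


def drill_passed(report: Dict[str, bool]) -> bool:
    """Pass only if at least one service was checked and all were reachable."""
    return bool(report) and all(report.values())
```

Run it once before the drill to get a baseline, power the primary down, then run it again: `drill_passed(failover_drill_report([("www.example.org", 80)]))`. If the second run fails, you found out during the maintenance window instead of at 2 a.m.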

Realistically develop on call strategies. OSDN didn't really have a net ops staff of four. One had quit (why are they counted?), one was in hospital, and two had weak "couldn't reach my cell phone" excuses. That just don't work in the real world. If you are on call, you are on call. The "phone too far away" and "battery fell out" just don't cut it in the adult world of professional net ops. Get a satellite pager, and if you are on call, make sure it's on, and near you so you can hear it.

Don't bash your employees/former employees, particularly during a heated situation. Shows no class. Besides, if you are such a hot shit, grab that console and fix it. Otherwise, keep your mouth shut. Besides, who is in charge of making sure the people that are hired are qualified? Hmmm?

Document your shit. It's not that hard. Visio can do much of it for you. I'm going to break an NDA here, but the Exodus Service Agreement states that all machines and cables are to be labeled. That is so when the dude (or dudette) has to leave the NOC and enter your cage to reboot your lame box, they know what is going on. Also works well for when your net ops staff is too concerned with getting drunk or laid and your poor programmers have to go in to fix the network.

Some folks really went above and beyond, but it seems to me that the management severely dropped the ball.

Is VA really ready to abandon the hardware market for software services? One has to wonder.

Dave
been there before...
Re:You know you've been using windows too long whe by anticypher · 2001-06-26 23:22 · Score: 2

There are times you need to reboot a cisco, particularly during your CCIE exam. :-)
At the start of the 2 day CCIE exam, the proctors casually mention they knock off points for un-necessary rebooting of routers. But the progression of the modules in the test will likely wedge a routing protocol, requiring a reboot, and they are really looking for those monks wise enough to know when to reboot.

IOS is an amazing mess of spaghetti modules, and the fact they work together so well is a testament to cisco's dev test and solution test people. But sometimes the appletalk routing module will choke, and a reboot is the only remedy. Or NetFlow forgets, or Policy Routing doesn't. But a wise cisco expert will copy the logs and generally preserve the state of the machine for analysis after the reboot in case the machine doesn't come back. But wise cisco experts cost a lot of money.
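
The "copy the logs before you reboot" habit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented, the default runner executes commands in a local shell, and getting output from a router console session is left to whatever `runner` you plug in.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Iterable


def run_command(command: str) -> str:
    """Run a local shell command and return its stdout followed by stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr


def snapshot_before_reboot(
    commands: Iterable[str],
    out_dir: Path,
    runner: Callable[[str], str] = run_command,
) -> Path:
    """Save the output of each diagnostic command to one timestamped file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / time.strftime("pre-reboot-%Y%m%d-%H%M%S.txt")
    with path.open("w") as fh:
        for command in commands:
            # One header line per command so the file is easy to skim later.
            fh.write(f"### {command}\n{runner(command)}\n")
    return path
```

On a Linux box something like `snapshot_before_reboot(["uptime", "dmesg"], Path("snapshots"))` would do. Write the file somewhere that survives the reboot, preferably on another machine.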

It's a good mantra in networking: "REBOOTING WILL NOT FIX THE PROBLEM"

the AC

--
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
Re:Responsive techies by anticypher · 2001-06-26 23:57 · Score: 5

Yes, but have you tried dialing that number when Slashdot wasn't down?

Rumour has it the conversation went a little something like this:
[Kurt] Hi, cisco tech support?
[TAC] Yes
[Kurt] this is Kurt at slashdot...
[TAC] Oh my god, it's about time you called us. You've been offline for nearly 24 hours, we're all going through withdrawals. Hang on a sec, our top techs are dying to help.

I talked to a friend in cisco TAC (Brussels) who said that they regularly lurk on /., and in the TAC they could see it was a major network outage since the whole of the OSDN sites were unreachable. Nothing to do but wait, or answer calls from other customers :-)

Since summer weather had come to Europe, I, personally, did not notice the outage. But I promise in the future to not have a life.

the AC

[Note to Kurt and company, make sure you return your customer satisfaction survey. Those TAC folks live and die based on keeping a very high level of sat scores. I think they need a 4.85 (on scale of 1 to 5) just to keep their jobs within cisco, and a 4.89 to get a raise. So 5's across the board, and in the comments put a link to this /. story for their manager]

--
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on
Re:Cisco Support by kordless · 2001-06-27 02:09 · Score: 1

Actually I have a couple of puny little 2500s and no contract and they still helped me with an IOS bug a year after I purchased them - eventually giving me a free upgrade of the IOS. If for ANY reason (IOS/hardware) a Cisco is having problems I guarantee that Cisco will help you.

About a year and a half ago I bought a used 3600 and then found out via Cisco tech support that that particular run of 3600s had a hardware bug and they RMA'd the damn thing, shipping me a BRAND NEW unit before I returned the other one. Past that, I called later with a problem of my own causing, and they still had a couple of their techs help me out.

All in all, Cisco does a great job of supporting their hardware.

Shameless plug: Check out Grub!
Re:Oracle? You must be joking! by bungo · 2001-06-27 19:42 · Score: 1

>small branch (...but managing all of Benelux no less..), and get hardly any more info than you
>have.". And no, that particular problem (RMI in Jserver crashing after several hours of just
>sitting there..) has not been fixed in a week.

I just had to laugh when I saw your comment. I also get my Oracle Support from Benelux, and I
just happen to also have an Apache/Jserv related problem outstanding for a while.

One thing though, your Oracle office was telling the truth about being a small branch, and not
being able to do too much. All of the real work goes on in the US, and the local offices don't
really have a lot of contact with Oracle US.

A little disclosure is in order here. I worked for one of the Benelux support offices for 2
years as a manager of one of the support groups - this was after working in two of their other
offices, including the US HQ. I was amazed how much of a backwater the Benelux offices were. In
fact, I quit out of frustration after a fight with my boss - the support center manager - because I knew
that we weren't giving an acceptable level of support.

--
"The best part? I became an ordained minister while not wearing pants." -- CleverNickName
Disorganized? by NateTech · 2001-06-27 02:27 · Score: 3

I've seen this scenario over and over again... one guy who knows and understands the network, ten people standing around at the equipment trying various silly commands to fix it when it's down...

Here's some suggestions -- you probably already realize that 90% of your pain was avoidable, but everyone has to learn "the hard way" the first time, right?

We've got piles of printouts and documentation of all sorts, drawings and spreadsheets, helping us keep track of every IP and machine in this cage, yet it doesn't seem to get any clearer...

That's called bad documentation that no one ever reads.

Get your networking guys to document TROUBLESHOOTING techniques and to teach the programmers how the network is actually set up and why. You have plenty of talent capable of understanding how it all works there.

Get more than one way (cell phone) to reach your most important network engineers. Pop for a guaranteed delivery text pager and ask them to carry that as well as the cell phone.
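
The idea boils down to an ordered list of people and channels that gets walked until somebody answers. A rough Python sketch, with all names hypothetical and the actual paging or dialing left to the `notify` callback you supply:

```python
from typing import Callable, List, Optional, Tuple

# (person, channel), for example ("alice", "pager"). Names are placeholders.
Contact = Tuple[str, str]


def escalate(
    contacts: List[Contact],
    notify: Callable[[str, str], bool],
) -> Optional[Contact]:
    """Try each (person, channel) in order; return the first that acknowledges.

    notify must return True only when the person actually acknowledged.
    Returns None if nobody on the list could be reached.
    """
    for person, channel in contacts:
        if notify(person, channel):
            return (person, channel)
    return None
```

List every engineer at least twice, once per channel, so a dead cell phone battery only costs one step instead of the whole night.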

Yazz says, "When I arrived at Exodus, Kurt and Dave were trying every combination of things to do to get the 6509 back. But neither they nor I even knew the Cisco Passwords."

Paper. Wallet. Put them there. Better yet, PGP encrypted password escrow somewhere that anyone can get access to, and a locked cheap fire safe at the office with the public and private PGP keys on a CD-R inside -- for just this type of scenario.

So I asked the Cisco technician, Scott, to telnet into our switch...

Bad bad bad... telnet = bad. Good network security always goes out the window when the network's down...

So he's in the switch and he's disgusted and horrified by how we have it configured...

This is probably the most important hint during your entire outage... your network people either don't know what they're doing, or you're not ALLOWING them to do their jobs, or they're understaffed, or whatever other excuses can be made up ... your call, but don't forget this -- if Cisco's "horrified" by your configs, there's a serious issue you need to find and correct somewhere in your organization. Everything from training, to documentation, to troubleshooting procedures needs a serious walk-through.

The one card going bad wouldn't have been such a big deal if the config in both were set up correctly. It was meant to flop to the other interface if the primary card died, which it did, but not with all the info it needed... AKA it was misconfigured...

DO FAIL-OVER TESTING. If you'd have done a fail-over test of this config you'd have known it didn't work correctly during a nice scheduled time when your network engineers are available and at the equipment, instead of the middle of the night during an outage with all of them MIA. This is so easy to avoid.

Exodus really wasn't set up to handle the type of failover the 6509 was meant to do. That's what the Cisco folks said basically, and the Exodus people are no longer supporting this type of Cisco in their setups.
Nice of them to tell you. Who is the customer here again?

...he's talking to me on my cell phone (which disconnects every 5 minutes just to make my day more challenging)

Put a $20/month POTS line in your cabinet for goodness sake!

That's enough... I'm appalled, but hopefully you will straighten out some things now that the site was down for an extended period. Done properly, network downtime should be a rare event, usually caused by human error, not by bad configuration.

Many outages are unavoidable; yours sounds like it was avoidable, and certain steps could have been taken to minimize its length.

--
+++OK ATH
Re:Uptime by vectro · 2001-06-30 02:17 · Score: 1

I think `everything' in this context refers to all the network equipment, not the servers. All the OSDN services were down, so it obviously wasn't a problem with just the slashdot servers.
Re:More Writeups Needed by Tackhead · 2001-06-27 00:12 · Score: 5

> Really, what Linux (and other geek subjects) need is to have a Great Book of Failure Stories -- writeups like these that detail horrible outages, downtimes, misconfigurations, security hacks, etc., so that we all can learn from others' mistakes.
What you said.
I did a bit of (very junior-level) sysadminning back in my day.
First thing the BOFH told me was "Buy a hard-cover notebook. Not spiral-bound. Not softcover. Write down everything you do. Feel free to doodle and write obscenities if you like. Someday you'll thank me for this".
I was a bit befuddled, and then he showed me his notebooks. Five years of dramatic fuckups and even more dramatic recoveries. His own personal "deja.google.com" (but it was 1992, and long-term USENET searching hadn't been invented yet, hell our office was using UUCP!) for everything he'd had to work out from first principles on his own.
And thus was the PFY enlightened.
(And yes, I did buy him a beer in late 1992, when something I wrote down in mid-1992 jumped off my page and saved my ass.)
Wow! Inside Slashdot! by Old+Wolf · 2001-06-27 16:36 · Score: 1

So.. how much would you pay for the chance to go into that big room with all the whizzo big computers and routers and wires, and see a PC and go "Hey! That's slashdot!! :D!"
If this isn't 31337 then I don't know what is..

(Maybe a topic for next week's poll)
Re:You know you've been using windows too long whe by homebru · 2001-06-27 03:37 · Score: 1

Well it usually does with windows

Rebooting does not solve the problem because Windows is the problem.
Re: Cisco Support by homebru · 2001-06-27 03:52 · Score: 2

Hmm... Australian chicks on an 800- number. I got to get me one of them Cisco thingies.
Re:Cisco Support by mr100percent · 2001-06-27 07:51 · Score: 1

How do you reboot a cisco router?
Re:Cisco Support by mr100percent · 2001-06-27 07:54 · Score: 1

But are they all CISCO certified?
Re:More Writeups Needed by Chalst · 2001-06-27 16:04 · Score: 2

You really have missed Gibson's point. His point is, how much worse the situation will be if most DDoS attacks are spoofed. Then one can't do what he did (i.e. contact the ISPs of the zombie machines).

AFAIK, the only way to spoof IP addresses in Windows is to install a new networking stack, and that is difficult to do in the kind of generic way that zombie clients work, for reasons Gibson discusses in an article at his site.
Not like my experiences.. by mjh · 2001-06-26 21:47 · Score: 5

So I called Cisco tech support. I wish I had done this sooner. I was amazed first of all by how you can talk to a qualified Cisco tech immediately... we're talking an 800 number that you dial and within less than a minute you are talking to a technician.

While I agree that I usually get someone at cisco who knows what they're talking about, it is very rare in my experience that it happens in only a minute, although it does occasionally happen. A much more common experience is to wait on hold for 15-20 minutes, but I have waited on hold as long as an hour with them.
All of that being said, I would have to agree that cisco's TAC is probably one of the best tech support groups I've ever worked with.

--
Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.
1. Re:Not like my experiences.. by Kizeh · 2001-06-26 23:09 · Score: 1
  
  Access to TAC depends. This was a valid
  network emergency (priority 1) case, and in
  those Cisco really has amazing response times
  and quality. With regular cases you don't have
  to stay on hold either: give them a number
  and they'll call you back. Or just open the
  ticket via their web page. We end up
  opening cases once a month or so and I can't
  remember ever having to languish waiting on the
  phone.
2. Re:Not like my experiences.. by Ghost-in-the-shell · 2001-06-27 02:48 · Score: 1
  
  Nope, Not kidding at all.
  
  --
  -Ghost
3. Re:Not like my experiences.. by Ghost-in-the-shell · 2001-06-26 23:24 · Score: 2
  
  I don't see how this kind of response is so hard to believe from a company who develops and supports mission critical systems.
  
  After working Nortel's 3rd level support for 2 years this kind of response is what TRUE customer support is. You call an 800 number and instantly get a person who KNOWS what he/she is doing, and will log in and work on your hardware/software issues.
  
  This is not your Mom and pop ISP we're talking about, this is Cisco, they do RTOS's for Telephony and Datacom, they have a whole department of people who are in the office 24/7 for just this kind of thing. They more than likely also have a 2nd/3rd level group with someone on call, in case the problem is beyond the ability of the 1st level guy (i.e. a software patch needed, or product specific problem).
  
  When it comes to customer support, I think a big lesson can be learned from the big 4 (Cisco, Nortel, Alcatel, Lucent). They have refined customer support, so when your switch (whether it be Datacom or Telephony) goes down, you get to talk to someone RIGHT AWAY. Now that's service.
  
  --
  -Ghost
4. Re:Not like my experiences.. by ruckc · 2001-06-27 01:00 · Score: 1
  
  umm lucent? you have to be kidding right?
5. Re:Not like my experiences.. by mkbz · 2001-06-26 23:15 · Score: 1
  
  While I agree that I usually get someone at cisco who knows what they're talking about, it is very rare in my experience that it happens in only a minute, although it does occasionally happen. A much more common experience is to wait on hold for 15-20 minutes, but I have waited on hold as long as an hour with them.
  
  well now you know the difference- you have to name-drop Slashdot to the FIRST person you talk to!
  
  "We're Slashdot, dammit! and we're down! and it's your fault! ah, now that i've got your attention, well... we're not slashdot exactly, we're ... um, dot slash ... apachectl start... "
  
  --
  www.pixelectric.com
cell phone on counter by jgennick · 2001-06-26 22:43 · Score: 1

A third's cell phone was on the kitchen counter, unhearable from the bedroom,
LOL! I'd probably put it farther away than that to avoid being awakened at 7:00 am on a Sunday.
OSDN, Audit ALL of your systems NOW. by montey · 2001-06-26 22:44 · Score: 3

Someone kind of alluded to this but MY GOD are your security procedures busted!

Point 1./ Why do you allow TELNET in to your routing/switching equipment from the outside world? If a CISCO tech' with the password can do it then a hacker without the password likely can too.

Point 2./ If you are connected to the Internet in any way NEVER replace your firewall with a cross over cable. Basically at that stage you have your pants around your ankles, are bent over, with a big "Do Me Now!!!!!" sign on your butt!
1. Re:OSDN, Audit ALL of your systems NOW. by gclef · 2001-06-27 01:18 · Score: 2
  
  re: telnet: welcome to the world of Cisco, son. It wasn't until *very* recently that Cisco even offered ssh-supporting IOS versions at all. It's still very rare to see those versions in use. Now, connecting the management interface of the router to the 'net has some issues of its own, but it is not unusual...If you're gonna play with Cisco gear, you're gonna use telnet. Deal.
  As for replacing the firewall...you really need to get rid of the idea that a firewall is some magic security dust. It is nothing more than a router with an attitude. It controls which services are accessible from which IPs. Nothing more. If you've configured the machines behind it well, the firewall is basically unnecessary. It's useful to do access control, but that's about it.
2. Re:OSDN, Audit ALL of your systems NOW. by tomknight · 2001-06-26 23:35 · Score: 1
  
  I imagine it's because they were fucked, desperate and had no options....?
  
  --
  Oh arse
Exodus Waltham IDC by thetechweenie · 2001-06-26 23:16 · Score: 1

Unfortunately I've spent many nights there in the past. I can confirm their remarks about Cisco support. It's usually fantastic! Lucent should take a few pointers from them, especially the guys in their INS group. They did forget to mention the security there. If you're not in the Exodus database, you won't get past the little lobby with the bullet proof glass. You've gotta use your proximity card like six times just to go to the bathroom!

--

Um, this is my sig.
One part I don't understand... by adjuster · 2001-06-27 06:51 · Score: 1

A Cisco 6509 is not a cheap piece of hardware (yours was at least $30K w/ what you've described in it). Especially if you've got redundant supervisor engines, and doubly so if you're doing layer-3 in the box. That having been said, why couldn't you have hired a competent technician to install the box properly when you installed it in the first place, rather than having a half-assed configuration loaded?
I doubt your setup would've been more than three or four hours of configuration, based on what you've described, and you'd have gotten decent documentation out of all of it, if you'd hired a good technician. It's quite obvious that one visit by a technician at $250.00/hr would've more than made up for the cost of the downtime and headaches you incurred as a result of having a poor configuration in the first place.

--
The Attitude Adjuster, I hate me, you can too.
Re:You know you've been using windows too long whe by mrhartwig · 2001-06-27 06:11 · Score: 1

You made your point with capitalized letters but still.. If everything has failed and you have nothing to lose why not give it a shot?

If everything else has failed & you haven't found the problem, you haven't tried everything. Granted, there are things like memory leaks that won't purge themselves, or total hangs where the box is unresponsive, but if you're in there looking at the system, you should be (a) finding and fixing, or (b) recording everything you can to find the problem later.

Because sometimes, in my experience, you do have to reboot. You've got 2,000 users sitting on their thumbs and the regulatory commission wants data & is going to fine you for every hour you're late and the division VP is on the conference call wanting to "help" and.... It may be better to reboot & get the thing back up and running, but even so, do not assume you've fixed the problem.

Except in Windows, for those problems that don't require a re-install. :-)
Re:How many times do we have to say it... by bconway · 2001-06-27 03:29 · Score: 2

That's not the point. Some of us don't want people to know that our machine exists.

--
Interested in open source engine management for your Subaru?
Re:Cisco Support by norton_I · 2001-06-27 12:11 · Score: 2

Yeah, but you wouldn't think you'd have Joe Average asking why their Oracle setup is generating redo logs at 200 MB/minute under light load. Yet, unless you have platinum support ${SO FUCKING HUGE NUMBER YOU CAN'T BELIEVE IT} you have to sit there explaining to first tier tech support what a redo log is.

Once you get to second tier, Oracle support is pretty good, though not spectacular. But of course, you have to pay $BIGNUMBER to get any support at all, and unless you pay even more, they hang up the phone at 5pm.
Re:Cisco Support by norton_I · 2001-06-27 12:19 · Score: 2

Once you escalate a few levels they are OK. And the people I have talked to were friendly and honestly trying. They just were clueless. I worked with Oracle for a year and a half, and knew more about it than just about anyone I talked to there.

On the other hand, the one Oracle consultant I talked to was a genius. Unfortunately, he was way, way, way too expensive for us, and the 3rd party consultants we talked to sucked total ass.
One error by AppyPappy · 2001-06-27 01:55 · Score: 1

Instead of getting an answer from Exodus and running to Cisco with it, and then back again, he got Cisco and Exodus engineers to talk directly to each other and work it out.

That's a damn lie. Engineers from separate companies always require some clueless idiot to act as a go-between.

--
If you aren't part of the solution, there is good money to be made prolonging the problem
Re:Cisco Service by ZeissIcon · 2001-06-27 02:08 · Score: 2

This is a little off topic, but I've seen some other similar things, so, my $.02: I just had an amazing experience with tech support at Penguin Computing. I have a server that was in immediate danger of losing its IBM Deskstar 75 GB drive. The following is an excerpt from the letter that I wrote to them thanking them; the new drive (under warranty) was in my hand less than 24 hours after my phone call. 22 hours later the server is back on line with all of the data restored. My total cost? Nada.

Much to my surprise, my call was answered by a human. They asked how they could help -- I told them, and I was immediately connected to another human. No hold time, no muzak, no "press 9 if your laptop is on fire" messages. The fellow that I talked to -- I regret to say that I'm not sure of his name; I thought it was *******, but your customer service rep, *********, says it was ********... at any rate, he sounded British, if that helps -- was courteous, and, much to my surprise, he ... listened. No script, no checklist, no "let's spend three hours going over all of the stuff that I had figured out before I called tech support in the first place." He let me talk, asked two very specific questions about the contents of the log file, then simply agreed with my assessment that the hard drive was in imminent danger of failing, and that a new one would be shipped to me right away. Despite the fact that it was about 4 o'clock on the east coast when I called, **** informs me the new drive will arrive this morning.

I was stunned. I can honestly say that even when I was working in an environment where we paid over $100K/year for support to Sun, I have never been treated with the courtesy, respect, promptness and knowledgeable professionalism that I was when I called Penguin Computing yesterday. I felt that my problems were of genuine concern and that everything possible was being done to correct them.

I promise you that from this point on, as long as you keep doing what you're doing, I will never buy a server or Linux workstation from anyone else.
More Writeups Needed by tarsi210 · 2001-06-26 22:05 · Score: 5

This is exactly what we need on the 'net for us sysadmins to read. Failure stories. Why? You don't learn much from success stories, because things worked the first time.

"Welcome to the HOWTO. My setup worked the first time. Why didn't yours?"

Granted, no one wants to see stuff on the 'net go down (and we're glad you're back, /.) But writeups like this one and Steve Gibson's at GRC about the DDOS attacks are priceless. They show what people have tried, what hasn't worked, what did work, and definitely where to start the next time.

Really, what Linux (and other geek subjects) needs is a Great Book of Failure Stories -- writeups like these that detail horrible outages, downtimes, misconfigurations, security hacks, etc., so that we all can learn from others' mistakes.

--
Blog,Twitter
1. Re:More Writeups Needed by RedX · 2001-06-27 00:07 · Score: 2
  
  Definitely agreed. The GRC DDOS article was one of the most interesting reads I've had on the web in quite awhile, and this article ranks right up there. In fact, I have been drifting away from reading tech "stuff" on the web recently, and these two articles are really piquing that interest again and providing the motivation I've been lacking to get moving on finishing my CCNP and CCIE studies.
2. Re:More Writeups Needed by DawnHorse · 2001-06-26 23:17 · Score: 1
  
  OK, can you give a URL for this DDOS reference?
  
  --
  !#
3. Re:More Writeups Needed by wierdo · 2001-06-28 04:51 · Score: 1
  
  You really have missed Gibson's point. His point is, how much worse the situation will be if most DDoS attacks are spoofed. Then one can't do what he did (i.e. contact the ISPs of the zombie machines).
  
  If all ISPs filtered at their borders, or even a significant portion of them did, the number of spoofed attacks would matter very little.
  
  AFAIK, the only way to spoof IP addresses in Windows is to install a new networking stack, and that is difficult to do in the kind of generic way that zombie clients work, for reasons Gibson discusses in an article at his site.
  
  This is a fallacy. With WinPCap you can forge packets, without a reboot. Even if it does require a reboot, how many Windows users would think anything of it if their system suddenly crashed? What an easy way for the trojan that downloads WinPcap to force a reboot.
  
  Face it, the man is a loon... (Not as big a loon as me, but still)
  
  -Nathan
  
  --
  Care about freedom?
  Become a card carrying member of the GOA.
4. Re:More Writeups Needed by wierdo · 2001-06-27 02:20 · Score: 2
  
  Steve Gibson's at GRC about the DDOS attacks are priceless.
  
  Read with a skeptical eye. There are several points he makes in there which are quite idiotic. Note how he rants about Microsoft finally implementing a real IP stack for Windows. Then compare that against his own admission that the attacks against him, and indeed, most DDoS attacks are not spoofed. Then check out the myriad ways to spoof source addresses in Windows, even with Microsoft's current half-baked IP stack.
  
  That article makes Gibson look like the bullshit artist he is. That said, it's certainly better than nothing, assuming that everyone that reads it can sort out the crap from the good parts.
  
  -Nathan
  
  --
  Care about freedom?
  Become a card carrying member of the GOA.
5. Re:More Writeups Needed by plcurechax · 2001-06-26 23:38 · Score: 5
  
  If you haven't heard of it before, get your browser over to RISKS digest or comp.risks. Forum On Risks To The Public In Computers And Related Systems
  I discovered the RISKS digest when reading about software engineering, and it has certainly helped me think about failures and recovery when designing and building systems.
  There is also the underlying thesis in the article about how complexity, whether in a bungled redundant network connection or just a large, poorly documented, poorly tested, and poorly configured system, is your enemy in building reliable systems. A lot of systems were built like Slashdot during the dot.com IPO craze; I wonder how many of us rely on such poorly built systems?
  Building complex, reliable networks is hard, and expensive. About 3 times what your estimate is, which is about 2 times what your boss expects to spend.
6. Re:More Writeups Needed by I)_MaLaClYpSe_(I · 2001-06-26 22:56 · Score: 1
  
  I agree!
  Please don't let this great idea be devnulled! Slashcrash.org is way the best idea since ThinkGeek
Hear, hear! by DoomHaven · 2001-06-27 09:13 · Score: 3

I hereby propose the term "anne-tomlinson", or "tomlinson" to describe the act of departing a company in the most suspicious of circumstances, known only to a very privileged few. Used in the following example:

X: "What happened to Anne?"
Y: "I don't know; all I know is that she anne-tomlinsonned from work."

Note that this verb should have the subject of the remark used as the subject of the verb, and the organization left as the indirect object. This should be adhered to regardless if the subject quit, was fired, laid off, died, disappeared, never existed, or there was a mutual decision for the subject to leave. In fact, the verb should mainly be used when the method of departure is unknown or never officially stated (or, even officially acknowledged).

Also note that this verb should NOT refer to a person leaving another person, as in "Fred's now-ex-wife had tomlinsonned from him." The number of people (one or more) that are the subject should be less than the number of people who the object represents.

Continuing on, this verb should NEVER be applied in a self-referential manner, IE: "I anne-tomlinsonned from them". This implies that the subject either A) knows the reasons, and is just being a prick about not stating them, or B) does not know the reasons due to massive thick-headedness.

Lastly, this term should only be used to convey the sense of impenetrable mystery surrounding the departure. It would be oxy-moronic to state: "Ted tomlinsonned because he was bored and wanted to leave." If the mystery surrounding the departure is penetrable, use another phrase.

anne-tomlinson, v,: to leave or be removed from a group under extremely odd, and mysterious, circumstances; especially when the actual method of departure or initiating party of departure is unknown. More especially, when the actual departure is apparently covered up or left un-acknowledged.
tenses: anne-tomlinson, anne-tomlinsons, anne-tomlinsonning, anne-tomlinsonned, had anne-tomlinsonned.

--
"Don't mind me cutting myself on Occam's Razor"
Re:Cisco Support by scoove · 2001-06-26 21:55 · Score: 4

Well, we have a $LITTLE_NUMBER support contract with Cisco, and have had similar with two previous companies.

Our results were much the same. Very, very responsive people.

I have to agree with Taco, if they gave this kind of service down at the DMV, they'd be picking up passed out folks left and right.

*scoove*
Must be nice... by pogle · 2001-06-26 23:03 · Score: 1

for Cisco's folks to be so empowered. I know at my tech support job I can run around my wheel for hours before being told anything of merit. During a DOS attack on my university, the operations folks did not correctly diagnose the problem for hours, and left me in tech support with about 8000 angry students calling in...I wish more places would follow Cisco and give the techs some real power.

And I love the review of what went wrong. Reminds me of similar situations with missing computers...check everything and with everyone, and no one knows where computer is, call more people, try more logs, nothing, then the one guy who's out of touch comes in and tells us he has it...

--
http://thechubbyferret.net - Ferret pictures and informative links.
Re:So, what was the problem...? by RedX · 2001-06-27 02:20 · Score: 2

And more details:
Half the VLANs were only stored on one unit and the other half of them on the other. So when one died, the survivor only knew half of the full setup and couldn't route things correctly, since the VLANs it needed weren't there.
Basically the network was fine as long as both cards were up, since each could share its half of the VLAN info with the other. Once one card went down, the other had no idea what to do with traffic to/from the other half of the VLANs.
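
The failure mode described above can be sketched as a simple consistency check. This is only an illustration under stated assumptions: the module names Kyle and Stan come from the article, but the VLAN IDs and the `missing_vlans` helper are made up here, and nothing below reflects how the 6509 actually stores or syncs its configuration.

```python
# Hypothetical model: two redundant routing modules that each hold
# only part of the VLAN table (VLAN IDs are invented for illustration).
def missing_vlans(configs):
    """Return, per module, the VLANs defined elsewhere but absent on it."""
    all_vlans = set().union(*configs.values())
    return {name: sorted(all_vlans - vlans) for name, vlans in configs.items()}

configs = {
    "Kyle": {10, 20, 30},
    "Stan": {40, 50, 60},
}

gaps = missing_vlans(configs)
for name, missing in gaps.items():
    if missing:
        print(f"{name}: cannot serve VLANs {missing} if it has to run alone")
    else:
        print(f"{name}: holds the full VLAN table")
```

A check like this only passes when every module reports an empty list, which is the condition a working failover pair would need.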
Oracle? You must be joking! by BlueUnderwear · 2001-06-26 23:00 · Score: 2

> Cisco, mid-1990's Novell, and Oracle are the only organizations I know of that provide this kind of help.
Oracle? Maybe if you live in the US. Around here we get the line "Sure we entered your bug report into our database. However, we are unable to tell you when it will be fixed. Maybe next week, maybe in ten years. Sorry, we are only a small branch (...but managing all of Benelux no less..), and get hardly any more info than you have.". And no, that particular problem (RMI in Jserver crashing after several hours of just sitting there..) has not been fixed in a week. Actually, we still haven't heard back about it, even though it was reported last autumn.

--
Say no to software patents.
What really happened... by selectspec · 2001-06-26 22:49 · Score: 1

Kurt:JESUS CHRIST!!! Don't go in there man!
Dave:What the hell is it!
Kurt:I don't know but its big and its pissed off!
AARRRrrhhhggh....
Kurt:That was the Exodus admin. I told him not to go in there!
Yazz:Shit! Its the fucking Lameness Filter man! The Lameness Filter!!!
Kurt:It must have mutated or something, my god its turned on us! I don't understand... wait... what is that sound...
ARAAARAAAAHHHHHHHhhhhhh

--
Someone you trust is one of us.
Re:Thank you by selectspec · 2001-06-26 23:00 · Score: 1

What about the chick that quit or was fired in the middle of the crisis. I want to see photos of Rob crying when the boys at VA told him to get the site back up or he'd lose his job. This isn't a blow by blow. This is a cover up. The Powers That Be are pulling the wool over our eyes.

--
Someone you trust is one of us.
Just like a real job... by NothingCleverToSay · 2001-06-26 23:56 · Score: 2

Reading the description of this outage was just like a day-to-day description of life at my job. I'm not a network engineer, I'm a software developer. And explaining this stuff as part of my job to non-programmers is next to impossible.

My job description involves lots of doing requirements, designing solutions and then implementing those solutions in software (with some testing thrown in if the PHB will allow for it in the schedule). All of this sounds like normal programmer fare. But then a production outage rolls in... Some client calls saying the system is down, or the data is corrupt, or the response times are unacceptable, or whatever, and the firefighting mode goes into full gear. All of the developers go into full debug mode: is the database OK, did the software get changed, is this a code bug or something environmental, etc. Everyone brainstorms, tries ideas and eventually the problem is solved. Sometimes the developer who wrote that module/class/program/function is on vacation, or [s]he was a contractor who quit or was fired last week. Sometimes a sysadmin helpfully made a system change that fubared all of the software's configuration, sometimes an idiot developer hardcoded a value that should be dynamic. And the options go on and on.

What surprises me is how regularly this occurs in all of the various software jobs I have had. And how large a part of my job this kind of real-time, customers-are-waiting-and-we're-losing-money-for-every-second-of-outage production debugging really is. I would ass-u-me that if I ever worked on an air traffic control system or NASA flight software that the testing would be rigorous enough that this would not happen. But the reality seems to be that in most software jobs, good enough is good enough, and bugs really do happen in production systems.

Thanks for the cool blow-by-blow analysis of the outage. Of course, I'd prefer if /. was just up and available 24/7, but the reality is, there will always be problems. It's interesting to read about what they were and how they were fixed.
Re:How many times do we have to say it... by supz · 2001-06-27 07:22 · Score: 2

Blocking pings makes it slightly harder for crackers to find your machine when they are scanning huge subnets at a time... They won't want to waste their time running a port scan on a machine that they're not sure is up.

I was recently running a network security scan at my place of work, pinging machines to see if they were up before scanning them, and was unaware that ICMP traffic was dropped by default. So when the results turned up only about 3 vulnerabilities on a network saturated with Microsoft products, I was a little skeptical.
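
Here is a toy model of why that scan came back so clean. It sends no real traffic; the addresses, ports, and the two helper functions are all invented for illustration, assuming a scanner that only probes hosts which answered a ping sweep.

```python
# Toy model (no real network traffic): each host either answers ICMP echo
# or silently drops it, and has some set of open TCP ports.
hosts = {
    "10.0.0.1": {"answers_ping": True,  "open_ports": {22}},
    "10.0.0.2": {"answers_ping": False, "open_ports": {80, 139}},
    "10.0.0.3": {"answers_ping": False, "open_ports": {445}},
}

def ping_first_scan(hosts):
    """Port-scan only the hosts that answered the ping sweep."""
    return {ip: h["open_ports"] for ip, h in hosts.items() if h["answers_ping"]}

def scan_everything(hosts):
    """Skip discovery and probe every address."""
    return {ip: h["open_ports"] for ip, h in hosts.items()}

found = ping_first_scan(hosts)
everything = scan_everything(hosts)
missed = sorted(set(everything) - set(found))
print("ping-first scan saw:", sorted(found))
print("hosts it never scanned:", missed)
```

The hosts that drop ICMP never make it onto the scan list, so their open ports never show up in the report.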

--
SuPz.orG
Re:You know you're a cranky old grognard when... by kimihia · 2001-06-27 09:37 · Score: 1
It's the hardware.

They don't have a hard drive in them. All they have is N megabytes of Flash RAM that has to store the OS, and the configuration.

If they stored those logs in the Flash RAM then:
- They'd run out of it (logs in RAM are rotated after they exceed N lines)
- They'd wear it out (not much of a worry, but after 100K rewrites Flash RAM doesn't work quite so well)
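
The "rotated after they exceed N lines" behaviour is easy to picture as a fixed-size buffer. This is a minimal sketch, not how any particular router firmware does it; the `RamLog` class and the limit of 3 lines are assumptions chosen to keep the example short.

```python
from collections import deque

class RamLog:
    """Keep only the newest max_lines entries, like a fixed-size log in RAM."""

    def __init__(self, max_lines):
        # A deque with maxlen silently discards the oldest entry when full.
        self.lines = deque(maxlen=max_lines)

    def write(self, message):
        self.lines.append(message)

    def dump(self):
        return list(self.lines)

log = RamLog(max_lines=3)
for i in range(1, 6):
    log.write(f"event {i}")
print(log.dump())  # only the three newest events remain
```

Memory use stays bounded and nothing is ever written to flash, at the price of losing the oldest entries, which are often the ones you want after a crash.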
Re:Eeep - scary moderators! by willis · 2001-06-27 02:23 · Score: 2

Word up -- we don't have the right to know.
It's funny what people believe they actually have a right to.

--

there is no thing
what else could you want?
"about five minutes" by JoeGee · 2001-06-27 06:20 · Score: 1

The next time you hear those words bring a rechargeable shaver, a toothbrush, toiletries, a sleeping bag, and a charger for your cell phone. It sounds like you had to learn the hard way that "about five minutes" is geekspeak for "let's take a sail on the S.S. Minnow." :)

--

Get off my virtual lawn, you damned virtual kids!
TV Movie by phunhippy · 2001-06-27 00:37 · Score: 1

So when DOES The streaming WEB-based movie short of this disaster come out? Who will play cmdrtaco?

:)
1. Re:TV Movie by sik+puppy · 2001-06-27 01:14 · Score: 1
  
  Divine?
  
  --
  The first thing we do, let's kill all the lawyers. Shakespeare, Henry VI, Part 2, Act 4, Scene 2
Re:Thanks for an honest explanation by kevinank · 2001-06-27 00:53 · Score: 2

This still doesn't explain what Rob meant about "she wasn't actually as qualified as we had hoped." But then, I kind of suspect I don't really want to know either.

The natural English interpretation of a layman would be that she didn't fix the problem. In other words, it is probably a slightly hyperbolic reference to the fact that the problem persisted even after the network tech had arrived. I wouldn't read into it that Rob was disparaging the actual technical skills of the person in question (unless he has those skills himself he'd hardly be in a position to judge), but rather take it as a simple tongue-in-cheek reference to the distance between skill and expectation (ie: as we had hoped..)
It is easy to hope that your techs can walk on water. It is slightly less common to find people who take a shortcut across the duck pond on the way to work.

--
LibBT: BitTorrent for C - small - fast - clean (Now Versio
It MUST have been a FREAK OF NATURE! by Greyfox · 2001-06-26 23:10 · Score: 3

I mean, you're never supposed to get competent support on the 800 tech support line! The guy you talked to is probably due to be transferred out to some other department because he's obviously way too smart to be anywhere near a phone.
Wish the companies I deal with on a regular basis ever showed that level of skill when I need help. well... hmm... actually Speakeasy is generally pretty good about accepting that my problem is accurately diagnosed and figuring out what's wrong. And Viewsonic the other day was able to provide refresh-rate specs on a monitor I wanted to order within about 60 seconds of my placing the call (Though they dropped the ball by not having the specs I wanted available on their web page) What is this trend of good service? It's scaring me...

--
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Cisco Support by randombit · 2001-06-26 21:45 · Score: 2

we're talking an 800 number that you dial and within less than a minute you are talking to a technician...

Can anyone familiar with Cisco, besides people working for /., confirm this? I'm curious how much is Cisco's good customer support and how much is the fact that OSDN probably has a $BIG_NUMBER support contract with Cisco.

Just curious.
1. Re:Cisco Support by bonoboy · 2001-06-26 22:07 · Score: 2
  
  Their reputation here probably isn't as good as it is in the States. I know they're still a million times better than Microsoft, but they do give you the run around.
  
  For those that know: we've had problems where a switch was unable to ping a management interface on a directly connected router. It was running dot1q on the ports. They tried to tell us that we should try configuring ISL. It was a gigabit port - therefore ISL doesn't work (unless they've fixed it recently). I've also had them tell me that we can't configure multiple management vlan interfaces on a switch. Fine, I say. But I have six identical 2900s here, all with the multiple management ints, same IOS, all working. "They're all broken. The one that doesn't work is the one that's working right."
  
  Turns out they were right, you're not supposed to be able to do that. But "all your other switches are broken???" Please! And the number of times I've been asked to upgrade IOS....
  
  --
  toeslikefingers.com - because
2. Re:Cisco Support by Caspuh · 2001-06-27 03:54 · Score: 1
  
  Until thousands of retards follow your advice and flood Cisco with phone calls whenever their software/os/whatever appears to be down.
3. Re:Cisco Support by Caspuh · 2001-06-29 03:44 · Score: 1
  
  I know several people who could prove otherwise. They will go as far as making the Cisco guy write entire router configs over the phone.
4. Re:Cisco Support by zulux · 2001-06-26 23:57 · Score: 1
  
  I can vouch for their support, even on small stuff.
  I have a client that needed their frames bumped up from two channels to six. I fancied myself clever, and studied all the Cisco FAQs I could find, and proceeded with the Qwest people to change the frames.
  It didn't work. So the Qwest person calls up a Cisco person on a conference call. Turns out - I'm an idiot. I had changed the label that the router uses to display information, not the actual number of frames that it used. The Cisco guy was pleasant and nice; it took him a few seconds to figure things out. We never received a bill from Cisco - and the router was only worth about $1100.
  For my own use, Cisco is a bit expensive - I can afford to tinker around. But for the clients that depend on me, I'm sticking with Cisco for the time being.
  
  --
  Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
5. Re:Cisco Support by biohazard99 · 2001-06-28 12:59 · Score: 1
  
  Don't know about the routers, but when I worked for @home at my university, procedure was to pull the plug on the ubr 95x switch/hubs for 5 minutes, and plug it back in, since they had disabled the hardware reset switch on the back.
  
  --
  Read my plan to save the Bengals
6. Re:Cisco Support by petard · 2001-06-27 08:19 · Score: 2
  
  These things are neither cheap nor simple... anyone who has access to one (*not* thousands of retards!) and has a support contract can tell when it's a software/os/line/etc. problem. Those folks who can't are kept far away from the router.
  
  --
  .sig: file not found
7. Re:Cisco Support by petard · 2001-06-26 23:49 · Score: 3
  
  We're a very small installation and get similar response. If you've paid for a support contract and it even smells like a router problem, the fastest way to fix it is to call them right away. They are a model tech support organization.
  
  --
  .sig: file not found
8. Re:Cisco Support by Octospider · 2001-06-27 08:48 · Score: 1
  
  Their tech support is very good. You get the number from their website, and without any sort of customer info you talk to a tech ready to help you... Kinda interesting if you ask me....
9. Re:Cisco Support by dtr21 · 2001-06-27 07:02 · Score: 2
  
  Not wishing to sound too cynical, but they do have a big advantage over other companies when providing tech support
  
  If you're calling CISCO tech support, there's a pretty good chance you know something about PCs and networking. You're not going to get Joe Average asking how to reboot a Cisco Router :)
  
  I guess this means that the calls they do get are serious - and require someone with a good level of product knowledge to fix. It means that their support staff can speak in technical terms to their callers, which is always significantly faster than trying to explain in layman's terms. I would also imagine that the average time between calls for each customer, averaged out over all customers, is probably a *lot* higher than for many companies.
  
  Although, all that having been considered, hats off to them for what sounds like an excellent job.
10. RE: Cisco Support by BB_WOLF · 2001-06-27 01:10 · Score: 1
  
  I have had similar experiences. Several at 3 in the morning. The support is generally followed up by an email survey request and a phone call from their customer service. Almost every time I have called Cisco, one of the first questions from the technicians is "can I access this device?" I like calling at 3 in the morning and talking to some chick in Australia with a sexy accent :)
11. Re:Cisco Support by zilvester · 2001-06-27 04:18 · Score: 1
  
  Yep - as a sys admin who has worked for a few ISPs (in the UK) I'd rate Cisco 1st and probably Sun second. In both cases the most welcome thing you can hear (at 3 in the morning when everything is flashing red and your mobile is going off every 30 seconds) is somebody who a) works out how technical you are, and talks to you at your own level, and b) rings you up the next day to see if everything is all sorted (and is there anything more they can do to help). In both Cisco and Sun's case I would say that the equipment is expensive - but at the end of the day, it's worth paying the extra to know that you can count on them.
12. Re:Cisco Support by ErikTheRed · 2001-06-27 04:07 · Score: 2
  
  Oracle? Please. Granted, the installations I've worked with have been small, but my philosophy is if you pay the $$$ they ask for support, you should get something other than a complete jackass at the other end of the line.
  
  In our case, we had a completely reproducible problem where a query would take over an hour to set up a plan (granted, it was an obscenely complex query, but still). We even had the problem narrowed down to how Oracle processed index statistics on certain tables. In several attempts over more than a year we never got any response, even a stupid one.
  
  I won't argue that Oracle is one hella-stable piece of software - I've got an Oracle server running on NT(!) that has had no unscheduled downtime in over 4 years. But to say that their technical support sucks ass is the understatement of the century.
  
  --
  
  Help save the critically endangered Blue Iguana
13. Re:Cisco Support by proxima · 2001-06-27 03:04 · Score: 2
  
  We are simply a little ISP, with an old Cisco 1603 router, out of standard warranty (we paid for no additional tech support). Just last week we called to get some questions answered, and I had an excellent conversation with a friendly Cisco technician that answered every question - even ones unrelated to my original case. The best part? It was for free, a "courtesy call" since we've never needed any kind of tech support before.
  
  I've actually had a string of really good tech support stories recently - some companies get it right, and they in turn have a very loyal customer. And a shareholder - I own stock in Cisco (I bought right above their recent 52 week low =).
  
  Chances are, customers of Cisco will have as good of tech support experiences as many of us attest.
  
  --
  "The universe seems neither benign nor hostile, merely indifferent." --Carl Sagan
14. Re:Cisco Support by _ganja_ · 2001-06-27 07:10 · Score: 2
  
  ISL does work on Gig now but I'd recommend dot1q as it has a lot lower processing overhead (it does less :-)
  Cisco support seems great if you don't know anything about Ciscos and you've dealt with poor support in the past; the setup is how support should be done. Be warned, however: the 1st liners vary a lot in technical ability, but this is the same as with any support I guess. As stated above, they tend to fall back to "please upgrade to the latest IOS" if they are unsure of things.
  #7066
  
  --
  A journey of a thousand miles starts with a brutal anal raping at airport security
15. Re:Cisco Support by bwohlgemuth · 2001-06-27 02:08 · Score: 1
  
  Not only that...if you get confused on an IOS issue, open a TAC case and they will answer it for you. That is the reason why Cisco has the market share it does...
  
  B
  
  --
  Flamebait .sig for sale, low mileage, one owner only.
  Serious inquiries only.
16. Re:Cisco Support by scott1853 · 2001-06-28 01:03 · Score: 1
  
  How do you screw things up so much you need tech support? There are two possibilities:
  
  1. It worked.
  
  2. I fucked up. Re-format and don't do that again. Make note to self about the need for backups.
17. Re:Cisco Support by wierdo · 2001-06-27 01:09 · Score: 1
  
  Caveat: Cisco basically does not have first level support (i.e. "'Is the router plugged in?' 'What's a router?') - you are supposed to have second level knowledge and have completed the first level troubleshooting before you call TAC.
  
  Heh, I've been on vacation (unfortunately, I work with small offices, so no budget for other techs, just me and my cellphone), when the Cisco 804 (which, while being just about the best ISDN router out there, blows goats anyway) lost its config for whatever reason, and the Office Manager called up TAC, and they walked her through getting it configured well enough to dial up to their ISP, and then the guy telnetted in and reconfiged the whole deal. Did a better job than when I configured it, too.. :) I'd still like to know why it spontaneously lost its startup-config though. Must have been that Florida lightning...
  
  -Nathan
  
  --
  Care about freedom?
  Become a card carrying member of the GOA.
18. Re:Cisco Support by j_snare · 2001-06-29 03:39 · Score: 2
  
  Once you escalate a few levels they are OK
  
  True enough. We called once asking for some assistance on decoding trace files, just in general, and were pointed to several fairly useless TARs. I was surprised that the first level folks even knew what SQL was, but they started knowing their stuff once you escalated. But like you said, they were still friendly and trying to help.
19. Re:Cisco Support by Leon+da+Costa · 2001-06-26 22:01 · Score: 1
  
  I've dealt with Cisco Support on quite a number of occasions and issues... and I have to say: it varies greatly from time to time. On one occasion, I had everyone but Sam Hilabi himself on the phone when I was struggling with a relatively simple BGP issue. On one other occasion, it took them a week to get back to my initial call, only then to figure out they needed a deeper expert when I was stuck with an ATM-QoS-question. Asking for access to your network equipment (remote login, etc) is common - it's their most secure way of verifying your current situation. Proposing and acting out changes is completely alien to me, however. Just my $0.02...
20. Re:Cisco Support by timt25 · 2001-06-26 22:06 · Score: 1
  
  I am extremely impressed with a company who has a support staff that can overnight a 6509 switch when needed, and when you call them at 2 am, you get "routed" to a TAC somewhere on the planet that is not only awake during their normal business hours, but also speaks your language!
21. Re:Cisco Support by zerofoo · 2001-06-27 04:59 · Score: 1
  
  I work for a small company. The only Cisco product we own is a PIX 506 firewall....the cheapest one they make, and I have to say that Cisco tech support is top notch. I've gotten techs on the phone in under 5 minutes, and when it's longer they page me and I call them back. Cisco stuff is expensive (including the service contracts) but it's worth it. -ted
22. Re:Cisco Support by Zaknafein500 · 2001-06-26 23:57 · Score: 1
  
  TAC somewhere on the planet that is not only awake during their normal business hours, but also speaks your language!
  
  This, I must say, is the most frustrating part of calling tech support. I have no issues with people migrating to the US to work. However, if someone is going to work in a phone support position, it should be requisite that they speak clear, coherent English. There is nothing more frustrating when systems are down than dealing with a support rep that you can't understand and can't understand you.
  
  --
  
  "The guide is definitive, reality is frequently inaccurate."
23. Re:Cisco Support by tb3 · 2001-06-26 23:02 · Score: 2
  
  Okay, this is on an entirely different scale, but I have had terrific, free tech support from PowerQuest. I am constantly messing with my partition setup with Partition Magic and Boot Magic, and once in a while I manage to screw something up and make a partition unbootable. Tech support at Powerquest has always helped me out and I've never lost a partition.
  It's also nice that they assume, because you are using PartitionMagic, that you're not a luser and you have some clue as to what you're doing.
  
  "What are we going to do tonight, Bill?"
  
  --
  www.lucernesys.com Horizon: Calendar-based personal finance
24. Re:Cisco Support by chip_s_ahoy · 2001-06-27 14:34 · Score: 1
  
  In the future, all restaurants are Taco Bell?
25. Re:Cisco Support by anonymous+cupboard · 2001-06-27 15:06 · Score: 1
  
  Unfortunately, I can't say the same. I was doing some work on a client site. They had routers, but they had been bought s/h and no support contract was yet in place for them. It wasn't even clear who could support them at one stage.
  We did a setup at the client site and had a load of problems with the DECnet protocol stack. IP worked great, but we needed an IOS upgrade at a weekend. Despite even having a credit card at hand, it was almost impossible to purchase the upgrade. We eventually had to buy the s/w in another country (it was not downloadable) and then the local distributor (who we eventually identified) was able to release their copy of the s/w to us.
  Cisco themselves were extremely uncooperative, passing everything down the line to distributors and resellers who had neither the inclination nor the skills to provide 24x7 support.
  The client was a securities trading house, not the best people to upset when you look to the share price and Cisco were definitely bad-mouthed over that one. As a contrast, another client of mine had a memory module go late at night. The system was supported only on a 9-5 contract. We had no problem to get an engineer and the part and we only paid for the labour cost, not that of the parts, as it could have failed 9-5.
  Maybe Cisco works better in the US, but in Europe, I can't rely on them.
26. Re:Cisco Support by sgups · 2001-06-27 23:35 · Score: 1
  
  yes..if the future is same as one indicated in demolition man..watch the movie to figure out what i am saying
  
  --
  Democratic USA - Government of the corporations, by the Corporations, for the corporations.
27. Re:Cisco Support by blang · 2001-06-27 00:57 · Score: 2
  
  ... and Oracle are the only organizations I know of that provide this kind of help
  Oracle? I have many times tried to get useful support from Oracle. Out of maybe 20 "experts", 3 or 4 were worth listening to. Some were plainly unqualified, and the less they knew, the more arrogant they were. Others were sales people masquerading as technical people. Some were just body parts sticking out of a suit.
  You must clearly have had no contact whatsoever with Oracle, or you must be working for them.
  
  --
  -- Another senseless waste of fine bytes.
28. Re:Cisco Support by mbenson · 2001-06-27 02:10 · Score: 1
  
  I know people there. I've seen the work load, and call volume. I've seen the dedication of the Cisco TAC from an Admin stand point. You could only beat Cisco support by living in a perfect world where support isn't needed. Try getting this support from Juniper...hell try getting a living person at Juniper
29. Re:Cisco Support by nostromo53 · 2001-06-27 03:53 · Score: 1
  
  I was also with an itsy-bitsy company - computer
  staff of 1...me. I had a Cisco router to set up and
  didn't have a clue. My experience was very much
  like Taco's. In fact, that was over three years
  ago and I still keep in touch, albeit
  infrequently, with the tech who helped me.
  
  cheers,
  
  -mm
30. Re:Cisco Support by The_Candyman · 2001-06-27 12:21 · Score: 1
  
  I would have to say that most are Cisco certified, and at the "upper level" 99% are single or double CCIEs.
31. Re:Cisco Support by The_Candyman · 2001-06-27 12:33 · Score: 1
  
  Actually the link is right on the front page: "Hot Jobs @ Cisco" on the right column about 1/2 way down the page.
32. Re:Cisco Support by The_Candyman · 2001-06-27 12:49 · Score: 1
  
  I will confirm this for you. Cisco support is the best ever. True if you have a network down emergency you get top priority, but don't all companies (that really have it together) do this? As for a "nominal" issue, if you call in you don't sit on hold for an hour, you talk to a live person, just not the top support person. As for submitting a web tech question, you usually get a faster response. If you have further questions feel free to e-mail me and I will explain further.
33. Re:Cisco Support by VPNguy · 2001-06-29 09:12 · Score: 1
  
  You're right about the response being faster if you open a case at the Cisco TAC web site. If you open a case on-line, you go to the head of the line in the queue. So if someone opened a case by phone and has been waiting for 10 minutes, the case you opened on-line takes priority over theirs and you go first.
Re:hang on, i can do this... by Mononoke · 2001-06-27 06:09 · Score: 1

My my, what a putz you are.
--
NetInfo connection failed for server 127.0.0.1/local
Isn't this coming from a public company? by Rascally · 2001-06-27 00:15 · Score: 1

I think the SEC needs to be informed about this...a public company lying to its "customers" (aka, the Slashdot readers)? Hmmm. Time to drop them an email, I think!
Re:You know you've been using windows too long whe by inburito · 2001-06-27 05:11 · Score: 3

You made your point with capitalized letters but still.. If everything has failed and you have nothing to lose why not give it a shot? This is exactly the situation that these people were in! It might not work and you've lost what, a couple of minutes.. But, it might work and your day is half saved..
Just because someone screwed up at your work doesn't make your mantra a universal rule.. Especially when dealing with something like a router or a switch.. These things are normally not meant to be user serviceable and will take a reboot just fine (no hot swappable drives there).. You could have hit a 1ppm problem and rebooting just brings everything back online until statistics kick in again. A little uptime is better than none.
Sure it won't fix anything per se, but getting things normalized enables you to start concentrating on the problems at a less hectic pace..
Mad props to CISCO! by Chanc_Gorkon · 2001-06-26 21:58 · Score: 2

I can only say one word: WOW! Pretty kewl! I am sure glad we have Cisco hardware now! :) We used to be an IBM-only shop, even for switches! But Cisco bought IBM's networking out, so we are switching to Cisco. We still have a big IBM switch left, but they are swapping it out for a gigabit switch. We're going to be 100 Mbit to the desktop, with a gigabit backbone. Not sure what we are going to do with the T-1's (we have 4 T-1's...some ask why don't you go with a OC-3? 4 T-1's are probably cheaper and provide redundancy...nuff said).

--
Gorkman
1. Re:Mad props to CISCO! by Chanc_Gorkon · 2001-06-27 10:22 · Score: 2
  
  Our problem has never been the lines, it's been the hardware...mebbe a CSU/DSU would die or one of the T's routers would die. We also have two providers (our network folks have 4 t's and the telecom folks have some too as well as a OC-3....two different providers). Also, there's a cost savings too. This may change soon though as we go client server and need the ability to stream video, as well as pump pretty web pages to students registering and paying fees from home.
  
  To those flaming me about my knowledge of networks/computers....No, I don't know everything about computers or the networking hardware, but can you ADMIT you don't? I know a lot, but I sure as heck don't know everything. At least I can admit I don't know it all. In my mind it makes me a better person too! (As I always like to learn! :)
  
  Anyway, Cisco, you did a good job! I only wish HALF of the manufacturer support lines were as good as you!
  
  --
  Gorkman
2. Re:Mad props to CISCO! by Banjonardo · 2001-06-27 12:12 · Score: 1
  
  Ok, soulseller: re-read that post. That made ABSOLUTELY, POSITIVELY no sense to me. Really. Damn. Not that I know anything about networking, mind you, but.... damn!
  
  --
  -----
  Score 3? For what? Being wrong, at length? - smirkleton
3. Re:Mad props to CISCO! by Leon+da+Costa · 2001-06-26 22:06 · Score: 3
  
  > (we have 4 T-1's...some ask why don't you go with a OC-3? 4 T-1's are probably cheaper and provide redundancy...nuff said).
  
  ehr... 4 T1's = 4 x 1.544 Mbit. (=6.176Mbit... der)
  1 OC-3 = 155.52 Mbit.
  
  Not really similar, eh :-)
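
  For anyone who wants to check the arithmetic, a small Python sketch follows. The helper name `aggregate_mbit` is made up for illustration; the line rates are the nominal figures quoted above.

```python
# Nominal line rates in Mbit/s, as quoted in the comment above.
T1_MBIT = 1.544
OC3_MBIT = 155.52


def aggregate_mbit(line_rate_mbit: float, count: int) -> float:
    """Total nominal capacity of `count` identical lines, in Mbit/s."""
    return line_rate_mbit * count


four_t1 = aggregate_mbit(T1_MBIT, 4)
ratio = OC3_MBIT / four_t1

print(f"4 x T1   = {four_t1:.3f} Mbit/s")
print(f"1 x OC-3 = {OC3_MBIT:.2f} Mbit/s")
print(f"OC-3 / (4 x T1) = {ratio:.1f}x")
```

  So a single OC-3 carries roughly 25 times the nominal capacity of four T1s. The comparison says nothing about cost or redundancy, which is a separate argument.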
4. Re:Mad props to CISCO! by SoulSeller · 2001-06-27 05:49 · Score: 2
  
  How much redundancy?!?!? Even if they're on different IXCs, they're probably still riding in on a loop from the same LEC.. and then, they're most likely delivered on the same physical DS3.... Don't be too reliant on this setup for redundancy's sake...
5. Re:Mad props to CISCO! by 44 · 2001-06-29 04:54 · Score: 1
  
  Actually after reading this, I have had the need to call Cisco tech support in order to get help installing a new IOS on one of my 2600 series routers. Since it was an upgrade needed to recognize a new expansion card, they gave us the IOS free, and have been excellent in the support installing it (since we've never done this).
  
  --
  ------------------------------- 44 because 43 is too low and 45 is too high
they're gonna have a field day by cheezus · 2001-06-27 02:39 · Score: 4

Rob, on behalf of all the script kiddies out there, I would like to thank you for disclosing exactly how OSDN's network operations are set up :)

---

--
/bin/fortune | slashdotsig.sh
I hope the people at Netscape are listening. by Delrin · 2001-06-26 22:30 · Score: 1

They could take some pointers from Cisco, about what "tech support" is. Anyone here ever deal with Netscape/AOL over the phone? And I don't mean for client support, we're talking iPlanet and commercial product support.
1. Re:I hope the people at Netscape are listening. by Delrin · 2001-06-26 22:55 · Score: 1
  
  You know, I'm not surprised that it was good in older versions.. Seems to me that the 3.63+ problems were mostly related to all the fancy Java features and what-not. If it was Sun who did the tech-support, then they are certainly to blame for one of the worst tech-support experiences in my career. Things like "Oh, you didn't install the complete server package (core only), oh well we don't support that configuration" types of responses. I guess I had some pretty high hopes for iPlanet, with my previous experiences with IIS in hand. And the sad truth about Apache is that my hopes to use it in a commercial environment falls on deaf ears.. Open-source they say, well that can't possibly be secure... sigh.. nothing like the finance sector. :)
2. Re:I hope the people at Netscape are listening. by InsaneGeek · 2001-06-26 22:41 · Score: 2
  
  Then you are talking to Sun, Aol pretty much gave them iPlanet for hardware. Netscape at one point actually *did* have good support, I think I called them once maybe twice on an issue... of course that was back in days of version 2 enterprise server which kicked some major butt. Later versions seriously sucked when they started adding in "features" that other products did so much better (search engine for example).
  
  I washed my hands of their server line a few years ago, but I still have a sticker on my car from certification training in remembrance of their old product.
3. Re:I hope the people at Netscape are listening. by InsaneGeek · 2001-06-26 23:38 · Score: 2
  
  It would depend upon when you contacted them, here semi-recently (2 years??) it'd be over at Sun, before then you'd have gotten the old Netscape. I've not had the best luck with Sun's support (gold support contract on 30+ enterprise servers) so I'm guessing it's them.
  
  If I remember correctly cnn.com ran on ver 2 for a number of years after ver 3 came out (they're on ver 4 now). I loved ver 2, none of this java crap, just nice clean config files that did exactly what it was meant for (webserving) and did it faster than anyone else at the time. I did get ver 3 to run OK but it took forever going through this file and that file and yanking out crap I didn't need, ver 3 seemed like it was meant for "intranet's" not for main webservers, all this distributed admin, indexing, etc. crap it's not used for the outside world.
  
  When we flipped to ver 3 all I can say is thank god for Irix's vswap. Just to run, it wanted to allocate something like 2 gig of swap space; it never used it, just allocated it on startup, and if it didn't have this space it would kill its main process. Vswap allowed me to say the system had more swap space than it really did (as long as you NEVER request it you're fine). This was back when storage wasn't cheap like it is today and essentially throwing away space wasn't looked kindly upon by the higher-ups.
Re:How many times do we have to say it... by Sc00ter · 2001-06-26 23:20 · Score: 2

Very true.. I never understood why you would firewall ping? The machine still needs to take time to dump the ping request, so you can still be DDoS'd.. and it makes troubleshooting a pain...
also, isn't it against some RFC or something?

--
Free Mac Mini
They weren't even trying! by he-sk · 2001-06-27 03:00 · Score: 1

Several hours of this sort of network debugging went on until 3:00 AM Sunday. By then we had called Cisco for help. They couldn't help us until they saw the switch config and got a chance to review it. We were spent. We had to go to bed and stay down for the night.

WTF?! There were Dave, Kurt, Hemos, Yazz, and possibly others. Not one person could stay up? They all were too tired? You couldn't take shifts?
Yeah right.
I mean, how could you even sleep, realizing that
not only Slashdot, but also NewsForge, freshmeat, OSDN.com, ThinkGeek, and QuestionExchange were down, along with our old -- but still popular -- MediaBuilder and AnmationFactory sites.

--
Free Manning, jail Obama.
Re:You know you've been using windows too long whe by Caspuh · 2001-06-27 04:02 · Score: 1

You only have one DNS server?
Re:You know you're a cranky old grognard when... by John+Miles · 2001-06-27 11:55 · Score: 2

Ah, that makes sense. I was assuming that these things had their own hard drives.

Considering what their routers cost, I'm surprised Cisco doesn't give you a dedicated journaling drive. Guess they don't fail often enough to justify anything that elaborate.

--
Dahlmann tightly grips the knife, which he may have no idea how to use, and steps out into the plain.
Re:You know you're a cranky old grognard when... by John+Miles · 2001-06-27 03:40 · Score: 3

I'm not sure I understand that. Why does the router purge its logs when you reboot it?

That sounds lame as hell. (Granted, though, configuring a Pipeline 50 goes right over my little bow head, much less a Cisco. So yes, I'll stipulate that I'm talking out of my ass here.)

The act of rebooting should be just another event that gets logged, NOT a synonym for "oh, and by the way, you can delete the old log file now."

IMHO log deletion should be done on a calendar basis; everything more than x days old gets purged automatically. What's Cisco's rationale for auto-deleting logs during the boot process?
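
To make the idea concrete, here is a rough Python sketch of what I mean. It is not how IOS or any real router handles its logs; `log_reboot` and `purge_old_entries` are names invented for illustration, and the log is just an in-memory list of (timestamp, message) pairs.

```python
from datetime import datetime, timedelta


def log_reboot(entries, now):
    """Record a reboot as an ordinary log entry instead of clearing the log."""
    return entries + [(now, "system rebooted")]


def purge_old_entries(entries, now, max_age_days):
    """Calendar-based purge: keep only entries newer than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [(ts, msg) for ts, msg in entries if ts >= cutoff]


now = datetime(2001, 6, 27, 3, 40)
log = [
    (now - timedelta(days=10), "interface flapping"),
    (now - timedelta(hours=2), "high CPU"),
]

log = log_reboot(log, now)                          # reboot adds a line, deletes nothing
kept = purge_old_entries(log, now, max_age_days=7)  # only age removes entries

for ts, msg in kept:
    print(ts.isoformat(), msg)
```

With a 7-day window the 10-day-old entry is dropped, while the entry from just before the reboot survives it, which is exactly the one you would want when debugging.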

--
Dahlmann tightly grips the knife, which he may have no idea how to use, and steps out into the plain.
Re:Eeep - scary moderators! by Dr_Cheeks · 2001-06-27 19:42 · Score: 2

Phew! Thanks for all the replies - this now rates as my most discussed thread ever by about a bazillion posts. In particular, thanks to Kurt Gray, who gave a decent response (even if we're still left with questions). Although I'd suggest that Jamie McCarthy might want to take a day or two off work and lose some of that stress : P
In all truth I was simply asking for the full story - it seemed like some pertinent facts were being swept under the carpet in the write-up, and this is the last place I ever expected that to happen. I don't want to know what names people called each other; just let us know if the original story was crap or not. If it was just Taco suffering lack of sleep, then say so. If there was a dispute and someone quit then say so.
I don't care what the tech's name is; I just don't like it when people "wash" stories to avoid anything that may reflect badly on them. Don't just tell the truth, tell the whole truth.
And as for the modding down of posts - I guess no-one's talking about that and it's not something that we're likely to hear about any time soon. Sure, it could be the Slashdot editors, but it could equally be a bunch of rampantly loyal moderators who don't want to upset the status quo, but who're rarely seen since Slashdot rarely gets criticised much on its own site. Don't go running away with the conspiracy theory idea, OK kids?
OTOH, if you've calmed down a bit Jamie (or any other editors), would you care to give us a definitive answer as to whether the editors can/would moderate down like this? Try to remember that we all like coming here and expressing ourselves, and that's the only reason we're asking.

--
Re:Yeah, more details please! by Dr_Cheeks · 2001-06-27 19:59 · Score: 2

OK, maybe she was - I don't know. But that's the point - I don't know. At the time I posted this it just seemed like everyone was pretending it didn't happen.
But check out http://slashdot.org/comments.pl?sid=01/06/27/12420 7&cid=86 - a thread I started to ask what was going on when I noticed everyone asking about this was suddenly hitting -1 despite it clearly being a popular question. Kurt Gray makes some worthwhile input, and Jamie McCarthy shoots his mouth off too, though the question still isn't really cleared up.

--
Yeah, more details please! by Dr_Cheeks · 2001-06-26 22:19 · Score: 4

Yeah, they made such a big deal out of how rubbish she was (and later re-worded things to sound less misogynistic); basically pointing the finger at the support people, but this write-up is all glowing adverts for Cisco and doesn't mention anything like that.
C'mon, tell us the full story!

--
1. Re:Yeah, more details please! by cmclean · 2001-06-27 00:00 · Score: 1
  
  C'mon, tell us the full story!
  Maybe she was that rubbish? ;-)
  cmclean
  
  --
  "Any similarity between the hooting of a million eager monkeys and Slashdot is purely coincidental." -THEFLASHMAN
Eeep - scary moderators! by Dr_Cheeks · 2001-06-26 22:50 · Score: 5

Hey, try browsing at -1 nested - seems like everyone who's questioned the story about the woman who "quit" (Anne Tomlinson?) has suddenly been modded down. I'm not usually one for conspiracy theories, but is this surprising anyone else? I think that the question couldn't really be more on topic, and they're hardly flamebaits or trolls - what's up?
BTW, feel free to mod me down, prove my point and compound my paranoia; I've got karma to spare : )

--
1. Re:Eeep - scary moderators! by hyperizer · 2001-06-27 01:10 · Score: 2
  
  One thing you don't want to do is publicly flame someone who still has your root passwords...
  
  Now might be a good time for you to change these!
2. Re:Eeep - scary moderators! by _xeno_ · 2001-06-27 01:02 · Score: 1
  
  And what will be the Microserf's report to his boss about Slashdot's reaction? "Boss, we floated the shared-source balloon, and nobody seems to care -- they're awful concerned about a woman named Anne someone who doesn't even seem to exist." "Excellent. Deploy the death ray, oops I mean the we-share-our-source meme."
  Um, like they'd give a damn about what people on Slashdot said. Wow, someone didn't get enough sleep after the weekend. (Of course, I didn't either, but that had nothing to do with Slashdot. Seeing as it wasn't up.) Don't forget - if anything, Microsoft is worried about what their partners, their consumers, and their third-party developers think about the Shared-Source idea.
  Windows developers have been able to get Windows source for some time now - it ain't cheap, but if you need it - you can get it. (Of course, these are the places that already spend upwards of $2500 per developers annually - the charge for getting access to the Windows source is small change to them.)
  The hobbyist developer has never really mattered to Microsoft - one of the reasons they've ignored Linux and BSD for so long. They don't care.
  My reaction to shared source? Big deal. I don't really care - as a student, who works on mostly small projects, I could really care less that Microsoft is planning on expanding an already existing program and rebranding it as "Shared Source." However, things like a Slashdot story decrying someone and then disappearing pique my curiosity - what's up with the editors? What are they hiding?
  So yeah, maybe some people are worried about Shared Source or care for whatever reason. I don't - it's MS trying to find a way to get the benefits of the "bazaar" while remaining in the "cathedral" to use that overly used metaphor. So?
  
  --
  You are in a maze of twisty little relative jumps, all alike.
3. Re:Eeep - scary moderators! by _xeno_ · 2001-06-27 00:32 · Score: 2
  
  My personal bet is that Taco was misinformed and there never was an "Anne Tomlinson." Don't forget, head-honcho Taco lives in Holland, Michigan, which is a distance from Waltham, Massachusetts. Anything he heard about it would have been second-hand - Taco isn't a primary source. (Unless he randomly was in the area for some reason - according to Yahoo! Maps! it's! a! sixteen! hour! drive! from Holland to Waltham. (Note: remove any spaces in the URL; there were none in the preview, but it's happened in the past - is this fixed in Slash 2.0?))
  I think we do have a right to know if there really was a woman from Cisco who quit over her dealings with the VA Linux crew - but probably nothing more than that. If Anne Tomlinson really exists, she's more than welcome to try and post her story as a comment - and the crew of Slashdot readers will probably not let it fall into the cracks...
  Although the reality is that we cannot ever be sure that anyone who posts as "Anne Tomlinson" really is who the post claims she is... I would like a Slashdot editor to try and clear up this matter, though. Was there a woman who quit? Or was Taco borrowing some of the $3 crack he sends to moderators?
  
  --
  You are in a maze of twisty little relative jumps, all alike.
4. Re:Eeep - scary moderators! by Thorin_ · 2001-06-27 02:35 · Score: 1
  
  Let the Anne Tomlinson urban legends begin. Soon she will be as well known as Linus or RMS.
5. Re:Eeep - scary moderators! by Xylantiel · 2001-06-27 08:58 · Score: 1
  
  As long as we're talking about doing the "Right thing" here, you're correct that the comment about incompetence should never have hit the front page (male or female). But the reality is IT DID! Roblimo editing out the 4th (unnamed) netop from Kurt's account is pretty slimy. I'm sure it was to cover up Malda's mistake. But the "Right Thing" to do is to come clean. I note that Kurt's account drops off right before the unnamed netop (the one supposed to be on duty) shows up and then starts up again right after Kurt (in a later post) says she quit and presumably left.
  This basically means that the "Blow-by-Blow" has been *LAUNDERED* and is missing important facts. Why was the config so hosed? It's obvious why Kurt was not able to debug it, but shouldn't the unnamed netop have been able to do so, as they were the person on call?
  From my understanding of slashdot politics, this all fits. Malda has a big mouth (that's actually almost a prerequisite for starting a site like this) and Roblimo is the politician/diplomat/businessman who by past record will take some editorial license to "smooth things out" as he has done here. Lesson: trust Malda, even though he's weird, but be sure to read between the lines with Roblimo because he will deceive you.
  And just for the record, my judgement is that the name Anne Tomlinson was just made up by a troll. But the name doesn't really matter. The fact is there is a netop whose actions are not accounted for in the story.
6. Re:Eeep - scary moderators! by Some+Dumbass... · 2001-06-27 01:02 · Score: 1
  
  As a community it would be best to let the matter drop. I'm sure if you were in Anne's position you'd be severely pissed. A little perspective and some empathy would be appropriate.
  
  Would you care to defend that statement? Unless you have some inside knowledge as to Ms. Tomlinson's state of mind, I don't think that you're in a position to claim that "she would be severely pissed". It's not at all clear that not talking about the issue is the "empathic" thing to do. If Ms. Tomlinson was mistreated or something, then I suspect that exactly the opposite is the case.
  
  Furthermore, sometimes events are worth talking about even if they are painful to discuss for those involved. If I found out that my employer had mistreated one of my co-workers, that's a problem that can affect all of us. Thus, the events are worth talking about (and in fact would be damn important to talk about!)
  
  The problem here is that the people who know the most aren't talking. Rumors survive because nobody knows the truth. And since there's just enough information out there to make it look like something happened, the rumor is not going to die.
  
  Perhaps nothing happened - we just don't know. However, if we stopped talking about problems just because someone withheld information, then people would start getting away with all sorts of nasty stuff. So, while we may not have the "right to know" (unless we're VA stockholders?), we still ought to know, and we're still right to ask about what's going on. If OSDN can't give all the details due to fear of lawsuits, then at least say that. But pretending that nothing has happened is wrong - at the very least, this rumor has happened, so someone might want to address it! (And yes, there are at least a few facts backing up the rumor, e.g. changes to the article, so it's not a pure crap rumor that can be ignored).
7. Re:Eeep - scary moderators! by update() · 2001-06-27 02:26 · Score: 2
  
  We have a right to know.
  No, we don't have a right to know. Ms. Tomlinson's departure is between her and her employer; not some tabloid expose for a bunch of overly curious rumor mongering conspiracy theorists.
  I agree that we don't have a right to know the details of Andover's employees. (Of course, I also don't think we have a right to free music, Microsoft source code and other peoples' trademarks.)
  However, given that Slashdot's editorial content is created almost entirely by the submissions, comments and moderations of readers, I'd say we do have a right to know what's going on on those fronts. In this case, posts discussing an on-topic issue that seems to be embarrassing to OSDN have been getting systematically knocked down by what is obviously editor moderation. Yes, we are owed an explanation.
  Honestly, when you guys talk about "open" and "community" and "security through obscurity doesn't work" doesn't it ever occur to you to relate that to the site you run?
  
  Unsettling MOTD at my ISP.
8. Re:Eeep - scary moderators! by BuddyPacer · 2001-06-27 02:41 · Score: 1
  
  No, we don't have a right to know. Ms. Tomlinson's departure is between her and her employer; not some tabloid expose for a bunch of overly curious rumor mongering conspiracy theorists. I wouldn't be surprised if the people who blurted this out on a public forum haven't been seriously bitch slapped by HR.
  As a community it would be best to let the matter drop. I'm sure if you were in Anne's position you'd be severely pissed. A little perspective and some empathy would be appropriate.
  I strongly disagree with this "5, Insightful" comment. Who cares about the employee-employer squabble? That's not why people are homing in on this part of the story.
  The issue is that a supposedly "blow-by-blow account" was given which omits an important detail. In itself, that's not even too big deal, it could have been an oversight, but now follow-up discussion about that hole in their story seems to be actively suppressed. I've never seen that happen before on Slashdot, and I'm curious to know if tampering of the community's sentiments is actually taking place. Posting about it is definitely called for. How in the world does letting the matter drop constitute "perspective" and "empathy"?
9. Re:Eeep - scary moderators! by Miss+Tress+Race · 2001-06-26 23:13 · Score: 4
  
  Hey, try browsing at -1 nested - seems like everyone who's questioned the story about the woman who "quit" (Anne Tomlinson?) has suddenly been modded down. I'm not usually one for conspiracy theories, but is this surprising anyone else? I think that the question couldn't really be more on topic, and they're hardly flamebaits or trolls - what's up?
  
  Yes, I'd like to know too. I didn't see the original "Slashdot Back Online" announcement, but I did see Maswan's listing of the 3 different versions of it, with all reference to the "she wasn't actually qualified" girl removed. And of course, the message from Anne Tomlinson decrying her treatment at the hands of Jeff and Rob. And an astonishingly rare post from CmdrTaco (his only post in the last several weeks) dismissing her as a troll (and being of course sycophantically modded up to 5).
  
  So I, and no doubt many other loyal Slashdot readers, would like to know - what really happened? Who is Anne Tomlinson? Why did she quit? Why has all reference to her been purged from the site? Why is everyone who asks about her being modded down to -1 so quickly that it is obviously editors doing it?
  
  We have a right to know.
  
  --
  "An ye harm none, do what ye will" - The Wiccan Rede
Re:You know you've been using windows too long whe by AsbestosRush · 2001-06-27 03:33 · Score: 1

Cisco places only a very small PROM in most of their equipment for NV Storage. This is where the IOS and config are stored. The idea behind not storing logs there is so that you'll have plenty of space for your access lists and other custom config stuff.

--
EveryDNS. Use it. It works.
AC's need not reply
Re:Beware of departure from original statement by AsbestosRush · 2001-06-27 03:44 · Score: 1

There are 4 little copper contacts on the top of where the battery clips in. Bend those outward, and your problem should be fixed. Before I was laid off from my last Sysadmin position, we had these as well. When we asked our Nextel rep about it, they told us that's what they do to them.

--
EveryDNS. Use it. It works.
AC's need not reply
Re:blocking ping, btw, is STUPID. by sunking7 · 2001-06-27 18:44 · Score: 1

Yeah, as if you couldn't discover the same information about the machine by talking to the HTTP server :P

-SK7

--
"a powerful and unexpected ally..."
Re:Good Thing by Temkin · 2001-06-26 23:07 · Score: 1

I am glad to hear the actual details and not some PR created legally scrubbed crap (assuming there was no scrubbing in there...)

It's scrubbed. The original story mentioned one of their network techs quit mid-crisis. Not that I blame them for brushing that under the carpet.

Temkin
SOMEBODY at Cisco is clearly on the ball. by Peter+Simpson · 2001-06-26 23:28 · Score: 1

You can't buy advertising like this. (you have to get it the old fashioned way...you EARN it). Let's hope the faster/better/cheaper downsizing craze doesn't hit the guy who runs the TAC organization. Or, better yet, let's hope it does; my company would love to have him do his thing for them!
Re:So, what was the problem...? by d-rock · 2001-06-27 10:00 · Score: 1

The key is that their "redundant" Supervisor modules weren't. I didn't even know you could do that with Sups on a 6500 (share VLANs). My understanding is that only one is active at any given time. Is this right?

Derek

--
Don't Panic...
Re:You know you've been using windows too long whe by Som12H8 · 2001-06-27 07:01 · Score: 1

I laughed out loud when I read this:
But, says Yazz, "Since the Cisco was rebooted there were no logs to look at."
You fell into the classic "Windows" trap.. this is what I tell the Jr. tech guys here when one of the servers goes wonky: "If it doesn't work, there is a reason; something is wrong. Rebooting will not fix the problem."
WTF!
Any halfway decent Network Ops (I'm in charge of one) has syslog servers for all its stuff. Often one uses some sort of smart agent to check those logs, like CiscoWorks or DFM. (Or handwritten in Perl, like for our firewall logs.)
BTW, 6509s are pretty good stuff, we have a few here at The Big Hospital. And believe me, you don't want any network downtime here...can you say "dead people are cool"?
"We live in the lovely quiet and dark." - John Varley
Re:You know you're a cranky old grognard when... by wunderhorn1 · 2001-06-27 13:31 · Score: 2

Considering what their routers cost, I'm surprised Cisco doesn't give you a dedicated journaling drive. Guess they don't fail often enough to justify anything that elaborate
Hard drives fail too often to justify anything that elaborate.

--
Karma: Bored. (Thinking about resurrecting the "Anyone else is an imposter" joke.)
Re:One word, OUTSOURCE! by kperrier · 2001-06-26 23:04 · Score: 1

Maybe you need to read the account again. They replaced the FreeBSD bridge/firewall with the crossover cable to see IF the firewall was causing the problem. Their firewall was NOT a crossover cable.

Kent
And now a word from our sponsor ... by Combuchan · 2001-06-26 23:57 · Score: 2

The preceding one hundred and seventy-five posts have been brought to you by Cisco Systems, Inc. Cisco: Empowering the Internet Generation. On the web at www.cisco.com
I have never seen a better proxy advertisement for any company than this slurry of posts regarding the overall superiority of Cisco tech support. If getting their routers did not require the purchasing power of selling my soul or my firstborn child, I'd buy one.
p.s. would've applied (R), sm, and tm as needed, but <sup> isn't allowable HTML. :P

--
"[T]he single essential element on which all discoveries will be dependent is human freedom." -- Barry Goldwater
either this part is wrong by el_guapo · 2001-06-27 04:22 · Score: 1

"Since the Cisco was rebooted there were no logs to look at" or you guys have a seriously weird logging set up. Rebooting a Cisco box shouldn't nuke the log, it should still be there. Also, here's a hint "logging a.b.c.d" will get it to log to an external syslog server.
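For anyone wondering what the box on the other end of that "logging a.b.c.d" line has to do: here is a minimal sketch in Python, not a replacement for a real syslogd. It assumes the classic BSD syslog framing, where each UDP datagram starts with a `<PRI>` header and PRI = facility * 8 + severity. The port number (5514, so it runs without root), the file name `router.log`, and the sample message in the usage note are all assumptions made up for the illustration.

```python
import re
import socket

# Severity names by number, as used in BSD syslog (0 is the most urgent).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_pri(datagram):
    """Split a raw syslog line into (facility, severity_name, message)."""
    match = PRI_RE.match(datagram)
    if match is None:
        raise ValueError("no <PRI> header")
    facility, severity = divmod(int(match.group(1)), 8)  # PRI = facility * 8 + severity
    if facility > 23:
        raise ValueError("facility out of range")
    return facility, SEVERITIES[severity], match.group(2).strip()


def listen(host="0.0.0.0", port=5514, logfile="router.log"):
    """Append every datagram to a file that survives a router reboot.

    Port and file name are hypothetical; the standard syslog port is 514/udp.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    with open(logfile, "a") as out:
        while True:
            data, (src, _port) = sock.recvfrom(4096)
            try:
                _facility, severity, msg = parse_pri(data.decode("ascii", "replace"))
            except ValueError:
                # Keep malformed lines too; they may be the interesting ones.
                severity, msg = "unparsed", repr(data)
            out.write(f"{src} {severity} {msg}\n")
            out.flush()  # get it onto disk before the next crash
```

A line such as `<189>%SYS-5-CONFIG_I: Configured from console` would come back from `parse_pri` as facility 23 (local7), severity "notice". The point is only that the log lives on a different machine from the one being rebooted.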

--
mas cerveza, por favor politically incorrect stu
Re:You know you're a cranky old grognard when... by SuiteSisterMary · 2001-06-26 23:04 · Score: 2

But I think that his point is that it's certainly NOT the FIRST BLOODY THING YOU DO. Cuz depending on what's wonky, it might not come back up at all.

--
Vintage computer games and RPG books available. Email me if you're interested.
Re:You know you're a cranky old grognard when... by SuiteSisterMary · 2001-06-26 23:59 · Score: 2

"I do agree that they should probably try SOMETHING before resorting to rebooting it, but it's the easiest way to tell if something is broke."
Actually, I'd say that looking at the logs and doing diagnostics is the easiest way to tell if something is broke, for a piece of hardware like Cisco. But oops, they weren't syslogging to a box (ideally with a dotmatrix printer; try cleaning THOSE logs, cracker-boy!) so they lost one of their best tools for finding out what went wrong.

--
Vintage computer games and RPG books available. Email me if you're interested.
Good Thing by jallen02 · 2001-06-26 21:56 · Score: 1

I know that it's a bad reflection upon your business and all to hear of things like this POSTED to your site about how badly configured stuff was behind the scenes. At least at first glance.

I am glad to hear the actual details and not some PR created legally scrubbed crap (assuming there was no scrubbing in there...) There always are bad moments when you get a group of cranky nerds up way past their bedtimes under a lot of stress, but hey that's what makes us human. So props to you guys for being open with this kind of stuff :P It's nice to know some people are honest...

Jeremy
Re:Thanks for an honest explanation by jallen02 · 2001-06-27 03:56 · Score: 1

Look, and you're part of the reason no doubt, /. is like home to a bazillion computer geeks. A small minority that is hugely vocal.

Okay, so people get stressed and feelings get bruised. Someone quit under certain circumstances. It was not anyone's business honestly to post that to the FRONT PAGE of Slashdot. It's none of our business. They are being damn nice as it is being this open with it. Things get crazy when you have THAT MUCH stress on top of you. It's like the end of the world to the people trying to fix it.. trust me I have been there.

Anyway, just let it go folks..

Jeremy
if i may bother you by Lord+Omlette · 2001-06-29 00:34 · Score: 1

Where did you get your hard cover notebook? I've never heard of such a thing and my search at local shops has proved fruitless...

Peace,
Amit
ICQ 77863057

--
[o]_O
hang on, i can do this... by Lord+Omlette · 2001-06-27 00:40 · Score: 5

I will now prove, using extremely shaky methods, that "Blow-by-Blow Account of the OSDN Outage" by Roblimo is, in fact, an epic myth.

I. Call to Adventure

"By 7 a.m. it was obvious that this was not a typical, easily-fixed, reboot-the-database problem. The network operations people were paged, but did not respond."

II. Meeting the Mentor

CowboyNeal once said, "You can take everything I know about Cisco, put it in a thimble and throw it away."

Whoops, that's not it.

"So I called Cisco tech support."

There we go.

III. Obstacles

"Just to make things interesting we've added ports to the 6509 by cascading to a Foundry Fast Iron II and also a Cisco 3500. We've got piles of printouts and documentation of all sorts, drawings and spreadsheets, helping us keep track of every IP and machine in this cage, yet it doesn't seem to get any clearer unless you've either built it yourself (only one person who did still works here and wasn't available this weekend) or if you've had the joyful opportunity of spending a night trying to trace through it all under pressure of knowing that the minutes of downtime are piling up and the answer is not jumping out at you."

IV. Fulfilling The Quest

"He bounces the switch... copy startup-config running-config ... the switch resets itself... then email starts streaming into my inbox... then I can ping our sites all of a sudden... we're back online! Everything is back! Weird."

V. Return of the Hero

"The next day, Monday, Kurt talked to Exodus network engineers and asked them why our uplink settings were so confusing to Cisco engineers."
"Tuesday was router reconfig day."

VI. Transformation of the Hero

"At least we've learned a lot from the experience -- like to call for help from specialists right away instead of trying to gut things out, and just how valuable good tech support can be."
"We certainly aren't going to make the same ones [ed: mistakes] again!"

Peace,
Amit
ICQ 77863057

--
[o]_O
1. Re:hang on, i can do this... by RhetoricalQuestion · 2001-06-27 14:01 · Score: 2
  
  I will now prove, using extremely shaky methods, that "Blow-by-Blow Account of the OSDN Outage" by Roblimo is, in fact, an epic myth.
  I believe you forgot "Mysterious nature of the Hero's birth/origin"
  Though I'm sure we'd all rather not know that one.
  
  --
  I can spell. I just can't type.
2. Re:hang on, i can do this... by tristan+f. · 2001-06-27 15:01 · Score: 1
  
  you foreigners sure are crazy.
  
  --
  Hi, I'm a pretentious cock who will make some gay comment about ignoring AC posts here.
You want real tech support? by neoptik · 2001-06-27 03:37 · Score: 1

There is this medical devices company called Minimed. They make insulin pumps - devices designed for the constant delivery of insulin to diabetics. Basically, insulin pumps are the replacement for several insulin shots a day. I am a diabetic and I have one.
Now, let me tell you about Minimed tech support. They are unbelievable. The most I have ever had to wait for assistance with my pump is 3 minutes, and this is several calls at random hours (3 am, 2 pm, 11:30 am, etc, all hours, no matter what). The tech support people know the Minimed pump product better than anything.
But the real reason is that they have to. Sure, Cisco and Arrowpoint have big $$ contracts with their customers to keep routers and such in order. But if a router goes down for a few hours, it doesn't mean the death of the customer. If a Minimed insulin pump stops working for even a few hours, and the user doesn't realize it, they can go into diabetic ketoacidosis and potentially die. Now, this hasn't ever happened, to my knowledge, but the possibility is still present. Therefore, to avoid the unbelievable criminal lawsuits, Minimed has what I would expect to be the best tech support in the world.

--
I dont have a .sig just yet.
Cisco TAC is that good. by TheLink · 2001-06-27 10:42 · Score: 2

Years ago a customer sent us an obsolete Cisco router (very old) for repair. They actually bought the router from someone else in Australia or something.

Basically the thing was really dead, and really obsolete so we didn't have any in stock, so I reported it to the Cisco TAC.

Days later, a replacement router came in (lower priority coz the customer had other routers). I asked the TAC how are we going to sort out the payment part - e.g. who pays who etc. Turns out the TAC doesn't care - as long as the problem is fixed!

So the fix was on the house! Of course the customer was informed that as this was an obsolete router, fixes like this might not happen in the future as there probably won't be any routers of that model in stock.

But the main thing was: problem fixed for free, boss happy, me happy, TAC happy, customer very happy (I think we didn't charge them anything either :) ).

So guess what routers the customer will replace their obsolete ones with? :)

That's why I have no reservations recommending Cisco routers to people who can afford it. Life's tough for those that can't afford it.

These guys know their stuff. Heck many of them know their competitor's stuff better than their competitor's tech. And there were cases where Cisco's stuff worked better with Brand X's stuff than Brand X's stuff with itself!

I'd prefer Cisco's way of achieving market share. I hope they keep trying to do a "good clean fight" and showing that you can win without playing dirty (unlike some other companies..). They're going through tough times. Economy slow down, stiff competition (Juniper etc).

Cheerio,
Link.
--
- Too many replies beneath your current threshold
No Monitoring System for Network Ops? by Milalwi · 2001-06-27 00:07 · Score: 1

The first hint that all was not well came at about 2 a.m. on Saturday, US eastern time, in the form of slow-loading pages. By 7 a.m. it was obvious that this was not a typical, easily-fixed, reboot-the-database problem. The network operations people were paged, but did not respond. Uh-oh.
Um, does anyone else find this a little bit surprising? Network ops doesn't have an online monitoring system? They wait until things are down to respond? Shouldn't they have known about this problem much earlier?
It's not like a good monitoring system is hard to find.
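To make the point concrete, here is a rough sketch in Python of the sort of check meant here. It is not any particular monitoring product. The idea: probe the front page every so often, and page a human once several probes in a row are either failed or slow, instead of waiting for readers to notice. The thresholds (5 seconds, 3 probes in a row) and both function names are assumptions picked for the example.

```python
import time
import urllib.request


def probe(url, timeout=10.0):
    """Return the page load time in seconds, or None if the request failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except Exception:
        # For paging purposes any failure (DNS, timeout, HTTP error) counts as down.
        return None
    return time.monotonic() - start


def should_page(samples, slow_after=5.0, consecutive=3):
    """True once the last `consecutive` probes were all failed or slow.

    `samples` is the probe history, oldest first; None marks a failed probe.
    Requiring a run of bad probes avoids paging on a single blip.
    """
    recent = samples[-consecutive:]
    if len(recent) < consecutive:
        return False
    return all(s is None or s > slow_after for s in recent)
```

With a probe every minute and these numbers, pages that start crawling at 2 a.m. would trip `should_page` a few minutes later, not at 7 a.m. Whether the pager then reaches anyone is a separate problem, as the story shows.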
Milalwi
Three words for you guys... by Ron+Harwood · 2001-06-26 21:46 · Score: 2

Service Level Agreement.

--
BlackNova Traders
Dual Supervisors good. Dual MSFC bad. by JakiChan · 2001-06-27 03:35 · Score: 1

The company that I work for has a pretty standard setup for any new cage: 2 6509s, with dual Sups with MSFC. We've bought 'em that way for as long as I've been here. I bought 'em that way when I was at my last company.
The dual sup part works great. If things are configured correctly then the primary sup config is copied to the backup sup, and if there is a failover then you still have the good config.
However, it don't work that way with MSFCs. If you have dual MSFCs in a chassis then the MSFC on the secondary card does NOT have the config of the primary MSFC. It's not a primary/secondary setup, it's more like having 2 completely separate routers. So if you have only one 6509 chassis and you have dual MSFCs but you didn't configure the 2nd one then you're h0zed. In our normal config we'd never notice...if one MSFC fails then the primary MSFC in the other chassis will take over (HSRP/VRRP being the handy thing that it is) and you swap out the bad hardware. But if you're someone that says "hey, I paid for 4 MSFCs, and dammit I want full redundancy" then be prepared for this tres funky 4-way HSRP setup. It's tasty.
And guess what...while our reseller was happy to sell us that config (2 chassis each with dual Sup/MSFC), Cisco does NOT recommend it. They recommend in our situation, 1 Sup2A/MSFC and 1 Sup2A WITHOUT MSFC per chassis. Go fig...
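A toy model of that failure mode, in Python, for anyone who hasn't been bitten by it yet. It only models the HSRP election rule as I understand it (highest priority wins, highest interface IP breaks a tie, default priority 100), not the protocol itself. The router names are borrowed from the article; the addresses (documentation range) and the `configured` flag are made up for the illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Router:
    name: str
    ip: str
    priority: int = 100      # HSRP's default priority
    alive: bool = True
    configured: bool = True  # hypothetical flag: does this MSFC hold the full config?


def elect_active(group):
    """Pick the active router: highest priority wins, highest IP breaks ties."""
    candidates = [r for r in group if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, IPv4Address(r.ip)))


# Two MSFCs in one chassis; the second was never given the full config.
stan = Router("Stan", "192.0.2.2", priority=110)
kyle = Router("Kyle", "192.0.2.3", configured=False)
group = [stan, kyle]

before = elect_active(group)   # Stan, on priority
stan.alive = False             # the primary card dies
after = elect_active(group)    # Kyle takes over, as designed
```

After the failover `after.configured` is False: the election did exactly what it should, and the winner still can't do the job. The election knows nothing about whether the standby's config is complete, which is why the only real test is pulling the primary on purpose.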

--
"Where quality is like a dead stinking rat - you just can't miss it."
Yet they cannot hire decent writers by Com2Kid · 2001-06-27 11:06 · Score: 1

I find it highly amusing that between all this praise of Cisco's excellent support services, their training manuals are horribly written, have piss poor grammar, and even the occasional glaring technical error.

You'd figure that Cisco is big enough that they could afford to hire a few decent writers :)

--
Need help treating your acne? Come here!
Re:You know you've been using windows too long whe by EinarTh · 2001-06-27 00:41 · Score: 1

Well it usually does with windows, so you can't really blame them...

--
-- Computers are not intelligent. They just think they are.
Re:Grr... by pokrefke · 2001-06-26 22:55 · Score: 1

Yeah, check out my post on the previous story. I guess the powers-that-be don't like you to point out their mistakes.

Think this one will get modded down too?
Responsive techies by Drone-X · 2001-06-26 22:22 · Score: 4

So I called Cisco tech support. I wish I had done this sooner. I was amazed first of all by how you can talk to a qualified Cisco tech immediately... we're talking an 800 number that you dial and within less than a minute you are talking to a technician... doesn't Cisco realize how shocking this is to technical people, to actually be able to talk to qualified technicians immediately who say things other than, "Well, it works on my computer here..."? Do they not know that tech support phone numbers are supposed to be 900 numbers that require you to enter your personal information and product license number, then forward you to unthinking robots who put you on hold for hours, then drop your call to the Los Angeles Bus Authority switchboard... does Cisco not understand that if you do not put people on hold for at least 10 minutes they might pass out in shock for being able to talk to a human too soon? Apparently not.
Yes, but have you tried dialing that number when Slashdot wasn't down?

--
Monkey sense
1. Re:Responsive techies by LordKariya · 2001-06-27 02:17 · Score: 1
  
  That's odd, at Microsoft you get in serious trouble if your average creeps above .15 ...
  Yes, another shameless Microsoft crack.
  
  --
  I alternate between posting +5 and -1 Comments. Karma: +53 -47 = 6
Re:So, what was the problem...? by elbuddha · 2001-06-27 01:59 · Score: 2

"I didn't see anything to explain that in the report."

From the article:

The one card going bad wouldn't have been such a big deal if the config in both were set up correctly. It was meant to flop to the other interface if the primary card died, which it did, but not with all the info it needed...

One of the two router cards in the 6509 died. When the failover to the other card occurred, the other card didn't have all the information it needed to do its job. Thus, the original problem was the dead primary card. The contributory problem was the screwy config. But the screwy config didn't matter until the primary card died.
Here's exactly how it went by TTop · 2001-06-27 00:33 · Score: 1

#!/usr/bin/perl
use Conspiracy qw(Censorship Story);

my $outage = Story->new;
$outage->change();
$outage->change();

my $controversy = $outage->replies();
$controversy->mod_down();
my $interestlevel = $controversy->modpoints();
if ($interestlevel > 1) {
    $controversy->mod_down();
    $controversy++;
}
Re:article disappearance by kchayer · 2001-06-26 23:33 · Score: 2

For example, why did this story disappear? Was it technical? Was it editorial?
I have a feeling it was just a simple re-classification of the article. I noticed this too, and thought it was quite odd. So I did some poking around before (at the time) just shrugging and bookmarking the story.
The article first showed up on the front page, but disappeared sometime while I was reading the comments. However, the topic "slashdot" icon was still on the top of the screen. Strange, but I don't know how the site works in that regard. I thought maybe they reclassified the story and thought it wasn't good enough for the front page. Turns out, that wasn't too far from the truth.
I searched around in some of the different sections but didn't find the article. I even manually searched for the article itself, but still didn't find it. Oh well, I thought, and I went on with my day.
Lo and behold, an hour or two later, the article is back, but now it is contained in the "Features" section. I had checked there before, but didn't see it. Also, I think the "Features" label didn't appear in front of the story title initially (probably because it hadn't yet been assigned to that section).
Simply put, I think the disappearance and reappearance was just a side effect of reassigning the story to a new section. Short story made long, I know, but I thought it was kind of interesting, too.

--

"I say consider this day seized!" -Hobbes
"Tomorrow we'll seize the day and throttle it!" -Calvin
Re:You know you've been using windows too long whe by duggy_92127 · 2001-06-27 01:12 · Score: 1

You fell into the classic "Windows" trap.. this is what I tell the Jr. tech guys here when one of the servers goes wonky: "If it doesn't work, there is a reason; something is wrong. Rebooting will not fix the problem."

Except, of course, when it does. This is Windows we're talking about...
Doug
Re:Grr... by Sinjun · 2001-06-27 01:19 · Score: 1

I was rather mystified by the moderations myself. I tried to right the wrong with my own measly 5 points (which I'm giving up by posting this), but I'm sorry I couldn't make much of a difference there.
Where is your Network Enforcer? by fishbonez · 2001-06-27 08:46 · Score: 1

In any well managed network system, there is always one person who is the "Network Enforcer". That is, someone whose basic function is to be a dick about disaster planning, redundancy, regular backups, frequent system failure tests, network management, documentation, etc. I've played the Network Enforcer role before and I initially made a lot of people unhappy. But when all the major network problems finally disappeared and we went 6 months without any downtime, everyone appreciated the work I did. Especially my wife because I could leave at 5pm every day and never worry about getting called into the office.

--
Frylock: That's not a toy!
Master Shake: You say that about everything you own. You should own toys. They're fun.
It's NOT our business guys/gals. by X-treme-LLama · 2001-06-27 12:20 · Score: 2

No matter WHAT transpired, it is NOT any of our business what happened with their tech. Whether they're all sexist bastards, or she's a flaming idiot, they have obligations as an employer not to discuss things of that nature. No one here has "a right to know," period.

Just because it's a popular website doesn't remove it from certain rules that all businesses have to follow; Employee/Employer relations are private, and they are to remain that way. The post about her was probably pulled down after it was posted because it was done in the heat of the moment; as well it should have been. We all have moments of anger, frustration, or just stupidity. But then we regroup, gather up our senses, fix what we can and move on. If she feels she was unfairly fired or treated, she can file a lawsuit. If they feel they didn't receive the services they should have, they can do the same (if she was a contractor..). Either way, it's not our concern.

So why don't y'all calm the fsck down, and back the fsck off, because it's none of our fscking business.

Oh, IANAL, so I don't wanna hear any crap about who can sue who, that's what I believe is true, I may be wrong. The only person I want to hear tell me I am wrong is a lawyer, if you don't fit that bill, I don't want to hear it.

I Haven't Lost My Mind -- It's Backed Up On Disk Somewhere

--
My rantings, only longer and with better spelling..
1. Re:It's NOT our business guys/gals. by ____respawned_______ · 2001-06-27 14:57 · Score: 1
  
  Well said. But, since every damn person appears to already know, in my opinion, it would be nice to offer a reason the comments were taken out, whether they simply say it was legal reasons, pure change of heart or whatever.
Re:Grr... by Sir_Real · 2001-06-27 01:16 · Score: 2

Not everything is a conspiracy folks.

I'm sure that's what you'd want us to think. :)
Re:Rebooting by CaptainZapp · 2001-06-27 16:40 · Score: 3

As much as I agree with you, even to the point that there is ultimately an event triggering nasty things, a reboot can (granted, very rarely) solve the problem.
Take a relational database for example; there is so much that can go wrong with it. For starters, there are bugs in such complex products and fixing them (save for PostgreSQL) is beyond your control.
But it need not even be a bug in the database code. It can be something in your network component (we chased cases for months which turned out to be a DECnet issue, but were attributed to the database server), or it could be the fact that the db vendor compiles his product on multiple platforms and it's virtually impossible to test every functionality of a new release on every supported platform. Yes, I know that in an ideal world this should be done, but it isn't.
Assume it would be possible to perform such tests. Save for proprietary (or semi-proprietary) architectures like OpenVMS/AXP, you can have so many different hardware and network components that it's just not possible to foresee all eventualities.
After ruling out such possibilities, we're not there yet: What are the query characteristics? How many concurrent users do what, and when? What front ends do they use, and how are they connected? The problem may even be caused by a component that has nothing to do with the database engine (Access front end, anyone?)
Although the fundamental cause of the problem might never be detected, a reboot of the data server might fix the problem and it will never occur again, since the same combination of factors occurs so rarely that it's impossible to reproduce the problem.
However, the [alt-ctrl-del] attitude of younger IT folks (specifically those that grew up in a PC environment) makes me barf and indicates just how clueless a lot of those folks are. You never reboot a production IT component unless there is no other choice, or in the context of your normal maintenance cycle (memory leaks do occur in software).

--
ich bin der musikant
mit taschenrechner in der hand
kraftwerk
and and and by SubtleNuance · 2001-06-27 01:57 · Score: 2

A third's cell phone was on the kitchen counter, unhearable from the bedroom, and the fourth one's cell phone battery had fallen out. It was a frustrating comedy of errors, and an unusual one.

Yeah! And a dog ate my book report. And I was abducted by aliens. And my tyre was flat. And I forgot to feed the dog. And And And - and these two techs are full of it. they were probably smoking weed and playing nintendo.
OT-? the greatest troll? Re:Correct. by leuk_he · 2001-06-27 21:30 · Score: 1

If the whole thing is a troll, it is definitely in the top three. But the zikzak troll was also very good
What is this zikzak troll? Yes, I did find the user zikzak, but I could not find anything about the great troll he made, and I think I was not reading /. when he made it. Tell me about it. Yes, you can mod this as offtopic (how much karma can I lose in 1 post?). But why did they post about the outage? To advertise Cisco? Because it was a detailed geeky story? (Not: the very technical details are missing. Tell me more about this network layout, and why can't we ping the site?)
Blame Cisco! by sulli · 2001-06-27 00:36 · Score: 1

Saw this elsewhere, thought it would be relevant here:
Blame Cisco
Times have changed,
Our Slashdot's getting worse,
There's no more "stuff that matters,"
Just a hit on VA's purse!

Should we blame the government?
Or blame our ISP?
Or should we blame the h4x0rs at DirecTV?

No!
Blame Cisco! Blame Cisco!
With their blinking LEDs,
And inflated techsupport fees,
Blame Cisco! Blame Cisco!
We need to form a full assault!
It's Cisco's fault!

Don't blame me for old JonKatz,
He lost his damn connection,
Now he's shooting at little brats!
And poor Roblimo once had
pictures of Heidi Wall,
But now, when I see him,
He tells me to suck his balls!

Well,
Blame Cisco! Blame Cisco!
It seems like everything's gone down
Since Cisco came to town.
Blame Cisco! Blame Cisco!
They're not even a real company, anyway.

Slashdot could've been the place to get our fix of daily news,
Instead we just get jpegs of the results of anal screws!

Should we blame the Editors?
Should we blame the Trolls?
Or the moderators who let them take their toll?

Heck, no!
Blame Cisco! Blame Cisco!
With all their worthless stock options,
And that bitch Anne Tomlinson,
Blame Cisco! Shame on Cisco, for...

The crap that we flood,
The news that's a dud,
The MPAA,
Your Rights gone away:
We must blame Cisco! Shout and cuss -
Before somebody thinks of blaming us!

--

sulli
RTFJ.
Correct. by sulli · 2001-06-27 01:20 · Score: 3

Flower is right.
Either Anne is real or she isn't. If she's real, this is an internal matter that we really don't need to interfere in. If, as the "Anne" poster suggested, she quit because Taco and Hemos are hard to work with, she was within her rights and should get at least some support from a community which often says "Quit! Now!" to Ask Slashdots about PHBs.
If she's not, this is all a big waste of everyone's time, and possibly the best troll we have ever seen on slashdot. (An account by that name has a brand new uid (462836) and zero comments.) Think of the trolls you've posted - how many led to 100s of posts on other threads, conspiracy theories galore, and posts by #1 and #2? Whoever did this (if not Anne) should get mad props from the troll fans, but should not take any more of our time.
My bet is that she's not real. But in either case we should drop it and get on to more important things.

--

sulli
RTFJ.
who'd you call first? by Magius_AR · 2001-06-27 19:52 · Score: 1

By then we had called Cisco for help. They couldn't help us until they saw the switch config and got a chance to review it. We were spent. We had to go to bed and stay down for the night. Next morning we're back at Exodus and the situation hasn't changed -- our network is unreachable to the outside world. I was hoping that during the wee hours of the morning the Cisco 6509 had become sentient and fixed its own configuration or perhaps a friendly hacker had cracked into it and fixed it for us, or perhaps ball lightning would travel down a drain spout and shock our cage back to life like those heart paddles paramedics use... "It's a miracle!" No such luck. So I called Cisco tech support. I wish I had done this sooner
Out of curiosity, who the hell did you call first at Cisco if it wasn't tech support?
Cisco: "Hello, this is sales & marketing."
Slashdot: "Yes, we're having problems with our server"
Cisco: "Umm, we can't help you"
(several hours pass...)
Cisco: "Hello, tech support."
Slashdot: "Hi, we're having server problems."
Cisco: "Oh, piece of cake, here's your problem"

Also, out of curiosity, why haven't you guys hired a Cisco buff dude? At least someone over there has to be Cisco competent or certified or something. I mean you're Slashdot, that's gotta mean something. The mentality of "Server malfunctioning? Let's reboot and make it better" is what I expect from a Windows admin ;P
Magius_AR
What Other Tech Companies Should Learn.. by Warin · 2001-06-27 02:04 · Score: 1

There is something that seems to be missing these days, not only in the high technology industries, but almost any industry...

Customer Service.

Most companies don't give a rat's ass about you after they have your money in their grubby hands. They don't want to spend the money required to maintain proper support for their products. They don't want to spend the money required to train and retain the people who can take care of their customers. And in the end, those affected the most are the users who need the support the most.

So some seriously mad props to Cisco for having the foresight to maintain and train a workforce that can help out their customers in a timely and efficient manner. It would be a much better world if more corporations got the clue that you already have.
Automatically Detect and Fix by 2starr · 2001-06-27 02:27 · Score: 1

This is an all too common scenario. In fact, it's so common that my company has set out to fix it... our product - Lanturion - would have been able to not only tell you which interface on which router was down, but call you (all the tech people... whoever) and ask you if you wanted to restart the interface. Bottom line: a couple days of down time would have turned into, say, five minutes.
Take a look: www.netzilient.com
Rob Eden
Sr. Software Engineer
NetZilient Corporation

--
"Let your heart soar as high as it will. Refuse to be average." - A. W. Tozer
not my experience by the_B0fh · 2001-06-27 02:20 · Score: 1

I must say that my experience with cisco is slightly different. We bought a couple of switches, a 5000 and a 5501. We wanted cisco to put someone on site to help us move it in and configure it.
Cisco sends this piece of meat. He comes in and starts asking for this 5501 "router" which is sitting on the table right in front of him. You can't miss it, it's about 3-4 feet high, big blue box.
In the end, all he did was call the TAC and got someone to walk him thru stuff. I came in on a weekend thinking I was going to learn something. Bloody waste of my time.
-the_B0fh
Re:You know you're a cranky old grognard when... by DarkEdgeX · 2001-06-26 23:31 · Score: 2

Yeah but if it's that screwed up that it's not even working as-is, where's the harm in rebooting it? If it doesn't come back to life after you try to boot it, then you've just nailed the problem right on the head. Otherwise, you're still running around the server cage like a chicken with your head cut off; not the most effective way to deal with a problem.
I do agree that they should probably try SOMETHING before resorting to rebooting it, but it's the easiest way to tell if something is broke..

--
All I know about Bush is I had a good job when Clinton was president.
Re:So, what was the problem...? by sdo1 · 2001-06-27 03:31 · Score: 1

Ahhh... thanks! I managed to miss that on the first read.

-S

--
--- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
So, what was the problem...? by sdo1 · 2001-06-27 00:40 · Score: 3

OK, so the config was a mess. But it was like that BEFORE the outage, right? So what happened between "running OK" and "we're down" to cause it to fail? I didn't see anything to explain that in the report. Or maybe they don't know...

-S

--
--- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
Kenny by sdo1 · 2001-06-26 21:52 · Score: 5

Just don't name the router "Kenny".... he dies every week.

-S

--
--- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
Re:Beware of departure from original statement by Aaton · 2001-06-27 00:39 · Score: 1

I also need training in words spelling correctly before I hit submit. *ugh*
Re:Beware of departure from original statement by Aaton · 2001-06-27 00:25 · Score: 5

Are we really to believe that nobody was available for 48 hours?

Everything failed at about 7AM Sat. Dave was at Exodus between 8:30 - 10:30AM Sat (didn't look at the log book when I got there). Kurt arrived shortly after that I believe (again I didn't look at the log book). I arrived there around 11:40AM Sat.
And yes my battery was loose on my Nextel. It just takes a little upward pressure to loosen the battery on the i1000plus I have. The battery doesn't fall out; it just gets loose enough that it loses contact and turns the phone off.
I have now taped the battery in place!
Yazz Atlas
How ironic! by BlowCat · 2001-06-26 22:47 · Score: 2

This has not been OSDN's finest week. But we thought it was better to give you the full rundown than try to pretend we're perfect.
The irony is thick.
I didn't mind the outage. by briggsb · 2001-06-27 03:27 · Score: 1

My site got over 500 hits from Google over the weekend from people searching for Slashdot. And who said search engine placement doesn't mean anything?
First sign of trouble... by ackthpt · 2001-06-26 22:53 · Score: 1

The first hint that all was not well came at about 2 a.m. on Saturday, US eastern time, in the form of slow-loading pages.
Heck, it's like that all the time. Or did you mean slower? I just assume it's the slashdot effect happening to slashdot.

--
All your .sig are belong to us!

--

A feeling of having made the same mistake before: Deja Foobar
My experience... by csmacd · 2001-06-26 23:15 · Score: 1

Let me first mention I'm a CCIE, so somewhat biased. I call TAC for one reason or another a couple of times per week - Generally, I can talk to an engineer on average in 15 minutes. Of course, the problem you have will make the wait time vary greatly - If it's a simple hardware RMA, it may be 1-2 min. IBM problems, 5-6 min. Common troubles with route protocols may take longer, depending on the size of your network and what priority you can get (Priority 1, big network, will get attention quickly)

--
Don't pick up the pho*(@)$*@&@!@ NO CARRIER
You forgot one thing... by tsmit · 2001-06-26 22:51 · Score: 1

You forgot to document how it takes you 20 minutes to get to your cage from the front door of Boston 2.

You forgot to document how you had to bang on the bullet-proof glass to wake the guard up at 3:00 in the morning.

Obviously, i've been to exodus a few times :)

--
Yes, my girlfriend is a BitchX
Heh by Aciel · 2001-06-27 12:02 · Score: 1

That is funny. Hehehehe.
Re:Grr... by 3-State+Bit · 2001-06-27 02:28 · Score: 1

I agree that it's important to have editorial discretion and, after all, you never remove posts. It just might be walking a dangerous line is all.
~
Failover Testing by SpaceLifeForm · 2001-06-27 04:53 · Score: 1

If you don't test, failover will fail.
Many Internet eons ago, I dealt with a system (codenamed Rosewood)
and its offspring that were fault tolerant.
I learned early on to simulate failures to exercise the redundant components
to make sure they were functioning. Sometimes daily but at least weekly.
It's less stressful to catch that failure of the backup, and go back to the primary
while the primary can *still* function.
It also makes life easier on the weekends!
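A rough sketch of the kind of drill described above, in Python. The `Node` class and the function names are hypothetical stand-ins, not anything from the Rosewood system; the only point is the order of operations: fail the primary on purpose, check that the backup takes over, then go back to the primary while it still works.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one half of a redundant pair."""
    name: str
    healthy: bool = True


def active_node(primary: Node, backup: Node) -> Node:
    """Return whichever node should be serving traffic right now."""
    if primary.healthy:
        return primary
    if backup.healthy:
        return backup
    raise RuntimeError("both nodes are down")


def failover_drill(primary: Node, backup: Node) -> bool:
    """Simulate a primary failure and report whether the backup took over.

    The primary is always restored afterwards, so a dead backup is
    discovered while the primary can still carry the load.
    """
    primary.healthy = False  # simulated failure, not a real outage
    try:
        took_over = active_node(primary, backup) is backup
    except RuntimeError:
        took_over = False
    finally:
        primary.healthy = True  # go back to the primary
    return took_over
```

Run on a schedule (daily or weekly, as suggested above), a `False` result is the early warning: the backup is broken, but nothing is down yet.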

--
You are being MICROattacked, from various angles, in a SOFT manner.
Re:it smells fishy here.... by unicaller · 2001-06-27 05:44 · Score: 1

Same story here, but I feel it does reflect on me, and it makes fixing things a pain in the @$$. Getting the boss to pay for fixing the mess is more of a pain, though.
Re:Hey by fallen1 · 2001-06-27 00:38 · Score: 1

Learn how to spell etiquette, troll. If you're going into business to teach people how to act proper under various circumstances it would be a shame to have them look like fucking idiots when they can't spell the word they went to class to learn.
"Profanity is the crutch of an inarticulate motherfucker" - Unknown

--
Dream as if you'll live forever.
Live as if you'll die tomorrow.
~Anonymous~
Thank you by American+AC+in+Paris · 2001-06-26 21:53 · Score: 2

Thanks for the blow-by-blow, Rob. It really does make a difference (to me, at least) to know what was going on during these outages. Your post made me stop feeling like the Powers That Be were trying to pull wool over our collective eyes and sweep the whole thing under the rug.
Thanks again for the info and the honesty.
AAiP

--
Obliteracy: Words with explosions
Grr... by American+AC+in+Paris · 2001-06-26 22:50 · Score: 5

All right.
After having been modded down next to the goatse links, somebody please explain to me how the hell we're supposed to discuss the decidedly strange disappearance (and subsequent reappearance) of this story on the site without getting modded as "offtopic"?
Just where, exactly, are we to discuss this little point? For example, why did this story disappear? Was it technical? Was it editorial?
For a group that is so damned keen on openness and truth, it strikes me as somewhat ironic that several dozen mod points have been used to effectively suppress this part of the thread.
I want to know what happened. Others do too. If you can't give us a decent place on Slashdot to discuss this issue, then don't mod us down as offtopic!

--
Obliteracy: Words with explosions
1. Re:Grr... by American+AC+in+Paris · 2001-06-27 01:19 · Score: 5
  
  Thank you, Hemos. I was kinda guessing and hoping that it was something technical.
  You'll understand my consternation, though, upon seeing my (admittedly offtopic) post on the Shared Source article regarding the disappearance of this article modded down three points to -1 in the course of roughly one minute, and the seemingly similar fate of a good many other posts like it. Also worth noting is the fact that this article touches on what many would consider a rather sensitive issue with the OSDN and /. crew right now. I don't like conspiracy theories much, but I'll be damned if the situation didn't seem, well, rather odd.
  Might I suggest, though, that once stories are actually posted to the front page, they remain as is, even if the order of presentation is not the most desirable? Consistency is key, and having articles disappearing from the front page is not terribly consistent.
  
  --
  Obliteracy: Words with explosions
Yes -- this is why Cisco dwarfs the others by Ndog · 2001-06-27 00:38 · Score: 1

This is why Cisco is on top. Their hardware is not necessarily the best, but it is good and their support is excellent. Try to find any other company competing in the same arena that has the same combination of hardware, software, and support. It does not exist.

--
-N
Re:You know you've been using windows too long whe by John+Sullivan · 2001-06-27 08:15 · Score: 2

Leaving the system online and intact is the best way to root cause a bug.

Actually, there is a very large class of bug for which that is not true - and a large subset of that is where the bug is somewhat repeatable. Sometimes the effort of grovelling around in crash debris is just not worth the effort. There have been many times when restarting, installing extra logging, then watching the crash happen has provided far more information in 5 minutes than grovelling for hours could have.

--
This is my World Wide Web of Whatever
Re:You know you've been using windows too long whe by bkjoegold · 2001-06-27 02:54 · Score: 1

But, says Yazz, "Since the Cisco was rebooted there were no logs to look at."
You have your router/switch send messages to a syslog server as well as to the router's log. That way, when it reboots you can still see the logs.
Joe
How many times do we have to say it... by Bonker · 2001-06-26 21:54 · Score: 5

Security through obscurity is no security.

No matter how FUBAR'd your router/switch/firewall configuration is, it's still no serious obstacle to crackers, Robin.

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
New slashdot sql ... by rixster · 2001-06-26 23:28 · Score: 3

if (comment like "%girl%" or comment like "%What happened to%" or comment like "%original story%") update posting set score = -1, reason = random("Troll","Offtopic")
Is there someone else out there hosting a site where we can have a non-biased discussion?
BROWSE AT -1. Check out how many posts have gone straight in at -1 (and this one will too, I betcha...)

--
Two wrongs may not make a right, but three ....
Re:Microsoft Support by daniel_isaacs · 2001-06-27 02:00 · Score: 1

Click Here for the article he references above. It's the fabled tale of Microsoft Technical Support v. The Psychic Friends Network. Please do read it, it's rather entertaining.
I know this is redundant, but I know people are too lazy to cut and paste, so the link is helpful.

--
- Dan I.
it smells fishy here.... by jsse · 2001-06-26 23:19 · Score: 1

...one of our most knowledgable people had quit recently...

... and they came to an agreement: our switch config was a mess..

Hmmm...that most knowledgeable person didn't leave for nothing, it seems....
1. Re:it smells fishy here.... by darrick · 2001-06-26 23:47 · Score: 1
  
  I know that coming into an already-operating NOC you get what's there. Where I work now, the network is a mess. There are many things that could be improved, but "that's the way they've always been". Oh well...It doesn't reflect on me, so I don't care.
Lessons to learn... by somethingwicked · 2001-06-26 22:01 · Score: 1

Lesson One-Look at the documentation. Briefly, try to work from there. If that doesn't work, throw the documentation in the corner and start tracing cables and checking connections yourself
Lesson Two-Tell the whole story, including the politically incorrect drama
I might have missed it, but I saw nothing that mentioned the "She" that quit for whatever reason, but I wasn't reading for that reason. However, I did read the whole account and find it strange that that was glossed over.
Thing is, all the ppl who were crying about how you were changing the posts before are going to go into hissy conspiracy fits now.

--
---"What did I say that sounded like 'Tell me about your day?'"---
Re:You know you've been using windows too long whe by suwain_2 · 2001-06-27 03:52 · Score: 1

You know what? You're absolutely right. I was about to shoot back "No, rebooting fixes lots of problems." But the more I thought about it, the more I realized you were right.
My first argument was going to be the BSoD. It occurs all the time, you just have to reboot. But then I realized something -- it's not a random error (or so we're led to believe...), it's caused by actual flaws in the code. The best solution would be to fix these, although you can't. :)
________________________________________________

--
________________________________________________
suwain_2 :: quality slashdot p
Re:Cisco Tech Support over Multiple Days of Proble by zerofoo · 2001-06-27 05:01 · Score: 1

Beware...soon they will send you groceries when your fridge is empty! -ted
Re:Beware of departure from original statement by afedaken · 2001-06-28 10:15 · Score: 1

Had a similar problem with my old 6160 Nokia. Any shaking or quick motions would cause the contacts to lose connection.

The solution was simple, and low-tech.

Hacked a plastic drinking straw into 3 lengths of about 7cm. Shimmed/crushed them between the battery and the telephone.

This kept the battery from moving around on the back of the phone, thus fixing my mysterious power-off problems.

And no unsightly tape! :)

--
If there's a castle floating upside down in the sky, then there's a castle floating upside down in the sky.
Re:Beware of departure from original statement by slaytanic+killer · 2001-06-27 00:37 · Score: 1

Man alive. I don't remember how much a phone is in the States, but I made sure to get a reliable Nokia. And I'm not even a network guy.

Get a new phone, dude! ;)
Re:Beware of departure from original statement by slaytanic+killer · 2001-06-27 03:16 · Score: 1

I would have actually suggested Motorola, because of their extreme quality initiatives. But I have more personal experience with Nokia.

Thanks.
Re:Beware of departure from original statement by slaytanic+killer · 2001-06-27 20:29 · Score: 1

Yes... that's why I thanked him. I would have suggested Motorola, and now I know it would have been a little absurd to do so, since Nextel == Motorola.

I was aware I didn't make that clear... I put such crap in my slashdot posts. ;)
Re:You know you've been using windows too long whe by Skuld-Chan · 2001-06-27 02:17 · Score: 1

I've had cisco routers crash (usually their cheaper 67x series or the 1600 series) where rebooting does bring the router back online. Usually it's because we're trying to cram too many sessions down them or something (did you know that if you open more than 1200 sessions on a 675 or a 678 it'll probably crash in a few hours?)
Some cisco routers are really pieces of junk.
Re:You know you've been using windows too long whe by Skuld-Chan · 2001-06-30 13:31 · Score: 1

You don't know how many times I've been up that path.
No, cisco support has already told me that the 67x series was not designed for what we're using it for. The solution is to get a dslam card for a 6100 series router, but alas we're a struggling .com...
Re:You know you've been using windows too long whe by wizzy403 · 2001-06-27 04:48 · Score: 1

Not completely true...

I was doing an IP address switch over a couple weeks ago, and bonehead me forgets to do a "clear arp;clear xlate" on the firewall. None of my translations are working, I can't figure out why. Access lists look good, static commands all check out, why the hell can't I ping things properly? I reboot the firewall and everything starts working. Rebooting the firewall cleared the xlate and arp tables. This was not a "symptom" that would come back (unless my fsck-ing ISP changes our IP block on us without warning *AGAIN*), it was me being a bonehead after a long day and forgetting to clear a table. The reboot fixed this nicely.

Granted when I figured out what the problem was, I felt like a total idiot for having to take down the firewall, but at least the network started working again.
Re:You know you're a cranky old grognard when... by wizzy403 · 2001-06-27 04:41 · Score: 2

I'm not sure I understand that. Why does the router purge its logs when you reboot it?

Unless you have syslog logging set up on the Cisco, and it can talk to your syslog server to offload the data, the logs are stored in RAM. Reboot and the logs go bye bye. Since the router was acting toasty and the vlans were disappearing, the router probably couldn't access whatever box was set up to receive the logs (assuming that syslog was turned on to begin with).
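To illustrate the idea of shipping logs off the box so they survive a reboot, here is a small Python sketch using the standard library's `logging.handlers.SysLogHandler`. It is not Cisco configuration; the logger name, the message text, and the throwaway local "collector" are all assumptions made up for the demo. The collector is just a UDP socket on localhost standing in for a real syslog server.

```python
import logging
import logging.handlers
import socket
from typing import Tuple


def send_to_collector(message: str, collector: Tuple[str, int]) -> None:
    """Send one warning to a remote syslog collector over UDP."""
    handler = logging.handlers.SysLogHandler(address=collector)
    logger = logging.getLogger("router-demo")  # hypothetical logger name
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.warning(message)
    finally:
        logger.removeHandler(handler)
        handler.close()


def demo() -> str:
    """Stand up a throwaway local collector and return what it receives."""
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    collector.settimeout(2.0)
    try:
        send_to_collector("vlan 10 interface down", collector.getsockname())
        data, _ = collector.recvfrom(4096)
    finally:
        collector.close()
    return data.decode("utf-8", errors="replace")
```

The message now lives on the collector, so it is still there after the sender restarts. As the comment above points out, this only helps if the sender can actually reach the collector while things are going wrong.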
Re:blocking ping, btw, is STUPID. by ViVeLaMe · 2001-06-27 21:57 · Score: 1

takes far less proc to just drop the packet than to process it through all firewall rules and actually reply to it.
(drop quick instead of process incoming ICMP through all the firewall rules then process the outgoing icmp through all the FW rules for outbound packets.)
and, if it doesn't stop the ping flood it prevents u from flooding with icmp responses.
(i can't remember right now any occurrence where just not doing anything takes more resources/time than actually *doing* something.)
or maybe you're pissed off because you're a little bit short on smurf amplifiers lately?

--
i had a sig, once..
Re:You know you're a dumbass when... by fors · 2001-06-27 17:17 · Score: 1

Well let's see, they don't know how this particular setup is configured and can't get hold of anyone who does. They aren't sure of what the settings are supposed to be, so why not reboot? With a lot of problems it would have gotten them back up, and then they would have had time to track down somebody. There was a valid reason to do so in this case. A reboot should have restored the configurations and gotten them up unless it was a hardware problem, and if that was the case they might actually get an idea of which hardware the problem was in. If they were networking people they would know of other ways to do it. Guess what? They're not. In this context it even made sense to reboot the firewall. They had no idea what the configuration was supposed to be, so reboot and let it start up with the services and configurations that it had stored on the hard drive. They could hardly have been in worse shape and it had a reasonable chance of working.

--
"If there is nothing you are willing to die for, then you are not really alive." Myself
Re:More Writeups Needed... Done! by snake_dad · 2001-06-27 05:37 · Score: 1

Already done. Go here for a lot of interesting stories.
Some of the stories may be a little outdated, but it may help to read how other people solved their problems. Even if it doesn't help, it might still be something to enjoy :)

--
karma capped .sig seeking available Slashdot poster for long-term relationship.
Beware of departure from original statement by aldjiblah · 2001-06-26 22:18 · Score: 5

Quoted from the original Slashdot Back Online article (before it was modified): "And when our qualified personel arrived, we discovered that she wasn't actuually as qualified as we had hoped. Then she quit, thus terminating 3 local star systems."
Where does this mysterious woman fit into the story above?

--
sig sig sputnik
1. Re:Beware of departure from original statement by lcypher · 2001-06-27 02:55 · Score: 2
  
  Ok. Now I am completely confused about the moderation system. A post saying basically the same thing earlier was moderated down to "0, Offtopic". This one is moderated up to "5, Interesting".
  
  Eh?
Re:One word, OUTSOURCE! by racermd · 2001-06-26 23:13 · Score: 1

If you had read the whole thing, you'd know that putting the crossover cable in place was to bypass the firewall *TEMPORARILY* to eliminate it as the cause of the outage.

With the FW bypassed: If you have data flowing, your FW needs work (reconfigure, reboot, or other). If no data is flowing, look for another cause.
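The decision rule in the paragraph above fits in a few lines of Python. This is only an illustrative sketch; the function name and the wording of the results are made up here, and the two inputs are simply "does traffic flow with the firewall in the path" and "does traffic flow with the crossover cable in place".

```python
def diagnose(flows_with_firewall: bool, flows_when_bypassed: bool) -> str:
    """Interpret a temporary firewall-bypass test."""
    if flows_with_firewall:
        return "no outage on this path"
    if flows_when_bypassed:
        # Traffic only moves once the firewall is out of the way.
        return "firewall needs work: reconfigure, reboot, or other"
    # Still dead with the firewall removed, so it is not the culprit.
    return "firewall is not the cause: look elsewhere"
```

In the outage described in the story, the second test still showed no traffic, which is how the firewall was ruled out.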

And if you'd ever worked on-call in this sort of situation, I think your comments would be a little softer. Nobody wants their own network to be down. If the staff thought that sleep was more important to them than getting the network up, I trust that judgement. A downed network is not something you want to be working on with 2-3 hours sleep. You can cause more harm than good. I am man enough to admit that I speak from experience, and have learned my lessons quickly.

/rant

--
My sources are unreliable, but their information is fascinating. -- Ashleigh Brilliant
Re:One word, OUTSOURCE! Still amature by racermd · 2001-06-27 00:13 · Score: 1

If you can explain to me how one is supposed to methodically test a network connection with multiple single-points-of-failure, I'd love to hear it. This test usually only lasts for about a minute or two, during which time /. was not functioning, anyway. The /. staff quickly and successfully eliminated the firewall as a source of their outage. Bypassing the firewall *is*, in some cases, the only way to determine if it's the cause of your network outage. At the companies I've worked at, this test is also documented. But it's only implemented in extreme cases due to the inherent security risks you list above. If you saw no data flow on your network, how would *you* go about determining the cause? Please be specific.

And, yes, I came from a tolerant company. We also had multiple teams of people to handle problems. The teams consisted of the normal daytime IS staff, and on-call was rotated among us all. The company actually told us to get sleep if we felt we needed it, even during an on-call outage. The cost of further network downtime due to lack of repair by any one individual is far less than the downtime incurred by tracking down a non-working "fix" by someone who was too tired to know when to call it a day. The potential gains from keeping us all up and working do not outweigh the risks to network stability and reliability. Because we all rotated this duty, there was never a time that we couldn't get back online in a hurry.

Besides, the adage "too many cooks spoil the broth" comes to mind. Sometimes time!=money. Do a quality job that takes a little longer in the short-term, and it will be rewarded in innumerable ways in the long-term.

--
My sources are unreliable, but their information is fascinating. -- Ashleigh Brilliant
You know you're a cranky old grognard when... by Win-Developer · 2001-06-26 22:58 · Score: 1

You actually think that "ReBOOTING WILL NOT FIX THE PROBLEM". I work in the software/cable/digital video industry, and more often than not, when something is "wonky", you have to reboot the hardware.

And yes...the documentation actually says to reboot the hardware as a Troubleshooting technique. I've read countless docs that indicate that as a mid to last resort reboot hardware.

But you have the attitude of every Linux/*nix/whatever except Windows user I work with(yes I'm a software guy too). I know everything about everything, I'll tell you how to do your job because I use Linux/*nix/whatever except Windows. Let the guys alone and let them figure it out, that's what they're paid to do.

The reason I'm coming down on this, is because I actually lost a previous job because I helped out a new Hardware guy that was in trouble. I did his job for him, and subsequently got fired.
pay attention by slaida1 · 2001-06-27 23:35 · Score: 1

when posting stories or comments. editors and readers alike.
i understand that under pressure people get cranky and possible social problems are not our (the readers) business. still this thing left rather uneasy feeling inside because /. is the one usually laughing, bitching and flaming others of coverups.
i think there's not much else to be done to this than take a lesson: think before you submit, especially you editors! i don't want to read about accusations of incompetence in a popular public forum. it's odd to read this "Important Stuff:" section under this posting form, knowing that even editors of /. don't follow these rules.

--
Preserve old classics: copy your collection onto all hard drives.
hrmm... by sehryan · 2001-06-26 21:54 · Score: 3

what happened to the woman that quit?
-
sean

--
The world moves for love. It kneels before it in awe.
Uptime by r41nm4n · 2001-06-27 07:47 · Score: 1

"I rebooted everything," he said. "I think's it's the Cisco."

If he rebooted everything, why does the Slashdot stats box say, "uptime: 106 days, 4:15, 0 users, "?
Bah.. WUSSIES (+4, obnoxious) by EvilStein · 2001-06-27 02:23 · Score: 1

"Oh, we were tiiiiiiiiiiiiired.. so we went to sleep.. AND LEFT ALL OF THE WEBSITES DOWN."
What kind of geeks are you people? My god, has the Slashdot Staff gotten OLD or something? Wait, no.. I've seen 65yr old COBOL programmers stay up for 36hrs straight.
The Slashdot Staff didn't get old.. they just started getting paid and now they're all soft and weak. Bah! Pathetic! :P
I bet CmdrTaco can't even handle a 24hr Quake-III-a-thon these days. :P~
my 2 cents by jdriller · 2001-06-27 01:14 · Score: 1

I too bow to Cisco.... For God's sake - this is the *biggest* company in the World...I recently set up a complex network for a growing company. I bought Cisco cause I trust their stuff and...ahem, own their stock...When I had some problems I just called and spent HOURS on the 800 number. Some of the tech people think they have the answers and don't but, no prob, just try someone else. Turns out a WAN module I got did not work with a newer IOS - so Cisco rewrote a fixed version for me....Holy Cow!!!!
Why am I not surprised? by sakusha · 2001-06-26 23:47 · Score: 1

After looking intensively at products like Slashcode, it does not surprise me one bit that their routers are a snarl of obfuscated tables that nobody understands.
Re:ah. by hosterc · 2001-06-26 23:56 · Score: 1

No amount of obscurity will provide security (and as we all know, there is always a new hack waiting to be found) However, I don't think it is a good idea to paint a bright pretty picture for a hacker to go by :) If you give someone a set of keys that will open only 1 out of 1000 available automobiles, you are doing yourself a disservice if you tell him to look for the one that is a black, 2 door, GMC Sonoma extended cab, with fuzzy dice hanging from the mirror.
Re:blocking ping, btw, is STUPID. by hosterc · 2001-06-26 23:15 · Score: 2

The reason people block ping is because you can tell a lot about someones equipment based on the response to an ICMP packet. In some cases you can get info on the OS/equipment models that the packet bounced off of on the other end. From this information it "might" be possible to determine which hacks/scripts/weaknesses you could try instead of just blindly trying everything in the book.
Exodus stock . . . by OvrLrd-Q · 2001-06-29 03:26 · Score: 1

Dropped 79% today, according to cnbc, any relation to the /. outage?
-----
japh

--
japh
(i wish =) )
Good tech support by NickNiel · 2001-06-26 21:54 · Score: 1

I've had to deal with Tech Support from many companies, and have been quite a bit less-than-satisfied from most. I haven't ever dealt directly with Cisco before, but I look forward to doing so. Having word-of-mouth compliments about Cisco's tech support makes me 100% more confident about purchasing their products, especially coming from (what I consider) a reliable source. Anyone who has dealt with poor tech support will surely agree. Kudos to Cisco for getting smart people and training them sufficiently to do their jobs!
Sorry but it is a bad design. by BawbBitchen · 2001-06-27 03:58 · Score: 2

VLANs were a stupid idea to begin with. I wish they had never been invented. Look. Get your own ASN and IP space. Get 2 cheap Cisco 72xx or Foundry BigIrons. Chuck the 6509. Get two BGP feeds from Exodus, one to each router (to different upstream Exodus boxes). Set up VRRP (HSRP in Cisco) on the back end of the routers, so the default gateway for the back-end stuff is x.x.x.1. (Front ends are /30s to Exodus.) Using VRRP, both routers will respond and be active. One fails, no big deal. This is really the cleanest IP-based way to do it. Hell, I will do it for you for the right $$$. I can get it done in half an hour at the most (the cutover part!). Just my (I have been doing this for 10 years) two cents...
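For anyone unfamiliar with how a shared default gateway survives a router failure, here is a deliberately simplified Python model of the election. It assumes the standard behaviour where a single master answers for the shared gateway address at any moment and the live router with the highest priority wins. Real VRRP and HSRP also involve advertisement timers and tie-breaking rules that are left out, and the router names and priority values are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    name: str          # hypothetical label, e.g. "rtr-a"
    priority: int      # higher priority wins the election
    alive: bool = True


def elect_master(routers: List[Router]) -> Optional[Router]:
    """Return the live router with the highest priority, or None."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None  # nobody left to answer for the gateway address
    return max(candidates, key=lambda r: r.priority)
```

The back-end hosts keep pointing at the same gateway address throughout; only the router answering for it changes when the current master dies.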
Another great company by dghcasp · 2001-06-27 02:18 · Score: 2

I haven't had much experience with Cisco, but another company who blew me away with their support was Network Appliance, who make really cool dedicated file servers at about 1/10 the cost of Auspex.
When we installed it, it asked for an email gateway and the email address of our network administrator. We thought this was so it could email us about problems... Close, but even better...
One night, one of our NetApps spontaneously rebooted itself. Came up just fine all by itself... One of those little hiccoughs that you normally wouldn't even know about if you weren't doing monitoring...
We actually found out about the reboot when we came in at 8:00 that morning because there was an email in our inboxes (reconstruction, not quote):
From: xxx@netapp.com
Subject: Patch Notification

At 3:14 a.m., your file server "z003" crashed and rebooted. From the information that it sent to our autosupport service, we see this was due to bug #754783.
Please download patch http://..... and install it to prevent further problems.

Regards,
Network Appliance Automatic Support
In other words, it emailed its logs and core files to the vendor, who had someone look at them, figure out the problem and give us a solution before we even *knew* we had the problem. Wow!
Same sort of thing when we had a disk in the RAID array fail one night... We discovered it had failed because there was a box on my desk with a replacement disk. Yes, it sent the email and they fedex'ed out a replacement without me asking!
Now *that's* support!
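The "phone home" half of that workflow is easy to picture. Below is a minimal Python sketch that builds the kind of crash report a box might mail to its vendor after a reboot. It is not NetApp's actual mechanism; the function name, the example.com addresses, and the subject format are all assumptions for illustration, and the message is only constructed, not sent.

```python
from email.message import EmailMessage


def build_crash_report(hostname: str, crashed_at: str, log_tail: str) -> EmailMessage:
    """Assemble a crash report email; sending it is left to the caller."""
    msg = EmailMessage()
    msg["From"] = f"{hostname}@example.com"      # placeholder sender
    msg["To"] = "autosupport@example.com"        # placeholder vendor inbox
    msg["Subject"] = f"Crash report: {hostname} rebooted at {crashed_at}"
    msg.set_content(log_tail)                    # the evidence the vendor needs
    return msg
```

The design point is that the report is generated by the machine itself at the moment of failure, so a human never has to notice the problem first.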
Hey by CiscoChic · 2001-06-26 23:06 · Score: 1

You missed a very important factor!
ME!
Listen, you want the real goods on what happened?
They called me up and asked me to fix it after various insults.
I quit afterwards and now I am going to make my own business.
Employer Etiquitte

--
I am not Slashdot's bitch!
Ouch! (Or When a Redundant System Isn't) by vwpau227 · 2001-06-26 22:05 · Score: 4

I remember when I started out in computer networking (and it doesn't seem like it was that long ago), I was told this by one of the other technical members of our team, something that I haven't forgotten: redundancy in a system is necessary not only in the hardware and software in that system, but also in the resources that are used to keep that system running (that includes, of course, human resources, as well as power, HVAC, and so on).
Too often, the human part of the redundancy equation isn't totally factored in. When you don't put all of the human factors into the redundancy equation, you have a redundant system that isn't really redundant.
Of course, it helps if you have a vendor that will work with you (and those of you who remember working with Novell servers in "the old days" know what I'm talking about, too).

--
These are the good old days you'll be telling your children about. Make them worthwhile.
Hosting support by l33th3l · 2001-06-27 00:57 · Score: 1

We don't sleep at GENUiTY :)