UUNET/WorldCom Backbone Difficulties
FearlessFritz writes "UUnet seems to be having a bad time recently. Several sites in the Southeast of the US have been slow or down. Here is WorldCom's quote from their web page: 'WorldCom is currently experiencing an interruption of service in various hubs in the U.S. We are working to restore a routing anomaly, and making necessary progress toward resolving this disruption in service.' There are several rumors abounding, but the best is that they performed a hardware upgrade that failed. Is anyone outside of the Southeastern U.S. experiencing the effects of this outage? (I am peered to several providers so I can post!)"
I was having problems accessing our corporate intranet yesterday. When I called our tech support team, they said that half of Texas was pretty much down. Since we take most of our orders via the net, our business flow was severely disrupted.
Who's to blame in a case like this? What if this keeps on happening? What should we do when relying on such a big provider?
Yesterday morning, I got to work. Net was down. Phones were down. I got a message from our network support. It said that AT&T service (our bandwidth and long distance provider) was down all over the east coast. It came back about an hour later.
What the hell happened? Nothing on the news, nothing obvious on AT&T's site. You'd think that an hour-long outage of an entire coast would at least hit the newswires.
I've been seeing odd intermittent packet loss across Sprint and WorldCom all day. I started checking ITR and saw 25% average packet loss across North America, with about 20% of the routers they monitor passing 0% traffic, so I turned on CNN... Figured something had to be happening...
01:36AM up 426 days, 2:46, 1 user, load average: 0.14, 0.11, 0.05
According to reports from one of our managed networking providers, the trouble started with the DC peering center and moved outwards. They lost a couple of OC12s that still haven't come back and have had other lines up and down all day.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
We have a WorldCom T1 line here in Toronto. It's fine for the most part, but we have some servers hosted (?somewhere?) in the US with Level3, and we've had a horrible time connecting to them today. Through my home cable Internet connection, the connections to our servers are fine.
I'll have something intelligent to add one of these days...
I have been seeing an enormous number of NetBIOS (port 137) hits on my firewall over the last few days. I usually get a few here and there (in between SSH and FTP hits from Asia...) but it's been the majority of stuff over the past coupla days.
I am a believer of momentum and curves.
What I usually do as soon as I wake up in the morning is check the IP traffic on major US backbones such as Exodus. You can find plenty of open route servers on this page: www.traceroute.org. And I can say that the load was unusually high this morning (sh ip ?).
If the accident is due to the person
1) Talking on the cellphone
2) Makeup/hair
3) Reading or watching movies (don't laugh, I regularly get passed by a guy with a laptop on his dashboard watching DVDs. And I already drive way faster than I should) or even fiddling with the radio
4) Any other non-driving activity
Yes, they should be sued. I may be biased because I already have a 1 hour each way commute, but string the fsckers up!
-- IANAEG - I am not an elder god.
You want http://www1.worldcom.com/us/tools/noc/status.xml
News Performance: Normal
DNS Service: Normal
Backbone: WorldCom is currently experiencing an interruption of service in various hubs in the U.S. We are working to restore service as quickly as possible.
Dialup: Normal
Hub: Normal
Outages: Normal
One of the big problems here is that WorldCom still operates various units as separate entities. Virtually no integration has been done to get UUnet working with MFS working with MCI. It's a lot of fun troubleshooting a circuit and having techs tell you "the problem is with MCI, I work for MFS." !!!!! They all work for WorldCom!
Okay, rant mode off.
We've got computers, we're tapping phone lines, you know that ain't allowed - Talking Heads, "Life During Wartime"
Actually I like the fact of such interruptions - it may convince more and more managers, architects and developers to use asynchronous transactional messaging protocols (like JMS or SOAP or even SMTP with confirmation) vs decent client server ones (CORBA, proprietary).
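To make the idea concrete, here is a minimal store-and-forward sketch in Python. It is not JMS, SOAP or SMTP; `FlakyLink` and `Outbox` are hypothetical names invented for illustration, and the "outage" is simulated with a counter. The point is only that the sender keeps each message queued until the far side confirms it, so a backbone outage delays orders instead of losing them.

```python
from collections import deque


class FlakyLink:
    """Hypothetical transport that fails for the first `outage_ticks` sends."""

    def __init__(self, outage_ticks):
        self.outage_ticks = outage_ticks
        self.delivered = []

    def send(self, message):
        # Returns True (an acknowledgement) only once the outage is over.
        if self.outage_ticks > 0:
            self.outage_ticks -= 1
            return False
        self.delivered.append(message)
        return True


class Outbox:
    """Queue messages locally; drop one only after the far side confirms it."""

    def __init__(self, link):
        self.link = link
        self.pending = deque()

    def submit(self, message):
        self.pending.append(message)

    def flush(self):
        # Send in order; stop at the first unacknowledged message.
        while self.pending:
            if not self.link.send(self.pending[0]):
                return False
            self.pending.popleft()
        return True


link = FlakyLink(outage_ticks=3)
outbox = Outbox(link)
for order in ("order-1", "order-2"):
    outbox.submit(order)

attempts = 0
while not outbox.flush():
    attempts += 1  # a real sender would sleep or back off here
```

A synchronous client-server call would have failed on the first attempt; here the orders simply wait in `pending` until the link comes back.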
Less is more !
Anyone who's done any kind of IOS upgrading on some of the upper-end Cisco routers and Juniper routers knows that the upgraded images aren't always the most stable items around.
At one point, there was a severe outage at Genuity referred to as "Black Tuesday", when an IOS upgrade sank a majority of the network and caused a ripple that made for a really shitty morning.
That was a few years ago, though. I can't go into the specifics of the RFO...but the failure was a very visible issue which resulted in modifications to the testing and change management processes.
Unfortunately, sometimes software that passes testing doesn't break until it's actually put into production.
// Agent Green (Ian / IU7 / KB1JQO)
// IEEE 802.3: All 10base Are Belong To Us
Ok so infrastructure liability is on the verge of OT...
US DOT figures show that an accident scene can cost others up to a million dollars a minute while the police are out finding out who caused it and gathering other insurance paperwork. Next time you see a wreck that delays you, call up the police to get the accident report and file against the insurance company. Right now the idiots involved in the accident have more rights than the people waiting, because insurance companies want to place blame and the wrecked cars have to be protected. What they should do is push the cars off the road and cover them with tents to stop the rubbernecking.
Of course anyone reading this topic is here rubbernecking on the information super highway...
I run a small ISP in Portland, OR that has been down for two days because my network got deleted from the RADB, from which the backbone ISP builds their routing tables. It's been working fine since I started using it almost a year ago, and magically stopped working the evening of Oct 1 (first of the month, in the evening when the backbone updates their tables), so I think a policy change topside is the "routing anomaly" that has barfed up everything. At least I'm supposed to be back online later this evening...
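For anyone wondering how a missing registry entry can take a network offline, here is a rough Python sketch of registry-driven prefix filtering. It is a simplified model, not how any particular backbone actually builds its tables: the function names are made up, the route objects are reduced to two fields, and the prefixes and AS numbers are reserved documentation values.

```python
import ipaddress


def build_prefix_filter(route_objects):
    """Turn registered route objects into the set of prefixes to accept."""
    return {ipaddress.ip_network(obj["route"]) for obj in route_objects}


def accepted(announcements, prefix_filter):
    """Keep only announcements whose exact prefix is registered."""
    return [a for a in announcements if ipaddress.ip_network(a) in prefix_filter]


# Hypothetical registry contents (documentation prefixes and AS numbers).
registry = [
    {"route": "192.0.2.0/24", "origin": "AS64500"},
    {"route": "198.51.100.0/24", "origin": "AS64501"},
]
announcements = ["192.0.2.0/24", "198.51.100.0/24"]

before = accepted(announcements, build_prefix_filter(registry))

# The small ISP's route object disappears from the registry...
registry = [obj for obj in registry if obj["origin"] != "AS64501"]

# ...and at the next filter rebuild its announcement is silently dropped.
after = accepted(announcements, build_prefix_filter(registry))
```

Nothing changes on the small ISP's side in this model: it keeps announcing the same prefix, but once the filter is rebuilt without its entry the upstream stops accepting it.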