UUNET/WorldCom Backbone Difficulties
FearlessFritz writes "UUnet seems to be having a bad time recently. Several sites in the Southeast of the US have been slow or down. Here is Worldcom's quote from their web page: 'WorldCom is currently experiencing an interruption of service in various hubs in the U.S. We are working to restore a routing anomaly, and making necessary progress toward resolving this disruption in service.' There are several rumors abounding, but the best is that they performed a hardware upgrade that failed. Is anyone outside of the Southeastern U.S. experiencing the effects of this outage? (I am peered to several providers so I can post!)"
I have had very slow internet access most of the day in the upper midwest. The problem started at about 8:15 CT this morning.
I'm in Alexandria, VA, and I've had horrible service all day long. I could get local sites, but nothing far away. Pinging slashdot won't even work half the time :(. And no google....
I can verify this issue. I work for a company out of the DC area and our data lines are provided by UUNet. I live in Kansas though and do all my work via SSH to the servers in DC. This morning my SSH sessions would randomly seem to hang. After a bit of investigation I started noticing it was taking like 600ms per hop once I got to the UUNet network. So that is North-East there.
A whole lot of red over at the InternetTrafficReport. Any other good informative sites?
May this post be indexed by spiders, and archived for all to see as my Internet epitaph.
I work at a small hosting company and our UUNet connectivity (Central California via Anaheim hub) has been screwed since around 6am Pacific time. Up and down all morning, with latency between 500 and 2000 ms when it is up. Yay WorldCom.
Check it out here
www.msnbc.com/news/816663.asp
Yep I'm a karma whore...dunno why...
Yup - up here in Eastern Canada too.
http://www.merit.edu/mail.archives/nanog/msg040
nothing concrete and MIDS doesn't show anything on the weather reports (not that it means anything).
I am a WCom customer in NYC. We have a hub-and-spoke VPN from them, hub in NYC and spokes around the USA.
We have had problems today around the country including NYC. Most of them seem to be resulting from routing issues across their backbone.
Have had problems all day, though it seems to be clearing up now.
Most of our issues have been problems resolving names, in fact hitting IP addresses has been possible throughout our problems.
When I called this AM I heard the automated message and left it at that. After 1pm EST, I called again, and spoke with a technician who said "the problem has been escalated from what we originally thought...our gateway routers are going down, and even after we reboot them, they go back down..." Gateway routers will put a hurtin on one's infrastructure, eh?
Big problems in suburban Chicago as well . . .
This morning around 10:30AM I couldn't get anywhere on the web. When I logged in again later, this was part of the daily message from my ISP:
"10/3 Issues with our backbone provider were impairing access outside the SpiritOne/Aracnet network from 10:20AM until 10:55AM this morning. The backbone connection is still down but at the moment all traffic has been diverted to our secondary backbone connection."
9 p1-0.chcgil2-cr8.bbnplanet.net (4.24.9.46) 71.206 ms 74.431 ms 71.261 ms
10 p6-1.toucham2.bbnplanet.net (4.24.224.58) 82.329 ms 133.483 ms 71.456 ms
11 chi1-core-02.tamerica.net (66.62.7.2) 1397.469 ms 1374.084 ms 1261.074 ms
12 den1-core-01.tamerica.net (66.62.3.29) 1320.134 ms 1295.474 ms 1297.648 ms
I guess Touch America is screwed up too.
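For anyone who wants to spot this kind of jump without eyeballing it, here is a minimal sketch that parses the hops quoted above and flags the first hop whose best round-trip time rises sharply over the previous one. The 500 ms threshold and the function names are my own assumptions for illustration, not anything a provider uses.

```python
import re

# Hops copied verbatim from the traceroute quoted in this comment.
TRACE = """\
9 p1-0.chcgil2-cr8.bbnplanet.net (4.24.9.46) 71.206 ms 74.431 ms 71.261 ms
10 p6-1.toucham2.bbnplanet.net (4.24.224.58) 82.329 ms 133.483 ms 71.456 ms
11 chi1-core-02.tamerica.net (66.62.7.2) 1397.469 ms 1374.084 ms 1261.074 ms
12 den1-core-01.tamerica.net (66.62.3.29) 1320.134 ms 1295.474 ms 1297.648 ms
"""

# hop number, hostname, (ip), then the timing samples
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+)\s+ms")


def parse_hops(text):
    """Return a list of (hop_number, hostname, best_rtt_ms) tuples."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue
        rtts = [float(x) for x in RTT_RE.findall(m.group(4))]
        if rtts:
            # The minimum sample is the least affected by transient queueing.
            hops.append((int(m.group(1)), m.group(2), min(rtts)))
    return hops


def first_jump(hops, threshold_ms=500.0):
    """Return the first hop whose best RTT exceeds the previous hop's
    by more than threshold_ms (an arbitrary, assumed cutoff)."""
    for prev, cur in zip(hops, hops[1:]):
        if cur[2] - prev[2] > threshold_ms:
            return cur
    return None


hops = parse_hops(TRACE)
suspect = first_jump(hops)
print(suspect)
```

On the data above this points at hop 11, the first tamerica.net router, which matches the guess in the post. A big jump at one hop only suggests where the trouble starts; the return path can differ, so treat it as a hint rather than proof.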
Right now the southern portion of the United States is being hit by a hurricane.
Might that be a reason for the disruptions? Falling telephone poles, flooding, winds taking satellite dishes and, well, making satellites out of them?
_ _ _ Go for the eyes Boo! GO FOR THE EYES!
Conceptually, there should be multiple backbones through multiple geographic areas, so that a failure of one provider can be dealt with by routing traffic through an alternate backbone. Realistically this is difficult and expensive, which is the primary reason there are very few top-tier connections running across the United States.
If you look at the map from 1992 (NSF Net | XO OC192 Network), you'll notice that there really are only 2 main paths from east coast to west coast. The southern path is probably at least slightly affected by the incoming hurricane, and the northern path seems to be overloaded or failing for some other reason.
Precautions? Make sure the hardware is sound and easily replaced, and that alternate routes are available in case of failure. The problem is finding alternate routes that aren't completely congested due to the failure.
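To make the "alternate routes" point concrete, here is a toy sketch of path selection over a two-path topology. The node names (`east`, `north_hub`, `south_hub`, `west`) are hypothetical stand-ins for the two coast-to-coast paths mentioned above, not real hubs, and a plain breadth-first search stands in for what real routing protocols do.

```python
from collections import deque

# Hypothetical topology: two coast-to-coast paths, one northern, one southern.
LINKS = {
    ("east", "north_hub"), ("north_hub", "west"),
    ("east", "south_hub"), ("south_hub", "west"),
}


def neighbors(node, failed):
    """Yield nodes adjacent to `node`, skipping any that have failed."""
    for a, b in LINKS:
        if a == node and b not in failed:
            yield b
        elif b == node and a not in failed:
            yield a


def find_path(src, dst, failed=frozenset()):
    """Breadth-first search for a shortest path that avoids failed nodes."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbors(path[-1], failed)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no surviving route


print(find_path("east", "west"))
print(find_path("east", "west", failed={"north_hub"}))
print(find_path("east", "west", failed={"north_hub", "south_hub"}))
```

With one hub failed, traffic still gets through on the other path; with both failed there is no route at all. The sketch ignores capacity, which is the real catch: the surviving path has to carry everything, so it congests.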
There's been discussion of this on the NANOG list, and my DS3 in Chicago was taken down hard by this. Physical layer okay, but traffic died once it was two or three hops into UUnet/Worldcom's core. First outage was from 2am to 8am, second outage from approx. 10:45am (CST) to 2pm. The master tickets for this outage are 651744 (DS1 and below) and 651751 (DS3, OC3 and above). I just got off the phone with Worldcom's NOC and the story I got is that all the border routers that took a dive are back up save a few that they're bringing back up here in Chicago. Worldcom has provided confirmation that the Reason For Outage was a wildly unsuccessful BGP config propagation.
. We've got computers, we're tapping phone lines, you know that ain't allowed - Talking Heads, "Life During Wartime"
Following is WorldCom's maintenance announcement about today's work, which I received because WorldCom is my company's broadband ISP.
During the Normal operations window on Oct 3, 2002
WorldCom will be performing the following scheduled maintenance
activities.
This activity is scheduled to take place from 3:00 a.m. to 6:00 a.m.
(local hub time) in the contiguous US and elsewhere from 3:00 a.m. to
7:00 a.m. (local hub time) and may affect your connectivity. The following
customer ID will be impacted: XXXXXXXXX.
If you have any questions, please contact our local Customer Network
Support Center. Please reference the internal ticket number 645346.
Quality System Management-Global Maintenance Planning
Worldcom (http://www.uu.net)
1(800) 900-0241 / +1(703) 886-5440
WorldCom United States 1-800-900-0241 (select the following options in
order: 2, then 4, then 1)
WorldCom Denmark (45) 80.30.50.50
WorldCom Italy (39) 02.3600.1887
WorldCom Sweden (46) 8.750.88.50
WorldCom Switzerland (41) 1.580.86.11
I don't have much to say. I see some very non-intrusive maintenance done on the switches and routers yesterday. No major internal discussions that have crossed from the networking group over into my jurisdiction, though. I noticed the company's internal VPN was a little up-and-down from 8-9am CST, but that's about it.
AC'd to protect me.
-AD
I received the following from my ISP a few minutes ago:
Trouble Ticket #22048416
Type of Event: Outage
Affected: DED1.CLEVOH
Description: Dear Ameritech Customer,
For several hours today, UUNET, our Global Service Provider in the Ameritech region, suffered a severe routing issue, which impacted most of the Ameritech Internet Services, as well as many other providers who use UUNET as a backbone service. Losses of routes, BGP failures, routing loops, and over-utilized circuits during this time were caused by these issues within UUNET (alter.net). By working with the network engineers at UUNET, we at SBC were able to assist in providing a working resolution for this issue, and we are currently working with UUNET to try and ensure that such an issue does not occur again. As all providers' networks begin to reconverge their routing tables, customers may continue to experience mild latency over the next few hours, but this should disappear in a matter of time. We thank you for your patience and understanding in this matter and apologize for any trouble or inconvenience that this issue may have caused.
"Because I have balls like atom bombs, two of them, 100 megatons each. Nobody fucks with me."
While the Wired News article says that some people were speculating that it was the Slapper worm, other people were speculating that it was a fiber cut, but it first quotes the UUnet page which says they're having a routing anomaly and that it was affecting multiple gateways. That means it's not likely to be a cable cut, because that would be more localized, and it's also not likely to be the Linux worm because the routing stuff isn't happening on Linux boxes - it'd be either Cisco or Juniper, and I'm not aware of any reports that the worm affects those platforms.
Bill Stewart
Apologies if this was posted earlier... from: http://www.dotcomscoop.com/worldcom.html#outage
LATEST NEWS
Thursday October 3, 2002 @ 4:31 PM EDT
Have fun with this one:
FLASH SUMMARY
FLASH NUMBER 20021003023
ORGANIZATION UUNET
SEVERITY LEVEL CATA
CRITERIA 100,000 AOL User drop
IMPACT LOSS OF DATA AND CONNECTIVITY
NETWORK IP
SUBNETWORK UUNET CORE NETWORK
STATE VA
CITY ASHBURN
COUNTRY USA
OUTAGE START DATE 10-03-2002
OUTAGE START TIME 12:29:00 GMT
OUTAGE END DATE none
OUTAGE END TIME none GMT
DURATION
EQUIPMENT N/A
CAUSE OF OUTAGE N/A
CORRECTIVE ACTION
TICKET SOURCE REMEDY
TICKET NUMBER 60138
LEC/OCC TICKET
PVC/CKT Affected
WEBSITE oasis.wcomnet.com
COMMENTS
10-03-2002 12:52 GMT
UUNET NOC has identified a catastrophic outage. Multiple routers across the core network are unreachable. TAC and NOC engineers working to isolate the problem. Information bridge VNET # 211-5675 PIN # 236044
10-03-2002 13:29 GMT
UUNET reports 1462 dedicated customers affected. Senior TAC engineers still working to isolate/resolve the outage.
10-03-2002 14:13 GMT
UUNET NOC reports 13 routers and 240 customers are still down. NOC and TAC still working to restore those devices and customers. No firm reason for outage.
10-03-2002 14:49 GMT
UUNET NOC reports continuing to reload CISCO routers to restore line cards. Approximately 500 T1's and 100 multi-meg customers are down.
10-03-2002 15:25 GMT
UUNET NOC is continuing to troubleshoot disabled line cards. At this time there are 3 GW routers and 5 BR routers that are down. There are 332 T1 customers and 60 multi-meg customers that are affected.
AOL has reported 15,000 new user drops since the last update. CISCO and Juniper representatives are involved in the troubleshooting process.
10-03-2002 15:46 GMT
UUNET NOC reports they are continuing to work with Cisco to determine the cause of the line card outages. Dedicated customers at the T1 and Multi-Meg level continue to lose connectivity.
10-03-2002 16:08 GMT
UUNET NOC reports currently have 97 Multi-Meg and 824 T1 customers down as line cards continue to become disabled. Ashburn NOC reloading routers as necessary to restore service. 11 GW's and 9 BR's are down. Filters are in place to capture crash information in order to provide core dumps to vendor representatives to isolate reason for outage.
10-03-2002 16:54 GMT
UUNET NOC reports they continue to gather card crash information from the various filters set throughout the network. 8158 T1 customers down, 360 Multi-Meg, 14 BR's and 45 GW's are down. Current strategy is to gather more info from filters to isolate reason for outage.
10-03-2002 17:43 GMT
UUNET NOC reports they are working with the vendor on the router crash data. They are also reviewing route updates from 12:00 GMT to identify a possible bad route update that may be causing the line cards to become disabled. There are 608 Multi-meg customers, 10 BRs and 34 GWs down at this time.
10-03-2002 18:32 GMT
UUNET NOC reports the network has been stable for approximately 50 minutes. The NOC is working on restoring 11 BRs and 22 GWs. There are approximately 2800 T1 and 326 Multi-Meg customers that remain down.
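The router counts are scattered through the updates in slightly different wording each time, so here is a rough sketch that pulls them out and finds the worst point. The sentences are copied from the updates above; reading "GW" as gateway routers and "BR" as border routers is my assumption, and the regex is only meant to cope with the phrasings seen in this log.

```python
import re

# Sentences copied from the flash updates above, keyed by GMT timestamp.
UPDATES = {
    "15:25": "At this time there are 3 GW routers and 5 BR routers that are down.",
    "16:08": "11 GW's and 9 BR's are down.",
    "16:54": "8158 T1 customers down, 360 Multi-Meg, 14 BR's and 45 GW's are down.",
    "17:43": "There are 608 Multi-meg customers, 10 BRs and 34 GWs down at this time.",
    "18:32": "The NOC is working on restoring 11 BRs and 22 GWs.",
}

# A number followed by GW or BR; customer counts (T1, Multi-Meg) do not match.
COUNT_RE = re.compile(r"(\d+)\s+(GW|BR)")


def router_counts(text):
    """Return e.g. {'GW': 3, 'BR': 5} for one update sentence."""
    return {kind: int(n) for n, kind in COUNT_RE.findall(text)}


timeline = {t: router_counts(s) for t, s in UPDATES.items()}
peak = max(timeline, key=lambda t: sum(timeline[t].values()))

for t, counts in timeline.items():
    print(t, counts, "total", sum(counts.values()))
print("worst update:", peak)
```

By these figures the number of routers down kept climbing until the 16:54 GMT update and then started to fall, which fits the note that the network had been stable for about 50 minutes by 18:32.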