UUNET/WorldCom Backbone Diffiiculties
FearlessFritz writes "UUnet seems to be having a bad time recently. Several sites in the SouthEast of the US have been slow or down. Here is Worldcom's quote from their web page: 'WorldCom is currently experiencing an interruption of service in various hubs in the U.S. We are working to restore a routing anomaly, and making necessary progress toward resolving this disruption in service.' There are several rumors abounding, but the best is that they performed a hardware upgrade that failed. Is anyone outside of the Southeastern U.S. experiencing the effects of this outage? (I am peered to several providers so I can post!)"
It will take the 3 network techs that haven't been fired yet some time to fix this.
(the voice from the movie WarGames after the first wave of Russian missile launches hit:) "We're still here!"
I have had very slow internet access most of the day in the upper midwest. The problem started at about 8:15 CT this morning.
I'm in Alexandria, VA, and I've had horrible service all day long. I could get local sites, but nothing far away. Pinging slashdot won't even work half the time :(. And no google....
Maybe the recent hurricanes knocked over the trailer containing the routers.
"Would it kill you to put down the toilet seat?" -- Maya Angelou
I can verify this issue. I work for a company out of the DC area and our data lines are provided by UUNet. I live in Kansas though and do all my work via SSH to the servers in DC. This morning my SSH sessions would randomly seem to hang. After a bit of investigation I started noticing it was taking like 600ms per hop once I got to the UUNet network. So that is North-East there.
Unstable Apps: Our Android Apps Don't Suck
If you're not here, raise your hand! If you can't get online, send an e-mail to the network admin!
Good judgment comes from experience.
Experience comes from bad judgment.
wondering why it took so long to see posts, and then realized, everyone must be on UUNET and can't race for f.p.
Has been pretty spotty. Times inside of Worldcom's network have been pretty slick across the country (15ms trip time from Cleveland to most anywhere), however almost anything that goes over to a different provider seems to have some pretty horrific lag times (at least a second or two).
Wasn't MCI/Worldcom/C&W involved in a lawsuit a few yrs ago for similar problems, but they went on for 3 days? Back then they got sued for loss of business, imagine what's gonna happen now.
You would think they would have learned. But they're still hiding information from people.. great. Maybe it's gone with the 6 billion dollars...
some packets were lost today across the world in various network transactions.
http://www.naildrivin5.com/davec
They're just embarrassed because they can't figure out how to get CowboyNeal to get down off of their router.
-S
We Apprentice Developers and Designers
Hub: Normal
Outages: Normal
this is from their network status page, i try to abstain from being a smart-ass but outages are normal?
-tid242
With a few exceptions, secrecy is deeply incompatible with democracy and with science. --Carl Sagan
same deal here
UUNET has a big center out in Ashburn, VA, maybe there are problems there too
Bring back the old version of slashdot.
A whole lot of red over at the InternetTrafficReport. Any other good informative sites?
May this post be indexed by spiders, and archived for all to see as my Internet epitaph.
I think his petswarehouse.com site's had so much traffic over the last couple of hours it's exploded and caused a huge chain reaction ;-)
Code, Hardware, stuff like that.
I work at a small hosting company and our UUNet connectivity (Central California via Anaheim hub) has been screwed since around 6am pacific time. Up and down all morning with latency between 500 and 2000 ms when it is up. Yay worldcom.
...seems to be fine.
Oh, come on. Laugh! You know you want to.
Here at work (not served directly by UUNet) service to various websites has been intermittently down for up to a few hours at a time.
Check it out here
Must be slowing down from trying to filter all that porn after being slapped by the Pennsylvania State Government :) JK
WorldCom Forced To Block Questionable Sites
I'm currently gathering quotes for a new leased line as part of an office move to new premises. Don't know how they got hold of our details, but I had a call from WorldCom today about arranging a visit from a sales representative for a leased line quote. I wonder how they'll react when I ask them about this?
I have had prior dealings with UUnet as one of our customers uses them, and to be honest, their support is dire. One of their DNS servers was not refreshing its cache well at all, resulting in a client not being able to access our website at random periods. Weird error: one minute he would get "proxy errors, no website at <ip>" where ip was an old IP we no longer used, and the next, he would get us fine.
Yesterday morning, I got to work. Net was down. Phones were down. I got a message from our network support. It said that AT&T service (our bandwidth and long distance provider) was down all over the east coast. It came back about an hour later.
What the hell happened? Nothing on the news, nothing obvious on AT&T's site. You'd think that an hour-long outage of an entire coast would at least hit the newswires.
Yup - up here in Eastern Canada too.
My network failed over to my other provider. At about 11:30 EST the net started to act funky. Since I have turned them off things are back to normal... The only thing worse than a failed T1 line is a silently failed T1 line. What a pain.
http://www.merit.edu/mail.archives/nanog/msg040
nothing concrete and MIDS doesn't show anything on the weather reports (not that it means anything).
I am a WCom customer in NYC. We have a hub-and-spoke VPN from them, hub in NYC and spokes around the USA.
We have had problems today around the country including NYC. Most of them seem to be resulting from routing issues across their backbone.
The internet hasn't been working for me all day.
Have had problems all day, though it seems to be clearing up now.
Most of our issues have been problems resolving names, in fact hitting IP addresses has been possible throughout our problems.
When I called this AM I heard the automated message and left it at that. After 1pm EST, I called again, and spoke with a technician who said "the problem has been escalated from what we originally thought...our gateway routers are going down, and even after we reboot them, they go back down..." Gateway routers will put a hurtin on one's infrastructure, eh?
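The symptom reported above (names failing to resolve while raw IP addresses stay reachable) can be told apart from a plain connectivity failure with two independent checks. The following is only a rough sketch: the `diagnose` function, its parameters, and its return strings are made up for illustration, and the `resolve` and `connect` arguments exist only so the checks can be swapped out.

```python
import socket


def diagnose(hostname, known_ip, port=80, timeout=3.0,
             resolve=socket.gethostbyname,
             connect=socket.create_connection):
    """Check name resolution and a raw IP connection separately.

    hostname and known_ip are supplied by the caller; known_ip is an
    address you already know belongs to the service (an assumption).
    """
    try:
        resolve(hostname)
        dns_ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        dns_ok = False

    try:
        conn = connect((known_ip, port), timeout)
        conn.close()
        ip_ok = True
    except OSError:  # timeouts and refused connections land here
        ip_ok = False

    if ip_ok and not dns_ok:
        return "name resolution failing, IP path works"
    if dns_ok and not ip_ok:
        return "names resolve, IP path failing"
    if not dns_ok and not ip_ok:
        return "both name resolution and IP path failing"
    return "both checks passed"
```

Calling `diagnose("www.example.com", "192.0.2.10")` would run both checks against the live network; the hostname and address here are placeholders, not anything from the thread.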
Big problems in suburban Chicago as well . . .
This morning around 10:30AM I couldn't get anywhere on the web. When I logged in again later, this was part of the daily message from my ISP:
"10/3 Issues with our backbone provider were impairing access outside the SpiritOne/Aracnet network from 10:20AM until 10:55AM this morning. The backbone connection is still down but at the moment all traffic has been diverted to our secondary backbone connection."
The problems with WorldCom's Internet backbone today explains why on a number of sites I visit frequently things are slowing down quite a bit. No wonder banner ads are not showing correctly.
Digex manages hosting for a key service my employer provides, and they're in and out intermittently.
Our VPN link keeps going up, down, down in one direction, around in circles, several times per minute.
http://www.internettrafficreport.com has some fun results for you, too.
Even Jesus hates listening to Creed.
You've really done it this time /.
Who is John Galt?
We have a T1 Worldcom line here in Toronto, it's fine for the most part, but we have some servers hosted (?somewhere?) in the US with Level3, and we've had a horrible time connecting to them today. Through my home cable Internet connection, the connections to our servers are fine.
I'll have something intelligent to add one of these days...
Right now the southern portion of the United States is being hit by a hurricane.
Might that be a reason for disruptions? Falling telephone poles, floods of water, winds taking satellite dishes and, well, making satellites from them?
_ _ _ Go for the eyes Boo! GO FOR THE EYES!
First post! ... no wait.
Look a monkey!
I have been seeing an enormous number of netbios (port 137) hits on my firewall over the last few days. I usually get a few here and there (in between ssh and ftp hits from Asia. . .) but it's been the majority of stuff over the past coupla days.
I am a believer of momentum and curves.
So they are having routing problems and you put a link to their web page up. Nice of you.
You'd thiink someone would make the obviious post about the use of the letter ii in the subject. II mean, iif they ediited the storiies a liittle closer, the'd not miiss such obviious spelliing errors.
Lowmag.net
What I usually do as soon as I wake up in the morning is check the IP traffic on major US backbones such as Exodus. You can find plenty of open route servers on this page: www.traceroute.org. And I can say that the load was unusually high this morning (sh ip ?).
Conceptually, the logic states that there should be multiple backbones through multiple geographic areas, such that a failure of one provider could be dealt with by routing traffic through the alternate backbone. Realistically this is difficult and expensive, and that is the primary reason there are very few top tier connections running across the United States.
If you look at the map from 1992 (NSF Net | XO OC192 Network), you'll notice that there really are only 2 main paths from east coast to west coast. The southern path is probably at least slightly affected by the incoming hurricane, and the northern path seems to be overloaded or failing for some other reason.
Precautions? Make sure the hardware is sound and easily replaced, and that alternate routes are available in case of failure. The problem is finding alternate routes that aren't completely congested due to the failure.
Mooniacs for iOS and Android
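The "alternate routes in case of failure" idea from the post above can be sketched as a path search that skips failed hubs. This is a toy illustration only: the six-city topology, the `LINKS` table, and the `find_route` helper are all invented here and do not describe any real backbone.

```python
from collections import deque

# Hypothetical topology with a northern and a southern east-west path.
LINKS = {
    "NYC": ["CHI", "ATL"],
    "CHI": ["NYC", "DEN"],
    "ATL": ["NYC", "DAL"],
    "DEN": ["CHI", "SFO"],
    "DAL": ["ATL", "SFO"],
    "SFO": ["DEN", "DAL"],
}


def find_route(links, src, dst, failed=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids failed hubs."""
    if src in failed or dst in failed:
        return None
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no surviving path


print(find_route(LINKS, "NYC", "SFO"))                          # northern path
print(find_route(LINKS, "NYC", "SFO", failed={"CHI"}))          # falls back south
print(find_route(LINKS, "NYC", "SFO", failed={"CHI", "DAL"}))   # both paths cut
```

The sketch ignores link capacity, which is the catch the post points out: the surviving path may exist and still be too congested to be useful.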
I've been lagging/not lagging off and on on IRC, about 83ms-100ms pings to google (slightly above normal), apparently there was a big outage or something because when I came back my BNC running on my router had disconnected. For a few minutes or so, I lagged horribly on IRC and AIM, and had no ping anywhere...
--j
. when our backbones fail... what do we do?
Slither around on the floor?
I've had enough abrasive sigs. Kittens are cute and fuzzy.
My bad, I fixed the hardware problem. Everything should be working now. -- Al Gore
-- jimmycarter
There's been discussion of this on the NANOG list, and my DS3 in Chicago was taken down hard by this. Physical layer okay, but traffic died once it was two or three hops into UUnet/Worldcom's core. First outage was from 2am to 8am, second outage from approx. 10:45am (CST) to 2pm. The master tickets for this outage are 651744 (DS1 and below) and 651751 (DS3, OC3 and above). I just got off the phone with Worldcom's NOC and the story I got is that all the border routers that took a dive are back up save a few that they're bringing back up here in Chicago. Worldcom has provided confirmation that the Reason For Outage was a wildly unsuccessful BGP config propagation.
. We've got computers, we're tapping phone lines, you know that ain't allowed - Talking Heads, "Life During Wartime"
The problem is pretty clear - they are working to restore a routing anomaly rather than correcting the ones they still have. I would tell them that if they continue to restore anomalies things will only get worse, but I can't get through to them.
I'm an American. I love this country and the freedoms that we used to have.
"Diffiiculties?"
Oh, man, it's affecting data transmission quality now.
-Waldo Jaquith
You want http://www1.worldcom.com/us/tools/noc/status.xml
News Performance: Normal
DNS Service: Normal
Backbone: WorldCom is currently experiencing an interruption of service in various hubs in the U.S. We are working to restore service as quickly as possible.
Dialup: Normal
Hub: Normal
Outages: Normal
One of the big problems here is that Worldcom still operates various units as separate entities; virtually no integration has been done to get UUnet working with MFS working with MCI. It's a lot of fun troubleshooting a circuit and having techs tell you "the problem is with MCI, I work for MFS." !!!!! They all work for Worldcom!
Okay, rant mode off.
. We've got computers, we're tapping phone lines, you know that ain't allowed - Talking Heads, "Life During Wartime"
Anything routed to UUnet from Comcast (AT&T) has been picking up an 800ms lag. It has been doing this off and on for the past week or so.
Last night (circa 9PM MDT) I could only reach sites via Level3 or ATT (via L3), according to traceroute, for about an hour or so. Nothing else got anywhere at all, so it might have been a local ISP problem.
-- Alastair
Darn, does this mean I'm going to have trouble getting my daily pr0n fix?
For twenty minutes this morning my traffic to the east coast was being dropped at XL2.SEA.ALTER.NET
WorldComEdy strikes again! Maybe they should hire back some of the people they let go to fix this. They could finance it by replacing the executives' cocaine with crushed OxyContin.
The Uncoveror: It's the real news.
He is basically the go between for the Tech and customers...he says downtimes like this happen everyday, it's just an extremely bad day today. :-)
Derek Greene
The network outage was unrelated to WorldCom's bankruptcy, and the cause is unknown, Burns said.
I have this image that in order to save money, they are routing all of the Southeast's traffic through an AOL dialup using Windows internet sharing.
Do not taunt Happy Fun Ball(TM)
May help a bit, but I'm sure many of the customers' ISPs use Worldnet for their backbone too.
I can't even get to the Pets Warehouse site!
echo 656472616c73746f6e406d61632e636f6d0a|xxd -r -p
I will give you your so called "Internet" back as soon as you declare me King of the World! Pinky, gnaw on that wire some more.
But Brain, it hurts my teef.
Pinky! Destiny awaits us!
Narf!
Brain taps foot, frowns at Pinky standing alone covered in electrical char with a wire in his mouth.
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Following is WorldCom's maintenance announcement about today's work, which I received because WorldCom is my company's broadband ISP.
During the Normal operations window on Oct 3, 2002
WorldCom will be performing the following scheduled maintenance
activities.
This activity is scheduled to take place from 3:00 a.m. to 6:00 a.m.
(local hub time) in the contiguous US and elsewhere from 3:00 a.m. to
7:00 a.m. (local hub time) and may affect your connectivity. The following
customer ID will be impacted: XXXXXXXXX.
If you have any questions, please contact our local Customer Network
Support Center. Please reference the internal ticket number 645346.
Quality System Management-Global Maintenance Planning
Worldcom (http://www.uu.net)
1(800) 900-0241 / +1(703) 886-5440
WorldCom United States 1-800-900-0241 (select the following options in
order: 2, then 4, then 1)
WorldCom Denmark (45) 80.30.50.50
WorldCom Italy (39) 02.3600.1887
WorldCom Sweden (46) 8.750.88.50
WorldCom Switzerland (41) 1.580.86.11
They were cheaper than UUNET on a burstable T1, and I haven't had any issues today - related to UUNET. No, that still sounds bad. They've been good, I don't see what the poster sees :)
:P
They've been rated #1 or #2 the past few years on Boardwatch (Savvis could own it for all I know), based on latency and ping time, IIRC.
They do multiple peering, and supposedly are dynamic, so with UUNET down, supposedly they're rerouting my traffic across another provider to reach those spots..
Of course, if a site is actually ON UUNET, there might not be any other way to get there.. get it?
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatantly stolen sig)
That's not a link, this is a link!!
Carpe Cerevisi - Seize the Beer
Actually I like the fact of such interruptions - it may convince more and more managers, architects and developers to use asynchronous transactional messaging protocols (like JMS or SOAP or even SMTP if with confirmation) vs decent client server ones (CORBA, proprietary).
Less is more !
..is it called bankruptcy?
Man is born free; and everywhere he is in chains.
That's my speculation to add to the rumor.
'SBEMAIL!' is better than a goat!!
Uhm.. six or seven routers for a 14 node network that could work just fine off a single 16-port switch? That is the biggest case of overkill that I've ever heard!
Ouch! The truth hurts!
for hours now here in SF. Thought it was just my employer but I feel better now.
errr....umm...*whooosh* *whoosh* Is this thing on ?
My colo is at JC and we took a hit yesterday. Tech support later told me UUNET had said there was a DDOS going on and it was dragging us down. I wonder if he was feeding me a line of B.S...
Pedro
----
The Insomniac Coder
I definitely noticed things were slower here than usual. I had SSH failures, and very long page loads, and intermittent downtime. But we are up and running.
Anyone who's done any kind of IOS upgrading on some of the upper-end Cisco routers and Juniper routers knows that the upgraded images aren't always the most stable items around.
At one point, there was a severe outage at Genuity referred to as "Black Tuesday", when an IOS upgrade sunk a majority of the network and caused a ripple that made for a really shitty morning.
That was a few years ago, though. I can't go into the specifics of the RFO...but the failure was a very visible issue which resulted in modifications to the testing and change management processes.
Unfortunately, sometimes software passes testing and doesn't break badly enough to notice until it is actually put into production.
// Agent Green (Ian / IU7 / KB1JQO)
// IEEE 802.3: All 10base Are Belong To Us
I've been having slow internet access for about 7 years here in the Midwest.
" Is anyone outside of the Southeastern U.S. experiencing the effects of this outage? "
So let me get this straight: You want the users of Slashdot to report internet outages? And how are we supposed to rule out that Slashdot didn't cause them?
Significant increase in demand.
Redhat 8.0 ISO's
Mandrake 9.0 ISO's
UT 2003
I have heard from lots of people (and myself!) from all over South-Western Ontario having difficulties reaching most websites (mostly in the US). It seems to be off and on, right now, though.
It's obviously NOT a backbone problem....Redhat 8.0 just came out a few days ago...of course traffic is going to be nuts, and slower than hell, IT'S 5 CDs NOW! (including source, mind you)
In college, really poor, need a flatscreen.
Judging by ITR the big crash happened around 8 am Eastern time... maybe everybody got to work in the Boston-to-DC corridor, checked their email and said "A bugbear? That sounds cute", sending millions of emails and backdoor remote attack sessions over an internetwork that is already having trouble because of a hurricane.
Meanwhile, everyone who was on a system that doesn't use Outlook was slamming the FTP servers for RH8 or Lunar 1.
All's true that is mistrusted
Heh, and here I thought my trouble with the Internet lately was just this ^%$@* work computer.
grumble stupid Windoze mutter Excel interface curse BSOD growl tech writer grumpy imprecate need antidote snarl
I'm not a geek, I'm just a clever script.
I was wondering why it was taking so long to download "the Two Towers" this morning.
LongTail SSH Brute Force analysis tool is here!
I received the following from my ISP a few minutes ago:
Trouble Ticket #22048416
Type of Event: Outage
Affected: DED1.CLEVOH
Description: Dear Ameritech Customer,
For several hours today, UUNET, our Global Service Provider in the Ameritech region, suffered a severe routing issue, which impacted most of the Ameritech Internet Services, as well as many other providers who use UUNET as a backbone service. Losses of routes, BGP failures, routing loops, and over-utilized circuits during this time were caused by these issues within UUNET (alter.net). By working with the network engineers at UUNET, we at SBC were able to assist in providing a working resolution for this issue, and we are currently working with UUNET to try and ensure that such an issue does not occur again. As all providers' networks begin to reconverge their routing tables, customers may continue to experience mild latency over the next few hours, but this should disappear in a matter of time. We thank you for your patience and understanding in this matter and apologize for any trouble or inconvenience that this issue may have caused.
"Because I have balls like atom bombs, two of them, 100 megatons each. Nobody fucks with me."
The page said that earlier, but when I did a reload just now, it no longer has that message - are things back to normal?
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
our DSL provider tells us our network outage (yeah, we can see their router but no further) is due to the Worldcom debacle. Possibly, though they can be a little flaky themselves sometimes.
Plus there were those pings that had their TTL expire because they kept bouncing back and forth between two alter.net routers...a whole lotta crap going on today.
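For anyone wondering why a ping dies with an expired TTL in that situation, here is a toy model of two routers that each believe the other is the next hop. It is a simplified sketch, not how any real router is implemented: the router names `r1` and `r2`, the `forward` helper, and the single-destination next-hop tables are invented for illustration.

```python
def forward(start, dst, next_hop, ttl=64):
    """Walk a packet through next-hop tables until delivery or TTL expiry.

    next_hop maps each router to the neighbour it forwards to for dst.
    Each router decrements the TTL before forwarding and drops the packet
    when the TTL reaches zero.
    """
    node = start
    path = [node]
    while node != dst:
        ttl -= 1
        if ttl <= 0:
            return "ttl expired", path
        node = next_hop[node]
        path.append(node)
    return "delivered", path


# Healthy tables: r1 hands off to r2, r2 delivers to the host.
healthy = {"r1": "r2", "r2": "host"}
# Broken tables: r1 and r2 point at each other, forming a routing loop.
looped = {"r1": "r2", "r2": "r1"}

print(forward("r1", "host", healthy, ttl=8))
print(forward("r1", "host", looped, ttl=8))
```

With the looped tables the packet just alternates between the two routers until its hop budget is gone, which matches what a traceroute shows when the same pair of hops repeats.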
I'm off to buy a few crates of canned food and bottled water... ;)
The worst part of an outage like this is that the users always blame you for any connectivity problems. "I can't get to the D&B website, when are you going to have it fixed?" You patiently explain that the circuits to your provider are fine, your provider's circuits are fine, and the problem is either with D&B's network or their provider. "Yeah, whatever, when are you going to have it fixed?" Lusers are utterly hopeless; unfortunately you have to at least humor them when they sign your paychecks.
Happy Fun Ball is for external use only.
when our backbones fail... what do we do?
Slither around on the floor?
Um, get the Doctor to make us a holographic backbone??
I told the NOC boys that sending those BGP tables to /dev/null on the Junipers would be a good idea since they were starting to take up too much disk space... (too many MCSEs in here anyway!) Now with a few keystrokes I will finish reclaiming their diskspace and deleting their accounts... there we go, and now I am off to "get a cup of coffee" and gather their office toys while they are being escorted out of the building. They won't interrupt me from grepping emails again. In the spirit of my mentor, BOFH.
Isn't this a bit like "Keyboard Error: Press F1 to continue..." or even, "My e-mail is down. Really, send me an e-mail about the problem."
Perhaps we need a salshfault.org to contain this new brand of comedy.
Hammy
Maybe the hurricane last night snapped that major backbone that goes through coastal Louisiana. ;)
Kinda puts network outages in perspective.
Best Slashdot Co
Can someone please come down and fix my internet? Or is "the server" down?
I'm going to lunch between 11:30 and 12:30, so that should be a good time for you to fix it.
Thanks!
The network engineers at Worldcom should have known better than to do an upgrade during a time of high solar activity.
What were they thinking?
While the Wired News article says that some people were speculating that it was the Slapper worm, other people were speculating that it was a fiber cut, but it first quotes the UUnet page which says they're having a routing anomaly and that it was affecting multiple gateways. That means it's not likely to be a cable cut, because that would be more localized, and it's also not likely to be the Linux worm because the routing stuff isn't happening on Linux boxes - it'd be either Cisco or Juniper, and I'm not aware of any reports that the worm affects those platforms.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Apologies if this was posted earlier... from: http://www.dotcomscoop.com/worldcom.html#outage
LATEST NEWS
Thursday October 3, 2002 @ 4:31 PM EDT
Have fun with this one:
FLASH SUMMARY
FLASH NUMBER: 20021003023
ORGANIZATION: UUNET
SEVERITY LEVEL: CATA
CRITERIA: 100,000 AOL User drop
IMPACT: LOSS OF DATA AND CONNECTIVITY
NETWORK: IP
SUBNETWORK: UUNET CORE NETWORK
STATE: VA
CITY: ASHBURN
COUNTRY: USA
OUTAGE START DATE: 10-03-2002
OUTAGE START TIME: 12:29:00 GMT
OUTAGE END DATE: none
OUTAGE END TIME: none GMT
DURATION:
EQUIPMENT: N/A
CAUSE OF OUTAGE: N/A
CORRECTIVE ACTION:
TICKET SOURCE: REMEDY
TICKET NUMBER: 60138
LEC/OCC TICKET:
PVC/CKT Affected:
WEBSITE: oasis.wcomnet.com
COMMENTS
10-03-2002 12:52 GMT
UUNET NOC has identified a catastrophic outage. Multiple routers across the core network are unreachable. TAC and NOC engineers working to isolate the problem. Information bridge VNET # 211-5675 PIN # 236044
10-03-2002 13:29 GMT
UUNET reports 1462 dedicated customers affected. Senior TAC engineers still working to isolate/resolve the outage.
10-03-2002 14:13 GMT
UUNET NOC reports 13 routers and 240 customers are still down. NOC and TAC still working to restore those devices and customers. No firm reason for outage.
10-03-2002 14:49 GMT
UUNET NOC reports continuing to reload CISCO routers to restore line cards. Approximately 500 T1's and 100 multi-meg customers are down.
10-03-2002 15:25 GMT
UUNET NOC is continuing to troubleshoot disabled line cards. At this time there are 3 GW routers and 5 BR routers that are down. There are 332 T1 customers and 60 multi-meg customers that are affected.
AOL has reported 15,000 new user drops since the last update. CISCO and Juniper representatives are involved in the troubleshooting process.
10-03-2002 15:46 GMT
UUNET NOC reports they are continuing to work with Cisco to determine the cause of the line card outages. Dedicated customers at the T1 and Multi-Meg level continue to lose connectivity.
10-03-2002 16:08 GMT
UUNET NOC reports currently have 97 Multi-Meg and 824 T1 customers down as line cards continue to become disabled. Ashburn NOC reloading routers as necessary to restore service. 11 GW's and 9 BR's are down. Filters are in place to capture crash information in order to provide core dumps to vendor representatives to isolate reason for outage.
10-03-2002 16:54 GMT
UUNET NOC reports they continue to gather card crash information from the various filters set throughout the network. 8158 T1 customers down, 360 Multi-Meg, 14 BR's and 45 GW's are down. Current strategy is to gather more info from filters to isolate reason for outage.
10-03-2002 17:43 GMT
UUNET NOC reports they are working with the vendor on the router crash data. They are also reviewing route updates from 12:00 GMT to identify a possible bad route update that may be causing the line cards to become disabled. There are 608 Multi-meg customers, 10 BRs and 34 GWs down at this time.
10-03-2002 18:32 GMT
UUNET NOC reports the network has been stable for approximately 50 minutes. The NOC is working on restoring 11 BRs and 22 GWs. There are approximately 2800 T1 and 326 Multi-Meg customers that remain down.
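The customer counts in the updates above are easier to follow once pulled out of the prose. Below is a rough sketch that extracts the T1 and Multi-Meg figures with regular expressions; the `UPDATES` list copies five of the status lines from the log, while the `customers_down` and `peak_t1` helpers and the regex patterns are my own assumptions about how the wording varies.

```python
import re

# (time in GMT, text copied from the flash updates above)
UPDATES = [
    ("14:49", "Approximately 500 T1's and 100 multi-meg customers are down."),
    ("15:25", "There are 332 T1 customers and 60 multi-meg customers that are affected."),
    ("16:08", "UUNET NOC reports currently have 97 Multi-Meg and 824 T1 customers down"),
    ("16:54", "8158 T1 customers down, 360 Multi-Meg, 14 BR's and 45 GW's are down."),
    ("18:32", "There are approximately 2800 T1 and 326 Multi-Meg customers that remain down."),
]

T1_PATTERN = re.compile(r"(\d+)\s+T1", re.IGNORECASE)
MULTI_MEG_PATTERN = re.compile(r"(\d+)\s+multi-meg", re.IGNORECASE)


def customers_down(text):
    """Return (t1_count, multi_meg_count); None where a figure is absent."""
    t1 = T1_PATTERN.search(text)
    mm = MULTI_MEG_PATTERN.search(text)
    return (int(t1.group(1)) if t1 else None,
            int(mm.group(1)) if mm else None)


def peak_t1(updates):
    """Return the (time, count) pair with the highest reported T1 count."""
    counted = [(when, customers_down(text)[0]) for when, text in updates]
    counted = [(when, n) for when, n in counted if n is not None]
    return max(counted, key=lambda pair: pair[1])


for when, text in UPDATES:
    print(when, customers_down(text))
print("peak T1 outage:", peak_t1(UPDATES))
```

On these five lines the T1 figure is highest in the 16:54 GMT update, before the network is reported stable at 18:32.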
I was having some real issues with some instant messengers working (and no, this isn't normal). When I called BellSouth's ADSL tech center they told me there was some problem, cause unknown, starting at approx. 8:30am EST and resolution time unknown. Could this be the same thing?
If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
I run a small ISP in Portland, OR who's been down for two days because my network got deleted from the RADB from which the backbone ISP builds their routing tables. It's been working fine since I started using it almost a year ago, and magically stopped working the evening of Oct 1 (first of the month, in the evening when the backbone updates their tables), so I think a policy change topside is the "routing anomaly" that has barfed up everything. At least I'm supposed to be back online later this evening...
Anyone who's done any kind of IOS upgrading on some of the upper-end Cisco routers and Juniper routers knows that the upgraded images aren't always the most stable items around.
So try Redback's Smartedge. It uses a separately-developed code base.
B-)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way