Tier One ISPs Dying
xbmodder writes "Two tier one ISPs are down today. At about 23:30 PST both Verio and Level 3 started having problems with routes. According to Level 3 this is a software upgrade gone awry. Is this the end for Level 3?" Many, many reports about this are coming in, and if you're wondering why the stories were rather sparse overnight, it's because it's difficult to post them without internet access. Hope everyone else is back online too.
Maybe I'll get some work done today for a change.
-S
--- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
Is there a term for this kind of intermittent site inaccessibility due to Internet outage -- not the user or the server being offline, but the Internet failing?
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
It's nice to see something explaining why I was paged at 2:30am. And now, to whom from Level3 do I send my bill?
But what is a tier 1 ISP?
Is that like a bandwidth wholesaler or something?
This is the sig that says NI (again)
An ISP's server being down for a day is unacceptable of course, but to say it is dying already? Or is there more to these ISPs? (haven't heard of them before)
Bottles Of Beer On The Wall - Advertising Fun! Get your bottle of beer on the wall today!
See http://scoreboard.keynote.com/scoreboard/Main.aspx?xAxis=Destination&yAxis=Origin&zAxis=Metric&nAxis=Period/
Nico M, London, GB.
Take a look at the scoreboard now. The mentioned problems are gone and Level 3 is no longer in the red.
Why would a software upgrade going wrong be the end of a gigantic Tier-1 ISP?
Today I was playing World of Warcraft and on our raid about 25% of my guild mates lost their internet on and off. Other than that the lag was higher than normal, but I wondered what the hell was going on. Anyway, we still pwned some dragons in BWL :)
unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep
Noticed this this morning when a customer called, upset about his hosting services being unreachable. A quick traceroute showed one of Level3's IPs to be down. A few minutes later more customers had problems with different routers from Level3. As soon as I saw Level3 I knew enough, shrugged it off and told the customer that it was a routing problem we couldn't fix, but those responsible were most likely already trying to fix it.
It seems fixed now though, so no, this isn't the death of the Internet just yet.
The reports of my death have been greatly exaggerated.
Tier One
Cake or Death? Cake Please!
Hey look, we slashdotted Level 3!
I was up late studying for a German exam, and I was having problems connecting to websites hosted in Germany that I was using to help myself review (dict.leo.org and canoo.net, if you're curious). US websites worked no problem.
Off to the test!
While this only lasted a few hours, it still caused a mess across the North American Internet during those hours. The point is that a small number of big networks are responsible for over 90% of the traffic on the Internet. If alter.net went down it would be total chaos. If just one of the major peering points went down, sure, the traffic would be rerouted, but it would overload the other points and push latency so high that the Internet would be almost unusable. You better hope no one destroys MAE-EAST or we'll have a live example of what life without the Internet is like.
Nope. Redundancy and reliability cost money. Fast, cheap, reliable, pick two. Take a look at a typical network and count the single points of failure. Then there are common mode failures, like bugs in router software, that can take down entire networks.
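For anyone who wants to actually count the single points of failure, here is a rough sketch. It assumes the network is described as an adjacency list, and the topology and node names below are made up for illustration, not taken from any real network. It brute-forces the answer by removing each node in turn and checking whether everything else can still reach everything else.

```python
from collections import deque


def reachable(graph, start, removed):
    """Return the set of nodes reachable from `start` when `removed` is taken out."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(graph):
    """Nodes whose removal disconnects the rest (assumes the graph starts connected)."""
    spofs = []
    for candidate in graph:
        others = [n for n in graph if n != candidate]
        if not others:
            continue
        # If some remaining node cannot be reached, the candidate was a cut point.
        if len(reachable(graph, others[0], candidate)) < len(others):
            spofs.append(candidate)
    return sorted(spofs)


# Hypothetical topology: two customers hang off one edge router,
# which is dual-homed to two core routers that share a peer.
TOPOLOGY = {
    "cust-a": ["edge"],
    "cust-b": ["edge"],
    "edge": ["cust-a", "cust-b", "core-1", "core-2"],
    "core-1": ["edge", "core-2", "peer"],
    "core-2": ["edge", "core-1", "peer"],
    "peer": ["core-1", "core-2"],
}

print(single_points_of_failure(TOPOLOGY))  # ['edge']
```

The redundant cores cost twice as much and still leave the lone edge router as the weak spot. Common mode failures such as a router software bug are not modeled here at all, since they can take out both cores at once.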
Mea navis aericumbens anguillis abundat
I notice the article links back to Slashdot... I wonder if Slashdot is going to get BoingBoing'ed?
Because you can't spell "slaughter" without "laughter"
GMT -8:00
Religion and politics, without the flame. godgab.org
Is this even an issue? I mean, this was probably scheduled maintenance that went a little longer than expected. I have been through this before. It just sounds like Level 3 dropped some core routers for a few minutes to do a code upgrade - it didn't work so hot, so they were down for a few more minutes, OSPF/BGP decided to tell all the clients that they have no routes, Level 3 gets the routers back up, OSPF/BGP tells everyone that they're fine again. Was this like 6 hours, or 45 min?
You create your own reality - Leave mine to me.
Yes and no.
Yes, the Internet enables/permits/allows redundant routes, but...
No, it doesn't require/demand/"enforce with any government or legal authority" redundancy at all levels.
So any smaller ISPs connected to Level3, and all their customers would have had problems reaching the rest and being reached by the rest.
(sarcasm mode)Obviously this wouldn't have happened if the EU had been in control!(/sarcasm mode)
Actually, how many of these corporations are US companies, and how many are NOT?
Is this the first time this has happened? Is it too early to start talking about re-thinking the way this is put together?
useless sig advice - Read Nabokov.
Now it can't even survive a software upgrade on some of the routers!
Why couldn't this have happened during my business day? For just once when a user calls and asks "is the internet down?" I'd like to be able to say "actually, yes, it is."
this sig deleted by another sig
People have been able to say something like that at every point in history. And I'd hardly call this nastier than hurricanes, and the Tsunami was worse than either them or this. The sky is not falling.
I am trolling
Insightful? I certainly hope comparing a short Internet outage to large disasters is a joke..
One of the lessons of history is that nothing is often a good thing to do and always a clever thing to say. - Will Durant
Domesday is something like a census of Britain circa 1085. It has nothing to do with internet outages, which is more akin to doomsday.
Toronto-area transit rider? Rate your ride.
This sort of event provides motivation for overlay routing schemes, which can compensate for major outages along various routes of the backbone:
http://www.usenix.org/events/nsdi04/tech/full_papers/subramanianOver/subramanianOver.pdf
http://www.eecs.umich.edu/~farnam/pubs/2005-hwj-infocom.pdf
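To give a feel for the basic idea, here is a toy one-hop relay picker. It is not the scheme from either linked paper, only a sketch of the general overlay approach: if the direct path is missing or slow, bounce through another overlay node that can still reach both ends. The node names and latency figures are invented, and a pair missing from the table is assumed to be unreachable.

```python
def best_path(latency, src, dst, relays):
    """Pick the direct path or a one-hop relay path with the lowest total latency.

    `latency` maps (from_node, to_node) to milliseconds; a missing pair
    means that leg is currently unreachable (an assumption of this sketch).
    Returns a list of nodes, or None if nothing works.
    """
    candidates = []
    if (src, dst) in latency:
        candidates.append((latency[(src, dst)], [src, dst]))
    for relay in relays:
        first_leg = (src, relay)
        second_leg = (relay, dst)
        if first_leg in latency and second_leg in latency:
            total = latency[first_leg] + latency[second_leg]
            candidates.append((total, [src, relay, dst]))
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical measurements taken while the direct a -> b route is down.
MEASURED = {
    ("a", "r1"): 40,
    ("r1", "b"): 35,
    ("a", "r2"): 20,
    ("r2", "b"): 90,
}

print(best_path(MEASURED, "a", "b", ["r1", "r2"]))  # ['a', 'r1', 'b']
```

Real overlay systems also have to probe paths continuously and cope with relays that share the same broken backbone, which this sketch ignores.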
An unjust law is no law at all. - St. Augustine
But perhaps what's really meant is:
23:30 PDT = 06:30 UTC = 08:30 CEST ?
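The arithmetic is easy to check with fixed UTC offsets. This sketch assumes PDT is UTC-7, PST is UTC-8 and CEST is UTC+2; the calendar date in it is only a placeholder so that the datetime objects can be built, and it does not affect the hour and minute results.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, so no time zone database is needed.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")
CEST = timezone(timedelta(hours=2), "CEST")


def to_utc_and_cest(hour, minute, source_tz):
    """Convert a local wall-clock time to (UTC, CEST) strings in HH:MM form."""
    # Placeholder date: only the time of day matters for this comparison.
    local = datetime(2005, 10, 20, hour, minute, tzinfo=source_tz)
    utc = local.astimezone(timezone.utc)
    cest = local.astimezone(CEST)
    return utc.strftime("%H:%M"), cest.strftime("%H:%M")


print(to_utc_and_cest(23, 30, PDT))  # ('06:30', '08:30')
print(to_utc_and_cest(23, 30, PST))  # ('07:30', '09:30')
```

So if the summary's "PST" is taken literally, the European times shift an hour later than the line above.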
More like the CIA is upgrading their equipment.
...and not one Netcraft joke?
Way back in the day when I was a Network Controller at BBN Planet, if we began to have cascading routing outages we'd call it "Flapping"... Visualize a wounded bird squirming around on the ground flapping...
Takes me back... My first night on the job a rat in Berkeley chewed through the wrong cable and got himself fried -- he also happened to take the entire west-coast off the internet for the better part of a day.
Then there was the time an electrical worker got vaporized in a hole near MIT which caused quite a problem too as it overloaded the MIT power station, but the fallout wasn't nearly as bad as the day of the rat...
Be who you are and say what you feel, because those who mind don't matter and those who matter don't mind. - Dr. Seuss
First you fail to create a good link, and then that link goes to a login screen? Your link posting rights have been removed.
I'll do the stupid thing first and then you shy people follow...
Well we seem to know why Level3 went down, but why did Verio go down at the same time?
happened in Detroit in the last 24 hours. Apparently all ingoing/outgoing traffic to other Tier One ISPs had problems in that city. Also, Philadelphia had really slow traffic within Level3 (and slower to all the others), and had major problems connecting to Verio. San Diego also had some problems, especially within the Level3 network. St. Louis was the only area without major problems...
For a breakdown, check out this view of the data.
This sig donated to Pater. Long live
Yes, that's exactly what this is. You better curl up into the fetal position in the corner and start crying.
Is there any way we can blame Microsoft for this?
Were they upgrading to one of the Beta builds of Windows Vista Home Edition?
I'm not back online
Level 3 went down at 22:42 PST and was available again around 23:50 PST. Verio started having problems right around the same time that Level 3 was coming back up. The Internet Health Report from Keynote showed me what was going on, scary as it was.
No Not Again! Its whats for dinner.
I don't know that they've replaced Sprint yet on my list of most sucktastic internet companies. Time was you lost connectivity to an important piece of the Internet (Like your favorite Quake TeamFortress server) and a traceroute would show the failure somewhere in the Sprint backbone. So far they've been more reliable than Sprint at their worst, at least for me.
If they go under, well Tier 1's don't ever really die. Chances are one of the other Tier 1's will buy their assets and it'll be business as usual. Usually the buyer is MCI.
Of course the true test is pretty easy -- has anyone who works at Level 3 had their paycheck bounce yet? Surely there are a few readers among their employees...
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
No, of course not, you blithering imbecile. L3 had a 2 hour global routing meltdown. Now, it's fixed. Whilst their routes were flapping, other carriers saw transient increases in latency and some problems with reachability, to some sites. However, everything continued to work properly for non-L3 customers. Two hours later L3's routes are back and working properly. End of story, nothing to see here, move along please.
Slashdot editors, do you really expect us to believe that no-one had submitted a more coherent or accurate story than this one? Come on, for heaven's sake.
Anyway, a network engineer's view can be seen in the overnight traffic on NANOG: http://www.merit.edu/mail.archives/nanog/2005-10/ "Tier One ISPs dying" indeed. Worst. Story. EVER.
"None are more hopelessly enslaved than those who falsely believe they are free." -- Goethe
Glad to see that Tier 1 ISPs are joining the ranks of BSD and Apple.
The Anti-Blog
yeah
AccountKiller
I could easily fix this problem. I would just restore it from the Recycle Bin.
I'm sure you know this, but for the rest: "flapping" is the common term for a route that is rapidly and repeatedly announced and withdrawn, so the routing tables keep cycling between states without ever settling. The dead bird analogy is pretty descriptive, but the term "flapping" has technical and not allegorical origins.
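For a rough picture of what spotting a flap could look like, here is a toy detector. It assumes flapping means a prefix switching between announced and withdrawn several times inside a short window; the event format, the window and the threshold are all made up for illustration. It is not how routers implement flap damping.

```python
def transition_times(events, prefix):
    """Timestamps at which `prefix` changed state (announce <-> withdraw)."""
    times = []
    last_state = None
    for timestamp, event_prefix, state in sorted(events):
        if event_prefix != prefix:
            continue
        if last_state is not None and state != last_state:
            times.append(timestamp)
        last_state = state
    return times


def is_flapping(events, prefix, window_seconds=60, threshold=4):
    """True if `prefix` changed state at least `threshold` times within one window."""
    times = transition_times(events, prefix)
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] <= window_seconds:
            return True
    return False


# Hypothetical update log: (seconds, prefix, state). Prefixes are documentation ranges.
EVENTS = [
    (0, "192.0.2.0/24", "announce"),
    (5, "192.0.2.0/24", "withdraw"),
    (9, "192.0.2.0/24", "announce"),
    (14, "192.0.2.0/24", "withdraw"),
    (20, "192.0.2.0/24", "announce"),
    (0, "198.51.100.0/24", "announce"),
    (300, "198.51.100.0/24", "withdraw"),
]

print(is_flapping(EVENTS, "192.0.2.0/24"))     # True
print(is_flapping(EVENTS, "198.51.100.0/24"))  # False
```

The first prefix changes state four times in 15 seconds, while the second is withdrawn once and stays down, which is an outage but not a flap.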
Dewey, what part of this looks like authorities should be involved?
I got the DTs (delirium telnets) from not getting my fix last night...
The scary thing is it makes you wonder if some terrorist who has intimate knowledge of how Tier 1 ISPs work is doing a trial run in the middle of the night by knocking out the Level 3 and Verio backbones, so that later they could try to knock out ALL the backbones in a coordinated terrorist attack.
It doesn't make me wonder that. Terrorists do not give a shit about this kind of thing. To even invoke the word "terror" in this discussion is ludicrous.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
Oh, so THAT's why my daily spam load suddenly dropped by about 35% or so...
Bruce Lane, KC7GR,
Blue Feather Technologies
They are doing pretty well, not amazingly so (what telco is?) but they have a lot of cash and a stable recurring revenue base. They also have a pretty good outlook because they are one of the few companies not caught with their pants down when the FCC mandated E911 support - which a lot of people are coming to Level 3 for. If you think VOIP has a future then so does Level 3. The market thinks so; regardless of your outlook the stock has been up quite a bit recently.
To call them a "dot bomb" is really unfair since they were far more financially prudent during the timeframe, which is why they are still around at all in the dark forest of discarded Telco husks.
Disclaimer, I work for Level 3. But on the other hand doesn't that mean that I know more than most people about the real situation here?
I have had my paycheck bounce at companies I've worked for in the past and been told I'd have to wait an extra month or two for pay at said companies (you know the kind, six employees and the owner's mom uses the company AMEX for trips to DisneyWorld while you wait weeks more to get paid). Level 3 is a few billion dollars away from that sad state.
And don't accuse me of drinking Kool-aid either - after going through a lot of layoffs over the years you have a VERY realistic outlook on what the company does well and what it does not.
I was looking for a Linux Virtual Host, blah, blah.
Stumbled upon these pretty pictures (near bottom of page).
Curious, I thought, what happened to Level(3)? I thought for a second that perhaps unixshell had peering with those people that Level(3) were in dispute with.
Nope, just one of those regular outages that make the 99.999% promises sound a little overdone.
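To put a number on that, here is the back-of-the-envelope math. It assumes a 365-day year and takes the two-hour outage figure quoted elsewhere in this discussion as the example downtime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted by a given availability promise."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes):
    """Yearly availability achieved if total downtime was `downtime_minutes`."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


print(round(allowed_downtime_minutes(99.999), 2))  # 5.26 minutes per year
print(round(availability_percent(120), 3))         # 99.977 percent after a 2 hour outage
```

A single two-hour outage uses up more than twenty years of a five nines downtime budget.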
No you idiot, it's because we block you on sight.
Karma: Meh (Mostly from meh.)
Don't blame him! This was a software upgrade gone awry.
94% of Repubs and 21% of Dems voted to renew the Patriot Act
Does it make you happy you're so strange?
Used to do contract work at an auto company's plant. The main data center's primary job was to feed test programs to a distributor testing line and collect the stats. It was located in the middle of the plant on the second floor, next to the row of test stands.
Some time after my contract had ended I visited the place and it was a total disaster.
During the model change shutdown (when most of the plant maintenance and rearrangement was done) the millwrights were welding on some cableways on the ceiling of the plant floor below. The fumes from the welding, of course, rose to the ceiling and escaped through the first hole they could find - around the big fire sprinkler pipe that went up through the floor of the computer room and into the space beneath the raised floor.
It tripped one ionization smoke alarm and sounded the warning - but nobody was around during the shutdown to hear it. Shortly thereafter it tripped a second one and the halon system went off. The computer power shut down and $10,000 worth of halon blasted into the computer room. Half of it came out through vents under the floor, throwing the raised floor panels and a decade's accumulation of fine dust (much of it byproducts of metal cutting and annealing) all over the room. That finally sounded an alarm at the guard shack.
The guards came over and found the room in disarray but not the slightest sign of a fire. A couple million bucks worth of computer equipment, slated for replacement in another few months but still critical to the plant's operation, was standing there, covered with dust (likely to cause trouble for the disk drives later) but otherwise intact. So they followed procedure and reset the halon system, switching to the backup cylinder, to protect the computer in case an actual fire made it to the comp room. (Normally that's a good idea, since smouldering that sets off smoke detectors is often followed some time later by an actual fire.)
Of course the welding was still going on - just not at the moment the guard sniffed the comp room. (Welders out to lunch, pulled out due to the alarm, or having decided to come down off the ceiling for a bit after the blast of gas from above.) And they still had work to do. So of course they went back to it.
In less than an hour the situation repeated, dumping the SECOND $10,000 worth of halon on the non-fire. B-(
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way