Slashdot Mirror


Lessons Learned From Skype’s Outage

aabelro writes "On December 22th, 1600 GMT, the Skype services started to become unavailable, in the beginning for a small part of the users, then for more and more, until the network was down for about 24 hours. A week later, Lars Rabbe, CIO at Skype, explained what happened in a post-mortem analysis of the outage."

22 of 278 comments

  1. Deployed Soldiers. by puterg33k · · Score: 5, Insightful

    For us it's nearly our only way to speak to our loved ones at home. I'm just glad it's back up...

  2. Blogspam by ralf1 · · Score: 5, Informative

    Not sure why you didn't link to the actual article on Skype, http://blogs.skype.com/en/2010/12/cio_update.html, instead of the blogspam site.

    --
    "Would you, could you, with a goat?" Dr Seuss
    1. Re:Blogspam by Jurily · · Score: 4, Insightful

      But how else will aabelro promote his own site on Slashdot?! It's just good business sense.

      And people wonder why we don't RTFA.

    2. Re:Blogspam by Monkeedude1212 · · Score: 5, Funny

      We didn't want to Slashdot Skype and cause any more issues.

    3. Re:Blogspam by John+Hasler · · Score: 4, Insightful

      My workplace is so backwards they still use old-fashioned telephone lines rather than internet phones.

      And consequently you had reliable service while all the "modern, forward thinking" Skype users were down.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  3. December 22th? by colinRTM · · Score: 5, Funny

    Seriously?

  4. you are kidding me by alphatel · · Score: 5, Interesting

    If you are a node-based company worth several billion, charge for services, and don't even run enough of your own supernodes and monitor them in such a way that they cannot handle an outage effectively, you need serious help.

    --
    When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
    1. Re:you are kidding me by TubeSteak · · Score: 5, Insightful

      If you are a node-based company worth several billion, charge for services, and don't even run enough of your own supernodes and monitor them in such a way that they cannot handle an outage effectively, you need serious help.

      No one expects 40% of a globally distributed network to crash at once. No one.
      FTFA:

      The initial crashes happened just before our usual daily peak-hour (1000 PST/1800 GMT), and very shortly after the initial crash, which resulted in traffic to the supernodes that was about 100 times what would normally be expected at that time of day.

      Not even a multi-billion dollar company would have a disaster plan that provisions 100x capacity as a hot/cold spare.
      Though I bet their new plan includes automatic spawning of nodes on EC2 or some other distributed CDN.

      --
      [Fuck Beta]
      o0t!
  5. lesson (hopefully) learned... by smash · · Score: 4, Insightful

    ... relying on dodgy peer to peer VOIP telephony for business purposes is retarded.

    We've got people at work bitching about how it doesn't work from time to time, and about why I've blocked its ability to do voice/video at the firewall. If you want VOIP, use something that uses standard SIP or some other documented, configurable traffic.

    --
    I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    1. Re:lesson (hopefully) learned... by commodore64_love · · Score: 5, Interesting

      Ahh so YOU'RE the one blocking my skype. ;-)
      I don't understand why Net Admins (such as yourself) block useful tools like Skype. Or streaming radio. I don't see any harm in letting those things into the office space, and it provides a more pleasant working environment (to distract from the boredom of sitting at a desk all day).

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    2. Re:lesson (hopefully) learned... by smash · · Score: 5, Informative

      Why do I block skype? Because the only way to have it work properly through most firewalls is to allow ALL outgoing ports. Which means you allow any random program to do any random shit through your firewall to the outside network. It's a massive, massive security hole you could drive an oil tanker through.

      Also, many companies pay for bandwidth. I don't want all of my bandwidth chewed up on video calls instead of mission critical apps.

      It's not just because we're nazis, it's because the skype protocol is completely fucked when it comes to the ability of your admin to control resources. Want voip/video? Use something else.

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    3. Re:lesson (hopefully) learned... by smash · · Score: 5, Insightful

      Just let me clarify: corporate networks are different from your home network. Your home network? Fine, use skype. In the office, where you've got several hundred PCs that may or may not have malicious software and/or malicious users at the helm, allowing all outgoing connections is just begging for trouble.

      Egress filtering is a good thing.
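A minimal sketch of the default-deny idea behind egress filtering, assuming a policy keyed on destination port only. The port list and the function name are hypothetical; real firewalls also match on protocol, destination, and often the application itself.

```python
# Hypothetical outbound policy: DNS, HTTP, HTTPS. Everything else is dropped.
ALLOWED_EGRESS_PORTS = frozenset({53, 80, 443})


def egress_allowed(dest_port: int, allowed=ALLOWED_EGRESS_PORTS) -> bool:
    """Default-deny: an outbound connection passes only if its port is listed."""
    return dest_port in allowed


if __name__ == "__main__":
    # The second and third ports are arbitrary examples of unlisted traffic.
    for port in (443, 6881, 40000):
        verdict = "allow" if egress_allowed(port) else "drop"
        print(port, verdict)
```

A client that needs arbitrary outgoing ports cannot work under a policy like this without opening the list up for every other program too, which is the objection raised above.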

      Making your day at work "less boring" by enabling you to do non-work related shit with company resources is not what my job is about. It is about ensuring the continued operation of the company's network - and skype is a liability.

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
    4. Re:lesson (hopefully) learned... by smash · · Score: 4, Informative
      1. Because skype wasn't written that way. You want standard voice/video, use a SIP program. Skype was written deliberately by the developers to allow it to talk to anywhere and everywhere through your network so it can route other people's calls, and connect to random other nodes for your own call routing. That free lunch you're eating? Paid for by others' use of your bandwidth.
      2. Multiply 500 users by 48 kbit/s: that's 24 megabit of streaming audio, which you can get off that fucking $10 FM radio on your desk. Now I'm not sure how expensive bandwidth is where you are, but a 24 meg business-grade METERED (say, 300 gigs) internet connection here is about 5-10 grand a month. The business is not going to wear the cost of 5-10k per month for our users to listen to shitty-quality streaming MP3. That's before you take into account the increased latency to mission-critical apps, or remote end points on crappy satellite connections paying anywhere up to $7 per MEG of data.
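The arithmetic in point 2 can be checked with a short sketch. The 500 users, 48 kbit/s, and 300 gig cap come from the comment; the function names, the decimal units (1 Mbit = 1000 kbit, 1 GB = 8000 Mbit), and the assumption that every user streams at the same time are mine.

```python
def aggregate_mbit(users: int, kbit_per_stream: float) -> float:
    """Total streaming bandwidth in Mbit/s if every user streams at once."""
    return users * kbit_per_stream / 1000  # decimal units: 1 Mbit = 1000 kbit


def hours_to_exhaust_cap(cap_gb: float, mbit_per_s: float) -> float:
    """Hours of sustained traffic at mbit_per_s before a metered cap is used up."""
    cap_mbit = cap_gb * 8 * 1000  # GB -> gigabit -> megabit, decimal units
    return cap_mbit / mbit_per_s / 3600


if __name__ == "__main__":
    total = aggregate_mbit(500, 48)
    print(f"aggregate: {total} Mbit/s")
    print(f"300 GB cap lasts about {hours_to_exhaust_cap(300, total):.1f} hours")
```

Under those assumptions a 300 GB allowance would be gone in a little over a day of continuous streaming, which is why the metering matters as much as the link speed.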
      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
  6. Obvious problem.... by dstar · · Score: 4, Interesting

    Hmm. Seems to me their biggest problem is that they allowed clients with a known bug to become supernodes; if 50% of the network had upgraded, they should only have been creating supernodes from the upgraded clients.

    And in hindsight (I don't know that they should be blamed for not considering this before), the number of supernodes should probably be ~100-150% more than needed to service expected load. That way, if a third of them die, they _still_ have more than needed to handle the expected load. (And thus, hopefully, more than needed to handle the excessive load without causing them to shut down).
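The headroom argument above can be worked through as a small sketch. It assumes all supernodes have equal capacity and that load stays at the expected level after the failure; the function name is hypothetical.

```python
def surviving_ratio(overprovision: float, fraction_lost: float) -> float:
    """Remaining capacity as a multiple of expected load.

    overprovision: extra capacity beyond expected load (1.0 means 100% more).
    fraction_lost: share of supernodes that fail at once.
    """
    return (1.0 + overprovision) * (1.0 - fraction_lost)


if __name__ == "__main__":
    for extra in (1.0, 1.5):
        ratio = surviving_ratio(extra, 1 / 3)
        print(f"{extra:.0%} extra, one third lost: {ratio:.2f}x expected load")
```

With 100% extra capacity, losing a third of the nodes still leaves about 1.33 times the expected load covered. The sketch says nothing about a load spike on the survivors, which is the harder part of the problem.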

  7. I don't understand this. by commodore64_love · · Score: 4, Interesting

    "At its core, Skype relies on a third generation P2P network that has lots of peer nodes and a number of supernodes, one for several hundreds of nodes. Since Skype does not have a centralized directory to support finding routes between two or more nodes that want to communicate, the virtual network uses supernodes as directories. When a client enters Skype, it registers itself with a supernode, giving its IP address so it can be found by other clients who might want to establish a communication."

    Skype is a peer-to-peer network? Like torrent? So the supernode is like a tracker website, to connect peers to one another? No supernode==no tracker==no calls going through. Hmmmm. Maybe they should try DHT.
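The register-and-lookup role described in the quoted passage can be sketched as a toy directory. This is an illustration only: Skype's protocol is proprietary, and the class name `Supernode`, its methods, and the sample address are all hypothetical.

```python
from typing import Dict, Optional


class Supernode:
    """Toy directory: maps a user name to the IP address it registered with."""

    def __init__(self) -> None:
        self._directory: Dict[str, str] = {}

    def register(self, user: str, ip: str) -> None:
        # A client announces its current address when it comes online.
        self._directory[user] = ip

    def lookup(self, user: str) -> Optional[str]:
        # Another client asks where to reach `user`; None means not registered.
        return self._directory.get(user)


if __name__ == "__main__":
    node = Supernode()
    node.register("alice", "203.0.113.7")  # address from a documentation range
    print(node.lookup("alice"))
    print(node.lookup("bob"))
```

If the directory a client registered with disappears, lookups for that client fail until it registers somewhere else, which matches the "no supernode, no calls" reading above.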

    --
    "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
  8. TL;DR version: by The+MAZZTer · · Score: 4, Interesting

    Lots of users were using an old outdated buggy version of Skype, lots of client crashes at once bringing down big chunks of the P2P network, remaining network couldn't handle the load and went down too, took a while for Skype to put its own supernodes up to help get the network self-sustaining again.

    They're considering an auto-update feature now since such a feature could have kept this from happening. Personally I think old versions should be blocked from making or receiving calls too, so users would be encouraged to update (works for Team Fortress 2). Of course auto updates would make updating super easy anyway so impact from that would be minimal.

  9. Never makes sense to upgrade working software... by syousef · · Score: 5, Interesting

    ...unless you need something in the newer version (feature, security update etc.). Of course us geeks like to have the latest to fiddle with, but for the average Joe end-user, if it ain't broke, don't fix it. There is always the risk that the newer software will contain new bugs. At one point the buggy version of the Skype software was the latest version and was what users were being pushed to upgrade to. If the crash had happened then, I wonder if they'd find a new way to scapegoat users.

    By the way, new versions breaking existing functionality isn't theoretical, or rare. I'm currently installing software on my new laptop. I've had to downgrade both Zonealarm and Virtualbox. The former broke remote desktop. The latter broke file sharing. No idea why, but in each case uninstalling and installing an older version I knew worked fixed the issue for me.

    --
    These posts express my own personal views, not those of my employer
  10. Supernode Software by varmittang · · Score: 4, Interesting

    How about they release some supernode-only software that people can set up on a server, and possibly the ability to set up Skype to use a preferred supernode. So a business can set up a supernode of its own and point its users to it. But that supernode is also part of the collective of supernodes and routes Skype connections for everyone else too. This would hopefully give Skype more supernodes out there that are 24/7 and not desktop computers routing the traffic.

    --
    -----BEGIN PGP SIGNATURE-----
    12345
    -----END PGP SIGNATURE-----
  11. Article Summary [sarcastic] by Ukab+the+Great · · Score: 4, Funny

    "We expected a Limewire topology to be as reliable as a Phone companyi topology and oddly enough that bit us in the ass."

  12. Skype Win 5.0 client sucks by scorp1us · · Score: 4, Interesting

    The QA of this release is way down. On top of that, skype auto-updated people from 4.0 to 5.0. Within a few days, the buggy 5.0 had enough penetration (50%) to bring them down.

    The windows client has widely been reported to:
    consume 2x as much CPU (33% to 60% on mine after upgrade)
    leak RAM (starts out ok but after some use over 1.5gig needed)
    the GUI is slow, so the fade effects on some computers (mine) cause video tearing. It is no longer possible to run full-screen. (320x240 is all I get before tearing sets in)
    The fonts in the video area don't render correctly.
    It should be noted that I have an AMD X2 1.6 and Radeon 1200 card in this computer. It's not shabby. But the 5.0 client brought it to its knees.

    It plays SCII just fine (albeit on the lowest setting).

    It comes at a bad time when they are trying for more corporate agreements, but can't run on my 3-year-old hardware.

    I uninstalled 5.0 and installed 4.0 and it's back to normal.

    --
    Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
  13. Public Post-Mortem by Enderandrew · · Score: 4, Insightful

    You can bitch they didn't QA the release. You can bitch that you don't like a P2P topology. But it is nice to see a public post-mortem.

    --
    http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
  14. Forced auto updates are not the solution. by mario_grgic · · Score: 4, Interesting

    I hate when apps run auto update daemons. This is precisely the reason why I don't use any Google desktop software on my computers.

    The proper thing to do in this case is simply to refuse logins from outdated clients, with a message that users need to upgrade if they want to continue to use the app. That is a simple thing to do, rather than each app running a daemon. Soon enough there will be a hundred update daemons on each user's computer, eating resources, connecting online all the time and bogging down the user experience. Thanks but no thanks. I refuse to use any of those.
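A minimal sketch of that login-time check, assuming dotted numeric version strings. The cutoff value, the function names, and the message text are hypothetical and not taken from any real Skype release.

```python
from typing import Tuple

MINIMUM_VERSION = (2, 0, 0)  # hypothetical cutoff, not a real Skype version


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.9.3' into (1, 9, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_login(client_version: str) -> Tuple[bool, str]:
    """Return (allowed, message) for a login attempt from this client version."""
    if parse_version(client_version) < MINIMUM_VERSION:
        return False, "Please upgrade your client to continue using the service."
    return True, "Login accepted."


if __name__ == "__main__":
    print(check_login("1.9.3"))
    print(check_login("2.1.0"))
```

Comparing tuples of integers rather than raw strings avoids ranking "1.10" below "1.9". The check runs on the server side at login, so nothing extra has to run on the user's machine.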

    --
    As the island of our knowledge grows, so does the shore of our ignorance.