Skype Outage Hits Users Worldwide
Hugh Pickens writes "The LA Times reports that millions of Skype phone users worldwide couldn't make calls or were dropped in mid-conversation because of a network connection failure that began about 9 AM Wednesday PST. 'For a communications system this large to go down, it's almost unheard of,' says Charles S. Golvin, a Forrester Research analyst. 'Usually when phone lines are disrupted, the blackout is confined to a specific geographical area. This is worldwide.' In theory, Skype, which is based on peer-to-peer networking technology, shouldn't see an outage, but that is not really the case — the company has a massive infrastructure that it uses for purposes such as authentication and linking to the traditional phone networks. 'The outage comes at a time when Skype is starting to ask larger corporations for their business,' writes Om Malik. 'If I am a big business, I would be extremely cautious about adopting Skype for business, especially in the light of this current outage.'"
Now that Skype sucks a bit more, maybe it's time for our company to look for open source alternatives...
It's still ongoing right now, albeit intermittently. I'm seeing drop-outs on the distribution of my skype status (despite my local 'net connection being fine).
"Little does he know, but there is no 'I' in 'Idiot'!"
Apparently this article was published too soon. Those year end reviews should include the last few weeks of the year before.
Communication is becoming more and more centralised. People use Facebook to send messages rather than email, Skype rather than direct VoIP calls, Twitter to keep people informed. Even email relies on central webservers. Gone are the days when a typical email would travel from your computer to the other person's directly, or at most via their local ISP.
Aside from being exactly what the internet is designed to avoid, it's also handing control to corporations that are
1) Too big for governments to influence
2) Too big to fail
I for one hope for more large-scale outages; hopefully they will stem the tide, but like Cnut, we can't stop the inevitable.
it took long enough for this storage to hit /.
was there and editorial outage?
Where's the news article on this "storage" that hit slashdot? Anyone hurt?
'The outage comes at a time when Skype is starting to ask larger corporations for their business,' writes Om Malik. 'If I am a big business, I would be extremely cautious about adopting Skype for business, especially in the light of this current outage.'
I can't help but wonder why people expect a company like Skype to provide perfect uptime, assuming just because they're an 'internet-company', when local providers can have similar troubles.
Sure, such an impact on a global scale is, err, not very prestigious. But there have been enough major outages at standard telcos in the past with similarly debilitating effects (unless you happen to be an international corporation of considerable size. Sure, a company like IBM should now think twice about putting their money on Skype alone, but such companies surely aren't quite the target customer group Skype is aiming at right now). So the above statement, to me, sounds more like someone wanting to say SOMETHING... It doesn't have much value in the real world, though.
Obviously there was "and" editorial outage that prevented the story from being covered.
This is your response to the Net Neutrality Bill? Very clever...
Question: do torrents still work, or did the bastards turn that off too?
How many more years will slashdot have an off-by-one error on your Score in your profile?
Oregon?
I can't help but wonder if one of the larger telcos introduced a worm into the system just to mess with Skype (and inexpensive internet telephony in general).
...'parallel to serial to parallel' and 'optical to electrical to optical' :)
Seriously though, as technology improves it often leads to 'old concepts' being re-examined and implemented in a new manner that is usually more effective than the initial parts.
While this is not always true, a lot of technology has sprung up like this, especially in the computer world.
Halfway through yesterday my Skype stopped working, just like everybody else's.
It then tried to reconnect, and out of the blue gave me a pop-up saying "Skypenames2.exe wants to use Skype" with the options "Allow access" or "Deny access."
This naturally set off a few alarm bells, but as it turns out it isn't malware or a virus, just a poorly named Skype component. It allows you to click telephone links in IE or a Mozilla-based browser and make direct phone calls using Skype. Personally I don't want or need that kind of integration, so I declined.
Just find the Verizon/ATT/Sprint/ executive with a smug face and a sheepish grin. He did it.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Funny how, just as they move into the commercial telecom demographic, they are mysteriously hit with something that makes their adoption doubtful... Really interesting. I wonder if there is any background on tracing the disruption; maybe it is linked to some sort of DDoS attack, but made by phones calling any SkypeIn number... overloading their systems on purpose to discredit their stability.
...but I still think that Admins, or so called 'IT-Specialists', which are suggesting Google Documents and Skype for serious business use should be moved to the cleaning staff (or at least as far away from the IT infrastructure as possible).
After convincing my boss and *his* boss about the benefits of Skype, yesterday was the day I was going to demo it to show how it works, benefits, video, etc.
Suffice to say, the demo did not go well.
It’s worth noting that our enterprise product, Skype Connect, is working normally
From: http://www.skype.com/content/skype/intl/en-us/StatusUpdate.html?cm_mmc=PXTW|0700_B6-_-downtime-20101222-2
the company has a massive infrastructure that it uses for purposes such as authentication
I've always been amazed by the large amount of time it takes to be authenticated from a Skype server, compared to connections to other providers - time that suggests there is something wrong with their infrastructure.
Slashdot, fix the reply notifications... You won't get away with it...
It was riding back in "steerage".
You are welcome on my lawn.
Skype video conferencing is Julian Assange's preferred interview medium; now that he's under house arrest, anyone wanting to interview him will have to call using the traditional phone network or use an alternate videoconference system. I think some of the news organizations were sending outside broadcast trucks to interview him.
Yes, this is on the left field of paranoid. But someone had to say it :-)
Basil?
You are welcome on my lawn.
My Ekiga account which uses the much more "open" and widely supported Session Initiation Protocol (SIP) is still up and running. By "widely supported" I mean that many more applications support it (while Skype is a proprietary form of VoIP), not that more people use it.
Time to laugh at all my friends who are now trying to use Skype! (Soon I'll be receiving messages through MSN - not IRC or GTalk - asking why Skype stopped working.)
http://gbl08ma.com
It took a while to get the word out because the editors use skype to communicate.
Check out my lame java blog at www.javachopshop.com
Basil?
Oregon = state
Oregano = seasoning
Hmmm, I wonder if this ties in with the fact that last night my computer started spewing tens of thousands of packets on port 443 (https). My guess is that it became a Spype supernode. Needless to say, the network admins were not very happy about this. I couldn't find a way to disable it in Ubuntu, so it's gonna be goodbye Skype for now, unless someone can suggest a solution.
Non-Linux Penguins ?
Whoosh
If some system relies on single (or a limited number of) provider(s), you can't really call it distributed. No matter how much of the system's costs is off-loaded to the end-user, there're still only a few key points which can bring the entire network down. Don't mistake cost off-loading systems (Skype) for truly distributed and therefore robust ones (SIP, XMPP).
Skype is blaming its peer-to-peer interconnection system for the problem. In an official blog post, the company said: 'Our engineers are creating new 'mega-supernodes' as fast as they can, which should gradually return things to normal.' http://www.itworld.com/networking/131617/skype-blames-service-outage-supernode-problem. And as of 8 a.m. Thursday, Skype said about 2/3 of users still can't log in. http://www.itworld.com/networking/131655/skype-says-two-thirds-users-still-cant-log
yes, but Basil is faulty
Slashdot is a news aggregator. They don't report news. They don't have reporters or journalists.
You have a 4 digit UID.. how do you not know this?
You are entitled to your own opinions, not your own facts.
The best circuit analogy I've seen to this switching between a distinct pair of alternatives is a delta-sigma analog-to-digital converter (or sigma-delta converter, depending on your dialect). This converter takes an analog signal input, but the output is only one of two values, 1 or 0. The long-term average of the output pulses is equal to the input analog voltage, but at any given instant the output is at one of the rails (1 or 0).
It's like saying that at any instant the US government is controlled by Democrats or Republicans, but the long-term average (representing the input to the system, i.e., the wishes of the people) is somewhere between these extremes. Or the old argument about whether a company should be organized around functions (having, e.g., an engineering department, a sales department, etc., each handling all products) or products (having, e.g., a Product A division, a Product B division, etc., each handling all functions). Each new CEO switches the company from one to the other, while the optimum is some unattainable blend of the two. (Don't mention matrix management.)
Interestingly, one of the most prized features of delta-sigma converters is that their noise is "shaped", that is, pushed to higher frequencies out of band, so it can be easily filtered. This greatly increases the performance attainable with a given technology. Every time I hear protest voices in democratic governments, or organizational griping by corporate salarymen, I always pause to wonder if I am listening to this feature of the converter, too. And whether I should filter it.
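For anyone who wants to see the averaging behaviour described above, here is a minimal sketch of a first-order delta-sigma modulator in Python. It is an idealised software model, not a description of any real converter chip; the function name `delta_sigma_bits`, the 0.5 comparator threshold and the [0, 1] input range are assumptions chosen for illustration.

```python
def delta_sigma_bits(samples):
    """Idealised first-order delta-sigma modulator (illustrative model only).

    samples: iterable of floats in [0, 1], standing in for the analog input.
    Returns a list of output bits, each either 0 or 1.
    """
    integrator = 0.0  # running sum of (input - previous output)
    feedback = 0.0    # previous output bit, fed back as 0.0 or 1.0
    bits = []
    for x in samples:
        integrator += x - feedback           # delta (difference), then sigma (sum)
        bit = 1 if integrator >= 0.5 else 0  # 1-bit quantiser: output sits at a rail
        bits.append(bit)
        feedback = float(bit)
    return bits


if __name__ == "__main__":
    bits = delta_sigma_bits([0.3] * 5000)
    print(sum(bits) / len(bits))  # long-term average of the bit stream
```

With a constant input of 0.3, every individual output is 0 or 1, but the average of the bit stream should settle close to 0.3, because the integrator keeps the accumulated difference between input and output bounded.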
Your spelling is fawlty
All the major telcos have switched or are switching to "soft switches", meaning they are doing away with their old hardware-based switches (EWSD, DMS100, etc.) and converting all their customers to VoIP, then trunking it back to their headquarters, where they have a software-based switch. This saves them a lot of money but also centralizes the switching system and can lead to huge outages. I've seen them happen, so large that nearly the entire customer base of a company is out of service. But customers are used to rare outages, and if all the phones in town go out once or twice a year people chalk it up as "normal." What they don't realize is that it wasn't just their town, it was hundreds of cities all over the country. Even regulatory authorities treat each city outage separately, so there's no real record of just how big the outages are.
Looks to me like a classic DoS against the "supernodes". Probably why they, according to Skype, started disappearing. In the Skype architecture, basically if you run an instance on a machine not behind a firewall or NAT, chances are that you are running a supernode and contributing to the Skype p2p network. Your IP is distributed across the network for reference.
I happen to have a machine that runs a supernode, and about 12 hours ago I had real trouble accessing the machine while Skype was consuming 99% of CPU cycles. Incidentally, the same machine has Apache listening on port 80 and SVN on 443. They were being flooded as well, because Skype commonly listens on those ports (not in my case, due to my setup). The Apache logs for the day were over 10GiB, containing the evidence. Apparently, Apache was taking the pounding much better, remaining responsive.
This seems to be a significant weakness in the Skype architecture, as they are relying on 3rd parties for their core infrastructure. Incidentally, this also makes easy targets of the guys that contribute to the network as supernodes.
A snippet from the Apache log:
[Thu Dec 23 13:52:50 2010] [error] [client *.*.*.*] (22)Invalid argument: Cannot map \xd0\x15X\xbf\xf9\x99J\x19\xb7;P(\xe2(\x98\xfe\xb8"\x07[N_^\xda\xb5\xe9\x8ef\xb0\xe4\x82\xaa\x9dMZ\x9d5G\x04\x8f\x11W\xf8d\x0c\x819\xb1\xc6\x81\xe9n\xc5\xd9 to file
[Thu Dec 23 13:52:50 2010] [error] [client *.*.*.*] (22)Invalid argument: Cannot map \xd0\x15X\xbf\xf9\x99J\x19\xb7;P(\xe2(\x98\xfe\xb8"\x07[N_^\xda\xb5\xe9\x8ef\xb0\xe4\x82\xaa\x9dMZ\x9d5G\x04\x8f\x11W\xf8d\x0c\x819\xb1\xc6\x81\xe9n\xc5\xd9 to file
[Thu Dec 23 13:52:50 2010] [error] [client *.*.*.*] Invalid URI in request \xd0\x15X\xbf\xf9\x99J\x19\xb7;P(\xe2(\x98\xfe\xb8"\x07[N_^\xda\xb5\xe9\x8ef\xb0\xe4\x82\xaa\x9dMZ\x9d5G\x04\x8f\x11W\xf8d\x0c\x819\xb1\xc6\x81\xe9n\xc5\xd9
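If you want to check your own logs for the same pattern, something like the following rough Python sketch might do. It assumes error-log lines shaped like the three above (timestamp, `[error]`, `[client ...]`, message) and treats any message containing literal `\x` escapes as non-HTTP binary traffic. The function name `count_binary_requests` and that heuristic are my own assumptions, not anything Apache or Skype provides.

```python
import re
from collections import Counter

# Matches lines shaped like the snippet above:
# [timestamp] [error] [client address] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[error\] \[client (?P<client>[^\]]+)\] (?P<msg>.*)$"
)


def count_binary_requests(lines):
    """Count error-log lines per timestamp whose message contains \\x escapes.

    Heuristic (an assumption): Apache logs unprintable request bytes as
    literal backslash-x sequences, so their presence suggests a non-HTTP
    client was talking to the HTTP port.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not an error line in the expected shape
        if "\\x" in match.group("msg"):
            counts[match.group("ts")] += 1
    return counts
```

Run it over an open log file, e.g. `count_binary_requests(open(path, errors="replace"))` with `path` pointing at your own error log, and look at which seconds have the highest counts.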
Slashdot gets "scooped" like this quite frequently, but that's not really the point. It's meant to be a forum where, in an ideal situation, users can discuss and expand their insight into such news. Sure, a lot of material of tangential, marginal or no relevance does come up, but that's part of what makes the openness of Slashdot so good. On a good day, anyway.
In my case, the first "contact" was being unable to login to Skype, then finding a newspaper article about the outage, which saved me the trouble of investigating whether the problem was anything I could fix. No biggie.
What I'd rather see on Slashdot is an analysis of the problem that leads to solutions.
In that direction - any recommended alternatives? I see plenty of person-to-person VoIP solutions, but none that work as well for 5-10 person conference calls and run on both Windows and Linux. Anyone know of any?
100% accurate
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
I don't use Facebook.
Skype appears to be back online here in Minneapolis, MN.
1. the supernode requirements suggest that most skype calls use some kind of NAT helper that proxies the call between two or more people. The 'brain' of the NAT helper (aka supernode) is centralized. There are very likely lots of conventional ways to halt supernode service if one spent the time to analyze supernode packets.
2. the fact that 2/3 of users can't log in is an authentication problem, not a 'calling' problem. The auth system has to be centralized.
It looks like ebay Engineering is going to be busy over the Christmas holiday!
Que?
I upgraded my Skype shortly before the outage at 9 AM PST, had a conference call, and then hung up. I guess when I hung up I shouldn't have hit the button which said "reboot all Supernodes"....
His point is that since people submit stories before a newspaper even goes to print, there is no cause for it being so fucking late.
If you ignore ACs because they are anonymous - you're an idiot.
they are just more redundant.
assume Skype as a VoIP has multiple VoIP switches, which you really can't... some of the really big outfits used to run VoIP on a single switch for the whole nation. and if there's only a single switch, it's a single point of failure.
all the calls have to integrate into the mainstream telcos to interconnect at trunking points. if you have one, bingo. if you have multiple ones, and they run on the same physical backbone, bingo.
in order to make the interconnections to any other telco carrier, you have to have a signalling channel using SS7 protocol for billing, accounting, and charge-back stuff. the SS7 server goes down, bingo. the signalling channel goes down, bingo. and just because you are using a multiply-double-redundant system like a Stratus for SS7, be advised once in a rare while, a hardware failure can drop the whole box. a software glitch can drop the processes. I've had 'em under my wing, I know.
multiple single points of failure. if you're running something as cheaply as possible in a trial or in production, you also have less support folks and maybe don't have 24/7/365 vendor support. yet more SPOFs.
just because there's IP in there someplace, doesn't mean it's bulletproof.
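To put rough numbers on the "multiple single points of failure" argument above: if every component in a chain has to be up for a call to go through, the availabilities multiply, while redundant copies only help if they fail independently. A small Python sketch follows; the 99.9% and 99% figures in the usage note are made-up illustrations, not measurements from Skype or any telco, and the function names are hypothetical.

```python
from math import prod


def series_availability(parts):
    """Availability of a chain where every part must be up (each is a SPOF)."""
    return prod(parts)


def parallel_availability(parts):
    """Availability of redundant parts where any single one being up is enough.

    Assumes the parts fail independently, which does not hold if, for example,
    they all run on the same physical backbone.
    """
    return 1 - prod(1 - a for a in parts)


if __name__ == "__main__":
    # Hypothetical chain: VoIP switch, trunking point, SS7 signalling box.
    print(series_availability([0.999, 0.999, 0.999]))
    # Hypothetical pair of independent, redundant switches.
    print(parallel_availability([0.99, 0.99]))
```

Three 99.9% components in series come out a little below 99.9% overall, and every extra box in the chain pulls it down further. Two independent 99% switches in parallel come out near 99.99%, but only under the independence assumption.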
if this is supposed to be a new economy, how come they still want my old fashioned money?
What no one seems to mention is WHY did this happen now, just after the release of Skype 5.0.
Skype mentions some vague problem with "Some versions" of Skype, but clearly all versions of Skype have been running a long time with no such problems except for the December 14 release of Skype 5.
I suspect some major pooch-screwing, one that messed up key features of their peer-to-peer routing technology, requiring them to rush the so-called supernodes into existence rather than relying on the peering.
Like a downturn in the economy, networks that depend on peering crumble fast as people exit looking for other solutions.
Sig Battery depleted. Reverting to safe mode.
If I am a big business, I would be extremely cautious about adopting Skype for business, especially in the light of this current outage
Really? Based on a single outage? C'mon now, do you base a decision on whether to use a service on a single outage? How often does that happen compared to any other utility (power, internet, land lines) going down periodically - when was the last major Skype outage, if ever? I can't seem to find much, just a reference to a login issue back in August 2007. That's a pretty good track record for not crashing if that's the case - wish I could say the same about the local electricity or high speed internet.
There was no discussion at my office today about switching the plan because it was down for a day. Skype is still considerably cheaper so it's still the first choice for long distance or conference calls. Most offices still have phones and land lines for each person for internal communication - just use those as a backup should it go down.
Global warming and other natural disasters are a direct effect of the shrinking number of pirates - Gospel of the FSM
I've been using skype for a couple years at work and this is the first outage I've seen from them. Compare this to the competing platforms and I'd say businesses would smartly adopt skype should they not want an in-house solution.
The reason for this is that /. is a digest. A 4digiter should've figured that out by now. It takes a while 'til stories get submitted, firehosed, recognized... and usually /. is not really a primary source of info, in other words, we're prone to read it here after some news outlet reported it so we can link to it.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
"We cannot let terrorists do what we can do better!"
You have a 4 digit UID.. how do you not know this?
Senility?
People will pass up steak once a week, for crap every day.
The question would be - what took down the supernodes? Skype uses a number of them, not all owned by Skype. Kill enough of them and the remaining supernodes have so much traffic hitting them that they go offline too. It's a cascade problem, and one Skype is apparently trying to fix by bringing up more of their own nodes to bolster the network while public nodes rebuild. This still raises the question - why did they begin to fail in the first place?
A couple of things come to mind...
1) Code has been released that supposedly reveals the underlying crypto that Skype uses for their traffic management. Did someone use this to somehow crash nodes? If so, how? I came home to find my Skype client toes up with an app crash and it's NOT the newer 5.x code that's been recently released so WTF? My Skype client NEVER crashes and is behind a NAT so this would seem to indicate someone was using their protocols maybe? Anyone know anything? Anyone else have a client crash like this?
2) Was there a zero day exploit in Skype code that allowed for a DOS and was then used to somehow kill supernodes and maybe other nodes directly? Skype admits to some sort of a bug but gives no details!
3) Skype uses some sort of centralized authentication server or service; blocking Skype on a network is as easy as blocking access to that service. Did someone attack that? Did it go down on its own? I don't think this would have killed off online clients though - how often does Skype authenticate?
Skype needs to give some answers. If this is a bug in deployed code we need to know if it exposes machines to exploits and they need to patch ASAP. The answers they have given so far haven't said much of anything - why?
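The cascade described at the top of this comment (lose enough supernodes and the survivors get overloaded and drop out too) can be sketched as a toy model in Python. This is an illustration of the general idea only, not Skype's actual routing. The equal load sharing, the per-node capacities and the name `simulate_cascade` are all assumptions.

```python
def simulate_cascade(capacities, total_load):
    """Toy overload cascade (illustrative model, not Skype's real behaviour).

    capacities: how much load each surviving node can carry.
    total_load: traffic that is shared equally among the surviving nodes.
    Returns (surviving_capacities, rounds_of_failure).
    """
    alive = list(capacities)
    rounds = 0
    while alive:
        share = total_load / len(alive)               # equal split across survivors
        survivors = [c for c in alive if c >= share]  # overloaded nodes drop out
        if len(survivors) == len(alive):
            break                                     # everyone copes: stable
        alive = survivors
        rounds += 1
    return alive, rounds


if __name__ == "__main__":
    # Five hypothetical nodes carry a load of 50 without trouble.
    print(simulate_cascade([10, 12, 15, 20, 30], 50))
    # Knock out the biggest node and the same load takes down the rest.
    print(simulate_cascade([10, 12, 15, 20], 50))
```

In this model the first network is stable, but removing just the largest node pushes the per-node share above what the two smallest can carry, and their failure then overloads the remainder. Adding capacity back, which is roughly what bringing up extra supernodes amounts to, is what stops the loop.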
Build it, Drive it, Improve it! Hybridz.org
I also had Skype actually crashing, which seems really weird if it happened to everyone at the same time. It could also be the reason why the network started going down.
I doubt there is blockage at the network level anywhere.
None of my clients, (Windows, Linux, iPhone, Android) crashed, so your crashes may have been just an anomaly, or a side effect.
The alleged crypto code release was some time ago, not anything recent, and the last I heard was that it simply allowed eavesdropping on calls and text, not failures of supernodes.
I suspect it is the new 5.0 version (or maybe some of those shenanigan versions Skype released in cahoots with Verizon) that is essentially taking down supernodes, or making them unreachable, and causing the system-wide outage.
As for patches ASAP, don't forget this is Skype. The only way to get them to talk to or be responsive to their users/customers seems to be to take down their entire network. They have a video up on their site with the CEO being all apologetic, without revealing anything at all.
The crypto release was a few months ago - it had ZERO to do with the AES crypto used for calls and texts and only had anything to do with the traffic management. The crypto for calls and texts is solid.
5.0, though, is pretty suspect; however, it's pretty recent and I wouldn't think it would have penetrated the market fast enough to cause a cascade effect like this before the bugs were noticed. I guess maybe we'll see if Skype ever tells anyone anything or they quickly release new code. As it stands now I'm running fine on older code and the network seems okay...
Were all of your clients up and running when this occurred? I've now spoken to two people who had theirs crash but that's far from a smoking gun.
But you started it!
Nick Waterman, Sr Tech Director, #include <stddisclaimer>
By way of a postscript (early morning, Boxing Day) I might mention that reliability cuts many ways.
Like everyone else, I had a few difficulties with Skype while this was going on. But during the round of long-distance calls I had to make over the evening of Christmas Day, the battery in my SIP cordless handset died in the middle of a call, despite being new and supposedly fully charged, in a handset with a 100% fault-free record. So back to Skype, which I am happy to say worked like a champ.
So I guess the moral of the story is not to get obsessive over uptime; just keep a few alternative workarounds on hand so that unforeseen failures can occur without causing disruption.