MS, CNET On 7-Day Messenger Outage
imipak writes: "Microsoft have finally commented on the recent seven-day outage at their Messenger IM service -- some users have permanently lost data, and there's still no explanation of the cause. Interesting earlier story from CNet News. Key quote: '... an outage that lasts seven days with no valid explanation really starts to make you think about .Net, and about Microsoft's plans for the Internet. What if this were the new Office software verification service that was down?'" Here's a story on MSNBC as well.
No need to fear, by the time .NET is up all of Microsoft's servers will be running FreeBSD.
So, there ARE legitimate, work-related uses for instant messenger software. =)
---
I realize I'm just a lowly mathematician and all, but doesn't this seem reasonable, even for people that design real-life applications?
Curmudgeon Gamer: Not happy
"...lasts seven days with no valid explanation really starts to make you think about .Net"
At least it is not my family's 7 years of financial data, or the copies of my child's baby pictures, or my presentation that I needed for a job interview. We don't have to tell MS that distributed resources increase fault tolerance. When you devise a massive system with a single point of failure (M$.Net), you are going to burn - and burn big-time. If .NET services were distributed to many 'equal' computers (think of the Internet as it is structured today), then we could withstand the loss of one machine; in the M$ vision of the future, many-many-many services and machines rely on their .NET systems. Imagine if TCP/IP had to 'ping' authorize.big-toll-gate.com's 'license' server in order to start - now imagine they go down....
This may not be a surprise to anyone on /., but what happens when Passport crashes for a week and no one is able to pay bills, or the Office.NET file storage site burns down and takes millions of people's family photos with it (yes, I know about off-site backups)?
The point is simple - you cannot build a reliable system with such a glaring single point of failure. Downtime happens, and as this MSMessenger event shows us, .NET has serious potential for peril. I hope all the PHBs and DoJ are paying attention...
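To put rough numbers on the single-point-of-failure argument, here is a minimal Python sketch. It assumes each machine or service is independently up with some probability; the 99% figure and the function names are illustrative assumptions, not anything Microsoft has published. A chain of required services is up only when every link is up, while a pool of interchangeable replicas is down only when all of them are down.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain where every component must be up.

    Assumes independent failures: the chain is up only if all links are up.
    """
    return prod(availabilities)


def redundant_availability(a, n):
    """Availability of n interchangeable replicas, each up with probability a.

    Assumes independent failures: the pool is down only if every replica is down.
    """
    return 1 - (1 - a) ** n


if __name__ == "__main__":
    a = 0.99  # hypothetical per-component availability
    # A local program that must also reach one remote 'license' server.
    single = serial_availability([a, a])
    # The same remote service spread over three 'equal' replicas.
    replicated = serial_availability([a, redundant_availability(a, 3)])
    print(f"local + one remote server: {single:.4f}")
    print(f"local + three replicas:    {replicated:.6f}")
```

The design point is the one the comment makes: every mandatory remote dependency multiplies in another factor below 1, and only spreading that dependency across independent replicas pushes its factor back toward 1.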
I think the real cause was something like...
Error:
MsgrSvr.exe caused an invalid page fault in module KERNEL32.DLL at 015f:bff9dba7.
--
All opinions presented here aren't mine.
This crowd? Nah - we all wrote scripts that sent us email alerts to our cellphones when slashdot came back up and we could finally find 'CowboyNeal' somewhere in the HTML source :)
Top Most Bizarre/Disturbing Error Messages
No, it is virtually always used for leisure: pretending to do work whilst actually swapping sweet-little-nothings with Jane in accounts, or arranging a Q3 duel with DukeQuakem. (If someone actually has an important, legit reason for using a messenger service, please correct me...)
Basically, if you can't use MSN Messenger, you can use email, or pick up the phone. I'm sure that when MSN Messenger breaks down, it's not at the top of MS's list of priorities.
Perhaps, er, they had better things to do? Or perhaps it got lost at the bottom of someone's in-tray?
However, it probably wasn't a good idea for MS to leave it so long. So many bloody people use it that it does send out a helluva lot of bad publicity (I'm not going to get that date with Jane this weekend and it is ALL Micro$oft's fault!! Bah!). However, I think if an important component of .NET were to fail, and adversely affect many critical services, MS might react a little quicker, with greater resources & assurance.
See this report from The Register for the grisly details.
I suppose you could say this is because VeriSign and Network Solutions are insane, deranged companies, and there is most likely truth to this. But I'm not convinced; I HAVE TO deal with these idiots for my domain names, and now I have to rely on .NET to do it. Ick.
D
----
Ya know, there are two possible causes, judging from the minute information they've released (it was caused by a freak failure when a hard disk controller crashed).
1) Caused by a freak failure when a hard disk controller crashed.
2) They've said they have to restore from backups.
If both are true, then it sounds like they were using a distributed database (or filesystem?) and one machine going down very badly managed to infect lots of others... That doesn't bode well, especially when MS's solution to competing in the server environment is traditionally to cluster lots of machines together. The more machines you have, the more chance that one may have problems.
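The "more machines, more chance of trouble" point can be made concrete with a short Python sketch. It assumes, hypothetically, that each machine fails independently with probability p over some period, and that the cluster is tightly coupled, so one bad failure can spread to the shared data (as suspected above). Under that assumption the whole cluster behaves like a single chain, and its chance of trouble is 1 - (1 - p)^n. The 0.1% figure is an illustrative assumption, not a reported number.

```python
def p_any_failure(p, n):
    """Probability that at least one of n machines fails.

    Assumes each machine fails independently with probability p. In a tightly
    coupled cluster where one failure can spread to the rest, this is also the
    probability that the whole cluster is affected.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-machine failure probability
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} coupled machines: {p_any_failure(p, n):.3f}")
```

With the assumed 0.1% per-machine figure, going from 10 to 1,000 coupled machines raises the chance of a cluster-wide problem from about 1% to about 63%. Isolating failures, so that one machine cannot take the others down with it, is what turns extra machines back into extra fault tolerance.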
If the first statement is false, then the only thing I can think of is that the system was infected by either an outside source or some other malicious virus. Standard Operating Procedure in this case would be to disconnect the machines, diagnose the problem (so new machines wouldn't be infected), and then restore from backup. It's also possible someone over-reacted and they went into this mode when in actuality item 1 was true.
Anybody else think we're hearing the whole story?
This space for rent. All reasonable inquiries will be entertained at proprietor's discretion.
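I'm always reminded, when these kinds of outages happen, of Peter Deutsch's 8 Fallacies of Distributed Computing:
1) The network is reliable.
2) Latency is zero.
3) Bandwidth is infinite.
4) The network is secure.
5) Topology doesn't change.
6) There is one administrator.
7) Transport cost is zero.
8) The network is homogeneous.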
This is, of course, why the idea of remote authentication being necessary to use your word processor is a bad thing. Heck, even losing something as innocuous as an instant messaging program brought thousands of people to a screeching halt for a week. It seems to me that Microsoft (although they're certainly not the only ones) believes these 8 fallacies blindly, especially 1, 4, and (they're hoping) 6.
Right...
I am utterly amazed at times the things I hear about how system administration is performed at MS. Ever check their jobs page? They're really picky about who they hire, you know.
Yet we repeatedly hear about security problems with their own servers, how all their DNS servers were on the same network segment, how Hotmail went down, and now this? Lost data??!!!
I'm sorry, but as a former full-time sysadmin, there is absolutely no excuse for losing data. Preserving your company's data is the #1 priority of any sysadmin, regardless of the company. And preserving data with 100% certainty is achievable by anyone who takes the time to set things up right.
Oh well, I was never a fan of their passport/hailstorm idea anyway. Things like this can only cause more people to run away from using those services.
No, Thursday's out. How about never - is never good for you?
Bet they know how I feel at work every day now...
-- Geof F. Morris
Think about Hobbes' social contract.
'People give up certain rights and freedoms for a feeling of safety etc.'
This is the same sort of situation, kinda. People give up having their own servers for communications and data storage in technologies like .NET. It is the company's responsibility to give us fair service, and tell us what's going on.
If we do not like what's going on, it is our right and responsibility to seek alternatives.
You're always going to risk loss of data and loss of service if you let someone else handle your data, communications, authorization, etc. It's a risk that you take. You hope that the company is able to do a good job and maintain good service. Remember, if you start using .NET and using all of the authorization features to access Microsoft's sites that require Passport/Messenger, just like in Hobbes' social contract you are giving up some rights and some control. You're taking a risk. But remember, there are other choices.
[Something witty and intelligent should have appeared here.]
{Traicovn}
As a (curious) sysadmin, I wouldn't mind reading a post-mortem like the one the /. crew did a few weeks ago. I think MS is missing out on a lot of brownie points by not publishing a blow-by-blow summary of how an enterprise goes about troubleshooting and fixing a system like that. It would be possible to do something like that w/o disclosing sensitive information. Like I said, wishful thinking.
BOSTON SUCKS!
Recent surveys show that employees that use Microsoft's popular Instant Messenger software are having one of the most productive weeks in recent years.
Now if only Slashdot would have a week-long outage, I could get some work done.
We sent out an instant message to all the users letting them know about the outage.
Remember when users couldn't get through because there were busy signals all the time?
Remember how people said that there was going to be a mass exodus from AOL?
Remember how that didn't happen?
No matter how badly MS screws this incident up, no matter how many judgements get made against them, the average business drone and Joe User will still end up using .NET.
"Enough of this wretched, whining monkey life." -- Marcus Aurelius, _Meditations_, Book 9, 37