Microsoft Azure Failure: SSL Certificates Were Updated... Sort Of
judgecorp writes "Microsoft has published an explanation of the failure of Windows Azure earlier this month. Users of the Azure storage saw that an SSL certificate had expired. Microsoft's explanation says that the certificate had in fact been renewed, but an update with the new certificate details was not prioritized, and hadn't actually been implemented till after the old certificate expired. There are more interesting details, but Microsoft says better alerts and more automation will stop this particular fault happening again."
That is the BIG expectation.
Very few like it and buy it. The new bruiser interface has ruined it.
Windows phones? less than 2% of the market share. Could you have a worse model for an interface?
This may also help. If they now have a process to prevent exactly this problem, that is nice, but does nothing to prevent other problems that would be obvious to anybody halfway competent. This is a systematic problem, and the certificate was just one instance.
Of course, Microsoft has time and again demonstrated that they do not have what it takes, and safe for that one historic accident they would never have been of any importance.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Unless I'm horribly mistaken, they've let certificates expire before. Why would I think they won't let it happen again?
It definitely won't happen again, instead the team responsible for keeping the automation software running will fail. Or an automatic upgrade to Windows will break it, or the libraries needed to run it will have been deprecated.
So yeh, it won't happen again, the next time it will be something else to blame.
Never, of course, a management that chops up roles into such small increments, disempowering its workforce so much that the simple job of updating a certificate becomes a major obstacle each and every time it happens. No, never a load of BS managers, no sir!
... managers saying "we need to get this up and running sooner ... automating it reliably is hard to do ... just get it working and update things manually for now and we will automate it later". When later comes, everyone is working on something else.
now we need to go OSS in diesel cars
re: Pretty sure the last one was a bug that was something to do with the cert expiring on a leap-day though. [emphasis mine]
.
\begin{sarcasm} Well, if it was a leap-day event, well that's totally excusable because there's no predictable way to know that a particular year might be a leap-year with a leap-day in it, and even if there were, my goodness, you'd need some sort of computational device to carry out the algorithm (that Al Gore, he invents everything!) that would let you figure it out, and who could afford a computational device??? \end{sarcasm}
;>p
Come on, you can't let Microsoft off the hook for screwing up things like that. It's supposed to be a software company. Y2k was known about well before it occurred; leap-year days are well known about and recur on an amazingly well-understood and defined schedule. This is not a much deeper problem. It's just another basic problem that shows that there are not any good processes going on behind the scenes at Microsoft. And Apple screwed up their alarm clock functionality that kept messing up on iOS at the beginning of the New Year, too. That was also just as inexcusable.
Microsoft seems to be almost entirely staffed by bumbling, incompetent fools. And it starts at the top.
uhuh. I think people, especially technology companies, forget that the easiest task to automate is one that a human can simply do.
"Executive assistant in charge of renewing certificates". Make it someone's job. It'll get done. You don't need a robot. You just need it to be in someone's contract. That's it.
I always back up my cloud data to a local harddrive, just to be safe.
That sounds like vaporware.
but an update with the new certificate details was not prioritized
Reminds me of AD + Exchange: group policy changes take forever to propagate even when forced, and after you remove attached mailboxes it takes Exchange 10 minutes to let the client know they are gone. None of it is prioritized. But I am sure there are better things to waste idle proc time on and to screw around admins with.
You'd think after people made fun of the MS Zune for being out of action on a leap day that MS would take a bit more care before the next one.
Please, do tell - what method do you use to update certs on tens of thousands of systems without causing an outage?
Oh wait, you don't administer anything beyond your mom's desktop PC? Well that's nice then, go back to your WoW game and leave the rest of us alone.
As an aside, it's "save for" not "safe for." That's ok though, maybe they'll cover that in your 10th grade English class next year.
Fucking PFYs. Fuck.
I've been to Google before when that happened, typically when crossing between analytics and normal google or google plus or adsense. It gets confused sometimes and just doesn't know what to do I guess? Computers can be buggy and relying on SSL isn't very smart. I understand that it makes it seem like a phishing site if there's not one but SSL expiration happens to everyone.
I don't follow Microsoft *******ware. A description of what this is would be nice though.
With "azure failure was a leap year glitch", "microsoft certificate was used to sign flame malware", and now this, what of Secure boot and the (de facto) dependency upon Microsoft?
Yeah, all of the Windows phone silliness is so worth laughing at. I remember the crazy ad that came out for the Windows phone last year that had QuestLove in the commercial. I believe that /. had a story about MS cancelling that phone the SAME DAY that the commercial had just aired.
.
And what the fVCk is it with the stomping and jumping and slapping around of hardware in the ms tablet ads? Is that all that the MS tablets are good for? Throwing them around and clunking them onto tables and benches? What's with the ugly mean-faced girl-scouty attired girls in that first MS tablet surface ad? I think MS just saw the Apple ipod and iPhone ads that had a single song playing in the background with cool activities and decided to copy the style without any substance. Hey, that kind of explains most of the things that they do!
It's incredible how they keep shuffling blame around, or hot-potato-ing it:
In this case, the Secret Store service notified the Windows Azure Storage service team that the SSL certificates mentioned above would expire on the given dates. On January 7th, 2013 the storage team updated the three certificates in the Secret Store and included them in a future release of the service. However, the team failed to flag the storage service release as a release that included certificate updates. Subsequently, the release of the storage service containing the time critical certificate updates was delayed behind updates flagged as higher priority, and was not deployed in time to meet the certificate expiration deadline. Additionally, because the certificate had already been updated in the Secret Store, no additional alerts were presented to the team, which was a gap in our alerting system. [source link] [bold emphasis mine]
Laughable, if it were not so stupid.
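To make the alerting gap in that quote concrete, here is a minimal Python sketch. It assumes, hypothetically, that a monitor knows two expiry dates per certificate: the one held in the store and the one actually deployed. The names `CertStatus`, `store_only_alert`, and `deployed_alert`, and the dates in the usage note, are invented for illustration; none of this is Microsoft's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CertStatus:
    name: str
    store_not_after: datetime     # expiry of the newest cert held in the store (assumed field)
    deployed_not_after: datetime  # expiry of the cert production is really serving (assumed field)


def store_only_alert(status: CertStatus, now: datetime, window: timedelta) -> bool:
    """Alert keyed on the store: goes quiet as soon as the renewed cert is uploaded."""
    return status.store_not_after - now <= window


def deployed_alert(status: CertStatus, now: datetime, window: timedelta) -> bool:
    """Alert keyed on the deployed cert: keeps firing until the release ships."""
    return status.deployed_not_after - now <= window
```

With a made-up certificate renewed in the store until 2014 but still deployed with a 22 Feb 2013 expiry, checking on 15 Feb 2013 with a 30-day window, `store_only_alert` stays silent while `deployed_alert` fires. That is the difference between "renewed" and "renewed and deployed".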
This is what happens when you have bean counters and MBA running the IT department.
There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
Very few like it and use it. (Linux|Mac) desktop? less than 5% of the market share. Now that I have shown the fallacy of your statements, how about you just shut the fuck up.
A true coward: Nothing of worth to say and that without any grace...
Good lord, last year it was a 12 hour outage on leap day, this year it was a 12 hour (as far as I can tell) outage due to expired certificates. They won't be able to claim five 9's uptime for ~274 years!
At the rate of a half day of failure every year, so far, I'm not even sure I'd trust Azure for storage no matter what the discount they offer.
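For anyone checking the arithmetic: a quick Python sketch of how long 24 hours of total downtime (the two 12-hour outages mentioned above) takes to average out to a given number of nines. It assumes an average year of 365.25 days and no further outages; the function names are made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year, leap days included


def allowed_downtime_seconds_per_year(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (5 -> 99.999%)."""
    return SECONDS_PER_YEAR * 10 ** -nines


def years_to_absorb(downtime_hours: float, nines: int) -> float:
    """Total years needed for `downtime_hours` of downtime to fit the budget on average."""
    return downtime_hours * 3600 / allowed_downtime_seconds_per_year(nines)


if __name__ == "__main__":
    for nines in (5, 6):
        print(f"{nines} nines: about {round(years_to_absorb(24, nines))} years")
```

Under those assumptions, 24 hours of downtime works out to roughly 274 years for five nines and roughly 2,738 years for six nines.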
They pushed the update out on Jan 7. By Feb 22, it hadn't been completed. Something is not right with this explanation. Doesn't matter how low a priority it was, it should have been pushed out within, what? Two weeks? A month?
[ "Azure" is a shade of blue, for those that don't know,
and why MS would go with this kind of name, given their history with things "blue" is beyond me. ]
It must have been something you assimilated. . . .
re: told us that their service, which they want you to make mission-critical, is managed with a process that would make the three stooges weep with uncontrollable laughter.
.
Miscreantsoft has so much money that they can't even afford three stooges; they have to make do with their two stooges that done brung the shit to this party: Gates and Ballmer.
Oh, you also probably meant "systemic" instead of "systematic." But that aside, how would you go about updating certs on tens of thousands of systems without causing an outage? What major online service have you helped run? It's easy to sit and criticize. It's slightly more difficult to use the proper word choice while doing it (and you're evidence enough of that). It's substantially more difficult to actually run a system with five nines uptime in the real world, and from your comments I suspect you've never contemplated what really goes into that.
My certificate authority sends me nagging emails about 6 weeks before my certificate is due to expire. Microsoft's certificate authority group needs to create a database and send automated emails when certificates get near expiration. Start emailing a bunch of folks. It's very simple. Probably most CAs have such a setup.
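A rough Python sketch of that kind of nag list, assuming the "database" is just a dict mapping a certificate name to its `notAfter` string in the format Python's `ssl.SSLSocket.getpeercert()` reports. The hostnames, dates, and function names are invented for illustration, and the emailing itself is left out.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a cert's notAfter string, e.g. 'Feb 22 00:00:00 2013 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def due_for_reminder(inventory: dict, now: datetime, warn_days: int = 42) -> list:
    """Names of certs expiring within `warn_days` (42 days = the 6 weeks mentioned above)."""
    return sorted(
        name
        for name, not_after in inventory.items()
        if days_until_expiry(not_after, now) <= warn_days
    )


if __name__ == "__main__":
    # Hypothetical inventory; a real one would come from a database or config store.
    inventory = {
        "blob.example.test": "Feb 22 00:00:00 2013 GMT",
        "queue.example.test": "Dec 31 00:00:00 2013 GMT",
    }
    now = datetime(2013, 1, 20, tzinfo=timezone.utc)
    print(due_for_reminder(inventory, now))
```

Run daily from a scheduler, a list like this only helps if someone owns the follow-up, and it should ideally be fed from the certificates actually deployed rather than the ones sitting in a store.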