Monday, The Death of Websites
An anonymous reader writes "Developers implementing 'weekend inspiration' are more dangerous than hackers.
Vnunet.com has this article about how eager developers and administrators create more trouble for websites than hackers and viruses do. How about those of us who start the week with a cup of coffee and the morning online news? My inspiration and new ideas for development are definitely not the cause of the Monday-crash hour ... I think."
Has anyone done any sort of bandwidth study looking at sites like etrade and yahoo, for purposes of determining any correlation between bandwidth consumption and movement on the stock markets? Intuition says that Monday mornings ought to see some sort of correlated spike.
Sounds a lot more like the lack of a proper development environment to me.
I mean, it's easy for it to happen. We had problems like this with our monitoring system (though for us it was manic Friday: someone would attempt to implement something before the weekend, because the weekend is when you want pages the least, so you want anything that causes false pages fixed on Friday to maximize enjoyment of the weekend).
Now we have development and test servers where things live BEFORE they go to production. I never had any idea that it would help so much until we finally implemented it.
-Steve
"I opened my eyes, and everything went dark again"
Just a thought: The rest of the world lumps all of us IT people together; the distinction between, say, a "developer" and "sysadmin" means nothing to my non-geek friends.
I don't think stuff like this happens often to sysadmins or DBAs. How often do you come into work on a Monday and decide to migrate to XFS because you read on Slashdot over the weekend that SGI ported it to Linux, and SGI is cool? Likewise, how often does an Oracle DBA decide on Monday to move some production tablespaces over to rawfs from cooked, because she read a whitepaper from Oracle on Saturday that talked about performance increases from raw filesystems?
I've written a lot of code, and also sysadmin'd an awful lot of servers, and in my experience probably 90% of "production outages" are software changes--exactly like the article said--poor change control, etc etc. So, what's the point of dynamic multipathing, patching, dual power supplies, etc etc, when most problems occur because someone got excited and forgot a semicolon somewhere?
Is it fair to say that sysadmins fix things and developers break them? What is different about a software engineer's brain compared to a systems engineer's? Talk amongst yourselves :)
While working for a large nameless Telecoms Company, I and my fellow Contractors had an unwritten rule to "hold off" on all "good" ideas generated in meetings etc on Monday & Friday. Almost inevitably they would all be canceled within a couple of days. Not subjecting ourselves to post/pre weekend madness saved ourselves a ton of work and helped us bring the project in on time!!
Mr. Gandhi has his cause and effect a little mixed up, and I think he's implying that new development shouldn't ever introduce new bugs, which is a little silly.
For the concrete "holiday lockdown" example, I think he's only partially right. In my development group, we explicitly lock down ALL changes to our production web apps well before, and all through, the Christmas shopping season, to prevent the inadvertent introduction of any (new) bugs. It's not a side effect of vacation time -- it's an explicit operations decision to reduce the risk of breakage.
So, yeah, while we're not touching it the stability seems to increase, but no existing (but less critical) bugs get fixed either. No large-scale app is bug-free -- the lockdown period just seems to stabilize things but it's an illusion caused by the lack of new species of bugs popping up.
In the more abstract "development introduces bugs" sense, it's a fact of life in complex systems that new code means new bugs -- and if we never introduced new code (->features) then we'd lose customers. So I take his statement to imply that we should only be introducing 100% bug-free code -- which is a PHB pipe dream.
When I was responsible for the Internet site of a rather large national bank, we only accepted change requests for Tuesday and Thursday mornings. It was just easier for the operators to get hold of a sober developer/administrator at 02:00 on a Tuesday or a Thursday than any other time. And getting a contact on the business side to ok a rollback that caused contract issues on the weekend was near impossible.
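For what it's worth, a "Tuesday and Thursday mornings only" rule like that is easy to enforce in tooling rather than by memory. A minimal sketch in Python, assuming a hypothetical `in_change_window` helper and a noon cutoff for "morning" (the bank's actual cutoff and tooling aren't stated above):

```python
from datetime import datetime

# Assumption: datetime.weekday() numbering, where Monday is 0,
# so Tuesday is 1 and Thursday is 3.
ALLOWED_WEEKDAYS = {1, 3}

# Assumption: "morning" means before 12:00 local time (illustrative only).
MORNING_END_HOUR = 12


def in_change_window(when: datetime) -> bool:
    """Return True if `when` falls on a Tuesday or Thursday morning."""
    return when.weekday() in ALLOWED_WEEKDAYS and when.hour < MORNING_END_HOUR
```

A deploy script could call `in_change_window(datetime.now())` and refuse to continue when it returns False, which turns the policy into something nobody has to remember on a Monday.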
- Lightbulb above head in the weekend (ding!)
- Over the following week, research the change, check impact on existing systems, come up with a maintenance strategy, document it, inform people, test it in a lab, plan the implementation, develop a rollback procedure.
- Implement change early the following week - never on a Friday, preferably not on Thursday.
- Watch throughout the week for problems.
Anything less and you don't deserve to be in that position.
Sparks:Gadget:Beer Maker
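The checklist above could be sketched as a simple pre-deploy gate. This is only an illustration: the step names, the `ChangeRequest` class, and the `blockers` function are all made up here, and only the hard "never on a Friday" rule is enforced (the "preferably not on Thursday" part is left to judgment).

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical step names, one per item in the checklist above.
REQUIRED_STEPS = (
    "researched",
    "impact_checked",
    "maintenance_strategy",
    "documented",
    "people_informed",
    "lab_tested",
    "implementation_planned",
    "rollback_ready",
)

FRIDAY = 4  # date.weekday() numbering, Monday is 0


@dataclass
class ChangeRequest:
    title: str
    completed: set = field(default_factory=set)


def blockers(change: ChangeRequest, deploy_day: date) -> list:
    """List every reason this change should not go out on deploy_day."""
    problems = [
        f"missing step: {step}"
        for step in REQUIRED_STEPS
        if step not in change.completed
    ]
    if deploy_day.weekday() == FRIDAY:
        problems.append("never deploy on a Friday")
    return problems
```

An empty list from `blockers(...)` means the change has cleared every step and isn't scheduled for a Friday; anything else is the list of things still to do.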