Monday, The Death of Websites
An anonymous reader writes "Developers implementing 'weekend inspiration' are more dangerous than hackers.
Vnunet.com has this article about how eager developers and administrators cause more trouble for websites than hackers and viruses do. How about those of us who start the week with a cup of coffee and the morning online news? My inspiration and new ideas for development are definitely not the cause of the Monday-crash hour ... I think."
And if slashdotting causes more downtime than developer mistakes, couldn't one argue that interesting content is more harmful than bad code for website uptime?
Quack!Quack!.....QUACK!!
Has anyone done any sort of bandwidth study looking at sites like etrade and yahoo, for purposes of determining any correlation between bandwidth consumption and movement on the stock markets? Intuition says that Monday mornings ought to see some sort of correlated spike.
I log in, the story is a few hours old, and there are 4 posts. Slashdot implementing the theory?
Ok, I give up, why you?
The article suggests that developers come back from their weekends and start fiddling with websites, but I think this last paragraph is perhaps equally or more accurate. Managers get "inspired" over the weekend just as much as code writers.
Reminds me of the BBS days. Usually a few hours after the SysOp leaves on vacation, the BBS is guaranteed to go down.
This is nothing but unprofessional development - the old "Oh, this is soooo good and sooo simple, how can it possibly cause troub..... ".
/. crew this: while their spelling may be atrocious, their grasp of grammar poor, and their fact and dup checking next to non-existent, they will put major changes to the codebase into Banjo first, then after they've been abused put them into the main /. site.
Any codebase, be it a program, a web site, or a router's firewall rules, should be changed IN TEST FIRST! Then you do your best to break it, and only after you and several others have had at it do you move it to production/HEAD/whatever (and hold your breath).
If you had a wonderful idea over the weekend, GREAT! Implement it in a test branch, try it out, and then move it to production. But if you merge it into the mainline without testing you are not acting professionally.
I will give the
At least, some of the time.
www.eFax.com are spammers
In any properly managed environment, developers don't get to touch the production environment. If they do, it should be read-only. All changes are made in the dev environment (developers can do what they want), put into test (developers are seriously limited), and then finally into production. Prod should be a physically separate set of servers from dev & test.
If stuff breaks on Mondays, either someone is skipping steps, or there is more going on.
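To make the "skipping steps" point concrete, here is a minimal sketch of a promotion check that refuses to let a release jump straight to production. Everything in it (the stage names, the Release class, the promote method) is hypothetical and made up for illustration; it is not how any particular shop or tool does it.

```python
from dataclasses import dataclass, field

# Assumed stage order, taken from the dev -> test -> prod flow described above.
STAGES = ["dev", "test", "prod"]


@dataclass
class Release:
    """Hypothetical record of which stages a release has already cleared."""
    name: str
    passed: list = field(default_factory=list)

    def promote(self, stage: str) -> None:
        """Record that the release reached `stage`, refusing to skip a step."""
        done = len(self.passed)
        expected = STAGES[done] if done < len(STAGES) else None
        if stage != expected:
            raise ValueError(
                f"{self.name}: cannot promote to {stage!r}, next stage is {expected!r}"
            )
        self.passed.append(stage)
```

A release that calls promote("dev"), promote("test"), promote("prod") in that order goes through; one that tries promote("prod") first gets a ValueError instead of a Monday outage.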
My server
Sounds a lot more like lack of a proper development environment to me.
I mean, it's easy for it to happen. We had problems like this with our monitoring system (though for us it was manic Friday: someone would attempt to implement something before the weekend, because of course the weekend is when you want pages the least, so you want anything that causes false pages fixed on Friday to maximize enjoyment of the weekend).
Now we have development and test servers where things live BEFORE they go to production. I never had any idea that it would help so much until we finally implemented it.
-Steve
"I opened my eyes, and everything went dark again"
I guess these sites don't test anything. Maybe they are talking about small sites. I work for a big car company. We have three stages of testing.
:-)
I'm not saying the article is wrong. The developers are still the biggest problem with our web site. It just doesn't always happen on Monday. Sometimes it takes until Wednesday to get through the system.
There are 10 types of people in the world, those who understand binary and those who don't.
Just a thought: The rest of the world lumps all of us IT people together; the distinction between, say, a "developer" and "sysadmin" means nothing to my non-geek friends.
I don't think stuff like this happens often to sysadmins or DBAs. How often do you come into work on a Monday and decide to migrate to xfs because you read on Slashdot over the weekend that SGI ported it to Linux, and SGI is cool? Likewise, how often does an Oracle DBA decide on Monday to move some production tablespaces over to rawfs from cooked, because she read a whitepaper from Oracle on Saturday that talked about performance increases from raw filesystems?
I've written a lot of code, and also sysadmin'd an awful lot of servers, and in my experience probably 90% of "production outages" are software changes--exactly like the article said--poor change control, etc etc. So, what's the point of dynamic multipathing, patching, dual power supplies, etc etc, when most problems occur because someone got excited and forgot a semicolon somewhere?
Is it fair to say that sysadmins fix things and developers break them? What is different about a software engineer's brain compared with a systems engineer's? Talk amongst yourselves :)
The tone of the article talks about shoot-from-the-hip developers acting irresponsibly, on impulse. They're taking a recognized and thoughtful practice and painting it as irresponsibility.
Monday is the best time to implement changes to most sites. The irresponsible coder implements on Friday, when errors might not be caught, or fixed, until the next working day, after a full weekend of downtime, bugginess, or insecure behavior.
But that wouldn't make for an interesting story. News flash: updating code often results in bugs that need to be fixed. When do the authors suggest we roll out new versions?
Kevin Fox
While working for a large nameless Telecoms Company, I and my fellow Contractors had an unwritten rule to "hold off" on all "good" ideas generated in meetings etc on Monday & Friday. Almost inevitably they would all be canceled within a couple of days. Not subjecting ourselves to post/pre weekend madness saved us a ton of work and helped us bring the project in on time!!
Log in.
Cup of coffee.
Browse online forums.
Read witty remark.
C|N>K
Change keyboard. Curse profusely.
Stéphane "Alias" Gallay
Now, where did I put this witty quote?..
Seems to me we are talking about several different things here.
First of all, presumably it is a good thing that people think, and get inspiration. Mon-Fri 9-5 is not the best time for thinking - this is the time for meeting deadlines, sitting in meetings, answering the phone, putting out fires, and so on. The only time most of us have to actually sit and think is the weekend. Personally, I think that should be encouraged.
The next step is implementing what you have dreamt up. Obviously, most ideas fail - ask any patent officer. And obviously, implementing a new idea without checking with colleagues, drawing it out in a spec, getting that spec approved, then prototyping, testing, tuning is not ideal either. These procedures were invented for good reasons - not just to constrain the creative mind. This is where most developers fail - not in coming up with ideas, but in being disciplined in implementing them. I hear "we cannot plan ahead, it does not work like that for us" all the time from my developers - this is always a misconception, and seems to me simply a combination of inexperience, laziness and inability... nothing that cannot be fixed!
Michael
---
BDOS ERR ON A:>
Sheesh, next thing you know they'll start spouting nonsense like "burning the midnight oil leads to more bugs."
When I was responsible for the Internet site of a rather large national bank, we only accepted change requests for Tuesday and Thursday mornings. It was just easier for the operators to get hold of a sober developer/administrator at 02:00 on a Tuesday or a Thursday than any other time. And getting a contact on the business side to ok a rollback that caused contract issues on the weekend was near impossible.
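A rule like that is easy to check mechanically. Below is a rough sketch of a change-window test, assuming the window is Tuesday and Thursday between midnight and 06:00 local time. The exact hours are my assumption (the comment above only mentions mornings and an 02:00 call-out), and the function and constant names are invented for the example.

```python
from datetime import datetime

# datetime.weekday() counts Monday as 0, so 1 is Tuesday and 3 is Thursday.
ALLOWED_WEEKDAYS = {1, 3}

# Assumed window bounds; the original rule only says "mornings".
WINDOW_START_HOUR = 0
WINDOW_END_HOUR = 6


def in_change_window(when: datetime) -> bool:
    """Return True if `when` falls inside the assumed Tue/Thu morning window."""
    return (
        when.weekday() in ALLOWED_WEEKDAYS
        and WINDOW_START_HOUR <= when.hour < WINDOW_END_HOUR
    )
```

A deploy script could call in_change_window(datetime.now()) and bail out early, so the Friday-afternoon change request is rejected before anyone has to argue about it.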
It doesn't matter if the website was made on Saturday or Wednesday. Anytime a website comes out with new code, it's going to fuck up in the first few days. There is just no cheap way to test a website with a full load of users, all with different OSes, web browsers, and connections. How many times has Slashdot crapped out when they update the Slashcode?
Having worked at a company that grew from a 1999 Internet startup with 5 employees (me being the only programmer, working alongside two consultants) to a profitable Internet company with 40 employees in 2003 (including the two former consultants), I've seen quite a bit of change in the IT procedures.
:-(
We have an eight-person tech team now (manager, programmers, support, QA). Whereas before we programmers would just use a development environment somewhat similar to the production (live) environment, test it a bit, deploy at will and monitor if anything went wrong, things have progressed a lot. Now we develop on a development environment as close to the production one as possible, then this is released to a test environment (also as close as possible to the production one) to be tested by QA, and that is finally released on the production (live) environment after it all tests OK (including regression testing).
Moreover, all the code changes are now under CVS, and we have automatic tools for monitoring the site, emailing errors, etc. QA is also done by separate people. IMHO it is conceptually flawed to allow the developers to do the final testing, by definition. (Though of course this is not always possible for cost reasons, it should be a goal).
The quality of our site is much better now. Problems almost always only arise when people want to bypass QA or force things through for emergencies.
IMHO, what is needed is:
1. Professionalism by the developers.
2. Testing, testing, and testing -- by the developers.
3. QA, QA and QA -- by someone other than the developers!
4. Managers must know the test/QA process should never be bypassed -- this unfortunately is probably the hardest point.
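The four points above boil down to a gate that a change has to clear before release. Here is a small sketch of that gate, with the caveat that the Change class, its fields, and may_deploy are all hypothetical names chosen for the example and not part of any real tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical description of one proposed change."""
    author: str
    developer_tests_passed: bool
    qa_approver: Optional[str] = None  # None means QA has not signed off


def may_deploy(change: Change) -> bool:
    """Allow release only after developer testing and independent QA sign-off."""
    if not change.developer_tests_passed:
        return False  # point 2: developers test their own work first
    if change.qa_approver is None:
        return False  # point 4: no bypassing QA, whoever is asking
    # point 3: QA must be someone other than the developer who wrote the change
    return change.qa_approver != change.author
```

There is deliberately no override flag: if an emergency path exists in the code, it will get used for non-emergencies.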
I taught a couple of ecommerce classes for MBA students and had them actually do hands-on development (in a limited sense of course) so they could get an appreciation of this process. Hopefully if some of them are managers they will remember that and not try to shortcut the due process.
/* TANSTAAFL */
There must be a course somewhere for developers - how to piss-off sysadmins. Highlights:
1. Make changes last thing on a Friday.
2. Or before a 2 week holiday
3. Change Management does not apply to developers
4. CVS is for wimps
5. And if you must use CVS, wait a week before committing fixed code.
6. Don't bump version numbers
7. Don't update init scripts
8. Except if they are correct
9. If anyone is aware of what you are up to... go to lunch.
Do you mind, your karma has just run over my dogma.
When you make changes to websites, they sometimes break. Anytime you introduce change into a stable system, you open the door for instability.
And generally business websites don't see as much traffic on the weekends, so naturally the weekend is the time to make changes.
So wow, it's no shock that Mondays are when you're most likely to see problems with a website. But these problems and hiccups are the price you pay for progress.
If you don't want to chance any disruption in your life, then I guess you should never change. Otherwise, get over it.
You need to get the code monkey off the production box.
They need a Dev environment. And THAT's ALL they touch. They deliver their code to UAT.
QA needs 2 environments:
- Unit Acceptance Testing (UAT), and all bugs go back to Dev
- Integration Testing (IT), and all bugs go back to Dev (or you need SysAdmins who can hack the OS, middleware and/or environment)
Production, where NOTHING is allowed until it's gone through UAT & IT.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
Lawrence: No. No, man. Shit, no man. I believe you'd get your ass kicked saying something like that.
- Lightbulb above head in the weekend (ding!)
- Over the following week, research the change, check impact on existing systems, come up with a maintenance strategy, document it, inform people, test it in a lab, plan the implementation, develop a rollback procedure.
- Implement change early the following week - never on a Friday, preferably not on Thursday.
- Watch throughout the week for problems.
Anything less and you don't deserve to be in that position.
Sparks:Gadget:Beer Maker
Is this really a case of "Weekend Inspiration", or a case of management pushing changes that haven't been thoroughly tested?
I find it quite disturbing how these companies are blaming downtime on developers. This means that:
a. You have no change control over your environment, and developers can do as they please, hence poor management.
b. Developers are implementing changes that haven't been thoroughly tested. Again poor management.
Technology and competition aren't moving so quickly that you cannot take the time to use a test/QA environment.
Awesome!
What kind of mickey-mouse operation is it that allows someone's (whether management or developer) mistake to take down their web site? Have they heard of QA and testing? You'd have to be insane to allow any changes to a production system without it being tested on the test system first. In all the places I've worked at (I'm a back-end hacker, using C/C++ and Java) anyone who made a change to the production system without following all the test procedures (regression tests and QA signoff) would be canned in a second. (Unless it's the VP of Engineering -- but that's another story.) Or are these personal sites they're talking about?
Unlimited growth == Cancer.