Don't Shoot Me, I'm Only the Software
ctwxman writes "How often have you heard about some massive crash where the blame was then placed on the software? 'Disasters are often blamed on bad software, but the cause is rarely bad programming.' If you've been looking to blame your boss, this article from MSNBC says your ship has come in! Poor planning, poor execution and poor leadership are more likely to be at fault than bad code when it comes to systems that fail."
The lack of robust testing during and after such a project likely contributed to the Sept. 14 radio system outage over the skies of parts of California, Nevada and Arizona.
As I recall, it also came down to a tech who didn't do his job properly when it came to rebooting the machine that ran the software.
You can't always blame software; you have to blame the end-users too.
Striking fear in the authors of godawful fanfiction, I am here, appearing in darkness, Tuxedo Jack!
Big projects require organization or shit happens.
Uh, that's it. Thrilled?
The occasional journalistic integrity of multiple MS-affiliated news outlets has bitten MS in the ass more than once.
SPAM
The more things change, the more they stay the same. I fought in the Brain Wars with management 30 years ago, and it was the same thing. The Powers wanted X, but system capabilities were Y. They did not want the issue confused with facts; they just wanted what they wanted, and wanted it yesterday. My peers and I coded it as close as we could, implemented it, and crossed our fingers. We kept the app running for about a week (with frequent baling-wire and Band-Aid patches), but the system eventually melted down due to data overload (fancy-speak for "filled up the disk").
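A "filled up the disk" meltdown is one failure that code can at least detect before it happens. Here is a minimal sketch using Python's standard `shutil.disk_usage`; the `has_headroom` name, the 10% reserve, and the 50 MiB batch size are illustrative assumptions, not anything from the system described above.

```python
import shutil

def has_headroom(path: str, needed_bytes: int, reserve_fraction: float = 0.10) -> bool:
    """True if writing `needed_bytes` under `path` would still leave at least
    `reserve_fraction` of the filesystem free (the 10% default is an assumption)."""
    if needed_bytes < 0 or not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("invalid arguments")
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    reserve_bytes = int(usage.total * reserve_fraction)
    return usage.free - needed_bytes >= reserve_bytes

# Reject (or alert on) the next batch instead of letting it fill the disk.
batch_size = 50 * 1024 * 1024  # hypothetical 50 MiB batch
if not has_headroom(".", batch_size):
    print("low disk space: rejecting batch")
```

Keeping a reserve rather than checking for exactly zero free bytes leaves room for logs and temporary files, so the system can still report the problem when it hits the limit.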
Management skated, two programmers fired.
That's why I raise cattle and write hunting articles these days.
Ignorance is curable, stupid is forever.
I beg to differ slightly.
Software projects seem to be primarily constrained by time and money, which are usually controlled by management (read: your boss).
If you want to test software properly, you will need plenty of those resources (i.e., time and/or money). Just before testing a block, a coder will generally say something like:
"I'm finished the block, just need to test it a bit more"
Generally, that is not what management hears. They hear:
"I'm finished"
So they think "it's ready". I've seen several first-generation projects get hit by this problem (in commercial environments). In the IC design world (where it's not generally possible to just flash the firmware to fix a bug), it's accepted that once the primary design is finished, you are only 50% of the way through. We spend at least half the time verifying the blocks. Management in IC design has accepted that verification is just as important as implementation, and so doesn't go off making wild assumptions.
So rather than just pawning the blame off onto your boss, accept that it really is (partially, anyway) your fault as well for not highlighting the fact that your block is not as tested as you would like it to be.
The philosophy of open source seems to limit the "it's ready" effect to a good degree, hence the perception of better code quality. When mainstream commercial coding picks up the slack, it should get better as well. But generally, a lot of these messes can be attributed to communication (person-to-person) failures rather than coder/boss failures.
[ Monday is a terrible way to spend one seventh of your life. ]
If you start blaming the programmers, they will lose their self-esteem and move on to other projects. Only a small percentage of elite programmers rarely, if ever, make a mistake past the prototype stage, and those few are not enough to write the programs for everybody. So if we sue/harass/fire all the non-elite programmers, how are we going to make up the deficit?
Sorry Microsoft, it's the software. When I go to the local airport and see a kiosk displaying a Windoze 2000 screen saver instead of information, something is wrong with the software running the kiosk. I'm sure that the kiosk owner followed all of the directions given and the stupid thing did not work anyway. A box that has to be restarted once a month and crashes when it's not has a software problem. Having two of them will simply multiply the problem by a factor of two.
How am I so sure that software, not people, is to blame? It's easy: I started using non-Microsoft software and most of my problems went away. I've got the same old hardware; it just works better under Linux. It does more for me too.
Why is that? It might be that there's no nasty registry that's designed to keep me from "stealing" software. It might be that sane networking models really do eliminate most problems with worms and viruses. It might be that free software really works to make better code. Who cares?
The bottom line is obvious. No amount of blame shifting will change it.
Friends don't help friends install M$ junk.
is that IT tasks have been highly compartmentalized - to the point where coders are versed in only a limited set of coding languages (or just one).
And coders cannot be designers or DBAs, or possess much business knowledge. Interaction with the end user is done through a 'business designer'.
As with the childhood game of telephone, some of the information gets lost at every node in the SLCD (software life-cycle design) chain.
One of the best fixes is to allow direct interaction of coder/end-user, and merge the designer/developer roles for a better industry understanding.
From the article: "Developers are least qualified to validate a business requirement. They're either nerds and don't get it, or they're people in another culture altogether,"
I used to think this. Then I realized that at least the developers knew one end of it -- they knew what the software can do. The other end, what the customer wants out of the system, is usually not known by anyone. Not management, certainly not sales, and not the customer either.
A customer with an existing system will often try to write requirements which amount to "do exactly what the existing system does in exactly the way it does it", which is not what they want or they wouldn't be replacing the system. Or, whoever is providing the business requirements will be so out of touch with their own business that the requirements will be incomplete or wrong. Or on the flip side they'll be so familiar with the system that they'll leave out things which are obvious to them -- but so obscure outside their field that no one on the software side will even notice the omission.
Of course, these problems will be discovered very late in the development cycle, resulting in a scramble to redesign and redevelop, a bunch of fingerpointing, mandatory overtime, and a host of other ills all of which lead to bad and buggy software.
Good programming is not enough to prevent system failure. Good programming is good for your homework project or a little module.
Good software engineering is required for large systems. When you are developing anywhere from a hundred thousand to a million lines of code, no amount of good programming will guarantee a good system without solid software engineering processes.
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
I don't see what's ridiculous about performing regular restarts of a mission-critical system. Would you rather have a system that booted correctly this morning routing your flight, or one that booted correctly last year and may not have its components functioning properly anymore? Do you think that people incapable of rebooting a computer every 30 days are going to perform regular maintenance and testing of electronic components? Do you think they're going to remember to fsck their disks every day?
I don't think so.
using namespace slashdot;
troll::post();
No
You get good software from good teams of business people and developers. Every successful project I've been on has been successful because of a good team comprised of users, "business" people, and developers. Add to a good team a good software development process and you have a better chance of success.
Not surprising that a CEO would make this remark. I can't count the times I've asked the business community I'm working with for clarification of a business rule or requirement, and then gotten a 'sigh' or other look that says, "I'm too busy to worry about this."
And on the contract I'm working on now, they consider a 30-minute phone meeting enough information to build a full-blown app; trying to get documentation is like pulling teeth. And of course we know where the finger will be pointed if there are any issues.
To say we're nerds who don't "get it" is just an asinine, condescending remark: a) I'm perfectly able to learn about the business involved, and b) if you explain the rules properly, most developers I know have no problem at all coding the solution. I find most of the developers I work with brighter than the business community they're working with. The CEO's remark has a Dilbert-like quality to it, IMO, and this guy's one of the 'experts' on the problem in the article... ha!
'The unexamined life is not worth living' - Socrates
That works really well in theory. The problem is when management looks at you and tells you to do it the way they said anyway because they're in charge and you aren't. I've run into that a few times in the past. The fact that the IT manager was an idiot and thought he was an authority on the subject because his wife was a programmer didn't help.
Everything I need to know I learned by killing smart people and eating their brains.
Contrary to popular belief here on /., MS does not hire idiots to write their code
Amen to that. I don't know where this idea that MS doesn't hire skilled people to design and develop software came from, but it's wrong.
It has always appeared to me that MS hires top students from the very best schools.
bhj
Nevertheless, it's those poor planners, poor executors, and poor leaders who are in charge. You really think they are going to take the blame? No, of course not! It's so much easier, more fun, and better for your career to tell upper management that it was just the programmers who couldn't follow their instructions correctly.
Programmers will then get blamed, the poor managers will get a bonus for "correctly" identifying the problem, and corporate America will sail on as it always has: giving the big bucks to the managers and sales folks, while ignoring the programmers.
Who me, bitter?
The talking heads at the top want a shitload of features, and they want it by an unrealistic deadline
Welcome to every single software project ever.
The article is a bunch of malarkey. Well, I suspect it is, but I stopped after the first couple of paragraphs, after I read this:
Last month, a system that controls communications between commercial jets and air traffic controllers in southern California shut off because some maintenance had not been performed.
Yeah. That maintenance they failed to perform? It was their mandated once-a-month reboot of their Windows system, because it locks up after 43 days.
This was the result of bad programming.
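Periodic-reboot requirements like this often trace back to a fixed-width uptime counter that overflows. The sketch below assumes, purely for illustration, an unsigned 32-bit millisecond counter; neither the article nor this thread says that is what this particular system used. Such a counter wraps after about 49.7 days, and elapsed-time code that ignores the wrap breaks at that moment. Python integers never overflow, so the wrap is simulated with an explicit mask, and `elapsed_naive` mirrors code that subtracts in a wider signed type.

```python
# Assumption: the tick counter is an unsigned 32-bit count of milliseconds.
MASK_32 = 0xFFFFFFFF
MS_PER_DAY = 1000 * 60 * 60 * 24

def advance(tick_ms: int, delta_ms: int) -> int:
    """Counter value `delta_ms` after `tick_ms`, wrapping at 2**32."""
    return (tick_ms + delta_ms) & MASK_32

def elapsed_naive(start_ms: int, now_ms: int) -> int:
    """Buggy: goes hugely negative once the counter wraps between the two reads."""
    return now_ms - start_ms

def elapsed_wrap_safe(start_ms: int, now_ms: int) -> int:
    """Correct for any interval shorter than one full wrap (~49.7 days)."""
    return (now_ms - start_ms) & MASK_32

wrap_days = (MASK_32 + 1) / MS_PER_DAY
print(f"counter wraps after {wrap_days:.1f} days")  # counter wraps after 49.7 days

start = MASK_32 - 999        # one second before the wrap
now = advance(start, 5000)   # five seconds later
print(elapsed_naive(start, now))      # -4294962296
print(elapsed_wrap_safe(start, now))  # 5000
```

In this model, masking the difference back to 32 bits gives the right answer for any interval shorter than one full wrap.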
Anyway, as a QA guy, I can assure you that bad programming abounds. It's my job to make sure you never see it. Part of that job is trying to drill into programmers' heads the concept that performing to spec when used as directed is not sufficient.
This is just like television, only you can see much further.
Yep. There's the same difference between software engineering and programming as between architecture/structural engineering and building construction. It doesn't matter how good the builders are; if the architect drew up a bad plan, the house will fall down. Pushing all this planning stuff out of the "programmers'" responsibilities is unfair to the managers, though. A good software engineer should be aware of the whole process and how it needs to be conducted, and be able to advise his manager (if he's not a manager himself) on how to proceed. It's part of his job.
If the builders get given a plan where the roof is placed underneath the house, they should question it, not just build it blindly without asking.
Daniel
Carpe Diem
As a result, the development staff here lies to their managers, who lie to their directors, who lie to their VPs, and on up the line. This points to a breakdown in communication between all levels in IT, including the lines between IT and the business.
This is not something unique to IT -- it's something fundamental to any command structure which relies on communication between unequals.
Its only common name is the SNAFU Principle, which was coined by Robert Anton Wilson (there's a very good discussion of it in his book Prometheus Rising).
In Illuminatus!, a satirical study of social pathologies, Robert Anton Wilson and Robert Shea brought out an important principle that causes trouble in hierarchies: the Snafu Principle. People tend to say what they think the boss wants to hear, especially if they have noticed that the practice of "shooting the messenger" is common. This means that the information passed up the pyramid is distorted at each level. Thus, each higher layer of managers tends to have less and less contact with reality, and near the top they are often completely out of touch.
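The compounding effect Wilson and Shea describe can be made concrete with a toy model. The constant per-layer `candor` fraction below is an assumption for illustration, not something from Illuminatus!: if each layer passes on only part of the bad news it hears, what reaches the top shrinks geometrically with the number of layers.

```python
def perceived_severity(actual: float, candor: float, levels: int) -> float:
    """Severity of a problem as seen after `levels` hops up a hierarchy.

    Toy model (assumption): every layer relays only the fraction `candor`
    of the bad news it receives, so the distortion compounds per level.
    """
    if not 0.0 < candor <= 1.0:
        raise ValueError("candor must be in (0, 1]")
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return actual * candor ** levels

# A problem of severity 100, as seen at each of six layers
# (0 = the person who found it), with every layer 80% candid.
for level in range(6):
    print(level, round(perceived_severity(100.0, 0.8, level), 1))
```

With 80% candor per layer, the report reads 100.0, 80.0, 64.0, 51.2, 41.0 and 32.8 on the way up: five hops in, only about a third of the original severity is still visible, even though no single layer distorted it much.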
Considering that you deal with users who don't really know what they are doing in the first place, I would have to place the majority of the blame on them. However, you could also retrospectively place the blame on IT for not having the systems locked down in the first place, but then you would have to blame the CEO and the board for not putting more technology in the budget. Yeah, we won't go there.
Will you people ever get off this silly juvenile crap? MSNBC and Slate have never demonstrated anything but complete editorial independence from Microsoft, and they (and MS) deserve credit for it, instead of constant sneering from the audience at VA Linux's propaganda mill.
They bought a Yugo (Windows) to do the job of a truck (UNIX). The Yugo needed more maintenance than the truck, and they had an accident. They fired the 'state of the industry' execs who decided to replace trucks with Yugos. This is actually good news, in a way. Now all they need to do is get the trucks back.
Hmm... I wonder if the execs running nuclear power plants have finished installing Windows to run them....
Better yet, we can put Windows in charge of the ICBM fire control systems. We'll be *so* state of the industry.
No, not every software project. The typical deadline for open source software is "when it's ready", which often isn't an unrealistic deadline. However, the shitload-of-features problem can happen there, too (and is usually the main reason "when it's ready" becomes unrealistic).
The Tao of math: The numbers you can count are not the real numbers.
. . .a programmer should know to say 'this won't work'. . .
Programmers, at least the good ones, know how to say this, and say it loud and often. Likewise, the true professional salesmen who have actually studied their craft know how to say, "This promotional technique of yours has actually been proven in study after study to be a pointless waste of time." The cabinet builder knows how to say, "That joint will fail." The day laborer knows how to say, "If you really insist I do it that way, you'll get less real work per day out of me and I'll be out on comp in six months."
Management knows how to say, "I'm sorry, but our policy is. . ."
They're often very good at saying it, because they get a lot of practice saying it, instead of practicing how to listen to the people they've hired because they possess certain special skills and knowledge.
Engineers know how to say, "The Space Shuttle will blow up if you launch below a certain temperature."
It turns out they were right, but management followed policy.
Management's solution? Fire the engineers for speaking out.
So long as you work for management that views its role as telling you what to do, and your role as to just shut up and do it, actually doing your job the way you perceive it as an expert is simply a short walk to the unemployment line.
Once upon a time my mother was the manager of vehicle registration renewal for the NYS DMV. Her superior walked into her division one day and found her cleaning a microwave oven.
"Do you think we pay people at your grade level to clean ovens?" he asked her.
"Every one of my people is working on production, in the only profit-making division of the state government. Do you want me to stop one of my people from making money for the state just to clean an oven when I've really got nothing better to do right now myself?"
Fortunately for her, after a moment of reflection, he actually got it and replied, "You know, I never thought of it that way before. I guess sometimes we do pay people at your grade level to clean ovens."
Both he and my mom are fairly rare examples of management. Management serves the purpose of making sure the secretary has paper clips when he needs them, the assembly line workers have bolts when they need them, and to gently nudge people back on course if they should happen to stray a bit from the path, not to "tell them what to do."
But somebody has neglected to tell most of the managers that.
KFG
It actually seems to be pretty regular integrity . . . wasn't Slate just recommending Firefox to people?
This is why Free Software tends to be more secure. The project managers tend to be programmers, not non-techy businessmen. They understand the concepts of "still needs work" and "not ready yet" even if a product is late. Commercial software vendors would rather release a program on time and hide any last-minute security flaws that pop up (to be fixed in some patch, which is perhaps another profit generator). Open Source projects, led by the programmers themselves, will usually prefer to hold back a new version if they feel it's not reliable enough for release. Besides, that's what developer versions are for.
If it weren't for fog, the world would run at a really crappy framerate.
Sorry boss, you're getting paid to know. Spend some time (gasp! outside of work if you need to) and read up. While you're not expected to know every last implementation detail, you should understand the capabilities of your chosen platforms completely.
Well, we don't necessarily need to blame it on their affiliation with MS, poor journalism could also be the problem.
If I remember correctly, this "maintenance" was a monthly reboot which was necessary because the software (in this case Windows) would fail without it. You can blame that on poor management, poor training, lack of diligence, or faulty software.
The fact is, if the software wasn't faulty, at least in this case the rest would not have mattered.
On the other hand, the best-managed company with the best training and most diligent employees could not possibly have fixed the problem with Windows. I suppose if you're going to blame management for anything, it'd be deploying Windows for mission-critical tasks.
When will Windows be ready for the desktop?
A mission-critical system should be interrupted exactly when you want, not on a schedule dictated by a calendar. The original "BS!" poster was right: if there are memory leaks, garbage collection problems, etc., then that's evidence of sloppy design work.
Saying you need regular reboots is the same as saying you need a firewall to protect against viruses: both show flaws in the design of the OS.
And as far as "fscking their disks every day" goes, that's more sloppy design. You shouldn't have to do that. Fsck fixes file system errors resulting from poor application behavior, environmental problems, and (sometimes) hardware troubles. You shouldn't have those every day in mission-critical systems, but even if you do then putting in place a system of daily fsck is not the way to fix it.
I've had a production application server running for the last 288 days. It's due to come down for OS updates, but it will do so on my schedule, not because its operating system is poorly designed.
sigs, as if you care.
The other problem is that this type of 'message modification' can be done unconsciously, through choice of words and adjectives, 'forgetting' to mention bad news, and even body language. Even subtle modifications can have an impact in a large hierarchy.
It *is* possible to fight this, but what you then have to do is have a manager and the people they manage feel and act like equals. If there is no fear of reprisals for bad news, and no fear of reprisals for honest mistakes, then the quality of the communication within the organization will rise.
Really, the solution is to try and structure your organization as something other than a pyramid. You don't need to run an organization with a single "alpha male" at the head, and everyone reporting indirectly to that person or board. But that seems to be the cultural norm in western society.
Have you considered the possibility that the 30-day reboot cycle is supposed to ensure that if they were to experience a crash or something that the system would be able to reboot? I mean, there are plenty of people (probably even here on Slashdot) with servers that have been up 5-7 years but if they have to reboot for some reason, what are the chances that the system will have problems coming up? Many hardware faults are discovered at boot (the stress of boot brings them to a head).
Errr... I got lost when we divided by the software engineer but:
I have done CMMI and I am not a fan of heavy process done in the heavy-handed way CMMI is done. It's a great way for extra levels of management to justify themselves. That said, CMMI does ask developers to do important things (and don't quote me, this is just what it means to me):
1. Coding standards
2. Unit testing (automate it!)
3. Peer reviews
4. System testing
CMMI makes you do all of that and document it. The paperwork is over the top but the result is better software.
The big winners are having unit test code that aims for 100% coverage and real peer reviews (the many eyes approach). The peer reviews can be initially painful but everyone learns their bad habits quickly and soon gets out of them.
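As a minimal illustration of point 2 (automated unit tests aiming at full coverage), here is a sketch using Python's built-in `unittest` module. The `clamp` function is a hypothetical unit under test; the point is one test per branch, so every line of it runs at least once.

```python
import unittest

def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical unit under test: limit `value` to the range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return low
    if value > high:
        return high
    return value

class ClampTest(unittest.TestCase):
    # One test per branch, so every line of clamp() is exercised.
    def test_below_range(self):
        self.assertEqual(clamp(-5, 0, 10), 0)

    def test_above_range(self):
        self.assertEqual(clamp(50, 0, 10), 10)

    def test_inside_range(self):
        self.assertEqual(clamp(7, 0, 10), 7)

    def test_bad_bounds(self):
        with self.assertRaises(ValueError):
            clamp(1, 10, 0)

if __name__ == "__main__":
    # exit=False so the interpreter keeps running after the tests finish.
    unittest.main(argv=["clamp-tests"], exit=False)
```

If coverage.py is installed, running the tests under `coverage run -m unittest` and then `coverage report -m` lists any lines the tests never reached.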
The downside is that it comes with a whole crew of managers, inspectors, auditors, validators, finger pointers, and bureaucrats who are working hard to justify their existence by beating up the developers and their code as much as possible.
In the end, it all comes down to common sense: plan what you are going to do. Do it carefully. Test it. Ask your peers to review it and learn from their comments. Put it all together and test it again. Remember your problems so you don't repeat them.
This sig intentionally left blank.
"The Mythical Man-Month" should be required reading for every six-figure mouth breather out there. Of course, it's thicker than "Who Moved My Cheese?" and can't be purchased in an airport gift shop, so I suppose there's no hope...
*** Sigs are a stupid waste of bandwidth.
"Many eyes" is good, but it's the scheduling disaster that works against producing solid designs and solid code in the first place. Linux vN+1 ships when Linus thinks it's ready, not to meet some marketing manager's fantasy deadline. Shipping software when it's ready, not when someone who hadn't a thing to do with making it wishes it would be ready, cannot be overvalued as a component of software quality.
Never ever?
Perhaps you haven't been paying attention. Even Diogenes, living in his barrel, paid more attention to the world than that.
The assumption that MS hires "idiots" is unfair to be sure. However, those in the know who have seen some of the colossal kludges in MS software, and recently almost all Windows users who have been impacted by the repeated, massive virus/worm attacks base their knowledge on the only thing they know about Microsoft--their products.
It has always appeared to me that MS hires top students from the very best schools.
That is true--unfortunately they have been known to hire them AWAY from the best schools too (i.e., before they graduate). It doesn't matter if they are top-five-percentile students--if they have zero practical experience and are thrown into a situation beyond their capabilities, the result can be less than ideal. Nonetheless, I think that by now MS has figured out how to select and place recent grads and students hired before graduation. I think the problem is now deeper than that.
Microsoft triumphed over other tech companies that were prominent in its early days because BillG learned it had to become a marketing company (the same reason Apple still exists today--Jobs knew that from the start and Gates is a very quick study). Other tech companies remained software companies--they toiled away to make their next killer app the best it could be and marketing was an afterthought.
Microsoft, from 1980 on at least, has been a marketing company first, with software development second. The most important technology it markets was invented elsewhere and merely extended by Microsoft. Only in the company's latter life have they been truly serious about research. The long-time "thinkers" are brilliant, but historically little has come out of Microsoft's research that has been commercially successful, given the potential funding power MS has had.
Therein lies the problem. The article is right--software isn't the root cause of the vast majority of failures (even when the failure is the direct result of a software bug). At Microsoft, software design is driven by marketing--time deadlines, customer requests for features, backwards compatibility/legacy support etc. The result is the house of cards we build our systems upon today.
That result is unavoidable without EXTREMELY skilled planning and throttling the pace of change. Unfortunately, the MS ship sails where the winds take it, and the pace of change has been rapid and relentless until now. I once thought the problems with MS products were because too many drop-outs were running the show. After seeing this blog, I can see what the development teams have had to cope with. They have to do the impossible and try to get it done before the deadline slips yet again, MS's market cap slips a few million, and BillG comes down to yell at them. In some cases you have to be brilliant just to survive at MS.
So anyways, I think software bugs are the immediate cause of a lot of disasters, but the ROOT cause definitely is poor planning and project management that leads to unstable system development.
While building a show, you have a director who basically controls everything and everyone. He says 'stand here, put that there, turn that light down'. (He hopefully delegates some of that.) This is how some bosses seem to act...but they're managers, not directors.
When the show is ready to open, he's built everything exactly how he wants it, and he then turns it over to the stage manager, who then proceeds to, hopefully, do nothing except call out 'Okay, do the next thing on my mark. Mark.' repeatedly.
Of course, as we're talking about something that, no matter what anyone does, seems to repeatedly involve actors, said stage manager will spend a good deal of time locating them. Or something equally silly, like the lighting system tripping a breaker, the curtain failing to operate, or a fly that won't fly.
So, by analogy, the board of directors of a company should 'build the show', and then hire managers to make sure things keep running.
If corporations are people, aren't stockholders guilty of slavery?
More like, the plane was known to continuously leak oil and/or other safety fluids to the point where it became dangerous or unreliable. They could have either replaced the plane or fixed the problem for greater cost, but chose to ignore the problem until one day missing that critical oil change caused a near crash.
This isn't about a standard maintenance procedure, since a server should not have to be rebooted constantly in order to maintain stability/functionality. That's like saying it's OK to swap the oil every second flight because it's cheaper than fixing the actual problem (that there's a leak in the first place).
And actually, considering that many earlier windows problems were caused by memory leaks... not such a bad analogy now...
No they don't. It's been studied for decades, but in all that time we've still not settled on anything that's actually demonstrated over long periods of time to be good. Hardware, materials and other available resources are continuously evolving and changing, meaning that software design research has nothing to reliably settle on before things change again.
We don't even have consistent and proven programming languages. Today it's Java, C#, VB and a variety of imperative scripting languages. Yesterday it was C and C++, before that it was Fortran, and before that there have been variants of assembler. And as we use these languages, we're constantly discovering more and more about language design and developing new languages.
HCI is still in very early stages of development, and that's a major part of software engineering. (If people can't use software then what's the point?) The vast majority of software development shops -- particularly smaller ones -- don't even employ HCI experts, and substantial proportions of developers still don't respect them or understand what the point is.
Something like bridge building, for instance, has been studied for centuries (if not millennia). It relies on consistent physics, consistent tools and well-understood environments. Organisations that build bridges have well-established experience, procedures and regulations that are put in place throughout their organisation. Software development's been studied for a few decades, with the existing materials, resources and expectations constantly changing underneath it.
Organisations that build software still don't have any reasonable idea of how to arrange themselves, or what procedures they should be using. There have certainly been some pretty good ideas from relatively recent ongoing studies, but the fact that managers and developers and marketers and whoever else frequently don't gel together very well with usually bad results is just an ongoing consequence of the fact that it's a very new field.
Just because software engineering has been studied for a few decades doesn't mean we know what we're talking about, or even that we know what we're studying.
Actually, the reason Linux security is better than Windows is design. Linux has a much simpler design that was based around files, multiple users, and networks. Windows has a complex design that is a kludge of multiple different systems, originally meant as a single-user, non-networked, floppy-disk-driven OS.