Software Defects - Do Late Bugs Really Cost More?
"If you're a software engineer, one of the concepts you've probably had driven into your head by the corporate trainers is that software defects cost logarithmically more to fix the later they are found in the software development life cycle (SDLC).
For example, if a defect is found in the requirements phase, it may cost $1 to fix. It is proffered that the same defect will cost $10 if found in design, $100 during coding, $1000 during testing.
All of this, to my knowledge, started by Barry Boehm in papers[1]. In these papers, Mr. Boehm indicates that defects found 'in the field' cost 50-200 times as much to correct as those corrected earlier.
That was 15 years ago, and as recently as 2001 Barry Boehm indicates that, at least for small non-critical systems, the ratio is more like 5:1 than 100:1[2].
[1] - Boehm, Barry W. and Philip N. Papaccio. 'Understanding and Controlling Software Costs,' IEEE Transactions on Software Engineering, v. 14, no. 10, October 1988, pp. 1462-1477
[2] - (Beohm, Barry and Victor R. Basili. 'Software Defect Reduction Top 10 List,' Computer, v. 34, no. 1, January 2001, pp 135-137.)"
At any stage, you can only find bugs that are introduced at or before that stage. So while fixing a requirements bug in the coding phase might be more expensive than fixing it during the requirements phase, fixing a coding bug during the requirements phase is a tricky operation that I'll leave as an exercise for the reader :-)
Of course, if you omit some of these phases completely, you won't introduce any bugs during them. That's why the JFDI(*) methodoloy is so popular.
(*)Just F*cking Do It
Every bloody emperor has his hand up history's skirt [Peter Hammill/VdGG]
There's plenty of proof out there. Even "ancient" but worthy texts like "The Mythical Man Month" discuss this one.
The size of the project and the nature of the bug really combine to drastically affect the outcome.
For me personally we have just spent about a year tracking down a particular set of bugs (probably not all nailed yet) which showed up post-live. When we were pre-live these would undoubtedly have been easier to fix, but something else that we could have done at that point would have been to improve our design, which would have nuked most of the bugs completely. Once we are in production however we have this forward/backward compatibility heuristic tying one hand behind our backs, and redesigning the thing gets much much bigger.
But that's just anecdotal, of course.
Defects are easier to find in a concrete product than in a conceptual design. Also, many bugs will be introduced in later stages. Therefore, even a full proof design may evolve into a buggy implementation. So surely: there is a trade-off between looking for "bugs" too early and fixing bugs too late.
Nevertheless a trainer is correct in stressing the golden think-before-you-code rule - especially when instructing unexperienced coders.
--
Every program has two purposes -- one for which it was written and another for which it wasn't.
Looks more exponential to me.
As I recall there was a conference paper in Extreme Programming Perspectives which describes an "infection" model for bug creation, fixing, etc. They were trying to model exactly the effect you describe to see if they could (in a model) find any justification for XP's argument against the increasing cost of bugs through phases. Again, just from memory, they do try to validate the model against figures from real studies.
There's also material in Watts Humphrey's book on the Personal Software Process (about as far from XP as you can get). That book is illustrated throughout with statistics about students who tried to complete the exercises in the book, including in Chapter 13, where there's a section on "The Costs of Finding and Fixing Defects.".
Compare the cost of testing, then over-the-air updates to a set of mobile phones & associated risk management
to
the cost of just building and shipping new code
that has yet to undergo testing or launch.
To give you an idea, managing the testing and upgrading over-the-air softare in mobile phones can become a new project in its own right with all the associated monitoring and overheads.
Fixing the bug of a pre-launch project can be a 1 minute job.
Never forget that complexity accumulates. Fixing the bug itself probably costs about the same at every stage, but other costs are introduced as the project moves along, and peak after the software has been deployed.
A bug found after deployment has costs associated with it that a bug found during coding does not:
The cost of finding and fixing the bug may be negligible compared to other costs.
Another aspect of the issue is the nature of the bugs you find late. In my experience, bugs that survive testing and deployment tend to be either bugs in requirements or pretty subtle bugs that slipped through testing, and both are more expensive than the type of bugs commonly detected early on during development.
The guy in fight club worked out if the cost of a recall and fixing the fault was going to be greater than the cost of litigation.
I would expect the same kind of factors come into play when the product is software instead of hardware. So why not try google
Sometimes it costs less to pay a person to manually correct data that is incorrect due to a fault in the core of a product, sometimes it's cost less to do a re-write.
thank God the internet isn't a human right.
Most recently I've been tracking down an error in our system. After nearly a month of trying various things, I found the problem of an error. In this case, two years ago the hardaware engineer building the FPGA and DSP programs didn't bother to fix the [relatively simple] design problem. Rather than give all communications the same format, a few commands differ substantially from all others (different responses in certain circumstances, for example).
The problem made it into the PC software that interfaces with the board. The problem is documented in several [maybe 20?] bugs of the software that works between the PC and the external device. The problem is documented in at least 50 bugs in a port of that PC software. It has been in production for several years, and implemented by external companies (which I feal sorry for, due to the complexity of the communications bug).
Now we're working on a completely new FPGA/DSP board to replace the earlier board. Design changes prevent us from directly implementing the bug in the new design, although otherwise the communication protocols are the same. Implementing the same malformed communications will mean breaking the simple straightforward design and carefully implementing a set of 'design exceptions' (read: 'bugs').
It would have taken one engineer an hour or so to fix this thing when they first saw it. It would have taken both teams a few days to fix it when writing the PC to DSP interface (~1 FTE month). It would have taken a few weeks to fix it when writing the port, requiring changes to the PC software and the DSP (~1 FTE year). If we choose to fix the error now, it will probably result 2+ FTE years of work to just fix everything, and more time for regression testing every old peice of software for this one bug. If we choose to leave it in, we will devote at least that much time in evaluating, implementing, and testing the old errors. Not to mention the continued maintenence work when the eventual bugs are found in the new board.
Now we're forced with a tough financial decision: do we spend a month or more carefully re-creating and testing the 'design exceptions', (probably 3-5 FTE years in total) or do we do it 'the right way' and break both our own and our customers' software? (again, several FTE years, but potentially loosing faith with the customers.)
This particular bug could have been prevented by about $50 of work. It has now cost the company tens of thousands of dollars, and will probably cost a few hundred thousand before all is said and done.
Now, lets throw some financial ethics into the $50 --> $5,000 --> $50,000 --> $500,000+ problem: The engineer was in a hurry to fix the problem before a company imposed deadline. Is that engineer responsible for the enormous financial cost? If so, how much? If not, why not? It can be argued that his negligence cause a half-million dollars in damages. It can be argued that the engineer was responsible for $50 but the team was responsible for allowing it to grow. It can be argued that this is a regular business cost due to falibility of engineers' designs.
This begs the question:
How responsible are any of us for the errors we introduce?
frob
//TODO: Think of witty sig statement
From what I have read, Oracle's founders had the best solution to the problem of customers holding off buying until version 2.0: "This first Oracle was named version 2 rather than version 1 because the fledgling company thought potential customers were more likely to purchase a second version rather than an initial release."
Depends on the circumstances.
;-)
If the engineer was rushed through the design and nobody had the time to check his designs, the company gets what it deserves.
However if the engineer skimped his work (for whatever reason) and the team failed to check his work, I think the team would share some kind of responsability.
If the engineer made the mistake and willfully let it in, he and the team should both partialy liable (he because he left an obvious flaw in and his team for not checking it)
This is also why I normaly have somebody (as in my boss) check my work and or discuss things that can be flawed or not.
This way I can't be fully responsable (usualy im just responsable for fixing the problem only)
I do notice how my emphasis is placed on automated testing and better designs around here, which makes an software engineer like me happy
If you have a good logical design that compartmentalizes each functional unit of your code (what I'll call "well-factored"), how long should it take to fix any one bug? For a typical app, even of pretty hefty size, you should, in theory, be able to run to the exact object, swap out what's broken, and *poof*, every place that functionality is needed is good to go. XP et al really do lose a lot of time in the overhead it takes to keep two people on any programming task, unit test, and the rest. You might be nearly guaranteed nice code, but what's your opportunity cost? In short, it's having two coders hacking about twice as much on what, if they're mature enough, should be well-documented, modular code!
Now we all know *poof* is not the case, and we all know that a well-factored system is about as hard to come by as nirvana (which means each fix requires ripping out a chunk of code), but the argument is still a valid one. Unless you have a huge system, where perhaps someone's "fixed" a bug by hack on top of hack ("Hrm, Bob's addFunction always returns a number one too low. Instead of bugging Bob, I'll just add one to the result in my function."), bugs today aren't like bugs in pre-object oriented days. If coders in the 80's had the debug tools and langauges we have today... Let's face it, it's much easier to create an Atari 2600 game today than it was when you had to burn to an EPROM to test on hardware each time and print out your code to review it.
The bottom line is whether it's more cost-effective to prevent 99.44% of bugs up front than it is to fix the extra 10% that slip through. I believe the original post is simply suggesting that the cost of fixing on the backside is dropping considerably, especially compared to what the same results would've required decades ago, and that is, honestly, a good point.
(Remember, this isn't upgrading code -- might be awfully tough to make code that's slapped together change backends from, say, flat files to an RDBMS; this is just bug fixing to make what you've got work *now*. But XP tells us not to program thinking that far down the road anyhow, so future scalibility is another topic altogether.)
It's all 0s and 1s. Or it's not.
If POP3 could have looked forward and seen the SPAM and Forged header abuses, security could have been part of the standard. Now that POP3 and IMAP mail is everywhere and forged headers are also everywhere, changing the de-facto standards is a big thing. Making the switch to something more robust will be a long and painful transition. Everything will be incompatible for a while.
It will be as easy as getting the US to switch to the metric system or transition with the rest of the world to driving on the left side of the road. Both would be much cheaper if they were implimented in the beginning instead of attempting a transition later.
The truth shall set you free!
Coding bugs are generally not to tough to fix (though sometimes hard to find). Design bugs are the killer. If you discover a design bug after implementation, you might need to change or even rewrite big slabs of code. The logarythmic estimate is probably a worst case analysis, not an average case. But without a doubt, design bugs that make it into production are bad stuff. That's why sofwtare engineers are either grey-headed or bald. :-)
I at the time those numbers were calculated, the software development process was very different from today. It was harder to distribute software, harder to deploy updates, harder for developers to get information about errors in the field. Testing the next release was a lot more critical because if a bug did exist it might not be possible to fix for several months until the next release could be sent out via floppy or mag tape to each customer.
Today most people download their software throught the Internet, and can get patches just as fast, even automatically as they are posted. Tools like Windows Error Reporting, Quality Feedback Agent, and BugToaster make it easier for detect and prioritize bugs based on their frequency of occurrence in the field.
So with all those changes, it's still 15 times more expensive to fix a bug after release? Does that take into account the time value of money, the value of early user feedback, or lost opportunity costs?
Typos: Simple misspellings of words. Infrequent, easy to detect, easy to fix.
Writos: Incoherent sentences. More frequent, hard to detect, harder to fix.
Thinkos: Conceptually bonkers. Very frequent, subtle and hard to detect; almost impossible to fix.
Most 'late' bugs that I've seen in software projects belong in the last category - a lack of design or the failure to make a working mock-up leads to 'thinkos' which are only obvious when the application is nearly completed. These are expensive to fix.
How do you fix a bug cheaply when the contract has ended and all the people working on it are gone? Enter training costs for new staff.
How about needing a whole new contract just for the bugs? Enter the immoble bureaucracy.
How about a year later, when, even if someone from the project is still around, it takes them a few days just to remember what they did 14 months ago? Enter seemingly wasted time.
Anecdotal evidence is viable evidence for the undeniable fact that late bug fixes are very expensive.
Healthcare article at Kuro5hin
It's expensive when you have to trash your CD stock because they're unshippable, or when you have to ship CDs to all of your customers three times in two weeks after you release. Try it and I bet you will have all the empirical data you and your wallet need.
Fix bugs early; it's less expensive that way. =)
The cost of a bug isn't in cash per se. Whether a programmer is in-house or a contractor, they're going to be at your shop for the standard work-week at least, right? So they're either fixing your bug or they're browsing slashdot. You pay the same either way.
;)
The REAL cost of a bug while the project is being coded is in delays to your project, which could push you past deadline. The cost of a bug after the project rolls out is the embarassment of getting caught with your pants down, and of having the inconvenience of pulling people off of other work to fix it.
So in my opinion, bugs are "cheapest" to fix during the initial design and prototype phase, where you're probably not that close to your deadline and you have some wiggle room.
They're more "expensive" to fix when you're closer to a deadline and the delay screws you up (for example, find a bug during user acceptance testing and you've got to go back and code, then start the testing all over again).
They're most "expensive" to fix when you've rolled out the project, the users come to depend on it, and something goes wrong. This embarrasses you and makes your code look untrustworthy, and forces you to scramble to deal with the problem, rolling out a patch, etc, all while dealing with hot-under-the-collar users.
I think this three-level way of looking at it is a lot more useful than any kind of imaginary mathemagical flim-flam. Forget the numbers, worry about the egg on your face.
Farewell! It's been a fine buncha years!
"Begging the question" aka "circular reasoning", in argument, means [...]
"Begging the question" is not a synonym for "circular reasoning". "Begging the question" simply means that your argument is based on questionable premises. "Circular reasoning" means specifically that you're using your conclusion as your premise. Circular reasoning may be begging the question, but begging the question is not necessarily circular reasoning.
In this case, I think you may be right, it's both circular reasoning and begging the question, but they're not synonyms, although they are, obviously, related.
(If more people on slashdot were to familarize themselves with common logical fallacies, I think this might be a better place.)
This question is idiotic, and in fact given the kind of code I get to work with every day, I should like to punch you in the nuts for asking. But since I cannot do that, I am going to give you a real answer ;-)
BTW, if you only read one part of my post, read the last paragraph.
For small, non critical projects the difference is indeed smaller because the complexity is much more manageable. Let's say you're building a house for your dog and your design forgot to specify which way the door should be facing. It doesn't really matter at which point you figure this out, because at any time you can pick the house up and turn it so the door faces the right way. Cost for error correction is exactly the same because the unit is stand-alone, the error is obvious, and easily correctable.
On the other hand, let's say you're building a pedestrian bridge between the Student Union and the Library, which are also being built at the same time. If during design you realize that "wait a minute, the library's entrance is facing away from the union, how's this bridge going to work", you can correct the issue fairly quickly. By the time the bridge and the library are built, your options for fixing the issue are very expensive. Which is why the bridge we had at Stony Brook wasn't all that convenient for about 20 years. It finally got torn down last year.
Analogies aren't even necessary here because there's plenty of real-world experience (mine!). Here's a quick example. Client does something and the server crashes. It is easy to detect this at the time of bug introduction, because "hey, chances are that the code I just made is the buggy one" so you know where to look. Five years later, when someone else is working on your code and something crashes because the clients started entering new kinds of trades or whatever, or because this guy is Indian and his name is longer than you allocated for, it's going to be a BITCH to find which part of the code does the crashing. Sure, the fix may take the same amount of time (just allocated 20 more chars and you'll be fine until aliens with REALLY long names land and start using our system) but bug identification took you a whole lot longer, and it cost you more.
The biggest incentive to detecting errors at the stage they are introduced is that the stages are developed one from another. In the above paragraph, I show that even an implementation error caught during maintenance stage is more expensive than one caught immediately - but they both stem from the fact that the spec and the design eroneously omited (for example) how long a name should be. It is a spec error all along. If the spec stated the required name length, the programmer would likely implement it correctly. If not, the QA testers would certainly detect it during testing stage.
You can argue with your instructor all you want, but in the real world not only is it more time consuming to find the error later on, it has more of a chance of affecting a customer - which can become an expense of its own easily enough.
Ecce Europa - Web Design for Business
...the cost of the wasted effort down the wrong path.
For example, if you get a requirement wrong and spend X developer-months designing and coding a subsystem around that requirement, the cost to fix it includes that already sunk cost plus the cost of reworking the design and code to make it conform to what the spec should have said.
Or consider the case where section II.3.iv of the spec conflicts utterly with the requirements detailed in section IV.2.iii. If you don't catch that early (and assuming its a large project, given the size of the specs), you'll have two different subproject teams off designing, coding and testing to cross purposes and you'll only discover the problem at integration time.
Sure, some requirements or design bugs are trivial to fix even after coding is almost complete (you got the color of some GUI feature wrong, say). Others aren't (you missed some key requirement that radically affects the way the data should be represented and you have to change all your data structures and database tables).
-- Alastair
Or you can read Alistair Cockburn's proof
What it boils down to is that if I do something wrong, then at the minimum, the cost of correcting the mistake is:
cost of doing the wrong thing first + cost of changing it do the right thing - cost of doing the right thing first
As the cost of doing the wrong thing + the cost of changing it is always going to be larger than the cost of doing it right, you'll always end up with a positive number.
The rest is about momentum; the earlier the mistake was made in the cycle, the more subsequent decisions were made that are also wrong.
Note, however, this has nothing to do with the cost of adding new features later. Here, you've got nothing done wrong to start with, and the cost of changing it is equal to the cost of doing it right. What you lose is the opportunity cost, which can be iffy.
"Software is too expensive to build cheaply"
On one end you have Ariane5 exploding because of a software error, on the other end, you have 10 clients within your enterprise which loose time because of software's bugs.
With such huge range of differing costs for finding the bug before or after the shipping of your product, the "average cost" of bugs is meaningless.
I think that the only thing to remember is:
- bugs found late cost more to fix than bugs found earlier (any specific number is invalid)
- finding bugs early is difficult and can be expensive.
Of which you can deduce that:
- if late bugs can cost you very much (Ariane5 for exemple), you want to spend a lot of money on software testing|review at each level.
- otherwise if tests can cost more than the fix (a small number of internal users with a non-critical software), then maybe you can use the clients as testers, but it must be managed well (tell the users, be in close contact with the users, don't let them wait the fixes too much).