Properly Testing Your Code?
lowlytester asks: "I work for an organization that does testing at various stages from unit testing (not XP style) to various kinds of integration tests. With all this one would expect that defects in code would be close to zero. Yet the number of defects reported is so large that I wonder how much testing is too much? What is the best way to get the biggest bang for your testing buck?" Sometimes it's not the what, it's the how, and in situations like this, I wonder if the testing procedure itself may be part of the problem. When testing code, what procedures work best for you, and do you feel that excessive testing hurts the development process at all?
or wolfram-style, if you prefer.
Put your program on many computers (local to you if possible) and have them generate random input and feed it to the program. Or, if that is not possible, have beta testers try to make your program die.
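A minimal sketch of the random-input idea, in Python. The function under test (`parse_percentage`), the character set, and the rule that `ValueError` counts as a documented rejection are all assumptions made for illustration; the point is only the loop that generates input, feeds it in, and records anything that fails in an unexpected way.

```python
import random


def parse_percentage(text):
    """Hypothetical function under test: parse '42%' into the integer 42."""
    if not text.endswith("%"):
        raise ValueError("missing percent sign")
    value = int(text[:-1])  # int() raises ValueError on non-numeric text
    if not 0 <= value <= 100:
        raise ValueError("out of range")
    return value


def random_input(rng, max_len=8):
    """Build a short random string from characters likely to confuse a parser."""
    alphabet = "0123456789%-+ .x"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))


def fuzz(target, runs=10_000, seed=0):
    """Feed random inputs to target; collect inputs that fail unexpectedly."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    crashes = []
    for _ in range(runs):
        data = random_input(rng)
        try:
            target(data)
        except ValueError:
            pass  # assumed to be the documented way of rejecting bad input
        except Exception as exc:  # anything else is a candidate bug
            crashes.append((data, repr(exc)))
    return crashes


if __name__ == "__main__":
    print(len(fuzz(parse_percentage)), "unexpected failures")
```

Running the same loop on several machines with different seeds is one way to spread the work out, as the comment suggests.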
The most certain way to weed out bugs (before others find them, that is: others we don't want finding them, i.e. users who bought the program, etc.) is to use it a lot yourselves beforehand.
In my mind this will work very well...
And no, there can never be too much testing, unless your code is like one line, which I doubt... for any full-blown application, there is no "too much".
Looking for people to chat about multicopters, coding, music. skype: gtsiros
If you look at the guys with really low bug rates, like the NASA guys running the Shuttle control software, they have very separate test and development teams, and a competitive attitude. The test team "wins" if it finds a bug, and the developers don't want to look silly.
Some Extreme Programming techniques, such as paired coding, may help too.
Well.. showing your work to someone important always brings up bugs, every time :)
I suppose it's just a case of using the product like an actual user who knows nothing about it would, right from the first step.
Another good way to reduce errors is to follow the principles of Design by Contract.
State, using assertions, what is expected of the code: preconditions and postconditions.
If any of these fail, then throw an appropriate exception.
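As a rough illustration of pre- and postconditions, here is a small Python sketch. The `withdraw` function and its rules are hypothetical, and amounts are assumed to be integers (for example, cents) so that the postcondition arithmetic is exact.

```python
def withdraw(balance, amount):
    """Hypothetical example: return the balance left after a withdrawal.

    Amounts are assumed to be integers (e.g. cents).
    """
    # Preconditions: what the caller must guarantee.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")

    new_balance = balance - amount

    # Postconditions: what this function guarantees in return.
    if new_balance < 0 or new_balance + amount != balance:
        raise AssertionError("postcondition violated")
    return new_balance


if __name__ == "__main__":
    print(withdraw(100, 30))
```

A violated precondition points at the caller, while a violated postcondition points at the function itself, which narrows down where to look.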
-- "To ask a question is to show ignorance; Not to ask a question means you'll remain ignorant."
IMHO, programming and testing should be done at the same time in the development stage.
While programming and "bugging" happen at the same time, programming and debugging/testing should happen at the same time too.
It is very well explained in Bruce Eckel's Thinking in Java. You should just test everything in the code itself, even if it happens to add some overhead. Once you call that function, you want <something> to happen, so check it in the code.
I know this is not the usual way procedural programming happens. It seems much more straightforward to drop the code as it comes and then check if it behaves correctly.
But if you do so you will often discover that tests written afterwards do not cover all possible situations.
And so you discover that testing and debugging are just unfinished tales, and it is even worse if the testers are not the programmers who did the work.
Plus, I hate testing, so I force myself to do the work well and let the code (as long as possible) test itself, even if it makes development slower and boring.
Umhh... I'll preview this post 10 times, hoping it's free from bugs :)
Obviously my code contains no ewwows ;)
:dikappa
When a programmer is simultaneously coding and documenting their code, at both the high and low levels, the larger "thought" bugs will decrease in number and severity.
Even if you don't use a literate programming system, often documenting the system before you write it can help make the code more clear.
- Serge Wroclawski
I don't know how relevant this is but I read somewhere long ago that newspapers ended up with more errors if they had multiple people proofreading the same text because nobody was really taking responsibility. Even if it is not intentional, there is always a feeling of 'the other guy will pick that up'.
I vaguely remember something similar being said about the space shuttle disaster.
What I miss in this discussion is something about the persons performing the tests. In my company we have a test team, consisting mainly of people who don't know the first thing about coding, who cannot read sources and who can only test 'through the UI'. And yet the system we work on has thousands of sources, only a percentage of which (20%) have a UI. Testing of all the underlying objects is a lot harder, and my experience is that with this many sources the total number of possible 'paths' in the system is so large that tests using the UI take too much time, and therefore are never done properly. So now the developers are constantly asked to provide methods by which the testers can perform the tests.
Having been on teams producing 24 X 7, bullet proof code for communication servers and credit card processing I have an idea about the increasing number of bugs found. In the Old Days(tm), we wrote every line of code ourselves and used time tested libraries (C language). I quit using microsnot when their libraries started having bugs in their rush to C++. Now most coders use massive OOP libraries from who knows where built by slackers, and GUI app builders that generate code and perform all sorts of actions under the hood. When something goes TU it is often hard to find all the conflicts.
Even when using one of these app builders I read through all the code and put tests and logging into the generated code. Funny that these tools are supposed to make us more productive. My coding and testing every line still beats total time spent on a project since I don't have to go back and redo it later. When it's done, it's done. Next project. I've had comm programs run for over 5 years error free servicing 1000s of users per day. One specialized delivery, billing, and inventory system I wrote was used over 6 years error free and caused the owner to stay with hardware that was compatible with the software (not M$) because the programs always worked. And not a damn bit of it was OO or came from some automated builder tool.
In short, the closer you get to the metal and the more familiar you are with the code that is executing, the better your chances of producing error free programs. Takes longer to market, but then you don't have to redo it forever until the next bug ridden version comes out. Saves time and coders to work on the next version and the customers are always pleased. Get back to the basics. Try it, you'll like it.
> This is why the feedback needs to be direct from QA to the developers
I agree. For some reason (maybe it's just me) developers are nowadays too full of false pride as well, thinking: I am the lead coder, analyzing bugs is the job of trainees. In my opinion the situation should be (at least in some cases) completely opposite: only veteran coders can make correct assumptions, define precautions for the future, and fix this particular case in the correct way. Otherwise it might just lead to a decline of the original code - making things even worse.
Working with bugs is a tough job, do it with pride! *with the allbugfixers unite anthem playing gently in the background*
First thing to do: look in your bug tracking software (you DO use bug tracking software, right?), and try to isolate hot spots. Is there a particular piece of code that generates more bugs than others? Is there a common pattern to the bugs (i.e. memory not being freed, off-by-one errors etc.)? Are they _really_ bugs or misinterpretations of the requirements or the design? In my experience, the 80/20 rule applies to bugs in spades - it is just hard to find the patterns.
If you need to, make the bug categorisation in your bug tracking software more specific. Once you get an idea of what your hotspot is, you can work at fixing the cause of the bugs.
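One way to look for hot spots is to tally exported bug records by module or by category and see how few values account for most of the bugs. The sketch below is in Python and assumes the tracker can export records as dictionaries; the field names (`module`, `category`) and the sample data are hypothetical.

```python
from collections import Counter


def hot_spots(bugs, key, share=0.8):
    """Return the most frequent `key` values that together cover `share` of bugs."""
    counts = Counter(bug[key] for bug in bugs)
    total = sum(counts.values())
    selected, covered = [], 0
    for value, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append((value, n))
        covered += n
    return selected


if __name__ == "__main__":
    # Hypothetical export from a bug tracker.
    bugs = [
        {"module": "parser", "category": "off-by-one"},
        {"module": "parser", "category": "memory leak"},
        {"module": "parser", "category": "off-by-one"},
        {"module": "ui", "category": "requirements"},
    ]
    print(hot_spots(bugs, "module"))
    print(hot_spots(bugs, "category"))
```

If a short list of modules or categories covers most of the records, that is where review and extra test cases are likely to pay off first.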
If it's a particular piece of code, make sure it's reviewed by the best developers/architects you have, and consider refactoring it. At the very least, insist that it is reviewed and tested thoroughly before check-in to the source code control system, and consider adding a hurdle to jump prior to check-in (e.g. get the manager to sign it off).
If the code was written by one developer, consider swapping them out and giving it to someone else - it may be they're in over their head.
Make sure you increase the number of test cases for this piece of software, and check for "edge cases" religiously - if the code is broken at all, it is likely to be broken in more ways than you realized.
If it turns out that the problems tend to have a common cause (memory leaks, off-by-one errors, etc.) consider a structure which forces developers to concentrate on those issues before checking in code; again, consider the hurdle ("software must be signed off by the off-by-one guru prior to check-in"), and hone your tests to check for these kinds of errors if possible.
If the bugs stem more from misunderstood requirements or designs, beef up those areas. Work on your requirements and analysis processes; consider training courses for the developers to get them up to speed on interpreting these nebulous documents, and look at improving the review process by having designers present. Frequent "mini-deliverables" (another concept stolen from XP) will help here too - get your team to deliver a working piece of code - it need only be a minimal sub-system - and get it reviewed by designers and analysts. If the bugs tend to occur on the boundaries - i.e. invalid API calls, invalid parameters etc. - consider design by contract or aspects.
Finally, there's a bunch of hygiene stuff
It's all very well in practice, but it will never work in theory.
Better than double checking everything is to have an external eye code review everything. It's probably a 10% overhead when it comes to the coding side, but a >50% decrease in the debugging side. Well worth it.
I'm currently on sabbatical, but I consult 1 day a fortnight for a couple of small local companies who can't afford me full time - all I do there is code review, and they are of the opinion that I more than double the effectiveness of their less experienced programmers.
THL.
One problem that usually arises with this approach is that a very large quantity of data must be hand created to test all the paths in the function. This is not IMO a flaw in the procedure you recommend. It points to a design weakness: it means that higher level logic has failed to adequately sort different cases to be dispatched to different functions. This can be a result of abusing the notion of "hiding messy details" at the upper levels and pushing the problems down to the bottom. There must be a balance of hiding/deferring and exposing/processing of situations at every level to keep the complexity burden reasonable at each level.
"Obtuse Anger is that which is greater than Right Anger" - Lewis Carroll
A number of years back I wrote test programs for printed circuit boards. First you created a model for the board that simulated the logic circuits. You then wrote test patterns that were applied to the board's inputs, and the simulator model predicted the board's outputs. The inputs together with the predicted outputs were applied to a real board that you wanted to test, and if this test program passed you assumed that the PC board was good with a high degree of probability.
One mode of the simulator allowed you to simulate faults that might occur on the board. The simplest kinds of faults were physical IC pins "stuck-at-zero" and "stuck-at-one" (these were the most common faults in real life), and if you wanted to be thorough you could also simulate "internal" faults down to the gate level.
I worked in a contract test programming house, where the contract with the customer required us to produce a test program with a specified minimum level of fault coverage, usually just at the physical IC pin level to minimize the cost of developing the program. This ranged from say 90% for cheaper commercial work to 99%+ for certain government contracts. With >95% coverage, in "real life" maybe one or two "dog pile" boards out of 1000 would pass the test program but fail a system test.
The point of this is that in that business, there was a clear objective measure of a test program's "quality". The measure wasn't perfect, but it was far better than just blindly writing a test program based on a "gut feel" for how the board should work. In addition, the test programmer had a clear, objective goal.
I think a useful tool in the software business would be a measurement of the percent of lines of code that were actually run during the QA process, along with a log of those lines that were not run. Often there are big chunks of code that only get triggered by very special conditions, and there is no way QA can guess those strange conditions. The standard QA process is very subjective; there is no objective measure of any kind as to how thorough the testing was, other than just documenting a list of features that were (often superficially) exercised.
A more sophisticated tool could go beyond lines of code and log the various logic combinations exercised in "if" statements, etc.
Several years ago I wrote an experimental tool that did this for a specialized database programming language. Basically it rewrote the program with a logging call after each statement (and yes, the "QA version" ran very slowly). The results were quite eye-opening, revealing chunks of "dead code" and conditions no one ever thought of testing. Unfortunately the project kind of died.
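The tool described above rewrote the program to add a logging call after each statement. A similar effect can be sketched in Python without rewriting anything, by using the interpreter's trace hook (`sys.settrace`) to record which lines of one function actually run. This is an illustration under that substitution, not a reconstruction of the original tool; `lines_executed` and `classify` are hypothetical names.

```python
import sys


def lines_executed(func, *args, **kwargs):
    """Run func and return the set of line numbers executed inside func itself."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever hook was there before
    return hit


def classify(n):
    """Hypothetical function under test."""
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


if __name__ == "__main__":
    only_positive = lines_executed(classify, 5)
    all_cases = only_positive | lines_executed(classify, -1) | lines_executed(classify, 0)
    # Lines reached by the fuller set of inputs but missed by the single test.
    print(sorted(all_cases - only_positive))
```

As the comment notes for its own tool, tracing every line makes the program run much more slowly, so this kind of measurement belongs in a QA build rather than in production.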
Many languages have "code profilers" that are mainly intended to analyze performance, but many of them could be easily adapted to become QA quality measurement tools.
Do these kinds of tools exist, and if so why aren't they more widely used?
We need to do a code review in my shop, since we're approaching release on our project, but there's a slight issue... there are two coders on the project (myself and the senior coder), and we're the only ones in house that know C++ very well.
What do you do in that case? Self-reviewing the code is of questionable value -- since you tend to skim over the parts you wrote "because you know it works!".
The appropriate quote is "It's just what we asked for, but not what we want!"
I don't think this kind of 'bug' can ever be removed. Despite an understanding of the 'business' side of things, my experience has been that the overwhelming majority of specs suck ... whether it's incomplete definitions, contradictions, or questions about what order various rules should be applied in. Coding errors should be few and far between. To have them occur generally means that the writing went too fast ... although, to be fair, given the "I want those changes yesterday!!!" attitude of the modern business world, this situation seems to occur with more frequency now than it did a decade back.
Great idea, so long as there is someone around who can competently review your code. I do a great deal of scientific programming (some of it pretty nasty stuff) and quite often bringing someone up to speed to do better checking than a compiler would require teaching them to about the level of an MS in physics/math and/or CS. I've found few people who sit the fence with enough expertise in enough of the fields I draw from to be really useful.
But, given that most coding "ain't rocket science" your suggestion is cogent and applicable in many cases.
When I code programs that are used by the general public, I find double-blind testing and black-box testing work best. With software that means life or death or something severe I will also do white-box testing.
Double-blind testing is when you give the code to a willing party and just let them work with it like they normally would for business purposes, without letting them know it is a beta test. You also have to include some type of bug report that people can fill in if they wish, but try to encourage them not to cause bugs, and just to work with the program as if it was normal. This allows you to see if any of the normal functions that people use every day would be buggy.
Black-box testing works great to test just the program's function calls and modules. When I do BB testing I usually give it to another party with instructions as to how the functions are called and utilized. This party knows how to test the extremes and the common values and gives me the best testing.
White-box testing is testing that involves intricate knowledge of the code. When I do this it is usually in development. At the end, if I feel like I enjoy pain I will do a thorough white-box testing suite for the program, but that has only happened once or twice.
In expenses, the cheapest form of testing is BB testing, followed by double-blind, and then WB, since I find white-box testing takes a long time to design, run, and analyze.
There's some thoughts for you though.
~ kjrose
There are two subjects I want to discuss here. First of all, I'm going to present the "jelly bean model" of defect discovery, then I'm going to talk about why the "testing to improve quality" model is fundamentally flawed.
The Jelly Bean model goes like this: Let's suppose you have a big vat of red and blue jelly beans. Your objective is to remove all the blue beans. You do this by reaching in, grabbing a handful of beans, throwing away all the blue ones, and dumping the red ones back in.
At the beginning, it will be very easy to find the blue beans (assuming the blue-bean density is high), and towards the end, it will be very difficult (since the blue-bean density will be low). If you graph the cumulative number of blue beans you remove each day, you'll get an exponential curve; quite steep at the beginning (high rate of discovery) and which flattens out as you approach total bean removal.
Software defect discovery follows this model exactly. Defects are easy to find at the beginning if there are a lot of them, and hard to find towards the end. This means that if your defect discovery rate is pretty much constant (with respect to the number of hours of testing you've done) then you're probably still way down in the very first part of the curve, and your number of defects is probably very high.
Here's the important thing to remember though; the quality of your product has nothing to do with how many defects you find and fix during testing. The quality of your product is determined by the number of defects remaining! If you find and fix 10,000 problems, you might think you're doing very well, but if there are 10,000,000 defects remaining, your product is still crap.
You can estimate the number of defects remaining by trying to fit the number of defects you've found so far onto that exponential graph I mentioned above. The most popular method is to use a Weibull curve or quadratic regression.
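To make the curve-fitting idea concrete, here is a Python sketch that uses a deliberately simpler model than a Weibull curve: it assumes the cumulative count after t equal testing periods is total * (1 - r**t), a plain saturating exponential. Under that assumption, three equally spaced cumulative counts are enough to solve for the total, because the ratio of successive increments equals r. The function name and the sample numbers are made up for illustration.

```python
def estimate_total_defects(c1, c2, c3):
    """Estimate the total defect count from three cumulative counts.

    Assumes found(t) = total * (1 - r**t), sampled at t = 1, 2, 3 equal periods.
    Then (c3 - c2) / (c2 - c1) = r and total = c1 / (1 - r).
    """
    first, second = c2 - c1, c3 - c2
    if first <= 0 or second <= 0 or second >= first:
        # A constant or rising discovery rate means the curve has not started
        # to flatten, so this model cannot give an estimate yet.
        raise ValueError("discovery rate is not slowing; cannot fit the model")
    r = second / first
    return c1 / (1 - r)


if __name__ == "__main__":
    found_so_far = 875
    total = estimate_total_defects(500, 750, found_so_far)
    print("estimated total:", total)
    print("estimated remaining:", total - found_so_far)
```

The error branch corresponds to the warning above: if the discovery rate is still roughly constant, the data says nothing about where the curve levels off, other than that it is probably far away.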
Now, why is testing to improve quality a bad plan?
Let's say you worked at Ford, and roughly 50% of the cars you turned out had something wrong with them. You get lots of unhappy customers demanding their money back. Is your problem:
a) That you have a design defect in your car.
b) That you are introducing defects in production.
c) That you are testing cars insufficiently.
Most people realize that to test every car as it comes off the line is futile. There's too many of them, with too many potential points of failure. There's no way you can test them all. The root cause of the problem has to be in either a or b, and if you're looking to improve the quality of your cars, this is where you would spend your money. This isn't to say that Ford doesn't test their cars, I'm sure they do, but testing should be a means of verifying quality (i.e., 1/1000 cars tested had a defect, our goal was 1/500, so therefore we can stop spending money on finding design and production faults), and not a means of improving it.
It's so easy to see this when we're talking about cars. Why does everyone get it backwards when we start talking about software?
Not only is it impossible to test every possible combination of inputs to most software, it's also very expensive to find and fix problems this way. If you find a problem in design review, or code inspection, then you have your finger on it. You know EXACTLY where the defect is, and how to fix it. On the other hand, when you say "Microsoft Word crashes when I try to select a paragraph and make it Bold", you have no idea where the fault is. Any one of several thousand lines of code could be the problem. It can take literally days to track down and fix the defect.
Your testing should not be a means of finding faults, but a means of verifying the quality of your product. Testing is not part of the development process.
Well, no actually.
Requirements review and Design review find more defects per person hour than code review, especially when you consider that a single requirement defect will result in multiple design defects, and a single design defect in multiple implementation defects.
But yes, code review is a good plan all the same.
Buy this book: Handbook of Walkthroughs, Inspections, and Technical Reviews by Daniel Freedman and Gerald Weinberg.
~~ What's stopping you?
But the way to make solid code is to get each bug out as soon as you put it in.
Over my thirty-five years of professional programming I developed a coding/testing method that produces extremely solid code - to the point that one of my colleagues once commented that [Rod] is the only person he'd trust to program his pacemaker. B-) It goes like this:
Design ahead of coding.
Of course! If you haven't spent more time designing than you eventually spend coding, you likely haven't yet understood the problem well enough.
This doesn't mean you have to understand every nitty-gritty detail before you write your first line of code - you'll discover things and change your design as you go. But you should know where you're going. And as you go, map the road ahead of you.
Not coding until you understand where you're going is VERY scary to administrators. But it gets you to your destination MUCH sooner than striking out randomly.
Get the modularity right.
Think of the final solution as a knotted mass of spaghetti threaded through meatballs inside an opaque plastic bag. Squeeze it around until you find a thin place, where few strands (representing flows of information) connect the halves of the solution on each side of the bag. Cut the bag in half here and label the strands: You've defined a module boundary and interface. Repeat with each of the two smaller bags until the remaining pile of bags each represent an understandable amount of code - about a page - or a single irreducible lump of heavily-interconnected code (one "meatball"). Then tear them open one-by-one and write the corresponding code.
Debug as you go:
This is the key!
Program top-down like a tree walk, stubbing out the stuff below where you're working. As you write the code, also write one or more pieces of corresponding test-code that produces output that is a signature of the operation of every bit of the code you write, and an expected-output file that is hand-generated (or computer-generated by a different tool, preferably written by someone else, or at least in a different language and style if you're alone).
Use a system tool (like diff or cmp) to compare the results (in preference to writing programs to "check" it, so you don't have to worry whether the test is passing because the code is right or the test is broken.)
Run the test(s) every time you make a change or add code. Make a single change at a time between test runs, get it working and tested before you move on. (This is easy for procedural modules and subroutines. For instance: You can build the running of the test into your makefile, and fail the make if the test fails. GUI stuff is tougher, and I didn't have to deal with it myself. But tools are now available to perform similarly there.)
The result is that your bugs will generally be confined to the changes you just made, drastically limiting your search space and shortening your debugging time.
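The compare-against-expected-output loop can be sketched as follows. The comment recommends a system tool such as diff or cmp on real files; this Python version uses the standard library's `difflib` and an in-memory string as a stand-in, purely so the example is self-contained. `render_report` is a hypothetical piece of code under test, and `EXPECTED` plays the role of the hand-generated expected-output file.

```python
import difflib
import sys


def render_report(items):
    """Hypothetical code under test: format (name, quantity) pairs as text."""
    lines = [f"{name}: {qty}" for name, qty in sorted(items)]
    lines.append(f"total: {sum(qty for _, qty in items)}")
    return "\n".join(lines) + "\n"


def diff_against_expected(actual, expected):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    ))


# Hand-written expected output, standing in for the expected-output file.
EXPECTED = "apples: 3\npears: 2\ntotal: 5\n"

if __name__ == "__main__":
    actual = render_report([("pears", 2), ("apples", 3)])
    delta = diff_against_expected(actual, EXPECTED)
    sys.stdout.writelines(delta)
    # A non-zero exit status lets a makefile rule fail the build on a mismatch.
    sys.exit(1 if delta else 0)
```

Because the script exits non-zero on any difference, it can be run as a makefile step after every change, which keeps the search space for a new bug down to the change just made.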
Do COVERAGE testing, and TRACK it.
Don't move on until you have exercised every bit of code where you were working. "Exercise" means your tests have been updated to execute every line or component of a line, driving it to its edge and corner cases and extracting a signature that shows they're doing their job correctly.
Automated coverage tools are an inadequate kludge to try to do this after the fact. Unfortunately, they pass code once all the branches have been executed, but have no idea whether they did the right thing. They may test that you hit the edge conditions - but can they tell if the edge is off-by-one? Human intelligence, with its knowledge of intended function, is required.
I developed a style of marking listings to document coverage, which is why I use hardcopy even in this day of glass terminals.
- a vertical bar beside a line that is completely working. Cross-mark across its top (T) at the first of a set of working lines, across the bottom of the last of a set.
- an "h"-shaped mark beside one that represents a partially-tested branching construct (if, for, do, "}", ...) with a cross across the right if the branch case is untested, across the left if the through case is untested. Switch to vertical bar when fully tested (including hitting the edge from both sides if applicable)
- For compounds (i.e. "for( ; ; ) {" or " ? : ") underline the portions fully tested, put an "h" with crossbar through those partially tested.
- Declarations "pass" when you've checked that the type is right for the job and at least one hunk of client code uses them.
- Comments "pass" pretty much automatically when you think they're right.
- A place where code is not yet present, or where the code above and below is tested but the flow across the gap is not, gets a break in the vertical line, with crossbars, as if there were an untested blank line "in the cracks". (But there should be a comment in there mentioning it. I start such comments with "MORE", so I can find any that are left with an editor. Similarly a MORE comment marks where I've stopped coding for now.)
When the code is done-and-coverage-tested there's a vertical slash beside all of it. (Sometimes you have to add test code temporarily to make something visible externally, but #ifdef or comment it out rather than removing it when you're done.)
The result is like growing a perfect crystal, with flaws only in the growing surface. When you've tested a part you're usually DONE. You never have to return to it unless you misunderstood its function and have to change it later, or if the spec changes.
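The earlier point that executing every branch is not the same as checking the edge can be shown with a small Python sketch. Both helper functions and both test suites are hypothetical. The weak suite runs every line and both outcomes of every branch in the buggy version and still passes; only the suite that hits each boundary from both sides exposes the off-by-one.

```python
def clamp_index_buggy(i, length):
    """Hypothetical buggy helper: should clamp i into the range 0..length-1."""
    if i < 0:
        return 0
    if i > length:  # off by one: should be i >= length
        return length - 1
    return i


def clamp_index(i, length):
    """Corrected version of the same helper."""
    if i < 0:
        return 0
    if i >= length:
        return length - 1
    return i


def weak_suite(clamp):
    """Executes every line and branch, but never lands exactly on an edge."""
    return clamp(-5, 10) == 0 and clamp(50, 10) == 9 and clamp(4, 10) == 4


def edge_suite(clamp):
    """Hits each boundary from both sides."""
    cases = {-1: 0, 0: 0, 9: 9, 10: 9, 11: 9}
    return all(clamp(i, 10) == want for i, want in cases.items())


if __name__ == "__main__":
    for suite in (weak_suite, edge_suite):
        print(suite.__name__,
              "buggy:", suite(clamp_index_buggy),
              "fixed:", suite(clamp_index))
```

Choosing the cases in `edge_suite` required knowing what the function is supposed to do at index 10, which is exactly the knowledge of intended function that a coverage percentage cannot supply.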
DOCUMENT!
Co-evolve a document as the project develops if there's more than one on it, or if it has to be handed off to others later. If you're alone, you can get away with heavy comments.
Put in the comments even if you have the document.
Keep the comments up to date.
Comment heavily even if it's just you and always will be. When you come back to the code (especially if you're following this methodology and only get back to it MUCH later) you'll have forgotten what you were thinking. So put it all down to "reload your mental cache" when you get back to it.
The document should be a complete expression of the intended operation of the code - but in a very different and human-understandable form. (Especially not just pseudo-code for the same thing, or "i = i+1; add one to i". Use pseudo-code only for an illustration, not an explanation.) Remember: Testing can NEVER tell if it's RIGHT. It can only tell if two different descriptions match. "Self-documenting code" is an untestable myth - all that can be tested is whether the compiler worked correctly.
There's more but I have to go now. I'll try to continue later. The above contains the bulk of the key stuff.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I've tried literate programming, XP, and several flavors of traditional strategies, all of which purport to be the "answer" to your needs. It's ALL a load of crap. If there's any one answer, it's this: there is no one testing methodology that outperforms any other in EVERY testing situation. In my own personal development, I have come to rely on several beliefs/practices that have stood the test of time over my 20 years of programming (and have made life much easier when coupled with a reliable testing methodology):
From my experience testing at PictureTel and DEC in Mass., I found out that the usually-understaffed test team runs into the Laws of Software Testing:
Law #1: Most of the time, you are not testing. You are obtaining (beg, plead, cajole) equipment and software, and you are configuring and fixing software, just so that the environment is ready for the software testing to occur.
Law #2: When testing, most of the time you are not testing anything that matters. Sure, scripts are great, but they are very narrow, and bugs are sneaky. Running a limited variety of scripts across some clients and servers only gives the impression of coverage ... kind of like the concept of "busy work".
Law #3: When you find a bug, most of the time it can't cross the political barrier. After all, bugs are rated and prioritized, and that is the domain of management, who is overwhelmingly concerned about release dates.
Law #4: The bugs you don't find will reach the customer and will return as the highest-priority bugs that will usurp other bugs in the ongoing process. Fixing customer problems is of course a priority, but it landslides into the current product's production.
[also misbehaves on Kuro5hin as Peahippo]