Properly Testing Your Code?
lowlytester asks: "I work for an organization that does testing at various stages from unit testing (not XP style) to various kinds of integration tests. With all this one would expect that defects in code would be close to zero. Yet the number of defects reported is so large that I wonder how much testing is too much? What is the best way to get the biggest bang for your testing buck?" Sometimes it's not the what, it's the how, and in situations like this, I wonder if the testing procedure itself may be part of the problem. When testing code, what procedures work best for you, and do you feel that excessive testing hurts the development process at all?
... is to not make the mistake in the first place. This may sound kind of stupid, but it's true. Don't skimp on sleep - so you may stay properly awake, don't run yourself on Coca/Pitr Cola, eat good food, go for walks, and you'll find yourself making far fewer mistakes and producing better quality stuff. And _double_check_ everything.
And the answer to that is of course: "No, you should test more, and fix the bugs". And of course, looking over your development model to see why you have so many errors might be a good idea (such as formalizing who can commit code, if you've got lots of programmers at various skill levels).
But in real-life, many bugs are not that important, and time-to-market and development cost is more important.
So unless you provide us with more data, such as ...
...I don't think anyone will be able to give any good advice either.
My experience has shown that the number one way to find defects is code reviews performed by other developers who can read the code and also understand the intended functionality. This will catch 90% of all defects before they are even released to QA.
For more information, the developer's bible (IMHO) Code Complete (available on Amazon and elsewhere) has some good information on testing strategies and some hard numbers on effectiveness of testing. Good luck.
If you look at the guys with really low bug rates, like the NASA guys running the Shuttle control software, they have very separate test and development teams, and a competitive attitude. The test team "wins" if it finds a bug, and the developers don't want to look silly.
Some Extreme Programming techniques, such as paired coding may help too.
The main thing: testing does absolutely nothing to minimize the number of defects in a particular application.. There are lots of other things that are as important.. ie: are these defect reports being seen by the appropriate developers and are they being acted on, what types of procedures and communication actually exists between the developer and the QA persons (assuming that they are not the same folk)..
The last point isn't as bizarre as it sounds, I've seen lots of places where a QA person enters bugs, but the developers silently reject them ("it's not a bug, that's how the program works").
Testing just tries to discover the presence of defects, by itself, it cannot ensure that your product works perfectly (for an application of even moderate complexity, there may be an exponential number of cases and paths to check, most test cases are written for a percentage of those only).. Because of this, if you feel that you're spending too much time testing, perhaps you need to check if your test cases are appropriate to the situation and stage of development..
Another point is that tests can be automated to some degree or the other, perhaps a scriptable tool might assist in lowering some of the drudgery associated with actually assuring the quality of your software...
rant mode = on...Excessive testing ONLY hurts if it takes people away from development at the early or even middle stages of a project and forces them to run tests on incomplete sections of code.. otherwise, there is NO such thing as too many tests...
IMHO, programming and testing should be done at the same time in the development stage.
While programming and "bugging" happen at the same time, programming and debugging/testing should happen at the same time too.
It is very well explained in Bruce Eckel's Thinking in Java. You should just test everything in the code itself, even if it happens to add some overhead. Once you have called a function, you want <something> to happen.. so check it in the code.
I know this is not the usual way procedural programming happens. It seems much more straightforward to drop the code as it comes and then check if it behaves correctly.
But if you do so you will often discover that the tests made afterwards do not cover all possible situations.
And so you discover that testing and debugging are just unfinished tales, and it is even worse if the testers are not the programmers who did the work.
Plus, I hate testing, so I force myself to do the work well and let the code (as long as possible) test itself, even if it makes development slower and boring.
Umhh... i'll preview this post 10 times, hoping it's free from bugs :)
Obviously my code contains no ewwows ;)
:dikappa
Having been on teams producing 24 X 7, bullet proof code for communication servers and credit card processing I have an idea about the increasing number of bugs found. In the Old Days(tm), we wrote every line of code ourselves and used time tested libraries (C language). I quit using microsnot when their libraries started having bugs in their rush to C++. Now most coders use massive OOP libraries from who knows where built by slackers, and GUI app builders that generate code and perform all sorts of actions under the hood. When something goes TU it is often hard to find all the conflicts.
Even when using one of these app builders I read through all the code and put tests and logging into the generated code. Funny that these tools are supposed to make us more productive. My coding and testing every line still beats total time spent on a project since I don't have to go back and redo it later. When it's done, it's done. Next project. I've had comm programs run for over 5 years error free servicing 1000s of users per day. One specialized delivery, billing, and inventory system I wrote was used over 6 years error free and caused the owner to stay with hardware that was compatible with the software (not M$) because the programs always worked. And not a damn bit of it was OO or came from some automated builder tool.
In short, the closer you get to the metal and the more familiar you are with the code that is executing, the better your chances of producing error free programs. Takes longer to market, but then you don't have to redo it forever until the next bug ridden version comes out. Saves time and coders to work on the next version and the customers are always pleased. Get back to the basics. Try it, you'll like it.
The object of finding bugs isn't to result in fewer bugs by fixing them. It's to result in fewer bugs by not writing them in the first place. The developers need to review found bugs on a regular basis, with the objective of changing development methods to avoid them in the future.
It's all fine and good to say "don't write buggy code in the first place," but this sort of feedback is the only way to get there. What makes this so hard in many organizations -- aside from the usual disrespect many developers have for QA people -- is that developers fear that this process is some sort of performance evaluation. As soon as this happens, the focus shifts from finding better processes to defending existing processes: "It's not really a bug," "There isn't really a better way of doing that," "We just don't have time to do it the 'right' way," and so on.
This is why the feedback needs to be direct from QA to the developers, who are then tasked to categorize bugs and develop recommendations for avoiding them. It's the latter that is the "product" required by management, not a list of bugs with developers' names on them. Management should otherwise get the hell out of the way.
This is excellent advice. In my experience, the most stable code comes from pragmatic design followed up by pragmatic coding.
Design your system thoroughly. Identify every component, and the minimum interface required for that component. Carefully document that interface (API) - use Design By Contract (preconditions, postconditions and invariants) if possible.
Moving targets mean that the API will almost certainly have to be extended - documentation on the design and intent of the component/API as a whole will reduce the pain of this process. The responsibility for this documentation is shared between the design and implementation phases. Pay careful attention to documenting assumptions made within the code, e.g. ownership of member/global variables.
When it comes to coding, start with a skeleton. Put in the API function/method as defined, then check/assert every pre/post condition. Think about how any parameter could be out of range, or violate the assumptions you make. Once you are happy you're checking for all illegal use, you can go on to code the internals.
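To make the skeleton idea concrete, here is a minimal sketch in Python. The function withdraw and its rules are hypothetical, made up purely for illustration; the point is only the order of work: the illegal-use checks go in first, the internals last, and the postcondition is asserted before returning.

```python
def withdraw(balance: int, amount: int) -> int:
    """Return the new balance after withdrawing amount (hypothetical example)."""
    # Preconditions: reject every illegal use before touching the internals.
    if not isinstance(balance, int) or not isinstance(amount, int):
        raise TypeError("balance and amount must be integers")
    if balance < 0:
        raise ValueError("balance must not be negative")
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")

    # Internals: written only once the checks above are in place.
    new_balance = balance - amount

    # Postcondition: the result must be smaller than before and never negative.
    assert 0 <= new_balance < balance, "postcondition violated"
    return new_balance
```

Preconditions are raised as exceptions rather than asserted here because Python drops assert statements when run with -O; that is a design choice for this sketch, not a rule.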
When coding internals, remember that you cannot trust anything (with the possible exception of other code in the same component). Check/assert the return values (and in/out parameters) of all calls you make. Have a well-defined system-level design for error handling, that doesn't allow the real error (or its source, if possible) to get lost.
As for testing, I'm all for the XP method: write your test cases first. This helps you to think about what your API is doing, how you are going to actually use it, and what you can throw at it that may break it (helping you to lock down the pre/postconditions).
You must use regression tests! Testing is useless if it's done once, but the code is modified afterwards. Have a library of test cases, and use all of them. Every time a bug is found, add a test case for that bug, and ensure it is regression tested every time.
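A rough sketch of what "add a test case for that bug" can look like, using Python's standard unittest module. The function average and the empty-list crash are hypothetical, invented for the example; the idea is simply that the test written for a reported bug stays in the library and runs with everything else from then on.

```python
import unittest


def average(values):
    # Hypothetical function under test. Assume an earlier version
    # divided by len(values) unconditionally and crashed on an empty list.
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTests(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(average([2, 4, 6]), 4.0)

    def test_empty_list_bug(self):
        # Added when the (hypothetical) empty-list crash was reported.
        # It stays in the suite so the bug cannot quietly come back.
        self.assertEqual(average([]), 0.0)


def run_regression_suite() -> bool:
    """Run every test in the library and report whether all of them passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling run_regression_suite() after every change (or from whatever build script you use) is what turns a pile of test cases into regression testing.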
Code audits can detect and solve a lot of common implementation bugs. Use them to look for unchecked pointer/buffer use and for unchecked assumptions about return values or the success/failure of functions, and to confirm that asserts are correctly and accurately used.
In my experience most bugs do NOT come from implementation errors, but from developer misunderstanding, especially late in a project or in maintenance, or even during bug fixing! A developer must fully understand the code (s)he is working on, and all the assumptions it makes. Never adjust a non-local variable without first checking all other functions that use or modify that variable, and understanding the implications. Never use a function or method without understanding all the side effects (on parameters and scope/global state). This is why all of this information should be documented, and audits performed to ensure that the documentation is accurate.
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
I've worked on both ends (dev and test), at M$ and other places, and I've come to one conclusion (I'm sure it's not the only correct one).
Developers must test their code.
With a test team backing you up, it becomes too easy to change something, run it once (if at all), and then push it into the next build so the test team can catch your errors. I've found that as a tester, a huge proportion of bugs are simply features implemented where the developer just forgot something stupid. I end up wasting 5 minutes writing a report, my manager assigns the bug back to a developer (hopefully the one who made the mistake but not always), and the developer comes back to the code a week later, spending 20 minutes just trying to figure out what s/he wrote a week back.
My point: this wastes 30 minutes of people's time for every little stupid mistake. Pressure your developers to really give a thorough test to the code they write before they check it in, especially if you have a test team, because you just end up wasting more people's time.
Just generating random data and trying to load it caught a lot of bugs, but even more effective was to take a valid image and modify the bytes in it at random, and then try to load it.
Of course, the reason this was so effective, is that the loaders would get mostly what they expect, and then suddenly something illegal. This is the kind of thing you tend to forget about when you write code.
Since it is so easy to attack your program with random data, this kind of testing gives you a lot of bang for the buck, but on the other hand, the bugs it finds may not always be those that are likely to occur in practice.
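For anyone who wants to try the "take a valid file and corrupt a few bytes" approach, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy "IMG" format, the load_image loader and the helper names are hypothetical and not taken from any real image library. A loader passes if it either loads the data or rejects it cleanly with ValueError; anything else counts as a crash worth looking at.

```python
import random


def mutate(data: bytes, flips: int, rng: random.Random) -> bytes:
    """Return a copy of data with up to `flips` bytes overwritten at random."""
    buf = bytearray(data)
    for _ in range(flips):
        pos = rng.randrange(len(buf))
        buf[pos] = rng.randrange(256)
    return bytes(buf)


def load_image(data: bytes):
    """Hypothetical toy format: b'IMG', width, height, then width*height pixels."""
    if len(data) < 5 or data[:3] != b"IMG":
        raise ValueError("bad header")
    width, height = data[3], data[4]
    pixels = data[5:]
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match the declared dimensions")
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]


def fuzz(loader, valid: bytes, rounds: int, seed: int = 0):
    """Feed mutated copies of a valid file to loader and collect the crashes."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    crashes = []
    for _ in range(rounds):
        sample = mutate(valid, flips=2, rng=rng)
        try:
            loader(sample)
        except ValueError:
            pass  # clean rejection of bad input is the desired behaviour
        except Exception as exc:  # anything else is a bug in the loader
            crashes.append((sample, exc))
    return crashes
```

Keeping the offending sample together with the exception matters: the mutated bytes are your reproduction case, and they make a good regression test once the loader is fixed.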
Programming is the same way. What kinds of bugs are you finding? Are they just stupid bugs, like buffer overflows or off-by-ones (good design, bad implementation), or are they unhandled errors, or are they API mis-matches or faulty algorithms (bad design)?
Have you made any effort to go back and say "Gee, we are getting a lot of off-by-one errors. OK folks, we need to think about our loops."?
And when you find one type of bug, do you go back and identify anyplace else a similar bug may exist?
If you are hitting high and right, and you never adjust your sights, you will NEVER hit the target consistently. If you never feed back the CAUSE of the bugs, you will never eliminate them.
I don't let customers dictate how programs should work. I make them tell me what information they have to enter, and what they want to get back out. I decide on mostly everything in the middle.
A number of years back I wrote test programs for printed circuit boards. First you created a model for the board that simulated the logic circuits. You then wrote test patterns that were applied to the board's inputs, and the simulator model predicted the board's outputs. The inputs together with the predicted outputs were applied to a real board that you wanted to test, and if this test program passed you assumed that the PC board was good with a high degree of probability.
One mode of the simulator allowed you to simulate faults that might occur on the board. The simplest kinds of faults were physical IC pins "stuck-at-zero" and "stuck-at-one" (these were the most common faults in real life), and if you wanted to be thorough you could also simulate "internal" faults down to the gate level.
I worked in a contract test programming house, where the contract with the customer required us to produce a test program with a specified minimum level of fault coverage, usually just at the physical IC pin level to minimize cost of developing the program. This ranged from say 90% for cheaper commercial work to 99%+ for certain government contracts. With >95% coverage, the "real life" result was that maybe one or two "dog pile" boards out of 1000 would pass the test program but fail a system test.
The point of this is that in that business, there was a clear objective measure of a test program's "quality". The measure wasn't perfect, but it was far better than just blindly writing a test program based on a "gut feel" for how the board should work. In addition, the test programmer had a clear, objective goal.
I think a useful tool in the software business would be a measurement of the percent of lines of code that were actually run during the QA process, along with a log of which lines were run and which were not. Often there are big chunks of code that only get triggered by very special conditions, and there is no way QA can guess those strange conditions. The standard QA process is very subjective; there is no objective measure of any kind as to how thorough the testing was, other than just documenting a list of features that were (often superficially) exercised.
A more sophisticated tool could go beyond lines of code and log the various logic combinations exercised in "if" statements, etc.
Several years ago I wrote an experimental tool that did this for a specialized database programming language. Basically it rewrote the program with a logging call after each statement (and yes, the "QA version" ran very slowly). The results were quite eye-opening, revealing chunks of "dead code" and conditions no one ever thought of testing. Unfortunately the project kind of died.
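As a rough illustration of the idea (not the tool described above, which rewrote the program itself), here is a minimal sketch in Python that uses the interpreter's sys.settrace hook instead of inserting logging calls. The helper run_with_line_coverage and the sample function classify are hypothetical names chosen for the example. It records which lines of one function actually ran, as offsets from the def line.

```python
import sys


def run_with_line_coverage(func, *args):
    """Call func(*args); return (result, set of executed line offsets in func)."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore every other function
        if event == "line":
            # Offset 0 is the def line, 1 is the first line of the body, etc.
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, executed


def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"
```

Running classify with 5 and then with -1 and taking the union of the two sets leaves offset 4 (the "zero" branch) unvisited, which is exactly the kind of untested chunk such a log is meant to expose.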
Many languages have "code profilers" that are mainly intended to analyze performance, but many of them could be easily adapted to become QA quality measurement tools.
Do these kinds of tools exist, and if so why aren't they more widely used?
"When testing code, what procedures work best for you,..."
Make sure it compiles and runs and then upload it to Debian/unstable.
(Yes, I'm joking).
"...and do you feel that excessive testing hurts the development process at all?"
If it didn't hurt why would you label it "excessive"?
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
I don't let customers dictate how programs should work. I make them tell me what information they have to enter, and what they want to get back out. I decide on mostly everything in the middle.
Then you aren't writing particularly complex software. If your users need software that does sophisticated processing, mathematical or otherwise, then the programmer probably isn't the best person to work out how it should do it. This is true whether you're working on software for pricing derivatives, or for tracking shipments in a supply chain, or for controlling manufacturing machinery. That's why there are notations like UML, so that functional experts can communicate unambiguously to the software developers what a system should be doing. A good programmer knows about programming, a good analyst knows about business processes, some people are both, but only with years of experience, and even then, only within a single industry.
The requirements, specification and analysis process is what separates software engineering from "hacking".
There are two subjects I want to discuss here. First of all, I'm going to present the "jelly bean model" of defect discovery, then I'm going to talk about why the "testing to improve quality" model is fundamentally flawed.
The Jelly Bean model goes like this: Let's suppose you have a big vat of red and blue jelly beans. Your objective is to remove all the blue beans. You do this by reaching in, grabbing a handful of beans, throwing away all the blue ones, and dumping the red ones back in.
At the beginning, it will be very easy to find the blue beans (assuming the blue-bean density is high), and towards the end, it will be very difficult (since the blue-bean density will be low). If you graph the cumulative number of blue beans you remove each day, you'll get an exponential curve; quite steep at the beginning (high rate of discovery) and which flattens out as you approach total bean removal.
Software defect discovery follows this model exactly. Defects are easy to find at the beginning if there are a lot of them, and hard to find towards the end. This means that if your defect discovery rate is pretty much constant (with respect to the number of hours of testing you've done) then you're probably still way down in the very first part of the curve, and your number of defects is probably very high.
Here's the important thing to remember though; the quality of your product has nothing to do with how many defects you find and fix during testing. The quality of your product is determined by the number of defects remaining! If you find and fix 10,000 problems, you might think you're doing very well, but if there are 10,000,000 defects remaining, your product is still crap.
You can estimate the number of defects remaining by trying to fit the number of defects you've found so far onto that exponential graph I mentioned above. The most popular methods are to fit a Weibull curve or to use quadratic regression.
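Here is a minimal sketch of that kind of estimate in Python. To keep it dependency-free it does not fit a Weibull curve; it assumes the simplest jelly bean model instead, where every equal block of testing hours finds the same fraction of whatever defects remain. The function name and the sample numbers are made up for illustration. Under that assumption, three cumulative counts taken after t, 2t and 3t hours are enough to solve for the total.

```python
def estimate_total_defects(c1: float, c2: float, c3: float) -> float:
    """Estimate total defects from cumulative counts after t, 2t and 3t hours.

    Assumed model (a simplification, not a Weibull fit): c_k = N * (1 - r**k),
    where r is the fraction of defects that survives one block of testing.
    """
    first = c2 - c1   # defects found in the second block
    second = c3 - c2  # defects found in the third block
    if first <= 0 or second <= 0 or second >= first:
        # A flat or rising discovery rate means we are still at the steep
        # start of the curve, so the model cannot bound the total yet.
        raise ValueError("counts do not show a decaying discovery rate")
    r = second / first       # N*r**2*(1-r) divided by N*r*(1-r)
    return c1 / (1 - r)      # because c1 = N * (1 - r)


def estimate_remaining_defects(c1: float, c2: float, c3: float) -> float:
    """Defects the model says are still in the product after 3t hours."""
    return estimate_total_defects(c1, c2, c3) - c3
```

With made-up counts of 500, 750 and 875 the model gives a total of 1000 and 125 still to find. With a constant rate such as 100, 200, 300 it refuses to answer, which matches the warning above about a flat discovery rate.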
Now, why is testing to improve quality a bad plan?
Let's say you worked at Ford, and roughly 50% of the cars you turned out had something wrong with them. You get lots of unhappy customers demanding their money back. Is your problem:
a) That you have a design defect in your car.
b) That you are introducing defects in production.
c) That you are testing cars insufficiently.
Most people realize that to test every car as it comes off the line is futile. There's too many of them, with too many potential points of failure. There's no way you can test them all. The root cause of the problem has to be in either a or b, and if you're looking to improve the quality of your cars, this is where you would spend your money. This isn't to say that Ford doesn't test their cars, I'm sure they do, but testing should be a means of verifying quality (i.e., 1/1000 cars tested had a defect, our goal was 1/500, so therefore we can stop spending money on finding design and production faults), and not a means of improving it.
It's so easy to see this when we're talking about cars. Why does everyone get it backwards when we start talking about software?
Not only is it impossible to test every possible combination of inputs to most software, it's also very expensive to find and fix problems this way. If you find a problem in design review, or code inspection, then you have your finger on it. You know EXACTLY where the defect is, and how to fix it. On the other hand, when you say "Microsoft Word crashes when I try to select a paragraph and make it Bold", you have no idea where the fault is. Any one of several thousand lines of code could be the problem. It can take literally days to track down and fix the defect.
Your testing should not be a means of finding faults, but a means of verifying the quality of your product. Testing is not part of the development process.
How can you be sure you are 'Properly Testing Your Code'?
Actually you can do this by adding more bugs, yes, adding them. The technique is called bebugging and it is basically:
1) Produce code, it contains an unknown number (N) of bugs.
2) Programmer (or bebugger) seeds the code with a number (B) of known new bugs, the number and type of bugs should be determined from bugs found in previous debugging cycles.
3) Code is submitted to testing and some bugs are found (F).
4) The bugs found are examined and categorised as either real bugs (FN) or bebugs (FB).
5) The number of real bugs (N) can be estimated by assuming testing found the same fraction of real bugs as of bebugs: FN / N = FB / B, so N = FN x B / FB.
6) Don't forget to remove all the bebugs.
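The arithmetic behind the seeding steps above can be sketched in a few lines of Python. The function names are hypothetical; the only assumption is the usual one for this technique, namely that testers find real bugs at the same rate as seeded ones, so N is roughly FN x B / FB.

```python
def estimate_real_bugs(seeded: int, found_seeded: int, found_real: int) -> float:
    """Estimate N, the total number of real bugs.

    seeded       -- B,  bebugs deliberately planted in the code
    found_seeded -- FB, planted bebugs that testing found
    found_real   -- FN, real bugs that testing found
    """
    if seeded <= 0:
        raise ValueError("at least one bug must be seeded")
    if not 0 < found_seeded <= seeded:
        raise ValueError("found_seeded must be between 1 and seeded")
    if found_real < 0:
        raise ValueError("found_real must not be negative")
    # Assumption: FN / N == FB / B, so N == FN * B / FB.
    return found_real * seeded / found_seeded


def estimate_unfound_real_bugs(seeded: int, found_seeded: int, found_real: int) -> float:
    """Real bugs the estimate says are still hiding in the code."""
    return estimate_real_bugs(seeded, found_seeded, found_real) - found_real
```

For example, seeding 20 bebugs and finding 15 of them alongside 30 real bugs suggests about 40 real bugs in total, so roughly 10 are still in there. If testing finds none of the seeded bugs the estimate is undefined, which is itself a strong hint about the testing.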
I beg to differ. This is how most developers test their code as well, though manually.
If you're just testing to make sure your code does what it is supposed to do you are likely in BIG, BIG trouble. Users (and black hats) do just the opposite.
Focus just as much on making sure your code doesn't do things it WASN'T designed to do. Or risk a CERT or Security Focus advisory...