Properly Testing Your Code?
lowlytester asks: "I work for an organization that does testing at various stages from unit testing (not XP style) to various kinds of integration tests. With all this one would expect that defect in code would be close to zero. Yet the number of defects reported is so large that I wonder how much testing is too much? What is the best way to get the biggest bang for your testing buck?" Sometimes it's not the what, it's the how, and in situations like this, I wonder if the testing procedure itself may be part of the problem. When testing code, what procedures work best for you, and do you feel that excessive testing hurts the development process at all?
... is to not make the mistake in the first place. This may sound kind of stupid, but it's true. Don't skimp on sleep, so you stay properly awake; don't run yourself on Coca/Pitr Cola; eat good food; go for walks; and you'll find yourself making far fewer mistakes and producing better quality stuff. And _double_check_ everything.
before each compile, one should make a small sacrifice to the debugging gods and ask them to forgive you for your syn(tax).
my last sig was too controversial... now, a new and improved useless sig!
boom boom
cLive ;-)
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
I think JUnit-style testing works great, and I plan to start using it more often.
:)
Testing is good to verify that your code does exactly what you think it does; a lot of the time I produce code that I "think" works, using JUnit allows me to verify that it actually does.
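For anyone who hasn't seen the style, here is a minimal sketch of what such a test looks like. It uses Python's built-in unittest module (same xUnit family as JUnit) rather than JUnit itself, and `word_count` is a made-up function that exists only for this illustration.

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    def test_simple_sentence(self):
        self.assertEqual(word_count("the quick brown fox"), 4)

    def test_empty_string(self):
        # The case you "think" works until you actually check it.
        self.assertEqual(word_count(""), 0)

    def test_extra_whitespace(self):
        self.assertEqual(word_count("  spaced   out  "), 2)


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Each test states one expectation, so a failure points at the assumption that turned out to be wrong.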
Check out junit.org.
For those of you who are sceptical about unit testing, you should try it. Setting up the tests is not as tedious as one might think, they force you to think your problem through, and maybe most of all: they make your build look cool
One wonders how your development has been organized. Everybody here should know the basics of software engineering, including but not limited to:
1) document APIs exactly, including definitions of legal and illegal data sets
2) separate test group from programmers
3) separate quality assurance from both API testing and programmers.
Well, that's the theory. I've never worked in a place where that would have been implemented. Instead, people trying to bring this in have been kicked out. In practice, maybe one should try to get a feeling of each API: how is it supposed to be used? Use each piece of software only within the implicit limits of its programmer's idea to keep the number of bugs down. Not to mention the obvious coding style mantras.
I think, therefore thoughts exist. Ego is just an impression.
And the answer to that is of course: "No, you should test more, and fix the bugs". And of course, looking over your development model to see why you have so many errors might be a good idea (such as formalizing who can commit code, if you've got lots of programmers at various skill levels).
But in real life, many bugs are not that important, and time-to-market and development cost are more important.
So unless you provide us with more data, such as ...
...I don't think anyone will be able to give any good advice either.
My experience has shown that the number one way to find defects is code reviews performed by other developers who can read the code and also understand the intended functionality. This will catch 90% of all defects before they are even released to QA.
For more information, the developer's bible (IMHO), Code Complete (available on Amazon and elsewhere), has some good information on testing strategies and some hard numbers on the effectiveness of testing. Good luck.
If you look at the guys with really low bug rates, like the NASA guys running the Shuttle control software, they have very separate test and development teams, and a competitive attitude. The test team "wins" if it finds a bug, and the developers don't want to look silly.
Some Extreme Programming techniques, such as paired coding, may help too.
... NOT tested in. If the product is poorly engineered, there should be no surprise at the vast number of bugs, no matter how much testing you do. Crap is crap.
The main thing: testing does absolutely nothing to minimize the number of defects in a particular application. There are lots of other things that are as important, e.g.: are these defect reports being seen by the appropriate developers and are they being acted on, and what types of procedures and communication actually exist between the developers and the QA people (assuming that they are not the same folk)?
The last point isn't as bizarre as it sounds. I've seen lots of places where a QA person enters bugs, but the developers silently reject them ("it's not a bug, that's how the program works").
Testing just tries to discover the presence of defects, by itself, it cannot ensure that your product works perfectly (for an application of even moderate complexity, there may be an exponential number of cases and paths to check, most test cases are written for a percentage of those only).. Because of this, if you feel that you're spending too much time testing, perhaps you need to check if your test cases are appropriate to the situation and stage of development..
Another point is that tests can be automated to some degree or the other, perhaps a scriptable tool might assist in lowering some of the drudgery associated with actually assuring the quality of your software...
rant mode = on...Excessive testing ONLY hurts if it takes people away from development at the early or even middle stages of a project and forces them to run tests on incomplete sections of code.. otherwise, there is NO such thing as too many things...
"I wonder how much testing is too much"
;-)
Most tests are based on what the system is supposed to do, not on what one can do. Given a few users, anything can be done to the program, things you likely didn't test because you understand the program too well. Try putting your mother behind the prog ;-)
Well, it's never too much, until you've done every possible thing. That's one of the advantages of open source development. A lot of people are working, playing, coding with the beta version, which means a lot of things get tested. In my experience most errors show up with unexpected things, like missing error checks etc. A good point to start testing is to test against the developer. Ask him everything: what if I do this, what if I do that. If you think crazy enough (like normal users), you will receive a dozen 'euh, don't know' answers. That's where your errors are
It's all in the same package...
... My bet is that the second half would spend less total time on that program and bug fixes in a real world situation (which was the whole point of the exercise)
If you have a well thought out and well structured design... and if every single function is documented at least mostly before coding begins... and if all of the theory of interactions fits in... then only a small amount of testing is required
The main problems which require more testing come when corners are cut over the initial design stages, or they're not done as fully as they should be, or when people don't actually think about what input users can give
For one of my projects I did ages ago for college, half of the class (which was split to give the same coding ability spread in both halves) were each asked to write this program and the other half were asked to design it, review the design, make a proper testing strategy, document and THEN start writing it
The second half took a little longer to get their program working, but the first half had more bugs, and spent more time testing
Well.. showing your work to someone important always brings up bugs, every time :)
I suppose it's just a case of using the product like an actual user who knows nothing about it would, right from the first step.
If you really want to generate bug free code, you have to keep one rule in mind at ALL times. A bug occurring in the code is a failure in the methodology you are currently using to avoid them. Sounds very basic, but a lot of companies forget that.
;-)
When you have a problem with bugs, you need to figure out where in the process the problem happened. Was the unit spec wrong? Documentation? Implementation? Unit testing procedures? Was it a correctable problem caused by the engineer involved?
If you really want to be bug-free, every time one shows up you have to figure out why it happened, and change things.
Personally, I think the biggest one is to make engineers work 9-5. Not 9-7, or 11-9. Tell them to go home at 5, even if they're in the middle of something. Software engineering is a very complex task that takes a lot of energy and concentration to do right. Just like Doctors who work long hours make mistakes (resulting, often, in people dying), engineers who work too long make mistakes too.
Being in the "zone" is often the death of good code. You get lots of cool code written, but none of it is double-checked, none of it is verified to match spec, and it often ends up afterwards difficult to understand.
Now don't get me wrong, I don't do any of this, and my crap is FULL of bugs, but that's what you need to do if you really want to help it. Writing buggy code is like a public works program for QA people. Who wants a hundred thousand unemployed anal-retentive QA people nitpicking something else like your car's inspection or your tax forms? Better to keep them in the software where they can't do any harm
Another good way to reduce errors is to follow the principles of Design by Contract.
State what is expected of the code using assertions: preconditions and postconditions.
If any of these fail, then throw an appropriate Exception.
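As a rough sketch of that idea (not a full Design by Contract framework), the function below checks its precondition on entry and its postcondition before returning, and raises an exception when either fails. `integer_sqrt` is a hypothetical example. The checks are explicit `raise` statements rather than Python `assert` statements, because `assert` is stripped when the interpreter runs with `-O`.

```python
def integer_sqrt(n):
    """Return the largest integer r with r*r <= n (hypothetical example)."""
    # Precondition: the caller must pass a non-negative integer.
    if not isinstance(n, int) or n < 0:
        raise ValueError(
            f"precondition failed: n must be a non-negative int, got {n!r}"
        )

    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1

    # Postcondition: r really is the floor of the square root of n.
    if not (r * r <= n < (r + 1) * (r + 1)):
        raise AssertionError("postcondition failed: wrong result for integer_sqrt")
    return r
```

A failed precondition points at the caller and a failed postcondition points at the function itself, which already narrows down where the bug lives.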
-- "To ask a question is to show ignorance; Not to ask a question means you'll remain ignorant."
Mathematical proofs and unit testing aside, make sure that your program doesn't crap out when it's being used or abused. So get some regular people to bang on it..
... trust the programmers when it comes to testing. You may find some obvious bugs in your own code, but when it comes to runtime testing most programmers tend to emphasize "correct" user behaviour or maybe the few wrong input sets they've taken care of and completely neglect the perverse fantasy of lusers *g*
;)
Dedicated testers OTOH are perfect in letting a program die in the most obscure ways. Did you know that you can crash a Commodore 64's BASIC interpreter by just typing PRINT 5+"A"+-5 ?
IMHO, programming and testing should be done at the same time in the development stage.
Since programming and "bugging" happen at the same time, programming and debugging/testing should happen at the same time too.
It is very well explained in Bruce Eckel's Thinking in Java. You should just test everything in the code itself, even if it happens to add some overhead. Once you call a function, you want <something> to happen, so check it in the code.
I know this is not the usual way procedural programming happens. It seems much more straightforward to drop the code as it comes and then check if it behaves correctly.
But if you do so you will often discover that tests made afterwards do not cover all possible situations.
And so you discover that testing and debugging are just unfinished tales, and it is even worse if the testers are not the programmers who did the work.
Plus, I hate testing, so I force myself to do the work well and let the code (as long as possible) test itself, even if it makes development slower and boring.
Umhh... i'll preview this post 10 times, hoping it's free from bugs :)
Obviously my code contains no ewwows ;)
:dikappa
When a programmer is simultaneously coding and documenting their code, at both the high and low levels, the larger "thought" bugs will decrease in number and severity.
Even if you don't use a literate programming system, often documenting the system before you write it can help make the code more clear.
- Serge Wroclawski
I don't know how relevant this is but I read somewhere long ago that newspapers ended up with more errors if they had multiple people proofreading the same text because nobody was really taking responsibility. Even if it is not intentional, there is always a feeling of 'the other guy will pick that up'.
I vaguely remember something similar being said about the space shuttle disaster.
... is the most annoying part of dev
;-)
That's mostly true, but it gets better when you let developers test each other's code. They will do anything to make it go wrong
And they are very descriptive testers most of the time.
Good design is always the best way to avoid buggy, hard to fix code.
But for testing it depends what you're testing.
A good general test process for data processes (most functions can be thought of as data processes) is:
Generate some input test data.
Work out what the results should be by hand.
This is your first stage regression.
Now run the input test data through the application/function.
Diff the results against your hand-generated file.
Any discrepancies should be resolved as
a bug in the application/function
or
a bug in the hand-generated files.
Fix all the problems in the hand-generated files.
Repeat until your test files are perfect.
You now have a second stage regression test:
known good inputs, and known good outputs.
Use the correct test files to fix the application bugs.
If a bug is found that isn't in your second stage regression, then generate test files for it.
Fix the bug in the application (using the test files).
Then run your second stage regression and check that any differences are down to the bug that was found (and has been corrected).
Following this process your application should always get better, and you should soon be able to build up a fairly large sample of test data.
The test harness is simple enough (just a diff on the files and a bit of code to wrap up the functions) to prevent artifacts caused by the testing process.
Any-how that's more-or-less what I do for most of my testing, bug fixing.
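The harness described above could look roughly like this. It is only a sketch: `process` is a made-up stand-in for the function under test, and the "files" are plain lists of lines kept in memory so the example stays self-contained; a real setup would read the input and hand-made expected output from disk.

```python
import difflib


def process(lines):
    """Hypothetical function under test: upper-case each input line."""
    return [line.upper() for line in lines]


def regression_diff(input_lines, expected_lines):
    """Run the function and diff its output against the hand-made output.

    Returns the unified diff as a list of lines; an empty list means the
    actual output matches the expected output exactly.
    """
    actual = process(input_lines)
    return list(difflib.unified_diff(expected_lines, actual,
                                     fromfile="expected", tofile="actual",
                                     lineterm=""))


if __name__ == "__main__":
    inputs = ["alpha", "beta"]
    expected = ["ALPHA", "BETA"]      # worked out by hand
    for line in regression_diff(inputs, expected):
        print(line)                   # prints nothing when they agree
```

Any line the diff prints is either an application bug or a mistake in the hand-made file, which is exactly the decision the process asks you to make.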
thank God the internet isn't a human right.
This strategy works - lots of shops use it all the time. However, the real premise of the process is that you want to get through client acceptance testing as soon as possible, as long as the result is not dissatisfaction on the part of the client with the software after they've accepted it. As you have noticed this strategy doesn't actually produce bug-free code.
This is not surprising. What you achieve is after all pretty much determined by what your goal was. You (shops in general) need to think hard about what your actual goal is. If your goal is nearly-zero-defects, then the traditional process isn't doing the right things for you. If however, your goal is to obtain milestone payments from your client, then it's pretty good. This is an area where the business goals determine the software engineering processes.
Let's put another hat on and think about what the negative effects of this strategy might be (negative is really defined in terms of what your goals are, but let's be vague about that for a moment).
All of the above factors are unpleasant for those left to maintain the code. Many of them also limit the longer term flexibility of the product and hence the useful life of the software. This feeds back into development processes because limited product lifetimes mean that there is less incentive to change your process to produce software which can persist (i.e. why make the effort to ensure that the system is flexible enough to last through 20 years of changing requirements when you expect the system to be retired after only 7 years?)
You mentioned XP - it offers a lot of techniques that resolve these problems:-
However, XP is best adapted to projects where a single team makes multiple frequent deliveries of code, can work closely with the client, and where the development project continues in the medium to long term. These characteristics allow many of the XP techniques, and this means that techniques taken out of XP may not help projects of a different style.
Having said this, the automated testing angle is a real strength. If testing is done manually, it's time consuming and expensive. Hence people don't do it as much as they might otherwise think is appropriate. Maintenance deliveries often just undergo regression testing, and faults can creep in which might have been caught by the original unit or integration tests. Automated testing has many advantages :-
Just as a data point, I work on some software that has an automated test suite. The suite contains between 500 and 1000 test cases; the test suite conducts those tests in under 5 minutes on a very old machine. To do these tests manually would take one full-time person at least a week.
The summary is :-
What I miss in this discussion is something about the persons performing the tests. In my company we have a test team, consisting mainly of people who don't know the first thing about coding, who cannot read sources and who can only test 'through the UI'. And yet the system we work on has thousands of sources, only a percentage of which (20%) has a UI. Testing of all the underlying objects is a lot harder, and my experience is that with this many sources the total number of possible 'paths' in the system is so large that tests using the UI take too much time, and are therefore never done properly. So now the developers are constantly asked to provide methods by which the testers can perform the tests.
I believe that most bang per buck can be achieved if the organisation is not too fixated on one or two standard testing procedures. Projects differ a lot. Using 30 percent of the testing budget for testing the testing plan might well be worth the effort. If your company is making a set of applications for a fixed platform using fixed components and fixed architecture and these basics have been previously thoroughly tested, then of course what was said above might not be true.
Having been on teams producing 24x7, bullet proof code for communication servers and credit card processing I have an idea about the increasing number of bugs found. In the Old Days(tm), we wrote every line of code ourselves and used time tested libraries (C language). I quit using microsnot when their libraries started having bugs in their rush to C++. Now most coders use massive OOP libraries from who knows where built by slackers, and GUI app builders that generate code and perform all sorts of actions under the hood. When something goes TU it is often hard to find all the conflicts.
Even when using one of these app builders I read through all the code and put tests and logging into the generated code. Funny that these tools are supposed to make us more productive. My coding and testing every line still beats total time spent on a project since I don't have to go back and redo it later. When it's done, it's done. Next project. I've had comm programs run for over 5 years error free servicing 1000s of users per day. One specialized delivery, billing, and inventory system I wrote was used over 6 years error free and caused the owner to stay with hardware that was compatible with the software (not M$) because the programs always worked. And not a damn bit of it was OO or came from some automated builder tool.
In short, the closer you get to the metal and the more familiar you are with the code that is executing, the better your chances of producing error free programs. Takes longer to market, but then you don't have to redo it forever until the next bug ridden version comes out. Saves time and coders to work on the next version and the customers are always pleased. Get back to the basics. Try it, you'll like it.
Testing should always be a part of the development process. The wording here implies that testing somehow is considered to be outside the scope of development and I suspect this mindset is causing a lot of bugs to remain undetected.
It's just like documentation or support, those are also (or damned well should be) integral parts of the development process. Sometimes I think that most programmers believe that the development process consists of the steps hack, compile, ship instead of the tedious iterative process of analyze, design, code, test.
So what, then, is excessive testing?
Well, as long as you find bugs doing it, it's not excessive.
If your projections predict a bug or two in a specific piece of code and your tests fail to find them, then testing (provided that the test method isn't flawed) gave you a much desired quality assessment of that piece of code - meaning that the testing still wasn't excessive.
Running the same tests over and over on the same code with the same data, now that's excessive, not to mention stupid.
Money for nothing, pix for free
When someone is told to implement feature A, they spend a little time sifting thru the 20yr old code, and do the minimum to get it done.
They write test cases to test A, unit, system, etc. Their team leader approves it, and of course all the tests pass before it goes out.
In the end, there's always some obscure way feat A interferes with feat B, but you're not going to write tests for every combination of keystrokes possible.
If you have user testing (you should), they'll find a score of bugs you didn't. Of course, the users too have a deadline to get stuff shipping and they too want to do the minimum possible.
In the end, nobody involved has a personal incentive to make it perfect. With proper testing procedures in place, everyone has a piece of paper with someone's signature on it which says they passed. They don't feel (too) guilty when there's a bug, (hey, my TL approved it!).
Anyway, how are you going to ask the customer for $20k extra so you can test for an extra week?
I guess you should try to spend a part of your testing budget on improving your design and programming practices.
The object of finding bugs isn't to result in fewer bugs by fixing them. It's to result in fewer bugs by not writing them in the first place. The developers need to review found bugs on a regular basis, with the objective of changing development methods to avoid them in the future.
It's all fine and good to say "don't write buggy code in the first place," but this sort of feedback is the only way to get there. What makes this so hard in many organizations -- aside from the usual disrespect many developers have for QA people -- is that developers fear that this process is some sort of performance evaluation. As soon as this happens, the focus shifts from finding better processes to defending existing processes: "It's not really a bug," "There isn't really a better way of doing that," "We just don't have time to do it the 'right' way," and so on.
This is why the feedback needs to be direct from QA to the developers, who are then tasked to categorize bugs and develop recommendations for avoiding them. It's the latter that is the "product" required by management, not a list of bugs with developer's names on them. Management should otherwise get the hell out of the way.
Of course properly written functionality test scripts (doing what the user does) will find most bugs. The downside is that it is boring to follow test scripts manually.
My company has been successful implementing automated functionality tests with Rational Robot (part of teamtest). If you just take the time to define proper test scripts you can easily redo all functionality tests on various platforms (if you use VMWare or similar sw to simulate different platforms) at the click of a button.
This saves time every release as the developers can focus on finding the really tough bugs instead of running boring functionality tests again.
I've worked on both ends (dev and test), at M$ and other places, and I've come to one conclusion (I'm sure it's not the only correct one).
Developers must test their code.
With a test team backing you up, it becomes too easy to change something, run it once (if at all), and then push it into the next build so the test team can catch your errors. I've found that as a tester, a huge proportion of bugs are simply features implemented where the developer just forgot something stupid. I end up wasting 5 minutes writing a report, my manager assigns the bug back to a developer (hopefully the one who made the mistake but not always), and the developer comes back to the code a week later, spending 20 minutes just trying to figure what s/he wrote a week back.
My point: this wastes 30 minutes of people's time for every little stupid mistake. Pressure your developers to really give a thorough test to the code they write before they check it in, especially if you have a test team, because otherwise you just end up wasting more people's time.
Your signatures belong to me.
I often think the time lines have become so compressed in terms of expectations that it becomes harder and harder for companies to write clean code and get it out the door in order to meet the expected cycle of upgrades - and this is something I find common to all companies and even open source software. It seems we as consumers have come to expect an upgrade to this or that every year and so the dev cycle becomes one continuous thing - coders who are exhausted or working long hours write buggy code.
I think many people would be happy to wait a bit longer for better products but the industry has brainwashed them into thinking that it's almost easy to bung a new version out.
The best way to avoid mistakes is not to make them in the first place but that's not so easy when working on a compressed cycle with management on top of you - it's not just programmers who deal with it - it's network designers, SOE architects etc etc
Bugs exist - they always will - but minimising them requires time and time is a commodity not readily found. I don't know what the solution is - as I say it's just my thoughts.
I refuse to argue with Anonymous Cowards - if you want a discussion get an account....
First thing to do: look in your bug tracking software (you DO use bug tracking software, right?), and try to isolate hot spots. Is there a particular piece of code that generates more bugs than others? Is there a common pattern to the bugs (i.e. memory not being freed, off-by-one errors etc.)? Are they _really_ bugs or misinterpretations of the requirements or the design? In my experience, the 80/20 rule applies to bugs in spades - it is just hard to find the patterns.
If you need to, make the bug categorisation in your bug tracking software more specific. Once you get an idea of what your hotspot is, you can work at fixing the cause of the bugs.
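To make the hot-spot hunt concrete, here is a small sketch that counts bugs per module and returns the few modules responsible for most of them. The `(module, category)` pair format is an assumption made up for this example; whatever your bug tracker exports will look different and need its own parsing.

```python
from collections import Counter


def hot_spots(bug_reports, share_percent=80):
    """Return the top modules that together cover `share_percent` of the bugs.

    `bug_reports` is an iterable of (module, category) pairs, a hypothetical
    export format. The result is a list of (module, bug_count) pairs, most
    buggy first.
    """
    counts = Counter(module for module, _category in bug_reports)
    total = sum(counts.values())
    selected, covered = [], 0
    for module, n in counts.most_common():
        # Integer arithmetic avoids any floating point rounding surprises.
        if covered * 100 >= share_percent * total:
            break
        selected.append((module, n))
        covered += n
    return selected


if __name__ == "__main__":
    reports = ([("parser", "off-by-one")] * 6 + [("ui", "layout")] * 2
               + [("net", "leak")] + [("db", "leak")])
    print(hot_spots(reports))
```

The same counting works on the category column instead of the module column if you want to see which kind of bug dominates.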
If it's a particular piece of code, make sure it's reviewed by the best developers/architects you have, and consider refactoring it. At the very least, insist that it is reviewed and tested thoroughly before check-in to the source code control system, and consider adding a hurdle to jump prior to check-in (e.g. get the manager to sign it off).
If the code was written by one developer, consider swapping them out and giving it to someone else - it may be they're in over their head.
Make sure you increase the number of test cases for this piece of software, and check for "edge cases" religiously - if the code is broken at all, it is likely to be broken in more ways than you realized.
If it turns out that the problems tend to have a common cause (memory leaks, off-by-one errors, etc.) consider a structure which forces developers to concentrate on those issues before checking in code; again, consider the hurdle ("software must be signed off by the off-by-one guru prior to check-in"), and hone your tests to check for these kinds of errors if possible.
If the bugs stem more from misunderstood requirements or designs, beef up those areas. Work on your requirements and analysis processes; consider training courses for the developers to get them up to speed on interpreting these nebulous documents, and look at improving the review process by having designers present. Frequent "mini-deliverables" (another concept stolen from XP) will help here too - get your team to deliver a working piece of code - it need only be a minimal sub-system - and get it reviewed by designers and analysts. If the bugs tend to occur on the boundaries - i.e. invalid API calls, invalid parameters etc. - consider design by contract or aspects.
Finally, there's a bunch of hygiene stuff
It's all very well in practice, but it will never work in theory.
There's no one-size-fits-all process for testing. How much effort you need to spend on testing depends on a lot of factors including but certainly not limited to: code size, number of developers, customer requirements, life cycle of the system etc.
That being said, here are some remarks that make sense for any project:
In general a testing procedure that gives you no defects just indicates your testing procedure is bogus: defect free code does not exist and no test procedure (especially no automated procedure) will reveal all defects.
The XP way of determining when a product is good enough: write a test for a feature before you write code. If your code passes the test it is good enough. This makes sense and I have seen it applied successfully.
A second guideline is to write regression tests: when you fix a bug, write an automated test so you can avoid this bug in the future. Regression tests should be run as often as possible (e.g. on nightly builds). All large software organizations I've encountered do this. Combined with the first approach this will provide you with a growing set of automated tests that will assure your code is doing what it's supposed to do without breaking stuff.
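A sketch of what such a regression test might look like, using Python's unittest. The `average` function and its empty-list crash are invented for the example; the point is only that the test encodes the exact input that used to fail, so the bug cannot quietly come back.

```python
import unittest


def average(values):
    """Hypothetical function that once crashed on an empty list (the 'bug')."""
    if not values:              # the fix: handle the empty case explicitly
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_no_longer_crashes(self):
        # Pins the fixed bug: this input used to raise ZeroDivisionError.
        self.assertEqual(average([]), 0.0)

    def test_normal_case_still_works(self):
        # Guards against the fix breaking the behaviour that already worked.
        self.assertEqual(average([2, 4, 6]), 4.0)


def run_regression():
    """Run the regression tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_regression()
```

Hooked into a nightly build, a suite of these grows by one test per fixed bug.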
Thirdly, make sure code is reviewed (by an expert and not the new script kiddie on the block) before it is checked in. Don't accept code that is not up to the preset standards. Once you start accepting bad code your code base will start to deteriorate rapidly.
Jilles
Testing/debugging is like finding and putting out existing fires. If the organization can use test results to prevent future fires, then you're a step above. And probably more advanced than 90% of the software houses, too :-)
I program mostly in object oriented languages. So I have separate files, which have separate classes. I start at the bottom of my UML and work my way up testing each class as if it were its own program. When I know they all work individually, I can be reasonably certain that, apart from the few bugs I inevitably overlooked, any remaining bugs are due to the way the classes interact. It takes a while, but in the end I'm mostly bug free.
Most of the 'worst' bugs I've come across are down to bad systems design, before a single line of code is written.
If a system is designed well then you should have far fewer bugs, even if you are using code monkeys who don't know a quick sort from an n^2 bubble.
Design your systems well, and know your people: Bill's good at that kind of thing and likes it (but is crap at UIs, say),
Jess loves doing data imports (may not be that quick, but always does them well),
Fread always designs and produces good/fast system cores.
Get your developers talking and sharing knowledge: 'I'm having a bit of a problem' or 'Who knows how to' are good things for people to be saying, so encourage them to own up to their inadequacies, and they won't have them for long.
If you can manage that then your productivity should rise and your bug counts should drop dramatically, and the bugs you do have should be easier to fix.
Testing is necessary but not sufficient. There must be a way to capture requirements, convert requirements to design, convert a design to implementation, and finally test. At each transition it is a good idea to make an assessment of how well you accomplished your task.
Skipping any of these steps and putting it off until test is pure folly. An extremely false economy.
Just generating random data and trying to load it caught a lot of bugs, but even more effective was to take a valid image and modify the bytes in it at random, and then try to load it.
Of course, the reason this was so effective, is that the loaders would get mostly what they expect, and then suddenly something illegal. This is the kind of thing you tend to forget about when you write code.
Since it is so easy to attack your program with random data, this kind of testing gives you a lot of bang for the buck, but on the other hand, the bugs it finds may not always be those that are likely to occur in practice.
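The "take a valid file and corrupt a few bytes" idea can be sketched roughly as below. Everything here is an assumption made for illustration: the toy "IM" image format, both loaders, and the rule that a ValueError counts as a clean rejection while any other exception counts as a bug. It is not the code the poster used.

```python
import random

def mutate(data, flips, rng):
    """Return a copy of data with `flips` randomly chosen bytes overwritten."""
    buf = bytearray(data)
    for _ in range(flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def careful_loader(data):
    """Hypothetical loader: 2-byte magic, width, height, then width*height pixels."""
    if len(data) < 4 or data[:2] != b"IM":
        raise ValueError("bad header")
    width, height = data[2], data[3]
    if len(data) - 4 != width * height:
        raise ValueError("pixel data does not match header")
    return width, height

def fragile_loader(data):
    """Same format, but trusts the header and never checks the data length."""
    if data[:2] != b"IM":
        raise ValueError("bad header")
    width, height = data[2], data[3]
    _last_pixel = data[4 + width * height - 1]  # may raise IndexError
    return width, height

def fuzz(loader, valid, rounds=1000, seed=0):
    """Feed mutated copies of a valid input to loader; collect unexpected failures."""
    rng = random.Random(seed)
    failures = []
    for _ in range(rounds):
        sample = mutate(valid, rng.randint(1, 3), rng)
        try:
            loader(sample)
        except ValueError:
            pass  # a clean rejection is the behaviour we want
        except Exception as exc:  # anything else is a bug worth recording
            failures.append((sample, exc))
    return failures

valid_image = b"IM" + bytes([2, 2]) + bytes(4)  # 2x2 image, four zero pixels
print("careful loader failures:", len(fuzz(careful_loader, valid_image)))
print("fragile loader failures:", len(fuzz(fragile_loader, valid_image)))
```

Because most of each mutated sample is still valid, the loader gets past its early checks before hitting the corrupted byte, which is the effect described above. A fixed seed keeps any failure reproducible.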
I am having a bit of a QA problem myself. After reading up (Steve McConnell, etc.), I'm looking to spend more time in pre-code, and also implement inspections (a code review technique).
The disadvantage to testing is that you detect errors, but need to spend time finding the source. If you avoid the error by detecting it at design-time or during code review, you will spend less time dealing with it, since you will know more about the root cause to begin with.
Stop the brainwash
Just to emphasise how good design is the key to avoiding most bugs, not testing - there's a song that often gets sung at my place of work...
Hundred and one little bugs in the code
Hundred and one little bugs in the code
Fix the code, compile the code
Hundred and two little bugs in the code
most of us actually got good specs? Been close to 3 years now for me. With half-assed specs derived from business users who A) don't know what they want and B) don't know when they're out of their league when talking about how something should work you're pretty much screwed from the beginning.
Programming is the same way. What kinds of bugs are you finding? Are they just stupid bugs, like buffer overflows or off-by-ones (good design, bad implementation), or are they unhandled errors, or are they API mis-matches or faulty algorithms (bad design)?
Have you made any effort to go back and say "Gee, we are getting a lot of off-by-one errors. OK folks, we need to think about our loops."?
And when you find one type of bug, do you go back and identify anyplace else a similar bug may exist?
If you are hitting high and right, and you never adjust your sights, you will NEVER hit the target consistently. If you never feed back the CAUSE of the bugs, you will never eliminate them.
One thing I've found invaluable is to compile your program with a translator that inserts code to detect when branches have been followed. Then run the test suite and see that all the code was executed. Any code that was not executed has not been tested.
It's amazing how poor coverage can be with a naively written set of tests. Ideally you want to write the tests so that the coverage comes out good, but in practice you may have to patch the tests with more tests to cover the parts you missed. You may also have to change the code to make it easier to cover.
Rare error cases (like malloc failures) can be hard to cover.
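One way to reach a rare error branch is to inject the failure from the test. The sketch below swaps in a different failure from the one mentioned above: it forces a file open to fail rather than a malloc, because that is easy to simulate in Python. The function `read_config` is hypothetical.

```python
import builtins
from unittest import mock

def read_config(path):
    """Hypothetical function: return the file's text, or "" if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""  # the rarely exercised error branch

def test_read_config_handles_io_failure():
    # Force open() to fail so the except branch is actually executed.
    with mock.patch.object(builtins, "open", side_effect=OSError("simulated failure")):
        assert read_config("any/path") == ""

test_read_config_handles_io_failure()
```

The same idea applies to allocation failures in C: wrap the allocator behind a function the test can make fail on demand.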
A number of years back I wrote test programs for printed circuit boards. First you created a model for the board that simulated the logic circuits. You then wrote test patterns that were applied to the board's inputs, and the simulator model predicted the board's outputs. The inputs together with the predicted outputs were applied to a real board that you wanted to test, and if this test program passed you assumed that the PC board was good with a high degree of probability.
One mode of the simulator allowed you to simulate faults that might occur on the board. The simplest kinds of faults were physical IC pins "stuck-at-zero" and "stuck-at-one" (these were the most common faults in real life), and if you wanted to be thorough you could also simulate "internal" faults down to the gate level.
I worked in a contract test programming house, where the contract with the customer required us to produce a test program with a specified minimum level of fault coverage, usually just at the physical IC pin level to minimize the cost of developing the program. This ranged from say 90% for cheaper commercial work to 99%+ for certain government contracts. With >95% coverage, the "real life" result was that maybe one or two "dog pile" boards out of 1000 would pass the test program but fail a system test.
The point of this is that in that business, there was a clear objective measure of a test program's "quality". The measure wasn't perfect, but it was far better than just blindly writing a test program based on a "gut feel" for how the board should work. In addition, the test programmer had a clear, objective goal.
I think a useful tool in the software business would be a measurement of the percent of lines of code that were actually run during the QA process, along with a log of which lines were run and which were not. Often there are big chunks of code that only get triggered by very special conditions, and there is no way QA can guess those strange conditions. The standard QA process is very subjective; there is no objective measure of any kind as to how thorough the testing was, other than just documenting a list of features that were (often superficially) exercised.
A more sophisticated tool could go beyond lines of code and log the various logic combinations exercised in "if" statements, etc.
Several years ago I wrote an experimental tool that did this for a specialized database programming language. Basically it rewrote the program with a logging call after each statement (and yes, the "QA version" ran very slowly). The results were quite eye-opening, revealing chunks of "dead code" and conditions no one ever thought of testing. Unfortunately the project kind of died.
Many languages have "code profilers" that are mainly intended to analyze performance, but many of them could be easily adapted to become QA quality measurement tools.
Do these kinds of tools exist, and if so why aren't they more widely used?
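Tools of this kind do exist (gcov for gcc and the `trace` module in Python's standard library are two examples). As a rough illustration of the statement-logging idea, here is a minimal sketch built on Python's `sys.settrace` hook. The helper `run_with_line_log` and the sample function `classify` are made up for the example, and a real coverage tool does far more than this.

```python
import sys

def run_with_line_log(func, *args):
    """Run func(*args); return (result, set of line offsets executed inside func)."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore every other function
        if event == "line":
            # Offset 0 is the "def" line, 1 is the first body line, and so on.
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, executed

def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

_, only_positive = run_with_line_log(classify, 5)
_, with_negative = run_with_line_log(classify, -1)
all_body_lines = {1, 2, 3, 4, 5}
print("never run by these two tests:", sorted(all_body_lines - only_positive - with_negative))
```

The lines left over after a full QA run are the ones nobody has exercised, which is the log the post asks for.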
Sure, build a big suite of tests to run and check for things to go wrong. Every bug fixing process suggests its own test.
Then you find out that you don't have the time and resources to run all the tests every time someone makes a change to the codebase.
So, use smaller suites of the faster tests and weed out some of the ones that have been ironclad passes for the last 5 dozen code checkins. For frequent testing it makes sense to only shake what's new and rickety, not what's stood through 10 hurricanes.
Run the exhaustive complete test suite infrequently, say when a release is imminent, or as often as you can afford to spare the resource cycles.
"Provided by the management for your protection."
If you test the code as much as you say you do, and are testing for the correct things (which I do not know you are doing), the problem may be the architecture.
Code which is "forced" into a paper architecture is sometimes worse than code with no architecture at all. In many of my projects, parts of my architecture change partway through so that the code will work better. Sometimes not everything can be thought of beforehand. OO programs have a lot of information to fit in a human brain at one time, so problems are bound to show through. I don't have any "high end tools" to help with the architecture either, which doesn't help.
Also, the architecture itself may suck.
What kinds of problem are you having? I think you need to design test routines geared towards not letting the types of problems you currently have through. It is hard to have any specifics, since the post was so vague.
-Pete
The idea is to build up the level of test data you have.
You could have something that generates loads of test data, and use a simple script to check that its data is OK, and even see what happens when bad data is fed in.
If someone finds a bug, they generate test cases (and variants).
UAT testing should generate loads of data.
And live environments give you loads of data for full-cycle developments.
thank God the internet isn't a human right.
You seem to know what you're talking about, so are there any good books that cover the software design process? A book that covers what should be flowcharted and how detailed it needs to be, as well as writing good specifications and what should be contained in them?
A lot of the problem may lie in what methodology you are using to code the program, whether it is the traditional waterfall method, or the spiral method, or perhaps M$'s old sync-and-stabilize method. Whatever methodology you use will drive how you should be testing.
With the waterfall model, you really need to know way ahead of time that what you are coding is what will be desired in the end product. It forces you to have a clear picture in your model of what you are trying to build and with each step in the process, you must develop testing procedures that address that level of the code. For example, at a high level, you may say, let's build a compiler, and following that decision, you need to devise a test that proves that the compiler works. The next phase, you may say, let's build an assembler to produce machine code for the compiler. Then you need to build tests that prove that the assembler works. This methodology continues right down to the smallest module of code, and when all of the pieces have been written, integration testing begins, and you make sure that each larger piece can correctly function based on the output of the smaller piece.
The spiral model, however, allows for a well-defined core to be produced with tons of modules that evolve as the spiral expands. Integration is a function of the spiral, and testing occurs within each iteration of the spiral loop. Code produced with the spiral model also tends to be somewhat more difficult to test in later stages, IMHO, due to the nature of the testing that occurs at each cycle in the loop. Testing becomes more critical in later stages as the previous stages become more nested into the core of the program.
Well, enough Software Engineering for one day. Back to work....
Rule #1 -- Politics always trumps technology.
Maybe when we have quantum computers we can test every single user scenario in parallel
-flamesplash
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
"When testing code, what procedures work best for you,..."
Make sure it compiles and runs and then upload it to Debian/unstable.
(Yes, I'm joking).
"...and do you feel that excessive testing hurts the development process at all?"
If it didn't hurt, why would you label it "excessive"?
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
The difference between testing expected inputs and possible inputs is that reality doesn't limit itself to expected inputs. Heck, sometimes it doesn't even limit itself to possible inputs.
Larger tests don't test more. What the large tests do is make sure everything works together. You need the small tests to make sure each piece actually works.
The bigger the test, the more likely that your testing platform doesn't resemble production.
There is a big difference between getting your test to run successfully and having bug free code.
Chances are the test cases have nothing to do with how the users actually use the program. Chances are the programmer has never actually seen how a user uses the program. Chances are, the first time he does he'll go back to his computer and start cursing the user for not doing things the "right" way.
If something breaks and you don't add that something to the test case you're asking for it to break again.
Testing deserves powerhouse machines and sadistic maniacs who like to break code. People who want the tests to be successful and don't run them that often are obviously not going to be as nasty as the real world. Even the sadistic maniac has a hard time being as nasty as the real world.
Tests are less expensive than production errors. But only if they find errors. Tests that prove the code works perfectly usually don't.
No programmer likes to be told he made a mistake. On the other hand they love a challenge. Make testing fun and brutal and it'll be much more productive than if you make it boring and painless.
No Zen is good zen
If you're not finding the bugs, then you're not doing a good job testing.
When I code programs that are used by the general public, I find double-blind testing and black-box testing work best. With software where a failure means life or death or something severe, I will also do white-box testing.
Double-blind testing is when you give the code to a willing party and just let them work with it like they normally would for business purposes, without letting them know it is a beta test. You also have to include some type of bug report that people can fill in if they wish, but try to encourage them not to cause bugs, and just work with the program as if it was normal. This allows you to see if any of the normal functions that people use every day would be buggy.
Black-box testing works great to just test the program's function calls and modules. When I do BB testing I usually give it to another party with instructions as to how the functions are called and utilized. This party knows how to test the extremes and the common values and gives me the best testing.
White-box testing is testing that involves intricate knowledge of the code. When I do this it is usually in development. At the end, if I feel like I enjoy pain, I will do a thorough white-box testing suite for the program, but that has only happened once or twice.
In terms of expense, the cheapest form of testing is BB testing, followed by double-blind, and then WB, since I find white-box testing takes a long time to design, run, and analyze.
There's some thoughts for you though.
~ kjrose
There are two subjects I want to discuss here. First of all, I'm going to present the "jelly bean model" of defect discovery, then I'm going to talk about why the "testing to improve quality" model is fundamentally flawed.
The Jelly Bean model goes like this: Let's suppose you have a big vat of red and blue jelly beans. Your objective is to remove all the blue beans. You do this by reaching in, grabbing a handful of beans, throwing away all the blue ones, and dumping the red ones back in.
At the beginning, it will be very easy to find the blue beans (assuming the blue-bean density is high), and towards the end, it will be very difficult (since the blue-bean density will be low). If you graph the cumulative number of blue beans you remove each day, you'll get an exponential curve: quite steep at the beginning (high rate of discovery), and flattening out as you approach total bean removal.
Software defect discovery follows this model exactly. Defects are easy to find at the beginning if there are a lot of them, and hard to find towards the end. This means that if your defect discovery rate is pretty much constant (with respect to the number of hours of testing you've done) then you're probably still way down in the very first part of the curve, and your number of defects is probably very high.
Here's the important thing to remember though; the quality of your product has nothing to do with how many defects you find and fix during testing. The quality of your product is determined by the number of defects remaining! If you find and fix 10,000 problems, you might think you're doing very well, but if there are 10,000,000 defects remaining, your product is still crap.
You can estimate the number of defects remaining by trying to fit the number of defects you've found so far onto that exponential graph I mentioned above. The most popular methods are to use a Weibull curve or quadratic regression.
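As a rough sketch of that kind of estimate, the code below fits a plain saturating exponential, found(t) = N * (1 - exp(-k * t)), rather than a Weibull curve, using a simple grid search over k. The data are made up (generated from N = 500, k = 0.02), and the function name and the search range for k are assumptions chosen for the example.

```python
import math

def fit_saturating_curve(hours, found, k_grid=None):
    """Least-squares fit of found ~ N * (1 - exp(-k * hours)); returns (N, k)."""
    if k_grid is None:
        k_grid = [i / 10000 for i in range(1, 2001)]  # 0.0001 .. 0.2 per hour
    best = None
    for k in k_grid:
        g = [1 - math.exp(-k * t) for t in hours]
        # For a fixed k, the best N has a closed form.
        n = sum(d * gi for d, gi in zip(found, g)) / sum(gi * gi for gi in g)
        err = sum((d - n * gi) ** 2 for d, gi in zip(found, g))
        if best is None or err < best[0]:
            best = (err, n, k)
    return best[1], best[2]

# Made-up cumulative defect counts after 10, 20, 40, 80 and 160 hours of testing.
hours = [10, 20, 40, 80, 160]
found = [500 * (1 - math.exp(-0.02 * t)) for t in hours]

total, rate = fit_saturating_curve(hours, found)
remaining = total - found[-1]
print(f"estimated total defects: {total:.0f}, still in the product: {remaining:.0f}")
```

Note that when the discovery rate is still roughly constant, the data are nearly a straight line and the fitted N is very poorly determined, which matches the warning above about being in the first part of the curve.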
Now, why is testing to improve quality a bad plan?
Let's say you worked at Ford, and roughly 50% of the cars you turned out had something wrong with them. You get lots of unhappy customers demanding their money back. Is your problem:
a) That you have a design defect in your car.
b) That you are introducing defects in production.
c) That you are testing cars insufficiently.
Most people realize that to test every car as it comes off the line is futile. There's too many of them, with too many potential points of failure. There's no way you can test them all. The root cause of the problem has to be in either a or b, and if you're looking to improve the quality of your cars, this is where you would spend your money. This isn't to say that Ford doesn't test their cars, I'm sure they do, but testing should be a means of verifying quality (i.e., 1/1000 cars tested had a defect, our goal was 1/500, so therefore we can stop spending money on finding design and production faults), and not a means of improving it.
It's so easy to see this when we're talking about cars. Why does everyone get it backwards when we start talking about software?
Not only is it impossible to test every possible combination of inputs to most software, it's also very expensive to find and fix problems this way. If you find a problem in design review, or code inspection, then you have your finger on it. You know EXACTLY where the defect is, and how to fix it. On the other hand, when you say "Microsoft Word crashes when I try to select a paragraph and make it Bold", you have no idea where the fault is. Any one of several thousand lines of code could be the problem. It can take literally days to track down and fix the defect.
Your testing should not be a means of finding faults, but a means of verifying the quality of your product. Testing is not part of the development process.
Thank you for your interesting and insightful treatise.
Clearly, with your keen grasp of the history of programming, and your understanding of VB and HTML, you will someday make a fine salesman or mid-level manager. You already exhibit the skill and insight used by management at most companies to plan their projects, procedures, timelines, and budgetary requirements.
Many programming issues would clearly benefit from simplification, and perhaps you are on to something. By reducing the number of tools a language attempts to implement, it clearly decreases the number of distractions for the programmer. If you wish to pursue this concept further, you may perhaps wish to research another foundational language called Logo.
Good luck in your next class!
Billbert, PhB, TFIC
Redmond, WA
P.S. - A side note regarding the original article:
A firm understanding of your process for code development, and a clear design for your testing procedure are essential.
If you spend some time in advance on planning, your code will benefit. Also, a good test plan should include a measure of the relative impact of a 'bug' or 'defect' to help determine priority of response by your programmers.
By focusing on your programming objectives from the beginning, and maintaining that focus throughout your entire design lifecycle, you should be able to identify underlying problems in your current development model and use them to improve your entire process, with a goal of helping prevent errors in design, and catching errors in code before going to test.
Peer review at each stage prior to testing (planning, functional design, algorithm design, coding, and test design) will also help catch errors in advance, and lead to the development of much better code. It may sound like it will take longer and cost more, but it saves time and money in the end in terms of not having to rewrite and maintain poorly implemented code.
Cheers.
Though remember, this is Slashdot :) Automated testing is common in embedded systems programming, and all but nonexistent for any kind of Open Source desktop applications (gcc is an exception).
You write test cases as you go. You make sure you can run an automated regression test at any time. If you don't do this, then any time you change code you might break old code and you won't realize it. Just doing spot checks at the keyboard isn't good enough. And the programmers need to be writing these test cases first, and they need to be kept separate from tests written by external groups.
My personal recommendation is the "Cleanroom" methodology. You create a functional specification with a mathematical guarantee of completeness and consistency. Auditable correctness is also a part of the process. Then when it comes to testing you generate test cases that cover all states and all arcs, and then do statistical test case generation based on a usage model. The overall cost of this process is a bit more up front, but studies have shown that the process far more than pays for itself in greatly reduced maintenance/debugging costs.
So, to answer your question: to generate a decent set of test cases, you really have to understand the problem space and have mapped out the state-space in some manner. Try to derive this without a methodical approach and your testing will be spotty. The worst I've seen so far was a random state-space walker (à la Brownian motion). Statistically this approach avoids all the difficult cases in the far corners of the state-space.
Now for the bad news: Cleanroom is quite tedious for the programmer. The enumeration phase takes seemingly forever and can be mind-numbingly boring.
Here's the layman's book on Cleanroom: Cleanroom Software Engineering: Technology and Process by Stacy J. Prowell, Carmen J. Trammell, Richard C. Linger, and Jesse H. Poore.
And now for the shameless self promotion bit with a long winded sales pitch for executives on Cleanroom: my own Cleanroom company: eLucidSoft.
Just chant over and over: "Hire eLucid, play golf."
I used to wonder what was so holy about a silent night, now I have a child.
At the time that you are coding, every assumption is going through your head. This is the time to write them down, either on paper, in a document, or in comments in the code. The mental state you are in when designing test conditions cannot come close to the state of mind you are in when coding (if you are concentrating :) ). You are mentally closer to a problem when you are coding than when you are designing, and you can take the shortcomings of the platform you are working on and pair them with the shortcomings of the design.
Any consideration you have during the writing of a single line of code is gold. And like a great dream, if you don't get it down when you think it, you will lose it in a day or two.
My 2 cents.
http://pcblues.com - Digits and Wood
MSDN used to have a column called Stone's Way or something, and in one of them they discussed user case testing: set up a video camera to record the user as he/she uses the program OR masquerade as a nondeveloper and spy on the user as he/she uses the program.
If you're just looking for regular bug testing, assume it's a given that the user will not report bugs to you and have the program automatically email you a core dump and/or stack trace and/or any appropriate data if an unhandled exception occurs.
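A minimal sketch of the automatic crash report idea, using Python's `sys.excepthook`. The `send` callback is a placeholder for whatever delivery you choose (email, an HTTP post, a log file); the function names and the fields included in the report are assumptions for illustration.

```python
import platform
import sys
import traceback

def build_crash_report(exc_type, exc, tb):
    """Format an unhandled exception plus basic environment data as text."""
    lines = [
        f"python: {platform.python_version()}",
        f"platform: {platform.platform()}",
        "",
    ]
    lines.extend(traceback.format_exception(exc_type, exc, tb))
    return "\n".join(lines)

def install_crash_reporter(send):
    """Route unhandled exceptions to send(report), then to the default hook."""
    def hook(exc_type, exc, tb):
        try:
            send(build_crash_report(exc_type, exc, tb))
        finally:
            # Always fall through to the normal traceback on stderr.
            sys.__excepthook__(exc_type, exc, tb)
    sys.excepthook = hook

# Example wiring: collect reports in a list instead of emailing them.
collected_reports = []
install_crash_reporter(collected_reports.append)
```

In a real program `send` would need to be fast and must not raise in a way that hides the original error, and users should be told that reports are being sent.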
[o]_O
I work for a large company with a large number of internally developed applications.
I am shocked at how frequently our developers don't have a good understanding of their architecture, or sometimes even the problem that they are trying to solve. As a result, when they go to do "testing" they are frequently performing tests that are not valid.
For example, they might create a new build and test that build only on their development workstation before full deployment of the application.
Naturally the development box has different resources from that of a standard production machine. Many developers don't seem to understand this.
Another example - frequently boundary conditions, or interfaces to other applications are not fully tested.
Using bad methodology, all of the time that you spend testing is wasted.
Management tends to feel that testing time is wasted because their experience is that the time that they have invested in the past has been fruitless.
Please develop:
valid test cases,
valid test plans, then
execute them,
find gaps, then
use the gaps to learn how not to make the same mistakes in the future!
Phooey.
Anomaly
PS - God loves you and longs for relationship with you. If you would like to know more about this, please contact me at tom_cooper at bigfoot dot com
But Herr Heisenberg, how does the electron know when I'm looking?
I heard an interesting presentation at the Baltimore Chapter of the Association for Women in Computing, given by Dr. Linda Rosenberg, Chief Scientist for Software Assurance at NASA's Goddard Space Flight Center. The presentation is online. In the middle of the presentation was a discussion of some of the tools the assurance team uses. There's a density test, which might be of particular interest to the original poster. The density test looks at how many errors one would expect to find based on lines of code, and I think complexity, but I'm a bit fuzzy on the exact details. Anyway, part of it is that, based on measurements of past debugging, there is a curve of how many errors you would expect to find over time. When the testing team says they are done, the results are compared against the curve to judge whether they fit the pattern one would expect when testing is done. That is, how far along on the curve are they? There's another tool the assurance team uses to judge whether requirements are meaningful, by evaluating the requirements document to make sure it's complete. It uses key phrases to judge whether a requirement statement is definitive, fuzzy, or incomplete (that is, does it say "to be determined later") and gives a percentage of how many of these types of statements you have compared to the rest of the document. Maybe not perfect, but a starting place to judge.
Neither programming nor software assurance is an area I have knowledge in, so take the above with a grain of salt and check it out yourself. It all seemed rather interesting.
Here are some experience reports showing software inspections to be effective: (1) G.W. Russell's "Experience with Inspection in Ultralarge-Scale Developments" (IEEE Software, Jan. 1991), (2) E.F. Weller's "Lessons from Three Years of Inspection Data" (IEEE Software, Sept. 1993), and (3) Grady and Van Slack's "Key Lessons in Achieving Widespread Inspection Use" (IEEE Software, July 1994).
There are many other such articles; I'm highlighting the IEEE Software articles because they're easy to get. Full disclosure: I co-edited a book on software inspections, titled "Software Inspection: An Industry Best Practice" (IEEE Computer Science Press, 1996; edited by David A. Wheeler, Bill Brykczynski, and Reginald N. Meeson, Jr.). That book reprints the most interesting articles on software inspection of the time, many of which are quite hard to get hold of, as well as additional material not found elsewhere that gives you the "big picture." Unfortunately, that book is now out of print and getting increasingly hard to get, but you might be able to get a used copy (or convince IEEE to reprint it).
- David A. Wheeler (see my Secure Programming HOWTO)
Then don't test it. The problem isn't that the testing is reporting too many non-bugs. The fact that you're finding a large number of bugs means simply that a large number of bugs exist. Less testing does not reduce the number of bugs in a program, it simply reduces the bug count.
I will admit that bug testing often results in many non-bugs being reported. But that does not mean that testing is a bad thing. If anything, testing should be started as early as possible, and be carried out frequently. The earlier that bugs are found, the easier it is to fix them.
END COMMUNICATION
Unfortunately I've come to agree that some bugs are not important enough to fix. I have some in my code that the customer will never see as long as they follow directions exactly, and even if seen they are a minor problem to the customer. To fix the problem, though, would require a major design change, and when we want to ship in 3 months there isn't time for that level of change to the code.
It has to be said -- maybe if you did do XP-style unit testing you'd get better results?
My code comes back from test with three levels of bugs.
Problem 1: A crashing processor doesn't show up on the GUI. Turns out that I told the GUI that a "porcessor" crashed. In testing I saw something come out on the command line and didn't notice the misspelling, but the GUI didn't know what to do with it. Took longer to open the file than to fix the bug. These should all be fixed; unfortunately they are all minor enough that if caught late in the test cycle they are deferred.
Problem 2: After causing failure A, failure B wasn't detected. It turns out the code to detect failure B is the same as for failure A, and once A occurs the code stops watching for B, even though the two are not related. This is a fundamental design problem, and can only be fixed by a rewrite. (My excuse: someone else wrote the code and quit, I maintain it, but I have to implement feature gamma before I can fix this problem...)
Problem 3: Tester pulls the ethernet cable between two nodes, and then complains that we said the node broke instead of the ethernet cable. This can be fixed, but we need some other way of determining that the other node is still operational when we just can't communicate with it.
The first one is easy to fix, the second is solvable but takes a lot of time, and the third can't be solved in software alone. When you come across the third, I hope you have better luck than me with people noticing the bold letters in your documentation noting that additional hardware is needed to solve that problem.
From experience, we know that testing is essential for maintainable code, but the practice which REALLY catches defects is peer review, not testing.
Two effective ways, both tried and true:
1. Take your code to your coworkers (at most two at a time) and have them go over it with a checklist of common defects and goals. Then explain the code (while they still have the checklist, so they can make additions), then have them explain the problems they found. Look up "formal reviews" for more info.
2. Pair program at all times. Follow the basic XP guidelines for this: be sure to rotate teams. This is usually harder to arrange than #1 (it takes more management support), and it doesn't provide recorded numbers for analysis, but it squashes a similar number of bugs, and has other benefits (for example, knowledge about every chunk of code is present with at least two people on the team, rather than just one).
-Billy
Test smarter, not less. Many of the above posts have shrewdly indicated good methods of preventing bugs (an ounce of prevention, anyone?), and below I have some ways you may want to consider in testing your software. Sadly, the poster didn't give a whole lot of information to go on, so this offering is a bit vague, I'm afraid.
At our shop, we invest a good amount of time in developing test modules for all interfaces in the project, almost as much time as it takes to write the modules themselves. Every night, an automated build machine builds and installs the latest version from source control, and initiates a test script that invokes the test driver suite, which loads and tests all public interfaces. Every function is tested with expected input versus expected output, and further by sending unexpected input (like that of a malicious user, or a failed communication). All exceptions, assertions, and incorrect output are checked, as are any memory leaks, by the C runtime debugging utilities, and Bounds Checker will soon be in use as well.
This approach serves 3 purposes:
- It readily finds crashes that human testers may not find, since it checks every public function with varied input.
- It aids regression testing for modules that have been changed.
- It checks for leaks, which can cause problems that may be evident only after extended execution times (doesn't apply if your language is garbage collected)
Further, by having an automated build every night (that emails everyone on the project, including the project manager, ahem), developers are careful not to "break the build", and thus pay attention to details before committing their code. This method is only a preliminary test, however, and it cannot find every bug that may exist. All code is audited to ensure it meets coding standards. You'd be surprised how often failing to meet standards is evidence of a piece of code that was written in haste, and as such is a good place to look for bugs. Also, we employ code testers who run the software through use cases (which are sequences of actions determined to be general ways in which users want to use the software). These testers also put the software through non-use cases, in which the actions do not follow expected sequences.
I hate to point out the obvious, but it is important to test software freshly installed on "clean machines", not development machines.
In the end, if it is at all possible, it is nice to have users who are in some way affiliated with the software company give a "field test", so to speak, on alpha and beta versions of the software. In any complex piece of software, there will be a chance for bugs to occur that are hard for the developer to find. Tolerant users are the best and last resort for finding any obscure bugs before the general public gets their click-happy hands on the software.
Conclusion: design smart, code smart, follow standards, test smart, test everything, and track all bugs (as if that is news to anyone).
How can you be sure you are 'Properly Testing Your Code'?
Actually you can do this by adding more bugs. Yes, adding them. The technique is called bebugging, and it basically goes like this:
1) Produce code; it contains an unknown number (N) of bugs.
2) Programmer (or bebugger) seeds the code with a number (B) of known new bugs; the number and type of bugs should be determined from bugs found in previous debugging cycles.
3) Code is submitted to testing and some bugs are found (F).
4) The bugs found are examined and categorised as either real bugs (FN) or bebugs (FB).
5) The number of real bugs (N) can be estimated from the fraction of bebugs that were found: if testing catches real bugs and bebugs at the same rate, then FN/N is about FB/B, so N is about FN × B / FB.
6) Don't forget to remove all the bebugs.
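The estimate behind bebugging can be sketched in a few lines of Python. This assumes testers find seeded and real bugs at the same rate (FB/B ≈ FN/N), which is the usual simplifying assumption and may not hold in practice. The function name and the sample numbers are made up for illustration.

```python
def estimate_real_bugs(seeded, found_seeded, found_real):
    """Estimate the total number of real bugs N after a bebugging run.

    seeded       -- B, number of bugs deliberately planted
    found_seeded -- FB, planted bugs that testing found
    found_real   -- FN, genuine bugs that testing found

    Assumes FB / B is about FN / N, so N is about FN * B / FB.
    """
    if not 0 < found_seeded <= seeded:
        raise ValueError("need 0 < found_seeded <= seeded to estimate")
    return found_real * seeded / found_seeded


if __name__ == "__main__":
    # Made-up numbers: 20 bugs seeded, 15 of them found, 30 real bugs found.
    total = estimate_real_bugs(seeded=20, found_seeded=15, found_real=30)
    print(f"estimated real bugs: {total:.0f}")            # 40
    print(f"estimated still hiding: {total - 30:.0f}")    # 10
```

If testing found none of the seeded bugs, the estimate is undefined, which is itself a useful signal about the testing.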
Our testing philosophy:
1. Test every branch. (This goes hand in hand with making sure you have small enough functions that they only have one or two branches apiece.) Sounds like overkill, but when you're writing a function it's easy, if a little dull, to write test cases for each branch. It's much harder to fix a bug in a given section of code if all you know is that one of the sixteen functions it calls has a bug, and that bug is in one of seven functions that code calls, etc.
2. Maintain your test suite. Every test we've ever written is still in our test suite, and every time a bug comes in, that bug gets its own set of tests and goes into the suite.
3. Run your test suite regularly.
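A small sketch of what points 1 and 2 can look like in Python. The function `clamp` and the "bug report" are hypothetical, chosen only because the function is small enough to have one test per branch.

```python
def clamp(value, low, high):
    """Hypothetical small function: two branches plus the fall-through."""
    if value < low:
        return low
    if value > high:
        return high
    return value


# One test per branch.
def test_below_range():
    assert clamp(-5, 0, 10) == 0


def test_above_range():
    assert clamp(15, 0, 10) == 10


def test_inside_range():
    assert clamp(5, 0, 10) == 5


# A test added for an (imaginary) bug report about boundary values.
# It stays in the suite forever so the bug cannot quietly return.
def test_regression_boundaries():
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10


if __name__ == "__main__":
    for test in (test_below_range, test_above_range,
                 test_inside_range, test_regression_boundaries):
        test()
    print("suite passed")
```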
This works well for a lot of things; the big areas in which it doesn't work are very complicated 'mathy' algorithms, for which you don't really know what the correct answer ought to be other than by just running your program and seeing what pops out, and stochastic algorithms. Stochastic algorithms in particular are a total bitch to test. But, for the other stuff, there's really no better solution.
-jacob
Overall, good testing catches perhaps 63% of all defects. Code inspections alone catch about 63%. Combined on a project, they catch about 95+% of all defects. That's the key. (My copy of Code Complete is at the office and I'm still at home, but that has the exact numbers and the study.)
And remember, a good testing regimen will include all kinds of testing. Unit tests and integration tests are both needed (usually it's only the latter that happens in QA). And it's quite handy to have started the unit tests before you start coding the units.
Outside of creating an automated regression suite, I don't see much use in test cases. I mean so you test 10, 100, 1000, 100000000000000 things that the user might do... That leaves a lot of room for creativity on the end user. I actually had a manager who tried to say since a bug wasn't covered in a test case, it didn't need to be fixed. WOW.
I think there should be about 20% of time dedicated to running testcases/regression tests, 60% of the time to automating the above tests, and 20% of time dedicated to allowing testers to just beat on the product... Get creative and think like a user...
...and say, "Developers should write their test suites BEFORE they write their code."
We have a fairly large open source project with contributors coming in and going out all the time (well, not a lot going out; but any number is a problem there). Our experience shows that if you can't write a test suite you're not ready for anything more than a crude prototype. The problem with test-after-coding regimes is the testing gets short-circuited. You've already got working code. You "know" it works. You're just proving it works. So you test the obvious stuff that proves this.
Since we have instituted this policy, coding efficiency has actually improved. Coders who have tried to devise a complete set of tests have formalized their understanding of the requirements in a sense which the most complete requirements doc will never do. We include the test suite in CVS. Nobody commits until their update passes the entire test suite. This results in an enormous (but complete) test of everything done so far. But you can't imagine the thrill of seeing your patch pass that many tests the first time.
All of which is completely separate from what a QA process is for.
Eternal vigilance only works if you look in every direction.
I have to agree about the code reviews. There have been plenty of studies showing that frequent code reviews just work better - think of it as a way to suck up a large number of the advantages of pair programming without actually doing it.
Speaking of pair programming, it's also been shown that you'll save a lot of time and effort if you use pair programming to do the complicated or difficult chunks of code. Yeah, there's the cutting-your-productivity-in-half argument, but that really only holds true if you don't know how to use pair programming. If done correctly, you'll save more than enough time to justify the cost later on when you don't have to put nearly as much effort into debugging that code.
As for testing, it's overrated, almost worthless. I realize you gotta do it, but it does so much to distract from the best way to keep bugs out of your program (which is to not put them there in the first place), that I wonder if testing doesn't in some weird indirect way actually create more bugs than it discovers.
Heck, with code reviews, your programmers will probably start writing better code just so they don't have to suffer the embarrassment of having someone notice a particularly stupid algorithm design flaw in the middle of a code review.
I beg to differ. This is how most developers test their code as well, though manually.
If you're just testing to make sure your code does what it is supposed to do you are likely in BIG, BIG trouble. Users (and black hats) do just the opposite.
Focus just as much on making sure your code doesn't do things it WASN'T designed to do. Or risk a CERT or Security Focus advisory...
Perhaps you just load up your software on a popular website and wait for bug reports? You know, if your audience is addicted enough to the site, this just might work.
You can't grep a dead tree.
One of the big problems that comes up with developing is that the people writing the code have two big strikes against them for testing their own code:
1) They know how it is supposed to be used
2) They want it to work
The side effect of this is that frequently the code will give every impression of working perfectly until somebody who isn't familiar with the code tries to play with it. Then suddenly they are doing unexpected things, entering blank spaces, weird characters, and other things that can be expected to happen in the real world.
So, in any system you are developing it is very useful to have the developers try to break each other's code. People don't want to break their own code, so generally they don't.
This sig has been temporarily disconnected or is no longer in service
So, it isn't done well, if at all. QA isn't taken seriously. Regression tests, if they exist, are poorly maintained. Then people wonder why something wasn't caught before it went out.
Put the QA department in charge of nightly builds, testing, docs and shipping. Allocate sufficient time for thorough testing. Give them the power to hold up a release. Hire good people to do this, people who are good developers themselves. Pay them well and listen to them. Don't second-guess and overrule them.
Wansu, th' chinese sailor
I had this grandiose idea that we would run all our unit tests as one big regression test every night, and it would alert us when our server broke. It didn't turn out that way. Instead, when the server was changed, we ended up having to rewrite the unit tests, and in many cases that turned out to be a royal pain. So we stopped maintaining the unit tests. When one, and later two, unit tests were constantly failing, then nobody cared any more about broken regression tests.
There are benefits with unit tests, though:
So, my experience is that stringing unit tests together into a regression test suite is not worth the effort. Sorry JUnit, I also loved the idea when I first read about it, but I don't think it works.
Mats
Okay, the example was a little obvious, and likely would be flagged. I wanted an obvious example that even non-programmers are likely to understand. 20 lines of comment and 10 lines of code with a subtle logic error that works 99% of the time through all branches, except when some unchecked condition exists, are really hard to write on the fly, much less so that everyone can understand why *foo would not be null and yet have invalid data...
I was once on a project where very few bugs were found in testing. Eventually we shipped and discovered that the testing group wasn't doing a good job of testing.
I'm certain someone has already said this, but over 80% of defects come from crappy requirements. Forget about your design & analysis, your coding practices, inspection techniques, debugging and testing abilities - if your requirements are not CLEAR, CORRECT, ATOMIC, UNAMBIGUOUS, and CONSISTENT, you might as well start burning money.
NASA correlated a $1 cost to correct a "defect" in the requirements stage (here a defect can be any requirement that does not meet all 5 attributes I listed above) with a cost several hundred to thousands of times higher to address the same defect at the testing stage. Crappy requirements and crappy specifications are a big part of what makes your code buggy and expensive.
LA Times posted a study last year that showed that the average US programmer only coded for 51 days a year. 51 days!! One fifth of your working year spent writing new code. The rest of the time? DOING REWORK.
Biggest cause of rework?
UNCLEAR AND AMBIGUOUS REQUIREMENTS.
Spend the time and effort to beef up your requirements gathering and management processes. You'll get your ROI in ONE project cycle.
"Content's a bitch."
Many others have pointed out that studies consistently show that formal reviews (especially of specifications and designs) are the most cost-effective way of removing defects. Others have provided references to the classic books on the subject. Anyone considering doing formal reviews should read them. I personally like Tom Gilb's books.
There is a downside to consider, however, which is little mentioned, even in the formal review literature. Formal reviews require a particular type of company culture, and not all companies have or want that kind of culture. Trying to introduce formal reviews in a company that has an incompatible culture will be some mixture of painful, counter productive and political suicide.
The idea that a company would, in any way, be opposed to using the most cost effective way of removing defects seems bizarre. The truth is, not all companies care about product quality. Sure, everyone will say they care, but words are cheap. To find out what a company really cares about, see what decisions they make under pressure. See what they sacrifice, and what they keep.
The difficulty is, that before you can introduce formal reviews into an organisation, that organisation must already be highly committed to quality. Quite simply, many organisations have to introduce other, fundamental, improvements before they can use the advanced technique of formal reviews. The Capability Maturity Model (CMM) produced by the Software Engineering Institute (SEI) is a useful way of prioritising these improvements. I recommend it; I've used it in a project, and ISO 9001/9000-3 in another, and I conclude that CMM is the better of the two. They have a website.
Ne mæg werig mod wyrde wiðstondan, ne se hreo hyge helpe gefremman.
I was really shocked to see the Wolfram testing method mentioned, because I invented it (for Wolfram at least), back in 1989. Stephen himself gave me credit for this type of testing numerous times, at various conferences, when I was still with the company.
I didn't invent random testing in general, but with a mathematically based language like Mathematica, many operations are symmetric algebraically. If I generate a random algebraic expression, Integrate it, then Differentiate it, the result should match (after simplification or variable substitutions) the original Integrand. This was the first type of random test that I designed for Wolfram, and man did it find slews of problems with V1.0 versions of Mathematica. That was back when I worked part time and was an undergrad at U of I (more clues to who DumbSwede is).
I worked on developing scores of other math identity verification procedures for Wolfram over a 10 year period, then sadly, moved on (I used to really enjoy pounding on that code).
Even things like word processors can be tested in this fashion. If you randomly add 10 characters, then delete those 10 characters, you should get the original file back.
Operations do not always have to be symmetric, but they have to have some testable property or identity after a series of events.
While this kind of testing doesn't replace other types of testing, I guarantee it takes you into a whole new bug-space you didn't even guess existed with your software, requiring a more mathematically consistent way of handling data.
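The word-processor example can be sketched as a random round-trip test in Python. The two buffer functions below are hypothetical stand-ins for whatever editing API is really under test; the point is only the shape of the test: random input, a symmetric pair of operations, and a pass condition that needs no eyeballs.

```python
import random
import string


def insert_text(buffer, pos, text):
    """Stand-in for the editor's insert operation."""
    return buffer[:pos] + text + buffer[pos:]


def delete_text(buffer, pos, count):
    """Stand-in for the editor's delete operation."""
    return buffer[:pos] + buffer[pos + count:]


def round_trip_holds(original, rng):
    """Insert 10 random characters at a random spot, delete them again,
    and report whether the original text came back."""
    pos = rng.randint(0, len(original))
    added = "".join(rng.choice(string.printable) for _ in range(10))
    edited = insert_text(original, pos, added)
    restored = delete_text(edited, pos, len(added))
    return restored == original


def run_random_tests(trials=1000, seed=1989):
    """Return the list of failing cases; only these need a human look."""
    rng = random.Random(seed)  # fixed seed so failures can be replayed
    failures = []
    for trial in range(trials):
        length = rng.randint(0, 200)
        original = "".join(rng.choice(string.printable) for _ in range(length))
        if not round_trip_holds(original, rng):
            failures.append((trial, original))
    return failures


if __name__ == "__main__":
    print(f"{len(run_random_tests())} failures")
```

Seeding the generator is a design choice worth keeping: a failure found at 4am can be reproduced exactly the next morning.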
On a related note, for GUI based apps, having a way to script equivalent events is a must. Where I work today, we use Android to play back GUI events, but I pre-process the Android scripts (which I have peppered with replaceable tags) to do things like random testing or repetitive testing over many inputs, rather than have a static GUI test that can only simulate one very narrow set of events.
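The pre-processing idea might look roughly like this in Python. The script syntax and the `$field` / `$text` tags below are invented for illustration; they are not the syntax of any particular playback tool.

```python
import random
from string import Template

# Hypothetical playback script with replaceable tags; a real tool
# will have its own command syntax.
SCRIPT_TEMPLATE = Template(
    "click field=$field\n"
    "type text=$text\n"
    "click button=OK\n"
)

ALPHABET = "abcXYZ019 _-"


def expand_script(rng, fields, max_len=12):
    """Fill the tags with a randomly chosen field and random text."""
    length = rng.randint(0, max_len)
    text = "".join(rng.choice(ALPHABET) for _ in range(length))
    return SCRIPT_TEMPLATE.substitute(field=rng.choice(fields), text=repr(text))


def generate_scripts(count, seed=0, fields=("name", "email", "phone")):
    """Turn one static script into many concrete ones, reproducibly."""
    rng = random.Random(seed)
    return [expand_script(rng, fields) for _ in range(count)]


if __name__ == "__main__":
    for script in generate_scripts(3):
        print(script)
```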
Letter To Iran
...richie - It is a good day to code.
Plusses
Problems? of course...
If you have to have X people run tests at cost Y, then adding more tests makes it more expensive! What kind of screwed-up logic is that, when a computer can do better at 4am while you sleep?
Any tester worth his salt is not a button-pusher/bug-report-writer. A real QA person writes automated tests and checks them into the code base so that it runs automatically when building. Flame to death anyone who checks in code that breaks a test. The optimum situation is that all developers are testers: they write tests and code and check them both in simultaneously.
If you're finding that tests are showing up lots of bugs, you're finding the symptom. It might be that finding bugs and fixing them will reduce the number. However, if you find that your development team creates bugs faster than it can fix them, then it means your organization doesn't know how to code its way out of a paper bag and shouldn't be programming. No amount of tests will fix it. The only thing that will work is refactoring, and most managers in such places erroneously think refactoring is the devil.
Not all software is equally testable. You have to write it so that it is testable. If you want to read a great book on how to test your software, I recommend John Lakos' Large-Scale C++ Software Design. Its testing techniques apply to most other languages, not just C++. Personally, I've unscientifically found that every hour writing automated tests pays back at least 10 in saved future effort. YMMV depending on the complexity of your project.
I can explanate how to administrate your network. You must configurate and segmentate it, so it can computate.
I'm sorry, but I didn't have time to read all the other responses. The replies I did read were mostly questions back at you and clarifications to other replies. So, here is my attempt to answer your questions.
:)
do you feel that excessive testing hurts the development process at all?
Yes, of course it does. You could, theoretically, test code for one program for the rest of your life and still not discover all the bugs. That would be excessive testing and would definitely be a bummer to the development process. I think what you really mean is how do you determine how much testing is enough. For this I refer you to a few good testing books because, frankly speaking, people have made careers out of this sort of thing.
Books: The Art of Software Testing (hard to find and a little expensive) by Glenford J. Myers; The Complete Guide to Software Testing by William Hetzel; Code Complete: A Practical Handbook of Software Construction by Steve C. McConnell. These are some good options to get you started.
What is the best way to get the biggest bang for your testing buck?
I would take a serious look at Orthogonal Array Based Robust Testing, a method of testing developed by Taguchi and Konishi that uses orthogonal arrays to determine test cases. I don't have enough room here to get into details, but basically this type of testing guarantees detection of at least 1st and 2nd order defects with a minimal number of test cases. Madhav S. Phadke's Quality Engineering Using Robust Design mentions this type of testing. Also, Bell Labs has been so kind as to publish online some fairly heavy-strength orthogonal arrays, so you don't have to calculate them. My employer uses this type of testing on many of its projects and it's a huge time saver. I just learned about it in an onsite class by our top tester and am going to pitch it to my project soon.
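To give a feel for the idea, here is a small Python sketch using the standard L4 orthogonal array (three two-level factors in four runs). The factor names and levels are made up. The sketch only checks the defining property, that every pair of factors sees every combination of levels, and compares the run count with exhaustive testing; it is not a full treatment of the method.

```python
from itertools import combinations, product

# The standard L4 orthogonal array: 4 runs, 3 factors, 2 levels each.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

# Hypothetical test factors, invented for this example.
FACTORS = {
    "client": ["old", "new"],
    "server": ["old", "new"],
    "link": ["slow", "fast"],
}


def to_test_cases(array, factors):
    """Map each row of level indices to concrete factor settings."""
    names = list(factors)
    return [
        {name: factors[name][level] for name, level in zip(names, row)}
        for row in array
    ]


def covers_all_pairs(array, levels=2):
    """True if every pair of columns contains every combination of levels."""
    wanted = set(product(range(levels), repeat=2))
    for a, b in combinations(range(len(array[0])), 2):
        if {(row[a], row[b]) for row in array} != wanted:
            return False
    return True


if __name__ == "__main__":
    cases = to_test_cases(L4, FACTORS)
    exhaustive = len(list(product(*FACTORS.values())))
    print(f"{len(cases)} runs instead of {exhaustive}; "
          f"all pairs covered: {covers_all_pairs(L4)}")
    for case in cases:
        print(case)
```

With three factors the saving is small (4 runs instead of 8); it grows quickly as factors are added, which is where the published larger arrays come in.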
Good luck, sorry for being so vague in places, and finally, if you have more questions about Orthogonal Array Based Robust Testing, please let me know: redundant_pleonasm@hotmail.com
From my experience testing at PictureTel and DEC in Mass., I found out that the usually-understaffed test team runs into the Laws of Software Testing:
Law #1: Most of the time, you are not testing. You are obtaining (beg, plead, cajole) equipment and software, and you are configuring and fixing software, just so that the environment is ready for the software testing to occur.
Law #2: When testing, most of the time you are not testing anything that matters. Sure, scripts are great, but they are very narrow, and bugs are sneaky. Running a limited variety of scripts across some clients and servers only gives the impression of coverage ... kind of like the concept of "busy work".
Law #3: When you find a bug, most of the time it can't cross the political barrier. After all, bugs are rated and prioritized, and that is the domain of management, who is overwhelmingly concerned about release dates.
Law #4: The bugs you don't find will reach the customer and will return as the highest-priority bugs that will usurp other bugs in the ongoing process. Fixing customer problems is of course a priority, but it landslides into the current product's production.
[also misbehaves on Kuro5hin as Peahippo]
Yeah -- our numbers can definitely highlight where review time correlates with testing/support time. But, as you said, breaking that down into the code's robustness vs. its supportability/maintainability is a little trickier.
And I agree about reusability -- hand-in-hand with that is using 3rd party solutions when appropriate. Forget solving the problem twice or even ONCE if someone else has done it for you (and you can legally and ethically use their solution, of course, e.g. we found using ACE on a C++ project to be well worth it).
Luckily, we have a leg up on most places for these issues -- we can generally use Java and C++, which lend themselves to maintainable, supportable code far better than C or assembly, and we make the effort for our code to be readable and understandable if at all possible.
Xentax
You shouldn't verb words.
As explained the first time, many operations are symmetric, especially in the case of Mathematica, which is why it works so well at Wolfram. In some rare cases a symmetric operation may make and then erase a potential bug find, but it is very, very rare. Most often a bug is amplified when doing a series of symmetric operations. For a simple example, take factoring a polynomial expression: subtracting the original polynomial from the factored one and simplifying should give 0. It won't tell you that the factoring was done completely, but it will tell you if the attempted factorization was not equivalent algebraically. More than this, it will tell you if your simplification algorithm was strong enough to reduce the difference of the two expressions to Zero.
One trick to this method is finding non-trivial ways to generate good mathematical examples randomly; I spent years coming up with a bag of tricks to do that. The other trick is finding a testable pass condition. In the case of symmetric operations, the pass is that you got the original back, or Zero after a Subtraction (if you are testing something mathematical in nature). For the many Matrix operations, you test for properties that hold after a transformation, such as knowing what the determinant should be, or performing a series of events that will eventually lead to an identity matrix.
But don't just think Math here, think any testable property that data should have after a known series of events, even if you generate those events randomly. Hint: you may have generated the events randomly, but you know what they were, and can factor that into creating additional deterministic operations that lead to a testable property - a property that doesn't need eyeballs to test.
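A toy version of the "subtract and expect Zero" check, in plain Python rather than Mathematica. Polynomials are coefficient lists, `poly_mul` plays the role of the operation under test, and the pass condition is that the product evaluated at a random point minus the product of the evaluations is exactly zero. This is only a sketch of the pattern, not the procedure used at Wolfram.

```python
import random


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first.
    This is the operation under test in this sketch."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def poly_eval(p, x):
    """Evaluate a polynomial at x using Horner's rule."""
    value = 0
    for coeff in reversed(p):
        value = value * x + coeff
    return value


def random_poly(rng, max_degree=5):
    """Random polynomial with small integer coefficients."""
    return [rng.randint(-9, 9) for _ in range(rng.randint(1, max_degree + 1))]


def find_failures(trials=1000, seed=0):
    """Return the cases where the identity did not reduce to zero."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        p, q = random_poly(rng), random_poly(rng)
        x = rng.randint(-20, 20)
        # Pass condition: (p*q)(x) - p(x)*q(x) == 0, exact in integer arithmetic.
        residue = poly_eval(poly_mul(p, q), x) - poly_eval(p, x) * poly_eval(q, x)
        if residue != 0:
            failures.append((p, q, x))
    return failures


if __name__ == "__main__":
    print(f"{len(find_failures())} failures to eyeball")
```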
A final note here. The real power of random testing is not that any one test is better than one test by a knowledgeable human; it is that it can do MILLIONS more tests than a knowledgeable human in a given time frame. Most of the tests will pass, most combinations will be absurd in their utility, and many will be repeated in trivial combinations, BUT even if you only find one bug per 1000 random tests, you could still potentially find 100 to 1000 bugs a day. Remember, you only have to eyeball the Failures. Some days I came close to that hundred figure, with 20,000 bugs reported in my tenure at Wolfram. Those are bugs you won't see in Mathematica, but most likely would have without random testing.
OK, A final-final note.
We even mused about creating a utility for customers to use that would constantly delve into Mathematica for obscure hidden bugs during computer idle time, very much like SETI@HOME. The powers that be didn't want to give customers the impression that there would be any bugs in the product that should be looked for.
Letter To Iran