Too Darned Big to Test?
gManZboy writes "In part 2 of its special report on Quality Assurance (part 1) Queue magazine is running an article from Keith Stobie, a test architect in Microsoft's XML Web Services group, about the challenges one faces in trying to test against large codebases."
So this will be Microsoft's latest excuse, then? ;)
to Keep It Simple, Stupid.
Shouldn't that be too darned bloated to test? It shouldn't be hard to test the individual subcomponents for functionality and at boundary conditions. Of course, you can't fully test something as complex as the system in the article. No reasonably sized program can ever be fully debugged -- the possibilities are too many to explore. However, it is possible to fully verify the smallest components, and build large components from them and fully verify those as well. Obviously, the complexity increases greatly with each new layer, but when one is working with fully verified components, any errors that occur must be in the local logic. Granted, this is much more labour intensive, but as long as each component follows precise specifications, it's more than feasible. I'm amazed that many prominent software projects still use largely monolithic testing...
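To make the layering idea concrete, here is a minimal sketch in Python. The functions `clamp` and `to_percent` are made-up stand-ins for "smallest component" and "larger component built from it"; nothing here comes from the article. The small piece is checked at its boundaries first, then the piece built on top of it is checked on its own terms.

```python
def clamp(value, low, high):
    """Smallest component: restrict value to the closed range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def to_percent(part, whole):
    """Larger component built from clamp: a percentage limited to 0..100."""
    if whole == 0:
        raise ZeroDivisionError("whole must be non-zero")
    return clamp(100 * part / whole, 0, 100)


def check_clamp():
    # Boundary conditions: just below, at, and just above each limit.
    assert clamp(-1, 0, 10) == 0
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10
    assert clamp(11, 0, 10) == 10
    assert clamp(5, 5, 5) == 5  # degenerate range with a single legal value


def check_to_percent():
    # With clamp already verified, a failure here points at to_percent itself.
    assert to_percent(1, 4) == 25
    assert to_percent(5, 4) == 100
    assert to_percent(-1, 4) == 0
```

Run `check_clamp()` before `check_to_percent()`; if only the second fails, the fault is in the local logic of the larger component.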
Be relentless!
For those who didn't RTFA, it is basically saying that exhaustive testing can't be done on a large codebase, and random testing is all you can use, to which most coders will say bull.
If a piece of code is too big to test exhaustively, it's time to refactor it into bits that can be.
After you've tested each part to make sure it works, you test a superset of parts, thus testing the interactions between the smaller parts; lather, rinse, repeat until you've tested the whole application.
Correct use of unit testing will always outstrip random testing.
This is just an excuse for badly designed code bases.
Ask 8 slackers a question, get 10 answers (a citation, but I can't remember from whom)
The article just says what everyone knew ..
* code coverage != proper testing
* clever inputs are needed to test
* few programmers test concurrency
Ending with - "ECONOMY IN TESTING" (ever heard about "Good Enough Isn't")
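The first bullet is easy to show with a toy case. A minimal sketch in Python, assuming a made-up `average` function whose (hypothetical) spec says an empty list should give 0: the single test below executes every line, so a line-coverage tool would report 100%, yet the empty-list bug is never touched.

```python
def average(values):
    """Hypothetical spec: mean of a list, with 0 for an empty list."""
    return sum(values) / len(values)  # bug: divides by zero for []


def test_average():
    # Executes every line of average(), so line coverage is complete.
    assert average([2, 4, 6]) == 4
```

Calling `average([])` still raises `ZeroDivisionError`; only a cleverly chosen input, not a coverage number, exposes that.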
Essentially apologetic about the lack of testing. Test driven development is not a philosophy, it's a way of doing. In a perfect company environment, you'll never be blamed for breaking someone's code - but in most places the idea is "he made me look bad". Peer reviews never work out properly. This is why FOSS is turning out more secure and clean code.
Quidquid latine dictum sit, altum videtur
Open-sourcing a project will do little to nothing in regards to testing. First, there is often little to no incentive to attract open-source developers. Second, a poor design is a poor design, and those in charge are highly unlikely to throw a working design out (two rare exceptions are Apple's move to Mac OS X and Microsoft's move to NT). Third, open-source developers frequently have no incentive to test -- testing is boring and laborious. And while the occasional person may fix the occasional bug, on the whole, opening the source of a product purely for testing purposes is almost always a fruitless exercise.
Be relentless!
I recently had a problem with ordering from Amazon that illustrates the problem with testing and all the possible permutations of user actions. I was checking out when I noticed the high shipping cost from one vendor, went back to order from a different vendor, and hosed the order. Apparently, there was only one of the item in stock and it was now committed to the pending, partially checked-out order. There was no way to clear the partially complete check-out process and no way to checkout with the item in my shopping cart -- it would only complain that I was trying to order TWO of the item and pull the ONE instance of the item from the cart.
Amazon is not the only e-commerce site with this problem (although I expected better from Amazon). Many sites fail to test for user action sequences other than the straight-through order process. I'm not suggesting that developers test for all possible sequences (that's impossible), but they should test for more plausible ones than a simple linear execution of the process.
When I did software testing (a task that I hated), I quickly broke an RDBMS application with just a simple series of adding and removing items from a user-manipulable working set of data objects. Moreover, I even broke the UI layer and dumped myself into a lower level of the RDBMS shell that was supposedly inaccessible to users. The developers grew to hate me so much for finding bugs in their code and the RDBMS vendor's code that I was moved to another job (YAY!).
The point is that it is often too easy to break code because the developers have created overly simple linear use cases that are then used in testing.
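A rough sketch of what testing beyond the linear use case could look like, in Python. The `Store` class is a hypothetical toy model (it is not how Amazon or any real shop works), and the invariant is an assumption of the sketch: every unit is always in exactly one place. The walker fires random sequences of user actions, including backing out of a checkout, and checks the invariant after every step.

```python
import random


class Store:
    """Toy single-item inventory (hypothetical model for illustration only)."""

    def __init__(self, stock):
        self.stock = stock   # units not held by anyone
        self.in_cart = 0     # units sitting in the shopper's cart
        self.reserved = 0    # units held by a started checkout
        self.sold = 0        # units paid for

    def add_to_cart(self):
        if self.stock > 0:
            self.stock -= 1
            self.in_cart += 1

    def remove_from_cart(self):
        if self.in_cart > 0:
            self.in_cart -= 1
            self.stock += 1

    def begin_checkout(self):
        self.reserved += self.in_cart
        self.in_cart = 0

    def cancel_checkout(self):
        # Going "back" must release the reservation into the cart again.
        self.in_cart += self.reserved
        self.reserved = 0

    def pay(self):
        self.sold += self.reserved
        self.reserved = 0


def random_walk(total, steps, seed):
    """Apply random user actions and check the invariant after each one."""
    rng = random.Random(seed)
    store = Store(total)
    actions = [store.add_to_cart, store.remove_from_cart,
               store.begin_checkout, store.cancel_checkout, store.pay]
    for _ in range(steps):
        rng.choice(actions)()
        counts = (store.stock, store.in_cart, store.reserved, store.sold)
        assert min(counts) >= 0, counts        # nothing goes negative
        assert sum(counts) == total, counts    # no unit is lost or duplicated
    return store
```

Something like `random_walk(1, 200, seed)` over a range of seeds covers the "one item in stock, user backs out of checkout" sequence without anyone having to think of it in advance. If `cancel_checkout` forgot to release the reservation, the walk would not crash, but the item would stay stuck, which is the kind of thing a further check on reachable states would need to catch.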
Two wrongs don't make a right, but three lefts do.
It is possible to build immense and complex code bases that are incredibly well tested and robust. Look at any Linux distribution and this is what you have.
The key is that the code base is structured so that it can evolve over time as many independent layers and threads, each using an appropriate technology and competing in terms of quality and functionality.
The problem is not the overall size of the code base, it's the attempt to exert centralised control over it.
To take a parallel from another domain: we can see very large economies working pretty well. The economies that fail are invariably the ones which attempt to exert centralised planning and control.
The solution is to break the code base into independent, competing projects that have some common goal, guidelines, and possibly economic rationale, but for the rest are free to develop as they need to.
Not only does this make for better code, it is also cheaper.
But it's less profitable... and thus we come to the dilemma of the 2000s: attempt to make large systems on the classical model (which tends towards failure) or accept that distributed cooperative development is the only scalable option (and then lose control over the profits).
Sig for sale or rent. One previous user. Inquire within.
"Yo' codebase's so fat, when it get in a lift it has to go down!"
"Yo' codebase is so bloated, it's got its own dialling code!"
"Yo' codebase's so big, NASA includes it in orbital calculations!"
Etc. etc., ad nauseam et infinitum...
Software rewrites may be considered harmful, but at which point do you declare that enough is enough and start again, breaking it down into smaller, easily tested modules? Big, old projects (like, say, OpenOffice.org) can get so appallingly baroque that there must be vital areas of code which haven't been modified (or, more importantly, understood) in years - how do you test those?
Tedious Bloggy Stuff - hooray?
I do a lot of programming with visual output. It is impossible to have a computer check that the font got outlined correctly in the PDF, say.
When you combine this with user input and then rare-case branching logic, you can end up with a nightmare of unfollowed paths. Unfollowed, to some extent, means untested.
Just one extra branch can be disastrous because of the factorials involved, depending on where it is placed in the branch pipeline. One minute, everything working, next minute some new code and things that need to be eyeballed.
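To put a number on the growth, here is a small Python sketch under a simplifying assumption that is mine, not the poster's: the branches are independent two-way decisions executed one after another. In that case the path count is a power of two rather than a factorial, but the effect is the same: each extra branch doubles what has to be followed.

```python
from itertools import product


def enumerate_paths(branch_count):
    """Every taken / not-taken combination for sequential two-way branches."""
    return list(product((True, False), repeat=branch_count))


def path_count(branch_count):
    """Number of distinct paths through branch_count independent branches."""
    return len(enumerate_paths(branch_count))
```

So going from 10 to 11 such branches takes the count from `path_count(10)` to twice that, and each of those paths may produce output that somebody has to look at.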
Yes. It's a wonder why we even have packages like bugzilla anyhow. Nobody tests and reports bugs in opensource software. Ever. Nobody fixes them, either. Ever.
Where did you learn the trade? If I guessed you're pretty fresh from a Computer Science course, say two or three years in the business, would I be far from the mark?
I ask because what you describe is exactly what is supposed to happen. You know you're done when, surprise, QA stop sending you bugs (Or at least, stop finding bugs which are classified above a certain severity level). Then, and only then, should the software be considered complete and ready.
The problem is that attitudes like yours, that QA is a pain that should be wished away, are wrong but very, very prevalent within the IT industry. It is such a wrong and totally backward attitude I can't fathom where it came from. It's a brain rot that's killing the industry in a sea of broken code and half-assed implementations.
... automatically performed by OTS:
Finally, testers can use models to generate test coverage and good stochastic tests, and to act as test oracles. A fundamental flaw made by many organizations (especially by management, which measures by numbers) is to presume that because low code-coverage measures indicate poor testing, or that because good sets of tests have high coverage, high coverage therefore implies good testing (see Logical Fallacies sidebar). One of the big debates in testing is partitioned (typically handcrafted) test design versus operational, profile-based stochastic testing (a method of random testing). Current evidence indicates that unless you have reliable knowledge about areas of increased fault likelihood, then random testing can do as well as handcrafted tests.[4,5]
For example, a recent academic study with fault seeding showed that under some circumstances the all-pairs testing technique (see Choose configuration interactions with all-pairs later in this article) applied to function parameters was no better than random testing at detecting faults.[6]
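For readers who have not met all-pairs testing, a minimal sketch in Python follows. It is a simple greedy selection over the exhaustive list, which is only practical for small parameter spaces and is not claimed to be the method used in the cited study; the function names are invented for the example. The goal is a set of tests in which every pair of values from two different parameters appears together at least once.

```python
from itertools import combinations, product


def pairs_of(test):
    """All (parameter, value) pairs that a single test exercises together."""
    return set(combinations(sorted(test.items()), 2))


def all_pairs(params):
    """Greedily pick tests until every cross-parameter value pair is covered."""
    names = sorted(params)
    candidates = [dict(zip(names, values))
                  for values in product(*(params[n] for n in names))]
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen
```

For three parameters with two values each, exhaustive testing needs eight configurations, while a pairwise set gets by with fewer; the saving grows quickly as parameters are added.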
The real difficulty in doing random testing (like the problem with coverage) is verifying the result. A test design implication of this is to create relatively small test cases to reduce extraneous testing or factor big tests into little ones.[9]
Good static checking (including model property checking). If you know the coverage of each test case, you can prioritize the tests such that you run tests in the least amount of time to get the highest coverage. First run the minimal set of tests providing the same coverage as all of the tests, and then run the remaining tests to see how many additional defects are revealed. Models can be used to generate all relevant variations for limited sizes of data structures.[13,14] You can also use a stochastic model that defines the structure of how the target system is stimulated by its environment.[15] This stochastic testing takes a different approach to sampling than partition testing and simple random testing. Code coverage should be used to make testing more efficient in selecting and prioritizing tests, but not necessarily in judging the tests. Test groups must require and product developers must embrace thorough unit testing and preferably tests before code (test-driven development).
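The prioritization step described above can be sketched in a few lines of Python. This is a greedy approximation, so the first group is small but not guaranteed to be the true minimal set; the function name and the test names in the usage note are made up for the example.

```python
def prioritize(coverage):
    """Split tests into a small core with full coverage and the remainder.

    coverage maps a test name to the set of items (lines, blocks, ...) it covers.
    """
    remaining = set().union(*coverage.values())
    pending = dict(coverage)
    core = []
    while remaining:
        # Pick the test that adds the most not-yet-covered items.
        best = max(pending, key=lambda name: len(pending[name] & remaining))
        core.append(best)
        remaining -= pending.pop(best)
    return core, sorted(pending)
```

With `{"t_login": {1, 2, 3}, "t_logout": {3, 4}, "t_smoke": {1, 2, 3, 4, 5}, "t_edge": {6}}` the core is `t_smoke` followed by `t_edge`, and the other two tests run afterwards to see whether they reveal anything the core missed.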
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
The monkeys are busy writing it. It's like infinite monkeys trying to write Shakespeare, except when they finally write code that compiles it has many, many unused lines, contributing to bloat.
# cat
Damn, my RAM is full of llamas.
My own research group works on methods to reduce this burden in a number of ways. One, my personal work, is on "semi-random" testing (we call it Adaptive Random Testing) which, we claim, detects more errors with fewer tests and reduces the problem that way. Another is "metamorphic testing" which tackles the oracle problem more directly by a slightly more sophisticated form of sanity checking assertions. You test the program with two (or more) related inputs, and check whether the outputs have the relationship you'd expect based on the inputs.
Unfortunately, the boss has an, um, slightly behind-the-times attitude to putting papers on the web; but if you search the DBLP bibliography server for T.Y. Chen you can get references for most of them.
However, I'd be the last to claim that we have a complete solution to the oracle problem; there will of course never be one. But it is a problem that will continue to make automated testing a challenge.
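For anyone who wants to see what a metamorphic check looks like in practice, here is a minimal Python sketch. The program under test is a stand-in Taylor-series `sine` written for this example, and the relation used is the textbook identity sin(x) = sin(pi - x). No oracle for the exact value of any single output is needed, only the relationship between two outputs.

```python
import math
import random


def sine(x, terms=20):
    """Stand-in program under test: sine via a truncated Taylor series."""
    x = math.fmod(x, 2 * math.pi)
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


def find_violation(func, trials, seed):
    """Return an input that breaks func(x) == func(pi - x), or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        if not math.isclose(func(x), func(math.pi - x), abs_tol=1e-9):
            return x
    return None
```

`find_violation(sine, 1000, 0)` comes back empty-handed, while a deliberately broken variant such as `lambda x: sine(x) + 0.001 * x` is caught straight away. The catch, as noted above, is that a program can satisfy the relation and still be wrong.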
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
I've seen this debate before, and the part I always wonder is "why not both?" At least, when you are starting from scratch. You can verify your components do what they are supposed to and then check for bizarre situations no one thought of with random testing (sometimes you will expose obscure bugs in the software stack itself, not just your code - but remember no code stands alone, and all crashes look the same to the end user no matter what the root cause.)
Particularly on large, old projects one has inherited, random testing can really help because you have absolutely no clue what you are looking for. There are so many discrete components to the system that could be tested it would be the work of ten years to set it up, so you are forced to (as much as possible) assume that things work and find the cases where they don't. Then, you gradually begin to fix things over the long haul while fighting fires.
GCL and the other free Lisp implementations are a good example of testing - we have a very dedicated individual who has been creating tests of ANSI behavior from the spec and testing a wide variety of implementations - indeed many non-standard behaviors have been corrected because of these tests. He has also created a "random tester", which I like to call "the Two Year Old Test." It is a code generator which generates random but legally valid Lisp code and throws it at the implementation. It has exposed some very obscure bugs in GCL which probably would have otherwise hidden for years. Anybody who has been around small kids knows they will introduce you to all sorts of new failure modes in just about everything you own, so I always think the Two Year Old Test should be administered as a final check whenever possible. (Granted this works particularly well for compilers.) Newbies are very useful for this kind of stuff as well, because they will use the software in ways you never thought to.
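The same idea in miniature, as a Python sketch rather than the actual Lisp tester (which I am not reproducing here): generate random but syntactically valid arithmetic expressions, then compare a tiny hand-written evaluator against Python's own `eval`. All names below are invented for the example, and `eval` is only ever fed strings the generator itself produced.

```python
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_expr(rng, depth):
    """Random expression tree: an int leaf or (operator, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(-9, 9)
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def render(expr):
    """Turn a tree into fully parenthesised, legal Python source."""
    if isinstance(expr, int):
        return f"({expr})"
    op, left, right = expr
    return f"({render(left)} {op} {render(right)})"


def evaluate(expr):
    """The implementation under test: a small tree-walking evaluator."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left), evaluate(right))


def two_year_old_test(rounds, seed):
    """Throw random legal programs at both evaluators and compare results."""
    rng = random.Random(seed)
    for _ in range(rounds):
        expr = random_expr(rng, depth=4)
        source = render(expr)
        assert eval(source) == evaluate(expr), source
    return rounds
```

When the two disagree, the failing `source` string is the bug report, and it is usually something nobody would have typed on purpose.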
"I object to doing things that computers can do." -- Olin Shivers, lispers.org
http://folklore.org/StoryView.py?project=Macintosh&story=Monkey_Lives.txt&sortOrder=Sort%20by%20Date&detail=medium&search=monkey
Back in the old days, a common way to write a program was to make code that can be used in many different places from within the program. Routines that are similar would be considered a bad thing, so you make routines that are designed to handle the different situations that need similar code.
The problem with Microsoft is that they have forgotten or never learned how to design a program before their people have started to write anything. As a result, we see 384k patches from Microsoft that take several minutes to install on some systems.
Another problem is that there is a LOT of duplicate code that is in use even within common libraries.
The people who suggest that there are too many features are almost correct, but the problem isn't with the number of features, it's the way those features are added to programs.
Also, there is only so far you can take a given design while you add features before things start to break due to design. If you start with a good DESIGN, then implement that design in code, it becomes a LOT easier to debug.
Microsoft needs to come up with a NEW OS that isn't an extension of Windows NT or Windows 3.0 (95/98/ME are still based on that old code in many ways). Windows NT was the right idea back when it was first developed. Toss the old design, start from scratch, and you end up with a better product. The only problem that Windows NT really had was that compatibility wasn't written into the core design of the OS, it was a layer added on top, which means you need a "translator" to handle that. If it's in the design, then you figure out how to do the emulation of the old system in a way that is compatible with the "new" way of doing things. Today, it's not as difficult as it used to be back in those early days of Windows NT. We have enough processing power to make virtual machines that can handle just about anything if they are coded properly. The only problem is that the emulation of the old DOS environment or Windows environment hasn't been implemented by Microsoft.
But I've gone off topic a bit. The key to easily debugged code is to design in a way to make things properly modular. Almost all features within Windows should be TIGHT code. To open a file probably has 200 different versions of that code within the Windows XP code base scattered through all the programs that come with Windows XP or 2003. Think about that, and wonder why it's hard to debug.