Slashdot Mirror


Too Darned Big to Test?

gManZboy writes "In part 2 of its special report on Quality Assurance (part 1) Queue magazine is running an article from Keith Stobie, a test architect in Microsoft's XML Web Services group, about the challenges one faces in trying to test against large codebases."

18 of 215 comments

  1. Shouldn't that be too bloated to test? by MarkRose · · Score: 5, Interesting

    Shouldn't that be too darned bloated to test? It shouldn't be hard to test the individual subcomponents for functionality and at boundary conditions. Of course, you can't fully test something as complex as the system in the article. No reasonably sized program can ever be fully debugged -- the possibilities are too many to explore. However, it is possible to fully verify the smallest components, and build large components from them and fully verify those as well. Obviously, the complexity increases greatly with each new layer, but when one is working with fully verified components, any errors that occur must be in the local logic. Granted, this is much more labour-intensive, but as long as each component follows precise specifications, it's more than feasible. I'm amazed that many prominent software projects still use largely monolithic testing...
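
    A minimal sketch of that layering, assuming two hypothetical functions (clamp and percent_to_byte are illustrative names, not from the article): the smallest component is checked at, just inside, and just outside its boundaries, and the larger component built on it is then checked at its own boundaries.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Smallest component: restrict value to the closed range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def percent_to_byte(percent: int) -> int:
    """Larger component built on clamp: map 0..100 percent onto 0..255."""
    return clamp(percent, 0, 100) * 255 // 100


def test_clamp_boundaries() -> None:
    # At, just inside, and just outside each end of the range.
    assert clamp(-1, 0, 100) == 0
    assert clamp(0, 0, 100) == 0
    assert clamp(1, 0, 100) == 1
    assert clamp(99, 0, 100) == 99
    assert clamp(100, 0, 100) == 100
    assert clamp(101, 0, 100) == 100


def test_percent_to_byte_boundaries() -> None:
    # clamp is already verified, so a failure here points at the local
    # scaling logic rather than at the range handling.
    assert percent_to_byte(0) == 0
    assert percent_to_byte(100) == 255
    assert percent_to_byte(-5) == 0
    assert percent_to_byte(250) == 255
    assert percent_to_byte(50) == 127


if __name__ == "__main__":
    test_clamp_boundaries()
    test_percent_to_byte_boundaries()
    print("all boundary tests passed")
```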

    --
    Be relentless!
    1. Re:Shouldn't that be too bloated to test? by kernelblaha · · Score: 2, Interesting

      ...which is why open source works. The philosophy of OSS apps has always been to make small programs that do one thing very well, then join them together to get good functionality for more complex tasks. And not through specific design, but through adaptation and tinkering. ...yeah yeah preaching to the converted and all that...

      --
      Million dollar sig.
    2. Re:Shouldn't that be too bloated to test? by Bastian · · Score: 2, Interesting

      This sometimes amazes me. The market forces that push companies to try and release products ahead of the competition exist in every industry, but it seems to only be software that has responded in such an insane manner, and I'm pretty sure software is the only industry where a company that does this can get away with it.

      Let's consider the hypothetical situation where Airbus releases the A380 prematurely (to keep ahead of the market) and creates an airplane that costs an incredible amount of money to maintain - or even worse, breaks regularly. What happens in this situation? Easy; everyone throws up a huge stink, and Airbus loses lots and lots of business for the next few years or decades.

      On a smaller scale, I have definitely done this with Belkin - they released a couple too many crap products, and now I am never buying their stuff again, and I know of other people who feel the same way.

      But in software, companies can just promise that It Will All Be Better In The Next Release. Repeatedly.

      Windows 95 will fix the world. Ooops, no, we meant 98. . . uhh. . make that 98SE. Nope, ME. Ahh, screw that, let's drop that line and give Windows 2000 a shot. Except you should probably try XP. . . . . SP2. . .

      And I don't mean to just Microsoft-bash; they are just an easy target. Apple does it, most of the major Linux distros I've used do it, it seems like it is just the way the software industry works nowadays. And it is insane.

    3. Re:Shouldn't that be too bloated to test? by Reziac · · Score: 2, Interesting

      I knew a programmer who worked for Apple as a member of their core OS development team, back around MacOS7. He told horror stories about how poorly managed it was. One problem he specifically ranted about was that some manager would decide that YOU were DONE with a given project, and physically remove your work machine from your desk, give it to some other coder, and give YOU someone else's half-finished work (which you'd then have to figure out before you could work on it). So no one ever got to actually FINISH their coding, hence there was a lot of half-baked code, kludges, and workarounds. And management *forbade* them from publishing a patch to fix a particular broken firmware, because management wanted people to buy their next machine (with fixed firmware), not just fix the old one!!

      Anyway, my point is that just because what you see on the surface looks polished, doesn't necessarily mean the QA or development process is any better.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    4. Re:Shouldn't that be too bloated to test? by winwar · · Score: 2, Interesting

      "If only they really understood this. I presented this to my manager, and he said "But cost is free, because everyone is salaried and can just work overtime." He was serious."

      And some say that programmers/coders/employees don't understand business....

      Granted, from his perspective, it WAS free. Wouldn't seem to be a good way to run a business but there seem to be a lot of businesses that make lots of money operating that way.

  2. Too costly to test would be the real meaning of it by Gopal.V · · Score: 4, Interesting

    The article just says what everyone knew ..

    * code coverage != proper testing
    * clever inputs are needed to test
    * few programmers test concurrency

    Ending with - "ECONOMY IN TESTING" (ever heard about "Good Enough Isn't")

    Essentially apologetic about the lack of testing. Test driven development is not a philosophy, it's a way of doing. In a perfect company environment, you'll never be blamed for breaking someone's code - but in most places the idea is "he made me look bad". Peer reviews never work out properly. This is why FOSS is turning out more secure and clean code.

  3. Testing for real-world use by G4from128k · · Score: 3, Interesting

    I recently had a problem with ordering from Amazon that illustrates the problem with testing and all the possible permutations of user actions. I was checking out when I noticed the high shipping cost from one vendor, went back to order from a different vendor, and hosed the order. Apparently, there was only one of the item in stock and it was now committed to the pending, partially checked-out order. There was no way to clear the partially complete check-out process and no way to checkout with the item in my shopping cart -- it would only complain that I was trying to order TWO of the item and pull the ONE instance of the item from the cart.

    Amazon is not the only e-commerce site with this problem (although I expected better from Amazon). Many sites fail to test for user action sequences other than the straight-through order process. I'm not suggesting that developers test for all possible sequences (that's impossible), but they should test for more plausible ones than a simple linear execution of the process.

    When I did software testing (a task that I hated), I quickly broke an RDBMS application with just a simple series of adding and removing items from a user-manipulable working set of data objects. Moreover, I even broke the UI layer and dumped myself into a lower level of the RDBMS shell that was supposedly inaccessible to users. The developers grew to hate me so much for finding bugs in their code and the RDBMS vendor's code that I was moved to another job (YAY!).

    The point is that it is often too easy to break code because the developers have created overly simple linear use cases that are then used in testing.
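
    A minimal sketch of testing non-linear action sequences, assuming a hypothetical single-item Store model (none of these names come from Amazon or any real site). Instead of one straight-through script, a seeded random walk fires cart and checkout actions in arbitrary order and checks an inventory invariant after every step.

```python
import random
from typing import List


class Store:
    """Hypothetical store selling one item, with a cart and a checkout hold."""

    def __init__(self, stock: int) -> None:
        self.stock = stock     # units still owned by the store
        self.in_cart = 0       # units sitting in the shopper's cart
        self.reserved = 0      # units held by a checkout in progress

    def add_to_cart(self) -> None:
        if self.in_cart + self.reserved < self.stock:
            self.in_cart += 1

    def remove_from_cart(self) -> None:
        if self.in_cart > 0:
            self.in_cart -= 1

    def begin_checkout(self) -> None:
        self.reserved += self.in_cart
        self.in_cart = 0

    def abandon_checkout(self) -> None:
        # The "went back" path: release the hold instead of stranding it.
        self.in_cart += self.reserved
        self.reserved = 0

    def complete_checkout(self) -> None:
        self.stock -= self.reserved
        self.reserved = 0

    def invariants_hold(self) -> bool:
        return (
            self.stock >= 0
            and self.in_cart >= 0
            and self.reserved >= 0
            and self.in_cart + self.reserved <= self.stock
        )


def random_walk(seed: int, steps: int = 200, stock: int = 1) -> List[str]:
    """Apply random actions; fail with the full history if an invariant breaks."""
    rng = random.Random(seed)  # seeded so a failing sequence can be replayed
    store = Store(stock)
    actions = [
        store.add_to_cart,
        store.remove_from_cart,
        store.begin_checkout,
        store.abandon_checkout,
        store.complete_checkout,
    ]
    history: List[str] = []
    for _ in range(steps):
        action = rng.choice(actions)
        action()
        history.append(action.__name__)
        assert store.invariants_hold(), f"invariant broken after: {history}"
    return history


if __name__ == "__main__":
    for seed in range(100):
        random_walk(seed)
    print("no invariant violations in 100 random walks")
```

    Recording the history matters as much as the random choice: a failure is only useful if the exact sequence that caused it can be reported and replayed.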

    --
    Two wrongs don't make a right, but three lefts do.
    1. Re:Testing for real-world use by Chris+Kamel · · Score: 4, Interesting

      The developers grew to hate me so much for finding bugs in their code and the RDBMS vendor's code that I was moved to another job (YAY!).
      I don't know what kind of developers you were dealing with there, but I am a developer myself and I actually like and respect QA or test engineers who come up with creative and "smart" bugs, they keep it interesting, they make my job easier and they make for a more successful product, so what's there to hate about them?

      --
      The following statement is true
      The preceding statement is false
    2. Re:Testing for real-world use by gstoddart · · Score: 2, Interesting
      I don't know what kind of developers you were dealing with there, but I am a developer myself and I actually like and respect QA or test engineers who come up with creative and "smart" bugs, they keep it interesting, they make my job easier and they make for a more successful product, so what's there to hate about them?

      As much as I rely on our QA people to come up with bizarre inputs, sometimes bug reports from QA can be a bitch to decode. They'll have the tester's perceived explanation of the source of the bug, which may or may not jibe with the actual one; it's like user-reports -- sometimes the interpretation is a red herring explanation.

      I've had to explain that the bug they saw was in other code because it caused a bizarre interaction it wasn't supposed to.

      Unfortunately, users submit bug reports to the software they were interacting with.

      --
      Lost at C:>. Found at C.
  4. Not darned testable by tezza · · Score: 4, Interesting
    At least by a computer.

    I do a lot of programming with visual output. It is impossible to have a computer check that the font got outlined correctly in the PDF, say.

    When you combine this with user input and then rare-case branching logic, you can end up with a nightmare of unfollowed paths. Unfollowed, to some extent, means untested.

    Just one extra branch can be disastrous because of the factorials involved, depending on where it is placed in the branch pipeline. One minute, everything working, next minute some new code and (n+1)! things that need to be eyeballed.
    1. Re:Not darned testable by BenjyD · · Score: 2, Interesting

      I've faced this problem too with checking visual output. What I will probably do at some point is do automated screenshot comparison: have the system do the test, then compare the relevant region of the screen to a known-good image as a regression test. The only problem I can see with that is that generating the known-good images is time-consuming and minor changes would require regenerating them all.
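
      A minimal sketch of that regression check, assuming images are plain nested lists of RGB tuples (a real setup would load pixels from a screenshot library, which is not shown here). The tolerance parameters are an assumption added to soften the regeneration problem: small rendering differences pass, so the known-good images only need regenerating after a real visual change.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def region(image: Image, left: int, top: int, width: int, height: int) -> Image:
    """Cut out the rectangle of the screen that the test cares about."""
    return [row[left:left + width] for row in image[top:top + height]]


def mismatch_fraction(actual: Image, expected: Image,
                      channel_tolerance: int = 0) -> float:
    """Fraction of pixels whose colour differs by more than the tolerance."""
    same_shape = len(actual) == len(expected) and all(
        len(a) == len(e) for a, e in zip(actual, expected)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in actual)
    if total == 0:
        raise ValueError("images must not be empty")
    bad = 0
    for row_a, row_e in zip(actual, expected):
        for pixel_a, pixel_e in zip(row_a, row_e):
            if any(abs(ca - ce) > channel_tolerance
                   for ca, ce in zip(pixel_a, pixel_e)):
                bad += 1
    return bad / total


def matches_golden(actual: Image, golden: Image,
                   channel_tolerance: int = 2,
                   max_mismatch: float = 0.01) -> bool:
    """Regression check: pass if at most max_mismatch of the pixels differ."""
    return mismatch_fraction(actual, golden, channel_tolerance) <= max_mismatch
```

      Comparing only a region, as suggested above, also keeps unrelated screen changes from invalidating every stored image.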

  5. Re:Got an idea by MarkRose · · Score: 1, Interesting

    I've found bugs in OSS before. I've reported them, and they've been fixed. Remember that people don't have unlimited time, and that your itch itches less than their own. If you are unwilling to write the fix yourself, what real incentive do they have to scratch your itch first? Have you tried putting a bounty on the bug you want resolved, such as cash?

    Complaining about bug-fixing in volunteer maintained software is like complaining that no one picks the litter up in front of your house.

    --
    Be relentless!
  6. The Oracle Problem by Goonie · · Score: 4, Interesting
    One point that this article doesn't really come to grips with regarding stochastic testing is the "Oracle Problem". In essence, how do you know that the result of testing is the right answer? This is a particular problem with random-input testing, or any testing method that involves using automatic methods to generate a large number of tests.
    #ifdef PLUG

    My own research group works on methods to reduce this burden in a number of ways. One, my personal work, is on "semi-random" testing (we call it Adaptive Random Testing) which, we claim, detects more errors with fewer tests and reduces the problem that way. Another is "metamorphic testing" which tackles the oracle problem more directly by a slightly more sophisticated form of sanity checking assertions. You test the program with two (or more) related inputs, and check whether the outputs have the relationship you'd expect based on the inputs.

    Unfortunately, the boss has an, um, slightly behind-the-times attitude to putting papers on the web; but if you search the DBLP bibliography server for T.Y. Chen you can get references for most of them.

    #endif

    However, I'd be the last to claim that we have a complete solution to the oracle problem; there will of course never be one. But it is a problem that will continue to make automated testing a challenge.
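
    A minimal sketch of the metamorphic idea described above, using a sine function as the program under test. The two relations and the names check_sine_relations and buggy_sine are illustrative assumptions, not taken from the papers mentioned, and Adaptive Random Testing is not shown. No oracle supplies the correct value of sin(x); the test only checks that outputs for related inputs are related as expected.

```python
import math
import random
from typing import Callable


def check_sine_relations(sine: Callable[[float], float],
                         trials: int = 1000, seed: int = 0) -> int:
    """Return how many random inputs violate a metamorphic relation."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        # Relation 1: sine is odd, so sin(-x) must equal -sin(x).
        odd_ok = math.isclose(sine(-x), -sine(x), abs_tol=1e-9)
        # Relation 2: sin(pi - x) must equal sin(x).
        mirror_ok = math.isclose(sine(math.pi - x), sine(x), abs_tol=1e-9)
        if not (odd_ok and mirror_ok):
            failures += 1
    return failures


def buggy_sine(x: float) -> float:
    """Deliberately wrong implementation: a truncated Taylor series."""
    return x - x ** 3 / 6


if __name__ == "__main__":
    print("math.sin violations:", check_sine_relations(math.sin))
    print("buggy_sine violations:", check_sine_relations(buggy_sine))
```

    The buggy version still satisfies the first relation, which shows the limit noted above: a relation only catches the faults that happen to break it, so this reduces the oracle problem rather than solving it.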

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
  7. Re: lots of monkeys by NadMutter · · Score: 3, Interesting
    That's not far off what Apple did back in the '80s. They only used one 'monkey', though.

    http://folklore.org/StoryView.py?project=Macintosh&story=Monkey_Lives.txt&sortOrder=Sort%20by%20Date&detail=medium&search=monkey

  8. bloated code, or just poorly written? by Targon · · Score: 4, Interesting

    Back in the old days, a common way to write a program was to make code that can be used in many different places from within the program. Routines that are similar would be considered a bad thing, so you make routines that are designed to handle the different situations that need similar code.

    The problem with Microsoft is that they have forgotten or never learned how to design a program before their people have started to write anything. As a result, we see 384k patches from Microsoft that take several minutes to install on some systems.

    Another problem is that there is a LOT of duplicate code that is in use even within common libraries.

    The people who suggest that there are too many features are almost correct, but the problem isn't with the number of features, it's the way those features are added to programs.

    Also, there is only so far you can take a given design while you add features before things start to break due to design. If you start with a good DESIGN, then implement that design in code, it becomes a LOT easier to debug.

    Microsoft needs to come up with a NEW OS that isn't an extension of Windows NT or Windows 3.0 (95/98/ME are still based on that old code in many ways). Windows NT was the right idea back when it was first developed. Toss the old design, start from scratch, and you end up with a better product. The only problem that Windows NT really had was that compatibility wasn't written into the core design of the OS, it was a layer added on top, which means you need a "translator" to handle that. If it's in the design, then you figure out how to do the emulation of the old system in a way that is compatible with the "new" way of doing things. Today, it's not as difficult as it used to be back in those early days of Windows NT. We have enough processing power to make virtual machines that can handle just about anything if they are coded properly. The only problem is that the emulation of the old DOS environment or Windows environment hasn't been implemented by Microsoft.

    But I've gone off topic a bit. The key to easily debugged code is to design in a way to make things properly modular. Almost all features within Windows should be TIGHT code. There are probably 200 different versions of the code to open a file scattered through the Windows XP code base, across all the programs that come with Windows XP or 2003. Think about that, and wonder why it's hard to debug.

  9. I wonder what the IRS would say... by ebuck · · Score: 2, Interesting

    If you claimed an income tax return too big to audit for accuracy, or better yet, too big to file.

  10. What's wrong with unit testing? by Jerk+City+Troll · · Score: 2, Interesting

    Instead of trying to test huge code bases, why not write decoupled systems and test small pieces of code? Oh wait, that requires effort.

    I've worked on a number of projects (that border on huge) which have a thorough set of unit tests. Each test sets up pre- and postconditions and checks the output against what we expect. (Duh!) It's not difficult, it just requires planning and careful attention to detail.

    If you've ever built Perl from source, you'll notice that the entire code base gets tested during the process.

    I have to say that it's not about theory or speculation, it's just about hunkering down and doing it.

    Testing, fundamentally, is not that hard. I think the real problem is developers often trying to find excuses to either put it off or worse yet, not do it at all. Added to the problem are badly designed architectures where most components have tight dependencies with others. This prohibits running them in isolation and hence limits testability. Naturally, it's always more complicated than this (budgets on time and money) but the root of the problem is lack of motivation or ignorance of the benefits of having easily and hence well tested code.
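
    A minimal sketch of the decoupling point, assuming a hypothetical greeting function (not from any project mentioned here). Because the clock is passed in rather than read directly, the unit test can run the function in isolation, set up the precondition it wants, and check the result against the expected output.

```python
import unittest
from typing import Callable


def greeting(current_hour: Callable[[], int]) -> str:
    """Return a greeting for the hour reported by the injected clock."""
    hour = current_hour()
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return "Good morning" if hour < 12 else "Good afternoon"


class GreetingTest(unittest.TestCase):
    # Each test substitutes a stub clock, so no real time source is needed.

    def test_last_hour_of_morning(self) -> None:
        self.assertEqual(greeting(lambda: 11), "Good morning")

    def test_first_hour_of_afternoon(self) -> None:
        self.assertEqual(greeting(lambda: 12), "Good afternoon")

    def test_rejects_impossible_hour(self) -> None:
        with self.assertRaises(ValueError):
            greeting(lambda: 24)
```

    Saved to a file, the tests can be run with `python -m unittest <file>`. A version of greeting that called the system clock itself would only be testable at the right time of day, which is the tight dependency described above.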

  11. Re:test every square root? by Anonymous Coward · · Score: 2, Interesting

    That's exactly how Intel ended up with a FP bug in one of their processors...

    So much for your theory on testing.

    Random sampling testing is only good for the testing of identical product production to test for trends in product manufacturing. It is absolutely NOT the way to test the function of software, well except that it can become impossible to exhaustively test as the paper mentions.

    That is why we have the theorem that states that it is impossible to completely test any software greater than a given size. And that size is amazingly small.
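
    A rough worked example of how small that size is, assuming a hypothetical rate of one billion tests per second (the rate and the function name are illustrative assumptions). It only counts the inputs of a single function, ignoring the time needed to check each answer.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_exhaust(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every possible input of the given bit width."""
    return (2 ** input_bits) / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for bits in (16, 32, 64, 128):
        years = years_to_exhaust(bits, 1e9)
        print(f"{bits:3d} input bits: {years:.3g} years")
```

    At that assumed rate a function of one 32-bit argument is exhausted in a few seconds, while one 64-bit argument (or two 32-bit arguments) already takes several centuries.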

    Frankly, "Common Sense" is more frequently than not "No sense at all". It betrays a complete lack of understanding of the real world, which is infinitely more complex than "Common Sense" ever gives it credit for.