Slashdot Mirror


Too Darned Big to Test?

gManZboy writes "In part 2 of its special report on Quality Assurance (part 1) Queue magazine is running an article from Keith Stobie, a test architect in Microsoft's XML Web Services group, about the challenges one faces in trying to test against large codebases."

61 of 215 comments

  1. I get it by Hyksos · · Score: 5, Funny

    So this will be Microsoft's latest excuse, then? ;)

    1. Re:I get it by pklong · · Score: 5, Funny

      Naa, we all know Microsoft's testing strategy is to release it to the public and see what happens.

      You save on the software testers' wages that way :)

      --

      Philip

      Signatures are broken

    2. Re:I get it by jellomizer · · Score: 4, Insightful

      Except people pay Microsoft to beta test their software.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    3. Re:I get it by jellomizer · · Score: 4, Insightful

      To continue: I think that is part of the problem. A lot of Microsoft beta testers are just Windows nerds who think it is hip and cool to run the unpolished edge of technology, and who want to be able to put on their resumes that they have 11 years' experience in Windows 95. Most of them do encounter bugs but do not report them. Why should they? They have to pay to get the product, so why bother reporting bugs? Microsoft should release its betas for free to a wide group of people, and promise them a free copy of the released product if they report a certain number of bugs. The trick in beta testing is to get as many eyes on it as possible: people who know that this isn't a completed or stable product and who are willing to try funky things to break it.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    4. Re:I get it by dioscaido · · Score: 4, Funny

      Google is pretty awesome at that too. Do they have any products that aren't in beta?? :)

    5. Re:I get it by MrMickS · · Score: 2, Informative

      Apple released a public beta of OS X to take advantage of the nerd factor. This did cost, but only enough to cover shipping costs. One key thing was that they provided an easy mechanism to provide feedback on bugs encountered. That there were bugs sort of proves the point of the article, that the OS as a whole was too big to be tested, at least in economic terms.

      --
      You may think me a tired, old, cynic. I'd have to disagree about the tired bit.
  2. lots of monkeys by FlashBuster3000 · · Score: 2, Funny

    they should just hire a lot of monkeys to test their software.
    Besides, that way the IQs of the eventual users and the testers won't differ too much.

    1. Re: lots of monkeys by bcmm · · Score: 4, Funny

      The monkeys are busy writing it. It's like infinite monkeys trying to write Shakespeare, except when they finally write code that compiles it has many, many unused lines, contributing to bloat.

      --
      # cat /dev/mem | strings | grep -i llama
      Damn, my RAM is full of llamas.
    2. Re: lots of monkeys by NadMutter · · Score: 3, Interesting
      That's not far off what apple did back in the '80s. They only used one 'monkey', though.

      http://folklore.org/StoryView.py?project=Macintosh&story=Monkey_Lives.txt&sortOrder=Sort%20by%20Date&detail=medium&search=monkey

  3. The key is... by Anonymous Coward · · Score: 3, Insightful

    to Keep It Simple, Stupid.

  4. Shouldn't that be too bloated to test? by MarkRose · · Score: 5, Interesting

    Shouldn't that be too darned bloated to test? It shouldn't be hard to test the individual subcomponents for functionality and at boundary conditions. Of course, you can't fully test something as complex as the system in the article. No reasonable sized program can ever be fully debugged -- the possibilities are too many to explore. However, it is possible to fully verify the smallest components, and build large components from them and fully verify those as well. Obviously, the complexity increases greatly with each new layer, but when one is working with fully verified components, any errors that occur must be in the local logic. Granted, this is much more labour intensive, but as long as each component follows precise specifications, it's more than feasible. I'm amazed that many prominent software projects still use largely monolithic testing...

    --
    Be relentless!
    1. Re:Shouldn't that be too bloated to test? by Anonymous Coward · · Score: 5, Insightful
      Indeed. What is the problem here exactly? You have layers of testing. Good development houses will use a waterfall or iterative process.

      1. Unit test. This can easily be automated with almost any language, especially modern languages like Java (JUnit).
      2. Component test. This is usually a low-volume testing phase because you're only testing boundary conditions and each component only needs to be retested when it changes.
      3. Integration test. Again, usually quite low volume if the unit testing and component testing have done their job. You pretty much put it all together and check that it runs, along with the most basic of sanity checks (Can it access the database? Can a user log in? Is it producing log files? etc.)
      4. System test. The biggy, but it doesn't have to be daunting. If you have proper requirement and design documentation writing the test plans should be a breeze for any competent tester, no matter how large the codebase. If the unit testing, component testing and integration testing have done their job system testing should really only be about validating the software against the requirements, not finding bugs. If you're finding significant bugs at system test stage either your unit or component testing wasn't done correctly or your requirement and design process is poor.
      This sort of thing is basic, bread 'n butter stuff to a tester. Usually it doesn't work as it should because either management don't allow proper timescales, don't use a proper iterative process ("I've penciled in one re-build to fix any bugs we find the week before it's due to ship. O.K? Oh well, it'll have to do.") or the requirement and design phase is done so poorly there is no way to write proper test plans. It is almost never the case that the software is "Too complex". If NASA managed to debug the entire shuttle flight control software, I'd expect a company the size of Microsoft to be able to debug a server application.
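As a rough illustration of the first layer in the list above, here is a minimal sketch of an automated unit test that concentrates on boundary conditions. It uses Python's standard `unittest` module rather than JUnit, and the `clamp` function is a hypothetical stand-in for "the smallest component", not anything from the article.

```python
import unittest


def clamp(value, low, high):
    """Hypothetical component under test: restrict value to [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


class ClampBoundaryTest(unittest.TestCase):
    def test_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_exact_boundaries(self):
        # Values sitting exactly on either edge must pass through untouched.
        self.assertEqual(clamp(0, 0, 10), 0)
        self.assertEqual(clamp(10, 0, 10), 10)

    def test_just_outside_boundaries(self):
        # One step past either edge is pulled back to that edge.
        self.assertEqual(clamp(-1, 0, 10), 0)
        self.assertEqual(clamp(11, 0, 10), 10)

    def test_invalid_range_is_rejected(self):
        with self.assertRaises(ValueError):
            clamp(5, 10, 0)


def run_suite():
    """Run the boundary tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampBoundaryTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite().wasSuccessful()` tells a build script whether this layer passed. Component and integration tests would sit on top of suites like this one.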
    2. Re:Shouldn't that be too bloated to test? by CortoMaltese · · Score: 4, Insightful
      Big construction projects such as planes, ships, etc. would never make it if they weren't divided into components of manageable size, as suggested for software in the parent. Imagine someone suggesting in an airplane project that random integration testing at the very end of the project is sufficient - a practice still commonly in use in software projects.

      What is it about software construction that makes this so difficult a concept to grasp?

    3. Re:Shouldn't that be too bloated to test? by kernelblaha · · Score: 2, Interesting

      ...which is why open source works. The philosophy of OSS apps has always been to make small programs that do one thing very well, then join them together to get good functionality for more complex tasks. And not through specific design, but through adaptation and tinkering. ...yeah yeah preaching to the converted and all that...

      --
      Million dollar sig.
    4. Re:Shouldn't that be too bloated to test? by MarkRose · · Score: 4, Insightful

      That's actually more the philosophy of Unix -- to do one thing, and to do it well -- and has been around for 30+ years. I'd say that philosophy is common in the OSS world for a few reasons: One, open-source encourages code & component reuse. Two, most OSS developers don't have time to write large projects on their own, and three, the free software movement started in the Unix domain, the source of this philosophy.

      --
      Be relentless!
    5. Re:Shouldn't that be too bloated to test? by MarkRose · · Score: 4, Insightful

      And that, my friend, is why software "engineering" is not engineering at all. I'm all for raising coding standards to engineering levels. The amount of time and headaches saved by such an effort would easily exceed thousands of lifetimes. It's silly that we still accept such shoddy workmanship.

      --
      Be relentless!
    6. Re:Shouldn't that be too bloated to test? by uss_valiant · · Score: 5, Insightful
      What is it about software construction that makes this so difficult a concept [unit/component tests] to grasp?
      Maybe an illusionary view of time-to-market, costs of bad design, costs of ignoring design for testability etc.

      In other areas, e.g. ASIC / integrated circuits, the costs of wrong decisions and errors explode during the design cycle. This is why the whole IC industry commits itself to a "first-time-right" ideology. Each step, from specification to the final layout, involves testing. As an ASIC designer, you're happy if you can spend more than 25% of your time and effort on designing the actual architecture. 75-90% of the overall effort is "wasted" for testing.
    7. Re:Shouldn't that be too bloated to test? by Yaztromo · · Score: 4, Insightful
      If NASA managed to debug the entire shuttle flight control software, I'd expect a company the size of Microsoft to be able to debug a server application.

      The problem is that in the NASA case, if they don't get that shuttle flight control system ready on time for launch, they can easily push the launch back indefinitely. It isn't as if they're going to go out of business if they don't have launches due to unsafe conditions.

      Besides which, once the flight control system version x.y is finished, the development team doesn't then immediately start working on flight control system version x.y+1 (or worse, version x+1.0). It isn't as if NASA finishes a shuttle, and then immediately starts building a new, improved shuttle.

      But this is exactly what happens in big software houses. The pressure to release ahead of your competition and stay ahead (or catch up with) the perceived feature curve is huge. Delays are bad -- delays equal lost sales. And once the product is done, unlike a bridge or a plane or a shuttle which will last 20 - 30 years or more as is, that software immediately starts getting new features and major modifications for "the next version".

      And perhaps worse, once a version ships, most software development companies stop any sort of further testing -- instead, they rely upon customers to report problems, and typically only then do they investigate (and, hopefully, fix the problem).

      The process is different due to "market forces". Personally, I don't like it either, and have stayed away from corporate software development for some time because of it. It's simply not a good way to develop software, as eventually the poor design decisions and rushed jobs (and burnt out developers) cost the company and the users dearly.

      Yaz.

    8. Re:Shouldn't that be too bloated to test? by Bastian · · Score: 2, Interesting

      This sometimes amazes me. The market forces that push companies to try and release products ahead of the competition exist in every industry, but it seems to only be software that has responded in such an insane manner, and I'm pretty sure software is the only industry where a company who does this can get away with it.

      Let's consider the hypothetical situation where Airbus releases the A380 prematurely (to keep ahead of the market) and creates an airplane that costs an incredible amount of money to maintain - or even worse, breaks regularly. What happens in this situation? Easy; everyone throws up a huge stink, and Airbus loses lots and lots of business for the next few years or decades.

      On a smaller scale, I have definitely done this with Belkin - they released a couple too many crap products, and now I am never buying their stuff again, and I know of other people who feel the same way.

      But in software, companies can just promise that It Will All Be Better In The Next Release. Repeatedly.

      Windows 95 will fix the world. Ooops, no, we meant 98. . . uhh. . make that 98SE. Nope, ME. Ahh, screw that, let's drop that line and give Windows 2000 a shot. Except you should probably try XP. . . . . SP2. . .

      And I don't mean to just Microsoft-bash; they are just an easy target. Apple does it, most of the major Linux distros I've used do it, it seems like it is just the way the software industry works nowadays. And it is insane.

    9. Re:Shouldn't that be too bloated to test? by gosand · · Score: 4, Informative
      But this is exactly what happens in big software houses. The pressure to release ahead of your competition and stay ahead (or catch up with) the perceived feature curve is huge. Delays are bad -- delays equal lost sales. And once the product is done, unlike a bridge or a plane or a shuttle which will last 20 - 30 years or more as is, that software immediately starts getting new features and major modifications for "the next version".

      This is not always the case. I just left a very large company for a smaller one, and I have been doing software testing for 11 years. I have worked for two very large companies in my career, and two small ones. In the large ones, I learned most of what good testing was about. I also learned most of what I know about the development process, and how it should be done. Unfortunately, at both of those companies, they talked a good game but didn't deliver very well.

      When it comes to software projects, you have 4 factors:

      Schedule

      Cost

      Quality

      Features

      The rule is, you get to optimize one of these, are constrained by one, and you have to accept the other two. Everyone always thinks that they can get around this somehow, but it never works out. Oh, and you have to make these choices when you start the project - if you change them mid-stream it changes the game.

      NASA was used as an example. They are constrained by features and want to optimize quality. Therefore, it costs what it costs and you get it when you get it. Most big software houses are constrained by schedule and want to optimize features. That means they throw money at it and take whatever quality they get. Until they bitch about the quality. If only they really understood this. I presented this to my manager, and he said "But cost is free, because everyone is salaried and can just work overtime." He was serious. Do you wonder why I left?

      We always thought we were constrained by schedule because every single release, some manager would say "This is the release date, and it is not moving!" It would move EVERY SINGLE RELEASE. For 4 years, we never hit a release date. Of course, we thought we did because we kept moving it during the cycle. Once, we delivered the release 1 year late - but it was on time according to our re-evaluation. Phbbbt. We did software for hospitals, and it wasn't that big of a deal if we missed our release date. These were huge inventory systems, and it took months for them to deploy. They had to be signed off by Beta sites before it could even be made available to everyone, and even then nobody just bought it off the shelf. We had to go in, install it in their test environments, train them on it, and set up transition dates. And we had to schedule it all within their budget constraints. So time to market wasn't nearly as big of an issue as it is in small companies, where if you don't deliver in a week or two, you can really hurt the company.

      I guess my point to all of this is that there are good QA and testing practices, but they might not apply to all situations. The key is knowing when to apply what. If I tried to apply Quality Assurance to where I am now, it would be a total waste of effort. The same goes for testing methodology. (they are NOT even remotely the same things you know) Our build schedules at the big company were every 2 weeks. Where I am now, we do at least 4 releases of software in that time. But it is hosted software, so it is a totally different animal. I value my time at large companies, I learned how things work and don't work in the QA and software testing arenas. The good part is, there is still more out there to learn.

      --

      My beliefs do not require that you agree with them.

    10. Re:Shouldn't that be too bloated to test? by oliverthered · · Score: 2, Insightful

      Two points.
      1: I agree, but it takes a long time (5-10 years) to get your coding skills up to traditional engineering levels.

      2: Mechanical devices have much higher tolerances than mathematical ones, if I want a bus that's going to be safe I do some rough calculations and then add 10% to the thickness of all the materials etc...

      If I want software that I know is safe I have to make an estimate, double it, then allow ten times the development time to fix the bugs. Even if I had fifty pears reviewing my code bugs are going to slip in because they don't fully understand what I am writing. If you don't believe me, download the kernel source, download a few APIs, and check the kernel against the APIs; I bet you'll find a bug. You could even download the DirectX 9 patch from my website and find some bugs if you want!

      If I were writing commercial software I would make sure that all code had full modular testing with wide data sets, and possibly introduce random data sets and leave a few boxes running from now until we ship, just testing the code with valid data and junk. I would also make sure that people were given the opportunity to work on 'pet projects', maybe doing some OSS hacking in a more R&D area to help keep their skills sharp.
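The "valid data and junk" idea could look something like the sketch below. Everything here is an assumption made for illustration: `parse_quantity` is a hypothetical function under test, and the contract (return 1 to 999 or raise `ValueError`) is invented. The fixed seed is there so that a failing run can be replayed.

```python
import random


def parse_quantity(text):
    """Hypothetical function under test: parse a quantity from 1 to 999."""
    if not isinstance(text, str):
        raise ValueError("quantity must be a string")
    stripped = text.strip()
    if not (stripped.isascii() and stripped.isdigit()):
        raise ValueError("quantity must contain only digits")
    value = int(stripped)
    if not 1 <= value <= 999:
        raise ValueError("quantity out of range")
    return value


def random_input(rng):
    """Return either a valid quantity string or a string of junk characters."""
    if rng.random() < 0.5:
        return str(rng.randint(1, 999))
    length = rng.randint(0, 12)
    return "".join(chr(rng.randint(0, 0x2FF)) for _ in range(length))


def fuzz(iterations, seed):
    """Feed random inputs to parse_quantity; return inputs that broke the contract."""
    rng = random.Random(seed)  # fixed seed so any failure can be replayed
    failures = []
    for _ in range(iterations):
        candidate = random_input(rng)
        try:
            result = parse_quantity(candidate)
        except ValueError:
            continue  # rejecting junk cleanly is acceptable behaviour
        except Exception as exc:  # any other exception counts as a bug
            failures.append((candidate, repr(exc)))
            continue
        if not 1 <= result <= 999:
            failures.append((candidate, result))
    return failures
```

A box left running until ship date would simply call `fuzz` in a loop with fresh seeds and log any non-empty result together with the seed that produced it.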

      --
      thank God the internet isn't a human right.
    11. Re:Shouldn't that be too bloated to test? by revscat · · Score: 2, Insightful

      And I don't mean to just Microsoft-bash; they are just an easy target. Apple does it, most of the major Linux distros I've used do it, it seems like it is just the way the software industry works nowadays. And it is insane.

      Apple at least seems to be better about it. With one very notable exception where the contents of my iPod were completely erased, all of the software updates I have gotten from Apple have been flawless and for the most part made the product better. This includes point releases as well as security updates.

      I don't know the internals of Apple's development process, but I suspect that they are very disciplined in their QA process. I think Microsoft has driven them to this, because one of the prime differentiating characteristics between Apple and Microsoft software is quality.

    12. Re:Shouldn't that be too bloated to test? by EddieBurris · · Score: 2, Informative
      Besides which, once the flight control system version x.y is finished, the development team doesn't then immediately start working on flight control system version x.y+1 (or worse, version x+1.0). It isn't as if NASA finishes a shuttle, and then immediately starts building a new, improved shuttle.

      Every flight requires a new version of the primary flight control software and, because of the long lead time to prepare a version, they often have 2 or more in the works at the same time. At one time in 1983 there were 5 versions being worked on simultaneously.

      Reliability in the flight control software for the space shuttle comes at a price. Their cost per line of code is $350*. That buys more quality than most commercial vendors can afford.

      Eddie Burris

      *http://www.stsc.hill.af.mil/crosstalk/1998/11/krasner.asp/

    13. Re:Shouldn't that be too bloated to test? by Daniel · · Score: 4, Funny

      Even if I had fifty pears reviewing my code bugs are going to slip in because they don't fully understand what I am writing.

      Well, that's because pears can't code worth a darn. You should be using oranges. I know some people will hold out for bananas, but I've never had good luck with them; they're too fickle. Oranges will get the job done every time.

      Daniel

      --
      Hurry up and jump on the individualist bandwagon!
    14. Re:Shouldn't that be too bloated to test? by Reziac · · Score: 2, Interesting

      I knew a programmer who worked for Apple as a member of their core OS development team, back around MacOS7. He told horror stories about how poorly managed it was. One problem he specifically ranted about was that some manager would decide that YOU were DONE with a given project, and physically remove your work machine from your desk, give it to some other coder, and give YOU someone else's half-finished work (which you'd then have to figure out before you could work on it). So no one ever got to actually FINISH their coding, hence there was a lot of half-baked code, kludges, and workarounds. And management *forbade* them from publishing a patch to fix a particular broken firmware, because management wanted people to buy their next machine (with fixed firmware), not just fix the old one!!

      Anyway, my point is that just because what you see on the surface looks polished, doesn't necessarily mean the QA or development process is any better.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    15. Re:Shouldn't that be too bloated to test? by winwar · · Score: 2, Interesting

      "If only they really understood this. I presented this to my manager, and he said "But cost is free, because everyone is salaried and can just work overtime." He was serious."

      And some say that programmers/coders/employees don't understand business....

      Granted, from his perspective, it WAS free. Wouldn't seem to be a good way to run a business but there seem to be a lot of businesses that make lots of money operating that way.

  5. Code can't be too big, just badly designed by Welsh+Dwarf · · Score: 4, Insightful

    For those who didn't RTFA, it is basically saying that exaustive (?sp) testing can't be done on a large codebase, and random testing is all you can use, to which most coders will say bull.

    If a piece of code is too big to test exhaustively, it's time to refactor it into bits that can be.

    After you've tested each part to make sure it works, you test a superset of parts, thus testing the interactions between the smaller parts; lather, rinse, repeat until you've tested the whole application.

    Correct use of unit testing will always outstrip random testing.

    This is just an excuse for badly designed code bases.

    --
    Ask 8 slackers a question, get 10 awnsers (a citation, but I can't remember from who)
    1. Re:Code can't be too big, just badly designed by NickFitz · · Score: 4, Insightful

      For those who didn't RTFA, the parent post is talking complete nonsense when claiming that "it is basically saying that exaustive (?sp) testing can't be done on a large codebase, and random testing is all you can use".

      Headings from the article include:

      • Good unit testing (including good input selection)
      • Good design (including dependency analysis)
      • Good static checking (including model property checking)
      • Concurrency testing
      • Use code coverage to help select and prioritize tests
      • Use customer usage data
      • Choose configuration interactions with all-pairs

      All in all, it's a good article, and may go some way to explaining why MS's XML component actually works (I write code to it all day, every day).
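For readers unfamiliar with the last heading in that list, all-pairs selection picks a small set of configurations such that every pair of values from two different parameters shows up together at least once. The following is only a greedy sketch, not the method from the article: it enumerates the full product of values, so it suits small configuration spaces only, and the parameter names in the usage note are made up.

```python
from itertools import combinations, product


def all_pairs(parameters):
    """Greedily pick configurations until every pair of values from two
    different parameters appears together in at least one of them."""
    names = list(parameters)

    # Every cross-parameter value pair that still needs a test.
    uncovered = set()
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            uncovered.add(((a, va), (b, vb)))

    def pairs_in(config):
        """All cross-parameter value pairs exercised by one configuration."""
        return {((a, config[a]), (b, config[b])) for a, b in combinations(names, 2)}

    # Exhaustive candidate list; fine for a sketch, too big for real products.
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_in(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_in(best)
    return chosen
```

With hypothetical parameters such as `{"os": ["Windows 2000", "Windows XP"], "browser": ["IE", "Firefox", "Opera"], "ssl": ["on", "off"]}`, exhaustive testing needs 12 configurations, while the list returned by `all_pairs` is shorter and still exercises every two-way interaction.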

      --
      Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
    2. Re:Code can't be too big, just badly designed by MarkRose · · Score: 2, Insightful

      If a piece of code is too big to test exhaustively, it's time to refactor it into bits that can be.

      Yeah, I told that to my boss about the product that my predecessors have been working on for years, without any test cases. Internally it's a convoluted entwined mess. I estimated about a man-year to break it down and build it up again, with exhaustive test cases of all the parts. He laughed at the idea, and didn't see the business benefit.

      This is just an excuse for badly designed code bases.

      So what do you do with them when you are handed them?

      Start.

      You don't have to completely refactor the code -- but there is no reason why you can't refactor parts as you work on them. That happens all the time in the Linux kernel for instance. I would imagine every component in the kernel has been rewritten at least twice -- but not once was the whole kernel replaced.

      --
      Be relentless!
    3. Re:Code can't be too big, just badly designed by Welsh+Dwarf · · Score: 2, Informative

      And for those who didn't RTFA to the end:

      The author is suggesting pseudo-random testing rather than exhaustive testing for a large code base, which may be a valid point when you inherit a large piece of monolithic code, but should never be used for a fresh project, where complete, staged testing is the only way to avoid a complete kludge.

      David

      --
      Ask 8 slackers a question, get 10 awnsers (a citation, but I can't remember from who)
  6. Too costly to test would be the real meaning of it by Gopal.V · · Score: 4, Interesting

    The article just says what everyone knew ..

    * code coverage != proper testing
    * clever inputs are needed to test
    * few programmers test concurrency

    Ending with - "ECONOMY IN TESTING" (ever heard about "Good Enough Isn't")

    Essentially apologetic about the lack of testing. Test driven development is not a philosophy, it's a way of doing. In a perfect company environment, you'll never be blamed for breaking someone's code - but in most places the idea is "he made me look bad". Peer reviews never work out properly. This is why FOSS is turning out more secure and clean code.
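The first two bullet points above can be shown in a few lines. This is a sketch with a hypothetical `average` function: one test executes every line of it, so a coverage tool would report full line coverage, yet a cleverer input still breaks it.

```python
def average(values):
    """Hypothetical function under test."""
    return sum(values) / len(values)


def test_average_typical():
    # This single test executes every line of average(), so a coverage
    # tool would report 100% line coverage for it.
    assert average([2, 4, 6]) == 4


def test_average_empty():
    # A "clever" input that the coverage figure says nothing about:
    # an empty list makes len(values) zero.
    try:
        average([])
    except ZeroDivisionError:
        return "bug found: empty input is not handled"
    return "no bug found"
```

Coverage tells you which code ran, not which inputs were tried, which is why the choice of inputs matters as much as the percentage.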

  7. Re:Got an idea by MarkRose · · Score: 3, Insightful

    Open-sourcing a project will do little to nothing in regards to testing. First, there is often little to no incentive to attract open-source developers. Second, a poor design is a poor design, and those in charge are highly unlikely to throw a working design out (two rare exceptions are Apple's move to Mac OS X and Microsoft's move to NT). Third, open-source developers frequently have no incentive to test -- testing is boring and laborious. And while the occasional person may fix the occasional bug, on the whole, opening the source of a product purely for testing purposes is almost always a fruitless exercise.

    --
    Be relentless!
  8. Testing for real-world use by G4from128k · · Score: 3, Interesting

    I recently had a problem with ordering from Amazon that illustrates the problem with testing and all the possible permutations of user actions. I was checking out when I noticed the high shipping cost from one vendor, went back to order from a different vendor and hosed the order. Apparently, there was only one of the item in stock and it was now committed to the pending, partially checked-out order. There was no way to clear the partially complete check-out process and no way to checkout with the item in my shopping cart -- it would only complain that I was trying to order TWO of the item and pull the ONE instance of the item from the cart.

    Amazon is not the only e-commerce site with this problem (although I expected better from Amazon). Many sites fail to test for user action sequences other than the straight-through order process. I'm not suggesting that developers test for all possible sequences (that's impossible), but they should test for more plausible ones than a simple linear execution of the process.

    When I did software testing (a task that I hated), I quickly broke an RDBMS application with just a simple series of adding and removing items from a user-manipulable working set of data objects. Moreover, I even broke the UI layer and dumped myself into a lower level of the RDBMS shell that was supposedly inaccessible to users. The developers grew to hate me so much for finding bugs in their code and the RDBMS vendor's code that I was moved to another job (YAY!).

    The point is that it is often too easy to break code because the developers have created overly simple linear use cases that are then used in testing.
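One way to get beyond the linear use case is to enumerate short sequences of user actions and check an invariant after each one. The sketch below is purely illustrative: `Store`, `BuggyStore` and the action names are invented, and the deliberate bug only loosely resembles the stranded-reservation problem described above.

```python
from itertools import product


class Store:
    """Hypothetical single-item store with a cart and a pending checkout."""

    def __init__(self, stock):
        self.stock = stock      # units on the shelf
        self.in_cart = 0        # units in the shopper's cart
        self.reserved = 0       # units held by a started checkout
        self.sold = 0           # units paid for

    def add(self):
        if self.stock > 0:
            self.stock -= 1
            self.in_cart += 1

    def remove(self):
        if self.in_cart > 0:
            self.in_cart -= 1
            self.stock += 1

    def begin_checkout(self):
        self.reserved += self.in_cart
        self.in_cart = 0

    def go_back(self):
        # Returning to the cart releases the reservation.
        self.in_cart += self.reserved
        self.reserved = 0

    def pay(self):
        self.sold += self.reserved
        self.reserved = 0


class BuggyStore(Store):
    def go_back(self):
        # Deliberate bug for illustration: the reservation is never released.
        self.in_cart += self.reserved


ACTIONS = ("add", "remove", "begin_checkout", "go_back", "pay")


def broken_sequences(store_class, stock=1, max_length=4):
    """Try every action sequence up to max_length; return those that leave
    the unit counts negative or no longer summing to the initial stock."""
    failures = []
    for length in range(1, max_length + 1):
        for sequence in product(ACTIONS, repeat=length):
            store = store_class(stock)
            for action in sequence:
                getattr(store, action)()
            counts = (store.stock, store.in_cart, store.reserved, store.sold)
            if min(counts) < 0 or sum(counts) != stock:
                failures.append(sequence)
    return failures
```

The straight-through path `("add", "begin_checkout", "pay")` passes for both classes. Only the detour `("add", "begin_checkout", "go_back")` exposes `BuggyStore`, which is the kind of sequence a purely linear test plan never runs.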

    --
    Two wrongs don't make a right, but three lefts do.
    1. Re:Testing for real-world use by Chris+Kamel · · Score: 4, Interesting

      The developers grew to hate me so much for finding bugs in their code and the RDBMS vendor's code that I was moved to another job (YAY!).
      I don't know what kind of developers you were dealing with there, but I am a developer myself and I actually like and respect QA or test engineers who come up with creative and "smart" bugs, they keep it interesting, they make my job easier and they make for a more successful product, so what's there to hate about them?

      --
      The following statement is true
      The preceding statement is false
    2. Re:Testing for real-world use by gstoddart · · Score: 2, Interesting
      I don't know what kind of developers you were dealing with there, but I am a developer myself and I actually like and respect QA or test engineers who come up with creative and "smart" bugs, they keep it interesting, they make my job easier and they make for a more successful product, so what's there to hate about them?

      As much as I rely on our QA people to come up with bizarre inputs, sometimes bug reports from QA can be a bitch to decode. They'll have the tester's perceived explanation of the source of the bug, which may or may not jibe with the actual one; it's like user-reports -- sometimes the interpretation is a red-herring explanation.

      I've had to explain that the bug they saw was in other code because it caused a bizarre interaction it wasn't supposed to.

      Unfortunately, users submit bug reports to the software they were interacting with.

      --
      Lost at C:>. Found at C.
    3. Re:Testing for real-world use by Anonymous Coward · · Score: 2, Insightful

      1) no end user would ever encounter since it's so arcane and

      Assuming an end user will never encounter an error because it's "arcane" is a really good way to get your ass handed to you. The parent was probably GOOD for testing because he at least knows how to describe the problems and is familiar with the systems. You take the average drone user who doesn't know jack about the system when it blows up and you're talking about a huge argument where no one knows what the hell is going on.

      I agree about the "hard to fix" problem though. I mean if your code is 99% right with a bug that will be rare and non-critical and it will raise the cost of the software significantly - then perhaps it's better to let it slide. Or just try to throw up some last minute defenses.

  9. Structural problems by ites · · Score: 4, Insightful

    It is possible to build immense and complex code bases that are incredibly well tested and robust. Look at any Linux distribution and this is what you have.

    The key is that the code base is structured so that it can evolve over time as many independent layers and threads, each using an appropriate technology and competing in terms of quality and functionality.

    The problem is not the overall size of the code base, it's the attempt to exert centralised control over it.

    To take a parallel from another domain: we can see very large economies working pretty well. The economies that fail are invariably the ones which attempt to exert centralised planning and control.

    The solution is to break the code base into independent, competing projects that have some common goal, guidelines, and possibly economic rationale, but for the rest are free to develop as they need to.

    Not only does this make for better code, it is also cheaper.

    But it's less profitable... and thus we come to the dilemma of the 2000s: attempt to make large systems on the classical model (which tends towards failure) or accept that distributed cooperative development is the only scalable option (and then lose control over the profits).

    --
    Sig for sale or rent. One previous user. Inquire within.
  10. Retooled jokes by Ford+Prefect · · Score: 4, Funny

    "Yo' codebase's so fat, when it get in a lift it has to go down!"

    "Yo' codebase is so bloated, it's got its own dialling code!"

    "Yo' codebase's so big, NASA includes it in orbital calculations!"

    Etc. etc., ad nauseam et infinitum...

    Software rewrites may be considered harmful, but at which point do you declare that enough is enough and start again, breaking it down into smaller, easily tested modules? Big, old projects (like, say, OpenOffice.org) can get so appallingly baroque that there must be vital areas of code which haven't been modified (or, more importantly, understood) in years - how do you test those?

    --
    Tedious Bloggy Stuff - hooray?
    1. Re:Retooled jokes by DrMrLordX · · Score: 5, Funny

      If it ain't baroque, don't fix it.

      Ha ha! Ha ha ha!

      *cough*

    2. Re:Retooled jokes by starseeker · · Score: 5, Insightful

      I still contend that rewrites are harmful only when all of these three conditions are met:

      a) Your code is your commercial product/livelihood

      b) You need to support legacy systems

      c) You are coding for practical results not for the art of programming.

      Joel is an insightful guy, but he approaches software exclusively as a deliverable intended to Get The Job Done Now. For a lot of software this is appropriate, but in the case of open source software it is seldom that all of the above conditions are met. There are also a couple of points he doesn't mention that are relevant to open source software:

      d) Users of the old code are not left out in the cold - the complete old codebase is available for them to pick up and maintain (or hire someone to maintain - maybe even the original author) if there is sufficient motivation. Open source authors often aren't motivated to maintain steaming piles of turd just for the joy of it, so they are more inclined to do rewrites. If you want them to maintain old stuff, do like everyone else who really wants some service and hire them!

      e) The software stack is almost completely free for open source software - there is no "but I can't afford to upgrade to Windows 98 and break everything!" problem. Granted you might run into those problems, but in theory if you care enough they can be solved. (Often NOT true for legacy commercial software.) So open source developers as a whole are a lot less concerned with backwards compatibility. Take KDE for example - the incentive to support KDE2 when coding a KDE app today is virtually nil - there are many very good reasons KDE3 exists, both from a user AND a developer standpoint. If a user really wants the crap handled to deal with old, broken environments they shouldn't expect to get something for free. The point, again, is that they CAN hire someone to do what they want, because the code is available to be updated.

      Now, that said, I would agree that OpenOffice is too critical to the free software world to rush off and be headstrong about. It might be a case where a Netscape type move would be a bad idea. But I like the enlightenment project, even if they have treated violating Joel's rules like a pro sport. They are creating something artistic, advanced, and with the intent of "doing it right". If you look at enlightenment as not a continuation of the old e16, but instead as a totally new product, then it takes on a different light - they are actually doing prototypes, designing and testing, etc. BEFORE they release it in the wild and invite support headaches. Now, as usual first to market wins, but in open source losers don't always die and can sometimes come back from the grave. Rosegarden is an example of an application that is good because they explored their options and found a good one, even with and partially because of their experience on previous iterations of the code. They didn't do it "the Joel way" but they did it in the end and they did well.

      I think there is another "zen" of programming, that we are getting closer to reaching - the "OK, we have discovered the features we want and use, now let's code it all up so we never have to do it again" level. There is little that is surprising in spreadsheets, databases, word processors, etc. - they are mature applications from a "user expected featureset" point of view. So now I propose we do, not just a rewrite, but a reimplementation using the most advanced tools we have to create Perfect software. Proof logic, careful design, theorem provers, etc. etc. etc. We know, in many cases, what program/feature/OS behavior/etc. we want. Let's formalize things as much as humanly possible, and make a bulletproof system where talking about rewrites makes no sense, because everything has provably been done the Right Way. (Yes, I'm watching the coyotos project - they've got the right attitude, and they might determine if it is possible.)

      --
      "I object to doing things that computers can do." -- Olin Shivers, lispers.org
  11. Not darned testable by tezza · · Score: 4, Interesting
    At least by a computer.

    I do a lot of programming with visual output. It is impossible to have a computer check that the font got outlined correctly in the PDF, say.

    When you combine this with user input and then rare-case branching logic, you can end up with a nightmare of unfollowed paths. Unfollowed, to some extent, means untested.

    Just one extra branch can be disastrous because of the factorials involved, depending on where it is placed in the branch pipeline. One minute, everything working, next minute some new code and (n+1)! things that need to be eyeballed.
    --
    [% slash_sig_val.text %]
    1. Re:Not darned testable by BenjyD · · Score: 2, Interesting

      I've faced this problem too with checking visual output. What I will probably do at some point is do automated screenshot comparison: have the system do the test, then compare the relevant region of the screen to a known-good image as a regression test. The only problem I can see with that is that generating the known-good images is time-consuming and minor changes would require regenerating them all.
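
      The screenshot comparison described above could be sketched roughly like this. It is only an illustration: images are assumed to be plain lists of rows of (r, g, b) tuples, and the names crop, count_mismatches and region_matches are made up for the example rather than taken from any real toolkit.

```python
# Hypothetical sketch: an "image" is a list of rows, each row a list of (r, g, b) tuples.

def crop(image, left, top, width, height):
    """Return the rectangular region of the image that the test cares about."""
    return [row[left:left + width] for row in image[top:top + height]]


def count_mismatches(actual, expected, tolerance=0):
    """Count pixels where any colour channel differs by more than `tolerance`."""
    same_shape = len(actual) == len(expected) and all(
        len(a) == len(e) for a, e in zip(actual, expected)
    )
    if not same_shape:
        raise ValueError("region sizes differ")
    mismatches = 0
    for row_a, row_e in zip(actual, expected):
        for pixel_a, pixel_e in zip(row_a, row_e):
            if any(abs(ca - ce) > tolerance for ca, ce in zip(pixel_a, pixel_e)):
                mismatches += 1
    return mismatches


def region_matches(screenshot, known_good, region, tolerance=0, max_mismatches=0):
    """Regression check: compare one region of a screenshot to a known-good image."""
    left, top, width, height = region
    actual = crop(screenshot, left, top, width, height)
    return count_mismatches(actual, known_good, tolerance) <= max_mismatches
```

      Comparing only a region, with a per-channel tolerance and a mismatch budget, is one way to cut down on how often the known-good images need regenerating after minor changes. It doesn't remove that cost.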

  12. Re:Got an idea by oirtemed · · Score: 4, Funny

    Yes. It's a wonder why we even have packages like bugzilla anyhow. Nobody tests and reports bugs in opensource software. Ever. Nobody fixes them, either. Ever.

  13. Re:Testing, by Anonymous Coward · · Score: 3, Insightful

    Where did you learn the trade? If I guessed you're pretty fresh from a Computer Science course, say two or three years in the business, would I be far from the mark?

    I ask because what you describe is exactly what is supposed to happen. You know you're done when, surprise, QA stop sending you bugs (Or at least, stop finding bugs which are classified above a certain severity level). Then, and only then, should the software be considered complete and ready.

    The problem is that attitudes like yours, that QA is a pain that should be wished away, are wrong but very, very prevalent within the IT industry. It is such a wrong and totally backward attitude I can't fathom where it came from. It's a brain rot that's killing the industry in a sea of broken code and half-assed implementations.

  14. Article summary... by TuringTest · · Score: 3, Informative

    ... automatically performed by OTS:

    Finally, testers can use models to generate test coverage and good stochastic tests, and to act as test oracles. A fundamental flaw made by many organizations (especially by management, which measures by numbers) is to presume that because low code-coverage measures indicate poor testing, or that because good sets of tests have high coverage, high coverage therefore implies good testing (see Logical Fallacies sidebar). One of the big debates in testing is partitioned (typically handcrafted) test design versus operational, profile-based stochastic testing (a method of random testing). Current evidence indicates that unless you have reliable knowledge about areas of increased fault likelihood, then random testing can do as well as handcrafted tests.[4,5]

    For example, a recent academic study with fault seeding showed that under some circumstance the all-pairs testing technique (see Choose configuration interactions with all-pairs later in this article) applied to function parameters was no better than random testing at detecting faults.[6]

    The real difficulty in doing random testing (like the problem with coverage) is verifying the result. A test design implication of this is to create relatively small test cases to reduce extraneous testing or factor big tests into little ones.[9]

    Good static checking (including model property checking). If you know the coverage of each test case, you can prioritize the tests such that you run tests in the least amount of time to get the highest coverage. First run the minimal set of tests providing the same coverage as all of the tests, and then run the remaining tests to see how many additional defects are revealed. Models can be used to generate all relevant variations for limited sizes of data structures.[13,14] You can also use a stochastic model that defines the structure of how the target system is stimulated by its environment.[15] This stochastic testing takes a different approach to sampling than partition testing and simple random testing. Code coverage should be used to make testing more efficient in selecting and prioritizing tests, but not necessarily in judging the tests. Test groups must require and product developers must embrace thorough unit testing and preferably tests before code (test-driven development).
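
    The coverage-based prioritization mentioned in the summary could look something like the sketch below. It assumes you already have a map from each test to the set of code items it covers. The function name prioritize and the sample test names are invented for illustration. The greedy pick only approximates a truly minimal set, and run time per test is ignored here.

```python
def prioritize(coverage):
    """Split tests into a core set that reaches full coverage and the remainder.

    coverage maps a test name to the set of items (lines, branches, ...) it covers.
    Greedy heuristic: repeatedly pick the test that adds the most new coverage.
    """
    remaining = dict(coverage)
    covered = set()
    core, rest = [], []
    while remaining:
        # sorted() makes ties deterministic: the alphabetically first test wins.
        name = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = remaining.pop(name) - covered
        if gain:
            core.append(name)
            covered |= gain
        else:
            # Adds nothing new: run it later to see what extra defects it reveals.
            rest.append(name)
    return core, rest


if __name__ == "__main__":
    sample = {
        "t_parse": {1, 2, 3},
        "t_save": {4, 5},
        "t_load": {1, 2},
        "t_edit": {3, 4},
    }
    core, rest = prioritize(sample)
    print("run first:", core)
    print("run afterwards:", rest)
```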

    --
    Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
  15. The Oracle Problem by Goonie · · Score: 4, Interesting
    One point that this article doesn't really come to grips with regards to stochastic testing is the "Oracle Problem". In essence, how do you know that the result of testing is the right answer? This is a particular problem with random-input testing, or any testing method that involves using automatic methods to generate a large number of tests.
    #ifdef PLUG

    My own research group works on methods to reduce this burden in a number of ways. One, my personal work, is on "semi-random" testing (we call it Adaptive Random Testing) which, we claim, detects more errors with fewer tests and reduces the problem that way. Another is "metamorphic testing" which tackles the oracle problem more directly by a slightly more sophisticated form of sanity checking assertions. You test the program with two (or more) related inputs, and check whether the outputs have the relationship you'd expect based on the inputs.

    Unfortunately, the boss has an, um, slightly behind-the-times attitude to putting papers on the web; but if you search the DBLP bibliography server for T.Y. Chen you can get references for most of them.

    #endif

    However, I'd be the last to claim that we have a complete solution to the oracle problem; there will of course never be one. But it is a problem that will continue to make automated testing a challenge.
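
    For anyone who hasn't seen metamorphic testing, here is a toy sketch of the idea, not taken from the papers mentioned above. The program under test is a sort routine, and the two relations (permuting the input, adding one element) are assumptions chosen for illustration. Note that no oracle for the "right" sorted output is needed.

```python
import random


def check_metamorphic(sort_fn, trials=200, seed=0):
    """Return True if sort_fn satisfies two metamorphic relations on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]

        # Relation 1: permuting the input must not change the output.
        shuffled = xs[:]
        rng.shuffle(shuffled)
        if sort_fn(xs) != sort_fn(shuffled):
            return False

        # Relation 2: adding one element must make the output exactly one longer.
        extra = rng.randint(0, 9)
        if len(sort_fn(xs + [extra])) != len(sort_fn(xs)) + 1:
            return False
    return True


def buggy_sort(xs):
    """Deliberately broken: silently drops duplicate values."""
    return sorted(set(xs))


if __name__ == "__main__":
    print("sorted passes:", check_metamorphic(sorted))
    print("buggy_sort passes:", check_metamorphic(buggy_sort))
```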

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
    1. Re:The Oracle Problem by pfdietz · · Score: 2, Insightful

      I'm a great fan of randomized testing, and have used it to good effect in testing Common Lisp compilers as part of the GNU CL ANSI test suite. The oracle problem is tractable, since one can do differential testing -- test that two different computations that should produce the same answer actually do. For example, construct a random lisp form, then eval it, and also wrap it in a lambda form, then compile and funcall. Differences in output, errors during compilation, or errors during execution all indicate bugs (assuming one has generated legal lisp code.)
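
      A rough Python analogue of that eval-versus-compile trick, for readers who don't speak Lisp. This is an assumed toy setup, not the actual test suite: random integer expressions are evaluated once by a small tree-walking interpreter and once by turning them into a Python lambda. All names here are made up for the sketch.

```python
import random

# Only exact integer operators, so the two evaluation paths must agree exactly.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_expr(rng, depth):
    """Build a random expression tree: 'x', an int, or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", rng.randint(-9, 9)])
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def interpret(expr, x):
    """Path 1: walk the tree directly."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](interpret(left, x), interpret(right, x))
    return x if expr == "x" else expr


def to_source(expr):
    """Render the tree as Python source, fully parenthesised."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f"({to_source(left)} {op} {to_source(right)})"
    return f"({expr})"


def compile_expr(expr):
    """Path 2: hand the generated source to Python itself."""
    return eval("lambda x: " + to_source(expr))


def differential_test(trials=500, seed=0):
    """Return the (expression, input) pairs on which the two paths disagree."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        expr = random_expr(rng, depth=4)
        x = rng.randint(-100, 100)
        if interpret(expr, x) != compile_expr(expr)(x):
            failures.append((expr, x))
    return failures
```

      Any disagreement points at a bug in one of the two paths (or in the generator) without anyone having to say what the correct value is.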

      This and other more focused random testing schemes have found oodles of bugs in many Common Lisp implementations.

  16. sigh...... by sunami · · Score: 2, Insightful

    ...such is the outcome of not doing test-driven development. Test the functions as you write them, and just leave the tests there until you release, makes sure everything works. When will these people learn!

    1. Re:sigh...... by GileadGreene · · Score: 2, Insightful
      Sigh. I can't believe this got rated "insightful".

      Testing functions as you write them is fine (and the article advocates unit-testing). The problem comes when you have a large and complex system that integrates a lot of individual functions, particularly where you have loads of concurrency. Each individual function may be working fine, but the unexpected interactions between these functions can come back to bite you, and the combinatorial explosion of system states is such that full testing can be well-nigh impossible. Which is kind of the point the article was trying to make.

      So what do we do if we can't fully test? We look to things like design-by-contract (i.e. minimize unexpected interactions), model-checking of abstract system models, carefully designing the test-cases we do perform to achieve good statistical coverage, and actually engineering a system instead of just perpetrating random acts of hackery until it "works".

  17. Random vs. Handcrafted testing by starseeker · · Score: 3, Insightful

    I've seen this debate before, and the part I always wonder is "why not both?" At least, when you are starting from scratch. You can verify your components do what they are supposed to and then check for bizarre situations no one thought of with random testing (sometimes you will expose obscure bugs in the software stack itself, not just your code - but remember no code stands alone, and all crashes look the same to the end user no matter what the root cause.)

    Particularly on large, old projects one has inherited, random testing can really help because you have absolutely no clue what you are looking for. There are so many discrete components to the system that could be tested it would be the work of ten years to set it up, so you are forced to (as much as possible) assume that things work and find the cases where they don't. Then, you gradually begin to fix things over the long haul while fighting fires.

    GCL and the other free Lisp implementations are a good example of testing - we have a very dedicated individual who has been creating tests of ANSI behavior from the spec and testing a wide variety of implementations - indeed many non-standard behaviors have been corrected because of these tests. He has also created a "random tester", which I like to call "the Two Year Old Test." It is a code generator which generates random but legally valid Lisp code and throws it at the implementation. It has exposed some very obscure bugs in GCL which probably would have otherwise hidden for years. Anybody who has been around small kids knows they will introduce you to all sorts of new failure modes in just about everything you own, so I always think the Two Year Old Test should be administered as a final check whenever possible. (Granted this works particularly well for compilers.) Newbies are very useful for this kind of stuff as well, because they will use the software in ways you never thought to.

    --
    "I object to doing things that computers can do." -- Olin Shivers, lispers.org
  18. On a similar note, Linux QA by starwindsurfer · · Score: 2, Insightful

    Is anyone doing independent QA audits of linux, outside of the development sources/bug report/linus ruling on high loop?

    --
    If you resist reading what you disagree with, how will you ever acquire deeper insights into your own beliefs?
  19. M$ are much more clever than that (-1 Flamebait) by patrixx · · Score: 2, Funny

    They call their monkeys "End users" and charge them big bucks for the testing, and on top of that have them accept EULAs that take away their rights. ;-)

  20. Re:Got an idea by Skye16 · · Score: 2, Informative
    Wait wait wait, I have one!
    Flamebait: Linux sucks!
    Flamebait: Apple sucks!
    Flamebait: Windows is the best!
    Flamebait: The United States is the best!
    Flamebait: The United States sucks!

    THAT is flamebait.

    At worst, waaaaaay yonder (what, great grandparent?), he was trolling, but I thought he was just being facetious.
  21. Re:Got an idea by Skye16 · · Score: 2, Funny

    Oh, come on, you know the moderators here are on crazy. They're the same people who post here, and you've read their comments. Slashdot is some sort of Mecca for insanity. It makes me tingle down there.

    (That's me being silly, in case your funny bone is still broken =O )

  22. Right On, Etc. by 4of12 · · Score: 2, Insightful

    [TFA] Another great way to target testing is based on actual customer usage.

    This is a really good idea.

    The crash feedback system in Mozilla exhibits this model of testing.

    I think more of the casual user applications I run on the desktop should be compiled with debugging and a simple transparent mechanism for returning information to the developers about problems.

    Nothing mandatory, no hidden information sent back to the mother ship, just a text file showing back traces, etc. that the user can see contains no sensitive information.

    Thus all users become beta users who can feed back to the developer which bugs really matter.

    Taken to the next step of optimization and UI design, developers can find out which code paths really matter in terms of real life usage if the application is instrumented with profiling turned on and the option for the user to feed back information this way. IIRC, some compilers have options to take advantage of run-time statistics to better compile the second time around.

    --
    "Provided by the management for your protection."
  23. bloated code, or just poorly written? by Targon · · Score: 4, Interesting

    Back in the old days, a common way to write a program was to make code that can be used in many different places from within the program. Routines that are similar would be considered a bad thing, so you make routines that are designed to handle the different situations that need similar code.

    The problem with Microsoft is that they have forgotten or never learned how to design a program before their people have started to write anything. As a result, we see 384k patches from Microsoft that take several minutes to install on some systems.

    Another problem is that there is a LOT of duplicate code that is in use even within common libraries.

    The people who suggest that there are too many features are almost correct, but the problem isn't with the number of features, it's the way those features are added to programs.

    Also, there is only so far you can take a given design while you add features before things start to break due to design. If you start with a good DESIGN, then implement that design in code, it becomes a LOT easier to debug.

    Microsoft needs to come up with a NEW OS that isn't an extension of Windows NT or Windows 3.0 (95/98/ME are still based on that old code in many ways). Windows NT was the right idea back when it was first developed. Toss the old design, start from scratch, and you end up with a better product. The only problem that Windows NT really had was that compatibility wasn't written into the core design of the OS, it was a layer added on top, which means you need a "translator" to handle that. If it's in the design, then you figure out how to do the emulation of the old system in a way that is compatible with the "new" way of doing things. Today, it's not as difficult as it used to be back in those early days of Windows NT. We have enough processing power to make virtual machines that can handle just about anything if they are coded properly. The only problem is that the emulation of the old DOS environment or Windows environment hasn't been implemented by Microsoft.

    But I've gone off topic a bit. The key to easily debugged code is to design in a way to make things properly modular. Almost all features within Windows should be TIGHT code. To open a file probably has 200 different versions of that code within the Windows XP code base scattered through all the programs that come with Windows XP or 2003. Think about that, and wonder why it's hard to debug.

  24. I wonder what the IRS would say... by ebuck · · Score: 2, Interesting

    If you claimed an income tax return too big to audit for accuracy, or better yet, too big to file.

  25. What's wrong with unit testing? by Jerk+City+Troll · · Score: 2, Interesting

    Instead of trying to test huge code bases, why not write decoupled systems and test small pieces of code? Oh wait, that requires effort.

    I've worked on a number of projects (that borderline on huge) which have a thorough set of unit tests. Each test sets up pre and post conditions and checks the output against what we expect. (Duh!) It's not difficult, it just requires planning and careful attention to detail.
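
    To make that concrete, here is a minimal sketch of the kind of test I mean, using Python's standard unittest module. The Account class is a made-up stand-in for a small decoupled component, not code from any of those projects.

```python
import unittest


class Account:
    """Hypothetical component under test, small enough to run in isolation."""

    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


class WithdrawTest(unittest.TestCase):
    def setUp(self):
        # Set up the precondition every test starts from.
        self.account = Account(balance=100)

    def test_withdraw_reduces_balance(self):
        self.assertEqual(self.account.balance, 100)      # precondition
        self.assertEqual(self.account.withdraw(30), 70)  # expected output
        self.assertEqual(self.account.balance, 70)       # postcondition

    def test_overdraft_is_rejected(self):
        with self.assertRaises(ValueError):
            self.account.withdraw(500)
        self.assertEqual(self.account.balance, 100)      # state left untouched


def run_tests():
    """Run the suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WithdrawTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

    Because Account has no dependencies on anything else, the tests need no scaffolding beyond setUp.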

    If you've ever built Perl from source, you'll notice that the entire code base gets tested during the process.

    I have to say that it's not about theory or speculation, it's just about hunkering down and doing it.

    Testing, fundamentally, is not that hard. I think the real problem is developers often trying to find excuses to either put it off or worse yet, not do it at all. Added to the problem are badly designed architectures where most components have tight dependencies with others. This prohibits running them in isolation and hence limits testability. Naturally, it's always more complicated than this (budgets of time and money) but the root of the problem is lack of motivation or ignorance of the benefits of having easily and hence well tested code.

  26. The Waterfall that wasn't by willCode4Beer.com · · Score: 2

    In the original paper, "Managing the Development of Large Software Systems" by Winston Royce, describing the waterfall model, Royce actually points to the flaws of that process. He actually says that it "is risky and invites failure".
    It appears that at some point some PHB's saw the paper, looked at the pictures (instead of reading) and decided "we should all use the waterfall development process".
    As for iterative development, I couldn't agree with you more. And it's also what Royce was really getting at, where each "phase" provides feedback to the one before it. And if a project follows the steps you outlined for each iteration, as well as doing some refactoring for lessons learned, you will generally see the bugs go down and the code quality steadily increase.
    OTOH, trying to suddenly apply test scaffolding around an existing large codebase can be a very painful process. Many times it's actually easier to just rebuild using the original app as the basis for the new test cases. And the NIH syndrome weighs heavily.

    --
    ----- If communism is a system where the government owns business, what do you call a system where business owns govern
  27. But why would they do that for free? by mosel-saar-ruwer · · Score: 2, Insightful

    The trick for beta Testing is to get as many eyes on it as possible who know that this isn't a completed or stable product. and are able to try funky things to break it.

    I agree with that statement.

    However, what you've described takes an enormous amount of time and effort and [background] knowledge, to the extent that "try(ing) funky things to break it" could very well become a full time job. Hell, just spending the time necessary to read the documentation [and surf the web looking for "gotchas"], solely for the purpose of figuring out how to INSTALL a piece of software, is d@mned near a full time job.

    But in the real world, people get paid to do full time jobs - in fact, they even get paid to do part-time jobs. And if their job title is something like "Senior Testing Engineer for Quality Control", then they get paid sh1tl0ads of money.

  28. Re:test every square root? by Anonymous Coward · · Score: 2, Interesting

    That's exactly how Intel ended up with a FP bug in one of their processors...

    So much for your theory on testing.

    Random sampling testing is only good for the testing of identical product production, to test for trends in product manufacturing. It is absolutely NOT the way to test the function of software, well, except that it can become impossible to exhaustively test, as the paper mentions.

    That is why we have the theorem that states that it is impossible to completely test any software greater than a given size. And that size is amazingly small.

    Frankly, "Common Sense" is more frequently than not "No sense at all". It betrays a complete lack of understanding of the real world, which is infinitely more complex than "Common Sense" ever gives it credit for.