Lack of Testing Threatening the Stability of Linux
sebFlyte writes "Andrew Morton, a Linux kernel maintainer, has said that he thinks that the lack 'credit or money or anything' given to those people who put in long hours testing Linux releases is going to cause serious problems further down the line. In his speech at Linux.Conf.Au he also waded into the ongoing BitKeeper debate, saying 'If you pick a good technology and the developers are insane, it's all going to come to tears.'"
I thought good technology required insane developers...
...does it seem like Linus might need a vacation?
TFA states that he's starting to take as much pride in rejecting patches as he does accepting them, and with this whole BitKeeper thing, it seems to me like he might need a small break.
Of course, I'm not one to really talk, as I don't do nearly as much as he does with Linux...
Also, with regards to testing, those of us who use it daily are testing all the time. I know it's not structured QA, but still, it's a lot of testing.
Also, maybe slowing down the kernel releases a bit might help. I know that I do an emerge world on my Gentoo boxes about once a week, and it seems like there's a new kernel release every week. If there's a need for more testing, perhaps a little less time releasing and more time testing is in order.
DBA? Software Engineer? My company is hiring! Click
where the testers (a.k.a. users) get to pay $$$ for the privilege of testing OS stability.
Must be why there is a huge popularity of mugs at MS bearing the obnoxious logo:-
"You don't have to be a developer to work here, but it helps".
Coming from someone who was at that talk, he specifically said NOT to give money to testers. His words were actually 'give them credit, fame and loose women'.
This drew laughs from the audience.
That's the converse. The contrapositive is, after a quick application of de Morgan's law:
"If it doesn't come to tears, then you didn't pick a good technology or your developers are sane."
I thought the whole Fedora project WAS mass testing of "cutting edge technology for Linux". Have I been wasting my time submitting bugs? Most have been fixed that I submitted so far.
Two roads diverged in a wood, and I - I took the one the bus load of girls just went down.
Long answer; kinda. You can use core dumps and system logs to interpret what's going on, but you can never really know for sure. Besides, the kind of errors that are in the kernel are the kinds of errors that really don't return error codes; they're the kind that crash the computer and make you reboot.
Microsoft's method is for some of the higher up software, and so is Apple's. If there's a bug in the kernel it's very unlikely that their code will catch it. Or at least that's been my experience.
If the problem is that Linux is so buggy, we just need to run it on a bunch more machines, and start randomly poking it as hard as we can until we break something. Once we've broken it, do it again to make sure it's not hardware, and then go to work fixing it. Good old brute force repairs.
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
Morton is correct.
Even at commercial companies, QA isn't a "sexy" task. People would rather bang out code than write testing harnesses and run benchmarks.
Also, free software is driven by programmers, who tend to hate QA. Like any artist or craftsman, a programmer hates having their work critiqued. They spent hundreds (or thousands) of hours on a program, only to have someone nit-pick the details and point out the flaws. But for art, "quality" is a subjective quality -- and with software, quality and reliability are tangible quantities that can be measured.
My Acovea project demonstrated the problem. Users of GCC love Acovea; many developers of GCC, on the other hand, seem to treat it is an annoying distraction. Acovea identified more than a dozen errors (some quite serious) in the GCC 3.4/4.0 compilers -- and yes, I did report them to bugzilla. Only a couple of GCC's maintainers have said "thanks."
Not that the cool reception deters me. I have a new version of Acovea in the wings, and will be unleashing it on GCC 4.x Real Soon Now. ;)
As a consultant, I've been paid to perform QA work on commercial software packages -- but only one company, and a big one at that, has ever contracted me to QA a free software project.
Right now, free software is about many things, but quality is not job 1. And that needs to change.
All about me
I have successfully used Test Driven Development in several of my projects and it is a uniquely satisfying experience. Writing test cases before writing the code then completing each test case one after another in steady progression gives a constant stream of small victories. It also means you can run all test cases at a later time and see that "yep, everything still works" or "doh! that change just broke 10 things I already had working."
There are several other benefits to writing tests first as well. The experts in the link above explain it all better than I could, I'm sure.
Many open source projects are taking this approach already and usually boast the number of unit tests along with the lines of code included in the distribution. Anyone can type in "build test" for example and it will show the program run and pass some odd thousand tests.
Is it time for the Kernel to embrace this methodology? I certainly think it is a genuine best practice. But is it applicable to OS development as well? I don't see any reason why it wouldn't be, but I am not a kernel developer myself.
When I heard that I nearly fell off my ostrich.
Software testing (usually) isn't monkeys pounding on keyboards until the box BSOD's.
:)
:). I remember dling the Gaim tarball (this was loooong ago) and seeing about getting it built on my SGI machine. IIRC, there were some makefile / #include problems getting it to even build, and once it was built there were some other issues with its runtime. Ultimately i submitted a patch to the gaim folks that more or less "enabled" gaim on IRIX. There is no way anybody had ever used Gaim on an SGI without making these fixes, so it seems reasonable to suggest the authors had never tried it before. This lack of a platform test matrix is pretty common amongst smaller F/OSS apps, even when they say "works on *nix" they mean "works on the distribution of linux i run at home".
It is difficult to test software without adequately understanding what it is supposed to do. Varying the underlying machine type is almost irrelevant for binary distributed software unless you're testing an operating system kernel or looking for race conditions in software (which is really just a stab in the dark)
How are you going to have 3rd party people debug software they know nothing about?
Where users help find bugs is by using the software. It honestly takes a certain mentality to be an effective software breaker, and it's not very common. It takes something else entirely to be a software tester; you've got to be a good developer (because software testing is about automation these days unless you're insane) but you've got to not get sucked into the developers way of thinking.
I assure you - letting normal users play with software doesn't clean it up. we can show that this is true in the following way:
- more users use Microsoft software for more hours a day than any other software in the world
- slashdotters say Microsoft software is the buggest software made
clearly if users using software was sufficient to find all the bugs, MS stuff would be bug free, based on its frequency of use alone. I know this isn't the case, because im a software tester at Microsoft.
(The appropriate response is "well then, stop posting and get back to work; you're clearly not done yet!"
W.r.t. linux kernel testing: this is something that's always amazed me - linux works surprisingly well for something with so little formal testing. On the other hand, when there are edge case problems my experience has been that nobody is much interested in fixing them. One example i had was at a consulting gig. the client was looking to move his web hosting business onto linux boxes if he could get more sites per box then he could on windows. He had a problem where his linux server would start dying after a few days. I started to look into it and the box would basically panic() in low memory situations. I asked Alan Cox about it (via irc) and the response was "buy more memory". Nice.
Another sore point with me growing up was xserver crashes. The Xserver was 99% reliable, but then you'd get some random crash and lose everything you were doing, and you knew there was no real way of getting it fixed or investigating it.. you just had to hope it magically got better somehow.. maybe when you switched hardware or something.
Then there's the just plain lack of testing of some F/OSS projects in general. When i was in college i had NeXT, Sun, and SGI boxes in my dorm room (but no linux
Another baby patch i submitted was for the openBSD kernel.. this time for the wdc driver. Back when UDMA 100 was newish, i bought 2 UDMA 100 disks a month or so apart.. so they were different sizes and different vendors, but on the same bus. The UDMA rollback code in openBSD would drop the DMA level from 5 (UDMA100) to 2 (something much slower, i dont remember what) after a certain number of DMA errors. This obviously sucked since you can run UDMA devices at different speeds on the same bus, and you can also fallback to UDMA66 and UDMA33, both of which are better than mode 2.
My opinions are my own, and do not necessarily represent those of my employer.
BTW, we have not one, but two of my colleagues down under right now listening to Andrew in person. It should be interesting to get a first-hand account of what was said.
- Necron69
that is NOT Microsoft's approach to testing.
;) These really dont happen that often during the product cycle, because ad-hoc testing doesn't catch that much stuff if you've got well developed automation suites. However, it's still very worthwhile because it is a good feedback mechanism to explain why your other testing missed something, and it's the best way to notice the odd "that's funny..." sort of issues that are not functinoally incorrect but are still user annoyance type issues.
Where did you hear or get the impression that that was the MS "approach" to QA ?
I've written test suites for the following Microsoft Products
- Visual Basic Compiler, 7.0
- Microsoft Business Framework 1.0 (unreleased)
None of them involved just using the compiler or the business framework over and over in day to day work to find bugs.
We have a variety of test approaches, including a few that _might_ be construed as what you describe - There are a few ways that we get test coverage via product usage
- stress
- bug bashes
- app weeks
Stress is funnier than it sounds. Did you know we're not allowed to ship windows until the exact build of windows under ship consideration has been running on hundreds (thousands, usually) of machines continuously with no problems while enlisted in a distributed "stress" client... where they're pounded and pounded with automated tests that do things like starve memory whilst performing other work, etc? Same with ASP.NET and the CLR - they have to _survive_ for a pre-determined time period before the build can be considered shippable. We dont think there are any show-stopper bugs at this point - but we just want to be reasonably sure. Note that if we find a bug (even an unrelated one, like the documentation has a typo) and take a fix for it, the stress cycle resets because the bits have changed. Better safe than sorry. In the end game of a product release it can literally be the case that taking a bug fix means delaying ship for another week or more.
- bug bashes
this is probably most like what you're describing. Everyone on the team sits down for a couple of days and really just beats on a specific area of the product. Security Bug Bashes have become popular int he last couple years (wonder why
- app week
For developer tool products (like Microsoft Business Framework) we like to do an app week with each milestone, where everyone on the team builds some sort of end to end application, using as much of the toolchain as possible. This sort of testing really makes the employees better (we're usually pretty compartmentalized on our areas of functionality ownership). It also lets unreleated parties take a look at peices of the product they don't own (so don't have preconceived notinos about). Finally, it lets us simulate the end-to-end customer experience on our product stack. If we can build the sort of apps a customer might build with our tools, then the tools are probably alright. Where we run into problems, we know the tools need help.
bug bashes and app weeeks happen perhaps 1-2 weeks per milestone (which is on the order of 2 months). It is a small part of our testing, time, effort, and results wise. It's still important to do, but it is not the _focus_ of QA at microsoft.
My opinions are my own, and do not necessarily represent those of my employer.
Actually, I'd say that giving proper credit and public recognition for bug reports is good enough for most of the end-users. Case in point (interestingly enough, from LKML and by Andrew Morton himself):
Getting an answer like that should lift anyone's spirits. Not only has the bug been fixed, it was also recorded for posterity that a certain user discovered it and helped to his ability in fixing it. And to top it all off, the reporter was given an honest praise and a thank you. The last part alone is usually enough for most users, to see that the developers actually care.
As for resumes? If you have a verifiable record of reporting back bugs and helping to test their fixes, you should be able to use that for your advantage in CV or at least in an interview. If nothing more, it shows that you can communicate with different kinds of people and have enough technical ability to follow through with their requests for further details. You might have even gotten a better product for yourself to use.
There is no such thing as good luck. There is only misfortune and its occasional absence.