Debugging Expert Wins ACM Dissertation Award
An anonymous reader writes "The Association for Computing Machinery (ACM) is reporting that Ben Liblit has been awarded the 2005 Doctoral Dissertation Award for his study on understanding and fixing software 'bugs' in the real world. From the article: 'Liblit's dissertation proposes a method for leveraging the key strength of user communities - their overwhelming numbers. His approach uses sparse random sampling rather than complete data collection for gathering information from the experiences of large numbers of software end users. It also simultaneously ensures that the observed data is an unbiased, representative subset of the complete program behavior across all runs.' Slashdot broke the story on this research back in 2003. Apparently the project is still going strong."
they're not bugs, they're features.
for a minute there, i lost myself...
'leveraging'
*tears out own hair and screams*
My blog
No, not the wild-eyed madcap scientist from Back to the Future. Doctor Watson is an OS service present in Windows that monitors the running process list for terminal assertions. When a program hits an exception that it can't handle, it terminates immediately and Doctor Watson is on the scene to read the last gasps of the process before its bits get blasted. Microsoft even came up with a way to harness this to allow users to send real-time feedback to Microsoft HQ whenever a crash occurred in a program. No one I know ever sends that data back, but I'm sure someone must have once.
The current idea seems to be tracking the same termination events in the same way as Doctor Watson and sending the relevant data back to UWisc without informing the user. It sounds like a good idea, but I doubt it is in Liblit's power to fix Windows OS bugs.
http://findbugs.sourceforge.net/
"Sir, is this a stand-up fight, or another bug hunt?"
Seriously, congradulations, Ben!
Zhrodague.net - I do projects and stuff too.
This research has been a wonderful collaborative effort, and many people deserve to share the credit. To quote from part of the Acknowledgements section of my dissertation:
So thanks, Slashdot, for helping me find those users (or helping them find me). The exposure was invaluable. And thanks, open source community, for your participation. I've benefitted greatly from standing on your massed shoulders. This could not have happened without you.
Ben Liblit is now an assistant professor at the University of Wisconsin-Madison. He joins a fantastic Computer Science department. Good luck Ben!
i think if people knew there were any probability that their bug reports would not be taken into consideration, maybe they wouldn't post them at all!!
It would have been interesting to know the difference in number and type of bugs between a closed source product and an equivalent open source product... let's say the MS Office suite versus OpenOffice.org suite.
So somebody went and formalized the theory of "the users are the beta testers"...
"I've benefitted greatly from standing on your massed shoulders."
You're welcome. Now could you get down? I'm getting a cramp.
The installation of CBI is implicit consent to such monitoring, of course, and I didn't mean to imply that there was no consent involved at all.
However, asking us to read 170-odd pages of your dissertation is a little much. Would it be possible to describe the data collection system, how reports are generated and if the reports are sent automatically or as in the case of Dr. Watson sent with user approval. Also, what types of bugs you found using your statistical methods, as well as what types of bugs you think would be difficult to find using such methods.
A quick comparison to related mainstream debugging techniques would be useful to give us out here in the trenches a firmer grip on the techniques you describe.
And finally, if you wouldn't mind, could you describe a real-world scenario where a generalized product (codename: CBIMax) would be marketable. If such a general product is impossible, is it because each product is different and the methods you describe would need to be revised each time? What is the maximum level of abstraction of these techniques from specific scenarios that is achievable yet still retaining enough so as to not require largescale retooling for each project?
Thanks!
"Yes, exactly. The users are beta testers; we may as well admit it. I want to make them better beta testers. :-)"
Recreational drugs help.
I know that in one particular http://www.kingdomofloathing.com/game, they tend to follow this approach. Once a new feature is created, and debugged enough so that it's stable and doesn't break anything, the feature is released to the general populace. After all, once all of the important bugs are found, a thousand users will find the minor bugs through general usage faster than a small dedicated team of testers. Also, the time the testers save by not having to verify every single minor detail can be used to work on new material.
Add into the equation that without some elaborate software (such as Mercury LoadRunner, or an open-source equivalent), it's hard to simulate the effect the entire population will have when they start hammering on the server. It can also help track down extremely low-occurance bugs, because with enough people working on it, those one-in-a-million cases will eventually come up.
Kinda reminds me of infinite monkeys eventually producing the works of Shakespeare.
If you even just skim the page, you'll see that only the fact that it sends a feedback report is similar. Most of the project consists of 'sparse' weighted sampling of instrumented code, research into what are useful metrics to use to instrument the code, and how to correlate the collected data with patterns to auomate finding bugs.
I have a doctorial thesis up for review and evaluation. Thank you for inviting me here to discuss my thesis. To be honest... This thesis is a composite of what has been offered to me by Microsoft, Oracle, Novell, Oracle, Microsoft, SAP, OpenSource and companies who wish not to be named..cough SUN...cough..SUN. Uh..what? Questions?...please see the index and feel free to look at the appendix. Thank you for the compliments...is that a Oracle teamwear shirt you're wearing?
You mean the ACM is more than just a drinking club? The way they advertised themselves at my university, one wouldn't have thought so...
I think that automated unit testing is the future of killing bugs. In layman's terms, this involves a program trawling your code and automatically trying to break it. If done well, the system can replace some of your QA team, and QA goes a lot faster. I hadn't even heard of such a thing until I did some contract work for Agitar, one of the companies doing this stuff. Here's a link with an overview & screenshots:
h tml
:(
http://www.agitar.com/products/20051101-agitator.
They only work with Java. Here is a link to a page where they ran some Open Source products through their tool & published the results:
http://www.agitar.com/openquality/
But Agitar's product isn't Open Source itself.
-Tony
My Greasemonkey scripts for Digg &
From the Centos about page:
CentOS-4 supports x86 (i586 and i686),
In other words, it won't run in a 386, I wouldn't want it if it was compiled so low as to be optimized for a 386. Please start using x86 something other than 386.
(Sorry, it just irks me)
There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
I hate it when that happens.
There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
Interesting. I thought I recognized the name Libit.
He's one of the authors of a NIPS paper from three years back: Statistical Debugging of Sampled Programs
I told people we were switching to new software (Mailman)- and that if they got an error message or similar, to flip a quarter X times (I forget how many) and ONLY email me if they got all heads. I didn't want to get a couple dozen reports of the same problem, and I figured that if there were any problems, they'd affect a large set of the 1000+ users of the list.
It worked brilliantly.
Please help metamoderate.
The mad scientist from "Back to the Future" was named Emmett Brown, not Doctor Watson.
Laws do not persuade just because they threaten. --Seneca
this FP
You keep using that word. I do not think it means what you think it means.
Has anyone ever failed an FP so badly as you have? How embarrassing for you.
Well, we've got Ben Liblit RIGHT HERE! Right here in the thread! Replying to my question!
Doesn't it make sense to ask him while he's here to discuss the topic in a simplified manner since he's the world's leading expert on the topic? This is a discussion forum, so being able to hear him express the concepts allows us to participate with him in a two way transfer of ideas.
The alternative is to write everything down and simply refer to documents instead of engaging people in conversation.
I would like to see the data that they're collecting but I can't find it anywhere on their site. Am I just missing it?
I learned my lesson with the cddb disaster: don't submit your own data unless you can mirror it yourself. Otherwise, if someone gets bored or greedy, everybody's hard work gets lost forever. Or, worse, it gets subverted to make profit for Gracenote.
Mirroring is easy enough even if you have almost no bandwidth: use bittorrent. So, where are the submitted results?
This just goes to prove how stupid academic computer science has gotten these days. This particular research is cheesy observational crap not even worthy of a degree at ITT Tech.
Sorry, dude, but you're a chode.
E. N. Adams, "Optimizing Preventive Service of Software
Products," ZBM Journal of Research and Development
28, No. 1, 2-14 (1984).
Example: yesterday I had to solve a problem in the application I am developing for Bell that was absolutely independent of the programming platform.
The QA reports that there is an error on the screen while they are trying to save some data. The error is intermittent, it happens for some data sets but not for others. Investigation shows the following:
1. The data records that need to be saved must be first compared to existing data records for the same primary keys, if there are records, then the data is updated, otherwise it is inserted. Each normal record also can have one satellite record. These records share some of the primary key but are also different by one primary key component.
2. When the records are retrieved they are joined with data from another table. The application expects that all records from the other table that are related to the records from the first table have the same value in a particular field. However it happens that another application was used on the second table, which introduced data inconsistency between the records. Thus a work-around was introduced by the first application. The workaround consisted of selecting a Maximum value from the second table and joining that value to the records from the first table. This worked fine, until another application introduced NULL values into the second table. This caused the selection procedure used by my application to not return existing records when trying to save data. Since no records were returned on the select, my application assumed that the data was new and needed to be inserted rather than updated. Thus the application inserted multiple records where only one record had to exist.
3. An update procedure relied on having either 0, 1 or a maximum of 2 records per primary key. If it is 0 records, the data must be inserted as a new record. If it is 1 or 2 records, than one of the existing records is always a primary record, and the second one is the satellite. However, more than 2 records were found by this procedure because of the select/insert problem identified above. This created an error, because the code actually has a hard limiter built in: if there are more than 2 records returned by the database, an Illegal State Exception is raised.
--
What I described here is a software bug that caused another problem that caused another problem. But the bug in the first place was not expected, because it was assumed that other applications would not modify data in the secondary tables in an incosistent manner.
So, here is the actual chain of events: one application modifies some of the data data in the database in an way that is unexpected by the application that is being developed, this causes an incorrect interpretation of results in one of the operation in the new application, which does not actualy raise an exception, but it causes another independent procedure to cause an error.
The error is intermittent because only some of the data is modified in an incorrect manner. The procedures that cause the error to happen are asynchronous and are not necessarily executed in any particular order but rather depend on the user intervention to be executed.
--
The way that I found the problem was by eliminating all impossible logical branches. I was left with an improbabl logical branch: a SELECT statement returning an empty list, when in fact there is data in the database. Since all other logical paths were impossible, I had to assume that the improbable happened, investigated into it and found specific cases when SELECT statement did not return any records. From there I had to figure out what caused the incosistent data. The end solution was to change the SELECT procedure to return records even when joined with an empty set from a different table (the unexpected situation.)
--
Certainly if I had total control of the Data Model, this wouldn't have happened, because the data inconsistency would be impossible in the first place, but in the real world you don't always get to chose what you are working with. I was given an existing data model, and had to adopt the application to it.
Now, is it really of any relevance that the application is written in J2EE/BEA/Oracle? No, it is not.
You can't handle the truth.
ACM are a bunch of scumballs who should not be allowed to operate.
They are no better than spammers.
Ahh...the many years I've spent trying to make managment understand.... And why Open Source makes so much sense, at least from a got broke/get fixed perspective. Still haven't figured out the financial incentives. :)
RANT #1
I always think it's weird, something about the art and craft of getting better code out gets little notice, but some FireFox alpha (which is feature INcomplete, really only for extension developers) gets 200,000 messages. Mabe a third of them will be flames "(IE/FireFox/Whatever) is so buggy, you suck unless you switch to (whatever)". But tools to debug these get ignored. How much work is going into KDE vs GNOME, and even a group that wants to fork KDE (deity() help us).
RANT #2
I really think we're reaching some of the limits of the current programming models. Think of how many states there are in a 1GB machine? 2^(2^30) is
a lot of states. This is one of the best things about Java, trying to restrict the number of states, pointers massive complicate the state diagram, allow stray code from a totally unrelated segment fuck up yours if there is a bug. But we have the same 1 segment architecture that back with monolithic apps fitting into 10Kb or so. The security threats are radically different (stand alone machines to always on TCP/IP connctions) but we haven't isolated code much better, W^X just now making it's way into modern UNIXen. Why the hell do you have execution state (return address on the stack) next to data that can be overwritten? Maybe back in the old days of register poor architectures, but we shouldn't have that now. But there's so much code out there, can't rewrite it all.
If i link a library in, i should have a defined set of things that it can do. It should have it's own sandbox, and only do the things it specifies. If it does anything different, terminate the app and not let it do something crazy. I'm sure modern MMUs could be programmed to subsegment like this, but we don't.
OK, rant over.
If you mean incentives from the users' perspective, I like to pitch it this way. My statistical methods naturally tend to "learn" the most, most quickly, about the failures that happen most often. So the more a user participates, the more the developers' attention will be swayed to the bugs that user cares about. Thus, users can help steer bug triage.
Open source bug trackers can work the same way. When you report bugs in some project's Bugzilla system, you're helping that project see what they need to work on. At the same time, you're being selfish by drawing the developers' attention to your issues. It's a sort of enlightned self-interest.
Most recent CS papers I've seen seem to take every measure possible to avoid any math beyond 8th grade...
The fact that this guy actually used things like Statistics and Probability in a CS problem is probably what got him the dissertation in the first place. When some people in ACM couldn't understand a thing, they decided to give him an award.
Do you think this is flamebait?
If so, sit in on a 3D Graphics Course when the frequency domain and "Fourier Transform" are mentioned. I have personally witnessed a class of ~25 CS grad students [at a top 10 university in the USA] fiercely complain that taking a simple integral (to find the 1D fourier transform of a signal) was too hard for a test question. (And average grade for the class was about 20 points lower than previous classes, mainly due to that question).
For those not familar with this, computing a 1D fourier-transform requires little more than the level of a high-school calculus course [and yes, they do teach calculus in high schools!]
[I'm posting AC because many mods are probably CS people]
Or the kind of comments you get on Slashdot, which is very similar to an infinite number of monkeys typing on the internet, except the monkeys are VERY EGOTISTICAL and tend to engage in mindless FLAME WARS and eventually call one another NAZIs. Just like they'll do to this comment.