Lessons From Your Toughest Software Bugs

Posted by samzenpus on Monday August 3, 2015 @12:17PM from the that's-a-bad-one dept.

Nerval's Lobster writes: Most programmers experience some tough bugs in their careers, but only occasionally do they encounter something truly memorable. In developer David Bolton's new posting, he discusses the bugs that he still remembers years later. One messed up the figures for a day's worth of oil trading by $800 million. ('The code was correct, but the exception happened because a new financial instrument being traded had a zero value for "number of days," and nobody had told us,' he writes.) Another program kept shutting down because a professor working on the project decided to sneak in and do a little DIY coding. While care and testing can sometimes allow you to snuff out serious bugs before they occur, some truly spectacular ones occasionally end up in the release... despite your best efforts.

Compiler optimizer bugs by Dan+East · 2015-08-03 12:23 · Score: 4, Interesting

Some of the bugs I've beat my head against the wall over the most are compiler bugs. It's easy to have the mindset that the compiler is infallible, and so programmers don't usually debug in a way that tests whether fundamentals like operators are really working right. This was particularly bad developing for Windows CE back around 2000 when you had to build for 3 different processors (ARM, MIPS and SH3). I ran into a number of optimizer bugs, usually related to binary operators. The usual solution was preprocessor directives to disable the optimizer around a specific block of code.
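For anyone who hasn't had to do this, here is a minimal sketch of fencing one function off from the optimizer. The pragmas are the MSVC-family `#pragma optimize` and GCC's `#pragma GCC optimize`; the function `pack_flags` is hypothetical, and whether the old Windows CE compilers took exactly this spelling is an assumption.

```c
#include <stdint.h>

/* Switch the optimizer off for the code that follows. */
#if defined(_MSC_VER)
#  pragma optimize("", off)          /* MSVC family: all optimizations off */
#elif defined(__GNUC__) && !defined(__clang__)
#  pragma GCC push_options
#  pragma GCC optimize("O0")         /* GCC: build following functions at -O0 */
#endif

/* Hypothetical function whose bitwise operators were being miscompiled. */
uint32_t pack_flags(uint32_t hi, uint32_t lo)
{
    return ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu);
}

/* Restore whatever the command line asked for. */
#if defined(_MSC_VER)
#  pragma optimize("", on)
#elif defined(__GNUC__) && !defined(__clang__)
#  pragma GCC pop_options
#endif
```

Keeping the fenced region as small as possible means the rest of the file still gets optimized, and the pragmas double as a marker for where the compiler bug lives.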

--
Better known as 318230.
1. Re:Compiler optimizer bugs by arglebargle_xiv · 2015-08-03 14:13 · Score: 4, Interesting
  
  Some of the bugs I've beat my head against the wall over the most are compiler bugs.
  Ah yes, the gift that keeps on giving. Every new version of gcc that gets deployed has new optimizer bugs, to the point that, several years ago, we stopped using O3 and above since the small loss in performance (if there even was any) was easier than handling a long tail of compiler bugs across dozens of different CPU types with every new release ("dozens" may be an under-estimate depending on how you want to count families of ARM, MIPS, Power, and other embedded CPUs).
2. Re: Compiler optimizer bugs by Anonymous Coward · 2015-08-03 14:17 · Score: 5, Interesting
  
  A compiler guy here, who used to work for one of the RISC companies. Most compiler bugs are not that difficult to debug. But I worked on instruction scheduling and register allocation, hence always got assigned all the weird bugs. The most memorable one for me was actually a hardware bug - most people don't realize it, but most commercial microprocessors have a lot of bugs in them. See the published errata and you will find many. A few years after this particular generation of the processor was on the market, I got assigned a weird crash bug from a commercial DBMS vendor (i.e. a very important customer). It took me forever to figure out, but it turned out to be a bug in the processor that corrupted a particular register (due to the register renaming logic screwing up in a rare combination of instructions), dependent on both the timing and the instruction combination. It became another erratum, and I ended up implementing a workaround - if you notice some benign but odd code sequence a compiler generates, there might be a good reason behind it :)
3. Re:Compiler optimizer bugs by Jeremi · 2015-08-03 15:33 · Score: 5, Insightful
  
  I was working at my first job writing my first program ever that was not a homework assignment, I decided to write it as a multi-threaded program
  ^^^ 2015 nominee for most terrifying sentence on Slashdot :)
  
  --
  
  I don't care if it's 90,000 hectares. That lake was not my doing.
Hardly devastating, but a waste of several hours by Rei · 2015-08-03 12:28 · Score: 5, Insightful

Program crashing at startup? Okay, let's add debugging statements.
Can't get the debugging statements to execute? Okay, let's try removing code.
Doesn't fix the problem? Okay, let's keep removing more... and more...
A couple hours later, so much code was removed that the entire program had become nothing more than an empty main function that still crashed. This led to the following rule which I try to follow to this day: Make sure that you're actually compiling and executing the same copy of the code that you're modifying. ;)
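One cheap way to follow that rule is a build stamp printed at startup. This is only a sketch: `build_stamp` and `print_build_stamp` are made-up names, and it relies on nothing beyond the standard `__FILE__`, `__DATE__` and `__TIME__` macros.

```c
#include <stdio.h>

/* Identifies the source file and build time baked into this binary.
   Adjacent string literals are concatenated at compile time. */
const char *build_stamp(void)
{
    return "built from " __FILE__ " on " __DATE__ " at " __TIME__;
}

/* Call first thing in main(). If the stamp does not change after an
   edit and rebuild, the binary being run is not the one being compiled. */
void print_build_stamp(void)
{
    fprintf(stderr, "%s\n", build_stamp());
}
```

If the stamp never shows up at all, that is the same diagnosis: the program being executed is not the one being edited.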

--
I'll never forget the last thing grandma said to me before she died: "What are you doing in here with that knife?!?"
Why Version Control is Important by 14erCleaner · 2015-08-03 12:37 · Score: 4, Insightful

Back in the 80's, I was working on a project with three other programmers. Nobody had heard of version control back then; we were using VAX/VMS and it would keep a few versions of a file around after you changed it, which seemed good enough (after all, we all trusted each other, right?)
Well, I don't remember the exact bug(s), but one day I fixed something, and tested it. Fine. A few days later the bug came back. So I went back, fixed it again (wait, didn't I already make this change?). A few days later it came back again.
It turned out that one of the other guys had fixed a different bug, which I had introduced with my fix. So, his fix was to change the code back the way it was. We went back and forth a few times undoing each other's changes before we realized what was going on. Seeing a revision log with comments on the changes might have helped...

--
Have you read my blog lately?
You are not qualified to debug your own code by myowntrueself · 2015-08-03 12:42 · Score: 5, Insightful

I recall a proverb, something like
"It takes twice as much intelligence to debug code as it took to write it.
So if you code to the best of your ability you are, by definition,
not qualified to debug it."

--
In the free world the media isn't government run; the government is media run.
1. Re:You are not qualified to debug your own code by bloodhawk · 2015-08-03 13:35 · Score: 5, Informative
  
  The full quote is
  
  “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
  - Brian Kernighan
  
  I used to use this as my signature a few years back to try and make devs think about what they are writing. It is nearly always better to make the code simple and readable than to try and produce the cleverest possible code. No, it isn't as fun, but it is a damn sight better for those who have to try and decipher your clever coding tricks later.
Debugging Gone Wrong by mlookaba · 2015-08-03 12:47 · Score: 4, Interesting

Bug 1 (my fault) : Took over working on a financial application that took security identifiers and enriched them with all sorts of useful data. The original programmer had left, and nobody at the company knew anything about how it worked. Soon after, we were troubleshooting an issue reported by a client that the output data wasn't consistent between runs. I grabbed a list of all the unique security IDs I could find (about 100k) and pushed them through a couple of times just to try and replicate the issue. HOWEVER... it turns out the application was actually using the Bloomberg "By Security" interface under the hood. That was a service where you drop a list of IDs onto Bloomberg's FTP server, and they would respond with data... for a fee of $1 per security. The client got an unexpected bill of nearly $200k that month, and I had the most awkward talk ever with my boss. Fortunately, Bloomberg forgave the charges, and it turns out they were actually responsible for the inconsistent data - which was fixed on their end shortly thereafter.
Bug 2 (not my fault) : A client/server application is returning odd responses to a particular query. Developer (we'll call him "Jason") inserts a switch into the code that dumps this query out to a hardcoded folder on the server. The code then gets checked into production WITH THE SWITCH TURNED ON. It went undetected for nearly a year because the query wasn't terribly high volume. But slowly and steadily, the query files built up over time. Our IT had lots of money to play with, so server space was not an issue. Unfortunately, the number of files was. Server performance degraded steadily, until finally this query would make the server crash every time. When we eventually tracked down the cause, there were millions of files sitting in the same folder on every single server in the group. It took nearly three days just to get the OSs to delete the files without falling over.
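A rough sketch of how a debug dump like Jason's could be made harder to forget: the switch defaults to off, and the number of files is capped. Every name here (`set_query_dump`, `dump_query`, `MAX_DUMP_FILES`) and the cap value are hypothetical, not taken from the application in the story.

```c
#include <stdio.h>

enum { MAX_DUMP_FILES = 1000 };      /* hypothetical cap on files written */

static int dump_enabled = 0;         /* off unless turned on deliberately */
static unsigned dumps_written = 0;

void set_query_dump(int enabled) { dump_enabled = enabled; }

/* Writes one query to dir/query_NNNNNN.txt.
   Returns 1 if a file was written, 0 if skipped or failed. */
int dump_query(const char *dir, const char *query)
{
    char path[512];
    int n;
    FILE *f;

    if (!dump_enabled || dumps_written >= MAX_DUMP_FILES)
        return 0;
    n = snprintf(path, sizeof path, "%s/query_%06u.txt", dir, dumps_written);
    if (n < 0 || n >= (int)sizeof path)
        return 0;                    /* formatting failed or path too long */
    f = fopen(path, "w");
    if (f == NULL)
        return 0;
    fputs(query, f);
    fclose(f);
    dumps_written++;
    return 1;
}
```

With a cap in place, a switch left on in production costs at most a bounded number of files instead of millions.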
Re:Compiler bugs are the worst by sectokia · 2015-08-03 12:48 · Score: 5, Interesting

The absolute worst I've had was a soft CPU in an Altera FPGA. It shipped with a C compiler. A programmer came to me to explain how his program would crash if he changed the order in which subroutines were defined. After carefully checking the logic, I found nothing wrong with his code. So I then trawled through the assembly. Again I could find nothing wrong, and thought I was losing my mind. I had to painstakingly check the CPU state after each instruction until I eventually found one instruction that did not set a flag as per the manual, and the assembler matched the manual. It was a fault that would only trigger if you did a certain conditional jump after a certain fetch-increment-then-store sequence. It was a bug in the CPU pipeline logic. I learnt a valuable lesson: never trust anything. We wasted a lot of time because we were convinced we must have been the source of the fault.
Re:debugger by Jeremi · 2015-08-03 15:10 · Score: 4, Interesting

Some people, when trying to analyze a buggy program, think "I know, I'll use a debugger". Now they have two buggy programs to analyze.
-- a grumpy old programmer

--

I don't care if it's 90,000 hectares. That lake was not my doing.
while-while loop in C/C++ by Cassini2 · 2015-08-03 15:39 · Score: 4, Funny

while (something)
{
    // do_stuff
}
while (something_else);
It compiles, is legal C, and loops endlessly if something_else is true.
It can be done in a careless moment when switching a complex piece of code from a while () loop to a do-while () loop.
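A small sketch of the difference, with made-up function names. To keep the example terminating, both loops test the same condition, so the stray second loop exits immediately; in the situation described above the two conditions differ, and the second loop can spin forever.

```c
/* Intended form: the body runs at least once. */
int body_runs_do_while(int start, int limit)
{
    int n = start, runs = 0;
    do {
        runs++;
        n++;
    } while (n < limit);
    return runs;
}

/* The slip: "do" was replaced by "while (...)" but the trailing
   "while (...);" was left behind. The compiler sees two statements. */
int body_runs_while_while(int start, int limit)
{
    int n = start, runs = 0;
    while (n < limit) {
        runs++;
        n++;
    }
    while (n < limit);   /* a second, separate loop with an empty body */
    return runs;
}
```

Starting at 5 with a limit of 3, the do-while version runs its body once and the while-while version runs it zero times, which is the other way this slip changes behavior besides the endless loop.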
Re:Passing Parameters with Side Effects by Anonymous Coward · 2015-08-03 18:18 · Score: 5, Informative

Actually, what you're describing is formally defined as undefined behavior in the C and C++ standards.
Undefined behavior:

doSomething(pixel[i++],pixel[i++],pixel[i++]); /* function call commas are NOT sequence points, so the result is undefined */

Refer to the Sequence point article. The [3] citation says

Clause 6.5#2 of the C99 specification: "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored."
Pay special attention to point #4 under "Sequence points in C and C++", because that talks about your exact problem. But beware that you'd still have a bug even if you hid the increment inside of a function, because the order of argument evaluation is unspecified (as opposed to undefined behavior, which can cause nasal demons or format your hard drive).
Fixed with least diff:

int r=pixel[i++], g=pixel[i++], b=pixel[i++]; /* commas between declarators ARE sequence points */
doSomething(r,g,b);
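The same fix as a self-contained sketch. The body of `doSomething` and the helper `next_pixel` are invented for illustration, and the three reads are split into separate declarations rather than one comma-separated declaration, which should behave the same way since each initializer is a full expression.

```c
/* Hypothetical stand-in for doSomething(): packs three channels into one int. */
int doSomething(int r, int g, int b)
{
    return (r << 16) | (g << 8) | b;
}

/* Reads one RGB triple starting at index *i and advances *i by 3. */
int next_pixel(const unsigned char *pixel, int *i)
{
    /* One increment per full expression, so the reads happen in order. */
    int r = pixel[(*i)++];
    int g = pixel[(*i)++];
    int b = pixel[(*i)++];
    return doSomething(r, g, b);
}
```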

See also: S.O. questions related to undefined behavior and sequence points in C and C++.
Re:Passing Parameters with Side Effects by TheRaven64 · 2015-08-03 20:28 · Score: 4, Insightful

The order of parameter evaluation is one that bites a lot of people because most compilers do it the expected way. When you're walking an AST to emit some intermediate representation, you're going to traverse the parameter nodes either left-to-right or right-to-left and most compiler IRs don't make it easy to express the idea that these can happen in any order depending on what later optimisations want. If they have side effects that generate dependencies between them (as these do) then they're likely to remain in the order of the AST walker. Most compilers will walk left-to-right (because a surprising amount of code breaks if they don't), but a few will do it the other way.
To understand why this is in the spec, you have to understand the calling conventions. Pascal used a stack-based IR (p-code) and had a left-to-right order for parameter evaluation, which meant that the first parameter was evaluated and then pushed onto the stack, so the last parameter would be at the top of the stack. The natural thing when compiling Pascal (as opposed to interpreting the p-code) was to use the same calling convention, with parameters pushed onto the call stack left to right. Unfortunately, C can't do this and support variadic functions (note: some implementations wanted to do this, which is why the C spec says that variadic and non-variadic functions are allowed to use completely different calling conventions), because if the last variadic argument is the top of the stack then there's no way to find the non-variadic arguments unless you also do something like push the number / size of variadic arguments onto the stack.
This meant that C implementations tended to push parameters onto the stack right to left. This is less of an issue now that modern architectures have enough registers for most function arguments, but is still an issue on i386. Because of the order of the calling convention, it's more efficient on some architectures to evaluate arguments right to left. Some compilers that are heavily performance-focussed (GPU and DSP ones in particular, where they don't have a large body of legacy code that they need to support) will do this, because it reduces register pressure (evaluate the rightmost argument using some temporaries, push it to the stack, move onto the next, reusing all of those temporary registers).
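A minimal sketch that makes the evaluation order visible; all names (`mark`, `sum3`, and so on) are made up for the example. Because each `mark()` call is a complete function call, the first version is well defined but its order is unspecified, so different compilers may legitimately record 1,2,3 or 3,2,1.

```c
static int order[3];   /* which argument was evaluated 1st, 2nd, 3rd */
static int count;

/* Records that argument `id` was evaluated, then returns it. */
int mark(int id)
{
    order[count++] = id;
    return id;
}

int sum3(int a, int b, int c) { return a + b + c; }

/* The order of the three mark() calls is up to the compiler. */
int unspecified_order(void)
{
    count = 0;
    return sum3(mark(1), mark(2), mark(3));
}

/* Temporaries pin the order to 1, 2, 3 on any conforming compiler. */
int forced_order(void)
{
    int a, b, c;
    count = 0;
    a = mark(1);
    b = mark(2);
    c = mark(3);
    return sum3(a, b, c);
}

/* Returns the id of the argument evaluated at the given position (0..2). */
int evaluated_at(int position) { return order[position]; }
```

Calling `unspecified_order()` and then printing `evaluated_at(0)` through `evaluated_at(2)` on a few compilers and targets is a quick way to see which direction each one walks the arguments.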

--
I am TheRaven on Soylent News