Slashdot Mirror


Lessons From Your Toughest Software Bugs

Nerval's Lobster writes: Most programmers run into some tough bugs in their careers, but only occasionally do they encounter something truly memorable. In a new posting, developer David Bolton discusses the bugs he still remembers years later. One threw off the figures for a day's worth of oil trading by $800 million. ('The code was correct, but the exception happened because a new financial instrument being traded had a zero value for "number of days," and nobody had told us,' he writes.) Another program kept shutting down because a professor working on the project decided to sneak in and do a little DIY coding. Care and testing can catch many serious bugs before they ship, but some truly spectacular ones still end up in the release despite your best efforts.

6 of 285 comments

  1. Compiler optimizer bugs by Dan East · · Score: 4, Interesting

    Some of the bugs I've beat my head against the wall over the most are compiler bugs. It's easy to fall into the mindset that the compiler is infallible, so programmers don't usually debug in a way that tests whether fundamentals like operators are really working right. This was particularly bad when developing for Windows CE around 2000, when you had to build for 3 different processors (ARM, MIPS and SH3). I ran into a number of optimizer bugs, usually related to binary operators. The usual solution was preprocessor directives (pragmas) to disable the optimizer around a specific block of code.

    --
    Better known as 318230.
    1. Re:Compiler optimizer bugs by arglebargle_xiv · · Score: 4, Interesting

      Some of the bugs I've beat my head against the wall over the most are compiler bugs.

      Ah yes, the gift that keeps on giving. Every new version of gcc we deploy has new optimizer bugs, to the point that, several years ago, we stopped using -O3 and above: the small loss in performance (if there even was any) was easier to live with than handling a long tail of compiler bugs across dozens of different CPU types with every new release ("dozens" may be an underestimate, depending on how you count families of ARM, MIPS, Power, and other embedded CPUs).

    2. Re: Compiler optimizer bugs by Anonymous Coward · · Score: 5, Interesting

      A compiler guy here, who used to work for one of the RISC companies. Most compiler bugs are not that difficult to debug, but I worked on instruction scheduling and register allocation, so I always got assigned the weird ones. The most memorable one for me was actually a hardware bug. Most people don't realize it, but most commercial microprocessors have a lot of bugs in them; look at the published errata and you will find many. A few years after this particular processor generation hit the market, I was assigned a weird crash bug reported by a commercial DBMS vendor (i.e., a very important customer). It took me forever to figure out, but it turned out to be a processor bug that corrupted a particular register depending on timing and on the instruction combination (the register renaming logic screwed up on a rare combination of instructions). It became another errata item, and I ended up implementing a workaround. So if you notice a benign but odd code sequence that a compiler generates, there might be a good reason behind it :)

  2. Debugging Gone Wrong by mlookaba · · Score: 4, Interesting

    Bug 1 (my fault): I took over a financial application that took security identifiers and enriched them with all sorts of useful data. The original programmer had left, and nobody at the company knew anything about how it worked. Soon after, we were troubleshooting a client's report that the output data wasn't consistent between runs. I grabbed a list of all the unique security IDs I could find (about 100k) and pushed them through a couple of times to try to replicate the issue. HOWEVER... it turned out the application was using Bloomberg's "By Security" interface under the hood. That was a service where you dropped a list of IDs onto Bloomberg's FTP server and they responded with data... for a fee of $1 per security. With roughly 100k IDs sent a couple of times, the client got an unexpected bill of nearly $200k that month, and I had the most awkward talk ever with my boss. Fortunately, Bloomberg forgave the charges, and it turned out they were actually responsible for the inconsistent data, which they fixed shortly thereafter.

    Bug 2 (not my fault): A client/server application was returning odd responses to a particular query. A developer (we'll call him "Jason") inserted a switch into the code that dumped this query out to a hardcoded folder on the server. The code then got checked into production WITH THE SWITCH TURNED ON. It went undetected for nearly a year because the query wasn't terribly high volume, but slowly and steadily, the query files piled up. Our IT had lots of money to play with, so disk space was not an issue. Unfortunately, the number of files was. Every so often server performance took another step down, until this query finally made the server crash every time. When we eventually tracked down the cause, there were millions of files sitting in the same folder on every single server in the group. It took nearly three days just to get the OSes to delete the files without falling over.

  3. Re:Compiler bugs are the worst by sectokia · · Score: 5, Interesting

    The absolute worst I've had was a soft CPU in an Altera FPGA. It shipped with a C compiler. A programmer came to me to explain that his program would crash if he changed the order in which subroutines were defined. After carefully checking the logic, I found nothing wrong with his code. So I then trawled through the assembly. Again I could find nothing wrong, and I thought I was losing my mind. I had to painstakingly check the CPU state after each instruction until I eventually found one instruction that did not set a flag as per the manual, while the assembler matched the manual. It was a fault that would only trigger if you did a certain conditional jump after a certain fetch, increment, then store sequence. It was a bug in the CPU pipeline logic. I learnt a valuable lesson: never trust anything. We wasted a lot of time because we were convinced we must have been the source of the fault.

  4. Re:debugger by Jeremi · · Score: 4, Interesting

    Some people, when trying to analyze a buggy program, think "I know, I'll use a debugger". Now they have two buggy programs to analyze.

    -- a grumpy old programmer

    --


    I don't care if it's 90,000 hectares. That lake was not my doing.