Slashdot Mirror


Eric S. Raymond Identifies A Common Programming Trap: 'Shtoopid' Problems (ibiblio.org)

"There is a kind of programming trap I occasionally fall into that is so damn irritating that it needs a name," writes Eric S. Raymond, in a new blog post: The task is easy to specify and apparently easy to write tests for. The code can be instrumented so that you can see exactly what is going on during every run. You think you have a complete grasp on the theory. It's the kind of thing you think you're normally good at, and ought to be able to polish off in 20 LOC and 45 minutes.

And yet, success eludes you for an insanely long time. Edge cases spring up out of nowhere to mug you. Every fix you try drags you further off into the weeds. You stare at dumps from the instrumentation until you're dizzy and numb, and no enlightenment occurs. Even as you are bashing your head against a wall of incomprehension, consciousness grows that when you find the solution, it will be damningly simple and you will feel utterly moronic, like you should have gotten there days ago.

Welcome to programmer hell. This is your shtoopid problem.... If you ever find yourself staring at your instrumentation results and thinking "It...can't...possibly...be...doing...that", welcome to shtoopidland. Here's your mallet, have fun pounding your own head. (Cue cartoon sound effects.)

Raymond's latest experience in shtoopidland came while working on a Python-translating tool, and left him analyzing why there are some programming conundrums that repel solutions. "You're not defeated by what you don't know so much as by what you think you do know," he concludes. So how do you escape?

"[I]nstrument everything. I mean EVERYTHING, especially the places where you think you are sure what is going on. Your assumptions are your enemy; printf-equivalents are your friend. If you track every state change in your code down to a sufficient level of detail, you will eventually have that forehead-slapping moment of why-didn't-I-see-this-sooner that is the terminal characteristic of a shtoopid problem."

Share your own stories in the comments. Are there any programmers on Slashdot who've experienced their own shtoopid problems?

8 of 189 comments

  1. Not always... by rbeattie · · Score: 4, Interesting

    More often than not, the solution is actually really difficult - you just underestimated the problem. Then you go to GitHub and find a library that shows you how it should be done, and you can't believe it takes so much code to do something that seemed so straightforward.

    --
    Me
    1. Re:Not always... by igny · · Score: 3, Interesting
      The shtoopidest problem I faced was in Transact-SQL. Usually, the syntax there is case insensitive, but there is a difference between
      • where timeStamp >= format(watermark,'yyyy-MM-dd hh:mm:ss') --<-- incorrect
      • where timeStamp >= format(watermark,'yyyy-MM-dd HH:mm:ss') --<--correct

      This bug was extremely elusive for me because the code looks fine and the watermark in our data is almost never between 00:00:00 and 01:00:00, which is the only window in which the bug sometimes caused missing data in our target tables.
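      The same 12-hour versus 24-hour trap exists outside T-SQL. As a rough analog only (this is C's strftime, not the T-SQL format() function), %I is the 12-hour field and %H the 24-hour one. The helper names and the fixed date below are made up for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format 2020-01-15 hh:30:00 with the given pattern. */
static void format_stamp(char *buf, size_t size, const char *pattern, int hour)
{
    struct tm t = {0};
    t.tm_year = 2020 - 1900;   /* years since 1900 */
    t.tm_mon = 0;              /* January */
    t.tm_mday = 15;
    t.tm_hour = hour;
    t.tm_min = 30;
    strftime(buf, size, pattern, &t);
}

/* Returns 1 when the 12-hour and 24-hour patterns produce identical text. */
static int patterns_agree(int hour)
{
    char twelve[32];
    char twentyfour[32];

    format_stamp(twelve, sizeof twelve, "%Y-%m-%d %I:%M:%S", hour);
    format_stamp(twentyfour, sizeof twentyfour, "%Y-%m-%d %H:%M:%S", hour);
    printf("%02d: %s vs %s\n", hour, twelve, twentyfour);
    return strcmp(twelve, twentyfour) == 0;
}
```

      Calling patterns_agree() for hours 1 through 12 returns 1, so a test run during the morning sees nothing wrong. Hour 0 comes out as 12 in the 12-hour pattern, and afternoon hours wrap back to 01 through 11.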

      --
      In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
    2. Re: Not always... by UnknowingFool · · Score: 3, Interesting

      Sometimes it's a very slight difference between environments that causes problems. We rolled out some code to production that had been fully tested in Dev and Test environments. Things started to break due to SQL errors. Ran the SQL directly on the production database server but it ran fine. Somehow the SQL was getting different results running through the production server than it was on the production database server directly.

      After some investigation, the only difference between the production server and the other environments was that the production server used a slightly older database driver. It was a minor version difference. The older driver required explicit casts on all math operations, despite what the documentation said, while the newer driver followed the database documentation. So Integer A / Integer B should be implicitly cast to Integer according to the documentation. However, the older driver cast the result to Float for some unknown reason, and that caused the errors.
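      To make the two behaviors concrete, here is a small C illustration of integer versus floating-point division. It only shows the arithmetic difference; it says nothing about how either database driver was implemented, and the function names are hypothetical.

```c
/* Both operands are integers: the quotient is truncated toward zero. */
static int divide_as_int(int a, int b)
{
    return a / b;
}

/* An explicit cast on one operand forces floating-point division. */
static double divide_as_float(int a, int b)
{
    return (double)a / b;
}
```

      With a = 7 and b = 2 the first function yields 3 and the second 3.5, so any code expecting one result type breaks when it silently gets the other.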

      But this would only happen using the db driver on Production. Testing the SQL directly on Production DB wouldn't have found it. Testing the code and SQL on Dev and Test servers wouldn't have found the bug. The patch notes for the db driver didn't mention the change.

      --
      Well, there's spam egg sausage and spam, that's not got much spam in it.
  2. Re:He really is old, isn't he? by ledow · · Score: 4, Interesting

    Ever tried debugging deep-level OS kernel code?

    To be honest, debuggers also introduce just as many differences - I have crafted code (nothing special, fancy or playing tricks) that, when debugged, works entirely differently to non-debugged. Debugging inserts all kinds of stuff into the code that modifies the pointers of all kinds of data by vast amounts, and can make it "pass" whatever it is you wanted to do.

    Also, if you program against many architectures, an architecture-specific bug might be something that you don't have the tools for, despite debugging the code on all your normal platforms. Yes, a debugger is the ultimate solution, but mostly you might just not have that stuff available and it could be days or weeks before you can get it going to the point that you can effectively debug code that you've been working on for 20 years and know inside out.

    Plus many problems are not debuggable - maybe your users are having the issue but you're not, and you can't reproduce, but dozens of your users can, and yet they have almost identical environments to you - the only way to debug that is to set up a full programming, debugging and source environment on their machine - which may be something you don't want to do - or give them an instrumented version of the executable, which may not reproduce the problem.

    I know for a fact that I have programs that work on Linux, Windows, even HTML5 (via emscripten), that also can work on Mac. But for sure I wouldn't be buying a Mac to diagnose problems on that platform until it was absolutely necessary. And I wouldn't be giving my code to users for them to diagnose it.

    But throw in a bunch of printf's and a log and - no matter the architecture or tools available - you can get down to a function, a line, a set of parameters enough to debug before you even need to think "How the fuck am I going to go about getting debug info out of that person/system/architecture?"

    I know I have a C macro that I prefix all functions with. In "normal" mode, it just expands to a function definition. In "debug" mode, it expands to the function, and a bunch of debugging lines for when it enters/leaves each function and the parameters given to it. This means one switch change and the program runs basically identically to how it runs without debugging, churns out a huge log file, doesn't modify any structures, pointers, etc. and which I can skim the bottom of after a crash report to know where and why it crashed, on any architecture, with a compiled binary, without including the full -g debugging shit that basically gives away your source code (or a version of it).
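    A minimal sketch of that idea, not the poster's actual macro: instead of wrapping the function definition, it uses two hypothetical macros, TRACE_ENTER and TRACE_LEAVE, that compile to nothing unless the (equally hypothetical) TRACE_CALLS switch is defined.

```c
#include <stdio.h>

#ifdef TRACE_CALLS
/* Debug build: log the function name and its arguments to stderr. */
#define TRACE_ENTER(fmt, ...) \
    fprintf(stderr, "enter %s(" fmt ")\n", __func__, __VA_ARGS__)
#define TRACE_LEAVE() fprintf(stderr, "leave %s\n", __func__)
#else
/* Normal build: the macros expand to no-ops. */
#define TRACE_ENTER(fmt, ...) ((void)0)
#define TRACE_LEAVE() ((void)0)
#endif

/* Example function instrumented with the tracing macros. */
static int add(int a, int b)
{
    TRACE_ENTER("a=%d, b=%d", a, b);
    int sum = a + b;
    TRACE_LEAVE();
    return sum;
}
```

    Building with -DTRACE_CALLS turns the log on; without it the function body is the same code with the trace lines removed. The format string must be a literal and at least one argument must follow it.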

  3. assert()'s for every assumption by jrbrtsn · · Score: 5, Interesting

    Over my 30 year career, I cannot believe how many 'C' programmers I've come across who are unfamiliar with the assert() macro. This macro is essential for trapping all invalid assumptions! Usually it's as simple as:

    if ( ! functionWhichCanFail(a,b,c) ) assert(0);

    Run your program from the debugger, and it will stop when the assert(0) is encountered, giving you full and convenient access to everything needed to hunt down the issue.
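    A short sketch of asserting assumptions directly, using a hypothetical average() helper. One thing worth knowing: when NDEBUG is defined, assert() expands to nothing, so a call with side effects should not be placed inside the assert expression itself. The if (...) assert(0) form above keeps the call in release builds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: integer average of n values.
   Assumes a non-NULL array and n > 0; both assumptions are asserted. */
static int average(const int *values, size_t n)
{
    assert(values != NULL);
    assert(n > 0);

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return (int)(sum / (long)n);
}
```

    If a caller ever passes an empty array, the program aborts at the assert with the file and line number, instead of dividing by zero somewhere later.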

  4. printf() may not work for multithreaded problems by jrbrtsn · · Score: 4, Interesting

    A few years ago I had an issue in a multi-threaded program where using printf()'s caused the problem to go away. In order to track the problem down, I ended up writing messages to a buffer in RAM, and dumping the buffer to stdout after the problem occurred.
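    One way such an in-RAM log might look, as a sketch under assumptions rather than the code described above: messages go into a fixed ring of slots, a C11 atomic counter hands out slot numbers, and nothing touches stdout until the dump. All names and sizes are hypothetical. Formatting still costs time, so this reduces the timing disturbance but does not remove it, and a writer that laps the ring can overwrite a slot another thread is still filling.

```c
#include <stdarg.h>
#include <stdatomic.h>
#include <stdio.h>

#define LOG_SLOTS 256   /* number of messages kept */
#define LOG_WIDTH 128   /* maximum message length, including the NUL */

static char log_slots[LOG_SLOTS][LOG_WIDTH];
static atomic_uint log_next;   /* total messages written so far */

/* Format a message into the next slot; no locks, no I/O. */
static void memlog(const char *fmt, ...)
{
    unsigned slot = atomic_fetch_add(&log_next, 1u) % LOG_SLOTS;
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(log_slots[slot], LOG_WIDTH, fmt, ap);
    va_end(ap);
}

/* Write the retained messages, oldest first; returns how many were written.
   Meant to be called once the other threads have stopped logging. */
static unsigned memlog_dump(FILE *out)
{
    unsigned total = atomic_load(&log_next);
    unsigned first = total > LOG_SLOTS ? total - LOG_SLOTS : 0;
    unsigned count = 0;

    for (unsigned i = first; i < total; i++, count++)
        fprintf(out, "%s\n", log_slots[i % LOG_SLOTS]);
    return count;
}
```

    Typical use would be memlog("state=%d", state) at the suspect points, then memlog_dump(stdout) after the failure has been detected.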

  5. Re:printf() may not work for multithreaded problem by Wrath0fb0b · · Score: 5, Interesting

    Fun story time related by a colleague. A pretty common piece of software (hint: there's probably one running within a few hundred yards of you) had an elusive bug. But as the parent noted, printf caused the problem to go away, and it was suspected that this was because it caused synchronization on stdout. Unlike the parent, the developers didn't have time to actually implement a buffered-log solution to figure this out, so they did the obviously logical thing -- they replaced all the printf calls with barrier() and shipped it. It's still running like this today.

    Another good one, I worked with someone who would log everything all the time by fprintfing to a high-numbered pipe. When I asked him, he gave a few advantages that still ring partially true (depends on context): first, he said, I can get the log from any running instance without even stopping by d-tracing the system call. But most critically, he said, all the formatting happens in userland and only after the syscall does the kernel actually realize that there's nothing on the other end of the pipe and drop the write. That means, he reasoned, that the release/debug versions would always have very close behavior and would avoid the class of 'bugs that don't reproduce in debug build'. As with the other story, to this day, there's a slew of machines out there, formatting and writing log messages to a pipe that's never open.

  6. Re:Assert is your friend by justthinkit · · Score: 1, Interesting

    These are always humorous comment threads.

    Everyone has some experience. Everyone has plenty of advice. Lots of absolutes (like the parent's) come out.

    Point number one, always, is to check our assumptions. We assume we know how to code. We assume others know how to code. We assume libraries work as documented. We assume compilers are logical. We assume we are.

    Men assume women are logical. Fun ensues.

    Kids assume parents are good examples. And waste decades of their lives.

    Physics drifts away from the best model for one hundred years. Everyone drifts into the ditch with it.

    Little to nothing is learned because...follow the money.

    Real problem one in programming / languages is lack of examples. Commands are introduced, and given a 10 line example. And off we run.

    Real problem two in programming / languages is lack of incentive to do the right thing for the programmer. Microsoft is famous for its excrement. Because if it actually delivered something that was near perfect, who would ever upgrade?

    Real problem three in programming / languages is no one gives a crap. For each person weighing in on this thread, one hundred won't. For each person at least reading this thread, one hundred won't. For each person seeking the best language, one hundred will choose the one that gets them a good job. And then all these randomly compromised and diluted groups are placed under a PHB who is more concerned with breaking things -- by making things "better" -- than anything else.

    Given the randomness of this forum, the deliberate trash that passes for programming languages and the throw-away nature of advice, we might as well be asking "What good books have you read this year?" The winner so far for me is "Strange Chemistry". Who knew that Visine and Bengay could be so dangerous?

    --
    I come here for the love