Eric S. Raymond Identifies A Common Programming Trap: 'Shtoopid' Problems (ibiblio.org)
"There is a kind of programming trap I occasionally fall into that is so damn irritating that it needs a name," writes Eric S. Raymond, in a new blog post:
The task is easy to specify and apparently easy to write tests for. The code can be instrumented so that you can see exactly what is going on during every run. You think you have a complete grasp on the theory. It's the kind of thing you think you're normally good at, and ought to be able to polish off in 20 LOC and 45 minutes.
And yet, success eludes you for an insanely long time. Edge cases spring up out of nowhere to mug you. Every fix you try drags you further off into the weeds. You stare at dumps from the instrumentation until you're dizzy and numb, and no enlightenment occurs. Even as you are bashing your head against a wall of incomprehension, consciousness grows that when you find the solution, it will be damningly simple and you will feel utterly moronic, like you should have gotten there days ago.
Welcome to programmer hell. This is your shtoopid problem.... If you ever find yourself staring at your instrumentation results and thinking "It...can't...possibly...be...doing...that", welcome to shtoopidland. Here's your mallet, have fun pounding your own head. (Cue cartoon sound effects.)
Raymond's latest experience in shtoopidland came while working on a Python-translating tool, and left him analyzing why there are some programming conundrums that repel solutions. "You're not defeated by what you don't know so much as by what you think you do know," he concludes. So how do you escape?
"[I]nstrument everything. I mean EVERYTHING, especially the places where you think you are sure what is going on. Your assumptions are your enemy; printf-equivalents are your friend. If you track every state change in your code down to a sufficient level of detail, you will eventually have that forehead-slapping moment of why-didn't-I-see-this-sooner that is the terminal characteristic of a shtoopid problem."
Share your own stories in the comments. Are there any programmers on Slashdot who've experienced their own shtoopid problems?
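A minimal sketch of what "instrument everything" can look like in practice, assuming Python and its standard `logging` module. The `traced` decorator and the `midpoint` helper are hypothetical names made up for illustration, not anything taken from Raymond's tool.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("trace")


def traced(func):
    """Log every call's arguments and result, including the 'obvious' ones."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("ret  %s -> %r", func.__name__, result)
        return result
    return wrapper


@traced
def midpoint(lo, hi):
    # Hypothetical helper: exactly the kind of line you are "sure" about.
    return (lo + hi) // 2
```

Decorating the functions you trust most is the point: the log then records what they actually did rather than what you assumed they did.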
how on earth would you instrument *this*?? https://en.wikipedia.org/wiki/...
More times than not, the solution is actually really difficult - you just underestimated the problem. Then you go to github and find a library that shows you how it should be done, and you can't believe it takes so much code to do something that seemed so straightforward.
Me
Instrument everything? Printf is your friend?
The guy is talking about something of a few dozen lines of code, and he doesn't simply use a debugger to step through the problematic code?
WTF LOL
The guy is talking about what you think ought to take a few dozen lines of code, but apparently does not.
I don't know how old you are but your reading comprehension isn't at a level that entitles you to make fun of Eric Raymond.
Something visual, that needs no printfs.
It cannot do that. Impossible. Hours later: Oh, this library or system code isn't doing what the documentation says / is buggy.
No-one can doubt that Eric S. Raymond is a talker.
My personal pet hate: Profilers. Profilers are f**ing useless for finding memory leaks (1), and are useless for optimization (2).
(1) Because they don't distinguish between stuff that is intended to be kept allocated (a), and stuff that isn't (b), they swamp (b)'s with (a)'s, and you waste a shedload of time sifting through misleading allocs.
(2) Because they don't distinguish between waste (a) and time consuming functionality (b). You look for improvements in (b)'s when the improvements possible are actually in (a)'s.
You're far better doing ad-hoc issue specific code to look for memory leaks and performance issues. Nothing wrong with Log and Printfs.
----------------
The second thing I always advise.... fix the bugs you know how to fix FIRST. Because other bugs may be a side effect of the known bugs. The unexplained bugs are bugs that you don't understand, so how do you know they're not a side effect of other bugs?!
---------------
Third point: if you're stuck, you're 100% confident the code is correct, and you cannot see the problem, split the code in two, work out the result you expect in your head and check that it is that. Don't step through with a debugger for the thousandth time, use software to check it at that point. Perhaps the debugger is serializing threads, or changing the timing, or some other thing that makes it work..... you don't know, so do it differently. Even if it seems dumb to check, do the check. Because it's better than re-reading the code yet again and again in the hope that you'll finally spot it.
Some gun guy that apparently also programs computers. Or tries to at least.
Been there, got several wardrobes full of T shirts.
If unit testing and staring at code for more than a few minutes doesn't solve this kind of problem, then the assertion hammer comes out. Assert everything, especially the things that are so obvious that they don't need an assertion. The bugs just have fewer and fewer places to hide and eventually surrender.
Over my 30 year career, I cannot believe how many 'C' programmers I've come across who are unfamiliar with the assert() macro. This macro is essential for trapping all invalid assumptions! Usually it's as simple as:
if ( ! functionWhichCanFail(a,b,c) ) assert(0);
Run your program from the debugger, and it will stop when the assert(0) is encountered, giving you full and convenient access to everything needed to hunt down the issue.
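The same habit carries over outside C. Here is a small sketch in Python, assuming a hypothetical `merge_sorted` function, that asserts the preconditions and postconditions that seem too obvious to check. Note that Python strips `assert` statements when run with `-O`, so this is a debugging aid rather than input validation.

```python
def merge_sorted(a, b):
    """Merge two already-sorted lists into one sorted list."""
    # "Obvious" preconditions: both inputs really are sorted.
    assert a == sorted(a), f"left input not sorted: {a!r}"
    assert b == sorted(b), f"right input not sorted: {b!r}"

    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])

    # "Obvious" postconditions: nothing lost, result still sorted.
    assert len(out) == len(a) + len(b), (len(out), len(a), len(b))
    assert out == sorted(out), f"merge produced unsorted output: {out!r}"
    return out
```

Running under `python -m pdb` stops at the failing assertion in post-mortem mode, much as the debugger stops on `assert(0)` in the C version.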
A few years ago I had an issue in a multi-threaded program where using printf()'s caused the problem to go away. In order to track the problem down, I ended up writing messages to a buffer in RAM, and dumping the buffer to stdout after the problem occurred.
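A rough sketch of the buffer-in-RAM idea, written in Python rather than the poster's C. The `RamLog` class is a hypothetical name. It relies on `collections.deque` appends being thread-safe in CPython and on `maxlen` to keep only the most recent events.

```python
import collections
import threading
import time


class RamLog:
    """Keep the last `capacity` events in memory; format them only on dump."""

    def __init__(self, capacity=1024):
        self._events = collections.deque(maxlen=capacity)

    def record(self, message):
        # No I/O and no formatting here, to disturb timing as little as possible.
        self._events.append((time.monotonic(), threading.get_ident(), message))

    def dump(self):
        # Call this after the problem has occurred and the workers have stopped;
        # iterating while other threads still append may raise RuntimeError.
        return [f"{t:.6f} [{tid}] {msg}" for t, tid, msg in list(self._events)]
```

Recording is just a tuple append, so it perturbs thread timing far less than a write to stdout would, though it can still shift a race enough to hide it.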
Been there, done that many times. Nothing more frustrating to see something you know is absolutely impossible! But fairly satisfying when you ultimately find the bug.
J
I learned long ago to recognize the feeling that comes when I know I'm missing something obvious. When I do that, I grab a coworker, and explain the issue to them. Just explaining it to someone is frequently enough, but sometimes they spot something glaringly obvious that I've missed.
I spent an hour once trying to find an issue where the difference was between I5 and l5. Yeah, depending on your font and display that may be an easy problem, or a hard one. One of those is a capital i, the other a lowercase L.
Those constructs should throw an error in PHP.
if (( variableOne == 0) or (variableTwo = 10) or (variableThree = 1)) { Code }
The bug, of course, is that == and = look pretty darn close to each other given the right combination of zoom, font, and fatigue. (So it was assigning the variableThree, instead of testing it, and then executing the conditional code).
And it was "ecosystem" code - Is it the programming language not supporting the feature... The browser (because there were articles saying that Browser X didn't support this feature), the DB engine not working right....
There was a huge DERRP DE DURRRRR when I found it
I know this type of thing; it's my day job. Granted, I'm currently doing total LAMP stack web development on setups degraded beyond imagination, and a lot of my work involves coming up with crazy hacks and glue code that gradually inches its way towards a solution. In order to get there I do exactly what he describes. It's basically what loosely typed web development is all about. But as far as I can tell, this type of problem is a regular thing in development and results in having to bend our abstraction of reality (code) to actual reality. This type of problem will never go away and AFAICT it's a standard thing to run into when doing anything but the most trivial cleanroom script.
We suffer more in our imagination than in reality. - Seneca
I feel ya brother.. the off by one still gets me 30 years later.
https://en.wikipedia.org/wiki/...
I wish we could have an agreement that lists, arrays, elements, and anything put into a list, table, query, associative array, start with an index value of either 0 or 1.
I don't care just pick one, and don't use two different standards in the same environment.
one of the main problems i find, when programming, is the toxic, uneducated, arrogant, or "shtoopid" behavior of a lot of programming 'leaders'. they seem to be amateur sociologists who are unafraid to practice in public, when they have no training in sociology or psychology.
I find that calling someone "stupid" (even yourself) is offensive and the imagery of "hitting with a mallet" is extremely violent. He shouldn't be allowed to work on open source projects.
I've started preventing that by habitually putting the variable on the right side. If I accidentally use = instead of == I'll get a syntax error. It makes that bug impossible by just changing an arbitrary habit.
if ( 10 == variable )
So you have a failed assertion. What happened? Fire up the debugger, breakpoint on abort. Breakpoint gets triggered, you get a backtrace. Can't imagine how you got there.
Days of debugging later...
The abort function is marked as "noreturn". Consequently instead of calling abort, the compiler saves a few bytes/cycles by jumping to a preexisting abort call, never mind the state of the stack frame. Of course, this single recycled abort call in the whole module is where all backtraces end up. Hooray.
Now obviously the whole purpose of abort as opposed to exit is to get a core dump. And the whole purpose of a core dump is debugging. And debugging involves backtraces, so abort calls should leave stack and continuation in a useful and recognizable state. So the obvious remedy is not to mark abort as "noreturn". Because you never want to have the stack in a mess when aborting as opposed to exiting.
Enter your most beloved glibc maintainer of yore. Who refuses to lie to the compiler for any reason at all.
This shtoopid problem will stick around. -fno-crossjumping for yall.
Decent logging libraries provide asynchronous options. For example https://logging.apache.org/log4j/2.x/manual/async.html
Of course not. There is nothing special about printf: it is just an ordinary function that takes time (multiple cycles) to execute. During that time, multiple values to be printed can be changed by other threads so the printed results are inconsistent. In such cases, you need to use a mutex.
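To illustrate the point, here is a minimal Python sketch that assumes two values which must always sum to a fixed total. The `Pair` class and `describe` function are hypothetical. The idea is to copy both values under the same lock the writers use, then format the copy outside the lock.

```python
import threading


class Pair:
    """Two values that must always sum to the same total."""

    def __init__(self, total):
        self._lock = threading.Lock()
        self.left = total
        self.right = 0

    def move(self, amount):
        # Writers update both fields under the lock.
        with self._lock:
            self.left -= amount
            self.right += amount

    def snapshot(self):
        # Readers copy both fields under the same lock, so the pair is consistent.
        with self._lock:
            return self.left, self.right


def describe(pair):
    left, right = pair.snapshot()
    # Formatting happens on the private copy, after the lock is released.
    return f"left={left} right={right} sum={left + right}"
```

Reading `pair.left` and `pair.right` directly while formatting could catch a `move` halfway through and log a sum that never existed.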
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
In other words he's trying to create a new name for something that has been troubling programmers since the first programmer: making bad assumptions
If those 22 characters are too long for you, then just say "fuck me" or "f me" for those truly lazy folks out there who didn't bother to read this far and have missed out on this time-saving advice. fme is far faster than shtoopid.
Fun story time related by a colleague. A pretty common piece of software (hint: there's probably one running within a few hundred yards of you) had an elusive bug. But as the parent noted, printf caused the problem to go away, and it was suspected that this was because it caused synchronization on stdout. Unlike the parent, the developers didn't have time to actually implement a buffered-log solution to figure this out, so they did the obviously logical thing -- they replaced all the printf calls with barrier() and shipped it. It's still running like this today.
Another good one, I worked with someone who would log everything all the time by fprintfing to a high-numbered pipe. When I asked him, he gave a few advantages that still ring partially true (depends on context): first, he said, I can get the log from any running instance without even stopping by d-tracing the system call. But most critically, he said, all the formatting happens in userland and only after the syscall does the kernel actually realize that there's nothing on the other end of the pipe and drop the write. That means, he reasoned, that the release/debug versions would always have very close behavior and would avoid the class of 'bugs that don't reproduce in debug build'. As with the other story, to this day, there's a slew of machines out there, formatting and writing log messages to a pipe that's never open.
We have solutions to reduce this sort of problem (at least once you get past the learning curve), but the top programming languages tend to implement very few of them. Reasoning about state is difficult, particularly when that state can be altered in unexpected ways. It is difficult to be confident that your code does what you think it does when you don't have a computer-checked method of specifying your intentions separate from what your code does.
There are no magic solutions here, at the least you will end up needing to spend more time writing in a specification language and that requires learning how it works. I would say that a gentle introduction to something like this is Elm which has an aim of stripping down typed functional programming into something that doesn't really need a C.S. degree. Here is a video which helps to explain what a better type system can do for your code. If you want to see something a bit more mind-bending check out Idris which has a much more powerful specification language which can prevent things like off-by-one errors or unbounded recursion in many cases. Moving off the scale of usability a bit, there is ATS which is a difficult language, but its specification language is able to make pointer arithmetic safe and doesn't bind you to immutable data structures. Hell, even Rust is full of good ideas that help to avoid these issues. And if fault-tolerant distributed systems are your thing, you need to check out Erlang (or its sibling Elixir) as there are so many great ideas that have been around for decades yet don't get nearly enough exposure.
This doesn't prevent us all from occasionally falling into this trap, but the common theme of the languages listed is to find ways to encourage (or force) you to get the little things right the first time with computer-verified specification and to isolate the search space where problems are likely to occur.
For this reason, God sends them a powerful delusion(operation of wandering)(planet) so they will believe the lie.
https://ipfs.io/ipns/QmRjnvwZFj8bWba3HHKo7pnLm5kep4nvQepMcM1eejzgsn
Whenever I deal with setting time, daylight savings, time zones etc., managers just assume it is going to be easy. And then the corner cases start popping up. What happens when a co-processor clock drifts 3 seconds ahead during the transition at the end of daylight savings and you had an event that started at 2am local time... Your 20 minute easy bug fix turns into a week of long hours and 2 weeks of testing for the test team. And don't get me started with NTP, that code is awesome but the jitter handling is so complicated.
A few years ago I had an issue in a multi-threaded program where using printf()'s caused the problem to go away. In order to track the problem down, I ended up writing messages to a buffer in RAM, and dumping the buffer to stdout after the problem occurred.
Similar story, except that the processor would reboot, clearing all the variables I stored leaving no opportunity to grab all the diagnostics.
I examined the map, determined what the last address was, added an interrupt handler on the clock that logged the stack pointer ~250/sec (only needed to log the pointer if it was smaller than the existing one) to determine how much margin I had and used that little space between maximum stack and variables to write my diagnostics to.
Once I had determined the smallest stack address that got used, I wrote my diagnostics into that margin between the stack and the bss. To make sure that the values wouldn't be overwritten on processor startup I could not use actual variables, I had to use a pointer variable that pointed to those ten bytes I could write into. At startup the bootstrap code would grab whatever was in that memory, chuck it via i2c onto another processor, clear the ten bytes, and then proceed with normal bootup.
When booted from cold that memory held nothing, when rebooted the memory was not cleared (because power was not removed) and thus I had my diagnostics from the previous execution.
And yes, I found the bug with the help of the diagnostics (don't recall what it was, but that isn't important).
I'm a minority race. Save your vitriol for white people.
Eric is having problems with his memory. Look into his recent postings for any time he talks about something he's done in the past. Check it against reality. Time sequence wrong? Claims that don't line up with the facts? Difficulty with things he handled gracefully in the past?
He's forgetting things, and filling in the gaps with confabulation.
I am no fan of his, but I would not wish this on anyone.
--Try passing BASH commands over SSH to something that Requires Quotes to work right. There's a reason my hair is short.
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
I would assume that printf is thread safe on at least some operating systems, printing the whole message and blocking other calls. That would force thread synchronization with the mutex present but hidden. Especially on Windows, where they do a lot to protect users against bad code, without much outside input in those decisions.
I wish that they would sell it.
It never does quite what I want,
but only what I tell it.
Have gnu, will travel.
Back in the day, I'd be given a programming task, and my boss would, naturally, want me to estimate how long it would take. Sometimes it looked like something I could do in maybe a day or two, but I always worried about that elusive bug that might pop up, where I'd get 95% of the coding done in that day or two, but then spend a week tracking down that one knotty problem.
In theory, theory and practice are the same; in practice they're different. (Yogi Berra & A. Einstein)
Ooh, I had a fun one like that. Writing a script in Tcl with the Expect library, the library was blowing up with a segfault and no details. Added debug output lines to pin down where it failed and it worked.
After a few hours of not getting anywhere fixing the problem, I just left a print statement that put no characters on the screen, and a comment explaining the weird crash that would occur if it was removed.
Which "most C-syntax languages" do you have in mind?
In C++, C#, Java, Perl, and most languages I can think of, assignment is an expression, which means it returns a value.
The sole exception I can think of is that in Rust there IS NO assignment for many kinds of values. In Rust what looks like an assignment:
A = B;
May actually destroy B, making it no longer accessible. The value is moved from B to A, not copied. There can be only one instance of most value types, there cannot be two variables with the value. For that reason in Rust you can't do:
A = B = C;
That would (in any C derived language) result in A and B having the same value, which is often not supported in Rust.
Is Rust what you meant by "most languages"? Rust is weird in this respect (and a few other respects as well).
guarantees nothing about when (or in what order) the values of i and j are read and copied into printf's stack frame (which happens before printf is even called just as every other functions' arguments).
If you reply, do so only to what I explicitly wrote. If I didn't write it, don't assume or infer it.
Yes this is just programming. And Eric Raymond's attempt to sound relevant and profound by pontificating and giving names to things which don't really need naming.
SJW n. One who posts facts.
Eric S. Raymond
Just curious: is there another Eric Raymond with a different middle name that the open source community runs the risk of confusing with this Raymond? If not, why the (therefore apparently labored) inclusion of "S"?
-- not criticizing, just genuinely curious
- First they ignore you, then they laugh at you, then ???, then profit.
setgrey vs setgray
'nuff said
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Recently I spent "weeks" chasing a configuration problem with a RAID1 in an embedded system I am working on. One of the disks kept dropping out of the RAID at random ...
I checked the disks and they were OK. I searched for my Linux version and someone suggested that "maybe" it was caused by that particular kernel version, so I upgraded the kernel. But the new kernel did not work with my distribution, so I had to change that too ... but the new distribution uses different versions of some key libraries, forcing me to recompile my program against the new ones (crossing my fingers that it would keep working) ... and after rebuilding everything, the first thing I found was the same failure, only faster than before. What .... ( !!! some word not suitable for children !!! ).
In the end, I stopped and cooled my head. Let's see what it could be ... a little more research. Oh, that symptom is similar to a bad cable ... but the cables are brand new, high-quality ones ... let's try a new one. Oh my God !! ... zero problems. The system has now behaved perfectly for many days. That cable was faulty, no matter how much I wanted to trust the supplier.
In general: when something fails, it is important to stop first and enumerate all the possible reasons for the failure. Then order them by difficulty. Check the easiest first, and go deeper as each candidate fails to resolve the problem. Don't try to rebuild the world first; that is not a good solution.
and it's time to context-switch, or even better,
go home for the day as I am exhausted.
I can't remember the details now. (In particular, I cannot remember the date or who drove me home) But I can recall these kinds of bugs where you put in the "print" statement after every line, and figure, NOW it will be revealed... ...and the bug goes away. And I gradually removed print statements and brought the code back to not-inspected, and the bug stays gone.
Your REAL nightmare would be to have it come back at that point, it would start to feel like the X-files. (It's close to that in Ellen Ullman's nightmarish novel, "The Bug")
I never had it go THAT far, but I did "cure" some bugs by looking for them, having them disappear without me knowing what part of the search process changed something in the non-Print-statements and made it go away, then wondered for ages what the hell had really gone wrong...usually every time I used the program for years, still feeling mistrustful and often double-checking it.
"Assumption is the mother of all fuck up's!"
In an ideal world our instrumentation systems (whatever form we use) would be tracking the state change of our system at every change and ensuring that that information is available somewhere (even as AWS S3 buckets - properly secured of course) that can be later used for monitoring flow through the system - Libraries used make this part difficult because they may use different or no instrumentation and then you have to ensure that your instrumentation can clearly identify this point so it can be easily discovered.
A very simple issue I ran into many years ago involved C++ allowing resource IDs outside of a specified range to be considered acceptable, but when run in release mode these IDs caused serious issues because they were out of the acceptable range - a minor issue that was not identified due to debug allowances.
Sure, printf is your friend but I have been telling people that for decades now. Nobody ever listened to me when they decided to blame their own naivety on the language or the development suite, or me personally, instead.
Home page: http://www.catb.org/esr/pytogo...
This is not too surprising given his recent work on reposurgeon, his previous statements that Python is simply not performant enough for converting the gcc repository from Subversion to git, and his exploration of Rust vs Go as systems programming languages.
My debug tools and code comprehension tools for C are nearly identical:
1. An IDE that's smart enough to dim code that's commented or #if'ed out (vscode works OK).
2. A code coverage tool (gcov is fine).
3. An execution trace tool (e.g., rr - https://en.wikipedia.org/wiki/Rr_(debugging)).
4. A "fat" interface to GDB (lots of dynamic content and context display, often a GUI).
The first "rule" is to simply spend NO TIME on code that has nothing to do with the task/problem at hand. Drill down first, then pay attention to the rest only as needed to get the job done. Which means the first step is to determine which code is relevant (vs. proven irrelevant).
I'm getting back into C after first having done 20 years of C then a decade of Python, all in the area of real-time/embedded sensing and control. I'm amazed how little the C environment has changed, and how rarely the newer (and better) C language features are actually being used.
printf makes calls to malloc/alloc and free to do allocation and deallocation for string construction. There were some libraries that provided the printf functionality using a static pool of memory and string concatenation.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Yes, I've had printf-obscuring bugs (Heisenbugs?), and bugs where printf and stderr get eaten by the program that wraps the program with the bug, and bugs where redirecting stdout or stderr to a file or a pipe changes the behavior of the program.
Eventually I resorted to putting debug statements inside unlink() calls:
unlink("-- debug message here")
and then used strace to watch what's going on:
strace -f -s 1024 -e trace=unlink programname args args args...
Unlink was chosen because it's not buffered (like printf), it's fast, it's a kernel (not library) call, it's got a single string argument, it doesn't have any[1] side effects, it's not called very much by normal programs, it can't be redirected to /dev/null, and I can grep for it easily.
[1] If you put '--' at the start of the message, there's no way it'll match an existing filename, thus, no side effects.
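For illustration only, the same trick can be approximated from Python, assuming Linux and `strace`. The `trace_marker` and `process` names are hypothetical. Depending on platform and libc, the call may appear in the trace as `unlinkat` rather than `unlink`, so the filter may need both names.

```python
import os


def trace_marker(message):
    """Emit `message` as the path argument of an unlink call that should fail.

    The leading "-- " follows the convention above so the name is unlikely
    to match a real file. If such a file did exist in the current directory,
    it WOULD be deleted.
    """
    try:
        os.unlink("-- " + message)
    except OSError:
        # Expected (normally FileNotFoundError): the call exists only to
        # show up in the system call trace.
        pass


def process(items):
    # Hypothetical workload with a marker at each step.
    total = 0
    for item in items:
        trace_marker(f"process item={item!r} total={total}")
        total += item
    return total
```

A possible invocation, with `script.py` as a placeholder name: `strace -f -s 1024 -e trace=unlink,unlinkat python3 script.py`.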
I know what it is that I mean for the program to do, but sometimes will type exactly the opposite, all the while continuing to read it the way that I meant it. Even putting an assert in will not help because in close proximity to where I've accidentally created this kind of inverted condition, it is unfortunately quite likely I will repeat the mistake. And again, when I make these kinds of mistakes, I cannot easily find them on my own because I see the code I thought I wrote instead of what is necessarily actually there.
File under 'M' for 'Manic ranting'
I've run into these sorts of bugs enough times that I've gained some good experience and now know to zoom out and check all my assumptions.
Examples:
- Is the process running actually the latest compile? Or is the web script you're attempting to access actually the one you are working on? Is the function you're working in actually being executed?
- Are all compiler (interpreter) warnings and strictness options turned on? They should help, not hinder.
- Some problems lend themselves well to manual binary search techniques, successively bifurcating code until you narrow down to a single line. The same binary search technique works handily for git to find which commit broke some feature.
- Know when and how to use a debugger, especially for watches and conditional breakpoints.
- If your workflow includes writing a test harness for every broad feature addition/change, and you test for things that will *obviously* fail instead of assuming that they don't need to be tested, you may catch subtle bugs that existed for months but affect a totally unrelated piece of code you're working on now.
- address sanitizer, memory sanitizer, valgrind, etc
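The binary search item in the list above can be sketched in a few lines of Python. This assumes an ordered list of revisions where the first is known good, the last is known bad, and the breakage persists once introduced. `first_bad` and `is_bad` are hypothetical names; in practice `is_bad` would build and test a revision, which is what `git bisect` automates.

```python
def first_bad(revisions, is_bad):
    """Return the index of the first revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

Each test halves the remaining range, so about 100 revisions need at most 7 builds instead of up to 100.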
On an old system that suffered from poor architecture and management refusing to spend money to fix it...
I've had parts (LSB) of pointers get corrupt, causing corruption in other random places, but close to where they should point. A little code change and that pointer is no longer clobbered or in turn clobbers something different with no obvious effect.
I've had to add code to the C preamble that would check if a certain variable was corrupt, and log the address of the current function and call stack on just the first time it is found corrupt. That got me close enough to find it.
Lockups in Microsoft Windows system calls that I finally had to add a timer interrupt and a heartbeat in the code. If the timer ever went off it would log the main program call stack up into the system call. We proved it was a Microsoft bug and pay $10K a year for the right to hear them say "Yep it is our bug, but we have no idea why and can't fix it."
Microsoft compiler bugs that would produce incorrect code if there was a /* */ comment that straddled a 0x8000 boundary in the source code.
MS FAT file system messes up file time stamps in daylight saving time changes.
MS Windows problems with 32 bit timers overflowing after 42 days.
Yes, these bugs were in Windows NT and later, not 3.11 or 9x.
I found a bug in the Z80 processor (push AF doesn't push correct flags if interrupt occurred during previous instruction) and when I reported it to Zilog, they admitted it and gave me some work around code. Find that one if you can.
Interrupt levels were level triggered instead of edge triggered as it was documented. This caused problems with nested interrupts. Very rare occurrence, until I had the idea that might be what it was and put a time wasting loop inside the interrupt handler to make it trivial to recreate.
Porting from real mode to protected mode, I had to write a GPF handler to catch app programmer bugs and allow the code to continue to run if the GPF was not harmful, like scanning past the end of data. That required code that disassembled the running code to find out how long the offending instruction was so that it could be skipped and registers modified to act like it did before.
More, but I've got to go visit father in law just home from the hospital.
> ...giving names to things which don't really need naming.
See: The Jargon File
But it was almost always a typo-- at least a half dozen times that cost me a day or more.
I would love to have ESR's problem of an actual puzzle to solve. All my schtoopid moments involved no logic, just syntax.
Or just don't share data amongst threads?
I like to think of those kinds of bugs as Weeping Angels. They only move when you're not looking at them.
I have about a dozen years of experience in MS Embedded CE. There is typically a Release build and a Debug build. Release will macro out all the debug statements, which changes the execution timing, enough that the bug that is biting you is often seen only in Release. Switch to Debug to chase it, and it goes away.
I had a similar experience recently with a PIC32 project. The devboard they sell has floating inputs on UART1. It never fails in the devboard. It does fail in the board I made. The floating inputs every so often will decide to twitch back and forth rapidly, firing a shitstorm of interrupt requests that crash the firmware. It never dies on the devboard. It occasionally gets twitchy and dies on our board, which is exactly derived from the schematic of the devboard. As an added plus, if you hook up an oscilloscope to the pins that changes impedance, and the float goes away, and the problem goes away. I have no idea how the devboard does not suffer from the same problem.
Weaselmancer
rediculous.
I don't know how many days of my life have been lost chasing the SIGBUS on SPARC while porting libraries over to Solaris.
And don't get me started on code which assumes a null pointer is an empty string
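For reference, both complaints have a boring portable fix. The sketch below is an assumption about the kind of code involved, not anything from the libraries being ported, and the helper names (str_or_empty, load_u32) are made up. A SIGBUS on SPARC typically comes from dereferencing a misaligned pointer, which x86 tolerates, and the null-pointer habit comes from platforms where reading address 0 happened to yield a zero byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map NULL to "" once, at the boundary, instead of
   relying on the platform letting you read through a null pointer. */
static const char *str_or_empty(const char *s)
{
    return s != NULL ? s : "";
}

/* Read a 32-bit value from a possibly misaligned address. memcpy has no
   alignment requirement, unlike a cast such as *(const uint32_t *)p. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers generally turn the fixed-size memcpy into a plain load on targets that allow unaligned access, so the portable version rarely costs anything.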
"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so" - https://newrepublic.com/minutes/126677/it-aint-dont-know-gets-trouble-must-big-short-opens-fake-mark-twain-quote
"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." - https://newrepublic.com/minutes/126677/it-aint-dont-know-gets-trouble-must-big-short-opens-fake-mark-twain-quote
"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so."
https://newrepublic.com/minutes/126677/it-aint-dont-know-gets-trouble-must-big-short-opens-fake-mark-twain-quote
Had one of these last week. I'd upgraded Groovy (used as a scripting engine) from 2.3 to 2.5 (and 2.4 which also exhibited the problem). Suddenly scripts that extended a custom base script couldn't instantiate the base class due to a missing constructor. The initial script was instantiating, and if I removed extending the base class it was able to execute.
Of course, it was actually working in my IDE.
After multiple days of logging, experimenting, configuring my IDE to (partially) replicate the problem, chasing down the possibility of classloaders resulting in the base class not actually extending the correct Script class (prompted by it working in my IDE but not elsewhere), rebuilding Groovy 2.5 to work around a bug in it, etc., I eventually found the root cause, and it was a facepalm moment.
The scripts were actually:
Script1:
@BaseScript subpackage.CustomScript
subpackage.CustomScript:
@BaseScript CommonCodeScript
It appears that in Groovy 2.4 @BaseScript changed from using the no-arg Script() constructor to the Script(Binding) constructor. And CommonCodeScript didn't implement CommonCodeScript(Binding), so CustomScript didn't implement it either.
The fix was to just @InheritConstructors for CommonCodeScript.
I still don't know how it was working in my IDE ...
Having debugged a few multi-arch programs in my time, I can say printf and sprintf are the most bug-ridden things around. They are subtly different from CRT to CRT. Be sure to read the docs for your particular platform, tool-chain and revision of the same. Even then go double check it. Those 2 functions are ticking time bombs of over-runs and under-runs. Some CRT stacks keep crap in the TLB and some in the global space. That is usually not documented. Good luck!
"Veddy eenterestink!"
"It vas shtoopid."
"But it vas also veddy eenterestink!"
I have the strangest feeling I have read a comment just like this before!
Generally C-derived languages (like C++, C#, and Java) do consider assignments to be expressions. However, Java and C# do not automatically convert anything to boolean, preventing something like if (a = 2) from compiling.
dom
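To make the if (a = 2) point concrete, here is a small C sketch (function names are hypothetical). In C the assignment is an expression whose value is the assigned value, and any nonzero int counts as true, so the buggy version compiles and always takes the branch:

```c
#include <stdbool.h>

/* Buggy: assigns 2 to a, then tests the result (2, which is nonzero).
   The doubled parentheses are what GCC and Clang accept as "yes, I meant
   assignment"; with a single pair, -Wall warns about this line. */
static bool is_two_buggy(int a)
{
    if ((a = 2))
        return true;
    return false;
}

/* Intended: compares a with 2. */
static bool is_two(int a)
{
    return a == 2;
}
```

In Java or C# the equivalent of the buggy line is rejected at compile time, because the condition has type int rather than boolean.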
Instrumentation is nice, but try doing it on a smallish target (think microcontroller) which has to run in real-time, with mediocre and possibly buggy debug adapters.
Shtoopid problems might be programmer hell. Shtoopid problems on a small target that is hard to instrument are the laparoscopic version of programmer hell.
so working on this all day, and while driving home, or lying in bed or whatever... the solution pops in your head! you now know what is wrong.
make haste to your desk, implement the thought-of solution, which then... comes very close but still isn't perfect.
repeat.
On a long enough timeline, the survival rate for everyone drops to zero.
My experience is that when printfs make a problem go away, you almost invariably have an uninitialized or shadowed variable somewhere. Or (equivalently) you made an implicit assumption that a stack-based variable would contain some known value like zero.
The one that really got me was an archaic system and programming language I was using.
Now, I have an IT education and I was trained in a world of data types and strong typing. I had been told this archaic system stored everything as character, but this didn't really sink in. The system told me it had data types and they appeared to work, so what was the problem with character data really?
Then one day I was coding a test for zero, but it didn't work. I banged away at this problem for 3 days, no joke. By that time I was at the end of my rope, and willing to consider anything, even irrational, impossible stuff. Finally I tried (with no enthusiasm or hope) a test against the value of 0.0. And it worked.
You could have knocked me over with a feather. Test for 0 and it doesn't work, test for 0.0 and it works. WTF???
It was the character data. Indeed, this system stored everything as characters, and the 'data types' it presented were comforting illusions. Thus 0.0 was different than 0, and a black hole to the Gates of Hell opened up...
I love legacy systems, they remind me of why progress is good and necessary.
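The 0 versus 0.0 trap is easy to reproduce in any language once the value is really text. A rough C analogue follows. The original system is not named, so this is only an illustration of the mechanism, and the function names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* What the failing test was effectively doing: comparing characters. */
static int is_zero_text(const char *field)
{
    return strcmp(field, "0") == 0;
}

/* Parse the field as a number first, then compare. Rejects empty input
   and trailing junk by checking where strtod stopped. */
static int is_zero_numeric(const char *field)
{
    char *end;
    double v = strtod(field, &end);
    return end != field && *end == '\0' && v == 0.0;
}
```

A field holding "0.0" fails the text comparison and passes the numeric one, which matches the behavior described: the stored characters were "0.0", so only a test against 0.0 matched.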
A classmate of mine spent 45min trying to debug a crash. Eventually he added some printfs here and there but they didn't trigger. So he added more and more. Still nothing. Then he tried to understand why a "shtoopid" printf didn't work... Eventually he figured out that he never actually saved the file in those 45min, that he had kept running the same binary over and over.
You've come to grips with a phenomenon every other developer with at least five years under their belt has experienced and learned to address.