Performance Bugs, 'the Dark Matter of Programming Bugs', Are Out There Lurking and Unseen (forwardscattering.org)
Several Slashdot readers have shared an article by programmer Nicholas Chapman, who talks about a class of bugs that he calls "performance bugs". From the article: A performance bug is when the code computes the correct result, but runs slower than it should due to a programming mistake. The nefarious thing about performance bugs is that the user may never know they are there -- the program appears to work correctly, carrying out the correct operations, showing the right thing on the screen or printing the right text. It just does it a bit more slowly than it should have. It takes an experienced programmer, with a reasonably accurate mental model of the problem and the correct solution, to know how fast the operation should have been performed, and hence if the program is running slower than it should be. I started documenting a few of the performance bugs I came across a few months ago, for example (on some platforms) the insert method of std::map is roughly 7 times slower than it should be, std::map::count() is about twice as slow as it should be, std::map::find() is 15% slower than it should be, aligned malloc is a lot slower than it should be in VS2015.
>> that the user may never know they are there
They will if they try to run a lot of them on a machine with finite resources, like a phone. Or it's a process that's iterated frequently, like a "big data" operation. But if the end user STILL doesn't notice it...then it's hard to call it a bug.
On the other hand, the performance/just-get-er-done trade-off is well known to programmers of all stripes. (At least I hope it is - are people really finding new value in the article?) There's the quick and dirty way (e.g., a script), then there's the "I can at least debug it" way (e.g., a program developed in an IDE), and then there's the optimized way, where you're actually checking whether key sections of code (again, especially the iterated loops) are going as fast as possible. Generally your time/cost goes up as your optimization increases, which becomes part of the overall business decision: should I invest for maximum speed, maximum functionality, maximum quality, etc.
I came here to say this, mostly.
I *know* that there are plenty of places in our software that I could spend an hour or two, and rewrite an algorithm to run in 1/5th the time. And I don't care at all, because the cost is too low to measure, and usually, performance bottlenecks are elsewhere.
Who really cares if I can get a loop to run in 800ns instead of 1500ns, when the real bottleneck is a complex SQL query 11 lines up that joins 11 tables together and takes 3 full seconds to run?
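To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures quoted above (1500 ns loop, 800 ns loop, 3 second query). The variable names are mine, and it assumes the loop and the query each run once per request.

```python
# Figures taken from the comment above; one loop pass and one query per request
# is an assumption made for the sake of the arithmetic.
NS_PER_S = 1_000_000_000

query_ns = 3 * NS_PER_S      # the 3 second SQL query
loop_before_ns = 1500        # loop as written
loop_after_ns = 800          # loop after an hour or two of tuning

total_before = query_ns + loop_before_ns
total_after = query_ns + loop_after_ns

saved_ns = total_before - total_after
speedup = total_before / total_after
saved_fraction = saved_ns / total_before

print(f"saved per request: {saved_ns} ns")
print(f"overall speedup: {speedup:.9f}x")
print(f"fraction of total time saved: {saved_fraction:.2e}")
```

The loop itself gets almost twice as fast, but the request as a whole barely changes, which is the point being made about where the bottleneck really is.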
Maybe in your world, but when weighed down with sloggy operating systems and minimal memory (typical of many Windows 10 installations TODAY), code can get pretty slow.
For a very long time now, there have been libs that add breakpoints to measure how long processes are taking (think: debug mode) and that can pinpoint problem areas pretty easily. Not enough coders use them.
It gets worse when a user has 94 Chrome tabs open, something in Office, and an AV app running.... all on a laptop whose processor speed is measured in furlongs per fortnight.
Yeah, SOME computers are way faster, and some have been habitually overloaded with things outside of a coder's control yet their app still must perform within a "reasonable" amount of time. Blah.
It is a losing battle to try to solve performance in the programmer space. The compiler does a much better job of optimization thanks to a multitude of compiler tricks, including both static and dynamic analysis, cache analysis and so on. The programmer trying to write the most efficient code should instead spend his/her time using out-of-the-box algorithms as far as possible, since the compiler knows how to fine-tune those. Next, they should run a profiling tool like JProfiler and see where the job is actually spending its time, rather than guessing which part of the program is probably the heaviest. With multiple cores, multiple instruction pipelines and optimizing compilers, the bottleneck is oftentimes not where we would think it to be. Once we find the bottleneck using a profiling tool, then we can optimize it. In most cases 2% of the code is causing 98% of the bottleneck, so it's a much better use of programmer time (which is of course more expensive than computer time in most cases) to work backwards.
1. Write your code so that it's correct, irrespective of efficiency,
2. profile, and then
3. fix the bottlenecks
rather than trying to find the most efficient algorithms before you write your code.
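A minimal sketch of those three steps in Python, using the standard library's cProfile in place of JProfiler. The functions `cheap_setup`, `slow_lookup`, `fast_lookup` and `job` are hypothetical stand-ins for a real workload, not anything from the article.

```python
import cProfile
import io
import pstats


def cheap_setup(n):
    # Hypothetical step that looks like it might be heavy but is not.
    return list(range(n))


def slow_lookup(items, queries):
    # Step 1: correct but naive. List membership is O(n) per query.
    return sum(1 for q in queries if q in items)


def fast_lookup(items, queries):
    # Step 3: same result, with the membership test moved to a set.
    item_set = set(items)
    return sum(1 for q in queries if q in item_set)


def job():
    items = cheap_setup(2_000)
    return slow_lookup(items, range(4_000))


def profile(func):
    """Step 2: run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


result, report = profile(job)
print(report)
```

In the report, `slow_lookup` should account for nearly all of the cumulative time, so that is the one place worth swapping for `fast_lookup`. Exact timings will vary by machine.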
Programmers love to use the cop-out
"Premature optimization is the root of all evil"
dogma, which is complete bullshit. It tells me your mindset is:
Except later never comes.
/Oblg. Murphy's Computer Law:
* There is never time to do it right, but there is always time to do it over.
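As Fred Brooks said in The Mythical Man-Month:
"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious."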
Which can be translated into the modern vernacular as:
* Show me your code and I'll wonder what your data structures are,
* Show me your data and I'll already know what your code is
There are 2 solutions to this problem of crappy library code.
1. You are benchmarking your code, ALONG THE WAY, right?
Most projects "tack-on" optimization when the project is almost completed. This is completely BACKWARDS. How do you know which functions are the performance hogs when you have thousands to inspect?
It is FAR simpler to be constantly monitoring performance from day one. Every time new functionality is added, you measure. "Oh look, our startup time went from 5 seconds to 50 seconds -- what the hell was just added?"
NOT: "Oh, we're about to ship in a month, and our startup time is 50 seconds. Where do we even begin in tracking down thousands of calls and data structures?"
I come from a real-time graphics background -- aka games. On every new project, our skeleton code runs at 120 frames per second. Then, as you slowly add functionality, you can tell _instantly_ when the framerate is going down. Oh look, Bob's latest commit is having some negative performance side effects. Let's make sure that code is well designed and clean BEFORE it becomes a problem down the road and everyone forgets about it.
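One way to measure along the way could look something like the sketch below: time a code path on every change and compare it with a recorded baseline. The helper names (`measure`, `check_budget`), the 25% tolerance and the `startup` stand-in are all my assumptions; only the 5 second and 50 second figures come from the example above.

```python
import time


def measure(func, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def check_budget(name, seconds, baseline_seconds, tolerance=0.25):
    """Return (ok, message); not ok if seconds exceeds baseline by > tolerance."""
    limit = baseline_seconds * (1.0 + tolerance)
    ok = seconds <= limit
    status = "ok" if ok else "REGRESSION"
    return ok, f"{name}: {seconds:.4f}s (limit {limit:.4f}s) {status}"


def startup():
    # Hypothetical stand-in for the real startup path.
    sum(range(100_000))


elapsed = measure(startup)
ok, message = check_budget("startup", elapsed, baseline_seconds=5.0)
print(message)

# The "5 seconds to 50 seconds" case would be flagged right away:
print(check_budget("startup", 50.0, baseline_seconds=5.0)[1])
```

Run on every commit, a check like this points at the change that caused the slowdown while it is still fresh.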
2. You have a _baseline_ to compare against? Let's pretend you come up with a hashing algorithm, and you want to know how fast it is. The *proper* way is to
* First benchmark how fast you can slurp data from a disk, say 10 GB of data. You will never be FASTER than this! 100% IO bound, 0% CPU bound.
* Then, add a single-threaded benchmark where you just sum bytes.
* Maybe, you add a multi-threaded version
* Then you measure _your_ spiffy new function.
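A scaled-down sketch of that ladder of baselines, assuming a small temporary file instead of 10 GB and using SHA-256 from hashlib as a stand-in for the new hash function. The multi-threaded step is left out, and all function names here are hypothetical.

```python
import hashlib
import os
import tempfile
import time

CHUNK = 1 << 20  # read in 1 MiB pieces


def timed(func, path):
    start = time.perf_counter()
    value = func(path)
    return value, time.perf_counter() - start


def read_only(path):
    # Baseline 1: just pull the bytes in and count them.
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total


def sum_bytes(path):
    # Baseline 2: single-threaded, trivial per-byte work.
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += sum(chunk)
    return total


def my_hash(path):
    # Stand-in for the function under test (SHA-256 is an assumption).
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()


def run(size_bytes=8 * CHUNK):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size_bytes))
        path = tmp.name
    try:
        results = {}
        for func in (read_only, sum_bytes, my_hash):
            results[func.__name__] = timed(func, path)
        return results
    finally:
        os.remove(path)


results = run()
for name, (_, seconds) in results.items():
    print(f"{name:10s} {seconds:.4f}s")
```

A freshly written file is almost certainly served from the OS page cache, so the first number here is closer to a memory-copy baseline than a true disk read. A real benchmark would need a much larger file or a cold cache.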
Library vendors, such as Dinkumware, who provide the CRTL (C Run-Time Library), _should_ be catching these shitty performance bugs, but sadly they don't. The only solution is to be proactive.
The zeroth rule in programming is:
* Don't Assume, Profile!
Which is analogous to what carpenters say:
* Measure Twice, Cut Once.
But almost no one wants to MAKE the time to do it right the first time. You can either pay now, or pay later. Fix the potential problems NOW before they become HUGE problems later.
And we end up in situations like this story.
Same here
The users are getting a correct result. Good.
The developers moved on to something else that's also important. Good.
The machine is doing 15% more work than strictly necessary... Is it slowing down the users? No. Are we getting hammered by the electricity bill? No. Is the machine getting tired? No. So what exactly is the problem?
Like the real Donald (Knuth) said: "premature optimization is the root of all evil (or at least most of it) in programming".