Performance Bugs, 'the Dark Matter of Programming Bugs', Are Out There Lurking and Unseen (forwardscattering.org)
Several Slashdot readers have shared an article by programmer Nicholas Chapman, who talks about a class of bugs that he calls "performance bugs". From the article: A performance bug is when the code computes the correct result, but runs slower than it should due to a programming mistake. The nefarious thing about performance bugs is that the user may never know they are there -- the program appears to work correctly, carrying out the correct operations, showing the right thing on the screen or printing the right text. It just does it a bit more slowly than it should have. It takes an experienced programmer, with a reasonably accurate mental model of the problem and the correct solution, to know how fast the operation should have been performed, and hence if the program is running slower than it should be. I started documenting a few of the performance bugs I came across a few months ago, for example (on some platforms) the insert method of std::map is roughly 7 times slower than it should be, std::map::count() is about twice as slow as it should be, std::map::find() is 15% slower than it should be, aligned malloc is a lot slower than it should be in VS2015.
THEY'RE GONNA GET YOU. Unless, of course, you use Rust. Then you are safe.
Too bad companies don't pay experienced programmers more than they do a freshly-graduated pimplebottom. So that's what they get.
In SW development it's easy to spot obviously dumb decisions in code and those are probably plentiful. But it's rarely as obvious what the best solution is, mostly because there's always a tradeoff. You can generally use up extra memory in order to boost processing speed (i.e. lookup tables, rainbow tables) and vice versa. This in itself means that on a machine where memory is the bottleneck (and if you exceed it you have paging and swapping on disk) the "best" solution is different than on a machine with excess memory, but a weak processor.
TL;DR: benchmarks depend on hardware. "Best" is often unattainable in a general sense.
It's stupid to call them "the dark matter of programming bugs". We were just accustomed to this being the way Microsoft did things, not a bug, a feature.
That stems from Microsoft, originally writing for IBM, being paid per thousand lines of code. As such it made sense that software was not written efficiently because the programmer was not rewarded for efficiency, it merely had to fit within available memory. Unfortunately it seems that this practice has not stopped given the sheer size of Microsoft operating systems relative to the amount of end-user functionality that has been added since the days of say, Windows 3.1.
Do not look into laser with remaining eye.
If performance sucks, buy a faster computer. Speed covers a multitude of sins.
A performance bug is when the code computes the correct result,
So... not bugs at all, then, really.
Inefficiency is frequently a better option than nothing at all, and all code suffers from bit-rot.
Understand how CPU works, code in C, don't use shitty 3rd party libraries.
>> that the user may never know they are there
They will if they try to run a lot of them on a machine with finite resources, like a phone. Or it's a process that's iterated frequently, like a "big data" operation. But if the end user STILL doesn't notice it...then it's hard to call it a bug.
On the other hand, the performance/just-get-er-done trade-off is well known to programmers of all stripes. (At least I hope it is - are people really finding new value in the article?) There's the quick and dirty way (e.g., a script), and then there's the "I can at least debug it" way (e.g., a program developed in an IDE), and then there's the optimized way, where you're actually seeing if key sections of code (again, especially the iterated loops), are going as fast as possible. Generally your time/cost goes up as your optimization increases, which becomes part of the overall business decision: should I invest for maximum speed, maximum functionality, maximum quality, etc.
This story smells like advertising for some guy's template library.
In a project I'm working on, I was sorely tempted to switch from STL to Boost. Trouble is, I didn't want to add another dependency to the project, when the STL was adequate. And container performance doesn't matter much when the network is the bottleneck.
Is it a problem, if no one notices? Most code can be optimized more than it is, but if everyone's happy with how it works, what's the problem?
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Like windows update doing something in N*N instead of N?
Yeah it's a little bit slower... At the beginning.
Programmers need to always keep in mind that their software is always going to be used, by some people, for a longer duration than intended, and with larger datasets than expected.
of C, C++ ("embedded C++"), assembly on Intel (with MMX/SSE/SSE2/...) and some VLIW architecture on TI C64x.
But I am now managing projects.
Because nobody gives a shit about performance, except for very specific niches (here audio video codecs).
The first example: std::insert should do nothing if an item is already present. So calling std::insert only if the object is not yet present is 100% equivalent to calling std::insert directly, but it is 7 times slower.
This violates the important principle that when using a library, the obvious way to do things should be the fasted. So hacks are required to make your code fast, and that shouldn't happen.
I assume the explanation is probably that std::find is small enough to be inlined, while std::insert is too big, so nothing is inlined at all when you call it.
All code can be optimized by 1 instruction
All code contains at least 1 bug.
Therefore, any program can be reduced to a single instruction, but it will be wrong.
On what basis does the author assert things like "x% slower than it should be"? If the code meets the requirements, including all performance requirements, then there is no bug. Once the requirements are met, any further efforts to be x % better is waste. Development efforts should have moved on to the next project.
It is a losing battle to try and solve performance in the programmer space. The Compiler does a much better job of optimization due to a multitude of compiler trics including both Static and dynamic analysis, cache analysis and so on. The programmer trying to write the most efficient code should rather spend his/her time trying to use out of the box algos as far as possible as the compiler knows how to fine tune those. next they should run a profiling tool like jprofiler and see where the job is actually spending its time rather than trying to say this is probably the heaviest part of the program. With multiple cores and multiple instruction pipelines and optimizing compilers the bottleneck is oftentimes not where we would think it to be. Once we find the bottleneck using a profiling tool than we can optimize it. In most cases 2% of the code is causing 98% of the bottleneck so its a much better use of programmer time (which is of course more expensive than computer time in most cases) to work backwards.
1 Write your code so that its correct irrespective of efficiency,
2 profile and then
3 fix the bottlenecks
rather than trying to find the most efficient algorithms before you write your code.
**Life is too short to be serious**
Programmers love to use the cop-out
"Premature Optimization is the root of evil"
dogma which is complete bullshit. It tells me your mindset is:
Except later never comes. /Oblg. Murphy's Computer Law:
* There is never time to do it right, but there is always time to do it over.
As Fred Brooks said in Mythical Man-Month.
Which can be translated into the modern vernacular as:
* Show me your code and I'll wonder what your data structures are,
* Show me your data and I'll already know what your code is
There are 2 solutions to this problem of crappy library code.
1. You are benchmarking your code, ALONG THE WAY, right?
Most projects "tack-on" optimization when the project is almost completed. This is completely BACKWARDS. How do you know which functions are the performance hogs when you have thousands to inspect?
It is FAR simpler to be constantly monitoring performance from day one. Every time new functionality is added, you measure. "Oh look, our startup time went from 5 second to 50 seconds -- what the hell was just added?"
NOT: "Oh, we're about to ship in a month, and our startup time is 50 seconds. Where do we even begin in tracking down thousands of calls and data structures?"
I come from a real-time graphics background -- aka games. Every new project our skeleton code runs at 120 frames per second. Then as you slowly add functionality you can tell _instantly_ when the framerate is going down. Oh look, Bob's latest commit is having some negative performance side effects. Let's make sure that code is well designed, and clean BEFORE it becomes a problem down the road and everyone forgets about it.
2. You have a _baseline_ to compare against? Let's pretend you come up with a hashing algorithm, and you want to know how fast it is. The *proper* way is to
* First benchmark how fast you can slurp data from a disk, say 10 GB of data. You will never be FASTER then this! 100% IO bound, 0% CPU bound.
* Then, add a single-threaded benchmark where you just sum bytes.
* Maybe, you add a multi-threaded version
* Then you measure _your_ spiffy new function.
Library Vendors, such as Dinkumware who provide the CRTL (C-Run Time Library), _should_ be catching these shitty performance bugs, but sadly they don't. The only solution is to be proactive.
The zeroth rule in programming is:
* Don't Assume, Profile!
Which is analogous to what carpenters say:
* Measure Twice, Cut Once.
But almost no one wants to MAKE the time to do it right the first time. You can either pay now, or pay later. Fix the potential problems NOW before they become HUGE problems later.
And we end up in situations like this story.
Along the lines of preoptimization is the root of all evil. Because if the user doesn't notice the .01ms of lag caused by a performance issue why the hell would I want to pay a software engineer 1hr of time to fix it.
Indeed. A human being can not even perceive a difference between 1 millisecond and 1 microsecond.
But, repeated a million times, the former turns into 15 minutes, whereas the latter is still merely a second. Food for thought...
In Soviet Washington the swamp drains you.
Check the links, decent code and analysis. Short and simple. I recently found a very similar bug in both PHP and HHVM with their trim() function (and variants there of). In both PHP and HHVM, trim() unconditionally allocates more memory, even if there is no white-space on either end of the string to trim. It is faster to write PHP code to check for white-space on both ends and then conditionally call trim() on a string.
The programmer is writing crappy and unoptimized code.
It's a myth of the cult of unit-testing that all tests passed == no bugs. Correctness bugs outlive all executed tests, and are a worse form of dark matter because they give no outward signs.
Come back when you want to talk unnecessary complexity increases, like the update procedure that takes exponential time.
I interact with a piece of code I don't maintain that has this issue. A piece of Visual Basic (don't get me started) that copies from a source to a destination. However instead of using B=A to move the data, the author used copy(A) and paste(B). The data therefore interacts with the clipboard, slowing the process down.
The issue might not be noticeable on a small amount of data. However, I use this piece of code to move gigabytes of data every day.
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Showed this to Netflix once and they stated 'This fixes everything we are currently having issues with'. Apparently the entire industry has implemented API's in distributed architectures in such a way as to create architectural cross cutting concerns... https://www.slideshare.net/bob...
This is my sig. There are many like it but this one is mine.
Surprise, surprise, surprise!
We were required to install it on our Linux servers - we run CentOS (same as RH). Every few days, the stupid monitor is suddenly eating 99%-100% of the CPUs... for *hours*. Overnight.
I attached strace to it, and it's in some insanely tight loop, looking at its own threads.
Maybe if I prove that it's doing it on multiple servers (it is, but I have to catch it - nothing's reporting this, unless it runs the system so hard it throws heat-based machine checks), and put a ticket in, and *maybe* the team that forced it on us will *maybe* talk to CarbonBlack....
I have *zero* doubt that there's a ton of other COTS products with issues like this.
Being an old fart, in my day, I remember the worst performance problems were caused by programmers with their own badly written library of functions and objects that they included everywhere, most of those were from their very first weeks of being a programmer and they sucked badly.
And if you think the performance N.F.R. is dark, wait 'til you find out about security.
NFR
People use sloooow languages, like C or ASSembly which "compile" into a static binary, which is slow.
Java uses Hotspot [garbage collecting ] so applications are always [GC] impro [GC] ving performance on the f [GC]l [GC] [GC][GC] [GC] y and go even faster.
I don't [GC] [GC] [GC] bother with performance. need to sort? I use bubb[GC] [GC] [GC]le sort, because Java can make it [GC] [GC] [GC] [GC] so much faster than [GC] [GC] [GC] c's "quick" sort, and it saves me lots of [GC] [GC] [GC] [GC] [GC] typing.
For more performance, Java applications should run on a java system, running a java bytecode interpreter running on java. More [GC] [GC] [GC] recur[GC][GC][GC]sion means more [GC] [GC] [GC] [GC] speed, and less mem[GC][GC][GC][GC][GC]ory.
In 1973 or 1974 I was the systems programming manager at the National Academy of Sciences after our IBM mainframe had been upgraded to the first version of the OS supporting virtual storage. And many programs, mostly Fortran programs, were running much slower than they used to. The problem was two-dimensional arrays and how they were initialized and searched. If you're looping through the wrong subscript in a big array, you cause a page fault every time you increment it. Flash storage makes that a much smaller problem than it is with disk drives, but I'm sure most programmers today have no idea that this problem exists or how to handle it.
Only because Fortran stores multidimensional arrays in column-major order, while every other language in the known universe uses row-major order.
The rule of thumb for programming anything is, first make it work, then make it work better / faster.
If the first pass works well enough and fast enough, it doesn't matter if the code was written an efficient manner. If somebody used bubble sort for an array of 5 items, who cares? If the array becomes larger, now you have a performance bug.
It's literally wasteful to spend time on performance enhancement before you know which performance problems actually occur in real life. Another name for premature performance enhancement is "gold plating."
"an experienced programmer, with a reasonably accurate mental model of the problem"
Well, that's one of the many things that separate programmers from coders as I designate them. I.e., the ones who know and do vs. the ones who have a limited clue at most but do it nonetheless and behave like they'd be the gods of algorithm and program development. In my book, and in my area of work, a code that gives the result but it's 2-7-whatever times slower than it could be with some actual knowledge besides tapping codewords in the correct order, that code is nothing besides a limited applicability proof of concept at best, targeted for supporting viability and then destined for disposal. That's why I never worked with coders and I don't care for the gazillions out there who think the ability to use python makes them devs or software engineers, oh my. Also, for the poster, performance issues regarding stl, well, good story, no news.
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
Well, if you think of video stutter on Windows or Linux with a bad Nvidia driver, then it really is dark matter.
So, I profile that code, I find, hm, odd, this loop is taking a lot of time.
It's accessing some array, wierd, array accesses are normally blazingly fast.
Oh look, it's a two dimensional array, that's something you don't see every day!
Let's play a bit with the code, hey, it's fast now that I'm looping through it differently... hm, I wonder how that thing is laid out in memory... oh, could it be that it is causing cache line misses / page faults / disk cache misses (yes, those abstractions are present at every level)?
The first time you pulled this wastage you would get a verbal warning.
Second time you pulled this, I'd try to fire you, but HR would want a written warning, so I'd give you a written warning.
Third time you pulled this, I'd sent the second written warning to HR and they would have security frog march your time, money and attention wasting self out the door.
Premature means premature. Learn the word, or leave. Feel free to discuss if its changed to mature, and life is good. Add your own requirements and delay shipping, GTFO.
(CAPTCHA: Shouted)
If muslims were made of dark matter then their matter bombs wouldn't interact with them and they wouldn't be suicide bombers. They would just be bombers.
This is why I do all my coding in PHP and JavaScript. Performance concerns are jettisoned before the coding even starts.
For an explanation: http://thedailywtf.com/article...
Isn't this why we have profilers like Valgrind to identify slow functions?
Anons need not reply. Questions end with a question mark.
Often performance issues are caused by a sub-optimal algorithm. It works, but a different approach would be better. A recent dev post about the game Factorio highlighted a more straightforward issue:
A one-parameter fix to a long-standing bug. Gotta love it.
"I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
I came here looking for the usual discrediting due to edge cases, obscure platforms or bad statistics. Instead there's a bunch of arguing over the tired old "it does (not) matter", so I had to look myself. Well blow me down, this is as clear as day - on recent Visual Studio and XCode with clang, unsorted_map::insert is 7 times slower than calling unsorted_map::find first and skipping the call to insert if it already exists. That's just what insert is supposed to do for you. The platforms, the compilers, the library and the functions are all extremely common. Debugging suggests the delay is due to an unnecessary alloc/dealloc. Good find!
So many performance-insensitive clods commenting on Slashdot these days.
Only because Fortran stores multidimensional arrays in column-major order, while every other language in the known universe uses row-major order.
Julia is another that I know of, not surprisingly because it continues the scientific computing traditions of Fortran in many ways. From TFM:
Multidimensional arrays in Julia are stored in column-major order ...
This convention for ordering arrays is common in many languages like Fortran, Matlab, and R (to name a few). The alternative to column-major ordering is row-major ordering, which is the convention adopted by C and Python (numpy) among other languages.
Escher was the first MC and Giger invented the HR department.
SQL Server's query planner gets better and better with every release, but it still screws up on a regular basis. To have efficient queries on SQL Server the programmer (in the absence of a full-time DBA) needs to know far more about the internals of the server than they really should. Some of the common issues for slowly running queries I see include:
SQL Server's a mine field of bad performance for the unaware. I doubt it's much different in Oracle and MySQL/MariaDB.
From my experience, some hackers are polite and professional, but lack the skills, experience or technology to be worth it for me,I once hired a hacker and charged me $460 but could not locate the person who hacked my gmail account and sent out personal emails. I then hired ZeusHacks and they located the perpetrators within 6 hours at a cheaper price. You can contact them on ZEUSHACKERS01 at OUTLOOK dot COM . They offer lots of hacking services like social media hacks like facebook hacks, kik hacks, bank account hacks, iCloud hacks, whatsapp hacks, recover passwords, upgrading school grades
Although, a "scripting" language without a serious performance bug may actually process a larger data set faster than a program written even in a system language like C, if said latter program contained a serious performance bug. But not making obvious mistakes is hardly "optimization". At the risk of making a little bit too contrived example, the decision not to use a bubble sort is not something I'd call optimization. It's just common sense.
Ezekiel 23:20
I'm pretty sure that you have effectively still the same problem with CPU caches and that this is the reason why Fortran compilers regularly perform loop transformations today.
Ezekiel 23:20
But the problem most likely exists in all the newer languages, yet I have never heard of programmers being aware of it or optimizing for it.
I might be biased because I work in the computer graphics industry but memory coherency is the first thing we think about when doing something new. This is also a major point if you look at any of the presentations about performance at CppCon (C++ conference). They are easily found on YouTube. See also data oriented programming. This is NOT a forget ten problem at all.
Every numerical methods text involving a scientific math library has warnings about the array-transposition bug. Huge math optimization work has gone into dealing with it. The problem is much worse in modern super-computers than in the older computers. In the old computers, memory accesses were fairly predictable. New supercomputers are clusters of computers. Each node can only hold a small section of a large array, and communication time is often a function of both the distance between nodes in hops and total communication load (saturated interconnects).
Huge work goes into figuring out how to do array operations in a fast manner. The work often becomes highly application dependent. Find special short-cuts that apply to particular matrices that allow one to make use of special theorems.
Huge work goes into figuring out how to do array operations in a fast manner. The work often becomes highly application dependent. Find special short-cuts that apply to particular matrices that allow one to make use of special theorems.
This is actually one of the things I'd like to see in the kind of experimental compiler I outlined above.
Ezekiel 23:20
or security admins who misuse McAfee
Table-ized A.I.