Programming As If Performance Mattered
Junks Jerzey writes "Saw the essay 'Programming as if Performance Mattered', by James Hague, mentioned at the Lambda the Ultimate programming language weblog. This is the first modern and sensible spin on how optimization has changed over the years. The big 'gotcha' in the middle caught me by surprise. An inspiring read." Hague begins: "Are particular performance problems perennial? Can we never escape them? This essay is an attempt to look at things from a different point of view, to put performance into proper perspective."
If only I had written my first post program with performance in mind I could have not failed it!
The golden rule of programming has always been that clarity and correctness matter much more than the utmost speed. Very few people will argue with that
Really? How about the server side of things?
Shameless bragging: Why don't you take a look at my page to get a whole new view on peformance?
Is the time it takes me to do the performance optimization worth it in time or money.
is that ms word 4 did all I need, and now the newest office is a thousand times the size and uses so much more cpu and ram but does no more.
a sad inditement
I code in managed environments (.NET and Java), so I just let some mysterious thing manage performance for me.
You can spend all your time optimizing for performance and when you finally release your product, your competition whose main objective was to get the product out the door faster, who uses a slower algorithm, is already first in mindshare with your customers. Not only that, the processors that you thought you would be targetting are already a generation behind and that algorithm that was going to hold back your competition runs perfectly fast on new processors.
Performance gains occur at the hardware level. Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.
I have been pwned because my
What, you mean I don't get paid per line of code I write?
i remember times when 7.14mhz and 256k ram was enough to drive a multitasking windowed os. (amiga)
ive seen glenz vectors and roto-zoomers on the commodore 64.
modern os's, escpecially windows seem super-sluggish when you see what is possible on those old computers if you just care to optimize the code to the max.
Can I light a sig ?
Unroll those loops by hand. You'll get a little bump in speed. And a little bump in your pocketbook.
this slashdot post would read:
I just finished reading the essay "Programming as if Performance Mattered", by James Hague. The essay covers how compiler optimization has changed over the years. If you get bored, keep reading; there's a big 'gotcha' in the middle. Hague begins: "Will performance issues haunt us forever? This essay puts performance analysis in perspective."
...it would have been better for him to show the run times for all the versions of his program to show us what difference each of the changes had made...
Sheesh!
Nothing is so smiple that it can't get screwed up.
If you write clear and simple code the compiler or interpreter does all the other work. It will automatically remove unused code and simplify complex segments. So long as your code is not unnecessarily convoluted often the machine optimizations are better than the human brain optimizations. It's like register allocation. You don't do that by hand. That's just crazy! Some poor fools 20 years ago had to do it by hand and came up with an algorithm to do it that the computer just does for you.
That's the difference between modern languages and more archaic ones. Sure you can't get the "absolute best" most optimized optimization, but you're probably going to get a better optimization than you can think of just from the interpreter/compiler doing its job.
The only thing that really needs optimization is streamlining data structures because the compiler can't predict what part of the data structure isn't used during runtime. You just ned to make sure you use the right data structure for the job and put the basic pen-and-paper (optimized) algorithm down in plain code. No strange hacker tricks needed.
While the author's point seems to be that optimization and performance are not all that important, and that you can achieve better results with how you do things and not what you use, I tend to disagree with him.
The thing is, in real life applications, playing with a Targa file is not the same as service critical, 300 users, number crunching, data handling systems, where a small performance improvement must be multiplied by the number of users/uses, by many many hours of operation and by years in service to understand its true impact.
Just now I'm working on an econometric model for the Panama Canal (they're trying to make it bigger and need to figure out if it's worth the effort/investment) and playing with over 300 variables and 100 parameters to simulate dozens of different scenarios can make any server beg for more cycles, and any user beg for a crystal ball.
There are two kinds of people in the world: Those with good memory.
should really read that essay! Maybe then we wouldn't need dual-core 4-6 GHz CPUs and 2GB ram to run their new OS.
bash: rtfm: command not found
One of the concepts touched upon is the idea that optimization is only needed after profiling. Having spent the last few years building a system that sees quite a bit of activity, I have to say that we have only had to optimize three times over the course of the project.
The first was to get a SQL query to run faster: a simple matter of creating a view and supporting indexes.
The second was also SQL related, but on a different level: the code was making many small queries to the same data structures. Simply pulling the relevant subset into a hash table and accessing it from there fixed that one.
The most recent one was more complex: it was similar to the second SQL problem (lots of high overhead small queries) but with a more complex structure. Built an object to cache the data in with a set of hashes and "emulated" the MoveNext, EOF() ADO style access the code expected.
We have also had minor performance issues with XML documents we throw around, may have to fix that in the future.
Point? None of this is "low level optimization": it is simply reviewing the performance data we collect on the production system to determine where we spend the most time and making high level structural changes. In the case of SQL vs a hash based cache, we got a 10 fold speed increase simply by not heading back to the DB so often.
Irony? There are plenty of other places where similar caches could be built, but you won't see me rushing out to do so. For the most part performance has held up in the face of thousands of users *without* resorting to even rudementry optimization. Modern hardware is scary fast for business applications.
Sig under construction since 1998.
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization -- For experts only: Don't do it yet.
-- Michael Jackson (not the molestor one)
66 fps on a 3 GHz machine, doing a 600x600 simple RLE decode...
Ok, it's not bad for a a language like Erlang, but it's not exactly fast.
The big point here for the author is "it's fast enough". Lots of micro- (and macro-) optimizations are done when it turns out they aren't needed. And writing in a high level language you're comfortable in is important, if it'll do the job. This is a good point.
On the other hand, even a fairly naive implementation in something like C or C++ (and perhaps Java) would probably have acheived the goal without having to make 5 optimization passes (and noticable time examining behavior).
And even today, optimizations often do matter. I'm working on code that does pretty hard-real-time processing on multiple threads and keeps them synchronized while communicating with the outside world. A mis-chosen image filter or copy algorithm can seriously trash the rest of the system (not overlapping DMA's, inconvenient ordering of operations, etc). The biggest trick is knowing _where_ they will matter, and generally writing not-horrible-performance (but very readable) code as a matter of course as a starting point.
Disclaimer: I was a hard-core ASM & C programmer who for years beta-tested 680x0 compilers by critiquing their optimizers.
"Are particular performance problems perennial?"
...
"point of view, to put performance into proper perspective."
Brought to you by the "Alliteration is the Only Literary Device I Ever Learned" School of Writing.
The golden rule of programming has always been that clarity and correctness matter much more than the utmost speed.
In the "real world", not only is correctness and clarity more important than speed, but so is stability.
Free Firefox news reader.
Working on a heavily math based application speed is necessary to the point that the user is not expected to wait a significant amount of time without something happening. I have a large background in game programming working on crap systems and it comes in handy. My tolerance for delays goes to about half a second for a complete operation. It doesn't matter how many steps are needed to perform the operation, it just all has to be done in less than half a second on a 1200Mhz system. My main test of performance is seeing how long it takes for Mathematica to spit out an answer compared to my program. Mathematica brags about being the fastest and most accurate around.
When operations take several seconds a user gets annoyed. The program is percieved to be junk and the user begins looking for something else that can do the job faster. It doesn't matter if productivity is actually enhanced. It just matters that it's percieved to be enhanced or that the potential is there.
You also have to consider if the time taken to complete an operation is just because of laziness. If you can easily make it faster, there's little excuse not to.
For distributed apps you have to consider the cost of hardware. It may cost several hours of labor to optimize but it may save you the cost of a system or few.
In the world of games half a second per operation works out to 2 frames per second which is far from acceptible. Users expect at minimum 30 frames per second. It's up to the developer to decide what's the lowest system they'll try to get that target on.
You have to consider the number of users that will have that system vs the amount it will cost to optimize the code that far.
In terms of games you also have to consider that time wasted is time possibly better spent making the graphics look better. You could have an unoptimized mesh rendering routine, or a very fast one and time left over to apply all the latest bells and whistles the graphics card has to offer.
There are countless factors in determining when something is optimized enough. Games more so than apps. Sometimes you just need to get it out the door and say "it's good enough."
Ben
Work Safe Porn
Performance is an inssue for us too.
I just want a GNU distro that runs as fast as windows 98. Debian based. And a pony.
Is that the same James Hague that wrote articles for ANTIC and Analog back in the era of the Atari 8-bits?
As another user commented, server software can benefit greatly from a large variety of optimizations, since better performance translates directly into supporting more users on fewer/cheaper servers.
Optimizations also have significant effect in software designed to perform complex computations, such as scheduling.
Also, the trend of ignoring performance considerations with the claim that modern hardware makes optimizations obselete is precisely what leads to the trend, particularly among Microsoft software, for the software to become significantly slower with each revision.
A particular illustration of this was in my last semester's 'Systems Programming and Optimization' course. The professor set us a project where we could choose an interesting subsystem of a Linux distro, analyze the code, and point out possible areas where it could be further optimized. I'm a pretty enthusiastic Debian user, so I chose to analyze the apt-get code. Our prof was very focused on low-level optimizations, so the first thing I did was to pull apart apt-get's Perl codebase and start to recode sections of it in C. At a mid-semester meeting, the professor suggested that I take it even further, and try using some SIMD/MMX calls in x86 assembly to parallelize package load calls.
This was a big ask, but me and my partner eventually had something working after a couple of weeks of slog. By this stage, apt-get was *flying* along. The final step of the optimization was to convert the package database to a binary format, using a series of 'keys' encoded in a type of database, or 'registry'. This sped up apt-get a further 25%, as calls to a machine-readable-only binary registry are technically superior to old fashioned text files (and XML was considered too slow)
Anyway, the sting in the tail (and I believe this is what the article highlights) was that upon submission of our project, we discovered that our professor had been admitted to hospital to have some kidney stones removed. In his place was another member of the faculty...but this time, a strong Gentoo supporter! He spent about 5 minutes reading over our hand-coded x86 assembly version of apt-get, and simply said "Nice work guys, but what I really want to see is this extended to include support for Gentoo's 'emerge' system...and for the code to run on my PowerMac 7600 Gentoo PPC box. You have a week's extension'
Needless to say, we were both freaking out. Because we had focused so heavily on optimization, we had sacrificed a lot of genericity in the code (otherwise we could have just coded up 'emerge' support as a plug-in for 'apt-get'), and also we had tied it to Intel x86 code. In the end we were both so burnt out that I slept for 2 days straight, and ended up writing the 'emerge' port in AppleScript in about 45 minutes. I told the new prof to just run it through MacOnLinux, which needless to say, he wasn't impressed with. I think it was because he had destroyed his old Mac OS 8 partition to turn it into a Gentoo swap partition. Anyway, me and my partner both ended up getting a C- for the course.
Let this be a lesson...read the article, and take it in. Optimization shouldn't be your sole focus. As Knuth once said, "premature optimisation is the root of all evil". Indeed Donald, indeed. Kind of ironic that Donald was the original professor in this story. I don't think he takes his work as seriously as he once did.
The economist Brian Arthur is one of the proponents of the theory of path dependence. In path dependence something is adopted for reasons that might be determined by chance (e.g., the adoption of MS/DOS) or by some related feature (C became popular in part because of UNIX's popularity).
The widespread use of C and C++, languages without bounds checking in a world where we can afford bounds checking, is not so much a matter of logical decision as history. C became popular, C++ evolved from C and provided a some really useful features (objects, expressed as classes). Once C++ started to catch on, people used C++ because others used it and an infrastructure developed (e.g., compilers, libraries, books). In sort, the use of C++ is, to a degree, a result of path dependence. Once path dependent characteristics start to appear, choices are not necessarily made on technical virtue. In fact, one could probably say that the times when we make purely rational, engineering based decisions (feature X is important so I'll use language Y) are outweighed by the times when we decide on other criteria (my boss say's we're gonna use language Z).
All projects are an exercise in scheduling, and something is always bound to fall of the radar given the real-world time constraints. In my experience, the right thing to do is get a few smart people together to isolate any problem areas of the product, and try to determine whether that code might produce performance bottlenecks in high-demand situations. If you find any warning areas, throw your limited resources there. Don't fret too much about the rest of the product.
In the business world, you have to satisfy market demands and thus cannot take an endless amount of time to produce a highly optimized product. However, unless you are Microsoft, it is very difficult to succeed by quickly shoving a slow pile of crap out the door and calling it "version 1".
So where do you optimize? Where do you concentrate your limited amount of time before you miss the window of opportunity for your product?
I know plenty of folks in academia who would scoff at what I'm about to say, but I'll say it anyway... just because something could be faster, doesn't mean it has to be. If you could spend X hours or Y days tweaking a piece of code to run faster, would it be worth it? Not necessarily. It depends on several things, and there's no really good formula, each case ought to be evaluated individually. For instance, if you're talking about a nightly maintenance task that runs between 2am and 4am when nobody is on the system, resource consumption doesn't matter, etc., then why bother making it run faster? If you have an answer, then good for you, but maybe you don't and should thus leave that 2 hour maintenanc task alone, spend your time doing something else.
For people who are really into performance optimization, I say get into hardware design or academia, because the rest of the business world doesn't really seem to make time for "doing things right" (just an observation, not my opinion).
Less code is faster than more code! Simply put, it's easier to optimize if you can understand it, and it's easier to understand if there's not so much of it. But when you optimize code that didn't really need it, you usually add more code; more code leads to confusion and confusion leads to performance problems. THAT is the highly-counterintuitive reason premature optimization is bad: It's not because it makes your code harder to maintain, but because it makes your code slower.
In a high-level interpreted language with nice syntax--mine is Python, not Erlang, but same arguments apply--it's easier to write clean, lean code. So high-level languages lead to (c)leaner code, which is faster code. I often find that choosing the right approach, and implementing it in an elegant way, I get performance far better than I was expecting. And if what I was expecting would have been "fast enough", I'm done -- without optimizing.
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
You would think that with all the years put into developing computer languages, as well as the decades of software engineering, that these algorithms and techniques would make their way into best practices.
.Net's as well as Smalltalk's are all great. Taking the power of those libraries and making them usable across different languages, even making them scriptable would bring the speed optimizations in those libraries available to everyone.
This, of course, has already begun with many frequently used algorithms like sorting or hashing being made part of the language core libraries, but more than that, it seems that duplicating effort occurs much more often than simply that.
This is one instance where Microsoft has really come through. Their COM architecture allows for inter-language reuse of library code. By releasing a library which is binary compatible across different languages, as well as backwards compatible with itself (v2.0 supports v1.9), the COM object architecture takes much of the weight of programming difficult and repetitive tasks out of the hands of programmers and into the hands of library maintainers.
This kind of separation of job function allows library programmers the luxury of focusing on optimizing the library. It also allows the client programmer the luxury of ignoring that optimization and focusing on improving the speed and stability of his own program by improving the general structure of the system rather than the low level mundanities.
Large libraries like Java's and
I have been pwned because my
Unless the compiler is VC++ 5.0-try looking at an asm dump from that puppy. Ouch.
Programming: As If Performance Mattered
Database applications.
The potential for a SQL statement to go tragically awry, hanging the user session and sending the CPU to 100%, is significant. In a well-designed database, it won't happen too often, but it will happen often enough for you to need a good DBA close at hand to deal with it.
It's probably at its worst with Oracle, which now possesses a black box called the Cost-Based Optimiser. This little piece of voodoo uses a large range of metrics to decide on an execution plan for your query, and woe betide you if it gets it wrong - you'll be tearing your hair out trying to persuade it to do things differently.
Mind you, I've also seen programmers run catersian joins against two tables that had several million rows each. But that's not optimisation; it's trying to decide what to hit them with.
If you compare a man to a monkey today, you see cognitive differences (granted, some similarities too) that tend to seperate these two related species. When a monkey attempts to open a locked box, he will try many times to pry the box open, bash at it with his fists, and the like. A man, however, quickly realizes the box is locked and uses a tool to break the lock. While the monkey's strategy is simple, and will _eventually_ get the box open, the man's strategy is more complex, but much more efficient.
Same goes with programming. If you have to search through a massive sorted database, no skilled programmer alive would use a linear traversal (simple, yet inneficient), they would use a binary search (more complex, yet far more efficient).
So when customers pay for you to get that locked box open, who's strategy will you choose?
No matter how good the compiler is, it can't possibly compensate for poor program design.
Right at the outset, you need to decide if the program is performance critical or not. Take, for example, this code:
class fred{
int q;
int setQ(int newQ)
{
q=newQ;
}
};
fred myFred=new(fred);
Now, which is going to execute faster:
myFred.setQ(10);
or
myFred.q=10;
Using your super optimising compiler, how is it going to know the best way of setting q in myFred? It can't, because no compiler could make an assumption like that.
Every time I'm tempted to start micro-optimizing, I remind myself of the following three simple rules:
2) If you feel tempted to violate rule 1, at least wait until you've finished writing the program.
3) Non-trivial programs are never finished.
One of the golden rules that Python is founded on is optimizing the algorithm, and let the compilers / interpreters do all the rest. Python, like perl and most other high-level languages, allow you to tweak the code and try out many different algorithms, without worrying about pointers and useless trivia like how much memory is available to your process.
If you master the algorithm, and you still want more speed, you can go ahead and usually cut run times by half or more by implementing it in C. But it takes a lot of work to get it right in C, and once in C, it isn't very friendly to tweaking.
About the compiler / interpreter doing the real optimizations - psyco is freely available and regularly gets close to C speeds for python. The new parrot compiler, when it comes out, will redefine the speed barriers for all high level languages ported to it.
I work at a company where we have half C-developers and half perl-developers. The perl developers code circles around the C developers, getting projects done within constraints on time and with fewer bugs. The perl programs are easier to maintain and modify. However, everyone always wants to push code into C. It's amazing when I sit down and plan out projects, and show them, "If we implemented it in Perl, we would be done in 1/4 the time with 1/4 the number of bugs, and 100% of the features." But the retort, "But C will be 3 times faster!" And I respond, "Buy 3 times as many machines to run the perl code on for all I care. It'll still be cheaper to do it in perl becuase hardware costs far less than the developer's, tester's, and manager's time!" Hey, as long as they sign my paycheck, I'll do what they ask. But if they still want to live in the 80's, that's their problem.
The radical sect of Islam would either see you dead or "reverted" to Islam.
So long as your code is not unnecessarily convoluted often the machine optimizations are better than the human brain optimizations.
That's not what optimising is. It is a logic problem, one that a computer cannot solve. It is organization, it involves using your mind. It is making sure that your code isn't doing more work that it should. Restructuring code to remove redundant operations. Finding a better way to do the things you need to do.
Claiming that a compiler can do a better job of optimizing than a human is exactly the same thing as claiming that a computer makes a better opponent in UT2k3 than a human.
Humans have creativity on their side.
Proper programming perspective? Please. People-centered programming? Pretty pathetic.
Programmer's purpose: problem-solving. Programmers prefer power - parallelizing, profiling, pushing pixels. Programmers prefer Pentium PCs - parsimonious processing power. Pentium-optimization passes Python's popularity.
Ponder.
[Previous painful posts: P, D]
"Empathise with stupidity, and you're halfway to thinking like an idiot." - Iain M. Banks
bump in your pants whaaa!!!
Optimization isn't really a hard topic. Should a programmer spend days nitpicking fifty lines of code that won't be used frequently? No. When initially writing code, should someone use Bogosort instead of Quicksort ? I'll let you figure that one out.
My biggest (reasonable) beef in the optimization area is software bloat. Programs like huge office suites containing excessive, poorly implemented crap that people won't use really ticks me off. KISS. Even the stuff that has to be complicated.
Of course, I'll always be a sucker for tweaking code for the fun of it, when I have the time. =)
This statement is false.
If you're code must process a large amount of data, look for ways of designing your program so that you serially process the data. Don't try to bring large amounts of data from a database or data file all at once if you don't have too. Once you are no longer able to contain the data in physical memory, and the program starts using 'virtual' memory, things slow down real fast. I've seen architects forget about this, which is why I'm writing this reminder.
On the other hand I've worked on a C++ project where, in a certain segment of the code, it was necessary to write our own container class to replace one of the std: classes, for performance on the SPARC architecture. Using the std: container would cause the subroutines to nest deeply enough to so that the cpu registers needed to be written out out to slower memory. The effect was enough to be quite noticeable in the app.
With today's processors, to optimize for speed, you have to think about memory utilization, since running within cache is noticably faster than from main memory. Things are not as clear cut, so far as speed optimization goes, as they once were.
While I agree that it is silly to spend eternity optimizing small routines that trivially achieve the required level of performance for their intended purpose, a lot of scientific packages and simulation software absolutely demand serious performance optimization. The mantra about premature optimization being the root of all evil is only true if it doesn't save you significant development and testing time as a result. If you have a simulation application that requires an hour to do enough work to complete a minimal function/correctness test, it immediately pays off if you spend time optimizing the code if you can reduce your test times down to a half hour, for example. I've worked on a lot of software packages that run calculations for days and/or weeks at a time on parallel computers. You always want to start out with fast sequential algorithms and data structures before you get into using multiple processors. Parallelism is inherently more complex, so its often worth it to squeeze the most you can out of a single thread/process before going to multiple threads/processors. While desktop apps typically consume a negligable amount of CPU time/resources, apps that are candidates for running on parallel computers, clusters, or big SMP machines are inherently more costly to run in both CPU time and user's wall-clock time, so those applications don't fall within the same logic that trivial apps like image decompression/formatting do.
I remember looking over something once that was clear, simple and very slow. It was a set of at least twenty if statements, testing the input and setting a variable. The input was tested against values in numeric order, and the variable was set the same way. Not even else if's so that the code had to go through every statement no matter the value. I re-wrote it to a single if, testing to see if the input were in the appropriate range and calculating the variable's value. No compiler is going to do that. Brute force can be clear, simple and slow.
Good, inexpensive web hosting
I agree with your statement.
In my opinion there are two ways to optimize code. The bad way, optimize line by line, writing ASM, etc. And then there is the good way which is using better program design and algorithms.
Of course the "bad way" can be good in certain specific occasions (scientific, embedded, etc), but I would say that in general, it's just a waste of time. For starters, the compiler takes into account things like the number of registers and parallel pipelining that we can not optimize manually in C.
Also, it can often be cheaper to just buy a 200MHz faster processor than spend piles of the cash optimizing software to the last bit and maintaining it. Of course, this is not so true with widly distributed desktop products, but it is for large in-house of consultation projects.
This article seems to be something that I learned twenty years ago... performance is an aspect of good design.
That is why I insist on "optmization" in the beginning. Not peephole optimization - but design optimization. Designs (or "patterns" in the latest terminology) that are fast are also naturally simple. And simple - while hard to come up with initially - is easy to understand.
But that's also why I discount any "high level language is easier" statement, like this fellow makes. It is significantly harder to come up with a good architecture than learning to handle a "hard" language. If you can't do the former (including understanding the concepts of resource allocation, threads, and other basic concepts), you certainly aren't going to do the latter. Visual Basic is not an inherently bad language because you can't program well in it. It just attracts bad programmers.
And that goes the same for many of the newer "Basics": these "managed languages" that make it so that people can "code" without really understanding what they're doing. Sure, you can get lines of code that way. But you don't get a good product.
And then the whole thing falls apart.
There seems to be two basic causes of bad performance:
:-))
1. Mathematically impossible to do it any other way.
2. Modularity.
Of course crap code/logic also counts, but it can be rewritten.
The problem with modularity is that it forces us to break certain functions down at arbitrary points. This is handy for reusing code, of course, and it saves us a lot of work. Its the main reason we can build the huge systems we build today. However, it comes with a price.
While I don't really know how to solve this practically, it could be solved by writing code that never ever calls other code. In other words, the entire program would be custom-written from beginning to end for this one purpose. Sort of like a novel which tells one complete story and is one unified and self-contained package.
Programs are actually written more like chapters in the mother of all choose-your-own-adventure books. Trying to run the program causes an insane amount of page flipping for the computer (metaphorically and actually
Of course this approach is much more flexible and allows us to build off of the massive code that came before us, but it is also not a very efficient way to think about things.
Personally, I think the languages are still the problem because of where they draw the line for abstractions. It limits you to thinking within very small boxes and forcing you to express yourself in limited ways. In other words, your painting can be as big as you want, but you only get one color (a single return value in many languages). It is like we're still stuck at the Model T stage of language development--it comes in any color you want as long as its black!
Hexy - a strategy game for iPhone/iPod Touch
1. Is that your sig or is that part of your comment? If it is part of your comment, please explain why it would give me a whole new view on performance. If it's your sig, then spooky how it was related to the topic.
2. Assuming your stuff is good, when are you going to code up SHA-1 (*MY* favorite hash)?
3. On the server side of things, I would argue that correctness is more important than otherwise. If an app crashes 1 in 100 times for a desktop user, the developer blames windows and the user is satisfied (don't flame me on this, please). On the server, if the app crashes 1 in 100 times, it may bring down the transactions for 100s of users, making things very bad for the developer. For non-crash correctness problems, consider a problem which makes a minor, but cumulative error in subsequent runs. That would likely be disasterous for the server situation.
As far as clarity, find me one developer who has taken over a project and not complained about the quality of the inherited code ever. Seriously. (that's not directed at parent)
Network Security: It always comes down to a big guy with a gun.
- Decoding a RLE data buffer is short of impressive as a benchmark. RLE was designed as a simple and specific (generally inefficient) compression approach for age-old hardware (i.e. 8MHz, not 333MHz as the base system used here).
How about JPEG or PNG ?
- The author actually spent several iterations optimizing this Erlang code. And these optimizations required handling special cases. (So performance eventually did matter to the author?) Now, would a 'first throw' implementation in C/C++ have been written faster while immediately performing better than the Erlang version? (simpler code)
- I agree that the compiled/interpreted code performance matters less and less, because processors are so much more powerful. For instance, the processing for RLE decompression should in any case be negligible wrt the memory or disk i/o involved.
What is becoming increasingly important, however, is the data structures and algorithms that are used. In this perspective, C++ still shines, thanks to the flexibility that its algorithms and containers library provides.
C++ offers both a high level of abstraction (working with containers), and provides the ability to convert to a different implementation strategy with ease - if and when profiling demonstrates a need.
For large system and library development, the strong static typing of C++ is also a real plus (it doesn't matter to me it is faster than dynamic typing or not).
I totally agree that performance should not be a concern during program implementation (other than avoiding 'unnecessary pessimization', which involves the KISS principle and knowledge of language idioms). Optimization should only be performed where the need for a speed-up has been demonstrated.
Other than saying "wow this interpreted language runs damn fast on current hardware", this article does a poor job at making any relevant point.
radix omnia malorum prematurae optimisatia est -- Donald Knuth
I am hearing a lot of people saying that you shouldn't optimise prior to the first release. However, it is very easy to select a design or architecture that limits your high end performance limit. Therefore, there is some optimisation that needs to be done early.
When you're architecting a system that is going to take tens of man years of effort to implement, you need to ensure that your system will scale.
For example, a project I recently worked on hit a performance wall. We had left optimisation for later, always believing that it shouldn't be done until last. However, while the architecture chosen was incredibly nice and clear, it limited the performance to 1/3th what was required. Back to the drawing board, we just doubled the project cost - ouch.Even worse, there are performance differences on each platform! For example, did you know that throwing an exception is 10,000 times slower than a return statement in HP/UX 11? Solaris is only a little better at 2 orders of magnitude. Linux is (I understand) a dead heat.
So, while low-level optimisation of statements is silly early in the project, you do need to ensure that the architecture you choose is going to meet your performance requirements. Some optimisations are definitely necessary early in the project.
The article also talks about tool selection, suggesting that the extra CPU could be better used to support higher level languages like Erlang. If a system has CPU to spare, I agree, use what you can. The projects I work on always seem to lack in CPU cycles, disk write speed, and network speed. You name it, we're short of it. In fact, a large part of our marketing strategy is that we are able to deliver high performance on low end systems. What would happen to us if we dropped that edge? We're working with a company that has implemented a real-time billing system in Perl. Not a problem, until you try and send it 500 transactions/second. Their hardware budget? Millions to our 10s of thousands. Who do you think the customer likes more?
Jason PollockMy software engineering prof. believes that optimization should never be done during a project. Instead he thinks the programmer should wait until the project is complete then give careful consideration as to wether to optimize or not. He says most problems can be fixed by upgrading to better hardware and hours of optimization is not worth 3-4k more in hardware costs. I thought he was crazy to preach this during lecture. What do you guys think? Would you spend a day designing a better algorithm or finish the project and buy faster hardware?
My company which does a good bit of open source development uses very similar methodologies to those outlined in the essay.
HOW'S MY POSTING? CALL 1-800-POSTING
Sigh. One of the best sources of flamebait is being right for the wrong reasons.
Surely C++ must rate as the least well understand language of all time. The horrors of C++ are almost entirely syntactic, beginning with the decision to maintain compatibility with the C language type declaration syntax and then adding several layers of abstraction complexity (most notably namespaces and templates).
There are only two areas where I fear C++ for program correctness. The first is making a syntactic brain fart leading to an incorrect operator resolution or some such. These can be tedious to ferret out, but most of these battles are fought with the compiler long before a defect makes it into the production codebase.
My second source of fear concerns interactions of exception unwinding across mixtures of object oriented and generic components. I see this as the only case where managed memory provides a significant advantage: where your program must incorporate exception handling. If you can't manage your memory correctly in the absence of the exception handling mechanism, I really don't believe you can code anything else in your application correctly either. I think exceptions are mostly a salvation for poor code structure. If all your code constructs are properly guarded, you don't need an error return path. Once a statement fails to achieve a precondition for the code that follows, the code path that follows will become a very efficient "do nothing" exercise until control is returned to a higher layer by the normal return path, whereupon the higher layer of control can perform tests about whether the objectives were achieved or not and take appropriate measures. I think the stupidest optimization in all of programming is cutting a "quick up" error return path that skips the normal path of program execution so that the normal path of execution can play fast and loose with guard predicates.
The four languages I use regularly are C, C++, PHP, and Perl. Perl is the language I'm least fond of maintaining. Too many semantic edge cases that offer no compelling advantage to motivate remembering the quirk. C++ has many strange cases, but for C++ I can remember the vast majority of these well enough, because I've stopped to think about how they evolved from taking the C language as a starting point.
I happen to love PHP for the property of being the most forgettable of all languages. I forget everything I know about PHP after every program I write, and it never slows me down the next time I sit down to write another PHP program. The managed memory model of PHP appeals to me in a way that Java doesn't, because as an inherently session-oriented programming model, PHP has a good excuse for behaving this way.
I have a love/hate relationship with both C and C++. I write one program at a high level of abstraction in C++ and then when I return to C it feels like a breath of fresh air to live for a while in an abstraction free zone, until the first time I need to write a correctness safe string manipulation more complicated than a single sprintf, and then I scream in despair.
The part of my brain that writes correct code writes correct code equally easily in all of these languages, with Perl 5 slightly in the rear.
If I really really really want correct code I would always use C++. The genericity facilities of C++ create an entire dimension of correctness calculas with no analog in most other programming languages. The template type mechanism in C++ is a pure functional programming language just as hard core as Haskell, but because C++ is a multi-paradigm language, in C++ you only have to pull out the functional programming hammer for the slice of your problem where nothing less will do.
What I won't dispute is that C++ is a hard language to master to the level of proficiency where it becomes correctness friendly. It demands a certain degree of meticulous typing skills (not typing = for ==). It demands an unflagging determination to master the sometim
So much as an attempt to "prove" that programming to the metal is no longer necessary or desireable. (IE "After all, if a C++ programmer was truly concerned with reliability above all else, would he still be using C++?" )
The analogy is all wrong. These days there are distinctly two types of "optimization". Algorithmic and the traditional "to the metal" style.
During college I worked with the English department training English students to use computers as their work had to be done on a computer. (This was before laptops were commonplace) The theory was that word processing allowed students a new window into language communication. To be able to quickly and painlessly reorganize phrases, sentences and paragraphs showed the students how context, clarity and meaning could change just by moving stuff around.
This is what the author has discovered. That by being able to move code actions around, he can experiment and "play" with the algorithm to boost speed while keeping error introduction to a minimum. (Ye olde basic anyone?)
He mistakenly equates this to "advanced technologies" like virtual machines and automatic memory buffer checking. In reality, we've just removed the "advanced technologies" from the process. (IE Like pointers, dynamic memory allocation, etc) (IE, ye olde basic anyone?)
There's nothing wrong with this. Though I am a C++ programmer by trade, I was far more productive when I was professionally programming Java. But that was because I had LESS creative control over the solution because of the language syntax. No passed in variable changing, no multiple inheritance, etc. So I'm thinking of how to layout the code, there's pretty much a limited way of how I'm going to go about doing that.
It's like the difference between having the Crayola box of 8 crayons and the mondo-uber box of 64. If you're going to color the green grass with the box of 8, you've got: Green. If you've got 64 colors, you're going to agonize over blue-green, green-blue, lime green, yellow-green, pine green and GREEN.
That doesn't make C++ less "safe" than Java. Sure, you can overwrite memory. But you can also create a Memory class in C++ ONCE which will monitor the overflow situation FOR you and never have to worry again.
But back to optimization:
66 fps seems really fast. But in game context it's still kind of meaningless. Here's why. You're not just displaying uncompressed images. You're also doing AI, physics, scoring, digital sound generation, dynamic music, User input, possibly networking. As a game programmer, you don't stop at 66 fps. Because if you do 132 fps, then you can really do 66 fps, and still have half a second left over to do some smarter AI or pathfind. Or if you get it up to 264 fps than you can spend 1/4 of the cycle doing rendering, maybe you can add true Dynamic voice synthesis so you don't have to prerecord all your speech!
Ultimately, my point is this. (and I think this is what the author intended) You're going to get bugs in whatever language you write in. That's the nature of the beast. VM's and 4th generation languages take away the nitty gritty of programming while still providing alot of performance power. And in alot of cases, that's a good thing. But it's still nothing more than a "model" of what's really going on in the hardware. If you're really going to push the limits of the machine you have to be able to control all aspects of it. Now, it's getting harder to do that in Windows. We spend more time coding to the OS than the metal. But in the embeddes systems category, and in console video game systems the metal still reigns and if you're going to develop a game that will push the hardware, you're going to need a programming language that will let you speak machine language. Not one that's going to protect you from yourself.
As it was in the beginning, as it always will be: Right tool for the right job.
What you say makes sense, but is completely wrong.
You have to consider the entire system design when looking at the bestplace to make the optimization. You need to look at what the bottleneck and attack that, but keep in mind the issue in upgrading the system.
Fight Spammers!
An intelligent compiler (i.e. any modern compiler you'd be likely to use) will automatically __inline the fred::setQ function, and then the peephole optimizer will reduce it down to the equivalent of myFred.q = 10;
--Jeremy
Jesus was a liberal
CS: Coder who always does low-level optimizations
UT: Guy pressing fire as fast as possible
CS: Coder who relies on compiler for optimizations
UT: Low-life using an aimbot
Performance? I don't have a problem with peformance. Just ask my wife.
I don't know why his example was so bad.
A good example would be how to detect if a king is in check in a chess program. There are a few different approches. Some are fast, some are slow, and a compiler just cannot "optimize" a slow approach into a fast one. The function is called millions of times per second in a chess program, so you want it optimized.
When I start on a program, I usually make "place holder" functions where necessary to get the program up. Sure, this will be slow, but at least I can get the program up and running quickly (the place holders usually do what they're supposed to in the most convenient-to-code way I could think of, or emulate their final functionality - for example by returning true all the time).
:)
What this achieves for me, is that I can look at the program as a whole, and _then_ identify where the problem areas are - most likely not where I thought they were... Even if the first version takes 5 minutes to run (as my first attempt at a depth-first tree search did), it works passably, and is often easier to optimize than trying to optimize each function as I write it.
Might not work for everyone, but I like coding this way
Yes?
Maybe you do take a hit? But look at what you get for your troubles. I'd say it's worth it.
And, sure enough, there's a known, exploitable buffer overflow in Microsoft's RLE image decoder.
Is setQ returning an int or not?
ok, so this guy is saying that.. he found 5 or 6 ways to improve the performance of his program by attacking things in an entirely different fashion... ok..
:
back in the day, i discovered a really great trick... you might represent it as something like...
boolean a;
a = 1 - a;
this is a zillion times more efficient than if(a == 1) a = 0; else a = 1;
it is also about the same as a |= 1; if you were going to use bitwise functions.
OK. Great.
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/
*most* of the time I trust the compiler, but where performance really matters I do it myself because sometimes the optimizer produces slower or incorrect code. Slower is fine, but incorrect code causes problems when your stuff works in debug, but not in release.
bzzzt... split infinitive, use of passive voice, etc.
Sheesh. Don't make the corrections unless you know what you're talking about. Where's the split infinitive? In "I just finished reading..."? There's no split infinitive there.
That post doesn't use passive voice, either -- "the essay" is a perfectly valid subject. Passive voice would be "compiler optimization is discussed."
Hmph.
I think it's worth noting that this guy's optimized version, while fast enough for his purposes, is probably a couple of orders of magnitudes slower than it can be: The 333 MHz Pentium II he squeezes 5 frames and some change out of can decode 30 frames/sec of MPEG-2 video and stereo AC-3 audio without breaking a sweat. The machine I use to watch DVDs is a 300MHz Pentium II. Decoding an MPEG-2 frame involves a bit more computation than decoding an RLE encoded image. Just reading and decoding the DCT coefficients for an MPEG-2 I Frame would take more computation than decoding an entire targa file.
So, yes, interpreted languages are great, and fast enough for most applications, but they're not suitable for everything, as the author ironically illustrates.
Ok, fair enough :) For a spectacularly simple example like this, of course the compiler will inline the code. And thanks for pointing out that setQ() wasn't supposed to be returning an int :)
That, however, wan't the point I was trying to make. What I was (extreamly badly) trying to say was that the compiler doesn't have a global view of the code, it can only be looking at the code from a local point of view. So, if your class design is badly thought out, the complier's performance, no matter how clever, isn't going to help for the real world.
This isn't to say that a modern compiler can't do some amazing things. If you want real performance, though, depending on the compiler to magically fix a bad design is not the smartest thing.
The problem, in my opinion, is that people go about optimizing in the wrong place.
You can spend all day optimizing your code to never have a cache-miss, a branch misprediction, divisions, square roots, or any other "slow" things. But if you designed an O(n^2) algorithm, my non-optimized O(n) algorithm is still going to beat it (for sufficiently large n).
If the asymptotic performance of your algorithm is good, then the author is right, and you may not find it worth your time to worry about further optimizations. If the asymptotic performance of your algorithm is bad, you may quickly find that moving it to better hardware doesn't help you so much.
Alan
wtf is the difference between a parameter and a variable?
BIG OH WAS here.....whoever says optimizations are not important are a) not talking about the proper optimizations b) have never heard of big OH ..
It has to due with input size N and it's rate of growth:
such as linear(O(N)) algorithms, compared to a dual nested for loop algorithms over input size N will run in O(N^3)
That can be years of straight processing on 100000 machines, or a few microseconds on 1.
lookup: http://c2.com/cgi/wiki?BigOhNotation
Some implementations of popular dynamic languages (e.g., LISP, Scheme), let you do some type inference and/or some explicit declarations, and will spit out machine code or C that will do the job that much faster. Tweak your algorithm in the slow version of the language and then produce a program that runs ten times faster with an optimizing compiler.
The Squeak VM is a great example of this. The whole thing is written in Squeak itself. Running a Smalltalk VM this way is painfully slow, but a Smalltalk->C translator generates the code that will be compiled and used as the actual, runtime VM (which can support a whole host of things, including raster and vector graphics, sound, MP3 audio and MPEG video!).
N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!
if the program contains lots of strings with embedded spaces. :)
(Well, probably not measurably faster, but you could probably shave a few bytes off the executable... depending on padding and such)
HAND.
even with troll in your username, you got modded all the way to 5.
The only reason we program is because performance is an issue. If we would have automatic systems for making specification executable, we would not have to write programs. One of the reasons, I believe, that we do not have such automatic systems is that they would not perform. We need human optimizers, people like you and me, that can before the necessary optimizations. The von-neuman computer architecture is rather slow, it is only because of the memory piramide that it is able to perform reasonable. 90% of code is dealing with moving data around different types of memory (registers, cache, RAM, hard-disk, network) and doing differentials queries, e.g. changing the result of the query when the underlying data has been modified. The screen for example, is not completely computed when we move the mouse around, or with every keyboard hit. Only the area that is changed is refreshed. And this is just one of the examples. My conclusion is that the only reason we program (and not write specifications) is because performance is an issue.
High-level optimisation before worrying about low-level? Absolutely. You could start even higher up, and discover that often the wrong problem is being solved. How's that for inefficient?
Regarding the assignments, it's a sad story but actually comes back to documented requirements and documenting the change in requirements. (Did the original assessor sign his name to his mid-task comments?)
Back on the article, yeah, ok, sounds fair, but..
- My kids still use a 200MHz machine, and simple flash animations bog it down. I remember Wolf 3D running on 33MHz machines. What the?
- My mobile phone (with palmos) has more CPU & non-volatile storage than my first five personal computers did, and my current desktop has 200 times as much again. If I weren't developing software, viewing inefficient web-hosted graphics/animations or Word documents, and being in an Active Directory, I'd probably settle for a PDA with external screen and keyboard -- or any of the PCs we're currently chucking out as "worthless".
-- All your bass are below two Hz
I used to write programs the way you describe. In BASIC, around 1982. I was self taught, so I didn't know any better.
The problem with code in the form you suggest, from a programmer's point of view, is that once a program gets to a particular size - I'd suggest around 200 to 300 lines - you can't keep track of every detail or every variable, and what they do, are for, and what their current value us, or whether the current value is valid.
Modularity helps solve this problem, by allowing you to forget irrelevent details. In other words, it allows you to focus on what you should be focusing on, at that moment. As long as the other modules are well written, and debugged, you just mentally treat them like a black box.
Your concern about modularity impacting performance is misplaced, for a few reasons. Low performance in a program is usually a result of slow algorithms, not bad programming structure - the slight performance decrease of jumping to a module will be infinately smaller than the result of picking an inappropriate sorting routine.
Secondly, if calling routines does have a measurable performance impact, you can "inline" them. This means that a copy of the routine's assembly code is inserted in the position where the module call would have been. This may improve performance, although it comes at a cost - the size of the binary increases, as there are now multiple copies of the same routine in the binary. This impacts on memory though, which, on a low end system, may mean swapping to disk, which, of course, impacts performance so significantly, that any other optimisations performed were just a waste of time.
Everything comes at a cost. Modularity comes at the cost of time spent planning the modular architecture of the program, as well as the additional actual code to implement the modules. However, the benefits of modularity to program maintenance, data hiding and code reuse far exceed these costs.
If you're interested in further reading on this topic, I'd suggest getting a copy of "Code Complete", by Steve McConnell. His first edition has been available since 1993, however, I've heard a new version is coming some time this year, so it may be worth waiting for that.
The Internet's nature is peer to peer - 20050301_cs_profs.pdf
Even traditional disclaimers such as "except for video games, which need to stay close to the machine level" usually don't hold water any more.
Yeah, as long as you write simple, 2D games(like the author of the essay does) that would be true. Complex, 3D games are another matter. I write games for a living and even if you're within sight of cutting edge you're writing at least some assembly and spending a lot of time optimizing C++.
Now I'm not knocking all he says or saying that good games need to be in C++ and assembly. Some games rely heavily on scripting languages to handle the game mechanics and world events. There's a lot less assembly code than there used to be. However, the core engine that handles graphics, physics, AI, and I/O is going to be written in C++ and assembly and will be for the forseeable future.
If I published a game that required a 3Ghz computer to display 576x576 images at 66fps, I'd be laughed off the internet. A PS2 has a 300Mhz processor and needs to display a 512x448 image every 30-60 seconds.
In the early nineties, I chose a tool with high-level scripting abilities because I naively thought learning C would be overly complex, and overkill. Processing and displaying color images on a Mac, I could do one frame every few seconds. That was just barely "good enough" on our development machines but it wasn't good enough in a lab full of old machines, where the software had to run. Then someone showed us a program called NIH Image, which could do a slideshow at several frames per second at full resolution even on the old machines. The difference? THEY USED THE RIGHT TOOL, and got results that appeared almost magical.
A decade later, I figured CPU speeds had improved so much the hardware would no longer be an issue for any practical application short of weather forecasting, but even now I find people soaking processor cycles so badly with poorly chosen software architectures that a 2GHz machine slows to a crawl.
Think: HARDWARE IS NOT FREE. If you write software that requires 10 times the horsepower, then your hardware will cost 10 times as much. If your company is spending 100K a year installing new boxes, wouldn't it justify the effort to use the right tools, do your profiling, and reduce it even by half?
yeah.. that must be the most horrible example .. of both coding and optimization.. since the coding is poor and the optimization is going to happen anything from the compiler.
Comment removed based on user account deletion
But if you haven't heard of it http://www.javaperformancetuning.com/ is a good source of performance tips for java
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
The point is not that the language is "too hard", but that the low level language creates an order of magnitude more clutter and errors which get in the way of creating a reasonable, reliable program. They also obstruct creating a reliable efficient program.
-josh
And if you don't use it, you're a damnfool idiot. Consider:
try {
vector<int> foo(10, 0);
std::cout << foo.at(10) << std::endl;
} catch (exception &e) {
std::cerr << e.what() << std::endl;
}
Note: I haven't compiled this, so please consider it just pseudocode. But the point of the matter is, old non-bounds-checked arrays exist in C++ only because it's necessary for the C heritage. The vast majority of new C++ code out there should not use non-bounds-checked arrays, and if it does, it reflects more on the programmer than the language.
Do NOT convert to C++ under any circumstances!
Fortran 77 sucks.
But C++ sucks, in different ways.
Fortran 95 is a much better language than Fortran 77, and for many things, better than C++ as well.
It is practically a new language with an old name.
If you currently have a F77 code, it is almost certainly far better to start using Fortran 95.
Essentially all Fortran 95 implementations have compile and run-time checks which can make things as safe as Java, and when you take off the checks, things will run very fast. With the Intel Fortran 8.0, probably faster than anything other than hand-tuned assembly. You will probably whip GCC C++.
It is also quite doubtful you will get significantly better performance in C++.
No, I am not an old-fogey (I'm 35 now, programming since age 13). I learned Basic on the Apple II+, then Fortran 77 and C simultaneously when I got my first summer job. Then C++. Then Eiffel and Sather. Then Fortran 95.
Yes, indeed, fully knowing C++ I choose Fortran 95 for technical superiority in the problems I want and need to solve. (Sather was the best ever, but now dead, Eiffel good if you have sophisticated data structures and you don't need multi-dim arrays, and F95 best for any linkage to Fortran and multi-dim arrays, modules but not objects).
The problem is that C++ bugs, though less frequent than bugs in C, are can be deep, subtle and severe. The language has very opaque bits. Include files are antideluvian. Pointers and references, baroque and archaic. Object model brittle. Templates powerful and dangerous. A hideous and error-prone syntax.
This is not the case in Fortran 95. Other than fully algorithmic bugs are shallow.
"computer science" truly misunderestimates Fortran 95.
Now, which is going to execute faster
Neither, since both q and setQ are private to the class and thus can't be accessed.
A compiler can do low-level optimization, but it can't figure out a better algorithm for you, and the simplest, least convoluted algorithm is usually not the fastest.
All the assembly language fiddling in the world -- by the optimizer or by hand -- will give you maybe a 2x performance over C, 10x over perl, but a better algorithm will often increase performance by many orders of magnitude.
Intially develop the entire project in a langauge you can develop fast. Once it works (and you're sure it does what the client wants), find out where the most CPU time is spent, then optimise those bits. "Optimise" may just mean having a good look at your code and working out a better way of doing it, or it might mean writing a library in assembler. Either way, optimise last.
Guilty. But at least I'm thinking about the poor SOB who's going to be maintaining my code. In fact we were just implementing new functionality using a superfast but arcane algorithm and were having trouble debugging it (mucho matrix maths - yuk). Instead of finishing that, we researched another algorithm that instead uses triply-nested loops with two conditionals. It won't be half as fast (because of conditionals within loops of course) but it will be a heck of a lot easier for my successor to maintain. (Took 10 minutes to implement and worked first time. Had to check I wasn't stuck in a BTL simulation)
Cogitum Ergo Hatto
Perhaps peon posting parent post prefers posts precognitively portending prematurely passing profiling.
Performance profiling presents plentiful possibilities per producing programs performing past previous paces.
The (I think correctly) author argues that for many tasks we over stress optimization in places where it isn't necessary. Well and fine for tasks that it's not necessary such as the example he gives.
However, as available processing power increases, some tasks change. Many technologies follow a trajectory that starts at "unthinkable" then move to "if you have special hardware" and then move gradually to software. Often along the way, features and computational complexity are added that keep a technology barely in reach (of both HW and SW implementations). It can be many many years before some technologies settle into a stage where they can be comfortably supported in SW at acceptable performance.
Examples include: sound (which started with clicks and beeps and moved through to multichannel 3D audio), graphics, games (text-based to ever-more-complex 3D) and video codecs (simple RLE moving to ridiculously complex stuff like the H.264 codec). In games, for example, there are often preference panels controlling which features should be disabled for performance reasons. This seems evidence that the authors/publishers feel they can't count on their customers having enough power to run the games without cutting features to gain performance.
I think for those applications where processing power trails needs and desires of customers and where optimization can make up the difference, developers will need to optimize or be eaten by the competition. In my experience, in things like codec and graphics development, you can get many-times performance increases over solid but poorly optimized implementations (sometimes even when you're just feeding HW).
I think those gains can be critical.
Please give me a call the day our compilers will be able to transform some poor O(3) algorithm into a nice O(1). I'm very interested.
Compiler can optimise lots of stuff, but certainly not the programmer's logic.
blah
When I was in high school I wrote an AI player for connect 4. In Basic. On a 386 equivalent machine. (errrrg slow)
I had to optimize the hell out of the algorythm to eliminate repeated calculations and simplify loop structures.
When I got to uni one of the assignments was to write an AI player for gomoku. Well all I had to do was re-implement the algorythm, this time in C, and on a more powerful machine.
Boy was it fast, and that allowed me to spend some serious time looking at future moves that probably wouldn't have been possible if I hadn't spent the time optimizing the algorythm previously.
Don't underestimate what you can acomplish in a modern CPU once you've saved time optimizing.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
After years of developing, I really take to heart two things:
Profilers are the best thing to happen to performance since compilers - really. I encounter a number of truths, but many myths about what degrades performance. A few examples of each:
Performance degraders
Not performance degraders
The "lots of object indirection" myth is one I encounter frequently. Object A calls Object B calls Object C, and it "intuitively" looks like it must be slow (Computer A calling Computer B, etc. would be slow), but even with stack frame generation, these are lightning fast compared with even the likes of "date to string" functions, never mind line-drawing commands or notification-sending.
The reason that particular myth is dangerous is that it's the single most pervasive myth (IMHO) that leads to premature optimization. People take out layers of object indirection and make it harder to put in better solutions later. I had an object that recorded object IDs in a list and let you look them up later. If I had "flattened" that into the routine that needed it, I might have effected a 0.1% speed increase (typical range for many premature optimizations). As it stood, because it hid behind an interface (equivalent to an ABC for C++ folks), when I had finally implemented a unit-tested red/black tree, it was trivial (~5 minutes) to drop in the new functionality. That's not an isolated case, either.
Mind you, I profiled the program to determine the slowdown first. Searching on the list, because so many were misses (therefore full scans), the search was taking up 98.6% of the entire operation. Switching to the red/black tree dropped the search down to 2.1%.
All in all, if you have a slow program, profile it. There is no substitute for a well-written profiler. Stepping through and "feeling" how long it takes in a debugger, while it can point you in rough directions, will miss those things that take 50 ms out of the middle of each call to the operation you're checking. Manually inserting timing calls can be frustrating enough to maintain or slow down your program enough that you can't narrow down the performance hit.
gprof works well with gcc and its relatives (make sure to add -pg to your flags), but I'm not sure if there's a good open source option out there for people using other tools that doesn't require you to alter your source.
In the Windows world, we recently got in the professional version of AQTime 3. It's an astounding package, allowing you numerous reports, pie charts and call graphs, saving the last few runs, calculating differences in performance between runs, allowing attachment to running processes, on top of a pretty nice way to define areas of the program to profile. The single nicest thing about it, though, is the performance. We turned on full profiling (that is, profiling all methods in all modules, including all class library and third party components) on the largest project we had, and it ran with perhaps a 30% slowdown. If you've used profilers before, you know how astounding that is ;)
Profiling applications always surprises me. In one case, a space-making algorithm I was running on controls seemed a little pokey; I found out more than 50% of the time spent was on constantly verifying that the lists were sorted. Today, I was investigating a dialog that looked like it must hav
Binary geeks can count to 1,023 on their fingers
Please give me a call the day our compilers will be able to transform some poor O(3) algorithm into a nice O(1). I'm very interested. What's your phone number?
The article made it sound like the optimizations he was doing at the erlang level were somehow "better" than optimizations done in a language like C++ because he could just try out new techniques w/o worrying about correctness. His array bounds and types would be checked and all would be good.
BS.
First of all, erlang won't catch logical or algorithm errors, which are quite common when you're optimizing.
Second, you can optimize just fine in C++ the same way just as easily, IF YOU ARE A C++ programmer. You just try out some new techniques the same way you always do. So array bounds aren't checked. You get used to it and you just stop making that kind of mistake or else you get good at debugging it. Hey at least you have static type checking.
In fact you might be able to do a better job of optimization because you'll be able to see, right in front of you, low level opportunities for optimization and high level ones also. C++ programmers aren't automatically stupid and blinded by some 1:1 source line to assembly line ratio requirement.
Wrong. Dead wrong.
You don't micro-optimise unless the compiler doesn't do the job well enough. But nowadays, you almost never have to. Your superior brainpower can mostly be freed from the mundane details of your hardware and instead you can concentrate on using more suitable algorithms or data structures.
Indeed, the best thing you can do to get your code running fast is to write it with good abstractions. That way, when you find a performance problem, you can swap some old code out and swap some new code in and everything else will still work.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Code runs faster if you take out all possible white space between the instructions.
If you are testing against a non-volatile value then there will be no side effects of the end of an if skipping following if statements that can't possibly be true. A decent optimising compiler will just jump to the end of the set of if statements. However, as you say, it probably wouldn't turn the whole lot into a range check and a calculation.
10 hours spent optimizing code by the programmer vs 100,000 people spending an extra 30 seconds (if you're lucky) twenty times per day waiting for a computer to do something.
It all depends whether you want quantity or quality, doesn't it.
Government of the people, by corporate executives, for corporate profits.
umm O(3) = O(1) lameness lame
I wonder how a program written in C or asm with the same external behavior will compare with the one written in asm. I am surprised that the author never brought that up. Performance is not about how much time it takes. Its about how much time it takes multiplied by how many times it is called.
Speed doesnt get you nowhere if you are going in the right direction.
None of the readers seem to anything about it...
A few points:
* interpreted languages nowadays tend to compile to native code
* the compiler or run time is better at optimising than you are
* performance is not the same as scalability, a scalable architecture is more importantant than Mips (Mips are cheap commodities which you buy in inexpensive boxes from HP or Dell - if your architecture supports it)
* most real life systems have latencies (io, kernel activities, screen painting) which are only tangentially related to the grunt of your piece of code (see architecture)
* Erlang is uber scalable, designed to build reliable high performance systems. It provides its own (very lightweight) concurrency and means the developer doesn't use expensive OS process and thread spawning and doesn't have to worry about managing threads and synchronisation.
Check out the Yaws (yet another web server) versus Apache benchmarks for throughput and concurrency.
Erlang systems also stay up. The main BT telephony system only had 31ms downtime in its first year (see here (page 30).
One of the most dangerous things (optimization-wise) in C++, I've found is the temporary-creation problem. You have to be insanely careful to avoid creating temporaries to get any sort of reasonable performance... (or maybe I just need a better compiler than GNU GCC?)
Not quite sure why you would consider them dangerous, but they are Turing Complete (i.e. they are a compile-time language all of their own). Which some people have used to create this. It looks almost as fast as Fortran, but the syntax is a lot more complex than just A*B for a matrix-multiplication.
HAND.
Not entirely true that you can't throw hardware at a problem, but pretty close.
What gets me are all the people who want me to tune systems because an application is slow. Thing is, unless there's something specifically wrong with a hardware or software configuration, performance tuning at the hardware, OS, network and rdbms layers will only make a relatively marginal difference to the performance of the application.
Performance tuning or optimisation at the application code level on the other hand can make several orders of magnitude difference to the performance of an app.
Government of the people, by corporate executives, for corporate profits.
Of course, I meant O(n^3) and O(n).
blah
It may not be now as the hardware more than meets the demands of software. What if the demands of software one day catch up with the hardware again? Hopefully programmers will still know how to optimize software when that happens. I think that the matter is definitely perennial. Used to be the hardware could not meet the needs of the software, now the software cannot fully utilize the hardware. It is only a matter of time before that changes again.
This post was written with O(1) complexity.
Today they're widely accepted. Most of the responses on
And that's probably why my 2GHz cpu doesn't really feel much snappier than the first computer I ever had, a 10MHz 286 box.
Example 1: in C, if you use "int" for a variable "x" that should have a type of "unsigned", "x/4" will not just be a simple shift, instead three or four instructions are involved. Indeed, it would be very hard for the compiler to infer that "x" is always non-negative and optimize for you, except in the simplest cases.
Example 2: in floating-point math, "divide by 10" is not exactly the same as "multiply by 0.1", thus many compilers (gcc 3.4 without "-ffast-math", icc8 by default, and probably the Java VM) won't optimize the former into the latter, even in the many cases where it won't matter. This results in code that is 10-40 times slower on the P4.
Example 3: in Haskell, since lazy evaluation has much more overhead than eager evaluation, compilers always try to optimize the former into the latter. However, in many cases it is impossible for the compiler to do that, since it can't decide if using eager evaluation will prevent the evaluation from terminating.
In short, it is good to rely on the compiler to do the optimization (such as register allocation) that is known to be done well, but what the compiler can do is very limited, since (1) it can't know your intent if you had not expressed it, so (for example) it has to make sure that every floating-point operation conforms to very stringent error bounds, often at the cost of significant speed, even if you don't really care about that; and (2) some code-optimization problems take extortionate time to solve, or might even be theoretically infeasible in general. Therefore, when writing code that is going to take some significant CPU-time, it is good to have some good habits that helps the compiler, as long as the code isn't uglified too much.
To summarize the essay: A suboptimal implementation of an optimal algorithm usually beats an optimal implementation of a suboptimal algorithm.
Also, the author is clearly enjoying the fruits of those "optimization aware" programmers that created Erlang, especially those "cycle-counters" who wrote the virtual machine.
Compiler optimazation....
would be great, but.. I never seen a compilter that actualy does some real magic, okey they do optimize to some point, but thats _nothing_ compared to your broken algorithm. When searching for data maybe you put all in a list, and then search the list from start to end. If you instead put the data into a binary tree, it would be ALOT faster to search when the data grows. This is also a kind of optimization, that it will result in a REAL speadup.. even if the compiler managed to optimize 5% on the list search.
Most of the time the compilers are happy just _if_ the code compiles!.. Look at GCC, it likes to allocate more memory for values then actualy needed? Why? Nobody knows, adding -O6 minimizes the overhead, but not totaly.. Why is this? Cause it was easyer to code and get right this way, is it the optimal solution? No ofcourse it isnt. You can hand opimize everything, but that will take you a lifetime for a bigger project.. Get used to that not everything is the fastes as it can be, its better that it does what it should.. And then you optimize away the BIG things, and thats most likey badly algorithms. When you fixed those, then you can start looking at the 95% loop (where you program probely spend the most time)..
Compiler optmization is overrated, and I dont beleave in it.. sorry.. compilers are as other programs and suffers from the same things as any software.
Excellent original poetry. Slightly insulting, but creative.
The thing is, the author claims it's fast enough, but I don't buy it. In any commercial setting, many customers will have machines with 1 Ghz and slower machines. Couple that with slower hard drives, larger images and other stuff running on the machine, and there's nothing left of your nice framerate.
This sig under construction. Please check back later.
There are several external influences that decides if a program is "optimized enough". These influences are often local to spcific users, and thus cannot be tested easily by benchmarking.
:)
Now, I'm all for the optimizations based on profiling and don't optimize utill the need is there. I'm just trying to say that it's pretty complicated trying to figure out when the need is there in a more general setting.
Repetition: (probably the most common problem) It's ok if a task takes long if it's not done too often. If you have to repeat the task many times, performance of that task becomes as many times more critical as you repeat it.
Multitasking/Parralelism: Other tasks would like to get to the processor too. If your program is just fast enough to do it's job, it's too slow to do it while notepad is running along side it. Now, I run colinux alongside on my windows desktop, so performance of "other" programs does matter. Also, execessive memory usage slows down all the other programs on my machine down too (hi there mozilla
Power: My laptop can scale the cpu-frequency and use that to save power, so i can sit out in the sun programming longer. If a program spends too many cycles on a job, i'm gonna have to go inside sooner. and this programs BEGS for optimizations from my point of view.
Many more of these side-effects of optimization occur, so don't think you can easily (or exactly) evalute when certain code is efficient enough for others, only for yourself.
This also shows a fundamental advantage in the OpenSource model. I can run a profiler myself and try to perform some optimization, instead of selecting not to use the program (or use another).
SLOGEN [ http://ungdomshus.nu : Sebastian cover music]
When I first started I wrote a "Compiler" for the 8K PET BASIC - that removed all unneeded whitespace, automatically put lots of statements on 1 line, used short var names, etc. It did speed up a program a little, saved some RAM, and also protected the code a bit, since the result was unreadable.. Needless to say this is a final "before publication" step..
"You lied to me! There is a Swansea!"
A lot of this discussion here is either crap, a rehash or was covered in Engineering 101.
Basically, you have some requirements for the product, and you optimise according to those requirements. Performance is just one variable (time to market, scalability, reliability, security, usability, cost, etc - are the many others).
The requirements for a product in a fast moving market entry company are less about performance and more about rollout ASAP.
The requirements for the same product two years later may be to improve performance to achieve scalability requirements.
If you're writing some sort of overnight build or batch application: whether it takes an extra hour or not may not matter, because it has a 12 hour window to run in.
If you're writing an order processing system, then performance and end-to-end turn around to will be vitaly important, but you won't focus on the assembly, you'll focus on the algorithms, design and architecture.
If you're writing a compression or encryption module: you probably will work on the assembly.
All of the above cases, before you optimise anything: you profile and understand how the optimisation is going to pay back in real terms.
In my experience, you cannot prescribe any of this: you need to take it on case by case basis because every product and circumstance is different.
Sorry, but I missed that one, nothing in the article really surprised me.
Was it the fact that he was using a pcode-interpreted language? I've already been surprised so many times at how fast Perl is, that this does not surprise me any more. I've parallel-written a few programs in Perl, C, and C++, and Perl won every time.
Most compilers will actually reduce sets of if statements to a hardcoded jump table, which is going to be as fast as rewriting it. See http://www.codeguru.com/Cpp/Cpp/cpp_mfc/comments.p hp/c4007/?thread=48816 for more information. However, the clarity of the single test is indisputable....
Elliott C. Back
Here is the way I program a project:
Start with an overview of the whole project. What must be done in each step, what is the structure.
Then I choose the algorithm for everything. It is very important to choose the right algorithm at the beginning, since it is cumbersome to go back and change a little simple algorithm, with a more complex with different datastructure, than to just do the right thing from the beginning.
Simple datastructures should be used, where it is easy to implement a better, and there isn't a performance hit.
Then I code as much as I can, with focus on stability. I don't care for performance at all, since my goal is to implement the right algorithms.
When I am done with this, I run a profiler, and find all bottlenecks.
Here comes the most important point!
Whenenever I find a bottleneck, I check if it is in my code the problem is, or if it is in one of the dependencies or in the compiler.
It is usually pretty easy to see, where the problem lies. 'Can this structure be easy recognised by a program' - then it is the compiler.
'Can a program in no possible way, guess this optimisation', then it is the program.
If the bottleneck is in some systemcall, i go through the code (mailinglist, mailinglist, mailinglist) for that call.
The point is I don't do microoptimisation unless there is no other way to do it. The main benefits is that I do far less laobour in the long run - and i get a better understanding of other programs.
It usually takes a lot longer to fix performance bottlenecks elsewhere in the system, but the benefits in the long run is well worth it.
Screw performance -- that can be solved by hardware.
No amount of hardware spending will make your program easy to use, easy to understand, and easy to adapt for new purposes.
Program as if the user mattered. They ain't getting twice as smart every 18 months, but they sure are spending money on software.
It is a common misconception that to acheive better performance, one needs to drill down to the assembly code. It has been shown that using VM languages like Java / C# people were able to get performances close to the compiled to native code languages. The trick is to know how the language behaves. A Simple thing like notifying the GC to delete an object can bring a good performance boost. I have also known applications that were written in C++ but with bad performance. So I think to be able to get a good performance, knowledge of the language you are writing is very essential. I am not getting into the with such faster CPUs why do we have to worry about execution speeds argument
You totally missed the point, didn't you? There are situations where a bubble sort is faster than a merge sort or a quicksort. It has almost no setup overhead, so if you're sorting sufficiently small arrays (and what I remember from CS101 is that "sufficiently small" goes up to about 1000 members) bubble sort is actually significantly faster.
So, as a matter of fact, if you had to sort a million small arrays, bubble sort would be the only feasible option.
All's true that is mistrusted
What are you talking about? I get paid to write open-source software. Where did you get the idea that open-source software is written entirely by volunteers?
All's true that is mistrusted
The biggest problem I see with performance is lack of visiblity of performance factors. At the hardware level there is cache (which is supposed to be transparent) and really deep pipelined processors. This can have a major effect on what would otherwise be an optimal algorithm. And the hardware keeps changing, so what may have been optimal at one point will become suboptimal later on.
In software, the biggest problem is lack of performance directives. POSIX pthreads is one of the biggest offenders here. Best performance practices in pthreads are based on how common implementations work. POSIX allows implementations that would cause major performance problems for so called best pthread programming practices. Example, POSIX allows pthread_cond_signal implementations to wake all waiting threads, not just one. There are programs that depend on pthread_cond_signal to wake only one thread for performance in order to avoid "thundering herd" problems. So while standards allow portability of correct programs, whey do not necessarily allow portability of performance.
We need explicit performance directives.
If the hardware really has improved so much, shouldn't Windows now boot in like .001 seconds?
By the perception of illusion, we experience reality
Given that the only reason to deliberately make it hard for others to understand your work is to increase your job security, that must mean that you don't think you bring enough other skills to the job to keep it on merit.
In other words, you don't think you're good enough.
And given that most programmers think they are better than they actually are, if you don't think your good enough, why the hell should anyone else?
A program that produces incorrect results twice as fast is infinitely slower.
-- John Osterhout
The cheapest, fastest, and most reliable components are those that aren't there.
-- Gordon Bell
"When in doubt, use brute force." Ken Thompson
Well, maybe you're not ignoring it since you said "if it needs to scale properly". But that's a very crucial "if", and the "scale properly" only refers to certain situations.
If the array you need to sort might have several million members and you won't be sorting more than a few dozen of those arrays, yes you should use an O(n lg n) or whatever sort routine. OTOH, if the array itself is smaller (a few hundred members) but you have to sort several hundred thousand of them, quicksort or merge sort will be remarkably slow compared to the much-maligned bubble sort.
Big-O notation is an asymptotically-tight bound, not the function itself. For small datasets, not only is there no guarantee that the lower big-O algorithm will be faster, it's in fact usually the case that the allegedly "less efficient" algorithm will actually be faster.
All's true that is mistrusted
A compiler has yet to figure out how to change a slow algorithm for a faster one, or a greedy algorithm for a conservative one etc.
For instance, (iterative) bubblesort is relatively conservative in terms of memory usage, as opposed to recursive quicksort; however, bubblesort is usually slower than quicksort. The compiler does not yet know how to recognise bubblesort and replace it with quicksort where appropriate.
Another, Fibbonacci numbers can be calculated recursively. (SLOW) Or, it can be calculated iteratively. (slow) Or, it can be calculated in an effectively O(1) algorithm. (phi = (1+sqrt(5))/2; f[n] = (((phi^n) - ((-phi)^(-n)))/sqrt(5))
But I doubt the compiler can do that kind of optimisation.
Remember, the compilers have yet to figure out intention. Without knowing intention, it cannot optimise very well. (Which is why higher level languages tend to be more optimisable -- the higher up you go, the more intention is encoded in to the code itself)
Seriously. Let's all code in Python, Java or Erlang. Then render stuff like the special effects in LOTR or composite scenes with 4k media with motion blur and a few keyers thrown into the mix. In realtime. Yeah, right...
Performance DOES matter in a LOT of places.
People keep saying that the JIT-style optimizers in .NET and Java can radically optimize the application "for programmers who can't or won't."
Peephole optimization and clock-scheduling are among the simplest of optimization. The machine looks at a few low-level instructions and might suggest an alternative which would operate identically but with better performance. That's really all that the VM has time or capability to perform today.
Mid-range optimizations include vectorizing, unrolling of loops, and register reduction. These are still machine-analyzable, so I expect the JIT-style optimizers to continue to make strides here.
But I don't think you're ever going to see JIT-style optimizers which replace an O(n^2) algorithm with an O(log n) algorithm. That is real optimization. That's where you win the performance races. That's the one that programmers should care about, and should learn how to do. The level of analysis required to "divine" the whole meaning of a large routine, realize the alternative algorithm equivalent, and fix up the code is far beyond any JIT solution.
I think we will have to wait far longer than the 6 GHz Longhorn machines before you see any meaningful machine optimization of sloppy code.
[
Actually, wouldn't the clear and simple way to code this be to use a single "switch" statement and not twenty "if" statements ?
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
The title of this essay is based on the title of an out-of-print book by Nathaniel S. Borenstein, Programming as if People Mattered: Friendly Programs, Software Engineering, and Other Noble Delusions
/.ers .
Which is actually a play on E.F. Schumacher's seminal work. Small is Beautiful: Economics as if people mattered Which probably should be required reading for all
Joe User here, who lurks on devel to try the knowledge osmosis absorption technique..
Speaking as an antique box driver,bleeding edge of 5-6 year old technology when I'm lucky, I REALLY appreciate apps that can run well* on older machines. To ME, always thinking of the end users of the code is vital in development. I wish it was the complete industry standard, even if it meant some more days work before release.
*define "well" OK, to me = lowest ram usage,(most machines shipped in the past buncha years never shipped with all the ram slots filled, and they STAY that way) lowest CPU usage (how come older machines did a lot of the same stuff you want to do now, but now you need CPUs orders of magnitude faster to do it? Huh?), stability (no leaks/conflicts), security (no idea what this thoroughly e-vile "buffer" is, but that thing always seem to be overflowing whenever you hear of the latest security exploit. Wazzup with that? Someone needs to invent the dang buffer valve and turn that thing off once your sink is filled with enough buffer whatsis stuff so it don't overflow. I am assuming that is what all these emergency "patches" do whenever there's a new exploit. Uh, patch em in advance please)
signed Joe User
Carefully examining a problem will lead to efficient ways to solve it. This even works with higher level languages and those languages provide important benefits that make them the natural choice for solving most problems.
My favorite example of this is a program that I wrote for a VIC20. (6502 and 4k of memory) The problem solved a statistical problem known as m choose n. This involved large factorals. Naturally, 100! can't be solved directly on such a machine so I found a way around it. Since this solved a problem we were having with a piece of equipment, I sent the program and the results to head office. They sent it to the supplier of the equipment. The supplier looked at the equation but not the program. They plugged the equation into their mainframe which promptly crashed on overflow.
The article teaches a lesson that many programmers never understand.
"a sad inditement "
their spellchecker is much more comprehensive with the later releases than ms word 4...
There are things to be said about writing good code versus well-optimized code.
:)
Many of us today need to support more than one platform (and more than one type of CPU), which means for example, different memory architectures, different instruction sets, and different CPU capabilities.
It is possible to hand tune source code such that it will be the most optimal for one platform. However, the resulting source code would most likely not be optimal for other platforms. Think of a loop tiling optimization - one that divides the iteration-space of a multi-level nest into smaller "tiles". Choosing an optimal tile sizes and tiling "depth" depends on the memory architecture aimed for. Some platforms provide O/S means to query the cache sizes, etc., and the decision can be delayed to run-time, and thus be made more accurate, but some changes can not be delayed, such as the tiling depth (i.e. the depth of the tiling hierarchy), which means that the could would need to be versioned for different target memory architectures. A compiler optimizer can do that easily, since it is the same code, only in a "slightly" different forms. Versioning the code might then expose some additional optimization opportunities that are specific for each version, etc. The user, on the other hand, would have to maintain multiple copies, or apply an extremely creative use of macros, to achieve the same goal, which would only result in an unmaintainable (well, almost...) code.
There are many more examples of target dependent tuning and optimization that may change from target to target (think of unrolling, outer loop unrolling, register allocation - as someone mentioned in a previous post, instruction scheduling, etc.).
My point is, the platform specific compilers should take have responsibility of tuning for the specific platform.
An optimizer should not replace a bad algorithm for a good one. That should be the programmer's responsibility. The programmer's code should be clean, efficient (at his abstraction level) and easy to maintain. Let the compiler do the job of making that code run best on its targeted platform. Use higher optimization levels if necessary, giving the optimizer more time for analysis and transformations. The end result may be well worth it
With all this in mind, compilers have their limits, and there are differences between optimizers (and optimization levels) on different target platforms. That really depends on the application and the type of optimizations required to make it run optimally.
Back in the late '80s, early '90s, worked on (DOS-based) commercial products written in APL. APL is an interpreted language, written in Greek & math symbols, some overstrikes.
Each byte of that 640K was precious, so it was common practice to "de-comment" the code before release; remove all comments, reduce whitespace, move multiple statements onto a single line, possibly shorten variable names. You could gain a substantial (for the time) amount of memory that way. You also dynamically imported/destroyed functions.
I regularly debugged client systems with "de-commented" APL; if you could read that, you could read anything!
Absolutely. Serious performance tuning is a difficult task, and it's foolish to attempt it without good information to work from, hence the usual recommendations to use profilers, disassemblers, etc. That doesn't mean a competent developer shouldn't be aware of the issues, and act on them if appropriate, though.
Actually, I think the article shot itself in the foot slightly, by undermining its own argument with its own information. Sure, the performance of its example application improved by around an order of magnitude using newer hardware. Sure, it was written in a language not designed for high performance work. But the example was a trivial manipulation of around 1GB of data, and the program was getting that done a whole 60 or so times in a second on a 3GHz beast? That's pathetic performance, and seems a pretty damn good argument that either low level hackery in something like C or assembler, or choosing a high level language with good performance like OCaml, are still the best ways to go if performance really matters.
The article also suggests that if you can do those 60 manipulations a second, you're up to modern video game standards. Somebody should mention to the author what's actually involved in rendering a state-of-the-art FPS, because it's a bit more than a trivial display of a graphic file every 1/60 of a second! :-) (Yeah, OK, everybody's got fancy accelerated graphics cards today, but still...)
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Hm.
The next time you're pulling your hair out because your new videogame takes 20 minutes to load every level, ask yourself again if 0.176 seconds per texture "seems quick enough" (and double-check to make sure your game isn't shipping with an Erlang interpreter...)
Or hell, just buy a shiny new 3Ghz system -- how dare you make the programmer's lives more difficult by expecting their code to run decently on anything less?
The following sentence is true. The preceding sentence was false.
I have two widgets on the screen that give two different "views" of the result of a mathematical calculation on a model. Each widget runs the same mathematical calculation on the model.
Refactoring by moving the mathematical calculation into the model would require adding a large amount of storage to hold the calculation result. Benchmarks indicate the saving in time for a screen refresh would be from 2 seconds down to 1.5 seconds (P-III 1.2 GHz). The screen refresh is broken down into steps so the UI is not frozen during the 2 seconds, allowing the user to interrup the refresh to do anything else.
Do I bother with this optimization.
Real opimizations come before you right your program. Take for example that loop that you removed an instruction or two from. Say it is a searching an array. and looks like:
There are two things wrong. One you cal strlen repetitively. Strlen() is theta(n) So you have a loop that executes n times at a cost of n . n*n=n^2. That's one of the slowest algorithms around. Maybe your compiler is smart enough to see that x is not being modified and will to a s=strlen(x); then compare against X for you, but probably not.
The other thing is when searching an array, try to give it structure. If your array contains sorted characters, then you can find it in log _2 (n). Of course, of you sort by frequency (most commonly accessed at the top) then your n^2 loop *might* do better.
The article is right: constant-time operations (type checking, etc) are asymtotically infitessimal in algorithms. The article's real problem is that it is n, but on a 2d image (x*y)=n you can't do any better. Note that it is not n^2, (though it makes a square picture) because you're operating on pixels. So that will be your unit of measure - NOT time.
Which is my 2nd point. Don't measure in time or instructions. Measure in OPERATIONS. Operations are not instructions or lines of code. An Operation is everytime you have to look at a unit. It is a logical unit of cost. Hardware can be changed out. We know that hardware (performance) doubles every 18 months. The constant-time instructions will get smaller. (Also clocks per cycle are irrelevant as well). But your loops will remain the biggest part of your program.
With that, here's a crash course in CS:
To learn more, take a datastructures class at your local university. Please review Calculus II before doing that though.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
That will never be completely true, for the simple reason that the programmer will usually know more about the problem than will be expressed completely in the clear and simple code.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
If for example you are developing scientific/engineering/ professional audio/visual creation industry apps,etc, or games, ya, your end users are probably always getting newer faster machines. But the hardware side of the industry is a real severe slowdown on new machines, adoption peaked a few years ago from what I have read (and noticed in meatworld). What I am saying is, for developers and their buddies and their peers at work, high end machines are the norm, that's what they are used to and have *forgotten* that they are the exception, not the rule. For 90% of the rest of the world, upgrading hardware every 6 months to a year doesn't happen, and the trend for people at home and a ton of businesses is to hang on to hardware for a longer time now,years longer in a lot of cases, upgrades are being postponed a lot.
In the US, last I looked, media family income is around 45 grand or a scosh more. That's FAMILY income, in a lot of cases that's 2 paychecks combined. People who are making a lot more than that with one paycheck tend to forget they aren't the norm either,and usually they hang out with other people closer to their economic level, and their consumer purchasing choices tend to be vastly different from the rest of the people. A side note, but that is also something the music/video industry forgets, wonders why people aren't as fast to snap up expensive CDs as they used to. Same with a lot of other industries, you can't look at it with blinders, people having a harder tiome now paying necessary bills, and that goes up and down and sideways throughout all industries, and why we have interest rates so low, because people have maxed out credit already, why bankruptices at all time highs, mortgage defaults all time highs, etc. To developers ( I mean the sales aspect of it software obviously), who are quite specialised into a niche way of thinking and doing business, maybe they are not noticing these things, might be a reason why it's getting harder to sell stuff. Maybe, but I bet it's part of it. it's all connected, chaos theory and whatnot.
Mindshare is both an immediate and long term process. Longer -term thinking will get you more loyal customers *for a longer time* for your apps as people are happy to use ones that still run well on what hardware they have. They'll remember that your product, written for THEM and their hardware worked well. Cost is BOTH, if a new program actually requires an additional one thousand dollars or more in new hardware just to run, you might actually LOSE mindshare as people think about it and go "no thanks", even if the program itself is just totally spiffy.
Just wondering, but I wonder what a median set of hardware specs is now for the millions and millions of machines out there still being used. CPU in the medium pentium II class, and 64 megs RAM? That would be my guess. Isn't win98 still the dominant OS that shows up in most website logs? That would be a clue as to what machines for specs are the bulk of this "the market" thing, VERY broadly speaking taking my original point into consideration on types of apps.
The performance question often depends on what you're actually doing. I think it's safe to say that some specific applications (like games) will always need performance. I guarantee you a modern 3D game in Erland would never be able to compete with one written in C++. Sure, in absolute terms it might be able to do a decent job, but games are always judged relative to other games, and if even one person can create really fast code by using C++, everyone else is obliged to do the same to keep up.
Also, I've always been a little distressed at the trend towards bloated programs. Maybe that's a holdover from my days of DOS game programming where you really needed to consider every clock cycle, but I think it still matters in a modern context. Sure, maybe that TGA loader can load 60 images a second. But was the author of that article listening to music, browsing Slashdot, downloading files, IMing with friends, protecting his computer with a firewall and virus scanner, etc. etc. AT THE SAME TIME? Probably not. I've read articles similar to that one, and all of them seem to ignore the fact that, as a programmer, you don't have the full 3gHz of the machine at your disposal. Even with games and other such things that fully grab the user's attention, I wouldn't rely on having more than 75% of the actual CPU available. And for the kinds of applications we run on a daily basis (web browsers), I'd plan for much lower than that. So increasing resources doesn't translate into an automatic license to bloat up code to horrific levels as so many people appear to be fond of doing.
A decade ago, when I was graduating, the CS courses I took were all of the same mind: don't optimize, don't bother writing hairy processes in assembly to speed things along - the computer manufacturers will just make faster machines with larger drives and more memory...
Ten years ago. I'm guessing things aren't much different anymore...they've just added a new level for client/server, that they'll eventually get faster and faster networking.
Software is like a goldfish - it'll grow to fit the size of it's bowl...
This article doesn't say anything useful.
It says: I wrote a (not typically used benchmark) in erlang and ran it on a slow computer, then I ran it on a fast computer. The benchmark ran faster on a fast computer! Amazing!
It says: Virtual machine languages with extra features like erlang has can perform well! Despite the fact that he didn't compare the performance to any other language.
Somebody needs to teach this guy about the scientific method.
If you are working on OSS, quality of the result is more important than the readability of the code. That's because if you application sucks, nobody will want to improve it. They'll just start from scratch, saying "what moron wrote this? I bet there isn't a single line of good code in there."
especially in databases where the data set that you have to sort is often so big that it doesn't fit into memory. The (usual) solution is to use a variation on the well-known Merge Sort algorithm, where blocks are merged into larger and larger "runs" of sorted data (which are then merged). (The number of runs of course depends on how much data there is and how much memory you have).
HAND.
I guess no one else picked up on it, but 66 small targa files decoded per second is dead slow. His only possible hope for vindication is if random access to his hard drive took up most of the time. The author seems a prime example of this.
I don't know Erlang, but if it is a pure functional language, the compiler/interpreter can use "special" optimizations, e.g.
will not produce intermediate lists, instead the compiler will use lazy evaluation when decoding the data.My point is that many optimizations do not sacrifice readability. Many times it is possible to refactor slow code that improves both readability and execution speed, but you must know the pros and cons of the tools you are using!
Funny how the opposite is usually the case. Best example in my experience: FOSS libraries like oh, I don't know, GTK+ 1.2. The story generally goes like this: Start working with the libraries. Read the documentation. 10 minutes later, realize that the documentation is written for an old crufty 2-year-old version of the library that doesn't really work the same way as the current version, and that it wasn't even halfway complete for that version, either. Get annoyed, but start reading the source code. Realize that the people responsible for this project are huge fans of 'clever hacks' and have religious problems with commenting. Give up, go back to VisualStudio.NET.
If you read this article it advocates not worrying about writing efficient code because advancements in hardware will cover for your incompetence. This is happening every day! Every year code gets more and more bloated and programmers become more and more sloppy. Who cares we will have 10GHz CPUs soon right? This makes me crazy! If humanities software developing skills progressed at the same speed as hardware we would have sentient thinking computers right now.
I can't wait until Moores Law breaks and we reach a hard barrier on computing speed. Right now we have far faster computers than anyone actually needs. I think with proper coding we could be running all our current applications without performance degradation on 386 class systems. If the hardware crutch is removed maybe people will start focusing on writing quality optimized code.
Nick Powers
Encryption: I may not agree with what you say, but I will defend your right to encrypt it...
The images he's decoding are repetitive.
He's using Erlang's built-in hash table functionality to cache the decoding of particular runs of data. His implementation will almost certainly NOT be fast enough for all images (e.g. not for photographs).
Plus, you get the sense of how much work it was to identify and implement these so called "high level" optimizations. Surely it would have been easier just to use C (or Java, for that matter, since it has more straightforward data representation and JIT compilation) and the resulting code would have been simpler and much more useful.
"Let only the best programmers reproduce". Hmm. The best programmers are people like Richard Stallman. Well, there goes that plan for increasing overall programming quality.
XML is like violence. If it doesn't solve the problem, use more.
Were they C++ programs, or 20 year old shell scripts?
We were introduced to vtune during a 2-week trip to Intel. Profilers are good. vtune is the best one that I've found.
The way that we use it is to not even touch it until we have the feature completely working in the simplest form possible. Then we do some performance testing. If everything works well under load, we don't even bother profiling it. Otherwise run it in vtune and see what the bottleneck is. 90% of the time, there is some type of minor oversight. Occasionaly, there is an algorithmic change that needs to take place, like adding a secondary index to something, or making some temporaries thread-local.
We run both event-counters and call-tracing, but I've found that call-tracing is far more accurate. The best use of VTune is to smite arrogant developers. The result of our trip to Intel was that one of our developers, who had to write everything from scratch, was shown that all of his "high performance components" were completely worthless.
Just my $0.02.
Matt
Funny, but that's not entirely unlike a non-deterministic algorithm. Just get rid of the loop. When quantum-computing becomes commonplace, this may become the way sorting is actually done.
Despite the fact that nobody (other than him) seems to be interested in targa graphics these days, and that the total amount of data involved (less than a megabyte) is miniscule compared to limiting bandwidths in modern computers (ranging from ~100 MB/sec for the disk I/O subsystem, ~500 MB/sec for the main memory, and in the multiple GB/sec for the caches and internal registers).
Then he shows us just what a wonderful benchmark this program is: it is so wonderful that it runs nearly instantaneously! Maybe it's just me, but I like my benchmarks to take a little while to complete, either becuase I don't trust the sub-second accuracy of the system time routines, or because I like to get a reasonable sample of system states contributing the overall performance of the benchmark.
Next, the guy tells us how he used unsatisfactory tools to implement an ill-conceived algorithm and, glory-glory, later fixed his own dumb-ass mistakes!
Finally, he claims that he would not have been able to make the same kinds of algorithmic adjustments if he had implemented in C rather than Erlang, though it is not obvious why it would have been more difficult (and he doesn't give any arguemnt to support his assertion).
From this exercise he concludes that, GASP, optimization is still important!
What a senseless waste of skin.
This is the main reason compiling to intermediate "virtual machines" (JVM, Parrot etc) is a bad strategy.
A better strategy has been around since the days of the first HP-UX systems (under the HP BASIC incremental compiler) is to dynamically compile and keep track of source code dependencies so as be able to void encached compiled machine code and recompile as necessary. This allows you to trade compile time against run time. Once you can trade those two off against each other you can run background recompilation tasks that take into account constraints of increasingly-global scope imposed by the source code. If you have a set of source that sits around idle for a long time you can end up with highly optimized machine code -- but only if your source starts out at a high enough level to retain the intent rather specifying algorithms. Human intervention to assist the compiler should be in terms of pragmas stated simply as theorems the human believes to be provable from the source code.
This all looks doable within predicate calculus as a formalism (with appropriate synatctic sugars as needed).
Seastead this.
When I wrote code for the 8086 (yes, the 4.77Mhz CPU), it made sense to drop down into assembly on a regular basis so I could extract every cycle of performance out of a bit of code. This was roughly a .33 MIPS chip with 29,000 transistors. Shaving 10 cycles off a loop that executed 100,000 times saved 1,000,000 cycles -- close to a quarter of a second.
.00025 seconds.
Today's desktop CPU's are approaching 10,000 MIPS, running at 4Ghz. In simplistic terms, this is about 30,000 times the performance of an 8086. If I did my math right, saving those 1 million cycles now improves performance of that loop by
There are still applications which are compute-bound. But for typical applications, the type of performance optimizations that used to make sense are a waste of time today.
Windows.Forms in .net is simply a wrapper for win32... and win32 is as much or more of a hack job than gtk+.
If you're looking for beautiful code, check out QT.. truly a beautiful C++ implementation...
Pan
I said no... but I missed and it came out yes.
Not only that, the processors that you thought you would be targetting are already a generation behind and that algorithm that was going to hold back your competition runs perfectly fast on new processors.
I agree that alogorithms are the primary place that optimizations should occur at, that premature optimization is wasteful, and that for most applications the minor gains are not worth it. That said, your comment above is ill-informed. Optimizations done for one architecture are not necessarily counterproductive on the next. For example I performed some Pentium UV pipe and Pentium FP stack optimizations on some core 3D math code. That code is still faster that compiler optimized code targetting the Pentium 4, as it was for Pentium III, Pentium II, and Pentium Pro. The performance improvement is now around 7%, I would not bother writing the assembly language code today, but four architecture generations later it is still a win. YMMV.
In addition, "new processors" are often irrelevant. Few people have them and the older systems are the ones that need the performance tuning to reach acceptable levels of performance. There is value in enabling your product to reach back one more processor generation.
Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.
Again, ill-informed, unless you believe in releasing 1.0's as-is and that 1.0's will be immediately followed with 1.01's and 1.02's that were what 1.0 should have been. I understand that patches happen and that no matter how good the QA on a product customers will probably find new and creative ways to use a tool and probably find a flaw. I just object to planning on having 1.0x's to finish a product. You statement seems to imply the later. The proper time to perform optimization is after a piece of code has been identified as a bottleneck, after the algorithms have been thoroughly reviewed, and probably after the rest of the product is feature complete.
There are *three* kinds of people in a dev shop:
- Programmers
- Transcribers
- Optimizers
Programmers come up with algorithms.
Transcribers turn algorithms into code.
Optimizers make a given algorithm perform its best.
The author of the article is suggesting that we eliminate optimzers, since he claims they are the least important.
Some people are good programmers. Many of these are good transcribers. Some of these are good optimizers. The people that are all three, and who know when to wear which hat, are what I like to have working for me.
A lousy programmer tasked with a programming job will come up with a lousy algorithm. Often, he'll be able to transcribe is successfully and optimze the hell out of it. This type of program will almost always be slow.
Here's a stunning example. My highschool had a bunch of Turbo 8086 machines (8Mhz, whoo hoo!). My highschool CS teacher wanted to teach us about "optimization", so he challenged us to write a program to factor large numbers. He "optimized" his program by re-writing it in assembly, and showed us how it was faster than a our BASIC programs, even when the assembled version was running on the old IBM box (4.77 MHz).
I "optimized" my program by using a better algorithm. (Only checking up to the square root. The stupid fool was checking every freakin' number!). His program was indeed very fast for small N, but mine *smoked* his for large values of N.
He gave me an F because I "cheated" by not checking every number.
I played Budokan for the rest of the year.
Do daemons dream of electric sleep()?
I already have a TGA reader that doesn't do quite the same thing as the essay's reader, but probably comparable. It reads the various headers, makes sure the tga file is the type I like (32 bit the last byte being alpha), reads it in, steps through all the bytes and swaps the red and blue bytes (tga is bgra, I want rgba, if I remember correctly). The code uses C++, does no optimization other than what .net 2003 pro does in standard "release" build, and was written by me in an afternoon a few years back. It has performed flawlessly since written.
Profiling under WinXP on an Athlon, reading in a complex image (not that it matters) the same size as that in the essay (576x576) although with an extra byte of alpha, I get:
1,226 microseconds
-or-
0.001226 seconds
-or-
816 images/second
So basically 12x speed improvement on a processor roughly 2/3rd the speed. Remember, I did no optimization iterations. Wrote it, ran it, debugged it, put it in CVS for everyone to use the past 5+ years. Also remember that I'm not doing the same thing, but it seems to me that RLE would be faster than what I'm doing, since I need to read and write every byte of the image.
Now, having said that, I agree with the essay in principle. Choose the right algorithms to start with, choose different algorithms if needed, and optimize hot spots when profiling shows a problem. I am a gaming/simulation/graphics programmer by trade and I am amused when someone worries about a 100 iteration loop that has some trig functions in it that runs every time the user pops open a certain dialog box. The massive amounts of processing that goes on in graphics/simulation inner loops has given me a profound understanding of the power of a modern machine, such that I don't sweat the small stuff. However, it has also given me a profound appreciation for the concept of O() notation. O(2^n) will cripple any machine for any software. Except, of course, when n is small. With small n, who gives a crap about the inside of the O() - care about the unspoken constant in front of it!
Er, make that 1MB of data. If it was processing 60GB of data per second, that would indeed be very impressive.
You would need a pretty hairy RAID system to keep up with his program's sustained 60MB/s of data in and out anyway. Not too long ago, the average PC's RAM wouldn't keep up with that sort of pace, for Pete's sake. In the real world, this level of performance is way past "good enough," even if the CPU is capable of better.
No one knew what O notation was.
Not long after that I found out about knoppix. I burned a few disks and gave them out. Only one other person knew what linux was. It wasn't my manager.
Just last week one of our servers had a problem with IIS. "What's IIS?", My manager asks.
Here are some other gems
"We can't put indexes on our tables! It will screw up the data!"
"I've seen no evidence that shows set processing is faster than iterative processing" -- this one from our "Guru".
"What is a zip file and what am I supposed to do with it?" -- from one our our senior systems programmers in charge of citrix servers.
"What do you mean by register the dll?" -- from the same sysprog as above
They pushed a patch out for the sasser worm and about 2% of the machines got the BSOD. I finally decided to give the fix a try on my machine and it died too. I booted into safe mode and rolled it back. Everyone else had to get their machines reimaged because desktop support couldn't figure out what was wrong. Lucky for my neightbor I was able to recover most of his data before they wiped it clean. He made the mistake of letting his machine loop through reboots for two days, which hosed his HD up. Of course the PC "experts" couldn't recover any data because the machine wouldn't boot up.
Yes, I am in programmer purgatory. I am reluctant to say hell because I'm sure it can get worse. No, I'm not kidding.
Looks to me like the writer of the article has his notion of 'alpha' backwards. An alpha value of 0 means fully transparent, 255 is opaque. The article has it the other way around.
This essay is a disgracefully misleading piece of legerdemain.
The lesson to take away is that (1) C is almost always a poor choice, if alternatives (such as C++) are organizationally possible; (2) Erlang is not appreciably nicer to code or optimize than C++ (else he would have compared to it instead of C); and (3) the C++ code, optimized in the same way he did his Erlang code, would have been embarrassingly faster (else he would have clocked it).
We are still waiting to see the language that can take the mantle from C++, for industrial uses. Unfortunately, we have become used to reading lies from promoters of slow or otherwise insufficiently capable languages. It will be hard to recognize the language that ought to replace C++, because what is said about it will read a lot like the lies.
Sure, if you're talking about puzzle games.
However, for most retail games pushing the graphics as far as possible is important. If you can squeeze a 5% improvement out of the engine you can use the freed up time to make the game a bit prettier. Or put another way, art expands to fill available processing power. Graphics blocking on the video card? Well, you can use processing power for increasingly realistic physics simulations and artificial intelligence.
If you play a lot of games you know that there is great variance between games. Some games coast along at a bare minimum while others surprise you with their ability to create compelling visuals with older hardware.
Ummmm, just about anyone sane? Wow, decoding a measely megabyte of data. And the encoding? Simple run length encoding. That's not a real programming problem; that's a homework assignment for Computer Science 101. If you're keen on loading graphics you could at least pick something that is slightly complicated like JPEG.
Is it true that optimization is massively overrated; that most programs are plenty fast? Sure. But this article doesn't provide a bit of evidence for that.
Search 2010 Gen Con events
Sorry pal....but as long as there are scientific aplications to solve there will be a need to write highly optimized code...Heck I wrote some inline assembly today. Yes we can ignore performance in some areas, but in high perfomance computing, speed is still king, and quite frankly always will be.
what?
If it still needed the twenty statements, yes. In this case I tested to see if the input was within the right limits, and if so, set the variable to (input + offset). That changed twenty if's to one. Clear, consise, simple, maintainable.
Good, inexpensive web hosting
Then you get the opposite: people choosing to write Qt software simply because of the amazing documentation. I know Windows developers who use Qt over the "native" toolkits for this reason.
Don't blame me, I didn't vote for either of them!
Wow, look at the all the replies. Hook, line, and sinker, sir, I bow to your expertise. +1 Troll, well deserved.
When I was finally allowed to program, I did it on a Commodore PET in elementary school. Of course I was writing games, and I optimized because games are not fun when they are slow. I was too lazy to type in the source from magazines, so all of my programs grew until they were usable, then grew some more as people played them and asked for features.
Junior High did not have computers yet. I finally convinced my family to get me a computer if I paid half. With my budget, that meant a VIC20 for under $100. The VIC20 had 4KB of RAM. You could buy a 16KB expansion, but I could not afford it.
The language was the same as the PET, so I tried to run my existing programs. They ran. I tried to modify them, save, and run them, and they would not work, even if the change was to remove code. I finally tried changing all the commands to "tokens" to shorten them. IIRC, a token was the first 2 characters of a command and an underscore. Since most of the commands were 4 letters, this saved quite a few characters. I also renamed all my variables to shorten them. Then I saved and the program ran. Yeah!
Then I made another change, and the problem reappeared.
I decided that:
10 The program loaded as written to the tape. (Hard drive? Floppy disk? Never heard of them.)
20 If the program fit in memory, it would run.
30 When the program was loaded for editing, all tokens were expanded to the full command.
40 The program was saved as text, except...
50 If the tokenized version of a command was encountered, then it was saved as the token. I never figured out if they were saved as the 2 Hex number, the dollar sign and the number, or the 3-character shorthand "token" I typed.
60 GOTO 10 (and see if it runs.)
So every time I wanted to modify the longer programs, I had to change every command to the "token" format. (About half of my programs were under the 4KB limit, about half could be "fixed" using this technique, and a few were large enough that I never got them working again.) Any changes to the longer programs required 20 minutes of "tokenizing" the commands before saving it. That killed much of the fun of programming. (Today I get upset if a build takes longer than a game of Solitaire, but "getting upset" means deciding to fix the build process.)
Commodore bought BASIC from MS, and then modified it, so I do not know who to blame for the hours I wasted on this, but Commodore is gone and MS continues to take the fun out of computers, so I blame MS.
---
My next venture into computers was the C64. They had "Sprites". Half of the code in my games was controlling the graphics, and this improvement to the platform made that code obsolete. For the challenge, I upgraded one game to use Sprites. They took much of the fun out of it, and (IIRC) you were limited to 4 of them, so you had to play games (pun intended) to write PacMan. (4 Ghosts and PacMan required 5 Sprites. The dots and cherries would be handled without Sprites. It was easy to write a 3-ghost PacMan game, and really difficult to write a 4-ghost game.)
Since the C64s at school did not have tape drives, my old programs had to be typed in, if I had a printout from elemetary school (no printer at home.) I already stated that I was lazy, so they are gone. Well, I still have the tapes, but they are 2 decades old, and my PC does not have a tape drive anyway.
I spend my life entertaining my brain.
Wow, how much time did it take to go through 20 if statements anyway?
I don't think it's about "one way of optimization is always better than another" so much as recognizing, hopefully early on, that you can design software in an optimized fashion, or you can design it in a "try it and let's see what works" fashion.
If you're a good programmer - a good *engineer* - you've already probed through things to try and you can figure out where optimization is most likely necessary and get it in from the beginning. If you're not, you'll take more time to build a slower product in the long run and that's why you aren't "good."
Nobody can possibly get in every concievable optimization on the first run and at the same time generate clean code, which is why rewrites are often necessary. But someone with experience will get in the crucial ones, every time.
I think size is a far more serious problem than speed-- just because I can put multi-gigabytes in my PC doesn't mean I want to waste them loading bloatware. In fact, I probably wouldn't NEED the multi-gigabytes if it wasn't for code bloat. Of course, Gates loves such things because the machine retailers love them-- easier to sell more memory to someone who needs it because they just tried to upgrade to XP or something.
So which compilers space-optimize by rolling loops instead of unrolling them?
algorithm...that one word says it all and does it all too, so all you math whizkids ..its time to rule.. :)
~~~~~ rudga ~~~~~
Actually, that could break C++ code that uses templates. There's a difference between
:)
vector <pair <int, float> > myVector
and
vector<pair<int,float>>myVector
It's only subtle until you try to compile it
Derek
Don't Panic...
I think the article over-simplifies the issue.
The real truth is that EVERY decision about a program needs to be made in the context of it's creation and deployment.
If your writing something like 'ls' that will run often on large multiuser systems, speed should the be ultimate goal.
Alternatively if you are writing something that will run once a week and will only be deployed on the two systems in your lab, ease of coding and maintainability should be the goal, since in this case it will probably be cheaper to throw more hardware at it than to pay a programmer (or use your own precious time) to spend the extra time on optimization.
It's all about context. Hardware is cheap if you only need one more box. Hardware is expensive if 10,000 users need to buy new boxes because your program runs slow. Programmer time is expensive if a single user is footing the bill, but if there are 10,000 users to amortize the cose, programmers become cheap.
The author also seems to think that choosing a programming language is part of the performance picture. It is to a small extent, but this must also be viewed in the cotext of the application. Different languages have different advantages and disadvantages. Some are slow for mathematical calculations but can push strings around incredibly fast. Some are really 'safe' when it comes to bounds checking. Some are easy to use and are fast when most calls are to libraries. Some are great for long-running processes, but thrash the drive on startup and are attrocious for short-lived programs.
This has to be one of the crappiest articles in the developers section, ever. The author has absolutley no explanation of what is "fast enough". He just says, wow 66/second, sounds fast. Guess performance doesnt matter. That is retarded. He never compares it to a non-interpreted, more performance oriented language, so how can he say its fast? There are many applications where i wish image decoding happened faster, so you can't just assume its fast enough.
Of course, this isn't really the opposite so much as further support for the argument. Qt is commercial software.
I am currently involved with making a very simple change to an application. It has been implemented using 3 very different algorithms. The change is to automatically set 2 fields for geographical groupings from a Country field. I wrote the application, and I suggested during the original design we should automatically fill in the other two fields, but I was overruled. (If they would not configure what country belonged to what grouping, my design would have failed, so I did it their way. The deadline was too short for me to argue about it.)
The users requested it a year later, and they realized the poor data quality from depending on users to set all 3 fields, so they decided to implement the change.
The original design pulled 3 "multi-value" text fields (like arrays) from a single record. The platform will automatically parse each element so what is before the pipe '|' is displayed to the user, and what is after the pipe is stored. Each field used its own choices.
The first revision was without me. (I am expensive, and it seems an easy change. Even I thought the full-timers could handle it.) They created a new table with a 6-field record for each Country: 3 for the codes, and 3 for the display names. Then they created 3 new fields on the form to display the names, while using the original fields to store the codes. The programming was so that every time the screen refreshed, the 3 code fields and the 2 display fields were recalculated. Even worse, each of the 5 fields did its own lookup to the table. So we have 5 lookups (across a global WAN) for every screen refresh. They also set each field twice, which may cause the issues that made them call me.
Then they told me what they wanted. In less than 2 hours (more than half spent testing and writing instructions for implementation), I delivered code that only triggers if the Country changes. It pulls from the single record (as described above) when the document loads, does one lookup (from the in-memory copy) for all the data, which is then parsed into the other 2 fields. (I did not use 3 display-only fields, since the platform supports translating the code to text as mentioned above.) I am hoping they will switch to this code soon.
Then I learn that another group has modified their own copy of the application to do this in a third method. One of the groupings is static, because everybody using this copy belongs to one of the regions, so they just hardcoded those 2 fields. They hardcoded very long IF statements that decides the other grouping based on the Country. There are 3 of these hardcoded blocks: one each for the country code, the region code, and the region display text. Each of the 3 blocks of code runs every time the screen is refreshed. Their performance is much better than the first developer since it is hardcoded, but it is unmaintainable. (Their version is supposed to allow integration back into the corporate version, but they arbitrarily changed many of the field names and some of the data formats, so any integration will require translation.)
We currently have 3 versions of the same code:
#1 is very slow (and is not functioning properly.) This application runs globally from a central server, so those 5 lookups every refresh will be eat bandwidth.
#3 is hardcoded, so maintenance requires a programmer.
#2 (mine) is configurable and is fast since it runs only when needed and uses a single lookup done once when the record loads.
Did I spend time optimizing? I probably put 5 minutes of thought into how the configuration data should be stored, but the algorithm was obvious because it was the simplest method.
Was it more expensive? I may charge 6 times what the full-time developers get paid, but my 2 hours included testing all exceptions. (If the Country is not in the lookup, then the Country code is displayed in the region fields, which alerts the user and admins that there is a problem. I could have blanked the fields, and may still if this is not acceptable.) I
I spend my life entertaining my brain.
That's not bad -- I was in charge of maintaining several QuickBASIC programs. The code was poorly written and uncommented, with the exception of one comment that appeared at the beginning of each source file:
It's also interesting to note that GOSUB was used liberally, but RETURN was not used once. Not to mention the fact that temporary files were always created in fixed locations on a shared network drive, so the programs tended to fail in creative ways (to give an example of "creative": the same temp file name was often used for, say, a list of files to view in one program and a list of files to delete in a second) when multiple users were using them...the list goes on.
A friend mentioned at lunch that he missed playing a game that ran on MS-DOS BASIC. He last tried to play it on a 486, but it moved too fast. We wondered whether there was an interpreter for MSWindows98 that would run it slow enough to be usable on modern hardware.
I remember trying to play Populous on a Pentium100. A game lasted almost 2 minutes. I flattened a mountain 3 times, but the computer player would keep wrecking it. I was going to use MoSlo Deluxe to slow it down, but Civ3 was released and I was "busy" for a few years.
Even if I could get the games from the tapes, would I want them? Do I really want to see code from when I was learning to program? It might have historical value if I become very famous, but I have not heard of people asking for the source of Bill and Paul's BASIC compiler, or their Traff-o-matic system, or Carmack's early efforts, so it is unlikely anyone would care.
I could probably write similar games in a week using today's technology. Most were variants on PacMan or SpaceInvaders. None were as good as what was in the arcade a few years later. They would not be a good resume to enter computer game development.
The tapes I have are for the VIC20. The C64 (and other color Commodore computers) used a slightly different version of BASIC. My games used PEEK and POKE to control the display, so even if I had a BASIC interpreter, I would need a VIC20 (or PET) virtual machine for them to run.
I really doubt I will ever make the effort to recover the code, even if the 10-minute cassette tapes are usable after 25 years.
I spend my life entertaining my brain.
You cannot take a given programming task out of context. Just because a given implementation is "good enough" on your machine with your hardware and your operating system doesn't mean anything. Your process has to run with other processes. They are all sharing cpu, memory, buses, storage, network. What if this image convertor was being used as part of a system to generate live streaming video on a major web server? Using the entire CPU to convert 60 FPS for a single video would then be so incredibly poor, it would be a failure. I'm not saying you have to lose sleep over squeezing every last micro-code operation out of your executables, just that performance always matters. Writing "good enough" routines leads to bloated, slow, terrible software (windows media player, real player, winamp, internet explorer, ICQ, on and on).
Why was this even posted? This guy has no clue about real performance!! Being able to process 66 images a second is nothing to brag about.. Sure he may "look" cool to non-programmers because he used edgar or whatever the hell, but i wouldn't hire him to work on anything performance intensive.. plus who the hell talks about performance in terms of execution speed times? saying 66 images a second says nothing about the relative performance about his program, algorithm OR the underlying language.
I have never thought of "unrolling loops" as optimization techniques. They are fun tricks to learn, but are bad for performance and maintenance. A good compiler will do better optimization, and using those tricks in high-level source code may hurt the optimization the compiler can do. If performance needs that kind of tuning, you should insert some assembler.
Programmers usually get to use fast machines with lots of RAM and diskspace, and often end up writing programs that need everything they have.
I mentioned in another post to this article how I had to optimize a program that I wrote on a 386, but customers were running on a XT. My quickie add-the-functionality-so-it-works was not good enough, but changing the algorithm was almost painless once we knew the feature would be used.
How to optimize
Remember that tasks cause performance hits in this order:
1. Writing to the network with confirmation.
If you are just sending to a port, it does not matter. Just write and forget. If you are waiting for a response, then this is the worst. Make certain that the response is handled as in #2.
2. Reading from the network.
Do not allow your program to idle while waiting for input. The first use of threading should be to continue processing as much as possible while waiting for the input.
2.a User interaction
User input (keyboard, mouse, etc.) is also slow, but it is very important. Make certain that your program will pause everything else to respond to the user. Try to redraw as little of the screen as possible. Do not recalculate any of the screen that was not changed. VB is the worst for this, since it wants to run all the code every time the screen is refreshed.
Even MS has trouble not refreshing the entire screen every time something minor changes. Rename a file in Windows Explorer. Did the entry stay in place (desirable)? Did it resort so you have to find it again? Did it resort and move the data so you forget where you were? Did it drop to the end of the list (Windows98)? Were you able to type the new name while another process wrote to a file in that directory (WindowsNT)? If the user wanted it sorted, they would click the title of the column. Do not try to anticipate them. (And remember where they were. Why does it always go to the first entry if you use the Up or Back buttons?)
3. Writing to a local hard drive.
Keep as much in memory as possible so it can be written once. The only reason to write more than once is during debugging, and that code should not be in production. If you really need to log the intermediate steps, think about using write-and-forget across the incredibly-fast local network to a separate box that can do the writing.
The other side is that if you are done with something, then store it so you can free the RAM. Unless you will reuse it. See #5.
4. Reading from a local hard drive.
Do it in one operation. Get all the data possible, then work with it. See #6.
5. Requesting more resources.
If you need much RAM, ask for it in one chunk. For VB, do not "Redim Preserve" arrays for every new entry. Start with an estimated size, and double it if you reach the upper bound. If you write C, then this concept was beat into you early. If you use Java, the Hashtable class does it for you.
If you constantly use several network ports, use a pool. Add each to a pool, and check if one is available before requesting new ones. Your pool should send them to garbage collection if the number being used is MUCH lower than the number being reserved. This depends on your platform. You can always pool outbound ports, but the platform usually assigns the inbound ports as needed, so optimization must be handled by the OS or platform software.
6. Know the physical resources available.
If you try to load a 2GB database into 1GB of RAM, the database is going to be swapped to the much-slower-than-RAM hard drive anyway. It would be more efficient to use 200MB ch
I spend my life entertaining my brain.
Exactly, he says it's not about (discrete) mathematics, but when it comes down to what a programmer is supposed to do, it's all discrete math. You have a Turing machine (albeit limited), and the whole point is to do what needs to be done correctly and as efficiently as possible. Some things still need to be written the same way they were in "1985", unlike the author's view that optimal code doesn't matter.
Yes, machines are faster now, by at least an order of magnitude, but optimizing poor code can speed things up more than engineering a new processor.
Bubble Sort a list of one billion points of data (O(n^2) compares = k * (1e18)) on a new 3GHz machine (assuming 1 compare per clock), and you need about 3.333e8 (333333333.33) seconds (about 10.5 years).
Weak Heap (best), Quick, or Heap sort the same billion points (O(n*log n) compares = k * (~3e10)) on an old 486/33MHz (again assuming 1 compare per clock), and you need about 9.0909e3 (9090.909) seconds (about 2.5 hours).
There you go, the author can use the new Pentium 5s, and I guess the rest of us can go dig out our 486s. Sure, you don't often need to sort a billion records, but next time you do, make sure your algorithm is reasonable, then use a language that allows you to implement it. One-size-fits-all library, garbage collection, and run-time language error checking might be good for rapid development, but doing things efficiently requires lower-level interaction, sometimes even below what C allows. Bubble sort 2 records, and you won't see much benefit, but for performance critical sections of code, sometimes it's better to optimize first, then use comments to make it readable, than to write code that looks like comments. Even adding bounds checking to the sorts would at least double the time required, which in this case could mean anywhere from another 3 or 4 hours up to the time left until you collect social security.
--That's the point of being root, you can do anything you want, even if it's stupid.
The knife's edge in performance is irrelevant for 99.9% of today's desktop applications.
But when you look at the server side, it matters. If you have a server app that takes 10% longer to generate a page, that translates almost directly (with some nonlinearity, for the nitpickers) to 10% less overall capacity. Or, conversely, it means adding about 12% more CPU/memory/disk. Since enterprise-class server units are quanitized in pretty large grains, that can mean a big chunk of change.
I'll use real numbers, but no names:
A particular app server can normally handle 1,000 concurrent sessions on one instance. One instance wants 1 CPU to itself and 2 GB RAM. So, supporting 50,000 concurrent users (the site's goal) means carrying around 60,000 concurrent sessions (sessions > users due to sessions waiting to expire).
That should have required 60 CPUs to handle requests, plus one CPU per server for the OS and one CPU per server for "auxiliary" services needed by this app server. It also means about 120GB of RAM.
That's no small chunk of change, but the picture really looks bad when you consider that inefficient application design hobbled the app server so badly that it could only handle 200 sessions per instance.
That means five times the number of CPUs and five times the amount of RAM.
There's a similar (coupled) calculation you can do with page latency and capacity. And another one you can do with page weight. A good application developer must pay attention to all of these things, because even a 1K change in page weight makes a very noticable difference to overall site capacity, bandwidth costs, and hardware costs.
"Genius may have its limitations, but stupidity is not thus handicapped." --Elbert Hubbard (1856-1915)
Gee I'm offshore and I've read the book. I musn't of understood it. Any statement like "group of" is prejudiced and so probably wrong
"stripped" and "no support" sounds like a joke to me.
Anonymous Coward since I've moderated comments for this story.
I would agree, except for Q_OBJECT and "moc."
Anyone know how to seamlessly compile Qt apps in XCode?
taken! (by Davidleeroth) Thanks Bingo Foo!