Programming As If Performance Mattered

Damn! by Spruce+Moose · 2004-05-05 16:40 · Score: 4, Funny

If only I had written my first post program with performance in mind I could have not failed it!

Re:Damn! by nacturation · 2004-05-05 17:04 · Score: 2, Funny

Next up... Slashdot: Posting as if Karma Mattered

--
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.

speed/easy coding by rd4tech · 2004-05-05 16:41 · Score: 2, Interesting

The golden rule of programming has always been that clarity and correctness matter much more than the utmost speed. Very few people will argue with that.

Really? How about the server side of things?

Shameless bragging: Why don't you take a look at my page to get a whole new view on performance?

Re:speed/easy coding by ArbitraryConstant · 2004-05-05 17:16 · Score: 4, Insightful

On the server side security is an issue (also on the client side, clearly). If your code isn't clear and correct, the number of bugs is likely to be higher than average, and bugs lead to exploits. Your libraries may be well written, I don't know specifically. It's possible to do both, just hard.

--
I rarely criticize things I don't care about.
Re:speed/easy coding by edwdig · 2004-05-05 17:22 · Score: 3, Insightful

I'd say his points are more true on the server side than the client side.

Say you're a large business, and you have a mix of client side and server side applications. Both have significant processing time requirements. Which do you spend more time optimizing?

In this scenario, you're going to have a large number of client machines and a small number of servers. If servers need a little more power, you can upgrade the machine without too much disruption or money spent. The upgrade will benefit all users of the system. In this case, it's more cost effective to upgrade the server than it is to pay developers to optimize the hell out of the code.

The client machines are a different story. There are a lot of machines in use. Upgrading any one will only help the user of that computer. Optimizing the code will help every user. In this case, paying a developer to optimize your code will be a lot cheaper than doing a company wide hardware upgrade.

This is all of course assuming you're designing things well in the first place. Of course you should do things like use a quick sort (or whatever may be more appropriate in the case at hand) instead of a bubble sort. The point is it's not worth spending days to get the last 1% of performance.
Re:speed/easy coding by Lord+Kano · 2004-05-05 17:56 · Score: 5, Insightful

Really? How about the server side of things?

On the server side, I'd say that correctness and clarity are even more important. I guess it's all a matter of opinion as to where the "sweet spot" is, but most programming involves finding the right balance between speed and clarity.

If you're in a situation where you need the servers to process large amounts of data, you're most likely in a position to be able to justify the expense of throwing better hardware at the problem.

LK

--
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
Re:speed/easy coding by maxwell+demon · 2004-05-05 19:21 · Score: 2, Insightful

At the server, correctness matters even more: A slow server may get overloaded by too many requests, but a fast but incorrect server process may be a security problem.

Of course, a correct and fast server is much better than a correct and slow server.

--
The Tao of math: The numbers you can count are not the real numbers.

The question I always ask is by Anonymous Coward · 2004-05-05 16:41 · Score: 5, Insightful

Is the time it takes me to do the performance optimization worth it in time or money.

Re:The question I always ask is by rpozz · 2004-05-05 16:46 · Score: 2, Informative

Performance can be quite a major thing if you're doing a lot of forking/threading (i.e., like a daemon). If you create 100 threads, any memory leaks or bottlenecks are multiplied 100 times.

However, 0.1s delay after clicking an 'OK' button is perfectly acceptable. It all depends on what you're coding.
Re:The question I always ask is by irokitt · 2004-05-05 16:56 · Score: 4, Insightful

Probably not, but if you are working on an open source project, we're counting on you to make it faster and better than the hired hands at $COMPANY. That's what makes OSS what it is.

--
If my answers frighten you, stop asking scary questions.
Re:The question I always ask is by prockcore · 2004-05-05 17:05 · Score: 2, Insightful

Is the time it takes me to do the performance optimization worth it in time or money.

The question I ask is, can the server handle the load any other way? As far as my company is concerned, my time is worth nothing. They pay me either way. The only issue is, and will always be, will it work? Throwing more hardware at the problem has never solved a single performance problem, ever.

We've been slashdotted twice. In the pipeline is a database request, a SOAP interaction, a custom apache module, and an XSLT transform.

Our server never even came close to its breaking point. I attribute it to optimizing for performance.
Re:The question I always ask is by bm_luethke · 2004-05-05 17:09 · Score: 3, Insightful

"Is the time it takes me to do the performance optimization worth it in time or money."

To a certain extent. I've seen that excuse for some pretty bad/slow code out there.

Writing efficient and somewhat optimised code is like writing readable, extensible code: if you design and write with that in mind you usually get 90% of it done for very very little (if any) extra work. Bolt it on later and you usually get a mess that doesn't actually do what you intended.

A good programmer should always keep both clean code and fast code in mind while writing software.

--
------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
Re:The question I always ask is by tanksalot · 2004-05-05 18:04 · Score: 2, Funny

Is the time it takes me to do the performance optimization worth it in time that I could be browsing /.

--
"I am not denying the existence of stupidity, or of stupid people." - phyruxus
Re:The question I always ask is by corngrower · 2004-05-05 18:11 · Score: 3, Funny

Yes, you always have to worry about those forking threads.
Re:The question I always ask is by maxwell+demon · 2004-05-05 19:26 · Score: 4, Insightful

If working on OSS, clarity of the code has to be one of the top goals. Because if the code is not clear, you're less likely to find others interested in improving it.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:The question I always ask is by ultranova · 2004-05-05 23:37 · Score: 2, Interesting

So throwing more hardware at Windows doesn't solve its performance problems ? So the parent poster was right ?

So what's your point ?

In any case, the main usability problem with Windows XP isn't slowness - it's the thousand faces of madness within ! I mean the stupid popups that keep on harassing me ! "Do you want to clean the desktop ?" No, I want to work with the app I just started ! Why is there "never bother me again" -button on the darn popup ?!?

I've only used XP at school, so I admit that my skills are probably lacking, and that a more skillful person could probably turn all of these annoyances off. Still, it is not fun having those popups disturb me all the time, it is not funny having the stupid operating system and Office programs hide half their menus, and it is NOT funny having that idiotic paperclip jumping up and offering useless "advice" all the time !

Sorry about the rant, but I had to get that off my chest. Feel free to mod flamebait/offtopic/troll/whatever.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

What annoys me by Anonymous Coward · 2004-05-05 16:42 · Score: 4, Insightful

is that ms word 4 did all I need, and now the newest office is a thousand times the size and uses so much more cpu and ram but does no more.

a sad inditement

Re:What annoys me by Anonymous Coward · 2004-05-05 17:01 · Score: 5, Funny

> a sad inditement

Well, it does have a spell checker now...
Re:What annoys me by DrEasy · 2004-05-05 17:21 · Score: 5, Insightful

And you aren't still using it why? (hint--your answer is the reason why MS 4 doesn't do all you need.)
Or maybe because you are forced to upgrade to read files that were created with a more recent version?

--
"In our tactical decisions, we are operating contrary to our strategic interest."
Re:What annoys me by Bush+Pig · 2004-05-05 17:27 · Score: 2, Insightful

Probably the only thing Word 4 doesn't do that he needs is read the Word 97 (or whatever) files that other people keep sending him.

--
What a long, strange trip it's been.
Re:What annoys me by Anonymous Coward · 2004-05-05 17:52 · Score: 2, Funny

a sad inditement

Your own post is a sad indictment of your spelling.
Re:What annoys me by Bush+Pig · 2004-05-06 00:45 · Score: 2, Interesting

Sure, but far too many people send Word97 files when a plain text file would have been adequate. Most people assume that you're going to have the same version of Word as they do, and go all blank when you ask for something else (I'm speaking from personal experience).

--
What a long, strange trip it's been.

Managed environments by Nick+of+NSTime · 2004-05-05 16:43 · Score: 4, Funny

I code in managed environments (.NET and Java), so I just let some mysterious thing manage performance for me.

Re:Managed environments by metlin · 2004-05-05 17:08 · Score: 4, Informative

Contrary to popular belief, managed code environments do optimize code a whole lot more than you would think!

Joe Beda, the guy from Microsoft behind Avalon, had a discussion on Channel9 where he talked about why managed code is not that bad a thing after all.

Like I mentioned in an earlier post, managed code helps optimize the code for some of the bad programmers out there who cannot do it themselves, and takes care of a lot of exceptions and other "troublesome" things :) So, in the long run, it may not be that bad a thing after all.

There are two facets to optimization - one is optimization of the code per se, and the other is the optimization of the project productivity - and I think managed code environments do a fairly good job of the former and a very good job of the latter.

My 0.02.
Re:Managed environments by metlin · 2004-05-05 18:20 · Score: 4, Interesting

You assume that I made that reference to myself as being a bad programmer.

The reason I made that statement was because just last week I was at Redmond for an interview for internship at Microsoft, and I was interviewed by the team that was trying to prevent just this sort of thing from happening.

The idea was to design heuristics-enabled compilers that would effectively detect any "bad-code" and help make managed code and pseudo-managed code the norm, or convert existing code into managed code.

I did not say that I was using a programming language that had such protections, merely that such programming languages have their own advantages. I was interviewed for creating compilers, linkers and OS-level protection that did not allow those troublesome things to exist - not use them - and hence my justification :)

That said, you may knowingly or unknowingly use a language designed for bad programmers even when you program C or C++ in upcoming versions of compilers that insist on managed code - they may just wrap up your code in a nice wrapper to prevent mishaps and hand it over to the linker after having taken care of your holes.
Re:Managed environments by Anonymous Coward · 2004-05-05 21:03 · Score: 2, Interesting

I don't think you are really being fair.

I've come to the belief that sending out machine-code packages is flawed, because you don't really know what the target platform is going to look like.

Example: can the processor support SIMD instructions? This can make a _HUGE_ performance difference in some applications. I would argue that if you are shipping a binary, you just wouldn't know.

Example: Code optimized for a Pentium III did not run as fast on a Pentium IV. Why? The pipeline changed and this had an effect on the way the compiler should schedule instructions.

Summary: You want the binding between program -> machine code to be as late as you can push it. This doesn't mean I advocate source distribution since those have their own issues (see Gentoo). But an easily translatable representation like Microsoft's MSIL or Java's bytecode seems to be the solution. This is a classic versioning problem, do you want every app to deal with it? Or deal with it once at the OS level?

Not to mention having really stupid loaders causes other problems down the pipeline. Was this code compiled for thread-safety? How about with debug information?

But back to performance, low-level efficiency matters. But how much it matters depends on the application. Think in terms of processing large amounts of data. Obviously you want to touch it as few times as possible: therefore O(log n) is better than O(n), which is better than O(n^2). But to get good performance: BE CACHE FRIENDLY. Make sure you access data in an array, so that if you access the same data again, you access it soon, and if you look at one element you are likely to look at one of its neighbors.

This will keep your code running well today, and will allow future hardware to process it faster (assuming new machines use the same type of memory hierarchy only bigger). This does mean preferring indexing tricks into the array over pointers. And yes, linked lists are your enemies.
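
A minimal sketch of the access-order point, in Python with made-up function names (nothing here comes from the post). It stores a grid in one contiguous `array` and sums it twice: once walking memory in order, once jumping a whole row ahead on every step. In CPython the interpreter overhead largely hides the cache effect, so treat this as an illustration of the two access patterns, not a benchmark; in C or similar languages the in-order walk is usually the noticeably faster one.

```python
from array import array


def make_grid(rows, cols):
    """Store a rows x cols grid in one contiguous block, row-major."""
    return array("d", (float(i) for i in range(rows * cols)))


def sum_row_major(grid, rows, cols):
    """Visit elements in memory order: the column index varies fastest."""
    total = 0.0
    for r in range(rows):
        base = r * cols
        for c in range(cols):
            total += grid[base + c]
    return total


def sum_column_major(grid, rows, cols):
    """Same sum, but every step jumps `cols` elements ahead (poor locality)."""
    total = 0.0
    for c in range(cols):
        for r in range(rows):
            total += grid[r * cols + c]
    return total
```

Both functions return the same total; only the order in which memory is touched differs.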

Anything else, it is going to be a wash. The first thing that researchers of some new language find out is how to get rid of 80% of the inefficiencies that make it slower than C.

In sum, concentrate more on the algorithms. Be cache friendly. Program in a language that you enjoy. And leave the pissing contest between whose language (and therefore their unit) is fundamentally faster to people who have nothing better to do.

Funny thing about performance by ObviousGuy · 2004-05-05 16:44 · Score: 5, Interesting

You can spend all your time optimizing for performance and when you finally release your product, your competition whose main objective was to get the product out the door faster, who uses a slower algorithm, is already first in mindshare with your customers. Not only that, the processors that you thought you would be targeting are already a generation behind and that algorithm that was going to hold back your competition runs perfectly fast on new processors.

Performance gains occur at the hardware level. Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.

--
I have been pwned because my /. password was too easy to guess.

Re:Funny thing about performance by ObviousGuy · 2004-05-05 16:49 · Score: 2, Interesting

No, I use the language's sort routine. This typically means quicksort or heapsort.

Do you code all your own algorithms?

--
I have been pwned because my /. password was too easy to guess.
Re:Funny thing about performance by corngrower · 2004-05-05 16:52 · Score: 5, Insightful

Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.

Assuming there is a second version, which there may not be because potential customers found that the performance of v1.0 sucked.
Re:Funny thing about performance by metlin · 2004-05-05 17:00 · Score: 4, Insightful

Well said.

However, I will dispute the claim that performance gains happen only at the hardware level - although programmers cannot really optimize every tiny bit, there is no harm in encouraging good programming.

The thing is that a lot of programmers today have grown NOT to respect the need for performance - they just assume that the upcoming systems would have really fast processors and infinite amounts of RAM and diskspace, and write shitty code.

I agree that like Knuth said, premature optimization is the root of all evil. However, writing absolutely non-optimized code is evil in itself - when a simple problem can be simplified in order and time, it's criminal not to :)

A lot of times, programmers (mostly the non-CS folks who jumped the programming bandwagon) write really bad code, leaving a lot of room for optimization. IMHO, this is a very bad practice, something that we have not really been paying much attention to because we always have faster computers coming up.

Maybe we never will hit the hardware barrier, I'm sure this will show through.
Re:Funny thing about performance by Anonymous Coward · 2004-05-05 17:03 · Score: 2, Informative

Obviously you have never done any programming wrt cryptography. Optimization is *_N-E-V-E-R_* done in the hardware!!! The difference between using a good algorithm and a crappy one is the difference between 2 days for the program to run, and fifty trillion centuries (literally). Hardware upgrades are merely incremental. Moore's law says speed doubles every 18 months, but doubling is a tiny incremental increase; if you want an exponential/logarithmic change, you have to use software. I'm not talking about "oh, twice as fast as a year ago", but "10,000 times as fast as that other software" or "1 billion times as fast".
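
One way to see the size of that gap is modular exponentiation, a staple of public-key cryptography. The sketch below is an illustration added here, not the poster's code, and the function names are invented. It counts multiplications for the naive method and for square-and-multiply.

```python
def modexp_naive(base, exponent, modulus):
    """Multiply `base` into the result `exponent` times: O(exponent) multiplications."""
    result, mults = 1 % modulus, 0
    for _ in range(exponent):
        result = (result * base) % modulus
        mults += 1
    return result, mults


def modexp_square_multiply(base, exponent, modulus):
    """Binary exponentiation: O(log exponent) multiplications."""
    result, mults = 1 % modulus, 0
    base %= modulus
    while exponent > 0:
        if exponent & 1:  # this bit of the exponent is set
            result = (result * base) % modulus
            mults += 1
        base = (base * base) % modulus  # square for the next bit
        mults += 1
        exponent >>= 1
    return result, mults
```

For a 1024-bit exponent the naive loop would need on the order of 2^1024 multiplications, while square-and-multiply needs at most about 2048. No hardware upgrade closes a gap like that. In practice Python's built-in three-argument `pow(base, exponent, modulus)` already does this efficiently.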
Re:Funny thing about performance by naden · 2004-05-05 17:24 · Score: 2, Insightful

Assuming there is a second version, which there may not be because potential customers found that the performance of v1.0 sucked.

Better a version 1.0 that sucked than none at all.

And funny how Microsoft seems to release so many crappy 1.0 releases yet usually ends up clawing back to become the market leader.

--
Funtage Factor: Purple
Re:Funny thing about performance by techno-vampire · 2004-05-05 17:24 · Score: 4, Interesting

The thing is that a lot of programmers today have grown NOT to respect the need for performance - they just assume that the upcoming systems would have really fast processors and infinite amounts of RAM and diskspace, and write shitty code.
That's not the only reason. Programmers usually get to use fast machines with lots of RAM and diskspace, and often end up writing programs that need everything they have.
Back in the DOS days, I worked on a project that had a better way of doing things. We had one machine with reasonable speed as the testbed. It wasn't well optimized as we didn't expect our customers to know how to do that and the programs we were writing didn't need expanded or extended memory. If what you wrote wouldn't run on that machine, it didn't matter how well it worked on your machine, you had to tweak it to use less memory.

--
Good, inexpensive web hosting
Re:Funny thing about performance by kubrick · 2004-05-05 18:26 · Score: 5, Insightful

All Microsoft have to do is pre-announce features that won't be in their products until v3, well before the release of v1 (vide Go Corporation & Pen Windows), and that's enough to kill off the competition. Microsoft's success has become a self-fulfilling prophecy for most of the market these days...

--
deus does not exist but if he does
Re:Funny thing about performance by Anonymous Coward · 2004-05-05 18:36 · Score: 2, Informative

Here's a somewhat relevant anecdote.

I interviewed at a company that makes a big deal about being super duper technical on their web site. They had a written coding problem as part of the interview. (A good sign!)

They left me in a room with a non-networked PC with instructions to get as far as possible in writing a program in Java to take an initial date and a number of days to add or subtract, and figure out what the resulting date would be. The test instructions contained a detailed explanation of the workings of the Gregorian calendar system. The PC had Windows and the JDK installed on it, and just about nothing else. They gave me a pretty short period of time to do it in - 15 or 20 minutes, if I remember right.

At first I had to call the interviewer back in so that I could show him that there were about TEN different past solutions still sitting on the hard drive, and that I was going to delete them all while he watched and ask him to start the clock again. (Lamers...)

When I read the problem I realized that it was very easily solved using the java.util.GregorianCalendar class that comes with the JDK. I didn't remember exactly how to use it but fortunately the installed JDK on this PC also included the JDK source. I javadoc'd the source to GregorianCalendar and Calendar and read the docs, wrote my app, and tested it thoroughly. Of course it didn't take long to get it working, since the hard work was already done. I had to walk all over the office looking for the interviewer, who apparently wasn't expecting me to actually complete the task within the allotted time.
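
For readers who want to see how short the library-based answer is, here is a rough Python analog (the interview was in Java with `java.util.GregorianCalendar`; this is not that program, and `add_days` is a name invented for this sketch). Python's `datetime.date` uses the proleptic Gregorian calendar, so the leap-year rules come for free.

```python
from datetime import date, timedelta


def add_days(start, days):
    """Return the date `days` after `start` (negative values go backwards)."""
    return start + timedelta(days=days)


if __name__ == "__main__":
    # Crossing the end of February in a leap year and in a non-leap century year.
    print(add_days(date(2004, 2, 28), 2))
    print(add_days(date(1900, 2, 28), 1))
```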

When I reviewed my very short program with the proctor and explained all of the things that I had done in order to do it that way, he seemed upset, as though I cheated. I tried to make a case for the fact that I had passed up a chance to actually cheat and then been resourceful, but he wasn't convinced. I didn't do it exactly the slow and tedious way they wanted, so I was wrong. I pointed out that if I was on a project and caught a developer duplicating base JDK functionality due to plain ignorance of the class library, I'd consider that a *bad* thing, not an example of technical excellence.

The rest of the interview went OK, but they eventually called me back and said there was a hiring freeze. Well maybe so, since it was in 2000 or 2001 (I don't remember exactly when), or maybe not. I wasn't exactly crushed.

Since then, the hard-skills tests that I use when interviewing developer candidates include something like this for the relevant environment... kinda like you said. Something like "read in a text file, sort the lines, and print it out in sorted order". If their program includes a sort routine, BZZT, they failed the test.
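
A passing answer to that kind of test might look like the following Python sketch (an illustration only; the helper names are made up). The point is that the sorting itself is a single call to the built-in `sorted`.

```python
import sys


def sorted_lines(lines):
    """Sort with the language's own routine instead of a hand-written one."""
    return sorted(lines)


def main(path):
    # Read the whole file, drop the line endings, print in sorted order.
    with open(path, encoding="utf-8") as handle:
        for line in sorted_lines(handle.read().splitlines()):
            print(line)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```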
Re:Funny thing about performance by Molina+the+Bofh · 2004-05-05 20:15 · Score: 4, Funny

I do Dumbsort.

Dumbsort works something like this:
:loop
randomize (array)
if (sorted) goto next
goto loop
:next
Straight from MS-style programming books.

--

-
Roses are #FF0000, Violets are #0000FF, find / -name '*base*' |xargs chown -R us && mv zig greatjustice
Re:Funny thing about performance by almaw · 2004-05-05 21:07 · Score: 4, Insightful

> Performance gains occur at the hardware level.
> Any tendency to optimize prematurely ought to be
> avoided, at least until after v1.0 ships.

Performance gains occur at the algorithm level. It doesn't matter how much hardware you throw at a problem if it needs to scale properly and you have an O(n^3) solution.
Re:Funny thing about performance by Moraelin · 2004-05-05 22:48 · Score: 5, Insightful

Well, yes and no.

I still don't think you should start doing every single silly trick in your code, like unrolling loops by hand, unless there's a provable need to do so. Write clearly, use comments, and use a profiler to see what needs to be optimized.

That is coming from someone who used to write assembly, btw.

But here's the other side of the coin: I don't think he included better algorithms in the "premature optimization". And the same goes for having some clue of your underlying machine and architecture. And there's where most of the problem lies nowadays.

E.g., there is no way in heck that an O(n * n) algorithm can beat an O(log(n)) algorithm for large data sets, and data sets _are_ getting larger. No matter how much loop unrolling you do, no matter how you cleverly replaced the loops to count downwards, it just won't. At best you'll manage to fool yourself that it runs fast enough on those 100 record test cases. Then it goes into production with a database with 350,000 records. (And that's a small one nowadays.) Poof, it needs two days to complete now.

And no hardware in the world will save you from that kind of a performance problem.
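
A quick back-of-the-envelope check of those numbers, as a Python sketch. The step counts are idealized (constants ignored) and the function names are invented for illustration.

```python
import math


def steps_quadratic(n):
    """Idealized step count for an O(n * n) algorithm."""
    return n * n


def steps_logarithmic(n):
    """Idealized step count for an O(log n) algorithm (base 2, rounded up)."""
    return math.ceil(math.log2(n))


if __name__ == "__main__":
    for n in (100, 350_000):
        print(n, steps_quadratic(n), steps_logarithmic(n))
```

Going from the 100-record test case to 350,000 records multiplies the quadratic step count by 12,250,000, while the logarithmic one goes from 7 to 19.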

E.g., if most of the program's time is spent waiting for a database, there's no point in unrolling loops and such. You'll save... what? 100 CPU cycles, when you wait 100,000,000 cycles or more for a single SQL query? On the other hand, you'd be surprised how much of a difference it can make if you retrieve the data in a single SQL query, instead of causing a flurry of 1000 individual connect-query-close sequences.

(And you'd also be surprised how many clueless monkeys design their architecture without ever thinking of the database. They end up with a beautiful class architecture on paper, but a catastrophic flurry of queries when they actually have to read and write it.)
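
The two query patterns can be sketched with Python's standard `sqlite3` module. The table, column, and function names are hypothetical, and an in-memory SQLite database has no network round trip, so the sketch counts queries issued and does not measure time.

```python
import sqlite3


def make_db(n):
    """Create an in-memory table with n rows to query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(n)],
    )
    return conn


def fetch_one_by_one(conn, ids):
    """One query per id: the 'flurry of queries' pattern."""
    names = []
    for item_id in ids:
        row = conn.execute(
            "SELECT name FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        names.append(row[0])
    return names, len(names)  # second value: number of queries issued


def fetch_in_one_query(conn, low, high):
    """The same rows fetched with a single query."""
    rows = conn.execute(
        "SELECT name FROM items WHERE id BETWEEN ? AND ? ORDER BY id",
        (low, high),
    ).fetchall()
    return [row[0] for row in rows], 1
```

Against a real database server, each of those 1000 small queries would also pay the per-request overhead the post describes.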

E.g., if you're using EJB, it's a pointless exercise to optimize 100 CPU cycles away, when the RMI/IIOP remote call's overhead is at least somewhere between 1,000,000 and 2,000,000 CPU cycles by itself. That is, assuming that you don't also have network latency adding to that RPC time. On the other hand, optimizing the very design of your application, so it only uses 1 or 2 RPC calls, instead of a flurry of 1000 remote calls to individual getters and setters... well, that might just make or break the performance.

(And again, you'd be surprised how many people don't even know that those overheads exist. Much less actually design with them in mind.)
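
The getter-by-getter versus coarse-grained design can be sketched without any real RPC framework. In this Python stand-in (not EJB or RMI; every name is hypothetical), each method call on the "remote" object simply increments a counter that represents one remote round trip.

```python
class RemoteCustomer:
    """Stand-in for a remote object; every method call counts as one RPC."""

    def __init__(self, name, email, city):
        self._data = {"name": name, "email": email, "city": city}
        self.remote_calls = 0

    def get_field(self, field):
        """Fine-grained getter: one round trip per field."""
        self.remote_calls += 1
        return self._data[field]

    def get_snapshot(self):
        """Coarse-grained call: all fields in one round trip."""
        self.remote_calls += 1
        return dict(self._data)


def load_chatty(customer):
    """Fetch each field with its own remote call."""
    return {f: customer.get_field(f) for f in ("name", "email", "city")}


def load_coarse(customer):
    """Fetch everything with a single remote call."""
    return customer.get_snapshot()
```

With three fields the difference is 3 calls versus 1; with the 1000 getters and setters mentioned above, and a million-plus cycles per call, the design choice dominates everything else.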

So in a nutshell, what I'm saying is: Optimize the algorithm and design, before you jump in to do silly micro-level tricks. That's where the real money is.

--
A polar bear is a cartesian bear after a coordinate transform.
Re:Funny thing about performance by hankaholic · 2004-05-05 23:32 · Score: 3, Insightful

I was going to moderate this post "Overrated", but I'd rather just explain why you're wrong in stating that the "algorithm that was going to hold back your competition runs perfectly fast on new processors".

Certain algorithms take more-than-proportionately longer as the data size increases. For example, if you're writing route-planning software, each additional stop on a route might cause the number of calculations required to (roughly) double.

In such a case, having hardware which is twice as powerful would mean that the run time would halve, although as soon as the user added two more data points, the job would run slower than it did on the original machine.

To clarify a tad, let's say FedEx decides to optimize the routes drivers in Montana are travelling. Assume that there are 10,000 stops and 200 drivers, and that your code runs in, say, an hour on FedEx's machines.

Assume that you've used an algorithm for which each additional data point doubles the amount of computation required. Now FedEx deciding to hire 10 more drivers means that your route planning software is going to take 2^10 times as long to plan their routes (since it doubles for each new data point, that's 2^1 for one driver, 2^2 for two, 2^3 for three...).

The point is that tiny operations add up when you've chosen the wrong algorithm. Despite the fact that runtime was fine using FedEx's CPU farm in the original situation, your disregard for efficiency will cause the route-planning time to take not the overnight-batch-job-friendly hour, but a stunning 1024 times as long (hint: over a month).

Say a new big fast machine enters the market, with four times the CPU power. FedEx will still need 256 times as many machines to perform the same calculations in under an hour, or at least, say, 32 times as many in order to be able to perform them overnight.
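
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the post's own assumption (run time doubles per extra data point); `runtime_hours` is a name made up here.

```python
def runtime_hours(base_hours, extra_points, speedup=1.0):
    """Run time when each extra data point doubles the work.

    `speedup` is how many times faster the hardware is than the original.
    """
    return base_hours * 2 ** extra_points / speedup


if __name__ == "__main__":
    original = runtime_hours(1, 10)       # ten more drivers, same hardware
    faster = runtime_hours(1, 10, 4)      # same job on a 4x faster machine
    print(original, original / 24)        # hours, then days
    print(faster, faster / 8)             # hours, then machines for an 8-hour night
```

Ten extra data points turn one hour into 1024 hours, roughly 42.7 days. A machine four times as fast still needs 256 hours, which is where the 256x (one hour) and 32x (an eight-hour overnight window) figures come from.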

All because you decided that choosing algorithms based on performance was poppycock.

Prematurely optimizing on a microscopic level may be "bad", but choosing the proper algorithm can make the difference between a product with a future and a product with too many designed-in limitations to be able to handle greater-than-expected loads.

(CS fans will note that the TSP problem was an unrefined example to have pulled out given the whole P/NP thing, but that's the point -- sticky situations can and will arise for which no amount of source-level optimization will save the day.)

--
Somebody get that guy an ambulance!
Re:Funny thing about performance by Vengie · 2004-05-06 00:33 · Score: 3, Insightful

You're ignoring constants. Constants can sometimes be large. That is why Strassen's matrix multiply method takes longer than the naive method on small matrices.

Scarily, you have just enough knowledge to sound like you know what you're talking about. Sometimes it DOES matter how much hardware you throw at the problem, lest you forget the specialized hardware DESIGNED to crack DES.

How about your next computer I replace all the carry-lookahead adders with ripple-carry adders? Please look up those terms if you don't know them. I'm sure you'd be unpleasantly surprised.

--
When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in vi. (Larry Wall)
Re:Funny thing about performance by Valar · 2004-05-06 04:25 · Score: 2, Interesting

Actually, when I was TAing data structures, we called that 52 card pick up sort. You take the deck of cards and throw it in the air. Pick up the cards and if they end up sorted, stop. If not, throw them in the air again. We used it as an example of "just because it works, doesn't mean you should do it" and as an example of algorithms with big-Os in the 'bad' column.

--

====
Crudely Drawn Games
Re:Funny thing about performance by s00p41337h4x0r · 2004-05-06 08:09 · Score: 2, Informative

That's a well known algorithm called "Bogosort" in the Jargon File.
Interesting thing about it is that it is one of the few algorithms that has an expected running time of O(n!). If you're teaching an intro algorithms class it's easy to come up with examples of O(lg n), O(n), O(n lg n), O(n^2), and O(2^n) in lecture but O(n!) is tricky. Useful as an extra credit question.
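
For the curious, a runnable Python sketch of bogosort (an illustration, not code from the thread; the function names are made up). It also returns the number of shuffles so the growth can be observed on tiny inputs.

```python
import random


def is_sorted(items):
    """True if every element is <= its successor."""
    return all(a <= b for a, b in zip(items, items[1:]))


def bogosort(items, rng=None):
    """Shuffle until sorted. Returns (sorted copy, number of shuffles)."""
    rng = rng or random.Random()
    items = list(items)  # work on a copy, leave the caller's list alone
    shuffles = 0
    while not is_sorted(items):
        rng.shuffle(items)
        shuffles += 1
    return items, shuffles
```

For n distinct items, each shuffle lands on the sorted order with probability 1/n!, so the expected number of shuffles is about n!. Each shuffle and check also costs O(n), which is why some analyses quote O(n * n!) for the total work. Keep the input to a handful of elements.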

the software taketh what the hardware giveth. by equex · 2004-05-05 16:46 · Score: 4, Insightful

i remember times when 7.14mhz and 256k ram was enough to drive a multitasking windowed os. (amiga)
ive seen glenz vectors and roto-zoomers on the commodore 64.
modern os's, especially windows seem super-sluggish when you see what is possible on those old computers if you just care to optimize the code to the max.

--
Can I light a sig ?

Re:the software taketh what the hardware giveth. by neil.orourke · 2004-05-05 17:07 · Score: 2, Interesting

But the great demos on the Amiga and C64 never hit the OS.

Have a look at some of the PC demos that boot from DOS and take over the machine (eg. www.scene.org) and tell me that they aren't just as amazing.
Re:the software taketh what the hardware giveth. by Anonymous Coward · 2004-05-05 18:12 · Score: 3, Informative

One of the most amazing PC demos I ever saw was a 256 byte intro that ran under DOS (I forget the name of it).
This one?
Re:the software taketh what the hardware giveth. by Anonymous Coward · 2004-05-05 18:13 · Score: 3, Informative

The program you're looking for is "tube". If you want to get seriously impressed, have a look at "lattice" instead.
However, you can't achieve the same easily in linux since
a) putting pixels is more than just writing to 0a0000h and
b) elf format has actually some structure. (iirc program that merely returns 42 takes 53 bytes and uses quite obscene amount of trickery to achieve that)

These will probably bloat the linux version to something like 512 bytes;) Oh dear.

If everyone paid attention in english class... by fervent_raptus · 2004-05-05 16:48 · Score: 2, Informative

this slashdot post would read:

I just finished reading the essay "Programming as if Performance Mattered", by James Hague. The essay covers how compiler optimization has changed over the years. If you get bored, keep reading; there's a big 'gotcha' in the middle. Hague begins: "Will performance issues haunt us forever? This essay puts performance analysis in perspective."

I think... by rms_nz · 2004-05-05 16:49 · Score: 2, Interesting

...it would have been better for him to show the run times for all the versions of his program to show us what difference each of the changes had made...

Make the common case fast by DakotaSandstone · 2004-05-05 16:49 · Score: 2, Interesting

Yes, yes, yes. Do optimize. But, come on people, do we really need to turn that nice readable device init code that only executes once into something like:

for (i=0,j=0,init();i!=initDevice(j);j++,writeToLog()) ;

Sheesh!

--
Nothing is so smiple that it can't get screwed up.

You don't optimize, that's the job of the compiler by Anonymous Coward · 2004-05-05 16:50 · Score: 2, Insightful

If you write clear and simple code the compiler or interpreter does all the other work. It will automatically remove unused code and simplify complex segments. So long as your code is not unnecessarily convoluted often the machine optimizations are better than the human brain optimizations. It's like register allocation. You don't do that by hand. That's just crazy! Some poor fools 20 years ago had to do it by hand and came up with an algorithm to do it that the computer just does for you.

That's the difference between modern languages and more archaic ones. Sure you can't get the "absolute best" most optimized optimization, but you're probably going to get a better optimization than you can think of just from the interpreter/compiler doing its job.

The only thing that really needs optimization is streamlining data structures because the compiler can't predict what part of the data structure isn't used during runtime. You just need to make sure you use the right data structure for the job and put the basic pen-and-paper (optimized) algorithm down in plain code. No strange hacker tricks needed.

Don't agree by Docrates · 2004-05-05 16:51 · Score: 4, Interesting

While the author's point seems to be that optimization and performance are not all that important, and that you can achieve better results with how you do things and not what you use, I tend to disagree with him.

The thing is, in real life applications, playing with a Targa file is not the same as service critical, 300 users, number crunching, data handling systems, where a small performance improvement must be multiplied by the number of users/uses, by many many hours of operation and by years in service to understand its true impact.

Just now I'm working on an econometric model for the Panama Canal (they're trying to make it bigger and need to figure out if it's worth the effort/investment) and playing with over 300 variables and 100 parameters to simulate dozens of different scenarios can make any server beg for more cycles, and any user beg for a crystal ball.

--

There are two kinds of people in the world: Those with good memory.

Re:Don't agree by mfago · 2004-05-05 17:08 · Score: 4, Insightful

Not what I got out of it at all, rather:

Clear concise programs that allow the programmer to understand -- and easily modify -- what is really happening matter more than worrying about (often) irrelevant details. This is certainly influenced by the language chosen.

e.g. I'm working on a large F77 program (ugh...) that I am certain would be much _faster_ in C++ simply because I could actually understand what the code was doing, rather than trying to trace through tens (if not hundreds) of goto statements. Not to mention actually being able to use CS concepts developed over the past 30 years...
Re:Don't agree by Frobnicator · 2004-05-05 18:33 · Score: 5, Insightful

Just now I'm working on an econometric model ... in real life applications, playing with a Targa file is not the same as service critical, 300 users, number crunching, data handling systems, where a small performance improvement must be multiplied by the number of users/uses, by many many hours of operation and by years in service to understand its true impact. ... I'm playing with over 300 variables and 100 parameters to simulate dozens of different scenarios can make any server beg for more cycles, and any user beg for a crystal ball.

I don't think that fits into the description the article was talking about.
The point of this article is not targeted to you. I've seen interns as recent as last year complain about the same things mentioned in the article: division is slow, floating point is slow, missed branch prediction is slow, use MMX whenever more than one float is used, etc.
The point I get out of the article is not to bother with what is wasteful at a low level, but be concerned about the high levels. A common one I've seen lately is young programmers trying to pack all their floats into SSE2. Since that computation was not slow to begin with, they wonder why all their 'improvements' didn't speed up the code. Even the fact that they are doing a few hundred unnecessary matrix ops (each taking a few hundred CPU cycles) didn't show up on profiling. In the few cases I'm thinking about, their basic algorithm was either very wasteful or could have been improved by a few minor adjustments.
The article mentions some basic techniques: choosing a different algorithm, pruning data, caching a few previously computed results, finding commonalities in data to improve the algorithm. Those are timeless techniques, which you probably have already learned since you work on such a big system. Writing your code so that you can find and easily implement high-level changes; that's generally more important than rewriting some specific block of code to run in the fewest CPU cycles.
A very specific example. At the last place I worked, there was one eager asm coder who wrote template specializations on most of the classes in the STL for intrinsic types in pure asm. His code was high quality, and had very few bugs. He re-wrote memory management so there were almost no calls to the OS for memory. When we used his libraries, it DID result in some speed gains, and it was enough to notice on wall-clock time.
However... Unlike his spending hundreds of hours on this low-return fruit, I could spend a day with a profiler, find one of the slower-running functions or pieces of functionality, figure out what made it slow, and make some small improvements. Usually, a little work on 'low-hanging fruit', stuff that gives a lot of result for a little bit of work, is the best place to look. For repeatedly computed values, I would sometimes cache a few results. Other times, I might see if there are a few system functions that can be made to do the same work. On math-heavy functions, there were times when I'd look for a better solution or 'accurate enough but much faster' solution using calculus. I'd never spend more than two days optimizing a bit of functionality, and I'd get better results than our 'optimize it in asm' guru.
Yes, I would spend a little time thinking about data locality (stuff in the CPU cache vs. ram) but typically that doesn't give me the biggest bang for the buck. But I'm not inherently wasteful, either. I still invert and multiply rather than divide (it's a habit), but I know that we have automatic vectorizers and both high-level and low-level optimizers in our compilers, and an out-of-order core with AT LEAST two floating point, two integer, and one memory interface unit.
And did I mention, I write software with realtime constraints; I'm constantly fighting my co-workers over CPU quotas. I read and refer to the intel and AMD processor documentation, but usually only to see which high-level functionality best lends itself to the hardware. I am tempted to 'go straight to the metal' occasionally, or to count the CPU cycles of everything, but I know that I can get bigger gains elsewhere. That's what the point of the article is, I believe.
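
To make the "cache a few results" step concrete, here is a minimal sketch in Python. It is not the code from that job; `expensive_transform` and the `CALLS` counter are hypothetical stand-ins for whatever function a profiler flags as slow and repeatedly called with the same inputs.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def expensive_transform(angle_deg: int) -> float:
    """Hypothetical slow function; the loop is a placeholder for real work."""
    CALLS["count"] += 1
    total = 0.0
    for k in range(1, 10_000):
        total += (angle_deg % 360) / k
    return total


def process(angles):
    # Repeated inputs are served from the cache instead of being recomputed.
    return [expensive_transform(a) for a in angles]


results = process([10, 20, 10, 10, 20, 30])
```

Six requests come in, but only the three distinct inputs pay for the computation. This only works when the function is pure (same input, same output), which is an assumption of the sketch.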

--
//TODO: Think of witty sig statement
Re:Don't agree by alex_tibbles · 2004-05-05 19:49 · Score: 2, Insightful

Exactly. The point of the article (as someone else pointed out) is that clear, high-level code is easy to optimize, since it is easy to understand, and thus it's easy to reason about the code.
It doesn't sound like any low level work is going to get you anywhere in your simulation. The best bet is to buy lots of hardware to brute-force it...
... or get smart! Reason about the problem. Is it important to evaluate the function for all possible combinations of all possible values of all the variables and parameters? Is there any hidden constraint or relationship between those variables? The economic model which provides those variables may well have made a distinction between two variables, or an assumption of the independence of two variables, which is not relevant to your modelling.
Or might statistics/ heuristics help? Picking the most likely region(s) (based on other theory) for computation, calculating that (those) first, and then working out into (assuming probability is smooth) less likely region(s).
As the article points out, these kinds of optimizations (very similar conceptually to those in the article) are those easiest to do in a very high-level language.

--
Posters recognized by their sig,

The Longhorn developers... by ErichTheWebGuy · 2004-05-05 16:51 · Score: 4, Funny

should really read that essay! Maybe then we wouldn't need dual-core 4-6 GHz CPUs and 2GB ram to run their new OS.

--
bash: rtfm: command not found

Re:The Longhorn developers... by julesh · 2004-05-05 22:38 · Score: 2, Informative

Maybe then we wouldn't need dual-core 4-6 GHz CPUs and 2GB ram to run their new OS.

The reason they're targeting this kind of system is because the hardware will probably be cheaper than Windows itself by the time Longhorn comes out.

I'm sure they'll let you switch off the flash features that need it, though. All recent versions of Windows have been able to degrade to roughly the same performance standard as the previous version if you choose the right options.

Premature Optimization by Godeke · 2004-05-05 16:53 · Score: 4, Insightful

One of the concepts touched upon is the idea that optimization is only needed after profiling. Having spent the last few years building a system that sees quite a bit of activity, I have to say that we have only had to optimize three times over the course of the project.

The first was to get a SQL query to run faster: a simple matter of creating a view and supporting indexes.

The second was also SQL related, but on a different level: the code was making many small queries to the same data structures. Simply pulling the relevant subset into a hash table and accessing it from there fixed that one.

The most recent one was more complex: it was similar to the second SQL problem (lots of high overhead small queries) but with a more complex structure. Built an object to cache the data in with a set of hashes and "emulated" the MoveNext, EOF() ADO style access the code expected.

We have also had minor performance issues with XML documents we throw around, may have to fix that in the future.

Point? None of this is "low level optimization": it is simply reviewing the performance data we collect on the production system to determine where we spend the most time and making high level structural changes. In the case of SQL vs a hash based cache, we got a 10 fold speed increase simply by not heading back to the DB so often.
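
A rough sketch of the "many small queries versus one hash-based cache" change, in Python with an in-memory SQLite database. The `prices` table, its columns, and the function names are invented for illustration and are not from the system described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item_id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1000)],
)


def total_with_small_queries(item_ids):
    # One round trip to the database per item.
    total = 0.0
    for item_id in item_ids:
        row = conn.execute(
            "SELECT price FROM prices WHERE item_id = ?", (item_id,)
        ).fetchone()
        total += row[0]
    return total


def load_price_cache():
    # One query; the relevant subset lands in a dict (hash table).
    return dict(conn.execute("SELECT item_id, price FROM prices"))


def total_with_cache(item_ids, cache):
    # Lookups never touch the database again.
    return sum(cache[item_id] for item_id in item_ids)


wanted = [3, 7, 7, 42, 999]
cache = load_price_cache()
```

Both functions return the same total; the second simply stops heading back to the DB for every item. The cache goes stale if the table changes underneath it, so this fits data that is read far more often than it is written.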

Irony? There are plenty of other places where similar caches could be built, but you won't see me rushing out to do so. For the most part performance has held up in the face of thousands of users *without* resorting to even rudementry optimization. Modern hardware is scary fast for business applications.

--
Sig under construction since 1998.

Performance is relative by jesup · 2004-05-05 16:57 · Score: 4, Interesting

66 fps on a 3 GHz machine, doing a 600x600 simple RLE decode...

Ok, it's not bad for a language like Erlang, but it's not exactly fast.
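
To put a number on "not exactly fast", the figures above can be turned into a rough cycles-per-pixel budget. This is back-of-the-envelope arithmetic only: it assumes "3 GHz" means exactly 3e9 cycles per second on one core and that every pixel of the 600x600 image is produced each frame.

```python
CLOCK_HZ = 3_000_000_000   # "3 GHz", taken as exactly 3e9 cycles per second
FRAMES_PER_SECOND = 66
WIDTH = HEIGHT = 600

pixels_per_second = FRAMES_PER_SECOND * WIDTH * HEIGHT
cycles_per_pixel = CLOCK_HZ / pixels_per_second

print(f"{pixels_per_second:,} pixels/s, about {cycles_per_pixel:.0f} cycles per pixel")
```

That works out to roughly 126 clock cycles spent per decoded pixel under those assumptions.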

The big point here for the author is "it's fast enough". Lots of micro- (and macro-) optimizations are done when it turns out they aren't needed. And writing in a high level language you're comfortable in is important, if it'll do the job. This is a good point.

On the other hand, even a fairly naive implementation in something like C or C++ (and perhaps Java) would probably have achieved the goal without having to make 5 optimization passes (and noticeable time examining behavior).

And even today, optimizations often do matter. I'm working on code that does pretty hard-real-time processing on multiple threads and keeps them synchronized while communicating with the outside world. A mis-chosen image filter or copy algorithm can seriously trash the rest of the system (not overlapping DMA's, inconvenient ordering of operations, etc). The biggest trick is knowing _where_ they will matter, and generally writing not-horrible-performance (but very readable) code as a matter of course as a starting point.

Disclaimer: I was a hard-core ASM & C programmer who for years beta-tested 680x0 compilers by critiquing their optimizers.

Re:Performance is relative by Wocko · 2004-05-05 17:54 · Score: 2

Jesus, it's a 4-digit UID post-off!

Depends on your target by KalvinB · 2004-05-05 17:05 · Score: 4, Insightful

Working on a heavily math based application, speed is necessary to the point that the user is not expected to wait a significant amount of time without something happening. I have a large background in game programming working on crap systems and it comes in handy. My tolerance for delays goes to about half a second for a complete operation. It doesn't matter how many steps are needed to perform the operation, it just all has to be done in less than half a second on a 1200 MHz system. My main test of performance is seeing how long it takes for Mathematica to spit out an answer compared to my program. Mathematica brags about being the fastest and most accurate around.

When operations take several seconds a user gets annoyed. The program is perceived to be junk and the user begins looking for something else that can do the job faster. It doesn't matter if productivity is actually enhanced. It just matters that it's perceived to be enhanced or that the potential is there.

You also have to consider if the time taken to complete an operation is just because of laziness. If you can easily make it faster, there's little excuse not to.

For distributed apps you have to consider the cost of hardware. It may cost several hours of labor to optimize but it may save you the cost of a system or few.

In the world of games half a second per operation works out to 2 frames per second which is far from acceptable. Users expect at minimum 30 frames per second. It's up to the developer to decide what's the lowest system they'll try to get that target on.

You have to consider the number of users that will have that system vs the amount it will cost to optimize the code that far.

In terms of games you also have to consider that time wasted is time possibly better spent making the graphics look better. You could have an unoptimized mesh rendering routine, or a very fast one and time left over to apply all the latest bells and whistles the graphics card has to offer.

There are countless factors in determining when something is optimized enough. Games more so than apps. Sometimes you just need to get it out the door and say "it's good enough."

Ben

--
Work Safe Porn

But in some cases performance counts by jbms · 2004-05-05 17:07 · Score: 2, Interesting

As another user commented, server software can benefit greatly from a large variety of optimizations, since better performance translates directly into supporting more users on fewer/cheaper servers.

Optimizations also have significant effect in software designed to perform complex computations, such as scheduling.

Also, the trend of ignoring performance considerations with the claim that modern hardware makes optimizations obsolete is precisely what leads to the trend, particularly among Microsoft software, for the software to become significantly slower with each revision.

Article puts it all in perspective by Debian+Troll's+Best · 2004-05-05 17:09 · Score: 4, Funny

I'm currently completing a degree in computer science, with only a few courses left to take before I graduate. Man, I wish I had read that article before last semester's 'Systems Programming and Optimization' course! It really puts a lot of things into perspective. So much of a programmer's time can get caught up in agonizing over low-level optimization. Worse than that are the weeks spent debating language and design choices with fellow programmers in a team. More often than not, these arguments boil down to personal biases against one language or another, due to perceived 'slowness', rather than issues such as 'will this language allow better design and maintenance of the code', or 'is a little slow actually fast enough'?

A particular illustration of this was in my last semester's 'Systems Programming and Optimization' course. The professor set us a project where we could choose an interesting subsystem of a Linux distro, analyze the code, and point out possible areas where it could be further optimized. I'm a pretty enthusiastic Debian user, so I chose to analyze the apt-get code. Our prof was very focused on low-level optimizations, so the first thing I did was to pull apart apt-get's Perl codebase and start to recode sections of it in C. At a mid-semester meeting, the professor suggested that I take it even further, and try using some SIMD/MMX calls in x86 assembly to parallelize package load calls.

This was a big ask, but me and my partner eventually had something working after a couple of weeks of slog. By this stage, apt-get was *flying* along. The final step of the optimization was to convert the package database to a binary format, using a series of 'keys' encoded in a type of database, or 'registry'. This sped up apt-get a further 25%, as calls to a machine-readable-only binary registry are technically superior to old fashioned text files (and XML was considered too slow)

Anyway, the sting in the tail (and I believe this is what the article highlights) was that upon submission of our project, we discovered that our professor had been admitted to hospital to have some kidney stones removed. In his place was another member of the faculty...but this time, a strong Gentoo supporter! He spent about 5 minutes reading over our hand-coded x86 assembly version of apt-get, and simply said "Nice work guys, but what I really want to see is this extended to include support for Gentoo's 'emerge' system...and for the code to run on my PowerMac 7600 Gentoo PPC box. You have a week's extension."

Needless to say, we were both freaking out. Because we had focused so heavily on optimization, we had sacrificed a lot of genericity in the code (otherwise we could have just coded up 'emerge' support as a plug-in for 'apt-get'), and also we had tied it to Intel x86 code. In the end we were both so burnt out that I slept for 2 days straight, and ended up writing the 'emerge' port in AppleScript in about 45 minutes. I told the new prof to just run it through MacOnLinux, which needless to say, he wasn't impressed with. I think it was because he had destroyed his old Mac OS 8 partition to turn it into a Gentoo swap partition. Anyway, me and my partner both ended up getting a C- for the course.

Let this be a lesson...read the article, and take it in. Optimization shouldn't be your sole focus. As Knuth once said, "premature optimisation is the root of all evil". Indeed Donald, indeed. Kind of ironic that Donald was the original professor in this story. I don't think he takes his work as seriously as he once did.

Re:Article puts it all in perspective by Xoro · 2004-05-05 17:26 · Score: 2

Oh, come on. Who modded this up? Funny, I could see, but "Interesting"?

The final step of the optimization was to convert the package database to a binary format, using a series of 'keys' encoded in a type of database, or 'registry'.

It's a joke.

--
Kill, Tux, kill!
Re:Article puts it all in perspective by joib · 2004-05-05 17:48 · Score: 2, Informative

WTF are you talking about?

I'm staring at the apt codebase on my screen just now, and it's all C++, baby. Ok, so there is a trivial amount of perl; sloccount summary:

Totals grouped by language (dominant language first):
cpp: 26481 (89.75%)
sh: 2816 (9.54%)
perl: 209 (0.71%)

This is for apt-0.5.14, but I can't imagine that the newest version in unstable (0.5.24) would be that different.

Now, if the rest of your story is true, that's mind-boggling. If the new teacher refused to judge your, from your description very fine, work just because he has a serious hard-on for gentoo, I seriously believe you should have taken it up with the dean of the faculty instead of just swallowing it and later complaining on /..

That being said, why choose apt in the first place? Now, I haven't profiled apt, but I guess it spends the majority of time waiting on network i/o or waiting for dpkg to finish anyway.
Re:Article puts it all in perspective by defile · 2004-05-05 18:10 · Score: 2, Interesting

How is this fair? He completely and utterly changed the entire assignment on you forcing you to throw all of your work away. And gave you one week for it!?

apt-get and emerge are two totally different implementations of the same idea. Changing the environment on you may have taught you a lesson about how optimizing eliminates robustness, but if the last professor encouraged you to try MMX/SIMD instructions then you were totally right to tie yourself to the x86.

I would've kicked that moron's ass.

If feature X were important, we'd code in Y by wintermute42 · 2004-05-05 17:13 · Score: 2, Offtopic

The economist Brian Arthur is one of the proponents of the theory of path dependence. In path dependence something is adopted for reasons that might be determined by chance (e.g., the adoption of MS/DOS) or by some related feature (C became popular in part because of UNIX's popularity).

The widespread use of C and C++, languages without bounds checking in a world where we can afford bounds checking, is not so much a matter of logical decision as history. C became popular, C++ evolved from C and provided some really useful features (objects, expressed as classes). Once C++ started to catch on, people used C++ because others used it and an infrastructure developed (e.g., compilers, libraries, books). In short, the use of C++ is, to a degree, a result of path dependence. Once path dependent characteristics start to appear, choices are not necessarily made on technical virtue. In fact, one could probably say that the times when we make purely rational, engineering based decisions (feature X is important so I'll use language Y) are outweighed by the times when we decide on other criteria (my boss says we're gonna use language Z).

optimize with discretion by kaan · 2004-05-05 17:15 · Score: 2, Insightful

All projects are an exercise in scheduling, and something is always bound to fall off the radar given the real-world time constraints. In my experience, the right thing to do is get a few smart people together to isolate any problem areas of the product, and try to determine whether that code might produce performance bottlenecks in high-demand situations. If you find any warning areas, throw your limited resources there. Don't fret too much about the rest of the product.

In the business world, you have to satisfy market demands and thus cannot take an endless amount of time to produce a highly optimized product. However, unless you are Microsoft, it is very difficult to succeed by quickly shoving a slow pile of crap out the door and calling it "version 1".

So where do you optimize? Where do you concentrate your limited amount of time before you miss the window of opportunity for your product?

I know plenty of folks in academia who would scoff at what I'm about to say, but I'll say it anyway... just because something could be faster, doesn't mean it has to be. If you could spend X hours or Y days tweaking a piece of code to run faster, would it be worth it? Not necessarily. It depends on several things, and there's no really good formula; each case ought to be evaluated individually. For instance, if you're talking about a nightly maintenance task that runs between 2am and 4am when nobody is on the system, resource consumption doesn't matter, etc., then why bother making it run faster? If you have an answer, then good for you, but maybe you don't and should thus leave that 2 hour maintenance task alone and spend your time doing something else.

For people who are really into performance optimization, I say get into hardware design or academia, because the rest of the business world doesn't really seem to make time for "doing things right" (just an observation, not my opinion).

One thing new programmers often miss by xant · 2004-05-05 17:16 · Score: 2, Insightful

Less code is faster than more code! Simply put, it's easier to optimize if you can understand it, and it's easier to understand if there's not so much of it. But when you optimize code that didn't really need it, you usually add more code; more code leads to confusion and confusion leads to performance problems. THAT is the highly-counterintuitive reason premature optimization is bad: It's not because it makes your code harder to maintain, but because it makes your code slower.

In a high-level interpreted language with nice syntax--mine is Python, not Erlang, but same arguments apply--it's easier to write clean, lean code. So high-level languages lead to (c)leaner code, which is faster code. I often find that choosing the right approach, and implementing it in an elegant way, I get performance far better than I was expecting. And if what I was expecting would have been "fast enough", I'm done -- without optimizing.

--
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.

Re:One thing new programmers often miss by AaronW · 2004-05-05 18:52 · Score: 4, Interesting

Less code does not equal faster code. You can usually get the best performance increases by using a better algorithm. For example, if you're doing a lot of random adding and deleting of entries, a hash table will be much faster than a linked list. This will have the greatest impact.
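
A small illustration of the point in Python, counting comparisons rather than timing anything. The `Key` wrapper is a made-up helper, and a Python `list` stands in for any container that has to be scanned linearly (it is an array, not a true linked list, but deletion by value costs a scan either way).

```python
class Key:
    """Hypothetical wrapper that counts equality comparisons during lookups."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Key.comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


N = 2000
keys = [Key(i) for i in range(N)]
as_list = list(keys)   # linear-scan container
as_set = set(keys)     # hash-based container

Key.comparisons = 0
as_list.remove(Key(N - 1))   # scans from the front until it finds a match
list_cost = Key.comparisons

Key.comparisons = 0
as_set.discard(Key(N - 1))   # hashes straight to the right bucket
set_cost = Key.comparisons

print(f"list: {list_cost} comparisons, set: {set_cost} comparisons")
```

Deleting the last entry costs the list on the order of N comparisons and the hash-based set only a handful, which is where the "greatest impact" comes from as N grows.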

Other things that can help are doing your own memory management at times (i.e. freelists) since that will be faster than malloc/new, and will have less memory overhead. Also, design your storage to your data. If you know you'll allocate up to 64K of an item, and the item is small, allocate an array of 64K of them and maintain a freelist. This will use a lot less memory than dynamically allocating each item and will result in better locality.
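
The freelist idea can be sketched as follows. This is Python, so it only shows the bookkeeping; the memory and locality benefits described above apply to a C or C++ version backed by a real array. The `Pool` class and its method names are invented for this example.

```python
class Pool:
    """Fixed-capacity pool: slots are preallocated, free slots tracked in a stack."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        # Stack of free indices, arranged so that pop() hands out index 0 first.
        self.free = list(range(capacity - 1, -1, -1))

    def alloc(self, value):
        if not self.free:
            raise MemoryError("pool exhausted")
        index = self.free.pop()      # O(1), no call into a general allocator
        self.slots[index] = value
        return index

    def release(self, index):
        # No double-free check here; a real pool would want one.
        self.slots[index] = None
        self.free.append(index)      # O(1), slot is reused by the next alloc


pool = Pool(4)
a = pool.alloc("a")
b = pool.alloc("b")
pool.release(a)
c = pool.alloc("c")   # reuses the slot that "a" gave back
```

Allocation and release are each a single push or pop, and a released slot is the first one handed out again.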

I write code in the embedded space, where memory usage and performance are both equally important. Usually getting the last ounce of performance out of the compiler doesn't make much difference.

A good real-world example is that I replaced the malloc code provided by a popular embedded OS with DLMalloc, on which glibc's malloc is based. The dlmalloc code is *much* more complicated, and the code path is much longer, but due to much better algorithms, operations that took an hour with the old simple malloc dropped down to 3 minutes. It went from exponential to linear time.

-Aaron

--
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.

Why aren't optimized algorithms best practices? by ObviousGuy · 2004-05-05 17:17 · Score: 3, Interesting

You would think that with all the years put into developing computer languages, as well as the decades of software engineering, that these algorithms and techniques would make their way into best practices.

This, of course, has already begun with many frequently used algorithms like sorting or hashing being made part of the language core libraries, but more than that, it seems that duplicating effort occurs much more often than simply that.

This is one instance where Microsoft has really come through. Their COM architecture allows for inter-language reuse of library code. By releasing a library which is binary compatible across different languages, as well as backwards compatible with itself (v2.0 supports v1.9), the COM object architecture takes much of the weight of programming difficult and repetitive tasks out of the hands of programmers and into the hands of library maintainers.

This kind of separation of job function allows library programmers the luxury of focusing on optimizing the library. It also allows the client programmer the luxury of ignoring that optimization and focusing on improving the speed and stability of his own program by improving the general structure of the system rather than the low level mundanities.

Large libraries like Java's and .Net's as well as Smalltalk's are all great. Taking the power of those libraries and making them usable across different languages, even making them scriptable would bring the speed optimizations in those libraries available to everyone.

--
I have been pwned because my /. password was too easy to guess.

Code tweaking by Frequency+Domain · 2004-05-05 17:20 · Score: 5, Insightful

You get way more mileage out of choosing an appropriate algorithm, e.g., an O(n log n) sort instead of O(n^2), than out of tweaking the code. Hmmm, kind of reminds me of the discussion about math in CS programs.
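
For a feel of how large that gap is, here is a sketch in Python that counts comparisons for an O(n^2) insertion sort and an O(n log n) merge sort on the same reversed input. Both are textbook implementations written for this example, not code from the article.

```python
def insertion_sort(items):
    """O(n^2) sort; returns (sorted_list, comparison_count)."""
    data = list(items)
    comparisons = 0
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if data[j] <= current:
                break
            data[j + 1] = data[j]   # shift the larger element right
            j -= 1
        data[j + 1] = current
    return data, comparisons


def merge_sort(items):
    """O(n log n) sort; returns (sorted_list, comparison_count)."""
    data = list(items)
    if len(data) <= 1:
        return data, 0
    mid = len(data) // 2
    left, left_count = merge_sort(data[:mid])
    right, right_count = merge_sort(data[mid:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


worst_case = list(range(1000, 0, -1))   # reversed input, worst case for insertion sort
slow_sorted, slow_count = insertion_sort(worst_case)
fast_sorted, fast_count = merge_sort(worst_case)
print(slow_count, fast_count)
```

On 1000 reversed items the insertion sort makes 499,500 comparisons while the merge sort stays under 10,000, and no amount of tweaking the inner loop of the first closes that kind of gap.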

Every time I'm tempted to start micro-optimizing, I remind myself of the following three simple rules:

1) Don't.
2) If you feel tempted to violate rule 1, at least wait until you've finished writing the program.
3) Non-trivial programs are never finished.

Painful P-ful Post by Dominic_Mazzoni · 2004-05-05 17:24 · Score: 4, Funny

Proper programming perspective? Please. People-centered programming? Pretty pathetic.

Programmer's purpose: problem-solving. Programmers prefer power - parallelizing, profiling, pushing pixels. Programmers prefer Pentium PCs - parsimonious processing power. Pentium-optimization passes Python's popularity.

Ponder.

[Previous painful posts: P, D]

Optimizations in the Real World by NegativeK · 2004-05-05 17:27 · Score: 2, Informative

Optimization isn't really a hard topic. Should a programmer spend days nitpicking fifty lines of code that won't be used frequently? No. When initially writing code, should someone use Bogosort instead of Quicksort? I'll let you figure that one out.
My biggest (reasonable) beef in the optimization area is software bloat. Programs like huge office suites containing excessive, poorly implemented crap that people won't use really tick me off. KISS. Even the stuff that has to be complicated.

Of course, I'll always be a sucker for tweaking code for the fun of it, when I have the time. =)

--
This statement is false.

Optimizations are a varied lot by corngrower · 2004-05-05 17:27 · Score: 2, Interesting

Often times to get improved performance you need to examine the algorithms used. At other times, and on certain cpu architectures, things that slow your code can be very subtle.

If your code must process a large amount of data, look for ways of designing your program so that you serially process the data. Don't try to bring large amounts of data from a database or data file all at once if you don't have to. Once you are no longer able to contain the data in physical memory, and the program starts using 'virtual' memory, things slow down real fast. I've seen architects forget about this, which is why I'm writing this reminder.
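
A minimal sketch of serial processing in Python: the function consumes one record at a time from any iterable, so only the current line and the running totals need to be in memory. `running_total` is a made-up name, and `io.StringIO` merely stands in for a large file or a database cursor.

```python
import io


def running_total(lines):
    """Consume an iterable of numeric text lines one at a time."""
    total = 0
    count = 0
    for line in lines:          # only the current line is held at once
        line = line.strip()
        if line:
            total += int(line)
            count += 1
    return total, count


# Stand-in for a big input; a real file object or cursor iterates the same way.
fake_file = io.StringIO("\n".join(str(n) for n in range(1, 100_001)))
total, count = running_total(fake_file)
```

The alternative, reading everything into a list first and then summing, gives the same answer but its memory use grows with the size of the input.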

On the other hand I've worked on a C++ project where, in a certain segment of the code, it was necessary to write our own container class to replace one of the std:: classes, for performance on the SPARC architecture. Using the std:: container would cause the subroutines to nest deeply enough that the cpu registers needed to be written out to slower memory. The effect was enough to be quite noticeable in the app.

With today's processors, to optimize for speed, you have to think about memory utilization, since running within cache is noticeably faster than from main memory. Things are not as clear cut, so far as speed optimization goes, as they once were.

Re:You don't optimize, that's the job of the compi by techno-vampire · 2004-05-05 17:31 · Score: 4, Insightful

If you write clear and simple code the compiler or interpreter does all the other work.

I remember looking over something once that was clear, simple and very slow. It was a set of at least twenty if statements, testing the input and setting a variable. The input was tested against values in numeric order, and the variable was set the same way. Not even else if's so that the code had to go through every statement no matter the value. I re-wrote it to a single if, testing to see if the input were in the appropriate range and calculating the variable's value. No compiler is going to do that. Brute force can be clear, simple and slow.
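A rough Python sketch of the kind of rewrite described above. The original code isn't shown, so the mapping here (inputs 1 through 5 map to 101 through 105, anything else to 0) and both function names are made up for illustration, and the chain is cut from twenty tests to five.

```python
def code_for_slow(value):
    """Chain of independent ifs: every test runs whatever the input is."""
    result = 0
    if value == 1:
        result = 101
    if value == 2:
        result = 102
    if value == 3:
        result = 103
    if value == 4:
        result = 104
    if value == 5:
        result = 105
    return result


def code_for_fast(value):
    """One range test, then compute the result directly."""
    if 1 <= value <= 5:
        return 100 + value
    return 0
```

The second version only works because the outputs follow a pattern in the inputs, which is knowledge about the data that the programmer has and the compiler generally does not.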

--
Good, inexpensive web hosting

Performance, an aspect of design and understanding by StevenMaurer · 2004-05-05 17:33 · Score: 2, Insightful

This article seems to be something that I learned twenty years ago... performance is an aspect of good design.

That is why I insist on "optimization" in the beginning. Not peephole optimization - but design optimization. Designs (or "patterns" in the latest terminology) that are fast are also naturally simple. And simple - while hard to come up with initially - is easy to understand.

But that's also why I discount any "high level language is easier" statement, like this fellow makes. It is significantly harder to come up with a good architecture than learning to handle a "hard" language. If you can't do the former (including understanding the concepts of resource allocation, threads, and other basic concepts), you certainly aren't going to do the latter. Visual Basic is not an inherently bad language because you can't program well in it. It just attracts bad programmers.

And that goes the same for many of the newer "Basics": these "managed languages" that make it so that people can "code" without really understanding what they're doing. Sure, you can get lines of code that way. But you don't get a good product.

And then the whole thing falls apart.

Bad performance is built in. by BigZaphod · 2004-05-05 17:35 · Score: 2, Insightful

There seem to be two basic causes of bad performance:

1. Mathematically impossible to do it any other way.
2. Modularity.

Of course crap code/logic also counts, but it can be rewritten.

The problem with modularity is that it forces us to break certain functions down at arbitrary points. This is handy for reusing code, of course, and it saves us a lot of work. It's the main reason we can build the huge systems we build today. However, it comes with a price.

While I don't really know how to solve this practically, it could be solved by writing code that never ever calls other code. In other words, the entire program would be custom-written from beginning to end for this one purpose. Sort of like a novel which tells one complete story and is one unified and self-contained package.

Programs are actually written more like chapters in the mother of all choose-your-own-adventure books. Trying to run the program causes an insane amount of page flipping for the computer (metaphorically and actually :-))

Of course this approach is much more flexible and allows us to build off of the massive code that came before us, but it is also not a very efficient way to think about things.

Personally, I think the languages are still the problem because of where they draw the line for abstractions. It limits you to thinking within very small boxes and forces you to express yourself in limited ways. In other words, your painting can be as big as you want, but you only get one color (a single return value in many languages). It is like we're still stuck at the Model T stage of language development--it comes in any color you want as long as it's black!

--
Hexy - a strategy game for iPhone/iPod Touch

Re:Bad performance is built in. by Lord+Kano · 2004-05-05 18:11 · Score: 2, Funny

Sort of like a novel which tells one complete story and is one unified and self-contained package.

In other words, "a book".

LK

--
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano

Followed your link by ccoakley · 2004-05-05 17:35 · Score: 3, Insightful

1. Is that your sig or is that part of your comment? If it is part of your comment, please explain why it would give me a whole new view on performance. If it's your sig, then spooky how it was related to the topic.

2. Assuming your stuff is good, when are you going to code up SHA-1 (*MY* favorite hash)?

3. On the server side of things, I would argue that correctness is more important than otherwise. If an app crashes 1 in 100 times for a desktop user, the developer blames windows and the user is satisfied (don't flame me on this, please). On the server, if the app crashes 1 in 100 times, it may bring down the transactions for 100s of users, making things very bad for the developer. For non-crash correctness problems, consider a problem which makes a minor, but cumulative error in subsequent runs. That would likely be disastrous for the server situation.

As far as clarity, find me one developer who has taken over a project and not complained about the quality of the inherited code ever. Seriously. (that's not directed at parent)

--
Network Security: It always comes down to a big guy with a gun.

Re:Followed your link by BlackHawk-666 · 2004-05-05 20:16 · Score: 5, Insightful

When coding anything, most of the code should be done to be fast
This may be true when you are producing libraries of math routines and similar stuff like you are doing. It doesn't hold an ounce of water when you do the sort of work I do. My projects are generally medium sized, mixed languages, developers of all different skill levels. Code clarity is far more important for 98% of the stuff we do. I need my juniors to be able to follow the code the seniors write, even if they can't write it themselves. The other 2% of the time it's fine to sacrifice clarity for speed to get the performance to an acceptable level on the target platform.
I have generally found that clear code is usually good code, so long as you are aware of the cost implications of your design decisions. For instance, I seem to recall the bubble sort (mentioned earlier) was actually faster than a qsort under some circumstances. Deep data knowledge would help you to make the decision as to which would need to be used...don't just reach for that qsort, it may be the fastest under most cases, but not all.

--
All those moments will be lost in time, like tears in rain.
Re:Followed your link by BlackHawk-666 · 2004-05-06 00:21 · Score: 3, Informative

I should have qualified my statement a little better, and I suspect qsort vs bubblesort is not the best illustration possible. Each sort algorithm has strengths and weaknesses e.g. easy to implement but slow to run, ruthlessly difficult to code but fast as hell, good at sorting random data but worst case scenario on near sorted data. If qsort were always faster than every other algorithm then we wouldn't still be talking about them. QSort is generally faster than most other sorts, and the O(n log n) is an average sort cost, not a guarantee.
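To make the "average, not a guarantee" point concrete, here is a small Python sketch that counts comparisons on already-sorted input. Both functions are illustrative assumptions: a deliberately naive quicksort that always picks the first element as pivot, and a bubble sort with an early exit. Library sort routines usually choose pivots more carefully.

```python
def quicksort_comparisons(items):
    """Count pivot comparisons made by a naive quicksort (first element as pivot).

    A partition pass compares each remaining element to the pivot once,
    so each call is charged len(rest) comparisons.
    """
    if len(items) <= 1:
        return 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return (len(rest)
            + quicksort_comparisons(smaller)
            + quicksort_comparisons(larger))


def bubble_sort_comparisons(items):
    """Count comparisons made by bubble sort that stops after a swap-free pass."""
    data = list(items)
    comparisons = 0
    n = len(data)
    while True:
        swapped = False
        for i in range(n - 1):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:
            return comparisons
        n -= 1                  # the largest element is now in place


already_sorted = list(range(100))
quick = quicksort_comparisons(already_sorted)     # 99 + 98 + ... + 1 = 4950
bubble = bubble_sort_comparisons(already_sorted)  # one pass: 99
```

On this input the naive quicksort degrades to its quadratic worst case while the bubble sort finishes in a single pass. On shuffled data the ranking reverses.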
When I was in uni our lecturer gave us an example from the QU campus where he used to lecture. There was a computer (remember, this is back in the eighties) that needed to sort rather a lot of data and it took three days to do it with the qsort algorithm. The main problem was, I believe, due to memory restrictions i.e. all the data could not fit into memory at once. It was recoded to use a different algorithm, one that could work from disk and in small chunks, and ran orders of magnitude faster. The recoded algorithm was theoretically slower, but faster in actuality due to the nature of the data and the machine it had to run on.
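The story doesn't say which algorithm replaced the qsort. One common approach that works from disk in small chunks is an external merge sort, sketched below in Python as an assumption rather than a reconstruction. The name `external_sort` and the `chunk_size` parameter are hypothetical.

```python
import heapq
import tempfile


def external_sort(numbers, chunk_size):
    """Yield the ints from `numbers` in sorted order.

    At most `chunk_size` values are sorted in memory at once; each
    sorted run is spilled to a temporary file and the runs are then
    merged lazily.
    """
    run_files = []
    chunk = []

    def flush():
        # Sort the current chunk in memory and write it out as one run.
        chunk.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{n}\n" for n in chunk)
        f.seek(0)
        run_files.append(f)
        chunk.clear()

    for n in numbers:
        chunk.append(n)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    # heapq.merge pulls from each run on demand, so only one pending
    # value per run needs to be in memory during the merge.
    runs = [(int(line) for line in f) for f in run_files]
    try:
        yield from heapq.merge(*runs)
    finally:
        for f in run_files:
            f.close()


ordered = list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))
```

The total work is still O(n log n), but the disk access is sequential instead of random, which is where a memory-starved machine loses most of its time.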

--
All those moments will be lost in time, like tears in rain.

A few points that come to mind... by ivec · 2004-05-05 17:35 · Score: 2, Interesting

- Decoding a RLE data buffer is short of impressive as a benchmark. RLE was designed as a simple and specific (generally inefficient) compression approach for age-old hardware (i.e. 8MHz, not 333MHz as the base system used here).
How about JPEG or PNG ?

- The author actually spent several iterations optimizing this Erlang code. And these optimizations required handling special cases. (So performance eventually did matter to the author?) Now, would a 'first throw' implementation in C/C++ have been written faster while immediately performing better than the Erlang version? (simpler code)

- I agree that the compiled/interpreted code performance matters less and less, because processors are so much more powerful. For instance, the processing for RLE decompression should in any case be negligible wrt the memory or disk i/o involved.
What is becoming increasingly important, however, is the data structures and algorithms that are used. In this perspective, C++ still shines, thanks to the flexibility that its algorithms and containers library provides.
C++ offers both a high level of abstraction (working with containers), and provides the ability to convert to a different implementation strategy with ease - if and when profiling demonstrates a need.
For large system and library development, the strong static typing of C++ is also a real plus (it doesn't matter to me whether it is faster than dynamic typing or not).

I totally agree that performance should not be a concern during program implementation (other than avoiding 'unnecessary pessimization', which involves the KISS principle and knowledge of language idioms). Optimization should only be performed where the need for a speed-up has been demonstrated.
Other than saying "wow this interpreted language runs damn fast on current hardware", this article does a poor job at making any relevant point.

radix omnia malorum prematurae optimisatia est -- Donald Knuth

Performance is IMPORTANT by Jason+Pollock · 2004-05-05 17:38 · Score: 4, Informative

I am hearing a lot of people saying that you shouldn't optimise prior to the first release. However, it is very easy to select a design or architecture that limits your high end performance limit. Therefore, there is some optimisation that needs to be done early.

When you're architecting a system that is going to take tens of man years of effort to implement, you need to ensure that your system will scale.

For example, a project I recently worked on hit a performance wall. We had left optimisation for later, always believing that it shouldn't be done until last. However, while the architecture chosen was incredibly nice and clear, it limited the performance to 1/3 of what was required. Back to the drawing board, we just doubled the project cost - ouch.

Even worse, there are performance differences on each platform! For example, did you know that throwing an exception is 10,000 times slower than a return statement in HP/UX 11? Solaris is only a little better at 2 orders of magnitude. Linux is (I understand) a dead heat.

So, while low-level optimisation of statements is silly early in the project, you do need to ensure that the architecture you choose is going to meet your performance requirements. Some optimisations are definitely necessary early in the project.

The article also talks about tool selection, suggesting that the extra CPU could be better used to support higher level languages like Erlang. If a system has CPU to spare, I agree, use what you can. The projects I work on always seem to lack in CPU cycles, disk write speed, and network speed. You name it, we're short of it. In fact, a large part of our marketing strategy is that we are able to deliver high performance on low end systems. What would happen to us if we dropped that edge? We're working with a company that has implemented a real-time billing system in Perl. Not a problem, until you try and send it 500 transactions/second. Their hardware budget? Millions to our 10s of thousands. Who do you think the customer likes more?

Jason Pollock

Throw hardware at it. by aiyo · 2004-05-05 17:39 · Score: 2, Interesting

My software engineering prof. believes that optimization should never be done during a project. Instead he thinks the programmer should wait until the project is complete then give careful consideration as to whether to optimize or not. He says most problems can be fixed by upgrading to better hardware and hours of optimization are not worth 3-4k more in hardware costs. I thought he was crazy to preach this during lecture. What do you guys think? Would you spend a day designing a better algorithm or finish the project and buy faster hardware?

Re:Throw hardware at it. by defile · 2004-05-05 17:55 · Score: 3, Insightful

Well, that depends.

You probably picked the simplest, dumbest algorithm and probably used the most basic data structure. Why do all of the hard work when you don't even know if the easy work will suffice?

If they don't suffice, your options are to develop your own algorithm/find a better one and a more natural data structure, or to throw hardware at it. Chances are, you won't be lucky enough that you can just upgrade so you'll have to spend valuable programmer time implementing a more complex algorithm that will need more careful maintenance that is likely to have more bugs that is probably less robust. You'll probably have to convert the data to a more machine-friendly format. Maybe you'll have to inconvenience the user or ship a lot of precompiled data. Whatever.

It's rare that the easy algorithm is slow enough that it won't do as-is, but fast enough that doubling cpu power makes it tolerable. Usually there are orders of magnitude differences between the "best" algorithm and the easy algorithm, and only incremental speed bumps in computer offerings.

On the other hand, maybe with an extra GB of RAM you'll never have to touch swap. Maybe that's good enough. ;)
Re:Throw hardware at it. by SatanicPuppy · 2004-05-05 18:09 · Score: 4, Interesting

Depends on how bad it is. I've seen stuff that runs so slow there really isn't a way to throw more hardware at it. Of course that was written by a guy who had two goals: 1) to make sure no one but him could support his work, and 2) to do as little work as possible.

I don't know. Clean, elegant, functional code is beautiful. If you're ever going to have to work on it again, I think it's better for it to be clean and optimized.

Also depends on the size of the app. With a small app, what excuse do you have for not optimizing? Wouldn't take that long. With a big project? Depends on your work environment.

The bosses will never know if it's optimal or not. If you tell them you've maxed out the server, they just think you write big badass code. A lot of times though, there isn't time to thoroughly bug check a big app (That's what users are for, eh?), much less optimize it.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Re:Throw hardware at it. by BlackHawk-666 · 2004-05-05 20:29 · Score: 2, Informative

My charge out rate at my last company was 1000/day (about $1500USD) so I'm going to say yes to the optimisation in this case, because it is cheaper. If it were going to take me 3-4 days I'd say get the better hardware and try to keep the code as lean and as fast as possible without wasting too much time trying to wring it for performance.

--
All those moments will be lost in time, like tears in rain.
Re:Throw hardware at it. by EastCoastSurfer · 2004-05-06 00:36 · Score: 3, Insightful

I hesitate to first throw hardware at the problem, but I do agree that optimizations generally should be left as the last thing to do in a project. Code should be written first to be readable and correct. Once those goals have been met, testing and profiling will find the few areas that are critical and may need some optimization.

The problem your prof is probably trying to get you to avoid is wasting time tuning code that rarely gets executed. It comes down to the old 80/20 rule. Sure, you can spend weeks hand tuning some import routine, but all your time was wasted if that import is only run once a month, at night while the system is offline.
Re:Throw hardware at it. by arkanes · 2004-05-06 02:03 · Score: 3, Insightful

The most important reason to wait is because, almost inevitably, the part that you THINK is slow is not the part that actually hangs you up. You may spend an extra couple days working on your super-fast optimized sort & data structure only to find when you deploy that your bottleneck is RAM usage and all your clever caching is just slowing stuff down. Another good example is earlier in this thread, with the super-fast optimized MD5 libraries - spending money or time writing/buying those libraries if your data set is IO bound doesn't make much sense.
Optimization is great, but profiling to make sure that your optimization isn't wasted is more important.
Re:Throw hardware at it. by An+Onerous+Coward · 2004-05-06 03:01 · Score: 3, Insightful

Also depends on the size of the app. With a small app, what excuse do you have for not optimizing? Wouldn't take that long. With a big project? Depends on your work environment.
The level of optimization needed for small projects varies wildly. If it's a one-shot deal to allow one secretary to generate a report twice a week, who really cares if it takes two seconds or twenty? Even if you assume that's thirty-six seconds every week, it's going to take years of use before it would have been worthwhile to optimize it.

On the other hand, if it's something that hundreds of people are going to be using four or five times a day, then it's probably worthwhile to do some algorithmic/data structure improvements.
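The trade-off can be put into rough numbers. The Python sketch below uses the figures from the report example (18 seconds saved per run, two runs a week). The one hour of optimization effort and the 200 users running something four times a day, five days a week, are assumed figures for illustration only.

```python
def weeks_to_break_even(seconds_saved_per_run, runs_per_week, optimisation_hours):
    """Weeks of use before the time saved equals the time spent optimizing."""
    saved_per_week = seconds_saved_per_run * runs_per_week
    return optimisation_hours * 3600 / saved_per_week


# One secretary, two runs a week, 20 s cut to 2 s, one hour spent optimizing.
report = weeks_to_break_even(seconds_saved_per_run=18,
                             runs_per_week=2,
                             optimisation_hours=1)   # 3600 / 36 = 100 weeks

# Assumed busy tool: 200 users, 4 runs a day, 5 days a week, same saving.
busy_tool = weeks_to_break_even(seconds_saved_per_run=18,
                                runs_per_week=200 * 4 * 5,
                                optimisation_hours=1)
```

With these assumptions the report takes about two years to pay back a single hour of tuning, while the busy tool pays it back within the first day of use.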

Finally, you get the extreme case: some library that will end up being used by millions. Those are the times when you want to eke out every bit of performance you can. The size of the project doesn't always determine its importance, nor does the importance of the project always determine how much optimization is needed.

--
You want the truthiness? You can't handle the truthiness!

right for the wrong reasons by epine · 2004-05-05 17:42 · Score: 4, Insightful

Sigh. One of the best sources of flamebait is being right for the wrong reasons.

Surely C++ must rate as the least well understood language of all time. The horrors of C++ are almost entirely syntactic, beginning with the decision to maintain compatibility with the C language type declaration syntax and then adding several layers of abstraction complexity (most notably namespaces and templates).

There are only two areas where I fear C++ for program correctness. The first is making a syntactic brain fart leading to an incorrect operator resolution or some such. These can be tedious to ferret out, but most of these battles are fought with the compiler long before a defect makes it into the production codebase.

My second source of fear concerns interactions of exception unwinding across mixtures of object oriented and generic components. I see this as the only case where managed memory provides a significant advantage: where your program must incorporate exception handling. If you can't manage your memory correctly in the absence of the exception handling mechanism, I really don't believe you can code anything else in your application correctly either. I think exceptions are mostly a salvation for poor code structure. If all your code constructs are properly guarded, you don't need an error return path. Once a statement fails to achieve a precondition for the code that follows, the code path that follows will become a very efficient "do nothing" exercise until control is returned to a higher layer by the normal return path, whereupon the higher layer of control can perform tests about whether the objectives were achieved or not and take appropriate measures. I think the stupidest optimization in all of programming is cutting a "quick up" error return path that skips the normal path of program execution so that the normal path of execution can play fast and loose with guard predicates.

The four languages I use regularly are C, C++, PHP, and Perl. Perl is the language I'm least fond of maintaining. Too many semantic edge cases that offer no compelling advantage to motivate remembering the quirk. C++ has many strange cases, but for C++ I can remember the vast majority of these well enough, because I've stopped to think about how they evolved from taking the C language as a starting point.

I happen to love PHP for the property of being the most forgettable of all languages. I forget everything I know about PHP after every program I write, and it never slows me down the next time I sit down to write another PHP program. The managed memory model of PHP appeals to me in a way that Java doesn't, because as an inherently session-oriented programming model, PHP has a good excuse for behaving this way.

I have a love/hate relationship with both C and C++. I write one program at a high level of abstraction in C++ and then when I return to C it feels like a breath of fresh air to live for a while in an abstraction free zone, until the first time I need to write a correctness safe string manipulation more complicated than a single sprintf, and then I scream in despair.

The part of my brain that writes correct code writes correct code equally easily in all of these languages, with Perl 5 slightly in the rear.

If I really really really want correct code I would always use C++. The genericity facilities of C++ create an entire dimension of correctness calculus with no analog in most other programming languages. The template type mechanism in C++ is a pure functional programming language just as hard core as Haskell, but because C++ is a multi-paradigm language, in C++ you only have to pull out the functional programming hammer for the slice of your problem where nothing less will do.

What I won't dispute is that C++ is a hard language to master to the level of proficiency where it becomes correctness friendly. It demands a certain degree of meticulous typing skills (not typing = for ==). It demands an unflagging determination to master the sometim

This isn't an article about optimization by the_skywise · 2004-05-05 17:45 · Score: 4, Insightful

So much as an attempt to "prove" that programming to the metal is no longer necessary or desirable. (IE "After all, if a C++ programmer was truly concerned with reliability above all else, would he still be using C++?" )

The analogy is all wrong. These days there are distinctly two types of "optimization". Algorithmic and the traditional "to the metal" style.

During college I worked with the English department training English students to use computers as their work had to be done on a computer. (This was before laptops were commonplace) The theory was that word processing allowed students a new window into language communication. To be able to quickly and painlessly reorganize phrases, sentences and paragraphs showed the students how context, clarity and meaning could change just by moving stuff around.

This is what the author has discovered. That by being able to move code actions around, he can experiment and "play" with the algorithm to boost speed while keeping error introduction to a minimum. (Ye olde basic anyone?)

He mistakenly equates this to "advanced technologies" like virtual machines and automatic memory buffer checking. In reality, we've just removed the "advanced technologies" from the process. (IE Like pointers, dynamic memory allocation, etc) (IE, ye olde basic anyone?)

There's nothing wrong with this. Though I am a C++ programmer by trade, I was far more productive when I was professionally programming Java. But that was because I had LESS creative control over the solution because of the language syntax. No passed in variable changing, no multiple inheritance, etc. So when I'm thinking of how to lay out the code, there's pretty much a limited way of how I'm going to go about doing that.

It's like the difference between having the Crayola box of 8 crayons and the mondo-uber box of 64. If you're going to color the green grass with the box of 8, you've got: Green. If you've got 64 colors, you're going to agonize over blue-green, green-blue, lime green, yellow-green, pine green and GREEN.

That doesn't make C++ less "safe" than Java. Sure, you can overwrite memory. But you can also create a Memory class in C++ ONCE which will monitor the overflow situation FOR you and never have to worry again.

But back to optimization:
66 fps seems really fast. But in game context it's still kind of meaningless. Here's why. You're not just displaying uncompressed images. You're also doing AI, physics, scoring, digital sound generation, dynamic music, user input, possibly networking. As a game programmer, you don't stop at 66 fps. Because if you can do 132 fps, then you can really do 66 fps and still have half of every second left over to do some smarter AI or pathfinding. Or if you get it up to 264 fps, then you can spend 1/4 of the cycle doing rendering, and maybe you can add true dynamic voice synthesis so you don't have to prerecord all your speech!
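The frame-budget arithmetic can be sketched in a few lines of Python. The function name `spare_fraction` is made up, and the model is a simplification that assumes rendering cost per frame is constant.

```python
def spare_fraction(max_fps, target_fps):
    """Fraction of each frame left for other work.

    `max_fps` is how fast the renderer could run flat out and
    `target_fps` is the rate the game actually displays at.
    """
    return 1 - target_fps / max_fps


at_66 = spare_fraction(66, 66)     # 0.0: rendering eats the whole frame
at_132 = spare_fraction(132, 66)   # 0.5: half of every frame is free
at_264 = spare_fraction(264, 66)   # 0.75: rendering takes a quarter
```

At a 66 fps target each frame lasts about 15 ms, so a renderer capable of 264 fps leaves roughly 11 ms per frame for AI, physics, sound and everything else.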

Ultimately, my point is this. (and I think this is what the author intended) You're going to get bugs in whatever language you write in. That's the nature of the beast. VM's and 4th generation languages take away the nitty gritty of programming while still providing a lot of performance power. And in a lot of cases, that's a good thing. But it's still nothing more than a "model" of what's really going on in the hardware. If you're really going to push the limits of the machine you have to be able to control all aspects of it. Now, it's getting harder to do that in Windows. We spend more time coding to the OS than the metal. But in the embedded systems category, and in console video game systems the metal still reigns and if you're going to develop a game that will push the hardware, you're going to need a programming language that will let you speak machine language. Not one that's going to protect you from yourself.

As it was in the beginning, as it always will be: Right tool for the right job.

Re:This isn't an article about optimization by Anonymous Coward · 2004-05-05 19:40 · Score: 2, Insightful

But it's still nothing more than a "model" of what's really going on in the hardware. If you're really going to push the limits of the machine you have to be able to control all aspects of it.

For instance, the famous 'Goto is considered harmful'.

In actual machine code, the processor's equivalent of 'goto' (usually called a 'jump') is one of the most common operations...

Another way of looking at this is antilock brakes in cars.

It's not so much that the 'new' way of doing things is really any better to a skilled user. But they sure help reduce headaches caused by a lack of skill on the part of a new and/or less talented person.

Depends on the design and the bottleneck by www.sorehands.com · 2004-05-05 17:46 · Score: 2, Insightful

What you say makes sense, but is completely wrong.

You have to consider the entire system design when looking at the best place to make the optimization. You need to look at what the bottleneck is and attack that, but keep in mind the issue in upgrading the system.

--
Fight Spammers!

Re:You don't optimize, that's the job of the compi by scot4875 · 2004-05-05 17:47 · Score: 2, Informative

An intelligent compiler (i.e. any modern compiler you'd be likely to use) will automatically __inline the fred::setQ function, and then the peephole optimizer will reduce it down to the equivalent of myFred.q = 10;

--Jeremy

--
Jesus was a liberal

Re:me too... by Lord+Kano · 2004-05-05 17:49 · Score: 5, Funny

One of my C++ instructors told us a story about one of his former coworkers who used to go through his programs and delete ALL whitespace from the code because he thought it would make the programs smaller and faster.

LK

--
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano

Re:You don't optimize, that's the job of the compi by Anonymous Coward · 2004-05-05 17:56 · Score: 3, Informative

I don't know why his example was so bad.

A good example would be how to detect if a king is in check in a chess program. There are a few different approaches. Some are fast, some are slow, and a compiler just cannot "optimize" a slow approach into a fast one. The function is called millions of times per second in a chess program, so you want it optimized.

When to optimize by lejordet · 2004-05-05 18:01 · Score: 2, Interesting

When I start on a program, I usually make "place holder" functions where necessary to get the program up. Sure, this will be slow, but at least I can get the program up and running quickly (the place holders usually do what they're supposed to in the most convenient-to-code way I could think of, or emulate their final functionality - for example by returning true all the time).

What this achieves for me, is that I can look at the program as a whole, and _then_ identify where the problem areas are - most likely not where I thought they were... Even if the first version takes 5 minutes to run (as my first attempt at a depth-first tree search did), it works passably, and is often easier to optimize than trying to optimize each function as I write it.

Might not work for everyone, but I like coding this way :)

--
Yes?

Fragility of the decoder by Animats · 2004-05-05 18:04 · Score: 3, Interesting

"I was also bothered--and I freely admit that I am probably one of the few people this would bother--by the fragility of my decoder."

And, sure enough, there's a known, exploitable buffer overflow in Microsoft's RLE image decoder.

uhh.. yeah by XO · 2004-05-05 18:10 · Score: 2, Interesting

ok, so this guy is saying that.. he found 5 or 6 ways to improve the performance of his program by attacking things in an entirely different fashion... ok..

back in the day, i discovered a really great trick... you might represent it as something like... :

boolean a;
a = 1 - a;

this is a zillion times more efficient than if(a == 1) a = 0; else a = 1;

it is also about the same as a ^= 1; if you were going to use bitwise functions.
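A quick Python check of the toggle idioms for a flag restricted to 0 or 1. The function names are hypothetical. Subtraction and XOR both flip the flag, whereas OR only ever sets it.

```python
def toggle_subtract(a):
    """Flip a 0/1 flag arithmetically."""
    return 1 - a


def toggle_xor(a):
    """Flip a 0/1 flag with bitwise exclusive-or."""
    return a ^ 1


def set_with_or(a):
    """Bitwise OR with 1 forces the low bit on; it never clears it."""
    return a | 1


flipped = [(toggle_subtract(a), toggle_xor(a)) for a in (0, 1)]  # [(1, 1), (0, 0)]
forced = [set_with_or(a) for a in (0, 1)]                        # [1, 1]
```

Whether either form beats the if/else in practice depends on the compiler, which may well emit the same instructions for all of them.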

OK. Great.

--
"Champagne for my real friends - and real pain for my sham friends!" http://ericblade.postalboard.com/

Re:uhh.. yeah by prockcore · 2004-05-05 18:56 · Score: 2, Informative

this is a zillion times more efficient than if(a == 1) a = 0; else a = 1;

This is the one time where I'll step up and say that VC actually does a few neat tricks for the ternary operator.
c=(a>b)?0:1 /* or c=!(a>b), it's the same code */
translates to
cmp b,a
sbb c,c
inc c
there are other variants of this, I'll leave it as an exercise to the reader to figure out what is going on.

Asymptotic performance by alanwj · 2004-05-05 18:29 · Score: 4, Insightful

The problem, in my opinion, is that people go about optimizing in the wrong place.

You can spend all day optimizing your code to never have a cache-miss, a branch misprediction, divisions, square roots, or any other "slow" things. But if you designed an O(n^2) algorithm, my non-optimized O(n) algorithm is still going to beat it (for sufficiently large n).

If the asymptotic performance of your algorithm is good, then the author is right, and you may not find it worth your time to worry about further optimizations. If the asymptotic performance of your algorithm is bad, you may quickly find that moving it to better hardware doesn't help you so much.
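As a small illustration of the asymptotic gap, here are two Python ways to answer the same question, "does this list contain a duplicate?". The task and function names are chosen for the example and do not come from the article. The linear version assumes the items are hashable.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: roughly n*(n-1)/2 comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """Remember what has been seen: one set lookup per item on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

For 10,000 distinct items the first version makes about 50 million comparisons and the second makes 10,000 set lookups. No amount of tuning the inner loop of the first closes that gap as n grows.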

Alan

Re:Asymptotic performance by Flyboy+Connor · 2004-05-05 19:58 · Score: 4, Insightful

To put it in different terms: Optimisation is in finding a good algorithm, not in tweaking code details.
To give a nice example: a colleague of mine worked on a program that took two months to execute (it consisted of finding the depth of all connections between all nodes in a graph containing 50,000 nodes). Since the customer needed to run this program once a month, this took far too long. So my colleague rewrote the whole program in assembly, which took him a few months, managing to reduce the required time to, indeed, one month.
My boss then asked me to take a look at it. Together with a mathematician I analysed the central function of the program, and we noticed that it was, basically, a matrix multiplication. We rewrote the program in Delphi in an hour or so, and reduced the required running time to less than an hour.
I won't spell out the lesson.
Re:Asymptotic performance by Anonymous Coward · 2004-05-05 20:49 · Score: 2, Interesting

(Posting as anonymous coward to protect the guilty)

When I started my Ph.D. work, I came into a project doing compiler stuff in functional languages. There was a home-brew lexer that my adviser had written, that did 2d "array" lookups by scanning all the way through a list of lists. We thought it was broken, as it never finished. I changed it to use real arrays, and got it down to taking a matter of minutes. Merely using a O(n^2) implementation rather than O(n^4) :) (Language and implementation still sucked, though)

Moral: It's the O-factors that will kill you. Optimizing anything but that is a waste of time until you have seen the profiling data. And even the O-factors are irrelevant in code that's not going to be executed often or that is outside of high-performance areas.

I still catch myself trying to avoid single instruction improvements in handling of, say, user dialog actions. But the user and window system together probably took over a second already to cause the action, so doing a bit of extra CPU work to be clearer or safer is The Right Thing.
Re:Asymptotic performance by lahi · 2004-05-05 21:15 · Score: 2, Insightful

I disagree: Finding a good algorithm (indeed, finding the *best* algorithm for a task), is merely good programming. (And *inventing* a good algorithm is *excellent* programming!) Implementing it in the best possible manner, including applying shortcuts which are known to be possible due to knowledge of the specific task to which the algorithm is applied, is optimising.

You might be a good optimiser, or you might just be a good programmer. Your colleague however, is a bad programmer.

-Lasse

I don't know about Erlang but... by BitwizeGHC · 2004-05-05 18:34 · Score: 2, Interesting

Some implementations of popular dynamic languages (e.g., LISP, Scheme), let you do some type inference and/or some explicit declarations, and will spit out machine code or C that will do the job that much faster. Tweak your algorithm in the slow version of the language and then produce a program that runs ten times faster with an optimizing compiler.

The Squeak VM is a great example of this. The whole thing is written in Squeak itself. Running a Smalltalk VM this way is painfully slow, but a Smalltalk->C translator generates the code that will be compiled and used as the actual, runtime VM (which can support a whole host of things, including raster and vector graphics, sound, MP3 audio and MPEG video!).

--
N4st0r, trixx0r h0bb1tz0rz! Th3y st0l3 0ur pr3c10uzz!

This guy is out on a limb by ibullard · 2004-05-05 18:47 · Score: 3, Informative

Quote:
Even traditional disclaimers such as "except for video games, which need to stay close to the machine level" usually don't hold water any more.

Yeah, as long as you write simple, 2D games(like the author of the essay does) that would be true. Complex, 3D games are another matter. I write games for a living and even if you're within sight of cutting edge you're writing at least some assembly and spending a lot of time optimizing C++.

Now I'm not knocking all he says or saying that good games need to be in C++ and assembly. Some games rely heavily on scripting languages to handle the game mechanics and world events. There's a lot less assembly code than there used to be. However, the core engine that handles graphics, physics, AI, and I/O is going to be written in C++ and assembly and will be for the foreseeable future.

If I published a game that required a 3Ghz computer to display 576x576 images at 66fps, I'd be laughed off the internet. A PS2 has a 300Mhz processor and needs to display a 512x448 image every 30-60 seconds.

Re:This guy is out on a limb by ibullard · 2004-05-05 18:55 · Score: 2, Insightful

That should read "a 512x448 image 30-60 TIMES a second."
I should have ended the post by typing "HEY, GRAMMER FREAKS! LOOK AT ME! I SUCK!" instead of writing that last sentence.
Re:This guy is out on a limb by prockcore · 2004-05-05 19:35 · Score: 3, Insightful

Yeah, as long as you write simple, 2D games(like the author of the essay does) that would be true.

Not only that, but even simple 2d games can need optimizing. Perhaps they need optimizing because they're on an inherently slow platform (like Flash or a cell phone), or perhaps they need optimizing because they're multiplayer (and games with bad network code are immediately obvious and usually fail miserably)

I find it strange that so many programmers here talk about things being "fast enough" or "not worth my time"... yet any article about mozilla, openoffice, windows, osx, damn near any software package with a gui is filled with complaints about slowness and bloat.

Makes you wonder what IS worth their time.
Re:This guy is out on a limb by julesh · 2004-05-05 23:25 · Score: 2, Insightful

Mozilla is slow because its GUI is written using a flexible script interpreter, rather than being hard coded in C++.

I don't know why OpenOffice is slow, I've never analysed the way it works in enough detail. I'm sure the reason is fairly obvious to anyone who knows the code base well enough to comment.

Windows isn't really slow, but has some annoying features that have been added recently that can slow you down; for instance in the user interface it will try to open files of certain types to display information about them whenever you perform any operations on them, which isn't exactly helpful if opening the file is too slow...

My only experience of using OSX is that it's blindingly fast. But I've used it for about 3 hours, so that's hardly conclusive.

But you see the pattern -- these systems are slowed down by features that are _necessarily_ slow. You couldn't have the same features without the performance problems they bring. Windows can't give you a preview of an image, or tell you how many files are in an archive, without opening it (although I _really_ wish it'd do it in another thread...). Mozilla can't support its really easy-to-write user interface extensions without an interpreted UI.

The people complaining are people who don't actually want these features, and don't see why they should suffer for them, which is a fair point.

http://www.javaperformancetuning.com/ by Kunta+Kinte · 2004-05-05 18:53 · Score: 2, Interesting

Maybe a little offtopic.

But if you haven't heard of it, http://www.javaperformancetuning.com/ is a good source of performance tips for Java.

--
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW

Quick, go get Fortran 95. by mbkennel · 2004-05-05 19:10 · Score: 3, Insightful

Do NOT convert to C++ under any circumstances!

Fortran 77 sucks.

But C++ sucks, in different ways.

Fortran 95 is a much better language than Fortran 77, and for many things, better than C++ as well.

It is practically a new language with an old name.

If you currently have a F77 code, it is almost certainly far better to start using Fortran 95.

Essentially all Fortran 95 implementations have compile and run-time checks which can make things as safe as Java, and when you take off the checks, things will run very fast. With the Intel Fortran 8.0, probably faster than anything other than hand-tuned assembly. You will probably whip GCC C++.

It is also quite doubtful you will get significantly better performance in C++.

No, I am not an old-fogey (I'm 35 now, programming since age 13). I learned Basic on the Apple II+, then Fortran 77 and C simultaneously when I got my first summer job. Then C++. Then Eiffel and Sather. Then Fortran 95.

Yes, indeed, fully knowing C++ I choose Fortran 95 for technical superiority in the problems I want and need to solve. (Sather was the best ever, but now dead, Eiffel good if you have sophisticated data structures and you don't need multi-dim arrays, and F95 best for any linkage to Fortran and multi-dim arrays, modules but not objects).

The problem is that C++ bugs, though less frequent than bugs in C, can be deep, subtle and severe. The language has very opaque bits. Include files are antediluvian. Pointers and references, baroque and archaic. Object model brittle. Templates powerful and dangerous. A hideous and error-prone syntax.

This is not the case in Fortran 95. Bugs other than fully algorithmic ones are shallow.

"computer science" truly misunderestimates Fortran 95.

Re:me too... by maxwell+demon · 2004-05-05 19:17 · Score: 2, Funny

In some sense he was right:
The program source code gets smaller (by as many bytes as the removed whitespace occupied), and compiles faster (because the parser doesn't have to read and ignore all those whitespace characters).

Now, the size reduction might be quite measurable (esp. if the original program was quite readable), though not substantial. However, if the improved compile speed was measurable, the compiler must have had an incredibly slow parser (or maybe the system had mounted the disk with NFS through ppp over ssh through a 14.4 kbps modem link :-).

--
The Tao of math: The numbers you can count are not the real numbers.

Re:You don't optimize, that's the job of the compi by oskillator · 2004-05-05 19:20 · Score: 3, Insightful

If you write clear and simple code the compiler or interpreter does all the other work. It will automatically remove unused code and simplify complex segments. So long as your code is not unnecessarily convoluted often the machine optimizations are better than the human brain optimizations.

A compiler can do low-level optimization, but it can't figure out a better algorithm for you, and the simplest, least convoluted algorithm is usually not the fastest.

All the assembly language fiddling in the world -- by the optimizer or by hand -- will give you maybe a 2x performance over C, 10x over perl, but a better algorithm will often increase performance by many orders of magnitude.

Optimise last by Kris_J · 2004-05-05 19:21 · Score: 2

Initially develop the entire project in a language you can develop fast. Once it works (and you're sure it does what the client wants), find out where the most CPU time is spent, then optimise those bits. "Optimise" may just mean having a good look at your code and working out a better way of doing it, or it might mean writing a library in assembler. Either way, optimise last.

Re:Clarity in Code by Designadrug · 2004-05-05 19:21 · Score: 2, Insightful

As far as clarity, find me one developer who has taken over a project and not complained about the quality of the inherited code ever.

Guilty. But at least I'm thinking about the poor SOB who's going to be maintaining my code. In fact we were just implementing new functionality using a superfast but arcane algorithm and were having trouble debugging it (mucho matrix maths - yuk). Instead of finishing that, we researched another algorithm that instead uses triply-nested loops with two conditionals. It won't be half as fast (because of conditionals within loops of course) but it will be a heck of a lot easier for my successor to maintain. (Took 10 minutes to implement and worked first time. Had to check I wasn't stuck in a BTL simulation)

--
Cogitum Ergo Hatto

For some things, enough hasn't ever been enough by grimen · 2004-05-05 19:28 · Score: 2, Interesting

The author argues (correctly, I think) that for many tasks we overstress optimization in places where it isn't necessary. Well and fine for tasks where it's not necessary, such as the example he gives.

However, as available processing power increases, some tasks change. Many technologies follow a trajectory that starts at "unthinkable" then move to "if you have special hardware" and then move gradually to software. Often along the way, features and computational complexity are added that keep a technology barely in reach (of both HW and SW implementations). It can be many many years before some technologies settle into a stage where they can be comfortably supported in SW at acceptable performance.

Examples include: sound (which started with clicks and beeps and moved through to multichannel 3D audio), graphics, games (text-based to ever-more-complex 3D) and video codecs (simple RLE moving to ridiculously complex stuff like the H.264 codec). In games, for example, there are often preference panels controlling which features should be disabled for performance reasons. This seems evidence that the authors/publishers feel they can't count on their customers having enough power to run the games without cutting features to gain performance.

I think for those applications where processing power trails needs and desires of customers and where optimization can make up the difference, developers will need to optimize or be eaten by the competition. In my experience, in things like codec and graphics development, you can get many-times performance increases over solid but poorly optimized implementations (sometimes even when you're just feeding HW).

I think those gains can be critical.

Postmature optimization by nimblebrain · 2004-05-05 19:35 · Score: 5, Informative

After years of developing, I really take to heart two things:

Premature optimization often makes better optimizations down the line much more difficult
It's 90% guaranteed that the slowdown isn't where or what you thought it was

Profilers are the best thing to happen to performance since compilers - really. I encounter a number of truths, but many myths about what degrades performance. A few examples of each:

Performance degraders

Mass object construction
Searching sequentially through large arrays
Repeated string concatenation (there are techniques to mitigate this)
Staying inside critical sections for too long

Not performance degraders

Lots of object indirection
Lots of critical sections
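On the "repeated string concatenation" item in the list above: one commonly used mitigation is to collect the pieces and join them once. A minimal Python sketch follows (the function names are made up for illustration; some runtimes, CPython included, can optimise the += case in certain situations, so measure before assuming a win):

```python
def build_by_concatenation(parts):
    """Append piece by piece; each += may copy everything built so far."""
    result = ""
    for part in parts:
        result += part
    return result


def build_by_join(parts):
    """Collect the pieces and join once at the end."""
    return "".join(parts)


parts = [str(n) for n in range(1000)]

# Both approaches produce the same string; only the amount of copying differs.
assert build_by_concatenation(parts) == build_by_join(parts)
```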

The "lots of object indirection" myth is one I encounter frequently. Object A calls Object B calls Object C, and it "intuitively" looks like it must be slow (Computer A calling Computer B, etc. would be slow), but even with stack frame generation, these are lightning fast compared with even the likes of "date to string" functions, never mind line-drawing commands or notification-sending.

The reason that particular myth is dangerous is that it's the single most pervasive myth (IMHO) that leads to premature optimization. People take out layers of object indirection and make it harder to put in better solutions later. I had an object that recorded object IDs in a list and let you look them up later. If I had "flattened" that into the routine that needed it, I might have effected a 0.1% speed increase (typical range for many premature optimizations). As it stood, because it hid behind an interface (equivalent to an ABC for C++ folks), when I had finally implemented a unit-tested red/black tree, it was trivial (~5 minutes) to drop in the new functionality. That's not an isolated case, either.

Mind you, I profiled the program to determine the slowdown first. Searching on the list, because so many were misses (therefore full scans), the search was taking up 98.6% of the entire operation. Switching to the red/black tree dropped the search down to 2.1%.
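The drop-in replacement described above might look roughly like the sketch below. Everything here is hypothetical: the interface name, the class names, and the use of a sorted list with binary search as a stand-in for the red/black tree (Python's standard library has no balanced tree). The point is only that callers depend on the interface, not on the implementation behind it.

```python
import bisect
from abc import ABC, abstractmethod


class IdIndex(ABC):
    """Hypothetical interface: record object IDs, look them up later."""

    @abstractmethod
    def add(self, object_id): ...

    @abstractmethod
    def contains(self, object_id): ...


class ListIdIndex(IdIndex):
    """First version: unsorted list, so every miss scans all entries (O(n))."""

    def __init__(self):
        self._ids = []

    def add(self, object_id):
        self._ids.append(object_id)

    def contains(self, object_id):
        return object_id in self._ids


class SortedIdIndex(IdIndex):
    """Replacement: sorted list plus binary search (O(log n) lookups).
    Inserts are still O(n) here, unlike a real balanced tree."""

    def __init__(self):
        self._ids = []

    def add(self, object_id):
        pos = bisect.bisect_left(self._ids, object_id)
        if pos == len(self._ids) or self._ids[pos] != object_id:
            self._ids.insert(pos, object_id)

    def contains(self, object_id):
        pos = bisect.bisect_left(self._ids, object_id)
        return pos < len(self._ids) and self._ids[pos] == object_id


def count_hits(index, recorded, queries):
    """Caller code: it only knows about the IdIndex interface."""
    for object_id in recorded:
        index.add(object_id)
    return sum(1 for q in queries if index.contains(q))
```

Swapping ListIdIndex() for SortedIdIndex() at the call site is the whole change; count_hits does not need to be touched.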

All in all, if you have a slow program, profile it. There is no substitute for a well-written profiler. Stepping through and "feeling" how long it takes in a debugger, while it can point you in rough directions, will miss those things that take 50 ms out of the middle of each call to the operation you're checking. Manually inserting timing calls can be frustrating enough to maintain or slow down your program enough that you can't narrow down the performance hit.
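For anyone who has not tried a profiler, here is a minimal sketch using Python's built-in cProfile and pstats modules. The workload (slow_lookup, cheap_formatting, operation) is invented for illustration and is deliberately lopsided so that the report has something obvious to show:

```python
import cProfile
import io
import pstats


def slow_lookup(haystack, needles):
    """Deliberately slow: each miss scans the whole list."""
    return sum(1 for n in needles if n in haystack)


def cheap_formatting(values):
    """Looks like work, but is cheap compared with the scans above."""
    return [str(v) for v in values]


def operation():
    haystack = list(range(2000))
    needles = range(2000, 3000)  # all misses, so every search is a full scan
    cheap_formatting(haystack)
    return slow_lookup(haystack, needles)


profiler = cProfile.Profile()
profiler.enable()
operation()
profiler.disable()

# Write the five most expensive entries (by cumulative time) into a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The exact numbers vary from machine to machine, which is rather the point: the report tells you where the time went instead of where you guessed it went.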

gprof works well with gcc and its relatives (make sure to add -pg to your flags), but I'm not sure if there's a good open source option out there for people using other tools that doesn't require you to alter your source.

In the Windows world, we recently got in the professional version of AQTime 3. It's an astounding package, allowing you numerous reports, pie charts and call graphs, saving the last few runs, calculating differences in performance between runs, allowing attachment to running processes, on top of a pretty nice way to define areas of the program to profile. The single nicest thing about it, though, is the performance. We turned on full profiling (that is, profiling all methods in all modules, including all class library and third party components) on the largest project we had, and it ran with perhaps a 30% slowdown. If you've used profilers before, you know how astounding that is ;)

Profiling applications always surprises me. In one case, a space-making algorithm I was running on controls seemed a little pokey; I found out more than 50% of the time spent was on constantly verifying that the lists were sorted. Today, I was investigating a dialog that looked like it must hav

--
Binary geeks can count to 1,023 on their fingers :)

wtf by fred+fleenblat · 2004-05-05 19:50 · Score: 4, Insightful

The article made it sound like the optimizations he was doing at the erlang level were somehow "better" than optimizations done in a language like C++ because he could just try out new techniques w/o worrying about correctness. His array bounds and types would be checked and all would be good.

BS.

First of all, erlang won't catch logical or algorithm errors, which are quite common when you're optimizing.

Second, you can optimize just fine in C++ the same way just as easily, IF YOU ARE A C++ programmer. You just try out some new techniques the same way you always do. So array bounds aren't checked. You get used to it and you just stop making that kind of mistake or else you get good at debugging it. Hey at least you have static type checking.

In fact you might be able to do a better job of optimization because you'll be able to see, right in front of you, low level opportunities for optimization and high level ones also. C++ programmers aren't automatically stupid and blinded by some 1:1 source line to assembly line ratio requirement.

Re:You don't optimize, that's the job of the compi by Pseudonym · 2004-05-05 19:51 · Score: 4, Insightful

Wrong. Dead wrong.

You don't micro-optimise unless the compiler doesn't do the job well enough. But nowadays, you almost never have to. Your superior brainpower can mostly be freed from the mundane details of your hardware and instead you can concentrate on using more suitable algorithms or data structures.

Indeed, the best thing you can do to get your code running fast is to write it with good abstractions. That way, when you find a performance problem, you can swap some old code out and swap some new code in and everything else will still work.

--
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});

Re:me too... by some+guy+I+know · 2004-05-05 20:22 · Score: 4, Interesting

If the man was coming from a BASIC programming background, his belief may have been understandable.
Some (very old) BASIC interpreters used to parse each source line each time it was executed.
Doing it that way saved memory (no intermediate code to store).

--
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana

So true... by warrax_666 · 2004-05-05 20:46 · Score: 2, Interesting

You have to work much, much harder in C++ to get anywhere near FORTRAN performance, so much so that it's almost never worth the effort.

One of the most dangerous things (optimization-wise) in C++, I've found is the temporary-creation problem. You have to be insanely careful to avoid creating temporaries to get any sort of reasonable performance... (or maybe I just need a better compiler than GNU GCC?)

Templates powerful and dangerous.

Not quite sure why you would consider them dangerous, but they are Turing Complete (i.e. they are a compile-time language all of their own). Which some people have used to create this. It looks almost as fast as Fortran, but the syntax is a lot more complex than just A*B for a matrix-multiplication.

--
HAND.

The compiler can't do all micro-optimizations by r6144 · 2004-05-05 21:22 · Score: 2, Informative

Some good habits in coding help the compiler to do its job better, and also result in clearer (at least not uglier) code.

Example 1: in C, if you use "int" for a variable "x" that should have a type of "unsigned", "x/4" will not just be a simple shift, instead three or four instructions are involved. Indeed, it would be very hard for the compiler to infer that "x" is always non-negative and optimize for you, except in the simplest cases.

Example 2: in floating-point math, "divide by 10" is not exactly the same as "multiply by 0.1", thus many compilers (gcc 3.4 without "-ffast-math", icc8 by default, and probably the Java VM) won't optimize the former into the latter, even in the many cases where it won't matter. This results in code that is 10-40 times slower on the P4.

Example 3: in Haskell, since lazy evaluation has much more overhead than eager evaluation, compilers always try to optimize the former into the latter. However, in many cases it is impossible for the compiler to do that, since it can't decide if using eager evaluation will prevent the evaluation from terminating.
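Examples 1 and 2 above can be checked with a few lines of arithmetic. This is a sketch in Python rather than C, so the C behaviour is simulated: c_style_div4 mimics C's truncating signed division, and shift_with_fixup mirrors the kind of add-then-shift sequence compilers commonly emit, assuming a 32-bit value. The function names are made up for illustration.

```python
import math


def c_style_div4(x):
    """C's signed x / 4 truncates toward zero."""
    return math.trunc(x / 4)


def shift_only(x):
    """A bare arithmetic shift rounds toward minus infinity instead."""
    return x >> 2


def shift_with_fixup(x):
    """Signed x / 4 via shift (32-bit x assumed): bias negatives by 3 first."""
    bias = (x >> 31) & 3  # 3 if x is negative, 0 otherwise
    return (x + bias) >> 2


# Example 1: the plain shift is only right for non-negative values.
assert shift_only(7) == c_style_div4(7) == 1
assert shift_only(-7) == -2 and c_style_div4(-7) == -1
assert all(shift_with_fixup(x) == c_style_div4(x) for x in range(-1000, 1000))

# Example 2: x / 10 and x * 0.1 are not always the same double.
mismatches = [x for x in range(1, 100) if x / 10 != x * 0.1]
assert 3 in mismatches  # 3 / 10 is 0.3, but 3 * 0.1 is 0.30000000000000004
```

With an unsigned type the bias step disappears, which is the extra work the post is referring to. The second check shows why a compiler that must preserve exact results cannot silently turn the division into a multiplication.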

In short, it is good to rely on the compiler to do the optimization (such as register allocation) that is known to be done well, but what the compiler can do is very limited, since (1) it can't know your intent if you have not expressed it, so (for example) it has to make sure that every floating-point operation conforms to very stringent error bounds, often at the cost of significant speed, even if you don't really care about that; and (2) some code-optimization problems take extortionate time to solve, or might even be theoretically infeasible in general. Therefore, when writing code that is going to take some significant CPU-time, it is good to have some good habits that help the compiler, as long as the code isn't uglified too much.

engineering by sir_cello · 2004-05-05 22:43 · Score: 2, Interesting

A lot of this discussion here is either crap, a rehash or was covered in Engineering 101.

Basically, you have some requirements for the product, and you optimise according to those requirements. Performance is just one variable (time to market, scalability, reliability, security, usability, cost, etc - are the many others).

The requirements for a product in a fast moving market entry company are less about performance and more about rollout ASAP.

The requirements for the same product two years later may be to improve performance to achieve scalability requirements.

If you're writing some sort of overnight build or batch application: whether it takes an extra hour or not may not matter, because it has a 12 hour window to run in.

If you're writing an order processing system, then performance and end-to-end turnaround will be vitally important, but you won't focus on the assembly, you'll focus on the algorithms, design and architecture.

If you're writing a compression or encryption module: you probably will work on the assembly.

In all of the above cases, before you optimise anything, you profile and understand how the optimisation is going to pay back in real terms.

In my experience, you cannot prescribe any of this: you need to take it on case by case basis because every product and circumstance is different.

Re:me too... by Patrik_AKA_RedX · 2004-05-05 22:49 · Score: 4, Funny

No wonder we are shipping jobs to India.

Bad solution. Every Darwinist knows how to solve this. Let only the best programmers reproduce. That way we'll be breeding a race of superprogrammers!

And perhaps we could breed some very small furry humans as pets. I'm very sure there is a market for pet-humans as no less than 25% of the Andromedians voted yes on a survey asking if they would spend more than 100 Astrobucks on a pet-human if they would be smaller and less noisy.

Somebody doesn't understand O notation... by Theatetus · 2004-05-05 23:35 · Score: 2, Insightful

And quicksort will work just fine too. Sometimes O(n^2) will *not* work. Therefore never use bubblesort.

You totally missed the point, didn't you? There are situations where a bubble sort is faster than a merge sort or a quicksort. It has almost no setup overhead, so if you're sorting sufficiently small arrays (and what I remember from CS101 is that "sufficiently small" goes up to about 1000 members) bubble sort is actually significantly faster.

So, as a matter of fact, if you had to sort a million small arrays, bubble sort would be the only feasible option.

--
All's true that is mistrusted

Re:Somebody doesn't understand O notation... by Eivind+Eklund · 2004-05-06 03:37 · Score: 4, Insightful

The previous post misses out on many aspects of algorithmic optimization, and leads to the wrong conclusions.
For a better analysis of optimization in this specific part of the sort space, I recommend Jon Bentley's classic "Engineering a sort function".
This paper discusses how to implement an optimal sort, after having done real-life measurements. Conclusions include dropping to an O(N^2) sort algorithm when qsort partitions become small enough - insertion sort was chosen. (The selected cutoff was seven elements at that point; it may be that it would be sensible to choose a higher cutoff for the generic case now, as the cache locality might help. However, I won't bet on this either way without doing measurements.)
The qsort implemented there is the one still used in at least FreeBSD. I don't know the status for other OSen.
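The "drop to insertion sort for small partitions" idea can be sketched as follows. This is not the code from the paper or from any libc; it is a plain illustration in Python with a middle-element pivot, and the cutoff value is simply the one mentioned above.

```python
CUTOFF = 7  # small-partition threshold; the right value should be measured


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place; O(n^2) but with very little overhead."""
    for i in range(lo + 1, hi + 1):
        value = a[i]
        j = i - 1
        while j >= lo and a[j] > value:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = value


def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that hands small partitions to insertion sort."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > CUTOFF:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:  # Hoare-style partition around the pivot value
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if j - lo < hi - i:
            hybrid_quicksort(a, lo, j)
            lo = i
        else:
            hybrid_quicksort(a, i, hi)
            hi = j
    insertion_sort(a, lo, hi)
```

Calling hybrid_quicksort(values) sorts the list in place. Whether seven is still a good cutoff on current hardware is exactly the kind of thing that needs measuring.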
As for big O notation: The discussion in the previous post is so imprecise as to be misleading. It uses "cost" and "complexity" where it discusses asymptotic complexity; these are distinctly different, and it is necessary to be quite clear on the distinctions to do correct analyses.
Big-O notation measures asymptotic complexity over an arbitrarily selected set of basic operations assumed to have unit cost. It discards all constants to make the analysis easy to do and easy to work with. This is a useful tool, but it only measures asymptotic complexity, and it only does so based on arbitrary basic operations.
In practice, a mere factor-of-1000 speed difference (one second versus roughly seventeen minutes) might be quite noticeable. This will be REMOVED from the big-O analysis, which can make it point in a quite different direction from the truth.
In the parent post, sorting 1000 elements is assigned a unit cost, claiming that the time will be similar for a bubble sort and a quick sort, and "low enough not to matter". Further, the conclusion is "never use bubble sort". Assuming a naive implementation of both bubble sort and quick sort, and a set of arrays that are already sorted, the quicksort will be O(N^2) and the bubble sort will be O(N) in the number of items in each bin. This is a quite noticeable difference in asymptotic complexity.
A naive programmer is in my opinion the only relevant assumption if we're to give absolute advice on simple sort functions. A non-naive programmer will know how to do complexity evaluation, will know the tradeoffs on startup of the various algorithms, and will only be implementing a sort him- or herself because actual speed measurements or specific knowledge of the sort behaviour show that the system supplied sort is not fast enough for the case in question, and that a custom sort can do better. (S)he will also evaluate whether the data to sort is likely to be almost sorted or highly random, and thus which kind of algorithm is likely to go faster. (And insertion sort/bubble sort is actually faster also for large data sets if they're almost sorted beforehand.)
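The already-sorted case can be made concrete by counting comparisons instead of timing. The sketch below assumes two deliberately naive implementations: a bubble sort with the usual "stop when a pass makes no swaps" exit, and a quicksort that always picks the first element as pivot. The quicksort function only counts the pivot comparisons such a sort would make; it does not return sorted output.

```python
def bubble_sort_comparisons(items):
    """Bubble sort with early exit; returns the number of comparisons made."""
    a = list(items)
    comparisons = 0
    n = len(a)
    while True:
        swapped = False
        for i in range(n - 1):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            return comparisons
        n -= 1  # the largest remaining element is now in place


def naive_quicksort_comparisons(items):
    """Count pivot comparisons for a first-element-pivot quicksort."""
    comparisons = 0
    pending = [list(items)]  # explicit stack instead of recursion
    while pending:
        part = pending.pop()
        if len(part) < 2:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)  # one pivot comparison per remaining element
        pending.append([x for x in rest if x < pivot])
        pending.append([x for x in rest if x >= pivot])
    return comparisons


already_sorted = list(range(200))
bubble_count = bubble_sort_comparisons(already_sorted)
quick_count = naive_quicksort_comparisons(already_sorted)
print(bubble_count, quick_count)
```

On sorted input the bubble sort finishes after a single pass, while the first-element pivot degenerates into one lopsided partition per element. A better pivot choice removes the quicksort problem, which is part of why the advice depends on who is writing the sort.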
Eivind, who if he had to give general advice would give "evaluate qsort, mergesort, heapsort, insertion sort, and using a data structure that keeps order before choosing bubble sort."

--
Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.

Huh? by Theatetus · 2004-05-05 23:37 · Score: 2, Informative

What are you talking about? I get paid to write open-source software. Where did you get the idea that open-source software is written entirely by volunteers?

--
All's true that is mistrusted

Coding while blind by xyote · 2004-05-05 23:40 · Score: 4, Interesting

The biggest problem I see with performance is lack of visibility of performance factors. At the hardware level there is cache (which is supposed to be transparent) and really deep pipelined processors. This can have a major effect on what would otherwise be an optimal algorithm. And the hardware keeps changing, so what may have been optimal at one point will become suboptimal later on.

In software, the biggest problem is lack of performance directives. POSIX pthreads is one of the biggest offenders here. Best performance practices in pthreads are based on how common implementations work. POSIX allows implementations that would cause major performance problems for so-called best pthread programming practices. For example, POSIX allows pthread_cond_signal implementations to wake all waiting threads, not just one. There are programs that depend on pthread_cond_signal to wake only one thread for performance in order to avoid "thundering herd" problems. So while standards allow portability of correct programs, they do not necessarily allow portability of performance.
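The correctness half of that distinction is easy to show. The sketch below uses Python's threading.Condition as a stand-in for pthread condition variables (an assumption made for illustration; run_workers and its internals are invented names). Each waiter re-checks its predicate in a loop, so the program stays correct whether a notify wakes one thread or several. What it cannot control is how many threads wake up and contend for the lock, which is the performance half the post is about.

```python
import threading
from collections import deque


def run_workers(num_workers, items):
    """Feed items to worker threads through a condition-guarded queue."""
    cond = threading.Condition()
    queue = deque()
    processed = []
    done = False

    def worker():
        while True:
            with cond:
                # Re-check the predicate after every wakeup: correct even if
                # a notify wakes more threads than there are items.
                while not queue and not done:
                    cond.wait()
                if not queue:
                    return  # producer finished and nothing is left
                processed.append(queue.popleft())

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for item in items:
        with cond:
            queue.append(item)
            cond.notify()  # ask for one waiter; more may wake up

    with cond:
        done = True
        cond.notify_all()  # let every worker observe the shutdown flag

    for t in threads:
        t.join()
    return sorted(processed)
```

For example, run_workers(3, range(100)) returns every item exactly once regardless of how the wakeups are distributed among the workers.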

We need explicit performance directives.

Those who code so no one can do their work... by Anonymous Coward · 2004-05-05 23:44 · Score: 2, Insightful

I tell all programmers who work for me and let all who work with me know this:

Given that the only reason to deliberately make it hard for others to understand your work is to increase your job security, that must mean that you don't think you bring enough other skills to the job to keep it on merit.

In other words, you don't think you're good enough.

And given that most programmers think they are better than they actually are, if you don't think you're good enough, why the hell should anyone else?

You're ignoring the "gotcha" by Theatetus · 2004-05-05 23:55 · Score: 4, Informative

It doesn't matter how much hardware you throw at a problem if it needs to scale properly and you have an O(n^3) solution.

Well, maybe you're not ignoring it since you said "if it needs to scale properly". But that's a very crucial "if", and the "scale properly" only refers to certain situations.

If the array you need to sort might have several million members and you won't be sorting more than a few dozen of those arrays, yes you should use an O(n lg n) or whatever sort routine. OTOH, if the array itself is smaller (a few hundred members) but you have to sort several hundred thousand of them, quicksort or merge sort will be remarkably slow compared to the much-maligned bubble sort.

Big-O notation gives an asymptotic upper bound, not the function itself. For small datasets, not only is there no guarantee that the lower big-O algorithm will be faster, it's in fact usually the case that the allegedly "less efficient" algorithm will actually be faster.

--
All's true that is mistrusted

Re:You're ignoring the "gotcha" by An+Onerous+Coward · 2004-05-06 03:21 · Score: 3, Informative

A project I was doing last semester had just what you described: thousands of arrays of twenty members each. I was still able to double the performance by switching from bubblesort to quicksort. Besides, you never know when those arrays are going to get bumped up from a few hundred members to a few thousand.

I'm still a firm believer in the principle that bubblesort is never the way to go.

--
You want the truthiness? You can't handle the truthiness!

JIT optimization is just peephole optimization by Speare · 2004-05-06 00:23 · Score: 2, Insightful

People keep saying that the JIT-style optimizers in .NET and Java can radically optimize the application "for programmers who can't or won't."

Peephole optimization and clock-scheduling are among the simplest of optimizations. The machine looks at a few low-level instructions and might suggest an alternative which would operate identically but with better performance. That's really all that the VM has time or capability to perform today.

Mid-range optimizations include vectorizing, unrolling of loops, and register reduction. These are still machine-analyzable, so I expect the JIT-style optimizers to continue to make strides here.

But I don't think you're ever going to see JIT-style optimizers which replace an O(n^2) algorithm with an O(log n) algorithm. That is real optimization. That's where you win the performance races. That's the one that programmers should care about, and should learn how to do. The level of analysis required to "divine" the whole meaning of a large routine, realize the alternative algorithm equivalent, and fix up the code is far beyond any JIT solution.
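
To make the distinction concrete, here is a small Python sketch (the function names are hypothetical) of the kind of rewrite a programmer has to do by hand: the same question, "does this list contain a duplicate?", answered by a quadratic pairwise scan and by a single pass with a hash set. The second is a different algorithm rather than a tweaked version of the first, which is the sort of change a peephole pass would not arrive at.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: O(n^2) comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """Remember what has been seen in a set: O(n) expected time.
    Assumes the items are hashable."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Both functions return the same answers; only the amount of work differs as the input grows.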

I think we will have to wait far longer than the 6 GHz Longhorn machines before you see any meaningful machine optimization of sloppy code.

--
[ .sig file not found ]

Re:me too... by ggeens · 2004-05-06 00:49 · Score: 2, Interesting

Some (very old) BASIC interpreters used to parse each source line each time it was executed.

One time (around 1985 IIRC), I read an article in the local computer club's magazine. The author had written a BASIC program (GW-BASIC I think) to "shorten" BASIC programs. It would:

Shorten all variable names to 1 or 2 characters
Remove whitespace
Renumber all lines so that all GOTO n became shorter

The source code that went along with the article looked like it was used on itself (very terse).

With the machines you had back then, it probably made a difference.

--
WWTTD?

De-commenting by RogL · 2004-05-06 01:08 · Score: 4, Interesting

Back in the late '80s, early '90s, worked on (DOS-based) commercial products written in APL. APL is an interpreted language, written in Greek & math symbols, some overstrikes.

Each byte of that 640K was precious, so it was common practice to "de-comment" the code before release; remove all comments, reduce whitespace, move multiple statements onto a single line, possibly shorten variable names. You could gain a substantial (for the time) amount of memory that way. You also dynamically imported/destroyed functions.

I regularly debugged client systems with "de-commented" APL; if you could read that, you could read anything!

Re:Performance tuning. by Old+Uncle+Bill · 2004-05-06 01:09 · Score: 2, Interesting

Amen. Poorly written queries, excessive XML parses/transforms and too much bandwidth utilization are all things NOT solved by tuning the architecture. We typically make 2-3X improvements in our product through tuning the system and up to 100X by tuning the above. I've worked on projects where the system (in this case 1 4way db and two 2way app servers), could support 2 users. No amount of throwing hardware at that thing would improve the performance. Funny thing is, the client was a bit frosted because they had paid (at that time) about $4 million for the project. As a performance architect, lazy and inefficient programmers will keep me employed for centuries.

--
Yes, I am an agent of Satan, but my duties are largely ceremonial.

All your optimizations are wrong. by scorp1us · 2004-05-06 01:18 · Score: 4, Interesting

If you spend hours tweaking code to eliminate a few instructions, even instructions in a loop, then you are just wasting your time.

Real optimizations come before you write your program. Take for example that loop that you removed an instruction or two from. Say it is searching an array and looks like:

for (i=0; i<strlen(x); i++){ if (x[i]=='&') ; }

There are two things wrong. One, you call strlen repetitively. strlen() is theta(n), so you have a loop that executes n times at a cost of n each: n*n = n^2. That's one of the slowest algorithms around. Maybe your compiler is smart enough to see that x is not being modified and will do an s=strlen(x); and then compare against s for you, but probably not.
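
The cost of the repeated strlen can be counted rather than argued about. The following is a toy model in Python, not C: the CString class and both functions are made up for illustration, and the only thing counted is how many characters strlen reads while scanning for the terminator.

```python
class CString:
    """Toy model of a NUL-terminated C string that counts strlen reads."""

    def __init__(self, text):
        self.buf = text + "\0"
        self.reads = 0

    def strlen(self):
        """Scan to the terminator like C's strlen: n counted reads per call
        (the final check of the NUL itself is not counted)."""
        n = 0
        while self.buf[n] != "\0":
            self.reads += 1
            n += 1
        return n


def count_amp_naive(s):
    """Mirrors for (i=0; i<strlen(x); i++): strlen runs on every loop test."""
    count, i = 0, 0
    while i < s.strlen():
        if s.buf[i] == "&":
            count += 1
        i += 1
    return count


def count_amp_hoisted(s):
    """Calls strlen once and reuses the result."""
    count = 0
    length = s.strlen()
    for i in range(length):
        if s.buf[i] == "&":
            count += 1
    return count
```

For the 5-character string "a&b&c" the naive version spends 30 reads inside strlen (6 loop tests of 5 reads each) against 5 for the hoisted one, and the gap grows as n^2 versus n.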

The other thing is when searching an array, try to give it structure. If your array contains sorted characters, then you can find it in log_2(n). Of course, if you sort by frequency (most commonly accessed at the top) then your n^2 loop *might* do better.
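
A short Python sketch of what "giving it structure" buys, using the standard bisect module. The two function names are made up for illustration, and the binary search is only valid on the assumption that the input really is sorted.

```python
from bisect import bisect_left


def contains_sorted(sorted_chars, target):
    """Binary search: about log2(n) probes, valid only if the input is sorted."""
    pos = bisect_left(sorted_chars, target)
    return pos < len(sorted_chars) and sorted_chars[pos] == target


def contains_linear(chars, target):
    """Plain scan: up to n probes, works on input in any order."""
    for c in chars:
        if c == target:
            return True
    return False
```

Sorting costs something up front, so this pays off when the same data is searched many times.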

The article is right: constant-time operations (type checking, etc) are asymptotically infinitesimal in algorithms. The article's real problem is that it is n, but on a 2d image (x*y)=n you can't do any better. Note that it is not n^2, (though it makes a square picture) because you're operating on pixels. So that will be your unit of measure - NOT time.

Which is my 2nd point. Don't measure in time or instructions. Measure in OPERATIONS. Operations are not instructions or lines of code. An Operation is every time you have to look at a unit. It is a logical unit of cost. Hardware can be changed out. We know that hardware (performance) doubles every 18 months. The constant-time instructions will get smaller. (Clocks per cycle are irrelevant as well.) But your loops will remain the biggest part of your program.

With that, here's a crash course in CS:

Loops are most of (time, operations) the program. Try not to use them.
To avoid loops, structure your data. Giving structure means assumptions, and assumptions means you can skip irrelevant sections of data.
Determine your dataset, minimize worst-case occurrences. Find out what order of data or instructions will make your n*log_2(n) algorithm become n^2. Then find a way around it.
Optimize for the average case. That is, if you never sort more than 6 numbers at a time, an n^2 algorithm will beat an n*log_2(n) algorithm.
If your data structure introduces overhead (most will) find your most common or costly operation. Optimize your data structure for that (searching, sorting, etc). If you do a combination, determine the ratio and optimize for that. The cost of overhead is usually small compared to the reason why you're using a data structure: to speed up your common operation.
The most obvious and easiest to code algorithm is the slowest. (Bubble sort vs. Radix or quick-sort)
Mastery of the above is the difference between a $50k programmer and a $90K programmer.
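
The point in the list above about finding the input order that turns an n*log_2(n) algorithm into n^2 can be shown with quicksort. This is a sketch in Python, with all names made up for illustration: a quicksort that counts element comparisons and takes the pivot rule as a parameter. With the naive "first element" pivot, already-sorted input is the worst case; choosing the pivot at random is one common way around it.

```python
import random


def quicksort_comparisons(items, pick_pivot):
    """Sort a copy with quicksort (Lomuto partition) and return
    (sorted_list, number_of_element_comparisons). Uses an explicit stack,
    so the degenerate case does not hit Python's recursion limit."""
    a = list(items)
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = pick_pivot(lo, hi)
        a[p], a[hi] = a[hi], a[p]  # move the pivot to the end of the range
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            comparisons += 1
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot into its final position
        stack.append((lo, store - 1))
        stack.append((store + 1, hi))
    return a, comparisons


def first_element(lo, hi):
    """Naive pivot rule: always the first element of the range."""
    return lo


def make_random_pivot(seed=0):
    """Pivot chosen uniformly at random from the range."""
    rng = random.Random(seed)
    return lambda lo, hi: rng.randint(lo, hi)
```

On sorted input of length n the first-element rule performs exactly n*(n-1)/2 comparisons, because every partition peels off a single element; the random rule does far fewer on the same input.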

To learn more, take a datastructures class at your local university. Please review Calculus II before doing that though.

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.

Re:me too... by Surt · 2004-05-06 01:41 · Score: 2, Insightful

Worse, the time it takes him to delete one space or tab will always be much longer than the time saved in the parser/compiler.

--
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking

This is a well-known problem, by warrax_666 · 2004-05-06 01:58 · Score: 2, Insightful

especially in databases where the data set that you have to sort is often so big that it doesn't fit into memory. The (usual) solution is to use a variation on the well-known Merge Sort algorithm, where blocks are merged into larger and larger "runs" of sorted data (which are then merged). (The number of runs of course depends on how much data there is and how much memory you have).
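
A minimal sketch of the idea in Python, with made-up function names. Two simplifications are assumed: the runs stay in memory as lists instead of being written to disk, and all runs are merged in one k-way pass with heapq.merge rather than pairwise into larger and larger runs.

```python
import heapq
from itertools import islice


def sorted_runs(records, run_size):
    """Cut the input into runs of at most `run_size` items and sort each one.
    A real database would write each run to disk; here they stay in memory."""
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, run_size))
        if not chunk:
            return runs
        chunk.sort()
        runs.append(chunk)


def external_sort(records, run_size):
    """Merge the sorted runs lazily; the merge itself holds one item per run."""
    return heapq.merge(*sorted_runs(records, run_size))
```

Here run_size plays the role of "how much memory you have": ten records with a run size of four give three runs.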

--
HAND.

Re:Performance tuning. by Glonoinha · 2004-05-06 02:07 · Score: 5, Insightful

Problem is that your client is still running on the prototype of their project, not the real release. They just don't know it, and I'm guessing the original programmers don't know it.

The most effective, well used (if unintentionally used) development methodology is the prototype methodology. The first pass is simply a reality check: can we even accomplish what needs to be accomplished on the hardware and development tool we have available? The prototype is then shown to management as a proof of concept, to show them that their ideas are possible, and then a second generation is re-engineered from the ground up using the lessons learned in the first generation as a foundation for a solid, well engineered deliverable product. This breaks down in one of two ways: management says screw the rewrite, let's just run what we have, or the developers are not smart enough to understand that their first pass at it wasn't production quality code, only a prototype.

What your client has right now is a prototype, a proof of concept. It 'works' inasmuch as a kite flies - as a demonstration that the concept is viable, but not meant for real work. You could probably push a big kite hard enough to 'fly' two people, but that doesn't make it a good idea. You could continue to 'tweak' a kite in order to even double the performance, get 4 people off the ground - but I wouldn't recommend using it for commercial applications.

Odds are the app needs to be understood from top to bottom so a set of software engineers know the concepts, what the package is intended to do, how it currently does it, what the expectations are for performance and growth - and then the SE's that understand it need to rewrite it from the ground up developing performance engineered code that is production quality.

--
Glonoinha the MebiByte Slayer

Better compilers, clean code by Goodbyte · 2004-05-06 02:16 · Score: 2, Insightful

I don't know Erlang, but if it is a pure functional language, the compiler/interpreter can use "special" optimizations, e.g.

decode_rgb(Pixels) -> list_to_binary(decode_rgb1(binary_to_list(Pixels))).

will not produce intermediate lists, instead the compiler will use lazy evaluation when decoding the data.

My point is that many optimizations do not sacrifice readability. Many times it is possible to refactor slow code in a way that improves both readability and execution speed, but you must know the pros and cons of the tools you are using!

Ummmm... by Rufus88 · 2004-05-06 02:46 · Score: 2, Insightful

Were they C++ programs, or 20 year old shell scripts?

Intel's VTune is your friend. by yecrom2 · 2004-05-06 02:52 · Score: 3, Informative

We were introduced to vtune during a 2-week trip to Intel. Profilers are good. vtune is the best one that I've found.

The way that we use it is to not even touch it until we have the feature completely working in the simplest form possible. Then we do some performance testing. If everything works well under load, we don't even bother profiling it. Otherwise run it in vtune and see what the bottleneck is. 90% of the time, there is some type of minor oversight. Occasionally, there is an algorithmic change that needs to take place, like adding a secondary index to something, or making some temporaries thread-local.
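
The same workflow can be tried without VTune. Below is a sketch using Python's standard cProfile and pstats modules as a stand-in. It is not VTune and does not show VTune's interface; handle_request and slow_lookup are hypothetical functions with a deliberately planted oversight.

```python
import cProfile
import io
import pstats


def slow_lookup(keys, table):
    """Deliberately naive: a linear search of a list for every key."""
    return sum(1 for k in keys if k in table)


def handle_request(n):
    """Hypothetical feature, written in the simplest form that works."""
    table = list(range(n))
    keys = range(0, n, 2)
    return slow_lookup(keys, table)


def profile_report(func, *args, top=5):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile_report(handle_request, 2000)
    print(report)
```

The report should point at slow_lookup, and the fix there is the "minor oversight" kind: turning table into a set makes each membership test constant time on average.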

We run both event-counters and call-tracing, but I've found that call-tracing is far more accurate. The best use of VTune is to smite arrogant developers. The result of our trip to Intel was that one of our developers, who had to write everything from scratch, was shown that all of his "high performance components" were completely worthless.

Just my $0.02.

Matt

Re:me too... by Sepper · 2004-05-06 03:16 · Score: 2, Funny

Like this fellow 'Coder' sitting next to me, who was putting most of his code in between /* and */ because 'It compiled faster'...

He was wondering why his program wasn't working.

I was wondering what he was doing in Computer Engineering school...

--
I live in Soviet Canuckistan you insensitive clod!

what's the score by dutky · 2004-05-06 03:21 · Score: 2, Insightful

This guy, for some reason only vaguely defined, wants to demonstrate something about computer performance and the need (or lack thereof) for optimization. So, rather than use any of a large number of established benchmarks, he pulls a targa graphics file decoder out of his ass. This is the justification he gives:

It gets away from tired old benchmarks involving prime numbers and replaces them with something more concrete and useful. It also involves a lot of data: 576x576 24-bit pixels, for a total of 995,328 bytes. That's enough raw data processing to require performance-oriented coding, and not some pretty but unrealistic approach.

Despite the fact that nobody (other than him) seems to be interested in targa graphics these days, and that the total amount of data involved (less than a megabyte) is minuscule compared to limiting bandwidths in modern computers (ranging from ~100 MB/sec for the disk I/O subsystem, ~500 MB/sec for the main memory, and in the multiple GB/sec for the caches and internal registers).

Then he shows us just what a wonderful benchmark this program is: it is so wonderful that it runs nearly instantaneously! Maybe it's just me, but I like my benchmarks to take a little while to complete, either because I don't trust the sub-second accuracy of the system time routines, or because I like to get a reasonable sample of system states contributing to the overall performance of the benchmark.

Next, the guy tells us how he used unsatisfactory tools to implement an ill-conceived algorithm and, glory-glory, later fixed his own dumb-ass mistakes!

Finally, he claims that he would not have been able to make the same kinds of algorithmic adjustments if he had implemented in C rather than Erlang, though it is not obvious why it would have been more difficult (and he doesn't give any argument to support his assertion).

From this exercise he concludes that, GASP, optimization is still important!

What a senseless waste of skin.

Re:Performance tuning. by pboulang · 2004-05-06 03:40 · Score: 2

This is a wise post. Add in that it is extremely important to get this concept across to the one-who-signs-the-checks. This is the difference between writing software and software engineering. I appreciate your post.

--

This comment is guaranteed*

*not guaranteed

Re:me too... by randomencounter · 2004-05-06 03:48 · Score: 2, Interesting

I know where he might have gotten the idea.
The Vic20/C64 basic allowed you to merge program lines using a colon. This took 1 byte less per merged line, and did indeed run somewhat faster. Since the Vic20 had only 2.3K usable without tricks this was a big deal.

Of course, anyone inflexible enough to carry that through to a C++/C/Cobol project shouldn't be programming.

--
Forget diamonds, copyright is forever.

Re:me too... by Anonymous Coward · 2004-05-06 04:26 · Score: 2, Funny

That only works if programming is heritable.

In your thought experiment, a few generations down all programmers will have greasy hair & shaggy beards (heritable) & the social need (heritable) to have greasy hair & shaggy beards, with no programming ability (not heritable) ....

I'm not surprised by cat_jesus · 2004-05-06 04:30 · Score: 4, Funny

E.g., there is no way in heck that an O(n * n) algorithm can beat an O(log(n)) algorithm for large data sets, and data sets _are_ getting larger. No matter how much loop unrolling you do, no matter how you cleverly replaced the loops to count downwards, it just won't. At best you'll manage to fool yourself that it runs fast enough on those 100 record test cases. Then it goes productive with a database with 350,000 records. (And that's a small one nowadays.) Poof, it needs two days to complete now.

And no hardware in the world will save you from that kind of a performance problem.

E.g., if most of the program's time is spent waiting for a database, there's no point in unrolling loops and such. You'll save... what? 100 CPU cycles, when you wait 100,000,000 cycles or more for a single SQL query? On the other hand, you'd be surprised how much of a difference can it make if you retrieve the data in a single SQL query, instead of causing a flurry of 1000 individual connect-query-close sequences.
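
The two access patterns can be sketched with Python's built-in sqlite3 module. The table, the column names and the function names are all hypothetical. An in-memory SQLite database has almost none of the connect and network overhead described above, so this shows only the shape of the code and not the size of the saving.

```python
import sqlite3


def make_db(n):
    """In-memory table standing in for a real database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO items (id, price) VALUES (?, ?)",
        [(i, i * 1.5) for i in range(n)],
    )
    conn.commit()
    return conn


def total_one_by_one(conn, ids):
    """One query per id: the 'flurry of queries' pattern."""
    total = 0.0
    for item_id in ids:
        row = conn.execute(
            "SELECT price FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        if row is not None:
            total += row[0]
    return total


def total_single_query(conn, ids):
    """One query for the whole batch: let the database do the summing.
    SQLite caps the number of ? parameters per statement, so a very large
    id list would need chunking or a temporary table instead."""
    ids = list(ids)
    if not ids:
        return 0.0
    placeholders = ",".join("?" for _ in ids)
    sql = (
        "SELECT COALESCE(SUM(price), 0.0) FROM items "
        f"WHERE id IN ({placeholders})"
    )
    return conn.execute(sql, ids).fetchone()[0]
```

Against a real server each call in total_one_by_one would pay the round trip (and, in the worst case described above, a connect and close as well), while total_single_query pays it once.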

(And you'd also be surprised how many clueless monkeys design their architecture without ever thinking of the database. They end up with a beautiful class architecture on paper, but a catastrophic flurry of queries when they actually have to read and write it.)

E.g., if you're using EJB, it's a pointless exercise to optimize 100 CPU cycles away, when the RMI/IIOP remote call's overhead is at least somewhere between 1,000,000 and 2,000,000 CPU cycles by itself. That is, assuming that you don't also have network latency adding to that RPC time. On the other hand, optimizing the very design of your application, so it only uses 1 or 2 RPC calls, instead of a flurry of 1000 remote calls to individual getters and setters... well, that might just make or break the performance.

(And again, you'd be surprised how many people don't even know that those overheads exist. Much less actually design with them in mind.)

Not much surprises me these days. I had to rewrite a SQL trigger one time and I was very concerned about optimization because that sucker would get called all the time. I was shocked to discover that in this one particular instance a cursor solution was more efficient than utilizing a set processing methodology. I was so surprised that I wrote up a very nice paper about it for a presentation to my department, along with standard O notation and graphs.

No one knew what O notation was.

Not long after that I found out about knoppix. I burned a few disks and gave them out. Only one other person knew what linux was. It wasn't my manager.

Just last week one of our servers had a problem with IIS. "What's IIS?", My manager asks.

Here are some other gems

"We can't put indexes on our tables! It will screw up the data!"

"I've seen no evidence that shows set processing is faster than iterative processing" -- this one from our "Guru".

"What is a zip file and what am I supposed to do with it?" -- from one of our senior systems programmers in charge of citrix servers.

"What do you mean by register the dll?" -- from the same sysprog as above

They pushed a patch out for the sasser worm and about 2% of the machines got the BSOD. I finally decided to give the fix a try on my machine and it died too. I booted into safe mode and rolled it back. Everyone else had to get their machines reimaged because desktop support couldn't figure out what was wrong. Lucky for my neighbor I was able to recover most of his data before they wiped it clean. He made the mistake of letting his machine loop through reboots for two days, which hosed his HD up. Of course the PC "experts" couldn't recover any data because the machine wouldn't boot up.

Yes, I am in programmer purgatory. I am reluctant to say hell because I'm sure it can get worse. No, I'm not kidding.

Useless article by ChaosDiscord · 2004-05-06 04:40 · Score: 2, Insightful

Even traditional disclaimers such as "except for video games, which need to stay close to the machine level" usually don't hold water any more.

Sure, if you're talking about puzzle games.

However, for most retail games pushing the graphics as far as possible is important. If you can squeeze a 5% improvement out of the engine you can use the freed up time to make the game a bit prettier. Or put another way, art expands to fill available processing power. Graphics blocking on the video card? Well, you can use processing power for increasingly realistic physics simulations and artificial intelligence.

If you play a lot of games you know that there is great variance between games. Some games coast along at a bare minimum while others surprise you with their ability to create compelling visuals with older hardware.

After all, who ever thought you could use an interpreted, functional language to decode Targa images, especially without any performance concerns?

Ummmm, just about anyone sane? Wow, decoding a measly megabyte of data. And the encoding? Simple run length encoding. That's not a real programming problem; that's a homework assignment for Computer Science 101. If you're keen on loading graphics you could at least pick something that is slightly complicated like JPEG.

Is it true that optimization is massively overrated; that most programs are plenty fast? Sure. But this article doesn't provide a bit of evidence for that.

--
Search 2010 Gen Con events

Scientific computing by gnuLNX · 2004-05-06 05:21 · Score: 2, Insightful

Sorry pal....but as long as there are scientific applications to solve there will be a need to write highly optimized code...Heck I wrote some inline assembly today. Yes we can ignore performance in some areas, but in high performance computing, speed is still king, and quite frankly always will be.

--
what?

VIC20 BASIC: Tokens mattered by solprovider · 2004-05-06 07:35 · Score: 3, Interesting

When I was finally allowed to program, I did it on a Commodore PET in elementary school. Of course I was writing games, and I optimized because games are not fun when they are slow. I was too lazy to type in the source from magazines, so all of my programs grew until they were usable, then grew some more as people played them and asked for features.

Junior High did not have computers yet. I finally convinced my family to get me a computer if I paid half. With my budget, that meant a VIC20 for under $100. The VIC20 had 4KB of RAM. You could buy a 16KB expansion, but I could not afford it.

The language was the same as the PET, so I tried to run my existing programs. They ran. I tried to modify them, save, and run them, and they would not work, even if the change was to remove code. I finally tried changing all the commands to "tokens" to shorten them. IIRC, a token was the first 2 characters of a command and an underscore. Since most of the commands were 4 letters, this saved quite a few characters. I also renamed all my variables to shorten them. Then I saved and the program ran. Yeah!

Then I made another change, and the problem reappeared.

I decided that:
10 The program loaded as written to the tape. (Hard drive? Floppy disk? Never heard of them.)
20 If the program fit in memory, it would run.
30 When the program was loaded for editing, all tokens were expanded to the full command.
40 The program was saved as text, except...
50 If the tokenized version of a command was encountered, then it was saved as the token. I never figured out if they were saved as the 2 Hex number, the dollar sign and the number, or the 3-character shorthand "token" I typed.
60 GOTO 10 (and see if it runs.)

So every time I wanted to modify the longer programs, I had to change every command to the "token" format. (About half of my programs were under the 4KB limit, about half could be "fixed" using this technique, and a few were large enough that I never got them working again.) Any changes to the longer programs required 20 minutes of "tokenizing" the commands before saving it. That killed much of the fun of programming. (Today I get upset if a build takes longer than a game of Solitaire, but "getting upset" means deciding to fix the build process.)

Commodore bought BASIC from MS, and then modified it, so I do not know who to blame for the hours I wasted on this, but Commodore is gone and MS continues to take the fun out of computers, so I blame MS.

---
My next venture into computers was the C64. They had "Sprites". Half of the code in my games was controlling the graphics, and this improvement to the platform made that code obsolete. For the challenge, I upgraded one game to use Sprites. They took much of the fun out of it, and (IIRC) you were limited to 4 of them, so you had to play games (pun intended) to write PacMan. (4 Ghosts and PacMan required 5 Sprites. The dots and cherries would be handled without Sprites. It was easy to write a 3-ghost PacMan game, and really difficult to write a 4-ghost game.)

Since the C64s at school did not have tape drives, my old programs had to be typed in, if I had a printout from elementary school (no printer at home.) I already stated that I was lazy, so they are gone. Well, I still have the tapes, but they are 2 decades old, and my PC does not have a tape drive anyway.

--
I spend my life entertaining my brain.

Get programming as if SIZE matters... by Kazoo+the+Clown · 2004-05-06 08:40 · Score: 2, Insightful

I think size is a far more serious problem than speed-- just because I can put multi-gigabytes in my PC doesn't mean I want to waste them loading bloatware. In fact, I probably wouldn't NEED the multi-gigabytes if it wasn't for code bloat. Of course, Gates loves such things because the machine retailers love them-- easier to sell more memory to someone who needs it because they just tried to upgrade to XP or something.

So which compilers space-optimize by rolling loops instead of unrolling them?

Broken C++ code by d-rock · 2004-05-06 09:35 · Score: 2, Insightful

Actually, that could break C++ code that uses templates. There's a difference between

vector <pair <int, float> > myVector

and

vector<pair<int,float>>myVector

It's only subtle until you try to compile it :)

Derek

--
Don't Panic...

Re:That guy doesn't know ANYTHING about performanc by innosent · 2004-05-06 15:16 · Score: 2, Informative

Exactly, he says it's not about (discrete) mathematics, but when it comes down to what a programmer is supposed to do, it's all discrete math. You have a Turing machine (albeit limited), and the whole point is to do what needs to be done correctly and as efficiently as possible. Some things still need to be written the same way they were in "1985", unlike the author's view that optimal code doesn't matter.

Yes, machines are faster now, by at least an order of magnitude, but optimizing poor code can speed things up more than engineering a new processor.
Bubble Sort a list of one billion points of data (O(n^2) compares = k * (1e18)) on a new 3GHz machine (assuming 1 compare per clock), and you need about 3.333e8 (333333333.33) seconds (about 10.5 years).
Weak Heap (best), Quick, or Heap sort the same billion points (O(n*log n) compares = k * (~3e10)) on an old 486/33MHz (again assuming 1 compare per clock), and you need about 9.0909e2 (909.09) seconds (about 15 minutes).

There you go, the author can use the new Pentium 5s, and I guess the rest of us can go dig out our 486s. Sure, you don't often need to sort a billion records, but next time you do, make sure your algorithm is reasonable, then use a language that allows you to implement it. One-size-fits-all library, garbage collection, and run-time language error checking might be good for rapid development, but doing things efficiently requires lower-level interaction, sometimes even below what C allows. Bubble sort 2 records, and you won't see much benefit, but for performance critical sections of code, sometimes it's better to optimize first, then use comments to make it readable, than to write code that looks like comments. Even adding bounds checking to the sorts would at least double the time required, which in this case could mean anywhere from another 15 minutes up to the time left until you collect social security.
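
The back-of-the-envelope figures can be recomputed in a few lines. This is a sketch in Python under the same assumptions as above (one compare per clock, constant factor k = 1) plus one added assumption, that the logarithm is base 2; the names are made up for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_needed(compares, clock_hz):
    """Time to finish, assuming one compare per clock cycle and k = 1."""
    return compares / clock_hz


n = 1_000_000_000
bubble_seconds = seconds_needed(n ** 2, 3e9)            # n^2 compares at 3 GHz
nlogn_seconds = seconds_needed(n * math.log2(n), 33e6)  # n*log2(n) at 33 MHz

if __name__ == "__main__":
    years = bubble_seconds / SECONDS_PER_YEAR
    minutes = nlogn_seconds / 60
    print(f"bubble sort at 3 GHz: {years:.1f} years")
    print(f"n log n at 33 MHz:    {minutes:.1f} minutes")
```

Swapping in other clock speeds or record counts shows how quickly the n^2 term swamps any hardware advantage.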

--
--That's the point of being root, you can do anything you want, even if it's stupid.
