Slashdot Mirror


Programming As If Performance Mattered

Junks Jerzey writes "Saw the essay 'Programming as if Performance Mattered', by James Hague, mentioned at the Lambda the Ultimate programming language weblog. This is the first modern and sensible spin on how optimization has changed over the years. The big 'gotcha' in the middle caught me by surprise. An inspiring read." Hague begins: "Are particular performance problems perennial? Can we never escape them? This essay is an attempt to look at things from a different point of view, to put performance into proper perspective."

72 of 615 comments (clear)

  1. The question I always ask is by Anonymous Coward · · Score: 5, Insightful

    Is the time it takes me to do the performance optimization worth it in time or money.

    1. Re:The question I always ask is by irokitt · · Score: 4, Insightful

      Probably not, but if you are working on an open source project, we're counting on you to make it faster and better than the hired hands at $COMPANY. That's what makes OSS what it is.

      --
      If my answers frighten you, stop asking scary questions.
    2. Re:The question I always ask is by prockcore · · Score: 2, Insightful

      Is the time it takes me to do the performance optimization worth it in time or money.

      The question I ask is, can the server handle the load any other way? As far as my company is concerned, my time is worth nothing. They pay me either way. The only issue is, and will always be, will it work? Throwing more hardware at the problem has never solved a single performance problem, ever.

      We've been slashdotted twice. In the pipeline is a database request, a SOAP interaction, a custom apache module, and an XSLT transform.

      Our server never even came close to its breaking point. I attribute it to optimizing for performance.

    3. Re:The question I always ask is by bm_luethke · · Score: 3, Insightful

      "Is the time it takes me to do the performance optimization worth it in time or money."

      To a certain extent. I've seen that excuse for some pretty bad/slow code out there.

      Writing effecient and somewhat optimised code is like writing readable extensable code: if you design and write with that in mind you usually get 90% of it done for very very little (if any) extra work. Bolt it on later and you usually get a mess that doesn't actually do what you intented.

      A good programmer should always keep both clean code and fast code in mind while writing software.

      --
      ------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
    4. Re:The question I always ask is by maxwell+demon · · Score: 4, Insightful

      If working on OSS, clarity of the code has to be one of the top goals. Because if the code is not clear, you're less likely to find others interested in improving it.

      --
      The Tao of math: The numbers you can count are not the real numbers.
  2. What annoys me by Anonymous Coward · · Score: 4, Insightful

    is that ms word 4 did all I need, and now the newest office is a thousand times the size and uses so much more cpu and ram but does no more.

    a sad inditement

    1. Re:What annoys me by DrEasy · · Score: 5, Insightful
      And you aren't still using it why? (hint--your answer is the reason why MS 4 doesn't do all you need.)
      Or maybe because you are forced to upgrade to read files that were created with a more recent version?

      --
      "In our tactical decisions, we are operating contrary to our strategic interest."
    2. Re:What annoys me by Bush+Pig · · Score: 2, Insightful

      Probably the only thing Word 4 doesn't do that he needs is read the Word 97 (or whatever) files that other people keep sending him.

      --
      What a long, strange trip it's been.
  3. the software taketh what the hardware giveth. by equex · · Score: 4, Insightful

    i remember times when 7.14mhz and 256k ram was enough to drive a multitasking windowed os. (amiga)
    ive seen glenz vectors and roto-zoomers on the commodore 64.
    modern os's, escpecially windows seem super-sluggish when you see what is possible on those old computers if you just care to optimize the code to the max.

    --
    Can I light a sig ?
  4. You don't optimize, that's the job of the compiler by Anonymous Coward · · Score: 2, Insightful

    If you write clear and simple code the compiler or interpreter does all the other work. It will automatically remove unused code and simplify complex segments. So long as your code is not unnecessarily convoluted often the machine optimizations are better than the human brain optimizations. It's like register allocation. You don't do that by hand. That's just crazy! Some poor fools 20 years ago had to do it by hand and came up with an algorithm to do it that the computer just does for you.

    That's the difference between modern languages and more archaic ones. Sure you can't get the "absolute best" most optimized optimization, but you're probably going to get a better optimization than you can think of just from the interpreter/compiler doing its job.

    The only thing that really needs optimization is streamlining data structures because the compiler can't predict what part of the data structure isn't used during runtime. You just ned to make sure you use the right data structure for the job and put the basic pen-and-paper (optimized) algorithm down in plain code. No strange hacker tricks needed.

  5. Re:Funny thing about performance by corngrower · · Score: 5, Insightful
    Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.


    Assuming there is a second version, which there may not be because potential customers found that the performance of v1.0 sucked.

  6. Premature Optimization by Godeke · · Score: 4, Insightful

    One of the concepts touched upon is the idea that optimization is only needed after profiling. Having spent the last few years building a system that sees quite a bit of activity, I have to say that we have only had to optimize three times over the course of the project.

    The first was to get a SQL query to run faster: a simple matter of creating a view and supporting indexes.

    The second was also SQL related, but on a different level: the code was making many small queries to the same data structures. Simply pulling the relevant subset into a hash table and accessing it from there fixed that one.

    The most recent one was more complex: it was similar to the second SQL problem (lots of high overhead small queries) but with a more complex structure. Built an object to cache the data in with a set of hashes and "emulated" the MoveNext, EOF() ADO style access the code expected.

    We have also had minor performance issues with XML documents we throw around, may have to fix that in the future.

    Point? None of this is "low level optimization": it is simply reviewing the performance data we collect on the production system to determine where we spend the most time and making high level structural changes. In the case of SQL vs a hash based cache, we got a 10 fold speed increase simply by not heading back to the DB so often.

    Irony? There are plenty of other places where similar caches could be built, but you won't see me rushing out to do so. For the most part performance has held up in the face of thousands of users *without* resorting to even rudementry optimization. Modern hardware is scary fast for business applications.

    --
    Sig under construction since 1998.
  7. Re:Funny thing about performance by metlin · · Score: 4, Insightful

    Well said.

    However, I will dispute the claim that performance gains happen only at the hardware level - although programmers cannot really optimize every tiny bit, there is no harm in encouraging good programming.

    The thing is that a lot of programmers today have grown NOT to respect the need for performance - they just assume that the upcoming systems would have really fast processors and infinite amounts of RAM and diskspace, and write shitty code.

    I agree that like Knuth said, premature optimization is the root of all evil. However, writing absolutely non-optimized code is evil in itself - when a simple problem can be simplified in order and time, it's criminal not to :)

    A lot of times, programmers (mostly the non-CS folks who jumped the programming bandwagon) write really bad code, leaving a lot of room for optimization. IMHO, this is a very bad practice, something that we have not really been paying much attention to because we always have faster computers coming up.

    Maybe we never will hit the hardware barrier, I'm sure this will show through.

  8. Depends on your target by KalvinB · · Score: 4, Insightful

    Working on a heavily math based application speed is necessary to the point that the user is not expected to wait a significant amount of time without something happening. I have a large background in game programming working on crap systems and it comes in handy. My tolerance for delays goes to about half a second for a complete operation. It doesn't matter how many steps are needed to perform the operation, it just all has to be done in less than half a second on a 1200Mhz system. My main test of performance is seeing how long it takes for Mathematica to spit out an answer compared to my program. Mathematica brags about being the fastest and most accurate around.

    When operations take several seconds a user gets annoyed. The program is percieved to be junk and the user begins looking for something else that can do the job faster. It doesn't matter if productivity is actually enhanced. It just matters that it's percieved to be enhanced or that the potential is there.

    You also have to consider if the time taken to complete an operation is just because of laziness. If you can easily make it faster, there's little excuse not to.

    For distributed apps you have to consider the cost of hardware. It may cost several hours of labor to optimize but it may save you the cost of a system or few.

    In the world of games half a second per operation works out to 2 frames per second which is far from acceptible. Users expect at minimum 30 frames per second. It's up to the developer to decide what's the lowest system they'll try to get that target on.

    You have to consider the number of users that will have that system vs the amount it will cost to optimize the code that far.

    In terms of games you also have to consider that time wasted is time possibly better spent making the graphics look better. You could have an unoptimized mesh rendering routine, or a very fast one and time left over to apply all the latest bells and whistles the graphics card has to offer.

    There are countless factors in determining when something is optimized enough. Games more so than apps. Sometimes you just need to get it out the door and say "it's good enough."

    Ben

  9. Re:Don't agree by mfago · · Score: 4, Insightful

    Not what I got out of it at all, rather:

    Clear concise programs that allow the programmer to understand -- and easily modify -- what is really happening matter more than worrying about (often) irrelevant details. This is certainly influenced by the language chosen.

    e.g. I'm working on a large F77 program (ugh...) that I am certain would be much _faster_ in C++ simply because I could actually understand what the code was doing, rather than trying to trace through tens (if not hundreds) of goto statements. Not to mention actually being able to use CS concepts developed over the past 30 years...

  10. optimize with discretion by kaan · · Score: 2, Insightful

    All projects are an exercise in scheduling, and something is always bound to fall of the radar given the real-world time constraints. In my experience, the right thing to do is get a few smart people together to isolate any problem areas of the product, and try to determine whether that code might produce performance bottlenecks in high-demand situations. If you find any warning areas, throw your limited resources there. Don't fret too much about the rest of the product.

    In the business world, you have to satisfy market demands and thus cannot take an endless amount of time to produce a highly optimized product. However, unless you are Microsoft, it is very difficult to succeed by quickly shoving a slow pile of crap out the door and calling it "version 1".

    So where do you optimize? Where do you concentrate your limited amount of time before you miss the window of opportunity for your product?

    I know plenty of folks in academia who would scoff at what I'm about to say, but I'll say it anyway... just because something could be faster, doesn't mean it has to be. If you could spend X hours or Y days tweaking a piece of code to run faster, would it be worth it? Not necessarily. It depends on several things, and there's no really good formula, each case ought to be evaluated individually. For instance, if you're talking about a nightly maintenance task that runs between 2am and 4am when nobody is on the system, resource consumption doesn't matter, etc., then why bother making it run faster? If you have an answer, then good for you, but maybe you don't and should thus leave that 2 hour maintenanc task alone, spend your time doing something else.

    For people who are really into performance optimization, I say get into hardware design or academia, because the rest of the business world doesn't really seem to make time for "doing things right" (just an observation, not my opinion).

  11. One thing new programmers often miss by xant · · Score: 2, Insightful

    Less code is faster than more code! Simply put, it's easier to optimize if you can understand it, and it's easier to understand if there's not so much of it. But when you optimize code that didn't really need it, you usually add more code; more code leads to confusion and confusion leads to performance problems. THAT is the highly-counterintuitive reason premature optimization is bad: It's not because it makes your code harder to maintain, but because it makes your code slower.

    In a high-level interpreted language with nice syntax--mine is Python, not Erlang, but same arguments apply--it's easier to write clean, lean code. So high-level languages lead to (c)leaner code, which is faster code. I often find that choosing the right approach, and implementing it in an elegant way, I get performance far better than I was expecting. And if what I was expecting would have been "fast enough", I'm done -- without optimizing.

    --
    It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
  12. Re:speed/easy coding by ArbitraryConstant · · Score: 4, Insightful

    On the server side security is an issue (also on the client side, clearly). If your code isn't clear and correct, the number of bugs is likely to be higher than average, and bugs lead to exploits. Your libraries may be well written, I don't know specifically. It's possible to do both, just hard.

    --
    I rarely criticize things I don't care about.
  13. Code tweaking by Frequency+Domain · · Score: 5, Insightful
    You get way more mileage out of choosing an appropriate algorithm, e.g., an O(n log n) sort instead of O(n^2), than out of tweaking the code. Hmmm, kind of reminds me of the discussion about math in CS programs.

    Every time I'm tempted to start micro-optimizing, I remind myself of the following three simple rules:

    • 1) Don't.

    • 2) If you feel tempted to violate rule 1, at least wait until you've finished writing the program.
      3) Non-trivial programs are never finished.
  14. Re:speed/easy coding by edwdig · · Score: 3, Insightful

    I'd say his points are more true on the server side than the client side.

    Say you're a large business, and you have a mix of client side and server side applications. Both have significant processing time requirements Which do you spend more time optimizing?

    In this scenario, you're going to have a large number of client machines and a small number of servers. If servers need a little more power, you can upgrade the machine without too much disruption or money spent. The upgrade will benefit all users of the system. In this case, it's more cost effective to upgrade the server than it is to pay developers to optimize the hell out of the code.

    The client machines is a different story. There's a lot of machines in use. Upgrading any one will only help the user of that computer. Optimizing the code will help every user. In this case, paying a developer to optimize your code will be a lot cheaper than doing a company wide hardware upgrade.

    This is all of course assuming you're designing things well in the first place. Of course you should do things like use a quick sort (or whatever may be more appropriate in the case at hand) instead of a bubble sort. The point is its not worth spending days to get the last 1% of performance.

  15. Re:Funny thing about performance by naden · · Score: 2, Insightful

    Assuming there is a second version, which there may not be because potential customers found that the performance of v1.0 sucked.

    Better a version 1.0 that sucked than none at all.

    And funny how Microsoft seems to release so many crappy 1.0 releases yet usually ends up clawing back to become the market leader.

    --
    Funtage Factor: Purple
  16. Re:You don't optimize, that's the job of the compi by techno-vampire · · Score: 4, Insightful
    If you write clear and simple code the compiler or interpreter does all the other work.

    I remember looking over something once that was clear, simple and very slow. It was a set of at least twenty if statements, testing the input and setting a variable. The input was tested against values in numeric order, and the variable was set the same way. Not even else if's so that the code had to go through every statement no matter the value. I re-wrote it to a single if, testing to see if the input were in the appropriate range and calculating the variable's value. No compiler is going to do that. Brute force can be clear, simple and slow.

    --
    Good, inexpensive web hosting
  17. Performance, an aspect of design and understanding by StevenMaurer · · Score: 2, Insightful

    This article seems to be something that I learned twenty years ago... performance is an aspect of good design.

    That is why I insist on "optmization" in the beginning. Not peephole optimization - but design optimization. Designs (or "patterns" in the latest terminology) that are fast are also naturally simple. And simple - while hard to come up with initially - is easy to understand.

    But that's also why I discount any "high level language is easier" statement, like this fellow makes. It is significantly harder to come up with a good architecture than learning to handle a "hard" language. If you can't do the former (including understanding the concepts of resource allocation, threads, and other basic concepts), you certainly aren't going to do the latter. Visual Basic is not an inherently bad language because you can't program well in it. It just attracts bad programmers.

    And that goes the same for many of the newer "Basics": these "managed languages" that make it so that people can "code" without really understanding what they're doing. Sure, you can get lines of code that way. But you don't get a good product.

    And then the whole thing falls apart.

  18. Bad performance is built in. by BigZaphod · · Score: 2, Insightful

    There seems to be two basic causes of bad performance:

    1. Mathematically impossible to do it any other way.
    2. Modularity.

    Of course crap code/logic also counts, but it can be rewritten.

    The problem with modularity is that it forces us to break certain functions down at arbitrary points. This is handy for reusing code, of course, and it saves us a lot of work. Its the main reason we can build the huge systems we build today. However, it comes with a price.

    While I don't really know how to solve this practically, it could be solved by writing code that never ever calls other code. In other words, the entire program would be custom-written from beginning to end for this one purpose. Sort of like a novel which tells one complete story and is one unified and self-contained package.

    Programs are actually written more like chapters in the mother of all choose-your-own-adventure books. Trying to run the program causes an insane amount of page flipping for the computer (metaphorically and actually :-))

    Of course this approach is much more flexible and allows us to build off of the massive code that came before us, but it is also not a very efficient way to think about things.

    Personally, I think the languages are still the problem because of where they draw the line for abstractions. It limits you to thinking within very small boxes and forcing you to express yourself in limited ways. In other words, your painting can be as big as you want, but you only get one color (a single return value in many languages). It is like we're still stuck at the Model T stage of language development--it comes in any color you want as long as its black!

  19. Followed your link by ccoakley · · Score: 3, Insightful

    1. Is that your sig or is that part of your comment? If it is part of your comment, please explain why it would give me a whole new view on performance. If it's your sig, then spooky how it was related to the topic.

    2. Assuming your stuff is good, when are you going to code up SHA-1 (*MY* favorite hash)?

    3. On the server side of things, I would argue that correctness is more important than otherwise. If an app crashes 1 in 100 times for a desktop user, the developer blames windows and the user is satisfied (don't flame me on this, please). On the server, if the app crashes 1 in 100 times, it may bring down the transactions for 100s of users, making things very bad for the developer. For non-crash correctness problems, consider a problem which makes a minor, but cumulative error in subsequent runs. That would likely be disasterous for the server situation.

    As far as clarity, find me one developer who has taken over a project and not complained about the quality of the inherited code ever. Seriously. (that's not directed at parent)

    --
    Network Security: It always comes down to a big guy with a gun.
    1. Re:Followed your link by BlackHawk-666 · · Score: 5, Insightful
      When coding anything, msot of the code should be done to be fast

      This may be true when you are producing libraries of math routines and similar stuff like you are doing. It doesn't hold an ounce of water when you do the sort of work I do. My projects are generally medium sized, mixed languages, developers of all different skill levels. Code clarity is far more important for 98% of the stuff we do. I need my juniors to be able to follow the code the seniors write, even if they can't write it themselves. The other 2% of the time it's fine to sacrifice clarity for speed to get the performance to an acceptable level on the target platform.

      I have generally found that clear code is usually good code, so long as you are aware of the cost implications of your design decisions. For instance, I seem to recall the bubble sort (mentioned earlier) was actually faster than a qsort under some circumstances. Deep data knowledge would help you to make the decision as to which would need to be used...don't just reach for that qsort, it may be the fastest under most cases, but not all.

      --
      All those moments will be lost in time, like tears in rain.
  20. right for the wrong reasons by epine · · Score: 4, Insightful


    Sigh. One of the best sources of flamebait is being right for the wrong reasons.

    Surely C++ must rate as the least well understand language of all time. The horrors of C++ are almost entirely syntactic, beginning with the decision to maintain compatibility with the C language type declaration syntax and then adding several layers of abstraction complexity (most notably namespaces and templates).

    There are only two areas where I fear C++ for program correctness. The first is making a syntactic brain fart leading to an incorrect operator resolution or some such. These can be tedious to ferret out, but most of these battles are fought with the compiler long before a defect makes it into the production codebase.

    My second source of fear concerns interactions of exception unwinding across mixtures of object oriented and generic components. I see this as the only case where managed memory provides a significant advantage: where your program must incorporate exception handling. If you can't manage your memory correctly in the absence of the exception handling mechanism, I really don't believe you can code anything else in your application correctly either. I think exceptions are mostly a salvation for poor code structure. If all your code constructs are properly guarded, you don't need an error return path. Once a statement fails to achieve a precondition for the code that follows, the code path that follows will become a very efficient "do nothing" exercise until control is returned to a higher layer by the normal return path, whereupon the higher layer of control can perform tests about whether the objectives were achieved or not and take appropriate measures. I think the stupidest optimization in all of programming is cutting a "quick up" error return path that skips the normal path of program execution so that the normal path of execution can play fast and loose with guard predicates.

    The four languages I use regularly are C, C++, PHP, and Perl. Perl is the language I'm least fond of maintaining. Too many semantic edge cases that offer no compelling advantage to motivate remembering the quirk. C++ has many strange cases, but for C++ I can remember the vast majority of these well enough, because I've stopped to think about how they evolved from taking the C language as a starting point.

    I happen to love PHP for the property of being the most forgettable of all languages. I forget everything I know about PHP after every program I write, and it never slows me down the next time I sit down to write another PHP program. The managed memory model of PHP appeals to me in a way that Java doesn't, because as an inherently session-oriented programming model, PHP has a good excuse for behaving this way.

    I have a love/hate relationship with both C and C++. I write one program at a high level of abstraction in C++ and then when I return to C it feels like a breath of fresh air to live for a while in an abstraction free zone, until the first time I need to write a correctness safe string manipulation more complicated than a single sprintf, and then I scream in despair.

    The part of my brain that writes correct code writes correct code equally easily in all of these languages, with Perl 5 slightly in the rear.

    If I really really really want correct code I would always use C++. The genericity facilities of C++ create an entire dimension of correctness calculas with no analog in most other programming languages. The template type mechanism in C++ is a pure functional programming language just as hard core as Haskell, but because C++ is a multi-paradigm language, in C++ you only have to pull out the functional programming hammer for the slice of your problem where nothing less will do.

    What I won't dispute is that C++ is a hard language to master to the level of proficiency where it becomes correctness friendly. It demands a certain degree of meticulous typing skills (not typing = for ==). It demands an unflagging determination to master the sometim

  21. This isn't an article about optimization by the_skywise · · Score: 4, Insightful

    So much as an attempt to "prove" that programming to the metal is no longer necessary or desireable. (IE "After all, if a C++ programmer was truly concerned with reliability above all else, would he still be using C++?" )

    The analogy is all wrong. These days there are distinctly two types of "optimization". Algorithmic and the traditional "to the metal" style.

    During college I worked with the English department training English students to use computers as their work had to be done on a computer. (This was before laptops were commonplace) The theory was that word processing allowed students a new window into language communication. To be able to quickly and painlessly reorganize phrases, sentences and paragraphs showed the students how context, clarity and meaning could change just by moving stuff around.

    This is what the author has discovered. That by being able to move code actions around, he can experiment and "play" with the algorithm to boost speed while keeping error introduction to a minimum. (Ye olde basic anyone?)

    He mistakenly equates this to "advanced technologies" like virtual machines and automatic memory buffer checking. In reality, we've just removed the "advanced technologies" from the process. (IE Like pointers, dynamic memory allocation, etc) (IE, ye olde basic anyone?)

    There's nothing wrong with this. Though I am a C++ programmer by trade, I was far more productive when I was professionally programming Java. But that was because I had LESS creative control over the solution because of the language syntax. No passed in variable changing, no multiple inheritance, etc. So I'm thinking of how to layout the code, there's pretty much a limited way of how I'm going to go about doing that.

    It's like the difference between having the Crayola box of 8 crayons and the mondo-uber box of 64. If you're going to color the green grass with the box of 8, you've got: Green. If you've got 64 colors, you're going to agonize over blue-green, green-blue, lime green, yellow-green, pine green and GREEN.

    That doesn't make C++ less "safe" than Java. Sure, you can overwrite memory. But you can also create a Memory class in C++ ONCE which will monitor the overflow situation FOR you and never have to worry again.

    But back to optimization:
    66 fps seems really fast. But in game context it's still kind of meaningless. Here's why. You're not just displaying uncompressed images. You're also doing AI, physics, scoring, digital sound generation, dynamic music, User input, possibly networking. As a game programmer, you don't stop at 66 fps. Because if you do 132 fps, then you can really do 66 fps, and still have half a second left over to do some smarter AI or pathfind. Or if you get it up to 264 fps than you can spend 1/4 of the cycle doing rendering, maybe you can add true Dynamic voice synthesis so you don't have to prerecord all your speech!

    Ultimately, my point is this. (and I think this is what the author intended) You're going to get bugs in whatever language you write in. That's the nature of the beast. VM's and 4th generation languages take away the nitty gritty of programming while still providing alot of performance power. And in alot of cases, that's a good thing. But it's still nothing more than a "model" of what's really going on in the hardware. If you're really going to push the limits of the machine you have to be able to control all aspects of it. Now, it's getting harder to do that in Windows. We spend more time coding to the OS than the metal. But in the embeddes systems category, and in console video game systems the metal still reigns and if you're going to develop a game that will push the hardware, you're going to need a programming language that will let you speak machine language. Not one that's going to protect you from yourself.

    As it was in the beginning, as it always will be: Right tool for the right job.

    1. Re:This isn't an article about optimization by Anonymous Coward · · Score: 2, Insightful
      But it's still nothing more than a "model" of what's really going on in the hardware. If you're really going to push the limits of the machine you have to be able to control all aspects of it.


      For instance, the famous 'Goto is considered harmful'.

      In actual machine code, the processor's equivalent of 'goto' (usually called a 'jump') is one of the most common operations...

      Another way of looking at this is antilock brakes in cars.

      It's not so much that the 'new' way of doing things is really any better to a skilled user. But they sure help reduce headaches caused by a lack of skill on the part of a new and/or less talented person.
  22. Depends on the design and the bottleneck by www.sorehands.com · · Score: 2, Insightful

    What you say makes sense, but is completely wrong.

    You have to consider the entire system design when looking at the bestplace to make the optimization. You need to look at what the bottleneck and attack that, but keep in mind the issue in upgrading the system.

  23. Re:Throw hardware at it. by defile · · Score: 3, Insightful

    Well, that depends.

    You probably picked the simplest, dumbest algorithm and probably used the most basic data structure. Why do all of the hard work when you don't even know if the easy work will suffice?

    If they don't suffice, your options are to develop your own algorithm/find a better one and a more natural data structure, or to throw hardware at it. Chances are, you won't be lucky enough that you can just upgrade so you'll have to spend valuable programmer time implementing a more complex algorithm that will need more careful maintenance that is likely to have more bugs that is probably less robust. You'll probably have to convert the data to a more machine-friendly format. Maybe you'll have to inconvenience the user or ship a lot of precompiled data. Whatever.

    It's rare that the easy algorithm is slow enough that it won't do as-is, but fast enough that doubling cpu power makes it tolerable. Usually there are orders of magnitude differences between the "best" algorithm and the easy algorithm, and only incremental speed bumps in computer offerings.

    On the other hand, maybe with an extra GB of RAM you'll never have to touch swap. Maybe that's good enough. ;)

  24. Re:speed/easy coding by Lord+Kano · · Score: 5, Insightful

    Really? How about the server side of things?

    On the server side, I'd say that correctness and clarity are even more important. I guess it's all a matter of opinion as to where the "sweet spot" is, but most programming involves finding the right balance between speed and clarity.

    If you're in a situation where you need the servers to process large amounts of data, you're most likely in a position to be able to justify the expense of throwing better hardware at the problem.

    LK

    --
    "Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
  25. Re:Funny thing about performance by kubrick · · Score: 5, Insightful

    All Microsoft have to do is pre-announce features that won't be in their products until v3, well before the release of v1 (vide Go Corporation & Pen Windows), and that's enough to kill off the competition. Microsoft's success has become a self-fulfilling prophect for most of the market these days...

    --
    deus does not exist but if he does
  26. Asymptotic performance by alanwj · · Score: 4, Insightful

    The problem, in my opinion, is that people go about optimizing in the wrong place.

    You can spend all day optimizing your code to never have a cache-miss, a branch misprediction, divisions, square roots, or any other "slow" things. But if you designed an O(n^2) algorithm, my non-optimized O(n) algorithm is still going to beat it (for sufficiently large n).

    If the asymptotic performance of your algorithm is good, then the author is right, and you may not find it worth your time to worry about further optimizations. If the asymptotic performance of your algorithm is bad, you may quickly find that moving it to better hardware doesn't help you so much.

    Alan

    1. Re:Asymptotic performance by Flyboy+Connor · · Score: 4, Insightful
      To put it in different terms: Optimisation is in finding a good algorithm, not in tweaking code details.

      To give a nice example: a colleague of mine worked on a program that took two months to execute (it consisted of finding the depth of all connections between all nodes in a graph containing 50,000 nodes). Since the customer needed to run this program once a month, this took far too long. So my colleague rewrote the whole program in assembly, which took him a few months, managing to reduce the required time to, indeed, one month.

      My boss then asked me to take a look at it. Together with a mathematician I analysed the central function of the program, and we noticed that it was, basically, a matrix multiplication. We rewrote the program in Delphi in an hour or so, and reduced the required running time to less than an hour.

      I won't spell out the lesson.

    2. Re:Asymptotic performance by lahi · · Score: 2, Insightful

      I disagree: Finding a good algorithm (indeed, finding the *best* algorithm for a task), is merely good programming. (And *inventing* a good algorithm is *excellent* programming!) Implementing it in the best possible manner, including applying shortcuts which are known to be possible due to knowledge of the specific task to which the algorithm is applied, is optimising.

      You might be a good optimiser, or you might just be a good programmer. Your colleague however, is a bad programmer.

      -Lasse

  27. Re:Don't agree by Frobnicator · · Score: 5, Insightful
    Just now I'm working on an econometric model ... in real life applications, playing with a Targa file is not the same as service critical, 300 users, number crunching, data handling systems, where a small performance improvement must be multiplied by the number of users/uses, by many many hours of operation and by years in service to understand its true impact. ... I'm playing with over 300 variables and 100 parameters to simulate dozens of different scenarios can make any server beg for more cycles, and any user beg for a crystal ball.

    I don't think that fits into the description the article was talking about.

    The point of this article is not targeted to you. I've seen interns as recent as last year complain about the same things mentioned in the article: division is slow, floating point is slow, missed branch prediction is slow, use MMX whenever more than one float is used, etc.

    The point I get out of the article is not to bother with what is wasteful at a low level, but be concerned about the high levels. A common one I've seen lately is young programmers trying to pack all their floats into SSE2. Since that computation was not slow to begin with, they wonder why all their 'improvements' didn't speed up the code. Even the fact that they are doing a few hundred unneccessary matrix ops (each taking a few hundred CPU cycles) didn't show up on profiling. Their basic algorithm in a few cases I'm thinking about are either very wasteful, or could have been improved by a few minor adjustments.

    The article mentions some basic techniques: choosing a different algorithm, pruning data, caching a few previously computed results, finding commonalities in data to improve the altorithm. Those are timeless techniques, which you probably have already learned since you work on such a big system. Writing your code so that you can find and easily implement high-level changes; that's generally more important than rewriting some specific block of code to run in the fewest CPU cycles.

    A very specific example. At the last place I worked, there was one eager asm coder who write template specializations on most of the classes in the STL for intrinsic types in pure asm. His code was high quality, and had very few bugs. He re-wrote memory management so there were almost no calls to the OS for memory. When we used his libraries, it DID result in some speed gains, and it was enough to notice on wall-clock time.

    However... Unlike his spending hundreds of hours on this low-return fruit, I could spend a day with a profiler, find one of the slower-running functions or pieces of functionality, figure out what made it slow, and make some small improvements. Usually, a little work on 'low-hanging fruit', stuff that gives a lot of result for a little bit of work, is the best place to look. For repeatedly computed values, I would sometimes cache a few results. Other times, I might see if there is some few system functions that can be made to do the same work. On math-heavy functions, there were times when I'd look for a better solution or 'accurate enough but much faster' solution using calculus. I'd never spend more than two days optimizing a bit of functionality, and I'd get better results than our 'optimize it in asm' guru.

    Yes, I would spend a little time thinking about data locality (stuff in the CPU cache vs. ram) but typically that doesn't give me the biggest bang for the buck. But I'm not inherently wasteful, either. I still invert and multiply rather than divide (it's a habit), but I know that we have automatic vectorizers and both high-level and low-level optimizers in our compilers, and an out-of-order core with AT LEAST two floating point, two integer, and one memory interface unit.

    And did I mention, I write software with realtime constraints; I'm constantly fighting my co-workers over CPU quotas. I read and refer to the intel and AMD processor documentation, but usually only to see which high-level functionality best lends itself to the hardware. I am tempted to 'go straight to the metal' occasionally, or to count the CPU cycles of everything, but I know that I can get bigger gains elsewhere. That's what the point of the article is, I believe.

    --
    //TODO: Think of witty sig statement
  28. Re:This guy is out on a limb by ibullard · · Score: 2, Insightful
    That should read "a 512x448 image 30-60 TIMES a second.

    I should have ended the post by typing "HEY, GRAMMER FREAKS! LOOK AT ME! I SUCK!" instead of writing that last sentence.

  29. Quick, go get Fortran 95. by mbkennel · · Score: 3, Insightful

    Do NOT convert to C++ under any circumstances!

    Fortran 77 sucks.

    But C++ sucks, in different ways.

    Fortran 95 is a much better language than Fortran 77, and for many things, better than C++ as well.

    It is practically a new language with an old name.

    If you currently have a F77 code, it is almost certainly far better to start using Fortran 95.

    Essentially all Fortran 95 implementations have compile and run-time checks which can make things as safe as Java, and when you take off the checks, things will run very fast. With the Intel Fortran 8.0, probably faster than anything other than hand-tuned assembly. You will probably whip GCC C++.

    It is also quite doubtful you will get significantly better performance in C++.

    No, I am not an old-fogey (I'm 35 now, programming since age 13). I learned Basic on the Apple II+, then Fortran 77 and C simultaneously when I got my first summer job. Then C++. Then Eiffel and Sather. Then Fortran 95.

    Yes, indeed, fully knowing C++ I choose Fortran 95 for technical superiority in the problems I want and need to solve. (Sather was the best ever, but now dead, Eiffel good if you have sophisticated data structures and you don't need multi-dim arrays, and F95 best for any linkage to Fortran and multi-dim arrays, modules but not objects).

    The problem is that C++ bugs, though less frequent than bugs in C, are can be deep, subtle and severe. The language has very opaque bits. Include files are antideluvian. Pointers and references, baroque and archaic. Object model brittle. Templates powerful and dangerous. A hideous and error-prone syntax.

    This is not the case in Fortran 95. Other than fully algorithmic bugs are shallow.

    "computer science" truly misunderestimates Fortran 95.

  30. Re:You don't optimize, that's the job of the compi by oskillator · · Score: 3, Insightful
    If you write clear and simple code the compiler or interpreter does all the other work. It will automatically remove unused code and simplify complex segments. So long as your code is not unnecessarily convoluted often the machine optimizations are better than the human brain optimizations.

    A compiler can do low-level optimization, but it can't figure out a better algorithm for you, and the simplest, least convoluted algorithm is usually not the fastest.

    All the assembly language fiddling in the world -- by the optimizer or by hand -- will give you maybe a 2x performance over C, 10x over perl, but a better algorithm will often increase performance by many orders of magnitude.

  31. Re:Clarity in Code by Designadrug · · Score: 2, Insightful
    As far as clarity, find me one developer who has taken over a project and not complained about the quality of the inherited code ever.

    Guilty. But at least I'm thinking about the poor SOB who's going to be maintaining my code. In fact we were just implementing new functionality using a superfast but arcane algorithm and were having trouble debugging it (mucho matrix maths - yuk). Instead of finishing that, we researched another algorithm that instead uses triply-nested loops with two conditionals. It won't be half as fast (because of conditionals within loops of course) but it will be a heck of a lot easier for my successor to maintain. (Took 10 minutes to implement and worked first time. Had to check I wasn't stuck in a BTL simulation)

    --
    Cogitum Ergo Hatto
  32. Re:speed/easy coding by maxwell+demon · · Score: 2, Insightful

    At the server, correctness matters even more: A slow server may get overloaded by too many requests, but a fast but incorrect server process may be a security problem.

    Of course, a correct and fast server is much better than a correct and slow server.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  33. Re:This guy is out on a limb by prockcore · · Score: 3, Insightful

    Yeah, as long as you write simple, 2D games(like the author of the essay does) that would be true.

    Not only that, but even simple 2d games can need optimizing. Perhaps they need optimizing because they're on an inherently slow platform (like Flash or a cell phone), or perhaps they need optimizing because they're multiplayer (and games with bad network code are immediately obvious and usually fail miserably)

    I find it strange that so many programmers here talk about things being "fast enough" or "not worth my time"... yet any article about mozilla, openoffice, windows, osx, damn near any software package with a gui is filled with complaints about slowness and bloat.

    Makes you wonder what IS worth their time.

  34. Re:Don't agree by alex_tibbles · · Score: 2, Insightful

    Exactly. The point of the article is (as someone else pointed out) is that clear, high-level code is easy to optimize, since it is easy to understand, and thus it's easy to reason about the code.
    It doesn't sound like any low level work is going to get you anywhere in your simulation. The best bet is to buy lots of hardware to brute-force it...
    ... or get smart! Reason about the problem. Is it important to evaluate the function for all possible combinations of all possible values all the variables and parameters? Is there any hidden constraint or relationship between those variables? The economic model which provides those variables may well have made a distinction between two variables, or an assumption of the independence of two variables, which is not relevant to your modelling.
    Or might statistics/ heuristics help? Picking the most likely region(s) (based on other theory) for computation, calculating that (those) first, and then working out into (assuming probability is smooth) less likely region(s).
    As the article points out, these kinds of optimizations (very similar conceptually to those in the article) are those easiest to do in a very high-level language.

  35. wtf by fred+fleenblat · · Score: 4, Insightful

    The article made it sound like the optimizations he was doing at the erlang level were somehow "better" than optimizations done in a language like C++ because he could just try out new techniques w/o worrying about correctness. His array bounds and types would be checked and all would be good.

    BS.

    First of all, erlang won't catch logical or algorithm errors, which are quite common when you're optimizing.

    Second, you can optimize just fine in C++ the same way just as easily, IF YOU ARE A C++ programmer. You just try out some new techniques the same way you always do. So array bounds aren't checked. You get used to it and you just stop making that kind of mistake or else you get good at debugging it. Hey at least you have static type checking.

    In fact you might be able to do a better job of optimization because you'll be able to see, right in front of you, low level opportunities for optimization and high level ones also. C++ programmers aren't automatically stupid and blinded by some 1:1 source line to assembly line ratio requirement.

  36. Re:You don't optimize, that's the job of the compi by Pseudonym · · Score: 4, Insightful

    Wrong. Dead wrong.

    You don't micro-optimise unless the compiler doesn't do the job well enough. But nowadays, you almost never have to. Your superior brainpower can mostly be freed from the mundane details of your hardware and instead you can concentrate on using more suitable algorithms or data structures.

    Indeed, the best thing you can do to get your code running fast is to write it with good abstractions. That way, when you find a performance problem, you can swap some old code out and swap some new code in and everything else will still work.

    --
    sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  37. Re:Funny thing about performance by almaw · · Score: 4, Insightful

    > Performance gains occur at the hardware level.
    > Any tendency to optimize prematurely ought to be
    > avoided, at least until after v1.0 ships.

    Performance gains occur at the algorithm level. It doesn't matter how much hardware you throw at a problem if it needs to scale properly and you have an O(n^3) solution.

  38. Persuasive, widely accepted, and wrong. by Anonymous Coward · · Score: 1, Insightful
    20 years ago these arguments would have been revolutionary, and partly correct.

    Today they're widely accepted. Most of the responses on /. agree with the basic premise.

    And that's probably why my 2GHz cpu doesn't really feel much snappier than the first computer I ever had, a 10MHz 286 box.


  39. Re:Throw hardware at it. by dtfinch · · Score: 1, Insightful

    To answer your question, whichever's cheapest, but taking into account that hardware costs may be recurring as you deploy the code on other systems, adding more servers only increases the volume that can be handled, and might not improve the response time of a slow loading page, and slow pages push away customers, which costs money. But the professor's right about waiting on many optimizations until you're finished. Code changes, and code you've spend hours optimizing may not make it into the final product, or may need to be altered in ways that require some optimizations to be undone.

    Most of my server side code (javascript) works fast enough on the first try, with most of the optimization being in the design, rather than in coding tweaks. I'll do other simple optimizations as I see them if there's a noticable slowdown and they won't adversely affect the clarity, maintainability, or reusability of the code. My biggest optimization on recent sites was an AES encryption/decryption library written in javascript, which I managed to speed up several fold. I've done research to determine the optimal tweaks for our servers, to get speedups without writing better code. Client side things that work fast enough from the start I don't bother to optimize. 90% of the time, reusable code is much better than efficient code. Most of what I write is 10x as fast as it needs to be, and with our web sites, bandwidth is the most limiting factor. I do optimize a bit to reduce bandwidth, but not excessively. And we spend roughly $0 a year on server hardware, and have since probably the year 2000.

    There's one last optimization that we're planning on making that will eliminate 99% of the need for future optimization, allow us to write horribly innefficient code in favor of code reuse (such great reuse that our non-programmers can jump into development), and it'll give us maximum performance from our database driven web sites. It's a home-grown site generator, for all the pages that are created from a database but are otherwise static. The use of static pages means they can be gzipped and cached on the server, allowing for near instantaneous page loads with low server overhead.

    Then there's the CS side of the fence:

    Right now I'm working on a capstone project that will be one big exercise in optimization, being extremely 3d, memory, cpu, and hard disk intensive, working with data that's to big to fit into memory without culling but needs to be rendered at high frame rates anyway, so everything I just said can go out the window for the next couple months.

  40. Re:Funny thing about performance by Moraelin · · Score: 5, Insightful

    Well, yes and no.

    I still don't think you should start doing every single silly trick in your code, like unrolling loops by hand, unless there's a provable need to do so. Write clearly, use comments, and use a profiler to see what needs to be optimized.

    That is coming from someone who used to write assembly, btw.

    But here's the other side of the coin: I don't think he included better algorithms in the "premature optimization". And the same goes for having some clue of your underlying machine and architecture. And there's where most of the problem lies nowadays.

    E.g., there is no way in heck that an O(n * n) algorithm can beat an O(log(n)) algorithm for large data sets, and data sets _are_ getting larger. No matter how much loop unrolling you do, no matter how you cleverly replaced the loops to count downwards, it just won't. At best you'll manage to fool yourself that it runs fast enough on those 100 record test cases. Then it goes productive with a database with 350,000 records. (And that's a small one nowadays.) Poof, it needs two days to complete now.

    And no hardware in the world will save you from that kind of a performance problem.

    E.g., if most of the program's time is spent waiting for a database, there's no point in unrolling loops and such. You'll save... what? 100 CPU cycles, when you wait 100,000,000 cycles or more for a single SQL query? On the other hand, you'd be surprised how much of a difference can it make if you retrieve the data in a single SQL query, instead of causing a flurry of 1000 individual connect-query-close sequences.

    (And you'd also be surprised how many clueless monkeys design their architecture without ever thinking of the database. They end up with a beautiful class architecture on paper, but a catastrophic flurry of querries when they actually have to read and write it.)

    E.g., if you're using EJB, it's a pointless exercise to optimize 100 CPU cycles away, when the RMI/IIOP remote call's overhead is at least somewhere between 1,000,000 and 2,000,000 CPU cycles by itself. That is, assuming that you don't also have network latency adding to that RPC time. On the other hand, optimizing the very design of your application, so it only uses 1 or 2 RPC calls, instead of a flurry of 1000 remote calls to individual getters and setters... well, that might just make or break the performance.

    (And again, you'd be surprised how many people don't even know that those overheads exist. Much less actually design with them in mind.)

    So in a nutshell, what I'm saying is: Optimize the algorithm and design, before you jump in to do silly micro-level tricks. That's where the real money is.

    --
    A polar bear is a cartesian bear after a coordinate transform.
  41. Re:me too... by StrawberryFrog · · Score: 1, Insightful

    The program source code gets smaller (by as many bytes as the removed whitespace occupied),

    Who cares? I'll bet that for a typical techie, the collection of all the source code they've ever written ever is smaller than thier current MP3 collection. By at least one order of magnitiude. Give me readability.


    and compiles faster (because the parser doesn't have to read and ignore all those whitespace characters).


    Technically you are right. However this effect will be utterly negligable next to the things that really take up time when compiling a c++ program, such as preprocessing #defines, expanding tempate code, checking types, resolving variable references, and generating machine code. c++ compilers are generally quite slow (as compared to say, Borland Delphi) this is due to the complexities and terseness of the language, not the whitespace.

    --

    My Karma: ran over your Dogma
    StrawberryFrog

  42. Re:This guy is out on a limb by julesh · · Score: 2, Insightful

    Mozilla is slow because its GUI is written using a flexible script interpreter, rather than being hard coded in C++.

    I don't know why OpenOffice is slow, I've never analysed the way it works in enough detail. I'm sure the reason is fairly obvious to anyone who knows the code base well enough to comment.

    Windows isn't really slow, but has some annoying features that have been added recently that can slow you down; for instance in the user interface it will try to open files of certain types to display information about them whenever you perform any operations on them, which isn't exactly helpful if opening the file is too slow...

    My only experience of using OSX is that it's blindingly fast. But I've used it for about 3 hours, so that's hardly conclusive.

    But you see the pattern -- these systems are slowed down by features that are _necessarily_ slow. You couldn't have the same features without the performance problems they bring. Windows can't give you a preview of an image, or tell you how many files are in an archive, without opening it (although I _really_ wish it'd do it in another thread...). Mozilla can't support its really easy-to-write user interface extensions without an interpreted UI.

    The people complaining are people who don't actually want these features, and don't see why they should suffer for them, which is a fair point.

  43. Re:Funny thing about performance by hankaholic · · Score: 3, Insightful

    I was going to moderate this post "Overrated", but I'd rather just explain why you're wrong in stating that the "algorithm that was going to hold back your competition runs perfectly fast on new processors".

    Certain algorithms take more-than-proportionately longer as the data size increases. For example, if you're writing route-planning software, each additional stop on a route might cause the number of calculations required to (roughly) double.

    In such a case, having hardware which is twice as powerful would mean that performance would half, although as soon as the user added two more data points, the performance would be slower than the original machine.

    To clarify a tad, let's say FedEx decides to optimize the routes drivers in Montana are travelling. Assume that there are 10,000 stops and 200 drivers, and that your code runs in, say, an hour on FedEx's machines.

    Assume that you've used an algorithm for which each additional data point doubles the amount of computation required. Now FedEx deciding to hire 10 more drivers means that your route planning software is going to take 2^10 times as long to plan their routes (since it doubles for each new data point, that's 2^1 for one driver, 2^2 for two, 2^3 for three...).

    The point is that tiny operations add up when you've chosen the wrong algorithm. Despite the fact that runtime was fine using FedEx's CPU farm in the original situation, your disregard for efficiency will cause the route-planning time to take not the overnight-batch-job-friendly hour, but a stunning 1024 times as long (hint: over a month).

    Say a new big fast machine enters the market, with four times the CPU power. FedEx will still need 256 times as many machines to perform the same calculations in under an hour, or at least, say, 32 times as many in order to be able to perform them overnight.

    All because you decided that choosing algorithms based on performance was poppycock.

    Prematurely optimizing on a microscopic level may be "bad", but choosing the proper algorithm can make the difference between a product with a future and a product with too many designed-in limitations to be able to handle greater-than-expected loads.

    (CS fans will note that the TSP problem was a unrefined to have pulled out given the whole P/NP thing, but that's the point -- sticky situations can and will arise for which no amount of source-level optimization will save the day.)

    --
    Somebody get that guy an ambulance!
  44. Somebody doesn't understand O notation... by Theatetus · · Score: 2, Insightful
    And quicksort will work just fine too. Sometimes O(n^2) will *not* work. Therefore never use bubblesort.

    You totally missed the point, didn't you? There are situations where a bubble sort is faster than a merge sort or a quicksort. It has almost no setup overhead, so if you're sorting sufficiently small arrays (and what I remember from CS101 is that "sufficiently small" goes up to about 1000 members) bubble sort is actually significantly faster.

    So, as a matter of fact, if you had to sort a million small arrays, bubble sort would be the only feasible option.

    --
    All's true that is mistrusted
    1. Re:Somebody doesn't understand O notation... by Eivind+Eklund · · Score: 4, Insightful
      The previous post miss out on many aspects of algorithmic optimization, and lead to the wrong conclusions.

      For a better analysis of optimization in this specific part of the sort space, I recommend Jon Bentley's classic "Engineering a sort function".

      This paper discuss how to implement an optimal sort, after having done real-life measurements. Conclusions include dropping to an O(N^2) sort algorithm when qsort partitions become small enough - insertion sort was choosen. (The selected cut off was secven elements at that point; it may be that it would be sensible to choose a higher cutoff for the generic case now, as the cache locality might help. However, I won't bet on this either way without doing measurements.)

      The qsort implemented there is the one still used in at least FreeBSD. I don't know the status for other OSen.

      As for big O notation: The discussion in the previous post is so imprecise as to be misleading. It use "cost" and "complexity" where it discuss asymptotic complexity; these are distinctly different, and it is necessary to be quite clear on the distinctions to do correct analyses.

      Big-O notation measure asymptotic complexity over an arbitrarily selected set of basic operations assumed to have unit cost. It discard all constants to make the analysis easy to do and easy to work with. This is a useful tool, but it only measure asymptotic complexity, and it only does it based on arbitrary basic operations.

      In practice, a mere factor 1000 speed difference (one second to twenty minutes) might be quite noticable. This will be REMOVED from the big-O analysis, which can make it point in a quite different direction from the truth.

      In the parent post, sorting 1000 elements is assigned a unit cost, claiming that the time will be similar for a bubble sort and a quick sort, and "low enough not to matter". Further, the conclusion is "never use bubble sort". Assuming a naive implementation of both bubble sort and quick sort, and a set of arrays that is already sorted, the quicksort will be O(N^2) and the bubble sort will be O(N) in the number of items in each bin. This is a quite noticable difference in asymptotic complexity.

      A naive programmer is in my opinion the only relevant assumption if we're to give absolute advice on simple sort functions. A non-naive programmer will know how to do complexity evaluation, will know the tradeoffs on startup of the various algorithms, and will only be implementing a sort him- or herself because actual speed measurements or specific knowledge of the sort behaviour show that the system supplied sort is not fast enough for the case in question, and that a custom sort can do better. (S)he will also evaluate whether the data to sort is likely to be almost sorted or highly random, and thus which kind of algorithm is likely to go faster. (And insertion sort/bubble sort is actually faster also for large data sets if they're almost sorted beforehand.)

      Eivind, who if he had to give general advice would give "evaluate qsort, mergesort, heapsort, insertion sort, and using a data structure that keeps order before choosing bubble sort."

      --
      Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
  45. Those who code so no one can do their work... by Anonymous Coward · · Score: 2, Insightful
    I tell all programmers who work for me and let all who work with me know this:

    Given that the only reason to deliberately make it hard for others to understand your work is to increase your job security, that must mean that you don't think you bring enough other skills to the job to keep it on merit.

    In other words, you don't think you're good enough.

    And given that most programmers think they are better than they actually are, if you don't think your good enough, why the hell should anyone else?

  46. JIT optimization is just peephole optimization by Speare · · Score: 2, Insightful

    People keep saying that the JIT-style optimizers in .NET and Java can radically optimize the application "for programmers who can't or won't."

    Peephole optimization and clock-scheduling are among the simplest of optimization. The machine looks at a few low-level instructions and might suggest an alternative which would operate identically but with better performance. That's really all that the VM has time or capability to perform today.

    Mid-range optimizations include vectorizing, unrolling of loops, and register reduction. These are still machine-analyzable, so I expect the JIT-style optimizers to continue to make strides here.

    But I don't think you're ever going to see JIT-style optimizers which replace an O(n^2) algorithm with an O(log n) algorithm. That is real optimization. That's where you win the performance races. That's the one that programmers should care about, and should learn how to do. The level of analysis required to "divine" the whole meaning of a large routine, realize the alternative algorithm equivalent, and fix up the code is far beyond any JIT solution.

    I think we will have to wait far longer than the 6 GHz Longhorn machines before you see any meaningful machine optimization of sloppy code.

    --
    [ .sig file not found ]
  47. Re:Funny thing about performance by Vengie · · Score: 3, Insightful

    You're ignoring constants. Constants can sometimes be large. That is why strassen's matrix multiply method takes longer than the naive method on small matricies.

    Scarily, you have just enough knowledge to sound like you know what you're talking about. Sometimes it DOES matter how much hardware you throw at the problem, lest you forget the specialized hardware DESIGNED to crack DES.

    How about your next computer I replace all the carry-lookahead adders with ripple-carry adders? Please look up those terms if you don't know them. I'm sure you'd be unpleasantly surprised.

    --
    When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in vi. (Larry Wall)
  48. Re:Throw hardware at it. by EastCoastSurfer · · Score: 3, Insightful

    I hesistate to first throw hardware at the problem, but I do agree that optimizations generally should be left as the last thing to do in a project. Code should be written first to be readable and correct. Once those goals have been met, testing and profiling will find the few areas that are critical and may need some optimization.

    The problem your prof is probably trying to get you to avoid is wasting time tuning code that rarely gets executed. It comes down to the old 80/20 rule. Sure, you can spend weeks hand tuning some import routine, but all your time was wasted if that import is only run once a month, at night while the system is offline.

  49. Re:me too... by Surt · · Score: 2, Insightful

    Worse, the time it take him to delete one space or tab will always be much longer than the time saved in the parser/compiler.

    --
    "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  50. This is a well-known problem, by warrax_666 · · Score: 2, Insightful

    especially in databases where the data set that you have to sort is often so big that it doesn't fit into memory. The (usual) solution is to use a variation on the well-known Merge Sort algorithm, where blocks are merged into larger and larger "runs" of sorted data (which are then merged). (The number of runs of course depends on how much data there is and how much memory you have).

    --
    HAND.
  51. Re:Throw hardware at it. by arkanes · · Score: 3, Insightful
    The most important reason to wait is because, almost inevitably, the part that you THINK is slow is not the part that actually hangs you up. You may spend an extra couple days working on you super-fast optimized sort & data structure only to find when you deploy that your bottleneck is RAM usage and all your clever caching is just slowing stuff down. Another good example is earlier in this thread, with the super-fast optimized MD5 libraries - spending money or time writing/buying those libraries if your data set is IO bound doesn't make much sense.

    Optimization is great, but profiling to make sure that your optimization isn't wasted is more important.

  52. Re:Performance tuning. by Glonoinha · · Score: 5, Insightful

    Problem is that your client is still running on the prototype of their project, not the real release. They just don't know it, and I'm guessing the original programmers don't know it.

    The most effective, well used (if unintentionally used) development methodology is the prototype methodology. The first pass is simply a reality check, can we even accomplish what needs to be accomplished on the hardware and development tool we have available? The prototype is then shown to management as a proof of concept, show them that their ideas are possible, and then a second generation is re-engineered from the ground up using the lessons learned in the first generation as a foundation for a solid, well engineered deliverable product. This breaks down in one of two ways : management says screw the rewrite, lets just run what we have - or the developers are not smart enough to understand that their first pass at it wasn't production quality code, only a prototype.

    What your client has right now is a prototype, a proof of concept. It 'works' inasmuch as a kite flys - as a demonstration that the concept is viable, but not meant for real work. You could probably push a big kite hard enough to 'fly' two people, but that doesn't make it a good idea. You could continue to 'tweak' a kite in order to even double the performance, get 4 people off the ground - but I wouldn't recommend using it for commercial applications.

    Odds are the app needs to be understood from top to bottom so a set of software engineers know the concepts, what the package is intended to do, how it currently does it, what the expectations are for performance and growth - and then the SE's that understand it need to rewrite it from the ground up developing performance engineered code that is production quality.

    --
    Glonoinha the MebiByte Slayer
  53. Better compilers, clean code by Goodbyte · · Score: 2, Insightful

    I don't know Erlang, but if it is a pure functional language, the compiler/interpreter can use "special" optimizations, e.g.

    decode_rgb(Pixels) ->
    list_to_binary(decode_rgb1(binary_to_list(Pixe ls))).
    will not produce intermediate lists, instead the compiler will use lazy evaluation when decoding the data.

    My point is that many optimizations do not sacrifice readability. Many times it is possible to refactor slow code that improves both readability and execution speed, but you must know the pros and cons of the tools you are using!

  54. Ummmm... by Rufus88 · · Score: 2, Insightful

    Were they C++ programs, or 20 year old shell scripts?

  55. Re:Throw hardware at it. by An+Onerous+Coward · · Score: 3, Insightful
    Also depends on the size of the app. With a small app, what excuse do you have for not optimizing? Wouldn't take that long. With a big project? Depends on your work environment.
    The level of optimization needed for small projects varies wildly. If it's a one-shot deal to allow one secretary to generate a report twice a week, who really cares if it takes two seconds or twenty? Even if you assume that's thirty-six seconds every week, it's going to take years of use before it would have been worthwhile to optimize it.

    On the other hand, if it's something that hundreds of people are going to be using four or five times a day, then it's probably worthwhile to do some algorithmic/data structure improvements.

    Finally, you get the extreme case: some library that will end up being used by millions. Those are the times when you want to eke out every bit of performance you can. The size of the project doesn't always determine its importance, nor does the importance of the project always determine how much optimization is needed.
    --

    You want the truthiness? You can't handle the truthiness!

  56. what's the score by dutky · · Score: 2, Insightful
    This guy, for some reason only vaguely defined, wants to demonstrate something about computer performance and the need (or lack thereof) for optimization. So, rather than use any of a large number of established benchmarks, he pulls a targa graphics file decoder out of his ass. This is the justification he gives:

    It gets away from tired old benchmarks involving prime numbers and replaces them with something more concrete and useful. It also involves a lot of data: 576x576 24-bit pixels, for a total of 995,328 bytes. That's enough raw data processing to require performance-oriented coding, and not some pretty but unrealistic approach.

    Despite the fact that nobody (other than him) seems to be interested in targa graphics these days, and that the total amount of data involved (less than a megabyte) is miniscule compared to limiting bandwidths in modern computers (ranging from ~100 MB/sec for the disk I/O subsystem, ~500 MB/sec for the main memory, and in the multiple GB/sec for the caches and internal registers).

    Then he shows us just what a wonderful benchmark this program is: it is so wonderful that it runs nearly instantaneously! Maybe it's just me, but I like my benchmarks to take a little while to complete, either becuase I don't trust the sub-second accuracy of the system time routines, or because I like to get a reasonable sample of system states contributing the overall performance of the benchmark.

    Next, the guy tells us how he used unsatisfactory tools to implement an ill-conceived algorithm and, glory-glory, later fixed his own dumb-ass mistakes!

    Finally, he claims that he would not have been able to make the same kinds of algorithmic adjustments if he had implemented in C rather than Erlang, though it is not obvious why it would have been more difficult (and he doesn't give any arguemnt to support his assertion).

    From this exercise he concludes that, GASP, optimization is still important!

    What a senseless waste of skin.

  57. Useless article by ChaosDiscord · · Score: 2, Insightful
    Even traditional disclaimers such as "except for video games, which need to stay close to the machine level" usually don't hold water any more.

    Sure, if you're talking about puzzle games.

    However, for most retail games pushing the graphics as far as possible is important. If you can squeeze a 5% improvement out of the engine you can use the freed up time to make the game a bit prettier. Or put another way, art expands to fill available processing power. Graphics blocking on the video card? Well, you can use processing power for increasingly realistic physics simulations and artificial intelligence.

    If you play a lot of games you know that there is great variance between games. Some games coast along at a bare minimum while others surprise you with their ability to create compelling visuals with older hardware.

    After all, who ever thought you could use an interpreted, functional language to decode Targa images, especially without any performance concerns?

    Ummmm, just about anyone sane? Wow, decoding a measely megabyte of data. And the encoding? Simple run length encoding. That's not a real programming problem; that's a homework assignment for Computer Science 101. If you're keen on loading graphics you could at least pick something that is slightly complicated like JPEG.

    Is it true that optimization is massively overrated; that most programs are plenty fast? Sure. But this article doesn't provide a bit of evidence for that.

  58. Scientific computing by gnuLNX · · Score: 2, Insightful

    Sorry pal....but as long as there are scientific aplications to solve there will be a need to write highly optimized code...Heck I wrote some inline assembly today. Yes we can ignore performance in some areas, but in high perfomance computing, speed is still king, and quite frankly always will be.

    --
    what?
  59. Get programming as if SIZE matters... by Kazoo+the+Clown · · Score: 2, Insightful

    I think size is a far more serious problem than speed-- just because I can put multi-gigabytes in my PC doesn't mean I want to waste them loading bloatware. In fact, I probably wouldn't NEED the multi-gigabytes if it wasn't for code bloat. Of course, Gates loves such things because the machine retailers love them-- easier to sell more memory to someone who needs it because they just tried to upgrade to XP or something.

    So which compilers space-optimize by rolling loops instead of unrolling them?

  60. Broken C++ code by d-rock · · Score: 2, Insightful

    Actually, that could break C++ code that uses templates. There's a difference between

    vector <pair <int, float> > myVector

    and

    vector<pair<int,float>>myVector

    It's only subtle until you try to compile it :)

    Derek

    --
    Don't Panic...
  61. That guy doesn't know ANYTHING about performance by Anonymous Coward · · Score: 1, Insightful

    Why was this even posted? This guy has no clue about real performance!! Being able to process 66 images a second is nothing to brag about.. Sure he may "look" cool to non-programmers because he used edgar or whatever the hell, but i wouldn't hire him to work on anything performance intensive.. plus who the hell talks about performance in terms of execution speed times? saying 66 images a second says nothing about the relative performance about his program, algorithm OR the underlying language.