If you are writing the inner loop of a graphics engine that is going to run a few hundred million times a minute, then you need to optimize from the get-go. If it's only going to run once a frame, then go ahead and be sloppy at first, because it's more important to get the behaviour correct than to get the code running fast. Once the behaviour is correct, start thinking about how to tune your algorithm. People also tend to forget that the layout of your data structures impacts performance: how large your data is and how badly it thrashes the cache make a big difference on machines with small caches. I always combine top-down with bottom-up design practices to get a good balance of performance and usability. I have a saying that I use for the game industry: no one playing the game cares how good your code looks; all they care about is whether the game is fun and whether it crashes. Of course that doesn't mean that you can write unreadable code, but it puts things in perspective. Things that need to be used for a long time have different requirements than a product that has a shelf life of a couple of years. As languages change, so do our coding styles. How often do we go back and look at things we wrote years before and think, "I can't believe I wrote it that way"? We are working in an evolving industry; nothing lasts forever.
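To make the data structure point concrete, here is a rough sketch in C of the same per-frame update written against two layouts. The particle record, its field names and the COUNT value are all made up for illustration, and how much the layout matters depends on the cache sizes of the machine you are targeting.

```c
#include <stddef.h>

#define COUNT 1024  /* hypothetical number of particles */

/* Array-of-structures: every record carries hot and cold fields together,
   so a loop that only touches x still pulls the rest through the cache. */
typedef struct {
    float x, y, z;      /* hot: read and written every frame */
    float colour[4];    /* cold: rarely touched */
    char  name[32];     /* cold: debug only */
} ParticleAoS;

/* Structure-of-arrays: each field is stored contiguously,
   so a loop over x only streams through the x values. */
typedef struct {
    float x[COUNT];
    float y[COUNT];
    float z[COUNT];
    float colour[COUNT][4];
    char  name[COUNT][32];
} ParticlesSoA;

/* Move every particle along x, array-of-structures version. */
static void advance_aos(ParticleAoS *p, size_t n, float dx) {
    for (size_t i = 0; i < n; ++i) {
        p[i].x += dx;
    }
}

/* Same update, structure-of-arrays version. */
static void advance_soa(ParticlesSoA *p, size_t n, float dx) {
    for (size_t i = 0; i < n; ++i) {
        p->x[i] += dx;
    }
}

/* Distance in bytes between two consecutive x values in each layout. */
static size_t stride_aos(void) { return sizeof(ParticleAoS); }
static size_t stride_soa(void) { return sizeof(float); }
```

Both loops do the same arithmetic. The difference is the stride between consecutive x values: a whole record in the first layout, a single float in the second, so the second fits many more useful values into each cache line.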
The more complex the architecture, the greater the need to keep low-level coding around. Compilers just can't keep up. During the early days of the PS2 we commonly got 300x performance improvements when switching from high-level code to carefully architected and coded assembly. Programmers have gotten lazy and have lost the skills required to maximize performance on current architectures.
If you code carefully you can make sure that you are executing the maximum number of instructions per cycle. A compiler hides the fact that changing your instruction pairing, or splitting some of the instructions off into another pipeline, might get you better performance. In school they teach you that the algorithm is the most important thing to look at and that implementation doesn't matter that much. But with today's complex bus architectures, and with everything fighting for control of the bus, if you aren't careful you can end up wasting most of your time waiting for access to data or stalling the instruction pipeline while it waits for the results of calculations.
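As a small illustration of stalling on the results of calculations, here is a sketch in C rather than assembly. The function names are invented for this example, and whether the second version is actually faster depends on the processor and the compiler, so treat it as something to measure rather than a guarantee.

```c
#include <stddef.h>

/* One accumulator: every add needs the result of the previous add,
   so the additions form a single dependency chain. */
float sum_chained(const float *v, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += v[i];
    }
    return acc;
}

/* Four independent accumulators: the four adds in the loop body do not
   depend on each other, which gives the hardware room to overlap them. */
float sum_split(const float *v, size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    /* Handle the leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i) {
        a0 += v[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Note that the split version adds the values in a different order, so for general floating-point input the result can differ in the last bits. That is also why a compiler will usually not make this change for you unless you explicitly allow it to reorder floating-point math.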