Slashdot Mirror


Arrays vs Pointers in C?

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" "Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:
void reversestring_array(char *str)
{
int head, tail;
char temp;
if (!str) return;
tail = strlen(str) - 1;
for (head = 0; head < tail; ++head, --tail)
{
temp = str[tail];
str[tail] = str[head];
str[head] = temp;
}
}
The second function uses pointers:
void reversestring_pointer(char *str)
{
char *phead, *ptail, temp;
if (!str) return;
ptail = str + strlen(str) - 1;
for (phead = str; phead < ptail; ++phead,--ptail)
{
temp = *ptail;
*ptail = *phead;
*phead = temp;
}
}
While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:
forloop:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
The pointer function:
forloop:
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

308 comments

  1. Why do you care? by Julian+Morrison · · Score: 4, Insightful

    For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

    If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.

    1. Re:Why do you care? by Macphisto · · Score: 1, Redundant

      Yeah, no kidding. Also, the discussion is pretty meaningless unless the compiler is considered, target architecture, etc. It's a safe bet that this sort of semantically equivalent code has generated assembly that has no performance advantage either way for any compiler made in like the last bazillion years. Oh yeah, who writes C code anymore?

      (me... so lonely in here...)

    2. Re:Why do you care? by Blakey+Rat · · Score: 1

      I agree. Both of those pieces of code happen so fast that it doesn't matter. Unless all your program does is reverse lists over and over and over...

    3. Re:Why do you care? by Anonymous Coward · · Score: 0

      RTFA
      this is just an example. he isnt interested in the performance of these specific statements. he wants to know which method is faster in general.

    4. Re:Why do you care? by thegrassyknowl · · Score: 4, Interesting

      then don't wank around optimizing

      Dude, best use of the word wank. Ever!

      for single cycles on a machine that probably bleeds off a million cycles every time you raise a window

      Computers have become more powerful and programmers have become more lazy. It's not strictly true because instead of focussing a lot of time writing efficient code programmers are now focussing a lot of time writing a lot of code to fill bigger machines. That million cycles is wasted doing crap and probably half of doesn't need to be done anyway...

      I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.

      Now, with RAM and Gigahertz available for next to nix software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.

      --
      I drink to make other people interesting!
    5. Re:Why do you care? by aled · · Score: 4, Funny

      Both of those pieces of code happen so fast that it doesn't matter.

      Unless of course that someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.

      --

      "I think this line is mostly filler"
    6. Re:Why do you care? by Anonymous Coward · · Score: 0

      Amen

      In my experience (those old days..) it was the transition from C to C++ that started the bloat.

    7. Re:Why do you care? by jguthrie · · Score: 1
      he wants to know which method is faster in general.

      That's like asking "What's the sound of yellow". "In general" the question is not answerable. In my opinion, the question is also misguided. This sort of thing is what Hoare was talking about when he said "Premature optimization is the root of all evil". Making the program understandable and maintainable is far more important than trying to squeeze out every last ounce of performance.

      Oh, and Mr. Macphisto, I still write C code. I write a lot of C code. I write in quite a number of other languages, too, some of my own divising, but C is a goodly chunk of it.

    8. Re:Why do you care? by Reality+Master+101 · · Score: 1

      *sigh* And some people wonder why in the age of 3GHz processors, they're still only as responsive as 33Mhz processors.

      --
      Sometimes it's best to just let stupid people be stupid.
    9. Re:Why do you care? by AuMatar · · Score: 1

      You are officially my hero now.

      I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away. Well not if you waste it on bloatware. And at any rate, it doesn't improve the hardware I own. Efficiency will cease to matter as soon as they start buying my hardware and electricity.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    10. Re:Why do you care? by JanneM · · Score: 5, Insightful

      For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

      When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.

      If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.

      --
      Trust the Computer. The Computer is your friend.
    11. Re:Why do you care? by (1+-sqrt(5))*(2**-1) · · Score: 1, Interesting
      Well not if you waste it on bloatware.
      A great analog of the expansionist principle is freeways: natural law dictates that usage, unhindered by external constraints, will rise to meet capacity.

      Whereas C's ghost is parsimony and piety, décadence haunts the house of Java.

    12. Re:Why do you care? by Anonymous Coward · · Score: 1, Interesting

      > 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.

      You forgot to mention that keeping an 80x25 column text display updated only required moving around a maximum of 4000 bytes at a time (80x25 = 2000 bytes for the monochrome text buffer + another 80x25 colour buffer for 16 background and 16 foreground colours per text block), and that a sub 10MHz CPU would certainly struggle to animate that at 30 frames per second or higher.

      My 1920x1200 colour display requires 55296000 bytes (1920x1200*24), which is 13,824 times as much as that 80x25 text display.
      Now despite my 2GHz CPU only being 200 times faster than the hypothetical 10Mhz CPU, it doesn't struggle at all - not only can it animate the whole screen at 60FPS or more, but it can also calculate positions for thousands of objects at the same time.

      There are certainly some areas in which the software doesn't seem to have sped up in proportion to the processor speed - but my guess is thats mostly because they ceased being CPU bound years ago, not because the CPU is flat out wasting cycles trying to do the job.
      That doesn't mean it's not the programmer's fault - but if it is, it's because they decided to block the interface while they waited for a DNS query (or similar bad design decision), not because they used a pointer offset instead of an array dereference, or vice versa.

      Micro-optimisation is pretty much always a waste of time, if your program really is that CPU bound that you have to start worrying about internal implementation issues then you're better off just buying a newer CPU than wasting time optimising.
      And if it's not, then you've got a design or algorithm problem which you should be fixing.

    13. Re:Why do you care? by thegrassyknowl · · Score: 3, Interesting

      You forgot to mention that keeping an 80x25 column text display updated only required moving around a maximum of 4000 bytes at a time (80x25 = 2000 bytes for the monochrome text buffer + another 80x25 colour buffer for 16 background and 16 foreground colours per text block), and that a sub 10MHz CPU would certainly struggle to animate that at 30 frames per second or higher.

      I grew up in the days if Amiga - and they certainly didn't have an 80x25 text console... so your analogy is fundamentally flawed. All of my favourite Amiga software was lightweight, efficient and responsive (except for the rey tracing engines I used, but that did _actually_ have to do some serious calculation). In fact, my Amiga ticked along on a 800x600 screen quite nicely. Your analogy is also flawed becase not all of the screen is updated at any one time; only the parts that have changed. Oh, and 4000 bytes x 30 FPS is 120kb;

      My Commodore 64 has a 700kHz processor in it and it can certainly animate the full screen at 25 frames per second albeit at a slightly reduced resolution. My Atari 2600 has a CPU of about the same speed and it was capable of keeping up with 25 frames per second (again, lower resolution) and still running the game engines just fine. Your argument doesn't hold water pal!

      My 1920x1200 colour display requires 55296000 bytes (1920x1200*24), which is 13,824 times as much as that 80x25 text display. Now despite my 2GHz CPU only being 200 times faster than the hypothetical 10Mhz CPU, it doesn't struggle at all - not only can it animate the whole screen at 60FPS or more, but it can also calculate positions for thousands of objects at the same time.

      What exactly do you do on your 1900x1200 colour display? First, 1900x1200x24 bits is actually 1900x1200x3 bytes (6,840,000 bytes). That is a far cry from your piss-ant 55,296,000 bytes (55MB).

      Given your eagerness to quote numbers that are practically meaningless to the point and blatantly inaccurate, and your "calculation of positions of thousands of objects" I suggest you are playing games on Windows.

      Again, your numbers are flawed, because my old 80x25 text mode display is still drawn from individual pixels. Those pixels still have to be updated individually by the CPU (in the 10MHz days it was often the main CPU that drew every little dot). 80x25 is drawn on a 640x480 in 16 glorious colours. Now, by your argument, 640x480x2 (your 2 image argument from above) = 614,400 bytes. Therefore, your 1900x1200 screen only requires 12x as many bytes to move about but I really dont' care about those numbers because most of the work is done in the GPU now; not the main CPU.

      There are certainly some areas in which the software doesn't seem to have sped up in proportion to the processor speed - but my guess is thats mostly because they ceased being CPU bound years ago, not because the CPU is flat out wasting cycles trying to do the job. That doesn't mean it's not the programmer's fault - but if it is, it's because they decided to block the interface while they waited for a DNS query (or similar bad design decision), not because they used a pointer offset instead of an array dereference, or vice versa.

      Software bloat is because programmers can get lazy when the CPU gets fast (the old "oh there'll be better CPUs by the time we release it" excuse). Looking at the power consumption figures for a modern Pentium4 CPU and figuring out that it wastes billions and billions of cycles a day doing work that it should never have needed to do is scary. If you average it out over all the PCs running in the world the amount of energy turned into heat because of sloppy programming alone would be enormous.

      What about all the wasted hours waiting for the computer to do something because of some sloppy programmer being willing to waste a few million CPU cycles here and there? It doesn't seem like much at the micro level, but think about it across all the processes that run on your machine of a day and the few

      --
      I drink to make other people interesting!
    14. Re:Why do you care? by thegrassyknowl · · Score: 1

      I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away.

      Not a dig at the OP rather than an observation:

      Faster hardware will not make the inefficiency go away - it's still the same amount inefficient. If something wastes 30% of its time wanking on, it doesn't matter how fast you make it go, it's still wasting 30% of its time wanking on.

      Faster hardware just makes minor inefficiencies less noticible, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!

      --
      I drink to make other people interesting!
    15. Re:Why do you care? by NonSequor · · Score: 4, Insightful

      You don't understand the problem. A chemical reaction is only as fast as its slowest step. Catalyzing the other steps will not yield an improvement in reaction rate.

      For computer programs, you won't gain anything worthwhile by optimizing code that the computer only spends 1% of its time executing. That's not to say you should do a sloppy job, but you should focus on what matters. Microoptimization techniques (those techniques that involve choices of instructions and their orders rather than changing the algorithm that is used) typically yield very small gains. Microoptimization can yield substantial benefits when used properly in heavily used sections of code, but the time involved in trying to microoptimize everything could be better used to work on macrooptimizations or organizing the code to make it more amenable to later modification.

      There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead.

      --
      My only political goal is to see to it that no political party achieves its goals.
    16. Re:Why do you care? by glavenoid · · Score: 3, Funny

      Incedentally, I find some shades yellow to sound rather coarse, with harmonic triangles increasing with each beat inverse to the fundamental. Not as vulgar a sound as say, Taupe, but not as auarally pleasing to me as drips of purple. Now, Brown, on the other hand, has a rather discordant sound, much like a Dom. 7th, even if the actual interval is not such...

      --
      I, for one, am looking forward to the inevitable /. beta rollout fallout.
    17. Re:Why do you care? by Scarblac · · Score: 1
      For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

      When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for;

      Hence his question. In those sort of situations, you have a detected bottleneck that you found out about with profiling tools. And you don't need ask Slashdot, you just try both options and see which profiles better. If it's not one of those situations, it's just wanking around.

      --
      I believe posters are recognized by their sig. So I made one.
    18. Re:Why do you care? by cow-orker · · Score: 1

      Nonsense. Some cycles wasted by suboptimal code can never match the magnitude of bloat caused by a single XML parser. The problem isn't that people think at higher abstraction levels, it's rather that people have stopped thinking at all.

    19. Re:Why do you care? by thsths · · Score: 1

      > I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal.

      Nobody is going to dispute that software got bloated over the past decade(s).

      But I don't think that your memories accurately represent the historic facts. I started programming in BASIC on a CPC664 (4 MHz Z80), and it was slow. Scrolling the screen for a line took about 1/2 second, so viewing a program was bound to take some time.

      Then I worked with UCSD pascal on an Apple II (1 or 7 MHz bepending on interpretation). It was painstakenly slow: compiling a program of 500 lines took a few disk swaps (despite the dual disk drive) and at least 5 minutes of compile time. All wasted by a missing semicolon or similar...

      Then there was text processing. No WYSIWYG, but tron like command sequences. "Compiling" the document took a while, but the printer was the real pain: 2 hours for 25 pages in "NLQ".

      And then the games. I mean, look at WoW and marvel how smoothly all the characters are drawn. Compare that to PACMAN or whatever was popular at the time, with hideous delays to your input etc. The only responsive game that I remember was Snake, and that was only because a mere 4 characters on the screen need updating during a tick :-)

      So bloat has gone up, but it has not prevented software from becoming faster, more responsive, more powerful, vastly more flexible etc, because we have that increase in CPU power.

    20. Re:Why do you care? by JanneM · · Score: 1

      And you don't need ask Slashdot, you just try both options and see which profiles better.

      But as other people have pointed out, the answer is not trivially one or the other. It depends on the architecture you run and the compiler you use. And if you don't actually have access to the architectures and compilers where your code may be running, it makes sense to ask if there is a general rule of thumb to follow if you don't know. If compilers generally are good/bad at optimizing one or the other that may be very valuable to know.

      --
      Trust the Computer. The Computer is your friend.
    21. Re:Why do you care? by elfkicker · · Score: 2

      If only the theories were right. You're right and all, but please stop speaking like that.

    22. Re:Why do you care? by ron_lima · · Score: 1

      Optmizers are now complex enough to generate an output in a way that there will be no difference if you use pointers or arrays. For this matter I prefer using the good sense: arrays are easier to understand and probably you are not going to be the one that will maintain the code in the future. Probably a programmer with much less experience will be the maintainer of the code. Therefore, it is a good idea to keep everything easy to understand. If it is really necessary to optimize it, if your compiler has a bad optimizer for instance, so do it. But only if it is really necessary.

      I think that you should trust your optimizer as long as any profiling tools says that your code is running fast, after the optimization.

      --
      Ronaldo Faria Lima
      E-mail:ronaldo@ronaldolima.eti.br
      Home page: http://www.ronaldolima.eti.br
    23. Re: Why do you care? by Black+Parrot · · Score: 1
      > But as other people have pointed out, the answer is not trivially one or the other. It depends on the architecture you run and the compiler you use. And if you don't actually have access to the architectures and compilers where your code may be running, it makes sense to ask if there is a general rule of thumb to follow if you don't know.

      Here are the big-picture rules of thumb:
      • don't use an exponential time algorithm when you could use a polynomial time algorithm (etc)
      • make the common case fast


      It only pays to dick around with the little stuff in very special circumstances. And when hand optimizing you've got to consider issues such as pointer dereferences, register usage, cache misses, etc.

      And perhaps most of all, what is the tradeoff for supercode vs. readability when it comes to delivering a bug-free application and maintaining it afterward.

      There's a strong tendancy among programmers to be penny wise and dollar stupid.
      --
      Sheesh, evil *and* a jerk. -- Jade
    24. Re:Why do you care? by elfkicker · · Score: 1

      How would you know until you tried? Not many people (but bless those that do) care about working extra for a .3% improvement.

      The point is most people dont even care about a 200% increase. Only raw computaions an IO are faster on current hardware, but without question, the programs are slower. More features you say? Why is my software loading modules/features/etc I never use? A small bit of thought here would go a long way. Sad part is, the market will figure this out due to pricing and demand quicker than any producer on a budget and schedule.

      People use software to make them efficent. Sometimes it actually does, but rarely is it doing something more efficently.

    25. Re:Why do you care? by (1+-sqrt(5))*(2**-1) · · Score: 1
      You're right and all, but please stop speaking like that.
      Like what, exactly? Pray tell, baby.
    26. Re:Why do you care? by dolmen.fr · · Score: 1

      Unless of course that someone writes a compiler so optimizing that the code ends before it begins

      Haskell is near to that (at least nearer than most other programming languages), thanks to lazy evaluation.

    27. Re:Why do you care? by Grab · · Score: 1

      Oh yeah, who writes C code anymore?

      Er, close to 100% of software engineers writing code for embedded systems, perhaps?

      Grab.

    28. Re:Why do you care? by Grab · · Score: 1

      I remember those days too - and they're still with us now if you're doing embedded software...

      As for the optimisation, as you can see from the bytecode, it depends 100% on your processor supporting instructions which speed this up. If you're on a processor that doesn't have this built in, you're screwed.

      Grab.

    29. Re:Why do you care? by JudicatorX · · Score: 1

      So the question remains: Why should I waste tens of hours on writing and debugging something in C to do what can be coded in a few minutes with perl, and that will probably only be used once or twice?

      --
      "It is a good divine that follows his own instructions" - Portia, The Merchant of Venice
    30. Re:Why do you care? by muyuubyou · · Score: 1

      Just a couple points:

      The Amiga (unless you're talking about one of the very latest ones which I don't know, hi-res was 640x200) couldn't handle anywhere near 800x600. The Commodore 64 had indeed a "slightly" lower resolution at 25 X 40 text, 320 X 200 16 colors.

    31. Re:Why do you care? by 68k+geek · · Score: 1
      I know I'd like my applications to work faster and carry less crap than they do currently.

      Do you also want to pay 10 times more for it and have less features (because it takes more programmer time to write it)? how about have to wait much longer and pay much more for upgrades (software is less maintainable when it is optimized for execution speed and not for readability and clarity)
    32. Re:Why do you care? by thegrassyknowl · · Score: 1

      So the question remains: Why should I waste tens of hours on writing and debugging something in C to do what can be coded in a few minutes with perl, and that will probably only be used once or twice?

      If it will only be used a handful of times then, by all means, code it in perl in 5 minutes and be done with it. If it's something that the user will use over and over again, day in, day out (the windowing system, for example) then you should spend your time making it efficient :)

      Languages like Perl have their place. I code in Python some of the time for small one-off bits and pieces and scripts that aren't performance critical (our build scripts spend more time waiting for the compilers to finish on a 16 processor machine, so the overhead of Python is acceptable). I certainly wouldn't write the whole compiler in Python though!

      --
      I drink to make other people interesting!
    33. Re:Why do you care? by TangoCharlie · · Score: 1
      Unless of course that someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.
      Or even better C#/.Net!
      --
      return 0; }
    34. Re:Why do you care? by thegrassyknowl · · Score: 1

      Do you also want to pay 10 times more for it and have less features (because it takes more programmer time to write it)? how about have to wait much longer and pay much more for upgrades (software is less maintainable when it is optimized for execution speed and not for readability and clarity)

      Yet another Windows user... I'll pay more for quality software, but in the end I suggest that better written software would cost more because there would be less maintainence and support required.

      I also said that I think features are full of wank to get idiots to actually pay for software. It's often the case that one of the "features" that is exploited to get you to pay for the software is a half-assed after thought that doesn't even do what they claim it does. In any case, having a million features means you have to pay someone to write every one of them and most of the time the program becomes so complex that users only use the most basic subset of functionality because it's quicker to do things the old-fashioned way than to spend the time working out where all of the options to do things are actually located or how they work!

      And properly optimised software is no less readable/maintainable than non-optimised software. Non optimised software is usually poorly written from the outset and with little planning. I find that the places that have bottle necks in the code I work on are usually the ones that are the hardest to figure out because the programmer has done something incredibly complex or not bothered to do the analysis to figure out the limits of their algorithm coverage and I have to do all that work to fix the bottleneck and also understand the code I am trying to fix.

      --
      I drink to make other people interesting!
    35. Re:Why do you care? by Lars+T. · · Score: 1
      I'm pretty certain his point was not optimizing. It was more like "do we actually still need the optimizations (as in far better performance at the time C was invented) of pointer arithmetic at the cost of its complexity and lower maintainability compared to array-indexing"?

      Or in other words: Do obscure C hacks still make programs notably faster, or can we go back to sane programming?

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    36. Re:Why do you care? by psavo · · Score: 2, Insightful

      Faster hardware just makes minor inefficiencies less noticible, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!

      None of the discussion I've seen so far has touched the real problem, not the wanking on the micro-optimisation (these compiler optimizations are all O(n) in gain) but that so few have any sort of a clue on what algorithms to use and when.

      I'm not saying that it's bad to optimize those sort of things when they pop up at your most-time-spent chart, but that they may simply go away (along with 90% of other runtime) when you really think of the algorithm.

      --
      fucktard is a tenderhearted description
    37. Re:Why do you care? by Anonymous Coward · · Score: 0

      Using interlace (on old amigas) or dual scan (on AGA hardware with suitable monitor) allowed 640x512 resolution. But. Amiga did also allow overscan. I managed to get around 720x570x16 colors or so on my display (just approximate, cannot recall too well, because those times are more than 10 years in past and during that time I've grown from "high school" boy to white collar worker ).

    38. Re:Why do you care? by Anonymous Coward · · Score: 0

      LOL

    39. Re:Why do you care? by Anonymous Coward · · Score: 0
      Maybe I've eaten too much acid, but I know what you mean. In order of preference: challenging greens and reds; melodic, harmonically simple blues and whites; fuschias.

      Maybe I have eaten too much acid.

    40. Re: Why do you care? by poopdeville · · Score: 1
      There's a strong tendancy among programmers to be penny wise and dollar stupid.

      This extends to Linux users too. Evidence: Gentoo.

      There goes my karma. Or maybe someone will laugh.

      --
      After all, I am strangely colored.
    41. Re:Why do you care? by Matthew+Weigel · · Score: 1

      A pox upon the developers who go after that .3% performance increase before they've improved their algorithmic efficiency, before they've fixed all memory leaks, before they've figured out how to handle a responsive UI without relying on that mythical and imperceptible performance boost, and before they've figured out how to change the code that implements that performance boost without introducing more bugs.

      Unless you've worked on a system that requires it, shut the hell up about what you 'should' do in an embedded environment. The Linux kernel runs in many embedded spaces these days, and lightweight it is not. If you look at real-time Linux, for instance, they didn't make it 'real time' by tuning the crap out of array-vs-pointer management. No, they fixed the design so that it didn't matter.

      --
      --Matthew
    42. Re:Why do you care? by Pxtl · · Score: 1

      Yes. Layers upon layers of abstraction make things crawl. Look around, and you can find a pseudo-language in XML being handled by Python, running on a Java interpreter, which interprets into machine code.

      Combine that kind of problem with the usual bad practices. Somewhere in all that mass of code somebody is polling a file to see if it's changed. Somewhere somebody has a busy-wait loop.

      Personally, I blame weak macro languages. Functionally solveable programming concepts are being handled at run-time when they shouldn't need to be. Things that should be configured only once are instead being handled by repeatedly parsing text files.

      I remember windows 3.1 being perfectly snappy on my 386 and having all the GUI features I needed for easy to use, informative software. We don't have to backtrack all the way to the amber screen and sacrifice usability to our god of efficiency. There were tons of programs that were perfectly fast with extremely helpful UIs back in the day. As much as I loved Word 5.1, I realise that the hotkey-only interface is no longer appopriate. But still, there were fast, light GUIs.

    43. Re:Why do you care? by slapout · · Score: 1

      There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead

      The orginal question was based on arrays vs pointers. I don't think he was concerned with speeding up his program. He was just asking if using arrays would be as fast as using pointers. If so, he was going to use the easier one.

      And even if that wasn't his question, pointers and arrays are widely used. If he (in future development) uses the fastest one then his code will run .3% faster in several places, not just the one.

      --
      Coder's Stone: The programming language quick ref for iPad
    44. Re:Why do you care? by tepples · · Score: 1

      Faster hardware just makes minor inefficiencies less noticible, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!

      Sometimes it's a tradeoff between develop-time inefficiency (finding and fixing bugs) and runtime inefficiency (wasting CPU cycles and memory). Only when runtime inefficiency becomes noticeable (such as under high load or when deployed on millions of existing machines) does it tend to become important in the real world.

    45. Re: Why do you care? by UtucXul · · Score: 1
      There's a strong tendancy among programmers to be penny wise and dollar stupid. This extends to Linux users too. Evidence: Gentoo.
      Lots of people use Gentoo for reasons other than assumed optimization. portage is actually a really nice package manager. It has one of the most complete repositories of packages available of any distro. Extremely easy updating. And some of the best documentation of any distro along with very helpful forums.

      I just get a little tired of hearing people bash gentoo because they think it is all about optimizing past the point of diminishing returns. Hell, I only have my system compile as -O2 (lots of benchmarking for scientific codes has lead me to be a bit conservative in my optimizing).
    46. Re:Why do you care? by RobertB-DC · · Score: 1

      To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.

      Start? Who ever stopped?

      Actually, I wrote one of my first programs on my uncle's TI-99/4a. It was a playing card program of some sort, using the WOW! COOL! heart, spade, diamond, and club characters. Unfortunately, as I was completing the program, I tried to type an "=" and ended up doing "Reset"... the damned buttons were right next to each other.

      --
      Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
    47. Re:Why do you care? by networkBoy · · Score: 1

      Making the program understandable and maintainable is far more important than trying to squeeze out every last ounce of performance.

      For the most part I agree with you on this. Where I have to depart, and where, I suspect, the person asking in the first place was going, is small loops that are accessed repeatedly. I have some code that I maintain which is (was now) slow as syrup at 100 Kelvin. As it turned out, there were 5 or so operations that were always called, recursively, which were written for maintainability, not speed. I re-wrote those functions with a mind for speed and simply commented the hell out of it. Now the program runs acceptably (downright fast compared to the old rev), the users are happy, and the code is incomprehensable in 5 .c files. Hopefully the fact that there is 3 lines of comments to every line of code (on average, and written as clearly as I cna tink to do) will make up for it.
      -nB

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    48. Re:Why do you care? by Anonymous Coward · · Score: 1, Informative

      For _real_ programming tasks, like scientific computing applications that run for months at a time on supercomputers, it matters a lot. In that world, a 1% performance improvements equates to days of wall clock (and thousands of CPU-days).

      Algorithmic improvements are always the best way to get better performance. But once you can't figure out how to make an algorithm any "better," the profilers come out and hot loops are tackled. One of the portable tricks that seems to have benefit on a variety of compilers/machines is changing how floating point operations are scheduled: even native compilers seem to have problems avoiding dependency stalls. The trick to this kind of optimization is that by breaking up complicated expressions into smaller chunks, you are giving the compiler more options for applying its bag of tricks. Of course, if you break up things too much, you can miss opportunities to use instructions like muladds (a*x+b). It just takes a lot of experience on the particular platforms that your code has to run on.

      Back to the original question, I haven't seen a whole lot of difference between pointers and indexing. There are big performance differences, however, between STL vectors (as implemented in g++) and arrays. You should be wary of implementations that define operator[](const size_t index) as *(begin()+index): that stuff doesn't always inline away like you might think, especially if begin() is returning a non-trivial iterator class rather than just a T*. There are also big differences between using multidimensional arrays (e.g. in C) and 1-D arrays with multidimensional indexing (a la FORTRAN), the latter being considerably faster.

      I'm happy to see that the poster is interested in understanding code at a deeper level. There is still a call for programmers that understand the rich interaction between compilers and hardware.

      BTW: the CHUD tools on a Mac are great for learning about this sort of thing, especially the Shark profiling tool. Gotta love the built in PPC instruction reference, side-by-side source and assembly, stall counts, and pretty good advice about stupid things in the code.

    49. Re:Why do you care? by fbjon · · Score: 1
      as clearly as I cna tink

      Yes, yes. I fully comprehend. But a good idea might be to leave the maintainable code in there, but in comments, as a reference.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    50. Re:Why do you care? by jgrahn · · Score: 1
      The Amiga (unless you're talking about one of the very latest ones which I don't know, hi-res was 640x200) couldn't handle anywhere near 800x600.

      You are thinking of the A500 or A1000. There were many Amigas, and from -92 or so, 800x600 was a perfectly normal resolution.

    51. Re:Why do you care? by jgrahn · · Score: 1
      I'm pretty certain his point was not optimizing. It was more like "do we actually still need the optimizations (as in far better performance at the time C was invented) of pointer arithmetic at the cost of its complexity and lower maintainability compared to array-indexing"?

      I often (not always!) find pointer arithmetic more readable than array indexing.

      But the point is: yes, this choice should be made based on readability and correctness, not execution speed. There's still a place for obscure optimizations, but this is not (or almost never) it.

    52. Re:Why do you care? by Mr+Z · · Score: 1

      For those not in the know, the = button WAS the reset button, if you pressed the FCTN key with it...

    53. Re:Why do you care? by LWATCDR · · Score: 1

      You forget that a lot of "real" programs do not run on systems with millions or even billions of cycles to burn. Since this is c there is a chance that the target could be a smallish embedded system. Could be running on a 1mhz AVR or pic chip for example.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    54. Re:Why do you care? by Julian+Morrison · · Score: 1

      You're right, it could be running embedded with severe memory and speed constraints. But even there, it's a bad idea to try and second-guess the C compiler's output. If you're worried about the assembler output the compiler might emit, put some manually tuned assembler in there instead.

    55. Re:Why do you care? by Anonymous Coward · · Score: 0

      Unless you've worked on a system that requires it, shut the hell up about what you 'should' do in an embedded environment.

      You're the first person to bring up an embedded environment.

    56. Re:Why do you care? by Anonymous Coward · · Score: 0

      To summarize: you don't know and you have nothing interesting to contribute that we haven't already heard. Thanks for sharing.

      Just because the answer to a question is useless 97% of the time (or at least, of the time that *you* can think of), that doesn't mean someone should not ask the question.

      Suppose array indexing and pointer arithmatic are equally clear to me and the choice does not affect the surrounding code? Which should I use? If I choose the option that's 0.1% slower and it comes up a lot, I've just introduced a very hard to remove inefficiency into the code. OTOH, if I had known in the first place that the other was faster, I could have chosen more wisely.

      Also keep in mind that not everyone is developing for Windows running on 3GHz+ machines. Personally, my low end dips to ~100MHz ARM7. And if I take too long, the machine resets.

      Premature optimization is bad. Knowledge is good.

    57. Re:Why do you care? by Dachannien · · Score: 1

      My Commodore 64 has a 700kHz processor in it

      Unless you're underclocking, your C-64's 6510 runs at about 1 MHz.

    58. Re:Why do you care? by RobertB-DC · · Score: 1

      Aha... I remember now. Fctn+= instead of Shift+= == disaster. Man, was I ever pi... upset.

      --
      Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
    59. Re:Why do you care? by Kosgrove · · Score: 2, Insightful

      It has nothing to do with programmers being lazy. I'd much rather work smarter using higher-level tools and get a lot more done. It has everything to do with this simple fact, which wasn't (as) true back in the old days:

      Hardware is cheap. People are expensive.

      Think about it. A desktop with current hardware costs under $1000 these days. Lower-end servers run about the same amount. Compare that to how much you cost your employer per hour. How does it make economic sense to (most - don't you embedded and low-power computing people get all up in my grill) optimize software at the low level you're suggesting? Most software for the end user spends most of its time waiting for user input or doing network operations, not reversing strings.

      If you want to optimize something, optimize the architecture. Pool database connections, reduce network traffic, change object relationships to make them more efficient, but for god's sake don't waste your time reinventing low-level functions that have been done written and more importantly, DEBUGGED to the nth degree.

      I strongly disagree that we should be encouraging people to think about optimizing the optimizied wheel. Spend your time thinking about bigger and better software problems.

    60. Re:Why do you care? by oliverthered · · Score: 1

      A chemical reaction is only as fast as its slowest step. Catalyzing the other steps will not yield an improvement in reaction rate.

      Except that most computers only execute a single thread, so even a slight performance improvement in one function will make your computer work more efficiently, use less power and be more responsive and faster.

      Algorithm improvements usually give a far greater performance increase than compiler hacks, but that doesn't mean compiler hacks should be ignored.

      --
      thank God the internet isn't a human right.
    61. Re:Why do you care? by blueskies · · Score: 1

      hchar("N", 30);
      hchar("O", 30);
      hchar("!", 30);

    62. Re:Why do you care? by thegrassyknowl · · Score: 1

      Unless you're underclocking, your C-64's 6510 runs at about 1 MHz.

      Only in the USA and other NTSC countries. To get 25 FPS instead of 30 they dropped the main oscillator speed!

      --
      I drink to make other people interesting!
    63. Re:Why do you care? by Ziviyr · · Score: 1

      Interlaced with a tweaked monitor in a semi overscanned mode, 690x440. (many of the earlier Amigas didn't have trouble with this, you'd get more vertically in PAL mode)

      --

      Someone set us up the bomb, so shine we are!
    64. Re:Why do you care? by Ziviyr · · Score: 1

      I haven't really grokked C yet, but I still hiss at the sound of C++.

      --

      Someone set us up the bomb, so shine we are!
    65. Re:Why do you care? by Anonymous Coward · · Score: 0

      "I can't stress it enough; computers shouldn't look pretty with all this damned eye candy. They should be simple, functional and efficient."

      Apple has shown time and time again that people will find computers easier to use, and be more likely to use them for anything, if the interface is pretty. It does not really matter if the interface really does anything to actually make it easier to use. People just feel better about using the thing if it looks pretty.

      What do people remember most about NeXT? How it looked. What made it easier to use? Virtual desktops? a task manager? It just looked cool doing it.

      Contemplate all of the cycles that will never exist just because the UI wasn't pretty.

      Besides, more effective use of CPU will likely not do anything to end world hunger. Enough food is already produced. The economic system responsible for distributing it was needs to be altered...

    66. Re:Why do you care? by Anonymous Coward · · Score: 0

      Why do you think the machine bleeds a million cycles every time you raise a window. Think hard. Could it be that its full of wasteful code?

      Welcome to the world of virtual computing where you may run 1000 users on a single machine. No longer does the loss go unnoticed when people are waiting for their share of the system. Its very simple math for conservation. If all your programs waste cycles, do you think the system will run more users or fewer when you get to large scales?

      I deal with this all the time at work where people wonder why it matters if they put debugging code all over their software and leave it in production. Then they come back to me later and say "Gee, web services just don't scale well...I can only handle a few hundred transaction a minute". So when I go through their code and turn off all the logging and they quadruple that number, they think it was magic.

      Yes, I know- one is I/O and the other are clocks...big difference there. But wasted cycles add up. If you write a library the searches for a pattern and recognizes, it may seem really fast when you display it on your screen matching the pattern. But when you hand that library out to someone who wants to call the routines 500,000 times a second to do realtime recognition of objects in a robot, suddenly your wasted clock cycles start to add up.

      Call it academic if you think good practice and quality are academic. Why create something that is better than spec is what you are really saying. Some people are happy to live out their whole lives just doing good enough. Others, who push the limits, overachieve at all opportunities.

    67. Re:Why do you care? by muyuubyou · · Score: 1

      1992 was the "very late" I meant. I got my Amigas circa 1986 IIRC (an A500 and an A2000) my A500 expanded first to 1MB then later to 4MB. Around 1992 I think they released the A1200 which is the one I think you're talking about. The scene was pretty much dead already. There were other monsters like the A3000, A3000T etc but those were professional machines. Too expensive for the average home user.

      Even at 640x200, the Amiga was a slow bitch... the interlaced mode mentioned in sibling posts must have been unusable for most tasks other than showing static images and the like.

  2. Re:Hmm.. by Surye · · Score: 2, Informative

    Oops, messed up the tag at the top, and it ate the quote portion:

    I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references

    Preview! Gah.

  3. Here's the answer by Anonymous+Crowhead · · Score: 1, Offtopic

    Stop hunting for zebras. There aren't any.

    1. Re:Here's the answer by cant_get_a_good_nick · · Score: 1

      forgive my inference, can anyone tell me the reference for this?

    2. Re:Here's the answer by Anonymous+Crowhead · · Score: 1

      Basically, don't look for exotic causes to your problem. i.e. if your code doesn't do what you think it should be doing, thinking that there must be a problem with gcc (or whatever compiler you use.)

    3. Re:Here's the answer by hackwrench · · Score: 1

      I know a counter-example reference
      HHG2G: Some scientist proved that black was white and subsequently got ran over at the next zebra crossing.

      A Google search that a "zebra crossing" is british for a pedestrian crosswalk.

    4. Re:Here's the answer by cant_get_a_good_nick · · Score: 1

      Heh... I remember this in Assembler class. people would try their programs for the first time ever, if it crashed, the first thing they would finger would be the assembler program.
      Hmm, so this is the first time you've ever been in the environment with the least safety, and the first thing you do is to blame the tool.

  4. Hundreds of possibilities by topham · · Score: 2, Insightful


    This question sounds to me like discussions I had ages ago with other programmers, and it was always 1 programmer trying to justify his method of coding a routine over some other, equally valid method.

    Pointer arithmetic and array's in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.

    I never really saw the point in arguing over array and pointers in C. When I programmed in C I used both. I typically used Arrays if I wanted to code to be obvious and straight forward; if I wanted to do somehting else with the index, etc. I used points if I wanted speed above all else.

    If I were to write to any modern PC (PPC970, Pentium IV, Athlon, etc...) I wouldn't worry about speed. The algorithm for more complicated functions than shuffling a few bytes around will dictate the speed.

    1. Re:Hundreds of possibilities by mjc_w · · Score: 1

      C is one of the last languages I would choose to use for array-oriented computations. A number of years ago, I came up with this:

      In C, an array is a pointer, an offset, and a prayer!

      --
      This is the Constitution.This is the Constitution under the Bush administration. Any questions?
  5. There are more appropriate places to ask this... by afroborg · · Score: 0, Troll

    Such as comp.lang.c.moderated

    Do the kids still know what USENET is?

    (XNews is still the best newsreader BTW...)

    --
    my sig could kick your sig's arse...
  6. WOW! by Anonymous Coward · · Score: 2, Insightful

    Now THIS is the kind of discussion that should be going on at Slashdot!

    1. Re:WOW! by Anonymous Coward · · Score: 0

      Now THIS is the kind of discussion that should be going on at Slashdot!

      What are talking about? Almost all Ask Slashdots are like this. A retarded question accepted by a retarded editor who, being retarded himself, didn't realize what a retarded question the retarded submittor posed.

    2. Re:WOW! by Anonymous Coward · · Score: 0

      Yeah right, there are more idiots here than anywhere else. Lots of wannabe programmers and such hanging out at this place. This kind of discussion has no place here, no one will be able to have an intelligent converation.

  7. Professor Cormen said... by Evro · · Score: 1

    One of the mantras hammered into my head during my freshman CS class was:

    "The name of an array is a pointer to its 0th element."

    Another favorite was:

    "Never dereference a null pointer."

    From what I remember, arrays and pointers are interchangeable. I've forgotten the C syntax, but I believe that if you have int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4)). E.g. they're both references to specific locations in memory. I don't know about differences across architectures, but I'd imagine if that didn't hold true on a particular system then that system wouldn't really be implementing C correctly. Then again, I'm not much for languages.

    --
    rooooar
    1. Re:Professor Cormen said... by RAMMS+EIN · · Score: 3, Informative

      You're not completely right.

      First:

      ``int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4))''

      You're trying to get the 5th element of the array by using an offset of 4 times 4, assuming sizeof(int) == 4. First off, don't make that assumption; always write sizeof(int) when that's what you mean. Secondly, the C compiler automatically multiplies your offset by the size of the elements, so you would have to write *(k + 4) instead.

      Secondly, you're not getting the point (probably, you were misled by the headline, as I was). The question is not whether variables holding arrays are really holding pointers to the arrays (they are), but whether, say, iterating through an array by updating a pointer is faster than iterating by using an index variable as an offset. In other words, it's not whether a[0] is the same as *a, but whether while(*a) a++; is faster than while(a[i]) i++;.

      --
      Please correct me if I got my facts wrong.
    2. Re:Professor Cormen said... by CableModemSniper · · Score: 1

      you mean:

      while(a[i++]);
      vs.
      while( *(a++) );

      *wink*

      --
      Why not fork?
    3. Re:Professor Cormen said... by Evro · · Score: 1

      I'd actually typed sizeof(int) but I wasn't sure if that was valid, or if you had to do sizeof(i). Regarding the multiplication, I think I may have been thinking of malloc().

      As for your second point, I don't see how that's at all different from the first one. It seems like the question is akin to asking whether it takes more time to compute 2+2 or 1+3.

      --
      rooooar
    4. Re:Professor Cormen said... by Bill+Dog · · Score: 1

      I'd actually typed sizeof(int) but I wasn't sure if that was valid, or if you had to do sizeof(i).

      You can do either. Although "sizeof" is an operator, not a function, so you don't need parens to use it. So you do "sizeof i". Or "sizeof (int)", where "(int)" is a typecast allowing you to use a type instead of a variable with "sizeof".

      As for your second point, ...akin to asking whether it takes more time to compute 2+2 or 1+3

      It's akin to asking whether it takes more time to compute pa + sizeof *a or a + i * sizeof *a.

      --
      Attention zealots and haters: 00100 00100
    5. Re:Professor Cormen said... by RAMMS+EIN · · Score: 1

      ``As for your second point, I don't see how that's at all different from the first one.''

      Ok, I'll elaborate.

      When you say while(a[i]) i++;, you do two adds each cycle; one to increment i, and one to add i to the address of a. When you say while(a) a++;, you do only one add per cycle. The OP's question is whether optimizing compilers get rid of one of the adds, or not.

      This is different from asking which of a[4] and *(a + 4) is faster. In that case, the loops would have looked like while(a[i]) i+=; and while(*(a + i)) i++;, respectively, and both would have done two adds.

      HTH.

      --
      Please correct me if I got my facts wrong.
    6. Re:Professor Cormen said... by thsths · · Score: 1

      > When you say while(a) a++;, you do only one add per cycle.

      I heard that there is a story behind the increment operator ++: early CISC machines, such as a the PDP-11, had an "auto-increment" memory access scheme. So the optimal way to write it is actually:

      while (a++) ;

      which gets compiled to one instruction that does the memory access and increment, and of course the conditional jump instruction. You cannot get much faster than that.

      Of course with modern RISC or superscalar architectures and optimising compilers, this kind of source code optimisation is totally void. Just use for loops: they may not be particularly nice in C, but they are easy to read, and the compile will optimise the complete loop away if possible.

    7. Re:Professor Cormen said... by eric76 · · Score: 1
      I heard that there is a story behind the increment operator ++: early CISC machines, such as a the PDP-11, had an "auto-increment" memory access scheme.

      They had both post-increment and pre-decrement as well as deferred post-increment and deferred pre-decrement.

      Even better was the Data General Nova. It had autoincrement and autodecrement locations in memory. Access one of those locations and the contents would be automatically incremented (or decremented) as part of the operation.

    8. Re:Professor Cormen said... by drxenos · · Score: 1

      You are wrong. The use of parens with operator sizeof and types is just the required syntax. There is no typecast of any sort going on. This is why a statement such as: "sizeof (int) 0;" is illegal. You may look at the precedence table and think it should be OK because the cast operator has precedence over the sizeof operator, but that would be wrong. The C standard says that operator sizeof followed by a left paren indicates a size of a type operation, plain and simple.

      --


      Anonymous Cowards suck.
    9. Re:Professor Cormen said... by Bill+Dog · · Score: 1

      The use of parens with operator sizeof and types is just the required syntax. There is no typecast of any sort going on.

      That's as stupid as saying there's no incrementing going on in a for loop, it's just that "i++" is the required syntax. The typecast is precisely how you get operator sizeof to take other than an instance of a type. Parser implementors can adopt whatever approach they need to, but the point is, conceptually, the parens go with the type, not the sizeof operator, which otherwise doesn't require the use of parens.

      --
      Attention zealots and haters: 00100 00100
    10. Re:Professor Cormen said... by drxenos · · Score: 1

      I'm not going to argue with you. I suggest you read the Standard. If they use the word "typecast" anywhere involving sizeof and types, I'll eat my own ass. typecast implies, well, a typecase. What, prey tell, do you think is being cast?

      --


      Anonymous Cowards suck.
    11. Re:Professor Cormen said... by Bill+Dog · · Score: 1

      I don't want to argue either. In the link provided previously, MS refers to it as a "a type-cast expression". That is, nothing is being cast, it's just that an expression of this kind is being used. If that is wrong, so be it, but it's hard to believe.

      --
      Attention zealots and haters: 00100 00100
  8. is this really your bottleneck? by eyal · · Score: 2, Insightful
    When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?

    One can always check the resulting assembly code, if one is so concerned about this.

    Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)

  9. Xnews? Bah! by jd · · Score: 1

    There is nothing to beat xvnews - particularly the icon that shows the economy collapsing whenever there is anything new to read.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  10. The eighties called... by Anonymous Coward · · Score: 3, Funny

    ...and they want their arrays, pointers and C back.

    1. Re:The eighties called... by Anonymous Coward · · Score: 0

      1992 called, something about a joke ...

  11. In my experience ... by Keeper · · Score: 1, Interesting

    In my experience, which is somewhat dated (by about 5 years), what you state is generally true for simple loops.

    However, for more complicated loops the compiler fails to opimize the loop very well. My 'experience' was in writing an alogirthm which took a raw camera sensor ccd data and ordered it into a proper bayer pattern. When moving from indexer based access to pointer based access I saw a significant increase in performance.

    Even basing one pointer off of another was slower than maintaining separate pointers and manipulating them via arithmatic.

    The loop looked pretty nasty when it was done (the loop also mixed several other stages of the image processing pipe at once, such as application of color correction & calibration data, dead pixel correction, etc; probably had something like 30 pointers that had to be maintained), but it was quick.

    I ended up taking following the same approach converting the raw ccd data to a live preview and code which generated a thumnail preview, obtaining the same kind of gains.

    When I was done with the thumnail code, it was capable of generating 108 fully processed thumnails (color 768x512 8bpc image) per second on a 700mhz PIII from a 6 megapixel source image (3072x2048, 16bpc bayer image). I was proud of that at the time...

  12. It optimizes out by klossner · · Score: 5, Interesting
    An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction.

    Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.

    1. Re:It optimizes out by SilverspurG · · Score: 1
      Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
      Kudos. :) My hat's off to you.
      --
      fast as fast can be. you'll never catch me.
    2. Re:It optimizes out by LSD-OBS · · Score: 4, Funny

      Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree

      Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".

      Guess that about sums it up ;-)

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
    3. Re:It optimizes out by Detritus · · Score: 1
      In the "real world", not all compilers are that intelligent.

      Yes, it does make a substantial difference in speed with some systems/compilers.

      --
      Mea navis aericumbens anguillis abundat
    4. Re:It optimizes out by Anonymous Coward · · Score: 2, Informative

      Exactly, strength reduction changes the indexing operation to straight pointer arithmetic. this is done at least when i looked a few years ago in gcc/g++, in the later phases of the compilation, so that that the compiler is rearranging the RTL to eliminate the indexing variable. you can verify strength reduction by just setting the optimisation in gcc and looking at the assembler output.

      The point is now a bit moot since for many loops you do not want either indexing or pointer arithmetic, you want SIMD instructions which are a third alternative way of implementing the the programmers looping construct. this is now done in SSA at the front end of gcc. In most cases compilers are smart enough to ignore the programmers code and minor semantic differences in code, like indexing or arithmatic will be restructured to the most optimal solution on the target architecture. So the point is, leave it to the compiler.

      jxx

    5. Re:It optimizes out by metamatic · · Score: 2, Interesting

      That was my first thought too. "They're equivalent, so why is anyone even asking? Any optimizing compiler will handle it."

      Then I remembered that this is Slashdot, where the groupthink is that a CS degree is useless and doesn't teach anything you need to know in the real world.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    6. Re:It optimizes out by Anonymous Coward · · Score: 0

      "They're equivalent, so why is anyone even asking?"

      Read TFA. He says you can't count on multiplatform code used by different compilers to perform this optimization. And is asking, in part, what is the experience of the reader.

    7. Re:It optimizes out by metamatic · · Score: 1

      If you're really stuck with building for platforms where there's no GCC code generator, then all bets are off. You might even find yourself using a compiler that generates broken 'for' loops, like a version of Microsoft's C compiler I once had to use.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  13. I echo the above statements by 21chrisp · · Score: 4, Insightful

    These types of opimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors on the array option far outway any speed increase for the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.

    IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

    1. Re:I echo the above statements by NullProg · · Score: 4, Interesting

      These types of opimizations are virtually pointless on modern machines.
      I call bullshit. Optimizations are important regardless of the language or CPU.

      My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.

      It took over six hours on the dual processor system with the native .Net and Java XML parsers. And yes, the original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.

      My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.

      Time is money, especially when your trying to push down 20,000 price changes from the Mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus, the 'C' code gets it done under an hour on the dual CPU machine. And yes, I tested the inputs for overflows, security problems whatever, before we went into production. Theres a big difference between a programmer who knows a language vs a programmer who understands it.

      IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
      Thats your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic to the CPU? Don't you? I wouldn't want to work for you though. Your competitors will write a faster program that uses less memory and you will loose the contract/job.

      No flame intended,
      Enjoy.

      --
      It's just the normal noises in here.
    2. Re:I echo the above statements by Arandir · · Score: 2, Interesting

      There are optimizations and then there are optimizations. If you run this particular loop once, or twice, it won't matter. Run it ten times or even a hundred, it won't matter. In these cases it would be a pointless optimization. Profile your software before you optimize. That way you can optimize where you need it and not waste time where you don't.

      I remember one code review where someone told me to use prefix instead of postfix notation, for optimization reasons. Yet it occured in an initialization routine in a background thread that would run once per day. That's like worrying about a memory leak in a power-off interrupt handler...

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    3. Re:I echo the above statements by Trepalium · · Score: 1

      While I don't dislike pointer arithmetic, you might want to investigate the -fstrength-reduce optimization (included in -O2/-O3) for GCC instead of foolishly making your work harder by writing pointer code instead of array code. If array code is more obvious, use that. If pointer arithmetic code is more obvious, use it instead. Don't use one or the other because it's faster -- it won't be if you're enabling compiler optimizations (and if it is, despite the compiler optimizations, file a bug).

      --
      I used up all my sick days, so I'm calling in dead.
    4. Re:I echo the above statements by Neil+Blender · · Score: 3, Interesting

      My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.

      Heh, a little offtopic but - This is why I hate XML. It's so bloated. You take 1 to 6 hours parsing a 30 megabyte XML file in C? I was just tasked with parsing out some select data from a 37 gigabyte XML file (870 million lines). I tried all sorts of optimizations and parsers upon finding that it might take days to parse. My solution - 50 lines of perl using regular expressions. I run this on a dual processor 3.something with 2 gigs of ram. It takes 5 minutes. If I coded it in C it would probably take 10 seconds but it's not worth the time.

      Here's the file and convertor if anyone wants to fuck around with nearly a billion (bloated XML) lines of genetic data:
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Al l_Data.ags.gz
      ftp://ftp.ncbi.nlm.nih.gov/asn1-converters/by_prog ram/gene2xml/

    5. Re:I echo the above statements by JackDeth · · Score: 1
      I call apples vs oranges.

      You're comparing performance of an interpreted language vs a native code language which, in most cases, should result in a performance difference, especially for an iterative, non-interactive process. (Although, in my experience, it's not usually that dramatic if the code is equivalent.)

      He's comparing using a simple, easy to understand construct (array indexing) vs a more complicated construct (pointer arithmetic) in the same language, which at best will result in a minor performance difference. With a good compiler, it should result in near zero performance difference.

      Theres a big difference between a programmer who knows a language vs a programmer who understands it.

      True, but there are few things more dangerous to a critical project than a programmer who simply thinks he/she understands the language. Adding complexity to a program for a possible minor performance adds risk to the project without an acceptable reward, and should be avoided whenever possible.

      Time is money, and avoiding critical bugs saves both. Even if it means your 90 minute process now takes 94 minutes to run instead.
    6. Re:I echo the above statements by Anonymous Coward · · Score: 0

      dude,

      1.5 hr for 30 megs of XML is WAY WAY too long! It is even a too long time for 30 Gb of data!

      seems like you guyes can fuck it up no matter what language you use. And still think you are experts since you know perl. I am parsing AND PROCESSING a much larger file in less than 10 secs on my desktop PC (Java 5.0 code, but who cares).

      Have you truned of all the stupid checks in the XML parser?

      (No flame intended)

    7. Re:I echo the above statements by Westley · · Score: 0

      You're comparing performance of an interpreted language vs a native code language

      Um, where's the interpreted language involved here?

      C# is never interpreted (on .NET itself; I believe there's an interpreter in Mono, but I suspect that's not being used here).

      Assuming the Java was using Sun's JDK, it would be JIT-compiled (aside from code which is only run once or twice, which may still be interpreted but would be irrelevant, performance-wise).

    8. Re:I echo the above statements by Anonymous Coward · · Score: 0

      My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.

      My educated guess is that your C program didn't do half the things a proper XML parser should do, so your C program didn't have to deal with internationalisation, normalising whitespace, etc. Great! Except when you find out one customer in a hundred has a corrupted record because of an accent in his name or whatever.

    9. Re:I echo the above statements by Anonymous Coward · · Score: 0
      My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.
      You could have used a binary file (or even a flat text file) instead of a rather expensive XML parser, and been even faster.

      It's not just programs that are getting less optimised, it's also data structures and the associated overhead. I'd certainly care more about thirty characters of data overhead per data item then using file[1] instead of *p.
    10. Re:I echo the above statements by Anonymous Coward · · Score: 1, Interesting
      The original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.

      My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.

      I call bullshit. Your crappy piece of code was faster because it probably didn't handle half the cases Xerces (in fastest mode, Sax) does: from proper charset decoding (usually UTF-8) to entity handling, error reporting (line numbers), entity expansion, input normalization (linefeeds to \n independent of input), namespace handling and so forth. Simple naive and non-compliant "solutions" are dime a dozen. I've written some for specific tasks, but I do not compare apples to oranges. Speed difference between C and Java/C#, for tasks like xml parsing, are nowadays somewhere between 10 and 25% at most.

      And yes, I do know what I am talking about, having written a fully compliant xml parser (not xerces, but similar); and having profiled it, 40% of total processing time is for raw input handling (including charset decoding). Based on that I very much doubt you could have improved 3x the performance of a streaming parsing: it'd mean that you had no parsing overhead whatsover beyond simple input streaming.

      This kind of "improvement", though, would be quite normal if one compared in-memory tree solutions (DOM) to streaming ones (Sax etc.)...

    11. Re:I echo the above statements by eric76 · · Score: 2, Interesting

      One of the more interesting things I've seen is to see how different software developers write a program for the following problem:

      Jack bought a bag of 100 pieces of candy at the store. It has 90 cherry candies and 10 lemon candies. He prefers the cherry candies over the lemon candies.

      Every day Jack randomly picks a candy from the bag. If the candy is a cherry candy, he eats it. If the candy is a lemon candy, he puts it back and randomly draws again. He eats the second randomly chosen candy no matter what flavor.

      What are the odds that the last piece of candy in the bag is a lemon candy?

      Interestingly enough, the approach nearly always used it horribly inefficent. I've never seen anyone run the program long enough to get an answer.

      In contrast, the approach used less often solves it in a fraction of a second.

    12. Re:I echo the above statements by cow-orker · · Score: 1

      ...didn't do half the things a proper XML parser should do...

      Oh, are you saying, XML isn't a universal, light-weight format that an average CS student could implement in three weeks, after all? Are you implying that it is rather a bloated, inefficient, underspecified format that nobody could implement completely and correctly, let alone in three weeks, and which is also either unsuitable or overengineered for most tasks?

      You're a wise man, dude.

      Besides, the name "XML parser" is wrong. A SAX parser is just a lexer, a DOM parser parses an LL(0) grammar, which is trivial. All this should be easier. It probably is, if you use something sane instead of XML.

    13. Re:I echo the above statements by Anonymous Coward · · Score: 0
      You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic to the CPU?
      If you take a look at this example given to us by original poster:
      forloop:
      mov bl,byte ptr [esi+edx]
      mov al,byte ptr [ecx+edx]
      mov byte ptr [ecx+edx],bl
      mov byte ptr [esi+edx],al
      inc esi
      dec ecx
      cmp esi,ecx
      jl forloop
      you'll see that it doesn't always end in pointer arythmetics, because some architectures have some cool indirect/indexed/with offset/[pre-|post-] [inc|dec] addressing modes on instruction set level, which gives compiler writers chance to translate high level language programs more literally to machine code.

      Even if that chance is not taken (i.e. because of portability, like in GCC) and array references are resolved as pointer arythmetic, again there is good possibility that compiler was written well and does good optimizing job on that translation. In my experience, concept of pointer is hard to crack for people who were not introduced to CPU internals, address registers, assembly language programming, ... they tend to think as matematicians (or FORTRAN programmers). To them, indexed arrays are comprehensive and pointers are black magic. For us old school geezers, pointers are only, natural, way to go that gives us warm fuzzy feeling (illusion) that we know exactly what and how machine does...
    14. Re:I echo the above statements by Anonymous Coward · · Score: 0

      Oh, are you saying, XML isn't a universal, light-weight format that an average CS student could implement in three weeks, after all?

      No, I'm saying an off-the-cuff C program written for speed is hardly likely to conform to the specification. Boy, you sure have a chip on your shoulder.

      Are you implying that it is rather a bloated, inefficient, underspecified format that nobody could implement completely and correctly, let alone in three weeks, and which is also either unsuitable or overengineered for most tasks?

      No, I'm not. It's quite obvious that you are yammering on about a straw man, so feel free to rant and rave against things I haven't said, won't say, and aren't true, but don't expect anybody to find it interesting, because it bears no relation to my comments, just your fantasy.

      If you want to address my comments, then address them. If you want to address your own delusions, well try writing it in your diary.

    15. Re:I echo the above statements by cow-orker · · Score: 1

      What a shame. You would have been right.

    16. Re:I echo the above statements by tdelaney · · Score: 1

      Wow - that sucks. I successfully used Xerces (and Xalan) to transform ~500MB files in less an hour. And these weren't trivial transformations - there was some pretty complex stuff in there. Network topologies, with SNMP data to be transformed to HTML-based reports. Lots and lots of SNMP variables and tables to be processed...

      The naive method of just feeding the entire thing in (as it was originally done) just doesn't scale - the problem with those libraries is that they load the entire file into memory and then transform them. That's not a good way to operate.

      So I simply wrote a program in Python which split the files up into multiple valid XML files (each conforming to the DTD, but containing only parts of the original), transformed those, then stitched them back together. Never went over 20MB of memory used. There were additional advantages to splitting up the file - namely, other parts of the application only needed the outline, not the specific detail that took up approx 95% of the data file.

      Profile, Know your data. Optimise the algorithm. Don't get stuck on only using one technology. It's very rare that you should need to get into micro-optimisations - and pointer arithmetic vs array access should only be a consideration for a *very* small group of people.

    17. Re:I echo the above statements by Paul+Jakma · · Score: 2, Informative

      lower likelihood of programming errors on the array option. .... IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

      You realise that pointer arithmetic and array indices are the same thing, don't you? Ie given:

      int b[100];
      int i;

      The following are equivalent:

      b[i++]

      *(b + i++)

      Arrays *are* /nearly/ equivalent to pointers, indeed an array variable without a subscript degrades to a pointer. Further, note in both cases above i may or may not be in-bounds for the array, the array notation does not help check the bounds at all, and the programmer will have to go to same trouble to check bounds properly regardless.

      It sounds to me btw like you're not qualified to decide what place pointer arithmetic have in modern software.

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
    18. Re:I echo the above statements by Anonymous Coward · · Score: 0

      Wow I suck. Took me waaaay too long to write up the code for that. Anyway, it looks like a 91.818% chance that the last one will be lemon. The limit appears to be lemon / (lemon + 1) as cherry goes to infinity, but I can't really prove that, and I can't come up with a 1-liner general solution. :(

    19. Re:I echo the above statements by Anonymous Coward · · Score: 0

      I got the same answer here (101/110) and it took me longer than I thought it should have as well. I used a 2-D array of probabilities and filled it in by diagonal rows. It was the obvious solution to me and it ran in less than a second. What is the more common way the GP referred to, recursion? *shudder* That's what most coders come up with and I'm the one who couldn't find a programming job in the last five years?!

      I agree with your intuition that there should be a one-line combinatorial solution, but I don't see it either.

    20. Re:I echo the above statements by ameoba · · Score: 1

      The change of language/environment from C# under .NET to C running native machine code is going to be far more significant than the use of pointers in the C program. What would be more relevant to the discussion at hand would be for you to have numbers showing how your C program was sped up by using pointer arithmetic over array indices.

      --
      my sig's at the bottom of the page.
    21. Re:I echo the above statements by slamb · · Score: 2, Interesting
      Interesting. I'm at work now, so I shouldn't take the time to actually solve it until I get home. But it sounds like a problem you could solve by:
      1. Statistics. It seems like if you are One with the Statistics, you could do it with pencil, paper, and a calculator just powerful enough to do factorials. (Unfortunately, I am not.)
      2. Dynamic programming or recursion with memoization. (There are overlapping subproblems.)
      3. Recursion. This would be slow, and I bet it's the way most people use.
    22. Re:I echo the above statements by j_kenpo · · Score: 1

      Wrote a small program, found about 91.7, +/- .1 due to randomness. Runs in less than a second... didn't need recursion and used a single array...

      So instead of keeping everyone in suspense, why dont you enlighten us to the "approach used less".

    23. Re:I echo the above statements by sleepingsquirrel · · Score: 2, Funny
      The following are equivalent:
      b[i++]

      *(b + i++)
      Yeah, but the fun doesn't really begin until you start writing code like...
      i++[b]
    24. Re:I echo the above statements by NoOneInParticular · · Score: 1

      The answer is 0.918181818181... Do I get a bit of candy now?

    25. Re:I echo the above statements by Neil+Blender · · Score: 1
      Even perl brute force solves to decent accuracy in a few seconds.

      For all you perl haters:
      @a=split//,'0'x90 .'1'x10;$j=100000;for(1..$j){@b=@a;while(1){if(!$# b){$b[0]?$l++
      :$c++;last}$i=rand($#b+1);($b[$i]&& $s)?($s--&&next):($s=1);splice@b,$i,1;}}warn $l/$j;
      It's really a statistics problem though.

    26. Re:I echo the above statements by Anonymous Coward · · Score: 0
      Dude, me too! Great minds think alike, eh?
      $ echo '@a = (91.6, 0.2, rand()); print $a[0] + $a[1] * $a[2]' | perl
      91.7721472613661
      <speech synth=hawking31337>7h4nx. 1 B h3r3 4|1 t3h w34|<... d0'7n 9hr0637 2 719 jO0|/ w4!7r35z.</speech>
    27. Re:I echo the above statements by lokedhs · · Score: 1

      It's worse, since there is absolutely no performance difference in prefix compared to postfix notation. Unless, of course, you completely disable optimisations. But if you do, then you have bigger performance issues to worry about. :-)

    28. Re:I echo the above statements by 21chrisp · · Score: 1

      I didn't even notice the amount of controversy this post created. Usually I don't respond to my own post, but it seems like I need to explain my reasoning...

      Hear is why I think pointer arithmetic is bad(tm) by example:

      I used to work for a defense contractor. The project was large and had a large amount of legacy C code that needed to be integrated w/ newer C++ code. The legacy code had already been deployed in a number of places for a number of years. Performance and stability were both of top importance. It had to run in real time (at least the C portion did) and had to be completely stable. The legacy code had a large amount of pointer arithmetic. While profiling, we found hundreds of instances where incorrect pointer arithmetic would cause memory leaks and nasty bounds problems. Very few problems were found in the code that used arrays. It took so much time to fix these problems that it added months to the development time of the product. The pointer arithmetic was extremely confusing and we ended up re-writing much of it using arrays. The result was something that was rock stable and ran a few milliseconds slower. The speed difference was not significant and the stability meant that we could be confident that the system would be dependable when delployed. Which was _extremely_ important.

      Would you like to fly on an airplane that had software written with excessive pointer arithmetic on it? It wouldn't surprise me if this was the case on some planes, but the thought would be unsettling to me. The reality is that most people get confused when doing pointer arithmetic and make more mistakes. For people that don't get confused and can write perfect code like this, more power to you. I would be worried about adding code like this to any project though, because other project members will most likely not be nearly as good and will make mistakes. That's why I would ask the person to rewrite it. (BTW, I wouldn't be saying "this teh sucks" but "future maintainers may have a hard time understanding this"). I am very big on performance and optimizations, but not at the sake of stability and readability (because less readable code will end up being less stable down the line).

      The array version is neither clearer nor less error-prone. And I dare you to prove differently!

      It seems like this challenge is issued and then you go on to say it can't be done... If you're saying it can't be proved, I agree. I'm not sure that anyone can "prove" that pointer arithmetic is less clear. This is subjective. Perhaps if pointer arithmetic were the standard way of doing things, it would be easy for everyone to use. Arrays are far more common though, and most people are much more comfortable with them as a result. The reason I think it's more error-prone is coupled to this. In practice, people tend to make more mistakes with them ,because they're less familiar with how they should be used.

      I wouldn't want to work for you though. Your competitors will write a faster program that uses less memory and you will loose the contract/job.

      I'm not sure how often a contract would be lost due to array related performance issues. On the occassions where I've had to fix performance issues, the problem was usually a poorly written algorithm and not anything like "array overhead".

      The pointer version is more portable across the chasm of compiler quality. That is, it will run well even on poorer compilers.

      It's pretty hard to find hardware not supported by a major compiler these days. GCC works on some amazingly obscure hardware. I suppose in instances where the hardware is extremely rare and doesn't have a decent compiler, pointer arithmetic may be the best option. This would be a very rare circumstance though.

      Thinking like this.. Is why we have big, ugly, bloated programs that require overpowered CPUs.

      As far as standard arrays causing ex

    29. Re:I echo the above statements by JollyFinn · · Score: 1

      Well theoretical answer VS practical statistical answers are two different things.
      Lemons end up on top of the bag, since they are put back here, and come more often than probability suggests.
      Anyway this is a math problem with an algebraic answer. Unfortunately I don't remember enough relevant infromation from the course than there exists algebraic answer for this problem.
      So answer probably is done under 1000 clockcycles, in C on athlon64. [If not counting for time to load the program from disk to cache.]

      --
      Emacs is good operating system, but it has one flaw: Its text editor could be better.
    30. Re:I echo the above statements by Breakfast+Pants · · Score: 1

      They are nearly equivalent but that isn't the question. Without optimazation accessing a member of an array requires you to add an offset to the array pointer. You could write the code such that the array itself is incremented instead of the offset and then you would find your equivalence. Otherwise there is an extra arithmetic step in each access. Now granted with compiler optimization this isn't an issue but that is the question.

      --

      --

      WHO ATE MY BREAKFAST PANTS?
    31. Re:I echo the above statements by Paul+Jakma · · Score: 1

      It comes down to either (base + offset++) or base++. Whether it's an array or pointer is just not relevant. Further, many CPUs natively support (base + offset) addressing, so there's minimal cost to that extra addition. Finally, I challenge you to be able to even measure the cost of that extra addition on any code that does anything useful on any real-world CPU (embedded or not).

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
  14. Arrays vs. Pointers determining speed by Anonymous Coward · · Score: 0

    A[0] is a pointer to a byte in memory. A[1] is a pointer to that byte + 1 * (size of an element of that type) in memory. So shouldnt this be irrelivent and both take equal time.

    Either way, both running times for assesing memory is in n time, so it doesnt matter.

    1. Re:Arrays vs. Pointers determining speed by larry+bagina · · Score: 1

      assuming char *A or char A[128] or whatever, A[0], A[1], etc. is a byte in memory (not a pointer to it).

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    2. Re:Arrays vs. Pointers determining speed by Anonymous Coward · · Score: 0

      Parent wrote:
      >Either way, both running times for accessing memory is in n time, so it doesn't matter. [Emphasis mine]

      Actually you're onto something here. Basically the difference is exactly one integer addition, but that is usually dwarfed by the fact that we're accessing memory (hopefully but not always in L1 cache). This clearly shows up in the optimized assembly dumps people have posted in other threads, although it is possible for a really good compiler to convert the array version to a pointer version in some instances.

      In both cases (array vs pointer), you perform a simple test every loop (i < len vs p < endp), you also do an increment (or 2) (++i vs ++p), and you do a memory access. However, the significant difference is how you dereference the pointer: arr[i] is interpreted as *(arr + i), versus *p. For architectures that allow indexed memory access in one cycle, there is no difference; otherwise, expect one ALU unit of delay.

      If you have a long pipeline and a short loop, this may get "absorbed" by the pipeline stall. Also, if you have a "heavy" loop (such as one that is likely to cause a cache miss or branch), then there is probably less than 1% difference. However, if all you're doing is a degenerate test where you reverse a fscking string that's less than one cacheline in length, then it might actually make a measurable difference on architectures that don't support indexed memory access. A simple count of instructions tells us that the difference is likely to be in the neighborhood of 10-15%. This is backed up by the performance numbers quoted in another thread.

      In summary there's not likely to be much difference, so use whichever style you prefer. If your code isn't running fast enough, profile it first and then consider rewriting.

      p.s. I personally prefer the pointer style, but I can understand that people who never learned "old fashioned" C are often uncomfortable with pointers, so I have no problem with them using the array style in their code.

    3. Re:Arrays vs. Pointers determining speed by dolmen.fr · · Score: 1

      A[0] is a pointer to a byte in memory.

      If A is "char *A", this is false. A[0] is the value of the byte in memory. &A[0] or just A is the pointer.
      But if A is "char **A", this is true, A[0] is a pointer to a byte in memory. And A[1] is also a pointer to a byte in memory. However probably not A[0]+1*sizeof(char).

  15. Different methods by jd · · Score: 1
    Different compilers will optimize abstractions in different ways, so you are not guaranteed to get the same code when using arrays. Pointer operations should(!) compile to essentially the same code on ANY compiler, as the code is already about as reduced as it can get.


    In consequence, if you want a predictable program on ANY compiler, pointer operations are probably a better bet than arrays, regardless of how well compilers handle those arrays.


    If you want something of uniform reliability, rather than uniform performance, you're better off with arrays. Arrays are easier to bounds-check in source form, errors are generally easier to spot, etc.


    If you want something that is FAST, then you're probably not wanting to use either. For example, if you've a sequence of characters, and want to copy them, then copying them a character at a time - regardless of how - is slow and inefficient on a 32-bit or 64-bit machine. Cast the data onto the largest unit the CPU can pull out of memory in one operation, and operate in bulk/parallel.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Different methods by Xyrus · · Score: 1

      How the code gets treated depends on how "clear" it is to the compiler. The loop in the example is simple and any decent optimizing compiler worth it's 1's and 0's should optimize it.

      But if the code is unclear, let's say multiple nested loops with a good number of branches, local variables, etc. then an optimizing compiler will probably say WTF and just leave it alone.

      As has been said many times, profile your code. If you find the bottle-neck, fix it. If it comes down to pointer vs. array, get rid of th array.

      "For example, if you've a sequence of characters, and want to copy them, then copying them a character at a time - regardless of how - is slow and inefficient on a 32-bit or 64-bit machine. Cast the data onto the largest unit the CPU can pull out of memory in one operation, and operate in bulk/parallel."

      Careful with that. If you're doing cross architecture code you'll have to keep in mind big-endian vs. little-endian.

      ~X~

      --
      ~X~
  16. Being pedantic perhaps... by SteveAyre · · Score: 1

    Being pedantic perhaps, but you might get a buffer overflow in your code if for some reason it doesn't have a NULL at the end.

    Better would be to call the subroutine with (char* str, size_t length) and to replace strlen(str) with str+length.

    The subroutine then becomes useful more often too - you can then reverse portions of strings as well as the entire string if you want to.

    1. Re:Being pedantic perhaps... by halleluja · · Score: 1
      I wouldn't replace in-situ either. You're bound to forget you reversed the string later in the flow of your program.

      If it's hidden deep in your code, it might prove a real headache why string XXX suddenly reverses. The overhead is pretty low, though the extra malloc might consume some constant time (which is the topic here, decreasing constant time).

    2. Re:Being pedantic perhaps... by SteveAyre · · Score: 1

      Depends what your doing I guess. Some things might be easier to track and others not.

      I'd say it's more useful in situ - malloc followed by memcpy or str(n)cpy yourself would let you copy it first if you you didn't want to change the original string while you could still do it on the already allocated string if you wanted.

      Besides which if you do a malloc then you're wasting extra memory - which is what the submitter said he didn't want.

  17. True in General, not true in reality by gbrandt · · Score: 1

    There are quite a few compilers out there for embedded CPU's and DSP's that are still in the stone age. In embedded programming it is not safe to assume that a compiler will contain any optimization and to write C code that is known to produce faster assembly code.

    Thats just a fact of life for people in the trenches.

    I suspect that for OS level compilers (GCC, VS) that the opposite is true and you could assume a decent level of optimization is available.

    Gregor

    1. Re:True in General, not true in reality by Anonymous Coward · · Score: 0

      but since most code will never be compiled on any of those systems the arguement is moot.

      target your systems. (that doesnt mean write sloppy and not care about portability)

      but basically if you arent running on ancient compilers then dont worry about it

    2. Re:True in General, not true in reality by halleluja · · Score: 1
      In embedded programming it is not safe to assume that a compiler will contain any optimization and to write C code that is known to produce faster assembly code.
      Size matters.
  18. Strength Reduction by The+boojum · · Score: 3, Informative
    My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
    Yes, semantically array[index] has to have the same effect as *(array+index). But the compiler is free to generate conceptually equivalent code in any way that it pleases. Any decent C/C++ compiler optimizer that can perform strength reduction ought to be able to see how the index changes the memory location and turn it into simple pointer incrementation accordingly. And strength reduction is a well known optimization that's been around for ages -- if memory serves, even the old Red Dragon book talks about how it works in this context. If your compiler can't handle this you need to find a better compiler.
  19. Do This instead by Leimy · · Score: 2, Interesting
    void reversestring_array(char *str)
    {
    int head, tail;
    char temp;
    if (!str) return;
    tail = strlen(str) - 1;
    for (head = 0; head < tail; ++head, --tail)
    {
    temp = tail[str];
    tail[str] = head[str];
    head[str] = temp;
    }
    }
    That'll teach ya.
    1. Re:Do This instead by forkazoo · · Score: 1

      Bah, the original requirement was to do it with as little memory allocation as possible. You still use a temp variable. Let bitwise operations be your friend.

    2. Re:Do This instead by Hast · · Score: 1

      As was discussed in the original thread one point that this excercise miss is that a compiler will naturally put the temporary variables into registers. So the entire XOR thing is cute but will produce slower and less understandable code.

      Naturally neither version will actually use any added memory (as in RAM) but I doubt an optimising compiler will be able to perform strength reduction on the XOR operations.

      Don't try to out-think your compiler. The compiler is your friend.

    3. Re:Do This instead by forkazoo · · Score: 1

      Yeah, I'd never put it in real code. I'm generally pretty trusting of my compiler. There are occasional times when I do wacky stuff, but it almost always happens after profiling, and becoming shocked to realise where my slowdown is. Most recently, I was using a C++ vector class for storing lights in a 3D scene. I had to do lighting calculations for every vertex. What surprised me was that the actual lighting calculations were very fast, but what was slow was accessing the light data from the vector of lights! So, I wound up making my own container class for the lights, even though the STL version "worked."

      If I hadn't found that I was being bottlenecked by the vector access, I never would have written my own container class. (The more I write, the more chance I'll write a bug, after all.)

      The only problem with "job interview" type code snippet questions is that the only compiler is usually your interviewer. You've never met them before, and you have no context for determining optimality outside of what he/she says. In cases like that, I will often assume a non-optimising compiler, and be very literal about the question. I also apologise profusely for answering the question, and explain that in a real project, outside of teh vacuum of an interview question, I would have takena different approach. (And sometimes I will just write it twice if there is time.)

  20. Register indexing. by idries · · Score: 1

    This is an interesting post. Much better than all the politics etc. that we've seen recently on ./

    I can't think of any modern architectures that don't support register indexing. Can anyone else?

    On another note, although at first glance it may appear that the poster is correct I think that a good compiler written for an architecture that didn't support register indexing (M68000 does, so it's going to have to be something very old) would probably be able to figure out what's going on in this code sample. i.e. in the array version within the for loop head and tail are only used as indexes into the array, so converting them into straight pointers would be trivial. Of course in a more complex example YMMV.

    1. Re:Register indexing. by clem.dickey · · Score: 1

      > I can't think of any modern architectures that don't support register indexing

      I don't see any register indexing in IA64. No immediate offset, either. If you want to access an address, you'd better have that address - exactly - in a register. At least you get enough registers. :-).

      Incidentally, the IBM S/360 was an early machine (perhaps the first) with base+index+offset addressing. I've heard that the three-way adder required was a performance hit. I know that some models had instructions which were faster when the index register field was 0, indicating no indexing.

  21. there is a difference... by simcop2387 · · Score: 1
    The array function:
            mov bl,byte ptr [esi+edx]
            mov al,byte ptr [ecx+edx]
            mov byte ptr [ecx+edx],bl
            mov byte ptr [esi+edx],al

    The pointer function:
            mov bl,byte ptr [ecx]
            mov dl,byte ptr [eax]
            mov byte ptr [eax],bl
            mov byte ptr [ecx],dl

    if you look closely at these two portions of the assembly, the resulting code is not the same at all, what you in fact have is in the array 4 additional additions (between registers), which on some architecures can be a non-trivial amount of time (though typically not significant, the memory access is usually longer). in this case the pointer function should be at the least slightly faster than the array one. though as everyone else has said, make sure this is truely a bottleneck of your program before you go doing useless optimizations.

    P.S.
    Also on other architecures (non-x86) this could be a trivial amount of time if they were designed to handle this specifically (two registers added together as a base and offset basically) for memory access.
    1. Re:there is a difference... by SillyNickName4me · · Score: 1

      Got to think of the 6502 and its indirect indexed addressing modes..

      Basicly, you'd store a pointer somewhere in memory page 0, and then use the x or y register as an index into whatever it pointed at (ie LDA ($a0),x would take the pointer stored in address a0 and a1, use the x register as an index, and read the contents of the resulting address into the acumulator) . Here it would be faster to manipulate the index and not the pointer.

      Pointers were 16 bit (64k address space) and x and y were 8 bit registers.. so when you wanted to address beyodn the boundary of a (256 byte) memory page, you'd have to manipulate the pointer anyway.

    2. Re:there is a difference... by Anonymous Coward · · Score: 0

      Yes, and it could be even better with...
      if (str)
              {
              for(tail=str;*tail;tail++); // instead of strlen
              for(tail--;tail>str;str++,tail--)
                      {
                      t=*tail;*tail=*str;*str=t;
                      }
              }

    3. Re:there is a difference... by simcop2387 · · Score: 1

      well personally i'd suggest this
      <blockquote>*tail^=*str; *str^=*tail; *tail^=*str;</blockquote>
      for the swap, as this will be faster on many more architecures. and its more memory effecient.

    4. Re:there is a difference... by Waffle+Iron · · Score: 4, Informative
      what you in fact have is in the array 4 additional additions (between registers),

      Actually, most x86s have a dedicated address generation units which handle those index additions in parallel on separate logic from the main ALU. So both cases would actually run at the same speed on a modern x86.

    5. Re:there is a difference... by Anonymous Coward · · Score: 0

      True, and that way only one temporary variable is needed for the function (the tail ptr) as logic analisis indicates.

    6. Re:there is a difference... by UOZaphod · · Score: 1

      Since the original ended up using registers, it couldn't be any more efficient.

      Before I even posted it, I tried it with the XOR swap. The code ended up looking very convoluted. I don't think the compiler recognized the XOR swap design.

      --
      "The unicode stuff in the latest version is working fabulously well. My russian mafia friends are ecstatic."
    7. Re:there is a difference... by Anonymous Coward · · Score: 0

      So both cases would actually run at the same speed on a modern x86.

      "Modern x86" would be the ones that drop backwards compatibility with the 8088/80286/80386 processors?

      x86 is the opposite of "modern".

    8. Re:there is a difference... by Anonymous Coward · · Score: 0
      Yeeeehaaaaaw. Let's XOR, baby. It's cute. It's fast. It's obfuscated. Besides, we like doing 3 memory writes instead of 2 because it saves a register! We roooooooock!

      Wait. Why did everybody suddenly get really quiet? You in the back. Speak up.

      For those of you watching from home, the question is "isn't the compiler smart enough to use a scratch register instead of wasting cyles with a 3rd write?"

      Ok well at least we still have obfuscated!

      Remember kids: Memory writes are more expensive than reads. In fact if the memory is "fairly likely" to already contain the value you intend to set, then it's often faster to test first and only write if the value differs. Go ahead and try it.

      Disclaimer: This is really bad coding style, but it gets the point across in very few lines.
      /* inside some loop */
      #ifdef TEST_THEN_SET
        if (p->member != x)
      #endif
          p->member = x
    9. Re:there is a difference... by Anonymous Coward · · Score: 0

      Trivial test wont reveal this... the cache is smart enough to figure those out. The point still stands, memory is slow, registers are fast.

  22. Re:Hmm.. by Strike · · Score: 1
    Preview! Gah.
    You must be new here.
  23. Spelling, YAY! by CestusGW · · Score: 1

    Oh my god! You said piqued Not peaked, or piked, or any other dumb thing that editors let slip by in mainstream publications, but piqued! I love you

    --
    Too much repetition my too much repetition!
    1. Re:Spelling, YAY! by Anonymous Coward · · Score: 0

      Ramen. Probably explains why it's not a dupe.

  24. There cannot be any difference. by Anonymous Coward · · Score: 2, Informative

    The whole point is dumb.
    Every C programmer should know the answer to the following question:

    What is 6["abcdefghijkl"]?

    Answer: 'g'.

    How is this determined?
    By definition x[y]=*(x+y)=y[x].

    Don't believe me, check the standard.

    ( Yeah this is a degenerate case, like Duff's device. Still a compiler has to support it. )

    1. Re:There cannot be any difference. by Frans+Faase · · Score: 1

      Are you sure? I thought that x[y] should be interpretted as *(x+y*sizeof(x[0])). So it is only true when sizeof(x[0]) == 1.

    2. Re:There cannot be any difference. by drxenos · · Score: 1

      No, pointer arithmetic takes care of that. So, it is *(x + y). If x were an int*, then x + 1 would point to the int just after x (or x + 0), not need for the programmer to do it. Works the same for arrays since in this case the array name would "decay" to a pointer to its first element.

      --


      Anonymous Cowards suck.
  25. Effective cache use will be a better optimisation by WouldIPutMYRealNameO · · Score: 1

    Firstly - who really cares? This kind of optimisation almost never occurs as far as I can tell. Secondly, counting CPU instructions hasn't been accurate for years. Thirdly, for any kind of memory access loop, cache misses will most likely be your biggest performance hit. The best way to actually improve this loop is probably going to be by...
    1) if the length of the data is 4k, just do the algorithm
    2) otherwise, stride through the data in page-size increments reading 1 32bit word, then do the algorithm

    But seriously - either you are working in a system so small and slow that you are already an expert in this stuff, or this kind of optimisation doesn't matter.
    Choose quicksort instead of bubble sort.

    --
    Damnit - I wanted my nick to be "WouldIPutMYRealNameOnSlashdot"
  26. You can... by Bin_jammin · · Score: 1

    always assume anything you want. It won't make it correct though.

  27. But not everything is a "modern PC". by Roadkills-R-Us · · Score: 1

    There are still tons of smaller, slower, "antiquated" processors embedded in all sorts of things, for which code gets written. The Z80 is still alive and well, believe it or not, along with lots of other, old standards (and newer, but still small and slow, CPUS).

    1. Re:But not everything is a "modern PC". by topham · · Score: 1

      very true.

      but reading the submission it really looked to me like someone who simply wanted to justify their opinion on something. he wanted a generic answer to a question which requires a specific answer.

      if someone here said the pointer method was always faster he would run back to his friends and declare he was right all a long.

      the fact is the answer requires knowledge of all the target platforms as well as all of the compilers involved.

      pointer method is generally faster and more efficient; but it makes for difficult to read code which can be very difficult to debug, or modify in the future. particularly after the code hasn't been looked at for some period of time. unless the particular code is execute a large majority of the time there is no reason to make it more difficult to read and modify.

      please excuse the complete lack of caps. synergy seems to have decided my mac doesn't need capitals

  28. Spelling and grammar and punctuation, oh my! by Roadkills-R-Us · · Score: 4, Funny

    I didn't see any errors in punctuation or grammar, either. I don't recall the last time I saw a post of that length which didn't confuse plural's (sic) with possessives.

  29. Rubbish. by idries · · Score: 1

    IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

    Obviously you have never worked with low-power architectures or non-mainstream compilers. While this particular example may well be too simple to justify using pointer arithmetic in any environment, in a more complex example that type of thing may well be justified (espically on more exotic architectures).

    Just expecting compilers to do a good job is no substitute for being able to ensure that they do and being able to do a good job for them if they don't.

  30. In summary... by SmurfButcher+Bob · · Score: 1

    I've been reading some really good arguments in these posts about how it'll work. And the bulk of them end up saying, "any good compiler will/should"...

    In other words, "No, you cannot."

    Your best bet is to do (almost) what you did - compile a test case per platform, check the code - and then the part that you neglected, you need to count the clock ticks per instruction. To reinforce what one poster said - there can be a large difference between [ecx] and [ecx+edx], conceptually speaking. If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.

    --

    help me i've cloned myself and can't remember which one I am

    1. Re:In summary... by Anonymous+Brave+Guy · · Score: 2, Funny
      If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.

      Not necessarily. You're ignoring the fact that modern processors use pipelining architectures, branch prediction, extensive caching often at several levels, and a whole host of other things that mean the total time required is not the sum of the individual times for each instruction.

      One of these days, someone will invent a program that can take some more abstract representation of what we want to do, and automatically generate optimised machine code from it on any given platform. Yeah, there's an idea... I can C it now!

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    2. Re:In summary... by SmurfButcher+Bob · · Score: 1

      Yes, necessarily.

      "When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" is the quesiton.

      "You're ignoring the fact that modern processors use pipelining architectures, " you propose.

      As I said, the bulk of the posts end up saying, "any good compiler will/should"...

      In other words, "No, you cannot."

      But even in the case with chip-level optimizations, your point is still moot; for those processors, then it doesn't make any difference for you - call them all 1 tick. Of course, some code is more pipelinable and can be paralleled better than others... as you well know, so it still matters. That's not his question, anyway. If you read his question carefully, you'll notice the word "ALWAYS" floating around. This is a very strong qualification that you seem to have missed. Last time I checked, 8051s were still in production, not to mention the large pile of other embedded "super-micro" processors out there. Very few support pipelining or prediction, or even caching for that matter. Does my Palm Pilot support branching? How about my PocketPC? Or my generic WinCE device?

      "Always." Right?

      --

      help me i've cloned myself and can't remember which one I am

  31. differences not having to do with optimization by bersl2 · · Score: 1

    int array[8];
    int *int_ptr = malloc(8 * sizeof(int));
    char string[] = "Hello world!";
    char *char_ptr = "Hello world!";

    sizeof(array) == 8 * sizeof(int);
    sizeof(int_ptr) == sizeof(int *);

    You can get int_ptr to point to another location in memory (such as doing ++int_ptr); the similar statement (++array) is illegal.

    I think that altering individual characters of char_ptr is undefined (as it would result in changing the value of the literal string), while (as above) string cannot be pointed to a new literal string.

    1. Re:differences not having to do with optimization by Anonymous Coward · · Score: 0

      You can alter the chars of char_ptr all you want, it's just a pointer to some bytes (that happen to be ascii chars) on the stack. Of course you don't want to do stupid stuff like kill the null terminator, but nothing stops you from messing with it.
      That's why you should be using some const action to keep people from screwing with stuff that shouldn't be touched.

    2. Re:differences not having to do with optimization by Anonymous+Brave+Guy · · Score: 1
      You can alter the chars of char_ptr all you want, it's just a pointer to some bytes (that happen to be ascii chars) on the stack.

      Nope, sorry, he was right and you're wrong. In the code given, char_ptr points to a string literal, which is automatically constant data and can be stored somewhere completely different to either the stack or the heap. Attempting to change that data results in undefined behaviour.

      (For completeness: in C++, you can only even point a (non-constant) char * to a string literal because of a deprecated hack in the language spec that allows you to violate the type system, though I'm not sure C99 picked up on that one.)

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  32. your mileage will vary, but... by blackcoot · · Score: 3, Informative

    ... my experience has been that this matters more in the multidimensional array case than in the single dimensional array case (for those who are curious: my goal is to write algorithms which do non-trivial amounts of processing on VGA or larger video at full frame rates [>= 15Hz], any time i can make array operations faster my entire program benefits significantly). when dealing with two dimensional arrays, you can either do the addressing yourself (location [i,j] in a r x c matrix maps to [i*c+j] in a flat array). if you are clever about how you explain your indexing to the compiler, it should realize that you're passing through consecutive addresses in memory and generate code accordingly. if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.

    have you tried this with intel's c/c++ compiler or other compilers? i'd be curious to see if what you're seeing is a result of how gcc is limited in the number of optimizations it can apply directly to the parse tree because it can't assume (at that stage) a particular target machine.

    1. Re:your mileage will vary, but... by PsychicX · · Score: 1

      if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.

      Not true. Not true at all. Multi-D arrays are merely syntactic sugar for single dimension. It doesn't involve two pointer dereferences unless it's a jagged array, aka an array of pointers. A built in C multi-D array will be just as good as a single-D array.

    2. Re:your mileage will vary, but... by blackcoot · · Score: 1

      it depends on how you allocate your arrays. if everything is statically allocated with sizes known at compile time, you are correct. the dynamic case, however, is a very different story.

    3. Re:your mileage will vary, but... by Mr2001 · · Score: 1

      if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.

      That's true for jagged arrays, but IIRC, statically allocated multidimensional arrays in C are not jagged. A[i][j] is equivalent to A[i*c+j] (or maybe A[i+j*c], whatever).

      --
      Visual IRC: Fast. Powerful. Free.
  33. Use Java instead .... by icepick72 · · Score: 2, Funny

    ... and then you will no longer have to worry that it might be slow.

  34. GCC experimental results by RML · · Score: 5, Interesting

    Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version:

    L13:
    movzbl  -1(%ebx), %edx
    movl    %esi, %ecx
    decl    %edi
    movl    8(%ebp), %eax
    movb    %dl, -13(%ebp)
    movzbl  -1(%esi,%eax), %edx
    movb    %dl, -1(%ebx)
    decl    %ebx
    movzbl  -13(%ebp), %edx
    movb    %dl, -1(%esi,%eax)
    incl    %esi
    cmpl    %ecx, %edi
    jg      L13

    The loop from the pointer version:

    L5:
    movzbl  1(%esi), %edx
    movl    %esi, %ecx
    movzbl  (%ebx), %eax
    movb    %al, 1(%esi)
    decl    %esi
    movb    %dl, (%ebx)
    incl    %ebx
    cmpl    %ecx, %ebx
    jb      L5

    Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
    Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s

    So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.

    --
    Human/Ranger/Zangband
    1. Re:GCC experimental results by Anonymous+Brave+Guy · · Score: 2, Interesting

      GCC is designed to be portable, not fast, so the code is generates is often pretty bad compared to specialised, platform-specific compilers. Obviously your test is relevant if GCC is the compiler you'll be using, but for serious performance work it's pretty much irrelevant what GCC generates because no-one uses it when native alternatives are available anyway. In fact, your example code here is a great demonstration of this!

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    2. Re:GCC experimental results by UOZaphod · · Score: 2, Insightful

      What is used to compile the Linux Kernel?

      --
      "The unicode stuff in the latest version is working fabulously well. My russian mafia friends are ecstatic."
    3. Re:GCC experimental results by yarbo · · Score: 1

      gcc, icc, or tcc. The majority of people probably use gcc, but tcc and icc are definitely possibilities.

    4. Re:GCC experimental results by Nelson · · Score: 1

      Use an actual release of the compiler and report this regression to the gcc mailing list. It's an error.

    5. Re:GCC experimental results by dtfinch · · Score: 1

      No one who wants performance uses gcc?

    6. Re:GCC experimental results by Anonymous Coward · · Score: 0

      A more accurate description would be that gcc is designed to be as fast as possible while remaining portable. A very large proportion of gcc development work concentrates on optimizations, but compared to the intel compiler, the gcc folks develop generic optimizations that work on all architectures. They do have some architecture-specific optimizations especially in the last code generation stages that are written for a single architecture in any case, but they don't structure the entire compiler for the benefit of a single architecture.

    7. Re:GCC experimental results by RML · · Score: 1

      Sadly, compiling gcc on cygwin has been broken for months, otherwise I'd test it with a more recent snapshot.

      --
      Human/Ranger/Zangband
    8. Re:GCC experimental results by tajribah · · Score: 2, Interesting

      Actually, that's not true in many cases -- GCC 3.x generates very good code, but the 4.x versions still haven't caught up with the 3.x line.

    9. Re:GCC experimental results by Deorus · · Score: 2, Insightful

      >> no-one uses it when native alternatives are available anyway

      Aren't hundreds of Linux distributions out there enough to prove that assumtion wrong? Gcc is used regardless of its speed because it's free.

    10. Re:GCC experimental results by Anonymous+Brave+Guy · · Score: 1
      Actually, that's not true in many cases -- GCC 3.x generates very good code, but the 4.x versions still haven't caught up with the 3.x line.

      I write high performance, highly portable code for a living. I use nearly a dozen compilers, including GCC, supporting more than half a dozen different platforms between them. I just finished spending several weeks specifically researching ways to further improve that performance. And I'm sorry, but you're wrong.

      Code generated by any recent version of, say, VC++ or Intel's compiler, blows code generated by GCC away most of the time on performance. These other compilers are tuned to optimise for a specific architecture, where GCC's optimisations are inherently generic for the most part, so this isn't really surprising. Indeed, the sample code in the first post of this subthread is an example of exactly this issue, and you can compare it with similar examples from other compilers posted elsewhere in this discussion to see the difference.

      Even on the generic optimisation front, GCC has only very recently merged in the SSA branch, while competing compilers have been supporting whole program optimisation using related techniques for several years now. I imagine the balance will tilt back towards GCC a bit now the SSA framework is in place, since this gives scope for many new optimisations to be introduced with relatively little effort, but as of today that hasn't happened yet.

      None of this is necessarily a criticism of GCC per se; its goals are different to VC++ or Intel's software or whatever other alternative. But if raw performance is what matters, GCC is usually not the best choice today.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    11. Re:GCC experimental results by Anonymous+Brave+Guy · · Score: 1

      You stripped the qualifier "for serious performance work" when you quoted me. GCC has many good things going for it; performance just isn't (yet?) one of them.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    12. Re:GCC experimental results by Piquan · · Score: 2, Informative

      Unfortunately, the article didn't say what compiler he was using. But since we're giving data points:

      gcc 3.4.2 3.4.2 [FreeBSD] 20040728, x86, -O3 -march=pentium-m. Generated essentially the same code as the article's.

      Array version:

      .L12:
      movzbl (%esi,%ecx), %edx
      movzbl (%esi,%ebx), %eax
      movb %al, (%esi,%ecx)
      decl %ecx
      movb %dl, (%esi,%ebx)
      incl %ebx
      .L10:
      cmpl %ecx, %ebx
      jl .L12

      Pointer code:

      .L23:
      movzbl (%ebx), %edx
      movzbl (%ecx), %eax
      movb %al, (%ebx)
      decl %ebx
      movb %dl, (%ecx)
      incl %ecx
      .L22:
      cmpl %ebx, %ecx
      jb .L23

      1.4 GHz Athlon. Array code time: 3.274s. Pointer time: 3.322s. Single (100000x) trial of each.

      I'd say that's within noise.

    13. Re:GCC experimental results by akihabara · · Score: 1

      tcc is hardly a joke, let alone a kernel compiler, twat

    14. Re:GCC experimental results by yarbo · · Score: 1
  35. And C++/STL? by XenonOfArcticus · · Score: 4, Informative

    And what of STL container classes under C++?

    Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.

    If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs go for tight/cryptic. As well, the performance of the runtime platform (CPU, memory, bus) have bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out of order execution and other black juju might render it meaningless.

    Finally, you have to consider the cost/benefit of the situtation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or image processor main loop, where it'll run a few quadrillion times. Other places in the same codebase though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.

    Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.

    I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.

    That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.

    When performance really counts, a whole new set of tools and processes come into play.

    --
    -- There is no truth. There is only Perception. To Percieve is to Exist.
    1. Re:And C++/STL? by Arandir · · Score: 2, Funny

      I'm currently writing a piece of software to compete with a commercial proprietary offering. I'm using "bloated" STL and "bloated" C++ to manipulate "bloated" XML. Frankly, I'm shocked at its poor performance, and I might have to do some optimizations. It takes about a half of a second to load and process 200k worth of XML.

      ON THE OTHER HAND, the commercial proprietary alternative I'm competing against loads the equivalent data. But that data consumes 25Megs, and takes five seconds to load! You would think if they're going to use a proprietary data format, they could at least make one that works!

      I'm not going to apologize anymore for using "bloated" tools.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    2. Re:And C++/STL? by WARM3CH · · Score: 2, Informative
      OK I tried it with Visual Studio 2003 and also tried STL. Of course with STL you only need one line of code to reverse a character string:
      reverse(str, str+strlen(str));
      Now, the interesting part is what the optimizing compiler outputs for each of the three vairants (I only include the inner loop):
      With pointers we have:

      $L13583:
      mov bl, BYTE PTR [ecx]
      mov dl, BYTE PTR [eax]
      mov BYTE PTR [eax], bl
      mov BYTE PTR [ecx], dl
      inc ecx
      dec eax
      cmp ecx, eax
      jb SHORT $L13583

      With array we have:

      $L12527:
      mov bl, BYTE PTR [ecx+esi]
      mov dl, BYTE PTR [eax+esi]
      mov BYTE PTR [eax+esi], bl
      mov BYTE PTR [ecx+esi], dl
      inc ecx
      dec eax
      cmp ecx, eax
      jl SHORT $L12527

      and with STL we have:

      $L13663:
      mov dl, BYTE PTR [ecx]
      mov bl, BYTE PTR [eax-1]
      dec eax
      mov BYTE PTR [ecx], bl
      inc ecx
      cmp ecx, eax
      mov BYTE PTR [eax], dl
      jb SHORT $L13663
      Quite nice, no? ;) Now, I made one last step and used the RDTSC instruction to actually count how many clock cycles it takes to run each version of the function to reverese a 80 characters string. This way we can also see the effect of the parts not inside the loop. This is the result:
      function with the array: 661 cycles
      fucntion with the pointer: 616 cycles
      function with the STL: 607 cycles
      So although the core loop section is almost identical with the STL and pointer version, the STL version is tiny bit better with the setup section. All in all, I think this an example to show nice and neat C++ code can compete fairly well with optimized C.
  36. use array by dave1g · · Score: 1

    If you use arrays the compiler these days will ocnvert to pointer arithmetic. However it also now knows that it is an array and can make assumptions not available to it before allowing more code order optimizations in loops.

    These days if its something stupid like using one syntax over another, that optimization has been built into the compiler. The only real speed enhancements are going to be algorithmic. You know the old thigns that make sense, take stuff out of loops that dont need to be inside, compiler will do that so dont even bother making the change unless it increases readability. blah blah blah...

  37. TFA is a Troll by mugnyte · · Score: 1


    If you answer this question, you fuel a discussion that should be constrained to a very very small domain of C development. Otherwise, you're going to see needless flag waving about coding standards in domains where it doesn't need to exist. Don't feed the dirty C programmer (we like other info, really, we do).

    and frankly, the TFA depends on architecture, where most modern compilers make both snippets equal.

    1. Re:TFA is a Troll by UOZaphod · · Score: 1

      I submitted the article because I am genuinely interested in the idea (which opposes my original thinking) that I can get away with using arrays and not take a performance hit.

      I am happy if it is true, because using arrays results in code that is easier to maintain and read.

      When submitting the article, I made it a point to avoid using inflamatory language, and went out of my way to present both sides of the issue, even going as far as making a case for the opposing view.

      In fact, when I brought the idea up the first time in the other thread, my own posts were met with a harsh remark about why I was wrong. That comment, while not as friendly as I would prefer, is what got me interested in the idea. I posted to Ask Slashdot because I wanted to hear some more level-headed opinions on the matter that didn't include inflamatory language.

      From the replies I've seen so far, I can see there are indeed some critical thinkers reading this site who are nice enough to respond in kind.

      --
      "The unicode stuff in the latest version is working fabulously well. My russian mafia friends are ecstatic."
  38. Correctness trumps illusory speed improvement by Anonymous Coward · · Score: 0

    Your array vs pointer question is interesting from a navel-gazing perspective, but of no importance in practice. In practice, the overriding concern is readability and correctness. Performance is only important if benchmarking shows that the routine is a bottleneck.

    In fact, your pointer example has a subtle bug: if strlen(str) == 0 then you can have ptail = str - 1 which is strictly illegal (you cannot point to a location outside of a memory object except for exactly one element past the end, which is then illegal to dereference). I know that no modern systems would fail with this code, but I expect it would screw up on some memory models with old MS-DOS C compilers.

    In short, you should always write the simplest, most readable code, then optimise later if there is a problem. Both array indexing and pointer versions can be written well, so there is no reason to prefer either. By the way, I would mark you down for using !ptr when ptr == NULL is the more readable idiom.

    1. Re:Correctness trumps illusory speed improvement by Anonymous Coward · · Score: 0

      First of all, tail is an integer not a pointer. Second, its use as an index is guarded by the predicate head < tail. Since head is initialized to 0, an initial tail value of -1 will ensure that the loop is never executed.

      Second, every serious C coder I know would agree that !p is the more accepted idiom (going all the way back to the days of K & R).

    2. Re:Correctness trumps illusory speed improvement by Anonymous Coward · · Score: 0

      "tail" is an integer, but "ptail" is a pointer. No points for reading comprehension for you!

      My copy of K&R uses p == NULL exclusively. Perhaps the 1st edition used !p.

      No matter. In the case of !p, style in C has moved on, and p == NULL is the clearly superior idiom.

  39. Re:Hmm.. by Alomex · · Score: 1

    Of course, because a feature such as "edit" is only for wimps...

    The unix philosophy lives on: if the system is cumbersome and user unfriendly, it's the user's fault!

  40. Unfortunately, the 2000s answered... by Anonymous+Brave+Guy · · Score: 1

    ...and said they can't have them because all these new-fangled alternatives suck when you've got serious work to do.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:Unfortunately, the 2000s answered... by Anonymous Coward · · Score: 0

      Hah. I guess that's why there's so many web apps written in C these days, right?

      Oh wait, there isn't. Fucking elitist dipshit.

    2. Re:Unfortunately, the 2000s answered... by Anonymous Coward · · Score: 0

      Yeah... like APACHE

      Oh wait, there are. Fucking ignorant dipshit.

    3. Re:Unfortunately, the 2000s answered... by Anonymous Coward · · Score: 0

      web apps? we were talking about *serious* work.

      bloody script kiddies.

    4. Re:Unfortunately, the 2000s answered... by Anonymous Coward · · Score: 1, Insightful

      all these new-fangled alternatives suck

      Right, that's why I use Common Lisp.

    5. Re:Unfortunately, the 2000s answered... by UOZaphod · · Score: 1

      Isn't the entire Linux kernel written in C?

      I wasn't talking about database front ends and other softcore programming when I submitted the article.

      --
      "The unicode stuff in the latest version is working fabulously well. My russian mafia friends are ecstatic."
    6. Re:Unfortunately, the 2000s answered... by Anonymous Coward · · Score: 0

      Hey dipshit - he is refering to the days when people wrote CGI in C.

      Apache is not a 'web app' you fucking tard, it's a web server. There's a big difference.

    7. Re:Unfortunately, the 2000s answered... by kasperd · · Score: 1

      Isn't the entire Linux kernel written in C?
      Almost. There are some low level stuff which need to be done in assembler. But except from that it is all C.

      --

      Do you care about the security of your wireless mouse?
  41. Re:Effective cache use will be a better optimisati by TheLink · · Score: 1

    Yeah that's what I figured, use a good algorithm (e.g. N vs N^2 ) and as long as the processing loop fits in cache you'll be fine.

    And if you can't find/figure out a good algorithm yet, don't do really strange stuff. That way someone later can more easily figure out how to fix your code.

    --
  42. As Donald Knuth said... by jschmerge · · Score: 4, Insightful

    Premature optimization is the root of all evil.

    I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bonafide array...

    In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:

    (ptr + index)->dataMember

    is less readable than this:

    ptr[index].dataMember
    1. Re:As Donald Knuth said... by illuminatedwax · · Score: 1
      I personally let the compiler writers worry about this type of thing.
      This is Slashdot; don't you think some of us are compiler writers?
      --
      Did you ever notice that *nix doesn't even cover Linux?
    2. Re:As Donald Knuth said... by Anonymous Coward · · Score: 0

      This is slashdot. Most of us post links to pictures of a man spreading his ass in ways that ought to be illegal.

    3. Re:As Donald Knuth said... by illuminatedwax · · Score: 1

      And goddammit, we're gonna make that as CPU efficient as we possbily can!

      --
      Did you ever notice that *nix doesn't even cover Linux?
    4. Re:As Donald Knuth said... by Anonymous Coward · · Score: 0

      How about using index[ptr].dataMember? I think it's much more convenient and readable than ptr[index]. This is because it is so similar to offsetof_dataMember(base, index, size_of_struct) - corresponding assembler syntax for accessing the array (assuming size of struct is 1, 2, 4 or 8 bytes)... I wonder why they ever did bring also alternative syntax ptr[index] to C, it looks stupid indeed.

  43. Pointers are for programmers silly rabbit!! by Da_Weasel · · Score: 2, Insightful

    Pointers are for programmers, not for speed. A particular compilers implementation of the C specification may produce better or more efficient code when using pointers, but i seriously doubt that every implementation would work the same or even across different architectures.

    --
    If you must!
    1. Re:Pointers are for programmers silly rabbit!! by tepples · · Score: 1

      A particular compilers implementation of the C specification may produce better or more efficient code when using pointers

      But if the majority of "compilers implementation[s] of the C specification [] produce better or more efficient code when using pointers", then isn't that something to talk about?

  44. Re:Hmm.. by cortana · · Score: 1

    I think that people making idiots out of themsevles is much more preferable to trolls editing their +5 moderated posts into Goatse ascii art.

  45. Re:Effective cache use will be a better optimisati by functor0 · · Score: 2, Funny

    I have good story for this one. In my second co-op term, I was asked to improve the speed of the general 2D convolution image filter for a well-known commercial paint package. Nothing complicated, just run the given 2D kernel through the image.

    The previous writer of the function had carefully removed all the array accesses, only using pointers that were incremented by fixed constants as it proceeded through the code. He had also carefully maintained an array of the sums of the results as we moved from pixel to pixel to avoid re-calculating anything twice. You would think, it's pretty fast, eh?

    I came across two problems with the code:

    1. His array of sums was actually a queue and so he shifted the entire array by one element for every single pixel. Using a ring buffer instead quickly solved this one.

    2. I then came to realize that he was traversing the row-major image in columns. The cache coherency was being shot to hell because none of the cache-lines were being hit as it went from pixel to pixel. I rewrote the function to go by rows instead and guess what the speed improvement was? Something like 3 times!

    Go figure. Some people optimize and get it entirely wrong.

  46. Re:Hmm.. by Hast · · Score: 1

    There is no edit function on Slashdot because down that road lies edit wars and other problems. (Specifically there are issues with editing + moderation as the mod should reflect the original article and not the edit.)

    Sure it would be nice with a "allow edit withing 3 minutes" function which can be found on WWW boards. OTOH Slashdot generates several times the traffic as other sites with webboards.

  47. Re:Hmm.. by Alomex · · Score: 1

    or how about an "editing undoes the moderation" feature, just as posting to a thread one has moderated undoes any moderation one might have done therein....

  48. Thinking like this by phorm · · Score: 2, Insightful

    Is why we have big, ugly, bloated programs that require overpowered CPUs. First of all, it depends on what the application is being coded for. Perhaps it's actually intended for slower machines as well as faster... not everything is made to run on a P4-4Ghz machine you know.

    Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, on an operation perhaps requiring ready output, the output might even be visible delayed on a faster machine.

    Pointers can be a pain in the ass and I definately agree that I find them annoying at times, but they also have their place and you don't always have the option of sacrificing beauty for functionality. If the code is a bit confusing, well that's what comments are for.

  49. Pointers and arrays are NOT interchangeable! by oldfogie · · Score: 1

    This one recently came and bit me on the ass.

    Pointers and arrays are NOT interchangeable!

    Consider the data:
          char m1[7] = { 1, 2, 3, 4, 5, 6, 7 };
          char m2[7] = { 8, 9, 10, 11, 12, 13, 14 }; /* one of m1, m2 will be at an odd address */
          struct z { u_int16_t a; u_int16_t b[2]; }__attribute__((packed));

          Then,
          printf("m1: %04x ?= %04x\n",((struct z *)m1)->b[0],*((struct z *)m1)->b);
          printf("m2: %04x ?= %04x\n",((struct z *)m2)->b[0],*((struct z *)m2)->b);

    According to section 3.2.2.1, prefix * and postfix [0] are identical.

    However, according to 3.3.4, the affect of the pointer conversion may yield an invalid pointer, due to the (potentially) stricter memory alignment of the new type.

    Bill

    1. Re:Pointers and arrays are NOT interchangeable! by Anonymous Coward · · Score: 0

      m1: 0403 ?= 0403
      m2: 0b0a ?= 0b0a

      What are you trying to tell? Did you use a broken compiler (or even worse, MSVC++)?

    2. Re:Pointers and arrays are NOT interchangeable! by oldfogie · · Score: 1

      I used GCC with an IAPX420 (an ARM) as the target.

      While the x86 architecture may allow for pointers to arbitrary memory locations, other CPUs may not.

      In this case, GCC used a pointer to a 16-byte memory location, which according to the CPU architecture had to point to an even boundary. Hence, the problem.

      Bill

  50. xvnews? Bah! Real Usenet junkies use... by Anonymous Coward · · Score: 0

    Real Usenet junkies use trn from a command line shell.

  51. *Sigh* Go read a textbook on "strength reduction" by Anonymous Coward · · Score: 0

    For the most part, you can leave it to the compiler. The optimizations are platform-specific, so for general portable code, don't worry too much.

    All the optimizations considered here are examples of "strength reduction" in loops. There's a lot of existing literature that you can find with that term.

    The part that really matters is the multiplication of the index by the element size, something that goes away in your character-string example. Many processors can compute (a + b * k) in a single instruction or addressing mode for some power-of-two values k. For the x86, k = 1,2,4 or 8 have special hardware support.

    If the element size k is not one of these special values, then you have to use
    a longer instruction sequence, and the idea of pre-scaling your index (b) by k is attractive. So rather than ++b and accessing a+b*k, increment b += k and access a+b.

    Whether this is beneficial depends on the number of different factors k to be used. If you are accessing a1[b] and a2[b], and the two arrays (a1 and a2) have different element sizes, then you need to keep track of two values b1 and b2 (unless k2 = 2*k1 or something convenient like that). This can add up to more work than the multiply in some cases. Or the register usage can get out of hand.

    It also matters whether b is incremented simply, or in a more complex pattern. If it's not ++b but b += function(), then the pre-multiplied version is not b += k but
    b += k * function(), and you have to ask if you're saving anything. You may be able to cheaply push the multiplication by k into function(), or maybe not.

    Then there's the question of whether the array access is performed once per loop or not. If a[b] is only accessed every now and then, perhaps it's better to do the multiply in the rare case it's needed, gaining in exchange simple update of b.

    If you're using the array index for loop control, as is common, you also have to adapt the loop control to consider the pre-multiplication. If the ending point of the loop is fixed ("for (i = 0; i 10; i++)"), it's fairly easy to pre-multiply it, but there are hairier cases.

    In any case, if you do end up pre-multiplying b by the array element size, you then have a chance at a second optimization, which is reducing the "a+b" index addition, and just using a pointer p = a, followed by ++p in the loop.

    The problem here is that you need a pointer per array being indexed, even if the arrays have the same element sizes. (Which again increases register pressure.)

    In your example, the multiplication is trivial, so the first optimization is a no-op. Each index is used to access only one array, so pre-adding the array base is worth it. And since it's the same array, checking when the indexes cross stays just the same: "i j" is the same as "p+i p+j". Thus, you can expect the pointer form to be slightly faster.

  52. Re:xvnews? Bah! Real Usenet junkies use... by Anonymous Coward · · Score: 0

    Nah, they use GNUS

  53. You shouldn't make any assumptions ... by bigsteve@dstc · · Score: 2, Insightful
    I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?

    Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)

    In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make 5-10% difference in a tight loop, but probably much less across a large application. If the difference performance is significant, you will get more benefit for your effort by:

    • finding better C compilers, or
    • using a profiler to find the real hotspots in the code and hand-optimizing them for the hardware platforms that you really care about.

    As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.

  54. Vectorisation by Heretik · · Score: 2, Insightful

    Maybe not relevant in this case since you're working with strings, but with vectorization being so important to performance on most modern architechtures, if you were dealing with floats the pointer one might actually be slower because it's much harder for the compiler to figure out if (and how) it can vectorize it.

    I'm not sure about various compilers and what they do in this case, but following the progress of GCC4's vectorisation, it looks much more likely that the pointer case is passed over by the vectorizer and ends up being (way) slower than the easy to vectorise array index version.

    Like I said, not sure what the actual situation in practise is, but it's worth looking into. The difference between vectorized SSE code and plain old x86 code (for example) is WAY greater than the trivial insignificant difference between the two examples you posted.

    1. Re:Vectorisation by Anonymous Coward · · Score: 0

      GCC4.0 vectorization is not adequate for this example, and I doubt Intel C Compiler is neither. This assuming the code did use restrict pointers, as it does not, it is of course not vectorizable even if those compilers get a bit more advanced.

      How to vectorize code like this is pretty much obvious (I would write SSE implementations for most common alignment cases by hand in some hours, so advanced C compiler should be soon able to do the same in few milliseconds.

      For things that are actually array handling using them as arrays is wise for quite a lot of reason, because in addition to arrays helping modern compilers to generate efficient code, they tend to help to break up dependency chains on generated code. (This is stated in optimization manuals of both AMD and Intel CPUs.) Lengths of dependency chain often restrict maximum throughput of non-trivial algorithms [algo like this is so tricial that no matter how you do it, you should be able to trivially get it running as fast as memory can provide you the data].

  55. ASSume by larry+bagina · · Score: 3, Insightful
    My argument is that one cannot assume

    You're right, however, you're also assuming that your pointer arithmetic is faster.

    Consider a 16-bit architecture with 32-bit pointers.

    Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.

    So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. if you prefer arrays, fine. But if you're so concerned with the speed, you'd be doing it in assembly.

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

    1. Re:ASSume by Paul+Jakma · · Score: 1

      Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.

      On this arch of yours the pointer arithmetic case will clearly have to use a 16bit offset variable on its pointer (16bit architecture). Same difference.

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
    2. Re:ASSume by larry+bagina · · Score: 1

      incrementing a 32-bit pointer requires 2 additions (for the low word and hi word in case of carry). Incrementing the 16-bit offset requires only 1.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    3. Re:ASSume by Paul+Jakma · · Score: 1

      Well, if you choose to have your array be larger than one hi-word segment (or straddle multiple segments), then the array index method will *also* have to use two adds (array + index *is* a pointer plus index in C). If not, you can just use a 16bit offset - one add. Same difference.

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
  56. From the Compiler Trenches by Anonymous Coward · · Score: 5, Informative
    I develop highly optimizing compilers for a living, so use that to put in context what I'm going to say. Things look very different from the other side of the source file.

    I'll admit that I'm always slightly bemused by these sorts of discussions. That bemusement quickly turns to disiniterest after I realize that a lot of people are burning a lot of cycles arguing about it.

    Quite frankly, your example has little relevance in the real world. The optimization you are talking about is covered by strength reduction, as others have pointed out. But that's not the point of my message. This sort of piddly optimization means almost nothing when one looks at the whole application. If this piece of code is in an inner loop that takes 90% of the application time and it proves to be the bottleneck, then one can think about taking a closer look.

    We have customers that come to us all the time with just such examples. They literally tell us, "You missed an opportunity to use a register here," and we know it's important because we know the customers are serious about profiling code and finding bottlenecks. So when they come to us we are happy to look at the "piddly stuff."

    I've seen all kinds of different code. They basically fall into two categories: code that the compiler can do something significant with and code the compiler has no hope of automatically optimizing in any truly meaningful way. When I say "significant" and "meaningful," I explicitly mean "not Dragon Book stuff, except for scheduling (which can provide a significant performance win)" Simple optimizations like common subexpression elimination and copy propagation are more useful at enabling other optimizations than in any cycles gained in their own right. They are important, but not to make the code run significantly faster in and of themselves.

    If one writes an application that does a lot of branching and pointer chasing (say, like, a traditional compiler*) then there's not much an optimizer can do with it. The aliasing difficulties alone will kill most optimizations. It's more important to write these kinds of applications in an understandable way because that is where the programmer time is most costly.

    That said, judicious use of directives for compilers that support them can go a long way toward making these kinds of codes run really fast. Think of threading a tree search, for example. But the compiler is not going to have much hope of converting such a low-level piece of code without help.

    An example of code that a compiler can do really, really well on are the traditional scientific applications. Here parallelism is everything, be it data-level, thread-level or at the distributed memory level (for really big machines). In these cases the loop optimizations are orders of magnitude more important speed-wise than sequential optimization (though, as I said, sequential optimization can enable some of these loop transformations). Some of the more important loop restructuring transformations are:

    • Interchange
    • Unroll
    • Fusion
    • Fission
    • Unroll-and-Jam
    • Invariant hoisting (both data and control)
    • Vectorization
    • Cache Blocking

    When the compiler is done with these, one hardly recognizes the code anymore. :)

    In my experience, the fundamental problem is that compilers are really hard to understand. People argue about what a compiler can and cannot do because they are enormously complex systems that require arcane knowledge of language standards and hardware architecture to really dig into. It's slightly less difficult to understand the broad strokes, for example, simple cases of vectorizable loops. It's a lot more difficult to understand how to parallelize a loop that compresses a sparse array into a dense one.

    If there's one lesson that I like to convey to programmers, it's to not sweat the optimization details. Don't hand-optimize the code. I can tell you fro

    1. Re:From the Compiler Trenches by Anonymous Coward · · Score: 0

      Don't hand-optimize the code. I can tell you from real-world experience that most of the time it actually hamstrings the compiler and one gets worse code in the end than if one had left it alone.

      Yes that's a real world conclussion, but it's sad to tell people to not optimice his own code.
      If hand optimzed code is hard for the compiler, that only shows that our compiler technology is still limited and must be perfected, promoting laziness (don't try to learn to optimice) is not a good programming advice.

    2. Re:From the Compiler Trenches by Anonymous Coward · · Score: 2, Insightful

      [Apologies for the code formatting <ecode> doesn't seem to work (in preview mode, anyway).]

      Hand-optimizing a piece of code without first making sure it's important via profiling and also looking at what the compiler is acutally doing is not avoiding laziness. It's simply a waste of time that prevents one from getting real work done.

      While it's true that our compiler technology continues to improve, there are some cases where hand-optimization puts the cocde in a state that no compiler will ever have a hope of repairing. Here's a simple example (contrived for simpler explanation, but not very far from what one sees in real-world code):

      int global[BIGNUM];
      int *p = global; /* "Optimize" address arithmetic */

      void foo()
      {
      int x[BIGNUM];
      int y[BIGNUM];

      initialize(x, y); /* In another compilation unit, not inlined */

      for(i = 1, i < BIGNUM; ++i) { /* x[0], y[0] hold "special" info */
      for(j = 1; j < BIGNUM; ++j) {
      x[i] += y[j]; /* Reduce y into each element of x */
      }
      *p++ += x[i]; /* Add x to global */
      }
      }

      Note that to the human eye, the outer loop looks parallel. There are no cross-iteration dependencies. It's a straight vector copy loop, very fast on architectures that support it like MMX, SSE, Altivec, etc.

      We've been "clever" and converted the indexing off of global into pointer accesses. In addition, we've made the pointer global because we know this function is the innermost loop and it will be called millions of times. We definitely want to avoid those millions of assignments.

      Or do we? Let's take a look at what we've done. When we call initialize to fill in x and y, we don't know ewhat happens to p. Coming out of that call, p could point to anything. More specifically, it could point to x. Even more specifically, it could point to x[0]. This creates a recurrence meaning that there is now a possible cross-iteration dependency in the outer loop.

      Oops. It's no longer parallel.

      Well, that's not strictly true. It still is functionally parallel but good luck convincing the compiler of that without help from directives. Strictly speaking, recurrences can be parallelized but even if this "optimized" code were parallelized run a heck of a lot slower than the straight vector copy.

      We didn't even hand "optimize" very aggressively. I've seen cases where programmers collapse multiple levels of loops accessing a multi-dimensional array and in the process completely destroy any information the compiler might have had about the iteration pattern. The end result is that loops that would have parallelized don't anymore.

      With parallelization we are talking anywhere from 2x to 50x speedup. Don't trade that for the .1% you'll get from converting an array access into pointer arithmetic.

      Again, the problem is that it is hard for humans to understand the purely mechanical process of the compiler. We "know" information (that initialize doesn't touch p) that the compiler never sees. Oh sure, there are some compilers out there that do whole program analysis and might be able to uncover what's going on here but the compile time on these systems is exponential.

    3. Re:From the Compiler Trenches by Anonymous Coward · · Score: 0

      For the love of god, get an account so I can add you as a /. 'friend' - this is one of the most insightful comments I've read here in a while.

  57. Benchmarking by Threni · · Score: 2, Insightful

    If you're talking about optimisation then the only thing to do is to check different strategies. You can't just assume type a is a better coding method than type b. Even if you could prove it were true for every single compiler on the market, one could come along tomorrow and do things differently.

  58. Pointer aliasing by igomaniac · · Score: 2, Interesting

    With modern compilers you should always use arrays. The reason is that for all but the most pathological cases of indexing, the compiler will do strength reduction and induction to turn the p[i*4] into *p,p+=4. The problem with using pointers is that if you have more than one of them, the compiler has no easy way of knowing if they point to the same place (this is known as pointer aliasing) so a bunch of useful optimizations have to be turned off. If you are doing a[i] = b[i] the compiler can much more easily find out that a and b are distinct memory locations than if you are doing *p = *q.

    If you want to learn more about these kind of source level optimisations, look the AMD Athlon(TM) Processor x86 Code Optimization Guide is a good reference -- it includes a section on why you want to use array accesses instead of pointers, and also has a lot more up-to-date and useful advice on what compilers do to your code. It is available from AMD's website.

    --

    The interactive way to Go -- http://www.playgo.to/iwtg/en/
    1. Re:Pointer aliasing by Cardbox · · Score: 1

      If you are doing a[i] = b[i] the compiler can much more easily find out that a and b are distinct memory locations than if you are doing *p = *q.
      No, it can't. a[i] can address any location in the whole of memory, depending on the value of 'i'.

  59. 80x25 by dolmen.fr · · Score: 2, Interesting
    Again, your numbers are flawed, because my old 80x25 text mode display is still drawn from individual pixels. Those pixels still have to be updated individually by the CPU (in the 10MHz days it was often the main CPU that drew every little dot). 80x25 is drawn on a 640x480 in 16 glorious colours. Now, by your argument, 640x480x2 (your 2 image argument from above) = 614,400 bytes. Therefore, your 1900x1200 screen only requires 12x as many bytes to move about but I really dont' care about those numbers because most of the work is done in the GPU now; not the main CPU.


    In the PC world, the textmode was handled in hardware by the graphic card. Yes, we had already hardware accelerated display at that time. In fact is still in the latest PC video card.
    One byte for the character, one byte for the color attribute.
    So your argument doesn't apply.
  60. Re:Effective cache use will be a better optimisati by thsths · · Score: 1

    > I then came to realize that he was traversing the row-major image in columns. The cache coherency was being shot to hell because none of the cache-lines were being hit as it went from pixel to pixel.

    According to my experience, this happens all the time. People just don't understand how a cache works (ok, the details can be difficult). Back in the old days, it did not matter, but nowadays getting your cache access right is one of the first things to do if you want to improve performance.

    On a SPARC, you have counters that give you an instruction per clock cycle reading. Just try it on a few programs, I have seen many lurking around 0.25 or 0.3. The rest is usually wasted waiting for the cache to catch up.

  61. You didn't understand what Professor Cormen said.. by dolmen.fr · · Score: 1
    int *k = i, then i[4] is the same as *(k + (4 * 4))

    No, it isn't:
    • sizeof(int) is not always 4.
    • adding an int to a int * is not the same as adding an int to a char *.

    So i[4] is the same as k[4] and is the same as: *(int *)((char *)k + 4 * sizeof(*k)).
  62. understand the problem first by Anonymous Coward · · Score: 0

        I played a little with the code you posted. The pointer arithmetic is slightly faster. But understanding the problem first and coding second will most likely bring the real optimizations. As an example: for strings that are multiple of 4 and quite long (I tested greater than 32 chars ) this code is faster than what you posted (I hope it is correct).<br><br>

    void reversestring_array(char *str)
    {
        unsigned int *stri = (unsigned int*)str;
        int head;
        int tail = -1;

        for ( int k = 0; str[ k ]; k+=4, ++tail )
        {
            char tmp = str[ k ];
            str[ k ] = str[ k + 3 ];
            str[ k + 3 ] = tmp;

            tmp = str[ k + 1 ];
            str[ k + 1 ] = str[ k + 2 ];
            str[ k + 2 ] = tmp;
        }

        for ( head = 0; head < tail; ++head, --tail )
        {
            unsigned int temp = stri[ tail ];
            stri[ tail ] = stri[ head ];
            stri[ head ] = temp;
        }
    }

    Now a funny sample of the code optimization people at my work place use

    register long vectorxx
    register long vectorxy
    register long curr_x
    register long curr_y
    register int xctr
    register char *pSrc
    register char *pDst

    last_rot = angle
    long sin
    long cos
    long ptx
    long pty
    long xcoord
    long ycoord
    long vectoryx
    long vectoryy

    // force the compiler not to waste registers for these variables
    pSrc = &angle
    pSrc = &sin
    pSrc = &cos
    pSrc = &srcheight
    pSrc = &dstpitch
    pSrc = &dstwidth
    pSrc = &dstheight

  63. Where have all the geeks gone? by microTodd · · Score: 3, Interesting

    Wow. Just wow.

    No one will probably read this comment because its been a day since the OP, but I'm amazed at the quantity of people who are slamming this guy for wanting to research something that's admittedly interesting.

    For starters, if the submitter is a CompSci student then he definately gets my kudos. Too many CS students are just focused on "I wanna learn C# so I can go make money!" as opposed to actually LEARNING.

    Secondly, what happened to just plain geekiness of research and studying things because its fun and interesting? Does everything we do have to have some specific applicable purpose? If you say yes, you are thinking like the MBAs that always get bashed around here instead of a real nerd.

    Who knows? Its unlikely, but possible that thinking about this problem somehow leads to a train of thought that solves P=NP or something.

    --
    "You cannot find out which view is the right one by science in the ordinary sense." - C.S. Lewis on Intelligent Design
    1. Re:Where have all the geeks gone? by Anonymous Coward · · Score: 0

      Yay, someone else thinking the same thing. The trouble with this site is that it's full of VB or unemployed inexperienced "programmers", and IT failures that believe they know everything. 1% are real geeks, probably less in fact, the rest are weedy never get a date types.

      They're all wankers arguing over fast but limited functionality MySQL vs slow yet more functionalty PostgreSQL for their pathetic little websites. Or jerking off to firefox bug fix x.y.z, or another Linux kernel minor version that fscks up various setups or commercial apps like vmware. Who then slag off Microsoft, then fanboy Apple, regardless of both suppliers shipping flaky software.

      So yes, let us see what compilers are doing with various code techniques, then evaluate the state of excessive pipelines in the lust for GHz against true throughput, followed by the state of silcon replacements, optical processors, et al.

  64. I'll Bet You Write Java And Call It Fast!!! by Anonymous Coward · · Score: 0

    If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.

    It is precisely your thinking that causes raising a window to bleed a million cycles in the first place. Efficient code is always good! Even if it only saves one cycle, there is no telling how many times the function will be called so, it could make a huge difference. Imaging how many cycles would be saved if you shaved one cycle off of printf() or strcp().

    It is your thinking that causes a 3Ghz machine to run no faster than a 386 running DOS. It is your thinking that causes 1 gigabyte of RAM to be barely adequate and it is your thinking that causes 100 gigabyte hard drives to be inadequate.

    The man asked a legitimate question and if you don't have an answer then STFU! Don't try to belittle him for not sharing your mentality for slow bloated crap you call code!

    Good code is even better when you throw more horsepower behind it. Bad code requires more horsepower to stay the same. Finally, just because most people produce bad code doesn't mean that everyone should produce bad code.

  65. You can't know it a priori by Anonymous Coward · · Score: 1, Informative

    On some compiler/architectures array indexed code is faster than direct pointer access. On others the opposite is true. I discovered this while writing portable code for Mac/Win32 ten years ago. The inner loop of my pointer code was running faster on 68k/PPC as compared to x86. When I switched it to array dereferencing, the x86 was faster. I confirmed the results by looking at the disassembly of the compiled code.

    The moral of the story: write the code that is most readable/maintainable to your eyes.

  66. Photoshop plug-ins by mwvdlee · · Score: 3, Informative

    I've written quite a lot of code for Photoshop plug-ins.

    Since this type of code typically iterates over a few hunderd million pixels you'd think that changing such details as array vs. pointer or some other common optimalization technique would have an impact.

    It does; it typically shaves about a few tenths of a second off of a 5 minute calculation.

    Then again, spending that same amount of time altering the algorithm will usually increase performance in a noticable way.

    Nowadays I don't bother optimizing code (usually the compiler does a better job at it anyway) but rather optimize the algorithms. Instead of opening the topic and waiting for a definitive answer on your quest for ultimate performance, you could probably have rewritten the algorithm and gained much more performance you'd ever get this way.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
  67. One of the secrets of the master programmers by rfisher · · Score: 2, Insightful

    Master programmers know that it doesn't matter. We:

    1. Write the code in the form that is clearest. (So that the maintainer--even if it is ourself--can figure out what it is doing quickly when debugging it later.)

    2. If AND ONLY IF performance is a concern and if AND ONLY IF this code has been shown to be a bottleneck, do we consider optimizing. We then try several different implementations and then profile them to choose the winner.

  68. Programming as if Performance Mattered by Junks+Jerzey · · Score: 1

    An article previously mentioned on Slashdot.

    1. Re:Programming as if Performance Mattered by Anonymous Coward · · Score: 0

      well, he can write what he wants.

      my stuff has to run fast. I'm not so bothered about it that I write in FORTRAN (although I wouldn't be entirely averse) but I'm bothered enough that I use C++ and the standard still isn't good enough, I use Blitz++ and a proper compiler, and I profile often.

      If I thought it didn't matter, and wrote in Erlang or Python, I'd be fired. the faster my code runs, the more business gets done in less time on less hardware. and the more profit to pay me.

  69. Array and Pointer are not the cause for slow speed by DemonSlayer · · Score: 2, Interesting

    First of all, the array and pointers are not the reason why a C program run slowly.

    "Unnecessary looping" is the cause of the problem. I have encountered and fixed many C programs created by programmers, who have left the company.

    I have found that many programmers like use alot of for-loop and while-loop in their programs. Most of the time, the algorithm of their C programs can be modified to reduce the need for the loops and increase the speed of the C programs by 10 times or more.

  70. Virtual Memory 'gottcha' by Anonymous Coward · · Score: 0

    Expert C Programming Deep C Secrets, a book by Peter Van Der Linden has an interesting chapter concerning why a pointer and a structure name are not the same and do not behave the same in real use. See chapter 4 of this book:
    "The Shocking Truth: C Arrays and Pointers are NOT the same!"

    That is a recommended read for anyone who is a C programmer.

    Second point: If the size of the arrays in question are very large, then it is a guarantee that most modern operating systems will swap these arrays in an out of memory using virtual memory techniques. So, unless you are writing modules for device drivers or an Supervisor level on modern processors it really doesn't matter what you do with the array indexing. The virtual memory swapping will slow you way down.

  71. Re:Use Java instead .... by Surt · · Score: 1

    Since I often wonder just how much performance i'm giving up using java instead of c, I try an experiment like this about once per java version.

    The current result
    c (ms visual c, optimize speed): 4391 ms
    java (5.0, -server): 10031 ms

    So there you have it, java is only a little over twice as slow as c, even at something that ought to be c's greatest advantage.

    If anyone knows how to optimize the java outcome better without obfuscating the code or changing the contract of the functions, comments would be welcome.

    I also compared replacing the strlen calls with a hard-coded length of string value (tail = 19;), to make sure that my strlen implementation didn't suck, but the outcome was pretty much the same, c twice as fast as java. (c: 2359 java: 5171)

    Here's the code (link to avoid violating compression filter):

    C:
    http://ptth.net/slashdot/stringreverse_c.txt

    Java:
    http://ptth.net/slashdot/stringreverse_java.txt

    --
    "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  72. Wisdom of the bot by Anonymous Coward · · Score: 0
    First off, the example indexed loop sure as bloody heck *can* be optimized to elide the explicit index. In this case it's simple, as the index is never used except as an index; in other cases it can be more of a balancing act, especially for limited register targets. But logically that's a trivial transformation, as even a smattering of knowledge about compilers should inform you. Only ignorance can explain the assertion that this isn't so in this context. Sorry, Slashie!

    Now for the promised smart-bot:

    --
    C makes an art of confusing pointers with arrays and strings, which leads to lotsa neat pointer tricks; APL mistakes everything for an array, leading to neat one-liners; and Perl confuses everything period, making each line a joyous adventure . -- Tim Peters

  73. Bloat by Julian+Morrison · · Score: 1

    "Bloat" is one of those catchy ideas that seems to explain things but is IMO mistaken. Modern software isn't actually that bloated - it's much more featureful, and it handles much larger and more complex datasets in much more straightforward ways. Also, it's often cross-platform, or designed for forward portability and rapid maintenance in ways that old software was not. None of these things are "bloat" - you'd miss them if there were absent.

  74. Java Perf by dwandy · · Score: 1

    Well, I guess this article might be about you then...

    --
    If you think imaginary property and real property are the same, when does your house become public domain?
  75. I'd go for arrays by Anguila · · Score: 1

    My answer, TO THE QUESTION is:
    I can't think off the top of my head any example where pointer arithmetic + platform would fail or be worse than an array, but I bet there is.
    Hence, I would use arrays because it'll work well, ON AVERAGE, on all platforms and later add #ifdef(_POINTER_ARITHMETIC_IS_BETTER_) optimizations inside, if required at all.

    I hope more people add their feedback on the very interesting question asked:
    Will pointer aritmethic be worse/better/the same on some platforms?
    and less people try add noise answering ANOTHER question:
    What would a self-proclamated master programmer optimize on modern machines?

  76. On other architectures? by tepples · · Score: 1

    True, they take the same amount of time on x86, but on many architectures, the addition performed in the (%esi,%ecx) argument takes an extra CPU cycle. If the compiler doesn't manage to strength-reduce array accesses to pointer manipulation (you can see what assembly code GCC is generating by using -S instead of -c), you'll need to do it yourself.

    1. Re:On other architectures? by Piquan · · Score: 1

      True. So I compiled it on a MIPS as well, which doesn't have a base+index register mode load instruction. Stock gcc 3.3.2, -march=mips32 -O3 (and also -dA -g -mrnames so I could more easily read the output):

      Array version:

      $L30:
      addu $a2,$s0,$a3
      addu $v0,$s0,$t0
      lb $a1,0($a2)
      lbu $a0,0($v0)
      addu $t0,$t0,1
      addu $a3,$a3,-1
      slt $t1,$t0,$a3
      sb $a0,0($a2)
      bne $t1,$zero,$L30
      sb $a1,0($v0)

      Pointer version:

      $L38:
      lb $a3,0($a1)
      lbu $v0,0($a0)
      sb $v0,0($a1)
      sb $a3,0($a0)
      addu $a1,$a1,-1
      addu $a0,$a0,1
      sltu $a2,$a0,$a1
      bne $a2,$zero,$L38

      I can't actually time these, since I don't have a MIPS box. (I do have simulators, but they're not entirely cycle-accurate, and besides I don't feel like pulling one out.) But it does appear that the pointer version is more efficient.

      I find this interesting. GCC does support strength reduction; I wonder why I'm not seeing it have any effect here. It could be that the use of head++,tail-- is throwing it off: gcc doesn't recognize this as a pattern that can be strength reduced. So I tried a more common access model: straight sequential access.

      unsigned int
      strlen_array (char array[])
      {
      unsigned int i;
      for (i = 0; array[i]; i++);
      return i;
      }

      unsigned int
      strlen_ptr (char* array_start)
      {
      char* array_end;
      for (array_end = array_start; *array_end; array_end++);
      return (array_end - array_start);
      }

      On the MIPS, array version:

      $L6:
      addu $a1,$a1,1
      addu $v1,$a0,$a1
      lb $a2,0($v1)
      bne $a2,$zero,$L6

      Pointer version:

      $L14:
      addu $v1,$v1,1
      lb $a1,0($v1)
      bne $a1,$zero,$L14

      Odd. Still no strength reduction. I wonder what's up here.

      On an unrelated topic, you may know about the problem of page-widening posts. Well, apparently Slashdot doesn't like page-lengthening posts either: my comment currently has 21.1 characters / line, so it's not letting me post it. So instead I'll post a really long paragraph to help it along. You can really ignore this paragraph if you like. Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, `and what is the use of a book,' thought Alice `without pictures or conversation?' So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, `Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.

  77. MAKE MONEY FAST by heson · · Score: 0, Offtopic

    No usenet died somewhere between "GREENCARD LOTTERY" and "Me too".

    1. Re:MAKE MONEY FAST by afroborg · · Score: 1

      Hence the suggestion of a .moderated group.

      Oh, and the people that modded me troll - apparently I'm right and you really have never heard of usenet... /. is a real good place to get opinions, not facts.

      --
      my sig could kick your sig's arse...
  78. ...about the size of our wallets. by tepples · · Score: 1

    for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)

    The char is the smallest addressable unit. On word-addressed architectures I've seen, CHAR_BITS is typically greater than 8, making char and wchar_t equivalent. Using 16-bit encoding of characters wastes memory only if the vast majority of your users use US-ASCII.

    you will get more benefit for your effort by: 1. finding better C compilers

    Which students and especially unemployed BSCS graduates trying to keep their skills current generally cannot afford. In this economy, how many burgers does one need to flip in order to buy a copy of icc?

    or using a profiler to find the real hotspots in the code and hand-optimizing them for the hardware platforms that you really care about.

    Which I'm supposing has already been done, and somebody is wondering whether hand-strength-reducing array accesses is a worthwhile hand-optimization technique where profiling points out that it may be needed.

  79. Safety considerations by m6ack · · Score: 1

    Sometimes pointer arithmatic can be safer than using an itterator. The assumption of this code is that the length of your input will fit into an integer -- this assumption may not be accurate. With the itterator/array code, you may have different results based upon the optimization settings.

  80. +1 Inspiteful by Anonymous Coward · · Score: 0

    You and ...

    nobody else !!!

  81. My personal experience by Z00L00K · · Score: 1
    is that if you start with easy to read code you have actually done some optimization already. The best way to create code that is easy to read is to actually minimize down all the unnecessary parts. (don't make unnecessary copies of data)

    With less code the compiler will have an easier job to optimize your task than a complex expression. A breaking down of the code into manageable parts is also important.

    One issue that hasn't been discussed yet is that code optimization today is depending on the hardware used. If you can execute all your code and data in the cache of the CPU it will be a lot more efficient than if the data has to be accessed from external memory.

    When it comes to optimization there is also another thing to consider - even if the compiler is able to place unnecessary code outside a loop you shouldn't rely on it. If you do that by hand before then the compiler doesn't have to make considerations about such code - or at least at a much lower level. Depending on the compiler there may actually be a limit on how large blocks it is able to consider during optimization.

    Use of switch statements in C instead of if/else/if/else/if/else-statements may actually also be both a readability and a compiler advantage since (if the case-labels are correctly managed) the switch statement can be translated to a jump table while the if statements are harder for the compiler to translate.

    There are also many more optimizations that are very processor specific. Old processors like the Z80, 6502 etc. wasn't very complicated while modern processors are much more complex with different cache levels and prefetching. The prefetching of code is actually something that can be very tricky to do compiler optimization for when doing simple loops. There are actually some documents available both from Intel and AMD that describes how to work with prefetching and SIMD etc in which cases the code to use would be less efficient than a simple loop on any processor that doesn't support such features.

    But as stated in other replies - figure out where to actually try to improve the code before doing optimizations. There are tools out there that can be very useful - Like Quantify (former Rational, now IBM) that can create a graphical tree over how functions are called and the amount of time consumed.

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  82. so THAT's what a pointer to an array looks like! by davidwr · · Score: 1

    an arrow pointing to the toilet roll

    So that's what a

    *toiletpaper[]

    looks like.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  83. questioning the question by jotux · · Score: 1

    Am I the only why that finds it disgusting that so many people would rather challenge the basis of a question, rather than actually try to answer it? I consistently see "Who cares?", and "it doesn't matter." Who cares? the person asking the damn question, that's who. And if he is bothering to ask it, it matters to him.

    There are many many situations where something like this would be important, and if you don't recognize the importance of something like this you shouldn't be challenging the basis of his question.

  84. Well, Kirk says... by Vorondil28 · · Score: 1
    I can't help but to point out that as I read this discussion, the quote at the bottom of the page reads:
    Not one hundred percent efficient, of course ... but nothing ever is. -- Kirk, "Metamorphosis", stardate 3219.8
    Curiously apropos? I think so! :-P
    --
    This sig rocks the casbah.
  85. MOD PARENT UP - THAT'S FUNNY!! by Anonymous Coward · · Score: 0

    Best laugh of my day!-)

  86. Re:Array and Pointer are not the cause for slow sp by aj444 · · Score: 0

    care to comment on this further ?
    im interested, in how you avoid using for and while loops

  87. Subtraction error detected by Anonymous Coward · · Score: 0

    I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz).

    Actually, 10 MHz is 399 times slower than 4GHz. 400 times slower would be 0 MHz.

    1. Re:Subtraction error detected by Ced_Ex · · Score: 1

      So by your math, if I had a 0 MHz machine, and I wanted 400 times the power, I would have to do this.

      0MHz x 400 = 0MHz???? Whaaaat???? Umm... I think your math is flawed.

      --
      Live forever, or die trying.
    2. Re:Subtraction error detected by Anonymous Coward · · Score: 0

      It's not flawed - the problem is you cannot count. Anything multiplied with zero is zero. For the other issue: consider the sentence "a is b smaller than c". To find out the value of a, you subtract b from c. I.e. a = c-b. In the example above, there is just the added multiplication.

      Offtopic: I hope you're happy to get your undeserved mod points just because you are a registered user. I'm staying happily anonymous.

    3. Re:Subtraction error detected by Ced_Ex · · Score: 1

      It's not flawed - the problem is you cannot count. Anything multiplied with zero is zero. For the other issue: consider the sentence "a is b smaller than c". To find out the value of a, you subtract b from c. I.e. a = c-b. In the example above, there is just the added multiplication.

      Offtopic: I hope you're happy to get your undeserved mod points just because you are a registered user. I'm staying happily anonymous.


      No, the sentence referred to is not "a is b smaller than c", it is "a is b times smaller than c". In which case your explaination does not work.

      "Times" is the key word, as it indicates multiple. So a = c/b.

      How is my mod points undeserved? If mods didn't feel I deserve them, they'll take it away. Hiding in anonymity only serves to give you an excuse "Oh, that wasn't me. That's some OTHER Anonymous Coward."

      --
      Live forever, or die trying.
    4. Re:Subtraction error detected by Anonymous Coward · · Score: 0

      You're too stupid to argue with. I'm sorry.

  88. Somebody Has to Write the XML parser... by oldCoder · · Score: 1
    Somebody has to write the XML parser, the Ruby interpreter, the Java VM. They're in C for a reason. And it makes sense to know beforehand what coding techniques will not come back to bite you. Once you've written 100,000 lines of C code you really don't want to discover the product is a pig because of a coding technique strewn all over the source code.

    It certainly used to be true that the embedded C cross-compilers I used in the 1980's generated significantly tighter code with pointers than with arrays. So I got into the habit of coding that way.

    Most Significant: In the given example (string reversal) the subscript code isn't safer than the pointer code. Neither technique is likely to create buffer overflows or problems like that.

    To really compare speeds, the code should use post-increment on the subscripts/pointers. On simpler compilers that can speed things up. Since that's what you'd do if you were coding for speed you should test it.

    With a perfect compiler it doesn't matter. The question posed was, how good are the various compilers? Fair question.

    --

    I18N == Intergalacticization
  89. This Is Why You Should Care. by oldCoder · · Score: 1
    One of the reasons the machine bleeds off a million cycles every time it raises a window may be that the coders didn't pay attention to performance.

    Some products need to be built for performance. If you're in that situation, you need to know beforehand what techniques will shoot yourself in the foot. The original article asks that question. You failed to anwer it.

    --

    I18N == Intergalacticization
  90. Arrays and Pointers by Quill_28 · · Score: 1

    Quick code:
    main() {

            int z;
            int x[2] = {1,8};
            z = 1;
            printf("Value of array: %d\n", z[x]);

    }

    What happens?

    1. Re:Arrays and Pointers by Anonymous Coward · · Score: 0

      Value of array: 8

    2. Re:Arrays and Pointers by drxenos · · Score: 1

      Yes, all C programmers knows that a[b] equates to *(a + b) regardless of which operand is the array and which is the index. Can we move on now?

      --


      Anonymous Cowards suck.
    3. Re:Arrays and Pointers by Quill_28 · · Score: 1

      Actually, I would guess that most don't know this.

  91. WRONG ! by oldCoder · · Score: 2, Insightful
    The increased readability and lower likelihood of programming errors on the array option...

    The array version is neither clearer nor less error-prone. And I dare you to prove differently!

    The pointer version is more portable across the chasm of compiler quality. That is, it will run well even on poorer compilers.

    The question of clearer is a question of cognitive psychology, not computer science, and requires experimentation to validate. The experiment would be confounded by the existing prejudice against pointers and the habit of CS profs of teaching subscripts over pointers.

    N.B.: There are problems with C pointers, but that comes up when using pointers to data structures, not pointers to bytes. Dangling pointers, free/malloc problems, and all that. Using subscripts helps very little, however.

    It is not true that the optimization is nearly pointless, it is that the difference in performance is so small as to make the decision nearly pointless. No pun intended. In some situations, the difference may be very important. The pointer version is no harder to code, read, or debug, and might possibly give you back benefits.

    What I'm trying to say here is that the pointer version is not an optimization, it's an alternative. Optimizations require more work than the vanilla version. For string/character loops, there is no more work in coding the pointer version.

    --

    I18N == Intergalacticization
  92. Mello Yellow by oldwarrior · · Score: 0

    But nothing gets you going better than the funky, chunky, hum of Chartreuse when you get a jones'in for some chromatones!

    --
    If it were done when 'tis done, then t'were well it were done quickly... MacBeth
  93. c# smokes. Whatvyoubeen smokin? by oldwarrior · · Score: 0

    n/t

    --
    If it were done when 'tis done, then t'were well it were done quickly... MacBeth
  94. Optimizing + high level language = oximoron by Vorlath · · Score: 1

    Don't get me wrong... my specialty is optimizing in high level languages such as C/C++ and Java. But all I'm really doing is cleaning up code. I'm not really optimizing in the true sense of the word.

    The biggest issue I've encountered is getting maximum speed when processing large amounts of data. This is the classic case. The other one is computationally intensive routines (but that usually requires lots of data as well).

    My point is that if you really wanted to optimize a piece of code to process large amounts of data, you would do the following:

    1. Try to see how you can segment the data into chunks of less than or exactly 2K.
    2. Set some memory aside on a page boundary that doesn't exceed the page size.
    3. Prefetch the next chunk of data.
    4. Load the fist chunk of data to be processed into the memory that you set aside.
    5. Work on data inside this side memory.
    6. Save it back out using non-temporal stores (unless you need to reuse the data).
    7. repeat by going to step 3.

    The above is the fastest way to process large amounts of data. It'll be 10 to 1000 times faster than anything you can do in C/C++ or any other language. The speedups are comparable to C64 vs IBM386DX2/66. You just can't do this in any high level language. At least not in a readable way. If you want to write a video codec for example, good luck writing it in C/C++ where it matters. I'll laugh at you if you say Java. Could you imagine loading a VM for a video codec?

    Trying to optimize without using the capabilities of the computer is sorta like trying to build a house without using a hammer. Your software runs on a MACHINE. Trying to optimize in a high level language is IMPOSSIBLE because you don't have full access to the machine!!! The most you can do is clean up your code or design a new algorithm.

    These discussions never get anywhere because you're arguing over the best of the mediocre in terms of performance. Until a high level language appears on te market that let's you deal with cache issues, you're all out of luck. And pointer vs. index is retarded. All memory is accessed via a pointer to the CPU.
    High level languages are used to better describe concepts and ideas. Use it that way... ie. readability first.

  95. Re:Array and Pointer are not the cause for slow sp by DemonSlayer · · Score: 1

    Example for avoiding loops

    One C program I encountered requires it to search for the country name base on the country code.

    The program read from a reference file and use 1 array to store country_name[] & another to store country_code[].

    Then it read from the raw data file which contains the country_codes. It use a for loop to try to match the country_code from the raw data file and the arrays to get the country_name. The it print it to another file.

    This is stupid, I change the C program so that the array for country_name[] use the country_code as index. eg. country_name[country_code]
    This method eliminate the use of for loops.

    When the program read from the raw data file, it directly use the country_code
    to get the country_name without searching for it in a loop.

    eg.
    print country_name[country_code from raw data file].

    This speed up the C program.

  96. That Taught Me! by oldCoder · · Score: 1
    I actually laughed out loud when I read your code. It's been a long time since C code actually made me laugh. I think some of the responders missed the joke.

    And I think your code will actually run, too. I like the semantics (put the tail of the string into temp, put the head in the tail, and put the temp into the head -- more like English).

    --

    I18N == Intergalacticization
    1. Re:That Taught Me! by Leimy · · Score: 1

      And I think your code will actually run, too. I like the semantics (put the tail of the string into temp, put the head in the tail, and put the temp into the head -- more like English).

      It's actually guaranteed by the C standard to work :). However the code generated is a 3rd form on a lot of non-optimizing compilers. So now you have pointer arithmetic, the canonical array[index], and the strange but valid index[array].

      I was just adding mud to the pool :)

      Dave