Slashdot Mirror


Arrays vs Pointers in C?

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" "Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:
#include <string.h>

void reversestring_array(char *str)
{
    int head, tail;
    char temp;
    if (!str) return;
    tail = (int)strlen(str) - 1;   /* cast first: an empty string gives -1, not a huge size_t */
    for (head = 0; head < tail; ++head, --tail)
    {
        temp = str[tail];
        str[tail] = str[head];
        str[head] = temp;
    }
}
The second function uses pointers:
void reversestring_pointer(char *str)
{
    char *phead, *ptail, temp;
    if (!str || !*str) return;   /* for "", str + 0 - 1 would point before the string */
    ptail = str + strlen(str) - 1;
    for (phead = str; phead < ptail; ++phead, --ptail)
    {
        temp = *ptail;
        *ptail = *phead;
        *phead = temp;
    }
}
While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at compile time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:
forloop:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
The pointer function:
forloop:
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

62 of 308 comments

  1. Why do you care? by Julian Morrison · · Score: 4, Insightful

    For any real programming task, the question has to be: why do you care about that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

    If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.

    1. Re:Why do you care? by thegrassyknowl · · Score: 4, Interesting

      then don't wank around optimizing

      Dude, best use of the word wank. Ever!

      for single cycles on a machine that probably bleeds off a million cycles every time you raise a window

      Computers have become more powerful and programmers have become lazier. It's not strictly true, because instead of focussing a lot of time on writing efficient code, programmers are now focussing a lot of time on writing a lot of code to fill bigger machines. That million cycles is wasted doing crap, and probably half of it doesn't need to be done anyway...

      I can still remember the days when machines ran in the sub-10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days when "nobody will ever need more than 640k", software was designed and optimised to be small and chew up few cycles.

      Now, with RAM and gigahertz available for next to nix, software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.

      --
      I drink to make other people interesting!
    2. Re:Why do you care? by aled · · Score: 4, Funny

      Both of those pieces of code happen so fast that it doesn't matter.

      Unless, of course, someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger, all programmers must start programming in TI-99/4a Basic right now.

      --

      "I think this line is mostly filler"
    3. Re:Why do you care? by JanneM · · Score: 5, Insightful

      For any real programming task, the question has to be: why do you care about that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

      When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.

      If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.

      --
      Trust the Computer. The Computer is your friend.
    4. Re:Why do you care? by thegrassyknowl · · Score: 3, Interesting

      You forgot to mention that keeping an 80x25 column text display updated only required moving around a maximum of 4000 bytes at a time (80x25 = 2000 bytes for the monochrome text buffer + another 80x25 colour buffer for 16 background and 16 foreground colours per text block), and that a sub 10MHz CPU would certainly struggle to animate that at 30 frames per second or higher.

      I grew up in the days of the Amiga - and they certainly didn't have an 80x25 text console... so your analogy is fundamentally flawed. All of my favourite Amiga software was lightweight, efficient and responsive (except for the ray tracing engines I used, but those did _actually_ have to do some serious calculation). In fact, my Amiga ticked along on an 800x600 screen quite nicely. Your analogy is also flawed because not all of the screen is updated at any one time; only the parts that have changed. Oh, and 4000 bytes x 30 FPS is only 120KB per second.

      My Commodore 64 has a 700kHz processor in it and it can certainly animate the full screen at 25 frames per second, albeit at a slightly reduced resolution. My Atari 2600 has a CPU of about the same speed and it was capable of keeping up with 25 frames per second (again, lower resolution) and still running the game engines just fine. Your argument doesn't hold water, pal!

      My 1920x1200 colour display requires 55296000 bytes (1920x1200*24), which is 13,824 times as much as that 80x25 text display. Now despite my 2GHz CPU only being 200 times faster than the hypothetical 10MHz CPU, it doesn't struggle at all - not only can it animate the whole screen at 60FPS or more, but it can also calculate positions for thousands of objects at the same time.

      What exactly do you do on your 1900x1200 colour display? First, 1900x1200x24 bits is actually 1900x1200x3 bytes (6,840,000 bytes). That is a far cry from your piss-ant 55,296,000 bytes (55MB).

      Given your eagerness to quote numbers that are practically meaningless to the point and blatantly inaccurate, and your "calculation of positions of thousands of objects" I suggest you are playing games on Windows.

      Again, your numbers are flawed, because my old 80x25 text mode display is still drawn from individual pixels. Those pixels still have to be updated individually by the CPU (in the 10MHz days it was often the main CPU that drew every little dot). 80x25 is drawn on a 640x480 screen in 16 glorious colours. Now, by your argument, 640x480x2 (your 2-image argument from above) = 614,400 bytes. Therefore, your 1900x1200 screen only requires about 11x as many bytes to move about, but I really don't care about those numbers because most of the work is done in the GPU now, not the main CPU.

      There are certainly some areas in which the software doesn't seem to have sped up in proportion to the processor speed - but my guess is that's mostly because they ceased being CPU-bound years ago, not because the CPU is flat out wasting cycles trying to do the job. That doesn't mean it's not the programmer's fault - but if it is, it's because they decided to block the interface while they waited for a DNS query (or a similar bad design decision), not because they used a pointer offset instead of an array dereference, or vice versa.

      Software bloat is because programmers can get lazy when the CPU gets fast (the old "oh there'll be better CPUs by the time we release it" excuse). Looking at the power consumption figures for a modern Pentium4 CPU and figuring out that it wastes billions and billions of cycles a day doing work that it should never have needed to do is scary. If you average it out over all the PCs running in the world the amount of energy turned into heat because of sloppy programming alone would be enormous.

      What about all the wasted hours waiting for the computer to do something because of some sloppy programmer being willing to waste a few million CPU cycles here and there? It doesn't seem like much at the micro level, but think about it across all the processes that run on your machine of a day and the few

      --
      I drink to make other people interesting!
    5. Re:Why do you care? by NonSequor · · Score: 4, Insightful

      You don't understand the problem. A chemical reaction is only as fast as its slowest step. Catalyzing the other steps will not yield an improvement in reaction rate.

      For computer programs, you won't gain anything worthwhile by optimizing code that the computer only spends 1% of its time executing. That's not to say you should do a sloppy job, but you should focus on what matters. Microoptimization techniques (those techniques that involve choices of instructions and their orders rather than changing the algorithm that is used) typically yield very small gains. Microoptimization can yield substantial benefits when used properly in heavily used sections of code, but the time involved in trying to microoptimize everything could be better used to work on macrooptimizations or organizing the code to make it more amenable to later modification.

      There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead.

      --
      My only political goal is to see to it that no political party achieves its goals.
    6. Re:Why do you care? by glavenoid · · Score: 3, Funny

      Incidentally, I find some shades of yellow to sound rather coarse, with harmonic triangles increasing with each beat inverse to the fundamental. Not as vulgar a sound as, say, Taupe, but not as aurally pleasing to me as drips of purple. Now, Brown, on the other hand, has a rather discordant sound, much like a Dom. 7th, even if the actual interval is not such...

      --
      I, for one, am looking forward to the inevitable /. beta rollout fallout.
    7. Re:Why do you care? by elfkicker · · Score: 2

      If only the theories were right. You're right and all, but please stop speaking like that.

    8. Re:Why do you care? by psavo · · Score: 2, Insightful

      Faster hardware just makes minor inefficiencies less noticeable, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!

      None of the discussion I've seen so far has touched the real problem. It's not the wanking over micro-optimisation (these compiler optimizations only buy a constant factor), but that so few have any sort of clue about which algorithms to use and when.

      I'm not saying that it's bad to optimize those sorts of things when they pop up on your most-time-spent chart, but that they may simply go away (along with 90% of other runtime) when you really think about the algorithm.

      --
      fucktard is a tenderhearted description
    9. Re:Why do you care? by Kosgrove · · Score: 2, Insightful

      It has nothing to do with programmers being lazy. I'd much rather work smarter using higher-level tools and get a lot more done. It has everything to do with this simple fact, which wasn't (as) true back in the old days:

      Hardware is cheap. People are expensive.

      Think about it. A desktop with current hardware costs under $1000 these days. Lower-end servers run about the same amount. Compare that to how much you cost your employer per hour. How does it make economic sense to (most - don't you embedded and low-power computing people get all up in my grill) optimize software at the low level you're suggesting? Most software for the end user spends most of its time waiting for user input or doing network operations, not reversing strings.

      If you want to optimize something, optimize the architecture. Pool database connections, reduce network traffic, change object relationships to make them more efficient, but for god's sake don't waste your time reinventing low-level functions that have already been written and, more importantly, DEBUGGED to the nth degree.

      I strongly disagree that we should be encouraging people to think about optimizing the optimized wheel. Spend your time thinking about bigger and better software problems.

  2. Re:Hmm.. by Surye · · Score: 2, Informative

    Oops, messed up the tag at the top, and it ate the quote portion:

    I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references

    Preview! Gah.

  3. Hundreds of possibilities by topham · · Score: 2, Insightful


    This question sounds to me like discussions I had ages ago with other programmers, and it was always 1 programmer trying to justify his method of coding a routine over some other, equally valid method.

    Pointer arithmetic and arrays in C really have the same issues. You can access beyond the end of an array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.

    I never really saw the point in arguing over arrays and pointers in C. When I programmed in C I used both. I typically used arrays if I wanted the code to be obvious and straightforward, or if I wanted to do something else with the index, etc. I used pointers if I wanted speed above all else.

    If I were writing for any modern PC (PPC970, Pentium IV, Athlon, etc.) I wouldn't worry about speed. For functions more complicated than shuffling a few bytes around, the algorithm will dictate the speed.

  4. WOW! by Anonymous Coward · · Score: 2, Insightful

    Now THIS is the kind of discussion that should be going on at Slashdot!

  5. is this really your bottleneck? by eyal · · Score: 2, Insightful
    When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?

    One can always check the resulting assembly code, if one is so concerned about this.

    Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)

  6. The eighties called... by Anonymous Coward · · Score: 3, Funny

    ...and they want their arrays, pointers and C back.

  7. It optimizes out by klossner · · Score: 5, Interesting
    An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction.

    Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
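    As a rough illustration of the idea (not of gcc's actual output), here is the transformation written out by hand in C. The functions sum_indexed and sum_reduced are hypothetical names; a compiler does this on its intermediate representation, and whether it fires depends on the compiler, the flags and the target.

```c
#include <stddef.h>

/* Before: each iteration conceptually computes the address a + i * sizeof(int). */
long sum_indexed(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* After (conceptually): the per-iteration multiply is replaced by a pointer
   bumped by sizeof(int) each time, and the loop test compares against a
   precomputed end pointer, so the index variable disappears entirely. */
long sum_reduced(const int *a, size_t n)
{
    long sum = 0;
    for (const int *p = a, *end = a + n; p != end; ++p)
        sum += *p;
    return sum;
}
```

    Both functions return the same sum for any array; the second is simply what the first looks like after the induction variable has been rewritten.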

    1. Re:It optimizes out by LSD-OBS · · Score: 4, Funny

      Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree

      Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".

      Guess that about sums it up ;-)

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
    2. Re:It optimizes out by Anonymous Coward · · Score: 2, Informative

      Exactly: strength reduction changes the indexing operation to straight pointer arithmetic. At least when I looked a few years ago, gcc/g++ did this in the later phases of compilation, rearranging the RTL to eliminate the indexing variable. You can verify strength reduction by just setting the optimisation level in gcc and looking at the assembler output.

      The point is now a bit moot, since for many loops you want neither indexing nor pointer arithmetic; you want SIMD instructions, which are a third way of implementing the programmer's looping construct. This is now done on the SSA form in gcc's middle end. In most cases compilers are smart enough to see past the programmer's code, and minor semantic differences, like indexing versus pointer arithmetic, will be restructured into the best solution for the target architecture. So the point is, leave it to the compiler.

      jxx

    3. Re:It optimizes out by metamatic · · Score: 2, Interesting

      That was my first thought too. "They're equivalent, so why is anyone even asking? Any optimizing compiler will handle it."

      Then I remembered that this is Slashdot, where the groupthink is that a CS degree is useless and doesn't teach anything you need to know in the real world.

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  8. I echo the above statements by 21chrisp · · Score: 4, Insightful

    These types of optimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors in the array option far outweigh any speed increase from the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.

    IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

    1. Re:I echo the above statements by NullProg · · Score: 4, Interesting

      These types of optimizations are virtually pointless on modern machines.
      I call bullshit. Optimizations are important regardless of the language or CPU.

      My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.

      It took over six hours on the dual processor system with the native .Net and Java XML parsers. And yes, the original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.

      My C program parsed it in an hour and a half. And yes, I used pointers. Why? Because it's more efficient.

      Time is money, especially when you're trying to push down 20,000 price changes from the Mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus, the 'C' code gets it done in under an hour on the dual CPU machine. And yes, I tested the inputs for overflows, security problems, whatever, before we went into production. There's a big difference between a programmer who knows a language and a programmer who understands it.

      IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
      That's your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic to the CPU, don't you? I wouldn't want to work for you, though. Your competitors will write a faster program that uses less memory and you will lose the contract/job.

      No flame intended,
      Enjoy.

      --
      It's just the normal noises in here.
    2. Re:I echo the above statements by Arandir · · Score: 2, Interesting

      There are optimizations and then there are optimizations. If you run this particular loop once, or twice, it won't matter. Run it ten times or even a hundred, it won't matter. In these cases it would be a pointless optimization. Profile your software before you optimize. That way you can optimize where you need it and not waste time where you don't.

      I remember one code review where someone told me to use prefix instead of postfix notation, for optimization reasons. Yet it occurred in an initialization routine in a background thread that would run once per day. That's like worrying about a memory leak in a power-off interrupt handler...

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    3. Re:I echo the above statements by Neil Blender · · Score: 3, Interesting

      My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.

      Heh, a little offtopic but - This is why I hate XML. It's so bloated. You take 1 to 6 hours parsing a 30 megabyte XML file in C? I was just tasked with parsing out some select data from a 37 gigabyte XML file (870 million lines). I tried all sorts of optimizations and parsers upon finding that it might take days to parse. My solution - 50 lines of perl using regular expressions. I run this on a dual processor 3.something with 2 gigs of ram. It takes 5 minutes. If I coded it in C it would probably take 10 seconds but it's not worth the time.

      Here's the file and convertor if anyone wants to fuck around with nearly a billion (bloated XML) lines of genetic data:
      ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/All_Data.ags.gz
      ftp://ftp.ncbi.nlm.nih.gov/asn1-converters/by_program/gene2xml/

    4. Re:I echo the above statements by eric76 · · Score: 2, Interesting

      One of the more interesting things I've seen is to see how different software developers write a program for the following problem:

      Jack bought a bag of 100 pieces of candy at the store. It has 90 cherry candies and 10 lemon candies. He prefers the cherry candies over the lemon candies.

      Every day Jack randomly picks a candy from the bag. If the candy is a cherry candy, he eats it. If the candy is a lemon candy, he puts it back and randomly draws again. He eats the second randomly chosen candy no matter what flavor.

      What are the odds that the last piece of candy in the bag is a lemon candy?

      Interestingly enough, the approach nearly always used is horribly inefficient. I've never seen anyone run the program long enough to get an answer.

      In contrast, the approach used less often solves it in a fraction of a second.

    5. Re:I echo the above statements by Paul Jakma · · Score: 2, Informative

      lower likelihood of programming errors on the array option. .... IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

      You realise that pointer arithmetic and array indices are the same thing, don't you? Ie given:

      int b[100];
      int i;

      The following are equivalent:

      b[i++]

      *(b + i++)

      Arrays *are* /nearly/ equivalent to pointers; indeed, an array variable without a subscript decays to a pointer to its first element (except as the operand of sizeof or unary &). Further, note that in both cases above i may or may not be in bounds for the array: the array notation does not help check the bounds at all, and the programmer has to go to the same trouble to check bounds properly regardless.

      It sounds to me, btw, like you're not qualified to decide what place pointer arithmetic has in modern software.

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
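      A small sketch of the equivalence described above, plus the "nearly": subscripting and explicit pointer arithmetic read the same element with the same side effect, but sizeof still sees the whole array until it decays to a pointer. The helper names are hypothetical, and the actual byte counts depend on the platform's sizeof(int) and pointer size.

```c
#include <stddef.h>

/* Returns 1 if b[i++] and *(b + j++) read the same element at every step. */
int subscript_matches_pointer(void)
{
    int b[100];
    for (int k = 0; k < 100; ++k)
        b[k] = k * k;

    int i = 0, j = 0;
    while (i < 100) {
        int x = b[i++];          /* subscript form */
        int y = *(b + j++);      /* pointer form: same element, same increment */
        if (x != y)
            return 0;
    }
    return 1;
}

/* Where arrays and pointers differ: sizeof applied to the array itself
   yields the size of all 100 elements... */
size_t array_bytes(void)
{
    int b[100];
    return sizeof b;             /* 100 * sizeof(int); b is not evaluated */
}

/* ...but once the array has decayed (e.g. by being passed to a function),
   only a pointer is left and sizeof reports the pointer's size. */
size_t decayed_bytes(const int *p)
{
    (void)p;
    return sizeof p;             /* sizeof(const int *) */
}
```

    Note that neither form in subscript_matches_pointer checks bounds; the loop condition is the only thing keeping i in range.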
    6. Re:I echo the above statements by slamb · · Score: 2, Interesting
      Interesting. I'm at work now, so I shouldn't take the time to actually solve it until I get home. But it sounds like a problem you could solve by:
      1. Statistics. It seems like if you are One with the Statistics, you could do it with pencil, paper, and a calculator just powerful enough to do factorials. (Unfortunately, I am not.)
      2. Dynamic programming or recursion with memoization. (There are overlapping subproblems.)
      3. Recursion. This would be slow, and I bet it's the way most people use.
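      A minimal sketch of option 2 (bottom-up dynamic programming), assuming eric76's rules are read literally: the put-back lemon means the second draw comes from the full bag. On any day a lemon is eaten only when both draws are lemons, with probability (l/n)^2 for l lemons among n candies; otherwise a cherry is eaten. That yields a small table over (cherries, lemons) states, about a thousand cells for the 90/10 bag, instead of an enumeration of draw sequences. The function name last_is_lemon is hypothetical, and it returns the probability rather than printing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Probability that the last candy left is lemon, starting from `cherry`
   cherries and `lemon` lemons. Returns -1.0 if the table can't be allocated. */
double last_is_lemon(int cherry, int lemon)
{
    assert(cherry >= 0 && lemon >= 0 && cherry + lemon > 0);

    int cols = lemon + 1;
    double *p = malloc((size_t)(cherry + 1) * (size_t)cols * sizeof *p);
    if (!p)
        return -1.0;

    /* p[c * cols + l] = probability the last candy is lemon from state (c, l).
       Row c depends on row c - 1 and on the previous cell in row c, so
       filling rows in increasing c and columns in increasing l works. */
    for (int c = 0; c <= cherry; ++c) {
        for (int l = 0; l <= lemon; ++l) {
            double v;
            if (c == 0)
                v = (l > 0) ? 1.0 : 0.0;   /* only lemons left; (0,0) is never used */
            else if (l == 0)
                v = 0.0;                   /* only cherries left */
            else {
                double n = c + l;
                double pl = (l / n) * (l / n);          /* both draws are lemon */
                v = (1.0 - pl) * p[(c - 1) * cols + l]  /* a cherry is eaten */
                  + pl * p[c * cols + (l - 1)];         /* a lemon is eaten */
            }
            p[c * cols + l] = v;
        }
    }

    double result = p[cherry * cols + lemon];
    free(p);
    return result;
}
```

    Calling last_is_lemon(90, 10) fills the whole 91 x 11 table in one pass, which is why this approach finishes in well under a second.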
    7. Re:I echo the above statements by sleepingsquirrel · · Score: 2, Funny
      The following are equivalent:
      b[i++]

      *(b + i++)
      Yeah, but the fun doesn't really begin until you start writing code like...
      i++[b]
  9. Strength Reduction by The boojum · · Score: 3, Informative
    My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
    Yes, semantically array[index] has to have the same effect as *(array+index). But the compiler is free to generate conceptually equivalent code in any way that it pleases. Any decent C/C++ compiler optimizer that can perform strength reduction ought to be able to see how the index changes the memory location and turn it into simple pointer incrementation accordingly. And strength reduction is a well known optimization that's been around for ages -- if memory serves, even the old Red Dragon book talks about how it works in this context. If your compiler can't handle this you need to find a better compiler.
  10. Do This instead by Leimy · · Score: 2, Interesting
    void reversestring_array(char *str)
    {
        int head, tail;
        char temp;
        if (!str) return;
        tail = strlen(str) - 1;
        for (head = 0; head < tail; ++head, --tail)
        {
            temp = tail[str];
            tail[str] = head[str];
            head[str] = temp;
        }
    }
    That'll teach ya.
  11. There cannot be any difference. by Anonymous Coward · · Score: 2, Informative

    The whole point is dumb.
    Every C programmer should know the answer to the following question:

    What is 6["abcdefghijkl"]?

    Answer: 'g'.

    How is this determined?
    By definition x[y]=*(x+y)=y[x].

    Don't believe me? Check the standard.

    (Yeah, this is a degenerate case, like Duff's device. Still, a compiler has to support it.)
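    The definition being appealed to is the C standard's rule that E1[E2] is identical to (*((E1)+(E2))); because addition commutes, the reversed forms (including Leimy's tail[str] and the i++[b] joke) are legal C. A minimal check, with hypothetical function names:

```c
/* 6["abcdefghijkl"] means *(6 + "abcdefghijkl"), i.e. the character at index 6. */
char seventh_char(void)
{
    return 6["abcdefghijkl"];    /* same as "abcdefghijkl"[6] */
}

/* Returns 1 if s[i], *(s + i) and i[s] agree for every index of the string. */
int commuted_subscript_agrees(void)
{
    const char *s = "abcdefghijkl";
    int ok = 1;
    for (int i = 0; i < 12; ++i)
        ok &= (s[i] == *(s + i)) && (s[i] == i[s]);
    return ok;
}
```

    Legal is not the same as advisable: the reversed forms are shown here only to demonstrate the definition.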

  12. Re:Professor Cormen said... by RAMMS+EIN · · Score: 3, Informative

    You're not completely right.

    First:

    ``int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4))''

    You're trying to get the 5th element of the array by using an offset of 4 times 4, assuming sizeof(int) == 4. First off, don't make that assumption; always write sizeof(int) when that's what you mean. More importantly, the C compiler automatically multiplies your offset by the size of the elements, so you would have to write *(k + 4) instead.

    Secondly, you're not getting the point (probably, you were misled by the headline, as I was). The question is not whether variables holding arrays are really pointers to the arrays (strictly, they aren't: an array name decays to a pointer to its first element in most expressions), but whether, say, iterating through an array by updating a pointer is faster than iterating by using an index variable as an offset. In other words, it's not whether a[0] is the same as *a, but whether while(*a) a++; is faster than while(a[i]) i++;.

    --
    Please correct me if I got my facts wrong.
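    To make the comparison concrete, the two loops from the post above can be written as string-length functions. Which one is faster is up to the compiler and target, so this sketch only shows that they compute the same thing. It also checks the scaling point: k + 4 is already 4 * sizeof(int) bytes past k. All function names are hypothetical.

```c
#include <stddef.h>

/* while (a[i]) i++; : base plus offset on every iteration. */
size_t length_by_index(const char *a)
{
    size_t i = 0;
    while (a[i])
        i++;
    return i;
}

/* while (*a) a++; : advance the pointer itself, then measure the distance. */
size_t length_by_pointer(const char *a)
{
    const char *p = a;
    while (*p)
        p++;
    return (size_t)(p - a);
}

/* Pointer arithmetic is scaled by the element size, so no manual "* 4". */
int scaled_offset_ok(void)
{
    int i[20];
    int *k = i;
    return (k + 4 == &i[4])
        && ((const char *)(k + 4) - (const char *)k
            == (ptrdiff_t)(4 * sizeof(int)));
}
```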
  13. Spelling and grammar and punctuation, oh my! by Roadkills-R-Us · · Score: 4, Funny

    I didn't see any errors in punctuation or grammar, either. I don't recall the last time I saw a post of that length which didn't confuse plural's (sic) with possessives.

  14. your mileage will vary, but... by blackcoot · · Score: 3, Informative

    ... my experience has been that this matters more in the multidimensional array case than in the single-dimensional array case (for those who are curious: my goal is to write algorithms which do non-trivial amounts of processing on VGA or larger video at full frame rates [>= 15Hz]; any time i can make array operations faster, my entire program benefits significantly). when dealing with two-dimensional arrays, you can do the addressing yourself (location [i,j] in an r x c matrix maps to [i*c+j] in a flat array). if you are clever about how you explain your indexing to the compiler, it should realize that you're passing through consecutive addresses in memory and generate code accordingly. if, on the other hand, you're doing something like A[i][j] where A is an array of row pointers, the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection; in this case, working with pointer/index arithmetic relative to the base address is a big win.

    have you tried this with intel's c/c++ compiler or other compilers? i'd be curious to see if what you're seeing is a result of how gcc is limited in the number of optimizations it can apply directly to the parse tree because it can't assume (at that stage) a particular target machine.
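    A sketch of the distinction, assuming int elements and hypothetical function names: a flat r x c buffer walked through a per-row pointer, versus an array of row pointers where A[i][j] must first load A[i]. A genuine two-dimensional array such as int A[R][C] is contiguous, and A[i][j] on it is a single address computation; the extra load only appears with the pointer-to-pointer layout.

```c
#include <stddef.h>

/* Flat r x c matrix: element (i, j) lives at m[i * c + j] in one block.
   One multiply per row; the inner loop touches consecutive addresses. */
long sum_flat(const int *m, size_t r, size_t c)
{
    long sum = 0;
    for (size_t i = 0; i < r; ++i) {
        const int *row = m + i * c;
        for (size_t j = 0; j < c; ++j)
            sum += row[j];
    }
    return sum;
}

/* Array of row pointers: A[i][j] loads the pointer A[i] first, then
   indexes it, and each row may live anywhere in memory. */
long sum_jagged(int *const *A, size_t r, size_t c)
{
    long sum = 0;
    for (size_t i = 0; i < r; ++i)
        for (size_t j = 0; j < c; ++j)
            sum += A[i][j];
    return sum;
}
```

    Both return the same total for the same values; the difference is only in memory layout and the number of loads per element.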

  15. Use Java instead .... by icepick72 · · Score: 2, Funny

    ... and then you will no longer have to worry that it might be slow.

  16. GCC experimental results by RML · · Score: 5, Interesting

    Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version:

    L13:
    movzbl  -1(%ebx), %edx
    movl    %esi, %ecx
    decl    %edi
    movl    8(%ebp), %eax
    movb    %dl, -13(%ebp)
    movzbl  -1(%esi,%eax), %edx
    movb    %dl, -1(%ebx)
    decl    %ebx
    movzbl  -13(%ebp), %edx
    movb    %dl, -1(%esi,%eax)
    incl    %esi
    cmpl    %ecx, %edi
    jg      L13

    The loop from the pointer version:

    L5:
    movzbl  1(%esi), %edx
    movl    %esi, %ecx
    movzbl  (%ebx), %eax
    movb    %al, 1(%esi)
    decl    %esi
    movb    %dl, (%ebx)
    incl    %ebx
    cmpl    %ecx, %ebx
    jb      L5

    Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
    Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s

    So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.

    --
    Human/Ranger/Zangband
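    RML didn't post the timing harness, so the following is a hypothetical one for anyone who wants to repeat the experiment. It assumes clock() has adequate resolution at these repetition counts and that both functions are compiled with the same flags; time_reverse and sink are made-up names. The reverse functions carry two small fixes relative to the question: an int cast before subtracting 1, and an empty-string check in the pointer version so str - 1 is never formed.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

void reversestring_array(char *str)
{
    int head, tail;
    char temp;
    if (!str) return;
    tail = (int)strlen(str) - 1;
    for (head = 0; head < tail; ++head, --tail) {
        temp = str[tail];
        str[tail] = str[head];
        str[head] = temp;
    }
}

void reversestring_pointer(char *str)
{
    char *phead, *ptail, temp;
    if (!str || !*str) return;
    ptail = str + strlen(str) - 1;
    for (phead = str; phead < ptail; ++phead, --ptail) {
        temp = *ptail;
        *ptail = *phead;
        *phead = temp;
    }
}

/* Written after timing so the optimizer can't treat the work as dead. */
static volatile char sink;

/* Runs fn on a len-character string reps times and returns CPU seconds
   as measured by clock(). Returns -1.0 on allocation failure. */
double time_reverse(void (*fn)(char *), size_t len, long reps)
{
    char *buf = malloc(len + 1);
    if (!buf) return -1.0;
    for (size_t k = 0; k < len; ++k)
        buf[k] = (char)('a' + k % 26);
    buf[len] = '\0';

    clock_t start = clock();
    for (long r = 0; r < reps; ++r)
        fn(buf);
    clock_t end = clock();

    sink = buf[0];
    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

    For example, time_reverse(reversestring_array, 10000, 100000) roughly mirrors the setup described above; results will vary by compiler, flags and CPU.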
    1. Re:GCC experimental results by Anonymous+Brave+Guy · · Score: 2, Interesting

      GCC is designed to be portable, not fast, so the code it generates is often pretty bad compared to specialised, platform-specific compilers. Obviously your test is relevant if GCC is the compiler you'll be using, but for serious performance work it's pretty much irrelevant what GCC generates because no-one uses it when native alternatives are available anyway. In fact, your example code here is a great demonstration of this!

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    2. Re:GCC experimental results by UOZaphod · · Score: 2, Insightful

      What is used to compile the Linux Kernel?

      --
      "The unicode stuff in the latest version is working fabulously well. My russian mafia friends are ecstatic."
    3. Re:GCC experimental results by tajribah · · Score: 2, Interesting

      Actually, that's not true in many cases -- GCC 3.x generates very good code, but the 4.x versions still haven't caught up with the 3.x line.

    4. Re:GCC experimental results by Deorus · · Score: 2, Insightful

      >> no-one uses it when native alternatives are available anyway

      Aren't hundreds of Linux distributions out there enough to prove that assumption wrong? GCC is used regardless of its speed because it's free.

    5. Re:GCC experimental results by Piquan · · Score: 2, Informative

      Unfortunately, the article didn't say what compiler he was using. But since we're giving data points:

      gcc 3.4.2 [FreeBSD] 20040728, x86, -O3 -march=pentium-m. This generated essentially the same code as the article's.

      Array version:

      .L12:
      movzbl (%esi,%ecx), %edx
      movzbl (%esi,%ebx), %eax
      movb %al, (%esi,%ecx)
      decl %ecx
      movb %dl, (%esi,%ebx)
      incl %ebx
      .L10:
      cmpl %ecx, %ebx
      jl .L12

      Pointer code:

      .L23:
      movzbl (%ebx), %edx
      movzbl (%ecx), %eax
      movb %al, (%ebx)
      decl %ebx
      movb %dl, (%ecx)
      incl %ecx
      .L22:
      cmpl %ebx, %ecx
      jb .L23

      1.4 GHz Athlon. Array code time: 3.274s. Pointer time: 3.322s. Single (100000x) trial of each.

      I'd say that's within noise.

  17. And C++/STL? by XenonOfArcticus · · Score: 4, Informative

    And what of STL container classes under C++?

    Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.

    If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs goes for a tight, cryptic style. As well, the performance of the runtime platform (CPU, memory, bus) has a bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out-of-order execution and other black juju might render the difference meaningless.

    Finally, you have to consider the cost/benefit of the situation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or image processor main loop, where it'll run a few quadrillion times. Other places in the same codebase though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.

    Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.

    I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.

    That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.

    When performance really counts, a whole new set of tools and processes come into play.

    --
    -- There is no truth. There is only Perception. To Perceive is to Exist.
    1. Re:And C++/STL? by Arandir · · Score: 2, Funny

      I'm currently writing a piece of software to compete with a commercial proprietary offering. I'm using "bloated" STL and "bloated" C++ to manipulate "bloated" XML. Frankly, I'm shocked at its poor performance, and I might have to do some optimizations. It takes about half a second to load and process 200k worth of XML.

      ON THE OTHER HAND, the commercial proprietary alternative I'm competing against loads the equivalent data. But that data consumes 25Megs, and takes five seconds to load! You would think if they're going to use a proprietary data format, they could at least make one that works!

      I'm not going to apologize anymore for using "bloated" tools.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    2. Re:And C++/STL? by WARM3CH · · Score: 2, Informative
      OK I tried it with Visual Studio 2003 and also tried STL. Of course with STL you only need one line of code to reverse a character string:
      reverse(str, str+strlen(str));
      Now, the interesting part is what the optimizing compiler outputs for each of the three variants (I only include the inner loop):
      With pointers we have:

      $L13583:
      mov bl, BYTE PTR [ecx]
      mov dl, BYTE PTR [eax]
      mov BYTE PTR [eax], bl
      mov BYTE PTR [ecx], dl
      inc ecx
      dec eax
      cmp ecx, eax
      jb SHORT $L13583

      With array we have:

      $L12527:
      mov bl, BYTE PTR [ecx+esi]
      mov dl, BYTE PTR [eax+esi]
      mov BYTE PTR [eax+esi], bl
      mov BYTE PTR [ecx+esi], dl
      inc ecx
      dec eax
      cmp ecx, eax
      jl SHORT $L12527

      and with STL we have:

      $L13663:
      mov dl, BYTE PTR [ecx]
      mov bl, BYTE PTR [eax-1]
      dec eax
      mov BYTE PTR [ecx], bl
      inc ecx
      cmp ecx, eax
      mov BYTE PTR [eax], dl
      jb SHORT $L13663
      Quite nice, no? ;) Now, I made one last step and used the RDTSC instruction to actually count how many clock cycles it takes to run each version of the function to reverse an 80-character string. This way we can also see the effect of the parts not inside the loop. This is the result:
      function with the array: 661 cycles
      function with the pointer: 616 cycles
      function with the STL: 607 cycles
      So although the core loop section is almost identical in the STL and pointer versions, the STL version is a tiny bit better in the setup section. All in all, I think this is an example showing that nice and neat C++ code can compete fairly well with optimized C.
  18. Re:there is a difference... by Waffle+Iron · · Score: 4, Informative
    what you in fact have is in the array 4 additional additions (between registers),

    Actually, most x86s have dedicated address generation units which handle those index additions in parallel, on separate logic from the main ALU. So both cases would actually run at the same speed on a modern x86.

  19. Re:In summary... by Anonymous+Brave+Guy · · Score: 2, Funny
    If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.

    Not necessarily. You're ignoring the fact that modern processors use pipelining architectures, branch prediction, extensive caching often at several levels, and a whole host of other things that mean the total time required is not the sum of the individual times for each instruction.

    One of these days, someone will invent a program that can take some more abstract representation of what we want to do, and automatically generate optimised machine code from it on any given platform. Yeah, there's an idea... I can C it now!

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  20. As Donald Knuth said... by jschmerge · · Score: 4, Insightful

    Premature optimization is the root of all evil.

    I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use array indexing when I have an actual bona fide array...

    In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycles off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:

    (ptr + index)->dataMember

    is less readable than this:

    ptr[index].dataMember
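    The two spellings are equivalent by definition: the C standard defines E1[E2] as (*((E1)+(E2))), so both functions below load from the same address. A small check, using a hypothetical struct sample as the record type:

```c
#include <assert.h>
#include <stddef.h>

struct sample {            /* hypothetical record type */
    int id;
    double value;
};

/* Subscript form. */
double value_subscript(const struct sample *ptr, size_t index)
{
    assert(ptr != NULL);
    return ptr[index].value;
}

/* Pointer-arithmetic form: (ptr + index) is the address &ptr[index]. */
double value_arith(const struct sample *ptr, size_t index)
{
    assert(ptr != NULL);
    return (ptr + index)->value;
}
```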
  21. Pointers are for programmers silly rabbit!! by Da_Weasel · · Score: 2, Insightful

    Pointers are for programmers, not for speed. A particular compiler's implementation of the C specification may produce better or more efficient code when using pointers, but I seriously doubt that every implementation behaves the same way, or that any one of them behaves the same across different architectures.

    --
    If you must!
  22. Re:Effective cache use will be a better optimisati by functor0 · · Score: 2, Funny

    I have a good story for this one. In my second co-op term, I was asked to improve the speed of the general 2D convolution image filter for a well-known commercial paint package. Nothing complicated, just run the given 2D kernel through the image.

    The previous writer of the function had carefully removed all the array accesses, only using pointers that were incremented by fixed constants as it proceeded through the code. He had also carefully maintained an array of the sums of the results as we moved from pixel to pixel to avoid calculating anything twice. You would think it's pretty fast, eh?

    I came across two problems with the code:

    1. His array of sums was actually a queue and so he shifted the entire array by one element for every single pixel. Using a ring buffer instead quickly solved this one.

    2. I then came to realize that he was traversing the row-major image in columns. The cache coherency was being shot to hell because none of the cache-lines were being hit as it went from pixel to pixel. I rewrote the function to go by rows instead and guess what the speed improvement was? Something like 3 times!
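    A minimal sketch of the ring-buffer fix from point 1, assuming the "array of sums" holds the last WINDOW column sums of the kernel (WINDOW and struct ring are hypothetical). Once the buffer is full, each new value overwrites the oldest slot and adjusts a running total, so the cost per pixel is constant instead of proportional to the window width:

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 5   /* hypothetical kernel width */

struct ring {
    long   vals[WINDOW];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of valid entries */
    long   total;   /* running sum of the valid entries */
};

/* Pushes v. Once full, the oldest value is overwritten in O(1)
   instead of shifting the whole array down by one element. */
void ring_push(struct ring *r, long v)
{
    assert(r != NULL);
    if (r->count < WINDOW) {
        r->vals[(r->head + r->count) % WINDOW] = v;
        r->count++;
    } else {
        r->total -= r->vals[r->head];   /* drop the oldest value */
        r->vals[r->head] = v;           /* reuse its slot */
        r->head = (r->head + 1) % WINDOW;
    }
    r->total += v;
}
```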

    Go figure. Some people optimize and get it entirely wrong.
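    Point 2 above is a loop interchange. A sketch, assuming an 8-bit grayscale image stored row-major with `width` bytes per row, with a plain sum standing in for the convolution (function names hypothetical): both functions read the same pixels, but only the row-by-row version touches consecutive addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Column by column: successive reads are `width` bytes apart. */
long sum_by_columns(const unsigned char *img, size_t width, size_t height)
{
    assert(img != NULL);
    long total = 0;
    for (size_t x = 0; x < width; ++x)
        for (size_t y = 0; y < height; ++y)
            total += img[y * width + x];
    return total;
}

/* Row by row: successive reads are adjacent in memory. */
long sum_by_rows(const unsigned char *img, size_t width, size_t height)
{
    assert(img != NULL);
    long total = 0;
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            total += img[y * width + x];
    return total;
}
```

    The results are identical; the speed gap only shows up once the image is much larger than the cache.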

  23. Thinking like this by phorm · · Score: 2, Insightful

    Is why we have big, ugly, bloated programs that require overpowered CPUs. First of all, it depends on what the application is being coded for. Perhaps it's actually intended for slower machines as well as faster... not everything is made to run on a 4GHz P4 machine, you know.

    Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, in an operation perhaps requiring ready output, the output might be visibly delayed even on a faster machine.

    Pointers can be a pain in the ass and I definitely agree that I find them annoying at times, but they also have their place and you don't always have the option of sacrificing beauty for functionality. If the code is a bit confusing, well that's what comments are for.

  24. You shouldn't make any assumptions ... by bigsteve@dstc · · Score: 2, Insightful
    I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?

    Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISAs I'd expect a naive C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word-addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)

    In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make a 5-10% difference in a tight loop, but probably much less across a large application. If the performance difference is significant, you will get more benefit for your effort by:

    • finding better C compilers, or
    • using a profiler to find the real hotspots in the code and hand-optimizing them for the hardware platforms that you really care about.

    As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.

  25. Vectorisation by Heretik · · Score: 2, Insightful

    Maybe not relevant in this case since you're working with strings, but with vectorization being so important to performance on most modern architectures, if you were dealing with floats the pointer one might actually be slower because it's much harder for the compiler to figure out if (and how) it can vectorize it.

    I'm not sure about various compilers and what they do in this case, but following the progress of GCC4's vectorisation, it looks much more likely that the pointer case is passed over by the vectorizer and ends up being (way) slower than the easy to vectorise array index version.

    Like I said, not sure what the actual situation in practise is, but it's worth looking into. The difference between vectorized SSE code and plain old x86 code (for example) is WAY greater than the trivial insignificant difference between the two examples you posted.
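    One common way to make such a loop easier to vectorize is to keep the simple indexed form and tell the compiler the arrays do not overlap with the C99 restrict qualifier. A sketch (scale_add is a hypothetical name); whether a particular compiler actually emits SIMD code for it depends on the compiler version and optimization flags:

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
   restrict promises that x and y do not alias, so the compiler may
   process several elements per instruction without runtime overlap checks. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```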

  26. ASSume by larry+bagina · · Score: 3, Insightful
    My argument is that one cannot assume

    You're right. However, you're also assuming that your pointer arithmetic is faster.

    Consider a 16-bit architecture with 32-bit pointers.

    Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.

    So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. if you prefer arrays, fine. But if you're so concerned with the speed, you'd be doing it in assembly.

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  27. From the Compiler Trenches by Anonymous Coward · · Score: 5, Informative
    I develop highly optimizing compilers for a living, so use that to put in context what I'm going to say. Things look very different from the other side of the source file.

    I'll admit that I'm always slightly bemused by these sorts of discussions. That bemusement quickly turns to disinterest after I realize that a lot of people are burning a lot of cycles arguing about it.

    Quite frankly, your example has little relevance in the real world. The optimization you are talking about is covered by strength reduction, as others have pointed out. But that's not the point of my message. This sort of piddly optimization means almost nothing when one looks at the whole application. If this piece of code is in an inner loop that takes 90% of the application time and it proves to be the bottleneck, then one can think about taking a closer look.

    We have customers that come to us all the time with just such examples. They literally tell us, "You missed an opportunity to use a register here," and we know it's important because we know the customers are serious about profiling code and finding bottlenecks. So when they come to us we are happy to look at the "piddly stuff."

    I've seen all kinds of different code. They basically fall into two categories: code that the compiler can do something significant with and code the compiler has no hope of automatically optimizing in any truly meaningful way. When I say "significant" and "meaningful," I explicitly mean "not Dragon Book stuff, except for scheduling (which can provide a significant performance win)." Simple optimizations like common subexpression elimination and copy propagation are more useful at enabling other optimizations than in any cycles gained in their own right. They are important, but not to make the code run significantly faster in and of themselves.

    If one writes an application that does a lot of branching and pointer chasing (say, like, a traditional compiler*) then there's not much an optimizer can do with it. The aliasing difficulties alone will kill most optimizations. It's more important to write these kinds of applications in an understandable way because that is where the programmer time is most costly.

    That said, judicious use of directives for compilers that support them can go a long way toward making these kinds of codes run really fast. Think of threading a tree search, for example. But the compiler is not going to have much hope of converting such a low-level piece of code without help.

    An example of code that a compiler can do really, really well on are the traditional scientific applications. Here parallelism is everything, be it data-level, thread-level or at the distributed memory level (for really big machines). In these cases the loop optimizations are orders of magnitude more important speed-wise than sequential optimization (though, as I said, sequential optimization can enable some of these loop transformations). Some of the more important loop restructuring transformations are:

    • Interchange
    • Unroll
    • Fusion
    • Fission
    • Unroll-and-Jam
    • Invariant hoisting (both data and control)
    • Vectorization
    • Cache Blocking

    When the compiler is done with these, one hardly recognizes the code anymore. :)

    In my experience, the fundamental problem is that compilers are really hard to understand. People argue about what a compiler can and cannot do because they are enormously complex systems that require arcane knowledge of language standards and hardware architecture to really dig into. It's slightly less difficult to understand the broad strokes, for example, simple cases of vectorizable loops. It's a lot more difficult to understand how to parallelize a loop that compresses a sparse array into a dense one.

    If there's one lesson that I like to convey to programmers, it's to not sweat the optimization details. Don't hand-optimize the code. I can tell you fro

    1. Re:From the Compiler Trenches by Anonymous Coward · · Score: 2, Insightful

      [Apologies for the code formatting; <ecode> doesn't seem to work (in preview mode, anyway).]

      Hand-optimizing a piece of code without first making sure it's important via profiling and also looking at what the compiler is actually doing is not avoiding laziness. It's simply a waste of time that prevents one from getting real work done.

      While it's true that our compiler technology continues to improve, there are some cases where hand-optimization puts the code in a state that no compiler will ever have a hope of repairing. Here's a simple example (contrived for simpler explanation, but not very far from what one sees in real-world code):

      int global[BIGNUM];
      int *p = global; /* "Optimize" address arithmetic */

      void initialize(int *x, int *y); /* Defined elsewhere */

      void foo()
      {
          int i, j;
          int x[BIGNUM];
          int y[BIGNUM];

          initialize(x, y); /* In another compilation unit, not inlined */

          for (i = 1; i < BIGNUM; ++i) { /* x[0], y[0] hold "special" info */
              for (j = 1; j < BIGNUM; ++j) {
                  x[i] += y[j]; /* Reduce y into each element of x */
              }
              *p++ += x[i]; /* Add x to global */
          }
      }

      Note that to the human eye, the outer loop looks parallel. There are no cross-iteration dependencies. It's a straight vector copy loop, very fast on architectures that support it like MMX, SSE, Altivec, etc.

      We've been "clever" and converted the indexing off of global into pointer accesses. In addition, we've made the pointer global because we know this function is the innermost loop and it will be called millions of times. We definitely want to avoid those millions of assignments.

      Or do we? Let's take a look at what we've done. When we call initialize to fill in x and y, we don't know what happens to p. Coming out of that call, p could point to anything. More specifically, it could point to x. Even more specifically, it could point to x[0]. This creates a recurrence meaning that there is now a possible cross-iteration dependency in the outer loop.

      Oops. It's no longer parallel.

      Well, that's not strictly true. It is still functionally parallel, but good luck convincing the compiler of that without help from directives. Strictly speaking, recurrences can be parallelized, but even if this "optimized" code were parallelized, it would run a heck of a lot slower than the straight vector copy.

      We didn't even hand "optimize" very aggressively. I've seen cases where programmers collapse multiple levels of loops accessing a multi-dimensional array and in the process completely destroy any information the compiler might have had about the iteration pattern. The end result is that loops that would have parallelized don't anymore.

      With parallelization we are talking anywhere from 2x to 50x speedup. Don't trade that for the .1% you'll get from converting an array access into pointer arithmetic.

      Again, the problem is that it is hard for humans to understand the purely mechanical process of the compiler. We "know" information (that initialize doesn't touch p) that the compiler never sees. Oh sure, there are some compilers out there that do whole program analysis and might be able to uncover what's going on here but the compile time on these systems is exponential.
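      For contrast, a sketch of the same loop with the "optimization" removed: global is indexed directly with i - 1 (which visits the same elements as the original *p++, assuming p starts at global), so there is no global pointer whose value initialize() might have changed. The value of BIGNUM and the body of initialize() are hypothetical stand-ins so the fragment compiles on its own:

```c
#include <assert.h>

#define BIGNUM 8              /* hypothetical; small for illustration */

int global[BIGNUM];

/* Hypothetical stand-in for the separately compiled initialize(). */
void initialize(int *x, int *y)
{
    assert(x != y);
    for (int k = 0; k < BIGNUM; ++k) {
        x[k] = 0;
        y[k] = 1;
    }
}

void foo(void)
{
    int x[BIGNUM];
    int y[BIGNUM];

    initialize(x, y);

    for (int i = 1; i < BIGNUM; ++i) {      /* x[0], y[0] hold "special" info */
        for (int j = 1; j < BIGNUM; ++j)
            x[i] += y[j];                    /* reduce y into each element of x */
        global[i - 1] += x[i];               /* indexed: no pointer that could alias x */
    }
}
```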

  28. Benchmarking by Threni · · Score: 2, Insightful

    If you're talking about optimisation then the only thing to do is to check different strategies. You can't just assume type a is a better coding method than type b. Even if you could prove it were true for every single compiler on the market, one could come along tomorrow and do things differently.

  29. Pointer aliasing by igomaniac · · Score: 2, Interesting

    With modern compilers you should always use arrays. The reason is that for all but the most pathological cases of indexing, the compiler will apply strength reduction and induction-variable optimization to turn p[i*4] into *p with p += 4. The problem with using pointers is that if you have more than one of them, the compiler has no easy way of knowing whether they point to the same place (this is known as pointer aliasing), so a bunch of useful optimizations have to be turned off. If you are doing a[i] = b[i] the compiler can much more easily find out that a and b are distinct memory locations than if you are doing *p = *q.
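    What that strength reduction amounts to, written out by hand (function names hypothetical): the first function indexes with p[i*4]; the second is the form the compiler derives, where the multiply is replaced by advancing a pointer 4 elements per iteration. Both assume p points to at least 4*count ints, so the walking pointer never goes more than one past the end.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: sums p[0], p[4], p[8], ... (count terms). */
long sum_stride4_indexed(const int *p, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += p[i * 4];
    return total;
}

/* Strength-reduced form: the i*4 multiply becomes p += 4.
   Requires at least 4*count elements so p ends at most one past the end. */
long sum_stride4_pointer(const int *p, size_t count)
{
    assert(count == 0 || p != NULL);
    long total = 0;
    for (size_t i = 0; i < count; ++i, p += 4)
        total += *p;
    return total;
}
```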

    If you want to learn more about these kinds of source-level optimisations, the AMD Athlon(TM) Processor x86 Code Optimization Guide is a good reference -- it includes a section on why you want to use array accesses instead of pointers, and also has a lot more up-to-date and useful advice on what compilers do to your code. It is available from AMD's website.

    --

    The interactive way to Go -- http://www.playgo.to/iwtg/en/
  30. 80x25 by dolmen.fr · · Score: 2, Interesting
    Again, your numbers are flawed, because my old 80x25 text mode display is still drawn from individual pixels. Those pixels still have to be updated individually by the CPU (in the 10MHz days it was often the main CPU that drew every little dot). 80x25 is drawn on a 640x480 in 16 glorious colours. Now, by your argument, 640x480x2 (your 2 image argument from above) = 614,400 bytes. Therefore, your 1900x1200 screen only requires 12x as many bytes to move about but I really don't care about those numbers because most of the work is done in the GPU now; not the main CPU.


    In the PC world, text mode was handled in hardware by the graphics card. Yes, we already had hardware-accelerated display at that time. In fact, it is still there in the latest PC video cards.
    One byte for the character, one byte for the color attribute.
    So your argument doesn't apply.
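    To make that layout concrete: with one byte for the character and one for the colour attribute, an 80x25 screen is 80 * 25 * 2 = 4000 bytes, and the card itself turns those cells into pixels. A sketch that uses a plain array as a stand-in for the card's text buffer (textbuf and put_cell are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COLS 80
#define ROWS 25

/* Stand-in for the text buffer: 2 bytes per character cell. */
static uint8_t textbuf[ROWS * COLS * 2];

/* Writes one cell: even byte = character code, odd byte = colour attribute. */
void put_cell(size_t row, size_t col, uint8_t ch, uint8_t attr)
{
    assert(row < ROWS && col < COLS);
    size_t off = (row * COLS + col) * 2;
    textbuf[off] = ch;
    textbuf[off + 1] = attr;
}
```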
  31. Where have all the geeks gone? by microTodd · · Score: 3, Interesting

    Wow. Just wow.

    No one will probably read this comment because it's been a day since the OP, but I'm amazed at the quantity of people who are slamming this guy for wanting to research something that's admittedly interesting.

    For starters, if the submitter is a CompSci student then he definitely gets my kudos. Too many CS students are just focused on "I wanna learn C# so I can go make money!" as opposed to actually LEARNING.

    Secondly, what happened to just plain geekiness of research and studying things because its fun and interesting? Does everything we do have to have some specific applicable purpose? If you say yes, you are thinking like the MBAs that always get bashed around here instead of a real nerd.

    Who knows? It's unlikely, but possible that thinking about this problem somehow leads to a train of thought that solves P=NP or something.

    --
    "You cannot find out which view is the right one by science in the ordinary sense." - C.S. Lewis on Intelligent Design
  32. Photoshop plug-ins by mwvdlee · · Score: 3, Informative

    I've written quite a lot of code for Photoshop plug-ins.

    Since this type of code typically iterates over a few hundred million pixels you'd think that changing such details as array vs. pointer or some other common optimization technique would have an impact.

    It does; it typically shaves a few tenths of a second off a 5 minute calculation.

    Then again, spending that same amount of time altering the algorithm will usually increase performance in a noticeable way.

    Nowadays I don't bother optimizing code (usually the compiler does a better job at it anyway) but rather optimize the algorithms. Instead of opening the topic and waiting for a definitive answer on your quest for ultimate performance, you could probably have rewritten the algorithm and gained much more performance than you'd ever get this way.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
  33. One of the secrets of the master programmers by rfisher · · Score: 2, Insightful

    Master programmers know that it doesn't matter. We:

    1. Write the code in the form that is clearest. (So that the maintainer--even if it is ourself--can figure out what it is doing quickly when debugging it later.)

    2. If AND ONLY IF performance is a concern and if AND ONLY IF this code has been shown to be a bottleneck, do we consider optimizing. We then try several different implementations and then profile them to choose the winner.

  34. Array and Pointer are not the cause for slow speed by DemonSlayer · · Score: 2, Interesting

    First of all, the array and pointers are not the reason why a C program run slowly.

    "Unnecessary looping" is the cause of the problem. I have encountered and fixed many C programs created by programmers, who have left the company.

    I have found that many programmers like to use a lot of for-loops and while-loops in their programs. Most of the time, the algorithms in their C programs can be modified to reduce the need for the loops and increase the speed of the C programs by 10 times or more.

  35. WRONG ! by oldCoder · · Score: 2, Insightful
    The increased readability and lower likelihood of programming errors on the array option...

    The array version is neither clearer nor less error-prone. And I dare you to prove differently!

    The pointer version is more portable across the chasm of compiler quality. That is, it will run well even on poorer compilers.

    The question of clearer is a question of cognitive psychology, not computer science, and requires experimentation to validate. The experiment would be confounded by the existing prejudice against pointers and the habit of CS profs of teaching subscripts over pointers.

    N.B.: There are problems with C pointers, but that comes up when using pointers to data structures, not pointers to bytes. Dangling pointers, free/malloc problems, and all that. Using subscripts helps very little, however.

    It is not true that the optimization is nearly pointless, it is that the difference in performance is so small as to make the decision nearly pointless. No pun intended. In some situations, the difference may be very important. The pointer version is no harder to code, read, or debug, and might possibly give you back benefits.

    What I'm trying to say here is that the pointer version is not an optimization, it's an alternative. Optimizations require more work than the vanilla version. For string/character loops, there is no more work in coding the pointer version.

    --

    I18N == Intergalacticization