Arrays vs Pointers in C?
UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"
"Here are my own thoughts on the issue:
For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.
The first function uses array indexing:
Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.
My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...
The array function:
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.
The first function uses array indexing:
void reversestring_array(char *str)The second function uses pointers:
{int head, tail;}
char temp;
if (!str) return;
tail = strlen(str) - 1;
for (head = 0; head < tail; ++head, --tail)
{temp = str[tail];}
str[tail] = str[head];
str[head] = temp;
void reversestring_pointer(char *str)While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.
{char *phead, *ptail, temp;}
if (!str) return;
ptail = str + strlen(str) - 1;
for (phead = str; phead < ptail; ++phead,--ptail)
{temp = *ptail;}
*ptail = *phead;
*phead = temp;
Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.
My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...
The array function:
forloop:The pointer function:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
forloop:While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?
If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.
This question sounds to me like discussions I had ages ago with other programmers, and it was always 1 programmer trying to justify his method of coding a routine over some other, equally valid method.
Pointer arithmetic and array's in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.
I never really saw the point in arguing over array and pointers in C. When I programmed in C I used both. I typically used Arrays if I wanted to code to be obvious and straight forward; if I wanted to do somehting else with the index, etc. I used points if I wanted speed above all else.
If I were to write to any modern PC (PPC970, Pentium IV, Athlon, etc...) I wouldn't worry about speed. The algorithm for more complicated functions than shuffling a few bytes around will dictate the speed.
Now THIS is the kind of discussion that should be going on at Slashdot!
One can always check the resulting assembly code, if one is so concerned about this.
Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)
These types of opimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors on the array option far outway any speed increase for the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.
IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
Premature optimization is the root of all evil.
I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bonafide array...
In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:
is less readable than this:
Pointers are for programmers, not for speed. A particular compilers implementation of the C specification may produce better or more efficient code when using pointers, but i seriously doubt that every implementation would work the same or even across different architectures.
If you must!
What is used to compile the Linux Kernel?
"The unicode stuff in the latest version is working fabulously well. My russian mafia friends are ecstatic."
all these new-fangled alternatives suck
Right, that's why I use Common Lisp.
Is why we have big, ugly, bloated programs that require overpowered CPUs. First of all, it depends on what the application is being coded for. Perhaps it's actually intended for slower machines as well as faster... not everything is made to run on a P4-4Ghz machine you know.
Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, on an operation perhaps requiring ready output, the output might even be visible delayed on a faster machine.
Pointers can be a pain in the ass and I definately agree that I find them annoying at times, but they also have their place and you don't always have the option of sacrificing beauty for functionality. If the code is a bit confusing, well that's what comments are for.
Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)
In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make 5-10% difference in a tight loop, but probably much less across a large application. If the difference performance is significant, you will get more benefit for your effort by:
As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.
Maybe not relevant in this case since you're working with strings, but with vectorization being so important to performance on most modern architechtures, if you were dealing with floats the pointer one might actually be slower because it's much harder for the compiler to figure out if (and how) it can vectorize it.
I'm not sure about various compilers and what they do in this case, but following the progress of GCC4's vectorisation, it looks much more likely that the pointer case is passed over by the vectorizer and ends up being (way) slower than the easy to vectorise array index version.
Like I said, not sure what the actual situation in practise is, but it's worth looking into. The difference between vectorized SSE code and plain old x86 code (for example) is WAY greater than the trivial insignificant difference between the two examples you posted.
You're right, however, you're also assuming that your pointer arithmetic is faster.
Consider a 16-bit architecture with 32-bit pointers.
Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.
So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. if you prefer arrays, fine. But if you're so concerned with the speed, you'd be doing it in assembly.
Do you even lift?
These aren't the 'roids you're looking for.
If you're talking about optimisation then the only thing to do is to check different strategies. You can't just assume type a is a better coding method than type b. Even if you could prove it were true for every single compiler on the market, one could come along tomorrow and do things differently.
>> no-one uses it when native alternatives are available anyway
Aren't hundreds of Linux distributions out there enough to prove that assumtion wrong? Gcc is used regardless of its speed because it's free.
Master programmers know that it doesn't matter. We:
1. Write the code in the form that is clearest. (So that the maintainer--even if it is ourself--can figure out what it is doing quickly when debugging it later.)
2. If AND ONLY IF performance is a concern and if AND ONLY IF this code has been shown to be a bottleneck, do we consider optimizing. We then try several different implementations and then profile them to choose the winner.
[Apologies for the code formatting <ecode> doesn't seem to work (in preview mode, anyway).]
Hand-optimizing a piece of code without first making sure it's important via profiling and also looking at what the compiler is acutally doing is not avoiding laziness. It's simply a waste of time that prevents one from getting real work done.
While it's true that our compiler technology continues to improve, there are some cases where hand-optimization puts the cocde in a state that no compiler will ever have a hope of repairing. Here's a simple example (contrived for simpler explanation, but not very far from what one sees in real-world code):
Note that to the human eye, the outer loop looks parallel. There are no cross-iteration dependencies. It's a straight vector copy loop, very fast on architectures that support it like MMX, SSE, Altivec, etc.
We've been "clever" and converted the indexing off of global into pointer accesses. In addition, we've made the pointer global because we know this function is the innermost loop and it will be called millions of times. We definitely want to avoid those millions of assignments.
Or do we? Let's take a look at what we've done. When we call initialize to fill in x and y, we don't know ewhat happens to p. Coming out of that call, p could point to anything. More specifically, it could point to x. Even more specifically, it could point to x[0]. This creates a recurrence meaning that there is now a possible cross-iteration dependency in the outer loop.
Oops. It's no longer parallel.
Well, that's not strictly true. It still is functionally parallel but good luck convincing the compiler of that without help from directives. Strictly speaking, recurrences can be parallelized but even if this "optimized" code were parallelized run a heck of a lot slower than the straight vector copy.
We didn't even hand "optimize" very aggressively. I've seen cases where programmers collapse multiple levels of loops accessing a multi-dimensional array and in the process completely destroy any information the compiler might have had about the iteration pattern. The end result is that loops that would have parallelized don't anymore.
With parallelization we are talking anywhere from 2x to 50x speedup. Don't trade that for the .1% you'll get from converting an array access into pointer arithmetic.
Again, the problem is that it is hard for humans to understand the purely mechanical process of the compiler. We "know" information (that initialize doesn't touch p) that the compiler never sees. Oh sure, there are some compilers out there that do whole program analysis and might be able to uncover what's going on here but the compile time on these systems is exponential.
The array version is neither clearer nor less error-prone. And I dare you to prove differently!
The pointer version is more portable across the chasm of compiler quality. That is, it will run well even on poorer compilers.
The question of clearer is a question of cognitive psychology, not computer science, and requires experimentation to validate. The experiment would be confounded by the existing prejudice against pointers and the habit of CS profs of teaching subscripts over pointers.
N.B.: There are problems with C pointers, but that comes up when using pointers to data structures, not pointers to bytes. Dangling pointers, free/malloc problems, and all that. Using subscripts helps very little, however.
It is not true that the optimization is nearly pointless, it is that the difference in performance is so small as to make the decision nearly pointless. No pun intended. In some situations, the difference may be very important. The pointer version is no harder to code, read, or debug, and might possibly give you back benefits.
What I'm trying to say here is that the pointer version is not an optimization, it's an alternative. Optimizations require more work than the vanilla version. For string/character loops, there is no more work in coding the pointer version.
I18N == Intergalacticization