Arrays vs Pointers in C?
UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"
"Here are my own thoughts on the issue:
For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.
The first function uses array indexing:
Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.
My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...
The array function:
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.
The first function uses array indexing:
void reversestring_array(char *str)The second function uses pointers:
{int head, tail;}
char temp;
if (!str) return;
tail = strlen(str) - 1;
for (head = 0; head < tail; ++head, --tail)
{temp = str[tail];}
str[tail] = str[head];
str[head] = temp;
void reversestring_pointer(char *str)While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.
{char *phead, *ptail, temp;}
if (!str) return;
ptail = str + strlen(str) - 1;
for (phead = str; phead < ptail; ++phead,--ptail)
{temp = *ptail;}
*ptail = *phead;
*phead = temp;
Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.
My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...
The array function:
forloop:The pointer function:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
forloop:While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
then don't wank around optimizing
Dude, best use of the word wank. Ever!
for single cycles on a machine that probably bleeds off a million cycles every time you raise a window
Computers have become more powerful and programmers have become more lazy. It's not strictly true because instead of focussing a lot of time writing efficient code programmers are now focussing a lot of time writing a lot of code to fill bigger machines. That million cycles is wasted doing crap and probably half of doesn't need to be done anyway...
I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.
Now, with RAM and Gigahertz available for next to nix software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.
I drink to make other people interesting!
Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version:
L13:
movzbl -1(%ebx), %edx
movl %esi, %ecx
decl %edi
movl 8(%ebp), %eax
movb %dl, -13(%ebp)
movzbl -1(%esi,%eax), %edx
movb %dl, -1(%ebx)
decl %ebx
movzbl -13(%ebp), %edx
movb %dl, -1(%esi,%eax)
incl %esi
cmpl %ecx, %edi
jg L13
The loop from the pointer version:
L5:
movzbl 1(%esi), %edx
movl %esi, %ecx
movzbl (%ebx), %eax
movb %al, 1(%esi)
decl %esi
movb %dl, (%ebx)
incl %ebx
cmpl %ecx, %ebx
jb L5
Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s
So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.
Human/Ranger/Zangband
These types of opimizations are virtually pointless on modern machines.
.Net and Java XML parsers. And yes, the original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.
I call bullshit. Optimizations are important regardless of the language or CPU.
My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.
It took over six hours on the dual processor system with the native
My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.
Time is money, especially when your trying to push down 20,000 price changes from the Mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus, the 'C' code gets it done under an hour on the dual CPU machine. And yes, I tested the inputs for overflows, security problems whatever, before we went into production. Theres a big difference between a programmer who knows a language vs a programmer who understands it.
IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
Thats your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic to the CPU? Don't you? I wouldn't want to work for you though. Your competitors will write a faster program that uses less memory and you will loose the contract/job.
No flame intended,
Enjoy.
It's just the normal noises in here.