Arrays vs Pointers in C? · Slashdot Mirror

Posted by Cliff on Wednesday October 12, 2005 @10:42AM from the optimizations-may-not-be-portable dept.

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"

"Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:

void reversestring_array(char *str) {
    int head, tail;
    char temp;
    if (!str) return;
    tail = strlen(str) - 1;
    for (head = 0; head < tail; ++head, --tail) {
        temp = str[tail]; str[tail] = str[head]; str[head] = temp;
    }
}

The second function uses pointers:

void reversestring_pointer(char *str) {
    char *phead, *ptail, temp;
    if (!str) return;
    ptail = str + strlen(str) - 1;
    for (phead = str; phead < ptail; ++phead, --ptail) {
        temp = *ptail; *ptail = *phead; *phead = temp;
    }
}

While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further at compile time, because the value of the index is not known until run time.
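To make the semantic point concrete: the C standard defines E1[E2] as identical to (*((E1)+(E2))), so the two spellings are the same expression at the language level, whatever instructions a compiler later picks for them. The following is a minimal sketch of that equivalence; `same_element` is a hypothetical helper name and is not part of the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when every spelling of the same element
   access agrees for each index in buf[0..len-1], and 0 otherwise. */
int same_element(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; ++i) {
        /* buf[i] is defined as *(buf + i), so the addresses must match. */
        if (&buf[i] != buf + i) return 0;
        if (buf[i] != *(buf + i)) return 0;
        /* Addition commutes, so the odd-looking i[buf] is also legal C. */
        if (buf[i] != i[buf]) return 0;
    }
    return 1;
}
```

This only shows what the language guarantees. It says nothing about which form produces faster machine code, which is the open question here.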

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:

forloop:
    mov bl,byte ptr [esi+edx]
    mov al,byte ptr [ecx+edx]
    mov byte ptr [ecx+edx],bl
    mov byte ptr [esi+edx],al
    inc esi
    dec ecx
    cmp esi,ecx
    jl forloop

The pointer function:

forloop:
    mov bl,byte ptr [ecx]
    mov dl,byte ptr [eax]
    mov byte ptr [eax],bl
    mov byte ptr [ecx],dl
    inc ecx
    dec eax
    cmp ecx,eax
    jb forloop

While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.
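One way to probe the "larger structures" question is a loop over an array of structs whose size is not 1, 2, 4 or 8 bytes, which are the only scale factors x86 indexed addressing supports. The sketch below is only an illustration under assumptions: `struct record` and both function names are hypothetical, and the struct is typically 24 bytes on common ABIs, though that is not guaranteed. Whether a given compiler emits a multiply, a strength-reduced pointer, or something else for the indexed form can only be settled by reading its output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, chosen so that sizeof is unlikely to be a
   power of two that fits a hardware scale factor. */
struct record {
    int id;
    char tag[20];
};

/* Indexed form: each access is conceptually recs + i * sizeof(struct record). */
long sum_ids_indexed(const struct record *recs, size_t n)
{
    assert(recs != NULL);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += recs[i].id;
    return total;
}

/* Pointer form: the pointer advances by one whole record per iteration. */
long sum_ids_pointer(const struct record *recs, size_t n)
{
    assert(recs != NULL);
    long total = 0;
    for (const struct record *p = recs; p != recs + n; ++p)
        total += p->id;
    return total;
}
```

Compiling both with the optimizer on and comparing the assembly (for example with the compiler's option to emit assembly) would show whether the two loops still converge for this element size.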

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

2 of 308 comments

It optimizes out by klossner · 2005-10-12 11:12 · Score: 5, Interesting

An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction.
Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
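As a rough illustration of what strength reduction does, the two functions below are a hand-written before and after. This is a sketch, not actual compiler output: the function names are hypothetical, and a real optimizer performs the rewrite on its internal representation rather than on source code.

```c
#include <assert.h>
#include <stddef.h>

/* Before: each access is conceptually a + i * sizeof(int), recomputed
   from the index on every iteration. */
long sum_indexed(const int *a, size_t n)
{
    assert(a != NULL);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

/* After: the per-iteration multiply-and-add is replaced by a pointer
   that is bumped by one element, so no index arithmetic remains. */
long sum_reduced(const int *a, size_t n)
{
    assert(a != NULL);
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; ++p)
        total += *p;
    return total;
}
```

Both functions return the same value for the same input. The only difference is how the address of each element is derived.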

GCC experimental results by RML · 2005-10-12 12:25 · Score: 5, Interesting

Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m.

The loop from the array version:

L13:
    movzbl -1(%ebx), %edx
    movl %esi, %ecx
    decl %edi
    movl 8(%ebp), %eax
    movb %dl, -13(%ebp)
    movzbl -1(%esi,%eax), %edx
    movb %dl, -1(%ebx)
    decl %ebx
    movzbl -13(%ebp), %edx
    movb %dl, -1(%esi,%eax)
    incl %esi
    cmpl %ecx, %edi
    jg L13

The loop from the pointer version:

L5:
    movzbl 1(%esi), %edx
    movl %esi, %ecx
    movzbl (%ebx), %eax
    movb %al, 1(%esi)
    decl %esi
    movb %dl, (%ebx)
    incl %ebx
    cmpl %ecx, %ebx
    jb L5

Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s

So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.

--
Human/Ranger/Zangband
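
For anyone who wants to repeat this kind of measurement, here is a minimal timing sketch built on the standard clock() function. It is not the harness RML used, which the thread does not show. `time_reversals` is a hypothetical helper, and the empty-string guard in the pointer version is an addition that avoids forming a pointer before the start of the buffer.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Array version from the question, reproduced so the sketch stands alone. */
void reversestring_array(char *str)
{
    int head, tail;
    char temp;
    if (!str) return;
    tail = (int)strlen(str) - 1;
    for (head = 0; head < tail; ++head, --tail) {
        temp = str[tail]; str[tail] = str[head]; str[head] = temp;
    }
}

/* Pointer version from the question, plus a guard for the empty string. */
void reversestring_pointer(char *str)
{
    char *phead, *ptail, temp;
    size_t len;
    if (!str) return;
    len = strlen(str);
    if (len == 0) return;   /* added: str + 0 - 1 would point before str */
    ptail = str + len - 1;
    for (phead = str; phead < ptail; ++phead, --ptail) {
        temp = *ptail; *ptail = *phead; *phead = temp;
    }
}

/* Hypothetical helper: CPU seconds spent calling fn(buf) `reps` times. */
double time_reversals(void (*fn)(char *), char *buf, long reps)
{
    assert(fn != NULL && buf != NULL);
    clock_t start = clock();
    for (long i = 0; i < reps; ++i)
        fn(buf);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A caller would fill a buffer with a long string, pass each function to time_reversals with the same repetition count, and compare the two results. The numbers will vary with compiler version, flags, and CPU, so they are only meaningful for the exact setup that produced them.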