Slashdot Mirror


Arrays vs Pointers in C?

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" "Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:
void reversestring_array(char *str)
{
int head, tail;
char temp;
if (!str) return;
tail = strlen(str) - 1;
for (head = 0; head < tail; ++head, --tail)
{
temp = str[tail];
str[tail] = str[head];
str[head] = temp;
}
}
The second function uses pointers:
void reversestring_pointer(char *str)
{
char *phead, *ptail, temp;
if (!str) return;
ptail = str + strlen(str) - 1;
for (phead = str; phead < ptail; ++phead,--ptail)
{
temp = *ptail;
*ptail = *phead;
*phead = temp;
}
}
While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:
forloop:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
The pointer function:
forloop:
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

2 of 308 comments (clear)

  1. It optimizes out by klossner · · Score: 5, Interesting
    An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction.

    Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.

  2. GCC experimental results by RML · · Score: 5, Interesting

    Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version:

    L13:
    movzbl  -1(%ebx), %edx
    movl    %esi, %ecx
    decl    %edi
    movl    8(%ebp), %eax
    movb    %dl, -13(%ebp)
    movzbl  -1(%esi,%eax), %edx
    movb    %dl, -1(%ebx)
    decl    %ebx
    movzbl  -13(%ebp), %edx
    movb    %dl, -1(%esi,%eax)
    incl    %esi
    cmpl    %ecx, %edi
    jg      L13

    The loop from the pointer version:

    L5:
    movzbl  1(%esi), %edx
    movl    %esi, %ecx
    movzbl  (%ebx), %eax
    movb    %al, 1(%esi)
    decl    %esi
    movb    %dl, (%ebx)
    incl    %ebx
    cmpl    %ecx, %ebx
    jb      L5

    Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
    Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s

    So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.

    --
    Human/Ranger/Zangband