Slashdot Mirror


Arrays vs Pointers in C?

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" "Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:
void reversestring_array(char *str)
{
int head, tail;
char temp;
if (!str) return;
tail = strlen(str) - 1;
for (head = 0; head < tail; ++head, --tail)
{
temp = str[tail];
str[tail] = str[head];
str[head] = temp;
}
}
The second function uses pointers:
void reversestring_pointer(char *str)
{
char *phead, *ptail, temp;
if (!str) return;
ptail = str + strlen(str) - 1;
for (phead = str; phead < ptail; ++phead,--ptail)
{
temp = *ptail;
*ptail = *phead;
*phead = temp;
}
}
While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:
forloop:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
The pointer function:
forloop:
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

10 of 308 comments (clear)

  1. The eighties called... by Anonymous Coward · · Score: 3, Funny

    ...and they want their arrays, pointers and C back.

  2. Spelling and grammar and punctuation, oh my! by Roadkills-R-Us · · Score: 4, Funny

    I didn't see any errors in punctuation or grammar, either. I don't recall the last time I saw a post of that length which didn't confuse plural's (sic) with possessives.

  3. Use Java instead .... by icepick72 · · Score: 2, Funny

    ... and then you will no longer have to worry that it might be slow.

  4. Re:It optimizes out by LSD-OBS · · Score: 4, Funny

    Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree

    Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".

    Guess that about sums it up ;-)

    --
    Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
  5. Re:Why do you care? by aled · · Score: 4, Funny

    Both of those pieces of code happen so fast that it doesn't matter.

    Unless of course that someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.

    --

    "I think this line is mostly filler"
  6. Re:In summary... by Anonymous+Brave+Guy · · Score: 2, Funny
    If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.

    Not necessarily. You're ignoring the fact that modern processors use pipelining architectures, branch prediction, extensive caching often at several levels, and a whole host of other things that mean the total time required is not the sum of the individual times for each instruction.

    One of these days, someone will invent a program that can take some more abstract representation of what we want to do, and automatically generate optimised machine code from it on any given platform. Yeah, there's an idea... I can C it now!

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  7. Re:Effective cache use will be a better optimisati by functor0 · · Score: 2, Funny

    I have good story for this one. In my second co-op term, I was asked to improve the speed of the general 2D convolution image filter for a well-known commercial paint package. Nothing complicated, just run the given 2D kernel through the image.

    The previous writer of the function had carefully removed all the array accesses, only using pointers that were incremented by fixed constants as it proceeded through the code. He had also carefully maintained an array of the sums of the results as we moved from pixel to pixel to avoid re-calculating anything twice. You would think, it's pretty fast, eh?

    I came across two problems with the code:

    1. His array of sums was actually a queue and so he shifted the entire array by one element for every single pixel. Using a ring buffer instead quickly solved this one.

    2. I then came to realize that he was traversing the row-major image in columns. The cache coherency was being shot to hell because none of the cache-lines were being hit as it went from pixel to pixel. I rewrote the function to go by rows instead and guess what the speed improvement was? Something like 3 times!

    Go figure. Some people optimize and get it entirely wrong.

  8. Re:And C++/STL? by Arandir · · Score: 2, Funny

    I'm currently writing a piece of software to compete with a commercial proprietary offering. I'm using "bloated" STL and "bloated" C++ to manipulate "bloated" XML. Frankly, I'm shocked at its poor performance, and I might have to do some optimizations. It takes about a half of a second to load and process 200k worth of XML.

    ON THE OTHER HAND, the commercial proprietary alternative I'm competing against loads the equivalent data. But that data consumes 25Megs, and takes five seconds to load! You would think if they're going to use a proprietary data format, they could at least make one that works!

    I'm not going to apologize anymore for using "bloated" tools.

    --
    A Government Is a Body of People, Usually Notably Ungoverned
  9. Re:Why do you care? by glavenoid · · Score: 3, Funny

    Incedentally, I find some shades yellow to sound rather coarse, with harmonic triangles increasing with each beat inverse to the fundamental. Not as vulgar a sound as say, Taupe, but not as auarally pleasing to me as drips of purple. Now, Brown, on the other hand, has a rather discordant sound, much like a Dom. 7th, even if the actual interval is not such...

    --
    I, for one, am looking forward to the inevitable /. beta rollout fallout.
  10. Re:I echo the above statements by sleepingsquirrel · · Score: 2, Funny
    The following are equivalent:
    b[i++]

    *(b + i++)
    Yeah, but the fun doesn't really begin until you start writing code like...
    i++[b]