Arrays vs Pointers in C?

← Back to Stories (view on slashdot.org)

Posted by Cliff on Wednesday October 12, 2005 @10:42AM from the optimizations-may-not-be-portable dept.

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" "Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:

void reversestring_array(char *str) {
int head, tail; char temp; if (!str) return; tail = strlen(str) - 1; for (head = 0; head < tail; ++head, --tail) {
temp = str[tail]; str[tail] = str[head]; str[head] = temp;
}
}

The second function uses pointers:

void reversestring_pointer(char *str) {
char *phead, *ptail, temp; if (!str) return; ptail = str + strlen(str) - 1; for (phead = str; phead < ptail; ++phead,--ptail) {
temp = *ptail; *ptail = *phead; *phead = temp;
}
}

While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:

forloop: mov bl,byte ptr [esi+edx] mov al,byte ptr [ecx+edx] mov byte ptr [ecx+edx],bl mov byte ptr [esi+edx],al inc esi dec ecx cmp esi,ecx jl forloop

The pointer function:

forloop: mov bl,byte ptr [ecx] mov dl,byte ptr [eax] mov byte ptr [eax],bl mov byte ptr [ecx],dl inc ecx dec eax cmp ecx,eax jb forloop

While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

22 of 308 comments (clear)

Min score:

Reason:

Sort:

Why do you care? by Julian+Morrison · 2005-10-12 10:48 · Score: 4, Insightful

For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.
1. Re:Why do you care? by JanneM · 2005-10-12 14:37 · Score: 5, Insightful
  
  For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?
  
  When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.
  
  If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.
  
  --
  Trust the Computer. The Computer is your friend.
2. Re:Why do you care? by NonSequor · 2005-10-12 19:10 · Score: 4, Insightful
  
  You don't understand the problem. A chemical reaction is only as fast as its slowest step. Catalyzing the other steps will not yield an improvement in reaction rate.
  
  For computer programs, you won't gain anything worthwhile by optimizing code that the computer only spends 1% of its time executing. That's not to say you should do a sloppy job, but you should focus on what matters. Microoptimization techniques (those techniques that involve choices of instructions and their orders rather than changing the algorithm that is used) typically yield very small gains. Microoptimization can yield substantial benefits when used properly in heavily used sections of code, but the time involved in trying to microoptimize everything could be better used to work on macrooptimizations or organizing the code to make it more amenable to later modification.
  
  There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead.
  
  --
  My only political goal is to see to it that no political party achieves its goals.
3. Re:Why do you care? by psavo · 2005-10-13 01:19 · Score: 2, Insightful
  
  Faster hardware just makes minor inefficiencies less noticible, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!
  None of the discussion I've seen so far has touched the real problem, not the wanking on the micro-optimisation (these compiler optimizations are all O(n) in gain) but that so few have any sort of a clue on what algorithms to use and when.
  I'm not saying that it's bad to optimize those sort of things when they pop up at your most-time-spent chart, but that they may simply go away (along with 90% of other runtime) when you really think of the algorithm.
  
  --
  fucktard is a tenderhearted description
4. Re:Why do you care? by Kosgrove · 2005-10-14 05:00 · Score: 2, Insightful
  
  It has nothing to do with programmers being lazy. I'd much rather work smarter using higher-level tools and get a lot more done. It has everything to do with this simple fact, which wasn't (as) true back in the old days:
  
  Hardware is cheap. People are expensive.
  
  Think about it. A desktop with current hardware costs under $1000 these days. Lower-end servers run about the same amount. Compare that to how much you cost your employer per hour. How does it make economic sense to (most - don't you embedded and low-power computing people get all up in my grill) optimize software at the low level you're suggesting? Most software for the end user spends most of its time waiting for user input or doing network operations, not reversing strings.
  
  If you want to optimize something, optimize the architecture. Pool database connections, reduce network traffic, change object relationships to make them more efficient, but for god's sake don't waste your time reinventing low-level functions that have been done written and more importantly, DEBUGGED to the nth degree.
  
  I strongly disagree that we should be encouraging people to think about optimizing the optimizied wheel. Spend your time thinking about bigger and better software problems.
Hundreds of possibilities by topham · 2005-10-12 10:58 · Score: 2, Insightful

This question sounds to me like discussions I had ages ago with other programmers, and it was always 1 programmer trying to justify his method of coding a routine over some other, equally valid method.

Pointer arithmetic and array's in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.

I never really saw the point in arguing over array and pointers in C. When I programmed in C I used both. I typically used Arrays if I wanted to code to be obvious and straight forward; if I wanted to do somehting else with the index, etc. I used points if I wanted speed above all else.

If I were to write to any modern PC (PPC970, Pentium IV, Athlon, etc...) I wouldn't worry about speed. The algorithm for more complicated functions than shuffling a few bytes around will dictate the speed.
WOW! by Anonymous Coward · 2005-10-12 11:04 · Score: 2, Insightful

Now THIS is the kind of discussion that should be going on at Slashdot!
is this really your bottleneck? by eyal · 2005-10-12 11:05 · Score: 2, Insightful

When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?
One can always check the resulting assembly code, if one is so concerned about this.
Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)
I echo the above statements by 21chrisp · 2005-10-12 11:13 · Score: 4, Insightful

These types of opimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors on the array option far outway any speed increase for the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.

IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
As Donald Knuth said... by jschmerge · 2005-10-12 13:28 · Score: 4, Insightful

Premature optimization is the root of all evil.

I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bonafide array...

In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:

(ptr + index)->dataMember

is less readable than this:

ptr[index].dataMember
Pointers are for programmers silly rabbit!! by Da_Weasel · 2005-10-12 13:36 · Score: 2, Insightful

Pointers are for programmers, not for speed. A particular compilers implementation of the C specification may produce better or more efficient code when using pointers, but i seriously doubt that every implementation would work the same or even across different architectures.

--
If you must!
Re:GCC experimental results by UOZaphod · 2005-10-12 13:38 · Score: 2, Insightful

What is used to compile the Linux Kernel?

--
"The unicode stuff in the latest version is working fabulously well. My russian mafia friends are ecstatic."
Re:Unfortunately, the 2000s answered... by Anonymous Coward · 2005-10-12 13:42 · Score: 1, Insightful

all these new-fangled alternatives suck

Right, that's why I use Common Lisp.
Thinking like this by phorm · 2005-10-12 15:48 · Score: 2, Insightful

Is why we have big, ugly, bloated programs that require overpowered CPUs. First of all, it depends on what the application is being coded for. Perhaps it's actually intended for slower machines as well as faster... not everything is made to run on a P4-4Ghz machine you know.

Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, on an operation perhaps requiring ready output, the output might even be visible delayed on a faster machine.

Pointers can be a pain in the ass and I definately agree that I find them annoying at times, but they also have their place and you don't always have the option of sacrificing beauty for functionality. If the code is a bit confusing, well that's what comments are for.
You shouldn't make any assumptions ... by bigsteve@dstc · 2005-10-12 16:08 · Score: 2, Insightful
I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?
Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)
In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make 5-10% difference in a tight loop, but probably much less across a large application. If the difference performance is significant, you will get more benefit for your effort by:
- finding better C compilers, or
- using a profiler to find the real hotspots in the code and hand-optimizing them for the hardware platforms that you really care about.
As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.
Vectorisation by Heretik · 2005-10-12 16:13 · Score: 2, Insightful

Maybe not relevant in this case since you're working with strings, but with vectorization being so important to performance on most modern architechtures, if you were dealing with floats the pointer one might actually be slower because it's much harder for the compiler to figure out if (and how) it can vectorize it.

I'm not sure about various compilers and what they do in this case, but following the progress of GCC4's vectorisation, it looks much more likely that the pointer case is passed over by the vectorizer and ends up being (way) slower than the easy to vectorise array index version.

Like I said, not sure what the actual situation in practise is, but it's worth looking into. The difference between vectorized SSE code and plain old x86 code (for example) is WAY greater than the trivial insignificant difference between the two examples you posted.
ASSume by larry+bagina · 2005-10-12 16:16 · Score: 3, Insightful

My argument is that one cannot assume
You're right, however, you're also assuming that your pointer arithmetic is faster.
Consider a 16-bit architecture with 32-bit pointers.
Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.
So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. if you prefer arrays, fine. But if you're so concerned with the speed, you'd be doing it in assembly.

--
Do you even lift?
These aren't the 'roids you're looking for.
Benchmarking by Threni · 2005-10-12 19:32 · Score: 2, Insightful

If you're talking about optimisation then the only thing to do is to check different strategies. You can't just assume type a is a better coding method than type b. Even if you could prove it were true for every single compiler on the market, one could come along tomorrow and do things differently.
Re:GCC experimental results by Deorus · 2005-10-12 21:18 · Score: 2, Insightful

>> no-one uses it when native alternatives are available anyway

Aren't hundreds of Linux distributions out there enough to prove that assumtion wrong? Gcc is used regardless of its speed because it's free.
One of the secrets of the master programmers by rfisher · 2005-10-13 02:18 · Score: 2, Insightful

Master programmers know that it doesn't matter. We:

1. Write the code in the form that is clearest. (So that the maintainer--even if it is ourself--can figure out what it is doing quickly when debugging it later.)

2. If AND ONLY IF performance is a concern and if AND ONLY IF this code has been shown to be a bottleneck, do we consider optimizing. We then try several different implementations and then profile them to choose the winner.
Re:From the Compiler Trenches by Anonymous Coward · 2005-10-13 03:59 · Score: 2, Insightful

[Apologies for the code formatting <ecode> doesn't seem to work (in preview mode, anyway).]
Hand-optimizing a piece of code without first making sure it's important via profiling and also looking at what the compiler is acutally doing is not avoiding laziness. It's simply a waste of time that prevents one from getting real work done.
While it's true that our compiler technology continues to improve, there are some cases where hand-optimization puts the cocde in a state that no compiler will ever have a hope of repairing. Here's a simple example (contrived for simpler explanation, but not very far from what one sees in real-world code):
int global[BIGNUM];
int *p = global; /* "Optimize" address arithmetic */

void foo()
{
int x[BIGNUM];
int y[BIGNUM];

initialize(x, y); /* In another compilation unit, not inlined */

for(i = 1, i < BIGNUM; ++i) { /* x[0], y[0] hold "special" info */
for(j = 1; j < BIGNUM; ++j) {
x[i] += y[j]; /* Reduce y into each element of x */
}
*p++ += x[i]; /* Add x to global */
}
}

Note that to the human eye, the outer loop looks parallel. There are no cross-iteration dependencies. It's a straight vector copy loop, very fast on architectures that support it like MMX, SSE, Altivec, etc.
We've been "clever" and converted the indexing off of global into pointer accesses. In addition, we've made the pointer global because we know this function is the innermost loop and it will be called millions of times. We definitely want to avoid those millions of assignments.
Or do we? Let's take a look at what we've done. When we call initialize to fill in x and y, we don't know ewhat happens to p. Coming out of that call, p could point to anything. More specifically, it could point to x. Even more specifically, it could point to x[0]. This creates a recurrence meaning that there is now a possible cross-iteration dependency in the outer loop.
Oops. It's no longer parallel.
Well, that's not strictly true. It still is functionally parallel but good luck convincing the compiler of that without help from directives. Strictly speaking, recurrences can be parallelized but even if this "optimized" code were parallelized run a heck of a lot slower than the straight vector copy.
We didn't even hand "optimize" very aggressively. I've seen cases where programmers collapse multiple levels of loops accessing a multi-dimensional array and in the process completely destroy any information the compiler might have had about the iteration pattern. The end result is that loops that would have parallelized don't anymore.
With parallelization we are talking anywhere from 2x to 50x speedup. Don't trade that for the .1% you'll get from converting an array access into pointer arithmetic.
Again, the problem is that it is hard for humans to understand the purely mechanical process of the compiler. We "know" information (that initialize doesn't touch p) that the compiler never sees. Oh sure, there are some compilers out there that do whole program analysis and might be able to uncover what's going on here but the compile time on these systems is exponential.
WRONG ! by oldCoder · 2005-10-14 02:16 · Score: 2, Insightful

The increased readability and lower likelihood of programming errors on the array option...

The array version is neither clearer nor less error-prone. And I dare you to prove differently!
The pointer version is more portable across the chasm of compiler quality. That is, it will run well even on poorer compilers.
The question of clearer is a question of cognitive psychology, not computer science, and requires experimentation to validate. The experiment would be confounded by the existing prejudice against pointers and the habit of CS profs of teaching subscripts over pointers.
N.B.: There are problems with C pointers, but that comes up when using pointers to data structures, not pointers to bytes. Dangling pointers, free/malloc problems, and all that. Using subscripts helps very little, however.
It is not true that the optimization is nearly pointless, it is that the difference in performance is so small as to make the decision nearly pointless. No pun intended. In some situations, the difference may be very important. The pointer version is no harder to code, read, or debug, and might possibly give you back benefits.
What I'm trying to say here is that the pointer version is not an optimization, it's an alternative. Optimizations require more work than the vanilla version. For string/character loops, there is no more work in coding the pointer version.

--

I18N == Intergalacticization