Arrays vs Pointers in C?

← Back to Stories (view on slashdot.org)

Posted by Cliff on Wednesday October 12, 2005 @10:42AM from the optimizations-may-not-be-portable dept.

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" "Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:

void reversestring_array(char *str) {
int head, tail; char temp; if (!str) return; tail = strlen(str) - 1; for (head = 0; head < tail; ++head, --tail) {
temp = str[tail]; str[tail] = str[head]; str[head] = temp;
}
}

The second function uses pointers:

void reversestring_pointer(char *str) {
char *phead, *ptail, temp; if (!str) return; ptail = str + strlen(str) - 1; for (phead = str; phead < ptail; ++phead,--ptail) {
temp = *ptail; *ptail = *phead; *phead = temp;
}
}

While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:

forloop: mov bl,byte ptr [esi+edx] mov al,byte ptr [ecx+edx] mov byte ptr [ecx+edx],bl mov byte ptr [esi+edx],al inc esi dec ecx cmp esi,ecx jl forloop

The pointer function:

forloop: mov bl,byte ptr [ecx] mov dl,byte ptr [eax] mov byte ptr [eax],bl mov byte ptr [ecx],dl inc ecx dec eax cmp ecx,eax jb forloop

While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

15 of 308 comments (clear)

Min score:

Reason:

Sort:

Why do you care? by Julian+Morrison · 2005-10-12 10:48 · Score: 4, Insightful

For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.
1. Re:Why do you care? by thegrassyknowl · 2005-10-12 12:25 · Score: 4, Interesting
  
  then don't wank around optimizing
  Dude, best use of the word wank. Ever!
  for single cycles on a machine that probably bleeds off a million cycles every time you raise a window
  Computers have become more powerful and programmers have become more lazy. It's not strictly true because instead of focussing a lot of time writing efficient code programmers are now focussing a lot of time writing a lot of code to fill bigger machines. That million cycles is wasted doing crap and probably half of doesn't need to be done anyway...
  I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.
  Now, with RAM and Gigahertz available for next to nix software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.
  
  --
  I drink to make other people interesting!
2. Re:Why do you care? by aled · 2005-10-12 12:30 · Score: 4, Funny
  
  Both of those pieces of code happen so fast that it doesn't matter.
  
  Unless of course that someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.
  
  --
  
  "I think this line is mostly filler"
3. Re:Why do you care? by JanneM · 2005-10-12 14:37 · Score: 5, Insightful
  
  For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?
  
  When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.
  
  If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.
  
  --
  Trust the Computer. The Computer is your friend.
4. Re:Why do you care? by NonSequor · 2005-10-12 19:10 · Score: 4, Insightful
  
  You don't understand the problem. A chemical reaction is only as fast as its slowest step. Catalyzing the other steps will not yield an improvement in reaction rate.
  
  For computer programs, you won't gain anything worthwhile by optimizing code that the computer only spends 1% of its time executing. That's not to say you should do a sloppy job, but you should focus on what matters. Microoptimization techniques (those techniques that involve choices of instructions and their orders rather than changing the algorithm that is used) typically yield very small gains. Microoptimization can yield substantial benefits when used properly in heavily used sections of code, but the time involved in trying to microoptimize everything could be better used to work on macrooptimizations or organizing the code to make it more amenable to later modification.
  
  There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead.
  
  --
  My only political goal is to see to it that no political party achieves its goals.
It optimizes out by klossner · 2005-10-12 11:12 · Score: 5, Interesting

An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction.
Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
1. Re:It optimizes out by LSD-OBS · 2005-10-12 12:23 · Score: 4, Funny
  
  Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree
  
  Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".
  
  Guess that about sums it up ;-)
  
  --
  Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
I echo the above statements by 21chrisp · 2005-10-12 11:13 · Score: 4, Insightful

These types of opimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors on the array option far outway any speed increase for the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.

IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
1. Re:I echo the above statements by NullProg · 2005-10-12 13:11 · Score: 4, Interesting
  
  These types of opimizations are virtually pointless on modern machines.
  I call bullshit. Optimizations are important regardless of the language or CPU.
  
  My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.
  
  It took over six hours on the dual processor system with the native .Net and Java XML parsers. And yes, the original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.
  
  My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.
  
  Time is money, especially when your trying to push down 20,000 price changes from the Mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus, the 'C' code gets it done under an hour on the dual CPU machine. And yes, I tested the inputs for overflows, security problems whatever, before we went into production. Theres a big difference between a programmer who knows a language vs a programmer who understands it.
  
  IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
  Thats your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic to the CPU? Don't you? I wouldn't want to work for you though. Your competitors will write a faster program that uses less memory and you will loose the contract/job.
  
  No flame intended,
  Enjoy.
  
  --
  It's just the normal noises in here.
Spelling and grammar and punctuation, oh my! by Roadkills-R-Us · 2005-10-12 11:42 · Score: 4, Funny

I didn't see any errors in punctuation or grammar, either. I don't recall the last time I saw a post of that length which didn't confuse plural's (sic) with possessives.
GCC experimental results by RML · 2005-10-12 12:25 · Score: 5, Interesting

Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version: L13: movzbl -1(%ebx), %edx movl %esi, %ecx decl %edi movl 8(%ebp), %eax movb %dl, -13(%ebp) movzbl -1(%esi,%eax), %edx movb %dl, -1(%ebx) decl %ebx movzbl -13(%ebp), %edx movb %dl, -1(%esi,%eax) incl %esi cmpl %ecx, %edi jg L13 The loop from the pointer version: L5: movzbl 1(%esi), %edx movl %esi, %ecx movzbl (%ebx), %eax movb %al, 1(%esi) decl %esi movb %dl, (%ebx) incl %ebx cmpl %ecx, %ebx jb L5 Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.

--
Human/Ranger/Zangband
And C++/STL? by XenonOfArcticus · 2005-10-12 12:27 · Score: 4, Informative

And what of STL container classes under C++?

Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.

If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs go for tight/cryptic. As well, the performance of the runtime platform (CPU, memory, bus) have bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out of order execution and other black juju might render it meaningless.

Finally, you have to consider the cost/benefit of the situtation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or image processor main loop, where it'll run a few quadrillion times. Other places in the same codebase though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.

Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.

I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.

That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.

When performance really counts, a whole new set of tools and processes come into play.

--
-- There is no truth. There is only Perception. To Percieve is to Exist.
Re:there is a difference... by Waffle+Iron · 2005-10-12 12:44 · Score: 4, Informative

what you in fact have is in the array 4 additional additions (between registers),
Actually, most x86s have a dedicated address generation units which handle those index additions in parallel on separate logic from the main ALU. So both cases would actually run at the same speed on a modern x86.
As Donald Knuth said... by jschmerge · 2005-10-12 13:28 · Score: 4, Insightful

Premature optimization is the root of all evil.

I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bonafide array...

In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:

(ptr + index)->dataMember

is less readable than this:

ptr[index].dataMember
From the Compiler Trenches by Anonymous Coward · 2005-10-12 17:38 · Score: 5, Informative
I develop highly optimizing compilers for a living, so use that to put in context what I'm going to say. Things look very different from the other side of the source file.
I'll admit that I'm always slightly bemused by these sorts of discussions. That bemusement quickly turns to disiniterest after I realize that a lot of people are burning a lot of cycles arguing about it.
Quite frankly, your example has little relevance in the real world. The optimization you are talking about is covered by strength reduction, as others have pointed out. But that's not the point of my message. This sort of piddly optimization means almost nothing when one looks at the whole application. If this piece of code is in an inner loop that takes 90% of the application time and it proves to be the bottleneck, then one can think about taking a closer look.
We have customers that come to us all the time with just such examples. They literally tell us, "You missed an opportunity to use a register here," and we know it's important because we know the customers are serious about profiling code and finding bottlenecks. So when they come to us we are happy to look at the "piddly stuff."
I've seen all kinds of different code. They basically fall into two categories: code that the compiler can do something significant with and code the compiler has no hope of automatically optimizing in any truly meaningful way. When I say "significant" and "meaningful," I explicitly mean "not Dragon Book stuff, except for scheduling (which can provide a significant performance win)" Simple optimizations like common subexpression elimination and copy propagation are more useful at enabling other optimizations than in any cycles gained in their own right. They are important, but not to make the code run significantly faster in and of themselves.
If one writes an application that does a lot of branching and pointer chasing (say, like, a traditional compiler*) then there's not much an optimizer can do with it. The aliasing difficulties alone will kill most optimizations. It's more important to write these kinds of applications in an understandable way because that is where the programmer time is most costly.
That said, judicious use of directives for compilers that support them can go a long way toward making these kinds of codes run really fast. Think of threading a tree search, for example. But the compiler is not going to have much hope of converting such a low-level piece of code without help.
An example of code that a compiler can do really, really well on are the traditional scientific applications. Here parallelism is everything, be it data-level, thread-level or at the distributed memory level (for really big machines). In these cases the loop optimizations are orders of magnitude more important speed-wise than sequential optimization (though, as I said, sequential optimization can enable some of these loop transformations). Some of the more important loop restructuring transformations are:
- Interchange
- Unroll
- Fusion
- Fission
- Unroll-and-Jam
- Invariant hoisting (both data and control)
- Vectorization
- Cache Blocking
When the compiler is done with these, one hardly recognizes the code anymore. :)
In my experience, the fundamental problem is that compilers are really hard to understand. People argue about what a compiler can and cannot do because they are enormously complex systems that require arcane knowledge of language standards and hardware architecture to really dig into. It's slightly less difficult to understand the broad strokes, for example, simple cases of vectorizable loops. It's a lot more difficult to understand how to parallelize a loop that compresses a sparse array into a dense one.
If there's one lesson that I like to convey to programmers, it's to not sweat the optimization details. Don't hand-optimize the code. I can tell you fro