Arrays vs Pointers in C?
UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"
"Here are my own thoughts on the issue:
For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.
The first function uses array indexing:
Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.
My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...
The array function:
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.
The first function uses array indexing:
void reversestring_array(char *str)The second function uses pointers:
{int head, tail;}
char temp;
if (!str) return;
tail = strlen(str) - 1;
for (head = 0; head < tail; ++head, --tail)
{temp = str[tail];}
str[tail] = str[head];
str[head] = temp;
void reversestring_pointer(char *str)While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.
{char *phead, *ptail, temp;}
if (!str) return;
ptail = str + strlen(str) - 1;
for (phead = str; phead < ptail; ++phead,--ptail)
{temp = *ptail;}
*ptail = *phead;
*phead = temp;
Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.
My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...
The array function:
forloop:The pointer function:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
forloop:While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?
If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.
Oops, messed up the tag at the top, and it ate the quote portion:
I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references
Preview! Gah.
Stop hunting for zebras. There aren't any.
This question sounds to me like discussions I had ages ago with other programmers, and it was always 1 programmer trying to justify his method of coding a routine over some other, equally valid method.
Pointer arithmetic and array's in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.
I never really saw the point in arguing over array and pointers in C. When I programmed in C I used both. I typically used Arrays if I wanted to code to be obvious and straight forward; if I wanted to do somehting else with the index, etc. I used points if I wanted speed above all else.
If I were to write to any modern PC (PPC970, Pentium IV, Athlon, etc...) I wouldn't worry about speed. The algorithm for more complicated functions than shuffling a few bytes around will dictate the speed.
Such as comp.lang.c.moderated
Do the kids still know what USENET is?
(XNews is still the best newsreader BTW...)
my sig could kick your sig's arse...
Now THIS is the kind of discussion that should be going on at Slashdot!
One of the mantras hammered into my head during my freshman CS class was:
"The name of an array is a pointer to its 0th element."
Another favorite was:
"Never dereference a null pointer."
From what I remember, arrays and pointers are interchangeable. I've forgotten the C syntax, but I believe that if you have int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4)). E.g. they're both references to specific locations in memory. I don't know about differences across architectures, but I'd imagine if that didn't hold true on a particular system then that system wouldn't really be implementing C correctly. Then again, I'm not much for languages.
rooooar
One can always check the resulting assembly code, if one is so concerned about this.
Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)
There is nothing to beat xvnews - particularly the icon that shows the economy collapsing whenever there is anything new to read.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
...and they want their arrays, pointers and C back.
In my experience, which is somewhat dated (by about 5 years), what you state is generally true for simple loops.
However, for more complicated loops the compiler fails to opimize the loop very well. My 'experience' was in writing an alogirthm which took a raw camera sensor ccd data and ordered it into a proper bayer pattern. When moving from indexer based access to pointer based access I saw a significant increase in performance.
Even basing one pointer off of another was slower than maintaining separate pointers and manipulating them via arithmatic.
The loop looked pretty nasty when it was done (the loop also mixed several other stages of the image processing pipe at once, such as application of color correction & calibration data, dead pixel correction, etc; probably had something like 30 pointers that had to be maintained), but it was quick.
I ended up taking following the same approach converting the raw ccd data to a live preview and code which generated a thumnail preview, obtaining the same kind of gains.
When I was done with the thumnail code, it was capable of generating 108 fully processed thumnails (color 768x512 8bpc image) per second on a 700mhz PIII from a 6 megapixel source image (3072x2048, 16bpc bayer image). I was proud of that at the time...
Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
These types of opimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors on the array option far outway any speed increase for the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.
IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
A[0] is a pointer to a byte in memory. A[1] is a pointer to that byte + 1 * (size of an element of that type) in memory. So shouldnt this be irrelivent and both take equal time.
Either way, both running times for assesing memory is in n time, so it doesnt matter.
In consequence, if you want a predictable program on ANY compiler, pointer operations are probably a better bet than arrays, regardless of how well compilers handle those arrays.
If you want something of uniform reliability, rather than uniform performance, you're better off with arrays. Arrays are easier to bounds-check in source form, errors are generally easier to spot, etc.
If you want something that is FAST, then you're probably not wanting to use either. For example, if you've a sequence of characters, and want to copy them, then copying them a character at a time - regardless of how - is slow and inefficient on a 32-bit or 64-bit machine. Cast the data onto the largest unit the CPU can pull out of memory in one operation, and operate in bulk/parallel.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Being pedantic perhaps, but you might get a buffer overflow in your code if for some reason it doesn't have a NULL at the end.
Better would be to call the subroutine with (char* str, size_t length) and to replace strlen(str) with str+length.
The subroutine then becomes useful more often too - you can then reverse portions of strings as well as the entire string if you want to.
There are quite a few compilers out there for embedded CPU's and DSP's that are still in the stone age. In embedded programming it is not safe to assume that a compiler will contain any optimization and to write C code that is known to produce faster assembly code.
Thats just a fact of life for people in the trenches.
I suspect that for OS level compilers (GCC, VS) that the opposite is true and you could assume a decent level of optimization is available.
Gregor
{ }
That'll teach ya.
This is an interesting post. Much better than all the politics etc. that we've seen recently on ./
I can't think of any modern architectures that don't support register indexing. Can anyone else?
On another note, although at first glance it may appear that the poster is correct I think that a good compiler written for an architecture that didn't support register indexing (M68000 does, so it's going to have to be something very old) would probably be able to figure out what's going on in this code sample. i.e. in the array version within the for loop head and tail are only used as indexes into the array, so converting them into straight pointers would be trivial. Of course in a more complex example YMMV.
if you look closely at these two portions of the assembly, the resulting code is not the same at all, what you in fact have is in the array 4 additional additions (between registers), which on some architecures can be a non-trivial amount of time (though typically not significant, the memory access is usually longer). in this case the pointer function should be at the least slightly faster than the array one. though as everyone else has said, make sure this is truely a bottleneck of your program before you go doing useless optimizations.
P.S.
Also on other architecures (non-x86) this could be a trivial amount of time if they were designed to handle this specifically (two registers added together as a base and offset basically) for memory access.
Oh my god! You said piqued Not peaked, or piked, or any other dumb thing that editors let slip by in mainstream publications, but piqued! I love you
Too much repetition my too much repetition!
The whole point is dumb.
Every C programmer should know the answer to the following question:
What is 6["abcdefghijkl"]?
Answer: 'g'.
How is this determined?
By definition x[y]=*(x+y)=y[x].
Don't believe me, check the standard.
( Yeah this is a degenerate case, like Duff's device. Still a compiler has to support it. )
Firstly - who really cares? This kind of optimisation almost never occurs as far as I can tell. Secondly, counting CPU instructions hasn't been accurate for years. Thirdly, for any kind of memory access loop, cache misses will most likely be your biggest performance hit. The best way to actually improve this loop is probably going to be by...
1) if the length of the data is 4k, just do the algorithm
2) otherwise, stride through the data in page-size increments reading 1 32bit word, then do the algorithm
But seriously - either you are working in a system so small and slow that you are already an expert in this stuff, or this kind of optimisation doesn't matter.
Choose quicksort instead of bubble sort.
Damnit - I wanted my nick to be "WouldIPutMYRealNameOnSlashdot"
always assume anything you want. It won't make it correct though.
There are still tons of smaller, slower, "antiquated" processors embedded in all sorts of things, for which code gets written. The Z80 is still alive and well, believe it or not, along with lots of other, old standards (and newer, but still small and slow, CPUS).
I didn't see any errors in punctuation or grammar, either. I don't recall the last time I saw a post of that length which didn't confuse plural's (sic) with possessives.
IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
Obviously you have never worked with low-power architectures or non-mainstream compilers. While this particular example may well be too simple to justify using pointer arithmetic in any environment, in a more complex example that type of thing may well be justified (espically on more exotic architectures).
Just expecting compilers to do a good job is no substitute for being able to ensure that they do and being able to do a good job for them if they don't.
I've been reading some really good arguments in these posts about how it'll work. And the bulk of them end up saying, "any good compiler will/should"...
In other words, "No, you cannot."
Your best bet is to do (almost) what you did - compile a test case per platform, check the code - and then the part that you neglected, you need to count the clock ticks per instruction. To reinforce what one poster said - there can be a large difference between [ecx] and [ecx+edx], conceptually speaking. If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.
help me i've cloned myself and can't remember which one I am
int array[8];
int *int_ptr = malloc(8 * sizeof(int));
char string[] = "Hello world!";
char *char_ptr = "Hello world!";
sizeof(array) == 8 * sizeof(int);
sizeof(int_ptr) == sizeof(int *);
You can get int_ptr to point to another location in memory (such as doing ++int_ptr); the similar statement (++array) is illegal.
I think that altering individual characters of char_ptr is undefined (as it would result in changing the value of the literal string), while (as above) string cannot be pointed to a new literal string.
... my experience has been that this matters more in the multidimensional array case than in the single dimensional array case (for those who are curious: my goal is to write algorithms which do non-trivial amounts of processing on VGA or larger video at full frame rates [>= 15Hz], any time i can make array operations faster my entire program benefits significantly). when dealing with two dimensional arrays, you can either do the addressing yourself (location [i,j] in a r x c matrix maps to [i*c+j] in a flat array). if you are clever about how you explain your indexing to the compiler, it should realize that you're passing through consecutive addresses in memory and generate code accordingly. if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.
have you tried this with intel's c/c++ compiler or other compilers? i'd be curious to see if what you're seeing is a result of how gcc is limited in the number of optimizations it can apply directly to the parse tree because it can't assume (at that stage) a particular target machine.
... and then you will no longer have to worry that it might be slow.
Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version:
L13:
movzbl -1(%ebx), %edx
movl %esi, %ecx
decl %edi
movl 8(%ebp), %eax
movb %dl, -13(%ebp)
movzbl -1(%esi,%eax), %edx
movb %dl, -1(%ebx)
decl %ebx
movzbl -13(%ebp), %edx
movb %dl, -1(%esi,%eax)
incl %esi
cmpl %ecx, %edi
jg L13
The loop from the pointer version:
L5:
movzbl 1(%esi), %edx
movl %esi, %ecx
movzbl (%ebx), %eax
movb %al, 1(%esi)
decl %esi
movb %dl, (%ebx)
incl %ebx
cmpl %ecx, %ebx
jb L5
Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s
So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.
Human/Ranger/Zangband
And what of STL container classes under C++?
Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.
If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs go for tight/cryptic. As well, the performance of the runtime platform (CPU, memory, bus) have bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out of order execution and other black juju might render it meaningless.
Finally, you have to consider the cost/benefit of the situtation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or image processor main loop, where it'll run a few quadrillion times. Other places in the same codebase though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.
Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.
I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.
That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.
When performance really counts, a whole new set of tools and processes come into play.
-- There is no truth. There is only Perception. To Percieve is to Exist.
If you use arrays the compiler these days will ocnvert to pointer arithmetic. However it also now knows that it is an array and can make assumptions not available to it before allowing more code order optimizations in loops.
These days if its something stupid like using one syntax over another, that optimization has been built into the compiler. The only real speed enhancements are going to be algorithmic. You know the old thigns that make sense, take stuff out of loops that dont need to be inside, compiler will do that so dont even bother making the change unless it increases readability. blah blah blah...
If you answer this question, you fuel a discussion that should be constrained to a very very small domain of C development. Otherwise, you're going to see needless flag waving about coding standards in domains where it doesn't need to exist. Don't feed the dirty C programmer (we like other info, really, we do).
and frankly, the TFA depends on architecture, where most modern compilers make both snippets equal.
Your array vs pointer question is interesting from a navel-gazing perspective, but of no importance in practice. In practice, the overriding concern is readability and correctness. Performance is only important if benchmarking shows that the routine is a bottleneck.
In fact, your pointer example has a subtle bug: if strlen(str) == 0 then you can have ptail = str - 1 which is strictly illegal (you cannot point to a location outside of a memory object except for exactly one element past the end, which is then illegal to dereference). I know that no modern systems would fail with this code, but I expect it would screw up on some memory models with old MS-DOS C compilers.
In short, you should always write the simplest, most readable code, then optimise later if there is a problem. Both array indexing and pointer versions can be written well, so there is no reason to prefer either. By the way, I would mark you down for using !ptr when ptr == NULL is the more readable idiom.
Of course, because a feature such as "edit" is only for wimps...
The unix philosophy lives on: if the system is cumbersome and user unfriendly, it's the user's fault!
...and said they can't have them because all these new-fangled alternatives suck when you've got serious work to do.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Yeah that's what I figured, use a good algorithm (e.g. N vs N^2 ) and as long as the processing loop fits in cache you'll be fine.
And if you can't find/figure out a good algorithm yet, don't do really strange stuff. That way someone later can more easily figure out how to fix your code.
Premature optimization is the root of all evil.
I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bonafide array...
In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:
is less readable than this:
Pointers are for programmers, not for speed. A particular compilers implementation of the C specification may produce better or more efficient code when using pointers, but i seriously doubt that every implementation would work the same or even across different architectures.
If you must!
I think that people making idiots out of themsevles is much more preferable to trolls editing their +5 moderated posts into Goatse ascii art.
I have good story for this one. In my second co-op term, I was asked to improve the speed of the general 2D convolution image filter for a well-known commercial paint package. Nothing complicated, just run the given 2D kernel through the image.
The previous writer of the function had carefully removed all the array accesses, only using pointers that were incremented by fixed constants as it proceeded through the code. He had also carefully maintained an array of the sums of the results as we moved from pixel to pixel to avoid re-calculating anything twice. You would think, it's pretty fast, eh?
I came across two problems with the code:
1. His array of sums was actually a queue and so he shifted the entire array by one element for every single pixel. Using a ring buffer instead quickly solved this one.
2. I then came to realize that he was traversing the row-major image in columns. The cache coherency was being shot to hell because none of the cache-lines were being hit as it went from pixel to pixel. I rewrote the function to go by rows instead and guess what the speed improvement was? Something like 3 times!
Go figure. Some people optimize and get it entirely wrong.
There is no edit function on Slashdot because down that road lies edit wars and other problems. (Specifically there are issues with editing + moderation as the mod should reflect the original article and not the edit.)
Sure it would be nice with a "allow edit withing 3 minutes" function which can be found on WWW boards. OTOH Slashdot generates several times the traffic as other sites with webboards.
or how about an "editing undoes the moderation" feature, just as posting to a thread one has moderated undoes any moderation one might have done therein....
Is why we have big, ugly, bloated programs that require overpowered CPUs. First of all, it depends on what the application is being coded for. Perhaps it's actually intended for slower machines as well as faster... not everything is made to run on a P4-4Ghz machine you know.
Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, on an operation perhaps requiring ready output, the output might even be visible delayed on a faster machine.
Pointers can be a pain in the ass and I definately agree that I find them annoying at times, but they also have their place and you don't always have the option of sacrificing beauty for functionality. If the code is a bit confusing, well that's what comments are for.
This one recently came and bit me on the ass.
/* one of m1, m2 will be at an odd address */
Pointers and arrays are NOT interchangeable!
Consider the data:
char m1[7] = { 1, 2, 3, 4, 5, 6, 7 };
char m2[7] = { 8, 9, 10, 11, 12, 13, 14 };
struct z { u_int16_t a; u_int16_t b[2]; }__attribute__((packed));
Then,
printf("m1: %04x ?= %04x\n",((struct z *)m1)->b[0],*((struct z *)m1)->b);
printf("m2: %04x ?= %04x\n",((struct z *)m2)->b[0],*((struct z *)m2)->b);
According to section 3.2.2.1, prefix * and postfix [0] are identical.
However, according to 3.3.4, the affect of the pointer conversion may yield an invalid pointer, due to the (potentially) stricter memory alignment of the new type.
Bill
Real Usenet junkies use trn from a command line shell.
For the most part, you can leave it to the compiler. The optimizations are platform-specific, so for general portable code, don't worry too much.
All the optimizations considered here are examples of "strength reduction" in loops. There's a lot of existing literature that you can find with that term.
The part that really matters is the multiplication of the index by the element size, something that goes away in your character-string example. Many processors can compute (a + b * k) in a single instruction or addressing mode for some power-of-two values k. For the x86, k = 1,2,4 or 8 have special hardware support.
If the element size k is not one of these special values, then you have to use
a longer instruction sequence, and the idea of pre-scaling your index (b) by k is attractive. So rather than ++b and accessing a+b*k, increment b += k and access a+b.
Whether this is beneficial depends on the number of different factors k to be used. If you are accessing a1[b] and a2[b], and the two arrays (a1 and a2) have different element sizes, then you need to keep track of two values b1 and b2 (unless k2 = 2*k1 or something convenient like that). This can add up to more work than the multiply in some cases. Or the register usage can get out of hand.
It also matters whether b is incremented simply, or in a more complex pattern. If it's not ++b but b += function(), then the pre-multiplied version is not b += k but
b += k * function(), and you have to ask if you're saving anything. You may be able to cheaply push the multiplication by k into function(), or maybe not.
Then there's the question of whether the array access is performed once per loop or not. If a[b] is only accessed every now and then, perhaps it's better to do the multiply in the rare case it's needed, gaining in exchange simple update of b.
If you're using the array index for loop control, as is common, you also have to adapt the loop control to consider the pre-multiplication. If the ending point of the loop is fixed ("for (i = 0; i 10; i++)"), it's fairly easy to pre-multiply it, but there are hairier cases.
In any case, if you do end up pre-multiplying b by the array element size, you then have a chance at a second optimization, which is reducing the "a+b" index addition, and just using a pointer p = a, followed by ++p in the loop.
The problem here is that you need a pointer per array being indexed, even if the arrays have the same element sizes. (Which again increases register pressure.)
In your example, the multiplication is trivial, so the first optimization is a no-op. Each index is used to access only one array, so pre-adding the array base is worth it. And since it's the same array, checking when the indexes cross stays just the same: "i j" is the same as "p+i p+j". Thus, you can expect the pointer form to be slightly faster.
Nah, they use GNUS
Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)
In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make 5-10% difference in a tight loop, but probably much less across a large application. If the difference performance is significant, you will get more benefit for your effort by:
As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.
Maybe not relevant in this case since you're working with strings, but with vectorization being so important to performance on most modern architechtures, if you were dealing with floats the pointer one might actually be slower because it's much harder for the compiler to figure out if (and how) it can vectorize it.
I'm not sure about various compilers and what they do in this case, but following the progress of GCC4's vectorisation, it looks much more likely that the pointer case is passed over by the vectorizer and ends up being (way) slower than the easy to vectorise array index version.
Like I said, not sure what the actual situation in practise is, but it's worth looking into. The difference between vectorized SSE code and plain old x86 code (for example) is WAY greater than the trivial insignificant difference between the two examples you posted.
You're right, however, you're also assuming that your pointer arithmetic is faster.
Consider a 16-bit architecture with 32-bit pointers.
Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.
So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. if you prefer arrays, fine. But if you're so concerned with the speed, you'd be doing it in assembly.
Do you even lift?
These aren't the 'roids you're looking for.
I'll admit that I'm always slightly bemused by these sorts of discussions. That bemusement quickly turns to disiniterest after I realize that a lot of people are burning a lot of cycles arguing about it.
Quite frankly, your example has little relevance in the real world. The optimization you are talking about is covered by strength reduction, as others have pointed out. But that's not the point of my message. This sort of piddly optimization means almost nothing when one looks at the whole application. If this piece of code is in an inner loop that takes 90% of the application time and it proves to be the bottleneck, then one can think about taking a closer look.
We have customers that come to us all the time with just such examples. They literally tell us, "You missed an opportunity to use a register here," and we know it's important because we know the customers are serious about profiling code and finding bottlenecks. So when they come to us we are happy to look at the "piddly stuff."
I've seen all kinds of different code. They basically fall into two categories: code that the compiler can do something significant with and code the compiler has no hope of automatically optimizing in any truly meaningful way. When I say "significant" and "meaningful," I explicitly mean "not Dragon Book stuff, except for scheduling (which can provide a significant performance win)" Simple optimizations like common subexpression elimination and copy propagation are more useful at enabling other optimizations than in any cycles gained in their own right. They are important, but not to make the code run significantly faster in and of themselves.
If one writes an application that does a lot of branching and pointer chasing (say, like, a traditional compiler*) then there's not much an optimizer can do with it. The aliasing difficulties alone will kill most optimizations. It's more important to write these kinds of applications in an understandable way because that is where the programmer time is most costly.
That said, judicious use of directives for compilers that support them can go a long way toward making these kinds of codes run really fast. Think of threading a tree search, for example. But the compiler is not going to have much hope of converting such a low-level piece of code without help.
An example of code that a compiler can do really, really well on are the traditional scientific applications. Here parallelism is everything, be it data-level, thread-level or at the distributed memory level (for really big machines). In these cases the loop optimizations are orders of magnitude more important speed-wise than sequential optimization (though, as I said, sequential optimization can enable some of these loop transformations). Some of the more important loop restructuring transformations are:
When the compiler is done with these, one hardly recognizes the code anymore. :)
In my experience, the fundamental problem is that compilers are really hard to understand. People argue about what a compiler can and cannot do because they are enormously complex systems that require arcane knowledge of language standards and hardware architecture to really dig into. It's slightly less difficult to understand the broad strokes, for example, simple cases of vectorizable loops. It's a lot more difficult to understand how to parallelize a loop that compresses a sparse array into a dense one.
If there's one lesson that I like to convey to programmers, it's to not sweat the optimization details. Don't hand-optimize the code. I can tell you fro
If you're talking about optimisation then the only thing to do is to check different strategies. You can't just assume type a is a better coding method than type b. Even if you could prove it were true for every single compiler on the market, one could come along tomorrow and do things differently.
With modern compilers you should always use arrays. The reason is that for all but the most pathological cases of indexing, the compiler will do strength reduction and induction to turn the p[i*4] into *p,p+=4. The problem with using pointers is that if you have more than one of them, the compiler has no easy way of knowing if they point to the same place (this is known as pointer aliasing) so a bunch of useful optimizations have to be turned off. If you are doing a[i] = b[i] the compiler can much more easily find out that a and b are distinct memory locations than if you are doing *p = *q.
If you want to learn more about these kind of source level optimisations, look the AMD Athlon(TM) Processor x86 Code Optimization Guide is a good reference -- it includes a section on why you want to use array accesses instead of pointers, and also has a lot more up-to-date and useful advice on what compilers do to your code. It is available from AMD's website.
The interactive way to Go -- http://www.playgo.to/iwtg/en/
In the PC world, the textmode was handled in hardware by the graphic card. Yes, we had already hardware accelerated display at that time. In fact is still in the latest PC video card.
One byte for the character, one byte for the color attribute.
So your argument doesn't apply.
> I then came to realize that he was traversing the row-major image in columns. The cache coherency was being shot to hell because none of the cache-lines were being hit as it went from pixel to pixel.
According to my experience, this happens all the time. People just don't understand how a cache works (ok, the details can be difficult). Back in the old days, it did not matter, but nowadays getting your cache access right is one of the first things to do if you want to improve performance.
On a SPARC, you have counters that give you an instruction per clock cycle reading. Just try it on a few programs, I have seen many lurking around 0.25 or 0.3. The rest is usually wasted waiting for the cache to catch up.
No, it isn't:
So i[4] is the same as k[4] and is the same as: *(int *)((char *)k + 4 * sizeof(*k)).
I played a little with the code you posted. The pointer arithmetic is slightly faster. But understanding the problem first and coding second will most likely bring the real optimizations. As an example: for strings that are multiple of 4 and quite long (I tested greater than 32 chars ) this code is faster than what you posted (I hope it is correct).<br><br>
void reversestring_array(char *str)
{
unsigned int *stri = (unsigned int*)str;
int head;
int tail = -1;
for ( int k = 0; str[ k ]; k+=4, ++tail )
{
char tmp = str[ k ];
str[ k ] = str[ k + 3 ];
str[ k + 3 ] = tmp;
tmp = str[ k + 1 ];
str[ k + 1 ] = str[ k + 2 ];
str[ k + 2 ] = tmp;
}
for ( head = 0; head < tail; ++head, --tail )
{
unsigned int temp = stri[ tail ];
stri[ tail ] = stri[ head ];
stri[ head ] = temp;
}
}
Now a funny sample of the code optimization people at my work place use
register long vectorxx
register long vectorxy
register long curr_x
register long curr_y
register int xctr
register char *pSrc
register char *pDst
last_rot = angle
long sin
long cos
long ptx
long pty
long xcoord
long ycoord
long vectoryx
long vectoryy
// force the compiler not to waste registers for these variables
pSrc = &angle
pSrc = &sin
pSrc = &cos
pSrc = &srcheight
pSrc = &dstpitch
pSrc = &dstwidth
pSrc = &dstheight
Wow. Just wow.
No one will probably read this comment because its been a day since the OP, but I'm amazed at the quantity of people who are slamming this guy for wanting to research something that's admittedly interesting.
For starters, if the submitter is a CompSci student then he definately gets my kudos. Too many CS students are just focused on "I wanna learn C# so I can go make money!" as opposed to actually LEARNING.
Secondly, what happened to just plain geekiness of research and studying things because its fun and interesting? Does everything we do have to have some specific applicable purpose? If you say yes, you are thinking like the MBAs that always get bashed around here instead of a real nerd.
Who knows? Its unlikely, but possible that thinking about this problem somehow leads to a train of thought that solves P=NP or something.
"You cannot find out which view is the right one by science in the ordinary sense." - C.S. Lewis on Intelligent Design
If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.
It is precisely your thinking that causes raising a window to bleed a million cycles in the first place. Efficient code is always good! Even if it only saves one cycle, there is no telling how many times the function will be called so, it could make a huge difference. Imaging how many cycles would be saved if you shaved one cycle off of printf() or strcp().
It is your thinking that causes a 3Ghz machine to run no faster than a 386 running DOS. It is your thinking that causes 1 gigabyte of RAM to be barely adequate and it is your thinking that causes 100 gigabyte hard drives to be inadequate.
The man asked a legitimate question and if you don't have an answer then STFU! Don't try to belittle him for not sharing your mentality for slow bloated crap you call code!
Good code is even better when you throw more horsepower behind it. Bad code requires more horsepower to stay the same. Finally, just because most people produce bad code doesn't mean that everyone should produce bad code.
On some compiler/architectures array indexed code is faster than direct pointer access. On others the opposite is true. I discovered this while writing portable code for Mac/Win32 ten years ago. The inner loop of my pointer code was running faster on 68k/PPC as compared to x86. When I switched it to array dereferencing, the x86 was faster. I confirmed the results by looking at the disassembly of the compiled code.
The moral of the story: write the code that is most readable/maintainable to your eyes.
I've written quite a lot of code for Photoshop plug-ins.
Since this type of code typically iterates over a few hunderd million pixels you'd think that changing such details as array vs. pointer or some other common optimalization technique would have an impact.
It does; it typically shaves about a few tenths of a second off of a 5 minute calculation.
Then again, spending that same amount of time altering the algorithm will usually increase performance in a noticable way.
Nowadays I don't bother optimizing code (usually the compiler does a better job at it anyway) but rather optimize the algorithms. Instead of opening the topic and waiting for a definitive answer on your quest for ultimate performance, you could probably have rewritten the algorithm and gained much more performance you'd ever get this way.
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Master programmers know that it doesn't matter. We:
1. Write the code in the form that is clearest. (So that the maintainer--even if it is ourself--can figure out what it is doing quickly when debugging it later.)
2. If AND ONLY IF performance is a concern and if AND ONLY IF this code has been shown to be a bottleneck, do we consider optimizing. We then try several different implementations and then profile them to choose the winner.
An article previously mentioned on Slashdot.
First of all, the array and pointers are not the reason why a C program run slowly.
"Unnecessary looping" is the cause of the problem. I have encountered and fixed many C programs created by programmers, who have left the company.
I have found that many programmers like use alot of for-loop and while-loop in their programs. Most of the time, the algorithm of their C programs can be modified to reduce the need for the loops and increase the speed of the C programs by 10 times or more.
Expert C Programming Deep C Secrets, a book by Peter Van Der Linden has an interesting chapter concerning why a pointer and a structure name are not the same and do not behave the same in real use. See chapter 4 of this book:
"The Shocking Truth: C Arrays and Pointers are NOT the same!"
That is a recommended read for anyone who is a C programmer.
Second point: If the size of the arrays in question are very large, then it is a guarantee that most modern operating systems will swap these arrays in an out of memory using virtual memory techniques. So, unless you are writing modules for device drivers or an Supervisor level on modern processors it really doesn't matter what you do with the array indexing. The virtual memory swapping will slow you way down.
Since I often wonder just how much performance i'm giving up using java instead of c, I try an experiment like this about once per java version.
The current result
c (ms visual c, optimize speed): 4391 ms
java (5.0, -server): 10031 ms
So there you have it, java is only a little over twice as slow as c, even at something that ought to be c's greatest advantage.
If anyone knows how to optimize the java outcome better without obfuscating the code or changing the contract of the functions, comments would be welcome.
I also compared replacing the strlen calls with a hard-coded length of string value (tail = 19;), to make sure that my strlen implementation didn't suck, but the outcome was pretty much the same, c twice as fast as java. (c: 2359 java: 5171)
Here's the code (link to avoid violating compression filter):
C:
http://ptth.net/slashdot/stringreverse_c.txt
Java:
http://ptth.net/slashdot/stringreverse_java.txt
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Now for the promised smart-bot:
--
C makes an art of confusing pointers with arrays and strings, which leads to lotsa neat pointer tricks; APL mistakes everything for an array, leading to neat one-liners; and Perl confuses everything period, making each line a joyous adventure . -- Tim Peters
"Bloat" is one of those catchy ideas that seems to explain things but is IMO mistaken. Modern software isn't actually that bloated - it's much more featureful, and it handles much larger and more complex datasets in much more straightforward ways. Also, it's often cross-platform, or designed for forward portability and rapid maintenance in ways that old software was not. None of these things are "bloat" - you'd miss them if there were absent.
Well, I guess this article might be about you then...
If you think imaginary property and real property are the same, when does your house become public domain?
My answer, TO THE QUESTION is:
I can't think off the top of my head any example where pointer arithmetic + platform would fail or be worse than an array, but I bet there is.
Hence, I would use arrays because it'll work well, ON AVERAGE, on all platforms and later add #ifdef(_POINTER_ARITHMETIC_IS_BETTER_) optimizations inside, if required at all.
I hope more people add their feedback on the very interesting question asked:
Will pointer aritmethic be worse/better/the same on some platforms?
and less people try add noise answering ANOTHER question:
What would a self-proclamated master programmer optimize on modern machines?
True, they take the same amount of time on x86, but on many architectures, the addition performed in the (%esi,%ecx) argument takes an extra CPU cycle. If the compiler doesn't manage to strength-reduce array accesses to pointer manipulation (you can see what assembly code GCC is generating by using -S instead of -c), you'll need to do it yourself.
No usenet died somewhere between "GREENCARD LOTTERY" and "Me too".
for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)
The char is the smallest addressable unit. On word-addressed architectures I've seen, CHAR_BITS is typically greater than 8, making char and wchar_t equivalent. Using 16-bit encoding of characters wastes memory only if the vast majority of your users use US-ASCII.
you will get more benefit for your effort by: 1. finding better C compilers
Which students and especially unemployed BSCS graduates trying to keep their skills current generally cannot afford. In this economy, how many burgers does one need to flip in order to buy a copy of icc?
or using a profiler to find the real hotspots in the code and hand-optimizing them for the hardware platforms that you really care about.
Which I'm supposing has already been done, and somebody is wondering whether hand-strength-reducing array accesses is a worthwhile hand-optimization technique where profiling points out that it may be needed.
Sometimes pointer arithmatic can be safer than using an itterator. The assumption of this code is that the length of your input will fit into an integer -- this assumption may not be accurate. With the itterator/array code, you may have different results based upon the optimization settings.
You and ...
nobody else !!!
With less code the compiler will have an easier job to optimize your task than a complex expression. A breaking down of the code into manageable parts is also important.
One issue that hasn't been discussed yet is that code optimization today is depending on the hardware used. If you can execute all your code and data in the cache of the CPU it will be a lot more efficient than if the data has to be accessed from external memory.
When it comes to optimization there is also another thing to consider - even if the compiler is able to place unnecessary code outside a loop you shouldn't rely on it. If you do that by hand before then the compiler doesn't have to make considerations about such code - or at least at a much lower level. Depending on the compiler there may actually be a limit on how large blocks it is able to consider during optimization.
Use of switch statements in C instead of if/else/if/else/if/else-statements may actually also be both a readability and a compiler advantage since (if the case-labels are correctly managed) the switch statement can be translated to a jump table while the if statements are harder for the compiler to translate.
There are also many more optimizations that are very processor specific. Old processors like the Z80, 6502 etc. wasn't very complicated while modern processors are much more complex with different cache levels and prefetching. The prefetching of code is actually something that can be very tricky to do compiler optimization for when doing simple loops. There are actually some documents available both from Intel and AMD that describes how to work with prefetching and SIMD etc in which cases the code to use would be less efficient than a simple loop on any processor that doesn't support such features.
But as stated in other replies - figure out where to actually try to improve the code before doing optimizations. There are tools out there that can be very useful - Like Quantify (former Rational, now IBM) that can create a graphical tree over how functions are called and the amount of time consumed.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
an arrow pointing to the toilet roll
So that's what a
*toiletpaper[]
looks like.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Am I the only why that finds it disgusting that so many people would rather challenge the basis of a question, rather than actually try to answer it? I consistently see "Who cares?", and "it doesn't matter." Who cares? the person asking the damn question, that's who. And if he is bothering to ask it, it matters to him.
There are many many situations where something like this would be important, and if you don't recognize the importance of something like this you shouldn't be challenging the basis of his question.
This sig rocks the casbah.
Best laugh of my day!-)
care to comment on this further ?
im interested, in how you avoid using for and while loops
I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz).
Actually, 10 MHz is 399 times slower than 4GHz. 400 times slower would be 0 MHz.
It certainly used to be true that the embedded C cross-compilers I used in the 1980's generated significantly tighter code with pointers than with arrays. So I got into the habit of coding that way.
Most Significant: In the given example (string reversal) the subscript code isn't safer than the pointer code. Neither technique is likely to create buffer overflows or problems like that.
To really compare speeds, the code should use post-increment on the subscripts/pointers. On simpler compilers that can speed things up. Since that's what you'd do if you were coding for speed you should test it.
With a perfect compiler it doesn't matter. The question posed was, how good are the various compilers? Fair question.
I18N == Intergalacticization
Some products need to be built for performance. If you're in that situation, you need to know beforehand what techniques will shoot yourself in the foot. The original article asks that question. You failed to anwer it.
I18N == Intergalacticization
Quick code:
main() {
int z;
int x[2] = {1,8};
z = 1;
printf("Value of array: %d\n", z[x]);
}
What happens?
The array version is neither clearer nor less error-prone. And I dare you to prove differently!
The pointer version is more portable across the chasm of compiler quality. That is, it will run well even on poorer compilers.
The question of clearer is a question of cognitive psychology, not computer science, and requires experimentation to validate. The experiment would be confounded by the existing prejudice against pointers and the habit of CS profs of teaching subscripts over pointers.
N.B.: There are problems with C pointers, but that comes up when using pointers to data structures, not pointers to bytes. Dangling pointers, free/malloc problems, and all that. Using subscripts helps very little, however.
It is not true that the optimization is nearly pointless, it is that the difference in performance is so small as to make the decision nearly pointless. No pun intended. In some situations, the difference may be very important. The pointer version is no harder to code, read, or debug, and might possibly give you back benefits.
What I'm trying to say here is that the pointer version is not an optimization, it's an alternative. Optimizations require more work than the vanilla version. For string/character loops, there is no more work in coding the pointer version.
I18N == Intergalacticization
But nothing gets you going better than the funky, chunky, hum of Chartreuse when you get a jones'in for some chromatones!
If it were done when 'tis done, then t'were well it were done quickly... MacBeth
n/t
If it were done when 'tis done, then t'were well it were done quickly... MacBeth
Don't get me wrong... my specialty is optimizing in high level languages such as C/C++ and Java. But all I'm really doing is cleaning up code. I'm not really optimizing in the true sense of the word.
The biggest issue I've encountered is getting maximum speed when processing large amounts of data. This is the classic case. The other one is computationally intensive routines (but that usually requires lots of data as well).
My point is that if you really wanted to optimize a piece of code to process large amounts of data, you would do the following:
1. Try to see how you can segment the data into chunks of less than or exactly 2K.
2. Set some memory aside on a page boundary that doesn't exceed the page size.
3. Prefetch the next chunk of data.
4. Load the fist chunk of data to be processed into the memory that you set aside.
5. Work on data inside this side memory.
6. Save it back out using non-temporal stores (unless you need to reuse the data).
7. repeat by going to step 3.
The above is the fastest way to process large amounts of data. It'll be 10 to 1000 times faster than anything you can do in C/C++ or any other language. The speedups are comparable to C64 vs IBM386DX2/66. You just can't do this in any high level language. At least not in a readable way. If you want to write a video codec for example, good luck writing it in C/C++ where it matters. I'll laugh at you if you say Java. Could you imagine loading a VM for a video codec?
Trying to optimize without using the capabilities of the computer is sorta like trying to build a house without using a hammer. Your software runs on a MACHINE. Trying to optimize in a high level language is IMPOSSIBLE because you don't have full access to the machine!!! The most you can do is clean up your code or design a new algorithm.
These discussions never get anywhere because you're arguing over the best of the mediocre in terms of performance. Until a high level language appears on te market that let's you deal with cache issues, you're all out of luck. And pointer vs. index is retarded. All memory is accessed via a pointer to the CPU.
High level languages are used to better describe concepts and ideas. Use it that way... ie. readability first.
Example for avoiding loops
One C program I encountered requires it to search for the country name base on the country code.
The program read from a reference file and use 1 array to store country_name[] & another to store country_code[].
Then it read from the raw data file which contains the country_codes. It use a for loop to try to match the country_code from the raw data file and the arrays to get the country_name. The it print it to another file.
This is stupid, I change the C program so that the array for country_name[] use the country_code as index. eg. country_name[country_code]
This method eliminate the use of for loops.
When the program read from the raw data file, it directly use the country_code
to get the country_name without searching for it in a loop.
eg.
print country_name[country_code from raw data file].
This speed up the C program.
And I think your code will actually run, too. I like the semantics (put the tail of the string into temp, put the head in the tail, and put the temp into the head -- more like English).
I18N == Intergalacticization