Effect of Using 64-bit Pointers?
An anonymous reader queries: "Most 64-bit processors provide a 32-bit mode for compatibility, but 64-bit pointers are becoming essential as systems move beyond 4GB of RAM. Also, the large virtual address space is very useful for several reasons - allowing large files to be memory-mapped, and allowing pages of memory to be remapped without ever requiring the virtual address space to be defragmented. However, 64-bit pointers take up twice as much memory, which immediately affects memory footprint. This is especially an issue on embedded platforms where RAM is at a premium, but even on systems where RAM is plentiful and cheap the extra memory footprint reduces cache performance. Have Slashdot readers done any research into the actual effect of using 64-bit pointers in a 'typical' application? What proportion of a real program's data is actually pointers?"
Have Slashdot readers done any research into the actual effect of using 64-bit pointers in a 'typical' application?
none whatsoever.
What proportion of a real program's data is actually pointers?
none whatsoever.
oh... i use java.
MARIJUANA, SHROOMS, X: ONLINE?! - E
Is this really a problem in the embedded space?
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
How many embedded devices are running 64-bit processors now? Offhand, I'd say this is only a problem if you have an embedded device with more than 4 GBytes of memory... in other words, it hardly sounds like a real-world problem for embedded devices. Yes, workstations and servers with 64-bit processors should probably be using 64-bit pointers.
"Freedom means freedom for everybody" -- Dick Cheney
Huh? On systems where RAM is at a premium, I don't see the point of using or having 64-bit pointers.
"Send an Instant Karma to me" - Yes
There are always potential trade-offs between run speed and memory space. For example, you could always use a single 64-bit pointer, and save all your addresses as 32-bit or even 16-bit offsets from that pointer (requiring pointer arithmetic to access any object). Then you would use less memory, but your code would run slower.
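A minimal sketch of that base-plus-offset idea in C, under some assumptions of my own: the pool, the names ref32, ref_to_ptr and ptr_to_ref, and the 1-based index encoding are all hypothetical, and the "offset" here is an element index into one statically allocated pool rather than a byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024

/* A link stored as a 32-bit index instead of a full-width pointer.
   0 means "null"; any other value is a 1-based index into the pool. */
typedef uint32_t ref32;

struct node {
    ref32   next;    /* 4 bytes, whatever the native pointer size is */
    int32_t value;
};

/* The single full-width base address every ref32 is relative to. */
static struct node pool[POOL_SIZE];

/* Turn a stored index back into a real pointer (the extra arithmetic). */
static struct node *ref_to_ptr(ref32 r)
{
    assert(r <= POOL_SIZE);
    return r ? &pool[r - 1] : NULL;
}

/* Turn a pointer into the pool back into a 32-bit index. */
static ref32 ptr_to_ref(const struct node *p)
{
    if (p == NULL)
        return 0;
    ptrdiff_t idx = p - pool;
    assert(idx >= 0 && idx < POOL_SIZE);
    return (ref32)(idx + 1);
}

/* Walk a list: every hop pays one conversion before the dereference. */
static long sum_list(ref32 head)
{
    long total = 0;
    for (struct node *n = ref_to_ptr(head); n != NULL; n = ref_to_ptr(n->next))
        total += n->value;
    return total;
}
```

Each node stays at 8 bytes on both 32-bit and 64-bit builds; the price is the conversion on every hop in sum_list.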
"Freedom means freedom for everybody" -- Dick Cheney
Now, this kind of stuff might be useful for...um...hard-core video editing...and really, really huge servers, but that's about it. The truth of the matter is that your everyday user just has no need to handle numbers of that size or data of those quantities. There are very few situations where 32-bit processors are actually a problem.
But there is another kind of evil that we must fear most... and that is the indifference of good men.
Given that memory access times are bound by latency far more than bandwidth, the cost of loading another four bytes into the register file is most likely insignificant. I'm certain that the cost of 8-byte register-to-register operations *is* insignificant, and it's likely that pointers, which are small but often accessed, would be kept in registers. It would depend highly on the particular architecture.
Does anyone use 64 bit processors for embedded applications?
There's an interesting discussion of 64-bit immediate values at the following link: 64 bit immediates in Python
If we are already using 64 bits for our pointers, a virtual machine has the potential of exploiting the pointer's larger footprint for other immediate values. I'm not as crazy about using the MSB of the pointer for indicating an immediate as Ian Bicking appears to be. I'd recommend using the LSB, since it's easier to bias any object to an even address than to halve the potential addressable space.
Then again, if the potential address space is 2 ** 64, I suppose it's not such a sacrifice.
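For what it's worth, LSB tagging could look roughly like this in C. It's only a sketch: value_t and the helper names are made up for illustration, and it assumes objects sit at even addresses and that converting uintptr_t to intptr_t wraps in the usual two's-complement way.

```c
#include <assert.h>
#include <stdint.h>

/* One machine word: either an object pointer (low bit 0, so the object
   must sit at an even address) or a small integer (low bit 1). */
typedef uintptr_t value_t;

static int is_immediate(value_t v)
{
    return (v & 1u) != 0;
}

/* Pack an integer into the upper bits and set the tag bit.
   The integer must fit in one bit less than the word size. */
static value_t from_int(intptr_t n)
{
    return ((uintptr_t)n << 1) | 1u;
}

/* Unpack: the word holds 2n + 1, so subtract the tag and halve. */
static intptr_t to_int(value_t v)
{
    assert(is_immediate(v));
    return ((intptr_t)v - 1) / 2;
}

/* Pointers are stored unchanged; the alignment provides the 0 tag. */
static value_t from_ptr(void *p)
{
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

static void *to_ptr(value_t v)
{
    assert(!is_immediate(v));
    return (void *)v;
}
```

Pointers cost nothing extra to dereference this way; only the integers pay a shift on the way in and a subtract-and-halve on the way out.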
Weapons of Mass Analysis
With modern processors it's not uncommon to require 64-bit or 128-bit memory alignment on data structures to get the best performance. There are even some instructions that *require* such data alignments in order for them to work at all (for example: MMX or SIMD).
Because of these existing data alignment issues, going from 32-bit to 64-bit pointers may have absolutely no impact on a program's memory usage and cache performance. It is highly likely you're already using 64-bit alignment when you enable the compiler's optimizations.
Unless you're building massive linked lists of stuff in a scientific / simulation environment this is probably something not worth worrying about. The efficiency and volume of your actual data will still be the biggest waste of space - and it's not like you won't be able to attach more physical memory onto your new system than the old one.
If it does affect you... you probably already know what you're doing or you've been making very bad assumptions about the size of your variable types.
Then you whine about using an extra 4 bytes per pointer to address it. Seems to me that the number of pointers relative to the amount of RAM is so small it's not an issue. Correct me if I'm wrong.
"Eve of Destruction", it's not just for old hippies anymore...
The biggest problem of using larger pointers is not so much the extra memory used (memory is cheap). The real problem is that you consume cache space much faster so you page at a much higher rate. This can slow down your program by a factor of up to 5x.
With linux on sparc64, typical applications are 30% slower when running in 64bit userland mode as opposed to 32bit userland mode. There are of course exceptions...
Raymond Chen's web log. Lately he's been discussing IA-64 programming. I don't pretend to understand half of what he's talking about, but I thought some of the readers here might be interested in what he has to say.
"For a successful technology, honesty must take precedence over public relations for nature cannot be fooled." -Feynman
...this year, anyway. (-:
Got time? Spend some of it coding or testing
My God. The kludge that would not die! I thought we did away with memory models when we finally got rid of protected mode. But nooo. People still want to squeeze a few more bits out of their memory systems. Somebody call an exorcist!
First of all there is no such thing as a typical program... If you are writing a lisp interpreter where everything is a pointer then you may see your memory usage almost double. If you have a numerical program that is dominated by huge arrays of floats you might not see any difference at all.
Second, here is a trick I have seen - it seems a bit strange but works well if you encapsulate your data well. Keep in mind that objects are generally aligned to an 8-byte boundary (if they are malloc'ed). That means your low 3 bits are not used at all. If your objects have, say, 64 bytes of data in them (possibly after a bit of padding) and are aligned to 64 bytes, then you are wasting 6 bits. Just store your pointers as 32-bit words, shifted over by 6 bits. When you want to dereference them, your get-the-pointer accessor function just shifts them back and gives you a 64-bit pointer.
Now you have an effective address space of 256GB and your data size has not grown at all. Maybe you have taken a hit in performance but until you benchmark you never know...
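A rough sketch of the accessor pair in C. One deliberate difference from the description above: it shifts an offset from a region base rather than the raw address, because a typical 64-bit OS does not promise that heap addresses fall in the low 256GB. The struct heap type and the names compress and expand are hypothetical, and aligned_alloc assumes a C11 library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define OBJ_SHIFT 6                     /* objects are 64-byte aligned */
#define OBJ_ALIGN ((size_t)1 << OBJ_SHIFT)

typedef uint32_t cptr;                  /* compressed pointer, 0 = null */

struct heap {
    unsigned char *base;                /* 64-byte aligned region start */
    size_t         size;                /* region size in bytes */
};

/* Allocate the region all compressed pointers refer into. */
static int heap_init(struct heap *h, size_t size)
{
    assert(size % OBJ_ALIGN == 0);      /* aligned_alloc needs a multiple */
    h->base = aligned_alloc(OBJ_ALIGN, size);
    h->size = size;
    return h->base != NULL;
}

/* Store: drop the 6 always-zero low bits of the offset.
   The +1 keeps 0 free to mean "null". */
static cptr compress(const struct heap *h, const void *p)
{
    if (p == NULL)
        return 0;
    uintptr_t off = (uintptr_t)((const unsigned char *)p - h->base);
    assert(off % OBJ_ALIGN == 0 && off < h->size);
    return (cptr)(off >> OBJ_SHIFT) + 1;
}

/* Load: shift back and add the base to get a full-width pointer. */
static void *expand(const struct heap *h, cptr c)
{
    if (c == 0)
        return NULL;
    return h->base + ((uint64_t)(c - 1) << OBJ_SHIFT);
}
```

With 32 bits of index and 64-byte granularity the reachable region is 2^32 * 64 bytes, which is where the 256GB figure comes from. Whether the extra shift and add hurts is exactly the thing to benchmark.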
Now, this kind of stuff might be useful for...um...hard-core video editing...and really, really huge servers, but that's about it. The truth of the matter is that your everyday user just has no need to handle numbers of that size or data of those quantities.
What happens when "your everyday user" wants to perform "hard-core video editing" on footage she shot of her family with her miniDV camcorder?
Usually, code in a given process won't fill more than 4 GB. In a jump table situation, instruction pointers can be 32-bit while data pointers are 64-bit, in a memory model resembling the "compact" memory model of old 16-bit Borland C++ compilers.
Unless you're running a 64-bit platform on a pitifully small amount of RAM.
Three words: PDA.
You will only thrash the caches if you use (read: address) the entire 64bit space, not if you have 64 bit or 128 bit pointers. That should be obvious since caches work in contiguous chunks of memory, and so long as you stay in the same chunk, it doesn't matter, from the caching point of view, that you used 28 or 60 bits or whatever for its high-order address. The cache will have to use quite a bit more memory for storing the page addresses though, but that doesn't cause any thrashing as it's done in the design stage.
Bandwidth from memory to cache won't matter either because the address bus will just have to be wider.
Care to document the above statement ? I believe it to be between very inaccurate to downright false, and here's why:
Using more bits for addressing does not do anything to the cache's data memory: that memory works in CPU words and on a 64 bit system those are already 64 bit. Cache lookup tables, on the other hand (where it stores the high-order address bits for each cached block) will have to accommodate the additional bits, but that is done on the drawing board when designing the cache and it is a fixed amount dependent on the number of blocks in the data part of the cache memory.
Also, paging has nothing to do with caching. Paging is a memory virtualization mechanism the CPU uses. Caching works in blocks of data that are almost never the size of a CPU page, they are "lines" of 16, 64 or whatever bytes.
I concur with your findings. Back in the day, when I was experiencing a little discomfort with the speed of my Pentium 90 running linux, I decided to buy a 266 MHz Digital Alpha system. Both systems were configured with 64 MB, and both ran Red Hat 5.2.
Although the Alpha system is obviously superior in number crunching, I noticed it ran out of physical memory on a regular basis where my P90 would still be happy. Part of the matter is that alpha binaries tended to be much larger, as was the kernel. But I'm also quite sure that a major part is primarily due to the increased amount of "lost" bits in pointers and memory alignments of small data structures.
--
The problem with engineers is that they tend to cheat in order to get results. The problem with mathematicians is that they tend to work on toy problems in order to get results. The problem with program verifiers is that they tend to cheat at toy problems in order to get results.
First 32-bit mantissa is a timestamp, second is the pointer.
It's nice to have a pointer with time. This makes for some interesting algorithms...
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
On CPUs with segments the impact must be much less, if there is any at all. Say for instance that you reside in a 32 bit segment X and 16 bit subsegment Y; then you would use 16 bit storage of pointers in RAM even though the CPU constructs the full 64 bit pointer internally by concatenating all the parts from the segment registers with the 16 bits from RAM.
I don't assume any CPU in particular just the principle of segments.
Thus, you can expect Java heaps to expand by about 50% when moving from 32-bit to 64-bit pointers. What effect this has on your program's performance depends on the relation between the program's resident sets and the machine's cache. For instance, if your program has a resident set of 200KB on a machine with a 256KB cache, then the extra 50% will blow the cache and kill your performance. If the resident set were 150KB, the performance impact would probably be minimal.
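The two cases above are plain arithmetic, spelled out here as a small C sketch. The function names are made up, and "fits" only compares sizes; it says nothing about associativity or access patterns.

```c
#include <assert.h>

/* Size of a resident set after it grows by growth_percent.
   Wider pointers can at most double a heap, so cap the model there. */
static unsigned long grown_kb(unsigned long kb, unsigned long growth_percent)
{
    assert(growth_percent <= 100);
    return kb + kb * growth_percent / 100;
}

/* Nonzero if the grown resident set is no larger than the cache. */
static int fits_in_cache(unsigned long kb, unsigned long growth_percent,
                         unsigned long cache_kb)
{
    return grown_kb(kb, growth_percent) <= cache_kb;
}
```

With the numbers from the comment: 200KB grows to 300KB and no longer fits in 256KB, while 150KB grows to 225KB and still does.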
Disclaimer: I was doing this as a pet project in my spare time, so take these numbers with a grain of salt.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
>This is especially an issue on embedded platforms where RAM is at a premium... What kinds of embedded platforms are likely to be needing greater than 4gb RAM anyhow? I sure as hell cant imagine a use for a 64bit washing machine with upwards of 4gigs .. Thats a hell of a lot of washing programmes.
Electronic Music Made Using Linux http://soundcloud.com/polyp
There's a lot of modern medical equipment which can definitely use the 4GB. MRI machines, CT scanners, ultrasound machines ("sonographs" if you prefer the term) and so on do tend to chew up memory. Particularly the first two, because you often need to hold whole voxel sets in memory while you compute a bunch of cross-sections at odd angles.
We're about to up our Doppler sampling rate to
Since it takes the technician a good ten minutes or more to find the signal, we're looking at without batting an eyelash. Granted, that's not in the same league as the three dimensional stuff, but it ain't exactly peanuts, either.
PS: I haven't done the math yet, but if 8 byte doubles don't give us sufficient granularity to store 24-bit samples, we may need to up our storage to 12 byte [96 bit] or even 16 byte [128 bit] doubles.
Where's that IEEE standard when you need it?
Isn't it unfair that anyone can say something like "what kind of embedded systems need more than 4GB online simultaneously" off the top of their head, but when Bill Gates says something like "640K is more than you'll ever need", it's taken so seriously and used to make him look bad?
Yes, today 4 GB is a hell of a lot, but 20 years from now, I'll bet that everything will have gigs and gigs of memory, and then your statement will look as stupid as Bill Gates' did back in the 80s.
And mmap() can have multiple mappings of the file -- how do you think that's handled? It's the same thing. Why should the kernel have to go to disk again? The map consistency has to be there already, to handle mmap(). COW if its private, otherwise, share. If the read()/write() buffer is not aligned, you do need to copy the data -- as if you are a user of mmap(). Big deal, the optimization is "lost".
Still, the easiest way to handle this is to always mmap() files, and read/write will either (a) be replaced by mmap(), or (b) do a copy. If the kernel lives in 64 bit land, the application can still live in 32 bit land for better cache handling, and the optimization works. If the kernel is 32 bit, mmap() files always doesn't work, and there are problems... (HURD), or some semi-fancy filesystem code.
As to WHY? The same code can run at near-optimal speed (mmap-ish), and STILL use read/write for portability to other environments. If I write mmap() based code, I have to worry about alignment, AND have to worry about porting to read/write. If I use read/write, and give page-alignment, the OS can optimize if it is able.
Do YOU want to do the lifting (porting) or leave it to someone else?
Ratboy
Just another "Cubible(sic) Joe" 2 17 3061
A few years back I did a test with a server which stores state information (I will not bore you with the details). I ran some performance tests on both the 32-bit version and the 64-bit version. Same source code. Same test data. Same configuration. On HP-UX 11.0 PA-RISC with the aCC compiler.
The 64-bit version used about 15% more memory than the 32-bit version. But it was also 20% faster. That still puzzles me, because the server does not perform any 64-bit operations.
They usually call them "references", but names like NullPointerException give the game away...
Yes, it's true that Java's pointers don't behave quite the same way as C's pointers, but then they don't behave like C++'s references either. It's a different language.
If you meant that you don't do pointer arithmetic, you'd be right, there's none of that available to users in Java, and that's mostly a good thing.
Shame your Java apps use 16-bit characters, when Unicode needs more... maybe you could switch to a language better suited to modern tasks such as I18N. C or C++ might handle that need just fine, with wchar_t typically being 32-bits.
If you've moved to a 64-bit platform, it's often because you need access to every bit of memory and performance you can get. Increasing cache misses and filling your memory by having larger datasets because of longer wordsizes isn't a trade-off to be taken without careful consideration.
On the other hand, most of the systems I know that use >= 4GB of RAM are databases, and pointers make up a tiny fraction of the memory footprint.
Seriously, it is faster. I've been writing in assembly for years, and unless I need a 32 bit pointer, I generally don't use them.
If you're that concerned about performance that you are analysing pointer size, you might as well code in assembly. Yes, 64 bit pointers have a bigger footprint, but we experienced the same problem when we went to unicode strings, 32 bit code, etc...
My advice is this: let the compiler deal with it. Unless you are willing to crank out a lot of hand-coded assembly or are interfacing with hardware, the 32/64 bit pointer question is pretty much moot. As it is, you can't control whether the compiler turns this:
for (int x = 0; x < 256; x++) buffer[x] = 0;
Into something like this:
mov cx,64
mov eax,0
mov di,buffer ; rep stosd stores through es:di, not si
cld
rep stosd
Instead of the literal translations of the old compilers:
mov si,buffer
mov bx,0 ; this is the x variable
forlabel@10001:
mov byte ptr [bx + si],0
mov ax,1
add ax,bx
xchg bx,ax
cmp bx,256
jl forlabel@10001
The former takes 68 instruction cycles, the latter takes (6 * 256 + 2) = 1538!
The aforementioned issues have a much bigger impact on performance than pointer size. Given that the memory bus is at least 64 bits wide on anything newer than a pentium, you won't incur a clock cycle penalty for using 64 bit pointers.
The only thing that I would suggest is to watch where you place pointers in structures. For example, when building a linked list, you would want to do something like this:
class link {
link * ptrforward;
link * ptrbackward;
link * ptrdata;
};
rather than:
class link {
link * ptrdata;
link * ptrbackward;
link * ptrforward;
};
Because the processor pulls 64 bits per address accessed, the former structure would have the forward pointer in cache regardless of the pointer size. With the second structure, traversing a list in the forward direction would result in a cache miss on every node visited, regardless of pointer size (This applies only to the x86...).
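If you want to see where the fields actually land on your own platform before reordering anything, offsetof and sizeof will tell you. A small C sketch follows; it uses a plain struct with a void * data field instead of the class above, and print_layout is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same field order as the first layout: the forward link comes first. */
struct link {
    struct link *forward;
    struct link *backward;
    void        *data;
};

/* Print the pointer size and where each field sits inside the node. */
static void print_layout(void)
{
    printf("pointer size: %zu bytes\n", sizeof(void *));
    printf("forward at %zu, backward at %zu, data at %zu, node size %zu\n",
           offsetof(struct link, forward),
           offsetof(struct link, backward),
           offsetof(struct link, data),
           sizeof(struct link));
}
```

Build it once as a 32-bit binary and once as a 64-bit one and compare the output; the node size and the offsets of the second and third fields are what change with pointer width.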
My experience has been that pointer size is only relevant on truly tiny systems - for example, 16 bit code which has to fit into a few kilobytes. Usually, as programs scale to work with larger datasets, the percentage of memory used for pointers decreases rapidly. You'll find that as data sizes increase, the practical uses for linked structures shrink; locating an element by using a binary search on a sorted array scales much better than a linear search traversing a linked list.
The society for a thought-free internet welcomes you.
But with this approach we can have 64-bit machines and still have problems when a 32-bit time_t wraps around.
"Most 64-bit processors provide a 32-bit mode for compatibility"
One free mod point for the first correct answer: Name a 64-bit processor with a 60-bit mode for compatibility.
echo 33676832766569823265328479713269.8639857989Pq | dc
That blows. There is so much cool stuff you can do with 64-bit pointers, because nobody really needs more than about 45 of those bits.
A deep unwavering belief is a sure sign you're missing something...
With 64-bit processors, I'll have enough memory to rule the world! Mwah ha ha ha!!
64-bit versus 32-bit
I think I'll call it "Bob."
This is a dumb question. Do you really think that on a system with more than 4GB of memory that memory would be at such a premium that an additional four bytes per pointer would even be noticeable? Surely you jest!
20 years from now you'll be happy AMD prevented billions of lines of code that relied on such hacks.