Porting to 64-bit Linux
An anonymous reader writes "As 64-bit architectures continue to gain popularity it is becoming more and more important to make sure that your software is ready for the shift. IBM developerWorks takes a look at a few of the most common pitfalls when making sure your applications are 64-bit ready. From the article: 'Major hardware vendors have recently expanded their 64-bit offerings because of the performance, value, and scalability that 64-bit platforms can provide. The constraints of 32-bit systems, particularly the 4GB virtual memory ceiling, have spurred companies to consider migrating to 64-bit platforms. Knowing how to port applications to comply with a 64-bit architecture can help you write portable and efficient code.'"
Provided your code isn't written in assembly, do you really _have_ to do anything other than recompile it? Of course, you might want to make changes to make better use of the 64 bits, but just to make it run, wouldn't that be enough?
Well, if you RTFA you'll see that there are issues with, for example, how certain integer values are handled in registers.
I'm sure it won't affect your VB app, but it could affect something written in C/C++.
I'm just wondering if this is what is holding up an AMD64 version of Flash.
In theory, yes. However, programmers usually make stupid mistakes, like assuming that sizeof(int) == sizeof(void *), so they think they can cast pointers to int and back, while on a typical LP64 platform (like AMD64), ints are 32 bits wide and pointers 64 bits, so the cast will not work as expected.
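A minimal sketch of the pointer/int problem described above, assuming a C99 compiler whose <stdint.h> provides uintptr_t (it is optional in the standard, but glibc and the usual LP64 toolchains supply it). The function names are hypothetical.

```c
#include <stdint.h>

/* Lossless on any platform that provides uintptr_t: the integer type is
   guaranteed to be wide enough to hold an object pointer. */
void *roundtrip_uintptr(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* For contrast only: on LP64 an int cannot hold a 64-bit address, so the
   upper bits are lost (the exact result is implementation-defined) and the
   returned pointer may not equal p. */
void *roundtrip_int(void *p)
{
    int bits = (int)(intptr_t)p;
    return (void *)(intptr_t)bits;
}
```

On LP64 Linux, stack and heap addresses commonly lie above 4 GB, so roundtrip_int(&x) == &x is often false there, while roundtrip_uintptr always gives back the original pointer.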
All in all, nothing new here: well-written portable code just needs a recompile, and everything else will need to be debugged.
Generally, architecture changes and compiler version changes break code on large projects. Over a million lines of code, any tiny little difference in the platform that the original developers didn't think to account for will come up *somewhere*. A good example of this is if you are dumping data structures to disk or network and write a size_t variable. Suddenly, you can no longer communicate between 32-bit and 64-bit versions of your software.
As a general rule, "just a recompile" *never happens* for any architecture and compiler change on a project above a certain size. Compiler writers break compatibility with some little ol' thing they don't think anyone is using, but which everyone is actually using in *every* version; they fail to implement uncommon or difficult language features, and they add non-standard features that other compilers don't support. Then application developers do things like not swapping to network byte order and using architecture-dependent data types (size_t, as in the example). Between different Unices, header file contents will change.
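One way to avoid the size_t-on-disk and byte-order problems from the two posts above is to give every serialized field an explicit width and byte order. This is a sketch, not a complete file format; the helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Store v as exactly 8 bytes, most significant byte first (network order),
   independent of sizeof(size_t) and of the host's endianness. */
void put_u64_be(unsigned char out[8], uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Inverse of put_u64_be. */
uint64_t get_u64_be(const unsigned char in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Hypothetical record-header writer: the on-disk length field is always
   64 bits, so 32-bit and 64-bit builds agree on the format. */
void encode_length(unsigned char out[8], size_t len)
{
    put_u64_be(out, (uint64_t)len);
}
```

A 32-bit build and a 64-bit build then produce identical bytes for the same length; a 32-bit reader should still check that a decoded value fits in its own size_t before using it.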
The fixes are often not that hard (usually trivial) to do between, say, versions of the same compiler, or endian switches... but they are still there and annoy the hell out of people trying to compile old open source software on a new platform, as Mac OS X was a few years ago and x86-64 is now. There are always growing pains.
I have Windows XP 64-bit Edition, and let me tell you: that 128 GB RAM limit really holds me back...
If you don't make assumptions about pointer sizes in your code, always use size_t in the appropriate places, etc, it is generally just a quick recompile for x64. I find a lot of open source code (I'm sure this isn't exclusive to open source, but, well, I can't see closed source!) spits out hundreds to thousands of warnings about assigning the return of strlen() to an int and other similar and usually harmless things, but most of the time it Just Works (tm).
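For the strlen() warnings mentioned above, the usual fix is simply to keep the result in the type the function returns. A small sketch (function names are hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* strlen() returns size_t; storing it in a size_t avoids the narrowing
   warning and stays correct for strings longer than INT_MAX on LP64. */
size_t count_bytes(const char *s)
{
    size_t n = strlen(s);
    return n;
}

void report_length(const char *s)
{
    /* %zu is the C99 conversion for size_t; %d would be the wrong width on LP64. */
    printf("%zu bytes\n", count_bytes(s));
}
```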
The only area where I've run into things being significantly harder is writing clean lock-free algorithms, due to the lack of a CMPXCHG16B instruction in the original spec - only EM64T and very recent AMD64 models have it. There are a couple of ways to hack around this limitation, but they aren't very pretty.
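If a program wants to use the 16-byte compare-and-swap where it exists and fall back to something else (a lock, say) where it doesn't, it can test the CPUID feature flag at run time. This sketch assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid and the bit_CMPXCHG16B mask; the function name has_cmpxchg16b is made up, and the lock-free algorithm itself is not shown.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns 1 if the CPU advertises CMPXCHG16B (CPUID leaf 1, ECX bit 13). */
int has_cmpxchg16b(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (ecx & bit_CMPXCHG16B) != 0;
}
#else
/* Not an x86 target: report the instruction as unavailable. */
int has_cmpxchg16b(void)
{
    return 0;
}
#endif
```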
I tried compiling an application for 64-bit, and the problem I found is that many libraries weren't available in 64-bit versions. I don't mind compiling something for 64-bit, but I do mind compiling the application and a few libraries, and the libraries they depend on, recursively ad nauseam.
Finally, when I did get it working, the maintainer didn't have a 64-bit OS so they weren't interested in hosting the RPM I built. It seems like until enough people have 64-bit systems, nobody really cares about it.
I've been running a 100% 64-bit dual Opteron rig for almost two years, under Gentoo. No emulation libraries, no multilib, just 64-bit code. Other than OpenOffice.org, I've had almost no trouble at all.
BTW, "64 bits" don't make programs run faster (in general); code compiled for AMD64/EM64T runs faster than its 32-bit counterpart (for the most part) because of the extra general-purpose registers in the AMD 64-bit design.
Ignoring 64-bit helps a lot when writing portable code. For 99.999% of all apps out there, 64-bit is irrelevant anyway.
I suspect what you're saying is that there is no particular need for 64-bit in most apps, which I agree with. But the point here is that the program should work correctly, which means code that makes assumptions like pointers and ints being the same size needs to be fixed. The point is that amd64 is making 64-bit platforms relevant to more users, not that everyone thinks most apps will be gee-whiz faster as a result.
As a side note, some programs may realise minor performance gains on amd64 from having more general purpose registers available. This is, of course, technically nothing to do with it being 64-bit but does mean that there is a potential benefit even if you never need more than 4GB of addressable memory.
My latest C project is an embedded AVR system, where this is of course very important. The solution has been to make a header file with a bunch of typedefs in it like:
typedef unsigned char uint8;
typedef unsigned short uint16;
And so forth. Then I exclusively use the new types. If I need to compile to another platform, I just need to change the portable.h file.
You can even go one further. The preprocessor cannot evaluate sizeof, so the check has to use the limits from <limits.h>:
#include <limits.h>
#if UCHAR_MAX == 0xFF
typedef unsigned char uint8;
#else
#error "No uint8 type available"
#endif
That way the build stops with an error if there's a problem when you switch platforms.
(This solution was not mine, but another developer's.)
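With a C11 compiler there is also _Static_assert, which, unlike #if, is evaluated by the compiler proper and can therefore use sizeof. A sketch reusing the uint8/uint16 names from the header described above (it assumes C11 support; older compilers need the preprocessor approach):

```c
#include <limits.h>

typedef unsigned char  uint8;
typedef unsigned short uint16;

/* Compile-time checks: the build stops with the given message if an
   assumption fails on a new platform. */
_Static_assert(CHAR_BIT == 8, "uint8 needs 8-bit chars");
_Static_assert(sizeof(uint8) == 1, "uint8 must be one byte");
_Static_assert(sizeof(uint16) * CHAR_BIT == 16, "uint16 must be 16 bits");
```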
The article doesn't appear to mention this, but there is a C99 standard header stdint.h, which defines fixed width types. I haven't seen any OSS project use it, for some reason, but it has all the types you need for portable development; int32_t, uint64_t, constant wrappers like UINT64_C, and, of course, limit constants for all of the fixed-size types. Using these is much better than all those size-based #ifdef'ed typedefs I see people use all over their code.
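A short sketch of the stdint.h approach, also using the C99 <inttypes.h> format macros so that printf stays correct on both ILP32 and LP64. The function names and the 4096-byte block size are illustrative assumptions.

```c
#include <inttypes.h>   /* includes <stdint.h> and adds PRIu32, PRIu64, ... */
#include <stdio.h>

/* Fixed-width types: the same size on ILP32 and LP64. */
uint64_t file_offset(uint32_t block, uint32_t block_size)
{
    /* Widen before multiplying so the product is computed in 64 bits. */
    return (uint64_t)block * block_size;
}

void print_offset(uint32_t block)
{
    uint64_t off = file_offset(block, UINT32_C(4096));
    /* PRIu64 expands to the right conversion for uint64_t
       (with glibc: "lu" on x86-64, "llu" on i386). */
    printf("block %" PRIu32 " at offset %" PRIu64 "\n", block, off);
    if (off > UINT64_C(0xFFFFFFFF))
        printf("offset needs more than 32 bits\n");
}
```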
I did a lot of 64-bit cleaning up for the PHP project, and I can tell you that there are more subtle issues that may arise when porting from 32-bit to 64-bit.
One example:
on a 32-bit Intel machine, a double is precise enough to distinguish LONG_MAX (the highest representable long) from LONG_MAX+1 (a number that doesn't fit in a long anymore). So for instance, to determine whether a long multiplication has overflowed, you could repeat the same multiplication using doubles and compare the result to (double)LONG_MAX.
In contrast, on a 64-bit platform LONG_MAX and LONG_MAX+1 are mapped to the same double representation, so there's no way to do the comparison anymore.
As this example involves static casts, it is something the compiler will usually not warn you about.
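The following sketch reproduces the effect described above and shows one integer-only replacement for the double-based overflow test: a well-known division-based pre-check. (GCC 5 and later, and Clang, also provide __builtin_mul_overflow.) The function names are hypothetical; this is not PHP's actual code.

```c
#include <limits.h>

/* On ILP32, LONG_MAX (2^31 - 1) and LONG_MAX + 1 are distinct doubles.
   On LP64, LONG_MAX is 2^63 - 1, which a 53-bit mantissa cannot hold:
   both values round to 2^63, so this returns 1 there. */
int double_cannot_separate_long_max(void)
{
    double a = (double)LONG_MAX;
    double b = (double)LONG_MAX + 1.0;
    return a == b;
}

/* Integer-only overflow test for a * b, needing no wider type.
   Returns 1 on overflow; otherwise stores the product and returns 0. */
int mul_long_checked(long a, long b, long *out)
{
    if (a > 0) {
        if (b > 0) { if (a > LONG_MAX / b) return 1; }
        else       { if (b < LONG_MIN / a) return 1; }
    } else {
        if (b > 0) { if (a < LONG_MIN / b) return 1; }
        else       { if (a != 0 && b < LONG_MAX / a) return 1; }
    }
    *out = a * b;
    return 0;
}
```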
Another thing to be careful about is passing pointers to variadic functions (e.g. sscanf), because usually the compiler doesn't know the expected types, as they are buried in the format string, not in the function prototype.
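Two sketches of the variadic pitfall above. With a literal format string, GCC's -Wformat can check standard printf/scanf-family calls, but it cannot see through user-written variadic functions such as the hypothetical count_args below, which relies on a NULL sentinel.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* The conversion must match the pointed-to type: %ld writes a long.
   Using %d here would store only 4 of the 8 bytes of 'v' on LP64. */
int parse_long(const char *text, long *result)
{
    long v;
    if (sscanf(text, "%ld", &v) != 1)
        return -1;
    *result = v;
    return 0;
}

/* Hypothetical NULL-terminated variadic function.  The terminator must be
   passed as a pointer: a bare 0 is an int, while va_arg below reads a
   pointer, which is undefined and can yield garbage upper bits on LP64. */
size_t count_args(const char *first, ...)
{
    size_t n = 0;
    va_list ap;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}
```

Call it as count_args("a", "b", (const char *)NULL); a bare 0 as the terminator happens to work on ILP32 but is not safe on LP64.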
On all Linux systems:
sizeof(int)==4
the "long" and "void*" data types can be written atomically, 64-bit or not
Also:
sizeof(long)==sizeof(void*)
sizeof(long long)==8
This is quite standard for 32-bit and 64-bit systems. The only major OS to violate this is Win64, which kept a 32-bit long and thus can't safely cast a void* to long and back again. Linux, BSD, Solaris, MacOS 9, Win32, OS/2, VMS, VxWorks... they all work as Linux does. (screw Win64)
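A quick way to check which of these data models a given compiler target uses (a sketch; the enum and function names are made up). Code that must round-trip a pointer through an integer on all of them, including Win64, can use uintptr_t instead of long.

```c
/* Classify the integer/pointer data model of the current target. */
enum data_model { DM_ILP32, DM_LP64, DM_LLP64, DM_OTHER };

enum data_model current_data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return DM_ILP32;   /* e.g. 32-bit Linux, Win32 */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return DM_LP64;    /* e.g. 64-bit Linux, BSD, Solaris */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return DM_LLP64;   /* Win64: long stays 32 bits */
    return DM_OTHER;
}
```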
When Linux was ported to the Alpha and Itanium, libraries went in /lib. Emulation libraries go somewhere under /usr, like /usr/lib/i386-linux-elf/lib or maybe /emu/i386-linux-elf/lib. Any app being natively compiled didn't need to care about this cruft.
Then AMD told SuSE to make x86-64 run all i386 binaries perfectly, including installers that would expect to use the /lib directory. Not that we want old cruddy i386 binaries!
So now we're supposed to use /lib64. It is so lame. It causes all sorts of trouble porting things to 64-bit, just so we don't need to do "mv /lib/libfoo.* /usr/lib/i386-linux-elf/lib/" for the occasional i386 binary.
A few years from now, nobody will even install 32-bit crud. We'll have a /lib directory without libraries, and the "/lib64" wart lasting until the end of time.
Nah. That's a bit of a pessimistic outlook. Already today /lib64 is a mere symlink to /lib on current distributions. The symlink may have to be kept around for a while, though, until the early nomenclature oopses have been effectively phased out.
Going off on a tangent:
I have no idea what a 36-bit signed-magnitude integer mainframe (( Yeah, they really existed -- CDC made them )) would return for the first byte of an int holding -2 (int x = -2; *(unsigned char *)&x). It would probably be 0x80 or 0x40 -- but it might be 0x800 (CDC used 6-bit characters, and case shifting was done by using a second 'escape' character -- rather like Unicode, so a 'char' might be either a 6-bit or a 12-bit unit -- or an 8-bit unit just to keep from choking every C program under the Sun)
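As a point of comparison, the well-defined way to look at the first byte of an int is to take the int's address, not to convert the value -2 into a pointer. A sketch (the function name is hypothetical):

```c
/* Inspect the object representation of an int through an unsigned char
   pointer, which C permits for any object. */
unsigned char first_byte_of(int value)
{
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0];   /* depends on byte order and on how negatives are encoded */
}
```

On a two's-complement little-endian machine such as x86 this returns 0xFE for -2; a two's-complement big-endian machine returns 0xFF; signed-magnitude hardware like the one imagined above would give something else again.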
I am wondering if this 64-bit porting article is written specifically for Macromedia.
Try this article on C++ and 64-bit coding:
http://www.informit.com/guides/content.asp?g=cplusplus&seqNum=201&rl=1
Really...just use Java or C#/Mono and the only thing you have to do is change the SDK/JDK and re-test.
Indeed. Having been through the horrors of 16-bit to 32-bit transition on Windows in the early 90s, it is great to be developing in Java, knowing that I don't have to care about such matters again. I let the JVM translate my bytecodes to high-performance machine code on whatever platform I am on, no matter what the word length.
Here's a guide to porting to a 64-bit Java environment:
http://www-128.ibm.com/developerworks/java/jdk/64bitporting/
Many Java applications are not written 100% in the Java language. Those apps will need some porting effort. The document also mentions considerations such as the usage of JNI by the native libraries on a 64 bit system.
64 bit porting is more of a compiler problem.
In particular, the GNU toolchain has a very poor ability to complain about long/int coercion. It also doesn't have a 64-bit pointer type for use in 32-bit code - so any 32-bit code you need to talk to from 64-bit ends up handing around a long long, and since this is just an integer type, there's no problem with assigning it to another integer type, and potentially losing resolution (and bits off the pointer, should it be converted/passed back).
Minimally, the tools need to have a flag that complains about integer assignment between 32-bit and 64-bit values. Ideally, they would also include a "long void *" or some other pointer type whose type coercion would result in a warning being generated, unless the coercion was done with explicit casts.
The assignment warning has been asked for many times in the past by people trying to move ILP32 code into an LP64 environment, and of course, the tools people have objected to the idea because it would cause additional warnings that they'd have to explain how to coerce around.
The objection to adding a 64-bit pointer type for use in 32-bit code, I can somewhat sympathize with - but they added "long long" well before it was standardized, even when it was obvious that the most correct thing to call it would be a single token like "quad" so it could be defined in and out, without having to typedef or replace explicit types. So the argument against it is very weak.
They've also messed up and refused to correct things that are obvious breakage (e.g. given "typedef char *caddr_t;", both "const caddr_t foo" and "caddr_t const fee" declare a "char * const", a constant pointer to modifiable char; there is no way to get a "const char *" out of the typedef -- meaning it's impossible to use typedef'ed values in function declarations using const in some circumstances).
Speaking of all of which -- when are we going to get a flag so that "typedef char *caddr_t; extern void fum(char *); caddr_t foo; ... fum(foo);" results in a type warning?!?
The vast majority of problems that arise when porting between ILP32 and LP64 are things which would be trapped by the addition of a few warnings to the compiler - but without those warnings, it becomes a Herculean task requiring a level of detail work and precision very difficult to keep up over the amount of time necessary for a large project porting effort. It's no wonder this is becoming a visibility issue, as 64 bit hardware becomes more prevalent, and people decide they want to run their code over there.
-- Terry
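The caddr_t typedef behaviour complained about above is easy to check. In C, a qualifier applied to a pointer typedef qualifies the pointer itself, so the sketch below compiles exactly as the comments describe (the function name is hypothetical; the commented-out lines are the ones a compiler rejects).

```c
typedef char *caddr_t;

/* 'const' applies to the typedef as a whole, i.e. to the pointer:
   both declarations below mean 'char *const', not 'const char *'. */
char demo_typedef_const(void)
{
    char buf[] = "abc";
    const caddr_t p = buf;   /* same as: char *const p */
    caddr_t const q = buf;   /* same as: char *const q */

    p[0] = 'x';              /* allowed: the chars are not const */
    /* p = buf + 1; */       /* rejected: p itself is const */

    const char *r = buf;     /* pointer to const char must be spelled out */
    r = q + 1;               /* allowed: r itself is not const */
    /* r[0] = 'y'; */        /* rejected: *r is const */

    return r[-1];            /* buf[0], now 'x' */
}
```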
You are incorrect.
The following is some code that does not warn that the resolution of "long l" is potentially insufficient to store the value contained in "long ll":
This is on an ILP32 machine with the current GCC 4.x. It also fails to warn in GCC 3.x, so this is not a "4.x branch thing".
I would similarly expect a warning when using an integer as an lvalue for an expression containing a long when running on an ILP32 machine, but there is no warning - any overflow occurs silently.
-- Terry
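In GCC 4.3 and later, and in Clang, -Wconversion warns about implicit conversions that may alter a value, such as long long to long on ILP32. Where a narrowing is intended, an explicit range check keeps it from being silent. A sketch with a hypothetical helper name:

```c
#include <limits.h>

/* Checked narrowing from long long to long.  Returns -1 if the value
   would not survive the conversion (possible on ILP32, where long is
   32 bits); otherwise stores it and returns 0. */
int narrow_to_long(long long v, long *out)
{
    if (v < LONG_MIN || v > LONG_MAX)
        return -1;
    *out = (long)v;
    return 0;
}
```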
Even well written code can have problems.
Specifically, say I have a 64 bit platform capable of running both LP64 code and ILP32 (legacy) code.
I use a shared memory segment to communicate between my legacy 32 bit applications, and it has internal use of pointers to perform self-reference on data.
[Rather than complicating things, let's just assume that the pointers are internally based off the base address of the shared memory segment, rather than being based off of 0, so there is no requirement of mapping the memory into the same location in each process]
I'm now adding a 64 bit computation engine (perhaps my application is a rendering system that uses plug-ins, and being able to work on large data sets with the large address space afforded to 64 bit processes is critical, but when it comes to displaying the results, I can live easily in a 32 bit address space, so I'm not trying to port my whole tool over to 64 bits).
So now I have to deal with the internal pointers in the shared memory segment. I can do one of several things:
(1) I can use structure coercion to treat the pointers as if they were integer offsets instead, and coerce them into pointers internally in the 64 bit code (on LP64, pointers are 64 bit).
(2) I can internally store 64-bit pointers, rather than 32-bit pointers. This means I need the same round-tripping, but it can take place in the 32-bit applications, rather than the 64-bit applications, and the integer representation is "long long" as far as they're concerned.
(3) I can support either a "short void *" in 64-bit applications, or a "long void *" in 32-bit applications.
If I go with approach #3, I get to keep my type checking. With the other two approaches, I have explicit coercion, and I lose my type checking and boundary/range checking: the explicit casts quiet the warnings, even when they are used incorrectly.
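A sketch of approach (1) in code, under stated assumptions: a hypothetical node layout, 32-bit offsets measured from the segment base (so the segment may be mapped at a different address in each process), and offset 0 reserved to mean "no node". How the segment is created and mapped (for example with shm_open and mmap) is not shown, and the usual alignment and synchronization concerns of shared memory are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Same size and meaning in 32-bit and 64-bit processes: links are
   byte offsets from the segment base, not raw pointers. */
struct shm_node {
    uint32_t next_off;   /* offset of the next node, or 0 for none */
    int32_t  value;
};

/* Convert a stored offset into a pointer valid in this process. */
static inline struct shm_node *node_at(void *base, uint32_t off)
{
    return off ? (struct shm_node *)((unsigned char *)base + off) : NULL;
}

/* Convert a pointer in this process back into a storable offset. */
static inline uint32_t offset_of_node(void *base, const struct shm_node *n)
{
    return n ? (uint32_t)((const unsigned char *)n - (unsigned char *)base) : 0;
}

/* Walk a list whose head lives at 'head_off' inside the segment. */
int64_t sum_list(void *base, uint32_t head_off)
{
    int64_t total = 0;
    for (struct shm_node *n = node_at(base, head_off); n != NULL;
         n = node_at(base, n->next_off))
        total += n->value;
    return total;
}
```

The explicit conversions are confined to node_at and offset_of_node, which keeps the casts the post worries about in one auditable place, though the compiler still cannot check them.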
If I go further, and allow the segment to be mapped anywhere in memory, it may be mapped above 4G. I might also have relative base addressing (e.g. "listener converts"), where I store the internal base address in the provider as part of the data being provided. This may sound like a strange scenario (e.g. it's like DCE RPC, in that it becomes the receiver's responsibility to convert, if a conversion is needed), but it's very useful. It has the following attributes:
(a) If I use homogeneous consumer/providers, no conversion is necessary
(b) My "work horse" applications can do their work, and it's up to my "viewer" application to do the conversion; presumably, it's not doing much other than interacting with a slow human, so this ends up being the best division of labor
(c) As time goes forward, the rest of my application is likely to migrate to 64 bit as well, so I get performance improvements over time, as the conversion requirements drop out.
You could argue that because the program was not 64 bit clean, it's not "well written". You could also argue that losing the compiler warning checking is "OK, because it's your own fault for not porting everything" (if you didn't believe in closed source third party plugins over which you had no control).
I would argue that you can't expect someone to accurately predict future users of their software, and there's only so much work you can do to make sure that things don't break horribly at some arbitrary point in some arbitrary compilation environment.
For the most part, we have to rely on our tools.
And our tools do not tell us when this type of problem happens, because this type of problem is relatively new.
-- Terry
You are wrong, for some definitions of "word".
To AMD and Intel, a word is 16 bits. This is seen in the Intel-style assembly that masm and nasm use.
By the ELF binary specification, a word is 32-bit or 64-bit according to the platform. So the word size did change.
The traditional idea, with a word being the size of a register, is like the ELF spec.
The C programming language has no such thing. On both i386 and x86-64, sizeof(int)==4 and sizeof(long long)==8. On x86-64, sizeof(void*)==8. On i386, sizeof(void*)==4. On a Microsoft platform, sizeof(long)==4. On a non-Microsoft platform, sizeof(long)==sizeof(void*).
char was 9-bit
C requires at least 8 bits for char, so 6 isn't good enough.
All types must be a multiple of the size of char, because
sizeof(char) is 1 by definition and fractions are not OK.
Valid sizes are thus: 9, 12, 18, 36
The char-short-int-long progression may be one of:
9,18,18,36 a likely choice
9,18,27,36 this is the cool way: sizeof(int)==3
9,18,36,36 a likely choice
9,27,27,36
9,27,36,36
9,36,36,36 a likely choice
12,24,24,36
12,24,36,36
12,36,36,36
18,18,18,36
18,18,36,36
18,36,36,36
36,36,36,36
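The list above can be regenerated mechanically. This sketch encodes the constraints the post relies on (char at least 8 bits and dividing the 36-bit word, every type a whole number of chars, non-decreasing sizes) plus C's minimum ranges (short and int at least 16 bits, long at least 32), and assumes no type is wider than the word; under those assumptions it finds the same 13 layouts. The function name is hypothetical.

```c
#include <stdio.h>

/* Count (and optionally print) char/short/int/long bit widths for a
   36-bit word under the constraints described above. */
int enumerate_36bit_layouts(int print)
{
    const int word = 36;
    int count = 0;
    for (int c = 8; c <= word; c++) {
        if (word % c != 0)
            continue;                          /* char must divide the word */
        for (int s = c; s <= word; s += c) {
            if (s < 16) continue;              /* short: at least 16 bits */
            for (int i = s; i <= word; i += c) {
                if (i < 16) continue;          /* int: at least 16 bits */
                for (int l = i; l <= word; l += c) {
                    if (l < 32) continue;      /* long: at least 32 bits */
                    if (print)
                        printf("%d,%d,%d,%d\n", c, s, i, l);
                    count++;
                }
            }
        }
    }
    return count;
}
```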
And? What does Java or C# have to do with it? If your C code made or makes invalid assumptions, then it's a bug in the code.
You are missing the point here. I don't want to have to recompile all my stuff to suit these different machines. With Java, if a machine is running a J2SE 1.5 VM, it will run my single binary, no matter what the bit size of the target machine.
Maintaining a source repository and having to rebuild it for a range of different target architectures (and support it on those architectures) is something I gladly gave up years ago. I target J2SE 1.5. WORA is a reality which I use on a daily basis.
A language's power comes in how well it suits a particular task.
Not necessarily. We are talking about the VM implementation, not just the language.
The fact that an int in Java is guaranteed to be 31 bits + a sign bit is the least attractive benefit of the language, and arguably not a benefit at all (it doesn't offer me anything whatsoever, since I fortunately never internalized such assumptions as a programmer).
It has huge benefits. It means that, for example, you can serialise information in Java and guarantee that you can restore that information on any platform. It allows for, for example, fast binary messaging between different JVMs on different platforms, so that you can cluster your application on a network of machines that not only differ in operating systems, but also on word size, and you can hot-deploy new code as binary over such a network.
The only difference is that the C "virtual machine" has a far looser specification than a JVM since it doesn't take for granted a 32-bit x86 or SPARC environment, nor even a process model.
Neither does the JVM - it makes no assumptions of the bit length of the host machine, or the environment. It has some assumptions - that a 32-bit integer can be handled atomically, for example, but that is it.
The JVM spec doesn't fit as neatly into a 16-bit or 64-bit environment, whereas C is just at home on an 8-bit microcontroller or a 128-bit vector machine.
I think you should take a look at where JVMs are installed. They have been fitting neatly and very efficiently into machines with a wide range of bit lengths for years - especially 64-bit. One of the great advantages of Java is that you can distribute your binary code to such machines, as described above.
Actually if you are coding something which processes bitstrings for example, you will have the problem whichever language you're using.
All the Java BitString libraries I have used have been platform independent, which makes sense, as a Java program has no indication of the word size of the underlying platform.
So what you said isn't really correct in all situations. Look at your code.
Eh? I look at my code all the time!
Then -- patiently fix them all. You know, you planned to do that for years. Do it before trying to build a 64-bit version.
Then -- try the 64-bit version and fix all the warnings you missed before. void * to int conversions are my personal favorites...
Resist the temptation to invent your own types, though (Mozilla's source tree is awful in this regard). Use the standard int32_t or uint64_t where the number of bits matters; elsewhere, a plain hardware-dependent int is usually more efficient.
Make sure your next machine runs a 64-bit OS and gain practice by porting/fixing various free software to run on it :-)
Sounds like you want the "kind" functionality of Fortran 90.
Back in the bad old days, a REAL in Fortran could be anywhere from 32 to 64 bits - a program that ran fine using REAL on a CDC-6600 (60 bits) might die horribly using REAL on an IBM 360 (32 bits but using hexadecimal arithmetic).
CDC made 48-bit machines (1604 and 3000 series) and 60-bit machines (6000 series, 7600 and some Cybers) but not a 36-bit machine AFAIK. The 6600 had 60-bit reals and long ints, 18-bit short ints, 12-bit words for the peripheral processors - a real PITA for C.
I had to change a bunch of 'int' to 'long' to get something to compile.