Porting to 64-bit Linux
An anonymous reader writes "As 64-bit architectures continue to gain popularity, it is becoming more and more important to make sure that your software is ready for the shift. IBM developerWorks takes a look at a few of the most common pitfalls when making sure your applications are 64-bit ready. From the article: 'Major hardware vendors have recently expanded their 64-bit offerings because of the performance, value, and scalability that 64-bit platforms can provide. The constraints of 32-bit systems, particularly the 4GB virtual memory ceiling, have spurred companies to consider migrating to 64-bit platforms. Knowing how to port applications to comply with a 64-bit architecture can help you write portable and efficient code.'"
Provided your code isn't written in assembly, do you really _have_ to do anything other than recompile it? Of course, you might want to make changes to make better use of the 64 bits, but just to make it run, wouldn't this be enough?
This article is slashdotted, and I have to attend an IB information session.
Can someone please email me a copy when it comes back up? Thank you.
crispeeb@gmail.com
I am waiting to hear from you now!
Well, if you RTFA you'll see that there are issues with how registers are handled on certain integer values for example.
I'm sure it won't affect your VB app, but it could affect something written in C/C++.
I'm just wondering if this is what is holding up an AMD64 version of Flash.
In theory, yes. However, programmers often make stupid mistakes, like assuming that sizeof(int) == sizeof(void *), so they think they can cast pointers to int and back. On a typical LP64 platform (like AMD64), ints are 32 bits wide and pointers 64 bits, so the cast will not work as expected.
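A minimal sketch of the portable alternative, assuming a C99 compiler: `<stdint.h>` provides `uintptr_t`, an unsigned integer type that (where it exists) is guaranteed to survive a round trip through `void *`. The function names are hypothetical.

```c
#include <stdint.h>

/* Store a pointer in an integer and get it back unchanged.
   uintptr_t is as wide as a pointer on both ILP32 and LP64;
   int is only 32 bits on LP64, so "(int)p" would drop the top half. */
static uintptr_t pointer_to_integer(void *p)
{
    return (uintptr_t)p;
}

static void *integer_to_pointer(uintptr_t u)
{
    return (void *)u;
}
```

Code that really must keep a pointer in an integer variable can use these types instead of int or long, and the same source stays correct on 32-bit and 64-bit builds.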
All in all, nothing new here: well-written portable code just needs a recompile, and everything else will need to be debugged.
Generally, architecture changes and compiler version changes break code on large projects. Across a million lines of code, any tiny difference in the platform that the original developers didn't think to account for will come up *somewhere*. A good example is dumping data structures to disk or the network and writing out a size_t variable: suddenly, the 32-bit and 64-bit versions of your software can no longer communicate.
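One common fix, sketched here under the assumption that a 64-bit field is wide enough for any length you store: never write a raw size_t, but always encode it as exactly 8 bytes in a fixed (big-endian, network) byte order. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a size_t as exactly 8 bytes, most significant byte first,
   so 32-bit and 64-bit builds (and either endianness) agree on the format. */
static void put_u64_be(unsigned char out[8], size_t value)
{
    uint64_t v = (uint64_t)value;     /* widen explicitly on 32-bit builds */
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Decode the 8-byte field. Returns 0 if the stored value does not fit in
   this build's size_t (e.g. a 64-bit writer and a 32-bit reader), 1 otherwise. */
static int get_u64_be(const unsigned char in[8], size_t *value)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    if (v > (uint64_t)SIZE_MAX)
        return 0;
    *value = (size_t)v;
    return 1;
}
```

The explicit range check on the reading side turns a silent truncation into an error the 32-bit program can report.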
As a general rule, "just a recompile" *never happens* for any architecture or compiler change on a project above a certain size. In *every* version, compiler writers break compatibility with some little ol' thing they don't think anyone is using but which everyone actually is, fail to implement uncommon or difficult language features, and add non-standard features that other compilers don't support. Then application developers do things like not swapping to network byte order and using architecture-dependent data types (size_t, as in the example). Between different Unices, header file contents will change.
The fixes are often not that hard (usually trivial) between, say, versions of the same compiler, or endian switches... but they are still there, and they annoy the hell out of people trying to compile old open source software on a new platform, like Mac OS X was a few years ago and x86-64 is now. There are always growing pains.
I have Windows XP 64-bit Edition, and let me tell you: that 128GB RAM limit really pulls me down...
Write it in Java, and it will run on 32 and 64 bit Linux, as well as Solaris and other Unixes, Windows, Mac. And you won't have to change a line of code.
Knowing how to port applications to comply with a 64-bit architecture can help you write portable and efficient code.
Ignoring 64-bit helps a lot to write portable code. For 99.999% of all apps out there, 64-bit is irrelevant anyway.
Slashdot should stick to buzzword-overload articles, FUD and links to manual pages of Linux commands for run-of-the-mill users rather than giving developers proposals on how to do things.
There... 2 minutes of my precious time... gone for nothing!
If you don't make assumptions about pointer sizes in your code, always use size_t in the appropriate places, etc, it is generally just a quick recompile for x64. I find a lot of open source code (I'm sure this isn't exclusive to open source, but, well, I can't see closed source!) spits out hundreds to thousands of warnings about assigning the return of strlen() to an int and other similar and usually harmless things, but most of the time it Just Works (tm).
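A small illustration of the strlen() case, assuming C99 (the function name is hypothetical): keep the result in a size_t and print it with the `%zu` conversion, instead of narrowing it to int.

```c
#include <stdio.h>
#include <string.h>

/* Write a string's length into out as decimal text.
   strlen() returns size_t, so the result stays a size_t
   (not "int len = strlen(s);") and is formatted with %zu. */
static int describe_length(const char *s, char *out, size_t outsize)
{
    size_t len = strlen(s);
    return snprintf(out, outsize, "%zu", len);
}
```

This silences the narrowing warnings the right way, and it stays correct for strings longer than INT_MAX on LP64.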
The only area where I've run into things being significantly harder is writing clean lock-free algorithms, due to the lack of a CMPXCHG16B instruction in the original spec; only EM64T and very recent AMD64 models have it. There are a couple of ways to hack around this limitation, but they aren't very pretty.
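One such hack, sketched here with C11 `<stdatomic.h>` (not necessarily what the poster used): squeeze a pointer and an ABA counter into a single 64-bit word, so an ordinary 64-bit compare-and-swap updates both. ASSUMPTION: user-space addresses fit in the low 48 bits, as they do in typical x86-64 Linux processes. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* ASSUMPTION: pointers fit in 48 bits; the top 16 bits hold a counter. */
#define PTR_BITS 48
#define PTR_MASK ((UINT64_C(1) << PTR_BITS) - 1)

static uint64_t tag_pack(void *ptr, uint16_t tag)
{
    return ((uint64_t)tag << PTR_BITS) | ((uint64_t)(uintptr_t)ptr & PTR_MASK);
}

static void *tag_pointer(uint64_t word)
{
    return (void *)(uintptr_t)(word & PTR_MASK);
}

static uint16_t tag_counter(uint64_t word)
{
    return (uint16_t)(word >> PTR_BITS);
}

/* Install new_ptr only if the head still holds exactly the word we read
   earlier. Bumping the counter on every change makes the ABA problem much
   less likely (the 16-bit counter can still wrap around). */
static int tag_replace(_Atomic uint64_t *head, uint64_t expected, void *new_ptr)
{
    uint64_t desired = tag_pack(new_ptr, (uint16_t)(tag_counter(expected) + 1));
    return atomic_compare_exchange_strong(head, &expected, desired);
}
```

The price is exactly the ugliness mentioned above: the code depends on an address-width assumption that the C standard does not promise.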
I tried compiling an application for 64-bit, and the problem I found is that many libraries weren't available in 64-bit versions. I don't mind compiling something for 64-bit, but I do mind compiling the application and a few libraries, and the libraries they depend on, recursively ad nauseam.
Finally, when I did get it working, the maintainer didn't have a 64-bit OS so they weren't interested in hosting the RPM I built. It seems like until enough people have 64-bit systems, nobody really cares about it.
I've been running a 100% 64-bit dual Opteron rig for almost two years, under Gentoo. No emulation libraries, no multilib, just 64-bit code. Other than OpenOffice, I've had almost no trouble at all.
BTW, "64 bits" don't make programs run faster (in general); code compiled for AMD64/EM64T runs faster than its 32-bit counterpart (for the most part) because of the extra general-purpose registers in the AMD 64-bit design.
Really...just use Java or C#/Mono and the only thing you have to do is change the SDK/JDK and re-test.
Unless you are doing low-level system-level coding in C or C++, these two languages are perfect for creating any regular end-user app, both server side and client side.
Worrying about 32-bit or 64-bit is so last century. You'll be worrying about year 2000 next.
Write it in Java, and it will run on 32 and 64 bit ...
That statement hides the real truth about Java, and the real problems.
Sure enough, a successfully compiled Java program will run on pretty much any working Java system. Unfortunately, the Java system itself is extremely non-portable, and is a bitch to get working properly. And I've tried all the major Java implementations.
I have well over a dozen machines, and all but two are different to each other and run different operating systems or different system releases. Pretty much all FOSS languages work on every box, with one major exception: Java. It's working successfully on just three of them, and that's not for want of trying. I've sweated blood over the years trying to get Java working.
I'm not sure exactly what the problem is with Java, but no other common languages suffer from it --- they all work pretty much out of the box. In comparison, the Java system is hopelessly non-portable. And talking about the high portability of the bytecode isn't very helpful to people when the hosting Java implementation doesn't work on a particular system.
My guess is that there are so many modern systems on which standard Java installs work just fine that the developers just couldn't care less about truly increasing portability so that it also installs on systems outside of the high volume mainstream.
So, the "write once, run anywhere" of Java mythology needs changing to "write once, run only where Java happens to work". It's a pretty bad situation.
My latest C project is an embedded AVR system, where this is of course very important. The solution has been to make a header file with a bunch of typedefs in it, like:
typedef unsigned char uint8;
typedef unsigned short uint16;
And so forth. Then I exclusively use the new types. If I need to compile to another platform, I just need to change the portable.h file.
You can even go one further. The preprocessor can't evaluate sizeof, but it can test the limits from limits.h:
#include <limits.h>
#if UCHAR_MAX == 0xFF
typedef unsigned char uint8;
#else
#error "No uint8 type available"
#endif
That way the build will stop with an error if there's a problem when you switch platforms.
(This solution was not mine, but another developer's.)
The last parameter of pthread_create() is a void pointer to pass data to the new thread.
If all you are passing is an integer value, then casting it to a (void *) is the quickest method. The alternative is to malloc() some space, copy the value, pass the pointer, and make sure you free() the memory in the thread (which you should probably do at the start of the thread, by copying the value locally before free()'ing the memory).
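A minimal sketch of the cast approach that stays warning-free on LP64, assuming POSIX threads and C99 `<stdint.h>` (compile with `-pthread`): go through `intptr_t` in both directions rather than casting int straight to `void *`. The `worker`/`run_worker` names and the doubling "work" are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread start routine: recover the int that was carried in the void *
   argument. Converting via intptr_t avoids the "cast to pointer from
   integer of different size" warnings a plain (void *)value produces on LP64. */
static void *worker(void *arg)
{
    int value = (int)(intptr_t)arg;
    return (void *)(intptr_t)(value * 2);    /* hypothetical "work" */
}

/* Start a thread with an int argument and return what it computed,
   or -1 if the thread could not be created or joined. */
static int run_worker(int value)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, worker, (void *)(intptr_t)value) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

This only works for values that fit in an int to begin with; anything larger still needs the malloc() approach.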
The article doesn't appear to mention this, but there is a C99 standard header stdint.h, which defines fixed width types. I haven't seen any OSS project use it, for some reason, but it has all the types you need for portable development; int32_t, uint64_t, constant wrappers like UINT64_C, and, of course, limit constants for all of the fixed-size types. Using these is much better than all those size-based #ifdef'ed typedefs I see people use all over their code.
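A short sketch of those C99 facilities, using `<inttypes.h>` (which includes `<stdint.h>` and adds the `PRI*`/`SCN*` format macros). The `record` struct is a hypothetical example; the fixed-width fields pin down each value's size, though struct padding and byte order are still up to the platform.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical record whose field widths are the same on 32- and 64-bit builds. */
struct record {
    uint32_t id;
    int32_t  delta;
    uint64_t offset;
};

/* PRIu32, PRId32 and PRIu64 expand to the correct printf conversions
   for these types on whatever platform you compile for. */
static int format_record(const struct record *r, char *out, size_t outsize)
{
    return snprintf(out, outsize, "id=%" PRIu32 " delta=%" PRId32 " offset=%" PRIu64,
                    r->id, r->delta, r->offset);
}
```

For example, a record initialized with `{ 7u, -3, UINT64_C(5000000000) }` formats as `id=7 delta=-3 offset=5000000000` on both ILP32 and LP64, with no #ifdef'ed typedefs.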
I did a lot of 64-bit cleaning up for the PHP project, and I can tell you that there are more subtle issues that may arise when porting from 32-bit to 64-bit.
One example:
On a 32-bit Intel machine, a double is precise enough to distinguish LONG_MAX (the highest representable long) from LONG_MAX+1 (a number that doesn't fit in a long anymore). So, for instance, to determine whether a long multiplication has overflowed, you could repeat the same multiplication using doubles and compare the result to (double)LONG_MAX.
In contrast, on a 64-bit platform LONG_MAX and LONG_MAX+1 are mapped to the same double representation, so there's no way to do the comparison anymore.
As this example involves static casts, it is something the compiler will usually not warn you about.
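A sketch of a replacement check that doesn't depend on double being wider than long. It uses the division-based pattern also recommended by the CERT C rule INT32-C; the function names are hypothetical. The second function just demonstrates the trap described above.

```c
#include <limits.h>

/* Returns 1 if a * b would overflow a long, 0 otherwise.
   Only divisions that cannot themselves overflow are performed. */
static int long_mul_overflows(long a, long b)
{
    if (a > 0) {
        if (b > 0)
            return a > LONG_MAX / b;
        return b < LONG_MIN / a;
    }
    if (b > 0)
        return a < LONG_MIN / b;
    return a != 0 && b < LONG_MAX / a;   /* both non-positive */
}

/* The trap: returns 1 on LP64, where (double)LONG_MAX and
   (double)LONG_MAX + 1 round to the same value (2^63). */
static int double_cannot_tell_apart(void)
{
    volatile double max = (double)LONG_MAX;
    volatile double next = max + 1.0;
    return max == next;
}
```

Because the test never computes the overflowing product, it behaves identically on 32-bit and 64-bit platforms.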
Another thing to be careful about is passing pointers to variadic functions (e.g. sscanf), because the compiler usually doesn't know the expected types: they are buried in the format string, not in the function prototype.
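A sketch of getting the sscanf conversions right for each pointed-to type, assuming C99 `<inttypes.h>` for the `SCNd64` macro; `parse_pair` is a hypothetical name. For the standard printf/scanf family, GCC and Clang can check format strings with `-Wformat` (part of `-Wall`); for your own variadic functions they can't, unless you annotate them with GCC's `format` attribute.

```c
#include <inttypes.h>
#include <stdio.h>

/* Parse "<count> <offset>". %ld matches long *, SCNd64 matches int64_t *.
   Passing &count with plain %d would be undefined behavior, and on LP64
   it typically fills only half of the long. Returns 1 on success. */
static int parse_pair(const char *text, long *count, int64_t *offset)
{
    return sscanf(text, "%ld %" SCNd64, count, offset) == 2;
}
```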
On all Linux systems:
sizeof(int)==4
the "long" and "void*" data types may be atomically written, 64-bit or not
Also:
sizeof(long)==sizeof(void*)
sizeof(long long)==8
This is quite standard for 32-bit and 64-bit systems. The only major OS to violate this is Win64, which kept a 32-bit long and thus can't safely cast a void* to long and back again. Linux, BSD, Solaris, MacOS 9, Win32, OS/2, VMS, VxWorks... they all work as Linux does. (screw Win64)
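A sketch of making that assumption explicit instead of implicit: a C11 `_Static_assert` refuses to build where `long` can't hold a pointer, and a hypothetical `data_model()` helper names the model the code was compiled for (ILP32, LP64, or Win64's LLP64).

```c
/* Refuse to compile where the void * -> long -> void * round trip breaks. */
_Static_assert(sizeof(long) == sizeof(void *),
               "this code casts void * to long and back");

/* Classify the data model from the widths of int, long and pointers. */
static const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";
    return "other";
}
```

On x86-64 Linux this reports LP64; on Win64 the static assertion fires, which is exactly the case described above.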
When Linux was ported to the Alpha and Itanium, libraries went in /lib. Emulation libraries go somewhere under /usr, like /usr/lib/i386-linux-elf/lib, or maybe /emu/i386-linux-elf/lib. Any app being natively compiled didn't need to care about this cruft.
Then AMD told SuSE to make x86-64 run all i386 binaries perfectly, including installers that would expect to use the /lib directory. Not that we want old cruddy i386 binaries!
So now we're supposed to use /lib64. It is so lame. It causes all sorts of trouble porting things to 64-bit, just so we don't need to do "mv /lib/libfoo.* /usr/lib/i386-linux-elf/lib/" for the occasional i386 binary.
A few years from now, nobody will even install 32-bit crud. We'll have a /lib directory without libraries, and the "/lib64" wart lasting until the end of time.
Nah, that's a bit of a pessimistic outlook. Already today, /lib64 is a mere symlink to /lib on current distributions. The symlink may have to be kept around for a while, though, until the early nomenclature oopses have been effectively phased out.
Going off on a tangent:
I have no idea what a 36-bit signed-magnitude integer mainframe (yeah, they really existed; CDC made them) would return for *(unsigned char *) (int)-2. It would probably be 0x80 or 0x40, but it might be 0x800 (CDC used 6-bit characters, and case shifting was done by using a second 'escape' character, rather like Unicode, so a 'char' might be either a 6-bit or a 12-bit unit, or an 8-bit unit just to keep from choking every C program under the Sun).
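For contrast, here is what the same question looks like on today's ordinary hardware, as a small sketch (the function name is hypothetical): read the first byte of an int holding -2 through an unsigned char pointer, which C allows for any object. With 8-bit chars and two's complement you get 0xFE on little-endian machines and 0xFF on big-endian ones; sign-magnitude or ones'-complement machines would give other answers.

```c
/* Return the first byte of the object representation of the int -2.
   Character types may alias any object, so this access is well defined. */
static unsigned char first_byte_of_minus_two(void)
{
    int v = -2;
    const unsigned char *p = (const unsigned char *)&v;
    return p[0];
}
```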
I am wondering if this 64-bit porting article is written specifically for Macromedia.
64 bit porting is more of a compiler problem.
In particular, the GNU toolchain has a very poor ability to complain about long/int coercion. It also doesn't have a 64-bit pointer type for use in 32-bit code, so any 32-bit code you need to talk to from 64-bit code ends up handing around a long long; since this is just an integer type, there's no problem assigning it to another integer type and potentially losing resolution (and bits off the pointer, should it be converted or passed back).
Minimally, the tools need a flag that complains about integer assignment between 32-bit and 64-bit values. Ideally, they would also include a "long void *" or some other pointer type whose type coercion would generate a warning unless the coercion was done with explicit casts.
The assignment warning has been asked for many times in the past by people trying to move ILP32 code into an LP64 environment, and of course the tools people have objected to the idea because it would cause additional warnings that they'd have to explain how to coerce around.
The objection to adding a 64-bit pointer type for use in 32-bit code I can somewhat sympathize with, but they added "long long" well before it was standardized, even when it was obvious that the most correct thing to call it would be a single token like "quad", so it could be defined in and out without having to typedef or replace explicit types. So the argument against it is very weak.
They've also messed up and refused to correct things that are obvious breakage (e.g. "typedef char *caddr_t; const caddr_t foo" results in a "char const *foo" rather than a "const char *foo", and "caddr_t const fee" results in a "char const *fee" instead of a "char * const fee", meaning it's impossible to use typedef'ed values in function declarations using const in some circumstances).
Speaking of all of which: when are we going to get a flag so that "typedef char *caddr_t; extern void fum(char *); caddr_t foo; ... fum(foo);" results in a type warning?!?
The vast majority of problems that arise when porting between ILP32 and LP64 are things which would be trapped by the addition of a few warnings to the compiler - but without those warnings, it becomes a Herculean task requiring a level of detail work and precision very difficult to keep up over the amount of time necessary for a large project porting effort. It's no wonder this is becoming a visibility issue, as 64 bit hardware becomes more prevalent, and people decide they want to run their code over there.
-- Terry
You are incorrect.
The following is some code that does not warn that the resolution of "long l" is potentially insufficient to store the value contained in "long ll":
This is on an ILP32 machine with the current GCC 4.x. It also fails to warn in GCC 3.x, so this is not a "4.x branch thing".
I would similarly expect a warning when using an integer as an lvalue for an expression containing a long when running on an ILP32 machine, but there is no warning - any overflow occurs silently.
-- Terry
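Until the compiler flags the silent assignment for you, the narrowing can be made explicit and checked. A minimal sketch (the function name is hypothetical): on ILP32 the plain `long l = ll;` compiles without complaint and typically wraps, while this version refuses out-of-range values. Later compilers did add diagnostics for the silent form: GCC 4.3 and later reworked `-Wconversion` to warn about implicit conversions that may change a value, and Clang has `-Wshorten-64-to-32`.

```c
#include <limits.h>

/* Narrow a long long to long, refusing values that don't fit.
   Returns 1 and stores the value on success, 0 if it would not fit.
   On LP64 the range check can never fail; on ILP32 it can. */
static int narrow_to_long(long long in, long *out)
{
    if (in < LONG_MIN || in > LONG_MAX)
        return 0;
    *out = (long)in;
    return 1;
}
```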
Even well written code can have problems.
Specifically, say I have a 64 bit platform capable of running both LP64 code and ILP32 (legacy) code.
I use a shared memory segment to communicate between my legacy 32 bit applications, and it has internal use of pointers to perform self-reference on data.
[Rather than complicating things, let's just assume that the pointers are internally based off the base address of the shared memory segment, rather than being based off of 0, so there is no requirement of mapping the memory into the same location in each process]
I'm now adding a 64 bit computation engine (perhaps my application is a rendering system that uses plug-ins, and being able to work on large data sets with the large address space afforded to 64 bit processes is critical, but when it comes to displaying the results, I can live easily in a 32 bit address space, so I'm not trying to port my whole tool over to 64 bits).
So now I have to deal with the internal pointers in the shared memory segment. I can do one of several things:
(1) I can use structure coercion to treat the pointers as if they were integer offsets instead, and coerce them into pointers internally in the 64 bit code (on LP64, pointers are 64 bit).
(2) I can internally store 64-bit pointers, rather than 32-bit pointers. This means I need the same round-tripping, but it can take place in the 32-bit applications rather than the 64-bit applications, and the integer representation is "long long" as far as they're concerned.
(3) I can support either a "short void *" in 64-bit applications, or a "long void *" in 32-bit applications.
If I go with approach #3, I get to keep my type checking. With the other two approaches, I have explicit coercion, and I lose my type checking and boundary/range checking: the explicit casts quiet the warnings, even when they are used incorrectly.
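A rough sketch of approach (1), assuming the segment only needs offsets below 4GB: store 32-bit offsets from the segment base instead of pointers. A uint32_t has the same size in ILP32 and LP64 processes, and because the offsets are relative, each process can map the segment at a different address. Offset 0 is reserved to mean null. All names are hypothetical, and note the cost described above: `shm_ptr()` returns `void *`, so the type checking is gone.

```c
#include <stddef.h>
#include <stdint.h>

/* Node stored inside the shared segment; links are offsets, not pointers. */
struct shm_node {
    uint32_t next;      /* offset of the next node from the base, 0 = none */
    uint32_t value;
};

/* Offset -> pointer, valid in whichever process mapped the segment at base. */
static void *shm_ptr(void *base, uint32_t off)
{
    return off ? (char *)base + off : NULL;
}

/* Pointer -> offset; ASSUMPTION: p lies inside the segment, below 4GB from base. */
static uint32_t shm_off(void *base, const void *p)
{
    return p ? (uint32_t)((const char *)p - (const char *)base) : 0;
}
```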
If I go further and allow the segment to be mapped anywhere in memory, it may be mapped above 4G. I might also have relative base addressing (e.g. listener converts, where I store the internal base address in the provider as part of the data being provided). This may sound like a strange scenario (it's like DCE RPC, in that it becomes the receiver's responsibility to convert, if a conversion is needed), but it's very useful. It has the following attributes:
(a) If I use homogeneous consumer/providers, no conversion is necessary
(b) My "work horse" applications can do their work, and it's up to my "viewer" application to do the conversion; presumably, it's not doing much other than interacting with a slow human, so this ends up being the best division of labor
(c) As time goes forward, the rest of my application is likely to migrate to 64 bits as well, so I get performance improvements over time, as the conversion requirements drop out.
You could argue that because the program was not 64 bit clean, it's not "well written". You could also argue that losing the compiler warning checking is "OK, because it's your own fault for not porting everything" (if you didn't believe in closed source third party plugins over which you had no control).
I would argue that you can't expect someone to accurately predict future users of their software, and there's only so much work you can do to make sure that things don't break horribly at some arbitrary point in some arbitrary compilation environment.
For the most part, we have to rely on our tools.
And our tools do not tell us when this type of problem happens, because this type of problem is relatively new.
-- Terry
You are wrong, for some definitions of "word".
To AMD and Intel, a word is 16 bits. This is seen in the Intel-style assembly that MASM and NASM use.
By the ELF binary specification, a word is 32-bit or 64-bit according to the platform. So the word size did change.
The traditional idea, with a word being the size of a register, is like the ELF spec.
The C programming language has no such thing. On both i386 and x86-64, sizeof(int)==4 and sizeof(long long)==8. On x86-64, sizeof(void*)==8. On i386, sizeof(void*)==4. On a Microsoft platform, sizeof(long)==4. On a non-Microsoft platform, sizeof(long)==sizeof(void*).
char was 9-bit
C requires at least 8 bits for char, so 6 isn't good enough.
All types must be a multiple of the size of char, because
sizeof(char) is 1 by definition and fractions are not OK.
Valid sizes are thus: 9, 12, 18, 36
The char-short-int-long progression may be one of:
9,18,18,36 a likely choice
9,18,27,36 this is the cool way: sizeof(int)==3
9,18,36,36 a likely choice
9,27,27,36
9,27,36,36
9,36,36,36 a likely choice
12,24,24,36
12,24,36,36
12,36,36,36
18,18,18,36
18,18,36,36
18,36,36,36
36,36,36,36
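The list above can be checked mechanically. A small sketch under the same assumptions (a hypothetical 36-bit word; char at least 8 bits and dividing the word; short and int at least 16 bits; long at least 32; char <= short <= int <= long; every size a multiple of char and no wider than the word):

```c
#include <stdio.h>

/* Count (and optionally print) the char,short,int,long bit-width
   progressions allowed by the rules above for a given word size. */
static int enumerate_models(int word, int print)
{
    int count = 0;
    for (int c = 8; c <= word; c++) {
        if (word % c != 0)
            continue;                       /* char must divide the word */
        for (int s = c; s <= word; s += c) {
            if (s < 16)
                continue;                   /* short needs at least 16 bits */
            for (int i = s; i <= word; i += c) {
                for (int l = i; l <= word; l += c) {
                    if (l < 32)
                        continue;           /* long needs at least 32 bits */
                    if (print)
                        printf("%d,%d,%d,%d\n", c, s, i, l);
                    count++;
                }
            }
        }
    }
    return count;
}
```

Called as `enumerate_models(36, 1)`, it prints the same thirteen progressions listed above, in the same order.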
Then -- patiently fix them all. You know, you planned to do that for years. Do it before trying to build a 64-bit version.
Then -- try the 64-bit version and fix all the warnings you missed before. void * to int conversions are my personal favorites...
Resist the temptation to invent your own types, though (Mozilla's source tree is awful in this regard). Use the standard int32_t or uint64_t where the number of bits matters; elsewhere, a simple hardware-dependent int is usually more efficient.
Make sure your next machine runs a 64-bit OS and gain practice by porting/fixing various free software to run on it :-)
Answer: To follow programming 32-bit ASM, quick and dirty, don't write C stupid.
Question: And the 20 Athlon64 for the small-enterprises?
Answer: To follow programming 32-bit ASM, don't write C stupid, don't write 64-bit ASM.
Answer: To enterprises, don't permit them to build a 64-bit cluster!!!
Thanks for the 2 GiB RAM limitation for the Athlon, Pentium and Athlon64!!!
Thanks, nobody uses 4 or 8 or 16 or 32 GiB of virtual memory, BACK TO 32-bit WORLD!!!.
Sounds like you want the "kind" functionality of Fortran 90.
Back in the bad old days, a REAL in Fortran could be anywhere from 32 to 64 bits: a program that ran fine using REAL on a CDC-6600 (60 bits) might die horribly using REAL on an IBM 360 (32 bits, but using hexadecimal arithmetic).
CDC made 48-bit machines (1604 and 3000 series) and 60-bit machines (6000 series, 7600 and some Cybers) but not a 36-bit machine AFAIK. The 6600 had 60-bit reals and long ints, 18-bit short ints, and 12-bit words for the peripheral processors; a real PITA for C.
I had to change a bunch of 'int' to 'long' to get something to compile.