Porting Linux Software to the IA64 Platform
axehind writes "In this Byte.com article, Dr Moshe Bar explains some of the differences between IA32 and IA64. He also explains some things to watch out for when porting applications to the IA64 architecture."
Now I, and the other two IA64 users, will have some programs to run on our Linux-64 boxes!
Can someone please port nethack for us?
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
AND IT FEELS GOOD!
The major difference between IA32 and IA64 is price.
IA64 is twice as wide as IA32. Therefore, it will be necessary to remember to halve the size of all variables to compensate in your programs. Additionally, we now have to type twice as much for each command or function. It really sucks that we will no longer see Ms. Portman onscreen in Star Wars anymore. So, in conclusion, nuts to IA64: I'm sticking with my Athlon, thank you very much.
Well obviously what we'll see next is a kernel extension that dynamically 'ports' all your applications to IA-64 and transparently migrates them to IA-64 machines elsewhere in the cluster. When Intel's next Great Leap Forward is released, you'll be able to transparently migrate to that as well. In fact it will be so transparent, you won't notice any difference and you can continue working at your 80286-based machine without any interruption.
-- Ed Avis ed@membled.com
The key difference between IA32 and IA64 is not 32 or 64 bit technology. It is price/performance, as Intel is sadly aware of.
I figured this would be coming now, I just started my first Assembly class as a CS undergrad, a whole new group of registers to memorise!
It's not the OS it's the user that sucks. If it's user friendly, you get stupider people. - clinko
Isn't that the instruction set of the Itanium processor that isn't selling worth crap? I was under the impression that intel was going to eventually drop (or push to a back burner) support for this and go with x86-64 (the AMD 64 bit architecture being rolled out with the Opteron.)
Even more exciting is porting Linux to the N64 platform!
Oh please.
return (char *) ((((long) cp) + 15) & ~15);
is not portable.
return (char *) ((((size_t) cp) + 15) & ~15);
is much better.
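For comparison, here is a minimal sketch of the same round-up-to-16 idea using C99's `uintptr_t` from `<stdint.h>`. Where the implementation provides it, `uintptr_t` is the integer type meant to hold a pointer value, whereas `size_t` is only guaranteed to hold object sizes. The function name `align16` is made up for illustration; only `cp` comes from the snippets above.

```c
#include <stdint.h>

/* Round a pointer up to the next 16-byte boundary.
 * align16 is a hypothetical name used only for this sketch. The
 * integer/pointer round trip is implementation-defined, but it behaves
 * as expected on flat-address machines such as IA32 and IA64. */
char *align16(char *cp)
{
    uintptr_t addr = (uintptr_t) (void *) cp;  /* wide enough for a pointer */

    addr = (addr + 15) & ~(uintptr_t) 15;      /* clear the low four bits */
    return (char *) (void *) addr;
}
```

Widening the mask to `uintptr_t` before applying `~` keeps the whole expression in one unsigned type, so nothing depends on how an `int` constant gets promoted.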
Ah, porting to homogeneous isa but with a bigger word size. Funny how it's the same old issues over and over again. Structs change in size, bad assumptions about the size of things such as size_t, sizeof(void *) != sizeof(int) (though sizeof(void *) == sizeof(long) seems to be pretty good at holding true here), etc. Of course now there are concerns about misaligned memory accesses, which on IA32 was just a performance hit. Most IA32 types are not used to being forced to be concerned about this (of course many *NIX/RISC types are very used to this).
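A small sketch of the size assumptions mentioned above. The struct and function names are invented for illustration. Under ILP32 (Linux/IA32) `long` and pointers are 4 bytes; under LP64 (Linux/IA64) they are 8, so the struct below changes size and an `int` can no longer hold a pointer.

```c
#include <stdio.h>

/* Hypothetical struct whose size depends on the data model: the long
 * and the pointer are 4 bytes each under ILP32, 8 bytes each under LP64. */
struct record {
    int   id;
    long  offset;
    void *data;
};

/* Returns 1 if storing a pointer in an int would lose bits here. */
int int_truncates_pointer(void)
{
    return sizeof(int) < sizeof(void *);
}

/* Print the sizes that most often differ between 32- and 64-bit builds. */
void print_sizes(void)
{
    printf("int=%zu long=%zu void*=%zu size_t=%zu struct record=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *), sizeof(size_t),
           sizeof(struct record));
}
```

Calling `print_sizes()` from builds for both targets and comparing the two lines is a quick way to find out which of a program's assumptions no longer hold.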
When things were shifting from 16 to 32 bit (seems like just yesterday, oh wait, for M$ it was just yesterday), we had pretty much the same issues. Never had to do any 8 -> 16bit ports (since pretty much everything was either in BASIC, where it didn't matter, or assembler, which you couldn't "port" anyway).
Speaking of assembler, I guess the days of hand crafting code out of assembler are really going to take a hit if IA64 ever takes off. The assembler code would be so tied to a specific rev of EPIC that it would be hard to justify the future expense of doing so. It would be interesting to see what type of tools are available for the assembler developer. Does the chip provide any enhanced debugging capabilities (keeping writes straight at a particular point in execution, can you see speculative writes too?). It'd be cool if the assembler IDE could automagically group parallelizable (is that a word?) instructions together as you are coding.
It's always bugged me that there's no portable
way to print out most int-like datatypes.
I usually just cast them to long. So if I had
a pid_t, I'd print it like this:
printf( "%ld\n", (long int)pid );
The way it *should* work, if I were king of the
universe, would be:
printf( "%{pid}\n", pid );
printf( "%{uid_t}\n", getuid() );
etc.
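Short of being king of the universe, C99 offers a partial answer: cast to `intmax_t` and print with the `j` length modifier, which avoids assuming the value fits in a `long`. A minimal sketch follows; `format_pid` is a made-up helper name.

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a pid_t without assuming anything about its width.
 * pid_t is a signed integer type, so intmax_t can always hold it. */
int format_pid(char *buf, size_t len, pid_t pid)
{
    return snprintf(buf, len, "%jd", (intmax_t) pid);
}
```

For an unsigned type the same idea would use `uintmax_t` with `%ju`.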
After you reach a certain point longer isn't better, it becomes inconvenient. I'm sticking to my 32-inch architecture thank you.
-- Ed Avis ed@membled.com
Debian is already ported to the IA64 -- not sure about the number of packages ported yet, but I know they intend to release the new 3.0 (woody) with a IA64 port.
See here for more details
zadok.org.uk
In the article he mentions that itanic can execute IA32 code _and_ PA-RISC code natively, as well as its own, but these features will be taken away sometime in the future.
Does anyone remember the leaked benchmarks that showed the itanic executing IA32 code at roughly 10% of the speed of an equivalently-clocked PIII?
I wonder how it shapes up on PA-RISC performance?
It has to offer some sort of advantage over existing chips, or no one will buy it.
On the other hand, maybe its tremendous heat dissipation will reduce drastically when they remove all that circuitry for running IA32 and PA-RISC code.
Which leads me to think, why didn't they invest the time and money in software technology like dynamic recompilation, which Apple did very successfully when they made the transition from 68k to PPC?
I'm out of my tree just now but please feel free to leave a banana.
I don't see what is so obvious - isn't one of the selling points of Itanium its backward i386 compatibility? Even if running the 64-bit version of Linux it should still be possible to switch the processor into i386-compatible mode to execute some 386 opcodes and then back again. After all, the claim is that old Linux/i386 binaries will continue to work. Or is there some factor that means the choice of 32 bit vs 64 bit code must be made process-by-process?
Interesting question: which would run faster, hand-optimized i386 code running under emulation on an Itanium, or native IA-64 code produced by gcc? They say that writing a decent IA-64 compiler is difficult, and I'm sure Intel has put a lot of work into making the backwards compatibility perform at a reasonable speed (if not quite as fast as a P4 at the same clock).
-- Ed Avis ed@membled.com
I'm running on a very fat little 8-bit machine...
It's an old frustration I've had with Windows having to do with the time it takes to boot. Why can't they put Windows on an EPROM chip (perhaps on the motherboard, perhaps on a card) so that the OS is all in hardware? Booting would be so much faster.
Has anyone thought of doing this with Linux?
The examples he gives for usage of null pointers are both wrong. When a null pointer (whether written as 0 or NULL) is passed to a varargs function, it should be cast to a pointer of the appropriate type. See the comp.lang.c faq for details. The relevant questions are 5.4 and 5.6. But feel free to read them all!
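To illustrate the varargs point: in a variable argument list the compiler has no prototype to tell it a pointer is expected, so a bare `0` is passed as an `int`, which is narrower than a pointer on LP64. A minimal sketch, with `count_args` as a made-up function that walks its arguments up to a null-pointer sentinel:

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the char * arguments up to a null-pointer sentinel.
 * count_args is a hypothetical helper for illustration. */
int count_args(const char *first, ...)
{
    va_list ap;
    int n = 0;
    const char *s = first;

    va_start(ap, first);
    while (s != NULL) {
        n++;
        s = va_arg(ap, char *);  /* reads a full pointer-sized argument */
    }
    va_end(ap);
    return n;
}
```

The caller has to write the sentinel as `count_args("ls", "-l", (char *) NULL)`. With a plain `0` there, `va_arg` would try to read a pointer where only an `int` was passed.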
A while ago, I tried compiling and running my program (http://freespeech.sourceforge.net/overflow.html) on a Linux PPC machine and (to my surprise) everything went fine. Does that mean that it should work on ia64 too since (AFAIK) both are big-endian 64-bit architectures?
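Whether code carries over tends to depend on what it assumes about byte order and word size, not on which CPU it was last tested on. One common way to avoid the byte-order assumption is to decode multi-byte values explicitly instead of casting a buffer to an integer pointer. A minimal sketch, with `read_be32` as an invented name:

```c
#include <stdint.h>

/* Decode a 32-bit big-endian value from a byte buffer.
 * Works the same on big- and little-endian hosts because it never
 * reinterprets the buffer as a native integer. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) |
           ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8)  |
            (uint32_t) p[3];
}
```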
Opus: the Swiss army knife of audio codec
But I was going to tashi station to pick up some power converters!
This is a troll post. There are thousands more like it, but this one is mine.
When I was reading the article, the part about no Floating Point in the Kernel stuck out for me. Is this an absolute, or a "don't do it, it's bad"? I looked at the Mossberg presentation on the IA-64 kernel and it looked like they were using some of the fp registers for internal state, but it didn't look like all of them.
Derek
Don't Panic...
The Itanium consumes quite a lot of power compared to,
say, Hitachi SH4, ARM/Intel XScale, etc.
In the link Moshe Bar writes that Itanium has hardware support for both the IA32-legacy (I knew that) and HP's PA-RISC architecture (new to me). Does anyone know how much less power the Itanium would consume if those were to be dropped?
Since the IA32 core in Itanium is slow anyway, how much slower would it be to use software emulation like Apple did (i.e. emulating an MC680x0 with the PowerPC CPU)?
If you want a pure 64-bit environment, then go with SPARC or Alpha. AMD has a 64-bit version too. Intel has an advantage in keeping the i386 opcodes: it makes it easier to continue using 32-bit code, but there is a performance hit. If you need to use 32-bit code, then go with Itanium or AMD; if you need the raw power of 64-bit, then go with SPARC or Alpha. I cannot see any reason to code 32-bit when 64-bit Itanium is released to the desktop, unless you have legacy hardware that needs the support. My two cents: pure 64-bit would be best used on your database servers, which need the raw power to crunch all that data. You can still run your office apps off a 32-bit server, as there would be little performance gain on pure 64-bit. Remember, it's performance that matters, and what you are going to use the server or workstation for figures into the hardware. CAD developers would love pure 64-bit, and they have been using 64-bit SPARC for years because Intel bites on performance when crunching data like floating point, etc.
Even though Linux is blamed for setting back the state of computing by 10 years reinventing the VM and networking stack which works much better in FreeBSD than in Linux, Linux is still being ported to new platforms. Linux is like cockroaches---once you get them, you never get rid of them.
google is your friend.
Nuff Said
Free Unix? Free Windows. http://www.reactos.com
When I started messing about with computers, 8 bit chips were standard on the desktop and 4 bit in the embedded sphere.
Within four years 16 bit was the emerging standard for the desktop, and four more years after that 32 bit was emerging.
In the 12 years since then, well...
32 bit rules in both the desktop world and in the embedded world. Can someone tell me why we aren't on 128 bit chips or more by now? Why do 64 bit chips not make it - is this a problem of the physics of mobos or what?
Surely porting of applications written in C and other high level languages shouldn't be difficult. In that respect Itanium is nothing new, a 64bit little-endian architecture... Alpha, anyone?
64bit machines have been commercially available for at least 10 years, you'd think coders would have got used to writing 64bit clean software by now.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
nvidia already has drivers out for Linux/IA64 with some of their higher end cards (quadro line).
Check out ioquake3.org for a great, free, First-Person Shooter engine!
asked in sincerity: does this mean faster chips, or what?
visit the hwky website for a lyrical genius infusion.
And forget about the problem!
Mats
Gotcha! Coders, instead of pissing about how hard it is to write a good compiler, why do we not start the 64 BIT Assembler Project? It would not take much time to get some code out for testing. SourceForge could host this project, and perhaps IBM, SUN, INTEL, HP, AMD, REDHAT, SUSE, TURBO, MANDRAKE, CALDERA, etc. would provide support for such a project. This way there would be a good 64BIT compiler that had some agreed standards that would allow porting of code.
I would like to know if Linus Torvalds and the Linux kernel hackers have been asked about this issue of 32BIT and 64BIT code, because it is going to affect kernel development. Could 32BIT calls break the Linux kernel when mixed with 64BIT calls? Another question is on sockets and networking code and protocols. Would it not be better to make a clean 64BIT kernel and dump legacy 32BIT? The break, while painful at first, would be the best solution going forward. A compromise would be to make 32BIT mode a module that you could enable by a kernel recompile if you needed it, sort of like the IBCS emulation layer. I would hope we get some clarification from Linus Torvalds as to the Linux kernel roadmap regarding this, so that companies can plan the migration of their software and hardware to 64BIT. The goal should be well documented, good clean code that does not break the Linux kernel. Coders need a roadmap on how best to code 64BIT and how best to port legacy code to 64BIT.
There's not any conceptual difference in the high-level programmer's view (nor the C programmer's view for that matter) between IA64 and any other POSIX platform, 64-bit or otherwise, either. The code that breaks, unless it's really CPU-specific stuff, breaks because it was coded poorly. Most of the unportable code out there is really unwarranted.
Karma 39 and still posting at 0.
How do you do that? It'd be great for off-topic posts like this one (that should be modded to 0 anyway)
"Save the whales, feed the hungry, free the mallocs" -- author unknown
Java might be a good cross-platform development language, but I haven't seen a JRE for IA-64 at this point.
Is there a JRE for IA-64? How can Java bytecode be executed/interpreted on Itanium systems at this stage?
Does the IA-32 emulation work with a IA-32 JRE? If so, wouldn't the dual layers of Java and IA-32 emulation make it too slow to be practical?
First of all, IA-64 is now called IPF (Itanium Processor Family), although I've heard rumors that this is changing again, to a third name.
Although the initial acceptance of Itanium-based servers and workstations has been slow, there is little doubt that it will eventually succeed in becoming the next-generation platform.
Actually, as /. readers know, there have been
some doubts. Itanium is 5 years late. Right now Itanium ranks lowest in
SPEC numbers, and Itanium 2 (McKinley), while
it addresses some of the problems, can't expect
to compete with Hammer or Yamhill when it comes
to integer code.
For tight floating-point loops, Itanium 2 is great -- 4 FP loads + 2 FMAs per clock. But on integer code with lots of unpredictable branches, the entire IPF architecture leaves a lot to be desired. Speculation and predication were supposed to address that, but it is very hard for compilers to exploit speculation, and predication does not address issues such as the limitations of static scheduling.
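To make the two kinds of code concrete, here is an illustrative pair of loops in C. Both functions are examples chosen for this sketch, not taken from the article or from any benchmark. The first is the sort of tight floating-point loop that maps onto loads plus a multiply-add per element. The second is integer code whose control flow depends on the data at every step.

```c
#include <stddef.h>

/* Tight FP loop: each iteration is two loads, one multiply-add and a
 * store, with no data-dependent branches inside the loop body. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Branchy integer loop: which path runs next depends on the value just
 * computed, so little can be decided ahead of time by a static schedule. */
long collatz_steps(unsigned long v)
{
    long steps = 0;

    while (v > 1) {
        if (v & 1)
            v = 3 * v + 1;
        else
            v >>= 1;
        steps++;
    }
    return steps;
}
```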
(Also, Itanium 2 removes any benefit that the SIMD instructions had on Itanium, because on Itanium 2, SIMD instructions such as FPMA are split and issued to both FPU ports, negating any performance benefit they had on Itanium. So while Itanium can perform 8 FP ops per clock with FPMA, Itanium 2 can only perform 4 FP ops per clock. This does not look good for the future of IPF implementations. But Itanium 2's bigger memory bandwidth is probably more important than SIMD instructions anyway. Itanium 2 is built more for servers, while Itanium is built more for workstations, which might benefit from SIMD MMU instructions, although the rest of Itanium, and its price/performance, make almost anything else better.)
Superscalar processors with dynamic scheduling are improving much better than was expected during IPF's design (witness the P4 and AMD chips). So Itanium's static instruction scheduling design may be a liability more than an asset today. It puts considerable burden on the compiler.
The x86 emulation and stacked register windows take up a lot of real estate on the chip, which could be better used for something else.
The IA64 can be thought of as a traditional RISC CPU with an almost unlimited number of registers.
Nonsense!!! No CPU has unlimited registers. When writing code by hand or with a compiler, registers are a limited resource which are used up quickly.
And even though IPF has "stacked" general purpose registers which are windowed in a circular queue with a lazy backing store, these windows are of limited utility in real code. How many times does real code use subroutine calls which can take heavy advantage of register windows, before call branch penalties start to negate any benefit the windowing provides?
It's a great idea in theory, but windowing just adds to the complexity of the implementation, taking up real estate that could be better used elsewhere.
The IA64 has another very important property: It is both PA-RISC 8000 compatible and IA32 compatible. You can thus boot Linux/IA64, HP-UX 11.0, and Windows on an Itanium-powered box.
Absolutely false: PA-RISC emulation was dropped years ago, before the first implementation, although it was originally planned. Also, HP-UX 11.0, which is PA-RISC only, is not supported on IPF. Only HP-UX 11.20 and later are supported. HP-UX 11.22 is the first customer-visible release of HP-UX on IPF.
The endianism (bit ordering) is still "little," just like on the IA32, so you don't have to worry about that at all.
Misleading -- the endianism is still a part of the processor state (i.e. context-dependent). This means it can be both big and little endian, and can switch when an OS switches context. HP-UX, for example, is big-endian on IPF.
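Since byte order can differ between operating systems on the same hardware, code that really needs to know can check at run time instead of assuming. A minimal sketch; the function name is made up:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a 32-bit integer holds its
 * least significant byte (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* look at the first byte in memory */
    return first == 1;
}
```

Code that reads and writes external data one byte at a time usually does not need this check at all.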
The rest of the article had generic ANSI C programming tips which everyone knows already -- nothing specific to IPF.
"Intel's Itanium processors handle 64 bits, but the Pentium family handles 32 bits."
"The Hammer family of processors ... will be able to run conventional 32 bit applications ... as well as 64 bit applications"
The press announcements also got Intel to change its mind and start developing a new 32/64 bit combo chip.
1) RE: "(Back in the early '80s, nobody at Intel thought their microprocessors would one day be used for servers; the inherent architecture of the i386 family shows that clearly.)" What the heck is he talking about?
2) The article said that all instructions are assumed to work in parallel thus Explicitly Parallel (EPIC). Isn't that backwards? Wouldn't that be Implicit parallelism? I thought that you bundled instructions together to indicate that they could execute in parallel.
Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
The compiler most certainly can be beaten, even more so today than in the past. I haven't done much asm programming on RISC machines, but on x86, the stuff the compiler puts out is generally garbage. If you're using GCC, it's trivial to beat, as it doesn't know how to deal with the extreme lack of registers on the x86. GCC is constantly going to memory when, with some rearrangement, it's possible to keep many more things in registers. The Intel compiler is much better, but it still isn't hard to beat.
Furthermore, with the SIMD stuff in the newer x86 processors (MMX, SSE, SSE2), an asm programmer can get huge speedups which the compiler just doesn't know how to exploit. The Intel compiler will use these features in some instances, but far from optimally. Mind you, you have to know the processor well, and for the big wins, you have to optimize for a specific processor, but if you're doing computationally intensive stuff, the gains can be huge.
The Itanium isn't really a "faster chip". If you count clock cycles, it's actually slower than the mainstream. It gets its speed from better instruction scheduling, so that each clock cycle does more work. The architecture provides for this in several ways:
- The processor can assume that a sequence of instructions can all be executed simultaneously, unless it is told otherwise. This passes the burden of figuring this out from the processor (which has to do it on every run) to the compiler (which will have to do it only at compile time, and has a lot more information to work with too).
- The architecture is designed so that conditional jumps are needed less often. Conditional jumps interfere with efficient instruction scheduling by making it harder to predict which instructions should be executed next, so reducing the need for them is a good thing.
- There are lots of general-purpose registers, which means that fewer instructions are wasted on just shuffling data between the registers and main memory. This is an old trick, but the x86 architecture is even older... when I look at a compiled x86 program, about half of the instructions seem to be that kind of data shuffling. What makes it worse is that main memory is significantly slower than the processor.
Note that the first point creates a problem mentioned in the article: it relies on the compiler to determine which instructions can be executed together. It is difficult for a compiler to take full advantage of this, particularly when compiling C code. It's an interesting area for experimentation, though.
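The second point above (fewer conditional jumps) can be sketched in plain C. This only illustrates the idea; what a given compiler actually emits for either function is up to the compiler, and both function names are invented here.

```c
/* Branching version: a straightforward translation contains a
 * conditional jump around one of the two returns. */
int max_branch(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* If-converted version: the comparison produces a 0/1 value that acts
 * like a predicate selecting one of the two operands, so no jump is
 * needed. Predicated instructions express this shape directly. */
int max_select(int a, int b)
{
    int take_a = a > b;                    /* 1 if a wins, else 0 */

    return take_a * a + (1 - take_a) * b;  /* exactly one term is nonzero */
}
```

Both functions return the same result for every input. The second one just trades control flow for data flow.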
My openMosix software
Great PR.