Slashdot Mirror


Transmeta Code Morphing != Just In Time

Andy Armstrong has written us a pretty interesting, but somewhat technical, piece on Just In Time compilation, computer-generated assembly language, profiling, and more. It's a neat little bit that ought to give you a lot to talk about.

The following was written by Slashdot Reader Andy Armstrong

The recent Transmeta announcement crystallised something I've been thinking about for a while. It's my belief that it should be possible to make a compiler generate much better code in the general case than someone writing hand-coded assembler. Furthermore, it should be possible for a JIT (Just In Time) compiler to produce better code than a conventional one-time compiler.

Why should a compiler be better than an experienced assembler programmer? Well,

  1. the compiler can know the target processor intimately (cycle times, impact of instruction ordering, etc.)
  2. the compiler gets to re-write the entire program each time it sees it.

The second point is critical: any programmer writing an assembly language program of any significant size will write the code to be maintainable. Of course, for a human programmer it makes sense to do things like defining standard entry and exit sequences for routines, keeping a few registers spare (in those architectures that have more than a few) and following other practices that lead to maintainable code. The compiler, however, doesn't have to maintain the code it writes. It gets to write the whole thing from scratch every time. This means that functions can be inlined (and repeated code sequences turned into functions). Loops can be unrolled, then rolled back up at the next compile if the programmer decides that space is more important than speed.
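As a rough illustration of the unrolling mentioned above, here is a hypothetical Python example. In Python it shows the shape of the transformation rather than any speedup. A summation loop sits next to a 4-way unrolled version with a remainder loop, and a compiler is free to regenerate either form on every compile.

```python
def sum_rolled(values):
    """Straightforward loop: one element per iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled4(values):
    """Same result, with the loop body unrolled four times.

    Fewer loop-control steps per element; a remainder loop handles
    lengths that are not a multiple of four.
    """
    total = 0
    n = len(values)
    i = 0
    limit = n - n % 4  # end of the last complete group of four
    while i < limit:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    while i < n:  # remainder: at most three leftover elements
        total += values[i]
        i += 1
    return total
```

Both functions return 45 for `list(range(10))`. The unrolled form trades code size for fewer loop tests, which is exactly the space/speed choice a compiler can revisit each time.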

If you know you're not going to have to maintain a bit of code you can do some pretty scary things to get it to perform better. A compiler can potentially do that all the time.

Why should JIT be better than one-time compilation? This should be clear to anyone who's followed the Transmeta story. A key element of their code morphing technology is that they insert instrumentation into the code they generate, effectively profiling it as it runs so that the compiler can decide which bits to optimize the next time it sees the code. It's well known that the execution profile of a typical program looks extremely spiky: the most frequently executed code may get hit thousands or millions of times more often than its neighbours. It follows that it really isn't worth optimizing the stuff that only accounts for a millionth of the code's execution time.
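The counting idea can be sketched minimally, assuming a simple invocation threshold. The threshold value and the `optimize` callback are hypothetical and are not Transmeta's actual mechanism. Each function starts out in an instrumented form that counts its calls, and once it proves hot it is swapped for an "optimized" equivalent. Cold code never pays for optimization.

```python
import functools

HOT_THRESHOLD = 1000  # hypothetical: calls before a function counts as "hot"


def tiered(optimize):
    """Decorator: run an instrumented version until the function gets hot.

    `optimize` is a hypothetical stand-in for the optimizing compiler: it
    receives the original function and returns a faster equivalent.
    """
    def wrap(func):
        state = {"calls": 0, "fast": None}

        @functools.wraps(func)
        def dispatcher(*args, **kwargs):
            if state["fast"] is not None:
                return state["fast"](*args, **kwargs)  # optimized path
            state["calls"] += 1  # the "instrumentation": a call counter
            if state["calls"] >= HOT_THRESHOLD:
                state["fast"] = optimize(func)  # recompile only hot code
            return func(*args, **kwargs)

        dispatcher.profile = state  # expose the counters for inspection
        return dispatcher
    return wrap


def fast_square(func):
    # Stand-in "optimizer": returns a function equivalent to `func`.
    return lambda x: x * x


@tiered(fast_square)
def square(x):
    return x * x
```

After 999 calls `square.profile["fast"]` is still `None`. The 1000th call installs the optimized version, and later calls stop touching the counter.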

This brings me to my point: is there really any reason why a Java / JIT combination shouldn't result in code that executes as quickly as the equivalents in other languages?

You might suggest that garbage collection must slow things down, but I'm convinced that, done right, garbage collection can actually improve performance. malloc()/free() require the memory manager to think about the heap on every call, but new() can be implemented as a simple pointer bump, much like pushing onto a stack (m = memlimit; memlimit += size; return m). The garbage collector then gets to do all its memory management in one chunk: it can take an overview of the memory landscape rather than trying to keep things fairly optimal each time memory is released, as free() must.
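The allocation scheme above can be sketched as a toy model. It assumes a fixed-size arena and a collector that is simply told which blocks are live; a real collector would discover that by tracing from roots. Allocation is just a pointer bump, and all reclamation happens at once by sliding the live blocks down to the bottom of the arena.

```python
class BumpArena:
    """Toy model of a bump-pointer heap with one-shot compaction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memlimit = 0  # next free offset, named as in the pseudocode above
        self.blocks = {}   # offset -> size of each allocated block

    def new(self, size):
        """m = memlimit; memlimit += size; return m"""
        if self.memlimit + size > self.capacity:
            raise MemoryError("arena full; time to collect")
        m = self.memlimit
        self.memlimit += size
        self.blocks[m] = size
        return m

    def collect(self, live):
        """Compact the live blocks to the bottom of the arena in one pass.

        `live` is the set of offsets still reachable. Returns a mapping from
        old offsets to new ones so that references can be updated.
        """
        relocation = {}
        new_blocks = {}
        cursor = 0
        for old in sorted(self.blocks):  # preserve address order
            if old in live:
                size = self.blocks[old]
                relocation[old] = cursor
                new_blocks[cursor] = size
                cursor += size
        self.blocks = new_blocks
        self.memlimit = cursor  # everything above cursor is free again
        return relocation
```

The per-allocation cost is one comparison and one addition. The "thinking about the heap" is deferred to `collect`, where it is done once for all the garbage rather than once per `free()`.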

You could argue that the OO nature of Java means that it must dynamically allocate objects that would be static in a program written in C or assembler. That's true, but (assuming calls to new() are cheap, which I believe they can be) this really isn't a problem. Current processors don't take a huge performance hit when working with objects whose address is not known at compile time; in fact, in many architectures it makes no difference at all.

So, while it might seem profoundly counter-intuitive, can anyone actually give me a good reason why Java + JIT should be slower than Good Programmer™ + Assembler?

5 of 454 comments

  1. Compilers don't write better code than humans by tjansen · · Score: 5
    I have already heard that assumption that a compiler can generate better code than a programmer a thousand times, but it does not get more true by repeating it - it is false. At least until compilers are able to understand the program. In every program there are things the compiler simply doesn't know. For example, in x86 assembler it is possible to save some cycles (and memory accesses) by using 16-bit or even 8-bit registers instead of 32-bit registers. You can double the number of registers by doing it. To use these tricks you need to be sure that the values in the registers are below 65536 or 256; the programmer can know this, but the compiler can't. A really advanced compiler might profile the possible value range (I have never heard of any compiler actually doing this), but it cannot be sure, so it must at least check the values beforehand in case they are out of range.

    As long as the programmer has more knowledge than the compiler, he will always find tricks to save an instruction here or there and outperform the compiler this way. You can find a great example of such programming tricks in the PCGPE's article about texture-mapping inner loops.
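Python has no registers, so this sketch uses storage width as a stand-in for the narrow-register trick tjansen describes. It assumes a compiler that wants to specialize on a value range, and `pack_values` is a hypothetical helper. Before using a compact 8-bit or 16-bit representation, the generated code must check that every value really fits and fall back to a wider type otherwise. A programmer who knows the range in advance can skip that check.

```python
from array import array


def pack_values(values):
    """Store values in the narrowest array type that is provably safe.

    A compiler without range knowledge has to pay for these guards at run
    time; a programmer who knows every value is below 256 can skip them.
    """
    if all(0 <= v < 256 for v in values):    # guard: fits in 8 bits
        return array("B", values)            # unsigned char, 1 byte each
    if all(0 <= v < 65536 for v in values):  # guard: fits in 16 bits
        return array("H", values)            # unsigned short, typically 2 bytes
    return array("q", values)                # fallback: signed 64-bit (assumed to fit)
```

The guards are the run-time cost the comment refers to. Each costs a full pass over the data, and they are needed only because the range isn't known ahead of time.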

    1. Re:Compilers don't write better code than humans by locust · · Score: 5
      "I have already heard that assumption that a compiler can generate better code than a programmer a thousand times, but it does not get more true by repeating it - it is false."

      The figure I was given in my university computer architecture class was that a compiler can generate better code than 90% of assembly programmers. That makes sense when I examine higher-level code written by a variety of programmers, some good, some bad. Some people know all the details of a language and some don't. Some pay attention and some don't. I'm pretty sure it was backed up by a reference to an IEEE paper somewhere.

      If large numbers of humans could reliably write large volumes of high-quality, optimized code at high speed, we'd still be using assembly everywhere. But they can't. I certainly don't have the time or the inclination to learn all the ins and outs of every instruction set architecture I work on to squeeze every last bit of juice out of the machine. And I am certainly unwilling to give up the expressiveness of a high-level compiled language.

      But ultimately, the fraction of people who need to write things like texture-mapping loops is very small, and the fact of the matter is that the loop is probably embedded in some higher-level piece of code. So relax: compilers reliably produce better code than the majority of assembly programmers (maybe not on /.). Or, more precisely, better machine code than the majority of programmers would produce if they were asked to write assembly. --locust

  2. Re:Java Byte Code by arivanov · · Score: 5

    No, it will not.

    Think of the Java licensing and control issues, discussed widely on Slashdot. Actually, I shall retract this statement if one of the MAJOR league players accepts Crusoe as the primary CPU for at least one machine class.

    Otherwise, it is obvious that it could be thy Java machine, but that is the least likely thing for it to become.

    I think Crusoe will actually make a reality of something else that is much cooler than Java. It will make thy developer's dream real: the coat of many colors, the affordable machine of many architectures.

    Even now, on the basis of Crusoe, you can build a machine that can happily emulate:

    Mac, Sun (lower end), IBM PPC (lower end), SGI (lower end), Alpha and, curse it, x86.

    1. All of these are PCI-based.

    2. The differences in chipsets can be ignored under the "one OS to rule them all, one OS to find them, one OS to bring them all and in (oh well, cutting out the darkness) bind them" principle. It can have drivers for the chipset and peripherals in question.

    3. You may actually do the reverse and develop drivers for the peripherals and the chipset for all the platforms in question (not a hell of an effort, actually quite achievable). If the peripherals are something like Adaptec, Tulip and a PCI VGA card, the drivers are basically there already, so you only have the chipset left. Actually, you can intercept accesses to it and emulate chipset behaviour if you so desire.

    Overall, the developer's dream can become a reality for just a few bucks, about the price of a PC (excluding licensing for OSes and software, of course ;-)

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/

  3. Isn't this REALLY the smarter compiler argument? by stevew · · Score: 5

    Years ago I worked at a company that was going to bring VLIW to the commercial world. One of the things they had to invent was MUCH smarter compiler technology. That was in 1985, so the basic ideas involved here are at least that old.

    Anyway, when the compiler DOES know the hardware, especially on VLIW machines, it can indeed do a job as GOOD as a human's. (Look, it won't usually do a better job: when the guys who designed the machine write code, they're going to do what the compiler is trying to emulate.) But from what I remember, static profiling was considered "good enough" back then if the VLIW machine was built right.

    Most codes spend most of their time in innermost loops. If those loops can be rolled up correctly for VLIW, you can be executing different iterations of the loop within the same instruction word. Being able to do this is usually the largest performance payoff. If you have a GOOD compiler in the first place, one that KNOWS the hardware, then JIT and morphing aren't needed and don't help much.

    Where I see morphing being a step forward is that VLIW machines were considered architectural dead ends until now. If I use one of these smart compilers for a machine with, say, 7 functional units, the code it emits won't work on a machine with 8 functional units, or at least won't be optimized for it any longer. These machines didn't scale well until now! Code-morphing technology completely removes this limitation!
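The overlapping of loop iterations that stevew describes is usually called software pipelining. A toy schedule can illustrate it under two hypothetical simplifications: every stage takes one cycle, and each stage has its own functional unit. In the steady state, every cycle issues stage s of iteration (cycle - s), so several iterations are in flight within one instruction. A machine with a different number of units would want a differently shaped schedule, which is the scaling problem mentioned above.

```python
def pipeline_schedule(n_iters, stages):
    """Return, per cycle, the (stage, iteration) pairs issued together.

    Toy software-pipelining model: each loop iteration is split into
    one-cycle steps named in `stages`, each step runs on its own
    functional unit, and a new iteration starts every cycle.
    """
    schedule = []
    for cycle in range(n_iters + len(stages) - 1):
        bundle = []
        for s, name in enumerate(stages):
            it = cycle - s  # the iteration that is in stage s this cycle
            if 0 <= it < n_iters:
                bundle.append((name, it))
        schedule.append(bundle)
    return schedule


for cycle, bundle in enumerate(pipeline_schedule(4, ["load", "mul", "store"])):
    print(cycle, bundle)
```

With 4 iterations and the stages load, mul and store, the loop takes 6 cycles instead of 12. Cycle 2 issues `("load", 2)`, `("mul", 1)` and `("store", 0)` together.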

    --
    Have you compiled your kernel today??

  4. Re:Profoundly counterintuitive? by BurdMan · · Score: 5
    Firstly, water does sink into rocks. It then freezes and the pressure it exerts reshapes the world as we know it. But that is beside the point.

    I will address your comments with respect to the Java Programming Language and its HotSpot compiler technology. If you would like to enlighten yourself on the techniques behind HotSpot, take a look at http://self.sunlabs.com. Transmeta, IMHO, was influenced by the same concepts when designing its code morphing techniques.

    1. "A JIT is rushed" - The first time bytecode is compiled to native code, it must be done quickly to avoid delay. That code may be inefficient, true, but it can also be instrumented with profiler-like information that can be used by later passes of the compiler. You see, what the author is describing is not a JIT system but a dynamic optimizing compiler, which has the opportunity to study the program during execution and recompile parts or all of that program based on that information.
    2. "no leeway for boundary conditions" - Well, here again you seem a bit confused, and your example is naive. The native representation of a type is not fixed in Java, only the conceptual representation. To be a compliant Java implementation, all results generated using 'int' (to use your example) must be consistent with the conceptual representation of 'int' under the various operations. If the optimizing compiler finds a more efficient way to represent an 'int', it is free to use it as long as the results of the operations don't change.
    3. "garbage collection is slower than hand coded memory management" - You really ought to read some of the latest garbage collection papers. This is no longer true, even for collectors managing C or C++ allocation. In an environment like Java, which doesn't allow pointers and other antiquated memory techniques, a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand-coded attempt on all but the simplest of applications.
    4. "OOP and dynamic dispatch are inefficient" - Again, I urge you to read the papers at the Self site listed above. The way the Self, and now HotSpot, compilers work eliminates this bottleneck. I'll admit that in early implementations of Smalltalk and Objective-C, dynamic dispatch did add a small amount of overhead. Take a look at the documents on HotSpot and then tell me that dynamic dispatch is a problem.
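Here is a rough sketch of the inline-caching idea from the Self work, in a simplified monomorphic form. The class and method names are hypothetical, and real implementations patch machine code at the call site rather than storing Python attributes. Each call site remembers the receiver class it last saw and the method that class resolved to, so repeated calls on the same class skip the full method lookup.

```python
class CallSite:
    """A single call site with a monomorphic inline cache."""

    def __init__(self, selector):
        self.selector = selector   # method name called at this site
        self.cached_class = None   # receiver class seen last time
        self.cached_method = None  # method that class resolved to
        self.misses = 0            # number of full lookups performed

    def invoke(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:  # cache miss: do the slow lookup
            self.misses += 1
            self.cached_class = cls
            self.cached_method = getattr(cls, self.selector)
        return self.cached_method(receiver, *args)  # hit: direct call


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rect:
    def __init__(self, w, h):
        self.w = w
        self.h = h

    def area(self):
        return self.w * self.h


site = CallSite("area")
shapes = [Square(1), Square(2), Square(3), Rect(2, 3)]
total = sum(site.invoke(shape) for shape in shapes)
```

Here `total` is 20, and the four calls need only two full lookups, one per receiver class. Most real call sites see a single class, which is why this cache removes most of the dispatch overhead.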


    In summary, you are more than welcome to use assembly all you want. Code on, my brother! But please, before you slam some other method, try doing the smallest amount of research first. Maybe your snap intuition is wrong. You never know.

    As far as Transmeta goes, it has a lot of the HotSpot/Self-style technology, and I personally think that technology is the future. I can't wait to get my hands on a Crusoe-powered product.

    -BurdMan