Transmeta Code Morphing != Just In Time

Posted by CmdrTaco on Thursday January 27, 2000 @02:12AM from the stuff-to-read dept.

Andy Armstrong has written us a pretty interesting, but somewhat technical, piece on Just In Time compilation, computer-generated assembly language, profiling, and more. It's a pretty interesting little bit that ought to give you a lot to talk about.

The following was written by Slashdot Reader Andy Armstrong

Transmeta Code Morphing != Code Morphing

The recent Transmeta announcement crystallised something I've been thinking about for a while. It's my belief that it should be possible to make a compiler generate much better code in the general case than someone writing hand coded assembler. Furthermore it should be possible for a JIT (Just In Time) compiler to produce better code than a conventional one time compiler.

Why should a compiler be better than an experienced assembler programmer? Well,

the compiler can know the target processor intimately (cycle times, impact of instruction ordering, etc.)
the compiler gets to re-write the entire program each time it sees it.

The second point is critical: any programmer writing an assembly language program of any significant size will write the code to be maintainable. Of course, it makes sense to do things like defining standard entry and exit sequences for routines, keeping a few registers spare (in those architectures that have more than a few) and following other practices that lead to maintainable code, but the compiler doesn't have to maintain the code it writes. It gets to write the whole thing from scratch every time. This means that functions can be inlined (and repeated code sequences turned into functions). Loops can be unrolled, then rolled back up at the next compile when the programmer has decided that space is more important than speed.
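To make the unrolling point concrete, here is a minimal Python sketch. The function names are made up for illustration, and a real compiler would apply this to machine code rather than source. Both functions compute the same sum; the second is roughly the 4-way unrolled shape a compiler could emit when speed matters and throw away again when space does.

```python
def sum_rolled(values):
    """Straightforward loop: one element per iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled4(values):
    """Same result, but the main loop handles four elements per iteration."""
    total = 0
    n = len(values)
    i = 0
    # Main unrolled body: fewer loop-control steps per element.
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Remainder loop for the last 0-3 elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

Nobody would want to maintain the second version by hand, which is the article's point: the compiler never has to.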

If you know you're not going to have to maintain a bit of code you can do some pretty scary things to get it to perform better. A compiler can potentially do that all the time.

Why should JIT be better than one time? This should be clear to anyone who's followed the Transmeta story. A key element of their code morphing technology is that they insert instrumentation into the code they generate, effectively profiling as it runs so that the compiler can decide which bits to optimize the next time it sees the code. It's well known that the coverage graph for a typical programme looks extremely spiky - the most frequently executed code may get hit thousands or millions of times more than its neighbours. It follows that it really isn't worth optimizing the stuff that only accounts for a millionth of the code's execution time.
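The idea of instrumenting code and then optimizing only the spikes can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Transmeta does it: the `Profiler` class, the block names and the 90% threshold are all invented for illustration.

```python
from collections import Counter


class Profiler:
    """Counts how often each named block of code is executed."""

    def __init__(self):
        self.hits = Counter()

    def record(self, block_name):
        # In a real system this would be a counter increment emitted
        # into the generated code; here it is an ordinary method call.
        self.hits[block_name] += 1

    def hot_blocks(self, share=0.9):
        """Return the most-executed blocks that together cover `share` of all hits."""
        total = sum(self.hits.values())
        chosen, covered = [], 0
        for name, count in self.hits.most_common():
            if covered / total >= share:
                break
            chosen.append(name)
            covered += count
        return chosen


# A spiky profile: one block dominates, the rest barely register.
profiler = Profiler()
for _ in range(100_000):
    profiler.record("inner_loop")
for _ in range(10):
    profiler.record("startup")
profiler.record("error_path")
```

With a profile this lopsided, `profiler.hot_blocks()` picks out only `inner_loop`, so that is the only block worth the optimizer's time on the next pass.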

This brings me to my point: is there really any reason why a Java / JIT combination shouldn't result in code that executes as quickly as the equivalents in other languages?

You might suggest that garbage collection must slow things down, but I'm convinced that, done right, garbage collection can actually improve performance. malloc()/free() require the memory manager to think about the heap for every call, but new() can be implemented as a stack (m = memlimit; memlimit += size; return m) and the garbage collector gets to do all its memory management in one chunk - it can take an overview of the memory landscape rather than trying to keep things fairly optimal each time memory is released as free() must.
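The (m = memlimit; memlimit += size; return m) scheme above can be modelled in Python as follows. This is only a sketch: the `BumpArena` class is hypothetical, offsets stand in for addresses, and `collect_all` is a placeholder that assumes nothing is still live, whereas a real collector has to find and keep the live objects.

```python
class BumpArena:
    """Toy model of the article's new(): allocation is a pointer increment."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memlimit = 0  # next free offset, named after the article's variable

    def new(self, size):
        if self.memlimit + size > self.capacity:
            raise MemoryError("arena full: a collector would run here")
        m = self.memlimit
        self.memlimit += size
        return m

    def collect_all(self):
        # Stand-in for a collection pass; this toy simply assumes nothing
        # is still live and reclaims the whole arena in one step.
        self.memlimit = 0
```

For example, on `arena = BumpArena(64)` the calls `arena.new(16)` and `arena.new(8)` return offsets 0 and 16 with no searching of free lists, which is where the claimed speed advantage over malloc()/free() would come from.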

You could argue that the OO nature of Java means that it must dynamically allocate objects that would be static in a program written in C or assembler. That's true, but (assuming calls to new() are cheap, which I believe they can be) this really isn't a problem. Current processors don't take a huge performance hit when working with objects whose address is not known at compile time; in fact in many architectures it makes no difference at all.

So while it might seem profoundly counter-intuitive, can anyone actually give me a good reason why Java + JIT should be slower than Good Programmer^tm + Assembler?

32 of 454 comments

Re:Evolve code. by DQuinn · 2000-01-27 00:50 · Score: 4

Sorry... this post is probably meant to be deeper inside this thread.

Are we not all just dancing around the idea of a human compiler with a vast database of algorithms, intuition, look-ahead properties, and an abstracted idea of what it is exactly that the program is doing? I.e., there is no compiler, to my knowledge, which writes highly effective parallel code. Why is that? Simply because no compiler can really understand the overall problem.

How long do you think it will be before the all-powerful AI compiler is written, with all of these abilities, and more, on a box that runs at a few GHz?

Or maybe I'm just being a little too science fictiony here :)

Cheers,
D

--
os.system("perl -e 'print \"My first Python Script.\"'")
Re:Compilers dont write better code than humans by Junks+Jerzey · 2000-01-26 23:15 · Score: 3

You're right, of course. No matter what some people have been trying to say for 40 years or more, you can almost always write better code than a compiler. This is especially true when you're dealing with something bigger than a single function. The usual way of showing that a compiler is better than a human is by using one smallish C function as an example. That's a pointless example, because the benefit comes from analyzing that function in context and not on its own.

The point of diminishing returns comes into play quickly, however. For example, take the renderer for most any fast 3D game. If you went into the routine that passes triangles to the graphics card, a frequent hot spot, and added a pointless call to a supposedly expensive function, like sqrt, you're not going to notice an effect on frame rate. Doing a higher level optimization, like removing a single polygon from a model, is going to be more of a benefit than optimizing the polygon code, but it's still not going to be noticeable. Sometimes you can get big benefits by using algorithms that look more complicated, ones you wouldn't want to approach in assembly, even though they use more code.

With complex programs, it's even conceivable for an interpreted language to outrun a compiled one, because everything comes down to architecture and an understanding of the problem. This hasn't been the case with Java, because Java is a fairly low level language (the more abstract a language is, the less win from compilation) and because Java has become entrenched in the "learn programming in 14 days" market of web designers turned programmers.

So, yes, an assembly programmer can outrun most any compiler. But does it matter? Almost never.
Evolve code. by Anonymous Coward · 2000-01-26 21:28 · Score: 3

"In the general case", perhaps a compiler will produce better code (and certainly not, as yet, for specialised cases - an inner loop coded direct to PPC macro assembler (PASM on Amiga PPC) is the fastest code I've ever written), but humans still write compilers. Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human could write, but it'll probably depend on some weird side effect of some obscure instruction, or the resonant frequency of your RAM bus, or something...

Cool article, all the same though. I'd love to see more of this sort of thing on Slashdot, like back in the old days.
1. Re:Evolve code. by Anonymous Coward · 2000-01-26 21:44 · Score: 3
  
  For deeply pipelined, aggressively superscalar architectures, I fail to believe that a human being could possibly solve the difficult scheduling and graph-theoretical problems that optimal instruction ordering requires better than a mechanistic compiler. Current architectures are simply too complex and subtle for a project of non-trivial size to be more efficiently coded by humans at such a low level.
  Never mind, of course, that for real projects, the trade off in maintainability, portability, and extensibility is almost not worth any demonstrable performance gain anyway.
2. Re:Evolve code. by Anonymous Coward · 2000-01-26 22:15 · Score: 3
  
  Now, if they were to force-evolve code using a genetic algorithm, you might get code better than any a human could write
  
  Of course, you'd have to put up with the fact that this would not be executable in Kansas.
Re:Self-Modifying Code by ZoeSch · 2000-01-26 23:23 · Score: 4

Nevertheless I'd like to point out that JIT is based on a somewhat dangerous technique: a program that alters (its own) code. I believe this technique was used in the eighties to scare off hackers by making code incomprehensible and hard to disassemble until the program was actually running. Also (even on a good old 6502 processor) it's possible to make some speed improvements peeking and poking into the code you're actually executing.

Not at all true... self modifying code is still around not only in JIT compilers but in interpreted languages and even device drivers (Or how do you think some device drivers adjust in real time to hardware changes?). And it's not a dangerous technique in itself, but (As usual) when executed improperly it can be a wild beast (As most people discovered when they installed their first Cyrix and AMD 486 processors and discovered most x86 code wouldn't run properly because of cache integrity issues)

When compilers for Microcomputers got faster and most processor architectures (known as Harvard architecture, I believe) explicitly require a division of RW and RO memory segments, self-modifying code was abandoned...

First of all, Harvard machines require separate instruction and data pipelines and memory spaces, and 99% of the CPUs on the market (General, Embedded, etc.) are Von Neumann machines which use a single space to store instructions and data. The thing is, newer CPUs (Even if they're Von Neumann designs) have separate caches for data and instructions and most self modifying code violates the cache integrity (See above) because the code is modified in the cache but never stored back into memory (Unless you use a write-through cache design which is impractical from the performance point of view).

BUT (And that's a big BUT :) if your JIT VM forces periodic cache writes to RAM (Not every time but enough times to ensure some sort of coherence if the cache and RAM are out of sync) you lose a bit of performance but gain a stabler setup.
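The stale-code problem described above can be pictured with a toy model in Python. Everything here is an assumption made for illustration (the `ToySplitCacheMachine` class, string "instructions", a dictionary as the data cache); real processors differ a lot in how they handle this, so treat it as a picture of the problem rather than of any particular chip.

```python
class ToySplitCacheMachine:
    """Toy model: stores land in a write-back data cache, fetches read RAM."""

    def __init__(self, program):
        self.ram = list(program)
        self.data_cache = {}  # address -> value not yet written back

    def store(self, address, value):
        # Write-back behaviour: RAM is not updated until flush().
        self.data_cache[address] = value

    def fetch_instruction(self, address):
        # The instruction side does not look in the data cache.
        return self.ram[address]

    def flush(self):
        # Write every pending store back to RAM.
        for address, value in self.data_cache.items():
            self.ram[address] = value
        self.data_cache.clear()


machine = ToySplitCacheMachine(["load", "add", "ret"])
machine.store(1, "sub")                   # the program patches its own code
stale = machine.fetch_instruction(1)      # still sees the old instruction
machine.flush()                           # the periodic write-back step
fresh = machine.fetch_instruction(1)      # now sees the patched instruction
```

In this model the fetch before the flush still returns the old instruction and the fetch after it returns the new one, which is the trade being described: each flush costs time, but it keeps what executes in step with what was written.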

Disassemblers for JIT won't be as complex as a JIT assembler just because when you disassemble a piece of code you treat it under the black box principle (What goes in and what goes out) in order to derive the fundamental principles and algorithms, which can be implemented in 1E999 different ways, so if you disassembled the less optimal morph, bad luck... disassemble another test run and see if you get a better implementation of the algorithm. Or you could use the aforementioned principle and implement your own algorithm.

JIT code/compilers, BTW are also being tested to produce self modifying chips (ASICs and FPGAs) under VHDL/Verilog using also NNets or Gen. Algorithms to obtain a first implementation and then using the JIT to optimize it...

Really interesting stuff (The Transmeta Crusoe) and my bet is that soon other companies will follow although not for the same reasons as Transmeta :)

ZoeSch

--
I hate to agree with davecrazy but...
Imperative languages are hard to reason by Chuan-kai+Lin · 2000-01-26 23:26 · Score: 3
I do agree that machines should spit better code than humans do, but we are just not there yet. When will we get there? Maybe tomorrow, maybe next month, maybe next year, maybe next century. Nobody knows. Suppose we do get there tomorrow, I believe it will not be Java + JIT that made it, for the following reasons:
1. Java is an imperative language, and imperative languages are very hard to reason about because they lack referential transparency. Yes, I know that it is a known fact in computation theory that it is impossible to reason about arbitrary programs, but the fact is that we do not write arbitrary programs. Imperative programs with destructive updates are hard to reason about, and as a result we are almost always restricted to local optimizations. Local optimizations are great, but I am afraid that they alone will not be enough to break the barrier.
2. Vast amounts of information are lost once the source program is compiled into bytecode, which makes effective optimization much harder. You can tell the intention of the programmer from the source code, but it is hard to do so from machine instructions (or even preprocessed source code, as anyone who has read the nVidia XFree86 source code must agree). The JIT compiler must create code that acts exactly the same as the bytecode, and that restriction severely reduces its options.
If that breakthrough were to come tomorrow, I believe it is most likely to come from the functional programming people. I believe that is the way to go for the future.
JIT by any other name by TheDullBlade · 2000-01-27 01:16 · Score: 3

"1...not a JIT system but a dynamically optimizing compiler" a slow start is a big problem (which is why they are emphasizing long-term server programs), but that's not the main issue. There's no reason that profile-based optimization can't be applied to C programs. That there's an experimental JIT (yes, it's still a JIT) that's ahead of the C compilers in common use is no reason to claim that the JIT strategy is better.

"2...as long as the results of the operation don't change" Which means that if the representation is different in a meaningful way they have to check whether the results of the operation change, reducing the efficiency, which was my point.

"3...on all but the most simple applications" But I rarely do anything but the most simple applications. I allocate an array, then I free an array. I do this for hashes, trees, stacks, queues, and pretty much any data structure I use. Only when I am being extremely lazy will I allocate and free objects recursively. Modern CPUs with slow main memory and fast cache like it when you keep all the data together, so I generally do so. I also try to access it linearly, which they also like. The point holds.

Don't try to compare the most primitive and unoptimized methods of custom memory management to the most sophisticated garbage collection algorithms. Remember, too, that a person doing custom memory management can write the same sophisticated garbage collector that Java would use.

And don't confuse "primitive" with "fundamental". Pointers are fundamental. Having to analyse reverse polish code function to prevent dangerous operations, rather than making it impossible to construct operations you don't want, is primitive.

"4...then tell me that dynamic dispatch is a problem." Okay, dynamic dispatch is a problem. Just because it can inline methods here and there doesn't mean you won't take an overall performance hit. Besides, that was only one example of the ways the lousy Java handholding slows things down.

The Sun marketing papers you linked me to are just more of the same old Java hype. They'll go on to "prove" that their new super JIT is better than anything you could code by hand by comparing optimized idiomatic Java with the same code translated poorly into unidiomatic C++.

Idiomatic C written by a competent optimizer will still blow the Java version out of the water.

JIT by any other name smells just as bad.

--
/.
Re:What about the JIT? by um...+Lucas · 2000-01-26 23:45 · Score: 4

I read that the 700 MHz Crusoe chip actually only attains the performance of a 500MHz Pentium (II or III, I forgot - oh, can Crusoe emulate SIMD? Not like it matters, but I'm wondering).

So there you have it - code morphing, or just in time compiling or whatever gives you about 70% of the performance of precompiled code.

That's not to downplay Transmeta's theoretical accomplishment. Those extra 200MHz go to things like monitoring its VM, throttling the clock, etc. So in the end you lose 30% of the performance but make it up with 35X less power consumed.
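For what it's worth, the arithmetic behind the "about 70%" figure is just the ratio of the two clock speeds quoted above. The numbers below are the ones from this comment (which are themselves second-hand), not measurements.

```python
# Clock speeds as quoted in the comment; treat them as unverified inputs.
crusoe_mhz = 700
equivalent_pentium_mhz = 500

# Performance per clock relative to the Pentium the Crusoe is said to match.
relative_performance = equivalent_pentium_mhz / crusoe_mhz
overhead = 1 - relative_performance

print(f"relative performance: {relative_performance:.0%}")
print(f"clock-for-clock overhead: {overhead:.0%}")
```

That works out to roughly 71% and 29%, so "70%" and "30%" are fair roundings if the quoted clock speeds are right.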

It's a great trade off for laptops, handhelds, etc. But not for workstations, servers, etc...

I still don't like the idea that they're keeping the instruction sets closed. It would seem like if someone out there wanted to port GCC to Crusoe's native instructions, that would be good... But they just don't want to be perceived as being at all incompatible with Intel, I guess.
Sun license? by PapaZit · 2000-01-26 21:37 · Score: 3

Doesn't Sun have a patent on hardware that can run java bytecode natively?

I wonder where Transmeta-ish devices fall. I'd call it software, but would Sun's lawyers?

--
Forward, retransmit, or republish anything I say here. Just don't misquote me.
Compilers dont write better code than humans by tjansen · 2000-01-26 21:38 · Score: 5

I have already heard the assumption that a compiler can generate better code than a programmer a thousand times, but it does not get more true by repeating it - it is false. At least until compilers are able to understand the program. In every program there are things the compiler simply doesn't know. For example, in x86 assembler it is possible to save some cycles (and memory accesses) by using 16-bit or even 8-bit registers instead of 32-bit registers. You can double the number of registers by doing it. You need to be sure that the values in the registers are below 65536 or 256 to use these tricks, and the programmer can know this, but the compiler can't. The compiler might profile the possible value range if it is a really advanced compiler (I have never heard of any compiler actually doing this), but it cannot be sure, so it must at least check the values first in case they are out of range.
As long as the programmer has more knowledge than the compiler, he will always find tricks to save an instruction here or there and outperform the compilers this way. You can find a great example of such programming tricks in the PCGPE's article about texture mapping inner loops here.
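The register-halving trick and the range check it depends on can be mimicked in Python by packing two 16-bit values into one 32-bit word. This is a loose analogy rather than real x86 code, and all the names (`fits16`, `pack`, `unpack`, `add_pairs`) are invented for the sketch. Note that in this toy the guard costs more than the trick saves; the point is only that the packed path is valid when the range is known and needs a check when it is not.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def fits16(value):
    """True if the value can live in a 16-bit lane."""
    return 0 <= value <= MASK16


def pack(lo, hi):
    """Put two 16-bit values into the low and high halves of a 32-bit word."""
    return (hi << 16) | lo


def unpack(word):
    """Split a 32-bit word back into its (low, high) 16-bit halves."""
    return word & MASK16, (word >> 16) & MASK16


def add_pairs(a, b):
    """Add (a_lo, a_hi) and (b_lo, b_hi) lane by lane."""
    sums = (a[0] + b[0], a[1] + b[1])
    if all(fits16(v) for v in (*a, *b, *sums)):
        # Fast path: one 32-bit addition handles both lanes at once.
        return unpack((pack(*a) + pack(*b)) & MASK32)
    # Fallback: a lane would overflow, so use full-width arithmetic.
    return sums
```

A programmer who knows the values never leave the 16-bit range can drop the guard entirely; a compiler that cannot prove it has to keep the check, which is the cost being argued about here.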
1. Re:Compilers dont write better code than humans by Hard_Code · 2000-01-27 02:15 · Score: 3
  
  Right. I've always thought an ounce of design is worth a pound of optimization ;)
  
  Leave the design to the humans...leave the optimization to the compilers.
  
  Jazilla.org - the Java Mozilla
  
  --
  
  It's 10 PM. Do you know if you're un-American?
2. Re:Compilers dont write better code than humans by BinxBolling · 2000-01-27 02:36 · Score: 3
  You need to understand the code fragment you are coding to do the most useful optimizations, which are algorithmic ones. A badly coded better variant of an algorithm will beat a well coded but inherently slower version.
  Absolutely. But one need not be programming in assembly to do the sort of algorithmic optimizations you speak of. In fact, in higher-level languages, where source code content is determined largely by the algorithm being implemented (rather than by the low-level details of the implementation, as in assembly language), you have a better chance of 'seeing' and successfully implementing a faster algorithm. This sort of thing is one of the big reasons why assembler loses in the long run.
  It seems to me that there are two basic types of optimization task:
  
  Choosing the fastest algorithm for a particular task.
  
  Making a given algorithm run as fast as possible on a given piece of hardware.
  
  Of course, this is a bit of an oversimplification, since it assumes that the fastest algorithm for a given task is going to be the same on all platforms. Usually, this is true, but probably there are a few minor cases where it isn't. But it works most of the time.
  At any rate, humans are much better than machines at the first type of optimization. But machines are much better than humans at the second (even if only because normal humans don't have the patience required to do the enormous amount of repetitive analysis that it requires).
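A small Python sketch of why the first kind of optimization usually dominates: it counts the steps taken by a linear scan and by a binary search over the same sorted list. The function names are made up for the example, and step counts are used instead of timings so the comparison does not depend on the machine.

```python
def linear_search_steps(items, target):
    """Number of elements examined by a front-to-back scan."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Number of probes made by binary search on a sorted list."""
    lo, hi, steps = 0, len(items), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps
```

On a sorted list of 100,000 items, looking for the last one takes 100,000 steps with the scan and at most 17 probes with binary search. No amount of instruction scheduling on the scan closes a gap like that, which is the division of labour being proposed: humans pick the algorithm, machines tune it.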
3. Re:Compilers dont write better code than humans by Mr.+Slippery · 2000-01-26 22:14 · Score: 4
  
  You need to be sure that the values in the registers are below 65536 or 256 to use these tricks, and the programmer can know this, but the compiler can't.
  Isn't this why we have short int, long int, and char data types - so we can tell the compiler these things?
  
  --
  Tom Swiss | the infamous tms | my blog
  You cannot wash away blood with blood
4. Re:Compilers dont write better code than humans by AndrewHowe · 2000-01-26 22:40 · Score: 4
  
  In some ways I agree with you, OK, I love writing assembler stuff.
  But I'm trying to separate fantasy from the reality here.
  The fantasy is that you're some super studly asm c0der who can not only produce properly scheduled code, but also has the time to *completely re-write* it:
  a) For each new processor
  b) For each sub-type of that processor with different internal resources
  c) Every time the specification changes - Maybe you have limited the range of a variable or something.
  Any of these changes will require a huge amount of work. Even a small change, coupled with the pipelined nature of modern processors, means a large modification to re-establish optimal resource usage.
  The reality is that no-one has the time to do this, and even if they do, they should be doing something else instead. High level optimizations almost always beat low level tweaking.
  In addition, I simply don't believe it's impossible to
  a) Tell the compiler in more detail how to optimise your code
  b) Have the compiler suggest ways in which it could produce better code (for example, "Is it OK to assume there is no aliasing of this item?" or "Hey, if I make this variable 8 bits in size I can do this huge optimization")
  c) Have the compiler work this stuff out for itself
  d) As (c) but at run time, guided by profiling information.
  Some of those are hard to do right now, but I don't see anything that would make them impossible.
  Does anyone?
5. Re:Compilers dont write better code than humans by Namarrgon · 2000-01-26 22:31 · Score: 3
  
  It's just like Deep Blue vs. Kasparov again. Brute force compilers vs. intuitive & creative humans. In some cases, humans can write better assembly than compilers (thanks to knowing more about what the program is intended to do), and in others the compilers have the advantage (thanks to being better at scheduling instructions on today's pipelined, superscalar CPUs). JIT compilers could have even more knowledge about instruction timing, but have less time to think about it. Kinda like Deep Blue playing blitz chess. I'd say in most cases these days, a compiler could write faster code from scratch than a human could, unless the human really spends some time on it, but the optimal combination will always be a human assisted by a compiler (and profiler, etc). Best of both worlds.
  
  Namarrgon
  
  --
  Why would anyone engrave "Elbereth"?
6. Re:Compilers dont write better code than humans by GnrcMan · 2000-01-26 22:57 · Score: 4
  
  That may be true of a crappy compiler on a lame platform (x86)...but I'd like to see you try reordering and slot scheduling Alpha instructions by hand to enable optimal pipelining and branch prediction.
  As someone who worked firsthand on a compiler for the Alpha, I can tell you that for 99% of the cases, the compiler can perform better general optimizations than a human. That 1% left over is what you should be concentrating your efforts on.
  Historically, the point of a CISC processor was to reduce compiler complexity. So of course hand coding optimal X86 assembly isn't going to be difficult.
  
  --GnrcMan--
7. Re:Compilers dont write better code than humans by locust · 2000-01-26 23:01 · Score: 5
  
  I have already heard the assumption that a compiler can generate better code than a programmer a thousand times, but it does not get more true by repeating it - it is false.
  The metric that I was given in my university computer architecture class was that a compiler can generate better code than 90% of assembly programmers. Which, when I examine higher level code written by a variety of programmers, some good, some bad, makes sense. Some people know all the details of a language and some don't. Some pay attention and some don't. I'm pretty sure it was backed up by a reference to an IEEE paper somewhere.
  If large numbers of humans could reliably write large volumes of high quality, optimized code, at high speed, we'd still be using assembly everywhere. But they can't. I certainly don't have the time or the inclination to learn all the ins and outs of every instruction set architecture I operate on to squeeze every last bit of juice out of the machine. And I certainly am unwilling to give up the expressiveness of a high-level compiled language.
  But ultimately, the fraction of people that need to write things like texture mapping loops is very small, and the fact of the matter is that the loop is probably embedded in some higher level piece of code. So relax, compilers reliably produce better code than the majority of assembly programmers (maybe not on /.). Or more precisely, better machine code than the majority of programmers would, if they were asked to write assembly.
  
  --locust
Profoundly counterintuitive? by TheDullBlade · 2000-01-26 21:42 · Score: 4

Seems profoundly intuitive to me.

First of all, a JIT is rushed. You can design your optimizing one-time compiler to look at the code for better ways to do it all day, if you like, and while the developers will moan, they will still use it if it gets them better results.

Secondly, Java has a very specific standard (that is typically fudged... but that's beside the point). It doesn't give any leeway for a program to act a little different in boundary conditions, like C does. In C, an int is whatever size int is most easily handled by the target system; if you need a certain exact size of int, you can code that in with some extra effort. In Java, an int is a Java int and Java doesn't care in the least what the most efficient native int format is.
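What Java pins down for int is the result: 32-bit two's complement arithmetic that wraps around on overflow. A minimal Python sketch of that required behaviour follows; the helper names `to_java_int` and `java_int_add` are invented here. Python integers are unbounded, so the wraparound has to be applied explicitly, which is roughly the extra step a machine with a different native word size would face.

```python
def to_java_int(value):
    """Reduce an arbitrary integer to Java's signed 32-bit int range."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def java_int_add(a, b):
    # Compute with whatever width is convenient, then normalise the result.
    return to_java_int(a + b)
```

So `java_int_add(2**31 - 1, 1)` has to come out as `-2**31` on every platform, whatever the hardware's preferred integer size happens to be.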

Thirdly... automatic garbage collection is less efficient than hand coded allocation and deallocation, and dynamic allocation is less efficient than static allocation. There are odd cases where this is false, but they generally hold true, and where they do not, the C version can always use the more efficient method anyway. (and extremely fast calls to "new" can be easily achieved... at roughly 50% memory wastage)

Fourthly: Java locks you into an OOP model which is inherently inefficient (at least as done in Java). All function calls must be dynamically redirected etc.

I could go on, but it feels like trying to describe that water is wet and stones sink in it to someone who thinks it's intuitive that water should sink into stones.

However, I will not dispute that you can definitely get better results by recompiling for each chip that something will run on. Some JITs may use this to push out ahead of one-time compilers.

BTW, an experienced assembly coder will always beat or at least equal the optimizing compiler, because if nothing else works he can always look at what the compiler produces and see if he can improve on that. Besides, optimizing compilers are good, but not that good, someone has to write them, and when was the last time that you wrote a program that can solve complex creative problems better than you can?

--
/.
1. Re:Profoundly counterintuitive? by BurdMan · 2000-01-26 22:31 · Score: 5
  Firstly, water does sink into rocks. It then freezes and the pressure it exerts reshapes the world as we know it. But that is beside the point.
  
  I will address your comments with respect to the Java Programming Language and its HotSpot compiler technology. If you would like to enlighten yourself on the techniques behind HotSpot take a look at http://self.sunlabs.com. Transmeta, IMHO, was influenced by the same concepts when designing its code morphing techniques.
  
  "A JIT is rushed" - The first time bytecode is compiled to native code it must be done quickly to avoid delay. That code may be inefficient, true, but it can also be instrumented with profiler like information that can be used by later passes of the compiler. You see, what the author is describing is not a JIT system but a dynamic optimizing compiler which has the opportunity to study the program during execution and recompile parts or all of that program based on that information.
  
  "no leeway for boundary conditions" - Well here again you seem a bit confused and your example is naive. The native representation of a type is not fixed in Java, only the conceptual representation. To be a compliant Java implementation all results generated using 'int' (to use your example) must be consistent with the conceptual representation of 'int' when using the various operations. If the optimizing compiler finds a way to represent an 'int' in a more optimal way it is free to do so as long as the results of the operation don't change.
  
  "garbage collection is slower than hand coded memory management" - You really ought to read some of the latest garbage collection papers. This is no longer true even for collectors managing C or C++ allocation. When given an environment like Java which doesn't allow pointers and other antiquated memory techniques a good dynamic compiler with a modern garbage collector is both faster and more efficient than any hand coded attempt on all but the most simple of applications.
  
  "OOP and dynamic dispatch are inefficient" - Again, I urge you to read the papers at the Self site listed above. The way the Self, and now HotSpot, compilers work eliminates this bottle neck. I'll admit in early implementations of Smalltalk and Objective-C dynamic dispatch did add a small amount of overhead. Take a look at the documents on the HotSpot here and then tell me that dynamic dispatch is a problem.
  
  In summary, you are more than welcome to use assembly all you want. Code on my brother! But, please before you slam some other method try doing the smallest amount of research first. Maybe your snap intuition is wrong. You never know.
  
  As far as Transmeta goes, it has a lot of the HotSpot/Self style technology and I personally think that technology is the future. I can't wait to get my hands on a Crusoe powered product.
  
  -BurdMan
about Java by jilles · 2000-01-26 21:45 · Score: 4

Hi,

What Transmeta does is very similar to Sun's HotSpot compiler for Java. Like Transmeta's code morphing, HotSpot compiles and optimizes bytecode on the fly using profiling data to optimize the parts that are executed most.

Your question as to why Java is still slower than C++ is answered in a series of JavaWorld articles (www.javaworld.com): http://www.javaworld.com/javaworld/jw-11-1999/jw-11-performance.html and http://www.javaworld.com/javaworld/jw-12-1999/jw-12-performance.html and http://www.javaworld.com/javaworld/jw-02-2000/jw-02-performance.html

Basically the problem is that stuff such as memory allocation, garbage collection, synchronization and runtime type checking have a performance price (you get a safer programming environment in return). The articles discuss these performance issues and give some useful advice on avoiding these bottlenecks.

--

Jilles
Ultimate optimisation needs better hints by dingbat_hp · 2000-01-26 21:47 · Score: 3

when the programmer has decided that space is more important than speed.

That would seem to be one of the issues where manual coding has the edge. I agree with your general point, but I think there's still work to do before auto-generation is perfect.

Imagine a case where an algorithm could easily optimise either way speed/space - maybe there's a hash table that's going to hold some programmer-controlled depth of the initial search and allowing it to expand would make usage quicker, but eat memory. A human coder would probably know the pattern of usage this routine would get. By knowing the practical number of data items to be encountered, they could optimise the hash size. A compiler can't do this, because the necessary information comes from the overall system domain, not just the source code. To allow compilers to compete effectively, we'll need much more subtle optimisation hinting than we currently have; especially that of the form "ignore this block, it's only used once" and "speed like crazy here, and you'll never have more than 5 items loaded simultaneously".
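As a rough sketch of what such a hint could feed into, here is a hypothetical Python helper that turns "you'll never have more than N items" into a hash table size. The name `buckets_for`, the power-of-two sizing and the 0.75 load factor are assumptions chosen for illustration, not something any particular compiler or library is known to do.

```python
import math


def buckets_for(expected_items, max_load=0.75):
    """Smallest power-of-two bucket count keeping load at or below max_load."""
    if expected_items <= 0:
        return 1
    needed = math.ceil(expected_items / max_load)
    return 1 << (needed - 1).bit_length()
```

With the "never more than 5 items" hint this gives 8 buckets, while an expectation of 1000 items gives 2048. The source code alone cannot tell the two cases apart, which is the gap that better hinting would have to fill.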

I'm not a compiler / assembler geek, so maybe someone is already doing this?
Re:Java Byte Code by jilles · 2000-01-26 21:49 · Score: 3

Java bytecode interpretation is not the main cause of Java's weak performance. The cause lies in mechanisms like garbage collection, memory allocation, thread synchronization etc., which are expensive. If you implemented similar mechanisms (i.e. with all the built-in safety) in C++ you'd have exactly the same problems (I don't think that's actually feasible though).

--

Jilles
Re:Java Byte Code by arivanov · 2000-01-26 21:56 · Score: 5

No it will not.

Think of Java licencing and control issues, discussed widely on Slashdot. Actually I shall retract this statement if one of the MAJOR league players accepts Crusoe as a primary CPU for at least one machine class.

Otherwise it is obvious that it could be thy Java machine, but it is least likely to be.

I think Crusoe will actually make a reality of something else which is much cooler than Java. It will make real thy developer's dream - the coat of many colors: The affordable machine of many architectures.

On the basis of Crusoe even now you can build a machine that can happily emulate:

Mac, Sun (lower end), IBM PPC (lower end), SGI (lower end), Alpha and, curse it, x86.

1. All these are PCI based.

2. The differences in chipsets can be ignored under the "one OS to rule them all, one os to find them, one os to bring them all and in (oh well cutting out darkness) bind them ". It can have drivers for the chipset and peripherals in question.

3. You may actually do the reverse thing and develop drivers for the peripherals and the chipset for all platforms in question (not a hell of an effort, actually quite achievable). If peripherals are something like adaptec, tulip and a PCI VGA the drivers are basically there already. So you have only the chipset left. Actually you can intercept these and emulate chipset behaviour if you so desire.

Overall - the developer's dream can become a reality for just a few bucks - about the price of a PC (excluding licencing for OSes and software of course ;-)

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
The Final Word by spacewalrus · 2000-01-27 07:15 · Score: 3

This whole question boils down to whether or not a human mind is deterministic. We can prove that a compiler is, or in fact that any program that runs on current computer hardware is.

If a human mind is deterministic then it follows that a human could do just as well as a compiler simply by using all of the same algorithms to generate the code, although the computer would certainly finish a lot sooner.

If a human mind is NOT deterministic then it is clear that a person who had no programming skill whatsoever could beat the best compiler given enough time, a specification sheet of the language he/she was to generate the given algorithms in and a circuit diagram of the CPU he/she was coding for.

This is VERY simple to see. A compiler can NEVER under any conditions for any reason EVER EVER EVER produce better code than a human because the human can ALWAYS produce the same code that the compiler did. The question should be: is it reasonable to wait the 150 years it would take me to compile Mozilla by hand, or should I build a compiler to do it in a few hours? The next question would be how much time (if any) I should spend hand-optimizing the compiler output.

A question I have is why can't we do the same thing that a JIT compiler does before runtime? It seems clear that you could easily get very close to as much optimization from a multi-pass profiling compiler as you ever could with a JIT, without the runtime overhead.

Later
Spacewalrus
This is so true, but it's getting harder by Dominic_Mazzoni · 2000-01-26 21:57 · Score: 3

I completely agree that humans can always write better assembly than compilers can generate, but it's getting harder.

Not all that long ago, compilers weren't all that smart, and processors were a lot simpler. Today processors use tricks like out-of-order execution and branch prediction, making it extremely difficult for an assembly-language programmer to determine exactly what is going on. It still remains the case, though, that the programmer may know something about the code that the processor doesn't (that a register will never contain a certain value, or that some execution paths are much more likely or more critical than others) which will give the programmer the edge.

None of the just-in-time compilers that I am aware of recompile code as they go. If the Crusoe actually attempts to optimize sections of the code based on runtime profiling information, this could be the main reason it performs so well. There have been academic papers written on this idea before but this would be the first implementation I've heard of.
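As a rough illustration of what "recompiling based on runtime profiling" could look like, here is a small Python sketch. It is not how Crusoe or any real JIT works internally: the `tiered` decorator, the `HOT_THRESHOLD` value and the hand-written "optimized" version are all assumptions made up for the example. The dispatcher counts calls to a routine and, once the routine looks hot, swaps in a faster implementation.

```python
import functools

HOT_THRESHOLD = 3  # hypothetical: number of calls before a routine counts as hot


def tiered(optimized):
    """Run the baseline version, count calls, and switch to `optimized` once hot."""
    def wrap(baseline):
        state = {"calls": 0, "impl": baseline}

        @functools.wraps(baseline)
        def dispatch(*args):
            state["calls"] += 1  # the "profiling data"
            if state["calls"] == HOT_THRESHOLD:
                # Stand-in for recompilation: swap in the fast path.
                state["impl"] = optimized
            return state["impl"](*args)

        dispatch.state = state  # exposed so the switch can be observed
        return dispatch
    return wrap


def sum_to_fast(n):
    # Stand-in for optimized code: closed form instead of a loop.
    return n * (n + 1) // 2


@tiered(optimized=sum_to_fast)
def sum_to(n):
    # Baseline version: straightforward loop.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


results = [sum_to(10) for _ in range(5)]
print(results, sum_to.state["impl"].__name__)
```

A real system would generate the optimized code itself instead of being handed it, but the control flow (measure first, then replace the hot routine while the program keeps running) is the part the sketch is meant to show.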
here's how you do it by hoss10 · 2000-01-26 21:58 · Score: 3

I was about to try and answer your question but it's bloody difficult to explain!
With garbage collection implemented in the best way, the JIT or whatever doesn't try too hard to keep a record of what mem has been allocated.
But whenever the system isn't particularly busy, or if it panics due to low memory, the JIT can traverse all the objects that have been allocated (just starting with each of the Thread objects, I think), simply by following the references each object holds until it has a list of all the objects in the system. There are a few reasons this is more reliable (not just faster) than maintaining a reference count.
Then I suppose it (the JIT) could ask the OS to rearrange its memory mapping thingies (Page Translation whatchamacallems) to put any of its user space pages after the current position of the stack or something like that (hard to describe but i know what i'm talking about (even though i can't remember the name exactly!))

Basically it can be made to work, but maybe not particularly efficiently, because it could result in a full page (4K) being used for just one small object. But this can also be fixed.
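The traversal described above is usually called the mark phase of a tracing collector. Here is a minimal Python sketch of it; the `Obj` class, the `refs` list and the tiny example heap are invented for illustration and say nothing about how any particular JVM lays out its objects. It also shows one of the reliability reasons: two objects that only point at each other never reach a reference count of zero, but the traversal from the roots simply never visits them.

```python
class Obj:
    """Hypothetical heap object: a name plus outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def mark(roots):
    """Return the set of objects reachable from the roots (e.g. thread objects)."""
    reachable = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in reachable:
            continue  # already visited, which also stops us looping on cycles
        reachable.add(obj)
        stack.extend(obj.refs)
    return reachable


def sweep(heap, reachable):
    """Everything on the heap that was not marked is garbage."""
    return [obj for obj in heap if obj not in reachable]


thread = Obj("thread")
a, b = Obj("a"), Obj("b")
c, d = Obj("c"), Obj("d")

thread.refs.append(a)  # thread -> a -> b: still in use
a.refs.append(b)
c.refs.append(d)       # c <-> d: a cycle nothing else points to
d.refs.append(c)

heap = [thread, a, b, c, d]
garbage = sweep(heap, mark([thread]))
print([obj.name for obj in garbage])
```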
Isn't this REALLY the smarter compiler argument? by stevew · 2000-01-26 21:59 · Score: 5

Years ago I worked at a company that was going to bring VLIW to the commercial world. One of the things they had to invent was MUCH smarter compiler technology. That was in 1985, so the basic ideas involved here are at least that old.

Anyway, when the compiler DOES know the hardware, especially on VLIW machines, it can indeed do a job as GOOD as a human. (Look - it won't usually do a better job; when the guys who designed the machine write code, they're going to do what the compiler is trying to emulate..) But from what I remember, static profiling was considered "good enough" back then if the VLIW machine is built right.

Most codes spend most of their time in innermost loops. If those loops can be rolled up correctly for VLIW you can be executing different iterations of the loop within the same instruction slot. Being able to do this is usually the largest performance payoff. If you have a GOOD compiler in the first place, one that KNOWS the hardware, then JIT and morphing aren't needed, and don't help.
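Running different iterations of a loop in the same instruction slot sounds like what is generally called software pipelining. The Python sketch below only prints a schedule, it does not model a real VLIW machine: the three-stage loop body (`load`, `mul`, `store`), one functional unit per stage and one cycle per stage are all assumptions for the sake of the example.

```python
STAGES = ["load", "mul", "store"]  # hypothetical 3-stage loop body


def pipelined_schedule(iterations, stages=STAGES):
    """Cycle c runs stage s of iteration c - s, so one wide instruction
    carries work from several different loop iterations."""
    schedule = []
    for cycle in range(iterations + len(stages) - 1):
        slot = []
        for s, stage in enumerate(stages):
            iteration = cycle - s
            if 0 <= iteration < iterations:
                slot.append(f"{stage}[{iteration}]")
        schedule.append(slot)
    return schedule


for cycle, slot in enumerate(pipelined_schedule(4)):
    print(cycle, " | ".join(slot))
```

Under these assumptions, four iterations finish in six cycles instead of the twelve that running each stage strictly one after another would take. The schedule is also tied to the number of functional units, which is the scaling problem the next paragraph raises.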

The place that I see the morphing being a step forward is that VLIW machines were considered architectural dead-ends until just now. If I use one of these smart compilers for a machine with say 7 functional units, the code that is emitted won't work on a machine with 8 functional units, or at least won't be optimized any longer. These machines didn't scale well until now! Code-morphing technology completely removes this limitation!

--
Have you compiled your kernel today??
Experience from Optimizing Java ... by joe_fish · 2000-01-26 22:05 · Score: 3
The theory looks sound, but one of the biggest things to do when optimizing any Java program is to minimize the calls to new().
I recently sped a program up by 150% in a snap simply by killing a few new()s. Swing and LotusXSL have had similar experiences.
I think that part of the problem is that all of this is new, so there is more to do. HotSpot, the trendy JIT from Sun, in places IS ALREADY FASTER THAN C, but whenever it comes to Object creation, things slow down a lot.
So why in theory should new() be quick when in fact it is slow? IMHO the problem is not with the memory claiming, it's with all the other stuff that the JVM has to do.
When I call new Foo() the JVM:
- Checks to see if the bytecode for Foo already exists, and if not it loads it, verifies it, and calls the class init method.
  This is very very slow, but should only be done once.
- Allocs new memory. Probably very quick
- Calls the hierarchy of constructors of all Foo's superclasses. Quite slow.
- (For advanced garbage collectors) Place the object on the 'recent items' list. Probably quick
So I guess the complexity of the system as a whole is the problem here.
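One common way of "killing a few new()s" is to reuse objects instead of creating fresh ones. The sketch below shows the pattern in Python, not Java, and every name in it (`Buffer`, `Pool`, `acquire`, `release`) is made up for illustration. It demonstrates only that the constructor runs once instead of a thousand times; whether that is actually a win depends on the runtime, so measure before relying on it.

```python
class Buffer:
    created = 0  # counts how many times the constructor actually ran

    def __init__(self):
        Buffer.created += 1
        self.data = []

    def reset(self):
        # Return the object to a clean state so it can be handed out again.
        self.data.clear()


class Pool:
    """Hands out recycled objects and only constructs when none are free."""

    def __init__(self, factory):
        self._factory = factory
        self._free = []

    def acquire(self):
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        obj.reset()
        self._free.append(obj)


pool = Pool(Buffer)
for i in range(1000):
    buf = pool.acquire()  # reuses the same Buffer after the first pass
    buf.data.append(i)
    pool.release(buf)

print("constructor calls:", Buffer.created)
```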
--
DWR is Ajax for Java
Self-Modifying Code by Tune · 2000-01-26 22:10 · Score: 3

I believe Andy has a good point in arguing that there's no fundamental reason for manually written assembly being better than automatically self-optimizing stuff. I also believe that manually written assembly will ultimately become obsolete.

Nevertheless I'd like to point out that JIT is based on a somewhat dangerous technique: a program that alters (its own) code. I believe this technique was used in the eighties to scare off hackers by making code incomprehensible and hard to disassemble until the program was actually running. Also (even on a good old 6502 processor) it's possible to make some speed improvements by peeking and poking into the code you're actually executing.

Once compilers for microcomputers got faster, and most processor architectures (known as Harvard architecture, I believe) began to explicitly require a division between RW and RO memory segments, self-modifying code was abandoned. Hacking into your own code is generally regarded as bad programming practice.

I was wondering whether JIT techniques also require very intelligent (and thus "heavy") disassemblers? One might also expect that developing a JIT compiler is a lot harder than doing a conventional one (without peep-hole optimization). Does anyone have experience in these subjects?
Bless you! by aheitner · 2000-01-26 22:20 · Score: 3

@#$% theory.

I can write fast C (or C++ if you stay away from the evil slow features) code.

Everything I do in Java is bloody slow.

I do believe fundamentally that some sort of per-machine compilation is in fact ideal -- the C compiler could do clever stuff if it knew the exact cache architecture of your machine and compiled your programs from some intermediate (say parsed and desymbolized) representation when you ran them (or maybe just the 1st time you ran them).

(all the slackware people in the crowd say "bwaahahahaha! I did compile my programs for my own machine!" :-)

But I think the dumbness of Java -- the funky GC, the dumb dumb dumb RTTI, will always make it slow. For that matter, Java uses a dumb fake-o instruction set that's not a particularly good representation of the information needed by a compiler. Why not give your JITC something closer to the source rather than a binary for a machine that doesn't even exist ?!?

...

Ditzel said something very significant when he introduced Crusoe: (i'm paraphrasing) "The reason we haven't talked openly about this before is we didn't want to boast before we could prove we've done it". Ditzel stood there and said "We've got a chip that does on-chip JIT architecture translation, and does it to run as fast as any other chip out there, and here's the proof".

The Java people have been claiming for a long time that they could write faster code than the C compiler with JITC.

Java has gotten better than when it started, it's true.

But it still can't hold a candle to gcc -O2.

My message to the Java world: I spend too much on my computers to waste time running slow code. Call me back when you can actually demonstrate your claims.
java JIT itself is not that slow. by monac · 2000-01-26 22:26 · Score: 4

As shown at http://www.idiom.com/~zilla/Computer/javaCbenchmark.html, even the Linux JDK, which is slow relative to other platforms', runs as fast as a natively compiled C program on simple operations.

In my understanding, Java's poor performance is due to the following factors:

1. An overly generalized API design results in very deep call stacks. For example, printing the current date and time using java.text.DateFormat takes many more calls than in traditional C (a system call and a printf is enough for C). It gets even worse in Swing. The use of interfaces causes the same problem, I think.

2. JNI is too slow. (I can't understand why, but it is known to be slow, and Java can't help but use many JNI calls.)

3. As WORA is important in Java, it is hard for Java to use any platform-specific resources such as graphics card acceleration. That is one of the reasons Swing's design is getting ugly as it tries to boost its performance.

4. Too many small class files result in high I/O usage when the JVM loads classes.

5. No MACRO! For example, most of the Java API is synchronized, which is expensive, regardless of whether threads are used or not. The same goes for security check code, platform-specific implementation code, etc. These problems could be solved by MACRO, but someone at Sun doesn't like macros, and I agree with him when I think of Java's purity.

Some of these problems, such as slow synchronization and class loading speed, are said to be solved in the next JDK version (HotSpot client version).
But mostly they are Java design issues and will never be solved: Java mostly runs in a virtual machine, and it was originally designed to be used on Java chips. Only when the JVM is implemented at the processor level will the speed problem be solved.

I'm looking forward to seeing Java running on MAJC or Transmeta's chips.

please correct me if i'm wrong in some and forgive me for my poor english.
Cheers.

--
-- Y. J. Chun